ORACLE DUL BLOCK INDEX

The Block Index feature: support for corrupted file systems and disk groups

You can even unload a database that no longer has any data files.
A block index is a way to unload from any corrupted file system or disk. It works for ASM disk groups as well as for UNIX file systems. The idea is that even if the file system structures are damaged, such a disk can still contain many good database data blocks.
During a scan the whole disk or raw device is inspected, and a list of all blocks found is stored in a so-called block index.
If there are multiple databases it can be difficult to determine which blocks belong to which database.
In any case, if you encounter such a situation, contact me for assistance; there is a sound basis of code, but additions may be needed.
Steps to create and use a block index
Make sure that the following parameters are set for your database:

compatible
db_block_size
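
A minimal init.dul sketch for this scenario might look like the following; the values are examples only and must match the source database (the "compatible" parameter name is assumed to be accepted by your DUL version):

compatible    = 10        # Oracle release of the source database
db_block_size = 8192      # block size of the source database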

Start with an empty control.dul.

You have to create a block index for each of the raw devices/disks on which the ASM disks or file system lived. For each device/disk do:
DUL> create block index data1 on /dev/oracleasm/disks/P_DATA01;
DUL> create block index data2 on /dev/oracleasm/disks/P_DATA02;
During the scan the db_name and db_id of every database are printed; this gives an idea of how many databases there were.
Add these block indexes to the control.dul:
block index data1
block index data2
There is no need to add the data files to the control.dul; the information needed to find the data is already in the block indexes.
Restart DUL; the bootstrap and unload procedures are unchanged.

Oracle Put offline datafiles online without recovery

Put offline datafiles online without recovery By Chen

 

Here I'll introduce one method to bring offline datafiles online without recovery.
We know that Oracle checks the controlfile information against the datafile header information to determine whether the database is consistent and whether it needs crash recovery, media recovery, or can be opened directly.
When tablespaces/datafiles are taken offline normally, the database issues a checkpoint on these datafiles and updates the information in the datafile headers and the controlfile. If the files are taken offline immediate, only the controlfile information is updated. Files that were taken offline normally can be put online without recovery; otherwise they need recovery.
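
For reference, the two kinds of offline operations discussed here look like this; the tablespace and file names are taken from the example below and are only illustrative:

@>alter tablespace test offline normal;
   -- checkpointed; the files can later be brought online without recovery
@>alter database datafile '/u03/oradata/9204/chen/test01.dbf' offline drop;
   -- only the controlfile is updated; recovery is needed to bring the file online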
I ran this experiment on a NOARCHIVELOG mode database. The main steps are:
1. Take one datafile offline;
2. Switch logfiles, so that the offline datafile is missing the redo logs needed to bring it online;
3. Modify the file header directly, advancing past the missing redo;
4. Re-create the controlfile, mainly to get rid of the influence of the controlfile information.
@>conn test/test
Connected.
@>drop table t1;
Table dropped.
@>create table t1 tablespace test as select rownum id from all_objects where rownum<6;
Table created.
@>conn /as sysdba
Connected.
@>archive log list
Database log mode No Archive Mode
Automatic archival Disabled
Archive destination /opt/app/oracle/product/9.2.0/dbs/arch
Oldest online log sequence 37
Current log sequence 39
@>select tablespace_name ,file_name from dba_data_files; 
TABLESPACE_NAME FILE_NAME
------------------------------ --------------------------------------------------
SYSTEM /u03/oradata/9204/chen/system01.dbf
UNDOTBS /u03/oradata/9204/chen/undotbs01.dbf
TEST /u03/oradata/9204/chen/test01.dbf
ASSM /u03/oradata/9204/chen/assm01.dbf
@>alter database datafile '/u03/oradata/9204/chen/test01.dbf' offline drop;
Database altered.
@>desc test.t1
 Name Null? Type
 ----------------------------------------- -------- ----------------------------
 ID NUMBER
@>select * from test.t1;
select * from test.t1
 *
ERROR at line 1:
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: '/u03/oradata/9204/chen/test01.dbf'
@>select * from v$recover_file;

     FILE# ONLINE  ONLINE_STATUS ERROR           CHANGE# TIME
---------- ------- ------------- ------------ ---------- ---------
         3 OFFLINE OFFLINE                        639738 13-FEB-08
@>alter system switch logfile;
System altered.
…
@>alter database datafile '/u03/oradata/9204/chen/test01.dbf' online;
alter database datafile '/u03/oradata/9204/chen/test01.dbf' online
*
ERROR at line 1:
ORA-01113: file 3 needs media recovery
ORA-01110: data file 3: '/u03/oradata/9204/chen/test01.dbf' 
@>alter database recover datafile '/u03/oradata/9204/chen/test01.dbf';
alter database recover datafile '/u03/oradata/9204/chen/test01.dbf'
*
ERROR at line 1:
ORA-00279: change 639738 generated at 02/13/2008 06:09:57 needed for
thread 1
ORA-00289: suggestion : /opt/app/oracle/product/9.2.0/dbs/arch1_39.dbf
ORA-00280: change 639738 for thread 1 is in sequence #39
@>!ls /opt/app/oracle/product/9.2.0/dbs/arch1_39.dbf
ls: /opt/app/oracle/product/9.2.0/dbs/arch1_39.dbf: No such file or directory
@>select name,STATUS,RECOVER,FUZZY,CHECKPOINT_CHANGE#,CHECKPOINT_COUNT from v$datafile_header;

NAME                                               STATUS  REC FUZ CHECKPOINT_CHANGE# CHECKPOINT_COUNT
-------------------------------------------------- ------- --- --- ------------------ ----------------
/u03/oradata/9204/chen/system01.dbf                ONLINE  NO  YES             654168               77
/u03/oradata/9204/chen/undotbs01.dbf               ONLINE  NO  YES             654168               77
/u03/oradata/9204/chen/test01.dbf                  OFFLINE YES YES             639738               37
/u03/oradata/9204/chen/assm01.dbf                  ONLINE  NO  YES             654341               45
@>select name,STATUS,CHECKPOINT_CHANGE#,LAST_CHANGE#,OFFLINE_CHANGE# from v$datafile;

NAME                                               STATUS  CHECKPOINT_CHANGE# LAST_CHANGE# OFFLINE_CHANGE#
-------------------------------------------------- ------- ------------------ ------------ ---------------
/u03/oradata/9204/chen/system01.dbf                SYSTEM              654168                            0
/u03/oradata/9204/chen/undotbs01.dbf               ONLINE              654168                            0
/u03/oradata/9204/chen/test01.dbf                  RECOVER             639738       639868          639725
/u03/oradata/9204/chen/assm01.dbf                  ONLINE              654341                       653877
Now I use BBED to modify the /u03/oradata/9204/chen/test01.dbf file header information. Here I change four pieces of data: the checkpoint SCN, the checkpoint time, the checkpoint count, and the RBA. I refer to the SYSTEM datafile header to obtain the values.
BBED> set dba 3,1
 DBA 0x00c00001 (12582913 3,1)
BBED>
BBED> print kcvfh
struct kcvfh, 360 bytes @0
...
 struct kcvfhckp, 36 bytes @140
 struct kcvcpscn, 8 bytes @140
 ub4 kscnbas @140 0x0009c2fa
 ub2 kscnwrp @144 0x0000
 ub4 kcvcptim @148 0x26899f35
 ub2 kcvcpthr @152 0x0001
 union u, 12 bytes @156
 struct kcvcprba, 12 bytes @156
 ub4 kcrbaseq @156 0x00000027
 ub4 kcrbabno @160 0x000000ab
 ub2 kcrbabof @164 0x0010
 struct kcvcptr, 12 bytes @156
 struct kcrtrscn, 8 bytes @156
 ub4 kscnbas @156 0x00000027
 ub2 kscnwrp @160 0x00ab
 ub4 kcrtrtim @164 0x00000010
...
 ub4 kcvfhcpc @176 0x00000025
 ub4 kcvfhrts @180 0x2689a13d
 ub4 kcvfhccc @184 0x00000024
...
BBED> dump /v dba 3,1 offset 140 count 32
 File: /u03/oradata/9204/chen/test01.dbf (3)
 Block: 1 Offsets: 140 to 171 Dba:0x00c00001
-------------------------------------------------------
 fac20900 00000000 359f8926 01009162 l ú?......5..&...b
 27000000 ab000000 10000000 02000000 l '...?........... 
 
BBED> set dba 1,1
 DBA 0x00400001 (4194305 1,1)
BBED>
BBED> print kcvfh
struct kcvfh, 360 bytes @0
...
 struct kcvfhckp, 36 bytes @140
 struct kcvcpscn, 8 bytes @140
 ub4 kscnbas @140 0x0009fb58
 ub2 kscnwrp @144 0x0000
 ub4 kcvcptim @148 0x268a4e7e
 ub2 kcvcpthr @152 0x0001
 union u, 12 bytes @156
 struct kcvcprba, 12 bytes @156
 ub4 kcrbaseq @156 0x00000031
 ub4 kcrbabno @160 0x0000000e
 ub2 kcrbabof @164 0x0010
 struct kcvcptr, 12 bytes @156
 struct kcrtrscn, 8 bytes @156
 ub4 kscnbas @156 0x00000031
 ub2 kscnwrp @160 0x000e
 ub4 kcrtrtim @164 0xbfff0010
...
 ub4 kcvfhcpc @176 0x0000004d
 ub4 kcvfhrts @180 0x26899cac
 ub4 kcvfhccc @184 0x0000004c
BBED> dump /v dba 1,1 offset 140 count 32
 File: /u03/oradata/9204/chen/system01.dbf (1)
 Block: 1 Offsets: 140 to 171 Dba:0x00400001
-------------------------------------------------------
 58fb0900 00000000 7e4e8a26 01003495 l X?......~N.&..4.
 31000000 0e000000 1000ffbf 02000000 l 1..........?....
 
BBED> modify /x 58fb0900 dba 3,1 offset 140
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
 File: /u03/oradata/9204/chen/test01.dbf (3) 
 Block: 1 Offsets: 140 to 171 Dba:0x00c00001
------------------------------------------------------------------------
 58fb0900 00000000 359f8926 01009162 27000000 ab000000 10000000
02000000
 
BBED> modify /x 7e4e8a26 dba 3,1 offset 148
 File: /u03/oradata/9204/chen/test01.dbf (3)
 Block: 1 Offsets: 148 to 179 Dba:0x00c00001
------------------------------------------------------------------------
 7e4e8a26 01009162 27000000 ab000000 10000000 02000000 00000000
25000000
 
BBED> modify /x 31 dba 3,1 offset 156
 File: /u03/oradata/9204/chen/test01.dbf (3)
 Block: 1 Offsets: 156 to 187 Dba:0x00c00001
------------------------------------------------------------------------
 31000000 ab000000 10000000 02000000 00000000 25000000 3da18926
24000000
 
BBED> modify /x 0e dba 3,1 offset 160
 File: /u03/oradata/9204/chen/test01.dbf (3)
 Block: 1 Offsets: 160 to 191 Dba:0x00c00001
------------------------------------------------------------------------
 0e000000 10000000 02000000 00000000 25000000 3da18926 24000000
00000000
 

BBED> modify /x 10 dba 3,1 offset 164
 File: /u03/oradata/9204/chen/test01.dbf (3)
 Block: 1 Offsets: 164 to 195 Dba:0x00c00001
------------------------------------------------------------------------
 10000000 02000000 00000000 25000000 3da18926 24000000 00000000
00000000
 
The checkpoint count is calculated from the information above.
In the offline file the sequence is 0x27 and the checkpoint count is 0x25; the current sequence is 0x31, so the checkpoint count can be set to 0x2f. I think there would be no ill effect even if the checkpoint count were not changed.
0x27 -> 0x25
0x31 -> 0x2f
BBED> modify /x 2f dba 3,1 offset 176
 File: /u03/oradata/9204/chen/test01.dbf (3)
 Block: 1 Offsets: 176 to 207 Dba:0x00c00001
------------------------------------------------------------------------
 2f000000 3da18926 24000000 00000000 00000000 00000000 00000000
00000000
 
BBED> modify /x 2e dba 3,1 offset 184
 File: /u03/oradata/9204/chen/test01.dbf (3)
 Block: 1 Offsets: 184 to 215 Dba:0x00c00001
------------------------------------------------------------------------
 2e000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
 
BBED> sum dba 3,1 apply
Check value for File 3, Block 1:
current = 0x615a, required = 0x615a
Now I bring this offline file back online by re-creating the controlfile, skipping the recovery that would need the missing redo logs.
@>alter database backup controlfile to trace;
Database altered.
@>shutdown abort
ORACLE instance shut down.
@>STARTUP NOMOUNT
ORACLE instance started.
Total System Global Area 470881660 bytes 
Fixed Size 451964 bytes
Variable Size 369098752 bytes
Database Buffers 100663296 bytes
Redo Buffers 667648 bytes
@>CREATE CONTROLFILE REUSE DATABASE "CHEN" NORESETLOGS
NOARCHIVELOG
 2 -- SET STANDBY TO MAXIMIZE PERFORMANCE
 3 MAXLOGFILES 5
 4 MAXLOGMEMBERS 5
 5 MAXDATAFILES 100
 6 MAXINSTANCES 1
 7 MAXLOGHISTORY 226
 8 LOGFILE
 9 GROUP 1 '/u03/oradata/9204/chen/redo01.log' SIZE 10M,
 10 GROUP 2 '/u03/oradata/9204/chen/redo02.log' SIZE 10M,
 11 GROUP 3 '/u03/oradata/9204/chen/redo03.log' SIZE 10M
 12 -- STANDBY LOGFILE
 13 DATAFILE
 14 '/u03/oradata/9204/chen/system01.dbf',
 15 '/u03/oradata/9204/chen/undotbs01.dbf',
 16 '/u03/oradata/9204/chen/test01.dbf',
 17 '/u03/oradata/9204/chen/assm01.dbf'
 18 CHARACTER SET US7ASCII
 19 ;
Control file created.
@>RECOVER DATABASE
Media recovery complete.
@>ALTER DATABASE OPEN;
Database altered.
@>desc test.t1
 Name Null? Type
 ----------------------------------------------------- --------
------------------------------------
 ID NUMBER
@>select * from test.t1;
 ID
----------
 1
 2 
 3
 4
 5
We can use this approach to get data back in some situations where it cannot be retrieved by normal methods.
But some data may be missing, and the data dictionary may no longer match the actual data.
References
Disassembling the Oracle Data Block
Advanced Backup, Restore, and Recover Techniques
Recovery architecture Components
msn: y.p.chen@hotmail.com
blog: http://freelists.spaces.live.com

[Oracle Data Recovery] Recovering a truncated table


 

1. Recover using the normal backup and recovery mechanism

2.

Cleaning up the RMAN recovery catalog

There is a requirement like this: a user keeps the backup information for multiple database servers in an RMAN recovery catalog. Because a large number of backup operations are executed every day, and the structures of the backed-up databases are themselves already quite complex, the space used by the recovery catalog grows rapidly. The user's disk space is tight and probably cannot be extended in the short term, which led us to consider cleaning up records that are no longer needed from the RMAN recovery catalog.

First, work out which data segments occupy the most space in the catalog database:

 

Here the RMAN user is the owner of the catalog schema.

SQL>  select bytes/1024/1024 "MB" ,segment_name,segment_type
from dba_segments where owner='RMAN' order by bytes desc;

        MB SEGMENT_NAME         SEGMENT_TYPE
---------- -------------------- ------------------
        88 ROUT                 TABLE
        47 ROUT_U1              INDEX
        31 ROUT_I_RSR           INDEX
        23 ROUT_I_DB            INDEX
      .125 RSR                  TABLE

 

You can see that the ROUT table occupies the most space. So what is this table used for?

The ROUT table records the output produced when RMAN runs commands such as BACKUP; in effect it accumulates the contents of the V$RMAN_OUTPUT view. When a target database connects to the catalog database to run operations, or when RESYNC CATALOG is executed, the information in V$RMAN_OUTPUT is synchronized into the ROUT table of the catalog database.
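
As a quick check of where the accumulated output comes from, you can group the ROUT rows by registered database. This is a sketch run as the catalog owner; it assumes the standard RC_DATABASE catalog view is available:

SQL> select d.name, count(*) rout_rows
       from rman.rout r, rman.rc_database d
      where r.db_key = d.db_key
      group by d.name
      order by rout_rows desc;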

 

SQL> desc v$rman_output;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SID                                                NUMBER
 RECID                                              NUMBER
 STAMP                                              NUMBER
 SESSION_RECID                                      NUMBER
 SESSION_STAMP                                      NUMBER
 OUTPUT                                             VARCHAR2(130)
 RMAN_STATUS_RECID                                  NUMBER
 RMAN_STATUS_STAMP                                  NUMBER
 SESSION_KEY                                        NUMBER

SQL> select output from v$rman_output where output is not null and rownum <3;

OUTPUT
--------------------------------------------------------------------------------
connected to target database: VPROD (DBID=881465081)
connected to recovery catalog database

 

The definition of the ROUT table is stored in $ORACLE_HOME/rdbms/admin/recover.bsq, as follows:

 

CREATE TABLE rout
(
db_key          NUMBER NOT NULL,        -- database output belongs to
rsr_key         NUMBER NOT NULL,        -- command that generated the output
rout_skey       NUMBER NOT NULL,        -- session that created the output
rout_recid      NUMBER NOT NULL,        -- record id from server
rout_stamp      NUMBER NOT NULL,        -- timestamp when row was added
rout_text       VARCHAR2(130) NOT NULL, -- RMAN output
CONSTRAINT rout_u1 UNIQUE(db_key, rout_skey, rsr_key, rout_recid, rout_stamp),
CONSTRAINT rout_f1 FOREIGN KEY(db_key)
  REFERENCES db ON DELETE CASCADE,
CONSTRAINT rout_f2 FOREIGN KEY(rsr_key)
  REFERENCES rsr ON DELETE CASCADE
) &tablespace&

 

Precisely because this recovery catalog records, every day, the backup output of databases with complex structures (many datafiles and archived logs), the ROUT table balloons quickly and may hold hundreds of thousands of records. In our case:

 

SQL> select count(*) from rman.rout;

  COUNT(*)
----------
   1069485

 

According to the note <Rman Catalog Resync Operation is Very slow at 10G [ID 378234.1]>, the ROUT table contains the output of all RMAN sessions, and this content is used only by Enterprise Manager. By default the ROUT table is also cleaned up during every RESYNC CATALOG operation, but versions 10.2.0.1-10.2.0.3 still retain the most recent 60 days of records, and when RMAN is used frequently even 60 days of ROUT data can be very large. (The ROUT table contains the RMAN output generated during all rman sessions and is used only by Enterprise Manager. The ROUT table is cleaned up automatically during each resync operation leaving by default, the last 60 days worth of entries in ROUT. However where RMAN is used very frequently, 60 days of ROUT entries is still too much.)

 

Starting with version 10.2.0.4, the cleanupROUT procedure that purges ROUT keeps only the most recent 7 days of data by default, which ensures ROUT does not take up too much space.

 

What actually controls how the data in the ROUT table is cleaned up is the cleanupROUT procedure defined in recover.bsq; in version 10.2.0.1 that procedure is defined as follows:

 

-- The procedure cleanupROUT (deletes) all RC_RMAN_OUTPUT rows corresponding
-- to job older than 60 days.
PROCEDURE cleanupROUT IS
  start_time       date;
  high_stamp       number;
  high_session_key number;
BEGIN
  IF (this_db_key IS NULL) THEN
    raise_application_error(-20021, 'Database not set');
  END IF;

  start_time := SYSDATE;
  high_stamp := date2stamp(start_time-60);

  SELECT nvl(max(session_key), 0) into high_session_key
     from rc_rman_status where session_stamp < high_stamp;

  DELETE FROM rout
        WHERE rout_skey <= high_session_key
          AND this_db_key = rout.db_key;

  deb('cleanupROUT deleted ' || sql%rowcount || ' rows from rout table');
  deb('cleanupROUT took ' || ((sysdate - start_time) * 86400) || ' seconds');

END cleanupROUT;

 

In fact, we only have to change the 60-day range specified here by high_stamp := date2stamp(start_time-60); to work around the problem of the table ballooning because releases before 10.2.0.4 keep 60 days of ROUT history.

 

First back up the recover.bsq file, then modify the definition of the cleanupROUT procedure as follows:

 

 

cd $ORACLE_HOME/rdbms/admin
cp  recover.bsq recover.bsq.bak

PROCEDURE cleanupROUT IS
  start_time       date;
  high_stamp       number;
  high_session_key number;
BEGIN
  IF (this_db_key IS NULL) THEN
    raise_application_error(-20021, 'Database not set');
  END IF;

  start_time      := SYSDATE;
  high_stamp      := date2stamp(start_time-7);

  SELECT max(rsr_key) into high_session_key
    FROM rsr, dbinc
   WHERE dbinc.db_key = this_db_key
     AND rsr.dbinc_key = dbinc.dbinc_key
     AND rsr.rsr_stamp < high_stamp;

  deb('cleanupROUT select took ' || ((sysdate - start_time) * 86400) ||
      ' seconds');

  -- Delete rows from rout table for jobs older than 7 days.
  If high_session_key IS NOT NULL THEN
     DELETE FROM rout
     WHERE  db_key     = this_db_key
       AND  rout_skey <= high_session_key;
     deb('cleanupROUT deleted ' || sql%rowcount || ' rows from rout table');
  END IF;

  deb('cleanupROUT took ' || ((sysdate - start_time) * 86400) || ' seconds');

END cleanupROUT;

 

Once the definition has been modified as above, it needs to be put into effect: connect to the catalog database and upgrade the catalog:

 

[oracle@vrh8 admin]$ rman target / catalog rman/rman@VPROD

Recovery Manager: Release 10.2.0.1.0 - Production on Mon Dec 5 00:37:35 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: G10R21 (DBID=2807279631)
connected to recovery catalog database

RMAN> upgrade catalog;

recovery catalog owner is RMAN
enter UPGRADE CATALOG command again to confirm catalog upgrade

RMAN> upgrade catalog;

recovery catalog upgraded to version 10.02.00.00
DBMS_RCVMAN package upgraded to version 10.02.00.00
DBMS_RCVCAT package upgraded to version 10.02.00.00

RMAN> resync catalog;

starting full resync of recovery catalog
full resync complete

 

After the resync above completes, ROUT retains only the records from the last 7 days, but the space it originally occupied is not released, because cleanupROUT only deletes the records.

 

SQL> select count(*) from rman.rout;

  COUNT(*)
----------
        55

SQL>  select bytes/1024/1024 "MB" ,segment_name,segment_type from dba_segments where segment_name='ROUT';

        MB SEGMENT_NAME         SEGMENT_TYPE
---------- -------------------- ------------------
        88 ROUT                 TABLE

 

There are two options here:

1. Truncate the ROUT table directly. Since the RMAN output data is only useful to Enterprise Manager, in the many environments where EM is not deployed these ROUT records have little practical value, and truncating ROUT directly reclaims the space it occupied:

 

Stop any RMAN catalog operations first.

SQL> truncate table rman.rout;

Table truncated.

 

2. Use the shrink space feature introduced in 10g to shrink the segment; this keeps the small number of remaining ROUT records while reclaiming the unneeded space:

 

Stop any RMAN catalog operations first.

SQL>  alter table rman.rout enable row movement;

Table altered.

SQL> alter table rman.rout shrink space;

Table altered.

SQL> select bytes/1024/1024 "MB" ,segment_name,segment_type from dba_segments where segment_name='ROUT';

        MB SEGMENT_NAME         SEGMENT_TYPE
---------- -------------------- ------------------
     .0625 ROUT                 TABLE

How to rename a datafile whose name contains garbled characters

There are situations where the datafile name supplied in CREATE TABLESPACE ... DATAFILE or ALTER TABLESPACE ADD DATAFILE contains garbled characters, as in the following example:

 

SQL> select file#,name from v$datafile where file#=20;

     FILE# NAME
---------- --------------------------------------------------
        20 /s01/锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟dbf

SQL> alter database rename file '/s01/锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟dbf' to '/s01/rename.dbf';
alter database rename file '/s01/锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟dbf' to '/s01/rename.dbf'
*
ERROR at line 1:
ORA-01511: error in renaming log/data files
ORA-01516: nonexistent log file, datafile, or tempfile
"/s01/锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟斤拷锟dbf"

 

 

Renaming a datafile whose name contains garbled characters directly, as above, may fail with ORA-01516. For datafile management problems caused by such garbled names, consider the following approaches:

Method 1. Run alter database backup controlfile to trace; then edit the CREATE CONTROLFILE script generated in the trace, replacing the garbled datafile name with the target file name. This method takes a bit more effort, as sketched below.
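
A rough sketch of Method 1; the script file name is hypothetical and the real DATAFILE entries come from your own controlfile trace:

SQL> alter database backup controlfile to trace;
-- edit the generated CREATE CONTROLFILE ... NORESETLOGS script: in the DATAFILE
-- clause replace the garbled entry with the desired name (after copying or
-- renaming the file at the OS level), then:
SQL> shutdown immediate
SQL> startup nomount
SQL> @create_controlfile_edited.sql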

 

Method 2.
Rename the datafile by running ALTER DATABASE RENAME FILE through dynamic SQL:

 

 

declare
x varchar2(600);
y varchar2(600);
begin
y:='/s01/rename.dbf';
select file_name into x from dba_data_files where file_id=20;
execute immediate 'alter database  rename file  '''||x||''' to '''||y||'''';
end;
/

PL/SQL procedure successfully completed.

SQL> select name,file# from v$datafile where file#=20;

NAME
--------------------------------------------------------------------------------
     FILE#
----------
/s01/rename.dbf
        20

 

This method works because the dynamic SQL never outputs or converts the garbled characters. Remember to bring the datafile ONLINE afterwards!

 

Method 3.

 

 

RMAN> copy datafile 20 to '/s01/rename1.dbf';

RMAN> switch datafile 20 to copy
2> ;

datafile 20 switched to datafile copy "/s01/rename1.dbf"

RMAN> recover datafile 20;                  

RMAN> sql ' alter database datafile 20 online';

sql statement:  alter database datafile 20 online

 

 

Using RMAN COPY + SWITCH DATAFILE TO COPY (or SET NEWNAME) can also solve this problem for you, because RMAN can refer to the file by FILE# and, unlike ALTER DATABASE RENAME FILE, does not require the original file name to be typed in.

Oracle DUL Data recovery UnLoader what you need to know

 

DISCLAIMER

  • DUL is written by Bernard van Duijnen, from Oracle Support – Netherlands
  • DUL is NOT an Oracle product
  • DUL is NOT a supported Oracle product
  • DUL is strictly for Oracle Support and internal use only
  • DUL engagements are only initiated through the PSF as the result of a recovery effort
  • DUL exposes Oracle’s source code and MUST be strictly controlled

Use the export mode (DUL version 3) or write a Pro*C program to load LONG RAW data.

 

Life without DUL

  • Current recovery options
    • -restore and rollforward
    • -export/import
    • -use SQL*Loader to re-load the data
    • -(parallel) create table as select (PCTS)
    • -Transportable Tablespace

 

  • Diagnostic tools
    • -orapatch
    • -BBED (block browser/editor)
  • Undocumented parameters
    • -_corrupted_rollback_segments, _allow_resetlogs_corruption  etc…

 

Current Limitations

 

  • No alternatives in the case of loss of SYSTEM tablespace datafile(s)
  • The database must be in ‘reasonably’ good condition or else recovery is not possible (even with the undocumented parameters!)
  • Patching is very ‘cumbersome’ and is not always guaranteed to work
  • Certain corruptions are beyond patching
  • Bottom line – loss of data!!

The most common problem is the fact that customer’s backup strategy does not match their business needs.

Eg.  Customer takes weekly backups of the database, but in the event of a restore their business need is to be up and running within (say) 10 hours.   This is not feasible since the ‘rollforward’ of one week’s worth of archive logs would (probably) take more than 10 hours!!

 

Other Options without DUL

  • Building a cloned database, exporting data, and importing into the recovery database.
  • Building a cloned database and using Transportable Tablespaces for recovery.

Solution

  • DUL could be a possible solution
    • -DUL (?) – Bernard says ‘Life is DUL without it!’
    • -bottom line – salvage as much data as possible

 

Why DUL?

  • Doesn’t need the database or the instance to be open
  • Does not use recovery, archive logs etc…
  • It doesn’t care about data consistency
    • -more tolerant to data corruptions
  • Does not require the SYSTEM tablespace for recovery

 

Overview of DUL

  • DUL is a utility that can unload data from “badly damaged” databases.
  • DUL will scan a database file, recognize table header blocks, access extent information, and read all rows
  • Creates a SQL*Loader or Export formatted output
    • -matching SQL*Loader control file is also generated

 

Overview – DUL will…

  • Read the Oracle data dictionary if the SYSTEM tablespace files are available
  • Analyze all rows to determine

-number of columns, column datatypes and column lengths

If the SYSTEM tablespace datafiles are not available DUL does its own analysis, more on this later…

 

Overview – DUL

  • DUL can handle all row types

-normal rows, migrated rows, chained rows, multiple extents, clustered tables, etc.

  • The utility runs completely unattended, minimal manual intervention is needed.
  • Cross platform unloading is supported

 

DUL can open other datafiles if there are extents in those datafiles.

Although DUL can handle it, LONG RAW presents a problem for SQL*Loader – we’ll talk about this shortly…

For cross platform unloading the configuration parameters within “init.dul” will have to be modified to match those of the original platform and O/S rather than the platform from which the unload is being done.

DUL unloads in the physical order of the columns. The cluster key columns are always unloaded first.

 

DUL Concepts

  • Recovers data directly from Oracle data files

-the Database (RDBMS) is bypassed

  • Does dirty reads, it assumes that every transaction is committed
  • Does not check if media recovery has been done
  • DATABASE CORRUPT – BLOCKS OK
  • Support for Locally Managed Tablespaces

 

DUL does not require that media recovery be done.

Since DUL reads the data directly from datafiles, it reads data that is committed as well as uncommitted. Therefore the data salvaged by DUL can potentially be logically corrupt. It is up to the DBA and/or the application programmers to validate the data.

 

 

Compatibility

 

  • The database can be copied from a different operating system than the DUL-host
  • Supports all database constructs:

-row chaining, row migration, hash/index clusters, longs, raws, rowids, dates, numbers, multiple free list groups, segment high water mark, NULLS, trailing NULL columns etc…

  • DUL should work with all versions: 6, 7, 8, 9, 10, 11 and 12

 

DUL9i/92

  • The main new features are:

  Support for Oracle version 6, 7, 8 and 9

  Support for Automatic Space Managed Segments

  New bootstrap procedure: just use 'bootstrap;'. No more dictv6/7/8.ddl files

  LOB are supported in SQL*Loader mode only

  (Sub)Partitioned tables can be unloaded

  Unload a single (Sub)Partition

  Improved the scan tables

  The timestamp and interval datatypes

  Stricter checking of negative numbers

  (Compressed) Index Organized Tables can be unloaded

  Very strict checking of row status flags

  Unload index to see what rows you are missing

  Objects, nested tables and varrays are not supported (internal preparation for varray support)

 

The latest version is DUL92. The main new features are:

  • fix for problem with startup when db_block_size = 16K
  • fix for scan database and Automatic Space Managed Segments
  • fix for block type errors with high block types; the new max is 51
  • Support for Automatic Space Managed Segments
  • phase zero of new unformat command
  • internal preparation for varray support
  • Bug fix in the stricter checking of negative numbers
  • Bug fix in the unloading of clustered tables

 

Limitations

  • The database can be corrupted, but an individual data block used must be 100% correct

-blocks are checked to make sure that they are not corrupt and belong to the correct segment

  • DUL can and will only unload table/cluster data.

-it will not dump triggers, constraints, stored procedures nor create scripts for tables or views

-But the data dictionary tables describing these can be unloaded

 

Note: If during an unload a bad block is encountered, an error message is printed in the loader file and to standard output. Unloading will continue with the next row or block.

 

 

Configuring DUL

  • There are two configuration files for DUL
    • -init.dul
    • -control.dul
  • Configuration parameters are platform specific.

If you do decide that DUL is the only way to go, then here is how to go about configuring and using DUL.  Good Luck!!

 

 

init.dul

  • Contains parameters to help DUL understand the format of the database files
  • Has information on
    • -DUL cache size
    • -Details of header layout
    • -Oracle block size
    • -Output file format
    • -Sql*Loader format and record size.
    • -etc…

 

 

Sample init.dul file for Solaris looks like:
# The dul cache must be big enough to hold all entries from the Dictionary dollar tables.
dc_columns = 200000
dc_tables = 20000
dc_objects = 20000
dc_users = 40
# OS specific parameters
big_endian_flag = true
dba_file_bits = 6
align_filler = 1
db_leading_offset = 1
# Database specific parameters
db_block_size = 2048
# Sql*Loader format parameters
ldr_enclose_char = "
ldr_phys_rec_size = 81


 

 

control.dul

  • Used to translate the file numbers to file names
  • Each entry on a separate line, first the file_number then the data_file_name

-A third optional field is an extra positive or negative byte offset, that will be added to all fseek() operations for that datafile.

 

Sample “control.dul”

 

# AIX version 7 example with one file on raw device
   1 /usr/oracle/dbs/system.dbf
   8 /dev/rdsk/data.dbf 4096

   # Oracle8 example with a datafile split in multiple parts, each part smaller than 2GB
   0  1 /fs1/oradata/PMS/system.dbf
   1  2 /tmp/huge_file_part1 startblock 1 endblock 1000000
   1  2 /tmp/huge_file_part2 startblock 1000001 endblock 2000000
   1  2 /mnt3/huge_file_part3 startblock 2000001 endblock 2550000


 

Using DUL

  • Case 1:

-SYSTEM tablespace available

  • Case 2:

-Using DUL without the SYSTEM tablespace

 

Case1: Data dictionary usable

  • Straightforward method
  • Execute 'dul' from the OS prompt, then 'bootstrap;' from the DUL prompt
  • No need to know the application table structures, column types, etc.

 

Case2: Without the SYSTEM tablespace

  • Needs in-depth knowledge of the application and the application tables

The unloaded data has no value if you do not know which table it came from

  • Column types can be guessed by DUL, but table and column names are lost

The guessed column types can be wrong

Note:

1) Any old SYSTEM tablespace from the same database but weeks old can be of great help!

2) If you recreate the tables (from the original CREATE TABLE scripts), then the structural information of a "lost" table can be matched to the "seen" tables scanned with two SQL*Plus scripts (fill.sql and getlost.sql).

Steps to follow:

1. Configure DUL for the target database. This means creating a correct init.dul and control.dul.

2. SCAN DATABASE; : scan the database for extents and segments.

3. SCAN TABLES; : scan the found segments for rows.

4. SCAN EXTENTS; : scan the found extents.

5. Identify the lost tables from the output of step 3.

6. UNLOAD the identified tables.

 

  • DUL will not find “last” columns that only contain NULL’s

Trailing NULL columns are not stored in the database

  • Tables that have been dropped can be seen

When a table is dropped, the description is removed from the data dictionary only

  • Tables without rows will go unnoticed

 

DUL startup steps

During startup DUL goes through the following steps:

  • the parameter file “init.dul” is processed
  • the DUL control file (default “control.dul”) is scanned
  • try to load dumps of the USER$, OBJ$, TAB$ and COL$ if available into DUL’s data dictionary cache
  • try to load seg.dat and col.dat.
  • accept DDL-statements or run the DDL script specified as first argument

 

Availability

Available on most common platforms

  • Solaris
  • AIX
  • Windows
  • Linux

 

DuL with Dictionary

  • Configure init.dul and control.dul
  • Load DuL
  • Bootstrap
  • Unload database, user, table
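
A minimal dictionary-mode session might look like the following; the schema and table names are only examples:

$ dul
DUL> bootstrap;
DUL> unload database;
DUL> unload user scott;
DUL> unload table scott.emp;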

DuL without Dictionary

  • Configure init.dul and control.dul (control will include

the datafiles needing to be recovered only).

  • Load DuL
  • alter session set use_scanned_extent_map = true
  • scan database
  • scan tables
  • Using the found table definitions, construct an unload statement:

 

 

unload table dul2.emp (EMPLOYEE_ID number(22), FIRST_NAME varchar2(20),
LAST_NAME varchar2(25),
EMAIL varchar2(25),PHONE_NUMBER varchar2(20), HIRE_DATE date, JOB_ID varchar2 (10),
SALARY number(22), COMMISSION_PCT number(22),MANAGER_ID number(22),
DEPARTMENT_ID number(22))
storage (dataobjno 28200);

11g new feature: RECOVER CORRUPTION LIST

The 11g RMAN syntax RECOVER CORRUPTION LIST is a new feature designed to simplify the repair of corrupt data blocks. In 11g you no longer have to name the datafiles one by one when recovering corrupt blocks: as long as the corrupt blocks are recorded in the V$DATABASE_BLOCK_CORRUPTION view, the CORRUPTION LIST syntax attempts to repair all of them.

Below we use the RECOVER ... CLEAR command to manually corrupt an individual data block, and then use RECOVER CORRUPTION LIST; to repair it:

 

RMAN> recover datafile 8 block 100 clear;

Starting recover at 25-NOV-09
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=16 device type=DISK
Finished recover at 25-NOV-09

RMAN> 

RMAN> validate datafile 8 block 100;

Starting validate at 25-NOV-09
using channel ORA_DISK_1
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00008 name=+DATA/prodb/datafile/test.262.794687963
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
8    FAILED 0              0            1               0         
  File Name: +DATA/prodb/datafile/test.262.794687963
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0               
  Index      0              0               
  Other      1              1               

validate found one or more corrupt blocks
See trace file /s01/diag/rdbms/prodb/PRODB/trace/PRODB_ora_16689.trc for details
Finished validate at 25-NOV-09

Corrupt block relative dba: 0x02000064 (file 8, block 100)
Bad check value found during validation
Data in bad block:
 type: 30 format: 2 rdba: 0x02000064
 last change scn: 0x0000.00185030 seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x50301e01
 check value in block header: 0xcdb0
 computed block checksum: 0x4db5
ksfdrfms:Mirror Read file=+DATA/prodb/datafile/test.262.794687963 fob=0xcdc3a6a0 bufp=0x7f81c5726000 blkno=100 nbytes=8192
ksfdrfms: Read success from mirror side=1 logical extent number=0 disk=DATA_0003 path=/dev/asm-disk7
Mirror I/O done from ASM disk /dev/asm-disk7
Trying mirror side DATA_0003.
Reread of blocknum=100, file=+DATA/prodb/datafile/test.262.794687963. found same corrupt data
ksfdrnms:Mirror Read file=+DATA/prodb/datafile/test.262.794687963 fob=0xcdc3a6a0 bufp=0x7f81c5726000 nbytes=8192
ksfdrnms: Read success from mirror side=2 logical extent number=1 disk=DATA_0002 path=/dev/asm-disk6
Mirror I/O done from ASM disk /dev/asm-disk6
Trying mirror side DATA_0002.
Reread of blocknum=100, file=+DATA/prodb/datafile/test.262.794687963. found same corrupt data
ksfdrnms:Mirror Read file=+DATA/prodb/datafile/test.262.794687963 fob=0xcdc3a6a0 bufp=0x7f81c5726000 nbytes=8192
ksfdrfms:Mirror Read file=+DATA/prodb/datafile/test.262.794687963 fob=0xcdc3a6a0 bufp=0x7f81c5726000 blkno=100 nbytes=8192
ksfdrfms: Read success from mirror side=1 logical extent number=0 disk=DATA_0003 path=/dev/asm-disk7
Mirror I/O done from ASM disk /dev/asm-disk7

SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
         8        100          1            1593392 CHECKSUM

RMAN> recover corruption list;

Starting recover at 25-NOV-09
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=26 device type=DISK
searching flashback logs for block images
finished flashback log search, restored 0 blocks

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 11/25/2009 20:17:29
RMAN-06026: some targets not found - aborting restore
RMAN-06023: no backup or copy of datafile 8 found to restore
RMAN-06023: no backup or copy of datafile 8 found to restore

[Oracle Data Recovery] Resolving an ORA-01578 error: a case study

ORA-01578 is a common Oracle physical block corruption error. Since 10g, given a complete backup and the archived logs, the corrupt block can be repaired online with the BLOCKRECOVER/RECOVER command, provided the disk area holding the block is still physically usable.

The following shows how an ORA-01578 error was resolved without an adequate backup, on the premise that the loss of the data in the corrupt block can be tolerated:

If you cannot resolve the problem yourself, you can also contact the MACLEAN professional database recovery team.

 

SQL> exec  DBMS_STATS.GATHER_DATABASE_STATS;
BEGIN DBMS_STATS.GATHER_DATABASE_STATS; END;

*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 870212)
ORA-01110: data file 4:
'/s01/oradata/G10R25/datafile/o1_mf_users_7ch7d4nx_.dbf'
ORA-06512: at "SYS.DBMS_STATS", line 15188
ORA-06512: at "SYS.DBMS_STATS", line 15530
ORA-06512: at "SYS.DBMS_STATS", line 15674
ORA-06512: at "SYS.DBMS_STATS", line 15638
ORA-06512: at line 1

 

Use the RMAN BLOCKRECOVER command to try to repair the physical corrupt block:

 

RMAN> blockrecover datafile 4 block 870212;

Starting blockrecover at 08-NOV-12

channel ORA_DISK_1: restored block(s) from backup piece 1
piece handle=/s01/flash_recovery_area/G10R25/backupset/2012_08_06/o1_mf_nnndf_TAG20120806T075500_81zd4njn_.bkp tag=TAG20120806T075500
channel ORA_DISK_1: block restore complete, elapsed time: 00:01:16

starting media recovery

archive log thread 1 sequence 467 is already on disk as file /s01/flash_recovery_area/G10R25/archivelog/2012_10_31/o1_mf_1_467_893571cm_.arc
archive log thread 1 sequence 468 is already on disk as file /s01/flash_recovery_area/G10R25/archivelog/2012_10_31/o1_mf_1_468_893pc84l_.arc
archive log thread 1 sequence 469 is already on disk as file /s01/flash_recovery_area/G10R25/archivelog/2012_11_01/o1_mf_1_469_894zsbym_.arc
archive log thread 1 sequence 470 is already on disk as file /s01/flash_recovery_area/G10R25/archivelog/2012_11_01/o1_mf_1_470_896b944y_.arc
4_.arc
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of blockrecover command at 11/08/2012 06:19:40
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of log thread 1 seq 466 lowscn 27762151 found to restore
RMAN-06025: no backup of log thread 1 seq 465 lowscn 27762145 found to restore
RMAN-06025: no backup of log thread 1 seq 464 lowscn 27762142 found to restore

 

Because the required archived logs are missing, BLOCKRECOVER cannot succeed, so another approach is needed.

 

 

First, determine which segment the data block belongs to. If it is an INDEX, it can simply be rebuilt without losing any data; if it is table data, however, you have to accept losing the data contained in the corrupt block:

SQL> col tablespace_name for a20 
SQL> col segment_type for a10
SQL> col segment_name for a20
SQL> col owner for a8
SQL> SELECT tablespace_name, segment_type, owner, segment_name
  2  FROM dba_extents
  3  WHERE file_id = &fileid
  4  and &blockid between block_id AND block_id + blocks - 1;
Enter value for fileid: 4
old   3: WHERE file_id = &fileid
new   3: WHERE file_id = 4
Enter value for blockid: 870212
old   4: and &blockid between block_id AND block_id + blocks - 1
new   4: and 870212 between block_id AND block_id + blocks - 1

TABLESPACE_NAME      SEGMENT_TY OWNER    SEGMENT_NAME
-------------------- ---------- -------- --------------------
USERS                TABLE      SYS      CORRUPT_ME

SQL> select count(*) from CORRUPT_ME;
select count(*) from CORRUPT_ME
                     *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 870212)
ORA-01110: data file 4:
'/s01/oradata/G10R25/datafile/o1_mf_users_7ch7d4nx_.dbf'

SQL> analyze table corrupt_me validate structure;
analyze table corrupt_me validate structure
*
ERROR at line 1:
ORA-01498: block check failure - see trace file

SQL> oradebug setmypid
Statement processed.
SQL> oradebug tracefile_name
/s01/admin/G10R25/udump/g10r25_ora_19749.trc

Corrupt block relative dba: 0x010d4744 (file 4, block 870212)
Bad header found during buffer read
Data in bad block:
 type: 6 format: 2 rdba: 0x000d4744
 last change scn: 0x0000.00000000 seq: 0xff flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x000006ff
 check value in block header: 0x6323
 computed block checksum: 0x0
Reread of rdba: 0x010d4744 (file 4, block 870212) found same corrupted data
*** 2012-11-08 06:23:12.564
table scan: segment: file# 4 block# 870211
            skipping corrupt block file# 4 block# 870212
*** 2012-11-08 06:23:36.955
table scan: segment: file# 4 block# 870211
            skipping corrupt block file# 4 block# 870212
skipping corrupted block at rdba: 0x010d4744

 

 

 

Next, use event 10231 at level 10 to suppress the ORA-01578 error and copy the table that contains the corrupt block:

 

SQL> alter session set events '10231 trace name context forever,level 10';

Session altered.

SQL>  select count(*) from CORRUPT_ME;

  COUNT(*)
----------
     50857

SQL> create table corrupt_me_copy tablespace users as select * from  CORRUPT_ME;

Table created.

SQL> analyze table corrupt_me_copy validate  structure;

Table analyzed.

 

 

After that, you only need to rename the new table to the old table's name and rebuild the indexes:

 

 

SQL>  alter table corrupt_me rename to corrupt_me_copy1;

Table altered.

SQL> alter table corrupt_me_copy rename to corrupt_me;

Table altered.

Finally, rebuild any indexes that existed on the original table, for example:
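
A sketch; the index name below is purely illustrative, use the actual index names from your schema:

SQL> alter index corrupt_me_idx rebuild;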

 

 

 

 

Oracle: Managing Backup and Recovery in RAC

After completing this lesson, you should be able to:

  • Configure a RAC database to use ARCHIVELOG mode and a fast recovery area
  • Configure RMAN for a RAC environment

RAC backup and recovery is almost identical to backup and recovery for any other Oracle database, because what you back up and restore is a single database. The main difference is that in RAC you have to deal with multiple redo log threads.

 

Protecting against media failure

Although RAC provides several ways to avoid or reduce downtime caused by the failure of one or more (but not all) instances, the database itself still needs to be protected, because it is shared by all instances. This means you need to think about a disk backup and recovery strategy for the cluster database just as you would for a non-clustered database.

To minimize the potential data loss caused by a disk failure, you may want to use disk mirroring technology (available from your server or disk vendor). As with a non-clustered database, you can use multiple mirrors where the vendor allows it, to help reduce potential data loss and to provide alternative backup strategies. For example, if your database is in ARCHIVELOG mode and you have three disk copies, you can detach one mirror copy and perform the backup from it while the remaining two mirror copies continue to protect live disk activity. To do this correctly, you must first put the tablespaces into backup mode and then, if the cluster or disk vendor requires it, temporarily suspend disk operations by issuing the ALTER SYSTEM SUSPEND command. Once this is complete, you can break the mirror and then resume normal operation by executing the ALTER SYSTEM RESUME command and taking the tablespaces out of backup mode.
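
A sketch of the sequence just described; the tablespace name is illustrative, and the SUSPEND/RESUME step is only needed if the cluster or storage vendor requires it:

SQL> alter tablespace users begin backup;
SQL> alter system suspend;
-- split / break the mirror at the storage level here
SQL> alter system resume;
SQL> alter tablespace users end backup;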

 

 

Archived log file configuration

During backup and recovery operations that need archived log files, the Oracle server determines the file destinations and names from the controlfile. If you use RMAN, the archived log path names can also be stored in an optional recovery catalog. However, the archived log path names do not include the node name, so RMAN expects to find those files on the node where the channel is allocated.

If you use a cluster file system, all instances can write to the same archived log destination. This is called the cluster file system scheme. Backing up and restoring the archived logs is easy because all the logs reside in the same directory.

If a cluster file system is not available, Oracle recommends creating a local archived log destination for each instance, with NFS read mount points to all the other instances. This is called the local archiving with network file system (NFS) scheme. During a backup you can either back up the archived logs from each host, or pick one host to back up all of the archived logs. During recovery, one instance can access the logs from any host without having to copy them to a local destination first.
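
A sketch of the local archiving with NFS scheme; the destinations and instance names are illustrative, and each /arc_dest_n directory would also be NFS-mounted on the other nodes:

SQL> alter system set log_archive_dest_1='LOCATION=/arc_dest_1' sid='RACDB1';
SQL> alter system set log_archive_dest_1='LOCATION=/arc_dest_2' sid='RACDB2';
SQL> alter system set log_archive_dest_1='LOCATION=/arc_dest_3' sid='RACDB3';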

Whichever scheme you use, you may want to provide an additional archive destination to avoid a single point of failure.

 

RAC and the fast recovery area

To use a fast recovery area in RAC, it must be placed on an ASM disk group, on a cluster file system, or on a shared directory configured through certified NFS for every RAC instance. In other words, the fast recovery area must be shared among all instances of the RAC database.

 

Performing RAC backup and recovery with EM

You can access the cluster database backup and recovery related tasks by clicking the "Availability" tab on the Cluster Database home page. From the Availability page you can use RMAN to perform a range of backup and recovery operations, such as scheduling backups, performing recovery as needed, and configuring backup and recovery settings. There are also links related to Oracle Secure Backup and to service management.

 

Configuring RAC recovery settings with EM

You can use Oracle Enterprise Manager to configure important recovery settings for the cluster database: on the Database home page, click the "Availability" tab and then the "Recovery Settings" link. From there you can make sure the database is in archivelog mode and configure the fast recovery settings.

For a RAC database, if the "Archive Log Destination" setting is not identical for all instances, the field is displayed blank, with a message indicating that the instances have different settings for this field. In that case, entering a location in this field sets the archived log location for all instances of the database. You can use the "Initialization Parameters" page to assign instance-specific values for the archived log destination.

Note: You can run the ALTER DATABASE SQL statement to change the archiving mode in RAC as long as the database is mounted by the local instance but not open in any instance. You do not need to modify parameter settings to run this statement. To configure a fast recovery area in a RAC environment, set the initialization parameters DB_RECOVERY_FILE_DEST and DB_RECOVERY_FILE_DEST_SIZE to the same values on all instances.
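
A sketch of those settings; the disk group name and size are illustrative:

SQL> alter system set db_recovery_file_dest_size=10G scope=both sid='*';
SQL> alter system set db_recovery_file_dest='+FRA' scope=both sid='*';
-- then, with the database mounted by the local instance and not open anywhere:
SQL> alter database archivelog;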

 

Archived redo log naming conventions in RAC

Parameter  Description                              Example
%r         Resetlogs identifier                     log_1_62_23452345
%R         Padded resetlogs identifier              log_1_62_0023452345
%s         Log sequence number, not padded          log_251
%S         Log sequence number, left-zero-padded    log_0000000251
%t         Thread number, not padded                log_1
%T         Thread number, left-zero-padded          log_0001

 

For any archived redo log configuration, use the LOG_ARCHIVE_FORMAT parameter to uniquely identify the archived redo logs. The format of this parameter is operating-system specific and can contain text strings, one or more variables, and a filename extension.

The thread parameters, in either upper or lower case, are mandatory for RAC. This enables Oracle to create unique names for the archived logs across the incarnation. This requirement is in effect when the COMPATIBLE parameter is set to 10.0 or higher. Use the %R or %r parameter to include the resetlogs identifier, to avoid overwriting logs from a previous incarnation. If you do not specify a log format, the default is operating-system specific and includes %t, %s and %r.

For example, if the instance associated with redo thread number 1 sets LOG_ARCHIVE_FORMAT to log_%t_%s_%r.arc, its archived redo log files are named:

log_1_1000_23435343.arc

log_1_1001_23452345.arc

log_1_1002_23452345.arc

 

Configuring RAC backup settings with EM

You can configure persistent backup settings with Oracle Enterprise Manager: on the Database Control home page, click the "Availability" tab and then the "Backup Settings" link. You can configure disk settings such as the directory location for disk backups and the degree of parallelism. You can also choose the default backup type:

  • Backup set
  • Compressed backup set
  • Image copy

You can also specify important tape-related settings, such as the number of available tape drives and vendor-specific media management parameters.

 

Oracle Recovery Manager

RMAN provides the following benefits for Real Application Clusters:

  • It can read cluster files or raw partitions with no configuration changes
  • It can access multiple archived log destinations

Oracle Recovery Manager (RMAN) can use stored scripts, interactive scripts, or an interactive GUI front end. When using RMAN with a RAC database, stored scripts let you start the backup and restore processes from the most appropriate node.

If you use a different Oracle home location for the RAC instances on each node, create the snapshot controlfile in a location that exists on all nodes. Only the node on which RMAN performs the backup needs the snapshot controlfile; it does not have to be globally available to all instances in the RAC environment.

Besides using a local directory that exists on every node of the cluster, you can also use a cluster file or a shared raw device. Here is an example:

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/oracle/db_files/snaps/snap_prod1.cf';

For recovery, you must make sure that each node performing recovery can access the archived log files of all instances, either through one of the archiving schemes discussed earlier, or by copying the archived logs from other locations so that they are available to the recovering instance.

 

Configuring the RMAN snapshot controlfile location

  • The snapshot controlfile path must be valid on every node from which an RMAN backup might be started.
  • Configure the snapshot controlfile location in RMAN.

Determine the current location:

RMAN> SHOW SNAPSHOT CONTROLFILE NAME;

/u01/app/oracle/product/11.1.0/dbs/scf/snap_prod.cf

 

If you prefer, you can use ASM, a shared file system location, or a shared block device:

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+FRA/SNAP/snap_prod.cf';

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/ocfs/oradata/dbs/scf/snap_prod.cf';

RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/dev/sdj2';

The snapshot controlfile is a temporary file created by RMAN to resynchronize from a read-consistent version of the controlfile. RMAN needs a snapshot controlfile only when resynchronizing with the recovery catalog or when making a backup of the current controlfile.

In a RAC database, the snapshot controlfile is created on the node where the backup is generated. You need to configure a default path and file name for these snapshot controlfiles that is valid on every node from which an RMAN backup might be started.

You can run the following RMAN command to determine the configured location of the snapshot controlfile:

SHOW SNAPSHOT CONTROLFILE NAME

You can change the configured location of the snapshot controlfile. For example, on UNIX-based systems you can specify the snapshot controlfile location as snap_prod.cf in the ASM disk group +FRA by entering the following command at the RMAN prompt:

CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+FRA/SNAP/snap_prod.cf'

This command globally sets the configured location of the snapshot controlfile for the entire cluster database.

Note: The CONFIGURE command creates persistent settings across RMAN sessions.

 

Configuring controlfile and SPFILE autobackup

  • After a BACKUP or COPY command, RMAN automatically creates a controlfile and SPFILE backup:

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;

  • To change the default location:

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '+FRA';

  • The location must be available to all nodes of the RAC database.

If you set CONFIGURE CONTROLFILE AUTOBACKUP to ON, RMAN automatically creates a controlfile and SPFILE backup after you run a BACKUP or COPY command. RMAN can also automatically restore the SPFILE if it is needed to start the instance for recovery. This means the default location of the SPFILE must be available to all nodes of the RAC database.

These features are important in disaster recovery, because RMAN can restore the controlfile even without a recovery catalog. RMAN can restore an autobackup of the controlfile even after the loss of both the recovery catalog and the current controlfile.

You can change the default name that RMAN gives to this file with the CONFIGURE CONTROLFILE AUTOBACKUP FORMAT command. If you specify an absolute path name in this command, the path must exist identically on all nodes that participate in backups.

Note: RMAN performs the controlfile autobackup on the first allocated channel. If you allocate multiple channels with different parameters, especially when you allocate channels with the CONNECT command, determine which channel will perform the autobackup. Always allocate the channel for the connected node first.

 

Crosschecking on multiple RAC cluster nodes

When crosschecking on multiple nodes, make sure every node of the cluster can access all of the backups.

  • Channels can then be allocated on any node of the cluster during restore or crosscheck operations.
  • Otherwise you must allocate channels on multiple nodes by providing the CONNECT option to the CONFIGURE CHANNEL command.
  • Backups that cannot be accessed, because no channel configured on the node can reach them, are marked EXPIRED.

When crosschecking on multiple RAC nodes, configure the cluster so that every node can access all the backups, regardless of which node created them. When the cluster is configured this way, channels can be allocated on any node of the cluster during restore or crosscheck operations.

If you cannot configure the cluster so that each node can access all backups, then during restore and crosscheck operations you must allocate channels on multiple nodes by providing the CONNECT option to the CONFIGURE CHANNEL command, so that every backup can be accessed by at least one node. If some backups are not accessible during a crosscheck because no channel configured on a node can access them, those backups are marked EXPIRED in the RMAN repository after the crosscheck.

For example, you can use CONFIGURE CHANNEL ... CONNECT in an Oracle RAC configuration in which tape backups are created on several nodes of the cluster and each backup is accessible only on the node where it was created.

 

Channel connections to cluster instances

  • During a backup, each allocated channel can connect to a different instance in the cluster.
  • The instances to which the channels connect must be either all mounted or all open.

CONFIGURE DEFAULT DEVICE TYPE TO sbt;

CONFIGURE DEVICE TYPE sbt PARALLELISM 3;

CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT='sys/rac@RACDB1';

CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT='sys/rac@RACDB2';

CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT='sys/rac@RACDB3';

CONFIGURE DEFAULT DEVICE TYPE TO sbt;

CONFIGURE DEVICE TYPE sbt PARALLELISM 3;

CONFIGURE CHANNEL DEVICE TYPE sbt CONNECT='sys/rac@BR';

 

When backups are produced in parallel, RMAN channels can connect to different instances in the cluster. The examples in this slide show two possible configurations:

  • If you want to dedicate channels to specific instances, you can control on which instance each channel is allocated by using a separate connect string for each channel configuration, as shown in the first example.
  • If you have defined a special service for your backup and restore jobs, you can use the second example in this slide. If this service is configured with load balancing enabled, the channels are allocated on nodes determined by the load-balancing algorithm.

During a backup, the instances to which the channels connect must be either all mounted or all open. For example, if the RACDB1 instance has the database mounted while the RACDB2 and RACDB3 instances have the database open, the backup fails.

Note: In some cluster database configurations, some nodes of the cluster can access certain datafiles faster than other datafiles. RMAN detects this automatically; this is called node affinity awareness. When deciding which channel to use to back up a particular datafile, RMAN gives preference to the node that can access that datafile faster. For example, if you have a three-node cluster and node 1 can read/write datafiles 7, 8 and 9 faster than the other nodes, then node 1 has greater node affinity to those files than nodes 2 and 3, and RMAN automatically takes advantage of this.

 

RMAN channel support for the grid

  • RAC allows the use of nondeterministic connect strings.
  • It simplifies the use of RMAN parallelism in a RAC environment.
  • It takes advantage of the load-balancing characteristics of the grid environment.

Channels connect to the least-loaded RAC instances.

 

CONFIGURE DEFAULT DEVICE TYPE TO sbt;

CONFIGURE DEVICE TYPE sbt PARALLELISM 3;

 

In Oracle Database 10g, RAC allows the use of nondeterministic connect strings, which can connect to different instances based on RAC features such as load balancing. Therefore, to support RAC, the RMAN polling mechanism no longer depends on deterministic connect strings, and you can use RMAN through connect strings that are not bound to a specific instance in the grid environment. Previously, if you wanted to use RMAN parallelism and spread a job across multiple instances, you had to manually allocate an RMAN channel for each instance. In Oracle Database 10g, to use dynamic channel allocation you no longer need separate CONFIGURE CHANNEL CONNECT statements; you simply define the degree of parallelism with a command such as CONFIGURE DEVICE TYPE disk PARALLELISM and then run your backup or restore commands. RMAN then automatically connects to different instances and completes the job in parallel. The grid environment selects the instances RMAN connects to based on load balancing. As a result, configuring RMAN parallelism in a RAC environment becomes as easy as setting it up elsewhere. By configuring parallelism when backing up or recovering a RAC database, RMAN channels are dynamically allocated across all RAC instances.

Note: RMAN has no control over instance selection. If you need a reliable connection to a particular instance, provide a connect string that can only connect to the desired instance.

 

RMAN default autolocation

  • Recovery Manager automatically autolocates the following files:

    Backup pieces

    Archived redo logs during backup

    Datafile or controlfile copies

  • If local archiving is used, a node can only read the archived logs that were generated on that node.
  • During a restore, a channel connected to a specific node restores only the files that were backed up to that node.

Recovery Manager automatically discovers which nodes of a RAC configuration can access the files that you want to back up or restore. Recovery Manager autolocates the following files:

  • Backup pieces during backup or restore
  • Archived redo logs during backup
  • Datafile or controlfile copies during backup or restore

If you use a non-cluster-file-system local archiving scheme, a node can read only those archived redo logs that were generated by an instance on that node. RMAN never attempts to back up archived redo logs on a channel that cannot read them.

During restore operations, RMAN automatically performs the autolocation of backups. A channel connected to a specific node attempts to restore only those files that were backed up to that node. For example, suppose that log sequence 1001 was backed up to the drive attached to node 1, while log 1002 was backed up to the drive attached to node 2. If you then allocate channels connected to each node, the channel connected to node 1 can restore log 1001 (but not 1002), and the channel connected to node 2 can restore log 1002 (but not 1001).

 

Backup distribution

For RAC, there are three possible backup configurations:

  • A dedicated backup server performs and manages backups for the cluster and the cluster database.
  • One node has access to a local backup device and performs and manages backups for the cluster database.
  • Each node has access to a local backup device and can write to its own local backup media.

When configuring the backup options for RAC, there are three possible configurations:

  • Network backup server: A dedicated backup server performs and manages backups for the cluster and the cluster database. None of the nodes has a local backup device.
  • One local drive: One node has access to a local backup device and performs and manages backups for the cluster database. All nodes of the cluster are on a cluster file system so that they can read all the datafiles, archived redo logs and SPFILEs. If backup media is available on only one local drive, a non-cluster-file-system archiving scheme is not recommended.
  • Multiple drives: Each node has access to a local backup device and can write to its own local backup media.

In the cluster file system scheme, any node can access all the datafiles, archived redo logs and SPFILEs. In the non-cluster-file-system scheme, you must write the backup scripts so that the backups are distributed to the appropriate drives and paths on each node. For example, node 1 can back up the archived redo logs whose path names begin with /arc_dest_1, node 2 can back up the archived redo logs whose path names begin with /arc_dest_2, and node 3 can back up the archived redo logs whose path names begin with /arc_dest_3.

 

Single-local-drive CFS backup scheme

RMAN> CONFIGURE DEVICE TYPE sbt PARALLELISM 1;
RMAN> CONFIGURE DEFAULT DEVICE TYPE TO sbt;

RMAN> BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;

In the cluster file system backup scheme, every node of the cluster can read all the datafiles, archived redo logs and SPFILEs. This includes Automatic Storage Management (ASM), cluster file systems and network attached storage (NAS).

In the cluster file system backup scheme, if you back up to only one local drive, it is assumed that only one node of the cluster has a local backup device such as a tape drive. In that case, run the following one-time configuration commands:

CONFIGURE DEVICE TYPE sbt PARALLELISM 1;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;

Because any node performing the backup has read/write access to the archived redo logs written by the other nodes, the backup script for any node is simple:

BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;

In this case, the tape drive receives all datafiles, archived redo logs and SPFILEs.

 

Multiple-drive CFS backup scheme

CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'usr1/pwd1@n1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'usr2/pwd2@n2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'usr3/pwd3@n3';

BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;

 

In the cluster file system backup scheme, if you back up to multiple drives, it is assumed that each node of the cluster has its own local tape drive. Perform the following one-time configuration so that one channel is configured for each node of the cluster. For example, enter the following at the RMAN prompt:

CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'user1/passwd1@node1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'user2/passwd2@node2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'user3/passwd3@node3';

Similarly, you can perform this configuration for the DISK device type. The following backup script, which can be run from any node in the cluster, distributes the datafile, archived redo log and SPFILE backups across the backup drives:

BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;

For example, if the database contains 10 datafiles and there are 100 archived redo logs on disk, the backup drive on node 1 might back up datafiles 1, 3 and 7 and logs 1-33; node 2 might back up datafiles 2, 5 and 10 and logs 34-66; and the backup drive on node 3 might back up datafiles 4, 6, 8 and 9 as well as archived redo logs 67-100.

 

Non-CFS backup scheme

CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
CONFIGURE DEFAULT DEVICE TYPE TO sbt;
CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'usr1/pwd1@n1';
CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'usr2/pwd2@n2';
CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'usr3/pwd3@n3';

 

BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;

In a non-cluster-file-system environment, each node can back up only its own local archived redo logs. For example, node 1 cannot access the archived redo logs on node 2 or node 3 unless a network file system is configured for remote access. If you configure NFS, distribute the backups across multiple drives. However, if NFS is configured for backups, you can back up to only one drive.

In the non-cluster-file-system backup scheme, if you back up to multiple drives, it is assumed that each node of the cluster has its own local tape drive. You can perform a similar one-time configuration, as shown in this slide, so that one channel is configured for each node of the cluster. Similarly, this configuration can be performed for the DISK device type. You can develop a production backup script for the whole database that can be run from any node. Using the BACKUP example, the datafile backups, archived redo logs and SPFILE backups are distributed across the different tape drives. However, channel 1 can read only the logs archived locally in /arc_dest_1, because the autolocation feature restricts channel 1 to backing up only the archived redo logs in the /arc_dest_1 directory. Because node 2 can only read files in the /arc_dest_2 directory, channel 2 can back up only the archived redo logs in /arc_dest_2, and so on. The key point is that all logs are backed up, but they are distributed across different drives.

 

Restore and recovery

  • Media recovery may require one or more archived log files from each thread.
  • The RMAN RECOVER command automatically restores and applies the required archived logs.
  • Archived logs can be restored to any node performing the restore and recovery operation.
  • The logs must be readable by the node performing the restore and recovery activity.
  • During recovery, the recovery process requests the logs of any additional threads that were enabled.
  • The recovery process notifies you of threads that are no longer needed because they were disabled.

Media recovery of a database accessed by RAC may require at least one archived log file for each thread. However, if a thread's online redo logs contain enough recovery information, restoring archived log files for that thread is unnecessary.

If you use RMAN for media recovery and you share the archived log directories, you can change the destination of the automatic restoration of archived logs with the SET clause, to restore the files to a local directory of the node where you begin recovery. If you backed up the archived logs of each node without a central media management system, you must first restore all the log files from the remote nodes and move them to the host from which you will then start recovery with RMAN. If, however, you backed up each node's log files with a central media management system, you can use RMAN's AUTOLOCATE feature. This lets you recover a database using the local tape drives on the remote nodes.
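
A sketch of redirecting the automatic restore of archived logs with the SET clause; the directory is illustrative:

RMAN> run {
        set archivelog destination to '/arc_dest_1';
        recover database;
      }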

If recovery reaches a point in time at which an additional thread was enabled, the recovery process requests the archived log files for that thread. If you are using a backup controlfile, you may need to redirect the recovery process to the online redo log files to complete recovery after all the archived log files have been processed. If recovery reaches a point in time at which a thread was disabled, the recovery process notifies you that the log files for that thread are no longer needed.

In this lesson, you should have learned how to:

  • Configure RAC recovery settings with EM
  • Configure RAC backup settings with EM
  • Enable archiving
  • Configure RMAN
  • Perform RAC backup and recovery with EM

This practice covers the following topics:

  • Configuring a RAC database to use ARCHIVELOG mode and a fast recovery area
  • Configuring RMAN for the RAC environment

 

 

 

Script: Collecting media recovery diagnostic information

The following script can be used to collect media recovery diagnostic information (recovery_info.sql):


--- begin [recovery_info.sql] ---
define spoolfile = &1
spool &spoolfile
alter session set nls_date_format = 'YYYY-MM-DD HH24:MI:SS';
alter session set timed_statistics = true;
alter session set max_dump_file_size = UNLIMITED;
set feedback on
set term on
set wrap on
set trimspool on
set pagesize 1000
set linesize 100
set numwidth 10
select to_char(sysdate) start_time from dual;
column host_name format a20 tru
select instance_name, host_name, version, status, startup_time from v$instance;
set echo on
select * from v$database;
select * from v$controlfile;
select * from v$tablespace;
select * from v$datafile;
select * from v$datafile_header;
select * from v$tempfile;
select * from v$thread;
select * from v$log;
select * from v$logfile;
select * from v$archived_log;
alter session set events 'immediate trace name file_hdrs level 3';
alter session set events 'immediate trace name redohdr level 3';
set echo off
Prompt;
Prompt Output file name is:;
define spoolfile
Prompt;
Prompt ALERT.LOG and TRACE FILES are located in:;
column host_name format a12 tru
column name format a20 tru
column value format a60 tru
select distinct i.host_name, p.name, p.value from v$instance i, v$parameter p
 where p.name like '%_dump_dest'
   and p.name != 'core_dump_dest';
select to_char(sysdate) end_time from dual;
spool off
exit
--- end [recovery_info.sql] ---
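
An example invocation from SQL*Plus (the output file name is arbitrary):

SQL> @recovery_info.sql /tmp/recovery_info.lst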
