Oracle ORA-00600 [4193] ORA-600 [4193] “seq# mismatch while adding undo record”

2013/03/02 by admin Leave a Comment

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

Format: ORA-600 [4193] [a] [b]

VERSIONS:

versions 6.0 to 10.1

DESCRIPTION:
A mismatch has been detected between Redo records and Rollback (Undo) records.
We are validating the Undo block sequence number in the undo block against the Redo block sequence number relating to the change being applied.

This error is reported when this validation fails.
ARGUMENTS:

Arg [a] Undo record seq number
Arg [b] Redo record seq number

FUNCTIONALITY:

KERNEL TRANSACTION UNDO

IMPACT:

PROCESS FAILURE
POSSIBLE ROLLBACK SEGMENT CORRUPTION

This error may indicate a rollback segment corruption.
This may require a recovery from a database backup depending on the situation.

NB Bug Fixed Description
14034244 11.2.0.3.BP09,
12.1.0.0 Lost write type corruption using ASM in 11.2.0.3
8240762
10.2.0.5,
11.1.0.7.10,
11.2.0.1
Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] /
SMON may spin to recover transaction

ORA-600 [4193] [a] [b] [ ] [ ] [ ]
Versions: 7.2.2 – 9.2.0 Source: ktuc.c
===========================================================================
Meaning: seq# mismatch while adding an undo record to an undo block. This
is done by the application of redo.
—————————————————————————
Argument Description:
a. (ktubhseq): undo record seq# – this is the seq# of the block that
this undo record WILL BE APPLIED TO.
This is from the Undo Block. It is
NOT the seq# of the undo block itself.
b. (ktudbseq): redo RECORD seq# – this is the seq# number in the block
that this redo WILL BE APPLIED TO.
This is from the Redo Record.
—————————————————————————
Diagnosis:
This error is raised in kturdb which handles the adding of undo records
by the application of redo.
When we try to apply redo to an undo block (forward changes are made by
the application of redo to a block) we check that the seq# in the undo
record matches the seq# in the redo record. These seq# should be the
same because when we apply a redo record we must apply it to the
correct version of the block. We can only apply a redo record to a
block that contains the same seq# as in the redo record.
If the seq# do not match then this error is raised. This implies some
kind of block corruption in either the redo or the undo block.
7.3.x – 8.1.7.x
ASSERT2(ubh->ktubhseq == db->ktudbseq, OERI(4193), KSESVSGN,
ubh->ktubhseq, db->ktudbseq);
9.2.x
ksesic2(OERI(4193), ksenrg(ubh->ktubhseq), ksenrg(db->ktudbseq));
struct ktubh
{
kxid ktubhxid; /* txid of tx currently using or last used this block */
ub2 ktubhseq; /* undo block sequence number */
ub1 ktubhcnt; /* high water mark record index, number of undo entries */

ub1 ktubhirb; /* rollback record index, rec index to start the rollback */
ub1 ktubhicl; /* collecting record index, rec index to start retrieving col info */
ub1 ktubhflg; /* dummy */
ub2 ktubhidx[1]; /* byte offset of record in block, grows at runtime */
};
struct ktudb Kernel Transaction Undo Data operation Block (redo)
{
ub2 ktudbsiz; /* size of entry */
ub2 ktudbspc; /* verification: space left in undo block */
ub2 ktudbflg; /* flag to indicate the kind of redo operation */
kxid ktudbxid; /* current tx id */
ub2 ktudbseq; /* block sequence number */
ub1 ktudbrec; /* new record index for this change */
};
Note 452620.1 has a procedure to patch this inconsistency when the problem
is produced in the SYSTEM rollback segment
Articles:
None
—————————————————————————
Known Bugs: (Those bugs that are fixed after version 7.0.12.0.0)
(Bugs must be closed or hold useful information)
Fixed In. Bug No. Description
———+————+—————————————————-
7.X Bug:XXXXXX Desc
ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 ora-600
ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 ora-600
4193 4193 4193 4193 4193 4193 4193 4193 4193 4193
4193 4193 4193 4193 4193 4193 4193 4193 4193 4193

Filed Under: Oracle, Oracle数据恢复、数据库恢复、灾难恢复

Oracle ORA-600 [4194] “Undo Record Number Mismatch While Adding Undo Record”

2013/03/02 by admin Leave a Comment

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

ERROR:

Format: ORA-600 [4194] [a] [b]

VERSIONS:

versions 6.0 to 10.1

DESCRIPTION:
A mismatch has been detected between Redo records and rollback (Undo) records.

We are validating the Undo record number relating to the change being
applied against the maximum undo record number recorded in the undo block.
This error is reported when the validation fails.

ARGUMENTS:
Arg [a] Maximum Undo record number in Undo block
Arg [b] Undo record number from Redo block

FUNCTIONALITY:
Kernel Transaction Undo called from Cache layer

IMPACT:
PROCESS FAILURE
POSSIBLE ROLLBACK SEGMENT CORRUPTION

NB Bug Fixed Description
8240762
10.2.0.5,
11.1.0.7.10,
11.2.0.1
Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] /
SMON may spin to recover transaction
3210520 9.2.0.5, 10.1.0.2 OERI[kjccqmg:esm] / OERI[4194] / corruption possible in RAC
+ 792610 8.0.6.0, 8.1.6.0 Rollback segment corruption OERI:4194 can occur if block checking detects a
corrupt block

Historic information:
7.3.3 to 8.1.5
==============
Note:69863.1 ALERT: Apparent data corruptions involving Solaris 2.6,
ISM & DR on Starfire
Check USE_ISM parameter on SUN Solaris E10000 Platforms.
ORA-600 [4194] [a] [b]
Versions: 6.0 – 9.2 Source: ktuc.c
===========================================================================
Meaning:
Undo record number mismatch while adding an undo record to an undo
block. This is done by the application of redo.
—————————————————————————
Argument Description:
a. (ktubhcnt): undo record count – This is the maximum number of undo
records that have ever existed
within this Undo Block. In other
words, it is the High Water Mark for
undo records in that undo block.
This is from the Undo Block.
b. (ktudbrec): redo record number – This is the record number for the
new undo record that is to be added
to the undo block. It should be
one greater than the maximum in the
undo block currently. This is from
the Redo Record.
—————————————————————————
Diagnosis:

This error is raised in kturdb which handles the adding of undo records
by the application of redo.
When we try to apply redo to an undo block (forward changes are made by
the application of redo to a block), we check that the number of undo
records in the undo block +1 matches the record number in the redo
record. Because we are adding a new undo record, we know that the record
number in that undo block must be one greater than the maximum number in
that block.
So for UBA=0x08000592.00a0.0b
0x08000592 is the dba of the undo block.
0x00a0 is the seq# number that is in the block that THIS UNDO IS TO
BE APPLIED TO.
0x0b is the number of undo records in the undo block.
In the header this looks like:
UNDO BLK::
xid: 0x0004.00e.0000017f seq: 0x00a0 cnt: 0x0b ……..
Since we are adding a new undo record to our undo block, we would expect
that the new record number is equal to the maximum record number in the
undo block +1. If this is not the case, we get ORA 600 [4194].
This implies some kind of block corruption in either the redo or the
undo block. Look for other errors that would imply that a block is
corrupted.
Note: If the ORA-4194 follows another ORA-600 AND IF AND ONLY IF
the arguments [a] and [b] are the same, then this MAY be due
to Bug:792610 which can cause undo corruption following a
failed block change.
Note:452620.1 has a procedure to patch this inconsistency when the problem
is produced in the SYSTEM rollback segment
—————————————————————————
Known Bugs: (Those bugs that are fixed after version 7.0.12.0.0.
Bugs must be closed or hold useful information.)
Fixed In. Bug No. Description
———+————+—————————————————-
8.0.6/8.1.6 Bug:792610 ORA-600 during redo application to a block may
in turn cause an OERI:4194 on the undo block.
E.g., block checking noticing a corrupt index
block during a multi-row insert.
7.1.5 Bug:239671 Truncate (could possibly happen on other
operations too) on 16k+ block size can cause
the maximum number of undo records in a block
(255) to be exceeded.

Workarounds: Use < 16K blocksize, or avoid
using the TRUNCATE command with the DROP
STORAGE option (which is the default).
ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 ora-600
ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 ora-600
4194 4194 4194 4194 4194 4194 4194 4194 4194 4194
4194 4194 4194 4194 4194 4194 4194 4194 4194 4194

Filed Under: Oracle, Oracle数据恢复、数据库恢复、灾难恢复

Oracle Step by step to resolve ORA-600 4194 4193 4197 on database crash

2013/03/02 by admin Leave a Comment

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

APPLIES TO:
Oracle Server – Enterprise Edition – Version 9.2.0.1 to 11.2.0.3 [Release 9.2 to 11.2]
Information in this document applies to any platform.
SYMPTOMS
The following error is occurring in the alert.log right before the database crashes.
ORA-00600: internal error code, arguments: [4194], [#], [#], [], [], [], [], []
This error indicates that a mismatch has been detected between redo records and rollback (undo) records.
ARGUMENTS:
Arg [a] – Maximum Undo record number in Undo block
Arg [b] – Undo record number from Redo block
Since we are adding a new undo record to our undo block, we would expect that the new record number is equal to the maximum record number in the undo block plus one. Before
Oracle can add a new undo record to the undo block it validates that this is correct. If this validation fails, then an ORA-600 [4194] will be triggered.
CHANGES
This issue generally occurs when there is a power outage or hardware failure that initially crashes the database. On startup, the database does the normal roll forward (redo) and then
rollback (undo), this is where the error is generated on the rollback.
CAUSE
This also can be cause by the following defect
Bug 8240762 Abstract: Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] after SHRINK
Details:
Undo corruption may be caused after a shrink and the same undo block may be used
for two different transactions causing several internal errors like:
ORA-600 [4193] / ORA-600 [4194] for new transactions
ORA-600 [4137] for a transaction rollback
SOLUTION
Best practice to create a new undo tablespace.
This method includes segment check.
Create pfile from spfile to edit
>create pfile from spfile;
1. Shutdown the instance
2. set the following parameters in the pfile
undo_management = manual
event = ‘10513 trace name context forever, level 2’
3. >startup restrict pfile=<initsid.ora>
4. >select tablespace_name, status, segment_name from dba_rollback_segs where status != ‘OFFLINE’;
This is critical – we are looking for all undo segments to be offline – System will always be online.
If any are ‘PARTLY AVAILABLE’ or ‘NEEDS RECOVERY’ – Please open an issue with Oracle Support or update the current SR. There are many options from this moment and Oracle
Support Analyst can offer different solutions for the bad undo segments.
If all offline then continue to the next step
5. Create new undo tablespace – example
>create undo tablespace <new undo tablespace> datafile <datafile> size 2000M;
6. Drop old undo tablespace
>drop tablespace <old undo tablespace> including contents and datafiles;
7. >shutdown immediate;
8 >startup mount;
9 modify the pfile with the new undo tablespace name
>alter system set undo_tablespace = ‘<new tablespace>’ scope=pfile;
10. >shutdown immediate;
11. >startup;
Startup using the normal spfile

================
The reason we create a new undo tablespace first is to use new undo segment numbers that are higher then the current segments being used. This way when a transaction goes to do
block clean-out the reference to that undo segment does not exist and continues with the block clean-out.

Filed Under: Oracle, Oracle数据恢复、数据库恢复、灾难恢复

Oracle ORA-600[4000] ORA-00600[4000]

2013/03/02 by admin Leave a Comment

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

Applies to:
Oracle Server – Enterprise Edition – Version: 8.1.7.4 to 11.1.0.7
Information in this document applies to any platform.

Purpose

Symptoms

Database fails to start because of ora-600[4000].

Alert.log will show:

Errors in file /oracle/admin/sdwh/udump/sdwh_ora_13186.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [1], [], [], [], [], [], []
Tue Sep 9 14:48:04 2008
Error 704 happened during db open, shutting down database
sdwh_ora_13186.trc shows:
*** 2008-09-09 15:33:26.194
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [4000], [1], [], [], [], [], [], []
Current SQL statement for this session:
select ctime, mtime, stime from obj$ where obj# = :1
..
..
row cache parent object: address=0xc9efb27c cid=3(dc_rollback_segments)
hash=35e74caf typ=5 transaction=(nil) flags=00000001
own=0xc9efb2f0[0xc7c83ba0,0xc7c83ba0] wat=0xc9efb2f8[0xc9efb2f8,0xc9efb2f8] mode=S
status=EMPTY/-/-/-/-/-/-/-/-
data=
00000001 ….
BH (0x0x6ffff4ac) file#: 1 rdba: 0x0040007a (1/122) class 1 ba: 0x0x6ff8a000
set: 17 dbwrid: 0 obj: 18 objn: 18
hash: [74ffdc70,c85d94cc] lru: [6ffffad4,c771aabc]
ckptq: [NULL] fileq: [NULL]
use: [c84043f0,c84043f0] wait: [NULL]
st: XCURRENT md: SHR rsop: 0x(nil) tch: 0

LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [255] RRBA: [0x0.0.0]
Using State Objects
—————————————-
SO: 0xc84043d0, type: 24, owner: 0xc722382c, flag: INIT/-/-/0x00
(buffer) (CR) PR: 0x0xc71d1440 FLG: 0x500400
lock rls: 0x(nil), class bit: 0x(nil)
kcbbfbp: [BH: 0x0x6ffff4ac, LINK: 0x0xc84043f0]
where: kdswh02: kdsgrp, why: 0
buffer tsn: 0 rdba: 0x0040007a (1/122)
scn: 0x0000.15ad85b0 seq: 0x01 flg: 0x06 tail: 0x85b00601
frmt: 0x02 chkval: 0xabfc type: 0x06=trans data
Block header dump: 0x0040007a
Object id on Block? Y
seg/obj: 0x12 csc: 0x00.15ad85ad itc: 1 flg: – typ: 1 – DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0001.027.000056dc 0x0080d065.16f2.14 –U- 1 fsc 0x0000.15ad85b0
Trace file shows _SYSSMU1$ has a TX against obj$, and the scn ofthe block touched by this TX is scn:
0x0000.15ad85b0 –> 363693488 decimal.
The ora-600[4000] could be raised at startup if the above scn is ahead of the database SCN.
Last Review Date
October 3, 2008
Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are
included in the document to assist in troubleshooting.
Troubleshooting Details
1) Find database SCN
SQL> startup mount
SQL> select checkpoint_change# from v$database;
2) SQL> select ceil(&decimal_scn_expected/1024/1024/1024) from dual;
3) set parameter _minimum_giga_scn=<result from 2> in the init.ora file.
Using the above trace file example, we found:
SQL> select checkpoint_change# from v$database;
355532971
As suspected the database scn = 355532971 is lower than TX scn=363693488.
SQL> select ceil(&decimal_scn_expected/1024/1024/1024) from dual;
Enter value for decimal_scn_expected: 363693488
old 1: select ceil(&decimal_scn_expected/1024/1024/1024) from dual
new 1: select ceil(363693488/1024/1024/1024) from dual

CEIL(363693488/1024/1024/1024)
——————————
1
1) set parameter _minimum_giga_scn=1 in the init.ora file.
2) open the database
startup mount
recover database
alter database open;
4) Startup database
SQL> startup mount
SQL> recover database
SQL> alter database open;
5) If database opens:
– remove parameter _minimum_giga_scn from init.ora and bounce database
SQL> shutdown immediate
SQL> startup
6) Investigate what could cause the ora-600[4000] , could be because customer forced to open database
using _allow_resetlogs_corruption, and if this is the case we strongly suggest to recreate the database
from scratch taking a full export.

ORA-600 [4000] “trying to get dba of undo segment header block from usn”

Format: ORA-600 [4000] [a]
VERSIONS:
version 6.0 to 9.2
DESCRIPTION:
This has the potential to be a very serious error.
It means that Oracle has tried to find an undo segment number in the
dictionary cache and failed.
ARGUMENTS:
Arg [a] Undo segment number
FUNCTIONALITY:
KERNEL TRANSACTION UNDO
IMPACT:
INSTANCE FAILURE – Instance will not restart
STATEMENT FAILURE
SUGGESTIONS:
As per Note 1371820.8, this can be seen when executing DML on tables residing
in tablespaces transported from another database.
It is fixed in 8.1.7.4, 9.0.1.4 and 9.2.0.1 The workaround however is to
create more rollback segments in the target database until the highest
rollback segment number (select max(US#) from sys.undo$;) is at least
as high as in equivalent max(US#) from the source database.
It has also been seen where memory has been corrupted so try shutting
down and restarting the instance.
If the database will not start contact Oracle Support Services
immediately, providing the alert.log and associated trace files

NB Bug Fixed Description
* 9145541 11.1.0.7.4, 11.2.0.1.2, OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile

* 9145541 11.1.0.7.4, 11.2.0.1.2,
11.2.0.2, 12.1.0.0
OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile
after CREATE CONTROLFILE in 11g
+ 10425010 11.2.0.3, 12.1 Stale data blocks may be returned by Exadata FlashCache
12353983 ORA-600 [4000] with XA in RAC
7687856 11.2.0.1 ORA-600 [4000] from DML on transported ASSM tablespace
2917441 11.1.0.6 OERI [4000] during startup
3115733 9.2.0.5, 10.1.0.2 OERI[4000] / index corruption can occur during index coalesce
2959556 9.2.0.5, 10.1.0.2 STARTUP after an ORA-701 fails with OERI[4000]
1371820 8.1.7.4, 9.0.1.4, 9.2.0.1 OERI:4506 / OERI:4000 possible against transported tablespace
+ 434596 7.3.4.2, 8.0.3.0 ORA-600[4000] from altering storage of BOOTSTRAP$

Bug 1362499
ORA-600 [4000] after migrating 7.3.4.3 to 8.0.6.1 on HP-UX 32-bit
Specific to HP-UX, fixed in one-off patch

Historic info on the Oracle 7.3.x issues re unlimited extents and bootstrap$
In 7.3.4 then due to Bug:434596, this can result from altering the
SYS.BOOTSTRAP$ table.
When a SHUTDOWN command follows this, the database will not startup again.
Example: Any of following modifications of SYS.BOOTSTRAP$
will cause this error:
ALTER TABLE BOOTSTRAP$ STORAGE (MAXEXTENTS UNLIMITED );
ALTER TABLE BOOTSTRAP$ STORAGE (NEXT 1024);
ALTER TABLE SYS.BOOTSTRAP$ STORAGE (MAXEXTENTS UNLIMITED);
ALTER TABLE sys.BOOTSTRAP$ STORAGE (MAXEXTENTS UNLIMITED);
A lock byte is now set on the SYS.BOOTSTRAP$ segment header and
following shutdown the database will not start.
A select from bootstrap$ before shutdown will cleanout the lock on
the SYS.BOOTSTRAP$ segment header and prevent the errors from occuring.
Example: Issue the following BEFORE shutdown:
sql> select count(*) from sys.bootstrap$;
Get a backup history of the Database/s and the exact sequence of steps performed.
Two possible options
a) Go back to backup before the storage clause on BOOTSTRAP$ was changed
b) Oracle Support may be able to patch bootstrap$. See Note:43132.1
Obviously, option a) is always the way to go if at all possible.
Articles:
ALERT about changing MAXEXTENTS to UNLIMITED Note:50380.1
Another cause of an ORA-600 [4000] is that a block scn is ahead of the database scn.
In that case the block with the high scn could be printed in the trace file and

Event ADJUST_SCN or parameter _MINIMUM_GIGA_SCN Note:552438.1 can be used to bump the SCN.

ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 ora-600
ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 ora-600
4000 4000 4000 4000 4000 4000 4000 4000 4000 4000
4000 4000 4000 4000 4000 4000 4000 4000 4000 4000

Filed Under: Oracle, Oracle数据恢复、数据库恢复、灾难恢复

Top Internal Errors – Oracle Server Release 8.1.7

2013/03/02 by admin Leave a Comment

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

ORA-600 [ksmals]
Possible bugs: Fixed in:
Bug:2662683 ORA-7445 & HEAP CORRUPTION WHEN RUNNING APPS PROGRAM THAT
DOES HEAVY INSERTS
9.2.0.4
References:
Note:247822.1 ORA-600 [ksmals]
ORA-600 [4000]
Possible bugs: Fixed in:
Bug:2959556 STARTUP after an ORA-701 fails with OERI[4000] 9.2.0.5,10G
Bug:1371820 OERI:4506 / OERI:4000 possible against transported tablespace 8.1.7.4, 9.0.1.4,
9.2.0.1
References:
Note:47456.1 ORA-600 [4000] “trying to get dba of undo segment header block from
usn”
ORA-600 [4454]
Possible bugs: Fixed in:
Bug:1402161 OERI:4411/OERI:4454 on long running job 8.1.7.3, 9.0.1.3,
9.2.0.1
References:
Note:138836.1 ORA-600 [4454]
ORA-600 [kcbgcur_9]
Possible bugs: Fixed in:
Bug:1804676 OERI:KCBGCUR_9 possible from ONLINE REBUILD INDEX with concurrent
DML
8.1.7.3, 9.0.1.3,
9.2.0.1
References:
Note:114058.1 ORA-600 [kcbgcur_9] “Block class pinning violation”
ORA-600 [729]
Possible bugs: Fixed in:
Bug:931820 Direct load fails in kghxhdr when SESSION_CACHED_CURSORS is larger
than zero
9.0.1
Bug:2177050 OERI:729 space leak possible (with tags “define var info” / “oactoid info”) 8.1.7.4, 9.0.1

References:
Note:31056.1 ORA-600 [729] “UGA Space Leak”
ORA-600 [1113]
Possible bugs: Fixed in:
Bug:1307247 OERI:1113 can occur if ANALYZE fails or is interupted 8.0.6.3, 8.1.7.1, 9.0.1
References:
Note:145367.1 Interrupted or failed ANALYZE might cause instance to hang
Note:41767.1 ORA-600 [1113] “Parent SO is free when adding a child”
ORA-600 [1114] / ORA-600 [ksmguard2]
Possible bugs: Fixed in:
Bug:1779978 OERI:1114 / OERI:KSMGUARD2 with more than about 536 sessions 8.1.7.2, 9.0.1
References:
Note:153041.1 ORA-600 [1114] / ORA-600 [KSMGUARD2] on 8.1.7.1.x with Large
Number of Sessions
ORA-600 [4820]
Possible bugs: Fixed in:
Bug:1951929 ORA-7445 in KQRGCU/kqrpfr/kqrpre possible 8.1.7.3, 9.0.1.2, 9.2
References:
ORA-600 [12261]
Possible bugs: Fixed in:
Bug:912223 OERI:12261 / dump in OPIPLS using EXECUTE IMMEDIATE with SQL
derived strings
8.1.7.2, 9.0.1
@Bug:1661786 OERI:12261 / single byte memory corruption possible for CALL type
triggers
8.1.7.3, 9.0.1
References:
ORA-600 [12333]
Possible bugs: Fixed in:
References:
Note:35928.1 ORA-600 [12333] “Fatal Two-Task Protocol Violation”
ORA-600 [16224]
Possible bugs: Fixed in:
Bug:1651530 ORA-600 [16224] [], THEN SMON DIES AND KILLS INSTANCE 8.1.7.4, 9.0.1.3
Bug:1310142 SMON CRASHES INSTANCE WITH ERROR ORA-600 [16224] 8.1.6.3
References:
Note:136754.1 Instance crashes with ORA-600 [16224]
ORA-600 [17069]
Possible bugs: Fixed in:

References:
Note:39616.1 ORA-600 [17069] “Failed to pin a library cache object after 50 attempts”
ORA-600 [17112]
Possible bugs: Fixed in:
References:
Note:47411.1 ORA-600 [17112] “Internal Heap Error”
ORA-600 [17182]
Possible bugs: Fixed in:
References:
Note:34779.1 ORA-600 [17182] “Heap chunk header BAD MAGIC”
ORA-600 [kcbgcur_2]
Possible bugs: Fixed in:
Bug:1502537 OERI:KCBGCUR_2 possible on DML (stack includes ktsf_rsp1) 8.1.7.1, 9.0.1
References:
Note:145610.1 ORA-600 [kcbgcur_2] after upgrade from Oracle7 to 8.1.x
ORA-600 [kccsbck_first] or ORA-600 [3716] or ORA-600 [4185] (HP Tru64 Only)
Possible bugs: Fixed in:
Bug:1379200 NODE PANIC OR SHUTDOWN CAN CAUSE PARITIONED CLUSTER AND
DATABASE CORRUPTION
8.1.7.1, 9.0.1.0
References:
Note:137322.1 ALERT: Node panic or shutdown can cause partitioned cluster and
database corruption
ORA-600 [kwqitnmptme:read]
Possible bugs: Fixed in:
Bug:1663503 ORA-600 [kwqitnmptme:read] w/MAX_RETRIES and EXPIRATION
(AQ_TM_PROCESSES)
9.0.1
References:

Filed Under: Oracle, Oracle数据恢复、数据库恢复、灾难恢复

ORA-00600 [2662] ORA-600 [2662] “Block SCN is ahead of Current SCN”

2013/03/02 by admin Leave a Comment

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

ERROR:

Format: ORA-600 [2662] [a] [b] [c] [d] [e]

VERSIONS:

versions 6.0 to 10.1

DESCRIPTION:

A data block SCN is ahead of the current SCN.
The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN stored in a UGA variable.
If the SCN is less than the dependent SCN then we signal the ORA-600 [2662] internal error.

ARGUMENTS:
Arg [a] Current SCN WRAP
Arg [b] Current SCN BASE
Arg [c] dependent SCN WRAP
Arg [d] dependent SCN BASE
Arg [e] Where present this is the DBA where the dependent SCN came from.

FUNCTIONALITY:

File and IO buffer management for redo logs

IMPACT:
INSTANCE FAILURE

POSSIBLE PHYSICAL CORRUPTION

SUGGESTIONS:

There are different situations where ORA-600 [2662] can be raised.

It can be raised on startup or duing database operation.

If not using Parallel Server, check that 2 instances have not mounted the same database.
Check for SMON traces and have the alert.log and trace files ready to send to support.
Check the SCN difference [argument d]-[argument b].

If the SCNs in the error are very close, then try to shutdown and startup the instance several times.
In some situations, the SCN increment during startup may permit the database to open. Keep track of the number of times you attempted a If the Known Issues section below does not help in terms of identifying a solution, please submit the trace files and alert.log to Oracle Support Services for further analysis.
Known Issues:

NB Bug Fixed Description
4453449 10.2.0.2, 11.1.0.6 OERI:3020 / corruption errors from multiple FLASHBACK DATABASE
5889016 Corruption / OERI during recovery
2899477 9.2.0.5, 10.1.0.2 Minimise risk of a false OERI[2662]
2764106 9.2.0.5, 10.1.0.2 False OERI[2662] possible on SELECT which can crash the instance
2216823 10.1.0.2 OERI [2662] reusing a TEMPFILE with a restored database
2054025 9.0.1.3, 9.2.0.1 OERI:2662 possible on new TEMPORARY index block
P 647927 8.0.4.2, 8.0.5.0 Digital Unix ONLY: OERI:2662 could occur under heavy load
851959 7.3.4.5 OERI:2662 possible from distributed OPS select

INTERNAL ONLY SECTION – NOT FOR PUBLICATION OR DISTRIBUTION TO CUSTOMERS
========================================================================
There were 2 forms of this error until 7.2.3:
Type I: 4/5 argument forms –
The SCN found on a block (dependent SCN) is ahead of the
current SCN. See below for this
Type II: 1 Argument (before 7.2.3 only):
Oracle is in the process of writing a block to a log file.
If the calculated block checksum is less than or equal to 1
(0 and 1 are reserved) ORA-600 [2662] is returned.
This is a problem generating an offline immediate log marker
(kcrfwg).
*NOT DOCUMENTED HERE*
Type I
~~~~~~
a. Current SCN WRAP
b. Current SCN BASE
c. dependent SCN WRAP
d. dependent SCN BASE
e. Where present this is the DBA where the dependent SCN came from.
From kcrf.h:
If the SCN comes from the recent or current SCN then a dba
of zero is saved. If it comes from undo$ because the undo segment is
not available then the undo segment number is saved, which looks like
a block from file 0. If the SCN is for a media recovery redo (i.e.

block number == 0 in change vector), then the dba is for block 0
of the relevant datafile. If it is from another database for a
distributed transaction then dba is DBAINF(). If it comes from a TX
lock then the dba is really usn<<16+slot.
Type II
~~~~~~~
a. checksum -> log block checksum – zero if none (thread # in old format)
—————————————————————————
Diagnosis:
~~~~~~~~~~
In addition to different basic types from above, there are different
situations where ORA-600 [2662] type I can be raised.
Getting started:
~~~~~~~~~~~~~~~~
(1) is the error raised during normal database operations (i.e. when the
database is up) or during startup of the database?
(2) what is the SCN difference [d]-[b] ( subtract argument ‘b’ from arg ‘d’)?
(3) is there a fifth argument [e] ?
If so convert the dba to file# block#
Is it a data dictionary object? (file#=1)
If so find out object name with the help of reference dictionary
from second database
(4) What is the current SQL statement? (see trace)
Which table is refered to?
Does the table match the object you found in previous step?
Be careful at this point: there may be no relationship between DBA in [e]
and the real source of problem (blockdump).
Deeper analysis:
~~~~~~~~~~~~~~~~
(1) investigate trace file:
this will be a user trace file normally but could be an smon trace too
(2) search for: ‘buffer’
(“buffer dba” in Oracle7 dumps, “buffer tsn” in Oracle8/Oracle9 dumps)
this will bring you to a blockdump which usually represents the
‘real’ source of OERI:2662
WARNING: There may be more than one buffer pinned to the process
so ensure you check out all pinned buffers.
-> does the blockdump match the dba from e.?
-> what kind of blockdump is it?
(a) rollback segment header
(b) datablock
(c) other
Check list and possible causes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If Parallel Server check both nodes are using the same lock manager
instance & point at the same control files.
Possible causes:
(1) doing an open resetlogs with _ALLOW_RESETLOGS_CORRUPTION enabled
(2) a hardware problem, like a faulty controller, resulting in a failed
write to the control file or the redo logs
(3) restoring parts of the database from backup and not doing the
appropriate recovery
(4) restoring a control file and not doing a RECOVER DATABASE USING BACKUP
CONTROLFILE
(5) having _DISABLE_LOGGING set during crash recovery
(6) problems with the DLM in a parallel server environment
(7) a bug

Solutions:
(1) if the SCNs in the error are very close, attempting a startup several
times will bump up the dscn every time we open the database even if
open fails. The database will open when dscn=scn.
(2)You can bump the SCN either on open or while the database is open
using Event:ADJUST_SCN (see Note:30681.1).
Be aware that you should rebuild the database if you use this
option.
Once this has occurred you would normally want to rebuild the
database via exp/rebuild/imp as there is no guarantee that some
other blocks are not ahead of time.
Articles:
~~~~~~~~~
Solutions:
Note:30681.1 Details of the ADJUST_SCN Event
Note:1070079.6 Alter System Checkpoint
Possible Causes:
Note:1021243.6 CHECK INIT.ORA SETTING _DISABLE_LOGGING
Note:41399.1 Forcing the database open with `_ALLOW_RESETLOGS_CORRUPTION`
Note:851959.9 OERI:2662 DURING CREATE SNAPSHOT AT MASTER SITE
Known Bugs:
~~~~~~~~~~~
Fixed In. Bug No. Description
———+————+—————————————————-
7.1.5 BUG:229873
7.1.3 Bug:195115 Miscalculation of SCN on startup for distributed TX ?
7.1.6.2.7 Bug:297197 Port specific Solaris OPS problem
7.3 Bug:336196 Port specific IBM SP AIX problem -> dlm issue
7.3.4.5 Bug:851959 OERI:2662 possible from distributed OPS select
Not fixed Bug:2216823 OERI:2662 reported when reusing tempfile with restored DB
8.1.7.4 Bug:2177050 OERI:729 space leak possible (with tags “define var info”/”oactoid info”)
can corrupt UGA and cause OERI:2662

ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 ora-600
ora-600 ora-600 ora-600 ora-600 ora-600 ora-600 ora-600
2662 2662 2662 2662 2662 2662 2662 2662 2662
2662 2662 2662 2662 2662 2662 2662 2662 2662

Filed Under: Oracle, Oracle数据恢复、数据库恢复、灾难恢复

Summary of Bugs Containing ORA-00600 [2662] ORA-600 [2662]

2013/03/02 by admin Leave a Comment

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

Purpose

The purpose of this Note is to explain bugs filed for ORA-00600 [2662] error against specific Oracle database versions, and explain the symptoms ofeach bug, workarounds if any and references the patch available at the time this article was written.
Scope
This article is a consolidated effort to summarize the top bugs reported (error) which have been fixed. It is directed towards Oracle Support Analystsand Oracle Customers to have an overview of various bugs logged for the same error
Error Description:

The ORA-600 [2662] is raised when data block SCN is ahead of the current SCN.
This is generally related to the redo application which is used to bring the database to a consistent state.
Summary of Bugs Containing ORA-00600 [2662]

Bug 4453449
Abstract: Flashback to guaranteed restore point in orphan inc may result in ORA-600[3020]
Versions affected: 10.2.0.1
Fixed in versions: 10.2.0.2 & 11.0
Backportable: Yes

Symptoms:

The symptom of this bug include ORA-600[3020], ORA-600[2662] after flashback
database and ORA-600[flashback_validation] during flashback database.
There may also be other symptoms.

Details:
ORA-600[3020] / ORA-600 [2662] / ORA-600 [flashback_validation] can occur
after/during multiple flashback/recovery through multiple database resetlogs
without opening the database. There may also be other symptoms which appear as
recovery related corruption errors.
Workaround:
1. If you flashback a crashed primary database, follow flashback database with open
resetlogs. Alternatively, if you’d like to completely undo flashback database,
follow flashback database with recover database without shutting down the
instance first.
2. Restore backup and recover.
Patch details:
Currently there is no one-off patch available for any platform and versions.
Bug 2899477 (Unpublished)
Abstract:ORA-600[2662] CAUSES INSTANCE CRASH
Versions affected: 9.2.0.4
Fixed in versions: 9.2.0.4 & 10.1
Backportable: Yes
Symptoms:
When you have a corrupted SCN and if the corruption is found in selexe,
getting uninitialized selenv from opiexe, then this may be the bug.

One-off patch available for few platforms on top of 9.2.0.4
Check the Metalink for Patch 2899477 availability.
Bug 2764106
Abstract: ORA-600 [2662] BRINGS THE DATABASE DOWN
Versions affected: 8.1.7.4 & 9.2.0.4
Fixed in versions: 9.2.0.5 & 10.1
Backportable: Yes
Symptoms:
OERI(2662) even The dependent scn present in the disk blocks are fine.
Details:
A false ORA-600 [2662] error can occur on SELECT operations
which can result in an instance crash even though there is no
underlying problem with the on disk SCN.
Workaround:
None
Patch details:
One-off patch available for few platforms on top of 8.1.7.4 & 9.2.0.4
Check the Metalink for Patch 2764106 availability.
Bug 2216823 (Unpublished)
Abstract:OERI(2662) REPORTED WHEN REUSING TEMPFILE WITH RESTORED DB
Versions affected: 9.2.0
Fixed in versions: 10.1.0
Backportable: No
Symptoms:
eg:
1. Create a TEMP tablespace.
2. Shutdown a database.
3. Copy control file, data files, and log files to another directory
(but not tempfile).
4. Restart a database.
5. Create a temporary table and insert into it, thereby causing tempfile
to be updated.
6. Shutdown a database.
7. Restore a database.
8. Restart a database.
9. Create a temporary table and insert into it.
10. Commit
^- ORA-600 [2662]
Details:
ORA-600 [2662] can occur when reusing a TEMPFILE with
a restored database.
Workaround:
The workaround is not to use the pre-existing tempfile.
Instead either backup the tempfile with rest of the database
or remove the tempfile then recreate a new tempfile once the
database is open.
Patch details:
Currently there is no one-off patch available for any platforms and versions
Bug 2054025 (Unpublished)

Abstract:ORA-600 [2662] RELATED TO KDIT.C
Versions affected: 9.0.1.2
Fixed in versions: 9.0.1.3 9.2.0.1
Backportable: No
Symptoms:
OERI:2662 possible on new TEMPORARY index block
Details:
ORA-600 [2662] possible on new TEMPORARY index block
Workaround:
None
Patch details:
Currently there is no one-off patch available for any platforms and versions
Bug 851959
Abstract : ORA-600 [2662] OCCURRED DURING CREATE SNAPSHOT AT MASTER SITE
Details :
It is possible to get ORA-600 [2662] caused by mis-adjustment of the Oracle7 SCN (in PARALLEL SERVER mode) when an Oracle8 instance selects from
it over a DBLINK
Version affected : 7.3.4.X
Fixed in version: 7.3.4.5
Workaround :
None
Patch details :
Currently there is no one-off patch available for any versions/platforms.
Bug 647927 (Unpublished)
Abstract : LOCK PROCESS DIES WITH ORA-600 [2662], [0], [40057943], [0], [40063994]
Version affected 8.0.4.X
Fixed in version : 8.0.4.2 8.0.5.0
Symptoms :
Digital Unix ONLY: OERI:2662 could occur under heavy load
Workaround :
None
Patch details :
Currently there is no one-off patch available for any versions/platforms.
Bug 5612217 (Unpublished)
Abstract : ORA-7445 [KDKBIN] LEADING TO ORA-600 [2662] DUE TO BUFFER CORRUPTION
Version affected : 9.2.0.X
Workaround :
None
Patch details :
One-off patch available for few platforms on top of 9.2.0.7
Check the Metalink for Patch 5612217 availability.
Bug 4599505 (Unpublished)
Abstract : ORA-600 [2662] error
Version affected : 10.2.0.X

Fixed in version : 11.0
Symptoms :
ORA-600[2662] after flashback database.
Workaround :
This problem may disappear by itself after the database has been opened for a while and its SCN has passed the SCN of the problematic block. This is
however not a guaranteed workaround
Patch details :
Currently there is no one-off patch available for any versions/platforms.
Bug 2998110
Abstract :ORA-600 [2662] LARGE QUERIES ON STANDBY WITH LOCALLY MANAGED TMP TBLSP
Version affected : 9.2.0.X 10.1.0.X
Fixed in version : 10.2
Symptoms :
The scn of the tempfiles is advanced but not on any other files
when the database is opened in read only mode.
Workaround :
1) Increase the sort_area_size to avoid sort on disk thus avoiding the use of the tempfiles
–OR–
2) After opening the database read only and BEFORE executing any queries
against the standby database, drop and recreate the tempfiles.
–OR–
3) If you are on 10.1 release you can set the following parameter:
_init_tempfile_on_open=TRUE
in your init.ora/spfile and bounce the database.
Setting this parameter will clear all tempfile bitmaps when the database is opened
so the database open may be take a little longer.
Patch details :
Currently there is no one-off patch available for any versions/platforms.
This bug is fixed in 10.2 and is not backportable to previous releases.
Note 356583.1 has been linked to this scenario.
Bug 3517013 (Unpublished)
Abstract :OPEN DB RESETLOG AFTER FLASHBACK DB FAILS ORA-600 [KCLCHKBLK_4], [1904]
Symptoms :
1) When restored the database from backup and did an incomplete recovery.
2) Opened the database with resetlogs.
3) After opening the database, you start getting following errors:
ORA-00600 [kclchkblk_4]
ORA-00600 [2662]
4) Stack trace is:- kclchkblk kcbzib kcbgcur ktfbhget ktftfcload
Cause :
1)
Error, ORA-600[KCLCHKBLK_4], is signaled because the SCN in a tempfile block
is too high. The same reason caused the ORA-600[2662]s in the alert logs.
2)
This issue is because the tempfiles may not get reinitialized during open
resetlogs.
Patch details :
Currently there is no one-off patch available for any versions/platforms.
Note 275902.1 has been linked to this scenario and solution
given under this note.

Many other bugs were filed with development for this issue.
Those bugs are not progressed due to
— Lack of response from the customers
— one-time occurances
— Vendor OS Problem
Disclaimer :
This note contains most frequently hit bugs that can throw the error ORA-00600 [2662] . However the above mentioned are not the complete list of
bugs that can generate this error

Filed Under: Oracle, Oracle数据恢复、数据库恢复、灾难恢复

Oracle ORA-00600 [25027] ORA-600 [25027]

2013/03/02 by admin Leave a Comment

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

Format: ORA-600 [25027] [a] [b]

VERSIONS:
versions 9.2 and above
DESCRIPTION:
An invalid Tablespace Number (TSN) and/or Relative File Number (RFN) has been found

ARGUMENTS:
Arg [a] Tablespace Number (TSN)
Arg [b] Decimal Relative Data Block Address (RDBA)

FUNCTIONALITY:
Kernel File management Tablespace component
IMPACT:
PROCESS FAILURE
POSSIBLE PHYSICAL CORRUPTION

SUGGESTIONS:

1. If the Arg [b] (the RDBA) is 0 (zero), then this could be due to fake indexes.

The following query will list fake indexes:

select do.owner,do.object_name, do.object_type,sysind.flags
from dba_objects do, sys.ind$ sysind
where do.object_id = sysind.obj#
and bitand(sysind.flags,4096)=4096;

If the above query returns any rows, check the objects involved and consider dropping them as they can cause this error.

Run analyze table validate structure on the table referenced in the Current SQL statement in
the related trace file.

If the Known Issues section below does not help in terms of identifying
a solution, please submit the trace files and alert.log to Oracle
Support Services for further analysis.
Known Issues:
Known Bugs

NB	Bug	Fixed	Description
	14010183	11.2.0.3.BP22, 11.2.0.4.BP03, 12.1.0.2, 12.2.0.0	ORA-600 [ktspfundo:objdchk_kcbgcur_3] in SMON after failed temp segment merge load
	13503554	11.2.0.4, 12.2.0.0	Various ORA-600 errors crashing the apply process in a downstreams environment
	13785716	11.2.0.4, 12.1.0.1	Intermittent ORA-600 [25027] during upgrade from 10.2 to 11.2
	11661824	11.2.0.1.BP09	Assorted Dumps by SQL*LOADER using DIRECT and PARALLEL after exadata bp8 is applied
	10067246	12.2.0.0	ORA-600 [25027] ORA-7445 [kauxs_do_dml_cooperation] by CREATE INDEX ONLINE
	14138130	11.2.0.3.5, 11.2.0.3.BP13, 11.2.0.4, 12.1.0.1	SGA memory corruption / ORA-7445 when modifying uncompressed blocks of an HCC-compressed segment
	13330018	11.2.0.4, 12.1.0.1	ora-600 [ktspfmb_add1], [4294959240] occurred, then cannot recover with ora-600[25027]
	13103913	11.2.0.2.BP15, 11.2.0.3.3, 11.2.0.3.BP03, 11.2.0.4, 12.1.0.1	ORA-600 [25027] [ts#] [1] or false ORA-1 during dml while index is being rebuilt online
	10394825	11.2.0.3, 12.1.0.1	ORA-600[25027] [..] [0] inserting to ASSM segment
	10329146	11.2.0.1.BP10, 11.2.0.2.2, 11.2.0.2.BP03, 11.2.0.2.GIBUNDLE02, 11.2.0.2.GIPSU02, 11.2.0.3, 12.1.0.1	Lost write in ASM with multiple DBWs and a disk is offlined and then onlined
+	10209232	11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.2.GIBUNDLE01, 11.2.0.3, 12.1.0.1	ORA-1578 / ORA-600 [3020] Corruption. Misplaced Blocks and Lost Write in ASM
+	9399991	11.1.0.7.5, 11.2.0.1.3, 11.2.0.1.BP04, 11.2.0.2, 12.1.0.1	Assorted Internal Errors and Dumps (mostly under kkpa/kcb) from SQL against partitioned tables
*	9145541	11.1.0.7.4, 11.2.0.1.2, 11.2.0.2, 12.1.0.1	OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile after CREATE CONTROLFILE in 11g
	8837919	11.2.0.2, 12.1.0.1	DBV / RMAN enhanced to detect ASSM blocks with ktbfbseg but not ktbfexthd flag set as in Bug 8803762
	8803762	11.1.0.7.6, 11.2.0.1.2, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.1	ORA-600[kdsgrp1], ORA-600[25027] or wrong results on 11g database upgrade from 9i
	8716064	11.2.0.2, 12.1.0.1	Analyze Table Validate Structure fails on ADG standby with several errors
+	8597106	11.2.0.1.BP06, 11.2.0.2, 12.1.0.1	Lost Write in ASM when normal redundancy is used
	7251049	11.2.0.1.BP08, 11.2.0.2, 12.1.0.1	Corruption in bitmap index introduced when using transportable tablespaces
	8437213	10.2.0.4.3, 10.2.0.5, 11.1.0.7.7, 11.2.0.1	ASSM first level bitmap block corruption
	8356966	11.2.0.1	ORA-7445 [kdr9ir2rst] by DBMS_ADVISOR or false ORA-1498 by ANALYZE on COMPRESS table
*	8198906	10.2.0.5, 11.2.0.1	OERI [kddummy_blkchk] / OERI [5467] for an aborted transaction of allocating extents
*	7263842	10.2.0.4.2, 10.2.0.5, 11.1.0.7.1, 11.2.0.1	ORA-955 during CTAS / OERI [ktsircinfo_num1] / dictionary inconsistency for PARTITIONED Tables
	6666915	10.2.0.5, 11.1.0.7, 11.2.0.1	OERI[25027] / dictionary corruption from concurrent partition DDL
	6025993	10.2.0.5, 11.1.0.6	ORA-600 [25027] in flashback archiving queries
	4925342	9.2.0.8, 10.2.0.3, 11.1.0.6	OERI [25027] / OERI [25012] on IOT analyze estimate statistics
*	7190270	10.2.0.4.1, 10.2.0.5	Various ORA-600 errors / dictionary inconsistency from CTAS / DROP
	4310371	9.2.0.8, 10.2.0.2	OERI [25027] from concurrent startup / shutdown in RAC
	4177651	10.2.0.1	Row migration within a MERGE may OERI[25027]
	4020195	10.1.0.5, 10.2.0.1	OERI 25027 can occur in RAC accessing transported tablespace
	4000840	9.2.0.7, 10.1.0.4, 10.2.0.1	Update of a row with more than 255 columns can cause block corruption
	3963135	10.1.0.5, 10.2.0.1	OERI[kcbgcur_3] / OERI:25027 during bitmap index updates
	3829900	10.1.0.4, 10.2.0.1	OERI[25027] possible accessing index in 10g
	2942185	9.2.0.6, 10.1.0.4, 10.2.0.1	Corruption occurs on direct path load into IOT with ADDED columns
	3085057	10.1.0.2	ORA-600: [25027] from ALTER TABLE .. SHRINK SPACE CASCADE
	2926182	9.2.0.5, 10.1.0.2	OERI[25027] / ORA-22922 accessing LOB columns in IOT in AFTER UPDATE trigger

Filed Under: Oracle, Oracle数据恢复、数据库恢复、灾难恢复

プロOracleデータベースリカバリ技術サポート

2013/03/01 by admin Leave a Comment

ORACLEデータベースによくあるエラの解決策

プロのOracle Databaseの復旧サービスを提供
携帯番号： +86 13764045638 メール：service@parnassusdata.com

データベースは企業のコアとして、データベースが壊れて、まともに運用できなければ、取り換えれない損害を及ぼして、データがなくすかもしれない。データベースに故障が起きれば、有効なバックアップもない場合に、我々こそ最後のディフェンスである。我々必ず最善を尽くして、データを救って、損害をできるだけ低めにする。私たちは元々Oracle会社のアフターサービスリカバリエンジニア（ACS）として、中国でも名を知られたプロリカバリチームである。何百回のリカバリ経験があって；修復できたバーションはORACLE 7.3、ORACLE 8/8I，ORACLE 9I、ORACLE 10G，ORACLE 11G及びORACLE 12Cで、プラットフォームはLinux、Windows、AIX、HP-UNIX、SOLARIなどもある。私たちが勤めさせたお客様はチベットから上海まで、黒龍江から海南まで、中国にあるすべてのところどころに満ちている。お客様の分類もいろいろ：医療機関、軍隊、政府、製造業者、チェーンコンビニ、社会福祉センター、医療保険、物流センター、インターネット、金融機関、病院、警察署など。どんなデータベース故障であっても、いくつのリカバリ会社も訪ねて、リカバリできなかったとしても、私たちがリカバリできる。私たちの技術力が全国一だという大袈裟なこととも言えないが、私たちがリカバリできなかったら、誰でもリカバリできるわけがないと言える。すべてのデータベースがリカバリ出来ない限り、コミットメントを果たせなかったら、何の料金も受け取らない。

もし、プロOracleリカバリ技術サポートが必要とすれば、私たちに連絡してください。
支持しているデータベースリカバリ内容はそれだけではない

誤ったdrop table リカバリ
誤ったdelete/update リカバリ
誤ったtruncate tableリカバリ
systemファイルデータベースをなくしたリカバリ
asm ディスクがフォーマットされた
bootstarp$にオブジェクトテストが壊された
asmディスク損害、mountできなかった
オペレーションシステムが削除されたデータファイルリカバリ
データベースがundo損害/喪失によって起動出来ない
データベースが一部のデータブロック損害でまともに運用出来ない
データベースがORA-600エラによって起動できない
exp dmpファイル損害、データベースリカバリをインポートできない
データベースがコントロールファイル損害/喪失で起動できなくなった
データベースがredo損害/喪失で起動できなくなった
データベースがデータファイル損害/喪失で起動できなくなった
データベースがアーカイブをなくして、データファイルがオンラインできなくなった
データベースがデータベースエラoffline systemで起動できなくなった
データベースがオペレーションシステムが壊された、データファイルだけがリカバリできる。
データベースがストレージ壊されたせいで、データベースが起動できなくなった
expdp dmpファイル損害、データベースリカバリがインポートできない。Asmディスクがなくし、asmで正常に使えなくなった
データベースが誤ったresetlogs操作によって、リカバリが続けない
データベースは電源切れたなどの原因で起動できなくなった

以下は典型的なOracle故障リカバリ
system rollback異常リカバリ
ORA-00604 ORA-00607 ORA-00600[4194]

Thu Jul 26 13:21:11 2012
Recovery of Online Redo Log: Thread 1 Group 1 Seq 3994 Reading mem 0
  Mem# 0: /orasvr/orcl/redo01.log
Block recovery completed at rba 3994.5.16, scn 0.89979533
Thu Jul 26 13:21:11 2012
Errors in file /orasvr/admin/orcl/udump/orcl_ora_2865.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [4194], [31], [2], [], [], [], [], []
Error 604 happened during db open, shutting down database
USER: terminating instance due to error 604
Instance terminated by USER, pid = 2865
ORA-1092 signalled during: ALTER DATABASE OPEN...

undo segment異常リカバリ
ORA-00704 ORA-00604 ORA-01555

Fri May  4 21:04:21 2012
select ctime, mtime, stime from obj$ where obj# = :1
Fri May  4 21:04:21 2012
Errors in file /oracle/admin/standdb/udump/perfdb_ora_1286288.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 40 with name "_SYSSMU40$" too small
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 1286288
ORA-1092 signalled during: alter database open resetlogs...
ORA-00704 ORA-00604 ORA-01173
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Feb 12 10:16:00 2014
SMON: enabling cache recovery
Errors in file H:\ORACLE\diag\rdbms\orcl\orcl\trace\orcl_ora_9232.trc:
ORA-01173: data dictionary indicates missing data file from system tablespace
Errors in file H:\ORACLE\diag\rdbms\orcl\orcl\trace\orcl_ora_9232.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01173: data dictionary indicates missing data file from system tablespace
Errors in file H:\ORACLE\diag\rdbms\orcl\orcl\trace\orcl_ora_9232.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01173: data dictionary indicates missing data file from system tablespace
Error 704 happened during db open, shutting down database
USER (ospid: 9232): terminating the instance due to error 704
Instance terminated by USER, pid = 9232
ORA-1092 signalled during: alter database open resetlogs...

obj$トランザクション異常リカバリ

ORA-00704 ORA-00600[4000]
Thu Feb 28 19:29:10 2013
SMON: enabling cache recovery
Thu Feb 28 19:29:11 2013
Errors in file /u1/PROD/prodora/db/tech_st/10.2.0/admin/PROD_oracle/udump/prod_ora_20989.trc:
ORA-00600: internal error code, arguments: [4000], [50], [], [], [], [], [], []
Thu Feb 28 19:29:13 2013
Incremental checkpoint up to RBA [0x1.3.0], current log tail at RBA [0x1.3.0]
Thu Feb 28 19:29:13 2013
Errors in file /u1/PROD/prodora/db/tech_st/10.2.0/admin/PROD_oracle/udump/prod_ora_20989.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [50], [], [], [], [], [], []
Thu Feb 28 19:29:13 2013
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 20989
ORA-1092 signalled during: ALTER DATABASE OPEN RESETLOGS...

IO異常リカバリ

ORA-01115 ORA-01110 ORA-27070 OSD-04016
Tue May 14 15:32:10 2013
Completed redo scan
 16941 redo blocks read, 1106 data blocks need recovery
Tue May 14 15:32:17 2013
Errors in file d:\oracle\product.2.0\admin\orcl\bdump\orcl_p002_1472.trc:
ORA-01115: IO error reading block from file 6 (block # 81951)
ORA-01110: data file 6: 'D:\ORACLE\PRODUCT.2.0\ORADATA\orcl\YD_DATA01.DBF'
ORA-27070: async read/write failed
OSD-04016: Error queuing an asynchronous I/O request
O/S-Error: (OS 1117) The request could not be performed because of an I/O device error.

よくあるエラ情報
テーブルあるいはインディクスベッドブロックに関連するエラ
ORA-01578/ORA-08103/ORA-01410/ORA-08102/ORA-600 kdsgrp1/ORA-600 qertbfetchbyrowid/ORA-01499/ORA-01555/ORA-26040/ORA-27046

コントロールファイル異常
ORA-00202/ORA-600 kccsbck_first/ORA-600 kccscf_1/ORA-600[kccsbck_first]/ORA-600 kccsbck_first/ORA-00205/ORA-600 kccpb_sanity_check_2

REDOあるいはUNDO異常
ORA-00376/ORA-00600 4097/ORA-01595/ORA-600 4194/ORA-600 4193/ORA-600 kcfrbd_3/ORA-00600 4137/ORA-01594/ORA-01555/ORA-00704/ORA-00604/ORA-00607/ORA-600 4000/ORA-00600[3705]/ORA-00316/ORA-00312/ORA-00327/ORA-01623/ORA-01624/ORA-01194/ORA-600 2662/ORA-00368/ORA-00353/ORA-00305/ORA-00340/ORA-00345/ORA-00354

いろんなORA-600エラ
ORA-600 4497/ORA-600 6947/ORA-600 2662/ORA-600 4194/ORA-600 4193/ORA-00600 4137/ORA-600 4000/ORA- 600 kcrf_resilver_log_1/ORA- 600 kdxlin:psno out of range/ORA-600 3020/ORA-600 kccpb_sanity_check_2/ORA-600 3705/ORA-600 kccscf_1/ORA-600 kghstack_free2/ORA-600 kcfrbd_3/ORA-600 ktbdchk1: bad dscn/ORA-600 2252/ORA-600 kcratr_nab_less_than_odr/ORA-600 kccsbck_first/ORA-600 kcratr1_lostwrt/ORA-600 ktspNextL1:4/ORA-600 13013/ORA-600 kdsgrp1/ORA-600 kmgs_parameter_update_timeout_1/ORA-600 kcbgtcr_1/ORA-600 kcbgtcr_1a/ORA-600 kcbgtcr_3/ORA-600 kcbgtcr_4/ORA-600 kcbgtcr_5/ORA-600 kcbgtcr_6/ORA-600 kcbgtcr_7/ORA-600 kcbgtcr_10/ORA-600 kcbgtcr_12/ORA-600 kcbgtcr_13/ORA-600 qertbfetchbyrowid/ORA-600 kmgs_parameter_update_timeout_1

より速くデータベース故障を評価するために、Oracle Database Recovery Checkでデータベースをチェックしてください。html, alertログを作成して、トレースファイルをわたしに送信してください。。

データベースcurrent/active redo異常なエラ
redoファイル損害エラ
Started redo scan
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2960.trc (incident=214262):
ORA-00353: ログ損害がブロック 12014 を 9743799889 に、時間を 12/05/2011 09:21:11に変更してください
ORA-00312: オンラインログ 3 スレッド 1: ‘R:\ORADATA\orcl\REDO03.LOG’
Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdir_214262\orcl_ora_2960_i214262.trc
Aborting crash recovery due to error 368
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora_2960.trc:
ORA-00368: redoログブロックのテストとエラ
ORA-00353: ログ損害がブロック 12014を 9743799889 に、時間を 12/05/2011 09:21:11に変更してください
ORA-00312: オンラインログ 3 スレッド 1: ‘R:\ORADATA\orcl\REDO03.LOG’
ORA-368 signalled during: ALTER DATABASE OPEN…

redoファイルが別のインスタンスに占められて、エラになった
Wed May 16 17:03:11 2012
Started redo scan
Wed May 16 17:03:11 2012
Errors in file /oracle/admin/odsdb/udump/odsdb1_ora_2040024.trc:
ORA-00305: log 14 of thread 1 inconsistent; belongs to another database
ORA-00312: online log 14 thread 1: ‘/dev/rods_redo1_2_2′
ORA-00305: log 14 of thread 1 inconsistent; belongs to another database
ORA-00312: online log 14 thread 1: ‘/dev/rods_redo1_2_1′
ORA-305 signalled during: ALTER DATABASE OPEN…

ストレージ全体エラ
Mon Oct 17 09:35:09 2011
Errors in file /oracle/app/admin/orcl/bdump/orcl2_lgwr_348814.trc:
ORA-00340: IO error processing online log 4 of thread 2
ORA-00345: redo log write error block 6732 count 2
ORA-00312: online log 4 thread 2: ‘/dev/rredo21′
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 6: No such device or address
Additional information: -1
Additional information: 1024
Mon Oct 17 09:35:09 2011
LGWR: terminating instance due to error 340

ストレージIOエラ
Fri Feb 21 08:44:42 2014
Thread 1 advanced to log sequence 591 (LGWR switch)
Current log# 1 seq# 591 mem# 0: J:\ORADATA\ORCL\REDO01.LOG
Fri Feb 21 15:31:20 2014
Errors in file c:\oracle\product\10.2.0\admin\orcl\bdump\orcl_lgwr_10312.trc:
ORA-00316: log 1 of thread 1, type 286 in header is not log file
ORA-00312: online log 1 thread 1: ‘J:\ORADATA\ORCL\REDO01.LOG’

_disable_loggingバラメタを使う
Sat May 14 23:16:49 2005
Errors in file d:\oracle\admin\rman\bdump\rman_arc0_736.trc:
ORA-16038: log 3 sequence# 72 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 3 thread 1: ‘D:\ORACLE\ORADATA\RMAN\REDO03.LOG’

自分でうまくいかない場合に、私たちに連絡してください、プロなデータベース技術を提供できる：

1.データファイルをなくした(ORA-01157)
SQL> startup
ORACLE instance started.

Total System Global Area 260046848 bytes
Fixed Size 1266896 bytes
Variable Size 83888944 bytes
Database Buffers 167772160 bytes
Redo Buffers 7118848 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 4 – see DBWR trace file
ORA-01110: data file 4: ‘/u01/oracle/oradata/orcl/users01.dbf’
データファイルがなくした。対応策：
1).バックアップでデータをリカバリする
2).非undo,systemなら、そのファイルoffline して、データベースを起動する
3). Undoの場合にORA-00376エラになるかもしれない
4).もしsystem offlineなら、ORA-01147になるかもしれない

2. Redoをなくした(ORA-00313)
SQL> startup
ORACLE instance started.

Total System Global Area 260046848 bytes
Fixed Size 1266896 bytes
Variable Size 83888944 bytes
Database Buffers 167772160 bytes
Redo Buffers 7118848 bytes
Database mounted.
ORA-00313: open failed for members of log group 1 of thread 1
ORA-00312: online log 1 thread 1: ‘/u01/oracle/oradata/orcl/redo01.log’
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3
redoファイルがなくした。対応策：
1). v$logを検索し、そのredoはcurrentかactiveかを確認する
2). Redoがアーカイブされたか
3). Inactiveならclear あるいは clear unarchivedを使ってください
4). Activeあるいはcurrentなら、不完全なリカバリ、隠しバラメタなどの方法で解決してください

3. Undoをなくした(ORA-01092 ORA-00376)
SQL> startup
ORACLE instance started.

SQL> alter database datafile 2 offline drop;

Database altered.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced

ORA-01092はフォアグラウンドエラで、alertログを検索することで探し出したバックグラウンドエラは主に：
Fri Oct 25 08:16:36 2013
Errors in file /u01/oracle/admin/orcl/bdump/orcl_smon_7437.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00376: file 2 cannot be read at this time
ORA-01110: data file 2: ‘/u01/oracle/oradata/orcl/undotbs01.dbf’
undoエラがなくしたことで、一部のトランザクションがロールバック出来ないから、エラになる。この場合に隠しバラメタを利用してください

4. Systemをなくした(ORA-01147)

SQL> startup
ORACLE instance started.

SQL> alter database datafile 1 offline drop;

Database altered.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01147: SYSTEM tablespace file 1 is offline
ORA-01110: data file 1: ‘/u01/oracle/oradata/orcl/system01.dbf’
systemテーブルスペースはシステムテーブルスペースで、そのテーブルスペースのデータファイルがオフラインできない。もしそのテーブルスペースデータファイルがなくしたら、データベースがまともに起動できない。Bbedでシステムファイルをシミュレーションして、データベースを騙す (非file# 1)あるいはdulでデータを抽出する

5. コントロールファイルをなくした(ORA-00205 ORA-00202)
SQL> startup
ORACLE instance started.

Total System Global Area 260046848 bytes
Fixed Size 1266896 bytes
Variable Size 83888944 bytes
Database Buffers 167772160 bytes
Redo Buffers 7118848 bytes
ORA-00205: error in identifying control file, check alert log for more info

ORA-00205はフォアグラウンドエラで、具体的にはログと一緒に分析する必要がある：
Fri Oct 25 08:35:40 2013
ALTER DATABASE MOUNT
Fri Oct 25 08:35:40 2013
ORA-00202: control file: ‘/u01/oracle/oradata/orcl/control01.ctl’
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3

ここで、コントロールファイルがその数値がなくした、対応策は以下の通り：
1).バックアップでコントロールファイルをリカバリする
2).別のコントロールファイルを探し出して、コピする
3).データファイルをリストしてコントロールファイルを再構造する

Filed Under: Oracle, Oracle数据恢复、数据库恢复、灾难恢复

Oracle 数据库完全破损/损坏对策

2013/02/28 by admin Leave a Comment

Block Corruption 造成的影响
对商业运行造成的直接影响

数据损坏导致业务停止
数据恢复时间变长
修复工作发生错误，造成二次灾害
寻找原因的时间变长

数据损坏的主要原因

复杂并且有可能被分割的Layer的风险

系统中数据保护的机制

§ 单个系统中，考虑到了一些无法防御的的风险的机制。

– 例：因为人为错误删除数据时，在RAID结构中无法防御。

§ 复制数据中保护一致性与确实性的机制。

– 例：部分备份导致的数据完整性的欠缺。

§ 考虑到迅速切换以及确实的修复的机制。

– 例：由于灾害恢复训练不足，实际上无法切换，无法返回的备份。

为了确实地能提高工作的可持续性需要什么？

Oracle Maximum Availability Architecture

§ Maximum Availability Architecture (MAA) 是基于oracle验证完成的高可用性的数码与成功事例的最佳实践。

§ MAA的目的

– 为了修复、查出、规避所有的停止的情况提供最优实践。

– 在样本中构成最优的高可用性的架构。

§ 不受硬件以及OS影响（不需要特定的高价的产品以及技术）

§ 马上就可以提供高可用性的解决策略（Oracle事先完成验证）

高可用性的数码的最佳实践。

Oracle MAA 的整体映像

Low-Cost, Integrated, Fully Active, High ROI

Oracle MAA の Data Protection

Oracle Database 11g Release 2

Oracle Database 的
Data Protection功能

Type of Block Corruption

§ Data Block Corruption(Doc ID 840978.1)

– Physical Block Corruption

– Logical Block Corruption

§ Other Corruptions

– Control file Corruption

§ Use control file mirror & Copy

– Redo Corruption

§ ASM Mirroring / Use multiplexed log file

– Dictionary Corruption（Doc ID 136697.1）

Doc ID 1088018.1 Master Note for Handling Oracle Database Corruption

KROWN#152523 [Master Notes] Corruption(破损)

Data Block Corruption

§ 物理性的块内数据破损状态的块

– 破损例

• Block Header不正确

• Block Header与Footer信息不一致

• 数据缺失

• Block配置地点不正确

• 0隐藏的Block

Physical Block Corruptions

Data Block Corruption

§ Block中结构发生理论破损的状态的块

– 物理性（Block Header以及Footer的信息、Checksum的计算结果）正确

– 破损例

• 行碎片的开始与结束位置不在块内。

• 行碎片直接发生重叠

• 锁定行碎片的ITL编号不正确

• ITL展示的锁定中的行碎片的数量与实际不一致

• Block中空白区域的尺寸不正确

• Lost Write

Logical Block Corruptions

Oracle Database中最适合的破损检测

§ Oracle Database的Block并不是单独的bit的罗列，
明确地事先定义过的结构

à 正是有理解了Block的结构的Oracle Database，可以检测
Physical Corruption以及Logical Corruption。

à 并且，通过同时使用Oracle Database的各*Technology，可以提高***检测水平

– OS、文件系统以及存储中，仅仅是和命令一样对块进行I/O，但无法判断块的结构作为一个数据块来说是否正确。Exadata Cell Storage Serverと、H.A.R.D Initiative 是可以检测的

Data Block Format

Oracle DatabaseのData Protection

§ 控制检测范围、水平、破损类型的初始化参数

– DB_BLOCK_CHECKSUM

– DB_BLOCK_CHECKING

– DB_LOST_WRITE_PROTECT

§ 定期检测功能

– Oracle Recovery Manager(CHECK LOGICAL句 / VALIDATE 命令)

– SQL> ANALYZE TABLE文（VALIDATE STRUCTURE CASCADE 选项）

提高检测水平的功能

DB_BLOCK_CHECKSUM

§ 利用Data Block中的Checksum的Physical Corruption检测的机制

– 向Disk写入Block之前

§ 将DBWR以及执行Direct Load的服务器进程、Checksum（从Block中所有数据为基础来计算出的数值）储存到Block Header之中。

– 从Disk中读出Block后

§ 读出Block的进程将重新计算的Checksum与储存到Block Header中的Checksum进行比较验证。

à　如果Checksum不一致时，因为块内数据可能会产生变更，可能判断为Physical Block Corruption以及发生了

概要

DB_BLOCK_CHECKSUM

设定值中的每个操作

DB_BLOCK_CHECKSUM

§ 请注意设定值的不同造成的操作的不同

– DB_BLOCK_CHECKSUM=TYPICAL

• 由于服务器进程的Block更新之后，Checksum为0

• DBWR在向Disk写入时，计算Checksum再嵌入

à 每次更新时，因为不会计算*Checksum而高效化***

– DB_BLOCK_CHECKSUM=FULL

• 由于服务器进程的Block更新之后，计算Checksum再嵌入

• DBWR在向Disk写入时，验证Checksum

à可以验证内存上的Block Corruption

对于Buffer Cache上被更新的Data Block
Checksum嵌入时机

DB_BLOCK_CHECKSUM

§ 每次发行时，请注意其中不同的操作

– Release 11.1以降では、Redo BlockのChecksumの
将生成处理由生成了Redo的Foreground Process担当à减轻 LGWR的负荷

§ 但是，Release 11.1~11.2.0.1中，设定为FULL时，
LGWR在向磁盘写入Redo Block之前，会执行Foreground Process所生成的Checksum的完整性检查。

对Redo Block进行Checksum的生成与验证（KROWN#155653）

DB_BLOCK_CHECKSUM

检测出破损的操作

DB_BLOCK_CHECKSUM

§ Primary Database’s Alert.log

用Automatic Block Media Recovery(ABR)来自动修复时所参考的日志

DB_BLOCK_CHECKING

§ 在Buffer Cache上变更Block之后，
通过检测理论性的完整性，可以检测Logical Corruption

– 即使是Checksum正确，可以检测理论性的不正确的状态。

– 变更后被标记的理由是由于DML数据更新以外，包含变更。

§ 例：伴随着DBWR的写出，变更Block Header的信息

– 不经过Buffer Cache的Direct Load Operation不在本次检测对象中

– 根据参数的设定值，可以控制检测的对象以及水平

概要

DB_BLOCK_CHECKING

§ 上位设定值包含下位设定值的检测

– 比如，「LOW」=「OFF」的检查 + 所有的Block的Block Header Check

– 基本上，Buffer Cache上的Block内容在变更后会执行检测，但
Block Header Check是RAC的实例之间的Block在传送后也会被执行

设定值的操作

DB_BLOCK_CHECKING

– 包含没有commit的redo，发生错误之前的Block的状态

à 同一会话中继续进行事务的可能

检测出破损后的操作（Disk上的Block是正常的情况）

DB_BLOCK_CHECKING

§ Oracle Client

§ Alert.log

检测出时的参考日志（Disk上的Block正常的情况）

DB_BLOCK_CHECKING

– 即使自动修复失败，请注意成功时，只会返回同样的错误。

§ 之后，重新访问block时，会发生ORA-1578 或者ORA-600

à 需要手动操作 Block Media Recovery（ABR不会发动）

检测出破损后的操作（Disk上的Block也破损的情况）

DB_BLOCK_CHECKING

§ Oracle Client

§ Alert.log

检测时的参考日志（Disk上的Blockも发生破损的情况）

Soft Corrupt

§ 检测出破损的（Oracle的破损与认识）Block中被加上的标记

– 访问这些被标记的块时，会发生ORA-1578。

不是与Physical Corruption / Logical Corruption同列的单词

DB_LOST_WRITE_PROTECT

§ 不管从存储装置中block的写入完成是否发出通知，实际上磁盘中没有被写入的事项。

– 因为Data Block的构造是正常的，
即使访问发生了Lost Write 的Block也没有发生错误

à 可能会有提供不正确的数据给顾客或者用户的风险

à 不正确的数据污染可能会扩散

§ Lost Write所影响的例子

– 不管是否有没有储备，都作为有储备来接收订单

– 不管是否接受订单都会变成没有接收订单

Lost Write是什么？

DB_LOST_WRITE_PROTECT

§ Data Guard（Physical Standby Database）中检测出Lost Write的机制Primary Database中从磁盘中读出Block时，生成验证用的redo

§ Data File Number

§ Data Block Address（DBA）

§ System Change Number（SCN）

• Data Guard的机制中，对Physical Standby Database传送Redo

• 比较验证Standby Database方的对象Block与Redo内的SCN

à　如果SCN不一致时，可能发生Lost Write

概要

DB_LOST_WRITE_PROTECT

§ 需要Primary Database以及Standby Database两方面的设定

– Primary 或者 Standby Database的两方面的设*定都是NONE***时无效

§ Primary因为没有生成验证用的redo无法用Standby来验证

§ 即使用Primary来生成Redo，用Standby也不能验证

各个设定值的操作

DB_LOST_WRITE_PROTECT

§ 检测Lost Write的时机是？

– 从Primary Database中发生Lost Write的Block，从磁盘向Buffer读入时生成的Redo，Standby Database的MRP验证时

– à　不是发生Lost Write时（对Disk的写入不足的瞬间）
　　将发生Lost Write的*Block从Disk***读入的时机

– 特定Block中，即使发生Lost Write，只要不使用那个Block

§ 搜索结果以及更新事务都正常

§ 不正确的数据不会传染到其他的块中

Lost Write的发生以及检测时机不一致

DB_LOST_WRITE_PROTECT

§ 验证用redo的生成仅限于向Buffer Cache读入Block时

– 不经过Buffer Cache的Direct Path Read中，不生成验证用的*Redo***

§ 非Data Guard环境中，用TYPICAL以上的设定来生成验证用Redo

– Media recovery（前滚）时可以验证Lost Write

§ 还可以验证Standby Database中发生的Lost Write

– 与Primary中生成验证用redo + Standby中验证这样的功能相同

– 仅限这种情况，但也可以用ASM Mirror以及ABR来自动修复（后面将讲到）

動作の補足

DB_LOST_WRITE_PROTECT

Lost Write检测的流程（Primary中发生Lost Write的情况）

DB_LOST_WRITE_PROTECT

§ Primary中发生的Lost Write在Standby中被检测出来

– 记录Standby DatabaseのAlert.log中发生的ORA-752

– 为了保护Standby Database的数据，自动停止MRP进程

§ 终止以后的Redo适用，防止不正确数据导致的数据污染

– PRxx进程的Trace File（xxx_xxx_prxx_xxx.trc）中记录了Block的详细内容

§ 访问Primary中的对象block时所生成的Redo记录的Dump

§ Standby中所保存的对象Block的Dump

– 从这些信息中，确认以后会发生Lost Write

检测出Lost Write后的操作（Primary中发生Lost Write的情况）

DB_LOST_WRITE_PROTECT

Wed Oct 23 19:18:08 2013

Hex dump of (file 7, block 131) in trace file /u01/app/oracle/diag/rdbms/orcls/orcls1/trace/orcls1_pr02_1401.trc

Reading datafile ‘+DATA/orcls/datafile/lw.281.829593935’ for corruption at rdba: 0x01c00083 (file 7, block 131)

Read datafile mirror ‘DATA_0004’ (file 7, block 131) found same corrupt data (logically corrupt)

Read datafile mirror ‘DATA_0006’ (file 7, block 131) found same corrupt data (logically corrupt)

STANDBY REDO APPLICATION HAS DETECTED THAT THE PRIMARY DATABASE

LOST A DISK WRITE OF BLOCK 131, FILE 7

NO REDO AT OR AFTER SCN 3987667 CAN BE USED FOR RECOVERY.

Recovery of Online Redo Log: Thread 1 Group 7 Seq 103 Reading mem 0

Slave exiting with ORA-752 exception

Errors in file /u01/app/oracle/diag/rdbms/orcls/orcls1/trace/orcls1_pr02_1401.trc:

ORA-00752: 由于恢复导致检测出数据块的写入欠缺。

ORA-10567: Redo is inconsistent with data block (file# 7, block# 131, file offset is 1073152 bytes)

ORA-10564: tablespace LW

ORA-01110: *数据文件7: ‘+DATA/orcls/datafile/lw.281.829593935’***

ORA-10561: block type ‘TRANSACTION MANAGED DATA BLOCK’, data object# 87637

Wed Oct 23 19:18:12 2013

Recovery Slave PR02 previously exited with exception 752

Wed Oct 23 19:18:12 2013

MRP0: Background Media Recovery terminated with error 448

Errors in file /u01/app/oracle/diag/rdbms/orcls/orcls1/trace/orcls1_pr00_1395.trc:

ORA-00448: バックグラウンド・プロセスが正常終了しました。

MRP0: Background Media Recovery process shutdown (orcls1)

Lost Write检出时的Standby Database的Alert.log输出的一部分摘选

（Primary中发生Lost Write的情况）

DB_LOST_WRITE_PROTECT

Hex dump of (file 7, block 131)

Dump of memory from 0x00000000F03B0000 to 0x00000000F03B2000

0F03B0000 0000A206 01C00083 003CC55A 04010000 [……..Z.<…..]

…

STANDBY REDO APPLICATION HAS DETECTED THAT THE PRIMARY DATABASE

LOST A DISK WRITE OF BLOCK 131, FILE 7

The block read on the primary had SCN 3977658 (0x0000.003cb1ba) seq 1 (0x01)

while expected to have SCN 3982682 (0x0000.003cc55a) seq 1 (0x01)

The block was read at SCN 3987667 (0x0000.003cd8d3), BRR:

CHANGE #1 TYP:2 CLS:6 AFN:7 DBA:0x01c00083 OBJ:87637 SCN:0x0000.003cb1ba SEQ:1 OP:23.2 ENC:0 RBL:1

…

REDO RECORD – Thread:1 RBA: 0x000067.00000128.0010 LEN: 0x0034 VLD: 0x10

SCN: 0x0000.003cd8d3 SUBSCN: 1 10/23/2013 19:18:16

(LWN RBA: 0x000067.00000126.0010 LEN: 0003 NST: 0001 SCN: 0x0000.003cd8cf)

CHANGE #1 TYP:2 CLS:6 AFN:7 DBA:0x01c00083 OBJ:87637 SCN:0x0000.003cb1ba SEQ:1 OP:23.2 ENC:0 RBL:1

Block Read – afn: 7 rdba: 0x01c00083 BFT:(1024,29360259) non-BFT:(7,131)

scn: 0x0000.003cb1ba seq: 0x01

flags: 0x00000006 ( dlog ckval )

续） PRxx的Trace File输出例（Primary中发生了Lost Write的情况）

DB_LOST_WRITE_PROTECT

Lost Write检测流程（Standby中发生Lost Write的案例）

DB_LOST_WRITE_PROTECT

§ 在Standby Database检测出了 Lost Write。

§ 与Primary中发生的Lost Write不同，自动进行修复

– Primary Database中因为保存了最新的正常Block

 用Data Guard的Automatic Block Media Recovery自动修复

– 如果自动修复的话就不会发生ORA-752（Alert.log中没有输出）

§ MRP进程正常继续运行

检测出Lost Write的操作（Standby中发生了Lost Write的情况）

DB_LOST_WRITE_PROTECT

检测出Lost Write后的操作总结

DB_ULTRA_SAFE Parameter

§ 通过变更DB_ULTRA_SAFE参数值，可以一起变更3个参数值

– 本参数的默认值是FALSE

– 明确地个别设定各个参数时，优先个别设定值

提高检测水平的3个参数的一致设定

OLTP系的Workload中性能比较

§ Exadata X2-2 Quarter Rack（ 2node RAC结构）

§ Oracle Database 11g Release 11.2.0.4

§ Oracle Grid Infrastructure 11g Release 11.2.0.4

§ Exadata Storage Server Software 11g Release 11.2.3.2.1

– Write Through Mode

验证环境

OLTP系的Workload中性能比较

§ 在下面的用户脚步中反复执行泛用的SQL

Workload与Transaction的定义

OLTP系的Workload中性能比较

§ 验证变更了Transaction的比例的三个Workload

– 伴随着Test#的增加，将更新处理的比例

Transaction比例的模式

OLTP系的Workload中性能比较

§ 在前页的各个Workload中，验证下面四个设定模式

Parameter的设定模式

OLTP系的Workload中性能比较

検証結果）全模式测试的吞吐量（Transaction Per Sec）

OLTP系的Workload中性能比较

验证结果）全模式测试的Response Time

OLTP系的Workload中性能比较

验证结果）全模式测试的CPU使用率

Oracle DatabaseのData Protection

§ Oracle Recovery Manager（RMAN）

– VALIDATE评论Data File、Backup File(Image Copy / Backup Piece)的破损检查

– CHECK LOGICAL句

§ 追加Physical Corruption的检测，检测Logical Corruption

§ 可以与BACKUP / VALIDATE / RECOVER 评论同时检测

§ ANALYZE TABLE <TableName> VALIDATE STRUCTURE CASCADE ;

– 可能检测出表与索引Block之间的不完整

提供定期破损检测的功能

RMAN> Validate Check Logical

RMAN> validate check logical datafile 18 ;

Starting validate at 28-AUG-13

using target database control file instead of recovery catalog

…

channel ORA_DISK_1: starting validation of datafile

channel ORA_DISK_1: specifying datafile(s) for validation

input datafile file number=00018 name=+DATA/orcl/datafile/tt.316.824637849

channel ORA_DISK_1: validation complete, elapsed time: 00:00:01

List of Datafiles

=================

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN

—- —— ————– ———— ————— ———-

18 OK 1 277 320 224873829

File Name: +DATA/orcl/datafile/tt.316.824637849

Block Type Blocks Failing Blocks Processed

———- ————– —————-

Data 0 4

Index 0 1

Other 0 38

Finished validate at 28-AUG-13

参考输出日志

SQL> ANALYZE TABLE VALIDATE STRUCTURE CASCADE ;

– 如果发生了Physical Block Corruption，那么就会发生ORA-1578

– 如上所示，发生ORA-1499的话，表以及索引之间发生不完整的状态。

§ 需要区分到底是在那个块中发生了故障。

– 例：FULL提示，或者使用NO_INDEX提示执行全表搜索

KROWN#68739 参考输出日志

Oracle Database 的
Data Protection

Block Corruption 造成的影响
对商业造成的直接伤害

ü数据损伤造成的业务停止

ü修复时间变长

ü修复工作的人为错误、二次灾害

ü探究原因的时间变长

à 需要快速找出故障以及执行正确的修复

Oracle MAA的数据保护功能一览

Oracle Database 11g Release 2

Oracle MAA

主要的修复功能

Oracle ASM / Mirroring

§ 需要Normal（二重化）以及High Redundancy（三重化）的结构

§ 检测到破损时，参考Mirror数据自动进行修复

– 对于Oracle Client（ORA错误不会返回）

– Primary中关于发生的Lost Write，不在此次对象中

§ Standby中发生Lost Write时，使用Mirror防止误检测

§ Redo Block破损时，参考Mirror数据

– Normal/High Redundancy的ASM Diskgroup上配置Redo Log file

§ Doc ID 1274318.1

通过Oracle Client对块进行自动修复

Automatic Block Media Recovery

由于Active Data Guard进行穿透性的块修复

Automatic Block Media Recovery

§ Standby中检测出块破损是，就会用逆向ABR进行自动修复

– 对象块破损

§ Standby中的Physical Block Corruption

– 用DB_BLOCK_CHECKSUM的功能检测出的项目

– 对Soft Corrupt以及标记完成的块进行访问时，不会有任何操作
（ABR至多在标记之前进行尝试修复。）

§ Standby中发生了Lost Write的Block

– Primary中发上来Lost Write的Block不在此次对象中
（更新不正确的块时，生成的不正确的redo是无法修复的）

操作的追加信息

Automatic Block Media Recovery

§ Primary Database’s Alert.log

检测块破损以及用ABR进行自动修复时的参考日志

Oracle Recovery Manager

§ 储存修复对象块的Data File只有在ONLINE状态下才可以执行

§ Lost Write导致的块破损不在此次对象中

– 修复指定块的命令

§ 一般而言，在V$DATABASE_BLOCK_CORRUPTION视图中，指定被表示的block（检测出破损的块）（执行时也需要检测破损）

– 将在V$DATABASE_BLOCK_CORRUPTION视图中表示的视图一起修复的命令。

用RECOVER命令来对块进行修复

Oracle Recovery Manager

§ 对块单位的恢复来说，需要可以Restore的正常块

– 正常块的搜索地址的优先顺序如下所示

• Active Data Guard的Physical Standby Database

• Flashback Log （Recovery执行中的Database内）

• RMAN Image Backup （Recovery执行中的Database内）

– 前述都是正常的块被Restore，用自动恢复来实现最新化

– 如果块单位中无法修复时，需要用数据文件单位的恢复

怎样的块才会是可以成为块单位中恢复基准的正常块呢？

Oracle Enterprise Manager Cloud Control 12^c

§ 根据Database环境状况，可以简单实现最适合的恢复

– 考虑块单位中的恢复是否可能的命令

– 用Oracle Enterprise Manager可以简单地执行恢复能

§ 执行例

– 在下面的Database环境中，我们将介绍用Physical Block Corruption以及
EM（Data Recovery Advisor）来修复的顺序

§ 没有Active Data Guard环境

§ 有Flashback Log（但是是过了保存期限的状态）

§ 没有RMAN Image Copy Backup

Data Recovery Advisor中自动生成修复命令

Oracle Enterprise Manager Cloud Control 12^c

早期掌握Incident Manager的故障

Oracle Enterprise Manager Cloud Control 12^c

Data Recovery Advisor的执行

Oracle Enterprise Manager Cloud Control 12^c

§ 考虑Database环境的状况（3页之前），
判断为块单位中的恢复是不可能的

Data Recovery Advisor的执行结果

Oracle Enterprise Manager Cloud Control 12^c

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com

ORACLEデータベース によくあるエラ の解決策

プロのOracle Databaseの復旧サービスを提供 携帯番号： +86 13764045638 メール：service@parnassusdata.com

Block Corruption 造成的影响 对商业运行造成的直接影响

数据损坏的主要原因

复杂并且有可能被分割的Layer的风险

系统中数据保护的机制

§ 单个系统中，考虑到了一些无法防御的的风险的机制。

– 例：因为人为错误删除数据时，在RAID结构中无法防御。

§ 复制数据中保护一致性与确实性的机制。

– 例：部分备份导致的数据完整性的欠缺。

§ 考虑到迅速切换以及确实的修复的机制。

– 例：由于灾害恢复训练不足，实际上无法切换，无法返回的备份。

为了确实地能提高工作的可持续性需要什么？

Oracle Maximum Availability Architecture

§ Maximum Availability Architecture (MAA) 是基于oracle验证完成的高可用性的数码与成功事例的最佳实践。

§ MAA的目的

– 为了修复、查出、规避所有的停止的情况提供最优实践。

– 在样本中构成最优的高可用性的架构。

§ 不受硬件以及OS影响（不需要特定的高价的产品以及技术）

§ 马上就可以提供高可用性的解决策略（Oracle事先完成验证）

高可用性的数码的最佳实践。

Oracle MAA 的整体映像

Low-Cost, Integrated, Fully Active, High ROI

Oracle MAA の Data Protection

Oracle Database 11g Release 2

Oracle Database 的 Data Protection功能

Type of Block Corruption

§ Data Block Corruption(Doc ID 840978.1)

– Physical Block Corruption

– Logical Block Corruption

§ Other Corruptions

– Control file Corruption

§ Use control file mirror & Copy

– Redo Corruption

§ ASM Mirroring / Use multiplexed log file

– Dictionary Corruption（Doc ID 136697.1）

Doc ID 1088018.1 Master Note for Handling Oracle Database Corruption

KROWN#152523 [Master Notes] Corruption(破损)

Data Block Corruption

§ 物理性的块内数据破损状态的块

– 破损例

• Block Header不正确

• Block Header与Footer信息不一致

• 数据缺失

• Block配置地点不正确

• 0隐藏的Block

Physical Block Corruptions

Data Block Corruption

§ Block中结构发生理论破损的状态的块

– 物理性（Block Header以及Footer的信息、Checksum的计算结果）正确

– 破损例

• 行碎片的开始与结束位置不在块内。

• 行碎片直接发生重叠

• 锁定行碎片的ITL编号不正确

• ITL展示的锁定中的行碎片的数量与实际不一致

• Block中空白区域的尺寸不正确

ORACLEデータベースによくあるエラの解決策

プロのOracle Databaseの復旧サービスを提供
携帯番号： +86 13764045638 メール：service@parnassusdata.com

Block Corruption 造成的影响
对商业运行造成的直接影响

Oracle Database 的
Data Protection功能

§ Oracle Database的Block并不是单独的bit的罗列，
明确地事先定义过的结构

à 正是有理解了Block的结构的Oracle Database，可以检测
Physical Corruption以及Logical Corruption。

à 并且，通过同时使用Oracle Database的各*Technology，可以提高***检测水平

à　如果Checksum不一致时，因为块内数据可能会产生变更，可能判断为Physical Block Corruption以及发生了

à 每次更新时，因为不会计算*Checksum而高效化***

• DBWR在向Disk写入时，验证Checksum

对于Buffer Cache上被更新的Data Block
Checksum嵌入时机

– Release 11.1以降では、Redo BlockのChecksumの
将生成处理由生成了Redo的Foreground Process担当à减轻 LGWR的负荷

§ 但是，Release 11.1~11.2.0.1中，设定为FULL时，
LGWR在向磁盘写入Redo Block之前，会执行Foreground Process所生成的Checksum的完整性检查。

§ 在Buffer Cache上变更Block之后，
通过检测理论性的完整性，可以检测Logical Corruption