一套HP-UX Itanium平台上的10.2.0.2,实例意外终止,维护人员尝试重启实例,在数据库打开后数秒,smon后台进程报ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [856039], [6110]错误,实例再次意外终止。
部分告警日志如下:
Corrupt Block Found TSN = 50, TSNAME = TS_DNI_AAL_12 RFN = 120, BLK = 849708, RDBA = 504166188 OBJN = 701796, OBJD = 701796, OBJECT = MAP_WOL_SILJUK, SUBOBJECT = SEGMENT OWNER = DBOWN, SEGMENT TYPE = Table Segment Corrupt Block Found TSN = 50, TSNAME = TS_DNI_AAL_12 RFN = 121, BLK = 897927, RDBA = 508408711 OBJN = 701796, OBJD = 701796, OBJECT = MAP_WOL_SILJUK, SUBOBJECT = SEGMENT OWNER = DBOWN, SEGMENT TYPE = Table Segment Mon May 4 19:38:19 2009 Errors in file /oracle/admin/TDAY2DB/udump/tday2db_ora_2080.trc: ORA-00600: internal error code, arguments: [kddummy_blkchk], [121], [897927], [6110], [], [], [], [] Mon May 4 19:38:19 2009 Errors in file /oracle/admin/TDAY2DB/udump/tday2db_ora_2077.trc: ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [849708], [6110], [], [], [], [] Mon May 4 21:58:55 2009 Recovery of Online Redo Log: Thread 1 Group 5 Seq 90582 Reading mem 0 Mem# 0: /dev/vx/rdsk/day2db1tdg03/redo05.log Block recovery completed at rba 90582.42.16, scn 1858.566253676 ORACLE Instance TDAY2DB (pid = 22) - Error 81 encountered while recovering transaction (14, 22) on object 701796. Mon May 4 21:58:55 2009 Errors in file /oracle/admin/TDAY2DB/bdump/tday2db_smon_17651.trc: ORA-00081: address range [0x60000000000BD230, 0x60000000000BD234) is not readable ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [856039], [6110], [], [], [], [] Mon May 4 21:58:55 2009 Errors in file /oracle/admin/TDAY2DB/bdump/tday2db_smon_17651.trc: ORA-00081: address range [0x60000000000BD230, 0x60000000000BD234) is not readable ORA-00081: address range [0x60000000000BD230, 0x60000000000BD234) is not readable ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [856039], [6110], [], [], [], [] Mon May 4 21:58:57 2009 Errors in file /oracle/admin/TDAY2DB/bdump/tday2db_p020_17710.trc: ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [856039], [6110], [], [], [], [] Mon May 4 21:58:58 2009 Doing block recovery for file 120 block 856039 Block recovery from logseq 90582, block 37 to scn 7980615489643 Mon May 4 21:58:58 2009 Recovery of Online Redo Log: Thread 1 Group 5 Seq 90582 Reading mem 0 Mem# 0: /dev/vx/rdsk/day2db1tdg03/redo05.log Block recovery completed at rba 90582.42.16, scn 1858.566253676 Mon May 4 21:58:58 2009 SMON: Restarting fast_start parallel rollback Mon May 4 21:58:58 2009 Errors in file /oracle/admin/TDAY2DB/bdump/tday2db_p000_17661.trc: ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [856039], [6110], [], [], [], [] Mon May 4 21:58:59 2009 Doing block recovery for file 120 block 856039 Block recovery from logseq 90582, block 37 to scn 7980615489643 Mon May 4 21:58:59 2009 Recovery of Online Redo Log: Thread 1 Group 5 Seq 90582 Reading mem 0 Mem# 0: /dev/vx/rdsk/day2db1tdg03/redo05.log Block recovery completed at rba 90582.42.16, scn 1858.566253676 Mon May 4 21:58:59 2009 SMON: ignoring slave err,downgrading to serial rollback Mon May 4 21:59:00 2009 Errors in file /oracle/admin/TDAY2DB/bdump/tday2db_smon_17651.trc: ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [856039], [6110], [], [], [], [] Mon May 4 21:59:09 2009 Errors in file /oracle/admin/TDAY2DB/bdump/tday2db_pmon_17633.trc: ORA-00474: SMON process terminated with error Mon May 4 21:59:09 2009 PMON: terminating instance due to error 474 Instance terminated by PMON, pid = 17633
当Oracle进程在读取数据块时会做一系列逻辑检测,当发现块当中存在逻辑讹误就会触发该ORA-00600 [kddummy_blkchk]等内部错误;[kddummy_blkchk]内部函数的功能大致与[kdBlkCheckError]相仿,它们都有3个参数argument:
ORA-600 [kddummy_blkchk] [file#] [block#] [check code]
ORA-600 [kdBlkCheckError] [file#] [block#] [check code]
file#即问题块所在datafile的文件号,block#即问题块的块号,check code为发现逻辑讹误时的检测种类代码;我们也可以通过file#和block#查找到存在问题的对象,譬如这个case中的file#为120,block#为856039,检查种类代码为6110:
Select segment_name,segment_type,owner from dba_extents where file_id=120 and 856039 between block_id and block_id + blocks -1;
当然以上查询是建立在我们能够打开数据库的前提下的,针对由ORA-600[[kddummy_blkchk]或[kdBlkCheckError]引起的实例意外终止及启动实例失败等现象,我们可以通过修改db_block_checking和db_block_checksum 2个参数为false,来阻止Oracle进程对数据块的一些逻辑检测工作:
SQL> alter system set db_block_checking=false; System altered. SQL> alter system set db_block_checksum=false; System altered.
以上参数能够一定程度上规避ORA-600[kddummy_blkchk]或[kdBlkCheckError]的出现;但因为10g中隐式参数_db_always_check_system_ts控制了Oracle是否对system表空间上的对象进行block check和checksum(_db_always_check_system_ts:Always perform block check and checksum for System tablespace),且该隐式参数默认为TRUE;因此你还是有一定概率无法打开数据库,如遇此类argument[a]对应为system表空间的ORA-600[kddummy_blkchk]内部错误,可以尝试使用10513来进一步阻止数据库打开后smon进程的事务恢复(transaction recovery)行为:
SQL> alter system set event='10513 trace name context forever,level 2' scope=spfile; System altered. SQL> shutdown immediate; ORA-01507: database not mounted ORACLE instance shut down. SQL> startup ; ORACLE instance started. Total System Global Area 2634022912 bytes Fixed Size 2086288 bytes Variable Size 2399144560 bytes Database Buffers 218103808 bytes Redo Buffers 14688256 bytes Database mounted. Database opened. SQL> show parameter event NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ event string 10513 trace name context forever,level 2
在这个案例中我们尝试设置db_block_checksum 和 db_block_checking为false,打开了数据库,并进一步对存在问题块的表执行了导出导入的工作,最终解决了问题。
10g中默认参数db_block_checksum为TRUE,所以建议你在解决类似问题后,将该参数还原。
Bug 7662491 – Array Update can corrupt a row. ORA-600 [kghstack_free1] ORA-600 [kddummy_blkchk][6110/6129]
Applies to:
Oracle Server – Enterprise Edition – Version: 10.2.0.1 to 11.1.0.7 – Release: 10.2 to 11.1
Oracle Server – Enterprise Edition – Version: 10.2.0.1 to 11.1.0.7 [Release: 10.2 to 11.1]
Information in this document applies to any platform.
Description
With compatibilty set to 10.2 or higher, a row can be corrupted if it is updated with an Array Update (eg: with plsql FORALL … UPDATE or an UPDATE of several rows) .
Likelihood of Occurrence
The rediscovery information is:
1) COMPATIBLE = 10.2 or higher;
2) there is high concurrency in the block;
3) a transaction is added as part of the 1st set of rows updated by a transaction doing an array update
4) the addition of the transaction necessitates an internal block compaction;
5) the 1st row in the 2nd array update by the same transaction updates a row that was moved during a compact operation on the block to accomodate a new transaction.
Possible Symptoms
Several internal errors can be produced such as:
ORA-600 [kghstack_free2]
ORA-600 [kcbzpbuf_1]
ORA-600 [kcbbvr_verify_disk_blk_1]
ORA-600 [kdourp_inorder2]
ORA-600 [kcbnlc_2]
ORA-7445 [_memmove]
ORA-7445 [ksdfsql]
This may introduce a corruption on the block on disk, and depending of the row corruption DBVERIFY may show errors:
Page <Block#> failed with check code 6110
kdbchk: fsbo(<XX>) wrong, (hsz <YY>)
Page <Block#> failed with check code 6129
When the corruption is on disk RMAN reports it with corruption_type=LOGICAL in v$database_block_corruption
If db_block_checking is set to MEDIUM or FULL the next errors can be produced preventing the corruption going to disk:
ORA-600 [kdBlkCheckError] [file#] [block#] [check code]
Where check code can be 6110, 6129. Other codes may also be
present as the most important fact is an Array Update.
The trace file shows that the error is produced by an UPDATE with OP:11.19 (Array Update) and check code [6110] (“kdbchk: the amount of space used is not equal to block size”) or check code [6129] (“kdbchk: fsbo(<XX>) wrong, (hsz <YY>)”). Note that check codes are not limited to 6129/6110.
Workaround or Resolution
This is caused by Bug 7662491. In order to prevent this issue install Patch 7662491
The fix prevents this corruption to be introduced in the Redo/Archived log and does not fix any existent corruption that is already introduced in the Redo/Archived logs. Meaning that if the patch is installed and a media recovery is attempted using the same archived logs, the same corruption is still introduced.
Workaround:
If the corruption is on disk, recreate the table skipping the corrupt row(s) using rowid or
use dbms_repair to skip the affected block.
Media Recovery like RMAN Blockrecover or “RECOVER DATAFILE” don’t fix the corruption as it is already introduced in the Redo/Archived Logs.
If a block is already corrupt on disk and if it is avoiding a media RECOVERY to continue or preventing the database to be OPEN, an option is to set db_block_checking to Medium or Full. This will mark the block as Soft Corrupt allowing theRECOVERY/OPEN to success and subsequent block reads produce error ORA-1578.
Patches
In order to prevent this issue install Patch 7662491
Hdr: 7662491 10.2.0.4.0 RDBMS 10.2.0.4.0 RAM DATA PRODID-5 PORTID-23 ORA-600
Abstract: INSTANCE CRASH / ORA-600 [KDDUMMY_BLKCHK] HIT DURING RECOVERY
PROBLEM:
——–
Provide the following:
1. Clear description of the problem encountered
Customer did observe 2 instance crashes which have been caused by block
corruptions in the same table (has been recreated after 1st occurance!)
2. Pertinent configuration information (MTS/OPS/distributed/etc)
3. Indication of the frequency and predictability of the problem
Problem is not reproducable at will
4. Sequence of events leading to the problem
Unknown
5. Technical impact on the customer. Include persistent after effects.
Instance crash + unable to startup without special event setting
DIAGNOSTIC ANALYSIS:
——————–
Checking backup’s as made by customer made clear that there was no option to
diagnose the 1st corruption due to fact that all backup’s did contain a
corrupted version of the block.
The 2nd occurrence was on a clean file (table was recreated) and during
restore/recovery on test system the corruption was trapped with block
checking enabled.
The action as done on the table is from a PL/SQL block:
CURSOR c_get_daten IS
SELECT ROWID AS w_rowid,
vollstaendig_flag,
pre_vollstaendig_flag,
pre_verarbeitet_flag,
verarbeitet_flag
FROM CCR$TA_W_ERP_STATUSINT
WHERE pre_vollstaendig_flag IS NOT NULL
OR pre_verarbeitet_flag IS NOT NULL;
—
BEGIN
OPEN c_get_daten;
LOOP
FETCH c_get_daten
BULK COLLECT INTO
status_rowids,
vollstaendig_flag,
pre_vollstaendig_flag,
pre_verarbeitet_flag,
verarbeitet_flag
LIMIT 20000;
l_done := c_get_daten%NOTFOUND;
FORALL i IN 1 .. status_rowids.COUNT
UPDATE CCR$TA_W_ERP_STATUSINT
SET VOLLSTAENDIG_FLAG = NVL (PRE_VOLLSTAENDIG_FLAG(i),
VOLLSTAENDIG_FLAG(i)),
VERARBEITET_FLAG = NVL (PRE_VERARBEITET_FLAG(i),
VERARBEITET_FLAG(i))
WHERE rowid = status_rowids (i);
COMMIT;
EXIT WHEN (l_done);
END LOOP;
CLOSE c_get_daten;
COMMIT;
END;
/
WORKAROUND:
———–
At this moment in time advice has been given to enable
db_block_checking=medium
(there is a serious performance drawback due to this setting)
At moment that error occurs the WorkAround is:
drop table in question + purge recyclebin + create table + populate with
contents
RELATED BUGS:
————-
bug:5115882 UPDATES WITH RETURNING CLAUSE AND BEFORE UPDATE TRIGGER
==> no triggers defined on table
Bug:5386204 Block corruption / OERI[kddummy_blkchk] after direct load of ASSM
segment
==> no direct load done on this object
REPRODUCIBILITY:
—————-
Problem is not reproducable at will
TEST CASE:
———-
STACK TRACE:
————
The following information is from last occurrence + restore/recovery on
testsystem:
AL.log:
——-
Tue Dec 16 18:36:56 2008
Errors in file /appl/local/oracle/admin/pDWH2/bdump/pdwh2_smon_18165.trc:
ORA-600: internal error code, arguments: [kdourp_inorder1], [25], [0],
[94], [44], [2], [255], []
Tue Dec 16 18:36:57 2008
Database Characterset is WE8ISO8859P1
Opening with internal Resource Manager plan
where NUMA PG = 9, CPUs = 8
replication_dependency_tracking turned off (no async multimaster replication
found)
Starting background process QMNC
QMNC started with pid=71, OS id=1701
Tue Dec 16 18:37:09 2008
Completed: alter database open
Tue Dec 16 18:40:58 2008
Doing block recovery for file 571 block 79942
Block recovery from logseq 378305, block 878 to scn 9877033293216
Tue Dec 16 18:40:58 2008
Recovery of Online Redo Log: Thread 1 Group 1 Seq 378305 Reading mem 0
Mem# 0: /database/oracle/pDWH2/emc_redo1/o_pDWH2_redo_T1_G1_M1.dbf
Mem# 1: /database/oracle/pDWH2/redo1/o_pDWH2_redo_T1_G1_M2.dbf
Block recovery completed at rba 378305.3389.16, scn 2299.2903479713
The errors as hit during recovery of above:
alert_pDWH2.log
Sun Dec 21 14:30:22 2008
Errors in file /restore_pDWH2/bdump/pdwh2_p004_11521.trc:
ORA-600: internal error code, arguments: [kddummy_blkchk], [570], [79932],
[6110], [], [], [], []
Sun Dec 21 14:30:24 2008
Errors in file /restore_pDWH2/bdump/pdwh2_p004_11521.trc:
ORA-10562: Error occurred while applying redo to data block (file# 570,
block# 79932)
ORA-10564: tablespace CCR_DAT_M
ORA-1110: data file 570: ‘/restore_pDWH2/db/o_pDWH2_CCR_DAT_M_14.dbf’
ORA-10561: block type ‘TRANSACTION MANAGED DATA BLOCK’, data object# 12680578
ORA-600: internal error code, arguments: [kddummy_blkchk], [570], [79932],
[6110], [], [], [], []
Sun Dec 21 14:30:26 2008
Errors with log
/appl/local/oracle/10.2.0.4/dbs/archo_pDWH2_offredo_1_378256_413043102
Recovery interrupted!
– d_abep_statusint3.sql – showing the pl/sql block as used for update of
table
SUPPORTING INFORMATION:
———————–
following files will be uploaded:
– AL.log – alert.log of production database
– alert_pDWH2.log – alert.log of testsystem where recovery is done
===> keep in mind that it does show as well the 1st corruption, recovery in
question was at 21-dec-2008 14.30
– pdwh2_p004_11521.trc – tracefile as created by recovery
– pdwh2_ora_3738.trc – DUMP OF REDO FROM FILE
‘/appl/local/oracle/10.2.0.4/dbs/archo_pDWH2_offredo_1_378256_413043102’
Opcodes *.*
DBAs: (file # 570, block # 79932) thru (file # 570, block # 79932)
– statusint_no_rows.dmp – export of table with ROWS=N
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-
DIAL-IN INFORMATION:
——————–
IMPACT DATE:
————
RELEASE NOTES:
]]]] With compatibilty set to 10.2, updates to rows that can be done in place
]]]] without resizing the rows or the columns, and on a block where there is
]]]] high concurrency, redo and or undo corruption could result, with the
]]]] concomittant data block corruption.
REDISCOVERY INFORMATION:
Data block corruption due to corrupted vector value in array update. Sometimes
the redo will be fine, but the undo data will be corrupted. The bug is encoun
tered when:
1) compatiblity = 10.2 or higher;
2) there is high concurrency in the block;
3) an itl is added as part of the 1st set of rows updated by a transaction
doing an array update (KDOM2);
4) the addition of the itl necessitates an internal block compaction;
5) the 1st row in the 2nd array update by the same transaction updates a
row that was moved during a compact operation on the block to
accomodate a new itl.
WORKAROUND:
None
ORA-600 [2015] ORA-600 [kddummy_blkchk] [2007] ORA-600 [kdBlkCheckError] [2007] IN UNDO BLOCK
Applies to:
Oracle Server – Enterprise Edition – Version: 11.1.0.6 to 11.1.0.7 – Release: 11.1 to 11.1
Information in this document applies to any platform.
Symptoms
Corrupt UNDO associated to a COMPRESS table if SUPPLEMENTAL LOGGING is enabled producing ORA-600 [2015] on UNDO block.
If db_block_checking is enabled, the next errors can be produced with check code 2007 preventing the corruption going to disk:
ORA-600 [kdBlkCheckError] [file#] [block#] [2007]
ORA-600 [kddummy_blkchk] [file#] [block#] [2007]
The trace file contains:
ERROR: Undo Block Corrupted. Error Code = 2007
ktu4ubck: undo record #<n> with <n> change vectors exceeds allowed 293
Changes
Supplemental Logging is enabled for a COMPRESS table.
Cause
Supplemental Logging on COMPRESS tables is not supported in 11.1.
This restriction is lifted in 11.2. However, contact support about bug 8331063 with similar errors in 11.2
Solution
Use one of the next workarounds:
1. Set _lmn_compression=FALSE.
2. Disable Supplemental Logging.
3. Disable COMPRESSION :
Note that it does not repair existent corruptions but prevent them from happening.
ORA-600 [kddummy_blkchk]
Note: For additional ORA-600 related information please read Note:146580.1
PURPOSE:
This article represents a partially published OERI note.
It has been published because the ORA-600 error has been
reported in at least one confirmed bug.
Therefore, the SUGGESTIONS section of this article may help
in terms of identifying the cause of the error.
This specific ORA-600 error may be considered for full publication
at a later date. If/when fully published, additional information
will be available here on the nature of this error.
SUGGESTIONS:
If the Known Issues section below does not help in terms of identifying
a solution, please submit the trace files and alert.log to Oracle
Support Services for further analysis.
Known Issues:
Caused by: com.sun.sql.jdbc.base.BaseBatchUpdateException: [sunm][Oracle JDBC Driver][Oracle]ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kddummy_blkchk], [13], [737932], [18040], [], [], [], []
1. Run the following and paste the output into the SR. Also, please upload the resulting trace file:
SQL> oradebug setmypid
execute dbms_space_admin.assm_tablespace_verify(‘FEMS_DATA’,dbms_space_admin.TS_VERIFY_BITMAPS)
oradebug tracefile_name
2. Please dump this block to a trace file and upload the trace file:
A. Get the file name and block_size for FILE# 13:
SQL> select name from v$datafile where file#=13;
B. Dump the block using the file name
Do this in a new session so the file only containst the block dump:
SQL> exit
$> sqlplus….
SQL> oradebug setmypid
alter system dump datafile ” block737932;
3. Run DBVerify against this file and upload the results. We want to make sure other blocks are not affected:
$> dbv file= blocksize= logfile=dbv_13.out
4. What is this table used for ?
There are a number of different ways to try to handle this, but if the data in this table can be obtained from another source, or repopulated so that this table could be dropped and recreated, that would be the cleanest.