A customer's Oracle alert log reported an ORA-600 [kddummy_blkchk] [18038] failure. The relevant entries in the alert log:
Errors in file /u01/app/oracle/admin/prdw014a/udump/prdw014a_ora_4377.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [222], [5792], [18038], [], [], [], []
Mon May 17 15:27:53 2010
Trace dumping is performing id=[cdmp_20100517152753]
Mon May 17 15:27:53 2010
Doing block recovery for file 2 block 504365
Block recovery from logseq 159276, block 166357 to scn 10934615778284
Mon May 17 15:27:53 2010
Recovery of Online Redo Log: Thread 1 Group 4 Seq 159276 Reading mem 0
  Mem# 0: /u01/app/oracle/dataPRDW014/redo04a_1.log
  Mem# 1: /u01/app/oracle/dataPRDW014/redo04a_2.log
Block recovery completed at rba 159276.167277.16, scn 2545.3924010007
Doing block recovery for file 222 block 5792
Block recovery from logseq 159276, block 84741 to scn 10934615778283
Mon May 17 15:27:53 2010
Recovery of Online Redo Log: Thread 1 Group 4 Seq 159276 Reading mem 0
  Mem# 0: /u01/app/oracle/dataPRDW014/redo04a_1.log
  Mem# 1: /u01/app/oracle/dataPRDW014/redo04a_2.log
Block recovery completed at rba 159276.167277.16, scn 2545.3924009964
Mon May 17 15:27:55 2010
Block recovery completed at rba 159276.167277.16, scn 2545.3924009964
Mon May 17 15:27:55 2010
Corrupt Block Found
TSN = 67, TSNAME = OBA_DATA
RFN = 222, BLK = 5792, RDBA = 931141280
OBJN = 1657288, OBJD = 1699775, OBJECT = W_ORG_DS, SUBOBJECT =
SEGMENT OWNER = BMS_OBA_DW, SEGMENT TYPE = Table Segment
Mon May 17 15:32:56 2010
Trace dumping is performing id=[cdmp_20100517153255]
The trace file generated by the ORA-600 error is attached below:
prdw014a_ora_4377.trc
/u01/app/oracle/admin/prdw014a/udump/prdw014a_ora_4377.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining Scoring Engine
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/102prdw014
System name: SunOS
Node name: v08k405
Release: 5.9
Version: Generic_122300-29
Machine: sun4u
Instance name: prdw014a
Redo thread mounted by this instance: 1
Oracle process number: 109
Unix process pid: 4377, image: oracle@v08k405
*** 2010-05-17 15:23:15.391
*** ACTION NAME:() 2010-05-17 15:23:15.389
*** MODULE NAME:(pmdtm@v04k413 (TNS V1-V3)) 2010-05-17 15:23:15.389
*** SERVICE NAME:(prdw014_taf) 2010-05-17 15:23:15.389
*** SESSION ID:(789.48811) 2010-05-17 15:23:15.389
TYP:0 CLS: 4 AFN:222 DBA:0x378016a0 OBJ:1699775 SCN:0x09f1.e9e3a3eb SEQ: 2 OP:14.4
kteop redo - redo operation on extent map
RESIZE: entry:0 delta: ... .. ..
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kddummy_blkchk], [222], [5792], [18038], [], [], [], []
Current SQL statement for this session:
INSERT /*+ SYS_DL_CURSOR */ INTO bms_oba_dw.W_ORG_DS 
("W_CUSTOMER_CLASS","NAME","ST_ADDRESS","CITY","STATE","ZIPCODE","COUNTRY","CUST_TYPE_CODE","CUST_TYPE_NAME","ACTIVE_FLG","DOM_ULT_DUNS_NUM","DUNS_NUM","EMP_COUNT","FORMED_DT","GLBLULT_DUNS_NUM","ANNUAL_REVENUE","BRANCH_FLG","BIRTH_DT","NO_OF_CHILDREN","LEGAL_NAME","FAMILY_NAME","OTHER_NAME","PREFERRED_NAME","INDV_ADDNL_TITLE","INDV_TITLE","INDV_MARITAL_STATE","INDV_GENDER","EMAIL_ADDRESS","RELATIONSHIP_STATE","INDV_EMP_STATUS","FAX_NUM","PAGER_NUM","MOBILE_NUM","LIFE_CYCLE_STATE","CUST_CAT_CODE","CUST_CAT_NAME","SIC_CODE","SIC_NAME","GOVT_ID_TYPE","GOVT_ID_VALUE","DUNNS_SITE_NAME","DUNNS_GLOBAL_NAME","DUNNS_LEGAL_NAME","CUSTOMER_NUM","ALT_CUSTOMER_NUM","ALT_PHONE_NUM","INTERNET_HOME_PAGE","LEGAL_STRUCT_CODE","LEGAL_STRUCT_NAME","DIRECT_MKTG_FLG","SOLICITATION_FLG","CUSTOMER_HIER1_CODE","CUSTOMER_HIER1_NAME","CUSTOMER_HIER2_CODE","CUSTOMER_HIER2_NAME","CUSTOMER_HIER3_CODE","CUSTOMER_HIER3_NAME","CUSTOMER_HIER4_CODE","CUSTOMER_HIER4_NAME","CUSTOMER_HIER5_CODE","CUSTOMER_HIER5_NAME","CUSTOMER_HIER6_CODE","CREATED_BY_ID","CHANGED_BY_ID","CREATED_ON_DT","CHANGED_ON_DT","AUX1_CHANGED_ON_DT","AUX2_CHANGED_ON_DT","AUX3_CHANGED_ON_DT","AUX4_CHANGED_ON_DT","SRC_EFF_FROM_DT","SRC_EFF_TO_DT","DELETE_FLG","DATASOURCE_NUM_ID","INTEGRATION_ID","TENANT_ID","X_CUSTOM","MOT_ATTRIBUTE1","MOT_ATTRIBUTE2","MOT_ATTRIBUTE3","MOT_ATTRIBUTE4","MOT_ATTRIBUTE5","MOT_ATTRIBUTE6","MOT_ATTRIBUTE7","MOT_ATTRIBUTE8","MOT_ATTRIBUTE9","MOT_ATTRIBUTE10","MOT_ATTRIBUTE11","MOT_ATTRIBUTE12","MOT_ATTRIBUTE13","MOT_ATTRIBUTE14","MOT_ATTRIBUTE15","MOT_ATTRIBUTE16","MOT_ATTRIBUTE17","MOT_ATTRIBUTE18","MOT_ATTRIBUTE19","MOT_ATTRIBUTE20","MOT_PARTY_TYPE","MOT_PHONE_AREA_CODE","MOT_ORIG_SYSTEM_REFERENCE","MOT_PER_EMAIL_ADDR","MOT_PERSON_FIRST_NAME","MOT_PHONE_EXTENSION","MOT_ALTERNATE_NAME","MOT_TELEPHONE_TYPE","MOT_SALES_CHANNEL_CODE","MOT_ACCOUNT_NAME","MOT_ATTRIBUTE_CATEGORY","MOT_INTERCOMPANY_FLAG","MOT_PARTY_NUMBER","MOT_PARTY_ID","MOT_LAST_UPDATE_LOGIN","MOT_CUST_CLASS_DESC","MOT_RECEIPT_METHOD_NAME","
MOT_PHONE_NUMBER","MOT_CONTACT_POINT_PURPOSE","MOT_SALESREP_NAME","MOT_PAY_TERMS_CODE","MOT_PAY_TERMS_NAME") VALUES (NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL) ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedmp()+744 CALL ksedst() 000000840 ? FFFFFFFF7FFF620C ? 000000000 ? FFFFFFFF7FFF2D00 ? FFFFFFFF7FFF1A68 ? FFFFFFFF7FFF2468 ? kgerinv()+200 PTR_CALL 0000000000000000 000106800 ? 10681C1C4 ? 10681C000 ? 00010681C ? 000106800 ? 10681C1C4 ? kseinpre()+96 CALL kgerinv() 106816B18 ? 000000000 ? 1064564C0 ? 000000003 ? FFFFFFFF7FFF6750 ? 000001430 ? ksesin()+52 CALL kseinpre() 000106800 ? 000000003 ? 00000025F ? 10681C1B8 ? FFFFFFFF7FFF6750 ? 1068167D8 ? kco_blkchk()+2568 CALL ksesin() 1064564C0 ? 000000003 ? 000106800 ? 0000000DE ? 000000000 ? 000106800 ? kcoapl()+1284 CALL kco_blkchk() 0001900DE ? 0378016A0 ? 0000016A0 ? 00000FC00 ? 000000000 ? FFFFFFFF7FFF89F8 ? kcbapl()+412 CALL kcoapl() 000000002 ? 000002300 ? 000105800 ? 583DBC000 ? 106816C98 ? 00010598F ? kcrfw_redo_gen()+16 CALL kcbapl() FFFFFFFF7FFF89B8 ? 376 583FB7870 ? FFFFFFFF7AF3AA3C ? B6E9FABD0 ? 000000000 ? 583DBC000 ? kcbchg1_main()+1363 CALL kcrfw_redo_gen() 000000000 ? 2 FFFFFFFF7FFF76C8 ? B693A9998 ? 000000000 ? 3800135A0 ? FFFFFFFF7FFF7700 ? kcbchg1()+1324 CALL kcbchg1_main() 000100C00 ? FFFFFFFF7FFF7850 ? 
000000000 ? 583FB7870 ? 000000000 ? 00000FFFF ? ktuchg()+968 CALL kcbchg1() 000106819 ? 1068195B8 ? 1068195C8 ? 106819000 ? 000000000 ? 106819000 ? ktbchg2nt()+104 CALL ktuchg() 000000002 ? 000000001 ? FFFFFFFF7FFF8928 ? B67A76DD8 ? 000000000 ? 000000000 ? kteopgen()+728 CALL ktbchg2nt() FFFFFFFF7FFF89B8 ? FFFFFFFF7FFF87C4 ? 000000000 ? 000000000 ? FFFFFFFF7FFF8928 ? FFFFFFFF7FFF9D98 ? kteopresize()+2276 CALL kteopgen() FFFFFFFF7FFF89B8 ? 000000006 ? 000106800 ? 000000002 ? 10682247C ? 106816B18 ? ktsxbmdelext1()+968 CALL kteopresize() FFFFFFFF7FFF9D98 ? 8 FFFFFFFF7FFF9E88 ? 000000004 ? 000000002 ? 000000000 ? 000000000 ? ktsstrm_segment()+6 CALL ktsxbmdelext1() FFFFFFFF7AD33A78 ? 308 0000016A0 ? 0003FFFFF ? FFFFFFFF7AD33A78 ? 106822000 ? 000000043 ? ktsmg_trimf()+1208 CALL ktsstrm_segment() 000000000 ? 000000003 ? 000000001 ? 000100C00 ? 106819000 ? 000000000 ? kdbltrmt()+1916 CALL ktsmg_trimf() 00010598F ? 0000010E2 ? 106822478 ? 000000005 ? 10682247C ? 106816B18 ? kdblfpl()+96 CALL kdbltrmt() 000000006 ? 000000000 ? FFFFFFFF7AD33918 ? 000000180 ? 0000010E4 ? 000000008 ? kdblfl()+1948 CALL kdblfpl() FFFFFFFF7FFFB0AC ? FFFFFFFF7AD33918 ? 000000000 ? FFFFFFFF7AD33AE0 ? FFFFFFFF7AD33A68 ? 000000000 ? klafin()+160 CALL kdblfl() FFFFFFFF7FFFB0AC ? FFFFFFFF7AD33918 ? 000000000 ? 000000001 ? 000000008 ? 000106800 ? kpodpfin()+76 CALL klafin() FFFFFFFF7AF35C40 ? 1059BF2B8 ? 000000321 ? FFFFFFFF7AD33918 ? 000000000 ? 000400000 ? kpodpmop()+320 CALL kpodpfin() FFFFFFFF7AF35C40 ? 000106816 ? 000106800 ? 000000321 ? 000000001 ? FFFFFFFF7AF35BC8 ? opiodr()+1496 PTR_CALL 0000000000000000 000000301 ? 000000321 ?
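The alert log and the trace describe the same block and the same SCN in different notations (RDBA = 931141280 versus DBA:0x378016a0, and "to scn 10934615778283" versus SCN:0x09f1.e9e3a3eb). The following is a minimal Python sketch for converting between them. It assumes the usual smallfile layout of a relative DBA (upper 10 bits relative file number, lower 22 bits block number) and the wrap.base form of a 10g SCN; the function names are illustrative and not part of any Oracle tool.

```python
from typing import Tuple

BLOCK_BITS = 22  # assumption: smallfile tablespace, 10-bit file# + 22-bit block#


def decode_rdba(rdba: int) -> Tuple[int, int]:
    """Split a 32-bit relative DBA into (relative file#, block#)."""
    return rdba >> BLOCK_BITS, rdba & ((1 << BLOCK_BITS) - 1)


def scn_to_number(wrap: int, base: int) -> int:
    """Combine the wrap.base form of an SCN into a single number."""
    return (wrap << 32) | base


if __name__ == "__main__":
    # RDBA from the alert log and DBA from the trace point at the same block.
    print(decode_rdba(931141280))    # (222, 5792)
    print(decode_rdba(0x378016A0))   # (222, 5792)
    # SCN:0x09f1.e9e3a3eb from the trace equals the alert log's target SCN.
    print(scn_to_number(0x09F1, 0xE9E3A3EB))  # 10934615778283
```

Both notations resolve to file 222, block 5792, which matches the second and third arguments of the ORA-600 error.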
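After confirming with Oracle Support, the problem was identified as Bug 5386204 - Block corruption / OERI[kddummy_blkchk] after direct load of ASSM segment [ID 5386204.8].
The "kteop redo - redo operation on extent map" record in the trace is an important piece of evidence for identifying this bug.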
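The two indicators mentioned so far (check code 18038 in the kddummy_blkchk arguments, and a "kteop redo" record in the trace) can be checked mechanically. The sketch below is only a first-pass triage heuristic written for this article, not an Oracle-supplied tool, and a positive result still needs to be confirmed with Oracle Support; the function names are hypothetical.

```python
import re

ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: (.*)")


def parse_ora600_args(line):
    """Return the bracketed ORA-600 arguments found on one line, in order."""
    match = ORA600_RE.search(line)
    if not match:
        return []
    return re.findall(r"\[([^\]]*)\]", match.group(1))


def looks_like_bug_5386204(trace_text):
    """Heuristic: kddummy_blkchk with check code 18038 plus a kteop redo record."""
    for line in trace_text.splitlines():
        args = parse_ora600_args(line)
        if len(args) >= 4 and args[0] == "kddummy_blkchk" and args[3] == "18038":
            return "kteop redo" in trace_text
    return False


if __name__ == "__main__":
    sample = (
        "kteop redo - redo operation on extent map\n"
        "ORA-00600: internal error code, arguments: "
        "[kddummy_blkchk], [222], [5792], [18038], [], [], [], []\n"
    )
    print(looks_like_bug_5386204(sample))
```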
The Oracle note for this bug:
Bug 5386204 Block corruption / OERI[kddummy_blkchk] after direct load of ASSM segment
This note gives a brief overview of bug 5386204.
The content was last updated on: 08-FEB-2010
This bug is alerted in Note:580561.1
Affects:
Product (Component): Oracle Server (Rdbms)
Range of versions believed to be affected: Versions < 11
Versions confirmed as being affected:
* 9.2.0.8
* 10.2.0.1
* 10.2.0.2
* 10.2.0.3
* 10.2.0.4
Platforms affected: Generic (all / most platforms affected)
Fixed:
This issue is fixed in
* 9.2.0.8 Patch 15 on Windows Platforms
* 10.2.0.2 Patch 15 on Windows Platforms
* 10.2.0.3 Patch 5 on Windows Platforms
* 10.2.0.4.1 (Patch Set Update)
* 10.2.0.4 Patch 2 on Windows Platforms
* 10.2.0.5 (Server Patch Set)
* 11.1.0.6 (Base Release)
Symptoms:
* Internal Error May Occur (ORA-600)
* Corruption (Logical)
* ORA-600 [kddummy_blkchk]
Related To:
* Direct Path Operations
* ASSM Space Management (Bitmap Managed Segments)
Description
Block corruption / ORA-600 [kddummy_blkchk] [file#] [block#] [18038]
can occur on a segment which has been direct loaded.
(The corruption shows as a PAGETABLE SEGMENT HEADER
having blocks in the “Auxillary Map” outside of the “Extent Map”
range)
Note:
This bug was previously incorrectly listed as fixed in 10.2.0.4
Further details on this issue can be found in Note:580561.1
ORA-600 [kddummy_blkchk][][][18038] during extent operations like TRUNCATE on ASSM tablespaces [ID 580561.1]
Applies to:
Oracle Server – Enterprise Edition – Version: 9.2.0.8 to 10.2.0.4
Information in this document applies to any platform.
Description
This alert describes the problem in Bug 5386204 / Note 5386204.8.
Block corruption with error ORA-600 [kddummy_blkchk] [file#] [block#] [18038]
may be reported during a DROP/TRUNCATE
The corruption shows as a PAGETABLE SEGMENT HEADER having blocks in the
“Auxillary Map” outside of the “Extent Map” range.
The same operation terminated without any error in previous RDBMS versions
like Oracle9i.
Likelihood of Occurrence
The object is populated by direct path operations such as SQL*Loader using DIRECT=Y for example.
The object is stored in a Locally Managed Tablespace (LMT) that is using ASSM (dba_tablespaces.segment_space_management='AUTO').
Bug 5386204 is mostly hit when db_block_size=16384.
Possible Symptoms
One evidence of hitting this bug might be the value 18038 in the third argument of
ORA-600 [kddummy_blkchk] where [18038] is a check error code.
@Error check code 18038 means that the “Data dba” stored in “Auxiliary Map” is out of range
@TYP:0 CLS: 4 AFN:234 DBA:0x3a801554 OBJ:0 SCN:0x000b.290f5e0d SEQ: 1 OP:14.2
@In this case “Data dba: 0x3a801555” stored in the “Auxiliary Map” is equal to 0x3a801551 + 4 which is out of the extent 0, hence the error.
@Note that extent 0 is 4 blocks, so extent 0 starts from 0x3a801551 to 0x3a801554.
Workaround or Resolution
In order to identify objects that are affected by the corruption, use the procedure
DBMS_SPACE_ADMIN.ASSM_TABLESPACE_VERIFY
@(DBMS_SPACE_ADMIN.ASSM_SEGMENT_VERIFY is also an option but it requires the patch for Bug 6760697)
How to execute DBMS_SPACE_ADMIN.ASSM_TABLESPACE_VERIFY:
alter system set DB_BLOCK_CHECKSUM = OFF;
-- open a new session and run :
exec DBMS_SPACE_ADMIN.assm_tablespace_verify('<Tablespace Name>', DBMS_SPACE_ADMIN.TS_VERIFY_DEEP, DBMS_SPACE_ADMIN.SEGMENT_VERIFY_DEEP);
See if any trace file is generated in the directory defined by user_dump_dest.
The absence of a trace file means that no corrupt segments were found.
Note: DB_BLOCK_CHECKSUM has to be disabled; otherwise the same ORA-600 error may be produced
@Oracle check block type 0x23=PAGETABLE SEGMENT HEADER even if DB_BLOCK_CHECKING is not set.
Example of output from DBMS_SPACE_ADMIN.ASSM_TABLESPACE_VERIFY
Segment header [dba: 0x003a801554, (file 234,block 5460)]
Segment object id: 7825838; inc. no.: 0
*********verifying extent map and tablespace bitmap consistency
---------
Verifying extent map and auxilliary extent map consistency in the segment
Block Corruption in seg hdr / ext map block: rdba: 0x3a801554, err code: 18038
Identifying the object using the segment header information.
Segment header [dba: 0x003a801554, (file 234,block 5460)]
select *
from DBA_EXTENTS
where FILE_ID = 234
and 5460 between block_id and block_id + blocks - 1;
Identifying the object using the Segment object id information.
Segment object id: 7825838; inc. no.: 0
select *
from DBA_OBJECTS
where DATA_OBJECT_ID = 7825838;
@How to execute DBMS_SPACE_ADMIN.ASSM_SEGMENT_VERIFY
WORKAROUNDs:
Disable DB_BLOCK_CHECKSUM for any action taken.
Note: DB_BLOCK_CHECKSUM has to be disabled; otherwise the same ORA-600 error may be produced
alter system set DB_BLOCK_CHECKSUM = OFF;
-- open a new session
DROP TABLE .. PURGE;
ALTER TABLE .. MOVE ..;
Create table as select (CTAS)
export/import, etc
Patches
The patch prevents the corruption from taking place. Affected objects will have to be recreated.
This bug was previously incorrectly listed as fixed in 10.2.0.4.
@This problem is fixed in the 10.2.0.5 Patch Set (not available yet and still without a due date).
This problem is fixed in the 11.1.0.6 rdbms release.
One off patches for this issue are available for some platforms / versions.
See Patch 5386204 for patch availability.
Modification History
03-JUN-2008 – Initial Alert version
04-JUN-2008 – Implemented correction
11-JUN-2008 – Added info about DB_BLOCK_CHECKSUM
13-JUN-2008 - Published
References
BUG:5386204 – ORA-600 [KDDUMMY_BLKCHK] ERRORS WITH CODE 18038
NOTE:5386204.8 – Bug 5386204 – Block corruption / OERI[kddummy_blkchk] after direct load of ASSM segment
Bug 5386204: ORA-600 [KDDUMMY_BLKCHK] ERRORS WITH CODE 18038
Bug Attributes
Type: B - Defect
Fixed in Product Version: 11.1
Severity: 1 - Complete Loss of Service
Product Version: 10.2.0.2
Status: 80 - Development to Q/A
Platform: 226 - Linux x86-64
Created: 12-Jul-2006
Platform Version: 2.6.5-7.191-SMP
Updated: 20-May-2010
Base Bug: -
Database Version: 10.2.0.2
Affects Platforms: Generic
Product Source: Oracle
Related Products
Line: Oracle Database Products
Family: Oracle Database
Area: Oracle Database
Product: 5 - Oracle Server - Enterprise Edition
Hdr: 5386204 10.2.0.2 RDBMS 10.2.0.2 SPACE PRODID-5 PORTID-226 ORA-600
Abstract: ORA-600 [KDDUMMY_BLKCHK] ERRORS WITH CODE 18038
*** 07/12/06 12:59 am ***
TAR:
----
PROBLEM:
--------
1. Clear description of the problem encountered
Customer is getting repeated ORA-600 [kddummy_blkchk] errors reported with
internal check code 18038 on tables which have had bulk deletions made. This
has occurred on both production and test instances.
2. Pertinent configuration information (MTS/OPS/distributed/etc)
RAC, ASM
3. Indication of the frequency and predictability of the problem
Problem is intermittent but occurs several times a day impacting the
customers ability to work.
4. Sequence of events leading to the problem
Error is typically signalled on a COMMIT most likely following a deletion
from the tables.
5. Technical impact on the customer. Include persistent after effects.
Severe, as it occurs multiple times per day, and corrupt the underlying
tables preventing further data loads.
DIAGNOSTIC ANALYSIS:
--------------------
The trace files show that the problem occurs following a bulk deletion from
the underlying tables, which appear to corrupt the extent map, as the segment
header dump shows 1 extent of 4 blks, but the deletion entry in the redo
stream shows one extent of 8 blks, e.g.:
REDO RECORD - Thread:1 RBA: 0x0005da.000e5e34.01c0 LEN: 0x00fc VLD: 0x01
SCN: 0x000d.37eacce9 SUBSCN: 5 07/11/2006 10:29:53
CHANGE #1 TYP:0 CLS:60 AFN:39 DBA:0x09c322e0 OBJ:4294967295
SCN:0x000d.37eacce9 SEQ: 2 OP:5.1
ktudb redo: siz: 112 spc: 15940 flg: 0x0022 seq: 0x011d rec: 0x06
xid: 0x0016.020.000005b6
ktubu redo: slt: 32 rci: 5 opc: 14.5 objn: 2 objd: 93662 tsn: 12
Undo type: Regular undo Undo type: Last buffer split: No
Tablespace Undo: Yes
0x00000000
kteopu undo – undo operation on extent map
segdba: 0x87e3cc class: 4 mapdba:0x87e3cc offset: 3
rbr extent – dba: 0x0 nbk: 0x0
kteop redo – redo operation on extent map
ADD: dba:0x803673d len:8 at offset:1
DEFAULT: ???
SETSTAT: exts:2 blks:16 lastmap:0x0 mapcnt:0
CHANGE #2 TYP:0 CLS: 4 AFN:2 DBA:0x0087e3cc OBJ:93662 SCN:0x000d.37eacce9
SEQ: 1 OP:14.4
kteop redo – redo operation on extent map
DELETE: entry:1
shift back: dba:0x0 len:0
SETSTAT: exts:1 blks:8 lastmap:0x0 mapcnt:0
WORKAROUND:
-----------
None
RELATED BUGS:
-------------
Bug 4949123 - ORA-600: [KDDUMMY_BLKCHK], [541], [147050], [18038]
REPRODUCIBILITY:
----------------
Consistently occurring at customers site.
TEST CASE:
----------
n/a
STACK TRACE:
------------
ksedst ksedmp ksfdmp kgerinv kseinpre ksesin kco_blkchk kcoapl kcbapl
kcrfw_redo_gen kcbchg1_main kcbchg1 ktuchg ktbchg2nt kteopgen kteopresize
ktsxbmdelext1 ktsstrm_segment ktsmg_icmt_prepare ktcifc ktucmt ktpcmt ktcrcm
ktdcmt k2lcom k2send xctctl xctcom_with_options kksExecuteCommand opiexe
opipls opiodr rpidrus skgmstack rpidru rpiswu2 rpidrv psddr0 psdnal
pevm_EXECC pfrinstr_EXECC pfrrun_no_tool pfrrun plsql_run peicnt kkxexe
opiexe kpoal8 opiodr ttcpip opitsk opiino opiodr opidrv sou2o opimai_real
main __libc_start_main _start
SUPPORTING INFORMATION:
-----------------------
alertlogs and trace files
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
n/a
DIAL-IN INFORMATION:
--------------------
n/a
IMPACT DATE:
------------
21-JUL-2006
*** 07/12/06 02:34 am *** (CHG: Asg->NEW OWNER OWNER)
A redo dump of the segment header during the entire procedure execution was
requested on 06 Aug and supplied on 09 Aug so why are you asking for this
information again when you already have it? Please check that file
(redo_1.trc in bug5386204_07Aug.zip), and let me know if you need anything
else.
*** 09/19/06 02:39 am *** (CHG: Sta->30)
Uploaded the requested information in file bug5386204_Oct02.zip.
*** 11/27/06 11:13 am ***
Here is one theory we (space group) have on this bug so far:
During direct load one of the segments does not get loaded with any data. The
segment is empty and the first extent has 8 blocks (this is 16k block size).
However it goes through the usual high water mark movement phase (even though
the hwm does not move). During the hwm movement phase, the segment is trimmed
close to 64k boundary. For ASSM segment with 16k block size, this means the
segment will be left with no data blocks after the trim- 4 blocks after the
trim would represent bitmaps and segment header only.
There are two issues here:
(1) Why was ktsstrm_segment called on an empty (or unloaded) segment at first
place?
(2) Even if it was called, why is segment trimmed to 64k boundary?
I’m working on the 2nd issue and will give an update soon.
*** 11/29/06 03:18 am *** (CHG: Pri->1)
*** 11/29/06 03:18 am ***
*** 11/29/06 03:25 am *** -> CLOSED
*** 11/29/06 05:31 pm ***
*** 11/30/06 10:36 pm ***
We ran into some issues (bugs) while testing the code for the diagnostic
patch. I was hoping to have it finished by today but it seems it’ll take some
more time and I’m pretty hopeful of having it ready to go by tomorrow evening
(PST). I’m really sorry for the delay.
*** 12/01/06 07:16 pm ***
*** 12/02/06 05:05 pm ***
Sorry for the delay in replying. I would expect the long regressions to be
complete by sunday afternoon PST. I should be ready to release the patch by
sunday evening if things go fine. Will keep this page updated on my progress.
*** 12/03/06 05:30 pm ***
*** 12/04/06 05:35 pm ***
It seems most of major issues with the long regressions have been taken care
of and I hope to get a clean run on the farm soon, by tomorrow end of day and
the patch should be on its way soon after.
I had a question though, that will help me in getting the patch out faster. I
wanted to know if the customer has had any diagnosability patches installed
on their 10.2.0.2.0 release version.
Another thing which I would like to mention here is that my patch modifies
only one file (ktss.c) in the RDBMS code.
*** 12/05/06 02:19 am ***
*** 12/05/06 04:42 pm ***
*** 12/05/06 05:06 pm ***
I was hoping to have all the farm regressions (and the patch) done by today
evening but it seems farm is taking a bit long to finish the regressions.
I’ll work on the patch as soon as I have the regressions done. Sorry for the
delay. I’ll provide an update on that in the next few hours.
*** 12/05/06 09:07 pm ***
My regressions are still moving very slowly through the queue on the farm.
The farm seems to be busy with 11g Beta 4 deadline round the corner. My
regressions have been on the farm for more than a day now. I’ll work on the
patch as soon as I have a clean farm run.
*** 12/06/06 06:15 pm ***
Still waiting on the clean farm runs. Fortunately, I’ve been able to get a
high priority on the farm jobs. So, I expect things to run clean soon. Will
keep things updated here.
*** 12/07/06 05:54 pm ***
Got my farm runs completed last night but got a small number of diffs. Have
been trying to isolate them and hopefully soon, everything should be clean.
Farm has been giving those diffs over and over again though those look
unrelated to my change. Currently, verifying them on my linux workstation.
*** 12/08/06 06:01 pm ***
*** 12/08/06 06:23 pm ***
Have been able to run almost all the long regressions locally and things look
clean. There’s just a couple of long regressions which I’m still running and
I should expect to be ready to go as soon as they are completed. Should be
able to start the patch building soon.
*** 12/11/06 01:07 am ***
There’s one long regression which seems to be broken. I’m currently working
on that to have it run clean. Will update as soon as I have it running clean.
*** 12/11/06 01:23 pm ***
Everything is clean now. Working on starting the patch building process.
*** 12/11/06 02:46 pm ***
The customer has confirmed that following application of the supplied patch
the error no longer occurred when running the testcase, which ran through to
completion after about 8 hours. They are resetting the testcase, and will
run it again to verify this, but the initial response is that this looks to
have resolved the problem.
Can you confirm if the patch would need to be rebuilt as a permanent fix,
ie. any diagnostics to be removed etc. or is it actually the full fix anyway?
*** 12/13/06 07:19 am ***
The customer has confirmed the following:
1. Rerun the test for the 2nd time with patched rdbms: completed quickly and
without any problems.
2. Rolled back the patch: the test failed as expected within 30 minutes.
3. Re-applied the patch and ran the test once again: completed ok.
This appears to confirm that the patch resolves the problem so could we have
an answer to the previous update?
*** 12/13/06 07:47 pm ***
That is good news.
No additional diagnostics have been added to the patch. So, it’s not needed
to be rebuilt. I guess the supplied patch should be complete in itself.
*** 12/14/06 12:40 am ***
Thanks for the update.
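The note explains that loading data through direct path, for example SQL*Loader with DIRECT=Y, can trigger this bug with some probability.
The bug only affects objects stored in locally managed tablespaces (LMT) that use automatic segment space management (ASSM), and it is more likely to occur when the standard block size is 16k (Bug 5386204 is mostly hit when db_block_size=16384.)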
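The bug log quoted above offers a possible reason for the 16k sensitivity: according to the space group's theory, the empty segment is trimmed to a 64k boundary, and with a 16k block size the 4 blocks left after the trim hold only the bitmaps and the segment header. The arithmetic can be sketched as follows; the 64k boundary comes from that theory rather than from documented behaviour, and the function name is hypothetical.

```python
TRIM_BOUNDARY_BYTES = 64 * 1024  # boundary named in the bug log's theory


def blocks_left_after_trim(db_block_size):
    """Number of blocks that fit in one 64k unit for a given block size."""
    return TRIM_BOUNDARY_BYTES // db_block_size


if __name__ == "__main__":
    for size in (8192, 16384):
        print(size, blocks_left_after_trim(size))
    # 8192  -> 8 blocks remain after the trim
    # 16384 -> 4 blocks remain, which the bug log says are all metadata
```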
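Most databases run with db_block_checksum enabled; this parameter controls whether Oracle verifies blocks as they are read. [18038] is one of the check codes reported by kddummy_blkchk, and it means that a Data dba stored in the Auxiliary Map of the segment header is out of range. Here is a dump of a segment header as an example: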
Start dump data blocks tsn: 4 file#: 4 minblk 139 maxblk 139
buffer tsn: 4 rdba: 0x0100008b (4/139)
scn: 0x0000.000f327e seq: 0x01 flg: 0x04 tail: 0x327e2301
frmt: 0x02 chkval: 0x619e type: 0x23=PAGETABLE SEGMENT HEADER
Hex dump of block: st=0, typ_found=1
.......
Extent Control Header
-----------------------------------------------------------------
Extent Header:: spare1: 0 spare2: 0 #extents: 9 #blocks: 72
last map 0x00000000 #maps: 0 offset: 2716
Highwater:: 0x0101e1f1 ext#: 8 blk#: 8 ext size: 8
#blocks in seg. hdr's freelists: 0
#blocks below: 65
mapblk 0x00000000 offset: 8
Unlocked
--------------------------------------------------------
Low HighWater Mark :
Highwater:: 0x0101e1f1 ext#: 8 blk#: 8 ext size: 8
#blocks in seg. hdr's freelists: 0
#blocks below: 65
mapblk 0x00000000 offset: 8
Level 1 BMB for High HWM block: 0x0101e1e9
Level 1 BMB for Low HWM block: 0x0101e1e9
--------------------------------------------------------
Segment Type: 1 nl2: 1 blksz: 8192 fbsz: 0
L2 Array start offset: 0x00001434
First Level 3 BMB: 0x00000000
L2 Hint for inserts: 0x0100008a
Last Level 1 BMB: 0x0101e1e9
Last Level II BMB: 0x0100008a
Last Level III BMB: 0x00000000
Map Header:: next 0x00000000 #extents: 9 obj#: 51806 flag: 0x10000000
Inc # 0
Extent Map
-----------------------------------------------------------------
0x01000089 length: 8
0x0101e1a1 length: 8
0x0101e1a9 length: 8
0x0101e1b9 length: 8
0x0101e1c1 length: 8
0x0101e1c9 length: 8
0x0101e1d9 length: 8
0x0101e1e1 length: 8
0x0101e1e9 length: 8
Auxillary Map
--------------------------------------------------------
Extent 0 : L1 dba: 0x01000089 Data dba: 0x0100008c
Extent 1 : L1 dba: 0x01000089 Data dba: 0x0101e1a1
Extent 2 : L1 dba: 0x0101e1a9 Data dba: 0x0101e1aa
Extent 3 : L1 dba: 0x0101e1a9 Data dba: 0x0101e1b9
Extent 4 : L1 dba: 0x0101e1c1 Data dba: 0x0101e1c2
Extent 5 : L1 dba: 0x0101e1c1 Data dba: 0x0101e1c9
Extent 6 : L1 dba: 0x0101e1d9 Data dba: 0x0101e1da
Extent 7 : L1 dba: 0x0101e1d9 Data dba: 0x0101e1e1
Extent 8 : L1 dba: 0x0101e1e9 Data dba: 0x0101e1ea
--------------------------------------------------------
Second Level Bitmap block DBAs
--------------------------------------------------------
DBA 1: 0x0100008a
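The Auxiliary Map (spelled "Auxillary Map" in the dump) lists, for every extent of the segment, its level 1 bitmap block and the data block address at which the actual data in that extent begins (Data dba). For example, the Data dba of Extent 0 must lie within
(0x0100008A ~ 0x01000090); otherwise it is out of range.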
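The range check can be sketched in Python using the Extent Map and Auxillary Map values from the dump above. This sketch treats the whole extent, from its start DBA through start + length - 1, as the valid range, following the wording of Note 580561.1 ("extent 0 is 4 blocks, so extent 0 starts from 0x3a801551 to 0x3a801554"); that is an assumption, and the real kernel check may be stricter. All names are illustrative.

```python
def data_dba_in_extent(extent_start, length, data_dba):
    """True if data_dba lies inside [extent_start, extent_start + length - 1]."""
    return extent_start <= data_dba <= extent_start + length - 1


def out_of_range_extents(extent_map, data_dbas):
    """Return the extent numbers whose Data dba falls outside the extent."""
    return [
        ext_no
        for ext_no, ((start, length), dba) in enumerate(zip(extent_map, data_dbas))
        if not data_dba_in_extent(start, length, dba)
    ]


# (start dba, length) pairs copied from the Extent Map in the dump.
EXTENT_MAP = [
    (0x01000089, 8), (0x0101E1A1, 8), (0x0101E1A9, 8),
    (0x0101E1B9, 8), (0x0101E1C1, 8), (0x0101E1C9, 8),
    (0x0101E1D9, 8), (0x0101E1E1, 8), (0x0101E1E9, 8),
]
# Data dba values copied from the Auxillary Map in the dump.
AUX_DATA_DBAS = [
    0x0100008C, 0x0101E1A1, 0x0101E1AA, 0x0101E1B9, 0x0101E1C2,
    0x0101E1C9, 0x0101E1DA, 0x0101E1E1, 0x0101E1EA,
]

if __name__ == "__main__":
    # Healthy segment header from the dump: no extent is flagged.
    print(out_of_range_extents(EXTENT_MAP, AUX_DATA_DBAS))  # []
    # Corrupt case from Note 580561.1: a 4-block extent starting at
    # 0x3a801551 whose Data dba is 0x3a801555.
    print(out_of_range_extents([(0x3A801551, 4)], [0x3A801555]))  # [0]
```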
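DROP and TRUNCATE are the main operations that trigger this bug, because both of them need to use the Auxiliary Map in the PAGETABLE SEGMENT HEADER.
The workaround Oracle recommends is mainly to "rebuild" the PAGETABLE SEGMENT HEADER by moving the segment (ALTER TABLE ... MOVE).
For this case, Oracle Support suggested the following workaround: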
1-. Make sure the below query will return the table mentioned above:
SQL> select owner, object_name, object_type, SUBOBJECT_NAME, OBJECT_ID,
DATA_OBJECT_ID, CREATED,LAST_DDL_TIME,TIMESTAMP
from DBA_OBJECTS
where DATA_OBJECT_ID =1699775;
If so continue:
SQL>alter system set DB_BLOCK_CHECKSUM = OFF;
Find all indexes for W_ORG_DS table.
SQL> select owner, index_name, index_type, table_name , table_owner from dba_indexes
Where table_owner = 'BMS_OBA_DW' and
Table_name = 'W_ORG_DS';
connect as BMS_OBA_DW
SQL> desc W_ORG_DS
If this table does not have a LONG column, then ALTER TABLE table_name MOVE is like a CTAS but better, since it keeps the same object name and keeps related objects such as indexes. If it has a LONG column, then export/truncate/import needs to be used.
SQL>Alter table W_ORG_DS Move;
Then rebuild all indexes for the W_ORG_DS table as per the above query, i.e.
SQL>Alter index <index_name> rebuild;
To avoid the problem, please apply the patch for bug 5386204; see note 580561.1 for further information.
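ALTER TABLE ... MOVE changes the rowids of the table, so its indexes are left UNUSABLE until they are rebuilt, which is why the rebuild step follows the move. As a minimal sketch, the statements for this step could be generated from the output of the dba_indexes query. The helper below is hypothetical, assumes plain unquoted identifiers, and uses a made-up index name (W_ORG_DS_U1) purely for illustration; review the generated statements before running them.

```python
def build_move_script(owner, table, indexes):
    """Return the MOVE statement followed by one REBUILD per (owner, index)."""
    statements = ["ALTER TABLE {}.{} MOVE;".format(owner, table)]
    for index_owner, index_name in indexes:
        statements.append("ALTER INDEX {}.{} REBUILD;".format(index_owner, index_name))
    return statements


if __name__ == "__main__":
    # W_ORG_DS_U1 is a hypothetical index name, not taken from the case.
    for stmt in build_move_script("BMS_OBA_DW", "W_ORG_DS",
                                  [("BMS_OBA_DW", "W_ORG_DS_U1")]):
        print(stmt)
```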
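Oracle's documentation states that the bug is fixed in the first Patch Set Update for 10.2.0.4 (10.2.0.4.1) and in 10.2.0.5.
Note: the bug was originally believed to be fixed in 10.2.0.4 itself, but it was later confirmed that "This bug was previously incorrectly listed as fixed in 10.2.0.4".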