周五被叫到客户现场解决一套10.2.0.4 RAC数据库上的ORA-00600[13013]内部错误问题,这个问题同事已经在上午通过远程拨号了解过情况,
初步判断是索引存在讹误corruption引起的600。
前期诊断
同事在我抵达现场之前已经做了初步的诊断,该10.2.0.4上的RAC系统主用节点的告警日志中多次出现ORA-00600:[13013], [5001]、ORA-00600:[qertbFetchByRowID]及ORA-00600: [25027] 等内部错误,具体的日志如下:
Fri Sep 16 01:16:54 2011 Errors in file /oravl01/oracle/admin/CRMDB2/udump/crmdb22_ora_2138788.trc: ORA-00600: internal error code, arguments: [13013], [5001], [66209], [50730514], [23], [50730514], [3], [] Fri Sep 16 01:16:55 2011 Trace dumping is performing id=[cdmp_20110916011655] Fri Sep 16 01:17:06 2011 Errors in file /oravl01/oracle/admin/CRMDB2/udump/crmdb22_ora_2138788.trc: ORA-00600: internal error code, arguments: [13013], [5001], [66209], [50730514], [23], [50730514], [3], [] 以下为13013 trace ORA-00600: internal error code, arguments: [13013], [5001], [66209], [50730514], [23], [50730514], [3], [] Current SQL statement for this session: UPDATE CUST_SUBSCRB_PERSONAL A SET a.cust_mobile='A04204441' WHERE a.subscrbid=71524739 ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedst+001c bl ksedst1 088424844 ? 041124844 ? ksedmp+0290 bl ksedst 104A54870 ? ksfdmp+0018 bl 03F30204 kgeriv+0108 bl _ptrgl kgesiv+0080 bl kgeriv 1000D8FE4 ? 0785F7165 ? 10501ED28 ? FFFFFFFFFFF6748 ? 06C626E72 ? ksesic6+0060 bl kgesiv 110072D08 ? 7000000AC52B8F8 ? 7000000AC52B3A8 ? 07FFFFFFF ? 000000000 ? updThreePhaseExe+0c bl 01F9D7A8 4c updexe+041c bl updThreePhaseExe 7000000AC52B8F8 ? 000000000 ? 110540128 ? FFFFFFFFFFF7C80 ? opiexe+27d8 bl updexe 7000000AC52B8F8 ? FFFFFFFFFFF82B8 ? kpoal8+0edc bl opiexe FFFFFFFFFFFB434 ? FFFFFFFFFFFB198 ? FFFFFFFFFFF9608 ? opiodr+0ae0 bl _ptrgl ttcpip+1020 bl _ptrgl opitsk+1124 bl 01F9F2A0 opiino+0990 bl opitsk 000000000 ? 000000000 ? ============ Plan Table ============ ------------------------------------------------------+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ------------------------------------------------------+-----------------------------------+ | 0 | UPDATE STATEMENT | | | | 1 | | | 1 | UPDATE | CUST_SUBSCRB_PERSONAL | | | | | | 2 | INDEX UNIQUE SCAN | CUST_SUBSCRB_PERSONAL_PK| 1 | 33 | 1 | 00:00:01 | ------------------------------------------------------+-----------------------------------+ =============================================================================================================== 同时还伴随有ORA-00600:[qertbFetchByRowID]出现 Fri Sep 16 01:08:57 2011 Errors in file /oravl01/oracle/admin/CRMDB2/udump/crmdb22_ora_2138788.trc: ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], [] ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], [] Current SQL statement for this session: SELECT * from CUST_SUBSCRB_PERSONAL A WHERE a.subscrbid=307557025 ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedst+001c bl ksedst1 578318D500000003 ? FFFFFFFFFFF80D0 ? ksedmp+0290 bl ksedst 104A54870 ? ksfdmp+0018 bl 03F30204 kgerinv+00dc bl _ptrgl kgeasnmierr+004c bl kgerinv 11048AE68 ? 000000000 ? 000000001 ? 104BDEC18 ? 110489398 ? qertbFetchByRowID+0 bl 03F2EF38 d34 opifch2+141c bl 03F2EB1C opifch+003c bl opifch2 1100DD338 ? 000000000 ? FFFFFFFFFFF9980 ? ============ Plan Table ============ ---------------------------------------------------------------+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | ---------------------------------------------------------------+-----------------------------------+ | 0 | SELECT STATEMENT | | | | 1 | | | 1 | TABLE ACCESS BY INDEX ROWID | CUST_SUBSCRB_PERSONAL | 1 | 287 | 1 | 00:00:01 | | 2 | INDEX UNIQUE SCAN | CUST_SUBSCRB_PERSONAL_PK| 1 | | 1 | 00:00:01 | ---------------------------------------------------------------+-----------------------------------+ =============================================================================================================== 此外还会出现ORA-00600: [25027], [6], [1443670797], [], [], [], [], [] Fri Sep 16 15:06:00 2011 Errors in file /oravl01/oracle/admin/CRMDB2/udump/crmdb22_ora_1872436.trc: ORA-00600: internal error code, arguments: [25027], [6], [1443670797], [], [], [], [], [] ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [25027], [6], [4278267905], [], [], [], [], [] Current SQL statement for this session: select q.subscrbid,q.serv_lvl,q.cust_lvl from cust_subscrb_personal q where q.serv_lvl is null or q.cust_lvl is null ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedst+001c bl ksedst1 088424844 ? 041124844 ? ksedmp+0290 bl ksedst 104A54870 ? ksfdmp+0018 bl 03F30204 kgeriv+0108 bl _ptrgl kgesiv+0080 bl kgeriv 000000000 ? 000000000 ? 000000000 ? 105673724 ? 00000000C ? ksesic2+0060 bl kgesiv 000000245 ? 0FFFFFFFF ? 700000010013BD0 ? 700000010018078 ? 110000998 ? krtd2abh+040c bl ksesic2 61C3000061C3 ? 000000000 ? 000000006 ? 000000000 ? 0FF013001 ? 000000FA0 ? 105673FB8 ? 105673FA8 ? kcbgtcr+24a0 bl krtd2abh 700000471D95738 ? 7000001A9F0F050 ? FFFFFFFFFFF78E0 ? 700000471D93500 ? 000000003 ? ktrget+04ac bl kcbgtcr 1105A75A8 ? 000002000 ? 000000000 ? 000001940 ? kdsgrp+00bc bl ktrget 000000064 ? 110000998 ? 700000010008000 ? kdsgnp+0444 bl kdsgrp 000000000 ? 000000000 ? FFFFFFFFFFF80D0 ? kafger+08fc bl kdsgnp 000000000 ? 000000000 ? 111119E38 ? kdstf1100101km+0f58 bl kafger 1105A74D8 ? 111119E38 ? FFFFFFFFFFF8190 ? 000000000 ? 70000046DCA8488 ? 000000000 ? 0FFFF9608 ? 110471948 ? kdsttgr+1a04 bl kdstf1100101km 111119E38 ? 0000102A1 ? 70000042DF6DC50 ? FFFFFFFFFFF8418 ? 0FFFFFFFF ? 000001FE8 ? 000000000 ? 000000000 ? qertbFetch+09b8 bl kdsttgr 111119E38 ? 000000418 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000418 ? BE4610195B2C ? opifch2+141c bl 03F2EB1C opifch+003c bl opifch2 1100DD338 ? 000000000 ? FFFFFFFFFFF9980 ? ============ Plan Table ============ --------------------------------------------------+-----------------------------------+ | Id | Operation | Name | Rows | Bytes | Cost | Time | --------------------------------------------------+-----------------------------------+ | 0 | SELECT STATEMENT | | | | 4898 | | | 1 | TABLE ACCESS FULL | CUST_SUBSCRB_PERSONAL| 15 | 165 | 4898 | 00:00:59 | --------------------------------------------------+-----------------------------------+
我们先从ORA-00600:[13013], [5001]内部错误入手,Mos Note <How to resolve ORA-00600 [13013], [5001] [ID 816784.1]> 比较翔实地介绍了该600错误,其argument的具体含义如下:
Arg [a] Passcount
Arg [b] Data Object number
Arg [c] Tablespace Relative DBA of block containing the row to be updated
Arg [d] Row Slot number
Arg [e] Relative DBA of block being updated (should be same as [c])
Arg [f] Code
我们这里的[13013], [5001], [66209], [50730514], [23], [50730514], [3], [] 就是
data_object_id = 66209 = 0x102A1
RDBA = 50730514 即 3061612 fild id=12 block id =398866
Row Slot number = 23
Code = 3
翻译过来就是对data_object_id为66209 的对象在12号数据文件上的398866块的第23行做代码为3的检测时发现了异常,通过data_object_id可以定位到该对象就是正在执行的SQL语句中的CUST_SUBSCRB_PERSONAL表。
ORA-00600:[13013], [5001]可能由表上的数据行或索引中的记录逻辑讹误(logical Corruption)所造成,这里要强调一下是逻辑讹误,而非物理损坏。逻辑讹误一般是由于Oracle Bug或者memory Corruption 所引起的,该ORA-00600:[13013], [5001]已知可能由5085288和4549673等多个Bug 引发,在版本10.2.0.4 上这些bug 触发概率较高。
伴随ORA-00600:[13013], [5001]发生的ORA-600 [qertbFetchByRowID]常由索引损坏引起,介绍了该问题:
ORA-600 [qertbFetchByRowID] Select Queries on 10.2.0.4 Database [ID 755592.1] Applies to: Oracle Server - Enterprise Edition - Version: 10.2.0.4 and later [Release: 10.2 and later ] Symptoms The following error was received during SELECT operation: ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], [] The call stack contains: qertbFetchByRowID <- qergiFetch <- rwsfcd <- qerhjFetch < - qerjotFetch <- rwsfcd <- qertqoFetch <- qerpxSlaveFetch <- qerpxFetch Cause Index corruption. Solution 1. Analyze the affected object to look for logical corruption. SQL> analyze table TABLENAME validate structure cascade; This will place an exclusive lock on the table. If an error is reported try dropping and recreating the index(es) first. 2. If this problem is occurring on a version less than 10.2.0.4, ensure the patch for Bug 4883635 has been applied if Materialized Views are used. BUG:5866783 - ORA-00600: [QERTBFETCHBYROWID] ON SELECT FROM ONE TABLE
而ORA-00600: [25027]也是类似的问题:
ORA-600 [25027] [ID 284433.1] PURPOSE: This article represents a partially published OERI note. It has been published because the ORA-600 error has been reported in at least one confirmed bug. Therefore, the SUGGESTIONS section of this article may help in terms of identifying the cause of the error. This specific ORA-600 error may be considered for full publication at a later date. If/when fully published, additional information will be available here on the nature of this error. ERROR: Format: ORA-600 [25027] [a] [b] VERSIONS: versions 9.2 and above ARGUMENTS: Arg [a] Tablespace Number (TSN) Arg [b] Decimal Relative Data Block Address (RDBA) SUGGESTIONS: 1. If the Arg [b] (the RDBA) is 0 (zero), then this could be due to fake indexes. The following query will list fake indexes: select do.owner,do.object_name, do.object_type,sysind.flags from dba_objects do, sys.ind$ sysind where do.object_id = sysind.obj# and bitand(sysind.flags,4096)=4096; If the above query returns any rows, check the objects involved and consider dropping them as they can cause this error. 2. Run analyze table validate structure on the table referenced in the Current SQL statement in the related trace file. If the Known Issues section below does not help in terms of identifying a solution, please submit the trace files and alert.log to Oracle Support Services for further analysis.
但是请注意在该故障示例中ORA-00600: [25027]给出的Arg [b] Decimal Relative Data Block Address (RDBA)是一个完全不相干的RDBA,具体原因尚不明确。
小插曲
这当中有一个小的插曲,原来客户这里的应用人员需要尽快在该CUST_SUBSCRB_PERSONAL表上执行一段查询语句,但是该语句一旦执行就会遇到ORA-600 [qertbFetchByRowID]错误,因为是周五小周末,所以如果今天无法运行的话,就要拖到下个礼拜了。
所以被要求优先解决该语句执行的问题,首先看了一下该语句的执行计划:
SQL> explain plan for select count(*) from SHUCRM2C.cust_subscrb_personal a, SHUCRM2C.wzy_bs20110916 b 2 where a.svcnum=b.svcnum and a.countyid='A00' and a.serv_lvl=0; Explained. SQL> select * from table(dbms_xplan.display()); PLAN_TABLE_OUTPUT ------------------------------------------------ Plan hash value: 3616548176 ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 31 | 231 (1)| 00:00:03 | | 1 | SORT AGGREGATE | | 1 | 31 | | | |* 2 | HASH JOIN | | 425 | 13175 | 231 (1)| 00:00:03 | |* 3 | TABLE ACCESS BY INDEX ROWID| CUST_SUBSCRB_PERSONAL | 425 | 8075 | 208 (0)| 00:00:03 | |* 4 | INDEX RANGE SCAN | CUST_SUBSCRB_PERSONAL_2IX | 42462 | | 20 (0)| 00:00:01 | | 5 | TABLE ACCESS FULL | WZY_BS20110916 | 39792 | 466K| 22 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("A"."SVCNUM"="B"."SVCNUM") 3 - filter(TO_NUMBER("A"."SERV_LVL")=0) 4 - access("A"."COUNTYID"='A00') Note ----- - dynamic sampling used for this statement 23 rows selected. SQL> select count(*) from SHUCRM2C.cust_subscrb_personal a, SHUCRM2C.wzy_bs20110916 b 2 where a.svcnum=b.svcnum and a.countyid='A00' and a.serv_lvl=0; select count(*) from SHUCRM2C.cust_subscrb_personal a, SHUCRM2C.wzy_bs20110916 b * ERROR at line 1: ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], []
以上执行计划中对表上的索引CUST_SUBSCRB_PERSONAL_2IX做了range scan后通过获得的rowid到表上去fetch记录(qertbFetchByRowID),在实际fetch by rowid 的时候引发了ORA-600错误。
因为需求较为紧急,所以我还是考虑能否使执行计划绕过该索引,投石问路给语句加上了RULE提示,再次执行:
SQL> select /*+ rule */ 2 count(*) from SHUCRM2C.cust_subscrb_personal a, SHUCRM2C.wzy_bs20110916 b 3 where a.svcnum=b.svcnum and a.countyid='A00' and a.serv_lvl=0; COUNT(*) ---------- 11559 Execution Plan ---------------------------------------------------------- Plan hash value: 2851452146 ------------------------------------------------------------------- | Id | Operation | Name | ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | SORT AGGREGATE | | | 2 | NESTED LOOPS | | | 3 | TABLE ACCESS FULL | WZY_BS20110916 | |* 4 | TABLE ACCESS BY INDEX ROWID| CUST_SUBSCRB_PERSONAL | | 5 | AND-EQUAL | | |* 6 | INDEX RANGE SCAN | CUST_SUBSCRB_SVCNUM_2006 | |* 7 | INDEX RANGE SCAN | CUST_SUBSCRB_PERSONAL_2IX | ------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 4 - filter(TO_NUMBER("A"."SERV_LVL")=0) 6 - access("A"."SVCNUM"="B"."SVCNUM") 7 - access("A"."COUNTYID"='A00') Note ----- - rule based optimizer used (consider using cbo)
想不到居然执行成功了,但是执行计划当中仍有CUST_SUBSCRB_PERSONAL_2IX这个索引,这让我潜意识中认识到很可能是表而非索引存在逻辑讹误。
不管怎么说至少解了燃眉之急,先把这个方法告诉应用人员,然后回过头来继续诊断。
初步验证索引
为了确定到底是表还是索引存在逻辑讹误,一般需要使用analyze table validate structure cascade命令以验证表和索引的结构,但是该命令会要求以共享方式锁住表(TM mode=4),对于更新频繁的生产系统中的关键应用表,这是不可接受的。同事在接手这个故障时已经考虑过该问题了,所以他推荐用查询的方式来检验到底是那些索引存在corruption,具体方法如下:
通过ROWID_CREATE 构造出ROWID SQL> select dbms_rowid.ROWID_CREATE(1,66209,12,398866,23) from dual; DBMS_ROWID.ROWID_C ------------------ AAAQKhAAMAABhYSAAX SQL> select * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX'); no rows selected SQL> SELECT INDEX_NAME,COLUMN_NAME FROM DBA_IND_COLUMNS WHERE TABLE_NAME='CUST_SUBSCRB_PERSONAL' ORDER BY 1; INDEX_NAME COLUMN_NAME ------------------------------ ---------------------------------------- CUST_SUBSCRB_CARD_NUM VIP_CARD_NUM CUST_SUBSCRB_MANAGERID_2006 MANAGERID CUST_SUBSCRB_PERSONAL_2IX COUNTYID CUST_SUBSCRB_PERSONAL_3IX CUST_LVL CUST_SUBSCRB_PERSONAL_PK SUBSCRBID CUST_SUBSCRB_SERV_COUNTY SERV_COUNTYID CUST_SUBSCRB_SVCNUM_2006 SVCNUM IDX_CUST_SUBSCRB_PERSONAL_01 SERV_LVL 之后强制使用index提示使用不同的索引 SQL> select /*+ INDEX(a CUST_SUBSCRB_CARD_NUM ) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX'); no rows selected SQL> select /*+ INDEX(a CUST_SUBSCRB_MANAGERID_2006) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX'); no rows selected SQL> select /*+ INDEX(a CUST_SUBSCRB_PERSONAL_2IX) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX'); select /*+ INDEX(a CUST_SUBSCRB_PERSONAL_2IX) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX') * ERROR at line 1: ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], [] SQL> select /*+ INDEX(a CUST_SUBSCRB_PERSONAL_3IX) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX'); no rows selected SQL> select /*+ INDEX(a CUST_SUBSCRB_PERSONAL_PK ) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX'); select /*+ INDEX(a CUST_SUBSCRB_PERSONAL_PK ) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX') * ERROR at line 1: ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], [] SQL> select /*+ INDEX(a CUST_SUBSCRB_SERV_COUNTY) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX'); no rows selected SQL> select /*+ INDEX(a CUST_SUBSCRB_SVCNUM_2006) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX'); select /*+ INDEX(a CUST_SUBSCRB_SVCNUM_2006) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX') * ERROR at line 1: ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], [] SQL> select /*+ INDEX(a IDX_CUST_SUBSCRB_PERSONAL_01) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX'); select /*+ INDEX(a IDX_CUST_SUBSCRB_PERSONAL_01) */ * from SHUCRM2O.CUST_SUBSCRB_PERSONAL a where rowid=chartorowid('AAAQKhAAMAABhYSAAX') * ERROR at line 1: ORA-00600: internal error code, arguments: [qertbFetchByRowID], [], [], [], [], [], [], []
可以看到使用以上方法在其中4个索引上遇到了ORA-00600:[qertbFetchByRowID]。 但是使用该方法还是无法彻底搞清楚逻辑到底存在于table还是index上?
必要的工具
这里我们要介绍一下validate structure还存在online在线使用的选项,在online模式下整个验证过程不会在表上加任何TM锁,这一点具体可以使用10704 事件来证明:
10704, 00000, "Print out information about what enqueues are being obtained" // *Cause: When enabled, prints out arguments to calls to ksqcmi and // ksqlrl and the return values. // *Action: Level indicates details: // Level: 1-4: print out basic info for ksqlrl, ksqcmi // 5-9: also print out stuff in callbacks: ksqlac, ksqlop // 10+: also print out time for each line SQL> create table validate_me (t1 int); Table created. SQL> create index ind_validate_me on validate_me(t1); Index created. SQL> insert into validate_me select rownum from dba_tables where rownum<201; 200 rows created. SQL> commit; Commit complete. SQL> oradebug setmypid; Statement processed. SQL> oradebug event 10704 trace name context forever,level 10; Statement processed. SQL> analyze table validate_me validate structure cascade online; Table analyzed. SQL> oradebug tracefile_name; c:\app\diag\rdbms\g11r2\g11r2\trace\g11r2_ora_3020.trc g11r2_ora_3020.trc========================================================================= *** 2011-09-18 20:55:25.373 Oradebug command 'event 10704 trace name context forever,level 10' console output: <none> *** 2011-09-18 20:55:49.765 ksqgtl *** TX-00060005-000006a9 mode=6 flags=0x401 timeout=0 *** ksqgtl: xcb=0x31434194, ktcdix=2147483647, topxcb=0x31434194 ktcipt(topxcb)=0x0 *** 2011-09-18 20:55:49.766 ksucti: init txn DID from session DID ksqgtl: ksqlkdid: 0001-001B-00000006 *** 2011-09-18 20:55:49.766 *** ksudidTrace: ksqgtl ktcmydid(): 0001-001B-00000006 ksusesdi: 0000-0000-00000000 ksusetxn: 0001-001B-00000006 ksqgtl: RETURNS 0 *** 2011-09-18 20:55:49.766 ksqrcl: TX,60005,6a9 ksqrcl: returns 0
但是validate strucutre online也有它的缺点,那就是在线模式下结构验证命令将不填充索引的状态信息到index_stats视图,如:
SQL> analyze index ind_validate_me validate structure ; Index analyzed. SQL> select count(*) from index_stats; COUNT(*) ---------- 1 SQL> conn / as sysdba Connected. SQL> analyze index ind_validate_me validate structure online; Index analyzed. SQL> select count(*) from index_stats; COUNT(*) ---------- 0
但是因为我们这里只要用到validate structure的结构验证功能,而对索引的详细状态没有兴趣,所以我们可以充分利用该online模式。
具体验证
使用validate structure online具体验证该问题表和表上的索引:
先仅对表进行验证,以区分到底是表还是索引存在逻辑讹误 SQL> analyze table SHUCRM2O.CUST_SUBSCRB_PERSONAL validate structure online; analyze table SHUCRM2O.CUST_SUBSCRB_PERSONAL validate structure online * ERROR at line 1: ORA-01498: block check failure - see trace file 若验证发现问题会出现ORA-01498错误,并产生trace 文件 kdrchk: row is marked as both a Key and Clustered prow=0x7000001f241c45c flag=0xff Block Checking: DBA = 50730514, Block Type = KTB-managed data block data header at 0x7000001f241c07c kdbchk: bad row tab 0, slot 23 Block header dump: 0x03061612 Object id on Block? Y seg/obj: 0x102a1 csc: 0xb43.ecde68ca itc: 3 flg: E typ: 1 - DATA brn: 0 bdba: 0x3061609 ver: 0x01 opc: 0 inc: 0 exflg: 0 ............ 可以看到问题发生在 23 号槽位上 tab 0, row 23, @0x3e0 tl: 4 fb: KCHDFLPN lb: 0x1 cc: 0 cki: 1 tab 0, row 24, @0x3de tl: 2 fb: --HD---N lb: 0x30 tab 0, row 25, @0x3dc tl: 2 fb: --HD---- lb: 0x30 tab 0, row 26, @0x3d8 tl: 4 fb: KCHDFLPN lb: 0xff cc: 0 cki: 255
这里居然第23个row piece的 flag 是KCHDFLPN 即实际fb = 0xff,也就是该row piece同时被标记为key和clustered(row is marked as both a Key and Clustered),因此不管当服务进程尝试update该问题行记录或者通过ROWID访问该row时都出现了ORA-00600错误,虽然其错误代码不同,但都是由于该数据块中第23行记录的flag存在讹误引起的。
关于该ORA-00600:[13013], [5001]问题的成因和解决方法,更多内容可以参考<手工模拟Oracle数据块逻辑讹误引发ORA-00600:[13013], [5001]一例>一文。
待修订!
这个好像前一阵遇到过,不过解决就完事了,没总结啊,汗颜!