【数据恢复】详解ORA-8103错误

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

 

 

ORA-8103是我们Database Consultant 经常要遇到的一个问题,了解ORA-8103的成因非常重要。

【数据恢复】利用构造ROWID实现无备份情况下绕过ORA-1578、ORA-8103、ORA-1410等逻辑/物理坏块问题

简单来说ORA-8103 的主要成因有2类:

  • 数据块的 block type 类型 是 无效的 或者读出来的块类型与Oracle期望的不一致。  例如 Oracle 认为该数据块的类型为data(type=6),但实际却不是。
  • 数据块中的data_object_id 和 数据字典中的data_object_id不匹配

 

针对ORA-8103问题 我们优先推荐一些措施:

ORA-08103问题的诊断最好是能生成8103错误的ERROR STACK TRACE, 在TRACE中会记录具体引发8103的对象的OBJ和OBJD,这便于我们定位可能存在corruption的对象。

问题在于往往前台进程遇到ORA-08103错误不会在后台生成TRACE文件,这需要我们手动设置8103 触发ERRORSTACK的EVENTS:

ALTER SYSTEM SET EVENTS ‘8103 TRACE NAME ERRORSTACK LEVEL 3’;

解决思路包括:
1. 通过OBJD和DBA定位到具体的表名和块号
2. 有条件的情况下对该表做一个analyze .. validate structure
3. 有条件的情况下对该表所在tablespace做一个 dbms_space_admin.ASSM_TABLESPACE_VERIFY
4. 有条件的情况下move这张表或者相关的分区,尝试绕过该问题
5. 有条件的情况下降该表或分区移动到MSSM表空间上,绕过该问题

execute dbms_space_admin.tablespace_verify(‘&tablespace_name’)
oradebug setmypid
oradebug tracefile_name

execute dbms_space_admin.assm_tablespace_verify(‘&tablespace_name’,dbms_space_admin.TS_VERIFY_BITMAPS)
oradebug setmypid
oradebug tracefile_name

 

 

针对不同的 analyze validate structure 后得到的结果 , 我们可以得到一些初步的结论:

 

如果执行 flush buffer cache之后再次analyze validate structure不再报ORA-8103错误则说明:

可能是完全正常的现象,之前的ORA-8103正是也因为对象正在被DROP/TRUNCATE而导致SELECT报ORA-8103。一般来说Call Stack会显示进程正尝试访问该段的segment header。 更多信息可以参考BUG 7441661

也可能该问题仅仅发生在buffer cache层,而没有发生在DISK上。通过flush buffer_cache若能解决,则一般是这种情况,往往是Buffer Cache管理的BUG 。

 

 

如果执行 flush buffer cache之后再次analyze validate structure再次报ORA-8103错误则说明:

如果dump对应的数据块发现 该块在逻辑上是完整一致的(也可以用bbed/dbv工具验证), 则有可能是Lost Write,则不是被其他对象重格式化使用了。

这里判断Lost Write的一个重要手段是 对块做recover/blockrecover,如果recover能修复该块,则说明是因为Lost Write引起了本ORA-8103问题,如果不是则说明99%的可能性是BUG引起的。

常见的一种现象是 使用第三方工具在数据库打开的情况下copy 数据库,这些工具的BUG可能导致copy 老的版本的block到目标新库中。

 

另一种可能是 extent盘区级别的不一致。 同一个数据块/extent 可能 同时属于 2个数据段segment,这导致其中的一个被后者覆盖。 通过recover的方式是无法修复这种场景的, 因为这种逻辑的讹误发生在表空间级别的extent信息上。 可以检查dba_extents/dba_segments/dba_free_space这些视图来确定问题数据块到底是否同时属于多个对象, 或者 一个数据块 同时出现在dba_extents/dba_segments/dba_free_space 三个视图中, 因为 used extent 不该出现在dba_free_space中,而free extent不该在dba_extents,当然要排除recyclebin中对象的影响。 绝大多数情况下这种extent逻辑不一致的现象, 被称作extent overlap , 通常是Oracle Space Management空间管理层面的BUG。

 

在对ORA-8103问题的诊断过程中 定位问题的OBJD异常重要。应当说准确地将ORA-8103错误与BUG定位起来是有难度的,因为这往往需要涉及到redo dump以发现到底是哪些opcode造成了后续的objd 或 block type 不一致。在一些BUG中我们发现,由于可能的变量陈旧,造成objd的结构未合理清除, 之后就发现block上的objd是错的了,可能遇到ORA-8103也可能是ORA-1410, 这引起了后续其他的逻辑讹误,以至于很难通过TRACE/REDO LOG DUMP来定位原始问题所在。 这也是为什么虽然在例如版本10.2.0.4上有几个ORA-8103的bug Note, 但这些BUG最终未被close为real software  bug即真的软件BUG , 大多都是不了了之,因为在用户现场的TRACE和REDO DUMP都未必能真实定位到问题所在,这也是为什么我们要说逻辑讹误的分析和处理原要比物理讹误来的复杂。

 

Maclean的经验是 在有大量Oracle DB的环境下 一年出个几次的逻辑/物理坏块是很正常的事情, 对于物理讹误 我们只要切实备份即可99%得解决。 而对于逻辑坏块可做的 事情不多, 打最新的补丁 开 db_block_checking、db_block_checksum几件事情而已。

 

值得一说的是 如果去读一下ORA-8103的一些Bug Note,可以发现使用 LOB、APPEND INSERT、PARALLEL INSERT、exchange partition 、Split partition、advanced compression、HCC 混合列压缩往往是引起ORA-8103的高危操作 , 但实际我们又不可能放弃上述操作。

 

目前已知ORA-8103相关的BUG 列表:

 

NB Bug Fixed Description
13910420 11.2.0.3.BP09, 12.1.0.0 ORA-8103 during insert / update of basicfile LOB in assm segment using space search cache
13725395 11.2.0.3.BP07, 11.2.0.4, 12.1.0.0 ORA-600 [kdzhFindHeadPiece: unnewed > 1] from load into HCC table
13700577 11.2.0.3.BP07, 11.2.0.4, 12.1.0.0 PQ slave dies with ORA-600 [kdblddr_2]
12747437 12.1.0.0 ORA-600 [ktspfmdb:objdchk_kcbnew_3] after purging single consumer queue table
12582839 11.2.0.3, 12.1.0.0 ORA-8103/ORA-600 [3020] on RMAN recovered locally managed tablespace
12321309 12.1.0.0 ORA-600 / ORA-8103 UNUSABLE state of partitioned index is not carried across by TABLESPACE transport using DataPump
11937253 11.2.0.2.6, 11.2.0.2.BP11, 11.2.0.3, 12.1.0.0 A Parallel query fails with ORA-8103 on an Active Dataguard Enviroment.
11850492 11.2.0.3, 12.1.0.0 ORA-8103 ORA-600 ORA-3113 on temporary tables using INDEX FAST FULL SCAN and DIRECT read
10385812 11.2.0.3, 12.1.0.0 ORA-1410 or ORA-8103 by queries with DIRECT READ while concurrent DIRECT INSERT
10329146 11.2.0.1.BP10, 11.2.0.2.2, 11.2.0.2.BP03, 11.2.0.2.GIBUNDLE02, 11.2.0.2.GIPSU02, 11.2.0.3, 12.1.0.0 Lost write in ASM with multiple DBWs and a disk is offlined and then onlined
+ 10209232 11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.2.GIBUNDLE01, 11.2.0.3, 12.1.0.0 ORA-1578 / ORA-600 [3020] Corruption. Misplaced Blocks and Lost Write in ASM
10136415 11.2.0.3, 12.1.0.0 ORA-8103 on Partitioned IOT after partition maintenance
9965085 11.2.0.3, 12.1.0.0 ORA-1578 / ORA-8103 Temporary table block corruption / space wastage from PDML
9659614 10.2.0.5.3, 11.2.0.2, 11.2.0.3.5, 11.2.0.3.BP05, 12.1.0.0 Large trace file for ORA-8103
9651350 11.2.0.2.2, 11.2.0.2.BP05, 11.2.0.3, 12.1.0.0 Large redo dump and ORA-308 might be raised due to ORA-8103
9275027 11.2.0.2, 12.1.0.0 ORA-600 [kcbnew_3] can occur after TRUNCATE / DROP
9272086 11.1.0.7.4, 11.2.0.1.2, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.0 ORA-8103 by a query on DBA_EXTENTS. Trace file with Block type: 0x44=NGLOB: Extent Map
8754670 11.2.0.2, 12.1.0.0 IMP-17 / ORA-8103 transporting a large dictionary managed tablespace
8740993 11.1.0.7.8, 11.2.0.2, 12.1.0.0 ORA-1410 / ORA-8103 on ADG STANDBY during table scan after DROP/TRUNCATE/SHRINK in PRIMARY
8725282 11.2.0.1.BP08, 11.2.0.2, 12.1.0.0 Corruption from cross platform transport of tablespace with securefile objects
8716064 11.2.0.2, 12.1.0.0 Analyze Table Validate Structure fails on ADG standby with several errors
+ 8597106 11.2.0.1.BP06, 11.2.0.2, 12.1.0.0 Lost Write in ASM when normal redundancy is used
8428523 11.2.0.2, 12.1.0.0 Alter Table Rename causes wrong results/ora-8103/hangs on ADG Standby.
7710827 11.2.0.2, 12.1.0.0 Index rebuild or Merge partition causes wrong results in concurrent reads instead of ORA-8103
7519406 10.2.0.5.1, 11.2.0.1.2, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.0 Larger trace than needed for ORA-8103 under kteinicnt1
P 12330911 12.1 EXADATA LSI firmware for lost writes
8876094 11.1.0.7.2, 11.2.0.2 ORA-8103 by DBA_UNDO_EXTENTS or DBMS_SPACE_ADMIN.TABLESPACE_VERIFY on Block type: 0x25
9167831 11.2.0.2 ORA-8103 instead of ORA-1410
7650993 11.1.0.7.1, 11.2.0.1 ORA-8103 in a select at ADG standby database from table stored in ASSM tablespace
7432556 11.1.0.7.1, 11.2.0.1 ORA-8103 by Parallel Query on Partitioned Tables in BIGFILE Tablespaces
7390324 11.2.0.1 ANALYZE signals OERI [kcbgtcr_12]/ORA-8103 on bitmap index
7117200 11.2.0.1 ORA-8103 after TSPITR/PLUGIN tablespace from a restored Level 1 Backup
8825048 11.1.0.7.3 ORA-308/ORA-27037 when dumping archived log for ORA-8103. Dump when event 10736 level 4 is set
6337376 11.1.0.7 OERI:kcbgcur_3 / ORA-8103 after truncating a partition table with LOBs
9711472 11.1.0.6 ORA-8103 on operations for a partitioned LOB if any different partition is dropped
5637976 10.2.0.4, 11.1.0.6 ORA-8103/ORA-1410 from concurrent INSERT / export on ASSM tables
5083393 10.2.0.4, 11.1.0.6 DBA_FREE_SPACE FILE_ID / REL_FNO may be wrong
4592596 10.2.0.4, 11.1.0.6 Corruption (ORA-1410 / ORA-8103) from multi-table insert with direct load
6864586 10.2.0.5 ORA-8103 on partitioned table with a LOB column during analyze table with concurrent add/drop partition.
3569503 9.2.0.6, 10.2.0.4 PQ may signal a false ORA-8103 under load
13618170 ORA-8103 for create index online when the fix of bug 10027403 is installed
3966709 9.2.0.7, 10.1.0.4, 10.2.0.1 Range/object reuse prematurely (ORA-8103)
3868753 9.2.0.7, 10.1.0.5, 10.2.0.1 Concurrent export / INSERT of ASSM segment can fail with ORA-1410 / ORA-8103
+ 5523799 Various OERI (eg kcbgtcr_12) using ASSM managed segments – superceded
P* 6047085 Linux x64-64: SGA corruption / crash following any ORA-7445
* 3785200 9.2.0.6, 10.1.0.2 Corruption possible in automatic space managed segments
3083560 9.2.0.5, 10.1.0.2 ORA-1410 / ORA-8103 from direct path export if concurrent DML occurs
2619867 9.2.0.3, 10.1.0.2 OERI:[KCBGTCR_12] / ORA-8103 / ORA-1410 SELECTing from bitmap managed segment
2551000 9.2.0.4, 10.1.0.2 False ORA-1410 / ORA-8103 possible from ANALYZE COMPUTE/ESTIMATE STATISTICS
2333731 9.2.0.2 ORA-8103 possible in PQ slave
2105419 9.0.1.3, 9.2.0.1 ORA-8103 possible from PQ on bitmap managed segments with concurrent inserts
1998455 8.1.7.3, 9.0.1.3, 9.2.0.1 OERI:KCBGTCR_4 possible from long running DDL if referenced object dropped/truncated
1804299 9.0.1.1, 9.2.0.1 Rollback of Direct load can corrupt BITMAP managed segments / ORA-8103
1698789 9.2.0.1 Wrong results, ORA-1410, ORA-8103, OERI:25012 on SELECT of UNSCOPED REF with ROWID
1504967 9.2.0.1 ORA-8103 possible on READ ONLY standby after TRUNCATE on primary
1400739 8.1.7.1, 9.0.1.0 Block corruption/OERI:2023 /ORA-8103 can occur if TRUNCATE is interrupted (Ctrl-C)
1283521 8.1.7.0 ORA-8103 can occur on TRUNCATED cluster table
589855 7.3.3.6, 7.3.4.1 ORA:1578 or ORA:8103 selecting invalid ROWID
P 1053863 8.0.5.2, 8.0.6.2 NCR: ORA-8103 / corrupt read possible using async IO

 

对于ORA-8103问题的更多信息可以参考:

MOS文档Note 268302.1 ORA-8103 Diagnostics and Solution 

诊断ORA-08103错误

【数据恢复】利用构造ROWID实现无备份情况下绕过ORA-1578、ORA-8103、ORA-1410等逻辑/物理坏块问题

 

Oracle数据恢复专题

备份恢复是Oracle中永恒的话题, 只要有数据 就有备份恢复的需求。 而在国内对于备份以及备份的可用性往往被企业所忽视。这造成了再数据库恢复上存在着东西方的差异。 更多的老外DBA把经历花在对Oracle内部原理和性能优化的研究上。

oracle data block structure

 

 

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

ORACLE PRM-DUL 是诗檀软件独立研发的ORACLE数据库灾难恢复软件,其具有全程图形化界面、简单高效等特点。

 

而我们国内 DBA似乎必须要精通一门额外的技术==》 在没有任何备份的情况下 恢复Oracle数据库中数据的技术!  虽然这在大多数情况下是屠龙之技, 但很多时候却又变成了衡量一个DBA技术水准的标准了,(这样不好..  不好)。

 

当然也并不是说 这种无备份下的数据恢复是无技术含金量的,实际上它们很需要对Oracle数据文件、数据块及其数据结构的理解,以及对数据字典构成的了解。

 

这里我们总结Oracle数据恢复专题的专题,包括一些在无备份情况下的数据恢复:例如DUL和BBED工具恢复等技术。

 

 

【数据恢复】利用构造ROWID实现无备份情况下绕过ORA-1578、ORA-8103、ORA-1410等逻辑/物理坏块问题

【数据恢复】ORA-600[kccpb_sanity_check_2]一例

Oracle rman中set newname可能很慢

如何清除Oracle控制文件中的无用记录,例如v$archived_log中的deleted归档日志记录

如何找回被create or replace覆盖的PL/SQL对象

Archivelog Completed Before VS UNTIL TIME

ASM丢失disk header导致ORA-15032、ORA-15040、ORA-15042 Diskgroup无法mount

Overcome ORA-600[4xxx] open database

数据恢复:解决ORA-600[kghstack_free2][kghstack_err+0068]一例

清理RMAN Catalog恢复目录

如何rename datafile name中存在乱码的数据文件

11g新特性recover corruption list

解决ORA-01578错误一例

Script:收集介质恢复诊断信息

如何重建SYSAUX表空间上的对象

Oracle数据恢复:解决ORA-00600:[4000] ORA-00704: bootstrap process failure错误一例

Script:检查数据库当前是否有备份操作在执行中

数据恢复:模拟2个逻辑坏块

Script:收集Oracle备份恢复信息

Oracle备份恢复:Rman Backup缓慢问题一例

了解rman catalog的兼容性

Oracle内部错误:ORA-00600[2608]一例

使用bbed解决ORA-01189错误

Fractured block found during backing up datafile

手动递增SCN号的几种方法:How to increase System Change Number by manual

DBMS_REPAIR example

 

 

Oracle的损坏/坏块 主要分以下几种:

 

ORA-1578
ORA-8103
ORA-1410
ORA-1499
ORA-1578
ORA-81##
ORA-14##
ORA-26040
ORA-600 Errors
Block Corruption
Index Corruption
Row Corruption
UNDO Corruption
Control File
Consistent Read
Dictionary
File/RDBA/BL

 

 

Error
Description Corruption related to:
ORA-1578 ORA-1578 is reported when a block is thought to be corrupt on read. Block
OERR: ORA-1578 “ORACLE data block corrupted (file # %s, block # %s)” Master Note
OERR: ORA-1578 “ORACLE data block corrupted (file # %s, block # %s)”
Fractured Block explanation
Handling Oracle Block Corruptions in Oracle7/8/8i/9i/10g/11g
Diagnosing and Resolving 1578 reported on a Local Index of a Partitioned table
ORA-1410 This error is raised when an operation refers to a ROWID in a table for which there is no such row.
The reference to a ROWID may be implicit from a WHERE CURRENT OF clause or directly from a WHERE ROWID=… clause.
ORA 1410 indicates the ROWID is for a BLOCK that is not part of this table.
Row
Understanding The ORA-1410
Summary Of Bugs Containing ORA 1410
OERR: ORA 1410 “invalid ROWID”
ORA-8103 The object has been deleted by another user since the operation began.
If the error is reproducible, following may be the reasons:-
a.) The header block has an invalid block type.
b.) The data_object_id (seg/obj) stored in the block is different than the data_object_id stored in the segment header. See dba_objects.data_object_id and compare it to the decimal value stored in the block (field seg/obj).
Block
ORA-8103 Troubleshooting, Diagnostic and Solution
OERR: ORA-8103 “object no longer exists” / Troubleshooting, Diagnostic and Solution
ORA-8102 An ORA-08102 indicates that there is a mismatch between the key(s) stored in the index and the values stored in the table. What typically happens is the index is built and at some future time, some type of corruption occurs, either in the table or index, to cause the mismatch. Index
OERR ORA-8102 “index key not found, obj# %s, file %s, block %s (%s)
ORA-1499 An error occurred when validating an index or a table using the ANALYZE command.
One or more entries does not point to the appropriate cross-reference.
Index
ORA-1499. Table/Index row count mismatch
OERR: ORA-1499 table/Index Cross Reference Failure – see trace file
ORA-1498 Generally this is a result of an ANALYZE … VALIDATE … command.
This error generally manifests itself when there is inconsistency in the data/Index block. Some of the block check errors that may be found:-
a.) Row locked by a non-existent transaction
b.) The amount of space used is not equal to block size
c.) Transaction header lock count mismatch.
While support are processing the tracefile it may be worth the re-running the ANALYZE after restarting the database to help show if the corruption is consistent or if it ‘moves’.
Send the tracefile to support for analysis.
If the ANALYZE was against an index you should check the whole object. Eg: Find the tablename and execute:
ANALYZE TABLE xxx VALIDATE STRUCTURE CASCADE;
Block
OERR: ORA 1498 “block check failure – see trace file”
ORA-26040 Trying to access data in block that was loaded without redo generation using the NOLOGGING/UNRECOVERABLE option.
This Error raises always together with ORA-1578
Block
OERR ORA-26040 Data block was loaded using the NOLOGGING option
ORA-1578 / ORA-26040 Corrupt blocks by NOLOGGING – Error explanation and solution
ORA-1578 ORA-26040 in a LOB segment – Script to solve the errors
ORA-1578 ORA-26040 in 11g for DIRECT PATH with NOARCHIVELOG even if LOGGING is enabled
ORA-1578 ORA-26040 On Awr Table
Errors ORA-01578, ORA-26040 On Standby Database
Workflow Tables ORA-01578 ORACLE data block corrupted ORA-26040 Data block was loaded using the NOLOGGING option
ORA-1578, ORA-26040 Data block was loaded using the NOLOGGING option
ORA-600[12700] Oracle is trying to access a row using its ROWID, which has been obtained from an index.
A mismatch was found between the index rowid and the data block it is pointing to. The rowid points to a non-existent row in the data block. The corruption can be in data and/or index blocks.
ORA-600 [12700] can also be reported due to a consistent read (CR) problem.
Consistent Read
Resolving an ORA-600 [12700] error in Oracle 8 and above.
ORA-600 [12700] “Index entry Points to Missing ROWID”
ORA-600[3020] This is called a ‘STUCK RECOVERY’.
There is an inconsistency between the information stored in the redo and the information stored in a database block being recovered.
Redo
ORA-600 [3020] “Stuck Recovery”
Information Required for Root Cause Analysis of ORA-600 [3020] (stuck recovery)
ORA-600[4194] A mismatch has been detected between Redo records and rollback (Undo) records.
We are validating the Undo record number relating to the change being applied against the maximum undo record number recorded in the undo block.
This error is reported when the validation fails.
Undo
ORA-600 [4194] “Undo Record Number Mismatch While Adding Undo Record”
Basic Steps to be Followed While Solving ORA-00600 [4194]/[4193] Errors Without Using Unsupported parameter
ORA-600[4193] A mismatch has been detected between Redo records and Rollback (Undo) records.
We are validating the Undo block sequence number in the undo block against the Redo block sequence number relating to the change being applied.
This error is reported when this validation fails.
Undo
ORA-600 [4193] “seq# mismatch while adding undo record”
Basic Steps to be Followed While Solving ORA-00600 [4194]/[4193] Errors Without Using Unsupported parameter
Ora-600 [4193] When Opening Or Shutting Down A Database
ORA-600 [4193] When Trying To Open The Database
ORA-600[4137] While backing out an undo record (i.e. at the time of rollback) we found a transaction id mis-match indicating either a corruption in the rollback segment or corruption in an object which the rollback segment is trying to apply undo records on.
This would indicate a corrupted rollback segment.
Undo/Redo
ORA-600 [4137] “XID in Undo and Redo Does Not Match”
ORA-600[6101] Not enough free space was found when inserting a row into an index leaf block during the application of undo. Index
ORA-600 [6101] “insert into leaf block (undo)”
ORA-600[2103] Oracle is attempting to read or update a generic entry in the control file.
If the entry number is invalid, ORA-600 [2130] is logged.
Control File
ORA-600 [2130] “Attempt to access non-existant controlfile entry”
ORA-600[4512] Oracle is checking the status of transaction locks within a block.
If the lock number is greater than the number of lock entries, ORA-600 [4512] is reported followed by a stack trace, process state and block dump.
This error possibly indicates a block corruption.
Block
ORA-600 [4512] “Lock count mismatch”
ORA-600[2662] A data block SCN is ahead of the current SCN.
The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN stored in a UGA variable.
If the SCN is less than the dependent SCN then we signal the ORA-600 [2662] internal error.
Block
ORA-600 [2662] “Block SCN is ahead of Current SCN”
ORA 600 [2662] DURING STARTUP
ORA-600[4097] We are accessing a rollback segment header to see if a transaction has been committed.
However, the xid given is in the future of the transaction table.
This could be due to a rollback segment corruption issue OR you might be hitting the following known problem.
Undo
ORA-600 [4097] “Corruption”
ORA-600[4000] It means that Oracle has tried to find an undo segment number in the dictionary cache and failed. Undo
ORA-600 [4000] “trying to get dba of undo segment header block from usn”
ORA-600[6006] Oracle is undoing an index leaf key operation. If the key is not found, ORA-00600 [6006] is logged.
ORA-600[6006] is usually caused by a media corruption problem related to either a lost write to disk or a corruption on disk.
Index
ORA-600 [6006]
ORA-600[4552] This assertion is raised because we are trying to unlock the rows in a block, but receive an incorrect block type.
The second argument is the block type received.
Block
ORA-600 [4555]
ORA-600[6856] Oracle is checking that the row slot we are about to free is not already on the free list.
This internal error is raised when this check fails.
Row
ORA-600 [6856] “Corrupt Block When Freeing a Row Slot
ORA-600[13011] During a delete operation we are deleting from a view via an instead-of trigger or an Index organized table and have exceeded a 5000 pass count when we raise this exception. Row
ORA-600 [13011] “Problem occurred when trying to delete a row”
ORA-600[13013] During the execution of an UPDATE statement, after several attempts (Arg [a] passcount) we are unable to get a stable set of rows that conform to the WHERE clause. Row
ORA-600 [13013] “Unable to get a Stable set of Records”
How to resolve ORA-00600 [13013], [5001]
ORA-600[13030]
ORA-600 [13030]
ORA-600[25012] We are trying to generate the absolute file number given a tablespace number and relative file number and cannot find a matching file number or the file number is zero. afn/rdba/tsn
ORA-600 [25012] “Relative to Absolute File Number Conversion Error”
ORA-600[25026] Looking up/checking a tablespace
invalid tablespace ID and/or rdba found
afn/rdba/tsn
ORA-600 [25026]
ORA-600[25027] Invalid tsn and/or rfn found afn/rdba/tsn
ORA-600 [25027]
ORA-600[kcbz_check_objd_typ] An object block buffer in memory is checked and is found to have the wrong object id. This is most likely due to corruption. Buffer Cache
ORA-600 [kcbz_check_objd_typ_3]
ORA-600 [kcbz_check_objd_typ]
ORA-600[kddummy_blkchkORA-600[kdblkcheckerror] ORA-600[kddummy_blkchk] is for 10.1/10.2 and ORA-600[kdblkcheckerror] for 11 onwards. Block
ORA-600 [kddummy_blkchk]
How to Resolve ORA-00600[kddummy_blkchk]
ORA-600 [kdblkcheckerror]
QREF – kddummy_blkchk / kdBlkCheckError – Check Codes Listing (Full) [This section is not visible to customers.]
QREF – kddummy_blkchk / kdBlkCheckError – Check Codes Definition && Return Values[This section is not visible to customers.]
ORA-600[ktadrprc-1] Dictionary
ORA-600 [ktadrprc-1]
ORA-600[ktsircinfo_num1] This exception occurs when there are problems obtaining the row cache information correctly from sys.seg$. In most cases there is no information in sys.seg$. Dictionary
ORA-600 [ktsircinfo_num1]
ORA-600[qertbfetchbyrowid] Row
ORA-600 [qertbfetchbyrowid]
ORA-600[ktbdchk1-bad dscn] This exception is raised when we are performing a sanity check on the dependent SCN and fail.
The dependent scn is greater than the current scn.
Dictionary
ORA-600 [ktbdchk1: bad dscn]

 

 

 

对于任何数据层面的BUG,我们在诊断时值得自问的是:

 

  • 是否收集了基本的数据
  • 这些数据损坏/坏块仅仅是发生在磁盘上的吗
  • 是否实例崩溃了且起不来了?
  • stack trace里是否有KDBLKCHECKERROR的信息?
  • stack trace里是否有kdsgrp1的信息?
  • 是否导致查询到错误的结果?
  • 有哪些RDBMS特性正在被使用?
  • 是否运行在Exadata上?

 

 

基础数据的收集

 

除非数据库已经崩溃而没法升级,否则我们总是推荐先收集基础数据,这帮助我们了解数据库的运行历史情况。 一个重要的信息在于要找出哪些问题对象的DDL语句,这让我们知道它使用了哪些RDBMS特性,例如表上有 LOB大对象 的情况和没有的情况就完全不一样。

 

数据结构

 

首先搞清楚哪些对象涉及到这个问题中

  • 搞清楚以来的对象 例如索引,约束等等
  • 这个问题是发生在普通堆表上吗? 是什么类型的堆表? 无压缩,OLTP压缩还是 EHCC压缩?
  • 问题和 LOB有关吗? 是Securefile还是BasicFile? 高级Securefile特性还包括压缩 加密等
  • 问题发生在 IOT上吗?
  • 问题发生在索引上吗? 无压缩,压缩还是bitmap index?
  • 表的基础上是否定义了视图?
  • 表或索引是否分区了? 如果是,何种分区? 多少分区?
  • 如果表被分区了,是本地还是全局索引?

 

ALERT和TRACE文件

 

alert.log和trace文件对于此类意外是必要查看的。一般只收集相关文件,因为其内容已经足够丰富了,一般不收集整个udump和bdump目录。 alert.log 最好能完整一些,要包含第一次发生该错误,同时这也让我们能够参考发生问题的init.ora参数文件信息。 如果是一系列相同的错误,那么提供前几个trace文件较好,而不是后来生成的。

对于 RAC而言要查看所有节点上的alert.log

 

Redo重做

 

请收集如下关于 redo的 dump:

  • 获得对应的redo block dump, BLOCK号可以从错误trace中获得
  • 从之前的归档中转储redo dump
  • 如果之前的备份可用,那么把原始的数据块从备份中搞一份出来

 

 

磁盘上的损坏/坏块

 

磁盘上的数据块损坏/坏块是较为严重的问题 可能导致数据丢失。所以确认到底是不是发生在磁盘上的corruption很重要。 如果我们怀疑一张表上存在磁盘上的损坏,那么做下面的事情:

  • 运行DBV:
    • DBV可以指定运行在某个数据文件或者一整个块上, 前提是你知道 块号, 那么先针对块运行
    • 如果你确认不到块,那么对整个数据文件运行
  • 运行 SELECT * FROM TABLE;
    • SELECT * FROM TABLE,运行该语句对于确认表上的数据是否还能被全部读出做判断, 可以配合 alter system flush buffer_cache运行
    • 如果运行SELECT * FROM TABLE 的输出太多了,那么可以运行例如’select  max(col1), max(col2) … max(colN) from Table’; 或者自己写一个游标循环
  • 运行ANALYZE TABLE VALIDATE STRUCTURE

 

 

通过上面的分析, 我们可以确认下面的问题了:

  • 检查block cache header是否正常
    • 检查 dba、scn、checksum、 tail和format
    • 如果cache header不正常,那么是数据层面的问题的概率不大
    • 检查缓存或者VOS层
  • 有多少块损坏了呢? 如果有非常多, 则不太可能是oracle bug导致的
  • 有多少表或分区损坏了呢? 收集那些对象涉及到此问题的,以及其DDL等基础信息
  • 有多少行数据被影响了?

 

为了实现root cause analysis根本原因的分析, 以下的信息总是需要的:

  • 所有RAC节点的alert.log
  • 此意外事件相关的所有trace文件
  • 该坏块相关的完整的redo logfile, 它们也可能来自于归档文件
  • 如果有可用的备份,从备份中取出一份原始数据块

 

同时我们有必要向用户推荐

  1. 设置db_block_checking=true
  2. 指定今后的恢复策略

 

 

 

 

 

 

 

Assistant: Get Assistance to understand and solve Oracle Database Server Corruptions (Doc ID 1543698.2)

 

 

需要注意的是 不仅仅ORACLE BUG可能造成这些损坏/坏块, 引起这些corruption问题的另一个主要原因是 OS操作系统、 卷管理器volume manager 或者存储引起的,  因为ORACLE仅仅是一个程序 作为这些底层建筑的使用者。

 

已知的一些由于 硬件/OS/卷管理引起的corruption如下:

cause:

This type of timestamps is used by Veritas Netbackup tool. Netbackup is corrupting the block in a backup.

Netbackup is a 3rd party tool provided by Symantec/Veritas and it can be used with RMAN.

Netbackup should be reading from Oracle files, but somehow it is writing timestamps into the Oracle file head

To prevent this corruption: this issue is known by Symantec/Veritas, and it is fixed beginning with Netbackup client 5.1 MP6 or 4.5 MP9. Open a case with Symantec/Veritas if more information is needed.

该问题的原因是Netbackup NBU早期版本可能直接写时间戳到ORACLE 文件头

 

 Physical Corrupted Blocks consisting of all Zeroes indicate a problem with OS, HW or Storage (Doc ID 1545366.1)

这个Note 介绍 有一些OS或硬件问题 导致数据块变全零, 但ORACLE的设计并不会写出全零到数据块

 

Oracle by design does not write blocks of all zeroes.

This is done to easily identify problems in the underlying operating system, hardware or storage.  This is enforced by Oracle checks that are enabled by default before writing any data or redo block.  No special parameters are needed to enable these checks nor can anyone disable them.    Every Oracle process that writes changes to the disk abides by these rules.    Oracle has seen bugs in hardware, operating system, firmware and storage that result in zero out blocks. Oracle deliberately avoids writing complete zero blocks so it easily detects when external forces caused these data block corruptions. Every Oracle data block, controlfile, redo and tempblocks are stamped with minimally set of meta data.  For data blocks, each block is formatted to include the block offset address (rdba), block format, a flag, tail and checksum within the datafile. When a new datafile is created, Oracle stamps every block with these fields.   Redo logs are similarly pre-formatted or initialized before being used in the database.

Ensuring Oracle never writes zeroed out blocks is also checked at the lowest levels of the database, right before Oracle issues the write() call to the Operating System. If the Oracle RDBMS detects an all-zero block at the point of the write is issued to the operating system, an error ORA-600 is produced or the block is reported as corrupt in the alert log; the block then is not written to disk.
Storage vendors have implemented checks based on the Oracle design of not writing zeros.

This Oracle property (not writing zero blocks) is now checked by some storage vendors to avoid some of these corruptions caused by operating system or storage.

EMC Double Checksum checks for a non-zero DBA field in Oracle blocks. An all-zero block by definition fails checks. Hitachi Database Validator performs the same check. Oracle Exadata performs these, and additional, more comprehensive (HARD) checks.

If Oracle wrote all-zero blocks, it would never have allowed these checks to be implemented into third-party hardware.

 

 

 

ORA-354 Redo log corruption when using Xisgo Driver (Doc ID 1498389.1)

Xisgo Driver 可能引起 redo logfile 损坏

 

 

Database Corruption due to Lost IO on Hitachi storage. ORA-600 [kdsgrp1] ORA-1499 ORA-1410 ORA-600 [3020] (Doc ID 1512717.1)

 

Hitachi Storage 日立存储的微码可能导致 LOST Write写丢失问题,导致镜像不同步, 进而ORACLE出现大量如下错误

Wrong results
ORA-600 [kdsgrp1]
ORA-600 [qertbFetchByRowID]
ORA-1499 by analyze validate structure cascade
ORA-8102
ORA-600 [25027]
ORA-600 [kcbz_check_objd_typ_3]
ORA-8103
ORA-1410
ORA-600 [kclchkblk_3]
ORA-600 [4137]
ORA-600 [4193]
ORA-600 [4194]

During Media Recovery:
ORA-600 [3020]
ORA-00752 (when DB_LOST_WRITE_PROTECT is enabled – 11g)

 

Workaround

In order to workaround the problem please first contact Hitachi to repair storage mirror inconsistency; in addition, evaluate and implement one of the following Oracle workarounds:

If using a Dataguard environment with a Physical Standby Database consider to switchover to the standby.  The physical standby database may detect this problem with an ORA 600[3020] or ORA-752, then the database can be failed over to the standby database with data loss but with a  database free of inconsistencies. The error provides an early detection mechanism and prevents further logical block corruptions. If the standby is using Hitachi Storage that is exposed to this problem, then the standby itself could be logically corrupted too.
Determine when problem started and restore the database from a point in time before it happened.
Recreate the affected objects but be aware that there is no deterministic means to find out the extent of the logical corruption in the database.  Use all the checks in Note 836658.1 to try to identify inconsistencies.
Restore the database and apply media recovery.  This may fail with ORA-600 [3020] or ORA-752 and decide to stop when these errors are encountered.

Fix/Solution

To prevent this problem from happening contact Hitachi for a fix.

 

 

 

ORA-1578 Transient Corruption – Caused by Parity Error on EMC DMX4 (Doc ID 1486903.1)

 

EMC DMX4阵列存储可能导致临时性的ORA-1578错误

 

DBV would often not detect any corruption, when it did it often showed 8 consecutive blocks. 
Usually the blocks reported were found in dba_free_space with basically the same string written across the block.

Frequently 8 consecutive 32K blocks were corrupted which seemed to match up to a write for a  256K allocation unit.
DBVERIFY - Verification starting : FILE = /padb/index/o1_mf_partner__5svmocgf_.dbf
Page 305361 is marked corrupt
***
Corrupt block relative dba: 0x1684a8d1 (file 90, block 305361)
Bad header found during dbv:
Data in bad block -
 type: 5 format: 0 rdba: 0x0fc27bbc
 last change scn: 0x58fe.03282668 seq: 0xe4 flg: 0x9e
 consistency value in tail: 0xcd163ffa
 check value in block header: 0xf3b4, computed block checksum: 0x0
 spare1: 0xdc, spare2: 0x56, spare3: 0x190
***

Page 305362 is marked corrupt
***
Corrupt block relative dba: 0x1684a8d2 (file 90, block 305362)
Bad header found during dbv:
Data in bad block -
 type: 5 format: 0 rdba: 0x0fc27bbf
 last change scn: 0x58fe.03282668 seq: 0xe4 flg: 0x9e
 consistency value in tail: 0xcd163ffa
 check value in block header: 0xc743, computed block checksum: 0x0
 spare1: 0xdc, spare2: 0x56, spare3: 0x190
***

The application also would fail (indicating data and index blocks were also affected)

Corrupt block relative dba: 0x1384a736 (file 78, block 304950)
Bad header found during buffer read
Data in bad block -
 type: 5 format: 0 rdba: 0x0ac2745b
 last change scn: 0x58ec.be15e45c seq: 0xe4 flg: 0x9e
 consistency value in tail: 0x0f223ffa
 check value in block header: 0x58df, computed block checksum: 0x0
 spare1: 0xdc, spare2: 0x56, spare3: 0x190
***
Reread of rdba: 0x1384a736 (file 78, block 304950) found same corrupted data

However, subsequent runs of the application process would find valid data (reflecting the transient nature of the corruption).

Also  at the exact minute ( in ten minutes intervals 4, 14, 24, 34, 44, 54 on the hour) when the corruption was reported, 
it was nearly always detected on more than one server.  It was the same type of corruption just different blocks.
Cause

EMC Parity Issue.

The corruption was detected at various times by the application (ORA-1578), RMAN and by DBV running against the datafile.

The source of the corruption could not be determined until it was called in Virtual Instruments to detect what was occurring on the fabric.

The corruption reported by Oracle was related to the parity (cache) on the EMC DMX4 array.

There is  a parity check  run against  the disk.  If the disk is busy the device  will build an image from parity and provision it.

There was something wrong with the parity check causing it to provide invalid data.

First read against  the parity generated image failed, the second  against the disk was successful.

EMC Engineering  found that corruption is evident from the DMX4 to the host – but not when reading from the primary data disk, 
but when they try to satisfy a read request by using parity to figure out the data needed (this is done only when the production 
data disk is busier and the parity short cut should in theory take less time).

The problem is that Parity on some devices seems to be corrupt (although it was not detected until engineering went in and scanned at the byte level) 
so we end up sending back bad data, although the Sym thinks it’s good.

EMC is doing a root cause analysis.  They think that a rebalance failed during a disk replacement introducing the corruption.

Because no disk errors were being reported internal checks on the DMX4 were not triggered.
Solution

EMC fixed the parity on the devices by rebuilding all the devices.

 

 

ORA-1578 ORA-353 ORA-19599 Corrupt blocks with zeros when filesystemio_options=SETALL on ext4 file system using Linux (Doc ID 1487957.1)

filesystemio_options=SETALL + EXT4 文件系统可能导致 ORA-1578 ORA-353 ORA-19599等坏块,这是kernel-uek-2.6.39-200.29.3.el6uek内核的BUG

 

Database files on ext4 File System on Linux and Database parameter filesystemio_options is set to SETALL.
Cause

Database files on ext4 File System on Linux and Database parameter filesystemio_options is set to SETALL.

This is a Linux defect when using O_SYNC|O_DIRECT on ext4 file systems (filesystemio_options=SETALL open the database files using O_SYNC|O_DIRECT). 
Solution

This is a Linux defect known to be fixed in OS Linux version kernel-uek-2.6.39-200.29.3.el6uek. 

Workaround:

The workaround to avoid corruptions in the Oracle database files is to set filesystemio_options=NONE or 
filesystemio_options=DIRECTIO or filesystemio_options=ASYNCH in the database parameter file (spfile / init.ora).

To repair the affected blocks:

    Use RMAN Blockrecover or Datafile media recovery.  Reference Note 1578.1

 

 

 

 

ORA-1578 Misplaced blocks against datafiles stored in NFS filers (Doc ID 1525108.1)

由于使用相同的 IP 可能导致NFS 或者NAS导致ORA-1578 坏块, ORACLE作为用户程序 无法了解到这些底层的问题

 

It was identified that there were two devices using the same IP address.  
There were network packets going to switch-server-switch causing low performance with high cpu usage. 
 That network was in the same network used by the NAS storage. 
 It looks like that when the affected switch came online it corresponded near to when the corruption was reported.  

This is not an Oracle defect.
Solution

Workaround:

If the blocks are already corrupt, use RMAN block media recovery or Datafile media recovery to repair the corruption (rfile#=45 in our example above). 

Solution:

There are not new corruptions after the duplicate IP was fixed.  P
lease contact the SYSADMIN or NFS filer admin for any further information as fixing IP addresses is not part of Oracle Software.

 

 

 

ORA-1578 ORA-354 ORA-600 [3020] Misplaced blocks by Symantec / Veritas after adding LUN (Doc ID 1323532.1)

 

Veritas /Symantec 的卷管理器 当使用 PERSISTENCE=NO 时存在BUG ,导致 IO丢失:

 

Changes

A new LUN has been added with PERSISTENCE set to NO while the Oracle database is OPEN.
Cause

The cause of the issue is a bug on the Veritas layer when presenting LUNs (new or existing ones) with PERSISTENCE=NO.  
It is supposed to make sure all I/O is halted before rearranging the existing LUN names to accommodate the new LUN 
(when Veritas PERSISTENCE parameter is set to NO it means the disk names can float when new LUNs are added and upon server reboot). 
The Veritas VXIO  bug causes the IOs to be miss-directed for a few seconds.
Solution

Workaround:

Set PERSISTENCE to YES before adding LUN.  Contact Symantec / Veritas for more information.

If the blocks are already corrupt, use RMAN block media recovery or Datafile media recovery to repair the corruption (rfile#=5 in our example above). 

As this could also cause a missing write in the original block were the IO was intended to go to in the first place (rfile#=10 in our example above), 
the recovery of this block may produce ORA-600 [3020] for which a Point In Time Recovery before problem happens can be the best overall solution to get a consistent database.

Solution:

To prevent this issue from happening contact Symantec about:

    Symantec Technote Tech158119
    Symantec etrack product defect incident 2349352
    Symantec Issue will be resolved in 5.0MP3RP5P1

    The fix is platform dependent.  For more information reference technote TECH158119:

    http://www.symantec.com/business/support/index?page=content&id=TECH158119

 

 

 

ORA-1578 blocks overwritten by EMC Networker – RMAN Networker Module for Oracle (NMO) (Doc ID 1323567.1)

EMC Networker Module for Oracle (NMO)的 BUG可能导致数据块被EMC覆盖掉

 

Changes

EMC Networker Module for Oracle (NMO) has been installed.
Cause

This is caused by EMC Bug LGTSC23600

Issue with nmo 4.5 scheduling an RMAN backup.
Solution

Workarounds

To prevent the issue from happening downgrade from nmo 4.5 to nmo 4.2.

To repair the corruption use RMAN block media recovery or Datafile Media Recovery.

Solution

Contact EMC to install a fix for EMC Bug LGTSC23600.

 

 

 

 

ORA-1578 Block overwritten with string “DiskDescription cyl alt hd sec” when using Symantec / Veritas (Doc ID 1313454.1)

 

Symantec / Veritas的metadata存在限制,可能导致 ORACLE数据块被误覆盖

 

Disks have been installed using Symantec / Veritas
Cause

Symantec has confirmed it as a limitation with their software :

There is limitation of the size of the metadata of the volume manager. Max size of VTOC (volume table of contens) is 65535. 
If number of cylinders for a lun exceeds this limit, metadata may be written into data part of disk and thus may corrupt blocks in filesystems.
Solution

To repair the block:

    Use RMAN Blockrecover or Datafile media recovery.
    If the affected block is the OS BLock header, resize the datafile.

Contact Symantec for more information about this limitation.

 

 

 

ORA-1578 Data Block Corruptions when using EMC Storage. Blocks with 0xc9 byte (Doc ID 1323108.1)

EMC 存储的raid机制可能导致ORA-1578

 

 

Changes
RAID 5 and EMC storage.
Cause
From EMC:

EMC new disks have generic data from the factory C9C9C9 repeated pattern.

When using RAID 5, if defective SAN disks are detected upon power-up, hot spare disk are kicked in automatically to replace the bad drives.

The hot-spare were prematurely declared as ready, before the data completes the reconstruction.

Hot-spare unfortunately claimed to be authoritative to provide data whereas it 
still had the generic data from the factory (ie. C9C9C9 repeated pattern)

Solution
Workaround

Oracle database Blocks can be repaired using RMAN Block media recovery or Datafile media recovery.

Solution

Contact EMC to get a fix for software bug seen on code level 5773-141.

EMC manually drops the hot-spare and forces it to resync,  
reconstructing the data. Database are requested to remain down to avoid further potential corruption.

 

 

Block Corruption (ALL ZERO) detected after reclaiming space on Veritas Filesystem (Doc ID 1587427.1)

 

Symptoms

 RDBMS is reporting corrupted blocks with Zeroes after the underlying diskspace has been 'reclaimed'.  Example from RMAN validate:
Hex dump of (file 15, block 783824) in trace file /u01/app/oracle/diag/rdbms/orcl/ORCL/trace/ORCL_ora_12515.trc
Corrupt block relative dba: 0x03cbf5d0 (file 15, block 783824)
Completely zero block found during validation
Reread of blocknum=783824, file=/u01/oradata/orcl/datafile_08.dbf. found same corrupt data
Reread of blocknum=783824, file=/u01/oradata/orcl/datafile_08.dbf. found same corrupt data
Reread of blocknum=783824, file=/u01/oradata/orcl/datafile_08.dbf. found same corrupt data
Reread of blocknum=783824, file=/u01/oradata/orcl/datafile_08.dbf. found same corrupt data
Reread of blocknum=783824, file=/u01/oradata/orcl/datafile_08.dbf. found same corrupt data

 This may also produce ORA-1578 or ORA-8103 if the corrupt block is read by a SQL statement.
Cause

Corruption is caused by an issue in the Veritas Software : Thin Reclamation problem.

This is outside of the Oracle RDBMS.
Solution

 Symantec provided a fix for the issue :

      VxVM Hot-fix has been released and created for Solaris Sparc, 6.0.3.011 (vm-sol10_sparc-6.0.3.011). 
VRTSaslapm 6.0.100.201 is required when install VxVM Hot-fix 6.0.3.011.

Symantex recommends all customers to install this fix when using thin reclamation with VxVM and DMP.

Contact Symantec if more information is needed.

Symantec Article: TECH209201 :

      Veritas Volume Manager (VxVM) 6.0.3 (Solaris Sparc) VxVM thin reclamation functionality can cause disk label loss or private/public region corruption issue.

 

【数据恢复】利用构造ROWID实现无备份情况下绕过ORA-1578、ORA-8103、ORA-1410等逻辑/物理坏块问题

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

 

对于无备份情况下的ORA-1578、ORA-8103、ORA-1410等逻辑/物理坏块问题,可以通过如下PL/SQL 构造ROWID的方式挽救绝大多数非坏块的数据, 一般分成 2种情况 有索引可以用来获取ROWID, 或者 没有索引、索引不可用,必须通过dbms_rowid.ROWID_CREATE来构造ROWID的方式。

 

对于有索引的情况,可以直接使用MOS上提供的脚本:

 

 

REM Create a new table based on the table that is producing errors with no rows:

create table 
as
select *
from   
where  1=2;

REM Create the table to keep track of ROWIDs pointing to affected rows:

create table bad_rows (row_id rowid
                      ,oracle_error_code number);
set serveroutput on

DECLARE
  TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
  CURSOR c1 IS select /*+ index(tab1) */ rowid
  from  tab1
  where  is NOT NULL;
  r RowIDTab;
  rows NATURAL := 20000;
  bad_rows number := 0 ;
  errors number;
  error_code number;
  myrowid rowid;
BEGIN
  OPEN c1;
  LOOP
   FETCH c1 BULK COLLECT INTO r LIMIT rows;
   EXIT WHEN r.count=0;
   BEGIN
    FORALL i IN r.FIRST..r.LAST SAVE EXCEPTIONS
     insert into 
     select /*+ ROWID(A) */ 
     from  A where rowid = r(i);
   EXCEPTION
   when OTHERS then
    BEGIN
     errors := SQL%BULK_EXCEPTIONS.COUNT;
     FOR err1 IN 1..errors LOOP
      error_code := SQL%BULK_EXCEPTIONS(err1).ERROR_CODE;
      if error_code in (1410, 8103, 1578) then
       myrowid := r(SQL%BULK_EXCEPTIONS(err1).ERROR_INDEX);
       bad_rows := bad_rows + 1;
       insert into bad_rows values(myrowid, error_code);
      else
       raise;
      end if;
     END LOOP;
    END;
   END;
   commit;
  END LOOP;
  commit;
  CLOSE c1;
  dbms_output.put_line('Total Bad Rows: '||bad_rows);
END;
/

 

 

 

对于没有索引或者索引损坏的情况可以使用如下方法:

 

 

创建示例数据

create table maclean_tab1 (t1 int,t2 date default sysdate) tablespace users
partition by range(t1) 
(partition p1 values less than (10000),
partition p2 values less than (20000),
partition p3 values less than (30000),
partition p4 values less than (40000),
partition p5 values less than (50000),
partition p6 values less than (60000),
partition p7 values less than (70000),
partition p8 values less than (80000),
partition p9 values less than (90000),
partition p10 values less than (100000),
partition p11 values less than (110000),
partition p12 values less than (120000),
partition p13 values less than (130000),
partition p14 values less than (140000),
partition p15 values less than (150000),
partition p16 values less than (160000))
;

insert into maclean_tab1(t1) select rownum from dual connect by level<160000;

commit;

SQL> select count(*) from maclean_tab1;

  COUNT(*)
----------
    159999

exec dbms_stats.gather_table_stats(USER,'MACLEAN_TAB1');   

SQL> alter system flush buffer_cache;

System altered.

SQL> /

System altered.

随即采样一些块来做 坏块 约涉及到10个块的数据

set linesize 200 pagesize 1400

select dbms_rowid.rowid_block_number(rowid) blkid,
       dbms_rowid.rowid_relative_fno(rowid) rfile
  from maclean_tab1 sample(0.01)
 where rownum <= 200
 group by dbms_rowid.rowid_block_number(rowid),
          dbms_rowid.rowid_relative_fno(rowid)
 order by 1;

     BLKID      RFILE
---------- ----------
    741833          4
    741850          4
    741994          4
    742030          4
    742085          4
    742141          4
    742159          4
    742172          4
    742173          4
    742179          4

 制造坏块

    [oracle@vrh8 udump]$ rman target  /

RMAN> blockrecover datafile 4 block 741833,741850,741994,742030,742085,742141,742159,742172,742173,742179 clear;

Starting blockrecover at 21-APR-13
using channel ORA_DISK_1
Finished blockrecover at 21-APR-13

SQL> select count(*) from maclean_tab1;
select count(*) from maclean_tab1
                     *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 4, block # 741833)
ORA-01110: data file 4:
'/s01/oradata/G10R25/datafile/o1_mf_users_8nx5srgb_.dbf'

通过blockrecover datafile block clear 构造了一系列坏块且没有备份 ,我们通过下列脚本挽回大部分可用数据

一个实际操作过程



drop table maclean_tab_backup;

create table maclean_tab_backup 
tablespace users 
nologging compress pctfree 0 pctused 99        --可以注释掉的
 as select * from maclean_tab1 where 1=0;

drop table bad_rows;
create table bad_rows (row_id rowid,oracle_error_code varchar2(50))
tablespace users 
nologging compress pctfree 0 pctused 99        --可以注释掉的;

set serveroutput on;
set timing on;

DECLARE
  TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
  CURSOR Crowid_info IS
    select Do.DATA_OBJECT_ID dataid,
           DE.FILE_ID        fid,
           DE.BLOCK_ID       blkid,
           DE.BLOCKS         blkcnt
      from dba_objects DO, dba_extents DE
     where DO.OBJECT_NAME = 'MACLEAN_TAB1' 
     --and DE.PARTITION_NAME='&PARTITION_NAME'          --若指定分区则取消注释
       and nvl(DO.SUBOBJECT_NAME,'-1') = nvl(DE.PARTITION_NAME,'-1')
       and DO.OBJECT_NAME = DE.SEGMENT_NAME
       and DO.owner = 'SYS'
     order by 1, 2, 3 asc;
  bad_rows   number := 0;
  errors     varchar2(500);
  error_code varchar2(500);
  myrowid    rowid;
BEGIN
  /* Maclean Liu https://www.askmac.cn * Copy Right 2013-4-20 */ 
  execute immediate 'alter session set commit_write=''batch,nowait'' ';
  for i in Crowid_info loop
    for j in 0 .. i.blkcnt - 1 loop
      for z in 0 .. 2000 loop
        begin
          myrowid := dbms_rowid.ROWID_CREATE(1,
                                             i.dataid,
                                             i.fid,
                                             i.blkid + j,
                                             z);
          insert into maclean_tab_backup
            select /*+ ROWID(A) */
             *
              from maclean_tab1 A
             where rowid = myrowid;
        EXCEPTION
          when OTHERS then
            BEGIN
              errors     := SQLERRM;
              error_code := SQLCODE;
              if (error_code like '%1410%' or error_code like '%8103%' or  error_code like '%1578%') then
                bad_rows := bad_rows + 1;
                insert into bad_rows values (myrowid, error_code);
                commit;
              else
                raise;
              end if;
            END;
            commit;
        end;
      end loop;
    end loop;
  end loop;
  dbms_output.put_line('Total Bad Rows: ' || bad_rows);
  commit;
END;
/

Elapsed: 00:01:10.16

SQL> select count(*) from maclean_tab_backup;

  COUNT(*)
----------
    155921

 ===>损失了少量的10个块的数据

 

 

原始脚本如下:

步骤1 创建备份表

create table <new table name>
as
select *
from   <original table name>
where  1=2;

步骤2 创建bad_rows表

create table bad_rows (row_id rowid,oracle_error_code varchar2(50));

步骤3 运行下列脚本

DECLARE
  TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
  CURSOR Crowid_info IS
    select Do.DATA_OBJECT_ID dataid,
           DE.FILE_ID        fid,
           DE.BLOCK_ID       blkid,
           DE.BLOCKS         blkcnt
      from dba_objects DO, dba_extents DE
     where DO.OBJECT_NAME = '&TABNAME' 
     --and DE.PARTITION_NAME='&PARTITION_NAME'          --若指定分区则取消注释
       and nvl(DO.SUBOBJECT_NAME,'-1') = nvl(DE.PARTITION_NAME,'-1')
       and DO.OBJECT_NAME = DE.SEGMENT_NAME
       and DO.owner = '&OWNER'
     order by 1, 2, 3 asc;
  bad_rows   number := 0;
  errors     varchar2(500);
  error_code varchar2(500);
  myrowid    rowid;
BEGIN
  /* Maclean Liu https://www.askmac.cn * Copy Right 2013-4-20 */ 
  execute immediate 'alter session set commit_write=''batch,nowait'' ';
  for i in Crowid_info loop
    for j in 0 .. i.blkcnt - 1 loop
      for z in 0 .. 2000 loop
        begin
          myrowid := dbms_rowid.ROWID_CREATE(1,
                                             i.dataid,
                                             i.fid,
                                             i.blkid + j,
                                             z);
          insert into &backup_table
            select /*+ ROWID(A) */
             *
              from &source_table A
             where rowid = myrowid;
        EXCEPTION
          when OTHERS then
            BEGIN
              errors     := SQLERRM;
              error_code := SQLCODE;
              if (error_code like '%1410%' or error_code like '%8103%' or  error_code like '%1578%') then
                bad_rows := bad_rows + 1;
                insert into bad_rows values (myrowid, error_code);
                commit;
              else
                raise;
              end if;
            END;
            commit;
        end;
      end loop;
    end loop;
  end loop;
  dbms_output.put_line('Total Bad Rows: ' || bad_rows);
  commit;
END;
/

构造ROWID绕过ORA-1578、ORA-8103、ORA-1410脚本下载

【数据恢复】ORA-600[kccpb_sanity_check_2]一例

 

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638 QQ号:47079569 邮箱:service@parnassusdata.com

 

 

kccpb_sanity_check_2内核函数kernel function负责监测控制文件的健康性,该ORA-600[kccpb_sanity_check_2]一般在alter database mount阶段发生; 该ORA-600[kccpb_sanity_check_2]发生的原因一般是 控制文件controlfile 块头的seq#号大于控制文件头中的seq#,所以该监测函数认为存在控制文件逻辑不一致。

该kccpb_sanity_check_2函数是从10gR2才引入了,换句话说9i没有这样的控制文件健康性监测,引入该特性的目的是为了检测出写丢失lost write和陈旧读stale read。

 

 

该ORA-600[kccpb_sanity_check_2]一般有2个argument代码:

ARGUMENTS:
Arg [a] seq# in control block header.
Arg [b] seq# in the control file header.
Arg [c] maclean

 

 

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kccpb_sanity_check_2], [53672], [53643], [0x000000000], [], [], [], []
Current SQL statement for this session:
ALTER DATABASE   MOUNT
----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              000000000 ? 700000010007FE0 ?
ksedmp+0290          bl       ksedst               104C1FCD0 ?
ksfdmp+02d8          bl       03F34734             
kgerinv+00dc         bl       _ptrgl               
kgeasnmierr+004c     bl       kgerinv              1105312C0 ? 110211598 ?
                                                   000000000 ? 000015018 ?
                                                   000000000 ?
kccpb_sanity_check+  bl       kgeasnmierr          11019B7D0 ? 1104A0040 ?
013c                                               104DFF150 ? 300000003 ?
                                                   000000000 ? 00000D1A8 ?
                                                   000000000 ? 00000D18B ?
kccbmp_get+00f8      bl       kccpb_sanity_check   FFFFFFFFFFFFFFFF ?
                                                   2700000027 ?
kccsed_rbl+008c      bl       kccbmp_get           1100745A0 ? 104E02124 ?
kccocx+08fc          bl       kccsed_rbl           100000068 ?
kccocf+0140          bl       kccocx               FFFFFFFFFFEB7C8 ? 0CBC08BD8 ?
                                                   16D694B2F ? 110231D90 ?
kcfcmb+0ab4          bl       kccocf               000000000 ? 000000000 ?
                                                   000000002 ? 10041C1F4 ?
kcfmdb+0054          bl       kcfcmb               0FFFED160 ? 7000000B9FF038E ?
                                                   000000003 ? 000000000 ?
                                                   000000003 ? 000000000 ?
                                                   000000000 ? 000000000 ?
adbdrv+06c0          bl       kcfmdb               110262390 ? 7000000BCFFA050 ?
                                                   7000000BCFFA470 ? 104F38E08 ?
opiexe+2d34          bl       adbdrv               
opiosq0+1ac8         bl       opiexe               000000001 ? 000000000 ?
                                                   FFFFFFFFFFF9148 ?
kpooprx+016c         bl       opiosq0              300000020 ? 110231D90 ?
                                                   7000000BD1038F8 ?
                                                   A400011019B7D0 ? 000000000 ?
kpoal8+03cc          bl       kpooprx              FFFFFFFFFFFB954 ?
                                                   FFFFFFFFFFFB700 ?
                                                   1600000016 ? 100000001 ?
                                                   000000000 ? A40000000000A4 ?
                                                   000000000 ? 1103ABA18 ?
opiodr+0b2c          bl       _ptrgl               
ttcpip+1020          bl       _ptrgl               
opitsk+117c          bl       01FC6908             
opiino+09d0          bl       opitsk               1EFFFFD920 ? 000000000 ?
opiodr+0b2c          bl       _ptrgl               
opidrv+04a4          bl       opiodr               3C102A89D8 ? 404C72CF0 ?
                                                   FFFFFFFFFFFF8E0 ? 0102A89D0 ?
sou2o+0090           bl       opidrv               3C02A2895C ? 400000020 ?
                                                   FFFFFFFFFFFF8E0 ?
opimai_real+01bc     bl       01FC3174             
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0090         bl       main                 000000000 ? 000000000 ?

    last wait for 'control file sequential read' wait_time=0.000511 sec, seconds since wait started=0
                file#=0, block#=27, blocks=1
                blocking sess=0x0 seq=23
    Dumping Session Wait History
     for 'control file sequential read' count=1 wait_time=0.000511 sec
                file#=0, block#=27, blocks=1
     for 'control file sequential read' count=1 wait_time=0.000957 sec
                file#=0, block#=1, blocks=1
     for 'CSS operation: action' count=1 wait_time=0.036477 sec
                function_id=41, =0, =0
     for 'CSS operation: action' count=1 wait_time=0.000200 sec
                function_id=41, =0, =0
     for 'CSS initialization' count=1 wait_time=0.060291 sec
                =0, =0, =0
     for 'control file sequential read' count=1 wait_time=0.000547 sec
                file#=0, block#=1, blocks=1
     for 'control file sequential read' count=1 wait_time=0.003763 sec
                file#=0, block#=3, blocks=20
     for 'control file heartbeat' count=1 wait_time=3.906271 sec
                =0, =0, =0
     for 'control file sequential read' count=1 wait_time=0.020452 sec
                file#=0, block#=3, blocks=20
     for 'control file sequential read' count=1 wait_time=0.000310 sec
                file#=0, block#=1, blocks=1

 

 

 

上例是一个ORA-600[kccpb_sanity_check_2]的实战案例,虽然出现了该错误,但是由于多路复用了controlfile控制文件,通过修改参数control_files ,发现其中第二个控制文件mount时未报错,可以确信仅有1个控制文件有问题,所以只需要dd 复制一下即可。

针对这个 实例mount阶段的ORA-600[kccpb_sanity_check_2]错误,一般有几种解决方法:

1、如果多路复用了控制文件,则未必所有控制文件都坏了,修改control_files参数一个个试过来,注意当 好的控制文件和坏的控制文件都在参数control_files里时是无法mount成功的

2、从备份中restore健康的控制文件出来

3、若没有备份,则需要手动重建控制文件了

Oracle rman中set newname可能很慢

Oracle rman中set newname可能很慢, 对于该问题有几种可能:

 

1、rman 软件bug ,具体请去MOS查询

2、打开了BACKUP OPTIMIZATION on,show all命令确认下,若是则 CONFIGURE BACKUP OPTIMIZATION OFF; 暂时关掉

3、 set newname 之前不要set until time之类的设置,最好也别allocate channel xx type sbt,可以在set newname之后再设不迟

4 、使用catalog 恢复目录替代controlfile模式往往可以绕过该问题

 

具体的set newname操作涉及到  sys . dbms_backup_restore . normalizeFileName和 dbms_rcvman . translateDataFile  2个存储过程:

 

FUNCTION normalizeFileName (fname IN varchar2) RETURN varchar2;

— Normalizes the file name according to the rules of the target
— database and returns the normalized filename.

— Input parameters:
— fname
— File name to be normalized.
— Exceptions:
— NAME-TOO-LONG (ora-19704)
— The specified file name is longer than the port-specific
— maximum file name length.

— translateDataFile translates the datafile name/number into
— a datafile number and creation SCN and filename. getDataFile must
— be called to obtain the translation info, just as for the other translate
— functions.
— Unlike the other translation functions, translateDatafile by name is always
— performed relative to current time. If an until setting is in effect,
— and if the filename is ambiguous, then an exception is raised. Ambiguous
— means that the filename refers to different datafile at the until time than
— it does at the current time. This happens only when a filename has been
— reused. When fno and ckpscn are passed, the filename and other info as of
— that scn is returned.

— Input parameters:
— fname
— name of the datafile to be translated.
— The name must be a normalized filename.
— fno
— The datafile number. If the datafile number was not in use at the
— until time, then an exception is raised.
— Exceptions:
— DATAFILE_DOES_NOT_EXIST (ORA-20201)
— the datafile to be translated does not exists
— Check that the recovery catalog is current.

PROCEDURE translateDataFile(
fname IN varchar2);

PROCEDURE translateDatafile(
fno IN number);

PROCEDURE translateDatafile(
fno IN number
,ckpscn IN number);

 

如何清除Oracle控制文件中的无用记录,例如v$archived_log中的deleted归档日志记录

有这样一种需求,虽然controlfile中的archivelog和其他一些备份信息会被覆盖重用,收到参数control_file_record_keep_time的影响,但是我们没法手动触发这种重用,这导致有时候controlfile中的archivelog 记录过多,可能导致一些备份、恢复过程中的”control file sequential read”等待事件。

已知的一些解决方案,包括:

 

1. 重建控制文件, 例如 backup controlfile to trace后重建该控制文件,但要求有数据文件均存在

2. 设置control_file_record_keep_time=0 然后等待记录被重用, 太过被动了….

 

这里再介绍一种手动清除v$ARCHIVED_LOG中不管是DELETED还是available 记录的方法,注意不要在你的产品数据库上这样做:

 

SQL> select count(*) from v$archived_log;

COUNT(*)
———-
553

SQL> /

COUNT(*)
———-
553
SQL> execute sys.dbms_backup_restore.resetCfileSection( 11);

PL/SQL procedure successfully completed.

SQL> select count(*) from v$archived_log;

COUNT(*)
———-
0

 

PROCEDURE resetCfileSection(record_type IN binary_integer );

— This procedure attempts to reset the circular controlfile section.

— Input parameters:
— record_type
— The circular record type whose controlfile section is to be reset.

 

execute sys.dbms_backup_restore.resetCfileSection( 11);  ==> 清理v$ARCHIVED_LOG对应的记录

execute sys.dbms_backup_restore.resetCfileSection(28);  ==>清理v$rman_status对应的记录

 

以上是2个例子,具体的section_id可以这样获得:

 

1* select rownum-1, type from v$controlfile_record_section
SQL> /

ROWNUM-1 TYPE
———- —————————-
0 DATABASE
1 CKPT PROGRESS
2 REDO THREAD
3 REDO LOG
4 DATAFILE
5 FILENAME
6 TABLESPACE
7 TEMPORARY FILENAME
8 RMAN CONFIGURATION
9 LOG HISTORY
10 OFFLINE RANGE
11 ARCHIVED LOG
12 BACKUP SET
13 BACKUP PIECE
14 BACKUP DATAFILE
15 BACKUP REDOLOG
16 DATAFILE COPY
17 BACKUP CORRUPTION
18 COPY CORRUPTION
19 DELETED OBJECT
20 PROXY COPY
21 BACKUP SPFILE
22 DATABASE INCARNATION
23 FLASHBACK LOG
24 RECOVERY DESTINATION
25 INSTANCE SPACE RESERVATION
26 REMOVABLE RECOVERY FILES
27 RMAN STATUS
28 THREAD INSTANCE NAME MAPPING
29 MTTR
30 DATAFILE HISTORY
31 STANDBY DATABASE MATRIX
32 GUARANTEED RESTORE POINT
33 RESTORE POINT

LOG History占用的控制文件记录数可以查询v$controlfile_record_section获得:

 

SQL> select * from v$controlfile_record_section where type=’LOG HISTORY’ ;

TYPE RECORD_SIZE RECORDS_TOTAL RECORDS_USED FIRST_INDEX LAST_INDEX LAST_RECID
—————————- ———– ————- ———— ———– ———- ———-
LOG HISTORY 56 584 563 300 278 570

control file sequential read

Reading from the controlfile. This happens in many cases, e.g. while:

– making a backup of the controlfiles.

– the shared info (between instances) from the controlfile.

– reading other blocks from the controlfiles.

– reading the header block.

Wait time

The wait time is the elapse time of the read.

Parameters

file#

This identifies the controlfile that Oracle7 is reading from and with the following SQL statement one can determine the name of the controlfile:

select *

from x$kcccf

where indx = file#

block#

Blocknumber in the controlfile from where we are starting to read. The blocksize is the as the physical blocksize of the port (normally 512 bytes, some UNIX ports have 1 Kilobytes or 2 Kilobytes).

blocks

The number of blocks that we are trying to read.

Advise

If the wait time is too long (more then average I/O speed), check if the controlfiles are not a disk that is too busy. This could really impact Parallel Server performance as some of the synchronization between instances is done through the controlfiles.

ORACLE PRM-DULのリカバリ :RAIDが壊れた後にORACLEデータを抽出する

ORACLEデータベース によくあるエラ の解決策

プロのOracle Databaseの復旧サービスを提供
携帯番号: +86 13764045638 メール:service@parnassusdata.com

 

 

前週の金曜日で、RAIDがシャットダウンしたあと、Oracle9 iデータベースが起動できなくなったとお客様からの知らせがあった。この話を続く前に、言っておきたいことがある。このデータベースは我々が担当していない。私たちはこの知らせを受けた前に、このデータベース自身が存在していることですら知らなかった。一部のユーザーがすべてを私たちに担当させたが、一部のユーザーが五分五分であった。けど、私たちに対して、お客様がどのやり方を取っても構わない。これはユーザーそれぞれ異なった選択である。いや、ちょっと話がわき道にそれた。

原因を探して見れば、このデータベースはNOARCHIVELOGモードであったうえに、コールドバックアップもない。一回だけのデータベースをエクスポートしたこともない。しかも、こんなことはいつも金曜日に起こる!

まずは、壊れたオンラインredoログで最初シャットダウンをリカバリしたが、失敗した。そして、お客様が人工的にリカバリしてみたが、状況があまり変わっていない。その上に、シャットダウンした直後、すぐにコールドバックアップを作成していないから。これらの試しがよりひどい損害をもたらしたか私たちも見当がつかない。

 

SCNをゆっくり増やしてリカバリしてみた。異なったredoログメンバーも試した。けど、なかなかリカバリできず、ORA-600 [3020]が現れた。

_allow_resetlogs_corruption OPEN RESETLOGS でログファイルを成功にリセットしたが、データベースがまだ起動できず、「一部のオブジェクトが見つけ出せない」とエラ情報を返した。

一言で言えば、システムテーブルスペースが足りないから、データベースを起動出来ない。

 

このあとでいくつかのアイデアが生み出したが、やはりOracle PRM-DULを使うことが最適だった。PRM-DULはData ExtractionでDatabaseをUnloadingすることと意味している。その機能は起動できなくなったデータベースからデータを抽出する。もちろんこれはデータベース損害次第である。

 

より詳しい情報は「PRM-DULガイドブック」に参考してください。

 

もし、あいにくお客様がこのような場合に遭遇したら、www.parnassusdata.com に登録してください。そこで、PRM-DUL体験版をもらえる。あるいは私たちにデータ抽出サビースの申請を出してください。私はもう既にPRM-DULについて詳しいから、一つ目の方法を選んだ。

 

それを使ったら、数時間だけでデータを収集できた。胡さんより、実演バーションをもらった。

 

実演バーションで、すべてのテーブルに前の10行のデータを抽出できた。すべても成功に実行できたと確認できたら、お客様が七日のライセンスをもらった。私は数分だけで抽出コマンドを再び実行したが、なんと、PRM-DULが奇跡的にすべてのテーブルのダンプファイルもリカバリできた。後で必要なのはほかのパソコンにコピするだけ

 

PRM-DULはすべてのシーケンスもPL / SQLオブジェクトも作成した。いくつのSYSオブジェクトをダンプするだけで、インディクスと制約が抽出できるが、ユーザーが同じデータモデルを持っているから、そこから定義を抽出する方が簡単になる。

 

お客様はこんな短い時間にこれほど致命的な損害をリカバリできたことに高く評価した。これはすべてもPRM-DULとその開發者、胡さんのおかげだ。胡さん、金曜日に仕事を早めに完成できて、ありがとうございました。

 

Oracle ORA-8102 损坏索引,ORA-1499 即使在drop并重建索引后

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638 QQ号:47079569 邮箱:service@parnassusdata.com

 

ORA-08102
oerr ora 8102
08102, 00000, "index key not found, obj# %s, file %s, block %s (%s)"
// *Cause: Internal error: possible inconsistency in index
// *Action: Send trace file to your customer support representative, along
// with information on reproducing the error


ORA-01499
oerr ora 1499
01499, 00000, "table/index cross reference failure - see trace file"
// *Cause:
// *Action:

 

 

 

问题摘要行:

=====================

ORA-8102, ORA-1499 EVEN AFTER DROPPING AND RECREATING INDEX

 

Problem Description:

====================

ORA-8102 occurs when accessing a table.

 

08102, 00000, “index key not found, obj# %s, dba %s (%s)”

// *Cause:  Internal error: possible inconsistency in index

// *Action:  Send trace file to your customer support representative, along

//with information on reproducing the error

 

通过使用OBJ#从错误消息中找出有问题的对象:

 

select OWNER,OBJECT_NAME, OBJECT_TYPE from DBA_OBJECTS

where OBJECT_ID=<object#_from_error>;

 

如果返回的对象是一个索引,使用上述查询返回的值,通过以下查询找出与其相关联的表:

 

SELECT TABLE_NAME FROM DBA_INDEXES

WHERE INDEX_NAME=<object_name> AND OWNER=<owner>;

 

接着,分析表来确定索引中不一致的类型:

 

ANALYZE TABLE <owner>.<table_name> VALIDATE STRUCTURE CASCADE;

 

使用CASCADE子句是很重要的,因为分别验证的表和索引结构不会显示不一致性。

 

analyze 命令返回ORA-1499

01499, 00000, “table/index cross reference failure – see trace file”

 

检查在user_dump_dest 中生成的跟踪文件,显示cross reference 失败的类型。在这个情况中,类型是”row not found in index”。

Oracle Support 能帮助你确定哪个索引有丢失行条目。

 

普通的drop并创建索引的解决方法不适用于这种情况。使用VALIDATE STRUCTURE CASCADE 分析表仍返回ORA-1499,即使在重建了索引之后。

=====================

 

通过export/import 重建表,然后重建解决问题的索引。

 

 

说明:

============

 

There was a row in the table without a corresponding index entry. The normal

way to resolve this is to drop and recreate the index.  IMPORTANT: Rebuilding

the index will NOT resolve this error as the existing index, which is missing

the row, will be used to rebuild the index.  Dropping and recreating the index

forces Oracle to recreate the index from the table data.  This was most likely

caused by some type of hardware problem as the problem failed to be reproduced

in house. 在表中有一行没有对应的索引条目。解决的正常方法这是drop并重建索引。重要提示:重建该索引将不会解决这个错误,因为现有的丢失行的索引将用于重建索引。drop并重建索引强制Oracle从表数据重建索引。这最有可能是由某种类型的硬件问题引起的,因为问题未能被再现。

 

Oracle ORA-1499. 表/索引行数不匹配

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638 QQ号:47079569 邮箱:service@parnassusdata.com

 

ora-01499
 oerr ora 1499
01499, 00000, "table/index cross reference failure - see trace file"
// *Cause:
// *Action:



适用于:

Oracle Database – Enterprise Edition – 版本 8.1.7.0 到12.1.0.1 [Release 8.1.7 to 12.1]
本文信息适用于任何平台。
Oracle Server Enterprise Edition – 版本: 8.1.7.0 到 11.2.0.2

症状

使用”validate structure cascade”分析表失败,生成ORA-1499 且跟踪文件包含信息 “Table/Index row count mismatch”。示例:

SQL> analyze table test validate structure cascade;
analyze table test validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure – see trace file

 

The associated trace file contains:

Table/Index row count mismatch
table 6559 : index 10000, 0
Index root = tsn: 6 rdba: 0x01400091

It means: A table scan returned 6559 rows and an index scan returned 10000 rows.

“Index root” is the segment header information for the index:

rdba: 0x01400091 is the Index segment header relative data block address. It is decimal 20971665 which is Rfile#=5 Block#=145 :

SQL> select dbms_utility.data_block_address_file(20971665)  “Rfile#”
2          ,dbms_utility.data_block_address_block(20971665) “Block#”
3 from dual;

Rfile#     Block#
———- ———-
5          145
Running the next query can identify the associated index:

QUERY 1:

SQL> select owner, segment_name, segment_type
2    from  dba_segments
3    where header_file = 5
4      and header_block = 145;

OWNER    SEGMENT_NAME    SEGMENT_TYPE
——– ————— ——————
SCOTT    I_TEST          INDEX

 

该逻辑不一致问题也能显示为10g+中ORA-600 [kdsgrp1] 或更低版本中的ORA-600 [12700]。

原因

在表和其索引间有逻辑不一致性。这种类型的逻辑不一致性通常是由于表中的High Water Mark (HWM) 的问题,其中全表扫描返回的行可能比索引扫描的少。

不一致性可能由Oracle defect 导致或由于LOST IO引起的OS/硬件问题。

解决方案

The rows retrieved through the index that are not retrieved by a full table scan can be identified by running this query: 通过运行此查询,可以识别由不由全表扫描检索的索引检索的行:

select /*+ INDEX_FFS(<tablename> <indexname identified in 2.1>) */ rowid
, dbms_rowid.ROWID_RELATIVE_FNO(rowid) relative_fno
, dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block
from <tablename>
where <indexed column> is not null
minus
select /*+ FULL(<tablename>)*/ rowid
, dbms_rowid.ROWID_RELATIVE_FNO(rowid) relative_fno
, dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block
from <tablename>;

示例:

select /*+ INDEX_FFS(TEST I_TEST) */ rowid
, dbms_rowid.ROWID_RELATIVE_FNO(rowid) relative_fno
, dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block
from test
where c2 is not null
minus
select /*+ FULL(TEST)*/ rowid
, dbms_rowid.ROWID_RELATIVE_FNO(rowid) relative_fno
, dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block
from test;

全表扫描丢失的值可以使用下一个plsql(SCRIPT 1)的索引存储在另一个表中:

drop table test_copy;

create table test_copy as select * from test where 1=2;

declare
cursor missing_rows is
select /*+ INDEX_FFS(TEST I_TEST) */ rowid rid
from test
where c2 is not null
minus
select /*+ FULL(TEST)*/ rowid rid
from test;
begin
for i in missing_rows loop
insert into TEST_COPY
select /*+ ROWID(TEST) */ * from TEST where rowid = i.rid;
end loop;
end;
/
解决方案

– 当索引的行比表少,重建索引可能解决该问题。

– 当索引的行比表多,重建索引或通过运行dummy 插入到表来提高HWM可能最终修复该逻辑损坏。在这种情况下,逻辑损坏被修复但这些行可能”lost”,可以在应用这些解决方法之前使用以上”SCRIPT 1″来恢复它们。

– 为了避免该不一致性,查看以下已知问题的列表。

如果需要Oracle Support的额外帮助,请提供:

1. analyze 命令生成的跟踪文件。
2. 以上”QUERY 1″ 的结果。
3. 通过dump基表的段头生成的跟踪文件:

select header_file, header_block, tablespace_name
from   dba_segments
where  owner = upper(‘&table_owner’) and segment_name = upper(‘&table_name’);

alter system dump datafile &header_file block &header_block;

This trace file will be used to identify the HWM.

@ Example from a block dump not using ASSM (Automatic Segment Space Management):
@
@ Extent Control Header
@ —————————————————————–
@ Extent Header:: spare1: 0 spare2: 0 #extents: 4 #blocks: 31
@ last map 0x00000000 #maps: 0 offset: 4128
@ Highwater:: 0x014000d6 ext#: 3 blk#: 5 ext size: 8
@
@ So, HWM is located at RDBA 0x014000d6 in extent_id=5 and block#=5 in that extent.

  1. 为索引检索的额外行识别extent id’s 的查询结果:

select rid, a.relative_fno, a.block, e.owner, e.segment_name, e.segment_type, e.extent_id, e.blocks
from (select /*+ INDEX_FFS(<table name> <index name>) */ rowid rid
, dbms_rowid.ROWID_RELATIVE_FNO(rowid) relative_fno
, dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block
from <table owner.table name>
where <indexed column> is not null
minus
select /*+ FULL(<table name>)*/ rowid rid
, dbms_rowid.ROWID_RELATIVE_FNO(rowid) relative_fno
, dbms_rowid.ROWID_BLOCK_NUMBER(rowid) block
from <table owner.table name>
) a
, dba_extents e
where a.relative_fno=e.relative_fno
and e.tablespace_name = upper(‘&tablespace_name’)
and v.ts#=&tablespace_number
and (a.block between e.block_id and e.block_id+blocks-1);

 

注:

– 用适当值替换owner, table name, index name 和indexed column。
– tablespace_name 是以上步骤3中提供的。
– 该查询提供了由索引检索行位于的extent 。
 

已知问题: 

要避免该问题,参考 Note 1499.1 获取已知的Oracle 缺陷。对于由OS/硬件引起的LOST IO,联系第三方供应商。

参考

NOTE:1554054.1 - Quick method to identify table/index mismatch when analyze validate structure cascade takes significant time

Oracle ASM 在节点2上ORA-15186 ORA-15063

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638 QQ号:47079569 邮箱:service@parnassusdata.com

 

 

适用于:

Oracle Database – Enterprise Edition – 版本 11.1.0.7 11.2.0.3 [Release 11.1 to 11.2]
本文信息适用于任何平台。

症状

 在一个2节点RAC集群中创建asmlib 磁盘后

在第一个节点上磁盘组被成功创建,

在第二个节点上尝试mount磁盘组,得到::ORA-15063: ASM discovered an insufficient number of disks for diskgroup

查询v$asm_disk – 显示 header_status: unknown

同样在节点2看到以下错误

SQL> ALTER DISKGROUP ALL MOUNT
NOTE: cache registered group DATA0 number=1 incarn=0x6805c243
NOTE: cache began mount (not first) of group DATA0 number=1 incarn=0x6805c243
NOTE: cache registered group FRA number=2 incarn=0x6805c244
NOTE: cache began mount (not first) of group FRA number=2 incarn=0x6805c244
NOTE:Loaded lib: /opt/oracle/extapi/64/asm/orcl/1/libasm.so
ERROR: kfkOsmIdent:asm_open(): asm_erc:1 msg:Operation not permitted pid:15563
ORA-15186: ASMLIB error function = [kfkOsmIdent:asm_open], Â error = [1], Â mesg = [Operation not permitted]
ERROR: kfkOsmIdent:asm_open(): asm_erc:1 msg:Operation not permitted pid:15563
ORA-15186: ASMLIB error function = [kfkOsmIdent:asm_open], Â error = [1], Â mesg = [Operation not permitted]
ERROR: kfkOsmIdent:asm_open(): asm_erc:1 msg:Operation not permitted pid:15563
ORA-15186: ASMLIB error function = [kfkOsmIdent:asm_open], Â error = [1], Â mesg = [Operation not permitted]
ERROR: kfkOsmIdent:asm_open(): asm_erc:1 msg:Operation not permitted pid:15563
ORA-15186: ASMLIB error function = [kfkOsmIdent:asm_open], Â error = [1], Â mesg = [Operation not permitted]
ERROR: kfkOsmIdent:asm_open(): asm_erc:1 msg:Operation not permitted pid:15563
ORA-15186: ASMLIB error function = [kfkOsmIdent:asm_open], Â error = [1], Â mesg = [Operation not permitted]

原因

在节点1

$ ls -l /dev/oracleasm/disks
total 0
brw-rw—- 1 oracle dba 253, 34 Sep 18 07:34 ASMDB1
brw-rw—- 1 oracle dba 253, 29 Sep 18 07:36 ASMF1
brw-rw—- 1 oracle dba 253, 28 Sep 18 07:37 ASMRD1
brw-rw—- 1 oracle dba 253, 25 Sep 18 07:38 ASMRD2
brw-rw—- 1 oracle dba 253, 33 Sep 18 07:39 ASMT1

在节点2

—->oracle@xxxxxx:/oracle/diag/asm/+asm/+ASM2/trace
$ ls -l /dev/oracleasm/disks
total 0
brw-rw—- 1 oracle dba 65, 161 Sep 18 07:40 ASMDB1
brw-rw—- 1 oracle dba 65, 177 Sep 18 07:40 ASMF1
brw-rw—- 1 oracle dba 65, 193 Sep 18 07:40 ASMRD1
brw-rw—- 1 oracle dba 65, 209 Sep 18 07:40 ASMRD2
brw-rw—- 1 oracle dba 65, 225 Sep 18 07:40 ASMT1

节点2 不在使用多路径设备。

解决方案

确认以下:

1.   两个节点都有相同 /etc/sysconfig/oracleasm  文件且都有以下行

# ORACLEASM_SCANORDER: Matching patterns to order disk scanning
ORACLEASM_SCANORDER=”dm”

# ORACLEASM_SCANEXCLUDE: Matching patterns to exclude disks from scan
ORACLEASM_SCANEXCLUDE=”sd”

2.  两个节点显示相同symlink ,当发出以下:

# ls -l  /etc/sysconfig/oracleasm
lrwxrwxrwx 1 root root 24 Sep 18 22:10 /etc/sysconfig/oracleasm -> oracleasm-_dev_oracleas

一旦完成,请重新扫描#oracleasm scandisks 并再次验证两个节点上的# ls -l /dev/oracleasm/disks

它们应当使用dm 设备而不是单路径。 

参考

NOTE:398622.1 – ORA-15186: ASMLIB error function = [asm_open], error = [1], mesg = [Operation not permitted]

沪ICP备14014813号-2

沪公网安备 31010802001379号