Introduction TXChecker provides information useful for determining the nature and extent of corruption that may be introduced when forcing a database open with a combination of _CORRUPTED_ROLLBACK_SEGMENTS and _OFFLINE_ROLLBACK_SEGMENTS. When a database is opened with these parameters due to missing or corrupt undo datafiles, it is unknown how much data will be inaccessible (returning errors or hanging/spinning) or force-committed. Force-committing can cause logical data corruption and possible data dictionary corruption. This tool provides information necessary for the correction and repair of such corruption by identifying which objects will be effected. TXChecker MUST only be used under the advice of Oracle Support Services. Failure to do so and forcing the database open will render the database unsupported by Oracle. Back to Contents Overview In cases were a problem occurs with one or more undo datafiles preventing the database from opening, two hidden and unsupported init.ora parameters _OFFLINE_ROLLBACK_SEGMENTS and/or _CORRUPTED_ROLLBACK_SEGMENTS may be used by Oracle Support to aid in opening the database. These two parameters change the way Oracle handles active transactions when the undo required to roll them back is not available. They should only be used under the advice of Oracle Support due to the increased possibilities of introducing logical data corruptions throughout the database. Without using TXChecker the only way to know which objects are affected is by forcing the database open and then manually dump the active transaction undo chains. This method is very time consuming and can prove futile if after the database is opened you find out important objects are no longer accessible. TXChecker will NOT help identify objects that may be negatively affected by using the _ALLOW_RESETLOGS_CORRUPTIONparameter. TXChecker will only identify objects that may require undo information to rollback active transactions or clean out dirty blocks. It will NOT identify objects that are out of sync with each other or the data dictionary. Having datafiles from different points in time will produce inaccurate results from TXChecker. TXChecker will show the object name, owner and type of all objects involved in active transactions, assuming the undo datafiles can be read successfully. If the active transaction chains cannot be read TXChecker can scan one or more datafiles to identify objects that may require undo data from missing or corrupted undo segments. TXChecker results is easier for Oracle Support to decide the best course of action before forcing the database open. If all the affected objects can be rebuilt from other sources and the data dictionary is not affected, the database can be opened without the need to rebuild the entire database using export/import. TXChecker operates in a read-only manner and does NOT write to any Oracle datafiles, control files or log files. TXChecker will not damage the database. Back to Contents Features Version 1.4 of TXChecker provides the following features: Show you which undo segments were online the last time the database was open. If the database was shutdown cleanly they should all be offline, except for the system undo segment. If the database crashed, this information can then be used to set the relevant rollback segment parameters should they be needed to force the database open. Show you which undo segments contained active transactions at time of a crash, and which undo segments will report problems when Oracle tries to access them. The undo chains for active transactions are also checked to identify the objects involved or what problems are encountered when traversing them. Obviosly if the undo datafile cannot be read the undo segments can not be accessed. Browse | Subscriptions | Authoring Wizard | Documents In Progress | Query By Attribute Knowledge -a When scanning datafiles (with -d/-f/-l/-t options) report objects using any of the undo segments (not just those with errors) (OPTIONAL) -b For objects found, print the datablock addresses (OPTIONAL) -ccontrolfile_name Fully qualified controlfile name to read (MANDATORY) For each active transaction, show the object owner, object name, partition name if a partitioned object and object type for all objects involved in that transaction. A count of undo records found for each object is also shown. Scan a single datafile, a list of datafiles, a tablespace or all datafiles in the database for datablocks that may need undo data for clean-out purposes, or datablocks that are involved in active transactions. This is useful for determining what objects will need attention after the database is forced open. For all objects identified as needing undo data, a SQL script is created which contains the commands to carry out an 'ANALYZE ... VALIDATE STRUCTURE' to check for possible corruption or access problems. Identify which undo segments were used to update each object. When the datablocks are listed in the report, the transactions IDs are also displayed. Lists which undo segments were used before and after determined SCN values. This is useful to find out which undo segments would need to be named in either of the underscore parameters to force the database open to allow active transactions to be force committed or dirty datablocks to be cleaned out. Allows you to search the database for active transactions using a particular undo segment, a transaction ID or an undo segment and slot number, or a transaction using an undo segment where the wrap number is greater than a determined value. When identifying each affected datablock address, TXChecker will create a script which can be run to dump those datablocks for further analysis. Will scan controlfiles and datafiles located on RAW devices. Back to Contents Supported Platforms 64-bit & 32-bit Sparc Solaris 64-bit & 32-bit Linux 64-bit & 32-bit HP-UX 64-bit AIX 32-bit MS Windows TXChecker has had initial testing against versions 8.0 through to 10.2 but only on a limited basis so feedback is encouraged to let us know how well it worked, what problems were encountered or what enhancements you would like to have added. Back to Contents Download TXChecker Current TXChecker Version: 1.4 October, 2007 x86 Linux (v1.4) Click here to download the file Sparc Solaris (v1.4) Click here to download the file HP/UX - RISC (v1.4) Click here to download the file AIX (v1.4) Click here to download the file MS Windows (v1.4) Click here to download the file If a file download dialog box does not appear when clicking on the above link, you may need to clear your web browser's cache and/or restart your web browser. If you are still unable to download the file, you may request that we email you a copy: Stephan.Haisley@oracle.com Back to Contents Installing TXChecker Download TXChecker using one of the links above. Copy the tar file to the directory where TXChecker is to be installed and issue the following commands. Unix/Linux % tar xvfp TXChecker_version_platform.tar % cd TXChecker % chmod u+x TXChecker Windows 1. Extract the files from the zip file into a directory: C:\TXC> unzip TXChecker_verson_MSWindows.zip 2. Leave the Microsoft.VC80.CRT directory where it is to allow TXChecker to find the DLLs. This User Guide is also uncompressed with full instructions on how to run the tool. Back to Contents Running TXChecker To see the list of program arguments: % TXChecker Usage is: TXChecker [options] Options: -ccontrolfile_name Fully qualified controlfile name to read (MANDATORY) -d Scan database for active TXs (use when undo not available) (OPTIONAL) -ffilename Scan the named datafile for active TXs (OPTIONAL) -g Indicates you want to find all blocks taking part in transactions with a USN > than the USN supplied in -x parameter (same constraints as -w) (OPTIONAL) -llistfile Scan all the datafiles listed in the listfile for active TXs (OPTIONAL) -mminutes Number of minutes used to consider a TX as active (1-120) (DEFAULTS TO 15 MINUTES) -p Show the names and last known status of the UNDO segments (OPTIONAL) -s Skip read-only or offline normal datafiles (OPTIONAL) -ttablespace Scan all the datafiles for this tablespace (OPTIONAL) -u Report ITL entries active if marked with an upper bound ('U' flag) fast commit SCN instead if active transactions (OPTIONAL) -wwrap# Wrap# for XID in ITL entry to report blocks where wrap# > this one (OPTIONAL). Must use -x with this option -xXID XID for transaction wanting to search for (OPTIONAL) Use format rrrr.ssss.wwwwwwww using Hexadecimal numbers. See the readme for full instructions on using -x, -w and -g options NOTE: Options -d/-f/-l/-t are exclusive, and only one should be specified. NOTES: Options -x (search for a transaction ID) and -a (search for all active transactions) are exclusive and will not work if specified together. Options -w (search for transactions with a wrap number greater than one specified) and -g (search for transactions using an undo segment with a USN greater than the number specified) are exclusive and should not be used together. The dump_datablocks.sql script containing the commands to dump the affected datablocks will only be produced when using the -b option. TXChecker will always create a logfile in the current directory called TXChecker_date_time_DBName.log. This MUST be reviewed and uploaded to Oracle Support Services for analysis. TXChecker uses the control file to gather the locations and names of all tablespaces and datafiles belonging to the database. Using a controlfile from a different point in time to the rest of the database may result in errors identifying missing datafiles. If this happens, use the listfile (-l), tablespace (-t) or datafile (-d) options to scan selected datafiles. If there is adequate CPU and I/O bandwidth available TXChecker could be run in parallel to scan different tablespaces or datafiles concurrently, reducing the time it takes to scan larger databases. To do this, simply run TXChecker from multiple sessions at the same time. NOTE: TXChecker will always create a logfile in the current directory called TXChecker_date_time_DBName.log. This MUST be uploaded to Oracle Support Services for analysis and for advice on how best to continue. Also upload a conttol file and file headers dump (alter session set events 'immediate trace name controlf level 12' and alter session set events 'immediate trace name file_hdrs level 10'). Common Examples Here are some common examples of using TXChecker along with links to the output created by using the commands. 1. Check the undo segments for active transactions: % ./TXChecker -c/u02/oradata/S102/control02.ctl Example output: TXChecker_ex1.out 2. Scan the database for objects which may require undo from troubled undo segments, showing datablock addresses: % ./TXChecker -c/u02/oradata/S102/control02.ctl -d -b Example output: TXChecker_ex2.out 3. Find all objects in a list of datafiles which may require undo from any undo segments: % ./TXChecker -c/u03/oradata/S901/control02.ctl -lfiles.S901 -a Example output: TXChecker_ex3.out See the Appendix B for details of creating a list file containing the names of the datafiles to be scanned. 4. Scan a single datafile for objects which may require undo from troubled undo segment with a particular transaction ID: % ./TXChecker -f/u02/oradata/S101/users01.dbf -c/u01/oradata/S101/control01.ctl -x0001.0004.00000125 -b Example output: TXChecker_ex4.out 5. Scan a tablespace for with non-committed transactions using a USN greater than a known USN: % ./TXChecker -c/u02/oradata/S102/control02.ctl -tsystem -x0009.FFFF.FFFFFFFF -g Example output: TXChecker_ex5.out Back to Contents Warnings and Restrictions When TXChecker is scanning datafiles it is reading and checking every data block for active ITL (Interested Transaction List) entries that may require access to undo information. To scan large datafiles or a large database TXChecker may take some time due to the amount of reads being carried out, so consider using the listfile or tablespace option to scan the most important datafiles. TXChecker has NOT been coded for OCFS or ASM files yet. This may be introduced in a later release. TXChecker will currently not handle a database where there are more than 250 rows in the X$KCCFE table. This restriction has been removed in version 1.4. Details of a workaround can be found in Appendix C. Back to Contents Future Release Features Features that may be introduced in future releases of TXChecker include: a. Add support for ASM, OCFS. b. Create a script using the datablock addresses to be used to select data around affected datablocks. Create a similar script to try selecting the data in the affected datablocks. c. Add the ability to carry out more internal related undo problem diagnostics. d. Add parallelism feature to scan datafiles in parallel. Back to Contents Reporting Feedback If you encounter problems running TXChecker or would like to provide feedback or enhancement requests, please send emails to Stephan.Haisley@oracle.com. I have done my best to test TXChecker on many different machines and conditions but if it does core dump or report unexpected errors, please send me the logfile, stack trace (from the core) and if possible a binary copy of the controlfile and system datafile. If the system datafile is too big to upload to Oracle, copy the first 3000 blocks using dd: % dd if=system_datafile of=system_datafile_copy bs=blocksize count=3000 Back to Contents Appendix A: Output Example The following logfiles were produced from some of the above common examples. TXChecker_ex1.out TXChecker started at: Thu May 3 12:06:17 2007 TXChecker - v1.3 by Center Of Expertise (COE), Oracle Corporation (build 04/13/07) Program command: TXChecker -c/u02/oradata/S102/control02.ctl Database Name: S102 Version: 10.2.0 *** Database last checkpointed at 05/03/2007 12:05:42 (SCN: 0x0.0xc5f24c) *** Using 60 minutes to find most active transactions (-m60) *** Undo Segments: USN: 0 Name: SYSTEM TBS#: 0 File: 1 Block: 9 Instance: 0 SMU: N Status: 3 - Online SCN: 0000.00000000 XactSQN: 0x00000000 UndoSQN: 0x00000000 USN: 1 Name: _SYSSMU1$ TBS#: 1 File: 2 Block: 9 Instance: 0 SMU: Y Status: 3 - Online SCN: 0000.00c59c2a XactSQN: 0x00000208 UndoSQN: 0x0000020b USN: 2 Name: _SYSSMU2$ TBS#: 1 File: 2 Block: 25 Instance: 0 SMU: Y Status: 3 - Online SCN: 0000.00c59cc7 XactSQN: 0x000002a6 UndoSQN: 0x00000220 USN: 3 Name: _SYSSMU3$ TBS#: 1 File: 2 Block: 41 Instance: 0 SMU: Y Status: 3 - Online SCN: 0000.00c59c50 XactSQN: 0x0000029c UndoSQN: 0x00000277 ... USN: 10 Name: _SYSSMU10$ TBS#: 1 File: 2 Block: 153 Instance: 0 SMU: Y Status: 3 - Online SCN: 0000.00c59c1c XactSQN: 0x000001f7 UndoSQN: 0x00000263 USN: 11 Name: _SYSSMU11$ TBS#: 1 File: 2 Block: 241 Instance: 0 SMU: Y Status: 2 - Offline SCN: 0000.0006f10a XactSQN: 0x00000003 UndoSQN: 0x00000002 USN: 12 Name: _SYSSMU12$ TBS#: 1 File: 2 Block: 289 Instance: 0 SMU: Y Status: 2 - Offline SCN: 0000.0006f0f0 XactSQN: 0x00000003 UndoSQN: 0x00000002 *** Active Transactions: USN: 2 Name: _SYSSMU2$ File: 2 Block: 25 Instance: 0 Status: 3 - Online * Active TX at slot 44 #undo blocks: 2 Last bk: 2.3103 USN: 6 Name: _SYSSMU6$ File: 2 Block: 89 Instance: 0 Status: 3 - Online * Active TX at slot 24 #undo blocks: 7 Last bk: 2.1301 *** Objects that may require undo data: Obj#: 51465 Name: OE.ITEM_ORDER_IX Type: INDEX Undo recs: 15 Used undo segment IDs: 2, Obj#: 51466 Name: OE.ITEM_PRODUCT_IX Type: INDEX Undo recs: 15 Used undo segment IDs: 2, Obj#: 51402 Name: OE.INVENTORIES Type: TABLE Undo recs: 95 Used undo segment IDs: 6, Obj#: 51400 Name: OE.ORDERS Type: TABLE Undo recs: 2 Used undo segment IDs: 2, Obj#: 51396 Name: OE.ORDER_ITEMS Type: TABLE Undo recs: 15 Obj#: 51396 Name: OE.ORDER_ITEMS Type: TABLE Undo recs: 15 Used undo segment IDs: 2, Obj#: 51397 Name: OE.ORDER_ITEMS_PK Type: INDEX Undo recs: 15 Used undo segment IDs: 2, Obj#: 51398 Name: OE.ORDER_ITEMS_UK Type: INDEX Undo recs: 15 Used undo segment IDs: 2, Obj#: 51401 Name: OE.ORDER_PK Type: INDEX Undo recs: 2 Used undo segment IDs: 2, Obj#: 51468 Name: OE.ORD_CUSTOMER_IX Type: INDEX Undo recs: 2 Used undo segment IDs: 2, Obj#: 51469 Name: OE.ORD_ORDER_DATE_IX Type: INDEX Undo recs: 2 Used undo segment IDs: 2, Obj#: 51467 Name: OE.ORD_SALES_REP_IX Type: INDEX Undo recs: 2 Used undo segment IDs: 2, *** Use validate_objects.sql script file to validate the structure of possibly corrupt objects if the undo required is not available. NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE! ************************************************************************************* Upload the logfile (TXChecker_050307_1206_S102.log) to Oracle Support Services for analysis. Do NOT attempt to force the database open until the logfile has been analyzed. TXChecker_ex2.out TXChecker started at: Thu May 3 12:32:26 2007 TXChecker - v1.3 by Center Of Expertise (COE), Oracle Corporation (build 04/13/07) Program command: TXChecker -c/u02/oradata/S102/control02.ctl -d -b Database Name: S102 Version: 10.2.0 *** Database last checkpointed at 05/03/2007 12:05:42 (SCN: 0x0.0xc5f24c) *** Using 60 minutes to find most active transactions (-m60) *** Undo Segments: USN: 0 Name: SYSTEM TBS#: 0 File: 1 Block: 9 Instance: 0 SMU: N Status: 3 - Online SCN: 0000.00000000 XactSQN: 0x00000000 UndoSQN: 0x00000000 USN: 1 Name: _SYSSMU1$ TBS#: 1 File: 2 Block: 9 Instance: 0 SMU: Y Status: 3 - Online SCN: 0000.00c59c2a XactSQN: 0x00000208 UndoSQN: 0x0000020b USN: 2 Name: _SYSSMU2$ TBS#: 1 File: 2 Block: 25 Instance: 0 SMU: Y Status: 3 - Online SCN: 0000.00c59cc7 XactSQN: 0x000002a6 UndoSQN: 0x00000220 USN: 3 Name: _SYSSMU3$ TBS#: 1 File: 2 Block: 41 Instance: 0 SMU: Y Status: 3 - Online SCN: 0000.00c59c50 XactSQN: 0x0000029c UndoSQN: 0x00000277 ... USN: 10 Name: _SYSSMU10$ TBS#: 1 File: 2 Block: 153 Instance: 0 SMU: Y Status: 3 - Online SCN: 0000.00c59c1c XactSQN: 0x000001f7 UndoSQN: 0x00000263 USN: 11 Name: _SYSSMU11$ TBS#: 1 File: 2 Block: 241 Instance: 0 SMU: Y Status: 2 - Offline SCN: 0000.0006f10a XactSQN: 0x00000003 UndoSQN: 0x00000002 USN: 12 Name: _SYSSMU12$ TBS#: 1 File: 2 Block: 289 Instance: 0 SMU: Y Status: 2 - Offline SCN: 0000.0006f0f0 XactSQN: 0x00000003 UndoSQN: 0x00000002 *** Active Transactions: USN: 2 Name: _SYSSMU2$ File: 2 Block: 25 Instance: 0 Status: 3 - Online * Active TX at slot 44 #undo blocks: 2 Last bk: 2.3103 USN: 6 Name: _SYSSMU6$ File: 2 Block: 89 Instance: 0 Status: 3 - Online * Active TX at slot 24 #undo blocks: 7 Last bk: 2.1301 *** Objects that may require undo data: Obj#: 51465 Name: OE.ITEM_ORDER_IX Type: INDEX Undo recs: 15 Used undo segment IDs: 2, Obj#: 51466 Name: OE.ITEM_PRODUCT_IX Type: INDEX Undo recs: 15 Used undo segment IDs: 2, Obj#: 51402 Name: OE.INVENTORIES Type: TABLE Undo recs: 95 Used undo segment IDs: 6, Obj#: 51400 Name: OE.ORDERS Type: TABLE Undo recs: 2 Used undo segment IDs: 2, Obj#: 51396 Name: OE.ORDER_ITEMS Type: TABLE Undo recs: 15 Used undo segment IDs: 2, Obj#: 51397 Name: OE.ORDER_ITEMS_PK Type: INDEX Undo recs: 15 Used undo segment IDs: 2, Obj#: 51398 Name: OE.ORDER_ITEMS_UK Type: INDEX Undo recs: 15 Used undo segment IDs: 2, Obj#: 51401 Name: OE.ORDER_PK Type: INDEX Undo recs: 2 Used undo segment IDs: 2, Obj#: 51468 Name: OE.ORD_CUSTOMER_IX Type: INDEX Undo recs: 2 Used undo segment IDs: 2, Obj#: 51469 Name: OE.ORD_ORDER_DATE_IX Type: INDEX Undo recs: 2 Used undo segment IDs: 2, Obj#: 51467 Name: OE.ORD_SALES_REP_IX Type: INDEX Undo recs: 2 Used undo segment IDs: 2, *** Undo segments (headers) that encountered errors preventing Active TX scan: USN: 5 Name: _SYSSMU5$ File: 2 Block: 73 Instance: 0 Error: 20 - Bad block type *** Scanning database for datablocks that may require undo: *** Asterisk ('*') denotes blocks being updated since 04/09/2007 00:00:58 (SCN: 0x0.0xc54fef) Scanning datafile#: 5 - /u03/oradata/S102/example01.dbf (EXAMPLE) - Active TX blocks: 0 Scanning datafile#: 9 - /u02/oradata/S102/logmnr01.dbf (LOGMNR) - Active TX blocks: 0 Scanning datafile#: 6 - /u03/oradata/S102/perfstat01.dbf (PERFSTAT) - Active TX blocks: 0 Scanning datafile#: 8 - /u03/oradata/S102/stradmin01.dbf (STRADMIN) - Active TX blocks: 0 Scanning datafile#: 3 - /u02/oradata/S102/sysaux01.dbf (SYSAUX) - Active TX blocks: 23 * Scanning datafile#: 1 - /u02/oradata/S102/system01.dbf (SYSTEM) - Active TX blocks: 486 Scanning datafile#: 1 - /u02/oradata/S102/system01.dbf (SYSTEM) - Active TX blocks: 486 Temporary datafile (/u03/oradata/S102/temp01.dbf) - SKIPPING Scanning datafile#: 7 - /u02/oradata/S102/test01.dbf (TEST) - Active TX blocks: 0 Undo datafile (/u02/oradata/S102/undotbs01.dbf) - SKIPPING Scanning datafile#: 4 - /u03/oradata/S102/users01.dbf (USERS) - Active TX blocks: 0 Scanning datafile#: 10 - /u03/oradata/S102/users02.dbf (USERS) - Active TX blocks: 0 *** Objects that may require undo data: *** Asterisk ('*') denotes blocks being updated since 04/09/2007 00:00:58 (SCN: 0x0.0xc54fef) DataObj#: 593 Name: SYS.I_AW_OBJ$ Type: INDEX Datablocks: 1 Used undo segment IDs: 5, DBAs: 0x00c04797 ( 3. 18327) XID: 0005.0021.000000c3 Rows: 229 ---- DataObj#: 254 Name: SYS.I_H_OBJ#_COL# Type: INDEX Datablocks: 1 Used undo segment IDs: 5, DBAs: 0x0040f320 ( 1. 62240) XID: 0005.002a.00000267 Rows: 393 ---- DataObj#: 586 Name: SYS.AW_OBJ$ Type: TABLE Datablocks: 20 Used undo segment IDs: 5, DBAs: 0x00c04373 ( 3. 17267) XID: 0005.0021.000000c3 Rows: 24 ---- 0x00c04374 ( 3. 17268) XID: 0005.0021.000000c3 Rows: 25 ---- 0x00c04376 ( 3. 17270) XID: 0005.0021.000000c3 Rows: 11 ---- 0x00c0478a ( 3. 18314) XID: 0005.0021.000000c3 Rows: 21 ---- ... *** Use validate_objects.sql script file to validate the structure of possibly corrupt objects if the undo required is not available. Undo Segment Usage Summary ************************** *** Undo segments identified in use by active transaction datablocks AFTER 04/09/2007 00:00:58 (SCN: 0x0.0xc54fef): USN: 5 Name: _SYSSMU5$ *** Undo segments identified in use by active transaction datablocks BEFORE 04/09/2007 00:00:58 (SCN: 0x0.0xc54fef): USN: 5 Name: _SYSSMU5$ NOTE: The database was scanned for active transactions using the problemmatic undo segments ONLY. To scan the database for ALL active transactions (using good and bad undo segments) use '-a' option NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE! ************************************************************************************* Upload the logfile (TXChecker_050307_1232_S102.log) to Oracle Support Services for analysis. Do NOT attempt to force the database open until the logfile has been analyzed. TXChecker_ex3.out TXChecker started at: Thu May 3 13:19:01 2007 TXChecker - v1.3 by Center Of Expertise (COE), Oracle Corporation (build 04/13/07) Program command: TXChecker -c/u03/oradata/S901/control02.ctl -lfiles.S901 -a Database Name: S901 Version: 9.0.1 *** Database last checkpointed at 03/20/2007 02:27:18 (SCN: 0x0.0x452b955) *** Using 60 minutes to find most active transactions (-m60) *** Undo Segments: USN: 0 Name: SYSTEM TBS#: 0 File: 1 Block: 2 Instance: 0 SMU: N Status: 3 - Online SCN: 0000.00000000 XactSQN: 0x00000000 UndoSQN: 0x00000000 USN: 1 Name: _SYSSMU1$ TBS#: 1 File: 2 Block: 9 Instance: 0 SMU: Y Status: 2 - Offline SCN: 0000.0452b950 XactSQN: 0x00011078 UndoSQN: 0x00004a8b USN: 2 Name: _SYSSMU2$ TBS#: 1 File: 2 Block: 25 Instance: 0 SMU: Y Status: 2 - Offline SCN: 0000.0452b944 XactSQN: 0x00010b60 UndoSQN: 0x000008a9 USN: 3 Name: _SYSSMU3$ TBS#: 1 File: 2 Block: 41 Instance: 0 SMU: Y Status: 2 - Offline SCN: 0000.0452b94d XactSQN: 0x000110fe UndoSQN: 0x000008e2 ... USN: 9 Name: _SYSSMU9$ TBS#: 1 File: 2 Block: 137 Instance: 0 SMU: Y Status: 2 - Offline SCN: 0000.0452b94b XactSQN: 0x0001106e UndoSQN: 0x00004a8e USN: 10 Name: _SYSSMU10$ TBS#: 1 File: 2 Block: 153 Instance: 0 SMU: Y Status: 2 - Offline SCN: 0000.0452b946 XactSQN: 0x0001102f UndoSQN: 0x00004a80 *** Undo segments (headers) that encountered errors preventing Active TX scan: USN: 1 Name: _SYSSMU1$ File: 2 Block: 9 Instance: 0 Error: 1 - Datafile was not accessible USN: 2 Name: _SYSSMU2$ File: 2 Block: 25 Instance: 0 Error: 1 - Datafile was not accessible USN: 3 Name: _SYSSMU3$ File: 2 Block: 41 Instance: 0 Error: 1 - Datafile was not accessible NOTE: IT IS RECOMMENDED TO SCAN THE DATABASE FOR OBJECTS INVOLVED IN ACTIVE TXS USING THESE BAD UNDO SEGMENTS *** Scanning datafiles for datablocks that may require undo: *** Asterisk ('*') denotes blocks being updated since 03/20/2007 01:55:35 (SCN: 0x0.0x44fe999) Scanning datafile#: 1 - /u02/oradata/S901/system01.dbf (SYSTEM) - Active TX blocks: 12 * Scanning datafile#: 4 - /u03/oradata/S901/tools01.dbf (TOOLS) - Active TX blocks: 0 Scanning datafile#: 4 - /u03/oradata/S901/tools01.dbf (TOOLS) - Active TX blocks: 0 Scanning datafile#: 5 - /u03/oradata/S901/users01.dbf (USERS) - Active TX blocks: 0 *** Objects that may require undo data: *** Asterisk ('*') denotes blocks being updated since 03/20/2007 01:55:35 (SCN: 0x0.0x44fe999) DataObj#: 118 Name: SYS.I_ARGUMENT2 Type: INDEX Datablocks: 9 * Used undo segment IDs: 3,4,5,6,7,10, DataObj#: 45 Name: SYS.I_COL1 Type: INDEX Datablocks: 1 * Used undo segment IDs: 10, DataObj#: 119 Name: SYS.I_SOURCE1 Type: INDEX Datablocks: 2 * Used undo segment IDs: 2, *** Use validate_objects.sql script file to validate the structure of possibly corrupt objects if the undo required is not available. Undo Segment Usage Summary ************************** *** Undo segments identified in use by active transaction datablocks AFTER 03/20/2007 01:55:35 (SCN: 0x0.0x44fe999): USN: 2 Name: _SYSSMU2$ USN: 3 Name: _SYSSMU3$ USN: 4 Name: _SYSSMU4$ USN: 5 Name: _SYSSMU5$ USN: 6 Name: _SYSSMU6$ USN: 7 Name: _SYSSMU7$ USN: 10 Name: _SYSSMU10$ NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE! ************************************************************************************* Upload the logfile (TXChecker_050307_1319_S901.log) to Oracle Support Services for analysis. Do NOT attempt to force the database open until the logfile has been analyzed. TXChecker_ex4.out TXChecker started at: Thu May 3 14:58:02 2007 TXChecker - v1.3 by Center Of Expertise (COE), Oracle Corporation (build 04/13/07) Program command: TXChecker -f/u02/oradata/S101/users01.dbf -c/u01/oradata/S101/control01.ctl -x0001.0004.00000125 -b Database Name: S101 Version: 10.1.0 *** Database last checkpointed at 04/03/2007 17:11:29 (SCN: 0x0.0x33286) *** Using 60 minutes to find most active transactions (-m60) *** Undo Segments: USN: 0 Name: SYSTEM TBS#: 0 File: 1 Block: 9 Instance: 0 SMU: N Status: 3 - Online SCN: 0000.00000000 XactSQN: 0x00000000 UndoSQN: 0x00000000 USN: 1 Name: _SYSSMU1$ TBS#: 1 File: 2 Block: 9 Instance: 0 SMU: Y Status: 3 - Online SCN: 0000.0003309e XactSQN: 0x00000124 UndoSQN: 0x00000045 USN: 2 Name: _SYSSMU2$ TBS#: 1 File: 2 Block: 25 Instance: 0 SMU: Y Status: 3 - Online SCN: 0000.000330a4 XactSQN: 0x00000121 UndoSQN: 0x0000003d USN: 3 Name: _SYSSMU3$ TBS#: 1 File: 2 Block: 41 Instance: 0 SMU: Y Status: 3 - Online SCN: 0000.00033098 XactSQN: 0x0000011c UndoSQN: 0x00000044 .... *** Active Transactions: USN: 1 Name: _SYSSMU1$ File: 2 Block: 9 Instance: 0 Status: 3 - Online * Active TX at slot 4 #undo blocks: 6 Last bk: 2.540 *** Objects that may require undo data: *** Using XID: 1.4.125 Obj#: 9375 Name: SCOTT.OBJ Type: TABLE Undo recs: 207 Used undo segment IDs: 1, *** Scanning datafiles for datablocks that may require undo: *** Asterisk ('*') denotes blocks being updated since 04/03/2007 17:04:41 (SCN: 0x0.0x33102) Scanning datafile#: 4 - /u02/oradata/S101/users01.dbf (USERS) - Active TX blocks: 112 * *** Objects that may require undo data: *** Asterisk ('*') denotes blocks being updated since 04/03/2007 17:04:41 (SCN: 0x0.0x33102) *** Using XID: 1.4.125 DataObj#: 9375 Name: SCOTT.OBJ Type: TABLE Datablocks: 112 * Used undo segment IDs: 1, DBAs: 0x0100000c ( 4. 12)* XID: 0001.0004.00000125 Rows: 81 ---- 0x0100000d ( 4. 13)* XID: 0001.0004.00000125 Rows: 84 ---- 0x0100000e ( 4. 14)* XID: 0001.0004.00000125 Rows: 79 ---- 0x0100000f ( 4. 15)* XID: 0001.0004.00000125 Rows: 79 ---- 0x01000010 ( 4. 16)* XID: 0001.0004.00000125 Rows: 80 ---- *** Use validate_objects.sql script file to validate the structure of possibly corrupt objects if the undo required is not available. Undo Segment Usage Summary ************************** *** Undo segments identified in use by active transaction datablocks AFTER 04/03/2007 17:04:41 (SCN: 0x0.0x33102): USN: 1 Name: _SYSSMU1$ NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE! ************************************************************************************* Upload the logfile (TXChecker_050307_1458_S101.log) to Oracle Support Services for analysis. Do NOT attempt to force the database open until the logfile has been analyzed. TXChecker_ex5.out TXChecker started at: Thu May 3 15:18:27 2007 TXChecker - v1.3 by Center Of Expertise (COE), Oracle Corporation (build 04/13/07) Program command: TXChecker -c/u02/oradata/S102/control02.ctl -tsystem -x0009.FFFF.FFFFFFFF -g Database Name: S102 Version: 10.2.0 *** Database last checkpointed at 05/03/2007 13:13:06 (SCN: 0x0.0xc64144) *** Using 60 minutes to find most active transactions (-m60) *** Undo Segments: USN: 0 Name: SYSTEM TBS#: 0 File: 1 Block: 9 Instance: 0 SMU: N Status: 3 - Online SCN: 0000.00000000 XactSQN: 0x00000000 UndoSQN: 0x00000000 USN: 1 Name: _SYSSMU1$ TBS#: 1 File: 2 Block: 9 Instance: 0 SMU: Y Status: 2 - Offline SCN: 0000.00c641a5 XactSQN: 0x00000208 UndoSQN: 0x0000020b USN: 2 Name: _SYSSMU2$ TBS#: 1 File: 2 Block: 25 Instance: 0 SMU: Y Status: 2 - Offline SCN: 0000.00c641b5 XactSQN: 0x000002a7 UndoSQN: 0x0000022a USN: 3 Name: _SYSSMU3$ TBS#: 1 File: 2 Block: 41 Instance: 0 SMU: Y Status: 2 - Offline SCN: 0000.00c641b4 XactSQN: 0x0000029d UndoSQN: 0x0000027e ... *** Tablespace Scan => SYSTEM *** Scanning tablespace for datablocks that may require undo: *** Asterisk ('*') denotes blocks being updated since 05/03/2007 12:02:06 (SCN: 0x0.0xc5eafd) Scanning datafile#: 1 - /u02/oradata/S102/system01.dbf (SYSTEM) - Active TX blocks: 1541 *** Objects that may require undo data: *** Asterisk ('*') denotes blocks being updated since 05/03/2007 12:02:06 (SCN: 0x0.0xc5eafd) *** Using XID: 9.ffff.ffffffff DataObj#: 114 Name: SYS.I_IDL_UB11 Type: INDEX Datablocks: 3 Used undo segment IDs: 10, DataObj#: 73 Name: SYS.IDL_UB1$ Type: TABLE Datablocks: 1538 Used undo segment IDs: 10, *** Use validate_objects.sql script file to validate the structure of possibly corrupt objects if the undo required is not available. NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE! ************************************************************************************* Upload the logfile (TXChecker_050307_1518_S102.log) to Oracle Support Services for analysis. Do NOT attempt to force the database open until the logfile has been analyzed. Back to Contents Appendix B: Listfile Example The listfile should contain a list of fully qualified datafile names: /u02/oradata/S901/system01.dbf /u03/oradata/S901/tools01.dbf /u03/oradata/S901/users01.dbf Back to Contents Appendix C: Workaround to the 250 X$KCCFE rows limitation (version 1.3 ONLY) The current version of TXChecker has a limitation where it will not work with a database that has more than 250 rows in the X$KCCFE table. This table represents datafile entries contained in the controlfile and is the basis for the V$DATAFILE view. Even when datafiles are deleted, their entries will still exist in X$KCCFE and X$KCCFN (file name table) until the slot is reused when new datafiles are added to the database. It is possible after dropping a large number of datafiles there are a greater number of rows in x$kccfe compared to v$datafile, which as a consequence will slow down queries of v$datafile. The only way to remove this row count difference is to rebuild the controlfile. If there are genuinely a large number of datafiles in the database (more than 250) a new temporary controlfile can be created which will only contain subset of the database datafiles, allowing TXChecker to run succesfully. database datafiles, allowing TXChecker to run succesfully. The instructions below will detail how to create a temporary controlfile for use with TXChecker that will contain 250 datafiles or less, working around the current limitation. The controlfile should NOT be used to open the database. NOTE: It is recommended to make a backup of the current controlfile before you do this, just in case you overwrite the controlfile by accident. 1. Mount the database using the current controlfile: SQL> STARTUP MOUNT 2. Create the script to create the temporary controlfile: SQL> ALTER DATABASE BACKUP CONTROLFILE TO TRACE NORESETLOGS; SQL> show parameters user_dump_dest -- This is where the trace file is created SQL> SHUTDOWN IMMEDIATE; 3. Edit the create controlfile script and remove the files that are not needed for scanning by TXChecker: NOTE: Make sure you include the SYSTEM and UNDO datafiles as these are needed by TXChecker to access the data dictionary. The location of the script to edit is located in the user_dump_dest directory. Remove all lines before the first STARTUP MOUNT command Remove any datafiles listed after 'DATAFILE' that you don't want to scan and will bring the datafile count under 250 Example: CREATE CONTROLFILE REUSE DATABASE "F102" NORESETLOGS NOARCHIVELOG MAXLOGFILES 16 MAXLOGMEMBERS 3 MAXDATAFILES 500 MAXINSTANCES 8 MAXLOGHISTORY 292 LOGFILE GROUP 1 '/u01/oradata/F102/redo01.log' SIZE 50M, GROUP 2 '/u01/oradata/F102/redo02.log' SIZE 50M, GROUP 3 '/u01/oradata/F102/redo03.log' SIZE 50M DATAFILE '/u01/oradata/F102/system01.dbf', '/u01/oradata/F102/undotbs01.dbf', '/u01/oradata/F102/sysaux01.dbf', '/u01/oradata/F102/users01.dbf', '/u01/oradata/F102/example01.dbf', '/u01/oradata/F102/streams01.dbf', ...etc. '/u01/oradata/F920/tbs25_6.dbf', '/u01/oradata/F920/tbs25_7.dbf', '/u01/oradata/F920/tbs25_8.dbf', '/u01/oradata/F920/tbs25_9.dbf', '/u01/oradata/F920/tbs25_10.dbf' CHARACTER SET WE8ISO8859P1; Remove any more text after the create controlfile statement. NOTE: Make sure you remove any commands after the create controlfile statement!!! Save the edited trace file and rename it to cr_ctl.sql 4. Edit the init.ora file so the create controlfile script will create the temporary controlfile in a new location without touching the current controlfile: If you are using an spfile, you need to create a pfile first: SQL> STARTUP NOMOUNT SQL> CREATE PFILE FROM SPFILE; -- Created in the $ORACLE_HOME/dbs directory SQL> SHUTDOWN IMMEDIATE NOTE: If you are using an spfile move it out of the way before moving to step #5 so that it won't be read on startup. Edit the init.ora in the $ORACLE_HOME/dbs directory and change the control_files parameter: Change from (example): control_files='/u01/oradata/F102/control01.ctl','/u01/oradata/F102/control02.ctl' To: control_files='/u01/oradata/F102/TXChecker_ctlfile.tmp' 5. Create the temporary controlfile: SQL> @cr_ctl.sql --- DO NOT OPEN the database -- SQL> SHUTDOWN IMMEDIATE; 6. Run TXChecker against this new controlfile. You can use the -d to scan the database, and all of the datafiles contained in the temporary controlfile will be scanned. 7. Edit the init.ora file back to use correct controlfiles or use the old spfile. When the database is opened the correct controlfile will be used, like before. Disclaimer Oracle Support Services provides TXChecker for diagnostic purposes ONLY. It has been tested and appears to work as intended as a read-only tool against a problem database to provide diagnostic data to be used by Oracle Support Services to help in formulating the best course of action after certain undo segment related failures. TXChecker will not provide you direct advice guaranteed to fix your failure and the logfile created MUST be uploaded to Oracle Support for further analysis. You should always run new scripts on a test instance initially.
Oracle TXChecker THE TXCHECKER USER GUIDE
备份TB级别Oracle数据库的一些技巧
备份TB级别数据的一些技巧
1、 考虑使用增量备份, 不要老是想着用全量备份
2、 对于增量备份而言 开启block change tracking 能极大地减少物理读,提升速度
3、 11g以后对于bigfile tablespace可以启用section size 来提升速度
4、 考虑到负载更低的 Data Guard物理备库上去做备份
5、 启用备份并行
CONFIGURE DEVICE TYPE DISK PARALLELISM 16;
6、 对于多节点RAC 使用 service控制负载均衡
srvctl add service –d -s -r -a , srvctl add service –d -s -r -a , For example srvctl add service –d maclean –s bkup1 –r maclean1 –a maclean2,maclean3,maclean4,maclean5,maclean6,maclean7,maclean8 srvctl add service –d maclean –s bkup2 –r maclean2 –a maclean1,maclean3,maclean4,maclean5,maclean6,maclean7,maclean8 srvctl add service –d maclean –s bkup3 –r maclean3 –a maclean1,maclean2,maclean4,maclean5,maclean6,maclean7,maclean8 srvctl add service –d maclean –s bkup4 –r maclean4 –a maclean1,maclean2,maclean3,maclean5,maclean6,maclean7,maclean8 srvctl add service –d maclean –s bkup5 –r maclean5 –a maclean1,maclean2,maclean3,maclean4,maclean6,maclean7,maclean8 srvctl add service –d maclean –s bkup6 –r maclean6 –a maclean1,maclean2,maclean3,maclean4,maclean5,maclean7,maclean8 srvctl add service –d maclean –s bkup7 –r maclean7 –a maclean1,maclean2,maclean3,maclean4,maclean5,maclean6,maclean8 srvctl add service –d maclean –s bkup8 –r maclean8 –a maclean1,maclean2,maclean3,maclean4,maclean5,maclean6,maclean7 srvctl start service –d -s <service_name #1> Alternatively, to start all the services for a particular database srvctl start service –d run { allocate channel ch01 connect string ‘sys/@<scan_address>/<service_name>; allocate channel ch02 connect string ‘sys/@<scan_address>/<service_name>; allocate channel ch16 connect string ‘sys/@<scan_address>/<service_name>; backup database …; } For example run { allocate channel ch01 device type disk connect 'sys/welcome1@maclean-scan/bkup1'; allocate channel ch02 device type disk connect 'sys/welcome1@maclean-scan/bkup2'; allocate channel ch03 device type disk connect 'sys/welcome1@maclean-scan/bkup3'; allocate channel ch04 device type disk connect 'sys/welcome1@maclean-scan/bkup4'; allocate channel ch05 device type disk connect 'sys/welcome1@maclean-scan/bkup5'; allocate channel ch06 device type disk connect 'sys/welcome1@maclean-scan/bkup6'; allocate channel ch07 device type disk connect 'sys/welcome1@maclean-scan/bkup7'; allocate channel ch08 device type disk connect 'sys/welcome1@maclean-scan/bkup8'; allocate channel ch09 device type disk connect 'sys/welcome1@maclean-scan/bkup1'; allocate channel ch10 device type disk connect 'sys/welcome1@maclean-scan/bkup2'; allocate channel ch09 device type disk connect 'sys/welcome1@maclean-scan/bkup1'; allocate channel ch10 device type disk connect 'sys/welcome1@maclean-scan/bkup2'; allocate channel ch11 device type disk connect 'sys/welcome1@maclean-scan/bkup3'; allocate channel ch12 device type disk connect 'sys/welcome1@maclean-scan/bkup4'; allocate channel ch13 device type disk connect 'sys/welcome1@maclean-scan/bkup5'; allocate channel ch14 device type disk connect 'sys/welcome1@maclean-scan/bkup6'; allocate channel ch15 device type disk connect 'sys/welcome1@maclean-scan/bkup7'; allocate channel ch16 device type disk connect 'sys/welcome1@maclean-scan/bkup8'; backup as backupset incremental level 0 section size 128g database; }
【Oracle数据恢复】ORA-00600[kdBlkCheckError]错误解析
【Oracle数据恢复】ORA-600[kdBlkCheckError]错误解析
kdBlkCheckError本质上是 Kernel Data Block Check Error, 是对数据块的逻辑检测配合db_block_checking的存在,当db_block_checking!=false 且其发现了一个损坏,则报一个ORA-600 [kddummy_blkchk] / ORA-600 [kdBlkCheckError] . 进一步的损坏/坏块将被终止
kdBlkCheckError/kddummy_blkchk有一大堆的检测代码,每一个检测代码对应对数据块中数据的一个逻辑分析,如果分析发现存在mismatch则认为数据块存在逻辑讹误。
例如检测代码23001 代表Wrong total extent count。
如果出现ORA-00600[kdBlkCheckError]则一般是ORACLE 软件BUG 或者内存存在讹误。
如果自己搞不定可以找ASKMACLEAN专业ORACLE数据库修复团队成员帮您恢复!、
与该ORA-00600[kdBlkCheckError]错误相关的Bug Note如下:
NB | Bug | Fixed | Description |
17447078 | 12.1.0.2, 12.2.0.0 | Diagnostic enhancement for ORA-600 [kdBlkCheckError] .. [18007] errors | |
14400110 | 11.2.0.4, 12.2.0.0 | Bad redo / ORA-600 [kdBlkCheckError] .. [6135] for opcode 19.1 redo | |
12349316 | 11.2.0.4, 12.2.0.0 | DBMS_SPACE_ADMIN.TABLESPACE_FIX_BITMAPS fails with ORA-600 [kddummy_blkchk] / ORA-600 [kdBlkCheckError] / ORA-607 | |
14351566 | 11.2.0.3.8, 11.2.0.3.BP21, 11.2.0.4, 12.1.0.1 | ORA-600 [kclchkblk_4] when doing flash back | |
16347904 | 11.2.0.4, 12.2.0.0 | Corrupt block with check code 6101, 6110 or 6255 on compressed table with DML workload | |
17325413 | 12.1.0.2 | Drop column with DEFAULT value and NOT NULL definition ends up with dropped column data hitting disk leading to corruption | |
14741727 | 11.2.0.2.9, 11.2.0.2.BP19, 11.2.0.3.BP12, 11.2.0.3.BP13, 12.1.0.1 | Fixes for bug 12326708 and 14624146 can cause problems – backout fix | |
14551844 | 11.2.0.4, 12.1.0.1 | ORA-600 [kdblkcheckerror] [6126] / ORA-600 [17182] for DELETE / INSERT (QMD / QMI) in COMPRESS BASIC table due to negative avsp | |
13804294 | 11.2.0.3.4, 11.2.0.3.BP07, 11.2.0.4, 12.1.0.1 | Internal errors, corruptions, using pipelined function whose rows raise exceptions | |
13715932 | 11.2.0.4, 12.1.0.1 | CREATE TABLE fails with ORA-600 [kddummy_blkchk] with large datafile | |
* | 13605839 | 11.2.0.3.8, 11.2.0.3.BP21, 11.2.0.4, 12.1.0.1 | ORA-600 [ktbsdp1] ORA-600 [kghfrempty:ds]. Corruption in Rollback with Clusterwide Global Transactions in RAC |
12417369 | 11.2.0.2.5, 11.2.0.2.BP13, 11.2.0.2.GIPSU05, 11.2.0.3, 12.1.0.1 | Block corruption from rollback on compressed table | |
* | 10205230 | 11.2.0.1.6, 11.2.0.1.BP09, 11.2.0.2.2, 11.2.0.2.BP04, 11.2.0.3, 12.1.0.1 | ORA-600 / corruption possible during shutdown in RAC |
10180121 | 11.2.0.3, 12.1.0.1 | ORA-600 [kdBlkCheckError] .. [6251] / block corruption during parallel DML | |
+ | 9724970 | 11.2.0.1.BP08, 11.2.0.2.2, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.1 | Block Corruption with PDML UPDATE. ORA_600 [4511] OERI[kdblkcheckerror] by block check |
* | 9711859 | 10.2.0.5.1, 11.1.0.7.6, 11.2.0.2, 12.1.0.1 | ORA-600 [ktsptrn_fix-extmap] / ORA-600 [kdblkcheckerror] during extent allocation caused by bug 8198906 |
9541485 | 11.2.0.4, 12.1.0.1 | Create materialized view on views based on CON$ fails with ORA-600 [kdBlkCheckError] | |
* | 9406607 | 11.2.0.1.3, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.1 | Corrupt blocks in 11.2 in table with unique key. OERI[kdBlkCheckError] by block check |
9295217 | 11.2.0.2, 12.1.0.1 | ORA-600 [ktsk_dba_to_hwm-1] / corruption during SHRINK of HWM | |
+ | 9019113 | 11.2.0.1.BP02, 11.2.0.2, 12.1.0.1 | ORA-600 [17182] for OLTP COMPRESS table in OLTP Compression REDO during RECOVERY |
8720802 | 10.2.0.5, 11.2.0.1.BP07, 11.2.0.2, 12.1.0.1 | Add check for row piece pointing to itself (db_block_checking,dbv,rman,analyze) | |
* | 8331063 | 11.2.0.3, 12.1.0.1 | Corrupt Undo. ORA-600 [2015] during rollback in undo block for COMPRESS table with SUPPLEMENTAL LOGGING |
8277580 | 11.1.0.7.2, 11.2.0.1, 11.2.0.2, 12.1.0.1 | Corruption on compressed tables during Recovery and Quick Multi Delete (QMD). | |
6523037 | 11.2.0.1.BP07, 11.2.0.2.2, 11.2.0.2.BP01, 11.2.0.3, 12.1.0.1 | Corruption / ORA-600 [kddummy_blkchk] [6110] on update | |
8437213 | 10.2.0.4.3, 10.2.0.5, 11.1.0.7.7, 11.2.0.1 | ASSM first level bitmap block corruption | |
8360192 | 11.1.0.7.6, 11.2.0.1 | ORA-600 [kdBlkCheckError] [6110] / corruption from insert | |
* | 8198906 | 10.2.0.5, 11.2.0.1 | OERI [kddummy_blkchk] / OERI [5467] for an aborted transaction of allocating extents |
7715244 | 11.1.0.7.2, 11.2.0.1 | Corruption on compressed tables. Error codes 6103 / 6110 |
【Oracle ASM数据恢复】ORA-15038: disk ‘XXXXXXX’ mismatch on ‘Time Stamp’ with Target Disk Group错误解析
如果自己搞不定可以找ASKMACLEAN专业ORACLE数据库修复团队成员帮您恢复!
如果在mount diskgroup时遇到如下错误,则你有必要读一下这篇文章:
SQL> alter diskgroup DATA mount; alter diskgroup DATA mount * ERROR at line 1: ORA-15032: not all alterations performed ORA-15040: diskgroup is incomplete ORA-15042: ASM disk "17" is missing from group number "1" ORA-15042: ASM disk "16" is missing from group number "1" ORA-15042: ASM disk "15" is missing from group number "1" ORA-15042: ASM disk "14" is missing from group number "1" ORA-15042: ASM disk "13" is missing from group number "1" ORA-15038: disk 'ORCL:DATA25' mismatch on 'Time Stamp' with target disk group [1861040353] [1861808156] ORA-15038: disk 'ORCL:DATA24' mismatch on 'Time Stamp' with target disk group [1861040353] [1861808156] ORA-15038: disk 'ORCL:DATA23' mismatch on 'Time Stamp' with target disk group [1861040353] [1861808156] ORA-15038: disk 'ORCL:DATA22' mismatch on 'Time Stamp' with target disk group [1861040353] [1861808156] ORA-15038: disk 'ORCL:DATA21' mismatch on 'Time Stamp' with target disk group [1861040353] [1861808156] [oracle@mlab2 ~]$ oerr ora 15042 15042, 00000, "ASM disk \"%s\" is missing from group number \"%s\" " // *Cause: The specified disk, which is a necessary part of a diskgroup, // could not be found on the system. // *Action: Check the hardware configuration. // [oracle@mlab2 ~]$ oerr ora 15038 15038, 00000, "disk '%s' mismatch on '%s' with target disk group [%s] [%s]" // *Cause: An attempt was made to mount into a disk group a disk whose // recorded allocation unit size, metadata block size, physical // sector size, or creation time stamp was inconsistent with the other // disk group members. // *Action: Check if the system configuration has changed. Verify disk // discovery string. //
该错误一般是由于OS级别的多路径配置存在问题,导致某个节点上出现大量重复的存储设备而引起的。
对于该类问题,首先从alert.log确认最近一次成功mount时那些磁盘对mount而言是必要的,例如:
NOTE: cache opening disk 11 of grp 1: DATA2 label:DATA2 <<<<<<<<<<<<<<<<<< 5 NOTE: cache opening disk 12 of grp 1: DATA20 label:DATA20 > NOTE: cache opening disk 13 of grp 1: DATA3 label:DATA3 <<<<<<<<<<<<<<<<<<<<<< 6 NOTE: cache opening disk 14 of grp 1: DATA4 label:DATA4 NOTE: cache opening disk 15 of grp 1: DATA5 label:DATA5 NOTE: cache opening disk 16 of grp 1: DATA6 label:DATA6 NOTE: cache opening disk 17 of grp 1: DATA7 label:DATA7 NOTE: cache opening disk 18 of grp 1: DATA8 label:DATA8 NOTE: cache opening disk 19 of grp 1: DATA9 label:DATA9
接下来可以通过kfed来验证DISK HEADER,使用如下脚本:
#! /bin/sh rm /tmp/kfed_DH.out /tmp/kfed_FS.out /tmp/kfed_BK.out /tmp/kfed_FD.out /tmp/kfed_DD.out /tmp/kfed_PST.out for i in `ls /dev/asm-disk*` do echo $i >> /tmp/kfed_DH.out kfed read $i >> /tmp/kfed_DH.out echo $i >> /tmp/kfed_FS.out kfed read $i blkn=1 >> /tmp/kfed_FS.out echo $i >> /tmp/kfed_BK.out kfed read $i aun=1 blkn=254 >> /tmp/kfed_BK.out echo $i >> /tmp/kfed_FD.out kfed read $i aun=2 blkn=1 >> /tmp/kfed_FD.out echo $i >> /tmp/kfed_DD.out kfed read $i aun=2 blkn=2 >> /tmp/kfed_DD.out echo $i >> /tmp/kfed_PST.out kfed read $i aun=1 blkn=2 >> /tmp/kfed_PST.out done
【Oracle数据恢复】ORACLE数据库启动startup 停止shutdown相关文档汇总
【Oracle数据恢复】ORACLE数据库启动startup 停止shutdown相关文档汇总
在ORACLE数据库启动过程中,首先启动的是实例instance即nomount阶段, 之后再通过CONTROL_FILES 参数指定的控制文件位置来加载数据库MOUNT阶段。 下一个步骤是打开数据库open ,这个步骤包含打开数据文件和redo logfile重做日志文件。接着如果之前的一次shutdown不是优雅关闭的话, 则需要完成前滚(实际就是apply redo)和回滚未提交的事务(rollback)。
数据库的停止和启动一样也分成几个阶段。 首先数据库被关闭 close这让数据库不在发生变化,即数据文件和redo logfile日志文件均被关闭。接着数据库进入dismount阶段,实例将与数据库脱离联系。当数据库被unmount之后,ORACLE数据库关闭其控制文件。下一步是 移除SGA和中止后台进程 这样实例就算关闭了。
有多种shudown 的模式例如normal,immediate,transaction ,和abort. 除去SHUTDOWN ABORT外,数据库都会将SGA中必要的数据写出到磁盘上的数据文件和redo logfile中。 如果SHUTDOWN ABORT或者发生异常中止,则SGA中必要的数据仍未写出,需要在下次数据库启动时 完成crash recovery,这些都是ORACLE数据库自动完成的。
Startup upgrade/migrate 或者 设置 _system_trig_enabled = FALSE 都将在启动过程中禁止触发器
如果自己搞不定可以找ASKMACLEAN专业数据库修复团队成员帮您恢复!
Startup Slow / Hang Startup can hang in any of the stages like nomount,mount or open stage. Following are some of the known issues reported so far: Note 1367724.1 Startup Hangs at Mount Stage after AIO is Enabled on Linux Note 429390.1 Database Startup Takes Longer time After Upgrade To 10.x Note 552019.1 New Install and Creation of a 10gR2 database Hangs at Startup Nomount Note 344933.1 DB Startup Can Hang if USE_INDIRECT_DATA_BUFFERS=TRUE and AWE_WINDOW_MEMORY Is Set Too High Note 838451.1 Startup Hangs When "Processes" Parameter is Higher Than 10800 Note 1076092.1 Instance Startup Hangs After Creating New Undo Tablespace and Switching Between Old and New Note 1476526.1 Database STARTUP NOMOUNT very slow after migrate to T4-4 Servers SPARC 64-Bit Note 1475621.1 Oracle Database Startup And SGA Value Change Takes A Very Long Time To Complete ORA-27102 Errors ORA-27102 errors normally occurs due to memory issues.The Common causes could due to Semaphore Kernel misconfigurations,Memory related Ulimit settings,RAM and swap configurations. Note 274092.1 LOCK_SGA on Windows fails with ORA-27102 Note 219752.1 ORA-27102 OSD-00034 Starting Database on Windows 2000 Note 390547.1 ORA-27102 Cannot Startup Instance via sqlplus Note 1060677.6 ORA-27100 ORA-27102 Trying to Start 8.0.4 Database Note 399895.1 Database Startup On Solaris 10 Fails With Ora-27102 Out Of Memory Error Note 467707.1 ORA-27102: Out Of Memory on Oracle 10g Solaris 10 (x86-64) Note 7272646.8 Bug 7272646 - Linux-x86_64: ORA-27103 on startup when MEMORY_TARGET > 3g Note 790205.1 UNABLE TO START INSTANCE WITH LARGE SGA ORA-27102 SVR4 ERROR: 22: INVALID ARGUMENT Note 263537.1 ORA-27102 out of memory When Trying To Start Database With SGA> 4G Note 842881.1 ORA-27102, OSD-00031 Unable To Extend Memory_max_target And Memory_target Past 2GB Note 1449714.1 Startup Fails With ORA-27102 After Upgrade From 10.2.0.3 To 10.2.0.4 Note 461519.1 ORA-27102 Database Will Not Start With SHMMAX Set To 8589934592 (8GB) Note 351930.1 Receiving ORA-27102 For 32-Bit Oracle While Allocating Sga_max_size Greater Than 4gb Note 1351705.1 Startup Fails with ORA-27102 and 'SVR4 Error 28: No space left on device' Note 1292225.1 ORA-27102 OSD-00025 O/S-Error: (OS 1453) When Lock_sga is Set to True Note 401077.1 Ora-27102: Out Of Memory: Linux Error: 12: Cannot Allocate Memory with LOCK_SGA=TRUE Note 577898.1 ORA-27102 Received At Startup When LOCK_SGA Is Set Although Enough Memory Is Available Note 301830.1 Upon startup of Linux database get ORA-27102: out of memory Linux-X86_64 Error: 28: No space left on device Note 859898.1 The ORA-27102 error is generated on Solaris 10 having apparently correct settings of kernel parameters Note 779861.1 ORA-27102: Out Of Memory And SVR4 Error: 22: Invalid argument During Startup On Solaris10 Server With Multiple DB Instances ORA-27300 Errors These errors are generally reported when the Operating System called for error or when there was a connection killed or a network interconnection failures or an OS configuration issue.The error ORA-27300 will also be accompanied by ora-27301 and ora-27302 Note 560309.1 Database Cannot Start Due to Lack of Memory Note 579365.1 Troubleshooting ORA-27300 ORA-27301 ORA-27302 errors Note 314179.1 Instance Startup Fails With Error ORA-27154,ORA-27300,ORA-27301,ORA-27302 Note 812115.1 Startup Fails With ORA-27300: Os System Dependent Operation:Fork Failed With Status:17 Note 814896.1 Startup Fails With ORA-27300: OS system dependent operation:IPC init failed with status: 65 Note 949468.1 Database Startup Fails with ORA-27300: OS system dependent operation:semget failed with status: 28 ORA-64 Errors This error could occur when the database init.ora parameter calling for more resources than the Operating System is configured to provide.The parameters could be PROCESSES,DB_BLOCK_SIZE,SGA and more. Note 470742.1 ORA-00064 Starting Database With Dispatchers Note 556258.1 ORA-64 if Db_keep_cache_size is Set to 70 Gigs Note 283980.1 ORA-00064: object is too large to allocate on this O/S Note 1232463.1 ORA-00064: Object Is Too Large To Allocate On This O/S Note 4466378.8 Bug 4466378 - ORA-64 does not report the caller description Note 179301.1 Instance Startup Fails With ORA-00064 After Increasing Processes Note 1457812.1 ORA-00064 Error Reported After Increasing Processes Parameter value Note 18255.1 OERR: ORA 64 object is too large to allocate on this O/S <num, num> Note 310838.1 Instance Startup failed with ORA-00064 when processes parameter set to High Value Note 886312.1 Database startup can fail with ORA-00064 Errors with huge sga_target of over 40Gig Note 1328620.1 ASM Instance Is Not Coming Up ORA-00064 (1,4468736,Kfchl Array) Kfchl Array Note 815954.1 ORA-64 when starting ASM instance after changing db_cache_size and shared_pool_size Note 7659217.8 Bug 7659217 - ORA-64 attempting to startup with a large SGA / buffer cache size Note 386855.1 'startup migrate' failed with ORA-64 while upgrading to 10.2.0.2 with DBUA ORA-27123 Errors Note 167250.1 ORA-27123 When Connecting As Non Oracle User Note 207743.1 Getting ORA-27123 when trying to startup Oracle Note 115753.1 UNIX: Resolving the ORA-27123 error Note 307323.1 Ora-27123 When Creating New Database Note 167250.1 ORA-27123 When Connecting As Non Oracle User Note 207743.1 Getting ORA-27123 when trying to startup Oracle Note 61912.1 OERR: ORA-27123 unable to attach to shared memory segment Note 250966.1 ORA-27123 During Startup, Immediately after a Shutdown Note 872532.1 ORA-27123 on RHEL5 (PAE) 32bit when SGA larger than 2Gb Note 552633.1 Starting the Database with SGA_TARGET set Fails with ORA-27123 Note 437582.1 Export Fails With EXP-00056 ORA-01034 ORA-27123 EXP-00005 Note 733974.1 ORA-27123 During Startup Nomount in 11G on AIX, Failure in SHMAT() Note 369262.1 Startup with Maximum SGA Fails With Ora-27123 Unable To Attach Sga Note 735187.1 Cannot Create Database With DBCA - Startup Nomount Gives ORA-27123 Note 1268668.1 Oracle Database 10g R2 Version 10.2.0.3 - Receiving Ora-27123 Errors Note 390766.1 ora-27123 on Solaris 10 with larger than 1.6 - 1.7Gb SGA Note 384262.1 ORA-01034, ORA-27123, HP-UX Error 22 Connecting Via Oracle Net Note 149070.1 VMS: Connections fail with ORA-1034 and ORA-27123 errors Note 453930.1 Connecting to the database fails with ORA-12547, Ora-600 [Ksmlsge1], ORA-27123, Error 13 Note 207797.1 ORA-1034 ORA-27123 SVR4 Error: 13: Permission denied when other then Oracle user Note 356957.1 OpenVMS: Client Connections Report ORA-1034, ORA-27123, %SYSTEM-W-REGISFULL Note 1401726.1 ORA-27123 When Starting Instance With No Setting Of SGA_TARGET Or SGA_MAX_SIZE Note 730107.1 Getting ORA-27123 & not able to run Oracle when logging to server by a user other than that installed Oracle although it belongs to the same group ORA-1081 Errors This error could occur when we try to startup an instance that is already running or if the shared segments/semaphores already exist. Note 1010214.6 ORA-1081: Starting Instance Note 18657.1 OERR: ORA 1081 cannot start already-running Oracle - shut it down first NFS Related Issues Note 8418190.8 Bug 8418190 - Direct NFS warnings during database startup Note 236794.1 NFS Locking Problems Encountered During Database Startup Note 971406.1 DATABASE STARTUP HANGS AT MOUNTING CONTROLFILE WHEN DNFS IS ENABLED Note 1430654.1 Database Startup Failed with "Direct NFS: please check that oradism is setuid" Note 430920.1 NetApp: Using 'nolock' NFS Mount Option with non-RAC Systems Results in Database Corruption ORA-600 /ORA-7445 Errors Note 435436.1 ORA-00600: [kccpb_sanity_check_2] During Instance Startup Note 101589.1 Startup database returns ORA-00600 [ktpridestroy2] Note 405602.1 ORA-600 [16305] While Starting Up the Database Note 466596.1 Core Dumps In skgfqio() - Database Startup Hangs/Spins Note 336447.1 Startup Database Produces ORA-00600: [Keltnfy-Ldminit] Note 847786.1 Can Not Open Database After Shutdown get ORA-7445 [kewa_dump_time_diff] Note 779071.1 Unable to Start Instance Due to ORA-600 [skgmhash] after a clean shutdown Note 549000.1 ORA-600 [6006] ORA-600 [6856] During Startup Instance, Followed by Termination by SMON Note 453775.1 Database Startup Fails With ORA-7445 [INVALID PERMISSIONS FOR MAPPED OBJECT] After Creation of User LBACSYS Transaction Recovery Slowness There could be slowness in the database during the open phase when the database is busy performing transaction recovery. Note 1494886.1 Database Transaction Recovery Note 414242.1 Database Hangs Because SMON Is Taking 100% CPU Doing Transaction Recovery Note 12934890.8 Bug 12934890 - Startup hangs waiting for row cache lock due to open transaction against UNDO$ ORA-704 Errors This is a general error reported at startup when there is some problem during processing of bootstrap information.There should be an accompanying error/s. Note 18494.1 OERR: ORA 704 "bootstrap process failure" Note 560417.1 Recovery Through Upgrade returns ORA-1092 on Open Note 1349722.1 Ora-00704,Ora-39700: Database Must Be Opened With Upgrade Option Note 435337.1 Unable To Open Database Before/After Upgrade - ORA-00704 ORA-39700 ORA-01092 Note 1383179.1 Unavailable Bootstrap Object ACCESS$ Causes ORA-704 ORA-604 ORA-942 When Opening Database Note 1345417.1 After failed upgrade, startup from a restored backup fails on ORA-00704 and ORA-39700 ORA-9968 Errors There are some client shadow processes hanging. Although the lk< SID> file is deleted the hanging processes still have a lock on the open file handle. This prevents the database to startup although a new lk file can be created successfully. An oracle process (background or shadow process) that exists while the instance is not started (crashed or not cleanly stopped) can have a lock on a file while this file is actually removed from the system. This is because on UNIX there is still a lock on the open file handle. Note 467251.1 ORA-09968, ORA-01102 When Starting a Database Note 160395.1 Database Startup Fails with ORA-1102 and ORA-9968 Note 1488147.1 Instance Startup Raises Error ORA-09968: unable to lock file (Doc ID 1488147.1) ORA-12547 Errors The error ORA-12547 indicates that the communication channel has been broken. It's most often thrown because the other end of the process went away unexpectedly. Note 1307075.1 Oracle Database Fails to Start with Error ORA-12547 Note 381566.1 connect / as sysdba Fails with Ora-12547 And Tns-12514 Note 744512.1 Ora-12547: Tns:Lost Contact Creating Database After Clean Installation ORA-27125 Errors Note 1067569.6 HP-UX: ORA-27125: NOT OWNER TRYING TO LOCK THE SGA IN MEMORY Note 199068.1 OpenVMS: Instance startup fails with ORA-27125 and %SYSTEM-F-VA_IN_USE Note 121983.1 Starting Database Fails on Solaris with ORA-27126 or ORA-27125 Using LOCK_SGA Note 577428.1 OpenVMS: Following an Oracle RELINK, Database Instance Startup fails with ORA-27125 or ORA-7217 ORA-1157 Errors The background process was not able to find one of the datafiles.The database will prohibit access to this file but other files will be unaffected.However, the first instance to open the database will need to access all online datafiles.Accompanying messages from the operating system will describe why the file was not found. Note 184327.1 ORA-1157 Troubleshooting Note 1035992.1 Oracle Troubleshooting Note 212053.1 ORA-1157/ORA-1110 Trying To Open The Database Note 145194.1 ORA-1157 ORA-1110 ORA-27086 Starting up Database Note 444151.1 ORA-01157 on Database Startup After Dropping an Alias Note 301635.1 ORA-01157, ORA-01110, ORA-27046 Starting A Restored Database Note 429912.1 ORA-01157 ORA-01110 ORA-27086 after crash prevents database from opening Note 256835.1 Database Startup Fails With ORA-1110, ORA-1157, ORA-27092 Trying Startup From 'at' or 'cron' on HP-UX Other Issues Note 1113864.1 MBIND: Cannot Allocate Memory On Startup Note 578536.1 MBIND: CANNOT ALLOCATE MEMORY ON STARTUP Note 6795133.8 Bug 6795133 - Startup delayed by QMNC queries Note 301072.1 Dbstart Fails With Ora-01031 When Called From User Root Note 1286665.1 ORA-00371: Not Enough Shared Pool Memory signalled on Startup Note 779356.1 Database not starting up with errors ORA-01092 ORA-24324 ORA-01041 Note 1176443.1 ORA-4031 During Startup Nomount using RMAN without parameter file (PFILE) Note 839789.1 ORA-12853 / ORA-4031 or ORA-4030 on Instance Startup With increased SGA size Common Errors / Issues During Database Shutdown The most common issue observed while bringing down the database is shutdown immediate hang. The main reasons for Shutdown immediate hang is: - processes still continue to be connected to the database and do not terminate. - SMON is cleaning temp segments or performing delayed block cleanouts. - Uncommitted transactions are being rolled back. The below section provides the consolidated list of known issues during shutdown. The documents mentioned in the below section can be specific to platform or database versions. Shutdown Slow / Hang Note 1197314.1 Shutdown Normal Hung On ORA_J00# Process Note 309230.1 Database Doesn't Shutdown Immediate During Server Boot Note 1039389.6 Alert Log: Shutdown Waiting for Active Calls to Complete Note 305666.1 Shutdown is Cancelled With ORA-1013 After Waiting for an Hour Note 1194229.1 Database shutdown immediate Hangs: Startup can also hang Note 1183213.1 Shutdown Normal or Immediate Hang Waiting for MMON process Note 428688.1 Bug 5057695: Shutdown Immediate Very Slow To Close Database Note 416658.1 Shutdown Immediate Hangs / Active Processes Prevent Shutdown Note 437876.1 Database Does Not Shutdown Cleanly When Oracle Service Is Restarted Note 304414.1 Shutdown hangs in 9i with: SHUTDOWN: waiting for logins to complete Note 332177.1 Database Shutdown Immediate Takes Forever, Can Only Do Shutdown Abort Note 13440516.8 Bug 13440516 - Index skip scan cannot be interrupted - can block shutdown Note 12879056.8 Bug 12879056 - Index skip scan cannot be interrupted - can block shutdown ORA-1031 Errors Note 309059.1 Oradim Command Fails to Shutdown Database(s) with ORA-01031 under 9.2.0.6 Note 846679.1 Ora-1031 Error Stopping Database Or Permission Denied Error Running Lsnrctl Transaction Recovery Note 375935.1 What To Do and Not To Do When 'shutdown immediate' Hangs Note 1076161.6 Shutdown Normal or Shutdown Immediate Hangs. SMON disabling TX Recovery Note 414242.1 Database Hangs Because SMON Is Taking 100% CPU Doing Transaction Recovery Note 100054.1 Transaction Rollback after a failed operation or during Database Shutdown ORA-600 / ORA-7445 Errors Note 604067.1 Ora-600[3708] On Database Shutdown Note 455181.1 ORA-00600[17302] During Shutdown Immediate Note 470362.1 ORA-07445 With kpogup At Database Shutdown Note 1135453.1 Database Hung On Shutdown After ORA-600 [KGHFRE3] Error Note 359563.1 ORA-00600: Internal Error Code, Arguments: [17302], [2] During Shutdown Note 435926.1 Shutdown Database Erroring ORA-600 [Librarycachenotemptyonclose], [] Note 8519322.8 Bug 8519322 - ORA-600 [17148] / ORA-600 [730] on database shutdown Note 1326908.1 Ora-00600: [3708], ORA-600 [2103] When Database shutdown on IBM:Linux on System Z ORA-24324 Errors Note 794293.1 ORA-24324 During Shutdown Note 1168554.1 Ora-24324 And Ora-1041 Errors Trying To Startup Or Shutdown The Database DISM Note 1001248.1 On Solaris 9 Systems, Oracle Shutdown May Hang If Utilizing Dynamic Initmate Shared Memory (DISM) Memory Related Errors Note 1319253.1 "ERROR: SGA memory leak detected" message in alert.log on database shutdown Other Issues Note 1017085.102 ORA-01122, ORA-01210, ORA-01110: On Database Shutdown Note 429603.1 ORA-29702 During Automatic Shutdown of Database using ASM Note 1022414.6 ORA-01033 DATABASE INITIALIZATION OR SHUTDOWN IN PROGRESS Note 784754.1 11g - Receiving Ora-27167 Error On Database Startup or Shutdown Note 419651.1 Event 10621 and Event 10626/10629 Causes Shutdown Immediate to Hang Note 118228.1 ALERT: Hang During Startup/Shutdown on Unix When System Uptime > 248 Days Note 343031.1 How to deal with an ORA-01033 'Oracle startup or shutdown in progress' error Note 18302.1 OERR: ORA 106 cannot startup/shutdown database when connected to a dispatcher Note 763932.1 Shutdown Error In EM: Execution Failed Due To Binary Missing Or Permission Issues Note 10194190.8 Bug 10194190 - Solaris: Process spin / ASM and DB crash if RAC instance up for > 248 days Note 1001248.1 On Solaris 9 Systems, Oracle Shutdown May Hang If Utilizing Dynamic Initmate Shared Memory (DISM) Note 760968.1 Database Startup, Shutdown Or New Connections Hang With Truss Showing OS Failing Semtimedop Call With Err#11 EAGAIN Issues specific to automatic shutdown and startup This is specific to the automatic shutdown and startup that can be configured with the dbora / dbshut / dbstart scripts. Automatic Startup Failure The key to diagnosing automatic startup failures is to determine where startup fails. This can be done via the following steps: Determine if instance starts manually as Oracle software owner. Determine if instance starts via dbstart command run as Oracle software owner. Determine if instance starts when root runs following dbstart command: su - $ORA_OWNER -c $ORA_HOME/bin/dbstart where $ORA_OWNER is set to Oracle software owner. Determine if instance starts when running as root the OS script which calls dbstart, ie "/etc/init.d/dbora start". NOTE: Running via sh -x command will show each command as it is run from script to better see what is going on. #> sh -x /etc/init.d/dbora start Automatic Shutdown Failure As with automatic startup, the key to diagnosing automatic shutdown failures is to determine where shutdown fails. This can be done via following steps: Determine if instance stops manually as Oracle software owner. Determine if instance stops via dbshut command run as Oracle software owner. Determine if instance stops when root runs command su - $ORA_OWNER -c $ORA_HOME/bin/dbshut where $ORA_OWNER is set to Oracle software owner. Determine if instance stops when running as root the OS script which calls dbshut, ie "/etc/init.d/dbora stop". NOTE: Running via sh -x command will show each command as it is run from script to better see what is going on. #> sh -x /etc/init.d/dbora stop
【Oracle数据恢复】ORA-600 [kcbgcur_9]错误解析
ORA-600 [kcbgcur_9]错误的触发原理是 当 buffer被以某种类型顺序pin住以便防止内部死锁时,发现当前buffer被以错误的类型顺序pinned或者同一class中有重复的buffer; 也可能在更新只读表空间时与ORA-372错误一起发生。
该ORA-600 [kcbgcur_9]错误隶属于内核 buffer cache模块
这个ORA-600 [kcbgcur_9]错误一般有4个argument:
Arg [a] The Database Block Address (DBA)
Arg [b] The buffer class
Arg [c] Block class order mask
Arg [d] Block class checking mask
ORA-600 [kcbgcur_9]错误可能的影响是造成进程失败,但一般不会造成数据块的损坏
ORACLE SUPPORT对遇到该问题的主要建议是 检查所有的系统硬件 磁盘和内存, 以便保证不是底层系统造成了该问题。
一些可能造成ORA-600 [kcbgcur_9]的BUG:
NB | Bug | Fixed | Description |
10061015 | 11.2.0.2.3, 11.2.0.2.BP08, 11.2.0.3, 12.1.0.1 | ORA-600 [ktfbhget:clsviol_kcbgcur_9] can occur on transported in tablespace | |
9407198 | 11.2.0.3, 12.1.0.1 | “LOG ERRORS INTO” can cause ORA-600 [kcb***] or hang scenarios | |
9133601 | 12.1.0.1 | ORA-600 [kcbgcur_9] from MERGE into an IOT | |
8459467 | 10.2.0.5, 11.2.0.1 | ORA-600 [ktfbhget:clsviol_kcbgcur_9] attempting to ONLINE a tablespace | |
7644703 | 10.2.0.5, 11.2.0.1 | OERI[kcbgcur_9] on alter tablespace online | |
6918892 | 11.2.0.1 | ORA-600 [kcbgcur_9] from DML on sys.aud$ | |
4917666 | 10.2.0.3, 11.1.0.6 | OERI[kcbgcur_9] from type constructor | |
8778379 | 10.2.0.5 | Fix event 10227 in 10.2 ORA-600[kcbgcur_3] or ORA-600[kcbgcur_9] | |
3655948 | 10.1.0.5, 10.2.0.1 | OERI:kcbgcur_9 during disconnect with distributed transaction | |
3127116 | 9.2.0.5, 10.1.0.2 | OERI:kcbgcur_9 during rollback of DML in ASSM segment | |
2771042 | 9.2.0.6, 10.1.0.2 | OERI[kcbgcur_9] when performing DML on an IOT with an FGA policy | |
2747873 | 9.2.0.8, 10.1.0.2 | DBMS_SPACE.UNUSED_SPACE may fail with OERI:kcbgcur_9 | |
2722809 | 9.2.0.4, 10.1.0.2 | OERI:kcbgcur_9 on direct load into AUTO space managed segment | |
2716406 | 10.1.0.2 | OERI[kcbgcur_9] executing DBMS_SPACE.FREE_BLOCKS procedure on bitmapped segment | |
2619895 | 9.2.0.8, 10.1.0.2 | Dump / OERI:kcbgcur_9 using PLSQL in SET clause of MERGE .. UPDATE | |
2392885 | 9.2.0.4, 10.1.0.2 | Direct path load may fail with OERI:kcbgcur_9 / OERI:ktfduedel2 | |
2311214 | 9.0.1.4, 9.2.0.2, 10.1.0.2 | OERI:KCBGCUR_9 possible for SQL against IOT | |
1785175 | 9.2.0.2, 10.1.0.2 | OERI:kcbgcur_9 from CLOB TO CHAR or BLOB TO RAW conversion | |
4124059 | 9.2.0.8 | OERI[kcbget_31] / similar KCB errors in single instance | |
2202310 | 9.0.1.4, 9.2.0.1 | OERI:KCBGCUR_9 possible from SMON dropping a rollback segment in locally managed tablespace | |
2035267 | 9.0.1.3, 9.2.0.1 | OERI:KCBGCUR_9 possible during TEMP space operations | |
1804676 | 8.1.7.3, 9.0.1.3, 9.2.0.1 | OERI:KCBGCUR_9 possible from ONLINE REBUILD INDEX with concurrent DML | |
1397075 | 8.1.7.2, 9.0.1.0 | OERI:KCBGCUR_9 from SMON during temp seg cleanup for segment in read only LOCALLY MANAGED TABLESPACE |
如果自己搞不定可以找ASKMACLEAN专业数据库修复团队成员帮您恢复!
【Oracle数据恢复】ORA-00704 bootstrap process failure错误解析
【Oracle恢复】ORA-00704 bootstrap process failure错误解析,本质上这个发生在open database 过程中的错误 其主要原因是在处理自举 bootstrap信息时发现了数据存在的问题,有多重情况均可能导致该错误发生。
当数据库被强制FORCE REESTLOGS 打开时可能遇到如下的错误信息:
ORA-00704: bootstrap process failure ORA-00704: bootstrap process failure ORA-00604: error occurred at recursive SQL level 1 ORA-01555: snapshot too old: rollback segment number 11 with name "_SYSSMU11$" too small Tue Jan 17 04:46:17 2012 Error 704 happened during db open, shutting down database USER: terminating instance due to error 704 Instance terminated by USER, pid = 5496 ORA-1092 signalled during: ALTER DATABASE OPEN RESETLOGS... [oracle@mlab2 ~]$ oerr ora 704 00704, 00000, "bootstrap process failure" // *Cause: Failure in processing bootstrap data - see accompanying error. // *Action: Contact your customer support representative.
如果对于ORA-1555没有TRACE产生的话,可以以如下方法生成TRACE:
startup mount; alter system set events '1555 trace name errorstack level 3'; alter database open resetlogs;
到user dump目录下找到对应的TRACE文件和ALERT.LOG, 假设alert.log中出现了这样的信息:
ORA-01555: snapshot too old: rollback segment number 11 with name “_SYSSMU11$” too small
则说明此场景中引起 ORA-1555的 USN undo segment number是11, 11的16进制是 b
之后我们阅读之前生成的TRACE,主要的工作是 寻找 表或索引的数据块的 DUMP并发现其中ITL 带有undo segment number 为11的记录,如下面的例子:
BH (0xacff8e48) file#: 1 rdba: 0x0040003e (1/62) class: 1 ba: 0xacf66000 set: 3 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0 dbwrid: 0 obj: 18 objn: 18 tsn: 0 afn: 1 hash: [d0e04c98,d0e04c98] lru: [acff8fd8,acff8db8] ckptq: [NULL] fileq: [NULL] objq: [cc9a3d00,cc9a3d00] use: [ce699910,ce699910] wait: [NULL] st: CR md: EXCL tch: 0 cr: [scn: 0x0.4124e1e],[xid: 0x6.0.c28d],[uba: 0x820075.ea1.23],[cls: 0x0.46b5261],[sfl: 0x1] flags: Using State Objects ---------------------------------------- SO: 0xce6998d0, type: 24, owner: 0xd04439e8, flag: INIT/-/-/0x00 (buffer) (CR) PR: 0xd02fa378 FLG: 0x500000 class bit: (nil) kcbbfbp: [BH: 0xacff8e48, LINK: 0xce699910] where: kdswh02: kdsgrp, why: 0 buffer tsn: 0 rdba: 0x0040003e (1/62) scn: 0x0000.046b527a seq: 0x00 flg: 0x00 tail: 0x527a0600 frmt: 0x02 chkval: 0x0000 type: 0x06=trans data Hex dump of block: st=0, typ_found=1 Dump of memory from 0x00000000ACF66000 to 0x00000000ACF68000 0ACF67FF0 FFFF02C1 01FF8001 02C10280 527A0600 [..............zR] ... Block header dump: 0x0040003e Object id on Block? Y seg/obj: 0x12 csc: 0x00.46b519e itc: 1 flg: - typ: 1 - DATA fsl: 0 fnx: 0x0 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x000b.00b.00000e7a 0x00802042.00db.1a C--- 0 scn 0x0000.04624228
这里可以看到第一条 ITL 0x01 , 其中<undo no>.<slot>.<wrap> 对应为 0x000b.00b.00000e7a ,则USN 为 0x000b 即 十进制的11
而这个数据块属于0x0040003e –(1/62), 其FILE_ID=1 BLOCK_ID=62的数据块
这个块 实际是实例中已经读到buffer cache中的buffer ,找到buffer header BH中的SCN 信息 即为0x0000.046b527a
之后可以通过_mimimum_giga_scn 等方法调整SCN,或者通过修改ITL来绕过上述问题成功打开数据库。
如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!
诗檀软件专业数据库修复团队
服务热线 : 13764045638 QQ号:47079569 邮箱:service@parnassusdata.com
【Oracle数据恢复】ORA-01578错误解析
一般认为ORA-1578是 数据块物理坏块/损坏的一种,不同于逻辑损坏/坏块,其错误信息如下:
如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!
诗檀软件专业数据库修复团队
服务热线 : 13764045638 QQ号:47079569 邮箱:service@parnassusdata.com
[oracle@oel8 dirdat]$ oerr ora 1578 01578, 00000, "ORACLE data block corrupted (file # %s, block # %s)" // *Cause: The data block indicated was corrupted, mostly due to software // errors. // *Action: Try to restore the segment containing the block indicated. This // may involve dropping the segment and recreating it. If there // is a trace file, report the errors in it to your ORACLE // representative 这个1578一般 会伴随ORA-1110出现,一旦ORACLE读取到存在损坏的块就会报出: SQL> select * from scott.emp; select * from scott.emp * ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 11, block # 34) 这里的FILE#是 RFN ORA-01110: data file 6:'/home/oracle/corrclass.dbf' 这里的是AFN
断裂的数据块Fractured Block:
Corrupt block relative dba: 0x0380e573 (file 14, block 58739) Fractured block found during buffer read Data in bad block - type: 6 format: 2 rdba: 0x0380e573 last change scn: 0x0288.8e5a2f78 seq: 0x1 flg: 0x04 consistency value in tail: 0x00780601 check value in block header: 0x8739, computed block checksum: 0x2f00 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x0380e573 (file 14, block 58739) found same corrupted data
或者错误的校验和checksum
Corrupt block relative dba: 0x0380a58f (file 14, block 42383) Bad check value found during buffer read Data in bad block - type: 6 format: 2 rdba: 0x0380a58f last change scn: 0x0288.7784c5ee seq: 0x1 flg: 0x06 consistency value in tail: 0xc5ee0601 check value in block header: 0x68a7, computed block checksum: 0x2f00 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x0380a58f (file 14, block 42383) found same corrupted data
或者错误的块头
Corrupt block relative dba: 0x0d805a89 (file 54, block 23177) Bad header found during buffer read Data in bad block - type: 6 format: 2 rdba: 0x0d805b08 last change scn: 0x0692.86dc08e3 seq: 0x1 flg: 0x04 consistency value in tail: 0x08e30601 check value in block header: 0x2a6e, computed block checksum: 0x0 spare1: 0x0, spare2: 0x0, spare3: 0x0 *** Reread of rdba: 0x0d805a89 (file 54, block 23177) found valid data
一些可能引起ORA-1578的BUG 列表如下:
如果自己搞不定可以找ASKMACLEAN专业数据库修复团队成员帮您恢复!
NB | Bug | Fixed | Description |
13804294 | 11.2.0.3.4, 11.2.0.3.BP07, 12.1.0.0 | Internal errors, corruptions, using pipelined function whose rows raise exceptions | |
11707302 | 11.2.0.2.3, 11.2.0.2.BP06, 11.2.0.3, 12.1.0.0 | Corruption from ASM crash during rebalance diskgroup. Misplaced Blocks | |
11659016 | 11.2.0.3, 12.1.0.0 | ORA-1578 against recently create tablespace that once was encrypted | |
+ | 10209232 | 11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.2.GIBUNDLE01, 11.2.0.3, 12.1.0.0 | ORA-1578 / ORA-600 [3020] Corruption. Misplaced Blocks and Lost Write in ASM |
* | 10205230 | 11.2.0.1.6, 11.2.0.1.BP09, 11.2.0.2.2, 11.2.0.2.BP04, 11.2.0.3, 12.1.0.0 | ORA-600 / corruption possible during shutdown in RAC |
9965568 | 12.1.0.0 | Block dumps are not formatted if there is a chkval error | |
9965085 | 11.2.0.3, 12.1.0.0 | ORA-1578 / ORA-8103 Temporary table block corruption / space wastage from PDML | |
9739664 | 11.2.0.2, 12.1.0.0 | ORA-1578 / ORA-26040 MANUAL RECOVER marks block as corrupt NOLOGGING in even if LOGGING is enabled | |
+ | 9724970 | 11.2.0.1.BP08, 11.2.0.2.2, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.0 | Block Corruption with PDML UPDATE. ORA_600 [4511] OERI[kdblkcheckerror] by block check |
9407198 | 11.2.0.3, 12.1.0.0 | “LOG ERRORS INTO” can cause ORA-600 [kcb***] or hang scenarios | |
* | 9406607 | 11.2.0.1.3, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.0 | Corrupt blocks in 11.2 in table with unique key. OERI[kdBlkCheckError] by block check |
* | 8943287 | 11.2.0.2, 12.1.0.0 | ORA-1578 corrupt block with AUTH SQL*Net strings |
* | 8898852 | 11.1.0.7.2, 11.2.0.1.1, 11.2.0.1.BP04, 11.2.0.2, 12.1.0.0 | ORA-1578 Blocks misplaced in ASM when file created with compatible.asm < 11 and resized |
8885304 | 11.2.0.2, 12.1.0.0 | ORA-7445 [ktu_format_nr] during RMAN CONVERT or Corrupt fractured block of UNDO tablespace datafile | |
* | 8768374 | 10.2.0.5, 11.1.0.7.8, 11.2.0.1.BP11, 11.2.0.2, 12.1.0.0 | RFS in Standby with a wrong location for archived log corrupting/overwriting database files when max_connections > 1 |
8760225 | 11.2.0.2, 12.1.0.0 | Auto Block Media Recovery reports ORA-1578 on first query | |
8731617 | 11.2.0.3, 12.1.0.0 | ORA-1578 from DESCRIBE or CTAS even if table not accessed / ORA-959 from DBMS_STATS | |
8720802 | 10.2.0.5, 11.2.0.1.BP07, 11.2.0.2, 12.1.0.0 | Add check for row piece pointing to itself (db_block_checking,dbv,rman,analyze) | |
8493978 | 11.2.0.2, 12.1.0.0 | Reserve file descriptors for datafile access | |
P | 12330911 | 12.1 | EXADATA LSI firmware for lost writes |
10025963 | 11.2.0.1.BP09, 11.2.0.2 | Block corruption of LOB blocks with checksum value but block has checksum disabled | |
8714541 | 11.2.0.2 | ORA-1578 Corrupt Block in ASM with 0xbadfda7a after ASM block repair due to disk read error when ASM mirror is used | |
13101288 | ORA-600, corruption or check errors dropping a column in a OLTP compressed table | ||
+ | 8354682 | 11.2.0.1 | ORA-1578 – Blocks can be misplaced in ASM when there is IO error and AU > 1MB |
+ | 8339404 | 10.2.0.5, 11.1.0.7.1, 11.2.0.1 | ORA-1578 – Blocks can be misplaced in ASM during a REBALANCE |
8227257 | 11.2.0.1 | ORA-1578 corruption found after media recovery on encrypted datafile | |
7396077 | 10.2.0.5, 11.2.0.1 | RMAN does not differentiate NOLOGGING corrupt blocks that produce ORA-1578/ORA-26040 | |
6471351 | 10.2.0.5, 11.1.0.7, 11.2.0.1 | ORA-1578 / ORA-26040 due to NOLOGGING after recovery despite of FORCE LOGGING | |
6674196 | 10.2.0.4, 10.2.0.5, 11.1.0.6 | OERI / buffer cache corruption using ASM, OCFS or any ksfd client like ODM | |
5515492 | 10.2.0.3, 11.1.0.6 | ORA-1578 corruption with Block Misplaced during ASM rebalance after IO error | |
5031712 | 10.2.0.4, 11.1.0.6 | DBV enhanced to report NOLOGGING corrupt blocks with DBV-201 instead of DBV-200 | |
+ | 4724358 | 11.1.0.6 | ORA-27045 ORA-1578 ORA-27047 corruption caused by DBMS_LDAP |
4684074 | 10.2.0.2, 11.1.0.6 | OERI:510 / block corruption (ORA-1578) with DB_BLOCK_CHECKING | |
4655520 | 9.2.0.8, 10.2.0.3, 11.1.0.6 | Block corrupted during write not noticed | |
4411228 | 9.2.0.8, 10.2.0.3, 11.1.0.6 | ORA-1578 Block misplaced with mixture of file system and RAW files | |
4344935 | 10.2.0.4, 11.1.0.6 | OERI from DML on TEMPORARY TABLE after autonomous TRUNCATE | |
7381632 | 11.1.0.6 | ORA-1578 Free corrupt blocks may not be reformatted when Flashback is enabled | |
8976928 | 10.2.0.5 | ORA-1578 caused by a former free corrupt block and remains unformatted | |
8684999 | 10.2.0.5 | ORA-1578 caused by a former free corrupt block and remains unformatted | |
+ | 3544995 | 9.2.0.6, 10.1.0.3, 10.2.0.1 | LOB segments with “CACHE READS” generate no REDO even with the logging option |
+ | 1281962 | 9.2.0.1 | Media recovery after ORA-1578 on rollback can cause logical inconsistency |
589855 | 7.3.3.6, 7.3.4.1 | ORA:1578 or ORA:8103 selecting invalid ROWID | |
406863 | 7.3.3.4, 7.3.4.0, 8.0.3.0 | ORA-1578 using PQ with heavy simultaneous INSERTS | |
P | 707304 | 7.3.4.4 | AIX: Resizing RAW datafile can corrupt a DB block |
603502 | 7.3.4.3, 8.0.4.4, 8.0.5.0 | Possible Corruption if a session with LOOPBACK DB Links aborts. |
ORA-15196 Oracle ASM CASE STUDY: UNDERSTANDING ERROR ORA-15196
If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.
Parnassusdata Software Database Recovery Team
Service Hotline: +86 13764045638 E-mail: service@parnassusdata.com
This document provides an explanation of error ORA-15196, including the details of each argument, suggestions for the diagnostic of the error and finally includes a case study using a real problem reported by a customer.
Error Description
ORA-15196 is reported after a validation of an ASM metadata block has failed. The error will be reported in the following format:
ORA-15196: invalid ASM block header [1st] [2nd] [3rd] [4th] [5th != 6th]
Where the arguments indicate:
Argument Meaning
- 1st Function and line number in the code, where the exception is raised 2nd Field failing the validation
- 3rd ASM object number stored in the block
- 4th ASM block number stored in the block
- 5th Value associated with field referenced by argument 2 6th Expected value for field referenced by argument 2
Example:
ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]
Function and line number in the code, where the exception is raised = kfc.c:7997
Field failing the validation = endian_kfbh ASM object number stored in the block = 1 ASM block number stored in the block = 93
Value associated with field referenced by argument #2 = 211
Expected value for field referenced by argument #2 = 0
Arguments description
- Function and line number in the code, where the exception is raised
In general terms it is valid to say this argument will be the same in most of the possible cases, because is always the same routine where this exception is raised.
#define kfbValid(data, len, type, bl) \
kfbValidPriv(data, len, type, bl, FILE , LINE ).
- Field failing the validation
The ASM metadata is composed by many different structures like file directory, disk directory, active change directory (ACDC), etc, which are organized by files (asm file# between 1 and 255). Each file will be made of extents, which will be made of ASM block (4096 bytes). Each block has a generic block header (kfbh), and any of those fields can be validated.
kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 4 ; 0x002: KFBTYP_FILEDIR kfbh.datfmt: 1 ; 0x003: 0x01 kfbh.block.blk: 80 ; 0x004: T=0 NUMB=0x50 kfbh.block.obj: 1 ; 0x008: TYPE=0x0 NUMB=0x1 kfbh.check: 4268948098 ; 0x00c: 0xfe72fa82 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000
A short description of each of the fields referenced above (file kf3.h):
kfbh.endian endianness of writer big or little endian
kfbh.hard H.A.R.D. magic # and block size
kfbh.type metadata block type (type of ASM metadata)
kfbh.datfmt metadata block data format
kfbh.block.blk block location of this block
kfbh.block.obj check value to verify consistency
kfbh.check change number of last change
kfbh.spare1 zero pad out to 32bytes
kfbh.spare2 zero pad out to 32 bytes
A list of the fields reported by this error through different SR is:
endian_kfbh
obj_kfbl hard_kfbh
type_kfbh
datfmt_kfbh
check_kfbh
- ASM object number stored in the block
Every ASM metadata block belongs to a specific file associated with a specific ASM structure. That’s why ASM File numbers between 1 and 255 are used to identify the files storing those structures. The value on this field, references the ASM file number.
ASM File Number ASM Metadata
1 File Directory
2 Disk Directory
3 Active Change Directory (ACD)
4 Continous Operations Directory (COD)
5 Template Directory
6 Alias Directory
9 Attributes Directory
12 Staleness Directory
For other ASM metadata structures like PST, ATB, DISK HEADER, this field will have a static value 2147483648 (0x80000000)
- ASM block number stored in the block
An ASM file will allocate extents, which are associated with Allocation Units. Multiple ASM metadata blocks of 4096 bytes make the extent, considering the default Allocation Unit size of 1MB; there are 256 blocks on each extent/AU.
The value stored on this field indicates the block number relative to a particular file. In this example, (93) is the block number, which will be stored in the first extent of the file. That extent will be allocated on a specific Allocation Unit of any of the disks in the diskgroup.
- Value associated with field referenced by argument #2
This is the value found in the block for the field referenced in argument #2.
- Expected value for field referenced by argument 2
This is the expected value for the block referenced by argument # 2.
Having the description of all the arguments for error ORA-15196, It should be possible to have a better understanding of the message:
ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]
In the previous example, the field failing the validations is endian_kfbh, belong to file 1 (FILE DIRECTORY); it was also relative block 93, and the value for endian_kfbh was 211 while the correct value should have been 0.
Diagnostics
Up to 10gR2, there are some bugs (patch included) related to this error.
5554692 | Related to indirect extent allocation. Please read the bug descriptionin webiv, because not all cases of ORA-15196 are this particular bug. |
6027802 | This was closed as not a bug, but was related to some IO issues caused by EMC Powerpath. Same type of data mismatch has been observed on other PP installations |
6453944 | ORA-15196 with ASM disks larger than 2TB using ASMLIB |
The major number of issues of this error is associated with data changed outside of ASM. This include:
- Disks formatted at the OS level while it was used by ASM
- Disks assigned to a file system while used by ASM
- IO errors (stale writes)
- Usage of 3rdparty software
Once this error is reported, the diskgroup needs to be recreated. There are situations where diskgroup cannot be mounted, or others where any reference to the metadata (recursive or non recursive), will signal the error and dismount the diskgroup.
Data Collection
In order to understand the extension of the problem and produce a correct diagnostic, it is essential to obtain the following data:
- Alert.log and trace file associated to the error
- First 300MB of the disk affected with the error
In the alert.log, review the line before the report of error ORA-15196:
WARNING: cache failed to read fn=1 blk=80 from disk(s): 0
ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]
In the line prior the report of error ORA-15196, it indicates the disk storing the block: from disk(s): 0.
To get the first 300MB:
$dd if=<device path> of=/tmp/disk.dd bs= 1048576 count=300
It may be necessary to provide partial copy of other disks in the diskgroup.
- Output from AMDU if available
AMDU will be explained with more detail in a different note (TBD).
This tool is part of the New Features introduced with 11g. It reads the ASM disks and extract information into different files. Those files have a mapping of the ASM metadata, an image with the content of the disks or it is possible to extract files from the diskgroup.
AMDU can extract the information even if the diskgroup is dismounted.
The mapping file is very important for the diagnostic of error ORA-15196. It has the specific location for each of the extents of each ASM metadata file.
Note 553639.1 is the placeholder for the AMDU binaries for some of the platforms.
Data Review
- Always review other blocks in the boundaries of the affected block. If more than one block has incorrect data (zeros), and they belong to different ASM structures (file directory, disk directory, etc), it is most likely was caused outside of ASM: disk reformatted, assigned to another volume manager, etc.
Use kfed to extract the content of the blocks.
- Reviewing the trace file generated by the error.
The trace file always will print a dump of the ASM metadata block in memory, and also a short call stack. The output of the block is the same generated by kfed, which is a readable by the user.
*** SERVICE NAME:() 2008-01-23 11:57:23.892
*** SESSION ID:(39.74) 2008-01-23 11:57:23.892
OSM metadata block dump:
kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 80 ; 0x004: T=0 NUMB=0x50
kfbh.block.obj: 1 ; 0x008: TYPE=0x0 NUMB=0x1 kfbh.check: 4268948098 ; 0x00c: 0xfe72fa82 kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
/* data remove on purpose */
After the OSM metadata block dump, the short call stack is printed:
—– Abridged Call Stack Trace —–
kfcReadBlk()+1276 kfcLoad()+2148 kffbScanNext()+252 kffbTableCb()+700 kfgTableCb()+1252 kffilTableCb()+240 qerfxFetch()+896 qersoFetch()+720 qerjotFetch()+184 opifch2()+8092 kpoal8()+4196 opiodr()+1548 ttcpip()+1284 opitsk()+1432 opiino()+1128 opiodr()+1548 opidrv()+896 sou2o()+80 opimai_real()+124 main()+152
- Compare the data in the trace file with the data extracted from disk using kfed.
Comparing the block dumped in the trace file and the block in disk, it is possible to identify the exact cause of the check validation failure. Every case will be different, but if the data stored in disk is zeros, always remember to validate other blocks (adjacent). If more blocks are reporting invalid data (zeros), this is an indication the disk has been formatted outside ASM.
Example 1:
This is an example of a block with invalid data. The type of the block is KFBTYP_INVALID, generated when a incorrect type is stored.
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 34 ; 0x001: 0x22
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 4290772992 ; 0x004: T=1 NUMB=0x7fc00000
kfbh.block.obj: 0 ; 0x008: TYPE=0x0 NUMB=0x0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 13879 ; 0x010: 0x00003637
kfbh.fcn.wrap: 512 ; 0x014: 0x00000200
kfbh.spare1: 978943 ; 0x018: 0x000eefff
kfbh.spare2: 2054913149 ; 0x01c: 0x7a7b7c7d
Example 2:
The full content of the block has 0xd4.
disk:0 au:2 block:253 file:1 physical extent:0 block:253 kfed read ausz=1048576 blksz=4096 aunum=2 blknum=253 dev=/dev/rdsk/c2t50060E8000C41384d2s6 kfbh.endian: 212 ; 0x000: 0xd4 kfbh.hard: 212 ; 0x001: 0xd4 kfbh.type: 212 ; 0x002: *** Unknown Enum *** kfbh.datfmt: 212 ; 0x003: 0xd4 kfbh.block.blk: 3570717908 ; 0x004: T=1 NUMB=0x54d4d4d4 kfbh.block.obj: 3570717908 ; 0x008: TYPE=0xd NUMB=0x4d4d4 kfbh.check: 3570717908 ; 0x00c: 0xd4d4d4d4 kfbh.fcn.base: 3570717908 ; 0x010: 0xd4d4d4d4 kfbh.fcn.wrap: 3570717908 ; 0x014: 0xd4d4d4d4 kfbh.spare1: 3570717908 ; 0x018: 0xd4d4d4d4 kfbh.spare2: 3570717908 ; 0x01c: 0xd4d4d4d4 kfbtTraverseBlock: Invalid OSM block type 212 0000: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0020: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0040: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0060: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0080: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 00a0: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4
CASE STUDY
The diskgroup was not used for some months, used by a copy of a database. Due to business reasons, that database required to be used. Mounting the diskgroup was possible, but when the database was mounted, and reading the ASM metadata was required, error ORA-15196 was signaled and diskgroup dismounted.
The diskgroup was configured using external redundancy with a single disk and using the default Allocation Unit size of 1MB.
Data Collected
- The messages in the alert.log:
WARNING: cache failed to read fn=1 blk=256 from disk(s): 0
ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]
- The ASM block dumped in the trace file.
*** SESSION ID:(108.5) 2008-02-06 10:05:31.054
OSM metadata block dump:
kfbh.endian: 0 ; 0x000: 0x00 kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 7 ; 0x002: KFBTYP_ACDC
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 10752 ; 0x004: T=0 NUMB=0x2a00
kfbh.block.obj: 3 ; 0x008: TYPE=0x0 NUMB=0x3
kfbh.check: 1103194877 ; 0x00c: 0x41c16afd
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
- AMDU together with 300MB for the disk were collected.
Data Review
- The error:
WARNING: cache failed to read fn=1 blk=256 from disk(s): 0
ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]
The error provides the following information:
o The field failing the validation is obj_kfbl
o The block belongs to file 1 (fn=1). File 1 is the File Directory.
o The block is block 256 (blk=256)
o The value for obj_kfbl found was 3 but the expected value should be 1.
File extents, allocation units, blocks in ASM start at 0. Also, block size is 4096. Using the default AU size (1MB), there are 256 blocks. Block 256 is stored in the second extent.
Although the diskgroup was mounted, any query referencing x$kffxp trying to get the extent mapping for file 1 failed. As a result, it was not possible to identify the AU used by block 256 from file 1 (the affected block).
- Using AMDU
One of the files generated by AMDU is the mapping file (*.map) . That file contains the location on disk for every extent of the files stored in the diskgroup. The only record for file 1 was:
N0001 D0000 R00 A00000002 F00000001 I0 E00000000 U00 C00256 S0001 B0002097152
This line indicates that for File 1 (F00000001)), the first extent is stored in Allocation Unit 2 ( A00000002 ) from disk 0 ( D0000 ) .
t was not another entry for file 1 in the mapping file, but AMDU was generating a core dump. It was discovered AMDU was trying to read Allocation Unit 50.
One of the cool things of AMDU, is the possibility of dumping the content of a complete extent for a particular file, redirecting the output into a text file.
$amdu –diskstring ‘<path of device>’ –dump ‘<diskgroup name> -print ‘DG.F1.X1.B0.C256’
The previous command will dump 256 blocks of File 1 Extent 1 starting at block 0.
The results of the last command were:
************************** PRINTING XYZ.F1.X1.B0.C2 **************************
——————————– BLOCK 1 OF 2 ——————————–
…………………………………………………………………
disk:0 au:50 block:0 file:1 physical extent:1 block:0
kfed read ausz=1048576 blksz=4096 aunum=50 blknum=0 dev=/emea/bde/home/users/jfiguer2/disk.dd
At this point the conclusions were:
- The ASM metadata shows that Allocation Unit 50 from disk 0 belongs to File 1.
——————————– BLOCK 1 OF 2 ——————————–
…………………………………………………………………
disk:0 au:50 block:0 file:1 physical extent:1 block:0
kfed read ausz=1048576 blksz=4096 aunum=50 blknum=0 dev=/emea/bde/home/users/jfiguer2/disk.dd
- If the block belongs to file 1, the value for kfbh.block.obj field should have been 1 together with the value for kfbh.type, which should have been KFBTYP_FILEDIR. But that was not the case:
The error ORA-15196:
WARNING: cache failed to read fn=1 blk=256 from disk(s): 0
ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]
- The content dumped into the trace file was the same found on disk. The check validation failed because the data stored in the block was not part of the correct ASM metadata, in this case file directory.
The next step was to validate all the blocks in the same Allocation Unit. Those blocks belong to the same ASM metadata (KFBTYP_FILEDIR). One Allocation Unit is used exclusively by one unique file.
Example for block 1 from AU 50:
disk:0 au:50 block:1 file:1 physical extent:1 block:1
kfed read ausz=1048576 blksz=4096 aunum=50 blknum=1 dev=/emea/bde/home/users/jfiguer2/disk.dd
The solution
There was not an available backup for the database stored on the diskgroup, so it was required to keep the diskgroup mounted. Patching the ASM metadata, replacing the content of the first block from Allocation Unit 50, with a valid data.
It was not possible to rebuild the real data for the block 0, so it was replaced with block
- Additional patching was required, in order to adjust other fields in the block. Once the block was successfully patched, the diskgroup was mounted and queries on internal views did not dismount the diskgroup.
Opening the database report errors trying to identify one data file. The extent mapping for this file was stored in the patched block. Luckily that file was not relevant for the database. After setting the file offline, the database opened without errors.
Because was not possible to guarantee the integrity of the diskgroup, it was recommended to take a backup of the database and rebuild the diskgroup.
【Oracle数据恢复】ORA-600[4511]错误解析
ORA-00600[4511]错误的触发条件是当ORACLE检验一个行锁并发现该锁不再活动时。 当然也可以由于一个数据块中存放了4095条以上的行时,一般仅仅当使用32kb数据块时可能包含这么多行, 针对这种情况的一种workaround是限制块中的数据行数,可以通过设置pctfree为一个较高值。
ORA-00600[4511]错误属于内核事务块处理,该错误可能造成进程失败或者可能的坏块
如果自己搞不定可以找ASKMACLEAN专业数据库修复团队成员帮您恢复!
如果用户已经了解到是那张表上存在可能引起ORA-00600[4511]错误的问题数据块,则可以通过执行ANALYZE TABLE <表名> VALIDATE STRUCTURE CASCADE;命令来检验是否存在潜在的问题。也可以用DBV工具来扫描表所在的数据文件。
这也可能是Consistent Read一致性读引起的该问题,那么alter system flush buffer_cache或者重启实例都可能可以绕过该问题。
ORA-00600[4511]相关的BUG 信息:
9724970 11.2.0.1.BP08, 11.2.0.2.2, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.1 Block Corruption with PDML UPDATE. ORA_600 [4511] OERI[kdblkcheckerror] by block check
4000840 9.2.0.7, 10.1.0.4, 10.2.0.1 Update of a row with more than 255 columns can cause block corruption