Oracle TXChecker THE TXCHECKER USER GUIDE


Introduction
TXChecker provides information useful for determining the nature and extent of corruption that may be introduced when forcing a database open with a combination
of _CORRUPTED_ROLLBACK_SEGMENTS and _OFFLINE_ROLLBACK_SEGMENTS.
When a database is opened with these parameters due to missing or corrupt undo datafiles, it is unknown how much data will be inaccessible (returning errors or
hanging/spinning) or force-committed. Force-committing can cause logical data corruption and possible data dictionary corruption. This tool provides information
necessary for the correction and repair of such corruption by identifying which objects will be effected.
TXChecker MUST only be used under the advice of Oracle Support Services. Failure to do so and forcing the database open will render the
database unsupported by Oracle.
Back to Contents
Overview
In cases were a problem occurs with one or more undo datafiles preventing the database from opening, two hidden and unsupported init.ora parameters
_OFFLINE_ROLLBACK_SEGMENTS and/or _CORRUPTED_ROLLBACK_SEGMENTS may be used by Oracle Support to aid in opening the database. These two
parameters change the way Oracle handles active transactions when the undo required to roll them back is not available. They should only be used under the advice
of Oracle Support due to the increased possibilities of introducing logical data corruptions throughout the database.
Without using TXChecker the only way to know which objects are affected is by forcing the database open and then manually dump the active transaction undo
chains. This method is very time consuming and can prove futile if after the database is opened you find out important objects are no longer accessible.
TXChecker will NOT help identify objects that may be negatively affected by using the _ALLOW_RESETLOGS_CORRUPTIONparameter. TXChecker will only identify objects that may require undo information to rollback active transactions or clean out dirty blocks. It will NOT identify objects
that are out of sync with each other or the data dictionary. Having datafiles from different points in time will produce inaccurate results from
TXChecker.
TXChecker will show the object name, owner and type of all objects involved in active transactions, assuming the undo datafiles can be read successfully. If the
active transaction chains cannot be read TXChecker can scan one or more datafiles to identify objects that may require undo data from missing or corrupted undo
segments. TXChecker results is easier for Oracle Support to decide the best course of action before forcing the database open. If all the affected objects can be
rebuilt from other sources and the data dictionary is not affected, the database can be opened without the need to rebuild the entire database using export/import.
TXChecker operates in a read-only manner and does NOT write to any Oracle datafiles, control files or log files. TXChecker will not damage the
database.
Back to Contents
Features
Version 1.4 of TXChecker provides the following features:
Show you which undo segments were online the last time the database was open. If the database was shutdown cleanly they should all be offline, except for
the system undo segment. If the database crashed, this information can then be used to set the relevant rollback segment parameters should they be needed
to force the database open.
Show you which undo segments contained active transactions at time of a crash, and which undo segments will report problems when Oracle tries to access
them. The undo chains for active transactions are also checked to identify the objects involved or what problems are encountered when traversing them.
Obviosly if the undo datafile cannot be read the undo segments can not be accessed.
Browse | Subscriptions | Authoring Wizard | Documents In Progress | Query By Attribute
Knowledge
-a When scanning datafiles (with -d/-f/-l/-t options) report objects using any of the undo segments (not just
those with errors) (OPTIONAL)
-b For objects found, print the datablock addresses (OPTIONAL)
-ccontrolfile_name Fully qualified controlfile name to read (MANDATORY)
For each active transaction, show the object owner, object name, partition name if a partitioned object and object type for all objects involved in that
transaction. A count of undo records found for each object is also shown.
Scan a single datafile, a list of datafiles, a tablespace or all datafiles in the database for datablocks that may need undo data for clean-out purposes, or
datablocks that are involved in active transactions. This is useful for determining what objects will need attention after the database is forced open.
For all objects identified as needing undo data, a SQL script is created which contains the commands to carry out an 'ANALYZE ... VALIDATE STRUCTURE' to
check for possible corruption or access problems.
Identify which undo segments were used to update each object. When the datablocks are listed in the report, the transactions IDs are also displayed.
Lists which undo segments were used before and after determined SCN values. This is useful to find out which undo segments would need to be named in
either of the underscore parameters to force the database open to allow active transactions to be force committed or dirty datablocks to be cleaned out.
Allows you to search the database for active transactions using a particular undo segment, a transaction ID or an undo segment and slot number, or a
transaction using an undo segment where the wrap number is greater than a determined value.
When identifying each affected datablock address, TXChecker will create a script which can be run to dump those datablocks for further analysis.
Will scan controlfiles and datafiles located on RAW devices.
Back to Contents
Supported Platforms
64-bit & 32-bit Sparc Solaris
64-bit & 32-bit Linux
64-bit & 32-bit HP-UX
64-bit AIX
32-bit MS Windows
TXChecker has had initial testing against versions 8.0 through to 10.2 but only on a limited basis so feedback is encouraged to let us know how well it worked, what
problems were encountered or what enhancements you would like to have added.
Back to Contents
Download TXChecker
Current TXChecker Version: 1.4 October, 2007
x86 Linux (v1.4) Click here to download the file
Sparc Solaris (v1.4) Click here to download the file
HP/UX - RISC (v1.4) Click here to download the file
AIX (v1.4) Click here to download the file
MS Windows (v1.4) Click here to download the file
If a file download dialog box does not appear when clicking on the above link, you may need to clear your web browser's cache and/or restart your web browser. If
you are still unable to download the file, you may request that we email you a copy: Stephan.Haisley@oracle.com
Back to Contents
Installing TXChecker
Download TXChecker using one of the links above. Copy the tar file to the directory where TXChecker is to be installed and issue the following commands.
Unix/Linux % tar xvfp TXChecker_version_platform.tar % cd TXChecker % chmod u+x TXChecker
Windows
1. Extract the files from the zip file into a directory:
C:\TXC> unzip TXChecker_verson_MSWindows.zip
2. Leave the Microsoft.VC80.CRT directory where it is to allow TXChecker to find the DLLs.
This User Guide is also uncompressed with full instructions on how to run the tool.
Back to Contents
Running TXChecker
To see the list of program arguments:
% TXChecker
Usage is: TXChecker [options]
Options:
-ccontrolfile_name Fully qualified controlfile name to read (MANDATORY)
-d Scan database for active TXs (use when undo not available) (OPTIONAL)
-ffilename Scan the named datafile for active TXs (OPTIONAL)
-g Indicates you want to find all blocks taking part in transactions with a USN > than the USN supplied in -x
parameter (same constraints as -w) (OPTIONAL)
-llistfile Scan all the datafiles listed in the listfile for active TXs (OPTIONAL)
-mminutes Number of minutes used to consider a TX as active (1-120) (DEFAULTS TO 15 MINUTES)
-p Show the names and last known status of the UNDO segments (OPTIONAL)
-s Skip read-only or offline normal datafiles (OPTIONAL)
-ttablespace Scan all the datafiles for this tablespace (OPTIONAL)
-u Report ITL entries active if marked with an upper bound ('U' flag) fast commit SCN instead if active
transactions (OPTIONAL)
-wwrap# Wrap# for XID in ITL entry to report blocks where wrap# > this one (OPTIONAL). Must use -x with this
option
-xXID XID for transaction wanting to search for (OPTIONAL)
Use format rrrr.ssss.wwwwwwww using Hexadecimal numbers. See the readme for full instructions on
using -x, -w and -g options
NOTE: Options -d/-f/-l/-t are exclusive, and only one should be specified.
NOTES:
Options -x (search for a transaction ID) and -a (search for all active transactions) are exclusive and will not work if specified together.
Options -w (search for transactions with a wrap number greater than one specified) and -g (search for transactions using an undo segment with a USN
greater than the number specified) are exclusive and should not be used together.
The dump_datablocks.sql script containing the commands to dump the affected datablocks will only be produced when using the -b option.
TXChecker will always create a logfile in the current directory called TXChecker_date_time_DBName.log. This MUST be reviewed and uploaded to Oracle
Support Services for analysis.
TXChecker uses the control file to gather the locations and names of all tablespaces and datafiles belonging to the database. Using a controlfile from a different
point in time to the rest of the database may result in errors identifying missing datafiles. If this happens, use the listfile (-l), tablespace (-t) or datafile (-d) options
to scan selected datafiles.
If there is adequate CPU and I/O bandwidth available TXChecker could be run in parallel to scan different tablespaces or datafiles concurrently, reducing the time it
takes to scan larger databases. To do this, simply run TXChecker from multiple sessions at the same time.
NOTE: TXChecker will always create a logfile in the current directory called TXChecker_date_time_DBName.log. This MUST be uploaded to Oracle
Support Services for analysis and for advice on how best to continue. Also upload a conttol file and file headers dump (alter session set events
'immediate trace name controlf level 12' and alter session set events 'immediate trace name file_hdrs level 10').
Common Examples
Here are some common examples of using TXChecker along with links to the output created by using the commands.
1. Check the undo segments for active transactions:
% ./TXChecker -c/u02/oradata/S102/control02.ctl
Example output: TXChecker_ex1.out
2. Scan the database for objects which may require undo from troubled undo segments, showing datablock addresses:
% ./TXChecker -c/u02/oradata/S102/control02.ctl -d -b
Example output: TXChecker_ex2.out
3. Find all objects in a list of datafiles which may require undo from any undo segments:
% ./TXChecker -c/u03/oradata/S901/control02.ctl -lfiles.S901 -a
Example output: TXChecker_ex3.out
See the Appendix B for details of creating a list file containing the names of the datafiles to be scanned.
4. Scan a single datafile for objects which may require undo from troubled undo segment with a particular transaction ID:
% ./TXChecker -f/u02/oradata/S101/users01.dbf -c/u01/oradata/S101/control01.ctl -x0001.0004.00000125 -b
Example output: TXChecker_ex4.out
5. Scan a tablespace for with non-committed transactions using a USN greater than a known USN:
% ./TXChecker -c/u02/oradata/S102/control02.ctl -tsystem -x0009.FFFF.FFFFFFFF -g
Example output: TXChecker_ex5.out
Back to Contents
Warnings and Restrictions
When TXChecker is scanning datafiles it is reading and checking every data block for active ITL (Interested Transaction List) entries that may require access to undo
information. To scan large datafiles or a large database TXChecker may take some time due to the amount of reads being carried out, so consider using the listfile
or tablespace option to scan the most important datafiles.
TXChecker has NOT been coded for OCFS or ASM files yet. This may be introduced in a later release.
TXChecker will currently not handle a database where there are more than 250 rows in the X$KCCFE table. This restriction has been removed in version 1.4.
Details of a workaround can be found in Appendix C.
Back to Contents
Future Release Features
Features that may be introduced in future releases of TXChecker include:
a. Add support for ASM, OCFS.
b. Create a script using the datablock addresses to be used to select data around affected datablocks. Create a similar script to try selecting the data
in the affected datablocks.
c. Add the ability to carry out more internal related undo problem diagnostics.
d. Add parallelism feature to scan datafiles in parallel.
Back to Contents
Reporting Feedback
If you encounter problems running TXChecker or would like to provide feedback or enhancement requests, please send emails to Stephan.Haisley@oracle.com.
I have done my best to test TXChecker on many different machines and conditions but if it does core dump or report unexpected errors, please send me the logfile,
stack trace (from the core) and if possible a binary copy of the controlfile and system datafile. If the system datafile is too big to upload to Oracle, copy the first 3000
blocks using dd:
% dd if=system_datafile of=system_datafile_copy bs=blocksize count=3000
Back to Contents
Appendix A: Output Example
The following logfiles were produced from some of the above common examples.
TXChecker_ex1.out
TXChecker started at: Thu May 3 12:06:17 2007
TXChecker - v1.3 by Center Of Expertise (COE), Oracle Corporation (build 04/13/07)
Program command: TXChecker -c/u02/oradata/S102/control02.ctl
Database Name: S102 Version: 10.2.0
*** Database last checkpointed at 05/03/2007 12:05:42 (SCN: 0x0.0xc5f24c)
*** Using 60 minutes to find most active transactions (-m60)
*** Undo Segments:
USN: 0 Name: SYSTEM TBS#: 0 File: 1 Block: 9 Instance: 0 SMU: N Status: 3 - Online
SCN: 0000.00000000 XactSQN: 0x00000000 UndoSQN: 0x00000000
USN: 1 Name: _SYSSMU1$ TBS#: 1 File: 2 Block: 9 Instance: 0 SMU: Y Status: 3 - Online
SCN: 0000.00c59c2a XactSQN: 0x00000208 UndoSQN: 0x0000020b
USN: 2 Name: _SYSSMU2$ TBS#: 1 File: 2 Block: 25 Instance: 0 SMU: Y Status: 3 - Online
SCN: 0000.00c59cc7 XactSQN: 0x000002a6 UndoSQN: 0x00000220
USN: 3 Name: _SYSSMU3$ TBS#: 1 File: 2 Block: 41 Instance: 0 SMU: Y Status: 3 - Online
SCN: 0000.00c59c50 XactSQN: 0x0000029c UndoSQN: 0x00000277
...
USN: 10 Name: _SYSSMU10$ TBS#: 1 File: 2 Block: 153 Instance: 0 SMU: Y Status: 3 - Online
SCN: 0000.00c59c1c XactSQN: 0x000001f7 UndoSQN: 0x00000263
USN: 11 Name: _SYSSMU11$ TBS#: 1 File: 2 Block: 241 Instance: 0 SMU: Y Status: 2 - Offline
SCN: 0000.0006f10a XactSQN: 0x00000003 UndoSQN: 0x00000002
USN: 12 Name: _SYSSMU12$ TBS#: 1 File: 2 Block: 289 Instance: 0 SMU: Y Status: 2 - Offline
SCN: 0000.0006f0f0 XactSQN: 0x00000003 UndoSQN: 0x00000002
*** Active Transactions:
USN: 2 Name: _SYSSMU2$ File: 2 Block: 25 Instance: 0 Status: 3 - Online
* Active TX at slot 44 #undo blocks: 2 Last bk: 2.3103
USN: 6 Name: _SYSSMU6$ File: 2 Block: 89 Instance: 0 Status: 3 - Online
* Active TX at slot 24 #undo blocks: 7 Last bk: 2.1301
*** Objects that may require undo data:
Obj#: 51465 Name: OE.ITEM_ORDER_IX Type: INDEX Undo recs: 15
Used undo segment IDs: 2,
Obj#: 51466 Name: OE.ITEM_PRODUCT_IX Type: INDEX Undo recs: 15
Used undo segment IDs: 2,
Obj#: 51402 Name: OE.INVENTORIES Type: TABLE Undo recs: 95
Used undo segment IDs: 6,
Obj#: 51400 Name: OE.ORDERS Type: TABLE Undo recs: 2
Used undo segment IDs: 2,
Obj#: 51396 Name: OE.ORDER_ITEMS Type: TABLE Undo recs: 15
Obj#: 51396 Name: OE.ORDER_ITEMS Type: TABLE Undo recs: 15
Used undo segment IDs: 2,
Obj#: 51397 Name: OE.ORDER_ITEMS_PK Type: INDEX Undo recs: 15
Used undo segment IDs: 2,
Obj#: 51398 Name: OE.ORDER_ITEMS_UK Type: INDEX Undo recs: 15
Used undo segment IDs: 2,
Obj#: 51401 Name: OE.ORDER_PK Type: INDEX Undo recs: 2
Used undo segment IDs: 2,
Obj#: 51468 Name: OE.ORD_CUSTOMER_IX Type: INDEX Undo recs: 2
Used undo segment IDs: 2,
Obj#: 51469 Name: OE.ORD_ORDER_DATE_IX Type: INDEX Undo recs: 2
Used undo segment IDs: 2,
Obj#: 51467 Name: OE.ORD_SALES_REP_IX Type: INDEX Undo recs: 2
Used undo segment IDs: 2,
*** Use validate_objects.sql script file to validate the structure of possibly corrupt objects
if the undo required is not available.
NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!
*************************************************************************************
Upload the logfile (TXChecker_050307_1206_S102.log) to Oracle Support Services for analysis.
Do NOT attempt to force the database open until the logfile has been analyzed.
TXChecker_ex2.out
TXChecker started at: Thu May 3 12:32:26 2007
TXChecker - v1.3 by Center Of Expertise (COE), Oracle Corporation (build 04/13/07)
Program command: TXChecker -c/u02/oradata/S102/control02.ctl -d -b
Database Name: S102 Version: 10.2.0
*** Database last checkpointed at 05/03/2007 12:05:42 (SCN: 0x0.0xc5f24c)
*** Using 60 minutes to find most active transactions (-m60)
*** Undo Segments:
USN: 0 Name: SYSTEM TBS#: 0 File: 1 Block: 9 Instance: 0 SMU: N Status: 3 - Online
SCN: 0000.00000000 XactSQN: 0x00000000 UndoSQN: 0x00000000
USN: 1 Name: _SYSSMU1$ TBS#: 1 File: 2 Block: 9 Instance: 0 SMU: Y Status: 3 - Online
SCN: 0000.00c59c2a XactSQN: 0x00000208 UndoSQN: 0x0000020b
USN: 2 Name: _SYSSMU2$ TBS#: 1 File: 2 Block: 25 Instance: 0 SMU: Y Status: 3 - Online
SCN: 0000.00c59cc7 XactSQN: 0x000002a6 UndoSQN: 0x00000220
USN: 3 Name: _SYSSMU3$ TBS#: 1 File: 2 Block: 41 Instance: 0 SMU: Y Status: 3 - Online
SCN: 0000.00c59c50 XactSQN: 0x0000029c UndoSQN: 0x00000277
...
USN: 10 Name: _SYSSMU10$ TBS#: 1 File: 2 Block: 153 Instance: 0 SMU: Y Status: 3 - Online
SCN: 0000.00c59c1c XactSQN: 0x000001f7 UndoSQN: 0x00000263
USN: 11 Name: _SYSSMU11$ TBS#: 1 File: 2 Block: 241 Instance: 0 SMU: Y Status: 2 - Offline
SCN: 0000.0006f10a XactSQN: 0x00000003 UndoSQN: 0x00000002
USN: 12 Name: _SYSSMU12$ TBS#: 1 File: 2 Block: 289 Instance: 0 SMU: Y Status: 2 - Offline
SCN: 0000.0006f0f0 XactSQN: 0x00000003 UndoSQN: 0x00000002
*** Active Transactions:
USN: 2 Name: _SYSSMU2$ File: 2 Block: 25 Instance: 0 Status: 3 - Online
* Active TX at slot 44 #undo blocks: 2 Last bk: 2.3103
USN: 6 Name: _SYSSMU6$ File: 2 Block: 89 Instance: 0 Status: 3 - Online
* Active TX at slot 24 #undo blocks: 7 Last bk: 2.1301
*** Objects that may require undo data:
Obj#: 51465 Name: OE.ITEM_ORDER_IX Type: INDEX Undo recs: 15
Used undo segment IDs: 2,
Obj#: 51466 Name: OE.ITEM_PRODUCT_IX Type: INDEX Undo recs: 15
Used undo segment IDs: 2,
Obj#: 51402 Name: OE.INVENTORIES Type: TABLE Undo recs: 95
Used undo segment IDs: 6,
Obj#: 51400 Name: OE.ORDERS Type: TABLE Undo recs: 2
Used undo segment IDs: 2,
Obj#: 51396 Name: OE.ORDER_ITEMS Type: TABLE Undo recs: 15
Used undo segment IDs: 2,
Obj#: 51397 Name: OE.ORDER_ITEMS_PK Type: INDEX Undo recs: 15
Used undo segment IDs: 2,
Obj#: 51398 Name: OE.ORDER_ITEMS_UK Type: INDEX Undo recs: 15
Used undo segment IDs: 2,
Obj#: 51401 Name: OE.ORDER_PK Type: INDEX Undo recs: 2
Used undo segment IDs: 2,
Obj#: 51468 Name: OE.ORD_CUSTOMER_IX Type: INDEX Undo recs: 2
Used undo segment IDs: 2,
Obj#: 51469 Name: OE.ORD_ORDER_DATE_IX Type: INDEX Undo recs: 2
Used undo segment IDs: 2,
Obj#: 51467 Name: OE.ORD_SALES_REP_IX Type: INDEX Undo recs: 2
Used undo segment IDs: 2,
*** Undo segments (headers) that encountered errors preventing Active TX scan:
USN: 5 Name: _SYSSMU5$ File: 2 Block: 73 Instance: 0 Error: 20 - Bad block type
*** Scanning database for datablocks that may require undo:
*** Asterisk ('*') denotes blocks being updated since 04/09/2007 00:00:58 (SCN: 0x0.0xc54fef)
Scanning datafile#: 5 - /u03/oradata/S102/example01.dbf (EXAMPLE) - Active TX blocks: 0
Scanning datafile#: 9 - /u02/oradata/S102/logmnr01.dbf (LOGMNR) - Active TX blocks: 0
Scanning datafile#: 6 - /u03/oradata/S102/perfstat01.dbf (PERFSTAT) - Active TX blocks: 0
Scanning datafile#: 8 - /u03/oradata/S102/stradmin01.dbf (STRADMIN) - Active TX blocks: 0
Scanning datafile#: 3 - /u02/oradata/S102/sysaux01.dbf (SYSAUX) - Active TX blocks: 23 *
Scanning datafile#: 1 - /u02/oradata/S102/system01.dbf (SYSTEM) - Active TX blocks: 486
Scanning datafile#: 1 - /u02/oradata/S102/system01.dbf (SYSTEM) - Active TX blocks: 486
Temporary datafile (/u03/oradata/S102/temp01.dbf) - SKIPPING
Scanning datafile#: 7 - /u02/oradata/S102/test01.dbf (TEST) - Active TX blocks: 0
Undo datafile (/u02/oradata/S102/undotbs01.dbf) - SKIPPING
Scanning datafile#: 4 - /u03/oradata/S102/users01.dbf (USERS) - Active TX blocks: 0
Scanning datafile#: 10 - /u03/oradata/S102/users02.dbf (USERS) - Active TX blocks: 0
*** Objects that may require undo data:
*** Asterisk ('*') denotes blocks being updated since 04/09/2007 00:00:58 (SCN: 0x0.0xc54fef)
DataObj#: 593 Name: SYS.I_AW_OBJ$ Type: INDEX Datablocks: 1
Used undo segment IDs: 5,
DBAs:
0x00c04797 ( 3. 18327)
XID: 0005.0021.000000c3 Rows: 229 ----
DataObj#: 254 Name: SYS.I_H_OBJ#_COL# Type: INDEX Datablocks: 1
Used undo segment IDs: 5,
DBAs:
0x0040f320 ( 1. 62240)
XID: 0005.002a.00000267 Rows: 393 ----
DataObj#: 586 Name: SYS.AW_OBJ$ Type: TABLE Datablocks: 20
Used undo segment IDs: 5,
DBAs:
0x00c04373 ( 3. 17267)
XID: 0005.0021.000000c3 Rows: 24 ----
0x00c04374 ( 3. 17268)
XID: 0005.0021.000000c3 Rows: 25 ----
0x00c04376 ( 3. 17270)
XID: 0005.0021.000000c3 Rows: 11 ----
0x00c0478a ( 3. 18314)
XID: 0005.0021.000000c3 Rows: 21 ----
...
*** Use validate_objects.sql script file to validate the structure of possibly corrupt objects
if the undo required is not available.
Undo Segment Usage Summary
**************************
*** Undo segments identified in use by active transaction datablocks AFTER 04/09/2007 00:00:58 (SCN: 0x0.0xc54fef):
USN: 5 Name: _SYSSMU5$
*** Undo segments identified in use by active transaction datablocks BEFORE 04/09/2007 00:00:58 (SCN: 0x0.0xc54fef):
USN: 5 Name: _SYSSMU5$
NOTE: The database was scanned for active transactions using the problemmatic undo segments ONLY.
To scan the database for ALL active transactions (using good and bad undo segments) use '-a' option
NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!
*************************************************************************************
Upload the logfile (TXChecker_050307_1232_S102.log) to Oracle Support Services for analysis.
Do NOT attempt to force the database open until the logfile has been analyzed.
TXChecker_ex3.out
TXChecker started at: Thu May 3 13:19:01 2007
TXChecker - v1.3 by Center Of Expertise (COE), Oracle Corporation (build 04/13/07)
Program command: TXChecker -c/u03/oradata/S901/control02.ctl -lfiles.S901 -a
Database Name: S901 Version: 9.0.1
*** Database last checkpointed at 03/20/2007 02:27:18 (SCN: 0x0.0x452b955)
*** Using 60 minutes to find most active transactions (-m60)
*** Undo Segments:
USN: 0 Name: SYSTEM TBS#: 0 File: 1 Block: 2 Instance: 0 SMU: N Status: 3 - Online
SCN: 0000.00000000 XactSQN: 0x00000000 UndoSQN: 0x00000000
USN: 1 Name: _SYSSMU1$ TBS#: 1 File: 2 Block: 9 Instance: 0 SMU: Y Status: 2 - Offline
SCN: 0000.0452b950 XactSQN: 0x00011078 UndoSQN: 0x00004a8b
USN: 2 Name: _SYSSMU2$ TBS#: 1 File: 2 Block: 25 Instance: 0 SMU: Y Status: 2 - Offline
SCN: 0000.0452b944 XactSQN: 0x00010b60 UndoSQN: 0x000008a9
USN: 3 Name: _SYSSMU3$ TBS#: 1 File: 2 Block: 41 Instance: 0 SMU: Y Status: 2 - Offline
SCN: 0000.0452b94d XactSQN: 0x000110fe UndoSQN: 0x000008e2
...
USN: 9 Name: _SYSSMU9$ TBS#: 1 File: 2 Block: 137 Instance: 0 SMU: Y Status: 2 - Offline
SCN: 0000.0452b94b XactSQN: 0x0001106e UndoSQN: 0x00004a8e
USN: 10 Name: _SYSSMU10$ TBS#: 1 File: 2 Block: 153 Instance: 0 SMU: Y Status: 2 - Offline
SCN: 0000.0452b946 XactSQN: 0x0001102f UndoSQN: 0x00004a80
*** Undo segments (headers) that encountered errors preventing Active TX scan:
USN: 1 Name: _SYSSMU1$ File: 2 Block: 9 Instance: 0 Error: 1 - Datafile was not accessible
USN: 2 Name: _SYSSMU2$ File: 2 Block: 25 Instance: 0 Error: 1 - Datafile was not accessible
USN: 3 Name: _SYSSMU3$ File: 2 Block: 41 Instance: 0 Error: 1 - Datafile was not accessible
NOTE: IT IS RECOMMENDED TO SCAN THE DATABASE FOR OBJECTS INVOLVED IN ACTIVE TXS USING THESE BAD UNDO SEGMENTS
*** Scanning datafiles for datablocks that may require undo:
*** Asterisk ('*') denotes blocks being updated since 03/20/2007 01:55:35 (SCN: 0x0.0x44fe999)
Scanning datafile#: 1 - /u02/oradata/S901/system01.dbf (SYSTEM) - Active TX blocks: 12 *
Scanning datafile#: 4 - /u03/oradata/S901/tools01.dbf (TOOLS) - Active TX blocks: 0
Scanning datafile#: 4 - /u03/oradata/S901/tools01.dbf (TOOLS) - Active TX blocks: 0
Scanning datafile#: 5 - /u03/oradata/S901/users01.dbf (USERS) - Active TX blocks: 0
*** Objects that may require undo data:
*** Asterisk ('*') denotes blocks being updated since 03/20/2007 01:55:35 (SCN: 0x0.0x44fe999)
DataObj#: 118 Name: SYS.I_ARGUMENT2 Type: INDEX Datablocks: 9 *
Used undo segment IDs: 3,4,5,6,7,10,
DataObj#: 45 Name: SYS.I_COL1 Type: INDEX Datablocks: 1 *
Used undo segment IDs: 10,
DataObj#: 119 Name: SYS.I_SOURCE1 Type: INDEX Datablocks: 2 *
Used undo segment IDs: 2,
*** Use validate_objects.sql script file to validate the structure of possibly corrupt objects
if the undo required is not available.
Undo Segment Usage Summary
**************************
*** Undo segments identified in use by active transaction datablocks AFTER 03/20/2007 01:55:35 (SCN: 0x0.0x44fe999):
USN: 2 Name: _SYSSMU2$
USN: 3 Name: _SYSSMU3$
USN: 4 Name: _SYSSMU4$
USN: 5 Name: _SYSSMU5$
USN: 6 Name: _SYSSMU6$
USN: 7 Name: _SYSSMU7$
USN: 10 Name: _SYSSMU10$
NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!
*************************************************************************************
Upload the logfile (TXChecker_050307_1319_S901.log) to Oracle Support Services for analysis.
Do NOT attempt to force the database open until the logfile has been analyzed.
TXChecker_ex4.out
TXChecker started at: Thu May 3 14:58:02 2007
TXChecker - v1.3 by Center Of Expertise (COE), Oracle Corporation (build 04/13/07)
Program command: TXChecker -f/u02/oradata/S101/users01.dbf -c/u01/oradata/S101/control01.ctl -x0001.0004.00000125 -b
Database Name: S101 Version: 10.1.0
*** Database last checkpointed at 04/03/2007 17:11:29 (SCN: 0x0.0x33286)
*** Using 60 minutes to find most active transactions (-m60)
*** Undo Segments:
USN: 0 Name: SYSTEM TBS#: 0 File: 1 Block: 9 Instance: 0 SMU: N Status: 3 - Online
SCN: 0000.00000000 XactSQN: 0x00000000 UndoSQN: 0x00000000
USN: 1 Name: _SYSSMU1$ TBS#: 1 File: 2 Block: 9 Instance: 0 SMU: Y Status: 3 - Online
SCN: 0000.0003309e XactSQN: 0x00000124 UndoSQN: 0x00000045
USN: 2 Name: _SYSSMU2$ TBS#: 1 File: 2 Block: 25 Instance: 0 SMU: Y Status: 3 - Online
SCN: 0000.000330a4 XactSQN: 0x00000121 UndoSQN: 0x0000003d
USN: 3 Name: _SYSSMU3$ TBS#: 1 File: 2 Block: 41 Instance: 0 SMU: Y Status: 3 - Online
SCN: 0000.00033098 XactSQN: 0x0000011c UndoSQN: 0x00000044
....
*** Active Transactions:
USN: 1 Name: _SYSSMU1$ File: 2 Block: 9 Instance: 0 Status: 3 - Online
* Active TX at slot 4 #undo blocks: 6 Last bk: 2.540
*** Objects that may require undo data:
*** Using XID: 1.4.125
Obj#: 9375 Name: SCOTT.OBJ Type: TABLE Undo recs: 207
Used undo segment IDs: 1,
*** Scanning datafiles for datablocks that may require undo:
*** Asterisk ('*') denotes blocks being updated since 04/03/2007 17:04:41 (SCN: 0x0.0x33102)
Scanning datafile#: 4 - /u02/oradata/S101/users01.dbf (USERS) - Active TX blocks: 112 *
*** Objects that may require undo data:
*** Asterisk ('*') denotes blocks being updated since 04/03/2007 17:04:41 (SCN: 0x0.0x33102)
*** Using XID: 1.4.125
DataObj#: 9375 Name: SCOTT.OBJ Type: TABLE Datablocks: 112 *
Used undo segment IDs: 1,
DBAs:
0x0100000c ( 4. 12)*
XID: 0001.0004.00000125 Rows: 81 ----
0x0100000d ( 4. 13)*
XID: 0001.0004.00000125 Rows: 84 ----
0x0100000e ( 4. 14)*
XID: 0001.0004.00000125 Rows: 79 ----
0x0100000f ( 4. 15)*
XID: 0001.0004.00000125 Rows: 79 ----
0x01000010 ( 4. 16)*
XID: 0001.0004.00000125 Rows: 80 ----
*** Use validate_objects.sql script file to validate the structure of possibly corrupt objects
if the undo required is not available.
Undo Segment Usage Summary
**************************
*** Undo segments identified in use by active transaction datablocks AFTER 04/03/2007 17:04:41 (SCN: 0x0.0x33102):
USN: 1 Name: _SYSSMU1$
NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!
*************************************************************************************
Upload the logfile (TXChecker_050307_1458_S101.log) to Oracle Support Services for analysis.
Do NOT attempt to force the database open until the logfile has been analyzed.
TXChecker_ex5.out
TXChecker started at: Thu May 3 15:18:27 2007
TXChecker - v1.3 by Center Of Expertise (COE), Oracle Corporation (build 04/13/07)
Program command: TXChecker -c/u02/oradata/S102/control02.ctl -tsystem -x0009.FFFF.FFFFFFFF -g
Database Name: S102 Version: 10.2.0
*** Database last checkpointed at 05/03/2007 13:13:06 (SCN: 0x0.0xc64144)
*** Using 60 minutes to find most active transactions (-m60)
*** Undo Segments:
USN: 0 Name: SYSTEM TBS#: 0 File: 1 Block: 9 Instance: 0 SMU: N Status: 3 - Online
SCN: 0000.00000000 XactSQN: 0x00000000 UndoSQN: 0x00000000
USN: 1 Name: _SYSSMU1$ TBS#: 1 File: 2 Block: 9 Instance: 0 SMU: Y Status: 2 - Offline
SCN: 0000.00c641a5 XactSQN: 0x00000208 UndoSQN: 0x0000020b
USN: 2 Name: _SYSSMU2$ TBS#: 1 File: 2 Block: 25 Instance: 0 SMU: Y Status: 2 - Offline
SCN: 0000.00c641b5 XactSQN: 0x000002a7 UndoSQN: 0x0000022a
USN: 3 Name: _SYSSMU3$ TBS#: 1 File: 2 Block: 41 Instance: 0 SMU: Y Status: 2 - Offline
SCN: 0000.00c641b4 XactSQN: 0x0000029d UndoSQN: 0x0000027e
...
*** Tablespace Scan => SYSTEM
*** Scanning tablespace for datablocks that may require undo:
*** Asterisk ('*') denotes blocks being updated since 05/03/2007 12:02:06 (SCN: 0x0.0xc5eafd)
Scanning datafile#: 1 - /u02/oradata/S102/system01.dbf (SYSTEM) - Active TX blocks: 1541
*** Objects that may require undo data:
*** Asterisk ('*') denotes blocks being updated since 05/03/2007 12:02:06 (SCN: 0x0.0xc5eafd)
*** Using XID: 9.ffff.ffffffff
DataObj#: 114 Name: SYS.I_IDL_UB11 Type: INDEX Datablocks: 3
Used undo segment IDs: 10,
DataObj#: 73 Name: SYS.IDL_UB1$ Type: TABLE Datablocks: 1538
Used undo segment IDs: 10,
*** Use validate_objects.sql script file to validate the structure of possibly corrupt objects
if the undo required is not available.
NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!NOTE!
*************************************************************************************
Upload the logfile (TXChecker_050307_1518_S102.log) to Oracle Support Services for analysis.
Do NOT attempt to force the database open until the logfile has been analyzed.
Back to Contents
Appendix B: Listfile Example
The listfile should contain a list of fully qualified datafile names:
/u02/oradata/S901/system01.dbf
/u03/oradata/S901/tools01.dbf
/u03/oradata/S901/users01.dbf
Back to Contents
Appendix C: Workaround to the 250 X$KCCFE rows limitation (version 1.3 ONLY)
The current version of TXChecker has a limitation where it will not work with a database that has more than 250 rows in the X$KCCFE table. This table represents
datafile entries contained in the controlfile and is the basis for the V$DATAFILE view.
Even when datafiles are deleted, their entries will still exist in X$KCCFE and X$KCCFN (file name table) until the slot is reused when new datafiles are added to the
database. It is possible after dropping a large number of datafiles there are a greater number of rows in x$kccfe compared to v$datafile, which as a consequence
will slow down queries of v$datafile. The only way to remove this row count difference is to rebuild the controlfile.
If there are genuinely a large number of datafiles in the database (more than 250) a new temporary controlfile can be created which will only contain subset of the
database datafiles, allowing TXChecker to run succesfully.
database datafiles, allowing TXChecker to run succesfully.
The instructions below will detail how to create a temporary controlfile for use with TXChecker that will contain 250 datafiles or less, working around the current
limitation. The controlfile should NOT be used to open the database.
NOTE: It is recommended to make a backup of the current controlfile before you do this, just in case you overwrite the controlfile by accident.
1. Mount the database using the current controlfile:
SQL> STARTUP MOUNT
2. Create the script to create the temporary controlfile:
SQL> ALTER DATABASE BACKUP CONTROLFILE TO TRACE NORESETLOGS;
SQL> show parameters user_dump_dest -- This is where the trace file is created
SQL> SHUTDOWN IMMEDIATE;
3. Edit the create controlfile script and remove the files that are not needed for scanning by TXChecker:
NOTE: Make sure you include the SYSTEM and UNDO datafiles as these are needed by TXChecker to access the data dictionary.
The location of the script to edit is located in the user_dump_dest directory.
Remove all lines before the first STARTUP MOUNT command
Remove any datafiles listed after 'DATAFILE' that you don't want to scan and will bring the datafile count under 250
Example:
CREATE CONTROLFILE REUSE DATABASE "F102" NORESETLOGS NOARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 500
MAXINSTANCES 8
MAXLOGHISTORY 292
LOGFILE
GROUP 1 '/u01/oradata/F102/redo01.log' SIZE 50M,
GROUP 2 '/u01/oradata/F102/redo02.log' SIZE 50M,
GROUP 3 '/u01/oradata/F102/redo03.log' SIZE 50M
DATAFILE
'/u01/oradata/F102/system01.dbf',
'/u01/oradata/F102/undotbs01.dbf',
'/u01/oradata/F102/sysaux01.dbf',
'/u01/oradata/F102/users01.dbf',
'/u01/oradata/F102/example01.dbf',
'/u01/oradata/F102/streams01.dbf',
...etc.
'/u01/oradata/F920/tbs25_6.dbf',
'/u01/oradata/F920/tbs25_7.dbf',
'/u01/oradata/F920/tbs25_8.dbf',
'/u01/oradata/F920/tbs25_9.dbf',
'/u01/oradata/F920/tbs25_10.dbf'
CHARACTER SET WE8ISO8859P1;
Remove any more text after the create controlfile statement.
NOTE: Make sure you remove any commands after the create controlfile statement!!!
Save the edited trace file and rename it to cr_ctl.sql
4. Edit the init.ora file so the create controlfile script will create the temporary controlfile in a new location without touching the current controlfile:
If you are using an spfile, you need to create a pfile first:
SQL> STARTUP NOMOUNT
SQL> CREATE PFILE FROM SPFILE; -- Created in the $ORACLE_HOME/dbs directory
SQL> SHUTDOWN IMMEDIATE
NOTE: If you are using an spfile move it out of the way before moving to step #5 so that it won't be read on startup.
Edit the init.ora in the $ORACLE_HOME/dbs directory and change the control_files parameter:
Change from (example):
control_files='/u01/oradata/F102/control01.ctl','/u01/oradata/F102/control02.ctl'
To:
control_files='/u01/oradata/F102/TXChecker_ctlfile.tmp'
5. Create the temporary controlfile:
SQL> @cr_ctl.sql
--- DO NOT OPEN the database --
SQL> SHUTDOWN IMMEDIATE;

6. Run TXChecker against this new controlfile. You can use the -d to scan the database, and all of the datafiles contained in the temporary controlfile will be scanned.

7. Edit the init.ora file back to use correct controlfiles or use the old spfile.
When the database is opened the correct controlfile will be used, like before.





Disclaimer
Oracle Support Services provides TXChecker for diagnostic purposes ONLY. It has been tested and appears to work as intended as a read-only tool against a
problem database to provide diagnostic data to be used by Oracle Support Services to help in formulating the best course of action after certain undo segment
related failures.


TXChecker will not provide you direct advice guaranteed to fix your failure and the logfile created MUST be uploaded to Oracle Support for further analysis.
You should always run new scripts on a test instance initially.

备份TB级别Oracle数据库的一些技巧

备份TB级别数据的一些技巧

1、 考虑使用增量备份, 不要老是想着用全量备份

2、 对于增量备份而言 开启block change tracking 能极大地减少物理读,提升速度

3、 11g以后对于bigfile tablespace可以启用section size 来提升速度

4、 考虑到负载更低的 Data Guard物理备库上去做备份

5、 启用备份并行

CONFIGURE DEVICE TYPE DISK PARALLELISM 16;

6、 对于多节点RAC 使用 service控制负载均衡

srvctl add service –d  -s  -r  -a ,
srvctl add service –d  -s  -r  -a ,
For example
srvctl add service –d maclean –s bkup1 –r maclean1 –a maclean2,maclean3,maclean4,maclean5,maclean6,maclean7,maclean8
srvctl add service –d maclean –s bkup2 –r maclean2 –a maclean1,maclean3,maclean4,maclean5,maclean6,maclean7,maclean8
srvctl add service –d maclean –s bkup3 –r maclean3 –a maclean1,maclean2,maclean4,maclean5,maclean6,maclean7,maclean8
srvctl add service –d maclean –s bkup4 –r maclean4 –a maclean1,maclean2,maclean3,maclean5,maclean6,maclean7,maclean8
srvctl add service –d maclean –s bkup5 –r maclean5 –a maclean1,maclean2,maclean3,maclean4,maclean6,maclean7,maclean8
srvctl add service –d maclean –s bkup6 –r maclean6 –a maclean1,maclean2,maclean3,maclean4,maclean5,maclean7,maclean8
srvctl add service –d maclean –s bkup7 –r maclean7 –a maclean1,maclean2,maclean3,maclean4,maclean5,maclean6,maclean8
srvctl add service –d maclean –s bkup8 –r maclean8 –a maclean1,maclean2,maclean3,maclean4,maclean5,maclean6,maclean7

srvctl start service –d  -s <service_name #1>
Alternatively, to start all the services for a particular database
srvctl start service –d 

run {
allocate channel ch01 connect string ‘sys/@<scan_address>/<service_name>;
allocate channel ch02 connect string ‘sys/@<scan_address>/<service_name>;

allocate channel ch16 connect string ‘sys/@<scan_address>/<service_name>;
backup database …;
}

For example

run
{
allocate channel ch01 device type disk connect 'sys/welcome1@maclean-scan/bkup1';
allocate channel ch02 device type disk connect 'sys/welcome1@maclean-scan/bkup2';
allocate channel ch03 device type disk connect 'sys/welcome1@maclean-scan/bkup3';
allocate channel ch04 device type disk connect 'sys/welcome1@maclean-scan/bkup4';
allocate channel ch05 device type disk connect 'sys/welcome1@maclean-scan/bkup5';
allocate channel ch06 device type disk connect 'sys/welcome1@maclean-scan/bkup6';
allocate channel ch07 device type disk connect 'sys/welcome1@maclean-scan/bkup7';
allocate channel ch08 device type disk connect 'sys/welcome1@maclean-scan/bkup8';
allocate channel ch09 device type disk connect 'sys/welcome1@maclean-scan/bkup1';
allocate channel ch10 device type disk connect 'sys/welcome1@maclean-scan/bkup2';
allocate channel ch09 device type disk connect 'sys/welcome1@maclean-scan/bkup1';
allocate channel ch10 device type disk connect 'sys/welcome1@maclean-scan/bkup2';
allocate channel ch11 device type disk connect 'sys/welcome1@maclean-scan/bkup3';
allocate channel ch12 device type disk connect 'sys/welcome1@maclean-scan/bkup4';
allocate channel ch13 device type disk connect 'sys/welcome1@maclean-scan/bkup5';
allocate channel ch14 device type disk connect 'sys/welcome1@maclean-scan/bkup6';
allocate channel ch15 device type disk connect 'sys/welcome1@maclean-scan/bkup7';
allocate channel ch16 device type disk connect 'sys/welcome1@maclean-scan/bkup8';
backup
as backupset
incremental level 0
section size 128g
database;
}

【Oracle数据恢复】ORA-00600[kdBlkCheckError]错误解析

【Oracle数据恢复】ORA-600[kdBlkCheckError]错误解析

kdBlkCheckError本质上是 Kernel Data Block Check Error, 是对数据块的逻辑检测配合db_block_checking的存在,当db_block_checking!=false 且其发现了一个损坏,则报一个ORA-600 [kddummy_blkchk] / ORA-600 [kdBlkCheckError] . 进一步的损坏/坏块将被终止

kdBlkCheckError/kddummy_blkchk有一大堆的检测代码,每一个检测代码对应对数据块中数据的一个逻辑分析,如果分析发现存在mismatch则认为数据块存在逻辑讹误。

例如检测代码23001 代表Wrong total extent count。

如果出现ORA-00600[kdBlkCheckError]则一般是ORACLE 软件BUG  或者内存存在讹误。

如果自己搞不定可以找ASKMACLEAN专业ORACLE数据库修复团队成员帮您恢复!、

与该ORA-00600[kdBlkCheckError]错误相关的Bug Note如下:

 

NB Bug Fixed Description
17447078 12.1.0.2, 12.2.0.0 Diagnostic enhancement for ORA-600 [kdBlkCheckError] .. [18007] errors
14400110 11.2.0.4, 12.2.0.0 Bad redo / ORA-600 [kdBlkCheckError] .. [6135] for opcode 19.1 redo
12349316 11.2.0.4, 12.2.0.0 DBMS_SPACE_ADMIN.TABLESPACE_FIX_BITMAPS fails with ORA-600 [kddummy_blkchk] / ORA-600 [kdBlkCheckError] / ORA-607
14351566 11.2.0.3.8, 11.2.0.3.BP21, 11.2.0.4, 12.1.0.1 ORA-600 [kclchkblk_4] when doing flash back
16347904 11.2.0.4, 12.2.0.0 Corrupt block with check code 6101, 6110 or 6255 on compressed table with DML workload
17325413 12.1.0.2 Drop column with DEFAULT value and NOT NULL definition ends up with dropped column data hitting disk leading to corruption
14741727 11.2.0.2.9, 11.2.0.2.BP19, 11.2.0.3.BP12, 11.2.0.3.BP13, 12.1.0.1 Fixes for bug 12326708 and 14624146 can cause problems – backout fix
14551844 11.2.0.4, 12.1.0.1 ORA-600 [kdblkcheckerror] [6126] / ORA-600 [17182] for DELETE / INSERT (QMD / QMI) in COMPRESS BASIC table due to negative avsp
13804294 11.2.0.3.4, 11.2.0.3.BP07, 11.2.0.4, 12.1.0.1 Internal errors, corruptions, using pipelined function whose rows raise exceptions
13715932 11.2.0.4, 12.1.0.1 CREATE TABLE fails with ORA-600 [kddummy_blkchk] with large datafile
* 13605839 11.2.0.3.8, 11.2.0.3.BP21, 11.2.0.4, 12.1.0.1 ORA-600 [ktbsdp1] ORA-600 [kghfrempty:ds]. Corruption in Rollback with Clusterwide Global Transactions in RAC
12417369 11.2.0.2.5, 11.2.0.2.BP13, 11.2.0.2.GIPSU05, 11.2.0.3, 12.1.0.1 Block corruption from rollback on compressed table
* 10205230 11.2.0.1.6, 11.2.0.1.BP09, 11.2.0.2.2, 11.2.0.2.BP04, 11.2.0.3, 12.1.0.1 ORA-600 / corruption possible during shutdown in RAC
10180121 11.2.0.3, 12.1.0.1 ORA-600 [kdBlkCheckError] .. [6251] / block corruption during parallel DML
+ 9724970 11.2.0.1.BP08, 11.2.0.2.2, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.1 Block Corruption with PDML UPDATE. ORA_600 [4511] OERI[kdblkcheckerror] by block check
* 9711859 10.2.0.5.1, 11.1.0.7.6, 11.2.0.2, 12.1.0.1 ORA-600 [ktsptrn_fix-extmap] / ORA-600 [kdblkcheckerror] during extent allocation caused by bug 8198906
9541485 11.2.0.4, 12.1.0.1 Create materialized view on views based on CON$ fails with ORA-600 [kdBlkCheckError]
* 9406607 11.2.0.1.3, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.1 Corrupt blocks in 11.2 in table with unique key. OERI[kdBlkCheckError] by block check
9295217 11.2.0.2, 12.1.0.1 ORA-600 [ktsk_dba_to_hwm-1] / corruption during SHRINK of HWM
+ 9019113 11.2.0.1.BP02, 11.2.0.2, 12.1.0.1 ORA-600 [17182] for OLTP COMPRESS table in OLTP Compression REDO during RECOVERY
8720802 10.2.0.5, 11.2.0.1.BP07, 11.2.0.2, 12.1.0.1 Add check for row piece pointing to itself (db_block_checking,dbv,rman,analyze)
* 8331063 11.2.0.3, 12.1.0.1 Corrupt Undo. ORA-600 [2015] during rollback in undo block for COMPRESS table with SUPPLEMENTAL LOGGING
8277580 11.1.0.7.2, 11.2.0.1, 11.2.0.2, 12.1.0.1 Corruption on compressed tables during Recovery and Quick Multi Delete (QMD).
6523037 11.2.0.1.BP07, 11.2.0.2.2, 11.2.0.2.BP01, 11.2.0.3, 12.1.0.1 Corruption / ORA-600 [kddummy_blkchk] [6110] on update
8437213 10.2.0.4.3, 10.2.0.5, 11.1.0.7.7, 11.2.0.1 ASSM first level bitmap block corruption
8360192 11.1.0.7.6, 11.2.0.1 ORA-600 [kdBlkCheckError] [6110] / corruption from insert
* 8198906 10.2.0.5, 11.2.0.1 OERI [kddummy_blkchk] / OERI [5467] for an aborted transaction of allocating extents
7715244 11.1.0.7.2, 11.2.0.1 Corruption on compressed tables. Error codes 6103 / 6110

【Oracle ASM数据恢复】ORA-15038: disk ‘XXXXXXX’ mismatch on ‘Time Stamp’ with Target Disk Group错误解析

 

如果自己搞不定可以找ASKMACLEAN专业ORACLE数据库修复团队成员帮您恢复!

如果在mount diskgroup时遇到如下错误,则你有必要读一下这篇文章:

 

SQL> alter diskgroup DATA mount;
alter diskgroup DATA mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "17" is missing from group number "1"
ORA-15042: ASM disk "16" is missing from group number "1"
ORA-15042: ASM disk "15" is missing from group number "1"
ORA-15042: ASM disk "14" is missing from group number "1"
ORA-15042: ASM disk "13" is missing from group number "1"
ORA-15038: disk 'ORCL:DATA25' mismatch on 'Time Stamp' with target disk group [1861040353] [1861808156]
ORA-15038: disk 'ORCL:DATA24' mismatch on 'Time Stamp' with target disk group [1861040353] [1861808156]
ORA-15038: disk 'ORCL:DATA23' mismatch on 'Time Stamp' with target disk group [1861040353] [1861808156]
ORA-15038: disk 'ORCL:DATA22' mismatch on 'Time Stamp' with target disk group [1861040353] [1861808156]
ORA-15038: disk 'ORCL:DATA21' mismatch on 'Time Stamp' with target disk group [1861040353] [1861808156]

[oracle@mlab2 ~]$ oerr ora 15042
15042, 00000, "ASM disk \"%s\" is missing from group number \"%s\" "
// *Cause:  The specified disk, which is a necessary part of a diskgroup,
//          could not be found on the system.
// *Action: Check the hardware configuration.
//
[oracle@mlab2 ~]$ oerr ora 15038
15038, 00000, "disk '%s' mismatch on '%s' with target disk group [%s] [%s]"
// *Cause:  An attempt was made to mount into a disk group a disk whose
//          recorded allocation unit size, metadata block size, physical 
//          sector size, or creation time stamp was inconsistent with the other 
//          disk group members.
// *Action: Check if the system configuration has changed. Verify disk 
//          discovery string. 
//

 

 

 

该错误一般是由于OS级别的多路径配置存在问题,导致某个节点上出现大量重复的存储设备而引起的。

对于该类问题,首先从alert.log确认最近一次成功mount时那些磁盘对mount而言是必要的,例如:

 

 

NOTE: cache opening disk 11 of grp 1: DATA2 label:DATA2 <<<<<<<<<<<<<<<<<< 5 NOTE: cache opening disk 12 of grp 1: DATA20 label:DATA20 >
NOTE: cache opening disk 13 of grp 1: DATA3 label:DATA3 <<<<<<<<<<<<<<<<<<<<<< 6
NOTE: cache opening disk 14 of grp 1: DATA4 label:DATA4
NOTE: cache opening disk 15 of grp 1: DATA5 label:DATA5
NOTE: cache opening disk 16 of grp 1: DATA6 label:DATA6
NOTE: cache opening disk 17 of grp 1: DATA7 label:DATA7
NOTE: cache opening disk 18 of grp 1: DATA8 label:DATA8
NOTE: cache opening disk 19 of grp 1: DATA9 label:DATA9

 

 

 

接下来可以通过kfed来验证DISK HEADER,使用如下脚本:

#! /bin/sh
rm /tmp/kfed_DH.out /tmp/kfed_FS.out /tmp/kfed_BK.out /tmp/kfed_FD.out /tmp/kfed_DD.out /tmp/kfed_PST.out
for i in `ls /dev/asm-disk*`
do
echo $i >> /tmp/kfed_DH.out
kfed read $i >> /tmp/kfed_DH.out
echo $i >> /tmp/kfed_FS.out
kfed read $i blkn=1 >> /tmp/kfed_FS.out
echo $i >> /tmp/kfed_BK.out
kfed read $i aun=1 blkn=254 >> /tmp/kfed_BK.out
echo $i >> /tmp/kfed_FD.out
kfed read $i aun=2 blkn=1 >> /tmp/kfed_FD.out
echo $i >> /tmp/kfed_DD.out
kfed read $i aun=2 blkn=2 >> /tmp/kfed_DD.out
echo $i >> /tmp/kfed_PST.out
kfed read $i aun=1 blkn=2 >> /tmp/kfed_PST.out
done

【Oracle数据恢复】ORACLE数据库启动startup 停止shutdown相关文档汇总

【Oracle数据恢复】ORACLE数据库启动startup 停止shutdown相关文档汇总

 

在ORACLE数据库启动过程中,首先启动的是实例instance即nomount阶段, 之后再通过CONTROL_FILES 参数指定的控制文件位置来加载数据库MOUNT阶段。 下一个步骤是打开数据库open ,这个步骤包含打开数据文件和redo logfile重做日志文件。接着如果之前的一次shutdown不是优雅关闭的话, 则需要完成前滚(实际就是apply redo)和回滚未提交的事务(rollback)。

数据库的停止和启动一样也分成几个阶段。 首先数据库被关闭 close这让数据库不在发生变化,即数据文件和redo logfile日志文件均被关闭。接着数据库进入dismount阶段,实例将与数据库脱离联系。当数据库被unmount之后,ORACLE数据库关闭其控制文件。下一步是 移除SGA和中止后台进程 这样实例就算关闭了。

有多种shudown 的模式例如normal,immediate,transaction ,和abort. 除去SHUTDOWN ABORT外,数据库都会将SGA中必要的数据写出到磁盘上的数据文件和redo logfile中。 如果SHUTDOWN ABORT或者发生异常中止,则SGA中必要的数据仍未写出,需要在下次数据库启动时 完成crash recovery,这些都是ORACLE数据库自动完成的。

Startup upgrade/migrate 或者 设置  _system_trig_enabled = FALSE 都将在启动过程中禁止触发器

如果自己搞不定可以找ASKMACLEAN专业数据库修复团队成员帮您恢复!

 

Startup Slow / Hang

Startup can hang in any of the stages like nomount,mount or open stage. Following are some of the known issues reported so far:

Note 1367724.1 Startup Hangs at Mount Stage after AIO is Enabled on Linux
Note 429390.1 Database Startup Takes Longer time After Upgrade To 10.x
Note 552019.1 New Install and Creation of a 10gR2 database Hangs at Startup Nomount
Note 344933.1 DB Startup Can Hang if USE_INDIRECT_DATA_BUFFERS=TRUE and AWE_WINDOW_MEMORY Is Set Too High
Note 838451.1 Startup Hangs When "Processes" Parameter is Higher Than 10800
Note 1076092.1 Instance Startup Hangs After Creating New Undo Tablespace and Switching Between Old and New
Note 1476526.1 Database STARTUP NOMOUNT very slow after migrate to T4-4 Servers SPARC 64-Bit
Note 1475621.1 Oracle Database Startup And SGA Value Change Takes A Very Long Time To Complete

ORA-27102 Errors

ORA-27102 errors normally occurs due to memory issues.The Common causes could due to Semaphore Kernel misconfigurations,Memory related Ulimit settings,RAM and swap configurations.

Note 274092.1 LOCK_SGA on Windows fails with ORA-27102
Note 219752.1 ORA-27102 OSD-00034 Starting Database on Windows 2000
Note 390547.1 ORA-27102 Cannot Startup Instance via sqlplus
Note 1060677.6 ORA-27100 ORA-27102 Trying to Start 8.0.4 Database
Note 399895.1 Database Startup On Solaris 10 Fails With Ora-27102 Out Of Memory Error
Note 467707.1 ORA-27102: Out Of Memory on Oracle 10g Solaris 10 (x86-64)
Note 7272646.8 Bug 7272646 - Linux-x86_64: ORA-27103 on startup when MEMORY_TARGET > 3g
Note 790205.1 UNABLE TO START INSTANCE WITH LARGE SGA ORA-27102 SVR4 ERROR: 22: INVALID ARGUMENT
Note 263537.1 ORA-27102 out of memory When Trying To Start Database With SGA> 4G
Note 842881.1 ORA-27102, OSD-00031 Unable To Extend Memory_max_target And Memory_target Past 2GB
Note 1449714.1 Startup Fails With ORA-27102 After Upgrade From 10.2.0.3 To 10.2.0.4
Note 461519.1 ORA-27102 Database Will Not Start With SHMMAX Set To 8589934592 (8GB)
Note 351930.1 Receiving ORA-27102 For 32-Bit Oracle While Allocating Sga_max_size Greater Than 4gb
Note 1351705.1 Startup Fails with ORA-27102 and 'SVR4 Error 28: No space left on device'
Note 1292225.1 ORA-27102 OSD-00025 O/S-Error: (OS 1453) When Lock_sga is Set to True
Note 401077.1 Ora-27102: Out Of Memory: Linux Error: 12: Cannot Allocate Memory with LOCK_SGA=TRUE
Note 577898.1 ORA-27102 Received At Startup When LOCK_SGA Is Set Although Enough Memory Is Available
Note 301830.1 Upon startup of Linux database get ORA-27102: out of memory Linux-X86_64 Error: 28: No space left on device
Note 859898.1 The ORA-27102 error is generated on Solaris 10 having apparently correct settings of kernel parameters
Note 779861.1 ORA-27102: Out Of Memory And SVR4 Error: 22: Invalid argument During Startup On Solaris10 Server With Multiple DB Instances

ORA-27300 Errors

These errors are generally reported when the Operating System called for error or when there was a connection killed or a network interconnection failures or an OS configuration issue.The error ORA-27300 will also be accompanied by ora-27301 and ora-27302

Note 560309.1 Database Cannot Start Due to Lack of Memory
Note 579365.1 Troubleshooting ORA-27300 ORA-27301 ORA-27302 errors
Note 314179.1 Instance Startup Fails With Error ORA-27154,ORA-27300,ORA-27301,ORA-27302
Note 812115.1 Startup Fails With ORA-27300: Os System Dependent Operation:Fork Failed With Status:17
Note 814896.1 Startup Fails With ORA-27300: OS system dependent operation:IPC init failed with status: 65
Note 949468.1 Database Startup Fails with ORA-27300: OS system dependent operation:semget failed with status: 28

ORA-64 Errors

This error could occur when the database init.ora parameter calling for more resources than the Operating System is configured to provide.The parameters could be PROCESSES,DB_BLOCK_SIZE,SGA and more.

Note 470742.1 ORA-00064 Starting Database With Dispatchers
Note 556258.1 ORA-64 if Db_keep_cache_size is Set to 70 Gigs
Note 283980.1 ORA-00064: object is too large to allocate on this O/S
Note 1232463.1 ORA-00064: Object Is Too Large To Allocate On This O/S
Note 4466378.8 Bug 4466378 - ORA-64 does not report the caller description
Note 179301.1 Instance Startup Fails With ORA-00064 After Increasing Processes
Note 1457812.1 ORA-00064 Error Reported After Increasing Processes Parameter value
Note 18255.1 OERR: ORA 64 object is too large to allocate on this O/S <num, num>
Note 310838.1 Instance Startup failed with ORA-00064 when processes parameter set to High Value
Note 886312.1 Database startup can fail with ORA-00064 Errors with huge sga_target of over 40Gig
Note 1328620.1 ASM Instance Is Not Coming Up ORA-00064 (1,4468736,Kfchl Array) Kfchl Array
Note 815954.1 ORA-64 when starting ASM instance after changing db_cache_size and shared_pool_size
Note 7659217.8 Bug 7659217 - ORA-64 attempting to startup with a large SGA / buffer cache size
Note 386855.1 'startup migrate' failed with ORA-64 while upgrading to 10.2.0.2 with DBUA

ORA-27123 Errors

Note 167250.1 ORA-27123 When Connecting As Non Oracle User 
Note 207743.1 Getting ORA-27123 when trying to startup Oracle
Note 115753.1 UNIX: Resolving the ORA-27123 error
Note 307323.1 Ora-27123 When Creating New Database
Note 167250.1 ORA-27123 When Connecting As Non Oracle User 
Note 207743.1 Getting ORA-27123 when trying to startup Oracle
Note 61912.1 OERR: ORA-27123 unable to attach to shared memory segment
Note 250966.1 ORA-27123 During Startup, Immediately after a Shutdown
Note 872532.1 ORA-27123 on RHEL5 (PAE) 32bit when SGA larger than 2Gb
Note 552633.1 Starting the Database with SGA_TARGET set Fails with ORA-27123
Note 437582.1 Export Fails With EXP-00056 ORA-01034 ORA-27123 EXP-00005
Note 733974.1 ORA-27123 During Startup Nomount in 11G on AIX, Failure in SHMAT()
Note 369262.1 Startup with Maximum SGA Fails With Ora-27123 Unable To Attach Sga
Note 735187.1 Cannot Create Database With DBCA - Startup Nomount Gives ORA-27123
Note 1268668.1 Oracle Database 10g R2 Version 10.2.0.3 - Receiving Ora-27123 Errors
Note 390766.1 ora-27123 on Solaris 10 with larger than 1.6 - 1.7Gb SGA
Note 384262.1 ORA-01034, ORA-27123, HP-UX Error 22 Connecting Via Oracle Net
Note 149070.1 VMS: Connections fail with ORA-1034 and ORA-27123 errors
Note 453930.1 Connecting to the database fails with ORA-12547, Ora-600 [Ksmlsge1], ORA-27123, Error 13
Note 207797.1 ORA-1034 ORA-27123 SVR4 Error: 13: Permission denied when other then Oracle user
Note 356957.1 OpenVMS: Client Connections Report ORA-1034, ORA-27123, %SYSTEM-W-REGISFULL
Note 1401726.1 ORA-27123 When Starting Instance With No Setting Of SGA_TARGET Or SGA_MAX_SIZE
Note 730107.1 Getting ORA-27123 & not able to run Oracle when logging to server by a user other than that installed Oracle although it belongs to the same group

ORA-1081 Errors

This error could occur when we try to startup an instance that is already running or if the shared segments/semaphores already exist.

Note 1010214.6 ORA-1081: Starting Instance
Note 18657.1 OERR: ORA 1081 cannot start already-running Oracle - shut it down first
NFS Related Issues

Note 8418190.8 Bug 8418190 - Direct NFS warnings during database startup
Note 236794.1 NFS Locking Problems Encountered During Database Startup
Note 971406.1 DATABASE STARTUP HANGS AT MOUNTING CONTROLFILE WHEN DNFS IS ENABLED
Note 1430654.1 Database Startup Failed with "Direct NFS: please check that oradism is setuid"
Note 430920.1 NetApp: Using 'nolock' NFS Mount Option with non-RAC Systems Results in Database Corruption
ORA-600 /ORA-7445 Errors 

Note 435436.1 ORA-00600: [kccpb_sanity_check_2] During Instance Startup
Note 101589.1 Startup database returns ORA-00600 [ktpridestroy2]
Note 405602.1 ORA-600 [16305] While Starting Up the Database
Note 466596.1 Core Dumps In skgfqio() - Database Startup Hangs/Spins
Note 336447.1 Startup Database Produces ORA-00600: [Keltnfy-Ldminit]
Note 847786.1 Can Not Open Database After Shutdown get ORA-7445 [kewa_dump_time_diff]
Note 779071.1 Unable to Start Instance Due to ORA-600 [skgmhash] after a clean shutdown
Note 549000.1 ORA-600 [6006] ORA-600 [6856] During Startup Instance, Followed by Termination by SMON
Note 453775.1 Database Startup Fails With ORA-7445 [INVALID PERMISSIONS FOR MAPPED OBJECT] After Creation of User LBACSYS

Transaction Recovery Slowness

There could be slowness in the database during the open phase when the database is busy performing transaction recovery.

Note 1494886.1 Database Transaction Recovery
Note 414242.1 Database Hangs Because SMON Is Taking 100% CPU Doing Transaction Recovery
Note 12934890.8 Bug 12934890 - Startup hangs waiting for row cache lock due to open transaction against UNDO$
ORA-704 Errors

This is a general error reported at startup when there is some problem during processing of bootstrap information.There should be an accompanying error/s.

Note 18494.1 OERR: ORA 704 "bootstrap process failure"
Note 560417.1 Recovery Through Upgrade returns ORA-1092 on Open
Note 1349722.1 Ora-00704,Ora-39700: Database Must Be Opened With Upgrade Option
Note 435337.1 Unable To Open Database Before/After Upgrade - ORA-00704 ORA-39700 ORA-01092
Note 1383179.1 Unavailable Bootstrap Object ACCESS$ Causes ORA-704 ORA-604 ORA-942 When Opening Database
Note 1345417.1 After failed upgrade, startup from a restored backup fails on ORA-00704 and ORA-39700
ORA-9968 Errors

There are some client shadow processes hanging. Although the lk< SID> file is deleted the hanging processes still have a lock on the open file handle. This prevents the database to startup although a new lk file can be created successfully. An oracle process (background or shadow process) that exists while the instance is not started (crashed or not cleanly stopped) can have a lock on a file while this file is actually removed from the system. This is because on UNIX there is still a lock on the open file handle.

Note 467251.1 ORA-09968, ORA-01102 When Starting a Database 
Note 160395.1 Database Startup Fails with ORA-1102 and ORA-9968
Note 1488147.1 Instance Startup Raises Error ORA-09968: unable to lock file (Doc ID 1488147.1)
ORA-12547 Errors

The error ORA-12547 indicates that the communication channel has been broken. It's most often thrown because the other end of the process went away unexpectedly.

Note 1307075.1 Oracle Database Fails to Start with Error ORA-12547
Note 381566.1 connect / as sysdba Fails with Ora-12547 And Tns-12514
Note 744512.1 Ora-12547: Tns:Lost Contact Creating Database After Clean Installation
ORA-27125 Errors

Note 1067569.6 HP-UX: ORA-27125: NOT OWNER TRYING TO LOCK THE SGA IN MEMORY
Note 199068.1 OpenVMS: Instance startup fails with ORA-27125 and %SYSTEM-F-VA_IN_USE 
Note 121983.1 Starting Database Fails on Solaris with ORA-27126 or ORA-27125 Using LOCK_SGA
Note 577428.1 OpenVMS: Following an Oracle RELINK, Database Instance Startup fails with ORA-27125 or ORA-7217
ORA-1157 Errors 

The background process was not able to find one of the datafiles.The database will prohibit access to this file but other files will be unaffected.However, the first instance to open the database will need to access all online datafiles.Accompanying messages from the operating system will describe why the file was not found.

Note 184327.1 ORA-1157 Troubleshooting
Note 1035992.1 Oracle Troubleshooting
Note 212053.1 ORA-1157/ORA-1110 Trying To Open The Database
Note 145194.1 ORA-1157 ORA-1110 ORA-27086 Starting up Database
Note 444151.1 ORA-01157 on Database Startup After Dropping an Alias
Note 301635.1 ORA-01157, ORA-01110, ORA-27046 Starting A Restored Database
Note 429912.1 ORA-01157 ORA-01110 ORA-27086 after crash prevents database from opening
Note 256835.1 Database Startup Fails With ORA-1110, ORA-1157, ORA-27092 Trying Startup From 'at' or 'cron' on HP-UX
Other Issues

Note 1113864.1 MBIND: Cannot Allocate Memory On Startup
Note 578536.1 MBIND: CANNOT ALLOCATE MEMORY ON STARTUP
Note 6795133.8 Bug 6795133 - Startup delayed by QMNC queries
Note 301072.1 Dbstart Fails With Ora-01031 When Called From User Root
Note 1286665.1 ORA-00371: Not Enough Shared Pool Memory signalled on Startup
Note 779356.1 Database not starting up with errors ORA-01092 ORA-24324 ORA-01041
Note 1176443.1 ORA-4031 During Startup Nomount using RMAN without parameter file (PFILE)
Note 839789.1 ORA-12853 / ORA-4031 or ORA-4030 on Instance Startup With increased SGA size
Common Errors / Issues During Database Shutdown

The most common issue observed while bringing down the database is shutdown immediate hang. The main reasons for Shutdown immediate hang is:
- processes still continue to be connected to the database and do not terminate.
- SMON is cleaning temp segments or performing delayed block cleanouts.
- Uncommitted transactions are being rolled back.

The below section provides the consolidated list of known issues during shutdown. The documents mentioned in the below section can be specific to platform or database versions.

Shutdown Slow / Hang

Note 1197314.1 Shutdown Normal Hung On ORA_J00# Process
Note 309230.1 Database Doesn't Shutdown Immediate During Server Boot
Note 1039389.6 Alert Log: Shutdown Waiting for Active Calls to Complete
Note 305666.1 Shutdown is Cancelled With ORA-1013 After Waiting for an Hour
Note 1194229.1 Database shutdown immediate Hangs: Startup can also hang
Note 1183213.1 Shutdown Normal or Immediate Hang Waiting for MMON process
Note 428688.1 Bug 5057695: Shutdown Immediate Very Slow To Close Database
Note 416658.1 Shutdown Immediate Hangs / Active Processes Prevent Shutdown
Note 437876.1 Database Does Not Shutdown Cleanly When Oracle Service Is Restarted
Note 304414.1 Shutdown hangs in 9i with: SHUTDOWN: waiting for logins to complete
Note 332177.1 Database Shutdown Immediate Takes Forever, Can Only Do Shutdown Abort
Note 13440516.8 Bug 13440516 - Index skip scan cannot be interrupted - can block shutdown
Note 12879056.8 Bug 12879056 - Index skip scan cannot be interrupted - can block shutdown
ORA-1031 Errors

Note 309059.1 Oradim Command Fails to Shutdown Database(s) with ORA-01031 under 9.2.0.6 
Note 846679.1 Ora-1031 Error Stopping Database Or Permission Denied Error Running Lsnrctl
Transaction Recovery

Note 375935.1 What To Do and Not To Do When 'shutdown immediate' Hangs
Note 1076161.6 Shutdown Normal or Shutdown Immediate Hangs. SMON disabling TX Recovery
Note 414242.1 Database Hangs Because SMON Is Taking 100% CPU Doing Transaction Recovery
Note 100054.1 Transaction Rollback after a failed operation or during Database Shutdown
ORA-600 / ORA-7445 Errors

Note 604067.1 Ora-600[3708] On Database Shutdown
Note 455181.1 ORA-00600[17302] During Shutdown Immediate
Note 470362.1 ORA-07445 With kpogup At Database Shutdown
Note 1135453.1 Database Hung On Shutdown After ORA-600 [KGHFRE3] Error
Note 359563.1 ORA-00600: Internal Error Code, Arguments: [17302], [2] During Shutdown
Note 435926.1 Shutdown Database Erroring ORA-600 [Librarycachenotemptyonclose], []
Note 8519322.8 Bug 8519322 - ORA-600 [17148] / ORA-600 [730] on database shutdown
Note 1326908.1 Ora-00600: [3708], ORA-600 [2103] When Database shutdown on IBM:Linux on System Z
ORA-24324 Errors

Note 794293.1 ORA-24324 During Shutdown 
Note 1168554.1 Ora-24324 And Ora-1041 Errors Trying To Startup Or Shutdown The Database
DISM

Note 1001248.1 On Solaris 9 Systems, Oracle Shutdown May Hang If Utilizing Dynamic Initmate Shared Memory (DISM) 

Memory Related Errors

Note 1319253.1 "ERROR: SGA memory leak detected" message in alert.log on database shutdown

Other Issues

Note 1017085.102 ORA-01122, ORA-01210, ORA-01110: On Database Shutdown
Note 429603.1 ORA-29702 During Automatic Shutdown of Database using ASM
Note 1022414.6 ORA-01033 DATABASE INITIALIZATION OR SHUTDOWN IN PROGRESS
Note 784754.1 11g - Receiving Ora-27167 Error On Database Startup or Shutdown 
Note 419651.1 Event 10621 and Event 10626/10629 Causes Shutdown Immediate to Hang
Note 118228.1 ALERT: Hang During Startup/Shutdown on Unix When System Uptime > 248 Days 
Note 343031.1 How to deal with an ORA-01033 'Oracle startup or shutdown in progress' error
Note 18302.1 OERR: ORA 106 cannot startup/shutdown database when connected to a dispatcher
Note 763932.1 Shutdown Error In EM: Execution Failed Due To Binary Missing Or Permission Issues
Note 10194190.8 Bug 10194190 - Solaris: Process spin / ASM and DB crash if RAC instance up for > 248 days
Note 1001248.1 On Solaris 9 Systems, Oracle Shutdown May Hang If Utilizing Dynamic Initmate Shared Memory (DISM)
Note 760968.1 Database Startup, Shutdown Or New Connections Hang With Truss Showing OS Failing Semtimedop Call With Err#11 EAGAIN

Issues specific to automatic shutdown and startup

This is specific to the automatic shutdown and startup that can be configured with the dbora / dbshut / dbstart scripts.

Automatic Startup Failure

The key to diagnosing automatic startup failures is to determine where startup fails. This can be done via the following steps:

Determine if instance starts manually as Oracle software owner.
Determine if instance starts via dbstart command run as Oracle software owner.
Determine if instance starts when root runs following dbstart command:
su - $ORA_OWNER -c $ORA_HOME/bin/dbstart

where $ORA_OWNER is set to Oracle software owner.
Determine if instance starts when running as root the OS script which calls dbstart, ie "/etc/init.d/dbora start". NOTE: Running via sh -x command will show each command as it is run from script to better see what is going on.
#> sh -x /etc/init.d/dbora start
Automatic Shutdown Failure

As with automatic startup, the key to diagnosing automatic shutdown failures is to determine where shutdown fails. This can be done via following steps:

Determine if instance stops manually as Oracle software owner.
Determine if instance stops via dbshut command run as Oracle software owner.
Determine if instance stops when root runs command
su - $ORA_OWNER -c $ORA_HOME/bin/dbshut
where $ORA_OWNER is set to Oracle software owner.
Determine if instance stops when running as root the OS script which calls dbshut, ie "/etc/init.d/dbora stop". NOTE: Running via sh -x command will show each command as it is run from script to better see what is going on.
#> sh -x /etc/init.d/dbora stop

【Oracle数据恢复】ORA-600 [kcbgcur_9]错误解析

ORA-600 [kcbgcur_9]错误的触发原理是 当 buffer被以某种类型顺序pin住以便防止内部死锁时,发现当前buffer被以错误的类型顺序pinned或者同一class中有重复的buffer;  也可能在更新只读表空间时与ORA-372错误一起发生。
该ORA-600 [kcbgcur_9]错误隶属于内核 buffer cache模块

这个ORA-600 [kcbgcur_9]错误一般有4个argument:

 

Arg [a] The Database Block Address (DBA)
Arg [b] The buffer class
Arg [c] Block class order mask
Arg [d] Block class checking mask

ORA-600 [kcbgcur_9]错误可能的影响是造成进程失败,但一般不会造成数据块的损坏

 

ORACLE SUPPORT对遇到该问题的主要建议是 检查所有的系统硬件 磁盘和内存, 以便保证不是底层系统造成了该问题。

 

一些可能造成ORA-600 [kcbgcur_9]的BUG:

 

 

NB Bug Fixed Description
10061015 11.2.0.2.3, 11.2.0.2.BP08, 11.2.0.3, 12.1.0.1 ORA-600 [ktfbhget:clsviol_kcbgcur_9] can occur on transported in tablespace
9407198 11.2.0.3, 12.1.0.1 “LOG ERRORS INTO” can cause ORA-600 [kcb***] or hang scenarios
9133601 12.1.0.1 ORA-600 [kcbgcur_9] from MERGE into an IOT
8459467 10.2.0.5, 11.2.0.1 ORA-600 [ktfbhget:clsviol_kcbgcur_9] attempting to ONLINE a tablespace
7644703 10.2.0.5, 11.2.0.1 OERI[kcbgcur_9] on alter tablespace online
6918892 11.2.0.1 ORA-600 [kcbgcur_9] from DML on sys.aud$
4917666 10.2.0.3, 11.1.0.6 OERI[kcbgcur_9] from type constructor
8778379 10.2.0.5 Fix event 10227 in 10.2 ORA-600[kcbgcur_3] or ORA-600[kcbgcur_9]
3655948 10.1.0.5, 10.2.0.1 OERI:kcbgcur_9 during disconnect with distributed transaction
3127116 9.2.0.5, 10.1.0.2 OERI:kcbgcur_9 during rollback of DML in ASSM segment
2771042 9.2.0.6, 10.1.0.2 OERI[kcbgcur_9] when performing DML on an IOT with an FGA policy
2747873 9.2.0.8, 10.1.0.2 DBMS_SPACE.UNUSED_SPACE may fail with OERI:kcbgcur_9
2722809 9.2.0.4, 10.1.0.2 OERI:kcbgcur_9 on direct load into AUTO space managed segment
2716406 10.1.0.2 OERI[kcbgcur_9] executing DBMS_SPACE.FREE_BLOCKS procedure on bitmapped segment
2619895 9.2.0.8, 10.1.0.2 Dump / OERI:kcbgcur_9 using PLSQL in SET clause of MERGE .. UPDATE
2392885 9.2.0.4, 10.1.0.2 Direct path load may fail with OERI:kcbgcur_9 / OERI:ktfduedel2
2311214 9.0.1.4, 9.2.0.2, 10.1.0.2 OERI:KCBGCUR_9 possible for SQL against IOT
1785175 9.2.0.2, 10.1.0.2 OERI:kcbgcur_9 from CLOB TO CHAR or BLOB TO RAW conversion
4124059 9.2.0.8 OERI[kcbget_31] / similar KCB errors in single instance
2202310 9.0.1.4, 9.2.0.1 OERI:KCBGCUR_9 possible from SMON dropping a rollback segment in locally managed tablespace
2035267 9.0.1.3, 9.2.0.1 OERI:KCBGCUR_9 possible during TEMP space operations
1804676 8.1.7.3, 9.0.1.3, 9.2.0.1 OERI:KCBGCUR_9 possible from ONLINE REBUILD INDEX with concurrent DML
1397075 8.1.7.2, 9.0.1.0 OERI:KCBGCUR_9 from SMON during temp seg cleanup for segment in read only LOCALLY MANAGED TABLESPACE

 

如果自己搞不定可以找ASKMACLEAN专业数据库修复团队成员帮您恢复!

 

 

【Oracle数据恢复】ORA-00704 bootstrap process failure错误解析

【Oracle恢复】ORA-00704 bootstrap process failure错误解析,本质上这个发生在open database 过程中的错误 其主要原因是在处理自举 bootstrap信息时发现了数据存在的问题,有多重情况均可能导致该错误发生。

 

 

 

当数据库被强制FORCE REESTLOGS 打开时可能遇到如下的错误信息:

 

 

ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 11 with name "_SYSSMU11$" too small
Tue Jan 17 04:46:17 2012
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704
Instance terminated by USER, pid = 5496
ORA-1092 signalled during: ALTER DATABASE OPEN RESETLOGS...

[oracle@mlab2 ~]$ oerr ora 704
00704, 00000, "bootstrap process failure"
// *Cause:  Failure in processing bootstrap data - see accompanying error.
// *Action: Contact your customer support representative.

 

 

如果对于ORA-1555没有TRACE产生的话,可以以如下方法生成TRACE:

startup mount;

alter system set events '1555 trace name errorstack level 3';

alter database open resetlogs;

 

 

到user dump目录下找到对应的TRACE文件和ALERT.LOG, 假设alert.log中出现了这样的信息:

ORA-01555: snapshot too old: rollback segment number 11 with name “_SYSSMU11$” too small

则说明此场景中引起 ORA-1555的 USN undo segment number是11, 11的16进制是 b

 

之后我们阅读之前生成的TRACE,主要的工作是 寻找 表或索引的数据块的 DUMP并发现其中ITL 带有undo segment number 为11的记录,如下面的例子:

 

BH (0xacff8e48) file#: 1 rdba: 0x0040003e (1/62) class: 1 ba: 0xacf66000
set: 3 blksize: 8192 bsi: 0 set-flg: 2 pwbcnt: 0
dbwrid: 0 obj: 18 objn: 18 tsn: 0 afn: 1
hash: [d0e04c98,d0e04c98] lru: [acff8fd8,acff8db8]
ckptq: [NULL] fileq: [NULL] objq: [cc9a3d00,cc9a3d00]
use: [ce699910,ce699910] wait: [NULL]
st: CR md: EXCL tch: 0
cr: [scn: 0x0.4124e1e],[xid: 0x6.0.c28d],[uba: 0x820075.ea1.23],[cls: 0x0.46b5261],[sfl: 0x1]
flags:
Using State Objects
----------------------------------------
SO: 0xce6998d0, type: 24, owner: 0xd04439e8, flag: INIT/-/-/0x00
(buffer) (CR) PR: 0xd02fa378 FLG: 0x500000
class bit: (nil)
kcbbfbp: [BH: 0xacff8e48, LINK: 0xce699910]
where: kdswh02: kdsgrp, why: 0
buffer tsn: 0 rdba: 0x0040003e (1/62)
scn: 0x0000.046b527a seq: 0x00 flg: 0x00 tail: 0x527a0600
frmt: 0x02 chkval: 0x0000 type: 0x06=trans data
Hex dump of block: st=0, typ_found=1
Dump of memory from 0x00000000ACF66000 to 0x00000000ACF68000
0ACF67FF0 FFFF02C1 01FF8001 02C10280 527A0600 [..............zR]
...
Block header dump: 0x0040003e
Object id on Block? Y
seg/obj: 0x12 csc: 0x00.46b519e itc: 1 flg: - typ: 1 - DATA
fsl: 0 fnx: 0x0 ver: 0x01
Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x000b.00b.00000e7a 0x00802042.00db.1a C--- 0 scn 0x0000.04624228

 

这里可以看到第一条 ITL 0x01 , 其中<undo no>.<slot>.<wrap>  对应为 0x000b.00b.00000e7a ,则USN 为 0x000b 即 十进制的11

而这个数据块属于0x0040003e –(1/62), 其FILE_ID=1 BLOCK_ID=62的数据块

这个块 实际是实例中已经读到buffer cache中的buffer ,找到buffer header BH中的SCN 信息 即为0x0000.046b527a

之后可以通过_mimimum_giga_scn 等方法调整SCN,或者通过修改ITL来绕过上述问题成功打开数据库。

 

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

 

 

【Oracle数据恢复】ORA-01578错误解析

一般认为ORA-1578是 数据块物理坏块/损坏的一种,不同于逻辑损坏/坏块,其错误信息如下:

 

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

 

[oracle@oel8 dirdat]$ oerr ora 1578
01578, 00000, "ORACLE data block corrupted (file # %s, block # %s)"
// *Cause:  The data block indicated was corrupted, mostly due to software
//          errors.
// *Action: Try to restore the segment containing the block indicated. This
//          may involve dropping the segment and recreating it. If there
//          is a trace file, report the errors in it to your ORACLE
//          representative

这个1578一般 会伴随ORA-1110出现,一旦ORACLE读取到存在损坏的块就会报出:

SQL> select * from scott.emp;
select * from scott.emp
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 11, block # 34) 这里的FILE#是  RFN
ORA-01110: data file 6:'/home/oracle/corrclass.dbf'   这里的是AFN

 

 

断裂的数据块Fractured Block:

 

Corrupt block relative dba: 0x0380e573 (file 14, block 58739)
Fractured block found during buffer read
Data in bad block -
 type: 6 format: 2 rdba: 0x0380e573
 last change scn: 0x0288.8e5a2f78 seq: 0x1 flg: 0x04
 consistency value in tail: 0x00780601
 check value in block header: 0x8739, computed block checksum: 0x2f00
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
Reread of rdba: 0x0380e573 (file 14, block 58739) found same corrupted data

或者错误的校验和checksum

 

Corrupt block relative dba: 0x0380a58f (file 14, block 42383)
Bad check value found during buffer read
Data in bad block -
 type: 6 format: 2 rdba: 0x0380a58f
 last change scn: 0x0288.7784c5ee seq: 0x1 flg: 0x06
 consistency value in tail: 0xc5ee0601
 check value in block header: 0x68a7, computed block checksum: 0x2f00
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
Reread of rdba: 0x0380a58f (file 14, block 42383) found same corrupted data

 

或者错误的块头

 

Corrupt block relative dba: 0x0d805a89 (file 54, block 23177)
Bad header found during buffer read
Data in bad block -
 type: 6 format: 2 rdba: 0x0d805b08
 last change scn: 0x0692.86dc08e3 seq: 0x1 flg: 0x04
 consistency value in tail: 0x08e30601
 check value in block header: 0x2a6e, computed block checksum: 0x0
 spare1: 0x0, spare2: 0x0, spare3: 0x0
***
Reread of rdba: 0x0d805a89 (file 54, block 23177) found valid data

 

一些可能引起ORA-1578的BUG 列表如下:

如果自己搞不定可以找ASKMACLEAN专业数据库修复团队成员帮您恢复!

 

NB Bug Fixed Description
13804294 11.2.0.3.4, 11.2.0.3.BP07, 12.1.0.0 Internal errors, corruptions, using pipelined function whose rows raise exceptions
11707302 11.2.0.2.3, 11.2.0.2.BP06, 11.2.0.3, 12.1.0.0 Corruption from ASM crash during rebalance diskgroup. Misplaced Blocks
11659016 11.2.0.3, 12.1.0.0 ORA-1578 against recently create tablespace that once was encrypted
+ 10209232 11.1.0.7.7, 11.2.0.1.BP08, 11.2.0.2.1, 11.2.0.2.BP02, 11.2.0.2.GIBUNDLE01, 11.2.0.3, 12.1.0.0 ORA-1578 / ORA-600 [3020] Corruption. Misplaced Blocks and Lost Write in ASM
* 10205230 11.2.0.1.6, 11.2.0.1.BP09, 11.2.0.2.2, 11.2.0.2.BP04, 11.2.0.3, 12.1.0.0 ORA-600 / corruption possible during shutdown in RAC
9965568 12.1.0.0 Block dumps are not formatted if there is a chkval error
9965085 11.2.0.3, 12.1.0.0 ORA-1578 / ORA-8103 Temporary table block corruption / space wastage from PDML
9739664 11.2.0.2, 12.1.0.0 ORA-1578 / ORA-26040 MANUAL RECOVER marks block as corrupt NOLOGGING in even if LOGGING is enabled
+ 9724970 11.2.0.1.BP08, 11.2.0.2.2, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.0 Block Corruption with PDML UPDATE. ORA_600 [4511] OERI[kdblkcheckerror] by block check
9407198 11.2.0.3, 12.1.0.0 “LOG ERRORS INTO” can cause ORA-600 [kcb***] or hang scenarios
* 9406607 11.2.0.1.3, 11.2.0.1.BP06, 11.2.0.2, 12.1.0.0 Corrupt blocks in 11.2 in table with unique key. OERI[kdBlkCheckError] by block check
* 8943287 11.2.0.2, 12.1.0.0 ORA-1578 corrupt block with AUTH SQL*Net strings
* 8898852 11.1.0.7.2, 11.2.0.1.1, 11.2.0.1.BP04, 11.2.0.2, 12.1.0.0 ORA-1578 Blocks misplaced in ASM when file created with compatible.asm < 11 and resized
8885304 11.2.0.2, 12.1.0.0 ORA-7445 [ktu_format_nr] during RMAN CONVERT or Corrupt fractured block of UNDO tablespace datafile
* 8768374 10.2.0.5, 11.1.0.7.8, 11.2.0.1.BP11, 11.2.0.2, 12.1.0.0 RFS in Standby with a wrong location for archived log corrupting/overwriting database files when max_connections > 1
8760225 11.2.0.2, 12.1.0.0 Auto Block Media Recovery reports ORA-1578 on first query
8731617 11.2.0.3, 12.1.0.0 ORA-1578 from DESCRIBE or CTAS even if table not accessed / ORA-959 from DBMS_STATS
8720802 10.2.0.5, 11.2.0.1.BP07, 11.2.0.2, 12.1.0.0 Add check for row piece pointing to itself (db_block_checking,dbv,rman,analyze)
8493978 11.2.0.2, 12.1.0.0 Reserve file descriptors for datafile access
P 12330911 12.1 EXADATA LSI firmware for lost writes
10025963 11.2.0.1.BP09, 11.2.0.2 Block corruption of LOB blocks with checksum value but block has checksum disabled
8714541 11.2.0.2 ORA-1578 Corrupt Block in ASM with 0xbadfda7a after ASM block repair due to disk read error when ASM mirror is used
13101288 ORA-600, corruption or check errors dropping a column in a OLTP compressed table
+ 8354682 11.2.0.1 ORA-1578 – Blocks can be misplaced in ASM when there is IO error and AU > 1MB
+ 8339404 10.2.0.5, 11.1.0.7.1, 11.2.0.1 ORA-1578 – Blocks can be misplaced in ASM during a REBALANCE
8227257 11.2.0.1 ORA-1578 corruption found after media recovery on encrypted datafile
7396077 10.2.0.5, 11.2.0.1 RMAN does not differentiate NOLOGGING corrupt blocks that produce ORA-1578/ORA-26040
6471351 10.2.0.5, 11.1.0.7, 11.2.0.1 ORA-1578 / ORA-26040 due to NOLOGGING after recovery despite of FORCE LOGGING
6674196 10.2.0.4, 10.2.0.5, 11.1.0.6 OERI / buffer cache corruption using ASM, OCFS or any ksfd client like ODM
5515492 10.2.0.3, 11.1.0.6 ORA-1578 corruption with Block Misplaced during ASM rebalance after IO error
5031712 10.2.0.4, 11.1.0.6 DBV enhanced to report NOLOGGING corrupt blocks with DBV-201 instead of DBV-200
+ 4724358 11.1.0.6 ORA-27045 ORA-1578 ORA-27047 corruption caused by DBMS_LDAP
4684074 10.2.0.2, 11.1.0.6 OERI:510 / block corruption (ORA-1578) with DB_BLOCK_CHECKING
4655520 9.2.0.8, 10.2.0.3, 11.1.0.6 Block corrupted during write not noticed
4411228 9.2.0.8, 10.2.0.3, 11.1.0.6 ORA-1578 Block misplaced with mixture of file system and RAW files
4344935 10.2.0.4, 11.1.0.6 OERI from DML on TEMPORARY TABLE after autonomous TRUNCATE
7381632 11.1.0.6 ORA-1578 Free corrupt blocks may not be reformatted when Flashback is enabled
8976928 10.2.0.5 ORA-1578 caused by a former free corrupt block and remains unformatted
8684999 10.2.0.5 ORA-1578 caused by a former free corrupt block and remains unformatted
+ 3544995 9.2.0.6, 10.1.0.3, 10.2.0.1 LOB segments with “CACHE READS” generate no REDO even with the logging option
+ 1281962 9.2.0.1 Media recovery after ORA-1578 on rollback can cause logical inconsistency
589855 7.3.3.6, 7.3.4.1 ORA:1578 or ORA:8103 selecting invalid ROWID
406863 7.3.3.4, 7.3.4.0, 8.0.3.0 ORA-1578 using PQ with heavy simultaneous INSERTS
P 707304 7.3.4.4 AIX: Resizing RAW datafile can corrupt a DB block
603502 7.3.4.3, 8.0.4.4, 8.0.5.0 Possible Corruption if a session with LOOPBACK DB Links aborts.

ORA-15196 Oracle ASM CASE STUDY: UNDERSTANDING ERROR ORA-15196

 

If you cannot recover data by yourself, ask Parnassusdata, the professional ORACLE database recovery team for help.

Parnassusdata Software Database Recovery Team

Service Hotline:  +86 13764045638 E-mail: service@parnassusdata.com

 

This document provides an explanation of error ORA-15196, including the details of each argument, suggestions for the diagnostic of the error and finally includes a case study using a real problem reported by a customer.

 

Error Description

 

ORA-15196 is reported after a validation of an ASM metadata block has failed. The error will be reported in the following format:

ORA-15196: invalid ASM block header [1st] [2nd] [3rd] [4th] [5th != 6th]

 

Where the arguments indicate:

Argument                   Meaning

  • 1st                           Function and line number in the code, where the exception is raised 2nd        Field failing the validation
  • 3rd                         ASM object number stored in the block
  • 4th                         ASM block number stored in the block
  • 5th                         Value associated with  field referenced by argument 2 6th    Expected value for field referenced by argument 2

 

 

Example:

 

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

 

Function and line number in the code, where the exception is raised = kfc.c:7997

Field failing the validation = endian_kfbh ASM object number stored in the block = 1 ASM block number stored in the block = 93

Value associated  with field referenced by argument #2 = 211

Expected value for field referenced by argument #2 = 0

 

Arguments description

 

  • Function and line number in the code, where the exception is raised

 

In general terms it is valid to say this argument will be the same in most of the possible cases, because is always the same routine where this exception is raised.

 

#define kfbValid(data, len, type, bl) \

kfbValidPriv(data, len, type, bl,     FILE    ,     LINE   ).

 

  • Field failing the validation

 

The ASM metadata is composed by many different structures like file directory, disk directory, active change directory (ACDC), etc, which are organized by files (asm file# between 1 and 255). Each file will be made of extents, which will be made of ASM block (4096 bytes). Each block has a generic block header (kfbh), and any of those fields can be validated.

 

kfbh.endian:                           0 ; 0x000: 0x00
kfbh.hard:                           130 ; 0x001: 0x82
kfbh.type:                             4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                           1 ; 0x003: 0x01
kfbh.block.blk:                       80 ; 0x004: T=0 NUMB=0x50
kfbh.block.obj:                        1 ; 0x008: TYPE=0x0 NUMB=0x1
kfbh.check:                   4268948098 ; 0x00c: 0xfe72fa82
kfbh.fcn.base:                         0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                         0 ; 0x014: 0x00000000
kfbh.spare1:                           0 ; 0x018: 0x00000000
kfbh.spare2:                           0 ; 0x01c: 0x00000000

 

 

A short description of each of the fields referenced above (file kf3.h):

 

kfbh.endian                  endianness  of  writer          big or little endian

kfbh.hard                      H.A.R.D. magic # and block size

kfbh.type                       metadata block type   (type of ASM metadata)

kfbh.datfmt                  metadata block data format

kfbh.block.blk              block location of this block

kfbh.block.obj              check value to verify consistency

kfbh.check                     change number of last change

kfbh.spare1                    zero pad out to 32bytes

kfbh.spare2                    zero pad out to 32 bytes

A list of the fields reported by this error through different SR is:

 

 

endian_kfbh
obj_kfbl hard_kfbh
type_kfbh
datfmt_kfbh
check_kfbh

 

  • ASM object number stored in the block

 

Every ASM metadata block belongs to a specific file associated with a specific ASM structure. That’s why ASM File numbers between 1 and 255 are used to identify the files storing those structures. The value on this field, references the ASM file number.

ASM File Number     ASM Metadata

1                                                 File Directory

2                                                Disk Directory

3                                                Active Change Directory (ACD)

4                                                Continous Operations Directory (COD)

5                                                Template Directory

6                                                Alias Directory

9                                                Attributes Directory

12                                              Staleness Directory

 

For other ASM metadata structures like PST, ATB, DISK HEADER, this field will have a static value 2147483648 (0x80000000)

 

  • ASM block number stored in the block

 

An ASM file will allocate extents, which are associated with Allocation Units. Multiple ASM metadata blocks of 4096 bytes make the extent, considering the default Allocation Unit size of 1MB; there are 256 blocks on each extent/AU.

 

The value stored on this field indicates the block number relative to a particular file. In this example, (93) is the block number, which will be stored in the first extent of the file. That extent will be allocated on a specific Allocation Unit of any of the disks in the diskgroup.

 

  • Value associated with field referenced by argument #2

 

This is the value found in the block for the field referenced in argument #2.

 

  • Expected value for field referenced by argument 2

 

This is the expected value for the block referenced by argument # 2.

 

 

Having the description of all the arguments for error ORA-15196, It should be possible to have a better understanding of the message:

 

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

 

In the previous example, the field failing the validations is endian_kfbh, belong to file 1 (FILE DIRECTORY); it was also relative block 93, and the value for endian_kfbh was  211 while the correct value should have been 0.

 

Diagnostics

 

Up to 10gR2, there are some bugs (patch included) related to this error.

 

5554692 Related  to  indirect  extent  allocation.      Please  read  the  bug descriptionin webiv, because not all cases of ORA-15196 are this particular bug.
6027802 This was closed as not a bug, but was related to some IO issues caused by EMC Powerpath. Same type of data mismatch has been observed on other PP installations
6453944 ORA-15196 with ASM disks larger than 2TB using ASMLIB

 

 

The major number of issues of this error is associated with data changed outside of ASM.  This include:

 

  • Disks formatted at the OS level while it was used by ASM
  • Disks assigned to a file system while used by ASM
  • IO errors (stale writes)
  • Usage of 3rdparty software

 

Once this error is reported, the diskgroup needs to be recreated. There are situations where diskgroup cannot be mounted, or others where any reference to the metadata (recursive or non recursive), will signal the error and dismount the diskgroup.

 

Data Collection

 

In order to understand the extension of the problem and produce a correct diagnostic, it is  essential to obtain the following data:

 

  1. Alert.log and trace file associated to the error
  2. First 300MB of the disk affected with the error

 

In the alert.log, review the line before the report of error ORA-15196:

 

WARNING: cache failed to read fn=1 blk=80 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

 

In the line prior the report of error ORA-15196, it indicates the disk storing the block:  from disk(s): 0.

 

To get the first 300MB:

 

$dd if=<device path> of=/tmp/disk.dd bs= 1048576 count=300

 

It may be necessary to provide partial copy of other disks in the diskgroup.

 

  1. Output from AMDU if available

 

AMDU will be explained with more detail in a different note (TBD).

 

This tool is part of the New Features introduced with 11g. It reads the ASM disks and extract information into different files. Those files have a mapping of the  ASM metadata, an image with the content of the disks or it is possible to extract files from the diskgroup.

 

AMDU can extract the information even if the diskgroup is dismounted.

 

The mapping file is very important for the diagnostic of error ORA-15196. It has the specific location for each of the extents of each ASM metadata file.

 

Note 553639.1 is the placeholder for the AMDU binaries for some of the platforms.

 

 

Data Review

 

  1. Always review other blocks in the boundaries of the affected block. If more than one block has incorrect data (zeros), and they belong to different ASM structures (file directory, disk directory, etc), it is most likely was caused outside of ASM: disk reformatted, assigned to another volume manager, etc.

 

Use kfed to extract the content of the blocks.

 

  1. Reviewing the trace file generated by the error.

 

The trace file always will print a dump of the ASM metadata block in memory,  and also a short call stack. The output of the block is the same generated by kfed, which is a readable by the user.

 

*** SERVICE NAME:() 2008-01-23 11:57:23.892

*** SESSION ID:(39.74) 2008-01-23 11:57:23.892

OSM metadata block dump:

kfbh.endian:                           0 ; 0x000: 0x00 kfbh.hard:  130 ; 0x001: 0x82

kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                      80 ; 0x004: T=0 NUMB=0x50

kfbh.block.obj:                       1 ; 0x008: TYPE=0x0 NUMB=0x1 kfbh.check:                                     4268948098 ; 0x00c: 0xfe72fa82 kfbh.fcn.base:           0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

 

/*  data remove on purpose */

 

After the OSM metadata block dump, the short call stack is printed:

 

—– Abridged Call Stack Trace —–

 

kfcReadBlk()+1276 kfcLoad()+2148 kffbScanNext()+252 kffbTableCb()+700 kfgTableCb()+1252 kffilTableCb()+240 qerfxFetch()+896 qersoFetch()+720 qerjotFetch()+184 opifch2()+8092 kpoal8()+4196 opiodr()+1548 ttcpip()+1284 opitsk()+1432 opiino()+1128 opiodr()+1548 opidrv()+896 sou2o()+80 opimai_real()+124 main()+152

 

  1. Compare the data in the trace file with the data extracted from disk using kfed.

 

Comparing the block dumped in the trace file and the block in disk, it is possible to identify the exact cause of the check validation failure. Every case will be different, but if the data stored in disk is zeros, always remember to validate other blocks (adjacent). If more blocks are reporting invalid data (zeros), this is an indication the disk has been formatted outside ASM.

 

Example 1:

 

This  is  an  example  of  a  block  with   invalid  data.      The  type  of  the  block is KFBTYP_INVALID, generated when a incorrect type is stored.

 

kfbh.endian:                           0 ; 0x000: 0x00

kfbh.hard:                            34 ; 0x001: 0x22

kfbh.type:                            0 ; 0x002: KFBTYP_INVALID

kfbh.datfmt:                          0 ; 0x003: 0x00

kfbh.block.blk:              4290772992 ; 0x004: T=1 NUMB=0x7fc00000

kfbh.block.obj:                     0 ; 0x008: TYPE=0x0 NUMB=0x0

kfbh.check:                           0 ; 0x00c: 0x00000000

kfbh.fcn.base:                    13879 ; 0x010: 0x00003637

kfbh.fcn.wrap:                      512 ; 0x014: 0x00000200

kfbh.spare1:                      978943 ; 0x018: 0x000eefff

kfbh.spare2:                      2054913149 ; 0x01c: 0x7a7b7c7d

 

 

 

 

 

Example 2:

 

The full content of the block has 0xd4.

 

disk:0 au:2 block:253 file:1 physical extent:0 block:253
kfed	read	ausz=1048576	blksz=4096	aunum=2	blknum=253 dev=/dev/rdsk/c2t50060E8000C41384d2s6

kfbh.endian:	212 ; 0x000: 0xd4
kfbh.hard:	212 ; 0x001: 0xd4
kfbh.type:	212 ; 0x002: *** Unknown Enum ***
kfbh.datfmt:	212 ; 0x003: 0xd4
kfbh.block.blk:	3570717908 ; 0x004: T=1 NUMB=0x54d4d4d4 
kfbh.block.obj:	3570717908 ; 0x008: TYPE=0xd NUMB=0x4d4d4 
kfbh.check:	3570717908 ; 0x00c: 0xd4d4d4d4
kfbh.fcn.base:	3570717908 ; 0x010: 0xd4d4d4d4 
kfbh.fcn.wrap:	3570717908 ; 0x014: 0xd4d4d4d4 
kfbh.spare1:	3570717908 ; 0x018: 0xd4d4d4d4 
kfbh.spare2:	3570717908 ; 0x01c: 0xd4d4d4d4 
kfbtTraverseBlock: Invalid OSM block type 212
0000: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0020: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0040: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0060: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 0080: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 00a0: d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4 d4d4d4d4
 

 

 

CASE STUDY

 

 

The diskgroup was not used for some months, used by a copy of a database. Due to business reasons, that database required to be used. Mounting the diskgroup was possible, but when the database was mounted, and reading the ASM metadata was required, error ORA-15196 was signaled and diskgroup dismounted.

 

The diskgroup was configured using external redundancy with a single disk and using the default Allocation Unit size of 1MB.

 

Data Collected

 

  1. The messages in the alert.log:

 

WARNING: cache failed to read fn=1 blk=256 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]

 

 

  1. The ASM block dumped in the trace file.

 

 

*** SESSION ID:(108.5) 2008-02-06 10:05:31.054

OSM metadata block dump:

kfbh.endian:                           0 ; 0x000: 0x00 kfbh.hard:  130 ; 0x001: 0x82

kfbh.type:                    7 ; 0x002: KFBTYP_ACDC

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                   10752 ; 0x004: T=0 NUMB=0x2a00

kfbh.block.obj:               3 ; 0x008: TYPE=0x0 NUMB=0x3

kfbh.check:                                     1103194877 ; 0x00c: 0x41c16afd

kfbh.fcn.base:                                               0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

 

  1. AMDU together with 300MB for the disk were collected.

 

Data Review

 

  1. The error:

 

WARNING: cache failed to read fn=1 blk=256 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]

 

The error provides the following information:

 

o    The field failing the validation is obj_kfbl

o    The block belongs to file 1 (fn=1).  File 1 is the File Directory.

o    The block is block 256 (blk=256)

o    The value for obj_kfbl found was 3 but the expected value should be 1.

 

 

File extents, allocation units, blocks in ASM start at 0. Also, block size is 4096. Using the default AU size (1MB), there are 256 blocks. Block 256 is stored in the second extent.

 

 

Although the diskgroup was mounted, any query referencing x$kffxp trying to get the extent mapping for file 1 failed. As a result, it was not possible to identify the AU used by block 256 from file 1 (the affected block).

 

  1. Using AMDU

 

One of the files generated by AMDU is the mapping file (*.map) . That file contains the location on disk for every extent of the files stored in the diskgroup. The only record for file 1 was:

 

N0001 D0000 R00 A00000002 F00000001 I0 E00000000 U00 C00256 S0001 B0002097152

 

This line indicates that for File 1 (F00000001)), the first extent is stored  in  Allocation Unit 2 ( A00000002  ) from disk 0 ( D0000    )   .

 

t was not  another entry for file 1 in the mapping file, but AMDU was generating  a core dump.  It was discovered AMDU was trying to read Allocation Unit 50.

 

One of the cool things of AMDU, is the possibility of dumping the content of a complete extent for a particular file, redirecting the output into a text file.

 

$amdu –diskstring ‘<path of device>’ –dump ‘<diskgroup name> -print ‘DG.F1.X1.B0.C256’

 

The previous command will dump 256 blocks of File 1 Extent 1 starting at block  0.

 

The results of the last command were:

 

************************** PRINTING XYZ.F1.X1.B0.C2 **************************

 

——————————– BLOCK 1 OF 2 ——————————–

…………………………………………………………………

disk:0 au:50 block:0 file:1 physical extent:1 block:0

kfed          read          ausz=1048576          blksz=4096          aunum=50          blknum=0 dev=/emea/bde/home/users/jfiguer2/disk.dd

 

At this point the conclusions were:

 

  • The ASM metadata shows that Allocation Unit 50 from disk 0 belongs to File 1.

 

——————————– BLOCK 1 OF 2 ——————————–

…………………………………………………………………

disk:0 au:50 block:0 file:1 physical extent:1 block:0

kfed       read       ausz=1048576        blksz=4096       aunum=50       blknum=0 dev=/emea/bde/home/users/jfiguer2/disk.dd

 

  • If the block belongs to file 1, the value for kfbh.block.obj field should have been 1 together with the value for kfbh.type, which should have been KFBTYP_FILEDIR. But that was not the case:

 

The error ORA-15196:

 

WARNING: cache failed to read fn=1 blk=256 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [obj_kfbl] [1] [256] [3 != 1]

 

  • The content dumped into the trace file was the same found on disk. The check validation failed because the data stored in the block was not part of the correct ASM metadata, in this case file directory.

 

The next step was to validate all the blocks in the same Allocation Unit. Those blocks belong to the same ASM metadata (KFBTYP_FILEDIR). One Allocation Unit is used exclusively by one unique file.

 

Example for block 1 from AU 50:

 

disk:0 au:50 block:1 file:1 physical extent:1 block:1

kfed        read        ausz=1048576         blksz=4096         aunum=50        blknum=1 dev=/emea/bde/home/users/jfiguer2/disk.dd

 

 

 

 

The solution

 

There was not an available backup for the database stored on the diskgroup, so it was required to keep the diskgroup mounted. Patching the ASM metadata, replacing the content of the first block from Allocation Unit 50, with a valid data.

 

It was not possible to rebuild the real data for the block 0, so it was replaced with block

  1. Additional patching was required, in order to adjust other fields in the block. Once the block was successfully patched, the diskgroup was mounted and queries on internal views did not dismount the diskgroup.

 

Opening the database report errors trying to identify one data file. The extent mapping  for this file was stored in the patched block. Luckily that file was not relevant for the database.  After setting the file offline, the database opened  without errors.

 

Because was not possible to guarantee the integrity of the diskgroup, it was recommended to take a backup of the database and rebuild the diskgroup.

【Oracle数据恢复】ORA-600[4511]错误解析

ORA-00600[4511]错误的触发条件是当ORACLE检验一个行锁并发现该锁不再活动时。 当然也可以由于一个数据块中存放了4095条以上的行时,一般仅仅当使用32kb数据块时可能包含这么多行, 针对这种情况的一种workaround是限制块中的数据行数,可以通过设置pctfree为一个较高值。

ORA-00600[4511]错误属于内核事务块处理,该错误可能造成进程失败或者可能的坏块

如果自己搞不定可以找ASKMACLEAN专业数据库修复团队成员帮您恢复!

 

如果用户已经了解到是那张表上存在可能引起ORA-00600[4511]错误的问题数据块,则可以通过执行ANALYZE TABLE <表名> VALIDATE STRUCTURE CASCADE;命令来检验是否存在潜在的问题。也可以用DBV工具来扫描表所在的数据文件。

这也可能是Consistent Read一致性读引起的该问题,那么alter system flush buffer_cache或者重启实例都可能可以绕过该问题。

 

ORA-00600[4511]相关的BUG 信息:

9724970 11.2.0.1.BP08, 11.2.0.2.2, 11.2.0.2.BP02, 11.2.0.3, 12.1.0.1 Block Corruption with PDML UPDATE. ORA_600 [4511] OERI[kdblkcheckerror] by block check

4000840 9.2.0.7, 10.1.0.4, 10.2.0.1 Update of a row with more than 255 columns can cause block corruption

沪ICP备14014813号-2

沪公网安备 31010802001379号