Database Force open example

帮网友强制打开了一个没有备份的测试库,这个库没有备份也没有打开归档,因为之前也出现过active日志文件损毁,一直使用隐式参数才能正常打开:

                 _allow_resetlogs_corruption= TRUE

 

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

 

这次一开始这个库报ORA-600[2662]错误:

Mon Aug 23 09:37:00 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_852096.trc:
ORA-00600: internal error code, arguments: [2662], [0], [130131504], [0], [130254136], [4264285], [], []
Mon Aug 23 09:37:02 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_852096.trc:
ORA-00600: internal error code, arguments: [2662], [0], [130131506], [0], [130254136], [4264285], [], []

ORA-600 [2662] “Block SCN is ahead of Current SCN”错误是当数据块中的SCN领先于current SCN,由于后台进程或服务进程都会比对UGA中的dependent SCN和数据库当前的SCN,如果数据库当前SCN小于dependent SCN,那么该进程就会报ORA-600 [2662]错误,如果遭遇该错误的是服务进程,那么服务进程一般会异常终止;如果遭遇该错误的是后台进程譬如SMON,则会导致实例CRASH。
ORA-600 [2662]错误可以能由以下几种情况引起:
1.启用隐含参数_ALLOW_RESETLOGS_CORRUPTION后,以resetlogs形式打开数据库;这种情况下发生2662错误,根本原因是没有完全前滚导致控制文件中的SCN滞后于数据块中的SCN。
2.硬件故障导致数据库没法写控制文件和联机日志文件
3.错误的部分恢复数据库
4.恢复了控制文件,但是没有使用recover database using backup controlfile进行恢复
5.数据库crash后设置了_DISABLE_LOGGING隐含参数
6.在并行服务器环境中DLM存在问题

该错误的5个参数的具体含义如下:
ARGUMENTS:
Arg [a] Current SCN WRAP
Arg [b] Current SCN BASE
Arg [c] dependent SCN WRAP
Arg [d] dependent SCN BASE
Arg [e] Where present this is the DBA where the dependent SCN came from.

我们的case当中dependent SCN为130254136,而当前SCN为130131506,其差值为122630;从以上告警日志中可以看到数据库的当前SCN是在不断缓慢增长的,当我们遭遇到2662错误时,很滑稽的一点是只要不断重启数据库保持current SCN的增长,一段时间后2662错误会不药而愈。当然我们也可以不用这种笨办法,10015事件可以帮助我们调整数据库当前SCN:

/* 当数据库处于mount状态,可以使用10015事件来调整scn */

alter session  set events '10015 trace name adjust_scn level 1';

/* 这里可以设置level 2..10等 (level 1是在每次打开数据库时scn增加1000k)*/

/* 需要注意的是10g某些版本不同于9i,需要设置隐式参数_allow_error_simulation,才能真正增进scn */

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> col current_scn format 999,999,999,999

SQL> select current_scn from v$database;
CURRENT_SCN
-----------
    1141408

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 1653518336 bytes
Fixed Size                  2213896 bytes
Variable Size             989857784 bytes
Database Buffers          654311424 bytes
Redo Buffers                7135232 bytes
Database mounted.

SQL> alter session set events '10015 trace name adjust_scn level 1';
Session altered.

SQL> alter database open;
Database altered.

SQL>  select current_scn from v$database;
CURRENT_SCN
-----------
    1142031

/* 可以看到current_scn并未大量增加,10.2.0.4上默认10015 adjust_scn不被触发 */

SQL>  alter system set "_allow_error_simulation"=true scope=spfile;
System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup mount;
ORACLE instance started.
Total System Global Area 1653518336 bytes
Fixed Size                  2213896 bytes
Variable Size             989857784 bytes
Database Buffers          654311424 bytes
Redo Buffers                7135232 bytes
Database mounted.

SQL> alter session set events '10015 trace name adjust_scn level 1';
Session altered.

SQL> alter database open;
Database altered.

SQL>select current_scn from v$database;
     CURRENT_SCN
----------------
   1,073,741,980

在接手之前该网友已经通过反复重启数据库将数据库的当前SCN提高到dependent SCN的127037138;原以为这样就可以打开数据库了,谁知道又出现了一下错误:

Wed Aug 25 07:43:53 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_929958.trc:
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Wed Aug 25 07:43:53 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_929958.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Wed Aug 25 07:43:53 2010
Error 704 happened during db open, shutting down database

bootstrap自举过程中遭遇了ORA-600 [4000]错误,该错误一般当Oracle尝试读取数据字典(主要是undo$基表)中记录的USN对应的回滚段失败引起.,通过设置隐式参数_corrupted_rollback_segments可以一定程度上规避该错误,强制打开数据库,其Argument[a]代表造成读取失败的USN(undo segment number),但实际上有问题的回滚段可能不止这一个:

/* 通过strings工具从system表空间上找回各回滚段的名字  */
$strings system.dbf |grep _SYSSMU|less
_SYSSMU1$
_SYSSMU2$
_SYSSMU3$
_SYSSMU4$
_SYSSMU5$
_SYSSMU6$
_SYSSMU7$
_SYSSMU8$
_SYSSMU9$
.........
alter system set "_corrupted_rollback_segments"='(_SYSSMU1$, _SYSSMU2$, _SYSSMU3$, _SYSSMU4$, _SYSSMU5$, _SYSSMU6$, _SYSSMU7$, _SYSSMU8$, _SYSSMU9$, _SYSSMU10$, _SYSSMU11$, _SYSSMU12$)' scope=spfile;
System altered.

/* 即便设置了_corrupted_rollback_segments隐式参数,也还有一定概率遭遇4000错误,尝试加上10513事件,并多次重启数据库 */

SQL> alter system set event='10513 trace name context forever,level 2' scope=spfile;
System altered.

/* 再次出现4000 错误 */
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_1016014.trc:
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Thu Aug 26 09:43:39 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_1016014.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Thu Aug 26 09:43:39 2010
Error 704 happened during db open, shutting down database

/* 再次重启后发现4000错误不再出现 * /

再次重启发现不再出现ORA-600[4000]错误,但在字典检查阶段Oracle认为数据文件227不匹配于当前的incarnation:

Thu Aug 26 11:13:22 2010
Dictionary check beginning
Thu Aug 26 09:46:00 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_897162.trc:
ORA-01177: data file does not match dictionary - probably old incarnation
ORA-01110: data file 227: '/oracle/QAS/sapdata2/qas_192/qas.data196'
Error 1177 happened during db open, shutting down database
USER: terminating instance due to error 1177
Instance terminated by USER, pid = 897162

初步判断出现ORA-01177可能为2种可能性:
1.数据字典出现讹误,227号文件对应的incarnation信息不正确
2.在之前的某次resetlogs open过程中,227号文件头由于某些原因没有正确更新incarnation信息

针对这样的情况如果一定要找回该数据文件上的数据的话只能通过手动修改数据字典或文件头,当然也可以尝试使用一些直接从数据文件上抽取数据的工具。
因为这是一次友情协助,就没有继续深入下去,通过重建控制文件并跳过该数据文件解决了:

CREATE CONTROLFILE REUSE DATABASE "QAS" RESETLOGS  NOARCHIVELOG
--  SET STANDBY TO MAXIMIZE PERFORMANCE
    MAXLOGFILES 255
    MAXLOGMEMBERS 3
    MAXDATAFILES 254
    MAXINSTANCES 50
    MAXLOGHISTORY 36302
LOGFILE
  GROUP 1 (
    '/oracle/QAS/redolog/redolog11A.dbf',
    '/oracle/QAS/redolog/redolog11B.dbf'
  ) SIZE 500M,
  GROUP 2 (
    '/oracle/QAS/redolog/redolog12A.dbf',
    '/oracle/QAS/redolog/redolog12B.dbf'
  ) SIZE 500M
-- STANDBY LOGFILE
DATAFILE
  '/oracle/QAS/sapdata1/system_1/system.data1',
........
  '/oracle/QAS/sapdata2/qas_192/qas.data195'
CHARACTER SET WE8DEC
Thu Aug 26 14:04:50 2010
Successful mount of redo thread 1, with mount id 2117500093
Thu Aug 26 14:04:50 2010
Completed: CREATE CONTROLFILE REUSE DATABASE "QAS" RESETLOGS
Thu Aug 26 14:05:05 2010
alter database mount
Thu Aug 26 14:05:05 2010
ORA-1100 signalled during: alter database mount...
Thu Aug 26 14:05:15 2010
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 1125281471596
Resetting resetlogs activation ID 0 (0x0)
Online log 1 of thread 1 was previously cleared
Thu Aug 26 14:05:36 2010
Assigning activation ID 2117500093 (0x7e367cbd)
Thread 1 opened at log sequence 1
  Current log# 2 seq# 1 mem# 0: /oracle/QAS/redolog/redolog12A.dbf
  Current log# 2 seq# 1 mem# 1: /oracle/QAS/redolog/redolog12B.dbf
Successful open of redo thread 1
Thu Aug 26 14:05:36 2010
SMON: enabling cache recovery
Thu Aug 26 14:05:36 2010
Dictionary check beginning
Tablespace 'PSAPTEMP' #2 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #227 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00227' in the controlfile.
This file can no longer be recovered so it must be dropped.
File #228 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00228' in the controlfile.
This file can no longer be recovered so it must be dropped.
File #229 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00229' in the controlfile.
This file can no longer be recovered so it must be dropped.
Dictionary check complete
Thu Aug 26 14:05:38 2010
SMON: enabling tx recovery
Thu Aug 26 14:05:38 2010
Database Characterset is WE8DEC
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: alter database open resetlogs

这个case告诉我们,测试库并不一定就不重要,测试库也是需要备份的。

Comments

  1. admin says

    ADJUST_SCN Event
    ~~~~~~~~~~~~~~~~
    *** WARNING ***
    This event should only ever be used under the guidance
    of an experienced Oracle analyst.
    If an SCN is ahead of the current database SCN, this indicates
    some form of database corruption. The database should be rebuilt
    after bumping the SCN.
    ****************

    The ADJUST_SCN event is useful in some recovery situations where the
    current SCN needs to be incremented by a large value to ensure it
    is ahead of the highest SCN in the database. This is typically
    required if either:
    a. An ORA-600 [2662] error is signalled against database blocks
    or
    b. ORA-1555 errors keep occuring after forcing the database open
    or ORA-604 / ORA-1555 errors occur during database open.
    (Note: If startup reports ORA-704 & ORA-1555 errors together
    then the ADJUST_SCN event cannot be used to bump the
    SCN as the error is occuring during bootstrap.
    Repeated startup/shutdown attempts may help if the SCN
    mismatch is small)
    or
    c. If a database has been forced open used _ALLOW_RESETLOGS_CORRUPTION
    (See )

    The ADJUST_SCN event acts as described below.

    **NOTE: You can check that the ADJUST_SCN event has fired as it
    should write a message to the alert log in the form
    “Debugging event used to advance scn to %s”.
    If this message is NOT present in the alert log the event
    has probably not fired.

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    If the database will NOT open:
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Take a backup.
    You can use event 10015 to trigger an ADJUST_SCN on database open:

    startup mount;

    alter session set events ‘10015 trace name adjust_scn level 1’;

    (NB: You can only use IMMEDIATE here on an OPEN database. If the
    database is only mounted use the 10015 trigger to adjust SCN,
    otherwise you get ORA 600 [2251], [65535], [4294967295] )

    alter database open;

    If you get an ORA 600:2256 shutdown, use a higher level and reopen.

    Do *NOT* set this event in init.ora or the instance will crash as soon
    as SMON or PMON try to do any clean up. Always use it with the
    “alter session” command.

    ~~~~~~~~~~~~~~~~~~~~~~~~~~
    If the database *IS* OPEN:
    ~~~~~~~~~~~~~~~~~~~~~~~~~~
    You can increase the SCN thus:

    alter session set events ‘IMMEDIATE trace name ADJUST_SCN level 1’;

    LEVEL: Level 1 is usually sufficient – it raises the SCN to 1 billion
    (1024*1024*1024)
    Level 2 raises it to 2 billion etc…

    If you try to raise the SCN to a level LESS THAN or EQUAL to its
    current setting you will get – See below.
    Ie: The event steps the SCN to known levels. You cannot use
    the same level twice.

    Calculating a Level from 600 errors:
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    To get a LEVEL for ADJUST_SCN:

    a) Determine the TARGET scn:
    ora-600 [2662] See Use TARGET >= blocks SCN
    ora-600 [2256] See Use TARGET >= Current SCN

    b) Multiply the TARGET wrap number by 4. This will give you the level
    to use in the adjust_scn to get the correct wrap number.
    c) Next, add the following value to the level to get the desired base
    value as well :

    Add to Level Base
    ~~~~~~~~~~~~ ~~~~~~~~~~~~
    0 0
    1 1073741824
    2 2147483648
    3 3221225472

  2. admin says

    ORA-600 [2662] “Block SCN is ahead of Current SCN”
    PURPOSE:
    This article discusses the internal error “ORA-600 [2662]”, what
    it means and possible actions. The information here is only applicable
    to the versions listed and is provided only for guidance.

    ERROR:
    ORA-600 [2662] [a] [b] [c] [d] [e]

    VERSIONS:
    versions 6.0 to 10.1

    DESCRIPTION:

    A data block SCN is ahead of the current SCN.

    The ORA-600 [2662] occurs when an SCN is compared to the dependent SCN
    stored in a UGA variable.

    If the SCN is less than the dependent SCN then we signal the ORA-600 [2662]
    internal error.

    ARGUMENTS:
    Arg [a] Current SCN WRAP
    Arg [b] Current SCN BASE
    Arg [c] dependent SCN WRAP
    Arg [d] dependent SCN BASE
    Arg [e] Where present this is the DBA where the dependent SCN came from.

    FUNCTIONALITY:
    File and IO buffer management for redo logs

    IMPACT:
    INSTANCE FAILURE
    POSSIBLE PHYSICAL CORRUPTION

    SUGGESTIONS:

    There are different situations where ORA-600 [2662] can be raised.

    It can be raised on startup or duing database operation.

    If not using Parallel Server, check that 2 instances have not mounted
    the same database.

    Check for SMON traces and have the alert.log and trace files ready
    to send to support.

    Check the SCN difference [argument d]-[argument b].

    If the SCNs in the error are very close, then try to shutdown and startup
    the instance several times.

    In some situations, the SCN increment during startup may permit the
    database to open. Keep track of the number of times you attempted a
    startup.

    If the Known Issues section below does not help in terms of identifying
    a solution, please submit the trace files and alert.log to Oracle
    Support Services for further analysis.

    Known Issues:

    Bug# 5889016 See Note:5889016.8
    Corruption / OERI during recovery
    Fixed:

    Bug# 4453449 See Note:4453449.8
    OERI:3020 / corruption errors from multiple FLASHBACK DATABASE
    Fixed: 10.2.0.2, 11.1.0.6

    Bug# 2899477 See Note:2899477.8
    Minimise risk of a false OERI[2662]
    Fixed: 9.2.0.5, 10.1.0.2

    Bug# 2764106 See Note:2764106.8
    False OERI[2662] possible on SELECT which can crash the instance
    Fixed: 9.2.0.5, 10.1.0.2

    Bug# 2216823 See Note:2216823.8
    OERI [2662] reusing a TEMPFILE with a restored database
    Fixed: 10.1.0.2

    Bug# 2054025 See Note:2054025.8
    OERI:2662 possible on new TEMPORARY index block
    Fixed: 9.0.1.3, 9.2.0.1

    Bug# 851959 See Note:851959.8
    OERI:2662 possible from distributed OPS select
    Fixed: 7.3.4.5

    Bug# 647927 P See Note:647927.8
    Digital Unix ONLY: OERI:2662 could occur under heavy load
    Fixed: 8.0.4.2, 8.0.5.0

    INTERNAL ONLY SECTION – NOT FOR PUBLICATION OR DISTRIBUTION TO CUSTOMERS
    ========================================================================

    There were 2 forms of this error until 7.2.3:

    Type I: 4/5 argument forms –
    The SCN found on a block (dependent SCN) is ahead of the
    current SCN. See below for this

    Type II: 1 Argument (before 7.2.3 only):
    Oracle is in the process of writing a block to a log file.
    If the calculated block checksum is less than or equal to 1
    (0 and 1 are reserved) ORA-600 [2662] is returned.
    This is a problem generating an offline immediate log marker
    (kcrfwg).
    *NOT DOCUMENTED HERE*

    Type I
    ~~~~~~
    a. Current SCN WRAP
    b. Current SCN BASE
    c. dependent SCN WRAP
    d. dependent SCN BASE
    e. Where present this is the DBA where the dependent SCN came from.
    From kcrf.h:
    If the SCN comes from the recent or current SCN then a dba
    of zero is saved. If it comes from undo$ because the undo segment is
    not available then the undo segment number is saved, which looks like
    a block from file 0. If the SCN is for a media recovery redo (i.e.
    block number == 0 in change vector), then the dba is for block 0
    of the relevant datafile. If it is from another database for a
    distributed transaction then dba is DBAINF(). If it comes from a TX
    lock then the dba is really usn<<16+slot. Type II ~~~~~~~ a. checksum -> log block checksum – zero if none (thread # in old format)

    —————————————————————————

    Diagnosis:
    ~~~~~~~~~~
    In addition to different basic types from above, there are different
    situations where ORA-600 [2662] type I can be raised.

    Getting started:
    ~~~~~~~~~~~~~~~~
    (1) is the error raised during normal database operations (i.e. when the
    database is up) or during startup of the database?
    (2) what is the SCN difference [d]-[b] ( subtract argument ‘b’ from arg ‘d’)?
    (3) is there a fifth argument [e] ?
    If so convert the dba to file# block#
    Is it a data dictionary object? (file#=1)
    If so find out object name with the help of reference dictionary
    from second database
    (4) What is the current SQL statement? (see trace)
    Which table is refered to?
    Does the table match the object you found in previous step?

    Be careful at this point: there may be no relationship between DBA in [e]
    and the real source of problem (blockdump).

    Deeper analysis:
    ~~~~~~~~~~~~~~~~
    (1) investigate trace file:
    this will be a user trace file normally but could be an smon trace too
    (2) search for: ‘buffer’
    (“buffer dba” in Oracle7 dumps, “buffer tsn” in Oracle8/Oracle9 dumps)
    this will bring you to a blockdump which usually represents the
    ‘real’ source of OERI:2662

    WARNING: There may be more than one buffer pinned to the process
    so ensure you check out all pinned buffers.

    -> does the blockdump match the dba from e.?
    -> what kind of blockdump is it?
    (a) rollback segment header
    (b) datablock
    (c) other

    Check list and possible causes
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    If Parallel Server check both nodes are using the same lock manager
    instance & point at the same control files.

    Possible causes:

    (1) doing an open resetlogs with _ALLOW_RESETLOGS_CORRUPTION enabled
    (2) a hardware problem, like a faulty controller, resulting in a failed
    write to the control file or the redo logs
    (3) restoring parts of the database from backup and not doing the
    appropriate recovery
    (4) restoring a control file and not doing a RECOVER DATABASE USING BACKUP
    CONTROLFILE
    (5) having _DISABLE_LOGGING set during crash recovery
    (6) problems with the DLM in a parallel server environment
    (7) a bug

    Solutions:

    (1) if the SCNs in the error are very close, attempting a startup several
    times will bump up the dscn every time we open the database even if
    open fails. The database will open when dscn=scn.

    (2)You can bump the SCN either on open or while the database is open
    using (see Note:30681.1).
    Be aware that you should rebuild the database if you use this
    option.

    Once this has occurred you would normally want to rebuild the
    database via exp/rebuild/imp as there is no guarantee that some
    other blocks are not ahead of time.

  3. admin says

    ORA-600 [4000] "trying to get dba of undo segment header block from usn"
    PURPOSE:
    This article discusses the internal error "ORA-600 [4000]", what
    it means and possible actions. The information here is only applicable
    to the versions listed and is provided only for guidance.

    ERROR:
    ORA-600 [4000] [a]

    VERSIONS:
    version 6.0 to 9.2

    DESCRIPTION:

    This has the potential to be a very serious error.

    It means that Oracle has tried to find an undo segment number in the
    dictionary cache and failed.

    ARGUMENTS:
    Arg [a] Undo segment number

    FUNCTIONALITY:
    KERNEL TRANSACTION UNDO

    IMPACT:
    INSTANCE FAILURE – Instance will not restart
    STATEMENT FAILURE

    SUGGESTIONS:

    As per Note 1371820.8, this can be seen when executing DML on tables residing
    in tablespaces transported from another database.

    It is fixed in 8.1.7.4, 9.0.1.4 and 9.2.0.1 The workaround however is to
    create more rollback segments in the target database until the highest
    rollback segment number (select max(US#) from sys.undo$;) is at least
    as high as in equivalent max(US#) from the source database.

    It has also been seen where memory has been corrupted so try shutting
    down and restarting the instance.

    If the database will not start contact Oracle Support Services
    immediately, providing the alert.log and associated trace files.

    KNOWN ISSUES:

    NB Bug Fixed Description
    * 9145541 11.1.0.7.4, 11.2.0.1.2, 11.2.0.2, 12.1.0.0 OERI[25027]/OERI[4097]/OERI[4000]/ORA-1555 in plugged datafile after CREATE CONTROLFILE in 11g See NOTE:1066229.1
    2917441 11.1.0.6 OERI [4000] during startup
    2959556 9.2.0.5, 10.1.0.2 STARTUP after an ORA-701 fails with OERI[4000]
    3115733 9.2.0.5, 10.1.0.2 OERI[4000] / index corruption can occur during index coalesce
    1371820 8.1.7.4, 9.0.1.4, 9.2.0.1 OERI:4506 / OERI:4000 possible against transported tablespace
    + 434596 7.3.4.2, 8.0.3.0 ORA-600[4000] from altering storage of BOOTSTRAP$


    Bug 1362499 ORA-600 [4000] after migrating 7.3.4.3 to 8.0.6.1 on HP-UX 32-bit Specific to HP-UX, fixed in one-off patch

     

     

     

Comment

*

沪ICP备14014813号-2

沪公网安备 31010802001379号