DataGuard Managed recovery hang

Our team deleted some archived logs by mistake. We rolled the standby forward with an RMAN incremental backup taken from an SCN, then ran a manual recovery to sync it with the primary. Managed recovery is now failing:
alter database recover managed standby database disconnect
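For context, the roll-forward step we did earlier follows the standard incremental-from-SCN outline; roughly like this (a minimal sketch, not our exact commands: the SCN and backup paths are placeholders, and this is the syntax as documented for 10g, so the 9.2 procedure differs in detail):

# on the primary: take an incremental backup from the standby's current SCN
RMAN> BACKUP INCREMENTAL FROM SCN 1234567890 DATABASE FORMAT '/backup/stby_roll_%U';

# on the standby: catalog the copied pieces, then apply them without redo
RMAN> CATALOG START WITH '/backup/stby_roll_';
RMAN> RECOVER DATABASE NOREDO;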

The alert log shows:

Fri Jan 22 13:50:22 2010
Attempt to start background Managed Standby Recovery process
MRP0 started with pid=12
MRP0: Background Managed Standby Recovery process started
Media Recovery Waiting for thread 1 seq# 193389
Fetching gap sequence for thread 1, gap sequence 193389-193391
Trying FAL server: ITS
Fri Jan 22 13:50:28 2010
Completed: alter database recover managed standby database d
Fri Jan 22 13:53:25 2010
Failed to request gap sequence. Thread #: 1, gap sequence: 193389-193391
All FAL server has been attempted.

Managed recovery was working earlier today after the RMAN incremental recovery; it resolved two gaps automatically. But it now appears hung, and the standby is falling behind the primary.
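The gap can also be confirmed directly from the standby side, and here it matches what the alert log reports (a quick check; output shape approximate):

SQL> SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE# FROM V$ARCHIVE_GAP;

   THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
         1        193389         193391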

SQL> show parameter fal

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fal_client                           string      ITS_STBY
fal_server                           string      ITS

[v08k608:ITS:oracle]$ tnsping ITS_STBY

TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:01:17

Copyright (c) 1997 Oracle Corporation. All rights reserved.

Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= v08k608.am.mot.com)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (10 msec)
[v08k608:ITS:oracle]$ tnsping ITS

TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:01:27

Copyright (c) 1997 Oracle Corporation. All rights reserved.

Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= 187.10.68.75)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (320 msec)

The primary has:

SQL> show parameter log_archive_dest_2

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_2                   string      SERVICE=DRITS_V08K608 reopen=6
                                                 0 max_failure=10 net_timeout=1
                                                 80 LGWR ASYNC=20480 OPTIONAL
log_archive_dest_state_2             string      ENABLE
[ITS]/its15/oradata/ITS/arch> tnsping DRITS_V08K608
TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:03:24
Copyright (c) 1997 Oracle Corporation. All rights reserved.
Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= 10.177.13.57)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (330 msec)
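So TNS connectivity is fine in both directions. The next thing worth checking on the primary is the state of the remote destination itself; a sketch of the check (the ERROR column carries whatever the last transport failure was, if any):

SQL> SELECT DEST_ID, STATUS, FAIL_SEQUENCE, ERROR
  2  FROM V$ARCHIVE_DEST WHERE DEST_ID = 2;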

The ARCH processes on the primary database can hang because of the bug below, leaving them unable to ship the missing archived log
files to the standby database.

BUG 6113783 ARC PROCESSES CAN HANG INDEFINITELY ON NETWORK
[ Not published, so not viewable in My Oracle Support ]
Fixed in 11.2 and in the 10.2.0.5 patchset.

We can work around the issue by killing the ARCH processes on the primary site; they are respawned automatically and immediately,
without harming the primary database.
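Before killing them, it is worth confirming which archivers are actually stuck; v$archive_processes shows the per-process state (a sketch; a hung archiver typically sits BUSY on the same log sequence indefinitely):

SQL> SELECT PROCESS, STATUS, LOG_SEQUENCE, STATE
  2  FROM V$ARCHIVE_PROCESSES WHERE STATUS <> 'STOPPED';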

[maclean@rh2 ~]$ ps -ef|grep arc
maclean   8231     1  0 22:24 ?        00:00:00 ora_arc0_PROD
maclean   8233     1  0 22:24 ?        00:00:00 ora_arc1_PROD
maclean   8350  8167  0 22:24 pts/0    00:00:00 grep arc
[maclean@rh2 ~]$ kill -9 8231 8233
[maclean@rh2 ~]$ ps -ef|grep arc
maclean   8389     1  0 22:25 ?        00:00:00 ora_arc0_PROD
maclean   8391     1  1 22:25 ?        00:00:00 ora_arc1_PROD
maclean   8393  8167  0 22:25 pts/0    00:00:00 grep arc

and the alert log will show:

Fri Jul 30 22:25:27 EDT 2010
ARCH: Detected ARCH process failure
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=26, OS id=8389
Fri Jul 30 22:25:27 EDT 2010
ARC0: Archival started
ARC1: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=27, OS id=8391
Fri Jul 30 22:25:27 EDT 2010
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri Jul 30 22:25:27 EDT 2010
ARC1: Becoming the heartbeat ARCH
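With the archivers respawned, the missing sequences ship and the standby resumes applying. An easy way to watch the progress from the standby side (output shape approximate):

SQL> SELECT PROCESS, STATUS, THREAD#, SEQUENCE#, BLOCK# FROM V$MANAGED_STANDBY;

Once MRP0 reports APPLYING_LOG and the sequence climbs past 193391, the gap is resolved.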

In fact, in 10g, as long as we do not kill one of the fatal background processes, Oracle will respawn all non-fatal processes.
For example:

[maclean@rh2 ~]$ ps -ef|grep ora_|grep -v grep
maclean  14264     1  0 23:16 ?        00:00:00 ora_pmon_PROD
maclean  14266     1  0 23:16 ?        00:00:00 ora_psp0_PROD
maclean  14268     1  0 23:16 ?        00:00:00 ora_mman_PROD
maclean  14270     1  0 23:16 ?        00:00:00 ora_dbw0_PROD
maclean  14272     1  0 23:16 ?        00:00:00 ora_lgwr_PROD
maclean  14274     1  0 23:16 ?        00:00:00 ora_ckpt_PROD
maclean  14276     1  0 23:16 ?        00:00:00 ora_smon_PROD
maclean  14278     1  0 23:16 ?        00:00:00 ora_reco_PROD
maclean  14338     1  0 23:16 ?        00:00:00 ora_arc0_PROD
maclean  14340     1  0 23:16 ?        00:00:00 ora_arc1_PROD
maclean  14452     1  0 23:17 ?        00:00:00 ora_s000_PROD
maclean  14454     1  0 23:17 ?        00:00:00 ora_d000_PROD
maclean  14456     1  0 23:17 ?        00:00:00 ora_cjq0_PROD
maclean  14458     1  0 23:17 ?        00:00:00 ora_qmnc_PROD
maclean  14460     1  0 23:17 ?        00:00:00 ora_mmon_PROD
maclean  14462     1  0 23:17 ?        00:00:00 ora_mmnl_PROD
maclean  14467     1  0 23:17 ?        00:00:00 ora_q000_PROD
maclean  14568     1  0 23:18 ?        00:00:00 ora_q001_PROD

[maclean@rh2 ~]$ ps -ef|grep ora_|grep -v pmon|grep -v ckpt |grep -v lgwr|grep -v smon|grep -v grep|grep -v dbw|grep -v psp|grep -v mman |grep -v rec|awk '{print $2}'|xargs kill -9

and the alert log will show:
Fri Jul 30 23:20:58 EDT 2010
ARCH: Detected ARCH process failure
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=20, OS id=14959
Fri Jul 30 23:20:58 EDT 2010
ARC0: Archival started
ARC1: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
Fri Jul 30 23:20:58 EDT 2010
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC1 started with pid=21, OS id=14961
ARC1: Becoming the heartbeat ARCH
Fri Jul 30 23:21:29 EDT 2010
found dead shared server 'S000', pid = (10, 3)
found dead dispatcher 'D000', pid = (11, 3)
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process CJQ0
Restarting dead background process QMNC
CJQ0 started with pid=12, OS id=15124
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process MMON
QMNC started with pid=13, OS id=15126
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process MMNL
MMON started with pid=14, OS id=15128
MMNL started with pid=16, OS id=15132

With that, everything is back in order.

Comments

  1. admin says

    Bug 7262356: HW PROBLEM CAUSED RFS TO HANG – THEN PRIMARY HUNG IN FSFO CONFIG

    Bug Attributes
    Type: B – Defect                        Fixed in Product Version: 11.1
    Severity: 2 – Severe Loss of Service    Product Version: 10.2.0.2
    Status: 96 – Closed, Duplicate Bug      Platform: 226 – Linux x86-64
    Created: 17-Jul-2008                    Platform Version: 3.0
    Updated: 24-Jul-2009                    Base Bug: 5404871
    Database Version: 10.2.0.2
    Affects Platforms: Generic
    Product Source: Oracle

    Related Products
    Line: Oracle Database Products          Family: Oracle Database
    Area: Oracle Database                   Product: 5 – Oracle Server – Enterprise Edition

    Hdr: 7262356 10.2.0.2 RDBMS 10.2.0.2 DATAGUARD_PSBY PRODID-5 PORTID-226 5404871
    Abstract: HW PROBLEM CAUSED RFS TO HANG – THEN PRIMARY HUNG IN FSFO CONFIG

    *** 07/17/08 11:12 am ***
    TAR:
    ----

    PROBLEM:
    --------
    The standby hardware has a faulty backplane. This caused processes on the
    standby to go into the "D" state, where a process is stuck in a kernel
    call. So the RFS processes on the standby are hung in the operating
    system, which in turn causes the ARCH processes on the primary to hang.
    Those ARCH processes are in actuality archiving the ORLs both to the
    standby and to the local destination, so when they hang they also prevent
    archival of the ORLs to the local destination. When enough of these ARCH
    processes hang, the primary runs out of ORLs and thus the primary itself
    hangs, even though log_archive_local_first is set to true (the default).

    DIAGNOSTIC ANALYSIS:
    --------------------

    WORKAROUND:
    -----------
    Kill all the archiver processes.

    RELATED BUGS:
    -------------
    bug 6113783 and bug 6987510

    REPRODUCIBILITY:
    ----------------
    Can't reproduce a hardware failure, but perhaps stop an ARCH process?

    TEST CASE:
    ----------

    STACK TRACE:
    ------------

    SUPPORTING INFORMATION:
    -----------------------

    24 HOUR CONTACT INFORMATION FOR P1 BUGS:
    ----------------------------------------

    DIAL-IN INFORMATION:
    --------------------

    IMPACT DATE:
    ------------
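    The hang mechanism described in this bug is visible on the primary: when
    every online redo log group is still waiting to be archived, the next log
    switch has nowhere to go and the instance stalls. A quick check (the
    output here is hypothetical, for illustration):

    SQL> SELECT GROUP#, SEQUENCE#, ARCHIVED, STATUS FROM V$LOG;

        GROUP#  SEQUENCE# ARC STATUS
    ---------- ---------- --- ----------------
             1        101 NO  ACTIVE
             2        102 NO  ACTIVE
             3        103 NO  CURRENT

    With no INACTIVE group available, the next log switch blocks and user
    sessions begin to wait on "log file switch (archiving needed)".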

  2. admin says

    Bug Attributes
    Type: B – Defect                        Fixed in Product Version: –
    Severity: 2 – Severe Loss of Service    Product Version: 10.2.0.3
    Status: 96 – Closed, Duplicate Bug      Platform: 46 – Linux x86
    Created: 19-Jun-2007                    Platform Version: –
    Updated: 25-Jun-2007                    Base Bug: 6113783
    Database Version: 10.2.0.3
    Affects Platforms: Generic
    Product Source: Oracle

    Related Products
    Line: Oracle Database Products          Family: Oracle Database
    Area: Oracle Database                   Product: 5 – Oracle Server – Enterprise Edition

    Hdr: 6140651 10.2.0.3 RDBMS 10.2.0.3 DATAGUARD_TRAN PRODID-5 PORTID-46 6113783
    Abstract: PRIMARY NOT SENDING ARCHIVED REDO TO DATAGUARD DATABASE

    *** 06/19/07 10:12 am ***
    TAR:
    ----

    PROBLEM:
    --------
    1. Clear description of the problem encountered:
    10.2.0.3 primary database not sync'ing up with DR database

    2. Pertinent configuration information (MTS/OPS/distributed/etc)

    3. Indication of the frequency and predictability of the problem

    4. Sequence of events leading to the problem
    The node containing the dataguard (DG) database was bounced. The DG
    database was not shut down properly before the bounce. Since this time, the
    primary refuses to send archives to the DG site.

    5. Technical impact on the customer. Include persistent after effects.
    We are currently forced to manually transfer archives from our primary

    DIAGNOSTIC ANALYSIS:
    --------------------
    We've set log_archive_trace=4095 on the primary and gathered tracing from
    the primary database. This log file will be uploaded. This trace file
    indicates inconsistent status for various archives that have been
    transferred in the past.

    We've also tried to recreate the standby control file, but we are still
    unable to get the primary to send archives to the DG database.

    No errors are produced in the alert logs in either the primary or the DG
    database.

    WORKAROUND:
    -----------
    Manually transfer archives and apply them to the standby database daily.

    RELATED BUGS:
    -------------

    REPRODUCIBILITY:
    ----------------
    1. State if the problem is reproducible; indicate where and predictability
    It is currently reproducible on our machine

    2. List the versions in which the problem has reproduced
    10.2.0.3

    TEST CASE:
    ----------
    None

    STACK TRACE:
    ------------
    None

    SUPPORTING INFORMATION:
    -----------------------
    Trace file will be uploaded

    24 HOUR CONTACT INFORMATION FOR P1 BUGS:
    ----------------------------------------

    DIAL-IN INFORMATION:
    --------------------

    IMPACT DATE:
    ------------
    We will be bouncing the primary database on 29Jun2007 to apply some
    patches. We need to get an idea of the problem we're facing before that
    time.
    *** 06/19/07 10:57 am ***
    Some more info:
    Non-RAC databases.

    On Primary:
    log_archive_config   string  DG_CONFIG=(FORUMPRD,FORUMSTB)
    log_archive_dest_1   string  LOCATION=/u05/oraarch/FORUMPRD/ MANDATORY REOPEN=60
                                 VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
                                 DB_UNIQUE_NAME=FORUMPRD
    log_archive_dest_2   string  DB_UNIQUE_NAME=FORUMSTB

    SQL> SELECT DATABASE_ROLE, DB_UNIQUE_NAME INSTANCE, OPEN_MODE,
      2         PROTECTION_MODE, PROTECTION_LEVEL, SWITCHOVER_STATUS
      3  FROM V$DATABASE;

    DATABASE_ROLE    INSTANCE  OPEN_MODE  PROTECTION_MODE      PROTECTION_LEVEL     SWITCHOVER_STATUS
    ---------------- --------- ---------- -------------------- -------------------- --------------------
    PRIMARY          FORUMPRD  READ WRITE MAXIMUM PERFORMANCE  MAXIMUM PERFORMANCE  SESSIONS ACTIVE

    SQL> SELECT PROCESS, STATUS, THREAD#, SEQUENCE#, BLOCK#, BLOCKS FROM
    V$MANAGED_STANDBY;

    PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK#     BLOCKS
    --------- ------------ ---------- ---------- ---------- ----------
    ARCH      CLOSING               1       3985     180225       1029
    ARCH      OPENING               1       3840     104449       1962

    col dest_id format 99
    col recovery_mode format a10
    SELECT DEST_ID, ARCHIVED_THREAD#, ARCHIVED_SEQ#, APPLIED_THREAD#,
           APPLIED_SEQ#, RECOVERY_MODE FROM V$ARCHIVE_DEST_STATUS;

    DEST_ID ARCHIVED_THREAD# ARCHIVED_SEQ# APPLIED_THREAD# APPLIED_SEQ# RECOVERY_M
    ------- ---------------- ------------- --------------- ------------ ----------
          1                1          3985               0            0 IDLE
          2                1          3838               1         3837 UNKNOWN
          3                0             0               0            0 IDLE
          4                0             0               0            0 IDLE
          5                0             0               0            0 IDLE
          6                0             0               0            0 IDLE
          7                0             0               0            0 IDLE
          8                0             0               0            0 IDLE
          9                0             0               0            0 IDLE
         10                0             0               0            0 IDLE

    10 rows selected.

    SELECT REGISTRAR, CREATOR, THREAD#, SEQUENCE#, FIRST_CHANGE#, NEXT_CHANGE#,
           APPLIED FROM V$ARCHIVED_LOG order by FIRST_CHANGE#;

    REGISTR CREATOR    THREAD#  SEQUENCE# FIRST_CHANGE# NEXT_CHANGE# APP
    ------- ------- ---------- ---------- ------------- ------------ ---
    ARCH    ARCH             1       3972    9.9047E+12   9.9047E+12 NO
    FGRD    FGRD             1       3973    9.9047E+12   9.9047E+12 NO
    ARCH    ARCH             1       3974    9.9047E+12   9.9047E+12 NO
    ARCH    ARCH             1       3975    9.9047E+12   9.9047E+12 NO
    ARCH    ARCH             1       3976    9.9047E+12   9.9047E+12 NO
    FGRD    FGRD             1       3977    9.9047E+12   9.9047E+12 NO
    ARCH    ARCH             1       3978    9.9047E+12   9.9047E+12 NO
    ARCH    ARCH             1       3979    9.9047E+12   9.9047E+12 NO
    ARCH    ARCH             1       3980    9.9047E+12   9.9047E+12 NO
    ARCH    ARCH             1       3981    9.9047E+12   9.9047E+12 NO
    ARCH    ARCH             1       3982    9.9047E+12   9.9047E+12 NO
    ARCH    ARCH             1       3983    9.9047E+12   9.9047E+12 NO
    ARCH    ARCH             1       3984    9.9047E+12   9.9047E+12 NO
    ARCH    ARCH             1       3985    9.9047E+12   9.9047E+12 NO

    685 rows selected.

    SELECT THREAD#, SEQUENCE#, FIRST_CHANGE#, FIRST_TIME, NEXT_CHANGE# FROM
           V$LOG_HISTORY order by FIRST_TIME;

       THREAD#  SEQUENCE# FIRST_CHANGE# FIRST_TIME      NEXT_CHANGE#
    ---------- ---------- ------------- --------------- ------------
             1       3966    9.9047E+12 18-JUN-07         9.9047E+12
             1       3967    9.9047E+12 18-JUN-07         9.9047E+12
             1       3968    9.9047E+12 18-JUN-07         9.9047E+12
             1       3969    9.9047E+12 18-JUN-07         9.9047E+12
             1       3970    9.9047E+12 18-JUN-07         9.9047E+12
             1       3971    9.9047E+12 18-JUN-07         9.9047E+12
             1       3972    9.9047E+12 18-JUN-07         9.9047E+12
             1       3973    9.9047E+12 18-JUN-07         9.9047E+12
             1       3974    9.9047E+12 18-JUN-07         9.9047E+12
             1       3975    9.9047E+12 18-JUN-07         9.9047E+12
             1       3976    9.9047E+12 18-JUN-07         9.9047E+12
             1       3977    9.9047E+12 18-JUN-07         9.9047E+12
             1       3978    9.9047E+12 18-JUN-07         9.9047E+12
             1       3979    9.9047E+12 18-JUN-07         9.9047E+12
             1       3980    9.9047E+12 19-JUN-07         9.9047E+12
             1       3981    9.9047E+12 19-JUN-07         9.9047E+12
             1       3982    9.9047E+12 19-JUN-07         9.9047E+12
             1       3983    9.9047E+12 19-JUN-07         9.9047E+12
             1       3984    9.9047E+12 19-JUN-07         9.9047E+12
             1       3985    9.9047E+12 19-JUN-07         9.9047E+12

    680 rows selected.

    *** 06/19/07 10:57 am ***
    On Standby:
    log_archive_config   string
    log_archive_dest     string
    log_archive_dest_1   string  LOCATION=/oraarch/FORUMPRD/ MANDATORY REOPEN=60
                                 VALID_FOR=(ALL_LOGFILES,ALL_ROLES)
                                 DB_UNIQUE_NAME=FORUMPRD

    SQL> SELECT DATABASE_ROLE, DB_UNIQUE_NAME INSTANCE, OPEN_MODE,
      2         PROTECTION_MODE, PROTECTION_LEVEL, SWITCHOVER_STATUS
      3  FROM V$DATABASE;

    DATABASE_ROLE    INSTANCE  OPEN_MODE  PROTECTION_MODE      PROTECTION_LEVEL     SWITCHOVER_STATUS
    ---------------- --------- ---------- -------------------- -------------------- --------------------
    PHYSICAL STANDBY FORUMSTB  MOUNTED    MAXIMUM PERFORMANCE  MAXIMUM PERFORMANCE  SESSIONS ACTIVE

    SQL> SELECT PROCESS, STATUS, THREAD#, SEQUENCE#, BLOCK#, BLOCKS FROM
    V$MANAGED_STANDBY;

    PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK#     BLOCKS
    --------- ------------ ---------- ---------- ---------- ----------
    ARCH      CONNECTED             0          0          0          0
    ARCH      CONNECTED             0          0          0          0
    MRP0      WAIT_FOR_LOG          1       3974          0          0

    col dest_id format 99
    col recovery_mode format a10
    SELECT DEST_ID, ARCHIVED_THREAD#, ARCHIVED_SEQ#, APPLIED_THREAD#,
           APPLIED_SEQ#, RECOVERY_MODE FROM V$ARCHIVE_DEST_STATUS;

    DEST_ID ARCHIVED_THREAD# ARCHIVED_SEQ# APPLIED_THREAD# APPLIED_SEQ# RECOVERY_M
    ------- ---------------- ------------- --------------- ------------ ----------
          1                0             0               0            0 MANAGED
          2                0             0               0            0 MANAGED
          3                0             0               0            0 MANAGED
          4                0             0               0            0 MANAGED
          5                0             0               0            0 MANAGED
          6                0             0               0            0 MANAGED
          7                0             0               0            0 MANAGED
          8                0             0               0            0 MANAGED
          9                0             0               0            0 MANAGED
         10                0             0               0            0 MANAGED
         11                0             0               0            0 MANAGED

    11 rows selected.

    SELECT REGISTRAR, CREATOR, THREAD#, SEQUENCE#, FIRST_CHANGE#, NEXT_CHANGE#,
    APPLIED FROM V$ARCHIVED_LOG order by FIRST_CHANGE#;

    no rows selected

    SELECT THREAD#, SEQUENCE#, FIRST_CHANGE#, FIRST_TIME, NEXT_CHANGE# FROM
    V$LOG_HISTORY order by FIRST_TIME;

       THREAD#  SEQUENCE# FIRST_CHANGE# FIRST_TIME           NEXT_CHANGE#
    ---------- ---------- ------------- -------------------- ------------
             1       3965    9.9047E+12 18-JUN-2007 04:39:28   9.9047E+12
             1       3966    9.9047E+12 18-JUN-2007 06:19:30   9.9047E+12
             1       3967    9.9047E+12 18-JUN-2007 07:52:49   9.9047E+12
             1       3968    9.9047E+12 18-JUN-2007 09:37:14   9.9047E+12
             1       3969    9.9047E+12 18-JUN-2007 11:47:56   9.9047E+12
             1       3970    9.9047E+12 18-JUN-2007 11:59:38   9.9047E+12
             1       3971    9.9047E+12 18-JUN-2007 14:27:18   9.9047E+12
             1       3972    9.9047E+12 18-JUN-2007 17:13:19   9.9047E+12
             1       3973    9.9047E+12 18-JUN-2007 20:03:04   9.9047E+12

    680 rows selected.

    The standby is currently waiting for 3974 as seen in the alert.log
    Tue Jun 19 11:38:37 2007
    alter database recover managed standby database disconnect
    Tue Jun 19 11:38:37 2007
    Attempt to start background Managed Standby Recovery process (FORUMPRD)
    MRP0 started with pid=16, OS id=16182
    Tue Jun 19 11:38:37 2007
    MRP0: Background Managed Standby Recovery process started (FORUMPRD)
    Managed Standby Recovery not using Real Time Apply
    parallel recovery started with 7 processes
    Media Recovery Waiting for thread 1 sequence 3974
    Tue Jun 19 11:38:44 2007
    Completed: alter database recover managed standby database disconnect

    Archive sequence 3974 exists on the primary site.
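    Since sequence 3974 exists on the primary, a stopgap while transport is
    broken is to copy the file to the standby host and register it by hand so
    that managed recovery can apply it (a sketch; the file name under
    /oraarch/FORUMPRD/ is a hypothetical example):

    SQL> ALTER DATABASE REGISTER PHYSICAL LOGFILE
      2  '/oraarch/FORUMPRD/1_3974.dbf';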

    *** 06/19/07 10:58 am *** (CHG: SubComp->DATAGUARD_BRKR)
    *** 06/19/07 10:58 am *** (CHG: Asg->NEW OWNER OWNER)
    *** 06/19/07 10:58 am ***
    *** 06/19/07 10:59 am *** (CHG: SubComp->DATAGUARD_TRAN)
    *** 06/19/07 12:58 pm *** (CHG: Sta->30 Asg->NEW OWNER OWNER)
    *** 06/19/07 12:58 pm ***
    *** 06/19/07 12:58 pm *** (CHG: Sta->92)
    *** 06/19/07 01:27 pm *** (CHG: Sta->11)
    *** 06/19/07 01:27 pm ***
    *** 06/20/07 09:34 am *** (CHG: Asg->NEW OWNER OWNER)
    *** 06/20/07 12:41 pm ***
    I suggested RHACHEM change log_archive_max_processes to 5, and his reply
    was "The system is now working properly.".
    *** 06/20/07 01:41 pm ***
    In RHACHEM's environment, log_archive_max_processes was set to 2, and it
    seems one of the ARCH processes was hung, so the logs couldn't be shipped
    to the standby.
    I saw a bunch of messages from forumprd_arc0_31738.trc:
    tkcrrsarc: (WARN) Failed to find ARCH for message (message:0xa)
    tkcrrpa: (WARN) Failed initial attempt to send ARCH message (message:0xa)

    *** 06/20/07 02:26 pm ***
    ARC1's trace file ends 05Jun2007, but the process is still alive. File uploaded.
    *** 06/20/07 05:31 pm ***
    Process state for ARC1 uploaded.
    *** 06/21/07 07:23 am ***
    The bug is understood now.
    From forumprd_arc1_31742.trc.processstate: "ARCH wait on SENDREQ for
    673645 seconds." ARC1 is stuck waiting for a reply from the standby.

    Steve has a txn which will kill ARCH after some timeout, and he is going
    to merge his txn in 11.2. Since his txn will fix this bug, I'm assigning
    it to him.
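    For reference, the change that resolved the reporter's case is a single
    dynamic parameter, which can be raised online (valid on 9i and 10g):

    SQL> ALTER SYSTEM SET log_archive_max_processes = 5;

    With only two archivers configured, one hung ARCH left almost no
    headroom; raising the count lets a healthy archiver take over the
    shipping.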
