Our team deleted some archive logs by mistake. We rolled the database forward with an RMAN incremental recovery to an SCN, then did a manual recovery to sync it with the primary. Managed recovery is now failing.
alter database recover managed standby database disconnect
The alert log has:
Fri Jan 22 13:50:22 2010
Attempt to start background Managed Standby Recovery process
MRP0 started with pid=12
MRP0: Background Managed Standby Recovery process started
Media Recovery Waiting for thread 1 seq# 193389
Fetching gap sequence for thread 1, gap sequence 193389-193391
Trying FAL server: ITS
Fri Jan 22 13:50:28 2010
Completed: alter database recover managed standby database d
Fri Jan 22 13:53:25 2010
Failed to request gap sequence. Thread #: 1, gap sequence: 193389-193391
All FAL server has been attempted.
Managed recovery was working earlier today after the RMAN incremental recovery and resolved two gaps automatically, but it now appears hung, with the standby falling behind the primary.
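To see at a glance which sequences the standby is still waiting on, the gap lines can be pulled out of the alert log. The following is only a minimal sketch: the function name gap_ranges is made up for illustration, and it assumes the alert log wording matches the excerpt above ("gap sequence" followed by a low-high range).

```shell
#!/bin/sh
# gap_ranges: read alert log text on stdin and print each distinct
# "low-high" gap range once. Hypothetical helper, not an Oracle tool.
# Assumption: gap lines look like the ones quoted in the alert log above.
gap_ranges() {
    sed -n 's/.*gap sequence:\{0,1\} \([0-9][0-9]*-[0-9][0-9]*\).*/\1/p' | sort -u
}

# Example call (the alert log path is a placeholder, not from the post):
# gap_ranges < /path/to/alert_ITS.log
```

If the same range keeps appearing after each FAL attempt, the gap is not being resolved and the missing logs have to come from the primary some other way.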
SQL> show parameter fal

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fal_client                           string      ITS_STBY
fal_server                           string      ITS

[v08k608:ITS:oracle]$ tnsping ITS_STBY
TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:01:17
Copyright (c) 1997 Oracle Corporation. All rights reserved.
Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= v08k608.am.mot.com)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (10 msec)

[v08k608:ITS:oracle]$ tnsping ITS
TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:01:27
Copyright (c) 1997 Oracle Corporation. All rights reserved.
Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= 187.10.68.75)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (320 msec)

Primary has:

SQL> show parameter log_archive_dest_2

log_archive_dest_2                   string      SERVICE=DRITS_V08K608 reopen=6
                                                 0 max_failure=10 net_timeout=1
                                                 80 LGWR ASYNC=20480 OPTIONAL
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_2             string      ENABLE

[ITS]/its15/oradata/ITS/arch> tnsping DRITS_V08K608
TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:03:24
Copyright (c) 1997 Oracle Corporation. All rights reserved.
Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= 10.177.13.57)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (330 msec)
The ARCH processes on the primary database may be hung because of the bug below, in which case they cannot ship the missing archive log files to the standby database.
BUG 6113783 ARC PROCESSES CAN HANG INDEFINITELY ON NETWORK
[ Not published so not viewable in My Oracle Support ]
Fixed 11.2, 10.2.0.5 patchset
We can work around the issue by killing the ARCH processes on the primary site. They are respawned automatically and immediately, without harming the primary database.
[maclean@rh2 ~]$ ps -ef|grep arc
maclean   8231     1  0 22:24 ?        00:00:00 ora_arc0_PROD
maclean   8233     1  0 22:24 ?        00:00:00 ora_arc1_PROD
maclean   8350  8167  0 22:24 pts/0    00:00:00 grep arc
[maclean@rh2 ~]$ kill -9 8231 8233
[maclean@rh2 ~]$ ps -ef|grep arc
maclean   8389     1  0 22:25 ?        00:00:00 ora_arc0_PROD
maclean   8391     1  1 22:25 ?        00:00:00 ora_arc1_PROD
maclean   8393  8167  0 22:25 pts/0    00:00:00 grep arc

and the alert log will have:

Fri Jul 30 22:25:27 EDT 2010
ARCH: Detected ARCH process failure
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=26, OS id=8389
Fri Jul 30 22:25:27 EDT 2010
ARC0: Archival started
ARC1: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=27, OS id=8391
Fri Jul 30 22:25:27 EDT 2010
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri Jul 30 22:25:27 EDT 2010
ARC1: Becoming the heartbeat ARCH
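Picking the PIDs by eye from "ps -ef | grep arc" is easy to get wrong on a host that runs several instances. A minimal sketch of a stricter selection follows. The function name arch_pids is hypothetical, and it assumes the process name is the last field of each ps line, as in the listing above. It only prints PIDs and kills nothing. On older Solaris releases, nawk or /usr/xpg4/bin/awk may be needed in place of awk for the -v option.

```shell
#!/bin/sh
# arch_pids SID: read "ps -ef" style lines on stdin and print the PID
# (second column) of every archiver process named ora_arcN_<SID>.
# Hypothetical helper. Assumption: the process name is the last field.
arch_pids() {
    awk -v sid="$1" '$NF ~ ("^ora_arc[0-9a-z]_" sid "$") { print $2 }'
}

# Dry run: list the archiver PIDs for instance PROD, without killing them.
# ps -ef | arch_pids PROD
```

Because the match is anchored to the full process name and the SID, the grep process itself and archivers of other instances are left out. Review the printed PIDs before passing them to kill.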
In fact, in 10g, as long as we do not kill one of the fatal background processes, Oracle will respawn every nonfatal process that dies.
For example:
[maclean@rh2 ~]$ ps -ef|grep ora_|grep -v grep
maclean  14264     1  0 23:16 ?        00:00:00 ora_pmon_PROD
maclean  14266     1  0 23:16 ?        00:00:00 ora_psp0_PROD
maclean  14268     1  0 23:16 ?        00:00:00 ora_mman_PROD
maclean  14270     1  0 23:16 ?        00:00:00 ora_dbw0_PROD
maclean  14272     1  0 23:16 ?        00:00:00 ora_lgwr_PROD
maclean  14274     1  0 23:16 ?        00:00:00 ora_ckpt_PROD
maclean  14276     1  0 23:16 ?        00:00:00 ora_smon_PROD
maclean  14278     1  0 23:16 ?        00:00:00 ora_reco_PROD
maclean  14338     1  0 23:16 ?        00:00:00 ora_arc0_PROD
maclean  14340     1  0 23:16 ?        00:00:00 ora_arc1_PROD
maclean  14452     1  0 23:17 ?        00:00:00 ora_s000_PROD
maclean  14454     1  0 23:17 ?        00:00:00 ora_d000_PROD
maclean  14456     1  0 23:17 ?        00:00:00 ora_cjq0_PROD
maclean  14458     1  0 23:17 ?        00:00:00 ora_qmnc_PROD
maclean  14460     1  0 23:17 ?        00:00:00 ora_mmon_PROD
maclean  14462     1  0 23:17 ?        00:00:00 ora_mmnl_PROD
maclean  14467     1  0 23:17 ?        00:00:00 ora_q000_PROD
maclean  14568     1  0 23:18 ?        00:00:00 ora_q001_PROD
[maclean@rh2 ~]$ ps -ef|grep ora_|grep -v pmon|grep -v ckpt |grep -v lgwr|grep -v smon|grep -v grep|grep -v dbw|grep -v psp|grep -v mman |grep -v rec|awk '{print $2}'|xargs kill -9

and the alert log will have:

Fri Jul 30 23:20:58 EDT 2010
ARCH: Detected ARCH process failure
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=20, OS id=14959
Fri Jul 30 23:20:58 EDT 2010
ARC0: Archival started
ARC1: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
Fri Jul 30 23:20:58 EDT 2010
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC1 started with pid=21, OS id=14961
ARC1: Becoming the heartbeat ARCH
Fri Jul 30 23:21:29 EDT 2010
found dead shared server 'S000', pid = (10, 3)
found dead dispatcher 'D000', pid = (11, 3)
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process CJQ0
Restarting dead background process QMNC
CJQ0 started with pid=12, OS id=15124
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process MMON
QMNC started with pid=13, OS id=15126
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process MMNL
MMON started with pid=14, OS id=15128
MMNL started with pid=16, OS id=15132

That's all right!
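The long chain of grep -v filters above can be written as one awk filter that matches whole process names instead of substrings. This is only a sketch. The function name nonfatal_pids is hypothetical, and the list of processes it spares (pmon, psp, mman, dbw, lgwr, ckpt, smon, reco) is simply the set excluded by the pipeline above, not an authoritative list from Oracle. It prints PIDs and kills nothing.

```shell
#!/bin/sh
# nonfatal_pids SID: read "ps -ef" style lines on stdin and print the PIDs
# of ora_*_<SID> processes that are NOT in the spared list below.
# Hypothetical helper. The spared list mirrors the grep -v chain in the
# post. Assumption: the process name is the last field of each line.
nonfatal_pids() {
    awk -v sid="$1" '
        $NF ~ ("^ora_[a-z0-9]+_" sid "$") {
            name = $NF
            sub(/^ora_/, "", name)        # strip the ora_ prefix
            sub("_" sid "$", "", name)    # strip the _<SID> suffix
            # Skip the processes the original pipeline left alone.
            if (name ~ /^(pmon|psp[0-9]|mman|dbw[0-9a-z]|lgwr|ckpt|smon|reco)$/) next
            print $2
        }'
}

# Dry run on a test instance only:
# ps -ef | nonfatal_pids PROD
```

Matching the exact name avoids accidental exclusions or inclusions that substring filters such as "grep -v rec" can cause when another process name happens to contain the same letters.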