The RDBMS has reported ASM communication errors that lead to an instance crash. This has happened a couple of times in the last few months. Two kinds of ASM communication error have occurred:

WARNING: ASM communication error: op 17 state 0x40 (21561)
WARNING: ASM communication error: op 0 state 0x0 (15055)
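To see how often each of these warnings shows up, the alert log can be scanned for them. The following is only a minimal Python sketch: the regular expression is derived from the two warning lines quoted above, and the names `ASM_WARNING`, `find_asm_warnings` and `count_by_code` are hypothetical helpers for illustration, not part of any Oracle tool. It assumes the alert log has already been read as plain text lines.

```python
import re
from collections import Counter

# Pattern built from the two warning lines quoted in this post.
# The group names (op, state, code) are our own labels, not Oracle terminology.
ASM_WARNING = re.compile(
    r"WARNING: ASM communication error: "
    r"op (?P<op>\d+) state (?P<state>0x[0-9a-fA-F]+) \((?P<code>\d+)\)"
)


def find_asm_warnings(lines):
    """Return an (op, state, code) tuple for every matching warning line."""
    hits = []
    for line in lines:
        match = ASM_WARNING.search(line)
        if match:
            hits.append(
                (int(match.group("op")), match.group("state"), int(match.group("code")))
            )
    return hits


def count_by_code(lines):
    """Count warnings per trailing code, e.g. 21561 or 15055."""
    return Counter(code for _op, _state, code in find_asm_warnings(lines))


if __name__ == "__main__":
    # Sample input: the two warning lines from this post plus an unrelated line.
    sample = [
        "WARNING: ASM communication error: op 17 state 0x40 (21561)",
        "ORA-21561: OID generation failed",
        "WARNING: ASM communication error: op 0 state 0x0 (15055)",
    ]
    for code, count in sorted(count_by_code(sample).items()):
        print(code, count)
```

For a real alert log, the `sample` list would be replaced by the lines of the log file. Grouping by the trailing code makes it easier to tell whether the 21561 and 15055 variants appear together or separately.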
We are seeing this kind of crash frequently, and it disrupts the service and availability of this critical production database.

DIAGNOSTIC ANALYSIS:
--------------------
This is a 4-node RAC database. The last time, the issue occurred on instance 1.

Diagnostics
Time frame to focus on:
===========================================
Wed Feb 27 10:29:50 2013 <==== ariesprd1 communication failure with ASM reported
WARNING: ASM communication error: op 17 state 0x40 (21561)
..
..
WARNING: ASM communication error: op 0 state 0x0 (15055)
..
Wed Feb 27 12:56:04 2013
Errors in file D:\ORABASE\diag\rdbms\ariesprd\ariesprd1\trace\ariesprd1_dbw0_10068.trc:
ORA-21561: OID generation failed
..
..
Wed Feb 27 12:56:04 2013 <===== leading to instance crash
System state dump requested by (instance=1, osid=10068 (DBW0)), summary=[abnormal instance termination].
System State dumped to trace file D:\ORABASE\diag\rdbms\ariesprd\ariesprd1\trace\ariesprd1_diag_6420.trc
DBW0 (ospid: 10068): terminating the instance due to error 63997

WORKAROUND:
-----------
Generally, the instance crash resolves the issue, but the last time it led to a block recovery problem (a kind of logical corruption) that caused all four nodes to hang indefinitely. The problem leaves the system in a kind of hang until the database instance ultimately crashes. The last crash caused a block recovery issue, and in the end we had to deploy DUL to retrieve the data.

For more Oracle DUL (data unloader) information, refer to http://parnassusdata.com/en/emergency-services