The gc buffer busy Wait Event

This wait, known as global cache buffer busy prior to Oracle 10g, records the time a session spends waiting for a requested data block that is already being accessed locally, whether on behalf of this instance or a remote one. It is very similar to the buffer busy waits event in a single-instance database.

Problem

High wait times on this event are often the result of:

Hot Blocks – multiple sessions may be requesting a block that is either not in buffer cache or is in an incompatible mode.

Inefficient Queries – as with the gc cr request wait event, the more blocks requested from the buffer cache, the greater the likelihood that a session will have to wait for other sessions.

Solutions

Deleting some of the hot rows and re-inserting them into the table may alleviate a hot-block problem. Most of the time the rows will be placed into different blocks, reducing contention on the original block. The DBA may also need to adjust the PCTFREE and/or PCTUSED settings for the table so that fewer rows end up in each block.

Tuning queries to access fewer blocks in buffer cache will often result in less contention for the same block.

The DFS lock handle Wait Event

The session is waiting for the lock handle of a global lock request (DFS historically stands for Distributed File System). It is most common in Oracle Parallel Server / RAC configurations.


Solutions

During a period of wait, use the following SQL statement to retrieve the name of the lock and the mode of the lock request:

select chr(bitand(p1,-16777216)/16777215) ||
       chr(bitand(p1, 16711680)/65535) "Lock",
       to_char(bitand(p1, 65535)) "Mode",
       p2, p3, seconds_in_wait
from v$session_wait
where event = 'DFS lock handle';
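The bit layout the query decodes (two lock-name characters in the top two bytes of p1, the request mode in the low two bytes) can be reproduced outside the database. A minimal Python sketch; the sample p1 value is constructed for illustration, not taken from a real system:

```python
def decode_dfs_p1(p1):
    """Split a DFS lock handle p1 value into lock name and mode,
    mirroring the bitand arithmetic of the SQL above."""
    name = chr((p1 >> 24) & 0xFF) + chr((p1 >> 16) & 0xFF)  # top two bytes
    mode = p1 & 0xFFFF                                      # low two bytes
    return name, mode

# A CI (cross-instance call) lock requested in mode 5 would encode as:
p1 = (ord('C') << 24) | (ord('I') << 16) | 5
print(decode_dfs_p1(p1))  # ('CI', 5)
```

This mirrors the follow-up gv$lock query below, where a decoded 'CI' lock is looked up by type.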


Then use the following to identify the requesting session and the blocking session:

select inst_id, sid, type, id1, id2, lmode, request, block
from gv$lock
where type = 'CI' and id1 = 9 and id2 = 5;


In this example, the where-clause values come from the first query: type is the lock name returned in the "Lock" column, id1 is p2, and id2 is p3.


Waits of this type generally indicate competition between distributed nodes for data.

Understanding Oracle RAC Heartbeats

Oracle CRS heartbeats fall into two main categories, the disk heartbeat and the network heartbeat, and a timeout on each kind affects the RAC cluster differently.

Heartbeat Mechanisms

Disk heartbeat ( voting device ) – IOT
Network heartbeat ( across the interconnect ) – misscount
misscount – the maximum time a network heartbeat can be missed before the cluster enters a reconfiguration to evict the node
IOT – internal I/O timeout, the number of seconds within which an I/O to the voting disk must complete
IOT = misscount [ during initial cluster formation ]
IOT = disktimeout [ at all other times ]

Hardware Storage timeout
Software Storage timeout
OCFS Heartbeat timeouts

disktimeout > max(O2CB_HEARTBEAT_THRESHOLD * 2, HW_STORAGE_TIMEOUT, SW_STORAGE_TIMEOUT)
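The constraint above can be checked mechanically. A minimal Python sketch of the lower bound; the sample values are invented for illustration, real values come from the OCFS2 configuration and the storage stack:

```python
def min_disktimeout(o2cb_heartbeat_threshold, hw_storage_timeout, sw_storage_timeout):
    """Smallest value that disktimeout must strictly exceed, per
    disktimeout > max(O2CB_HEARTBEAT_THRESHOLD * 2, HW_STORAGE_TIMEOUT, SW_STORAGE_TIMEOUT)."""
    return max(o2cb_heartbeat_threshold * 2, hw_storage_timeout, sw_storage_timeout)

# Hypothetical example: OCFS2 threshold 31, HW timeout 90 s, SW timeout 120 s
floor = min_disktimeout(31, 90, 120)
print(floor)  # disktimeout must be set strictly above this value
```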

[Figure: oracle_crs_heartbeats]

How to Uninstall/Reinstall 10g CRS Clusterware?

How do you reinstall the 10g Clusterware?

Normally I would say: stop the database and CRS, manually delete ORA_CRS_HOME and the ora* configuration files under /etc (or /opt), and use the dd command to wipe the headers of the LUNs; after that a reinstall usually succeeds. In practice, however, problems often arise because the cleanup was not thorough enough.

Today a colleague asked me whether I had a document for completely uninstalling CRS. I searched MetaLink and found nothing, but eventually located a fairly authoritative document on Alejandro Vargas' Blog, which I am sharing here.

An Oracle RAC Internal Error: a Case of ORA-00600 [kjbmprlst:shadow]

On a four-node 11.2.0.1 RAC system on Linux x86-64, an LMS GCS service process hit the internal error ORA-00600 [kjbmprlst:shadow], causing the node's instance to terminate unexpectedly. The log is as follows:

Fri Jul 08 02:04:43 2011
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc  (incident=1011732):
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/PROD/PROD1/incident/incdir_1011732/PROD1_lms1_536_i1011732.trc
Fri Jul 08 02:04:44 2011
Trace dumping is performing id=[cdmp_20110708020444]
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_lms1_536.trc:
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
LMS1 (ospid: 536): terminating the instance due to error 484
Fri Jul 08 02:04:45 2011
opiodr aborting process unknown ospid (27387) as a result of ORA-1092
System state dump is made for local instance
System State dumped to trace file /u01/app/oracle/diag/rdbms/PROD/PROD1/trace/PROD1_diag_513.trc
Fri Jul 08 02:04:54 2011
Termination issued to instance processes. Waiting for the processes to exit
Fri Jul 08 02:04:58 2011
ORA-1092 : opitsk aborting process

This ORA-00600 [kjbmprlst:shadow] error maps to Bug 10121589 or Bug 9458781 on 11.2.0.1:

Bug 10121589  ORA-600 [kjbmprlst:shadow] can occur in RAC
Affects:

    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected 	Versions BELOW 12.1
    Versions confirmed as being affected 	

        11.2.0.1 

    Platforms affected	Generic (all / most platforms affected)

Fixed:

    This issue is fixed in	

        12.1 (Future Release)
        11.2.0.2 Bundle Patch 2 for Exadata Database
        11.2.0.1 Bundle Patch 7 for Exadata Database 

Symptoms:

Related To:

    Internal Error May Occur (ORA-600)
    ORA-600 [kjbmprlst:shadow] 

    RAC (Real Application Clusters) / OPS 

Description

    An ORA-600 [kjbmprlst:shadow] can occur if the fix for bug 9979039
    is present.

    Note:
     One off patches for 10200390 should also include this fix.

Bug 9458781  Missing close message to master leaves closed lock dangling crashing the instance with assorted Internal error

Affects:

    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected 	Versions >= 11.2.0.1 but BELOW 11.2.0.2
    Versions confirmed as being affected 	

        11.2.0.1 

    Platforms affected	Generic (all / most platforms affected)

Fixed:

    This issue is fixed in	

        11.2.0.2 (Server Patch Set)
        11.2.0.1 Bundle Patch 4 for Exadata Database 

Symptoms:

Related To:

    Instance May Crash
    Internal Error May Occur (ORA-600)
    ORA-600 [KJBMPRLST:SHADOW]
    ORA-600 [KJBMOCVT:RID]
    ORA-600 [KJBRREF:PKEY]
    ORA-600 [KJBRASR:PKEY] 

    RAC (Real Application Clusters) / OPS 

Description

    A lock is closed without sending a message to the master.
    This causes closed lock dangling at the master crashing the instance with different internal errors.

    Reported internal errors so far are :
    - KJBMPRLST:SHADOW
    - KJBMOCVT:RID
    - KJBRREF:PKEY
    - KJBRASR:PKEY

The internal function kjbmprlst manages kjbm shadow lock information (/libserver10.a/kjbm.o). The defect is that a closed lock is not promptly messaged to the master node. Apart from applying the patch there is currently no verified workaround (disabling DRM appears to be ineffective). Diagnostics that can help track the problem:

oradebug lkdebug (track resources, take dumps)
KCL history
KJBL history
KJL history

PCM (GCS) and non-PCM (GES) resources are kept separate and use separate code paths.
GES:
Resource table: kjr and kjrt
Lock table: kjlt
Processes: kjpt
GCS:
Resource table: kjbr
Lock table: kjbl

DLM Structures (continued)
/* PCM resource structure */
typedef struct kjbr {                                /* 68 bytes on sun4u */
  kjsolk       hash_q_kjbr;                             /* hash list : hp */
  ub4          resname_kjbr[2];	                     /* the resource name */
  kjsolk       scan_q_kjbr; /* chain to lmd scan q of grantable resources */
  kjsolk       grant_q_kjbr;                 /* list of granted resources */
  kjsolk       convert_q_kjbr;       /* list of resources being converted */
  ub4          diskscn_bas_kjbr;         /* scn(base) known to be on disk */
  ub2          diskscn_wrap_kjbr;        /* scn(wrap) known to be on disk */
  ub2          writereqscn_wrap_kjbr;    /* scn(wrap) requested for write */
  ub4          writereqscn_bas_kjbr;     /* scn(base) requested for write */
  struct kjbl *sender_kjbr;                 /* lock elected to send block */
  ub2          senderver_kjbr;                  /* version# of above lock */
  ub2          writerver_kjbr;                  /* version# of lock below */
  struct kjbl *writer_kjbr;                /* lock elected to write block */
  ub1          mode_role_kjbr; /* one of 'n', 's', 'x' && one of 'l' or 'g' */
  ub1          flags_kjbr;                        /* ignorewip, free etc. */
  ub1          rfpcount_kjbr;                      /* refuse ping counter */
  ub1          history_kjbr;                /* resource operation history */
  kxid         xid_kjbr;                          /* split transaction ID */
} kjbr ;

/* kjbl - PCM lock structure
** Clients and most of the DLM will use the KJUSER* or KJ_* modes and kscns  */

typedef struct kjbl {                                /* 52 bytes on sun4u */
  union {                     /* discriminate lock@master and lock@client */
    struct {                                           /* for lock@master */
      kgglk        state_q_kjbl;             /* link to chain to resource */
      kjbopqi     *rqinfo_kjbl;                             /* target bid */
      struct kjbr *resp_kjbl;                   /* pointer to my resource */
    } kjbllam;                                 /* KJB Lock Lock At Master */
    struct {                                           /* for lock@client */
      ub4         disk_base_kjbl;        /* disk version(base) for replay */
      ub2         disk_wrap_kjbl;        /* disk version(wrap) for replay */
      ub1         master_node_kjbl;                   /* master instance# */
      ub1         client_flag_kjbl;     /* flags specific to client locks */
      ub2         update_seq_kjbl;               /* last update to master */
    } kjbllac;                                 /* KJB Lock Lock At Client */
  } kjblmcd;                        /* KJB Lock Master Client Discrimnant */
  void  *remote_lockp_kjbl;           /* pointer to client lock or shadow */
  ub2    remote_ver_kjbl;                         /* remote lock version# */
  ub2        ver_kjbl;                                     /* my version# */
  ub2        msg_seq_kjbl;                         /* client->master seq# */
  ub2        reqid_kjbl;                         /* requestid for convert */
  ub2        creqid_kjbl; /* requestid for convert that has been cancelled */
  ub2        pi_wrap_kjbl;                     /* scn(wrap) of highest pi */
  ub4        pi_base_kjbl;                     /* scn(base) of highest pi */
  ub1        mode_role_kjbl; /* one of 'n', 's', 'x' && one of 'l' or 'g' */
  ub1        state_kjbl;       /* _L|_R|_W|_S, notify, which q, lock type */
  ub1        node_kjbl;                       /* instance lock belongs to */
  ub1        flags_kjbl;                                /* lock flag bits */
  ub2        rreqid_kjbl;                               /* save the reqid */
  ub2         write_wrap_kjbl;        /* last write request version(wrap) */
  ub4         write_base_kjbl;        /* last write request version(base) */
  ub4         history_kjbl;                     /* lock operation history */
} kjbl;

PCM DLM locks that are owned by the local instance are allocated and embedded in an LE structure.
PCM DLM locks that are owned by remote instances and mastered by the local instance are allocated in SHARED_POOL.

PCM Locks and Resources
Fields of interest in the kclle structure: kcllerls or releasing; kcllelnm or name(id1,id2);
kcllemode or held-mode; kclleacq or acquiring; kcllelck or DLM lock.

Fields of interest in the kjbr structure: resname_kjbr[2], the resource name; grant_q_kjbr, the grant queue; convert_q_kjbr, the convert queue; and mode_role_kjbr, which is a bitwise merge of grant mode and role:
mode NULL (0x00), S (0x01), X (0x02); role L0 local (0x00), G0 global without PI (0x08), G1 global with PI (0x18).

The field mode_role_kjbl in kjbl is a bitwise merge of grant mode, role, and request mode: 0x00 if grant NULL;
0x01 if grant S; 0x02 if grant X; 0x04 if the lock has been opened at the master; 0x08 if global role (otherwise local);
0x10 if it has one or more PIs; 0x20 if request CR; 0x40 if request S; 0x80 if request X.
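The bit definitions above can be turned into a small decoder for inspecting raw mode_role_kjbl values. A minimal Python sketch; the masks follow the list above, while the dictionary layout and the sample value are mine:

```python
def decode_mode_role_kjbl(b):
    """Decode the mode_role_kjbl bit field as described above."""
    grant = 'X' if b & 0x02 else ('S' if b & 0x01 else 'N')
    return {
        'grant': grant,
        'opened_at_master': bool(b & 0x04),
        'role': 'global' if b & 0x08 else 'local',
        'has_pi': bool(b & 0x10),
        'request': ('CR' if b & 0x20 else
                    'S'  if b & 0x40 else
                    'X'  if b & 0x80 else None),
    }

# Example: grant S, global role with a PI, no pending request -> 0x01 | 0x08 | 0x10
print(decode_mode_role_kjbl(0x19))
```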

Someone has to keep a list of all buffers and where they are mastered. This list is called the Global Resource Directory (GRD), and it is present on all instances of the cluster.
To find out the master of a block:
select b.dbablk, r.kjblmaster master_node
from x$le l, x$kjbl r, x$bh b
where b.obj = <data_object_id>   -- placeholder: supply the data object id of interest
and b.le_addr = l.le_addr
and l.le_kjbl = r.kjbllockp;

Oracle Support states that the bug is fixed by the 11.2.0.2 Server Patch Set and by 11.2.0.1 Bundle Patch 4 for Exadata Database, but there are indications that this ORA-00600 [kjbmprlst:shadow] error can still occur on 11.2.0.2, and the bug appears more often on RAC systems with more than two nodes.


The Oracle Wait Event KJC: Wait for msg sends to complete

"KJC: Wait for msg sends to complete" is a wait event that occurs only in RAC environments, and little is currently documented about it.
KJ stands for the Kernel Lock Management layer, and KJC for its Messaging Enhancements component.
The event is presumably a wait on the network transmission of global enqueue messages; both foreground processes and background processes such as LMS and LMD can experience it.


select name,parameter1,parameter2,parameter3,wait_class from v$event_name where name like '%KJC%';

NAME                                 PARAMETER1  PARAMETER2  PARAMETER3  WAIT_CLASS
-----------------------------------  ----------  ----------  ----------  ----------
KJC: Wait for msg sends to complete  msg         dest|rcvr   mtype       Other

SQL> select * from gV$system_Event where event like '%KJC%';

   INST_ID EVENT                                                            TOTAL_WAITS
---------- ---------------------------------------------------------------- -----------
TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO TOTAL_WAITS_FG TOTAL_TIMEOUTS_FG
-------------- ----------- ------------ ----------------- -------------- -----------------
TIME_WAITED_FG AVERAGE_WAIT_FG TIME_WAITED_MICRO_FG   EVENT_ID WAIT_CLASS_ID WAIT_CLASS#
-------------- --------------- -------------------- ---------- ------------- -----------
WAIT_CLASS
----------------------------------------------------------------
         2 KJC: Wait for msg sends to complete                                      415
             0           1            0              7638            163                 0
             0               0                 2494 4079369678    1893977003           0
Other

         1 KJC: Wait for msg sends to complete                                       93
             0           0            0              2094             37                 0
             0               0                  615 4079369678    1893977003           0
Other

Note that this event's wait class is Other rather than Concurrency. When it occurs, the first thing to examine is network performance, for example with netstat -s or ifconfig.

The following shows a concrete scenario in which the wait occurs:

PARSING IN CURSOR #140438030304224 len=169 dep=2 uid=0 oct=3 lid=0
tim=1310310630713104 hv=2411583655 ad='b5a49898' sqlid='637txtu7vvq57'
select service_name, service_name_hash,wait_class_id, wait_class#,  wait_class,total_waits, time_waited  from
gv$service_wait_class where inst_id = USERENV('Instance')

END OF STMT
PARSE #140438030304224:c=6999,e=7685,p=0,cr=5,cu=0,mis=1,r=0,dep=2,og=4,plh=809657567,tim=1310310630713101
CLOSE #140438030304224:c=0,e=20,dep=2,type=0,tim=1310310630713388
WAIT #140438029567384: nam='row cache lock' ela= 530 cache id=8 mode=0 request=3 obj#=6623 tim=1310310630716555
EXEC #140438032451672:c=0,e=61,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,plh=2853959010,tim=1310310630716708
FETCH #140438032451672:c=0,e=49,p=0,cr=4,cu=0,mis=0,r=1,dep=2,og=4,plh=2853959010,tim=1310310630716809
CLOSE #140438032451672:c=0,e=10,dep=2,type=3,tim=1310310630716877

WAIT #140438029567384: nam='KJC: Wait for msg sends to complete' ela= 18
msg=3235372280 dest|rcvr=65536 mtype=12 obj#=6623 tim=1310310630719146

EXEC #140438032420328:c=0,e=190,p=0,cr=1,cu=5,mis=0,r=1,dep=1,og=4,plh=0,tim=1310310630781989

WAIT #140438032420328: nam='KJC: Wait for msg sends to complete' ela= 55
msg=3235374760 dest|rcvr=65537 mtype=32 obj#=74418 tim=1310310630782830

The msg value here is presumably an incrementing message id,
mtype is the type of the message,
and obj# is the object number.
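The trace lines above can be parsed mechanically. A minimal Python sketch; the regular expression and the dest/rcvr split into high and low 16 bits (which would make 65536 read as dest 1, rcvr 0) are my assumptions, not a documented format:

```python
import re

# Matches the wait-line fields seen in the trace excerpt above.
WAIT_RE = re.compile(
    r"nam='KJC: Wait for msg sends to complete' ela= (?P<ela>\d+)\s+"
    r"msg=(?P<msg>\d+) dest\|rcvr=(?P<dr>\d+) mtype=(?P<mtype>\d+) obj#=(?P<obj>\d+)")

def parse_kjc_wait(text):
    """Extract the KJC wait parameters from a 10046-style trace line."""
    m = WAIT_RE.search(text)
    if not m:
        return None
    dr = int(m.group('dr'))
    return {
        'ela_us': int(m.group('ela')),
        'msg': int(m.group('msg')),
        'dest': dr >> 16,        # assumed: high 16 bits
        'rcvr': dr & 0xFFFF,     # assumed: low 16 bits
        'mtype': int(m.group('mtype')),
        'obj#': int(m.group('obj')),
    }

line = ("WAIT #140438029567384: nam='KJC: Wait for msg sends to complete' ela= 18 "
        "msg=3235372280 dest|rcvr=65536 mtype=12 obj#=6623 tim=1310310630719146")
print(parse_kjc_wait(line))
```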

Possible stack calls related to KJC: Wait for msg sends to complete:

LMS's stack:
ksedms ksdxfdmp ksdxcb sspuser sigacthandler libc_poll
kslwait ksxprcv kjctr_rksxp kjctsrw kjccwcrm kjccqmg
kjbmsclose kjbclose kclpdc kclpdcl kclpbi kjmsm
ksbrdp opirip opidrv sou2o main start 

 LMS4 was waiting for  "KJC: Wait for msg sends to complete"

  'KJC: Wait for msg sends to complete', at one of the stack trace
  below

  o kjccqmg => kjctcwcrm
  o kjccsmg => kjctcwcrm

o  an Oracle process is spinning in kjctcwcrm() on 9.0, 9.2 or 10.1

o  an Oracle process got ORA-600[kjccgmb:1] on 9.0, 9.2 or 10.1

o Got ORA-7445[] at an unknown function called from kjbmmsg2res()
o A LMS process hung in kjctcwcrm() on shutdown

LMS may hang waiting for 'kjc: wait for msg sends to complete'
during shutdown of an instance with another session
reporting ORA-600 [kjccgmb:1]

DIAGNOSTIC ANALYSIS:
--------------------
Had the customer trace LMON and LMD while doing the shutdown immediate,
and found:
    In sur11_lmon_5948.trc
    WAIT #0: nam='ges LMD to shutdown' ela= 496057 p1=0 p2=0 p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49213 p1=0 p2=0
p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49811 p1=0 p2=0
p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49772 p1=0 p2=0
p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49813 p1=0 p2=0
p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49779 p1=0 p2=0
p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49787 p1=0 p2=0
p3=0
    WAIT #0: nam='KJC: Wait for msg sends to complete' ela= 49779 p1=0 p2=0 

	LMS4's stack:
ksedms ksdxfdmp ksdxcb sspuser sigacthandler libc_poll
kslwait ksxprcv kjctr_rksxp kjctsrw kjccwcrm kjccqmg
kjbmsclose kjbclose kclpdc kclpdcl kclpbi kjmsm
ksbrdp opirip opidrv sou2o main start 

LMS4 encountered ORA-4031 during shutdown:

#0  0x0000000004f5bdbe in kjccqmg ()
#1  0x0000000005172378 in kjbmflushsc ()
#2  0x0000000005175351 in kjbmfto ()
#3  0x0000000004ff425f in kjmsm ()
#4  0x00000000022677a7 in ksbrdp ()
#5  0x000000000248c8a3 in opirip ()
#6  0x000000000171eefd in opidrv ()
#7  0x0000000001d95f5f in sou2o ()
#8  0x0000000000a079ba in opimai_real ()
#9  0x0000000001d9b408 in ssthrdmain ()
#10 0x0000000000a078a1 in main ()

(gdb) bt
#0  0x00000000053907aa in kclpbi ()
#1  0x0000000004ff46dc in kjmsm ()
#2  0x00000000022677a7 in ksbrdp ()
#3  0x000000000248c8a3 in opirip ()
#4  0x000000000171eefd in opidrv ()
#5  0x0000000001d95f5f in sou2o ()
#6  0x0000000000a079ba in opimai_real ()
#7  0x0000000001d9b408 in ssthrdmain ()
#8  0x0000000000a078a1 in main ()

(gdb) bt
#0  0x0000000004f5bdbe in kjccqmg ()
#1  0x00000000051a232a in kjbmswrite ()
#2  0x00000000051014f1 in kjbrswrite ()
#3  0x00000000050fec8d in kjbrwreq ()
#4  0x0000000005189fcc in kjbmpwrite ()
#5  0x0000000004fe98fe in kjmxmpm ()
#6  0x0000000004feceee in kjmpmsgi ()
#7  0x0000000004ff2d8e in kjmsm ()
#8  0x00000000022677a7 in ksbrdp ()
#9  0x000000000248c8a3 in opirip ()
#10 0x000000000171eefd in opidrv ()
#11 0x0000000001d95f5f in sou2o ()
#12 0x0000000000a079ba in opimai_real ()
#13 0x0000000001d9b408 in ssthrdmain ()
#14 0x0000000000a078a1 in main ()

The following lists bugs related to "KJC: Wait for msg sends to complete":

Bug 5075434 (10.2.0.2) – Small performance overhead in RAC: a RAC session can suffer a small performance overhead due to waiting for a timeout on "KJC: Wait for msg sends to complete".

Bug 11687519 (10.2.0.3) – PZ99 process spin with "KJC: Wait for msg sends to complete".

Bug 4672266 (9.2.0.4) – LMS waiting for "KJC: Wait for msg sends to complete" during SHUTDOWN IMMEDIATE; after an hour the customer issued SHUTDOWN ABORT to stop the instance.

Bug 4672266 (9.2.0.4, 9.2.0.7, 10.1.0.5, 10.2.0.2) – LMS may hang waiting for "KJC: Wait for msg sends to complete" during shutdown of an instance while another session reports ORA-600 [kjccgmb:1].

Bug 2447621 (below 9.2) – Unnecessary waits for "KJC: Wait for message completion" can occur in a RAC environment.

Bug 5347566 (9.2.0.7) – SHUTDOWN IMMEDIATE very slow with "KJC: Wait for msg sends to complete"; essentially the same issue as Bug 5289823 and Bug 4672266.

Bug 5084699 (10.2.0.1) – racgimon on one node of a 4-node cluster exhibited a process spin, appearing to wait 15 minutes out of every 20 on "KJC: Wait for msg sends to complete" in the PL/SQL call 'begin dbms_lock.allocate_unique(:a1, :a2); end;'.

Bug 3561040 (9.2.0.4) – Shutdown hang: the LMS process waits for "KJC: Wait for msg sends to complete" during shutdown and the shutdown never completes.

What You May Not Know About SMON (Part 6): Instance Recovery

SMON's duties also include Instance Recovery in RAC. Note that what people usually call "instance recovery" in conversation is largely Crash Recovery, and the two are distinct: recovery after a single instance crashes, or after all nodes of a RAC cluster crash, is Crash Recovery; when one node of a RAC cluster fails and a surviving instance applies the redo of the failed node's thread, that is Instance Recovery. For more on Crash Recovery see the article <还原真实的cache recovery>.

Symptoms

Instance Recovery consists of two recovery stages, cache recovery and GES/GCS remastering, and note that the two stages proceed concurrently. The protagonist of cache recovery is the SMON process on a surviving node, which distributes redo to its slave processes, while GES/GCS remastering is carried out by the RAC-specific LMON process.

The entire Reconfiguration process is shown in the figure below:

[Figure: rac_instance_recovery_reconfiguration]
Note that when the crash is detected the database enters a Partial Availability state; from "Freeze Lockdb" onward there is no availability; when IR applies redo (the roll-forward phase) it returns to Partial Availability; after roll forward completes, rollback is carried out, but by that point the database has already reached Full Availability, as shown below:

[Figure: oracledatabase12g_instance_recovery]

The graphic illustrates the degree of database availability during each step of Oracle instance recovery: 

A.         Real Application Clusters is running on multiple nodes.

B.         Node failure is detected.

C.         The enqueue part of the GRD is reconfigured; resource management is redistributed to the surviving nodes. This operation occurs relatively quickly.

D.        The cache part of the GRD is reconfigured and SMON reads the redo log of the failed instance to identify the database blocks that it needs to recover.

E.         SMON issues the GRD requests to obtain all the database blocks it needs for recovery. After the requests are complete, all other blocks are accessible.

F.         The Oracle server performs roll forward recovery. Redo logs of the failed threads are applied to the database, and blocks are available right after their recovery is completed.

G.        The Oracle server performs rollback recovery. Undo blocks are applied to the database for all uncommitted transactions.

H.        Instance recovery is complete and all data is accessible.

Note: The dashed line represents the blocks identified in step 2 in the previous slide. Also, the dotted steps represent the ones identified in the previous slide.

Let us observe the Instance Recovery process in practice:

INST 1:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

SQL> alter system set event='10426 trace name context forever,level 12' scope=spfile;  -- 10426 event Reconfiguration trace event
System altered.

SQL> startup force;
ORACLE instance started.

INST 2:
SQL> shutdown abort
ORACLE instance shut down.

=============================================================
========================alert.log============================

Reconfiguration started (old inc 4, new inc 6)
List of instances:
 1 (myinst: 1)
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
 parallel recovery started with 2 processes                 -- 2 recovery slaves
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Started redo scan
Completed redo scan
 read 88 KB redo, 82 data blocks need recovery
Started redo application at
 Thread 2: logseq 374, block 2, scn 54624376
Recovery of Online Redo Log: Thread 2 Group 4 Seq 374 Reading mem 0
  Mem# 0: +DATA/prod/onlinelog/group_4.271.747100549
  Mem# 1: +DATA/prod/onlinelog/group_4.272.747100553
Completed redo application of 0.07MB
Completed instance recovery at
 Thread 2: logseq 374, block 178, scn 54646382
 73 data blocks read, 83 data blocks written, 88 redo k-bytes read
Thread 2 advanced to log sequence 375 (thread recovery)
Redo thread 2 internally disabled at seq 375 (SMON)
ARC3: Creating local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_374_747100216.dbf' (thread 2 sequence 374) (PROD1)
Setting Resource Manager plan SCHEDULER[0x310B]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
ARC3: Closing local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_374_747100216.dbf' (PROD1)
2011-06-27 22:19:29.280000 +08:00
Archived Log entry 792 added for thread 2 sequence 374 ID 0x9790ab2 dest 1:
ARC0: Creating local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_375_747100216.dbf' (thread 2 sequence 375) (PROD1)
2011-06-27 22:19:30.336000 +08:00
ARC0: Archiving disabled thread 2 sequence 375
ARC0: Closing local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_375_747100216.dbf' (PROD1)
Archived Log entry 793 added for thread 2 sequence 375 ID 0x9790ab2 dest 1:
minact-scn: Master considers inst:2 dead

==================================================================
===========================smon trace begin=======================

*** 2011-06-27 22:19:28.279
2011-06-27 22:19:28.279849 : Start recovery for domain=0, valid=0, flags=0x0
Successfully allocated 2 recovery slaves
Using 67 overflow buffers per recovery slave
Thread 2 checkpoint: logseq 374, block 2, scn 54624376
  cache-low rba: logseq 374, block 2
    on-disk rba: logseq 374, block 178, scn 54626382
  start recovery at logseq 374, block 2, scn 54624376
Instance recovery not required for thread 1

*** 2011-06-27 22:19:28.487
Started writing zeroblks thread 2 seq 374 blocks 178-185

*** 2011-06-27 22:19:28.487
Completed writing zeroblks thread 2 seq 374
==== Redo read statistics for thread 2 ====
Total physical reads (from disk and memory): 4096Kb
-- Redo read_disk statistics --
Read rate (ASYNC): 88Kb in 0.18s => 0.48 Mb/sec
Longest record: 8Kb, moves: 0/186 (0%)
Longest LWN: 33Kb, moves: 0/47 (0%), moved: 0Mb
Last redo scn: 0x0000.0341884d (54626381)
----------------------------------------------
----- Recovery Hash Table Statistics ---------
Hash table buckets = 262144
Longest hash chain = 1
Average hash chain = 82/82 = 1.0
Max compares per lookup = 1
Avg compares per lookup = 248/330 = 0.8
----------------------------------------------

*** 2011-06-27 22:19:28.489
KCRA: start recovery claims for 82 data blocks

*** 2011-06-27 22:19:28.526
KCRA: blocks processed = 82/82, claimed = 81, eliminated = 1
2011-06-27 22:19:28.526088 : Validate domain 0
 **************** BEGIN RECOVERY HA STATS  ****************

 I'm the recovery instance

 smon posted (1278500359646), recovery started 0.027 secs,(1278500359673)
 domain validated 0.242 secs (1278500359888)
 claims opened 70, claims converted 11, claims preread 0

 ****************  END RECOVERY HA STATS  *****************
2011-06-27 22:19:28.526668 : Validated domain 0, flags = 0x0

*** 2011-06-27 22:19:28.556
Recovery of Online Redo Log: Thread 2 Group 4 Seq 374 Reading mem 0

*** 2011-06-27 22:19:28.560
Completed redo application of 0.07MB

*** 2011-06-27 22:19:28.569
Completed recovery checkpoint
----- Recovery Hash Table Statistics ---------
Hash table buckets = 262144
Longest hash chain = 1
Average hash chain = 82/82 = 1.0
Max compares per lookup = 1
Avg compares per lookup = 330/330 = 1.0
----------------------------------------------
*** 2011-06-27 22:19:28.572 5401 krsg.c
Acquiring RECOVERY INFO PING latch from [krsg.c:5401] IX0
*** 2011-06-27 22:19:28.572 5401 krsg.c
Successfully acquired RECOVERY INFO PING latch IX+
*** 2011-06-27 22:19:28.572 5406 krsg.c
Freeing RECOVERY INFO PING latch from [krsg.c:5406] IX0
*** 2011-06-27 22:19:28.572 5406 krsg.c
Successfully freed RECOVERY INFO PING latch IX-
krss_sched_work: Prod archiver request from process SMON (function:0x2000)
krss_find_arc: Evaluating ARC3 to receive message (flags 0x0)
krss_find_arc: Evaluating ARC0 to receive message (flags 0x0)
krss_find_arc: Evaluating ARC1 to receive message (flags 0xc)
krss_find_arc: Evaluating ARC2 to receive message (flags 0x2)
krss_find_arc: Selecting ARC2 to receive REC PING message
*** 2011-06-27 22:19:28.572 3093 krsv.c
krsv_send_msg: Sending message to process ARC2
*** 2011-06-27 22:19:28.572 1819 krss.c
krss_send_arc: Sent message to ARC2 (message:0x2000)
Recovery sets nab of thread 2 seq 374 to 178 with 8 zeroblks
Retrieving log 4
pre-aal: xlno:4 flno:0 arf:0 arb:2 arh:2 art:4
Updating log 3 thread 2 sequence 375
Previous log 3 thread 2 sequence 0
Updating log 4 thread 2 sequence 374
Previous log 4 thread 2 sequence 374
post-aal: xlno:4 flno:0 arf:3 arb:2 arh:2 art:3
krss_sched_work: Prod archiver request from process SMON (function:0x1)
krss_find_arc: Evaluating ARC3 to receive message (flags 0x0)
krss_find_arc: Selecting ARC3 to receive message
*** 2011-06-27 22:19:28.589 3093 krsv.c
krsv_send_msg: Sending message to process ARC3
*** 2011-06-27 22:19:28.589 1819 krss.c
krss_send_arc: Sent message to ARC3 (message:0x1)
Retrieving log 2
Kicking thread 1 to switch logfile
Retrieving log 4
Retrieving log 3
krss_sched_work: Prod archiver request from process SMON (function:0x1)
krss_find_arc: Evaluating ARC0 to receive message (flags 0x0)
krss_find_arc: Selecting ARC0 to receive message
*** 2011-06-27 22:19:28.599 3093 krsv.c
krsv_send_msg: Sending message to process ARC0
*** 2011-06-27 22:19:28.599 1819 krss.c
krss_send_arc: Sent message to ARC0 (message:0x1)
*** 2011-06-27 22:19:28.599 838 krsv.c
krsv_dpga: Waiting for pending I/O to complete

*** 2011-06-27 22:19:29.304
krss_sched_work: Prod archiver request from process SMON (function:0x1)
krss_find_arc: Evaluating ARC1 to receive message (flags 0xc)
krss_find_arc: Selecting ARC1 to receive message
*** 2011-06-27 22:19:29.304 3093 krsv.c
krsv_send_msg: Sending message to process ARC1
*** 2011-06-27 22:19:29.304 1819 krss.c
krss_send_arc: Sent message to ARC1 (message:0x1)
SMON[INST-TXN-RECO]:about to recover undo segment 11 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 11 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 12 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 12 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 13 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 13 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 14 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 14 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 15 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 15 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 16 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 16 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 17 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 17 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 18 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 18 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 19 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 19 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 20 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 20 as available status:2 ret:0

*** 2011-06-27 22:19:43.299
* kju_tsn_aff_drm_pending TRACEUD: called with tsn x2, dissolve 0
* kju_tsn_aff_drm_pending TRACEUD: tsn_pkey = x2.1
* >> RM REQ QS ---:
 single window RM request queue is empty
 multi-window RM request queue is empty
* Global DRM state ---:
   There is no dynamic remastering
   RM lock state = 0
 pkey 2.1 undo 1 stat 0 masters[32768, 1->1] reminc 4 RM# 1
 flg x0 type x0 afftime x36e6e3a8
 nreplays by lms 0 = 0
* kju_tsn_aff_drm_pending TRACEUD: matching request not found on swin queue
* kju_tsn_aff_drm_pending TRACEUD: pp found, stat x0
* kju_tsn_aff_drm_pending TRACEUD: 2 return true

*** 2011-06-27 22:22:18.333
* kju_tsn_aff_drm_pending TRACEUD: called with tsn x2, dissolve 0
* kju_tsn_aff_drm_pending TRACEUD: tsn_pkey = x2.1

* >> RM REQ QS ---:
 single window RM request queue is empty
 multi-window RM request queue is empty
* Global DRM state ---:
   There is no dynamic remastering
   RM lock state = 0
 pkey 2.1 undo 1 stat 0 masters[32768, 1->1] reminc 4 RM# 1
 flg x0 type x0 afftime x36e6e3a8
 nreplays by lms 0 = 0
* kju_tsn_aff_drm_pending TRACEUD: matching request not found on swin queue
* kju_tsn_aff_drm_pending TRACEUD: pp found, stat x0
* kju_tsn_aff_drm_pending TRACEUD: 2 return true

*** 2011-06-27 22:24:53.365
* kju_tsn_aff_drm_pending TRACEUD: called with tsn x2, dissolve 0
* kju_tsn_aff_drm_pending TRACEUD: tsn_pkey = x2.1

* >> RM REQ QS ---:
 single window RM request queue is empty
 multi-window RM request queue is empty
* Global DRM state ---:
   There is no dynamic remastering
   RM lock state = 0
 pkey 2.1 undo 1 stat 0 masters[32768, 1->1] reminc 4 RM# 1
 flg x0 type x0 afftime x36e6e3a8
 nreplays by lms 0 = 0
* kju_tsn_aff_drm_pending TRACEUD: matching request not found on swin queue
* kju_tsn_aff_drm_pending TRACEUD: pp found, stat x0
* kju_tsn_aff_drm_pending TRACEUD: 2 return true

========================================================================
==============================lmon trace begin==========================

 *** 2011-06-27 22:19:27.748
kjxgmpoll reconfig instance map: 1

*** 2011-06-27 22:19:27.748
kjxgmrcfg: Reconfiguration started, type 1
CGS/IMR TIMEOUTS:
  CSS recovery timeout = 31 sec (Total CSS waittime = 65)
  IMR Reconfig timeout = 75 sec
  CGS rcfg timeout = 85 sec
kjxgmcs: Setting state to 4 0.

*** 2011-06-27 22:19:27.759
     Name Service frozen
kjxgmcs: Setting state to 4 1.
kjxgrdecidever: No old version members in the cluster
kjxgrssvote: reconfig bitmap chksum 0x2137452d cnt 1 master 1 ret 0
kjxgrpropmsg: SSMEMI: inst 1 - no disk vote
kjxgrpropmsg: SSVOTE: Master indicates no Disk Voting
2011-06-27 22:19:27.760783 : kjxgrDiskVote: nonblocking method is chosen
kjxggpoll: change poll time to 50 ms
2011-06-27 22:19:27.918847 : kjxgrDiskVote: Obtained RR update lock for sequence 5, RR seq 4
2011-06-27 22:19:28.023160 : kjxgrDiskVote: derive membership from CSS (no disk votes)
2011-06-27 22:19:28.023240 : proposed membership: 1

*** 2011-06-27 22:19:28.081
2011-06-27 22:19:28.081952 : kjxgrDiskVote: new membership is updated by inst 1, seq 6
2011-06-27 22:19:28.082073 : kjxgrDiskVote: bitmap: 1
CGS/IMR TIMEOUTS:
  CSS recovery timeout = 31 sec (Total CSS waittime = 65)
  IMR Reconfig timeout = 75 sec
  CGS rcfg timeout = 85 sec
kjxgmmeminfo: can not invalidate inst 2
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 6 2.
kjfmSendAbortInstMsg: send an abort message to instance 2
 kjfmuin: inst bitmap 1
 kjfmmhi: received msg from inst 1 (inc 2)
     Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 6 3.
     Name Service recovery started
     Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 6 4.
     Multicasted all local name entries for publish
     Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 6 5.
     Name Service normal
     Name Service recovery done

*** 2011-06-27 22:19:28.191
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 6 6.
kjxgmcs: total reconfig time 0.432 seconds (from 2895072218 to 2895072650)
kjxggpoll: change poll time to 600 ms
kjfmact: call ksimdic on instance (2)
2011-06-27 22:19:28.211846 :

********* kjfcrfg() called, BEGIN LMON RCFG *********
2011-06-27 22:19:28.211906 : * Begin lmon rcfg step KJGA_RCFG_BEGIN
* kjfcrfg: Resource broadcasting disabled
* kjfcrfg: kjfcqiora returned success
kjfcrfg: DRM window size = 4096->4096 (min lognb = 15)
2011-06-27 22:19:28.211954 :
Reconfiguration started (old inc 4, new inc 6)
TIMEOUTS:
  Local health check timeout: 70 sec
  Rcfg process freeze timeout: 70 sec
  Remote health check timeout: 140 sec
  Defer Queue timeout: 163 secs
  CGS rcfg timeout: 85 sec
  Synchronization timeout: 248 sec
  DLM rcfg timeout: 744 sec
List of instances:
 1 (myinst: 1)
Undo tsn affinity 1
OMF 0
2011-06-27 22:19:28.212394 : * Begin lmon rcfg step KJGA_RCFG_FREEZE

*** 2011-06-27 22:19:28.233
* published: inc 6, isnested 0, rora req 0,
  rora start 0, rora invalid 0, (roram 32767), isrcvinst 1,
  (rcvinst 1), isdbopen 1, drh 0, (myinst 1)
  thread 1, isdbmounted 1, sid hash x0
* kjfcrfg: published bigns successfully
* Force-published at step 3
2011-06-27 22:19:28.233575 :  Global Resource Directory frozen
* roram 32767, rcvinst 1
* kjfc_thread_qry: instance 1 flag 3 thread 1 sid 0
* kjfcrfg: queried bigns successfully
inst 1
* kjfcrfg: single_instance_kjga = TRUE

 asby init, 0/1/x2

 asby returns, 0/1/x2/false

* Domain maps before reconfiguration:
*   DOMAIN 0 (valid 1): 1 2
* End of domain mappings

 * dead instance detected - domain 0 invalid = TRUE

* Domain maps after recomputation:
*   DOMAIN 0 (valid 0): 1
* End of domain mappings
2011-06-27 22:19:28.235110 : * Begin lmon rcfg step KJGA_RCFG_COMM
2011-06-27 22:19:28.235242 : GSIPC:KSXPCB: msg 0xd8b84550 status 34, type 2, dest 2, rcvr 0
2011-06-27 22:19:28.235339 : GSIPC:KSXPCB: msg 0xd8b80180 status 34, type 2, dest 2, rcvr 1
 Active Sendback Threshold = 50 %
 Communication channels reestablished
2011-06-27 22:19:28.240076 : * Begin lmon rcfg step KJGA_RCFG_EXCHANGE
2011-06-27 22:19:28.240192 : * Begin lmon rcfg step KJGA_RCFG_ENQCLEANUP
 Master broadcasted resource hash value bitmaps
2011-06-27 22:19:28.251474 :
 Non-local Process blocks cleaned out
2011-06-27 22:19:28.251822 : * Begin lmon rcfg step KJGA_RCFG_CLEANUP
2011-06-27 22:19:28.265220 : * Begin lmon rcfg step KJGA_RCFG_TIMERQ
2011-06-27 22:19:28.265308 : * Begin lmon rcfg step KJGA_RCFG_DDQ
2011-06-27 22:19:28.265393 : * Begin lmon rcfg step KJGA_RCFG_SETMASTER
2011-06-27 22:19:28.271551 :
 Set master node info
2011-06-27 22:19:28.271931 : * Begin lmon rcfg step KJGA_RCFG_ENQREPLAY
2011-06-27 22:19:28.275490 :  Submitted all remote-enqueue requests
2011-06-27 22:19:28.275596 : * Begin lmon rcfg step KJGA_RCFG_ENQDUBIOUS
 Dwn-cvts replayed, VALBLKs dubious
2011-06-27 22:19:28.277223 : * Begin lmon rcfg step KJGA_RCFG_ENQGRANT
 All grantable enqueues granted
2011-06-27 22:19:28.277992 : * Begin lmon rcfg step KJGA_RCFG_PCMREPLAY
2011-06-27 22:19:28.279234 :
2011-06-27 22:19:28.279255 :  Post SMON to start 1st pass IR               --SMON posted by LMON
2011-06-27 22:19:28.307890 :  Submitted all GCS cache requests             --IR acquires all gcs resource needed for recovery
2011-06-27 22:19:28.308038 : * Begin lmon rcfg step KJGA_RCFG_FIXWRITES
 Post SMON to start 1st pass IR
 Fix write in gcs resources
2011-06-27 22:19:28.313508 : * Begin lmon rcfg step KJGA_RCFG_END
2011-06-27 22:19:28.313720 :
2011-06-27 22:19:28.313733 :
Reconfiguration complete

*   domain 0 valid?: 0
* kjfcrfg: ask RMS0 to do pnp work
 **************** BEGIN DLM RCFG HA STATS  ****************

 Total dlm rcfg time (inc 6): 0.100 secs (1278500359581, 1278500359681)
   Begin step .........: 0.001 secs (1278500359581, 1278500359582)
   Freeze step ........: 0.020 secs (1278500359582, 1278500359602)
   Remap step .........: 0.002 secs (1278500359602, 1278500359604)
   Comm step ..........: 0.005 secs (1278500359604, 1278500359609)
   Sync 1 step ........: 0.000 secs (0, 0)
   Exchange step ......: 0.000 secs (1278500359609, 1278500359609)
   Sync 2 step ........: 0.000 secs (0, 0)
   Enqueue cleanup step: 0.011 secs (1278500359609, 1278500359620)
   Sync pcm1 step .....: 0.000 secs (0, 0)
   Cleanup step .......: 0.013 secs (1278500359620, 1278500359633)
   Timerq step ........: 0.000 secs (1278500359633, 1278500359633)
   Ddq step ...........: 0.000 secs (1278500359633, 1278500359633)
   Set master step ....: 0.006 secs (1278500359633, 1278500359639)
   Sync 3 step ........: 0.000 secs (0, 0)
   Enqueue replay step : 0.004 secs (1278500359639, 1278500359643)
   Sync 4 step ........: 0.000 secs (0, 0)
   Enqueue dubious step: 0.001 secs (1278500359643, 1278500359644)
   Sync 5 step ........: 0.000 secs (0, 0)
   Enqueue grant step .: 0.001 secs (1278500359644, 1278500359645)
   Sync 6 step ........: 0.000 secs (0, 0)
   PCM replay step ....: 0.030 secs (1278500359645, 1278500359675)
   Sync 7 step ........: 0.000 secs (0, 0)
   Fixwrt replay step .: 0.003 secs (1278500359675, 1278500359678)
   Sync 8 step ........: 0.000 secs (0, 0)
   End step ...........: 0.001 secs (1278500359680, 1278500359681)
 Number of replayed enqueues sent / received .......: 0 / 0
 Number of replayed fusion locks sent / received ...: 0 / 0
 Number of enqueues mastered before / after rcfg ...: 2217 / 2941
 Number of fusion locks mastered before / after rcfg: 3120 / 5747

 ****************  END DLM RCFG HA STATS  *****************

*** 2011-06-27 22:19:36.589
kjxgfipccb: msg 0x0x7ff526139320, mbo 0x0x7ff526139310, type 19, ack 0, ref 0, stat 34

=====================================================================
============================lms trace begin==========================


*** 2011-06-27 22:38:54.663
2011-06-27 22:38:54.663764 :  0 GCS shadows cancelled, 0 closed, 0 Xw survived
2011-06-27 22:38:54.673539 :  5230 GCS resources traversed, 0 cancelled
2011-06-27 22:38:54.707671 :  9322 GCS shadows traversed, 0 replayed, 0 duplicates,
5183 not replayed, dissolve 0 timeout 0 RCFG(10) lms 0 finished replaying gcs resources
2011-06-27 22:38:54.709132 :  0 write requests issued in 384 GCS resources        --check past image
 0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.709520 :  0 write requests issued in 273 GCS resources
 1 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.709842 :  0 write requests issued in 281 GCS resources
 0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.710159 :  0 write requests issued in 233 GCS resources
 0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.710531 :  0 write requests issued in 350 GCS resources
 lms 0 finished fixing gcs write protocol

The biggest difference between Instance Recovery and an ordinary Crash Recovery lies in the GRD freeze and the remastering of GES/GCS resources during instance recovery. This work is performed mainly by the LMON process. In the trace above you can find a number of reconfiguration steps of the form KJGA_RCFG_*; their meanings are:

Reconfiguration Steps:

1.    KJGA_RCFG_BEGIN
LMON continuously polls for a reconfiguration event. Once CGS reports a change in cluster membership,
LMON starts the reconfiguration.

2.    KJGA_RCFG_FREEZE
All processes acknowledge the reconfiguration freeze before LMON continues.

3.    KJGA_RCFG_REMAP
Updates the new instance map (kjfchsu) and redistributes resource mastership. Invalidates recovery domains
if the reconfiguration was caused by an instance death.

4.    KJGA_RCFG_COMM
Reinitializes the communication channels.

5.    KJGA_RCFG_EXCHANGE
Exchange of master information for GCS, GES and file-affinity masters.

6.    KJGA_RCFG_ENQCLEANUP
Deletes remote dead GCS/GES locks. Cancels converting GCS requests.

7.    KJGA_RCFG_CLEANUP
Cleans up/removes GES resources.

8.    KJGA_RCFG_TIMERQ
Restores the relative timeout for enqueue locks on the timeout queue.

9.    KJGA_RCFG_DDQ
Cleans out enqueue locks on the deadlock queue.

10.  KJGA_RCFG_SETMASTER
Updates master info for each enqueue resource that needs to be remastered.

11.  KJGA_RCFG_ENQREPLAY
Replays enqueue locks.

12.  KJGA_RCFG_ENQDUBIOUS
Invalidates GES resources without an established value.

13.  KJGA_RCFG_ENQGRANT
Grants all grantable GES lock requests.

14.  KJGA_RCFG_PCMREPLAY
Enqueue reconfiguration is complete. Posts SMON to start instance recovery. Starts replaying GCS resources.

15.  KJGA_RCFG_FIXWRITES
Fixes the write state of GCS resources.

16.  KJGA_RCFG_END
Unfreezes the lock database.
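The step boundaries listed above can be pulled straight out of an lmon trace file. The sketch below is illustrative only: the regex assumes the trace line format shown in the excerpts above ("timestamp : * Begin lmon rcfg step KJGA_RCFG_..."), and the function names are my own, not an official tool.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2011-06-27 22:19:28.211906 : * Begin lmon rcfg step KJGA_RCFG_BEGIN
STEP_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s*:\s*"
    r"\* Begin lmon rcfg step (KJGA_RCFG_\w+)"
)

def parse_rcfg_steps(trace_lines):
    """Return a list of (step_name, timestamp) in trace order."""
    steps = []
    for line in trace_lines:
        m = STEP_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            steps.append((m.group(2), ts))
    return steps

def step_durations(steps):
    """Each step runs until the next one begins; returns (name, seconds)."""
    return [(name, (steps[i + 1][1] - ts).total_seconds())
            for i, (name, ts) in enumerate(steps[:-1])]
```

Run against the lmon trace above, this gives per-step elapsed times that can be cross-checked against the DLM RCFG HA STATS block the trace prints at the end.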

Diagnostic events related to Instance Recovery
We cannot prevent Instance Recovery from happening; in fact, once an instance crashes, Instance Recovery is mandatory.
The main diagnostic events related to Instance Recovery are 10426 and 29717:

    10426 – Reconfiguration trace event
    10425 – Enqueue operations
    10432 – Fusion activity
    10429 – IPC tracing

oerr ora 10425
10425, 00000, "enable global enqueue operations event trace"
// *Document: NO
// *Cause:
// *Action: Dump trace for global enqueue operations.

oerr ora 10426
10426, 00000, "enable ges/gcs reconfiguration event trace"
// *Document: NO
// *Cause:
// *Action: Dump trace for ges/gcs reconfiguration.

oerr ora 10430
10430, 00000, "enable ges/gcs dynamic remastering event trace"
// *Document: NO
// *Cause:
// *Action: Dump trace for ges/gcs dynamic remastering.

oerr ora 10401
10401, 00000, "turn on IPC (ksxp) debugging"
// *Cause:
// *Action: Enables debugging code for IPC service layer (ksxp)

oerr ora 10708
10708, 00000, "print out trace information from the RAC buffer cache"
// *Cause: N/A
// *Action: THIS IS NOT A USER ERROR NUMBER/MESSAGE.  THIS DOES NOT NEED TO BE
//          TRANSLATED OR DOCUMENTED. IT IS USED ONLY FOR DEBUGGING.

oerr ora 29717
29717, 00000, "enable global resource directory freeze/unfreeze event trace"
// *Document: NO
// *Cause:
// *Action: Dump trace for global resource directory freeze/unfreeze.

diag RAC INSTANCE SHUTDOWN LMON
LMON will dump more information to trace during reconfiguration and freeze.

event="10426 trace name context forever, level 8"
event="29717 trace name context forever, level 5"
or
event="10426 trace name context forever, level 12"
event="10430 trace name context forever, level 12"
event="10401 trace name context forever, level 8"
event="10046 trace name context forever, level 8"
event="10708 trace name context forever, level 15"
event="29717 trace name context forever, level 5"

To see the 29717 GRD frozen trace, set:

alter system set event='29717 trace name context forever, level 5' scope=spfile;

=========================================================================
============================lmon trace begin=============================

********* kjfcrfg() called, BEGIN LMON RCFG *********
2011-06-27 23:13:16.693089 : * Begin lmon rcfg step KJGA_RCFG_BEGIN
* kjfcrfg: Resource broadcasting disabled
* kjfcrfg: kjfcqiora returned success
kjfcrfg: DRM window size = 4096->4096 (min lognb = 15)
2011-06-27 23:13:16.693219 :
Reconfiguration started (old inc 4, new inc 6)
TIMEOUTS:
  Local health check timeout: 70 sec
  Rcfg process freeze timeout: 70 sec
  Remote health check timeout: 140 sec
  Defer Queue timeout: 163 secs
  CGS rcfg timeout: 85 sec
  Synchronization timeout: 248 sec
  DLM rcfg timeout: 744 sec
List of instances:
 1 (myinst: 1)
Undo tsn affinity 1
OMF 0
[FDB][start]
2011-06-27 23:13:16.701320 : * Begin lmon rcfg step KJGA_RCFG_FREEZE
[FACK][18711 not frozen]          --fack means acknowledge in advance
[FACK][18713 not frozen]
[FACK][18719 not frozen]
[FACK][18721 not frozen]
[FACK][18723 not frozen]
[FACK][18729 not frozen]
[FACK][18739 not frozen]
[FACK][18743 not frozen]
[FACK][18745 not frozen]
[FACK][18747 not frozen]
[FACK][18749 not frozen]
[FACK][18751 not frozen]
[FACK][18753 not frozen]
[FACK][18755 not frozen]
[FACK][18757 not frozen]
[FACK][18759 not frozen]
[FACK][18763 not frozen]
[FACK][18765 not frozen]
[FACK][18767 not frozen]
[FACK][18769 not frozen]
[FACK][18771 not frozen]
[FACK][18775 not frozen]
[FACK][18777 not frozen]
[FACK][18816 not frozen]
[FACK][18812 not frozen]
[FACK][18818 not frozen]
[FACK][18820 not frozen]
[FACK][18824 not frozen]
[FACK][18826 not frozen]
[FACK][18830 not frozen]
[FACK][18835 not frozen]
[FACK][18842 not frozen]
[FACK][18860 not frozen]
[FACK][18865 not frozen]
[FACK][18881 not frozen]
[FACK][18883 not frozen]
[FACK][18909 not frozen]

*** 2011-06-27 23:13:16.724
* published: inc 6, isnested 0, rora req 0,
  rora start 0, rora invalid 0, (roram 32767), isrcvinst 0,
  (rcvinst 32767), isdbopen 1, drh 0, (myinst 1)
  thread 1, isdbmounted 1, sid hash x0
* kjfcrfg: published bigns successfully
* Force-published at step 3
2011-06-27 23:13:16.724764 :  Global Resource Directory frozen
* kjfc_qry_bigns: noone has the rcvinst established yet, set it to the highest open instance = 1
* roram 32767, rcvinst 1
* kjfc_thread_qry: instance 1 flag 3 thread 1 sid 0
* kjfcrfg: queried bigns successfully

=====================================================================
==========================lmd0 trace begin===========================

*** 2011-06-27 23:13:16.700
[FFCLI][frozen]
[FUFCLI][normal]

crsctl status resource -t -init in 11.2.0.2 grid infrastructure

In 11.2.0.2 Grid Infrastructure, the crsctl stat res command no longer displays basic resources such as ora.cssd, ora.ctssd, and ora.diskmon. To see the status of these resources, add the -init option:

[grid@rh2 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.2.0]

[grid@rh2 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rh2
ora.LISTENER.lsnr
               OFFLINE OFFLINE      rh2
ora.asm
               ONLINE  ONLINE       rh2
ora.gsd
               OFFLINE OFFLINE      rh2
ora.net1.network
               ONLINE  ONLINE       rh2
ora.ons
               ONLINE  ONLINE       rh2
ora.registry.acfs
               OFFLINE OFFLINE      rh2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        OFFLINE OFFLINE
ora.cvu
      1        OFFLINE OFFLINE
ora.dw.db
      1        OFFLINE OFFLINE
ora.maclean.db
      1        OFFLINE OFFLINE
ora.oc4j
      1        OFFLINE OFFLINE
ora.prod.db
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean_pre.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean_pre_preconnect.svc
      1        OFFLINE OFFLINE
ora.prod.maclean_taf.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.rh2.vip
      1        OFFLINE OFFLINE
ora.rh3.vip
      1        OFFLINE OFFLINE
ora.scan1.vip
      1        OFFLINE OFFLINE                                       

[grid@rh2 ~]$ crsctl stat res -t -init 
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rh2                      Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rh2
ora.crf
      1        ONLINE  ONLINE       rh2
ora.crsd
      1        ONLINE  ONLINE       rh2
ora.cssd
      1        ONLINE  ONLINE       rh2
ora.cssdmonitor
      1        ONLINE  ONLINE       rh2
ora.ctssd
      1        ONLINE  ONLINE       rh2                      OBSERVER
ora.diskmon
      1        ONLINE  ONLINE       rh2
ora.drivers.acfs
      1        ONLINE  OFFLINE
ora.evmd
      1        ONLINE  ONLINE       rh2
ora.gipcd
      1        ONLINE  ONLINE       rh2
ora.gpnpd
      1        ONLINE  ONLINE       rh2
ora.mdnsd
      1        ONLINE  ONLINE       rh2

Also, in 11.2.0.2 Grid Infrastructure, the -init option must be added whenever we start, stop, or modify these init resources; otherwise the command fails with CRS-2613: Could not find resource:

[grid@rh2 ~]$ crsctl stat res ora.asm
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE
STATE=ONLINE on rh2

[grid@rh2 ~]$ crsctl modify res ora.asm -attr AUTO_START=never

[grid@rh2 ~]$ crsctl stat res ora.asm -p
NAME=ora.asm
TYPE=ora.asm.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=ora.%CRS_CSS_NODENAME%.ASM%CRS_CSS_NODENUMBER%.asm
AUTO_START=never
CHECK_INTERVAL=60
CHECK_TIMEOUT=30
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=asm) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%)
DEGREE=1
DESCRIPTION=Oracle ASM resource
ENABLED=1
GEN_USR_ORA_INST_NAME=
GEN_USR_ORA_INST_NAME@SERVERNAME(rh2)=+ASM1
GEN_USR_ORA_INST_NAME@SERVERNAME(rh3)=+ASM2
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=weak(ora.LISTENER.lsnr)
START_TIMEOUT=900
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=
STOP_TIMEOUT=600
TYPE_VERSION=1.2
UPTIME_THRESHOLD=1d
USR_ORA_ENV=
USR_ORA_INST_NAME=+ASM%CRS_CSS_NODENUMBER%
USR_ORA_OPEN_MODE=mount
USR_ORA_OPI=false
USR_ORA_STOP_MODE=immediate
VERSION=11.2.0.2.0

[grid@rh2 ~]$ crsctl status resource  -init -t|grep -v ONLINE|tail -13
ora.asm
ora.cluster_interconnect.haip
ora.crf
ora.crsd
ora.cssd
ora.cssdmonitor
ora.ctssd
ora.diskmon
ora.drivers.acfs
ora.evmd
ora.gipcd
ora.gpnpd
ora.mdnsd

[grid@rh2 ~]$ crsctl status resource  -init -t|grep -v ONLINE|tail -13|xargs crsctl status resource
CRS-2613: Could not find resource 'ora.cluster_interconnect.haip'.
CRS-2613: Could not find resource 'ora.crf'.
CRS-2613: Could not find resource 'ora.crsd'.
CRS-2613: Could not find resource 'ora.cssd'.
CRS-2613: Could not find resource 'ora.cssdmonitor'.
CRS-2613: Could not find resource 'ora.ctssd'.
CRS-2613: Could not find resource 'ora.diskmon'.
CRS-2613: Could not find resource 'ora.drivers.acfs'.
CRS-2613: Could not find resource 'ora.evmd'.
CRS-2613: Could not find resource 'ora.gipcd'.
CRS-2613: Could not find resource 'ora.gpnpd'.
CRS-2613: Could not find resource 'ora.mdnsd'.
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE
STATE=ONLINE on rh2

[grid@rh2 ~]$ crsctl status res ora.crsd -init -p
NAME=ora.crsd
TYPE=ora.crs.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=AGENT=1,AGFW=0,CLSFRAME=0,CLSVER=0,CLUCLS=0,COMMCRS=0,COMMNS=0,CRSAPP=0,CRSCCL=0,CRSCEVT=0,CRSCOMM=0,CRSD=0,CRSEVT=0,CRSMAIN=0,CRSOCR=0,CRSPE=0,CRSPLACE=0,CRSRES=0,CRSRPT=0,CRSRTI=0,CRSSE=0,CRSSEC=0,CRSTIMER=0,CRSUI=0,CSSCLNT=0,SuiteTes=1,UiServer=0,OCRAPI=1,OCRCLI=1,OCRSRV=1,OCRMAS=1,OCRMSG=1,OCRCAC=1,OCRRAW=1,OCRUTL=1,OCROSD=1,OCRASM=1
DAEMON_TRACING_LEVELS=AGENT=0,AGFW=0,CLSFRAME=0,CLSVER=0,CLUCLS=0,COMMCRS=0,COMMNS=0,CRSAPP=0,CRSCCL=0,CRSCEVT=0,CRSCOMM=0,CRSD=0,CRSEVT=0,CRSMAIN=0,CRSOCR=0,CRSPE=0,CRSPLACE=0,CRSRES=0,CRSRPT=0,CRSRTI=0,CRSSE=0,CRSSEC=0,CRSTIMER=0,CRSUI=0,CSSCLNT=0,SuiteTes=0,UiServer=0,OCRAPI=1,OCRCLI=1,OCRSRV=1,OCRMAS=1,OCRMSG=1,OCRCAC=1,OCRRAW=1,OCRUTL=1,OCROSD=1,OCRASM=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for CRSD"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=hard(ora.asm,ora.cssd,ora.ctssd,ora.gipcd)pullup(ora.asm,ora.cssd,ora.ctssd,ora.gipcd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(shutdown:ora.asm,intermediate:ora.cssd,intermediate:ora.gipcd)
STOP_MODE=NONE
STOP_TIMEOUT=43200
UPTIME_THRESHOLD=1m
USR_ORA_ENV=

[grid@rh2 ~]$ crsctl modify res ora.crsd -init -attr "SCRIPT_TIMEOUT"=65   
CRS-0245:  User doesn't have enough privilege to perform the operation
CRS-4000: Command Modify failed, or completed with errors.

/* Modifying the attributes of some resources requires root privileges */

[root@rh2 ~]# crsctl modify res ora.crsd -init -attr "SCRIPT_TIMEOUT"=65 

[root@rh2 ~]# crsctl status res ora.crsd -init -p|grep SCRIPT_TIMEOUT
SCRIPT_TIMEOUT=65

[root@rh2 ~]# crsctl status res ora.ctssd -p -init
NAME=ora.ctssd
TYPE=ora.ctss.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=CLUCLS=0,CSSCLNT=0,CRSCCL=1,CTSS=5,OCRAPI=1,OCRCLI=1,OCRMSG=1
DAEMON_TRACING_LEVELS=CLUCLS=0,CSSCLNT=0,CRSCCL=1,CTSS=5,OCRAPI=1,OCRCLI=1,OCRMSG=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for Ctss Agents"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=hard(ora.cssd,ora.gipcd)pullup(ora.cssd,ora.gipcd)
START_TIMEOUT=60
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(ora.cssd,ora.gipcd)
STOP_TIMEOUT=60
UPTIME_THRESHOLD=1m
USR_ORA_ENV=

[root@rh2 ~]# crsctl status res ora.diskmon -p -init
NAME=ora.diskmon
TYPE=ora.diskmon.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=never
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=3
CHECK_TIMEOUT=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=
DAEMON_TRACING_LEVELS=
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for Diskmon"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=weak(concurrent:ora.cssd)pullup:always(ora.cssd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=
STOP_TIMEOUT=60
UPTIME_THRESHOLD=5s
USR_ORA_ENV=ORACLE_USER=grid
VERSION=11.2.0.2.0

[root@rh2 ~]# crsctl status res ora.cssd -init -p
NAME=ora.cssd
TYPE=ora.cssd.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/cssdagent%CRS_EXE_SUFFIX%
AGENT_HB_INTERVAL=0
AGENT_HB_MISCOUNT=10
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=abort
CLEAN_COMMAND=
CSSD_MODE=
CSSD_PATH=%CRS_HOME%/bin/ocssd%CRS_EXE_SUFFIX%
CSS_USER=grid
DAEMON_LOGGING_LEVELS=CSSD=2,GIPCNM=2,GIPCGM=2,GIPCCM=2,CLSF=0,SKGFD=0,GPNP=1,OLR=0
DAEMON_TRACING_LEVELS=CSSD=0,GIPCNM=0,GIPCGM=0,GIPCCM=0,CLSF=0,SKGFD=0,GPNP=0,OLR=0
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for CSSD"
DETACHED=true
ENABLED=1
ENV_OPTS=
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
OMON_INITRATE=1000
OMON_POLLRATE=500
ORA_OPROCD_MODE=
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCD_TIMEOUT=1000
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=3
SCRIPT_TIMEOUT=600
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=weak(concurrent:ora.diskmon)hard(ora.cssdmonitor,ora.gpnpd,ora.gipcd)pullup(ora.gpnpd,ora.gipcd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(intermediate:ora.gipcd,shutdown:ora.diskmon,intermediate:ora.cssdmonitor)
STOP_TIMEOUT=900
UPTIME_THRESHOLD=1m
USR_ORA_ENV=
VMON_INITLIMIT=16
VMON_INITRATE=500
VMON_POLLRATE=500

Advise on OS patch upgrade with RAC

Customer is running 10.2.0.2 RAC on Solaris 10. The final objective is to upgrade to RAC 10.2.0.4. The requirements are as below:

1) Upgrade Solaris 10 patches from Generic_118833-03 to 142909-17
2) Upgrade RAC to 10.2.0.4

Question1:
Customer is thinking of performing the OS patch upgrade via the strategy below:

Night1:
1. Shutdown node2.
2. Shutdown crs/db on node1.
3. Perform OS upgrade on node1.
4. Once node1 has successfully upgraded its OS, start crs/db on node1 and continue database operation until night 2. Note that node2 is down from night 1 until night 2.

Night2:
1. Shutdown node1.
2. Startup node2; do not start up crs/db on node2.
3. Perform OS upgrade of node2.
4. Once node2 has successfully upgraded its OS, start up crs/db on node2.
5. Now start up node1 and its crs/db and rejoin the cluster.
Now both nodes' crs/db continue operation.

Is there any concern on the OS patch upgrade strategy as above?

A: No issue with the OS patch upgrade. Stay within a 24-hour upgrade window.

Question2: After the OS patch upgrade via the strategy above, is there a need to do anything on RAC (db/crs)? Perhaps relink? Or do we just bring up crs and both crs/db will be up and running?

A: Doc ID 444595.1 – Is It Necessary To Relink Oracle Following OS Upgrade?

Question3: We notice that the SUNWsprox OS package is not available on either server. However, RAC is already up and running. Does it mean that the SUNWsprox package is not needed in Solaris 10?

A: The OS package SUNWsprox is not required on Solaris 10 (see note 169706.1). Therefore this error can be ignored on your OS version.

Question4: Customer would like to disable gsd. Is this recommended?

A: Yes you can disable the gsd.

Question5: During the OS patch upgrade, java version will be upgraded from “1.5.0_06” to “1.5.0_24”. Will there be any issue with RAC?

A: No issue with RAC.

Question6: If the OS patch is applied on node1 but not yet on node2, will RAC have any issue?

A: Please refer to the doc below:

Doc ID 220970.1 – RAC: Frequently Asked Questions

Oracle RAC Brain Split Resolution

大约是一周前,一位资深的Oracle工程师向我和客户介绍RAC中脑裂的处理过程,据他介绍脑裂发生时通过各节点对voting disk(投票磁盘)的抢夺,那些争抢到(n/2+1)数量voting disk的节点就可以survive(幸存)下来,而没有争抢到voting disk的节点则被evicted踢出节点。

不得不说以上这番观点,来得太过随意了,一位从Oracle 6就开始从事维护工作的老工程师也会犯这样的概念性错误,只能说Oracle技术的更新过于日新月异了。

在理解脑裂(Brain Split)处理过程前,有必要介绍一下Oracle RAC Css(Cluster Synchronization Services)的工作框架:

Oracle RAC CSS提供2种后台服务包括群组管理(Group Managment简称GM)和节点监控(Node Monitor简称NM),其中GM管理组(group)和锁(lock)服务。在集群中任意时刻总有一个节点会充当GM主控节点(master node)。集群中的其他节点串行地将GM请求发送到主控节点(master node),而master node将集群成员变更信息广播给集群中的其他节点。组成员关系(group membership)在每次发生集群重置(cluster reconfiguration)时发生同步。每一个节点独立地诠释集群成员变化信息。

而节点监控NM服务则负责通过skgxn(skgxn-libskgxn.a,提供节点监控的库)与其他厂商的集群软件保持节点信息的一致性。此外NM还提供对我们熟知的网络心跳(Network heartbeat)和磁盘心跳(Disk heartbeat)的维护以保证节点始终存活着。当集群成员没有正常Network heartbeat或Disk heartbeat时NM负责将成员踢出集群,被踢出集群的节点将发生节点重启(reboot)。

NM learns the endpoints it needs to listen on and communicate with from records in the OCR (the OCR stores the interconnect information), and sends heartbeat messages over the network to the other cluster members. It also monitors the network heartbeats arriving from all other cluster members; these heartbeats occur every second. If no network heartbeat from a node is received within the number of seconds specified by misscount (by the way: in 10.2.0.1 the default misscount is 60s on Linux and 30s on other platforms, or 600s when third-party vendor clusterware is used, and 10.2.0.1 did not yet have disktimeout; from 10.2.0.4 onward misscount is 60s and disktimeout is 200s; from 11.2 onward misscount is 30s: CRS-4678: Successful get misscount 30 for Cluster Synchronization Services, CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services), the node is considered "dead". NM also initiates cluster reconfiguration when other nodes join or leave the cluster.
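The version-specific defaults quoted above can be captured in a small lookup table. A minimal sketch, assuming exactly the values stated in the text (`css_timeouts` is a hypothetical helper for illustration, not an Oracle API):

```python
# Illustrative lookup of the CSS heartbeat timeout defaults quoted above.
# These values come from the text, not from a live cluster.

def css_timeouts(version, platform="linux", vendor_clusterware=False):
    """Return (misscount_s, disktimeout_s or None) per the stated defaults."""
    if vendor_clusterware and version.startswith("10.2.0.1"):
        return (600, None)   # third-party vendor clusterware in 10.2.0.1
    if version.startswith("10.2.0.1"):
        # 10.2.0.1: 60s on Linux, 30s elsewhere; disktimeout not yet introduced
        return (60 if platform == "linux" else 30, None)
    if version.startswith("10.2"):
        return (60, 200)     # 10.2.0.4 onward
    return (30, 200)         # 11.2 onward

print(css_timeouts("10.2.0.1", "solaris"))  # (30, None)
print(css_timeouts("11.2"))                 # (30, 200)
```

On a real cluster these values would instead be read with `crsctl`, as the CRS-4678 messages above suggest.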

In brain split scenarios, NM also monitors the voting disks to learn about competing subclusters. Subclusters deserve a brief introduction. Imagine an environment with a large number of nodes; take the 128-node environment Oracle has officially built as our frame of reference. When a network failure occurs there are several possibilities. One is a global network failure, in which none of the 128 nodes can exchange network heartbeats with any other node, producing as many as 128 isolated single-node "islands". Another is a partial network failure, in which the 128 nodes are split into several partitions, each containing more than one node; these partitions are called subclusters. During a network failure the nodes within a subcluster can still communicate and exchange vote messages (vote mesg), but the subclusters and isolated nodes can no longer talk to each other over the regular interconnect. This is when NM reconfiguration needs the voting disks.

Because NM uses the voting disks to resolve the communication breakdown caused by a network failure, the voting disks must remain accessible at all times. In the normal state, every node performs disk heartbeat activity: once per second it writes its disk heartbeat information into a block on each voting disk. At the same time, CSS reads a so-called "kill block" every second; when the kill block's contents indicate that the local node has been evicted from the cluster, CSS reboots the node itself.

To keep the disk heartbeat and kill-block reads working, CSS requires that at least (N/2+1) voting disks be accessible to each node; this guarantees that any two nodes always share at least one voting disk that both can access. Under normal conditions (note: truly calm, normal conditions), as long as a node can access more online voting disks than it cannot, it stays alive; once the inaccessible voting disks outnumber the accessible ones, the CSS process fails and the node reboots. The claim that two voting disks are enough for redundancy and that three or more are unnecessary is therefore wrong. Oracle recommends at least 3 voting disks in a cluster.

Supplement 1:

Question:
Some readers have asked: must the number of voting disks be odd?

Answer:
In fact, an odd number of voting disks is merely recommended, not required. In 10gR2 the upper limit on the number of voting disks is 32.

Question:
Can we use 2 or 4 voting disks?

Answer:
Yes, we can. But even counts such as 2 or 4 are at a disadvantage under the hard rule of the disk heartbeat algorithm that "at least (N/2+1) voting disks must be accessible to the node":
With 2 voting disks, no voting disk heartbeat failure can be tolerated
With 3 voting disks, at most 1 voting disk heartbeat failure can be tolerated
With 4 voting disks, still at most 1 voting disk heartbeat failure can be tolerated; this is the same fault tolerance as with 3, but because there are more voting disks, management cost and risk increase
With 5 voting disks, at most 2 voting disk heartbeat failures can be tolerated
With 6 voting disks, still at most 2 voting disk heartbeat failures can be tolerated; as with 4 versus 3, the extra disk only adds unjustified management cost and risk
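The pattern above follows directly from the (N/2+1) majority rule. A minimal sketch that reproduces the list (`votedisk_failure_tolerance` is an illustrative name, not an Oracle utility):

```python
# How many voting-disk heartbeat failures each configuration tolerates:
# a node must still reach at least (N // 2 + 1) voting disks to stay alive.

def votedisk_failure_tolerance(n):
    """Max voting disks that may fail while a majority stays accessible."""
    required = n // 2 + 1
    return n - required

for n in range(2, 7):
    print(n, "voting disks ->", votedisk_failure_tolerance(n), "failures tolerated")
# 2 -> 0, 3 -> 1, 4 -> 1, 5 -> 2, 6 -> 2, matching the list above
```

This is why odd counts are recommended: moving from 3 to 4 (or 5 to 6) buys no extra fault tolerance.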

Supplement 2:
Question:
If the network heartbeat between nodes is normal, and a node can still heartbeat to more voting disks than it cannot access (for example, with 3 voting disks, exactly 1 voting disk's disk heartbeat happens to time out), will a brain split occur?

Answer:
This situation triggers neither a brain split nor the eviction protocol. When a single voting disk, or fewer than (N/2+1) of them, fails its disk heartbeat (such failures may be caused by a transient I/O error while the node accesses the voting disk), CSS immediately marks those failed voting disks OFFLINE. Although some voting disks are OFFLINE, at least (N/2+1) voting disks remain available, which guarantees the eviction protocol is not invoked, so no node is rebooted. The Disk Ping Monitor Thread of the node monitor module (DPMT, clssnmDiskPMT) then repeatedly retries the failed OFFLINE voting disks; if a voting disk becomes accessible for I/O again and the data on it is verified to be free of corruption, CSS marks it ONLINE again. But if the voting disk still cannot be accessed within 45s (the 45s is derived from misscount and an internal algorithm), DPMT writes a warning to cssd.log, such as:

[    CSSD]2011-11-11 20:11:20.668 >WARNING: clssnmDiskPMT: long disk latency (45940 ms) to voting disk (0//dev/asm-votedisk1)

Suppose the RAC scenario that produced the clssnmDiskPMT warning above has 3 voting disks in total, and one of them (asm-votedisk1) has already been marked OFFLINE because of an I/O error or some other reason. If another voting disk then fails its disk heartbeat, the node will have fewer than the required number (2) of voting disks, triggering the eviction protocol and a reboot.

Heartbeat failures on a single voting disk, or on fewer than (N/2+1) voting disks, generate only warnings rather than fatal errors. Because the great majority of the voting disks remain accessible, the warnings are non-fatal and the eviction protocol is not triggered.
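The decision described above can be sketched as a small classifier. This is an illustrative model only (the function name, the simplified OFFLINE counting, and the flat 45s latency threshold are assumptions; the real clssnmDiskPMT retry logic is omitted):

```python
# Illustrative model of the voting-disk decision described above:
# losing a minority of disks only warns; losing the majority evicts.

def disk_heartbeat_status(total_disks, offline_disks, latency_ms=0):
    online = total_disks - offline_disks
    if online < total_disks // 2 + 1:
        return "EVICT"                       # eviction protocol -> node reboot
    if latency_ms > 45000:                   # ~45s derived from misscount
        return "WARNING: long disk latency"  # non-fatal, clssnmDiskPMT-style
    return "OK"

print(disk_heartbeat_status(3, 1))  # OK: 2 of 3 disks still online
print(disk_heartbeat_status(3, 2))  # EVICT: majority lost
```

With 3 voting disks, the first OFFLINE disk is survivable; the second is not, matching the asm-votedisk1 example above.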

 

When an actual NM reconfiguration occurs, all active nodes and nodes in the process of joining the cluster take part in the reconfig; nodes that do not answer (ack) are excluded from the new cluster membership. The reconfig actually consists of several phases:
1. Initialization phase: the reconfig manager (the node with the lowest cluster member number) signals the other nodes to start the reconfig
2. Voting phase: each node sends its view of the membership to the reconfig manager
3. Brain split check phase: the reconfig manager checks for a brain split
4. Eviction phase: the reconfig manager evicts non-member nodes
5. Update phase: the reconfig manager sends the authoritative membership information to the member nodes
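The five phases above can be modeled as a fixed pipeline run by the reconfig manager. A minimal sketch (the phase names and `run_reconfig` helper are illustrative, not CSS internals):

```python
# Illustrative model of the NM reconfiguration phases listed above.

RECONFIG_PHASES = [
    "initialization",  # reconfig manager signals the start
    "voting",          # nodes send their view of membership
    "split_check",     # manager checks for a brain split
    "eviction",        # non-member nodes are evicted
    "update",          # authoritative membership is broadcast
]

def run_reconfig(nodes, acked):
    """Nodes that never ack are dropped from the new incarnation."""
    for phase in RECONFIG_PHASES:
        print("phase:", phase)
    return sorted(set(nodes) & set(acked))

print(run_reconfig([1, 2, 3], acked=[2, 3]))  # [2, 3]
```

The log walkthroughs later in this article show these phases as clssnmSendSync, clssnmSendVoteInfo, clssnmCheckDskInfo, clssnmEvict, and clssnmSendUpdate traces.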

 

In the brain split check phase, the reconfig manager identifies the nodes that have a disk heartbeat but no network heartbeat, and uses the network heartbeat information (where available) together with the disk heartbeat information to count the nodes in each competing subcluster. It then decides which subcluster survives based on the following factors:

  1. The subcluster with the largest number of nodes
  2. If the subclusters are equal in size, the subcluster containing the lowest node number; for example, in a 2-node RAC environment, node 1 always wins
  3. I/O fencing (remote power reset) using the STONITH algorithm
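Rules 1 and 2 above amount to a simple ordering over subclusters. A minimal sketch (`surviving_subcluster` is an illustrative helper, not a CSS function):

```python
# Survival rules above: the largest subcluster wins; on a tie,
# the subcluster containing the lowest node number wins.

def surviving_subcluster(subclusters):
    """subclusters: list of lists of node numbers; returns the winner."""
    return max(subclusters, key=lambda sc: (len(sc), -min(sc)))

print(surviving_subcluster([[2, 3], [1]]))  # [2, 3]  (more nodes)
print(surviving_subcluster([[1], [2]]))     # [1]     (tie -> lowest node number)
```

Both example scenarios at the end of this article follow exactly these two rules: nodes 2+3 beat node 1 on size, and node 2 beats node 3 on node number.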

 

Supplement:

The STONITH algorithm is a commonly used I/O fencing algorithm and the interface RAC needs for remotely shutting down a node. The idea is very simple: when software running on one node wants to make sure the other nodes in the cluster cannot use some resource, it pulls the other nodes' power plugs. It is a simple, reliable, and somewhat brutal algorithm. STONITH's strengths are that it has no particular hardware requirements and does not limit cluster scalability.

The Process Monitor module of Oracle Clusterware implements I/O fencing to guarantee that uncoordinated nodes/instances cannot cause corruption. Process Monitor's work is actually carried out by the hangcheck timer or oprocd. On Linux there was no oprocd before 10.2.0.4 (other Unix platforms had it from 10.2.0.1), so the hangcheck timer software had to be installed before installing RAC to provide I/O fencing; from 10.2.0.4 Linux has oprocd as well (see <Know about RAC Clusterware Process OPROCD>). The processes responsible for I/O fencing are generally locked in memory, run in real time, sleep for a fixed interval, and run as root; if such a process wakes up and finds it is too late, it forces a reboot, and if the process itself fails, the node also reboots. oprocd is therefore a very important process in a RAC environment; do not attempt to kill it by hand.

After the brain split check comes the eviction phase. The evicted nodes receive the eviction messages sent to them (if the network is available); when a message cannot be sent, the eviction notice is delivered by writing it into the "kill block" on the voting disk. The reconfig manager then waits for the evicted nodes to indicate that they have received the eviction notice, either through network communication or through status information on the voting disk.

As you can see, during the brain split check Oracle CSS does its best to keep the largest subcluster alive so that the RAC system has the highest availability; it is not, as that senior engineer claimed, a matter of nodes seizing the voting disks during cluster reconfiguration to decide which node survives.

Below are two example scenarios in a three-node RAC environment:

1. Node 1's network fails, and nodes 2 and 3 form a subcluster; nodes 2 and 3 initiate the eviction of node 1 via the voting disk:

Below is node 1's ocssd.log:
[    CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50% heartbeat fatal, eviction in 29.610 seconds
[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE:   clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1037, misstime 30390
[    CSSD]2011-04-23 17:11:42.943 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 50% heartbeat fatal, eviction in 29.150 seconds

misscount timing is started against nodes 2 and 3

[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE:   clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30850
[    CSSD]2011-04-23 17:11:42.943 [3042950032] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2011-04-23 17:11:44.368 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 50% heartbeat fatal, eviction in 28.610 seconds
[    CSSD]2011-04-23 17:12:04.778 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75% heartbeat fatal, eviction in 14.580 seconds
[    CSSD]2011-04-23 17:12:04.779 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 75% heartbeat fatal, eviction in 14.120 seconds
[    CSSD]2011-04-23 17:12:06.207 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 75% heartbeat fatal, eviction in 13.580 seconds
[    CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 5.560 seconds
[    CSSD]2011-04-23 17:12:17.719 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 5.100 seconds
[    CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 4.560 seconds
[    CSSD]2011-04-23 17:12:19.165 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 4.100 seconds
[    CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 3.560 seconds
[    CSSD]2011-04-23 17:12:20.642 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 3.100 seconds
[    CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 2.560 seconds
[    CSSD]2011-04-23 17:12:22.139 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 2.100 seconds
[    CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 1.550 seconds
[    CSSD]2011-04-23 17:12:23.588 [3042950032] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 1.090 seconds

Node 2's ocssd.log:

[    CSSD]2011-04-23 17:11:53.054 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50% heartbeat fatal, eviction in 29.800 seconds
[    CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE:   clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30200
[    CSSD]2011-04-23 17:11:53.054 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2011-04-23 17:11:54.516 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50% heartbeat fatal, eviction in 28.790 seconds
[    CSSD]2011-04-23 17:12:14.826 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75% heartbeat fatal, eviction in 14.800 seconds
[    CSSD]2011-04-23 17:12:16.265 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75% heartbeat fatal, eviction in 13.800 seconds
[    CSSD]2011-04-23 17:12:27.755 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 5.800 seconds
[    CSSD]2011-04-23 17:12:29.197 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 4.800 seconds
[    CSSD]2011-04-23 17:12:30.658 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 3.800 seconds
[    CSSD]2011-04-23 17:12:32.133 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 2.800 seconds
[    CSSD]2011-04-23 17:12:33.602 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 1.790 seconds
[    CSSD]2011-04-23 17:12:35.126 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 0.800 seconds

[    CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:12:35.399 [117574544] >TRACE:   clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]

clssnmHandleSync acknowledges the sync message sent by node 3

[    CSSD]2011-04-23 17:12:35.399 [5073104] >USER:    NMEVENT_SUSPEND [00][00][00][0e]

A Node Monitoring SUSPEND event occurs

[    CSSD]2011-04-23 17:12:35.405 [117574544] >TRACE:   clssnmSendVoteInfo: node(3) syncSeqNo(10)

The vote message (Vote mesg) is sent to node 3 via clssnmSendVoteInfo

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState: node 1, state (3/0) unique (1303592601/1303592601) prevConuni(0) birth (9/9) (old/new)
[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7e80ae8), probe((nil))
[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmDeactivateNode: node 1 (vrh1) left cluster

Node 1 is confirmed to have left the cluster

[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)
[    CSSD]2011-04-23 17:12:35.415 [117574544] >TRACE:   clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new)
[    CSSD]2011-04-23 17:12:35.415 [117574544] >USER:    clssnmHandleUpdate: SYNC(10) from node(3) completed
[    CSSD]2011-04-23 17:12:35.416 [117574544] >USER:    clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2011-04-23 17:12:35.416 [117574544] >USER:    clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2011-04-23 17:12:35.416 [117574544] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms
[    CSSD]2011-04-23 17:12:35.416 [3021970320] >TRACE:   clssgmReconfigThread:  started for reconfig (10)
[    CSSD]2011-04-23 17:12:35.416 [3021970320] >USER:    NMEVENT_RECONFIG [00][00][00][0c]
[    CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2
[    CSSD]2011-04-23 17:12:35.417 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9)
[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3
[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2
[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(EVMDMAIN) birth(9/9)
[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2
[    CSSD]2011-04-23 17:12:35.418 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(CRSDMAIN) birth(9/9)
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh1 type 3
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh2 type 3
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh3 type 3
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock ocr_crs type 2
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(ocr_crs) birth(9/9)
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9)
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmEstablishConnections: 2 nodes in cluster incarn 10
[    CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE:   clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa
[    CSSD]2011-04-23 17:12:35.419 [3063929744] >TRACE:   clssgmPeerListener: connects done (2/2)
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)
[    CSSD]2011-04-23 17:12:35.419 [3021970320] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status
[    CSSD]2011-04-23 17:12:35.428 [3021970320] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes

[    CSSD]CLSS-3001: local node number 2, master node number 2

Reconfiguration completes

[    CSSD]2011-04-23 17:12:35.440 [3021970320] >TRACE:   clssgmReconfigThread:  completed for reconfig(10), with status(1)

Below is node 3's ocssd.log:

[    CSSD]2011-04-23 17:12:36.303 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 50% heartbeat fatal, eviction in 29.220 seconds
[    CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE:   clssnmPollingThread: node vrh1 (1) is impending reconfig, flag 1037, misstime 30780
[    CSSD]2011-04-23 17:12:36.303 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2011-04-23 17:12:57.889 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 75% heartbeat fatal, eviction in 14.220 seconds
[    CSSD]2011-04-23 17:13:10.674 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 5.220 seconds
[    CSSD]2011-04-23 17:13:12.115 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 4.220 seconds
[    CSSD]2011-04-23 17:13:13.597 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 3.210 seconds
[    CSSD]2011-04-23 17:13:15.024 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 2.220 seconds
[    CSSD]2011-04-23 17:13:16.504 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 1.220 seconds
[    CSSD]2011-04-23 17:13:17.987 [3053439888] >WARNING: clssnmPollingThread: node vrh1 (1) at 90% heartbeat fatal, eviction in 0.220 seconds
[    CSSD]2011-04-23 17:13:18.325 [3053439888] >TRACE:   clssnmPollingThread: Eviction started for node vrh1 (1), flags 0x040d, state 3, wt4c 0
[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmDoSyncUpdate: Initiating sync 10
[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (11)
[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE
[    CSSD]2011-04-23 17:13:18.326 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ALIVE
[    CSSD]2011-04-23 17:13:18.327 [3032460176] >TRACE:   clssnmSendSync: syncSeqNo(10)
[    CSSD]2011-04-23 17:13:18.329 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(2)
[    CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:13:18.329 [89033616] >TRACE:   clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[21] sync[10]
[    CSSD]2011-04-23 17:13:18.330 [8136912] >USER:    NMEVENT_SUSPEND [00][00][00][0e]
[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(11)
[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmDoSyncUpdate: Terminating node 1, vrh1, misstime(60010) state(5)
[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (13)
[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE
[    CSSD]2011-04-23 17:13:18.332 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ACTIVE
[    CSSD]2011-04-23 17:13:18.334 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(2)
[    CSSD]2011-04-23 17:13:18.335 [89033616] >TRACE:   clssnmSendVoteInfo: node(3) syncSeqNo(10)
[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(13)

The vote message exchange between nodes 2 and 3 is now complete; these messages include the node identifier, the GM peer-to-peer listening endpoint, and the view of cluster membership.

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmCheckDskInfo: Checking disk info...

Checking of the information on the voting disk begins

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmCheckDskInfo: node 1, vrh1, state 5 with leader 1 has smaller cluster size 1; my cluster size 2 with leader 2

Another subcluster is found: it contains node 1, with node 1 as its leader, and is the smaller subcluster; nodes 2 and 3 form the larger subcluster, with node 2 as its leader

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmEvict: Start
[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmEvict: Evicting node 1, vrh1, birth 9, death 10, impendingrcfg 1, stateflags 0x40d

The eviction of node 1 is initiated

[    CSSD]2011-04-23 17:13:18.337 [3032460176] >TRACE:   clssnmSendShutdown: req to node 1, kill time 443294
[    CSSD]2011-04-23 17:13:18.339 [3032460176] >TRACE:   clssnmDiscHelper: vrh1, node(1) connection failed, con (0xb7eaf220), probe((nil))
[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmWaitOnEvictions: Start
[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmWaitOnEvictions: node 1, vrh1, undead 1
[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmCheckKillStatus: Node 1, vrh1, down, LATS(443144),timeout(150)

clssnmCheckKillStatus checks whether node 1 is down

[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (15)
[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE
[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ACTIVE
[    CSSD]2011-04-23 17:13:18.340 [3032460176] >TRACE:   clssnmSendUpdate: syncSeqNo(10)
[    CSSD]2011-04-23 17:13:18.341 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(2)
[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 1, state (5/0) unique (1303592601/1303592601) prevConuni(1303592601) birth (9/9) (old/new)
[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmDeactivateNode: node 1 (vrh1) left cluster

[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)
[    CSSD]2011-04-23 17:13:18.341 [89033616] >TRACE:   clssnmUpdateNodeState: node 3, state (3/3) unique (1303591326/1303591326) prevConuni(0) birth (3/3) (old/new)
[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:    clssnmHandleUpdate: SYNC(10) from node(3) completed
[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:    clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2011-04-23 17:13:18.342 [89033616] >USER:    clssnmHandleUpdate: NODE 3 (vrh3) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2011-04-23 17:13:18.342 [89033616] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms
[    CSSD]2011-04-23 17:13:18.347 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(15)
[    CSSD]2011-04-23 17:13:18.348 [3032460176] >TRACE:   clssnmDoSyncUpdate: Sync 10 complete!
[    CSSD]2011-04-23 17:13:18.350 [3021970320] >TRACE:   clssgmReconfigThread:  started for reconfig (10)
[    CSSD]2011-04-23 17:13:18.350 [3021970320] >USER:    NMEVENT_RECONFIG [00][00][00][0c]
[    CSSD]2011-04-23 17:13:18.351 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2
[    CSSD]2011-04-23 17:13:18.352 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(crs_version) birth(9/9)
[    CSSD]2011-04-23 17:13:18.353 [3063929744] >TRACE:   clssgmDispatchCMXMSG(): got message type(7) src(2) incarn(10) during incarn(9/9)
[    CSSD]2011-04-23 17:13:18.354 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3
[    CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2
[    CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(EVMDMAIN) birth(9/9)
[    CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2
[    CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(CRSDMAIN) birth(9/9)
[    CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh1 type 3
[    CSSD]2011-04-23 17:13:18.355 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh2 type 3
[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh3 type 3
[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock ocr_crs type 2
[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(ocr_crs) birth(9/9)
[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2
[    CSSD]2011-04-23 17:13:18.356 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(1) grock(#CSS_CLSSOMON) birth(9/9)
[    CSSD]2011-04-23 17:13:18.357 [3021970320] >TRACE:   clssgmEstablishConnections: 2 nodes in cluster incarn 10
[    CSSD]2011-04-23 17:13:18.366 [3063929744] >TRACE:   clssgmPeerDeactivate: node 1 (vrh1), death 10, state 0x80000000 connstate 0xa
[    CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE:   clssgmHandleDBDone(): src/dest (2/65535) size(68) incarn 10
[    CSSD]2011-04-23 17:13:18.367 [3063929744] >TRACE:   clssgmPeerListener: connects done (2/2)
[    CSSD]2011-04-23 17:13:18.369 [3021970320] >TRACE:   clssgmEstablishMasterNode: MASTER for 10 is node(2) birth(2)

Update phase

[    CSSD]CLSS-3000: reconfiguration successful, incarnation 10 with 2 nodes

[    CSSD]CLSS-3001: local node number 3, master node number 2

[    CSSD]2011-04-23 17:13:18.372 [3021970320] >TRACE:   clssgmReconfigThread:  completed for reconfig(10), with status(1)

2. In another scenario, node 1 has not joined the cluster and node 2's network fails; because node 2 has the smaller member number, it evicts node 3 via the voting disk

Below is node 2's ocssd.log:

[    CSSD]2011-04-23 17:41:48.643 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50% heartbeat fatal, eviction in 29.890 seconds
[    CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE:   clssnmPollingThread: node vrh3 (3) is impending reconfig, flag 1037, misstime 30110
[    CSSD]2011-04-23 17:41:48.643 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2011-04-23 17:41:50.132 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 50% heartbeat fatal, eviction in 28.890 seconds
[    CSSD]2011-04-23 17:42:10.533 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75% heartbeat fatal, eviction in 14.860 seconds
[    CSSD]2011-04-23 17:42:11.962 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 75% heartbeat fatal, eviction in 13.860 seconds
[    CSSD]2011-04-23 17:42:23.523 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 5.840 seconds
[    CSSD]2011-04-23 17:42:24.989 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 4.840 seconds
[    CSSD]2011-04-23 17:42:26.423 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 3.840 seconds
[    CSSD]2011-04-23 17:42:27.890 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 2.840 seconds
[    CSSD]2011-04-23 17:42:29.382 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 1.840 seconds
[    CSSD]2011-04-23 17:42:30.832 [3053439888] >WARNING: clssnmPollingThread: node vrh3 (3) at 90% heartbeat fatal, eviction in 0.830 seconds
[    CSSD]2011-04-23 17:42:32.020 [3053439888] >TRACE:   clssnmPollingThread: Eviction started for node vrh3 (3), flags 0x040d, state 3, wt4c 0
[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmDoSyncUpdate: Initiating sync 13
[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (11)
[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ALIVE
[    CSSD]2011-04-23 17:42:32.020 [3032460176] >TRACE:   clssnmSendSync: syncSeqNo(13)
[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)
[    CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:42:32.021 [117574544] >TRACE:   clssnmHandleSync: Acknowledging sync: src[2] srcName[vrh2] seq[13] sync[13]
[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(11)
[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmDoSyncUpdate: Terminating node 3, vrh3, misstime(60000) state(5)
[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (13)
[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE
[    CSSD]2011-04-23 17:42:32.021 [5073104] >USER:    NMEVENT_SUSPEND [00][00][00][0c]
[    CSSD]2011-04-23 17:42:32.021 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)
[    CSSD]2011-04-23 17:42:32.022 [117574544] >TRACE:   clssnmSendVoteInfo: node(2) syncSeqNo(13)
[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(13)
[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmCheckDskInfo: Checking disk info...
[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmCheckDskInfo: node 3, vrh3, state 5 with leader 3 has smaller cluster size 1; my cluster size 1 with leader 2

After checking the voting disk, the subcluster led by node 3 is found to be the smaller "subcluster" (node 3's node number is larger than node 2's); node 2's subcluster is the larger one

[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmEvict: Start
[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmEvict: Evicting node 3, vrh3, birth 3, death 13, impendingrcfg 1, stateflags 0x40d
[    CSSD]2011-04-23 17:42:32.022 [3032460176] >TRACE:   clssnmSendShutdown: req to node 3, kill time 1643084

The eviction of node 3 and a shutdown request are initiated

[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmDiscHelper: vrh3, node(3) connection failed, con (0xb7e79bb0), probe((nil))
[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmWaitOnEvictions: Start
[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmWaitOnEvictions: node 3, vrh3, undead 1
[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmCheckKillStatus: Node 3, vrh3, down, LATS(1642874),timeout(210)
[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (15)
[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSetupAckWait: node(2) is ACTIVE
[    CSSD]2011-04-23 17:42:32.023 [3032460176] >TRACE:   clssnmSendUpdate: syncSeqNo(13)
[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(15), ackCount(1)
[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 1, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 2, state (3/3) unique (1303591210/1303591210) prevConuni(0) birth (2/2) (old/new)
[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmUpdateNodeState: node 3, state (5/0) unique (1303591326/1303591326) prevConuni(1303591326) birth (3/3) (old/new)
[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmDeactivateNode: node 3 (vrh3) left cluster

[    CSSD]2011-04-23 17:42:32.024 [117574544] >USER:    clssnmHandleUpdate: SYNC(13) from node(2) completed
[    CSSD]2011-04-23 17:42:32.024 [117574544] >USER:    clssnmHandleUpdate: NODE 2 (vrh2) IS ACTIVE MEMBER OF CLUSTER
[    CSSD]2011-04-23 17:42:32.024 [117574544] >TRACE:   clssnmHandleUpdate: diskTimeout set to (200000)ms
[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(15)
[    CSSD]2011-04-23 17:42:32.024 [3032460176] >TRACE:   clssnmDoSyncUpdate: Sync 13 complete!
[    CSSD]2011-04-23 17:42:32.024 [3021970320] >TRACE:   clssgmReconfigThread:  started for reconfig (13)
[    CSSD]2011-04-23 17:42:32.024 [3021970320] >USER:    NMEVENT_RECONFIG [00][00][00][04]
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock crs_version type 2
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(2) grock(crs_version) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_FAILOVER type 3
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(_ORA_CRS_FAILOVER) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock EVMDMAIN type 2
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(EVMDMAIN) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock CRSDMAIN type 2
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(CRSDMAIN) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh1 type 3
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(_ORA_CRS_MEMBER_vrh1) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock _ORA_CRS_MEMBER_vrh3 type 3
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(0) grock(_ORA_CRS_MEMBER_vrh3) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock ocr_crs type 2
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(ocr_crs) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupGrocks: cleaning up grock #CSS_CLSSOMON type 2
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmCleanupOrphanMembers: cleaning up remote mbr(3) grock(#CSS_CLSSOMON) birth(3/3)
[    CSSD]2011-04-23 17:42:32.025 [3021970320] >TRACE:   clssgmEstablishConnections: 1 nodes in cluster incarn 13
[    CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE:   clssgmPeerDeactivate: node 3 (vrh3), death 13, state 0x0 connstate 0xf
[    CSSD]2011-04-23 17:42:32.026 [3063929744] >TRACE:   clssgmPeerListener: connects done (1/1)
[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmEstablishMasterNode: MASTER for 13 is node(2) birth(2)
[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status
[    CSSD]2011-04-23 17:42:32.026 [3021970320] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 13 with 1 nodes

[    CSSD]CLSS-3001: local node number 2, master node number 2

Reconfiguration is complete on node 2.

[    CSSD]2011-04-23 17:42:32.027 [3021970320] >TRACE:   clssgmReconfigThread:  completed for reconfig(13), with status(1)
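A clean reconfiguration leaves a `CLSS-3000` line and a `clssgmReconfigThread ... completed` line in ocssd.log, as seen above. A minimal grep sketch to confirm this (it runs against a sample file built from the lines above; in a real system you would point it at `$ORA_CRS_HOME/log/<hostname>/cssd/ocssd.log` instead — the path varies by release and install):

```shell
# Build a small sample from the log excerpt above (hypothetical path /tmp/ocssd_sample.log).
cat > /tmp/ocssd_sample.log <<'EOF'
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 13 with 1 nodes
[    CSSD]CLSS-3001: local node number 2, master node number 2
[    CSSD]2011-04-23 17:42:32.027 [3021970320] >TRACE:   clssgmReconfigThread:  completed for reconfig(13), with status(1)
EOF

# Print the lines that mark a successful reconfiguration.
grep -E 'CLSS-3000|clssgmReconfigThread.*completed' /tmp/ocssd_sample.log
```

Status `(1)` on the reconfig thread completion line indicates success; both lines should appear once per reconfiguration incarnation.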

The following is the ocssd.log from node 3:

[    CSSD]2011-04-23 17:42:33.204 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 50% heartbeat fatal, eviction in 29.360 seconds
[    CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE:   clssnmPollingThread: node vrh2 (2) is impending reconfig, flag 1039, misstime 30640
[    CSSD]2011-04-23 17:42:33.204 [3053439888] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2011-04-23 17:42:55.168 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 75% heartbeat fatal, eviction in 14.330 seconds
[    CSSD]2011-04-23 17:43:08.182 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 5.310 seconds
[    CSSD]2011-04-23 17:43:09.661 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 4.300 seconds
[    CSSD]2011-04-23 17:43:11.144 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 3.300 seconds
[    CSSD]2011-04-23 17:43:12.634 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 2.300 seconds
[    CSSD]2011-04-23 17:43:14.053 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 1.300 seconds
[    CSSD]2011-04-23 17:43:15.467 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 0.300 seconds
[    CSSD]2011-04-23 17:43:15.911 [3053439888] >TRACE:   clssnmPollingThread: Eviction started for node vrh2 (2), flags 0x040f, state 3, wt4c 0
[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmDoSyncUpdate: Initiating sync 13
[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmDoSyncUpdate: diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (11)
[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ALIVE
[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmSendSync: syncSeqNo(13)
[    CSSD]2011-04-23 17:43:15.911 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(11), ackCount(1)
[    CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE:   clssnmHandleSync: diskTimeout set to (57000)ms
[    CSSD]2011-04-23 17:43:15.912 [89033616] >TRACE:   clssnmHandleSync: Acknowledging sync: src[3] srcName[vrh3] seq[29] sync[13]
[    CSSD]2011-04-23 17:43:15.912 [8136912] >USER:    NMEVENT_SUSPEND [00][00][00][0c]
[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmWaitForAcks: done, msg type(11)
[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmDoSyncUpdate: Terminating node 2, vrh2, misstime(60010) state(5)
[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmSetupAckWait: Ack message type (13)
[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmSetupAckWait: node(3) is ACTIVE
[    CSSD]2011-04-23 17:43:15.913 [89033616] >TRACE:   clssnmSendVoteInfo: node(3) syncSeqNo(13)
[    CSSD]2011-04-23 17:43:15.912 [3032460176] >TRACE:   clssnmWaitForAcks: Ack message type(13), ackCount(1)
[    CSSD]2011-04-23 17:43:15.913 [3032460176] >TRACE:   clssnmCheckDskInfo: Checking disk info...
[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   clssnmCheckDskInfo: Aborting local node to avoid splitbrain.
[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   : my node(3), Leader(3), Size(1) VS Node(2), Leader(2), Size(1)

After reading the voting disk, node 3 finds the kill block; to avoid split brain, it aborts itself.

[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   ###################################
[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   clssscExit: CSSD aborting
[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   ###################################
[    CSSD]--- DUMP GROCK STATE DB ---
[    CSSD]----------
[    CSSD]  type 2, Id 4, Name = (crs_version)
[    CSSD]  flags: 0x1000
[    CSSD]  grant: count=0, type 0, wait 0
[    CSSD]  Member Count =2, master 2
[    CSSD]   . . . . .
[    CSSD]     memberNo =2, seq 2
[    CSSD]     flags = 0x0, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 3, nodeBirth 3
[    CSSD]     privateDataSize = 0
[    CSSD]     publicDataSize = 0
[    CSSD]   . . . . .
[    CSSD]     memberNo =1, seq 12
[    CSSD]     flags = 0x1000, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 2, nodeBirth 2
[    CSSD]     privateDataSize = 0
[    CSSD]     publicDataSize = 0
[    CSSD]----------
[    CSSD]----------
[    CSSD]  type 3, Id 11, Name = (_ORA_CRS_FAILOVER)
[    CSSD]  flags: 0x0
[    CSSD]  grant: count=1, type 3, wait 1
[    CSSD]  Member Count =1, master -3
[    CSSD]   . . . . .
[    CSSD]     memberNo =0, seq 0
[    CSSD]     flags = 0x12, granted 1
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 3, nodeBirth 3
[    CSSD]     privateDataSize = 0
[    CSSD]     publicDataSize = 0
[    CSSD]----------
[    CSSD]----------
[    CSSD]  type 2, Id 2, Name = (EVMDMAIN)
[    CSSD]  flags: 0x1000
[    CSSD]  grant: count=0, type 0, wait 0
[    CSSD]  Member Count =2, master 2
[    CSSD]   . . . . .
[    CSSD]     memberNo =2, seq 1
[    CSSD]     flags = 0x0, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 2, nodeBirth 2
[    CSSD]     privateDataSize = 508
[    CSSD]     publicDataSize = 504
[    CSSD]   . . . . .
[    CSSD]     memberNo =3, seq 2
[    CSSD]     flags = 0x0, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 3, nodeBirth 3
[    CSSD]     privateDataSize = 508
[    CSSD]     publicDataSize = 504
[    CSSD]----------
[    CSSD]----------
[    CSSD]  type 2, Id 5, Name = (CRSDMAIN)
[    CSSD]  flags: 0x1000
[    CSSD]  grant: count=0, type 0, wait 0
[    CSSD]  Member Count =1, master 3
[    CSSD]   . . . . .
[    CSSD]     memberNo =3, seq 2
[    CSSD]     flags = 0x0, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 3, nodeBirth 3
[    CSSD]     privateDataSize = 128
[    CSSD]     publicDataSize = 128
[    CSSD]----------
[    CSSD]----------
[    CSSD]  type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh1)
[    CSSD]  flags: 0x0
[    CSSD]  grant: count=1, type 3, wait 1
[    CSSD]  Member Count =1, master -3
[    CSSD]   . . . . .
[    CSSD]     memberNo =0, seq 0
[    CSSD]     flags = 0x12, granted 1
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 3, nodeBirth 3
[    CSSD]     privateDataSize = 0
[    CSSD]     publicDataSize = 0
[    CSSD]----------
[    CSSD]----------
[    CSSD]  type 3, Id 12, Name = (_ORA_CRS_MEMBER_vrh3)
[    CSSD]  flags: 0x0
[    CSSD]  grant: count=1, type 3, wait 1
[    CSSD]  Member Count =1, master -3
[    CSSD]   . . . . .
[    CSSD]     memberNo =0, seq 0
[    CSSD]     flags = 0x12, granted 1
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 3, nodeBirth 3
[    CSSD]     privateDataSize = 0
[    CSSD]     publicDataSize = 0
[    CSSD]----------
[    CSSD]----------
[    CSSD]  type 2, Id 3, Name = (ocr_crs)
[    CSSD]  flags: 0x1000
[    CSSD]  grant: count=0, type 0, wait 0
[    CSSD]  Member Count =2, master 3
[    CSSD]   . . . . .
[    CSSD]     memberNo =3, seq 2
[    CSSD]     flags = 0x0, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 3, nodeBirth 3
[    CSSD]     privateDataSize = 0
[    CSSD]     publicDataSize = 32
[    CSSD]   . . . . .
[    CSSD]     memberNo =2, seq 12
[    CSSD]     flags = 0x1000, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 2, nodeBirth 2
[    CSSD]     privateDataSize = 0
[    CSSD]     publicDataSize = 32
[    CSSD]----------
[    CSSD]----------
[    CSSD]  type 2, Id 1, Name = (#CSS_CLSSOMON)
[    CSSD]  flags: 0x1000
[    CSSD]  grant: count=0, type 0, wait 0
[    CSSD]  Member Count =2, master 2
[    CSSD]   . . . . .
[    CSSD]     memberNo =2, seq 1
[    CSSD]     flags = 0x1000, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 2, nodeBirth 2
[    CSSD]     privateDataSize = 0
[    CSSD]     publicDataSize = 0
[    CSSD]   . . . . .
[    CSSD]     memberNo =3, seq 2
[    CSSD]     flags = 0x1000, granted 0
[    CSSD]     refCnt = 1
[    CSSD]     nodeNum = 3, nodeBirth 3
[    CSSD]     privateDataSize = 0
[    CSSD]     publicDataSize = 0
[    CSSD]----------
[    CSSD]--- END OF GROCK STATE DUMP ---
[    CSSD]------- Begin Dump -------
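When diagnosing an eviction like the one above, it helps to pull the heartbeat warnings and the split-brain abort out of a long ocssd.log in one pass. A hedged sketch (again demonstrated on a sample file built from lines in this post; substitute your real ocssd.log path):

```shell
# Sample built from the node 3 excerpt above (hypothetical path /tmp/ocssd_evict.log).
cat > /tmp/ocssd_evict.log <<'EOF'
[    CSSD]2011-04-23 17:43:15.467 [3053439888] >WARNING: clssnmPollingThread: node vrh2 (2) at 90% heartbeat fatal, eviction in 0.300 seconds
[    CSSD]2011-04-23 17:43:15.911 [3053439888] >TRACE:   clssnmPollingThread: Eviction started for node vrh2 (2), flags 0x040f, state 3, wt4c 0
[    CSSD]2011-04-23 17:43:15.913 [3032460176] >ERROR:   clssnmCheckDskInfo: Aborting local node to avoid splitbrain.
EOF

# Surface the eviction countdown, the eviction start, and any split-brain abort.
grep -E 'heartbeat fatal|Eviction started|avoid splitbrain' /tmp/ocssd_evict.log
```

The heartbeat-fatal warnings give the timeline leading up to the eviction, and the `clssnmCheckDskInfo` error confirms the node killed itself after reading the voting disk rather than risk a split brain.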
