Know more about Enqueue Deadlock Detection

今天在 ORACLE ALLSTAR群里讨论了一个关于队列锁死锁检测的问题,原帖子在这里。 有同学指出对于enqueue lock的死锁检测应当是每3 秒钟检测一次,这样说的依据是通过一个简单可重复的实验可以证明在实际出现ora-00060 dead lock错前process等待了3s:

 

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

PROCESS A:

set timing on;
update maclean1 set t1=t1+1;

PROCESS B:

update maclean2 set t1=t1+1;

PROCESS A:
update maclean2 set t1=t1+1;

PROCESS B:

update maclean1 set t1=t1+1;

等待3s后 PROCESS A 会报

ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

Elapsed: 00:00:03.02

 

 

可以看到Process A在检测到死锁之前确实等待了 3s,而且这是一个可以重复的实验,很具有说服力。

事实真的是这样的吗?

 

来看下面的演示:

 

SQL> col name for a30
SQL> col value for a5
SQL> col DESCRIB for a50
SQL> set linesize 140 pagesize 1400
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm='_enqueue_deadlock_scan_secs';

NAME                           VALUE DESCRIB
------------------------------ ----- --------------------------------------------------
_enqueue_deadlock_scan_secs    0    deadlock scan interval

SQL> alter system set "_enqueue_deadlock_scan_secs"=18 scope=spfile;

System altered.

Elapsed: 00:00:00.01

SQL> startup force;
ORACLE instance started.

Total System Global Area  851443712 bytes
Fixed Size                  2100040 bytes
Variable Size             738198712 bytes
Database Buffers          104857600 bytes
Redo Buffers                6287360 bytes
Database mounted.
Database opened.

PROCESS A:

SQL> set timing on;
SQL> update maclean1 set t1=t1+1;

1 row updated.

Elapsed: 00:00:00.06

Process B

SQL> update maclean2 set t1=t1+1;

1 row updated.

SQL>  update maclean1 set t1=t1+1;

Process A:
SQL>
SQL> alter session set events '10704 trace name context forever,level 10:10046 trace name context forever,level 8';

Session altered.

SQL> update maclean2 set t1=t1+1;
update maclean2 set t1=t1+1
       *
ERROR at line 1:

ORA-00060: deadlock detected while waiting for resource 

Elapsed: 00:00:18.05

ksqcmi: TX,90011,4a9 mode=6 timeout=21474836
WAIT #12: nam='enq: TX - row lock contention' ela= 2930070 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114759849120
WAIT #12: nam='enq: TX - row lock contention' ela= 2930636 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114762779801
WAIT #12: nam='enq: TX - row lock contention' ela= 2930439 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114765710430
*** 2012-06-12 09:58:43.089
WAIT #12: nam='enq: TX - row lock contention' ela= 2931698 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114768642192
WAIT #12: nam='enq: TX - row lock contention' ela= 2930428 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114771572755
WAIT #12: nam='enq: TX - row lock contention' ela= 2931408 name|mode=1415053318 usn<<16 | slot=589841 sequence=1193 obj#=56810 tim=1308114774504207
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:

 

 

 

以上可以看到Process A从’enq: TX – row lock contention’ 等待到触发ORA-00060 deadlock detected的时间从3s 变成了 18s , 恰巧是hidden parameter “_enqueue_deadlock_scan_secs”所指定的值,该隐藏参数默认为0。

 

我们再来看另一个实验:

 

SQL> alter system set "_enqueue_deadlock_scan_secs"=4 scope=spfile;

System altered.

Elapsed: 00:00:00.01

SQL> alter system set "_enqueue_deadlock_time_sec"=9 scope=spfile;

System altered.

Elapsed: 00:00:00.00

SQL> startup force;
ORACLE instance started.

Total System Global Area  851443712 bytes
Fixed Size                  2100040 bytes
Variable Size             738198712 bytes
Database Buffers          104857600 bytes
Redo Buffers                6287360 bytes
Database mounted.
Database opened.

SQL> set linesize 140 pagesize 1400

SQL> show parameter dead

NAME                                 TYPE                             VALUE
------------------------------------ -------------------------------- ------------------------------
_enqueue_deadlock_scan_secs          integer                          4
_enqueue_deadlock_time_sec           integer                          9

SQL> set timing on

SQL> select * from maclean1 for update wait 8; 
        T1
----------
        11

Elapsed: 00:00:00.01

PROCESS B

SQL> select * from maclean2 for update wait 8;

        T1
----------
         3

SQL> select * from maclean1 for update wait 8;
select * from maclean1 for update wait 8

PROCESS A

SQL> select * from maclean2 for update wait 8;
select * from maclean2 for update wait 8
              *
ERROR at line 1:
ORA-30006: resource busy; acquire with WAIT timeout expired

Elapsed: 00:00:08.00

 

 

以上我们可以看到 当指定 select for update wait的enqueue request timeout 等待超时为8s时 ,虽然我们指定了”_enqueue_deadlock_scan_secs”=4(deadlock scan interval
),即4s做一次deadlock detected,但是实际Process A并没有做deadlock 的检测, 为什么这样说?

 

因为Process A在这里等待8s后raised的是”ORA-30006: resource busy; acquire with WAIT timeout expired”错误,而非ORA-00060,这就说明了process A没有做死锁检测。

 

这是因为隐藏参数”_enqueue_deadlock_time_sec”(requests with timeout <= this will not have deadlock detection)的存在,当enqueue request time < “_enqueue_deadlock_time_sec”时Server process就会放弃做dead lock detection,这是因为即然指定了enqueue request 超时的时间且timeout值又不是很大(_enqueue_deadlock_time_sec默认值为5,即timeout<5s),那么不做死锁检测也是可以接受的;而对于指定的timeout>”_enqueue_deadlock_time_sec”的情况,Oracle仍需要定时做死锁检测避免长时间的死锁发生。

 

我们再来看下面的演示:

 

 

SQL> show parameter dead

NAME                                 TYPE                             VALUE
------------------------------------ -------------------------------- ------------------------------
_enqueue_deadlock_scan_secs          integer                          4
_enqueue_deadlock_time_sec           integer                          9

Process A:

SQL> set timing on;
SQL> select * from maclean1 for update wait 10;  

        T1
----------
        11

Process B:

SQL> select * from maclean2 for update wait 10;  

        T1
----------
         3

SQL> select * from maclean1 for update wait 10;  

PROCESS A:

SQL> select * from maclean2 for update wait 10;
select * from maclean2 for update wait 10
              *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

Elapsed: 00:00:06.02

 

 

这一次我们指定 select for update wait 10即10s超时, 因为 10s大于当前的_enqueue_deadlock_time_sec设置值(9s),所以Process A就会做死锁检测。 但是可以看到这里死锁检测时间为6s 而不是我们指定_enqueue_deadlock_scan_secs的4s 。

 

通过反复的测试可以发现,实际的死锁检测时间总是_enqueue_deadlock_scan_secs指定的值向上取整为3的倍数。

 

总结:

enqueue lock队列锁的死锁检测遵循以下原则

1. 默认情况下死锁检测deadlock detection总是3s发生一次, 但是实际受到参数_enqueue_deadlock_scan_secs(deadlock scan interval)的控制,该参数默认为0,死锁检测时间总是_enqueue_deadlock_scan_secs指定的值向上取整为3的倍数, 当_enqueue_deadlock_scan_secs=0 则为3s一次, 当_enqueue_deadlock_scan_secs=4则为6s一次,依此类推。

2. 死锁检测还受到_enqueue_deadlock_time_sec(requests with timeout <= this will not have deadlock detection)的影响,若enqueue request timeout< _enqueue_deadlock_time_sec(默认值为5),则Server process不会做死锁检测。若enqueue request timeout>_enqueue_deadlock_time_sec 则会参考_enqueue_deadlock_scan_secs定期做死锁检测, 当然request timeout的指定不限于select for update wait [TIMEOUT]一种。

 

补充:

 

实际在10.2.0.1中尚未引入以上的这2个hidden parameter , 大约在patchset 10.2.0.3开始引入了 _enqueue_deadlock_time_sec, 在patchset 10.2.0.5中开始引入了_enqueue_deadlock_scan_secs。

 

有同学指出RAC中的死锁检测默认应当为10s, 这受到隐藏参数_lm_dd_interval(dd time interval in seconds) ,该参数最早在版本8.0.6中引入。 通过实验可以验证这位同学的说法,我也表示支持,  在10g中该参数默认为60s,从11g开始默认减少到10s。  需要注意的是在11g之前_lm_dd_interval这个参数指定值是不可信任的,这是因为在11g中对LMD进程的死锁检测进行了优化,原本在节点数量较多的RAC中LMD的Deadlock Detection可能消耗更多的CPU,以下是11g中Oracle内部研发Team测试的LMD在多节点情况下的CPU使用情况:

 

 

 

我们进一步来了解11g中LMD检测死锁的细节,这里我们会用到11g特有的 UTS TRACE 来获取更多 DD的信息:

 

 

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL>
SQL> select * from global_name
  2  ;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn 

SQL> alter system set "_lm_dd_interval"=20 scope=spfile;

System altered.

SQL> startup force;
ORACLE instance started.

Total System Global Area 1570009088 bytes
Fixed Size                  2228704 bytes
Variable Size            1325403680 bytes
Database Buffers          234881024 bytes
Redo Buffers                7495680 bytes
Database mounted.
Database opened.

SQL> set linesize 140 pagesize 1400
SQL> show parameter lm_dd

NAME                                 TYPE                             VALUE
------------------------------------ -------------------------------- ------------------------------
_lm_dd_interval                      integer                          20

SQL> select count(*) from gv$instance;

  COUNT(*)
----------
         2

instance 1:
SQL> oradebug setorapid 12
Oracle pid: 12, Unix process pid: 8608, image: oracle@vrh2.oracle.com (LMD0)

对 LMD0进程做 UTS TRACE所有RAC相关的内部操作将无所遁形
SQL> oradebug event 10046 trace name context forever,level 8:10708 trace name context forever,level 103: trace[rac.*] disk high;
Statement processed.

Elapsed: 00:00:00.00

SQL> update maclean1 set t1=t1+1;

1 row updated.

instance 2:

SQL> update maclean2 set t1=t1+1;

1 row updated.

SQL> update maclean1 set t1=t1+1;

Instance 1:

SQL> update maclean2 set t1=t1+1;
update maclean2 set t1=t1+1
       *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource

Elapsed: 00:00:20.51

LMD0进程的UTS TRACE

2012-06-12 22:27:00.929284 : [kjmpbmsg:process][type 22][msg 0x7fa620ac85a8][from 1][seq 8148.0][len 192]
2012-06-12 22:27:00.929346 : [kjmxmpm][type 22][seq 0.0][msg 0x7fa620ac85a8][from 1]

*** 2012-06-12 22:27:00.929

* kjddind: received DDIND msg with subtype x6

*          reqp->dd_master_inst_kjxmddi == 1

* kjddind: dump sgh:
2012-06-12 22:27:00.929346*: kjddind: req->timestamp [0.15], kjddt [0.13]
2012-06-12 22:27:00.929346*: >> DDmsg:KJX_DD_REMOTE,TS[0.15],Inst 1->2,ddxid[id1,id2,inst:2097153,31,1],ddlock[0x95023930,829],ddMasterInst 1
2012-06-12 22:27:00.929346*: lock [0x95023930,829], op = [mast]
2012-06-12 22:27:00.929346*: reqp->timestamp [0.15], kjddt [0.13]
2012-06-12 22:27:00.929346*: kjddind: updated local timestamp [0.15]

* kjddind: case KJX_DD_REMOTE
2012-06-12 22:27:00.929346*: ADD IO NODE WFG: 0 frame pointer
2012-06-12 22:27:00.929346*: PUSH: type=res, enqueue(0xffffffff.0xffffffff)=0xbbb9af40, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: PROCESS: type=res, enqueue(0xffffffff.0xffffffff)=0xbbb9af40, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: POP: type=res, enqueue(0xffffffff.0xffffffff)=0xbbb9af40, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: kjddopr[TX 0xe000c.0x32][ext 0x5,0x0]: blocking lock 0xbbb9a800, owner 2097154 of inst 2
2012-06-12 22:27:00.929346*: PUSH: type=txn, enqueue(0xffffffff.0xffffffff)=0xbbb9a800, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: PROCESS: type=txn, enqueue(0xffffffff.0xffffffff)=0xbbb9a800, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: ADD NODE TO WFG: type=txn, enqueue(0xffffffff.0xffffffff)=0xbbb9a800, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: POP: type=txn, enqueue(0xffffffff.0xffffffff)=0xbbb9a800, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: kjddopt: converting lock 0xbbce92f8 on 'TX' 0x80016.0x5d4,txid [2097154,34]of inst 2
2012-06-12 22:27:00.929346*: PUSH: type=res, enqueue(0xffffffff.0xffffffff)=0xbbce92f8, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: PROCESS: type=res, enqueue(0xffffffff.0xffffffff)=0xbbce92f8, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929346*: ADD NODE TO WFG: type=res, enqueue(0xffffffff.0xffffffff)=0xbbce92f8, block=KJUSEREX, snode=1
2012-06-12 22:27:00.929855 : GSIPC:AMBUF: rcv buff 0x7fa620aa8cd8, pool rcvbuf, rqlen 1102
2012-06-12 22:27:00.929878 : GSIPC:GPBMSG: new bmsg 0x7fa620aa8d48 mb 0x7fa620aa8cd8 msg 0x7fa620aa8d68 mlen 192 dest x100 flushsz -1
2012-06-12 22:27:00.929878*: << DDmsg:KJX_DD_REMOTE,TS[0.15],Inst 2->1,ddxid[id1,id2,inst:2097153,31,1],ddlock[0x95023930,829],ddMasterInst 1
2012-06-12 22:27:00.929878*: lock [0xbbce92f8,287], op = [mast]
2012-06-12 22:27:00.929878*: ADD IO NODE WFG: 0 frame pointer
2012-06-12 22:27:00.929923 : [kjmpbmsg:compl][msg 0x7fa620ac8588][typ p][nmsgs 1][qtime 0][ptime 0]
2012-06-12 22:27:00.929947 : GSIPC:PBAT: flush start. flag 0x79 end 0 inc 4.4
2012-06-12 22:27:00.929963 : GSIPC:PBAT: send bmsg 0x7fa620aa8d48 blen 224 dest 1.0
2012-06-12 22:27:00.929979 : GSIPC:SNDQ: enq msg 0x7fa620aa8d48, type 65521 seq 8325, inst 1, receiver 0, queued 1
012-06-12 22:27:00.929979 : GSIPC:SNDQ: enq msg 0x7fa620aa8d48, type 65521 seq 8325, inst 1, receiver 0, queued 1
2012-06-12 22:27:00.929996 : GSIPC:BSEND: flushing sndq 0xb491dd28, id 0, dcx 0xbc517770, inst 1, rcvr 0  qlen 0 1
2012-06-12 22:27:00.930014 : GSIPC:BSEND: no batch1 msg 0x7fa620aa8d48 type 65521 len 224 dest (1:0)
2012-06-12 22:27:00.930088 : kjbsentscn[0x0.3f72dc][to 1]
2012-06-12 22:27:00.930144 : GSIPC:SENDM: send msg 0x7fa620aa8d48 dest x10000 seq 8325 type 65521 tkts x1 mlen xe00110
2012-06-12 22:27:00.930531 : GSIPC:KSXPCB: msg 0x7fa620aa8d48 status 30, type 65521, dest 1, rcvr 0
WAIT #0: nam='ges remote message' ela= 1372 waittime=80 loop=0 p3=74 obj#=-1 tim=1339554420931640
2012-06-12 22:27:00.931728 : GSIPC:RCVD: ksxp msg 0x7fa620af6490 sndr 1 seq 0.8149 type 65521 tkts 1
2012-06-12 22:27:00.931746 : GSIPC:RCVD: watq msg 0x7fa620af6490 sndr 1, seq 8149, type 65521, tkts 1
2012-06-12 22:27:00.931763 : GSIPC:RCVD: seq update (0.8148)->(0.8149) tp -15 fg 0x4 from 1 pbattr 0x0
2012-06-12 22:27:00.931779 : GSIPC:TKT: collect msg 0x7fa620af6490 from 1 for rcvr 0, tickets 1
2012-06-12 22:27:00.931794 : kjbrcvdscn[0x0.3f72dc][from 1][idx 2012-06-12 22:27:00.931810 : kjbrcvdscn[no bscn dd_master_inst_kjxmddi == 1

* kjddind: dump sgh:
NXTIN (nil) 0 wq 0 cvtops x0  0x0.0x0(ext 0x0,0x0)[0000-0000-00000000] inst 1
BLOCKER 0xbbb9a800 5 wq 1 cvtops x28 TX 0xe000c.0x32(ext 0x5,0x0)[20000-0002-00000022] inst 2
BLOCKED 0xbbce92f8 5 wq 2 cvtops x1 TX 0x80016.0x5d4(ext 0x2,0x0)[20000-0002-00000022] inst 2
NXTOUT (nil) 0 wq 0 cvtops x0  0x0.0x0(ext 0x0,0x0)[0000-0000-00000000] inst 1
2012-06-12 22:27:00.932058*: kjddind: req->timestamp [0.15], kjddt [0.15]
2012-06-12 22:27:00.932058*: >> DDmsg:KJX_DD_VALIDATE,TS[0.15],Inst 1->2,ddxid[id1,id2,inst:2097153,31,1],ddlock[0x95023930,829],ddMasterInst 1
2012-06-12 22:27:00.932058*: lock [(nil),0], op = [vald_dd]
2012-06-12 22:27:00.932058*: kjddind: updated local timestamp [0.15]

* kjddind: case KJX_DD_VALIDATE

*** 2012-06-12 22:27:00.932

* kjddvald called: kjxmddi stuff:

*                  cont_lockp (nil)

*                  dd_lockp 0x95023930

*                  dd_inst 1

*                  dd_master_inst 1

* sgh graph:
NXTIN (nil) 0 wq 0 cvtops x0  0x0.0x0(ext 0x0,0x0)[0000-0000-00000000] inst 1
BLOCKER 0xbbb9a800 5 wq 1 cvtops x28 TX 0xe000c.0x32(ext 0x5,0x0)[20000-0002-00000022] inst 2
BLOCKED 0xbbce92f8 5 wq 2 cvtops x1 TX 0x80016.0x5d4(ext 0x2,0x0)[20000-0002-00000022] inst 2
NXTOUT (nil) 0 wq 0 cvtops x0  0x0.0x0(ext 0x0,0x0)[0000-0000-00000000] inst 1
POP WFG NODE: lock=(nil)

* kjddvald: dump the PRQ:
BLOCKER 0xbbb9a800 5 wq 1 cvtops x28 TX 0xe000c.0x32(ext 0x5,0x0)[20000-0002-00000022] inst 2
BLOCKED 0xbbce92f8 5 wq 2 cvtops x1 TX 0x80016.0x5d4(ext 0x2,0x0)[20000-0002-00000022] inst 2

* kjddvald: KJDD_NXTONOD ->node_kjddsg.dinst_kjddnd =1

* kjddvald: ... which is not my node, my subgraph is validated but the cycle is not complete
Global blockers dump start:---------------------------------
DUMP LOCAL BLOCKER/HOLDER: block level 5 res [0x80016][0x5d4],[TX][ext 0x2,0x0]
发现dead lock!!!

 

 

可以看到在以上11.2.0.3的测试中 RAC LMD的死锁检测时间确实受到”_lm_dd_interval”参数的影响变成了指定值的20s。  需要注意的是在10g中_lm_dd_interval默认为60s,在节点较少且Processes数不多的环境中这个参数往往是可信的,但是如果节点很多且有大量Server Process则死锁未必能在这60s内检测到。在11g中对该参数(实际优化的是LMD的死锁检测算法)做了充分的优化,一般可以信任其在默认的10s内完成 Enqueue Deadlock Detection。

 

在11g中控制 RAC中LMD进程检测死锁的hidden parameter 还不仅于”_lm_dd_interval”这一个,RAC的内部机制永远是一个有意思的话题,但是今天我们就聊到这里:

 

SQL> col name for a50
SQL> col describ for a60
SQL> col value for a20
SQL> set linesize 140 pagesize 1400
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm like '_lm_dd%';

NAME                                               VALUE                DESCRIB
-------------------------------------------------- -------------------- ------------------------------------------------------------
_lm_dd_interval                                    20                   dd time interval in seconds
_lm_dd_scan_interval                               5                    dd scan interval in seconds
_lm_dd_search_cnt                                  3                    number of dd search per token get
_lm_dd_max_search_time                             180                  max dd search time per token
_lm_dd_maxdump                                     50                   max number of locks to be dumped during dd validation
_lm_dd_ignore_nodd                                 FALSE                if TRUE nodeadlockwait/nodeadlockblock options are ignored

6 rows selected.

Oracle队列锁enq:TS,Temporary Segment (also TableSpace)

有多少TS,Temporary Segment资源?

每个临时段(例如排序端)对应2个资源
正被drop掉的tablespace或CREATE ROLLBACK SEGMENT对应一个TS资源

有多少个TS lock锁资源?

1个锁资源/对应每一个进程正在执行的如下类操作

 

该资源有多少用户?

所有正在使用临时空间、DROP TABLESPACE 、或CREATE ROLLBACK SEGMENT的用户进程

 

谁使用该资源?

所有正在使用临时空间、DROP TABLESPACE 、或CREATE ROLLBACK SEGMENT的用户进程

 

何时被使用?

对于每一个临时段的2个临时段锁TS enqueue resource有2种不同的用途。 第一目的是为了串行化”High Water Mark”的上涨。第二目的是为了串行化创建、使用和删除一个临时段。当dropping tablespace或创建一个回滚段时将需要tablespace enqueue,目的是避免在row cache字典缓存层面出现的死锁。

 

ID1,ID2的组合:

ID1                                      ID2             含义

segment dba                    0                  串行控制高水位线
segment dba                    1                   创建、使用、删除临时段
tablespace number       2                  避免发生在create rollback segment和drop tablespace时的死锁

 

Lock Value Block:

Not Used.

Init.ora Parameters:

_bump_highwater_mark_count (7.3).

Scope:

Local, Global

Deadlock Sensitive:

No.

Operation:

Synchronous.

 

 

Oracle等待事件Enqueue CI:Cross Instance Call Invocation

“Cross Instance call Enqueue”是一种在一个或多个instance实例间调用后台进程行为时用到的队列锁,具体调用的后台进程行为包括检查点checkpoint、日志切换logfile switch、shutdown实例、载入数据文件头等等。需要注意的是这种Enqueue Lock并不仅仅在RAC中使用,即便是单节点也会用到。CI锁的数量取决于并行执行Cross Instance Call调用的进程的总数。


SQL> col ksqsttyp for a20
SQL> col ksqstrsn for a20
SQL> col ksqstexpl for a80
SQL> set linesize 200 pagesize 2000;
SQL> select ksqsttyp,ksqstrsn,ksqstexpl from x$ksqst where ksqsttyp='CI';

KSQSTTYP             KSQSTRSN             KSQSTEXPL
-------------------- -------------------- --------------------------------------------------------------------------------
CI                   contention           Coordinates cross-instance function invocations

SQL> show parameter cluster_database

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_database                     boolean     FALSE
cluster_database_instances           integer     1


SQL> select * from v$enqueue_stat where eq_type='CI';

   INST_ID EQ TOTAL_REQ# TOTAL_WAIT#  SUCC_REQ# FAILED_REQ# CUM_WAIT_TIME
---------- -- ---------- ----------- ---------- ----------- -------------
         1 CI        595           0        595           0             0

当系统中出现有大量这种跨实例后台进程调用时,将出现CI队列锁的争用。假设在一个RAC场景中,同时有大量的回话开始对不同的数据表执行TRUNCATE截断操作,TRUNCATE的一个前提是在所有实例上(因为对象表的dirty buffer可能分布在多个实例上)发生对象级别的检查点(object level checkpoint),检查点发生时CKPT进程会通知DBWR写出指定对象表相关的脏块,DBWR需要扫描Buffer Cache以找出脏块,而如果Buffer Cache很大那么扫描将花费大量的时间,而在此过程中前台进程将一直排他地持有着本地的CI队列锁,这就将造成CI锁的严重争用。

为了减少CI队列锁地争用,我们第一步所要做的是找出实际的Cross Instance call跨实例调用的类型。这里要另外提一下的是在10g以前不管是v$session_wait或statspack中都不会将enqueue锁等待事件的具体enqueue lock类型写明,一般需要我们从p1/p2/p3列中找出enqueue的具体身份,例如”WAIT #1: nam=’enqueue’ ela= 910796 p1=1128857606 p2=1 p3=4″,这里的p1为1128857606也就是16进制的43490006,高位的’4349’转换为ascii码也就是’CI’,而这里的p2/p3对应为V$lock中的ID1/ID2,ID1=1代表了”Reuse (checkpoint and invalidate) block range”,ID2=4代表了”Mounted excl, use to allocate mechanism”。

具体ID1/ID2代表的含义在不同版本中有所变化,可以参考下表:

Id1, Id2 Combination:

   Oracle 10gR1

27 TO 29  *Same as 9i R2

                30    process waiters after row cache requeue
                31    Active Change Directory extent relocation
                32    block change tracking state change
                33    kgl mulitversion obsolete
                34    set previous resetlogs data
                35    set recovery destination pointer
                36    fast object reuse request
                37    test ksbcic()
                38    ASM diskgroup discovery wait
                39    ASM diskgroup release
                40    ASM push DB updates
                41    ASM add ACD chunk
                42    ASM map resize message
                43    ASM map lock message
                44    ASM map unlock message (phase 1)
                45    ASM map unlock message (phase 2)
                46    ASM generate add disk redo marker
                47    ASM check of PST validity
                48    ASM offline disk CIC
                49    Logical Standby Sync Point SCN
                50    update SQL Tuning Base existence bitvector
                51    PQ induced Checkpointing
                52    ASM F1X0 relocation
                53    Scheduler autostart
                54    KZS increment grant/revoke counter
                55    ASM disk operation message
                56    ASM I/O error emulation
                57    DB Supp log cursor invalidation
                58    Cache global range invalidation
                59    Cache global object invalidation
                60    ASM Pre-Existing Extent Lock wait
                61    Perform a ksk action through DBWR
                62    ASM diskgroup refresh wait 

   Oracle 10gR2

       30 to 62 *Same as 10gR1

               63    KCBO object checkpoint
               64    KCBO object pq checkpoint
               65    global health check event
               66    Oracle Label Security refresh
               67    thread internal enable
               68    cross-instance registration
               69    KGL purge unused subheaps
               70    clear pin instance flag
               71    Rolling operations CIC

   Oracle 9iR2

                 Id1   Meaning
                 ~~~   ~~~~~~

             25 TO 26  *Same as 9i R1

                 27    set Database Force Logging mode
                 28    invalidate cached file address translations
                 29    Cursor Unauthorize Mode
                 30    snapshot too old diagnosis
                 31    process waiters after row cache requeue

   Oracle 9iR1

                Id1    Meaning
                ~~~    ~~~~~~~~

             18 TO 24  *Same as Oracle 8i

                25     Update Dscn Tracking (ktcndt)
                26     Purge dictionary Object number Cache

   Oracle 8i
                Id1     Meaning
                ~~~     ~~~~~~~

             0 TO 17    *Same as Oracle 8(please see "Oracle8*" for the Meaning)

                18      Object reuse request
                19      Rolling release checks
                20      Propagate begin backup scn for a file
                21      Refresh top plan (for db scheduler)
                22      Clear checkpoint progress record
                23      Drop temp file
                24      Quiesce database Restricted 

               Id2      Meaning
               ~~~      ~~~~~~~

               0x01     Used to pass in parameters
               0x02     Used to invoke the function in backgroud process
               0x03     Used to indicate the foreground has not returned
               0x04     Mounted excl, use to allocate mechanism
               0x05     Used to queue up interested clients 

    Oracle 8*

		Id1 	Meaning
		~~~	~~~~~~~
                0       Checkpoint block range
                1       Reuse (checkpoint and invalidate) block range
                2       LGWR Checkpointing and Hot Backup
                3       DBWR syncronization of SGA with control file
                4       Log file add/drop/rename notification
                5       Write buffer for CR read
                6       Test call
                7       Invalidate KCK cache in all instances
                8       Alter rollback segment optimal
                9       Signal Query Servers/coordinator
                10      Create Remote parallel query Server
                11      Set Global Partitions
                12      Stop Disk Writes
                13      Drop Sort Segments
                14      Release unused space from Sort Segments
                15      Instance Recovery for Parallel operation Group
                16      Validate parallel slave Lock Value
                17      Check transaction state objects
                18      Flush blocks in object
                19      Rolling release checks
                20      Propagate begin backup scn for a file
                21      Clear checkpoint progress record

     Oracle 7

		Id1 	Meaning
		~~~	~~~~~~~
		0 	Flush buffers for reuse as new class
		1 	LGWR checkpointing and Hot Backup
		2 	DBWR synchronization of SGA with control file
		3 	Log file add/drop/rename notification
		4 	Write buffer for CR read
		5 	Test Call
		6 	Invalidate KCK cache in all instances
		7 	Alter rollback segment optimal
		8 	Signal Query Servers/coordinator
		9 	Create Remote Parallel Query Server
		10 	Set Global Partitions
		11 	Stop Disk Writes
		12 	Drop Sort Segments
		13 	Release unused space from Sort Segments
		14 	Instance Recovery for Parallel operation Group
		15 	Validate parallel slave Lock Value
		16 	Check Transaction State Objects

		Id2 	Meaning
		~~~	~~~~~~~
		1 	Pass in Parameters
		2 	Invoke the call in background process
		3 	Foreground has not returned yet
		4 	Used to allocate the CI call
		5 	Used to queue up interested clients

沪ICP备14014813号-2

沪公网安备 31010802001379号