详解cursor: pin S wait on X等待事件

‘cursor: pin * events’等待事件

该类等待事件一般是为了pin相关的子游标

‘Cursor: pin S on X’ 最常见的等待事件,  进程为了共享操作例如执行pin游标而以SHRD S mode申请mutex, 但是未立即获得。原因是该游标被其他进程以EXCL X mode 持有了。

实际该 cursor: pin S wait on X等待事件往往是由于其他因素诱发的。Mutex争用仅仅是问题的症状,但根本原因需要Database Consultant 进一步挖掘。

 

下面我们列出一些已知的常见案例, 在这些例子中可以看到 我上面提到的 Mutex的争用仅仅是伪争用:

 

过多的子游标 High Version Counts

 

过多的子游标版本Version Count可能导致Mutex 争用,一般一个SQL的Version Count不要高于500。

检查High Version Count很简单, 在AWR里就有SQL ordered by High Version Count,也可以写SQL查V$SQL、V$SQLAREA

 

 

昂贵的X$、V$视图查询

 

一些对于V$、X$视图的查询,需要访问X$KGL*之类的fixed table,可能触发Mutex争用。

Mutex持有者得不到CPU

 

Mutex持有者若得不到足够的CPU片可能一直阻塞他人,直到它拿到需要的CPU。

这种情况可能由于OS操作系统的实际情况或者使用Resource Manager而引起。需要配合AWR中的Host CPU、Instance CPu一起看。

 

已经被KILLED的SESSION仍持有Mutex

 

当session正持有Mutex,而其对应的Process被强制KILL掉, 则直到PMON彻底清理掉该Dead Process并释放Mutex,其他session才能不再等待。  诊断该类问题,最好能检查PMON的TRACE。 当然也存在部分BUG会导致PMON清理过程非常慢。

举例来说,bug 9312879描述了一种场景:PMON 需要获得某个Mutex以便清理某个dead process,但是该Mutex又被其他进程持有,则PMON甚至无法开始真正清理并释放Mutex。

 

 

 

 

详见 《深入理解Oracle中的Mutex》一文

 

 

 

如何模拟 cursor pin S wait on X 等待事件

 

 

 
session A:
SQL> create table emp(t1 int);

Table created.

 

SQL> var b1 number;
SQL> begin :b1 := 7902; end;
2 /

 
SQL> select * from emp where t1=:b1;

no rows selected

 
oracle@localhost:~$ ps -ef|grep LOCAL
oracle 11734 11723 0 05:05 ? 00:00:00 oraclecdb1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 11751 11748 0 05:06 ? 00:00:00 oraclecdb1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 11757 11754 0 05:06 ? 00:00:00 oraclecdb1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 11785 11760 0 05:06 pts/3 00:00:00 grep LOCAL
oracle@localhost:~$
oracle@localhost:~$ gdb `which oracle` 11734

(gdb) break kxsPeekBinds
Breakpoint 1 at 0x417e870
(gdb) commands
Type commands for when breakpoint 1 is hit, one per line.
End with a line saying just “end”.
>bt 4
>end
(gdb) cont
Continuing.

 

打开sessio B 和session C
session B:
SQL> alter system flush shared_pool;

System altered.
此时session A执行 之前的语句
select * from emp where t1=:b1; ==> 被gdb break住

GDB显示breakpoint kxsPeekBinds,窥视绑定值

Breakpoint 1, 0x000000000417e870 in kxsPeekBinds ()
#0 0x000000000417e870 in kxsPeekBinds ()
#1 0x00000000047e5b52 in opitca ()
#2 0x0000000003ee67e9 in kksSetBindType ()
#3 0x0000000003ef80fa in kksfbc ()
session C:
SQL> conn c##maclean/oracle
Connected.
SQL> var b1 number;
SQL> begin :b1 := 7902; end;
2 /

PL/SQL procedure successfully completed.

SQL> select * from emp where t1=:b1;

==》 等在cursor pin S on X上

 

sessioD 验证:

select event from v$session where wait_class!=’Idle’;
EVENT
—————————————————————-
cursor: pin S wait on X

 
SQL> select spid from v$process where addr=(select paddr from v$SESSION where sid=43);

SPID
————————
12318

SQL> oradebug setospid 12318;
Oracle pid: 40, Unix process pid: 12318, image: oracle@localhost.localdomain (TNS V1-V3)
SQL> oradebug short_stack;
ksedsts()+313<-ksdxfstk()+34<-ksdxcb()+886<-sspuser()+148<-__sighandler()<-__select()+19<-skgpwwait()+230<-kgxWait()+1486<-kgxSharedExamine()+352<-kxsGetRuntimeLock()+254<-kkscsCheckCursor()+555<-kkscsSearchChildList()+1772<-kksfbc()+11554<-kkspsc0()+1296<-kksParseCursor()+118<-opiosq0()+1577<-kpooprx()+289<-kpoal8()+737<-opiodr()+929<-ttcpip()+1230<-opitsk()+1638<-opiino()+883<-opiodr()+929<-opidrv()+589<-sou2o()+120<-opimai_real()+146<-ssthrdmain()+392<-ssthrdMain_e()+18<-main()+220<-__libc_start_main()+244
kkscsCheckCursor=》kxsGetRuntimeLock=》kgxSharedExamine=》kgxWait
col mutex_type for a20
col location for a20
SQL> select * from v$mutex_sleep;

MUTEX_TYPE LOCATION SLEEPS WAIT_TIME CON_ID
——————– ——————– ———- ———- ———-
Cursor Pin kkslce [KKSCHLPIN2] 183369 161405107 0

SQL> /

MUTEX_TYPE LOCATION SLEEPS WAIT_TIME CON_ID
——————– ——————– ———- ———- ———-
Cursor Pin kkslce [KKSCHLPIN2] 191003 161405107 0

SQL> /

MUTEX_TYPE LOCATION SLEEPS WAIT_TIME CON_ID
——————– ——————– ———- ———- ———-
Cursor Pin kkslce [KKSCHLPIN2] 193763 161405107 0

 

这些location 代码存放在kksc中

col requesting_session for 9999
col blocking_session for 9999
select mutex_identifier, requesting_session REQ,
blocking_session BLOCKING,location
from v$mutex_sleep_history;

MUTEX_IDENTIFIER REQ BLOCKING LOCATION
—————- ———- ———- ——————–
2970438836 43 1 kkslce [KKSCHLPIN2]

SQL> select sql_text from v$sqlarea where hash_value =’2970438836′;

SQL_TEXT
——————————————————————————–
select * from emp where t1=:b1
SQL> conn c##maclean/oracle
Connected.
SQL> select * from emp where t1=:b1;

SQL> var b1 number;
SQL> begin :b1 := 7902; end;
2 /

PL/SQL procedure successfully completed.

 

SQL> select * from emp where t1=:b1; ==》也HANG住

 

再打开一个session :
SQL> oradebug setmypid
Statement processed.

SQL> oradebug dump systemstate 266;
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/cdb1/cdb1/trace/cdb1_ora_12702.trc

 

 

 

 
下面是父游标”select * from emp where t1=:b1″的 library cache dump,他只有一个child Handle=0x76aef758
SO: 0x7d8c28b0, type: 90, owner: 0x8496c6e8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x84cca8f8, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8739, pg=0 pdbuid=1

LibraryObjectLock: Address=0x7d8c28b0 Handle=0x7b1ee490 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0

User=0x8496c6e8 Session=0x8496c6e8 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=5168f05b
LibraryHandle: Address=0x7b1ee490 Hash=b10d4cb4 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=select * from emp where t1=:b1
FullHashValue=3f0fd3f68247850db833fe62b10d4cb4 Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=1 Identifier=2970438836 OwnerIdn=101
Statistics: InvalidationCount=0 ExecutionCount=0 LoadCount=2 ActiveLocks=2 TotalLockCount=2 TotalPinCount=1
Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=1 Version=0 BucketInUse=1 HandleInUse=1 HandleReferenceCount=0
Concurrency: DependencyMutex=0x7b1ee540(0, 1, 0, 0) Mutex=0x7b1ee5d0(0, 27, 0, 0)
Flags=RON/PIN/TIM/PN0/DBN/[10012841]
WaitersLists:
Lock=0x7b1ee520[0x7b1ee520,0x7b1ee520]
Pin=0x7b1ee500[0x7b1ee500,0x7b1ee500]
LoadLock=0x7b1ee578[0x7b1ee578,0x7b1ee578]
Timestamp: Current=04-13-2013 05:42:51
HandleReference: Address=0x7b1ee660 Handle=(nil) Flags=[00]
LibraryObject: Address=0x7e0c7bc8 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
ChildTable: size=’16’
Child: id=’0′ Table=0x7e0c8a60 Reference=0x7e0c84d8 Handle=0x76aef758
NamespaceDump:
Parent Cursor: sql_id=bhczycashum5n parent=0x7e0c7c90 maxchild=1 plk=y ppn=n

 

下面是 子游标 child cursor的library cache dump,可以看到它的PinMode=X 即他被PIN住了
SO: 0x77c888e0, type: 90, owner: 0x84920248, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x84ccff88, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:8739, pg=0 pdbuid=1

LibraryObjectLock: Address=0x77c888e0 Handle=0x76aef758 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0
Context=0x7f3b9f18a140 User=0x84920248 Session=0x84920248 ReferenceCount=1
Flags=CBK/[0020] SavepointNum=0
LibraryHandle: Address=0x76aef758 Hash=0 LockMode=N PinMode=X LoadLockMode=0 Status=VALD
Name: Namespace=SQL AREA(00) Type=CURSOR(00)
Statistics: InvalidationCount=0 ExecutionCount=0 LoadCount=1 ActiveLocks=2 TotalLockCount=2 TotalPinCount=3
Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=0 Version=0 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0
Concurrency: DependencyMutex=0x76aef808(0, 0, 0, 0) Mutex=0x7b1ee5d0(0, 27, 0, 0)
Flags=RON/PIN/PN0/EXP/CHD/[10012111]
WaitersLists:
Lock=0x76aef7e8[0x76aef7e8,0x76aef7e8]
Pin=0x76aef7c8[0x76aef7c8,0x76aef7c8]
LoadLock=0x76aef840[0x76aef840,0x76aef840]
LibraryObject: Address=0x7e3a10c0 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
DataBlocks:
Block: #=’0′ name=KGLH0^b10d4cb4 pins=0 Change=NONE
Heap=0x78f80a48 Pointer=0x7e3a1188 Extent=0x7e3a1030 Flags=I/-/P/A/-/-
FreedLocation=0 Alloc=2.679688 Size=3.976562 LoadTime=4318064770
Block: #=’6′ name=SQLA^b10d4cb4 pins=0 Change=NONE
Heap=0x7e0c82a8 Pointer=0x7bae6bf8 Extent=0x7bae5fc0 Flags=I/-/P/A/-/E
FreedLocation=0 Alloc=0.000000 Size=0.000000 LoadTime=0
NamespaceDump:
Child Cursor: Heap0=0x7e3a1188 Heap6=0x7bae6bf8 Heap0 Load Time=04-13-2013 05:42:51 Heap6 Load Time=04-13-2013 05:42:51

KGLH0^b10d4cb4 =》 b10d4cb4= 2970438836 即 KGLH0^2970438836

 

 

 下面我们列出一些Mutex相关的BUG:

Bug 5399325 : Mutex latch spin causes high cpu on non-CAS platforms
Bug 5443568 : Excess cpu from concurrent calls to kksclearcursorstat()
Bug 5594520 : Some columns in the mutex V$ views have trailing null characters.
Bug 5522582 : Ora-600 [kxsGetRuntimeLock2] might be encountered under rare circumstances.
Bug 4755704 : Incorrect V$SQL statistics when mutex feature enabled.
Bug 5485914 : Self deadlock with ‘cursor: pins S wait on X’ when trying to explain / tracing a remote
mapped cursor.
Bug 5907779 : Self deadlock with ‘cursor: pins S wait on X’ when executing dbms_stats under
10.2.0.x and where the deadlocked cursor involves recursive dictionary SQL.
Bug 5983020 : Documents a rare case where we might see the ‘cursor: pin S wait on X’ wait event.
Bug 5476091 : Interrupt support added to the mutex feature.
Bug 6143117 : Ora-600 [kgllockownerslistdelete] could be raised in early 11.1 releases because of
incorrect use of SL_WRITE_BARRIER in the mutex code.
Bug 6753486 : PMON can spin trying to acquire a mutex during process cleanup.
Bug 6523644 : Latch violation might be seen with the mutex feature.
Bug 7202166 : The view V$MUTEX_SLEEP is broken in early 11g releases.
Bug 7211086 : The view V$MUTEX_SLEEP_HISTORY can return a null LOCATION column.
Bug 6904068 : High CPU usage possible for “cursor: pin S” waits.
Bug 7234778 : Excessive “cursor: pin S wait on X” waits when no X mutex holder.
Bug 5928271 : Mutex performance improvement for 11g’s use of librarycache (KGL) mutexes.
Bug 7462072 : Waits on “cursor: pin S wait on X” for the same reasons as bug 7234778. This is the
10g fix.
Bug 7226463 : Use of execute immediate can lead to mutex waits under pre-11g releases if an
exception is hit.
Note 580273.1 states that mutex waits might be seen under HP-UX Itanium because of a problem
with the operating system’s pw_wait() call.
Bug 5500044 : DDL on partitioned tables with CLOB or BLOB columns might lead to waits on
‘cursor: pin X’ for a pseudo cursor.
Bug 7585574 : The view v$mutex_sleep might not return all the rows that it should.
Bug 7254221 : Library cache mutex usage doesn’t record wait_time.
Bug 7568642 : states that the BLOCKING_SESSION column in v$active_session_history and
v$session is unreliable when there is a mutex blocker. Bug 5159195 is another such case.
Bug 7307972 covers a case where there were lots of waits on “library cache: mutex X” with waits on
the “kglpin1 4” subclass.
Bug 8426816 is a problem with pre-11g releases where a dead process holding a mutex might not be
cleaned correctly by PMON.
Bug 8508078 covers a case where incorrect SQL, executed frequently enough, can result in mutex
related waits being seen.
Bug 8573601 corrects the v$mutex_sleep wait times so that they are correctly reported.
Bug 7441165 makes mutex critical sections non-preemptable if the OS supports this.
Bug 9066130 documents a case where one symptom might be lots of waits on ‘cursor: mutex S’.
Bug 8518755 covers a related problem with the yield implementation that affects mutex usage (see
Bug 8694205 covers for one such case). The 8518755 fix was slightly amended by the fix for Bug
9169219 which itself was slightly amended by Bug 9440214 .
Bug 9188470 states that library cache mutexes might not be released in kgllkde under certain cases
and that we could see incorrect deadlock detection when things are in flux.
Bug 8860198 covers a case where there was high contention on library cache mutexes (“kglpin1 4”)
when the XMLTYPE type was used.
Bug 8499043 says that this bug fix can reduce contention on “library cache: mutex X”. The fix changes
the way in which context kgl objects are handled under RAC.
Bug 9069388 makes it easier to correlate the SIDs of dead processes with the mutex SIDs reported in
state object dumps and derivable from v$ tables (v$ wait information).
Bug 9312879 mutex contention might be seen when a dead process holds the mutex and PMON
cannot deal with the dead process in a timely fashion.
Bug 8431767 mutex contention might be seen when using application contexts.
Bug 5949171 library cache mutex contention might be seen when using Streams Apply feature.
Bug 9499302 requested that the instruction count taken in the mutex code is reduced.
Bug 9821207 adds markhot and unmarkhot to dbms_shared_pool.
Bug 9530750 improves performance of cursor build locks to reduce “library cache: mutex X” waits in
releases from 11.2.0.1 onwards. This is fixed in 12.1.
Bug 8890002 covers the case where we need to implement the pause instruction on x86-64 and ia-64
platforms. This is recommended if high CPU issues are seen on these platforms with mutex contention
also being observed. (IA-64 can be Linux Itanium or HP-UX Itanium).
The fix will allow the CPU to do some useful work if we are seeing lots of waits.
Bug 8625030 documents that DDL on partitions can lead to high waits on ‘library cache: mutex X’.
Bug 9591812 states that some ‘cursor: mutex S’ waits should actually be ‘cursor: mutex X’ waits under
11.2 to 12.1.

A session waits on this event when requesting a mutex for sharable operations related to pins (such as executing a cursor), but the mutex cannot be granted because it is being held exclusively by another session (which is most likely parsing the cursor). This is a replacement for library cache pin waits in prior versions of Oracle.

Solutions

Solutions for mutex wait problems are very similar to each other. See Analysis of Waits on Mutexes for more information and troubleshooting ideas.

Comment

*

沪ICP备14014813号-2

沪公网安备 31010802001379号