我们在学习Oracle基础知识的时候肯定了解过PMON后台进程的功能,包括轮训其他ORACLE PROCESS,清理cleanup dead process并释放锁release enqueue lock ,及清理cleanup latch。
虽然这些理论在你耳边萦绕了千百回, 但你是否有亲眼见证过一次Pmon cleanup dead process并release lock呢?大多数人可能没有。
微观Oracle=> MicroOracle, Maclean带领你见证微观视角下的Oracle behavior:
SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production PL/SQL Release 11.2.0.3.0 - Production CORE 11.2.0.3.0 Production TNS for Linux: Version 11.2.0.3.0 - Production NLSRTL Version 11.2.0.3.0 - Production SQL> select * from global_name; GLOBAL_NAME -------------------------------------------------------------------------------- www.askmac.cn SQL> select pid,program from v$process; PID PROGRAM ---------- ------------------------------------------------ 1 PSEUDO 2 oracle@vrh1.oracle.com (PMON) 3 oracle@vrh1.oracle.com (PSP0) 4 oracle@vrh1.oracle.com (VKTM) 5 oracle@vrh1.oracle.com (GEN0) 6 oracle@vrh1.oracle.com (DIAG) 7 oracle@vrh1.oracle.com (DBRM) 8 oracle@vrh1.oracle.com (PING) 9 oracle@vrh1.oracle.com (ACMS) 10 oracle@vrh1.oracle.com (DIA0) 11 oracle@vrh1.oracle.com (LMON) 12 oracle@vrh1.oracle.com (LMD0) 13 oracle@vrh1.oracle.com (LMS0) 14 oracle@vrh1.oracle.com (RMS0) 15 oracle@vrh1.oracle.com (LMHB) 16 oracle@vrh1.oracle.com (MMAN) 17 oracle@vrh1.oracle.com (DBW0) 18 oracle@vrh1.oracle.com (LGWR) 19 oracle@vrh1.oracle.com (CKPT) 20 oracle@vrh1.oracle.com (SMON) 21 oracle@vrh1.oracle.com (RECO) 22 oracle@vrh1.oracle.com (RBAL) 23 oracle@vrh1.oracle.com (ASMB) 24 oracle@vrh1.oracle.com (MMON) 25 oracle@vrh1.oracle.com (MMNL) 26 oracle@vrh1.oracle.com (MARK) 27 oracle@vrh1.oracle.com (D000) 28 oracle@vrh1.oracle.com (SMCO) 29 oracle@vrh1.oracle.com (S000) 30 oracle@vrh1.oracle.com (LCK0) 31 oracle@vrh1.oracle.com (RSMN) 32 oracle@vrh1.oracle.com (TNS V1-V3) 33 oracle@vrh1.oracle.com (W000) 34 oracle@vrh1.oracle.com (TNS V1-V3) 35 oracle@vrh1.oracle.com (TNS V1-V3) 37 oracle@vrh1.oracle.com (ARC0) 38 oracle@vrh1.oracle.com (ARC1) 40 oracle@vrh1.oracle.com (ARC2) 41 oracle@vrh1.oracle.com (ARC3) 43 oracle@vrh1.oracle.com (GTX0) 44 oracle@vrh1.oracle.com (RCBG) 46 oracle@vrh1.oracle.com (QMNC) 47 oracle@vrh1.oracle.com (TNS V1-V3) 48 oracle@vrh1.oracle.com (TNS V1-V3) 49 oracle@vrh1.oracle.com (Q000) 50 oracle@vrh1.oracle.com (Q001) 51 oracle@vrh1.oracle.com (GCR0) SQL> drop table maclean; Table dropped. SQL> create table maclean(t1 int); Table created. SQL> insert into maclean values(1); 1 row created. SQL> commit; Commit complete.
以上构建了示例环境, 可以看到在该实例中:
PID=2 PMON
PID=11 LMON
PID=18 LGWR
PID=20 SMON
PID=12 LMD
我们尝试构造2个存在”enq: TX – row lock contention”冲突的事务,之后使KILL掉第一个事务,这会导致需要PMON来recover dead process并release TX lock:
PROCESS A: QL> select addr,spid,pid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat)); ADDR SPID PID ---------------- ------------------------ ---------- 00000000BD516B80 17880 46 SQL> select distinct sid from v$mystat; SID ---------- 22 SQL> update maclean set t1=t1+1; 1 row updated. PROCESS B SQL> select addr,spid,pid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat)); ADDR SPID PID ---------------- ------------------------ ---------- 00000000BD515AD0 17908 45 SQL> update maclean set t1=t1+1; HANG.............. PROCESS B 因为"enq: TX – row lock contention"而HANG住
我们使用PROCESS C启用 对SMON的10500 event trace 和对PMON的KST TRACE:
SQL> set linesize 200 pagesize 1400
SQL> select * from v$lock where sid=22;
ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
00000000BDCD7618 00000000BDCD7670 22 AE 100 0 4 0 48 2
00007F63268A9E28 00007F63268A9E88 22 TM 77902 0 3 0 32 2
00000000B9BB4950 00000000B9BB49C8 22 TX 458765 892 6 0 32 1
PROCESS A holde的ENQUEUE LOCK有三个 AE、TM、TX
SQL> alter system switch logfile;
System altered.
SQL> alter system checkpoint;
System altered.
SQL> alter system flush buffer_cache;
System altered.
SQL> alter system set "_trace_events"='10000-10999:255:2,20,33';
System altered.
SQL> ! kill -9 17880
KILL PROCESS A 并等待PROCESS B完成update
对PMON 和 PROCESS B 做errorstack 将KST TRACE记录到磁盘
SQL> oradebug setorapid 2;
Oracle pid: 2, Unix process pid: 17533, image: oracle@vrh1.oracle.com (PMON)
SQL> oradebug dump errorstack 4;
Statement processed.
SQL> oradebug tracefile_name
/s01/orabase/diag/rdbms/vprod/VPROD1/trace/VPROD1_pmon_17533.trc
SQL> oradebug setorapid 45;
Oracle pid: 45, Unix process pid: 17908, image: oracle@vrh1.oracle.com (TNS V1-V3)
SQL> oradebug dump errorstack 4;
Statement processed.
SQL>oradebug tracefile_name
/s01/orabase/diag/rdbms/vprod/VPROD1/trace/VPROD1_ora_17908.trc
分析PMON的 KST TRACE:
2012-05-18 10:37:34.557225 :8001ECE8:db_trace:ktur.c@5692:ktugru(): [10444:2:1] next rollback uba: 0x00000000.0000.00 2012-05-18 10:37:34.557382 :8001ECE9:db_trace:ksl2.c@16009:ksl_update_post_stats(): [10005:2:1] KSL POST SENT postee=18 num=4 loc='ksa2.h LINE:285 ID:ksasnd' id1=0 id2=0 name= type=0 2012-05-18 10:37:34.557514 :8001ECEA:db_trace:ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release TX-0007000d-0000037c mode=X 2012-05-18 10:37:34.558819 :8001ECF0:db_trace:ksl2.c@16009:ksl_update_post_stats(): [10005:2:1] KSL POST SENT postee=45 num=5 loc='kji.h LINE:3418 ID:kjata: wake up enqueue owner' id1=0 id2=0 name= type=0 2012-05-18 10:37:34.559047 :8001ECF8:db_trace:ksl2.c@16009:ksl_update_post_stats(): [10005:2:1] KSL POST SENT postee=12 num=6 loc='kjm.h LINE:1224 ID:kjmpost: post lmd' id1=0 id2=0 name= type=0 2012-05-18 10:37:34.559271 :8001ECFC:db_trace:ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS 2012-05-18 10:37:34.559291 :8001ECFD:db_trace:ktu.c@8652:ktudnx(): [10813:2:1] ktudnx: dec cnt xid:7.13.892 nax:0 nbx:0 2012-05-18 10:37:34.559301 :8001ECFE:db_trace:ktur.c@3198:ktuabt(): [10444:2:1] ABORT TRANSACTION - xid: 0x0007.00d.0000037c 2012-05-18 10:37:34.559327 :8001ECFF:db_trace:ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release TM-0001304e-00000000 mode=SX 2012-05-18 10:37:34.559365 :8001ED00:db_trace:ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS 2012-05-18 10:37:34.559908 :8001ED01:db_trace:ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release AE-00000064-00000000 mode=S 2012-05-18 10:37:34.559982 :8001ED02:db_trace:ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS 2012-05-18 10:37:34.560217 :8001ED03:db_trace:ksfd.c@15379:ksfdfods(): [10298:2:1] ksfdfods:fob=0xbab87b48 aiopend=0 2012-05-18 10:37:34.560336 :GSIPC:kjcs.c@4876:kjcsombdi(): GSIPC:SOD: 0xbc79e0c8 action 3 state 0 chunk (nil) regq 0xbc79e108 batq 0xbc79e118 2012-05-18 10:37:34.560357 :GSIPC:kjcs.c@5293:kjcsombdi(): GSIPC:SOD: exit cleanup for 0xbc79e0c8 rc: 1, loc: 0x303 2012-05-18 10:37:34.560375 :8001ED04:db_trace:kss.c@1414:kssdch(): [10809:2:1] kssdch(0xbd516b80 = process, 3) 1 0 exit 2012-05-18 10:37:34.560939 :8001ED06:db_trace:kmm.c@10578:kmmlrl(): [10257:2:1] KMMLRL: Entering: flg(0x0) rflg(0x4) 2012-05-18 10:37:34.561091 :8001ED07:db_trace:kmm.c@10472:kmmlrl_process_events(): [10257:2:1] KMMLRL: Events: succ(3) wait(0) fail(0) 2012-05-18 10:37:34.561100 :8001ED08:db_trace:kmm.c@11279:kmmlrl(): [10257:2:1] KMMLRL: Reg/update: flg(0x0) rflg(0x4) 2012-05-18 10:37:34.563325 :8001ED0B:db_trace:kmm.c@12511:kmmlrl(): [10257:2:1] KMMLRL: Update: ret(0) 2012-05-18 10:37:34.563335 :8001ED0C:db_trace:kmm.c@12768:kmmlrl(): [10257:2:1] KMMLRL: Exiting: flg(0x0) rflg(0x4) 2012-05-18 10:37:34.563354 :8001ED0D:db_trace:ksl2.c@2598:kslwtbctx(): [10005:2:1] KSL WAIT BEG [pmon timer] 300/0x12c 0/0x0 0/0x0 wait_id=78 seq_num=79 snap_id=1
PMON发现dead process A后首先释放了其手上的TX Lock:
ksqrcl: release TX-0007000d-0000037c mode=X
接着它异步Post Process B,告诉Process B 你acquire的TX lock已经被我释放了:
KSL POST SENT postee=45 num=5 loc=’kji.h LINE:3418 ID:kjata: wake up enqueue owner’ id1=0 id2=0 name= type=0
Process B在收到PMON通知后立即继续了工作
ksl2.c@14563:ksliwat(): [10005:45:151] KSL POST RCVD poster=2 num=5 loc=’kji.h LINE:3418 ID:kjata: wake up enqueue owner’ id1=0 id2=0 name= type=0 fac#=3 posted=0x3 may_be_posted=1
kslwtbctx(): [10005:45:151] KSL WAIT BEG [latch: ges resource hash list] 3162668560/0xbc827e10 91/0x5b 0/0x0 wait_id=14 seq_num=15 snap_id=1
kslwtectx(): [10005:45:151] KSL WAIT END [latch: ges resource hash list] 3162668560/0xbc827e10 91/0x5b 0/0x0 wait_id=14 seq_num=15 snap_id=1
在RAC中它还会POST LMD(lock Manager)进程,通知其改变相关的GES信息:
2012-05-18 10:37:34.559047 :8001ECF8:db_trace:ksl2.c@16009:ksl_update_post_stats(): [10005:2:1] KSL POST SENT postee=12 num=6 loc=’kjm.h LINE:1224 ID:kjmpost: post lmd’ id1=0 id2=0 name= type=0
之后ksqrcl: release TX的工作才真正成功:
ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS
之后PMON abort Process A原有的Transaction
2012-05-18 10:37:34.559291 :8001ECFD:db_trace:ktu.c@8652:ktudnx(): [10813:2:1] ktudnx: dec cnt xid:7.13.892 nax:0 nbx:0
2012-05-18 10:37:34.559301 :8001ECFE:db_trace:ktur.c@3198:ktuabt(): [10444:2:1] ABORT TRANSACTION – xid: 0x0007.00d.0000037c
释放Process A原本持有的maclean表的TM lock:
ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release TM-0001304e-00000000 mode=SX
ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS
释放Process A原本持有的AE ( Prevent Dropping an edition in use) lock:
ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release AE-00000064-00000000 mode=S
ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS
清理cleanup process A
kjcs.c@4876:kjcsombdi(): GSIPC:SOD: 0xbc79e0c8 action 3 state 0 chunk (nil) regq 0xbc79e108 batq 0xbc79e118
GSIPC:kjcs.c@5293:kjcsombdi(): GSIPC:SOD: exit cleanup for 0xbc79e0c8 rc: 1, loc: 0x303
kss.c@1414:kssdch(): [10809:2:1] kssdch(0xbd516b80 = process, 3) 1 0 exit
0xbd516b80是原PROCESS A 的paddr 起始地址
kssdch函数的主要作用是 删除process相关的state object SO KSS: delete children of state obj.
PMON 调用kmmlrl()函数更新instance goodness统计update for session drop delta
kmmlrl(): [10257:2:1] KMMLRL: Entering: flg(0x0) rflg(0x4)
kmmlrl_process_events(): [10257:2:1] KMMLRL: Events: succ(3) wait(0) fail(0)
kmmlrl(): [10257:2:1] KMMLRL: Reg/update: flg(0x0) rflg(0x4)
kmmlrl(): [10257:2:1] KMMLRL: Update: ret(0)
kmmlrl(): [10257:2:1] KMMLRL: Exiting: flg(0x0) rflg(0x4)
完成手头的工作后PMON再次进入 3s一次的”pmon timer”等待
kslwtbctx(): [10005:2:1] KSL WAIT BEG [pmon timer] 300/0x12c 0/0x0 0/0x0 wait_id=78 seq_num=79 snap_id=1
精彩 精彩绝伦~~~~~我认真的学习了 ~~~希望以后多多这样的文章啊
oradebug setorapid 2; 这个是啥操作?
生成trace和分析PMON的 KST TRACE: 我还不会啊~~
看日志主要靠经验积累。
我才工作 一年 经验尚浅~~你的网站能不能设置提醒啊,比如你回复了啊,我就会收到提醒,就能看了。
真有意思,这个故事很好玩,讲述的也不错哦