微观ORACLE(一):PMON Release Lock

我们在学习Oracle基础知识的时候肯定了解过PMON后台进程的功能,包括轮训其他ORACLE PROCESS,清理cleanup dead process并释放锁release enqueue lock ,及清理cleanup latch。

 

虽然这些理论在你耳边萦绕了千百回, 但你是否有亲眼见证过一次Pmon cleanup dead process并release lock呢?大多数人可能没有。

 

微观Oracle=> MicroOracle, Maclean带领你见证微观视角下的Oracle behavior:

 

 

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn 

SQL> select pid,program  from v$process;

       PID PROGRAM
---------- ------------------------------------------------
         1 PSEUDO
         2 oracle@vrh1.oracle.com (PMON)
         3 oracle@vrh1.oracle.com (PSP0)
         4 oracle@vrh1.oracle.com (VKTM)
         5 oracle@vrh1.oracle.com (GEN0)
         6 oracle@vrh1.oracle.com (DIAG)
         7 oracle@vrh1.oracle.com (DBRM)
         8 oracle@vrh1.oracle.com (PING)
         9 oracle@vrh1.oracle.com (ACMS)
        10 oracle@vrh1.oracle.com (DIA0)
        11 oracle@vrh1.oracle.com (LMON)
        12 oracle@vrh1.oracle.com (LMD0)
        13 oracle@vrh1.oracle.com (LMS0)
        14 oracle@vrh1.oracle.com (RMS0)
        15 oracle@vrh1.oracle.com (LMHB)
        16 oracle@vrh1.oracle.com (MMAN)
        17 oracle@vrh1.oracle.com (DBW0)
        18 oracle@vrh1.oracle.com (LGWR)
        19 oracle@vrh1.oracle.com (CKPT)
        20 oracle@vrh1.oracle.com (SMON)
        21 oracle@vrh1.oracle.com (RECO)
        22 oracle@vrh1.oracle.com (RBAL)
        23 oracle@vrh1.oracle.com (ASMB)
        24 oracle@vrh1.oracle.com (MMON)
        25 oracle@vrh1.oracle.com (MMNL)
        26 oracle@vrh1.oracle.com (MARK)
        27 oracle@vrh1.oracle.com (D000)
        28 oracle@vrh1.oracle.com (SMCO)
        29 oracle@vrh1.oracle.com (S000)
        30 oracle@vrh1.oracle.com (LCK0)
        31 oracle@vrh1.oracle.com (RSMN)
        32 oracle@vrh1.oracle.com (TNS V1-V3)
        33 oracle@vrh1.oracle.com (W000)
        34 oracle@vrh1.oracle.com (TNS V1-V3)
        35 oracle@vrh1.oracle.com (TNS V1-V3)
        37 oracle@vrh1.oracle.com (ARC0)
        38 oracle@vrh1.oracle.com (ARC1)
        40 oracle@vrh1.oracle.com (ARC2)
        41 oracle@vrh1.oracle.com (ARC3)
        43 oracle@vrh1.oracle.com (GTX0)
        44 oracle@vrh1.oracle.com (RCBG)
        46 oracle@vrh1.oracle.com (QMNC)
        47 oracle@vrh1.oracle.com (TNS V1-V3)
        48 oracle@vrh1.oracle.com (TNS V1-V3)
        49 oracle@vrh1.oracle.com (Q000)
        50 oracle@vrh1.oracle.com (Q001)
        51 oracle@vrh1.oracle.com (GCR0)

SQL> drop table maclean;

Table dropped.

SQL> create table maclean(t1 int);

Table created.

SQL> insert into maclean values(1);

1 row created.

SQL> commit;

Commit complete.

 

以上构建了示例环境, 可以看到在该实例中:
PID=2  PMON
PID=11 LMON
PID=18 LGWR
PID=20 SMON
PID=12 LMD

 

我们尝试构造2个存在”enq: TX – row lock contention”冲突的事务,之后使KILL掉第一个事务,这会导致需要PMON来recover dead process并release TX lock:

 

PROCESS A:

QL> select addr,spid,pid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));

ADDR             SPID                            PID
---------------- ------------------------ ----------
00000000BD516B80 17880                            46

SQL> select distinct sid from v$mystat;

       SID
----------
        22

SQL> update maclean set t1=t1+1;

1 row updated.

PROCESS B

SQL> select addr,spid,pid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));

ADDR             SPID                            PID
---------------- ------------------------ ----------
00000000BD515AD0 17908                            45

SQL> update maclean set t1=t1+1;

HANG..............

PROCESS B 因为"enq: TX – row lock contention"而HANG住

 

 

我们使用PROCESS C启用 对SMON的10500 event trace 和对PMON的KST TRACE:

 

SQL> set linesize 200 pagesize 1400
SQL> select * from v$lock where sid=22;

ADDR             KADDR                   SID TY        ID1        ID2      LMODE    REQUEST      CTIME      BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
00000000BDCD7618 00000000BDCD7670         22 AE        100          0          4          0         48          2
00007F63268A9E28 00007F63268A9E88         22 TM      77902          0          3          0         32          2
00000000B9BB4950 00000000B9BB49C8         22 TX     458765        892          6          0         32          1

PROCESS A holde的ENQUEUE LOCK有三个 AE、TM、TX

SQL> alter system switch logfile;

System altered.

SQL> alter system checkpoint;

System altered.

SQL>  alter system flush buffer_cache;

System altered.

SQL> alter system set "_trace_events"='10000-10999:255:2,20,33';

System altered.

SQL> ! kill -9 17880
KILL PROCESS A 并等待PROCESS B完成update

对PMON 和 PROCESS B 做errorstack 将KST TRACE记录到磁盘

SQL> oradebug setorapid 2;
Oracle pid: 2, Unix process pid: 17533, image: oracle@vrh1.oracle.com (PMON)

SQL> oradebug dump errorstack 4;
Statement processed.

SQL> oradebug tracefile_name

/s01/orabase/diag/rdbms/vprod/VPROD1/trace/VPROD1_pmon_17533.trc

SQL> oradebug setorapid 45;
Oracle pid: 45, Unix process pid: 17908, image: oracle@vrh1.oracle.com (TNS V1-V3)

SQL> oradebug dump errorstack 4;
Statement processed.

SQL>oradebug tracefile_name
/s01/orabase/diag/rdbms/vprod/VPROD1/trace/VPROD1_ora_17908.trc

 

 

分析PMON的 KST TRACE:

 

2012-05-18 10:37:34.557225 :8001ECE8:db_trace:ktur.c@5692:ktugru(): [10444:2:1] next rollback uba: 0x00000000.0000.00
2012-05-18 10:37:34.557382 :8001ECE9:db_trace:ksl2.c@16009:ksl_update_post_stats(): [10005:2:1] KSL POST SENT postee=18 num=4 loc='ksa2.h LINE:285 ID:ksasnd' id1=0 id2=0 name=   type=0
2012-05-18 10:37:34.557514 :8001ECEA:db_trace:ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release TX-0007000d-0000037c mode=X
2012-05-18 10:37:34.558819 :8001ECF0:db_trace:ksl2.c@16009:ksl_update_post_stats(): [10005:2:1] KSL POST SENT postee=45 num=5 loc='kji.h LINE:3418 ID:kjata: wake up enqueue owner' id1=0 id2=0 name=   type=0
2012-05-18 10:37:34.559047 :8001ECF8:db_trace:ksl2.c@16009:ksl_update_post_stats(): [10005:2:1] KSL POST SENT postee=12 num=6 loc='kjm.h LINE:1224 ID:kjmpost: post lmd' id1=0 id2=0 name=   type=0
2012-05-18 10:37:34.559271 :8001ECFC:db_trace:ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS
2012-05-18 10:37:34.559291 :8001ECFD:db_trace:ktu.c@8652:ktudnx(): [10813:2:1] ktudnx: dec cnt xid:7.13.892 nax:0 nbx:0
2012-05-18 10:37:34.559301 :8001ECFE:db_trace:ktur.c@3198:ktuabt(): [10444:2:1] ABORT TRANSACTION - xid: 0x0007.00d.0000037c
2012-05-18 10:37:34.559327 :8001ECFF:db_trace:ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release TM-0001304e-00000000 mode=SX
2012-05-18 10:37:34.559365 :8001ED00:db_trace:ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS
2012-05-18 10:37:34.559908 :8001ED01:db_trace:ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release AE-00000064-00000000 mode=S
2012-05-18 10:37:34.559982 :8001ED02:db_trace:ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS
2012-05-18 10:37:34.560217 :8001ED03:db_trace:ksfd.c@15379:ksfdfods(): [10298:2:1] ksfdfods:fob=0xbab87b48 aiopend=0
2012-05-18 10:37:34.560336 :GSIPC:kjcs.c@4876:kjcsombdi(): GSIPC:SOD: 0xbc79e0c8 action 3 state 0 chunk (nil) regq 0xbc79e108 batq 0xbc79e118
2012-05-18 10:37:34.560357 :GSIPC:kjcs.c@5293:kjcsombdi(): GSIPC:SOD: exit cleanup for 0xbc79e0c8 rc: 1, loc: 0x303
2012-05-18 10:37:34.560375 :8001ED04:db_trace:kss.c@1414:kssdch(): [10809:2:1] kssdch(0xbd516b80 = process, 3) 1 0 exit
2012-05-18 10:37:34.560939 :8001ED06:db_trace:kmm.c@10578:kmmlrl(): [10257:2:1] KMMLRL: Entering: flg(0x0) rflg(0x4)
2012-05-18 10:37:34.561091 :8001ED07:db_trace:kmm.c@10472:kmmlrl_process_events(): [10257:2:1] KMMLRL: Events: succ(3) wait(0) fail(0)
2012-05-18 10:37:34.561100 :8001ED08:db_trace:kmm.c@11279:kmmlrl(): [10257:2:1] KMMLRL: Reg/update: flg(0x0) rflg(0x4)
2012-05-18 10:37:34.563325 :8001ED0B:db_trace:kmm.c@12511:kmmlrl(): [10257:2:1] KMMLRL: Update: ret(0)
2012-05-18 10:37:34.563335 :8001ED0C:db_trace:kmm.c@12768:kmmlrl(): [10257:2:1] KMMLRL: Exiting: flg(0x0) rflg(0x4)
2012-05-18 10:37:34.563354 :8001ED0D:db_trace:ksl2.c@2598:kslwtbctx(): [10005:2:1] KSL WAIT BEG [pmon timer] 300/0x12c 0/0x0 0/0x0 wait_id=78 seq_num=79 snap_id=1

 

 

PMON发现dead process A后首先释放了其手上的TX Lock:
ksqrcl: release TX-0007000d-0000037c mode=X

 

接着它异步Post Process B,告诉Process B 你acquire的TX lock已经被我释放了:
KSL POST SENT postee=45 num=5 loc=’kji.h LINE:3418 ID:kjata: wake up enqueue owner’ id1=0 id2=0 name=   type=0

 

Process B在收到PMON通知后立即继续了工作
ksl2.c@14563:ksliwat(): [10005:45:151] KSL POST RCVD poster=2 num=5 loc=’kji.h LINE:3418 ID:kjata: wake up enqueue owner’ id1=0 id2=0 name=   type=0 fac#=3 posted=0x3 may_be_posted=1
kslwtbctx(): [10005:45:151] KSL WAIT BEG [latch: ges resource hash list] 3162668560/0xbc827e10 91/0x5b 0/0x0 wait_id=14 seq_num=15 snap_id=1
kslwtectx(): [10005:45:151] KSL WAIT END [latch: ges resource hash list] 3162668560/0xbc827e10 91/0x5b 0/0x0 wait_id=14 seq_num=15 snap_id=1

 

 

 

在RAC中它还会POST LMD(lock Manager)进程,通知其改变相关的GES信息:
2012-05-18 10:37:34.559047 :8001ECF8:db_trace:ksl2.c@16009:ksl_update_post_stats(): [10005:2:1] KSL POST SENT postee=12 num=6 loc=’kjm.h LINE:1224 ID:kjmpost: post lmd’ id1=0 id2=0 name=   type=0

 

 

 

之后ksqrcl: release TX的工作才真正成功:
ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS

 

之后PMON abort Process A原有的Transaction
2012-05-18 10:37:34.559291 :8001ECFD:db_trace:ktu.c@8652:ktudnx(): [10813:2:1] ktudnx: dec cnt xid:7.13.892 nax:0 nbx:0
2012-05-18 10:37:34.559301 :8001ECFE:db_trace:ktur.c@3198:ktuabt(): [10444:2:1] ABORT TRANSACTION – xid: 0x0007.00d.0000037c

 

释放Process A原本持有的maclean表的TM lock:
ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release TM-0001304e-00000000 mode=SX
ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS

 

释放Process A原本持有的AE ( Prevent Dropping an edition in use) lock:
ksq.c@8540:ksqrcli(): [10704:2:1] ksqrcl: release AE-00000064-00000000 mode=S
ksq.c@8826:ksqrcli(): [10704:2:1] ksqrcl: SUCCESS

 

清理cleanup process A
kjcs.c@4876:kjcsombdi(): GSIPC:SOD: 0xbc79e0c8 action 3 state 0 chunk (nil) regq 0xbc79e108 batq 0xbc79e118
GSIPC:kjcs.c@5293:kjcsombdi(): GSIPC:SOD: exit cleanup for 0xbc79e0c8 rc: 1, loc: 0x303
kss.c@1414:kssdch(): [10809:2:1] kssdch(0xbd516b80 = process, 3) 1 0 exit

 

0xbd516b80是原PROCESS A 的paddr 起始地址

kssdch函数的主要作用是 删除process相关的state object SO KSS: delete children of state obj.

 

PMON 调用kmmlrl()函数更新instance goodness统计update for session drop delta
kmmlrl(): [10257:2:1] KMMLRL: Entering: flg(0x0) rflg(0x4)
kmmlrl_process_events(): [10257:2:1] KMMLRL: Events: succ(3) wait(0) fail(0)
kmmlrl(): [10257:2:1] KMMLRL: Reg/update: flg(0x0) rflg(0x4)
kmmlrl(): [10257:2:1] KMMLRL: Update: ret(0)
kmmlrl(): [10257:2:1] KMMLRL: Exiting: flg(0x0) rflg(0x4)

 

完成手头的工作后PMON再次进入 3s一次的”pmon timer”等待
kslwtbctx(): [10005:2:1] KSL WAIT BEG [pmon timer] 300/0x12c 0/0x0 0/0x0 wait_id=78 seq_num=79 snap_id=1

如何跟踪Oracle动态服务注册

如何trace Oracle PMON进程动态注册过程?这个问题我想到2个答案,对PMON做event trace或者采用Oracle Network Server因该都可以达到目的。
让我们来实践一下!

Oracle Network Server Trace模式

1. 启用Oracle SqlNet服务器端trace,这需要我们修改sqlnet.ora配置文件

[maclean@rh2 ~]$ echo "TRACE_LEVEL_SERVER = 16
> TRACE_FILE_SERVER = SERVER
> TRACE_DIRECTORY_SERVER= /home/maclean/ntrc"  > $ORACLE_HOME/network/admin/sqlnet.ora
[maclean@rh2 ~]$ cat $ORACLE_HOME/network/admin/sqlnet.ora
TRACE_LEVEL_SERVER = 16
TRACE_FILE_SERVER = SERVER
TRACE_DIRECTORY_SERVER= /home/maclean/ntrc

2. 触发trace

SQL> conn / as sysdba
Connected.

SQL>  select spid from V$process, V$session where audsid=userenv('SESSIONID') and paddr=addr;
SPID
------------
4264

SQL> alter system register;
System altered.

/*在之前指定的TRACE_DIRECTORY_SERVER目录下将出现形如server_$spid.trc的trace文件*/

[maclean@rh2 ntrc]$ cat server_4290.trc|grep nsprecv|grep -A 2 -B 2 reg
[20-AUG-2010 10:42:53:896] nsprecv: 00 00 00 00 00 15 61 6C  |......al|
[20-AUG-2010 10:42:53:896] nsprecv: 74 65 72 20 73 79 73 74  |ter.syst|
[20-AUG-2010 10:42:53:896] nsprecv: 65 6D 20 72 65 67 69 73  |em.regis|
[20-AUG-2010 10:42:53:896] nsprecv: 74 65 72 01 00 00 00 01  |ter.....|
[20-AUG-2010 10:42:53:896] nsprecv: 00 00 00 00 00 00 00 00  |........|

/*分析该trace文件后可以匹配到以上动态注册语句*/

/*记得将server端sqlnet trace设置disable掉*/

[maclean@rh2 ntrc]$ echo "" > $ORACLE_HOME/network/admin/sqlnet.ora

PMON TRACE模式
1.找出PMON的进程号

SQL> select spid ,program from v$process where program like '%PMON%';

SPID         PROGRAM
------------ ------------------------------------------------
4050         oracle@rh2 (PMON)

2.执行10257事件trace

SQL> oradebug setospid 4050;
Oracle pid: 2, Unix process pid: 4050, image: oracle@rh2 (PMON)

SQL> oradebug event 10257 trace name context forever,level 16;
Statement processed.

SQL> alter system register;
System altered.

SQL> oradebug event 10257 trace name context off;
Statement processed.

SQL> oradebug tracefile_name;
/s01/10gdb/admin/YOUYUS/bdump/youyus_pmon_4050.trc

cat /s01/10gdb/admin/YOUYUS/bdump/youyus_pmon_4050.trc
.......................
kmmlrl: register now
kmmgdnu: cXDB
         goodness=0, delta=1,
         flags=0x5:unblocked/not overloaded, update=0x6:G/D/-
kmmgdnu: YOUYUS
         goodness=0, delta=1,
         flags=0x4:unblocked/not overloaded, update=0x6:G/D/-
kmmlrl: nsgr update returned 0
kmmlrl: nsgr register returned 0

/*可以看到注册了YOUYUS和cXDB2个service的过程*/

若无法正常注册可以按照以下步骤检查:
1. 检查Oracle Net命名方式是否正确设置,该参数由sqlnet.ora配置文件中的NAMES.DIRECTORY_PATH指定。
2. 确认SQL NET配置文件如sqlnet.ora,tnsnames.ora等的位置正确配载。
3. 检查是否设置了TNS_ADMIN环境变量,该变量将影响配置文件的正确路径。
4. 确认LOCAL_LISTENER或REMOTE_LISTENER中指定的service可以被正常tnsping通,若以上参数未指定值则尝试步骤6。
5. 确认主机名可以被正确解析为ip地址,如以下示例:

C:\Users\weisly>nslookup www.oracle.com
服务器:  dir-605
Address:  192.168.0.1

非权威应答:
名称:    a398.g.akamai.net.0.1.cn.akamaitech.net
Addresses:  117.104.136.98
          117.104.136.128
Aliases:  www.oracle.com
          www.oracle.com.edgesuite.net
          a398.g.akamai.net

6. 修改LOCAL_LISTENER为非服务别名解析的形式,这样可以排除配置文件可能造成的影响,从而确认问题是否由配置文件设置不当而引起:

SQL> conn / as sysdba
Connected.

SQL> alter system set local_listener='(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=rh2)(PORT=1521)))';
System altered.

SQL> alter system set REMOTE_LISTENER=' (ADDRESS_LIST =(ADDRESS = (PROTOCOL = TCP)(HOST = rh1)(PORT = 1521))(ADDRESS = (PROTOCOL = TCP)(HOST = rh2)(PORT = 1521)))';
System altered.

7. 将主机名替换为ip地址并测试是否解决问题。
8. 以IPC协议替代常见的TCP协议测试,绕过TCP和主机名等因素可能造成的问题,以确认问题是否由这些因素引起:

SQL> alter system set local_listener='(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=KEY1)))';
System altered.

/*其中KEY值需要和listener.ora中指定的一致。*/

SQL> host
[maclean@rh2 admin]$ lsnrctl status KEY1

LSNRCTL for Linux: Version 10.2.0.4.0 - Production on 20-AUG-2010 11:30:06
Copyright (c) 1991, 2007, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=KEY1)))
STATUS of the LISTENER
------------------------
Alias                     KEY1
Version                   TNSLSNR for Linux: Version 10.2.0.4.0 - Production
Start Date                20-AUG-2010 11:29:15
Uptime                    0 days 0 hr. 0 min. 50 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /s01/10gdb/network/admin/listener.ora
Listener Log File         /s01/10gdb/network/log/key1.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=KEY1)))
Services Summary...
Service "YOUYUS" has 1 instance(s).
  Instance "YOUYUS", status READY, has 1 handler(s) for this service...
Service "YOUYUS_XPT" has 1 instance(s).
  Instance "YOUYUS", status READY, has 1 handler(s) for this service...
Service "cXDB" has 1 instance(s).
  Instance "YOUYUS", status READY, has 10 handler(s) for this service...
The command completed successfully

实在不行的话,我们还是提SR吧!

沪ICP备14014813号-2

沪公网安备 31010802001379号