转自老白的<oracle rac 日记>一书,
dump systemstate产生的跟踪文件包含了系统中所有进程的进程状态等信息。每个进程对应跟踪文件中的一段内容,反映该进程的状态信息,包括进程信息,会话信息,enqueues信息(主要是lock的信息),缓冲区的信息和该进程在SGA区中持有的(held)对象的状态等信息。dump systemstate产生的跟踪文件是从dump那一刻开始到dump任务完成之间一段事件内的系统内所有进程的信息。
那么通常在什么情况下使用systemstate比较合适呢?
Oracle推荐的使用systemstate事件的几种情况是:
数据库hang住了
数据库很慢
进程正在hang
数据库出现某些错误
资源争用
dump systemstate的语法为:
ALTER SESSION SET EVENTS ‘immediate trace name systemstate level 10’;
也可以使用ORADEBUG实现这个功能:
sqlplus -prelim / as sysdba
oradebug setmypid
oradebug unlimit;
oradebug dump systemstate 10
如果希望在数据库发生某种错误时调用systemstate事件,可以在参数文件(spfile或者pfile)中设置event参数,
例如,当系统发生死锁(出现ORA-00060错误)时dump systemstate:
event = “60 trace name systemstate level 10”
LEVEL参数:
10 Dump all processes (IGN state)
5 Level 4 + Dump all processes involved in wait chains (NLEAF state)
4 Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state)
3 Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
1-2 Only HANGANALYZE output, no process dump at all
如果Level过大的话会产生大量的跟踪文件并影响系统的I/O性能,建议不要采用3级以上的跟踪。Hanganalyze报告会分作许多片断,会话片断信息总是由一个header详尽描述被提取的的会话信息。
一般来说,一份systemstate dump中包含了以下内容:
dump header文件头
process dump dump时所有的process的dump信息,每个process一个专门的章节。
call dump在process dump中,包含call dump
session dump每个process中,都有1个或多个(MTS时)session dump
enqueue dump
buffer dump在session dump中可能包含buffer dump
在阅读systemstate dump时,一般首先使用ASS工具来进行分析。ASS是oracle工程师编写的一个AWK脚本,用于分析systemstate dump文件,找出dump中可能存在问题的地方。通过ASS的输出结果,我们就可以发现一些blocker的线索,这些线索就是我们重点要查看的地方。
我们可以通过搜索SO的地址信息来定位某个SO,找到后分析这个SO的信息,并且通过PARENT SO的地址找到其PARENT,建立这些SO的关系图。比如我们找到一个SESSION的SO,就可以看看这个session属于哪个process,这个session正在执行的sql是什么,等等。通过这种分析,就把可能存在问题的SO及关联的SO全部找出来,这样就为进一步分析提供了素材。
1、标准的state object header(SO)
state object header中包含了一些基本的信息,比如:
SO: c00004ti4jierj, type: 2. owner: 0000000000, flag: init/-/-/0x00
其中SO是state object的号码;
type表示state object的类别;
TYPE: state object的已知类别:
2 process(进程)
3 call
4 session(会话信息)
5 enqueue(锁信息)
6 file infomation block(文件信息块,每个FIB标识一个文件)
11 broadcast handle(广播消息句柄)
12 KSV slave class state
13 ksvslvm
16 osp req holder(会话执行os操作的holder)
18 enqueue resource detail(锁资源详细资料)
19 ges message(ges消息)
20 namespace [ksxp] key
24 buffer [db buffer]
36 dml lock
37 temp table lock(临时表锁)
39 list of blocks(用于block cleanout的块列表清单)
40 transaction(事务)
41 dummy
44 sort segment handle(排序段句柄)
50 row cache enqueue
52 user lock
53 library cache lock
54 library cache pin
55 library cache load lock
59 cursor enqueue
61 process queue
62 queue reference
75 queue monitor sob
owner是这个SO的父节点(如果为0,说明是最顶层的SO);flag表示状态,值有以下三种:
kssoinit;state object被初始化了
kssoflst;state object在freelist上
kssofcln;state object已经被pmon释放了。
State object header的数据结构如下:
struct kssob {
unsigned char kssobtyp; /* state object的类别*/
unsigned char kssobflg; /* flags */
unsigned char kssobdelstage;
struct kssob * kssobown; /*拥有者的SO指针*/
kgglk kssoblnk; /*在父对象成员链中的指针*/
}
2、processstate dump(ksupr)
processstate dump转储了进程的状态,从这些信息中我们可以了解进程的基本属性以及进程的状态。
在阅读processstate dump时,我们主要关注的进程的标识(FLAG),从中也可以知道进程的类别。从”(latch info)”中可以看到进程等待latch的情况,这也有助于了解进程故障的原因。另外,进程的OS信息对于进一步了解进程情况也是很有帮助的。
实际上,x$ksupr包含了进程的信息,通过该内存视图可以更进一步了解processstate dump的内容。
ADDR 地址
INDX 序号
INST_ID实例ID
KSSPAFLG state object的状态:
KSSOINIT 0x01 // state object initialized
KSSOFLST 0x02 // state object is on free list
KSSOFCLN 0x04 // state object freed by PMON(for debugging)
KSSPAOWN 该SO的OWNER,如果自己是顶层的SO,那么owner为0
KSUPRFLG 该process的状态:
KSUPDEAD 0x01 process is dead and should be cleaned up
KSUPDSYS 0x02 detached,system process
KSUPDFAT 0x04 detached,fatal(system) process
KSUPDCLN 0x08 process is cleanup(pmon)
KSUPDSMN 0x10 process is smon
KSUPDPSU 0x20 pseudo process
KSUPDMSS 0x40 muti-stated server
KSUPDDPC 0x80 dispatcher process
KSUPRSER 进程的序号(SERIAL NO)
KSUPRIOC
KSLLALAQ 持有的latch
KSLLAWAT 正在等待的latch
KSLLAWHY latch请求的上下文(用于debug)
KSLLAWER latch请求的位置(用于debug)
KSLLASPN 本进程正在spin的latch
KSLLALOW 所持有latch级别的位图(0~9级)
KSLLAPSC 进程发出的POST消息的计数
KSLLAPRC 进程收到的POST消息的计数
KSLLAPRV 收到的最后一个POST的LOC ID,参考图中的①
KSLLAPSN 最后一个发送POST的LOC ID,参考图中的②
KSLLID1R RESOURCE ID的第一部分
KSLLID2R RESOURCE ID的第二部分
KSLLRTYP RESOURCE TYPE+RESOURCE FLAG
KSLLRMTY RESOURCE MANAGE的类型:
KRMENQ 0x01 enqueues
KRMLATCH 0x02 latches
KRMLIBCALK 0x03 library cache locks
KRMBUFLK 0x04 buffer locks
KSLLARPO 最后一个发送消息给这个进程的OS进程
KSLLASPO 这个进程最后一个发送信息过去的OS进程
KSUPRPID OS进程号
KSUPRWID 等待事件的ID
KSUPRUNM OS用户名
KSUPRMNM 用户的机器名
KSUPRPNM 用户程序名
KSUPRTID 用户终端名
KSSRCOBJ STATE OBJECT RECOVERY数据中的正在被操作的对象
KSSRCFRE STATE OBJECT RECOVERY数据中的FREELIST的地址
KSSRCSRC STATE OBJECT RECOVERY数据中的SOURCE PARENT
KSSRCDST STATE OBJECT RECOVERY数据中的DESTINATION PARENT
KSASTQNX MESSAGE STATE中的前向指针
KSASTQPR MESSAGE STATE中的后向指针
KSASTRPL MESSAGE STATE中的REPLY VALUE
KSUPRPGP PROCESS GROUP的名字
KSUPRTFI 进程的trace文件名
KSUPRPUM PGA使用的内存
KSUPRPNAM KSUPRPNAM+KSUPRPRAM是pga分配内存的总和
KSUPRPRAM
KSUPRPFM pga可释放的内存
KSUPRPMM pga使用的最大内存
3、session state object
会话信息中包含了大量我们所需要的信息,一般来说会话状态块是我们分析会话情况的重点。
在会话状态信息中,flag是十分重要的,我们可以从flag中了解会话目前的情况,以及flag位图的详细信息。该会话正在执行的sql和pl/sql的SO地址可以让我们找到当前会话正在做的工作,有助于进一步
分析。另外,会话的等待事件和历史等待事件可以让用户了解会话在现在和过去一段时间里等待的情况,如果要分析会话故障原因的话,这些资料都是十分重要的。
flag的位图如下:
KSUSFUSR 0x00000001 user session (as opposed to recursive session)
KSUSFREC 0x00000002 recursive session(always internal)
KSUSFAUD 0x00000004 audit logon/logoff,used by cleanup
KSUSFDCO 0x00000008 disable commit/rollback from plsql
KSUSFSYS 0x00000010 user session created by system processes
KSUSFSGA 0x00000020 whether UGA is allocate in sga
KSUSFLOG 0x00000040 whether user session logs on to ORACLE
KSUSFMSS 0x00000080 user session created by multi-stated server
KSUSFDIT 0x00000100 disable (defer) interrupt
KSUSFCLC 0x00000200 counted for current license count decrement
KSUSFDET 0x00000400 session has been detached
KSUSFFEX 0x00000800 “forced exit”during shutdown normal
KSUSFCAC 0x00001000 (cloned) session is cached
KSUSFILS 0x00002000 default tx isolation level is serializable
KSUSFOIL 0x00004000 override serializable for READ COMMITTED
KSUSFIDL 0x00008000 idle session scheduler
KSUSFSKP 0x00010000 SKIP unusable indexes maintenance
KSUSFCDF 0x00020000 defer all deferrable constraint by default
KSUSFCND 0x00040000 deferable constraints are immediate
KSUSFIDT 0x00080000 session to be implicitly detached
KSUSFTLA 0x00100000 transaction audit logged
KSUSFJQR 0x00200000 recource checking in job q process enabled
KSUSFMGS 0x00400000 session is migratable
KSUSFGOD 0x00800000 migratable session need to get ownership id
KSUSFSDS 0x01000000 suppress/enable TDSCN computations
KSUSFMSP 0x02000000 parent of migratable session
KSUSFMVC 0x04000000 MV container update progress
KSUSFNAS 0x08000000 an NLS alter session call was done
KSUSFTRU 0x10000000 a trusted callout was performed
KSUSFHOA 0x20000000 an HO agent was called
KSUSFSTZ 0x40000000 an alter session set time_zone was done
KSUSFSRF 0x80000000 summary refresh
4、call state object
Call state object是针对一个call的,我们查看call state object的时候一定要注意depth值,以此判断该call是用户调用还是递归调用。
5、enqueue state object
从enqueue state object中,我们主要可以查看锁的类型、锁的模式以及flag。
6、transaction dump
Transaction dump对应的oracle内存结构是KTCXB,可以通过X$KTCXB来了解更详细的情况。
flag的描述如下(资料来源早期版本,针对10g可能略有不同):
1 allocated but no transaction
2 transaction active
4 state object no longer valid
8 transaction about to commit/abort
10 space management transaction
20 recursive transaction
40 no undo logging
80 no change/commit,must rollback
100 use system undo segment (0)
200 valid undo segment assigned
400 undo seg assigned,lock acquired
800 change may have been made
1000 assigned undo seg
2000 required lock in cleanup
4000 is a pseudo space extent
8000 save the tx table & tx ctl block
10000 no read-only optimize for 2pc
20000 multiple sess attached to this tx
40000 commit scn future set
80000 dependent scn future set
100000 dist call failed,force rollback
200000 remote uncoordinated ddl tx
400000 coordinated global tx
800000 pdml transaction
1000000 next must be commit or rollback
2000000 coordinator in pdml
4000000 disable block level recovery
8000000 library and/or row caches dirty
10000000 serializable transaction
20000000 waiting for unbound transaction
40000000 loosely coupled transaction branch
80000000 long-running transaction
flag2的描述如下(资料来源早期版本,针对10g可能略有不同):
1 tx needs refresh on commit
2 delete performed in tx
4 concurrency check enabled
8 insert performed
10 dir path insert performed
20 fast rollback on net disconnect
40 do not commit this tx
80 this tx made remote change
100 all read-only optim enabled
事务环境的结构如下:
Struct ktcev {
kenv ktcevenv;
kuba UBA的高水位;
kuba ktcevucl;
sb2 在undo高水位块中的剩余空间;
kcbds undo block的描述;
kdbafr undo段头的DBA地址;
kturt * 指向undo seg的KTURT结构;
}
7、library object lock/handle
library object lock如下:
Flags的描述如下:
KGLLKBRO 0x0100 this lock is broken
KGLLKCBB 0x0200 this lock can be broken
KGLLKPNC 0x0400 “kgllkpnc” is a valid pin for the call
KGLLKPNS 0x0800 “kgllkpns” is a valid pin for the session
KGLLKCGA 0x1000 this lock is in CGA memory
KGLLKINH 0x2000 the instance lock is inherited
KGLLKLRU 0x4000 lock protects an object on the session cache lru
KGLLKKPC 0x8000 lock protects an object in the session keep cache
KGLLKRES 0x0010 reserved lock preventing handle from being freed
KGLLKCBK 0x0020 need to callback the client for delete/dump
作为library object的主体,handle的信息如图:
其中namespace的取值包括:
CRSR cursor
TABL table/view/sequence/synonym
BODY body(e.g.,package body)
TRGR trigger
INDX index
CLST cluster
KGLT internal KGL testing
PIPE pipe
LOB lob
DIR directory
QUEU queue
OBJG replication object group
PROP replication propagator
JVSC java source
JVRE java resource
ROBJ reserved for server-side RepAPI
REIP replication internal package
CPOB context policy object
EVNT pub_sud internal information
SUMM summary
DIMN dimension
CTX app context
OUTL stored outlines
TULS ruleset objects
RMGR resource manager
XDBS xdb schema
PPLN pending scheduler plan
PCLS pending scheduler class
SUBS subscription information
LOCS location information
RMOB remote objects info
RSMD RepAPI snapshot metadata
JVSD java shared data
STFG file group
TRANS transformation
RELC replication – log based child
STRM stream:capture process in log-based replication
REVC rule evaluation context
STAP stream:apply process in log-based replication
RELS source inlog-based replication
RELD destination in log-based replication
IFSD IFS schema
XDBC XDB configuration management
USAG user agent mapping
VOMDTABL multi-versioned object for table
JSQI scheduler-event queue info object
CDCS change set
VOMDINDX multi-versioned object for index
STBO sql tuing base object
HTSP hintset object
JSGA scheduler global attributes
JSET scheduler start time namespace
TABL_T temporary table
CLST_T temporary cluster
INDX_Ttemporary index
SCPD sratch pad
JSLV scheduler job slave
MODL mining models
状态标志位的取值:
EXS existent
NEX no-existent
LOC
CRT being created
ALT being altered
DRP being dropped
PRG being purged
UPD being uodated
RIV marked for rolling invalidation
NRC don’t recover when an exclusive pin fails
UDP dep being updated
BOW bad owner of database link
MEM has frame memory associated with heap 0
REA protected with read-only access at least once
NOA protected with no access at least once
通过对library cache object/handle的分析,可以找到相关的sql以及cursor的状态。
@ struct kcbds
@ {
@ ktid kcbdstid; /* full relative DBA plus object number */
@ …..
@ struct ktid /* relative dba + objd */
@ {
@ kdbafr dbr_ktid; /* a relative dba */
@ kobjd objd_ktid; /* data object number */
@ kobjn objn_ktid; /* dictionary object number */
@ struct kdbafr /* full relative dba */
@ {
@ ktsn tsn_kdbafr; 4bytes /* a tablespace number */
@ krdba dba_kdbafr; 4bytes /* a relative dba */
@ };