在《gc buffer busy/gcs log flush sync与log file sync》一文中我介绍了 redo flush慢造成RAC中gc buffer busy争用的原理, 而在《【技术分享】开Oracle调优鹰眼,深入理解AWR性能报告》 中我又介绍了 log file sync(其实本质是lgwr 写redo慢)也会造成单实例single instance环境中的buffer busy wait等待, 这是为什么呢?
我们来做一个演示说明该问题:
示例用表:
conn maclean/oracle
create table maclog (t1 int);
打开一个Session A 并获得它的OS进程号
[oracle@vrh8 ~]$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.5.0 – Production on Sat Mar 9 09:32:25 2013
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@vrh8 ~]$ ps -ef|grep LOCAL
oracle 18441 18438 0 09:41 ? 00:00:00 oracleG10R25 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 18445 18348 0 09:42 pts/3 00:00:00 grep LOCAL
使用 gdb debug 该18375的前台进程,并指定breakpoint
[oracle@vrh8 ~]$ gdb $ORACLE_HOME/bin/oracle 18441
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-37.el5)
(gdb) b kcrf_commit_force
Breakpoint 1 at 0x286519c
之后在session A执行:
SQL> insert into maclog values(1);
1 row created.
若insert hang,则在GDB 中输入多次C – continue 这样能够继续
输入 commit; ==》 此时session A会hang住
SQL> commit; ====》HANG
在GDB 中输入bt 可以看到stack call
(gdb) bt
#0 0x000000000286519c in kcrf_commit_force ()
#1 0x00000000028620ef in kcrfw_redo_gen ()
#2 0x00000000010e7dba in kcbchg1_main ()
#3 0x00000000010e6d99 in kcbchg ()
#4 0x000000000143f65a in ktucmt ()
#5 0x00000000013c7a06 in ktcCommitTxn ()
#6 0x00000000042a559e in ktdcmt ()
#7 0x00000000024fe09c in k2lcom ()
#8 0x0000000002418993 in k2send ()
#9 0x0000000001418b47 in xctctl ()
#10 0x00000000014174dd in xctcom_with_options ()
#11 0x000000000211fc26 in kksExecuteCommand ()
#12 0x00000000030ef87a in opiexe ()
#13 0x0000000003232d47 in kpoal8 ()
#14 0x00000000013b7c10 in opiodr ()
#15 0x0000000003c3c9da in ttcpip ()
#16 0x00000000013b3144 in opitsk ()
#17 0x00000000013b60ec in opiino ()
#18 0x00000000013b7c10 in opiodr ()
#19 0x00000000013a92f8 in opidrv ()
#20 0x0000000001fa3936 in sou2o ()
#21 0x000000000072d40b in opimai_real ()
#22 0x000000000072d35c in main ()
其stack call为ktcCommitTxn=> ktucmt => kcbchg => kcbchg1_main => kcrfw_redo_gen => kcrf_commit_force
kcbchg==> block change ,为什么要发生block change呢? 因为commit需要对在Buffer Cache里的block做immediate block cleanout
此时开一个session B 查询maclog表
SQL> select * from maclog; ==》阻塞在buffer busy wait上
这样就通过 无法完成的immediate block cleanout 去pin住buffer ,来形成了一个buffer busy wait
做一个system state dump :
SQL> oradebug setmypid
Statement processed.
SQL>
SQL> oradebug dump systemstate 266;
Statement processed.
SQL> oradebug tracefile_name
/s01/admin/G10R25/udump/g10r25_ora_18551.trc
分析该systemstate dump
Session B一直在等 buffer busy wait
SO: 0xaa42fff8, type: 4, owner: 0xaa3048f8, flag: INIT/-/-/0x00
(session) sid: 164 trans: (nil), creator: 0xaa3048f8, flag: (100051) USR/- BSY/-/-/-/-/-
DID: 0001-0008-00000002, short-term DID: 0000-0000-00000000
txn branch: (nil)
oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
service name: SYS$BACKGROUND
waiting for ‘buffer busy waits’ wait_time=0, seconds since wait started=245
file#=2, block#=89, class#=21
blocking sess=0x(nil) seq=12413
Dumping Session Wait History
for ‘buffer busy waits’ count=1 wait_time=0.215431 sec
file#=2, block#=89, class#=21
for ‘buffer busy waits’ count=1 wait_time=0.977438 sec
file#=2, block#=89, class#=21
for ‘buffer busy waits’ count=1 wait_time=0.977538 sec
file#=2, block#=89, class#=21
for ‘buffer busy waits’ count=1 wait_time=0.977512 sec
file#=2, block#=89, class#=21
for ‘buffer busy waits’ count=1 wait_time=0.977480 sec
file#=2, block#=89, class#=21
for ‘buffer busy waits’ count=1 wait_time=0.977488 sec
file#=2, block#=89, class#=21
for ‘buffer busy waits’ count=1 wait_time=0.977639 sec
file#=2, block#=89, class#=21
其stack call为:
Short stack dump:
ksdxfstk()+32<-ksdxcb()+1573<-sspuser()+111<-__restore_rt()+0<-__GI_semtimedop()+10<-sskgpwwait()+265<-skgpwwait()+162<-kslwaitns_timed()+1102<-kskthbwt()+246<-kslwait(
)+228<-kcbzwb()+1496<-kcbgtcr()+23190<-ktugct()+588<-ktbgcl1()+4711<-ktrgcm()+1979<-ktrget()+486<-kdst_fetch()+524<-kdstf0000001kmP()+3137<-kdsttgr()+2427<-qertbFetch()
+650<-qergsFetch()+444<-opifch2()+2944<-opiall0()+2206<-opikpr()+642<-opiodr()+1184<-rpidrus()+196<-skgmstack()+158<-rpidru()+116<-rpiswu2()+409<-kprball()+1270<-kkescF
etch()+83<-kkedsamp()+6304<-kkedsSel()+1495<-kkecdn()+3055<-kkotap()+859<-kkoiqb()+9830<-kkooqb()+904<-kkoqbc()+2093<-apakkoqb()+167<-apaqbdDescendents()+414<-apaqbdLis
tReverse()+68<-apadrv()+573<-opitca()+1545<-kksLoadChild()+9714<-kxsGetRuntimeLock()+1454<-kksfbc()+14910<-kkspsc0()+979<-kksParseCursor()+142<-opiosq0()+1641<-kpooprx(
)+318<-kpoal8()+964<-opiodr()+1184<-ttcpip()+1226<-opitsk()+1310<-opiino()+1024<-opiodr()+1184<-opidrv()+548<-sou2o()+114<-opimai_real()+163<-main()+116<-__libc_start_m
ain()+244<-_start()+41
file=2 block=0x89 即 137 md: EXCL 被 owner: 0xaa30b888 PID=22持有
SO: 0xaadd91d8, type: 24, owner: 0xaa44f240, flag: INIT/-/-/0xc0
(buffer) (CR) PR: 0xaa30c878 FLG: 0x0
class bit: (nil)
kcbbfbp: [BH: 0x69fe2948, LINK: 0xaadd9218] (WAITING)
where: ktuwh05: ktugct, why: 0
BH (0x69fe2948) file#: 2 rdba: 0x00800089 (2/137) class: 33 ba: 0x69cc6000
set: 3 blksize: 8192 bsi: 0 set-flg: 0 pwbcnt: 0
dbwrid: 0 obj: -1 objn: 0 tsn: 1 afn: 2
hash: [aacc0958,aacc0958] lru: [75fc3228,7eff15a8]
obj-flags: object_ckpt_list
ckptq: [6ef77368,aadf28f8] fileq: [6ef77378,aadf2938] objq: [a7bdd678,6afdf518]
use: [aaddb748,aaddb748] wait: [aaddef10,aadd9218]
st: XCURRENT md: EXCL tch: 465
flags: mod_started gotten_in_current_mode block_written_once
redo_since_read
change state: ACTIVE
change count: 1
LRBA: [0x2d5.10d5a.0] HSCN: [0x0.242e76a] HSUB: [1]
Using State Objects
—————————————-
SO: 0xaaddb708, type: 24, owner: 0xaa454568, flag: INIT/-/-/0x00
(buffer) PR: 0xaa30b888 FLG: 0x1000
class bit: (nil)
kcbbfbp: [BH: 0x69fe2948, LINK: 0xaaddb748]
where: ktuwh02: ktugus, why: 0
Waiting State Objects
—————————————-
SO: 0xaaddeed0, type: 24, owner: 0xaa450598, flag: INIT/-/-/0xc0
(buffer) PR: 0xaa3048f8 FLG: 0x0
class bit: (nil)
kcbbfbp: [BH: 0x69fe2948, LINK: 0xaaddef10] (WAITING)
where: ktuwh02: ktugus, why: 0
简单来说这个示例说明了几点:
- OLTP类型的小DML操作一般都会是immediate block cleanout的,这要求在commit之前对block做change kcbchg
- 在commit kcrf_commit_force完成前都不会释放对该block buffer的buffer pin
- 由上述2点造成的buffer pin最终会影响select和其他insert/update/delete 形成buffer busy wait
- 由于慢的lgwr写redo log会造成 kcrf_commit_force commit的缓慢,表现在等待事件上就是log file sync
- 由于block cleanout时pin block buffer且commit 慢,则会导致更长时间的buffer busy wait
- 若log file sync是由lgwr 写redo log慢(log file parallel write)引起的,则它的另一个效应就是buffer busy wait增多
- 若看到AWR中log file sync+buffer busy wait是主要等待事件,则优先解决log file sync ,因为buffer busy wait实际可能是受害者
AWR中与commit cleanout相关的 Instance activity 有好几个
commit cleanout failures: block lost
commit cleanout failures: buffer being written
commit cleanout failures: callback failure
commit cleanout failures: cannot pin
commit cleanouts
commit cleanouts successfully completed