MINSCN与Cache Fusion Read Consistent

问题来源于 网友在Ask Maclean Home 上关于  RAC 中 Past Image PI块的提问, 网友提出了一个演示,我们在11.2.0.3 2 Node RAC的环境中重现这个实验:

 

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

SQL> drop table test purge;

Table dropped.

SQL> alter system flush buffer_cache;

System altered.

SQL> create table test(id number);

insert into test values(1);

insert into test values(2);

commit;

/* 我们利用 rowid定位TEST表仅有的2行数据的数据块位置 */

select dbms_rowid.rowid_block_number(rowid),dbms_rowid.rowid_relative_fno(rowid) from test;

DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID) DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID)
------------------------------------ ------------------------------------
                               89233                                    1
                               89233                                    1 

SQL> alter system flush buffer_cache;

System altered.

Instance 1  Session A 执行UPDATE操作:

SQL> update test set id=id+1 where id=1;

1 row updated.

Instance 1  Session B 查询x$BH buffer header视图 了解 相关Buffer的状态 

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         1          0
         3    1227595

 

X$BH 视图的 STATE字段代表Buffer的状态, 以下是状态列表:

 

STATE       NUMBER
           KCBBHFREE         0       buffer free
           KCBBHEXLCUR       1       buffer current (and if DFS locked X)
           KCBBHSHRCUR       2       buffer current (and if DFS locked S)
           KCBBHCR           3       buffer consistant read
           KCBBHREADING      4       Being read
           KCBBHMRECOVERY    5       media recovery (current & special)
           KCBBHIRECOVERY    6       Instance recovery (somewhat special)

 

 

这个演示中我们需要用到的是 : state =1 Xcurrent 、 state=2 Scurrent 、 state=3 CR

 

接着在 Instance 2  更新 同一个数据块内的另一条记录 ,这回引发 gc current block 2 way  并将Current Block 传输到 Instance 2, 同时 Instance 1 的原”Current Block” Convert 成 Past Image:

 

 

Instance 2 Session C 

SQL> update test set id=id+1 where id=2;

1 row updated.

Instance 2 Session D

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         1          0
         3    1227641
         3    1227638

 

STATE =1 的Xcurrent block已传输到 Instance 2 , 再来看 Instance 1 此时的 GC状态:

 

Instance 1 Session B 

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1227641
         3    1227638
         8          0
         3    1227595

 

 

问题出现在这里, 当网友再次在Instance 1上的session A中执行对TEST表的SELECT查询后 ,发现原来的 3个 State=3的CR 块 数量减少到了1个:

 

Instance 1 session A 即最初执行UPDATE的 session

SQL> alter session set events '10046 trace name context forever,level 8:10708 trace name context forever,level 103: trace[rac.*] disk high';

Session altered.

SQL> select * from test;

        ID
----------
         2
         2

select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

      STATE CR_SCN_BAS
---------- ----------
         3    1227716
         3    1227713
         8          0

 

 

网友在原帖中是通过v$BH 视图观察CR块的数量,发现在执行SELECT查询后 CR块数量反而减少了,故而产生了疑问。 我们在以上演示中直接 观察X$BH视图可以发现 , 原本的三个CR块的SCN Version分别为: 1227641、1227638、1227595, 在SELECT查询完成后被 2个不同SCN version的CR块 1227716和 1227713 所替换, Oracle为什么要这样做呢?

 

所幸我们在实际执行SELECT查询前设置了event 10708和 rac.*的诊断TRACE,我们先来看看TRACE内容:

 

PARSING IN CURSOR #140444679938584 len=337 dep=1 uid=0 oct=3 lid=0 tim=1335698913632292 hv=3345277572 ad='bc0e68c8' sqlid='baj7tjm3q9sn4'
SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),0), NVL(SUM(C2),0) FROM (SELECT /*+ NO_PARALLEL("TEST") FULL("TEST") NO_PARALLEL_INDEX("TEST") */ 1 AS C1, 1 AS C2 FROM "SYS"."TEST" "TEST") SAMPLESUB
END OF STMT
PARSE #140444679938584:c=1000,e=27630,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,plh=1950795681,tim=1335698913632252
EXEC #140444679938584:c=0,e=44,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1950795681,tim=1335698913632390

*** 2012-04-29 07:28:33.632
kclscrs: req=0 block=1/89233

*** 2012-04-29 07:28:33.632
kclscrs: bid=1:3:1:0:7:80:1:0:4:0:0:0:1:2:4:1:26:0:0:0:70:1a:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:4:3:2:1:2:0:3f:0:1c:86:2d:4:0:0:0:0:a2:3c:7c:b:70:1a:0:0:0:0:1:0:7a:f8:76:1d:1:2:dc:5:a9:fe:17:75:0:0:0:0:0:0:0:0:0:0:0:0:63:e5:0:0:0:0:0:0:10:0:0:0
2012-04-29 07:28:33.632578 : kjbcrc[0x15c91.1 76896.0][9]
2012-04-29 07:28:33.632616 : GSIPC:GMBQ: buff 0xba1e8f90, queue 0xbb79f278, pool 0x60013fa0, freeq 1, nxt 0xbb79f278, prv 0xbb79f278
2012-04-29 07:28:33.632634 : kjbmscrc(0x15c91.1)seq 0x2 reqid=0x1c(shadow 0xb4bb4458,reqid x1c)mas@2(infosz 200)(direct 1)
2012-04-29 07:28:33.632654 : kjbsentscn[0x0.12bbc1][to 2]
2012-04-29 07:28:33.632669 : GSIPC:SENDM: send msg 0xba1e9000 dest x20001 seq 24026 type 32 tkts xff0000 mlen x17001a0
2012-04-29 07:28:33.633385 : GSIPC:KSXPCB: msg 0xba1e9000 status 30, type 32, dest 2, rcvr 1

*** 2012-04-29 07:28:33.633
kclwcrs: wait=0 tm=689

*** 2012-04-29 07:28:33.633
kclwcrs: got 1 blocks from ksxprcv
WAIT #140444679938584: nam='gc cr block 2-way' ela= 689 p1=1 p2=89233 p3=1 obj#=76896 tim=1335698913633418
2012-04-29 07:28:33.633490 : kjbcrcomplete[0x15c91.1 76896.0][0]
2012-04-29 07:28:33.633510 : kjbrcvdscn[0x0.12bbc1][from 2][idx 2012-04-29 07:28:33.633527 : kjbrcvdscn[no bscn <= rscn 0x0.12bbc1][from 2]

*** 2012-04-29 07:28:33.633
kclwcrs: req=0 typ=cr(2) wtyp=2hop tm=689

 

 

通过TRACE不难发现 因为之前没有收集过TEST表的统计信息, 所以这里出发了Dynamic Sampling的动态采样,这本身会引发对TEST表的 CR读请求,实际产生了一次’gc cr block 2-way’ 等待:

2012-04-29 07:28:33.632654 : kjbsentscn[0x0.12bbc1][to 2]

12bbc1= 1227713  与上述X$BH中的一个CR块对应,kjbsentscn[0x0.12bbc1][to 2] 可以理解为 向 Instance 2 发送了SCN=12bbc1=1227713   DBA=0x15c91.1 76896.0 的  CR Request(obj#=76896)

之后kjbrcvdscn函数确认了 [no bscn <= rscn 0x0.12bbc1][from 2] ,即没有 比已receive的 SCN Version =12bbc1 更好的Best Version

 

CR Server Arch

CR Server in Oracle9i Release 2 There are three instances involved: the requestor instance, the lock master instance, and the current block owner instance. The lock is granted if one of the following is true: Resource held mode is NULL. Resource held mode is S and there is no holder of an S lock in the master node. Otherwise, the master forwards the CR request to the holder node. If the lock is global, then you choose a node to forward the CR request to as follows: If there is a past image (PI) at the lock master instance, and the PI SCN is greater than snap-scn, then the master node is this node. Otherwise, you choose a PI with the smallest SCN and PI-SCN greater than snap-SCN. The owner node of this PI is the node you forward the CR request to. The PI with smallest SCN is the most interesting one, because you have less UNDO to be applied. If there is no PI at all, you choose the node that the current buffer belongs to.

 

 

动态采样完成后才真正执行了用户发出的SELECT语句:

 

 

PARSING IN CURSOR #140444682869592 len=18 dep=0 uid=0 oct=3 lid=0 tim=1335698913635874 hv=1689401402 ad='b1a188f0' sqlid='c99yw1xkb4f1u'
select * from test
END OF STMT
PARSE #140444682869592:c=4999,e=34017,p=0,cr=7,cu=0,mis=1,r=0,dep=0,og=1,plh=1357081020,tim=1335698913635870
EXEC #140444682869592:c=0,e=23,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=1357081020,tim=1335698913635939
WAIT #140444682869592: nam='SQL*Net message to client' ela= 7 driver id=1650815232 #bytes=1 p3=0 obj#=0 tim=1335698913636071

*** 2012-04-29 07:28:33.636
kclscrs: req=0 block=1/89233

*** 2012-04-29 07:28:33.636
kclscrs: bid=1:3:1:0:7:83:1:0:4:0:0:0:1:2:4:1:26:0:0:0:70:1a:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:4:3:2:1:2:0:2:0:1c:86:2d:4:0:0:0:0:a2:3c:7c:b:70:1a:0:0:0:0:1:0:7d:f8:76:1d:1:2:dc:5:a9:fe:17:75:0:0:0:0:0:0:0:0:0:0:0:0:63:e5:0:0:0:0:0:0:10:0:0:0
2012-04-29 07:28:33.636209 : kjbcrc[0x15c91.1 76896.0][9]
2012-04-29 07:28:33.636228 : GSIPC:GMBQ: buff 0xba0e5d50, queue 0xbb79f278, pool 0x60013fa0, freeq 1, nxt 0xbb79f278, prv 0xbb79f278
2012-04-29 07:28:33.636244 : kjbmscrc(0x15c91.1)seq 0x3 reqid=0x1d(shadow 0xb4bb4458,reqid x1d)mas@2(infosz 200)(direct 1)
2012-04-29 07:28:33.636252 : kjbsentscn[0x0.12bbc4][to 2]
2012-04-29 07:28:33.636358 : GSIPC:SENDM: send msg 0xba0e5dc0 dest x20001 seq 24029 type 32 tkts xff0000 mlen x17001a0
2012-04-29 07:28:33.636861 : GSIPC:KSXPCB: msg 0xba0e5dc0 status 30, type 32, dest 2, rcvr 1

*** 2012-04-29 07:28:33.637
kclwcrs: wait=0 tm=865

*** 2012-04-29 07:28:33.637
kclwcrs: got 1 blocks from ksxprcv
WAIT #140444682869592: nam='gc cr block 2-way' ela= 865 p1=1 p2=89233 p3=1 obj#=76896 tim=1335698913637294
2012-04-29 07:28:33.637356 : kjbcrcomplete[0x15c91.1 76896.0][0]
2012-04-29 07:28:33.637374 : kjbrcvdscn[0x0.12bbc4][from 2][idx 2012-04-29 07:28:33.637389 : kjbrcvdscn[no bscn <= rscn 0x0.12bbc4][from 2]

*** 2012-04-29 07:28:33.637
kclwcrs: req=0 typ=cr(2) wtyp=2hop tm=865

 

类似的, “SELECT * FROM TEST”也引发了一次’gc cr block 2-way’等待:
2012-04-29 07:28:33.637374 : kjbrcvdscn[0x0.12bbc4][from 2][idx 2012-04-29 07:28:33.637389 : kjbrcvdscn[no bscn

最后Foreground Process从 Remote LMS哪里got的是 SCN=1227716 Version的CR, 同样与 之前我们从X$BH 视图查到的scn对应。

 

这样就可以解释为什么Instance 1上出现了2个SCN更大的CR块, 但仍无法解释原来存在于Instance 1 Buffer Cache中的 三个SCN Version 较小的CR 块消失的原因。

 

我们来看下面的演示:

 

 

 

SQL> alter system set "_enable_minscn_cr"=false scope=spfile;

System altered.

SQL> alter system set "_db_block_max_cr_dba"=20 scope=spfile;

System altered.

SQL> startup force;
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area 1570009088 bytes
Fixed Size                  2228704 bytes
Variable Size             989859360 bytes
Database Buffers          570425344 bytes
Redo Buffers                7495680 bytes
Database mounted.
Database opened.

 

设置以上 “_enable_minscn_cr”=false 和 “_db_block_max_cr_dba”=20 并重启RAC所有实例, 重现上述演示:

 

 

在Instance 2 Session C 中update更新一次数据块 就对应地在Instance 1 中查询一次 ,以反复在Instance 1中Request CR 

SQL> update test set id=id+1 where id=2;              -- Instance 2

1 row updated.

SQL> select * From test;                         -- Instance 1

        ID
----------
         1
         2

下面为 Instance 1的 X$BH记录

select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

    STATE CR_SCN_BAS
---------- ----------
         3    1273080
         3    1273071
         3    1273041
         3    1273039
         8          0

SQL>  update test set id=id+1 where id=3;

1 row updated.

SQL> select * From test;

        ID
----------
         1
         2

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1273091
         3    1273080
         3    1273071
         3    1273041
         3    1273039
         8          0

...................

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1273793
         3    1273782
         3    1273780
         3    1273769
         3    1273734
         3    1273715
         3    1273691
         3    1273679
         3    1273670
         3    1273643
         3    1273635
         3    1273623
         3    1273106
         3    1273091
         3    1273080
         3    1273071
         3    1273041
         3    1273039
         3    1273033

19 rows selected.

SQL> select state,cr_scn_bas from x$bh where file#=1 and dbablk=89233 and state!=0;

     STATE CR_SCN_BAS
---------- ----------
         3    1274993

 

 

如上述演示 在设置了 “_enable_minscn_cr”(enable/disable minscn optimization for CR)=false 和 “_db_block_max_cr_dba”=20 (Maximum Allowed Number of CR buffers per dba) 2个 参数后 最多的时候 Instance 1 中缓存了同一个数据块的 多达 19个版本的CR块。

 

“_enable_minscn_cr”是11g以后出现的新隐藏参数,它控制Oracle是否计算CR块的最小SCN,当Foreground Process Receive接收到同一个数据块的更新(SCN更大)的SCN Version CR Block时可能会清除CBC上的 SCN较小的、旧的CR块 , 这样做的目的是减少Buffer Cache中同一个数据块 不同版本SCN Version的CR块的数量,注意不管是 语句级别或者事务级别 其所要求的Snap_Scn 快照 SCN总是 语句或事务开始时的Current SCN, 保留一些旧的CR块虽然可能对一些持续时间长的查询或者游标有益, 但是实例Buffer Cache中 同一个数据块的多版本 CR块的总数量是有限的, 这个总数受到 “_db_block_max_cr_dba” 隐藏参数的控制, 如我们上述演示中设置为20 ,则最多可以在Buffer Cache中缓存多大19个版本的CR块; 注意该”_db_block_max_cr_dba” 参数的默认值为6 , 即一个实例Buffer cache中同一个数据块的CR 版本同时不多于6个。

 

引入”_enable_minscn_cr” 优化CR的最小MINSCN 是有其理由的, 即便那些版本较旧的CR块被新陈代谢了, 但只要需要 Foreground Process还是可以通过CR Request , 要求 Holder Instance LMS 去build一个 BEST CR 来用, 这不消我们去担心。

沪ICP备14014813号-2

沪公网安备 31010802001379号