Analysis of an ORA-600 [kjbmprlst:shadow] Failure on Exadata

 

This is a four-node 11.2.0.1 RAC system running on Exadata. Since the beginning of this year it has frequently suffered unexpected instance terminations caused by the LMS background process hitting internal error ORA-600 [kjbmprlst:shadow]. The four-node RAC does provide high availability, but the repeated instance crashes still caused application transactions to fail.

I had in fact analyzed this problem back in July; see my earlier post <Oracle RAC内部错误:ORA-00600[kjbmprlst:shadow]一例>. The internal function kjbmprlst manages kjbm shadow lock information (/libserver10.a/kjbm.o), and there is a code defect in which a closed lock is not messaged to the master node in time. I had already suggested then that disabling DRM might avoid this ORA-600 [kjbmprlst:shadow], but none of the existing ORA-600 [kjbmprlst:shadow] cases on MOS mentioned disabling DRM as a workaround, and the SR we filed never got a definite answer from Oracle GCS on that point. Moreover, running a core production system on 11.2.0.1, a release with this many bugs, was clearly inappropriate, and upgrading to an 11.2.0.2 bundle patch containing the fix for Bug 10121589 looked like a sound plan, so I did not dig any deeper at the time.

After that I spent a while preparing the upgrade of the Exadata system to 11.2.0.2. In the end, however, the customer decided against it: the Exadata upgrade procedure is complex, many of its steps are irreversible, and there were still few upgrade references in China at the time. That rejection is what led to the failure analysis below.

 

Failure Characteristics

 

As noted above, this is a four-node 11.2.0.1 RAC on Exadata, and the application is Oracle's own Billing and Revenue Management (BRM). Since the beginning of the year the ORA-600 [kjbmprlst:shadow] internal error has occurred repeatedly. The error is raised by the critical RAC background process LMS (global cache service process), and the result is an unexpected instance termination.

 

The ORA-600 [kjbmprlst:shadow] occurred frequently in October (roughly twice a week), and the failures showed two characteristics:

  1. They were concentrated in the 20 minutes after 03:00 in the early morning
  2. Nodes 1, 2 and 4 had all failed at one time or another, but node 3 never did (more on this below)

 

Initial Analysis

 

The application team provided hourly AWR reports for the day of one of the failures. That failure occurred at 03:05 on node 1, so I started by analyzing the performance load on instance 1 at the time.

One way to analyze AWR reports is to compare reports from several periods to see how performance changed, provided you know when the problem occurred. Here we compare two periods:

1. The minutes just before the failure; since the post-03:00 AWR report is missing on instance 1, the 02:00 ~ 03:00 report is used instead
2. The hour before the problem, i.e. the 01:00 ~ 02:00 report

 

First, DB Time:

Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 14225 12-Oct-11 02:00:05 284 152.2
End Snap: 14226 12-Oct-11 02:59:59 288 150.1
Elapsed: 59.91 (mins)
DB Time: 809.98 (mins)

 

Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 14224 12-Oct-11 01:00:10 284 152.3
End Snap: 14225 12-Oct-11 02:00:05 284 152.2
Elapsed: 59.91 (mins)
DB Time: 257.62 (mins)

 

DB Time in AWR reflects, to a degree, how busy a database instance is. For the 01:00 to 02:00 period, AAS = 257.62 / 59.91 ≈ 4.3, while for the period in which the problem occurred, AAS = 809.98 / 59.91 ≈ 13.5, so the instance workload had clearly risen.
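The arithmetic behind those AAS figures is simply DB Time divided by elapsed wall-clock time; a minimal sketch using the snapshot values from the tables above:

```python
def average_active_sessions(db_time_min: float, elapsed_min: float) -> float:
    """Average Active Sessions = DB Time / Elapsed Time (same units)."""
    return db_time_min / elapsed_min

# 01:00-02:00 baseline snapshot vs. the 02:00-03:00 problem snapshot
baseline = average_active_sessions(257.62, 59.91)   # ~4.3
problem = average_active_sessions(809.98, 59.91)    # ~13.5

print(f"baseline AAS ~ {baseline:.1f}, problem AAS ~ {problem:.1f}")
```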

 

Next, the top wait events:

Event Waits Time(s) Avg wait (ms) % DB time Wait Class
enq: TX - row lock contention 10,072 16,079 1596 33.08 Application
log file sync 166,546 10,579 64 21.77 Commit
gc buffer busy acquire 55,828 8,017 144 16.50 Cluster
gc buffer busy release 21,203 5,904 278 12.15 Cluster
gc current block busy 29,196 3,362 115 6.92 Cluster

 

Event Waits Time(s) Avg wait (ms) % DB time Wait Class
log file sync 258,415 4,875 19 31.54 Commit
gc buffer busy acquire 44,471 3,349 75 21.66 Cluster
DB CPU 1,880 12.16
gc current block busy 42,722 1,748 41 11.31 Cluster
gc buffer busy release 10,730 1,127 105 7.29 Cluster

 

The top wait event in the problem period was enq: TX - row lock contention, which indicates heavy row lock contention. In addition, the time spent in the global cache waits gc buffer busy acquire/release and gc current block busy was several times higher than in the 01:00 - 02:00 period, which means inter-node contention for global cache resources intensified further during the problem period. In the 01:00 - 02:00 period, enq: TX - row lock contention does not even appear among the top 5 wait events, yet that period actually processed more transactions per second, 71.8 versus 46.3 during the problem period. So the heavy row lock contention was not caused by a rise in transaction volume but by the characteristics of the application itself.

 

The SQL statement chiefly responsible for both the frequent enq: TX - row lock contention waits and the global cache contention was 1usg75g3cx4n6:

 

Cluster Wait Time (s) Executions %Total Elapsed Time(s) %Clu %CPU %IO SQL Id SQL Module SQL Text
4,047.30 20,470 22.75 17,790.11 21.01 0.04 0.00 1usg75g3cx4n6 update

 

SQL "1usg75g3cx4n6" is an UPDATE statement; its text is not reproduced here.

 

During the problem period this statement ran 20,470 times on instance 1, averaging 0.87 s per execution, with cluster wait time accounting for 22.75% of its total elapsed time. During the 01:00 - 02:00 period it ran 32,845 times, averaging 0.02 s per execution, with cluster wait time accounting for 67.44% of its total elapsed time.
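The per-execution and cluster-wait figures follow directly from the AWR SQL-section numbers; a quick check of the problem-period values:

```python
# Per-execution elapsed time and cluster-wait share for SQL_ID 1usg75g3cx4n6,
# using the problem-period figures from the AWR SQL section above.
elapsed_s = 17_790.11   # total elapsed time (s)
executions = 20_470
cluster_s = 4_047.30    # cluster wait time (s)

per_exec = elapsed_s / executions        # ~0.87 s per execution
clu_pct = 100 * cluster_s / elapsed_s    # ~22.75 % of elapsed time

print(f"{per_exec:.2f} s/exec, cluster wait {clu_pct:.2f}%")
```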

This tells us that during the problem period, SQL statement 1usg75g3cx4n6 generated a huge load on instance 1 through the interplay of enq: TX - row lock contention (the primary factor) and gc global cache contention (the secondary factor). The row lock contention itself was mainly caused by concurrent DML updating the same rows.

 

Deeper Analysis

Besides the increased database load and the heavy row lock and global cache contention, the problem period also saw large-scale Dynamic Resource Management (DRM) activity.

DRM exists to minimize the CPU and messaging cost of global cache management in RAC. Its decisions are based on the global cache access history of the objects involved: the feature periodically identifies the instance that accesses a cached object most frequently and re-masters that object to it, to optimize RAC performance. DRM is carried out cooperatively by the LMON, LMD and LMS processes, with LMS responsible for transferring the actual GCS resource and lock information. In 11g its behavior is governed by the hidden parameter _gc_policy_time; by default it can be triggered as often as every 10 minutes. Setting _gc_policy_time=0 disables both object affinity and the read-mostly policies introduced in 11g.

 

The DRM statistics for instance 1 during the problem period, from the AWR report:

 

Name Total per Remaster Op Begin Snap End Snap
remaster ops 2 1.00
remastered objects 2 1.00
replayed locks received 215,517 107,758.50
replayed locks sent 339,349 169,674.50
resources cleaned 0 0.00
remaster time (s) 7.4 3.70
quiesce time (s) 4.3 2.13
freeze time (s) 0.1 0.04
cleanup time (s) 0.4 0.21
replay time (s) 1.0 0.48
fixwrite time (s) 0.9 0.47
sync time (s) 0.8 0.38
affinity objects 525 525

 

And the DRM statistics for the 01:00 - 02:00 period:

Name Total per Remaster Op Begin Snap End Snap
remaster ops 5 1.00
remastered objects 8 1.60
replayed locks received 110,902 22,180.40
replayed locks sent 68,890 13,778.00
resources cleaned 0 0.00
remaster time (s) 12.9 2.57
quiesce time (s) 1.4 0.28
freeze time (s) 0.2 0.03
cleanup time (s) 0.9 0.17
replay time (s) 0.5 0.10
fixwrite time (s) 1.9 0.38
sync time (s) 8.0 1.60
affinity objects 526 525

 

Notice that far more replayed locks were sent during the problem period. Replay here means transferring the local lock information to the new master instance ("REPLAY - Transfer of the local lock information to the new master."), and the process that sends these replayed locks is precisely LMS.

As mentioned above, nodes 1, 2 and 4 had all hit the ORA-600 [kjbmprlst:shadow] failure, while node 3 never had. Opening node 3's AWR report for the problem period (03:00 - 04:00) produced a major finding:

 

Elapsed Time (s) Executions Elapsed Time per Exec (s) %Total %CPU %IO SQL Id SQL Module SQL Text
3,590.52 0 17.30 34.71 63.63 59v4zh1ac3v2a DBMS_SCHEDULER DECLARE job BINARY_INTEGER := …
3,589.83 0 17.29 51.69 37.98 b6usrg82hwsa3 DBMS_SCHEDULER call dbms_stats.gather_databas…
2,173.79 1 2,173.79 10.47 62.16 28.75 b20w3p880j2gs DBMS_SCHEDULER /* SQL Analyze(1) */ select /*…

 

Node 3 was running "call dbms_stats.gather_database_stats_job_proc()", the automatic statistics gathering job of 11g, and the job had been running for a long time (still unfinished after an hour). The default automatic statistics job was introduced in 10g and enhanced in 11g; it gathers statistics for objects whose data has changed by more than 10% since the last collection. Because this core production database is a large OLTP application and a great deal of data had already been modified before 03:00, gather_database_stats_job_proc had to collect statistics on many large objects. That caused instance 3 to request, over and over, global cache resources that had previously been updated on instances 1 and 2; the frequent access influenced DRM's decisions and caused a large number of objects to be re-mastered to node 3.

 

Name Total per Remaster Op Begin Snap End Snap
remaster ops 14 1.00
remastered objects 25 1.79
replayed locks received 1,088,009 77,714.93
replayed locks sent 202,112 14,436.57
resources cleaned 0 0.00
remaster time (s) 30.8 2.20
quiesce time (s) 8.5 0.60
freeze time (s) 0.4 0.03
cleanup time (s) 3.3 0.24
replay time (s) 6.4 0.46
fixwrite time (s) 7.7 0.55
sync time (s) 4.5 0.32
affinity objects 4,294,967,289 4,294,967,289
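Incidentally, the "affinity objects" value of 4,294,967,289 in this table looks corrupt; my guess (an assumption on my part, not confirmed by Oracle) is that the counter went slightly negative and was printed as an unsigned 32-bit integer:

```python
# 4,294,967,289 reinterpreted as a signed 32-bit value is -7, which suggests
# the AWR "affinity objects" counter underflowed and was printed unsigned.
raw = 4_294_967_289

def to_signed32(u: int) -> int:
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    return u - 2**32 if u >= 2**31 else u

print(to_signed32(raw))  # -7
```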

 

Instance 3 received a huge number of replayed locks (replayed locks received) while sending very few, and the process actually sending those lock resources to instance 3 was the LMS process on instance 1.

Going through the alert.log and the relevant views, I found that every run of gather_database_stats_job_proc took place on instance 3 at around 3 a.m. So whenever a failure occurred, instance 3 only had to receive the replayed lock resources, and its LMS processes sent very few; that explains the odd fact noted above that instance 3 never hit the ORA-600 [kjbmprlst:shadow] failure.
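The claim that every auto-stats run landed on instance 3 can be verified from the scheduler run history. A hedged sketch (the view name and the ORA$AT_OS_OPT job-name prefix are as I recall the 11gR2 data dictionary; verify on your release before relying on them):

```sql
-- Which instance ran each automatic-statistics job, and when.
-- ORA$AT_OS_OPT_SY_% is the job-name pattern used by the 11g auto-stats task.
SELECT job_name, instance_id, actual_start_date, run_duration, status
  FROM dba_scheduler_job_run_details
 WHERE job_name LIKE 'ORA$AT_OS_OPT%'
 ORDER BY actual_start_date DESC;
```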

 

Now let's read part of the trace file of an LMS process that hit the ORA-600 [kjbmprlst:shadow] failure:

*** 2011-10-12 02:25:22.770
DRM(22812) win(8) lms 1 finished replaying gcs resources
 lms 1 finished fixing gcs write protocol
DRM(22813) win(1) lms 1 finished replaying gcs resources

*** 2011-10-12 03:05:29.717
 lms 1 finished fixing gcs write protocol
DRM(22813) win(2) lms 1 finished replaying gcs resources
 lms 1 finished fixing gcs write protocol

*** 2011-10-12 03:05:30.312
DRM(22813) win(3) lms 1 finished replaying gcs resources
 lms 1 finished fixing gcs write protocol
DRM(22813) win(4) lms 1 finished replaying gcs resources
 lms 1 finished fixing gcs write protocol
DRM(22813) win(5) lms 1 finished replaying gcs resources

*** 2011-10-12 03:05:31.280
 lms 1 finished fixing gcs write protocol
DRM(22813) win(6) lms 1 finished replaying gcs resources
 lms 1 finished fixing gcs write protocol
DRM(22813) win(7) lms 1 finished replaying gcs resources
 lms 1 finished fixing gcs write protocol

*** 2011-10-12 03:05:32.269
DRM(22813) win(8) lms 1 finished replaying gcs resources
 lms 1 finished fixing gcs write protocol
DRM(22814) win(1) lms 1 finished replaying gcs resources
 lms 1 finished fixing gcs write protocol
DRM(22814) win(2) lms 1 finished replaying gcs resources

*** 2011-10-12 03:05:33.479
 lms 1 finished fixing gcs write protocol
DRM(22814) win(3) lms 1 finished replaying gcs resources
 lms 1 finished fixing gcs write protocol
DRM(22814) win(4) lms 1 finished replaying gcs resources

*** 2011-10-12 03:05:34.333
 lms 1 finished fixing gcs write protocol
DRM(22814) win(5) lms 1 finished replaying gcs resources
 lms 1 finished fixing gcs write protocol
DRM(22814) win(6) lms 1 finished replaying gcs resources
 lms 1 finished fixing gcs write protocol

*** 2011-10-12 03:05:35.315
DRM(22814) win(7) lms 1 finished replaying gcs resources
Incident 1035732 created, dump file:
/maclean1_lms1_20075_i1035732.trc
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []

2011-10-12 03:05:36.811503 : kjbmprlst: internal error on replaying 0x618cd.371 from 3
2011-10-12 03:05:36.811541 : kjbmbassert [0x618cd.371]
2011-10-12 03:05:36.811559 : kjbmsassert(0x618cd.371)(2)
2011-10-12 03:05:36.811636 : kjbmsassert(0x618cd.371)(3)
2011-10-12 03:05:36.811667 : kjbmsassert(0x618cd.371)(4)
kjmpbmsg fatal error on 39
MSG [39:KJX_B_REPLAY] .Yq inc=64 len=136 sender=(3,3) seq=0
     fg=ip stat=KJUSERSTAT_DONE spnum=14 flg=x24
    flow ctrl: ver=60 flag=169 len=136 tkts=0 seq=0 wrp=0
               sndr=3 dest=1 rcvr=2
   FUSION MSG 0x2b178baa5120,39 from 3 spnum 14 ver[64,22813] ln 136 sq[3,8]
          REPLAY 1 [0x618cd.371, 574166.0] c[0x59f904e08,-17597] [0x1,xc69]
          grant 1 convert 0 role x0
          pi [0x0.0x0] flags 0x0 state 0x100
          disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
          msgRM# 22813 bkt# 99097 drmbkt# 99097
      pkey 574166.0 undo 0 stat 5 masters[32768, 3->32768] reminc 64 RM# 22813
 flg x6 type x0 afftime x9776dffe
 nreplays by lms 0 = 0 
 nreplays by lms 1 = 0 
     hv 9 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 34, dom 0]
     kjga st 0x4, step 0.36.0, cinc 64, rmno 22813, flags 0x20
     lb 98304, hb 114687, myb 99097, drmb 99097, apifrz 1
   FUSION MSG DUMP END
MSG [65521:PBATCH] inc=64 len=8376 sender=3 seq=279807301 fg=q
    flow ctrl: ver=1 flag=37 len=8376 tkts=0 seq=279807301 wrp=1
               sndr=3 dest=1 rcvr=2
MSG [39:KJX_B_REPLAY] .Y'z inc=64 len=136 sender=(3,3) seq=0
     fg=ip stat=KJUSERSTAT_DONE spnum=14 flg=x24
   FUSION MSG 0x2b178baa31c8,39 from 3 spnum 14 ver[64,22813] ln 136 sq[3,8]
          REPLAY 1 [0x527f5.27a, 574166.0] c[0xa2f7eddc0,2158] [0x1,xf33]
          grant 1 convert 0 role x0
          pi [0x0.0x0] flags 0x0 state 0x100
          disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
          msgRM# 22813 bkt# 99582 drmbkt# 99582
      pkey 574166.0 undo 0 stat 5 masters[32768, 3->32768] reminc 64 RM# 22813
 flg x6 type x0 afftime x9776dffe
 nreplays by lms 0 = 0 
 nreplays by lms 1 = 0 
     hv 33 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 34, dom 0]
     kjga st 0x4, step 0.36.0, cinc 64, rmno 22813, flags 0x20
     lb 98304, hb 114687, myb 99582, drmb 99582, apifrz 1
   FUSION MSG DUMP END

.........................

    ------------process 0xcf22a6d40--------------------
    proc version      : 0
    Local node        : 0
    pid               : 20075
    lkp_node          : 0
    svr_mode          : 0
    proc state        : KJP_NORMAL
    Last drm hb acked : 114687
    flags             : x80
    current lock op    : (nil)
    Total accesses    : 11763805
    Imm.  accesses    : 11763774
    Locks on ASTQ     : 0
    Locks Pending AST : 0
    Granted locks     : 0
    AST_Q: 
    PENDING_Q: 
    GRANTED_Q: 
    KJM HIST LMS1: 
      20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 
      20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 
      20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 
      20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 
      20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 
      20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 1:0 
      13:65521:0 20:39:0 20:39:0 20:39:0 
    START DEFER MSG QUEUE 0 ON LMS1 flg x87:

MSG [61:KJX_SYNC_RCVCHANNEL] inc=64 len=424 sender=(3,3) seq=98262147
     fg=iq stat=KJUSERSTAT_DONE spnum=11 flg=x1
     SYNC_RCVCHANNEL 0xca303f2f0 from 3 spnum 11 ver[64,196610]
     receiver 0 last sent seq 0.98262028
     receiver 1 last sent seq 1.545148943
     receiver 2 last sent seq 1.279807365
MSG [25:KJX_FTDONE] inc=64 len=64 sender=(3,3) seq=109225856
     fg=s stat=KJUSERSTAT_ATTACHFAILED spnum=11 flg=x0
MSG [61:KJX_SYNC_RCVCHANNEL] inc=64 len=424 sender=(3,3) seq=98262146
     fg=iq stat=KJUSERSTAT_DONE spnum=11 flg=x1
     SYNC_RCVCHANNEL 0xca405e758 from 3 spnum 11 ver[64,196610]
     receiver 0 last sent seq 0.98262028
     receiver 1 last sent seq 1.545148943
     receiver 2 last sent seq 1.279807365
MSG [25:KJX_FTDONE] inc=64 len=64 sender=(3,3) seq=6400108
     fg=s stat= spnum=11 flg=x20
    END DEFER MSG QUEUE 0 DUMP ON LMS1
     SEQUENCES: 
    [sidx 0]  0:0.0  1:711171046.0  2:279807300.1  3:43212975.0
     DEFER MSG QUEUE 1 ON LMS1 IS EMPTY
     SEQUENCES: 
    [sidx 1]  0:0.0  1:711171047.0  2:279807292.1  3:43212974.0
     DEFER MSG QUEUE 2 ON LMS1 IS EMPTY
     SEQUENCES: 
    [sidx 2]  0:0.0  1:711171047.0  2:279807300.1  3:43212975.0


You can see that right before the ORA-600 [kjbmprlst:shadow] was actually triggered, the LMS background process was indeed performing frequent DRM operations.

It is not hard to see what triggered the ORA-600 [kjbmprlst:shadow]: under high database load, the heavy row lock contention caused by the SQL statement on instance 1 coincided with the gather_database_stats_job_proc automatic statistics job on node 3, forcing the LMS process on instance 1 to perform DRM operations at a high rate, which under that heavy workload finally tripped the bug in the code that manages kjbm shadow locks.

Further analysis showed that the ORA-600 [kjbmprlst:shadow] failures on instances 2 and 4 followed the same pattern as on node 1. Node 3 was the sole exception in the whole RAC: because it was always the one running gather_database_stats_job_proc, it was always the beneficiary of DRM, while the other three nodes were the victims.

At this point you may well ask: isn't this then simply a failure triggered by the automatic statistics job? Is all this analysis really necessary? Just disable the job, or move it to a different time slot.

Answer: gather_database_stats_job_proc alone is not enough to trigger the failure. On the one hand, every node runs the same application, with no partitioning or schema separation, which aggravates inter-node contention for global cache resources. On the other hand, each buffer cache is large, holds a huge number of global cache resources, and the data is updated very frequently; once the gather_database_stats_job_proc job started, a flood of global resources streamed toward instance 3 and triggered the DRM decisions, putting the LMS processes on nodes 1, 2 and 4 under extreme pressure within a short time, until the bug was finally triggered. Merely disabling the automatic statistics job, or moving it to another time, is not enough to completely avoid the ORA-600 [kjbmprlst:shadow] failure, because the bug can still be triggered whenever the LMS background processes come under a sufficiently heavy workload.

 

After all this analysis my view on how to resolve the ORA-600 [kjbmprlst:shadow] failure had not changed at all; if anything it was firmer. Disabling DRM fundamentally changes the behavior of the LMS processes and eliminates any possibility of this ORA-600 [kjbmprlst:shadow] failure occurring.

But I still needed corroboration!

 

Supporting Material

 

Searching MOS, one note caught my eye: <Bug 12834027 - ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] with RAC read mostly locking [ID 12834027.8]>. I had already read this note during my earlier analysis of the problem; at the time it said the problem occurs on 11.2.0.1, offered no workaround, and gave patching to 11.2.0.2 + BP as the only remedy.

But I noticed the note had recently been revised, on 21-OCT-2011, as follows:

Bug 12834027  ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] with RAC read mostly locking
 This note gives a brief overview of bug 12834027.
 The content was last updated on: 21-OCT-2011
Affects:

    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected 	Versions BELOW 12.1
    Versions confirmed as being affected 	

        11.2.0.2 

    Platforms affected	Generic (all / most platforms affected)

Fixed:

    This issue is fixed in	

        12.1 (Future Release) 

Symptoms:

Related To:

    Internal Error May Occur (ORA-600)
    ORA-600 [kjbmprlst:shadow]
    ORA-600 [kjbrasr:pkey] 

    RAC (Real Application Clusters) / OPS
    _gc_read_mostly_locking 

Description

    ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] can occur with RAC
    read mostly locking.

    Workaround
      Disable DRM
     or
      Disable read-mostly object locking
      eg: Run with "_gc_read_mostly_locking"=FALSE

 

 

I came across this note again on October 28, and it had been updated on October 21; the update could hardly have been more timely. It now offers two workarounds: disabling DRM entirely, or disabling the _gc_read_mostly_locking feature mentioned earlier. The related bug note describes another Exadata case:

 

Hdr: 12834027 N/A RDBMS 11.2.0.2.0 RAC PRODID-5 PORTID-226 12730844
Abstract: ORA-600 [KJBMPRLST:SHADOW] & [KJBRASR:PKEY] IN A READ MOSTLY & SKIP LOCK ENV

*** 08/04/11 04:22 am ***

PROBLEM:
--------
3 nodes got crashed and restarted in exadata cluster.

DIAGNOSTIC ANALYSIS:
--------------------
8 node Exadata RAC instance. Below instances got terminated and it rejoined
cluster around 11:46 Today (01-Aug).

Instance2 (gmfcdwp2) got ORA-600 [kjbmprlst:shadow] at 11:46 AM and
instance got terminated by LMS1 process.
LMS1 (ospid: 14922): terminating the instance due to error 484

instance got terminated by LMS1 process
LMS1 (ospid: 28176): terminating the instance due to error 484

instance got terminated by LMS1 process

which was seen in instance 4 and 8.

 

This case is strikingly similar to ours: on Exadata, only particular instance numbers hit ORA-600 [kjbmprlst:shadow].

The other thing to assess is the overall performance impact of disabling DRM. I already covered this in <Important parameters For Oracle BRM Application in 11gR2>: Oracle officially recommends disabling the DRM feature for BRM applications (quite a mouthful), and since this system was running with the factory-default Exadata instance parameters, well...

 

Solutions

1. Disable the DRM feature, which completely prevents the LMS processes from hitting the bug through DRM operations. This is done by setting the hidden parameter _gc_policy_time to 0.

Impact assessment:

DRM can improve database performance to some extent, but for an application with no partitioning or schema separation it brings no benefit. Oracle officially recommends disabling DRM for BRM applications; see MOS note <Questions About BRM Support For Oracle DB 11GR2 And Exadata [ID 1319678.1]>:

"_gc_policy_time=0 to disable DRM." Changing the _gc_policy_time parameter requires an instance restart.

 

2. Partially disable DRM by setting _gc_read_mostly_locking=FALSE. This avoids the mass transfer of replayed locks and greatly reduces the chance of LMS hitting the bug. Changing this parameter likewise requires an instance restart.
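For reference, a sketch of how the two workarounds might be applied (syntax only; hidden "_" parameters should be changed under Oracle Support guidance, and each instance needs a restart afterwards):

```sql
-- Workaround 1: disable DRM entirely (also disables read-mostly policies).
ALTER SYSTEM SET "_gc_policy_time"=0 SCOPE=SPFILE SID='*';

-- Workaround 2 (alternative): disable only read-mostly object locking.
ALTER SYSTEM SET "_gc_read_mostly_locking"=FALSE SCOPE=SPFILE SID='*';
```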

 

Case Summary

I have been writing for quite a while and am getting tired, and perhaps rambling. I may write another post on the Read-mostly locking DRM feature newly added in 11g; I meant to cover it here, but the article has already grown long enough.

Sometimes solving a problem only requires finding the right fix, whether from Metalink, Google or a friend. We may not know why the fix actually works, but one way or another it does. That, however, is not the same as truly understanding the failure or the problem. I spent roughly 3-4 hours analyzing this one, and although the conclusion is almost the same as in July, I gained a great deal from the process of the analysis itself, something that only doing the analysis yourself can produce.

Enjoy the fun that the analysis process brings, everyone!

Comments

  1. 刘青桂 says

    We also ran into a DRM-induced ORA-600 on our 10g production database.

  2. maclean says

    kjbdrmcvtq lmon drm quiesce: ping completion

    KJBDRMCVTQ LMON DRM QUIESCE

    RELATED BUGS:
    ————-
    bug 12371302 GCS DRM FREEZE IN ENTER SERVER MODE WAIT EVENT

    *** 2011-11-03 03:02:45.727
    DRM(23931) win(1) lms 0 finished replaying gcs resources
    lms 0 finished fixing gcs write protocol
    DRM(23931) win(2) lms 0 finished replaying gcs resources
    lms 0 finished fixing gcs write protocol
    kjdrvalidRMno: msg type 34 from inst 3 dropped
    FUSION MSG 0x2ab77691c338,34 from 3 spnum 13 ver[84,23930] ln 144 sq[3,8]
    CLOSE [0xf8f09.c7, 453155.0] shadow [0xc57508f10,200] seq 0x2 act 1
    client [0x7efa631d8,3820] reqid x1b46 ordered 0
    grant 1 convert 0 role x0
    pi [0x0.0x0] flags 0x0 state 0x100
    disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
    msgRM# 23930 bkt# 45537 drmbkt# 45537
    pkey 453155.0 undo 0 stat 5 masters[32768, 32768->3] reminc 0 RM# 0
    flg x6 type x0 afftime x1e062edc
    nreplays by lms 0 = 1031
    nreplays by lms 1 = 1001
    hv 86 [stat 0x0, 4->4, wm 32768, RMno 0, reminc 80, dom 0]
    kjga st 0x4, step 0.34.0, cinc 84, rmno 23930, flags 0x20
    lb 32768, hb 49151, myb 45537, drmb 45537, apifrz 1
    FUSION MSG DUMP END
    kjmvalidate: drm drop a message RMno 23930 from 3 type 34
    mver 84 myver 84 seq 0.0 flag xa9
    DRM(23931) win(3) lms 0 finished replaying gcs resources
    lms 0 finished fixing gcs write protocol
    DRM(23931) win(4) lms 0 finished replaying gcs resources

    *** 2011-11-03 03:02:46.291
    lms 0 finished fixing gcs write protocol
    DRM(23931) win(5) lms 0 finished replaying gcs resources
    lms 0 finished fixing gcs write protocol
    DRM(23931) win(6) lms 0 finished replaying gcs resources
    lms 0 finished fixing gcs write protocol
    DRM(23931) win(7) lms 0 finished replaying gcs resources
    lms 0 finished fixing gcs write protocol
    DRM(23931) win(8) lms 0 finished replaying gcs resources
    lms 0 finished fixing gcs write protocol

    *** 2011-11-03 03:02:47.435
    DRM(23932) win(1) lms 0 finished replaying gcs resources
    lms 0 finished fixing gcs write protocol
    DRM(23932) win(2) lms 0 finished replaying gcs resources
    lms 0 finished fixing gcs write protocol

    *** 2011-11-03 03:02:48.380
    DRM(23932) win(3) lms 0 finished replaying gcs resources
    Incident 637552 created, dump file: /u01/app/oracle/diag/rdbms/aaadb/aaadb4/incident/incdir_637552/aaadb4_lms0_13488_i637552.trc
    ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []

    2011-11-03 03:02:49.503232 : kjbmprlst: internal error on replaying 0xe3.16e from 3
    2011-11-03 03:02:49.503257 : kjbmbassert [0xe3.16e]
    2011-11-03 03:02:49.503272 : kjbmsassert(0xe3.16e)(1)
    2011-11-03 03:02:49.503355 : kjbmsassert(0xe3.16e)(2)
    2011-11-03 03:02:49.503389 : kjbmsassert(0xe3.16e)(3)
    kjmpbmsg fatal error on 39
    MSG [39:KJX_B_REPLAY] {]n inc=84 len=136 sender=(3,3) seq=0
    fg=ip stat=KJUSERSTAT_DONE spnum=13 flg=x24
    flow ctrl: ver=0 flag=169 len=136 tkts=0 seq=0 wrp=0
    sndr=3 dest=4 rcvr=1
    FUSION MSG 0x2ab774dc4788,39 from 3 spnum 13 ver[84,23931] ln 136 sq[3,8]
    REPLAY 1 [0xe3.16e, 576949.0] c[0x8cfc7a978,2307] [0x1,xec1]
    grant 1 convert 0 role x0
    pi [0x0.0x0] flags 0x0 state 0x100
    disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
    msgRM# 23931 bkt# 40988 drmbkt# 40988
    pkey 576949.0 undo 0 stat 5 masters[32768, 3->32768] reminc 84 RM# 23931
    flg x6 type x0 afftime x1e1b3f20
    nreplays by lms 0 = 0
    nreplays by lms 1 = 0
    hv 110 [stat 0x0, 4->4, wm 32768, RMno 0, reminc 80, dom 0]
    kjga st 0x4, step 0.36.0, cinc 84, rmno 23931, flags 0x20
    lb 32768, hb 49151, myb 40988, drmb 40988, apifrz 1
    FUSION MSG DUMP END
    MSG [65521:PBATCH] inc=84 len=8376 sender=3 seq=396496720 fg=q
    flow ctrl: ver=1 flag=37 len=8376 tkts=0 seq=396496720 wrp=0
    sndr=3 dest=4 rcvr=1
    MSG [39:KJX_B_REPLAY] {]W inc=84 len=136 sender=(3,3) seq=0
    fg=ip stat=KJUSERSTAT_DONE spnum=13 flg=x24
    FUSION MSG 0x2ab774dc3798,39 from 3 spnum 13 ver[84,23931] ln 136 sq[3,8]
    REPLAY 1 [0xa2ed.257, 453155.0] c[0x68fdab748,4085] [0x1,x19e5]
    grant 1 convert 0 role x0
    pi [0x0.0x0] flags 0x0 state 0x100
    disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
    msgRM# 23931 bkt# 37981 drmbkt# 37981
    pkey 453155.0 undo 0 stat 5 masters[32768, 3->32768] reminc 84 RM# 23931
    flg x6 type x0 afftime x1e1b3ef2
    nreplays by lms 0 = 1156
    nreplays by lms 1 = 1117
    hv 121 [stat 0x0, 4->4, wm 32768, RMno 0, reminc 82, dom 0]
    kjga st 0x4, step 0.36.0, cinc 84, rmno 23931, flags 0x20
    lb 32768, hb 49151, myb 37981, drmb 37981, apifrz 1
    FUSION MSG DUMP END

    Dumping Work Area Table (level=1)
    =====================================

    Global SGA Info
    —————

    global target: 23808 MB
    auto target: 20165 MB
    max pga: 2048 MB
    pga limit: 4096 MB
    pga limit known: 0
    pga limit errors: 0

    pga inuse: 1402 MB
    pga alloc: 1872 MB
    pga freeable: 312 MB
    pga freed: 71835 MB
    pga to free: 0 %
    broker request: 0

    pga auto: 0 MB
    pga manual: 0 MB

    pga alloc (max): 5139 MB
    pga auto (max): 2049 MB
    pga manual (max): 0 MB

    # workareas : 0
    # workareas(max): 47

    COMPONENT NAME : ASM Buffer Cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
    Mapping for Component id to Component name:
    1: shared pool
    2: large pool
    3: java pool
    4: streams pool
    5: SGA Target
    7: DEFAULT buffer cache
    8: KEEP buffer cache
    9: RECYCLE buffer cache
    10: DEFAULT 2K buffer cache
    11: DEFAULT 4K buffer cache
    12: DEFAULT 8K buffer cache
    13: DEFAULT 16K buffer cache
    14: DEFAULT 32K buffer cache
    15: Shared IO Pool
    16: PGA Target
    17: ASM Buffer Cache
    Dumping memory transfer history; start=0
    Dumping resize operations history; start=22
    1: 0, 0, 144, 0->22, fin=22, st=2
    2: 0, 0, 146, 0->1, fin=1, st=2
    3: 0, 0, 148, 0->1, fin=1, st=2
    4: 0, 0, 150, 0->1, fin=1, st=2
    5: 0, 0, 575, 0->121, fin=121, st=2
    7: 0, 0, 639, 0->95, fin=95, st=2
    8: 0, 0, 646, 0->0, fin=0, st=2
    9: 0, 0, 647, 0->0, fin=0, st=2
    10: 0, 0, 641, 0->0, fin=0, st=2
    11: 0, 0, 642, 0->0, fin=0, st=2
    12: 0, 0, 643, 0->0, fin=0, st=2
    13: 0, 0, 644, 0->0, fin=0, st=2
    14: 0, 0, 645, 0->0, fin=0, st=2
    15: 0, 0, 649, 0->0, fin=0, st=2
    16: 0, 0, 1780, 0->81, fin=81, st=2
    17: 0, 0, 639, 0->0, fin=0, st=2
    7: 1, 0, 639, 95->95, fin=95, st=2
    16: 3, 1, 1780, 81->85, fin=85, st=2
    7: 4, 2, 639, 95->91, fin=95, st=3
    16: 3, 1, 1780, 85->89, fin=89, st=2
    16: 3, 1, 1780, 89->93, fin=93, st=2
    7: 4, 2, 639, 95->91, fin=95, st=3
    Dumping last 2 granules of Transferred Memory
    Granule Header dump for 0xb4fffe000, hdrsz=72, gran size=268435456

    Trace Bucket Dump End: KSI
    —– Auto SGA Transfer History —–
    GRANULE SIZE is 268435456
    COMPONENT NAME : shared pool, curnum=22, usrsz=0, tgt=22, min=2, st=0, flg=80
    Number of granules in inuse list (listid 2) is 22
    COMPONENT NAME : large pool, curnum=1, usrsz=0, tgt=1, min=0, st=0, flg=80
    Number of granules in inuse list (listid 2) is 1
    COMPONENT NAME : java pool, curnum=1, usrsz=0, tgt=1, min=1, st=0, flg=80
    Number of granules in inuse list (listid 2) is 1
    COMPONENT NAME : streams pool, curnum=1, usrsz=0, tgt=1, min=0, st=0, flg=80
    Number of granules in inuse list (listid 2) is 1
    COMPONENT NAME : SGA Target, curnum=121, usrsz=0, tgt=117, min=0, st=0, flg=0
    COMPONENT NAME : System Memory, curnum=81, usrsz=0, tgt=85, min=0, st=0, flg=0
    Number of granules in inactive list (listid 0) is 81
    COMPONENT NAME : DEFAULT buffer cache, curnum=95, usrsz=0, tgt=91, min=1, st=0, flg=80
    Number of granules in inuse list (listid 2) is 95
    COMPONENT NAME : KEEP buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
    COMPONENT NAME : RECYCLE buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
    COMPONENT NAME : DEFAULT 2K buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
    COMPONENT NAME : DEFAULT 4K buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
    COMPONENT NAME : DEFAULT 8K buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
    COMPONENT NAME : DEFAULT 16K buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
    COMPONENT NAME : DEFAULT 32K buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
    COMPONENT NAME : Shared IO Pool, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
    COMPONENT NAME : PGA Target, curnum=93, usrsz=0, tgt=93, min=0, st=0, flg=0

    GES destination context:
    GES remote instance per receiver context:
    Rcvr 1 state 0x0
    rbufs tot 1100 in_ksxp 1100 ubufs 0 free 10024
    msgs rdyq 0 watq 209 nsq 0 routq 1107
    rmpool: alloc 1775 avail 10024
    ibatch: 2[14340411] 3[1331955] 4[230018] 5[77584] 6[41863] 7[29418] 8[22906] 9[15597] 10[12489] 11[10986]
    ibatch: 12[9804] 13[8399] 14[7605] 15[7279] 16[14835] 17[5326] 18[1268] 19[534] 20[399]
    ibatch: 21[333] sum 16169009
    rcvs 16173636 tot msgs 35913409 max sz 103 avg sz 2
    pbatch: 2[13145648] 3[976483] 4[328637] 5[191756] 6[140659] 7[110263] 8[150497] 9[74627] 10[72052] 11[64821]
    pbatch: 12[63710] 13[58268] 14[63716] 15[50112] 16[415654] 17[9120] 18[2760] 19[1854] 20[1535]
    pbatch: 21[1369] sum 15923541
    rcvs 407779634 tot msgs 445570264 max sz 103 avg sz 2
    watq : 32:34 36:10 44:3 73:9 74:54 100:13 rsv:(tot 86 ib 4 pb 82)
    watq : embed: 33:4 34:9 35:4 36:14 37:3 44:21 86:115 87:2
    watq : embed: 88:1 100:2
    kjctseventdump-begin tail 29 heads 29 @ -1847136902 29 @ -1847136902 29 @ -1847136902 63 @ 0
    #0 id 364 exid 66573 dests 0000
    #1 id 364 exid 66573 dests 0000

    kjctseventdump-end tail 29 heads 29 @ -1847136902 29 @ -1847136902 29 @ -1847136902 63 @ 0
    End of KJC Communication Dump
    START DEFER MSG QUEUE 0 ON LMS0 flg x87:
    MSG [61:KJX_SYNC_RCVCHANNEL] inc=84 len=424 sender=(3,3) seq=6285124
    fg=iq stat=KJUSERSTAT_DONE spnum=11 flg=x1
    SYNC_RCVCHANNEL 0xcf3c0f5c0 from 3 spnum 11 ver[84,2]
    receiver 0 last sent seq 0.6285080
    receiver 1 last sent seq 0.396496722
    receiver 2 last sent seq 0.404953232
    MSG [25:KJX_FTDONE] inc=84 len=64 sender=(3,3) seq=233131925
    fg=s stat= spnum=11 flg=x20
    MSG [61:KJX_SYNC_RCVCHANNEL] inc=84 len=424 sender=(2,2) seq=5701455
    fg=iq stat=KJUSERSTAT_DONE spnum=11 flg=x1
    SYNC_RCVCHANNEL 0xcf3d171c0 from 2 spnum 11 ver[84,1280262987]
    receiver 0 last sent seq 0.5694348
    receiver 1 last sent seq 0.121316137
    receiver 2 last sent seq 0.133410556
    MSG [25:KJX_FTDONE] inc=84 len=64 sender=(2,2) seq=134378202
    fg=s stat=KJUSERSTAT_ATTACHFAILED spnum=11 flg=x0
    MSG [61:KJX_SYNC_RCVCHANNEL] inc=84 len=424 sender=(3,3) seq=6285125
    fg=iq stat=KJUSERSTAT_DONE spnum=11 flg=x1
    SYNC_RCVCHANNEL 0xcf4915530 from 3 spnum 11 ver[84,2]
    receiver 0 last sent seq 0.6285080
    receiver 1 last sent seq 0.396496744
    receiver 2 last sent seq 0.404953256
    MSG [25:KJX_FTDONE] inc=84 len=64 sender=(3,3) seq=233131925
    fg=s stat=KJUSERSTAT_ATTACHFAILED spnum=11 flg=x0
    MSG [61:KJX_SYNC_RCVCHANNEL] inc=84 len=424 sender=(1,1) seq=9049842
    fg=iq stat=KJUSERSTAT_DONE spnum=11 flg=x1
    SYNC_RCVCHANNEL 0xcf427bad0 from 1 spnum 11 ver[84,1280262987]
    receiver 0 last sent seq 0.9049829
    receiver 1 last sent seq 0.377205652
    receiver 2 last sent seq 0.377998312
    MSG [25:KJX_FTDONE] inc=84 len=64 sender=(1,1) seq=60389025
    fg=s stat=KJUSERSTAT_ATTACHFAILED spnum=11 flg=x0
    END DEFER MSG QUEUE 0 DUMP ON LMS0
    SEQUENCES:
    [sidx 0] 0:377205631.0 1:121316104.0 2:396496713.0 3:0.0
    DEFER MSG QUEUE 1 ON LMS0 IS EMPTY
    SEQUENCES:
    [sidx 1] 0:377205635.0 1:121316122.0 2:396496715.0 3:0.0
    DEFER MSG QUEUE 2 ON LMS0 IS EMPTY
    SEQUENCES:
    [sidx 2] 0:377205635.0 1:121316122.0 2:396496715.0 3:0.0
    error 484 detected in background process
    ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []

    *** 2011-11-03 03:02:51.630
    LMS0 (ospid: 13488): terminating the instance due to error 484
    ksuitm: waiting up to [5] seconds before killing DIAG(13471)

    *** 2011-11-03 03:03:06.283
    Instance termination failed to kill one or more processes
    ksuitm_check: OS PID=18888 is still alive
    ksuitm_check: OS PID=18755 is still alive

  3. mustapha says

    Problems with the DRM feature are common by now; see 老熊's blog for an identical case. It definitely has to be disabled. In your case you should also split the workload by service to reduce cache fusion, and that UPDATE statement should be optimized at the business level: stop updating one table so frequently, and it probably lacks batch commits and has sessions updating the same row concurrently. Having run performance tests on Exadata myself, I don't think Exadata suits OLTP very well; for OLTP, its compression and storage technology can't show their strengths, and you would do better with two high-performance multi-core, multi-threaded midrange servers, something in the class of a 780. I built an OLTP test model with heavy repeated updates, inserts and contention and ran it on Exadata; the performance really was nothing to admire.

    • DRM is much improved in 11g, and it works well for OLTP applications that read far more than they write, though it admittedly still has quite a few bugs. My own view remains that whether to disable it should depend on the characteristics of each application.

      Thanks for the suggestions all the same!

  4. Danny says

    Thank you for the complete and detailed write-up.

