This is a four-node 11.2.0.1 RAC system running on Exadata. Since the beginning of this year it has repeatedly suffered instance crashes caused by the LMS background process raising the internal error ORA-600[kjbmprlst:shadow]. The 4-node RAC does provide high availability, but the frequent unexpected instance terminations still caused application transactions to fail.
I had in fact analyzed this problem back in July; see the earlier post <Oracle RAC内部错误:ORA-00600[kjbmprlst:shadow]一例>. The internal function kjbmprlst:shadow manages kjbm shadow lock information (/libserver10.a/kjbm.o), and there is a code defect in which a closed lock is not messaged to the master node in time. I had already suggested then that disabling DRM might avoid this ORA-600[kjbmprlst:shadow], but none of the existing ORA-600[kjbmprlst:shadow] cases on MOS mentioned disabling DRM as a workaround, and the SR we filed did not get a positive confirmation of that view from Oracle GCS either. Besides, running a core production environment on 11.2.0.1, a release with quite a few bugs, is clearly not ideal, and upgrading to an 11.2.0.2 bundle patch that already fixes Bug 10121589 looked like a reasonable plan, so I did not pursue the issue further at the time.
After that I spent a while preparing the upgrade of the Exadata system to 11.2.0.2. In the end, however, the customer rejected the upgrade plan: the Exadata upgrade procedure was judged too complex, many of its steps cannot be rolled back, and there were still few domestic reference cases for such an upgrade. That is how the failure analysis below came about.
Failure Characteristics
As mentioned above, this is a 4-node 11.2.0.1 RAC on Exadata, and the application is Oracle's own Billing and Revenue Management (BRM). Since the beginning of the year the ORA-600[kjbmprlst:shadow] internal error has occurred many times. It is raised by the critical RAC background process LMS (global cache service process), and the result is an unexpected instance termination.
In October the ORA-600[kjbmprlst:shadow] error became very frequent (roughly twice a week), and the failures showed two further characteristics:
- They were concentrated in the 20 minutes following 03:00 in the early morning
- Nodes 1, 2 and 4 have all failed at one time or another, but node 3 never has (more on this later)
Initial Analysis
From the application team I obtained the hourly AWR reports for the day of one such failure. That failure occurred on node 1 at 03:05, so I started by examining the performance load on instance 1 at that time.
One way to work with AWR reports is to compare several time windows to see how the performance changed, provided of course that you know when the problem occurred. Here we compare two windows:
1. The minutes immediately before the failure. The post-03:00 AWR report for instance 1 is missing (the instance crashed), so the 02:00 ~ 03:00 report is used instead.
2. The hour before that, i.e. the 01:00 ~ 02:00 report.
First, the DB Time:
| | Snap Id | Snap Time | Sessions | Cursors/Session |
|---|---|---|---|---|
| Begin Snap: | 14225 | 12-Oct-11 02:00:05 | 284 | 152.2 |
| End Snap: | 14226 | 12-Oct-11 02:59:59 | 288 | 150.1 |
| Elapsed: | 59.91 (mins) | | | |
| DB Time: | 809.98 (mins) | | | |
| | Snap Id | Snap Time | Sessions | Cursors/Session |
|---|---|---|---|---|
| Begin Snap: | 14224 | 12-Oct-11 01:00:10 | 284 | 152.3 |
| End Snap: | 14225 | 12-Oct-11 02:00:05 | 284 | 152.2 |
| Elapsed: | 59.91 (mins) | | | |
| DB Time: | 257.62 (mins) | | | |
DB Time in AWR is a rough measure of how busy the instance is. For the 01:00 - 02:00 window, AAS = 257 / 60 = 4.28; for the window in which the problem occurred, AAS = 809 / 60 ≈ 13.5. The workload on the instance clearly rose sharply.
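As a side note, when the AWR HTML reports are not at hand, the same DB Time figures can be pulled from the AWR base tables. The query below is only a sketch: the snap_id range and instance number are taken from the report headers above, and DBA_HIST_SYS_TIME_MODEL stores cumulative microseconds, so consecutive snapshots have to be subtracted.

```sql
-- Sketch only: DB Time per AWR snapshot interval for instance 1.
-- AAS for an interval is then db_time_min divided by the elapsed minutes
-- of that interval (roughly 60 here, as the snapshots are hourly).
SELECT s.snap_id,
       s.end_interval_time,
       ROUND((tm.value - LAG(tm.value) OVER (ORDER BY s.snap_id)) / 1e6 / 60, 2)
         AS db_time_min
  FROM dba_hist_snapshot       s
  JOIN dba_hist_sys_time_model tm
    ON  tm.snap_id         = s.snap_id
   AND  tm.dbid            = s.dbid
   AND  tm.instance_number = s.instance_number
 WHERE tm.stat_name      = 'DB time'
   AND s.instance_number = 1
   AND s.snap_id BETWEEN 14223 AND 14226  -- 14223 only provides the baseline for the first delta
 ORDER BY s.snap_id;
```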
Next, the top wait events (first the 02:00 - 03:00 problem window, then the 01:00 - 02:00 window):
Event | Waits | Time(s) | Avg wait (ms) | % DB time | Wait Class |
---|---|---|---|---|---|
enq: TX – row lock contention | 10,072 | 16,079 | 1596 | 33.08 | Application |
log file sync | 166,546 | 10,579 | 64 | 21.77 | Commit |
gc buffer busy acquire | 55,828 | 8,017 | 144 | 16.50 | Cluster |
gc buffer busy release | 21,203 | 5,904 | 278 | 12.15 | Cluster |
gc current block busy | 29,196 | 3,362 | 115 | 6.92 | Cluster |
Event | Waits | Time(s) | Avg wait (ms) | % DB time | Wait Class |
---|---|---|---|---|---|
log file sync | 258,415 | 4,875 | 19 | 31.54 | Commit |
gc buffer busy acquire | 44,471 | 3,349 | 75 | 21.66 | Cluster |
DB CPU | | 1,880 | | 12.16 | |
gc current block busy | 42,722 | 1,748 | 41 | 11.31 | Cluster |
gc buffer busy release | 10,730 | 1,127 | 105 | 7.29 | Cluster |
The top wait event in the problem window is enq: TX – row lock contention, which indicates heavy row-lock contention. In addition, the time spent in the gc buffer busy acquire/release and gc current block busy global cache waits is several times higher than in the preceding 01:00 - 02:00 window, so inter-node contention for global cache resources was clearly aggravated during the problem window. In the 01:00 - 02:00 window enq: TX – row lock contention does not even appear among the top 5 wait events, yet the transaction rate was actually higher: 71.8 transactions per second versus 46.3 per second in the problem window. In other words, the heavy row-lock contention was caused not by a larger transaction volume but by the behaviour of the application.
The SQL statement responsible for most of the enq: TX – row lock contention waits and for most of the global cache contention is the same one, 1usg75g3cx4n6:
Cluster Wait Time (s) | Executions | %Total | Elapsed Time(s) | %Clu | %CPU | %IO | SQL Id | SQL Module | SQL Text |
---|---|---|---|---|---|---|---|---|---|
4,047.30 | 20,470 | 22.75 | 17,790.11 | 21.01 | 0.04 | 0.00 | 1usg75g3cx4n6 | update |
SQL "1usg75g3cx4n6" is an UPDATE statement; its full text is not reproduced here.
During the problem window this statement was executed 20,470 times on instance 1, at an average of 0.87s per execution, with cluster wait time accounting for 22.75% of its total elapsed time. In the 01:00 - 02:00 window it was executed 32,845 times, at an average of 0.02s per execution, with cluster wait time accounting for 67.44% of its total elapsed time.
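The per-snapshot figures for this statement can also be read straight from the AWR base table DBA_HIST_SQLSTAT rather than from the HTML reports; the following is just a sketch using the snap_ids shown above (the *_DELTA columns are per-interval values, with times in microseconds).

```sql
-- Sketch only: executions, average elapsed time per execution and the share of
-- cluster wait time for sql_id '1usg75g3cx4n6' in the two snapshot intervals
-- (snap 14225 covers 01:00-02:00, snap 14226 covers 02:00-03:00 on instance 1).
SELECT snap_id,
       instance_number,
       executions_delta                                                 AS execs,
       ROUND(elapsed_time_delta / 1e6 / NULLIF(executions_delta, 0), 3) AS avg_elapsed_sec,
       ROUND(100 * clwait_delta / NULLIF(elapsed_time_delta, 0), 2)     AS cluster_wait_pct
  FROM dba_hist_sqlstat
 WHERE sql_id = '1usg75g3cx4n6'
   AND instance_number = 1
   AND snap_id IN (14225, 14226)
 ORDER BY snap_id;
```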
The above shows that during the problem window SQL statement 1usg75g3cx4n6 placed an enormous load on instance 1 through the combined effect of enq: TX – row lock contention (the primary factor) and gc global cache contention (the secondary factor). The row-lock contention itself comes from concurrent DML updating the same rows.
Deeper Analysis
Besides the increased database load and the heavy row-lock and global cache contention, the problem window also saw large-scale Dynamic Resource Management (DRM) activity.
DRM exists to minimize CPU and messaging overhead in RAC. Its remastering decisions are based on the global cache operation history of the objects involved: the feature periodically identifies instances that access a cached object particularly heavily and re-masters that object to the instance in question to optimize RAC performance. DRM is carried out jointly by the LMON, LMD and LMS processes, with LMS shipping the actual GCS resource and lock information. In 11g it is governed by the hidden parameter _gc_policy_time and can by default be triggered as often as every 10 minutes. Setting _gc_policy_time=0 disables both object affinity and the read-mostly policies introduced in 11g.
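For reference, the current values of the two DRM-related hidden parameters can be checked with the usual X$ query (run as SYS; this is a generic sketch, nothing specific to this system):

```sql
-- Sketch: hidden parameters are not visible in V$PARAMETER, so read them
-- from the underlying X$ tables (connect as SYS).
SELECT i.ksppinm  AS parameter_name,
       v.ksppstvl AS current_value,
       v.ksppstdf AS is_default
  FROM x$ksppi  i,
       x$ksppcv v
 WHERE i.indx = v.indx
   AND i.ksppinm IN ('_gc_policy_time', '_gc_read_mostly_locking');
```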
DRM statistics for instance 1 during the problem window, from the AWR report:
Name | Total | per Remaster Op | Begin Snap | End Snap |
---|---|---|---|---|
remaster ops | 2 | 1.00 | ||
remastered objects | 2 | 1.00 | ||
replayed locks received | 215,517 | 107,758.50 | ||
replayed locks sent | 339,349 | 169,674.50 | ||
resources cleaned | 0 | 0.00 | ||
remaster time (s) | 7.4 | 3.70 | ||
quiesce time (s) | 4.3 | 2.13 | ||
freeze time (s) | 0.1 | 0.04 | ||
cleanup time (s) | 0.4 | 0.21 | ||
replay time (s) | 1.0 | 0.48 | ||
fixwrite time (s) | 0.9 | 0.47 | ||
sync time (s) | 0.8 | 0.38 | ||
affinity objects | | | 525 | 525
And the DRM statistics for the 01:00 - 02:00 window:
Name | Total | per Remaster Op | Begin Snap | End Snap |
---|---|---|---|---|
remaster ops | 5 | 1.00 | ||
remastered objects | 8 | 1.60 | ||
replayed locks received | 110,902 | 22,180.40 | ||
replayed locks sent | 68,890 | 13,778.00 | ||
resources cleaned | 0 | 0.00 | ||
remaster time (s) | 12.9 | 2.57 | ||
quiesce time (s) | 1.4 | 0.28 | ||
freeze time (s) | 0.2 | 0.03 | ||
cleanup time (s) | 0.9 | 0.17 | ||
replay time (s) | 0.5 | 0.10 | ||
fixwrite time (s) | 1.9 | 0.38 | ||
sync time (s) | 8.0 | 1.60 | ||
affinity objects | | | 526 | 525
During the problem window DRM sent far more replayed locks. "Replay" here means transferring the local lock information to the new master instance (REPLAY – Transfer of the local lock information to the new master), and the process that sends these replayed locks is precisely LMS.
As noted above, nodes 1, 2 and 4 have all hit the ORA-600[kjbmprlst:shadow] failure while node 3 never has. Opening node 3's AWR report for the problem window (03:00 - 04:00) immediately reveals something important:
Elapsed Time (s) | Executions | Elapsed Time per Exec (s) | %Total | %CPU | %IO | SQL Id | SQL Module | SQL Text |
---|---|---|---|---|---|---|---|---|
3,590.52 | 0 | | 17.30 | 34.71 | 63.63 | 59v4zh1ac3v2a | DBMS_SCHEDULER | DECLARE job BINARY_INTEGER := … |
3,589.83 | 0 | | 17.29 | 51.69 | 37.98 | b6usrg82hwsa3 | DBMS_SCHEDULER | call dbms_stats.gather_databas… |
2,173.79 | 1 | 2,173.79 | 10.47 | 62.16 | 28.75 | b20w3p880j2gs | DBMS_SCHEDULER | /* SQL Analyze(1) */ select /*… |
Node 3, it turns out, was running "call dbms_stats.gather_database_stats_job_proc ( )", the 11g automatic statistics collection job, and the job had been running for a long time (it had not finished within the hour). The automatic statistics collection job was introduced as a default in 10g and enhanced in 11g; it gathers statistics for objects whose data has changed by more than 10% since the last collection. Because this core production database is a large OLTP system and a great deal of data had already been modified before 03:00, gather_database_stats_job_proc had to gather statistics on many large objects. Doing so forced instance 3 to request, over and over, global cache resources that had just been updated on instances 1 and 2; this access pattern influenced the DRM decisions and caused a large number of objects to be re-mastered to node 3.
Name | Total | per Remaster Op | Begin Snap | End Snap |
---|---|---|---|---|
remaster ops | 14 | 1.00 | ||
remastered objects | 25 | 1.79 | ||
replayed locks received | 1,088,009 | 77,714.93 | ||
replayed locks sent | 202,112 | 14,436.57 | ||
resources cleaned | 0 | 0.00 | ||
remaster time (s) | 30.8 | 2.20 | ||
quiesce time (s) | 8.5 | 0.60 | ||
freeze time (s) | 0.4 | 0.03 | ||
cleanup time (s) | 3.3 | 0.24 | ||
replay time (s) | 6.4 | 0.46 | ||
fixwrite time (s) | 7.7 | 0.55 | ||
sync time (s) | 4.5 | 0.32 | ||
affinity objects | | | 4,294,967,289 | 4,294,967,289
You can see that instance 3 received a huge number of replayed locks (replayed locks received) while sending very few, and the processes that actually sent those lock resources to instance 3 were the LMS processes on instance 1.
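The remastering can also be observed from inside the database rather than only from AWR: V$GCSPFMASTER_INFO shows which instance currently masters each re-mastered object, and V$DYNAMIC_REMASTER_STATS holds the same DRM counters that AWR summarizes. A sketch only (note that master instance numbers in V$GCSPFMASTER_INFO are commonly 0-based, so a CURRENT_MASTER of 2 corresponds to instance 3):

```sql
-- Sketch: objects whose GCS mastership has been moved by DRM and where they
-- currently live.
SELECT data_object_id,
       gc_mastering_policy,
       current_master,     -- 0-based: 2 = instance 3
       previous_master,
       remaster_cnt
  FROM v$gcspfmaster_info
 ORDER BY remaster_cnt DESC;

-- Cumulative DRM statistics per instance since startup:
SELECT * FROM gv$dynamic_remaster_stats;
```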
Going through alert.log and the relevant views, I found that every gather_database_stats_job_proc run took place on instance 3 at around 03:00. So at each failure instance 3 only had to receive the replayed locks, and its LMS processes sent very few of them, which is why instance 3 never hits this ORA-600[kjbmprlst:shadow].
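One way to confirm on which instance the automatic statistics job actually ran is DBA_SCHEDULER_JOB_RUN_DETAILS, which records an INSTANCE_ID for every run. A sketch, assuming the usual 11g autotask job-name prefix ORA$AT_OS_OPT_SY (adjust the filter to your environment):

```sql
-- Sketch: run history of the automatic optimizer statistics jobs,
-- including the instance that executed each run.
SELECT job_name,
       instance_id,
       actual_start_date,
       run_duration,
       status
  FROM dba_scheduler_job_run_details
 WHERE job_name LIKE 'ORA$AT_OS_OPT_SY%'
 ORDER BY actual_start_date DESC;
```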
Now let us look at part of the trace file of the LMS process that raised the ORA-600[kjbmprlst:shadow]:
*** 2011-10-12 02:25:22.770
DRM(22812) win(8) lms 1 finished replaying gcs resources
lms 1 finished fixing gcs write protocol
DRM(22813) win(1) lms 1 finished replaying gcs resources
*** 2011-10-12 03:05:29.717
lms 1 finished fixing gcs write protocol
DRM(22813) win(2) lms 1 finished replaying gcs resources
lms 1 finished fixing gcs write protocol
*** 2011-10-12 03:05:30.312
DRM(22813) win(3) lms 1 finished replaying gcs resources
lms 1 finished fixing gcs write protocol
DRM(22813) win(4) lms 1 finished replaying gcs resources
lms 1 finished fixing gcs write protocol
DRM(22813) win(5) lms 1 finished replaying gcs resources
*** 2011-10-12 03:05:31.280
lms 1 finished fixing gcs write protocol
DRM(22813) win(6) lms 1 finished replaying gcs resources
lms 1 finished fixing gcs write protocol
DRM(22813) win(7) lms 1 finished replaying gcs resources
lms 1 finished fixing gcs write protocol
*** 2011-10-12 03:05:32.269
DRM(22813) win(8) lms 1 finished replaying gcs resources
lms 1 finished fixing gcs write protocol
DRM(22814) win(1) lms 1 finished replaying gcs resources
lms 1 finished fixing gcs write protocol
DRM(22814) win(2) lms 1 finished replaying gcs resources
*** 2011-10-12 03:05:33.479
lms 1 finished fixing gcs write protocol
DRM(22814) win(3) lms 1 finished replaying gcs resources
lms 1 finished fixing gcs write protocol
DRM(22814) win(4) lms 1 finished replaying gcs resources
*** 2011-10-12 03:05:34.333
lms 1 finished fixing gcs write protocol
DRM(22814) win(5) lms 1 finished replaying gcs resources
lms 1 finished fixing gcs write protocol
DRM(22814) win(6) lms 1 finished replaying gcs resources
lms 1 finished fixing gcs write protocol
*** 2011-10-12 03:05:35.315
DRM(22814) win(7) lms 1 finished replaying gcs resources
Incident 1035732 created, dump file: /maclean1_lms1_20075_i1035732.trc
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
2011-10-12 03:05:36.811503 : kjbmprlst: internal error on replaying 0x618cd.371 from 3
2011-10-12 03:05:36.811541 : kjbmbassert [0x618cd.371]
2011-10-12 03:05:36.811559 : kjbmsassert(0x618cd.371)(2)
2011-10-12 03:05:36.811636 : kjbmsassert(0x618cd.371)(3)
2011-10-12 03:05:36.811667 : kjbmsassert(0x618cd.371)(4)
kjmpbmsg fatal error on 39
MSG [39:KJX_B_REPLAY] .Yq inc=64 len=136 sender=(3,3) seq=0
fg=ip stat=KJUSERSTAT_DONE spnum=14 flg=x24
flow ctrl: ver=60 flag=169 len=136 tkts=0 seq=0 wrp=0
sndr=3 dest=1 rcvr=2
FUSION MSG 0x2b178baa5120,39 from 3 spnum 14 ver[64,22813] ln 136 sq[3,8]
REPLAY 1 [0x618cd.371, 574166.0] c[0x59f904e08,-17597] [0x1,xc69]
grant 1 convert 0 role x0
pi [0x0.0x0] flags 0x0 state 0x100
disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
msgRM# 22813 bkt# 99097 drmbkt# 99097
pkey 574166.0 undo 0 stat 5 masters[32768, 3->32768] reminc 64 RM# 22813
flg x6 type x0 afftime x9776dffe
nreplays by lms 0 = 0
nreplays by lms 1 = 0
hv 9 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 34, dom 0]
kjga st 0x4, step 0.36.0, cinc 64, rmno 22813, flags 0x20
lb 98304, hb 114687, myb 99097, drmb 99097, apifrz 1
FUSION MSG DUMP END
MSG [65521:PBATCH] inc=64 len=8376 sender=3 seq=279807301 fg=q
flow ctrl: ver=1 flag=37 len=8376 tkts=0 seq=279807301 wrp=1
sndr=3 dest=1 rcvr=2
MSG [39:KJX_B_REPLAY] .Y'z inc=64 len=136 sender=(3,3) seq=0
fg=ip stat=KJUSERSTAT_DONE spnum=14 flg=x24
FUSION MSG 0x2b178baa31c8,39 from 3 spnum 14 ver[64,22813] ln 136 sq[3,8]
REPLAY 1 [0x527f5.27a, 574166.0] c[0xa2f7eddc0,2158] [0x1,xf33]
grant 1 convert 0 role x0
pi [0x0.0x0] flags 0x0 state 0x100
disk scn 0x0.0 writereq scn 0x0.0
rreqid x0 msgRM# 22813 bkt# 99582 drmbkt# 99582 pkey 574166.0 undo 0 stat 5 masters[32768, 3->32768] reminc 64 RM# 22813 flg x6 type x0 afftime x9776dffe nreplays by lms 0 = 0 nreplays by lms 1 = 0 hv 33 [stat 0x0, 1->1, wm 32768, RMno 0, reminc 34, dom 0] kjga st 0x4, step 0.36.0, cinc 64, rmno 22813, flags 0x20 lb 98304, hb 114687, myb 99582, drmb 99582, apifrz 1 FUSION MSG DUMP END ......................... ------------process 0xcf22a6d40-------------------- proc version : 0 Local node : 0 pid : 20075 lkp_node : 0 svr_mode : 0 proc state : KJP_NORMAL Last drm hb acked : 114687 flags : x80 current lock op : (nil) Total accesses : 11763805 Imm. accesses : 11763774 Locks on ASTQ : 0 Locks Pending AST : 0 Granted locks : 0 AST_Q: PENDING_Q: GRANTED_Q: KJM HIST LMS1: 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 20:39:0 1:0 13:65521:0 20:39:0 20:39:0 20:39:0 START DEFER MSG QUEUE 0 ON LMS1 flg x87: MSG [61:KJX_SYNC_RCVCHANNEL] inc=64 len=424 sender=(3,3) seq=98262147 fg=iq stat=KJUSERSTAT_DONE spnum=11 flg=x1 SYNC_RCVCHANNEL 0xca303f2f0 from 3 spnum 11 ver[64,196610] receiver 0 last sent seq 0.98262028 receiver 1 last sent seq 1.545148943 receiver 2 last sent seq 1.279807365 MSG [25:KJX_FTDONE] inc=64 len=64 sender=(3,3) seq=109225856 fg=s stat=KJUSERSTAT_ATTACHFAILED spnum=11 flg=x0 MSG [61:KJX_SYNC_RCVCHANNEL] inc=64 len=424 sender=(3,3) seq=98262146 fg=iq stat=KJUSERSTAT_DONE spnum=11 flg=x1 SYNC_RCVCHANNEL 0xca405e758 from 3 spnum 11 ver[64,196610] receiver 0 last sent seq 0.98262028 receiver 1 last sent seq 1.545148943 receiver 2 last sent seq 1.279807365 MSG [25:KJX_FTDONE] inc=64 len=64 sender=(3,3) seq=6400108 fg=s stat= spnum=11 flg=x20 END DEFER MSG QUEUE 0 DUMP ON LMS1 SEQUENCES: [sidx 0] 0:0.0 1:711171046.0 2:279807300.1 3:43212975.0 DEFER MSG QUEUE 1 ON LMS1 IS EMPTY SEQUENCES: [sidx 1] 0:0.0 1:711171047.0 2:279807292.1 3:43212974.0 DEFER MSG QUEUE 2 ON LMS1 IS EMPTY SEQUENCES: [sidx 2] 0:0.0 1:711171047.0 2:279807300.1 3:43212975.0
We can see that immediately before the ORA-600[kjbmprlst:shadow] was actually triggered, the LMS background process was indeed carrying out DRM operations at a high rate.
It is not hard to see what triggers the ORA-600[kjbmprlst:shadow]: a high database workload, the massive row-lock contention driven by the SQL statement on instance 1, and the gather_database_stats_job_proc automatic statistics job running on node 3 all coincide, forcing the LMS processes on instance 1 to perform DRM operations very frequently; under that heavy workload the bug in the code that manages kjbm shadow locks is finally hit.
Further analysis showed that the ORA-600[kjbmprlst:shadow] failures on instances 2 and 4 follow the same pattern as on node 1. Node 3 is the only exception in the whole RAC, because it is always the one that runs gather_database_stats_job_proc; it is therefore always the beneficiary of DRM, while the other three nodes are the victims.
At this point you may well ask: is this not then simply a failure triggered by the automatic statistics job? Why analyze so much? Just turn the job off, or move it to a different time window.
The answer: gather_database_stats_job_proc alone is not enough to trigger this failure. On the one hand, all the nodes run exactly the same application, with no partitioning of data or separation by schema, which aggravates inter-node contention for global cache resources. On the other hand, each buffer cache is large, holds a huge number of global cache resources, and the data in it is updated very frequently, so once gather_database_stats_job_proc starts, a flood of global resources moves toward instance 3 and DRM decisions are triggered at the same time; the LMS processes on nodes 1, 2 and 4 come under extreme pressure within a short period, and the bug is eventually hit. Merely disabling the gather_database_stats_job_proc job would therefore remove only one of the triggers, not the underlying exposure.
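For completeness, this is how the automatic statistics job alone could be switched off with the 11g autotask API; as argued above, doing only this removes one of the triggers but not the underlying exposure, so it was not treated as a fix here.

```sql
-- Sketch: disable the 11g "auto optimizer stats collection" maintenance task.
BEGIN
  DBMS_AUTO_TASK_ADMIN.DISABLE(
    client_name => 'auto optimizer stats collection',
    operation   => NULL,
    window_name => NULL);
END;
/
```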
Despite all the analysis above, my view on how to resolve this ORA-600[kjbmprlst:shadow] has not changed in the slightest; if anything it has hardened. Disabling the DRM feature fundamentally changes the behaviour of the LMS processes and removes any possibility of this ORA-600[kjbmprlst:shadow] occurring.
But I still needed some corroboration!
Reference Information
Browsing MOS again, one Note caught my eye: <Bug 12834027 – ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] with RAC read mostly locking [ID 12834027.8]>. I had already read this Note during the earlier round of analysis; at that time it described the problem as occurring on 11.2.0.1, offered no workaround, and gave upgrading to 11.2.0.2 plus the bundle patch as the only fix.
I then noticed that the Note had recently been modified, on 21-OCT-2011, and now reads as follows:
Bug 12834027  ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] with RAC read mostly locking
This note gives a brief overview of bug 12834027.
The content was last updated on: 21-OCT-2011

Affects:
    Product (Component)                        Oracle Server (Rdbms)
    Range of versions believed to be affected  Versions BELOW 12.1
    Versions confirmed as being affected       11.2.0.2
    Platforms affected                         Generic (all / most platforms affected)

Fixed:
    This issue is fixed in 12.1 (Future Release)

Symptoms:
    Related To:
    Internal Error May Occur (ORA-600)
    ORA-600 [kjbmprlst:shadow]
    ORA-600 [kjbrasr:pkey]
    RAC (Real Application Clusters) / OPS
    _gc_read_mostly_locking

Description
    ORA-600 [kjbmprlst:shadow] / ORA-600 [kjbrasr:pkey] can occur with RAC read mostly locking.

Workaround
    Disable DRM
    or
    Disable read-mostly object locking
    eg: Run with "_gc_read_mostly_locking"=FALSE
I came across this updated Note on October 28 and it had been modified on October 21, so the update could hardly have been more timely. It now lists two workarounds: disabling DRM altogether, or disabling the _gc_read_mostly_locking feature mentioned earlier. The related bug note describes another Exadata case:
Hdr: 12834027 N/A RDBMS 11.2.0.2.0 RAC PRODID-5 PORTID-226 12730844
Abstract: ORA-600 [KJBMPRLST:SHADOW] & [KJBRASR:PKEY] IN A READ MOSTLY & SKIP LOCK ENV

*** 08/04/11 04:22 am ***

PROBLEM:
--------
3 nodes got crashed and restarted in exadata cluster.

DIAGNOSTIC ANALYSIS:
--------------------
8 node Exadata RAC instance.
Below instances got terminated and it rejoined cluster around 11:46 Today (01-Aug).
Instance2 (gmfcdwp2) got ORA-600 [kjbmprlst:shadow] at 11:46 AM and instance got terminated by LMS1 process.
LMS1 (ospid: 14922): terminating the instance due to error 484
instance got terminated by LMS1 process
LMS1 (ospid: 28176): terminating the instance due to error 484
instance got terminated by LMS1 process
which was seen in instance 4 and 8.
That case is strikingly similar to ours: again it is an Exadata system on which particular instance numbers keep hitting ORA-600 [kjbmprlst:shadow].
The other thing to evaluate is the impact of disabling DRM on overall performance. I had in fact already covered this in <Important parameters For Oracle BRM Application in 11gR2>: Oracle officially recommends disabling the DRM feature for the BRM application, yet this system was running with the default instance parameters shipped with Exadata, so...
Solution
1. Disable the DRM feature. This completely removes the possibility of LMS hitting the bug through DRM operations; it is done by setting the hidden parameter _gc_policy_time to 0.
Impact assessment:
DRM can improve database performance to some degree, but for an application that is not partitioned by data or separated by schema it brings little benefit. Oracle officially recommends disabling DRM for the BRM application; see the MOS note <Questions About BRM Support For Oracle DB 11GR2 And Exadata [ID 1319678.1]>:
"_gc_policy_time=0 to disable DRM." Changing the _gc_policy_time parameter requires an instance restart.
2. Partially disable DRM by setting _gc_read_mostly_locking=FALSE. This avoids the mass transfer of replayed locks and greatly reduces the chance of LMS hitting the bug. Changing this parameter also requires an instance restart.
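A sketch of the two settings follows; the double quotes are required because these are hidden parameters, and SCOPE=SPFILE plus a restart reflects the restart requirement noted above.

```sql
-- Option 1: disable DRM completely (all instances, takes effect after restart)
ALTER SYSTEM SET "_gc_policy_time"=0 SCOPE=SPFILE SID='*';

-- Option 2: keep DRM but disable read-mostly object locking
ALTER SYSTEM SET "_gc_read_mostly_locking"=FALSE SCOPE=SPFILE SID='*';
```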
Case Summary
Having written this much I am a little tired, and perhaps starting to ramble. I may write a separate post on the Read-mostly locking DRM feature introduced in 11g; I had intended to cover it here, but this post is already long enough.
Sometimes all we need to solve a problem is one suitable fix, and that fix may come from Metalink, Google or a friend. We may not know exactly why it works, but one way or another it does. That is not the same as truly understanding the failure. I spent roughly 3 to 4 hours analyzing this problem, and although the conclusion is almost identical to the one I reached in July, I gained a great deal from the process itself, which only doing the analysis yourself can give you.
So enjoy the fun that the analysis process brings!
We have also run into DRM-related ORA-600 errors, on a 10g production database.
kjbdrmcvtq lmon drm quiesce: ping completion
KJBDRMCVTQ LMON DRM QUIESCE
RELATED BUGS:
————-
bug 12371302 GCS DRM FREEZE IN ENTER SERVER MODE WAIT EVENT
*** 2011-11-03 03:02:45.727
DRM(23931) win(1) lms 0 finished replaying gcs resources
lms 0 finished fixing gcs write protocol
DRM(23931) win(2) lms 0 finished replaying gcs resources
lms 0 finished fixing gcs write protocol
kjdrvalidRMno: msg type 34 from inst 3 dropped
FUSION MSG 0x2ab77691c338,34 from 3 spnum 13 ver[84,23930] ln 144 sq[3,8]
CLOSE [0xf8f09.c7, 453155.0] shadow [0xc57508f10,200] seq 0x2 act 1
client [0x7efa631d8,3820] reqid x1b46 ordered 0
grant 1 convert 0 role x0
pi [0x0.0x0] flags 0x0 state 0x100
disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
msgRM# 23930 bkt# 45537 drmbkt# 45537
pkey 453155.0 undo 0 stat 5 masters[32768, 32768->3] reminc 0 RM# 0
flg x6 type x0 afftime x1e062edc
nreplays by lms 0 = 1031
nreplays by lms 1 = 1001
hv 86 [stat 0x0, 4->4, wm 32768, RMno 0, reminc 80, dom 0]
kjga st 0x4, step 0.34.0, cinc 84, rmno 23930, flags 0x20
lb 32768, hb 49151, myb 45537, drmb 45537, apifrz 1
FUSION MSG DUMP END
kjmvalidate: drm drop a message RMno 23930 from 3 type 34
mver 84 myver 84 seq 0.0 flag xa9
DRM(23931) win(3) lms 0 finished replaying gcs resources
lms 0 finished fixing gcs write protocol
DRM(23931) win(4) lms 0 finished replaying gcs resources
*** 2011-11-03 03:02:46.291
lms 0 finished fixing gcs write protocol
DRM(23931) win(5) lms 0 finished replaying gcs resources
lms 0 finished fixing gcs write protocol
DRM(23931) win(6) lms 0 finished replaying gcs resources
lms 0 finished fixing gcs write protocol
DRM(23931) win(7) lms 0 finished replaying gcs resources
lms 0 finished fixing gcs write protocol
DRM(23931) win(8) lms 0 finished replaying gcs resources
lms 0 finished fixing gcs write protocol
*** 2011-11-03 03:02:47.435
DRM(23932) win(1) lms 0 finished replaying gcs resources
lms 0 finished fixing gcs write protocol
DRM(23932) win(2) lms 0 finished replaying gcs resources
lms 0 finished fixing gcs write protocol
*** 2011-11-03 03:02:48.380
DRM(23932) win(3) lms 0 finished replaying gcs resources
Incident 637552 created, dump file: /u01/app/oracle/diag/rdbms/aaadb/aaadb4/incident/incdir_637552/aaadb4_lms0_13488_i637552.trc
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
2011-11-03 03:02:49.503232 : kjbmprlst: internal error on replaying 0xe3.16e from 3
2011-11-03 03:02:49.503257 : kjbmbassert [0xe3.16e]
2011-11-03 03:02:49.503272 : kjbmsassert(0xe3.16e)(1)
2011-11-03 03:02:49.503355 : kjbmsassert(0xe3.16e)(2)
2011-11-03 03:02:49.503389 : kjbmsassert(0xe3.16e)(3)
kjmpbmsg fatal error on 39
MSG [39:KJX_B_REPLAY] {]n inc=84 len=136 sender=(3,3) seq=0
fg=ip stat=KJUSERSTAT_DONE spnum=13 flg=x24
flow ctrl: ver=0 flag=169 len=136 tkts=0 seq=0 wrp=0
sndr=3 dest=4 rcvr=1
FUSION MSG 0x2ab774dc4788,39 from 3 spnum 13 ver[84,23931] ln 136 sq[3,8]
REPLAY 1 [0xe3.16e, 576949.0] c[0x8cfc7a978,2307] [0x1,xec1]
grant 1 convert 0 role x0
pi [0x0.0x0] flags 0x0 state 0x100
disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
msgRM# 23931 bkt# 40988 drmbkt# 40988
pkey 576949.0 undo 0 stat 5 masters[32768, 3->32768] reminc 84 RM# 23931
flg x6 type x0 afftime x1e1b3f20
nreplays by lms 0 = 0
nreplays by lms 1 = 0
hv 110 [stat 0x0, 4->4, wm 32768, RMno 0, reminc 80, dom 0]
kjga st 0x4, step 0.36.0, cinc 84, rmno 23931, flags 0x20
lb 32768, hb 49151, myb 40988, drmb 40988, apifrz 1
FUSION MSG DUMP END
MSG [65521:PBATCH] inc=84 len=8376 sender=3 seq=396496720 fg=q
flow ctrl: ver=1 flag=37 len=8376 tkts=0 seq=396496720 wrp=0
sndr=3 dest=4 rcvr=1
MSG [39:KJX_B_REPLAY] {]W inc=84 len=136 sender=(3,3) seq=0
fg=ip stat=KJUSERSTAT_DONE spnum=13 flg=x24
FUSION MSG 0x2ab774dc3798,39 from 3 spnum 13 ver[84,23931] ln 136 sq[3,8]
REPLAY 1 [0xa2ed.257, 453155.0] c[0x68fdab748,4085] [0x1,x19e5]
grant 1 convert 0 role x0
pi [0x0.0x0] flags 0x0 state 0x100
disk scn 0x0.0 writereq scn 0x0.0 rreqid x0
msgRM# 23931 bkt# 37981 drmbkt# 37981
pkey 453155.0 undo 0 stat 5 masters[32768, 3->32768] reminc 84 RM# 23931
flg x6 type x0 afftime x1e1b3ef2
nreplays by lms 0 = 1156
nreplays by lms 1 = 1117
hv 121 [stat 0x0, 4->4, wm 32768, RMno 0, reminc 82, dom 0]
kjga st 0x4, step 0.36.0, cinc 84, rmno 23931, flags 0x20
lb 32768, hb 49151, myb 37981, drmb 37981, apifrz 1
FUSION MSG DUMP END
Dumping Work Area Table (level=1)
=====================================
Global SGA Info
—————
global target: 23808 MB
auto target: 20165 MB
max pga: 2048 MB
pga limit: 4096 MB
pga limit known: 0
pga limit errors: 0
pga inuse: 1402 MB
pga alloc: 1872 MB
pga freeable: 312 MB
pga freed: 71835 MB
pga to free: 0 %
broker request: 0
pga auto: 0 MB
pga manual: 0 MB
pga alloc (max): 5139 MB
pga auto (max): 2049 MB
pga manual (max): 0 MB
# workareas : 0
# workareas(max): 47
COMPONENT NAME : ASM Buffer Cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
Mapping for Component id to Component name:
1: shared pool
2: large pool
3: java pool
4: streams pool
5: SGA Target
7: DEFAULT buffer cache
8: KEEP buffer cache
9: RECYCLE buffer cache
10: DEFAULT 2K buffer cache
11: DEFAULT 4K buffer cache
12: DEFAULT 8K buffer cache
13: DEFAULT 16K buffer cache
14: DEFAULT 32K buffer cache
15: Shared IO Pool
16: PGA Target
17: ASM Buffer Cache
Dumping memory transfer history; start=0
Dumping resize operations history; start=22
1: 0, 0, 144, 0->22, fin=22, st=2
2: 0, 0, 146, 0->1, fin=1, st=2
3: 0, 0, 148, 0->1, fin=1, st=2
4: 0, 0, 150, 0->1, fin=1, st=2
5: 0, 0, 575, 0->121, fin=121, st=2
7: 0, 0, 639, 0->95, fin=95, st=2
8: 0, 0, 646, 0->0, fin=0, st=2
9: 0, 0, 647, 0->0, fin=0, st=2
10: 0, 0, 641, 0->0, fin=0, st=2
11: 0, 0, 642, 0->0, fin=0, st=2
12: 0, 0, 643, 0->0, fin=0, st=2
13: 0, 0, 644, 0->0, fin=0, st=2
14: 0, 0, 645, 0->0, fin=0, st=2
15: 0, 0, 649, 0->0, fin=0, st=2
16: 0, 0, 1780, 0->81, fin=81, st=2
17: 0, 0, 639, 0->0, fin=0, st=2
7: 1, 0, 639, 95->95, fin=95, st=2
16: 3, 1, 1780, 81->85, fin=85, st=2
7: 4, 2, 639, 95->91, fin=95, st=3
16: 3, 1, 1780, 85->89, fin=89, st=2
16: 3, 1, 1780, 89->93, fin=93, st=2
7: 4, 2, 639, 95->91, fin=95, st=3
Dumping last 2 granules of Transferred Memory
Granule Header dump for 0xb4fffe000, hdrsz=72, gran size=268435456
Trace Bucket Dump End: KSI
—– Auto SGA Transfer History —–
GRANULE SIZE is 268435456
COMPONENT NAME : shared pool, curnum=22, usrsz=0, tgt=22, min=2, st=0, flg=80
Number of granules in inuse list (listid 2) is 22
COMPONENT NAME : large pool, curnum=1, usrsz=0, tgt=1, min=0, st=0, flg=80
Number of granules in inuse list (listid 2) is 1
COMPONENT NAME : java pool, curnum=1, usrsz=0, tgt=1, min=1, st=0, flg=80
Number of granules in inuse list (listid 2) is 1
COMPONENT NAME : streams pool, curnum=1, usrsz=0, tgt=1, min=0, st=0, flg=80
Number of granules in inuse list (listid 2) is 1
COMPONENT NAME : SGA Target, curnum=121, usrsz=0, tgt=117, min=0, st=0, flg=0
COMPONENT NAME : System Memory, curnum=81, usrsz=0, tgt=85, min=0, st=0, flg=0
Number of granules in inactive list (listid 0) is 81
COMPONENT NAME : DEFAULT buffer cache, curnum=95, usrsz=0, tgt=91, min=1, st=0, flg=80
Number of granules in inuse list (listid 2) is 95
COMPONENT NAME : KEEP buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
COMPONENT NAME : RECYCLE buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
COMPONENT NAME : DEFAULT 2K buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
COMPONENT NAME : DEFAULT 4K buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
COMPONENT NAME : DEFAULT 8K buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
COMPONENT NAME : DEFAULT 16K buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
COMPONENT NAME : DEFAULT 32K buffer cache, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
COMPONENT NAME : Shared IO Pool, curnum=0, usrsz=0, tgt=0, min=0, st=0, flg=0
COMPONENT NAME : PGA Target, curnum=93, usrsz=0, tgt=93, min=0, st=0, flg=0
GES destination context:
GES remote instance per receiver context:
Rcvr 1 state 0x0
rbufs tot 1100 in_ksxp 1100 ubufs 0 free 10024
msgs rdyq 0 watq 209 nsq 0 routq 1107
rmpool: alloc 1775 avail 10024
ibatch: 2[14340411] 3[1331955] 4[230018] 5[77584] 6[41863] 7[29418] 8[22906] 9[15597] 10[12489] 11[10986]
ibatch: 12[9804] 13[8399] 14[7605] 15[7279] 16[14835] 17[5326] 18[1268] 19[534] 20[399]
ibatch: 21[333] sum 16169009
rcvs 16173636 tot msgs 35913409 max sz 103 avg sz 2
pbatch: 2[13145648] 3[976483] 4[328637] 5[191756] 6[140659] 7[110263] 8[150497] 9[74627] 10[72052] 11[64821]
pbatch: 12[63710] 13[58268] 14[63716] 15[50112] 16[415654] 17[9120] 18[2760] 19[1854] 20[1535]
pbatch: 21[1369] sum 15923541
rcvs 407779634 tot msgs 445570264 max sz 103 avg sz 2
watq : 32:34 36:10 44:3 73:9 74:54 100:13 rsv:(tot 86 ib 4 pb 82)
watq : embed: 33:4 34:9 35:4 36:14 37:3 44:21 86:115 87:2
watq : embed: 88:1 100:2
kjctseventdump-begin tail 29 heads 29 @ -1847136902 29 @ -1847136902 29 @ -1847136902 63 @ 0
#0 id 364 exid 66573 dests 0000
#1 id 364 exid 66573 dests 0000
kjctseventdump-end tail 29 heads 29 @ -1847136902 29 @ -1847136902 29 @ -1847136902 63 @ 0
End of KJC Communication Dump
START DEFER MSG QUEUE 0 ON LMS0 flg x87:
MSG [61:KJX_SYNC_RCVCHANNEL] inc=84 len=424 sender=(3,3) seq=6285124
fg=iq stat=KJUSERSTAT_DONE spnum=11 flg=x1
SYNC_RCVCHANNEL 0xcf3c0f5c0 from 3 spnum 11 ver[84,2]
receiver 0 last sent seq 0.6285080
receiver 1 last sent seq 0.396496722
receiver 2 last sent seq 0.404953232
MSG [25:KJX_FTDONE] inc=84 len=64 sender=(3,3) seq=233131925
fg=s stat= spnum=11 flg=x20
MSG [61:KJX_SYNC_RCVCHANNEL] inc=84 len=424 sender=(2,2) seq=5701455
fg=iq stat=KJUSERSTAT_DONE spnum=11 flg=x1
SYNC_RCVCHANNEL 0xcf3d171c0 from 2 spnum 11 ver[84,1280262987]
receiver 0 last sent seq 0.5694348
receiver 1 last sent seq 0.121316137
receiver 2 last sent seq 0.133410556
MSG [25:KJX_FTDONE] inc=84 len=64 sender=(2,2) seq=134378202
fg=s stat=KJUSERSTAT_ATTACHFAILED spnum=11 flg=x0
MSG [61:KJX_SYNC_RCVCHANNEL] inc=84 len=424 sender=(3,3) seq=6285125
fg=iq stat=KJUSERSTAT_DONE spnum=11 flg=x1
SYNC_RCVCHANNEL 0xcf4915530 from 3 spnum 11 ver[84,2]
receiver 0 last sent seq 0.6285080
receiver 1 last sent seq 0.396496744
receiver 2 last sent seq 0.404953256
MSG [25:KJX_FTDONE] inc=84 len=64 sender=(3,3) seq=233131925
fg=s stat=KJUSERSTAT_ATTACHFAILED spnum=11 flg=x0
MSG [61:KJX_SYNC_RCVCHANNEL] inc=84 len=424 sender=(1,1) seq=9049842
fg=iq stat=KJUSERSTAT_DONE spnum=11 flg=x1
SYNC_RCVCHANNEL 0xcf427bad0 from 1 spnum 11 ver[84,1280262987]
receiver 0 last sent seq 0.9049829
receiver 1 last sent seq 0.377205652
receiver 2 last sent seq 0.377998312
MSG [25:KJX_FTDONE] inc=84 len=64 sender=(1,1) seq=60389025
fg=s stat=KJUSERSTAT_ATTACHFAILED spnum=11 flg=x0
END DEFER MSG QUEUE 0 DUMP ON LMS0
SEQUENCES:
[sidx 0] 0:377205631.0 1:121316104.0 2:396496713.0 3:0.0
DEFER MSG QUEUE 1 ON LMS0 IS EMPTY
SEQUENCES:
[sidx 1] 0:377205635.0 1:121316122.0 2:396496715.0 3:0.0
DEFER MSG QUEUE 2 ON LMS0 IS EMPTY
SEQUENCES:
[sidx 2] 0:377205635.0 1:121316122.0 2:396496715.0 3:0.0
error 484 detected in background process
ORA-00600: internal error code, arguments: [kjbmprlst:shadow], [], [], [], [], [], [], [], [], [], [], []
*** 2011-11-03 03:02:51.630
LMS0 (ospid: 13488): terminating the instance due to error 484
ksuitm: waiting up to [5] seconds before killing DIAG(13471)
*** 2011-11-03 03:03:06.283
Instance termination failed to kill one or more processes
ksuitm_check: OS PID=18888 is still alive
ksuitm_check: OS PID=18755 is still alive
DRM problems are a well-known issue by now; have a look at Lao Xiong's blog for an identical case. It should definitely be disabled. In your case you also need to split the workload by service to reduce cache fusion, and that UPDATE statement has to be optimized at the application level: it should not update a single table that frequently, and it is probably not committing in batches and is hitting the same rows concurrently. Having run my own performance tests on Exadata, I do not think Exadata is well suited to OLTP; with OLTP its compression and storage technologies get no chance to shine, and you would be better off with two high-performance multi-core, multi-threaded midrange servers, something like a 780-class machine. I built an OLTP test model on Exadata with repeated updates, inserts and heavy contention, and the performance was really nothing to write home about.
DRM has been considerably enhanced in 11g and works well for OLTP applications that read far more than they write, but it does indeed still have quite a few bugs. My own view remains that whether to disable it should depend on the characteristics of the application.
Thanks for the suggestions all the same!
Thank you for the complete and detailed write-up.