客户有一套AIX 5.3上的10.2.0.4.5生产库系统,最近频繁出现”KKSFBC CHILD COMPLETION”等待,同时导致session不断spin消耗CPU并hang住,从表象看这似乎是由bug引起的。以KKSFBC CHILD COMPLETION为关键字到MOS查询可以找到<Bug 6795880 – Session spins / OERI after ‘kksfbc child completion’ wait – superceded [ID 6795880.8]>,该Bug的症状为进程不断spin且hang住、出现’KKSFBC CHILD COMPLETION’等待事件、还可能伴有’Waits for “cursor: pin S”‘等待事件,直接影响的版本有11.1.0.6、10.2.0.3和10.2.0.4。
对于该Bug的描述是在发生’kksfbc child completion’等待事件后会话陷入无休止的自旋(spins)中,这种自旋(spins)发生在由堆栈调用(stack call)kksSearchChildList->kkshgnc陷入对kksSearchChildList函数的无限循环中。
就当前用户提供的版本号及等待事件信息仍不足以定位到该Bug,我们需要更详细的stack call。所幸的是这个trouble是可以重现的(reproduceable),在之后的一次案发现场我们得到了必要的信息:
Name PID CPU% PgSp Owner
oracle 3723390 10.0 7.0 oracle
SQL> oradebug setospid 3723390
Oracle pid: 155, Unix process pid: 3723390, image: oracle@lmsdb1
SQL> oradebug unlimit;
Statement processed.
SQL> oradebug short_stack;
ksdxfstk+002c<-ksdxcb+04e4<-sspuser+0074<-000044F0
<-kksfbc+0b9c<-kkspsc0+0ffc<-kksParseCursor+00d4
<-opiosq0+0ae0<-kpooprx+0168<-kpoal8+0400<-opiodr+0ae0<-ttcpip+1020<-opitsk+1124
<-opiino+0990<-opiodr+0ae0<-opidrv+0484<-sou2o+0090<-opimai_real+01bc<-main+0098<-__start+0098
SQL> oradebug dump processstate 10;
Statement processed.
SQL> oradebug dump systemstate 266;
Statement processed.
=========================process state dump=========================
SO: 7000003f72e3378, type: 4, owner: 7000003f225c798, flag: INIT/-/-/0x00
(session) sid: 270 trans: 0, creator: 7000003f225c798, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-009B-0000017F, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 700000243ef9540, psql: 7000002fe76d9c0, user: 28/OLSUSER
service name: SYS$USERS
O/S info: user: newprepay, term: unknown, ospid: 1234, machine: newprepayC
program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
last wait for 'kksfbc child completion' blocking sess=0x0 seq=4397 wait_time=48840 seconds since wait started=144918
=0, =0, =0
Dumping Session Wait History
for 'kksfbc child completion' count=1 wait_time=48840
=0, =0, =0
可以从以上trace中看到会话确实曾长时间处于’kksfbc child completion’等待中,之后陷入无限自旋(spins)中消耗了大量CPU时间。但这里实际的表现又存有差异,引发无限循环的函数是kksfbc而不是kksSearchChildList(常规的调用序列是:kksParseCursor->kkspsc0->kksfbc ->kksSearchChildList->kkshgnc)。
kksfbc意为K[Kernel]K[Kompile]S[Shared]F[Find]B[Best]C[Child]该函数用以在软解析时找寻合适的子游标,在10.2.0.2以后引入了mutex互斥体来取代原有的Cursor Pin机制,Mutex较Latch更为轻量级。著名的Tanel Poder指出虽然mutex的引入改变了众多cursor pin的内部机制,但kksfbc仍需要持有library cache latches才能扫描library cache hash chains(However the traversing of library cache hash chains (the right child cursor lookup using kksfbc()) was still protected by library cache latches)。另一方面当kksfbc函数针对某个parent cursor找到合适child cursor后,可能使用KKSCHLPINx方法将该child cursor pin住,这个时候就需要exclusive地持有该child cursor相应的mutex,如:
SQL> select mutex_addr,requesting_session rs,blocking_session bs,location from x$mutex_sleep_history; MUTEX_ADDR RS BS LOCATION ---------------- ---------- ---------- ---------------------------------------- 00000000A3CF8928 159 148 kksfbc [KKSCHLPIN1] 00000000A3CEA8A8 159 0 kksfbc [KKSCHLPIN1] 00000000A3CF5508 159 0 kksfbc [KKSCHLPIN1] 00000000A3CF10E8 148 159 kksLockDelete [KKSCHLPIN6]
当有一个进程执行kksfbc,而其他进程可能需要陷入’kksfbc child completion’等待中(更多的是cursor:pin S等待事件),但这种等待一般是十分轻微的,你很难从某个”仪表”上观察到这一事件。因为一系列kksfbc相关的bug仅发生在10.2.0.2以后,可以猜测是由于mutex的引入引起的。
打破’kksfbc child completion’异常等待的一种行之有效的workaround方法是设置隐藏参数_use_kks_mutex_pin为false,即使用传统的由latch保护的Cursor pin;设置该hidden parameter需要重启实例,具体的设置方法如下:
alter system set "_kks_use_mutex_pin"=false scope=spfile; restart instance...........
但这种方法也仅仅是行之有效,而非万试万灵。从消极的角度来说,因为放弃了mutex保护cursor pin的机制,在解析频繁的系统中library cache latch的争用将白热化,有可能成为Top 5等待事件。
另一种值得推荐的workaround是减少硬编码的SQL语句,使用session_cached_cursor减少实际的软解析操作也有助于避免引发该Bug;当然这要求我们修改代码,但从长远来看这种改良是值得的。
最后Oracle在10.2.0.4上提供了该Bug的one-off Patch 8575528,其在10.2.0.4 psu4以后的等价补丁为(Equivalent patch)为merge patch 9696904:
8557428 | 9696904 7527908 | Both fixes are needed. 6795880 superceded by 8575528 in 9696904 which includes extra files so may cause new conflicts |
但merge patch 9696904目前仅有Linux x86/64平台上的版本,而问题数据库所在平台为IBM AIX on POWER Systems (64-bit)。如果要通过补丁来fix这个问题的话,AIX平台的用户可以要求Oracle development开发部门build一个Power版的9696904 patch,也可以升级到10.2.0.5上去;
注意Patch 8575528: MISSING ENTRIES IN V$MUTEX_SLEEP.LOCATION目前有IBM AIX on POWER Systems (64-bit)平台上10.2.0.4.4的版本,经过验证该one-off patch可以在10.2.0.4 PSU4以后的版本上实施(包括10.2.0.4.5/10.2.0.4.6等),不需要如以上描述地去apply 9696904这个merge patch。当然升级到10.2.0.5/11.1.0.7或以上版本依然有效。
此外apply以上8575528:后需要修改隐藏参数”_cursor_features_enabled”并重启才能使fix生效,具体设置防范如下:
ALTER SYSTEM SET "_cursor_features_enabled"=10 scope=spfile; restart instance.............
注意以上参数仅在使用one-off patch时需要设置,而当通过升级到10.2.0.5/11.1.0.7来修复问题时无需设置上述”_cursor_features_enabled”参数。
Getting ORA-00600 [kkshgnc-nextchild] With Wait For [kksfbc child completion]
Applies to:
Oracle Server – Enterprise Edition – Version: 10.2.0.4 and later [Release: 10.2 and later ]
All Platforms
Symptoms
The following error can be raised when using 10.2.0.4 (or lower releases of 10gR2) or 11.1.0.6:
ORA-00600: internal error code, arguments: [kkshgnc-nextchild], [], [], [], [], [], [], []
The call stack looks similar to:
ksedmp kgerinv kgeasnmierr kkshgnc kksSearchChildList kksfbc kkspsc0 kksParseCursor opiosq0 kpooprx kpoal8 opiodr ttcpip opitsk opiino opiodr opidrv sou2o opimai_real main
and the trace file shows waits for “[kksfbc child completion]” similar to:
KSL WAIT END [latch: cache buffers chains]
KSL WAIT BEG [kksfbc child completion]
KSL WAIT END [kksfbc child completion]
KSL WAIT BEG [cursor: pin S wait on X]
KSL WAIT END [cursor: pin S wait on X]
Cause
The cause of this problem has been identified in Bug 6795880. It is caused by a session going into an infinite spin just after a wait for ‘kksfbc child completion’. The spin can be identified in the call stack by the following routines: kksSearchChildList -> kkshgnc, where kksSearchChildList loops forever.
This problem can also lead to internal error such as:
ORA-600 [kksSearchChildList1]
ORA-600 [kksSearchChildList2]
ORA-600 [kksSearchChildList3]
ORA-600 [kkshgnc-nextchild]
Solution
Bug 6795880 is fixed in RDBMS 10.2.0.5 patchset (only available for Linux x86 and Linux x64 platforms at time of publishing this article, i.e. MAY 2010) and in 11.1.0.7 patchset. There is no known workaround for this bug.
Please execute both the actions in Step A and in Step B as follows to resolve this issue.
A) As this bug introduces the problem described in unpublished Bug 8575528, download and apply Patch 8575528. Steps for obtaining the patch
1) Go to My Oracle Support
2) Activate the Patches & Updates tab
3) Click on Patch ID or Number
4) Enter the patch number 8575528 in the text field
5) Select your platform
6) Click on Search
AND
B) Set the following instance parameter and restart the instance.Please make a note that this parameter is needed to enable the fix. Just applying Patch is not sufficient. When using an SPFILE, issue:
ALTER SYSTEM SET “_cursor_features_enabled”=10 scope=spfile;
and restart the instance.
For PFILE usage, set following parameter in PFILE and restart the instance.
_cursor_features_enabled=10
The parameter is only necessary when installing the one-off fix. When installing the 10.2.0.5 or 11.1.0.7 patch sets there is no need for this parameter.
Bug 6795880 Session spins / OERI after ‘kksfbc child completion’ wait – superceded
This note gives a brief overview of bug 6795880.
The content was last updated on: 05-APR-2011
Click here for details of each of the sections below.
Affects:
Product (Component) Oracle Server (Rdbms)
Range of versions believed to be affected Versions BELOW 11.2
Versions confirmed as being affected
11.1.0.6
10.2.0.4
10.2.0.3
Platforms affected Generic (all / most platforms affected)
Note that this fix can cause / expose the problem described in Bug:8575528
Note that this fix has been superceded by the fix in Bug:8575528
Fixed:
This issue is fixed in
11.1.0.7 (Server Patch Set)
10.2.0.5 (Server Patch Set)
10.2.0.4 Patch 5 on Windows Platforms
Symptoms:
Related To:
Hang (Process Spins)
Internal Error May Occur (ORA-600)
ORA-600 [kksSearchChildList1]
ORA-600 [kksSearchChildList2]
ORA-600 [kksSearchChildList3]
ORA-600 [kkshgnc-nextchild]
Dump in or under kksSearchChildList4
Waits for “cursor: pin S”
Waits for “kksfbc child completion”
This fix is DISABLED by default
_CURSOR_FEATURES_ENABLED
Description
This fix has been superseded by bug:8575528.
A session may go into an infinite spin just after a wait
for ‘kksfbc child completion’. The spin occurs with
a stack including kksSearchChildList -> kkshgnc where
kksSearchChildList loops forever.
This problem can also lead to internal error such as any of
ORA-600 [kksSearchChildList1], ORA-600 [kksSearchChildList2]
ORA-600 [kksSearchChildList3], ORA-600 [kkshgnc-nextchild]
Or Trace dumps on kksSearchChildList4 .
Note:
This fix is disabled by default in 10g.
To enable this fix you must explicitly set the following
parameter for instance startup:
“_cursor_features_enabled” = 10
Applying p8575528_102044_Linux-x86-64.zip on PSU 5 10.2.0.4.5
[oracle@rh2 patches]$ cp /home/oracle/p8575528_102044_Linux-x86-64.zip ./
[oracle@rh2 patches]$ unzip p8575528_102044_Linux-x86-64.zip
Archive: p8575528_102044_Linux-x86-64.zip
creating: 8575528/
creating: 8575528/files/
creating: 8575528/files/lib/
creating: 8575528/files/lib/libserver10.a/
inflating: 8575528/files/lib/libserver10.a/kks1.o
inflating: 8575528/files/lib/libserver10.a/kksc.o
inflating: 8575528/files/lib/libserver10.a/kksh.o
inflating: 8575528/files/lib/libserver10.a/ksmp.o
creating: 8575528/etc/
creating: 8575528/etc/config/
inflating: 8575528/etc/config/inventory
inflating: 8575528/etc/config/actions
creating: 8575528/etc/xml/
inflating: 8575528/etc/xml/GenericActions.xml
inflating: 8575528/etc/xml/ShiphomeDirectoryStructure.xml
inflating: 8575528/README.txt
[oracle@rh2 patches]$ cd 8575528/
[oracle@rh2 8575528]$ $ORACLE_HOME/OPatch/opatch apply
Invoking OPatch 11.2.0.1.3
Oracle Interim Patch Installer version 11.2.0.1.3
Copyright (c) 2010, Oracle Corporation. All rights reserved.
Oracle Home : /s01/db_1
Central Inventory : /s01/oraInventory
from : /etc/oraInst.loc
OPatch version : 11.2.0.1.3
OUI version : 10.2.0.4.0
OUI location : /s01/db_1/oui
Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-09_22-03-44PM.log
Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt
ApplySession applying interim patch ‘8575528’ to OH ‘/s01/db_1’
Running prerequisite checks…
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name:
You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]: Yes
OPatch detected non-cluster Oracle Home from the inventory and will patch the local system only.
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = ‘/s01/db_1’)
Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files and inventory (not for auto-rollback) for the Oracle Home
Backing up files affected by the patch ‘8575528’ for restore. This might take a while…
Backing up files affected by the patch ‘8575528’ for rollback. This might take a while…
Patching component oracle.rdbms, 10.2.0.4.0…
Updating archive file “/s01/db_1/lib/libserver10.a” with “lib/libserver10.a/kks1.o”
Updating archive file “/s01/db_1/lib/libserver10.a” with “lib/libserver10.a/kksc.o”
Updating archive file “/s01/db_1/lib/libserver10.a” with “lib/libserver10.a/kksh.o”
Updating archive file “/s01/db_1/lib/libserver10.a” with “lib/libserver10.a/ksmp.o”
Running make for target ioracle
ApplySession adding interim patch ‘8575528’ to inventory
Verifying the update…
Inventory check OK: Patch ID 8575528 is registered in Oracle Home inventory with proper meta-data.
Files check OK: Files from Patch ID 8575528 are present in Oracle Home.
The local system has been patched and can be restarted.
OPatch succeeded.
[oracle@rh2 8575528]$ $ORACLE_HOME/OPatch/opatch lsinventory
Invoking OPatch 11.2.0.1.3
Oracle Interim Patch Installer version 11.2.0.1.3
Copyright (c) 2010, Oracle Corporation. All rights reserved.
Oracle Home : /s01/db_1
Central Inventory : /s01/oraInventory
from : /etc/oraInst.loc
OPatch version : 11.2.0.1.3
OUI version : 10.2.0.4.0
OUI location : /s01/db_1/oui
Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-09_22-16-21PM.log
Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt
Lsinventory Output file location : /s01/db_1/cfgtoollogs/opatch/lsinv/lsinventory2011-05-09_22-16-21PM.txt
——————————————————————————–
Installed Top-level Products (2):
Oracle Database 10g 10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 3 10.2.0.4.0
There are 2 products installed in this Oracle Home.
Interim patches (4) :
Patch 8575528 : applied on Mon May 09 22:06:33 CST 2011
Unique Patch ID: 12363258
Created on 5 Apr 2010, 16:16:27 hrs PST8PDT
Bugs fixed:
8575528
This patch overlays patches:
9352164
This patch needs patches:
9352164
as prerequisites
Patch 7527908 : applied on Fri May 06 22:25:33 CST 2011
Unique Patch ID: 12546933
Created on 30 Apr 2010, 12:48:09 hrs PST8PDT
Bugs fixed:
7527908
This patch overlays patches:
9352164
This patch needs patches:
9352164
as prerequisites
Patch 9654991 : applied on Fri May 06 21:33:38 CST 2011
Unique Patch ID: 12816418
Created on 30 Jun 2010, 18:30:40 hrs PST8PDT
Bugs fixed:
9714832, 9713537, 9678695, 9655017, 9173248, 8309642, 9678697, 9678690
9573054, 9654991
This patch overlays patches:
9352164
This patch needs patches:
9352164
as prerequisites
Patch 9352164 : applied on Fri May 06 21:29:20 CST 2011
Unique Patch ID: 12307477
Created on 1 Apr 2010, 11:33:14 hrs PST8PDT
Bugs fixed:
9377578, 6418420, 7835247, 7207654, 7592346, 6724797, 7936993, 7331867
9093300, 7535152, 7552067, 5879114, 8866013, 5457450, 8344348, 7272297
7136866, 7196894, 6258480, 7013124, 6512622, 6355586, 7196532, 8568395
8309587, 7557226, 5399699, 6509115, 8568397, 8568398, 7127618, 5701695
6052226, 7424804, 6817593, 7553884, 6741425, 7513673, 6783812, 8437213
6452766, 6469211, 7527650, 8309592, 8479537, 5991038, 5686407, 6945157
7639602, 9119226, 6403091, 7589862, 7552082, 6711853, 8304589, 6052169
8199266, 6327692, 5756769, 9352191, 7460818, 6268409, 8232056, 6687381
6972843, 8230457, 6800507, 7027551, 6778714, 6200820, 6645719, 7393804
6775231, 3934160, 6683178, 6650256, 5085288, 7528105, 7329252, 6378112
6151380, 6844866, 4723109, 8544896, 5126719, 5890312, 5934363, 7036453
7610362, 8426816, 8433026, 7270434, 7172531, 6451626, 8247855, 5497338
6324944, 6874522, 7175513, 6960489, 7341598, 8576156, 6797677, 8342923
5895190, 6756739, 7150470, 7593835, 7356443, 7044551, 8227106, 4695511
7298688, 5747462, 8556340, 7197445, 5348308, 7937113, 8341623, 7569205
8409848, 6053134, 6163771, 6851669, 6181488, 6375150, 6825866, 7210921
7295780, 6345573, 7033630, 6954722, 6824129, 7523475, 7457766, 7309458
8324577, 6840740, 6804746, 7375611, 8268054, 6981690, 6512811, 6988017
7375613, 8344399, 7340448, 8362683, 7375617, 8251247, 5933656, 6005347
9145204, 6599920, 7238230, 6379441, 6452375, 6352003, 6833965, 7136489
6610218, 7612639, 6392076, 7225204, 9119194, 5476236, 9442328, 7609057
7609058, 6605106, 6374297, 6193945, 4693355, 8217795, 7039896, 7432514
7330909, 6952701, 7190270, 8287155, 7587008, 7207932, 6802650, 7189447
8481935, 4598439, 9442331, 6615740, 7155655, 6749617, 9442335, 7159505
5868257, 5727166, 7173005, 6917874, 9442339, 7013768, 7691766, 7385253
7291739, 7225720, 7257770, 7363767, 7244238, 6941717, 8267348, 7710551
8354686, 7247217, 8328954, 7299153, 8909984, 6681695, 8702276, 9119284
8217011, 7661251, 6265559, 6823287, 6991626, 6954829, 5259835, 6500033
5923486, 7432601, 7022234, 8534387, 5147386, 7697802, 6653934, 7375644
6490140, 7662491, 8331466, 5623467, 6070225, 6635214, 7396409, 6638558
7038750, 6714608, 6838714, 6870937, 7219752, 7263842, 7278117, 6882739
5404871, 8836667, 8373286, 7393292, 6678845, 6903051, 7936793, 6600051
7155248, 4966512, 7155249, 7197637, 8836308, 8568402, 8568404, 8568405
8431487, 5704108, 6343150, 7280764, 6923450, 7643632, 6145177, 8836671
8310931, 6640411, 8347704, 8836675, 7155250, 7155251, 8836677, 7155252
8836678, 7155253, 8292378, 7155254, 6219529, 7411865, 8227091, 8340379
7276960, 6145687, 7659217, 5863926, 7022905, 6852598, 7123643, 6596564
6413089, 6851438, 8836681, 8836683, 8836684, 8836686, 7579469, 7494333
7315642, 8340383, 6786022, 8340387, 6926448, 7600026, 7462072, 6679303
8815639, 7197583, 7172752, 7326645, 7008262, 9173244, 9173248, 7573151
8490879, 7477934, 6725634, 6733655, 6799205, 6980597, 7499353, 6084232
6014513, 7140204, 7254987, 8833280, 6647480, 6120004, 7693128, 6760697
6051177, 8247215, 6858062, 7189645, 6844739, 6768251, 7196863, 5630796
7378661, 7378735, 5970301, 6705822, 8290506, 6658484, 7599944, 9173253
8309623, 7125408, 7257461, 6987790, 7568556, 6919819, 8886674, 5883691
6955744, 7801939, 6074620, 7149004, 6857917, 8283650, 6445948, 5929055
6110752, 7552042, 8210889, 8287504, 6506617, 7306915, 6271590, 5386204
6976005, 8330783, 7606362, 5377242, 7043989, 8309632, 7575925, 6870047
8309637, 5902053, 8309639, 7028176, 6827260, 7588384, 4726401, 6720712
5910650, 6752765, 6971433, 6024730, 8315482, 6628122, 8239142, 9352164
5695562, 4637902, 7345904, 8309642, 6994160, 8556586, 6404447, 8220734
6919764, 7597354, 7523787, 6029179, 5231155, 6455659
Earlier there was a dictionary corruption during CTAS in parallel due to inconsistency in SEG$ table. Then customer did PITR. Now the customer is getting an ORA-600 [kksfbc-reparse-infinite-loop] during a data load following a point in time recovery.