5月26日某客户告警日志中出现的ORA-00600 [4400],[48]错误记录,并产生了trace文件:,
*** SERVICE NAME:(ETL) 2010-05-26 16:45:45.930 *** SESSION ID:(262.12024) 2010-05-26 16:45:45.930 *** 2010-05-26 16:45:45.930 ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], [] ORA-10387: parallel query server interrupt (normal) ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedst+001c bl ksedst1 0FFFFFFFF ? 000000020 ? ksedmp+0290 bl ksedst 1048DFFC0 ? ksfdmp+0018 bl 03F5B014 kgeriv+0108 bl _ptrgl kgeasi+0118 bl kgeriv 10496FEC8 ? 000000002 ? 700000010008000 ? 110000AD0 ? 110190290 ? ktcddt+012c bl kgeasi 110190110 ? 110450040 ? 113000001130 ? 200000002 ? 100000001 ? 000000000 ? 000000030 ? 000000013 ? ktcsod+0384 bl ktcddt 0000003C0 ? 000000018 ? 1048D4808 ? kssdch_stage+0758 bl _ptrgl .................................................... ----- End of Call Stack Trace ----- ******************* Dumping process map **************** 377144 : ora_p001_ETL 100000000 93710K read/exec oracle 1100007ef 2372K read/write oracle 9fffffff0000000 44K read/exec /usr/ccs/bin/usla64 9fffffff000b30a 0K read/write /usr/ccs/bin/usla64 900000000375880 484K read/exec /usr/lib/liblvm.a[shr_64.o] 9001000a0121ac8 135K read/write /usr/lib/liblvm.a[shr_64.o] 900000000426180 74K read/exec /usr/lib/libcfg.a[shr_64.o] 9001000a00f5ed0 26K read/write /usr/lib/libcfg.a[shr_64.o] 90000000034e280 2K read/exec /usr/lib/libcrypt.a[shr_64.o] 9001000a00d4760 0K read/write /usr/lib/libcrypt.a[shr_64.o] 90000000069a7b0 4K read/exec /usr/lib/libc.a[aio_64.o] 9001000a022a340 0K read/write /usr/lib/libc.a[aio_64.o] 90000000035f300 85K read/exec /usr/lib/libodm.a[shr_64.o] 9001000a00d5c08 35K read/write /usr/lib/libodm.a[shr_64.o] 9000000003ef080 83K read/exec /usr/lib/libperfstat.a[shr_64.o] 9001000a01d0818 9K read/write /usr/lib/libperfstat.a[shr_64.o] 900000000358000 0K read/exec /usr/lib/libdl.a[shr_64.o] 9001000a0219000 0K read/write /usr/lib/libdl.a[shr_64.o] 9000000007ff100 8588K read/exec /oracle/product/10.2.0/lib/libjox10.a[shr.o] 8001000a0000038 585K read/write /oracle/product/10.2.0/lib/libjox10.a[shr.o] 9000000004a0000 228K read/exec /usr/lib/libpthreads.a[shr_xpg5_64.o] 9001000a0144000 558K read/write /usr/lib/libpthreads.a[shr_xpg5_64.o] 900000000045500 2966K read/exec /usr/lib/libc.a[shr_64.o] 9001000a0000788 844K read/write /usr/lib/libc.a[shr_64.o] Total 110843K ******************* End of process map dump **************** =================================================== PROCESS STATE ------------- Process global information: process: 7000000cf48bf98, call: 7000000b40ebdf0, xact: 7000000cdece7a8, curses: 7000000cf666540, usrses: 7000000cf666540 ---------------------------------------- SO: 7000000cf48bf98, type: 2, owner: 0, flag: INIT/-/-/0x00 (process) Oracle pid=43, calls cur/top: 7000000b40ebdf0/7000000b40ebdf0, flag: (0) - int error: 0, call error: 0, sess error: 0, txn error 0 (post info) last post received: 115 0 4 last post received-location: kslpsr last process to post me: 7000000cf479c38 1 6 last post sent: 0 0 251 last post sent-location: kxfpqr: QC last process posted by me: 7000000cf48a7f8 10 0 (latch info) wait_event=0 bits=0 Process Group: DEFAULT, pseudo proc: 7000000cf50bd50 O/S info: user: oracle, term: UNKNOWN, ospid: 377144 OSD pid info: Unix process pid: 377144, image: oracle@etl_a (P001) SO: 7000000cf666540, type: 4, owner: 7000000cf48bf98, flag: INIT/-/-/0x00 (session) sid: 262 trans: 7000000cdece7a8, creator: 7000000cf48bf98, flag: (c0100041) USR/- BSY/-/-/DEL/-/- DID: 0001-0028-000FFB1E, short-term DID: 0001-002B-00151DA1 txn branch: 7000000cdf412c8 oct: 0, prv: 0, sql: 0, psql: 0, user: 26/ETL O/S info: user: bo, term: pts/0, ospid: 377144, machine: etl_a program: oracle@etl_a (P001) last wait for 'SQL*Net message from dblink' blocking sess=0x0 seq=45 wait_time=476 seconds since wait started=0 driver id=0, #bytes=1, =0 Dumping Session Wait History for 'SQL*Net message from dblink' count=1 wait_time=476 driver id=0, #bytes=1, =0 for 'SQL*Net message to dblink' count=1 wait_time=1 driver id=0, #bytes=1, =0 for 'log file sync' count=1 wait_time=7483 buffer#=ca9, =0, =0 for 'log file sync' count=1 wait_time=358 buffer#=ca9, =0, =0 for 'PX Deq: Execution Msg' count=1 wait_time=15527 sleeptime/senderid=1001ffff, passes=1, =0 for 'log file sync' count=1 wait_time=11855 buffer#=c57, =0, =0 for 'PX Deq: Execution Msg' count=1 wait_time=277142 sleeptime/senderid=1001ffff, passes=1, =0 for 'direct path write' count=1 wait_time=25 file number=5, first dba=103ca, block cnt=3 for 'direct path write' count=1 wait_time=6 file number=4, first dba=1abe1, block cnt=8 for 'row cache lock' count=1 wait_time=3307 cache id=5, mode=0, request=5 temporary object counter: 0 ---------------------------------------- Virtual Thread: kgskvt: 7000000c0e86b58, sess: 7000000cf666540, vc: 0, proc: 7000000cf48bf98 consumer group cur: OTHER_GROUPS (upd? 0), mapped: DEFAULT_CONSUMER_GROUP, orig: vt_state: 0x200, vt_flags: 0x30, blkrun: 0 is_assigned: 1, in_sched: 0 (0) vt_active: 0 (pending: 0) used quanta: 0 (cg: 0) cpu start time: 0, quantum status: 0x0 quantum checks to skip: 0, check thresh: 0 idle time: 0, active time: 0 (cg: 0) cpu yields: 0 (cg: 0), waits: 0 (cg: 0), wait time: 0 (cg: 0) queued time outs: 0, time: 0 (cur 0, cg 0) calls aborted: 0, num est exec limit hit: 0 undo current: 0k max: 0k ---------------------------------------- UOL used : 0 locks(used=0, free=2) KGX Atomic Operation Log 7000000bdcf0568 Mutex 0(0, 0) idn 0 oper NONE Cursor Parent uid 262 efd 9 whr 11 slp 0 oper=NONE pt1=7000000c69d0878 pt2=7000000bef37090 pt3=7000000b47119c8 pt4=0 u41=1 stt=0 KGX Atomic Operation Log 7000000bdcf05b0 Mutex 7000000bdc2d028(0, 2) idn 0 oper NONE Cursor Stat uid 262 efd 12 whr 1 slp 0 oper=NONE pt1=7000000bdc2cef8 pt2=0 pt3=0 pt4=0 u41=0 stt=0 KGX Atomic Operation Log 7000000bdcf05f8 Mutex 0(0, 0) idn 0 oper NONE Library Cache uid 262 efd 0 whr 0 slp 0 ---------------------------------------- SO: 7000000cf7e3ec8, type: 5, owner: 7000000cf666540, flag: INIT/-/-/0x00 (enqueue) DX-00000010-00000000 DID: 0001-002B-00151DA9 lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2 res: 0x7000000cf8bd0b8, mode: X, lock_flag: 0x0 own: 0x7000000cf666540, sess: 0x7000000cf666540, proc: 0x7000000cf48bf98, prv: 0x7000000cf8bd0c8
从trace文件名就可以看出出错的是某个并行子进程,其最近调用堆栈为ktcsod->ktcddt->kgeasi(报错),ktcsod与ktcddt均为Oracle中内核事务控制模块函数 (Kernel Transaction Control),又该并行子进程最近等待为”SQL*Net message from dblink”事件,共等待476s(wait_time=476 seconds)。trace显示该并行会话包含分布式事务锁:
SO: 7000000cf7e3ec8, type: 5, owner: 7000000cf666540, flag: INIT/-/-/0x00
(enqueue) DX-00000010-00000000 DID: 0001-002B-00151DA9
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x7000000cf8bd0b8, mode: X, lock_flag: 0x0
own: 0x7000000cf666540, sess: 0x7000000cf666540, proc: 0x7000000cf48bf98, prv: 0x7000000cf8bd0c8
即DX(Distributed transaction entry)锁;可以猜测该并行会话在等待远程事务被清理,但等待超时,故报错。
查询metalink可以发现Bug 5223587与该错误一致,该Bug的特征为:
1. 出现ORA-600[4400],且调用栈为ktcddt ,说明涉及到了分布式事务
2. trace显示当时无打开的游标
3. 可以看到当时的等待为’SQL*Net message from dblink’事件
Metalink文档[ID 444108.1]描 述该错误会在涉及远程数据库的并行DML操作中出现,但该错误仅会在清理阶段出现,故实际无影响(It is not critical as such, since it is happening during the cleanup of the operation.)。
该Bug可以通过实施Patch 5223587修复, 目前该补丁只有对应基础版本10.2.0.2和10.2.0.3的版本,该补丁在 10.2.0.3 上有AIX power(64 bit)和Solaris Sparc(64 bit)的版本,10.2.0.2上只有linux-64 与HP-UX的版本。因为该Bug一般不会产生不良影响,故实际上不建议应用补丁。
Applies to:
Oracle Server – Enterprise Edition – Version: 9.2.0.1 to 10.2.0.3 – Release: 9.2 to 10.2
Information in this document applies to any platform.
***Checked for currency 30-Jul-2010***
Symptoms
ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], []
ORA-10387: parallel query server interrupt (normal)
This occurs while executing distributed query in parallel
The stack trace includes:
ksedmp kgeriv kgeasi ktcddt ktcsod kssdch_stage
ktcbod kssdch_stage ksuxds ….
The ORA-600 [4400] may also be accompanied or replaced by an ORA-600 [KXFPG1SG INV].
The call stack may be similar to:
kxfpg1sg kxfpgsg kxfrAllocSlaves kxfrialo kxfralo qerpx_rowsrc_start qerpxStart selexe ctcdrv opiexe
Cause
This is unpublished bug: 5223587 “ORA-00600 [4400], [48] AND ORA-10387”
Details: ORA-600 [4400] occurs in one or more query slaves
Solution
Fixed releases: 10.2.0.4, 11g
Please check My Oracle Support to see if an interim patch exists for your platform
To check availability and obtain a patch from My Oracle Support:
a) Click on Patches & Updates Folder
b) Click on Patch Search.
c) Enter patch number: 5223587
d) Select your Platform
e) Click Search
Hdr: 5704068 10.1.0.5.0 RDBMS 10.1.0.5.0 PAR EXECUTION PRODID-5 PORTID-23 ORA-600 5223587
Abstract: ORA-600 [KXFPG1SG:INV] DURING SLAVE CREATION
*** 12/07/06 02:28 pm ***
TAR:
—-
6004392.992
PROBLEM:
——–
ORA-600 [kxfpg1sg:INV] happens to different kind of SQLs running in PARALLEL
So far, it happened to insert /*+ append */, CTAS, SELECTs …
DIAGNOSTIC ANALYSIS:
——————–
We actually have a lot of ORA-600 [4400] before we hit ORa-600 [kxfpg1sg:INV]
Fri Dec 1 03:17:12 2006
Errors in file
/vol101/oraprod/granite/admin/bdump/granite_p007_29114.trc:
ORA-600: internal error code, arguments: [4400], [48], [], [], [], [],
[], []
ORA-10387: parallel query server interrupt (normal)
Fri Dec 1 03:20:47 2006
Private_strands 198 at log switch
………………
Fri Dec 1 03:23:50 2006
Errors in file
/vol101/oraprod/granite/admin/bdump/granite_j035_12597.trc:
ORA-600: internal error code, arguments: [kxfpg1sg:INV],
[0xFFFFFFFF79CC70C8], [], [], [], [],
[], []
Fri Dec 1 03:25:34 2006
Completed checkpoint up to RBA [0x58aec.2.10], SCN: 16957015675
We thought that this could be related to bug 5685731. However, the process
that fails with OERI:[kxfpg1sg:INV] does use DB LINKs
last wait for ‘process startup’ blocking sess=0x0 seq=2388 wait_time=654054
type=50, process#=3aa, waited=7e1
WORKAROUND:
———–
none so far
RELATED BUGS:
————-
Bug 2797383 – ORA-600 [kxfpg1sg:INV] can occur during slave creation.
REPRODUCIBILITY:
—————-
at customer site only, but not at will
TEST CASE:
———-
not available
STACK TRACE:
————
kxfpg1sg
kxfpgsg
kxfrAllocSlaves
kxfrialo
kxfralo
qerpxStart
SUPPORTING INFORMATION:
———————–
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-
gathering info from customer
DIAL-IN INFORMATION:
——————–
OWC
IMPACT DATE:
————
asap
These are executive reporting reports which we are not able to produce
due to these errors, these reports are very crucial reports being
looked by executives on daily basis, this is the reason we want
to get this resolved immediately, hence we have requested to
upgrade to sev 1