5月26日某客户告警日志中出现的ORA-00600 [4400],[48]错误记录,并产生了trace文件:,
*** SERVICE NAME:(ETL) 2010-05-26 16:45:45.930 *** SESSION ID:(262.12024) 2010-05-26 16:45:45.930 *** 2010-05-26 16:45:45.930 ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [4400], [48], [], [], [], [], [], [] ORA-10387: parallel query server interrupt (normal) ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedst+001c bl ksedst1 0FFFFFFFF ? 000000020 ? ksedmp+0290 bl ksedst 1048DFFC0 ? ksfdmp+0018 bl 03F5B014 kgeriv+0108 bl _ptrgl kgeasi+0118 bl kgeriv 10496FEC8 ? 000000002 ? 700000010008000 ? 110000AD0 ? 110190290 ? ktcddt+012c bl kgeasi 110190110 ? 110450040 ? 113000001130 ? 200000002 ? 100000001 ? 000000000 ? 000000030 ? 000000013 ? ktcsod+0384 bl ktcddt 0000003C0 ? 000000018 ? 1048D4808 ? kssdch_stage+0758 bl _ptrgl .................................................... ----- End of Call Stack Trace ----- ******************* Dumping process map **************** 377144 : ora_p001_ETL 100000000 93710K read/exec oracle 1100007ef 2372K read/write oracle 9fffffff0000000 44K read/exec /usr/ccs/bin/usla64 9fffffff000b30a 0K read/write /usr/ccs/bin/usla64 900000000375880 484K read/exec /usr/lib/liblvm.a[shr_64.o] 9001000a0121ac8 135K read/write /usr/lib/liblvm.a[shr_64.o] 900000000426180 74K read/exec /usr/lib/libcfg.a[shr_64.o] 9001000a00f5ed0 26K read/write /usr/lib/libcfg.a[shr_64.o] 90000000034e280 2K read/exec /usr/lib/libcrypt.a[shr_64.o] 9001000a00d4760 0K read/write /usr/lib/libcrypt.a[shr_64.o] 90000000069a7b0 4K read/exec /usr/lib/libc.a[aio_64.o] 9001000a022a340 0K read/write /usr/lib/libc.a[aio_64.o] 90000000035f300 85K read/exec /usr/lib/libodm.a[shr_64.o] 9001000a00d5c08 35K read/write /usr/lib/libodm.a[shr_64.o] 9000000003ef080 83K read/exec /usr/lib/libperfstat.a[shr_64.o] 9001000a01d0818 9K read/write /usr/lib/libperfstat.a[shr_64.o] 900000000358000 0K read/exec /usr/lib/libdl.a[shr_64.o] 9001000a0219000 0K read/write /usr/lib/libdl.a[shr_64.o] 9000000007ff100 8588K read/exec /oracle/product/10.2.0/lib/libjox10.a[shr.o] 8001000a0000038 585K read/write /oracle/product/10.2.0/lib/libjox10.a[shr.o] 9000000004a0000 228K read/exec /usr/lib/libpthreads.a[shr_xpg5_64.o] 9001000a0144000 558K read/write /usr/lib/libpthreads.a[shr_xpg5_64.o] 900000000045500 2966K read/exec /usr/lib/libc.a[shr_64.o] 9001000a0000788 844K read/write /usr/lib/libc.a[shr_64.o] Total 110843K ******************* End of process map dump **************** =================================================== PROCESS STATE ------------- Process global information: process: 7000000cf48bf98, call: 7000000b40ebdf0, xact: 7000000cdece7a8, curses: 7000000cf666540, usrses: 7000000cf666540 ---------------------------------------- SO: 7000000cf48bf98, type: 2, owner: 0, flag: INIT/-/-/0x00 (process) Oracle pid=43, calls cur/top: 7000000b40ebdf0/7000000b40ebdf0, flag: (0) - int error: 0, call error: 0, sess error: 0, txn error 0 (post info) last post received: 115 0 4 last post received-location: kslpsr last process to post me: 7000000cf479c38 1 6 last post sent: 0 0 251 last post sent-location: kxfpqr: QC last process posted by me: 7000000cf48a7f8 10 0 (latch info) wait_event=0 bits=0 Process Group: DEFAULT, pseudo proc: 7000000cf50bd50 O/S info: user: oracle, term: UNKNOWN, ospid: 377144 OSD pid info: Unix process pid: 377144, image: oracle@etl_a (P001) SO: 7000000cf666540, type: 4, owner: 7000000cf48bf98, flag: INIT/-/-/0x00 (session) sid: 262 trans: 7000000cdece7a8, creator: 7000000cf48bf98, flag: (c0100041) USR/- BSY/-/-/DEL/-/- DID: 0001-0028-000FFB1E, short-term DID: 0001-002B-00151DA1 txn branch: 7000000cdf412c8 oct: 0, prv: 0, sql: 0, psql: 0, user: 26/ETL O/S info: user: bo, term: pts/0, ospid: 377144, machine: etl_a program: oracle@etl_a (P001) last wait for 'SQL*Net message from dblink' blocking sess=0x0 seq=45 wait_time=476 seconds since wait started=0 driver id=0, #bytes=1, =0 Dumping Session Wait History for 'SQL*Net message from dblink' count=1 wait_time=476 driver id=0, #bytes=1, =0 for 'SQL*Net message to dblink' count=1 wait_time=1 driver id=0, #bytes=1, =0 for 'log file sync' count=1 wait_time=7483 buffer#=ca9, =0, =0 for 'log file sync' count=1 wait_time=358 buffer#=ca9, =0, =0 for 'PX Deq: Execution Msg' count=1 wait_time=15527 sleeptime/senderid=1001ffff, passes=1, =0 for 'log file sync' count=1 wait_time=11855 buffer#=c57, =0, =0 for 'PX Deq: Execution Msg' count=1 wait_time=277142 sleeptime/senderid=1001ffff, passes=1, =0 for 'direct path write' count=1 wait_time=25 file number=5, first dba=103ca, block cnt=3 for 'direct path write' count=1 wait_time=6 file number=4, first dba=1abe1, block cnt=8 for 'row cache lock' count=1 wait_time=3307 cache id=5, mode=0, request=5 temporary object counter: 0 ---------------------------------------- Virtual Thread: kgskvt: 7000000c0e86b58, sess: 7000000cf666540, vc: 0, proc: 7000000cf48bf98 consumer group cur: OTHER_GROUPS (upd? 0), mapped: DEFAULT_CONSUMER_GROUP, orig: vt_state: 0x200, vt_flags: 0x30, blkrun: 0 is_assigned: 1, in_sched: 0 (0) vt_active: 0 (pending: 0) used quanta: 0 (cg: 0) cpu start time: 0, quantum status: 0x0 quantum checks to skip: 0, check thresh: 0 idle time: 0, active time: 0 (cg: 0) cpu yields: 0 (cg: 0), waits: 0 (cg: 0), wait time: 0 (cg: 0) queued time outs: 0, time: 0 (cur 0, cg 0) calls aborted: 0, num est exec limit hit: 0 undo current: 0k max: 0k ---------------------------------------- UOL used : 0 locks(used=0, free=2) KGX Atomic Operation Log 7000000bdcf0568 Mutex 0(0, 0) idn 0 oper NONE Cursor Parent uid 262 efd 9 whr 11 slp 0 oper=NONE pt1=7000000c69d0878 pt2=7000000bef37090 pt3=7000000b47119c8 pt4=0 u41=1 stt=0 KGX Atomic Operation Log 7000000bdcf05b0 Mutex 7000000bdc2d028(0, 2) idn 0 oper NONE Cursor Stat uid 262 efd 12 whr 1 slp 0 oper=NONE pt1=7000000bdc2cef8 pt2=0 pt3=0 pt4=0 u41=0 stt=0 KGX Atomic Operation Log 7000000bdcf05f8 Mutex 0(0, 0) idn 0 oper NONE Library Cache uid 262 efd 0 whr 0 slp 0 ---------------------------------------- SO: 7000000cf7e3ec8, type: 5, owner: 7000000cf666540, flag: INIT/-/-/0x00 (enqueue) DX-00000010-00000000 DID: 0001-002B-00151DA9 lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2 res: 0x7000000cf8bd0b8, mode: X, lock_flag: 0x0 own: 0x7000000cf666540, sess: 0x7000000cf666540, proc: 0x7000000cf48bf98, prv: 0x7000000cf8bd0c8
从trace文件名就可以看出出错的是某个并行子进程,其最近调用堆栈为ktcsod->ktcddt->kgeasi(报错),ktcsod与ktcddt均为Oracle中内核事务控制模块函数 (Kernel Transaction Control),又该并行子进程最近等待为”SQL*Net message from dblink”事件,共等待476s(wait_time=476 seconds)。trace显示该并行会话包含分布式事务锁:
SO: 7000000cf7e3ec8, type: 5, owner: 7000000cf666540, flag: INIT/-/-/0x00
(enqueue) DX-00000010-00000000 DID: 0001-002B-00151DA9
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0x7000000cf8bd0b8, mode: X, lock_flag: 0x0
own: 0x7000000cf666540, sess: 0x7000000cf666540, proc: 0x7000000cf48bf98, prv: 0x7000000cf8bd0c8
即DX(Distributed transaction entry)锁;可以猜测该并行会话在等待远程事务被清理,但等待超时,故报错。
查询metalink可以发现Bug 5223587与该错误一致,该Bug的特征为:
1. 出现ORA-600[4400],且调用栈为ktcddt ,说明涉及到了分布式事务
2. trace显示当时无打开的游标
3. 可以看到当时的等待为’SQL*Net message from dblink’事件
Metalink文档[ID 444108.1]描 述该错误会在涉及远程数据库的并行DML操作中出现,但该错误仅会在清理阶段出现,故实际无影响(It is not critical as such, since it is happening during the cleanup of the operation.)。
该Bug可以通过实施Patch 5223587修复, 目前该补丁只有对应基础版本10.2.0.2和10.2.0.3的版本,该补丁在 10.2.0.3 上有AIX power(64 bit)和Solaris Sparc(64 bit)的版本,10.2.0.2上只有linux-64 与HP-UX的版本。因为该Bug一般不会产生不良影响,故实际上不建议应用补丁。