This is a 10.2.0.3 RAC system running on SunOS 5.10. Starting in early August, the following entries appeared repeatedly in the alert log:
Tue Aug 3 15:17:04 2010
Errors in file /u01/app/oracle/admin/prsi061/udump/prsi061a_ora_27774.trc:
ORA-07445: exception encountered: core dump [__lwp_kill()+8] [SIGIOT] [unknown code] [0x6C7E00000000] [] []
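It is unusual to see SIGIOT together with ORA-07445. The signal is traditionally associated with a hardware trap; its name comes from the PDP-11 IOT instruction. On Solaris, however, SIGIOT has the same signal number as SIGABRT (6), which is the signal that abort() raises.
Let's take a look at the trace file.
The call stack in the trace file is as follows: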
ksedmp()+744 CALL ksedst() 000000840 ? 1066C60CC ? 000000000 ? 1066C2BC0 ? 1066C1928 ? 1066C2328 ?
ssexhd()+1240 CALL ksedmp() 000106400 ? 106530764 ? 106530000 ? 000106530 ? 000106400 ? 106530764 ?
__sighndlr()+12 PTR_CALL 0000000000000000 10652D000 ? 1066C9EF0 ? 10652A72C ? 00010652D ? 000000006 ? 000000067 ?
call_user_handler()+992 CALL __sighndlr() 000000006 ? 1066C9EF0 ? 1066C9C10 ? 10033B1C0 ? 000000000 ? 000000005 ?
sigacthandler()+84 CALL call_user_handler() FFFFFFFF7D500200 ? FFFFFFFF7D500200 ? 1066C9C10 ? 000000009 ? 000000000 ? 000000000 ?
__lwp_kill()+8 PTR_CALL 0000000000000000 000000000 ? 1066C9EF0 ? 1066C9C10 ? FFFFFFFF7D500200 ? 000000000 ? FFFFFFFF7C73C000 ?
raise()+16 FRM_LESS _pthread_kill() 000000000 ? 000000006 ? FFFFFFFF7F60AC48 ? FFFFFFFF7C54B048 ? 000000005 ? FFFFFFFF7C74CB50 ?
abort()+208 CALL raise() 000000006 ? 000000006 ? 000000005 ? FFFFFFFF7C748500 ? FFFFFFFF7D500200 ? 000000005 ?
vcsipc_poll()+1724 CALL FFFFFFFF7F5477E0 000001DA0 ? FFFFFFFF7F550D00 ? FFFFFFFF7F4205A8 ? 0001F107C ? 000000001 ? 000000000 ?
skgxpwait()+5604 CALL vcsipc_poll() FFFFFFFF7FFECE90 ? 106747378 ? 000001FD0 ? FFFFFFFF7FFE78C8 ? 000001C00 ? 000200000 ?
ksxpwait()+1804 CALL 0000000106524980 FFFFFFFF7F54FC28 ? 106747378 ? 000000000 ? FFFFFFFF7FFECF68 ? 0000004E2 ? FFFFFFFF7FFECE90 ?
ksliwat()+2952 CALL ksxpwait() 000000000 ? 000101000 ? 000000000 ? 10652DB98 ? 000001000 ? 106533FC8 ?
kslwaitns_timed()+48 CALL ksliwat() 000000000 ? 000000002 ? 00000007D ? 5798B6C18 ? 5798B6BA0 ? 000032033 ?
kskthbwt()+232 CALL kslwaitns_timed() 00000007D ? 000000001 ? 00000007C ? 000000000 ? FFFFFFFF7FFED3B8 ? 000000001 ?
kslwait()+116 CALL kskthbwt() 00000007D ? 00000007C ? 000000000 ? 000000007 ? 000032033 ? 000000001 ?
ksxprcv()+916 CALL kslwait() 0925A2B0A ? 000000000 ? 00000000A ? 00000000A ? 000032033 ? 000000001 ?
kclwcrs()+960 CALL ksxprcv() 0001056DE ? 10652B118 ? 00000007D ? 1056DE598 ? 00010652A ? 1056DE000 ?
kclgclk()+10052 CALL kclwcrs() 3800143A8 ? 000000000 ? 000000000 ? 519F716A0 ? 000000007 ? 000106535 ?
kcbzib()+19288 CALL kclgclk() 000106400 ? 00000000C ? FFFFFFFF7FFF5EB8 ? 000000000 ? 000000000 ? 000105400 ?
kcbgtcr()+10528 CALL kcbzib() 5665FB520 ? FFFFFFFF7C058170 ? 000105C00 ? 000000000 ? 000000006 ? FFFFFFFF7FFF44A0 ?
ktrget()+260 CALL kcbgtcr() FFFFFFFF7FFF5028 ? FFFFFFFF7FFF502C ? 5665FB520 ? 000000000 ? 000000000 ? 57FF6DA18 ?
kdst_fetch()+872 CALL ktrget() FFFFFFFF7C058160 ? FFFFFFFF7C0580E0 ? 00000023F ? 000000000 ? FFFFFFFF7C058170 ? 3800172B8 ?
kdstf0100101km()+504 CALL kdst_fetch() FFFFFFFF7C058158 ? 000000000 ? FFFFFFFF7FFF5688 ? 000106528 ? 00000023F ? 00000FC00 ?
kdsttgr()+27872 CALL kdstf0100101km() FFFFFFFF7C058158 ? 4E6AB809E ? 000000001 ? 000000000 ? 54C540BB8 ? FFFFFFFF7C058038 ?
qertbFetch()+720 CALL kdsttgr() 000000000 ? 000000000 ? FFFFFFFF7C054EA8 ? FFFFFFFF7C058158 ? 000000004 ? 1032109C0 ?
qerflFetch()+172 PTR_CALL 0000000000000000 000000001 ? 000000001 ? 1056DE068 ? FFFFFFFF7FFF6348 ? 10652B298 ? 000000002 ?
opifch2()+8204 PTR_CALL 0000000000000000 FFFFFFFF7C058898 ? 102527EE0 ? FFFFFFFF7FFF69D0 ? 000000001 ? 103210000 ? 10320CA00 ?
opifch()+52 CALL opifch2() FFFFFFFF7FFF6878 ? 000000090 ? 000000000 ? 000000001 ? 000000000 ? 105A2C000 ?
opipls()+3532 CALL opifch() 000000005 ? 000000002 ? FFFFFFFF7FFF6F20 ? 000000002 ? 000000000 ? 000000001 ?
opiodr()+1548 PTR_CALL 0000000000000000 000106400 ? 10653A000 ? 000105800 ? 000000010 ? 00010653A ? FFFFFFFF7B6392D8 ?
rpidrus()+196 CALL opiodr() 10576DC08 ? 000000066 ? 10652B000 ? 000000001 ? FFFFFFFF7C03A830 ? 00010652D ?
skgmstack()+168 PTR_CALL 0000000000000000 FFFFFFFF7FFF8350 ? 000000006 ? FFFFFFFF7FFF8100 ? 10652A000 ? 000000066 ? 1056DE000 ?
rpidru()+172 CALL skgmstack() 10034D1E0 ? FFFFFFFF7FFF8350 ? 00000F618 ? 10034D1E0 ? FFFFFFFF7FFF8350 ? FFFFFFFF7FFF8328 ?
rpiswu2()+500 PTR_CALL 0000000000000000 FFFFFFFF7FFF8B18 ? 1056C3000 ? 1056C2B90 ? 1056C0F50 ? 000000C10 ? 000000182 ?
rpidrv()+1696 CALL rpiswu2() 000000000 ? 10652B298 ? 000000000 ? FFFFFFFF7FFF84E8 ? 1056DE000 ? 00010652A ?
psddr0()+516 CALL rpidrv() FFFFFFFF7FFF8EC0 ? 000105C00 ? FFFFFFFF7FFF89C4 ? 000000002 ? FFFFFFFF7B615F60 ? 00010652D ?
psdnal()+512 CALL psddr0() 106541CE0 ? 10652B298 ? 000000066 ? 1056DE068 ? 000000008 ? 00000000A ?
pevm_BFTCHC()+308 PTR_CALL 0000000000000000 FFFFFFFF7FFF9CA8 ? 00000000A ? 000000000 ? FFFFFFFF7B6396F8 ? 106537000 ? 10652B000 ?
pfrinstr_FTCHC()+180 CALL pevm_BFTCHC() 000000000 ? 105AE7600 ? 555E62580 ? FFFFFFFF7C069EE8 ? FFFFFFFF7B6396F8 ? 000000000 ?
pfrrun_no_tool()+72 PTR_CALL 0000000000000000 000000000 ? 000000000 ? FFFFFFFF7C069F50 ? FFFFFFFF7C069EE8 ? 0000001EE ? 555E62892 ?
pfrrun()+832 CALL pfrrun_no_tool() FFFFFFFF7C069EE8 ? 555E6288E ? FFFFFFFF7C069F50 ? 105B1BF50 ? 000002001 ? 000002001 ?
plsql_run()+696 CALL pfrrun() FFFFFFFF7C035420 ? FFFFFFFF7C069EE8 ? 000002001 ? 000200000 ? FFFFFFFF7C069EE8 ? 0001056DE ?
peicnt()+260 CALL plsql_run() 000000006 ? 000000000 ? FFFFFFFF7B63BBF8 ? FFFFFFFF7FFF9888 ? 000000180 ? 000000007 ?
kkxexe()+616 CALL peicnt() FFFFFFFF7FFFA808 ? 10652B298 ? 106541CE0 ? 106762258 ? 10652B000 ? 10652B000 ?
opiexe()+12736 CALL kkxexe() FFFFFFFF7B63F4B8 ? 106537000 ? 000106537 ? FFFFFFFF7FFF9CA8 ? 000000000 ? 54C0616F0 ?
kpoal8()+1912 CALL opiexe() 000106400 ? FFFFFFFF7C056EC0 ? 000000000 ? 000000000 ? 000000000 ? 57FDB9250 ?
opiodr()+1548 PTR_CALL 0000000000000000 0BFFFFC00 ? 000040008 ? 000000000 ? 000000820 ? 000105800 ? 106538260 ?
ttcpip()+1284 PTR_CALL 0000000000000000 10576DC08 ? 00000005E ? 10652B000 ? 000000001 ? FFFFFFFF7C03A830 ? 00010652D ?
opitsk()+1432 CALL ttcpip() 000000017 ? FFFFFFFF7FFFCFB0 ? 1056C3F6C ? 1056C1750 ? 000000000 ? 10652B118 ?
opiino()+1128 CALL opitsk() 106538268 ? 000000001 ? 000000000 ? 106538260 ? 1058884D0 ? 0FFFFFFFD ?
opiodr()+1548 PTR_CALL 0000000000000000 000106400 ? 10652DB98 ? 000106400 ? 10652D000 ? 000106400 ? 106538260 ?
opidrv()+896 CALL opiodr() 1065373D8 ? 00000003C ? 000106400 ? 1065381E0 ? 000106538 ? 00010652D ?
sou2o()+80 CALL opidrv() 10653A960 ? 000000000 ? 00000003C ? 106537698 ? 00000003C ? 000000000 ?
opimai_real()+124 CALL sou2o() FFFFFFFF7FFFF708 ? 00000003C ? 000000004 ? FFFFFFFF7FFFF730 ? 105E12000 ? 000105E12 ?
main()+152 CALL opimai_real() 000000002 ? FFFFFFFF7FFFF808 ? 104054D6C ? 1064D3220 ? 00247E3B4 ? 000014800 ?
_start()+380 CALL main() 000000002 ? 000000008 ? 000000000 ? FFFFFFFF7FFFF818 ? FFFFFFFF7FFFF928 ? FFFFFFFF7D500200 ?
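To see which code path actually raised the signal, you can parse the stack rows mechanically and read them innermost frame first. This is a minimal Python sketch. It assumes the row layout `<caller>()+<offset> <CALL|PTR_CALL|FRM_LESS> <callee> <args...>` seen in this trace; that layout is inferred from this dump, not taken from a documented Oracle format. `Frame`, `parse_frames` and `caller_of` are hypothetical helpers. Applied to the relevant rows, the sketch shows that abort() was called from vcsipc_poll(), which was itself called by skgxpwait().

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    caller: str   # function owning this frame, e.g. "abort"
    offset: int   # byte offset into that function, e.g. 208
    kind: str     # CALL, PTR_CALL or FRM_LESS
    callee: str   # called function or raw address


# Assumed row layout, inferred from the trace above (not an Oracle-documented format):
#   <caller>()+<offset> <CALL|PTR_CALL|FRM_LESS> <callee> <argument values...>
FRAME_RE = re.compile(
    r"^(?P<caller>\w+)\(\)\+(?P<offset>\d+)\s+(?P<kind>CALL|PTR_CALL|FRM_LESS)\s+(?P<callee>\S+)"
)


def parse_frames(dump: str) -> List[Frame]:
    """Parse one-frame-per-line stack rows; the innermost frame comes first."""
    frames = []
    for line in dump.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(Frame(m["caller"], int(m["offset"]), m["kind"], m["callee"]))
    return frames


def caller_of(frames: List[Frame], name: str) -> Optional[str]:
    """Return the function one level below `name` on the stack, i.e. the one that called it."""
    for i, frame in enumerate(frames):
        if frame.caller == name and i + 1 < len(frames):
            return frames[i + 1].caller
    return None


# Rows copied from the trace above, with the argument columns shortened.
SAMPLE = """\
__lwp_kill()+8 PTR_CALL 0000000000000000 000000000 ?
raise()+16 FRM_LESS _pthread_kill() 000000000 ? 000000006 ?
abort()+208 CALL raise() 000000006 ? 000000006 ?
vcsipc_poll()+1724 CALL FFFFFFFF7F5477E0 000001DA0 ?
skgxpwait()+5604 CALL vcsipc_poll() FFFFFFFF7FFECE90 ?
"""

frames = parse_frames(SAMPLE)
print(caller_of(frames, "abort"))        # vcsipc_poll
print(caller_of(frames, "vcsipc_poll"))  # skgxpwait
```

Parsing each row into fields, rather than grepping for names, keeps the caller/callee direction explicit. That direction is what matters when you ask who triggered abort().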
After checking with MOS (My Oracle Support), it was identified as a new problem introduced by applying Patch 5165885:
I have checked our internal bug database and the issue seems to be occurring due to the fix for Bug 5165885.
Action plan:
=============
Apply the patch for 6678154
https://updates.oracle.com/download/6678154.html
Workaround:
--------------
Remove the patch for 5165885.
Yes, the symptoms point to this bug. I would recommend applying the patch rather than using the workaround.
Patch 6678154 has now been applied for this case, and the system is still under observation.
Recorded here for future reference.
Query failed with ORA-7445 [__lwp_kill()+8] on Solaris [ID 563265.1]
Applies to:
Oracle Server – Enterprise Edition – Version: 10.2.0.3 and later [Release: 10.2 and later ]
Information in this document applies to any platform.
Symptoms
On a 3-node RAC, an instance was terminated during startup with ORA-600 [] and
ORA-7445 [__lwp_kill()+8].
The alert log contained many "process died" messages, and the instance aborted. When trying to
start up the instance, it raised the above ORA-600 errors and the instance terminated.
Cause
As mentioned in Note 6268598.8 and Note 458189.1, this problem occurs only when the fix for an unpublished bug is installed.
With that fix in place an instance can crash with
ORA-600 [kjblocalobj_nolock:lt] or ORA-600 [kjsmesm:svrmode]
in a RAC environment.
Solution
Apply the patch for 6678154.
Workaround:
Remove the patch for 5165885.
ORA-600 [kjblocalobj_nolock:lt] [ID 553041.1]
PURPOSE:
This article represents a partially published OERI note.
It has been published because the ORA-600 error has been
reported in at least one confirmed bug.
Therefore, the SUGGESTIONS section of this article may help
in terms of identifying the cause of the error.
This specific ORA-600 error may be considered for full publication
at a later date. If/when fully published, additional information
will be available here on the nature of this error.
SUGGESTIONS:
If the Known Issues section below does not help in terms of identifying
a solution, please submit the trace files and alert.log to Oracle
Support Services for further analysis.
Known Issues:
Bug# 6268598 See Note:6268598.8
OERI[kjblocalobj_nolock:lt] in RAC with fix for bug 5165885
Fixed: 10.2.0.3.P11
Is the fix for Bug 6268598 included in 10.2.0.4?
Applies to:
Oracle Server – Enterprise Edition – Version: 10.2.0.3 to 10.2.0.4 – Release: 10.2 to 10.2
Information in this document applies to any platform.
Goal
Background
Note 5165885.8 – Bug 5165885 – ORA-600 [kclcls_8] or [Kjbrchkpkeywait:Timeout] can occur in RAC. The fix for bug 5165885 is included in the 10.2.0.4 patchset.
However, when the one-off fix for bug 5165885 is applied on top of 10.2.0.3, it can cause other ORA-600 errors. These are fixed by another one-off patch, the fix for bug 6268598.
Note 6268598.8 – Bug 6268598 – OERI[kjblocalobj_nolock:lt] in RAC with fix for bug 5165885
However, bug 6268598 is not included in the list of bugs fixed in 10.2.0.4 (found in Note 401436.1).
Question
Does this mean that 10.2.0.4 will get ORA-600 errors caused by the fix for 5165885? Do we need to apply a fix for 6268598 on top of 10.2.0.4?
Solution
The fix for bug 6268598 is not included in 10204, because it is not needed. This problem occurs only when the fix for bug 5165885 is applied to 10.2.0.3.
Note 6268598.8 – Bug 6268598 – OERI[kjblocalobj_nolock:lt] in RAC with fix for bug 5165885
The fix for 5165885 is included in the 10204 patchset, but there it does NOT cause the ORA-600 errors
that you may get if you applied the fix to 10203. So the fix for 6268598 is not included in the
10204 patchset.
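The rule in this note can be written down as a small check. This is a minimal Python sketch; `version_tuple` and `needs_6268598` are hypothetical helpers, and the patch numbers are the ones discussed here. It encodes only the 6268598 rule from this note. The first note above (563265.1) recommends patch 6678154 instead, and that variant is not modelled.

```python
from typing import Set, Tuple

BACKPORT_5165885 = 5165885   # one-off fix for ORA-600 [kclcls_8] / [kjbrchkpkeywait:timeout]
FOLLOWUP_6268598 = 6268598   # one-off fix for the ORA-600 errors the backport can cause on 10.2.0.3


def version_tuple(version: str) -> Tuple[int, ...]:
    """'10.2.0.3.0' -> (10, 2, 0, 3, 0)"""
    return tuple(int(part) for part in version.split("."))


def needs_6268598(db_version: str, applied: Set[int]) -> bool:
    """Only 10.2.0.3 with the 5165885 one-off applied is exposed; 10.2.0.4
    ships the 5165885 fix without this side effect, so it never needs 6268598."""
    is_10203 = version_tuple(db_version)[:4] == (10, 2, 0, 3)
    return is_10203 and BACKPORT_5165885 in applied and FOLLOWUP_6268598 not in applied


print(needs_6268598("10.2.0.3.0", {5165885}))           # True
print(needs_6268598("10.2.0.3.0", {5165885, 6268598}))  # False
print(needs_6268598("10.2.0.4.0", set()))               # False
```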
NB. You can see which bugs are fixed in the 10204 patch set from the following note:
Note 401436.1 – 10.2.0.4 Patch Set – List of Bug Fixes by Problem Type
ORA-600 [KJBLOCAL_OBJ_NOLOCK:LT] And Is Seen In Alert Log After Application Of Patch 5165885
Applies to:
Oracle Server – Enterprise Edition – Version: 10.2.0.3.0
This problem can occur on any platform.
Symptoms
ORA-600 [KJBLOCALOBJ_NOLOCK:LT] encountered after applying Patch 5165885.
Stack Trace is similar to the following:
ksedst ksedmp ksfdmp kgerinv kgeasnmierr kjblocalobj_nolock kclcls kclgclks kcbzib kcbgtcr ktrget kdifbk qerixFetchByLogicalRowid qerjotFetch kpofchswcbk rpiswu2 kpofrws opifch2 opifch opiodr ttcpip opitsk opiino opiodr opidrv
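"Similar to the following" can be made concrete as an ordered-subsequence match. A stack matches if the key frames occur in the same order, with any other frames allowed in between. This is a minimal Python sketch; the signature list is a hypothetical choice of frames taken from the stack above, and `matches_signature` is an illustrative helper. The ORA-07445 stack at the top of this post would not match, because it goes through kclgclk and kclwcrs rather than kjblocalobj_nolock. A check like this therefore only helps with the ORA-600 symptom.

```python
from typing import List

# Hypothetical signature: key frames from the stack in this note, innermost first.
KJBLOCALOBJ_SIGNATURE = ["kjblocalobj_nolock", "kclcls", "kclgclks", "kcbzib", "kcbgtcr"]


def matches_signature(short_stack: str, signature: List[str]) -> bool:
    """True if every signature frame occurs in `short_stack` in the same order;
    other frames may appear in between."""
    frames = iter(short_stack.split())
    # `name in frames` consumes the iterator up to the match, so order is enforced.
    return all(name in frames for name in signature)


note_stack = ("ksedst ksedmp ksfdmp kgerinv kgeasnmierr kjblocalobj_nolock kclcls "
              "kclgclks kcbzib kcbgtcr ktrget kdifbk")
this_post = "kslwait ksxprcv kclwcrs kclgclk kcbzib kcbgtcr ktrget kdst_fetch"

print(matches_signature(note_stack, KJBLOCALOBJ_SIGNATURE))  # True
print(matches_signature(this_post, KJBLOCALOBJ_SIGNATURE))   # False
```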
Changes
The output of the opatch lsinventory -detail command shows that Patch 5165885 has been applied.
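To check this across several homes, you can save the lsinventory output to a text file and scan it for interim patch numbers. This is a minimal Python sketch with an important assumption: it treats each interim patch as a line starting with `Patch <number> :`. The exact layout depends on the OPatch version, so verify the pattern against your own output first. `applied_patches` is a hypothetical helper, and the excerpt is a placeholder, not real output.

```python
import re
from typing import Set

# Assumption: each interim patch appears on a line starting with "Patch <number> :".
# Verify this against the lsinventory output of your own OPatch version.
PATCH_LINE = re.compile(r"^Patch\s+(\d+)\s*:", re.MULTILINE)


def applied_patches(lsinventory_text: str) -> Set[int]:
    """Return the interim patch numbers found in saved lsinventory output."""
    return {int(n) for n in PATCH_LINE.findall(lsinventory_text)}


# Placeholder excerpt; real output contains much more (dates, bug lists, files).
excerpt = "Interim patches (1) :\n\nPatch  5165885      : applied on <date>\n"
patches = applied_patches(excerpt)
print(5165885 in patches, 6268598 in patches)  # True False
```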
Cause
This is Bug 6268598
If the backport for unpublished bug 5165885 is applied and you are seeing ORA-600 [KJBLOCAL_OBJ_NOLOCK:LT] or ORA-600 [KJSMESM:SVRMODE], then you are probably hitting this bug.
Solution
To implement the solution, please execute the following steps:
Download Patch 6268598
To obtain this patch from METALINK
1) Click on the Patches & Updates tab.
2) Click on Simple Search.
3) Select your platform.
4) Select Search By Patch Number
5) Enter the patch number (6268598)
6) Click on Go
7) Read any applicable notes before downloading, then click the Download button.
Hdr: 6354782 10.2.0.3.0 RDBMS 10.2.0.3.0 RAC PRODID-5 PORTID-46 6268598
Abstract: ORA-600 [KJBLOCALOBJ_NOLOCK LT] AFTER APPLYING PATCH FOR BUG 5165885
PROBLEM:
--------
After applying Patch 5165885 the database began to experience ORA-600
[kjblocalobj_nolock:lt] errors. After this occurs, the Grid Control console
cannot be accessed at all and it requires a restart of OPMN for Grid Control
to be available again.
DIAGNOSTIC ANALYSIS:
--------------------
This issue looks like Bug 6123415, which is still being worked on.
I do see a patch available for Linux-x86-64bit in Bug 6268598, but there does
not look like there is one available for Linux-x86-32bit.
WORKAROUND:
-----------
None
RELATED BUGS:
-------------
Bug 6268598/6123415 ORA-600 [KJBLOCALOBJ_NOLOCK LT] AFTER APPLYING PATCH
5165885
Bug 6123415 ORA-600 [KJBLOCALOBJ_NOLOCK LT]
Bug 6200640 ORA-600 [KJBLOCALOBJ_NOLOCK LT] AFTER APPLYING PATCH 5165885
REPRODUCIBILITY:
----------------
This error occurs sporadically
TEST CASE:
----------
STACK TRACE:
------------
TRACE FILE (emrep1_ora_16818.trc)
---------------------------------
----- Call Stack Trace -----
ksedst ksedmp ksfdmp kgerinv kgeasnmierr kjblocalobj_nolock kclcls
kclgclks kcbzib kcbgtcr ktrget kdifbk qerixFetchByLogicalRowid
qerjotFetch kpofchswcbk rpiswu2 kpofrws opifch2 opifch opiodr ttcpip
opitsk opiino opiodr opidrv sou2o opimai_real main libc_start_main
…
…
----- Call Stack Trace -----
ksedst ksedmp ksfdmp kgerinv kgeasnmierr kjsmesm kjusuc ksipgetctx
ksqcmi ksqgtlctx ksqgelctx kccocx kccbcx kcradx1 kcbdnbRedo kcbdnb
ksedmp ksfdmp kgerinv kgeasnmierr kjblocalobj_nolock kclcls kclgclks
kcbzib kcbgtcr ktrget kdifbk qerixFetchByLogicalRowid qerjotFetch
kpofchswcbk rpiswu2 kpofrws opifch2 opifch opiodr ttcpip opitsk opiino
opiodr opidrv sou2o opimai_real main libc_start_main
…
…
----- Call Stack Trace -----
ksedst ksedmp ksfdmp kgeriv kgefic ksefic kjusuc ksipgetctx ksqcmi
ksqgtlctx ksqgelctx kccocx kccbcx kcradx1 kcbdnbRedo kcbdnb ksedmp
ksfdmp kgerinv kgeasnmierr kjblocalobj_nolock kclcls kclgclks kcbzib
kcbgtcr ktrget kdifbk qerixFetchByLogicalRowid qerjotFetch
kpofchswcbk rpiswu2 kpofrws opifch2 opifch opiodr ttcpip opitsk opiino
opiodr opidrv sou2o opimai_real main libc_start_main
…
…
----- Call Stack Trace -----
ksedst ksedmp ksfdmp kgefec kgefic ksefic kjusuc ksipgetctx ksqcmi
ksqgtlctx ksqgelctx kccocx kccbcx kcradx1 kcbdnbRedo kcbdnb ksedmp
ksfdmp kgerinv kgeasnmierr kjblocalobj_nolock kclcls kclgclks kcbzib
kcbgtcr ktrget kdifbk qerixFetchByLogicalRowid qerjotFetch
kpofchswcbk rpiswu2 kpofrws opifch2 opifch opiodr ttcpip opitsk opiino
opiodr opidrv sou2o opimai_real main libc_start_main
SUPPORTING INFORMATION:
-----------------------
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
DIAL-IN INFORMATION:
--------------------
IMPACT DATE:
------------
Hdr: 6268598 10.2.0.3 RDBMS 10.2.0.3 RAC PRODID-5 PORTID-226 ORA-600
Abstract: ORA-600: [KJBLOCALOBJ_NOLOCK:LT] AFTER APPLYING PATCH 5165885
PROBLEM:
--------
Following errors are encountered after applying patch 5165885
ORA-600: internal error code, arguments: [kjblocalobj_nolock:lt], [], [],
ORA-600: internal error code, arguments: [kjsmesm:svrmode], [1], [], [],
ORA-600: internal error code, arguments: [kjblocalobj_nolock:lt], [], [],
ORA-600: internal error code, arguments: [600], [], [], [], [], [], [],
DIAGNOSTIC ANALYSIS:
--------------------
WORKAROUND:
-----------
RELATED BUGS:
-------------
Bug.6123415 Gen RDBMS-1020 V1020 (30) ORA-600 [KJBLOCALOBJ_NOLOCK LT]:
Bug.6200640 Gen RDBMS-1020 V1020 (11) ORA-600 [KJBLOCALOBJ_NOLOCK LT] AFTER
APPLYING PATCH 5165885:
Bug.6225832 Gen RDBMS-1020 V1020 (10) LMS CRASH WITH ORA-600 [KCBGTCR_3]:
REPRODUCIBILITY:
----------------
TEST CASE:
----------
STACK TRACE:
------------
ksedst ksedmp ksfdmp kgerinv kgeasnmierr kjblocalobj_nolock
kclcls kclgrnew kcbnew ktsscfctl ktacrss ktatcreate
kxttGetPh kxttICreate qerpxPrepTempTables qerpxSendParse kxfpValidateSlaveGroup
kxfpgsg oup kxfrAllocSlaves kxfrialo kxfralo qerpx_rowsrc_start
qerpxStart insExecSubQueryIni insExecStmtExecIniE ngine insexe ngine
opiexe opipls opiodr rpidrus skgmstack rpidru
rpiswu2 rpidrv psddr0 psdnal pevm_EXIM pfrinstr_EXIM
pfrrun_no_tool pfrrun plsql_run peicnt kkxexe opiexe
kpoal8 opiodr ttcpip opitsk opiino opiodr
opidrv sou2o opimai_real main libc_start_main start
SUPPORTING INFORMATION:
-----------------------
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
DIAL-IN INFORMATION:
--------------------
IMPACT DATE:
------------
*** 07/22/07 10:22 pm ***
Uploaded the following files.
-----------------------------
URLSK1R1_alert.log
urlsk1r1_ora_12424.trc
REDISCOVERY INFORMATION:
If the backport for 5165885 is applied and you are seeing ORA-600
[KJBLOCAL_OBJ_NOLOCK:LT] or ORA-600 [KJSMESM:SVRMODE], then you are probably
hitting this bug.
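The rediscovery rule above can be applied to an alert log with two regular expressions. This is a minimal Python sketch; `SIGNATURES` and `rediscovery_hits` are hypothetical names. Alert logs normally print the error as "ORA-00600", while the bug text writes "ORA-600", so the pattern accepts both. It also accepts the "KJBLOCAL_OBJ_NOLOCK" spelling that appears in parts of the note text. The sample lines are the error lines quoted earlier in this post.

```python
import re
from typing import List

# Accept "ORA-600" and "ORA-00600"; "_?" tolerates the KJBLOCAL_OBJ_NOLOCK spelling.
SIGNATURES = [
    re.compile(r"ORA-0*600\b.*\[kjblocal_?obj_nolock:lt\]", re.IGNORECASE),
    re.compile(r"ORA-0*600\b.*\[kjsmesm:svrmode\]", re.IGNORECASE),
]


def rediscovery_hits(alert_log: str) -> List[str]:
    """Return the alert-log lines that match either ORA-600 signature."""
    return [line for line in alert_log.splitlines()
            if any(p.search(line) for p in SIGNATURES)]


sample = "\n".join([
    "ORA-600: internal error code, arguments: [kjblocalobj_nolock:lt], [], [],",
    "ORA-600: internal error code, arguments: [kjsmesm:svrmode], [1], [], [],",
    "ORA-07445: exception encountered: core dump [__lwp_kill()+8] [SIGIOT] [unknown code] [0x6C7E00000000] [] []",
])
for hit in rediscovery_hits(sample):
    print(hit)
```

Only the two ORA-600 lines are reported. The ORA-07445 line is deliberately not part of this rediscovery rule.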
WORKAROUND:
Back off the patch for 5165885 (although in this case, the original errors
fixed by 5165885 will return).
RELEASE NOTES:
ORA-600 [kjblocal_obj_nolock:lt] and/or ORA-600 [kjsmesm] could occur on
10.2.0.3 after applying the backport for 5165885. This has been fixed.