这是一套古老的系统,SUNOS 5.8,Oracle 8.1.7.4。最近老革命途遇新问题,告警日志烽烟掠起:
Errors in file /u01/app/oracle/admin/CULPRODB/udump/culprodb_ora_7913.trc: ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], [] Thu Jul 15 16:19:29 2010 Errors in file /u01/app/oracle/admin/CULPRODB/udump/culprodb_ora_7913.trc: ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], [] Thu Jul 15 16:19:30 2010 Errors in file /u01/app/oracle/admin/CULPRODB/udump/culprodb_ora_7913.trc: ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], []
如果你像我一样对600着迷,那么点击这里欣赏一下这个trace文件。报错期间运行的SQL及调用栈信息:
ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], [] Current SQL statement for this session: select * from olsuser.cardmaster where cm_card_no between '2336330010201570013' and '2336330010201580004' union select * from olsuser.cardmaster where cm_card_no between '2336330012402300018' and '2336330012402310009' union select * from olsuser.cardmaster where cm_card_no between '2336330052400220016' and '2336330052400230007' union select * from olsuser.cardmaster where cm_card_no between '2336330015103900012' and '2336330015138100032' union select * from olsuser.cardmaster where cm_card_no between '2336330055100910018' and '2336330055100920009' ----- Call Stack Trace ----- calling call entry location type point -------------------- -------- -------------------- ksedmp()+220 CALL ksedst()+0 kgeriv()+268 PTR_CALL 0000000000000000 kgesiv()+140 CALL kgeriv()+0 kgesic1()+32 CALL kgesiv()+0 kghfrf()+204 CALL kgherror()+0 kkscls()+1592 CALL kghfrf()+0 opicca()+248 CALL kkscls()+0 opiclo()+8 CALL opicca()+0 kpoclsa()+60 CALL opiclo()+0 opiodr()+2540 PTR_CALL 0000000000000000 ttcpip()+5676 PTR_CALL 0000000000000000 opitsk()+2408 CALL ttcpip()+0 opiino()+2080 CALL opitsk()+0 opiodr()+2540 PTR_CALL 0000000000000000 opidrv()+1656 CALL opiodr()+0 sou2o()+16 CALL opidrv()+0 main()+172 CALL sou2o()+0 _start()+380 CALL main()+0 /*8.1.7中stack trace还附带着寄存器信息,但我们可读不懂:) */
opicca->kkscls->kghfrf->kgherror(heap层报错)->kgesic1。问题主要发生在调用kghfrf函数的时候,《famous summary stack trace from Oracle Version 8.1.7.4.0 Bug Note》 一文罗列了Oracle的一些stack summary,其中kghfrx函数的作用是”Free extent. This is called when a heap is unpinned to request that it”;可以猜测kghfrf函数是用来释放某种内存结构的。在MOS上输入”kghfrf 8.1.7.4″关键词,可以找到Note 291936.1:
ORA-00600 [17182] on Oracle 8.1.7.4.0 After a CTRL-C or Client Termination
Applies to:
Oracle Server – Enterprise Edition – Version: 8.1.7.4
This problem can occur on any platform.
Checked for relevance on 06-Mar-2007
Oracle RDBMS Server Versions prior to 9i
Symptoms
1. Intermittent heap corruptions errors like ORA-00600 [17182] are reported in the alert.log file.2. There is no impact to the database other than the process which encounters the errors getting killed.
3. From the trace file generated for this ORA-00600 error, check if the top few functions are :
kgherror kghfrf kkscls opicca
Cause
If the trace file shows that kkscls calls kghfrf, then it is related to:Bug 2281320 — ORA-600[17182] POSSIBLE AFTER CTRL-C OR CLIENT DEATH
Solution
The problem is when we call kghfrf to free a chunk of memory, we expect that this chunk to have been allocated from the Heap Memory and hence have a valid header, although internally we have used Frame Memory managed chunk. As a result, kghfrf errors out with the “Bagic Magic Number” in the Memory Chunk header error message.If you are running Oracle 8174, encounter this ORA-00600 [17182], and the call stack indicates the following functions { kgherror kghfrf kkscls }, then download and apply Patch 2281320 from MetaLink.
This issue has been fixed in Oracle Server 8.1.7.5 and later versions.
Note 2281320.8 is not limited to dblinks and can occur during normal database operation as well.
该文档叙述描述在9i以前版本中可能因堆损坏而出现该ORA-00600 [17182]错误,该错误不会导致致命问题或数据库损坏,最坏的情况是遭遇该错误的服务进程被杀死。与该问题匹配的主要依据是stack trace为kgherror kghfrf kkscls opicca,同我们的实际情况一致。可以通过打上one-off patch 2281320或者升级到8.1.7.5来避免该内部错误的发生,当然也可以置之不理,显然它不会造成太大的麻烦。
此外kghfrf函数用以释放内存chunk,Oracle development起初以为所有这些可能被释放的chunk都是从堆内存中分配而来,因此都该有一个有效的header;而实际上它们可能是以帧式内存管理的chunk。kghfrf因读取到这种chunk header中的错误幻数(Bagic Magic Number)而误入歧途了。