This is an ancient system: SunOS 5.8 running Oracle 8.1.7.4. Recently this old campaigner ran into a new problem, and beacon fires flared up in the alert log:
Errors in file /u01/app/oracle/admin/CULPRODB/udump/culprodb_ora_7913.trc:
ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], []
Thu Jul 15 16:19:29 2010
Errors in file /u01/app/oracle/admin/CULPRODB/udump/culprodb_ora_7913.trc:
ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], []
Thu Jul 15 16:19:30 2010
Errors in file /u01/app/oracle/admin/CULPRODB/udump/culprodb_ora_7913.trc:
ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], []
If you are as fascinated by ORA-600 as I am, click here to enjoy the trace file. The SQL that was running when the error was raised, and the call stack:
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [17182], [32438472], [], [], [], [], [], []
Current SQL statement for this session:
select * from olsuser.cardmaster where cm_card_no between '2336330010201570013' and '2336330010201580004'
union
select * from olsuser.cardmaster where cm_card_no between '2336330012402300018' and '2336330012402310009'
union
select * from olsuser.cardmaster where cm_card_no between '2336330052400220016' and '2336330052400230007'
union
select * from olsuser.cardmaster where cm_card_no between '2336330015103900012' and '2336330015138100032'
union
select * from olsuser.cardmaster where cm_card_no between '2336330055100910018' and '2336330055100920009'
----- Call Stack Trace -----
calling              call     entry
location             type     point
-------------------- -------- --------------------
ksedmp()+220         CALL     ksedst()+0
kgeriv()+268         PTR_CALL 0000000000000000
kgesiv()+140         CALL     kgeriv()+0
kgesic1()+32         CALL     kgesiv()+0
kghfrf()+204         CALL     kgherror()+0
kkscls()+1592        CALL     kghfrf()+0
opicca()+248         CALL     kkscls()+0
opiclo()+8           CALL     opicca()+0
kpoclsa()+60         CALL     opiclo()+0
opiodr()+2540        PTR_CALL 0000000000000000
ttcpip()+5676        PTR_CALL 0000000000000000
opitsk()+2408        CALL     ttcpip()+0
opiino()+2080        CALL     opitsk()+0
opiodr()+2540        PTR_CALL 0000000000000000
opidrv()+1656        CALL     opiodr()+0
sou2o()+16           CALL     opidrv()+0
main()+172           CALL     sou2o()+0
_start()+380         CALL     main()+0

/* In 8.1.7 the stack trace also comes with register information, but we cannot make sense of it :) */
opicca -> kkscls -> kghfrf -> kgherror (the error is raised in the heap layer) -> kgesic1. The problem arises mainly in the call to kghfrf. The article "famous summary stack trace from Oracle Version 8.1.7.4.0 Bug Note" lists a number of Oracle stack summaries; in it the kghfrx function is described as "Free extent. This is called when a heap is unpinned to request that it", so we can guess that kghfrf is used to free some kind of memory structure. Searching MOS with the keywords "kghfrf 8.1.7.4" turns up Note 291936.1:
ORA-00600 [17182] on Oracle 8.1.7.4.0 After a CTRL-C or Client Termination
Applies to:
Oracle Server – Enterprise Edition – Version: 8.1.7.4
This problem can occur on any platform.
Checked for relevance on 06-Mar-2007
Oracle RDBMS Server Versions prior to 9i
Symptoms
1. Intermittent heap corruption errors like ORA-00600 [17182] are reported in the alert.log file.
2. There is no impact to the database other than the process which encounters the errors getting killed.
3. From the trace file generated for this ORA-00600 error, check if the top few functions are :
kgherror kghfrf kkscls opicca
Cause
If the trace file shows that kkscls calls kghfrf, then it is related to: Bug 2281320 - ORA-600[17182] POSSIBLE AFTER CTRL-C OR CLIENT DEATH
Solution
The problem is that when we call kghfrf to free a chunk of memory, we expect this chunk to have been allocated from the Heap Memory and hence to have a valid header, although internally we have used a Frame Memory managed chunk. As a result, kghfrf errors out with the "Bad Magic Number" in the Memory Chunk header error message. If you are running Oracle 8.1.7.4, encounter this ORA-00600 [17182], and the call stack indicates the following functions { kgherror kghfrf kkscls }, then download and apply Patch 2281320 from MetaLink.
This issue has been fixed in Oracle Server 8.1.7.5 and later versions.
The bug described in Note 2281320.8 is not limited to dblinks and can occur during normal database operation as well.
The note describes how, in releases prior to 9i, this ORA-00600 [17182] error can be raised because of a heap corruption. The error causes no fatal problem and no database corruption; at worst, the server process that hits it gets killed. The key evidence for matching our case to this issue is the stack trace kgherror kghfrf kkscls opicca, which is exactly what we see. The internal error can be avoided by applying one-off patch 2281320 or by upgrading to 8.1.7.5; of course, it can also simply be left alone, since it clearly will not cause much trouble.
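Before choosing between the one-off patch and an upgrade, it is worth confirming the exact release in use, since the fix is only bundled from 8.1.7.5 onward. A minimal sketch, nothing more than the usual query against v$version:

-- Confirm the exact RDBMS release before deciding on patch 2281320 vs. an upgrade;
-- the fix for bug 2281320 is included in 8.1.7.5 and later.
SELECT banner FROM v$version;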
In addition, the kghfrf function is used to free memory chunks. Oracle development originally assumed that every chunk handed to kghfrf for freeing had been allocated from heap memory and therefore carried a valid header; in reality some of these chunks are managed as frame memory. kghfrf is led astray when it reads the bad magic number in such a chunk's header.
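If you want to see the kind of chunk headers kghfrf trips over, a heap dump of the process's private memory lists every chunk with its size and header; this is also what the error-triggered heapdump events quoted in the bug text below were meant to collect. A minimal sketch, assuming a scratch session on a test system where the extra trace file is acceptable:

-- Dump the private heaps of the current process to a trace file in user_dump_dest.
-- Each chunk is printed with its size and header, which is where a bad magic
-- number would show up (heapdump level 1 = PGA, 4 = UGA, 5 = both).
ALTER SESSION SET EVENTS 'immediate trace name heapdump level 5';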
Hdr: 2281320 8.1.7.3.0 RDBMS 8.1.7.3.0 PRG INTERFACE PRODID-5 PORTID-87 ORA-600
Abstract: ORA-600[17182] POSSIBLE AFTER CTRL-C OR CLIENT DEATH
PROBLEM:
——–
Regularly an ora-600[17182] is generated. Checking on this kind of file is
done automatically and DBAs are informed about this error and have to check
it at once (even in the middle of the night).
Except memory corruption there does not seem to be an impact towards the
database.
DIAGNOSTIC ANALYSIS:
——————–
Have checked the objects in question:
– problem occurs on different tables with different queries
– execution plan shows usage of bitmap index as well FTS scans
No regular plan can be found in it.
Setting of diagnostic event 10235 with level 4 seems to introduce ora-4030 so
had to be put off.
Patch for 2177050 has been installed but problem occurred before and after
installation of this patch.
According to cust the same error occurred in 8.1.7.2.0 as well.
WORKAROUND:
———–
Have not found any.
RELATED BUGS:
————-
Have not found any
REPRODUCIBILITY:
—————-
not reproducible at will
TEST CASE:
———-
Not applicable
STACK TRACE:
————
*** 17:30:34.123
ksedmp: internal or fatal error
ORA-600: internal error code, arguments: [17182], [1075716168], [], [], [], [], [], []
Current SQL statement for this session:
select * from rcv a where a.clne_seq in (select clne_seq from cdt_lne where val_day = 0)
and (a.dte_due + 1 )= (select dte_nxt_pay from its_per b where b.iper_seq = a.iper_seq)
—– Call Stack Trace —–
*** 17:30:45.045
link and map addresses differ for
/oracle/app/oracle/product/8.1.7/lib/libobk.so
– 3ffbffe0000, 30000000000
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp:1838[kse.c] ??? ksedst:2205[kse.c] 12071E6BC ? 0380003D8 ?
1401EB838 ? 100000018 ?
1214707E8 ? 0380003D8 ?
ksfdmp:917[ksf.c] ??? ksedmp:1838[kse.c] 121470854 ? 00000431E ?
000000000 ? 000000000 ?
000000001 ? 11FFFCAB0 ?
kgeriv:1451[kge.c] ??? ksfdmp:917[ksf.c] 000000000 ? 000000000 ?
000000001 ? 11FFFCAB0 ?
100000018 ? 121471014 ?
kgesiv:1679[kge.c] JSR kgeriv:1451[kge.c] 121470C48 ? 0380003D8 ?
1401EEAF8 ? 11FFFCAB0 ?
100000018 ? 10000431E ?
kgesic1:1558[kge.c] ??? kgesiv:1679[kge.c] 12145F8B4 ? 11FFFCAB0 ?
100000018 ? 000000000 ?
000000000 ? 038000000 ?
kgherror:569[kgh.c] ??? kgesic1:1558[kge.c] 000000000 ? 000000008 ?
000000010 ? 1214677DC ?
0380003D8 ? 1401EEAF8 ?
kghfrf:5102[kgh.c] ??? kgherror:569[kgh.c] 000000001 ? 1206E9D84 ?
1401F9E38 ? 038000000 ?
000000000 ? 000000024 ?
kkscls:3728[kks.c] ??? kghfrf:5102[kgh.c] 120FDF020 ? 000000003 ?
1401F1638 ? 1401F9E38 ?
038004318 ? 000000000 ?
opicca:145[opicca.c JSR kkscls:3728[kks.c] 120E202E0 ? 038000000 ?
000000001 ? 000000001 ?
120BB502C ? 11FFFD660 ?
opiclo:79[opiclo.c] JSR opicca:145[opicca.c 120BB502C ? 11FFFD660 ?
000000003 ? 11FFFD660 ?
1206408D4 ? 038000000 ?
…
SUPPORTING INFORMATION:
———————–
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-
DIAL-IN INFORMATION:
——————–
IMPACT DATE:
————
Files will be uploaded to ess30
The ora-600[729] has been solved by installing patch for 2177050.
The uploaded traces are erroring in kkscls -> kghfrf during
closing of cursors. The corruption is in private memory, but
the bad chunk does not appear in the session heap.
Can you add the following to see if we can get closer to
the cause:
event="600 trace name heapdump level 5125"
event="10501 trace name context forever, level 4109"
Also can you upload the alert log extract and init.ora
parameter settings.
problem has re-occurred after enabling above events, have uploaded the
17182.zip file the cust has provided.
Cust provided a new tracefile containing the ora-600[17182]
have uploaded files of customer: trace + alertfile: 17182_18apr2002.zip
An ORA-3113 is seen from the DBLINK during execution of this statement:
select * from pmm_rcp_pmm order by dte_sta_pmm_rcp_pmm
This occurs in the stack:
ksesec0
ksucin
srsmr1
srsrel
sorrelqb
qersoRelease
rwsrld
qecrlssub
opifch
opiall0
kpoal8
Note: The local error is only occurring as the DB link signals an
ORA-3113. This implies the remote end of the DB link is failing.
You should find out if that is due to an unexpected process
death and if so follow that up as a separate issue. The issue
here is that in 8i an ORA-3113 from a DB link at a particular
time can cause a local ORA-600 [17182] error.
Please indicate if you are likely to need an 8i fix for this
OERI:17182 problem so I know what action to take next. Thanks
cust has checked the object in question and it is a local table:
New info: select * from dba_objects where object_name = 'PMM_RCP_PMM';

OWNER   OBJECT_NAME  SUBOBJECT_NAME  OBJECT_ID  DATA_OBJECT_ID  OBJECT_TYPE  CREATED    LAST_DDL_  TIMESTAMP            STATUS  T G S
------  -----------  --------------  ---------  --------------  -----------  ---------  ---------  -------------------  ------  - - -
PUBLIC  PMM_RCP_PMM                  39282                      SYNONYM      06-MAR-00  06-MAR-00  2000-03-06:19:30:41  VALID   N N N
ISR     PMM_RCP_PMM                  39277      39277           TABLE        06-MAR-00  23-OCT-01  2000-03-06:19:29:53  VALID   N N N
Please (re-)check the tracefile for the database link.
Ooops – diagnosis is the same – it is just that the ORA-3113 is
from a dead client connection not a dead DB link.
ie: The client going away at an inappropriate time exposes the same
hole in the code.
Please provide a backport for this problem for 8.1.7.3.0
Can a timeframe be given in which a fix can be expected??
Thanks
Rediscovery Information :
“If you get ORA-600[17182] after a ORA-3113, and cause for 3113 indicates
following pattern in the error stack, then it could be this bug.
…opifch()->qecrlssub()->….ksesec0()”
]] ORA-600[17182] occurred followed by ORA-3113 when the heap dump
]] indicated that 17182 encountered while freeing the chunk marked with
]] “define-info”.
Hdr: 2491757 8.1.7.4 RDBMS 8.1.7.4 PRG INTERFACE PRODID-5 PORTID-23 ORA-600 2281320
Abstract: ORA-600 [17182] [32227064], [], [], [], [], [], [] AND ORA-3113 ON 8.1.7.4
TAR:
—-
SMS TAR 2396840.995
PROBLEM:
——–
Customer is getting this problem and problematic session is disconnected. They
are not able to reproduce this at will.
DIAGNOSTIC ANALYSIS:
——————–
NA
WORKAROUND:
———–
None
RELATED BUGS:
————-
REPRODUCIBILITY:
—————-
Customer could not reproduce this at will. But we believe it will be reproduced
in future
TEST CASE:
———-
NA
STACK TRACE:
————
ksedmp
kgeriv
kgesiv
kgesic1
kghfrf
kkscls
opicca
opiclo
opifcs
ksuxds
ksudel
opidcl
opidrv
sou2o
main
_start
SUPPORTING INFORMATION:
———————–
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-
NA
DIAL-IN INFORMATION:
——————–
NA
IMPACT DATE:
————
Alert.log, init.ora and trace file of problem is on machine ess30 in directory
/bug/bug2491757 in file bug2491757.zip
The current SQL statement is
SELECT "OSN_POD_OSEBA"."SIFRA"
FROM "OSN_POD"."OSEBA" "OSN_POD_OSEBA" ORDER BY "PRIIMEK"
If this is over a DB link this is probably a duplicate of
Bug 2281320. Please confirm what "OSN_POD"."OSEBA" is.
No, this is no db_link. I’ve checked that.
Sorry – I should not have mentioned DB links. Bug 2281320 has
nothing to do with DB links – Ive corrected its title.
From your trace the dump is when freeing kxscdfn in the
current instantiation. This is not pointing at a KGH chunk
hence the OERI:17182.
This is almost certainly a duplicate of bug:2281320
If the customer needs a fix in 8174 please request a PSE to 8174
referencing this bug as evidence. It is likely this is from
a dead client or a client interrupt so unless this is happening
a lot you would need a good business case for a PSE.
There is no actual corruption here – just a cleanup error.
Hdr: 3421829 8.1.7.4 RDBMS 8.1.7.4 PRODID-5 PORTID-59 ORA-600
Abstract: ORA-600 15203 AND ORA-600 17182
PROBLEM:
——–
Customer is getting intermittent ora-600 errors. The TAR was originally
opened for the ora-600 17182 errors. However, since asking the customer to
set event 10235 but before setting it, he has encountered ora-600 15203 which
seems to have spawned more ora-600 17182 errors.
He cannot reproduce the errors at will. However, they are quite frequent
(occur on a daily basis)
DIAGNOSTIC ANALYSIS:
——————–
I have asked the customer to set event 10235 level 2. However, he is
concerned about a performance hit. He wanted to know if we could give him a %
of performance degradation that he would encounter. I told him I would ask
development. I had also asked him to set event 10501. However, I understand
that this event causes more of a performance hit than the 10235. So, I don't
think I can get him to set that event.
The customer also wants to be sure that we will get all the information needed
from setting the 10235 event. He does not want to have to set further events
– causing downtime on production.
WORKAROUND:
———–
none known
RELATED BUGS:
————-
bug 2765055 OERI:15203 / Memory corruption if partitioned table cursor is
reloaded —–this bug has to do with partitioned tables, my ct does not use
partitioned tables.
REPRODUCIBILITY:
—————-
intermittently on a daily basis
TEST CASE:
———-
none available
STACK TRACE:
————
/opt/oracle/admin/MOVE/udump/ora_7255_move.trc
===============================================
ORA-600: internal error code, arguments: [17182], [1075419096], [], [], [], [], [], []
Current SQL statement for this session:
SELECT 'x' FROM task_master WHERE TASK_ID=2953359 FOR UPDATE NOWAIT
STACK: kgherror kghfrf kxscln kkscls
Chunk 401997d8 sz= 56 ERROR, BAD MAGIC NUMBER (3b)
/opt/oracle/admin/MOVE/udump/ora_3761_move.trc
================================================
ORA-600: internal error code, arguments: [15203], [9], [5], [], [], [], [], []
Current SQL statement for this session:
select inventory_id ,product_id ,uom_family ,uom_type_code ,product_key
,location_is_lp_ind ,physical_location_no ,onhand_quantity ,inbound_quantity
,outbound_quantity ,material_status_code ,material_keepers_ref ,inventory_type
,inventory_status from inventory where (location_no=:b0 and
onhand_quantity>0) order by product_id asc
STACK: ksesic2 kksfal