A single-instance 10.2.0.3 database running on SunOS 5.10 hit the internal error ORA-07445: exception encountered: core dump [SIGFPE] [Integer divide by zero] [42788866] [] [] []. The trace log is as follows:
mon_ora_17633.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /oracle/oracle/product/10.2.0
System name: SunOS
Node name: monitor-a
Release: 5.10
Version: Generic_139556-08
Machine: i86pc

ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [SIGFPE] [Integer divide by zero] [42788866] [] [] []
Current SQL statement for this session:
select req_time into :b0 from t_FIX_TranSerial where ((((tran_bank=:b1 and tran_type=:b2) and term_no=:b3) and trace_no=:b4) and local_time='0')

----- Call Stack Trace -----
sigsetjmp <- call_user_handler <- sigacthandler <- kpopfr <- kposdi <- kpopsdi <- opiefn0 <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- main <- 0000000000E54FE7

PROCESS STATE
-------------
O/S info: user: monitor, term: pts/5, ospid: 17621, machine: monitor-a
program: CTPDATA@monitor-a (TNS V1-V3)
application name: CTPDATA@monitor-a (TNS V1-V3), hash value=0
last wait for 'SQL*Net message to client' blocking sess=0x0 seq=6213 wait_time=1 seconds since wait started=0
driver id=62657100, #bytes=1, =0
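Before searching MOS, it helps to pull the bracketed error arguments and the call-stack function names out of the trace, since those are the terms a matching note is keyed on. The following is only a rough Python sketch: the helper names parse_ora7445 and matches_divide_by_zero_in_kpo are hypothetical, the regular expressions assume the trace layout shown above, and SAMPLE is a shortened copy of that trace with the SQL statement left out.

```python
import re

# Shortened copy of the trace shown above (SQL statement omitted).
SAMPLE = (
    "ksedmp: internal or fatal error "
    "ORA-07445: exception encountered: core dump [SIGFPE] "
    "[Integer divide by zero] [42788866] [] [] [] "
    "----- Call Stack Trace ----- "
    "sigsetjmp <- call_user_handler <- sigacthandler <- kpopfr <- kposdi "
    "<- kpopsdi <- opiefn0 <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino "
    "<- opiodr <- opidrv <- sou2o <- opimai_real <- main <- 0000000000E54FE7 "
    "PROCESS STATE -------------"
)


def parse_ora7445(trace_text):
    """Return (error arguments, call-stack entries) found in a trace."""
    args = []
    err = re.search(
        r"ORA-0?7445: exception encountered: core dump((?:\s*\[[^\]]*\])+)",
        trace_text,
    )
    if err:
        # Each [...] group after "core dump" is one argument (may be empty).
        args = re.findall(r"\[([^\]]*)\]", err.group(1))

    stack = []
    section = re.search(
        r"-----\s*Call Stack Trace\s*-----(.*?)(?:PROCESS STATE|$)",
        trace_text,
        re.S,
    )
    if section:
        # In this trace layout the entries are separated by "<-".
        stack = [f.strip() for f in section.group(1).split("<-") if f.strip()]
    return args, stack


def matches_divide_by_zero_in_kpo(args, stack):
    """True when the trace shows SIGFPE divide-by-zero with kposdi/kpopsdi on the stack."""
    return (
        "SIGFPE" in args
        and "Integer divide by zero" in args
        and {"kposdi", "kpopsdi"} <= set(stack)
    )


args, stack = parse_ora7445(SAMPLE)
print(args)
print(stack)
print(matches_divide_by_zero_in_kpo(args, stack))
```

The check only looks at strings that appear in the trace itself; it is a search aid and does not confirm any particular bug.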
Searching MOS for the arguments of this ORA-07445 error turns up Note <ORA-7445 [KPOPFR] [SIGFPE] [INTEGER DIVIDE BY ZERO] When Repeatedly Executing a Query (Doc ID 421203.1)>:
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3
This problem can occur on any platform.

Symptoms
1. Repeatedly executing a query can lead to the following error:
ORA-7445 [kpopfr] [SIGFPE] [INTEGER DIVIDE BY ZERO]
2. The call stack from the ORA-07445 trace file should contain the following functions:
kposdi kpopsdi

The error is caused by BUG 5753629.
Abstract: QUERY FAILS BY ORA-7445 [KPOPFR]
Repeatedly executing a query can lead to an ORA-7445[kpopfr] error.

Solution
To implement the solution, do one of the following:
1. Upgrade to 11.1 or 10.2.0.4, when available. At the time of writing the article these version were not yet available. (July 2007).
2. Apply one-off Patch 5753629 from MetaLink, if available for your platform and version.
There is no known workaround available for this bug.

References
BUG:5753629 - QUERY FAILS BY ORA-7445 [KPOPFR].

Hdr: 5753629 10.2.0.2 RDBMS 10.2.0.2 PRG INTERFACE PRODID-5 PORTID-23 ORA-7445
Abstract: QUERY FAILS BY ORA-7445 [KPOPFR].

*** 01/09/07 06:12 pm ***
TAR:
----

PROBLEM:
--------
When executing query again and again from one session, query fails by ORA-7445[kpopfr].
====================================================================
Sat Dec 30 00:22:39 2006
Errors in file /var/log/oracle/trace/felica2_ora_6156.trc:
ORA-7445: exception encountered: core dump [kpopfr()+536] [SIGFPE] [Integer divide by zero] [0x1023BCE18] [] []
====================================================================

DIAGNOSTIC ANALYSIS:
--------------------
From disassemble, %o3 is devided by %o0 and %o0 seems to be 0x0.

0x1023f43d0 : umul %g5, %g4, %o0
0x1023f43d4 : mov %g0, %y
0x1023f43d8 : udiv %o3, %o0, %o3

%o0 is calcurated by %g5 X %g4 at kpopfr+528.
From our trace file, this value(%g5) is 0x100000.

ub4 kponc_p [FFFFFFFF7B22AAAC, FFFFFFFF7B22AAB0) = 00100000

And %g4 seems to be 0x1000 from below trace file output.
========== FRAME [6] (kpopsdi()+148 -> kposdi()) ==========
%l0 0000000100302B00 %l1 0000000000000002 %l2 FFFFFFFF7B263CA8 %l3 00000003C36AFB80
%l4 0000000105D7CE20 %l5 0000000000000001 %l6 0000000000000007 %l7 0000000105D7A920
%i0 0000000000000000 %i1 FFFFFFFF7FFFB9EC %i2 0000000000001000 %i3 0000000105E03220
                                          ~~~~~~~~~~~~~~~~~~~~ <--(*) here
%i4 0000000000800000 %i5 0000000000105C00 %fp FFFFFFFF7FFFB241

If %g4=0x1000 and %g5=0x100000, %g4 X %g5 = 0x100000000.
0x100000000 is 0x0 as ub4, and this may bring 0 divide and ORA-7445.
I can reproduce the similar problem in my house, so I'll upload testcase.

Problem reproduce at the following case.
* sum of all column size is 1048576(0x100000)
* run query again and again from one session (about 4096(0x1000) times)
From this results, above guess seems to be correct.

WORKAROUND:
-----------
n/a

RELATED BUGS:
-------------
n/a

REPRODUCIBILITY:
----------------
I have confirmed that this problem reproduces at the below env.
* Linux x86 32bit, 10.2.0.3 : ORA-7445[kpopfr()+300]
* Linux x86 64bit, 10.2.0.2 : ORA-7445[kpopfr()+339]
* Solaris 64bit, 10.2.0.2 : ORA-7445[kpopfr()+536]
* HP-UX Itanium, 10.2.0.2 : ORA-7445[_div32U()+34]

TEST CASE:
----------
At first, creating table like follows.

conn scott/tiger
drop table test;
create table test (
c000 char(2000),
c001 char(2000),
...
c523 char(2000),
c524 char(576));
--> sum of all column size is 1048576(0x100000).

Run next shell script.

while [ 1 ]
do
  echo "set feedback off"
  echo "select * from test where c001 = 'A';"
done | sqlplus -s scott/tiger

It takes 3-10 minutes to reproduce the problem.
Required time for reproducing depends on hardware spec.

STACK TRACE:
------------
ksedmp
ssexhd
sighndlr
call_user_handler
kposdi
kpopsdi
kpoal8
opiodr
ttcpip
opitsk
opiino
opiodr
opidrv
sou2o
opimai_real
main
start
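The arithmetic in the diagnostic analysis above can be checked with a few lines of Python. This is only an illustrative sketch of the wraparound the bug report describes, not Oracle's code: ub4 is assumed to be an unsigned 32-bit integer, the helper ub4_mul is hypothetical, and the column list is rebuilt from the test case on the assumption that c000 through c523 are all char(2000).

```python
UB4_MASK = 0xFFFFFFFF  # assumption: ub4 is an unsigned 32-bit integer


def ub4_mul(a, b):
    """Multiply two values and keep only the low 32 bits, as a ub4 would."""
    return (a * b) & UB4_MASK


# Register values quoted in the bug report.
g4 = 0x1000      # about 4096 executions from one session
g5 = 0x100000    # sum of all column sizes (kponc_p)

full_product = g4 * g5        # exact product, needs 33 bits
divisor = ub4_mul(g4, g5)     # what remains after truncation to 32 bits

print(hex(full_product))
print(hex(divisor))

# Column sizes from the test case: c000..c523 as char(2000), c524 as char(576).
column_sizes = [2000] * 524 + [576]
print(sum(column_sizes), hex(sum(column_sizes)))

# Dividing by the truncated product is the divide-by-zero the report points to.
try:
    0x10 // divisor
except ZeroDivisionError:
    print("integer divide by zero")
```

The product needs exactly one bit more than a 32-bit value can hold, so the truncated divisor is zero, which fits the report's observation that the failure appears after roughly 4096 executions against a 1048576-byte row.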
After an SR was filed with Oracle GCS, the problem was confirmed as BUG 5753629. Oracle GCS offered two solutions:
1. Upgrade to 10.2.0.4 or later
2. Apply one-off Patch 5753629