An Example of the ORA-07445: [SIGFPE] [Integer divide by zero] Internal Error

A single-node 10.2.0.3 database running on SunOS 5.10 hit the internal error ORA-07445: exception encountered: core dump [SIGFPE] [Integer divide by zero] [42788866] [] [] []. The relevant trace file reads as follows:

mon_ora_17633.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /oracle/oracle/product/10.2.0
System name: SunOS
Node name: monitor-a
Release: 5.10
Version: Generic_139556-08
Machine: i86pc

ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [SIGFPE] [Integer divide by zero] [42788866] [] [] []
Current SQL statement for this session:
select req_time into :b0 from t_FIX_TranSerial where
((((tran_bank=:b1 and tran_type=:b2) and term_no=:b3)
and trace_no=:b4) and local_time='0')
----- Call Stack Trace -----

sigsetjmp <- call_user_handler
<- sigacthandler <- kpopfr <- kposdi <- kpopsdi <- opiefn0
<- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino
<- opiodr <- opidrv <- sou2o <- opimai_real <- main
<- 0000000000E54FE7

PROCESS STATE
-------------
O/S info: user: monitor, term: pts/5, ospid: 17621, machine: monitor-a
program: CTPDATA@monitor-a (TNS V1-V3)
application name: CTPDATA@monitor-a (TNS V1-V3), hash value=0
last wait for 'SQL*Net message to client' blocking sess=0x0 seq=6213 wait_time=1 seconds since wait started=0
driver id=62657100, #bytes=1, =0

Searching MOS for the arguments of the ORA-07445 error above turns up the Note <ORA-7445 [KPOPFR] [SIGFPE] [INTEGER DIVIDE BY ZERO] When Repeatedly Executing a Query (Doc ID 421203.1)>:

Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3
This problem can occur on any platform.
Symptoms

1. Repeatedly executing a query can lead to the following error: 

ORA-7445 [kpopfr] [SIGFPE] [INTEGER DIVIDE BY ZERO]

2. The call stack from the ORA-07445 trace file should contain the following functions:
kposdi  kpopsdi
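The symptom check above can be sketched in Python. This is only an illustration: the helper names `stack_functions` and `matches_bug_signature` are hypothetical, and the parser assumes the condensed `<-`-separated call stack format shown in the trace earlier in this article, not the full tabular stack that a trace file may also contain.

```python
import re

# Functions that Doc ID 421203.1 says should appear in the call stack.
SIGNATURE_FUNCTIONS = {"kposdi", "kpopsdi"}


def stack_functions(trace_text):
    """Collect function names from lines of a '<-'-separated call stack."""
    names = set()
    for line in trace_text.splitlines():
        if "<-" not in line:
            continue
        for token in line.split("<-"):
            token = token.strip()
            # Keep identifiers only; raw addresses such as 0000000000E54FE7 are skipped.
            if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", token):
                names.add(token)
    return names


def matches_bug_signature(trace_text):
    """True when every signature function is present in the call stack."""
    return SIGNATURE_FUNCTIONS <= stack_functions(trace_text)


sample = """\
sigsetjmp <- call_user_handler
<- sigacthandler <- kpopfr <- kposdi <- kpopsdi <- opiefn0
<- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino
"""
print(matches_bug_signature(sample))  # True
```

A match only suggests that the note is relevant; confirming the bug still requires Oracle Support.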

The error is caused by BUG 5753629.
Abstract: QUERY FAILS BY ORA-7445 [KPOPFR]
Repeatedly executing a query can lead to an ORA-7445[kpopfr] error.

Solution
To implement the solution, do one of the following: 

1. Upgrade to 11.1 or 10.2.0.4, when available.
At the time of writing the article these versions were not yet available (July 2007).
2. Apply one-off Patch 5753629 from MetaLink, if available for your platform and version.

There is no known workaround available for this bug.

References

BUG:5753629 - QUERY FAILS BY ORA-7445 [KPOPFR].

Hdr: 5753629 10.2.0.2 RDBMS 10.2.0.2 PRG INTERFACE PRODID-5 PORTID-23 ORA-7445
Abstract: QUERY FAILS BY ORA-7445 [KPOPFR].

*** 01/09/07 06:12 pm ***
TAR:
----

PROBLEM:
--------
When executing a query again and again from one session, the query fails
with ORA-7445[kpopfr].

  ====================================================================
  Sat Dec 30 00:22:39 2006
  Errors in file /var/log/oracle/trace/felica2_ora_6156.trc:
  ORA-7445: exception encountered: core dump [kpopfr()+536] [SIGFPE]
  [Integer divide by zero] [0x1023BCE18] [] []
  ====================================================================

DIAGNOSTIC ANALYSIS:
--------------------
From the disassembly, %o3 is divided by %o0, and %o0 seems to be 0x0.

  0x1023f43d0 :       umul  %g5, %g4, %o0
  0x1023f43d4 :       mov  %g0, %y
  0x1023f43d8 :       udiv  %o3, %o0, %o3

%o0 is calculated as %g5 X %g4 at kpopfr+528.

From our trace file, this value(%g5) is 0x100000.

  ub4 kponc_p [FFFFFFFF7B22AAAC, FFFFFFFF7B22AAB0) = 00100000

And %g4 seems to be 0x1000 from below trace file output.
 ========== FRAME [6] (kpopsdi()+148 -> kposdi()) ==========
  %l0 0000000100302B00 %l1 0000000000000002 %l2 FFFFFFFF7B263CA8
  %l3 00000003C36AFB80 %l4 0000000105D7CE20 %l5 0000000000000001
  %l6 0000000000000007 %l7 0000000105D7A920 %i0 0000000000000000
  %i1 FFFFFFFF7FFFB9EC %i2 0000000000001000 %i3 0000000105E03220
                       ~~~~~~~~~~~~~~~~~~~~ <--(*) here
  %i4 0000000000800000 %i5 0000000000105C00 %fp FFFFFFFF7FFFB241 

If %g4=0x1000 and %g5=0x100000, %g4 X %g5 = 0x100000000.
0x100000000 truncates to 0x0 as a ub4, and this may cause the divide by zero and the ORA-7445.

I can reproduce a similar problem in-house, so I'll upload a testcase.
The problem reproduces in the following case.

 * sum of all column size is 1048576(0x100000)
 * run query again and again from one session (about 4096(0x1000) times)

From these results, the above guess seems to be correct.
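The arithmetic behind this diagnosis can be checked with a short Python sketch. Python integers do not overflow, so the 32-bit truncation is emulated with a mask. The helper `ub4_mul` is a hypothetical name, and the two operands are simply the values the bug report reads from the trace (`kponc_p` and `%i2`).

```python
UB4_MASK = 0xFFFFFFFF  # ub4 is an unsigned 4-byte integer


def ub4_mul(a, b):
    """Multiply as a 32-bit unsigned register would: keep only the low 32 bits."""
    return (a * b) & UB4_MASK


g5 = 0x100000  # kponc_p in the trace: 1048576
g4 = 0x1000    # value seen in %i2 in the trace: 4096

full_product = g5 * g4
divisor = ub4_mul(g5, g4)

print(hex(full_product))  # 0x100000000
print(divisor)            # 0

try:
    1 // divisor
except ZeroDivisionError:
    print("divide by zero")  # the analogue of SIGFPE in this sketch
```

One step earlier, at 0xFFF, the product is still below 2^32 and the divisor is non-zero, which is consistent with the failure appearing only after roughly 4096 executions.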

WORKAROUND:
-----------
n/a

RELATED BUGS:
-------------
n/a

REPRODUCIBILITY:
----------------
I have confirmed that this problem reproduces in the environments below.

 * Linux x86 32bit, 10.2.0.3 : ORA-7445[kpopfr()+300]
 * Linux x86 64bit, 10.2.0.2 : ORA-7445[kpopfr()+339]
 * Solaris 64bit, 10.2.0.2   : ORA-7445[kpopfr()+536]
 * HP-UX Itanium, 10.2.0.2   : ORA-7445[_div32U()+34]

TEST CASE:
----------
First, create a table as follows.

conn scott/tiger
drop table test;
create table test
( c000 char(2000),
  c001 char(2000),
    ... 
  c523 char(2000),
  c524 char(576));

  --> sum of all column size is 1048576(0x100000).
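Typing out 525 columns by hand is tedious, so the DDL could be generated instead. The Python sketch below is not part of the bug report. It assumes that the elided columns c002 through c522 are all char(2000), which is the only reading consistent with the stated total, and the function names are hypothetical.

```python
def total_width(full_columns=524, full_width=2000, last_width=576):
    """Sum of declared column sizes: c000..c523 plus the shorter c524."""
    return full_columns * full_width + last_width


def build_test_ddl(full_columns=524, full_width=2000, last_width=576):
    """Build the CREATE TABLE statement for the reproduction table."""
    columns = [f"  c{i:03d} char({full_width})" for i in range(full_columns)]
    columns.append(f"  c{full_columns:03d} char({last_width})")
    return "create table test\n(\n" + ",\n".join(columns) + "\n);"


ddl = build_test_ddl()
print(total_width())        # 1048576
print(hex(total_width()))   # 0x100000
print(ddl.splitlines()[2])  # "  c000 char(2000),"
```

The generated text can be saved to a file and run from SQL*Plus in place of the abbreviated statement above.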

Then run the following shell script.

  while [ 1 ]
  do
  echo "set feedback off"
  echo "select * from test where c001 = 'A';"
  done | sqlplus -s scott/tiger

It takes 3-10 minutes to reproduce the problem.
Required time for reproducing depends on hardware spec.

STACK TRACE:
------------
 ksedmp ssexhd sighndlr call_user_handler kposdi kpopsdi kpoal8
 opiodr ttcpip opitsk opiino opiodr opidrv sou2o opimai_real
 main start

After an SR was filed with Oracle GCS, the problem was confirmed as BUG 5753629. Oracle GCS offered two solutions:
1. Upgrade to 10.2.0.4 or later
2. Apply one-off Patch 5753629
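To decide quickly whether a given database falls in the range named by the note (10.2.0.1 to 10.2.0.3), a small version comparison can help. The sketch below is illustrative only: the function names are hypothetical, and it cannot tell whether one-off Patch 5753629 has already been applied.

```python
AFFECTED_FROM = (10, 2, 0, 1)
AFFECTED_TO = (10, 2, 0, 3)


def parse_version(text):
    """Turn a dotted version such as '10.2.0.3.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def is_in_affected_range(version):
    """Compare the first four components against the range from Doc ID 421203.1."""
    v = parse_version(version)[:4]
    return AFFECTED_FROM <= v <= AFFECTED_TO


print(is_in_affected_range("10.2.0.3.0"))  # True, the release in this case
print(is_in_affected_range("10.2.0.4.0"))  # False, the first fixed patch set
```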
