Oracle内部错误:ORA-00600[kgskdecrstat1]一例

2011/04/28 by admin 3 Comments

一套Itanium HP-UX上的9.2.0.5系统最近出现了ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [],内部错误，其错误日志如下:

Mon Apr 18 12:32:20 2011
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
ARC0: Completed archiving  log 6 thread 1 sequence 831803
Mon Apr 18 12:32:22 2011
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
Mon Apr 18 12:32:23 2011
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []

Trace file
----------
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
ORACLE_HOME = /opt/oracle/product/9.2.0.5
System name:	HP-UX
Release:	B.11.23
Version:	U
Machine:	ia64
Redo thread mounted by this instance: 1
Oracle process number: 28
Unix process pid: 14827, image: oracle@(TNS V1-V3)

*** 2011-04-18 02:39:15.043
*** SESSION ID:(276.16183) 2011-04-18 02:39:15.042
DEADLOCK DETECTED
Current SQL statement for this session:
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                      ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-00030022-0070d56f        28     276     X            344     259           X
TX-000a0012-0058e4fc       344     259     X             28     276           X
session 276: DID 0001-001C-0000002C	session 259: DID 0001-0158-000020E6
session 259: DID 0001-0158-000020E6	session 276: DID 0001-001C-0000002C
Rows waited on:
Session 259: obj - rowid = 00001F0D - AABNCDAAEAAAAXYAAL
 (dictionary objn - 7949, file - 4, block - 1496, slot - 11)
Session 276: obj - rowid = 00001F0B - AABNCAAAEAAAAXCAAY
 (dictionary objn - 7947, file - 4, block - 1474, slot - 24)
Information on the OTHER waiting sessions:
Session 259:
 pid=344 serial=17280 audsid=106170227 user: 41
           program: JDBC Thin Client
 application name: JDBC Thin Client, hash value=0
 Current SQL Statement:
End of information on OTHER waiting sessions.
*** 2011-04-18 12:32:20.392
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kgskdecrstat1], [], [], [], [], [], [], []
----- Call Stack Trace -----
... kgskdecrstat  kskdecrstat  ktudecrustat  ktcdso  ktcrcm  ktdcmt  k2lcom ...
----- End of Call Stack Trace -----

PROCESS STATE
-------------
   program: JDBC Thin Client
   application name: JDBC Thin Client, hash value=0
   last wait for 'SQL*Net message from client' blocking sess=0x0 seq=39332 wait_time=21242
               driver id=74637000, #bytes=1, =0

可以看到以上引发ORA-00600[kgskdecrstat1]内部错误的进程同时发现了死锁(DEADLOCK)，在MOS上搜索可以发现”Bug 2894072: ORA-00600: INTERNAL ERROR CODE, ARGUMENTS: [KGSKDECRSTAT1]”，但是该Bug最早发生在9.2.0.3上，已经确定影响9.2.0.x所有版本，并且在9.2.0.5上没有backport的bug fix。

提交sr后，Oracle Gcs给出了解决的solution:
1.升级数据库到10.2.0.5，11.1.0.7，11.2.0.2等目前支持的版本
2.解决引发该bug的ORA-00060 dead lock问题
3.如果在解决ORA-00060后仍出现以上ORA-00600[kgskdecrstat1]内部错误，且启用了9i早期版本中的resource manager的话，可以尝试禁用该特性:
ALTER SYSTEM SET resource_manager_plan=” SCOPE=BOTH;
以便绕过bug 2494790。

Filed Under: Oracle, Oracle Bug Tagged With: 9i, dead lock, Oracle Bug, resource manager

Comments

admin says

2011/04/28 at 23:45

Hdr: 3090309 9.2.0.3 RDBMS 9.2.0.3 TXN MGMT DIST PRODID-5 PORTID-453 2494790
Abstract: ORA-600 [KGSKDECRSTAT1] W/ SCHEDULE_PUSH AND RESOURCE_LIMIT=TRUE

PROBLEM:
——–
Fidelity continues to get the ORA-600 [kgskdecrstat1]
Even after applying Bug Patch 2494790.

DIAGNOSTIC ANALYSIS:
——————–
Current SQL statement for this session:
DECLARE job BINARY_INTEGER := :job;
next_date DATE := :mydate;
broken BOOLEAN := FALSE;
BEGIN
declare rc binary_integer;
begin rc := sys.dbms_defer_sys.push(destination=>’FSADB20.WORLD’,
stop_on_error= >FALSE, execution_seconds=>5400,
delay_seconds=>50, parallelism=>0); end;

:mydate := next_date;
IF broken THEN :b := 1; ELSE :b := 0;
END IF;
END;
—– PL/SQL Call Stack —–
If I change resource_limit=FALSE, there is no error occurring.
However, we do use profiles for some usernames and want the
resource_limit=TRUE and need a solution to this ora-600 bug.

WORKAROUND:
———–
none

RELATED BUGS:
————-
2894072
2494790

REPRODUCIBILITY:
—————-
Always

TEST CASE:
———-

STACK TRACE:
————
ksedmp()+248 CALL ksedst()+0
kgerinv()+160 PTR_CALL 00000000
kgeasnmierr()+28 CALL kgerinv()+0
kgskdecrstat()+448 CALL kgeasnmierr()+0
kskdecrstat()+500 CALL kgskdecrstat()+0
ktudecrustat()+460 CALL kskdecrstat()+0
ktcdso()+756 CALL ktudecrustat()+0
ktcrcm()+2812 CALL ktcdso()+0
ktdcmt()+132 CALL ktcrcm()+0
k2lcom()+88 CALL ktdcmt()+0
k2send()+1024 CALL k2lcom()+0
xctctl()+96 CALL k2send()+0
xctcom()+284 CALL xctctl()+0
psdtcm()+216 CALL xctcom()+0
pevm_icd_call_commo PTR_CALL 00000000
pfrrun()+7420 CALL pevm_icd_call_commo
peicnt()+216 CALL pfrrun()+0
kkxexe()+524 CALL peicnt()+0
opiexe()+11508 CALL kkxexe()+0
opiodr()+2488 PTR_CALL 00000000
rpidrus()+112 CALL opiodr()+0
skgmstack()+152 PTR_CALL 00000000
rpidru()+124 CALL skgmstack()+0
rpiswu2()+328 PTR_CALL 00000000
rpidrv()+2428 CALL rpiswu2()+0
rpiexe()+32 CALL rpidrv()+0
kkjex1e()+4052 CALL rpiexe()+0
kkjsexe()+240 CALL kkjex1e()+0
kkjrdp()+804 CALL kkjsexe()+0
opirip()+992 CALL kkjrdp()+0
opidrv()+684 CALL opirip()+0
sou2o()+16 CALL opidrv()+0
main()+160 CALL sou2o()+0
_start()+264 CALL main()+0

SUPPORTING INFORMATION:
———————–
The scenario is not a problem when I stop and start the resource_manager_plan
repeatedly in a short period of time. The scenario is when then plan is
active, and oracle advanced replication transactions are pushed, than the
ORA-600 error occurs each time.

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-

DIAL-IN INFORMATION:
——————–

IMPACT DATE:
————
ASAP
Currently in the certification stage … Going into Production within the next
1-2 weeks. Fidelity needs to resolve this issue before going to prod.

Question from Client regarding _job_queue_interval …
******************************************************
I am taking your direction and applying the patch, 2494790 on the solaris
2.8 machine and bouncing the icmq1 database at 4pm est today.
My question, is the change to job_queue_interval from 60 to 2000. The issue
to me is that we use job_queue_interval for 60 now w/ advanced replication
and have a VERY high flow of transactions per minute, and my understanding is
that the job_queue_interval wakes up and looks for work every 60 seconds.
Changing this to 2000 means our schedule_push would be pushing every 33
minutes, instead of every one minute. This would be unacceptable if this
assumption is correct. Please advise.

From Fidelity :
We have successfully installed the patch 2494790 and bounced the icmq1
database. I have restarted the database resource manager and did schedule
push of oracle replicated transactions and reproduced the ORA-600 errors.
I will upload the alert log and the latest .trc file for your review.
Also, the job_queue_interval init.ora parameter is not a normal parameter
available in 9.2.0.3 for me to modify.
*************************
I did inform them this is an “underscore” parameter, but based on your latest
update, IN will upload the latest trace and alert files, and wait further
instructions.

Here is the process that Fidelity uses to create the plan and start
the resourse manager…. There is additional info as well.

Procedure from Fidelity:
The commands to start and stop the resource manager are…
(run as my dba account…)
exec dbms_resource_manager.switch_plan(”);
exec dbms_resource_manager.switch_plan(‘WEEKLY_SYNCHRO_PLAN’);

Here’s my procedure to install the database resource manager via
the oracle enterprise manager gui…
1. (assumes database was migrated from 8.1.7.4.0 to 9.2.0.3.0)
2. existing resource manager plan, called WEEKLY_SYNCHRO_PLAN deleted via oem
3. use create option (not create wizard) for resource plan and create a
new plan called WEEKLY_SYNCHRO_PLAN via om
4. create new resource groups for DBA_GROUP, REPORTING_GROUP, SYNCHRO_GROUP.
5. assign users to default groups and assign 90% to 5 different levels
for 5 different resource groups.
6. apply parallelism limitations for each group
7. turn on the plan..
exec dbms_resource_manager.switch_plan(`WEEKLY_SYNCHRO_PLAN?);
8. run a transaction on a table that is set up in real-time replication and
ensure transaction is pushed via schedule_push from this database.
9. observe alert log to see if ORA-600 occurred.

Additional Info from Fidelity:
I think I need to clarify that the database resource plan is normally not
running in production except for once/week from wed 8pm to thursday about
10am. So when the 8i to 9i database upgrade is done, the resource manager is
definitely not running. I delete the resource plan while it is not active.

Just tested by shutting off the plan and turning it on 5 minutes later, then
pushing a replicated transaction, but did NOT cause an ORA-600 to occur.
So I cannot prove that this alone causes the error. I had also reported
earlier
that I had run a large periodic update producing thousands of replicated
transactions for a few hours and no additional ORA-600 was produced during
this time period.
Latest info from Fidelity:
1) true, problem only exists when resource_limit=TRUE
this init.ora parameter setting is a requiremnt for us to go live.

2) I did not specify an undo limit for the resource manager.
We are using the “old” rollback segments for this live install on
12-SEP-2003 and are NOT utilizing the new undo management feature
for this release due to reports that it was “buggy” in the first
release of oracle 9i.
Since we are not utilizing the undo mgt feature, what exactly do
you want me to try a workaround with?

Also, I could not find info on the 10720 trace event.
but since it undoes the undo mgt feature and we are not utilizing this
feature, is this still an issue? .

The current definition for the undo_pool feature within database
resource manager looks like this…

all 5 resource groups show the undo_pool entry as UNLIMITED.
———————-
The only info that I could find about event 10720 is what you have already
indicated and the little info in not 76237.1 “traces db scheduling”.

Fidelity has set the event 10720 in the system … Here is the outcome …
—————-
So far so good, event added to init.ora, icmq1 database bounced,
I have turned the database resource manager plan on and off several times in
row and pushed orep transactions and have so far,
not resulted in any alert log error and no ORA-600 errors.
I’ll continue to test.

Now, the patch that was applied to this database, 2494790, is this still
required or should it be backed out? if it should be backed out, how
is this done?
——————–
Can you provide feedback on the 2494790 patch?

回复
admin says

2011/04/28 at 23:46

Bug 2494790 Undo quota discrepancies may be reported for global transactions

This note gives a brief overview of bug 2494790.
The content was last updated on: 29-MAR-2004
Click here for details of each of the sections below.
Affects:

Product (Component) Oracle Server (Rdbms)
Range of versions believed to be affected Versions >= 9.0 but < 10.1.0.2 Versions confirmed as being affected 9.2.0.3 Platforms affected Generic (all / most platforms affected) Fixed: This issue is fixed in 9.0.1.5 iAS Patch Set 9.2.0.4 (Server Patch Set) 10.1.0.2 (Base Release) Symptoms: Related To: (None Specified) XA / Distributed Transactions Description When the resource manager is enabled, global transactions can cause undo quota discrepancy messages in the alert log. eg: KTUDECRUSTAT: undo quota discrepancy: xcb=0, xcbusz=52, ResMgr=0 Workaround: These messages can be ignored as they are not a serious problem.

回复
admin says

2011/04/28 at 23:49

Type B – Defect Fixed in Product Version –
Severity 2 – Severe Loss of Service Product Version 9.2.0.3.0
Status 36 – Duplicate Bug. To Filer Platform 23 – Oracle Solaris on SPARC (64-bit)
Created 08-Apr-2003 Platform Version 5.8
Updated 26-Nov-2003 Base Bug 2494790
Database Version 9.2.0.3.0
Affects Platforms Generic
Product Source Oracle Affected Version –
Related Products
Line Oracle Database Products Family Oracle Database
Area Oracle Database Product 5 – Oracle Server – Enterprise Edition
Hdr: 2894072 9.2.0.3.0 RDBMS 9.2.0.3.0 TXN MGMT LOCAL PRODID-5 PORTID-23 ORA-600 2494790
Abstract: ORA-600: INTERNAL ERROR CODE, ARGUMENTS: [KGSKDECRSTAT1]

PROBLEM:
——–
Customer is getting ORA-600: internal error code, arguments: [kgskdecrstat1]

DIAGNOSTIC ANALYSIS:
——————–

WORKAROUND:
———–
Unknown

RELATED BUGS:
————-
Bug:1533623> BugDB see Bug:1533623.-P No Base Bug
Related Bugs : DDR:
Customer: INTERNAL Created: 07-DEC-00
Component: RDBMS Comp Ver: 9.0.0 Rel St: A Updated: 05-JUL-01
Sub Comp: RDBMS Ver: 9.0.0 By: ARHEE
Status: 91,Closed, Could Not Reproduce
Sup Pri: 2,Severe Loss of Service Fixed In Ver:
O/S: 453 Sun SPARC Solaris
PL Group: UNIX Gen/Port: G Error #: – Pub: N

**************************************************************************

Hdr: 1533623 9.0.0 RDBMS 9.0.0 PRODID-5 PORTID-453
Abstract: ORA-600 [KGSKDECRSTAT1], [], [], [], [], [] ON SWITCH CONSUMER
GROUPS

REPRODUCIBILITY:
—————-

TEST CASE:
———-
N/A one time error

STACK TRACE:
————
ORA-600: internal error code, arguments: [kgskdecrstat1], [], [], [], [],
[], [], []
—– Call Stack Trace —–
calling call entry argument values in hex

location type point (? means dubious value)

——————– ——– ——————–
—————————-
ksedmp kgerinv kgeasnmierr kgskdecrstat kskdecrstat ktudecrustat ktcdso ktcrcm
ktdcmt k2lcom k2send xctctl xctcom opiodr ttcpip opitsk opiino opiodr opidrv
sou2o main _start

SUPPORTING INFORMATION:
———————–
Will upload related files.

回复

Oracle内部错误:ORA-00600[kgskdecrstat1]一例

Comments

Comment 取消回复