ORA-00600: internal error code, arguments: [15160]

一套windows上的11.1.0.7系统,用户在查询时出现ORA-00600: internal error code, arguments: [15160], [], [], [], [], [], [], [], [], [], [], []错误如下:

Dump continued from file: c:\app\administrator\diag\rdbms\dbuat\dbuat\trace\dbuat_ora_544.trc
ORA-00600: internal error code, arguments: [15160], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 16131 (ORA 600 [15160]) ========

*** 2010-07-20 15:49:20.015
----- Current SQL Statement for this session (sql_id=3vx5wh2859qy4) -----
SELECT * FROM ACVW_ACDCBIRD
WHERE CUST_AC_NO = NVL(:B5 , CUST_AC_NO)
AND BRANCH_CODE = :B4
AND ACCOUNT_CLASS = NVL(:B3 , ACCOUNT_CLASS)
AND CUST_NO = NVL(:B2 , CUST_NO)
AND CCY = NVL(:B1 , CCY)

----- PL/SQL Call Stack -----
object line object
handle number name
4648BA6C 811 package body FCUAT.ACPKS_FCJ_ACDCBIRD
4648BA6C 895 package body FCUAT.ACPKS_FCJ_ACDCBIRD
4648BA6C 199 package body FCUAT.ACPKS_FCJ_ACDCBIRD
4648BA6C 122 package body FCUAT.ACPKS_FCJ_ACDCBIRD
4648E458 13 anonymous block
4648F520 161 package body FCUAT.STPKS_FIDPKG_WRAPPER
464953BC 1940 package body FCUAT.STPKS_FCMAINT_SERVICE
464B8DE0 164 package body FCUAT.GWPKS_SERVICE_ROUTER
464B8DE0 1549 package body FCUAT.GWPKS_SERVICE_ROUTER
464B8DE0 1718 package body FCUAT.GWPKS_SERVICE_ROUTER
464B96E0 1 anonymous block

----- Call Stack Trace -----

PGOSF60__ksfdmp dbgexPhaseII dbgexProcessError dbgeExecuteForError

该ORA-600[15160]内部错误已知可能由以下Bug引起:

1)
Bug:8295719 - ORA-600 [15160] running query with FULL OUTER join Fixed-Releases: B201 Tags: ANSI CBO OERI REDISCOVERY INFORMATION:
If a query with nested views and Full Outer Join fails during parsing in Oracle11g with ORA-600 [15160] and Join Predicate Push Down (JPPD) is taking place and one of the nested view query blocks in the 10053 trace has a very high value for cost in the chosen plan (in section CBQT Join Predicate Push Down Additional Phase) then you may have encountered this bug.
WORKAROUND:
_optimizer_push_pred_cost_based = false

2)
Bug 7370515 - ORA-600[15160]: INTERNAL ERROR CODE
REDISCOVERY INFORMATION:
If you encountered ORA-600[15160] error when UNION/UNION ALL view is on the right side of null aware anti join, it could be this bug.
WORKAROUND:
Set "_optimizer_null_aware_antijoin"=false

3)
Bug 9826143 - UNION QUERY WITH OUTER JOIN AND GROUP BY CLAUSE FAILS WITH ORA-600 [15160] (36)
>>>>>>>>>>
Bug 9213751 - SQL GETS ORA-600 [15160] WITH _OPTIMIZER_PUSH_PRED_COST_BASED=TRUE (11)
From trace: [_optimizer_push_pred_cost_based = true]

4)
Bug 8947490 - ORA-600[15160] OCCURS WHEN USING SQL TUNING ADVISOR
REDISCOVERY INFORMATION:
a query with full outer joined union view produced ora-[15160] most likely during sqltune /index advisor.

可以从bug描述中了解到这是一个优化器相关的bug,一般与优化器相关的bug都可以通过关闭优化器的部分特性来workaround。针对ORA-600[15160]内部错误,我们可以尝试设置以下2个隐藏参数以绕过问题:

1) SQl>
Alter session set "_optimizer_null_aware_antijoin"=false;

/* to disable null aware anti join */

2) SQl>
Alter session set "_OPTIMIZER_PUSH_PRED_COST_BASED" =false ;

/*WORKAROUND: disable cost based push predicate */

以上多个bug据Oracle官方说法均已在11.2.0.1版本中修复了。

New to My Oracle Support?

晚上打开MOS的页面,发现居然多出了陌生的Get Started栏目,该栏目包含了引导用户在My Oracle Support上如何注册,以及观看使用教学视频等信息:
New to My Oracle Support?

帮助界面由Adobe flash驱动,似乎Oracle爱上了使用flash:不仅仅是此次推出的Get Started,之前在各个教学Note中也都加入了flash驱动的视频;好吧这些flash看上去还挺酷,也可以说是改善用户体验的一个重要环节,不过加载时间实在令人恼火!
How does MOS flash interface work?

下文是blog.oracle.com上对My Oracle Support中视频小插图(Video Vignettes)的介绍:

Video Vignettes for My Oracle Support: Featuring user-driven innovations

My Oracle Support: Featuring user-driven innovations

My Oracle Support is here and with it come new tools to help you find answers faster, more easily log and track SRs, and maintain highly available systems using healthchecks and patch recommendations designed specifically for your running environment.

Learn what’s new and start saving time today.

You can also use the Feedback link inside of My Oracle Support.
There are plenty of video previews available. There are also leader lead classes, and even sessions in other languages. Those posted here are only in English right now. Most are closed captioned for the hearing impaired.

Highlights

Overview of My Oracle Support (with an emphasis on migration of Sun customers)
This 6 minute video is of a live demo. It provides just the basics for a first time user of My Oracle Support. It has an emphasis on those who are coming from Sun support.
Registering for My Oracle Support (including Sun customers)
This 8 minute video is of a live demo. Although registration doesn’t require much explanation, it goes into “Support Identifiers” which is how you gain access to features and services. It has an emphasis on those who are coming from Sun support.
Registration and Account Administration (with an emphasis on migration of Sun customers)
This 15 minute video is of a live demo. Learn how registration works for My Oracle Support with an emphasis on how the administrator at your organization handles setting and approving access for others in your organization. It has a focus on understanding Support Identifiers, getting approval to access, and administration of permissions. We also include getting access with Sun contracts and serial numbers.
Introduction to the new Certify Search
This 4 minute video is of a live demo – no PowerPoint slides! We are rebuilding Certify to make it MUCH easier to find and understand certifications. This video is available PRIOR to shipping this feature, to encourage some discourse and feedback. Let us know your comments!
Certifications Searching – Advanced Uses
This 6 minute video is a live demo – no PowerPoint slides! Although there is not much to talk about in advanced training, we think some tips and tricks might be helpful, so here they are. Again this feature is not yet available, but we want to get customers some time to provide feedback to help in the process.
Upgrade Planner
This 8 minute video of the real software contains no PowerPoint slides. It covers the forthcoming feature to help customers plan an upgrade from one software release to another. Let us know what you think!
Upgrade Planner – Advanced Uses
This 10 minute video of the real software contains no PowerPoint slides. It covers upgrades which would require merge patches, patches which need to be requested and more details on how to access the upgrade planner and some tips and tricks. Let us know what you think!

Most Popular
These videos address frequently asked questions and are among our most watched.

How do I Install configuration manager
Learn the most efficient way to install Oracle Configuration Manager (OCM) in your environment to receive the full value from your support experience. Accurate as of Oct 2010.

Why use configuration manager?
Learn which features require users to download, install and use Oracle Configuration Manager (OCM), and whether these use cases are right for you.

Creating a Service Request
Review methods for creating a Service Request, the features of each step, and what to expect if you save drafts or file the Service Request (does not cover Sun “Hardware” quests, this will be covered in another session).
Dashboard Customization
Key ways to customize your experience by creating a tailored view of your Oracle support data. This video is applicable for customers with and without Oracle Configuration Manager (OCM) in use.
PowerView
Tailor your searches so you see only results relevant to your environment. Power View is especially useful for customers with large numbers of Support Identifiers and collectors.
Searching
Tips and tricks for getting the most out of Knowledge searching, refining searches, and advanced searches. Useful for all users. Mostly accurate, could use a refresh (Oct, 2010)
Product Overview
Explains My Oracle Support features available with and without Oracle Configuration Manager (OCM). (This one is much longer)
Reporting
Learn about the Inventory report and its many features. This demo is for customers using Oracle Configuration Manager (OCM).

My Opinion On Exadata V2

上周参加了Oracle公司的Exadata的网络研讨会;Oracle的销售重点介绍了Exadata V2目前的产品线,虽然要比原先和HP合作时的Exadata V1丰富一些,但我始终觉得目前有限的几种组合很难满足客户的需求。其次假设真买了Exadata的机器,先不说从硬件到软件全部由一个供应商垄断的问题,就目前国内的情况要提供全面的维保也很难做到,因为这是一个全新的产品,不论是普通it从业人员还是久经沙场的DBA对之都缺乏了解,一旦发生问题Oracle中国部分如果解决不了,之后就只能通过提交SR让老外来帮忙解决,而这样一来一去企业的损失是无法估量的。

到目前为止Oracle所拿出的客户成功案例中,最上得了台面的是澳洲联邦银行(Commonwealth Bank of Australia (CBA))基于Exadata实现系统整合的项目,但实际上根据zdnet.com.au的报道,这个项目中的一个应用在今年的7月还处于测试阶段而没有正式上线:

The first application using the database-as-a-service capability will go into production soon, the bank said. Capacity and updating Oracle database clusters to 11gR2 could see further purchases, according to CBA.

事实上我个人对Exadata所带来的技术变革有着极大的兴趣,如果Exadata能够获得成功的话,那么Oracle就在成功收购Sun后在整合硬件领域上迈出了长足的一步,Database Machine的噱头加上自家的老牌商业数据库足可以使Oracle赚到盆满钵满;但Exadata V2始终还有很长的一段路要走,希望它能越走越远!

lower case or UPPER CASE hostname in RAC?

RAC中各节点的hostname是设成大写(UPPER CASE)形式还是小写(lower case)形式好呢?
一开始被这个问题问得有些莫名?之后觉得一切都有必要让事实说话,回顾一下我们使用RAC的惨痛经历便见分晓:

Bug 5168043If node names supplied in VIPCA silent mode are uppercase, VIPCA causes the following error:
Invalid node name "" entered in an input argument.
Workaround   Use VIPCA GUI mode.
Oracle Versions confirmed as being affected - 10203 ( fixed in 10.2.0.4)
PROBLEM:
--------
Env 3 node 10.2.0.1 AIX 5L 

During installation of CRS, root.sh gives "Invalid node name"
error if hostname is written in Capital letters in /etc/hosts.

 # root.sh
  WARNING: directory '/oracle' is not owned by root
  Checking to see if Oracle CRS stack is already configured

  clscfg: version 3 is 10G Release 2.
  Successfully accumulated necessary OCR keys.
  Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
  node : 

  clscfg: Arguments check out successfully.

  Expecting the CRS daemons to be up within 600 seconds.
  CSS is active on these nodes.
  CSS is active on all nodes.
  Waiting for the Oracle CRSD and EVMD to start
  Oracle CRS stack installed and running under init(1M)
  Running vipca(silent) for configuring nodeapps

DIAGNOSTIC ANALYSIS:
--------------------

We can reproduce the error if we issue vipca like below.

$ORA_CRS_HOME/bin/vipca -silent -nodelist {CAPITAL1},{CAPITAL2}  \
  -nodevips {CAPITAL1}/{CAPITALV1}/255.255.252.0/eth0, \
            {CAPITAL2}/{CAPITALV2}/255.255.252.0/eth0
 Invalid node name "{CAPITAL1}" entered in an input argument.
 Invalid node name "{CAPITAL1}" entered in an input argument.

WORKAROUND:
-----------
- do not write capital letters in /etc/hosts
- use vipca(GUI) and configure VIP resource afterwards.

RELATED BUGS:
-------------
BUG#4632899 solaris port specific bug regarding capital letters.
            May not be related with vipca.

REPRODUCIBILITY:
----------------
100% in 10.2.0.1 AIX
100% in 10.2.0.1 Linux

Bug 4632899 CSS does not start on Solaris if the hostname is in upper case.
 The following errors might be noticed when using CRS scripts:
  Failure at scls_scr_create with code 1
  Category: 1234
  Operation: scls_scr_create
  Location: mkdir
  Other: Unable to make user dir
  Dep: 2

Workaround
  Change the hostname to from uppercase to lower case.
Versions confirmed as being affected - 10203 ( Issue fixed in 11.1.0.6)
PROBLEM STATEMENT:
------------------
The localconfig script fails to startup the CSS in single instance
configuration if the hostname of the server contains capital letters.

Errors encountered :
===
Failure at scls_scr_create with code 1
Internal Error Information:
  Category: 1234
  Operation: scls_scr_create
  Location: mkdir
  Other: Unable to make user dir
  Dep: 2
===

Bug 6674075: RAC: PRKR-1078 FROM SRVCTL MODIFY DATABASE WITH UPPER CASE DB NAME

PROBLEM:
--------
Two Node RAC running 10.2.0.3 CRS,ASM and RDBMS.
Customer has installed another 10.2.0.3 home for RDBMS and trying to change
the oracle home for the existing database.

We ran the "srvctl modify" command from both existing and the new oracle
homes (both are 10.2.0.3) and got the same PRKR-1078 error.

The entry for the CONFIG_VERSION looks okay in the OCR.

DIAGNOSTIC ANALYSIS:
--------------------
The entries for the database JUNK and two instances in the OCR looks okay and
the DATABASE.DATABASES.JUNK.CONFIG_VERSION 10.2:

from the OCRDUMP:
=========================
[DATABASE.DATABASES.junk.CONFIG_VERSION]
ORATEXT : 10.2.0.0.0
========================

from the srvm_trace of srvctl modify :
==========================
[main] [12:55:59:677] [has.ClusterLockNative.Native]  prsr_trace: Native:
acquire: get mutext SRVM.DATABASE.DATABASES.junk

[main] [12:55:59:678] [has.ClusterLockNative.Native]  prsr_trace: Native:
acquire: ret=0

[main] [12:55:59:679] [ClusterLock.acquire:127]  Acquire returned nr=true
status=true
[main] [12:55:59:679] [OCR.keyExists:664]
OCR.keyExists(DATABASE.DATABASES.JUNK.CONFIG_VERSION)
[main] [12:55:59:679] [nativesystem.OCRNative.Native]  keyExists: calling
procr_open_key(DATABASE.DATABASES.JUNK.CONFIG_VERSION)

[main] [12:55:59:694] [nativesystem.OCRNative.Native]  keyExists:
procr_open_key retval = 4 

[main] [12:55:59:694] [OCRTree.setDBOracleHome:1572]  DATABASE.DATABASES.junk
incompatible version
[main] [12:55:59:729] [OCRTree.setDBOracleHome:1587]  Releasing exclusive
lock SRVM.DATABASE.DATABASES.junk
[main] [12:55:59:729] [has.ClusterLockNative.Native]  prsr_trace: Native:
unlock

PRKR-1078 : Database JUNK cannot be administered using current version of
srvctl. Instead run srvctl from /oracle/product/10.2.0/db0
=====================

Bug 7560908: CANN'T START SERVICES ,IF ITS INSTANCES REGISTERD WITH NODE NOME IN UPPER CASE
PROBLEM:
--------
- If we try to start service we got the following errors 

oracle@RZ-DB-44:~> srvctl start service -d db1 -s srv1
PRKP-1030 : Failed to start the service srv1.
CRS-1007: Failed after successful dependency consideration
CRS-0223: Resource 'ora.db.srv1.cs' has placement error.

As per Note 372145.1 - Service start fails with CRS-1030 Crs-1007 Crs-0223 

we delete the services and the database resource and its instances ,then we
add the database again but at this time we add the database instance node
name with lower case ,as it was in upper case ,

- after that we are able to start the database instance
.......................

答案很简单,lower case好于UPPER CASE!不仅仅是hostname,使用大写的rac资源(resource)在某些版本中也可能引发bug!当你在设计RAC系统是有必要把大小写这一问题考虑进去,事实上我相信大多数人和我一样喜欢使用小写字母加数字的形式,譬如我的rh1,rh2,rh3……….

11g Release 2 enhanced Tablespace Point In Time Recovery

11g release 2中引入了针对被dropped掉的表空间的表空间时间点恢复,这是一种十分有用的新特性。TSPITR(TablesSpace Point In Time Recovery)在10g中就能做到自动创建辅助实例以恢复表空间到某个时间点,但在10g中是无法恢复一个已经被drop掉的表空间的。如同10g中一样11g仍旧可以利用全自动的TSPITR恢复被drop的表空间;Oracle会自动创建并启动辅助实例,且仅仅还原那些恢复所需的控制文件,system,sysaux,undo表空间及目标表空间,这些工作都将在用户指定的辅助目的地’Auxiliary Destination’中完成;之后Oracle将进一步使用辅助实例recover目标表空间到指定的时间点,并将其中的数据以Data Pump传输表空间的形式倒回到原数据库当中。
接下来我们要具体测试这一新特性,我们会创建一个示例表空间并在该表空间上产生少量数据,之后我们将对数据库进行备份,drop目标示例表空间,并在RMAN中使用TSPITR的方式将已经被drop掉的表空间恢复回来。在正式drop表空间前我们当然需要留意时间点或者当时的scn号,以保证正常恢复,同时在测试时使用recovery catalog恢复目录,虽然我们同样可以不用。
[Read more…]

Oracle DRM kjfcdrmrfg 超时错误分析一例

实例节点的警告日志,错误如下:

Fri Mar 28 04:52:24 2008
Errors in file /oracle/oracle/admin/UC/bdump/u1_lmon_9208.trc:
ORA-00481: LMON process terminated with error
Fri Mar 28 04:52:24 2008
LMON: terminating instance due to error 481
Fri Mar 28 04:52:24 2008
System state dump is made for local instance
System State dumped to trace file /oracle/oracle/admin/UC/bdump/u1_diag_9204.trc Fri Mar 28 04:52:28 2008
Instance terminated by LMON, pid = 9208
Fri Mar 28 04:57:08 2008
Starting ORACLE instance (normal)
… …
2 实例节点的警告日志片段如下:

Fri Mar 28 04:52:24 2008
Trace dumping is performing id=[cdmp_20080328045224] Fri Mar 28 04:52:32 2008
Reconfiguration started (old inc 4, new inc 6)
List of nodes: 1
Fri Mar 28 04:57:21 2008
Reconfiguration started (old inc 4, new inc 8)
List of nodes: 0 1
Nested reconfiguration detected. Fri Mar 28 05:02:27 2008
LMON: terminating instance due to error 484
Fri Mar 28 05:02:27 2008
System state dump is made for local instance
System State dumped to trace file /oracle/oracle/admin/UC/bdump/u2_diag_16093.trc Fri Mar 28 05:02:29 2008
Errors in file /oracle/oracle/admin/UC/bdump/u2_diag_16093.trc:
ORA-00484: LMS* process terminated with error
Fri Mar 28 05:02:29 2008
Trace dumping is performing id=[cdmp_20080328050227] Fri Mar 28 05:02:32 2008
Instance terminated by LMON, pid = 16097
… …
根据 LMON 超时的错误表现,导致 CLUSTER 异常踢出节点的原因,首先对如下方面考虑: 1. 操作系统 CPU/内存使用异常,导致请求不到所需资源,访问超时造成
2. 后台存储系统的 I/O 请求出现挂起,导致对数据文件(如:控制文件等)的访问挂起超时造成

3. 网络出现中断或阻塞,导致 LMON 通信异常,造成超时

4. ORACLE 软件在某些情况下处理异常(BUG) 可能性分析一:操作系统 CPU/内存使用异常状况的
操作系统方面。经检查对应的时间段内,处理器与内 存的使用均表现正常。在应用方面也没有新的动作操作。

可能性分析二:后台存储系统的 I/O 请求出现挂起的

后台存储导致数据库挂起、请求超时造成 CLUSTER 异常并进行 reconfiguration 动作的案例,我们遇到 过两例,分别为:

◆ 因为 AIX OS 漏洞造成存储驱动数据通讯异常,导致节点错误被踢出
◆ 因为存储系统程序本身的 BUG,造成高数据并发访问请求时的 CKPT 进程访问控制文件挂起(在 OS 端无错误,因为 I/O 请求无响应),最终导致数据库内部访问延时的超时定义被触发(详细说明见下
面部分),数据库节点关闭,导致节点错误被踢出

本次事件由于没有关于存储的监控或异常日志,无法判断是否有可能由此问题造成错误。但这个方面的检 查,仍然值得考虑。理由:我们在 LMON 的日志中发现,LMON 访问超时 900 秒,当时 LMON 访问的 数据文件为数据库控制文件。

逻辑结构图如下:

cgs_IMR
对应的超时时间为 900 秒,符合 CKPT 的超时定义,因此引起了我们的注意。关于 CKPT 方面的资料, 我们介绍如下:

■ CKPT 进程每 3 秒更新一次控制文件,这个操作我们可以成为“心跳”。CKPT 进程将为每一个实例 写唯一的一个数据块在控制文件头部, 通过这个信息维护心跳,成为“检查点进程记录”。当然在单
节点数据库也存在此信息。LMON 给每个 LMON 进程发送控制与状态信息。

在数据库内部,这些信息的接收具有 TIMEOUT 超时设定。如果信息发送失败或者出现接收到空信息,那 么 reconfiguration 动作将被触发。

■ LMON 在数据库通过内部参数 _cgs_send_timeout 定义超时,缺省为 300 秒。
■ CKPT 在数据库通过参数“ _controlfile_enqueue_timeout”定义超时,缺省为 900 秒。

本次的 LMON 进程日志 显示, CGS 超 时 条 件被触发 是本 次问题发 生的 触发条件 , U1 的 日 志 u1_lmon_9208.trc 部分见下:

Begin DRM(41)
*** 2008-03-28 04:36:52.893 <<- – CLOCKTIME BEGIN
sent syncr inc 4 lvl 1097 to 0 (4,0/31/0) sent synca inc 4 lvl 1097 (4,0/31/0) sent syncr inc 4 lvl 1098 to 0 (4,0/34/0) sent synca inc 4 lvl 1098 (4,0/34/0) sent syncr inc 4 lvl 1099 to 0 (4,0/36/0) sent synca inc 4 lvl 1099 (4,0/36/0)
… …
sent synca inc 4 lvl 1104 (4,0/38/0)
*** 2008-03-28 04:51:54.355 <<- – 300S TIMEOUT TRIGGED kjfcdrmrfg: SYNC TIMEOUT (610433, 609532, 900), step 31
… …
而在 DIAG 进程的日志中可以发现,LMON 当时在访问控制文件,如下信息:

OSD pid info: Unix process pid: 9208, image: oracle@cupd25k-c (LMON) Dump of memory from 0x0000000402541A38 to 0x0000000402541C40
… …
Repeat 23 times
(FOB) flags=2 fib=5949008e0 incno=0 pending i/o cnt=0 fname=/dev/vx/rdsk/oradgUC/lv_uc_ctl3
fno=2 lblksz=4 fsiz=98499024
(FOB) flags=2 fib=594900540 incno=0 pending i/o cnt=0 fname=/dev/vx/rdsk/oradgUC/lv_uc_ctl2
fno=1 lblksz=4 fsiz=98499024
(FOB) flags=2 fib=5949001a0 incno=0 pending i/o cnt=0 fname=/dev/vx/rdsk/oradgUC/lv_uc_ctl1
fno=0 lblksz=4 fsiz=98499024
OSD pid info: Unix process pid: 9208, image: oracle@cupd25k-c (LMON) Dump of memory from 0x0000000402541A38 to 0x0000000402541C40
… …
Repeat 23 times
(FOB) flags=2 fib=5949008e0 incno=0 pending i/o cnt=0 fname=/dev/vx/rdsk/oradgUC/lv_uc_ctl3
fno=2 lblksz=4 fsiz=98499024
(FOB) flags=2 fib=594900540 incno=0 pending i/o cnt=0 fname=/dev/vx/rdsk/oradgUC/lv_uc_ctl2
fno=1 lblksz=4 fsiz=98499024
(FOB) flags=2 fib=5949001a0 incno=0 pending i/o cnt=0 fname=/dev/vx/rdsk/oradgUC/lv_uc_ctl1
fno=0 lblksz=4 fsiz=98499024
kjfsprn: dmap ver 4 (step 0)
DEFER MSG QUEUE ON LMD0 IS EMPTY SEQUENCES:
0:0.0 1:870037.0
DEFER MSG QUEUE ON LMS0 IS EMPTY SEQUENCES:
… …
DEFER MSG QUEUE ON LMS5 IS EMPTY SEQUENCES:
0:0.0 1:3464186.0
而此错误在高负载和锁请求频繁时发生,可以认为本次错误与网络拥塞无关。 可能性分析四:ORACLE 代码处理异常(BUG)

我们在分析最终认为,此种情况的可能性为 90%,不排除有其它原因(如 I/O HANG)导致 ORACLE 数据库对应的代码出错。不管起因如何,ORACLE 在 10.2.0.2 版本上,在本次事件的异常情况下处理, 在代码上存在缺陷。

理由一:DRM 检查未完成

DRM(dynamic resource mastering)功能在 ORACLE9.2 版本上存在但未被激活,因此只有 10G 版本存在此错误。几乎可以肯定的是,DRM 的轮询功能未能正常完成,应该是 ORACLE 本身对异常的处 理发生错误。

理由二:KJCDRMCFG 函数出现报错导致 LMON 异常

ORACLE 10.2.0.1~10.2.0.3 在 KJCDRMCFG 上面有很多 BUG,在高版本 10.2.0.4 上大部分已
经修复。此函数主要功能等待接收动态资源 REMASTER 信息。
三、结论、解决方案与建议
根据本次分析,我们得到的结论是:因为后台磁盘 I/O 挂起或其它异常原因,导致数据库在应用请求 不是在高负载情况下出现了异常,导致 DRM 过程中 CGS 超时条件被触发。
而此种超时,是由于 DRM 部分的处理代码,未能正确处理这部分异常造成,属于 ORACLE 软件的 BUG 类别。
修复建议:DRM 功能关闭或进行软件升级
此错误由 DRM 功能引起,因此可以选择将 DRM 功能关闭,将不再出现此错误。使用如下内部参数 关闭 DRM 功能:
1. _gc_integrity_checks = false

2. _gc_affinity_time=0

(DRM 在 ORACLE 10GR2 上生效)
3.2.1 关于DRM作用 DRM 在 10gr2 上被打开。每一个数据块在 RAC 下面均有一个主节点具有控制权。此节点将拥有此数
据块的锁资源。其它节点对此数据块的请求将进入请求队列。而决定数据块在那个节点上具有控制权及切 换,将有 DRM 轮询负责完成。DRM 将使 remaster 变快,提升了 RAC 的性能。
关闭 DRM 将导致 reconfiguration 动作比正常动作慢。同时,关闭 DRM 将因为数据块的 remaster 功能的关闭,损失部分数据库性能(大约 2~5%)。但关闭 DRM 功能不会有其它的负面作用。

Applying Database PSU 10.2.0.4.6

Last week Patch set update 10.2.0.4.6 has been released, this psu includes “All fixes in the 10.2.0.4.5 overlay patch” and “New fixes which are included in the Critical Patch Update for October 2010 (CPUOCT2010)”.
Now I am trying to apply it on my 10.2.0.4.5 instance!
[Read more…]

Oct 12: Patch Set Update Released

一周不关注MOS就变得落伍了!在上周的Oct 12,Oracle分别发布了10.2.0.4 Patch Set的PSU 6(10.2.0.4.6)和10.2.0.5的第一个PSU。
10.2.0.4.6补丁更新集包含了所有10.2.0.4.5中的fix,此外添加了CPU(Critical Patch Update) For October 2010 (CPUOCT2010);而10.2.0.5.1则包含了Critical Patch Update for October 2010 (CPUOCT2010),并修复了以下bug:

Bug:6402302  Streams table and schema rules not consistently applied for DDL on view/synonym
Bug:7519406  Larger trace than needed for ORA-8103 under kteincnt1
Bug:8544696  Table segment growth (blocks are not reused) with ASSM
Bug:8546356  ORA-8102/ORA-1499/OERI[kdsgrp1] Composite Partitioned Index corruption after rebuild ONLINE in RAC
Bug:8664189  OERI [kdiss_uncompress: buffer length] on key compressed index
Bug:9711859  ORA-600 [ktsptrn_fix-extmap] during extent allocation caused by bug 8198906
Bug:9713537  Message file change for ORA-600
Bug:9714832  Message file change for ORA-7445
  • 10.2.0.4.6 PSU Note:https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=9952234.8
  • 10.2.0.4.6 Known Issues:https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=1230884.1
  • 10.2.0.5.1 PSU Note:https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=9952230.8
  • 10.2.0.5.1 Known Issues:https://support.oracle.com/CSP/main/article?cmd=show&type=NOT&id=1230855.1

同时10.2.0.5已经被confirm成为10g release 2的Final Patch Set:

10.2.0.5.0 – Patch Set #4, List of fixes: Note 1088172.1 ==> Last Patch Set
10.2.0.5.1 – Patch Set Update (PSU 1): Note 9952230.8

记以录之。

11g r2中对闪回数据归档的增强

11g r1中引入了闪回数据归档新特性,其工作原理是为针对启用了归档方式的表,FBDA进程将创建对应于该表的内部历史表。历史表将拥有原始表的所有列加上某些时间戳列以便跟踪事务处理的变化,具体如下:

SQL> desc SYS_FBA_HIST_75718;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 RID                                                VARCHAR2(4000)
 STARTSCN                                           NUMBER
 ENDSCN                                             NUMBER
 XID                                                RAW(8)
 OPERATION                                          VARCHAR2(1)
 T1                                                 NUMBER(38)
 T2                                                 VARCHAR2(20)
/* 注意其中T1,T2为原始表上的应用数据列 */

[Read more…]

11g新特性-SQL PLUS 错误日志

习惯使用SQLPLUS管理Oracle的朋友肯定不会对使用show errors命令来确认PL/SQL匿名块或过程执行错误感到陌生。似乎在11g以前这是唯一的一种勘错途径了,可惜的是这部分show errors显示的错误信息往往不够全面同时也无法立即保存下来。11g r1中引入了新的错误日志特性,以便DBA或应用开发人员在调试PL/SQL程序时更高效地排除错误。通过set errorlogging命令可以在SQL执行期间将所有的错误记录存储在一个特定的错误日志表中。默认情况下,set errorlogging会将错误记录写入到名为SPERRORLOG的表中。可以客制化该表的表名,错误日志表记录错误的各种属性:1.引发错误用户的用户名;2.错误发生时间;3.包含引发错误语句的脚本名;4.用户自定义的标识符;5.SP2,ORA,PLS错误消息;6.具体引发错误的语句。

SQL> desc sperrorlog;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 USERNAME                                           VARCHAR2(256)
 TIMESTAMP                                          TIMESTAMP(6)
 SCRIPT                                             CLOB
 IDENTIFIER                                         VARCHAR2(256)
 MESSAGE                                            CLOB
 STATEMENT                                          CLOB

SQL> show errorlogging ; 
errorlogging is OFF

/* 默认情况下错误日志功能是关闭的,我们需要手动打开它 */

SQL> set errorlogging on;

SQL> show errorlogging ; 
errorlogging is ON TABLE SYS.SPERRORLOG

/* 可以看到成功打开错误日志功能,并会将后续的错误日志写入到当前用户(SYS)名下的SPERRORLOG表 */

SQL> select 1 from abc;
select 1 from abc
              *
ERROR at line 1:
ORA-00942: table or view does not exist

SQL> set linesize 200;
SQL> col username for a20;
SQL> col message for a40
SQL> col statement for a40;

/* session A */
SQL> select 1 from abc;
select 1 from abc
              *
ERROR at line 1:
ORA-00942: table or view does not exist


SQL> select username,message,statement from sperrorlog;

USERNAME             MESSAGE                                  STATEMENT
-------------------- ---------------------------------------- ----------------------------------------
SYS                  ORA-00942: table or view does not exist  select 1 from abc

/* 此时在session B中查询 */


SQL> select username,message,statement from sperrorlog;
no rows selected

/* 换而言之当errorlogging被激活后,发生错误时Oracle会将错误日志追加到SPERRORLOG表上,但并不commit; */
/* 这可能导致V$LOCK动态性能视图中显示大量的TX锁,虽然是虚惊:) */

[Read more…]

沪ICP备14014813号-2

沪公网安备 31010802001379号