A Case of Oracle Internal Error ORA-00600 [6033]

A 9.2.0.8 system on HP-UX hit the internal error ORA-00600: internal error code, arguments: [6033], [], [], [], [], [], [], [] while executing a query. The error trace is as follows:

*** SESSION ID:(583.18281) 2010-12-20 22:49:01.364
*** 2010-12-20 22:49:01.364
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [6033], [], [], [], [], [], [], []
Current SQL statement for this session:
SELECT INTERFACE_HEADER_ID, DOCUMENT_SUBTYPE, AGENT_ID, VENDOR_SITE_ID FROM PO_HEADERS_INTERFACE WHERE WF_GROUP_ID = :B1 ORDER BY INTERFACE_HEADER_ID
----- PL/SQL Call Stack -----
object line object
handle number name
c0000001067e3328 4332 package body APPS.PO_AUTOCREATE_DOC
c0000000fd267060 1 anonymous block
c000000108fe4d60 1979 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1745 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1099 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 560 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1863 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1099 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 560 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1863 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1099 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 560 package body APPS.WF_ENGINE_UTIL
c000000108fe4d60 1863 package body
PL/SQL call stack truncated after 1024 bytes.
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+184 ? ksedst() C0000000CEB36420 ?
400000000147994B ?

Known ORA-00600 [6033] errors are generally related to logical index corruption. Metalink has a note recommending that, once this error appears, you run an ANALYZE TABLE ... VALIDATE STRUCTURE CASCADE statement to verify the consistency between the table and its indexes.

ORA-600 [6033] "null value retrieved from index leaf lookup" [ID 45795.1]
Modified 03-JUN-2010 Type REFERENCE Status PUBLISHED
Note: For additional ORA-600 related information please read Note:146580.1

PURPOSE:
This article represents a partially published OERI note.
It has been published because the ORA-600 error has been
reported in at least one confirmed bug.
Therefore, the SUGGESTIONS section of this article may help
in terms of identifying the cause of the error.
This specific ORA-600 error may be considered for full publication
at a later date. If/when fully published, additional information
will be available here on the nature of this error.
SUGGESTIONS:
Run the ANALYZE command on any tables and indexes in the
trace file:
Example: ANALYZE TABLE <table_name>
 VALIDATE STRUCTURE CASCADE;
Rebuild any corrupted indexes.
Index corruption.
Known Bugs

NB Bug Fixed Description
6401576 9.2.0.8.P22 OERI[ktbair1] / ORA-600 [6101] index corruption possible
5845232 9.2.0.8.P06 Block corruption / errors from concurrent dequeue operations
2718937 9.2.0.4, 10.1.0.2 OERI:6033 from SELECT on IOT with COMPRESSED PRIMARY KEY
1573283 8.1.7.2, 9.0.1.0 OERI:6033 from ALTER INDEX .. REBUILD ONLINE PARAMETERS ('OPTIMIZE FULL')
Note for bug 6401576: Certain index operations can lead to block corruption / memory corruption with varying symptoms such as ORA-600 [6033], ORA-600 [6101], ORA-600 [ktbair1], ORA-600 [kcbzpb_1], ORA-600 [4519] and ORA-600 [kcoapl_blkchk] if DB_BLOCK_CHECKING is enabled.
Note for bug 5845232: Concurrent dequeue operations can lead to block corruption / memory corruption with varying symptoms such as ORA-600 [6033], ORA-600 [6101] and ORA-600 [kcoapl_blkchk] if DB_BLOCK_CHECKING is enabled. This issue was previously fixed under bug 5559640, but that fix had a serious problem which could lead to SGA memory corruption; this fix supersedes the fix for bug 5559640. The problem with patch 5559640 is alerted in Note:414109.1. This fix is in turn superseded by the fix for bug 6401576.

If validation with ANALYZE TABLE ... VALIDATE STRUCTURE CASCADE finds a problem, it produces a further trace file. This kind of logical index corruption can generally be resolved by dropping and recreating the affected index.

ROWDEPENDENCIES and Row-Level ORA_ROWSCN Tracking

Oracle 10g introduced the ORA_ROWSCN pseudocolumn. With the information it provides, we can conveniently find the timestamp at which a data block, or an individual row, was last modified. By default, tables in 10g are created with the NOROWDEPENDENCIES attribute, which means the observable ORA_ROWSCN information is tracked at the block level and cannot distinguish different modification times among rows within the same block. To get row-level tracking granularity, the ROWDEPENDENCIES clause must be specified at table creation time. For example:

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> drop table maclean;
Table dropped.

SQL> create table maclean(t1 int,t2 timestamp) ;
Table created.

SQL> insert into maclean values(1,systimestamp);
1 row created.

SQL> commit;
Commit complete.

SQL> insert into maclean values(2,systimestamp);
1 row created.

SQL> commit;
Commit complete.


SQL> alter session set nls_timestamp_format='hh24:mi:ss';
Session altered.

SQL> col t2 for a35
SQL> col orscn for a35

SQL> select scn_to_timestamp(ora_rowscn) orscn,t2 from maclean;

ORSCN                               T2
----------------------------------- -----------------------------------
20:30:11                            20:29:56
20:30:11                            20:30:10

/* As you can see, a table created with the defaults uses block-level dependency tracking,
   so the ORA_ROWSCN pseudocolumn it returns only represents the SCN at which the data block
   was most recently updated.
*/

SQL> create table maclean_rd(t1 int,t2 timestamp) rowdependencies;
Table created.

SQL> select table_name,dependencies from user_tables where dependencies!='DISABLED';
TABLE_NAME                     DEPENDEN
------------------------------ --------
MACLEAN_RD                     ENABLED


/* All tables, including the dictionary base tables, are created NOROWDEPENDENCIES by default */

SQL> insert into maclean_rd values(1,systimestamp);

1 row created.

SQL> commit;

Commit complete.

SQL> insert into maclean_rd values(2,systimestamp);

1 row created.

SQL> commit;

Commit complete.

SQL> select scn_to_timestamp(ora_rowscn) orscn,t2 from maclean_rd;

ORSCN                               T2
----------------------------------- -----------------------------------
20:31:26                            20:31:25
20:31:35                            20:31:37

/* Even with row dependency tracking, the timestamp reflected by ORA_ROWSCN still deviates from the inserted timestamp;
   clearly part of this deviation comes from the smon_scn_time mapping table used by scn_to_timestamp, which itself
   only records SCN-to-timestamp pairs coarsely.
*/

SQL> select dbms_rowid.rowid_relative_fno(rowid),dbms_rowid.rowid_block_number(rowid) from maclean_rd;

DBMS_ROWID.ROWID_RELATIVE_FNO(ROWID) DBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID)
------------------------------------ ------------------------------------
                                   1                                94122
                                   1                                94122

/* The rowids above locate the data block holding the two inserted rows */



SQL> alter system dump datafile '/s01/10gdb/oradata/CLINICA/datafile/o1_mf_system_6fp8d3f0_.dbf' block 94122;
System altered.

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug tracefile_name;
/s01/10gdb/admin/clinica/udump/clinica_ora_12934.trc

block_row_dump:
tab 0, row 0, @0x1f88
tl: 24 fb: --H-FL-- lb: 0x0  cc: 2
dscn 0x0000.0351b8bd
col  0: [ 2]  c1 02
col  1: [11]  78 6f 01 02 15 20 1a 21 d8 52 68
tab 0, row 1, @0x1f70
tl: 24 fb: --H-FL-- lb: 0x0  cc: 2
dscn 0x0000.0351b8c4
col  0: [ 2]  c1 03
col  1: [11]  78 6f 01 02 15 20 26 02 ab c2 f8

/* The block dump shows an extra dscn entry for every row; this is where the row-level ORA_ROWSCN
   information of row dependency tracking comes from. Note that the dscn takes 6 bytes of space,
   in other words enabling ROWDEPENDENCIES adds 6 bytes of on-disk overhead to every row.
*/
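The per-column bytes in the dump can also be decoded by hand. Assuming the standard internal NUMBER encoding (a biased base-100 exponent byte followed by base-100 digits, each stored plus one), the two col 0 values decode back to the inserted t1 values, and the dscn wrap.base pair combines into a full SCN. A quick sketch:

```python
def decode_oracle_number(raw):
    """Decode Oracle's internal NUMBER bytes (simple positive-integer case
    as seen in the dump). First byte: base-100 exponent, biased so that
    0xc1 means exponent 0; remaining bytes: base-100 digits stored +1."""
    exp = (raw[0] & 0x7F) - 65          # 0xc1 & 0x7f = 65 -> exponent 0
    value = 0
    for i, d in enumerate(raw[1:]):
        value += (d - 1) * 100 ** (exp - i)
    return value

def scn_from_dscn(wrap, base):
    """Combine a dscn wrap.base pair (e.g. 0x0000.0351b8bd) into one SCN."""
    return (wrap << 32) | base

# col 0 bytes from the block dump rows above
print(decode_oracle_number([0xC1, 0x02]))        # -> 1  (first row's t1)
print(decode_oracle_number([0xC1, 0x03]))        # -> 2  (second row's t1)
# dscn 0x0000.0351b8bd from row 0
print(hex(scn_from_dscn(0x0000, 0x0351B8BD)))
```

The decoded values match the two rows inserted into maclean_rd, confirming which row is which in the dump.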

/* Furthermore, row-level tracking can only be specified at CREATE TABLE time; it cannot be enabled with ALTER TABLE */

SQL> alter table maclean move tablespace users ROWDEPENDENCIES;
alter table maclean move tablespace users ROWDEPENDENCIES
                                          *
ERROR at line 1:
ORA-14133: ALTER TABLE MOVE cannot be combined with other operations

A Case of Oracle Internal Error ORA-00600 [17175]

A 10.2.0.4 system on HP-UX hit the internal error ORA-00600 [17175]. The relevant log information is as follows:

Wed Dec 1 01:57:55 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc:
ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []
ORA-00601: cleanup lock conflict
Wed Dec 1 01:57:57 2010
Trace dumping is performing id=[cdmp_20101201015757]
Wed Dec 1 01:58:05 2010
LGWR: terminating instance due to error 472
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms1_3291.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms2_3293.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms3_3295.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms0_3289.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lmon_3283.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lmd0_3287.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Shutting down instance (abort)
License high water mark = 421

/u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name: HP-UX
Node name: XGP2_db1
Release: B.11.31
Version: U
Machine: ia64
Instance name: xgp21
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 3250, image: oracle@XGP2_db1 (PMON)

*** SERVICE NAME:(SYS$BACKGROUND) 2010-12-01 01:57:55.933
*** SESSION ID:(333.1) 2010-12-01 01:57:55.933
*** 2010-12-01 01:57:55.933
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []
ORA-00601: cleanup lock conflict


ksedst <- ksedmp <- ksfdmp <- kgeriv <- kgesiv
<- kgesic1 <- kghcln <- kslilcr <- $cold_ksl_cleanup <- ksepop
<- kgepop <- kgesev <- ksesec0 <- $cold_kslges <- ksl_get_child_latch
<- kslgpl <- es <- ksfglt <- kghext_numa <- ksmasgn
<- kghnospc <- $cold_kghalo <- ksmdacnk <- ksmdget <- ksosp_alloc
<- ksoreq_submit <- ksbsrv <- kmmssv <- kmmlsa <- kmmlod
<- ksucln <- ksbrdp <- opirip <- $cold_opidrv <- sou2o
<- $cold_opimai_real <- main <- main_opd_entry

PROCESS STATE
-------------
Process global information:
process: c00000018d000078, call: c00000018d252238, xact: 0000000000000000, curses: c00000018d2508a8, usrses: c00000018d2508a8
----------------------------------------
SO: c00000018d000078, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=2, calls cur/top: c00000018d252238/c00000018d252238, flag: (e) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 48
last post received-location: ksoreq_reply
last process to post me: c00000018d037978 1 64
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: c00000018d001058 1 6
(latch info) wait_event=0 bits=90
holding (efd=5) c00000020001d500 Parent+children shared pool level=7
Location from where latch is held: kghfrunp: alloc: clatch nowait:
Context saved from call: 0
state=busy, wlstate=free
holding (efd=5) c00000020000b5f8 OS process allocation level=4
Location from where latch is held: ksoreq_submit:
Context saved from call: 13835058076152957304
state=busy, wlstate=free
Process Group: DEFAULT, pseudo proc: c0000004dd263230
O/S info: user: oracle, term: UNKNOWN, ospid: 3250
OSD pid info: Unix process pid: 3250, image: oracle@XGP2_db1 (PMON)


SO: c0000004df4d5f28, type: 19, owner: c00000018d000078, flag: INIT/-/-/0x00
GES MSG BUFFERS: st=emp chunk=0x0000000000000000 hdr=0x0000000000000000 lnk=0x0000000000000000 flags=0x0 inc=4
outq=0 sndq=0 opid=2 prmb=0x0
mbg[i]=(2 19) mbg[b]=(0 0) mbg[r]=(0 0)
fmq[i]=(4 1) fmq[b]=(0 0) fmq[r]=(0 0)
mop[s]=20 mop[q]=1 pendq=0 zmbq=0
nonksxp_recvs=0
------------process 0xc0000004df4d5f28--------------------
proc version : 0
Local node : 0
pid : 3250
lkp_node : 0
svr_mode : 0
proc state : KJP_NORMAL
Last drm hb acked : 0
Total accesses : 181
Imm. accesses : 180
Locks on ASTQ : 0
Locks Pending AST : 0
Granted locks : 0
AST_Q:
PENDING_Q:
GRANTED_Q:
----------------------------------------
SO: c00000018d2f3610, type: 11, owner: c00000018d000078, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: c00000018d000078,
event: 1, last message event: 1,
last message waited event: 1, messages read: 0
channel: (c0000004dd29fdb0) scumnt mount lock
scope: 1, event: 19, last mesage event: 0,
publishers/subscribers: 0/19,
messages published: 0
SO: c00000018d2508a8, type: 4, owner: c00000018d000078, flag: INIT/-/-/0x00
(session) sid: 333 trans: 0000000000000000, creator: c00000018d000078, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0002-00000003, short-term DID: 0000-0000-00000000
txn branch: 0000000000000000
oct: 0, prv: 0, sql: 0000000000000000, psql: 0000000000000000, user: 0/SYS
service name: SYS$BACKGROUND
last wait for 'latch: shared pool' blocking sess=0x0000000000000000 seq=342 wait_time=175677 seconds since wait started=0
address=c0000002000fff60, number=d6, tries=7
Dumping Session Wait History
for 'latch: shared pool' count=1 wait_time=175677
address=c0000002000fff60, number=d6, tries=7
for 'latch: shared pool' count=1 wait_time=97554
address=c0000002000fff60, number=d6, tries=6
for 'latch: shared pool' count=1 wait_time=78023
address=c0000002000fff60, number=d6, tries=5
for 'latch: shared pool' count=1 wait_time=38978
address=c0000002000fff60, number=d6, tries=4
for 'latch: shared pool' count=1 wait_time=38942
address=c0000002000fff60, number=d6, tries=3
for 'latch: shared pool' count=1 wait_time=19435
address=c0000002000fff60, number=d6, tries=2
for 'latch: shared pool' count=1 wait_time=12655
address=c0000002000fff60, number=d6, tries=1
for 'latch: shared pool' count=1 wait_time=8
address=c0000002000fff60, number=d6, tries=0
for 'os thread startup' count=1 wait_time=144253
=0, =0, =0
for 'os thread startup' count=1 wait_time=141360
=0, =0, =0


SO: c00000018d2f3500, type: 11, owner: c00000018d000078, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: c00000018d000078,
event: 2, last message event: 40,
last message waited event: 40, messages read: 1
channel: (c0000004dd29bbd8) system events broadcast channel
scope: 2, event: 224634, last mesage event: 40,
publishers/subscribers: 0/161,
messages published: 1


SO: c00000018d252238, type: 3, owner: c00000018d000078, flag: INIT/-/-/0x00
(call) sess: cur c00000018d2508a8, rec 0, usr c00000018d2508a8; depth: 0
----------------------------------------
SO: c00000018d2594b0, type: 5, owner: c00000018d252238, flag: INIT/-/-/0x00
(enqueue) PR-00000000-00000000 DID: 0001-0002-00000003
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0xc0000004df401718, mode: X, lock_flag: 0x0
own: 0xc00000018d2508a8, sess: 0xc00000018d2508a8, proc: 0xc00000018d000078, prv: 0xc0000004df401728
----------------------------------------
SO: c00000018d30b710, type: 16, owner: c00000018d000078, flag: INIT/-/-/0x00
(osp req holder)
CHILD REQUESTS:
(osp req) type=2(BACKGROUND) flags=0x20001(STATIC/-) state=1(INITED) err=0
pg=0 arg1=0 arg2=(null) reply=(null) pname=S018
pid=0 parent=c00000018d30b710 fulfill=0000000000000000
----------------------------------------
SO: c0000004dbff09c0, type: 192, owner: c0000004dbff09c0, flag: -/-/-/0x00
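The "Dumping Session Wait History" section of the PMON trace above shows the shared pool latch being retried with ever longer waits (tries 0 through 7, most recent first). The wait_time figures can be pulled out of such an excerpt mechanically; a small sketch using a few lines copied from the dump:

```python
import re

# A few lines copied verbatim from the session wait history above
trace = """\
for 'latch: shared pool' count=1 wait_time=175677
address=c0000002000fff60, number=d6, tries=7
for 'latch: shared pool' count=1 wait_time=97554
address=c0000002000fff60, number=d6, tries=6
for 'latch: shared pool' count=1 wait_time=8
address=c0000002000fff60, number=d6, tries=0
"""

# Extract every wait_time value (most recent wait listed first in the dump)
waits = [int(w) for w in re.findall(r"wait_time=(\d+)", trace)]
print(waits)       # -> [175677, 97554, 8]
print(sum(waits))  # total time PMON burned waiting on this latch
```

Reading the tries counter together with the growing wait_time makes the escalating back-off on the shared pool latch obvious at a glance.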

Searching Metalink for documents related to the ORA-600 [17175] internal error turns up a wealth of information:

Keywords: ora-00600 [17175]

1. Bug 6250251: ORA-00600 17175 DURING KGI CLEANUP - DUMP - ORADEBUG
--ora-600 followed by ora-601 and instance crash with ORA-17175.
--Also, setting of heap check event triggers this problem. In this case
--it is event="10235 trace name context forever, level 27"

2. Bug 4216668 - Dump from INSERT / MERGE on internal columns (Doc ID 4216668.8)
--INSERT or MERGE commands might core dump if operating on object types and internal columns are involved.

3. Bug 7590297: ORA-600 [17175] [255] ORA-601: CLEANUP LOCK CONFLICT CRASHED THE DATABASE

4. SR 3-2296150050
--The error has occurred when Oracle was cleaning shared pool latch/heap information about the process 
which died in middle.
--There is no data corruption associated with this error.
--This is evident from the function kghcln in the trace stack at which it failed.
--This problem is usually the symptom of some earlier problem with the latch.
--Either after a process has died, or a process has signaled an error while holding a shared pool latch, 
and the index to the shared pool latch is invalid.

--There was a Bug 7590297 raised for this issue which could not be progressed due to unavailability of information.
--From few earlier known issues - This can be due to PMON may sometimes signal ORA-601 
while trying to start up additional shared servers or dispatchers.
--There the workaround suggested was to Start the instance with max # of shared servers.

--Can you reproduce the problem? If the instance has been restarted the issue may not persist as it is related to memory.
--If the issue persists then we have to perform the following to monitor the instance and investigate further:

--1. Set the following event in parameter file:
--event="10257 trace name context forever, level 10"
--event="601 trace name SYSTEMSTATE level 10"

--The first event will cause PMON to dump info about shared server startup.
--The second event will cause PMON to do a system state dump when the 601 occurs.

--2. You should also have the track of this in intervals and save the historical results from:

--SQL> select e.total_waits, e.total_timeouts, e.time_waited from v$session_event e, v$session s
, v$bgprocess b where b.name='PMON' and s.paddr=b.paddr and e.sid=s.sid and e.event='process startup';


5. SR 3-2123025401
--=== ODM Solution / Action Plan ===
--Disabled NUMA for resolution

6. SR 7314313.994

Analysis:

Bug 6250251 and bug 4216668 are not applicable to this case.
Bug 7590297 is applicable to this case, as the call stack, error message are the same with this case. 
But this patch is suspended as requested info is not available.

SR 3-2296150050: same error message, same DB version, similar call stack; closed without solution.
SR 3-2123025401: same error message, same DB version, similar call stack.
The issue happened twice in that SR and was solved by disabling NUMA.
SR 7314313.994: same error message, same DB version, similar call stack; closed without solution.

ERROR:
ORA-600 [17175] [a]


VERSIONS:
versions 9.2 to 10.1


DESCRIPTION:


This error occurs when we are cleaning up a shared pool latch (either after a process has died, 
or a process has signaled an error while holding a shared pool latch), 
and the index to the shared pool latch is invalid.


ARGUMENTS:
Arg [a] index of the latch recovery structure - usually 255

FUNCTIONALITY:
Generic Heap Manager


IMPACT:
INSTANCE HANG
PROCESS FAILURE
INSTANCE FAILURE

Below is the action plan given by Oracle GCS. GCS holds that the vast majority of ORA-00600 [17xxx] errors are caused by memory-related problems, which are often resolved simply by restarting the instance, and suggests setting shared_servers=max_shared_servers and observing further:

From the uploaded files it looks like you were reported with ORA-00600 [17175] errors
which crashed the instance. What is the current status after the restart of the database?
Are you still reported with the same errors and crashing the instance?
Mostly the ORA-00600 [17xxx] errors are memory related and might have got resolved after the database restart.


Further looking at the uploaded trace file, the failing functions and the error closely
match Bug 6958493, which is closed as a duplicate of base Bug 6962340, which in turn
was closed because the error could not be reproduced.


Also a similar issue is reported in Bug 3104250, which is fixed in 10g, but that doesn't mean
you cannot get this error for a new reason and that the same workaround would fix it.
We need to implement the workaround and set shared_servers=max_shared_servers
if the error reproduces again. If this is still a repeated issue then we can file a new bug with development for the same.


ACTION PLAN
===========
1. Monitor the alertlog for the ORA-00600 [17175] errors for the next few days and if the database still crashes then please
set shared_servers=max_shared_servers and see if the problem resolves or not.

A Case of Oracle Internal Error ORA-00600 [kccchb_3]

An 11.1.0.7 RAC system on Linux x86-64, using NetApp NFS as shared storage. One of the nodes hit the internal error ORA-00600: internal error code, arguments: [kccchb_3], which terminated the instance unexpectedly. The detailed log is as follows:

Mon Dec 27 00:03:13 2010
Error: Controlfile sequence number in file header is different from the one in memory
Please check that the correct mount options are used if controlfile is located on NFS
Errors in file /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/trace/TSMOTEBS1_ckpt_15907.trc (incident=11353):
ORA-00600: internal error code, arguments: [kccchb_3], [555541], [555540], [555540], [], [], [], [], [], [], [], []
Incident details in: /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/incident/incdir_11353/TSMOTEBS1_ckpt_15907_i11353.trc
Errors in file /d02/oratst/TSMOTEBS/db/11.1.0/admin/TSMOTEBS1_l118464dcss3001/diag/rdbms/tsmotebs/TSMOTEBS1/trace/TSMOTEBS1_ckpt_15907.trc:
ORA-00600: internal error code, arguments: [kccchb_3], [555541], [555540], [555540], [], [], [], [], [], [], [], []
CKPT (ospid: 15907): terminating the instance due to error 469

According to Metalink Note 303986.1, RAC instance using NFS via Netapps report Ora-600 [kccchb_3]:

Symptoms
RAC instance using NFS via Netapps occasionally reports:
ORA-00600 [kccchb_3], [689], [0], [],[],[],[]
ORA-00600 [kclchkblk_3], [7], [738515663], [14], [], [], [], []

Cause
This is most probably a Netapp issue wherein 'noac' doesn't work as expected.
NetApp recommended mount options for Oracle9i RAC on Solaris:
rw,bg,vers=3,proto=tcp,hard,intr,rsize=32768,wsize=32768,forcedirectio,noac

NetApp recommended mount options for Linux SLES9, RHEL4, RHEL3 QU3 and later:
rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=ORACLE_BLOCK_SIZE,
wsize=ORACLE_BLOCK_SIZE,actimeo=0

NetApp recommended mount options for Linux RHAS2.1, RHEL3 pre QU3:
rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=ORACLE_BLOCK_SIZE,
wsize=ORACLE_BLOCK_SIZE,noac

And for AS 2.1 in /etc/modules.conf add:
options nfs nfs_uncached_io=1
Solution

Workaround
set filesystemio_options=DIRECTIO for all instances (to bypass any buffer cache)

The note above shows that this ORA-00600 [kccchb_3] internal error can be worked around by setting the filesystemio_options initialization parameter to DIRECTIO. In practice a better choice is to set filesystemio_options to SETALL, which enables both direct I/O and asynchronous I/O.

As an aside, enabling asynchronous I/O on Linux takes more than setting the filesystemio_options parameter; all of the following must also be in place:
1. filesystemio_options set to ASYNC or SETALL

2. The Linux kernel parameters fs.aio-max-nr (e.g. 3145728) and fs.file-max (e.g. 6815744) set to reasonable values

3. The disk_asynch_io initialization parameter set to TRUE

4. On 10gR2 the oracle binary may not have been linked with the async_on option; in that case we need to relink manually (the reason behind this behavior is that the LIBAIO_0.1 Linux OS library is not attached for io_getevents):

SQL>shutdown immediate
[maclean@rh8 ~]$ cd $ORACLE_HOME/rdbms/lib
[maclean@rh8 lib]$ ln -s /usr/lib/libaio.so skgaio.o
[maclean@rh8 lib]$ make PL_ORALIBS=-laio -f ins_rdbms.mk async_on
rm -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaioi.o
cp /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaio.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/skgaioi.o
chmod 755 /u01/oracle/product/10.2.0/db_1/bin

 - Linking Oracle 
rm -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle
gcc  -o /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle -L/u01/oracle/product/10.2.0/db_1/rdbms/lib/ -L/u01/oracle/product/10.2.0/db_1/lib/ -L/u01/oracle/product/10.2.0/db_1/lib/stubs/ -L/usr/lib -lirc   -Wl,-E `test -f /u01/oracle/product/10.2.0/db_1/rdbms/lib/sskgpsmti.o && echo /u01/oracle/product/10.2.0/db_1/rdbms/lib/sskgpsmti.o` /u01/oracle/product/10.2.0/db_1/rdbms/lib/opimai.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/ssoraed.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/ttcsoi.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/defopt.o  -Wl,--whole-archive -lperfsrv10 -Wl,--no-whole-archive /u01/oracle/product/10.2.0/db_1/lib/nautab.o /u01/oracle/product/10.2.0/db_1/lib/naeet.o /u01/oracle/product/10.2.0/db_1/lib/naect.o /u01/oracle/product/10.2.0/db_1/lib/naedhs.o /u01/oracle/product/10.2.0/db_1/rdbms/lib/config.o  -lserver10 -lodm10 -lnnet10 -lskgxp10 -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lhasgen10 -lcore10 -lskgxn2 -locr10 -locrb10 -locrutl10 -lhasgen10 -lcore10 -lskgxn2   -lclient10  -lvsn10 -lcommon10 -lgeneric10 -lknlopt `if /usr/bin/ar tv /u01/oracle/product/10.2.0/db_1/rdbms/lib/libknlopt.a | grep xsyeolap.o > /dev/null 2>&1 ; then echo "-loraolap10" ; fi` -lslax10 -lpls10  -lplp10 -lserver10 -lclient10  -lvsn10 -lcommon10 -lgeneric10 -lknlopt -lslax10 -lpls10  -lplp10 -ljox10 -lserver10 -lclsra10 -ldbcfg10 -locijdbcst10 -lwwg  `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lmm -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /u01/oracle/product/10.2.0/db_1/lib/ldflags`    
-lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10   -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `if /usr/bin/ar tv /u01/oracle/product/10.2.0/db_1/rdbms/lib/libknlopt.a | grep "kxmnsd.o" > /dev/null 2>&1 ; then echo " " ; else echo "-lordsdo10"; fi` -lctxc10 -lctx10 -lzx10 -lgx10 -lctx10 -lzx10 -lgx10 -lordimt10 -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lsnls10 -lunls10  -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -laio    `cat /u01/oracle/product/10.2.0/db_1/lib/sysliblist` -Wl,-rpath,/u01/oracle/product/10.2.0/db_1/lib -lm    `cat /u01/oracle/product/10.2.0/db_1/lib/sysliblist` -ldl -lm   -L/u01/oracle/product/10.2.0/db_1/lib
/u01/oracle/product/10.2.0/db_1/lib//libcore10.a(lcd.o): In function `lcdprm':
lcd.c:(.text+0x8e8): warning: the `gets' function is dangerous and should not be used.
mv -f /u01/oracle/product/10.2.0/db_1/bin/oracle /u01/oracle/product/10.2.0/db_1/bin/oracleO
mv /u01/oracle/product/10.2.0/db_1/rdbms/lib/oracle /u01/oracle/product/10.2.0/db_1/bin/oracle
chmod 6751 /u01/oracle/product/10.2.0/db_1/bin/oracle

SQL> startup;

/* The following can be used to verify whether async I/O is actually in use */

cat /proc/slabinfo | grep kio

[maclean@rh8 ~]$ cat /proc/slabinfo | grep kio
kioctx                56     72    320   12    1 : tunables   54   27    8 : slabdata      6      6      0
kiocb                 15     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0

/* A kioctx value greater than zero here shows that asynchronous I/O is in use */
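The kioctx check above is easy to script. The sketch below parses the slabinfo format, run here against the sample output captured above rather than a live /proc/slabinfo:

```python
def kioctx_active(slabinfo_text):
    """Return the active-object count for the kioctx slab (first numeric
    column of /proc/slabinfo); greater than zero means async I/O contexts
    are allocated."""
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "kioctx":
            return int(fields[1])   # active_objs column
    return 0

# Sample lines copied from the slabinfo output shown above
sample = """\
kioctx                56     72    320   12    1 : tunables   54   27    8 : slabdata      6      6      0
kiocb                 15     15    256   15    1 : tunables  120   60    8 : slabdata      1      1      0
"""
print(kioctx_active(sample))   # -> 56
```

On a live system you would pass it open("/proc/slabinfo").read(); note that on recent kernels reading slabinfo requires root.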

On 11g there is no need to go through the trouble of relinking with async_on, and async I/O usage can also be checked as follows:

SQL> col name for a60 
SQL> SELECT name, asynch_io FROM v$datafile f,v$iostat_file i 
WHERE f.file# = i.file_no
AND filetype_name = 'Data File' ; 

NAME							     ASYNCH_IO
------------------------------------------------------------ ---------
/standby/oradata/PROD/datafile/o1_mf_system_6q9dwgwh_.dbf    ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_sysaux_6q9dwgyp_.dbf    ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_undotbs1_6q9dwh0r_.dbf  ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_users_6q9dwh2x_.dbf     ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_example_6q9dzhh1_.dbf   ASYNC_ON
/standby/oradata/PROD/datafile/o1_mf_enc_6q9fdzcx_.dbf	     ASYNC_ON

6 rows selected.

/* ASYNCH_IO shows ASYNC_ON, confirming that asynchronous I/O is in use */

Oracle Latch: cache buffers chains

The cache buffers chains latch probably has more child latches, and is acquired more frequently, than any other latch in Oracle. The total number of its child latches is influenced by initialization parameters (db_block_buffers/4 in 8i), and a large number of Oracle kernel functions may take and hold a cache buffers chains latch:

col parent_name for a25
col location for a40

SELECT t1.ksllasnam "parent_name",
       t2.ksllwnam  "location"
  FROM x$ksllw t2, x$kslwsc t1
 WHERE t2.indx = t1.indx
   AND ksllasnam = 'cache buffers chains';

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbw_activate_granule
cache buffers chains      kcbw_first_buffer_free
cache buffers chains      kcbwxb
cache buffers chains      kcbw_examine_granule
cache buffers chains      kcbw_next_free
cache buffers chains      kcbw_first_buffer_free_2
cache buffers chains      kcbbckb
cache buffers chains      kcbbioe
cache buffers chains      kcbbic1
cache buffers chains      kcbbcwd
cache buffers chains      kcbbxsv

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbbwdb
cache buffers chains      kcbbic2
cache buffers chains      kcbkzs
cache buffers chains      kcbrmf2so
cache buffers chains      kcbget: exchange rls
cache buffers chains      kcbralloc
cache buffers chains      kcbgcur: deadlock
cache buffers chains      kcbgcur: buf pinned
cache buffers chains      kcbgtcr
cache buffers chains      kcbchg: no fast path
cache buffers chains      kcbchg: apply change

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbrra: buf exists
cache buffers chains      kcbrra: update buf flags
cache buffers chains      kcbema: find buf
cache buffers chains      kcbtema: find buf
cache buffers chains      kcbget: prewarm wait
cache buffers chains      kcbrfrebuf
cache buffers chains      kcbsod1
cache buffers chains      kcbrbrl
cache buffers chains      kcbgcur: kslbegin
cache buffers chains      kcbgtcr: kslbegin shared
cache buffers chains      kcbrls: kslbegin

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbchg: kslbegin: bufs not pinned
cache buffers chains      kcbchg: kslbegin: call CR func
cache buffers chains      kcbnlc
cache buffers chains      kcbget: exchange
cache buffers chains      kcbget: pin buffer
cache buffers chains      kcbnew: new latch again
cache buffers chains      kcbgkcbcr
cache buffers chains      kcbget: in cur_read
cache buffers chains      kcbesc: escalate
cache buffers chains      kcblbi
cache buffers chains      kcbcge

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbfrl
cache buffers chains      kcbzsc
cache buffers chains      kcbibr
cache buffers chains      kcbnew_1
cache buffers chains      kcbema
cache buffers chains      kcbsrbd
cache buffers chains      kcbso1: set no access
cache buffers chains      kcbtema
cache buffers chains      kcbso1: in done_clr
cache buffers chains      kcbsod2
cache buffers chains      kcbzcg

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbzre1
cache buffers chains      kcbrlb1
cache buffers chains      kcbchkrsod
cache buffers chains      kcbxbh
cache buffers chains      kcbzsck
cache buffers chains      kcbgtcr: fast path
cache buffers chains      kcbgtcr: kslbegin excl
cache buffers chains      kcbgtcrf
cache buffers chains      kcbfdgd
cache buffers chains      kcbdng
cache buffers chains      kcbbufaddr2hdr

parent_name               location
------------------------- ----------------------------------------
cache buffers chains      kcbo_ivd_process
cache buffers chains      kcbo_write_process
cache buffers chains      kcbo_exam_buf
cache buffers chains      kcb_pre_apply: kcbhq61
cache buffers chains      kcb_post_apply: kcbhq62
cache buffers chains      kcb_post_apply: kcbhq63
cache buffers chains      kcbnew : new esc failed
cache buffers chains      kcbesc : escalate failed
cache buffers chains      kcb_private_owner
cache buffers chains      kcb_is_private
cache buffers chains      kcb_unprivatize

cache buffers chains      kcb_restore_block_headers
cache buffers chains      kcb_flush_undo_buffers
cache buffers chains      kcbgcur - DEADL
cache buffers chains      kcbtbd
cache buffers chains      kcbzwc
cache buffers chains      kcbzwx
cache buffers chains      kcbrmflx
cache buffers chains      kcbzwb
cache buffers chains      kcbzgb: get latch after post
cache buffers chains      kcbzgb: scan from tail. nowait
cache buffers chains      kcbzgb: exit_loop

cache buffers chains      kcbzib: multi-block read: nowait
cache buffers chains      kcbzib: finish free bufs
cache buffers chains      kcbzcb
cache buffers chains      kcbzdh
cache buffers chains      kcbdpr
cache buffers chains      kcbcxx
cache buffers chains      kcbzrn
cache buffers chains      kcbdpd: for specific dba
cache buffers chains      kcbdpd: dump all buffers
cache buffers chains      kcbzib: exchange rls
cache buffers chains      kcbzpnd: dump buffers

cache buffers chains      kcbzhngcbk1: get hash chain latch no wait
cache buffers chains      kcbo_cxx
cache buffers chains      kcbz_check_obj_reuse_sanity
cache buffers chains      kcbzib_grlk
cache buffers chains      kcbz_force_maps
cache buffers chains      kcbrldflx: recover in-flux bufs
cache buffers chains      kcbra1fbuf: recover in-flux bufs
cache buffers chains      kcbrafb: flashback bufs:1
cache buffers chains      kcbrafb: flashback bufs:2

cache buffers chains      kcbr_media_apply: find buffer
cache buffers chains      kcbr_issue_read: alloc buffer
cache buffers chains      kcbr_issue_read: retry alloc
cache buffers chains      kcbr_validate_read: mark corrupt
cache buffers chains      kcbr_apply_change: after apply
cache buffers chains      kcbr_mapply_change
cache buffers chains      kcbr_mrcv_clear_fgda
cache buffers chains      kclwlr
cache buffers chains      kclebs_1
cache buffers chains      kclcls
cache buffers chains      kclcsr_1

cache buffers chains      kclpred
cache buffers chains      kclcls_1
cache buffers chains      kclple_1
cache buffers chains      kclple_2
cache buffers chains      kclcls_2
cache buffers chains      kcllwr
cache buffers chains      kclwcrs
cache buffers chains      kclcrs_1
cache buffers chains      kclcsr
cache buffers chains      kclrls
cache buffers chains      kclwcrs_1

cache buffers chains      kclfbst_1
cache buffers chains      kclpdc_1
cache buffers chains      kclwcrs_2
cache buffers chains      kclwcrs_3
cache buffers chains      kclfpdb
cache buffers chains      kclfpdb_2
cache buffers chains      kclpdc_2
cache buffers chains      kcllkopb
cache buffers chains      kclgrantlk
cache buffers chains      kclwrt
cache buffers chains      kcllkopb_1

cache buffers chains      kclwcrs_4
cache buffers chains      kcllkopb_2
cache buffers chains      kclcls_4
cache buffers chains      kclpred_1
cache buffers chains      kclrclr_2
cache buffers chains      kclrecbst
cache buffers chains      kclgrantlk_1
cache buffers chains      kclcls_5
cache buffers chains      kclrwrite_1
cache buffers chains      kclrwrite_2
cache buffers chains      kclcopy

cache buffers chains      kclswrite
cache buffers chains      kclchash
cache buffers chains      kclcfusion
cache buffers chains      kclfchk_1
cache buffers chains      kclcfusion_1
cache buffers chains      kclblkdone
cache buffers chains      kclcfusion_2
cache buffers chains      kclrenounce
cache buffers chains      kclbla
cache buffers chains      kclpto_1
cache buffers chains      kclgrantlk_2

cache buffers chains      kclcomplete
cache buffers chains      kclshrshr
cache buffers chains      kclclaim
cache buffers chains      kclhngcbk1
cache buffers chains      kclblkdone_1
cache buffers chains      kclgvlk
cache buffers chains      kclblkdone_2
cache buffers chains      kclcclaim
cache buffers chains      kclrechk_1
cache buffers chains      kclrechk_2
cache buffers chains      kclbr

cache buffers chains      kclpto
cache buffers chains      kclpdcl
cache buffers chains      kclpdc_3
cache buffers chains      kclpdc_4
cache buffers chains      kclgcr_1
cache buffers chains      kclcls_6
cache buffers chains      kclevict
cache buffers chains      kcldle
cache buffers chains      kclrcopy
cache buffers chains      kclenter
cache buffers chains      kclrbast

cache buffers chains      kclexpand
cache buffers chains      kclcls_3
cache buffers chains      kclverify
cache buffers chains      kclaffinity
cache buffers chains      kclassert
cache buffers chains      kclobj
cache buffers chains      kclobj_1
cache buffers chains      kclobj_2
cache buffers chains      kclgclk
cache buffers chains      kclwcrs_5
cache buffers chains      kclscrs

cache buffers chains      kclstalepi
cache buffers chains      kclstalepi_1
cache buffers chains      kclstalepi_2
cache buffers chains      kclgvlk_1
cache buffers chains      kclgclk_1
cache buffers chains      kclgclk_2
cache buffers chains      kclcsr_2
cache buffers chains      kcbvm

204 rows selected.

Latches are generally considered "mostly exclusive access" structures, meaning there is rarely any opportunity to take them in shared mode. Yet Oracle's official line is that server processes take the cache buffers chains latch in shared mode when reading data, which has led many people to believe that concurrent reads can never cause latch: cache buffers chains contention.
In reality, a query still needs to hold the relevant child latch in exclusive mode most of the time. (It is sometimes held in SHARED mode, depending on whether the read path is kcbgtcr: kslbegin shared or kcbgtcr: kslbegin excl. kcbgtcr, short for Kernel Cache Buffer Get Consistent Read, is the key function in the Oracle RDBMS for consistent-read gets, and it clearly has these two ways of acquiring the cache buffers chains latch. Its counterpart is kcbgcur: kslbegin; kcbgcur, Kernel Cache Buffer Get Current, fetches the current block for modification, i.e. for "writes", so it obviously only ever needs to hold the child cache buffers chains latch in excl mode.) The reason is that even a pure query must modify the buffer header: it updates the touch count tch, the holder-list pointers us_nxt and us_prv, the waiter-list pointers wa_nxt and wa_prv, and so on. In other words, read/read workloads can indeed cause latch free: cache buffers chains waits; it is not true, as many assume, that only read/write or write/write combinations lead to cache buffers chains contention.

Let us prove this once more with an experiment:

SQL> drop table maclean;
drop table maclean
           *
ERROR at line 1:
ORA-00942: table or view does not exist

SQL> create table maclean tablespace users as select * from dba_objects;
Table created.

SQL> select count(*) from maclean;

  COUNT(*)
----------
     51944

SQL> /

  COUNT(*)
----------
     51944

SQL> /

  COUNT(*)
----------
     51944

SQL> select spid from v$process where addr =(select paddr from v$session where sid=(select distinct sid from v$mystat));

SPID
------------
6023

Open another terminal window:

[maclean@rh2 ~]$ gdb $ORACLE_HOME/bin/oracle 6023
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /s01/10gdb/bin/oracle...(no debugging symbols found)...done.
Attaching to program: /s01/10gdb/bin/oracle, process 6023
Reading symbols from /s01/10gdb/lib/libskgxp10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libskgxp10.so
Reading symbols from /s01/10gdb/lib/libhasgen10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libhasgen10.so
Reading symbols from /s01/10gdb/lib/libskgxn2.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libskgxn2.so
Reading symbols from /s01/10gdb/lib/libocr10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libocr10.so
Reading symbols from /s01/10gdb/lib/libocrb10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libocrb10.so
Reading symbols from /s01/10gdb/lib/libocrutl10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libocrutl10.so
Reading symbols from /s01/10gdb/lib/libjox10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libjox10.so
Reading symbols from /s01/10gdb/lib/libclsra10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libclsra10.so
Reading symbols from /s01/10gdb/lib/libdbcfg10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libdbcfg10.so
Reading symbols from /s01/10gdb/lib/libnnz10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/10gdb/lib/libnnz10.so
Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00000039f280d8e0 in __read_nocancel () from /lib64/libpthread.so.0
(gdb) break kcbgtcr
Breakpoint 1 at 0x108c72c

Back in the original sqlplus window, run the query again; it will hang:
SQL> select count(*) from maclean;

In the gdb window:
(gdb) break kslfre
Breakpoint 2 at 0x7a133c

Breakpoint 1, 0x000000000108c72c in kcbgtcr ()
(gdb) c
Continuing.

Breakpoint 1, 0x000000000108c72c in kcbgtcr ()
(gdb) c
Continuing.

Breakpoint 1, 0x000000000108c72c in kcbgtcr ()
(gdb) c
Continuing.

Breakpoint 2, 0x00000000007a133c in kslfre ()
(gdb) c
Continuing.

Keep issuing continue until the kslfre kernel function shows up, then open a new sqlplus window:
SQL> set autotrace on;
SQL> select count(*) from v$latchholder;

  COUNT(*)
----------
         3

Execution Plan
----------------------------------------------------------
Plan hash value: 1575818826

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     1 |    13 |     0   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |            |     1 |    13 |            |          |
|*  2 |   FIXED TABLE FULL| X$KSUPRLAT |     1 |    13 |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------

The data in v$latchholder comes from the X$KSUPRLAT internal view; since v$latchholder has no mode column, we examine X$KSUPRLAT directly:

SQL> select KSUPRLAT "address",KSUPRLNM "name",KSUPRLMD "mode" from X$KSUPRLAT;

address          name                      mode
---------------- ------------------------- ---------------
00000000FCE40040 cache buffers chains      EXCLUSIVE
00000000FA696978 simulator lru latch       EXCLUSIVE
00000000FA6CDCE0 simulator hash latch      EXCLUSIVE

/* Note that even a plain query holds the child cache buffers chains latch in EXCLUSIVE mode */

(gdb) quit
A debugging session is active.

        Inferior 1 [process 6235] will be detached.

Quit anyway? (y or n) y
Detaching from program: /s01/10gdb/bin/oracle, process 6235

/* The statistics below confirm the reads above were pure logical reads, with no physical reads to muddy the picture */

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        719  consistent gets
          0  physical reads
          0  redo size
        517  bytes sent via SQL*Net to client
        492  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

[Repost] Is Corruption Rampant Among Oracle's Sales Staff?

On July 19, 2004, in Shanghai, Oracle Greater China managing director Lu Chunchu and his new executive team made their collective debut at the Oracle World venue. For Oracle China under Lu Chunchu, it was a moment of collective excitement.

What Lu Chunchu really had to face, however, was that Oracle China's problems had not thereby died down. The personnel changes were only one part of it; the more fatal problem was that Oracle China had blindly raised the banner of direct sales, which smacked of discarding the hounds once the hares were caught. Lu did not deny the role partners had played in Oracle China's growth, but that, he made clear, was strictly the past.

"We are not even chicken ribs now!" Having spent two years as a reseller leaning on the big tree that is Oracle, Chen Kun has begun weighing his next move.

"Throw away a chicken rib and at least you feel a little regret. Between Oracle and us, it has come down to a crisis of trust." Chen Kun's words are highly representative, and in his view that trust includes a component of business ethics.

Change itself is beyond reproach. Lu Chunchu, a stickler for process and strict discipline, having taken over Oracle China after the departure of Hu Bolin, Zhang Shuheng and others, started with the channel and cleaned house, fully displaying his iron-fisted style. His surgical approach of strong medicine for a chronic disease, a sharp knife through tangled hemp, looked as clean and decisive as the 2003 US war on Iraq; how it actually ends remains to be seen.

After all, fundamentally ignoring the interests of partners and customers is a cardinal sin in business. And over the past decade and more, this cardinal sin has played out again and again in the history of Oracle China's partnerships.

Midea's Oracle Story

For Zhang Wei, joining the Midea Group in 1997 was an opportunity. Before that he had done marketing planning in Wuhan for a while, and his first job at Midea was also planning-related. In 1996 Midea had already decided to implement Oracle's management software; the ERP concept was not yet widespread then, and things were still largely at the MRP II stage. Zhang Wei "never expected to be pulled into a key-user role", and thus began his career as an IT consultant.

"Most of the senior people doing IT consulting in China today came out of Midea or Huawei." Huawei, like Midea, was an early Oracle customer in China. In 1999, Midea Information Technology Co., spun off from Midea's computer department, was founded; Hanpu, HAND and a batch of IT consulting firms famous today had themselves only just been established.

"Hanpu back then was an empty shell; apart from Zhang Houqi there was nobody presentable." Hanpu and HAND both got their start on the back of Oracle, and their lack of customer implementation experience was their weak point, "including the PwC of those days." So even while Midea was still bringing its Oracle system live, members of its implementation team and key users kept being poached, and the poaching peaked in 1999 when Midea Information was founded.

"Midea's team of over 100 participants was, in the eyes of those consulting firms, a ready-made gold mine." Li Kuan, also ex-Midea and now chief consultant at an IT consulting firm, still vividly remembers that "talent raid" of a few years back.

Most of the implementation staff and key users poached from Midea and Huawei flowed into China's newly emerging IT consulting industry,

and these early Oracle customers thereby helped Oracle capture, for a time, the high end of China's enterprise application software market.

Li Kuan believes that "around 2000, Oracle's boom in the China market owed a great deal to partners like Midea Information, Hanpu and HAND." This was just when SAP, after four or five years lying low in China, was pushing its "Lighthouse Program". SAP mostly teamed up with international consulting giants like IBM and Deloitte to chase deals in China, while Oracle joined forces with local Chinese consulting partners, repeatedly spoiling SAP's "Lighthouse Program", including on deals such as Harbin Pharmaceutical and Changhong.

In 2001, Midea Information was renamed SIE. That year, phase one of Midea's Oracle implementation was largely complete, and the implementation staff and key users who had taken part were folded into SIE. SIE thus took on the role of an Oracle channel partner, a role it retains to this day.

"That was also the year SIE lost the most people. The original Midea crowd, over pay and other issues, almost all drained away at that point." Zhang Wei himself was poached in 2001 by a Hong Kong-funded company going live on Oracle, serving as in-house consultant. Early this year he moved to Guangzhou Kesen Information Technology, becoming an IT consultant once again. "At Kesen too there are quite a few ex-Midea people, including three of Kesen's current chief consultants and the consulting manager on one of our current projects."

The Midea halo over SIE dimmed as its consultants drained away; combined with its weak market operations, Oracle shifted its attention to nurturing other partners, and SIE sank into being a second-tier regional systems integrator.

The Partners' Non-Oracle Logic

Lin Xiaobing, who left Hanpu nearly two years ago and now works solely as an agent for a Swedish software vendor, does not want to say much about his Hanpu years. But at the mention of Oracle and the "Kunpeng Plan" of those days, he visibly becomes agitated.

"Beyond Zhang Houqi and the people factor, Oracle was a major reason for Hanpu's rise and fall." Since leaving Hanpu, Lin Xiaobing has sworn never to touch Oracle again.

Zhang Houqi founded Hanpu in 1997 with 30,000 yuan. After poaching consultants from Midea on a large scale starting in 1999 and building a strong implementation team, Hanpu spent the following two years "signing deals like crazy"; "as the deals piled up, the shortage of consultants became more and more of a problem." In those two years Lin Xiaobing criss-crossed the country, "missing my own birthday twice."

Consultants' salaries are a consulting firm's biggest expense. Plenty of deals but no consultants: the situation made Zhang Houqi anxious, and the simplest way out was to find investment. At the end of 2001, Lenovo Group injected HK$55 million in cash plus its existing IT consulting business into Hanpu in exchange for a 51% stake. From then until Lenovo recently sold off the IT services unit that included the original Hanpu, the firm was called Lenovo Hanpu.

A year later, on November 25, 2002, Sun China general manager Xue Yaokun, then Oracle China general manager Hu Bolin, and Lenovo Group senior vice president Yu Bing jointly launched a strategic alliance in Beijing, the so-called "Kunpeng Plan". The party responsible for Oracle system consulting services was Lenovo Hanpu.

Oracle at the time was in full reform mode, sparing no effort to promote its channel strategy and build a Chinese "Oracle camp" to challenge SAP head-on in China's SMB market.

At its formation, the "dream team" boasted: "We don't want to sell ten or eight copies, we want to sell tens or hundreds." Half a year later, the actual result was precisely the "ten or eight copies" they had not wanted to sell.

Lin Xiaobing was there at the launch too. Looking back today, the "Kunpeng Plan" was "something of a farce."

Zhang Kaifeng, who also left Hanpu a year ago, now calls himself "unemployed" in Shanghai, while actually working as the independent consultant commonly seen in the industry. Before that, Zhang was a Hanpu veteran of four years' standing.

On the "Kunpeng Plan", Zhang Kaifeng's criticism points straight at Oracle. "Oracle had no strategic thinking about the SMB market; it was largely a speculative play, and its business logic and management style clearly did not fit the needs of the China market. That is a major reason the Kunpeng Plan went nowhere."

At the start of the Lenovo partnership, Oracle promised to develop customized products for Chinese SMBs, but never delivered; it merely repackaged its standard product, the same medicine in a new bottle. As for the so-called E-Business Suite Special Edition, "really only the price was special."

Worst of all, the product development team was not in China at all but in Singapore, staffed with Indian engineers who simply did not understand Chinese customers' needs. And in actual operation, Oracle was unwilling to make changes.

Lin Xiaobing once worked an SMB deal where the customer needed certain reports that Oracle refused to provide. Lenovo Hanpu's intervention got nowhere either: "Oracle will tell you straight out that if you need it, you have to buy it separately."

In fact Lenovo Hanpu won plenty of deals, but most fell through in the end because the customer found the product wanting. To avoid losing customers, Hanpu could only fall back on the standard edition, or even swap in other vendors' products.

On market strategy, Oracle was likewise short-sighted. Oracle had declared it would join Lenovo Hanpu in cultivating second-tier channels, a promise also left unfulfilled. "They never cultivated the channel; they just came to Lenovo for deals at year end." This stood in stark contrast to other companies' patient, long-term cultivation of the China market. Throughout the operation of the "Kunpeng Plan", all Oracle did was make simple price adjustments to its products and run some superficial marketing, chasing short-term profit with no long-term plan at all.

Lenovo Hanpu during this period "put nearly all its strength into the Kunpeng Plan"; "while it was mired there, Chinese IT consulting was growing at its fastest, and other consulting firms overtook Hanpu and rose quickly, especially those specializing in SAP." In Zhang Kaifeng's view, chronologically the "Kunpeng Plan" marks the turning point where Hanpu began its decline.

From then on, Hanpu began rethinking its choice of partners and adopted a "fence-sitting strategy", running Oracle and SAP business lines side by side.

"Looking at it now, since then Hanpu's business focus has tilted more and more toward SAP." Lin Xiaobing goes further: traditional Oracle partners, HAND included, will no longer keep their core resources invested in Oracle.

Burning Bridges: the Death of Zhijie

Zhijie was once Oracle's largest partner in China.

This management consulting firm, American in name but Hong Kong in substance, tied itself to Oracle very early on, and rode Oracle's momentum in Hong Kong, Taiwan, Japan and Singapore into the Hong Kong- and Taiwan-invested enterprises of mainland China.

"Many Hong Kong and Taiwan companies ran Oracle systems, and that carried over to their mainland factories." After leaving Midea, Zhang Wei himself joined a Hong Kong-funded company to do Oracle implementation.

"Outside mainland China, in Asia, especially Japan, Hong Kong and Taiwan, Oracle sells better than SAP; from that angle, this year's leadership changes at Oracle China are understandable."

In fact Oracle, and in particular its enterprise application software, also entered the mainland market by way of Hong Kong- and Taiwan-funded companies first. In the domestic enterprise market, it was only after landing the Huawei and Midea deals, helped along by the near-wholesale job-hopping of those two firms' implementation staff and key users, that Oracle took off in China.

Zhijie's early consultants mostly came from Hong Kong, with salaries far above those of mainland consulting firms. Keeping pace with Oracle's China expansion, Zhijie urgently needed to replenish its consultant ranks.

From 1999 to 2000, Zhijie used salaries far higher than those of Hanpu, HAND and Oracle's other partners to poach many ex-Midea and ex-Huawei consultants; it ballooned in size and thereby became Oracle's largest partner in China.

The flip side was that Zhijie's labor costs shot up; compared with other consulting firms, for a long stretch Zhijie had to live with crushing salary costs.

"Zhijie in those days also enjoyed exclusivity clauses from Oracle; the relationship was intimate for a time." Though his stint at Zhijie was short, Li Yichao still cannot hide the excitement on his face when recalling Zhijie's glory days. "You could say that of all Oracle's partners, Zhijie in that period did best."

But two years later, at the end of 2002, word suddenly spread in the industry that Zhijie was "about to die."

"It wasn't the business, and you can't really call it the market; the direct cause was a sudden break in cash flow." Li Yichao and his colleagues were caught off guard: "at first we almost didn't believe it."

Just before that, in the five months from June to November 2002, Zhijie had signed 18 million yuan in contracts. At the "Oracle World 2002 Beijing" conference that June, Zhijie, as a key Oracle partner, was in the limelight. Oracle stated explicitly that it would give its China channel partners more support in three areas to build partner capability: first, stronger support and service capacity; second, broader marketing reach; third, deeper partner specialization. On top of that, key partners would receive financial support.

Li Yichao and his colleagues believed that even if Zhijie's shareholders did not step in to solve the immediate problem, Oracle would not stand idly by. "And it was only a 2 million yuan cash-flow problem."

But in the end Oracle behaved just like Zhijie's shareholders: "as if overnight they had stopped believing in Zhijie", and watched its sudden death without lifting a finger.

"Only later did we realize that for Oracle there was nothing to lose. Almost all the deals Zhijie signed were Oracle-related; they would not fly away just because Zhijie died." To this day, Li Yichao finds that Oracle brand of coldness hard to accept.

Subcontracting and Oracle Corruption

In 2001, Oracle China under Hu Bolin launched a massive channel-building campaign in the China market. "In a sense, the chaos in Oracle's channel also dates from that moment," says the boss of one long-time Oracle partner.

That same year, the bar for becoming an Oracle reseller suddenly dropped across the board; the exclusivity clauses enjoyed by traditional partners like Hanpu and Zhijie were voided, and their old sense of privilege began to erode.

"At that point you suddenly discovered that being an Oracle reseller could be this easy: a few people with a few connections could land Oracle reseller status." One day, when Zhang Wei found that someone who in his eyes did not even know what ERP meant had also become an Oracle reseller, he began to feel lost. "It was a real shock to China's IT consulting industry."

In fact, the reason people who had never done IT consulting dared to take on an Oracle agency lies in the subcontracting system prevalent in the management software business.

That is, as long as you have connections, can get to Oracle's sales people, and can land a deal, you do not have to implement it yourself. Find small firms that can implement but have no connections, or throw together an ad hoc implementation team, and generally speaking all you have to do is sit in the middle and wait to split the money.

"On a 2 million yuan deal done under this subcontracting model, the customer may receive only 500,000 yuan of actual value," one former Oracle reseller told this reporter. "That is very common among Oracle's small and mid-sized channel firms."

Worse still, some deals were subcontracted layer upon layer, yet whoever finally showed up at the customer site to implement could still fly the flag of "Oracle implementation consultant."

In 2001, Hong Kong's PCCW became the prime contractor on Oracle's China Mobile deal. PCCW had rich operating experience in the telecom industry, but "in truth, it simply was not strong in systems integration at the time." That is the root reason China Mobile became Oracle China's first famous failed deal.

Having won the prime contract, PCCW began subcontracting to downstream systems integrators. "Kesen was founded right at that time." Kesen then consisted of little more than its Hong Kong boss and a few others; "being Hong Kongers, they presumably had good relations with PCCW's top people", they got a subcontract, and that was the starting point of Kesen's rise.

Most subcontractors, meanwhile, subcontracted layer by layer on down the chain, until in the end not even PCCW could say exactly which so-called Oracle consultants were implementing the China Mobile deal.

This chaos reached its peak around the end of 2003. In mainland China, "the people doing Oracle work came to resemble a pyramid", and at the top of the pyramid sat Oracle's sales people.

This state of affairs directly bred widespread corruption among Oracle's sales staff, as well as the Oracle channel policy now so widely condemned.

"Quite a few Oracle sales people ran their own companies on the side to take deals, then subcontracted them to resellers in Oracle's name."

Other sales people colluded with resellers with whom they shared interests: besides feeding them customer information, they jointly squeezed out Oracle's other partners. "If a competitor cannot get a lower discount from Oracle than theirs, as a rule, it simply withdraws."

Within Oracle's channel system, a pattern gradually took shape in which resellers had to cultivate relationships with Oracle sales people on one front, fight competitors inside the system on discounts on another, and fight competitors outside the system for deals on a third. "It was sheer misery." Li Bo, whose connections inside Oracle were not strong, was eventually forced out of the channel.

On the other side, Oracle imposed near-draconian payment terms on its distributors: "payment before delivery, no returns after delivery", no arguments. To win customer deals, resellers often had to advance the payment themselves; at times they were strong-armed by Oracle sales people into putting up money to help them meet their quotas.

Cash-flow shortage became a headache for virtually every Oracle reseller. "Oracle simply will not consider our difficulties; it is supremely confident in its product and market clout. As far as it is concerned, resellers are begging for its charity." Li Bo is bitter about this.

In fact, the bugs in Oracle's systems are common knowledge in the industry, and they surface during implementation. Customers complain loudly about Oracle both during and after implementation, and those complaints generally land on the reseller's head. "For us, as a rule, don't even dream of collecting the full contract amount."

So under normal conditions, a reseller faced Oracle sales people's kickback demands, Oracle's payment terms, and the all-too-common collection problems. Li Bo believes "Zhijie's death had a lot to do with this too."

And for customers, under such a channel system, "implementing an Oracle system is actually hugely risky." Seen in this light, Oracle's sluggishness in the China market in recent years is not hard to understand.

The long-standing tolerance of internal corruption and the habitual disregard for partner and customer interests baffle many who know the inside story, yet over the past few years at Oracle they were the undisputed norm. On this, the great majority in the industry agree: "the fault for this situation lies not with the resellers but, at root, with Oracle's operating mechanism."

This February, with Lu Chunchu's high-profile arrival at Oracle China, Oracle's sales strategy for China has become ever clearer. Following Hu Bolin's channel cleanup of 2002, an even bigger channel reshuffle has begun in China. The question for Oracle now is whether, in China, it can return to the direct-sales model so characteristic of Oracle worldwide: ever forward, stopping at nothing short of the goal.

As for the partners who did yeoman service opening up the China market for Oracle in years past, the dead Zhijie, the faded SIE, the once-again-sold Hanpu, and HAND and Kesen, which are still doing passably well, Oracle has already become, or is becoming, the past.

For the wider community of practitioners in China, whatever shape the reforms take, they do not believe the American-style hegemony and bridge-burning in Oracle's bones will ever change.

For Oracle, in the China market, trust will be the theme for years to come.

Original article: http://it.icxo.com/htmlnews/2004/08/27/310505.htm

GoldenGate OGG: A List of Common Problems and Errors

The following lists some common OGG problems and errors, together with the MOS notes that answer them:

Note: 966211.1 How To Resync A Single Table With Minimum Impact To Other Tables’ Replication?
Note: 966227.1 Troubleshooting TCP/IP Errors In Open Systems
Note: 972954.1 What Causes The ‘Bad Column Index(xxxx)’ Error In Replicat?
Note: 987379.1 How to recover from Extract ERROR 180 encountered commit SCN that is not greater than the highest SCN already processed
Note: 1112506.1 GoldenGate Extract Abends with “Unable to lock file” Error For Trail File
Note: 1124403.1 When Replicat is Running but Not Applying Records
Note: 1138409.1 EXTRACT / REPLICAT CHECKPOINT RBA IS LARGER THAN LOCAL TRAILFILE SIZE, AFTER SERVER CRASH
Note: 1163073.1 When using rmthost with timeout option, there are orphaned data collector processes on target system
Note: 1266389.1 Oracle GoldenGate and compressed tables
Note: 1270168.1 OGG-01028 Compression Not Supported – when extract processes regular and cluster tables
Note: 1276538.1 Replicat abending with missing key cols
Note: 1280584.1 Where To Find The List Of Supported And Unsupported Oracle Data Types for GoldenGate
Note: 1288562.1 GG Data Pumps Error With OGG-01031 When Upgrading the Target Side
Note: 957112.1 Encountered SCN That Is Not Greater Than The Highest SCN Already Processed
Note: 1271578.1 How to Handle Tables without Primary Keys or Unique Indexes with Oracle GoldenGate
Note: 957053.1 Recovering From a GG Pump Process Abending With An Incompatible Record Error
Note: 968622.1 Does GoldenGate Support Tables With An UNUSED Column?
Note: 968614.1 Why Does GoldenGate Report The Lag Is Unknown or a Timestamp Mismatch?
Note: 968769.1 GoldenGate Errors That Occur In Teradata Extract Setting
Note: 964709.1 How To Handle Oracle DDLs (add/drop/resize a Column)
Note: 1300076.1 Usage and considerations for allocation of ports and DYNAMICPORTLIST
Note: 1232303.1 mount options to use when configuring GoldenGate to run on NFS mounted file system
Note: 1298548.1 Extract not accepting CACHESIZE settings
Note: 1300197.1 Logger not logging cobol transactions for GGSLIB with non default AUDCFG location
Note: 1273285.1 How To Troubleshoot Oracle Redo Log Reading Extract Slow Performance Issue
Note: 965356.1 GGS Error 150: No Dynamic Ports Available
Note: 1199561.1 Characterset Conversion: Error 160 Bad column length
Note: 1159684.1 Extract abends,mss: missing trailing blocks, unable to queue I/O, I/O beyond file size
Note: 1292874.1 GoldenGate Extract Abends on Startup with Error OGG-01088 Realloc xxx Bytes Failed.
Note: 1271522.1 Oracle GoldenGate (OGG) 11.1.1.0 and Transparent Data Encryption (TDE)
Note: 965270.1 How does the Manager Allocate TCPIP Ports?
Note: 1195995.1 Implementing GoldenGate for DataGuard fail overs
Note: 965373.1 Installing And Configuring GoldenGate In An Oracle RAC Environment
Note: 1276058.1 Oracle GoldenGate Best Practices: Instantiation from an Oracle Source Database
Note: 971264.1 How To Upgrade GGS Objects In A DDL Replication Environment
Note: 969550.1 Using SCN To Do The Initial Load From Oracle To Oracle Database
Note: 1235986.1 Does Oracle Goldengate extract support distributed transactions?
Note: 1112325.1 Deploying OGG to Achieve Operational Reporting for Oracle E-Business Suite
Note: 1120793.1 How to Get More Information than from Just a Trace on LINUX Boxes for GoldenGate
Note: 1272645.1 Maintaining the OGG Marker table

 

How to Tell Whether a Patch Requires Downtime

Oracle 11g introduced the concept of online patching (also called hot patching), which effectively reduces the downtime incurred by applying one-off patches. But how do we know which patches can be applied online, and which can only be applied after shutting down the instance? Below is a simple, effective way to tell the two kinds of patch apart:

1.
Download the required patch file from MOS.

2.
Unzip the patch file (normally a zip archive), cd into the /$PATCH_ID/etc/config directory, and open the inventory.xml file.

3.
Check the instance_shutdown flag in the XML file. If it reads
<instance_shutdown>true</instance_shutdown>
then the one-off patch must be applied offline and cannot be applied online.
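The check in the steps above is easy to script. Here is a minimal sketch, assuming the patch has already been unzipped into the directory passed as the first argument so that etc/config/inventory.xml exists beneath it; the function name is my own, not an Oracle tool:

```shell
#!/bin/sh
# Sketch: report whether an unzipped one-off patch can be applied online.
# $1 is the unzipped patch directory containing etc/config/inventory.xml.
check_patch_online() {
    inv="$1/etc/config/inventory.xml"
    if [ ! -f "$inv" ]; then
        echo "no inventory.xml found under $1" >&2
        return 2
    fi
    # instance_shutdown=true means the instance must be down first
    if grep -q '<instance_shutdown>true</instance_shutdown>' "$inv"; then
        echo "offline apply required (instance shutdown)"
    else
        echo "online apply possible"
    fi
}
```

Running check_patch_online /tmp/12345678 against an unzipped patch then prints one line saying which kind of patch it is.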

Merry Christmas Sql Statement

This Merry Christmas query comes from Gilles Haro's blog, though by the time I saw it, it was already the 27th. :)

with Mx as
 (select 60 as MaxWidth from dual)
select decode (sign(floor(MaxWidth / 2) - rownum), 1, lpad(' ',
                                              floor(MaxWidth / 2) -
                                              (rownum - 1)) || rpad('*',
                                                      2 *
                                                      (rownum - 1) + 1,
                                                      ' *'), lpad('* * *',
                                                    floor(MaxWidth / 2) + 3))
  from all_tables, Mx
 where rownum < floor(MaxWidth / 2) + 6;


                              * 
                             * * 
                            * * * 
                           * * * * 
                          * * * * * 
                         * * * * * * 
                        * * * * * * * 
                       * * * * * * * * 
                      * * * * * * * * * 
                     * * * * * * * * * * 
                    * * * * * * * * * * * 
                   * * * * * * * * * * * * 
                  * * * * * * * * * * * * * 
                 * * * * * * * * * * * * * * 
                * * * * * * * * * * * * * * * 
               * * * * * * * * * * * * * * * * 
              * * * * * * * * * * * * * * * * * 
             * * * * * * * * * * * * * * * * * * 
            * * * * * * * * * * * * * * * * * * * 
           * * * * * * * * * * * * * * * * * * * * 
          * * * * * * * * * * * * * * * * * * * * * 
         * * * * * * * * * * * * * * * * * * * * * * 
        * * * * * * * * * * * * * * * * * * * * * * * 
       * * * * * * * * * * * * * * * * * * * * * * * * 
      * * * * * * * * * * * * * * * * * * * * * * * * * 
     * * * * * * * * * * * * * * * * * * * * * * * * * * 
    * * * * * * * * * * * * * * * * * * * * * * * * * * * 
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
  * * * * * * * * * * * * * * * * * * * * * * * * * * * * * 
                            * * * 
                            * * * 
                            * * * 
                            * * * 
                            * * * 
                            * * * 

Latches and Tuning: Latches

1. Query the V$FIXED_TABLE to determine the V$ views that provide latch information.

a. Execute the following query:> select name from v$fixed_table where name like 'V$LATCH%';

2. Remember that there are 14 levels of latches, numbered 0 – 13.

a. Take a look at three common latches and their level by executing the following query:> select name, level# from v$latch where name in
('cache buffers chains','library cache','redo allocation');

You may or may not see each of these latches listed depending upon the state and activity on your database.

3. Each parent and child latch has one instance in X$KSLLT.

a. Execute the following query to view a KSLLT structure for a latch:> select v$latch_parent.addr,v$latch_parent.level#,v$latch_parent.latch#,
v$latch_parent.name
from v$latch_parent,x$ksllt
where v$latch_parent.addr = x$ksllt.addr;

You should at least see information for the ‘latch wait list’ latch. However, your results will vary depending upon the state and activity of your database.

4. Using the latch# from the previous query, you can query the X$KSLLD table to see the array of latch descriptors.  One of these arrays is stored in every PGA.

a. Execute the following query to view the array for the associated latch#:> select * from X$KSLLD where indx = <latch# from previous query>;

A given process can select from this table to view the latch array.

Note:  As you can see, output from the X$ tables is somewhat cryptic.  However, knowledge of where to gather information regarding latches will help you to more effectively diagnose latch contention and communicate with Oracle support.  The follow-on lessons to this module will use the dynamic performance views and system tables for diagnosing and resolving latch contention.

5. Query the V$SYSTEM_PARAMETER to view the SPIN_COUNT and MAX_EXPONENTIAL_SLEEP parameters:

a. Execute the following query:> select name from v$system_parameter where name like '%spin%' or name like '%max%';

Note: If you are running Oracle8i you will get no rows returned.  That's because each of these parameters is now hidden.

b.  Execute the following query using the X$KSPPI table which lists hidden parameters:

>  select ksppinm from x$ksppi where (ksppinm like '%spin%' or ksppinm like '%max%') and ksppinm like '\_%' escape '\' ;

Also note the _MAX_SLEEP_HOLDING_LATCH and _LATCH_WAIT_POSTING parameters.  You can query the ksppidesc column to get a description of these parameters.

6. Remember that some latches are held for long periods of time, while others are not.

a. Execute the following query to view some latches that are held for long periods of time:> select name, level# from v$latch where
name in ('library cache','shared pool','library cache load lock','latch wait list');

Also notice the level numbers of these latches taking into account that most latches are in the range of  0 – 7.  The latch wait list latch protects the linked list of waiters.

7. Certain wait events are associated with latches.  A session process may wait on a latch and the wait time will be updated in the V$SESSION_EVENT and
V$SYSTEM_EVENT dynamic performance views.  V$SESSION_WAIT also provides event information.

a. Execute the following query to view real time information on which sessions are currently waiting or have just waited for an event:> select sid, event, wait_time
from v$session_wait
order by wait_time, event;

Latch Free will appear in the output if there were statistics for this event.

b.  Query V$SYSTEM_EVENT to view aggregate statistics for the latch free event:

> select * from v$system_event where event = 'latch free';

c.  Query V$SESSION_EVENT to view statistics for the latch free event for each session:

> select * from v$session_event where event = 'latch free';

8. You can obtain information about specific latches by using both the V$SESSION_WAIT and V$LATCH views.

a. Execute the following query to view the latches associated with current latch free waits:> select distinct l.name "Latch"
from v$session_wait s, v$latch l
where s.p2 = l.latch#
and s.event = 'latch free';

You may or may not get a result set from this query if there are no session latch free events.  If so then execute the next query to get an idea of wait events:

> select DISTINCT l.name "Latch", p1, p2
from v$session_wait s, v$latch l
where s.p2 = l.latch#
and s.wait_time = 0;

Note: Remember that P1 is the latch address and P2 is the latch number.  The DISTINCT operator is used because a process may hold more than one of the same latch.  Run the same query without the DISTINCT operator and notice the result set will have multiple P1 values for the same latch.

> select l.name "Latch", p1, p2
from v$session_wait s, v$latch l
where s.p2 = l.latch#
and s.wait_time = 0;

9.  Latch contention occurs when a latch is requested by several processes at the same time. The impact of latch contention can be assessed using the sleep histogram in V$LATCH.

a.  Execute the following query to derive information about latch sleeps:> select name, sleeps * sleeps / (misses - spin_gets) impact
from v$latch
where sleeps > 0;

b.  The SLEEP columns in V$LATCH represent the sleep buckets in the histogram.  Describe V$LATCH to view the sleep buckets:

> desc v$latch

Note: SPIN_GETS is also a sleep bucket.  Only the first four buckets are used:
GETS = immediate gets + MISSES, where MISSES = SPIN_GETS + sleep gets (SLEEP1 + SLEEP2 + SLEEP3)

10.  You can also derive information about latches by creating a trace file.

a.  To create a trace file execute the following command:> alter session set events 'immediate trace name latches level 10'

b.  The trace file will be created in the directory that is specified by the parameter USER_DUMP_DEST.      View this file and you will see information about sleeps and nowait times.

c.  If child latch#’s are listed in the trace file, you can query V$LATCH_CHILDREN to acquire additional information.
