如何公开Oracle trace文件?

隐式参数_trace_files_public决定了Oracle产生的trace文件是否公开,该参数默认值为FALSE,也就是非DBA/OINSTALL组的用户是没有权限读取数据库产生的trace文件的;在某些场合中我们需要让非DBA组的用户也能访问trace文件,就可以通过修改该参数实现。请看下面的例子:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> col name for a20
SQL> col value for a20
SQL> col describ for a40
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm LIKE '%_trace_files_public%'
  7  order by x.ksppinm;

NAME                 VALUE                DESCRIB
-------------------- -------------------- ----------------------------------------
_trace_files_public  FALSE                Create publicly accessible trace files

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug ipc;
Information written to trace file.
SQL> oradebug tracefile_name;
/s01/10gdb/admin/YOUYUS/udump/youyus_ora_10268.trc
SQL> !ls -l /s01/10gdb/admin/YOUYUS/udump/youyus_ora_10268.trc
-rw-r----- 1 maclean oinstall 4206 Aug 11 20:51 /s01/10gdb/admin/YOUYUS/udump/youyus_ora_10268.trc
/*所产生的trace文件权限为640,非oinstall组用户无权限读取该文件*/

SQL> alter system set "_trace_files_public"=true;
alter system set "_trace_files_public"=true
                 *
ERROR at line 1:
ORA-02095: specified initialization parameter cannot be modified
/*修改该参数需要重启实例*/

SQL> alter system set "_trace_files_public"=true scope=spfile;

System altered.

SQL> startup force;
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2084264 bytes
Variable Size             922747480 bytes
Database Buffers          637534208 bytes
Redo Buffers               14692352 bytes
Database mounted.
Database opened.

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug ipc;
Information written to trace file.
SQL> oradebug tracefile_name;
/s01/10gdb/admin/YOUYUS/udump/youyus_ora_10430.trc
SQL> ! ls -l /s01/10gdb/admin/YOUYUS/udump/youyus_ora_10430.trc
-rw-r--r-- 1 maclean oinstall 5471 Aug 11 20:54 /s01/10gdb/admin/YOUYUS/udump/youyus_ora_10430.trc
/*other组用户也具有了读权限*/

SQL> ! ls -l /s01/10gdb/admin/YOUYUS/
total 24
drwxr-x--- 2 maclean oinstall 4096 Aug 11 20:56 adump
drwxr-x--- 2 maclean oinstall 4096 Aug 11 20:54 bdump
drwxr-x--- 2 maclean oinstall 4096 Aug  5 21:35 cdump
drwxr-x--- 2 maclean oinstall 4096 Aug  5 21:36 dpdump
drwxr-x--- 2 maclean oinstall 4096 Aug  5 21:37 pfile
drwxr-x--- 2 maclean oinstall 4096 Aug 11 20:54 udump
/*请注意修改_trace_files_public为true,并不会修改trace所在目录的权限,Oracle默认建立bdump/udump等trace目录时分配的权限为750,other组用户无法进入这些目录,需要修改目录权限为755,即o+r+x*/

SQL> ! chmod o+r+x /s01/10gdb/admin/YOUYUS/*dump

SQL>  ! ls -l /s01/10gdb/admin/YOUYUS/
total 24
drwxr-xr-x 2 maclean oinstall 4096 Aug 11 20:56 adump
drwxr-xr-x 2 maclean oinstall 4096 Aug 11 20:54 bdump
drwxr-xr-x 2 maclean oinstall 4096 Aug  5 21:35 cdump
drwxr-xr-x 2 maclean oinstall 4096 Aug  5 21:36 dpdump
drwxr-x--- 2 maclean oinstall 4096 Aug  5 21:37 pfile
drwxr-xr-x 2 maclean oinstall 4096 Aug 11 20:54 udump

/*需要注意的另一点是修改_trace_files_public参数并不会引起既有的trace文件的权限被修改,典型的例子是alert log告警日志*/
[maclean@rh2 bdump]$ ls -l
total 20
-rw-r----- 1 maclean oinstall 12971 Aug 11 21:17 alert_YOUYUS.log
-rw-r--r-- 1 maclean oinstall   690 Aug 11 21:12 youyus_lgwr_10514.trc

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm LIKE '%_trace_files_public%'
  7  order by x.ksppinm;

NAME                 VALUE                DESCRIB
-------------------- -------------------- ----------------------------------------
_trace_files_public  FALSE                Create publicly accessible trace files

SQL> alter system set "_trace_files_public"=true scope=spfile;

System altered.

SQL> startup force;
ORACLE instance started.

Total System Global Area 1577058304 bytes
Fixed Size                  2084264 bytes
Variable Size             922747480 bytes
Database Buffers          637534208 bytes
Redo Buffers               14692352 bytes
Database mounted.
Database opened.
SQL> !ls -l
total 32
-rw-r----- 1 maclean oinstall 21189 Aug 11 21:20 alert_YOUYUS.log
-rw-r--r-- 1 maclean oinstall   690 Aug 11 21:12 youyus_lgwr_10514.trc
-rw-r--r-- 1 maclean oinstall   690 Aug 11 21:20 youyus_lgwr_11136.trc

UNION ALL returning wrong results?

有应用人员反映某套Linux上的11.2.0.1数据库系统中出现了UNION ALL后返回的结果集不正确的问题,我们具体分析下出现问题的其中一条语句:

SELECT MTL_SECONDARY_INVENTORIES.SECONDARY_INVENTORY_NAME,
       MTL_SECONDARY_INVENTORIES.ORGANIZATION_ID,
       MTL_SECONDARY_INVENTORIES.DESCRIPTION,
       MTL_SECONDARY_INVENTORIES.AVAILABILITY_TYPE,
       MTL_SECONDARY_INVENTORIES.MATERIAL_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.MATERIAL_OVERHEAD_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.RESOURCE_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.OVERHEAD_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.OUTSIDE_PROCESSING_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.ASSET_INVENTORY,
       MTL_SECONDARY_INVENTORIES.EXPENSE_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.ENCUMBRANCE_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.ATTRIBUTE3,
       MTL_SECONDARY_INVENTORIES.ATTRIBUTE5,
       WORKFLOW_START_TIMES.WORKFLOW_START_TIME
  FROM REPEMEAERP.MTL_SECONDARY_INVENTORIES,
       REPEMEAERP.WORKFLOW_START_TIMES
 WHERE MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT >
       TO_DATE('01/01/1900 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
   AND MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT <=
       WORKFLOW_START_TIMES.WORKFLOW_START_TIME
   AND WORKFLOW_START_TIMES.WORKFLOW_NAME =
       LTRIM(RTRIM('w_int_FreqBatch_EMEA'))
/*以上是QUERY A*/
UNION ALL
/*以下是QUERY B*/
SELECT DISTINCT 'WORKORDERS',
                MTL_SECONDARY_INVENTORIES.ORGANIZATION_ID,
                'WORK ORDERS WITH WIP AS CATEGORY VALUE',
                1,
                0,
                0,
                0,
                0,
                0,
                1,
                0,
                0,
                'MOI',
                '0',
                WORKFLOW_START_TIMES.WORKFLOW_START_TIME
  FROM REPEMEAERP.MTL_SECONDARY_INVENTORIES, EIMMAINT.WORKFLOW_START_TIMES
 WHERE MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT >
       TO_DATE('01/01/1900 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
   AND MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT <=
       WORKFLOW_START_TIMES.WORKFLOW_START_TIME
   AND WORKFLOW_START_TIMES.WORKFLOW_NAME =
       LTRIM(RTRIM('w_int_FreqBatch_EMEA'))
/
138 rows selected.

以上查询语句中,QUERY A部分(也就是UNION ALL之前的SELECT语句)单独查询时返回返回69条记录,QUERY B部分单独查询时返回15记录,UNION ALL后返回的结果却是138条记录,而非84条记录。实际上这套系统也是最近才从10g迁移到11gr2上,之前在10g中同样的应用没有出过类似的问题,可以猜测是11g中新引入的某种特性存在可能引发wrong result的Bug。

具体思路虽然有了,但仍无法确定问题的关键所在;我们来看看该SQL的执行计划:

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                               |     7 |  2443 |    52   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                   |                               |     7 |  2443 |    52   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL             | WORKFLOW_START_TIMES          |     1 |    29 |    48   (0)| 00:00:01 |
|   3 |   VIEW                          | VW_JF_SET$9BAED2EA            |     1 |   320 |     4   (0)| 00:00:01 |
|   4 |    UNION ALL PUSHED PREDICATE   |                               |       |       |            |          |
|*  5 |     FILTER                      |                               |       |       |            |          |
|   6 |      TABLE ACCESS BY INDEX ROWID| MTL_SECONDARY_INVENTORIES     |     3 |   336 |     2   (0)| 00:00:01 |
|*  7 |       INDEX RANGE SCAN          | IDX_MTL_SECONDARY_INVENTORIES |     1 |       |     1   (0)| 00:00:01 |
|*  8 |     FILTER                      |                               |       |       |            |          |
|   9 |      TABLE ACCESS BY INDEX ROWID| MTL_SECONDARY_INVENTORIES     |     3 |    36 |     2   (0)| 00:00:01 |
|* 10 |       INDEX RANGE SCAN          | IDX_MTL_SECONDARY_INVENTORIES |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("WORKFLOW_START_TIMES"."WORKFLOW_NAME"='w_int_FreqBatch_EMEA')
5 - filter(TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss')<"WORKFLOW_START_TIMES"."WORKFLOW_START_TIME") 7 - access("MTL_SECONDARY_INVENTORIES"."DW_UPDATE_DT">TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "MTL_SECONDARY_INVENTORIES"."DW_UPDATE_DT"<="WORKFLOW_START_TIMES"."WORKFLOW_START_TIME"
)
8 - filter(TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss')<"WORKFLOW_START_TIMES"."WORKFLOW_START_TIME") 10 - access("MTL_SECONDARY_INVENTORIES"."DW_UPDATE_DT">TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "MTL_SECONDARY_INVENTORIES"."DW_UPDATE_DT"<="WORKFLOW_START_TIMES"."WORKFLOW_START_TIME"
)

你可能从以上执行计划中发现了两处十分陌生的字眼:UNION ALL  PUSHED PREDICATE和VW_JF_SET$。它们是什么!?

先来说说JF,JF是join factorization的缩写,你可以把它翻译作链接因式分解,如果你学过离散数学或者数据库原理的话,那么这种在11.2.0.1中最新推出的基于成本的变换操作对你来说并不陌生。用公式的样式来表达大概是下面这样:

YYA,YYB和YYC是3个关联的数据对象亦或者是3个关联的结果集;
(YYA JOIN YYB) UNION [ALL] (YYA JOIN YYC)
可以转换成为:
YYA JOIN (YYB UNION [ALL] YYC)

这样做YYA部分只需要读取一次,还可以少做一次JOIN,听上去是挺不错的吧!
下面我们来看一个Oracle使用join factorization的十分简单的实例:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> drop table yya;

drop table yya

           *

ERROR at line 1:

ORA-00942: table or view does not exist

SQL> drop table yyb;

drop table yyb

           *

ERROR at line 1:

ORA-00942: table or view does not exist

SQL> create table yya as select rownum id1,rownum id2,rownum id3 from dual connect by level<=20000;
Table created.
SQL> create table yyb as select rownum id1,rownum id2,rownum id3 from dual connect by level<=20000;
Table created.

SQL> explain plan for
2  select * from yya ,yyb where yya.id1=yyb.id1
3  union all
4  select * from yya, yyb where yya.id1=yyb.id1;

Explained.

SQL> set linesize 100 pagesize 1400;

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 744914999

-------------------------------------------------------------------------------------------
| Id  | Operation            | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                    | 40000 |  2500K|    49   (3)| 00:00:01 |
|*  1 |  HASH JOIN           |                    | 40000 |  2500K|    49   (3)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | YYA                | 20000 |   234K|    16   (0)| 00:00:01 |
|   3 |   VIEW               | VW_JF_SET$6E3F6682 | 40000 |  2031K|    32   (0)| 00:00:01 |
|   4 |    UNION-ALL         |                    |       |       |            |          |
|   5 |     TABLE ACCESS FULL| YYB                | 20000 |   761K|    16   (0)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| YYB                | 20000 |   761K|    16   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("YYA"."ID1"="ITEM_1")

/*执行计划中出现了VW_JF_SET$F22B2A93,Oracle选择了使用join factorization,该执行计划总成本49*/

SQL> alter session set "_optimizer_join_factorization"=false;

Session altered.

/*隐藏参数_optimizer_join_factorization决定了优化器是否可以选用join factorization,现在我们禁用它*/
SQL> explain plan for
  2  select * from yya join yyb on yya.id1=yyb.id1
  3  union all
  4  select * from yya join yyb on yya.id1=yyb.id1;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3439541885

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 40000 |  1992K|    66  (52)| 00:00:01 |
|   1 |  UNION-ALL          |      |       |       |            |          |
|*  2 |   HASH JOIN         |      | 20000 |   996K|    33   (4)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| YYA  | 20000 |   234K|    16   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| YYB  | 20000 |   761K|    16   (0)| 00:00:01 |
|*  5 |   HASH JOIN         |      | 20000 |   996K|    33   (4)| 00:00:01 |
|   6 |    TABLE ACCESS FULL| YYA  | 20000 |   234K|    16   (0)| 00:00:01 |
|   7 |    TABLE ACCESS FULL| YYB  | 20000 |   761K|    16   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("YYA"."ID1"="YYB"."ID1")
   5 - access("YYA"."ID1"="YYB"."ID1")
/*禁用链接因式分解后,Oracle使用了常规的"笨办法",成本上升到66*/

/*有趣的是下面的测试*/

SQL> alter session set "_optimizer_join_factorization"=true;

Session altered.

SQL> create table yyc as select * from yyb;

Table created.

SQL> explain plan for
  2  select * from yya,yyc where yya.id1=yyc.id1
  3  union all
  4  select * from yya,yyb where yya.id1=yyb.id1;

Explained.

SQL>  select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 4240055274

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 40000 |  1992K|    66  (52)| 00:00:01 |
|   1 |  UNION-ALL          |      |       |       |            |          |
|*  2 |   HASH JOIN         |      | 20000 |   996K|    33   (4)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| YYA  | 20000 |   234K|    16   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| YYC  | 20000 |   761K|    16   (0)| 00:00:01 |
|*  5 |   HASH JOIN         |      | 20000 |   996K|    33   (4)| 00:00:01 |
|   6 |    TABLE ACCESS FULL| YYA  | 20000 |   234K|    16   (0)| 00:00:01 |
|   7 |    TABLE ACCESS FULL| YYB  | 20000 |   761K|    16   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("YYA"."ID1"="YYC"."ID1")
   5 - access("YYA"."ID1"="YYB"."ID1")
/*confused,Oracle有什么理由在这里反而不用join factorization了呢?看起来短期内join factorization的实际应用还有待"商榷"
*/

/*10053事件能解释这一问题吗?*/
SQL> alter system flush shared_pool;

System altered.

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10053 trace name context forever,level 1;
Statement processed.
SQL> explain plan for
  2  select * from yya join yyb on yya.id1=yyb.id1
  3  union all
  4  select * from yya join yyc on yya.id1=yyc.id1;

Explained.

SQL> oradebug event 10053 trace name context off;
Statement processed.
SQL> oradebug tracefile_name;
/home/maclean/app/maclean/diag/rdbms/prod/PROD/trace/PROD_ora_7907.trc

view /home/maclean/app/maclean/diag/rdbms/prod/PROD/trace/PROD_ora_7907.trc
***********************************
Cost-Based Join Factorization
***********************************
Join-Factorization on query block SET$1 (#1)
JF: Using search type: exhaustive
JF: Generate basic transformation units
Validating JF unit: (branch: {2, 3} table: {YYA, YYA})
  rejected: join predicates do not match

JF: Generate transformation units from basic units
JF: No state generated.
/*优化器认为其链接谓词不符合使用join  factorization的条件,JF题案被驳回,"悬案"!*/

join factorization是很棒的新技术,这点没错,但新技术往往又是horrible(可怕的),最近我常用这个词。我们的问题是不是这个新来的引起的呢?通过join factorization关键字检索MOS,可以发现一个今年(2010)3月出现的Bug 9504322,quote:

Hdr: 9504322 11.2.0.1 RDBMS 11.2.0.1 QRY OPTIMIZER PRODID-5 PORTID-226
Abstract: WRONG RESULTS WITH UNION_ALL AND INLINE VIEWS

*** 03/24/10 05:38 am ***

PROBLEM:
--------
Wrong results on 11.2 for queries of type:

SELECT * FROM
(
SELECT ... FROM view, table WHERE ...
UNION ALL
SELECT ... FROM view, table WHERE NOT ...
);

DIAGNOSTIC ANALYSIS:
--------------------
Problem seen between 10.2.0.4 and 11.2.0.1.
If we remove the use of inline view the correct results are returned.

WORKAROUND:
-----------
N/A

RELATED BUGS:
-------------

REPRODUCIBILITY:
----------------
It is reproducing on generic 11.2.0.1

呵呵,似乎有点眉目了,不过实践是检验真理的唯一标准:


SQL> alter session set "_optimizer_join_factorization"=true;

Session altered.

SELECT MTL_SECONDARY_INVENTORIES.SECONDARY_INVENTORY_NAME,
       MTL_SECONDARY_INVENTORIES.ORGANIZATION_ID,
       MTL_SECONDARY_INVENTORIES.DESCRIPTION,
       MTL_SECONDARY_INVENTORIES.AVAILABILITY_TYPE,
       MTL_SECONDARY_INVENTORIES.MATERIAL_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.MATERIAL_OVERHEAD_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.RESOURCE_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.OVERHEAD_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.OUTSIDE_PROCESSING_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.ASSET_INVENTORY,
       MTL_SECONDARY_INVENTORIES.EXPENSE_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.ENCUMBRANCE_ACCOUNT,
       MTL_SECONDARY_INVENTORIES.ATTRIBUTE3,
       MTL_SECONDARY_INVENTORIES.ATTRIBUTE5,
       WORKFLOW_START_TIMES.WORKFLOW_START_TIME
  FROM REPEMEAERP.MTL_SECONDARY_INVENTORIES,
       REPEMEAERP.WORKFLOW_START_TIMES
 WHERE MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT >
       TO_DATE('01/01/1900 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
   AND MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT <=
       WORKFLOW_START_TIMES.WORKFLOW_START_TIME
   AND WORKFLOW_START_TIMES.WORKFLOW_NAME =
       LTRIM(RTRIM('w_int_FreqBatch_EMEA'))
/*以上是QUERY A*/
UNION ALL
/*以下是QUERY B*/
SELECT DISTINCT 'WORKORDERS',
                MTL_SECONDARY_INVENTORIES.ORGANIZATION_ID,
                'WORK ORDERS WITH WIP AS CATEGORY VALUE',
                1,
                0,
                0,
                0,
                0,
                0,
                1,
                0,
                0,
                'MOI',
                '0',
                WORKFLOW_START_TIMES.WORKFLOW_START_TIME
  FROM REPEMEAERP.MTL_SECONDARY_INVENTORIES, EIMMAINT.WORKFLOW_START_TIMES
 WHERE MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT >
       TO_DATE('01/01/1900 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
   AND MTL_SECONDARY_INVENTORIES.DW_UPDATE_DT <=
       WORKFLOW_START_TIMES.WORKFLOW_START_TIME
   AND WORKFLOW_START_TIMES.WORKFLOW_NAME =
       LTRIM(RTRIM('w_int_FreqBatch_EMEA'))
/

138 rows selected.

结果和我们猜想的大相径庭,join factorization并非罪魁,找不到终点让我们回到原点。
至此UNION ALL PUSHED PREDICATE有了极大的嫌疑,什么是PUSH PREDICATE?我把它叫做谓词前推,这玩样最早出现在10g上,但一直问题多多!它到底是何种OPERATION呢?让我们来看看下面的例子:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> create table youyus (t1 int,t2 varchar2(20));

Table created.

SQL> alter table youyus add primary key(t1);

Table altered.

SQL> explain plan for
  2  select *
  3    from youyus
  4  union all
  5  select * from youyus;

Explained.
/*在之后的语句中将用到这个子查询*/
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1959159425

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     2 |    50 |     4  (50)| 00:00:01 |
|   1 |  UNION-ALL         |        |       |       |            |          |
|   2 |   TABLE ACCESS FULL| YOUYUS |     1 |    25 |     2   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| YOUYUS |     1 |    25 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------
/*在之后的语句中将用到这个子查询,这里它的"原始"执行计划十分简单*/

SQL> explain plan for
  2  select v2.t1, v2.t2
  3    from (select t1 from youyus where rownum=1) v1,
  4         (select *
  5            from youyus
  6          union all
  7          select * from youyus) v2
  8   where v1.t1 = v2.t1;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2456530141

-----------------------------------------------------------------------------------------------
| Id  | Operation                      | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |              |     1 |    27 |     1   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                  |              |     1 |    27 |     1   (0)| 00:00:01 |
|   2 |   VIEW                         |              |     1 |    13 |     1   (0)| 00:00:01 |
|*  3 |    COUNT STOPKEY               |              |       |       |            |          |
|   4 |     INDEX FULL SCAN            | SYS_C0010819 |     1 |    13 |     1   (0)| 00:00:01 |
|   5 |   VIEW                         |              |     1 |    14 |     0   (0)| 00:00:01 |
|   6 |    UNION ALL PUSHED PREDICATE  |              |       |       |            |          |
|   7 |     TABLE ACCESS BY INDEX ROWID| YOUYUS       |     1 |    25 |     0   (0)| 00:00:01 |
|*  8 |      INDEX UNIQUE SCAN         | SYS_C0010819 |     1 |       |     0   (0)| 00:00:01 |
|   9 |     TABLE ACCESS BY INDEX ROWID| YOUYUS       |     1 |    25 |     0   (0)| 00:00:01 |
|* 10 |      INDEX UNIQUE SCAN         | SYS_C0010819 |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(ROWNUM=1)
   8 - access("YOUYUS"."T1"="V1"."T1")
  10 - access("YOUYUS"."T1"="V1"."T1")
/* PUSHED PREDICATE将谓词逻辑前推到UNION ALL的子查询中,其优势在于可以避免全表扫描,利用索引*/

SQL> set linesize 100 pagesize 1400;
SQL>
SQL> explain plan for
  2  select /*+ no_push_pred(v2) */ v2.t1, v2.t2
  3    from (select t1 from youyus where rownum=1) v1,
  4         (select *
  5            from youyus
  6          union all
  7          select * from youyus) v2
  8   where v1.t1 = v2.t1;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2769827061

-------------------------------------------------------------------------------------
| Id  | Operation            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |              |     1 |    38 |     6  (17)| 00:00:01 |
|*  1 |  HASH JOIN           |              |     1 |    38 |     6  (17)| 00:00:01 |
|   2 |   VIEW               |              |     1 |    13 |     1   (0)| 00:00:01 |
|*  3 |    COUNT STOPKEY     |              |       |       |            |          |
|   4 |     INDEX FULL SCAN  | SYS_C0010819 |     1 |    13 |     1   (0)| 00:00:01 |
|   5 |   VIEW               |              |     2 |    50 |     4   (0)| 00:00:01 |
|   6 |    UNION-ALL         |              |       |       |            |          |
|   7 |     TABLE ACCESS FULL| YOUYUS       |     1 |    25 |     2   (0)| 00:00:01 |
|   8 |     TABLE ACCESS FULL| YOUYUS       |     1 |    25 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("V1"."T1"="V2"."T1")
   3 - filter(ROWNUM=1)
/*no_push_pred hint让Oracle 放弃使用PUSHED PREDICATE,使用常规UNION-ALL操作后,子查询执行计划回归成全表扫描,整个计划成本上升*/

10g中HASH GROUP BY引起的临时表空间不足

今天早上应用人员反映一个原本在9i上可以顺利完成的CTAS脚本,迁移到10g后运行总是报“ORA-1652: unable to extend temp segment by 128 in tablespace TS_HQY1_TEMP “无法扩展临时表空间的错误。应用人员表示该脚本涉及的数据量在迁移前后变化不大,而且令人匪夷所思的是在新的10g库上临时表空间大小已达40多个G,要远大于原9i库。很显然这不是由于临时表空间过小导致的该问题,更多的原因肯定是出在迁移后Oracle不同的行为方式上。
该脚本每月执行一次用以汇总数据,其中一个单表接近4亿行记录,GROUP BY操作涉及到的数据量十分庞大。我们来具体看一下这个SQL:

create table gprs_bill.zou_201007_cell_id as
select /* g_all_cdr01,60 */
 calling_num mobile_number,
 lac,
 lpad(cell_id, 5, '0') cell_id,
 count(*) c,
 sum(call_duration) call_duration,
 sum(decode(record_type, '00', 1, 0) * call_duration) moc_call_duration,
 sum(decode(record_type, '01', 1, 0) * call_duration) mtc_call_duarion
  from gprs_bill.g_all_cdr01
 where substr(calling_num, 1, 7) in
       (select mobile_prefix from gprs_bill.zou_mobile_prefix)
 group by calling_num, lac, lpad(cell_id, 5, '0');

SQL> set autotrace traceonly exp
SQL> select /* g_all_cdr01,60 */
  2  calling_num mobile_number,
  3  lac,
  4  lpad(cell_id,5,'0') cell_id,
  5  count(*) c,
  6  sum(call_duration) call_duration,
  7  sum(decode(record_type,'00',1,0)*call_duration) moc_call_duration,
  8  sum(decode(record_type,'01',1,0)*call_duration) mtc_call_duarion
  9  from  gprs_bill.g_all_cdr01
 10  where substr(calling_num,1,7) in (select mobile_prefix from gprs_bill.zou_mobile_prefix)
 11  group by
 12  calling_num ,
 13  lac,
 14  lpad(cell_id,5,'0');

Execution Plan
----------------------------------------------------------
Plan hash value: 212866585

--------------------------------------------------------------------------------
-------------------

| Id  | Operation             | Name              | Rows  | Bytes |TempSpc| Cost
 (%CPU)| Time     |

--------------------------------------------------------------------------------
-------------------

|   0 | SELECT STATEMENT      |                   |   229K|  9880K|       |  103
3K  (3)| 03:26:41 |

|   1 |  HASH GROUP BY        |                   |   229K|  9880K|    22M|  103
3K  (3)| 03:26:41 |

|*  2 |   HASH JOIN RIGHT SEMI|                   |   229K|  9880K|       |  103
0K  (3)| 03:26:10 |

|   3 |    TABLE ACCESS FULL  | ZOU_MOBILE_PREFIX |  1692 | 13536 |       |    1
1   (0)| 00:00:01 |

|   4 |    TABLE ACCESS FULL  | G_ALL_CDR01       |   388M|    13G|       |  102
6K  (2)| 03:25:21 |

--------------------------------------------------------------------------------
-------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("MOBILE_PREFIX"=SUBSTR("CALLING_NUM",1,7))

可以看到Oracle使用了HASH GROUP BY 算法以实现数据分组;HASH算法是10g中新引入的分组算法。
下面我们来详细介绍下10g中数据分组的改动:
在10g中GROUP BY操作仍将引发排序操作,但10g中引入了新的算法,这些算法都不保证返回的数据行有序排列;在10g中如果想保证”GROUP BY”后返回的数据有序排列则需要强制使用”ORDER BY”子句,这点和9i是截然不同的。若你没有指定”ORDER BY”子句,则不能保证返回的结果正确排序。
在10g中”GROUP BY”子句更倾向于使用一种HASH算法而非原先的SORT算法来分组数据,HASH算法的CPU COST要低于原先的SORT算法。但这2种算法在10g中都不保证返回数据正常排序,当采用SORT算法时可能”碰巧”出现返回正常排序数据的状况。
MOS建议,如果迁移中出现大量不利的变化,则可以通过修改参数来确保沿用原先的算法。但需要注意的是,即便采用了以下参数仍不能保证10g后”GROUP BY”后返回的数据如9i中那样排序,你需要做的是加入显式的”ORDER BY”子句以保证Oracle为你做到这一点。

alter session set "_gby_hash_aggregation_enabled" = false;
alter session set optimizer_features_enable="9.2.0";
或者
alter session set optimizer_features_enable="8.1.7";

其中_gby_hash_aggregation_enabled隐式参数决定了Oracle是否可以启用新的HASH算法来进行数据分组(也适用于distinct等操作)。

对于以上说法我们通过实验进一步验证:

在11g中的测试如下:
SQL> select  * from v$version;

BANNER
----------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for 32-bit Windows: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> select  *  from youyus;

T1                 T2
---------- ----------
A                  10
B                  10
F                  30
G                  30
H                  40
I                  40
J                  40
L                  20
M                  20

已选择9行。
SQL>  analyze table youyus compute statistics for all columns;

表已分析。

SQL> set autotrace on;

SQL>  select t2,count(*) from youyus group by t2;

        T2   COUNT(*)
---------- ----------
        30          2
        20          2
        40          3
        10          2


执行计划
----------------------------------------------------------
Plan hash value: 2940504347

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     4 |     8 |     3  (34)| 00:00:01 |
|   1 |  HASH GROUP BY     |        |     4 |     8 |     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| YOUYUS |     9 |    18 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------
/*可以看到使用了hash算法,且返回结果未按t2列大小顺序排列*/

SQL> select t2,count(*) from youyus group by t2 order by t2;

        T2   COUNT(*)
---------- ----------
        10          2
        20          2
        30          2
        40          3


执行计划
----------------------------------------------------------
Plan hash value: 1349668650

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     4 |     8 |     3  (34)| 00:00:01 |
|   1 |  SORT GROUP BY     |        |     4 |     8 |     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| YOUYUS |     9 |    18 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------
/*加入order by子句后,又变回了SORT算法,而且正常排序*/
SQL> alter session set "_gby_hash_aggregation_enabled" = false;

会话已更改。
SQL> alter session set optimizer_features_enable="9.2.0";

会话已更改。
SQL> select t2,count(*) from youyus group by t2;

        T2   COUNT(*)
---------- ----------
        10          2
        20          2
        30          2
        40          3


执行计划
----------------------------------------------------------
Plan hash value: 1349668650

-------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost  |
-------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     4 |     8 |    11 |
|   1 |  SORT GROUP BY     |        |     4 |     8 |    11 |
|   2 |   TABLE ACCESS FULL| YOUYUS |     9 |    18 |     2 |
-------------------------------------------------------------

Note
-----
   - cpu costing is off (consider enabling it)
/*optimizer_features_enable设置为9.2.0后cpu cost被off了;返回数据正确排序,但我们要记住这是"侥幸"*/

SQL> alter session set optimizer_features_enable="10.2.0.5";

会话已更改。
SQL> select t2,count(*) from youyus group by t2;

        T2   COUNT(*)
---------- ----------
        10          2
        20          2
        30          2
        40          3


执行计划
----------------------------------------------------------
Plan hash value: 1349668650

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     4 |     8 |     3  (34)| 00:00:01 |
|   1 |  SORT GROUP BY     |        |     4 |     8 |     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| YOUYUS |     9 |    18 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------
/*optimizer_features_enable设为10.2.0.5 一切正常*/
SQL> alter session set optimizer_features_enable="11.2.0.1";

会话已更改。

SQL> select t2,count(*) from youyus group by t2;

        T2   COUNT(*)
---------- ----------
        10          2
        20          2
        30          2
        40          3


执行计划
----------------------------------------------------------
Plan hash value: 1349668650

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     4 |     8 |     3  (34)| 00:00:01 |
|   1 |  SORT GROUP BY     |        |     4 |     8 |     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| YOUYUS |     9 |    18 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------
/*11.2.0.1中没有变化*/
SQL> alter session set optimizer_features_enable="8.1.7";

会话已更改。

SQL> alter session set "_gby_hash_aggregation_enabled" =true;

会话已更改。
/*看看optimizer_features_enable设为8.1.7,而_gby_hash_aggregation_enabled为true,这种"矛盾"情况下的表现*/
SQL> select t2,count(*) from youyus group by t2;

        T2   COUNT(*)
---------- ----------
        30          2
        20          2
        40          3
        10          2


执行计划
----------------------------------------------------------
Plan hash value: 2940504347

-------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost  |
-------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     4 |     8 |    10 |
|   1 |  HASH GROUP BY     |        |     4 |     8 |    10 |
|   2 |   TABLE ACCESS FULL| YOUYUS |     9 |    18 |     1 |
-------------------------------------------------------------

Note
-----
   - cpu costing is off (consider enabling it)
/*居然仍采用了HASH GROUP BY,看起来类似_gby_hash_aggregation_enabled这类参数优先级要高于optimizer_features_enable*/

9i上的表现如下:
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
PL/SQL Release 9.2.0.4.0 - Production
CORE    9.2.0.3.0       Production
TNS for Linux: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production

SQL> analyze table youyus_9i compute statistics for all columns;

Table analyzed.

SQL> select * from youyus_9i;

T1         T2
-- ----------
A          10
B          10
F          30
G          30
H          40
I          40
J          40
L          20
M          20

9 rows selected.

SQL> alter session set optimizer_mode=ALL_ROWS;

Session altered.

SQL> select t2,count(*) from youyus_9i group by t2;

        T2   COUNT(*)
---------- ----------
        10          2
        20          2
        30          2
        40          3


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=4 Card=4 Bytes=8)
   1    0   SORT (GROUP BY) (Cost=4 Card=4 Bytes=8)
   2    1     TABLE ACCESS (FULL) OF 'YOUYUS_9I' (Cost=2 Card=21 Bytes
          =42)
/*9i下虽然没有指定order by,但我们可以放心返回的数据总是排序的;*/

SQL> alter session set "_gby_hash_aggregation_enabled" =true;
alter session set "_gby_hash_aggregation_enabled" =true
                  *
ERROR at line 1:
ORA-02248: invalid option for ALTER SESSION
/*9i下不存在_gby_hash_aggregation_enabled隐式参数*/

That's great!

应用脚本没有数据一定要正确排序的强制要求,但使用HASH GROUP BY算法后临时表空间的使用量大幅上升,远大于之前在9i上的使用量,最后导致语句无法顺利完成。首先想到的当然是通过修改_gby_hash_aggregation_enabled参数恢复到原先的SORT算法,并观察其临时表空间使用量:

SQL> alter session set "_gby_hash_aggregation_enabled"=false;
Session altered.

SQL> select /* g_all_cdr01,60 */
  2  calling_num mobile_number,
  3  lac,
  4  lpad(cell_id,5,'0') cell_id,
  5  count(*) c,
  6  sum(call_duration) call_duration,
  7  sum(decode(record_type,'00',1,0)*call_duration) moc_call_duration,
  8  sum(decode(record_type,'01',1,0)*call_duration) mtc_call_duarion
  9  from  gprs_bill.g_all_cdr01
 10  where substr(calling_num,1,7) in (select mobile_prefix from gprs_bill.zou_mobile_prefix)
 11  group by
 12  calling_num ,
 13  lac,
 14  lpad(cell_id,5,'0');

Execution Plan
----------------------------------------------------------
Plan hash value: 4013005149

--------------------------------------------------------------------------------
-------------------

| Id  | Operation             | Name              | Rows  | Bytes |TempSpc| Cost
 (%CPU)| Time     |

--------------------------------------------------------------------------------
-------------------

|   0 | SELECT STATEMENT      |                   |   229K|  9880K|       |  103
3K  (3)| 03:26:41 |

|   1 |  SORT GROUP BY        |                   |   229K|  9880K|    22M|  103
3K  (3)| 03:26:41 |

|*  2 |   HASH JOIN RIGHT SEMI|                   |   229K|  9880K|       |  103
0K  (3)| 03:26:10 |

|   3 |    TABLE ACCESS FULL  | ZOU_MOBILE_PREFIX |  1692 | 13536 |       |    1
1   (0)| 00:00:01 |

|   4 |    TABLE ACCESS FULL  | G_ALL_CDR01       |   388M|    13G|       |  102
6K  (2)| 03:25:21 |

--------------------------------------------------------------------------------
-------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("MOBILE_PREFIX"=SUBSTR("CALLING_NUM",1,7))

/*重新执行出现问题的脚本*/
create table gprs_bill.zou_201007_cell_id as
    select /* g_all_cdr01,60 */
    calling_num mobile_number,
    lac,
    lpad(cell_id,5,'0') cell_id,
    count(*) c,
    sum(call_duration) call_duration,
    sum(decode(record_type,'00',1,0)*call_duration) moc_call_duration,
    sum(decode(record_type,'01',1,0)*call_duration) mtc_call_duarion
    from  gprs_bill.g_all_cdr01
    where substr(calling_num,1,7) in (select mobile_prefix from gprs_bill.zou_mobile_prefix)
    group by
    calling_num ,
    lac,
    lpad(cell_id,5,'0');

可以看到在会话级别设置_gby_hash_aggregation_enabled为false后,Oracle不再采用10g中的HASH分组算法;因为该CTAS SQL脚本运行时间较长,我们通过动态视图V$SORT_USAGE来观察其运行期间的排序段使用量:

SQL> set time   on;
14:30:59 SQL> select tablespace,contents,segtype,blocks*8/1024 from v$sort_usage where username='GPRS_BILL';

TABLESPACE                      CONTENTS  SEGTYPE   BLOCKS*8/1024
------------------------------- --------- --------- -------------
TS_HQY1_TEMP                    TEMPORARY SORT               9349

14:35:59 SQL> /

TABLESPACE                      CONTENTS  SEGTYPE   BLOCKS*8/1024
------------------------------- --------- --------- -------------
TS_HQY1_TEMP                    TEMPORARY SORT              10011

/*5分钟内共用10011-9349=662MB 临时空间*/
15:02:46 SQL> select target ,totalwork,sofar,time_remaining,elapsed_seconds from v$session_longops where sofar!=totalwork;

TARGET                                                            TOTALWORK      SOFAR TIME_REMAINING ELAPSED_SECONDS
---------------------------------------------------------------- ---------- ---------- -------------- ---------------
GPRS_BILL.G_ALL_CDR01                                               5575890    5435796            143            5557

15:05:10 SQL> select target ,totalwork,sofar,time_remaining,elapsed_seconds from v$session_longops where sofar!=totalwork;

TARGET                                                            TOTALWORK      SOFAR TIME_REMAINING ELAPSED_SECONDS
---------------------------------------------------------------- ---------- ---------- -------------- ---------------
GPRS_BILL.G_ALL_CDR01                                               5575890    5562082             14            5692

15:05:13 SQL> select tablespace,contents,segtype,blocks*8/1024 from v$sort_usage where username='GPRS_BILL';

TABLESPACE                      CONTENTS  SEGTYPE   BLOCKS*8/1024
------------------------------- --------- --------- -------------
TS_HQY1_TEMP                    TEMPORARY SORT              13835

15:12:22 SQL> select tablespace,contents,segtype,blocks*8/1024 from v$sort_usage where username='GPRS_BILL';

TABLESPACE                      CONTENTS  SEGTYPE   BLOCKS*8/1024
------------------------------- --------- --------- -------------
TS_HQY1_TEMP                    TEMPORARY SORT              13922

/* 排序已经完成,排序段不再增长*/

该分组操作最后排序段使用量为13922MB,在客户可以接受的范围内。看起来新引入的HASH算法虽然有CPU成本低于SORT算法的优势,但可能消耗大量临时空间,可谓有得有失。

DataGuard Managed recovery hang

Our team deleted some archivelog by mistake. Rolled the database forwards by RMAN incremental recovery to an SCN. Did a manual recovery to sync it with the primary. Managed recovery is now failing.
alter database recover managed standby database disconnect

Alert log has :

Fri Jan 22 13:50:22 2010
Attempt to start background Managed Standby Recovery process
MRP0 started with pid=12
MRP0: Background Managed Standby Recovery process started
Media Recovery Waiting for thread 1 seq# 193389
Fetching gap sequence for thread 1, gap sequence 193389-193391
Trying FAL server: ITS
Fri Jan 22 13:50:28 2010
Completed: alter database recover managed standby database d
Fri Jan 22 13:53:25 2010
Failed to request gap sequence. Thread #: 1, gap sequence: 193389-193391
All FAL server has been attempted.

Managed recovery was working earlier today after the Rman incremental and resolved two gaps automatically. But it now appears hung with the standby falling behind the primary.

SQL> show parameter fal

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
fal_client string ITS_STBY
fal_server string ITS

[v08k608:ITS:oracle]$ tnsping ITS_STBY

TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:01:17

Copyright (c) 1997 Oracle Corporation. All rights reserved.

Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= v08k608.am.mot.com)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (10 msec)
[v08k608:ITS:oracle]$ tnsping ITS

TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:01:27

Copyright (c) 1997 Oracle Corporation. All rights reserved.

Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora


Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= 187.10.68.75)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (320 msec)

Primary has :
SQL> show parameter log_archive_dest_2
log_archive_dest_2 string SERVICE=DRITS_V08K608 reopen=6
0 max_failure=10 net_timeout=1
80 LGWR ASYNC=20480 OPTIONAL
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_2 string ENABLE
[ITS]/its15/oradata/ITS/arch> tnsping DRITS_V08K608
TNS Ping Utility for Solaris: Version 9.2.0.7.0 - Production on 22-JAN-2010 15:03:24
Copyright (c) 1997 Oracle Corporation. All rights reserved.
Used parameter files:
/oracle/product/9.2.0/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL= TCP)(Host= 10.177.13.57)(Port= 1526)) (CONNECT_DATA = (SID = ITS)))
OK (330 msec)

The arch process on the primary database might hang due to a bug below so that it couldn’t ship the missing archive log
files to the standby database.

BUG 6113783 ARC PROCESSES CAN HANG INDEFINITELY ON NETWORK
[ Not published so not viewable in My Oracle Support ]
Fixed 11.2, 10.2.0.5 patchset

We could work workaround the issue by killing the arch processes on the primary site and they will be respawned
automatically immediately without harming the primary database.

[maclean@rh2 ~]$ ps -ef|grep arc
maclean   8231     1  0 22:24 ?        00:00:00 ora_arc0_PROD
maclean   8233     1  0 22:24 ?        00:00:00 ora_arc1_PROD
maclean   8350  8167  0 22:24 pts/0    00:00:00 grep arc
[maclean@rh2 ~]$ kill -9 8231 8233
[maclean@rh2 ~]$ ps -ef|grep arc
maclean   8389     1  0 22:25 ?        00:00:00 ora_arc0_PROD
maclean   8391     1  1 22:25 ?        00:00:00 ora_arc1_PROD
maclean   8393  8167  0 22:25 pts/0    00:00:00 grep arc

and alert log will have:

Fri Jul 30 22:25:27 EDT 2010
ARCH: Detected ARCH process failure
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=26, OS id=8389
Fri Jul 30 22:25:27 EDT 2010
ARC0: Archival started
ARC1: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=27, OS id=8391
Fri Jul 30 22:25:27 EDT 2010
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
Fri Jul 30 22:25:27 EDT 2010
ARC1: Becoming the heartbeat ARCH

Actually if we don’t kill some fatal process in 10g , oracle will respawn all nonfatal processes.
For example:

[maclean@rh2 ~]$ ps -ef|grep ora_|grep -v grep
maclean  14264     1  0 23:16 ?        00:00:00 ora_pmon_PROD
maclean  14266     1  0 23:16 ?        00:00:00 ora_psp0_PROD
maclean  14268     1  0 23:16 ?        00:00:00 ora_mman_PROD
maclean  14270     1  0 23:16 ?        00:00:00 ora_dbw0_PROD
maclean  14272     1  0 23:16 ?        00:00:00 ora_lgwr_PROD
maclean  14274     1  0 23:16 ?        00:00:00 ora_ckpt_PROD
maclean  14276     1  0 23:16 ?        00:00:00 ora_smon_PROD
maclean  14278     1  0 23:16 ?        00:00:00 ora_reco_PROD
maclean  14338     1  0 23:16 ?        00:00:00 ora_arc0_PROD
maclean  14340     1  0 23:16 ?        00:00:00 ora_arc1_PROD
maclean  14452     1  0 23:17 ?        00:00:00 ora_s000_PROD
maclean  14454     1  0 23:17 ?        00:00:00 ora_d000_PROD
maclean  14456     1  0 23:17 ?        00:00:00 ora_cjq0_PROD
maclean  14458     1  0 23:17 ?        00:00:00 ora_qmnc_PROD
maclean  14460     1  0 23:17 ?        00:00:00 ora_mmon_PROD
maclean  14462     1  0 23:17 ?        00:00:00 ora_mmnl_PROD
maclean  14467     1  0 23:17 ?        00:00:00 ora_q000_PROD
maclean  14568     1  0 23:18 ?        00:00:00 ora_q001_PROD

[maclean@rh2 ~]$ ps -ef|grep ora_|grep -v pmon|grep -v ckpt |grep -v lgwr|grep -v smon|grep -v grep|grep -v dbw|grep -v psp|grep -v mman |grep -v rec|awk '{print $2}'|xargs kill -9

and alert log will have:
Fri Jul 30 23:20:58 EDT 2010
ARCH: Detected ARCH process failure
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=20, OS id=14959
Fri Jul 30 23:20:58 EDT 2010
ARC0: Archival started
ARC1: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
Fri Jul 30 23:20:58 EDT 2010
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC1 started with pid=21, OS id=14961
ARC1: Becoming the heartbeat ARCH
Fri Jul 30 23:21:29 EDT 2010
found dead shared server 'S000', pid = (10, 3)
found dead dispatcher 'D000', pid = (11, 3)
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process CJQ0
Restarting dead background process QMNC
CJQ0 started with pid=12, OS id=15124
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process MMON
QMNC started with pid=13, OS id=15126
Fri Jul 30 23:22:29 EDT 2010
Restarting dead background process MMNL
MMON started with pid=14, OS id=15128
MMNL started with pid=16, OS id=15132

That's all right!

重做日志浪费(redo wastage)

Oracle中联机日志文件(online redo log)在大多平台上以512 字节为一个标准块。

(HPUX,Tru64 Unix上是1024bytes,SCO UNIX,Reliant UNIX上是2048bytes,而MVS,MPE/ix上是4096bytes,虽然以上许多UNIX已经不再流行,实际情况可以通过

select max(l.lebsz) log_block_size_kccle

from sys.x$kccle l

where l.inst_id = userenv(‘Instance’)   语句查询到)

LGWR后台进程写出REDO时未必能填满最后的当前日志块。举例而言,假设redo buffer中有1025字节的内容需要写出,则1025=512+512+1 共占用三个重做日志标准块,前2个标准块被填满而第三个标准块只使用了1个字节。在LGWR完成写出前,需要释放”redo allocation”闩,在此之前SGA中索引”redo buffer”信息的变量将指向未被填满块后面的一个重做块,换而言之有511字节的空间被LGWR跳过了,这就是我们说的redo wastage;我们可以通过分析v$sysstat动态视图中的redo wastage统计信息了解实例生命周期中的重做浪费量。

SQL> col name for a25
SQL> select name,value from v$sysstat where name like '%wastage%';

NAME                           VALUE
------------------------- ----------
redo wastage                  132032

redo wastage的一个图示:

为什么要浪费这部分空间呢?实际上,这种做法十分有益于LGWR的串行I/O模式。redo wastage并不是问题或者Bug,而是Oracle故意为之的。当然过量的redo wastage也不是什么好事,一般是LGWR写出过于频繁的症状表现。9i以后很少有因为隐式参数_log_io_size过小而导致的LGWR过载了,如果在您的系统中发现redo wastage的问题不小,那么无限制地滥用commit操作往往是引起问题的罪魁祸首,减少不必要的commit语句,把commit从循环中移除都将利于减少redo wastage。

我们来看一下关于redo wastage的演示:

SQL> select distinct bytes/1024/1024 from v$log;

BYTES/1024/1024
---------------
             50                          /*确认联机日志文件大小为50MB*/
SQL> archive log list;                 /*确认数据库处于归档状态*/
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /s01/arch
SQL> set time on;
19:49:45 SQL> alter system switch logfile;           /*切换日志,清理现场*/
System altered.
19:51:07 SQL> col name for a25
19:51:16 SQL> select name,value from v$sysstat where name in ('redo size','redo wastage');

NAME                           VALUE
------------------------- ----------
redo size                 1418793324
redo wastage                88286544               /*演示开始时的基础统计值*/
19:51:19 SQL> begin
19:52:10   2  for i in 1..550000 loop
19:52:10   3  insert into tv values(1,'a');
19:52:10   4   commit;
19:52:10   5   end loop;
19:52:10   6   end;
19:52:11   7  /
/*匿名块中commit操作位于loop循环内,将导致大量redo wastage*/
PL/SQL procedure successfully completed.

19:53:07 SQL> select name,value from v$sysstat where name in ('redo size','redo wastage');

NAME                           VALUE
------------------------- ----------
redo size                 1689225404
redo wastage               112011352
/*频繁提交的匿名块产生了 1689225404-1418793324=257MB的redo,其中存在112011352-88286544=22MB的redo wastage*/

19:53:14 SQL>  begin
19:53:33   2  for i in 1..550000 loop
19:53:33   3  insert into tv values(1,'a');
19:53:33   4  end loop;
19:53:33   5    commit;
19:53:33   6   end;
19:53:34   7  /
/* 此匿名块中commit操作被移除loop循环中,批量修改数据后仅在最后提交一次*/
PL/SQL procedure successfully completed.

19:53:59 SQL> select name,value from v$sysstat where name in ('redo size','redo wastage');

NAME                           VALUE
------------------------- ----------
redo size                 1828546240
redo wastage               112061296
/*稀疏提交的匿名块最后产生了1828546240-1689225404=132MB的重做,而redo wastage为112061296-112011352=48k*/

可能您会很奇怪前者不是只比后者多出22MB的redo浪费吗,为什么总的redo量差了那么多?

我们需要注意到commit本身也是要产生redo的,而且其所产生的还不少!就以上演示来看频繁提交的过程中,commit所占用的redo空间几乎接近一半(257-132-22)/257=40%,而每次commit的平均redo量为(257-132-22)*1024*1024/550000=196 bytes。

commit操作是事务ACID的基础之一,合理运用commit可以帮我们构建健壮可靠的应用,而滥用它必将是另一场灾难!

巧用close_trace命令释放误删trace文件

可能很多朋友都遇到过这样的情况,在UNIX/Linux上定期清理Oracle日志文件夹时可能删除到仍被后台进程open着的trace文件,即某些后台进程一直持有着这些”被已经误删了的“打开文件的描述符(fd),这种情况下文件系统上该文件实际占用的空间是不会被释放的,这就造成使用df命令查看文件系统剩余空间和用du命令查看文件夹空间使用量时数值不一致的问题。此外因为是后台进程持有这些打开文件描述符,所以我们无法像kill服务进程一样来解决该问题(部分后台进程是可以kill的,不建议这样做)。oradebug是sqlplus中威力强大的debug命令,我们可以通过该命令发起多种trace/dump,其中也包括了close_trace事件;close_trace事件可以让指定进程关闭其正持有的trace文件。

下面我们就来演示下相关操作:

[maclean@rh2 ~]$ ps -ef|grep ora_|grep -v grep
maclean   7281     1  0 16:35 ?        00:00:00 ora_pmon_PROD
maclean   7283     1  0 16:35 ?        00:00:00 ora_psp0_PROD
maclean   7285     1  0 16:35 ?        00:00:00 ora_mman_PROD
maclean   7287     1  0 16:35 ?        00:00:00 ora_dbw0_PROD
maclean   7289     1  0 16:35 ?        00:00:00 ora_lgwr_PROD
maclean   7291     1  0 16:35 ?        00:00:00 ora_ckpt_PROD
maclean   7293     1  0 16:35 ?        00:00:00 ora_smon_PROD
maclean   7295     1  0 16:35 ?        00:00:00 ora_reco_PROD
maclean   7297     1  0 16:35 ?        00:00:00 ora_cjq0_PROD
maclean   7299     1  0 16:35 ?        00:00:00 ora_mmon_PROD
maclean   7301     1  0 16:35 ?        00:00:00 ora_mmnl_PROD
maclean   7303     1  0 16:35 ?        00:00:00 ora_d000_PROD
maclean   7305     1  0 16:35 ?        00:00:00 ora_s000_PROD
maclean   7313     1  0 16:35 ?        00:00:00 ora_qmnc_PROD
maclean   7430     1  0 16:35 ?        00:00:00 ora_q000_PROD
maclean   7438     1  0 16:36 ?        00:00:00 ora_q001_PROD

/* lgwr是著名的Oracle后台进程,在这个启动的实例中其系统进程号为7289*/

[maclean@rh2 ~]$ ls -l /proc/7289/fd        /* linux上的proc文件系统可以很方便我们探测进程信息*/
total 0
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 0 -> /dev/null
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 1 -> /dev/null
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 10 -> /dev/zero
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 11 -> /dev/zero
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 12 -> /s01/rac10g/rdbms/mesg/oraus.msb
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 13 -> /s01/rac10g/dbs/hc_PROD.dat
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 14 -> /s01/rac10g/dbs/lkPROD
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 15 -> /s01/rac10g/oradata/PROD/controlfile/o1_mf_64q6xphj_.ctl
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 16 -> /s01/rac10g/flash_recovery_area/PROD/controlfile/o1_mf_64q6xpms_.ctl
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 17 -> /s01/rac10g/oradata/PROD/onlinelog/o1_mf_1_64q6xrsr_.log
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 18 -> /s01/rac10g/flash_recovery_area/PROD/onlinelog/o1_mf_1_64q6xsoy_.log
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 19 -> /s01/rac10g/oradata/PROD/onlinelog/o1_mf_2_64q6xths_.log
l-wx------ 1 maclean oinstall 64 Jul 26 16:38 2 -> /s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 20 -> /s01/rac10g/flash_recovery_area/PROD/onlinelog/o1_mf_2_64q6xv9o_.log
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 21 -> /s01/rac10g/oradata/PROD/onlinelog/o1_mf_3_64q6xw1b_.log
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 22 -> /s01/rac10g/flash_recovery_area/PROD/onlinelog/o1_mf_3_64q6xwv0_.log
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 23 -> /s01/rac10g/oradata/PROD/datafile/o1_mf_system_64q6wd5j_.dbf
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 24 -> /s01/rac10g/oradata/PROD/datafile/o1_mf_undotbs1_64q6wd7f_.dbf
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 25 -> /s01/rac10g/oradata/PROD/datafile/o1_mf_sysaux_64q6wd5m_.dbf
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 26 -> /s01/rac10g/oradata/PROD/datafile/o1_mf_users_64q6wd89_.dbf
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 27 -> /s01/rac10g/oradata/PROD/datafile/o1_mf_temp_64q6xyox_.tmp
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 28 -> /s01/rac10g/rdbms/mesg/oraus.msb
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 3 -> /dev/null
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 4 -> /dev/null
l-wx------ 1 maclean oinstall 64 Jul 26 16:38 5 -> /s01/rac10g/admin/PROD/udump/prod_ora_7279.trc
l-wx------ 1 maclean oinstall 64 Jul 26 16:38 6 -> /s01/rac10g/admin/PROD/bdump/alert_PROD.log
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 7 -> /s01/rac10g/dbs/lkinstPROD (deleted)
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 8 -> /s01/rac10g/dbs/hc_PROD.dat
l-wx------ 1 maclean oinstall 64 Jul 26 16:38 9 -> /s01/rac10g/admin/PROD/bdump/alert_PROD.log

/*可以看到lgwr进程相关trace文件为/s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc,对应打开文件描述符为2*/

[maclean@rh2 ~]$ ls -lh /s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc
-rw-r----- 1 maclean oinstall 1.7M Jul 26 16:37 /s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc

[maclean@rh2 ~]$ rm -f /s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc

/*尝试删除该trace文件*/

[maclean@rh2 ~]$ ls -l /proc/7289/fd|grep lgwr
l-wx------ 1 maclean oinstall 64 Jul 26 16:38 2 -> /s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc (deleted)

/*文件已处在deleted状态,但lgwr进程仍持有该文件相关的文件描述符,这个时候该文件占有的空间并不会被释放*/

[maclean@rh2 ~]$ lsof|grep lgwr
oracle 7289   maclean    2w   REG 8,2   1702391 3867134 /s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc (deleted)

[maclean@rh2 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Production on Mon Jul 26 17:03:04 2010

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> oradebug setospid 7289;
Oracle pid: 6, Unix process pid: 7289, image: oracle@rh2 (LGWR)
SQL> oradebug flush;             /*写出trace buffer内容到trace文件*/
Statement processed.
SQL> oradebug close_trace;
Statement processed.
/*close_trace能够释放指定Oracle进程正打开着的文件,To close the current trace file use*/
SQL> host
[maclean@rh2 ~]$ lsof|grep lgwr

[maclean@rh2 ~]$ ls -l /proc/7289/fd/|grep lgwr
[maclean@rh2 ~]$
/* 从进程相关的fd文件夹中查找不到原来的trace文件;close_trace命令成功释放了该文件,并回收了磁盘空间。*/

_shared_pool_reserved_pct or shared_pool_reserved_size with ASMM

共享池是Oracle著名的SGA的一个重要组成部分,当我们尝试从共享池中分配较大的连续区域时(默认来说是4400bytes),我们可能会用到共享池中的保留区域(也叫保留池);注意Oracle总是会先尝试扫描普通共享池的空闲列表,之后才尝试扫描保留池的空闲列表,无论所需分配的内存块是否超过隐式参数_shared_pool_reserved_min_alloc所指定的值。

什么?你看到过的4031描述文档是用以下伪代码描述分配流程的:

large, scan reserved list
if (chunk found)
check chunk size and perhaps truncate
if (chunk is not found)
scan regular free list
if (chunk found)
check chunk size and perhaps truncate
all done
if (chunk is not found)
do LRU operations and repeat

small, scan regular free list
if (chunk found)
check chunk size and perhaps truncate
all done
if (chunk is not found)
do LRU operations and repeat

那么来看看以下测试:

SQL> alter system set "_shared_pool_reserved_pct"=5 scope=spfile;

System altered.

SQL> startup frce;
SP2-0714: invalid combination of STARTUP options
SQL> startup force;
ORACLE instance started.

Total System Global Area 3154116608 bytes
Fixed Size                  2099616 bytes
Variable Size            2197816928 bytes
Database Buffers          939524096 bytes
Redo Buffers               14675968 bytes
Database mounted.
Database opened.
SQL>  select free_space from v$shared_pool_reserved;

FREE_SPACE
----------
3525368

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
3   WHERE x.inst_id = USERENV ('Instance')
4   AND y.inst_id = USERENV ('Instance')
5   AND x.indx = y.indx
6  AND x.ksppinm LIKE '%_shared_pool_reserved_min_alloc%';

NAME                            VALU DESCRIB
------------------------------- ---- ---------------------------------------------------------------------
_shared_pool_reserved_min_alloc 4400 minimum allocation size in bytes for reserved area of shared pool

SQL> select count(*) from x$ksmsp where ksmchsiz>4400 and ksmchcom!='free memory';

COUNT(*)
----------
64

SQL>  exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.
SQL> select count(*) from x$ksmsp where ksmchsiz>4400 and ksmchcom!='free memory';

COUNT(*)
----------
67                            /* 方才调用的存储过程成功在共享池中分配到3个大于4400 byte的Chunk,接下来看保留池大小变化)
SQL>  select free_space from v$shared_pool_reserved;

FREE_SPACE
----------
3525368               /* 保留池大小没有发生变化,很显然3个大于4400 byte的Chunk是从regular free list上获取的,而非reserved free list/

以上实验中我们通过调用awr快照存储过程,模拟了从共享池中分配大于4400字节Chunk的操作,实验结果是在保留池有足够空闲空间的情况下,Oracle仍尝试在普通共享池区域中分配了这些连续内存,故而通过查询内部视图x$ksmsp虽然发现了多出了三个大于4400 byte的Chunk,然而保留池的空闲量并未减少。由此可证即便是要分配大于4400字节的内存块,Oracle也会先尝试搜索普通空闲列表,在普通空闲列表上无适应尺寸的连续空间时,才会尝试扫描保留池的空闲列表。

言归正题,我们可以通过2个参数控制保留池的大小:shared_pool_reserved_size和_shared_pool_reserved_pct。这2个参数的区别在于普通参数shared_pool_reserved_size以数值形式制定保留池的大小,这个数值采用在10g的ASMM(自动管理的SGA内存管理)特性的环境中是不会随共享池的大小变化而浮动的;不同于此,隐式参数_shared_pool_reserved_pct作为一个比例值,可以协同ASMM中共享池的变化而适应变化。在讨论经典4031错误的数个文档中,都有介绍到如果在ASMM环境中,设置_shared_pool_reserved_pct往往要好过shared_pool_reserved_size,它使你的共享池更具可收缩性!

纸上得来终觉浅,我们来看看_shared_pool_reserved_pct的实际效果:

SQL> alter system set sga_max_size=3000M scope=spfile;

System altered.

SQL> alter system set sga_target=3000M scope=spfile;

System altered.

SQL> alter system set shared_pool_size=500M;

System altered.

SQL> alter system set "_shared_pool_reserved_pct"=50 scope=spfile;

System altered.

SQL> startup force ;
ORACLE instance started.

Total System Global Area 3154116608 bytes
Fixed Size                  2099616 bytes
Variable Size             570426976 bytes
Database Buffers         2566914048 bytes
Redo Buffers               14675968 bytes
Database mounted.
Database opened.
SQL> select free_space from v$shared_pool_reserved;

FREE_SPACE
----------
21158280
SQL> alter system set shared_pool_size=2000M ;  /*ASMM下手动修改shared_pool_size,模拟共享池自动扩展的情况*/

System altered.

SQL> select free_space from v$shared_pool_reserved;

FREE_SPACE
----------
21158280                          /*  ohhh!好像跟我们预期的差别挺大,保留池大小没变*/

让我们跑下这段产生反复硬解析的SQL:

begin
for i in 1..200000 loop
execute immediate 'select 2 from dual where 1='||i;
end loop;
end;
/

SQL> select free_space from v$shared_pool_reserved;

FREE_SPACE
----------
296215920                                          /* 这样好了,我们如愿了,SGA真"动态" /

SQL>  alter system set shared_pool_size=300M;   /*尝试收缩ASMM下的共享池*/

System altered.

SQL>  alter system flush shared_pool;

System altered.

SQL> select free_space from v$shared_pool_reserved;

FREE_SPACE
----------
296215920                           /* 我们甚至无法flush 掉这些内存,这挺要命的 /

SQL> select name ,value/1024/1024 "SIZE MB" from v$system_parameter where name in ('sga_target','sga_max_size','shared_pool_size','db_cache_size','java_pool_size','large_pool_size','db_keep_cache_size');

NAME                    SIZE MB
-------------------- ----------
sga_max_size               3008
shared_pool_size            304
large_pool_size              16
java_pool_size               16
sga_target                 3008
db_cache_size               512
db_keep_cache_size            0

可以看到我们还有很多“没有分配”的SGA内存,我们来加大高速缓存看看:
SQL> alter system set db_cache_size=1000M;
alter system set db_cache_size=1000M
*
ERROR at line 1:
ORA-32017: failure in updating SPFILE
ORA-00384: Insufficient memory to grow cache      /* ohh 因为无法回收保留池的大量内存,导致了SGA其他组件无法扩展/

_shared_pool_reserved_pct的默认值5%可以满足绝大多数情况,通过上述实验证明设置该percent参数可以使保留池大小随SGA动态调整而扩大;但通过再次调整shared_pool_size和flush shared_pool手段都无法回收过度分配的保留池空间,这会导致其他组件无法正常扩展;因而我们在10gASMM的背景下,通过设置_shared_pool_reserved_pct可以获得更好的效果,但因为存在回收空间的问题,该参数也不宜设置过大,如果默认值在您的场景中显得过小,那么您可以尝试使用5-20这个区间内的值,超过20%的话往往就会造成负面的影响了。

ora-00600:[17281], [1001]一例

检查告警日志发现出现ora-600:[17281],[1001]记录,该数据库版本为10.2.0.4:

ORA-00600: internal error code, arguments: [17281], [1001], [0x70000042F5E54F8], [], [], [], [], []
ORA-01001: invalid cursor

分析该600错误产生的trace文件,发现当时运行的语句是一段匿名块:

Current SQL statement for this session:
declare
t_owner varchar2(30);
t_name  varchar2(30);
procedure check_mview is
dummy integer;
begin
if :object_type = ‘TABLE’ then
select 1 into dummy
from sys.all_objects
where owner = :object_owner
and object_name = :object_name
and object_type = ‘MATERIALIZED VIEW’
and rownum = 1;
:object_type := ‘MATERIALIZED VIEW’;
end if;
exception
when others then null;
end;
begin
:sub_object := null;
if :deep != 0 then
begin
if :part2 is null then
select constraint_type, owner, constraint_name
into :object_type, :object_owner, :object_name
from sys.all_constraints c
where c.constraint_name = :part1 and c.owner = user
and rownum = 1;
else
select constraint_type, owner, constraint_name, :part3
into :object_type, :object_owner, :object_name, :sub_object
from sys.all_constraints c
where c.constraint_name = :part2 and c.owner = :part1
and rownum = 1;
end if;
if :object_type = ‘P’ then :object_type := ‘PRIMARY KEY’; end if;
if :object_type = ‘U’ then :object_type := ‘UNIQUE KEY’; end if;
if :object_type = ‘R’ then :object_type := ‘FOREIGN KEY’; end if;
if :object_type = ‘C’ then :object_type := ‘CHECK CONSTRAINT’; end if;
return;
exception
when no_data_found then null;
end;
end if;
:sub_object := :part2;
if (:part2 is null) or (:part1 != user) then
begin
select object_type, user, :part1
into :object_type, :object_owner, :object_name
from sys.all_objects
where owner = user
and object_name = :part1
and object_type in (‘MATERIALIZED VIEW’, ‘TABLE’, ‘VIEW’, ‘SEQUENCE’, ‘PROCEDURE’, ‘FUNCTION’, ‘PACKAGE’, ‘TYPE’, ‘TRIGGER’, ‘SYNONYM’)
and rownum = 1;
if :object_type = ‘SYNONYM’ then
select s.table_owner, s.table_name
into t_owner, t_name
from sys.all_synonyms s
where s.synonym_name = :part1
and s.owner = user
and rownum = 1;
select o.object_type, o.owner, o.object_name
into :object_type, :object_owner, :object_name
from sys.all_objects o
where o.owner = t_owner
and o.object_name = t_name
and object_type in (‘MATERIALIZED VIEW’, ‘TABLE’, ‘VIEW’, ‘SEQUENCE’, ‘PROCEDURE’, ‘FUNCTION’, ‘PACKAGE’, ‘TYPE’, ‘TRIGGER’, ‘SYNONYM’)
and rownum = 1;
end if;
:sub_object := :part2;
if :part3 is not null then
:sub_object := :sub_object || ‘.’ || :part3;
end if;
check_mview;
return;
exception
when no_data_found then null;
end;
end if;
begin
select s.table_owner, s.table_name
into t_owner, t_name
from sys.all_synonyms s
where s.synonym_name = :part1
and s.owner = ‘PUBLIC’
and rownum = 1;
select o.object_type, o.owner, o.object_name
into :object_type, :object_owner, :object_name
from sys.all_objects o
where o.owner = t_owner
and o.object_name = t_name
and object_type in (‘MATERIALIZED VIEW’, ‘TABLE’, ‘VIEW’, ‘SEQUENCE’, ‘PROCEDURE’, ‘FUNCTION’, ‘PACKAGE’, ‘TYPE’, ‘TRIGGER’, ‘SYNONYM’)
and rownum = 1;
check_mview;
return;
exception
when no_data_found then null;
end;
:sub_object := :part3;
begin
select o.object_type, o.owner, o.object_name
into :object_type, :object_owner, :object_name
from sys.all_objects o
where o.owner = :part1
and o.object_name = :part2
and object_type in (‘MATERIALIZED VIEW’, ‘TABLE’, ‘VIEW’, ‘SEQUENCE’, ‘PROCEDURE’, ‘FUNCTION’, ‘PACKAGE’, ‘TYPE’, ‘TRIGGER’, ‘SYNONYM’)
and rownum = 1;
check_mview;
return;
exception
when no_data_found then null;
end;
begin
if :part2 is null and :part3 is null
then
select ‘USER’, null, :part1
into :object_type, :object_owner, :object_name
from sys.all_users u
where u.username = :part1
and rownum = 1;
return;
end if;
exception
when no_data_found then null;
end;
begin
if :part2 is null and :part3 is null and :deep != 0
then
select ‘ROLE’, null, :part1
into :object_type, :object_owner, :object_name
from sys.session_roles r
where r.role = :part1
and rownum = 1;
return;
end if;
exception
when no_data_found then null;
end;
:object_owner := null;
:object_type := null;
:object_name := null;
:sub_object := null;
end;
—– Call Stack Trace —–
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
——————– ——– ——————– —————————-
ksedst+001c          bl       ksedst1              088424844 ? 041124844 ?
ksedmp+0290          bl       ksedst               104A54870 ?
ksfdmp+0018          bl       03F30204
kgeriv+0108          bl       _ptrgl
kgeasi+0118          bl       kgeriv               1104722C8 ? 1101B87C0 ?
104AFA0FC ? 7000000100067F8 ?
000000000 ?
kgicli+0188          bl       kgeasi               110195A58 ? 110447310 ?
438100004381 ? 200000002 ?
200000002 ? 000000000 ?
0000003E9 ? 000000002 ?
kgidlt+0398          bl       kgicli               110450A70 ? 104C734E0 ?
kgidel+0018          bl       kgidlt               FFFFFFFFFFF90B8 ? 000000000 ?
000000003 ? 000000000 ?
FFFFFFFFFFF9468 ?
perabo+00ac          bl       kgidel               FFFFFFFFFFF8D20 ? 0000000FF ?
perdcs+0050          bl       perabo               000000000 ? 000000820 ?
000000000 ?
peidcs+01dc          bl       perdcs               110477618 ? 000000000 ?
kkxcls+00a4          bl       peidcs               FFFFFFFFFFF9468 ? 110477618 ?
kxsClean+0044        bl       kkxcls               1100DD338 ?
kxsCloseXsc+0444     bl       kxsClean             FFFFFFFFFFF9760 ?
kksCloseCursor+031c  bl       kxsCloseXsc          110478688 ? 110281FB0 ?
opicca+00c4          bl       kksCloseCursor       104BD9640 ?
opiclo+00a0          bl       opicca               10013D940 ?
kpoclsa+0050         bl       03F32B00
opiodr+0ae0          bl       _ptrgl
ttcpip+1020          bl       _ptrgl
opitsk+1124          bl       01F9F2A0
opiino+0990          bl       opitsk               000000000 ? 000000000 ?
opiodr+0ae0          bl       _ptrgl
opidrv+0484          bl       01F9E0E8
sou2o+0090           bl       opidrv               3C02DC1BBC ? 44065F000 ?
FFFFFFFFFFFF3A0 ?
opimai_real+01bc     bl       01F9B9F4
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0098         bl       main                 000000000 ? 000000000 ?



first argument为17281,该代码对应为在关闭游标时发生错误事件。
发生错误时的调用栈为:kkxcls->peidcs->perdcs->perabo->kgidel->kgidlt->kgicli,通过以上调用栈与argument信息在600 lookup工具中查询,可以发现bug:[6051353]:

Hdr: 6051353 10.1.0.45 THIN 10.1.0.5 PRODID-972 PORTID-212 ORA-600
Abstract: ORA-600[17281] ORA-[1001]

*** 05/14/07 07:09 am ***
TAR:
----
17450130.600

PROBLEM:
--------
Oracle 10.1.0.5 64-bit
AIX5L 64-bit server

Following the application of CPUJAN2007 patch, the database is giving
internal errors.
Tha alert log sjows:
  ORA-600: internal error code, arguments: [17281], [1001],
[0x70000001E792DC8], [], [], [], [], []
  ORA-1001: invalid cursor

Trace file shows the failing statement is an insert.

INSERT INTO V_RPMORGRESOURCEPOSITION
(ORGID, RESOURCEID, EFFECTIVEDAY, TERMINATIONDAY, OWNEDMW,
COMMITTEDMW, AVAILABLEMW, UNOFFEREDMW, FRRCOMMITTEDMW )
SELECT :B12 , :B11 , EFFECTIVEDAY, LEAST(:B10 , :B9 ),
NVL(OWNEDMW,0) + NVL(:B8 ,0), NVL(COMMITTEDMW,0) + NVL(:B7 ,0),
NVL(AVAILABLEMW,0) + NVL(:B6 ,0), NVL(UNOFFEREDMW,0) + NVL(:B5 ,0),
NVL(FRRCOMMITTEDMW,0) + NVL(:B4 ,0)
FROM RPMORGRESOURCEPOSITION
WHERE ORGID = :B3 AND RESOURCEID = :B2 AND EFFECTIVEDAY = :B1

    Other information:
        O/S info: user: , term: , ospid: 1234, machine: esu03als
        client info: smartino@PJM
        application name: eRpm
        action name: QueryZonalLoadObligationDetail
        last wait for 'SQL*Net message from client'

DIAGNOSTIC ANALYSIS:
--------------------
the function stack exactly matches 4359111
kgicli kgidlt kgidel perabo
    Bug 4359111 - STRESS ORA-600[17281] WHEN RUNNING GMT APPLICATION13
But this is already fixed in 10.1.0.5

The 'client' (esu03als) is a Linux server and there is no oracle running
there.

The AIX SysAdmin. who manages the esu03als server says this about the
application:

    "The way RPM connects to our database is by using DBCP (Apache Jakarta
     Project -- see
     As far as I can tell, the release (jar) contained within the RPM
     delivery is version 1.1 of DBCP which was released on 2003-10-20."

This couls also be Bug 5392685/5413487 but tis doesn't seem to have a
resolution.

Also could be Bug 5366763
  possible workaround - set session_cached_cursors to 0
  But this is already set to 0.

Originally the Customer thought this error started after applying CPUJAN2007,
but it is appearing on another database where the CPU patch is not applied.

WORKAROUND:
-----------
none

RELATED BUGS:
-------------
4359111
5413487
5366763

REPRODUCIBILITY:
----------------
intermittent

TEST CASE:
----------
none

STACK TRACE:
------------
          ksedmp ksfdmp kgeriv kgeasi
          kgicli kgidlt kgidel perabo perdcs peidcs kkxcls2 kxtcln
          kxsClean kksCloseCursor opicca opiclo kpoclsa opiodr
          ttcpip opitsk opiino opiodr opidrv sou2o
          main

其调用栈完全一致,可以基本确定2者的关联性。但该文档叙述bug发生在10.1.0.5的AIX版本上,且据称该Bug之前已在“Bug 4359111 – STRESS ORA-600[17281] WHEN RUNNING GMT APPLICATION13”中声明并在10.1.0.5上修复,看起来又是一个伪修复的漏洞。另外一个文档叙述了同样的错误发生在10.2.0.4上:

Hdr: 8337808 10.2.0.4 RDBMS 10.2.0.4 PRG INTERFACE PRODID-5 PORTID-46 ORA-600 4359111
Abstract: ORA-600 [17281] [1001] EVEN AFTER APPLYING THE PATCH 4359111

In prod_ora_12985.trc we see that:
 O/S info: user: Arun?Sharma, term: ARUN, ospid: 2556:3300,
 machine: BVM-EDP\ARUN
Got the ORA-600 at 2009-04-17 13:27:21 .

From the trace this was likely because the PLSQL block in
cursor #2 has an instantiation entry indicating that it
has cursor #3 open:
 INSTANTIATION OBJECT: object=0xf60e9ed0
 type="PL/SQL"[0] lock=0xa383a928 handle=0xb48def6c body=(nil)
 flags=[40] executions=0
 CURSORS: size=4 count=1 next=3
 index cursor      tag  context flags
 ----- ------ -------- -------- ---------------
     2      3 0xf60d56f4 0xf60f832c LRU/PRS/[03]
            ^here

But there is no cursor#3 so it has likely been closed independently

So the immediate thing to do would be to find out what this OS
user (Arun Sharma) was doing at 13.27 on 17th April from the
ARUN machine under OS pid 2556:3300 ?
 - what was the client program ?
 - what is this clients Oracle RSF version ?
 - what was being done in that client at that time.

It is unlikely that the user will remember such fine detail
so you may want to track the alert log closely and as soon
as you seen the ORA-600 find the O/S user , machine etc..
and try to contact them ASAP to confirm what they were doing
etc.. If we can get a handle on the client version / program /
actions that may help. Beyond that the next step is likely
to need a diagnostic on the server side to note cursor close
operations from the client without extranoues additional trace.

From the above update his client is TOAD using 8.0.6.0 on Windows.
TOAD is known to be affected by bug 4359111 so
you should upgrade this client to a version where
bug 4359111 is fixed. (4359111 is a CLIENT SIDE fix)

Marking as an unconfirmed duplicate of bug 4359111
as it looks like some specific client may be connecting
which does not have that patch in place.

与以上文档描述相同,trace中存在以下记录:

INSTANTIATION OBJECT: object=1105e4fe8
type=”PL/SQL”[0] lock=70000044155a8b0 handle=70000042f5e54f8 body=0 level=0
flags=[40] executions=0
CURSORS: size=4 count=3 next=5
index cursor      tag  context flags
—– —— ——– ——– —————
2      4 11049ecc8 110525f60 LRU/PRS/[03]
3      6 11049ecc8 1105261f0 LRU/PRS/[03]
4      5 11049ecc8 110526338 LRU/PRS/[03]

但实际上这里cursor 3所打开的cursor#:4,5,6均不存在,所以cursor# 3也被单独关闭了。文档中问题是由toad引起的,首先toad连接数据库是不需要安装Oracle client的,它通过一些客制化过的c/c++的接口连接到DB;如文档所述可以确定toad V8.0.6.0受到 Bug4359111的影响,而我们的环境中是通过PL/SQL developer连接到数据库的,该工具需要用到Oracle client,而开发人员安装的Oracle client一般为9.2.0.1,极有可能是这一较低版本的客户端软件造成了问题发生,到这里触发Bug的条件基本清晰了。

8i/9i的oracle client虽然仍能够连接到10g,但难保不发生一些兼容性问题或者将早期版本中的Bug再次代入,Oracle对这些连接形式或已不提供技术支持,或提供扩展模式(可能收费)的技术支持。以下列表列出了各版本Server-client的兼容性:



  • #1 – See Note 207319.1
  • #2 – An ORA-3134 error is incorrectly reported if a 10g client tries to connect to an 8.1.7.3 or lower server. See Note 3437884.8 .
  • #3 – An ORA-3134 error is correctly reported when attempting to connect to this version.
  • #4 – There are problems connecting from a 10g client to 8i/9i where one is EBCDIC based. See Note 3564573.8
  • #5 – For connections between 10.2 (or higher) and 9.2 the 9.2 end MUST be at 9.2.0.4 or higher. Connections between 10.2 (or higher) and 9.2.0.1, 9.2.0.2 or 9.2.0.3 are not supported.
  • #6 – For connections between 11.1 (or higher) and 10.1 / 10.2 the 10g end MUST be at 10.1.0.5 / 10.2.0.2 (or higher) respectively in order to use PLSQL between those versions. See Note 4511371.8 for more details.

其实在我们升级或迁移Oracle数据库的时候就因该考虑到客户端软件也需要升级到合适版本才能满足今后兼容性及应用程序健壮度的要求,当然客户端软件并不一定只是oracle client,它可能是jdbc,也许是odbc,也许是dbi等等。

Fail to queue the whole FAL gap in dataguard一例

近日告警日志中出现以下记录:
FAL[server]: Fail to queue the whole FAL gap
GAP – thread 1 sequence 180-180
DBID 3731271451 branch 689955035

这是一个10.2.0.3的dataguard环境,采用物理备库,归档传输模式;查询metalink发现相关note:

Symptoms

When using ARCH transport, gaps could be flagged in the alert log even though the single log gap was for a log that had not been written at the primary yet.

alert.log on primary shows:

FAL[server]: Fail to queue the whole FAL gap

GAP – thread 1 sequence 63962-63962

DBID 1243807152 branch 631898097

or alert.log on standby shows:

Fetching gap sequence in thread 1, gap sequence 63962-63962

Thu Jan 24 14:36:30 2008

FAL[client]: Failed to request gap sequence

GAP – thread 1 sequence 63962-63962

DBID 2004523329 branch 594300676

FAL[client]: All defined FAL servers have been attempted.

v$archive_gap returns no rows

SQL> select * from v$archive_gap;

no rows selected

Cause

Bug 5526409 – FAL gaps reported at standby for log not yet written at primary

Solution

Bug 5526409 is fixed in 10.2.0.4 and 11.1.

Upgrade to 10.2.0.4 as Bug 5526409 is fixed in 10.2.0.4.

Their is no impact of these messages on the database. You can safely ignore these messages.

One-off Patch for Bug 5526409 on top of 10.2.0.3 is available for some platforms. Please check Patch 5526409 for your platform.

该note描述在10.1.0.2-10.2.0.3版本中,在ARCH传输的DataGuard环境中可能出现日志传输gap为单个在primary库中尚未写出的日志,该gap可能会在告警日志中以以上形式标示。
该bug(问题)在版本10.2.0.4和11.1中得到了修复,在10.2.0.3版本中部分平台上有one-off补丁。但实际上该bug(问题)对于主备库不会有任何影响,我们也可以将之忽略。

7月最新发布11.2.0.1.2 Patch set update

7月13日,11g release 2 的第二个补丁集更新发布了;9i的最终版本为9.2.0.8,10g上10.2.0.5很有可能成为最终版本,我们预期今后(11g,12g)中Patch set数量会有效减少,而patch set update数量可能大幅增加;这样的更新形式可以为Oracle Database提升一定的软件形象。可以猜想11gr2的最终版本号可能是11.2.0.2/3.x。

附该psu的readme note:

Released: July 13, 2010

This document is accurate at the time of release. For any changes and additional information regarding PSU 11.2.0.1.2, see these related documents that are available at My Oracle Support (http://support.oracle.com/):

  • Note 854428.1 Patch Set Updates for Oracle Products
  • Note 1089071.1 Oracle Database Patch Set Update 11.2.0.1.2 Known Issues

This document includes the following sections:

1 Patch Information

Patch Set Update (PSU) patches are cumulative. That is, the content of all previous PSUs is included in the latest PSU patch.

PSU 11.2.0.1.2 includes the fixes listed in Section 5, “Bugs Fixed by This Patch”.

Table 1 describes installation types and security content. For each installation type, it indicates the most recent PSU patch to include new security fixes that are pertinent to that installation type. If there are no security fixes to be applied to an installation type, then “None” is indicated. If a specific PSU is listed, then apply that or any later PSU patch to be current with security fixes.

Table 1 Installation Types and Security Content

Installation Type Latest PSU with Security Fixes
Server homes PSU 11.2.0.1.2


Client-Only Installations None
Instant Client Installations None

(The Instant Client installation is not the same as the client-only Installation. For additional information about Instant Client installations, see Oracle Database Concepts.)

2 Patch Installation and Deinstallation

This section includes the following sections:

2.1 Platforms for PSU 11.2.0.1.2

For a list of platforms that are supported in this Patch Set Update, see My Oracle Support Note 1060989.1 Critical Patch Update July 2010 Patch Availability Document for Oracle Products.

2.2 OPatch Utility Information

You must use the OPatch utility version 11.2.0.1.0 or later to apply this patch. Oracle recommends that you use the latest released OPatch 11.2, which is available for download from My Oracle Support patch 6880880 by selecting the 11.2.0.0.0 release.

For information about OPatch documentation, including any known issues, see My Oracle Support Note 293369.1 OPatch documentation list.

2.3 Patch Installation

These instructions are for all Oracle Database installations.

2.3.1 Patch Pre-Installation Instructions

Before you install PSU 11.2.0.1.2, perform the following actions to check the environment and to detect and resolve any one-off patch conflicts.

2.3.1.1 Environments with ASM

If you are installing the PSU to an environment that has Automatic Storage Management (ASM), note the following:

  • For Linux x86 and Linux x86-64 platforms, install either (A) the bug fix for 8898852 and the Database PSU patch 9654983, or (B) the Grid Infrastructure PSU patch 9343627.
  • For all other platforms, no action is required. The fix for 8898852 was included in the base 11.2.0.1.0 release.

2.3.1.2 Environment Checks
  1. Ensure that the $PATH definition has the following executables: make, ar, ld, and nm.The location of these executables depends on your operating system. On many operating systems, they are located in /usr/ccs/bin, in which case you can set your PATH definition as follows:
    export PATH=$PATH:/usr/ccs/bin
    

2.3.1.3 One-off Patch Conflict Detection and Resolution

For an introduction to the PSU one-off patch concepts, see “Patch Set Updates Patch Conflict Resolution” in My Oracle Support Note 854428.1 Patch Set Updates for Oracle Products.

The fastest and easiest way to determine whether you have one-off patches in the Oracle home that conflict with the PSU, and to get the necessary conflict resolution patches, is to use the Patch Recommendations and Patch Plans features on the Patches & Updates tab in My Oracle Support. These features work in conjunction with the My Oracle Support Configuration Manager. Recorded training sessions on these features can be found in Note 603505.1.

However, if you are not using My Oracle Support Patch Plans, follow these steps:

  1. Determine whether any currently installed one-off patches conflict with the PSU patch as follows:
    unzip p9654983_11201_<platform>.zip
    opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir ./9654983
    
  2. The report will indicate the patches that conflict with PSU 9654983 and the patches for which PSU 9654983 is a superset.Note that Oracle proactively provides PSU 11.2.0.1.2 one-off patches for common conflicts.
  3. Use My Oracle Support Note 1061295.1 Patch Set Updates – One-off Patch Conflict Resolution to determine, for each conflicting patch, whether a conflict resolution patch is already available, and if you need to request a new conflict resolution patch or if the conflict may be ignored.
  4. When all the one-off patches that you have requested are available at My Oracle Support, proceed with Section 2.3.2, “Patch Installation Instructions”.

2.3.2 Patch Installation Instructions

Follow these steps:

  1. If you are using a Data Guard Physical Standby database, you must first install this patch on the primary database before installing the patch on the physical standby database. It is not supported to install this patch on the physical standby database before installing the patch on the primary database. For more information, see My Oracle Support Note 278641.1.
  2. Do one of the following, depending on whether this is a RAC environment:
    • If this is a RAC environment, choose one of the patch installation methods provided by OPatch (rolling, all node, or minimum downtime), and shutdown instances and listeners as appropriate for the installation method selected.This PSU patch is rolling RAC installable. Refer to My Oracle Support Note 244241.1 Rolling Patch – OPatch Support for RAC.
    • If this is not a RAC environment, shut down all instances and listeners associated with the Oracle home that you are updating. For more information, see Oracle Database Administrator’s Guide.
  3. Set your current directory to the directory where the patch is located and then run the OPatch utility by entering the following commands:
    unzip p9654983_11201_<platform>.zip
    cd 9654983
    opatch apply
    
  4. If there are errors, refer to Section 3, “Known Issues”.

2.3.3 Patch Post-Installation Instructions

After installing the patch, perform the following actions:

  1. Apply conflict resolution patches as explained in Section 2.3.3.1.
  2. Load modified SQL files into the database, as explained in Section 2.3.3.2.

2.3.3.1 Applying Conflict Resolution Patches

Apply the patch conflict resolution one-off patches that were determined to be needed when you performed the steps in Section 2.3.1.3, “One-off Patch Conflict Detection and Resolution”.

2.3.3.2 Loading Modified SQL Files into the Database

The following steps load modified SQL files into the database. For a RAC environment, perform these steps on only one node.

  1. For each database instance running on the Oracle home being patched, connect to the database using SQL*Plus. Connect as SYSDBA and run the catbundle.sql script as follows:
    cd $ORACLE_HOME/rdbms/admin
    sqlplus /nolog
    SQL> CONNECT / AS SYSDBA
    SQL> STARTUP
    SQL> @catbundle.sql psu apply
    SQL> QUIT
    

    The catbundle.sql execution is reflected in the dba_registry_history view by a row associated with bundle series PSU.

    For information about the catbundle.sql script, see My Oracle Support Note 605795.1 Introduction to Oracle Database catbundle.sql.

  2. Check the following log files in $ORACLE_HOME/cfgtoollogs/catbundle for any errors:
    catbundle_PSU_<database SID>_APPLY_<TIMESTAMP>.log
    catbundle_PSU_<database SID>_GENERATE_<TIMESTAMP>.log
    

    where TIMESTAMP is of the form YYYYMMMDD_HH_MM_SS. If there are errors, refer to Section 3, “Known Issues”.

2.3.4 Patch Post-Installation Instructions for Databases Created or Upgraded after Installation of PSU 11.2.0.1.2 in the Oracle Home

These instructions are for a database that is created or upgraded after the installation of PSU 11.2.0.1.2.

You must execute the steps in Section 2.3.3.2, “Loading Modified SQL Files into the Database” for any new database only if it was created by any of the following methods:

  • Using DBCA (Database Configuration Assistant) to select a sample database (General, Data Warehouse, Transaction Processing)
  • Using a script that was created by DBCA that creates a database from a sample database

2.4 Patch Deinstallation

These instructions apply if you need to deinstall the patch.

2.4.1 Patch Deinstallation Instructions for a Non-RAC Environment

Follow these steps:

  1. Verify that an $ORACLE_HOME/rdbms/admin/catbundle_PSU_<database SID>_ROLLBACK.sql file exists for each database associated with this ORACLE_HOME. If this is not the case, you must execute the steps in Section 2.3.3.2, “Loading Modified SQL Files into the Database” against the database before deinstalling the PSU.
  2. Shut down all instances and listeners associated with the Oracle home that you are updating. For more information, see Oracle Database Administrator’s Guide.
  3. Run the OPatch utility specifying the rollback argument as follows.
    opatch rollback -id 9654983
    
  4. If there are errors, refer to Section 3, “Known Issues”.

2.4.2 Patch Post-Deinstallation Instructions for a Non-RAC Environment

Follow these steps:

  1. Start all database instances running from the Oracle home. (For more information, see Oracle Database Administrator’s Guide.)
  2. For each database instance running out of the ORACLE_HOME, connect to the database using SQL*Plus as SYSDBA and run the rollback script as follows:
    cd $ORACLE_HOME/rdbms/admin
    sqlplus /nolog
    SQL> CONNECT / AS SYSDBA
    SQL> STARTUP
    SQL> @catbundle_PSU_<database SID>_ROLLBACK.sql
    SQL> QUIT
    

    In a RAC environment, the name of the rollback script will have the format catbundle_PSU_<database SID PREFIX>_ROLLBACK.sql.

  3. Check the log file for any errors. The log file is found in $ORACLE_HOME/cfgtoollogs/catbundle and is named catbundle_PSU_<database SID>_ROLLBACK_<TIMESTAMP>.log where TIMESTAMP is of the form YYYYMMMDD_HH_MM_SS. If there are errors, refer to Section 3, “Known Issues”.

2.4.3 Patch Deinstallation Instructions for a RAC Environment

Follow these steps for each node in the cluster, one node at a time:

  1. Shut down the instance on the node.
  2. Run the OPatch utility specifying the rollback argument as follows.
    opatch rollback -id 9654983
    

    If there are errors, refer to Section 3, “Known Issues”.

  3. Start the instance on the node as follows:
    srvctl start instance
    

2.4.4 Patch Post-Deinstallation Instructions for a RAC Environment

Follow the instructions listed in Section Section 2.4.2, “Patch Post-Deinstallation Instructions for a Non-RAC Environment” only on the node for which the steps in Section 2.3.3.2, “Loading Modified SQL Files into the Database” were executed during the patch application.

All other instances can be started and accessed as usual while you are executing the deinstallation steps.

3 Known Issues

For information about OPatch issues, see My Oracle Support Note 293369.1 OPatch documentation list.

For issues documented after the release of this PSU, see My Oracle Support Note 1089071.1 Oracle Database Patch Set Update 11.2.0.1.2 Known Issues.

Other known issues are as follows.

Issue 1
The following ignorable errors may be encountered while running the catbundle.sql script or its rollback script:

ORA-29809: cannot drop an operator with dependent objects
ORA-29931: specified association does not exist
ORA-29830: operator does not exist
ORA-00942: table or view does not exist
ORA-00955: name is already used by an existing object
ORA-01430: column being added already exists in table
ORA-01432: public synonym to be dropped does not exist
ORA-01434: private synonym to be dropped does not exist
ORA-01435: user does not exist
ORA-01917: user or role 'XDB' does not exist
ORA-01920: user name '<user-name>' conflicts with another user or role name
ORA-01921: role name '<role name>' conflicts with another user or role name
ORA-01952: system privileges not granted to 'WKSYS'
ORA-02303: cannot drop or replace a type with type or table dependents
ORA-02443: Cannot drop constraint - nonexistent constraint
ORA-04043: object <object-name> does not exist
ORA-29832: cannot drop or replace an indextype with dependent indexes
ORA-29844: duplicate operator name specified
ORA-14452: attempt to create, alter or drop an index on temporary table already in use
ORA-06512: at line <line number>. If this error follow any of above errors, then can be safely ignored.
ORA-01927: cannot REVOKE privileges you did not grant

4 References

The following documents are references for this patch.

Note 293369.1 OPatch documentation list

Note 360870.1 Impact of Java Security Vulnerabilities on Oracle Products

Note 468959.1 Enterprise Manager Grid Control Known Issues

Note 9352237.8 Bug 9352237 – 11.2.0.1.1 Patch Set Update (PSU)

5 Bugs Fixed by This Patch

This patch includes the following bug fixes.

5.1 CPU Molecules

CPU molecules in PSU 11.2.0.1.2:

PSU 11.2.0.1.2 contains the following new CPU molecules:

9676419 – DB-11.2.0.1-MOLECULE-004-CPUJUL2010

9676420 – DB-11.2.0.1-MOLECULE-005-CPUJUL2010

5.2 Bug Fixes

PSU 11.2.0.1.2 contains the following new fixes:

Automatic Storage Management

8755082 – ORA-00600: [KCFIS_TRANSLATE4:VOLUME LOOKUP], [2], [WRONG DEVICE NAME], [], [], [

8890026 – ASM PARTNERING CREATES IMBALANCED PARTNERSHIPS

9170608 – STBH:DD BLOCKS PINNED FOR QUERIES THAT DO NOT REQUEST USED SPACE

9363145 – STBH:DB INSTANCES TERMINATED BY ASMB DUE TO ORA-00600 [KFDSKALLOC0]

Buffer Cache

8330783 – HANGING DB WITH “CACHE BUFFER CHAINS” AND “BUFFER DEADLOCK” WAITS DURING INSERT

8822531 – TAKING AWR SNAP HANGS

Data Guard Broker

8918433 – UNPERSISTED FSFO STATE BITS CAN GET PERSISTED

9363384 – PHYSICAL STANDBY SERVICES NOT STARTED AFTER CONVERT FROM SNAPSHOT

9467635 – BROKER’S METADATA FILE UPGRADE TO 11.2 IS BROKEN

9467727 – GETSTATUS DOC YIELDS INCORRECT RESULT IF DBRESOURCE_ID PROP VALUE IS USED

Data Guard Logical

8774868 – LGSBFSFO: ORA-600 [3020], [3], [138] RAISED IN RECOVERY SLAVE

8822832 – V$ARCHIVE_DEST_STATUS HAS INCORRECT VALUE FOR APPLIED_SEQ#

DataGuard Redo Transport

8872096 – ARCHIVING FORCED DURING CLOSE WHEN NO STANDBY IS PRESENT

9399090 – STBH: CONSTANT/HIGH FREQUENT LOG SWITCHES ON BEEHIVE DATABASE IN THE LAST 3 DAYS

Shared Cursors

8865718 – RECURSIVE CURSORS CONTAINING “AS OF SNAPSHOT” CLAUSE ARE NOT SHARED

8981059 – HIGH VERSION COUNT:BIND_MISMATCH,USER_BIND_PEEK_MISMATCH,OPTIMIZER_MODE_MISMATCH

9010222 – APPS ST 11G ORA-00600 [KKSFBC-REPARSE-INFINITE-LOOP]

9067282 – TB:SH:ORA-00600:[KKSFBC-WRONG-KKSCSFLGS] WHILE RUNNING TPC-H

DML Drivers

9255542 – ARRAY INSERT TO PARTITIONED TABLE LOOSES ROWS DUE TO CONCURRENT DDL (ORA-14403)

9488887 – FORIEGN KEY VIOLATION WITH ARRAY-INSERT AND ONLINE IDX REBUILD AFTER BUG-9255542

Flashback Database

8834425 – ORA-240 IN RVWR PROCESS CAUSING 5MIN TRANSACTIONAL HANG

PLSQL

9210925 – AFTER MANUAL UPGRADE TO 11.1.0.7 PL/SQL CALLS INCORRECT FUNCTION

Automatic Memory Management

8505803 – PRE_PAGE_SGA RESULTS IN EXCESSIVE PAGE TABLE SIZE WHEN USING MEMORY_TARGET [AMM]

Partitioning

9165206 – PARTITIONING ORA-600 [KKPOLLS1] / [KKDOILSF1] – DURING PARTITION MAINTANANCE

Real Application Cluster

8875671 – LX64: ORA-600 ARGS [KJPNP_CHK:!MASTER_READY],

9093300 – LOTS OF REPEATED KJXOCDR: DROP DUPLICATE OPEN MESSAGE IN LMD TRACE

Row Access Method

8544696 – TABLE GROWTH – BLOCKS ARE NOT REUSED

Streams

8650719 – DOWNSTREAM CAPTURE ABORTS WITH ORA-26766

Secure Files

8856478 – RAM SECUREFILE PERF DEGRADATION WITH SF COMPRESSION ON SMALL LOBS DURING ATB MOVE

9272086 – STBH: DATA PUMP WRITER SEEMS TO BE WAITING ON WAIT FOR UNREAD MESSAGE ON BROADCA

DB Recovery

8909984 – APPSST GSI 11G: GAPS IN AWR SNAPSHOTS

9068088 – MEDIA RECOVERY WAS HUNG ON STANDBY

9145541 – ORA-600 [25027] / ORA-600 [4097] FOR ACTIVE TX IN A PLUGGED TABLESPACE

9167285 – PKT-BUGOLTP: ORA-07445: [KCRALC()+87]

Space Management

7519406 – ‘J000’ TRACE FILE REGARDING GATHER_STATS_JOB INTERMITTENTLY SINCE 10.2.0.4

8815639 – [11GR2-LNX-090813] MULTIPLE INSERT CAUSE DATA ALLOCATION ABOVE HHWM

9216806 – HIGH “ENQ: TS – CONTENTION” FOR TEMPORARY SEGMENT WHILE SQLLDR DIRECT PATH LOAD

9242411 – STRESS-BIGBH: LOTS OF OR-3113S IN BIGBH STRESS TEST

9461782 – ORA-7445 [KTSLF_SUMFSG()+54] [SIGSEGV] AND KTSLFSUM_CFS ON CALL STACK

Compression

9011088 – [11GR2]ADDING COLUMN TO COMPRESSED TABLE, DATA LOSS OCCURED.

9275072 – APPSST GSI 11G : BUFFER BUSY WAITS INSERTING INTO TABLES

9341448 – APPSST GSI 11G : BUFFER BUSY WAITS AND LATCH: CACHE BUFFERS WAITS WHEN INSERTING

9637033 – ORA-07445[KDR9IR2RST0] INSERT AS SELECT IN A COMPRESSED TABLE WITH > 255 COLUMNS

SQL Execution

8664189 – ORA-00600 [KDISS_UNCOMPRESS: BUFFER LENGTH]

9119194 – PSRC: DISTRIBUTED QUERY SLOWER IN 10.2.0.4 COMPARED TO 10.2.0.3

Transaction Management

8268775 – PERF: HIGH US ENQUEUE CONTENTION DURING A LOGIN STORM OR SESSION FAILOVER

8803762 – ORA-00600 [KDSGRP1] BLOCK CORRUPTION ON 11G DATABASE UPGRADE

Memory Management

8431487 – INSTANCE CRASH ORA-07445 [KGGHSTFEL()+192] ORA-07445[KGGHSTMAP()+241]

Message

9713537 – ENHANCE CAUSE/ACTION FIELDS OF THE INTERNAL ERROR ORA-00600

9714832 – ENHANCE CAUSE/ACTION FIELDS OF THE INTERNAL ERROR ORA-07445

沪ICP备14014813号-2

沪公网安备 31010802001379号