Know more about commit

COMMIT操作是RDBMS中事务结束的标志,在Oracle中与commit紧密相关的是SCN(System Change Number)。

引入SCN的最根本目的在于:

  1. 为读一致性所用
  2. 为redolog中的记录排序,以及恢复

SCN由SCN Base和Scn Wrap组成,是一种6个字节的结构(structure)。其中SCN Base占用4个字节,而SCN wrap占用2个字节。但在实际存储时SCN-like的stucture常会占用8个字节。

 ub4 kscnbas
 ub2 kscnwrp

struct kcvfhcrs, 8 bytes                 @100                              Creation Checkpointed at scn
      ub4 kscnbas                        @100      0x000a8849
      ub2 kscnwrp                        @104      0x0000

在Oracle中一个事务的开始包含以下操作:

  1. 绑定一个可用的rollback segment
  2. 在事务表(transaction table)上分配一个必要的槽位
  3. 从rollback segment中分配undo block

注意system rollback segment是一种特殊的回滚段,在10g以后普通回滚段的类型都变成了”TYPE2 UNDO”,而唯有system rollback segment的类型仍为”ROLLBACK”,这是由其特殊性造就的:

SQL> col segment_name for a20
SQL> col rollback for a20
SQL> select segment_name,segment_type from dba_segments where segment_type='ROLLBACK';

SEGMENT_NAME         SEGMENT_TYPE
-------------------- ------------------
SYSTEM               ROLLBACK

System rollback segment面向的是SYSTEM表空间上数据字典对象相关事务的数据,以及由对用户数据产生的递归SQL调用所产生的数据。

Oracle不使用基于内存锁管理器的行锁,Oracle中的row lock是基于数据块的。数据块中的Interested Transaction List(ITL)是行锁的重要标志。
ITL的分配遵循以下的原则:

  1. 找出未被使用的ITL
  2. 找出最老的已经事务提交的ITL
  3. 做部分的块清理,直到有可用的ITL
  4. 扩展ITL区域,一条ITL占用24字节

当事务提交COMMIT时,需要完成以下步骤的操作:

  1. 得到一个SCN值
  2. 使用得到的SCN更新事务表中的槽位
  3. 在redo log buffer中创建一条commit记录
  4. 将redo log buffer刷新到磁盘上的在线日志文件
  5. 释放表和行上的锁(may cause delayed block cleanout)

了解你所不知道的SMON功能(七):清理IND$字典基表

SMON的作用还包括清理IND$字典基表(cleanup ind$):
ind$_flags

触发场景

当我们在线创建或重建索引时(create or rebuild index online),服务进程会到IND$字典基表中将该索引对应的记录的FLAGS字段修改为十进制的256或者512(见上图0x100=256,0×200=512),如:

SQL> create index macleans_index on larges(owner,object_name) online;

SQL> select obj# from obj$ where name='MACLEANS_INDEX';

      OBJ#
----------
   1343842

SQL> select FLAGS from ind$ where obj#=1343842;

     FLAGS
----------
       256

ind_online$字典基表记录了索引在线创建/重建的历史

SQL> select * from ind_online$;

      OBJ#      TYPE#      FLAGS
---------- ---------- ----------
   1343839          1        256
   1343842          1        256

create table ind_online$
( obj#          number not null,
  type#         number not null,              /* what kind of index is this? */
                                                               /* normal : 1 */
                                                               /* bitmap : 2 */
                                                              /* cluster : 3 */
                                                            /* iot - top : 4 */
                                                         /* iot - nested : 5 */
                                                            /* secondary : 6 */
                                                                 /* ansi : 7 */
                                                                  /* lob : 8 */
                                             /* cooperative index method : 9 */
  flags         number not null
                                      /* index is being online built : 0x100 */
                                    /* index is being online rebuilt : 0x200 */
)

原则上online create/rebuild index的的清理工作由实际操作的服务进程负责完成,这种清理在DDL语句成功的情况下包括一系列数据字典的维护,在该DDL语句失败的情形中包括对临时段的清理和数据字典的维护,无论如何都需要drop在线日志中间表 SYS_JOURNAL_nnnnn(nnnn为该索引的obj#)。数据字典的维护工作就包含对IND$基表中相应索引记录的FLAGS标志位的恢复,但是如果服务进程在语句执行过程中意外终止的话,那么短时间内FLAGS标志位字段就无法得到恢复,这将导致对该索引的后续操作因ORA-8104错误而无法继续:

SQL> drop index macleans_index;
drop index macleans_index
           *
ERROR at line 1:
ORA-08104: this index object 1343842 is being online built or rebuilt

08104, 00000, "this index object %s is being online built or rebuilt"
// *Cause:  the index is being created or rebuild or waited for recovering
//          from the online (re)build
// *Action: wait the online index build or recovery to complete

SMON负责在启动后(startup)的每小时执行一次对IND$基表中因在线创建/重建索引失败所留下记录的清理,这种清理工作由kdicclean函数驱动(kdicclean is run by smon every 1 hour,called from SMON to find if there is any online builder death and cleanup our ind$ and obj$ and drop the journal table, stop journaling)。
这种清理工作典型的调用堆栈stack call如下:

ksbrdp -> ktmSmonMain  ktmmon -> kdicclean -> kdic_cleanup -> ktssdrp_segment

注意因为SMON进程的清理工作每小时才执行一次,而且在工作负载很高的情况下可能实际很久都不会得到清理,在这种情景中我们总是希望能尽快完成对索引的在线创建或重建,在10gr2以后的版本中我们可以直接使用dbms_repair.online_index_clean来手动清理online index rebuild的遗留问题:

SQL> drop index macleans_index;
drop index macleans_index
           *
ERROR at line 1:
ORA-08104: this index object 1343842 is being online built or rebuilt

DECLARE
 isClean BOOLEAN;
BEGIN
  isClean := FALSE;
  WHILE isClean=FALSE
  LOOP
    isClean := dbms_repair.online_index_clean(
    dbms_repair.all_index_id, dbms_repair.lock_wait);
    dbms_lock.sleep(10);
  END LOOP;
END;
/

SQL>  drop index macleans_index;
 drop index macleans_index
            *
ERROR at line 1:
ORA-01418: specified index does not exist

成功清理

但是如果在9i中的话就比较麻烦,可以尝试用以下方法(不是很推荐,除非你已经等了很久):

1.首先手工删除在线日志表,通过以下手段找出这个中间表的名字

select object_name
  from dba_objects
 where object_name like
       (select '%' || object_id || '%'
          from dba_objects
         where object_name = '&INDEX_NAME')
/
Enter value for index_name: MACLEANS_INDEX
old   6:          where object_name = '&INDEX_NAME')
new   6:          where object_name = 'MACLEANS_INDEX')

OBJECT_NAME
--------------------------------------------------------------------------------
SYS_JOURNAL_1343845

SQL> drop table SYS_JOURNAL_1343845;

Table dropped.

2.第二步要手动修改IND$字典基表

!!!!!! 注意!手动修改数据字典要足够小心!!

select flags from ind$ where obj#=&INDEX_OBJECT_ID;
Enter value for index_object_id: 1343845
old   1: select flags from ind$ where obj#=&INDEX_OBJECT_ID
new   1: select flags from ind$ where obj#=1343845

     FLAGS
----------
       256

a) 针对online create index,手动删除对应的记录
delete from IND$ where obj#=&INDEX_OBJECT_ID

b) 针对online rebuild index,手动恢复对应记录的FLAGS标志位
update IND$ set FLAGS=FLAGS-512 where obj#=&INDEX_OBJECT_ID

接下来我们实际观察一下清理工作的细节:

SQL> select obj# from obj$ where name='MACLEANS_INDEX';

      OBJ#
----------
   1343854

SQL> select FLAGS from ind$ where obj#=1343854;

     FLAGS
----------
       256

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10046 trace name context forever,level 8;
Statement processed.
SQL> DECLARE
  2   isClean BOOLEAN;
  3  BEGIN
  4    isClean := FALSE;
  5    WHILE isClean=FALSE
  6    LOOP
  7      isClean := dbms_repair.online_index_clean(
  8      dbms_repair.all_index_id, dbms_repair.lock_wait);
  9
 10      dbms_lock.sleep(10);
 11    END LOOP;
 12  END;
 13  /

PL/SQL procedure successfully completed.

===============================10046 trace=============================
select i.obj#, i.flags, u.name, o.name, o.type#
  from sys.obj$ o, sys.user$ u, sys.ind_online$ i
 where (bitand(i.flags, 256) = 256 or bitand(i.flags, 512) = 512)
   and (not ((i.type# = 9) and bitand(i.flags, 8) = 8))
   and o.obj# = i.obj#
   and o.owner# = u.user#

select u.name,
       o.name,
       o.namespace,
       o.type#,
       decode(bitand(i.property, 1024), 0, 0, 1)
  from ind$ i, obj$ o, user$ u
 where i.obj# = :1
   and o.obj# = i.bo#
   and o.owner# = u.user#

delete from object_usage
 where obj# in (select a.obj#
                  from object_usage a, ind$ b
                 where a.obj# = b.obj#
                   and b.bo# = :1)

drop table "SYS"."SYS_JOURNAL_1343854" purge
delete from icoldep$ where obj# in (select obj# from ind$ where bo#=:1)

delete from ind$ where bo#=:1

delete from ind$ where obj#=:1

我们可以利用以下语句找出系统中可能需要恢复的IND$记录,注意不要看到查询有结果就认为这是操作失败的征兆,很可能是有人在线创建或重建索引:

select i.obj#, i.flags, u.name, o.name, o.type#
  from sys.obj$ o, sys.user$ u, sys.ind_online$ i
 where (bitand(i.flags, 256) = 256 or bitand(i.flags, 512) = 512)
   and (not ((i.type# = 9) and bitand(i.flags, 8) = 8))
   and o.obj# = i.obj#
   and o.owner# = u.user#
/

相关诊断事件可以通过设置诊断事件event=’8105 trace name context forever’
来禁止SMON清理IND$(Oracle event to turn off smon cleanup for online index build)

 alter system set events '8105 trace name context forever';

Find INTCOL#=1001 in col_usage$?

<了解你所不知道的SMON功能(四):维护col_usage$字典基表>中我介绍了SMON后台进程维护字典基表COL_USAGE$一些细节,有网友阅读了这篇文档后发现其数据库的COL_USAGE$中存在INTCOL#=1001的记录。

INTCOL#列表示internal column number对应于COL$基表的INTCOL#,注意Internal Column Number与COL#(column number as created)是不同的。$ORACLE_HOME/rdbms/admin/sql.bsq对于INTCOL#给出了解释:

* If a table T(c1, addr, c2) contains an ADT column addr which is stored
   * exploded, the table will be internally stored as
   *              T(c1, addr, C0003$, C0004$, C0005$, c2)
   * Of these, only c1, addr and c2 are user visible columns. Thus, the
   * user visible column numbers for (c1, addr, C0003$, C0004$, C0005$, c2)
   * will be 1,2,0,0,0,3. And the corresponding internal column numbers will
   * be 1,2,3,4,5,6.
   *
   * Some dictionary tables like icol$, ccol$ need to contain intcol# so
   * that we can have indexes and constraints on ADT attributes. Also, these
   * tables also need to contain col# to maintain backward compatibility.
   * Most of these tables will need to be accessed by col#, intcol# so
   * indexes are created on them based on (obj#, col#) and (obj#, intcol#).
   * Indexes based on col# have to be non-unique if ADT attributes might
   * appear in the table. Indexes based on intcol# can be unique.

这里的ADT指的是抽象数据类型(Abstract DataType is a user defined data type),例如:

CREATE OR REPLACE TYPE persons_address AS OBJECT (
  streetNumber NUMBER,
  streetName   VARCHAR2(30),
  citySuburb   VARCHAR2(30),
  state        VARCHAR2(4),
  postCode     NUMBER
);

熟读Oracle官方文档的朋友一定会记得,Oracle中单表的column总数存在一个上限:1000,即单表不能拥有超过1000个列。

但令人疑惑的是INTCOL#居然是1001,显然1001是某种magic number,而不是指第1001列。

搞清楚这个问题后,再进一步探索就不难发现问题的关键了:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

SQL> drop table maclean;

Table dropped.

SQL> create table maclean(oppo_find_me int);

Table created.

SQL> select object_id from dba_objects where object_name='MACLEAN';

 OBJECT_ID
----------
   1343832

SQL> select intcol# from col_usage$ where obj#=1343832;

no rows selected

SQL> insert into maclean values(1);

1 row created.

SQL> commit;

Commit complete.

SQL> select rowid from maclean;    

ROWID
------------------
AAFIFYAABAAByPKAAA

SQL> delete maclean where rowid='AAFIFYAABAAByPKAAA';

1 row deleted.

SQL> commit;

Commit complete.

SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

PL/SQL procedure successfully completed.

SQL> select intcol#,equality_preds from col_usage$ where obj#=1343832;

   INTCOL# EQUALITY_PREDS
---------- --------------
      1001              1

通过这一点可以了解那些字典基表是以rowid为条件来查询或更新的

SQL> select owner || '.' || object_name
  2    from dba_objects
  3   where object_id in (select obj# from col_usage$ where intcol# = 1001);

OWNER||'.'||OBJECT_NAME
--------------------------------------------------------------------------------
SYS.COL$
SYS.CDEF$
SYS.VIEW$
SYS.SUMPARTLOG$
SYS.SUM$
SYS.SUMKEY$
SYS.SUMAGG$
SYS.SUMPRED$
SYS.SUMQB$
SYS.PS$
SYS.AW_OBJ$

OWNER||'.'||OBJECT_NAME
--------------------------------------------------------------------------------
SYS.AW_PROP$
SYS.WRI$_ADV_OBJECTS
WMSYS.WM$WORKSPACES_TABLE
SYS.MACLEAN
CTXSYS.DR$INDEX
XDB.XDB$H_INDEX
XDB.XDB$RESOURCE
EXFSYS.RLM$SCHACTLIST
SYS.AW$EXPRESS
MACLEAN.SAMPLE

总结:
Oracle最早在9i中引入了col_usage$字典基表,其目的在于监控column在SQL语句作为predicate的情况,col_usage$的出现完善了CBO中柱状图自动收集的机制。该字典基表上的INTCOL#列代表Internal Column Number以标识ADT列。INTCOL#等于1001代表ROWID伪列,也就是相关对象曾使用ROWID充当predicate。

Buffer Lock Mode and Compatibilities

Buffer Modes:

  • KCBMNEW : New buffer for exclusive access
  • KCBMSHR : Current buffer for shared access
  • KCBMEXL : Current buffer for exclusive access
  • KCBMCR : CR buffer for shared access
  • KCBMCRX : Variant of CR mode
  • KCBMNULL : Used to keep a reference to the buffer

oracle_buffer_lock_compatibilities

Buffer State

  • KCBBHFREE : buffer free
  • KCBBHEXLCUR : buffer current
  • KCBBHSHRCUR : buffer current
  • KCBBHCR : buffer consistent read
  • KCBBHREADING : being read
  • KCBBHMRECOVERY : media recovery
  • KCBBHIRECOVERY : crash or instance recovery
  • KCBBHWRITING : write clone
  • KCBBHPI : past image
  • KCBBHMEMORY : allocated memory buffer
  • KCBBHMWRITING : media recovery write clone
  • KCBBHDONATED : donated to another memory component

Buffer Class

  • KCBCDATA : DB blocks
  • KCBCSORT : sort blocks
  • KCBCSVUN : save undo blocks
  • KCBCSEGH : segment header blocks
  • KCBCSVSH : save undo segment hdrs
  • KCBCFRLS : free list blocks
  • KCBCEMAP : extent map
  • KCBCBMAP : bitmap block
  • KCBCIMAP : bitmap index block
  • KCBCBFSB : bitmapped file space bitmap
  • KCBCBFSH : bitmapped file space header
  • KCBCPW : prewarmed buffers class initially

 

关于V$OPEN_CURSOR

在之前的一次讨论中,有同行指出V$OPEN_CURSOR中列出的不是OPEN CURSOR而是SESSION CACHED CURSOR,原因是在一次ORA-01000(maximum open cursors exceeded)事故中他没有从V$OPEN_CURSOR中找到大量的打开游标。

对于这个问题,我们可以利用JAVA程序做一个演示来说明,以下为JAVA代码:

package javaapplication2;
import java.util.logging.Level;
import java.util.logging.Logger;
import oracle.jdbc.*;
import java.sql.*;

public class Main {

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName("oracle.jdbc.driver.OracleDriver");
        }catch(Exception e ){}
   Connection cnn1=DriverManager.getConnection("jdbc:oracle:thin:@localhost:1521:G11R2", "maclean", "maclean");

  // Connection m[]=new Connection[2000];
   Connection myconn=DriverManager.getConnection("jdbc:oracle:thin:@localhost:1521:G11R2", "maclean", "maclean");

    Statement stat1=myconn.createStatement();
   ResultSet rst1=stat1.executeQuery("select * from v$version");
   while(rst1.next())
   {
       System.out.println(rst1.getString(1));
   }
   rst1=stat1.executeQuery("select distinct sid from v$mystat");

   while (rst1.next()){
   System.out.println("MY SID IS "+rst1.getString(1));
        }

   PreparedStatement s[]=new PreparedStatement[2000];
   PreparedStatement p;
   //ResultSet r[]=new ResultSet[2000];
   int i=0;
   while(i<2000){
    //  m[i]=DriverManager.getConnection("jdbc:oracle:thin:@192.168.1.121:1521:G10R2", "maclean", "maclean");
      //s[i]=m[i].createStatement();
      //m[i].setAutoCommit(false);
      //s[i].execute("insert into testjava values(1)");
       p=myconn.prepareStatement("select /* FIND_ME_OPPO */ * from dual");
       p.execute();

            try {
                Thread.sleep(200);
            } catch (InterruptedException ex) {
                Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
            }

      i++;
      System.out.println(i+" cursor is ok !");
   }
   }
}

以上JAVA代码会打个一个数据库会话,并在循环中不断以prepareStatement对象执行SQL语句,且我们不使用close()方法关闭prepareStatement所打开的游标,实际上这是很多JDBC应用产生ORA-01000问题的主要原因,开发人员在使用prepareStatement时不知道要使用close()方法以回收OPEN CURSOR资源。

注意这里在使用JDBC API时的表现(可能是目前最流行应用形式)和PL/SQL中的游标是存在区别的,在PL/SQL使用close cursor语句并不会真正意义上关闭游标。出于性能的考量,PL/SQL中的游标将被缓存以备将来使用,同时Oracle会维护一张cursor的LRU列表,但如果当本会话的游标数量即将达到open_cursors参数所定义的上限数量时,老的游标将被真正意义上close,以便open后来者。

The following is a technical explanation provided by Oracle Development: 

The server caches cursors opened by the PL/SQL engine.  Each time

a cursor is closed, it is really moved to an Least Recently Used

(LRU) list of open cursors and left open.  This is done as a

performance improvement by saving a ’round_trip’ from Client to Server

each time a cursor is opened.

 

Note that no optimization occurs.  In other words, the cursor caching

code does not check to see if a particular query already has a cursor

open; it simply creates another one.  This is done until OPEN_CURSORS

is reached.  If you have OPEN_CURSORS cached as open, however, then if

an identical query occurs, the server reuses the cached cursor.

可以通过以下语句来了解系统中真正意义上的打开着的游标:

select sum(a.value), b.name
from v$sesstat a, v$statname b
where a.statistic# = b.statistic#
and b.name = 'opened cursors current'
group by b.name;

编译并尝试运行以上程序:

compile:
Created dir: C:\Users\Maclean Liu\Documents\NetBeansProjects\JavaApplication2\dist
Copy libraries to C:\Users\Maclean Liu\Documents\NetBeansProjects\JavaApplication2\dist\lib.
Building jar: C:\Users\Maclean Liu\Documents\NetBeansProjects\JavaApplication2\dist\JavaApplication2.jar
To run this application from the command line without Ant, try:
java -jar "C:\Users\Maclean Liu\Documents\NetBeansProjects\JavaApplication2\dist\JavaApplication2.jar"
jar:
成功生成(总时间:0 秒)

java -jar "C:\Users\Maclean Liu\Documents\NetBeansProjects\JavaApplication2\dist\JavaApplication2.jar"
MY SID IS 392
1 cursor is ok !
2 cursor is ok !
......................

以上程序打开的会话为392,我们来观察392会话的CURSOR情况:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

select sql_text, cursor_type, count(*)
  from v$open_cursor
 where sid = 392
 group by sql_text, cursor_type
 order by 3 desc
/

SQL_TEXT                                                     CURSOR_TYPE                                          COUNT(*)
------------------------------------------------------------ -------------------------------------------------- ----------
select /* FIND_ME_OPPO */ * from dual                        OPEN                                                     1279
select * from v$version                                      OPEN                                                        1
select privilege# from sysauth$ where (grantee#=:1 or grante DICTIONARY LOOKUP CURSOR CACHED                             1
select distinct sid from v$mystat                            OPEN                                                        1
insert into sys.aud$( sessionid,entryid,statement,ntimestamp OPEN-RECURSIVE                                              1
select /*+ connect_by_filtering */ privilege#,level from sys DICTIONARY LOOKUP CURSOR CACHED                             1

可以看到"select /* FIND_ME_OPPO */ * from dual"语句在V$OPEN_CURSOR中存在1279条记录,
且CURSOR TYPE均为OPEN

注意V$OPEN_CURSOR视图中的CURSOR_TYPE列直到11g release 2中才出现,如果该列在9i/10g中就有的话那么可能就不会造成那么多misunderstand了,实际上V$OPEN_CURSOR中的记录来源于X$KGLLK:

SQL> select view_definition
  2    from v$fixed_view_definition
  3   where view_name = 'GV$OPEN_CURSOR';

VIEW_DEFINITION
--------------------------------------------------------------------------------
select inst_id,
       kgllkuse,
       kgllksnm,
       user_name,
       kglhdpar,
       kglnahsh,
       kgllksqlid,
       kglnaobj,
       kgllkest,
       decode(kgllkexc, 0, to_number(NULL), kgllkexc),
       kgllkctp                                              -- CURSOR_TYPE
  from x$kgllk
 where kglhdnsp = 0
   and kglhdpar != kgllkhdl

KGLHDPAR!=KGLLKHDL 该条件排除了父游标,所以V$OPEN_CURSOR列出的均是child cursor
KGLHDNSP ==> NAME SPACE =0 

SQL> select distinct kgllkctp from X$kgllk;

KGLLKCTP
----------------------------------------------------------------
SESSION CURSOR CACHED
PL/SQL CURSOR CACHED
OPEN
OPEN-RECURSIVE
DICTIONARY LOOKUP CURSOR CACHED
BUNDLE DICTIONARY LOOKUP CACHED

以上是六种CURSOR_TYPE,可以看到SESSION CURSOR CACHED和PL/SQL CURSOR CACHED也在其中:

SQL> select kgllkmod, kgllkctp, count(*)
  2    from X$KGLLK
  3   where KGLHDNSP = 0
  4   group by kgllkmod, kgllkctp;

  KGLLKMOD KGLLKCTP                                                           COUNT(*)
---------- ---------------------------------------------------------------- ----------
         1 SESSION CURSOR CACHED                                                    32
         1 DICTIONARY LOOKUP CURSOR CACHED                                          96
         1 OPEN-RECURSIVE                                                           42
         1 PL/SQL CURSOR CACHED                                                     19
         1 OPEN                                                                    216

注意不管是何种CURSOR_TYPE,本质上都是child cursor上的library cache lock,KGLLKMOD=1即这些library cache lock的lock mode均是NULL,也就是breakable parse lock。

Null (N): special for session persistency 

Null locks are special and are only acquired on executable objects (child cursors, procedures, functions, and packages)

A broken null lock means the meta-data has changed.

Null locks are special. They are acquired on objects that are to be executed (child cursor, procedure, function, package, or type body) and they are used to maintain an interest on an object for a long period of time (session persistency), and to detect if the object becomes invalid. Null locks can be broken at any time. This is used as a mechanism to notify a session that an executable object is no longer valid. If a null lock is broken, and thus the object is invalidated, it is an indication to the user who was holding the null lock that the object needs to be recompiled.

A Null lock is acquired during the parse phase of SQL statement execution and is held as long as the shared SQL area for that statement remains in the shared pool. A null lock does not prevent any DDL operation, and can be broken to allow conflicting DDL operations, hence the term breakable parse lock.

A Null lock on an object is broken when there is an exclusive pin on the object. A null lock on a read-only object is broken where there is an exclusive pin on any of the parent objects it depends on.

 

因为CURSOR_TYPE(kgllkctp)列的出现我们在11.2中能够很方便地分辨OPEN CUROSR和SESSION CURSOR CACHED,但如果是在9i/10g/11gr1中则无法通过V$OPEN_CURSOR或X$KGLLK找出哪些是打开游标,另一些是会话缓存游标?

实际上Oracle Support在10g中已经意识到了这个问题,Metalink Note<Bug 7375227 – V$OPEN_CURSOR contains both open cursors and session cached cursors [ID 7375227.8]>说明了该问题:

Bug 7375227  V$OPEN_CURSOR contains both open cursors and session cached cursors
 This note gives a brief overview of bug 7375227.
 The content was last updated on: 10-JUL-2009
 Click here for details of each of the sections below.
Affects:

    Product (Component)	Oracle Server (Rdbms)
    Range of versions believed to be affected	Versions >= 10.2.0.1 but < 11.2
    Versions confirmed as being affected	

        10.2.0.3 

    Platforms affected	Generic (all / most platforms affected)

Fixed:

    This issue is fixed in	

        11.2 (Future Release) 

Symptoms:

Related To:
    Code Improvement
    (None Specified)
    V$OPEN_CURSOR 

Description

    View V$OPEN_CURSOR contains both open cursors and session cached cursors and
    before this fix there was no way to distinguish them.
    This fix adds new column CURSOR_TYPE to V$OPEN_CURSOR which distinguishes
    between open cursors and cursors in the session cache. 

    Values are:
     SYSTEM - recursive sql cursors
     PL/SQL - for open pl/sql cursors
     KNT CACHED, KKS CACHED, PL/SQL CACHED, KQD CACHED, KQD BUNDLE CACHED
     , KXCC CACHED - for cached cursors

Cursor Sharing的相关诊断事件:

[oracle@rh2 ~]$ oerr ora 10270
10270, 00000, "Debug shared cursors"
// *Cause: Enables debugging code in shared cursor management modules
// *Action:

alter session set events '10270 trace name context forever, level 10';

[oracle@rh2 ~]$ oerr ora 10277
10277, 00000, "Cursor sharing (or not) related event (used for testing)"
// *Cause:
// *Action:

 alter session set events '10277 trace name context forever, level 1';

alter session set events  'immediate trace name library_cache level 10';

了解你所不知道的SMON功能(六):Instance Recovery

SMON的作用还包括RAC环境中的Instance Recovery,注意虽然Instance Recovery可以翻做实例恢复,但实际上和我们口头所说的实例恢复是不同的。我们口头语言所说的实例恢复很大程度上是指Crash Recovery崩溃恢复,Instance Recovery与Crash Recovery是存在区别的:针对单实例(single instance)或者RAC中所有节点全部崩溃后的恢复,我们称之为Crash Recovery。而对于RAC中的某一个节点失败,存活节点(surviving instance)试图对失败节点线程上redo做应用的情况,我们称之为Instance Recovery。对于Crash Recovery更多的内容可见<还原真实的cache recovery>一文。

现象

Instance Recovery期间分别存在cache recovery和ges/gcs remaster2个recovery stage,注意这2个舞台的恢复是同时进行的。cache recovery的主角是存活节点上的SMON进程,SMON负责分发redo给slave进程。而实施ges/gcs remaster的是RAC专有进程LMON。

整个Reconfiuration的过程如下图:
rac_instance_recovery_reconfiguration
 
注意以上Crash Detected时数据库进入部分可用(Partial Availability)状态,从Freeze Lockdb开始None Availability,到IR applies redo即前滚时转换为Partial Availability,待前滚完成后会实施回滚,但是此时数据库已经进入完全可用(Full Availability)状态了,如下图:
 
oracledatabase12g_instance_recovery

The graphic illustrates the degree of database availability during each step of Oracle instance recovery: 

A.         Real Application Clusters is running on multiple nodes.

B.         Node failure is detected.

C.         The enqueue part of the GRD is reconfigured; resource management is redistributed to the surviving nodes. This operation occurs relatively quickly.

D.        The cache part of the GRD is reconfigured and SMON reads the redo log of the failed instance to identify the database blocks that it needs to recover.

E.         SMON issues the GRD requests to obtain all the database blocks it needs for recovery. After the requests are complete, all other blocks are accessible.

F.         The Oracle server performs roll forward recovery. Redo logs of the failed threads are applied to the database, and blocks are available right after their recovery is completed.

G.        The Oracle server performs rollback recovery. Undo blocks are applied to the database for all uncommitted transactions.

H.        Instance recovery is complete and all data is accessible.

Note: The dashed line represents the blocks identified in step 2 in the previous slide. Also, the dotted steps represent the ones identified in the previous slide.

我们来实际观察一下Instance Recovery的过程:

INST 1:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

SQL> alter system set event='10426 trace name context forever,level 12' scope=spfile;  -- 10426 event Reconfiguration trace event
System altered.

SQL> startup force;
ORACLE instance started.

INST 2:
SQL> shutdown abort
ORACLE instance shut down.

=============================================================
========================alert.log============================

Reconfiguration started (old inc 4, new inc 6)
List of instances:
 1 (myinst: 1)
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
 parallel recovery started with 2 processes                 --2 recovery slave
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Reconfiguration complete
Started redo scan
Completed redo scan
 read 88 KB redo, 82 data blocks need recovery
Started redo application at
 Thread 2: logseq 374, block 2, scn 54624376
Recovery of Online Redo Log: Thread 2 Group 4 Seq 374 Reading mem 0
  Mem# 0: +DATA/prod/onlinelog/group_4.271.747100549
  Mem# 1: +DATA/prod/onlinelog/group_4.272.747100553
Completed redo application of 0.07MB
Completed instance recovery at
 Thread 2: logseq 374, block 178, scn 54646382
 73 data blocks read, 83 data blocks written, 88 redo k-bytes read
Thread 2 advanced to log sequence 375 (thread recovery)
Redo thread 2 internally disabled at seq 375 (SMON)
ARC3: Creating local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_374_747100216.dbf' (thread 2 sequence 374) (PROD1)
Setting Resource Manager plan SCHEDULER[0x310B]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
ARC3: Closing local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_374_747100216.dbf' (PROD1)
2011-06-27 22:19:29.280000 +08:00
Archived Log entry 792 added for thread 2 sequence 374 ID 0x9790ab2 dest 1:
ARC0: Creating local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_375_747100216.dbf' (thread 2 sequence 375) (PROD1)
2011-06-27 22:19:30.336000 +08:00
ARC0: Archiving disabled thread 2 sequence 375
ARC0: Closing local archive destination LOG_ARCHIVE_DEST_1: '/s01/arch/2_375_747100216.dbf' (PROD1)
Archived Log entry 793 added for thread 2 sequence 375 ID 0x9790ab2 dest 1:
minact-scn: Master considers inst:2 dead

==================================================================
===========================smon trace begin=======================

*** 2011-06-27 22:19:28.279
2011-06-27 22:19:28.279849 : Start recovery for domain=0, valid=0, flags=0x0
Successfully allocated 2 recovery slaves
Using 67 overflow buffers per recovery slave
Thread 2 checkpoint: logseq 374, block 2, scn 54624376
  cache-low rba: logseq 374, block 2
    on-disk rba: logseq 374, block 178, scn 54626382
  start recovery at logseq 374, block 2, scn 54624376
Instance recovery not required for thread 1

*** 2011-06-27 22:19:28.487
Started writing zeroblks thread 2 seq 374 blocks 178-185

*** 2011-06-27 22:19:28.487
Completed writing zeroblks thread 2 seq 374
==== Redo read statistics for thread 2 ====
Total physical reads (from disk and memory): 4096Kb
-- Redo read_disk statistics --
Read rate (ASYNC): 88Kb in 0.18s => 0.48 Mb/sec
Longest record: 8Kb, moves: 0/186 (0%)
Longest LWN: 33Kb, moves: 0/47 (0%), moved: 0Mb
Last redo scn: 0x0000.0341884d (54626381)
----------------------------------------------
----- Recovery Hash Table Statistics ---------
Hash table buckets = 262144
Longest hash chain = 1
Average hash chain = 82/82 = 1.0
Max compares per lookup = 1
Avg compares per lookup = 248/330 = 0.8
----------------------------------------------

*** 2011-06-27 22:19:28.489
KCRA: start recovery claims for 82 data blocks

*** 2011-06-27 22:19:28.526
KCRA: blocks processed = 82/82, claimed = 81, eliminated = 1
2011-06-27 22:19:28.526088 : Validate domain 0
 **************** BEGIN RECOVERY HA STATS  ****************

 I'm the recovery instance

 smon posted (1278500359646), recovery started 0.027 secs,(1278500359673)
 domain validated 0.242 secs (1278500359888)
 claims opened 70, claims converted 11, claims preread 0

 ****************  END RECOVERY HA STATS  *****************
2011-06-27 22:19:28.526668 : Validated domain 0, flags = 0x0

*** 2011-06-27 22:19:28.556
Recovery of Online Redo Log: Thread 2 Group 4 Seq 374 Reading mem 0

*** 2011-06-27 22:19:28.560
Completed redo application of 0.07MB

*** 2011-06-27 22:19:28.569
Completed recovery checkpoint
----- Recovery Hash Table Statistics ---------
Hash table buckets = 262144
Longest hash chain = 1
Average hash chain = 82/82 = 1.0
Max compares per lookup = 1
Avg compares per lookup = 330/330 = 1.0
----------------------------------------------
*** 2011-06-27 22:19:28.572 5401 krsg.c
Acquiring RECOVERY INFO PING latch from [krsg.c:5401] IX0
*** 2011-06-27 22:19:28.572 5401 krsg.c
Successfully acquired RECOVERY INFO PING latch IX+
*** 2011-06-27 22:19:28.572 5406 krsg.c
Freeing RECOVERY INFO PING latch from [krsg.c:5406] IX0
*** 2011-06-27 22:19:28.572 5406 krsg.c
Successfully freed RECOVERY INFO PING latch IX-
krss_sched_work: Prod archiver request from process SMON (function:0x2000)
krss_find_arc: Evaluating ARC3 to receive message (flags 0x0)
krss_find_arc: Evaluating ARC0 to receive message (flags 0x0)
krss_find_arc: Evaluating ARC1 to receive message (flags 0xc)
krss_find_arc: Evaluating ARC2 to receive message (flags 0x2)
krss_find_arc: Selecting ARC2 to receive REC PING message
*** 2011-06-27 22:19:28.572 3093 krsv.c
krsv_send_msg: Sending message to process ARC2
*** 2011-06-27 22:19:28.572 1819 krss.c
krss_send_arc: Sent message to ARC2 (message:0x2000)
Recovery sets nab of thread 2 seq 374 to 178 with 8 zeroblks
Retrieving log 4
pre-aal: xlno:4 flno:0 arf:0 arb:2 arh:2 art:4
Updating log 3 thread 2 sequence 375
Previous log 3 thread 2 sequence 0
Updating log 4 thread 2 sequence 374
Previous log 4 thread 2 sequence 374
post-aal: xlno:4 flno:0 arf:3 arb:2 arh:2 art:3
krss_sched_work: Prod archiver request from process SMON (function:0x1)
krss_find_arc: Evaluating ARC3 to receive message (flags 0x0)
krss_find_arc: Selecting ARC3 to receive message
*** 2011-06-27 22:19:28.589 3093 krsv.c
krsv_send_msg: Sending message to process ARC3
*** 2011-06-27 22:19:28.589 1819 krss.c
krss_send_arc: Sent message to ARC3 (message:0x1)
Retrieving log 2
Kicking thread 1 to switch logfile
Retrieving log 4
Retrieving log 3
krss_sched_work: Prod archiver request from process SMON (function:0x1)
krss_find_arc: Evaluating ARC0 to receive message (flags 0x0)
krss_find_arc: Selecting ARC0 to receive message
*** 2011-06-27 22:19:28.599 3093 krsv.c
krsv_send_msg: Sending message to process ARC0
*** 2011-06-27 22:19:28.599 1819 krss.c
krss_send_arc: Sent message to ARC0 (message:0x1)
*** 2011-06-27 22:19:28.599 838 krsv.c
krsv_dpga: Waiting for pending I/O to complete

*** 2011-06-27 22:19:29.304
krss_sched_work: Prod archiver request from process SMON (function:0x1)
krss_find_arc: Evaluating ARC1 to receive message (flags 0xc)
krss_find_arc: Selecting ARC1 to receive message
*** 2011-06-27 22:19:29.304 3093 krsv.c
krsv_send_msg: Sending message to process ARC1
*** 2011-06-27 22:19:29.304 1819 krss.c
krss_send_arc: Sent message to ARC1 (message:0x1)
SMON[INST-TXN-RECO]:about to recover undo segment 11 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 11 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 12 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 12 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 13 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 13 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 14 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 14 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 15 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 15 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 16 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 16 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 17 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 17 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 18 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 18 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 19 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 19 as available status:2 ret:0
SMON[INST-TXN-RECO]:about to recover undo segment 20 status:3 inst:2
SMON[INST-TXN-RECO]: mark undo segment 20 as available status:2 ret:0

*** 2011-06-27 22:19:43.299
* kju_tsn_aff_drm_pending TRACEUD: called with tsn x2, dissolve 0
* kju_tsn_aff_drm_pending TRACEUD: tsn_pkey = x2.1
* >> RM REQ QS ---:
 single window RM request queue is empty
 multi-window RM request queue is empty
* Global DRM state ---:
   There is no dynamic remastering
   RM lock state = 0
 pkey 2.1 undo 1 stat 0 masters[32768, 1->1] reminc 4 RM# 1
 flg x0 type x0 afftime x36e6e3a8
 nreplays by lms 0 = 0
* kju_tsn_aff_drm_pending TRACEUD: matching request not found on swin queue
* kju_tsn_aff_drm_pending TRACEUD: pp found, stat x0
* kju_tsn_aff_drm_pending TRACEUD: 2 return true

*** 2011-06-27 22:22:18.333
* kju_tsn_aff_drm_pending TRACEUD: called with tsn x2, dissolve 0
* kju_tsn_aff_drm_pending TRACEUD: tsn_pkey = x2.1

* >> RM REQ QS ---:
 single window RM request queue is empty
 multi-window RM request queue is empty
* Global DRM state ---:
   There is no dynamic remastering
   RM lock state = 0
 pkey 2.1 undo 1 stat 0 masters[32768, 1->1] reminc 4 RM# 1
 flg x0 type x0 afftime x36e6e3a8
 nreplays by lms 0 = 0
* kju_tsn_aff_drm_pending TRACEUD: matching request not found on swin queue
* kju_tsn_aff_drm_pending TRACEUD: pp found, stat x0
* kju_tsn_aff_drm_pending TRACEUD: 2 return true

*** 2011-06-27 22:24:53.365
* kju_tsn_aff_drm_pending TRACEUD: called with tsn x2, dissolve 0
* kju_tsn_aff_drm_pending TRACEUD: tsn_pkey = x2.1

* >> RM REQ QS ---:
 single window RM request queue is empty
 multi-window RM request queue is empty
* Global DRM state ---:
   There is no dynamic remastering
   RM lock state = 0
 pkey 2.1 undo 1 stat 0 masters[32768, 1->1] reminc 4 RM# 1
 flg x0 type x0 afftime x36e6e3a8
 nreplays by lms 0 = 0
* kju_tsn_aff_drm_pending TRACEUD: matching request not found on swin queue
* kju_tsn_aff_drm_pending TRACEUD: pp found, stat x0
* kju_tsn_aff_drm_pending TRACEUD: 2 return true

========================================================================
==============================lmon trace begin==========================

 *** 2011-06-27 22:19:27.748
kjxgmpoll reconfig instance map: 1

*** 2011-06-27 22:19:27.748
kjxgmrcfg: Reconfiguration started, type 1
CGS/IMR TIMEOUTS:
  CSS recovery timeout = 31 sec (Total CSS waittime = 65)
  IMR Reconfig timeout = 75 sec
  CGS rcfg timeout = 85 sec
kjxgmcs: Setting state to 4 0.

*** 2011-06-27 22:19:27.759
     Name Service frozen
kjxgmcs: Setting state to 4 1.
kjxgrdecidever: No old version members in the cluster
kjxgrssvote: reconfig bitmap chksum 0x2137452d cnt 1 master 1 ret 0
kjxgrpropmsg: SSMEMI: inst 1 - no disk vote
kjxgrpropmsg: SSVOTE: Master indicates no Disk Voting
2011-06-27 22:19:27.760783 : kjxgrDiskVote: nonblocking method is chosen
kjxggpoll: change poll time to 50 ms
2011-06-27 22:19:27.918847 : kjxgrDiskVote: Obtained RR update lock for sequence 5, RR seq 4
2011-06-27 22:19:28.023160 : kjxgrDiskVote: derive membership from CSS (no disk votes)
2011-06-27 22:19:28.023240 : proposed membership: 1

*** 2011-06-27 22:19:28.081
2011-06-27 22:19:28.081952 : kjxgrDiskVote: new membership is updated by inst 1, seq 6
2011-06-27 22:19:28.082073 : kjxgrDiskVote: bitmap: 1
CGS/IMR TIMEOUTS:
  CSS recovery timeout = 31 sec (Total CSS waittime = 65)
  IMR Reconfig timeout = 75 sec
  CGS rcfg timeout = 85 sec
kjxgmmeminfo: can not invalidate inst 2
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 6 2.
kjfmSendAbortInstMsg: send an abort message to instance 2
 kjfmuin: inst bitmap 1
 kjfmmhi: received msg from inst 1 (inc 2)
     Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 6 3.
     Name Service recovery started
     Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 6 4.
     Multicasted all local name entries for publish
     Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 6 5.
     Name Service normal
     Name Service recovery done

*** 2011-06-27 22:19:28.191
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 6 6.
kjxgmcs: total reconfig time 0.432 seconds (from 2895072218 to 2895072650)
kjxggpoll: change poll time to 600 ms
kjfmact: call ksimdic on instance (2)
2011-06-27 22:19:28.211846 :

********* kjfcrfg() called, BEGIN LMON RCFG *********
2011-06-27 22:19:28.211906 : * Begin lmon rcfg step KJGA_RCFG_BEGIN
* kjfcrfg: Resource broadcasting disabled
* kjfcrfg: kjfcqiora returned success
kjfcrfg: DRM window size = 4096->4096 (min lognb = 15)
2011-06-27 22:19:28.211954 :
Reconfiguration started (old inc 4, new inc 6)
TIMEOUTS:
  Local health check timeout: 70 sec
  Rcfg process freeze timeout: 70 sec
  Remote health check timeout: 140 sec
  Defer Queue timeout: 163 secs
  CGS rcfg timeout: 85 sec
  Synchronization timeout: 248 sec
  DLM rcfg timeout: 744 sec
List of instances:
 1 (myinst: 1)
Undo tsn affinity 1
OMF 0
2011-06-27 22:19:28.212394 : * Begin lmon rcfg step KJGA_RCFG_FREEZE

*** 2011-06-27 22:19:28.233
* published: inc 6, isnested 0, rora req 0,
  rora start 0, rora invalid 0, (roram 32767), isrcvinst 1,
  (rcvinst 1), isdbopen 1, drh 0, (myinst 1)
  thread 1, isdbmounted 1, sid hash x0
* kjfcrfg: published bigns successfully
* Force-published at step 3
2011-06-27 22:19:28.233575 :  Global Resource Directory frozen
* roram 32767, rcvinst 1
* kjfc_thread_qry: instance 1 flag 3 thread 1 sid 0
* kjfcrfg: queried bigns successfully
inst 1
* kjfcrfg: single_instance_kjga = TRUE

 asby init, 0/1/x2

 asby returns, 0/1/x2/false

* Domain maps before reconfiguration:
*   DOMAIN 0 (valid 1): 1 2
* End of domain mappings

 * dead instance detected - domain 0 invalid = TRUE

* Domain maps after recomputation:
*   DOMAIN 0 (valid 0): 1
* End of domain mappings
2011-06-27 22:19:28.235110 : * Begin lmon rcfg step KJGA_RCFG_COMM
2011-06-27 22:19:28.235242 : GSIPC:KSXPCB: msg 0xd8b84550 status 34, type 2, dest 2, rcvr 0
2011-06-27 22:19:28.235339 : GSIPC:KSXPCB: msg 0xd8b80180 status 34, type 2, dest 2, rcvr 1
 Active Sendback Threshold = 50 %
 Communication channels reestablished
2011-06-27 22:19:28.240076 : * Begin lmon rcfg step KJGA_RCFG_EXCHANGE
2011-06-27 22:19:28.240192 : * Begin lmon rcfg step KJGA_RCFG_ENQCLEANUP
 Master broadcasted resource hash value bitmaps
2011-06-27 22:19:28.251474 :
 Non-local Process blocks cleaned out
2011-06-27 22:19:28.251822 : * Begin lmon rcfg step KJGA_RCFG_CLEANUP
2011-06-27 22:19:28.265220 : * Begin lmon rcfg step KJGA_RCFG_TIMERQ
2011-06-27 22:19:28.265308 : * Begin lmon rcfg step KJGA_RCFG_DDQ
2011-06-27 22:19:28.265393 : * Begin lmon rcfg step KJGA_RCFG_SETMASTER
2011-06-27 22:19:28.271551 :
 Set master node info
2011-06-27 22:19:28.271931 : * Begin lmon rcfg step KJGA_RCFG_ENQREPLAY
2011-06-27 22:19:28.275490 :  Submitted all remote-enqueue requests
2011-06-27 22:19:28.275596 : * Begin lmon rcfg step KJGA_RCFG_ENQDUBIOUS
 Dwn-cvts replayed, VALBLKs dubious
2011-06-27 22:19:28.277223 : * Begin lmon rcfg step KJGA_RCFG_ENQGRANT
 All grantable enqueues granted
2011-06-27 22:19:28.277992 : * Begin lmon rcfg step KJGA_RCFG_PCMREPLAY
2011-06-27 22:19:28.279234 :
2011-06-27 22:19:28.279255 :  Post SMON to start 1st pass IR               --SMON posted by LMON
2011-06-27 22:19:28.307890 :  Submitted all GCS cache requests             --IR acquires all gcs resource needed for recovery
2011-06-27 22:19:28.308038 : * Begin lmon rcfg step KJGA_RCFG_FIXWRITES
 Post SMON to start 1st pass IR
 Fix write in gcs resources
2011-06-27 22:19:28.313508 : * Begin lmon rcfg step KJGA_RCFG_END
2011-06-27 22:19:28.313720 :
2011-06-27 22:19:28.313733 :
Reconfiguration complete

*   domain 0 valid?: 0
* kjfcrfg: ask RMS0 to do pnp work
 **************** BEGIN DLM RCFG HA STATS  ****************

 Total dlm rcfg time (inc 6): 0.100 secs (1278500359581, 1278500359681)
   Begin step .........: 0.001 secs (1278500359581, 1278500359582)
   Freeze step ........: 0.020 secs (1278500359582, 1278500359602)
   Remap step .........: 0.002 secs (1278500359602, 1278500359604)
   Comm step ..........: 0.005 secs (1278500359604, 1278500359609)
   Sync 1 step ........: 0.000 secs (0, 0)
   Exchange step ......: 0.000 secs (1278500359609, 1278500359609)
   Sync 2 step ........: 0.000 secs (0, 0)
   Enqueue cleanup step: 0.011 secs (1278500359609, 1278500359620)
   Sync pcm1 step .....: 0.000 secs (0, 0)
   Cleanup step .......: 0.013 secs (1278500359620, 1278500359633)
   Timerq step ........: 0.000 secs (1278500359633, 1278500359633)
   Ddq step ...........: 0.000 secs (1278500359633, 1278500359633)
   Set master step ....: 0.006 secs (1278500359633, 1278500359639)
   Sync 3 step ........: 0.000 secs (0, 0)
   Enqueue replay step : 0.004 secs (1278500359639, 1278500359643)
   Sync 4 step ........: 0.000 secs (0, 0)
   Enqueue dubious step: 0.001 secs (1278500359643, 1278500359644)
   Sync 5 step ........: 0.000 secs (0, 0)
   Enqueue grant step .: 0.001 secs (1278500359644, 1278500359645)
   Sync 6 step ........: 0.000 secs (0, 0)
   PCM replay step ....: 0.030 secs (1278500359645, 1278500359675)
   Sync 7 step ........: 0.000 secs (0, 0)
   Fixwrt replay step .: 0.003 secs (1278500359675, 1278500359678)
   Sync 8 step ........: 0.000 secs (0, 0)
   End step ...........: 0.001 secs (1278500359680, 1278500359681)
 Number of replayed enqueues sent / received .......: 0 / 0
 Number of replayed fusion locks sent / received ...: 0 / 0
 Number of enqueues mastered before / after rcfg ...: 2217 / 2941
 Number of fusion locks mastered before / after rcfg: 3120 / 5747

 ****************  END DLM RCFG HA STATS  *****************

*** 2011-06-27 22:19:36.589
kjxgfipccb: msg 0x0x7ff526139320, mbo 0x0x7ff526139310, type 19, ack 0, ref 0, stat 34

=====================================================================
============================lms trace begin==========================


*** 2011-06-27 22:38:54.663
2011-06-27 22:38:54.663764 :  0 GCS shadows cancelled, 0 closed, 0 Xw survived
2011-06-27 22:38:54.673539 :  5230 GCS resources traversed, 0 cancelled
2011-06-27 22:38:54.707671 :  9322 GCS shadows traversed, 0 replayed, 0 duplicates,
5183 not replayed, dissolve 0 timeout 0 RCFG(10) lms 0 finished replaying gcs resources
2011-06-27 22:38:54.709132 :  0 write requests issued in 384 GCS resources        --check past image
 0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.709520 :  0 write requests issued in 273 GCS resources
 1 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.709842 :  0 write requests issued in 281 GCS resources
 0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.710159 :  0 write requests issued in 233 GCS resources
 0 PIs marked suspect, 0 flush PI msgs
2011-06-27 22:38:54.710531 :  0 write requests issued in 350 GCS resources
 lms 0 finished fixing gcs write protocol

Instance Recovery和普通的Crash Recovery最大的区别在于实例恢复过程中的GRD Frozen和对GES/GCS资源的Remaster,这部分工作主要由LMON进程完成,可以从以上trace中发现一些KJGA_RCFG_*形式的Reconfiguration步骤,它们的含义:

Reconfiguration Steps:

1.    KJGA_RCFG_BEGIN
LMON continuously polling for reconfiguration event. Once cgs reports a change in cluster membership,
LMON starts reconfiguration

2.    KJGA_RCFG_FREEZE
All processes acknowledges to the reconfiguration freeze before LMON continue 

3.    KJGA_RCFG_REMAP
Updates new instance map (kjfchsu), re-distributes resource mastership. Invalidate recovery domains
if reconfiguration is caused by instance death.

4.    KJGA_RCFG_COMM
Reinitialize communication channel

5.    KJGA_RCFG_EXCHANGE
Exchange of master information of gcs, ges and file affinity master

6.    KJGA_RCFG_ENQCLEANUP
Delete remote dead gcs/ges locks. Cancel converting gcs requests. 

7.    KJGA_RCFG_CLEANUP
Cleanup/remove ges resources

8.    KJGA_RCFG_TIMERQ
Restore relative timeout for enqueue locks on timeout queue

9.    KJGA_RCFG_DDQ
Clean out enqueue locks on deadlock queue

10.  KJGA_RCFG_SETMASTER
Update master info for each enqueue resources that need to be remastered.

11.  KJGA_RCFG_REPLAY
Replay enqueue locks

12.  KJGA_RCFG_ENQDUBIOUS
Invalidates ges resources without established value

13.  KJGA_RCFG_ENQGRANT
Grants all grantable ges lock requests

14.  KJGA_RCFG_REPLAY2
Enqueue reconfiguration complete. Post SMON to start instance recovery.  Starts replaying gcs resources.

15.  KJGA_RCFG_FIXWRITES2
Fix write state of gcs resources

16.  KJGA_RCFG_END
Unfreeze lock database

Instance Recovery相关的诊断事件
我们无法禁止Instance Recovery的发生,事实上一旦出现Instance Crash那么Instance Recovery就是必须的。
与Instance Recovery相关的诊断事件主要有10426和29717等:

    10426 – Reconfiguration trace event
    10425 – Enqueue operations
    10432 – Fusion activity
    10429 – IPC tracing

oerr ora 10425
10425, 00000, "enable global enqueue operations event trace"
// *Document: NO
// *Cause:
// *Action: Dump trace for global enqueue operations.

oerr ora 10426
10426, 00000, "enable ges/gcs reconfiguration event trace"
// *Document: NO
// *Cause:
// *Action: Dump trace for ges/gcs reconfiguration.

oerr ora 10430
10430, 00000, "enable ges/gcs dynamic remastering event trace"
// *Document: NO
// *Cause:
// *Action: Dump trace for ges/gcs dynamic remastering.

oerr ora 10401
10401, 00000, "turn on IPC (ksxp) debugging"
// *Cause:
// *Action: Enables debugging code for IPC service layer (ksxp)

oerr ora 10708
10708, 00000, "print out trace information from the RAC buffer cache"
// *Cause: N/A
// *Action: THIS IS NOT A USER ERROR NUMBER/MESSAGE.  THIS DOES NOT NEED TO BE
//          TRANSLATED OR DOCUMENTED. IT IS USED ONLY FOR DEBUGGING.

oerr ora 29717
29717, 00000, "enable global resource directory freeze/unfreeze event trace"
// *Document: NO
// *Cause:
// *Action: Dump trace for global resource directory freeze/unfreeze.

diag RAC INSTANCE SHUTDOWN LMON
LMON will dump more informations to trace during reconfig and freeze.

event="10426 trace name context forever, level 8"
event="29717 trace name context forever, level 5"
or
event="10426 trace name context forever, level 12"
event="10430 trace name context forever, level 12"
event="10401 trace name context forever, level 8"
event="10046 trace name context forever, level 8"
event="10708 trace name context forever, level 15"
event="29717 trace name context forever, level 5"

see 29717  grd frozen trace

alter system set event='29717 trace name context forever, level 5' scope=spfile;

=========================================================================
============================lmon trace begin=============================

********* kjfcrfg() called, BEGIN LMON RCFG *********
2011-06-27 23:13:16.693089 : * Begin lmon rcfg step KJGA_RCFG_BEGIN
* kjfcrfg: Resource broadcasting disabled
* kjfcrfg: kjfcqiora returned success
kjfcrfg: DRM window size = 4096->4096 (min lognb = 15)
2011-06-27 23:13:16.693219 :
Reconfiguration started (old inc 4, new inc 6)
TIMEOUTS:
  Local health check timeout: 70 sec
  Rcfg process freeze timeout: 70 sec
  Remote health check timeout: 140 sec
  Defer Queue timeout: 163 secs
  CGS rcfg timeout: 85 sec
  Synchronization timeout: 248 sec
  DLM rcfg timeout: 744 sec
List of instances:
 1 (myinst: 1)
Undo tsn affinity 1
OMF 0
[FDB][start]
2011-06-27 23:13:16.701320 : * Begin lmon rcfg step KJGA_RCFG_FREEZE
[FACK][18711 not frozen]          --fack means acknowledge in advance
[FACK][18713 not frozen]
[FACK][18719 not frozen]
[FACK][18721 not frozen]
[FACK][18723 not frozen]
[FACK][18729 not frozen]
[FACK][18739 not frozen]
[FACK][18743 not frozen]
[FACK][18745 not frozen]
[FACK][18747 not frozen]
[FACK][18749 not frozen]
[FACK][18751 not frozen]
[FACK][18753 not frozen]
[FACK][18755 not frozen]
[FACK][18757 not frozen]
[FACK][18759 not frozen]
[FACK][18763 not frozen]
[FACK][18765 not frozen]
[FACK][18767 not frozen]
[FACK][18769 not frozen]
[FACK][18771 not frozen]
[FACK][18775 not frozen]
[FACK][18777 not frozen]
[FACK][18816 not frozen]
[FACK][18812 not frozen]
[FACK][18818 not frozen]
[FACK][18820 not frozen]
[FACK][18824 not frozen]
[FACK][18826 not frozen]
[FACK][18830 not frozen]
[FACK][18835 not frozen]
[FACK][18842 not frozen]
[FACK][18860 not frozen]
[FACK][18865 not frozen]
[FACK][18881 not frozen]
[FACK][18883 not frozen]
[FACK][18909 not frozen]

*** 2011-06-27 23:13:16.724
* published: inc 6, isnested 0, rora req 0,
  rora start 0, rora invalid 0, (roram 32767), isrcvinst 0,
  (rcvinst 32767), isdbopen 1, drh 0, (myinst 1)
  thread 1, isdbmounted 1, sid hash x0
* kjfcrfg: published bigns successfully
* Force-published at step 3
2011-06-27 23:13:16.724764 :  Global Resource Directory frozen
* kjfc_qry_bigns: noone has the rcvinst established yet, set it to the highest open instance = 1
* roram 32767, rcvinst 1
* kjfc_thread_qry: instance 1 flag 3 thread 1 sid 0
* kjfcrfg: queried bigns successfully

=====================================================================
==========================lmd0 trace begin===========================

*** 2011-06-27 23:13:16.700
[FFCLI][frozen]
[FUFCLI][normal]

Advanced Diagnostic using oradebug dumpvar

oradebug工具是Oracle数据库调优和诊断的利器,合理运用oradebug可以大幅减少我们收集诊断信息所花费的时间。当然前提是合理运用,对于初级DBA有这样一个忠告,不要在生产环境中去接触或修改自己所不熟悉的领域的东西,这一点很重要。并不是说中高级DBA在知识和经验上能达到巨细靡遗的状态,实际上中高级Oracle DBA可能每天都在和新事物打交道。恰恰相反,中高级DBA是对以上忠告最忠实的践行者,在生产环境中绝不随意涉险于不确定、不明确、不熟悉的区域,很多时候这会让人觉得是一种”好奇心丧失”的表现。

言归正传,我们所要介绍的是oradebug的dumpvar命令,该命令用于打印转储固定的PGA/UGA/SGA的变量:,其语法如下:

oradebug dumpvar
Print/dump a fixed PGA/SGA/UGA variable.
Syntax                                                              Parameter
oradebug dumpvar <p|s|uga> <variable name> [level]                  <p|s|uga> PGA,SGA or UGA
                                                                    fixed variable name
                                                                    [level]
使用示例
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn


SQL> show parameter db_files

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_files                             integer     200

SQL> oradebug setmypid;
Statement processed.

SQL> oradebug dumpvar sga kcfdpk
kfil kcfdpk_ [698B950, 698B954) = 000000C8

这里的KCFDPK变量保存了db_files参数的值,000000C8 == 200

利用oradebug工具我们不仅可以做到对这些内部变量的窥视,还可以做到修改,
但是这对我们实际的诊断没有益处,仅仅可以用来满足某些模拟实验

仅仅了解dumpvar命令的使用方法并没有意义,只有和有意义的内部变量结合起来才能真正起到高级诊断的作用,在这里抛砖引玉罗列出部分诊断变量:

sgauso --  该变量可以用于判断use_stored_outlines

SQL> oradebug setmypid;
Statement processed.

Default use_stored_outlines=false;

SQL> oradebug dumpvar sga sgauso
qolprm sgauso_ [060021418, 06002143C) =
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

SQL> alter system set use_stored_outlines=true;
System altered.

SQL> oradebug dumpvar sga sgauso
qolprm sgauso_ [060021418, 06002143C) =
00000001 45440007 4C554146 00000054 00000000 00000000 00000000 00000000 00000000

45440007 4C554146 ==>      DEFAULT,意为优化器使用DEFAULT category中存放的outline.


ksmvpa -- the size of the variable part of the pga

SQL> oradebug dumpvar pga ksmvpa
b4 ksmvpa_ [0068AA6B4, 0068AA6B8) = 0000E920          -- 59680 bytes

kkjsre        -- The SGA variable kkjsre must be 1 for jobs to execute automatically.

SQL> oradebug  dumpvar sga kkjsre
word kkjsre_ [060025760, 060025764) = 00000001

SQL> exec  dbms_ijob.set_enabled(false);
PL/SQL procedure successfully completed.

SQL> oradebug  dumpvar sga kkjsre
word kkjsre_ [060025760, 060025764) = 00000000

kcmsmx          --the MAX reasonable scn 

SQL> oradebug dumpvar pga kcmsmx
kscn kcmsmx_ [00B7E811C, 00B7E8124) = CA7F0000 00000C6C         -- 11.2.0.2

SQL> oradebug dumpvar pga kcmsmx
kscn kcmsmx_ [0068AB02C, 0068AB034) = A701C000 00000B3D         -- 10.2.0.4 

Notice the MAX scn allowed on 11.2 is far higher that that at earlier releases
(due to the fix from bug 9254170).
Use the fix for 9254170 on ALL DBs with an SCN rate set to match between versions.
Avoid excessive SCN generation rates.

kcvlcm - logging checkpoints to alert - FALSE

SQL> show parameter log_checkpoints_to_alert

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_checkpoints_to_alert             boolean     TRUE

SQL> oradebug dumpvar sga kcvlcm
word kcvlcm_ [060019E88, 060019E8C) = 00000001

SQL> alter system set log_checkpoints_to_alert=false;

System altered.

SQL> oradebug dumpvar sga kcvlcm
word kcvlcm_ [060019E88, 060019E8C) = 00000000

kcvcpr - false (a checkpoint is requested)

SQL> oradebug dumpvar sga  kcvcpr
word kcvcpr_ [060019E90, 060019E94) = 00000000

kcvcpf - false (checkpointing fast)

kcvgcw -false a global checkpointing is waiting

SQL> oradebug dumpvar sga kcvgcw
sword kcvgcw_ [060025748, 06002574C) = 00000000

kcfcpd - true if checkpoint writes done

kcvblg  kcvblg[0] is s 1 incidate some file/s in backup mode

SQL> oradebug dumpvar sga kcvblg
ub4 * kcvblg_ [060019E18, 060019E20) = 9A248508 00000000

SQL> oradebug peek 0x9A248508 100
[09A248508, 09A24856C) =
00000001 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000002 00000000
00000002 00000002 00000000 00000000 ...

alter database begin backup;

SQL>  oradebug peek 0x9A248508 100
[09A248508, 09A24856C) =
00000001 00000003 00000003 00000003 00000003
00000003 00000003 00000003 00000003 00000003
00000003 00000003 00000003 00000003 ...

kcsgscn_           -- current scn 

SQL> oradebug dumpvar sga kcsgscn_
kcslf kcsgscn_ [0600122B0, 0600122E0) =
01C7DB4B 00000000 00000000 00000000              01C7DB4B  -- 29875019
0001E907 00000000 00000000 00000000
00000000 00000000 60011F90 00000000              60011F90 is fixed

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
   29875044

kslwlst_  --  waiter list latch

SQL> oradebug dumpvar sga kslwlst
ksllt kslwlst_ [200040AC, 20004174) =
00000000 00000009 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 ...

kcfdfk -- 2 * db_files

SQL> oradebug dumpvar sga kcfdfk
kfil kcfdfk_ [060017EF0, 060017EF4) = 00000190            --400    when db_files=200 

SQL> alter system set db_files=2000 scope=spfile;
System altered.

SQL> oradebug dumpvar sga kcfdfk
kfil kcfdfk_ [060017EF0, 060017EF4) = 00000FA0             --4000

kcffsz  Address of Datafile Size 

So kcbzib() passes in information about how many blocks to read
( and where to start reading from ) The assertion is 

  if (bno < 2 || bno + nblks - 1 > (kcflsz[fno] ? kcflsz[fno] : kcffsz[fno]))

              ==============================================================

              .......NOTE: This section is removed after 9g..................

  {
     /* if the user gave us the block number then this is an external error,
      * but if the dba was internally generated then this is a corruption of
      * some kind. */
     ASSERTNM5(usrdba, OERINM("kcfrbd_2"),
               fno, bno, nblks, kcflsz[fno], kcffsz[fno]);

SQL> oradebug dumpvar sga kcffsz
ub4 * kcffsz_ [060017F20, 060017F28) = 9988E700 00000000

SQL> oradebug peek 0x9988E700 200
[09988E700, 09988E7C8) =
00000001
0007B200
00030980
0000AF00
00067CA0
00003200
00280000
00000500
00000000
00000A00
00000000
00000000
00000600
00040000 ...

SQL> select blocks,to_char(blocks,'XXXXXXXX') hex_blocks from v$datafile;

    BLOCKS HEX_BLOCK
---------- ---------
    504320     7B200
    199040     30980
     44800      AF00
    425120     67CA0
     12800      3200
   2621440    280000
      1280       500
         0         0
      2560       A00
     38400      9600
      6400      1900

    BLOCKS HEX_BLOCK
---------- ---------
      1536       600
    262144     40000

13 rows selected.

kcflsz -- like kcffsz

SQL> oradebug dumpvar  sga kcflsz
ub4 * kcflsz_ [060017F28, 060017F30) = 99892588 00000000

SQL> oradebug peek 0x99892588 200
[099892588, 099892650) =
00000000 0007B200 00030980 0000AF00 00067CA0
00003200 00280000 00000500 00000000 00000A00
00000000 00000000 00000600 00040000 ...

THe arguments are file number, block number, number of blocks, kcflsz[fno],
kcffsz[fno]
Error comes out of kcf.c.
Can they try a validate structure cascade on this table and see what happens?
Please update the customer field so that it does not read internal.

kcf.c kcf.c Kernel Cache Files component.
ksl.c Kernel Service layer Latching & Wait-post Implement.
ksm.c Kernel Service Memory component implementation.
kkj.c Kernel Kompiletime Job queue.

还原真实的cache recovery

我们在学习Oracle基础知识的时候会了解到实例恢复(Instance Recovery)或者说崩溃恢复(Crash recovery)的概念,有时候甚至于这2个名词在我们日常的语言中表达同样的意思。实际上Instance Recovery与Crash Recovery是存在区别的:针对单实例(single instance)或者RAC中所有节点全部崩溃后的恢复,我们称之为Crash Recovery。而对于RAC中的某一个节点失败,存活节点(surviving instance)试图对失败节点线程上redo做应用的情况,我们称之为Instance Recovery。

不管是Instance Recovery还是Crash Recovery,都由2个部分组成:cache recovery继以transaction recovery。

根据官方文档的介绍,Cache Recovery也叫Rolling Forward,就是我们常说的前滚;而Transaction Recovery也叫Rolling Back,就是我们常说的回滚。前滚和回滚贯穿Oracle恢复的基本概念,是我们入门必要学习的知识,在次不多介绍。

有文事者,必济之以武略。理论学得再好,不实践也无用。所幸Crash Recovery是很容易做成的实验,我们不妨来看一看:

SQL> shutdown abort;
ORACLE instance shut down.

SQL> startup mount;
ORACLE instance started.

Total System Global Area 1065353216 bytes
Fixed Size                  2089336 bytes
Variable Size             486542984 bytes
Database Buffers          570425344 bytes
Redo Buffers                6295552 bytes
Database mounted.

SQL> alter database open;

Crash Recovery将从alter database open开始,我们来观察其日志

==================alert.log====================
alter database open
Tue Jun 14 18:19:53 2011
Beginning crash recovery of 1 threads
 parallel recovery started with 2 processes
Tue Jun 14 18:19:53 2011
Started redo scan
Tue Jun 14 18:19:53 2011
Completed redo scan
 0 redo blocks read, 0 data blocks need recovery
Tue Jun 14 18:19:53 2011
Started redo application at
 Thread 1: logseq 1004, block 1124, scn 17136185
Tue Jun 14 18:19:53 2011
Recovery of Online Redo Log: Thread 1 Group 2 Seq 1004 Reading mem 0
  Mem# 0: /flashcard/oradata/G10R2/onlinelog/o1_mf_2_6v34jokt_.log
  Mem# 1: /s01/flash_recovery_area/G10R2/onlinelog/o1_mf_2_6v34jotq_.log
Tue Jun 14 18:19:53 2011
Completed redo application
Tue Jun 14 18:19:53 2011
Completed crash recovery at
 Thread 1: logseq 1004, block 1124, scn 17156186
 0 data blocks read, 0 data blocks written, 0 redo blocks read
Tue Jun 14 18:19:53 2011
LGWR: STARTING ARCH PROCESSES
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0 started with pid=16, OS id=7829
Tue Jun 14 18:19:53 2011
Thread 1 advanced to log sequence 1005 (thread open)
Thread 1 opened at log sequence 1005
  Current log# 3 seq# 1005 mem# 0: /flashcard/oradata/G10R2/onlinelog/o1_mf_3_6v34jpmp_.log
  Current log# 3 seq# 1005 mem# 1: /s01/flash_recovery_area/G10R2/onlinelog/o1_mf_3_6v34jpyn_.log
Successful open of redo thread 1
Tue Jun 14 18:19:53 2011
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC0: Becoming the heartbeat ARCH
Tue Jun 14 18:19:53 2011
SMON: enabling cache recovery
Tue Jun 14 18:19:53 2011
db_recovery_file_dest_size of 204800 MB is 6.81% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Tue Jun 14 18:19:54 2011
Successfully onlined Undo Tablespace 1.
Tue Jun 14 18:19:54 2011
SMON: enabling tx recovery
Database Characterset is UTF8
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 2
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=17, OS id=7831
Tue Jun 14 18:19:55 2011
Completed: alter database open

注意上述单实例Crash Recovery到数据库打开的整个过程:

  • alter database open
  • Beginning crash recovery of 1 threads
  • Started redo scan
  • Completed redo scan
  • Started redo application at
  • Completed redo application
  • Completed crash recovery at
  • SMON: enabling cache recovery
  • Successfully onlined Undo Tablespace 1
  • SMON: enabling tx recovery
  • Completed: alter database open

从上述步骤中我们可以看到三种恢复名词,即:

  • crash recovery
  • cache recovery
  • tx recovery

这和官方文档所描述的Crash Recovery概念是不一致的,我们现在来理清这几种recovery。

crash recovery包含对redo的scan和application,显然其完成的是Rolling Forward前滚的工作,告警日志中出现的crash recovery等同于官方文档中介绍的”cache recovery”,我们可以将” Completed crash recovery”看做前滚完成的标志。而tx recovery从字面就可以看出实际上是Transaction Recovery,tx recovery发生在Undo Tablespace online之后(回滚事务的前提是Undo可用),数据完成打开操作之前(“Completed: alter database open”)。注意tx recovery并不要求数据库打开前完成,仅仅是在数据库打开之前由smon启动(“SMON: enabling tx recovery”)。

剩下的唯一的问题是,这里的cache recovery是什么?显然它不是官方文档中所描述的”cache recovery”,几乎没有任何文档介绍存在这样一个recovery操作,这也是本文重点要介绍的。

我们来看另一个演示,这个演示用以说明cache recovery还存在于最普通的不包含Crash Recovery的数据库打开过程中:

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup mount;
ORACLE instance started.
Total System Global Area 1065353216 bytes
Fixed Size                  2089336 bytes
Variable Size             486542984 bytes
Database Buffers          570425344 bytes
Redo Buffers                6295552 bytes
Database mounted.

SQL> alter database open;
Database altered.

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

==================alert.log====================
alter database open
Tue Jun 14 18:43:52 2011
LGWR: STARTING ARCH PROCESSES
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0 started with pid=14, OS id=8133
Tue Jun 14 18:43:52 2011
Thread 1 opened at log sequence 1005
Current log# 3 seq# 1005 mem# 0: /flashcard/oradata/G10R2/onlinelog/o1_mf_3_6v34jpmp_.log
Current log# 3 seq# 1005 mem# 1: /s01/flash_recovery_area/G10R2/onlinelog/o1_mf_3_6v34jpyn_.log
Successful open of redo thread 1
Tue Jun 14 18:43:52 2011
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC0: Becoming the heartbeat ARCH
Tue Jun 14 18:43:52 2011
SMON: enabling cache recovery
Tue Jun 14 18:43:53 2011
Successfully onlined Undo Tablespace 1.
Tue Jun 14 18:43:53 2011
SMON: enabling tx recovery
Database Characterset is UTF8
Opening with internal Resource Manager plan
where NUMA PG = 1, CPUs = 2
replication_dependency_tracking turned off (no async multimaster replication found)
Tue Jun 14 18:43:53 2011
Incremental checkpoint up to RBA [0x3ed.624.0], current log tail at RBA [0x3ed.944.0]
Tue Jun 14 18:43:53 2011
Starting background process QMNC
QMNC started with pid=15, OS id=8135
Tue Jun 14 18:43:53 2011
Completed: alter database open

因为是clean shutdown,所以这里不存在crash recovery。但这里同样出现了”SMON: enabling cache recovery”,可见cache recovery是每次实例启动instance startup必要执行的一种恢复操作。但问题是,这个恢复操作到底针对何种对象?

实际上cache recovery所要恢复的是rowcache,也就是我们常说的字典缓存(dictionary cache)。关于这个结论,肯定有很多人要问我这样说的依据是什么,对应于这个”cache recovery”的问题,我们很难从google中得到一些启示,因为它和官方文档所描述的”cache recovery-rolling forward”存在重名的关系。

为了证明cache recovery所恢复的是rowcache,我们需要一个实证,从正式的系统中得到验证。要做到这一点是比较困难的,我们需要Oracle愿意把整个database open的过程变成慢动作来供我们参考,验证要用到一些调试工具,例如gdb或者dbx。

我们首先将实例启动到mount状态,并对执行startup的LOCAL进程做gdb的breakpoint断点调试:

SQL> shutdown abort;
ORACLE instance shut down.

SQL> startup mount;
ORACLE instance started.
Total System Global Area 1065353216 bytes
Fixed Size                  2089336 bytes
Variable Size             486542984 bytes
Database Buffers          570425344 bytes
Redo Buffers                6295552 bytes
Database mounted.

找出LOCAL进程的系统进程号SPID

SQL> select spid from v$process
2  where addr in (
3  select paddr from v$session
4  where sid=(select distinct sid from v$mystat))
5  /

SPID
------------
8326

在实例startup nomount/mount后共享池的library cache就是可用的

SQL> select namespace from v$librarycache where gets!=0;

NAMESPACE
---------------
SQL AREA
TABLE/PROCEDURE

而rowcache则尚未被填充,因为字典缓存来源于自举对象(bootstrap$)和字典基表

SQL> select parameter,count,gets from v$rowcache where count!=0;
no rows selected

另开一个terminal窗口,并执行对LOCAL进程8326的gdb breakpoint调试

[oracle@rh2 ~]$ gdb $ORACLE_HOME/bin/oracle 8326
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /s01/db_1/bin/oracle...(no debugging symbols found)...done.
Attaching to program: /s01/db_1/bin/oracle, process 8326
Reading symbols from /s01/db_1/lib/libskgxp10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libskgxp10.so
Reading symbols from /s01/db_1/lib/libhasgen10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libhasgen10.so
Reading symbols from /s01/db_1/lib/libskgxn2.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libskgxn2.so
Reading symbols from /s01/db_1/lib/libocr10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libocr10.so
Reading symbols from /s01/db_1/lib/libocrb10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libocrb10.so
Reading symbols from /s01/db_1/lib/libocrutl10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libocrutl10.so
Reading symbols from /s01/db_1/lib/libjox10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libjox10.so
Reading symbols from /s01/db_1/lib/libclsra10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libclsra10.so
Reading symbols from /s01/db_1/lib/libdbcfg10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libdbcfg10.so
Reading symbols from /s01/db_1/lib/libnnz10.so...(no debugging symbols found)...done.
Loaded symbols for /s01/db_1/lib/libnnz10.so
Reading symbols from /usr/lib64/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
0x0000003181a0d8e0 in __read_nocancel () from /lib64/libpthread.so.0

输入断点kcrf_commit_force和kqlobjlod
(gdb) break kcrf_commit_force
Breakpoint 1 at 0x2724b6c

(gdb) break kqlobjlod
Breakpoint 2 at 0x1ac5e8c

在之前的terminal中执行数据库打开操作,因为breakpoint的关系这个open操作会hang住,
这时我们通过观察告警日志来了解恢复进度

SQL> alter database open;                              --这里会hang住

在gdb下输入continue,

(gdb) c
Continuing.

Breakpoint 1, 0x0000000002724b6c in kcrf_commit_force ()

观察告警日志可以发现redo application已经完成,但还未进入cache recovery阶段

alter database open
Tue Jun 14 19:14:33 2011
Beginning crash recovery of 1 threads
parallel recovery started with 2 processes
Tue Jun 14 19:14:33 2011
Started redo scan
Tue Jun 14 19:14:33 2011
Completed redo scan
39 redo blocks read, 74 data blocks need recovery
Tue Jun 14 19:14:33 2011
Started redo application at
Thread 1: logseq 1006, block 1155
Tue Jun 14 19:14:33 2011
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1006 Reading mem 0
Mem# 0: /flashcard/oradata/G10R2/onlinelog/o1_mf_1_6v34jnkn_.log
Mem# 1: /s01/flash_recovery_area/G10R2/onlinelog/o1_mf_1_6v34jnst_.log
Tue Jun 14 19:14:33 2011
Completed redo application
Tue Jun 14 19:14:33 2011
Completed crash recovery at
Thread 1: logseq 1006, block 1194, scn 17200193
74 data blocks read, 74 data blocks written, 39 redo blocks read
Tue Jun 14 19:14:33 2011
LGWR: STARTING ARCH PROCESSES
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0 started with pid=17, OS id=8656
Tue Jun 14 19:14:33 2011
Thread 1 advanced to log sequence 1007 (thread open)
Thread 1 opened at log sequence 1007
Current log# 2 seq# 1007 mem# 0: /flashcard/oradata/G10R2/onlinelog/o1_mf_2_6v34jokt_.log
Current log# 2 seq# 1007 mem# 1: /s01/flash_recovery_area/G10R2/onlinelog/o1_mf_2_6v34jotq_.log
Successful open of redo thread 1
Tue Jun 14 19:14:33 2011
ARC0: Becoming the 'no FAL' ARCH
ARC0: Becoming the 'no SRL' ARCH
ARC0: Becoming the heartbeat ARCH
db_recovery_file_dest_size of 204800 MB is 6.81% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Tue Jun 14 19:14:37 2011
Incremental checkpoint up to RBA [0x3ef.3.0], current log tail at RBA [0x3ef.3.0]

且此时rowcache仍未被填充

SQL> select parameter,count,gets from v$rowcache where count!=0;
no rows selected

在gdb界面下再次执行continue 2次

(gdb) c
Continuing.

Breakpoint 1, 0x0000000002724b6c in kcrf_commit_force ()
(gdb) c
Continuing.

Breakpoint 2, 0x0000000001ac5e8c in kqlobjlod ()

观察告警日志可以发现已开始cache recovery,但也卡陷在cache recovery上,这保证我们的演示不受骚扰

Tue Jun 14 19:16:44 2011
SMON: enabling cache recovery

此时rowcache中出现唯一的一个dc_objects对象

select parameter,count,gets from v$rowcache where count!=0;

PARAMETER                             COUNT       GETS
-------------------------------- ---------- ----------
dc_objects                                1          1

这个对象是什么呢?也许你已经猜到了,我们做一个rowcache dump来看一下:

SQL> ALTER SESSION SET EVENTS 'immediate trace name row_cache level 10';

================row_cache trace===================

BUCKET 43170:
row cache parent object: address=0x92326060 cid=8(dc_objects)
hash=f3d1a8a1 typ=11 transaction=(nil) flags=00000001
own=0x92326130[0x9230f628,0x9230f628] wat=0x92326140[0x92326140,0x92326140] mode=S
status=EMPTY/-/-/-/-/-/-/-/-
set=0, complete=FALSE
data=
00000000 4f42000a 5453544f 24504152 00000000 00000000 00000000 00000000
00000000 00000001 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
BUCKET 43170 total object count=1

可以看到上述dc_objects尚未完成加载(status=EMPTY & complete=FALSE ),那么这是一个什么object呢?

4f42000a 5453544f 24504152 => 转换为文本即:OB TSTO$PAR也就是BOOTSTRAP$

换而言之在cache recovery时第一个恢复的字典缓存对象是BOOTSTRAP$,这并不出乎我们的意料。

启动实例的LOCAL进程的等待事件为instance state change,这是常规情况下我们观察不到得

SQL> select event,p1text,p1 from v$session where wait_class!='Idle';

EVENT                                    P1TEXT                                           P1
---------------------------------------- ---------------------------------------- ----------
instance state change                    layer                                             2

在gdb界面下再次continue,将载入更多的rowcache对象

(gdb) c
Continuing.

Breakpoint 2, 0x0000000001ac5e8c in kqlobjlod ()

BUCKET 37:
row cache parent object: address=0x916cd980 cid=3(dc_rollback_segments)
hash=5fed2a24 typ=9 transaction=(nil) flags=000000a6
own=0x916cda50[0x916cda50,0x916cda50] wat=0x916cda60[0x916cda60,0x916cda60] mode=N
status=VALID/INSERT/-/FIXED/-/-/-/-/-
data=
00000000 00000000 00000001 00000009 59530006 4d455453 00000000 00000000
00000000 00000000 00000000 00000000 00000003 00000000 00000000 00000000
00000000 00000000 00000000 00000000
BUCKET 37 total object count=1

595300064d455453 -> SYSTEM 属于dc_rollback_segments 也就是著名的system回滚段

BUCKET 55556:
row cache parent object: address=0x916d8cd0 cid=8(dc_objects)
hash=ce89d903 typ=11 transaction=(nil) flags=00000001
own=0x916d8da0[0x9230f628,0x9230f628] wat=0x916d8db0[0x916d8db0,0x916d8db0] mode=S
status=EMPTY/-/-/-/-/-/-/-/-
set=0, complete=FALSE
data=
00000000 5f430006 234a424f 00000000 00000000 00000000 00000000 00000000
00000000 00000005 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
BUCKET 55556 total object count=1

5f430006 234a424f -> C_OBJ# 是著名的bootstrap对象之一,可以从$ORACLE_HOME/rdbms/admin/sql.bsq中找到它

create rollback segment SYSTEM tablespace SYSTEM
storage (initial 50K next 50K)
/
create cluster c_obj# (obj# number)
pctfree 5 size 800                           /* don't waste too much space */
/* A table of 32 cols, 2 index, 2 col per index requires about 2K.
* A table of 10 cols, 2 index, 2 col per index requires about 750.
*/
storage (initial 130K next 200k maxextents unlimited pctincrease 0)
/* avoid space management during IOR I */
/

我们还可以通过v$rowcache_parent视图来了解dictionary cache的情况

SQL> col cache_name for a20
SQL> col keystr for a31
SQL> set linesize 200
SQL> select address,cache_name,existent,lock_mode,saddr,substr(key,1,30) keystr from v$rowcache_parent;

ADDRESS          CACHE_NAME           E  LOCK_MODE SADDR            KEYSTR
---------------- -------------------- - ---------- ---------------- -------------------------------
00000000916CCE20 dc_tablespaces       N          0 00               000000000000000000000000000000
00000000916CD980 dc_rollback_segments Y          0 00               000000000000000000000000000000
0000000092326060 dc_objects           Y          0 00               000000000A00424F4F545354524150
00000000916D8CD0 dc_objects           N          3 000000009BD91328 000000000600435F4F424A23000000
00000000916DA830 dc_object_ids        Y          0 00               380000000000000000000000000000

可以看到持有row cache lock的会话是'000000009BD91328',
且该dc_objects对象还处于non-existent状态,
换而言之真正装载rowcache的是启动实例的LOCAL服务进程

SQL> select sid,program,event,p1,p2,p3 from v$session where saddr='000000009BD91328';

SID PROGRAM                                          EVENT                                    P1   P2 P3
----- ------------------------------------------------ ---------------------------------------- -- ---- --
3294 sqlplus@rh2.oracle.com (TNS V1-V3)               db file scattered read                    1  378  3

该进程正在等待db file scattered read,fileid->1,block-378,这些块属于BOOTSTRAP$表

BOOTSTRAP$对象已从rowcache被载入到library cache中

SQL> select kglhdadr,kglnaobj from x$kglob where kglobtyp=2 and kglnaobj not like 'X$%';

KGLHDADR             KGLNAOBJ
-------------------- --------------------
0000000092326990     BOOTSTRAP$

SQL> select owner||'.'||Name from v$db_object_cache where type='TABLE' and name not like 'X$%';

OWNER||'.'||NAME
--------------------------------------------------------------------------------
SYS.BOOTSTRAP$

初步总结:

  1. 在数据库正式open前需要恢复字典缓存,这个步骤被称为cache recovery,其实是row cache recovery。与官方文档中描述的”cache recovery”不同,row cache recovery应当是Oracle Internal的叫法。
  2. 实际执行row cache recovery的不是SMON进程,而是启动实例的服务进程

Oracle内部视图:x$targetrba

SQL> desc x$targetrba
 Name
 -----------------------------------------
 ADDR                               N/A

 INDX                               N/A

 INST_ID                            N/A

 TOTALLOGSZ                         select sum(bytes)/redo_standard_size from v$log

 LGLOGSZ                            select bytes/512 from v$Log where status='CURRENT'

 CUR_EST_RCV_READS                  Number of dirty buffers in the buffer cache.
                                    In the Standard Edition, this column is always null.       

 ACTUAL_REDO_BLKS                   Current actual number of redo blocks required for recovery           

 TARGET_RBA_SEQ                     target redo block address log sequence    

 TARGET_RBA_BNO                     target redo block address block number

 TARGET_RBA_BOF                     target redo block address offset

 MIN_LAG                            Current target number of redo blocks that must be processed for recovery.
                                    This value is the minimum value of the following 3 columns,
                                    and identifies which of the 3 user-defined limits determines checkpointing.               

 LOGFILESZ                          Maximum number of redo blocks required to guarantee that a
                                    log switch does not occur before the checkpoint completes.  

 CT_LAG                             Number of redo blocks that need to be processed during
                                    recovery to satisfy the LOG_CHECKPOINT_TIMEOUT parameter.
                                    The value displayed is not meaningful unless that parameter has been set.

 CI_LAG                             Number of redo blocks that need to be processed during recovery to
                                    satisfy the LOG_CHECKPOINT_INTERVAL parameter.
                                    The value displayed is not meaningful unless that parameter has been set.

Oracle内部视图:x$ktfbfe

x$ktfbue:kernel transaction, file bitmap free extents,Free extent bitmap in file header for LMT (equivalent to fet$ in DMT); check dba_free_space view definition,ktfb –space/spcmgmt support for bitmapped space manipulation of files/tablespaces

KTFBFE means K[Kernel] T[Transaction] F[File] B[Bitmap] F[Free] E[Extents]

SQL>  desc x$ktfbfe;
 Name                                     
 ----------------------------------------- 
 ADDR               N/A                            
 INDX               N/A                      
 INST_ID            N/A                         
 KTFBFETSN          TS#  containing this free extent                       
 KTFBFEFNO          Relative number of the file containing the free extent                             
 KTFBFEBNO          Starting block number of the free extent                                
 KTFBFEBLKS         Size of the extent in ORACLE blocks

沪ICP备14014813号-2

沪公网安备 31010802001379号