了解你所不知道的SMON功能(三):清理obj$基表

SMON的作用还包括清理obj$数据字典基表(cleanup obj$)

OBJ$字典基表是Oracle Bootstarp启动自举的重要对象之一:

SQL> set linesize 80 ;
SQL> select sql_text from bootstrap$ where sql_text like 'CREATE TABLE OBJ$%';

SQL_TEXT
--------------------------------------------------------------------------------
CREATE TABLE OBJ$("OBJ#" NUMBER NOT NULL,"DATAOBJ#" NUMBER,"OWNER#" NUMBER NOT N
ULL,"NAME" VARCHAR2(30) NOT NULL,"NAMESPACE" NUMBER NOT NULL,"SUBNAME" VARCHAR2(
30),"TYPE#" NUMBER NOT NULL,"CTIME" DATE NOT NULL,"MTIME" DATE NOT NULL,"STIME"
DATE NOT NULL,"STATUS" NUMBER NOT NULL,"REMOTEOWNER" VARCHAR2(30),"LINKNAME" VAR
CHAR2(128),"FLAGS" NUMBER,"OID$" RAW(16),"SPARE1" NUMBER,"SPARE2" NUMBER,"SPARE3
" NUMBER,"SPARE4" VARCHAR2(1000),"SPARE5" VARCHAR2(1000),"SPARE6" DATE) PCTFREE
10 PCTUSED 40 INITRANS 1 MAXTRANS 255 STORAGE (  INITIAL 16K NEXT 1024K MINEXTEN
TS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 OBJNO 18 EXTENTS (FILE 1 BLOCK 121))

 

触发场景

OBJ$基表是一张低级数据字典表,该表几乎对库中的每个对象(表、索引、包、视图等)都包含有一行记录。很多情况下,这些条目所代表的对象是不存在的对象(non-existent),引起这种现象的一种可能的原因是对象本身已经被从数据库中删除了,但是对象条目仍被保留下来以满足消极依赖机制(negative dependency)。因为这些条目的存在会导致OBJ$表不断膨胀,这时就需要由SMON进程来删除这些不再需要的行。SMON会在实例启动(after startup of DB is started cleanup function again)时以及启动后的每12个小时执行一次清理任务(the cleanup is scheduled to run after startup and then every 12 hours)。

我们可以通过以下演示来了解SMON清理obj$的过程:

SQL>  BEGIN
  2      FOR i IN 1 .. 5000 LOOP
  3      execute immediate ('create synonym gustav' || i || ' for
  4  perfstat.sometable');
  5      execute immediate ('drop   synonym gustav' || i );
  6      END LOOP;
  7    END;
  8    /

PL/SQL procedure successfully completed.

SQL> startup force;
ORACLE instance started.

Total System Global Area 1065353216 bytes
Fixed Size                  2089336 bytes
Variable Size             486542984 bytes
Database Buffers          570425344 bytes
Redo Buffers                6295552 bytes
Database mounted.
Database opened.

SQL>   select count(*) from user$ u, obj$ o
  2        where u.user# (+)=o.owner# and o.type#=10 and not exists
  3        (select p_obj# from dependency$ where p_obj# = o.obj#);

  COUNT(*)
----------
      5000

SQL> /

  COUNT(*)
----------
      5000

SQL> /

  COUNT(*)
----------
      4951

SQL> oradebug setospid 18457;
Oracle pid: 8, Unix process pid: 18457, image: oracle@rh2.oracle.com (SMON)

SQL> oradebug event 10046 trace name context forever ,level 1;
Statement processed.

SQL> oradebug tracefile_name;
/s01/admin/G10R2/bdump/g10r2_smon_18457.trc

select o.owner#,
       o.obj#,
       decode(o.linkname,
              null,
              decode(u.name, null, 'SYS', u.name),
              o.remoteowner),
       o.name,
       o.linkname,
       o.namespace,
       o.subname
  from user$ u, obj$ o
 where u.use r#(+) = o.owner#
   and o.type# = :1
   and not exists
 (select p_obj# from dependency$ where p_obj# = o.obj#)
 order by o.obj#
   for update

select null
  from obj$
 where obj# = :1
   and type# = :2
   and obj# not in
       (select p_obj# from dependency$ where p_obj# = obj$.obj#)

delete from obj$ where obj# = :1

/* 删除过程其实较为复杂,可能要删除多个字典基表上的记录 */

现象

我们可以通过以下查询来了解obj$基表中NON-EXISTENT对象的条目总数(type#=10),若这个总数在不断减少说明smon正在执行清理工作
obj$_type#=10

    select trunc(mtime), substr(name, 1, 3) name, count(*)
      from obj$
     where type# = 10
       and not exists (select * from dependency$ where obj# = p_obj#)
     group by trunc(mtime), substr(name, 1, 3);

      select count(*)
        from user$ u, obj$ o
       where u.user#(+) = o.owner#
         and o.type# = 10
         and not exists
       (select p_obj# from dependency$ where p_obj# = o.obj#);

如何禁止SMON清理obj$基表

我们可以通过设置诊断事件event=’10052 trace name context forever’来禁止SMON清理obj$基表,当我们需要避免SMON因cleanup obj$的相关代码而意外终止或spin从而开展进一步的诊断时可以设置该诊断事件。在Oracle并行服务器或RAC环境中,也可以设置该事件来保证只有特定的某个节点来执行清理工作。

10052, 00000, "don't clean up obj$"

alter system set events '10052 trace name context forever, level 65535';

Problem Description: We are receiving the below warning during db startup:
WARNING: kqlclo() has detected the following :
Non-existent object 37336 NOT deleted because an object
of the same name exists already.
Object name: PUBLIC.USER$

This is caused by the SMON trying to cleanup the SYS.OJB$.
SMON cleans all dropped objects which have a SYS.OBJ$.TYPE#=10. 
This can happen very often when you create an object that have the same name as a public synonym. 

When SMON is trying to remove non-existent objects and fails because there are duplicates, 
multiple nonexistent objects with same name.
This query will returned many objects with same name under SYS schema:

select o.name,u.user# from user$ u, obj$ o where u.user# (+)=o.owner# and o.type#=10 
and not exists (select p_obj# from dependency$ where p_obj# = o.obj#);

To cleanup this message:

Take a full backup of the database - this is crucial. If anything goes wrong during this procedure, 
your only option would be to restore from backup, so make sure you have a good backup before proceeding. 
We suggest a COLD backup. If you plan to use a HOT backup, you will have to restore point in time if any problem happens

Normally DML against dictionary objects is unsupported, 
but in this case we know exactly what the type of corruption, 
also you are instructing to do this under guidance from Support.

Data dictionary patching must be done by an experienced DBA. 
This solution is unsupported. 
It means that if there were problems after applying this solution, a database backup must be restored.

1. Set event 10052 at parameter file to disable cleanup of OBJ$ by SMON

EVENT="10052 trace name context forever, level 65535"

2. Startup database in restricted mode

3. Delete from OBJ$, COMMIT

SQL> delete from obj$ where (name,owner#) in ( select o.name,u.user# from user$ u, obj$ o
where u.user# (+)=o.owner# and o.type#=10 and not exists (select p_obj# from
dependency$ where p_obj# = o.obj#) );

SQL> commit;

SQL> Shutdown abort.

4. remove event 10052 from init.ora

5. Restart the database and monitor for the message in the ALERT LOG file

Comments

  1. Hdr: 3530448 9.2.0.4 RDBMS 9.2.0.4 TXN MGMT LOCAL PRODID-5 PORTID-23 3139157
    Abstract: ORA-601 IN KQLCLO FUNCTION CAUSES SMON TO STOP CLEANING UP DELETED OBJ$ OBJECTS
    
    PROBLEM:
    --------
    Ct has many deleted rows in obj$ which should be cleaned up by kqlclo smon 
    function. This is scheduled to run after startup and then every 12 hours.
    
    During execution of this cleanup function often occours ORA-601 "Cleanup Lock 
    Conflict" error from delete of obj$ row. After this error all cleanup is 
    interupted. No error message is written into alert.log file.
    
    There stay thousands of rows in obj$ which should be cleaned up. Next time is 
    function scheduled to execute after 12 hours, but during that time new objects 
    in obj$ are created which should be deleted, so number of non-deleted objects 
    is still increasing.
    
    This causes performance problems - after startup of DB is started cleanup 
    function again. When Ct needs then shutdown database, it is hanged even over 
    40 minutes, till cleanup is completed.
    Other problem is, that count of non deleted objects in obj$ is increasing.
    
    There is no way how to manually remove them - only repeat shutdown/startup or 
    manually by calling kqlclo function from oradebug (which is unsupported).
    It is also not possible to change 12 hour frequency as it is hard coded in 
    ktmmon.
    If we get ORA-601 for one objects, at least others should be deleted.
    
    DIAGNOSTIC ANALYSIS:
    --------------------
    Producing 10046 event from SMON shows:
    =====================
    PARSING IN CURSOR #3 len=32 dep=1 uid=0 oct=7 lid=0 tim=1793719793264 
    hv=3821382080 ad='3badbeb0'
    delete from obj$ where obj# = :1
    END OF STMT
    PARSE #3:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1793719793257
    EXEC 
    #1:c=290000,e=280180,p=0,cr=6659,cu=579,mis=0,r=0,dep=1,og=4,tim=1793720091068
    ERROR #1:err=601 tim=183676937
    WAIT #0: nam='log file sync' ela= 3444 p1=4532 p2=0 p3=0
    *** 12:56:06.498
    WAIT #0: nam='smon timer' ela= 69045262 p1=300 p2=0 p3=0
    =======================
    After err=601 SMON stops executing cleanup and gets iddle - wait event 'smon 
    timer'.
    Folowing selects shows not cleaned objects:
        select trunc (mtime),
              substr(name,1,3) name,
              count(*)
         from obj$
        where type#=10
         and  not exists ( select * from dependency$
                            where obj#=p_obj# )
        group by  trunc(mtime), substr(name,1,3);
    
        TRUNC(MTIME)      NAM COUNT(*)
        ----------------- --- ----------
        23-mar-2004 00:00 GL_ 5894
        24-mar-2004 00:00 GL_ 5947
    
    This is system with Oracle Applications, which is creating such high number of 
    deleted objects - no way to change this.
    
    WORKAROUND:
    -----------
    execute kqlclo function using oradebug - not supported
    
    RELATED BUGS:
    -------------
    BUG 3001270
    
    REPRODUCIBILITY:
    ----------------
    9.2.0.4 Solaris SPARC 64bit - Customer system
    9.2.0.4 Windows 2000 - inhouse with testcase
    9.2.0.4 Linux - inhouse with testcase
    
    TEST CASE:
    ----------
    1. Create type#10 objects using:
      BEGIN
        FOR i IN 1 .. 5000 LOOP
        execute immediate ('create synonym gustav' || i || ' for 
    perfstat.sometable');
        execute immediate ('drop   synonym gustav' || i );
        END LOOP;
      END;
      /
    2. startup force DB.
    3. monitor cleanup progress using:
          select count(*) from user$ u, obj$ o
          where u.user# (+)=o.owner# and o.type#=10 and not exists
          (select p_obj# from dependency$ where p_obj# = o.obj#);
    4. when count stops decreasing, check that smon is iddle - 'smon timer' in 
    v$session_wait. There stay objects which are not cleaned up.
    10046 trace of smon from this testcase will show err=601 error.
    
  2. admin says
    Hdr: 1310142 8.1.6 RDBMS 8.1.6 DICTIONARY PRODID-5 PORTID-610 ORA-600
    Abstract: SMON CRASHES INSTANCE WITH ERROR ORA-600 [16224]
    
    =========================    
    PROBLEM:
    
    10-node OPS.  On May 17, SMON encounters ORA-600 [16224] and terminates the 
    instance.
    This had happened once before on 5-May-2000, however when SMON got ORA-600 
    [16224] on 5-May, it did not terminate the instance.
    
    =========================    
    DIAGNOSTIC ANALYSIS:
    smon_23888_pyrt.trc.f2n7 trace file shows that smon was executing the 
    following sql at the time it got the error:
    
    select o.owner#,o.obj#,decode(o.linkname,null, 
    decode(u.name,null,'SYS',u.name),
    o.remoteowner), o.name,o.linkname,o.namespace from user$ u, obj$ o where 
    u.user#
    (+)=o.owner# and o.type#=:1 and not exists (select p_obj# from dependency$ 
    where
     p_obj# = o.obj#)
    
    The value of the bind variable for type# was 10.
    
    This looks very similar to bug 1279050.  The stack trace below is the same as 
    bug 1279050.  Based on the discussion in that bug, I had the customer issue 
    the above query to see how many rows in obj$ were for non-existent objects.  
    The query came back with 9 rows:
    
       OWNER#       OBJ# DECODE(O.LINKNAME,NULL,DECODE( NAME                       
        LINKNAME    NAMESPACE
    ---------- ---------- ------------------------------ 
    ------------------------------ ---------- ----------
            64       9887 SYSWIR                         
    MD_TRANSACTION_EXCEPTION_LOG                       1
             1       9888 PUBLIC                         
    MD_TRANSACTION_EXCEPTION_LOG                       1
            22       9905 PAYMGR                         CSD_OUT                   
                             1
             1       9906 PUBLIC                         CSD_OUT                   
                             1
            22       9910 PAYMGR                         L_CSD_PROG                
                             1
             1       9911 PUBLIC                         L_CSD_PROG                
                             1
            65       9912 CONDET                         MD_TRANSACTIONS           
                             1
            22       9926 PAYMGR                         L_REF_SRCE                
                             1
             1       9927 PUBLIC                         L_REF_SRCE                
                             1
    
    =========================   
    WORKAROUND:
    None
    
    =========================   
    RELATED BUGS:
    1279050
    
    =========================   
    REPRODUCIBILITY:
    
     1. State if the problem is reproducible; indicate where and predictability
    Not reproducible at will.
     2. List the versions in which the problem has reproduced
    8.1.6
    
    =========================    
    TESTCASE:
    None
    
    ========================
    STACK TRACE:
    *** ID:(7.1) 2000-05-17 01:25:50.595
    *** 01:25:50.595
    ksedmp: internal or fatal error
    ORA-600: internal error code, arguments: [16224], [], [], [], [], [], [], []
    ----- Call Stack Trace -----
    calling              call     entry                argument values in hex
    location             type     point                (? means dubious value)
    -------------------- -------- -------------------- 
    ----------------------------
    ksedmp+00d0          bl       ksedst               119FE2C4 ?
    ksfdmp+0018          bl       ksedmp               20009594 ?
    kgeriv+00f0          bl       _ptrgl
    kgeasi+00cc          bl       kgeriv               65041 ? 594D4752 ? 0 ? 0 ?
                                                       0 ?
    kqlclo+05e8          bl       kgeasi               20009520 ? 20118264 ? 3F60 
    ?
                                                       2 ? 0 ? 20001F68 ? 0 ?
                                                       4C12D663 ?
    ktmmon+0bd0          bl       kqlclo
    ksbrdp+0244          bl       _ptrgl
    opirip+0224          bl       ksbrdp
    opidrv+0498          bl       opirip               10001490 ? 0 ? 0 ?
    sou2o+0028           bl       opidrv               32 ? 0 ? 0 ?
    main+0248            bl       sou2o                F02934F0 ? F02934E0 ?
                                                       2FF22D00 ? 0 ?
    __start+0088         bl       main                 0 ? 0 ?
    
    =========================   
    SUPPORTING INFORMATION:
    Files on bug$:[bug1310142]
    
    =========================    
    alert_pyrt.log - alert log on 5/17/00, showing ORA-600 [16224] error. 
    ora600_event.log - the above results of the query
    smon_18710_pyrt.trc - The smon trace file from 5/5/00. 
    smon_23888_pyrt.trc - The smon trace file from 5/17/00 
    
    Norman, can you check if these 9 rows are static...that is, are they always 
    returned when the query is run?
    
    Had ct. re-run the query.  Results are in orcl_supp1.txt.  Comparing this to 
    ora600_event.log.  No, the obj# numbers are changing.  So the rows are not 
    static.  However, these two queries are now days apart.  They had turned on 
    event 10052, but then commented it out when they had problems with the node 
    crashing.  I don't think that event 10052 had anything to do with their node 
    crashing, but it was the only thing that changed, so the event was commented 
    out. Given these circumstances, it is understandable that the results from the 
    query are changing..
    
    Had ct. rerun the query again.  The results are in orcl_supp2.log.
    This shows the output is changing, with obj# 10365 - 10368.
    orcl_supp1.txt had 16 rows between obj#10337 and 10354.
    
    Given that the entries in obj$ of type#=10 are changing, it doesn't seem that 
    there is any corruption in obj$ itself which is causing failure to clean up 
    objects.  
    
    I have started working on this.  I will have more detailed
    information by tomorrow.
    
    In the meanwhile, can you tell me what "CSD_OUT" is?  I am
    particularly interested in the one that resides in the 
    PUBLIC schema.  Also, can you tell me how this object is
    used?  Thanks.
    
    ]] During cleanup of OBJ$, SMON no longer crashes
    ]] the instance with error ORA-600[16224] 
    
  3. sqlrush says

    我在11.2.0.2下做了这个测试,收获到的结果有点意思。每次设定10046后,smon开始删除过期信息,如果执行查询的话,会出现DFS lock handle等待,删除停止。

  4. loryliu says

    设置10500也可以看到instance在启动的时候在清理obj$SMON: obj$ cleanup beginSMON: obj$ cleanup end. more:0

Trackbacks

  1. […] 了解你所不知道的SMON功能(二):合并空闲区间 了解你所不知道的SMON功能(三):清理obj$基表 了解你所不知道的SMON功能(四):维护col_usage$字典基表 […]

Comment

*

沪ICP备14014813号-2

沪公网安备 31010802001379号