流言终结者: AWR的保留天数和SYSAUX表空间的使用率有关吗？

2011/10/24 by admin 3 Comments

今天在QQ群的技术讨论中有人提及AWR实际保留的天数并非10g的 7天或 11g 的 8天，而是视乎SYSAUX表空间的使用率而定，当SYSAUX表空间空闲空间较多时会将AWR数据保留地更久。

虽然不知道以上这番理论出自那部书籍，但是至少是说的有模有样的，而且网友还告诉我这是他测试过的结果。

实际是这样吗？

我相信这位网友并没有吹牛，他很可能查询dba_hist_snapshot等AWR视图且看到了的确有7天之前的快照仍被保留着，而没有被清理掉。我们来重演他所看到的现场:

测试使用版本11.2.0.2 ， 11g中默认AWR保留8天：

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
----------------------------------------------------------
www.askmac.cn & www.askmac.cn

SQL> col SNAP_INTERVAL for a20
SQL> col RETENTION for a20
SQL> select snap_interval,retention from dba_hist_wr_control;

SNAP_INTERVAL        RETENTION
-------------------- --------------------
+00000 01:00:00.0    +00008 00:00:00.0

以上确认了默认的快照间隔为1小时 ，且保留时间为8天

检查当前SYSAUX表空间的使用率

REM tablespace report

set linesize 200

select a.tablespace_name,
       round(a.bytes_alloc / 1024 / 1024) megs_alloc,
       round(nvl(b.bytes_free, 0) / 1024 / 1024) megs_free,
       round((a.bytes_alloc - nvl(b.bytes_free, 0)) / 1024 / 1024) megs_used,
       round((nvl(b.bytes_free, 0) / a.bytes_alloc) * 100) Pct_Free,
       100 - round((nvl(b.bytes_free, 0) / a.bytes_alloc) * 100) Pct_used,
       round(maxbytes / 1048576) Max
  from (select f.tablespace_name,
               sum(f.bytes) bytes_alloc,
               sum(decode(f.autoextensible, 'YES', f.maxbytes, 'NO', f.bytes)) maxbytes
          from dba_data_files f
         group by tablespace_name) a,
       (select f.tablespace_name, sum(f.bytes) bytes_free
          from dba_free_space f
         group by tablespace_name) b
 where a.tablespace_name = b.tablespace_name(+)
union all
select h.tablespace_name,
       round(sum(h.bytes_free + h.bytes_used) / 1048576) megs_alloc,
       round(sum((h.bytes_free + h.bytes_used) - nvl(p.bytes_used, 0)) /
             1048576) megs_free,
       round(sum(nvl(p.bytes_used, 0)) / 1048576) megs_used,
       round((sum((h.bytes_free + h.bytes_used) - nvl(p.bytes_used, 0)) /
             sum(h.bytes_used + h.bytes_free)) * 100) Pct_Free,
       100 -
       round((sum((h.bytes_free + h.bytes_used) - nvl(p.bytes_used, 0)) /
             sum(h.bytes_used + h.bytes_free)) * 100) pct_used,
       round(sum(f.maxbytes) / 1048576) max
  from sys.v_$TEMP_SPACE_HEADER h,
       sys.v_$Temp_extent_pool  p,
       dba_temp_files           f
 where p.file_id(+) = h.file_id
   and p.tablespace_name(+) = h.tablespace_name
   and f.file_id = h.file_id
   and f.tablespace_name = h.tablespace_name
 group by h.tablespace_name
 ORDER BY 1
/

TABLESPACE_NAME                MEGS_ALLOC  MEGS_FREE  MEGS_USED   PCT_FREE   PCT_USED        MAX
------------------------------ ---------- ---------- ---------- ---------- ---------- ----------
MGMT_AD4J_TS                          200        199          1         99          1      32768
MGMT_ECM_DEPOT_TS                      40         13         27         32         68      32768
MGMT_TABLESPACE                      1350         86       1265          6         94      32768
SYSAUX                                600        295        305         49         51      32768
SYSTEM                                700        231        469         33         67      32768
TEMP                                   21         21          0        100          0      32768
UNDOTBS1                              495        358        137         72         28      32768
USERS                                1950       1243        707         64         36      32768

SYSAUX表空间剩余295MB空间，空闲率较高

因为这套数据库在2011-10-17之后就一直没有打开过，所以Automatic Workload Repository中最早保留的快照信息是在2011-10-10，通过查询dba_hist_snapshot视图可以反映这一点:

select snap_id,
       to_char(begin_interval_time, 'YYYY-MM-DD'),
       to_char(end_interval_time, 'YYYY-MM-DD')
  from dba_hist_snapshot
 order by snap_id;
   SNAP_ID TO_CHAR(BE TO_CHAR(EN
---------- ---------- ----------
        96 2011-10-10 2011-10-10
        97 2011-10-10 2011-10-10
        98 2011-10-10 2011-10-10
        99 2011-10-10 2011-10-10
       100 2011-10-10 2011-10-10
       101 2011-10-10 2011-10-11
       102 2011-10-11 2011-10-11
       103 2011-10-11 2011-10-11
...................
       221 2011-10-17 2011-10-17
       222 2011-10-24 2011-10-24

SQL> select sysdate from dual;

SYSDATE
---------
24-OCT-11

当前的日期是24-OCT-11，而最早的快照信息是在2011-10-10，这样就达成了网友所说的AWR的保留时间并非7或8天，”awr保留天数根据sysaux大小决定。” 或 “默认7天，sysaux足够大这个7天没有意义” 的说法。

事实是这样吗？

不是的!

那么为什么能看到早于7天的快照呢？

回答：不要被所看到的信息所蒙蔽，虽然我们常说事实胜于雄辩或实践是检验真知的唯一，但事情的表象往往会欺骗我们。

以上这个问题的关键点并非在于是否能看到早于7天的snapshot快照信息，而在于当MMON后台进程(该进程负责收集和清理AWR数据)在执行对过期快照清理工作时是否会清除7 或 8 天之前的snapshot，以及MMON后台进程多久才Purge一次AWR Snapshot。

以上这些问题，我们可以通过_swrf_test_action参数和10046 trace搞清楚:

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm LIKE '%_swrf%';

NAME                 VALUE      DESCRIB
-------------------- ---------- --------------------------------------------------
_swrf_test_action    0          test action parameter for SWRF
_swrf_mmon_flush     TRUE       Enable/disable SWRF MMON FLushing
_swrf_mmon_metrics   TRUE       Enable/disable SWRF MMON Metrics Collection
_swrf_metric_frequen FALSE      Enable/disable SWRF Metric Frequent Mode Collectio
t_mode                          n

_swrf_on_disk_enable TRUE       Parameter to enable/disable SWRF
d

_swrf_mmon_dbfus     TRUE       Enable/disable SWRF MMON DB Feature Usage
_swrf_test_dbfus     FALSE      Enable/disable DB Feature Usage Testing

_swrf_test_action 隐藏参数用以调试MMON的行为，设置该参数并10046事件:

SQL> alter session set "_swrf_test_action" = 28; 

Session altered.

SQL> alter session set "_swrf_test_action" = 10; 

Session altered.

[oracle@vrh4 ContentsXML]$ ps -ef|grep mmon
oracle    2872     1  0 18:28 ?        00:00:00 ora_mmon_SBDB
oracle    3446  3289  0 18:44 pts/1    00:00:00 tail -f SBDB_mmon_2872.trc
oracle    3997  3407  0 19:17 pts/2    00:00:00 grep mmon

SQL> oradebug setospid 2872;
Oracle pid: 15, Unix process pid: 2872, image: oracle@vrh4.oracle.com (MMON)

SQL> oradebug event 10046 trace name context forever,level 12;
Statement processed.

完成以上操作后等待一段时间，MMON进程的trace文件会陆续写出一些信息，如:

*** 2011-10-24 18:45:24.795
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
  Finished one MMON Auto-Flush cycle.

*** 2011-10-24 18:46:24.874
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
  Finished one MMON Auto-Flush cycle.

*** 2011-10-24 18:47:24.952
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
  Finished one MMON Auto-Flush cycle.

*** 2011-10-24 18:48:25.053
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
  Finished one MMON Auto-Flush cycle.

说明MMON每分钟都会自动刷新一定的数据到磁盘上。

此外还可以看到MMON清理过期快照的信息:

*** 2011-10-24 18:58:25.290
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
  Finished one MMON Auto-Flush cycle.
KEWRAPM: Beginning one MMON Auto-Purge cycle ...
  KEWRAPM: Finished one MMON Auto-Purge cycle.
KEWRAPC: Auto Purge Action Completed.

*** 2011-10-24 19:28:26.091
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
  Finished one MMON Auto-Flush cycle.
KEWRAPM: Beginning one MMON Auto-Purge cycle ..
  KEWRAPM: Finished one MMON Auto-Purge cycle

*** 2011-10-24 19:58:27.041
KEWRAFM: Beginning one MMON Auto-Flush cycle ...
  Finished one MMON Auto-Flush cycle.
KEWRAPM: Beginning one MMON Auto-Purge cycle ...
  KEWRAPM: Finished one MMON Auto-Purge cycle.

可以看到在默认情况下MMON每30分钟会自动去清理一次Automatic Workload Repository自动负载仓库中的过期快照信息，当18:58:25.290的第一次清理工作完成后查询dba_hist_snapshot可以发现过期快照消失了:

SQL> select snap_id,
  2         to_char(begin_interval_time, 'YYYY-MM-DD'),
  3         to_char(end_interval_time, 'YYYY-MM-DD')
  4    from dba_hist_snapshot
  5   order by snap_id;

   SNAP_ID TO_CHAR(BE TO_CHAR(EN
---------- ---------- ----------
       194 2011-10-16 2011-10-16
       195 2011-10-16 2011-10-16
       196 2011-10-16 2011-10-16
       197 2011-10-16 2011-10-16
.................
       222 2011-10-24 2011-10-24

通过以上演示我们可知AWR快照的保留天数与SYSAUX的使用率并无关系，实际控制AWR保留天数的最主要因素是MMON何时、如何地清理过期快照？ MMON的清理操作直接受到dba_hist_wr_control.retention设置值的影响，默认情况10g 为保留7天，而11g为保留8天，MMON只已清理过期的快照。

同时KEWRAPM的trace信息也说明了,默认情况下MMON每30分钟做一次”MMON Auto-Purge cycle”清理工作。

Filed Under: Oracle Tagged With: _swrf_test_action, Auto-Purge, AWR, KEWRAPM, mmon, snapshot

Comments

admin says

2011/10/24 at 22:40

WRH$_ACTIVE_SESSION_HISTORY Does Not Get Purged

Applies to:
Oracle Server – Enterprise Edition – Version: 10.2.0.1 to 11.2.0.2 – Release: 10.2 to 11.2
Information in this document applies to any platform.
***Checked for relevance on 16-Sep-2010***
Symptoms
AWR tables are not being purged according to settings in sys.wrm$_wr_control. Because of this the tables are accumulating more and more rows and the segments associated with these tables become very large.
Cause

Oracle is deciding what rows need to be purged based on the retention policy. There is a special mechanism which is used in the case of the large AWR tables. For the large AWR tables, we store the snapshot data into partitions. Purging data from these tables means that a partition gets dropped. During the nightly purge task, we only drop the partition if all the data in the partition has expired. If the partition contains at least one row which, according to the retention policy shouldn’t be removed, then the partition won’t be dropped and as such the table will contain old data.

Based on the above clarifications we can say that if anything is going wrong with the partition split operation then we will end up by having the AWR tables not to be purged.

The root cause for the problems can be diagnosed by triggering the same operations which are executed each time the purge job is fired.
Solution

1) Check how many partitions do exist for the offending table :

select table_name,partition_name
from dba_tab_partitions
where table_name = ‘WRH$_ACTIVE_SESSION_HISTORY’;

2) Try to force the execution of the purge operations :

alter session set “_swrf_test_action” = 72;

This step can finish successfully :

Statement processed

or it can fail:

ORA-02097: parameter cannot be modified because specified value is invalid
ORA-13509: error encountered during updates to a AWR table
ORA-14327: Some index [sub]partitions could not be rebuilt

3) If the operation on step 2 is failing the generated (udump) trace file must be analyzed. In the trace file a partition split statement can be found :

*** KEWROCISTMTEXEC – encountered error: (ORA-14327: Some index [sub]partitions could not be rebuilt
ORA-20001: Component “WRH$_SQLSTA_2446360981_11691” does not exist on index “WRH$_SQLSTAT_INDEX” in schema “SYS”.
ORA-06512: at “SYS.DBMS_I_INDEX_UTL”, line 748
ORA-06512: at “SYS.DBMS_INDEX_UTL”, line 397
ORA-06512: at “SYS.DBMS_INDEX_UTL”, line 556
ORA-06512: at line 1
)
*** SQLSTR: total-len=228, dump-len=228,
STR={alter table WRH$_SQLSTAT split partition
WRH$_SQLSTA_2446360981_11691 at (2446360981,12774) into (partition
WRH$_SQLSTA_2446
360981_11691 tablespace SYSAUX, partition WRH$_SQLSTA_2446360981_12774
tablespace SYSAUX) update indexes}

The split statement must be executed manually :

alter table WRH$_SQLSTAT split partition WRH$_SQLSTA_2446360981_11691 at (2446360981,12774)
into (partition WRH$_SQLSTA_2446 360981_11691 tablespace SYSAUX, partition WRH$_SQLSTA_2446360981_12774 tablespace SYSAUX)
update indexes;

4) Steps 2 and 3 must be executed until setting the event is successful.

5) Check how many partitions do exist for the offending table after all the above actions :
set serveroutput on
declare
CURSOR cur_part IS
SELECT partition_name from dba_tab_partitions
WHERE table_name = ‘WRH$_ACTIVE_SESSION_HISTORY’;

query1 varchar2(200);
query2 varchar2(200);

TYPE partrec IS RECORD (snapid number, dbid number);
TYPE partlist IS TABLE OF partrec;

Outlist partlist;
begin
dbms_output.put_line(‘PARTITION NAME SNAP_ID DBID’);
dbms_output.put_line(‘————————— ——- ———-‘);

for part in cur_part loop
query1 := ‘select min(snap_id), dbid from sys.WRH$_ACTIVE_SESSION_HISTORY partition (‘||part.partition_name||’) group by dbid’;
execute immediate query1 bulk collect into OutList;

if OutList.count > 0 then
for i in OutList.first..OutList.last loop
dbms_output.put_line(part.partition_name||’ Min ‘||OutList(i).snapid||’ ‘||OutList(i).dbid);
end loop;
end if;

query2 := ‘select max(snap_id), dbid from sys.WRH$_ACTIVE_SESSION_HISTORY partition (‘||part.partition_name||’) group by dbid’;
execute immediate query2 bulk collect into OutList;

if OutList.count > 0 then
for i in OutList.first..OutList.last loop
dbms_output.put_line(part.partition_name||’ Max ‘||OutList(i).snapid||’ ‘||OutList(i).dbid);
dbms_output.put_line(‘—‘);
end loop;
end if;

end loop;
end;
/

We should have more partitions now and it is expected that the old partitions will get automatically removed when the retention period will expire.

Once you have identified, and split the partitions, you may want to drop a partition range, by dropping a snapshot range:

DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(
low_snap_id IN NUMBER,
high_snap_id IN NUMBER
dbid IN NUMBER DEFAULT NULL);

回复
admin says

2011/10/24 at 22:41

Suggestions if your SYSAUX Tablespace grows rapidly or too large

Applies to:
Oracle Server – Enterprise Edition – Version: 11.1.0.6 to 11.2.0.2 – Release: 11.1 to 11.2
This problem can occur on any platform.
Symptoms
Observed exceptional SYSAUX tablespace growth and associated overhead purging AWR objects

We anticipate RAC and Exadata will force proportionately higher SYSAUX tablespace growth to maintain performance and auditing statistics. The following example provides some suggestions on how to approach purging SYSAUX OPSTAT based data which can be one of the largest SYSAUX tablespace storage consumers.
Cause
Legacy data from previously collected SQLSETs resulted in partitions for OP_STAT and WRH/WRI based objects

– These are AWR/performance/tuning set based views

It is known that old WRH a/o OPSTAT data can be stranded under various circumstances.
One reason that is not well known is that SQLSETs and BASELINEs require data that is collected and shared with AWR snapshots. Manual or automatic purging of legacy snapshots will not automatically free any snapshots that also share data with the SQLSETs and BASELINEs. This data would only be freed if the SQLSETS and BASELINES were dropped first.

One problem with dropping snapshots: There is no warming about how many snapshots have not been dropped if purging by a range. You will only receive the feedback that the “operation completed”

Solution

SYSAUX Space Usage:
================

Make sure the SYSAUX tablespace is set to AUTOEXTEND off.

Collect awrinfo report and verify which object is consuming the most space in the SYSAUX tablespace.

In the report you may tend to see the following tables are consuming the most space

1. WRH$_ACTIVE_SESSION_HISTORY
2. WRI$_OPTSTAT_HISTGRM_HISTORY
3. WRI$_OPTSTAT_HISTHEAD_HISTORY

Example from the AWRINFO.SQL script

*************************************
(2) Size estimates for AWR snapshots
*************************************
|
| Estimates based on 60 mins snapshot INTERVAL:

| AWR size/day 745.8 MB (31,823 K/snap * 24 snaps/day)
| AWR size/wk 5,220.9 MB (size_per_day * 7) per instance
| AWR size/wk 10,441.8 MB (size_per_day * 7) per database
|
| Estimates based on 24 snaps in past 24 hours:

| AWR size/day 745.8 MB (31,823 K/snap and 24 snaps in past 24 hours)
| AWR size/wk 5,220.9 MB (size_per_day * 7) per instance
| AWR size/wk 10,441.8 MB (size_per_day * 7) per database
|

i.e.per the above information 10GB is the expected weekly size

However the stats show 37gb even after several purge attempts
########################################################
(I) AWR Snapshots Information
########################################################
*****************************************************
(1a) SYSAUX usage – Schema breakdown (dba_segments)
*****************************************************
|
| Total SYSAUX size 37,933.8 MB ( 97% of 39,292.9 MB MAX with AUTOEXTEND OFF )
|
| Schema SYS occupies 37,178.3 MB ( 98.0% ) <<< 37gb | Schema SYSTEM occupies 731.6 MB ( 1.9% ) | Schema ABC occupies 18.7 MB ( 0.0% ) | Schema WMSYS occupies 3.1 MB ( 0.0% ) | Schema DBSNMP occupies 2.2 MB ( 0.0% ) | ******************************************************** (1b) SYSAUX occupants space usage (v$sysaux_occupants) ******************************************************** | | Occupant Name Schema Name Space Usage | -------------------------------------------------------- | SM/OPTSTAT SYS 30,778.6 MB << OPTSTAT 30gb | SM/AWR SYS 5,283.1 MB << AWR | SM/ADVISOR SYS 940.4 MB << Advisors can 'trap' snapshot data* | LOGMNR SYSTEM 722.3 MB | SM/OTHER SYS 32.1 MB largest consumers ----------------------- ASH 737.0 WRH$_ACTIVE_SESSION_HISTORY.WRH$_ACTIVE_1198427434_5673 - 98% TABLE PARTITION ASH 144.0 WRH$_ACTIVE_SESSION_HISTORY.WRH$_ACTIVE_1198427434_5993 - 97% TABLE PARTITION ASH 96.0 WRH$_ACTIVE_SESSION_HISTORY_PK.WRH$_ACTIVE_1198427434_5673 - 84% INDEX PARTITION ASH 24.0 WRH$_ACTIVE_SESSION_HISTORY_PK.WRH$_ACTIVE_1198427434_5993 - 78% INDEX PARTITION > />
SQLPLANS need to be reviewed as they can “trap” Snapshot information*

++++++++++++++++++++++++++
SQLPLAN 656.0 MB WRH$_SQL_PLAN – 93% TABLE
SQLPLAN 136.0 MB WRH$_SQL_PLAN_PK – 77% INDEX

* Snapshots cannot be dropped IF the data is required for SQLPLANs or BASELINEs even though the Snapshot purge appears to be successful

– i.e. you receive no feedback that the Snapshots cannot be dropped due to SQLPLAN / Baseline required data shared by snapshots

Below we see where the information is not being released

**********************************
(4) Space usage by non-AWR components (> 500K)
**********************************
COMPONENT MB SEGMENT_NAME SEGMENT_TYPE
——— ——— —————————————–
NON_AWR 11,376.3 SYS.I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST INDEX
NON_AWR 8,098.9 SYS.WRI$_OPTSTAT_HISTGRM_HISTORY TABLE
NON_AWR 5,813.3 SYS.I_WRI$_OPTSTAT_H_ST INDEX
NON_AWR 1,988.0 SYS.WRI$_OPTSTAT_HISTHEAD_HISTORY TABLE
NON_AWR 1,475.3 SYS.I_WRI$_OPTSTAT_HH_OBJ_ICOL_ST INDEX
NON_AWR 908.6 SYS.I_WRI$_OPTSTAT_HH_ST INDEX
NON_AWR 636.0 SYS.SYS_LOB0000006213C00038$$ LOBSEGMENT
NON_AWR 380.0 SYS.WRI$_ADV_OBJECTS TABLE
NON_AWR 256.0 SYS.I_WRI$_OPTSTAT_SYNOPSIS INDEX
NON_AWR 224.0 SYS.WRI$_OPTSTAT_TAB_HISTORY TABLE
NON_AWR 216.6 SYS.I_WRI$_OPTSTAT_TAB_ST INDEX
NON_AWR 208.0 SYS.WRI$_ADV_OBJECTS_IDX_01 INDEX
NON_AWR 201.5 SYS.I_WRI$_OPTSTAT_TAB_OBJ#_ST INDEX
NON_AWR 200.0 SYS.WRI$_OPTSTAT_SYNOPSIS$ TABLE
NON_AWR 160.0 SYS.SYS_LOB0000006207C00004$$ LOBSEGMENT
NON_AWR 96.0 SYS.WRI$_ADV_OBJECTS_PK INDEX

ACTION PLAN

Follow steps from the document

Note: 387914.1 WRH$_ACTIVE_SESSION_HISTORY Does Not Get Purged

1. Check how many partitions do exist for the offending table

select table_name,partition_name from dba_tab_partitions
where table_name = ‘WRH$_ACTIVE_SESSION_HISTORY’;

2. Try to force the execution of the purge operations :

alter session set “_swrf_test_action” = 72;

3. Drop snapshots manually with small range.

DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(
low_snap_id IN NUMBER,
high_snap_id IN NUMBER
dbid IN NUMBER DEFAULT NULL);
– or –
exec dbms_stats.purge_stats(sysdate-&days);

4. If the drop snapshot did not finish or consume for UNDO space. Terminate the session and wait for a day or two the default auto purge job should automatically drop the old snapshots.

Purge:
WRI$_OPTSTAT_HISTGRM_HISTORY and WRI$_OPTSTAT_HISTHEAD_HISTORY

1. Backup both the above tables using conventional Export
do not use expdp as this does not work for some sys related objects.

2. Truncate the table and verify is there any UNUSED indexes if so rebuild them.
– Consider setting the indexes to UNUSABLE and then rebuild

回复
zhouyf says

2011/10/25 at 11:47

相当强悍
备注同样无比强悍。完美了~~:)

回复

流言终结者: AWR的保留天数和SYSAUX表空间的使用率有关吗？

Comments

Comment 取消回复