分区表基础知识培训,适合对分区表知识了解不多的朋友。
顺利在2月28日给客户单位的运维和开发部门做了关于分区表基础知识的培训, 大家提了不少值得思考的问题,Great!
分区表基础知识培训,适合对分区表知识了解不多的朋友。
顺利在2月28日给客户单位的运维和开发部门做了关于分区表基础知识的培训, 大家提了不少值得思考的问题,Great!
今天有同行问我DBMS_STATS与Analyze的区别, 我简单地介绍了一下DBMS_STATS提供了更丰富的功能,从8i开始推荐使用DBMS_STATS来完全替代Analayz Table/Index/Cluster 命令收集数据库中对象的统计信息。
之后他又问起我有那些功能是Analyze 命令可以做到,反而DBMS_STATS做不到的?
这个问题我倒是没有很系统地去考虑过, 闪过脑子的2个念头就是Analyze Table/Index validate structure 和 Analyze Table list chained rows into chained_rows 这2个命令。
Google了一下相关的信息,发现Asktom对这个已经讨论过了,从8i以后analyze被提名为具备”validate”验证功能的命令,主要负责验证表和索引的结构以及链式行(chained and migrated rows)信息; 而DBMS_STATS包负责统计信息的管理。
analyze is slated to do this and this alone at some time in the future. analyze is destined to be
the “validate” thing — analyze validate structure, analyze list chained rows, etc.
dbms_stats will stick with stats and stats only.
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:735625536552
因为Oracle Development 开发部门已经给予 Analyze 命令较为精确地功能定位了, 所以其最主要的2个功能validate structure和list chained rows目前和将来都不会被植入到DBMS_STATS包当中。
Analyze validate structure 的主要语法包括:
@?/rdbms/admin/utlvalid.sql /* 创建存放验证信息的表 */ desc invalid_rows ANALYZE INDEX <index_name> [PARTITION <partition_name>] [SUBPARTITION <subpartition_name>] VALIDATE STRUCTURE CASCADE INTO <table_name> <OFFLINE | ONLINE> select * from index_stats; ANALYZE TABLE <table_name> [PARTITION <partition_name>] [SUBPARTITION <subpartition_name>] VALIDATE STRUCTURE CASCADE [INTO <table_name>] <OFFLINE | ONLINE> ANALYZE CLUSTER <cluster_name> VALIDATE STRUCTURE CASCADE INTO <table_name> <OFFLINE | ONLINE>
补充:analyze index validate structure会填充index_stats视图,该视图包含了丰富的索引空间信息:
SQL> desc index_stats; Name Null? Type ----------------------------------------- -------- ---------------------------- HEIGHT NUMBER BLOCKS NUMBER NAME VARCHAR2(30) PARTITION_NAME VARCHAR2(30) LF_ROWS NUMBER LF_BLKS NUMBER LF_ROWS_LEN NUMBER LF_BLK_LEN NUMBER BR_ROWS NUMBER BR_BLKS NUMBER BR_ROWS_LEN NUMBER BR_BLK_LEN NUMBER DEL_LF_ROWS NUMBER DEL_LF_ROWS_LEN NUMBER DISTINCT_KEYS NUMBER MOST_REPEATED_KEY NUMBER BTREE_SPACE NUMBER USED_SPACE NUMBER PCT_USED NUMBER ROWS_PER_KEY NUMBER BLKS_GETS_PER_ACCESS NUMBER PRE_ROWS NUMBER PRE_ROWS_LEN NUMBER OPT_CMPR_COUNT NUMBER OPT_CMPR_PCTSAVE NUMBER
但是请注意validate structure会锁住table/index/cluster等对象,对于在线业务可以考虑使用validate structure online在线验证方法,但是validate strucutre online也有它的缺点,那就是在线模式下结构验证命令将不填充索引的状态信息到index_stats视图。
Analyze list chained rows的主要语法包括:
@?/rdbms/admin/utlchain.sql desc chained_rows ANALYZE TABLE <table_name> LIST CHAINED ROWS INTO <table_name>; ANALYZE CLUSTER <cluster_name> LIST CHAINED ROWS INTO <table_name>;
注意因为DBMS_STATS仅仅收集对CBO有用的统计信息,所以一些空间使用情况信息和使用FreeList管理的信息都不会被收集,这些信息包括:
If statistics unrelated to the cost based optimizer are required, then these
must still be collected using the the ANALYZE command. These statistics include:
Space Usage information :
EMPTY_BLOCKS,
AVG_SPACE,
CHAIN_CNT
Information on freelist blocks
AVG_SPACE_FREELIST_BLOCKS,
NUM_FREELIST_BLOCKS
因为以上信息对于CBO计算成本并没有帮助,所以DBMS_STATS也就无意也无法收集它们,但是Analyze命令还是可以做到收集以上这些信息。
此外因为CBO其实并不会参考Cluster类型对象的统计信息来计算Cost成本,而是使用cluster中各个表的统计信息(DBMS_STATS does not gather cluster statistics, but you can use DBMS_STATS to gather statistics on the individual tables instead of the whole cluster. )
所以DBMS_STATS也不支持收集Cluster的统计信息。
这么总结一下,发现Analyze命令的独有功能还真不少,因为Oracle公司已经明确了Analyze作为”validate”验证命令的功能定位,且很多内部的工具和脚本仍在使用Analyze Table/Cluster/Index的特有功能,所以可以预期Analyze命令在未来的一段时间内也不会被废弃。
总结以上DBMS_STATS无法实现,而传统的Analyze命令可以做到的功能:
附Analyze命令的详细语法如下:
Syntax: ANALYZE TABLE tablename COMPUTE|ESTIMATE|DELETE STATISTICS ptnOption options ANALYZE INDEX indexname COMPUTE|ESTIMATE|DELETE STATISTICS ptnOption options ANALYZE CLUSTER clustername COMPUTE|ESTIMATE|DELETE STATISTICS options ptnOption PARTITION (partion) SUBPARTITION (subpartition) options VALIDATE STRUCTURE [CASCADE] [INTO tablename] LIST CHAINED ROWS [INTO tablename] COMPUTE|ESTIMATE STATISTICS FOR TABLE COMPUTE|ESTIMATE STATISTICS FOR ALL COLUMNS COMPUTE|ESTIMATE STATISTICS FOR ALL INDEXED COLUMNS COMPUTE|ESTIMATE STATISTICS FOR COLUMNS [SIZE int] column [SIZE int] When Estimating statistics you can optionally specify ... ESTIMATE STATISTICS SAMPLE n ROWS ... ESTIMATE STATISTICS SAMPLE n PERCENT
原帖是Itpub上的网友提出一个CBO为什么不走索引的问题, 该问题的演示如下:
SQL> create table maclean1 as select * from dba_objects;
Table created.
SQL> update maclean1 set status='INVALID' where owner='MACLEAN';
2 rows updated.
SQL> commit;
Commit complete.
SQL> create index ind_maclean1 on maclean1(status);
Index created.
SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN1',cascade=>true);
PL/SQL procedure successfully completed.
SQL> explain plan for select * from maclean1 where status='INVALID';
Explained.
SQL> set linesize 140 pagesize 1400
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 987568083
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11320 | 1028K| 85 (0)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| MACLEAN1 | 11320 | 1028K| 85 (0)| 00:00:02 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='INVALID')
13 rows selected.
10053 trace
Access path analysis for MACLEAN1
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for MACLEAN1[MACLEAN1]
Column (#10): STATUS(
AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.500000
Table: MACLEAN1 Alias: MACLEAN1
Card: Original: 22639.000000 Rounded: 11320 Computed: 11319.50 Non Adjusted: 11319.50
Access Path: TableScan
Cost: 85.33 Resp: 85.33 Degree: 0
Cost_io: 85.00 Cost_cpu: 11935345
Resp_io: 85.00 Resp_cpu: 11935345
Access Path: index (AllEqRange)
Index: IND_MACLEAN1
resc_io: 185.00 resc_cpu: 8449916
ix_sel: 0.500000 ix_sel_with_filters: 0.500000
Cost: 185.24 Resp: 185.24 Degree: 1
Best:: AccessPath: TableScan
Cost: 85.33 Degree: 1 Resp: 85.33 Card: 11319.50 Bytes: 0
可以从以上10053中看到因为没有直方图存在,所以这里的Density = 0.5 是从 1/ NDV 算得的
也就意味着粗糙的统计信息显示STATUS='INVALID"的数据行占总行数的一半,
所以优化器选择做全表扫描是有道理的
以上符合”STATUS”=’INVALID’ condition的只有2行,且status列上建有索引,同时也使用了dbms_stats包收集表和索引上的统计信息,照理说CBO因该选择INDEX Range ind_maclean1,而避免全表扫描,但实际优化器opitimizer没有这样做。
实际上这个问题和统计信息收集时是否收集直方图有关系,只要收集了直方图,那么优化器就会了解到status=’INVALID’条件仅有少量的card满足,具有良好的选择性:
[oracle@vrh4 ~]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.2.0 Production on Mon Oct 17 19:15:45 2011 Copyright (c) 1982, 2010, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production PL/SQL Release 11.2.0.2.0 - Production CORE 11.2.0.2.0 Production TNS for Linux: Version 11.2.0.2.0 - Production NLSRTL Version 11.2.0.2.0 - Production SQL> show parameter optimizer_fea NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ optimizer_features_enable string 11.2.0.2 SQL> select * from global_name; GLOBAL_NAME -------------------------------------------------------------------------------- www.askmac.cn & www.askmac.cn SQL> drop table maclean; Table dropped. SQL> create table maclean as select * from dba_objects; Table created. SQL> update maclean set status='INVALID' where owner='MACLEAN'; 2 rows updated. SQL> commit; Commit complete. SQL> create index ind_maclean on maclean(status); Index created. SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 2'); PL/SQL procedure successfully completed.
这里我们仅收集2个bucket的直方图, 就足以让优化器做出正确选择了。
得益于Quest公司的Guy Harrison所写的一个列出FREQUENCY直方图信息的脚本,以下为该脚本:
rem rem Generate a histogram of data distribution in a column as recorded rem in dba_tab_histograms rem rem Guy Harrison Jan 2010 : www.guyharrison.net rem rem hexstr function is from From http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:707586567563 set pagesize 10000 set lines 120 set verify off col char_value format a10 heading "Endpoint|value" col bucket_count format 99,999,999 heading "bucket|count" col pct format 999.99 heading "Pct" col pct_of_max format a62 heading "Pct of|Max value" rem col endpoint_value format 9999999999999 heading "endpoint|value" CREATE OR REPLACE FUNCTION hexstr (p_number IN NUMBER) RETURN VARCHAR2 AS l_str LONG := TO_CHAR (p_number, 'fm' || RPAD ('x', 50, 'x')); l_return VARCHAR2 (4000); BEGIN WHILE (l_str IS NOT NULL) LOOP l_return := l_return || CHR (TO_NUMBER (SUBSTR (l_str, 1, 2), 'xx')); l_str := SUBSTR (l_str, 3); END LOOP; RETURN (SUBSTR (l_return, 1, 6)); END; / WITH hist_data AS ( SELECT endpoint_value,endpoint_actual_value, NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value, endpoint_number, endpoint_number, endpoint_number - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0) bucket_count FROM dba_tab_histograms JOIN dba_tab_col_statistics USING (owner, table_name,column_name) WHERE owner = '&owner' AND table_name = '&table' AND column_name = '&column' AND histogram='FREQUENCY') SELECT nvl(endpoint_actual_value,endpoint_value) endpoint_value , bucket_count, ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT, RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max FROM hist_data; WITH hist_data AS ( SELECT endpoint_value,endpoint_actual_value, NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value, endpoint_number, endpoint_number, endpoint_number - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0) bucket_count FROM dba_tab_histograms JOIN dba_tab_col_statistics USING (owner, table_name,column_name) WHERE owner = '&owner' AND table_name = '&table' AND column_name = '&column' AND histogram='FREQUENCY') SELECT hexstr(endpoint_value) char_value, bucket_count, ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT, RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max FROM hist_data ORDER BY endpoint_value;
使用该脚本,我们可以更直观的感受FREQUENCY直方图信息:
这里dbms_stats 包收集到的STATUS=’INVALID’ bucket count=9 percent = 0.04 ,可以和之后的10053 trace中的信息对比以下:
SQL> explain plan for select * from maclean where status='INVALID'; Explained. SQL> select * from table(dbms_xplan.display()); PLAN_TABLE_OUTPUT ------------------------------------- Plan hash value: 3087014066 ------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9 | 837 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| MACLEAN | 9 | 837 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_MACLEAN | 9 | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("STATUS"='INVALID')
以上可以看到只要收集了直方图CBO就会认识到满足STATUS=’INVALID’的cardnality很少 , 该条件具有良好的选择性 ,使用index range scan而非Full table scan。
我们进一步来看看有直方图情况下的10053 trace:
SQL> alter system flush shared_pool;
System altered.
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10053 trace name context forever ,level 1;
Statement processed.
SQL> explain plan for select * from maclean where status='INVALID';
Explained.
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for MACLEAN[MACLEAN]
Column (#10):
NewDensity:0.000199, OldDensity:0.000022 BktCnt:22640, PopBktCnt:22640, PopValCnt:2, NDV:2
这里的NewDensity= bucket_count / SUM(bucket_count) /2
Column (#10): STATUS(
AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.000199
Histogram: Freq #Bkts: 2 UncompBkts: 22640 EndPtVals: 2
Table: MACLEAN Alias: MACLEAN
Card: Original: 22640.000000 Rounded: 9 Computed: 9.00 Non Adjusted: 9.00
Access Path: TableScan
Cost: 85.30 Resp: 85.30 Degree: 0
Cost_io: 85.00 Cost_cpu: 10804625
Resp_io: 85.00 Resp_cpu: 10804625
Access Path: index (AllEqRange)
Index: IND_MACLEAN
resc_io: 2.00 resc_cpu: 20763
ix_sel: 0.000398 ix_sel_with_filters: 0.000398
Cost: 2.00 Resp: 2.00 Degree: 1
Best:: AccessPath: IndexRange
Index: IND_MACLEAN
Cost: 2.00 Degree: 1 Resp: 2.00 Card: 9.00 Bytes: 0
上例中我们手动指定收集2 bucket的直方图后CBO优化器才能做出正确的选择,那么岂不是要人工干预来收集列的统计信息,默认的dbms_stats.DEFAULT_METHOD_OPT方式不能为我们提供有效的直方图收集方式吗?
实际上dbms_stats的自动决定直方图的收集与否及收集的桶数受到col_usage$基本中列充当predicate的历史记录影像,关于col_usage$详见<了解你所不知道的SMON功能(四):维护col_usage$字典基表>。
假设在统计表上信息的dbms_stats存储过程执行之前,col_usage$中已经存有表上相关列充当predicate的记录,那么dbms_stats存储过程就会考虑为该列收集直方图信息, 如:
SQL> drop table maclean;
Table dropped.
SQL> create table maclean as select * from dba_objects;
Table created.
SQL> update maclean set status='INVALID' where owner='MACLEAN';
2 rows updated.
SQL> commit;
Commit complete.
SQL> create index ind_maclean on maclean(status);
Index created.
使用dbms_stats默认method_opt收集maclean表
SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');
PL/SQL procedure successfully completed.
@histogram.sql
Enter value for owner: SYS
old 12: WHERE owner = '&owner'
new 12: WHERE owner = 'SYS'
Enter value for table: MACLEAN
old 13: AND table_name = '&table'
new 13: AND table_name = 'MACLEAN'
Enter value for column: STATUS
old 14: AND column_name = '&column'
new 14: AND column_name = 'STATUS'
no rows selected
因为缺少col_usage$列使用信息,所以依然没有收集status列的直方图
declare
begin
for i in 1..500 loop
execute immediate ' alter system flush shared_pool';
DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
execute immediate 'select count(*) from maclean where status=''INVALID'' ' ;
end loop;
end;
/
PL/SQL procedure successfully completed.
SQL> select obj# from obj$ where name='MACLEAN';
OBJ#
----------
97215
SQL> select * from col_usage$ where OBJ#=97215;
OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------
97215 1 1 0 0 0 0 0 17-OCT-11
97215 10 499 0 0 0 0 0 17-OCT-11
SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');
PL/SQL procedure successfully completed.
@histogram.sql
Enter value for owner: SYS
Enter value for table: MACLEAN
Enter value for column: STATUS
Endpoint bucket Pct of
value count Pct Max value
---------- ----------- ------- --------------------------------------------------------------
INVALI 2 .04
VALIC3 5,453 99.96 *************************************************
一套11.1.0.7上的Oracle Application Object Library应用程序,在收集schema统计信息时出现了ORA-20001错误,具体错误日志如下:
SQL> exec fnd_stats.gather_schema_statistics('AP'); PL/SQL procedure successfully completed. SQL> show error No errors. ============================================ Concurrent request error Log ------------------------------------ **Starts**14-APR-2011 02:20:53 **Ends**14-APR-2011 04:40:43 ORA-0000: normal, successful completion +---------------------------------------------------------------------------+ Start of log messages from FND_FILE +---------------------------------------------------------------------------+ In GATHER_SCHEMA_STATS , schema_name= ALL percent= 10 degree = 8 internal_flag= NOBACKUP stats on table AQ$_WF_CONTROL_P is locked stats on table FND_CP_GSM_IPC_AQTBL is locked stats on table WF_NOTIFICATION_OUT is locked Error #1: ERROR: While GATHER_TABLE_STATS: object_name=AP.JE_FR_DAS_010***ORA-20001: invalid column name or duplicate columns/column groups/expressions in method_opt*** Error #2: ERROR: While GATHER_TABLE_STATS: object_name=AP.JE_FR_DAS_010_NEW***ORA-20001: invalid column name or duplicate columns/column groups/expressions in method_opt*** Error #3: ERROR: While GATHER_TABLE_STATS: object_name=AP.JG_ZZ_SYS_FORMATS_ALL_B***ORA-20001: invalid column name or duplicate columns/column groups/expressions in method_opt*** +---------------------------------------------------------------------------+ End of log messages from FND_FILE +---------------------------------------------------------------------------+
经确认该问题由bug 7601966: GATHER SCHEMA STATS ON AP SCHEMA FAILS WITH ORA-20001: INVALID COLUMN NAME 引起,可以通过follow文档<Gather Schema Statistics fails with Ora-20001 errors after 11G database upgrade [ID 781813.1]>解决该问题。
在Oracle 10g中DBMS_STATS包针对GATHER_TABLE/INDEX_STATS和DELETE_TABLE/INDEX_STATS等收集统计信息的存储过程提供了AUTO_INVALIDATE选项;
该参数允许用户指定是否让那些对统计信息有依存关系的游标失效,举例来说如果SQL游标涉及到的表,索引,列或固有对象的统计信息收到以上存储过程修改时,使用NO_INVALIDATE选项可以指定是否让这些受到影响的游标失效,何时失效。
NO_INVALIDATE选项可以有以下三种值:
-- no_invalidate - Do not invalide the dependent cursors if set to TRUE. -- The procedure invalidates the dependent cursors immediately -- if set to FALSE. -- Use DBMS_STATS.AUTO_INVALIDATE to have oracle decide when to -- invalidate dependend cursors. This is the default. The default -- can be changed using set_param procedure.
当统计信息为DBMS_STATS包所修改,新的尚未在共享池中缓存的游标将直接使用这些统计信息; 对于已经存在的共享池中游标缓存,我们无法在原始子游标的基础上更新它们的执行计划;这些旧的子游标将被新的参考最新统计信息的子游标替代,这个过程包含一次硬解析以便获得新的优化树和执行计划;换而言之传统的立即游标失效(Immediate Cursor Invalidation)就是在统计信息更新后立即导致原始子游标的失效,而我们所说的滚动游标失效(Rolling Cursor Invalidations)是在统计信息成功更新的前提下保证原始子游标不立即失效;设想如果系统中有一张业务相关表,一旦我们更新了该表的统计信息可能导致大量共享失效,短期内硬解析将十分频繁并占用大量cpu,而且很多时候我们并不期望执行计划有显著变化;为了防止dbms_stats包统计信息时不要越帮越忙,就可以考虑到使用NO_INVALIDATE选项。
我们来看看RCI的具体表现:
[Read more…]
What are the default parameter values ?
select dbms_stats.get_param('cascade') from dual; select dbms_stats.get_param('degree') from dual; select dbms_stats.get_param('estimate_percent') from dual; select dbms_stats.get_param('method_opt') from dual; select dbms_stats.get_param('no_invalidate') from dual; select dbms_stats.get_param('granularity') from dual; DEFAULT PARAMETER DBMS_STATS.AUTO_CASCADE NULL DBMS_STATS.AUTO_SAMPLE_SIZE FOR ALL COLUMNS SIZE AUTO DBMS_STATS.AUTO_INVALIDATE AUTO
Copyright © 2024 · Genesis Framework · WordPress · Log in