原帖是Itpub上的网友提出一个CBO为什么不走索引的问题, 该问题的演示如下:
SQL> create table maclean1 as select * from dba_objects;
Table created.
SQL> update maclean1 set status='INVALID' where owner='MACLEAN';
2 rows updated.
SQL> commit;
Commit complete.
SQL> create index ind_maclean1 on maclean1(status);
Index created.
SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN1',cascade=>true);
PL/SQL procedure successfully completed.
SQL> explain plan for select * from maclean1 where status='INVALID';
Explained.
SQL> set linesize 140 pagesize 1400
SQL> select * from table(dbms_xplan.display());
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 987568083
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 11320 | 1028K| 85 (0)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| MACLEAN1 | 11320 | 1028K| 85 (0)| 00:00:02 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("STATUS"='INVALID')
13 rows selected.
10053 trace
Access path analysis for MACLEAN1
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for MACLEAN1[MACLEAN1]
Column (#10): STATUS(
AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.500000
Table: MACLEAN1 Alias: MACLEAN1
Card: Original: 22639.000000 Rounded: 11320 Computed: 11319.50 Non Adjusted: 11319.50
Access Path: TableScan
Cost: 85.33 Resp: 85.33 Degree: 0
Cost_io: 85.00 Cost_cpu: 11935345
Resp_io: 85.00 Resp_cpu: 11935345
Access Path: index (AllEqRange)
Index: IND_MACLEAN1
resc_io: 185.00 resc_cpu: 8449916
ix_sel: 0.500000 ix_sel_with_filters: 0.500000
Cost: 185.24 Resp: 185.24 Degree: 1
Best:: AccessPath: TableScan
Cost: 85.33 Degree: 1 Resp: 85.33 Card: 11319.50 Bytes: 0
可以从以上10053中看到因为没有直方图存在,所以这里的Density = 0.5 是从 1/ NDV 算得的
也就意味着粗糙的统计信息显示STATUS='INVALID"的数据行占总行数的一半,
所以优化器选择做全表扫描是有道理的
以上符合”STATUS”=’INVALID’ condition的只有2行,且status列上建有索引,同时也使用了dbms_stats包收集表和索引上的统计信息,照理说CBO因该选择INDEX Range ind_maclean1,而避免全表扫描,但实际优化器opitimizer没有这样做。
实际上这个问题和统计信息收集时是否收集直方图有关系,只要收集了直方图,那么优化器就会了解到status=’INVALID’条件仅有少量的card满足,具有良好的选择性:
[oracle@vrh4 ~]$ sqlplus / as sysdba SQL*Plus: Release 11.2.0.2.0 Production on Mon Oct 17 19:15:45 2011 Copyright (c) 1982, 2010, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production PL/SQL Release 11.2.0.2.0 - Production CORE 11.2.0.2.0 Production TNS for Linux: Version 11.2.0.2.0 - Production NLSRTL Version 11.2.0.2.0 - Production SQL> show parameter optimizer_fea NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ optimizer_features_enable string 11.2.0.2 SQL> select * from global_name; GLOBAL_NAME -------------------------------------------------------------------------------- www.askmac.cn & www.askmac.cn SQL> drop table maclean; Table dropped. SQL> create table maclean as select * from dba_objects; Table created. SQL> update maclean set status='INVALID' where owner='MACLEAN'; 2 rows updated. SQL> commit; Commit complete. SQL> create index ind_maclean on maclean(status); Index created. SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 2'); PL/SQL procedure successfully completed.
这里我们仅收集2个bucket的直方图, 就足以让优化器做出正确选择了。
得益于Quest公司的Guy Harrison所写的一个列出FREQUENCY直方图信息的脚本,以下为该脚本:
rem rem Generate a histogram of data distribution in a column as recorded rem in dba_tab_histograms rem rem Guy Harrison Jan 2010 : www.guyharrison.net rem rem hexstr function is from From http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:707586567563 set pagesize 10000 set lines 120 set verify off col char_value format a10 heading "Endpoint|value" col bucket_count format 99,999,999 heading "bucket|count" col pct format 999.99 heading "Pct" col pct_of_max format a62 heading "Pct of|Max value" rem col endpoint_value format 9999999999999 heading "endpoint|value" CREATE OR REPLACE FUNCTION hexstr (p_number IN NUMBER) RETURN VARCHAR2 AS l_str LONG := TO_CHAR (p_number, 'fm' || RPAD ('x', 50, 'x')); l_return VARCHAR2 (4000); BEGIN WHILE (l_str IS NOT NULL) LOOP l_return := l_return || CHR (TO_NUMBER (SUBSTR (l_str, 1, 2), 'xx')); l_str := SUBSTR (l_str, 3); END LOOP; RETURN (SUBSTR (l_return, 1, 6)); END; / WITH hist_data AS ( SELECT endpoint_value,endpoint_actual_value, NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value, endpoint_number, endpoint_number, endpoint_number - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0) bucket_count FROM dba_tab_histograms JOIN dba_tab_col_statistics USING (owner, table_name,column_name) WHERE owner = '&owner' AND table_name = '&table' AND column_name = '&column' AND histogram='FREQUENCY') SELECT nvl(endpoint_actual_value,endpoint_value) endpoint_value , bucket_count, ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT, RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max FROM hist_data; WITH hist_data AS ( SELECT endpoint_value,endpoint_actual_value, NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value, endpoint_number, endpoint_number, endpoint_number - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0) bucket_count FROM dba_tab_histograms JOIN dba_tab_col_statistics USING (owner, table_name,column_name) WHERE owner = '&owner' AND table_name = '&table' AND column_name = '&column' AND histogram='FREQUENCY') SELECT hexstr(endpoint_value) char_value, bucket_count, ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT, RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max FROM hist_data ORDER BY endpoint_value;
使用该脚本,我们可以更直观的感受FREQUENCY直方图信息:
这里dbms_stats 包收集到的STATUS=’INVALID’ bucket count=9 percent = 0.04 ,可以和之后的10053 trace中的信息对比以下:
SQL> explain plan for select * from maclean where status='INVALID'; Explained. SQL> select * from table(dbms_xplan.display()); PLAN_TABLE_OUTPUT ------------------------------------- Plan hash value: 3087014066 ------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 9 | 837 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| MACLEAN | 9 | 837 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IND_MACLEAN | 9 | | 1 (0)| 00:00:01 | ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("STATUS"='INVALID')
以上可以看到只要收集了直方图CBO就会认识到满足STATUS=’INVALID’的cardnality很少 , 该条件具有良好的选择性 ,使用index range scan而非Full table scan。
我们进一步来看看有直方图情况下的10053 trace:
SQL> alter system flush shared_pool;
System altered.
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10053 trace name context forever ,level 1;
Statement processed.
SQL> explain plan for select * from maclean where status='INVALID';
Explained.
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for MACLEAN[MACLEAN]
Column (#10):
NewDensity:0.000199, OldDensity:0.000022 BktCnt:22640, PopBktCnt:22640, PopValCnt:2, NDV:2
这里的NewDensity= bucket_count / SUM(bucket_count) /2
Column (#10): STATUS(
AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.000199
Histogram: Freq #Bkts: 2 UncompBkts: 22640 EndPtVals: 2
Table: MACLEAN Alias: MACLEAN
Card: Original: 22640.000000 Rounded: 9 Computed: 9.00 Non Adjusted: 9.00
Access Path: TableScan
Cost: 85.30 Resp: 85.30 Degree: 0
Cost_io: 85.00 Cost_cpu: 10804625
Resp_io: 85.00 Resp_cpu: 10804625
Access Path: index (AllEqRange)
Index: IND_MACLEAN
resc_io: 2.00 resc_cpu: 20763
ix_sel: 0.000398 ix_sel_with_filters: 0.000398
Cost: 2.00 Resp: 2.00 Degree: 1
Best:: AccessPath: IndexRange
Index: IND_MACLEAN
Cost: 2.00 Degree: 1 Resp: 2.00 Card: 9.00 Bytes: 0
上例中我们手动指定收集2 bucket的直方图后CBO优化器才能做出正确的选择,那么岂不是要人工干预来收集列的统计信息,默认的dbms_stats.DEFAULT_METHOD_OPT方式不能为我们提供有效的直方图收集方式吗?
实际上dbms_stats的自动决定直方图的收集与否及收集的桶数受到col_usage$基本中列充当predicate的历史记录影像,关于col_usage$详见<了解你所不知道的SMON功能(四):维护col_usage$字典基表>。
假设在统计表上信息的dbms_stats存储过程执行之前,col_usage$中已经存有表上相关列充当predicate的记录,那么dbms_stats存储过程就会考虑为该列收集直方图信息, 如:
SQL> drop table maclean;
Table dropped.
SQL> create table maclean as select * from dba_objects;
Table created.
SQL> update maclean set status='INVALID' where owner='MACLEAN';
2 rows updated.
SQL> commit;
Commit complete.
SQL> create index ind_maclean on maclean(status);
Index created.
使用dbms_stats默认method_opt收集maclean表
SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');
PL/SQL procedure successfully completed.
@histogram.sql
Enter value for owner: SYS
old 12: WHERE owner = '&owner'
new 12: WHERE owner = 'SYS'
Enter value for table: MACLEAN
old 13: AND table_name = '&table'
new 13: AND table_name = 'MACLEAN'
Enter value for column: STATUS
old 14: AND column_name = '&column'
new 14: AND column_name = 'STATUS'
no rows selected
因为缺少col_usage$列使用信息,所以依然没有收集status列的直方图
declare
begin
for i in 1..500 loop
execute immediate ' alter system flush shared_pool';
DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
execute immediate 'select count(*) from maclean where status=''INVALID'' ' ;
end loop;
end;
/
PL/SQL procedure successfully completed.
SQL> select obj# from obj$ where name='MACLEAN';
OBJ#
----------
97215
SQL> select * from col_usage$ where OBJ#=97215;
OBJ# INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------
97215 1 1 0 0 0 0 0 17-OCT-11
97215 10 499 0 0 0 0 0 17-OCT-11
SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');
PL/SQL procedure successfully completed.
@histogram.sql
Enter value for owner: SYS
Enter value for table: MACLEAN
Enter value for column: STATUS
Endpoint bucket Pct of
value count Pct Max value
---------- ----------- ------- --------------------------------------------------------------
INVALI 2 .04
VALIC3 5,453 99.96 *************************************************