CBO为什么不走索引?

原帖是Itpub上的网友提出一个CBO为什么不走索引的问题, 该问题的演示如下:

 

SQL> create table maclean1 as select * from dba_objects;

Table created.

SQL> update maclean1 set status='INVALID' where owner='MACLEAN';

2 rows updated.

SQL> commit;                                       

Commit complete.

SQL> create index ind_maclean1 on maclean1(status);

Index created.

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN1',cascade=>true);

PL/SQL procedure successfully completed.

SQL> explain plan for select * from maclean1 where status='INVALID';

Explained.

SQL> set linesize 140 pagesize 1400
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 987568083

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          | 11320 |  1028K|    85   (0)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| MACLEAN1 | 11320 |  1028K|    85   (0)| 00:00:02 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='INVALID')

13 rows selected.

10053 trace 

Access path analysis for MACLEAN1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MACLEAN1[MACLEAN1]
  Column (#10): STATUS(
    AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.500000
  Table: MACLEAN1  Alias: MACLEAN1
    Card: Original: 22639.000000  Rounded: 11320  Computed: 11319.50  Non Adjusted: 11319.50
  Access Path: TableScan
    Cost:  85.33  Resp: 85.33  Degree: 0
      Cost_io: 85.00  Cost_cpu: 11935345
      Resp_io: 85.00  Resp_cpu: 11935345
  Access Path: index (AllEqRange)
    Index: IND_MACLEAN1
    resc_io: 185.00  resc_cpu: 8449916
    ix_sel: 0.500000  ix_sel_with_filters: 0.500000
    Cost: 185.24  Resp: 185.24  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 85.33  Degree: 1  Resp: 85.33  Card: 11319.50  Bytes: 0

可以从以上10053中看到因为没有直方图存在,所以这里的Density = 0.5 是从 1/ NDV 算得的
也就意味着粗糙的统计信息显示STATUS='INVALID"的数据行占总行数的一半,
所以优化器选择做全表扫描是有道理的

 

 

以上符合”STATUS”=’INVALID’ condition的只有2行,且status列上建有索引,同时也使用了dbms_stats包收集表和索引上的统计信息,照理说CBO因该选择INDEX Range ind_maclean1,而避免全表扫描,但实际优化器opitimizer没有这样做。

 

 

实际上这个问题和统计信息收集时是否收集直方图有关系,只要收集了直方图,那么优化器就会了解到status=’INVALID’条件仅有少量的card满足,具有良好的选择性:

 

[oracle@vrh4 ~]$ sqlplus  / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Mon Oct 17 19:15:45 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> show parameter optimizer_fea

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_features_enable            string      11.2.0.2

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn  & www.askmac.cn

SQL> drop table maclean;

Table dropped.

SQL>  create table maclean as select * from dba_objects;

Table created.

SQL> update maclean set status='INVALID' where owner='MACLEAN';

2 rows updated.

SQL>  commit;

Commit complete.

SQL> create index ind_maclean on maclean(status);

Index created.

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 2');

PL/SQL procedure successfully completed.

 

这里我们仅收集2个bucket的直方图, 就足以让优化器做出正确选择了。

得益于Quest公司的Guy Harrison所写的一个列出FREQUENCY直方图信息的脚本,以下为该脚本:

 

rem
rem Generate a histogram of data distribution in a column as recorded
rem  in dba_tab_histograms
rem
rem Guy Harrison Jan 2010 : www.guyharrison.net
rem
rem hexstr function is from From http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:707586567563 

set pagesize 10000
set lines 120
set verify off

col char_value format a10 heading "Endpoint|value"
col bucket_count format 99,999,999 heading "bucket|count"
col pct format 999.99 heading "Pct"
col pct_of_max format a62 heading "Pct of|Max value"
rem col endpoint_value format 9999999999999 heading "endpoint|value" 

CREATE OR REPLACE FUNCTION hexstr (p_number IN NUMBER)
    RETURN VARCHAR2
AS
    l_str      LONG := TO_CHAR (p_number, 'fm' || RPAD ('x', 50, 'x'));
    l_return   VARCHAR2 (4000);
BEGIN
    WHILE (l_str IS NOT NULL)
    LOOP
        l_return := l_return || CHR (TO_NUMBER (SUBSTR (l_str, 1, 2), 'xx'));
        l_str := SUBSTR (l_str, 3);
    END LOOP;

    RETURN (SUBSTR (l_return, 1, 6));
END;
/

WITH hist_data AS (
SELECT endpoint_value,endpoint_actual_value,
       NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value,
       endpoint_number,
       endpoint_number,
       endpoint_number
       - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0)
           bucket_count
FROM dba_tab_histograms
JOIN dba_tab_col_statistics USING (owner, table_name,column_name)
WHERE     owner = '&owner'
      AND table_name = '&table'
      AND column_name = '&column'
      AND histogram='FREQUENCY')
SELECT nvl(endpoint_actual_value,endpoint_value) endpoint_value ,
       bucket_count,
       ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT,
       RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max
  FROM hist_data;

WITH hist_data AS (
SELECT endpoint_value,endpoint_actual_value,
       NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value,
       endpoint_number,
       endpoint_number,
       endpoint_number
       - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0)
           bucket_count
FROM dba_tab_histograms
JOIN dba_tab_col_statistics USING (owner, table_name,column_name)
WHERE     owner = '&owner'
      AND table_name = '&table'
      AND column_name = '&column'
      AND histogram='FREQUENCY')
SELECT hexstr(endpoint_value) char_value,
       bucket_count,
       ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT,
       RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max
  FROM hist_data
ORDER BY endpoint_value;

 

使用该脚本,我们可以更直观的感受FREQUENCY直方图信息:

 

 

这里dbms_stats 包收集到的STATUS=’INVALID’ bucket count=9 percent = 0.04 ,可以和之后的10053 trace中的信息对比以下:

 

SQL> explain plan for select * from maclean where status='INVALID';

Explained.

SQL>  select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------
Plan hash value: 3087014066

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     9 |   837 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MACLEAN     |     9 |   837 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_MACLEAN |     9 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("STATUS"='INVALID')

以上可以看到只要收集了直方图CBO就会认识到满足STATUS=’INVALID’的cardnality很少 , 该条件具有良好的选择性 ,使用index range scan而非Full table scan。

我们进一步来看看有直方图情况下的10053 trace:

SQL> alter system flush shared_pool;

System altered.

SQL> oradebug setmypid;
Statement processed.

SQL> oradebug event 10053 trace name context forever ,level 1;
Statement processed.

SQL> explain plan for select * from maclean where status='INVALID';

Explained.

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MACLEAN[MACLEAN]
  Column (#10):
    NewDensity:0.000199, OldDensity:0.000022 BktCnt:22640, PopBktCnt:22640, PopValCnt:2, NDV:2

  这里的NewDensity= bucket_count / SUM(bucket_count) /2

   Column (#10): STATUS(
    AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.000199
    Histogram: Freq  #Bkts: 2  UncompBkts: 22640  EndPtVals: 2
  Table: MACLEAN  Alias: MACLEAN
 Card: Original: 22640.000000 Rounded: 9 Computed: 9.00 Non Adjusted: 9.00
  Access Path: TableScan
    Cost:  85.30  Resp: 85.30  Degree: 0
      Cost_io: 85.00  Cost_cpu: 10804625
      Resp_io: 85.00  Resp_cpu: 10804625
  Access Path: index (AllEqRange)
    Index: IND_MACLEAN
    resc_io: 2.00  resc_cpu: 20763
    ix_sel: 0.000398  ix_sel_with_filters: 0.000398
    Cost: 2.00  Resp: 2.00  Degree: 1
  Best:: AccessPath: IndexRange
  Index: IND_MACLEAN
         Cost: 2.00  Degree: 1  Resp: 2.00  Card: 9.00  Bytes: 0

 

上例中我们手动指定收集2 bucket的直方图后CBO优化器才能做出正确的选择,那么岂不是要人工干预来收集列的统计信息,默认的dbms_stats.DEFAULT_METHOD_OPT方式不能为我们提供有效的直方图收集方式吗?

实际上dbms_stats的自动决定直方图的收集与否及收集的桶数受到col_usage$基本中列充当predicate的历史记录影像,关于col_usage$详见<了解你所不知道的SMON功能(四):维护col_usage$字典基表>

 

假设在统计表上信息的dbms_stats存储过程执行之前,col_usage$中已经存有表上相关列充当predicate的记录,那么dbms_stats存储过程就会考虑为该列收集直方图信息, 如:

 

SQL> drop table maclean;

Table dropped.

SQL>  create table maclean as select * from dba_objects;

Table created.

SQL> update maclean set status='INVALID' where owner='MACLEAN';

2 rows updated.

SQL> commit;

Commit complete.

SQL> create index ind_maclean on maclean(status);

Index created.

使用dbms_stats默认method_opt收集maclean表

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');

PL/SQL procedure successfully completed.

@histogram.sql 

Enter value for owner: SYS
old  12:    WHERE owner = '&owner'
new  12:    WHERE owner = 'SYS'
Enter value for table: MACLEAN
old  13:      AND table_name = '&table'
new  13:      AND table_name = 'MACLEAN'
Enter value for column: STATUS
old  14:      AND column_name = '&column'
new  14:      AND column_name = 'STATUS'

no rows selected

因为缺少col_usage$列使用信息,所以依然没有收集status列的直方图

    declare
    begin
    for i in 1..500 loop
	execute immediate ' alter system flush shared_pool';
	DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
    execute immediate 'select count(*)  from maclean where status=''INVALID'' ' ;
    end loop;
    end;
    /

PL/SQL procedure successfully completed.

SQL> select obj# from obj$ where name='MACLEAN';

      OBJ#
----------
     97215
SQL> select * from  col_usage$ where  OBJ#=97215;

       OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------
     97215          1              1              0                 0           0          0          0 17-OCT-11
     97215         10            499              0                 0           0          0          0 17-OCT-11

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');

PL/SQL procedure successfully completed.

@histogram.sql 

Enter value for owner: SYS
Enter value for table: MACLEAN
Enter value for column: STATUS

Endpoint        bucket         Pct of
value            count     Pct Max value
---------- ----------- ------- --------------------------------------------------------------
INVALI               2     .04
VALIC3           5,453   99.96  *************************************************

Comments

  1. 赞一个,分析的漂亮。。

  2. 我的疑问是为什么执行了

    declare
    begin
    for i in 1..500 loop
    execute immediate ‘ alter system flush shared_pool’;
    DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
    execute immediate ‘select count(*) from maclean where status=”INVALID” ‘ ;
    end loop;
    end;
    /

    col_usage$ 里面就有关于 maclean 的相关信息了?我自己做的实验也是这样的。。
    对于第一次建立的表,如果col_usage$ 里面没有表的相关信息,收集表的统计信息也会无效。

    • admin says

      alter system flush shared_pool 每次hard parse 时才会在内存中生成col_usage$的数据

      DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO; 需要执行该过程将内存中的数据刷新到表上。 当然SMON后台进程也会每15分钟将shared pool中的predicate columns的数据刷新到col_usage$基表中(until periodically about every 15 minutes SMON flush the data into the data dictionary),另外当instance shutdown时SMON会扫描col_usage$并找出已被drop表的相关predicate columns记录,并删除这部分”orphaned”孤儿记录。

  3. SQL> create table maclean1 as select * from dba_objects;Table created.SQL> update maclean1 set status=’INVALID’ where owner=’MACLEAN’;0 rows updated.SQL> update maclean1 set status=’INVALID’ where owner=’DBDAO’;1 row updated.SQL> commit;Commit complete.SQL> select * from maclean1 where owner like ‘DBDAO%’;1条记录Execution Plan———————————————————-Plan hash value: 987568083——————————————————————————| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |——————————————————————————| 0 | SELECT STATEMENT | | 14 | 2898 | 339 (1)| 00:00:05 ||* 1 | TABLE ACCESS FULL| MACLEAN1 | 14 | 2898 | 339 (1)| 00:00:05 |——————————————————————————Predicate Information (identified by operation id):————————————————— 1 – filter(“OWNER” LIKE ‘DBDAO%’)Note—– – dynamic sampling used for this statement (level=2)Statistics———————————————————- 5 recursive calls 0 db block gets 1324 consistent gets 0 physical reads 0 redo size 1620 bytes sent via SQL*Net to client 519 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SQL> exec dbms_stats.gather_table_stats(user,’MACLEAN1′,method_opt=>’FOR ALL COLUMNS size 254′,estimate_percent=>100);PL/SQL procedure successfully completed.SQL> select * from maclean1 where owner like ‘DBDAO%’;OWNER——————————OBJECT_NAME——————————————————————————–SUBOBJECT_NAME OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE—————————— ———- ————– ——————-CREATED LAST_DDL_TIME TIMESTAMP STATUS T G S—————— —————— ——————- ——- – – – NAMESPACE EDITION_NAME———- ——————————DBDAODBDAO_PLSQL1 88936 88936 TABLEOWNER——————————OBJECT_NAME——————————————————————————–SUBOBJECT_NAME OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE—————————— ———- ————– ——————-CREATED LAST_DDL_TIME TIMESTAMP STATUS T G S—————— —————— ——————- ——- – – – NAMESPACE EDITION_NAME———- ——————————24-AUG-15 24-AUG-15 2015-08-24:20:28:06 INVALID N N N 1Execution Plan———————————————————-Plan hash value: 987568083——————————————————————————| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |——————————————————————————| 0 | SELECT STATEMENT | | 1 | 98 | 339 (1)| 00:00:05 ||* 1 | TABLE ACCESS FULL| MACLEAN1 | 1 | 98 | 339 (1)| 00:00:05 |——————————————————————————Predicate Information (identified by operation id):————————————————— 1 – filter(“OWNER” LIKE ‘DBDAO%’)Statistics———————————————————- 128 recursive calls 0 db block gets 1608 consistent gets 0 physical reads 0 redo size 1620 bytes sent via SQL*Net to client 519 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 97 sorts (memory) 0 sorts (disk) 1 rows processed

Comment

*

沪ICP备14014813号-2

沪公网安备 31010802001379号