今天处理了这样一问题,where条件中存在函数fun(date)<to_date(‘9999-01-01′,’YYYY-MM-DD’)这样的无实际意义谓词,导致CBO计算基数时cardinality远小于实际情况,导致优化器认为2个源数据集的基数都不大,从而选择了HASH JOIN Right SEMI+SORT ORDER BY的执行计划,但是由于实际基数远大于computed 计算值所以变成了大的数据集做HASH JOIN并全数据排序,而实际该SQL只要求返回几十行数据而已,使用NESTED LOOP SEMI JOIN可以立即返回排序的前20行数据。
这里就需要解释带函数的谓词时CBO如何计算基数,我们通过下面的例子来说明:
create or replace function check_date( RDATE in date) return date is begin IF rdate< to_date('2099-01-01','YYYY-MM-DD') then return rdate; ELSIF rdate >=to_date('2099-01-01','YYYY-MM-DD') then return to_date('2000-01-01'); end if; end check_date; / SQL> select check_date (sysdate) from dual; CHECK_DAT --------- 06-DEC-12 drop table tab1; SQL> create table tab1 tablespace users as select * from dba_objects where rownum create view vtab1 as select object_id as id , object_name as name, object_type as type , check_date(created) cdata from tab1; View created. SQL> select count(distinct cdata) from vtab1; COUNT(DISTINCTCDATA) -------------------- 130 SQL> exec dbms_stats.gather_table_stats('','TAB1', method_opt=>'FOR ALL COLUMNS SIZE 254'); PL/SQL procedure successfully completed.
因为我们指定收集了直方图所以若直接以”created”为条件查询时可以获得较好的计算基数
SQL> select count(*) from tab1 where created >= to_date('0001-10-10','YYYY-MM-DD'); COUNT(*) ---------- 10000 Execution Plan ---------------------------------------------------------- Plan hash value: 1117438016 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 8 | 40 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 8 | | | |* 2 | TABLE ACCESS FULL| TAB1 | 10000 | 80000 | 40 (0)| 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("CREATED">=TO_DATE(' 0001-10-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 133 consistent gets 0 physical reads 0 redo size 526 bytes sent via SQL*Net to client 523 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed
在以上查询中>= to_date(‘0001-10-10′,’YYYY-MM-DD’); 这样的过滤条件实际无意义,在直接使用 “created”列作为谓词的情况下,CBO可以获得很好的基数10000。
SQL> select * from TABLE(dbms_xplan.display_cursor(NULL,NULL,'ALLSTATS LAST')); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- SQL_ID 6zy2k9dy4cv73, child number 0 ------------------------------------- select /*+ gather_plan_statistics */ count(*) from vtab1 where cdata >= to_date('0001-10-10','YYYY-MM-DD') Plan hash value: 1117438016 ------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.25 | 154 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.25 | 154 | |* 2 | TABLE ACCESS FULL| TAB1 | 1 | 500 | 10000 |00:00:00.31 | 154 | ------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("CHECK_DATE"("CREATED")>=TO_DATE(' 0001-10-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) 21 rows selected.
通过gather_plan_statistics HINT,我们得到E-Rows 即CBO评估的基数,和 A-Rows实际的基数,可以看到这里E-Rows=500, 即在谓词左边存在使用内部函数或隐身装换的情况下,CBO无法通过现有统计信息的DISTINCT、DENSITY和HISTOGRAM获得较好的Cardinality,其基数总是统计信息中表的总行数/20,如上例中的 10000/20=500。
这就会引入不少的麻烦,因为开发人员有时候为了方便会在视图字段中嵌入自定义的函数,之后若在查询中使用该字段作为谓词条件,则可能导致CBO为相应表计算的基数偏少,是本身应当成本非常高的执行计划的COST变低,而容易被优化器选择。
对于上述问题可选的常见方案是若有这样问题的SQL较少则考虑加HINT或者SQL PROFILE,若较多还是需要考虑减少这种谓词左边有函数的现象。
implicit data_type conversion functions in Filter Predicates. Review Execution Plans.
If Filter Predicatesinclude unexpected INTERNAL_FUNCTION to perform an implicit data_type conversion,
be sure it is not preventing a column from being used as an Access Predicate.