【11g新特性】Cardinality Feedback基数反馈

Cardinality Feedback基数反馈是版本11.2中引入的关于SQL 性能优化的新特性,该特性主要针对 统计信息陈旧、无直方图或虽然有直方图但仍基数计算不准确的情况, Cardinality基数的计算直接影响到后续的JOIN COST等重要的成本计算评估,造成CBO选择不当的执行计划。以上是Cardinality Feedback特性引入的初衷。

 

Cardinality Feedback2

Cardinality Feedback1

 

但是每一个Oracle新版本引入的新特性 都被一些老外DBA称之为buggy ,Cardinality Feedback基数反馈多少也造成了一些麻烦,典型的情况是测试语句性能时,第一次的性能最好,之后再运行其性能变差。

 

我们来看一下 Cardinality Feedback基数反馈是如何作用的:
注意使用普通用户来测试Cardinality Feedback,sys用户被默认禁用该特性

 

 

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> conn maclean/oracle
已连接。

SQL> show parameter dynamic

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
optimizer_dynamic_sampling           integer                0

SQL> create table test as select * from dba_tables;

表已创建。

SQL> select /*+ gather_plan_statistics */ count(*) from test;

  COUNT(*)
----------
      2873

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
SQL_ID  0p4u1wqwg6t9z, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test

Plan hash value: 1950795681

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     104 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     104 |
|   2 |   TABLE ACCESS FULL| TEST |      1 |   8904 |   2873 |00:00:00.01 |     104 |
-------------------------------------------------------------------------------------

已选择14行。

SQL> select /*+ gather_plan_statistics */ count(*) from test;

  COUNT(*)
----------
      2873

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------
SQL_ID  0p4u1wqwg6t9z, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from test

Plan hash value: 1950795681

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |     104 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |     104 |
|   2 |   TABLE ACCESS FULL| TEST |      1 |   2873 |   2873 |00:00:00.01 |     104 |
-------------------------------------------------------------------------------------

Note
-----
   - cardinality feedback used for this statement

已选择18行。

 

 

上例中第一次运行时,由于未收集表上的统计信息且optimizer_dynamic_sampling=0 关闭了动态采样所以基数评估值(1)和实际值(2873)有着较大的差距。

 

 

cardinality feedback used for this statement这个信息说明第二次执行时使用了Cardinality Feedback基数反馈,且其基数评估也十分精确了,这是因为第二次执行时考虑到第一次执行时的基数反馈,我们来看看Oracle到底是如何做到的:

 

 

 

SQL> alter system flush shared_pool;

系统已更改。

SQL>
SQL> alter session set events '10053 trace name context forever, level 1';

会话已更改。

SQL>  select /*+ gather_plan_statistics */ count(*) from test;

  COUNT(*)
----------
      2873

SQL>  select /*+ gather_plan_statistics */ count(*) from test;

  COUNT(*)
----------
      2873

10053 trace:

第一次执行:

sql= select /*+ gather_plan_statistics */ count(*) from test
----- Explain Plan Dump -----
----- Plan Table -----

============
Plan Table
============
--------------------------------------+-----------------------------------+
| Id  | Operation           | Name    | Rows  | Bytes | Cost  | Time      |
--------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT    |         |       |       |    31 |           |
| 1   |  SORT AGGREGATE     |         |     1 |       |       |           |
| 2   |   TABLE ACCESS FULL | TEST    |  8904 |       |    31 |  00:00:01 |
--------------------------------------+-----------------------------------+

SELECT /*+ OPT_ESTIMATE (TABLE "TEST" ROWS=2873.000000 ) */ COUNT(*) "COUNT(*)" FROM "MACLEAN"."TEST" "TEST"

SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for TEST[TEST] 
  Table: TEST  Alias: TEST
    Card: Original: 8904.000000    >> Single Tab Card adjusted from:8904.000000 to:2873.000000
  Rounded: 2873  Computed: 2873.00  Non Adjusted: 8904.00
  Access Path: TableScan
    Cost:  31.10  Resp: 31.10  Degree: 0
      Cost_io: 31.00  Cost_cpu: 1991217
      Resp_io: 31.00  Resp_cpu: 1991217
  Best:: AccessPath: TableScan
         Cost: 31.10  Degree: 1  Resp: 31.10  Card: 2873.00  Bytes: 0

sql= select /*+ gather_plan_statistics */ count(*) from test
----- Explain Plan Dump -----
----- Plan Table -----

============
Plan Table
============
--------------------------------------+-----------------------------------+
| Id  | Operation           | Name    | Rows  | Bytes | Cost  | Time      |
--------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT    |         |       |       |    31 |           |
| 1   |  SORT AGGREGATE     |         |     1 |       |       |           |
| 2   |   TABLE ACCESS FULL | TEST    |  2873 |       |    31 |  00:00:01 |
--------------------------------------+-----------------------------------+

 

 

 

 

可以看到第二次执行时SQL最终转换加入了 OPT_ESTIMATE (TABLE “TEST” ROWS=2873.000000 )的HINT ,OPT_ESTIMATE HINT一般由 kestsaFinalRound()内核函数生成。该HINT用以纠正各种类型的优化器评估,例如某表上的基数或某个列的最大、最小值。反应出优化的不足或者BUG。

 

可以通过V$SQL_SHARED_CURSOR和来找出现有系统shared pool中仍存在的 使用了Cardinality Feedback基数反馈的子游标:

 

 

SQL> select sql_ID,USE_FEEDBACK_STATS  FROM V$SQL_SHARED_CURSOR where USE_FEEDBACK_STATS ='Y';

SQL_ID                     US
-------------------------- --
159sjt1f6khp2              Y

 

 

 

还可以使用cardinality HINT来强制使用Cardinality Feedback 。

select /*+ cardinality(test,  1) */ count(*) from test;

 

 

如何禁用Cardinality Feedback基数反馈

 

对于这些”惹火”特性,为了stable,往往考虑关闭该特性。

可以通过多种方法禁用该特性

1. 使用 _optimizer_use_feedback 隐藏参数

session 级别

SQL> alter session set “_optimizer_use_feedback”=false;

会话已更改。

system级别

SQL> alter system set “_optimizer_use_feedback”=false;

系统已更改。

 

2. 使用opt_param(‘_optimizer_use_feedback’ ‘false’) HINT

例如:

select /*+ opt_param(‘_optimizer_use_feedback’ ‘false’) cardinality(test,1) */ count(*) from test;

 

 

 

CBO Cost Formulas基于成本优化器的成本计算公式大全

CBO Cost Formulas成本计算公式大全:

成本模型:连接方式Join method

注意 连接基数(Join Cardinality)不受到连接方式(join method) 的影响, oracle中主要的三种join method HASH JOIN、Nested Loops、Sort Merge:

  • Nested Loops嵌套循环成本公式:
    • Cost(outer)+Cost(inner))*cardinality(outer)
  • Sort merge 合并排序连接成本公式:
    • Cost(outer) + Cost(inner) + Sort(outer) + Sort(inner)
  • Hash Join 哈希连接公式:
    • Cost(outer) + Cost(inner) + Build(outer) + Probe(inner)

 

Index Unique Scan Cost成本计算
INDEX UNIQUE SCAN COST = (BLEVEL (1-(OIC/100)) + 1) * (OICA/100)

 

Index Range Scan Cost成本计算
INDEX RANGE SCAN COST = (BLEVEL + FF*LFBL)*(1-(OIC/100))+ FF*CLUF)* (OICA/100)

 

 

formula does not include the CPU cost

  • BLEVEL = number of branch levels in index
  • add +1 for leaf block
  • FF = filtering factor – selectivity
  • LFBL = number of leaf blocks
  • CLUF = index clustering factor
  • OIC = optimizer_index_caching(default 0)
  • OICA = optimizer_index_cost_adj parameter(default=100)

 

CPU costing启用的情况下:

mreadtime -Average time , in milliseconds, for a multi-block read (according to sys.aux_stats$)

sreadtime – Average time , in milliseconds, for a single-block read (according to sys.aux_stats$)

MBRC – Average number of blocks to be read in a multi-block read (according to sys.aux_stats$

#SRDs – number of single block reads

#MRDs – number of multi block reads

#CPUCycles – number of CPU Cycles

sreadtime = ioseektim + db_block_size/iotfrspeed

mreadtim = ioseektim + db_file_multiblock_read_count * db_block_size / iotrfspeed

#MRds = #Blks/MBRC

Cost 成本本身 =(#SRds * sreadtim +#MRds * mreadtim + #CPUCycles/cpuspeed)/sreadtim ,  

Cost成本的单位 为 single-block read time=sreadtim

 

 

OSS Description

Provide a description of the component including how it will be built and what it will do, with a reference to the functional requirements (from the Functional Specification) that are being addressed.

Optimizer system statistics contains hardware characteristics. With OSS optimizer combines IO and CPU resources needed to execute query into single unit – estimated execution time.

OSS Components:

 

Component Initialization Maintenance Description Abbreviated component’s name (get_system_stats and set_system_stats)
CPU speed At system startup Gathering system stats with gathering_mode =  ‘NOWORKLOAD’ or ‘START’, ‘STOP’ or ‘INTERVAL’ or setting manually # CPU cycles per second cpuspeed
IO seek time At system startup Gathering system stats with gathering_mode =  ‘NOWORKLOAD’ or setting manually Seek time + latency time + OS overhead time ioseektim
IO transfer speed At system startup Gathering system stats with gathering_mode =  ‘NOWORKLOAD’ or setting manually Rate at which oracle can read data in the single read request iotfrspeed
Max IO throughput None Gathering system stats with gathering_mode =  ‘NOWORKLOAD’ or ‘START’, ‘STOP’ or ‘INTERVAL’ or setting manually This number characterizes maximum throughput (MB / sec) IO subsystem can deliver maxthr
Average slave IO throughput None Gathering system stats with gathering_mode =  ‘START’, ‘STOP’ or ‘INTERVAL’ or setting manually Average parallel slave IO throughput slavethr
Average Single Block Read Time None Gathering system stats with gathering_mode =  ‘START’, ‘STOP’ or ‘INTERVAL’ or setting manually sreadtim
Average Multi Block Read Time None Gathering system stats with gathering_mode =  ‘START’, ‘STOP’ or ‘INTERVAL’ or setting manually mreadtim
Average multi block read count None Gathering system stats with gathering_mode =  ‘START’, ‘STOP’ or ‘INTERVAL’ or setting manually mbrc

 

  1. cpuspeed, ioseektim, iotfrspeed are always collected
  2. maxthr, slavethr, sreadtim, mreadtim and mbrc collected only when user gathers workload statistics.

These two sets contain equivalent information. The difference is that A) does not relate to workload and B) does. At any moment of time only one set of OSS can be used.

 

 

OSS data located in data dictionary in the aux_stats$ table and in the SGA variable kkossga. aux_stats$ keeps persistent copy of the OSS. kkossga keeps working copy. Data in kkossga and  aux_stats$ is always synchronized. User can modify, delete and gather OSS through interface provided in the DBMS_STATS package.

 

OSS used to represent cost as query estimated running time (it’s implemented as #CPU cycles and # multi block reads conversion to # single block reads) and to adjust FTS cost for parallel reads.

SRead_Cost(#Cycles) = (#Cycles * (1 / CPUSpeed)) / sreadtim

SRead_Cost(#MReads) = (#MReads * mreadtim) / sreadtim

When no workload stats available optimizer uses NOWORKLOAD stats to compute sreadtim and mreadtim:

sreadtim = ioseektim + block_size / iotfrspeed

mreadtim = ioseektim + mbrc * block_size / iotfrspeed

Optimizer converts multi block reads to single block reads (even if cost formula looks elegant the actual processing has to support old days behavior and it causes that internally everything converted to single block reads)

aux_stats$

table aux_stats$ (

sname varchar2(30) not null,       /* Prefix */

pname varchar2(30) not null,       /* Name of parameter */

pval1 number,                      /* NUMBER parameter value */

pval2 varchar2(255)                /* VARCHAR2 parameter value */

)

This table stores OSS. It also used to store the intermediate values when analyzing workload.

sname column used to store global prefixes of the stats SYSSTATS_MAIN, SYSSTATS_TEMP and SYSSTATS_INFO.

if sname = SYSSTATS_MAIN then pname and pval1 columns store name-value pairs for data representing current stats:

cpuspeed (# cpu cycles per second) in millions;

ioseektim (Seek time + latency time + OS overhead time) in milliseconds;

iotfrspeed (IO transfer speed) in bytes/ second;

maxthr (maximum I/O system throughput) in bytes/sec;

slavethr (average slave throughput);

sreadtim (wait time to read single block) in milliseconds;

mreadtim (wait time to read a multiblock) in milliseconds;

mbrc (multiblock read count) in blocks;

if sname = SYSSTATS_TEMP then pname and pval1 columns store name-value pairs for intermediate data, generated than user issues DBMS_STAT.GATHER_SYSTEM_STATS procedure and removed then gathering completes.

if sname = SYSSTATS_INFO then pname, pval2 columns store name-values for current and intermediate stats:

DSTART – then gathering was started, format “MM-DD-YYYY HH:MI”

DSTOP – then gathering was (will be, had to be) finished format “MM-DD-YYYY HH:MI”

STATUS – ‘COMPLETED’, ‘AUTOGATHERING’, ‘MANUALGATHERING’, ‘INVALID’

Oracle CBO术语大集合

最近准备写点Histogram和density相关的文章,先把术语给大家理一理:

cardinality (CDN)
Legend
CBQT – cost-based query transformation
JPPD – join predicate push-down
OJPPD – old-style (non-cost-based) JPPD
FPD – filter push-down
PM – predicate move-around
CVM – complex view merging
SPJ – select-project-join
SJC – set join conversion
SU – subquery unnesting
OBYE – order by elimination
OST – old style star transformation
ST – new (cbqt) star transformation
CNT – count(col) to count(*) transformation
JE – Join Elimination
JF – join factorization
SLP – select list pruning
DP – distinct placement
qb – query block
LB – leaf blocks
DK – distinct keys
LB/K – average number of leaf blocks per key
DB/K – average number of data blocks per key
CLUF – clustering factor
NDV – number of distinct values
Resp – response cost
Card – cardinality
Resc – resource cost
NL – nested loops (join)
SM – sort merge (join)
HA – hash (join)
CPUSPEED – CPU Speed
IOTFRSPEED – I/O transfer speed
IOSEEKTIM – I/O seek time
SREADTIM – average single block read time
MREADTIM – average multiblock read time
MBRC – average multiblock read count
MAXTHR – maximum I/O system throughput
SLAVETHR – average slave I/O throughput
dmeth – distribution method
1: no partitioning required
2: value partitioned
4: right is random (round-robin)
128: left is random (round-robin)
8: broadcast right and partition left
16: broadcast left and partition right
32: partition left using partitioning of right
64: partition right using partitioning of left
256: run the join in serial
0: invalid distribution method
sel – selectivity
ptn – partition
adop Automatic degree of parallelism

TABLE: Table Name
ALIAS: Table Alias
QBS: Query Block Signature
#ROWS: Number of Rows
#BLKS: Number of Blocks
ARL: Average Row Length
COR: Cardinality Original
CRD: Cardinality Rounded
CCM: Cardinality Computed
CNA: Cardinality Non Adjusted

AVGLEN: Average Column Length
NDV: Number of Distinct Values
NULLS: Number of Nulls in Column
DEN: Column Density
MIN: Minimum Column Value
MAX: Maximum Column Value
TYPE: Histogram Type
#BKTS: Histogram Buckets
UNCOMPBKTS: Histogram Uncompressed Buckets   
ENDPTVALS: Histogram End Point Values 
OOR: Out-of-Range Predicate

TABLE: Table Name
ALIAS: Table Alias
INDEX: Index Name
QBS: Query Block Signature
LVLS: Index Levels
#LB: Number of Leaf Blocks
#DK: Number of Distinct Keys
LB/K: Average Number of Leaf Blocks Per Key
DB/K: Average Number of Data Blocks Per Key
CLUF: Clustering Factor
INDEX_COLS: Index Column Numbers

COST: Cost of the Join
CARD: Cardinality of the Join
BC: Best Cost
LINE#: Line Number in the 10053 Trace File Where Cost Value is Located
JOIN#: Join Number in the 10053 Trace File Associated With Key
STATUS: If Permutation was Computed for all Table Joins the Status = COMPL. If Not, status = ABORT
*: In ANY Column Indicates Value Not Found in File

Freq 频率直方图
HtBal 高度平衡直方图

 

 

关于 UNCOMPBKTS和ENDPTVALS

 

当直方图类型为frequency histograms( Histogram: Freq)时UncompBkts  等于统计信息中采样的总行数-NULLS(Card: Original- NULLS,因为dbms_stats默认是auto_sample_size采样,所以这栏其实是采样到的原始Card-NULLS), 而EndPtVals 等于bucket总数,或者说NDV,因为frequency histograms中 NDV=number of buckets 

当直方图类型为height balanced histograms (Histogram: HtBal) UncompBkts  等于bucket的数目(其实也等于10053 trace中#Bkts的数目),而EndPtVals 等于已经被压缩的Histogram的大小,其实是等于: select count(*) from dba_tab_histograms where table_name=’YOUR_TABLE_NAME’ and column_name=’YOUR_COLUMN_NAME’的实际总和。  通过这2个值对比,可以了解到popular值的多少以及数据的倾斜度, 是有多个大量重复的值(popular value)还是仅有一个巨大的重复值。

 

 

Know more about CBO Index Cost

近日偶读Joze Senegacnik(他是一名ACE)在OOW 2011上做的《Getting The Best From The Cost Based Optimizer》Presentation(这里可以下载),发现他总结的索引Unique Scan和Range Scan成本计算公式总结地很不错,贴出来共享:

 

Index Unique Scan Cost
INDEX UNIQUE SCAN COST = (BLEVEL (1-(OIC/100)) + 1) * (OICA/100)

 

Index Range Scan Cost
INDEX RANGE SCAN COST = (BLEVEL + FF*LFBL)*(1-(OIC/100))+ FF*CLUF)* (OICA/100)

 

 

formula does not include the CPU cost

  • BLEVEL = number of branch levels in index
  • add +1 for leaf block
  • FF = filtering factor – selectivity
  • LFBL = number of leaf blocks
  • CLUF = index clustering factor
  • OIC = optimizer_index_caching(default 0)
  • OICA = optimizer_index_cost_adj parameter(default=100)

解决Oracle中Split Partition缓慢的问题

有这样一个case , 用户的10g产品数据库中有一张按照月份划分的RANGE分区表 , 其范围为Less than (maxvalue)的最后一个Partition分区总是为空Empty的, 用户每隔半年会对这个MaxValue Partition 执行Split Partition 操作, 以便存放后续月份的数据, 同时这个表上还有一个非分区索引Nonpartitioned indexes。

 

满以为Split 这样一个Empry Partition会是一个回车就结束的任务, 但是Performance issue就在这平淡无奇的分区维护工作中出现了, 实际在执行”Alter Table Split partition Pn at (value) into …” 的DDL语句时,发现需要花费超过十多分钟才能完成一次Split。问题就在于,如果是有大量数据的Partition分区 , Split 操作慢一些也是正常的(expected预期内的) , 但是实际这里的Max Partition总是为空的,没有任何一条数据, 为什么Split 一个空的分区也会是这种蜗牛的速度呢?

 

我们来模拟这个现象, 首先创建一张分区表,Maxvalue的Partition是Empty的,且有一个普通索引:

 

 

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bi
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn & www.askmac.cn

SQL>   CREATE TABLE Maclean
  2     (    "OWNER" VARCHAR2(30),
  3          "OBJECT_NAME" VARCHAR2(128),
  4          "SUBOBJECT_NAME" VARCHAR2(30),
  5          "OBJECT_ID" NUMBER,
  6          "DATA_OBJECT_ID" NUMBER,
  7          "OBJECT_TYPE" VARCHAR2(19),
  8          "CREATED" DATE,
  9          "LAST_DDL_TIME" DATE,
 10          "TIMESTAMP" VARCHAR2(19),
 11          "STATUS" VARCHAR2(7),
 12          "TEMPORARY" VARCHAR2(1),
 13          "GENERATED" VARCHAR2(1),
 14          "SECONDARY" VARCHAR2(1)
 15     ) nologging
 16     partition by range(object_id)
 17     (partition p1 values less than (99999) tablespace users,
 18     partition p2 values less than (maxvalue) tablespace users);

Table created.

SQL> insert /*+ append */ into maclean select * from maclean1;

38736384 rows created.

SQL> commit; 

Commit complete.

SQL> create index ind_obj on maclean(DATA_OBJECT_ID,OBJECT_ID,LAST_DDL_TIME,TIMESTAMP,object_type,owner,status)
nologging parallel
  2  ;

Index created.

SQL> alter index ind_obj noparallel;

Index altered.

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',cascade=>true,degree=>2);

PL/SQL procedure successfully completed.

SQL> select num_rows from dba_tables where table_name='MACLEAN';

  NUM_ROWS
----------
  38818438

SQL>  select * from maclean partition (p2);

no rows selected

/* Maclean表有大量的数据 ,但是都在p1分区中, p2分区没有任何数据 */

 

 

我们执行Split partition 的DDL 语句,并使用10046 level 12 event监控该过程:

 

 oradebug setmypid;
 oradebug event 10046 trace name context forever,level 12;

SQL>  alter table maclean split partition p2 at (100001)
  2   into (partition p3, partition p4);

Table altered.

[oracle@vrh8 ~]$ tkprof /s01/admin/G10R21/udump/g10r21_ora_4896.trc g10r21_ora_4896.tkf

TKPROF: Release 10.2.0.1.0 - Production on Thu Nov 17 23:42:48 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

从tkf 文件中可以找出以下内容:

 alter table maclean split partition p2 at (100001)
 into (partition p3, partition p4)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.13       0.30         20       1139          0           0
Execute      1      0.01       0.18          3          6         33           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.14       0.48         23       1145         33           0

select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */ 1
from
 "SYS"."MACLEAN" PARTITION ("P2")  where ( (  (  ( "OBJECT_ID" < 100001 )  )
  ) ) and rownum < 2

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1     24.85      39.15     371276     385828          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3     24.85      39.15     371276     385828          0           0

Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: SYS   (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  COUNT STOPKEY (cr=385828 pr=371276 pw=0 time=39153836 us)
      0   TABLE ACCESS BY GLOBAL INDEX ROWID MACLEAN PARTITION: 2 2 (cr=385828 pr=371276 pw=0 time=39153817 us)
38736384    INDEX FULL SCAN IND_OBJ (cr=385828 pr=371276 pw=0 time=309891137 us)(object id 52832)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                    371276        0.08         19.46
  latch: cache buffers lru chain                  1        0.00          0.00

select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */ 1
from
 "SYS"."MACLEAN" PARTITION ("P2")  where ( (  (  ( "OBJECT_ID" >= 100001 OR
  "OBJECT_ID" IS NULL  )  )  ) ) and rownum < 2

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.00       0.00          0          3          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.00       0.00          0          3          0           0

Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: SYS   (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
      0  COUNT STOPKEY (cr=3 pr=0 pw=0 time=83 us)
      0   PARTITION RANGE SINGLE PARTITION: 2 2 (cr=3 pr=0 pw=0 time=63 us)
      0    TABLE ACCESS FULL MACLEAN PARTITION: 2 2 (cr=3 pr=0 pw=0 time=36 us)

 

 

可以看到在执行”Alter table Split partition”的时候该DDL 语句产生了另外2条递归SQL(recursive sql)被调用,即上例中红色标注的SQL语句,这2条递归SQL分别以 “OBJECT_ID” >= 100001 OR “OBJECT_ID” IS NULL  和  “OBJECT_ID” < 100001 作为条件查询P2分区中的数据, Split Partition的DDL需要使用这2条SQL来找出是否存在满足分隔条件过滤的数据(注意语句中有rownum<2 , 所以实际最多也只返回1条数据,Oracle这样来判定分隔条件的左端或右端是否有数据)。

 

但是这里可以看到,其中 以”OBJECT_ID” < 100001 作为条件的语句运行耗时39.15s, 产生了大量的逻辑和物理读, 究其原因是该SQL的执行计划采用了Index FULL SCAN ,而另外一条以 “OBJECT_ID” >= 100001 OR “OBJECT_ID” IS NULL 作为条件的递归SQL语句则采用了TABLE ACCESS FULL MACLEAN PARTITION, 因为实际P2分区中是没有任何数据的,所以后者运行时间是us级别的, 而前者所要扫描的是一整个没有分区的索引, 这产生了大量的”db file sequential read”等待事件, 我们再来看一下其执行计划:

 

 

SQL> explain plan for
  2  select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
  3   1
  4    from "SYS"."MACLEAN" PARTITION("P2")
  5   where (((("OBJECT_ID" < 100001))))
  6     and rownum < 2; 

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------------------
Plan hash value: 985826631

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |         |     1 |    13 |   380K  (1)| 01:16:01 |       |       |
|*  1 |  COUNT STOPKEY                      |         |       |       |            |          |       |       |
|   2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| MACLEAN |     1 |    13 |   380K  (1)| 01:16:01 |     2 |     2 |
|*  3 |    INDEX FULL SCAN                  | IND_OBJ |    38M|       |   380K  (1)| 01:16:01 |       |       |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

      1 - filter(ROWNUM<2)
   3 - access("OBJECT_ID"<100001)
       filter("OBJECT_ID"<100001)

 

注意以上递归SQL中的Hint “FIRST_ROWS PARALLEL(“MACLEAN”, 1)”是该Recursive SQL所固有的,换句话说是写死在Split Partition的Oracle代码层里的。

我们可以分析该Recursive SQL 采用INDEX FULL SCAN的原因可能有2种:
1. Split 所指定的分区的分区统计信息不准确,或者已经被清除。 注意一旦我们Split 某个分区后该分区原有统计信息将失效,且被清除。

如下面的例子:

 

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',cascade=>true,degree=>2);

PL/SQL procedure successfully completed.

SQL> col high_value for a20

SQL> select partition_name,high_value,num_rows,blocks from dba_tab_partitions where table_name='MACLEAN';

PARTITION_NAME                 HIGH_VALUE             NUM_ROWS     BLOCKS
------------------------------ -------------------- ---------- ----------
P1                             99999                  38789142     533240
P3                             100001                        0          0
P4                             MAXVALUE                      0          0

SQL>  alter table maclean split partition p4 at (100010)
  2   into (partition p5, partition p4);

这里我们再次Split 当前的MAXVALUE分区p4

SQL> select partition_name,high_value,num_rows,blocks from dba_tab_partitions where table_name='MACLEAN';

PARTITION_NAME                 HIGH_VALUE             NUM_ROWS     BLOCKS
------------------------------ -------------------- ---------- ----------
P1                             99999                  38789142     533240
P3                             100001                        0          0
P4                             MAXVALUE
P5                             100010

 

 

可以发现Split Partiton 会导致原Partiton的统计信息失效,即便使用dbms_stats.lock_table_stats锁住统计信息也无法避免这种失效。
且单个partiton的统计信息失效并不会导致动态采用的发生(dynamic sampling):

 

SQL> show parameter dyna 

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling           integer     2

SQL> select * from maclean partition (p4);

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3900731449

--------------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |  9699K|   860M| 29715   (2)| 00:05:57 |       |       |
|   1 |  PARTITION RANGE SINGLE|         |  9699K|   860M| 29715   (2)| 00:05:57 |     4 |     4 |
|   2 |   TABLE ACCESS FULL    | MACLEAN |  9699K|   860M| 29715   (2)| 00:05:57 |     4 |     4 |
--------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
        124  recursive calls
          0  db block gets
         23  consistent gets
          0  physical reads
          0  redo size
       1139  bytes sent via SQL*Net to client
        458  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL> alter system flush shared_pool;

System altered.

SQL> alter session set optimizer_dynamic_sampling=10;

Session altered.

SQL> select * from maclean partition (p4);

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3900731449

--------------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |  9699K|   860M| 29715   (2)| 00:05:57 |       |       |
| 1 | PARTITION RANGE SINGLE | | 9699K| 860M| 29715 (2)| 00:05:57 | 4 | 4 |
|   2 |   TABLE ACCESS FULL    | MACLEAN |  9699K|   860M| 29715   (2)| 00:05:57 |     4 |     4 |
--------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
       5812  recursive calls
          0  db block gets
       1141  consistent gets
         22  physical reads
          0  redo size
       1139  bytes sent via SQL*Net to client
        458  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
        139  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',partname=>'P4');

PL/SQL procedure successfully completed.

SQL> select * from maclean partition (p4);

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3900731449

--------------------------------------------------------------------------------------------------
| Id  | Operation              | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |         |     1 |   177 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|         |     1 |   177 |     2   (0)| 00:00:01 |     4 |     4 |
|   2 |   TABLE ACCESS FULL    | MACLEAN |     1 |   177 |     2   (0)| 00:00:01 |     4 |     4 |
--------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
       1139  bytes sent via SQL*Net to client
        458  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

 

 

由于Split Partition 本身会导致分区的统计信息失效, 这叫造成由该Split DDL所生成的递归查询SQL语句在解析时CBO Optimizer优化器无法了解该分区的必要统计信息,所以优化器会根据整张Table的统计信息来估算(根据算法来估,而没有实际的统计操作), 实际统计信息中整张表有38818438 行 , 且共有4个分区, 所以P4分区就顺利成长的拥有38818438 /4 = 9704k Rows了 , 实际上例中红色标注的估算值时9699k rows, 因为CBO 优化器得到的统计信息是该分区中有大量的数据 ,这导致其最终选择了 FULL INDEX SCAN的执行计划 ,而不是去扫描其实是空空如也的分区。

 

这是我们在 10g 中执行Split Partition 操作时需要密切注意的一个问题, 解决方法是没执行一次Split Partition DDL语句之前都收集一遍MaxValue Partiton 的统计信息, 因为该分区是空的 ,所以这种统计是十分迅速的:

 

exec dbms_stats.gather_table_stats('&OWNER','&TABNAME',partname=>'&PARNAME');

 

 

另一种手段则是在每次Split 之前都 手动修改Maxvalue分区的统计信息, 这样做会更快一些:

 

exec dbms_stats.set_table_stats(ownname => '&OWNER',tabname => '&TABNAME',partname => '&PARNAME',
numrows => 0,numblks => 0,force => TRUE);

 

 

2. 另一个原因是 相关的递归SQL语句 被嵌入了”First Rows”的hint , 该提示会让CBO更倾向于使用索引以便快速返回前几行的结果,注意因为这些递归SQL实际只要求返回一行结果,所以First Rows 可以说是恰当且明智的; 另外在分区表+本地分区的情景中, 即便这个分区是非空的且存有大量的数据, 那么使用索引都可以说是正确的选择。

 

但是在这里选择INDEX FULL SCAN 恰恰是不明智的, 很显然Oracle开发部门没有为分区表+ 非分区索引(Non-partitioned Indexes) 或 全局索引(global partitioned indexes)的场景考虑该First Rows提示可能带来的后果,  已知在版本10.2.0.2 和 10.2.0.3 上都存在该不恰当的递归SQL hint造成的Split Partiton性能问题,Bug Note<Bug 6435544: SPLIT PARTITION SLOW BECAUSE OF HINTED RECURSIVE SQL>说明了该问题:

 

Hdr: 6435544 10.2.0.2.0 RDBMS 10.2.0.2.0 QRY OPTIMIZER PRODID-5 PORTID-226
Abstract: SPLIT PARTITION SLOW BECAUSE OF HINTED RECURSIVE SQL

PROBLEM:
--------
Split partition operation took more than 45 minutes to complete. Almost all
the time is taken up by the following SQL -
select /*+ FIRST_ROWS PARALLEL("D_CUSTOMER_ORDER_ITEM_CHANNELS", 8) */ 1
from
 "BOOKER"."D_CUSTOMER_ORDER_ITEM_CHANNELS" PARTITION ("COIC101_MAX")
where ( ( ( ( "LEGAL_ENTITY_ID" < 101 )  )  OR  ( "LEGAL_ENTITY_ID" = 101 AND
 (
 "ORDER_DAY" < TO_DATE(' 2007-11-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS',
 'NLS_CALENDAR=GREGORIAN') )  )  ) ) and rownum < 2

DIAGNOSTIC ANALYSIS:
--------------------
a. Table is partitioned on - (LEGAL_ENTITY_ID, ORDER_DAY).
   Index PK_D_CUST_ORDER_ITEM_CHANNELS is on -
       (CUSTOMER_ORDER_ITEM_ID, MARKETPLACE_ID, LEGAL_ENTITY_ID, ORDER_DAY)

b. Table has 555 partitions. Index is a global index. 

c. From the tkprof output -

call   count       cpu    elapsed       disk      query  current  rows
----- ------  -------- ---------- ---------- ---------- --------  ----
Parse      1      0.00       0.00          0          0        0     0
Execute    1      0.00       0.00          0          0        0     0
Fetch      1    405.50    2796.28    1408207    1444973        0     0
----- ------  -------- ---------- ---------- ---------- --------  ----
total      3    405.50    2796.28    1408207    1444973        0     0

Misses in library cache during parse: 1
Optimizer mode: FIRST_ROWS
Parsing user id: 108     (recursive depth: 1)

Rows     Row Source Operation
-------  ---------------------------------------------------
time=6 us)(object id 1160811)

d. Although we have partition_name in the query, we go for INDEX FULL scan
because of hard coded hint FIRST_ROWS. A FTS of the concerned partition would
be much faster in this case.

WORKAROUND:
-----------
Drop the global index which would force an FTS on a single partition.

RELEASE NOTES:
]] Poor performance was possible for ALTER TABLE SPLIT PARTITION if there
]] was an index on the partition key.
REDISCOVERY INFORMATION:
If you get poor performance for an ALTER TABLE SPLIT PARTITION operation
and there is an index on the partition key, you may be hitting this bug.
WORKAROUND:
None

 

 

Metalink给出的Workaround方案是将分区表上的global index 全局索引drop 掉, 这样可以让CBO只能选择对single partition的FULL TABLE SCAN。

 

实际上这个Solution并很不能让人满意,在产品环境中我们不可能随意drop掉某张关键表上的索引,所以这个solution的可操作性很低。

 

补充:我们来看一下First Rows Hint 在CBO计算成本时如何起作用的:

 

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10053 trace name context forever,level 1;
Statement processed.
SQL> explain plan for select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
  2   1
  3    from "SYS"."MACLEAN" PARTITION("P4")
  4   where (((("OBJECT_ID" < 100010))))
  5     and rownum < 2; Explained. SQL> oradebug tracefile_name;

10053 trace content ================================================

  *************************************
  PARAMETERS WITH ALTERED VALUES
  ******************************
  optimizer_mode_hinted               = true
  optimizer_mode                      = first_rows

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: MACLEAN  Alias: MACLEAN  Partition [3]
    #Rows: 0  #Blks:  1  AvgRowLen:  0.00
    #Rows: 0  #Blks:  1  AvgRowLen:  0.00
Index Stats::
  Index: IND_OBJ  Col#: 5 4 8 9 6 1 10
    LVLS: 3  #LB: 380544  #DK: 50447  LB/K: 7.00  DB/K: 777.00  CLUF: 39208542.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#4): OBJECT_ID(NUMBER)  Part#: 3
    AvgLen: 22.00 NDV: 0 Nulls: 0 Density: 0.0000e+00 Min: 0 Max: 0
  Column (#4): OBJECT_ID(NUMBER)
    AvgLen: 22.00 NDV: 0 Nulls: 0 Density: 0.0000e+00 Min: 0 Max: 0
  Table: MACLEAN  Alias: MACLEAN     
    Card: Original: 0  Rounded: 1  Computed: 0.00  Non Adjusted: 0.00
  Access Path: TableScan
    Cost:  2.00  Resp: 2.00  Degree: 0
      Cost_io: 2.00  Cost_cpu: 7121
      Resp_io: 2.00  Resp_cpu: 7121
kkofmx: index filter:"MACLEAN"."OBJECT_ID"<100010 AND ROWNUM<2
  Access Path: index (skip-scan)
    SS sel: 0.0000e+00  ANDV (#skips): 4073
    SS io: 32584.00 vs. table scan io: 2.00
    Skip Scan rejected
  Access Path: index (FullScan)
    Index: IND_OBJ
    resc_io: 380547.00  resc_cpu: 10551751028
    ix_sel: 1  ix_sel_with_filters: 1
    Cost: 382007.38  Resp: 382007.38  Degree: 1
  Best:: AccessPath: IndexRange  Index: IND_OBJ
         Cost: 382007.38  Degree: 1  Resp: 382007.38  Card: 0.00  Bytes: 0
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
***********************
Join order[1]:  MACLEAN[MACLEAN]#0
***********************
Best so far: Table#: 0  cost: 382007.3822  card: 0.0000  bytes: 13
(newjo-stop-1) k:0, spcnt:0, perm:1, maxperm:80000
*********************************
Number of join permutations tried: 1
*********************************
prefetching is on for IND_OBJ
Final - First Rows Plan:  Best join order: 1
  Cost: 382007.3822  Degree: 1  Card: 1.0000  Bytes: 13
  Resc: 382007.3822  Resc_io: 380547.0000  Resc_cpu: 12512178128
  Resp: 382007.3822  Resp_io: 380547.0000  Resc_cpu: 12512178128
kkoipt: Query block SEL$1 (#0)
******* UNPARSED QUERY IS *******
SELECT /*+ FIRST_ROWS NO_PARALLEL ("MACLEAN") */ 1 "1" FROM "SYS"."MACLEAN" PARTITION ("P4") 
"MACLEAN" WHERE ROWNUM<2 AND "MACLEAN"."OBJECT_ID"<100010
kkoqbc-end
          : call(in-use=46464, alloc=49080), compile(in-use=39288, alloc=40552)
apadrv-end: call(in-use=46464, alloc=49080), compile(in-use=40072, alloc=40552)

sql_id=ff1ft3uxsq105.
Current SQL statement for this session:
explain plan for select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
 1
  from "SYS"."MACLEAN" PARTITION("P4")
 where (((("OBJECT_ID" < 100010))))
   and rownum < 2

============
Plan Table
============
-------------------------------------------------------+-----------------------------------+---------------+
| Id  | Operation                            | Name    | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |
-------------------------------------------------------+-----------------------------------+---------------+
| 0   | SELECT STATEMENT                     |         |       |       |  373K |           |       |       |
| 1   |  COUNT STOPKEY                       |         |       |       |       |           |       |       |
| 2   |   TABLE ACCESS BY GLOBAL INDEX ROWID | MACLEAN |     1 |    13 |  373K |  01:16:25 | 4     | 4     |
| 3   |    INDEX FULL SCAN                   | IND_OBJ |   37M |       |  373K |  01:16:25 |       |       |
-------------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
1 - filter(ROWNUM<2)
3 - access("OBJECT_ID"<100010)
3 - filter("OBJECT_ID"<100010)

Content of other_xml column
===========================
  db_version     : 10.2.0.1
  parse_schema   : SYS
  plan_hash      : 985826631
  Outline Data:
  /*+
    BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('10.2.0.1')
      FIRST_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX(@"SEL$1" "MACLEAN"@"SEL$1" ("MACLEAN"."DATA_OBJECT_ID" "MACLEAN"."OBJECT_ID"
"MACLEAN"."LAST_DDL_TIME" "MACLEAN"."TIMESTAMP" "MACLEAN"."OBJECT_TYPE" "MACLE
AN"."OWNER" "MACLEAN"."STATUS"))
    END_OUTLINE_DATA
  */

 

 

可以看到虽然INDEX FULL SCAN的成本(cost:382007)大于 Access Path: TableScan (cost : 2.00) 很多 ,但是optimizer 最终仍然选择了Index Full Scan , 因为其是满足First Rows 要求的执行计划(红色标注部分)。

 

于是我开始自己寻找workaround的路径,目标是让优化器忽略”First Rows”的影响。我一开始寄望于能够通过设置一些影响CBO计算cost的优化器参数来让optimizer 迷途知返, 包括设置optimizer_index_cost_adj和”_db_file_optimizer_read_count”的值到一个很大水平,但发现并不起作用:

 

SQL> alter session set "_db_file_optimizer_read_count"=65535;

Session altered.

SQL>  alter session set optimizer_index_cost_adj=10000;

Session altered.

SQL> alter system flush shared_pool;

System altered.

SQL> select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
  2   1
  3    from "SYS"."MACLEAN" PARTITION("P4")
  4   where (((("OBJECT_ID" < 100010))))
  5     and rownum < 2;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 985826631

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |         |     1 |    13 |    38M  (1)|127:20:09 |       |       |
|*  1 |  COUNT STOPKEY                      |         |       |       |            |          |       |       |
|   2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| MACLEAN |     1 |    13 |    38M  (1)|127:20:09 |     4 |     4 |
|*  3 |    INDEX FULL SCAN                  | IND_OBJ |    39M|       |    38M  (1)|127:20:09 |       |       |
---------------------------------------------------------------------------------------------------------------

 

得益于好奇心,我以’%optimizer%ignore’的Like语句去查了Oracle的隐式参数表,果然有志者事竟成,最终有所斩获:

 

SQL> col name for a40
SQL> col value for a20
SQL> col describ for a60
SQL> set linesize 200 pagesize 1400
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm LIKE '%optimizer%ignore%';

NAME                                     VALUE                DESCRIB
---------------------------------------- -------------------- ---------------------
_optimizer_ignore_hints                  TRUE                 enables the embedded hints to be ignored

 

 

在metalink上查了下没有关于该”_optimizer_ignore_hints” 隐式参数的任何有用信息, 就注释来看是可以启用是否忽略SQL中嵌入的HINT提示信息, 我们来具体看以下是否其作用:

 

 

SQL> alter system flush shared_pool;

System altered.

SQL> explain plan for select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
  2   1
  3    from "SYS"."MACLEAN" PARTITION("P4")
  4   where (((("OBJECT_ID" < 100010))))
  5     and rownum < 2; 

Explained. 

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
---------------------------------------------
Plan hash value: 985826631

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |         |     1 |    13 |    38M  (1)|127:20:09 |       |       |
|*  1 |  COUNT STOPKEY                      |         |       |       |            |          |       |       |
|   2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| MACLEAN |     1 |    13 |    38M  (1)|127:20:09 |     4 |     4 |
|*  3 |    INDEX FULL SCAN                  | IND_OBJ |    39M|       |    38M  (1)|127:20:09 |       |       |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<2)
   3 - access("OBJECT_ID"<100010)
       filter("OBJECT_ID"<100010)

SQL> alter session set "_optimizer_ignore_hints"=true;

Session altered.

SQL> oradebug setmypid;
Statement processed.

SQL> oradebug event 10053 trace name context forever , level 1;
Statement processed.

SQL> explain plan for select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
  2   1
  3    from "SYS"."MACLEAN" PARTITION("P4")
  4   where (((("OBJECT_ID" < 100010))))
  5     and rownum < 2; Explained. SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-----------------------------------------------
Plan hash value: 4280157877

---------------------------------------------------------------------------------------------------
| Id  | Operation               | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |         |     1 |    13 |     2   (0)| 00:00:01 |       |       |
|*  1 |  COUNT STOPKEY          |         |       |       |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|         |     1 |    13 |     2   (0)| 00:00:01 |     4 |     4 |
|*  3 |    TABLE ACCESS FULL    | MACLEAN |     1 |    13 |     2   (0)| 00:00:01 |     4 |     4 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<2)
   3 - filter("OBJECT_ID"<100010)

 

惊喜! 该 “_optimizer_ignore_hints”参数确实起到无视HINT提示的作用, 且可以直接在session级别修改,十分方便, 我们透过10053 event来观察该参数是如何其作用的:

 

***************************************
PARAMETERS USED BY THE OPTIMIZER
********************************
  *************************************
  PARAMETERS WITH ALTERED VALUES
  ******************************
  _optimizer_ignore_hints             = true

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: MACLEAN  Alias: MACLEAN  Partition [3]
    #Rows: 0  #Blks:  1  AvgRowLen:  0.00
    #Rows: 0  #Blks:  1  AvgRowLen:  0.00
Index Stats::
  Index: IND_OBJ  Col#: 5 4 8 9 6 1 10
    LVLS: 3  #LB: 380544  #DK: 50447  LB/K: 7.00  DB/K: 777.00  CLUF: 39208542.00
***************************************
SINGLE TABLE ACCESS PATH
  Column (#4): OBJECT_ID(NUMBER)  Part#: 3
    AvgLen: 22.00 NDV: 0 Nulls: 0 Density: 0.0000e+00 Min: 0 Max: 0
  Column (#4): OBJECT_ID(NUMBER)
    AvgLen: 22.00 NDV: 0 Nulls: 0 Density: 0.0000e+00 Min: 0 Max: 0
  Table: MACLEAN  Alias: MACLEAN
    Card: Original: 0  Rounded: 1  Computed: 0.00  Non Adjusted: 0.00
  Access Path: TableScan
    Cost:  2.00  Resp: 2.00  Degree: 0
      Cost_io: 2.00  Cost_cpu: 7121
      Resp_io: 2.00  Resp_cpu: 7121
kkofmx: index filter:"MACLEAN"."OBJECT_ID"<100010 AND ROWNUM<2
  Access Path: index (skip-scan)
    SS sel: 0.0000e+00  ANDV (#skips): 4073
    SS io: 32584.00 vs. table scan io: 2.00
    Skip Scan rejected
  Access Path: index (FullScan)
    Index: IND_OBJ
    resc_io: 380547.00  resc_cpu: 10551751028
    ix_sel: 1  ix_sel_with_filters: 1
    Cost: 382007.38  Resp: 382007.38  Degree: 1
  Best:: AccessPath: TableScan
***************************************
OPTIMIZER STATISTICS AND COMPUTATIONS
***************************************
GENERAL PLANS
***************************************
Considering cardinality-based initial join order.
***********************
Join order[1]:  MACLEAN[MACLEAN]#0
***********************
Best so far: Table#: 0  cost: 2.0008  card: 0.0000  bytes: 13
(newjo-stop-1) k:0, spcnt:0, perm:1, maxperm:80000
*********************************
Number of join permutations tried: 1
*********************************
Final - All Rows Plan:  Best join order: 1
  Cost: 2.0008  Degree: 1  Card: 1.0000  Bytes: 13
  Resc: 2.0008  Resc_io: 2.0000  Resc_cpu: 7121
  Resp: 2.0008  Resp_io: 2.0000  Resc_cpu: 7121

Current SQL statement for this session:
explain plan for select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */
 1
  from "SYS"."MACLEAN" PARTITION("P4")
 where (((("OBJECT_ID" < 100010))))
   and rownum < 2

============
Plan Table
============
-------------------------------------------+-----------------------------------+---------------+
| Id  | Operation                | Name    | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |
-------------------------------------------+-----------------------------------+---------------+
| 0   | SELECT STATEMENT         |         |       |       |     2 |           |       |       |
| 1   |  COUNT STOPKEY           |         |       |       |       |           |       |       |
| 2   |   PARTITION RANGE SINGLE |         |     1 |    13 |     2 |  00:00:01 | 4     | 4     |
| 3   |    TABLE ACCESS FULL     | MACLEAN |     1 |    13 |     2 |  00:00:01 | 4     | 4     |
-------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
1 - filter(ROWNUM<2)
3 - filter("OBJECT_ID"<100010)

Content of other_xml column
===========================
  db_version     : 10.2.0.1
  parse_schema   : SYS
  plan_hash      : 4280157877
  Outline Data:
  /*+
    BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('10.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "MACLEAN"@"SEL$1")
    END_OUTLINE_DATA
  */

Dumping Hints
=============
  atom_hint=(@=0x988453c0 err=20 resol=1 used=0 token=453 org=1 lvl=2 txt=FIRST_ROWS )
  atom_hint=(@=0x988451f8 err=0 resol=1 used=1 token=177 org=1 lvl=3 txt=NO_PARALLEL ("MACLEAN") )
********** WARNING: SOME HINTS HAVE ERRORS *********

 

实际 “_optimizer_ignore_hints”参数是起到的作用是使SQL 隐式地回归到默认的optimizer_mode=ALL_ROWS上来,  我们终于战胜了嵌入在SQL语句中的”First Rows” Hint 。

 

总结

在针对10g 多个版本上的Split Partition 可能因 其DDL所附带的递归SQL 使用了 固定的 而又不恰当的”First rows Hint” 提示而造成语句执行缓慢的问题, 我们可以通过 以下3个方案解决:

A.  通过每次执行Split之前都收集指定分区的统计,并设置”_optimizer_ignore_hints” =true 来规避 分区统计信息失效和 不恰当的”First Rows hint” 所可能带来的问题, 方法如下:

exec dbms_stats.gather_table_stats('&OWNER','&TABNAME',partname=>'&PARNAME');

alter session set "_optimizer_ignore_hints"=true;

推荐使用方案A

补充:

有同学反映:

”隐藏参数_optimizer_ignore_hints在分区表的split操作中并未启到作用。

从我的测试结果来看,虽然导致split操作慢的根源是FIRST_ROWS优化器模式下的分区表select语句:

select /*+ FIRST_ROWS PARALLEL(“MACLEAN”, 1) */ 1
from
 MACLEAN PARTITION (“P4”)  where ( (  (  (
  “OBJECT_ID” < ‘1000010’ )  )  ) ) and rownum < 2;

不过以上的sql语句在_optimizer_ignore_hints参数调整后仍使用FIRST_ROWS的优化器模式,即没有生效。

我觉得这可能与oracle在执行自己内部命令时,会忽略该参数的设置。(直接执行select的sql语句确实会忽略hint)”

 

实际通过10053 事件追踪该SPLIT PARTITION DDL语句所生产的递归SQL语句,发现带有”FIRST_ROWS”提示的SELECT语句甚至没有解析的过程, 很有可能是该递归SQL语句直接使用了内部硬编码的存储大纲OUTLINES所导致的。

 

换句话说之前_optimizer_ignore_hints的隐式参数针对我们手动执行的带有FIRST_ROWS HINT的SELECT语句是有效的,而对于DDL所附带的递归SQL无效。

 

注:实际在对嵌入了HINT的非recursive SQL语句做调优时,若我们无法修改该SQL的HINT,则还是可以利用到”_optimizer_ignore_hints”的。

 

想了一下可以通过设置较旧的优化器特性了解决该问题(alter session set optimizer_features_enable=’8.0.0′;),该optimizer_features_enable参数同样可以在session级别设置, 如:

 

TEST A:

SQL> set timing on;

SQL> oradebug setmypid;
Statement processed.

SQL> oradebug event 10046 trace name context forever,level 12;
Statement processed.

SQL> oradebug tracefile_name;
/s01/admin/G10R21/udump/g10r21_ora_13646.trc

SQL> alter table maclean split partition p4 at (10000081) into (partition p14, partition p4);

Table altered.

Elapsed: 00:00:42.50

TEST B:

SQL> set timing on;

SQL> oradebug setmypid;
Statement processed.

SQL> oradebug tracefile_name;
/s01/admin/G10R21/udump/g10r21_ora_13656.trc

SQL> alter session set optimizer_features_enable='8.0.0';

Session altered.

Elapsed: 00:00:00.01

SQL> alter table maclean split partition p4 at (10000091) into (partition p15, partition p4);

Table altered.

Elapsed: 00:00:00.05

PARSING IN CURSOR #2 len=152 dep=1 uid=0 oct=3 lid=0 tim=1291531645417871 hv=2124209225 ad='a74b41f0'

select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */ 1 from "SYS"."MACLEAN" PARTITION ("P4") 
where ( (  (  ( "OBJECT_ID" < 10000091 )  )  ) ) and rownum < 2
END OF STMT
PARSE #2:c=1000,e=684,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=2,tim=1291531645417864
BINDS #2:
EXEC #2:c=0,e=71,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=2,tim=1291531645417984
FETCH #2:c=0,e=92,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=2,tim=1291531645418094
STAT #2 id=1 cnt=0 pid=0 pos=1 obj=0 op='COUNT STOPKEY (cr=3 pr=0 pw=0 time=122 us)'
STAT #2 id=2 cnt=0 pid=1 pos=1 obj=0 op='PARTITION RANGE SINGLE PARTITION: 13 13 (cr=3 pr=0 pw=0 time=95 us)'
STAT #2 id=3 cnt=0 pid=2 pos=1 obj=52809 op='TABLE ACCESS FULL MACLEAN PARTITION: 13 13 (cr=3 pr=0 pw=0 time=59 us)'
=====================

PARSING IN CURSOR #2 len=177 dep=1 uid=0 oct=3 lid=0 tim=1291531645418799 hv=339345368 ad='a74cedd8'
select /*+ FIRST_ROWS PARALLEL("MACLEAN", 1) */ 1 from "SYS"."MACLEAN" PARTITION ("P4")
 where ( (  (  ( "OBJECT_ID" >= 10000091 OR "OBJECT_ID" IS NULL  )  )  ) ) and
rownum < 2
END OF STMT
PARSE #2:c=1000,e=589,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=2,tim=1291531645418792
BINDS #2:
EXEC #2:c=0,e=67,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=2,tim=1291531645418908
FETCH #2:c=0,e=66,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=2,tim=1291531645418992
STAT #2 id=1 cnt=0 pid=0 pos=1 obj=0 op='COUNT STOPKEY (cr=3 pr=0 pw=0 time=92 us)'
STAT #2 id=2 cnt=0 pid=1 pos=1 obj=0 op='PARTITION RANGE SINGLE PARTITION: 13 13 (cr=3 pr=0 pw=0 time=64 us)'
STAT #2 id=3 cnt=0 pid=2 pos=1 obj=52809 op='TABLE ACCESS FULL MACLEAN PARTITION: 13 13 (cr=3 pr=0 pw=0 time=36 us)'
WAIT #1: nam='control file sequential read' ela= 31 file#=0 block#=1 blocks=1 obj#=-1 tim=1291531645419383
WAIT #1: nam='control file sequential read' ela= 12 file#=1 block#=1 blocks=1 obj#=-1 tim=1291531645419432
WAIT #1: nam='control file sequential read' ela= 23 file#=0 block#=15 blocks=1 obj#=-1 tim=1291531645419494
WAIT #1: nam='control file sequential read' ela= 10 file#=0 block#=17 blocks=1 obj#=-1 tim=1291531645419556

 

B. 如果确实遇到了该问题 ,也可以将Index FULL SCAN 所使用的全局索引drop 掉来达到强制使用FULL single table partition SCAN的目的, 实际使用中不推荐

C. 避免使用Global Partitioned index 或 Non-partitioned Index , 而采用Local partitioned index , 这似乎更难做到

D. 也可以通过将原Maxvalue的分区drop掉之后(前提是该分区真的是空的),再添加新分区的做法来绕过该问题

CBO为什么不走索引?

原帖是Itpub上的网友提出一个CBO为什么不走索引的问题, 该问题的演示如下:

 

SQL> create table maclean1 as select * from dba_objects;

Table created.

SQL> update maclean1 set status='INVALID' where owner='MACLEAN';

2 rows updated.

SQL> commit;                                       

Commit complete.

SQL> create index ind_maclean1 on maclean1(status);

Index created.

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN1',cascade=>true);

PL/SQL procedure successfully completed.

SQL> explain plan for select * from maclean1 where status='INVALID';

Explained.

SQL> set linesize 140 pagesize 1400
SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
Plan hash value: 987568083

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          | 11320 |  1028K|    85   (0)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| MACLEAN1 | 11320 |  1028K|    85   (0)| 00:00:02 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='INVALID')

13 rows selected.

10053 trace 

Access path analysis for MACLEAN1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MACLEAN1[MACLEAN1]
  Column (#10): STATUS(
    AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.500000
  Table: MACLEAN1  Alias: MACLEAN1
    Card: Original: 22639.000000  Rounded: 11320  Computed: 11319.50  Non Adjusted: 11319.50
  Access Path: TableScan
    Cost:  85.33  Resp: 85.33  Degree: 0
      Cost_io: 85.00  Cost_cpu: 11935345
      Resp_io: 85.00  Resp_cpu: 11935345
  Access Path: index (AllEqRange)
    Index: IND_MACLEAN1
    resc_io: 185.00  resc_cpu: 8449916
    ix_sel: 0.500000  ix_sel_with_filters: 0.500000
    Cost: 185.24  Resp: 185.24  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 85.33  Degree: 1  Resp: 85.33  Card: 11319.50  Bytes: 0

可以从以上10053中看到因为没有直方图存在,所以这里的Density = 0.5 是从 1/ NDV 算得的
也就意味着粗糙的统计信息显示STATUS='INVALID"的数据行占总行数的一半,
所以优化器选择做全表扫描是有道理的

 

 

以上符合”STATUS”=’INVALID’ condition的只有2行,且status列上建有索引,同时也使用了dbms_stats包收集表和索引上的统计信息,照理说CBO因该选择INDEX Range ind_maclean1,而避免全表扫描,但实际优化器opitimizer没有这样做。

 

 

实际上这个问题和统计信息收集时是否收集直方图有关系,只要收集了直方图,那么优化器就会了解到status=’INVALID’条件仅有少量的card满足,具有良好的选择性:

 

[oracle@vrh4 ~]$ sqlplus  / as sysdba

SQL*Plus: Release 11.2.0.2.0 Production on Mon Oct 17 19:15:45 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> show parameter optimizer_fea

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_features_enable            string      11.2.0.2

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn  & www.askmac.cn

SQL> drop table maclean;

Table dropped.

SQL>  create table maclean as select * from dba_objects;

Table created.

SQL> update maclean set status='INVALID' where owner='MACLEAN';

2 rows updated.

SQL>  commit;

Commit complete.

SQL> create index ind_maclean on maclean(status);

Index created.

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 2');

PL/SQL procedure successfully completed.

 

这里我们仅收集2个bucket的直方图, 就足以让优化器做出正确选择了。

得益于Quest公司的Guy Harrison所写的一个列出FREQUENCY直方图信息的脚本,以下为该脚本:

 

rem
rem Generate a histogram of data distribution in a column as recorded
rem  in dba_tab_histograms
rem
rem Guy Harrison Jan 2010 : www.guyharrison.net
rem
rem hexstr function is from From http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:707586567563 

set pagesize 10000
set lines 120
set verify off

col char_value format a10 heading "Endpoint|value"
col bucket_count format 99,999,999 heading "bucket|count"
col pct format 999.99 heading "Pct"
col pct_of_max format a62 heading "Pct of|Max value"
rem col endpoint_value format 9999999999999 heading "endpoint|value" 

CREATE OR REPLACE FUNCTION hexstr (p_number IN NUMBER)
    RETURN VARCHAR2
AS
    l_str      LONG := TO_CHAR (p_number, 'fm' || RPAD ('x', 50, 'x'));
    l_return   VARCHAR2 (4000);
BEGIN
    WHILE (l_str IS NOT NULL)
    LOOP
        l_return := l_return || CHR (TO_NUMBER (SUBSTR (l_str, 1, 2), 'xx'));
        l_str := SUBSTR (l_str, 3);
    END LOOP;

    RETURN (SUBSTR (l_return, 1, 6));
END;
/

WITH hist_data AS (
SELECT endpoint_value,endpoint_actual_value,
       NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value,
       endpoint_number,
       endpoint_number,
       endpoint_number
       - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0)
           bucket_count
FROM dba_tab_histograms
JOIN dba_tab_col_statistics USING (owner, table_name,column_name)
WHERE     owner = '&owner'
      AND table_name = '&table'
      AND column_name = '&column'
      AND histogram='FREQUENCY')
SELECT nvl(endpoint_actual_value,endpoint_value) endpoint_value ,
       bucket_count,
       ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT,
       RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max
  FROM hist_data;

WITH hist_data AS (
SELECT endpoint_value,endpoint_actual_value,
       NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value,
       endpoint_number,
       endpoint_number,
       endpoint_number
       - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0)
           bucket_count
FROM dba_tab_histograms
JOIN dba_tab_col_statistics USING (owner, table_name,column_name)
WHERE     owner = '&owner'
      AND table_name = '&table'
      AND column_name = '&column'
      AND histogram='FREQUENCY')
SELECT hexstr(endpoint_value) char_value,
       bucket_count,
       ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT,
       RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max
  FROM hist_data
ORDER BY endpoint_value;

 

使用该脚本,我们可以更直观的感受FREQUENCY直方图信息:

 

 

这里dbms_stats 包收集到的STATUS=’INVALID’ bucket count=9 percent = 0.04 ,可以和之后的10053 trace中的信息对比以下:

 

SQL> explain plan for select * from maclean where status='INVALID';

Explained.

SQL>  select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
-------------------------------------
Plan hash value: 3087014066

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     9 |   837 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MACLEAN     |     9 |   837 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_MACLEAN |     9 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("STATUS"='INVALID')

以上可以看到只要收集了直方图CBO就会认识到满足STATUS=’INVALID’的cardnality很少 , 该条件具有良好的选择性 ,使用index range scan而非Full table scan。

我们进一步来看看有直方图情况下的10053 trace:

SQL> alter system flush shared_pool;

System altered.

SQL> oradebug setmypid;
Statement processed.

SQL> oradebug event 10053 trace name context forever ,level 1;
Statement processed.

SQL> explain plan for select * from maclean where status='INVALID';

Explained.

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MACLEAN[MACLEAN]
  Column (#10):
    NewDensity:0.000199, OldDensity:0.000022 BktCnt:22640, PopBktCnt:22640, PopValCnt:2, NDV:2

  这里的NewDensity= bucket_count / SUM(bucket_count) /2

   Column (#10): STATUS(
    AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.000199
    Histogram: Freq  #Bkts: 2  UncompBkts: 22640  EndPtVals: 2
  Table: MACLEAN  Alias: MACLEAN
 Card: Original: 22640.000000 Rounded: 9 Computed: 9.00 Non Adjusted: 9.00
  Access Path: TableScan
    Cost:  85.30  Resp: 85.30  Degree: 0
      Cost_io: 85.00  Cost_cpu: 10804625
      Resp_io: 85.00  Resp_cpu: 10804625
  Access Path: index (AllEqRange)
    Index: IND_MACLEAN
    resc_io: 2.00  resc_cpu: 20763
    ix_sel: 0.000398  ix_sel_with_filters: 0.000398
    Cost: 2.00  Resp: 2.00  Degree: 1
  Best:: AccessPath: IndexRange
  Index: IND_MACLEAN
         Cost: 2.00  Degree: 1  Resp: 2.00  Card: 9.00  Bytes: 0

 

上例中我们手动指定收集2 bucket的直方图后CBO优化器才能做出正确的选择,那么岂不是要人工干预来收集列的统计信息,默认的dbms_stats.DEFAULT_METHOD_OPT方式不能为我们提供有效的直方图收集方式吗?

实际上dbms_stats的自动决定直方图的收集与否及收集的桶数受到col_usage$基本中列充当predicate的历史记录影像,关于col_usage$详见<了解你所不知道的SMON功能(四):维护col_usage$字典基表>

 

假设在统计表上信息的dbms_stats存储过程执行之前,col_usage$中已经存有表上相关列充当predicate的记录,那么dbms_stats存储过程就会考虑为该列收集直方图信息, 如:

 

SQL> drop table maclean;

Table dropped.

SQL>  create table maclean as select * from dba_objects;

Table created.

SQL> update maclean set status='INVALID' where owner='MACLEAN';

2 rows updated.

SQL> commit;

Commit complete.

SQL> create index ind_maclean on maclean(status);

Index created.

使用dbms_stats默认method_opt收集maclean表

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');

PL/SQL procedure successfully completed.

@histogram.sql 

Enter value for owner: SYS
old  12:    WHERE owner = '&owner'
new  12:    WHERE owner = 'SYS'
Enter value for table: MACLEAN
old  13:      AND table_name = '&table'
new  13:      AND table_name = 'MACLEAN'
Enter value for column: STATUS
old  14:      AND column_name = '&column'
new  14:      AND column_name = 'STATUS'

no rows selected

因为缺少col_usage$列使用信息,所以依然没有收集status列的直方图

    declare
    begin
    for i in 1..500 loop
	execute immediate ' alter system flush shared_pool';
	DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
    execute immediate 'select count(*)  from maclean where status=''INVALID'' ' ;
    end loop;
    end;
    /

PL/SQL procedure successfully completed.

SQL> select obj# from obj$ where name='MACLEAN';

      OBJ#
----------
     97215
SQL> select * from  col_usage$ where  OBJ#=97215;

       OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------
     97215          1              1              0                 0           0          0          0 17-OCT-11
     97215         10            499              0                 0           0          0          0 17-OCT-11

SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');

PL/SQL procedure successfully completed.

@histogram.sql 

Enter value for owner: SYS
Enter value for table: MACLEAN
Enter value for column: STATUS

Endpoint        bucket         Pct of
value            count     Pct Max value
---------- ----------- ------- --------------------------------------------------------------
INVALI               2     .04
VALIC3           5,453   99.96  *************************************************

Find INTCOL#=1001 in col_usage$?

<了解你所不知道的SMON功能(四):维护col_usage$字典基表>中我介绍了SMON后台进程维护字典基表COL_USAGE$一些细节,有网友阅读了这篇文档后发现其数据库的COL_USAGE$中存在INTCOL#=1001的记录。

INTCOL#列表示internal column number对应于COL$基表的INTCOL#,注意Internal Column Number与COL#(column number as created)是不同的。$ORACLE_HOME/rdbms/admin/sql.bsq对于INTCOL#给出了解释:

* If a table T(c1, addr, c2) contains an ADT column addr which is stored
   * exploded, the table will be internally stored as
   *              T(c1, addr, C0003$, C0004$, C0005$, c2)
   * Of these, only c1, addr and c2 are user visible columns. Thus, the
   * user visible column numbers for (c1, addr, C0003$, C0004$, C0005$, c2)
   * will be 1,2,0,0,0,3. And the corresponding internal column numbers will
   * be 1,2,3,4,5,6.
   *
   * Some dictionary tables like icol$, ccol$ need to contain intcol# so
   * that we can have indexes and constraints on ADT attributes. Also, these
   * tables also need to contain col# to maintain backward compatibility.
   * Most of these tables will need to be accessed by col#, intcol# so
   * indexes are created on them based on (obj#, col#) and (obj#, intcol#).
   * Indexes based on col# have to be non-unique if ADT attributes might
   * appear in the table. Indexes based on intcol# can be unique.

这里的ADT指的是抽象数据类型(Abstract DataType is a user defined data type),例如:

CREATE OR REPLACE TYPE persons_address AS OBJECT (
  streetNumber NUMBER,
  streetName   VARCHAR2(30),
  citySuburb   VARCHAR2(30),
  state        VARCHAR2(4),
  postCode     NUMBER
);

熟读Oracle官方文档的朋友一定会记得,Oracle中单表的column总数存在一个上限:1000,即单表不能拥有超过1000个列。

但令人疑惑的是INTCOL#居然是1001,显然1001是某种magic number,而不是指第1001列。

搞清楚这个问题后,再进一步探索就不难发现问题的关键了:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

SQL> drop table maclean;

Table dropped.

SQL> create table maclean(oppo_find_me int);

Table created.

SQL> select object_id from dba_objects where object_name='MACLEAN';

 OBJECT_ID
----------
   1343832

SQL> select intcol# from col_usage$ where obj#=1343832;

no rows selected

SQL> insert into maclean values(1);

1 row created.

SQL> commit;

Commit complete.

SQL> select rowid from maclean;    

ROWID
------------------
AAFIFYAABAAByPKAAA

SQL> delete maclean where rowid='AAFIFYAABAAByPKAAA';

1 row deleted.

SQL> commit;

Commit complete.

SQL> exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

PL/SQL procedure successfully completed.

SQL> select intcol#,equality_preds from col_usage$ where obj#=1343832;

   INTCOL# EQUALITY_PREDS
---------- --------------
      1001              1

通过这一点可以了解那些字典基表是以rowid为条件来查询或更新的

SQL> select owner || '.' || object_name
  2    from dba_objects
  3   where object_id in (select obj# from col_usage$ where intcol# = 1001);

OWNER||'.'||OBJECT_NAME
--------------------------------------------------------------------------------
SYS.COL$
SYS.CDEF$
SYS.VIEW$
SYS.SUMPARTLOG$
SYS.SUM$
SYS.SUMKEY$
SYS.SUMAGG$
SYS.SUMPRED$
SYS.SUMQB$
SYS.PS$
SYS.AW_OBJ$

OWNER||'.'||OBJECT_NAME
--------------------------------------------------------------------------------
SYS.AW_PROP$
SYS.WRI$_ADV_OBJECTS
WMSYS.WM$WORKSPACES_TABLE
SYS.MACLEAN
CTXSYS.DR$INDEX
XDB.XDB$H_INDEX
XDB.XDB$RESOURCE
EXFSYS.RLM$SCHACTLIST
SYS.AW$EXPRESS
MACLEAN.SAMPLE

总结:
Oracle最早在9i中引入了col_usage$字典基表,其目的在于监控column在SQL语句作为predicate的情况,col_usage$的出现完善了CBO中柱状图自动收集的机制。该字典基表上的INTCOL#列代表Internal Column Number以标识ADT列。INTCOL#等于1001代表ROWID伪列,也就是相关对象曾使用ROWID充当predicate。

Oracle优化器:星型转换

Oracle 8i中引入了星型转换(star transformation)的优化器新特性以便更有效地处理星型查询。星型查询语句多用于基于星型模型设计的数据仓库应用中。星型模型的称谓源于该种模型以图形化表现时看起来形似一颗海星。这颗星的中央会由一个或多个事实表(fact tables)组成,而各个触角上则分布着多个维度表(dimension tables),如下图:

星型转换的基本思路是尽量避免直接去扫描星型模式中的事实表,因为这些事实表总会因为存有大量数据而十分庞大,对这些表的全表扫描会引起大量物理读并且效率低下。在典型的星型查询中,事实表总是会和多个与之相比小得多的维度表发生连接(join)操作。典型的事实表针对每一个维度表会存在一个外键(foreign key),除去这些键值(key)外还会存在一些度量字段譬如销售额度(sales amount)。与之对应的键值(key)在维度表上扮演主键的角色。而事实表与维度表间的连接操作一般都会发生在事实表上的外键和与之对应的维度表的主键间。同时这类查询总是会在维度表的其他列上存在限制十分严格的过滤谓词。充分结合这些维度表上的过滤谓词可以有效减少需要从事实表上访问的数据集合。这也就是星型转换(star transformation)的根本目的,仅访问事实表上相关的、过滤后精简的数据集合。

Oracle在Sample Schema示例模式中就存有星型模型的Schema,譬如SH:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for 32-bit Windows: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
-----------------------------------
www.askmac.cn

SQL> conn maclean/maclean
Connected.

SQL> select table_name,comments
  2    from dba_tab_comments
  3   where owner = 'SH'
  4     and table_name in ('SALES', 'CUSTOMERS', 'CHANNELS', 'TIMES');

TABLE_NAME                     COMMENTS
------------------------------ --------------------------------------------------------------------------------
CHANNELS                       small dimension table
CUSTOMERS                      dimension table
SALES                          facts table, without a primary key; all rows are uniquely identified by the comb
TIMES                          Time dimension table to support multiple hierarchies and materialized views

可以从以上各表的注释(comment)中看到,SALES表是SH模式下一个没有主键的事实表,而CHANNELS、CUSTOMERS、TIMES三个小表充当维度表的角色。我们试着构建以下星型查询语句,该查询用以检索出从1999年12月至2000年2月间Florida州所有城市直销形式的每月销售额。

SQL> col name for a35
SQL> col description for a45
SQL> col value for a8
SQL> select name,value,description from v$system_parameter where name='star_transformation_enabled';

NAME                                VALUE    DESCRIPTION
----------------------------------- -------- ---------------------------------------------
star_transformation_enabled         FALSE    enable the use of star transformation

/* 初始化参数star_transformation_enabled用以控制如何启用星型转换,
    默认为FALSE,该参数可以动态修改
*/

SELECT c.cust_city,
       t.calendar_quarter_desc,
       SUM(s.amount_sold) sales_amount
  FROM sh.sales s, sh.times t, sh.customers c, sh.channels ch
 WHERE s.time_id = t.time_id
   AND s.cust_id = c.cust_id
   AND s.channel_id = ch.channel_id
   AND c.cust_state_province = 'FL'
   AND ch.channel_desc = 'Direct Sales'
   AND t.calendar_quarter_desc IN ('2000-01', '2000-02','1999-12')
 GROUP BY c.cust_city, t.calendar_quarter_desc;

SQL> select * from table(dbms_xplan.display_cursor(format => 'IOSTATS'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------

SQL_ID  ddjm7k72b8p2a, child number 1
-------------------------------------
SELECT /*+ gather_plan_statistics */ c.cust_city,
t.calendar_quarter_desc,        SUM(s.amount_sold) sales_amount   FROM
sh.sales s, sh.times t, sh.customers c, sh.channels ch  WHERE s.time_id
= t.time_id    AND s.cust_id = c.cust_id    AND s.channel_id =
ch.channel_id    AND c.cust_state_province = 'FL'    AND
ch.channel_desc = 'Direct Sales'    AND t.calendar_quarter_desc IN
('2000-01', '2000-02','1999-12')  GROUP BY c.cust_city,
t.calendar_quarter_desc

Plan hash value: 382868716

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |      1 |        |     24 |00:00:00.62 |    1735 |   1726 |
|   1 |  HASH GROUP BY                 |           |      1 |     24 |     24 |00:00:00.62 |    1735 |   1726 |
|*  2 |   HASH JOIN                    |           |      1 |   1580 |   6015 |00:00:00.42 |    1735 |   1726 |
|*  3 |    TABLE ACCESS FULL           | CUSTOMERS |      1 |   2438 |   2438 |00:00:01.73 |    1459 |   1455 |
|*  4 |    HASH JOIN                   |           |      1 |   4575 |  74631 |00:00:00.18 |     276 |    271 |
|   5 |     PART JOIN FILTER CREATE    | :BF0000   |      1 |    227 |    182 |00:00:00.04 |      59 |     60 |
|   6 |      MERGE JOIN CARTESIAN      |           |      1 |    227 |    182 |00:00:00.04 |      59 |     60 |
|*  7 |       TABLE ACCESS FULL        | CHANNELS  |      1 |      1 |      1 |00:00:00.01 |       3 |      6 |
|   8 |       BUFFER SORT              |           |      1 |    227 |    182 |00:00:00.02 |      56 |     54 |
|*  9 |        TABLE ACCESS FULL       | TIMES     |      1 |    227 |    182 |00:00:00.02 |      56 |     54 |
|  10 |     PARTITION RANGE JOIN-FILTER|           |      1 |    117K|    117K|00:00:00.09 |     217 |    211 |
|  11 |      TABLE ACCESS FULL         | SALES     |      2 |    117K|    117K|00:00:00.07 |     217 |    211 |
---------------------------------------------------------------------------------------------------------------

可以看到在以上不使用星型转换的执行计划中对事实表SALES执行了全表扫描,这是我们不希望发生的。因为SALES表中每一行记录都对应于一笔销售记录,因此其可能包含数百万行记录。但实际上这其中仅有极小部分是我们在查询中指定的季度在弗罗里达州直销的纪录。若我们启用星型转换,执行计划是否有所改善?

SQL> alter session set star_transformation_enabled=temp_disable;
Session altered.

SQL> alter session set events '10053 trace name context forever,level 1';
Session altered.

在我们的理想当中星型变化会将原查询语句转换成如下形式:

SELECT c.cust_city,
       t.calendar_quarter_desc,
       SUM(s.amount_sold) sales_amount
  FROM sh.sales s, sh.times t, sh.customers c
 WHERE s.time_id = t.time_id
   AND s.cust_id = c.cust_id
   AND c.cust_state_province = 'FL'
   AND t.calendar_quarter_desc IN ('2000-01', '2000-02', '1999-12')
   AND s.time_id IN
       (SELECT time_id
          FROM sh.times
         WHERE calendar_quarter_desc IN ('2000-01', '2000-02', '1999-12'))
   AND s.cust_id IN
       (SELECT cust_id FROM sh.customers WHERE cust_state_province = 'FL')
   AND s.channel_id IN
       (SELECT channel_id
          FROM sh.channels
         WHERE channel_desc = 'Direct Sales')
 GROUP BY c.cust_city, t.calendar_quarter_desc;

/* 以添加AND..IN的形式明确了利用组合过滤谓词来减少需要处理的数据集 */

通过10053优化trace我们可以了解Oracle优化器是如何真正产生这部分过度谓词的:

FPD: Considering simple filter push in query block SEL$C3AF6D21 (#1)
"S"."CHANNEL_ID"=ANY (SELECT /*+ SEMIJOIN_DRIVER */ "CH"."CHANNEL_ID" FROM "SH"."CHANNELS" "CH")
AND "S"."CUST_ID"=ANY (SELECT /*+ SEMIJOIN_DRIVER */ "C"."CUST_ID" FROM "SH"."CUSTOMERS" "C") AND
"S"."TIME_ID"=ANY (SELECT /*+ SEMIJOIN_DRIVER */ "T"."TIME_ID
FPD: Considering simple filter push in query block SEL$ACF30367 (#4)
"T"."CALENDAR_QUARTER_DESC"='2000-01' OR "T"."CALENDAR_QUARTER_DESC"='2000-02' OR "T"."CALENDAR_QUARTER_DESC"='1999-12'
try to generate transitive predicate from check constraints for query block SEL$ACF30367 (#4)
finally: "T"."CALENDAR_QUARTER_DESC"='2000-01' OR "T"."CALENDAR_QUARTER_DESC"='2000-02' OR "T"."CALENDAR_QUARTER_DESC"='1999-12'

FPD: Considering simple filter push in query block SEL$F6045C7B (#3)
"C"."CUST_STATE_PROVINCE"='FL'
try to generate transitive predicate from check constraints for query block SEL$F6045C7B (#3)
finally: "C"."CUST_STATE_PROVINCE"='FL'

FPD: Considering simple filter push in query block SEL$6EE793B7 (#2)
"CH"."CHANNEL_DESC"='Direct Sales'
try to generate transitive predicate from check constraints for query block SEL$6EE793B7 (#2)
finally: "CH"."CHANNEL_DESC"='Direct Sales'

try to generate transitive predicate from check constraints for query block SEL$C3AF6D21 (#1)
finally: "S"."CHANNEL_ID"=ANY (SELECT /*+ SEMIJOIN_DRIVER */ "CH"."CHANNEL_ID" FROM "SH"."CHANNELS" "CH")
AND "S"."CUST_ID"=ANY (SELECT /*+ SEMIJOIN_DRIVER */ "C"."CUST_ID" FROM "SH"."CUSTOMERS" "C")
AND "S"."TIME_ID"=ANY (SELECT /*+ SEMIJOIN_DRIVER */ "T"."TIME_ID

Final query after transformations:******* UNPARSED QUERY IS *******

最终转换后的查询语句:

SELECT "C"."CUST_CITY" "CUST_CITY",
       "T"."CALENDAR_QUARTER_DESC" "CALENDAR_QUARTER_DESC",
       SUM("S"."AMOUNT_SOLD") "SALES_AMOUNT"
  FROM "SH"."SALES" "S", "SH"."TIMES" "T", "SH"."CUSTOMERS" "C"
 WHERE "S"."CHANNEL_ID" = ANY (SELECT /*+ SEMIJOIN_DRIVER */
         "CH"."CHANNEL_ID" "ITEM_1"
          FROM "SH"."CHANNELS" "CH"
         WHERE "CH"."CHANNEL_DESC" = 'Direct Sales')
   AND "S"."CUST_ID" = ANY (SELECT /*+ SEMIJOIN_DRIVER */
         "C"."CUST_ID" "ITEM_1"
          FROM "SH"."CUSTOMERS" "C"
         WHERE "C"."CUST_STATE_PROVINCE" = 'FL')
   AND "S"."TIME_ID" = ANY
 (SELECT /*+ SEMIJOIN_DRIVER */
         "T"."TIME_ID" "ITEM_1"
          FROM "SH"."TIMES" "T"
         WHERE "T"."CALENDAR_QUARTER_DESC" = '2000-01'
            OR "T"."CALENDAR_QUARTER_DESC" = '2000-02'
            OR "T"."CALENDAR_QUARTER_DESC" = '1999-12')
   AND "S"."TIME_ID" = "T"."TIME_ID"
   AND "S"."CUST_ID" = "C"."CUST_ID"
   AND "C"."CUST_STATE_PROVINCE" = 'FL'
   AND ("T"."CALENDAR_QUARTER_DESC" = '2000-01' OR
       "T"."CALENDAR_QUARTER_DESC" = '2000-02' OR
       "T"."CALENDAR_QUARTER_DESC" = '1999-12')
 GROUP BY "C"."CUST_CITY", "T"."CALENDAR_QUARTER_DESC"

/* 要比我们想想的复杂一些,子查询将IN语句化解了,
    并且AND...ANY的形式追加了过度谓词条件
*/

------------------------------------------------------------------+-----------------------------------+---------------+
| Id  | Operation                              | Name             | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |
------------------------------------------------------------------+-----------------------------------+---------------+
| 0   | SELECT STATEMENT                       |                  |       |       |  1710 |           |       |       |
| 1   |  HASH GROUP BY                         |                  |  1254 |   77K |  1710 |  00:00:21 |       |       |
| 2   |   HASH JOIN                            |                  |  1254 |   77K |  1283 |  00:00:16 |       |       |
| 3   |    HASH JOIN                           |                  |  1254 |   45K |   877 |  00:00:11 |       |       |
| 4   |     TABLE ACCESS FULL                  | TIMES            |   227 |  3632 |    18 |  00:00:01 |       |       |
| 5   |     PARTITION RANGE SUBQUERY           |                  |  1254 |   26K |   858 |  00:00:11 | KEY(SUBQUERY)| KEY(SUBQUERY)|
| 6   |      TABLE ACCESS BY LOCAL INDEX ROWID | SALES            |  1254 |   26K |   858 |  00:00:11 | KEY(SUBQUERY)| KEY(SUBQUERY)|
| 7   |       BITMAP CONVERSION TO ROWIDS      |                  |       |       |       |           |       |       |
| 8   |        BITMAP AND                      |                  |       |       |       |           |       |       |
| 9   |         BITMAP MERGE                   |                  |       |       |       |           |       |       |
| 10  |          BITMAP KEY ITERATION          |                  |       |       |       |           |       |       |
| 11  |           BUFFER SORT                  |                  |       |       |       |           |       |       |
| 12  |            TABLE ACCESS FULL           | CHANNELS         |     1 |    13 |     3 |  00:00:01 |       |       |
| 13  |           BITMAP INDEX RANGE SCAN      | SALES_CHANNEL_BIX|       |       |       |           | KEY(SUBQUERY)| KEY(SUBQUERY)|
| 14  |         BITMAP MERGE                   |                  |       |       |       |           |       |       |
| 15  |          BITMAP KEY ITERATION          |                  |       |       |       |           |       |       |
| 16  |           BUFFER SORT                  |                  |       |       |       |           |       |       |
| 17  |            TABLE ACCESS FULL           | TIMES            |   227 |  3632 |    18 |  00:00:01 |       |       |
| 18  |           BITMAP INDEX RANGE SCAN      | SALES_TIME_BIX   |       |       |       |           | KEY(SUBQUERY)| KEY(SUBQUERY)|
| 19  |         BITMAP MERGE                   |                  |       |       |       |           |       |       |
| 20  |          BITMAP KEY ITERATION          |                  |       |       |       |           |       |       |
| 21  |           BUFFER SORT                  |                  |       |       |       |           |       |       |
| 22  |            TABLE ACCESS FULL           | CUSTOMERS        |  2438 |   38K |   406 |  00:00:05 |       |       |
| 23  |           BITMAP INDEX RANGE SCAN      | SALES_CUST_BIX   |       |       |       |           | KEY(SUBQUERY)| KEY(SUBQUERY)|
| 24  |    TABLE ACCESS FULL                   | CUSTOMERS        |  2438 |   62K |   406 |  00:00:05 |       |       |
------------------------------------------------------------------+-----------------------------------+---------------+
Predicate Information:
----------------------
2 - access("S"."CUST_ID"="C"."CUST_ID")
3 - access("S"."TIME_ID"="T"."TIME_ID")
4 - filter(("T"."CALENDAR_QUARTER_DESC"='1999-12' OR "T"."CALENDAR_QUARTER_DESC"='2000-01' OR "T"."CALENDAR_QUARTER_DESC"='2000-02'))
12 - filter("CH"."CHANNEL_DESC"='Direct Sales')
13 - access("S"."CHANNEL_ID"="CH"."CHANNEL_ID")
17 - filter(("T"."CALENDAR_QUARTER_DESC"='1999-12' OR "T"."CALENDAR_QUARTER_DESC"='2000-01' OR "T"."CALENDAR_QUARTER_DESC"='2000-02'))
18 - access("S"."TIME_ID"="T"."TIME_ID")
22 - filter("C"."CUST_STATE_PROVINCE"='FL')
23 - access("S"."CUST_ID"="C"."CUST_ID")
24 - filter("C"."CUST_STATE_PROVINCE"='FL')

从以上演示中可以看到,星型转换添加了必要的对应于维度表约束的子查询谓词。这些子查询谓词又被称为位图半连接谓词(bitmap semi-join predicates,见SEMIJOIN_DRIVER hint)。通过迭代来自于子查询的键值,再通过位图(bitmap)的AND、OR操作(这些位图可以源于位图索引bitmap index,但也可以取自普通的B*tree索引),我们可以做到仅仅访问事实表上的查询相关记录。理想状况下维度表上的过滤谓词可以帮我们过滤掉大量的数据,这样就可以使执行计划效率大大提升。当我们获取到事实表上的相关行后,这部分结果集可能仍需要同维度表使用原谓词重复连接(join back)。某些情况下,重复连接可以被省略,之后我们会提到。

如上演示中列出了星型转换后的查询语句的执行计划。这里可以看到Oracle是使用”TABLE ACCESS BY LOCAL INDEX ROWID”形式访问SALES事实表的,而非全表扫描。这里我们仅关心7-23行的执行计划,服务进程分别在(12,17,22)行从维度表中取得各维度表的相关键值(key value),同时对部分结果集执行了BUFFER SORT操作;在(13,18,23)行的’bitmap index range scan’操作中服务进程从事实表的三个对应于维度表外键的位图索引上(SALES_CHANNEL_BIX,SALES_TIME_BIX,SALES_CUST_BIX)获取了最原始的位图。位图上的每一个bit都对应于事实表上的一行记录。若从子查询中获取的键值(key values)与事实表上的值一致则bit置为1,否则为0。举例而言位图bitmap:[1][0][1][1][0][0][0]..[0](之后都为零)表示事实表上仅有第一、三、四行匹配于由子查询提供的键值。我们假设以上位图是由times表子查询提供的众多键值中的一个(如’2000-01′)的对应于事实表的位图表达式。

接着在执行计划的(10,15,20)行上的’bitmap key iteration’操作会迭代每一个由子查询提供的键值并获取相应的位图。我们假设times表子查询提供的另外2个键值’2000-02’和’1999-12’分别对应的位图为[0][0][0][0][0][1]..[0]和[0][0][0][0][1][0]…[0]即每键值都只有一行符合。

毫无疑问ITERATION迭代操作会为我们生成众多位图,接下来需要对这些不同键值对应的位图进行位图合并操作(BITMAP MERGE,相当于对位图做OR操作),可以看到在上例执行计划中为(9,14,19)行;以我们假设的times表子查询位图合并而言,会生产一个简单的位图[1][0][1][1][1][1][0][0]..[0],这个位图对应事实表上的第一、三、四、五、六行,是对’2000-01′,’2000-02′,’1999-12’三个键值对应位图的合并。

在获得最后位图前我们还需要对来自于三个子查询的位图进一步处理,因为原始查询语句中各约束条件是AND与的形式,因此我们还要对这些已合并的位图执行AND与操作,如执行计划中的第八行”BITMAP AND”,因为是AND与操作所以这步又会过滤掉大量记录。我们假设最终获得的位图是[1][0][1][0]…[0],即仅有第一、三行。

通过最终bitmap位图Oracle可以极高效地生成事实表的ROWID,此步骤表现为第七行的”BITMAP CONVERSION TO ROWIDS”,我们使用这些ROWID来访问事实表取得少量的”绝对”相关记录。以我们的假设而言最终位图仅有2位为1,只需要用这2行的ROWID从事实表上直接fetch2条记录即可,从而避免了低效的全表扫描。

省略重复连接

因为子查询及位图树只是通过维度表上的过滤条件为事实表过滤掉大量的数据,所以从事实表上获取的相关数据仍可能需要重复一次和维度表的连接。省略重复连接的前提是维度表上所有的谓词都是半连接谓词子查询的一部分,And 由子查询检索到的列均唯一(unique) And 维度表的列不被select或group by涉及。在上例中无需对CHANNELS表再次连接的理由是没有select(或group by)CHANNEL表上的列,且channel_id列是唯一的。

临时表转换

若在已知星型转换中重复连接维度表无法被省略的话,Oracle可以将对维度表的子查询结果集存储到内存中的全局临时表(global temporary table)上以避免重复扫描维度表。此外,因为将子查询的结果集物化了,故而若使用并行查询则每个并行子进程(slave)可以直接从物化结果集的临时表中获得数据,而不需要反复执行子查询。

试看以下示例,了解Oracle是如何利用物化临时表避免反复连接的:

SQL> alter session set star_transformation_enabled=true;
Session altered.

SQL> alter session set events '10053 trace name context forever,level 1';
Session altered.

SELECT "T1"."C1" "CUST_CITY",
       "T"."CALENDAR_QUARTER_DESC" "CALENDAR_QUARTER_DESC",
       SUM("S"."AMOUNT_SOLD") "SALES_AMOUNT"
  FROM "SH"."SALES"                      "S",
       "SH"."TIMES"                      "T",
       "SYS"."SYS_TEMP_0FD9D660E_1DF5D6" "T1"
 WHERE "S"."CUST_ID" = ANY (SELECT /*+ SEMIJOIN_DRIVER CACHE_TEMP_TABLE ("T1") */
         "T1"."C0" "C0"
          FROM "SYS"."SYS_TEMP_0FD9D660E_1DF5D6" "T1")
   AND "S"."CHANNEL_ID" = ANY
 (SELECT /*+ SEMIJOIN_DRIVER */
         "CH"."CHANNEL_ID" "ITEM_1"
          FROM "SH"."CHANNELS" "CH"
         WHERE "CH"."CHANNEL_DESC" = 'Direct Sales')
   AND "S"."TIME_ID" = ANY
 (SELECT /*+ SEMIJOIN_DRIVER */
         "T"."TIME_ID" "ITEM_1"
          FROM "SH"."TIMES" "T"
         WHERE "T"."CALENDAR_QUARTER_DESC" = '2000-01'
            OR "T"."CALENDAR_QUARTER_DESC" = '2000-02'
            OR "T"."CALENDAR_QUARTER_DESC" = '1999-12')
   AND "S"."TIME_ID" = "T"."TIME_ID"
   AND "S"."CUST_ID" = "T1"."C0"
   AND ("T"."CALENDAR_QUARTER_DESC" = '2000-01' OR
       "T"."CALENDAR_QUARTER_DESC" = '2000-02' OR
       "T"."CALENDAR_QUARTER_DESC" = '1999-12')
 GROUP BY "T1"."C1", "T"."CALENDAR_QUARTER_DESC"

以上为启用临时表后的星型转换后的查询语句,相应的执行计划如下:
---------------------------------------------------------------------------+-----------------------------------+---------------+
| Id  | Operation                               | Name                     | Rows  | Bytes | Cost  | Time      | Pstart| Pstop |
---------------------------------------------------------------------------+-----------------------------------+---------------+
| 0   | SELECT STATEMENT                        |                          |       |       |   911 |           |       |       |
| 1   |  TEMP TABLE TRANSFORMATION              |                          |       |       |       |           |       |       |
| 2   |   LOAD AS SELECT                        |                          |       |       |       |           |       |       |
| 3   |    TABLE ACCESS FULL                    | CUSTOMERS                |  2438 |   62K |   406 |  00:00:05 |       |       |
| 4   |   HASH GROUP BY                         |                          |  1254 |   64K |   506 |  00:00:07 |       |       |
| 5   |    HASH JOIN                            |                          |  1254 |   64K |   479 |  00:00:06 |       |       |
| 6   |     HASH JOIN                           |                          |  1254 |   45K |   475 |  00:00:06 |       |       |
| 7   |      TABLE ACCESS FULL                  | TIMES                    |   227 |  3632 |    18 |  00:00:01 |       |       |
| 8   |      PARTITION RANGE SUBQUERY           |                          |  1254 |   26K |   456 |  00:00:06 | KEY(SUBQUERY)| KEY(SUBQUERY)|
| 9   |       TABLE ACCESS BY LOCAL INDEX ROWID | SALES                    |  1254 |   26K |   456 |  00:00:06 | KEY(SUBQUERY)| KEY(SUBQUERY)|
| 10  |        BITMAP CONVERSION TO ROWIDS      |                          |       |       |       |           |       |       |
| 11  |         BITMAP AND                      |                          |       |       |       |           |       |       |
| 12  |          BITMAP MERGE                   |                          |       |       |       |           |       |       |
| 13  |           BITMAP KEY ITERATION          |                          |       |       |       |           |       |       |
| 14  |            BUFFER SORT                  |                          |       |       |       |           |       |       |
| 15  |             TABLE ACCESS FULL           | CHANNELS                 |     1 |    13 |     3 |  00:00:01 |       |       |
| 16  |            BITMAP INDEX RANGE SCAN      | SALES_CHANNEL_BIX        |       |       |       |           | KEY(SUBQUERY)| KEY(SUBQUERY)|
| 17  |          BITMAP MERGE                   |                          |       |       |       |           |       |       |
| 18  |           BITMAP KEY ITERATION          |                          |       |       |       |           |       |       |
| 19  |            BUFFER SORT                  |                          |       |       |       |           |       |       |
| 20  |             TABLE ACCESS FULL           | TIMES                    |   227 |  3632 |    18 |  00:00:01 |       |       |
| 21  |            BITMAP INDEX RANGE SCAN      | SALES_TIME_BIX           |       |       |       |           | KEY(SUBQUERY)| KEY(SUBQUERY)|
| 22  |          BITMAP MERGE                   |                          |       |       |       |           |       |       |
| 23  |           BITMAP KEY ITERATION          |                          |       |       |       |           |       |       |
| 24  |            BUFFER SORT                  |                          |       |       |       |           |       |       |
| 25  |             TABLE ACCESS FULL           | SYS_TEMP_0FD9D660E_1DF5D6|  2438 |   12K |     4 |  00:00:01 |       |       |
| 26  |            BITMAP INDEX RANGE SCAN      | SALES_CUST_BIX           |       |       |       |           | KEY(SUBQUERY)| KEY(SUBQUERY)|
| 27  |     TABLE ACCESS FULL                   | SYS_TEMP_0FD9D660E_1DF5D6|  2438 |   36K |     4 |  00:00:01 |       |       |
---------------------------------------------------------------------------+-----------------------------------+---------------+

Predicate Information:
----------------------
3 - filter("C"."CUST_STATE_PROVINCE"='FL')
5 - access("S"."CUST_ID"="C0")
6 - access("S"."TIME_ID"="T"."TIME_ID")
7 - filter(("T"."CALENDAR_QUARTER_DESC"='1999-12' OR "T"."CALENDAR_QUARTER_DESC"='2000-01' OR "T"."CALENDAR_QUARTER_DESC"='2000-02'))
15 - filter("CH"."CHANNEL_DESC"='Direct Sales')
16 - access("S"."CHANNEL_ID"="CH"."CHANNEL_ID")
20 - filter(("T"."CALENDAR_QUARTER_DESC"='1999-12' OR "T"."CALENDAR_QUARTER_DESC"='2000-01' OR "T"."CALENDAR_QUARTER_DESC"='2000-02'))
21 - access("S"."TIME_ID"="T"."TIME_ID")
26 - access("S"."CUST_ID"="C0")

从以上trace中可以看到系统命名的临时表SYS_TEMP_0FD9D660E_1DF5D6缓存CUSTOMERS表,之后原先CUSTOMERS表被SYS_TEMP_0FD9D660E_1DF5D6所取代,原CUSTOMERS表上的cust_id和cust_city列均被替换为别名为T1的临时表的C0和C1列。实际上该临时表也仅需要这2列即可满足计划的需求,所以该临时表以如下查询语句填充:

ST: Subquery text:******* UNPARSED QUERY IS *******
SELECT "C"."CUST_ID" "ITEM_1","C"."CUST_CITY" "ITEM_2" FROM "SH"."CUSTOMERS" "C" WHERE "C"."CUST_STATE_PROVINCE"='FL'
Copy query block qb# -1 () : SELECT /*+ CACHE_TEMP_TABLE(T1) */  "C0" FROM "SYS"."SYS_TEMP_0FD9D660E_1DF5D6" T1
ST: Subquery (temp table) text:******* UNPARSED QUERY IS *******
SELECT /*+ CACHE_TEMP_TABLE ("T1") */ "T1"."C0" "C0" FROM "SYS"."SYS_TEMP_0FD9D660E_1DF5D6" "T1"
Copy query block qb# -1 () : SELECT /*+ CACHE_TEMP_TABLE(T1) */  "C0", "C1" FROM "SYS"."SYS_TEMP_0FD9D660E_1DF5D6" T1
ST: Join back qbc text:******* UNPARSED QUERY IS *******
SELECT /*+ CACHE_TEMP_TABLE ("T1") */ "T1"."C0" "C0","T1"."C1" "C1" FROM "SYS"."SYS_TEMP_0FD9D660E_1DF5D6" "T1"

可以从以上执行计划中看到第一、二、三行的”TEMP TABLE TRANSFORMATION LOAD AS SELECT TABLE ACCESS FULL CUSTOMERS”看到Oracle是如何将子查询物化为临时表的。在第25行,Oracle直接以该临时表替代了子查询来构建我们所需要的位图。到第27行Oracle直接利用该临时表来重复连接,避免再次扫描customers表。因为我们在构建临时表时已经使用谓词条件(如上面的红字语句),故而我们无需对临时表再次过滤。

如何启用星型查询

星型转换由初始化参数star_transformation_enabled控制,该参数可以有三种选项:

  • TRUE: Oracle优化器自动识别语句中的事实表和约束维度表并进行星型转换。这一切优化尝试都在CBO的藩篱内,优化器需要确定转换后的执行计划成本要低于不转换的执行计划;同时优化器还会尝试利用物化的临时表,如果那样真的好的话。
  • False: 优化器不会考虑星型转换。
  • TEMP_DISABLE:当一个维度表超过100个块时,”如果简单地设置star_transformation_enabled为TRUE来启用星型变换,那么会话会创建一个内存中的全局临时表(global temporary table)来保存已过滤的维度数据,这在过去会造成很多问题;”这里说的100个块其实是隐式参数_temp_tran_block_threshold(number of blocks for a dimension before we temp transform)的默认值,此外隐式参数_temp_tran_cache(determines if temp table is created with cache option,默认为TRUE)决定了这类临时表是否被缓存住;为了避免创建全局临时表可能带来的问题,就可以用到TEMP_DISABLE这个禁用临时表的选项,让优化器不再考虑使用物化的临时表。

默认该参数为False,若要问这是为什么?因为星型转换适用的场景是数据仓库环境中具有星型模型的模式,而且需要事实表的各个连接列上均有良好的索引时才能发挥其优势。如果能确定以上因素,那么我们可以放心的使用星型转换了,把star_transformation_enabled改为true或temp_disable吧!

总结

星型转换可以有效改善大的事实表与多个具有良好选择率的维度表间连接的查询。星型转换有效避免了全表扫描的性能窘境。它只fetch那些事实表上的”绝对”相关行。同时星型转换是基于CBO优化器的,Oracle能很好地认清使用该种转换是否有利。一旦维度表上的过滤无法有效减少需要从事实表上处理的数据集和时,那么可能全表扫描相对而言更为恰当。

以上我们力图通过一些简单的查询和执行计划来诠释星型转换的基本理念,但现实生产环境中实际的查询语句可能要复杂的多;举例而言如果查询涉及星型模型中的多个事实表的话,那么其复杂度就大幅提高了;如何正确构建事实表上的索引,收集相关列上的柱状图信息,在Oracle优化器无法正确判断的情况下循循善诱,都是大型数据仓库环境中DBA所面临的难题。

11g Multi-Column Correlation Stats and Dynamic Sampling

Oracle CBO优化模式中列的统计信息是一个十分重要的概念,但在11g之前我们所讨论的都是基于单列的统计信息或直方图,也就是说基于成本的优化器总是假设where子句后的谓词中列与列之间不存在联系。但是有的查询包含一个表的多个列,而每个列又都与不同的选择度。这些列中有的是相关的,但优化器并不知道这些关系。在这种情况下,优化器如果要估计出真实的基数(card),必须要了解增加另一列到某个给定列是否会引起结果集的减少。多列上的相关统计数据能提供比单列统计数据或直方图更好的基数估计。当2个列紧密相关时,增加额外的谓词可以减少结果集。Oracle database 11g中引入了扩展统计(也叫多列统计,multicolumn statistics),可以收集一组列上的统计数据,从而让优化器能准确地计算多个单列谓词的选择性。因为把紧密相关的列作为一个组才能正确地放映其组合选择性,所以把相关列作为一组,在其上(列祖)收集统计数据,这些信息足以让优化器能准确地进行选择性估计,在包含使用相关列的谓词查询中,这是我们实际关心的问题。多列统计的引入意味着,在11g中cbo优化器可以对具有多列复杂谓词判断的SQL语句做出更准确的成本估算,许多原本”误用”全表扫描的查询现在可以使用索引扫描的执行计划,语句将运行地更快速。

我们试看下例:
[Read more…]

ORA-00600:[32695], [hash aggregation can't be done]错误一例

还是那个hash group by算法的问题,日志文件中出现以下记录:

*** ACTION NAME:(SQL 窗口 - 新建) 2010-09-03 14:27:54.594
*** MODULE NAME:(PL/SQL Developer) 2010-09-03 14:27:54.594
*** SERVICE NAME:(HQYDB1) 2010-09-03 14:27:54.594
*** SESSION ID:(3205.17923) 2010-09-03 14:27:54.594
*** 2010-09-03 14:27:54.594
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [32695], [hash aggregation can't be done], [], [], [], [], [], []
Current SQL statement for this session:
create table zou_201008_cell_id as
select /* g_all_cdr02,60 */
calling_num mobile_number,
lac,
lpad(cell_id,5,'0') cell_id,
count(*) c,
sum(call_duration) call_duration,
sum(decode(record_type,'00',1,0)*call_duration) moc_call_duration,
sum(decode(record_type,'01',1,0)*call_duration) mtc_call_duarion
from  g_all_cdr02
where substr(calling_num,1,7) in (select mobile_prefix from zou_mobile_prefix)
group by
calling_num ,
lac,
lpad(cell_id,5,'0')
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              100000000 ? 11055A9A0 ?
ksedmp+0290          bl       ksedst               104A54870 ?
ksfdmp+0018          bl       03F30204
kgeriv+0108          bl       _ptrgl
kgesiv+0080          bl       kgeriv               1050BE654 ? 1050BE604 ?
                                                   0000027E5 ? 080000000 ?
                                                   07FFFFFFF ?
ksesic1+0060         bl       kgesiv               43300000FFFF5310 ?
                                                   4530000000000000 ?
                                                   000000071 ? 000000001 ?
                                                   000000000 ?
qeshPartitionBuildH  bl       01F9CA24
D+04bc
qeshGBYOpenScan2+02  bl       qeshPartitionBuildH  0000027E5 ? 1105C06C0 ?
34                            D
qeshGBYOpenScan+001  bl       qeshGBYOpenScan2     FFFFFFFFFFF5740 ? 11055A938 ?
8                                                  000000000 ? 000000010 ?
qerghFetch+05e8      bl       qeshGBYOpenScan      000001000 ?
rwsfcd+0054          bl       _ptrgl
qerltFetch+036c      bl       03F2EB1C
ctcdrv+4160          bl       01F9C898
opiexe+2884          bl       ctcdrv               100000001 ? 100000001 ?
                                                   110467F30 ?
opiosq0+19f0         bl       opiexe               FFFFFFFFFFF8B50 ?
                                                   2824422142420820 ?
                                                   FFFFFFFFFFF8C10 ?
kpooprx+0168         bl       opiosq0              300000000 ? 000000000 ?
                                                   000000000 ? A4000000000000 ?
kpoal8+0400          bl       kpooprx              FFFFFFFFFFFB464 ?
                                                   FFFFFFFFFFFB068 ?
                                                   1BF000001BF ? 100000001 ?
                                                   000000000 ? A40000000000A4 ?
                                                   000000000 ? 1103878F8 ?
opiodr+0ae0          bl       _ptrgl
ttcpip+1020          bl       _ptrgl
opitsk+1124          bl       01F9F2A0
opiino+0990          bl       opitsk               000000000 ? 000000000 ?
opiodr+0ae0          bl       _ptrgl
opidrv+0484          bl       01F9E0E8
sou2o+0090           bl       opidrv               3C02DC1BBC ? 44065F000 ?
                                                   FFFFFFFFFFFF3A0 ?
opimai_real+01bc     bl       01F9B9F4
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0098         bl       main                 000000000 ? 000000000 ?

--------------------- Binary Stack Dump ---------------------

这次是因为应用人员不了解alter session的作用域,在PL/SQL Developer工具中的不同窗口(也就是不在同一会话中)中执行了”alter session set “_gby_hash_aggregation_enabled” = false;”和涉及group by操作的SQL,并导致了unpublished bug:6471770被触发。
我们比较容易地workaround绕过这个Bug:


/* 在会话级别设置优化参数_gby_hash_aggregation_enabled */

alter session set "_gby_hash_aggregation_enabled" = false;

/* 或者在语句中加入NO_USE_HASH_AGGREGATION的 hint */

select  /*+ NO_USE_HASH_AGGREGATION */ ....

以上提及的unpublished bug:6471770据称在10.2.0.5,11.1.0.7,11.2.0.1版本中被修正了。

沪ICP备14014813号-2

沪公网安备 31010802001379号