Testing and Analysis of Delays Caused by Index Leaf Block Splits in Oracle

On version 10.2.0.4 without the relevant one-off patch applied, index split tests were run against tablespaces using both ASSM and MSSM segment space management. The conclusions are as follows:

•  In 10gR2, the MSSM mode cannot avoid the transaction timeouts caused by index splits.

•  The one-off patch on 10.2.0.4 currently exists only for Linux; we could request the patch and then run concrete tests (with no patch in hand, the outcome remains unknown).

•  Coalescing the index (alter index coalesce) is currently the most practical solution.

•  Tests show the problem still exists on the latest 11gR2.

The detailed test procedure follows:

1.    Index block splits under automatic segment space management (ASSM)

SQL> drop tablespace idx1 including contents and datafiles;

Tablespace dropped.

SQL> create tablespace idx1 datafile '?/dbs/idx1.dbf' size 500M

2  segment space management AUTO

3  extent management local uniform size 10M;

Create a tablespace with automatic segment space management

Tablespace created.

SQL> create table idx1(a number) tablespace idx1;

Table created.

create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0;

Index created.         Create the test table and index

SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000;           insert 250,000 rows

250000 rows created.

SQL> commit;

Commit complete.

SQL>create table idx2 tablespace idx1 as select * from idx1 where 1=2;

Table created.

insert into idx2

select * from idx1 where rowid in

(select rid from

(select rid, rownum rn from

(select rowid rid from idx1 where a between 10127 and 243625 order by a)                    take rows from the tail of the range, i.e. one out of every 250 rows

)

where mod(rn, 250) = 0

)

/

933 rows created.

SQL> commit;

Commit complete.

SQL> analyze index idx1_idx validate structure; analyze the original index

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280        499           0               499 leaf blocks before any deletion

SQL> delete from idx1 where a between 10127 and 243625;                             mass delete

commit;

233499 rows deleted.

SQL> SQL>

Commit complete.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280        499      233499            leaf block count unchanged after the delete

SQL> insert into idx1 select * from idx2;                   make the empty blocks non-empty again, but each block now holds only one or two rows, so free space is still 75-100%

commit;

933 rows created.

Commit complete.

SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126;          set up the precondition for a leaf block split

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

997 leaf node splits

997 leaf node 90-10 splits

0 branch node splits

0 queue splits                 the session's current leaf block split count

SQL> insert into idx1 values (251000);                                        a leaf block split happens here

1 row created.

SQL> commit;

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

998 leaf node splits

998 leaf node 90-10 splits

0 branch node splits

0 queue splits         one more leaf node split than in the previous query

SQL> set linesize 200 pagesize 1500;

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like '%insert%idx1%' and sql_text not like '%v$sql%';

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1603          0     271601       271601            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         156          0      82803        82803            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1         177 0       3728         3728              1

insert into idx1 values (251000)     it read the blocks that are not actually empty: many buffer gets

1        1409          0      40293        40293            933

insert into idx1 select * from idx2

1      240842          0    3478341      3478341         250000

SQL> insert into idx1 values (251001);                                  an insert that does not split

1 row created.

SQL> commit;

Commit complete.

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like '%insert%idx1%' and sql_text not like '%v$sql%';

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1603          0     271601       271601            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         156          0      82803        82803            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1           9          0       1640         1640              1

insert into idx1 values (251001) the non-splitting insert: few buffer gets

1         177          0       3728         3728              1

insert into idx1 values (251000)

1        1409          0      40293        40293            933

insert into idx1 select * from idx2

1      240842          0    3478341      3478341         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

As demo 1 shows, under automatic segment space management a mass delete followed by re-inserts leaves many blocks 75%-100% free yet not completely empty. When a leaf block later splits, the foreground process doing the insert has to scan a large number of such "empty" blocks; if those blocks are not cached (causing physical reads) and possibly need delayed block cleanout, the scan is slowed down, the leaf split becomes slow, and other insert operations end up blocked behind the split, showing up as enq: TX - index contention waits.
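While such a split is in progress, the sessions queuing behind it surface in the wait interface; a minimal check, assuming 10g or later where v$session exposes the wait columns and blocking_session directly:

select sid, sql_id, event, blocking_session, seconds_in_wait
from   v$session
where  event = 'enq: TX - index contention';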

2.  Index block splits under manual segment space management (MSSM)

SQL> drop tablespace idx1 including contents and datafiles;

Tablespace dropped.

SQL> create tablespace idx1 datafile '?/dbs/idx1.dbf' size 500M

2  segment space management MANUAL                                      -- the MSSM case

3  extent management local uniform size 10M;

Tablespace created.

SQL> create table idx1(a number) tablespace idx1;

create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0;

Table created.

Index created.

SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000;

commit;

create table idx2 tablespace idx1 as select * from idx1 where 1=2;

insert into idx2

select * from idx1 where rowid in

(select rid from

(select rid, rownum rn from

(select rowid rid from idx1 where a between 10127 and 243625 order by a)

)

where mod(rn, 250) = 0

)

/

commit;

250000 rows created.

SQL> SQL>

Commit complete.

SQL> SQL>

Table created.

SQL> SQL>   2    3    4    5    6    7    8    9

933 rows created.

SQL> SQL>

Commit complete.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280        499           0

SQL> delete from idx1 where a between 10127 and 243625;

233499 rows deleted.

SQL> commit;

Commit complete.

SQL> insert into idx1 select * from idx2;

commit;

933 rows created.

SQL> SQL>

Commit complete.

SQL> SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126;

commit;

126 rows created.

SQL> SQL>

Commit complete.

SQL>

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

1496 leaf node splits

1496 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> insert into idx1 values (251000);                                  this insert does split

1 row created.

SQL> commit;

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

1497 leaf node splits

1497 leaf node 90-10 splits

0 branch node splits

0 queue splits

So far the results match the ASSM case exactly

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like '%insert%idx1%' and sql_text not like '%v$sql%';

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1553          0     283301       283301            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         153          0      78465        78465            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1        963 0      10422        10422              1              even more "empty" blocks scanned than in the ASSM case

insert into idx1 values (251000)

1         984          0      35615        35615            933

insert into idx1 select * from idx2

1      238579          0    3468326      3469984         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

SQL> insert into idx1 values (251001);

1 row created.

SQL> commit;

Commit complete.

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like '%insert%idx1%' and sql_text not like '%v$sql%';

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1553          0     283301       283301            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         153          0      78465        78465            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1           7 0       1476         1476              1

insert into idx1 values (251001)    -- the non-splitting insert behaves the same as under ASSM

1         963 0      10422        10422              1

insert into idx1 values (251000)

1         984          0      35615        35615            933

insert into idx1 select * from idx2

1      238579          0    3468326      3469984         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

6 rows selected.

As demo 2 shows, under MSSM the leaf split read even more "empty" blocks than under ASSM; MSSM does not solve the problem of leaf splits having to scan many not-quite-empty blocks after a mass delete, and in practice it can be worse. In theory an MSSM freelist can only point to blocks that have not yet reached PCTFREE, or that once passed PCTFREE and then, after deletes, dropped back below PCTUSED (doc: "A free list is a list of free data blocks that usually includes blocks existing in a number of different extents within the segment. Free lists are composed of blocks in which free space has not yet reached PCTFREE or used space has shrunk below PCTUSED."). In other words, under MSSM there will be even more of these "empty" blocks.
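For reference, the segment space management mode of the tablespace used in these demos can be confirmed from the data dictionary; a minimal check against dba_tablespaces (tablespace name IDX1 as created above):

select tablespace_name, segment_space_management, extent_management
from   dba_tablespaces
where  tablespace_name = 'IDX1';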

3.  Index block splits after coalesce under automatic segment space management

SQL> drop tablespace idx1 including contents and datafiles;

Tablespace dropped.

SQL> create tablespace idx1 datafile '?/dbs/idx1.dbf' size 500M

2  segment space management AUTO                                       -- the ASSM coalesce case

3  extent management local uniform size 10M;

Tablespace created.

SQL> create table idx1(a number) tablespace idx1;

create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0;

Table created.

SQL> SQL>

Index created.

SQL> SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000;

commit;

create table idx2 tablespace idx1 as select * from idx1 where 1=2;

insert into idx2

select * from idx1 where rowid in

(select rid from

(select rid, rownum rn from

(select rowid rid from idx1 where a between 10127 and 243625 order by a)

)

where mod(rn, 250) = 0

)

/

commit;

250000 rows created.

SQL> SQL>

Commit complete.

SQL> SQL>

Table created.

SQL> SQL>   2    3    4    5    6    7    8    9

933 rows created.

SQL> SQL>

Commit complete.

SQL> SQL> SQL>

SQL>

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280        499           0

SQL> delete from idx1 where a between 10127 and 243625;

commit;

233499 rows deleted.

SQL> SQL>

Commit complete.

SQL> alter index idx1_idx coalesce;

Index altered.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280         33           0 -- coalesce merged the leaf blocks

SQL> insert into idx1 select * from idx2;

933 rows created.

SQL> SQL> commit;

Commit complete.

SQL>

SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126;

commit;

126 rows created.

SQL> SQL>

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

1999 leaf node splits

1995 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> insert into idx1 values (251000);                                       this insert does split

1 row created.

SQL> commit;

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

2000 leaf node splits

1996 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like '%insert%idx1%' and sql_text not like '%v$sql%';

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1603          0     268924       268924            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         156          0      78349        78349            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1          23 0       2218         2218              1                             few buffer gets

insert into idx1 values (251000)

1         191          0      15596        15596            933

insert into idx1 select * from idx2

1      240852          0    3206130      3206130         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

SQL> insert into idx1 values (251001);

1 row created.

SQL> commit;

Commit complete.

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like '%insert%idx1%' and sql_text not like '%v$sql%';

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1603          0     268924       268924            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         156          0      78349        78349            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1           9 0       1574         1574              1

insert into idx1 values (251001)

1          23 0       2218         2218              1

insert into idx1 values (251000)

1         191          0      15596        15596            933

insert into idx1 select * from idx2

1      240852          0    3206130      3206130         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

6 rows selected.

As demo 3 shows, running a coalesce after the delete moves the large number of empty blocks out of the index structure, so the subsequent leaf split only reads the few blocks it actually needs.
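The fix demonstrated here, condensed into the statements one would actually run after a mass delete (index name idx1_idx as used in this demo; the validation step blocks DML, as noted in demo 5, so run it in a quiet window):

alter index idx1_idx coalesce;
analyze index idx1_idx validate structure;
select lf_blks, del_lf_rows from index_stats;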

4.  Index block splits after coalesce under manual segment space management

SQL> drop tablespace idx1 including contents and datafiles;

Tablespace dropped.

SQL> create tablespace idx1 datafile '?/dbs/idx1.dbf' size 500M

2  segment space management MANUAL                               -- the MSSM coalesce case

3  extent management local uniform size 10M;

Tablespace created.

SQL> create table idx1(a number) tablespace idx1;

create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0;

Table created.

Index created.

SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000;

commit;

create table idx2 tablespace idx1 as select * from idx1 where 1=2;

insert into idx2

select * from idx1 where rowid in

(select rid from

(select rid, rownum rn from

(select rowid rid from idx1 where a between 10127 and 243625 order by a)

)

where mod(rn, 250) = 0

)

/

commit;

250000 rows created.

SQL> SQL>

Commit complete.

SQL> SQL>

Table created.

SQL> SQL>   2    3    4    5    6    7    8    9

933 rows created.

SQL> SQL>

Commit complete.

SQL> SQL> SQL>

SQL>

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280        499           0

SQL> delete from idx1 where a between 10127 and 243625;

commit;

233499 rows deleted.

SQL> SQL>

Commit complete.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280        499      233499

SQL> alter index idx1_idx coalesce;

Index altered.

SQL> analyze index idx1_idx validate structure;

select blocks,lf_blks,del_lf_rows from index_stats;

Index analyzed.

SQL>

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

1280         33           0

SQL> insert into idx1 select * from idx2;

933 rows created.

SQL> SQL> commit;

Commit complete.

SQL>

SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126;

commit;

126 rows created.

SQL> SQL>

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

2502 leaf node splits

2494 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> insert into idx1 values (251000);                       this insert does split

1 row created.

SQL> commit;

Commit complete.

SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like '%split%'  and sid=(select distinct sid from v$mystat);

VALUE NAME

———- —————————————————————-

2503 leaf node splits

2495 leaf node 90-10 splits

0 branch node splits

0 queue splits

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like '%insert%idx1%' and sql_text not like '%v$sql%';

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1553          0     281059       281059            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         153          0      77817        77817            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1          19          0       2010         2010              1                       few buffer gets

insert into idx1 values (251000)

1         126          0      15364        15364            933

insert into idx1 select * from idx2

1      238644          0    3229737      3230569         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

SQL> insert into idx1 values (251001);

1 row created.

SQL> commit;

Commit complete.

SQL> select  executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql

2  where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’;

EXECUTIONS BUFFER_GETS DISK_READS   CPU_TIME ELAPSED_TIME ROWS_PROCESSED

———- ———– ———- ———- ———— ————–

SQL_TEXT

——————————————————————————————————————————————————————————————————–

1        1553          0     281059       281059            933

insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 )

1         153          0      77817        77817            126

insert into idx1 select 250000+rownum from all_objects where rownum <= 126

1          7 0       1460         1460              1

insert into idx1 values (251001)

1          19 0       2010         2010              1

insert into idx1 values (251000)

1         126          0      15364        15364            933

insert into idx1 select * from idx2

1      238644          0    3229737      3230569         250000

insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000

6 rows selected.

As demo 4 shows, coalesce under MSSM behaves much the same as under ASSM; the coalesce operation is an effective fix for this problem.

5.  Locking impact of the coalesce operation

SQL> create table coal (t1 int);

Table created.

SQL> create index pk_t1 on coal(t1);

Index created.

SQL> begin

2    for i in 1..3000 loop

3      insert into coal values(i);

4      commit;

5      end loop;

6      end;

7  /

PL/SQL procedure successfully completed.

SQL> delete coal where t1>500;

2500 rows deleted.

SQL> commit;

Commit complete.

SQL> analyze index pk_t1 validate structure;

Index analyzed.    Note: analyze ... validate structure blocks all DML

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

8          6        2500          state after the delete

Now open a second session and start a DML operation:

SQL> update coal set t1=t1+1;

500 rows updated.

Back in the original session:

SQL> alter index pk_T1 coalesce;             -- the coalesce is not blocked

Index altered.

Commit in the other session so that validate structure can run

SQL> analyze index pk_t1 validate structure;

Index analyzed.

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

8          3         500

Clearly the coalesce did not touch the blocks involved in the active DML

Now the case with no concurrent DML:

SQL> truncate table coal;

Table truncated.

SQL> begin

2    for i in 1..3000 loop

3      insert into coal values(i);

4      commit;

5      end loop;

6      end;

7  /

PL/SQL procedure successfully completed.

SQL> analyze index pk_t1 validate structure;

Index analyzed.

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

8          6           0

SQL> delete coal where t1>500;

2500 rows deleted.

SQL> commit;

Commit complete.

SQL> analyze index pk_t1 validate structure;

Index analyzed.

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

8          6        2500

SQL> alter index pk_t1 coalesce;

Index altered.

SQL> analyze index pk_t1 validate structure;

Index analyzed.

SQL> select blocks,lf_blks,del_lf_rows from index_stats;

BLOCKS    LF_BLKS DEL_LF_ROWS

———- ———- ———–

8          1           0

With no DML in progress, the coalesce touched all the blocks

As demo 5 shows, coalesce skips the blocks touched by active DML, but during the brief window of a coalesce there will not be many index blocks carrying open transactions. Note also that coalesce does not reduce the index height.
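A small helper to decide when an index is worth coalescing, based on the same index_stats columns used throughout these demos (remember that analyze ... validate structure blocks DML, so run it in a quiet window; index_stats only holds the last index analyzed in the session):

analyze index idx1_idx validate structure;
select name, lf_rows, del_lf_rows,
       round(del_lf_rows / nullif(lf_rows, 0) * 100, 1) pct_deleted
from   index_stats;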

Attached is a detailed description of the index rebuild and coalesce operations:

6.  Summary of the coalesce operation

Advantages:

•  It is a fast operation with minimal impact on overall performance (not performance sensitive).

•  It does not lock the table, and it skips index blocks that have open transactions.

•  It effectively solves the problem at hand.

•  It does not reduce the index height, and therefore does not set up another root split.

Disadvantages:

•  The coalesce has to be run regularly against individual objects; it is not a once-and-for-all, global fix (see the scheduler sketch below).
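One way to automate the per-object coalesce is a DBMS_SCHEDULER job; a minimal sketch only, with a hypothetical job name, an assumed nightly quiet hour, and the demo index idx1_idx hard-coded:

begin
  dbms_scheduler.create_job(
    job_name        => 'COALESCE_IDX1_IDX',      -- hypothetical job name
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'begin execute immediate ''alter index idx1_idx coalesce''; end;',
    start_date      => systimestamp,
    repeat_interval => 'FREQ=DAILY;BYHOUR=3',    -- assumed quiet hour
    enabled         => true);
end;
/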

7.  Discussion of the related patch on Linux 10.2.0.4

The note for Metalink bug 8286901 describes a user who hit the same problem and filed an SR; Oracle Support supplied a one-off patch at the time, but applying it did not solve the user's problem.

The original note text follows:

It is similar to bug8286901, but after applied patch8286901, still see enq tx
contentiona with high “failed probes on index block reclamation”

Issue encountered by customer and Oracle developer (Stefan Pommerenk). He describes it thus:

"Space search performed by the index splitter can't find space in neighboring blocks, and then instead of allocating new space, we go and continue to search for space elsewhere, which manifests itself in block reads from disk, block cleanouts, and subsequent blocks written due to aggressive MTTR setting."

"To clarify: the cleanouts are not the problem per se. The culprit seems to be that the space search performed by the index splitter can't find space in neighboring blocks, and then instead of allocating new space, we go and continue to search for space elsewhere, which manifests itself in block reads from disk, block cleanouts, and subsequent blocks written due to aggressive MTTR setting. This action has caused other sessions to get blocked on TX enqueue contention, blocked on the splitting session. Advice was to set 10224 trace event for the splitter for a short time only in order to get diagnostics as to why the space search rejected most blocks.

> A secondary symptom are the bitmap level 1 block updates, which may or may not be related to the space search; I've not seen them before, maybe because I didn't really pay attention :P , but the symptoms seen in the ASH trace indicate it's the same problem. Someone in space mgmt has to look at it to confirm it is the same problem."

I exchanged private emails with this user; excerpts from the thread (his replies and mine) follow:

I still have a case open with Oracle. I believe that this is a bug in the Oracle code. The problem is that it has been difficult to create a reproducible test case for Oracle support. My specific issue was basically put on hold pending the results of another customer’s service request that appeared to have had the same issue, (9034788). Unfortunately they couldn’t reproduce the issue in that case either.

I believe that there is a correlation between the enq TX – index contention wait event and a spike in the number of ‘failed probes on index block reclamation. I have specifically asked Oracle to explain why there is a spike in the ‘failed probes on index block reclamation’ during the same time frame as the enq TX index contention wait event, but they have not answered my question.

I was hoping that some investigation by Oracle Support into the failed probes metric might get someone on the right track to discovering the bug. That hasn’t happened though.

Hi,

Thanks for sharing. The bug (or this specific ktsp behavior) is fatal in a response-time-sensitive OLTP environment.

I will ask my customer to regularly coalesce those indexes that see massive deletes.

Thanks again for your help!

Yes, I saw that. I have applied patch 8286901 and set the event for version 10.2.0.4, but the problem still occurs periodically. And as I mentioned before, we see a correlation between enq TX waits and the failed probes on index block reclamation. Which is why I still think that it is a bug. I agree that trying to rebuild or coalesce the indexes are simply attempts to workaround the issue and not solve the root cause.

Early on when I started on this issue I did do some index dumps and could clearly see that we had lots of blocks with only 1 or 2 records after our mass delete jobs. I have provided Oracle Support with this information as well as oradump files while the problem is occurring, but they don’t seem to be able to find anything wrong so far.

If you are interested in seeing if you are experiencing a high ‘failed probes on index block reclamation’ event run the query below.

select SS.snap_id,
SS.stat_name,
TO_CHAR(S.BEGIN_INTERVAL_TIME, ‘DAY’) DAY,
S.BEGIN_INTERVAL_TIME,
S.END_INTERVAL_TIME,
SS.value,
SS.value – LAG(SS.VALUE, 1, ss.value) OVER (ORDER BY SS.SNAP_ID) AS DIFF
from DBA_HIST_SYSSTAT SS,
DBA_HIST_SNAPSHOT S
where S.SNAP_ID = SS.SNAP_ID
AND SS.stat_NAME = ‘failed probes on index block reclamation’
ORDER BY SS.SNAP_ID ;

  8. Tests on 11gR2

The test was repeated on the latest 11gR2 and the problem can still be reproduced (as the screenshot shows, a single insert caused 6,675 buffer gets, this time with a larger data volume).

We can speculate that Oracle's one-off patch may simply put an upper bound on the number of "empty" blocks a leaf split will scan; below that bound the scan still happens, and the patch content has not been included in the mainstream public releases. Triggering the split with nothing cached caused roughly 4,000 blocks of physical reads, yet the operation still completed in 0.12 seconds (0.02 seconds when cached, as shown), on an ordinary ATA disk reading at about 100 MB/s (Timing buffered disk reads: 306 MB in 3.00 seconds = 101.93 MB/sec). The ASH view from January 21 shows the splitting session 260 waiting on a single-block read (db file sequential read) for 43,950 us, roughly 44 ms, far from the ~10 ms expected of healthy I/O. We can therefore be confident that I/O performance is also a major factor in making this leaf-split delay so visible.
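The ASH evidence referred to above can be pulled with a query along the following lines (a sketch against v$active_session_history; the session id and time window from the January 21 sample would need to be substituted):

select sample_time, session_id, event, time_waited
from   v$active_session_history
where  event in ('db file sequential read', 'enq: TX - index contention')
order  by sample_time;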

Conclusions

To sum up, among the options discussed earlier: MSSM cannot avoid the transaction timeouts caused by index splits; not deleting the data is infeasible for many objects; and the one-off patch on 10.2.0.4 exists only for Linux, so we could request it and test it concretely (with no patch in hand, the outcome remains unknown). Coalescing the index remains the most workable solution currently available, and it has no side effects.

11g New Feature: Flashback Transaction Backout

In Oracle Database 10g, Oracle introduced two flashback features, Flashback Version Query and Flashback Transaction Query, which let you undo erroneous changes to data following a logical error in the database. After a suspected data error, you first use Flashback Version Query to identify the versions of the table rows that belong to the erroneous transaction. Once the bad transaction is identified, you use Flashback Transaction Query to audit all the changes it made, and then use the SQL provided in the UNDO_SQL column of Flashback Transaction Query to reverse those changes. In this way the flashback features offer a powerful way to undo logical errors in the database.
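As a reminder of that 10g workflow, a Flashback Version Query looks roughly like this (a hypothetical hr.employees example over the last hour; the VERSIONS_XID it returns is then fed to Flashback Transaction Query, i.e. the FLASHBACK_TRANSACTION_QUERY view and its UNDO_SQL column):

select versions_xid, versions_starttime, versions_operation, salary
from   hr.employees
       versions between timestamp systimestamp - interval '1' hour and systimestamp
where  employee_id = 100;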

In Oracle 11g, the new Flashback Transaction Backout feature accomplishes what previously required Flashback Version Query and Flashback Transaction Query together. A data error usually causes other dependent transactions to run against the bad data. Flashback Transaction Backout is a new logical recovery feature that backs out the target transaction and returns the dependent transactions to their original state. It identifies and corrects the target transaction as well as its dependent transactions, completely undoing the effect of the logical data error. Undoing the complete set of inserts, updates and deletes preserves transactional atomicity and consistency. Thus, while the database stays online, you can perform logical recovery of the data simply by running a backout (by calling the TRANSACTION_BACKOUT procedure), or, if you use Database Console, with a single click. A dependent transaction can be related to the target transaction in the following ways:

  • A write-after-write (WAW) relationship, where the dependent transaction modified the same data modified by the target transaction.
  • A primary key constraint relationship, where the dependent transaction re-inserted the same primary key that the target transaction deleted.

The database undoes the transactions by executing compensating transactions, which restore the data affected by the unwanted transactions to its original state. Flashback Transaction relies on undo data (and the redo generated for the undo blocks) to build the compensating transactions; therefore, backing out a set of unwanted transactions requires both the undo data and the archived redo logs.

  1. Prerequisites for Flashback Transaction Backout: supplemental logging must first be enabled for the database, and the users who will use the feature must be granted specific privileges. To enable supplemental logging, use: alter database add supplemental log data; alter database add supplemental log data (primary key) columns;. In addition, grant the following to users of the feature: grant execute on dbms_flashback to hr; grant select any transaction to hr;. The user needs the flashback privilege, granted via EXECUTE on the DBMS_FLASHBACK package, plus the SELECT ANY TRANSACTION privilege. A user who only backs out transactions in their own schema needs no further grants, but a user who wants to back out transactions in other schemas must also be granted DML privileges on all tables affected by the backout.
  2. Using the TRANSACTION_BACKOUT procedure: the idea of the compensating transaction is central to the backout feature. Using undo data, a compensating transaction can back out one or more transactions. The TRANSACTION_BACKOUT procedure of the DBMS_FLASHBACK package makes it easy to roll back unwanted transactions. Its signature is:

dbms_flashback.transaction_backout(
numtxns NUMBER,
xids    xid_array,
options BINARY_INTEGER DEFAULT NOCASCADE,
scnhint NUMBER DEFAULT 0);

The TRANSACTION_BACKOUT procedure takes four parameters (the list below describes the name-based overload; the signature shown above is the XID-based overload):

  • NUMTXNS: the number of transactions to be backed out.
  • NAMES: the list of transactions to be backed out, identified by name.
  • TIMEHINT: if you identify transactions by name, you can supply a time hint, such as a time before the transactions started.
  • OPTIONS: specifies the order in which the target transaction and its dependent transactions are backed out.

The TRANSACTION_BACKOUT procedure only analyzes the dependencies among the transactions, performs the DML, and produces a report; it does not commit that DML automatically. What it does is lock the affected rows and the tables themselves, so that no other transaction can become dependent on them while the backout is pending. You must issue an explicit COMMIT to make the backout permanent.

The database automatically assigns a transaction name to the backout operation, but giving it an explicit name yourself makes later auditing easier. If the TRANSACTION_BACKOUT procedure completes successfully, the single transaction has been backed out and had no dependent transactions. How long a backout takes depends on the amount of redo generated by the transactions being backed out: the more redo, the longer the TRANSACTION_BACKOUT operation takes.

The TRANSACTION_BACKOUT procedure has four option values:

  • CASCADE: backs out all transactions, including the dependent transactions, which are backed out before the parent (target) transaction.
  • NOCASCADE: the transaction is expected to have no dependent transactions; this is the default.
  • NOCASCADE_FORCE: backs out only the target transactions, ignoring any dependent transactions.
  • NONCONFLICT_ONLY: backs out only those rows of the target transactions that do not conflict.

Note: the default value NOCASCADE expects the transactions being backed out to have no dependent transactions.
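A minimal sketch of a backout by XID (the XID literal here is hypothetical; in practice it comes from Flashback Version Query / Flashback Transaction Query):

declare
  v_xids sys.xid_array := sys.xid_array('09000B00C4010000');  -- hypothetical transaction id
begin
  dbms_flashback.transaction_backout(
      numtxns => 1,
      xids    => v_xids,
      options => dbms_flashback.nocascade);
  commit;   -- transaction_backout does not commit its own DML
end;
/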

 TRANSACTION_BACKOUT reports

The details of a backout operation can be checked in the DBA_FLASHBACK_TRANSACTION_STATE and DBA_FLASHBACK_TRANSACTION_REPORT views, both populated by the TRANSACTION_BACKOUT procedure. If a transaction appears in DBA_FLASHBACK_TRANSACTION_STATE, it has been backed out successfully. For each backed-out transaction, DBA_FLASHBACK_TRANSACTION_REPORT provides a detailed report. If you prefer not to use the DBMS_FLASHBACK package, you can also perform transaction backout from Database Console.

When reposting, please credit the source: https://www.askmac.cn

Oracle Database 11g Release 2 Released

The much-anticipated 11g R2 has finally lifted its veil. As the vanguard platform, Linux naturally lets the largest number of experts and users test the new release most widely.

Oracle even gave it a dedicated domain, www.oracledatabase11g.com. Using a dedicated domain for a new product is unusual, and the domain's PageRank and Alexa rank are both still quite low. At bottom it is probably just a fashion, much as each of Blizzard's games has its own domain, and we can guess that oracledatabase12[X].com and oracledatabase13[X].com will appear in due course. Below is the banner Flash from the 11g home page.

In "Oracle announces availability of Oracle Database 11g Release 2", Oracle rolled out its usual marketing pitch and claimed the following advantages: more mature grid computing to help enterprises cut costs, storage savings, less useless redundancy, and automatic tuning and scalability (a more powerful AWR and memory auto-tuning).

These advantages are of course what we hope for, but reality is often harsh. As 11g matures and moves into wider use, DBAs will have to master its new features, as well as the new release's peculiar quirks; no doubt we will get to experience them one by one.

Using Oracle RMAN Scripts

Why use scripts?

Why use RMAN command scripts? There are two main reasons:

  1. Most RMAN operations are batch operations and are automated. Backing up the database, for example, is a repetitive task, not something you want to type out by hand on every run.
  2. Scripts provide consistency. Ad-hoc, one-off operations such as restoring the database from a backup are generally not automated, but the operation itself is the same regardless of the environment the DBA runs it in.

Oracle 11g supports two forms of scripts:

  1. Command files: plain text files (flat files) on the file system.
  2. Stored scripts, kept in the Oracle recovery catalog and invoked from the RMAN command line.

Command files

An Oracle command file is an interpreted text file, similar to a shell script on UNIX or a batch (BAT) file on Windows. Listing 1 shows a very simple example that backs up the USERS tablespace. The .rman extension is not required, but it helps make the file's purpose clear.

Listing 1: script to back up the USERS tablespace

connect target /

connect catalog rman/secretpass@rmancat

run {

allocate channel c1 type disk format '/orabak/%U';

backup tablespace users;

}

You can invoke the script in several ways, for example from the RMAN prompt:

RMAN> @backup_ts_users.rman

Note the @ symbol used for the invocation.

You can also invoke the script from the shell, for example:

rman @backup_ts_users.rman

This form is very useful; if you prefer not to use the @ symbol, use the following instead:

rman cmdfile=backup_ts_users.rman

Note that the CONNECT clauses are placed inside the backup_ts_users.rman command file, so there is no need to supply the username and password again, and no risk of exposing the password.

Passing parameters: the backup_ts_users.rman command file runs fine, but it is too rigid. It writes the backup to one specific directory and backs up only one tablespace (USERS). If you want to back up to another location, or back up other tablespaces, you have to create new scripts.

A better strategy is a parameter-driven RMAN script; compared with hard-coding, a script that takes parameters is far more flexible. Listing 2 shows the modified backup_ts_users.rman script.

Listing 2: parameter-driven script

connect target /

connect catalog rman/secretpass@rmancat

run {

allocate channel c1 type disk format '&1/%U';

backup tablespace &2;

}

Another shell script, named backup_ts_generic.sh, calls the script above, passing /tmp as the backup directory and USERS as the tablespace to back up.

$ORACLE_HOME/bin/rman <<EOF

@backup_ts_generic.rman "/tmp" USERS

EOF

You can make it even more flexible by changing the second line of backup_ts_generic.sh as follows:

@backup_ts_generic.rman "/tmp" $1

You can then pass the tablespace name as a parameter to the shell script, with /tmp as the fixed backup directory. For example, to back up the MYTS1 tablespace:

backup_ts_generic.sh MYTS1

Logging: when an RMAN script runs from some automated setup, nobody is actually watching the command window, so how do we get at RMAN's output? That output log tells us how the script ran, so it is important; to capture it, use the log option.

rman cmdfile=backup_ts_users.rman log=backup_ts_users.log

The output of the backup_ts_users.rman script is now recorded in a file named backup_ts_users.log instead of appearing on screen.

Stored scripts

Although command files cover most situations, they have some drawbacks. They always live on the server, which is not very secure: anyone with read permission on the file can obtain the password of SYS or other accounts.

The solution is RMAN stored scripts, which are kept in the Oracle recovery catalog rather than directly on the server. Listing 3 shows an example of calling a stored script.

Listing 3

RMAN> run { execute script

backup_ts_users; }

C:\> rman

RMAN> connect target /

RMAN> connect catalog rman/secretpass@rmancat

RMAN> create script backup_ts_users

2> comment 'Tablespace Users Backup'

3> {

4>      allocate channel c1 type disk format 'c:\temp\%U';

5>      backup tablespace users;

6> }

In Listing 3 the stored script backup_ts_users is visible only when you are connected to the target database. Stored scripts can also take parameters, as Listing 4 shows.

Listing 4: parameter-driven stored script:

RMAN> create script backup_ts_any

2> comment 'Any Tablespace Backup'

3> {

4>      allocate channel c1 type disk format 'c:\temp\%U';

5>      backup tablespace &1;

Enter value for 1: users

users;

6> }

7>

When calling a parameter-driven stored script, use the USING clause. For example, to back up the SYSTEM tablespace with the stored script:

run { execute script

backup_ts_any using 'SYSTEM'; }

Managing stored scripts: the RMAN tool provides the necessary means of managing stored scripts.

The stored scripts can be listed with the following command:

RMAN> list script names;

List of Stored Scripts in Recovery Catalog

Scripts of Target Database ARUPLAP1

Script Name

Description

————

backup_ts_any

Any Tablespace Backup

backup_ts_users

Tablespace Users Backup

The command above lists both local and global script names.

To list only the global names, use:

RMAN> list global script names;

To display the contents of a specific script, use the following command:

RMAN> print global script

backup_ts_level1_any;

To delete a stored script:

RMAN> delete global script

backup_ts_level1_any;

You can also create a stored script from a script file, as follows:

RMAN> create script backup_ts_users

from file 'backup_ts_users.rman';

And of course you can write a stored script back out to a flat script file:

RMAN> print script backup_ts_users

to file 'backup_ts_users.rman';

How Constraints Help Query Optimization

Constraints matter greatly to query optimization. Many people only see constraints as a way to guarantee data integrity, and of course that is true,
but the optimizer also uses constraints when deciding on the optimal execution plan.
The optimizer takes the following inputs:
1. the query statement
2. all available statistics on the database objects
3. system statistics, where present, such as CPU speed, single-block physical read time and other hardware metrics
4. the database initialization parameters
The optimizer uses all of this information to decide on the best way to run the query. I often meet people who avoid constraints in data warehouses or reporting systems.
Perhaps they do not need constraints to keep the data consistent, but they certainly need them to get optimal execution plans. A bad plan in a data warehouse
can run for hours or even days. For performance reasons alone, data warehouses need constraints too!
Let us look at some examples (using 11gR1, 11.1.0.7). The first is partition elimination, a feature introduced back in version 7.3.
In Listing 1 we create two tables containing mutually exclusive data and a view that combines them with UNION ALL.
Listing 1: create the tables with mutually exclusive data, and the view
SQL> create table t1
2  as
3  select * from all_objects
4  where object_type in ('TABLE','VIEW');

SQL> alter table t1 modify object_type not null;

Table altered.
SQL> alter table t1 add constraint t1_check_otype
2  check (object_type in ('TABLE','VIEW'));

Table altered.

SQL> create table t2
2  as
3  select * from all_objects
4  where object_type in ('SYNONYM','PROCEDURE');

SQL> alter table t2 modify object_type not null;

Table altered.

Table created.

SQL> alter table t2 add constraint t2_check_obype
2  check (object_type in ('SYNONYM','PROCEDURE'));

Table altered.
SQL> create or replace view v
2  as
3  select * from t1
4  union all
5  select * from t2;

View created.

In Listing 2 we query the view on the object_type column and look at its execution plan.

Listing 2:
SQL> select * from v where object_type = 'TABLE';

Execution Plan
—————————————————————————-
Plan hash value: 3982894595

—————————————————————————–
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
—————————————————————————–
|   0 | SELECT STATEMENT     |      |    40 |  6320 |   151   (1)| 00:00:02 |
|   1 |  VIEW                | V    |    40 |  6320 |         (1)| 00:00:02 |
|   2 |   UNION-ALL          |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL | T1   |  3083 |   475K|    31   (0)| 00:00:01 |
|*  4 |    FILTER            |      |       |       |            |          |
|*  5 |     TABLE ACCESS FULL| T2   |     5 |   790 |   12    (1)| 00:00:02 |
—————————————————————————–

Predicate Information (identified by operation id):
——————————-

3 - filter("OBJECT_TYPE"='TABLE')
4 - filter(NULL IS NOT NULL)
5 - filter("OBJECT_TYPE"='TABLE')

The plan in Listing 2 does not appear to avoid reading table T2, even though object_type in T2 only contains SYNONYM and PROCEDURE values. But look at the check above that table access: step 4 is a FILTER operation whose condition is
NULL IS NOT NULL
This is interesting: we never wrote such a predicate, the optimizer added it for us. Because NULL IS NOT NULL is always false, that branch of the plan (steps 4 and 5) will in fact never execute. (You can confirm this with tkprof.)

In the next example, let us see why NOT NULL constraints are especially important where index use is concerned.

SQL> create table t
2    as
3    select * from all_objects;
Table created.

SQL> create index t_idx on t(object_type);
Index created.

SQL> exec
dbms_stats.gather_table_stats( user, 'T' );
PL/SQL procedure successfully completed.

Now we try to count the rows in the table; the optimizer has only one option, as Listing 3 shows.
Listing 3:
SQL> set autotrace traceonly explain
SQL> select count(*) from t;

Execution Plan
—————————————-
Plan hash value: 2966233522

——————————————————————-
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
——————————————————————-
|   0 | SELECT STATEMENT   |      |     1 |   283   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| T    | 68437 |   283   (1)| 00:00:04 |
——————————————————————-
Because the object_type column is nullable and the index contains no entry whose key columns are all NULL, we cannot use the index to count the rows in the table and are forced into a full table scan. If we tell the database that OBJECT_TYPE is NOT NULL, the plan changes immediately, as Listing 4 shows.

Listing 4:
SQL> alter table t modify object_type NOT NULL;
Table altered.

SQL> set autotrace traceonly explain
SQL> select count(*) from t;

Execution Plan
——————————————
Plan hash value: 1058879072

————————————————————————
| Id  | Operation             | Name  | Rows   | Cost (%CPU)| Time     |
————————————————————————
|   0 | SELECT STATEMENT      |       |     1  |    54   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1  |            |          |
|   2 |   INDEX FAST FULL SCAN| T_IDX | 68437  |    54   (2)| 00:00:01 |
————————————————————————
What if object_type really does have to allow NULLs? What can we do then? What if we add a key value to the index that can never be NULL? Clearly the plan will change. Here I add the constant 0 to the index.
SQL> drop index t_idx;
Index dropped.

SQL> create index t_idx
on t (object_type, 0);
Index created.

Now the plan in Listing 6 is willing to use the index.

Listing 6:
SQL> select * from t where object_type is null;

Execution Plan
—————————–
Plan hash value: 470836197

————————————————————————————–
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
————————————————————————————–
|   0 | SELECT STATEMENT             |       |     1 |   101 |  1      (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T     |     1 |   101 |  1      (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN           | T_IDX |     1 |       |  1      (0)| 00:00:01 |
————————————————————————————–

Predicate Information (identified by operation id):
—————————————————
2 - access("OBJECT_TYPE" IS NULL)

Constraints, primary keys, and foreign keys

Now let us test how primary and foreign keys influence the optimizer. We copy the scott.emp and scott.dept tables and use DBMS_STATS.SET_TABLE_STATS to alter their statistics so that they look "huge" to the optimizer.

Listing 7:

SQL> create table emp
2    as
3    select *
4    from scott.emp;
Table created.

SQL> create table dept
2    as
3    select *
4    from scott.dept;
Table created.

SQL> create or replace view emp_dept
2    as
3    select emp.ename, dept.dname
4      from emp, dept
5     where emp.deptno = dept.deptno;
View created.

SQL> begin
2       dbms_stats.set_table_stats
3           ( user, 'EMP', numrows=>1000000, numblks=>100000 );
4       dbms_stats.set_table_stats
5           ( user, 'DEPT', numrows=>100000, numblks=>10000 );
6    end;
7    /
PL/SQL procedure successfully completed.

We also use a view, emp_dept, that returns the join of the two tables. Even when the view only returns data from the emp table (the ename column), the plan still has to read both emp and dept, as Listing 8 shows.

Listing 8:
SQL> select ename from emp_dept;

Execution Plan
—————————–
Plan hash value: 615168685

—————————————————————————————-
| Id   | Operation          | Name  | Rows  |  Bytes |TempSpc | Cost (%CPU) | Time     |
—————————————————————————————-
|    0 | SELECT STATEMENT   |       |  1000K|     31M|        | 31515    (1)| 00:06:19 |
|*   1 |  HASH JOIN         |       |  1000K|     31M|   2448K| 31515    (1)| 00:06:19 |
|    2 |   TABLE ACCESS FULL| DEPT  |   100K|   1269K|        |  2716    (1)| 00:00:33 |
|    3 |   TABLE ACCESS FULL| EMP   |  1000K|     19M|        | 27151    (1)| 00:05:26 |
—————————————————————————————-

Predicate Information (identified by operation id):
—————————————————
1 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")

We know the dept table does not actually need to be visited, because deptno is dept's primary key and deptno in emp is really a foreign key to it. Now let us add those constraints.
SQL> alter table dept add constraint
dept_pk primary key(deptno);
Table altered.

SQL> alter table emp add constraint
emp_fk_dept foreign key(deptno)
2    references dept(deptno);
Table altered.

Now look at the plan in Listing 9.

Listing 9:
SQL> select ename from emp_dept;

Execution Plan
——————————
Plan hash value: 3956160932

————————————————————————–
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
————————————————————————–
|   0 | SELECT STATEMENT  |      | 50000 |   976K| 27152   (1)| 00:05:26 |
|*  1 |  TABLE ACCESS FULL| EMP  | 50000 |   976K| 27152   (1)| 00:05:26 |
————————————————————————–

Predicate Information (identified by operation id):
—————————————————
1 - filter("EMP"."DEPTNO" IS NOT NULL)

As shown above, the dept table no longer has to be considered in the query, so there is no hash join any more, and an extra predicate appears: deptno is not null.
The optimizer recognizes the foreign key and the primary key, so the query is effectively equivalent to SELECT ENAME FROM EMP WHERE DEPTNO IS NOT NULL.
It discards the unnecessary table and gains response time.
This also shows that in real applications you should not always write SELECT * just to simplify development.

NULL对于索引的影响:

Indexes and NULLs

Applies to:

Oracle Server – Enterprise Edition – Version: 9.2.0.8 to 10.2.0.4 – Release: 9.2 to 10.2
Information in this document applies to any platform.

Purpose

This article illustrates some common reasons why indexes are not selected when NULLs are present.

Scope and Application

This is a basic level overview with examples of index usage.

Indexes and NULLs

Indexes and NULLs

When dealing with indexes, a common mistake is to forget about NULLs. Indexes do not store NULL values and so indexes on NULLable columns can’t be used to drive queries unless there is something that eliminates the NULL values from the query.

To illustrate this are a number of examples based upon the following table/indexes:

drop table nulltest;

create table nulltest (
  col1 number,
  col2 number,
  col3 number not null,
  col4 number not null);

create index nullind1 on nulltest (col1);
create index notnullind3 on nulltest (col3);

begin
  for i in 1..10000 loop
    insert into nulltest values (i,i,i,i);
    if i mod 1000 = 0 then
      commit;
    end if;
  end loop;
end;
/

analyze table nulltest compute statistics;

Illustrative Queries:

select col1 from nulltest t;
select /*+ index(t nullind1) */ col1 from nulltest t;
select /*+ index(t) */ col1 from nulltest t;
select /*+ index(t notnullind3) */ col1 from nulltest t;
select /*+ index(t notnullind3) */ col3 from nulltest t;
select /*+ index(t nullind1) */ col1 from nulltest t where col1 between 0 and 20000;
select col1 from nulltest t where col1 is not null;

Queries and Explanations:

SQL> select col1 from nulltest t;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=10000 Bytes=30000)
   1    0   TABLE ACCESS (FULL) OF 'NULLTEST' (Cost=6 Card=10000 Bytes=30000)

col1 is NULLable so the index cannot be used with no predicate
SQL> select /*+ index(t nullind1) */ col1 from nulltest t;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=6 Card=10000 Bytes=30000)
   1    0   TABLE ACCESS (FULL) OF 'NULLTEST' (Cost=6 Card=10000 Bytes=30000)
hinting the index on col1 (nullind1) makes no difference since col1 is NULLable
SQL> select /*+ index(t) */ col1 from nulltest t;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=49 Card=10000 Bytes=30000)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'NULLTEST' (Cost=49 Card=10000 Bytes=30000)
   2    1     INDEX (FULL SCAN) OF 'NOTNULLIND3' (NON-UNIQUE) (Cost=20 Card=10000)

An open index hint on the table allows the selection of the index on the NOT NULL column (col3). Notice that the col3 predicate is not included anywhere in the query. In order for col1 to be retrieved, the table has to be accessed.

SQL> select /*+ index(t notnullind3) */ col1 from nulltest t;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=49 Card=10000 Bytes=30000)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'NULLTEST' (Cost=49 Card=10000 Bytes=30000)
   2    1     INDEX (FULL SCAN) OF 'NOTNULLIND3' (NON-UNIQUE) (Cost=20 Card=10000)
hinting notnullind3 directly works as well
SQL> select /*+ index(t notnullind3) */ col3 from nulltest t;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=20 Card=10000 Bytes=30000)
   1    0   INDEX (FULL SCAN) OF 'NOTNULLIND3' (NON-UNIQUE) (Cost=20 Card=10000 Bytes=30000)
Selecting the NOT NULL column (col3) works fine and uses the index with no table access.
SQL> select /*+ index(t nullind1) */ col1 from nulltest t where col1 between 0 and 20000;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=20 Card=10000 Bytes=30000)
   1    0   INDEX (RANGE SCAN) OF 'NULLIND1' (NON-UNIQUE) (Cost=20 Card=10000 Bytes=30000)
The effect of the predicate against col1 is to eliminate nulls from the data returned from the column. This allows the index to be used.
SQL> select col1 from nulltest t
  2  where col1 is not null;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=10000 Bytes=30000)
   1    0   INDEX (FAST FULL SCAN) OF 'NULLIND1' (NON-UNIQUE) (Cost=5 Card=10000 Bytes=30000)
This example illustrates that forcing the column to return only NOT NULL values allows the index to be used.

Note that in the previous example, the Index hint prevents an index fast full scan operation from being selected. An INDEX_FFS hint must be supplied to force an index fast full scan.
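For completeness, a hedged example of forcing the fast full scan mentioned above on the same test table:

select /*+ index_ffs(t nullind1) */ col1
from   nulltest t
where  col1 is not null;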

Active Session History (ASH) performed an emergency flush

Active Session History (ASH) performed an emergency flush. This may mean that ASH is undersized. If emergency flushes are a recurring issue, you may consider increasing ASH size by setting the value of _ASH_SIZE to a sufficiently large value. Currently, ASH size is 4194304 bytes. Both ASH size and the total number of emergency flushes since instance startup can be monitored by running the following query:


SQL> select total_size,awr_flush_emergency_count from v$ash_info;

TOTAL_SIZE AWR_FLUSH_EMERGENCY_COUNT
---------- -------------------------
   4194304                         1

SQL> select * from v$ash_info;

TOTAL_SIZE FIXED_SIZE SAMPLING_INTERVAL OLDEST_SAMPLE_ID OLDEST_SAMPLE_TIME                                                          LATEST_SAMPLE_ID
---------- ---------- ----------------- ---------------- --------------------------------------------------------------------------- ----------------
LATEST_SAMPLE_TIME                                                          SAMPLE_COUNT SAMPLED_BYTES SAMPLER_ELAPSED_TIME DISK_FILTER_RATIO
--------------------------------------------------------------------------- ------------ ------------- -------------------- -----------------
AWR_FLUSH_BYTES AWR_FLUSH_ELAPSED_TIME AWR_FLUSH_COUNT AWR_FLUSH_EMERGENCY_COUNT
--------------- ---------------------- --------------- -------------------------
   4194304    4194304              1000          1168516 18-JUN-11 05.06.33.163000000 AM                                                      1490473
21-JUN-11 10.40.34.688000000 PM                                                     9257       3104088              2883663                10
        2764680                 557774               1                         1

This issue can be ignored.
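If the emergency flushes do recur, the resize suggested in the message would look roughly like this (a sketch only: "_ash_size" is a hidden parameter, the 8 MB target is an assumption, and such changes should normally be made under Oracle Support guidance):

alter system set "_ash_size" = 8388608 scope=spfile;
-- restart the instance for the spfile change to take effect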

Learning 11g New Background Processes

New Background Processes In 11g

ACMS (atomic controlfile to memory service) per-instance process is an agent that contributes to ensuring a distributed SGA memory update is either globally committed on success or globally aborted in the event of a failure in an Oracle RAC environment.

DBRM (database resource manager) process is responsible for setting resource plans and other resource manager related tasks.This process implements the resource plans that may be configured for a database instance. It sets the resource plans in place and performs various operations related to enforcing/implementing those resource plans. The resource manager allows the administrators of a database to have fine grained control over the resources used by the database instance, by applications accessing the database, or by individual users accessing the database.

DIA0 (diagnosability process 0) (only 0 is currently being used) is responsible for hang detection and deadlock resolution.

DIAG (diagnosability) process performs diagnostic dumps and executes global oradebug commands.In past releases, the DIAG process was used exclusively in a RAC environment. As of Oracle Database 11g, with the new ADR (Advanced Diagnostic Repository), it is responsible for monitoring the overall health of the instance, and it captures information needed in the processing of instance failures. This applies to both single instance configurations as well as multi-instance RAC configurations.

EMNC (event monitor coordinator) is the background server process used for database event management and notifications.The EMNC process is part of the AQ architecture. It is used to notify queue subscribers of messages they would be interested in. This notification is performed asynchronously. There are Oracle Call Interface (OCI) functions available to register a callback for message notification. The callback is a function in the OCI program that will be invoked automatically whenever a message of interest is available in the queue.The EMNn background process is used to notify the subscriber. The EMNC process is started automatically
when the first notification is issued for the instance. The application may then issue an explicit message_receive(dequeue) to retrieve the message.

FBDA (flashback data archiver process) archives the historical rows of tracked tables into flashback data archives. Tracked tables are tables which are enabled for flashback archive. When a transaction containing DML on a tracked table commits, this process stores the pre-image of the rows into the flashback archive. It also keeps metadata on the current rows.FBDA is also responsible for automatically managing the flashback data archive for space, organization, and retention and keeps track of how far the archiving of tracked transactions has occurred.This process is new in Oracle Database 11g Release 1 and above. It is the key component of the new flashback data archive capability–the ability to query data “as of” long periods of time ago (for example, to query data in a table as it appeared one year ago, five years ago, and so on). This long term historical query capability is achieved by maintaining a history of the row changes made to every row in a table over time. This history, in turn, is maintained by the FBDA process in the background. This process functions by working soon after a transaction commits. The FBDA process will read the UNDO generated by that transaction and roll back the changes made by the transaction. It will then record these rolled back (the original values) rows in the flashback data archive for us.

GTX0-j (global transaction) processes provide transparent support for XA global transactions in an Oracle RAC environment. The database autotunes the number of these processes based on the workload of XA global transactions. Global transaction processes are only seen in an Oracle RAC environment.

KATE performs proxy I/O to an ASM metafile when a disk goes offline.

MARK marks ASM allocation units as stale following a missed write to an offline disk.

SMCO (space management coordinator) process coordinates the execution of various space management related tasks, such as proactive space allocation and space reclamation. It dynamically spawns slave processes (Wnnn) to implement the task.This process is part of the manageability infrastructure. It coordinates the proactive space management features of the database such as the processes that discover space that could be reclaimed and the processes that perform the reclamation.

VKTM (virtual keeper of time) is responsible for providing a wall-clock time (updated every second) and reference-time counter (updated every 20 ms and available only when running at elevated priority).Implements a consistent, fine-grained clock
for the Oracle instance. It is responsible for providing both wall clock time (human readable) as well as an extremely high resolution timer (not necessarily built using wall clock time, more of a ticker that increments for very small units of time) used to measure durations and intervals.

RCBG – this background process is responsible for processing data into server result cache

GEN0 – General task execution process which performs required tasks;This process provides, as expected by its name, a general task execution thread for the database. The main goal of this process is to offload potentially blocking processing (processing that would cause a process to stop while it occurs) from some other process and perform it in the background. For example, if the main ASM process needs to perform some blocking file operation, but that operation could safely be done in the background (ASM can safely continue processing before the operation completes), then the ASM process may request the GEN0 process to perform this operation and let GEN0 notify it upon completion. It is similar in nature to the slave processes described further below.

INSV Data Guard Broker INstance Slave Process; The internode servers(INSVs) maintain a connection between the nodes in the cluster to ensure that Broker on each node knows the state of the cluster. The primary database will always startup an INSV process even if the database is not a RAC; As other RAC instances start,the Broker will start the INSVs, and they will make queries between all the instances to determine the current state of each node in a Broker-controlled database. In this manner, the Broker is able to maintain information about each instance in the RAC. To make sure that this does not have a adverse impact to performance , this querying is optimized to avoid any unnecessary RAC traffic.

FSFP Data Guard Broker FSFO Pinger; We may see one more process on the primary database in Broker-controlled configuration: The Fast-Start Failover process(FSFP) which is used only when the primary is under the control of DataGurad’s automatic failover feature, Fast-Start Failover. The FSFP process will establish a connection to the Fast-Start Failover target database by connecting to a DRC process on that database, much like the NVS to DRC process connection.

* NSVn DataGuard Net Server(Nsvn) From 1 to n of these network server processes can exist. They are responsible for making contact with the remote database and sending across any work items to the remote database. Connection to the remote database are made using the same connect identifier that you specified for the database when you created the configuration. “DMON communicates with the standby databases using NSV processes. This prevents DMON from network hang.”

* DRCn These network receiver processes establish the connection from the source database NSVn process. An NSVn to DRCn connection is similar to the LogWriter Network Service(LSN) to the Remote File Servers(RFS) connection for Redo Transport. As with Redo Transport , when the Broker needs to send something(data or SQL, for example) between databases, it uses this NSV to DRC connection. These connections are started as need.

XDMG cell automation manager;XDMG monitors all configured Exadata cells for state changes, such as a bad disk getting replaced, and performs the required tasks for such events. Its primary tasks are to watch for inaccessible disks and cells and when they become accessible again, and to initiate the ASM ONLINE operation. The ONLINE operation is handled by XDWK.

* XDWK XDWK gets started when asynchronous actions such as ONLINE, DROP, and ADD an ASM disk are requested by XDMG. After a 5 minute period of inactivity, this process will shut itself down.

ABMR Auto BMR Background Process “In addition to the real time query capability of the 11g Active Data Guard feature, we can also add to our high availability capability by using the Automatic Block Media Repair feature whereby data block corruptions on the Primary database can be repaired by obtaining those blocks from the standby site – all performed by a background process (ABMR) transparent to the application.”

Some additional Processes not documented in 10G :

* PZ (PQ slaves used for global Views) are RAC Parallel Server Slave processes, but they are not normal parallel slave processes, PZnn processes (starting at 99) are used to query GV$ views which is done using Parallel Execution on all instances, if more than one PZ process is needed, then PZ98, PZ97,… (in that order) are created automatically.

* O00 (ASM slave processes) A group of slave processes establish connections to the ASM instance. Through this connection pool database processes can send messages to the ASM instance. For example opening a file sends the open request to the ASM instance via a slave. However slaves are not used for long running operations such as creating a file. The use slave (pool) connections eliminate the overhead of  logging into the ASM instance for short requests

* x000 – Slave used to expel disks after diskgroup reconfiguration
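To see which of the background processes described above are actually running on a given instance, a simple check against v$bgprocess (only rows with a non-zero process address correspond to started processes):

select name, description
from   v$bgprocess
where  paddr <> hextoraw('00')
order  by name;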

DESCRIPTION DEST
Auto BMR Listener ABMR
Auto BMR message ABMR
ACMS Watchdog Task ACMS
RM workload learning DBRM
Check for MDBRM to LDBRM message DBRM
Check for LDBRM to MDBRM message DBRM
DBRM Active Session Limit Event DBRM
Check queuing policy DBRM
RM running count check DBRM
DBRM Timeout Actions DBRM
shutdown DBRM DBRM
do RM action in DBRM DBRM
check for quiesce messages DBRM
unquiesce the instance during database close DBRM
unsubscribe to quiesce channel DBRM
subscribe to quiesce channel DBRM
IORM action DSKM
DSKM check for message from master DiskMon DSKM
SAGE CC Action DSKM
DSKM action DSKM
FSFP Wakeup FSFP
FSFP Shutdown Message FSFP
FSFP Register Observer Message FSFP
ASM UFG health check timeout GEN0
Diskgroup Resource Action GEN0
release quiesce enqueue GEN0
get quiesce enqueue GEN0
check for parameters from other instances GEN0
kcl downconvert object locks GEN0
kcl update persistent read mostly info GEN0
kcl initiate persistent read mostly GEN0
kcb L2 cache verify file header GEN0
Start Exadata specific backgrounds GEN0
KSVMSVC timeout action GEN0
Heartbeat action for HSM connectivity GEN0
Prepare flashback log GEN0
GEN0 Timeout KCQ cleanup GEN0
Global Txn SHUTdown GTX*
K2Q Timeout action GTX*
get instance lock GTX*
convert instance lock GTX*
free instance lock GTX*
INSV Receive Message INSV
INSV Shutdown Message INSV
INSV Interrupt INSV
Network Server forced NS**
Network Server shutdown NS**
Network Server wakeup NSA*
Network Server wakeup NSS*
NetSlave Interrupt Function NSV*
NetSlave Wakeup Message NSV*
NetSlave Shutdown Message NSV*
do work for new gpnp instances RMS0
RSM Set Parameter RSM0
RSM Wakeup RSM0
RSM Receive Message RSM0
SMCO Timout SMCO
SMCO Action SMCO
Infrequent timeout actions for Exadata disks auto manage module XDMG
Action to process msgs from instance processes XDMG
Interrupt actions for Exadata disks auto manage module XDMG

References:
F Background Processes
<Oracle Data Guard 11g Handbook>
<Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions>
New Background Processes In 11g [ID 444149.1]
