Too many fragmentation in LMT?

这周和同事讨论技术问题时，他告诉我客户的一套11.1.0.6的数据库中某个本地管理表空间上存在大量的Extents Fragment区间碎片，这些连续的Extents没有正常合并为一个大的Extent，他怀疑这是由于11.1.0.6上的bug造成了LMT上存在大量碎片。

同事判断该表空间上有碎片的依据是从dba_free_space视图中查询到大量连续的Free Extents:

SQL> select tablespace_name,EXTENT_MANAGEMENT,ALLOCATION_TYPE from dba_tablespaces where tablespace_name='FRAGMENT';

TABLESPACE_NAME                EXTENT_MAN ALLOCATIO
------------------------------ ---------- ---------
FRAGMENT                       LOCAL      SYSTEM

SQL> select block_id,blocks from dba_free_space where tablespace_name='FRAGMENT' and rownum<10;
BLOCK_ID     BLOCKS
---------- ----------
40009     222136
25          8
9           8
17          8
33          8
41          8
49          8
57          8
65          8
.............. 

SQL> select count(*),blocks from dba_free_space where tablespace_name='FRAGMENT' and blocks=8 group by blocks;

  COUNT(*)     BLOCKS
---------- ----------
      5000          8

以上可以看到FRAGMENT表空间使用autoallocate的Local Extent Management，的确存在大量的连续Extents没有合并。在DMT即字典管理表空间模式下需要SMON进程定期维护FET$基表将tablespace上的连续空闲Extents合并为更大的一个Extents。而在LMT模式下因为采用数据文件头上(datafile header 3-8 blocks in 10g)的位图管理区间，所以无需某个后台进程特意去合并区间。

为什么LMT下连续空闲Extents没有合并而造成碎片呢？因为这套库采用11gr1较不稳定的11.1.0.6版本，所以把问题归咎为某个bug似乎可以讲得通。一开始我较为认同同事的bug论，且和同事一起查询了Metalink上11gr1上一些已知的bug，但并没有发现症状匹配的bug note。

这让我反思这个问题，过早的将cause定位到bug过于主观了，并不是所有我们预期外的情况(unexpected)都属于bug。

实际上dba_free_space所显示的信息可能并不”真实”，这种幻象往往由10g以后出现的flashback table特性引起:

SQL> select text from dba_views where view_name='DBA_FREE_SPACE';

TEXT
--------------------------------------------------------------------------------
======DMT REAL FREE EXTENTS=============

select ts.name, fi.file#, f.block#,
       f.length * ts.blocksize, f.length, f.file#
from sys.ts$ ts, sys.fet$ f, sys.file$ fi
where ts.ts# = f.ts#
  and f.ts# = fi.ts#
  and f.file# = fi.relfile#
  and ts.bitmapped = 0

union all

======LMT REAL FREE EXTENTS=============

select /*+ ordered use_nl(f) use_nl(fi) */
       ts.name, fi.file#, f.ktfbfebno,
       f.ktfbfeblks * ts.blocksize, f.ktfbfeblks, f.ktfbfefno
from sys.ts$ ts, sys.x$ktfbfe f, sys.file$ fi
where ts.ts# = f.ktfbfetsn
  and f.ktfbfetsn = fi.ts#
  and f.ktfbfefno = fi.relfile#
  and ts.bitmapped <> 0 and ts.online$ in (1,4) and ts.contents$ = 0

union all

======LMT RECYCLEBIN FREE EXTENTS=============

select /*+ ordered use_nl(u) use_nl(fi) */
       ts.name, fi.file#, u.ktfbuebno,
       u.ktfbueblks * ts.blocksize, u.ktfbueblks, u.ktfbuefno
from sys.recyclebin$ rb, sys.ts$ ts, sys.x$ktfbue u, sys.file$ fi
where ts.ts# = rb.ts#
  and rb.ts# = fi.ts#
  and u.ktfbuefno = fi.relfile#
  and u.ktfbuesegtsn = rb.ts#
  and u.ktfbuesegfno = rb.file#
  and u.ktfbuesegbno = rb.block#
  and ts.bitmapped <> 0 and ts.online$ in (1,4) and ts.contents$ = 0

union all

======DMT RECYCLEBIN FREE EXTENTS=============

select ts.name, fi.file#, u.block#,
       u.length * ts.blocksize, u.length, u.file#
from sys.ts$ ts, sys.uet$ u, sys.file$ fi, sys.recyclebin$ rb
where ts.ts# = u.ts#
  and u.ts# = fi.ts#
  and u.segfile# = fi.relfile#
  and u.ts# = rb.ts#
  and u.segfile# = rb.file#
  and u.segblock# = rb.block#
  and ts.bitmapped = 0

以上我们通过解析10g中的dba_free_space视图可以了解到该视图所显示的Free Extents由以下四个部分组成:

LMT表空间上真正空闲的Extents
DMT表空间上真正空闲的Extents
LMT表空间上被RECYCLEBIN中对象占用的Extents
DMT表空间上被RECYCLEBIN中对象占用的Extents

而在10g以前的版本中因为没有recyclebin特性的”干扰”，所以dba_free_space所显示的Free Extents由前2个部分组成，因此我们可以在10g中创建一个兼容视图以实现对真正空闲空间的查询:

create view dba_free_space_pre10g as
select ts.name TABLESPACE_NAME,
       fi.file# FILE_ID,
       f.block# BLOCK_ID,
       f.length * ts.blocksize BYTES,
       f.length BLOCKS,
       f.file# RELATIVE_FNO
  from sys.ts$ ts, sys.fet$ f, sys.file$ fi
 where ts.ts# = f.ts#
   and f.ts# = fi.ts#
   and f.file# = fi.relfile#
   and ts.bitmapped = 0
union all
select /*+ ordered use_nl(f) use_nl(fi) */
 ts.name TABLESPACE_NAME,
 fi.file# FILE_ID,
 f.ktfbfebno BLOCK_ID,
 f.ktfbfeblks * ts.blocksize BYTES,
 f.ktfbfeblks BLOCKS,
 f.ktfbfefno RELATIVE_FNO
  from sys.ts$ ts, sys.x$ktfbfe f, sys.file$ fi
 where ts.ts# = f.ktfbfetsn
   and f.ktfbfetsn = fi.ts#
   and f.ktfbfefno = fi.relfile#
   and ts.bitmapped <> 0
   and ts.online$ in (1, 4)
   and ts.contents$ = 0
 /

create view dba_free_space_recyclebin as
select /*+ ordered use_nl(u) use_nl(fi) */
 ts.name TABLESPACE_NAME,
 fi.file# FILE_ID,
 u.ktfbuebno BLOCK_ID,
 u.ktfbueblks * ts.blocksize BYTES,
 u.ktfbueblks BLOCKS,
 u.ktfbuefno RELATIVE_FNO
  from sys.recyclebin$ rb, sys.ts$ ts, sys.x$ktfbue u, sys.file$ fi
 where ts.ts# = rb.ts#
   and rb.ts# = fi.ts#
   and u.ktfbuefno = fi.relfile#
   and u.ktfbuesegtsn = rb.ts#
   and u.ktfbuesegfno = rb.file#
   and u.ktfbuesegbno = rb.block#
   and ts.bitmapped <> 0
   and ts.online$ in (1, 4)
   and ts.contents$ = 0
union all
select ts.name TABLESPACE_NAME,
       fi.file# FILE_ID,
       u.block# BLOCK_ID,
       u.length * ts.blocksize BYTES,
       u.length BLOCKS,
       u.file# RELATIVE_FNO
  from sys.ts$ ts, sys.uet$ u, sys.file$ fi, sys.recyclebin$ rb
 where ts.ts# = u.ts#
   and u.ts# = fi.ts#
   and u.segfile# = fi.relfile#
   and u.ts# = rb.ts#
   and u.segfile# = rb.file#
   and u.segblock# = rb.block#
   and ts.bitmapped = 0
/

通过以上创建的dba_free_space_pre10g和dba_free_space_recyclebin视图，我们可以很明确地区分表空间上空闲Extents。

针对本例中的LMT上存在大量连续的空闲Extent碎片，可以直接从上述视图中得到答案:

SQL> select * from dba_free_space_pre10g where tablespace_name='FRAGMENT';

TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
FRAGMENT                               13      40009 1819738112     222136           13

SQL> select count(*),blocks from dba_free_space_recyclebin where tablespace_name='FRAGMENT' group by blocks;

  COUNT(*)     BLOCKS
---------- ----------
      5000          8

显然是RECYCLEBIN中存在大量的小"对象"从而造成了LMT上出现大量碎片的假象

SQL> select space,count(*) from dba_recyclebin where ts_name='FRAGMENT' group by space;

     SPACE   COUNT(*)
---------- ----------
         8       5000

我们可以通过purge recyclebin来"合并"这些Extents碎片

SQL> purge dba_recyclebin;

DBA Recyclebin purged.

SQL>  select count(*),blocks from dba_free_space where tablespace_name='FRAGMENT' group by blocks;

  COUNT(*)     BLOCKS
---------- ----------
         1     262136

如果应用程序创建大量的小型堆(heap)表来存放临时数据，在不再需要这些数据时将这些堆表drop掉，那么就可能造成上述LMT”碎片”问题。我们在实际处理10g以后的这类空间问题时一定搞清楚，哪些是真正的Free Extents,而哪些是来自RECYCLEBIN的Extents。

另一方面这个case还告诉我们不要一遇到预料外的行为方式(unexpected behavior)就将问题定位到bug，这样会过早僵化我们的诊断预期。为了尽可能地发散思维，我们有必要如围棋中所提倡的”保留变化”那样来安排诊断步骤。

Too many fragmentation in LMT?

Comments

Comment 取消回复