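This week a colleague and I were discussing a technical issue. He told me that in a customer's 11.1.0.6 database, one locally managed tablespace showed heavy extent fragmentation: contiguous free extents had not been merged into a single larger extent. He suspected that a bug in 11.1.0.6 was causing this fragmentation in the LMT.
His evidence was a large number of contiguous free extents returned by the dba_free_space view: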
SQL> select tablespace_name,EXTENT_MANAGEMENT,ALLOCATION_TYPE from dba_tablespaces where tablespace_name='FRAGMENT';

TABLESPACE_NAME                EXTENT_MAN ALLOCATIO
------------------------------ ---------- ---------
FRAGMENT                       LOCAL      SYSTEM

SQL> select block_id,blocks from dba_free_space where tablespace_name='FRAGMENT' and rownum<10;

  BLOCK_ID     BLOCKS
---------- ----------
     40009     222136
        25          8
         9          8
        17          8
        33          8
        41          8
        49          8
        57          8
        65          8
..............

SQL> select count(*),blocks from dba_free_space where tablespace_name='FRAGMENT' and blocks=8 group by blocks;

  COUNT(*)     BLOCKS
---------- ----------
      5000          8
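The output shows that the FRAGMENT tablespace uses local extent management with AUTOALLOCATE (ALLOCATION_TYPE = SYSTEM), and it does contain many contiguous free extents that have not been merged. In a dictionary-managed tablespace (DMT), the SMON process periodically maintains the FET$ base table and coalesces contiguous free extents into larger ones. In an LMT, however, extents are tracked by a bitmap at the head of the datafile (blocks 3-8 of the datafile in 10g). Adjacent free blocks are simply adjacent clear bits in that bitmap, so no background process has to coalesce free extents explicitly.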
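To illustrate why an LMT needs no coalescing step, here is a toy Python model of an allocation bitmap. It is a hypothetical sketch, not Oracle's on-disk format: the helper `free_extents`, the one-bit-per-8-block-unit granularity and the starting block 9 are assumptions chosen to resemble the FRAGMENT example. Free extents are derived by scanning for runs of clear bits, so a freed unit next to other free units is automatically reported together with them.

```python
"""Toy model of an LMT allocation bitmap (hypothetical, not Oracle's format).

Each bit covers one allocation unit: 1 = allocated, 0 = free. Free extents
are derived by scanning for maximal runs of 0 bits, so adjacent free units
always come out as a single extent; nothing has to "coalesce" them.
"""
from typing import List, Optional, Tuple


def free_extents(bitmap: List[int], unit_blocks: int,
                 first_block: int) -> List[Tuple[int, int]]:
    """Return (block_id, blocks) for every maximal run of free units."""
    extents = []
    run_start: Optional[int] = None
    for i, bit in enumerate(bitmap + [1]):  # sentinel closes a trailing run
        if bit == 0 and run_start is None:
            run_start = i
        elif bit == 1 and run_start is not None:
            extents.append((first_block + run_start * unit_blocks,
                            (i - run_start) * unit_blocks))
            run_start = None
    return extents


# Two 8-block units allocated, the rest of the file free.
bitmap = [1, 1, 0, 0, 0]
bitmap[1] = 0  # freeing the second unit just clears its bit
print(free_extents(bitmap, unit_blocks=8, first_block=9))  # [(17, 32)]
```

In this model, freeing the unit at block 17 immediately yields one 32-block free extent starting at block 17, with no separate merge pass.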
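Why would contiguous free extents in an LMT fail to merge and leave fragments behind? This database ran 11.1.0.6, a fairly unstable 11gR1 release, so blaming a bug seemed plausible. At first I agreed with my colleague's bug theory, and together we searched Metalink for known 11gR1 bugs, but we found no bug note with matching symptoms.
That made me rethink the problem. Pinning the cause on a bug so early was too subjective; not everything unexpected is a bug.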
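In fact, what dba_free_space shows is not necessarily "real". This illusion is often caused by the Flashback Table feature introduced in 10g, specifically the recycle bin behind FLASHBACK TABLE ... TO BEFORE DROP: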
SQL> select text from dba_views where view_name='DBA_FREE_SPACE';

TEXT
--------------------------------------------------------------------------------
======DMT REAL FREE EXTENTS=============
select ts.name, fi.file#, f.block#,
       f.length * ts.blocksize, f.length, f.file#
  from sys.ts$ ts, sys.fet$ f, sys.file$ fi
 where ts.ts# = f.ts#
   and f.ts# = fi.ts#
   and f.file# = fi.relfile#
   and ts.bitmapped = 0
union all
======LMT REAL FREE EXTENTS=============
select /*+ ordered use_nl(f) use_nl(fi) */
       ts.name, fi.file#, f.ktfbfebno,
       f.ktfbfeblks * ts.blocksize, f.ktfbfeblks, f.ktfbfefno
  from sys.ts$ ts, sys.x$ktfbfe f, sys.file$ fi
 where ts.ts# = f.ktfbfetsn
   and f.ktfbfetsn = fi.ts#
   and f.ktfbfefno = fi.relfile#
   and ts.bitmapped <> 0
   and ts.online$ in (1,4)
   and ts.contents$ = 0
union all
======LMT RECYCLEBIN FREE EXTENTS=============
select /*+ ordered use_nl(u) use_nl(fi) */
       ts.name, fi.file#, u.ktfbuebno,
       u.ktfbueblks * ts.blocksize, u.ktfbueblks, u.ktfbuefno
  from sys.recyclebin$ rb, sys.ts$ ts, sys.x$ktfbue u, sys.file$ fi
 where ts.ts# = rb.ts#
   and rb.ts# = fi.ts#
   and u.ktfbuefno = fi.relfile#
   and u.ktfbuesegtsn = rb.ts#
   and u.ktfbuesegfno = rb.file#
   and u.ktfbuesegbno = rb.block#
   and ts.bitmapped <> 0
   and ts.online$ in (1,4)
   and ts.contents$ = 0
union all
======DMT RECYCLEBIN FREE EXTENTS=============
select ts.name, fi.file#, u.block#,
       u.length * ts.blocksize, u.length, u.file#
  from sys.ts$ ts, sys.uet$ u, sys.file$ fi, sys.recyclebin$ rb
 where ts.ts# = u.ts#
   and u.ts# = fi.ts#
   and u.segfile# = fi.relfile#
   and u.ts# = rb.ts#
   and u.segfile# = rb.file#
   and u.segblock# = rb.block#
   and ts.bitmapped = 0
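Dissecting the 10g dba_free_space view above shows that the free extents it reports come from four sources:
- truly free extents in LMT tablespaces
- truly free extents in DMT tablespaces
- extents in LMT tablespaces occupied by objects in the RECYCLEBIN
- extents in DMT tablespaces occupied by objects in the RECYCLEBIN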
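A minimal Python model of the LMT half of this union makes the effect visible. The function `dba_free_space` and both input lists are hypothetical; the block layout (5000 dropped 8-block segments from block 9 to 40008, followed by one 222136-block free extent at block 40009) is inferred from the query output earlier in this post, and the DMT branches are omitted. Because the branches are combined with UNION ALL, each recyclebin extent appears as a separate row next to the truly free space.

```python
"""Hypothetical model of what DBA_FREE_SPACE reports for an LMT in 10g+.

real_free:  extents free in the bitmap (the X$KTFBFE branch).
recyclebin: extents still allocated to dropped segments held in the recycle
            bin (the X$KTFBUE join RECYCLEBIN$ branch).
"""
from collections import Counter
from typing import List, Tuple

Extent = Tuple[int, int]  # (block_id, blocks)


def dba_free_space(real_free: List[Extent],
                   recyclebin: List[Extent]) -> List[Extent]:
    # UNION ALL: rows are simply concatenated; neighbours are never merged
    return real_free + recyclebin


# Layout inferred from the FRAGMENT output: 5000 dropped 8-block segments
# packed from block 9 upwards, then one large truly free extent.
real_free = [(40009, 222136)]
recyclebin = [(9 + 8 * i, 8) for i in range(5000)]

rows = dba_free_space(real_free, recyclebin)
by_size = Counter(blocks for _, blocks in rows)  # like GROUP BY blocks
print(by_size[8], by_size[222136])  # 5000 1
```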
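Versions before 10g had no recycle bin to "interfere", so the free extents in dba_free_space consisted only of the first two parts. In 10g we can therefore create compatibility views: one that reports only truly free space, as dba_free_space did before 10g, and one that reports only the extents held by the recycle bin: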
create view dba_free_space_pre10g as
select ts.name TABLESPACE_NAME, fi.file# FILE_ID, f.block# BLOCK_ID,
       f.length * ts.blocksize BYTES, f.length BLOCKS, f.file# RELATIVE_FNO
  from sys.ts$ ts, sys.fet$ f, sys.file$ fi
 where ts.ts# = f.ts#
   and f.ts# = fi.ts#
   and f.file# = fi.relfile#
   and ts.bitmapped = 0
union all
select /*+ ordered use_nl(f) use_nl(fi) */
       ts.name TABLESPACE_NAME, fi.file# FILE_ID, f.ktfbfebno BLOCK_ID,
       f.ktfbfeblks * ts.blocksize BYTES, f.ktfbfeblks BLOCKS, f.ktfbfefno RELATIVE_FNO
  from sys.ts$ ts, sys.x$ktfbfe f, sys.file$ fi
 where ts.ts# = f.ktfbfetsn
   and f.ktfbfetsn = fi.ts#
   and f.ktfbfefno = fi.relfile#
   and ts.bitmapped <> 0
   and ts.online$ in (1, 4)
   and ts.contents$ = 0
/

create view dba_free_space_recyclebin as
select /*+ ordered use_nl(u) use_nl(fi) */
       ts.name TABLESPACE_NAME, fi.file# FILE_ID, u.ktfbuebno BLOCK_ID,
       u.ktfbueblks * ts.blocksize BYTES, u.ktfbueblks BLOCKS, u.ktfbuefno RELATIVE_FNO
  from sys.recyclebin$ rb, sys.ts$ ts, sys.x$ktfbue u, sys.file$ fi
 where ts.ts# = rb.ts#
   and rb.ts# = fi.ts#
   and u.ktfbuefno = fi.relfile#
   and u.ktfbuesegtsn = rb.ts#
   and u.ktfbuesegfno = rb.file#
   and u.ktfbuesegbno = rb.block#
   and ts.bitmapped <> 0
   and ts.online$ in (1, 4)
   and ts.contents$ = 0
union all
select ts.name TABLESPACE_NAME, fi.file# FILE_ID, u.block# BLOCK_ID,
       u.length * ts.blocksize BYTES, u.length BLOCKS, u.file# RELATIVE_FNO
  from sys.ts$ ts, sys.uet$ u, sys.file$ fi, sys.recyclebin$ rb
 where ts.ts# = u.ts#
   and u.ts# = fi.ts#
   and u.segfile# = fi.relfile#
   and u.ts# = rb.ts#
   and u.segfile# = rb.file#
   and u.segblock# = rb.block#
   and ts.bitmapped = 0
/
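With the dba_free_space_pre10g and dba_free_space_recyclebin views, we can clearly tell the two kinds of free extents in a tablespace apart. (Because the views reference X$ fixed tables, which only SYS can query, they have to be created while connected as SYS.)
For the many contiguous free extent "fragments" on the LMT in this case, the answer comes straight from these views: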
SQL> select * from dba_free_space_pre10g where tablespace_name='FRAGMENT';

TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
FRAGMENT                               13      40009 1819738112     222136           13

SQL> select count(*),blocks from dba_free_space_recyclebin where tablespace_name='FRAGMENT' group by blocks;

  COUNT(*)     BLOCKS
---------- ----------
      5000          8

Clearly, a large number of small "objects" in the RECYCLEBIN are what create the illusion of heavy fragmentation on the LMT:

SQL> select space,count(*) from dba_recyclebin where ts_name='FRAGMENT' group by space;

     SPACE   COUNT(*)
---------- ----------
         8       5000

Purging the recycle bin "coalesces" these extent fragments (PURGE DBA_RECYCLEBIN, run here with SYSDBA privileges, empties the recycle bin for all users):

SQL> purge dba_recyclebin;

DBA Recyclebin purged.

SQL> select count(*),blocks from dba_free_space where tablespace_name='FRAGMENT' group by blocks;

  COUNT(*)     BLOCKS
---------- ----------
         1     262136

The 5000 recyclebin extents of 8 blocks (40000 blocks) plus the 222136-block free extent add up exactly to the single 262136-block free extent left after the purge.
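As a worked check on the numbers above, the following Python sketch models the purge: once the dropped segments are purged, their extents become ordinary free space and adjacent ranges are reported as one. `merge_adjacent` is a hypothetical helper standing in for the bitmap scan (sorting and joining touching, non-overlapping ranges gives the same result a bitmap scan would); the extent layout is inferred from the output above, not read from the database.

```python
"""Hypothetical model of PURGE on an LMT: purged segments' extents become
free, and touching free ranges are reported as a single extent."""
from typing import List, Tuple

Extent = Tuple[int, int]  # (block_id, blocks)


def merge_adjacent(extents: List[Extent]) -> List[Extent]:
    """Join free ranges that touch end-to-start, as a bitmap scan would."""
    merged: List[Extent] = []
    for block_id, blocks in sorted(extents):
        if merged and merged[-1][0] + merged[-1][1] == block_id:
            start, size = merged[-1]
            merged[-1] = (start, size + blocks)
        else:
            merged.append((block_id, blocks))
    return merged


real_free = [(40009, 222136)]
recyclebin = [(9 + 8 * i, 8) for i in range(5000)]  # blocks 9..40008

# After PURGE the recyclebin extents are genuinely free bitmap space.
after_purge = merge_adjacent(real_free + recyclebin)
print(after_purge)  # [(9, 262136)]
```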
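If an application creates many small heap tables to hold temporary data and drops them once the data is no longer needed, the dropped tables land in the recycle bin and can produce exactly this kind of LMT "fragmentation". When handling space problems like this on 10g and later, we must be clear about which extents are truly free and which belong to objects in the RECYCLEBIN.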
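This case also teaches us not to pin every unexpected behavior on a bug the moment we see it; doing so narrows our diagnostic expectations too early. To keep our thinking as open as possible, we should arrange our diagnostic steps the way Go (weiqi) players are advised to "keep variations in reserve".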