SGA中的内存池包含不同大小的内存块。当数据库启动时,就有一个大的内存块分配并被hush buckets 里的空闲列表追踪。随着时间推移,随着内存的分配和释放,内存块被按照大小在不同的hush buckets间移动。当SGA里任何一个内存池里出现不能满足内部分配请求的情况时,ORA-04031就出现了。
shared pool共享池的管理方式不同于其它的内存池。。共享池存放与数据字典和library cache有关的信息。但是,这些内存区域根据空闲列表和最近使用算法(LRU)管理。当在共享池的所有搜索结束后,从LRU列表清除所有的可能清除的对象, 多次扫描空闲列表后,仍没有找到内存块,ORA-04031就出现了。这意味着ORA-04031很难预测。
对共享池的监测,可以看它是否包含许多类似的SQL,只有文字不同。 这种情况会占用更多的共享池内存并引共享池碎片,过多的共享池碎片(fragment)会导致虽然共享池中仍有大量的free memory,但都是尺寸较小的内存块(chunk),当Oracle进程申请一些较大的连续内存空间(memory chunk)时,虽然共享池中的free memory大小远大于申请的连续空间大小,仍会引发ORA-4031错误。使用绑定变量可以使SQL 共享。使用本文所附的脚本可以查出内存中是否有许多类似SQL。
即使使用了绑定变量后,仍然可能存在高version count(子指针)的情况。为了使子指针共享,CURSOR_SHARING参数可能需要调整。metalink 文档Note 296377.1 和 261020.1可以提供详细信息。若造成4031的原因是由于未绑定变量或者游标无法共享导致的过度硬解析(Hard Parse),则应当调整应用绑定变量或者调整初始化参数。
9i中开始引入shared pool subpool子池技术。设计多个共享池子池的目的是分散单个shared pool LRU Latch的并行压力。 ,每一个子池都包含自有的LRU LIST和保留区域等其他内存结构。 subpools子池的数量会在Oracle实例启动(startup nomount)时根据SGA_MAX_SIZE(或11g中的memory_max_target)以及服务器上的CPU数量而自动决定。子池数量最多为7个,在Card系统中为4个。
每一个subpool都是一个”迷你型”的共享池,其包括自有的Freelist、内存结构记录和LRU LIST。subpool子池技术是对shared pool并发扩张能力的增强,且每一个子池现在都使用独立的shared pool child latch来保护。这意味着不再像9i以前的版本那样因为只有一个shared pool latch而剧烈争用了。
但是在实际使用中发现版本9i中仍存在一些BUG,可能导致在子池之间的内存使用分布不平衡均匀,这可能导致虽然部分子池使用率不高,但是个别子池内存过度分配,从而导致ORA-4031错误。一般把这种现象称作”subpool imbalance”。
分析问题发生时段的ALERT.LOG内容如下:
Wed Nov 21 02:16:01 2012 ARC1: Evaluating archive log 3 thread 1 sequence 2373 ARC1: Beginning to archive log 3 thread 1 sequence 2373 Creating archive destination LOG_ARCHIVE_DEST_1: '/card_arch1/1_2373.arc' ARC1: Completed archiving log 3 thread 1 sequence 2373 Wed Nov 21 16:11:46 2012 Errors in file /oracle/admin/card/udump/card1_ora_6782.trc: ORA-04031: unable to allocate 4216 bytes of shared memory ("shared pool","select rowid, count(*) as TO...","sga heap(1,0)","library cache") ORA-04031: unable to allocate 4216 bytes of shared memory ("shared pool","select rowid, count(*) as TO...","sga heap(1,0)","library cache") Wed Nov 21 16:11:47 2012 Trace dumping is performing id=[cdmp_20121121161147] Wed Nov 21 16:14:10 2012 Errors in file /oracle/admin/card/udump/card1_ora_6682.trc: ORA-04031: unable to allocate 4216 bytes of shared memory ("shared pool","TD_REPTCASH","sga heap(1,0)","library cache") ORA-04031: unable to allocate 4216 bytes of shared memory ("shared pool","TD_REPTCASH","sga heap(1,0)","library cache") Wed Nov 21 16:14:11 2012 Trace dumping is performing id=[cdmp_20121121161411
在16:11:46时SPID=6782的前台服务进程报ORA-4031错误,分析该进程的TRACE如下:
/oracle/admin/card/udump/card1_ora_6782.trc Oracle9i Enterprise Edition Release 9.2.0.8.0 - 64bit Production With the Partitioning and Real Application Clusters options JServer Release 9.2.0.8.0 - Production ORACLE_HOME = /oracle/product/9.2 System name: HP-UX Node name: ecprod1 Release: B.11.31 Version: U Machine: ia64 Instance name: card1 Redo thread mounted by this instance: 1 Oracle process number: 114 Unix process pid: 6782, image: oracle@ecprod1 (TNS V1-V3) *** 2012-11-21 16:06:40.006 *** SESSION ID:(498.18090) 2012-11-21 16:06:40.005 ================================= Begin 4031 Diagnostic Information ================================= The following information assists Oracle in diagnosing causes of ORA-4031 errors. This trace may be disabled by setting the init.ora parameter _4031_dump_bitvec = 0 ====================================== Allocation Request Summary Information ====================================== Current information setting: 00654fff Dump Interval=300 seconds SGA Heap Dump Interval=3600 seconds Last Dump Time=11/21/2012 16:06:39 Allocation request for: kkslpkp - literal info. Heap: c000000353cb25d0, size: 4200 ****************************************************** HEAP DUMP heap name="sga heap" desc=c000000381602030 extent sz=0xfe0 alt=200 het=32767 rec=9 flg=-126 opc=0 parent=0000000000000000 owner=0000000000000000 nex=0000000000000000 xsz=0x1 ****************************************************** HEAP DUMP heap name="library cache" desc=c000000353cb25d0 extent sz=0x348 alt=32767 het=32 rec=0 flg=2 opc=2 parent=c000000381602030 owner=c00000032f85db08 nex=0000000000000000 xsz=0x1 Subheap has 4712 bytes of memory allocated ==================== Process State Object ==================== ---------------------------------------- SO: c0000002e93dac10, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00 (process) Oracle pid=114, calls cur/top: c00000030275a888/c00000030275a888, flag: (0) - int error: 0, call error: 0, sess error: 0, txn error 0 (post info) last post received: 0 0 111 last post received-location: kglpndl: post after freeing latch last process to post me: c0000002e93dd1b0 137 0 last post sent: 1298803 0 16 last post sent-location: ksasnd last process posted by me: c0000002e93bb2d0 1 6 (latch info) wait_event=0 bits=0 Process Group: DEFAULT, pseudo proc: c0000002ea3a4ff0 O/S info: user: oracle, term: UNKNOWN, ospid: 6782 OSD pid info: Unix process pid: 6782, image: oracle@ecprod1 (TNS V1-V3) ========================= User Session State Object ========================= ---------------------------------------- SO: c0000002ea3fc930, type: 4, owner: c0000002e93dac10, flag: INIT/-/-/0x00 (session) trans: 0000000000000000, creator: c0000002e93dac10, flag: (100041) USR/- BSY/-/-/-/-/- DID: 0001-0072-00161539, short-term DID: 0000-0000-00000000 txn branch: 0000000000000000 oct: 0, prv: 0, sql: c000000347a44a50, psql: c0000003009a42e0, user: 26/OC_MES O/S info: user: root, term: , ospid: 1234, machine: ecrsouterapp1 program: last wait for 'latch free' blocking sess=0x0 seq=33786 wait_time=12162 address=c00000030245cdc8, number=9d, tries=0 temporary object counter: 0 ========================= Current Parent KGL Object ========================= LIBRARY OBJECT HANDLE: handle=c000000347a44a50 name=select rowid, count(:"SYS_B_0") amount from te_enterpriseinfo_cur a where a.borrowerid = :"SYS_B_1" and a.regionalismcode like :"SYS_B_2" hash=5740b5 timestamp=11-21-2012 16:06:39 namespace=CRSR flags=RON/KGHP/TIM/PN0/MED/[50010000] kkkk-dddd-llll=0000-0001-0001 lock=N pin=X latch#=9 lwt=c000000347a44a80[c000000347a44a80,c000000347a44a80] ltm=c000000347a44a90[c000000347a44a90,c000000347a44a90] pwt=c000000347a44ab0[c000000347a44ab0,c000000347a44ab0] ptm=c000000347a44b40[c000000347a44b40,c000000347a44b40] ref=c000000347a44a60[c000000347a44a60, c000000347a44a60] lnd=c000000347a44b58[c000000347a44b58,c000000347a44b58] LIBRARY OBJECT: object=c00000037760ea28 type=CRSR flags=EXS[0001] pflags= [00] status=VALD load=0 CHILDREN: size=16 child# table reference handle ------ -------- --------- -------- 0 c00000037760ec88 c00000035b9ba6a8 c000000353207a90 C000000000000004 ? C000000000032F90 ? ----- End of Call Stack Trace ----- =============================== Memory Utilization of Subpool 1 =============================== Allocation Name Size _________________________ __________ "free memory " 82400392 "miscellaneous " 43284312 "type object de " 0 "parameters " 0 "KCL name table " 3673416 "MTTR advisory " 98256 "Checkpoint queue " 9177280 "db_block_hash_buckets " 9699328 "PL/SQL MPCODE " 0 "trigger inform " 0 "sim memory hea " 1279200 "KCL extra lock elements " 864000 "PL/SQL PPCODE " 0 "ges reserved msg buffers " 2096008 "fixed allocation callback" 744 "pl/sql source " 0 "KQR M PO " 189272 "trigger source " 0 "ges enqueues " 3261024 "KSXR pending messages que" 853952 "gcs shadows " 12303112 "ges resource hash table " 1048576 "dictionary cache " 1068608 "ges resources " 1592696 "enqueue resources " 572504 "errors " 0 "PL/SQL DIANA " 0 "library cache " 439251408 "KQR L PO " 266936 "sql area " 64536 "sessions " 749008 "gcs resources " 20555384 "KQR S SO " 2048 "trigger defini " 0 "event statistics per sess" 3137568 "KGLS heap " 26608 "KQR X PO " 18032 "table definiti " 0 =============================== Memory Utilization of Subpool 2 =============================== Allocation Name Size _________________________ __________ "free memory " 108187608 "miscellaneous " 37722824 "table definiti " 1984 "ges enqueues " 3395408 "db_block_hash_buckets " 9794640 "ksm_file2sga region " 370496 "PL/SQL DIANA " 175992 "KQR S SO " 512 "KQR L PO " 633360 "gcs shadows " 28408552 "sim memory hea " 1286432 "parameters " 206424 "1M buffer " 1049088 "errors " 40280 "type object de " 0 "KQR M PO " 804672 "trigger defini " 272 "trigger source " 88 "partitioning d " 22728 "channel handle " 260392 "KGK heap " 552 "db_files " 253952 "dictionary cache " 1068608 "KQR X PO " 23184 "gcs resources " 36658760 "transaction " 480384 "trigger inform " 64 "library cache " 159389832 "ges resources " 1482984 "sql area " 226277160 "sessions " 746304 "Checkpoint queue " 9177280 "event statistics per sess" 3148936 "fixed allocation callback" 720 "FileIdentificatonBlock " 3580496 "KGLS heap " 381736 "ktlbk state objects " 651240 "PL/SQL MPCODE " 1850264 =============================== Memory Utilization of Subpool 3 =============================== Allocation Name Size _________________________ __________ "free memory " 108746832 "miscellaneous " 43190000 "KGLS heap " 340640 "gcs resource hash table " 16105280 "trigger source " 80 "joxs heap init " 4240 "PL/SQL PPCODE " 23624 "errors " 5592 "KQR L PO " 612576 "trigger inform " 64 "table definiti " 1560 "sim memory hea " 1279200 "MTTR advisory " 106368 "1M buffer " 1049088 "ges regular msg buffers " 1088248 "gcs resources " 20555384 "KQR M PO " 805872 "pl/sql source " 0 "KQR X PO " 59248 "KQR S SO " 2816 "ges enqueues " 3985496 "ges resources " 1917944 "dictionary cache " 1068608 "trigger defini " 9008 "type object de " 0 "PL/SQL DIANA " 410776 "parameters " 187768 "library cache " 197201256 "processes " 1376000 "sql area " 209882928 "sessions " 746304 "qmps connections " 486200 "event statistics per sess" 3137568 "Checkpoint queue " 9177280 "PLS non-lib hp " 2088 "gcs shadows " 12306208 "fixed allocation callback" 680 "PL/SQL MPCODE " 1661384 =============================== Memory Utilization of Subpool 4 =============================== Allocation Name Size _________________________ __________ "free memory " 82782928 "miscellaneous " 40984472 "ges resources " 1783744 "db_handles " 1160000 "parameters " 323936 "sim memory hea " 1279200 "KGLS heap " 697784 "KQR M PO " 791440 "ges big msg buffers " 4674088 "temporary tabl " 0 "PL/SQL DIANA " 146680 "partitioning d " 67896 "KGK heap " 65584 "KQR L PO " 600312 "Checkpoint queue " 9177280 "PL/SQL MPCODE " 3006528 "enqueue " 2332272 "FileOpenBlock " 16104056 "KSXR receive buffers " 1034000 "table definiti " 3272 "KQR X PO " 46368 "ges enqueues " 4135704 "dictionary cache " 1068608 "gcs shadows " 12306208 "KQR S SO " 2304 "PX subheap " 54640 "trigger inform " 680 "library cache " 138469232 "gcs resources " 20555384 "sql area " 271924816 "sessions " 746304 "errors " 15304 "event statistics per sess" 3137568 "ges process array " 1273272 "pl/sql source " 0 "trigger defini " 4400 "type object de " 0 "fixed allocation callback" 728 LIBRARY CACHE STATISTICS: namespace gets hit ratio pins hit ratio reloads invalids -------------- --------- --------- --------- --------- ---------- ---------- CRSR 1044847223 0.999 1198493483 0.551 507345521 90137 TABL/PRCD/TYPE 1190948651 1.000 2350380701 1.000 125865 0 BODY/TYBD 568579 1.000 568659 0.999 71 0 TRGR 12494087 1.000 12494089 1.000 774 0 INDX 2091941 0.999 2091942 0.999 34 0 CLST 68101 0.998 83171 0.996 0 0 OBJE 0 0.000 0 0.000 0 0 PIPE 0 0.000 0 0.000 0 0 LOB 0 0.000 0 0.000 0 0 DIR 348 0.994 216 0.778 22 0 QUEU 0 0.000 0 0.000 0 0 OBJG 0 0.000 0 0.000 0 0 PROP 0 0.000 0 0.000 0 0 JVSC 502 0.578 502 0.578 0 0 JVRE 502 0.584 502 0.584 0 0 ROBJ 0 0.000 0 0.000 0 0 REIP 0 0.000 0 0.000 0 0 CPOB 0 0.000 0 0.000 0 0 EVNT 11919560 1.000 17875331 1.000 1680 0 SUMM 0 0.000 0 0.000 0 0 DIMN 0 0.000 0 0.000 0 0 CTX 0 0.000 0 0.000 0 0 OUTL 0 0.000 0 0.000 0 0 RULS 0 0.000 0 0.000 0 0 RMGR 0 0.000 0 0.000 0 0 IFSD 0 0.000 0 0.000 0 0 PPLN 0 0.000 0 0.000 0 0 PCLS 0 0.000 0 0.000 0 0 SUBS 0 0.000 0 0.000 0 0 LOCS 0 0.000 0 0.000 0 0 RMOB 0 0.000 0 0.000 0 0 RSMD 0 0.000 0 0.000 0 0 JVSD 0 0.000 0 0.000 0 0 ENPR 0 0.000 0 0.000 0 0 RELC 0 0.000 0 0.000 0 0 STREAM 0 0.000 0 0.000 0 0 APPLY 0 0.000 0 0.000 0 0 APPLY SOURCE 0 0.000 0 0.000 0 0 APPLY DESTN 0 0.000 0 0.000 0 0 TEST 0 0.000 0 0.000 0 0 CUMULATIVE 2262939494 0.999 3581988596 0.850 507473967 90137
当Oracle进程触发ORA-4031错误后会产生4031相关的TRACE,其中包括了当时简要的shared pool使用情况; 分析上述4031 trace中的信息可以发现存在4个subpool (Memory Utilization of Subpool 1、Memory Utilization of Subpool 2、Memory Utilization of Subpool 3、Memory Utilization of Subpool 4)。
以上4个subpool的主要使用情况如下:
Subpool1 | Subpool2 | Subpool3 | Subpool4 | Total Size | |
Free Memory | 82400392 | 108187608 | 108746832 | 82782928 | 364MB |
library cache | 439251408 | 159389832 | 197201256 | 138469232 | 947MB |
Sql area | 64536 | 226277160 | 209882928 | 271924816 | 675MB |
Total size | 637531776 | 637531776 | 637531776 | 620754624 | 2415MB |
从以上表格可以看到library cache和SQL Area占用了1600MB的共享池空间,而实际free memory的总数也达到364MB,实际引发ORA-4031时所申请的chunk大小仅为4216 bytes。
进一步分析可以发现在subpool1中的sql area仅占用64536字节,而subpool 4中sql area占用271924816字节,说明subpool1与其他subpool之间在空间使用上存在不均衡。
为了进一步分析Card库的共享池使用情况,我们通过动态性能视图收集了11-23日 14时当前的shared pool使用状态,由于并非在ORA-4031当时所收集的信息,虽然不能作为诊断该问题的直接证据,但同样具有参考意义:
BUCKET KSMCHCLS subpool From Count Biggest AvgSize Total -------------------------------- -------- ---------- ---------- ---------- ---------- ---------- ---------- 0 (<140) free 1 40 1443 48 43 62112 0 (<140) free 1 50 373 56 56 20888 0 (<140) free 1 60 537 64 64 34368 0 (<140) free 1 70 364 72 72 26208 0 (<140) free 1 80 2778 88 86 241024 0 (<140) free 1 90 206 96 96 19776 0 (<140) free 1 100 230 104 104 23920 0 (<140) free 1 110 602 112 112 67424 0 (<140) free 1 120 706 128 124 88112 0 (<140) free 1 130 199 136 136 27064 0 (<140) free 2 40 1031 48 43 45232 0 (<140) free 2 50 766 56 56 42896 0 (<140) free 2 60 616 64 64 39424 0 (<140) free 2 70 374 72 72 26928 0 (<140) free 2 80 3421 88 86 295744 0 (<140) free 2 90 338 96 96 32448 0 (<140) free 2 100 451 104 104 46904 0 (<140) free 2 110 325 112 112 36400 0 (<140) free 2 120 1458 128 121 176832 0 (<140) free 2 130 233 136 136 31688 0 (<140) free 3 40 1604 48 44 70896 0 (<140) free 3 50 640 56 56 35840 0 (<140) free 3 60 694 64 64 44416 0 (<140) free 3 70 680 72 72 48960 0 (<140) free 3 80 3006 88 85 258256 0 (<140) free 3 90 615 96 96 59040 0 (<140) free 3 100 602 104 104 62608 0 (<140) free 3 110 487 112 112 54544 0 (<140) free 3 120 1298 128 123 160360 0 (<140) free 3 130 340 136 136 46240 0 (<140) free 4 40 1110 48 43 48424 0 (<140) free 4 50 758 56 56 42448 0 (<140) free 4 60 557 64 64 35648 0 (<140) free 4 70 451 72 72 32472 0 (<140) free 4 80 2678 88 85 230072 0 (<140) free 4 90 305 96 96 29280 0 (<140) free 4 100 333 104 104 34632 0 (<140) free 4 110 303 112 112 33936 0 (<140) free 4 120 1353 128 121 164832 0 (<140) free 4 130 243 136 136 33048 1 (140-267) free 1 200 1001 216 204 205048 1 (140-267) free 1 140 378 152 148 55968 1 (140-267) free 1 160 568 176 164 93208 1 (140-267) free 1 180 178 192 188 33608 1 (140-267) free 1 220 1536 232 231 355320 1 (140-267) free 1 240 449 256 249 112048 1 (140-267) free 1 260 135 264 264 35640 1 (140-267) free 2 200 1028 216 206 212768 1 (140-267) free 2 140 774 152 148 114560 1 (140-267) free 2 160 1293 176 171 221352 1 (140-267) free 2 180 576 192 189 109144 1 (140-267) free 2 220 377 232 228 86184 1 (140-267) free 2 240 921 256 250 230856 1 (140-267) free 2 260 272 264 264 71808 1 (140-267) free 3 200 1078 216 205 221824 1 (140-267) free 3 140 874 152 148 129448 1 (140-267) free 3 160 1642 176 168 276560 1 (140-267) free 3 180 2612 192 185 483816 1 (140-267) free 3 220 443 232 227 100824 1 (140-267) free 3 240 858 256 248 213608 1 (140-267) free 3 260 193 264 264 50952 1 (140-267) free 4 200 741 216 206 152928 1 (140-267) free 4 140 546 152 148 80816 1 (140-267) free 4 160 1164 176 169 197160 1 (140-267) free 4 180 408 192 187 76632 1 (140-267) free 4 220 304 232 227 69216 1 (140-267) free 4 240 578 256 249 144200 1 (140-267) free 4 260 172 264 264 45408 2 (268-523) free 1 300 494 336 320 158368 2 (268-523) free 1 250 450 296 284 127960 2 (268-523) free 2 300 719 336 320 230744 2 (268-523) free 2 500 20 520 518 10360 2 (268-523) free 2 250 655 296 283 185744 2 (268-523) free 2 450 51 496 480 24488 2 (268-523) free 3 300 956 336 322 308272 2 (268-523) free 3 250 641 296 281 180688 2 (268-523) free 4 300 636 336 320 203920 2 (268-523) free 4 400 46 424 409 18840 2 (268-523) free 4 250 582 296 283 165232 2 (268-523) free 4 350 23 392 383 8824 3-5 (524-4107) free 1 500 1 776 776 776 3-5 (524-4107) free 1 3000 1 3160 3160 3160 3-5 (524-4107) free 2 500 306 984 700 214312 3-5 (524-4107) free 2 1000 123 1496 1201 147760 3-5 (524-4107) free 2 1500 1139 1992 1878 2139208 3-5 (524-4107) free 2 2000 1444 2496 2322 3353672 3-5 (524-4107) free 2 2500 4534 2992 2751 12476912 3-5 (524-4107) free 2 3000 7520 3496 3211 24151448 3-5 (524-4107) free 2 3500 6105 3992 3770 23018216 3-5 (524-4107) free 2 4000 2403 4104 4065 9769384 3-5 (524-4107) free 3 500 652 992 858 559536 3-5 (524-4107) free 3 1000 1897 1496 1162 2204544 3-5 (524-4107) free 3 1500 289 1992 1726 498840 3-5 (524-4107) free 3 2000 2276 2496 2260 5144800 3-5 (524-4107) free 3 2500 4021 2992 2789 11217152 3-5 (524-4107) free 3 3000 1822 3496 3279 5975424 3-5 (524-4107) free 3 3500 11485 3992 3785 43475232 3-5 (524-4107) free 3 4000 2710 4104 4048 10970720 3-5 (524-4107) free 4 500 44 992 966 42512 3-5 (524-4107) free 4 1000 238 1488 1177 280248 3-5 (524-4107) free 4 1500 152 1968 1765 268392 3-5 (524-4107) free 4 2000 2015 2496 2376 4789544 3-5 (524-4107) free 4 2500 3400 2992 2754 9364976 3-5 (524-4107) free 4 3000 2037 3496 3287 6696072 3-5 (524-4107) free 4 3500 4731 3992 3710 17552504 3-5 (524-4107) free 4 4000 2043 4104 4073 8321968 6+ (4108+) free 1 4000 8 4712 4250 34000 6+ (4108+) free 1 5000 1 5912 5912 5912 6+ (4108+) free 1 7000 1 7048 7048 7048 6+ (4108+) free 1 13000 1 13416 13416 13416 6+ (4108+) free 1 15000 1 15752 15752 15752 6+ (4108+) free 1 19000 1 19600 19600 19600 6+ (4108+) free 1 21000 1 21360 21360 21360 6+ (4108+) free 1 25000 1 25232 25232 25232 6+ (4108+) free 2 4000 100 4272 4146 414648 6+ (4108+) free 2 8000 4 8456 8338 33352 6+ (4108+) free 2 11000 1 11872 11872 11872 6+ (4108+) free 3 4000 145 4720 4132 599248 6+ (4108+) free 3 8000 4 8664 8432 33728 6+ (4108+) free 4 4000 90 4528 4152 373728 6+ (4108+) free 4 6000 1 6896 6896 6896 6+ (4108+) free 4 7000 3 7848 7626 22880 6+ (4108+) free 4 8000 3 8456 8360 25080
以上列出了各subpool freelist bucket的情况,6+ (4108+) 代表存放大于4108bytes的free chunk的hash bucket, 可以看到大于4108bytes的free chunk总大小越为1.6M, 而3-5 (524-4107) bucket中chunk数量最多、总大小也最大。 由此可知card库共享池存在大量碎片。过多的共享池碎片导致连续尺寸大约4108字节的free chunk越来越少,当16:11时需要分配4216字节连续chunk时,由于无法分配该连续空间而触发了ORA-4031错误。
综上所述是由于共享池碎片和subpool内存分配不均衡最终引发了此次ORA-4031 。
18:33时触发过ORA-4031的Oracle进程陆续报ORA-00600: [729], [8680], [space leak]错误,该错误的stack call 如下 ksudel=> ksuxds=> ksmugf=> cold_ksmuhe
*** 2012-11-21 18:33:35.807 ksedmp: internal or fatal error ORA-00600: internal error code, arguments: [729], [8680], [space leak], [], [], [], [], [] ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedmp()+528 call _etext_f()+23058430 000000000 ? 09110304648 C000000000000996 ? 400000000285CD00 ? ksfdmp()+64 call _etext_f()+23058430 000000003 ? 09110304648 kgeriv()+432 call _etext_f()+23058430 600000000004F280 ? 09110304648 000000003 ? C000000000000716 ? 40000000052FF030 ? 00000863F ? 60000000004B5788 ? 000000000 ? 000000000 ? kgesiv()+208 call _etext_f()+23058430 600000000004F280 ? 09110304648 60000000004B7A08 ? 60000000000502F8 ? 600000000001D800 ? 600000000001D790 ? ksesic2()+208 call _etext_f()+23058430 600000000004F280 ? 09110304648 60000000005E5198 ? 0000002D9 ? 000000002 ? 9FFFFFFFFFFF9418 ? $cold_ksmuhe()+432 call _etext_f()+23058430 0000002D9 ? 09110304648 60000000004C1680 ? 9FFFFFFFFFFF9418 ? 60000000004BFFD0 ? 00000000A ? 4000000000874260 ? 60000000005E1780 ? 000000000 ? ksmugf()+688 call _etext_f()+23058430 0000021E8 ? 09110304648 600000000004F060 ? 60000000005E1778 ? 000000001 ? C000000000000B9C ? ksuxds()+4320 call _etext_f()+23058430 4000000000C9B4B8 ? 09110304648 C0000002EB3B47B8 ? 400000000086A4C0 ? C000000000001736 ? 4000000001EC4B60 ? 000008E73 ? 9FFFFFFFFFFF9480 ? 60000000004ABC40 ? ksudel()+144 call 9fffffffffff94f8 C0000002EB3B4320 ?
通过上述stack call可以定位到Bug 3931332 : ORA-600 [729] LEAKS WITH ORA-4031 TRACES:
Hdr: 3931332 9.2.0.5.0 RDBMS 9.2.0.5.0 DICTIONARY PRODID-5 PORTID-46 ORA-600 Abstract: ORA-600 [729] LEAKS WITH ORA-4031 TRACES sedmp: internal or fatal error ORA-600: internal error code, arguments: [729], [4380], [space leak], [], [], [], [], [] ----- Call Stack Trace ----- ksmuhe ksmugf ksuxds ksudel
以上ORA-600 [729] LEAKS内部错误常由ORA-4031引发,一般通过避免ORA-4031可以绕过ORA-600错误
问题总结:
- 由于共享池碎片和subpool内存分配不均衡最终引发了此次ORA-4031,解决方案见下文建议部分
- ORA-600 [729] LEAKS内部错误由ORA-4031触发BUG 3931332引起,通过避免ORA-4031错误可以绕过该问题
- 执行statspack长时间无响应并消耗100%的CPU的问题在11-23日实际操作中未出现,由于没有当时的性能记录无法进一步追查。
1.1 建议
- 定期监控shared pool free memory空闲内存量和碎片情况,具体可以使用如下脚本:
set pages 1000 lines 120 col name for a60 col value for a30 select * from v$sgastat where pool like 'shared%' and name='free memory'; select '0 (<140)' BUCKET, KSMCHCLS, KSMCHIDX, 10*trunc(KSMCHSIZ/10) "From", count(*) "Count" , max(KSMCHSIZ) "Biggest", trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total" from x$ksmsp where KSMCHSIZ<140 and KSMCHCLS='free' group by KSMCHCLS, KSMCHIDX, 10*trunc(KSMCHSIZ/10) UNION ALL select '1 (140-267)' BUCKET, KSMCHCLS, KSMCHIDX,20*trunc(KSMCHSIZ/20) , count(*) , max(KSMCHSIZ) , trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total" from x$ksmsp where KSMCHSIZ between 140 and 267 and KSMCHCLS='free' group by KSMCHCLS, KSMCHIDX, 20*trunc(KSMCHSIZ/20) UNION ALL select '2 (268-523)' BUCKET, KSMCHCLS, KSMCHIDX, 50*trunc(KSMCHSIZ/50) , count(*) , max(KSMCHSIZ) , trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total" from x$ksmsp where KSMCHSIZ between 268 and 523 and KSMCHCLS='free' group by KSMCHCLS, KSMCHIDX, 50*trunc(KSMCHSIZ/50) UNION ALL select '3-5 (524-4107)' BUCKET, KSMCHCLS, KSMCHIDX, 500*trunc(KSMCHSIZ/500) , count(*) , max(KSMCHSIZ) , trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total" from x$ksmsp where KSMCHSIZ between 524 and 4107 and KSMCHCLS='free' group by KSMCHCLS, KSMCHIDX, 500*trunc(KSMCHSIZ/500) UNION ALL select '6+ (4108+)' BUCKET, KSMCHCLS, KSMCHIDX, 1000*trunc(KSMCHSIZ/1000) , count(*) , max(KSMCHSIZ) , trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total" from x$ksmsp where KSMCHSIZ >= 4108 and KSMCHCLS='free' group by KSMCHCLS, KSMCHIDX, 1000*trunc(KSMCHSIZ/1000); SELECT KSMCHCLS CLASS, COUNT(KSMCHCLS) NUM, SUM(KSMCHSIZ) SIZ, To_char( ((SUM(KSMCHSIZ)/COUNT(KSMCHCLS)/1024)),'999,999.00')||'k' "AVG SIZE" FROM X$KSMSP GROUP BY KSMCHCLS;
- 当shared pool 空闲内存少于80M或碎片过多时考虑alter system flush shared_pool强制冲刷共享池。若条件允许也可以考虑定期在业务空闲时段例如凌晨冲刷共享池,或者定期重启数据库实例
- 启用statspack自动收集性能快照,一般Interval为一个小时,statspack是久经考验的性能监控工具,对性能的损耗非常小。
- 考虑部署OSW性能监控工具:
OSWatcher(OS Watcher Black Box)是Oracle标准的Support工具,该工具通过一系列的SHELL脚本来收集多方面的OS信息,包括CPU、内存、网络等。
OSWatcher不会给系统带来较重的性能负载,OSWatcher本质上是调用最常见的OS命令如ps、vmstat、netstat。
通过大量广泛的OSW使用,已经证明了OSW是一套安全、可靠的资源监控工具。
更多OSW的信息可以参考以下Metalink文档:
OSW 用户手册:OSWatcher Black Box User Guide.
What Is The OSWATCHER Black Box Effect On The Server Performance And What Are The Alternative Commands? [ID 946107.1]
中文:OSWatcher Black Box: How to improve performance and monitor your system automatically (Mandarin) [ID 1492202.1]
OSWatcher Black Box Analyzer User Guide [ID 461053.1]
Comment