我们在学习Oracle的过程中,或多或少会存在个人对概念的理解错误、误解或者根本是教材编写存在不严谨的地方,这样或以讹传讹或三人言虎,导致在Oracle圈子存在着一些古老相传的迷信(superstition),因为这些迷信已经深入人心了,所以我们几乎很难纠正过来;这其实很有意思,IT作为一个高科技的领域也会出现迷信,说明我们在IT技术的”教学”和”思考”上存在问题,这一点值得深思。
这里我列出几个最为常见的迷信,算作抛砖引玉:
1.几乎所有的Oracle入门教程都会在介绍Large pool的时候这样描述:”RMAN 备份使用large pool作为磁盘I/O缓冲区,配置Large pool有助于提高RMAN备份性能”
Truth:除非你启用了slaves IO,否则rman并不使用large pool
RMAN I/O可以分成三种模式:
Mode | Disk | tape |
Asynchronous I/O | 绝大多数操作系统支持AIO,默认disk_asynch_io为TRUE,即默认启用磁盘异步IO。如果磁盘设备不支持AIO,那么会使用synchronous I/O。磁盘异步模式下RMAN I/O缓冲区域从PGA中分配,相关IO性能信息存放在V$backup_async_io视图中 | 磁带设备本身不支持AIO(tape I/O is always synchronous),虽然默认tape_asynch_io为TRUE,但磁带设备只能通过IO slaves模拟异步IO,所以启用磁带AIO需要另外设置backup_tape_io_slaves=TRUE。此模式下RMAN I/O缓冲区从shared pool或者large pool中分配,相关IO性能信息存放在V$backup_async_io视图中 |
Synchronous I/O | 若disk_asynch_io设置为false,或操作系统不支持异步IO,且dbwr_io_slaves=0时启用Synchronous I/O。此时RMAN I/O缓冲区从PGA中分配,相关IO性能信息存放在V$backup_sync_io视图中 | 默认backup_tape_io_slaves为false,即磁带设备默认不启用AIO而使用Synchronous I/O。此时RMAN I/O缓冲区从PGA中分配,相关性能信息存放在V$backup_sync_io视图中 |
Slaves I/O | 启用disk slaves I/O,要求设置disk_asynch_io=false且dbwr_io_slaves>0。此模式下RMAN I/O缓冲区从shared pool或者large pool中分配,相关IO性能信息存放在V$backup_async_io视图中 | 设置tape_asynch_io=true且backup_tape_io_slaves=true时启用,磁带的AIO模式其实就是使用slaves Io模拟获得的。所以此模式下的一切和tape AIO完全一样 |
我们在使用RMAN备份数据库时无论是磁盘备份还是磁带备份总是优先期望使用AIO异步IO特性(tape aio比较特殊,见上表),使用AIO的前提是设置合理的初始化参数以及操作系统支持AIO,如果我们使用的操作系统不支持AIO那么我们将不得不使用Synchronous IO同步IO。这并不是世界末日,因为Oracle提供了IO从属进程(slaves IO)来模拟AIO,当然这是退而求其次的。为了启用slaves IO,我们需要手动设置backup_tape_io_slaves或dbwr_io_slaves参数来启用IO从属特性,当使用磁带备份时设置backup_tape_io_slaves(此时tape_asynch_io应当为true)为true,当使用磁盘设备时设置dbwr_io_slaves(此时disk_asynch_io应当为false)为非零值。在启用slaves IO的前提下RMAN才会从Large pool当中分配内存并加以利用,如果没有配置large pool(注意如果启用了ASMM,那么Oracle会自动为large pool分配一个granule大小的空间)或者large pool过小,那么RMAN的内存缓冲区将从shared pool中分配。如果Oracle仍不能获得足够内存,那么将本地进程获取足够的IO缓存。若我们启用了I/O slaves,那么很有必要配置一个足够大的Large pool(一般60-100M就足够了),这样RMAN的I/O缓存区可以从large pool中分配,避免了RMAN的I/O buffer和shared pool中的library cache等其他组件发生竞争。
If I/O slaves are used, I/O buffers are obtained from the SGA ,or the large pool, if configured.If LARGE_POOL_SIZE is set, then Oracle attempts to get memory from the large pool. If this value is not large enough, then Oracle does not try to get buffers from the shared pool.If Oracle cannot get enough memory, then it obtains I/O buffer memory from local process memory and writes a message to the alert.log file indicating that synchronous I/O is used for this backup.
在默认情况下Oracle对于磁盘设备使用AIO模式(disk_asynch_io=true & dbwr_io_slaves=0 by default),而对于磁带设备使用synchronous I/O(tape_asynch_io=true & backup_tape_io_slaves=false by default),都不会启用slaves I/O,所以默认情况下RMAN总是从PGA中分配缓存。换而言之在默认情况下,即便配置了较大的Large pool也不会为RMAN所用。
RMAN allocates the tape buffers in the SGA or the PGA, depending on whether I/O slaves are used. If you set the initialization parameter BACKUP_TAPE_IO_SLAVES = true, then RMAN allocates tape buffers from the SGA or the large pool if the LARGE_POOL_SIZE initialization parameter is set. If you set the parameter to false, then RMAN allocates the buffers from the PGA.
我们来通过以下演示,进一步验证AIO/Slave IO环境下RMAN内存缓冲区从哪里分配,并加强印象:
SQL> select * From v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
SQL> show parameter async
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
disk_asynch_io boolean TRUE
tape_asynch_io boolean TRUE
SQL> select * From v$sgastat where pool='large pool';
POOL NAME BYTES
------------ -------------------------- ----------
large pool PX msg pool 903840
large pool free memory 15873376
backup as backupset database skip offline;
SQL> select * From v$sgastat where pool='large pool';
POOL NAME BYTES
------------ -------------------------- ----------
large pool PX msg pool 903840
large pool free memory 15873376
/* 在AIO模式下,全库备份后发现large pool未发生变化 */
SQL> alter system set disk_asynch_io=false scope=spfile;
System altered.
SQL> alter system set dbwr_io_slaves=2 scope=spfile;
System altered.
/* 以上启用了磁盘I/O Slave特性 */
SQL> startup force;
[oracle@rh2 ~]$ ps -ef|grep i10|grep -v grep
oracle 20761 1 0 20:44 ? 00:00:00 ora_i101_G10R2
oracle 20763 1 0 20:44 ? 00:00:00 ora_i102_G10R2
/* 启用I/O Slave后会出现ora_ixxx_SID这样的后台进程 */
SQL> select * From v$sgastat where pool='large pool';
POOL NAME BYTES
------------ -------------------------- ----------
large pool PX msg pool 903840
large pool free memory 15873376
RMAN> backup as backupset database skip offline;
SQL> select * From v$sgastat where pool='large pool';
POOL NAME BYTES
------------ -------------------------- ----------
large pool PX msg pool 903840
large pool free memory 24151392
large pool KSFQ Buffers 25276416
SQL> /
POOL NAME BYTES
------------ -------------------------- ----------
large pool PX msg pool 903840
large pool free memory 41006432
large pool KSFQ Buffers 8421376
/* 启用了I/O Slave后执行备份操作,
large pool中出现了KSFQ Buffers,这个KSFQ buffer就是RMAN所使用的缓冲区,
实际决定该buffer大小的是隐藏参数_backup_ksfq_bufsz和_backup_ksfq_bufcnt */
SQL> col name for a30
SQL> col describ for a70
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2 FROM SYS.x$ksppi x, SYS.x$ksppcv y
3 WHERE x.inst_id = USERENV ('Instance')
4 AND y.inst_id = USERENV ('Instance')
5 AND x.indx = y.indx
6 AND x.ksppinm LIKE '%ksfq%';
NAME VALUE DESCRIB
------------------------------ ---------- ----------------------------------------------------------------------
_backup_ksfq_bufsz 0 size of the ksfq buffer used for backup/restore
_backup_ksfq_bufcnt 0 number of the ksfq buffers used for backup/restore
/* 在10g中似乎Oracle会自动调控以上2个参数 */
SQL> alter system set "_backup_ksfq_bufsz"=131072;
System altered.
SQL> alter system set "_backup_ksfq_bufcnt"=1;
System altered.
RMAN> backup tablespace data01;
/* I/O slaves的IO统计信息仍存放在V$backup_sync_io视图中,
而非可能是你所预期的v$backup_sync_io视图 */
SQL> select type,buffer_size,buffer_count from v$backup_async_io;
TYPE BUFFER_SIZE BUFFER_COUNT
--------- ----------- ------------
AGGREGATE 0 0
INPUT 131072 1
OUTPUT 1048576 4
另外large pool的使用量可以通过下列公式来估算:
LARGE_POOL_SIZE =
(4 * {RMAN Channels} * {DB_BLOCK_SIZE} * {DB_DIRECT_IO_COUNT} * {Multiplexing Level})
+
(4 * {RMAN Channels} * {Tape Buffer Size})
事实上如果你probe过PGA的内存使用情况,那么你或许会在PGA headdump中看到过”KSFQ heap”的相关信息。显然当在非slaves IO模式下,RMAN会从PGA HEAP->KSFQ heap这个subheap子堆中分配必要的buffer。
我们在磁盘AIO模式下执行必要的backup操作,之后找出RMAN相关的shadow process并对其做heapdump,分析其pga内存使用情况
SQL> select spid,pga_used_mem,pga_max_mem from v$process where addr in 2 (select paddr from v$session where program like '%rman%') 3 order by pga_used_mem desc ; SPID PGA_USED_MEM PGA_MAX_MEM ------------ ------------ ----------- 24424 5750341 14410829 24425 4717957 12134125 24413 3308341 9626701 24423 435773 993005 SQL> oradebug setospid 24424; Oracle pid: 25, Unix process pid: 24424, image: oracle@rh2.oracle.com (TNS V1-V3) SQL> oradebug dump heapdump 536870917; Statement processed. SQL> oradebug tracefile_name; /s01/admin/G10R2/udump/g10r2_ora_24424.trc ==========================heapdump details============================== FIVE LARGEST SUB HEAPS for heap name="pga heap" desc=0x68d3ec0 Subheap ds=0x87c83e8 heap name= KSFQ heap size= 4205296 owner=(nil) latch=(nil) ****************************************************** HEAP DUMP heap name="KSFQ heap" desc=0x87c83e8 extent sz=0x1040 alt=32767 het=32767 rec=0 flg=2 opc=2 parent=0x68d3ec0 owner=(nil) nex=(nil) xsz=0x20228 EXTENT 0 addr=0x7f86bf788dd8 Chunk 7f86bf788de8 sz= 1049112 freeable "KSFQ Buffers " EXTENT 1 addr=0x7f86bf988dd8 Chunk 7f86bf988de8 sz= 1049112 freeable "KSFQ Buffers " EXTENT 2 addr=0x7f86bfb88dd8 Chunk 7f86bfb88de8 sz= 1049112 freeable "KSFQ Buffers " EXTENT 3 addr=0x7f86bfc98dd8 Chunk 7f86bfc98de8 sz= 1049112 freeable "KSFQ Buffers " EXTENT 4 addr=0x7f86bfddf358 Chunk 7f86bfddf368 sz= 5192 freeable "KSFQ ctx " EXTENT 5 addr=0x87c7680 Chunk 0087c7690 sz= 984 perm "perm " alo=984 Chunk 0087c7a68 sz= 1944 free " " Chunk 0087c8200 sz= 464 freeable "KSFQ buffer poo" Total heap size = 4205032 FREE LISTS: Bucket 0 size=0 Chunk 0087c7a68 sz= 1944 free " " Total free space = 1944 UNPINNED RECREATABLE CHUNKS (lru first): PERMANENT CHUNKS: Chunk 0087c7690 sz= 984 perm "perm " alo=984 Permanent space = 984 /* 以上可以看到KSFQ Heap子堆共占用了4205296=4M内存, 而该服务进程的pga_used_memory总量为5750341 bytes,即KSFQ占该进程PGA的73% 另外这里KSFQ Buffer的大多Chunk是freeable的,仅少量为perm */
另外磁带I/O缓冲区的大小可以在配置通道时指定,其默认值由操作系统决定,一般为64k。我们使用allocate channel命令设置磁带I/O缓冲区,为了达到最佳性能,可以将磁带I/O缓冲区大小设置为256KB或更大,如:
allocate channel maclean1 device type sbt
parms="blksize=262144,ENV=(NB_ORA_SERV=nas,NB_ORA_POLICY=racdb,NB_ORA_CLIENT=rh2)";
结论:
- 在默认情况下(即disk backup使用AIO而tape backup使用sync IO),RMAN backup并不会从Large Pool中获益,而是从PGA中的KSFQ heap中分配必要的I/O内存缓冲区。当然我们还是推荐设置Large_pool_size为100M左右,因为即使是PC服务器也不会缺这一点内存
- 仅当启用I/O slaves时RMAN backup从Large Pool中分配ksfq buffer(ksfq buffer used for backup/restore),在9i/10g中常有因large pool过小而引起ORA-04031错误的案例;若large pool大小为0,那么ksfq buffer会从shared pool中分配,一方面ORA-04031:(“shared pool”,”unknown object”,”sga heap(1,0)”,”KSFQ Buffers”)错误的概率将大大增加,另一个方面将造成KSFQ与library cache间的竞争,不利于性能。这种情况下RMAN backup的确从Large pool中得到收益,设置large_pool_size为100M仍是被推荐的
- 在非slaves IO模式下RMAN从PGA的KSFQ heap子堆中分配I/O缓冲区,因为在非slaves IO模式下该内存缓冲区没有共享的必要
待修订!!
Mode | Disk | tape |
Asynchronous I/O | 绝大多数磁盘设备支持AIO,默认disk_asynch_io为TRUE,即默认启用磁盘异步IO。如果磁盘设备不支持AIO,那么会使用synchronous I/O。磁盘异步模式下RMAN I/O缓冲区域从PGA中分配,相关IO性能信息存放在V$backup_async_io视图中 | 磁带设备本身不支持AIO,虽然默认tape_asynch_io为TRUE,但磁带设备只能通过IO slaves模拟异步IO,所以启用磁带AIO需要另外设置backup_tape_io_slaves=TRUE。此模式下RMAN I/O缓冲区从shared pool或者large pool中分配,相关IO性能信息存放在V$backup_async_io视图中 |
Synchronous I/O | 若disk_asynch_io设置为false,或磁盘设备不支持异步IO,且dbwr_io_slaves=0时启用Synchronous I/O。此时RMAN I/O缓冲区从PGA中分配,相关IO性能信息存放在V$backup_sync_io视图中 | 默认backup_tape_io_slaves为false,即磁带设备默认不启用AIO而使用Synchronous I/O。此时RMAN I/O缓冲区从PGA中分配,相关性能信息存放在V$backup_sync_io视图中 |
Slaves I/O | 启用disk slaves I/O,要求设置disk_asynch_io=false且dbwr_io_slaves>0。此模式下RMAN I/O缓冲区从shared pool或者large pool中分配,相关IO性能信息存放在V$backup_async_io视图中 | 同tape AIO时一样 |
ORA-04031 “KSFQ Buffers” ksmlgpalloc
Applies to:
Oracle Server – Standard Edition – Version: 9.2.0.8 and later [Release: 9.2 and later ]
Information in this document applies to any platform.
Symptoms
— Problem Statement:
Getting ORA-4031 in the alert log file:
ORA-04031: unable to allocate 532524 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(1,0)”,”KSFQ Buffers”)
…
SO: 80355EA8, type: 4, owner: 802FB928, flag: INIT/-/-/0x00
(session) trans: 00000000, creator: 802FB928, flag: (41) USR/- BSY/-/-/-/-/-
DID: 0001-0095-00000022, short-term DID: 0000-0000-00000000
txn branch: 00000000
oct: 0, prv: 0, sql: 8F21F578, psql: 00000000, user: 0/SYS
O/S info: user: SYSTEM, term: GEHUS5031, ospid: 5296:1468, machine: EC\GEHUS5031
program: rman.exe
client info: rman channel=ch1
application name: backup archivelog: ch1, hash value=3567860541
action name: 0000106 STARTED, hash value=3757870881
last wait for ‘latch free’ blocking sess=0x0 seq=1234 wait_time=332
address=8b6ca86c, number=9d, tries=0
temporary object counter: 0
…No current library cache object being loaded
…No instantiation object
…
Analysis of callstack
——————————-
Summary Stack (to Full stack) (to Function List)
ksm_4031_dump
1089
ksmasg Callback function for allocating an SGA extent. The SGA is fixed size
kghnospc KGH: There is no space available in the heap, signal an error.
kghalo KGH: main allocation entry point
ksmlgpalloc ksmlgpalloc – KSM LarGe Pool ALLOCation
ksfqbalo
ksfqbcre BUFFER MANAGER FUNCS */
ksfqxc
ksfqxcrx
ksfqxcre ksfqxcre – CREate KSFQ context
krbbpc Piece Create. This is the implementation of
krbibpc Backup Piece Creation */
pevm_icd_call_comm
pfrrun PSDEVN: PL/SQL Interpreter Main Instruction Loop
pricar ——————————– pricar ——————————– */
pricbr PL/SQL Remote Interface – Call Back, Remote
prient2 prient2 – common code to run given entrypoint: could be RPC or non-RPC
prient prient – Run given (RPC) entrypoint
kkxrpc ARGSUSED */
kporpc kporpc.c – The oracle side function to start rpc
opiodr OPIODR: ORACLE code request driver – route the current request
ttcpip TTCPIP: Two Task Common PIPe read/write
opitsk opitsk – Two Task Oracle Side Function Dispatcher
opiino opiino – ORACLE Program Interface INitialize Opi
opiodr OPIODR: ORACLE code request driver – route the current request
opidrv # opidrv – ORACLE Program Interface DRiVer (IGNORE)
sou2o # Main Oracle executable entry point
opimai
OracleThreadStart@
77E64826
Cause
Large Pool was not configured.
RMAN backups need Large Pool to complete the required operations.
Solution
— To implement the solution, please execute the following steps::
SQL> ALTER SYSTEM SET LARGE_POOL_SIZE = 104857600 SCOPE=BOTH; — 100 MB
RMAN Restore Performance from Tape is Very Poor
Applies to:
Oracle Server – Enterprise Edition – Version: 10.2.0.4 to 11.1.0.7
This problem can occur on any platform.
Symptoms
RMAN restore from sbt backup is very slow compared to backup speed.
Backup rate is 60-90Mb/sec.
Restore rate is 6Mb/sec.
Backup is hardware multiplexed but media manager does support multiplexed backups.
Restore validate runtime is VERY fast so problem is the write back to disk.
Cause
RMAN is using a 128Kb buffer size for the datafiles when restoring back to disk.
The input buffer size used during backup is 1Mb.
To confirm that you are hitting the same problem, query v$backup_async IO and check buffer_size for the backup eg
SQL>select device_type, type, buffer_size from v$backup_async_io where
open_time > to_date(’30-jun-09 15:14:00′, ‘dd-mon-rr hh24:mi:ss’) and
close_time select device_type, type, buffer_size from v$backup_async_io where
open_time > to_date(’30-jun-09 14:50:00′, ‘dd-mon-rr hh24:mi:ss’) and
close_time alter system set “_BACKUP_KSFQ_BUFSZ”=1048576;
Users restoring to ASM may want to set this to the ASM stripe size.
Problem can affect any release prior to 11.2.
Please note:
Setting _BACKUP_KSFQ_BUFSZ=1048576 will benefit restore performance overall where you can see that a buffer size of 128Kb is being used. However, if restore validate is also taking a long time then setting this hidden parameter alone may not be enough, particularly if you have hardware multiplexed the original backup (allocated > 1 channel per physical tape device). If this is the case then refer to Note 740911.1: RMAN Restore Performance.
Hdr: 12422903 10.2.0.5.0 RDBMS 10.2.0.5.0 RMAN PRODID-5 PORTID-226
Abstract: RMAN BACKUP FAILING WITH ORA-600 [1862]
PROBLEM:
——–
when setting if _backup_ksfq_bufsz > 1048576 then rman backup fails with:
RMAN-8011: including current control file in backupset
RMAN-8038: channel ORA_SBT_TAPE_1: starting piece 1 at 03-05-2011 17:54:59
RMAN-571: ===========================================================
RMAN-569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-571: ===========================================================
RMAN-3009: failure of backup command on ORA_SBT_TAPE_1 channel at 05/03/2011
17:55:02
ORA-600: internal error code, arguments: [1862], [8388608], [], [], [], [],
[], []
DIAGNOSTIC ANALYSIS:
——————–
/home/oracle/admin/PCAP/udump/pcap1_ora_22266.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 – 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security,
OLAP,
Data Mining Scoring Engine and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/10.2.0
System name: Linux
Node name: fr-srv-orahm01
Release: 2.6.18-194.el5
Version: #1 SMP Tue Mar 16 21:52:39 EDT 2010
Machine: x86_64
Instance name: PCAP1
Redo thread mounted by this instance: 1
Oracle process number: 439
Unix process pid: 22266, image: oracle@fr-srv-orahm01 (TNS V1-V3)
*** NAME:(0000018 STARTED16) 2011-05-03 14:24:43.976
*** NAME:(backup full datafile) 2011-05-03 14:24:43.976
*** NAME:(SYS$USERS) 2011-05-03 14:24:43.976
*** ID:(6242.2499) 2011-05-03 14:24:43.976
*** 14:24:43.976
ksedmp: internal or fatal error
ORA-600: internal error code, arguments: [1862], [67108864], [], [], [],
[], [], []
—– Call Stack Trace —–
ssd_unwind_bp ksedst ksedmp ksfdmp kgeriv kgesiv ksesic1 ksfqxc
ksfqxcrx ksfqxcre krbbcThread krbbpc krbibpc pevm_icd_call_commo
pfrinstr_ICAL pfrrun_no_tool pfrrun plsql_run pricar pricbr prient2
prient kkxrpc kporpc opiodr ttcpip opitsk opiino opiodr opidrv sou2o
opimai_real main libc_start_main start
user: 0/SYS
service name: SYS$USERS
O/S info: user: oracle, term: pts/1, ospid: 21905, machine: fr-srv-orahm01
program: rman@fr-srv-orahm01 (TNS V1-V3)
client info: rman channel=ORA_SBT_TAPE_1
application name: backup full datafile, hash value=2287030075
action name: 0000018 STARTED16, hash value=109080347
WORKAROUND:
———–
Dont set _backup_ksfq_bufsz > 1048576
STACK TRACE:
————
ssd_unwind_bp ksedst ksedmp ksfdmp kgeriv kgesiv ksesic1 ksfqxc
ksfqxcrx ksfqxcre krbbcThread krbbpc krbibpc pevm_icd_call_commo
pfrinstr_ICAL pfrrun_no_tool pfrrun plsql_run pricar pricbr prient2
prient kkxrpc kporpc opiodr ttcpip opitsk opiino opiodr opidrv sou2o
opimai_real main libc_start_main start
ORA-4030 “Out Of Process Memory Error” During RMAN Restore
Applies To
Oracle Server – Enterprise Edition – Version: 8.1.7.4
This problem can occur on any platform.
Symptoms
During RMAN restore following error occurs
run
{
allocate channel c1 type disk;
restore database ;
}
RMAN-03002: failure during compilation of command
RMAN-03013: command type: restore
RMAN-03006: non-retryable error occurred during execution of command: IRESTORE
RMAN-07004: unhandled exception during command execution on channel c1
RMAN-10035: exception raised in RPC: ORA-04030: out of process memory when
trying to allocate
1048632 bytes (pga heap,KSFQ Buffers)
RMAN-10031: ORA-19583 occurred during call to
DBMS_BACKUP_RESTORE.RESTOREBACKUPPIECE
Recovery Manager complete.
Cause
The RMAN failed to restore because of ORA-4030 “out of process memory”
Fix
Increase data(kbytes), stack(kbytes), memory(kbytes) parameter to unlimited in ulimit -a
Why ORA-04031 on Shared Pool during RMAN back up even when Large Pool is set?
Applies to:
Oracle Server – Enterprise Edition – Version: 10.2.0.1 and later [Release: 10.2 and later ]
Information in this document applies to any platform.
Goal
Following error occurs while performing RMAN back up:
RMAN-03009: failure of backup command on ch0 channel at 07/23/2010 10:39:21
ORA-04031: unable to allocate 7836 bytes of shared memory (“shared pool”,”unknown object”,”sga heap(1,0)”,”ksfqpar”)
The LARGE_POOL_SIZE was set.
But the Large Pool was not used and ORA-4031 error occurred.
Solution
The LARGE POOL is only utilized when following are set:
1. DBWR_IO_SLAVES and/or BACKUP_TAPE_IO_SLAVES to a non-zero value
2. Set DISK_ASYNCH_IO = FALSE
Otherwise shared pool is used. You can test this as below:
When DBWR_IO_SLAVES is set to zero and DISK_ASYNCH_IO is set to true then Large pool is not used:
RMAN> allocate channel for maintenance type disk;
SQL> select * from v$sgastat where name like ‘%KSFQ
POOL NAME BYTES
———— ————————– ———-
shared pool X$KSFQP ANCHOR 52
shared pool KSFQ buffer pool 2376
SQL> select * from v$sgastat where pool=’large pool’;
POOL NAME BYTES
———— ————————– ———-
large pool PX msg pool 262144
large pool free memory 33292288
When DBWR_IO_SLAVES is set to non-zero and DISK_ASYNCH_IO is set to false:
RMAN> allocate channel for maintenance type disk;
SQL> select * from v$sgastat where pool=’large pool’;
POOL NAME BYTES
———— ————————– ———-
large pool PX msg pool 262144
large pool free memory 29081600
large pool KSFQ Buffers 4210688
SQL> select * from v$sgastat where name like ‘%KSFQ%’;
POOL NAME BYTES
———— ————————– ———-
shared pool X$KSFQP ANCHOR 52
shared pool KSFQ buffer pool 2376
shared pool KSFQ Buffer Structures 208
large pool KSFQ Buffers 4210688
Now, we see the large pool is in use.
Bug 10635701 – RMAN backup many files consumes lots of PGA / fails with ORA-4030
RMAN backups of large numbers of files can fail with ORA-4030
due to excessive memory consumption.
Rediscovery Notes:
RMAN backup fails with ORA-4030.
RMAN tracing shows that the error occurred in function
dbms_backup_restore.validationvalidate
with a trace line such as:
krmxrpc – channel t1 kpurpc2 err=4030 db=target proc=SYS.DBMS_BACKUP_RESTORE.VALIDATIONVALIDATE
Hdr: 10635701 11.2.0.2 RDBMS 11.2.0.2 RMAN PRODID-5 PORTID-226
Abstract: BACKUP OF FRA CONSUMES 15GB OF PGA AND FAIL WITH ORA-4030
PROBLEM:
——–
Exadata Quarter rack: 3 storage cells.
using 11.2.0.2 on RDBMS.
using Veritas NetBackup for Oracle – Release 7.0
The backup of flash recovery area allocates up to 15gb of PGA and it fails
with error ORA-4030
The script used to run the backup:
connect target ‘/’
run {
allocate channel t1 type ‘SBT_TAPE’;
backup recovery area backupset all delete input filesperset=4;
release channel t1;
}
DIAGNOSTIC ANALYSIS:
——————–
DIAGNOSTIC
———-
The incident trace file BGC0TT1_ora_21853_i1344592.trc has this:
Dump continued from file:
/u01/app/oracle/admin/diag/rdbms/bgc0tta/BGC0TT1/trace/BGC0TT1_ora_21853.trc
ORA-4030: out of process memory when trying to allocate 1052696 bytes (KSFQ
heap,KSFQ Buffers)
========= Dump for incident 1344592 (ORA-4030) ========
—– Beginning of Customized Incident Dump(s) —–
=======================================
TOP 10 MEMORY USES FOR THIS PROCESS
—————————————
94% 14 GB, 14727 chunks: “KSFQ Buffers ”
KSFQ heap ds=0x2af17258bd70 dsprt=0xb7d3bc0
There is a massive allocation of memory with comments KSFQ Buffers from the
KSFQ heap.
The callstack:
kghalf <- ksfqbalo <- ksfqbcre <- ksfqxc <- ksfqxcrx <- ksfqxcre <- krbrvv
The limitation of FILESPERSET does not change the results, still error
ORA-4030.
We have asked the customer to run the backup and get output from
v$backup_async_io, to identify the number of buffers and size of buffers
allocated during the backup.
Call Stack of the failing process:
kghalf <- ksfqbalo <- ksfqbcre <- ksfqxc <- ksfqxcrx
<- ksfqxcre <- krbrvv
The heap where memory is allocated KSFQ heap with chunks
with comments "KSFQ Buffers". That information is dumped
into the trace file generated by error ORA-4030
Workaround:
None
RELEASE NOTES:
RMAN backups of large numbers of files can fail with ORA-4030
due to excessive memory consumption.
REDISCOVERY INFORMATION:
RMAN backup fails with ORA-4030. RMAN tracing shows that the error occurred
in function dbms_backup_restore.validationvalidate, with a trace line such
as:
krmxrpc – channel t1 kpurpc2 err=4030 db=target proc=SYS.DBMS_BACKUP_RESTORE.VA
LIDATIONVALIDATE
WORKAROUND:
None
Hdr: 9523300 10.2.0.4 USERDOC 10.2.0.4 PRODID-5 PORTID-212
Abstract: TAPE_ASYNCH_IO AND BACKUP_TAPE_IO_SLAVES CLARIFICATION
PROBLEM:
——–
From the online documentation :
Oracle® Database Administrator’s Reference
10g Release 2 (10.2) for UNIX-Based Operating Systems
Part Number B15658-06
href660
” A.2.4 I/O Slaves
..
There are times when you must turn off asynchronous I/O. For example, if
instructed to do so by Oracle Support for debugging. You can use the
DISK_ASYNCH_IO and TAPE_ASYNCH_IO parameters to switch off asynchronous I/O
for disk or tape devices. ”
Customers find this whole section confusing as it does not distinguish
between disk and tape io slaves and gives the impression that tape async io
is enabled by default (as tape_asynch_io defaults to true) and that
backup_tape_io_slaves therefore need never be set.
As I understand it: tape async io can never be asynchronous unless both
tape_asynch_io and backup_tape_io_slaves are BOTH set to true and the default
settings for these parameters means tape io will always by default be
synchronous.
Asynchronous IO to tape can only be simulated via the use of tape IO slaves
and setting tape_asynch_io=true alone is not enough.
This sentence also needs clarifying:
“Set the DBWR_IO_SLAVES parameter to greater than 0 only if the
DISK_ASYNCH_IO or TAPE_ASYNCH_IO parameter is set to false”
This again does not distinguish between disk and tape io slaves and should
be:
Set DBWR_IO_SLAVES parameter to greater than 0 only if DISK_ASYNCH_IO=FALSE.
For tape io slaves, set BACKUP_TAPE_IO_SLAVES=TRUE and leave
TAPE_ASYNCH_IO=TRUE (default).
10..zz
Once slave number 0z is spawned, the next slaves are 10, 11, 12, … As an example, if dbwr_io_slaves=15, the following process will be started:
ora_i101_SID ora_i102_SID ora_i103_SID ora_i104_SID
ora_i105_SID ora_i106_SID ora_i107_SID ora_i108_SID
ora_i109_SID ora_i10a_SID ora_i10b_SID ora_i10c_SID
ora_i10d_SID ora_i10e_SID ora_i10f_SID