SMON的作用还包括清理死事务:Recover Dead transaction。当服务进程在提交事务(commit)前就意外终止的话会形成死事务(dead transaction),PMON进程负责轮询Oracle进程,找出这类意外终止的死进程(dead process),通知SMON将与该dead process相关的dead transaction回滚清理,并且PMON还负责恢复dead process原本持有的锁和latch。
我们来具体了解dead transaction的恢复过程:
SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi PL/SQL Release 10.2.0.4.0 - Production CORE 10.2.0.4.0 Production TNS for Linux: Version 10.2.0.4.0 - Production NLSRTL Version 10.2.0.4.0 - Production SQL> select * from global_name; GLOBAL_NAME -------------------------------------------------------------------------------- www.askmac.cn SQL>alter system set fast_start_parallel_rollback=false; System altered. 设置10500,10046事件以跟踪SMON进程的行为 SQL> alter system set events '10500 trace name context forever,level 8'; System altered. SQL> oradebug setospid 4424 Oracle pid: 8, Unix process pid: 4424, image: oracle@rh2.oracle.com (SMON) SQL> oradebug event 10046 trace name context forever,level 8; Statement processed. 在一个新的terminal中执行大批量的删除语句,在执行一段时间后使用操作系统命令将执行该删除操作的 服务进程kill掉,模拟一个大的dead transaction的场景 SQL> delete large_rb; delete large_rb [oracle@rh2 bdump]$ kill -9 4535 等待几秒后pmon进程会找出dead process: [claim lock for dead process][lp 0x7000003c70ceff0][p 0x7000003ca63dad8.1290666][hist x9a514951] 在x$ktube内部视图中出现ktuxecfl(Transaction flags)标记为DEAD的记录: SQL> select sum(distinct(ktuxesiz)) from x$ktuxe where ktuxecfl = 'DEAD'; SUM(DISTINCT(KTUXESIZ)) ----------------------- 29386 SQL> / SUM(DISTINCT(KTUXESIZ)) ----------------------- 28816 以上KTUXESIZ代表事务所使用的undo块总数(number of undo blocks used by the transaction) ==================smon trace content================== SMON: system monitor process posted WAIT #0: nam='log file switch completion' ela= 0 p1=0 p2=0 p3=0 obj#=1 tim=1278243332801935 WAIT #0: nam='log file switch completion' ela= 0 p1=0 p2=0 p3=0 obj#=1 tim=1278243332815568 WAIT #0: nam='latch: row cache objects' ela= 95 address=2979418792 number=200 tries=1 obj#=1 tim=1278243333332734 WAIT #0: nam='latch: row cache objects' ela= 83 address=2979418792 number=200 tries=1 obj#=1 tim=1278243333356173 WAIT #0: nam='latch: undo global data' ela= 104 address=3066991984 number=187 tries=1 obj#=1 tim=1278243347987705 WAIT #0: nam='latch: object queue header operation' ela= 89 address=3094817048 number=131 tries=0 obj#=1 tim=1278243362468042 WAIT #0: nam='log file switch (checkpoint incomplete)' ela= 0 p1=0 p2=0 p3=0 obj#=1 tim=1278243419588202 Dead transaction 0x00c2.008.0000006d recovered by SMON ===================== PARSING IN CURSOR #3 len=358 dep=1 uid=0 oct=3 lid=0 tim=1278243423594568 hv=3186851936 ad='ae82c1b8' select smontabv.cnt, smontab.time_mp, smontab.scn, smontab.num_mappings, smontab.tim_scn_map, smontab.orig_thread from smon_scn_time smontab, (select max(scn) scnmax, count(*) + sum(NVL2(TIM_SCN_MAP, NUM_MAPPINGS, 0)) cnt from smon_scn_time where thread = 0) smontabv where smontab.scn = smontabv.scnmax and thread = 0 END OF STMT PARSE #3:c=0,e=1354526,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1278243423594556 EXEC #3:c=0,e=106,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1278243423603269 FETCH #3:c=0,e=47065,p=0,cr=319,cu=0,mis=0,r=1,dep=1,og=4,tim=1278243423650375 *** 2011-06-24 21:19:25.899 WAIT #0: nam='smon timer' ela= 299999999 sleep time=300 failed=0 p3=0 obj#=1 tim=1278243716699171 kglScanDependencyHandles4Unpin(): cumscan=3 cumupin=4 time=776 upinned=0
以上SMON回滚清理Dead transaction的过程从”system monitor process posted”开始到”Dead transaction 0x00c2.008.0000006d recovered by SMON”结束。另外可以看到在恢复过程中SMON先后请求了’latch: row cache objects’、’latch: undo global data’、’latch: object queue header operation’三种不同类型的latch。
现象
fast_start_parallel_rollback参数决定了SMON在回滚事务时使用的并行度,若将该参数设置为false那么并行回滚将被禁用,若设置为Low(默认值)那么会以2*CPU_COUNT数目的并行度回滚,当设置为High则4*CPU_COUNT数目的回滚进程将参与进来。当我们通过以下查询发现系统中存在大的dead tranacation需要回滚时我们可以通过设置fast_start_parallel_rollback为HIGH来加速恢复:
select sum(distinct(ktuxesiz)) from x$ktuxe where ktuxecfl = 'DEAD'; ==============parallel transaction recovery=============== *** 2011-06-24 20:31:01.765 SMON: system monitor process posted msgflag:0x0000 (-/-/-/-/-/-/-) *** 2011-06-24 20:31:01.765 SMON: process sort segment requests begin *** 2011-06-24 20:31:01.765 SMON: process sort segment requests end *** 2011-06-24 20:31:01.765 SMON: parallel transaction recovery begin WAIT #0: nam='DFS lock handle' ela= 504 type|mode=1413545989 id1=3 id2=11 obj#=2 tim=1308918661765715 WAIT #0: nam='DFS lock handle' ela= 346 type|mode=1413545989 id1=3 id2=12 obj#=2 tim=1308918661766135 WAIT #0: nam='DFS lock handle' ela= 565 type|mode=1413545989 id1=3 id2=13 obj#=2 tim=1308918661766758 WAIT #0: nam='DFS lock handle' ela= 409 type|mode=1413545989 id1=3 id2=14 obj#=2 tim=1308918661767221 WAIT #0: nam='DFS lock handle' ela= 332 type|mode=1413545989 id1=3 id2=15 obj#=2 tim=1308918661767746 WAIT #0: nam='DFS lock handle' ela= 316 type|mode=1413545989 id1=3 id2=16 obj#=2 tim=1308918661768146 WAIT #0: nam='DFS lock handle' ela= 349 type|mode=1413545989 id1=3 id2=17 obj#=2 tim=1308918661768549 WAIT #0: nam='DFS lock handle' ela= 258 type|mode=1413545989 id1=3 id2=18 obj#=2 tim=1308918661768858 WAIT #0: nam='DFS lock handle' ela= 310 type|mode=1413545989 id1=3 id2=19 obj#=2 tim=1308918661769224 WAIT #0: nam='DFS lock handle' ela= 281 type|mode=1413545989 id1=3 id2=20 obj#=2 tim=1308918661769555 *** 2011-06-24 20:31:01.769 SMON: parallel transaction recovery end
但是在real world的实践中可以发现当fast_start_parallel_rollback= Low/High,即启用并行回滚时常有并行进程因为各种资源互相阻塞导致回滚工作停滞的例子,当遭遇到这种问题时将fast_start_parallel_rollback设置为FALSE一般可以保证恢复工作以串行形式在较长时间内完成。
如何禁止SMON Recover Dead transaction
可以设置10513事件来临时禁止SMON恢复死事务,这在我们做某些异常恢复的时候显得异常有效,当然不建议在一个正常的生产环境中设置这个事件:
SQL> alter system set events '10513 trace name context forever, level 2';
System altered.
10531 -- event disables transaction recovery which was initiated by SMON
SQL> select ktuxeusn,
2 to_char(sysdate, 'DD-MON-YYYY HH24:MI:SS') "Time",
3 ktuxesiz,
4 ktuxesta
5 from x$ktuxe
6 where ktuxecfl = 'DEAD';
KTUXEUSN Time KTUXESIZ KTUXESTA
---------- -------------------------- ---------- ----------------
17 24-JUN-2011 22:03:10 0 INACTIVE
66 24-JUN-2011 22:03:10 0 INACTIVE
105 24-JUN-2011 22:03:10 0 INACTIVE
193 24-JUN-2011 22:03:10 33361 ACTIVE
194 24-JUN-2011 22:03:10 0 INACTIVE
194 24-JUN-2011 22:03:10 0 INACTIVE
197 24-JUN-2011 22:03:10 20171 ACTIVE
7 rows selected.
SQL> /
KTUXEUSN Time KTUXESIZ KTUXESTA
---------- -------------------------- ---------- ----------------
17 24-JUN-2011 22:03:10 0 INACTIVE
66 24-JUN-2011 22:03:10 0 INACTIVE
105 24-JUN-2011 22:03:10 0 INACTIVE
193 24-JUN-2011 22:03:10 33361 ACTIVE
194 24-JUN-2011 22:03:10 0 INACTIVE
194 24-JUN-2011 22:03:10 0 INACTIVE
197 24-JUN-2011 22:03:10 20171 ACTIVE
7 rows selected.
================smon disabled trans recover trace==================
SMON: system monitor process posted
*** 2011-06-24 22:02:57.980
SMON: Event 10513 is level 2, trans recovery disabled.
您好,我有一个问题想请教您
很多资料上面说回滚的以及死掉的会话的未提交事务都是由PMON进程负责清理,而启动时的死事务才由SMON清理,不知这种说法到底对不对。
谢谢您。
你好,这种说法是不正确的,如上文所述pmon会轮询后台和服务进程并找出dead process:
pmon进程会找出dead process:
[claim lock for dead process][lp 0x7000003c70ceff0][p 0x7000003ca63dad8.1290666][hist x9a514951]
pmon会负责回收该进程所持有的锁和其他资源,但实际的事务回滚是由SMON完成的,如上文所述的
Dead transaction 0x00c2.008.0000006d recovered by SMON
SMON: parallel transaction recovery begin
……….
SMON: parallel transaction recovery end
可以通过10513事件禁止SMON做transaction recovery,如上文中的演示
您好,042中有一题是这样的:
In the middle of a transaction, a user session was abnormally terminated but the instance is still up and the database is
open. Which two statements are true in this scenario? (Choose two.)
A.Event Viewer gives more details on the failure.
B.The alert log file gives detailed information about the failure.
C.PMON rolls back the transaction and releases the locks.
D.SMON rolls back the transaction and releases the locks.
E.The transaction is rolled back by the next session that refers to any of the blocks updated by the failed transaction.
F.Data modified by the transaction up to the last commit before the abnormal termination is retained in the database.
Answer : C, F
所以我现在非常疑惑~谢谢您的回答~
Certification主要是面向初学者,部分概念上有误导之嫌。
关于该问题 , Maclean有了新的认识:
_cleanup_rollback_entries
This is an integer parameter that defaults to 100. The parameter is used to specify the maximum number of undo (rollback) records to apply when performing transaction
rollback under PMON.
If the transaction rollback hits the limit then PMON aborts the recovery and signals SMON so that it can perform the recovery instead.
It also looks like SMON may be influenced by this parameter too, if it is asked to do a quick pass (KTUREC_QP) recovery of a transaction.
In the Oracle 8.0 Server Reference, the non-hidden version of this parameter is described as follows:
CLEANUP_ROLLBACK_ENTRIES specifies the number of undo records processed at one time when rolling back a transaction. Prevents long transactions
CLEANUP_ROLLBACK_ENTRIES
Parameter type:
integer
Parameter class:
static
Default value:
20
CLEANUP_ROLLBACK_ENTRIES specifies the number of undo records processed at one time when rolling back a transaction. Prevents long transactions from freezing out shorter transactions that also need to be rolled back. Normally this parameter will not need modification.
_cleanup_rollback_entries 默认值100, 单位为undo(rollback)记录的条数, 若小于100条则 pmon负责该事务的rollback。
若事务需回滚的记录超过100条rollback record则PMON只负责清理dead session,而由SMON负责回滚dead transaction。
所以关于 到底是 PMON还是SMON做rolls back the transaction这个工作的公案,算有一个合理的结论了,如上。