[gview file=”https://www.askmac.cn/wp-content/uploads/2010/03/Diff-Hidden-Parameter-between-10g-and-11g.pdf”]
[gview file=”https://www.askmac.cn/wp-content/uploads/2010/03/Diff-Hidden-Parameter-between-10g-and-11g.pdf”]
运行在操作系统上的进程存在2种系统时序优先级模式:即 实时模式 Real Time(RT) mode, 与分时模式 Time Sharing(TS) mode.
绝大多数Oracle进程运行在TS模式下:
[oracle@rh1 ~]$ ps -efc|grep ora_|grep -v grep oracle 8510 1 TS 23 Mar27 ? 00:00:02 ora_pmon_PROD oracle 8512 1 TS 23 Mar27 ? 00:00:00 ora_psp0_PROD oracle 8514 1 TS 23 Mar27 ? 00:00:00 ora_mman_PROD oracle 8516 1 TS 23 Mar27 ? 00:00:02 ora_dbw0_PROD oracle 8518 1 TS 23 Mar27 ? 00:00:04 ora_lgwr_PROD oracle 8520 1 TS 23 Mar27 ? 00:00:04 ora_ckpt_PROD oracle 8522 1 TS 23 Mar27 ? 00:00:08 ora_smon_PROD oracle 8524 1 TS 23 Mar27 ? 00:00:00 ora_reco_PROD oracle 8526 1 TS 23 Mar27 ? 00:00:34 ora_cjq0_PROD oracle 8528 1 TS 23 Mar27 ? 00:00:06 ora_mmon_PROD oracle 8530 1 TS 24 Mar27 ? 00:00:07 ora_mmnl_PROD oracle 8538 1 TS 23 Mar27 ? 00:00:00 ora_arc0_PROD oracle 8540 1 TS 23 Mar27 ? 00:00:00 ora_arc1_PROD oracle 8548 1 TS 23 Mar27 ? 00:00:00 ora_qmnc_PROD oracle 8555 1 TS 23 Mar27 ? 00:00:00 ora_q000_PROD oracle 8559 1 TS 23 Mar27 ? 00:00:00 ora_q001_PROD oracle 30500 1 TS 23 22:10 ? 00:00:00 ora_j000_PROD
如上所示所有进程均运行在TS模式下且priority均为23|24.
Oracle一般不推荐使用RT模式,因为虽然个别进程可以通过这种方式获得更多的CPU资源,但往往系统的瓶颈并非CPU,即尽管CPU使用率高了,但实际系统TPS并未得到提升。
在10gr2版本后RAC中的LMS进程成为唯一一个使用RT模式的Oracle进程,我们可以通过查询参数_high_priority_processes了解相关信息:
SQL> col name format a40 SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE 2 FROM SYS.x$ksppi x, SYS.x$ksppcv y 3 WHERE x.inst_id = USERENV ('Instance') 4 AND y.inst_id = USERENV ('Instance') 5 AND x.indx = y.indx 6 AND x.ksppinm LIKE '%priority%'; NAME VALUE ---------------------------------------- ---------- _high_priority_processes LMS* _os_sched_high_priority 1
_high_priority_processes通过进程功能名进行匹配,下面我们将提高LGWR及PMON进程的优先级:
SQL> alter system set "_high_priority_processes"='LMS*|LGWR|PMON' scope=spfile; System altered. SQL> startup force; ORACLE instance started. Total System Global Area 281018368 bytes Fixed Size 2083336 bytes Variable Size 150996472 bytes Database Buffers 121634816 bytes Redo Buffers 6303744 bytes Database mounted. Database opened. SQL> !ps -efc|grep ora_|grep -v grep oracle 31441 1 RR 41 22:50 ? 00:00:00 ora_pmon_PROD oracle 31445 1 TS 23 22:50 ? 00:00:00 ora_psp0_PROD oracle 31447 1 TS 23 22:50 ? 00:00:00 ora_mman_PROD oracle 31449 1 TS 23 22:50 ? 00:00:00 ora_dbw0_PROD oracle 31451 1 RR 41 22:50 ? 00:00:00 ora_lgwr_PROD oracle 31455 1 TS 23 22:50 ? 00:00:00 ora_ckpt_PROD oracle 31457 1 TS 23 22:50 ? 00:00:00 ora_smon_PROD oracle 31459 1 TS 22 22:50 ? 00:00:00 ora_reco_PROD oracle 31461 1 TS 23 22:50 ? 00:00:01 ora_cjq0_PROD oracle 31463 1 TS 23 22:50 ? 00:00:01 ora_mmon_PROD oracle 31465 1 TS 24 22:50 ? 00:00:00 ora_mmnl_PROD oracle 31471 1 TS 24 22:50 ? 00:00:00 ora_p000_PROD oracle 31473 1 TS 24 22:50 ? 00:00:00 ora_p001_PROD oracle 31475 1 TS 24 22:50 ? 00:00:00 ora_arc0_PROD oracle 31477 1 TS 22 22:50 ? 00:00:00 ora_arc1_PROD oracle 31481 1 TS 23 22:50 ? 00:00:00 ora_qmnc_PROD oracle 31488 1 TS 23 22:50 ? 00:00:00 ora_q000_PROD oracle 31490 1 TS 23 22:50 ? 00:00:00 ora_q001_PROD oracle 31500 1 TS 23 22:50 ? 00:00:00 ora_j000_PROD
好了lgwr和pmon进程也进入实时模式了,同时priority值上升到了41.
注意:
Oracle默认仅允许LMS进程(11g中多了VKTM进程)使用RT模式是有它的原因的,所以如果不是Oracle support 推荐,您没有任何修改隐式参数的理由。
其次根据Oracle文档[ID 602419.1]的描述,oradism文件(该文件位于$ORACLE_HOME/bin目录下)不正确的权限将导致RT模式无法被正确使用,该文件默认属于root用户并具有s权限。如下测试:
[oracle@rh1 bin]$ ls -la oradism -r-sr-s--- 1 root oinstall 14931 Mar 11 2008 oradism [oracle@rh1 bin]$ su - root Password: [root@rh1 ~]# chown oracle:oinstall /s01/oracle/product/10.2.0/db_1/bin/oradism [root@rh1 ~]# exit logout [oracle@rh1 bin]$ ls -la oradism -r-xr-x--- 1 oracle oinstall 14931 Mar 11 2008 oradism [oracle@rh1 bin]$ sqlplus / as sysdba SQL*Plus: Release 10.2.0.4.0 - Production on Sun Mar 28 23:07:03 2010 Copyright (c) 1982, 2007, Oracle. All Rights Reserved. Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> SQL> shutdown immediate; Database closed. Database dismounted. ORACLE instance shut down. SQL> startup; ORACLE instance started. Total System Global Area 281018368 bytes Fixed Size 2083336 bytes Variable Size 150996472 bytes Database Buffers 121634816 bytes Redo Buffers 6303744 bytes Database mounted. Database opened. SQL> col name format a35; SQL> col value format a10; SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE 2 FROM SYS.x$ksppi x, SYS.x$ksppcv y 3 WHERE x.inst_id = USERENV ('Instance') 4 AND y.inst_id = USERENV ('Instance') 5 AND x.indx = y.indx 6 AND x.ksppinm LIKE '%priority%'; NAME VALUE ----------------------------------- ---------- _high_priority_processes LMS*|LGWR|PMON _os_sched_high_priority 1 SQL> !ps -efc|grep ora_|grep -v grep oracle 31994 1 TS 23 23:07 ? 00:00:00 ora_pmon_PROD oracle 31998 1 TS 23 23:07 ? 00:00:00 ora_psp0_PROD oracle 32000 1 TS 23 23:07 ? 00:00:00 ora_mman_PROD oracle 32002 1 TS 23 23:07 ? 00:00:00 ora_dbw0_PROD oracle 32004 1 TS 24 23:07 ? 00:00:00 ora_lgwr_PROD oracle 32008 1 TS 22 23:07 ? 00:00:00 ora_ckpt_PROD oracle 32010 1 TS 23 23:07 ? 00:00:00 ora_smon_PROD oracle 32012 1 TS 22 23:07 ? 00:00:00 ora_reco_PROD oracle 32014 1 TS 23 23:07 ? 00:00:01 ora_cjq0_PROD oracle 32016 1 TS 23 23:07 ? 00:00:01 ora_mmon_PROD oracle 32018 1 TS 24 23:07 ? 00:00:00 ora_mmnl_PROD oracle 32026 1 TS 24 23:07 ? 00:00:00 ora_arc0_PROD oracle 32028 1 TS 23 23:07 ? 00:00:00 ora_arc1_PROD oracle 32032 1 TS 23 23:07 ? 00:00:00 ora_qmnc_PROD oracle 32045 1 TS 23 23:07 ? 00:00:00 ora_q000_PROD oracle 32065 1 TS 23 23:08 ? 00:00:00 ora_q001_PROD oracle 32072 1 TS 23 23:08 ? 00:00:00 ora_j000_PROD
that’s great, 显然oradism不仅为Oracle实例提供了内存资源控制功能,还包括了进程优先级分配的权限。
我们应当再次声明hidden parameter不应“滥用”于production environment.
今天没有外出(似乎人不到现场就特别容易出问题),早上10点左右接到电话被告知crm11实例上出现了7445错误,准备用web vpn拨上去查看一下,赫然发觉windows 7 不支持这种vpn(准确说ie8和firefox都不支持);无奈无奈只好用拨号。
发现alert log中出现大量 7445错误记录:
Fri Mar 26 09:24:53 2010 Errors in file /oravl01/oracle/admin/CRMDB1/udump/crmdb11_ora_6754320.trc: ORA-07445: exception encountered: core dump [kghalp+0500] [SIGSEGV] [Invalid permissions for mapped object] [0x00000003B] [] [] Fri Mar 26 09:24:55 2010 Trace dumping is performing id=[cdmp_20100326092455] Fri Mar 26 09:31:16 2010 Errors in file /oravl01/oracle/admin/CRMDB1/udump/crmdb11_ora_2994552.trc: ORA-07445: exception encountered: core dump [kghalp+0500] [SIGSEGV] [Invalid permissions for mapped object] [0x00000003B] [] []
看到kghalp函数第一印象 ,是Oracle中堆管理使用的函数;
让我们猜猜字面意思? k -> kernel g -> generic h-> heap a-> allocation p-> point
再让我们来看一下当时的call stack:
Exception signal: 11 (SIGSEGV), code: 51 (Invalid permissions for mapped object), addr: 0x3b, PC: [0x1000973e0, kghalp+0500] Registers: iar: 00000001000973e0, msr: a00000000000d0b2 lr: 00000001013a6df8, cr: 0000000022292484 r00: 0000000000000010, r01: 0ffffffffffcb160, r02: 000000011022a9c0, r03: 0000000000000002, r04: 0000000000000000, r05: 0000000000000100, r06: 0000000000000001, r07: 0000000000000000, r08: 0000000000000000, r09: 0000000000000000, r10: 00000000101b60d8, r11: 0000000000000004, r12: 0000000024592484, r13: 000000011026bfe0, r14: 0000000000000000, r15: 0000000000009000, r16: 0000000110195b2c, r17: 0000000000000000, r18: 0000000000000001, r19: 0000000000000000, r20: 0000000000001000, r21: 0000000000000000, r22: 0000000000000100, r23: 0000000000000001, r24: 0000000000000000, r25: 0000000000000000, r26: 0000000000000001, r27: 0000000104c7fd44, r28: 0000000000000000, r29: 0000000000000100, r30: 0000000000000000, r31: 0000000110195a58, *** 2010-03-26 09:57:28.679 ksedmp: internal or fatal error ORA-07445: exception encountered: core dump [kghalp+0500] [SIGSEGV] [Invalid permissions for mapped object] [0x00000003B] [] [] Current SQL statement for this session: INSERT INTO AUDIT_DDL_LOG (DDL_TIME, SESSION_ID, OS_USER, IP_ADDRESS, TERMINAL, HOST, USER_NAME, DDL_TYPE, OBJECT_TYPE, OWNER, OBJECT_NAME, SQL_TEXT) VALUES (SYSDATE, SYS_CONTEXT('USERENV','SESSIONID'), SYS_CONTEXT('USERENV','OS_USER'), SYS_CONTEXT('USERENV','IP_ADDRESS'), SYS_CONTEXT('USERENV','TERMINAL'), SYS_CONTEXT('USERENV','HOST'), ORA_LOGIN_USER, ORA_SYSEVENT, ORA_DICT_OBJ_TYPE, ORA_DICT_OBJ_OWNER, ORA_DICT_OBJ_NAME, :B1 ) ----- PL/SQL Call Stack ----- object line object handle number name 70000043da500d0 10 anonymous block ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedst+001c bl ksedst1 000000000 ? 104A54EED ? ksedmp+0290 bl ksedst 104A54870 ? ssexhd+03e0 bl ksedmp 300001D15 ? 000044C0 ? 00000000 parchk+01f4 bl kghalp 000000000 ? 2842288200000001 ? 000000000 ? 000000000 ? 000001040 ? 110195B2C ? ptmak+0168 bl parchk FFFFFFFFFFCB560 ? FFFFFFFFFFCB430 ? FFFFFFFFFFCB430 ? pdybF00_Init+0244 bl ptmak 10008049C ? 000000000 ? FFFFFFFFFFCB4F0 ? 07FFFFFFF ? pdy1F79_Init+00c8 bl pdybF00_Init 110BEB1D0 ? pdy1F01_Driver+0048 bl pdy1F79_Init FFFFFFFFFFCBC40 ? pdli_new_cog+00f0 bl pdy1F01_Driver FFFFFFFFFFCBCE0 ? 000000000 ? pdlifu+0264 bl pdli_new_cog 1013885F4 ? FFFFFFFFFFCCB00 ? 7000004383E7680 ? phpcog+0010 bl pdlifu FFFFFFFFFFCD958 ? 7000004383E7680 ? 104C95048 ? phpcmp+0f80 bl phpcog FFFFFFFFFFCC4F0 ? 000000000 ? pcicms2+02d4 bl phpcmp FFFFFFFFFFCD958 ?
发生错误的最上层 kghalp 函数由 parchk 调用, 这似乎是一个package check函数(猜测,呵呵). 我们来整理一下思路, parchk 函数调用了 kghalp函数以帮其分配内存,但却得到了一个非法的低地址[[0x00000003B],正常情况下正文段使用的空间; 这看起来显然是一个bug。
让我们来查查support.oracle.com , 键入7445 kghalp 和sigsegv 关键字 (很多时候不需要使用ora 600/7445 lookup tools).
bug 8244533 赫然显目:
Bug 8244533: ORA-07445 [KGHALP] ERRORS COMPILING PACKAGE WITH DEBUG STACK TRACE: ------------ ksedst <- ksedmp <- ssexhd <- 000044BC <- parchk <- ptmak <- pdybF00_Init <- pdy1F79_Init <- pdy1F01_Driver <- pdli_new_cog <- pdlifu <- phpcog <- phpcmp <- pcicms2 <- pcicms <- kkxcms <- kkxswcm <- kkxmpbms <- kkxmesu <- xtypls <- qctopls <- qctcopn <- qctcopn Exception signal: 11 (SIGSEGV), code: 51 (Invalid permissions for mapped object), addr: 0x3b, PC: [0x1000973e0, kghalp+0500] Registers: iar: 00000001000973e0, msr: a00000000000d0b2 lr: 000000010139ffb8, cr: 00000000222a2484 r00: 0000000000000010, r01: 0ffffffffffe2980, r02: 00000001101e5ab8, r03: 0000000000000002, r04: 0000000000000000, r05: 0000000000000100, r06: 0000000000000001, r07: 0000000000000000, r08: 0000000000000000, r09: 0000000000000000, r10: 0000000010171200, r11: 0000000000000004, r12: 00000000245a2484, r13: 000000011021fbc0, r14: 0000000000000000, r15: 0000000000009000, r16: 0000000110150c54, r17: 0000000000000000, r18: 0000000000000001, r19: 0000000000000000, r20: 0000000000001000, r21: 0000000000000000, r22: 0000000000000100, r23: 0000000000000001, r24: 0000000000000000, r25: 0000000000000000, r26: 0000000000000001, r27: 0000000104c5983c, r28: 0000000000000000, r29: 0000000000000100, r30: 0000000000000000, r31: 0000000110150b80, *** 16:37:14.603 ksedmp: internal or fatal error ORA-7445: exception encountered: core dump [kghalp+0500] [SIGSEGV] [Invalid permissions for mapped object] [0x00000003B] [] [] Current SQL statement for this session: select dummy from dual where ora_dict_obj_type = 'TABLE' ----- Call Stack Trace -----ptmak pdybF00_Init pdy1F79_Init pdy1F01_Driver pdli_new_cog pdlifuphpcog phpcmp pcicms2 pcicms kkxcms kkxswcm kkxmpbms kkxmesu xtyplsTo Filer.Based on this call stack this would appear a likely match forbug 6951953 Abstract: ORA-7445 [PTMAK] IMPORTING PACKAGE COMPILED DEBUG.This bug is fixed on 10.2.0.5 and there is a 10.2.0.4 patch available for IBM AIX Based Systems (64-bit).It maybe worth while to have the customer apply the patch to seeif it resolves the issue.Also the uploaded files included test.sql is this a reproducable testcase?
这个bug 似乎仅在 IBM AIX on POWER Systems (64-bit) 发生,当以DEBUG 模式编译包时有一定几率出现。
好了,既然已经了解了可能发生的诱因,我们可以进一步分析了,接下来看看 errorstack trace信息中 的SO 记录。
SO: 70000043d217668, type: 53, owner: 70000048cee2238, flag: INIT/-/-/0x00 LIBRARY OBJECT LOCK: lock=70000043d217668 handle=700000446261588 mode=N call pin=0 session pin=0 hpc=0000 hlc=0000 htl=70000043d2176e8[70000042b52b368,70000042bb9a808] htb=70000044929b460 ssga=70000044929ad68 user=70000048cee2238 session=70000048eb33010 count=1 flags=[0000] savepoint=0x4bac1488 LIBRARY OBJECT HANDLE: handle=700000446261588 mtx=7000004462616b8(1) cdp=1 name=ALTER TRIGGER "SHUCRM3O"."TRI_PRODUCT_INSTANCE_RELATED" COMPILE DEBUG REUSE SETTINGS hash=164e6a8942406cee159f8943a1a3c85e timestamp=03-26-2010 09:52:12 namespace=CRSR flags=RON/KGHP/TIM/PN0/SML/KST/DBN/MTX/[120100d0] kkkk-dddd-llll=0000-0001-0001 lock=N pin=0 latch#=16 hpc=0002 hlc=0002 lwt=700000446261630[700000446261630,700000446261630] ltm=700000446261640[700000446261640,700000446261640] pwt=7000004462615f8[7000004462615f8,7000004462615f8] ptm=700000446261608[700000446261608,700000446261608] ref=700000446261660[700000446261660,700000446261660] lnd=700000446261678[700000446261678,700000446261678] LIBRARY OBJECT: object=70000045adbc1e8 type=CRSR flags=EXS[0001] pflags=[0000] status=VALD load=0 CHILDREN: size=16 child# table reference handle 5 70000041776f5c0 70000045ae44720 70000042bfa3a20 DATA BLOCKS: data# heap pointer status pins change whr 0 70000043d9fed20 70000045adbc300 I/P/A/-/- 0 NONE 00
的确有以debug 模式编译对象的语句,不过对象不是包而是trigger ; 看起来只要是可以以debug 模式compile 的对象都有可能引发该问题。
好了,问题到这里已经比较明确了: 应用端以DEBUG模式重新编译包引发了 Oracle bug 8244533,从而导致了对应服务进程的崩溃;总算是虚惊一场,之后通过trace内的machine和user信息找到了实施变更的应用方人员并教育之。
在版本10.2.0.4未打上相关one-off补丁的情况下,分别对ASSM和MSSM管理模式表空间进行索引分裂测试,经过测试的结论如下:
l 在10gr2版本中MSSM方式是不能避免索引分裂引起交易超时问题;
l 10.2.0.4上的one-off补丁因为目前仅存在Linux版本,可以考虑声请补丁后具体测试(因目前没有补丁所以处于未知状态)。
l 合并索引是目前最具可行性的解决方案(alter index coalesce)。
l 最新的11gr2中经测试仍存在该问题。
具体测试过程如下:
1. 自动段管理模式下的索引块分裂
SQL> drop tablespace idx1 including contents and datafiles;
Tablespace dropped. SQL> create tablespace idx1 datafile ‘?/dbs/idx1.dbf’ size 500M 2 segment space management AUTO 3 extent management local uniform size 10M; —创建自动段管理的表空间 Tablespace created. SQL> create table idx1(a number) tablespace idx1; Table created. create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0; Index created. — 创建实验对象表及索引
SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000; — 插入25万条记录 250000 rows created. SQL> commit; Commit complete. SQL>create table idx2 tablespace idx1 as select * from idx1 where 1=2; Table created. insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) —取出后端部分记录,即每250条取一条 ) where mod(rn, 250) = 0 ) / 933 rows created. SQL> commit; Commit complete. SQL> analyze index idx1_idx validate structure; —分析原索引 select blocks,lf_blks,del_lf_rows from index_stats; Index analyzed. SQL> BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 1280 499 0 — 未删除情况下499个叶块
SQL> delete from idx1 where a between 10127 and 243625; — 大量删除 commit; 233499 rows deleted. SQL> SQL> Commit complete. SQL> analyze index idx1_idx validate structure; select blocks,lf_blks,del_lf_rows from index_stats; Index analyzed. SQL> BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 1280 499 233499 — 删除后叶块数量不变
SQL> insert into idx1 select * from idx2; — 令那些empty块 不再empty,但每个块中只有一到二条记录,空闲率仍为75-100% commit; 933 rows created. Commit complete. SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126; — 造成leaf块分裂前提
SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat); VALUE NAME ———- —————————————————————- 997 leaf node splits 997 leaf node 90-10 splits 0 branch node splits 0 queue splits —找出当前会话目前的叶块分裂次数
SQL>insert into idx1 values (251000); — 此处确实叶块分裂 1 row created. SQL> commit; Commit complete. SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat); VALUE NAME ———- —————————————————————- 998 leaf node splits 998 leaf node 90-10 splits 0 branch node splits 0 queue splits — 可以看到对比之前的查询多了一个叶块分裂 SQL> set linesize 200 pagesize 1500; SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql 2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’; EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED ———- ———– ———- ———- ———— ————– SQL_TEXT ——————————————————————————————————————————————————————————————————– 1 1603 0 271601 271601 933 insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 ) 1 156 0 82803 82803 126 insert into idx1 select 250000+rownum from all_objects where rownum <= 126 1 177 0 3728 3728 1 insert into idx1 values (251000) — 读了那些实际不空的块,较多buffer_get 1 1409 0 40293 40293 933 insert into idx1 select * from idx2 1 240842 0 3478341 3478341 250000 SQL> insert into idx1 values (251001); — 不分裂的插入 1 row created. SQL> commit; Commit complete. SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql 2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’; EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED ———- ———– ———- ———- ———— ————– SQL_TEXT ——————————————————————————————————————————————————————————————————– 1 1603 0 271601 271601 933 insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 ) 1 156 0 82803 82803 126 insert into idx1 select 250000+rownum from all_objects where rownum <= 126 1 9 0 1640 1640 1 insert into idx1 values (251001) —不分裂的插入,少量buffer_gets 1 177 0 3728 3728 1 insert into idx1 values (251000) 1 1409 0 40293 40293 933 insert into idx1 select * from idx2 1 240842 0 3478341 3478341 250000 insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000 |
如演示1所示,在自动段管理模式下大量删除后插入造成许多块为75%-100%空闲率且不完全为空,此后叶块分裂时将引起插入操作的相关前台进程扫描大量“空块“,若这些块不在内存中(引发物理读)且可能需要延迟块清除等原因时,减缓了该扫描操作的速度,造成叶块分裂缓慢,最终导致了其他insert操作被split操作所阻塞,出现enq:tx index contention等待事件。
2. 手动段管理模式下的索引块分裂
SQL> drop tablespace idx1 including contents and datafiles;
Tablespace dropped. SQL> create tablespace idx1 datafile ‘?/dbs/idx1.dbf’ size 500M 2 segment space management MANUAL — MSSM的情况 3 extent management local uniform size 10M; Tablespace created. SQL> create table idx1(a number) tablespace idx1; create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0; Table created. SQL> SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250 Index created. SQL> SQL> 000; commit; create table idx2 tablespace idx1 as select * from idx1 where 1=2; insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 ) / commit; 250000 rows created. SQL> SQL> Commit complete. SQL> SQL> Table created. SQL> SQL> 2 3 4 5 6 7 8 9 933 rows created. SQL> SQL> Commit complete. SQL> analyze index idx1_idx validate structure; select blocks,lf_blks,del_lf_rows from index_stats; Index analyzed. SQL> BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 1280 499 0 SQL> delete from idx1 where a between 10127 and 243625; 233499 rows deleted. SQL> commit; Commit complete. SQL> insert into idx1 select * from idx2; commit; 933 rows created. SQL> SQL> Commit complete. SQL> SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126; commit; 126 rows created. SQL> SQL> Commit complete. SQL> SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat); VALUE NAME ———- —————————————————————- 1496 leaf node splits 1496 leaf node 90-10 splits 0 branch node splits 0 queue splits SQL> insert into idx1 values (251000); — 确实分裂 1 row created. SQL> commit; Commit complete. SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat); VALUE NAME ———- —————————————————————- 1497 leaf node splits 1497 leaf node 90-10 splits 0 branch node splits 0 queue splits — 以上与ASSM时完全一致 SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql 2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’; EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED ———- ———– ———- ———- ———— ————– SQL_TEXT ——————————————————————————————————————————————————————————————————– 1 1553 0 283301 283301 933 insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 ) 1 153 0 78465 78465 126 insert into idx1 select 250000+rownum from all_objects where rownum <= 126 1 963 0 10422 10422 1 — 比ASSM模式下更大量的“空块”读 insert into idx1 values (251000) 1 984 0 35615 35615 933 insert into idx1 select * from idx2 1 238579 0 3468326 3469984 250000 insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000 SQL> insert into idx1 values (251001); 1 row created. SQL> commit; Commit complete. SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql 2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’; EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED ———- ———– ———- ———- ———— ————– SQL_TEXT ——————————————————————————————————————————————————————————————————– 1 1553 0 283301 283301 933 insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 ) 1 153 0 78465 78465 126 insert into idx1 select 250000+rownum from all_objects where rownum <= 126 1 7 0 1476 1476 1 insert into idx1 values (251001) —不分裂的情况与ASSM时一致 1 963 0 10422 10422 1 insert into idx1 values (251000) 1 984 0 35615 35615 933 insert into idx1 select * from idx2 1 238579 0 3468326 3469984 250000 insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000 6 rows selected. |
如演示2所示,MSSM情况下叶块分裂读取了比ASSM模式下更多的“空块“;MSSM并不能解决大量删除后叶块分裂需要扫描大量非空块的问题,实际上可能更糟糕。从理论上讲MSSM的freelist只能指出那些未达到pctfree和曾经到达pctfree后来删除记录后使用空间下降到pctused的块(doc:A free list is a list of free data blocks that usually includes blocks existing in a number of different extents within the segment. Free lists are composed of blocks in which free space has not yet reached PCTFREE or used space has shrunk below PCTUSED.),换而言之MSSM模式下”空块“会更多。
3. 自动段管理模式下coalesce后的索引块分裂
SQL> drop tablespace idx1 including contents and datafiles;
Tablespace dropped. SQL> create tablespace idx1 datafile ‘?/dbs/idx1.dbf’ size 500M 2 segment space management AUTO — ASSM 下 coalesce情况 3 extent management local uniform size 10M; Tablespace created. SQL> create table idx1(a number) tablespace idx1; create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0; Table created. SQL> SQL> Index created. SQL> SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000; commit; create table idx2 tablespace idx1 as select * from idx1 where 1=2; insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 ) / commit; 250000 rows created. SQL> SQL> Commit complete. SQL> SQL> Table created. SQL> SQL> 2 3 4 5 6 7 8 9 933 rows created. SQL> SQL> Commit complete. SQL> SQL> SQL> SQL> SQL> analyze index idx1_idx validate structure; select blocks,lf_blks,del_lf_rows from index_stats; Index analyzed. SQL> BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 1280 499 0 SQL> delete from idx1 where a between 10127 and 243625; commit; 233499 rows deleted. SQL> SQL> Commit complete. SQL> alter index idx1_idx coalesce; Index altered. SQL> analyze index idx1_idx validate structure; select blocks,lf_blks,del_lf_rows from index_stats; Index analyzed.
SQL>
BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 1280 33 0 — coalesc后 lf块合并了 SQL> insert into idx1 select * from idx2; 933 rows created. SQL> SQL> commit; Commit complete. SQL> SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126; commit; 126 rows created. SQL> SQL> Commit complete. SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat); VALUE NAME ———- —————————————————————- 1999 leaf node splits 1995 leaf node 90-10 splits 0 branch node splits 0 queue splits SQL> insert into idx1 values (251000); — 确实分裂 1 row created. SQL> commit; Commit complete. SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat); VALUE NAME ———- —————————————————————- 2000 leaf node splits 1996 leaf node 90-10 splits 0 branch node splits 0 queue splits SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql 2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’; EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED ———- ———– ———- ———- ———— ————– SQL_TEXT ——————————————————————————————————————————————————————————————————– 1 1603 0 268924 268924 933 insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 ) 1 156 0 78349 78349 126 insert into idx1 select 250000+rownum from all_objects where rownum <= 126 1 23 0 2218 2218 1 —少量buffer gets insert into idx1 values (251000) 1 191 0 15596 15596 933 insert into idx1 select * from idx2 1 240852 0 3206130 3206130 250000 insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000 SQL> insert into idx1 values (251001); 1 row created. SQL> commit; Commit complete. SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql 2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’; EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED ———- ———– ———- ———- ———— ————– SQL_TEXT ——————————————————————————————————————————————————————————————————– 1 1603 0 268924 268924 933 insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 ) 1 156 0 78349 78349 126 insert into idx1 select 250000+rownum from all_objects where rownum <= 126 1 9 0 1574 1574 1 insert into idx1 values (251001) 1 23 0 2218 2218 1 insert into idx1 values (251000) 1 191 0 15596 15596 933 insert into idx1 select * from idx2 1 240852 0 3206130 3206130 250000 insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000 6 rows selected. |
如演示三所示在删除后进行coalesce操作,合并操作将大量空块分离出了索引结构(move empty out of index structure),之后的叶块分裂仅读取了少量必要的块。
4. 手动段管理模式下coalesce后的索引块分裂
SQL> drop tablespace idx1 including contents and datafiles;
Tablespace dropped. SQL> create tablespace idx1 datafile ‘?/dbs/idx1.dbf’ size 500M 2 segment space management MANUAL — mssm情况下 coalesce 3 extent management local uniform size 10M; Tablespace created. SQL> create table idx1(a number) tablespace idx1; create index idx1_idx on idx1 (a) tablespace idx1 pctfree 0; Table created. SQL> SQL> insert into idx1 select rownum from all_objects, all_objects where rownum <= 250 Index created. SQL> SQL> 000; commit; create table idx2 tablespace idx1 as select * from idx1 where 1=2; insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 ) / commit; 250000 rows created. SQL> SQL> Commit complete. SQL> SQL> Table created. SQL> SQL> 2 3 4 5 6 7 8 9 933 rows created. SQL> SQL> Commit complete. SQL> SQL> SQL> SQL> SQL> analyze index idx1_idx validate structure; select blocks,lf_blks,del_lf_rows from index_stats; Index analyzed. SQL> BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 1280 499 0 SQL> delete from idx1 where a between 10127 and 243625; commit; 233499 rows deleted. SQL> SQL> Commit complete. SQL> analyze index idx1_idx validate structure; select blocks,lf_blks,del_lf_rows from index_stats; Index analyzed. SQL> BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 1280 499 233499 SQL> alter index idx1_idx coalesce; Index altered. SQL> analyze index idx1_idx validate structure; select blocks,lf_blks,del_lf_rows from index_stats; Index analyzed. SQL> BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 1280 33 0 SQL> insert into idx1 select * from idx2; 933 rows created. SQL> SQL> commit; Commit complete. SQL> SQL> insert into idx1 select 250000+rownum from all_objects where rownum <= 126; commit; 126 rows created. SQL> SQL> Commit complete. SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat); VALUE NAME ———- —————————————————————- 2502 leaf node splits 2494 leaf node 90-10 splits 0 branch node splits 0 queue splits SQL> insert into idx1 values (251000); — 确实分裂 1 row created. SQL> commit; Commit complete. SQL> select ss.value,sy.name from v$sesstat ss ,v$sysstat sy where ss.statistic#=sy.statistic# and name like ‘%split%’ and sid=(select distinct sid from v$mystat); VALUE NAME ———- —————————————————————- 2503 leaf node splits 2495 leaf node 90-10 splits 0 branch node splits 0 queue splits SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql 2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’; EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED ———- ———– ———- ———- ———— ————– SQL_TEXT ——————————————————————————————————————————————————————————————————– 1 1553 0 281059 281059 933 insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 ) 1 153 0 77817 77817 126 insert into idx1 select 250000+rownum from all_objects where rownum <= 126 1 19 0 2010 2010 1 — 少量buffer get insert into idx1 values (251000) 1 126 0 15364 15364 933 insert into idx1 select * from idx2 1 238644 0 3229737 3230569 250000 insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000 SQL> insert into idx1 values (251001); 1 row created. SQL> commit; Commit complete. SQL> select executions, buffer_gets, disk_reads, cpu_time, elapsed_time, rows_processed, sql_text from v$sql 2 where sql_text like ‘%insert%idx1%’ and sql_text not like ‘%v$sql%’; EXECUTIONS BUFFER_GETS DISK_READS CPU_TIME ELAPSED_TIME ROWS_PROCESSED ———- ———– ———- ———- ———— ————– SQL_TEXT ——————————————————————————————————————————————————————————————————– 1 1553 0 281059 281059 933 insert into idx2 select * from idx1 where rowid in (select rid from (select rid, rownum rn from (select rowid rid from idx1 where a between 10127 and 243625 order by a) ) where mod(rn, 250) = 0 ) 1 153 0 77817 77817 126 insert into idx1 select 250000+rownum from all_objects where rownum <= 126 1 7 0 1460 1460 1 insert into idx1 values (251001) 1 19 0 2010 2010 1 insert into idx1 values (251000) 1 126 0 15364 15364 933 insert into idx1 select * from idx2 1 238644 0 3229737 3230569 250000 insert into idx1 select rownum from all_objects, all_objects where rownum <= 250000 6 rows selected. |
如演示4所示,MSSM模式下合并操作与ASSM情况下大致一样,合并操作可以有效解决该问题。
5. Coalesce合并操作的锁影响
SQL> create table coal (t1 int);
Table created. SQL> create index pk_t1 on coal(t1); Index created. SQL> begin 2 for i in 1..3000 loop 3 insert into coal values(i); 4 commit; 5 end loop; 6 end; 7 / PL/SQL procedure successfully completed. SQL> delete coal where t1>500; 2500 rows deleted. SQL> commit; Commit complete. SQL> analyze index pk_t1 validate structure; Index analyzed. — 注意analyze validate操作会block一切dml操作 SQL> select blocks,lf_blks,del_lf_rows from index_stats; BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 8 6 2500 — 删除后的状态 此时另开一个会话,开始dml操作: SQL> update coal set t1=t1+1; 500 rows updated. — 回到原会话 SQL> alter index pk_T1 coalesce; — coalesce 未被阻塞 Index altered. — 在另一个会话中commit,以便执行validate structure
SQL> analyze index pk_t1 validate structure;
Index analyzed.
SQL> select blocks,lf_blks,del_lf_rows from index_stats;
BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 8 3 500 — 显然coalesce的操作没有涉及有dml操作的块 在没有dml操作的情况下: SQL> truncate table coal; Table truncated. SQL> begin 2 for i in 1..3000 loop 3 insert into coal values(i); 4 commit; 5 end loop; 6 end; 7 / PL/SQL procedure successfully completed. SQL> analyze index pk_t1 validate structure; Index analyzed. SQL> select blocks,lf_blks,del_lf_rows from index_stats; BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 8 6 0 SQL> delete coal where t1>500; 2500 rows deleted. SQL> commit; Commit complete. SQL> analyze index pk_t1 validate structure; Index analyzed. SQL> select blocks,lf_blks,del_lf_rows from index_stats; BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 8 6 2500 SQL> alter index pk_t1 coalesce; Index altered. SQL> analyze index pk_t1 validate structure; Index analyzed. SQL> select blocks,lf_blks,del_lf_rows from index_stats; BLOCKS LF_BLKS DEL_LF_ROWS ———- ———- ———– 8 1 0 —没有dml时,coalesce 操作涉及了所有块 |
如演示5所示coalesce会避开dml操作涉及的块,但在coalesec的短暂间歇出现在索引上有事务的块不会太多。且coalesce操作不会降低索引高度。
附件是关于rebuild及coalesce索引操作的详细描述:
6. Coalesce操作总结
优点:
l 是一种快速的操作,对整体性能影响最小(not performance sensitive)。
l 不会锁表,绕过有事务的索引块。
l 可以有效解决现有的问题。
l 不会降低索引高度,引起再次的root split
缺点:
l 需要针对个别对象,定期执行合并操作;无法一劳永逸地全局地解决该问题。
7. Linux 10.2.0.4上相关补丁的技术交流
Metalink bug 8286901 note中叙述了一位用户遇到相同的问题并提交了SR,当时oracle support给出了one-off补丁,但该用户在apply了该补丁后仍未解决问题。
以下为note 原文:
It is similar to bug8286901, but after applied patch8286901, still see enq tx contentiona with high “failed probes on index block reclamation” Issue encountered by customer and Oracle developer (Stefan Pommerenk). He describes is thus: "Space search performed by the index splitter can't find space in neighboring blocks, and then instead of allocating new space, we go and continue to search for space elsewhere, which manifests itself in block reads from disk, block cleanouts, and subsequent blocks written due to aggressive MTTR setting." "To clarify: the cleanouts are not the problem per se. The culprit seems to be that the space search performed by the index splitter can't find space in neighboring blocks, and then instead of allocating new space, we go and continue to search for space elsewhere, which manifests itself in block reads from disk, block cleanouts, and subsequent blocks written due to aggressive MTTR setting. This action has caused other sessions to get blocked on TX enqueue contention, blocked on the splitting session. Advice was to set 10224 trace event for the splitter for a short time only in order to get diagnostics as to why the space search rejected most blocks. > A secondary symptom are the bitmap level 1 block updates, which may or may not be related to the space search; I've not seen them before, maybe because I didn't really pay attention :P , but the symptoms seen in the ASH trace indicate it's the same problem. Someone in space mgmt has to look at it to confirm it is the same problem." |
与该用户进行了mail私下交流,他的回复:
I still have a case open with Oracle. I believe that this is a bug in the Oracle code. The problem is that it has been difficult to create a reproducible test case for Oracle support. My specific issue was basically put on hold pending the results of another customer’s service request that appeared to have had the same issue, (9034788). Unfortunately they couldn’t reproduce the issue in that case either.
I believe that there is a correlation between the enq TX – index contention wait event and a spike in the number of ‘failed probes on index block reclamation. I have specifically asked Oracle to explain why there is a spike in the ‘failed probes on index block reclamation’ during the same time frame as the enq TX index contention wait event, but they have not answered my question. I was hoping that some investigation by Oracle Support into the failed probes metric might get someone on the right track to discovering the bug. That hasn’t happened though. Hi , Thanks for your sharing . The bug (or specific ktsp behave) is fatal in response time sensitive OLTP env. I would like to ask my customer to coalesce those index where massive deleted regularly. Thanks for your help again! Yes, I saw that. I have applied patch 8286901 and set the event for version 10.2.0.4, but the problem still occurs periodically. And as I mentioned before, we see a correlation between enq TX waits and the failed probes on index block reclamation. Which is why I still think that it is a bug. I agree that trying to rebuild or coalesce the indexes are simply attempts to workaround the issue and not solve the root cause. Early on when I started on this issue I did do some index dumps and could clearly see that we had lots of blocks with only 1 or 2 records after our mass delete jobs. I have provided Oracle Support with this information as well as oradump files while the problem is occurring, but they don’t seem to be able to find anything wrong so far. If you are interested in seeing if you are experiencing a high ‘failed probes on index block reclamation’ event run the query below. select SS.snap_id, |
—在最新的11gr2中进行了测试,仍可以重现该问题(如图单条insert引起了6675的buffer_gets,这是在更大量数据的情况下)。 |
我们可以猜测Oracle提供的one-off补丁中可能是为叶块分裂所会扫描的“空块”附加了一个上限,在未达到上限的情况下扫描仍会发生。而在主流的公开的发行版本中Oracle不会引入该补丁的内容。尝试在没有缓存的情况下引起分裂问题,分裂引起了大约4000个块的物理读,但该操作仍在0.12秒(有缓存是0.02秒,如图)内完成了(该测试使用普通ata硬盘,读取速度在100MB/S: Timing buffered disk reads: 306 MB in 3.00 seconds = 101.93 MB/sec);从1月21日的ash视图中可以看到引起split的260会话处于单块读等待(db file sequential read)中,且已等待了43950us约等于44ms;这与良好io的经验值10ms左右有较大出入;我们可以确信io性能问题也是引发此叶块分裂延迟如此显性的一个重要因素。
综上所述,在之前讨论的几个方案中,MSSM方式是不能避免索引分裂引起交易超时问题的;不删除数据的方案在许多对象上不可行;10.2.0.4上的one-off补丁因为目前仅存在Linux版本,可以考虑声请补丁后具体测试(因目前没有补丁所以处于未知状态)。Coalesce合并索引是目前既有的最具可操作性且无副作用的解决方案。
Oracle 10g 中引入了v$osstat 视图方便了dba了解主机负载情况,同时也可以通过oem网页观察到一段时间内主机上负载较高的进程;但如果db未开启oem管理界面,则无法了解过去时段内高负载服务进程的相关信息。以下脚本可以给予一定的帮助。
CREATE TABLE "SYS"."HIGHLOAD_HISTORY" ( "SAMPLE_TIME" DATE, "SPID" NUMBER(10,0), "LOAD" VARCHAR2(7 BYTE), "SID" VARCHAR2(30 BYTE), "USERNAME" VARCHAR2(40 BYTE), "MACHINE" VARCHAR2(64 BYTE), "PROGRAM" VARCHAR2(48 BYTE), "SQL_ID" VARCHAR2(13 BYTE), "SQL_FULLTEXT" CLOB, "INST_ID" NUMBER(2,0), "STATUS" VARCHAR2(8 BYTE) ) --建立记录高负载进程信息的表,内容包括了cpu使用率,及sql(并不十分准确,因为获取spid后需要进行查询)
ps aux|grep $ORACLE_SID|awk '{ if($3>=0.3) print "insert into highload_history select sysdate rec_time,"$2,","$3"%",", ss.sid,ss.username,ss.machine,ss.program,ss.sql_id,(select sql_fulltext from v$sqlarea sq where sq.sql_id=ss.sql_id),(select instance_number from v$instance),ss.status from v$session ss,v$process pr where pr.addr=ss.paddr and pr.spid=",$2";"}' | sqlplus / as sysdba --直接运行即可
set parameter _in_memory_undo = FALSE to disable IMU
Workaround: Disable IMU (set _in_memory_undo=FALSE)
PLEASE NOTE: This bug applies to single instance databases and not RAC as IMU is not enabled in RAC.
注意在RAC系统中IMU是不可用的,所以也就不必要去设置_in_memory_undo=FALSE
The workaround will prevent the problem, but will not fix it.
Note: _in_memory_undo is a dynamic parameter for 10g with values of TRUE or FALSE. It specifies whether there should be in memory undo for transactions. Setting this value to FALSE will disable this feature. This will cause excess redo generation.
_in_memory_undo is applicable when
compatibility >= 10.0
undo_management = AUTO
cluster_database = FALSE
Running IMU transactions may generate out-of-order redo records
Disabling in memory undo (_in_memory_undo=false)
can help to eliminate “In memory undo latch” contention
but there may still be “undo global data” latch contention
as that latch is used regardless of the setting of
_in_memory_undo. The fix for this bug can help reduce
contention on both latches.
The NOLOGGING clause doesn’t prevent redo on all operations, but rather only on a subset. I searched the documentation for examples of this…
http://st-doc.us.oracle.com/11/112/server.112/e16541/parallel007.htm?term=nologging+generate+redo#VLDBG1536
[NO]LOGGING Clause
The [NO]LOGGING clause applies to tables, partitions, tablespaces, and indexes. Virtually no log is generated for certain operations (such as direct-path INSERT) if the NOLOGGING clause is used. The NOLOGGING attribute is not specified at the INSERT statement level but is
instead specified when using the ALTER or CREATE statement for a table, partition, index, or tablespace.
When a table or index has NOLOGGING set, neither parallel nor serial direct-path INSERT operations generate redo logs. Processes running with the NOLOGGING option set run faster because no redo is generated. However, after a NOLOGGING operation against a table,
partition, or index, if a media failure occurs before a backup is performed, then all tables, partitions, and indexes that have been modified might be corrupted.
Direct-path INSERT operations (except for dictionary updates) never generate redo logs if the NOLOGGING clause is used. The NOLOGGING attribute does not affect undo, only redo. To be precise, NOLOGGING allows the direct-path INSERT operation to generate a negligible
amount of redo (range-invalidation redo, as opposed to full image redo).
But I did find an Ask Tom article which is more explicit about what operations generate redo despite the NOLOGGING clause:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:5280714813869
This has a nice table of operations with either No/Archive and No/Logging specified, and you can see redo is generated most of the time.
So to sum up the NOLOGGING clause only works with certain operations and we cannot expect all REDO to be completely halted.
Another item to consider is whether the Indexes on the tables were created with NOLOGGING or not… This is covered in Index generates high redo, although it is in NOLOGGING (Doc ID 1235234.1), so please reveiw that note to see if there are some indexes that can be recreated to reduce redo further.
Question:
在RAC环境中节点之间的OS操作系统时钟一致是clusterware能够稳定运行的重要因素之一,但是如果我们确实有调整OS时间的需求,那么是否真的会影响到RAC的正常运行呢? 具体的影响是如何的呢? 又需要注意哪些方面的因素呢?
Answer:
RAC: Frequently Asked Questions (Doc ID 220970.1)
Does Oracle RAC work with NTP (Network Time Protocol)?
YES! NTP and Oracle RAC are compatible, as a matter of fact, it is recommended to setup NTP in an Oracle RAC cluster, for Oracle 9i Database, Oracle Database 10g, and Oracle Database 11g Release 1.
…
Keep the following points in mind:
# Minor changes in time (in the seconds range) are harmless for Oracle RAC and the Oracle Clusterware. If you intend on making large time changes it is best to shutdown the instances and the entire Oracle Clusterware stack on that node to avoid a false eviction, especially if you are using the Oracle RAC 10g low-brownout patches, which allow really low misscount settings.
# Backup/recovery aspect of large time changes are documented in Note: 77370.1, basically you can’t use RECOVER DATABASE UNTIL TIME to reach the second recovery point, It is possible to overcome with RECOVER DATABASE UNTIL CANCEL or UNTIL CHANGE. If you are doing complete recovery (most of the times) then this is not an issue since the Oracle recovery code uses SCN (System Change Numbers) to advance in the redo/archive logs. The SCN numbers never go back in time (unless a reset-logs operation is performed), there is always an association of an SCN to a human readable timestamp (which may change forward or backwards), hence the issue with recovery until point in time vs. until SCN/Cancel.
# If DBMS_SCHEDULER is in usage it will be affected by time changes, as it’s using actual clock rather than SCN.
# On platforms with OPROCD get fix for <> “OPROCD REBOOTS NODE WHEN TIME IS SET BACK BY XNTPD”
# If NTP is not configured correctly (using -x flag), and diagwait not set to 13 Note: 559365.1 10.2/11.1 RAC systems can be rebooted due to OPROCD, during a leap second event, see Note: 759143.1.
# Daylight saving time adjustments do not affect the system clock, only the displayed time, hence have no impact on the Oracle software.
Apart from these issues, the Oracle RDBMS server is immuned to time changes, i.e. will not affect transaction/read consistency operations.
Also please refer to note:
Dates & Calendars – Frequently Asked Questions (Doc ID 227334.1)
So please perform time changes in small amount using date command only. Doing it precisely will be difficult manually. Therefore using ntpd with -x option could be better solution for this case as well.
In general step in not more than 3 seconds when tuning time backward should be fine.
Can you kindly define and explain of lock_sga parameter?each flatform(HP, IBM, SUN) recommended value of both parameter
and why it is recommended like that value.
LOCK_SGA locks the entire SGA into physical memory. It is usually advisable to lock the SGA into real (physical) memory, especially if the use of virtual memory would include storing some of the SGA using disk space. This parameter is ignored on platforms that do not support it.
Each platform has its own recommendations and support over the parameter value. Hence you would see differences in the recommendations.
+ lock_sga Only locks the entire SGA into physical memory.
+ It can be set to TRUE as long as you want to lock the entire SGA in the physical memeory and your OS supports it. As for as i know it works in all the platforms.
LOCK_SGA locks the entire SGA into physical memory. It is usually advisable to lock the SGA into real (physical) memory, especially if the use of virtual memory would include storing some of the SGA using disk space.
NOTE:: This parameter is ignored on platforms that do not support it.
+ Windows For sure doesn’t support this parameter.
+ AIX Supports it.
+ LOCK_SGA is not supported on Solaris.
+ On Hp-UX if you have mlock (OS privilege) then LOCK_SGA is not required
+ Linux also supports this parameter
You can also review
Note 577898.1 – ORA-27102 Received At Startup When LOCK_SGA Is Set Although Enough Memory Is Available.
lock_sga: Default value: false.
If set to true, the entire SGA will be locked into physical memory (preventing it from being paged out). This may cause problems on systems with insufficient physical memory.
The parameters (and recommendations) are Unix generic.
The lock_sga parameter (which default AND recommended value is false): If you have sufficient memory on the server, you may consider setting it to true. (Forcing the SGA to be locked in physical memory).
On the other hand, if lock_sga is set to true, and you do not have suffficient memory, the SGA will stay in memory, and force other processe, like server processes to swap, and hence cause performance degradation.
For both the hidden and normal configuration parameters, we do not recommend deviations from the default values, without having detailed information on your specific configruation.
If you look at all Oracle products, options, features and parameters, there are billions of combinations. We do not have a complete catalog over each recommended values for each combiantion.
Can you kindly define and explain of this parameter?
_os_sched_high_priority
Regarding _os_sched_high_priority :
Setting LMS priority automatically via _os_sched_high_priority
It has been proven that performance is increased when LMS processes are running in the real time priority class, as opposed to the timesharing class. The parameter _os_sched_high_priority was introduced in Oracle 10g Release 2, and it allows LMS processes to be automatically configured in the real time priority class at instance startup. This feature basically obsoletes the need to manually set the real time priority for the LMS processes using a C program or the renice command.
On Unix platforms, process priority changes are normally executed under the root privilege, and the LMS process priority changes are implemented through an executable � oradism � owned by root with the setuid bit set.
The default value for _os_sched_high_priority in 10.2.0.1 is 1. With _os_sched_high_priority = 1, it means that all LMS processes are set to priority 1 in the real time class. We suggest to always set _os_sched_high_priority=1 in the init.ora, regardless of the default, and only toggle the behavior by restoring the original privileges and permissions on the oradism executable. Other possible values for this parameter are:
� If set to 0, LMS processes are not prioritized at startup, they are kept at the timesharing class as in earlier releases.
� Values higher than 1 give LMS higher priority in the real time class. Those settings are not recommended because they might cause starvation for other Oracle processes, leading to performance degradation.
If LMS processes remain in the timesharing class at instance startup, despite _os_sched_high_priority set to 1, then ownership and/or privileges for oradism may not have been set properly. To correct / verify these settings:
$ ls -l oradism
-rwsr-sr-x 1 root dba 15871 Jun 13 10:32 oradism
# chown root:dba oradism
# chmod 6755 oradism
Note about Linux process priorities:
SCHED_OTHER
static prio 0 : shows dynamic priorities with ‘ps’, ranging from 59-99
SCHED_RR / SCHED_FIFO
static prio 1 -> shows as priority 58 with ‘ps’
static prio 11 -> shows as priority 48 with ‘ps’
static prio 59 -> shows as priority 0 with ‘ps’
static prio 60 -> shows as priority -1 with ‘ps’
static prio 99 -> shows as priority -40 with ‘ps’ (highest RT priority)
On Linux, the LMS process priority can be seen via �ps �efl | grep lms�:
$ ps -efl|grep lms
0 S spommere 2201 1 0 58 0 – 285308 schedu 14:39 ? 00:00:00 ora_lms0_appsu01
0 S spommere 2205 1 0 58 0 – 285308 schedu 14:39 ? 00:00:00 ora_lms1_appsu01
In the above example, the LMS processes are running in the real-time class.
In the example below, the change in process prioritization failed, and the process priority remained at 75, because the permissions on the oradism executable were incorrectly set.
$ ps -efl | grep lms
0 S spommere 2002 1 1 75 0 – 285307 schedu 14:38 ? 00:00:00 ora_lms0_appsu01
0 S spommere 2006 1 1 75 0 – 285308 schedu 14:38 ? 00:00:00 ora_lms1_appsu01
Reference: internal Note 341974.1
Unpublished note 433105.1: ‘LMS Real Time Priority in RAC 10g Release 2 – Things to Consider Before Changing’.
_os_sched_high_priority: Default value is 1.
Setting it to 0 means that all scheduling class manipulation will be disabled.
The parameter is mainly ised in RAC systems, where you may want specific procecces like LMS to run at a higher priority.
Any underscore (hidden) parameters (like _os_sched_high_priority) should not be set unless specificly instructed by Support either via a Service Request or via a MetaLink Note. In all other cases, it’s default value should be used.
(You may see examples of notes by querying in MetaLink with the word _os_sched_high_priority).
Copyright © 2025 · Genesis Framework · WordPress · Log in