A process running on an operating system is scheduled in one of two priority modes: Real Time (RT) mode or Time Sharing (TS) mode.
The vast majority of Oracle processes run in TS mode:
[oracle@rh1 ~]$ ps -efc|grep ora_|grep -v grep
oracle 8510 1 TS 23 Mar27 ? 00:00:02 ora_pmon_PROD
oracle 8512 1 TS 23 Mar27 ? 00:00:00 ora_psp0_PROD
oracle 8514 1 TS 23 Mar27 ? 00:00:00 ora_mman_PROD
oracle 8516 1 TS 23 Mar27 ? 00:00:02 ora_dbw0_PROD
oracle 8518 1 TS 23 Mar27 ? 00:00:04 ora_lgwr_PROD
oracle 8520 1 TS 23 Mar27 ? 00:00:04 ora_ckpt_PROD
oracle 8522 1 TS 23 Mar27 ? 00:00:08 ora_smon_PROD
oracle 8524 1 TS 23 Mar27 ? 00:00:00 ora_reco_PROD
oracle 8526 1 TS 23 Mar27 ? 00:00:34 ora_cjq0_PROD
oracle 8528 1 TS 23 Mar27 ? 00:00:06 ora_mmon_PROD
oracle 8530 1 TS 24 Mar27 ? 00:00:07 ora_mmnl_PROD
oracle 8538 1 TS 23 Mar27 ? 00:00:00 ora_arc0_PROD
oracle 8540 1 TS 23 Mar27 ? 00:00:00 ora_arc1_PROD
oracle 8548 1 TS 23 Mar27 ? 00:00:00 ora_qmnc_PROD
oracle 8555 1 TS 23 Mar27 ? 00:00:00 ora_q000_PROD
oracle 8559 1 TS 23 Mar27 ? 00:00:00 ora_q001_PROD
oracle 30500 1 TS 23 22:10 ? 00:00:00 ora_j000_PROD
As shown above, every process runs in TS mode, with a priority of 23 or 24.
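Reading the class and priority columns by eye gets tedious on a busy host. The following is a minimal sketch that condenses `ps -efc` output, assuming the Linux column order seen in the listing (UID PID PPID CLS PRI STIME TTY TIME CMD). The function names `sched_summary` and `class_counts` are made up for this illustration.

```shell
#!/bin/sh
# Print "process class priority" for every Oracle background process.
# Reads `ps -efc` style lines on stdin; field 4 is CLS, field 5 is PRI,
# and the last field is the process name (ora_<func>_<SID>).
sched_summary() {
    awk '$NF ~ /^ora_/ { printf "%s %s %s\n", $NF, $4, $5 }'
}

# Count Oracle background processes per scheduling class (TS, RR, ...).
class_counts() {
    awk '$NF ~ /^ora_/ { n[$4]++ }
         END { for (c in n) printf "%s %d\n", c, n[c] }' | sort
}

# Demo on three lines copied from the listing above.
printf '%s\n' \
    'oracle 8510 1 TS 23 Mar27 ? 00:00:02 ora_pmon_PROD' \
    'oracle 8518 1 TS 23 Mar27 ? 00:00:04 ora_lgwr_PROD' \
    'oracle 8530 1 TS 24 Mar27 ? 00:00:07 ora_mmnl_PROD' | sched_summary
```

On a live system the same functions could be fed directly, for example `ps -efc | class_counts`.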
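Oracle generally does not recommend RT mode. Individual processes can obtain more CPU this way, but the system bottleneck is often not the CPU, so CPU utilization goes up while the actual TPS of the system does not improve.
Since 10gR2, the LMS process in RAC is the only Oracle process that uses RT mode. We can see the relevant settings by querying the hidden parameter _high_priority_processes: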
SQL> col name format a40
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE
  2  FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3  WHERE x.inst_id = USERENV ('Instance')
  4  AND y.inst_id = USERENV ('Instance')
  5  AND x.indx = y.indx
  6  AND x.ksppinm LIKE '%priority%';

NAME                                     VALUE
---------------------------------------- ----------
_high_priority_processes                 LMS*
_os_sched_high_priority                  1
_high_priority_processes is matched against the functional name of each process. Below we raise the priority of the LGWR and PMON processes:
SQL> alter system set "_high_priority_processes"='LMS*|LGWR|PMON' scope=spfile;

System altered.

SQL> startup force;
ORACLE instance started.

Total System Global Area 281018368 bytes
Fixed Size 2083336 bytes
Variable Size 150996472 bytes
Database Buffers 121634816 bytes
Redo Buffers 6303744 bytes
Database mounted.
Database opened.

SQL> !ps -efc|grep ora_|grep -v grep
oracle 31441 1 RR 41 22:50 ? 00:00:00 ora_pmon_PROD
oracle 31445 1 TS 23 22:50 ? 00:00:00 ora_psp0_PROD
oracle 31447 1 TS 23 22:50 ? 00:00:00 ora_mman_PROD
oracle 31449 1 TS 23 22:50 ? 00:00:00 ora_dbw0_PROD
oracle 31451 1 RR 41 22:50 ? 00:00:00 ora_lgwr_PROD
oracle 31455 1 TS 23 22:50 ? 00:00:00 ora_ckpt_PROD
oracle 31457 1 TS 23 22:50 ? 00:00:00 ora_smon_PROD
oracle 31459 1 TS 22 22:50 ? 00:00:00 ora_reco_PROD
oracle 31461 1 TS 23 22:50 ? 00:00:01 ora_cjq0_PROD
oracle 31463 1 TS 23 22:50 ? 00:00:01 ora_mmon_PROD
oracle 31465 1 TS 24 22:50 ? 00:00:00 ora_mmnl_PROD
oracle 31471 1 TS 24 22:50 ? 00:00:00 ora_p000_PROD
oracle 31473 1 TS 24 22:50 ? 00:00:00 ora_p001_PROD
oracle 31475 1 TS 24 22:50 ? 00:00:00 ora_arc0_PROD
oracle 31477 1 TS 22 22:50 ? 00:00:00 ora_arc1_PROD
oracle 31481 1 TS 23 22:50 ? 00:00:00 ora_qmnc_PROD
oracle 31488 1 TS 23 22:50 ? 00:00:00 ora_q000_PROD
oracle 31490 1 TS 23 22:50 ? 00:00:00 ora_q001_PROD
oracle 31500 1 TS 23 22:50 ? 00:00:00 ora_j000_PROD
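Now the LGWR and PMON processes have entered real-time mode as well (scheduling class RR, the round-robin real-time policy), and their priority value has risen to 41.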
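To reason about which processes a given parameter value would cover, the matching can be approximated in the shell. This is only a sketch: it assumes the value is a `|`-separated list of shell-style glob patterns compared against the upper-cased functional name, which is consistent with the `LMS*|LGWR|PMON` test above but is not confirmed by any Oracle documentation quoted here. The helper names `func_name` and `matches_high_priority` are hypothetical.

```shell
#!/bin/sh
# Derive the functional name from an OS process name:
# ora_lgwr_PROD -> LGWR
func_name() {
    p=${1#ora_}          # drop the "ora_" prefix
    p=${p%%_*}           # drop "_<SID>" and anything after it
    printf '%s\n' "$p" | tr '[:lower:]' '[:upper:]'
}

# Return 0 if NAME ($1) matches any alternative in PATTERN ($2).
# ASSUMPTION: alternatives are separated by "|" and behave like globs.
matches_high_priority() {
    name=$1
    old_ifs=$IFS
    IFS='|'
    set -f               # keep "LMS*" from expanding to file names
    for alt in $2; do
        case $name in
            $alt) IFS=$old_ifs; set +f; return 0 ;;
        esac
    done
    IFS=$old_ifs
    set +f
    return 1
}

# Demo: which of these processes would the new setting cover?
for proc in ora_pmon_PROD ora_dbw0_PROD ora_lgwr_PROD; do
    if matches_high_priority "$(func_name "$proc")" 'LMS*|LGWR|PMON'; then
        echo "$proc: high priority"
    else
        echo "$proc: default priority"
    fi
done
```

Under this assumed semantics, only PMON and LGWR from the demo list match, which agrees with the two RR entries in the ps listing above.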
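Note:
Oracle has its reasons for allowing only the LMS process (plus the VKTM process in 11g) to use RT mode by default, so unless Oracle Support recommends it, you have no reason to modify hidden parameters.
In addition, according to Oracle note [ID 602419.1], incorrect permissions on the oradism file (located in $ORACLE_HOME/bin) prevent RT mode from being used correctly. By default this file is owned by root and has the setuid (s) permission. The following test demonstrates this: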
[oracle@rh1 bin]$ ls -la oradism
-r-sr-s--- 1 root oinstall 14931 Mar 11 2008 oradism
[oracle@rh1 bin]$ su - root
Password:
[root@rh1 ~]# chown oracle:oinstall /s01/oracle/product/10.2.0/db_1/bin/oradism
[root@rh1 ~]# exit
logout
[oracle@rh1 bin]$ ls -la oradism
-r-xr-x--- 1 oracle oinstall 14931 Mar 11 2008 oradism
[oracle@rh1 bin]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Sun Mar 28 23:07:03 2010

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup;
ORACLE instance started.

Total System Global Area 281018368 bytes
Fixed Size 2083336 bytes
Variable Size 150996472 bytes
Database Buffers 121634816 bytes
Redo Buffers 6303744 bytes
Database mounted.
Database opened.
SQL> col name format a35;
SQL> col value format a10;
SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE
  2  FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3  WHERE x.inst_id = USERENV ('Instance')
  4  AND y.inst_id = USERENV ('Instance')
  5  AND x.indx = y.indx
  6  AND x.ksppinm LIKE '%priority%';

NAME                                VALUE
----------------------------------- ----------
_high_priority_processes            LMS*|LGWR|PMON
_os_sched_high_priority             1

SQL> !ps -efc|grep ora_|grep -v grep
oracle 31994 1 TS 23 23:07 ? 00:00:00 ora_pmon_PROD
oracle 31998 1 TS 23 23:07 ? 00:00:00 ora_psp0_PROD
oracle 32000 1 TS 23 23:07 ? 00:00:00 ora_mman_PROD
oracle 32002 1 TS 23 23:07 ? 00:00:00 ora_dbw0_PROD
oracle 32004 1 TS 24 23:07 ? 00:00:00 ora_lgwr_PROD
oracle 32008 1 TS 22 23:07 ? 00:00:00 ora_ckpt_PROD
oracle 32010 1 TS 23 23:07 ? 00:00:00 ora_smon_PROD
oracle 32012 1 TS 22 23:07 ? 00:00:00 ora_reco_PROD
oracle 32014 1 TS 23 23:07 ? 00:00:01 ora_cjq0_PROD
oracle 32016 1 TS 23 23:07 ? 00:00:01 ora_mmon_PROD
oracle 32018 1 TS 24 23:07 ? 00:00:00 ora_mmnl_PROD
oracle 32026 1 TS 24 23:07 ? 00:00:00 ora_arc0_PROD
oracle 32028 1 TS 23 23:07 ? 00:00:00 ora_arc1_PROD
oracle 32032 1 TS 23 23:07 ? 00:00:00 ora_qmnc_PROD
oracle 32045 1 TS 23 23:07 ? 00:00:00 ora_q000_PROD
oracle 32065 1 TS 23 23:08 ? 00:00:00 ora_q001_PROD
oracle 32072 1 TS 23 23:08 ? 00:00:00 ora_j000_PROD
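With oradism owned by oracle and the setuid bit gone, PMON and LGWR fall back to TS mode even though the parameter is still set. Clearly, oradism not only provides memory resource control for the Oracle instance but also holds the privilege needed to assign process priorities.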
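The ownership and setuid condition described above can be checked mechanically. The sketch below inspects a single `ls -l` line and reports whether the file is owned by root with the setuid bit set, which is the default state the text describes. The function name `oradism_ok` is made up for this example, and the check covers only these two properties, not everything note 602419.1 may discuss.

```shell
#!/bin/sh
# Succeed (exit status 0) if the given `ls -l` line shows a file that is
# owned by root and carries the setuid bit ("s" in the owner-execute slot,
# the 4th character of the mode string).
oradism_ok() {
    printf '%s\n' "$1" | awk '
        { ok = (substr($1, 4, 1) == "s" && $3 == "root") }
        END { exit !ok }'
}

# Demo with the two states from the test above.
for line in \
    '-r-sr-s--- 1 root oinstall 14931 Mar 11 2008 oradism' \
    '-r-xr-x--- 1 oracle oinstall 14931 Mar 11 2008 oradism'
do
    if oradism_ok "$line"; then
        echo "usable:   $line"
    else
        echo "suspect:  $line"
    fi
done
```

On a real host the input could come from `ls -l $ORACLE_HOME/bin/oradism`.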
To repeat: hidden parameters should not be abused in a production environment.
Hdr: 7488159 10.2.0.3 RDBMS 10.2.0.3 RAC PRODID-5 PORTID-87
Abstract: LMS PROCESSES DON’T RUN IN REAL TIME PRIORITY
*** 10/16/08 12:33 am ***
TAR:
----
7199119.992
PROBLEM:
--------
LMS processes should be running in RT by default starting with 10.2.
But LMS remains in low priority (time-sharing) despite following settings:
ls -la oradism
-rwsr-sr-x 1 root dba-21451792 Apr 28 03:06 oradism
_high_priority_processes LMS*
_os_sched_high_priority 1
[GCO270]:oraenv01:/u002/oracle/env01db/10.2.0/bin>ps -efl | grep lms
80008001 R 400 764317 524289 0.1 44 0 15M –
09:44:07 0:00.28 ora_lms0_GCO270
80008001 R 400 764416 524289 0.1 44 0 15M –
09:44:08 0:00.29 ora_lms1_GCO270
80008001 R 400 764432 524289 0.0 44 0 15M –
09:44:08 0:00.29 ora_lms2_GCO270
80008001 S 400 764461 524289 0.0 44 0 15M event
09:44:08 0:00.28 ora_lms3_GCO270
DIAGNOSTIC ANALYSIS:
--------------------
Tru64 UNIX process priority
A number between 0 – 63.
44 – 63: represent lowest scheduling priority.
32 – 43: used by system jobs
0 -31: reserved for real-time jobs.
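As a reading aid for the ps output in this bug, the ranges listed above can be turned into a small lookup. This is only a sketch of the three bands exactly as the bug text states them. The function name `tru64_band` and the band labels are chosen for this illustration.

```shell
#!/bin/sh
# Map a Tru64 UNIX process priority (0-63) to the band named in the bug text.
tru64_band() {
    pri=$1
    if [ "$pri" -ge 0 ] && [ "$pri" -le 31 ]; then
        echo real-time       # 0-31: reserved for real-time jobs
    elif [ "$pri" -ge 32 ] && [ "$pri" -le 43 ]; then
        echo system          # 32-43: used by system jobs
    elif [ "$pri" -ge 44 ] && [ "$pri" -le 63 ]; then
        echo lowest          # 44-63: lowest scheduling priority
    else
        echo out-of-range
        return 1
    fi
}

# The LMS processes in this bug report show priority 44.
tru64_band 44
```

A priority of 44, as reported for the LMS processes here, falls in the lowest band rather than the real-time one.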
[GPRD70]:cgifs01:/usr/users/cgifs01/stat>ps -O SCHED 1045604…
PID USER %CPU PRI UPR NI PPR PSR POL PSET S TTY
TIME COMMAND
849453 cgifs01 0.0 44 44 0 19 6 TS 0 S pts/3
0:00.13 ksh
849453 cgifs01 0.0 44 44 0 19 6 TS 0 S pts/3
0:00.13 ksh
849453 cgifs01 0.0 44 44 0 19 6 TS 0 S pts/3
0:00.13 ksh
849453 cgifs01 0.2 44 44 0 19 6 TS 0 S pts/3
0:00.13 ksh
849453 cgifs01 0.2 44 44 0 19 6 TS 0 S pts/3
0:00.14 ksh
849453 cgifs01 0.2 44 44 0 19 4 TS 0 S pts/3
0:00.14 ksh
849453 cgifs01 0.2 44 44 0 19 7 TS 0 S pts/3
0:00.14 ksh
849453 cgifs01 0.2 44 44 0 19 7 TS 0 S pts/3
0:00.14 ksh
849453 cgifs01 0.6 44 44 0 19 7 TS 0 S pts/3
0:00.15 ksh
849453 cgifs01 0.6 44 44 0 19 7 TS 0 S pts/3
0:00.15 ksh
849453 cgifs01 0.6 44 44 0 19 5 TS 0 S pts/3
0:00.15 ksh
849453 cgifs01 0.6 44 44 0 19 4 TS 0 S pts/3
0:00.16 ksh
849453 cgifs01 0.6 44 44 0 19 6 TS 0 S pts/3
0:00.16 ksh
849453 cgifs01 1.0 44 44 0 19 5 TS 0 S pts/3
0:00.17 ksh
849453 cgifs01 1.0 44 44 0 19 4 TS 0 S pts/3
0:00.17 ksh
849453 cgifs01 1.0 44 44 0 19 6 TS 0 S pts/3
0:00.17 ksh
849453 cgifs01 1.0 44 44 0 19 7 TS 0 S pts/3
0:00.18 ksh
WORKAROUND:
-----------
A script to renice the LMS and LMD processes.
Hdr: 9245122 10.2.0.4 RDBMS 10.2.0.4 RAC PRODID-5 PORTID-23
Abstract: HIGH WAITTIME FOR GC_REMASTER
*** 12/28/09 06:57 am ***
BUG TYPE CHOSEN
===============
Performance
SubComponent: Real Application Clusters
=======================================
DETAILED PROBLEM DESCRIPTION
============================
application runs very slow, the database shows high waits for gc_remaster
(it was not the first time of this issue)
DIAGNOSTIC ANALYSIS
===================
col ksppinm for a30
col ksppstvl for a30 tru
select x.ksppinm, y.ksppstvl
from x$ksppi x , x$ksppcv y
where x.indx = y.indx
and x.ksppinm like '%_Parameter_name%' ==> keep the name inside '% %'
order by x.ksppinm;
KSPPINM                        KSPPSTVL
------------------------------ ------------------------------
_os_sched_high_priority        1
KSPPINM                        KSPPSTVL
------------------------------ ------------------------------
_high_priority_processes       LMS*
defthw99030srv_oracle_DI0DB1> ls -rtl $ORACLE_HOME/bin/oradism
-r-sr-s--- 1 root dba 1249392 Apr 4 2008 /oracle/system/dbms1020/bin/oradism
WORKAROUND?
===========
No
TECHNICAL IMPACT
================
Bad performance
RELATED ISSUES (bugs, forums, RFAs)
===================================
N/A
HOW OFTEN DOES THE ISSUE REPRODUCE AT CUSTOMER SITE?
====================================================
Always
DOES THE ISSUE REPRODUCE INTERNALLY?
====================================
Not attempted
Hdr: 9477972 10.2.0.4.0 RDBMS 10.2.0.4.0 VOS PRODID-5 PORTID-23
Abstract: LMS IS NOT RUNNING IN REAL TIME MODE
*** 03/16/10 12:22 am ***
TAR:
----
PROBLEM:
--------
LMS processes are not running in real time mode.
$ ps -efc | grep lms
oraperf 17813 15326 FSS 59 14:18:03 pts/9 0:00 grep lms
oraperf 23530 1 FSS 1 22:02:43 ? 8:46 ora_lms2_PERF1
oraperf 23542 1 FSS 1 22:02:43 ? 8:46 ora_lms3_PERF1
oraperf 23522 1 FSS 1 22:02:42 ? 8:00 ora_lms0_PERF1
oraperf 23526 1 FSS 59 22:02:42 ? 8:28 ora_lms1_PERF1
oraperf 23546 1 FSS 1 22:02:43 ? 8:36 ora_lms4_PERF1
They are running in FSS.
DIAGNOSTIC ANALYSIS:
--------------------
Current setting is,
_os_sched_high_priority = 1
_high_priority_processes = LMS*
I found note.602419.1, but oradism already set correctly.
[oraperf@wsqfinc1a] /opt/app/dtperf $ ls -al $ORACLE_HOME/bin/oradism
-r-sr-s--- 1 root perf 1249392 Apr 4 2008 /opt/app/dtperf/perfdb/10.2/bin/oradism
Hdr: 5635098 10.2.0.2.0 KERNEL 2.0.0.1 PRODID-1309 PORTID-46
Abstract: INSTANCE HANGS NO APARENT REASON
From kernel perspective, nothing stands out to point to a kernel issue.
However, I do see around 181 processes in ‘R’ queue state at the time the
dump is taken on node itrac19.cern.ch. I do see LMS process running in
Realtime as well. Many processes are in call sys_gettimeofday() as well,
especially crsd.bin, racgimon and oracle shadows – lms as well.
The hang is the culmination of the following factors.
1. lms0, lms1 running in realtime mode – default starting 10.2 may starve
other processes to get the cpu. Recommend to use _os_sched_high_priority=0
in the init.ora, so that lms does not run in realtime priority.
2. gettimeofday() syscall is very expensive on x86 platform and hence any
kind of tracing or collecting statistics will impact this.
gettimeofday() bug 5132861 is fixed in 11g but cannot be backported to your
db version. I think you can turn off the statistics collection by changing
the parameters:
timed_statistics=false , statistics_level=basic
Hope this helps.
1. Reviewed the sysrq dumps and the box is simply running to capacity limit
here. There are at least 401 processes in ‘R’ state, which means the box is
overloaded and heavily cpu bound.
2. For the parameter change, that parameter would have worked in 10.2.0.2.
However due to bug 5848782 that started in 10.2.0.3, setting
_os_sched_high_priority will not be effective and yes, one need to use
_high_priority_processes='' to mitigate that.
3. ok about the statistics level.
Referring to LMS and Real Time Priority in Oracle RAC 10g Release 2 [ID 558185.1], we would like to confirm whether this behaviour also occurs on 11gR2. If so, please provide the commands to check and configure the DB parameter to avoid this issue.
LMS and VKTM should run in realtime.
Please review the following documents:
http://www.oracle.com/technology/products/database/clusterware/pdf/rac_aix_system_stability.pdf
http://public.dhe.ibm.com/partnerworld/pub/whitepaper/162b6.pdf
http://www-03.ibm.com/support/techdocs/atsmastr.nsf/5cb5ed706d254a8186256c71006d2e0a/7785cdf7e84ea2a6862573b90050e6a6/$FILE/11gR2-tips_August%2025%202010.pdf
LMS priority should be 39, better to follow the above links for configuring AIX along with Oracle RAC database.
SELECT x.ksppinm parm
,y.ksppstvl value
,x.ksppdesc descr
FROM x$ksppi x
,x$ksppcv y
WHERE x.indx = y.indx
AND x.ksppdesc like '%&parm_pattern%'
AND x.ksppinm like '\_%' escape '\'
ORDER BY x.ksppinm
/
Enter value for parm_pattern: priority
========
also ps -elf | grep lms
_os_sched_high_priority 1 OS high priority level
which is the correct setting
what is the output of ps -elf | grep lms
if you see the pattern: 60 — then lms process is running in realtime