事情要从大约2个月前的一起事故说起,有一套部署在Oracle-Sun Exadata V2 Database Machine上的4节点11.2.0.1 RAC数据库,其中一个节点的RAC关键后台进程LMS报ORA-00600[kjbmprlst:shadow]错误,随后LMS后台进程将该节点上的实例终止。其他节点上的CRS软件检测到该意外终止后,数据库进入全局资源的重新配置过程(Reconfiguration),Reconfiguration在所有剩余节点上都顺利完成了。
但是随后其中一个节点的告警日志中持续出现”Process W000 died, see its trace file”,似乎是实例无法得到分配新进程的必要资源,同时应用程序出现无法登陆该节点上实例的情况,本来4节点的RAC数据库,因为ORA-00600挂了一个,现在又有一个节点登不上,一下变得只剩下一半性能。
随后我赶到了问题现场,继续诊断问题,并发现了以下症状,在此一一列举:
1.尝试远程登录该实例,但是失败,出现ORA-12516 TNS:listener could not find available handler with matching protocol stack”错误。反复登录会出现以下信息:
Linux Error: 12: Cannot allocate memory Additional information: 1 ORA-01034: ORACLE not available
2.确认过ORACLE_SID、ORACLE_HOME等多环境变量后使用”sqlplus / as sysdba”登录却返回”Connected to an idle instance.”(这一点最为蹊跷),无法以sysdba登录就无法收集必要的诊断信息,这个虽然可以通过gdb等手段做systemstate dump,但是暂时绕过
3. 后台进程W000由SMCO进程启动, SMCO进程的日志如下,所报状态为KSOSP_SPAWNED:
Process W000 is dead (pid=2648 req_ver=3812 cur_ver=3812 state=KSOSP_SPAWNED). *** 2011-07-08 02:44:32.971 Process W000 is dead (pid=2650 req_ver=3813 cur_ver=3813 state=KSOSP_SPAWNED).
4. 确认组成instance的内存和后台进程均存活,且仍有日志产生
[oracle@maclean04 trace]$ ipcs -m ------ Shared Memory Segments -------- key shmid owner perms bytes nattch status 0x00000000 0 root 644 72 2 0x00000000 32769 root 644 16384 2 0x00000000 65538 root 644 280 2 0xac5ffd78 491524 oracle 660 4096 0 0x96c5992c 1409029 oracle 660 4096 0 [oracle@maclean04 trace]$ ls -l /dev/shm total 34839780 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_0 -rw-r----- 1 oracle oinstall 0 Jun 7 07:19 ora_maclean4_1409029_1 -rw-r----- 1 oracle oinstall 0 Jun 7 07:19 ora_maclean4_1409029_10 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_100 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_101 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_102 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_103 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_104 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_105 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_106 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_107 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_108 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_109 -rw-r----- 1 oracle oinstall 0 Jun 7 07:19 ora_maclean4_1409029_11 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_110 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_111 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_112 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_113 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_114 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_115 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_116 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_117 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_118 -rw-r----- 1 oracle oinstall 268435456 Jun 7 07:19 ora_maclean4_1409029_119 -rw-r----- 1 oracle oinstall 0 Jun 7 07:19 ora_maclean4_1409029_12 ....................... [oracle@maclean04 trace]$ ps -ef|grep ora_ oracle 5466 1 0 Jul03 ? 00:00:18 ora_pz99_maclean4 oracle 14842 10564 0 19:54 pts/9 00:00:00 grep ora_ oracle 18641 1 0 Jun08 ? 00:00:02 ora_q002_maclean4 oracle 23932 1 0 Jun07 ? 00:04:26 ora_pmon_maclean4 oracle 23934 1 0 Jun07 ? 00:00:06 ora_vktm_maclean4 oracle 23938 1 0 Jun07 ? 00:00:00 ora_gen0_maclean4 oracle 23940 1 0 Jun07 ? 00:00:06 ora_diag_maclean4 oracle 23942 1 0 Jun07 ? 00:00:00 ora_dbrm_maclean4 oracle 23944 1 0 Jun07 ? 00:01:01 ora_ping_maclean4 oracle 23946 1 0 Jun07 ? 00:00:16 ora_psp0_maclean4 oracle 23948 1 0 Jun07 ? 00:00:00 ora_acms_maclean4 oracle 23950 1 0 Jun07 ? 02:27:29 ora_dia0_maclean4 oracle 23952 1 0 Jun07 ? 01:19:42 ora_lmon_maclean4 oracle 23954 1 0 Jun07 ? 02:23:59 ora_lmd0_maclean4 oracle 23956 1 5 Jun07 ? 1-13:50:36 ora_lms0_maclean4 oracle 23960 1 4 Jun07 ? 1-12:44:25 ora_lms1_maclean4 oracle 23964 1 0 Jun07 ? 00:00:00 ora_rms0_maclean4 oracle 23966 1 0 Jun07 ? 00:00:00 ora_lmhb_maclean4 oracle 23968 1 0 Jun07 ? 01:58:35 ora_mman_maclean4 oracle 23970 1 0 Jun07 ? 06:28:39 ora_dbw0_maclean4 oracle 23972 1 0 Jun07 ? 06:27:08 ora_dbw1_maclean4 oracle 23974 1 2 Jun07 ? 16:49:56 ora_lgwr_maclean4 oracle 23976 1 0 Jun07 ? 00:20:48 ora_ckpt_maclean4 oracle 23978 1 0 Jun07 ? 00:07:03 ora_smon_maclean4 oracle 23980 1 0 Jun07 ? 00:00:00 ora_reco_maclean4 oracle 23982 1 0 Jun07 ? 00:00:00 ora_rbal_maclean4 oracle 23984 1 0 Jun07 ? 00:01:00 ora_asmb_maclean4 oracle 23986 1 0 Jun07 ? 00:08:15 ora_mmon_maclean4 oracle 23988 1 0 Jun07 ? 00:18:19 ora_mmnl_maclean4 oracle 23992 1 0 Jun07 ? 00:00:00 ora_d000_maclean4 oracle 23994 1 0 Jun07 ? 00:00:00 ora_s000_maclean4 oracle 23996 1 0 Jun07 ? 00:00:00 ora_mark_maclean4 oracle 24065 1 0 Jun07 ? 01:16:54 ora_lck0_maclean4 oracle 24067 1 0 Jun07 ? 00:00:00 ora_rsmn_maclean4 oracle 24079 1 0 Jun07 ? 00:01:02 ora_dskm_maclean4 oracle 24174 1 0 Jun07 ? 00:08:18 ora_arc0_maclean4 oracle 24188 1 0 Jun07 ? 00:08:19 ora_arc1_maclean4 oracle 24190 1 0 Jun07 ? 00:00:59 ora_arc2_maclean4 oracle 24192 1 0 Jun07 ? 00:08:12 ora_arc3_maclean4 oracle 24235 1 0 Jun07 ? 00:00:00 ora_gtx0_maclean4 oracle 24237 1 0 Jun07 ? 00:00:00 ora_rcbg_maclean4 oracle 24241 1 0 Jun07 ? 00:00:00 ora_qmnc_maclean4 oracle 24245 1 0 Jun07 ? 00:00:00 ora_q001_maclean4 oracle 24264 1 0 Jun07 ? 00:08:28 ora_cjq0_maclean4 oracle 25782 1 0 Jun07 ? 00:00:00 ora_smco_maclean4
5.确认在问题发生时系统中仍有大量的空闲内存且未发生大量的SWAP,此外/dev/shm共享内存目录仍有27G的空闲。
6.在其他节点上查询全局动态性能视图gv$resource_limit发现当前故障节点上的登录进程总数上限仅为404,并不多。
SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production PL/SQL Release 11.2.0.1.0 - Production CORE 11.2.0.1.0 Production TNS for Linux: Version 11.2.0.1.0 - Production NLSRTL Version 11.2.0.1.0 - Production SQL> select * from global_name; GLOBAL_NAME -------------------------------------------------------------------------------- www.askmac.cn SQL> select * from gv$resource_limit where inst_id=4; RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE ------------------------------ ------------------- --------------- ------------------------------ ------------- processes 50 404 1500 1500 sessions 61 616 2272 2272 enqueue_locks 849 1599 31062 31062 enqueue_resources 846 1007 15016 UNLIMITED ges_procs 47 399 1503 1503 ges_ress 65943 109281 67416 UNLIMITED ges_locks 23448 37966 92350 UNLIMITED ges_cache_ress 7347 14716 0 UNLIMITED ges_reg_msgs 337 5040 3730 UNLIMITED ges_big_msgs 26 502 3730 UNLIMITED ges_rsv_msgs 0 1 1000 1000 gcs_resources 2008435 2876561 3446548 3446548 gcs_shadows 1888276 2392064 3446548 3446548 dml_locks 0 0 9996 UNLIMITED temporary_table_locks 0 45 UNLIMITED UNLIMITED transactions 0 0 2499 UNLIMITED branches 0 2 2499 UNLIMITED cmtcallbk 0 3 2499 UNLIMITED max_rollback_segments 109 129 2499 65535 sort_segment_locks 0 14 UNLIMITED UNLIMITED k2q_locks 0 2 4544 UNLIMITED max_shared_servers 1 1 UNLIMITED UNLIMITED parallel_max_servers 1 19 160 3600
7. Exadata节点系统内核参数文件sysctl.conf中的配置正确:
# Controls the maximum shared segment size, in bytes kernel.shmmax = 68719476736 # Controls the maximum number of shared memory segments, in pages kernel.shmall = 4294967296 ########### BEGIN DO NOT REMOVE Added by Oracle Exadata ########### kernel.shmmni = 4096 kernel.sem = 250 32000 100 128 # bug 8311668 file-max and aio-max-nr fs.file-max = 6815744 # DB install guide says the above fs.aio-max-nr = 1048576 # 8976963 net.ipv4.neigh.bond0.locktime=0 net.ipv4.ip_local_port_range = 9000 65500 # DB install guide says the above net.core.rmem_default = 4194304 net.core.wmem_default = 262144 net.core.rmem_max = 4194304 net.core.wmem_max = 2097152 # The original DB deployment was net.core.wmem_max = 1048586 but IB works # best for Exadata at the above net.core settings # bug 8268393 remove vm.nr_hugepages = 2048 # bug 8778821 system reboots after 60 sec on panic kernel.panic=60 ########### END DO NOT REMOVE Added by Oracle Exadata ########### ########### BEGIN DO NOT REMOVE Added by Oracle Exadata ########### kernel.shmmax = 64547735961 kernel.shmall = 15758724 ########### END DO NOT REMOVE Added by Oracle Exadata ###########
8. 至此问题还是显得扑朔迷离,主要后台进程和SGA内存的完好,而且操作系统上也仍有大量空闲内存,实例上的资源也没有达到一个临界点。到底是什么造成了无法分配新进程!?
出于谨慎我最后还是检查了系统上的/
但是当我实际打开这个文件后我立即意识到这个配置有问题,似乎少了点什么,以下为该Exadata上的limits.conf文件:
########### BEGIN DO NOT REMOVE Added by Oracle Deployment Scripts ########### oracle soft nproc 2047 oracle hard nproc 16384 oracle soft nofile 65536 oracle hard nofile 65536 ########### END DO NOT REMOVE Added by Oracle Deployment Scripts ###########
显然上述limits.conf中缺少了对memlock参数的设置,在不设置memlock参数的情况下使用缺省的memlock为32,以下为Exadata host上的ulimit输出:
[oracle@maclean4 shm]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 606208
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 2047
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
可以观察到这里的max locked memory确实是缺省的32,而Oracle所推荐的memlock参数却要远大于32。
在Oracle validated Configuration中经过验证的memlock推荐值为50000000,关于Oracle Validated Configuration详见拙作<Understand Oracle Validated Configurations>。
[oracle@rh2 ~]$ cat /etc/security/limits.conf # Oracle-Validated setting for nofile soft limit is 131072 oracle soft nofile 131072 # Oracle-Validated setting for nofile hard limit is 131072 oracle hard nofile 131072 # Oracle-Validated setting for nproc soft limit is 131072 oracle soft nproc 131072 # Oracle-Validated setting for nproc hard limit is 131072 oracle hard nproc 131072 # Oracle-Validated setting for core soft limit is unlimited oracle soft core unlimited # Oracle-Validated setting for core hard limit is unlimited oracle hard core unlimited # Oracle-Validated setting for memlock soft limit is 50000000 oracle soft memlock 50000000 # Oracle-Validated setting for memlock hard limit is 50000000 oracle hard memlock 50000000
搜索Mos可以发现Note[Ora-27102: Out Of Memory: Linux Error: 12: Cannot Allocate Memory with LOCK_SGA=TRUE [ID 401077.1]:指出了因max locked memory过小可能引发Linux Error: 12: Cannot Allocate Memory内存无法分配的问题。
因为修改limits.conf配置文件对已经启动的实例是无效的,所以我们无法通过纠正参数来解决现有的问题。
实际我采用了释放一些资源的方法来workaround了这个问题,通过以下脚本将实例内的所有前台服务进程杀死以释放资源。
ps -ef|grep $SID|grep LOCAL=NO|grep -v grep| awk ‘{print $2}’|xargs kill -9
完成以上命令后出现了终端有点卡的现象,之后恢复正常。尝试使用sysdba本地和远程登录实例均成功,应用的链接也恢复正常。
虽然修复了问题,但是还需要和客户做详尽的说明。我在邮件中阐明了该Exadata一体机上配置文件存在的问题,并提出了几点建议:
1.要求Oracle Support确认该/etc/security/limits.conf中的配置是否合理,是否需要修改
2.设置vm.min_free_kbytes = 51200 内核参数,避免因空闲内存不足引起的性能问题
3.安装OSWatcher监控软件,监控必要的系统资源
客户对我的说法也比较信服,但还是将邮件抄送了原厂Exadata一体机的售前人员。
之后售前人员也曾联系过我,我也做了相同的说明。但原厂售前认为在Exadata一体机是在Oracle美国原厂进行配置安装的,在配置上肯定是最优的,而且该limits.conf中的memlock参数的当前值(32)和推荐值(50000000)之间有如此大的差距,他们认为美国原厂的部署人员不可能犯这么低级的错误。
所以实际他们对我对该memlock参数的说明持一种怀疑的态度,我的建议是就该memlock参数和MOS进行进一步的沟通,以确认该问题。当然这不是我需要完成的工作了。因为对该memlock参数存在分歧,所以短期内也没有修改该参数。
这个case就这样过去了,时间过得很快,转眼已经2个月了。恰巧最近有升级Exadata上数据库到11.2.0.2的项目,所以翻阅了相关patch的readme文档,因为升级RAC到11.2.0.2的前提是Exadata Storage Server Software、InfiniBand Switch Software Version软件版本能够兼容,所以查阅了其兼容列表:
Version Compatibility
The following table lists the Exadata Storage Server software versions that are compatible with each supported Oracle Database 11g Release 2 software version.
Oracle Database Software version | Required Exadata Storage Server Software version |
---|---|
11g Release 2 (11.2.0.2.0) Patch Set 1 | 11.2.2.x |
11g Release 2 (11.2.0.1.0) | 11.2.2.x 11.2.1.x |
The following table lists the InfiniBand Switch software versions that are compatible with each supported Exadata Storage Server software version.
Exadata Storage Server Software version | Required InfiniBand Switch software version |
---|---|
11.2.2.2.2 and later | Exadata Database Machine – Sun Datacenter InfiniBand Switch 36 Switch software version 1.1.3-2 or laterHP Oracle Database Machine – Voltaire ISR 9024D-M and ISR 9024D Switch software 5.1.1 build ID 872 (ISR 9024D-M only) Switch firmware 1.0.0 or higher |
11.2.2.2.0 or earlier | Exadata Database Machine – Sun Datacenter InfiniBand Switch 36 Switch software version 1.0.1-1 or laterHP Oracle Database Machine – Voltaire ISR 9024D-M and ISR 9024D Switch software 5.1.1 build ID 872 (ISR 9024D-M only) Switch firmware 1.0.0 or higher |
为了将Exadata上的RAC数据库升级到11.2.0.2,首先要将Exadata Storage Server Software升级到11.2.2.x,Oracle官方目前推荐的版本是11.2.2.3.2。
所以随后我也翻阅了Exadata Storage Server Software 11.2.2.3.2 的update readme文档,即<Oracle Exadata Database Machine README for patch 12577723 (Support note 1323958.1)>。
该Patch的升级主要分成”Applying the Patch to Exadata Cells”和”Applying the Patch to the Database Server” 2个阶段,即不仅需要在Exadata Cell上实施补丁,还需要在Database节点上实施一个小补丁。
查看”Applying the Patch to the Database Server”章节可以发现存在这样一个步骤:
Repeat the following steps for each database host. If you are taking deployment-wide downtime for the patching, then these steps may be performed in parallel on all database hosts.
- Update the resource limits for the database and the grid users:
Note: - This step does not apply if you have customized the values for your specific deployment and database requirements.
WARNING: - Do not run this step if you have specific customized values in use for your deployment.
- Calculate 75% of the physical memory on the machine using the following command:.
let -i x=($((`cat /proc/meminfo | grep 'MemTotal:' | awk '{print $2}'` * 3 / 4))); echo $x
- Edit the
/etc/security/limits.conf
file to update or add following limits for the database owner (orauser
) and the grid infrastructure user (griduser
). Your deployment may use the same operating system user for both and it may be named asoracle
user. Adjust the following as needed.########## BEGIN DO NOT REMOVE Added by Oracle ########### orauser soft core unlimited orauser hard core unlimited orauser soft nproc 131072 orauser hard nproc 131072 orauser soft nofile 131072 orauser hard nofile 131072 orauser soft memlock <value of x from step 01.a> orauser hard memlock <value of x from step 01.a> griduser soft core unlimited griduser hard core unlimited griduser soft nproc 131072 griduser hard nproc 131072 griduser soft nofile 131072 griduser hard nofile 131072 griduser soft memlock <value of x from step 01.a> griduser hard memlock <value of x from step 01.a> ########### END DO NOT REMOVE Added by Oracle ###########
以上可以看到在正式实施Patch to Database server前做了一个补救措施,那就是为oracle和grid用户添加memlock参数,这里的memlock参数是通过将/proc/meminfo中的MemTotal取75%获得,在<Exadata Server Hardware Details>中我列出了Exadata Database Host的一些硬件参数,其中总内存MemTotal一般为70GB(74027752 kB),换算过来74027752*75%=55520814,也就是说Oracle实际推荐在Exadata上使用的memlock参数应当为55520814,甚至要高于我之前所说的50000000的验证值。
至此该问题终于真相大白!而我们也可以从中学到很多东西:
1.首先我大胆的猜测,实际部署Sun Exadata Machine的因该是Oracle硬件部门,也就是以前Sun的部门。实际在部署过程中,部门与部门之间的充分交流是很重要的,而这里09年匆匆上线的Oracle-Sun Exadata V2显然没有做好,而直到2011 5月发布的Oracle Exadata Database Machine 11g Release 2 (11.2) 11.2.2.3.2 patch 12577723中才反应并解决了该问题
2.IT始终是以人为本,不管是多么高端的服务器、多么先进的技术,如果没有与之相匹配的人和团队来驾驭的话,那么至多只能发挥出50%的效益,在人员对先进技术极端不熟悉的情况下,智能化只是空谈!