How to Confirm the Time Zone of 11.2 RAC Grid Infrastructure

This document describes how to verify and modify the Grid Infrastructure (GI) time zone setting after 11.2 Grid Infrastructure has been installed.

 

Once the OS default time zone has been changed, be sure of the following two points:

1. For 11.2.0.1, make sure the TZ shell environment variable is set correctly for the root, grid, and oracle users.
2. For 11.2.0.2 and later, confirm that the TZ parameter in $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt is set to the correct time zone.

 

For example:

 

echo $TZ
US/Pacific

grep TZ s_crsconfig_<nodename>_env.txt
TZ=US/Pacific

 

 

If the time zone value is incorrect or contains extraneous characters, RAC Grid Infrastructure may fail to start.

Ensuring the two points above keeps GI starting normally. Startup problems caused by a wrong time zone typically occur when the time zone is changed last, after the OS and GI have already been installed. If the OS time does not match the newest timestamps recorded in logs such as ohasd.log and ocssd.log, this time zone problem is usually the cause.
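A quick way to spot the mismatch is to compare the OS clock with the newest GI log entries; a minimal sketch, assuming the standard 11.2 log layout under $GRID_HOME/log/<nodename> and that hostname -s matches the node directory name:

date
tail -5 $GRID_HOME/log/$(hostname -s)/ohasd/ohasd.log
tail -5 $GRID_HOME/log/$(hostname -s)/cssd/ocssd.log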

 

Before 11.2, the CRS time zone setting could be confirmed with init.cssd diag.

 

 

The default time zone configuration on each OS is shown below.

 

 

Linux

To change: /usr/sbin/timeconfig

To display current setting:

cat /etc/sysconfig/clock
ZONE="America/Los_Angeles"
UTC=true
ARC=false
To find all valid settings: ls -l /usr/share/zoneinfo

Anything that appears in this directory is valid to use, for example CST6CDT or America/Chicago.

Note: the "Zone" field in /etc/sysconfig/clock could be different than what's in /usr/share/zoneinfo in OL6.3/RHEL6.3, the one from /usr/share/zoneinfo should be used in $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt

HP-UX

To display current setting:

cat /etc/default/tz
PST8PDT
To change: set_parms timezone

To find all valid settings: cat /usr/lib/tztab

Solaris

To display current setting:

grep TZ /etc/TIMEZONE
TZ=US/Pacific
To change: modify /etc/TIMEZONE, then run "rtc -z US/Pacific; rtc -c"

To find out all valid settings: ls -l /usr/share/lib/zoneinfo

AIX

To display current setting:

grep TZ /etc/environment
TZ=GMT

To change: edit the TZ entry in /etc/environment (the new value takes effect at the next login or reboot).

Upgrade 11.2.0.1 GI/CRS to 11.2.0.2 in Linux

11.2.0.2 has been out for over a year now and is considerably more stable than 11.2.0.1. When deploying new systems for customers, we now generally recommend installing 11.2.0.2 directly (out of place) and applying the PSU recommended in <Oracle Recommended Patches — Oracle Database>.

For existing systems, we recommend upgrading to 11.2.0.2 whenever the downtime window allows; customers can of course also wait patiently for the release of 11.2.0.3.

The upgrade from 11.2.0.1 to 11.2.0.2 differs somewhat from upgrades in 10g. For mission-critical databases, proper upgrade rehearsals and backups are mandatory: upgrading Oracle database software has always been a complex and risky undertaking and must not be taken lightly.

Upgrading a RAC database is, in turn, more involved than upgrading a single-instance database. The work breaks down into the following steps:

1. If you run on Exadata Database Machine hardware, first check whether the Exadata Storage Software and InfiniBand Switch versions need to be upgraded; see <Database Machine and Exadata Storage Server 11g Release 2 (11.2) Supported Versions>.

2. Complete the preparation for the rolling upgrade of Grid Infrastructure.

3. Rolling-upgrade the Grid Infrastructure (GI) software.

4. Complete the preparation for upgrading the RDBMS database software.

5. Upgrade the RDBMS database software itself, including upgrading the data dictionary and recompiling invalid objects.

Here we focus on the preparation and the concrete steps for the rolling upgrade of the GI/CRS cluster software: 11.2.0.2 is the first patchset for 11gR2 and also the first large out-of-place patchset, so most people are not yet familiar with the new upgrade model.

 

Preparing to Upgrade GI

 

1. Note that unexpected errors can occur when rolling-upgrading 11.2.0.1 GI/CRS to 11.2.0.2; see <Pre-requisite for 11.2.0.1 to 11.2.0.2 ASM Rolling Upgrade>, quoted here in full:

Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1.0 to 11.2.0.2.0 - Release: 11.2 to 11.2
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.2   [Release: 11.2 to 11.2]
Information in this document applies to any platform.
Purpose
This note is to clarify the patch requirement when doing 11.2.0.1 to 11.2.0.2 rolling upgrade.
Scope and Application
Intended audience includes DBA, support engineers.
Pre-requisite for 11.2.0.1 to 11.2.0.2 ASM Rolling Upgrade

There has been some confusion as to what patches need to be applied for an 11.2.0.1 ASM rolling
upgrade to 11.2.0.2 to be successful. Documentation regarding this is not very clear
(at the time of writing); a documentation bug has been filed and the documentation will be updated in the future.

There are two bugs related to 11.2.0.1 ASM rolling upgrade to 11.2.0.2:

Unpublished bug 9413827: 11201 TO 11202 ASM ROLLING UPGRADE - OLD CRS STACK FAILS TO STOP

Unpublished bug 9706490: LNX64-11202-UD 11201 -> 11202, DG OFFLINE AFTER RESTART CRS STACK DURING UPGRADE

Some of the symptoms include error message when running rootupgrade.sh:

ORA-15154: cluster rolling upgrade incomplete (from bug: 9413827)

or

Diskgroup status is shown offline after the upgrade, crsd.log may have:

2010-05-12 03:45:49.029: [ AGFW][1506556224] Agfw Proxy Server sending the
last reply to PE for message:RESOURCE_START[ora.MYDG1.dg rwsdcvm44 1] ID 4098:1526
TextMessage[CRS-2674: Start of 'ora.MYDG1.dg' on 'rwsdcvm44' failed]
TextMessage[ora.MYDG1.dg rwsdcvm44 1]
ora.MYDG1.dg rwsdcvm44 1:

To overcome this issue, there are two actions you need to take:

a). apply proper patch.
b). change crsconfig_lib.pm

Applying Patch:

1). If $GI_HOME is on version 11.2.0.1.2 (i.e. GI PSU2 is applied):

Action: You can apply Patch:9706490 for version 11.2.0.1.2.

Unpublished bug 9413827 is fixed in 11.2.0.1.2 GI PSU2. Patch:9706490 for version
11.2.0.1.2 is built on top of 11.2.0.1.2 GI PSU2 (i.e. includes the 11.2.0.1.2 GI PSU2,
hence includes the fix for 9413827). Applying Patch:9706490 includes both fixes.
opatch will recognize that 9706490 is a superset of 11.2.0.1.2 GI PSU2 (Patch 9655006)
and roll back patch 9655006 before applying Patch 9706490.

2). If $GI_HOME is on version 11.2.0.1.0 (i.e. no GI PSU applied).

Action: You can apply Patch:9706490 for version 11.2.0.1.2. This would make sure you have
applied 11.2.0.1.2 GI PSU2 plus both 9706490 and 9413827 (which is included in GI PSU2).

For platforms that do not have 11.2.0.1.2 GI PSU, then you can apply patch 9413827 on 11.2.0.1.0.

3). If $GI_HOME is on version 11.2.0.1.1 (GI PSU1) (this is rare since GI PSU1 was only
released for Linux platforms and was quite old).

Action: You can roll back GI PSU1, then apply Patch:9706490 for version 11.2.0.1.2
if your platform has the 11.2.0.1.2 GI PSU. If your platform does not have the 11.2.0.1.2 GI PSU,
then apply patch 9413827.

Modify crsconfig_lib.pm

After patch is applied, modify $11.2.0.2_GI_HOME/crs/install/crsconfig_lib.pm:

Before the change:
# grep for bugs 9655006 or 9413827
@cmdout = grep(/(9655006|9413827)/, @output);

After the change:
# grep for bugs 9655006 or 9413827 or 9706490
@cmdout = grep(/(9655006|9413827|9706490)/, @output);

This would prevent rootupgrade.sh from failing when it validates the pre-requisite patches.

Here we assume that the 11.2.0.1 GI in our environment has no PSU applied. To work around the "11201 TO 11202 ASM ROLLING UPGRADE – OLD CRS STACK FAILS TO STOP" bug and rolling-upgrade GI successfully, the patch for bug 9413827 must be applied before the 11.2.0.2 patchset upgrade itself.

In addition, we recommend using the latest OPatch utility, to avoid the problem of the opatch shipped with 11.2.0.1 failing to recognize the relevant patches.

So, to upgrade GI to 11.2.0.2, we first need to download three patches for the appropriate platform from MOS:

1. 11.2.0.2.0 PATCH SET FOR ORACLE DATABASE SERVER (patch id 10098816). Note that this 11.2.0.2 patchset actually consists of as many as seven zip files; on Linux x86-64, for example:

[screenshot: Patch 10098816 11.2.0.2.0 PATCH SET FOR ORACLE DATABASE SERVER download page]

For the upgrade we only need zip files 1 through 3: the first and second contain the out-of-place patchset for the RDBMS database software, and the third contains the out-of-place patchset for the Grid Infrastructure/CRS software. In this article (which upgrades GI only) we will actually use only p10098816_112020_Linux-x86-64_3of7.zip.

2. Patch 9413827: 11201 TO 11202 ASM ROLLING UPGRADE – OLD CRS STACK FAILS TO STOP (patch id 9413827)

3. Patch 6880880: OPatch 11.2 (patch id 6880880), the latest OPatch utility

2. Install the latest OPatch on all nodes; this step does not require stopping any services:

Switch to the GI owner, move the old OPatch directory aside, and unzip the new OPatch into the CRS_HOME:

su - grid

[grid@vrh1 ~]$ mv $CRS_HOME/OPatch $CRS_HOME/OPatch_old
[grid@vrh1 ~]$ unzip /tmp/p6880880_112000_Linux-x86-64.zip -d $CRS_HOME

Confirm the opatch version:

[grid@vrh1 ~]$ $CRS_HOME/OPatch/opatch
Invoking OPatch 11.2.0.1.6

Oracle Interim Patch Installer version 11.2.0.1.6
Copyright (c) 2011, Oracle Corporation.  All rights reserved.

3. Install the BUNDLE Patch for Base Bug 9413827 on all nodes, in a rolling fashion:

1. Switch to the GI owner and list the patches already installed:

su - grid 

opatch lsinventory -detail -oh $CRS_HOME

Invoking OPatch 11.2.0.1.6

Oracle Interim Patch Installer version 11.2.0.1.6
Copyright (c) 2011, Oracle Corporation.  All rights reserved.

Oracle Home       : /g01/11.2.0/grid
Central Inventory : /g01/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.6
OUI version       : 11.2.0.1.0
Log file location : /g01/11.2.0/grid/cfgtoollogs/opatch/opatch2011-09-04_19-08-33PM.log

Lsinventory Output file location :
/g01/11.2.0/grid/cfgtoollogs/opatch/lsinv/lsinventory2011-09-04_19-08-33PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1): 

Oracle Grid Infrastructure                                           11.2.0.1.0
There are 1 products installed in this Oracle Home.
........................
###########################################################################

2. Unzip the previously downloaded p9413827_112010_$platform.zip patch package:

 unzip p9413827_112010_Linux-x86-64.zip 

###########################################################################

3. Switch to the DB HOME owner and stop the resources associated with the RDBMS DB HOME on the local node:

su - oracle

Syntax:
 % [RDBMS_HOME]/bin/srvctl stop home -o [RDBMS_HOME] -s [status file location] -n [node_name]

srvctl stop home -o $ORACLE_HOME  -n vrh1 -s stop_db_res           

cat stop_db_res
db-vprod


###########################################################################

4. Switch to root and run rootcrs.pl -unlock:

[root@vrh1 ~]# $CRS_HOME/crs/install/rootcrs.pl -unlock 

2011-09-04 20:46:53: Parsing the host name
2011-09-04 20:46:53: Checking for super user privileges
2011-09-04 20:46:53: User has super user privileges
Using configuration parameter file: /g01/11.2.0/grid/crs/install/crsconfig_params
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'vrh1'
CRS-2673: Attempting to stop 'ora.crsd' on 'vrh1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'vrh1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'vrh1'
CRS-2673: Attempting to stop 'ora.SYSTEMDG.dg' on 'vrh1'
CRS-2673: Attempting to stop 'ora.registry.acfs' on 'vrh1'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'vrh1'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'vrh1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.vrh1.vip' on 'vrh1'
CRS-2677: Stop of 'ora.vrh1.vip' on 'vrh1' succeeded
CRS-2672: Attempting to start 'ora.vrh1.vip' on 'vrh2'
CRS-2677: Stop of 'ora.registry.acfs' on 'vrh1' succeeded
CRS-2676: Start of 'ora.vrh1.vip' on 'vrh2' succeeded
CRS-2677: Stop of 'ora.SYSTEMDG.dg' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'vrh1'
CRS-2677: Stop of 'ora.asm' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'vrh1'
CRS-2673: Attempting to stop 'ora.eons' on 'vrh1'
CRS-2677: Stop of 'ora.ons' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'vrh1'
CRS-2677: Stop of 'ora.net1.network' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.eons' on 'vrh1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'vrh1' has completed
CRS-2677: Stop of 'ora.crsd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'vrh1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.evmd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.asm' on 'vrh1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'vrh1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.asm' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'vrh1'
CRS-2677: Stop of 'ora.cssd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'vrh1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'vrh1'
CRS-2677: Stop of 'ora.gipcd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'vrh1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'vrh1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully unlock /g01/11.2.0/grid

###########################################################################

5. As the RDBMS HOME owner, run the prepatch.sh script from the patch directory:

su - oracle

% custom/server/9413827/custom/scripts/prepatch.sh -dbhome [RDBMS_HOME]

[oracle@vrh1 tmp]$ 9413827/custom/server/9413827/custom/scripts/prepatch.sh -dbhome $ORACLE_HOME

9413827/custom/server/9413827/custom/scripts/prepatch.sh completed successfully.

###########################################################################

6. Apply the patch itself.

Run the following command as the GI/CRS owner:

 % opatch napply -local -oh [CRS_HOME] -id 9413827

su - grid

cd /tmp/9413827/

opatch napply -local -oh $CRS_HOME -id 9413827

Invoking OPatch 11.2.0.1.6

Oracle Interim Patch Installer version 11.2.0.1.6
Copyright (c) 2011, Oracle Corporation.  All rights reserved.

UTIL session

Oracle Home       : /g01/11.2.0/grid
Central Inventory : /g01/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.6
OUI version       : 11.2.0.1.0
Log file location : /g01/11.2.0/grid/cfgtoollogs/opatch/opatch2011-09-04_20-52-37PM.log

Verifying environment and performing prerequisite checks...
OPatch continues with these patches:   9413827  

Do you want to proceed? [y|n]
y
User Responded with: Y
All checks passed.
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name: 

You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]:  y

Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/g01/11.2.0/grid')

Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files...
Applying interim patch '9413827' to OH '/g01/11.2.0/grid'

Patching component oracle.crs, 11.2.0.1.0...
Patches 9413827 successfully applied.
Log file location: /g01/11.2.0/grid/cfgtoollogs/opatch/opatch2011-09-04_20-52-37PM.log

OPatch succeeded.

Run the following commands as the DB/RDBMS owner:

su - oracle
cd /tmp/9413827/

% opatch napply custom/server/ -local -oh [RDBMS_HOME] -id 9413827

opatch napply custom/server/ -local -oh $ORACLE_HOME -id 9413827

Verifying the update...
Inventory check OK: Patch ID 9413827 is registered in Oracle Home inventory with proper meta-data.
Files check OK: Files from Patch ID 9413827 are present in Oracle Home.
Running make for target install
Running make for target install

The local system has been patched and can be restarted.

UtilSession: N-Apply done.

OPatch succeeded.

###########################################################################

7. Configure the HOME directories.

Run the following commands as root:

 chmod +w $CRS_HOME/log/[nodename]/agent
 chmod +w $CRS_HOME/log/[nodename]/agent/crsd

Run the following commands as the DB/RDBMS owner:
su - oracle

 cd /tmp/9413827/

% custom/server/9413827/custom/scripts/postpatch.sh -dbhome [RDBMS_HOME]

[oracle@vrh1 9413827]$ custom/server/9413827/custom/scripts/postpatch.sh -dbhome $ORACLE_HOME
Reading /s01/orabase/product/11.2.0/dbhome_1/install/params.ora..
Reading /s01/orabase/product/11.2.0/dbhome_1/install/params.ora..
Parsing file /s01/orabase/product/11.2.0/dbhome_1/bin/racgwrap
Parsing file /s01/orabase/product/11.2.0/dbhome_1/bin/srvctl
Parsing file /s01/orabase/product/11.2.0/dbhome_1/bin/srvconfig
Parsing file /s01/orabase/product/11.2.0/dbhome_1/bin/cluvfy
Verifying file /s01/orabase/product/11.2.0/dbhome_1/bin/racgwrap
Verifying file /s01/orabase/product/11.2.0/dbhome_1/bin/srvctl
Verifying file /s01/orabase/product/11.2.0/dbhome_1/bin/srvconfig
Verifying file /s01/orabase/product/11.2.0/dbhome_1/bin/cluvfy
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/racgwrap
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/srvctl
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/srvconfig
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/cluvfy
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/racgmain
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/racgeut
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/diskmon.bin
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/lsnodes
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/osdbagrp
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/rawutl
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/srvm/admin/ractrans
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/srvm/admin/getcrshome
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/gnsd
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/bin/crsdiag.pl
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libhasgen11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libclsra11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libdbcfg11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libocr11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libocrb11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libocrutl11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libuini11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/librdjni11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libgns11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libgnsjni11.so
Reapplying file permissions on /s01/orabase/product/11.2.0/dbhome_1/lib/libagfw11.so

###########################################################################

8. Restart the CRS stack as root:

# $CRS_HOME/crs/install/rootcrs.pl -patch 

2011-09-04 21:03:32: Parsing the host name
2011-09-04 21:03:32: Checking for super user privileges
2011-09-04 21:03:32: User has super user privileges
Using configuration parameter file: /g01/11.2.0/grid/crs/install/crsconfig_params
CRS-4123: Oracle High Availability Services has been started.

# $ORACLE_HOME/bin/srvctl start home -o $ORACLE_HOME -s $STATUS_FILE -n nodename
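For example, reusing the status file stop_db_res that srvctl stop home wrote in step 3 (node vrh1 as in this walk-through):

su - oracle
srvctl start home -o $ORACLE_HOME -s stop_db_res -n vrh1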

###########################################################################

9. Use opatch to confirm that the patch was installed successfully:

 opatch lsinventory -detail -oh $CRS_HOME
 opatch lsinventory -detail -oh $RDBMS_HOME

###########################################################################

10. Repeat the steps above on the remaining nodes, until the patch has been installed on every node.

###########################################################################

Note that there are additional caveats on the AIX platform:

# Special Instruction for AIX
# ---------------------------
#
# During the application of this patch, should you see any errors with
# regards to files being locked or opatch being unable to copy files,
# this could be the result of a process which requires termination or an
# additional file needing to be unloaded from the system cache.
#
# To try and identify the likely cause, please execute the following
# commands and provide the output to your support representative, who
# will be able to identify the corrective steps.
#
#     genld -l | grep [CRS_HOME]
#
#     genkld | grep [CRS_HOME]    ( full or partial path will do )
#
# Simple Case Resolution:
#
# If genld returns data then a currently executing process has something
# open in the [CRS_HOME] directory; please terminate the process as
# required/recommended.
#
# If genkld returns data then please remove the entries from the
# OS system cache by using the slibclean command as root:
#
#     slibclean
#
###########################################################################
#
#  Patch Deinstallation Instructions:
#  ----------------------------------
#
#  To roll back the patch, follow all of the above steps 1-5. In step 6,
#  invoke the following opatch commands to roll back the patch in all homes.
#
#  % opatch rollback -id 9413827 -local -oh [CRS_HOME]
#
#  % opatch rollback -id 9413827 -local -oh [RDBMS_HOME]
#
#  Afterwards, continue with steps 7-9 to complete the procedure.
#
###########################################################################
#
#  If you have any problems installing this PSE or are not sure
#  about inventory setup please call Oracle support.
#
###########################################################################

 

Upgrading GI to 11.2.0.2

 

1. Unzip the software; as noted above, the third zip file contains the grid software:

unzip p10098816_112020_Linux-x86-64_3of7.zip

 

2. As the GI owner, launch the GI/CRS OUI installer and choose an out-of-place installation directory:

(grid)$ unset ORACLE_HOME ORACLE_BASE ORACLE_SID
(grid)$ export DISPLAY=:0
(grid)$ cd /u01/app/oracle/patchdepot/grid
(grid)$ ./runInstaller
Starting Oracle Universal Installer…

On the "Select Installation Options" screen, choose "Upgrade Oracle Grid Infrastructure or Oracle Automatic Storage Management".

 

[screenshots: upgrade_110202_GI, upgrade_110202_GI_a]

 

Choose a directory different from the existing GI software home.

 

[screenshot: upgrade_110202_GI_b]

When the installation completes, you will be prompted to run rootupgrade.sh as root.

[screenshot: upgrade_110202_GI_c]

3. Note that database services remain available on all nodes right up until rootupgrade.sh is actually executed. While the rootupgrade.sh script runs, CRS on the local node is shut down briefly, so at least one node is unavailable at any given moment of the rolling upgrade.
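You can watch the rolling state from any node: until rootupgrade.sh has completed on the last node, the software version of an already-upgraded node runs ahead of the cluster-wide active version. A quick check with the standard crsctl queries:

crsctl query crs softwareversion      # GI binary version on the local node
crsctl query crs activeversion        # lowest version active across the cluster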

Because of unpublished bug 10011084 and unpublished bug 10128494, the crsconfig_lib.pm file must be modified before rootupgrade.sh is run, as follows:

cp $NEW_CRS_HOME/crs/install/crsconfig_lib.pm $NEW_CRS_HOME/crs/install/crsconfig_lib.pm.bak
vi $NEW_CRS_HOME/crs/install/crsconfig_lib.pm

Make the following changes in that file, and confirm them with the diff command:

From
 @cmdout = grep(/$bugid/, @output);
To
  @cmdout = grep(/(9655006|9413827)/, @output);

From
my @exp_func = qw(check_CRSConfig validate_olrconfig validateOCR
To
my @exp_func = qw(check_CRSConfig validate_olrconfig validateOCR read_file

$ diff crsconfig_lib.pm.bak crsconfig_lib.pm
699c699
< my @exp_func = qw(check_CRSConfig validate_olrconfig validateOCR
---
> my @exp_func = qw(check_CRSConfig validate_olrconfig validateOCR read_file
13277c13277
<   @cmdout = grep(/$bugid/, @output);
---
>   @cmdout = grep(/(9655006|9413827)/, @output);
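The two edits can also be scripted; a minimal sketch using GNU sed (assuming the backup taken with cp above already exists, and verifying the result with diff as shown):

sed -i \
 -e 's#@cmdout = grep(/$bugid/, @output);#@cmdout = grep(/(9655006|9413827)/, @output);#' \
 -e 's#qw(check_CRSConfig validate_olrconfig validateOCR$#& read_file#' \
 $NEW_CRS_HOME/crs/install/crsconfig_lib.pm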


Copy the modified file to all the other nodes:
scp /g01/11.2.0.2/grid/crs/install/crsconfig_lib.pm vrh2:/g01/11.2.0.2/grid/crs/install/crsconfig_lib.pm

If this feels like too much trouble, you can also download the already-modified crsconfig_lib.pm from here.

Because of bug 10056593 and bug 10241443, the following errors may also appear while rootupgrade.sh runs:

Due to bug 10056593, rootupgrade.sh will report this error and continue. This error is ignorable.

Failed to add (property/value):('OLD_OCR_ID/'-1') for checkpoint:ROOTCRS_OLDHOMEINFO.Error code is 256

Due to bug 10241443, rootupgrade.sh may report the following error when installing the cvuqdisk package.
This error is ignorable.

    ls: /usr/sbin/smartctl: No such file or directory
    /usr/sbin/smartctl not found.

The errors above can be ignored; they do not affect the upgrade.

4. Run the rootupgrade.sh script for real, preferably starting from the more heavily loaded node:

[root@vrh1 grid]# /g01/11.2.0.2/grid/rootupgrade.sh
Running Oracle 11g root script...

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /g01/11.2.0.2/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /g01/11.2.0.2/grid/crs/install/crsconfig_params
Creating trace directory
Failed to add (property/value):('OLD_OCR_ID/'-1') for checkpoint:ROOTCRS_OLDHOMEINFO.Error code is 256

ASM upgrade has started on first node.

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'vrh1'
CRS-2673: Attempting to stop 'ora.crsd' on 'vrh1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'vrh1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'vrh1'
CRS-2673: Attempting to stop 'ora.SYSTEMDG.dg' on 'vrh1'
CRS-2673: Attempting to stop 'ora.registry.acfs' on 'vrh1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.vrh1.vip' on 'vrh1'
CRS-2677: Stop of 'ora.vrh1.vip' on 'vrh1' succeeded
CRS-2672: Attempting to start 'ora.vrh1.vip' on 'vrh2'
CRS-2677: Stop of 'ora.registry.acfs' on 'vrh1' succeeded
CRS-2676: Start of 'ora.vrh1.vip' on 'vrh2' succeeded
CRS-2677: Stop of 'ora.SYSTEMDG.dg' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'vrh1'
CRS-2677: Stop of 'ora.asm' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'vrh1'
CRS-2673: Attempting to stop 'ora.eons' on 'vrh1'
CRS-2677: Stop of 'ora.ons' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'vrh1'
CRS-2677: Stop of 'ora.net1.network' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.eons' on 'vrh1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'vrh1' has completed
CRS-2677: Stop of 'ora.crsd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'vrh1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.evmd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.asm' on 'vrh1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'vrh1'
CRS-2677: Stop of 'ora.cssdmonitor' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.asm' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'vrh1'
CRS-2677: Stop of 'ora.cssd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.diskmon' on 'vrh1'
CRS-2677: Stop of 'ora.diskmon' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'vrh1'
CRS-2677: Stop of 'ora.gipcd' on 'vrh1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'vrh1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deleted 1 keys from OCR.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

 

The last node on which rootupgrade.sh is executed prints the following messages, confirming that GI/CRS was upgraded successfully:

 

Successfully deleted 1 keys from OCR.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9312: Existing ADVM/ACFS installation detected.
ACFS-9314: Removing previous ADVM/ACFS installation.
ACFS-9315: Previous ADVM/ACFS components successfully removed.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Started to upgrade the Oracle Clusterware. This operation may take a few minutes.
Started to upgrade the CSS.
Started to upgrade the CRS.
The CRS was successfully upgraded.
Oracle Clusterware operating version was successfully set to 11.2.0.2.0

ASM upgrade has finished on last node.

Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

5. Confirm the GI/CRS version:

su - grid

$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.2.0]


/g01/11.2.0.2/grid/OPatch/opatch lsinventory -oh /g01/11.2.0.2/grid
Invoking OPatch 11.2.0.1.1

Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation.  All rights reserved.

Oracle Home       : /g01/11.2.0.2/grid
Central Inventory : /g01/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.1
OUI version       : 11.2.0.2.0
OUI location      : /g01/11.2.0.2/grid/oui
Log file location : /g01/11.2.0.2/grid/cfgtoollogs/opatch/opatch2011-09-05_02-17-19AM.log

Patch history file: /g01/11.2.0.2/grid/cfgtoollogs/opatch/opatch_history.txt

Lsinventory Output file location : /g01/11.2.0.2/grid/cfgtoollogs/opatch/lsinv/lsinventory2011-09-05_02-17-19AM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1): 

Oracle Grid Infrastructure                                           11.2.0.2.0
There are 1 products installed in this Oracle Home.

6. Update the .bash_profile so that CRS_HOME, ORACLE_HOME, PATH, and related variables point to the new GI home.
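For example, appended to the grid user's ~/.bash_profile (a sketch; the paths follow this article's layout):

export CRS_HOME=/g01/11.2.0.2/grid
export ORACLE_HOME=$CRS_HOME
export PATH=$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:$PATH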

Adding Nodes to 11.2.0.2 Grid Infrastructure

In an earlier article I described the detailed steps for adding a node to a 10g RAC cluster. In 11gR2, Oracle CRS evolved into Grid Infrastructure; GI gives us more convenient control over CRS resources such as VIPs and ASM, and as a result adding a node to GI in 11.2 differs considerably from 10gR2.

Here is a brief walk-through of the key points of adding a node (ADD NODE) to GI in 11.2:

I. Preparation

Preparation must not be neglected. The prerequisites I listed in the 10g RAC add-node article still apply to 11.2 GI, but note the following points:

1. Configure user equivalence not only for the oracle user but also for the grid user (the GI installation owner), unless you install both GI and the RDBMS as oracle, which is not recommended.

2. 11.2 GI introduced the octssd (Oracle Cluster Time Synchronization Service daemon). If you plan to use octssd, it is recommended to disable the ntpd time service as follows:

# service ntpd stop
Shutting down ntpd:                                        [  OK  ]
# chkconfig ntpd off
# mv /etc/ntp.conf /etc/ntp.conf.orig
# rm /var/run/ntpd.pid

3. Use the cluvfy tool to verify that the new node meets the cluster's requirements:

cluvfy stage -pre nodeadd -n <NEW NODE>

For example:

su - grid

[grid@vrh1 ~]$ cluvfy stage -pre nodeadd -n vrh3

Performing pre-checks for node addition 

Checking node reachability...
Node reachability check passed from node "vrh1"

Checking user equivalence...
User equivalence check passed for user "grid"

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Node connectivity check passed

Checking CRS integrity...

CRS integrity check passed

Checking shared resources...

Checking CRS home location...
The location "/g01/11.2.0/grid" is not shared but is present/creatable on all nodes
Shared resources check for node addition passed

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"

Node connectivity check passed

Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "vrh3:/tmp"
Free disk space check passed for "vrh1:/tmp"
Check for multiple users with UID value 54322 passed
User existence check passed for "grid"
Run level check passed
Hard limits check failed for "maximum open file descriptors"
Check failed on nodes:
        vrh3
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make-3.81( x86_64)"
Package existence check passed for "binutils-2.17.50.0.6( x86_64)"
Package existence check passed for "gcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "glibc-2.5-24 (x86_64)( x86_64)"
Package existence check passed for "compat-libstdc++-33-3.2.3 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-0.125 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-devel-0.125( x86_64)"
Package existence check passed for "glibc-common-2.5( x86_64)"
Package existence check passed for "glibc-devel-2.5 (x86_64)( x86_64)"
Package existence check passed for "glibc-headers-2.5( x86_64)"
Package existence check passed for "gcc-c++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-devel-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "libgcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-devel-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "sysstat-7.0.2( x86_64)"
Package existence check passed for "ksh-20060214( x86_64)"
Check for multiple users with UID value 0 passed
Current group ID check passed

Checking OCR integrity...

OCR integrity check passed

Checking Oracle Cluster Voting Disk configuration...

Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed

Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
No NTP Daemons or Services were found to be running

Clock synchronization check using Network Time Protocol(NTP) passed

User "grid" is not part of "root" group. Check passed
Checking consistency of file "/etc/resolv.conf" across nodes

File "/etc/resolv.conf" does not have both domain and search entries defined
domain entry in file "/etc/resolv.conf" is consistent across nodes
search entry in file "/etc/resolv.conf" is consistent across nodes
All nodes have one search entry defined in file "/etc/resolv.conf"
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: vrh3

File "/etc/resolv.conf" is not consistent across nodes

Pre-check for node addition was unsuccessful on all the nodes.

Generally speaking, if we are not using DNS for name resolution, the resolv.conf inconsistency can be ignored; in silent install mode, however, it can keep the operation from completing, as described later.

II. Adding the new node to GI

Note that addNode.sh, the key script for adding a node to 11.2.0.2 GI, may be affected by a bug. According to the official documentation, to add a node through the interactive OUI you only need to run the addNode.sh script; in practice that is not what happens:

The documentation says:
Go to CRS_home/oui/bin and run the addNode.sh script on one of the existing nodes.
Oracle Universal Installer runs in add node mode and the Welcome page displays.
Click Next and the Specify Cluster Nodes for Node Addition page displays.

What we did:

addNode.sh must be run as the GI owner, normally the grid user, on an existing node where GI is running:

[grid@vrh1 ~]$ cd $ORA_CRS_HOME/oui/bin

[grid@vrh1 bin]$ ./addNode.sh
ERROR:
Value for CLUSTER_NEW_NODES not specified.

USAGE:
/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl {-pre|-post}

/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl -pre [-silent] CLUSTER_NEW_NODES={<node_list>}
/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl -pre [-silent] CLUSTER_NEW_NODES={<node_list>}
CLUSTER_NEW_VIRTUAL_HOSTNAMES={<vip_list>}

/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl -pre [-silent] -responseFile <response_file>
/g01/11.2.0/grid/cv/cvutl/check_nodeadd.pl -post [-silent]

Our intention was to use the interactive graphical OUI (runInstaller -addnode) to add the node, yet addNode.sh asks for parameters, and the check_nodeadd.pl script it calls runs in silent mode.

After searching MOS and Google, virtually every document recommends adding nodes in silent mode, so we had no choice but to fall back to a silent add. The silent add actually needs very few parameters, which is probably one reason it is so widely recommended. But here we ran into another problem.

Syntax:
./addNode.sh -silent
"CLUSTER_NEW_NODES={node2}"
"CLUSTER_NEW_PRIVATE_NODE_NAMES={node2-priv}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={node2-vip}"

In our case the exact command was:

./addNode.sh -silent
"CLUSTER_NEW_NODES={vrh3}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={vrh3-vip}"
"CLUSTER_NEW_PRIVATE_NODE_NAMES={vrh3-priv}"

Because of the silent mode, the command above produces no console output (it actually logs to /tmp/silentInstall.log). Removing the -silent parameter:

./addNode.sh  "CLUSTER_NEW_NODES={vrh3}"
"CLUSTER_NEW_VIRTUAL_HOSTNAMES={vrh3-vip}" "CLUSTER_NEW_PRIVATE_NODE_NAMES={vrh3-priv}"

Performing pre-checks for node addition 

Checking node reachability...
Node reachability check passed from node "vrh1"

Checking user equivalence...
User equivalence check passed for user "grid"

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Node connectivity check passed

Checking CRS integrity...

CRS integrity check passed

Checking shared resources...

Checking CRS home location...
The location "/g01/11.2.0/grid" is not shared but is present/creatable on all nodes
Shared resources check for node addition passed

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"

Node connectivity check passed

Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "vrh3:/tmp"
Free disk space check passed for "vrh1:/tmp"
Check for multiple users with UID value 54322 passed
User existence check passed for "grid"
Run level check passed
Hard limits check failed for "maximum open file descriptors"
Check failed on nodes:
        vrh3
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make-3.81( x86_64)"
Package existence check passed for "binutils-2.17.50.0.6( x86_64)"
Package existence check passed for "gcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "glibc-2.5-24 (x86_64)( x86_64)"
Package existence check passed for "compat-libstdc++-33-3.2.3 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-0.125 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-devel-0.125( x86_64)"
Package existence check passed for "glibc-common-2.5( x86_64)"
Package existence check passed for "glibc-devel-2.5 (x86_64)( x86_64)"
Package existence check passed for "glibc-headers-2.5( x86_64)"
Package existence check passed for "gcc-c++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-devel-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "libgcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-devel-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "sysstat-7.0.2( x86_64)"
Package existence check passed for "ksh-20060214( x86_64)"
Check for multiple users with UID value 0 passed
Current group ID check passed

Checking OCR integrity...

OCR integrity check passed

Checking Oracle Cluster Voting Disk configuration...

Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed

Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
No NTP Daemons or Services were found to be running

Clock synchronization check using Network Time Protocol(NTP) passed

User "grid" is not part of "root" group. Check passed
Checking consistency of file "/etc/resolv.conf" across nodes

File "/etc/resolv.conf" does not have both domain and search entries defined
domain entry in file "/etc/resolv.conf" is consistent across nodes
search entry in file "/etc/resolv.conf" is consistent across nodes
All nodes have one search entry defined in file "/etc/resolv.conf"
PRVF-5636 : The DNS response time for an unreachable node exceeded "15000" ms on following nodes: vrh3

File "/etc/resolv.conf" is not consistent across nodes

Checking VIP configuration.
Checking VIP Subnet configuration.
Check for VIP Subnet configuration passed.
Checking VIP reachability
Check for VIP reachability passed.

Pre-check for node addition was unsuccessful on all the nodes.

Before actually adding the node, addNode.sh itself calls cluvfy to verify that the new node qualifies, and refuses to go further if it does not. Since we already verified the new node's readiness, we can safely skip addNode.sh's own check. Let's look at what the addNode.sh script contains:

[grid@vrh1 bin]$ cat addNode.sh 

#!/bin/sh
OHOME=/g01/11.2.0/grid
INVPTRLOC=$OHOME/oraInst.loc
ADDNODE="$OHOME/oui/bin/runInstaller -addNode -invPtrLoc $INVPTRLOC ORACLE_HOME=$OHOME $*"
if [ "$IGNORE_PREADDNODE_CHECKS" = "Y" -o ! -f "$OHOME/cv/cvutl/check_nodeadd.pl" ]
then
        $ADDNODE
else
        CHECK_NODEADD="$OHOME/perl/bin/perl $OHOME/cv/cvutl/check_nodeadd.pl -pre $*"
        $CHECK_NODEADD
        if [ $? -eq 0 ]
        then
        $ADDNODE
        fi
fi

As you can see, the IGNORE_PREADDNODE_CHECKS environment variable controls whether the pre-add-node check is performed. We set it manually and run addNode.sh again:

export IGNORE_PREADDNODE_CHECKS=Y

./addNode.sh  "CLUSTER_NEW_NODES={vrh3}"
"CLUSTER_NEW_VIRTUAL_HOSTNAMES={vrh3-vip}" "CLUSTER_NEW_PRIVATE_NODE_NAMES={vrh3-priv}"
> add_node.log  2>&1

In another window, tail the log to monitor the node-addition progress:

tail -f add_node.log 

Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 5951 MB    Passed
Checking monitor: must be configured to display at least 256 colors.    Actual 16777216    Passed
Oracle Universal Installer, Version 11.2.0.2.0 Production
Copyright (C) 1999, 2010, Oracle. All rights reserved.

Performing tests to see whether nodes vrh2,vrh3 are available
............................................................... 100% Done.

.
-----------------------------------------------------------------------------
Cluster Node Addition Summary
Global Settings
   Source: /g01/11.2.0/grid
   New Nodes
Space Requirements
   New Nodes
      vrh3
         /: Required 6.66GB : Available 32.40GB
Installed Products
   Product Names
      Oracle Grid Infrastructure 11.2.0.2.0
      Sun JDK 1.5.0.24.08
      Installer SDK Component 11.2.0.2.0
      Oracle One-Off Patch Installer 11.2.0.0.2
      Oracle Universal Installer 11.2.0.2.0
      Oracle USM Deconfiguration 11.2.0.2.0
      Oracle Configuration Manager Deconfiguration 10.3.1.0.0
      Enterprise Manager Common Core Files 10.2.0.4.3
      Oracle DBCA Deconfiguration 11.2.0.2.0
      Oracle RAC Deconfiguration 11.2.0.2.0
      Oracle Quality of Service Management (Server) 11.2.0.2.0
      Installation Plugin Files 11.2.0.2.0
      Universal Storage Manager Files 11.2.0.2.0
      Oracle Text Required Support Files 11.2.0.2.0
      Automatic Storage Management Assistant 11.2.0.2.0
      Oracle Database 11g Multimedia Files 11.2.0.2.0
      Oracle Multimedia Java Advanced Imaging 11.2.0.2.0
      Oracle Globalization Support 11.2.0.2.0
      Oracle Multimedia Locator RDBMS Files 11.2.0.2.0
      Oracle Core Required Support Files 11.2.0.2.0
      Bali Share 1.1.18.0.0
      Oracle Database Deconfiguration 11.2.0.2.0
      Oracle Quality of Service Management (Client) 11.2.0.2.0
      Expat libraries 2.0.1.0.1
      Oracle Containers for Java 11.2.0.2.0
      Perl Modules 5.10.0.0.1
      Secure Socket Layer 11.2.0.2.0
      Oracle JDBC/OCI Instant Client 11.2.0.2.0
      Oracle Multimedia Client Option 11.2.0.2.0
      LDAP Required Support Files 11.2.0.2.0
      Character Set Migration Utility 11.2.0.2.0
      Perl Interpreter 5.10.0.0.1
      PL/SQL Embedded Gateway 11.2.0.2.0
      OLAP SQL Scripts 11.2.0.2.0
      Database SQL Scripts 11.2.0.2.0
      Oracle Extended Windowing Toolkit 3.4.47.0.0
      SSL Required Support Files for InstantClient 11.2.0.2.0
      SQL*Plus Files for Instant Client 11.2.0.2.0
      Oracle Net Required Support Files 11.2.0.2.0
      Oracle Database User Interface 2.2.13.0.0
      RDBMS Required Support Files for Instant Client 11.2.0.2.0
      RDBMS Required Support Files Runtime 11.2.0.2.0
      XML Parser for Java 11.2.0.2.0
      Oracle Security Developer Tools 11.2.0.2.0
      Oracle Wallet Manager 11.2.0.2.0
      Enterprise Manager plugin Common Files 11.2.0.2.0
      Platform Required Support Files 11.2.0.2.0
      Oracle JFC Extended Windowing Toolkit 4.2.36.0.0
      RDBMS Required Support Files 11.2.0.2.0
      Oracle Ice Browser 5.2.3.6.0
      Oracle Help For Java 4.2.9.0.0
      Enterprise Manager Common Files 10.2.0.4.3
      Deinstallation Tool 11.2.0.2.0
      Oracle Java Client 11.2.0.2.0
      Cluster Verification Utility Files 11.2.0.2.0
      Oracle Notification Service (eONS) 11.2.0.2.0
      Oracle LDAP administration 11.2.0.2.0
      Cluster Verification Utility Common Files 11.2.0.2.0
      Oracle Clusterware RDBMS Files 11.2.0.2.0
      Oracle Locale Builder 11.2.0.2.0
      Oracle Globalization Support 11.2.0.2.0
      Buildtools Common Files 11.2.0.2.0
      Oracle RAC Required Support Files-HAS 11.2.0.2.0
      SQL*Plus Required Support Files 11.2.0.2.0
      XDK Required Support Files 11.2.0.2.0
      Agent Required Support Files 10.2.0.4.3
      Parser Generator Required Support Files 11.2.0.2.0
      Precompiler Required Support Files 11.2.0.2.0
      Installation Common Files 11.2.0.2.0
      Required Support Files 11.2.0.2.0
      Oracle JDBC/THIN Interfaces 11.2.0.2.0
      Oracle Multimedia Locator 11.2.0.2.0
      Oracle Multimedia 11.2.0.2.0
      HAS Common Files 11.2.0.2.0
      Assistant Common Files 11.2.0.2.0
      PL/SQL 11.2.0.2.0
      HAS Files for DB 11.2.0.2.0
      Oracle Recovery Manager 11.2.0.2.0
      Oracle Database Utilities 11.2.0.2.0
      Oracle Notification Service 11.2.0.2.0
      SQL*Plus 11.2.0.2.0
      Oracle Netca Client 11.2.0.2.0
      Oracle Net 11.2.0.2.0
      Oracle JVM 11.2.0.2.0
      Oracle Internet Directory Client 11.2.0.2.0
      Oracle Net Listener 11.2.0.2.0
      Cluster Ready Services Files 11.2.0.2.0
      Oracle Database 11g 11.2.0.2.0
-----------------------------------------------------------------------------

Instantiating scripts for add node (Monday, August 15, 2011 10:15:35 PM CST)
.                                                                 1% Done.
Instantiation of add node scripts complete

Copying to remote nodes (Monday, August 15, 2011 10:15:38 PM CST)
...............................................................................................                                 96% Done.
Home copied to new nodes

Saving inventory on nodes (Monday, August 15, 2011 10:21:02 PM CST)
.                                                               100% Done.
Save inventory complete
WARNING:A new inventory has been created on one or more nodes in this session.
However, it has not yet been registered as the central inventory of this system.
To register the new inventory please run the script at '/g01/oraInventory/orainstRoot.sh'
with root privileges on nodes 'vrh3'.
If you do not register the inventory, you may not be able to update or
patch the products you installed.
The following configuration scripts need to be executed as the "root" user in each cluster node.
/g01/oraInventory/orainstRoot.sh #On nodes vrh3
/g01/11.2.0/grid/root.sh #On nodes vrh3
To execute the configuration scripts:
    1. Open a terminal window
    2. Log in as "root"
    3. Run the scripts in each cluster node

The Cluster Node Addition of /g01/11.2.0/grid was successful.
Please check '/tmp/silentInstall.log' for more details.

The GI software installation has succeeded, but we still need to run two key scripts on the newly added node; do not forget this step!

The orainstRoot.sh and root.sh scripts must be run as root:
su - root

[root@vrh3]# cat /etc/oraInst.loc
inventory_loc=/g01/oraInventory                     -- the oraInventory location
inst_group=asmadmin

[root@vrh3 ~]# cd /g01/oraInventory

[root@vrh3 oraInventory]# ./orainstRoot.sh
Creating the Oracle inventory pointer file (/etc/oraInst.loc)
Changing permissions of /g01/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /g01/oraInventory to asmadmin.
The execution of the script is complete.

Run the root.sh script under the CRS_HOME; warnings may appear, but they are not critical:

[root@vrh3 ~]# cd $ORA_CRS_HOME

[root@vrh3 g01]# /g01/11.2.0/grid/root.sh
Running Oracle 11g root script...

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /g01/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

Using configuration parameter file: /g01/11.2.0/grid/crs/install/crsconfig_params
Creating trace directory
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node vrh1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
/g01/11.2.0/grid/bin/srvctl start listener -n vrh3 ... failed
Failed to perform new node configuration at /g01/11.2.0/grid/crs/install/crsconfig_lib.pm line 8255.
/g01/11.2.0/grid/perl/bin/perl -I/g01/11.2.0/grid/perl/lib -I/g01/11.2.0/grid/crs/install 
/g01/11.2.0/grid/crs/install/rootcrs.pl execution failed

Two minor errors appeared above:

1. The listener startup failure on the new node can be ignored: the RDBMS_HOME has not been installed yet, but CRS nevertheless tries to start the associated listener:

[root@vrh3 g01]# /g01/11.2.0/grid/bin/srvctl start listener -n vrh3
PRCR-1013 : Failed to start resource ora.CRS_LISTENER.lsnr
PRCR-1064 : Failed to start resource ora.CRS_LISTENER.lsnr on node vrh3
CRS-5010: Update of configuration file "/s01/orabase/product/11.2.0/dbhome_1/network/admin/listener.ora" failed: details at "(:CLSN00014:)" in "/g01/11.2.0/grid/log/vrh3/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-5013: Agent "/g01/11.2.0/grid/bin/oraagent.bin" failed to start process "/s01/orabase/product/11.2.0/dbhome_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/g01/11.2.0/grid/log/vrh3/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-2674: Start of 'ora.CRS_LISTENER.lsnr' on 'vrh3' failed
CRS-5013: Agent "/g01/11.2.0/grid/bin/oraagent.bin" failed to start process "/s01/orabase/product/11.2.0/dbhome_1/bin/lsnrctl" for action "clean": details at "(:CLSN00008:)" in "/g01/11.2.0/grid/log/vrh3/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-5013: Agent "/g01/11.2.0/grid/bin/oraagent.bin" failed to start process "/s01/orabase/product/11.2.0/dbhome_1/bin/lsnrctl" for action "check": details at "(:CLSN00008:)" in "/g01/11.2.0/grid/log/vrh3/agent/crsd/oraagent_oracle/oraagent_oracle.log"
CRS-2678: 'ora.CRS_LISTENER.lsnr' on 'vrh3' has experienced an unrecoverable failure
CRS-0267: Human intervention required to resume its availability.
PRCC-1015 : LISTENER was already running on vrh3
PRCR-1004 : Resource ora.LISTENER.lsnr is already running

2. If the rootcrs.pl script fails, running it a second time is usually enough:

[root@vrh3 bin]# /g01/11.2.0/grid/perl/bin/perl -I/g01/11.2.0/grid/perl/lib
-I/g01/11.2.0/grid/crs/install /g01/11.2.0/grid/crs/install/rootcrs.pl

Using configuration parameter file: /g01/11.2.0/grid/crs/install/crsconfig_params
PRKO-2190 : VIP exists for node vrh3, VIP name vrh3-vip
PRKO-2420 : VIP is already started on node(s): vrh3
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

3. It is recommended to restart CRS on the new node and use cluvfy to verify that the node addition completed successfully:

[root@vrh3 ~]# crsctl stop crs

[root@vrh3 ~]# crsctl start crs

[root@vrh3 ~]# su - grid

[grid@vrh3 ~]$ cluvfy stage -post nodeadd -n vrh1,vrh2,vrh3

Performing post-checks for node addition 

Checking node reachability...
Node reachability check passed from node "vrh1"

Checking user equivalence...
User equivalence check passed for user "grid"

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Node connectivity check passed

Checking cluster integrity...

Cluster integrity check passed

Checking CRS integrity...

CRS integrity check passed

Checking shared resources...

Checking CRS home location...
The location "/g01/11.2.0/grid" is not shared but is present/creatable on all nodes
Shared resources check for node addition passed

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"

Node connectivity check passed

Checking node application existence...

Checking existence of VIP node application (required)
VIP node application check passed

Checking existence of NETWORK node application (required)
NETWORK node application check passed

Checking existence of GSD node application (optional)
GSD node application is offline on nodes "vrh3,vrh2,vrh1"

Checking existence of ONS node application (optional)
ONS node application check passed

Checking Single Client Access Name (SCAN)...

Checking TCP connectivity to SCAN Listeners...
TCP connectivity to SCAN Listeners exists on all cluster nodes

Checking name resolution setup for "vrh.cluster.oracle.com"...

ERROR:
PRVF-4664 : Found inconsistent name resolution entries for SCAN name "vrh.cluster.oracle.com"

ERROR:
PRVF-4657 : Name resolution setup check for "vrh.cluster.oracle.com" (IP address: 192.168.1.190) failed

ERROR:
PRVF-4664 : Found inconsistent name resolution entries for SCAN name "vrh.cluster.oracle.com"

Verification of SCAN VIP and Listener setup failed

User "grid" is not part of "root" group. Check passed

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
CTSS resource check passed

Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed

Check CTSS state started...
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
Check of clock time offsets passed

Oracle Cluster Time Synchronization Services check passed

Post-check for node addition was successful.

Applying 11G R2 GI PSU 11.2.0.2.3

GI PSU 11.2.0.2.3 was released with the most recent CPU (July). This Patch Set Update contains the latest CPU fixes and includes both the GI and the Database PSU; it can be downloaded from the <Patch 12419353: GI PSU 11.2.0.2.3 (INCLUDES DATABASE PSU 11.2.0.2.3)> page. The bugs fixed by this PSU include:
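Mechanically, a GI PSU is normally applied with the opatch auto facility of the latest OPatch; a minimal sketch, assuming the PSU zip was unpacked to /tmp/12419353 and using this article's GI home (run as root, one node at a time):

/g01/11.2.0.2/grid/OPatch/opatch auto /tmp/12419353

opatch auto stops and restarts the local stack itself and patches both the GI home and the database homes registered on the node.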

CPU molecules in GI PSU 11.2.0.2.3:

GI PSU 11.2.0.2.3 contains the following new CPU 11.2.0.2 molecules:

12586486 - DB-11.2.0.2-MOLECULE-004-CPUJUL2011

12586487 - DB-11.2.0.2-MOLECULE-005-CPUJUL2011

12586488 - DB-11.2.0.2-MOLECULE-006-CPUJUL2011

12586489 - DB-11.2.0.2-MOLECULE-007-CPUJUL2011

12586490 - DB-11.2.0.2-MOLECULE-008-CPUJUL2011

12586491 - DB-11.2.0.2-MOLECULE-009-CPUJUL2011

12586492 - DB-11.2.0.2-MOLECULE-010-CPUJUL2011

12586493 - DB-11.2.0.2-MOLECULE-011-CPUJUL2011

12586494 - DB-11.2.0.2-MOLECULE-012-CPUJUL2011

12586495 - DB-11.2.0.2-MOLECULE-013-CPUJUL2011

12586496 - DB-11.2.0.2-MOLECULE-014-CPUJUL2011

5.2 Bugs Fixed in GI PSU 11.2.0.2.3
GI PSU 11.2.0.2.3 contains all fixes previously released in GI PSU 11.2.0.2.2
(see Section 5.3 for a list of these bug fixes) and the following new fixes:

Note:

ACFS is not supported on HP and therefore the bug fixes for ACFS do not apply to the HP GI PSU 3. 

Automatic Storage Management

6892311 - PROVIDE REASON FOR MOUNT FORCE FAILURE WITHOUT REQUIRING PST DUMP

9078442 - ORA-19762 FROM ASMCMD CP COPYING FILE WITH DIFFERENT BYTE ORDER FROM FILESYSTEM

9572787 - LONG WAITS FOR ENQ: AM CONTENTION FOLLOWING CELL CRASH CAUSED CLUSTERWIDE OUTAGE

9953542 - TB_SOL_SP: HIT 7445 [KFKLCLOSE()+20] ERROR WHEN DG OFFLINE

10040921 - HUNG DATABASE WORKLOAD AND BACKGROUNDS AFTER INDUCING WRITE ERRORS ON AVD VOLUME

10155605 - 11201-OCE:DISABLE FC IN ONE NODE, ASM DISKGOUP FORCE DISMOUNTED IN OTHER NODES.

10278372 - TB:X:CONSISTENTLY PRINT "WARNING: ELAPSED TIME DID NOT ADVANCE" IN ASM ALERT LOG

10310299 - TB:X:LOST WRITES DUE TO RESYNC MISSING EXTENTS WHEN DISK GO OFFLINE DURING REBAL

10324294 - DBMV2: DBFS INSTANCE WAITS MUCH FOR "ASM METADATA FILE OPERATION"

10356782 - DBMV2+: ASM INSTANCE CRASH WITH ORA-600 : [KFCGET0_04], [25],

10367188 - TB:X:REBOOT 2 CELL NODES,ASM FOREGROUND PROCESS HIT ORA-600[KFNSMASTERWAIT01]

10621169 - FORCE DISMOUNT IN ASM RECOVERY MAY DROP REDO'S AND CAUSE METADATA CORRUPTIONS

11065646 - ASM MAY PICK INCORRECT PST WHEN MULTIPLE COPIES EXTANT

11664719 - 11203_ASM_X64:ARB0 STUCK IN DG REBALANCE

11695285 - ORA-15081 I/O WRITE ERROR OCCURED AFTER CELL NODE FAILURE TEST

11707302 - FOUND CORRUPTED ASM FILES AFTER CELL NODES FAILURE TESTING.

11707699 - DATABASE CANNOT MOUNT DUE TO ORA-00214: CONTROL FILE INCONSISTENCY

11800170 - ASM IN KSV WAIT AFTER APPLICATION OF 11.2.0.2 GRID PSU

11800854 - BUG TO TRACK LRG 5135625

12620422 - FAILED TO ONLINE DISKS BECAUSE OF A POSSIBLE RACING RESYNC

Buffer Cache Management

11674485 - LOST DISK WRITE INCORRECTLY SIGNALLED IN STANDBY DATABASE WHEN APPLYING REDO

Generic

9748749 - ORA-7445 [KOXSS2GPAGE]

10082277 - EXCESSIVE ALLOCATION IN PCUR OF "KKSCSADDCHILDNO" CAUSES ORA-4031 ERRORS

10126094 - ORA-600 [KGLLOCKOWNERSLISTDELETE] OR [KGLLOCKOWNERSLISTAPPEND-OVF]

10142788 - APPS 11I PL/SQL NCOMP:ORA-04030: OUT OF PROCESS MEMORY

10258337 - UNUSABLE INDEX SEGMENT NOT REMOVED FOR "ALTER TABLE MOVE"

10378005 - EXPDP RAISES ORA-00600[KOLRARFC: INVALID LOB TYPE], EXP IS SUCCESSFUL

10636231 - HIGH VERSION COUNT FOR INSERT STATEMENTS WITH REASON INST_DRTLD_MISMATCH

12431716 - UNEXPECTED CHANGE IN MUTEX WAIT BEHAVIOUR IN 11.2.0.2.2 PSU (HIGHER CPU POSSIBLE

High Availability

9869401 - REDO TRANSPORT COMPRESSION (RTC) MESSAGES APPEARING IN ALERT LOG

10157249 - CATALOG UPGRADE TO 11.2.0.2 FAILS WITH ORA-1

10193846 - RMAN DUPLICATE FAILS WITH ORA-19755 WHEN BCT FILE OF PRIMARY IS NOT ACCESSIBLE

10648873 - SR11.2.0.3TXN_REGRESS - TRC - KCRFW_REDO_WRITE

11664046 - STBH: WRONG SEQUENCE NUMBER GENERATED AFTER DB SWITCHOVER FROM STBY TO PRIMARY

Oracle Portable ClusterWare

8906163 - PE: NETWORK AND VIP RESOURCES FAIL TO START IN SOLARIS CONTAINERS

9593552 - GIPCCONNECT() IS NOT ASYNC 11.2.0.2GIBTWO

9897335 - TB-ASM: UNNECCESSARY OCR OPERATION LOG MESSAGES IN ASM ALERT LOG WITH ASM OCR

9902536 - LNX64-11202-MESSAGE: EXCESSIVE GNS LOGGING IN CRS ALERT FILE WHEN SELFCHECK FAIL

9916145 - LX64: INTERNAL ERROR IN CRSD.LOG, MISROUTED REQUEST, ASSERT IN CLSM2M.CPP

9916435 - ROOTCRS.PL FAILS TO CREATE NODEAPPS DURING ADD NODE OPERATION

9939306 - SERVICES NOT COMING UP AFTER SWITCHOVER USING SRVCTL START DATABASE

10012319 - ORA-600 [KFDVF_CSS], [19], [542] ON STARTUP OF ASM DURING ADDNODE

10019726 - MEMORY LEAK 1.2MB/HR IN CRSD.BIN ON NON-N NODE

10056713 - LNX64-11202-CSS: SPLIT BRAIN WHEN START CRS STACK IN PARALLEL WITH PRIV NIC DOWN

10103954 - INTERMITTENT "CANNOT COMMUNICATE WITH CRSD DAEMON" ERRORS

10104377 - GIPC ENSURE INITIAL MESSAGE IS NOT LOST DURING ESTABLISH PHASE

10115514 - SOL-X64-11202: CLIENT REGISTER IN GLOBAL GROUP MASTER#DISKMON#GROUP#MX NOT EXIT

10190153 - HPI-SG-11202 ORA.CTSSD AND ORA.CRSD GOES OFFLINE AFTER KILL GIPC ON CRS MASTER

10231906 - 11202-OCE-SYMANTEC:DOWN ONE OF PRIVAE LINKS ON NODE 3,OCSSD CRASHED ON NODE 3

10233811 - AFTER PATCHING GRID HOME, UNABLE TO START RESOURCES DBFS AND GOLDEN

10253630 - TB:X:HANG DETECTED,"WAITING FOR INSTANCE RECOVERY OF GROUP 2" FOR 45 MINUTES

10272615 - TB:X:SHUTDOWN SERVICE CELLD ON 2 CELL NODES,CSSD ABORT IN CLSSNMRCFGMGRTHREAD

10280665 - TB:X:STOP CELLD ON 2 CELL NODES,CSSD ABORT IN CLSSNMVVERIFYPENDINGCONFIGVFS

10299006 - AFTER 11.2.0.2 UPGRADE, ORAAGENT.BIN CONNECTS TO DATABASE WITH TOO MANY SESSIONS

10322157 - 11202_GIBONE: PERM OF FILES UNDER $CH/CRS/SBS CHANGED AFTER PATCHED

10324594 - STATIC ENDPOINT IN THE LEASE BLOCKS OVERWRITTEN DURING UPGRADE

10331452 - SOL-11202-UD: 10205->11202 NETWORK RES USR_ORA_IF VALUE MISSED AFTER UPGRADE

10357258 - SOL-11202-UD: 10205->11202 [IPMP] HUNDREDS OF DUP IP AFTER INTRA-NODE FAILOVER

10361177 - LNX64-11203-GNS: MANY GNS SELF CHECK FAILURE ALERT MESSAGES

10385838 - TB:X:CSS CORE DUMP AT GIPCHAINTERNALSEND

10397652 - AIX-11202-GIPC:DISABLE SWITCH PORT FOR ONE PRIVATE NIC,HAIP DID NOT FAILOVER

10398810 - DOUBLE FREE IN SETUPWORK DUE TO TIMING

10419987 - PEER LISTENER IS ACCESSING A GROCK THAT IS ALREADY DELETED

10621175 - TB_RAC_X64:X: CLSSSCEXIT: CSSD SIGNAL 11 IN THREAD GMDEATHCHECK

10622973 - LOSS OF LEGACY FEATURES IN 11.2

10631693 - TB:X:CLSSNMHANDLEVFDISCOVERACK: NO PENDINGCONFIGURATION TO COMPLETE. CSS ABORT

10637483 - TB:X:REBOOT ONE CELL NODE, CSS ABORT AT CLSSNMVDDISCTHREAD

10637741 - HARD STOP DEPENDENCY CAN CAUSE WRONG FAIL-OVER ORDER

10638381 - 11202-OCE-SYMANTEC: HAIP FAIL TO START WHEN PRIVATE IP IS PLUMBED ON VIRTUAL NIC

11069614 - RDBMS INSTANCE CRASH DUE TO SLOW REAP OF GIPC MESSAGES ON CMT SYSTEMS

11071429 - PORT 11GR2 CRS TO EL6

11654726 - SCAN LISTENER STARTUP FAILS IF /VAR/OPT/ORACLE/LISTENER.ORA EXISTS.

11663339 - DBMV2:SHARED PROCESS SPINNING CAUSES DELAY IN PRIMARY MEMBER CLEANUP

11682409 - RE-USING OCI MEMORY ACROSS CONNECTIONS CAUSES A MEMORY CORRUPTION

11698552 - SRVCTL REPORT WRONG STATUS FOR DATABASE INSTANCE.

11741224 - INCORRECT ACTIVE VERSION CHECK WHILE ENABLING THE BATCH FUNCTIONALITY

11744313 - LNX64-11203-RACG: UNEXPECTED CRSD RESTART DURING PARALLEL STACK START

11775080 - ORA-29701/29702 OCCURS WHEN WORKLOAD TEST RUNNING FOR A LONG TIME AND IS RESTART

11781515 - EVMD/CRSD FAIL TO START AFTER REBOOT, EVEN AFTER CRSCTL START CLUSTERWARE

11807012 - LNX64-11203-RACG: DB SERVICE RUNS INTO "UNKNOWN" STATE AFTER STACK START

11812615 - LNX64-11203-DIT: INCONSISTENT PERMISSION BEFORE/AFTER ROOTCRS.PL -UNLOCK/-PATCH

11828633 - DATABASE SERVICE DID NOT FAIL OVER AND COULD NOT BE STARTED AFTER NODE FAILURE

11840629 - KERNEL CRASH DUMP AND REBOOT FAIL INSIDE SOLARIS CONTAINER

11866171 - ENABLE CRASHDUMP WHEN REBOOTING THE MACHINE (LINUX)

11877079 - HUNDREDS OF ORAAGENT.BIN@HOSTNAME SESSSIONS IN 11.2.0.2 DATABASE

11899801 - 11202_GIBTWO_HPI:AFTER KILL ASM PMON, POLICY AND ADMIN DB RUNNING ON SAME SERVER

11904778 - LNX64-OEL6-11202: CRS STACK CAN'T BE START AFTER RESTART

11933693 - 11.1.0.7 DATABASE INSTANCE TERMINATED BY 11.2.0.2 CRS AGENT

11936945 - CVU NOT RECOGNIZING THE OEL6 ON LINUX

12332919 - ORAAGENT KEEPS EXITING

12340501 - SRVCTL SHOWS INSTANCE AS DOWN AFTER RELOCATION

12340700 - EVMD CONF FILES CAN HAVE WRONG PERMISSIONS AFTER INSTALL

12349848 - LNX64-11203: VIPS FELL OFFLINE WHEN BRING DOWN 3/4 PUBLIC NICS ONE BY ONE

12378938 - THE LISTENER STOPS WHEN THE ORA.NET1.NETWORK'S STATE IS CHANGED TO UNKNOWN

12380213 - 11203_110415:ERROR EXCEPTION WHILE INSTALLATION 11202 DB WITH DATAFILES ON 11203

12399977 - TYPO IN SUB PERFORM_START_SERVICE RETURNS ZERO (SUCCESS) EVEN WHEN FAILED

12677816 - SCAN LISTENER FAILD TO STARTUP IF /VAR/OPT/ORACLE/LISTENER.ORA EXIST

Oracle Space Management

8223165 - ORA-00600 [KTSXTFFS2] AFTER DATABASE STARTUP

9443361 - WRONG RESULTS (ROWDATA) FOR SELECT IN SERIAL FROM COMPRESSED TABLE

10061015 - LNX64-11202:HIT MANY ORA-600 ARGUMENTS: [KTFBHGET:CLSVIOL_KCBGCUR_9] DURING DBCA

10132870 - INDEX BLOCK CORRUPTION - ORA-600 [KCBZPBUF_2], [6401] ON RECOVER

10324526 - ORA-600 [KDDUMMY_BLKCHK] [6106] WHEN UPDATE SUBPARTITION OF TABLE IN TTS

Oracle Transaction Management

10053725 - TS11.2.0.3V3 - TRC - K2GUPDATEGLOBALPREPARECOUNT

10233732 - ORA-600 [K2GUGPC: PTCNT >= TCNT] OCCURS IN A DATABASE LINK TRANSACTION

Oracle Universal Storage Management

9867867 - SUSE10-LNX64-11202:NODE REBOOT HANG WHILE ORACLE_HOME LOCATED ON ACFS

9936659 - LNX64-11202-CRS: ORACLE HOME PUT ON ACFS, DB INST FAILS TO RESTART AFTER CRASH

9942881 - TIGHTEN UP KILL SEMANTICS FOR 'CLEAN' ACTION.

10113899 - AIX KSPRINTTOBUFFER TIMESTAMPS NEEDS TIME SINCE BOOT AND WALL_CLOCK TIMES

10266447 - ROOTUPGRADE.SH FAILS: 'FATAL: MODULE ORACLEOKS NOT FOUND' , ACFS-9121, ACFS-9310

11789566 - ACFS RECOVERY PHASE 2

11804097 - GBM LOCK TAKEN WHEN DETERMINING WHETHER THE FILE SYSTEM IS MOUNTED AND ONLINE

11846686 - ACFSROOT FAILS ON ORACLELINUX-RELEASE-5-6.0.1 RUNNUNG A 2.6.18 KERNEL

12318560 - ALLOW IOS TO RESTART WHEN WRITE ERROR MESG RETURNS SUCCESS

12326246 - ASM TO RETURN DIFF VALUES WHEN OFFLINE MESG UNSUCCESSFUL

12378675 - AIX-11203-HA-ACFS: HIT INVALID ASM BLOCK HEADER WHEN CONFIGURE DG USING AIX LVS

12398567 - ACFS FILE SYSTEM NOT ACCESSIBLE

12545060 - CHOWN OR RM CMD TO LOST+FOUND DIR IN ACFS FAILS ON LINUX

Oracle Utilities

9735282 - GETTING ORA-31693, ORA-2354, ORA-1426 WHEN IMPORTING PARTITIONED TABLE

Oracle Virtual Operating System Services

10317487 - RMAN CONTROLFILE BACKUP FAILS WITH ODM ERROR ORA-17500 AND ORA-245

11651810 - STBH: HIGH HARD PARSING DUE TO FILEOPENBLOCK EATING UP SHARED POOL MEMORY

XML Database

10368698 - PERF ISSUE WITH UPDATE RESOURCE_VIEW DURING AND AFTER UPGRADING TO 11.2.0.2

5.3 Bugs Fixed in GI PSU 11.2.0.2.2
This section describes bugs fixed in the GI PSU 11.2.0.2.2 release.

ACFS

10015603 - KERNEL PANIC IN OKS DRIVER WHEN SHUTDOWING CRS STACK

10178670 - ACFS VOLUMES ARE NOT MOUNTING ONCE RESTARTED THE SERVER

10019796 - FAIL TO GET ENCRYPTION STATUS OF FILES UNTIL DOING ENCR OP FIRST

10029794 - THE DIR CAN'T READ EVEN IF THE DIR IS NOT IN ANY REALM

10056808 - MOUNT ACFS FS FAILED WHEN FS IS FULL

10061534 - DB INSTANCE TERMINATED DUE TO ORA-445 WHEN START INSTANCE ON ALL NODES

10069698 - THE EXISTING FILE COULD CORRUPT IF INPUT INCORRECT PKCS PASSOWRD

10070563 - MULTIPLE WRITES TO THE SAME BLOCK WITH REPLICATION ON CAN GO OUT OF ORDER

10087118 - UNMOUNT PANICS IF ANOTHER USER IS SITTING IN A SNAPSHOT ROOT DIRECTORY

10216878 - REPLI-RELATED RESOURCE FAILED TO FAILOVER WHEN DG DISMOUNTED

10228079 - MOUTING DG ORA-15196 [KFC.C:25316] [ENDIAN_KFBH] AFTER NODE REBOOT

10241696 - FAILED TO MOUNT ACFS FS TO DIRECTORY CREATED ON ANOTHER ACFS FS

10252497 - ADVM/ACFS FAILS TO INSTALL ON SLES10

9861790 - LX64: ADVM DRIVERS HANGING OS DURING ACFS START ATTEMPTS

9906432 - KERNEL PANIC WHILE DISMOUNT ACFS DG FORCE

9975343 - FAIL TO PREPARE SECURITY IF SET ENCRYPTION FIRST ON THE OTHER NODE

10283549 - FIX AIX PANIC AND REMOVE -DAIX_PERF

10283596 - ACFS:KERNEL PANIC DURING USM LABEL PATCHING - ON AIX

10326548 - WRITE-PROTETED ACFS FILES SHOULD NOT BE DELETED BY NON-ROOT USER

ADVM

10045316 - RAC DB INSTALL ON SHARED ACFS HANGS AT LINKING PHASE

10283167 - ASM INSTANCE CANNOT STARTUP DUE TO EXISTENCE OF VMBX PROCESS

10268642 - NODE PANIC FOR BAD TRAP IN "ORACLEADVM" FOR NULL POINTER

10150020 - LINUX HANGS IN ADVM MIRROR RECOVERY, AFTER ASM EVICTIONS

Automatic Storage Management

9788588 - STALENESS REGISTRY MAY GET CLEARED PREMATURELY

10022980 - DISK NOT EXPELLED WHEN COMPACT DISABLED

10040531 - ORA-600 [KFRHTADD01] TRYING TO MOUNT RECO DISKGROUP

10209232 - STBH: DB STUCK WITH A STALE EXTENT MAP AND RESULTS IN DATA CORRUPTIONS

10073683 - ORA-600 [KFCBINITSLOT40] ON ASM ON DBMV2 WITH BP5

9715581 - DBMV2: EXADATA AUTO MANAGEMENT FAILED TO BRING UP DISKS ONLINE

10019218 - ASM DROPPED DISKS BEFORE DISK_REPAIR_TIME EXPIRED

10084145 - DBMV2: ORA-600 [1427] MOUNTING DISKGROUP AFTER ALL CELLS RESTARTED

11067567 - KEPT GENERATING "ELAPSED TIME DID NOT ADVANCE " IN ASM ALERT LOG

10356513 - DISK OFFLINED WITH NON ZERO TIMEOUT EXPELLED IMMEDIATELY

10332589 - TB:X:MOUNT NORMAL REDUNDANCY DG, FAILED WITH ORA-00600:[KFCINITRQ20]

10329146 - MARKING DIFFERENT SR BITS FROM MULTIPLE DBWS CAN CAUSE A LOST WRITE

10299224 - TB:X:PIVOTING AN EXTENT ON AN OFFLINE DISK CAN CAUSE STALE XMAPS IN RDBMS

10245086 - ORA-01210 DURING CREATE TABLESPACE

10230571 - TB:X:REBOOT ONE CELL NODE, RBAL HIT ORA-600[17183]

10228151 - ASM DISKGROUPS NOT GETTING MOUNTED

10227288 - DG FORCIBLY DISMOUNTED AFTER ONE FG LOST DUE TO "COULD NOT READ PST FOR GRP 4"

10222719 - ASM INSTANCE HANGS WITH RBAL PROCESS WAITS ON "NO FREE BUFFER"

10102506 - DISK RESYNC TAKES A LONG TIME EVEN WITH NO STALE EXTENTS

10094201 - DISK OFFLINE IS SLOW

10190642 - ORA-00600: [1433] FOLLOWED BY INSTANCE CRASH WITH ASM ON EXADATA

Buffer Cache Management

9651350 - ora-00308 and ora-27037 when ora-8103 without event 10736 been set

10110863 - trace files is still generated after applying patch:9651350

10205230 - tb_x64: hit ora-00600: [kclwcrs_6]

10332111 - sql running long in active standby

CRS Group

CLEANUP

9949676 - GNSD.BIN CORE DUMP AFTER KILL ASM PMON ON ALL NODES AT SAME TIME

9975837 - GNS INCORRECTLY PROCESSES IPV6 LOOKUP REQUESTS

10007185 - GNS DUMPS CORE IN CLSKGOPANIC AT CLSKPDVA 717

10028343 - GNS CAN NOT BE RELOCATED AFTER PUBLIC RESTARTED

CRS

9876201 - OHASD AGENT CORE DUMP AT EONSHTTP.C:162

10011084 - 11202 STEP3 MODIFY BINARY AFTER INSTALLATION CANNOT EXCUTE SUCCESSFULLY

10028235 - 'CLSNVIPAGENT.CPP', LINE 1522: ERROR: FORMAL ARGUMENT TYPE OF ...

10045436 - 'ORA.LISTENER.LSNR' FAILED TO BE FENCED OFF DURING CRSD CLEANUP

10062301 - VALUE FOR FIELD 'CLUSTER_NAME' IS MISSING IN CRSCONFIG_PARAMS

10110969 - PORTABILITY ISSUES IN FUNCTION TOLOWER_HOST

10175855 - FAILED TO UGPRADE 11.2.0.1 + ARU 12900951 -> 11.2.0.2

9891341 - CRSD CORE DUMP IN PROATH_MASTER_EXIT_HELPER AT PROATH.C:1834

11655840 - RAC1 DB' STATE_DETAILS IS WRONG AFTER KILL GIPCD

10634513 - OHASD DUMPS CORE WHEN PLUG IN UNPLUGGED PRIVATE NETWORK NIC

10236074 - ASM INSTANCES CRASH SEVERAL TIMES DURING PARALLEL CRS STARTUP

10052529 - DB INST OFFLINE AFTER STOP/START CRS STACK ON ALL NODES IN PARALLEL

10065216 - VIRTUAL MEMORY USAGE OF ORAROOTAGENT IS BIG(1321MB) AND NOT DECREASING

10168006 - ORAAGENT PROCESS MEMORY GROWTH PERIODICALLY.

CSS

9907089 - CSS CORE DUMP DURING EXADATA ROLLING UPGRADE

9926027 - NODE REBOOTED AFTER CRS CLEAN-UP SUCCEEDED 11202 GI + 10205 RAC DB

10014392 - CRSCTL DELETE NODE FAILS WITH CRS-4662 & CRS-4000

10015460 - REMOVAL OF WRONG INCARNATION OF A NODE DUE TO MANUAL SHUTDOWN STATE

10040109 - PMON KILL LEAD TO OS REBOOT

10048027 - ASM UPGRADE FAILS

10052721 - 11201- 11202 NON-ROLLING,CRSCTL.BIN CORE AT CLSSNSQANUM, SIGNAL 11

10083789 - A NODE DOESNT INITIATE A RECONFIG DUE TO INCORRECT RECONFIG STATE

9944978 - FALSE CSS EVICTION AFTER PRIVATE NIC RESUME

9978195 - STOP DB ACTION TIMED OUT AND AGENT EXITS DUE TO FAILURE TO STOP EVENT BRIDGE

10248739 - AFTER APPLY THE PATCH, THE NODE EVICTED DURING START CRS STACK

CVU

9679401 - OUI PREREQ CHECKS FAILED FOR WRONG OWNSHIP OF RESOLV.CONF_`HOST`

9959110 - GNS INTEGRITY PREREQUISITE FAILED WITH PRVF-5213

9979706 - COMP OCR CHECK FAILS TO VERIFY SIZE OF OCR LOCATION

10029900 - CVU PRE NODEADD CHECK VD ERROR

10033106 - ADDNODE.SH SHOULD INDICATE WHAT HAPPENS WHEN ERROR OCCURRING

10075643 - UNABLE TO CONTINUE CONFIG.SH FOR CRS UPGRAD

10083009 - GIPCD FAILS TO RETRIEVE INFORMATION FROM PEERS DUE TO INVALID ENDPOINT

GIPC

9812956 - STATUS OF CRSD AND EVMD GOES INTERMEDIATE FOR EVER WHEN KILL GIPC

9915329 - ORA-600 [603] IN DB AND ORA-603 IN ASM AFTER DOWN INTER-CONNECT NIC

9944948 - START RESOUCE HAIP FAILED WHEN RUN ROOT.SH

9971646 - ORAROOTAGENT CORE DUMPED AT NETWORKHAMAINTHREAD::READROUTEDATA

9974223 - GRID INFRASTRUCTURE NEEDS MULTICAST COMMUNICATION ON 230.0.1.0 ADDRESSES WORKING

10053985 - ERROR IN NETWORK ADDRESS ON SOLARIS 11

10057680 - OHASD ORAROOTAGENT.BIN SPIN CPU AFTER SIMULATE ASM DISK ERROR

10078086 - ROOTUPGRADE.SH FAIL FOR 'CRSCTL STARTUPGRADE' FAIL,10205-> 11202

10260251 - GRID INSTALLATION FAILS TO START HAIP DUE TO CHANGE IN NETWORK INTERFACE NAME

10111010 - CRSD HANGS FOR THE HANAME OF PEER CRSD

11782423 - OHASD.BIN TAKES CPU ABOUT 95% ~ 100%

11077756 - STARTUP FAILURE OF HAIP CAUSES INSTALLATION FAILURE

10375649 - DISABLE HAIP ON PRIMECLUSTER

10284828 - INTERFACE UPDATES GET LOST DURING BOUNCE OF CRSD PROCESS

10284693 - AIX EPIPE FAILURE

10233159 - NEED 20 MINS TO STARTUP CRS WHEN 1/2 GIPC NICS DOWN

10128191 - LRGSRG9 AND LRGSRGE FAILURE

GNS

9864003 - NODE REBOOT DUE TO 'ORA.GNS' FAILED TO BE FENCED OFF DURING CRSD

GPNP

9336825 - GPNPD FLUSH PROFILE PUSH ERROR MESSAGES IN CRS ALERT LOG

10314123 - GPNPD MAY NOT UPDATE PROFILE TO LATEST ON START

10105195 - PROC-32 ACCESSING OCR; CRS DOES NOT COME UP ON NODE

10205290 - DBCA FAILED WITH ERROR ORA-00132

10376847 - [ORA.CRF] [START] ERROR = ERROR 9 ENCOUNTERED WHEN CONNECTING TO MOND

IPD-OS

9812970 - IPD DO NOT MARK TYPE OF DISKS USED FOR VOTING DISK CORRECTLY

10057296 - IPD SPLIT BRAIN AFTER CHANGE BDB LOCATION

10069541 - IPD SPLIT BRAIN AFTER STOPPING ORA.CRF ON MASTER NODE

10071992 - UNREASONABLE VALUES FOR DISK STATISTICS

10072474 - A NODE IS NOT MONITORED AFTER STOP AND START THE ORA.CRF ON IT

10073075 - INVALID DATA RECEIVED FROM THE CLUSTER LOGGER SERVI

10107380 - IPD NOT STARTED DUE TO SCRFOSM_GET_IDS FAILED

OCR

9978765 - ROOTUPGRADE.SH HANG AND CRSD CRASHED ON OTHER NODES,10205-> 11202

10016083 - 'OCRCONFIG -ADD' NEEDS HELPFUL MESSAGE FOR ERROR ORA-15221

OPSM

9918485 - EMCONFIG FAIL WITH NULLPOINTEREXCEPTION AT RACTRANSFERCORE.JAVA

10018215 - RACONE DOES NOT SHUTDOWN INSTANCE DURING RELOCATION

10042143 - ORECORE11 LWSFDSEV CAUSED SEGV IN SRVM NATIVE METHODS

OTHERS

9963327 - CHMOD.PL GETS CALLED INSTEAD OF CHMOD.EXE

10008467 - FAILS DUE TO WRONG VERSION OF PERL USED:

10015210 - OCTSSD LEAK MEMORY 1.7M HR ON PE MASTER DURING 23 HOURS RUNNI

10027079 - CRS_SHUTDOWN_SYNCH EVENT NOT SENT IN SIHA

10028637 - SCLS.C COMPILE ERRORS ON AIX UNDECLARED IDENTIFIERS

10029119 - 11201-11202 CRS UPGRADE OUI ASKS TO RUN ROOTUPGRADE.SH

10036834 - PATCHES NOT FOUND ERROR WHILE UPGRADING GRID FROM 11201 TO 11202

10038791 - HAS SRG SRV GETTING MANY DIFS FOR AIX ON LABEL 100810 AND LATER

10040647 - LNX64-112022-UD; AQ AND RLB DO NOT WORK AFTER UPGRADING FROM 11201

10044622 - EVMD FAILED TO START AFTER KILL OHASD.BIN

10048487 - DIAGCOLLECTION CANNOT RETRIEVE IPD REPORTS

10073372 - DEINSTALL FAILED TO DELETE CRS_HOME ON REMOTE NODE IF OCR VD ON NFS

10089120 - WRONG PROMPT MESSAGE BY DEINSTALL COMMAND WHILE DELETING CRS HOME

10124517 - CRS STACK DOES NOT START AUTOMATICALLY AFTER NODE REBOOT

10157622 - 11.2.0.2 GI BUNDLE 1 HAS-CRS TRACKING BUG

RACG

10036193 - STANDBY NIC DOESN'T WORK IF DOWN PUBLIC NIC

10146768 - NETWORK RESOURCE FAILS TO START WITH IPMP ON SOLARIS 11

USM Miscellaneous

10146744 - ORA.REGISTRY.ACFS BECOME UNKOWN AND ACFS FS DISMOUNT

10283058 - RESOURCES ACFS NEEDS AN OPTION TO DISALLOW THE MOUNTING OF FILE SYSTEMS ON RESOURCE START

10193581 - ROOT.SH CRS-2674: START OF 'ORA.REGISTRY.ACFS' FAIL

10244210 - FAIL TO INSTALL ADVM/ACFS ON SOLARIS CONTAINER

10311856 - APPLY ASSERTION FAILURE:PBOARDENTRY>USRGBOARDRECENTRY_RECORD

Generic

9591812 - incorrect wait events in 11.2 ("cursor: mutex s" instead of "cursor: mutex x")

9905049 - ebr: ora-00600: internal error code, arguments: [kqlhdlod-bad-base-objn]

10052141 - exadata database crash with ora-7445 [_wordcopy_bwd_dest_aligned] and ora-600 [2

10052956 - ora-7445 [kjtdq()+176]

10157402 - lob segment has null data after long to lob conversion in parallel mode

10187168 - obsolete parent cursors if version count exceeds a threshold

10217802 - alter user rename raises ora-4030

10229719 - qrmp:12.2:ora-07445 while performing complete database import on solaris sparc

10264680 - incorrect version_number reported after patch for 10187168 applied

10411618 - add different wait schemes for mutex waits

11069199 - ora-600 [kksobsoletecursor:invalid stub] quering pq when pq is disabled

11818335 - additional changes when wait schemes for mutex waits is disabled

High Availability

10018789 - dbmv2-bigbh:spin in kgllock caused db hung and high library cache lock

10129643 - appsst gsi11g m9000: ksim generic wait event

10170431 - ctwr consuming lots of cpu cycles

Oracle Space Management

6523037 - et11.1dl: ora-600 [kddummy_blkchk] [6110] on update

9724970 - pdml fails with ora-600 [4511]. ora-600 [kdblkcheckerror] by block check

10218814 - dbmv2: ora-00600:[3020] data block corruption on standby

10219576 - ora-600 [ktsl_allocate_disp-fragment]

Oracle Transaction Management

10358019 - invalid results from flashback_transaction_query after applying patch:10322043

Oracle Utilities

10373381 - ora-600 [kkpo_rcinfo_defstg:objnotfound] after rerunning catupgrd.sql

Oracle Virtual Operating System Services

10127360 - dg4msql size increasing to 1.5gb after procedure executed 250 times

Server Manageability

11699057 - ora-00001: unique constraint (sys.wri$_sqlset_plans_tocap_pk) violated

针对Grid Infrastructure的大致升级步骤如下,具体可参考该PSU readme文档中的(6 Appendix A: Manual Steps for Apply/Rollback Patch)部分:

1. 升级OPatch:Patch 12419353 requires OPatch version 11.2.0.1.4。
可以从MOS下载到11.2上最新的OPatch工具,解压后直接覆盖原OPatch目录即可。
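一个示意性的升级片段(假设已从MOS下载到OPatch补丁包p6880880_112000_Linux-x86-64.zip,以grid用户操作,GI home为本文的/g01/11.2.0/grid):

$ unzip -o p6880880_112000_Linux-x86-64.zip -d /g01/11.2.0/grid
$ /g01/11.2.0/grid/OPatch/opatch version

opatch version输出的版本应不低于11.2.0.1.4。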

2.关闭相关的RAC数据库  

<ORACLE_HOME>/bin/srvctl stop database -d <db-unique-name>

3.Umount所有的ACFS文件系统

4. 切换到oracle用户,停止所有节点上的相关资源
su  - oracle
$<ORACLE_HOME>/bin/srvctl stop home -o <ORACLE_HOME> -s <status file location> -n <node name>
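以本文环境为例的一个示意调用(其中DB home路径与状态文件位置均为假设值;状态文件会记录该home下当前运行的资源,之后srvctl start home据此恢复同一批资源):

$ /g01/app/oracle/product/11.2.0/dbhome_1/bin/srvctl stop home \
  -o /g01/app/oracle/product/11.2.0/dbhome_1 -s /tmp/vrh1_home.status -n vrh1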

5.在所有节点上运行rootcrs.pl脚本停止GI相关资源
su - root
<GI_HOME>/crs/install/rootcrs.pl -unlock
Using configuration parameter file: /g01/11.2.0/grid/crs/install/crsconfig_params
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'vrh1'
CRS-2673: Attempting to stop 'ora.crsd' on 'vrh1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'vrh1'
CRS-2673: Attempting to stop 'ora.asm' on 'vrh1'
CRS-2677: Stop of 'ora.asm' on 'vrh1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'vrh1' has completed
CRS-2677: Stop of 'ora.crsd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'vrh1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.evmd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.asm' on 'vrh1'
CRS-2677: Stop of 'ora.asm' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'vrh1'
CRS-2677: Stop of 'ora.evmd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'vrh1'
CRS-2677: Stop of 'ora.cssd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'vrh1'
CRS-2673: Attempting to stop 'ora.diskmon' on 'vrh1'
CRS-2677: Stop of 'ora.diskmon' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'vrh1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'vrh1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'vrh1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'vrh1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'vrh1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully unlock /g01/11.2.0/grid

6.解压下载的PSU补丁包
unzip p12419353_112020_Linux-x86-64.zip 

7. 在所有节点上实施对GI的Patch

 <GI_HOME>/OPatch/opatch napply -oh <GI_HOME> -local /tmp/12419353

Invoking OPatch 11.2.0.1.6

Oracle Interim Patch Installer version 11.2.0.1.6
Copyright (c) 2011, Oracle Corporation.  All rights reserved.

UTIL session

Oracle Home       : /g01/11.2.0/grid
Central Inventory : /g01/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.6
OUI version       : 11.2.0.2.0
Log file location : /g01/11.2.0/grid/cfgtoollogs/opatch/opatch2011-08-12_00-07-39AM.log

Verifying environment and performing prerequisite checks...
OPatch continues with these patches:   12419353  

Do you want to proceed? [y|n]

Backing up files...
Applying interim patch '12419353' to OH '/g01/11.2.0/grid'

Patching component oracle.crs, 11.2.0.2.0...
Copying file to "/g01/11.2.0/grid/crs/install/crsconfig_lib.pm"
Copying file to "/g01/11.2.0/grid/crs/install/crspatch.pm"
Copying file to "/g01/11.2.0/grid/crs/install/s_crsconfig_lib.pm"

Patching component oracle.usm, 11.2.0.2.0...
Patches 12419353 successfully applied.
Log file location: /g01/11.2.0/grid/cfgtoollogs/opatch/opatch2011-08-12_00-07-39AM.log

OPatch succeeded.

<GI_HOME>/OPatch/opatch napply -oh <GI_HOME> -local /tmp/12419331

Invoking OPatch 11.2.0.1.6

Oracle Interim Patch Installer version 11.2.0.1.6
Copyright (c) 2011, Oracle Corporation.  All rights reserved.

UTIL session

Oracle Home       : /g01/11.2.0/grid
Central Inventory : /g01/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.6
OUI version       : 11.2.0.2.0
Log file location : /g01/11.2.0/grid/cfgtoollogs/opatch/opatch2011-08-12_00-10-46AM.log

Verifying environment and performing prerequisite checks...
OPatch continues with these patches:   12419331  

Backing up files...
Applying interim patch '12419331' to OH '/g01/11.2.0/grid'
ApplySession: Optional component(s) [ oracle.sysman.console.db, 11.2.0.2.0 ] ,
[ oracle.sysman.oms.core, 10.2.0.4.3 ] , [ oracle.rdbms.dv, 11.2.0.2.0 ] ,
[ oracle.sysman.plugin.db.main.repository, 11.2.0.2.0 ]  not present in the Oracle Home or a higher version is found.

Patching component oracle.rdbms.rsf, 11.2.0.2.0...

Patching component oracle.rdbms, 11.2.0.2.0...
Copying file to "/g01/11.2.0/grid/psu/11.2.0.2.3/catpsu.sql"
Copying file to "/g01/11.2.0/grid/psu/11.2.0.2.3/catpsu_rollback.sql"
Copying file to "/g01/11.2.0/grid/cpu/scripts/patch_8837510.sql"
Copying file to "/g01/11.2.0/grid/cpu/scripts/emdb_recomp_invalids.sql"

Patching component oracle.ldap.rsf, 11.2.0.2.0...

Patching component oracle.rdbms.dbscripts, 11.2.0.2.0...

Patching component oracle.rdbms.rman, 11.2.0.2.0...
Patches 12419331 successfully applied.
Log file location: /g01/11.2.0/grid/cfgtoollogs/opatch/opatch2011-08-12_00-10-46AM.log

OPatch succeeded.
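除上述手工步骤外,较新版本的OPatch还支持由root用户执行opatch auto,一次性完成GI home(以及可选的DB home)的补丁实施,示意如下(假设补丁已解压到/tmp/12419353,是否适用请以该PSU readme中的说明为准):

# /g01/11.2.0/grid/OPatch/opatch auto /tmp/12419353 -oh /g01/11.2.0/grid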

后续对DB HOME升级的步骤可以参照以下文档摘录:

8.Run the pre script for DB component of the patch.

As the database home owner execute:

$<UNZIPPED_PATCH_LOCATION>/12419353/custom/server/12419353/custom/scripts/prepatch.sh -dbhome <ORACLE_HOME>

9.Apply the DB patch.

As the database home owner execute:

$<ORACLE_HOME>/OPatch/opatch napply -oh <ORACLE_HOME> -local <UNZIPPED_PATCH_LOCATION>/12419353/custom/server/12419353
$<ORACLE_HOME>/OPatch/opatch napply -oh <ORACLE_HOME> -local <UNZIPPED_PATCH_LOCATION>/12419331

10.Run the post script for DB component of the patch.

As the database home owner execute:

$<UNZIPPED_PATCH_LOCATION>/12419353/custom/server/12419353/custom/scripts/postpatch.sh -dbhome <ORACLE_HOME>

11.Run the post script.

As the root user execute:

#<GI_HOME>/rdbms/install/rootadd_rdbms.sh
If this is a GI Home, as the root user execute:

#<GI_HOME>/crs/install/rootcrs.pl -patch
If this is an Oracle Restart Home, as the root user execute:

#<GI_HOME>/crs/install/roothas.pl -patch

12.Start the CRS managed resources that were earlier running from DB homes.

If this is a GI Home environment, as the database home owner execute:

$<ORACLE_HOME>/bin/srvctl start home -o <ORACLE_HOME> -s <status file location> -n <node name>
If this is an Oracle Restart Home environment, as the database home owner execute:

$<ORACLE_HOME>/bin/srvctl start home -o <ORACLE_HOME> -s <status file location> 

CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

客户的一套11.2.0.1 RAC系统采用ASM diskgroup存放OCR和votedisk,该REG diskgroup中的某个LUN disk由于硬件原因损坏,导致冗余的votedisk表决磁盘中有一个处于OFFLINE状态,客户希望能删除该OFFLINE的votedisk并新增一个可用的。

在删除该votedisk文件时出现了CRS-4258的错误,错误如下:

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. OFFLINE  5b3380d6367e4f94bf19e9db5f2f684e ()  []
 2. ONLINE   6802e6d139354fb3bf95725dd01a02fd (/dev/ocr2) [REG]
 3. ONLINE   a433d51ebd2d4facbfc8e95b017f5393 (/dev/asm-disk1) [REG]
 4. ONLINE   3784d344bffa4f6ebff21c4dd3c873bd (/dev/asm-disk2) [REG]
Located 4 voting disk(s).

crsctl delete css votedisk 5b3380d6367e4f94bf19e9db5f2f684e
CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

居然无法移除ASM存储上的voting files,太搞笑了。

客户在MOS上找到了CRS-4258相关问题的Note:

CRS-4258: Addition and Deletion of Voting Files are not Allowed Because the Voting Files are on ASM in 11gR2 [ID 1060146.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.1 - Release: 11.2 to 11.2
Information in this document applies to any platform.
Symptoms

CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM in 11gR2.


Changes
Stale voting files are seen after accidentally dropping one of the ASM disks belonging to the ASM diskgroup where voting files are stored.
And CRS-4258 occurs when trying to delete the stale voting files using crsctl delete css votedisk FUID.

[root@grid]# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 5b91aad0a2184f3dbfa8f970e8ae4d49 (/dev/oracleasm/disks/ASM10) [PLAY]
2. ONLINE 53b1b40b73164f9ebf3f498f6d460187 (/dev/oracleasm/disks/ASM9) [PLAY]
3. OFFLINE 82dfd04b96f14f6dbf36f5a62b118f61 () []

[root@grid]# crsctl delete css votedisk 82dfd04b96f14f6dbf36f5a62b118f61
CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM
Cause

1. Seeing stale voting files is due to bug 9024611.

2. "delete" command is not available , only "replace" command is available when voting files are stored on  ASM diskgroup.    

    Please see Oracle Clusterware Administration and Deployment Guide11g Release 2 (11.2)

Solution


1. This issue is permanently fixed in 11.2.0.2.0.

2. Apply patch 9024611. Please contact Oracle support if this patch is not available on your platform.

3. If CSS has stale voting files even after applying patch 9024611, do the following workaround -

WORKAROUND:
Do something to trigger ASM to try to relocate the voting file.

e.g.) $ crsctl replace votedisk +asm_disk_group   --- Put an available ASM diskgroup
      $ crsctl query css votedisk                 --- Check if voting files are all online on the new ASM diskgroup
      $ crsctl replace votedisk +PLAY             --- Put the original ASM diskgroup where voting files were

4. If the workaround above cannot be followed for any reason then you can request the patch for unpublished bug 9409327 for your platform.

References
BUG:9294664 - NOT ABLE TO REMOVE THE VOTEDISK WHICH IS OFFILNE

Hdr: 9294664 11.2.0.1 PCW 11.2.0.1 ADMUTL PRODID-5 PORTID-226 9024611
Abstract: NOT ABLE TO REMOVE THE VOTEDISK WHICH IS OFFILNE

PROBLEM:
--------
crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   9f7f4f7f798d4f69bfe31653894421a2 (ORCL:GRID1) [GRID]
2. OFFLINE  a9b785a59c3c4f67bf15babc67ffb79a () []
3. OFFLINE  29988f37fa794f12bfea3f3672c99609 () []
4. ONLINE   a8b3a040195c4f54bfce8ef21bd4fa07 (ORCL:GRID3) [GRID]
5. ONLINE   a1e4fbd9df6f4f67bf8fc12fe9780721 (ORCL:GRID2) [GRID]
Located 5 voting disk(s).


[root@sdc-drrac01 grid]# crsctl delete css votedisk a9b785a59c3c4f67bf15babc67ffb79a
CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

DIAGNOSTIC ANALYSIS:
--------------------
Ct is performing some voting disk failover scenarios in which he has removed the 2 votedisks
which were on ASM by dropping the disks using asmlib, after that recreating the disks again,
starting the cluster in exclusive mode, starting ASM and mounting the diskgroup so that
rebalancing was done, but after that:

 crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   9f7f4f7f798d4f69bfe31653894421a2 (ORCL:GRID1) [GRID]
2. OFFLINE  a9b785a59c3c4f67bf15babc67ffb79a () []
3. OFFLINE  29988f37fa794f12bfea3f3672c99609 () []
4. ONLINE   a8b3a040195c4f54bfce8ef21bd4fa07 (ORCL:GRID3) [GRID]
5. ONLINE   a1e4fbd9df6f4f67bf8fc12fe9780721 (ORCL:GRID2) [GRID]
Located 5 voting disk(s).

and not able to drop the vote disk which is offline

WORKAROUND:
-----------
n/a

RELATED BUGS:
-------------
as per bug 9024611 tried the workaround:

but while running 

crsctl css votedisk delete 

we got syntax error and found that there is no command with crsctl css ...

这个Note说明不能移除ASM存储内voting files的问题在11.2.0.2.0上已经解决了,也可以通过安装one-off patch 9024611来修复。

但是实际在11.2.0.2上测试可以发现仍旧无法删除ASM上的voting files:

[root@rh2 ~]# crsctl query crs releaseversion
Oracle High Availability Services release version on the local node is [11.2.0.2.0]

[root@rh2 ~]# crsctl query crs  activeversion
Oracle Clusterware active version on the cluster is [11.2.0.2.0]


[grid@rh2 ~]$ /s01/grid/OPatch/opatch lsinventory
Invoking OPatch 11.2.0.1.1

Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation.  All rights reserved.


Oracle Home       : /s01/grid
Central Inventory : /s01/app/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.1
OUI version       : 11.2.0.2.0
OUI location      : /s01/grid/oui
Log file location : /s01/grid/cfgtoollogs/opatch/opatch2011-08-04_18-50-34PM.log

Patch history file: /s01/grid/cfgtoollogs/opatch/opatch_history.txt

Lsinventory Output file location : /s01/grid/cfgtoollogs/opatch/lsinv/lsinventory2011-08-04_18-50-34PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1): 

Oracle Grid Infrastructure                                           11.2.0.2.0
There are 1 products installed in this Oracle Home.


There are no Interim patches installed in this Oracle Home.


Rac system comprising of multiple nodes
  Local node = rh2
  Remote node = rh3

--------------------------------------------------------------------------------

OPatch succeeded.


[root@rh2 ~]# crsctl delete css votedisk a433d51ebd2d4facbfc8e95b017f5393

CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

又是一个伪修复的Bug….!!

无奈,只好寄希望于replace命令能解决问题,结果发现:

crsctl replace votedisk +DATA
Failed to create voting files on disk group DATA.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.

方法四中指出的unpublished bug 9409327(Patch 9409327: OFFLINE VF ENTRY REMAINS AFTER PATCH FOR BUG 9024611),目前仅在IBM AIX on POWER Systems (64-bit)的11.2.0.1上有对应的补丁。

利用UDEV服务解决RAC ASM存储设备名

在<Why ASMLIB and why not?>一文中我们介绍了使用ASMLIB这种专门为Oracle Automatic Storage Management特性设计的内核支持库(kernel support library)的优缺点,同时建议使用成熟的UDEV方案来替代ASMLIB。

这里我们就给出配置UDEV的具体步骤,还是比较简单的:

1.确认在所有RAC节点上已经安装了必要的UDEV包

[root@rh2 ~]# rpm -qa|grep udev
udev-095-14.21.el5

2. 通过scsi_id获取各块设备的唯一标识名,假设系统上已有LUN sdc-sdp

for i in c d e f g h i j k l m n o p ;
do
echo "sd$i" "`scsi_id -g -u -s /block/sd$i` ";
done

sdc 1IET_00010001
sdd 1IET_00010002
sde 1IET_00010003
sdf 1IET_00010004
sdg 1IET_00010005
sdh 1IET_00010006
sdi 1IET_00010007
sdj 1IET_00010008
sdk 1IET_00010009
sdl 1IET_0001000a
sdm 1IET_0001000b
sdn 1IET_0001000c
sdo 1IET_0001000d
sdp 1IET_0001000e 

以上列出了与块设备名对应的唯一标识名
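需要注意,在RHEL6/OL6及之后的版本中scsi_id不再支持-g -u -s这组选项,可改用如下写法(仅为示意):

for i in c d e f g h i j k l m n o p ;
do
echo "sd$i" "`/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/sd$i`";
done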

3. 创建必要的UDEV配置文件:

首先切换到配置文件目录

[root@rh2 ~]# cd /etc/udev/rules.d

定义必要的规则配置文件

[root@rh2 rules.d]# touch 99-oracle-asmdevices.rules 

[root@rh2 rules.d]# cat 99-oracle-asmdevices.rules
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010001", NAME="ocr1", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010002", NAME="ocr2", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010003", NAME="asm-disk1",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010004", NAME="asm-disk2",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010005", NAME="asm-disk3",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010006", NAME="asm-disk4",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010007", NAME="asm-disk5",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010008", NAME="asm-disk6",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010009", NAME="asm-disk7",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000a", NAME="asm-disk8",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000b", NAME="asm-disk9",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000c", NAME="asm-disk10", OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000d", NAME="asm-disk11", OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000e", NAME="asm-disk12", OWNER="grid",  GROUP="asmadmin", MODE="0660"

RESULT为/sbin/scsi_id -g -u -s %p的输出(Match the returned string of the last PROGRAM call. This key may be used in any following rule after a PROGRAM call),按顺序填入刚才获取的唯一标识名即可。

OWNER为安装Grid Infrastructure的用户,在11gR2中一般为grid;GROUP为asmadmin;MODE采用0660即可。

NAME为UDEV映射后的设备名。建议为OCR和VOTE DISK创建独立的DISKGROUP,为便于区分,可将该DISKGROUP专用的设备命名为ocr1..ocrn的形式,其余磁盘则可根据其实际用途或磁盘组名来命名。
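在将规则分发到其他节点之前,可以先在本地模拟验证单个设备的映射是否符合预期(仅为示意;RHEL5自带的udev中该工具名为udevtest,较新版本中对应的命令是udevadm test):

[root@rh2 rules.d]# udevtest /block/sdc

按本文的规则,输出中应能看到sdc最终被命名为ocr1。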

4.将该规则文件拷贝到其他节点上
[root@rh2 rules.d]# scp 99-oracle-asmdevices.rules Other_node:/etc/udev/rules.d

5.在所有节点上启动udev服务,或者重启服务器即可

[root@rh2 rules.d]# /sbin/udevcontrol reload_rules
[root@rh2 rules.d]# /sbin/start_udev
Starting udev:                                            [  OK  ]
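在RHEL6/OL6等较新的发行版上,udevcontrol与start_udev已被udevadm取代,可改用以下命令重载规则并触发设备事件(仅为示意):

# udevadm control --reload-rules
# udevadm trigger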

6.检查设备是否到位

[root@rh2 rules.d]# cd /dev
[root@rh2 dev]# ls -l ocr*
brw-rw---- 1 grid asmadmin 8, 32 Jul 10 17:31 ocr1
brw-rw---- 1 grid asmadmin 8, 48 Jul 10 17:31 ocr2

[root@rh2 dev]# ls -l asm-disk*
brw-rw---- 1 grid asmadmin 8,  64 Jul 10 17:31 asm-disk1
brw-rw---- 1 grid asmadmin 8, 208 Jul 10 17:31 asm-disk10
brw-rw---- 1 grid asmadmin 8, 224 Jul 10 17:31 asm-disk11
brw-rw---- 1 grid asmadmin 8, 240 Jul 10 17:31 asm-disk12
brw-rw---- 1 grid asmadmin 8,  80 Jul 10 17:31 asm-disk2
brw-rw---- 1 grid asmadmin 8,  96 Jul 10 17:31 asm-disk3
brw-rw---- 1 grid asmadmin 8, 112 Jul 10 17:31 asm-disk4
brw-rw---- 1 grid asmadmin 8, 128 Jul 10 17:31 asm-disk5
brw-rw---- 1 grid asmadmin 8, 144 Jul 10 17:31 asm-disk6
brw-rw---- 1 grid asmadmin 8, 160 Jul 10 17:31 asm-disk7
brw-rw---- 1 grid asmadmin 8, 176 Jul 10 17:31 asm-disk8
brw-rw---- 1 grid asmadmin 8, 192 Jul 10 17:31 asm-disk9
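设备到位后,还需确认ASM实例的磁盘发现路径asm_diskstring覆盖了这些UDEV设备名,否则安装界面或asmca将看不到候选磁盘。可在ASM实例中调整(仅为示意,假设ASM使用spfile,以sysasm身份执行):

$ sqlplus / as sysasm
SQL> alter system set asm_diskstring='/dev/ocr*','/dev/asm-disk*' scope=both sid='*';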

crsctl status resource -t -init in 11.2.0.2 grid infrastructure

11.2.0.2的grid infrastructure中crsctl stat res 命令不再显示如ora.cssd、ora.ctssd、ora.diskmon等基础资源的信息,如果用户想要了解这些resource状态需要加上-init选项:

[grid@rh2 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.2.0]

[grid@rh2 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rh2
ora.LISTENER.lsnr
               OFFLINE OFFLINE      rh2
ora.asm
               ONLINE  ONLINE       rh2
ora.gsd
               OFFLINE OFFLINE      rh2
ora.net1.network
               ONLINE  ONLINE       rh2
ora.ons
               ONLINE  ONLINE       rh2
ora.registry.acfs
               OFFLINE OFFLINE      rh2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        OFFLINE OFFLINE
ora.cvu
      1        OFFLINE OFFLINE
ora.dw.db
      1        OFFLINE OFFLINE
ora.maclean.db
      1        OFFLINE OFFLINE
ora.oc4j
      1        OFFLINE OFFLINE
ora.prod.db
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean_pre.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.prod.maclean_pre_preconnect.svc
      1        OFFLINE OFFLINE
ora.prod.maclean_taf.svc
      1        OFFLINE OFFLINE
      2        OFFLINE OFFLINE
ora.rh2.vip
      1        OFFLINE OFFLINE
ora.rh3.vip
      1        OFFLINE OFFLINE
ora.scan1.vip
      1        OFFLINE OFFLINE                                       

[grid@rh2 ~]$ crsctl stat res -t -init 
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rh2                      Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rh2
ora.crf
      1        ONLINE  ONLINE       rh2
ora.crsd
      1        ONLINE  ONLINE       rh2
ora.cssd
      1        ONLINE  ONLINE       rh2
ora.cssdmonitor
      1        ONLINE  ONLINE       rh2
ora.ctssd
      1        ONLINE  ONLINE       rh2                      OBSERVER
ora.diskmon
      1        ONLINE  ONLINE       rh2
ora.drivers.acfs
      1        ONLINE  OFFLINE
ora.evmd
      1        ONLINE  ONLINE       rh2
ora.gipcd
      1        ONLINE  ONLINE       rh2
ora.gpnpd
      1        ONLINE  ONLINE       rh2
ora.mdnsd
      1        ONLINE  ONLINE       rh2

此外,在11.2.0.2的grid中,当我们想启动、停止或修改这些init资源时都需要加上-init选项,否则将出现CRS-2613: Could not find resource错误:

[grid@rh2 ~]$ crsctl stat res ora.asm
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE
STATE=ONLINE on rh2

[grid@rh2 ~]$ crsctl modify res ora.asm -attr AUTO_START=never

[grid@rh2 ~]$ crsctl stat res ora.asm -p
NAME=ora.asm
TYPE=ora.asm.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=ora.%CRS_CSS_NODENAME%.ASM%CRS_CSS_NODENUMBER%.asm
AUTO_START=never
CHECK_INTERVAL=60
CHECK_TIMEOUT=30
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=asm) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%)
DEGREE=1
DESCRIPTION=Oracle ASM resource
ENABLED=1
GEN_USR_ORA_INST_NAME=
GEN_USR_ORA_INST_NAME@SERVERNAME(rh2)=+ASM1
GEN_USR_ORA_INST_NAME@SERVERNAME(rh3)=+ASM2
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=weak(ora.LISTENER.lsnr)
START_TIMEOUT=900
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=
STOP_TIMEOUT=600
TYPE_VERSION=1.2
UPTIME_THRESHOLD=1d
USR_ORA_ENV=
USR_ORA_INST_NAME=+ASM%CRS_CSS_NODENUMBER%
USR_ORA_OPEN_MODE=mount
USR_ORA_OPI=false
USR_ORA_STOP_MODE=immediate
VERSION=11.2.0.2.0

[grid@rh2 ~]$ crsctl status resource  -init -t|grep -v ONLINE|tail -13
ora.asm
ora.cluster_interconnect.haip
ora.crf
ora.crsd
ora.cssd
ora.cssdmonitor
ora.ctssd
ora.diskmon
ora.drivers.acfs
ora.evmd
ora.gipcd
ora.gpnpd
ora.mdnsd

[grid@rh2 ~]$ crsctl status resource  -init -t|grep -v ONLINE|tail -13|xargs crsctl status resource
CRS-2613: Could not find resource 'ora.cluster_interconnect.haip'.
CRS-2613: Could not find resource 'ora.crf'.
CRS-2613: Could not find resource 'ora.crsd'.
CRS-2613: Could not find resource 'ora.cssd'.
CRS-2613: Could not find resource 'ora.cssdmonitor'.
CRS-2613: Could not find resource 'ora.ctssd'.
CRS-2613: Could not find resource 'ora.diskmon'.
CRS-2613: Could not find resource 'ora.drivers.acfs'.
CRS-2613: Could not find resource 'ora.evmd'.
CRS-2613: Could not find resource 'ora.gipcd'.
CRS-2613: Could not find resource 'ora.gpnpd'.
CRS-2613: Could not find resource 'ora.mdnsd'.
NAME=ora.asm
TYPE=ora.asm.type
TARGET=ONLINE
STATE=ONLINE on rh2
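上面批量查询之所以只有ora.asm成功,是因为xargs将所有资源名拼接到了同一条不带-init选项的crsctl命令中,而这些init资源里只有ora.asm同时也注册为CRSD管理的资源。若确实需要批量查询,可以让xargs为每个资源名单独附加-init选项(仅为示意,假设所用xargs支持-I选项):

[grid@rh2 ~]$ crsctl status resource -init -t | grep -v ONLINE | tail -13 | xargs -I{} crsctl status resource {} -init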

[grid@rh2 ~]$ crsctl status res ora.crsd -init -p
NAME=ora.crsd
TYPE=ora.crs.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=AGENT=1,AGFW=0,CLSFRAME=0,CLSVER=0,CLUCLS=0,COMMCRS=0,COMMNS=0,CRSAPP=0,CRSCCL=0,CRSCEVT=0,CRSCOMM=0,CRSD=0,CRSEVT=0,CRSMAIN=0,CRSOCR=0,CRSPE=0,CRSPLACE=0,CRSRES=0,CRSRPT=0,CRSRTI=0,CRSSE=0,CRSSEC=0,CRSTIMER=0,CRSUI=0,CSSCLNT=0,SuiteTes=1,UiServer=0,OCRAPI=1,OCRCLI=1,OCRSRV=1,OCRMAS=1,OCRMSG=1,OCRCAC=1,OCRRAW=1,OCRUTL=1,OCROSD=1,OCRASM=1
DAEMON_TRACING_LEVELS=AGENT=0,AGFW=0,CLSFRAME=0,CLSVER=0,CLUCLS=0,COMMCRS=0,COMMNS=0,CRSAPP=0,CRSCCL=0,CRSCEVT=0,CRSCOMM=0,CRSD=0,CRSEVT=0,CRSMAIN=0,CRSOCR=0,CRSPE=0,CRSPLACE=0,CRSRES=0,CRSRPT=0,CRSRTI=0,CRSSE=0,CRSSEC=0,CRSTIMER=0,CRSUI=0,CSSCLNT=0,SuiteTes=0,UiServer=0,OCRAPI=1,OCRCLI=1,OCRSRV=1,OCRMAS=1,OCRMSG=1,OCRCAC=1,OCRRAW=1,OCRUTL=1,OCROSD=1,OCRASM=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for CRSD"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=hard(ora.asm,ora.cssd,ora.ctssd,ora.gipcd)pullup(ora.asm,ora.cssd,ora.ctssd,ora.gipcd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(shutdown:ora.asm,intermediate:ora.cssd,intermediate:ora.gipcd)
STOP_MODE=NONE
STOP_TIMEOUT=43200
UPTIME_THRESHOLD=1m
USR_ORA_ENV=

[grid@rh2 ~]$ crsctl modify res ora.crsd -init -attr "SCRIPT_TIMEOUT"=65   
CRS-0245:  User doesn't have enough privilege to perform the operation
CRS-4000: Command Modify failed, or completed with errors.

/* 修改某些资源的属性要求root权限 */

[root@rh2 ~]# crsctl modify res ora.crsd -init -attr "SCRIPT_TIMEOUT"=65 

[root@rh2 ~]# crsctl status res ora.crsd -init -p|grep SCRIPT_TIMEOUT
SCRIPT_TIMEOUT=65

[root@rh2 ~]# crsctl status res ora.ctssd -p -init
NAME=ora.ctssd
TYPE=ora.ctss.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=CLUCLS=0,CSSCLNT=0,CRSCCL=1,CTSS=5,OCRAPI=1,OCRCLI=1,OCRMSG=1
DAEMON_TRACING_LEVELS=CLUCLS=0,CSSCLNT=0,CRSCCL=1,CTSS=5,OCRAPI=1,OCRCLI=1,OCRMSG=1
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for Ctss Agents"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=hard(ora.cssd,ora.gipcd)pullup(ora.cssd,ora.gipcd)
START_TIMEOUT=60
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(ora.cssd,ora.gipcd)
STOP_TIMEOUT=60
UPTIME_THRESHOLD=1m
USR_ORA_ENV=

[root@rh2 ~]# crsctl status res ora.diskmon -p -init
NAME=ora.diskmon
TYPE=ora.diskmon.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=never
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=3
CHECK_TIMEOUT=30
CLEAN_ARGS=
CLEAN_COMMAND=
DAEMON_LOGGING_LEVELS=
DAEMON_TRACING_LEVELS=
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for Diskmon"
DETACHED=true
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=10
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=weak(concurrent:ora.cssd)pullup:always(ora.cssd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=
STOP_TIMEOUT=60
UPTIME_THRESHOLD=5s
USR_ORA_ENV=ORACLE_USER=grid
VERSION=11.2.0.2.0

[root@rh2 ~]# crsctl status res ora.cssd -init -p
NAME=ora.cssd
TYPE=ora.cssd.type
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/cssdagent%CRS_EXE_SUFFIX%
AGENT_HB_INTERVAL=0
AGENT_HB_MISCOUNT=10
AUTO_START=always
CARDINALITY=1
CHECK_ARGS=
CHECK_COMMAND=
CHECK_INTERVAL=30
CLEAN_ARGS=abort
CLEAN_COMMAND=
CSSD_MODE=
CSSD_PATH=%CRS_HOME%/bin/ocssd%CRS_EXE_SUFFIX%
CSS_USER=grid
DAEMON_LOGGING_LEVELS=CSSD=2,GIPCNM=2,GIPCGM=2,GIPCCM=2,CLSF=0,SKGFD=0,GPNP=1,OLR=0
DAEMON_TRACING_LEVELS=CSSD=0,GIPCNM=0,GIPCGM=0,GIPCCM=0,CLSF=0,SKGFD=0,GPNP=0,OLR=0
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for CSSD"
DETACHED=true
ENABLED=1
ENV_OPTS=
FAILOVER_DELAY=0
FAILURE_INTERVAL=3
FAILURE_THRESHOLD=5
HOSTING_MEMBERS=
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
OMON_INITRATE=1000
OMON_POLLRATE=500
ORA_OPROCD_MODE=
ORA_VERSION=11.2.0.2.0
PID_FILE=
PLACEMENT=balanced
PROCD_TIMEOUT=1000
PROCESS_TO_MONITOR=
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=3
SCRIPT_TIMEOUT=600
SERVER_POOLS=
START_ARGS=
START_COMMAND=
START_DEPENDENCIES=weak(concurrent:ora.diskmon)hard(ora.cssdmonitor,ora.gpnpd,ora.gipcd)pullup(ora.gpnpd,ora.gipcd)
START_TIMEOUT=600
STATE_CHANGE_TEMPLATE=
STOP_ARGS=
STOP_COMMAND=
STOP_DEPENDENCIES=hard(intermediate:ora.gipcd,shutdown:ora.diskmon,intermediate:ora.cssdmonitor)
STOP_TIMEOUT=900
UPTIME_THRESHOLD=1m
USR_ORA_ENV=
VMON_INITLIMIT=16
VMON_INITRATE=500
VMON_POLLRATE=500

How to recover from root.sh on 11.2 Grid Infrastructure Failed

从10g的clusterware到11g Release2的Grid Infrastructure,Oracle往RAC这个框架里塞进了太多东西。虽然照着Step by Step Installation指南步步为营地去安装11.2.0.1的GI,但在实际执行root.sh脚本的时候,不免又要出现这样那样的错误。例如下面的一例:

[root@rh3 grid]# ./root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= maclean
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]: 

The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: 

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2011-03-28 20:43:13: Parsing the host name
2011-03-28 20:43:13: Checking for super user privileges
2011-03-28 20:43:13: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting

ADVM/ACFS is not supported on oraclelinux-release-5-5.0.2

一个节点上的root.sh脚本运行居然提示说ADVM/ACFS不支持OEL 5.5,但实际上Redhat 5或者OEL 5是目前仅有的少数支持ACFS的平台(The ACFS install would be on a supported Linux release – either Oracle Enterprise Linux 5 or Red Hat 5)。

检索Metalink发现这是一个Linux平台上的Bug 9474252: ‘ACFSLOAD START’ RETURNS “ADVM/ACFS IS NOT SUPPORTED ON DHL-RELEASE-…”

因为以上Not Supported错误信息在另外一个节点(也是Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)) 运行root.sh脚本时并未出现,那么一般只要找出2个节点间的差异就可能解决问题了:

未出错节点上release相关rpm包的情况

[maclean@rh6 tmp]$ cat /etc/issue
Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)
Kernel \r on an \m

[maclean@rh6 tmp]$ rpm -qa|grep release
enterprise-release-notes-5Server-17
enterprise-release-5-0.0.22

出错节点上release相关rpm包的情况

[root@rh3 tmp]# rpm -qa | grep release
oraclelinux-release-5-5.0.2
enterprise-release-5-0.0.22
enterprise-release-notes-5Server-17

以上可以看到,相比起没有出错的节点,出错节点上多安装了一个名为oraclelinux-release-5-5.0.2的rpm包,我们尝试卸载该rpm包以验证是否能解决问题。补充一点:实际上该问题也可以通过将/tmp/.linux_release文件的内容修改为enterprise-release-5-0.0.17来解决,而无需像我们这里一样卸载名为oraclelinux-release-5*的rpm软件包。
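若选择修改文件的方式,示意操作如下(假设该文件存在且可由root修改):

[root@rh3 ~]# echo "enterprise-release-5-0.0.17" > /tmp/.linux_release

本例中我们仍采用卸载rpm包的方式: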

[root@rh3 install]# rpm -e oraclelinux-release-5-5.0.2

[root@rh3 grid]# ./root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= maclean
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: 

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2011-03-28 20:57:21: Parsing the host name
2011-03-28 20:57:21: Checking for super user privileges
2011-03-28 20:57:21: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
CRS is already configured on this node for crshome=0
Cannot configure two CRS instances on the same cluster.
Please deconfigure before proceeding with the configuration of new home.

再次在失败节点上运行root.sh,被告知需要先deconfigure之后才能重新配置。在官方文档<Oracle Grid Infrastructure Installation Guide 11g Release 2>中介绍了如何反向配置11g Release 2中的Grid Infrastructure(Deconfiguring Oracle Clusterware Without Removing Binaries):

/* 同为管理Grid Infra所以仍需要root用户来执行以下操作 */

[root@rh3 grid]# pwd
/u01/app/11.2.0/grid

/* 目前位于GRID_HOME目录下  */

[root@rh3 grid]# cd crs/install

/* 以-deconfig选项执行一个名为rootcrs.pl的脚本 */

[root@rh3 install]# ./rootcrs.pl -deconfig
2011-03-28 21:03:05: Parsing the host name
2011-03-28 21:03:05: Checking for super user privileges
2011-03-28 21:03:05: User has super user privileges
Using configuration parameter file: ./crsconfig_params
VIP exists.:rh3
VIP exists.: //192.168.1.105/255.255.255.0/eth0
VIP exists.:rh6
VIP exists.: //192.168.1.103/255.255.255.0/eth0
GSD exists.
ONS daemon exists. Local port 6100, remote port 6200
eONS daemon exists. Multicast port 20796, multicast IP address 234.227.83.81, listening port 2016
Please confirm that you intend to remove the VIPs rh3 (y/[n]) y
ACFS-9200: Supported
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rh3'
CRS-2673: Attempting to stop 'ora.crsd' on 'rh3'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rh3'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rh3'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rh3' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rh3' has completed
CRS-2677: Stop of 'ora.crsd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rh3'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rh3'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rh3'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rh3'
CRS-2673: Attempting to stop 'ora.evmd' on 'rh3'
CRS-2677: Stop of 'ora.cssdmonitor' on 'rh3' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rh3'
CRS-2677: Stop of 'ora.cssd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'rh3'
CRS-2673: Attempting to stop 'ora.gipcd' on 'rh3'
CRS-2677: Stop of 'ora.gipcd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'rh3' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rh3' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node

/* If the deconfig above fails to deconfigure the node, rerun the rootcrs.pl script with the -force option */

[root@rh3 install]# ./rootcrs.pl -deconfig -force
2011-03-28 21:41:00: Parsing the host name
2011-03-28 21:41:00: Checking for super user privileges
2011-03-28 21:41:00: User has super user privileges
Using configuration parameter file: ./crsconfig_params
VIP exists.:rh3
VIP exists.: //192.168.1.105/255.255.255.0/eth0
VIP exists.:rh6
VIP exists.: //192.168.1.103/255.255.255.0/eth0
GSD exists.
ONS daemon exists. Local port 6100, remote port 6200
eONS daemon exists. Multicast port 20796, multicast IP address 234.227.83.81, listening port 2016
ACFS-9200: Supported
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rh3'
CRS-2673: Attempting to stop 'ora.crsd' on 'rh3'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rh3'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rh3'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rh3' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rh3' has completed
CRS-2677: Stop of 'ora.crsd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rh3'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rh3'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'rh3'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rh3'
CRS-2673: Attempting to stop 'ora.evmd' on 'rh3'
CRS-2677: Stop of 'ora.cssdmonitor' on 'rh3' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.evmd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rh3'
CRS-2677: Stop of 'ora.cssd' on 'rh3' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'rh3'
CRS-2673: Attempting to stop 'ora.gipcd' on 'rh3'
CRS-2677: Stop of 'ora.gipcd' on 'rh3' succeeded
CRS-2677: Stop of 'ora.diskmon' on 'rh3' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rh3' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node

/* Fortunately this trick has always worked for me; otherwise we would have to completely uninstall and reinstall GI every time */
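
Before re-running root.sh it is worth a quick sanity check that the stack is really down on this node. A hedged sketch (CRS-4639 is the message I would expect once the local stack has been deconfigured):

/* Expect CRS-4639: Could not contact Oracle High Availability Services */
/u01/app/11.2.0/grid/bin/crsctl check crs

/* No ohasd/cssd/crsd daemons should remain */
ps -ef | grep -E 'ohasd|cssd|crsd' | grep -v grep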

With CRS successfully deconfigured, we can once again attempt to run the ill-fated root.sh:

[root@rh3 grid]# pwd
/u01/app/11.2.0/grid

[root@rh3 grid]# ./root.sh
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= maclean
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]: 

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2011-03-28 21:07:29: Parsing the host name
2011-03-28 21:07:29: Checking for super user privileges
2011-03-28 21:07:29: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
CRS-4123: Oracle High Availability Services has been started.
ohasd is starting
FATAL: Module oracleoks not found.
FATAL: Module oracleadvm not found.
FATAL: Module oracleacfs not found.
acfsroot: ACFS-9121: Failed to detect /dev/asm/.asm_ctl_spec.

acfsroot: ACFS-9310: ADVM/ACFS installation failed.

acfsroot: ACFS-9311: not all components were detected after the installation.

CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node rh6, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
CRS-2672: Attempting to start 'ora.mdnsd' on 'rh3'
CRS-2676: Start of 'ora.mdnsd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'rh3'
CRS-2676: Start of 'ora.gipcd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rh3'
CRS-2676: Start of 'ora.gpnpd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rh3'
CRS-2676: Start of 'ora.cssdmonitor' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rh3'
CRS-2672: Attempting to start 'ora.diskmon' on 'rh3'
CRS-2676: Start of 'ora.diskmon' on 'rh3' succeeded
CRS-2676: Start of 'ora.cssd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rh3'
CRS-2676: Start of 'ora.ctssd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rh3'
CRS-2676: Start of 'ora.crsd' on 'rh3' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'rh3'
CRS-2676: Start of 'ora.evmd' on 'rh3' succeeded
/u01/app/11.2.0/grid/bin/srvctl start vip -i rh3 ... failed
Preparing packages for installation...
cvuqdisk-1.0.7-1
Configure Oracle Grid Infrastructure for a Cluster ... failed
Updating inventory properties for clusterware
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 5023 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /s01/oraInventory
'UpdateNodeList' was successful.

Although the "ADVM/ACFS is not supported" problem is now bypassed, root.sh fails with "FATAL: Module oracleoks/oracleadvm/oracleacfs not found", i.e. the ACFS/ADVM kernel modules cannot be found on Linux. A Metalink search shows these to be two confirmed bugs in GI 11.2.0.2, bug 10252497 and bug 10266447, even though what I installed here is 11.2.0.1 GI. Fortunately my environment uses NFS storage, so storage options such as ADVM/ACFS do not matter here; I plan to test this again on 11.2.0.2.
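
When in doubt, the driver state can be inspected directly. A hedged sketch (the acfsdriverstate utility only ships with 11.2.0.2 and later, so on 11.2.0.1 the lsmod/find checks are all we have):

/* Are the ACFS/ADVM kernel modules currently loaded? */
lsmod | grep -E 'oracleoks|oracleadvm|oracleacfs'

/* Were the .ko files installed for the running kernel at all? */
find /lib/modules/$(uname -r) -name 'oracle*.ko' 2>/dev/null

/* On 11.2.0.2+: $GRID_HOME/bin/acfsdriverstate supported */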

It has to be said that 11.2.0.1 GI installations suffer from so many problems that Oracle Support had to write quite a few troubleshooting documents for them, for example <Troubleshooting 11.2 Grid Infrastructure Installation Root.sh Issues [ID 1053970.1]> and <How to Proceed from Failed 11gR2 Grid Infrastructure (CRS) Installation [ID 942166.1]>. I have not yet tried 11.2.0.2 GI; hopefully it is not as bad as its predecessor!

Learning more about 11gR2 diskmon

 

[Figure: dependency relationships among all the resources/daemons managed by ohasd]

 

 

Diskmon

 

Master diskmon

• Monitors CELLSRVs and Network using Heartbeats
• Propagates Cell/Network state to ASM/RDBMS processes (dskm)
• Maintains a cluster-wide global view of Cells with other DISKMONs in the cluster
• Accepts fencing requests from CRS and delivers them to the Cells
• Accepts intradatabase IORM plans from RDBMS and sends them to the Cells
• Provides communication with the cells
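
Since the master diskmon peers with a dskm process inside each ASM/RDBMS instance, both ends of this communication can be observed from the OS. A hedged sketch (process names follow the usual Oracle background-process naming convention):

/* The master diskmon, one per node, spawned under ohasd */
ps -ef | grep '[d]iskmon.bin'

/* The dskm slave processes inside the ASM/RDBMS instances */
ps -ef | grep '[d]skm'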

 

 

The diskmon daemon monitors the cell servers in Exadata, so it is only useful in an Exadata environment. In versions 11.2.0.1 through 11.2.0.2, however, the daemon is started by default even in non-Exadata environments. 11.2.0.3 refines this detail: in a non-Exadata environment diskmon no longer starts.

11.2.0.3 Grid Infrastructure diskmon Will be Offline by Default in Non-Exadata Environment

What is being announced?

As the Grid Infrastructure daemon diskmon.bin is used for Exadata fencing, starting from 11.2.0.3 the resource ora.diskmon will be offline in non-Exadata environments. This is an expected behaviour change.

Prior to 11.2.0.3:

ps -ef| grep diskmon.bin
grid      3361  3166  0 22:57 ?        00:00:00 /ocw/grid/bin/diskmon.bin -d -f

On 11.2.0.3:

ps -ef| grep diskmon.bin

>> no more diskmon.bin
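
A hedged way to confirm this on a given node is to query the ora.diskmon resource itself (crsctl stat res -init lists the ohasd-managed local resources):

/* On 11.2.0.3 non-Exadata, TARGET/STATE should report OFFLINE */
$GRID_HOME/bin/crsctl stat res ora.diskmon -init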

 

 

Some log excerpts involving the diskmon process:

 

[ CSSD]2009-07-27 10:27:36.419 [20] >TRACE: kgzf_dskm_conn4: unable to connect to master
diskmon in 60174 msec

[ CSSD]2009-07-27 10:27:36.419 [20] >TRACE: kgzf_send_main1: connection to master diskmon
timed out

[ CSSD]2009-07-27 10:27:36.421 [22] >TRACE: KGZF: Fatal diskmon condition, IO fencing is
not available. For additional error info look at the master diskmon log file (diskmon.log)

[ CSSD]2009-07-27 10:27:36.421 [22] >ERROR: ASSERT clsssc.c 2471
[ CSSD]2009-07-27 10:27:36.421 [22] >ERROR: clssscSAGEInitFenceCompl: Fence completion
failed, rc 56859

It seems that the new diskmon process registered with Oracle Clusterware is not able to communicate properly.

setsid: failed with -1/1
dskm_getenv_oracle_user: calling getpwnam_r for user oracle
dskm_getenv_oracle_user: info for user oracle complete
dskm_set_user: unable to change ownership for the log directory
/optware/oracle/11.1.0.7/crs/log/shplab01/diskmon to user oracle, id 1101, errno 1
07/27/09 10:27:37: Master Diskmon starting

The tusc trace of the cssd process gives the following information:
...
1248953770.528145 [/optware/ora][20944]{2992772}
unlink("/var/spool/sockets/pwgr/client20944") ERR#2 ENOENT
1248953770.612485 [/optware/ora][20944]{2992772}
unlink("/tmp/.oracle_master_diskmon") ERR#1 EPERM
1248953770.649479 [/optware/ora][20944]{2992772}
unlink("/tmp/.oracle_master_diskmon") ERR#1 EPERM
1248953770.656719 [/optware/ora][20944]{2992772}
unlink("/var/spool/sockets/pwgr/client20944") ERR#1 EPERM
...

There is a permission error on the file /tmp/.oracle_master_diskmon.
Solution

The resolution is to correct the ownership of the file /tmp/.oracle_master_diskmon, which should be owned by oracle.
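
A minimal sketch of the fix (assuming the clusterware stack is stopped on the node first; diskmon recreates the socket file at startup, and the oinstall group name is an assumption):

/* Inspect the current owner of the stale socket file */
ls -l /tmp/.oracle_master_diskmon

/* Either remove it as root and let diskmon recreate it ... */
rm -f /tmp/.oracle_master_diskmon

/* ... or hand it back to the oracle software owner */
chown oracle:oinstall /tmp/.oracle_master_diskmon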

diskmon.log
============
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512] SKGXP:[386927568.6]{0}:
(14036 -> 12265) SKGXP_CHECK_HEART_BEAT_RESP_EXPIRE: NO PATH to Monitor
entry: 0x17161490
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512] SKGXP:[386927568.7]{0}:
  Subnet: 0
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512] SKGXP:[386927568.8]{0}:
   Remote endpoint [192.168.10.3/44538] is DOWN
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512] SKGXP:[386927568.9]{0}:
   Local endpoint [192.168.10.1/45530] is UP
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512]
SKGXP:[386927568.10]{0}: SKGXP_DO_HEART_BEAT_RESP: Matching Monitor Entry Not
Found
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512]
SKGXP:[386927568.11]{0}:   SKGXPGPID Internet address 192.168.10.3 RDS port
number 44538
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512] dskm_hb_thrd_main11:
got status change
2011-12-01 22:14:49.510: [ DISKMON][14036:1093384512]
dskm_ant_rsc_monitor_start: rscnam: o/192.168.10.3 rsc: 0x171609c0 state:
UNREACHABLE reconn_attempts: 0 last_reconn_ts: 1322773921
2011-12-01 22:14:49.649: [ DISKMON][14036:1093384512]
dskm_node_guids_are_offline: query SM done. retcode = 56891(REACHABLE)
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512] dskm_oss_get_net_info5:
oss_get_net_info for device o/192.168.10.3 returned skgxpid
040302010001894cb5afca0419ed706ae92f000008000000000000000000000001030000c0a80a
03000000000000000000000000adfa00000000000016000000 and the following 1 ip
adresess. known_reid: Yes
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512]     192.168.10.1
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512]
dskm_ant_rsc_monitor_start6.5:Cell does support TCP monitor, and does support
SM Query, cell incarnation is 1, guid num is 2
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512] GUID-0 =
0x0021280001a0af15
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512] GUID-1 =
0x0021280001a0af16
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512]
dskm_ant_rsc_monitor_start2: Connected to Same_Inc OSS device: o/192.168.10.3
numIP: 1
2011-12-01 22:14:49.657: [ DISKMON][14036:1093384512]     192.168.10.1

2011-12-01 22:15:07.501: [ DISKMON][14036:1108523328] dskm_slave_thrd_main3:
peer disconnected
2011-12-01 22:15:07.501: [ DISKMON][14036:1108523328] dskm_slave_thrd_main5:
client +ASM1/ASM/15374 disconnected, reid
cid=14e1b2b4de58ef1eff5487b58dccc906,icin=188142332,nmn=1,lnid=188142332,gid=7
,gin=1,gmn=0,umemid=0,opid=8,opsn=1,lvl=process hdr=0x       0

2011-12-01 22:15:08.440: [ CSSCLNT]clsssRecvMsg: got a disconnect from the
server while waiting for message type 1
2011-12-01 22:15:08.440: [ CSSCLNT]clssgsGroupGetStatus:  communications
failed (0/3/-1)

2011-12-01 22:15:08.440: [ CSSCLNT]clssgsGroupGetStatus: returning 8

2011-12-01 22:15:08.440: [ DISKMON][14036:1102219584] CRITICAL: Diskmon
exiting: dskm_rac_thrd_main10: Diskmon is shutting down due to CSSD ABORT
event
2011-12-01 22:15:08.440: [ DISKMON][14036:1102219584] SHUTDOWN FORCE due to
CSSD ABORT
2011-12-01 22:15:08.440: [ DISKMON][14036:1102219584] dskm_rac_thrd_main:
exiting
2011-12-01 22:15:08.754: [ DISKMON][14036:1104320832] dskm_slave_thrd_main5:
client orarootagent/13701 disconnected, reid
cid=DUMMY,icin=-1,nmn=-1,lnid=-1,gid=-1,gin=-1,gmn=-1,umemid=-1,opid=-1,opsn=-
1,lvl=process hdr=0xfece0100
2011-12-01 22:15:09.988: [ DISKMON][14036:1191118288] dskm_cleanup_thrds:
cleaning up the rac event handling thread tid 1102219584
[ DISKMON][13016]

I/O Fencing and SKGXP HA monitoring daemon -- Version 1.2.0.0
Process 13016 started on 2011-12-01 at 22:15:39.863

2011-12-01 22:15:39.867: [ DISKMON][13016] dskm main: starting up

ocssd.log
==========
2011-12-01 22:15:04.223: [    CSSD][1127139648]clssgmmkLocalKillThread: Time
up. Timeout 60500 Start time 369099698 End time 369160198 Current time
369160198
2011-12-01 22:15:04.223: [    CSSD][1127139648]clssgmmkLocalKillResults:
Replying to kill request from remote node 2 kill id 1 Success map 0x00000000
Fail map 0x00000000
2011-12-01 22:15:04.224: [GIPCHAUP][1094015296] gipchaUpperProcessDisconnect:
processing DISCONNECT for hendp 0x2aa5550 [00000000000092e5] { gipchaEndpoint
: port 'nm2_gts-cluster/af9c-724c-2e3f-3946', peer
'gts1db02:205f-3cac-025e-c962', srcCid 00000000-000092e5,  dstCid
00000000-000009d9, numSend 0, maxSend 100, usrFlags 0x4000, flags 0x204 }
2011-12-01 22:15:04.224: [    CSSD][1122408768]clssnmeventhndlr:
Disconnecting endp 0x932d ninf 0x1c3a2c0
2011-12-01 22:15:04.224: [    CSSD][1122408768]clssnmDiscHelper: gts1db02,
node(2) connection failed, endp (0x932d), probe(0x3000000000), ninf->endp
0x932d
2011-12-01 22:15:04.224: [    CSSD][1122408768]clssnmDiscHelper: node 2 clean
up, endp (0x932d), init state 3, cur state 3
2011-12-01 22:15:04.224: [GIPCXCPT][1122408768] gipcInternalDissociate: obj
0x2e99290 [000000000000932d] { gipcEndpoint : localAddr
'gipcha://gts1db01:nm2_gts-cluster/af9c-724c-2e3f-394', remoteAddr
'gipcha://gts1db02:205f-3cac-025e-c96', numPend 0, numReady 0, numDone 0,
numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x13860e, usrFlags
0x0 } not associated with any container, ret gipcretFail (1)
2011-12-01 22:15:04.224: [GIPCXCPT][1122408768] gipcDissociateF
[clssnmDiscHelper : clssnm.c : 3284]: EXCEPTION[ ret gipcretFail (1) ]  
failed to dissociate obj 0x2e99290 [000000000000932d] { gipcEndpoint :
localAddr 'gipcha://gts1db01:nm2_gts-cluster/af9c-724c-2e3f-394', remoteAddr
'gipcha://gts1db02:205f-3cac-025e-c96', numPend 0, numReady 0, numDone 0,
numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x13860e, usrFlags
0x0 }, flags 0x0
2011-12-01 22:15:04.224: [GIPCXCPT][1122408768] gipcInternalDissociate: obj
0x2e99290 [000000000000932d] { gipcEndpoint : localAddr
'gipcha://gts1db01:nm2_gts-cluster/af9c-724c-2e3f-394', remoteAddr
'gipcha://gts1db02:205f-3cac-025e-c96', numPend 0, numReady 0, numDone 0,
numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x13860e, usrFlags
0x0 } not associated with any container, ret gipcretFail (1)
2011-12-01 22:15:04.224: [GIPCXCPT][1122408768] gipcDissociateF
[clssnmDiscHelper : clssnm.c : 3430]: EXCEPTION[ ret gipcretFail (1) ]  
failed to dissociate obj 0x2e99290 [000000000000932d] { gipcEndpoint :
localAddr 'gipcha://gts1db01:nm2_gts-cluster/af9c-724c-2e3f-394', remoteAddr
'gipcha://gts1db02:205f-3cac-025e-c96', numPend 0, numReady 0, numDone 0,
numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x13860e, usrFlags
0x0 }, flags 0x0
2011-12-01 22:15:04.224: [    CSSD][1122408768]clssnmDiscEndp: gipcDestroy
0x932d
2011-12-01 22:15:04.603: [    
CSSD][1104976192](:CSSNM00005:)clssnmvDiskKillCheck: Aborting, evicted by
node gts1db02, number 2, sync 188142334, stamp 393990918
2011-12-01 22:15:04.603: [    
CSSD][1104976192]###################################
2011-12-01 22:15:04.603: [    CSSD][1104976192]clssscExit: CSSD aborting from
thread clssnmvKillBlockThread
2011-12-01 22:15:04.603: [    
CSSD][1104976192]###################################
2011-12-01 22:15:04.603: [    CSSD][1104976192](:CSSSC00012:)clssscExit: A
fatal error occurred and the CSS daemon is terminating abnormally
gts1db01, number 1, has experienced a failure in thread number 10 and is
shutting down
2011-12-01 22:15:04.603: [    CSSD][1104976192]clssscExit: Starting CRSD
cleanup

2011-12-01 22:15:04.737: [    CSSD][1103399232]clssgmDiscEndpcl: gipcDestroy
0xa2ea2f7
2011-12-01 22:15:04.925: [    
CSSD][1112947008](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for
3472942430 ms for voting file o/192.168.10.5/DBFS_DG_CD_04_gts1cel03)
2011-12-01 22:15:04.925: [    CSSD][1112947008]clssnmvDiskAvailabilityChange:
voting file o/192.168.10.5/DBFS_DG_CD_04_gts1cel03 now offline
2011-12-01 22:15:04.925: [    
CSSD][1112947008](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for
3472942450 ms for voting file o/192.168.10.4/DBFS_DG_CD_02_gts1cel02)
2011-12-01 22:15:04.925: [    CSSD][1112947008]clssnmvDiskAvailabilityChange:
voting file o/192.168.10.4/DBFS_DG_CD_02_gts1cel02 now offline
2011-12-01 22:15:04.925: [    
CSSD][1112947008](:CSSNM00058:)clssnmvDiskCheck: No I/O completions for
3472942480 ms for voting file o/192.168.10.3/DBFS_DG_CD_02_gts1cel01)
2011-12-01 22:15:04.926: [    CSSD][1112947008]clssnmvDiskAvailabilityChange:
voting file o/192.168.10.3/DBFS_DG_CD_02_gts1cel01 now offline
2011-12-01 22:15:04.926: [    
CSSD][1112947008](:CSSNM00018:)clssnmvDiskCheck: Aborting, 0 of 3 configured
voting disks available, need 2
2011-12-01 22:15:04.926: [    CSSD][1112947008]clssscExit: abort already set
1
2011-12-01 22:15:04.926: [   SKGFD][1109793088]Lib :OSS:: closing handle
0x2538e70 for disk :o/192.168.10.5/DBFS_DG_CD_04_gts1cel03:

2011-12-01 22:15:04.926: [   SKGFD][1098676544]Lib :OSS:: closing handle
0x2aaaac0d7cb0 for disk :o/192.168.10.3/DBFS_DG_CD_02_gts1cel01:

Heartbeat timeout logic may fail to detect dead cells if diskmon
has been running for over 40 days.

Rediscovery Notes:
 If diskmon has been running for over 40 days and DB processes start to hang
 after a cell death, you may have hit this bug.

 All nodes may hang due to one of the heartbeat threads in diskmon
getting stuck trying to notify the instance(s) that it reconnected
to the cell. However if this occurs there is insufficient diagnostic
data collected to help confirm why the hang occurred.
This fix is a diagnostic enhancement for this scenario.

If diskmon/DSKM processes are hung/stuck this fix may help collect
additional useful diagnostics.
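
Given the 40-day threshold above, a quick hedged check of how long diskmon.bin has been running (etime reports the elapsed time since the process started):

ps -eo pid,etime,cmd | grep '[d]iskmon.bin'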

PROBLEM:
--------
Diskmon logs fill up quickly causing their disk / volume /u00  to become full

DIAGNOSTIC ANALYSIS:
--------------------
1)  The following are consistently and repeatedly logged:

2010-12-17 03:22:25.848: [ DISKMON][17796:1089268032] dskm_rac_ini13: calling
clssgsqgrp
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_rac_ini80: called
clssgsqgrp:
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_dump_group_priv:
vers: 0 flags: 0x0 confInc: 0 My confInc: 0
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_dump_group_priv:
CSS Msg Hdr: vers: 0 type: UNKNOWN (0) chunks: NO MORE CHUNKS (0) transport:
UNKNOWN (0) mSize: 0
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_dump_group_priv:
Group Private Data is not of type DSKM_MSG_SS_REQ. Not proceeding with msg
dump
2010-12-17 03:22:25.849: [ DISKMON][17796:1089268032] dskm_rac_ini15: Found
my member number 1 to be busy. Waiting (attempts: 598) for OCSSD to clean up
previous incarnation

2) Core files are generated, and the many stack dumps of diskmon further enlarge the
diskmon.log

   The following is frequently seen in the diskmon.log

2010-12-17 03:22:28.855: [ DISKMON][17796:1089268032] dskm_rac_ini16: OCSSD
has notified that another diskmon is currently running in this node.
This might be a duplicate startup. If not consult OCSSD log for additional
information.
2010-12-17 03:22:28.855: [ DISKMON][17796] INCIDENT : EXCEPTION (SIGNAL: 6)
in [gsignal()]
2010-12-17 03:22:28.855: [ DISKMON][17796] Thread 1089268032 got exception 6
2010-12-17 03:22:28.855: [ DISKMON][17796] Stack dump for thread 1089268032
[ DISKMON][17796]
....
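
While the underlying cause is being addressed, a hedged check of how much space the diskmon logs are consuming (the path assumes the standard 11.2 $GRID_HOME/log/<nodename>/ layout):

du -sh $GRID_HOME/log/$(hostname -s)/diskmon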
