Oracle ASM fails to start after a node reboot with KFOD-00313, KFOD-00310: Cluster services error [Unknown error in CSS]


Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
Linux x86-64

Symptoms

On a 3-node RAC on Oracle Linux, Oracle ASM does not come up after a node reboot, and the voting files cannot be located:

//alertxxxx03.log
2015-12-08 13:58:37.755:
[cssd(31576)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/rwpcomp03/cssd/ocssd.log
2015-12-08 13:58:41.082:
[ohasd(22290)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'xxxx03'.
2015-12-08 13:58:41.329:
[ohasd(22290)]CRS-2878:Failed to restart resource 'ora.cssd'
2015-12-08 13:58:42.762:
[cssd(32313)]CRS-1713:CSSD daemon is started in clustered mode
2015-12-08 13:58:42.858:
[cssd(32313)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/rwpcomp03/cssd/ocssd.log
2015-12-08 13:58:57.875:
[cssd(32313)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/rwpcomp03/cssd/ocssd.log

 

Changes

All nodes were up and running when node 03 rebooted unexpectedly, followed by nodes 04 and 05. The cluster now cannot be restarted.

//ocssd.log
2015-12-08 12:12:01.082: [   SKGFD][2906195712]Handle 0x7f258808e5f0 from lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: for disk :ORCL:OCRVOTE2:

2015-12-08 12:12:01.083: [   SKGFD][2906195712]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x7f258808e5f0 for disk :ORCL:OCRVOTE2:

2015-12-08 12:12:01.418: [    CSSD][2918246144]clssscSelect: cookie accept request 0x7f2598084600
2015-12-08 12:12:01.418: [    CSSD][2918246144]clssscevtypSHRCON: getting client with cmproc 0x7f2598084600
2015-12-08 12:12:01.418: [    CSSD][2918246144]clssgmRegisterClient: proc(4/0x7f2598084600), client(419/0x7f259807c8f0)
2015-12-08 12:12:01.418: [    CSSD][2918246144]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7f2598084600) client(0x7f259807c8f0)
2015-12-08 12:12:01.418: [    CSSD][2918246144]clssgmDiscEndpcl: gipcDestroy 0x46b8
2015-12-08 12:12:02.084: [   SKGFD][2906195712]ERROR: -14(asmlib /opt/oracle/extapi/64/asm/orcl/1/libasm.so version failed with 2
)
2015-12-08 12:12:02.084: [    CLSF][2906195712]Allocated CLSF context
2015-12-08 12:12:02.084: [    CSSD][2906195712]clssnmvGetDiskHandle: Unable to open disk ORCL:OCRVOTE2
2015-12-08 12:12:02.084: [    CSSD][2906195712]clssnmlio_opthr:failed to open ORCL:OCRVOTE2
2015-12-08 12:12:02.085: [    CSSD][2934007552]clssnmRemoveNodeInTerm: node 2,  terminated due to Normal Shutdown. Removing from member and connected bitmaps
2015-12-08 12:12:02.085: [    CSSD][2934007552]###################################
2015-12-08 12:12:02.085: [    CSSD][2934007552]clssscExit: CSSD signal 11 in thread Main
2015-12-08 12:12:02.085: [    CSSD][2934007552]###################################
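
When ocssd.log is large, it can help to pull out only the disks CSSD could not open. The following is a minimal sketch, not an Oracle utility: the function name failed_vote_disks is hypothetical, and the match string is copied from the clssnmvGetDiskHandle line in the excerpt above, so other Grid Infrastructure versions may word the message differently.

```shell
# failed_vote_disks: hypothetical helper, not an Oracle tool.
# Reads ocssd.log text on stdin and prints each distinct disk name that
# appears in a "clssnmvGetDiskHandle: Unable to open disk" line.
failed_vote_disks() {
    sed -n 's/.*clssnmvGetDiskHandle: Unable to open disk \(.*\)$/\1/p' | sort -u
}

# Example (log path as quoted in the alert log above):
# failed_vote_disks < /u01/app/11.2.0/grid/log/rwpcomp03/cssd/ocssd.log
```

For the excerpt above this would list only ORCL:OCRVOTE2, since that is the only disk named in an "Unable to open disk" line.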

 

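Cause

The OS messages log on node 03 shows yum installing new kernel packages shortly before the failure (the lines are quoted under the solution below).

Because of the kernel update, the oracle-ohasd main process (14805) was then terminated by a TERM signal.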

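Solution

Verify that the ASM storage configuration is correct, that the ASM disk headers are valid, and that all disks can be discovered on all three nodes: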

===========================================================================================

Below is the output of crsctl query css votedisk from all nodes. Node 03 has a problem and returns no output:

[oracle@xxxx03 ~]$ /u01/app/11.2.0/grid/bin/crsctl query css votedisk
[oracle@xxxx03 ~]$
Nodes 04 and 05 show the correct output for the voting files:

[oracle@xxxx04 etc]$ /u01/app/11.2.0/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   5658baa966104f4ebf959e1bba090d2b (ORCL:OCRVOTE1) [OCRVOTE]
2. ONLINE   8a57fa3cffae4f21bf44e9c834f15efe (ORCL:OCRVOTE2) [OCRVOTE]
3. ONLINE   67b5f53c43894f31bfccc7e94374f617 (ORCL:OCRVOTE3) [OCRVOTE]
4. ONLINE   afc631cc261c4fc1bf6f6729b0383087 (ORCL:OCRVOTE4) [OCRVOTE]
5. ONLINE   11c5294e8b124fdfbf770326342775f2 (ORCL:OCRVOTE5) [OCRVOTE]
Located 5 voting disk(s).

 

[oracle@xxxx04 etc]$ ssh xxxxx05
Last login: Tue Dec  8 19:28:02 2015 from xxxx04.mtn.co.rw
[oracle@xxxx05 ~]$  /u01/app/11.2.0/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   5658baa966104f4ebf959e1bba090d2b (ORCL:OCRVOTE1) [OCRVOTE]
2. ONLINE   8a57fa3cffae4f21bf44e9c834f15efe (ORCL:OCRVOTE2) [OCRVOTE]
3. ONLINE   67b5f53c43894f31bfccc7e94374f617 (ORCL:OCRVOTE3) [OCRVOTE]
4. ONLINE   afc631cc261c4fc1bf6f6729b0383087 (ORCL:OCRVOTE4) [OCRVOTE]
5. ONLINE   11c5294e8b124fdfbf770326342775f2 (ORCL:OCRVOTE5) [OCRVOTE]
Located 5 voting disk(s).
[oracle@xxxx05 ~]$
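
The per-node comparison above can be reduced to a single number per node. This is only a sketch under assumptions: count_online_votedisks is a hypothetical helper name, and it assumes the column layout shown in the output above (a numeric index such as "1." followed by the STATE column).

```shell
# count_online_votedisks: hypothetical helper, not part of crsctl.
# Reads `crsctl query css votedisk` output on stdin and prints how many
# rows have a numeric index in column 1 ("1.", "2.", ...) and ONLINE in column 2.
count_online_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}

# Example, run on each node in turn:
# /u01/app/11.2.0/grid/bin/crsctl query css votedisk | count_online_votedisks
```

Fed the outputs quoted above, it prints 5 for nodes 04 and 05 and 0 for node 03, which makes the odd node out easy to spot when looping over hosts.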
With the asm_diskstring below, kfod shows all disks intact on all three nodes:
kfod status=TRUE asm_diskstring='/dev/oracleasm/disks/*' disks=ALL
However, after listing the disks, node 03 still reports the following errors:

KFOD-00313: No ASM instances available. CSS group services were successfully initilized by kgxgncin
KFOD-00310: Cluster services error [Unknown error in CSS]
KFOD-00310: Cluster services error [Unknown error in CSS]

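Ownership and permissions on /etc/oracle are 775 root:oinstall and are consistent across the nodes.

Confirm that the physical voting file is present on each disk, as described in "How to find the Physical Location of the Voting File and OCR on ASM" (Doc ID 1051453.1):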

$ kfed read /dev/oracleasm/disks/OCRVOTE1
$ kfed read /dev/oracleasm/disks/OCRVOTE2
$ kfed read /dev/oracleasm/disks/OCRVOTE3
$ kfed read /dev/oracleasm/disks/OCRVOTE4
$ kfed read /dev/oracleasm/disks/OCRVOTE5

The kfdhdb.vfstart and kfdhdb.vfend lines were non-zero on every disk, which indicates that each disk holds a voting file.
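
Reading five kfed dumps by eye is error-prone, so the vfstart/vfend check can be scripted. The sketch below is an illustration only: has_voting_file is a hypothetical helper, and it assumes kfed prints each field as "name: decimal-value ; offset: hex-value", with the decimal value in the second whitespace-separated column.

```shell
# has_voting_file: hypothetical helper, not an Oracle tool.
# Reads `kfed read` output on stdin. Returns 0 when both kfdhdb.vfstart and
# kfdhdb.vfend carry a non-zero decimal value, 1 otherwise (including when
# either field is missing from the input).
has_voting_file() {
    awk '
        $1 == "kfdhdb.vfstart:" { vfstart = $2 + 0 }
        $1 == "kfdhdb.vfend:"   { vfend   = $2 + 0 }
        END { if (vfstart > 0 && vfend > 0) exit 0; exit 1 }
    '
}

# Example, using the device paths from the kfed commands above:
# for d in /dev/oracleasm/disks/OCRVOTE*; do
#     if kfed read "$d" | has_voting_file; then
#         echo "$d: voting file present"
#     else
#         echo "$d: no voting file found"
#     fi
# done
```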
The following kfod invocation narrows the problem down to access to the underlying device:

oracle@xxxx03 $ /u01/app/11.2.0/grid/bin/kfod disks=all

KFOD-00313
KFOD-00310
KFOD-00311: error scanning device /dev/raw/rawctl
ORA-15025: could not open disk "/dev/raw/rawctl"
Linux-x86_64 Error: 13: Permission denied
Additional information: 42
Additional information: 22627403
Additional information: 1598903119

Comparing the time node 03 went down in the ASM alert.log against /var/log/messages, the following lines were found in the OS messages:

Dec 8 09:18:02 xxxx03 yum[30069]: Installed: kernel-uek-firmware-3.8.13-68.el6uek.20903396.noarch
Dec 8 09:18:10 xxxx03 yum[30069]: Installed: kernel-uek-3.8.13-68.el6uek.20903396.x86_64

Then, because of the kernel updates, the oracle-ohasd main process (14805) was killed by a TERM signal.

For ASM to work again, the Oracle Linux kernel update must be rolled back.
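
To find such entries without paging through the whole messages file, a filter along these lines may help. It is a sketch, not a vendor tool: kernel_installs is a hypothetical name, and the pattern assumes the "yum[pid]: Installed: ..." layout seen in the two lines above (with "Updated:" added as the other action yum logs for package changes).

```shell
# kernel_installs: hypothetical helper. Reads syslog text on stdin and
# prints the lines where yum installed or updated a package whose name
# starts with "kernel".
kernel_installs() {
    grep -E 'yum\[[0-9]+\]: (Installed|Updated): kernel'
}

# Example:
# kernel_installs < /var/log/messages
# uname -r    # kernel currently running, for comparison
```

Comparing the timestamps it prints with the shutdown time in the ASM alert.log is the same correlation described above.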

 

 

 

 

 
