存在这种可能性,即ORACLE ASM在add disk扩盘时add disk操作正常完成,disk group的rebalance其实还没有开始,但是由于新加入的disk存在硬件故障,导致add disk后写入到disk header的所有metadata元数据全部丢失,且由于diskgroup是外部冗余即EXTERNAL REdundancy所以该diskgroup由于已经加入了一个DISK,而该DISK上的metadata全部丢失的缘故,所以该diskgroup 将无法正常MOUNT。
且由于新加入的disk上的所有metadata都丢失了,而不仅仅是丢失了disk header的KFBTYP_DISKHEAD,所以还不能是仅仅将KFBTYP_DISKHEAD的信息通过kfed merge其他的disk信息并做修改来还原,其需要通过特殊的手工处理才能绕过该问题。
如下面的例子:
SUCCESS: diskgroup TESTDG03 was created
NOTE: cache deleting context for group TESTDG03 1/0x86485c30
NOTE: cache registered group TESTDG03 number=1 incarn=0xab385c36
NOTE: cache began mount (first) of group TESTDG03 number=1 incarn=0xab385c36
NOTE: Assigning number (1,3) to disk (/oracleasm/asm-disk04)
NOTE: Assigning number (1,2) to disk (/oracleasm/asm-disk03)
NOTE: Assigning number (1,1) to disk (/oracleasm/asm-disk02)
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Thu Jan 29 08:21:07 2015
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 92 for pid 20, osid 20176
Thu Jan 29 08:21:07 2015
NOTE: cache opening disk 0 of grp 1: TESTDG03_0000 path:/oracleasm/asm-disk01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 1: TESTDG03_0001 path:/oracleasm/asm-disk02
NOTE: cache opening disk 2 of grp 1: TESTDG03_0002 path:/oracleasm/asm-disk03
NOTE: cache opening disk 3 of grp 1: TESTDG03_0003 path:/oracleasm/asm-disk04
NOTE: cache mounting (first) external redundancy group 1/0xAB385C36 (TESTDG03)
NOTE: cache recovered group 1 to fcn 0.0
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Thu Jan 29 08:21:07 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR found thread 1 closed at ABA 0.10750
NOTE: LGWR mounted thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR opening thread 1 at fcn 0.0 ABA 2.0
NOTE: setting 11.2 start ABA for group TESTDG03 thread 1 to 2.0
NOTE: cache mounting group 1/0xAB385C36 (TESTDG03) succeeded
NOTE: cache ending mount (success) of group TESTDG03 number=1 incarn=0xab385c36
GMON querying group 1 at 93 for pid 13, osid 4612
Thu Jan 29 08:21:07 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup TESTDG03 was mounted
SUCCESS: CREATE DISKGROUP TESTDG03 EXTERNAL REDUNDANCY DISK '/oracleasm/asm-disk01' SIZE 129500M ,
'/oracleasm/asm-disk02' SIZE 128800M ,
'/oracleasm/asm-disk03' SIZE 129200M ,
'/oracleasm/asm-disk04' SIZE 128800M ATTRIBUTE 'compatible.asm'='11.2.0.0.0','au_size'='1M' /* ASMCA */
Thu Jan 29 08:21:07 2015
NOTE: diskgroup resource ora.TESTDG03.dg is online
NOTE: diskgroup resource ora.TESTDG03.dg is updated
Thu Jan 29 08:21:23 2015
SQL> alter diskgroup testdg03 add disk '/oracleasm/asm-disk06'
ORA-15032: not all alterations performed
ORA-15260: permission denied on ASM disk group
ERROR: alter diskgroup testdg03 add disk '/oracleasm/asm-disk06'
Thu Jan 29 08:21:31 2015
SQL> alter diskgroup testdg03 add disk '/oracleasm/asm-disk06'
NOTE: Assigning number (1,4) to disk (/oracleasm/asm-disk06)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk TESTDG03_0004
NOTE: requesting all-instance disk validation for group=1
Thu Jan 29 08:21:32 2015
NOTE: skipping rediscovery for group 1/0xab385c36 (TESTDG03) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0xab385c36 (TESTDG03) on local instance.
NOTE: initiating PST update: grp = 1
Thu Jan 29 08:21:32 2015
GMON updating group 1 at 94 for pid 21, osid 22706
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xab385c36 (TESTDG03)
GMON querying group 1 at 95 for pid 13, osid 4612
NOTE: cache opening disk 4 of grp 1: TESTDG03_0004 path:/oracleasm/asm-disk06
GMON querying group 1 at 96 for pid 13, osid 4612
SUCCESS: refreshed membership for 1/0xab385c36 (TESTDG03)
SUCCESS: alter diskgroup testdg03 add disk '/oracleasm/asm-disk06'
NOTE: Attempting voting file refresh on diskgroup TESTDG03
Thu Jan 29 08:22:09 2015
SQL> alter diskgroup testdg03 dismount
NOTE: cache dismounting (clean) group 1/0xAB385C36 (TESTDG03)
NOTE: messaging CKPT to quiesce pins Unix process pid: 22730, image: oracle@mlab2.oracle.com (TNS V1-V3)
Thu Jan 29 08:22:10 2015
NOTE: LGWR doing clean dismount of group 1 (TESTDG03)
NOTE: LGWR closing thread 1 of diskgroup 1 (TESTDG03) at ABA 2.15
NOTE: cache dismounted group 1/0xAB385C36 (TESTDG03)
Thu Jan 29 08:22:10 2015
GMON dismounting group 1 at 97 for pid 21, osid 22730
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
SUCCESS: diskgroup TESTDG03 was dismounted
NOTE: cache deleting context for group TESTDG03 1/0xab385c36
Thu Jan 29 08:22:10 2015
NOTE: diskgroup resource ora.TESTDG03.dg is offline
SUCCESS: alter diskgroup testdg03 dismount
NOTE: diskgroup resource ora.TESTDG03.dg is updated
SQL> alter diskgroup testdg03 mount
NOTE: cache registered group TESTDG03 number=1 incarn=0x83f85c5f
NOTE: cache began mount (first) of group TESTDG03 number=1 incarn=0x83f85c5f
NOTE: Assigning number (1,3) to disk (/oracleasm/asm-disk04)
NOTE: Assigning number (1,2) to disk (/oracleasm/asm-disk03)
NOTE: Assigning number (1,1) to disk (/oracleasm/asm-disk02)
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Thu Jan 29 08:22:22 2015
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 100 for pid 21, osid 22730
Thu Jan 29 08:22:22 2015
NOTE: Assigning number (1,4) to disk ()
GMON querying group 1 at 101 for pid 21, osid 22730
NOTE: cache dismounting (clean) group 1/0x83F85C5F (TESTDG03)
NOTE: messaging CKPT to quiesce pins Unix process pid: 22730, image: oracle@mlab2.oracle.com (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 1/0x83F85C5F (TESTDG03)
NOTE: cache ending mount (fail) of group TESTDG03 number=1 incarn=0x83f85c5f
NOTE: cache deleting context for group TESTDG03 1/0x83f85c5f
GMON dismounting group 1 at 102 for pid 21, osid 22730
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
ERROR: diskgroup TESTDG03 was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "4" is missing from group number "1"
ERROR: alter diskgroup testdg03 mount
Thu Jan 29 08:27:37 2015
SQL> alter diskgroup testdg03 mount
NOTE: cache registered group TESTDG03 number=1 incarn=0x56985c64
NOTE: cache began mount (first) of group TESTDG03 number=1 incarn=0x56985c64
NOTE: Assigning number (1,3) to disk (/oracleasm/asm-disk04)
NOTE: Assigning number (1,2) to disk (/oracleasm/asm-disk03)
NOTE: Assigning number (1,1) to disk (/oracleasm/asm-disk02)
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Thu Jan 29 08:27:43 2015
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 105 for pid 21, osid 23017
NOTE: cache opening disk 0 of grp 1: TESTDG03_0000 path:/oracleasm/asm-disk01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 1: TESTDG03_0001 path:/oracleasm/asm-disk02
NOTE: cache opening disk 2 of grp 1: TESTDG03_0002 path:/oracleasm/asm-disk03
NOTE: cache opening disk 3 of grp 1: TESTDG03_0003 path:/oracleasm/asm-disk04
NOTE: cache mounting (first) external redundancy group 1/0x56985C64 (TESTDG03)
NOTE: cache recovered group 1 to fcn 0.609
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Thu Jan 29 08:27:43 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR found thread 1 closed at ABA 2.15
NOTE: LGWR mounted thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR opening thread 1 at fcn 0.609 ABA 3.16
NOTE: cache mounting group 1/0x56985C64 (TESTDG03) succeeded
NOTE: cache ending mount (success) of group TESTDG03 number=1 incarn=0x56985c64
GMON querying group 1 at 106 for pid 13, osid 4612
Thu Jan 29 08:27:43 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup TESTDG03 was mounted
SUCCESS: alter diskgroup testdg03 mount
Thu Jan 29 08:27:43 2015
NOTE: diskgroup resource ora.TESTDG03.dg is online
NOTE: diskgroup resource ora.TESTDG03.dg is updated
Thu Jan 29 08:33:52 2015
SQL> alter diskgroup testdg03 check all norepair
NOTE: starting check of diskgroup TESTDG03
Thu Jan 29 08:33:52 2015
GMON checking disk 0 for group 1 at 107 for pid 21, osid 23017
GMON checking disk 1 for group 1 at 108 for pid 21, osid 23017
GMON checking disk 2 for group 1 at 109 for pid 21, osid 23017
GMON checking disk 3 for group 1 at 110 for pid 21, osid 23017
ERROR: no kfdsk for (4)
ERROR: check of diskgroup TESTDG03 found 1 total errors
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ORA-15032: not all alterations performed
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ERROR: alter diskgroup testdg03 check all norepair
Thu Jan 29 08:34:07 2015
SQL> alter diskgroup testdg03 check all
NOTE: starting check of diskgroup TESTDG03
Thu Jan 29 08:34:07 2015
GMON checking disk 0 for group 1 at 111 for pid 21, osid 23017
GMON checking disk 1 for group 1 at 112 for pid 21, osid 23017
GMON checking disk 2 for group 1 at 113 for pid 21, osid 23017
GMON checking disk 3 for group 1 at 114 for pid 21, osid 23017
ERROR: no kfdsk for (4)
ERROR: check of diskgroup TESTDG03 found 1 total errors
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ORA-15032: not all alterations performed
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ERROR: alter diskgroup testdg03 check all
SQL> alter diskgroup testdg03 check all repair
NOTE: starting check of diskgroup TESTDG03
GMON checking disk 0 for group 1 at 115 for pid 21, osid 23017
GMON checking disk 1 for group 1 at 116 for pid 21, osid 23017
GMON checking disk 2 for group 1 at 117 for pid 21, osid 23017
GMON checking disk 3 for group 1 at 118 for pid 21, osid 23017
ERROR: no kfdsk for (4)
ERROR: check of diskgroup TESTDG03 found 1 total errors
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ORA-15032: not all alterations performed
ORA-15049: diskgroup "TESTDG03" contains 1 error(s)
ERROR: alter diskgroup testdg03 check all repair
[oracle@mlab2 oracleasm]$ oerr ora 15042
15042, 00000, “ASM disk \”%s\” is missing from group number \”%s\” ”
// *Cause: The specified disk, which is a necessary part of a diskgroup,
// could not be found on the system.
// *Action: Check the hardware configuration.
//
ORA-15042错误正是因为add disk的磁盘上的metadata全部丢失了,但搞笑的时候新加入的盘上可能因为还没有开始rebalance而没有一点真正有意义的数据,但因为ASM认为该disk已经add进来了,所以必须要该disk可用才能mount diskgroup。 而且用户甚至无法强制DROP这个DISK,原因是需要DISKGROUP在MOUNT状态下才可以drop disk, 这就变成了鸡生蛋 蛋生鸡的死循环, 要DROP这个disk必须MOUNT DISKGROUP,但要MOUNT DISKGROUP要先DROP该DISK。
对于此问题一般需要诗檀软件工程师手动修改ASM metadata来绕过问题,或者如果有之前的ASM metadata也可以采用。