Oracle ASM Resynchronization, or How to Recover a Damaged Cluster

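Test #3: Overwriting the ASM disk header while the ASM disk group is offline

In this post we dig deeper into ASM. We overwrite the ASM disk header on one disk while the disk group is offline, check the result, and then repair the LUN.

Overwrite the header of disk "DISK003A"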

dd if=/dev/random bs=8k count=1 of=/dev/sdg1
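Before a destructive test like this one, it can help to set aside a copy of the first 8 KiB so the header can be put back afterwards. The original test did not do this; the sketch below is only an assumption about how one might rehearse it, and the helper names `backup_header` and `restore_header` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: copy the first 8 KiB of a device (or file) aside
# and write it back later. Names are illustrative only.

backup_header() {   # usage: backup_header <device> <backup-file>
    dd if="$1" of="$2" bs=8k count=1 2>/dev/null
}

restore_header() {  # usage: restore_header <backup-file> <device>
    # conv=notrunc keeps the rest of a regular file intact;
    # it is harmless on a block device.
    dd if="$1" of="$2" bs=8k count=1 conv=notrunc 2>/dev/null
}
```

Trying this on a scratch file first, rather than on a real LUN, is the safer way to check that the round trip works.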

Try to mount the disk group

SQL> alter diskgroup DATA2 mount;
alter diskgroup data2 mount
*

ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "0" is missing from group number "2"

The ASM alert.log shows:

NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: start heartbeating (grp 2)
kfdp_query(DATA2): 13
kfdp_queryBg(): 13
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: Assigning number (2,0) to disk ()
kfdp_query(DATA2): 14
kfdp_queryBg(): 14
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: cache dismounting (clean) group 2/0x635ADC92 (DATA2)
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 2/0x635ADC92 (DATA2)
NOTE: cache ending mount (fail) of group DATA2 number=2 incarn=0x635adc92
kfdp_dismount(): 15
kfdp_dismountBg(): 15
NOTE: De-assigning number (2,0) from disk ()
NOTE: De-assigning number (2,1) from disk (ORCL:DISK003B)
ERROR: diskgroup DATA2 was not mounted
NOTE: cache deleting context for group DATA2 2/1666899090
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "0" is missing from group number "2"
ERROR: alter diskgroup data2 mount
Thu Oct 01 14:12:04 2009
ASM Health Checker found 1 new failures
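
Alert-log excerpts like the one above are long, and only a few lines carry the actual failure. As a rough sketch, the following filter keeps just the `ORA-` and `ERROR:` lines. The function name `asm_errors` is hypothetical, and it reads the log text from standard input so that no particular alert.log location is assumed.

```shell
#!/bin/sh
asm_errors() {
    # Keep lines that start with an ORA- code or with "ERROR:"
    grep -E '^(ORA-[0-9]+:|ERROR:)'
}
```

Fed with the excerpt above, it would leave the two `ERROR:` lines and the three `ORA-` lines.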

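The result of the following query depends on whether you have rebooted the server or run "oracleasm scandisks" since the header was overwritten. If you have done neither, it looks like this: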

SQL> set pages 40000 lines 120
SQL> col PATH for a30
SQL> select DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE,PATH FROM V$ASM_DISK;
DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    PATH
----------- ------- ------------ ------- -------- ------------------------------
          0 CLOSED  MEMBER       ONLINE  NORMAL   ORCL:DISK001A
          3 CLOSED  MEMBER       ONLINE  NORMAL   ORCL:DISK003B
          2 CLOSED  CANDIDATE    ONLINE  NORMAL   ORCL:DISK003A
          1 CACHED  MEMBER       ONLINE  NORMAL   ORCL:DISK001B
          2 CACHED  MEMBER       ONLINE  NORMAL   ORCL:DISK002A
          3 CACHED  MEMBER       ONLINE  NORMAL   ORCL:DISK002B

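Disk "DISK003A" is marked as "CANDIDATE", so ASM still remembers that the disk is there.

However, if you run "oracleasm scandisks", ASMLib rescans all disks and rereads the ASM headers. This time the query returns: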

SQL> set pages 40000 lines 120
SQL> col PATH for a30
SQL> select DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE,PATH FROM V$ASM_DISK;
DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    PATH
----------- ------- ------------ ------- -------- ------------------------------
          0 CLOSED  MEMBER       ONLINE  NORMAL   ORCL:DISK001A
          2 CLOSED  MEMBER       ONLINE  NORMAL   ORCL:DISK003B
          1 CACHED  MEMBER       ONLINE  NORMAL   ORCL:DISK001B
          2 CACHED  MEMBER       ONLINE  NORMAL   ORCL:DISK002A
          3 CACHED  MEMBER       ONLINE  NORMAL   ORCL:DISK002B

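The former disk "DISK003A" is gone. That is not surprising: we rescanned all disks, and the scan dropped DISK003A because it no longer has a valid disk header. You see the same result after a system reboot.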
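
The HEADER_STATUS column is what separates a healthy disk from the damaged one in these listings. A small filter, sketched here on the assumption that the query output is piped in as plain text with the six columns shown, prints every disk whose header is not `MEMBER`. The name `non_member_disks` is made up for this example.

```shell
#!/bin/sh
non_member_disks() {
    # Data rows start with the numeric DISK_NUMBER; $3 is HEADER_STATUS,
    # $6 is PATH (empty when ASM has no path for the disk).
    awk '$1 ~ /^[0-9]+$/ && $3 != "MEMBER" {
        path = ($6 == "") ? "(no path)" : $6
        print path, $3
    }'
}
```

Header lines, separator lines and the `SQL>` prompt lines are skipped because their first field is not a number.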

Try to force-mount the disk group

Although one mirror is lost, we should still be able to mount the disk group with the remaining mirror:

SQL> alter diskgroup data2 mount force;

Diskgroup altered.

The ASM alert.log shows:

SQL> alter diskgroup data2 mount force

NOTE: cache registered group DATA2 number=2 incarn=0x2b1adcb7
NOTE: cache began mount (first) of group DATA2 number=2 incarn=0x2b1adcb7
NOTE: Assigning number (2,1) to disk (ORCL:DISK003B)
Thu Oct 01 14:43:59 2009
NOTE: start heartbeating (grp 2)
Thu Oct 01 14:43:59 2009
kfdp_query(DATA2): 31
kfdp_queryBg(): 31
NOTE: Assigning number (2,0) to disk ()
kfdp_query(DATA2): 32
kfdp_queryBg(): 32
NOTE: cache opening disk 1 of grp 2: DISK003B label:DISK003B
NOTE: F1X0 found on disk 1 au 2 fcn 0.0
NOTE: cache mounting (first) normal redundancy group 2/0x2B1ADCB7 (DATA2)
Thu Oct 01 14:44:00 2009
* allocate domain 2, invalid = TRUE
Thu Oct 01 14:44:00 2009

NOTE: attached to recovery domain 2

NOTE: cache recovered group 2 to fcn 0.3769
Thu Oct 01 14:44:00 2009
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (DATA2)
NOTE: LGWR found thread 1 closed at ABA 32.395
NOTE: LGWR mounted thread 1 for diskgroup 2 (DATA2)
NOTE: LGWR opening thread 1 at fcn 0.3769 ABA 33.396
NOTE: cache mounting group 2/0x2B1ADCB7 (DATA2) succeeded
NOTE: cache ending mount (success) of group DATA2 number=2 incarn=0x2b1adcb7
Thu Oct 01 14:44:00 2009
kfdp_query(DATA2): 33
kfdp_queryBg(): 33
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATA2 was mounted
SUCCESS: alter diskgroup data2 mount force
Thu Oct 01 14:44:00 2009
NOTE: diskgroup resource ora.DATA2.dg is online
Thu Oct 01 14:45:38 2009
WARNING: Disk (DISK003A) will be dropped in: (12407) secs on ASM inst: (1)
GMON SlaveB: Deferred DG Ops completed.



Query v$asm_disk again:


SQL> set pages 40000 lines 120
SQL> col PATH for a30

SQL> select DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE,PATH FROM V$ASM_DISK;

DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    PATH
----------- ------- ------------ ------- -------- ------------------------------
          0 CLOSED  MEMBER       ONLINE  NORMAL   ORCL:DISK001A
          0 MISSING UNKNOWN      OFFLINE NORMAL
          1 CACHED  MEMBER       ONLINE  NORMAL   ORCL:DISK001B
          2 CACHED  MEMBER       ONLINE  NORMAL   ORCL:DISK002A
          3 CACHED  MEMBER       ONLINE  NORMAL   ORCL:DISK002B
          1 CACHED  MEMBER       ONLINE  NORMAL   ORCL:DISK003B

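Identifying the new/missing disk

ASM has noticed that a disk is missing. But how do we identify that disk? I found the following script and modified it slightly so that it works with the current ASM release: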

# Map each ASMLib disk label to its block device via the [major, minor] pair
/etc/init.d/oracleasm querydisk -d `/etc/init.d/oracleasm listdisks -d` | \
cut -f2,10,11 -d" " | \
perl -pe 's/"(.*)".*\[(.*), *(.*)\]/$1 $2 $3/g;' | \
while read v_asmdisk v_major v_minor
do
# ls -la /dev prints "major, minor" for device nodes; field 10 is the node name
v_device=`ls -la /dev | grep " $v_major, *$v_minor " | awk '{print $10}'`
echo "ASM disk $v_asmdisk based on /dev/$v_device [$v_major, $v_minor]"
done

It reports every discovered ASM disk together with its device path:

ASM disk DISK001A based on /dev/sdc1 [8, 33]
ASM disk DISK001B based on /dev/sdb1 [8, 17]
ASM disk DISK002A based on /dev/sdd1 [8, 49]
ASM disk DISK002B based on /dev/sde1 [8, 65]
ASM disk DISK003B based on /dev/sdf1 [8, 81]


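Based on this information we can subtract the ASM devices from the available devices and conclude that the missing device is /dev/sdg1. This approach may not be efficient, but it gives a quick overview. It is now up to the administrator to verify which device to use. Keep in mind that with multipathing the same device may appear several times.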
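
The subtraction can be sketched with `grep`. The function name `unclaimed_devices` and the two input files are assumptions for illustration: the first file lists all candidate partitions, the second lists the devices reported by the script, one device path per line in each.

```shell
#!/bin/sh
unclaimed_devices() {   # usage: unclaimed_devices <all-devices-file> <asm-devices-file>
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file:
    # print every line of the first file that does not appear in the second
    grep -vxFf "$2" "$1"
}
```

With the five devices from the report in the second file and /dev/sdb1 through /dev/sdg1 in the first, the only line left over is /dev/sdg1. The result is still just a candidate that has to be verified by hand.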

Reusing the same ASM disk name

Labeling the ASM disk

We will reuse /dev/sdg1 and label it as an ASM disk with the old label "DISK003A":

[root@rac1 ~]# oracleasm createdisk DISK003A /dev/sdg1
Writing disk header: done
Instantiating disk: done
[root@rac1 ~]# oracleasm scandisks
Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
[root@rac1 ~]# oracleasm listdisks

DISK001A
DISK001B
DISK002A
DISK002B
DISK003A
DISK003B

So far, so good.

Adding the disk to the disk group

Now we need to add the disk to the disk group:

SQL> alter diskgroup data2 add disk 'ORCL:DISK003A';
alter diskgroup data2 add disk 'ORCL:DISK003A'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15020: discovered duplicate ASM disk "DISK003A"


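Well, the error message is correct. ASM remembers the former disk and prevents a disk with the same name from being reused. However, you can force the name to be reused with the following steps:

  • Drop the offline disk from the disk group
  • Add the new disk to the disk group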
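
These two steps can be condensed into a helper that only prints the statements, so they can be reviewed before being pasted into SQL*Plus. This is a sketch: `replace_disk_sql` is a hypothetical name, and the `ORCL:` prefix assumes ASMLib-labelled disks as used in this test.

```shell
#!/bin/sh
replace_disk_sql() {    # usage: replace_disk_sql <diskgroup> <disk-label>
    # Step 1: force-drop the offline disk so that its name is released
    printf "alter diskgroup %s drop disk %s force;\n" "$1" "$2"
    # Step 2: add the freshly labelled disk under the same name
    printf "alter diskgroup %s add disk 'ORCL:%s';\n" "$1" "$2"
}
```

Calling `replace_disk_sql data2 DISK003A` prints the two statements used in this test. Nothing is executed against the instance.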

 

 

1. Drop the offline disk from the disk group

SQL> alter diskgroup data2 drop disk DISK003A force;

Diskgroup altered.

The ASM alert.log shows:

Thu Oct 01 15:24:59 2009
SQL> alter diskgroup data2 drop disk DISK003A force
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=2
NOTE: initiating PST update: grp = 2
kfdp_update(): 34
Thu Oct 01 15:25:02 2009
kfdp_updateBg(): 34
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: PST update grp = 2 completed successfully
Thu Oct 01 15:25:02 2009
NOTE: membership refresh pending for group 2/0x2b1adcb7 (DATA2)
kfdp_query(DATA2): 35
kfdp_queryBg(): 35
SUCCESS: refreshed membership for 2/0x2b1adcb7 (DATA2)
NOTE: starting rebalance of group 2/0x2b1adcb7 (DATA2) at power 1
SUCCESS: alter diskgroup data2 drop disk DISK003A force
Starting background process ARB0
Thu Oct 01 15:25:05 2009
ARB0 started with pid=27, OS id=24150
NOTE: assigning ARB0 to group 2/0x2b1adcb7 (DATA2)
NOTE: F1X0 copy 1 relocating from 0:2 to 1:2 for diskgroup 2 (DATA2)
NOTE: F1X0 copy 2 relocating from 1:2 to 0:2 for diskgroup 2 (DATA2)
NOTE: stopping process ARB0
Thu Oct 01 15:25:12 2009
SUCCESS: rebalance completed for group 2/0x2b1adcb7 (DATA2)
Thu Oct 01 15:25:12 2009
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=2
NOTE: initiating PST update: grp = 2
kfdp_update(): 36
Thu Oct 01 15:25:15 2009
kfdp_updateBg(): 36
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: PST update grp = 2 completed successfully
WARNING: offline disk number 0 has references (3433 AUs)
NOTE: initiating PST update: grp = 2
kfdp_update(): 37
kfdp_updateBg(): 37
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: PST update grp = 2 completed successfully
NOTE: membership refresh pending for group 2/0x2b1adcb7 (DATA2)
kfdp_query(DATA2): 38
kfdp_queryBg(): 38
SUCCESS: refreshed membership for 2/0x2b1adcb7 (DATA2)


2. Add the new disk to the disk group

SQL> alter diskgroup data2 add disk 'ORCL:DISK003A';
Diskgroup altered.

The ASM alert.log shows:

Thu Oct 01 15:27:22 2009
SQL> alter diskgroup data2 add disk 'ORCL:DISK003A'
NOTE: Assigning number (2,2) to disk (ORCL:DISK003A)
NOTE: requesting all-instance membership refresh for group=2
NOTE: initializing header on grp 2 disk DISK003A
NOTE: cache opening disk 2 of grp 2: DISK003A label:DISK003A
NOTE: requesting all-instance disk validation for group=2
Thu Oct 01 15:27:25 2009
NOTE: disk validation pending for group 2/0x2b1adcb7 (DATA2)
SUCCESS: validated disks for 2/0x2b1adcb7 (DATA2)
NOTE: initiating PST update: grp = 2
kfdp_update(): 39
Thu Oct 01 15:27:28 2009
kfdp_updateBg(): 39
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: group DATA2: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 2 completed successfully
NOTE: membership refresh pending for group 2/0x2b1adcb7 (DATA2)
kfdp_query(DATA2): 40
kfdp_queryBg(): 40
kfdp_query(DATA2): 41
kfdp_queryBg(): 41
SUCCESS: refreshed membership for 2/0x2b1adcb7 (DATA2)
NOTE: starting rebalance of group 2/0x2b1adcb7 (DATA2) at power 1
Starting background process ARB0
SUCCESS: alter diskgroup data2 add disk 'ORCL:DISK003A'
Thu Oct 01 15:27:31 2009
ARB0 started with pid=35, OS id=24288
NOTE: assigning ARB0 to group 2/0x2b1adcb7 (DATA2)
NOTE: F1X0 copy 2 relocating from 0:2 to 2:2 for diskgroup 2 (DATA2)


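ASM then rebalances its extents to restore the redundancy you chose. Once the operation has completed successfully, the following messages appear: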


Thu Oct 01 15:37:45 2009
NOTE: stopping process ARB0
SUCCESS: rebalance completed for group 2/0x2b1adcb7 (DATA2)
Thu Oct 01 15:37:47 2009
NOTE: GroupBlock outside rolling migration privileged region
NOTE: requesting all-instance membership refresh for group=2
NOTE: initiating PST update: grp = 2
kfdp_update(): 42
Thu Oct 01 15:37:50 2009
kfdp_updateBg(): 42
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: group DATA2: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 2 completed successfully
SUCCESS: disk number 0 force dropped offline
NOTE: initiating PST update: grp = 2
kfdp_update(): 43
kfdp_updateBg(): 43
NOTE: group DATA2: updated PST location: disk 0001 (PST copy 0)
NOTE: group DATA2: updated PST location: disk 0002 (PST copy 1)
NOTE: PST update grp = 2 completed successfully
NOTE: De-assigning number (2,0) from disk ()
NOTE: membership refresh pending for group 2/0x2b1adcb7 (DATA2)
kfdp_query(DATA2): 44
kfdp_queryBg(): 44
SUCCESS: refreshed membership for 2/0x2b1adcb7 (DATA2)

Checking ASM disk operations

You can check ASM operations with the following query:

SQL> select GROUP_NUMBER, OPERATION, STATE, ACTUAL, SOFAR, EST_MINUTES from v$asm_operation;

If an operation is running (such as a rebalance), the query returns some rows. For the disk we just added, for example, we get:

GROUP_NUMBER OPERA STAT     ACTUAL      SOFAR EST_MINUTES
------------ ----- ---- ---------- ---------- -----------
           2 REBAL RUN           1         49          16
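
To follow a rebalance without rereading the raw columns, the output row can be turned into a one-line summary. This is only a sketch, assuming the six columns of the query arrive as plain text on standard input; `rebalance_summary` is a hypothetical name.

```shell
#!/bin/sh
rebalance_summary() {
    # Data rows start with the numeric GROUP_NUMBER; columns follow the query:
    # GROUP_NUMBER OPERATION STATE ACTUAL SOFAR EST_MINUTES
    awk '$1 ~ /^[0-9]+$/ && NF == 6 {
        printf "group %s: %s %s at power %s, %s allocation units moved, about %s min left\n",
               $1, $2, $3, $4, $5, $6
    }'
}
```

For the row shown here it reports group 2 running a rebalance at power 1 with about 16 minutes left. When no operation is running, the query returns no rows and the function prints nothing.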
