A Deep Dive into Oracle ASM (Part 1): Basic Concepts

ASM Basic Concepts

If you cannot solve the problem yourself, the ParnassusData professional ORACLE database recovery team can help you recover!

ParnassusData professional database recovery team

Service hotline: 13764045638   QQ: 47079569    Email: service@parnassusdata.com

Please credit the original source in any reprint, or legal action will be pursued: https://www.askmac.cn/archives/know-oracle-asm.html

 

Related articles:

Asm Instance Parameter Best Practice

Why is there no ASMLIB on RHEL 6?

How to view TRACE files whose names start with "+asm" on Unix

The impact of asm_power_limit on IO

Recovery methods for losing the ASM Diskgroup holding OCR and Votedisk in 11.2 RAC

10g ASM lost disk log

Demystifying 11gR2 RAC ASM startup

Changing the ASM DISK Path in 11gR2 RAC

Using UDEV to solve RAC ASM storage device naming on Linux 6

Script: Locate the Spfile parameter file stored in ASM

How to diagnose ASMLIB problems

Script: Collect ASM diagnostic information

Comparison between ASM note [ID 373242.1] and note [ID 452924.1]

Why ASMLIB and why not?

The "ASM file metadata operation" wait event

A few questions about Oracle 11g ASM

Using the UDEV service to solve RAC ASM storage device naming

Discover Your Missed ASM Disks

The Oracle internal view X$KFFXP

Fixed X$ Tables in ASM

Understanding the MAP file generated by the AMDU tool

Using the AMDU tool to extract datafiles from a DISKGROUP that cannot be MOUNTed

 

Oracle Automatic Storage Management Overview

Automatic Storage Management (ASM) is a feature of Oracle Database that gives database administrators a simple storage management interface that is consistent across all server and storage platforms. As a vertically integrated file system and volume manager built specifically for Oracle database files, ASM delivers the performance of direct asynchronous I/O with the manageability of a file system. ASM saves DBA time, provides the flexibility to manage a dynamic database environment, and improves efficiency. Its main advantages are:

  • Simplified and automated storage management
  • Improved storage utilization and agility
  • Predictable performance, availability, and scalability


Oracle Cloud File System Overview

Oracle Cloud File System (CloudFS) dramatically simplifies storage management, provisioning automation, and storage consolidation for general-purpose files. CloudFS is a storage cloud infrastructure that provides resource pooling, network accessibility, rapid elasticity, and rapid provisioning — key requirements of a cloud computing environment. The product includes:

  • Oracle ASM Dynamic Volume Manager (ADVM)

ADVM provides a general-purpose volume management service and a standard device-driver interface, making administration consistent for system administrators across platforms. ACFS and third-party file systems can use ASM dynamic volumes to create and manage file systems that take full advantage of ASM features. As a result, an ADVM volume can be resized without downtime to meet the file system's storage needs.

  • Oracle ASM Cluster File System (ACFS)

A general-purpose, POSIX-, X/OPEN-, and Windows-compliant file system designed for both single-node and cluster configurations. ACFS is managed with native operating system commands, ASM's asmcmd, and Oracle Enterprise Manager. ACFS supports advanced data services such as point-in-time copy snapshots, file system replication and tagging, as well as file system security and encryption.

 

 

Automatic Storage Management is Oracle's automated database storage solution, first introduced (ahead of other RDBMS vendors) in version 10g and further refined in 11g. ASM provides the simple, effective storage management interface a database needs, consistent across servers and storage platforms. ASM unifies a filesystem and a volume manager into one layer designed specifically for Oracle database files; it delivers the high performance of asynchronous I/O while remaining as easy to manage as a file system. ASM increases database storage scalability and saves DBA time, allowing a fast-changing database environment to be managed with more agility and efficiency.

 

ASM exists to manage file storage for the RDBMS

  • Note that ASM does NOT perform I/O on behalf of the RDBMS. A common misconception is that the RDBMS sends an I/O request to ASM and ASM performs the actual I/O — this is wrong.
  • The actual I/O is still performed by RDBMS processes, exactly as with raw devices and no ASM
  • ASM is therefore not an I/O middle layer, and there is no such thing as an I/O bottleneck caused by ASM
  • For ASM, a LUN/DISK can be a raw device or (from 10.2.0.2 on) a block device directly
  • File types suitable for ASM include: datafiles, controlfiles, redo logs, archived logs, flashback logs, spfiles, RMAN backups, block change tracking files, and Data Pump files
  • From 11gR2 on, ASM adds the ACFS feature, which can hold files of any type; ACFS, however, does not support datafiles

 

ASM basic concepts:

  • ASM's smallest unit of storage is an "allocation unit" (AU), normally 1MB; 4MB is recommended on Exadata
  • The core job of ASM is storing files
  • A file is divided into file pieces called "extents"
  • Before 11g an extent was always exactly one AU; from 11g on an extent can be 1, 8, or 64 AUs
  • ASM maintains the location of a file's extents in a file extent map
  • ASM keeps its metadata in the header of each LUN/DISK, not in a data dictionary
  • The RDBMS also caches file extent maps in the shared pool, for use by server processes when performing I/O
  • Because an ASM instance uses instance/crash recovery on the same principles as a normal RDBMS instance, an ASM instance can always recover after a crash.

ASM storage is presented as diskgroups:

  • A Diskgroup (DG) is visible to RDBMS instances; e.g. a DATA DG appears to the RDBMS as a storage location named '+DATA', on which a tablespace can be created, e.g.: create tablespace ONASM datafile '+DATA' size 10M.
  • A Diskgroup consists of one or more failure groups (FG)
  • An FG is defined as a set of Disks
  • A Disk here can be a raw physical volume, a disk partition, a LUN presented by a storage array, an LVM volume, or a NAS device
  • Disks in different FGs should not share a single point of failure; otherwise ASM redundancy is ineffective

 

High availability provided by ASM:

  • ASM provides data mirroring to recover from disk failures
  • Users can choose among three redundancy levels: EXTERNAL, NORMAL, and HIGH
  • EXTERNAL: ASM does no mirroring itself and relies on the underlying storage array to provide it. Under External redundancy, any write error forces the Disk Group to be dismounted; all ASM DISKs must be present and healthy, or the Disk Group cannot be MOUNTed
  • NORMAL: ASM creates one extra copy of every extent for redundancy. By default all files are mirrored, so every file extent has 2 copies. If write errors occur on 2 Disks that are partners, the Disk Group is forcibly dismounted. If the failed disks are not partners, there is no data loss or unavailability.
  • HIGH: ASM creates two extra copies of every extent for even higher redundancy. The failure of 2 Disks that are partners causes no data loss — provided no further partner Disks fail.
  • Data mirroring relies on failure groups and extent partnering. At NORMAL or HIGH redundancy, ASM can tolerate losing every disk in one failure group.

 

How Failure Group mirroring works

  • ASM mirroring is not like RAID 1
  • ASM mirrors at file-extent granularity; an extent's copies are spread across several disks, called partners
  • Partner disks are placed in one or more separate failure groups
  • ASM chooses partners automatically and limits their number to fewer than 10
  • When a disk fails, ASM updates its extent maps so that future reads are directed to the remaining healthy partners
  • In 11g, while a disk is offline, changes to files are tracked so that when the disk comes back online the changes can be re-applied — provided the outage does not exceed the time set by DISK_REPAIR_TIME (default 3.6 hours). This typically covers storage controller failures and similar short-lived disk outages.
  • The change tracking is based on a bitmap of modified file extents; the bitmap tells ASM which extents must be copied from healthy partners to the disk being repaired. This feature is called fast mirror resync.
  • 10g has no fast mirror resync: a disk that goes offline is simply dropped automatically, with no repair window
  • A disk that can never come back online must be dropped; ASM selects a new disk and copies data to it through a rebalancing operation, all of which happens automatically in the background.
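The staleness-tracking idea behind fast mirror resync can be pictured as a per-disk bitmap of dirty extents. The following is a conceptual Python sketch only — the class and method names are invented for illustration and do not reflect ASM's real implementation:

```python
class OfflineDiskTracker:
    """Conceptual model of fast mirror resync staleness tracking."""

    def __init__(self, extent_count):
        # one bit per file extent whose copy lives on the offline disk
        self.stale = [False] * extent_count

    def record_write(self, extent_no):
        # a write hit an extent whose copy on the offline disk is now stale
        self.stale[extent_no] = True

    def extents_to_resync(self):
        # when the disk comes back online, only these extents are copied
        # from a healthy partner, instead of rebuilding the whole disk
        return [i for i, dirty in enumerate(self.stale) if dirty]


tracker = OfflineDiskTracker(extent_count=8)
tracker.record_write(2)
tracker.record_write(5)
print(tracker.extents_to_resync())  # -> [2, 5]
```

The payoff is exactly what the bullet list describes: a short outage costs a copy of a few extents rather than a full disk rebuild.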

 

Rebalancing

  • Rebalancing is the process of moving file extents between disks so that the I/O load is evenly balanced across the diskgroup
  • Rebalancing runs asynchronously in the background and can be monitored
  • In a cluster environment, a given diskgroup's rebalance runs on only one ASM instance; it cannot be spread across multiple nodes to speed it up
  • When a disk is added or removed, ASM automatically starts rebalancing in the background
  • The speed and intensity of a rebalance are controlled by the asm_power_limit parameter
  • asm_power_limit defaults to 1 and ranges from 0 to 11 (0 to 1024 from 11.2.0.2 on); it controls the number of rebalance slave processes. Level 0 means no rebalancing is performed
  • During a rebalance, I/O performance (mainly throughput and response time) may be affected; how much depends on the storage's capability and the rebalance power. The default asm_power_limit=1 does not cause excessive impact

 

Performance

  • ASM stripes a file's extents across the DG to maximize the available I/O bandwidth
  • Two stripe widths are available: coarse striping at 1 AU, and fine striping at 128K
  • Fine striping still uses normal-sized file extents, but the data is striped in smaller pieces laid out round-robin across multiple extents
  • By default ASM does not have the RDBMS read the secondary (mirror) extent copies; rest assured, I/O remains balanced nonetheless
  • By default the RDBMS always reads the primary extent; from 11.1 on, the ASM_PREFERRED_READ_FAILURE_GROUPS parameter lets a local node preferentially read extents from a given failure group. This feature is designed for extended distance RAC and is not recommended for ordinary ASM deployments
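The round-robin layout of fine striping can be sketched as simple arithmetic. This is a conceptual illustration only — the 1MB AU and the assumption that 128K stripes rotate across a group of 8 extents are stated assumptions, not a statement of ASM's exact internal layout:

```python
AU = 1024 * 1024          # assumed 1MB allocation unit (the default)
FINE_STRIPE = 128 * 1024  # fine striping size, 128K
GROUP = 8                 # assumed number of extents striped across

def fine_locate(file_offset):
    """Map a file byte offset to (extent index, offset within extent)
    under round-robin fine striping. Conceptual sketch only."""
    stripe_no = file_offset // FINE_STRIPE
    group_no, member = divmod(stripe_no, GROUP)     # which pass, which extent
    stripes_per_extent = AU // FINE_STRIPE          # 8 stripes fit one 1MB extent
    extent = (group_no // stripes_per_extent) * GROUP + member
    stripe_in_extent = group_no % stripes_per_extent
    offset_in_extent = stripe_in_extent * FINE_STRIPE + file_offset % FINE_STRIPE
    return extent, offset_in_extent

print(fine_locate(0))           # first 128K lands on extent 0
print(fine_locate(128 * 1024))  # the next 128K lands on extent 1
```

The point of the exercise: consecutive 128K pieces land on different extents (and therefore usually different disks), which is why fine striping helps latency-sensitive small I/O.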

 

Other facts

  • ASM is not RAC-only; a single node benefits from ASM just as well
  • One ASM instance on a node can serve multiple RDBMS DB instances
  • In a RAC environment ASM itself must also be clustered, so that metadata updates can be coordinated
  • From 11.2 on, ASM is split out of the RDBMS HOME and is installed together with the clusterware in the GRID HOME.

 

 

Disk Group:

A Disk Group is the logical object managed by ASM; a Disk Group is made up of multiple ASM disks. Each Disk Group is self-describing, just like a standard file system: all the metadata about space usage in the Diskgroup is fully contained within the disk group itself. As long as ASM can find all the DISKs belonging to an ASM diskgroup, it needs no additional external metadata.

File space is allocated from a Disk Group. Any ASM file is always fully contained within a single Disk Group. However, a Disk Group may hold files belonging to several databases, and a single database may store its files in multiple Disk Groups. In most real deployments only a small number of Disk Groups are created, typically 3 or 4.

A Disk Group offers one of three redundancy levels, as described above.

 

ASM Disk

An ASM Disk is the basic persistent storage that makes up a Disk Group. When an ASM Disk is added to a Disk Group, it is given either an administrator-specified ASM Disk Name or a system-generated one. This is distinct from the OS device name used to access the device. In a Cluster, the same Disk may appear under different Device Names on different nodes — e.g. /dev/sdc on Node1 may correspond to /dev/sdd on Node2. An ASM Disk must be accessible via direct disk I/O on every instance that uses the Disk Group.

In fact, for the Oracle RDBMS, accessing an ASM disk is no different from accessing an ordinary file — unless ASMLIB is used (and, to stress it once more, ASMLIB is not required by ASM). Typically an ASM Disk is a partition of an OS-visible LUN, a partition covering all of the disk's space not reserved by the operating system. Most operating systems reserve the first block of a LUN for the partition table; since ASM always writes to the first block of an ASM Disk, make sure ASM cannot overwrite the partition table in those first few blocks — for example, on Solaris, do not assign the first few cylinders to the partition. A LUN can be a simple physical JBOD disk or a virtual LUN managed by an advanced storage array; it can be a direct-attached device or sit on a SAN. An ASM Disk can be anything accessible through open() system calls, except a local file system — even a file on NFS can be used as an ASM Disk, which makes ASM convenient for NAS fans, although compared with NFS I would rather recommend iSCSI outright.

 

Note that although a logical volume managed by an ordinary Logical Volume Manager (LVM) can be used as an ASM Disk, this combination is not recommended unless you can think of no better option. Even if you must use it, do not also do mirroring or striping at the LVM level.

ASM distributes every file evenly, in AU-sized pieces, across all the Disks of a Disk Group. Every ASM Disk is kept at roughly the same percentage of space used, which keeps the I/O load on all Disks of a Disk Group roughly equal. Because ASM load-balances across the disks of a Disk Group, carving two regions of the same physical disk into 2 ASM Disks of one Disk Group gains nothing for performance; placing two partitions of one physical disk into 2 different Disk Groups, on the other hand, is a valid arrangement.
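The "keep every disk at the same fill level" behavior can be illustrated with a toy allocator. This is purely a conceptual sketch — ASM's real allocation policy is more sophisticated; the function and disk names here are invented:

```python
def allocate_file(au_count, disk_free):
    """Toy allocator: place each successive AU on the disk with the most
    free AUs, which naturally keeps all disks at a similar fill level.
    Conceptual illustration only, not ASM's real algorithm."""
    placement = []
    for _ in range(au_count):
        disk = max(disk_free, key=disk_free.get)  # least-full disk
        disk_free[disk] -= 1
        placement.append(disk)
    return placement

# three equally empty disks: AUs end up spread round-robin across them
print(allocate_file(6, {"disk0": 10, "disk1": 10, "disk2": 10}))
# -> ['disk0', 'disk1', 'disk2', 'disk0', 'disk1', 'disk2']
```

Because allocation always targets the emptiest disk, adding a file never skews usage toward one spindle — which is the property the paragraph above describes.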

When redundancy is enabled for an ASM Disk Group, a single ASM Disk is the unit of failure. In 10g, a failed write to an ASM Disk automatically drops that Disk from the Disk Group, provided the loss of that Disk can be tolerated.

 

Allocation Unit

Every ASM Disk is divided into many allocation units (AUs; a single AU is 1MB to 64MB, always a power-of-two number of MB). The AU is also the basic allocation unit of a Disk Group. The usable space of an ASM Disk is always a whole number of AUs. At the head of each ASM Disk there is a table in which each entry represents one AU of that Disk. A file's extent pointer gives an ASM Disk Number and an AU number, which together describe the physical location of that extent. Because all space operations are done in units of AUs, there is no such concept as ASM fragmentation.

An AU (1M to 64M) is small enough that a file always contains many AUs, so it can be spread across many disks without creating hot spots. An AU is also large enough to be accessed in a single I/O operation for better throughput, and to make sequential access efficient: the time to access an AU is dominated by the disk transfer rate rather than by seeking to the start of the AU. Disk Group rebalancing is likewise done one AU at a time.
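Since an extent pointer is just (disk number, AU number), translating it into a byte offset on the disk is simple arithmetic — which is exactly what the dd commands later in this article exploit (skip=<AU number> with bs=1024k). A minimal sketch, assuming the default 1MB AU:

```python
AU_SIZE = 1024 * 1024  # 1MB, the default allocation unit size

def au_byte_offset(au_number, au_size=AU_SIZE):
    """Byte offset of an AU from the beginning of its ASM disk."""
    return au_number * au_size

# AU 38, as in the spfile example later in this article;
# dd if=<disk> skip=38 bs=1024k count=1 reads from exactly this offset
print(au_byte_offset(38))  # -> 39845888
```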

 

 

 

 

 

The roles of the ASM background processes:

 

GMON: ASM Diskgroup monitor process

ASMB: ASM background network process

RBAL: ASM rebalance master process

ARBx: rebalance slave processes — the background processes that actually carry out the rebalance

MARK: the coordinator process for AU resync

 

The roles of the ASM foreground processes:

 

ASM clients (mainly RDBMS DB instances and CRSD) spawn foreground processes when they connect to the ASM instance. A foreground process is generally named oracle+ASM_<process>_<product> (for example: oracle+ASM_DBW0_DB1).

 

The OCR-specific foreground process: oracle+ASM1_ocr

 

 

ASM-related V$ and X$ views

 

View name X$ base table(s) Description
V$ASM_DISKGROUP X$KFGRP Performs disk discovery and lists disk groups
V$ASM_DISKGROUP_STAT X$KFGRP_STAT Shows disk group status
V$ASM_DISK X$KFDSK, X$KFKID Performs disk discovery and lists disks along with their usage metrics
V$ASM_DISK_STAT X$KFDSK_STAT, X$KFKID Lists disks and their usage metrics
V$ASM_FILE X$KFFIL Lists ASM files, including their metadata
V$ASM_ALIAS X$KFALS Lists ASM aliases, files and directories
V$ASM_TEMPLATE X$KFTMTA Lists available templates and their attributes
V$ASM_CLIENT X$KFNCL Lists the DB instances connected to ASM
V$ASM_OPERATION X$KFGMG Lists rebalancing operations
N/A X$KFKLIB Available ASMLIB paths
N/A X$KFDPARTNER Lists disk-partner relationships
N/A X$KFFXP Extent map of all ASM files
N/A X$KFDAT Extent list of all ASM Disks
N/A X$KFBH Describes the ASM cache
N/A X$KFCCE A linked list of ASM blocks
V$ASM_ATTRIBUTE (new in 11g) X$KFENV (new in 11g) ASM attributes; the X$ base table also shows some hidden attributes
V$ASM_DISK_IOSTAT (new in 11g) X$KFNSDSKIOST (new in 11g) I/O statistics
N/A X$KFDFS (new in 11g)
N/A X$KFDDD (new in 11g)
N/A X$KFGBRB (new in 11g)
N/A X$KFMDGRP (new in 11g)
N/A X$KFCLLE (new in 11g)
N/A X$KFVOL (new in 11g)
N/A X$KFVOLSTAT (new in 11g)
N/A X$KFVOFS (new in 11g)
N/A X$KFVOFSV (new in 11g)

 

X$KFFXP contains the mapping between files, extents and AUs. From this X$ view you can trace how a given file's extents are striped and mirrored. Note that reads are load-balanced between the primary AU and the mirror AU, while writes must go to both copies on disk. The meanings of the X$KFFXP columns are as follows:

 

 

X$KFFXP Column Name Description
ADDR x$ table address/identifier
INDX row unique identifier
INST_ID instance number (RAC)
NUMBER_KFFXP ASM file number. Join with v$asm_file and v$asm_alias
COMPOUND_KFFXP File identifier. Join with compound_index in v$asm_file
INCARN_KFFXP File incarnation id. Join with incarnation in v$asm_file
PXN_KFFXP Progressive file extent number
XNUM_KFFXP ASM file extent number (mirrored extent pairs have the same extent value)
GROUP_KFFXP ASM disk group number. Join with v$asm_disk and v$asm_diskgroup
DISK_KFFXP Disk number where the extent is allocated. Join with v$asm_disk
AU_KFFXP Relative position of the allocation unit from the beginning of the disk. The allocation unit size (1 MB by default) is shown in v$asm_diskgroup
LXN_KFFXP 0->primary extent, 1->mirror extent, 2->2nd mirror copy (high redundancy and metadata)
FLAGS_KFFXP N.K.
CHK_KFFXP N.K.

 

 

X$KFDAT contains the details of every allocation unit (AU), whether FREE or USED.

 

X$KFDAT Column Name Description
ADDR x$ table address/identifier
INDX row unique identifier
INST_ID instance number (RAC)
GROUP_KFDAT diskgroup number, join with v$asm_diskgroup
NUMBER_KFDAT disk number, join with v$asm_disk
COMPOUND_KFDAT disk compound_index, join with v$asm_disk
AUNUM_KFDAT Disk allocation unit (relative position from the beginning of the disk), join with
x$kffxp.au_kffxp
V_KFDAT V=this Allocation Unit is used; F=AU is free
FNUM_KFDAT file number, join with v$asm_file
I_KFDAT N/K
XNUM_KFDAT Progressive file extent number join with x$kffxp.pxn_kffxp
RAW_KFDAT raw format encoding of the disk,and file extent information

 

 

X$KFDPARTNER contains the disk-to-partner (1-N) mapping. Within a given ASM Diskgroup, if 2 Disks hold mirror copies of the same extent, those 2 disks are regarded as partners. Partners therefore must belong to different failgroups of the same diskgroup.

 

X$KFDPARTNER Column Name Description
ADDR x$ table address/identifier
INDX row unique identifier
INST_ID instance number (RAC)
GRP diskgroup number, join with v$asm_diskgroup
DISK disk number, join with v$asm_disk
COMPOUND disk identifier. Join with compound_index in v$asm_disk
NUMBER_KFDPARTNER partner disk number, i.e. disk-to-partner (1-N) relationship
MIRROR_KFDPARTNER =1 in a healthy normal redundancy config
PARITY_KFDPARTNER =1 in a healthy normal redundancy config
ACTIVE_KFDPARTNER =1 in a healthy normal redundancy config

 

Essential techniques for studying ASM

 

1) Find the mirror extents of an ASM file — in this example, the ASM spfile

 

 

[grid@localhost ~]$ sqlplus  / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Wed Feb 13 11:13:39 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Automatic Storage Management option

INSTANCE_NAME
----------------
+ASM

SQL> 
SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +SYSTEMDG/asm/asmparameterfile
                                                 /registry.253.805993079

select GROUP_KFFXP, DISK_KFFXP, AU_KFFXP
  from x$kffxp
 where number_kffxp =
       (select file_number
          from v$asm_alias
         where name = 'REGISTRY.253.805993079');

GROUP_KFFXP DISK_KFFXP   AU_KFFXP
----------- ---------- ----------
          3          2         38
          3          1         39
          3          0         44

The same extents can also be located via x$kfdat:

select GROUP_KFDAT, NUMBER_KFDAT, AUNUM_KFDAT
  from x$kfdat
 where fnum_kfdat = (select file_number
                       from v$asm_alias
                      where name = 'REGISTRY.253.805993079')

GROUP_KFDAT NUMBER_KFDAT AUNUM_KFDAT
----------- ------------ -----------
          3            0          44
          3            1          39
          3            2          38

==> find the OS path of each of these DISKs
SQL> select path,DISK_NUMBER from v$asm_disk where GROUP_NUMBER=3 and disk_number in (0,1,2);

PATH                 DISK_NUMBER
-------------------- -----------
/dev/asm-diski                 2
/dev/asm-diskh                 1
/dev/asm-diskg                 0

SQL> create pfile='/home/grid/pfile' from spfile;

File created.

SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Automatic Storage Management option

[grid@localhost ~]$ cat pfile 
+ASM.asm_diskgroups='EXTDG','NORDG'#Manual Mount
*.asm_diskstring='/dev/asm*'
*.asm_power_limit=1
*.diagnostic_dest='/g01/app/grid'
*.instance_type='asm'
*.large_pool_size=12M
*.local_listener='LISTENER_+ASM'
*.remote_login_passwordfile='EXCLUSIVE'

Read each AU with dd

[grid@localhost ~]$ dd if=/dev/asm-diski of=/tmp/spfile.dmp skip=38 bs=1024k count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00328614 seconds, 319 MB/s

[grid@localhost ~]$ strings /tmp/spfile.dmp 
+ASM.asm_diskgroups='EXTDG','NORDG'#Manual Mount
*.asm_diskstring='/dev/asm*'
*.asm_power_limit=1
*.diagnostic_dest='/g01/app/grid'
*.instance_type='asm'
*.large_pool_size=12M
*.local_listener='LISTENER_+ASM'
*.remote_login_passwordfile='EXCLUSIVE'

[grid@localhost ~]$ dd if=/dev/asm-diskh of=/tmp/spfile1.dmp skip=39 bs=1024k count=1  
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0325114 seconds, 32.3 MB/s

[grid@localhost ~]$ strings /tmp/spfile1.dmp                                          
+ASM.asm_diskgroups='EXTDG','NORDG'#Manual Mount
*.asm_diskstring='/dev/asm*'
*.asm_power_limit=1
*.diagnostic_dest='/g01/app/grid'
*.instance_type='asm'
*.large_pool_size=12M
*.local_listener='LISTENER_+ASM'
*.remote_login_passwordfile='EXCLUSIVE'		

[grid@localhost ~]$ dd if=/dev/asm-diskg of=/tmp/spfile2.dmp skip=44 bs=1024k count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0298287 seconds, 35.2 MB/s

[grid@localhost ~]$ strings /tmp/spfile2.dmp                                          
+ASM.asm_diskgroups='EXTDG','NORDG'#Manual Mount
*.asm_diskstring='/dev/asm*'
*.asm_power_limit=1
*.diagnostic_dest='/g01/app/grid'
*.instance_type='asm'
*.large_pool_size=12M
*.local_listener='LISTENER_+ASM'
*.remote_login_passwordfile='EXCLUSIVE'

 

 

2) Show the mapping between asm disk failure groups and disk partners:

 

  1* select DISK_NUMBER,FAILGROUP,path from v$asm_disk where group_number=3
SQL> /

DISK_NUMBER FAILGROUP                      PATH
----------- ------------------------------ --------------------
          3 SYSTEMDG_0003                  /dev/asm-diskj
          2 SYSTEMDG_0002                  /dev/asm-diski
          1 SYSTEMDG_0001                  /dev/asm-diskh
          0 SYSTEMDG_0000                  /dev/asm-diskg

SQL> select disk,NUMBER_KFDPARTNER,DISKFGNUM from X$KFDPARTNER where grp=3;

      DISK NUMBER_KFDPARTNER  DISKFGNUM
---------- ----------------- ----------
         0                 1          1
         0                 2          1
         0                 3          1
         1                 0          2
         1                 2          2
         1                 3          2
         2                 0          3
         2                 1          3
         2                 3          3
         3                 0          4
         3                 1          4
         3                 2          4

12 rows selected.

 

 

Frequently asked ASM questions, FAQ:

 

Q: At what granularity does ASM do rebalance and mirroring?

A: The basic granule of ASM mirroring is the file extent. By default one extent equals one AU; from 11g on an extent can be 1, 8, or 64 AUs.

The basic granule of ASM rebalancing is also the extent, although the rebalance itself proceeds one AU at a time.

 

 

Q: What is the relationship between ASMLIB and ASM?

A: ASMLIB is a Linux-module-based kernel support library designed specifically for Oracle's Automatic Storage Management feature.

Put simply, ASMLIB is a Linux package; it is not part of the Oracle ASM kernel. Through ASMLIB you get device-name binding, which makes devices convenient for ASM to use; but many services on Linux can provide stable device names for ASM — udev and multipath, for example.

So ASMLIB is not a required component of ASM. Many Chinese-language articles describe this concept poorly, creating the misconception that ASMLIB equals ASM, or that ASM requires ASMLIB — a myth passed from one article to the next.

For ASMLIB's drawbacks, see my article "Why ASMLIB and why not?".

 

Q: Is ASM raid 10 or raid 01?

A: ASM mirroring is based on file extents, not on disks or blocks as RAID is, so ASM is neither RAID 10 nor RAID 01. If you insist on an analogy: because ASM mirrors first and then stripes, in that respect it more closely resembles RAID 10. But note, to stress it once more: ASM is neither RAID 10 nor RAID 01 — repeat it a thousand times…

 

 

TO BE Continued………………. 😎

[Oracle ASM] How Variable Extent Size works

Variable size extents enable support for larger ASM datafiles, reduce SGA memory requirements for very large databases, and improve performance for file create and open operations. The size of the extent map that defines a file can be smaller by a factor of 8 and 64 depending on the file size. The initial extent size is equal to the allocation unit size and it increases by a factor of 8 and 64 at predefined thresholds. This feature is automatic for newly created and resized datafiles when the disk group compatibility attributes are set to Oracle Release 11 or higher. For information about compatibility attributes, see “Disk Group Compatibility”.

For 11.1: extent size is always equal to 1 AU for the first 20000 extent sets (0 – 19999), with consecutive extents distributed across the ASM disks. After the first 20000 extent sets, the extent size becomes 8*AU for the next 20000 extent sets (20000 – 39999), and the next increment for an ASM extent is 64*AU.

ASM coarse striping is always equal to the disk group AU size, while the fine striping size remains 128KB in any configuration. The AU size is determined at disk group creation time with the allocation unit size (AU_SIZE) disk group attribute; the possible values are 1, 2, 4, 8, 16, 32, and 64 MB.

 

 

FOR 11.2:

The extent size of a file varies as follows:

  • Extent size always equals the disk group AU size for the first 20000 extent sets (0 – 19999).
  • Extent size equals 4*AU size for the next 20000 extent sets (20000 – 39999).
  • Extent size equals 16*AU size for the next 20000 and higher extent sets (40000+)
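The 11.2 thresholds above are easy to express as a function of the extent set number — a small sketch (sizes returned in AUs; assumes disk group compatibility >= 11.1 so that variable extents are in effect):

```python
def extent_size_in_au(extent_set_no):
    """Variable extent size rule for 11.2:
    1*AU for extent sets 0-19999, 4*AU for 20000-39999, 16*AU from 40000 on."""
    if extent_set_no < 20000:
        return 1
    if extent_set_no < 40000:
        return 4
    return 16

print(extent_size_in_au(0), extent_size_in_au(20000), extent_size_in_au(40000))
# -> 1 4 16
```

This matches the X$KFFXP query shown below, where SIZE_KFFXP jumps to 4 at XNUM_KFFXP 20000.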

 

 

 


 

Note: Variable Extent Size requires the disk group compatibility attributes >= 11.1; otherwise, no matter how large a datafile you build, its Extent Size will always end up equal to 1 AU.

 

SQL> select * from x$kffxp where size_kffxp!=1 and rownum<3;

ADDR                   INDX    INST_ID GROUP_KFFXP NUMBER_KFFXP COMPOUND_KFFXP INCARN_KFFXP  PXN_KFFXP XNUM_KFFXP  LXN_KFFXP DISK_KFFXP   AU_KFFXP FLAGS_KFFXP  CHK_KFFXP SIZE_KFFXP
---------------- ---------- ---------- ----------- ------------ -------------- ------------ ---------- ---------- ---------- ---------- ---------- ----------- ---------- ----------
00007F6FCB2557F0      54983          1           1         6345       16783561    838455287      40000      20000          0          4      17036           0        224          4
00007F6FCB2557F0      54984          1           1         6345       16783561    838455287      40001      20000          1          0       3888           0         21          4

 

In the query above, size_kffxp is the size of the Extent (in AUs). These EXTENTs belong to FILE NUMBER 6345:

 

 

[oracle@mlab2 ~]$ kfed read /dev/asm-disk9 aun=22 blkn=201|less
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                    6345 ; 0x004: blk=6345
kfbh.block.obj:                       1 ; 0x008: file=1
kfbh.check:                  2648198861 ; 0x00c: 0x9dd84ecd
kfbh.fcn.base:                  1506490 ; 0x010: 0x0016fcba
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfffdb.node.incarn:           838455287 ; 0x000: A=1 NUMM=0x18fce7fb
kfffdb.node.frlist.number:   4294967295 ; 0x004: 0xffffffff
kfffdb.node.frlist.incarn:            0 ; 0x008: A=0 NUMM=0x0
kfffdb.hibytes:                       5 ; 0x00c: 0x00000005
kfffdb.lobytes:              1073750016 ; 0x010: 0x40002000
kfffdb.xtntcnt:                   40768 ; 0x014: 0x00009f40
kfffdb.xtnteof:                   40768 ; 0x018: 0x00009f40
kfffdb.blkSize:                    8192 ; 0x01c: 0x00002000
kfffdb.flags:                        17 ; 0x020: O=1 S=0 S=0 D=0 C=1 I=0 R=0 A=0
kfffdb.fileType:                      2 ; 0x021: 0x02
kfffdb.dXrs:                         18 ; 0x022: SCHE=0x1 NUMB=0x2
kfffdb.iXrs:                         19 ; 0x023: SCHE=0x1 NUMB=0x3
kfffdb.dXsiz[0]:                  20000 ; 0x024: 0x00004e20
kfffdb.dXsiz[1]:                  20000 ; 0x028: 0x00004e20
kfffdb.dXsiz[2]:             4294967288 ; 0x02c: 0xfffffff8
kfffdb.iXsiz[0]:             4294967295 ; 0x030: 0xffffffff
kfffdb.iXsiz[1]:                      0 ; 0x034: 0x00000000
kfffdb.iXsiz[2]:                      0 ; 0x038: 0x00000000
kfffdb.xtntblk:                      63 ; 0x03c: 0x003f
kfffdb.break:                        60 ; 0x03e: 0x003c
kfffdb.priZn:                         0 ; 0x040: KFDZN_COLD
kfffdb.secZn:                         0 ; 0x041: KFDZN_COLD
kfffdb.ub2spare:                      0 ; 0x042: 0x0000
kfffdb.alias[0]:                   6998 ; 0x044: 0x00001b56
kfffdb.alias[1]:             4294967295 ; 0x048: 0xffffffff
kfffdb.strpwdth:                      8 ; 0x04c: 0x08
kfffdb.strpsz:                       20 ; 0x04d: 0x14
kfffdb.usmsz:                         0 ; 0x04e: 0x0000
kfffdb.crets.hi:               32999496 ; 0x050: HOUR=0x8 DAYS=0x2 MNTH=0x2 YEAR=0x7de
kfffdb.crets.lo:              988212224 ; 0x054: USEC=0x0 MSEC=0x1bb SECS=0x2e MINS=0xe
kfffdb.modts.hi:               32999500 ; 0x058: HOUR=0xc DAYS=0x2 MNTH=0x2 YEAR=0x7de
kfffdb.modts.lo:                      0 ; 0x05c: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfffdb.dasz[0]:                       0 ; 0x060: 0x00
kfffdb.dasz[1]:                       2 ; 0x061: 0x02
kfffdb.dasz[2]:                       4 ; 0x062: 0x04
kfffdb.dasz[3]:                       0 ; 0x063: 0x00
kfffdb.permissn:                      0 ; 0x064: 0x00
kfffdb.ub1spar1:                      0 ; 0x065: 0x00
kfffdb.ub2spar2:                      0 ; 0x066: 0x0000
kfffdb.user.entnum:                   0 ; 0x068: 0x0000
kfffdb.user.entinc:                   0 ; 0x06a: 0x0000
kfffdb.group.entnum:                  0 ; 0x06c: 0x0000
kfffdb.group.entinc:                  0 ; 0x06e: 0x0000

/* -------------------------------- kfdasz --------------------------------- */
/*
  NAME
    kfdasz - Kernel Files Disk Au SiZe.

  DESCRIPTION
    Enumerates the possible AU size multiples which may be used for
    Multi-AU "buddied" extents.

  NOTES
    The AU size multiple value may be determined by using the
    KFDASZ_VALUE macro.
*/
#ifndef KFDASZ_1X
/* 10g had space for four sizes, but only 1x was used. The remaining sizes
 * are new in 11, but 10g never references them, so it is safe to change the
 * values
 */
#define KFDASZ_1X      ((kfdasz)0)      /*    1x AU size                     */
#define KFDASZ_2X      ((kfdasz)1)      /*    2x AU size                     */
#define KFDASZ_4X      ((kfdasz)2)      /*    4x AU size                     */
#define KFDASZ_8X      ((kfdasz)3)      /*    8x AU size                     */
#define KFDASZ_16X     ((kfdasz)4)      /*   16x AU size                     */
#define KFDASZ_32X     ((kfdasz)5)      /*   32x AU size                     */
#define KFDASZ_64X     ((kfdasz)6)      /*   64x AU size                     */
#define KFDASZ_LAST    ((kfdasz)7)      /*   First unused value 11g          */
#define KFDASZ_LAST_10 ((kfdasz)4)      /*   First unused value 10g          */
#define KFDASZ_VALUE(x) ((ub1)(1 << (x)))
#endif /* KFDASZ_1X */

 

An extent is composed of one or more allocation units (AUs). An extent can be 1 AU, 4 AUs, 16 AUs, or 64 AUs. The extent size is encoded in two bits of the flags field as a kfdasz enumeration.

 

Why is there no ASMLIB on RHEL 6?

 

Some people glibly explain this as Oracle deliberately doing it to push its own Oracle Linux; that judgment is not fair.

What makes ASMLIB special is that part of it is a Linux Kernel-level component, and whether the Kernel is open to it is entirely up to the kernel's maintainer — for Red Hat Enterprise Linux, that is Red Hat. Starting with the RHEL 6 release, Red Hat decided to remove ASMLIB support from the Kernel and no longer grants Oracle the right to access and modify that part of the kernel code, which makes it impossible for Oracle to build an ASMLIB version FOR RHEL 6. So this is plainly Red Hat's issue, not Oracle's.

 

Note that ASMLIB is merely unsupported on the stock RHEL 6 Kernel; this does not mean ASMLIB cannot be used on RHEL 6 at all. A Metalink note explains the details:

 

What is ASMLib?

 

ASMLib is free, optional software for the Automatic Storage Management (ASM) feature of Oracle Database that simplifies the management and discovery of ASM disks and makes I/O processing and kernel resource usage with ASM storage more efficient. ASMLib is not required to use the Automatic Storage Management (ASM) feature of Oracle Database on Linux and all features and functionality of ASM will work without ASMLib.

Software Update Policy for ASMLib running on Red Hat Enterprise Linux

Oracle provides ASMLib software and support for customers who receive Red Hat Enterprise Linux (RHEL) operating system support from Red Hat and have a valid Oracle database support contract. Only the latest release of ASMLib will be provided for new Linux kernels released with each new RHEL minor release (“Update”). For example, if Red Hat were to release kernel 2.6.18-194.0.1.el5, Oracle will only release the latest version of ASMLib, say 2.0.5, for that kernel. Oracle will not release any previous versions of ASMlib for that kernel.

Furthermore, ASMLib software is only provided for Linux kernels for which the corresponding packages (devel, src, binaries) are available to Oracle. For example, Oracle cannot provide ASMLib software for kernels provided under Red Hat’s Extended Update Model or “z-stream” support.

Red Hat Enterprise Linux 6 (RHEL6)

For RHEL6 or Oracle Linux 6, Oracle will only provide ASMLib software and updates when configured Unbreakable Enterprise Kernel (UEK). Oracle will not provide ASMLib packages for kernels distributed by Red Hat as part of RHEL 6 or the Red Hat compatible kernel in Oracle Linux 6. ASMLib updates will be delivered via Unbreakable Linux Network(ULN) which is available to customers with Oracle Linux support. ULN works with both Oracle Linux or Red Hat Linux installations, but ASMlib usage will require replacing any Red Hat kernel with UEK

 
For RHEL 6, ASMLIB software and updates are still available as long as you run Oracle's own Unbreakable Enterprise Kernel (UEK); only the kernel shipped with RHEL 6 and the Red Hat-compatible kernel are unsupported. ASMLIB can be updated via the Unbreakable Linux Network (ULN), provided the customer has purchased Oracle Linux Support. ULN works with both Oracle Linux and Red Hat Linux installations, but ASMLIB requires replacing any Red Hat kernel with UEK.

 

Unbreakable Linux Network

 

Related reading:

 Using UDEV to solve RAC ASM storage device naming on Linux 6
Using the UDEV service to solve RAC ASM storage device naming
Why ASMLIB and why not?
How to diagnose ASMLIB problems

[ASM Data Recovery] How to repair a disk with ASM Disk header_status=FORMER and re-add it to its Diskgroup — ORA-15017 ORA-15063 ORA-15032

The following situation can arise: an accidental DROP DISKGROUP, a DISK mistakenly DROPped out of its original Diskgroup, or Bug 13331814 (ASM DISKS TURNED INTO FORMER WHILE DISKGROUP IS MOUNTED) leaves the ASM DISK with header_status=FORMER instead of the normal MEMBER status.

 


 

Consider the following example:

 

 

[oracle@mlab2 ~]$ sqlplus  / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Tue Nov 19 21:55:09 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Automatic Storage Management option

SQL>  create diskgroup maclean external redundancy disk '/dev/asm-disk9';

Diskgroup created.

SQL> select group_number,name,state from v$asm_diskgroup;

GROUP_NUMBER NAME                           STATE
------------ ------------------------------ -----------
           1 DATA                           MOUNTED
           2 MACLEAN                        MOUNTED

SQL> col path for a40
SQL> select name,path,header_status from v$asm_disk where group_number=2;

NAME                           PATH
------------------------------ ----------------------------------------
HEADER_STATU
------------
MACLEAN_0000                   /dev/asm-disk9
MEMBER

 

 

Now drop the diskgroup once:

 

 

SQL> drop diskgroup maclean;

Diskgroup dropped.

SQL> alter diskgroup maclean mount;
alter diskgroup maclean mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "MACLEAN" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup
"MACLEAN"

SQL> select name,path,header_status from v$asm_disk where path='/dev/asm-disk9';

NAME                           PATH
------------------------------ ----------------------------------------
HEADER_STATU
------------
                               /dev/asm-disk9
FORMER

 

 

 

Inspect the ASM disk header metadata with kfed

 

 

[oracle@mlab2 ~]$ kfed read /dev/asm-disk9 |head -25
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                   554377417 ; 0x00c: 0x210b20c9
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:      ORCLDISK ; 0x000: length=11
kfdhdb.driver.reserved[0]:        65796 ; 0x008: 0x00010104
kfdhdb.driver.reserved[1]:            1 ; 0x00c: 0x00000001
kfdhdb.driver.reserved[2]:      4206569 ; 0x010: 0x00402fe9
kfdhdb.driver.reserved[3]:      3367865 ; 0x014: 0x003363b9
kfdhdb.driver.reserved[4]:    196018176 ; 0x018: 0x0baf0000
kfdhdb.driver.reserved[5]:    390595073 ; 0x01c: 0x17480201
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum:                        0 ; 0x024: 0x0000
kfdhdb.grptyp:                        1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts:                        4 ; 0x027: KFDHDR_FORMER
kfdhdb.dskname:            MACLEAN_0000 ; 0x028: length=12
kfdhdb.grpname:                 MACLEAN ; 0x048: length=7
kfdhdb.fgname:             MACLEAN_0000 ; 0x068: length=12

 

 

 

Here, kfdhdb.hdrsts: 4 ; 0x027: KFDHDR_FORMER shows that the DISK's status is FORMER.

 

 

First, back up the corresponding ASM DISK header

 

[oracle@mlab2 ~]$ mkdir /tmp/asm
[oracle@mlab2 ~]$ dd if=/dev/asm-disk9 of=/tmp/asm/asm-disk9-header bs=1024k count=20

[oracle@mlab2 ~]$ kfed read /dev/asm-disk9 > /tmp/asm/asm-disk9-meta

[oracle@mlab2 ~]$ ls -l /tmp/asm/asm-disk9-meta
-rw-r--r-- 1 oracle oinstall 6597 Nov 20 01:26 /tmp/asm/asm-disk9-meta

Edit the text file asm-disk9-meta, changing

kfdhdb.hdrsts: 4 ; 0x027: KFDHDR_FORMER

to

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER

 

 

 

The change made in the editor:

[Screenshot hdrsts: the kfdhdb.hdrsts line before the edit — KFDHDR_FORMER]

[Screenshot hdrsts2: the kfdhdb.hdrsts line after the edit — KFDHDR_MEMBER]

Then patch the ASM DISK with the corrected metadata text, using the following command:

 

 

[oracle@mlab2 ~]$ kfed merge /dev/asm-disk9 text=/tmp/asm/asm-disk9-meta

Confirm the change

[oracle@mlab2 ~]$ kfed read /dev/asm-disk9 |grep hdrsts
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER

 

 

Finally, try to MOUNT the DISKGROUP that the DISK belongs to:

 

SQL> alter diskgroup maclean mount;

Diskgroup altered.

SQL> select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
MACLEAN                        MOUNTED

【ASM数据恢复】ORA-15196 invalid ASM block header [kfc.c] [check_kfbh] 错误解析

ORA-15196: invalid ASM block header [kfc.c] [check_kfbh] [325] [2147483650] [2170461157 != 2170461165]

ORA-15196: 无效的 ASM 块标头 [:] [] [] [] [ != ]

 

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

 

诗檀软件专业数据库修复团队 

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

该错误是当ASM读取到无效的ASM block header时的报错,相关的BUG NOTE如下:

 

check_kfbh 字段的存在是为了检验块的一致性:对整个块做一次 32 bit XOR,并与现有的 check_kfbh 比对,有点像 block 的 checksum

The field check_kfbh is adjusted to ensure that a 32 bit xor of the whole block will be zero. Check values are always calculated before writing a block to disk, but may not be accurate for a block in the buffer cache. In this example the check value is "8A 8B D0 AE" but since this is little endian the actual value is "0xAED08B8A".
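按上面引文的描述,可以用下面的 Python 片段来演示这种校验方式:对块内除 check 字段以外的所有 32 位字做 XOR 得到 check 值,写回后整块的 XOR 即为 0;同时验证小端字节序下磁盘上的 8A 8B D0 AE 确实代表 0xAED08B8A(check 字段的具体偏移在此仅为示意,并非 kfbh 的真实布局):

```python
import struct

def compute_check(block: bytes, check_off: int) -> int:
    """对块内除 check 字段所在 32 位字之外的所有字做 XOR;
    将结果写回 check 字段后,整个块按 32 位字 XOR 即为 0。
    check_off 为 check 字段的字节偏移(本文假设值,仅作示意)。"""
    words = struct.unpack("<%dI" % (len(block) // 4), block)
    x = 0
    for i, w in enumerate(words):
        if i != check_off // 4:
            x ^= w
    return x

# 小端字节序:磁盘上依次存放的 8A 8B D0 AE 代表数值 0xAED08B8A
assert struct.unpack("<I", bytes([0x8A, 0x8B, 0xD0, 0xAE]))[0] == 0xAED08B8A
```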

 

 

BUG 13010760 – ORA-15196: INVALID ASM BLOCK HEADER [KFC.C:9133] [CHECK_KFBH] [2147483649] [8]       11.1.0.7.0

BUG 13829821 – ORA-15196 [KFC.C:25210] [CHECK_KFBH] [2147483649] [8] [2170839822 != 2170840087]             11.2.0.2.2

BUG 13978640 – LNX64-12.1-ASM:HIT ORA-15196:INVALID ASM BLOCK HEADER [KFC.C:27990] [CHECK_KFBH]   12.1

BUG 14827224 – PS:WIN64:ORA-15196:INVALID ASM BLOCK HEADER[KFC.C:28261] ON DB CREATE ON VMS  12.1

BUG 16025504 – ASM METADATA CORRUPTION DURING OAM SYSTEM TESTING  12.1

BUG 14020529 – RAC CLUSTER CRASH WITH ORA-15196 ERROR 10.2.0.4

BUG 14109859 – DISKGROUP ASM NOT POSSIBLE MOUNT – INSTANCE CRASH AND NOT OPEN 11.2.0.2

BUG 14740185 – ASM REPORTED CORRUPTED AT BLOCKS : ORA-15196: INVALID ASM BLOCK HEADER [KFC.C:23  11.2.0.1

BUG 13591322 – ORA-15196 AT BLOCK CORRUPTION            11.1.0.7

BUG 14676017 – NOT ABLE TO MOUNT ASM DISKGROUP ORA-15040 ORA-15042    11.2.0.3

BUG 14771123 – ORA-15335: ASM METADATA CORRUPTION DETECTED IN DISK GROUP  11.2.0.4

BUG 13952321 – ALL DBS DOWN DUE TO INVALID ASM BLOCK HEADER  11.2.0.1

BUG 14728558 – ASM REPORTED CORRUPTED METADATA & AT BLOCKS   11.1.0.7.0

 

对于该问题 建议:

 

1、首先要检查 OS是否存在IO丢失的问题,例如 AIX上执行 errpt ,Linux查看os log 和dmesg

2、 检查RAC的多个节点上存储设备名是否一致

3、 考虑执行 ‘ alter diskgroup check norepair’来检查ASM diskgroup

4、11.2中如果遇到上述情况,ASM会自动使用amdu工具dump ASM disk,其trace一般在ASM对应的user dump目录下,例如:NOTE: AMDU dump of disk group MAC created at /oracle/diag/asm/+asm/+ASM3/trace。整个AMDU trace里包含了很有用的信息:如果存在大量的 AMDU-00209: Corrupt block found: Disk N0072 AU [26127] block [94] type [0] 信息,并看到 Corrupt metadata blocks: 320 这样大量损坏的元数据块,则说明该问题一般是由OS层或者磁盘/存储硬件层引起的,因为不太可能有ASM的bug会造成如此大量的损坏,解释成丢失更新(Lost Update)也不合理

 

5、实际上如果你和我一样检阅过上面全部的BUG Note,你会发现没有一个BUG Note最终定位为real bug:这些SR/Bug Note要么不了了之,要么查出来确实是OS或者硬盘/存储有问题。所以针对这个问题提交SR的效率很低,往往找OS和存储厂商一起检查更有效率

6、如果你真的遇到了该问题,那么如果坏掉的元数据块(metadata block)比较少的话,可以请相关的ASM专家手工帮你修复,具体也可以找ASKMACLEAN专业数据库修复团队成员帮您恢复。切记不要在无备份ASM header的情况下操作,否则可能丢失恢复数据的最后一线希望。
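针对上面第4点,AMDU trace 的初步检查可以用一小段脚本完成:统计 AMDU-00209 记录的条数,并取出 Corrupt metadata blocks 的汇总数字(行格式取自本文引用的示例,实际 trace 的格式可能略有出入):

```python
import re

def summarize_amdu_trace(text: str) -> dict:
    """统计 AMDU trace 文本中的损坏块信息(行格式按本文示例假设)。"""
    corrupt = re.findall(
        r"AMDU-00209: Corrupt block found: Disk (\S+) AU \[(\d+)\] block \[(\d+)\]",
        text)
    m = re.search(r"Corrupt metadata blocks:\s*(\d+)", text)
    return {
        "corrupt_lines": len(corrupt),                      # AMDU-00209 记录条数
        "metadata_total": int(m.group(1)) if m else None,   # 汇总的损坏元数据块数
    }
```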

Oracle ASM工具amdu使用指南


 

AMDU有以下三种功能:

 

  1. 将ASM DISK上的元数据转储到文件系统上以便分析
  2. 将ASM文件的内容抽取出来并写入到OS文件系统,Diskgroup是否mount均可
  3. 打印出块的元数据,以块中C语言结构或16进制的形式

 

 

AMDU输入数据可以是ASM DISK的内容,亦或者是上一次运行AMDU所生成的文件夹中的信息。

 

 

 

选项-diskstring和-exclude用以指定哪些ASM DISK需要被读取。选项-directory指定上一次运行AMDU所生成的文件夹;指定的文件夹也可以是上一次文件夹内容的拷贝。

 

 

探测磁盘

 

这一步骤使用ASM Discovery信息来找到磁盘组:磁盘头部(ASM disk header)将被读取,以便判断哪些磁盘属于哪个Diskgroup;下一步骤中要扫描的磁盘将在此步骤中被选定。探测的结果将存放在report文件中。使用-directory选项时,可以直接读取已存在的报告文件,而无需重复本步骤。

 

 

 

[oracle@lab1 oracle.SupportTools]$ ./amdu -diskstring '/dev/asm*'
amdu_2012_09_23_01_40_44/


[oracle@lab1 oracle.SupportTools]$ cd amdu_2012_09_23_01_40_44/
[oracle@lab1 amdu_2012_09_23_01_40_44]$ ls

report.txt

[oracle@lab1 amdu_2012_09_23_01_40_44]$ cat report.txt
-*-amdu-*-

******************************* AMDU Settings ********************************
ORACLE_HOME = ?
System name:    Linux
Node name:      lab1.oracle.com
Release:        2.6.32-200.13.1.el5uek
Version:        #1 SMP Wed Jul 27 21:02:33 EDT 2011
Machine:        x86_64
amdu run:       23-SEP-12 01:40:44
Endianess:      1


--------------------------------- Operations ---------------------------------

------------------------------- Disk Selection -------------------------------
-diskstring '/dev/asm*'

------------------------------ Reading Control -------------------------------

------------------------------- Output Control -------------------------------

********************************* DISCOVERY **********************************

----------------------------- DISK REPORT N0001 ------------------------------
Disk Path: /dev/asm-diskd
Unique Disk ID:
Disk Label:
Physical Sector Size: 512 bytes
Disk Size: 8192 megabytes
Group Name: FRA
Disk Name: FRA_0000
Failure Group Name: FRA_0000
Disk Number: 0
Header Status: 3
Disk Creation Time: 2012/09/21 02:42:53.616000
Last Mount Time: 2012/09/23 01:00:49.311000
Compatibility Version: 0x0a100000
Disk Sector Size: 512 bytes
Disk size in AUs: 8192 AUs
Group Redundancy: 1
Metadata Block Size: 4096 bytes
AU Size: 1048576 bytes
Stride: 113792 AUs
Group Creation Time: 2012/09/21 02:42:53.563000
File 1 Block 1 location: AU 2
----------------------------- DISK REPORT N0002 ------------------------------
Disk Path: /dev/asm-diskc
Unique Disk ID:
Disk Label:
Physical Sector Size: 512 bytes
Disk Size: 8192 megabytes
Group Name: DATA
Disk Name: DATA_0001
Failure Group Name: DATA_0001
Disk Number: 1
Header Status: 3
Disk Creation Time: 2012/09/21 02:39:12.436000
Last Mount Time: 2012/09/23 01:00:49.097000
Compatibility Version: 0x0b200000
Disk Sector Size: 512 bytes
Disk size in AUs: 8192 AUs
Group Redundancy: 1
Metadata Block Size: 4096 bytes
AU Size: 1048576 bytes
Stride: 113792 AUs
Group Creation Time: 2012/09/21 02:39:12.389000
File 1 Block 1 location: AU 0

----------------------------- DISK REPORT N0003 ------------------------------
Disk Path: /dev/asm-diskb
Unique Disk ID:
Disk Label:
Physical Sector Size: 512 bytes
Disk Size: 8192 megabytes
Group Name: DATA
Disk Name: DATA_0000
Failure Group Name: DATA_0000
Disk Number: 0
Header Status: 3
Disk Creation Time: 2012/09/21 02:39:12.436000
Last Mount Time: 2012/09/23 01:00:49.097000
Compatibility Version: 0x0b200000
Disk Sector Size: 512 bytes
Disk size in AUs: 8192 AUs
Group Redundancy: 1
Metadata Block Size: 4096 bytes
AU Size: 1048576 bytes
Stride: 113792 AUs
Group Creation Time: 2012/09/21 02:39:12.389000
File 1 Block 1 location: AU 2
******************************* END OF REPORT ********************************

 

 

 

扫描磁盘

磁盘上的分配表将被扫描;基于分配表的记录和命令行选项,相关的数据块将被写入镜像文件。Map文件将被创建,以描述相关的Allocation Unit以及它们被写入镜像文件的位置。若有文件需要抽取,则其盘区图(extent map)将从分配表中读取并在内存中构造;若有块需要打印,则该块的位置也会保留在内存中。在此阶段使用-directory选项将直接读取现有的map文件,而无需重新执行本步骤。

 

 

抽取文件

待抽取文件的盘区图将被排序,从而从ASM DISK中读取该文件的数据并写出到输出文件。该步骤无法使用-directory选项。

 

 

打印块

格式化的块内容将被打印到标准输出,并附带说明该块数据是从何处读取的信息。块的转储采用KFED样式的格式。若使用-directory选项,数据将从镜像文件中读取。

 

 

 

输出文件

AMDU创建四种类型的输出文件,它们都存放在新建的DUMP目录下,文件名由AMDU自动生成。每次运行都会自动创建一个新的DUMP目录,目录名基于运行的日期和时间(精确到秒)。在目录下的文件生成之前,目录名会先写到标准输出。注意,除非使用-parent选项指定父目录,否则DUMP目录总是创建在当前目录下。

 

若AMDU使用-directory选项,则不会生成新的DUMP目录,也不会创建输出文件;此时-directory指定的是上一次运行所创建的DUMP目录的位置。若使用-print选项,则从上一次创建的DUMP目录中读取数据并打印格式化输出。打印输出总是发送到标准输出,而不会创建新的文件。

 

 

 

抽取文件

 

每当命令行中使用-extract选项,就会创建一个抽取文件。抽取文件存放在DUMP目录下,名字为<group>_<number>.f,其中<group>是diskgroup的大写名字,<number>是命令行中指定的文件号。抽取文件的内容与通过数据库访问该文件时看到的内容相同。若文件的某部分不可用,则输出文件中对应部分将被填入0xBADFDA7A,同时stderr中会出现一条信息。

选项-output可用于将单个文件抽取到特定的文件名,而非DUMP目录;也可以配合-nodir选项,完全避免创建新的DUMP目录。

 

镜像文件

 

镜像文件包含ASM DISK的块镜像,是从磁盘中拷贝出来的裸数据。由于数据量可能很大,而有些文件系统处理大文件存在问题,因此单个镜像文件总是小于2GB;当数据超过2GB时,将生成多个镜像文件。一个镜像文件可以包含多个磁盘的数据,但这些磁盘(基于磁盘头)都属于同一个diskgroup。同一个allocation unit中的块总是相邻存放且不会跨越镜像文件。无意义的数据(例如空块)不会被转储,因此DUMP中往往只有AU的一部分,镜像文件的完整大小并非定数。

 

即便是已经从DISKGROUP中drop掉、但DISK HEADER中仍包含group name的DISK,在使用-former选项时也会被包含进镜像文件。注意,与mount diskgroup时不同,这里不以PST表作为磁盘是否属于DISK GROUP的依据;那些被强制(force)drop掉的DISK,甚至不需要-former选项也会被包含在镜像文件中。

 

镜像文件的名字基于group name和一个序列号:形式是大写的GROUP NAME后跟一个序列号,第一个镜像文件的序列号是0001。

 

位图文件

 

map文件是ASCII文件,描述了特定磁盘组的镜像文件中的数据。AMDU会为每一系列镜像文件创建一个map文件,也就是说一个磁盘组对应一个map文件。map文件中的每一行对应一个allocation unit,描述其内容在dump文件中的位置;其中部分allocation unit可能在map文件中有记录,但实际并没有向image文件写入内容。每一行都有同样的字段和同样的长度。这些行按数据在image文件中的顺序排列,但由于每行都包含其在image文件中的绝对位置,因此即使以其他方式重新排序,也不会丢失image文件中AU的位置信息。

 

 

每一行包含以下字段,字段之间用空格分隔。每个字段以一个唯一的字母开头,后面紧跟带前导零的数字;这有助于对map文件进行排序、查找和重新组织。在下面的描述中,前导字母和数字位数用圆括号标出,例如(D4)表示大写字母D后跟4位数字。

 

Disk Report Number (N4):每一个被探测发现的磁盘都会分配到一个disk report number,该号码连同该DISK的信息一起打印在报告中。即使同一个diskgroup中有2个DISK拥有相同的disk number,它们的disk report number也各不相同。第一个被报告的磁盘使用disk report number 1。

Disk Number (D4):从磁盘头中读取的磁盘号字段。若磁盘号无效或头部无法识别,则该字段为9999。

Disk Repeat (R2):一般为0。有可能在同一个diskgroup中发现2个磁盘拥有同样的disk number,第一个重复的磁盘在其map文件记录中的repeat计数为1。除非同一个disk number出现超过100次重复,否则2位数字足够;超过时该字段会占用额外的位数。

Allocation Unit (A8):数据所在磁盘中的AU号。注意对于物理寻址的元数据(physically addressed metadata),这不同于extent号,因为extent 2位于AU 113,000附近。若磁盘大于100TB且AU大小为1MB,该字段可能超过8位数字。

File Number (F8):拥有该extent的ASM文件号。若该号小于256,则这是ASM元数据或ASM registry;若是物理寻址的元数据,则文件号为00000000。

Extent Number (E8): The physical extent number within the file. This is the index in the file extent map that a database instance would use to find this AU. If the file was (two-way) mirrored then this is a primary extent if the number is even, and a secondary copy if it is odd. If this is an indirect extent then this is a value between 0 and 299 giving the index into the indirect extents. For physically addressed metadata this is the extent within the physically addressed metadata, not the AU within the disk.

AU within extent (U2): Large extents are supported for large files. Thus there could be multiple AU’s dumped for the same extent. Note that metadata files do not currently use large extents so this only happens for user file dumps to image files.

Block count (C5): The number of blocks copied to the image file from the AU. A lot of space is saved by not creating images of blocks that are just initialized contents. This is particularly true for indirect extents where most indirect extents will have only a few blocks of extent pointers. If the extent is not dumped to the image file then this is zero. The count is in ASM metadata blocks, even if the file number is >256 and the indirect flag is 0. This is normally 4K blocks, but could be different in the future. With the -noimage option this is always zero since no images are ever created.

Image File Sequence Number (S4): This is the NNNN field of the image file name where blocks from the AU are dumped. With the -noimage option this is always zero since no image files are ever created.

Byte Offset in Image File (B10): This is the location within the image file where the block images appear. It is always a multiple of the ASM metadata block size. Since the image file is always less than 2Gb this will always fit in a 32 bit signed integer. Note that this will be an offset to the end of the previously dumped AU when the block count is zero. With the -noimage option this is always zero since no images are ever created.

Corrupt Block Flag (X0): If any of the blocks in the AU are corrupt, then the line will end with ‘X’. Normally this is a blank character so that the line ends in two blanks.
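按照上述“前导字母 + 定宽数字”的字段约定,map 文件中的行可以用一个很短的函数解析;下面示例中的行是按字段说明拼出来的虚构数据,并非真实 AMDU 输出:

```python
def parse_map_line(line: str) -> dict:
    """按字段约定(唯一字母前缀 + 带前导零的数字)解析 map 文件的一行。"""
    fields = {}
    for tok in line.split():
        if tok[0].isalpha() and tok[1:].isdigit():
            fields[tok[0]] = int(tok[1:])      # 如 A00000039 -> fields["A"] = 39
    # 行尾的 X 表示该 AU 内有损坏块,否则行以空白结尾
    fields["corrupt"] = line.rstrip("\n").endswith("X")
    return fields
```

例如对虚构行 N0001 D0000 R00 A00000039 F00000253 E00000000 U00 C00004 S0001 B0000000000 解析后,A 字段即 AU 号 39,F 字段即文件号 253。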

 

 

 

AMDU是ORACLE针对ASM开发的元数据转储工具,其全称为ASM Metadata Dump Utility(AMDU)

AMDU具有以下三个主要功能:

  1. 将ASM DISK上的元数据转储到文件系统上以便分析
  2. 将ASM文件的内容抽取出来并写入到OS文件系统,Diskgroup是否mount均可
  3. 打印出块的元数据,以块中C语言结构或16进制的形式

 

 

这里我们将使用AMDU抽取ASM DISKGROUP中的数据文件;ASM作为近几年最流行的存储解决方案,大家对它的优缺点都有所了解,其中的问题之一就是ASM是个黑盒:一旦DISKGROUP无法MOUNT,就意味着传统方法无法以磁盘为基础导出任何数据。

 

AMDU解决了这一问题。这里我们仅讨论ASM DISKGROUP无法MOUNT的场景,不讨论RDBMS数据文件在ASM下讹误的处理。

 

注意 AMDU虽然是11g才发布的工具,但是实际对10g的ASM 也有效。

 

当前你可能遇到的场景是, ORACLE DATABASE的SPFILE、CONTROLFILE、DATAFILE均存放在ASM DISKGROUP中,而由于一些ASM ORA-600错误导致无法MOUNT该DISKGROUP, 你需要的是使用AMDU将这些文件从ASM DISK中转储出来。

 

场景 1:丢失的文件包括SPFILE、CONTROLFILE、DATAFILE

 

恢复步骤: 从备份中还原出SPFILE ,即便没有SPFILE的话PFILE也可以,总之你需要从参数文件中了解control_files的信息

 

 

SQL> show parameter control_files
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_files                        string      
+DATA/prodb/controlfile/current.260.794687955, +FRA/prodb/controlfile/current.256.794687955

 

获得control_files 控制文件在ASM中的位置后事情就好办了,+DATA/prodb/controlfile/current.260.794687955 这里 260是这个控制文件在+DATA 这个DISKGROUP中的FILE NUMBER

此外我们还需要ASM DISK的DISCOVERY PATH信息,这完全可以从ASM的SPFILE中的asm_diskstring 参数获得
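顺带用一个小函数说明这类 OMF 文件名的结构:最后两段数字分别是该文件在 diskgroup 内的 FILE NUMBER 和 incarnation(函数为本文虚构,仅作辅助说明):

```python
def parse_asm_path(path: str):
    """把 +DATA/prodb/controlfile/current.260.794687955 这类 ASM 文件名
    拆成 (diskgroup, file_number, incarnation) 三部分。"""
    dg = path.lstrip("+").split("/")[0]
    name = path.rsplit("/", 1)[-1]
    _tag, file_no, incarnation = name.rsplit(".", 2)
    return dg, int(file_no), int(incarnation)

# parse_asm_path("+DATA/prodb/controlfile/current.260.794687955")
# 返回 ("DATA", 260, 794687955),其中 260 即 amdu -extract data.260 所用的文件号
```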

 

[oracle@mlab2 oracle.SupportTools]$ unzip amdu_X86-64.zip

Archive:  amdu_X86-64.zip
inflating: libskgxp11.so
inflating: amdu
inflating: libnnz11.so
inflating: libclntsh.so.11.1







[oracle@mlab2 oracle.SupportTools]$ export LD_LIBRARY_PATH=./



[oracle@mlab2 oracle.SupportTools]$ ./amdu -diskstring '/dev/asm*' -extract data.260

amdu_2009_10_10_20_19_17/
AMDU-00204: Disk N0006 is in currently mounted diskgroup DATA
AMDU-00201: Disk N0006: '/dev/asm-disk10'
AMDU-00204: Disk N0003 is in currently mounted diskgroup DATA
AMDU-00201: Disk N0003: '/dev/asm-disk5'
AMDU-00204: Disk N0002 is in currently mounted diskgroup DATA
AMDU-00201: Disk N0002: '/dev/asm-disk6'



[oracle@mlab2 oracle.SupportTools]$ cd amdu_2009_10_10_20_19_17/

[oracle@mlab2 amdu_2009_10_10_20_19_17]$ ls
DATA_260.f  report.txt

[oracle@mlab2 amdu_2009_10_10_20_19_17]$ ls -l
total 9548

-rw-r--r-- 1 oracle oinstall 9748480 Oct 10 20:19 DATA_260.f
-rw-r--r-- 1 oracle oinstall    9441 Oct 10 20:19 report.txt

 

 

以上转储出来的DATA_260.f 就是控制文件,我们使用该控制文件startup mount RDBMS实例:

 

SQL> alter system set control_files='/opt/oracle.SupportTools/amdu_2009_10_10_20_19_17/DATA_260.f' scope=spfile;
System altered.




SQL> startup force mount;
ORACLE instance started.


Total System Global Area 1870647296 bytes
Fixed Size                  2229424 bytes
Variable Size             452987728 bytes
Database Buffers         1409286144 bytes
Redo Buffers                6144000 bytes
Database mounted.







SQL> select name from v$datafile;




NAME

--------------------------------------------------------------------------------
+DATA/prodb/datafile/system.256.794687873
+DATA/prodb/datafile/sysaux.257.794687875
+DATA/prodb/datafile/undotbs1.258.794687875
+DATA/prodb/datafile/users.259.794687875
+DATA/prodb/datafile/example.265.794687995
+DATA/prodb/datafile/mactbs.267.794688457




6 rows selected.

 

 

startup mount实例后,可以从v$datafile中获得数据文件名,其中就包括了其在DISKGROUP中的FILE NUMBER

 

再使用./amdu -diskstring ‘/dev/asm*’ -extract 命令即可 导出数据文件到操作系统

 

 

[oracle@mlab2 oracle.SupportTools]$ ./amdu -diskstring '/dev/asm*' -extract data.256

amdu_2009_10_10_20_22_21/
AMDU-00204: Disk N0006 is in currently mounted diskgroup DATA
AMDU-00201: Disk N0006: '/dev/asm-disk10'
AMDU-00204: Disk N0003 is in currently mounted diskgroup DATA
AMDU-00201: Disk N0003: '/dev/asm-disk5'
AMDU-00204: Disk N0002 is in currently mounted diskgroup DATA
AMDU-00201: Disk N0002: '/dev/asm-disk6'







[oracle@mlab2 oracle.SupportTools]$ cd amdu_2009_10_10_20_22_21/
[oracle@mlab2 amdu_2009_10_10_20_22_21]$ ls
DATA_256.f  report.txt


[oracle@mlab2 amdu_2009_10_10_20_22_21]$ dbv file=DATA_256.f




DBVERIFY: Release 11.2.0.3.0 - Production on Sat Oct 10 20:23:12 2009
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
DBVERIFY - Verification starting : FILE = /opt/oracle.SupportTools/amdu_2009_10_10_20_22_21/DATA_256.f


DBVERIFY - Verification complete

Total Pages Examined         : 90880
Total Pages Processed (Data) : 59817
Total Pages Failing   (Data) : 0
Total Pages Processed (Index): 12609
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 3637
Total Pages Processed (Seg)  : 1
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 14817
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Total Pages Encrypted        : 0
Highest block SCN            : 1125305 (0.1125305)


使用AMDU工具从无法MOUNT的ORACLE ASM DISKGROUP中抽取数据文件


 

目前ORACLE PRM-DUL 免费提供ORACLE ASM中的文件克隆功能了,详见http://www.parnassusdata.com/

 


11gR2 RAC ASM启动揭秘

11gR2 RAC中ocr和votedisk终于可以存放在ASM中了, 这避免了10g中仍需要为这2个RAC的关键点划分裸设备的窘境,  随之 11gR2 中ASM的spfile也可以存放到ASM diskgroup中以实现多节点ASM的共享管理了。

 

这听上去似乎有些不可思议,照常理来说 ASM实例启动并mount diskgroup后才能够访问diskgroup上的文件, 但是ASM实例只有获得ASM spfile后才能够启动实例,这2者形成了死循环。

 

有同学在T.askmac.cn上提问关于ASM启动的疑问

 

hello maclean,

查看spfile位置
ASMCMD> spget
+CRSDG/rac/asmparameterfile/registry.253.787925627
就有个疑问,ASM 也算是一种ORACLE instance,自动的系统参数文件在自己的diskgroup,我的问题是它是如何启动从自身未启动的磁盘组读的参数文件?
thanks.!

 

我们来解释这个问题:

从11.2开始Oracle Cluterware标示voting disk files的方法较之前的版本11.1或10.2有所区别,11.2之前voting disk file的位置存放在OCR中, 但是因为从11.2开始ocr和votedisk可以存放在ASM了 , 所以自11.2始voting disk file通过GPNP profile中的CSS voting file discovery string来定位。

CSS voting disk file的discovery string将指向ASM,所以它要使用ASM discovery string的值。  如以下的例子使用udev绑定设备名作为ASM使用的LUN, 这些udev获得的设备形式如/dev/rasm-disk* , 我们利用gpnptool get命令获得gpnp profile:

 

 

[grid@maclean1 trace]$ gpnptool get



Warning: some command line parameters were defaulted. Resulting command line:
/g01/grid/app/11.2.0/grid/bin/gpnptool.bin get -o-

<?xml version="1.0" encoding="UTF-8"?><gpnp:GPnP-Profile Version="1.0" xmlns="http://www.grid-pnp.org/2005/11/gpnp-profile" xmlns:gpnp="http://www.grid-pnp.org/2005/11/gpnp-profile" 
xmlns:orcl="http://www.oracle.com/gpnp/2005/11/gpnp-profile" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.grid-pnp.org/2005/11/gpnp-profile gpnp-profile.xsd" 
ProfileSequence="9" ClusterUId="452185be9cd14ff4ffdc7688ec5439bf" 
ClusterName="maclean-cluster" PALocation=""><gpnp:Network-Profile><gpnp:HostNetwork id="gen" 
HostName="*"><gpnp:Network id="net1" IP="192.168.1.0" Adapter="eth0" 
Use="public"/><gpnp:Network id="net2" IP="172.168.1.0" Adapter="eth1" 
Use="cluster_interconnect"/></gpnp:HostNetwork></gpnp:Network-Profile><
orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/><orcl:ASM-Profile id="asm" DiscoveryString="/dev/rasm*" SPFile="+SYSTEMDG/maclean-cluster/asmparameterfile/registry.253.788682933"/><
ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"> <InclusiveNamespaces 
xmlns="http://www.w3.org/2001/10/xml-exc-c14n#" 
PrefixList="gpnp orcl xsi"/></ds:Transform></ds:Transforms><
ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>L1SLg10AqGEauCQ4ne9quucITZA=</ds:DigestValue><
/ds:Reference></ds:SignedInfo><ds:SignatureValue>rTyZm9vfcQCMuian6isnAThUmsV4xPoK2fteMc1l0GIvRvHncMwLQzPM/QrXCGGTCEvgvXzUPEKzmdX2oy5vLcztN60UHr6AJtA2JYYodmrsFwEyVBQ1D6wH+HQiOe2SG9UzdQnNtWSbjD4jfZkeQWyMPfWdKm071Ek0Rfb4nxE=</ds:SignatureValue></ds:Signature></gpnp:GPnP-Profile>
Success.

 

 

 

其中重要的2条记录:

 

<orcl:CSS-Profile id="css" DiscoveryString="+asm" LeaseDuration="400"/>
==》css voting disk指向+ASM
<orcl:ASM-Profile id="asm" DiscoveryString="/dev/rasm*" SPFile="+SYSTEMDG/maclean-cluster/asmparameterfile/registry.253.788682933"/>
==》该记录表明了ASM的DiscoveryString="/dev/rasm*",即ASM实例启动时会去寻找的设备路径;SPFile则记录了ASM Parameter FILE的ALIAS
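GPNP profile 本身就是一段 XML,上面两条记录可以用标准库直接解析出来,示意如下(命名空间取自上文 profile 原文,函数名为本文虚构):

```python
import xml.etree.ElementTree as ET

ORCL = "{http://www.oracle.com/gpnp/2005/11/gpnp-profile}"   # orcl 命名空间

def asm_info_from_gpnp(xml_text: str):
    """从 gpnptool get 输出的 profile XML 中取出 ASM 的
    DiscoveryString(磁盘发现路径)与 SPFile(参数文件别名)。"""
    root = ET.fromstring(xml_text)
    asm = root.find(".//%sASM-Profile" % ORCL)
    return asm.get("DiscoveryString"), asm.get("SPFile")
```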

 

但是请注意虽然GPNP记录了ASM Parameter FILE的ALIAS,但这不代表ASM直接能访问到该SPFILE,在实际Diskgroup被Mount之前光知道一个ASM ALIAS是没有用的。

我们来看一下+SYSTEMDG/maclean-cluster/asmparameterfile/registry.253.788682933这个SPFILE在ASM中所处的位置:

 

 

[grid@maclean1 wallets]$ sqlplus  / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Tue Jul 17 05:45:35 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> 
set linesize 140 pagesize 1400
col "FILE NAME" format a40
set head on
select NAME         "FILE NAME",
           AU_KFFXP     "AU NUMBER",
           NUMBER_KFFXP "FILE NUMBER",
           DISK_KFFXP   "DISK NUMBER"
      from x$kffxp, v$asm_alias
     where GROUP_KFFXP = GROUP_NUMBER
       and NUMBER_KFFXP = FILE_NUMBER
       and name in ('REGISTRY.253.788682933')
   order by  DISK_KFFXP,AU_KFFXP;

FILE NAME                                 AU NUMBER FILE NUMBER DISK NUMBER
---------------------------------------- ---------- ----------- -----------
REGISTRY.253.788682933                           39         253           1
REGISTRY.253.788682933                           35         253           3
REGISTRY.253.788682933                           35         253           4


SQL> col path for a50
SQL> select disk_number,path from v$asm_disk where disk_number in (1,3,4) and GROUP_NUMBER=3;

DISK_NUMBER PATH
----------- --------------------------------------------------
          3 /dev/rasm-diske
          4 /dev/rasm-diskf
          1 /dev/rasm-diskc

 

 

可以看到该ASM SPFILE共有三份镜像(redundancy=high),分别保留在 /dev/rasm-diskc的AU=39和/dev/rasm-diske AU=35、/dev/rasm-diskf AU=35。

我们利用kfed命令分别检查这三个ASM DISK的header:

 

[grid@maclean1 wallets]$ kfed read /dev/rasm-diske|grep spfile
kfdhdb.spfile:                       35 ; 0x0f4: 0x00000023

[grid@maclean1 wallets]$ kfed read /dev/rasm-diskc|grep spfile 
kfdhdb.spfile:                       39 ; 0x0f4: 0x00000027

[grid@maclean1 wallets]$ kfed read /dev/rasm-diskf|grep spfile 
kfdhdb.spfile:                       35 ; 0x0f4: 0x00000023

 

 

可以看到ASM disk header的kfdhdb.spfile指向ASM SPFILE在这个DISK上的AU NUMBER即其位置, ASM实例在启动时只需要通过GPNP PROFILE中的 DiscoveryString找到合适的设备路径,并读取其ASM disk header即可以找到kfdhdb.spfile这个位置属性,从而在没有MOUNT DISKGROUP的情况下读取ASM SPFILE,并成功启动ASM, 这也就解决了鸡生蛋、蛋生鸡的难题。
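也就是说,ASM 实例启动时只需做一次简单的偏移计算,就能绕过未 MOUNT 的 diskgroup 直接从裸设备读到 SPFILE。示意如下(按本例 diskgroup 的 AU 大小 1MB 计算,属于文中环境的假设):

```python
AU_SIZE = 1024 * 1024   # 本例 diskgroup 的 AU 大小为 1MB(假设值,随 diskgroup 而异)

def spfile_offset(spfile_au: int, au_size: int = AU_SIZE) -> int:
    """kfdhdb.spfile 给出 SPFILE 所在的 AU 号,
    读取时的字节偏移 = AU 号 × AU 大小。"""
    return spfile_au * au_size

# /dev/rasm-diskc 上 kfdhdb.spfile = 39,即从 39MB 处开始读取 SPFILE 镜像
```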

在11gR2 RAC中修改ASM DISK Path磁盘路径

有同学在T.askmac.cn上提问关于修改11gR2中ASM DISK的路径问题,具体问题如下:

 

aix 6.1,grid 11.2.0.3+asm11.2.0.3+rac

建数据库的时候使用的是aix自带的多路径软件mpio,建了diskgroup

现在改造成veritas dmp多路径,已经修改了asm的disk_strings=/dev/vx/rdmp/*,crs/asm启动的时候已经可以识别到磁盘/dev/vx/rdmp/开头的磁盘,但是读取不回原来的diskgroup信息。

crs启动的报错日志:
2012-07-13 15:07:29.748: [ GPNP][1286]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2108 clsgpnp_profileCallUrlInt] get-profile call to url “ipc://GPNPD_ggtest1” disco “” [f=0 claimed- host: cname: seq: auth:]
2012-07-13 15:07:29.762: [ GPNP][1286]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2236 clsgpnp_profileCallUrlInt] Result: (0) CLSGPNP_OK. Successful get-profile CALL to remote “ipc://GPNPD_ggtest1” disco “”
2012-07-13 15:07:29.762: [ CSSD][1286]clssnmReadDiscoveryProfile: voting file discovery string(/dev/vx/rdmp/*)
2012-07-13 15:07:29.762: [ CSSD][1286]clssnmvDDiscThread: using discovery string /dev/vx/rdmp/* for initial discovery
2012-07-13 15:07:29.762: [ SKGFD][1286]Discovery with str:/dev/vx/rdmp/*:

2012-07-13 15:07:29.762: [ SKGFD][1286]UFS discovery with :/dev/vx/rdmp/*:

2012-07-13 15:07:29.769: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_919:

2012-07-13 15:07:29.770: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_212:

2012-07-13 15:07:29.770: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_211:

2012-07-13 15:07:29.770: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_210:

2012-07-13 15:07:29.770: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_209:

2012-07-13 15:07:29.771: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_181:

2012-07-13 15:07:29.771: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_180:

2012-07-13 15:07:29.771: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/disk_3:

2012-07-13 15:07:29.771: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/disk_2:

2012-07-13 15:07:29.771: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/disk_1:

2012-07-13 15:07:29.771: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/disk_0:

2012-07-13 15:07:29.771: [ SKGFD][1286]OSS discovery with :/dev/vx/rdmp/*:

2012-07-13 15:07:29.771: [ SKGFD][1286]Handle 1115e7510 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_916:

2012-07-13 15:07:29.772: [ SKGFD][1286]Handle 1118758b0 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_912:

2012-07-13 15:07:29.773: [ SKGFD][1286]Handle 1118d9cf0 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_908:

2012-07-13 15:07:29.773: [ SKGFD][1286]Handle 1118da450 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_904:

2012-07-13 15:07:29.773: [ SKGFD][1286]Handle 1118dad70 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_903:

2012-07-13 15:07:29.802: [ CLSF][1286]checksum failed for disk:/dev/vx/rdmp/v_df8000_916:
2012-07-13 15:07:29.803: [ SKGFD][1286]Lib :UFS:: closing handle 1115e7510 for disk :/dev/vx/rdmp/v_df8000_916:

2012-07-13 15:07:29.803: [ SKGFD][1286]Lib :UFS:: closing handle 1118758b0 for disk :/dev/vx/rdmp/v_df8000_912:

2012-07-13 15:07:29.804: [ SKGFD][1286]Handle 1115e6710 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_202:

2012-07-13 15:07:29.808: [ SKGFD][1286]Handle 1115e7030 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_201:

2012-07-13 15:07:29.809: [ SKGFD][1286]Handle 1115e7ad0 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_200:

2012-07-13 15:07:29.809: [ SKGFD][1286]Handle 1118733f0 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_199:

2012-07-13 15:07:29.816: [ CLSF][1286]checksum failed for disk:/dev/vx/rdmp/v_df8000_186:
2012-07-13 15:07:29.816: [ SKGFD][1286]Lib :UFS:: closing handle 1118de5d0 for disk :/dev/vx/rdmp/v_df8000_186:

2012-07-13 15:07:29.816: [ CSSD][1286]clssnmvDiskVerify: Successful discovery of 0 disks
2012-07-13 15:07:29.816: [ CSSD][1286]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2012-07-13 15:07:29.816: [ CSSD][1286]clssnmvFindInitialConfigs: No voting files found
2012-07-13 15:07:29.816: [ CSSD][1286](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2012-07-13 15:07:30.169: [ CSSD][1029]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(1115e4870) client(0)

 

 

其需求为修改ASM DISK PATH的磁盘设备路径,但是由于11gR2 RAC+ASM的特殊性,导致CRS无法正常启动,虽然使用crsctl start crs -excl -nocrs的方式可以启动CSS服务和ASM实例,

 

但是最后还是报错(clssnmCompleteInitVFDiscovery: Voting file not found),这是由于Voting file投票磁盘的位置没有被合理更新导致的。

 

 

这里我们来学习一下,如何正确的修改11gR2 RAC+ASM下的ASM DISK路径:

 

 

1.我们来营造一个修改ASM DISK路径的环境,这里我们使用UDEV的设备名绑定服务,并利用UDEV将原本的ASM DISK从/dev/asm-disk* 修改为 /dev/rasm-disk*的形式, 这只需要修改udev rule文件即可实现:

 

 

[grid@maclean1 ~]$ export  ORACLE_HOME=/g01/grid/app/11.2.0/grid

[grid@maclean1 ~]$ /g01/grid/app/11.2.0/grid/bin/sqlplus  / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Sun Jul 15 04:09:28 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> show parameter diskstri 

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/asm*

 

可以看到当前ASM实例使用的asm_diskstring 为/dev/asm*, 切换到root用户修改UDEV RULE文件 :

 

 

[root@maclean1 rules.d]# cp 99-oracle-asmdevices.rules  99-oracle-asmdevices.rules.bak
[root@maclean1 rules.d]# vi 99-oracle-asmdevices.rules

[root@maclean1 rules.d]# cat 99-oracle-asmdevices.rules

KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB09cadb31-cfbea255_", NAME="rasm-diskb", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB5f097069-59efb82f_", NAME="rasm-diskc", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB4e1a81c0-20478bc4_", NAME="rasm-diskd", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VBdcce9285-b13c5a27_", NAME="rasm-diske", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB82effe1a-dbca7dff_", NAME="rasm-diskf", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB950d279f-c581cb51_", NAME="rasm-diskg", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB14400d81-651672d7_", NAME="rasm-diskh", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB31b1237b-78aa22bb_", NAME="rasm-diski", OWNER="grid", GROUP="asmadmin", MODE="0660"
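The rename itself is mechanical, so on a multi-node cluster it can be scripted rather than edited by hand on every node. A minimal sketch (hypothetical file layout; in practice you would run the sed line as root against /etc/udev/rules.d after testing), demonstrated here on a scratch copy so it is safe to dry-run anywhere:

```shell
# Sketch: apply the asm-disk* -> rasm-disk* rename with sed, keeping a backup.
# Demonstrated on a scratch copy of a single rule line.
set -e
tmp=$(mktemp -d)
cat > "$tmp/99-oracle-asmdevices.rules" <<'EOF'
KERNEL=="sd*", BUS=="scsi", NAME="asm-diskb", OWNER="grid", GROUP="asmadmin", MODE="0660"
EOF
cp "$tmp/99-oracle-asmdevices.rules" "$tmp/99-oracle-asmdevices.rules.bak"   # rollback copy
sed -i 's/NAME="asm-disk/NAME="rasm-disk/' "$tmp/99-oracle-asmdevices.rules"
grep 'NAME=' "$tmp/99-oracle-asmdevices.rules"   # shows NAME="rasm-diskb"
```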

The edit above changes the original 99-oracle-asmdevices.rules UDEV rules file so that the generated device names take the /dev/rasm-disk* form, different from the previous ASM DISK names. A series of follow-up steps is now needed so that the RAC CRS can still start.

The votedisk and OCR locations currently in use:

[root@maclean1 rules.d]# /g01/grid/app/11.2.0/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   6896bfc3d1464f9fbf0ea9df87e023ad (/dev/asm-diskb) [SYSTEMDG]
 2. ONLINE   58eb81b656084ff2bfd315d9badd08b7 (/dev/asm-diskc) [SYSTEMDG]
 3. ONLINE   6bf7324625c54f3abf2c942b1e7f70d9 (/dev/asm-diskd) [SYSTEMDG]
 4. ONLINE   43ad8ae20c354f5ebf7083bc30bf94cc (/dev/asm-diske) [SYSTEMDG]
 5. ONLINE   4c225359d51b4f93bfba01080664b3d7 (/dev/asm-diskf) [SYSTEMDG]
Located 5 voting disk(s).

[root@maclean1 rules.d]# /g01/grid/app/11.2.0/grid/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2844
         Available space (kbytes) :     259276
         ID                       :  879001605
         Device/File Name         :  +SYSTEMDG
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

Because each votedisk record points at a specific ASM DISK path, we will need crsctl replace votedisk later. For now, reboot the Linux OS:

[root@maclean1 rules.d]# init 6

rebooting ............

[root@maclean1 dev]# ls -l *asm*
brw-rw---- 1 grid asmadmin 8,  16 Jul 15 04:15 rasm-diskb
brw-rw---- 1 grid asmadmin 8,  32 Jul 15 04:15 rasm-diskc
brw-rw---- 1 grid asmadmin 8,  48 Jul 15 04:15 rasm-diskd
brw-rw---- 1 grid asmadmin 8,  64 Jul 15 04:15 rasm-diske
brw-rw---- 1 grid asmadmin 8,  80 Jul 15 04:15 rasm-diskf
brw-rw---- 1 grid asmadmin 8,  96 Jul 15 04:15 rasm-diskg
brw-rw---- 1 grid asmadmin 8, 112 Jul 15 04:15 rasm-diskh
brw-rw---- 1 grid asmadmin 8, 128 Jul 15 04:15 rasm-diski
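Before touching the cluster stack it is worth confirming that every renamed disk actually exists with the expected ownership, since ASM discovery will fail again otherwise. A small sketch (the glob simply matches nothing on a machine without these devices):

```shell
# Sketch: list each /dev/rasm-disk* device with owner, group and mode;
# every disk should show grid:asmadmin and mode 660 before CRS is restarted.
for d in /dev/rasm-disk*; do
    [ -e "$d" ] || continue          # glob may match nothing on other hosts
    stat -c '%n %U:%G %a' "$d"
done
```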

After the reboot the ASM disks come up automatically as /dev/rasm-disk*. The CSS log (ocssd.log) shows that the css service still scans the /dev/asm* path for ASM disks, which now matches no valid ASM DISK at all:

more /g01/grid/app/11.2.0/grid/log/maclean1/cssd/ocssd.log

2012-07-15 04:17:45.208: [   SKGFD][1099548992]Discovery with str:/dev/asm*:
2012-07-15 04:17:45.208: [   SKGFD][1099548992]UFS discovery with :/dev/asm*:
2012-07-15 04:17:45.208: [   SKGFD][1099548992]OSS discovery with :/dev/asm*:
2012-07-15 04:17:45.208: [ CSSD][1099548992]clssnmvDiskVerify: Successful discovery of 0 disks
2012-07-15 04:17:45.208: [    CSSD][1099548992]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2012-07-15 04:17:45.208: [    CSSD][1099548992]clssnmvFindInitialConfigs: No voting files found
2012-07-15 04:17:45.208: [    CSSD][1099548992](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2012-07-15 04:17:45.251: [    CSSD][1096661312]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(0x26a8ba0) client((nil))
2012-07-15 04:17:45.251: [    CSSD][1096661312]clssgmDeadProc: proc 0x26a8ba0
2012-07-15 04:17:45.251: [    CSSD][1096661312]clssgmDestroyProc: cleaning up proc(0x26a8ba0) con(0xfe6) skgpid  ospid 3751 with 0 clients, refcount 0
2012-07-15 04:17:45.252: [    CSSD][1096661312]clssgmDiscEndpcl: gipcDestroy 0xfe6
2012-07-15 04:17:45.829: [    CSSD][1096661312]clssscSelect: cookie accept request 0x2318ea0
2012-07-15 04:17:45.829: [    CSSD][1096661312]clssgmAllocProc: (0x2659480) allocated
2012-07-15 04:17:45.830: [    CSSD][1096661312]clssgmClientConnectMsg: properties of cmProc 0x2659480 - 1,2,3,4,5
2012-07-15 04:17:45.830: [    CSSD][1096661312]clssgmClientConnectMsg: Connect from con(0x114e) proc(0x2659480) pid(3751) version 11:2:1:4, properties: 1,2,3,4,5
2012-07-15 04:17:45.830: [    CSSD][1096661312]clssgmClientConnectMsg: msg flags 0x0000
2012-07-15 04:17:45.939: [    CSSD][1096661312]clssscSelect: cookie accept request 0x253ddd0
2012-07-15 04:17:45.939: [    CSSD][1096661312]clssscevtypSHRCON: getting client with cmproc 0x253ddd0
2012-07-15 04:17:45.939: [    CSSD][1096661312]clssgmRegisterClient: proc(3/0x253ddd0), client(61/0x26877b0)
2012-07-15 04:17:45.939: [    CSSD][1096661312]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x253ddd0) client(0x26877b0)
2012-07-15 04:17:45.939: [    CSSD][1096661312]clssgmDiscEndpcl: gipcDestroy 0x1174
2012-07-15 04:17:46.070: [    CSSD][1096661312]clssscSelect: cookie accept request 0x26368a0
2012-07-15 04:17:46.070: [    CSSD][1096661312]clssscevtypSHRCON: getting client with cmproc 0x26368a0
2012-07-15 04:17:46.070: [    CSSD][1096661312]clssgmRegisterClient: proc(5/0x26368a0), client(50/0x26877b0)


In 11gR2 the CRS stack depends on ASM because the OCR is stored inside ASM; if ASM cannot start, the CRS service cannot work either:

[root@maclean1 ~]# crsctl check has
CRS-4638: Oracle High Availability Services is online

[root@maclean1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager

2. That is the state we are in after the ASM DISK PATH change; the following steps recover CRS.

First, shut down the OHASD stack completely:

[root@maclean1 ~]# crsctl stop has -f 

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'maclean1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'maclean1'
CRS-2673: Attempting to stop 'ora.crf' on 'maclean1'
CRS-2677: Stop of 'ora.mdnsd' on 'maclean1' succeeded
CRS-2677: Stop of 'ora.crf' on 'maclean1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'maclean1'
CRS-2677: Stop of 'ora.gipcd' on 'maclean1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'maclean1'
CRS-2677: Stop of 'ora.gpnpd' on 'maclean1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'maclean1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

3. Start CRS with -excl -nocrs; this starts the ASM instance only, without the CRS service:

 [root@maclean1 ~]# crsctl start crs -excl -nocrs 

CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'maclean1'
CRS-2676: Start of 'ora.mdnsd' on 'maclean1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'maclean1'
CRS-2676: Start of 'ora.gpnpd' on 'maclean1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'maclean1'
CRS-2672: Attempting to start 'ora.gipcd' on 'maclean1'
CRS-2676: Start of 'ora.cssdmonitor' on 'maclean1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'maclean1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'maclean1'
CRS-2672: Attempting to start 'ora.diskmon' on 'maclean1'
CRS-2676: Start of 'ora.diskmon' on 'maclean1' succeeded
CRS-2676: Start of 'ora.cssd' on 'maclean1' succeeded
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'maclean1'
CRS-2672: Attempting to start 'ora.ctssd' on 'maclean1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'maclean1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'maclean1'
CRS-2676: Start of 'ora.ctssd' on 'maclean1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'maclean1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'maclean1'
CRS-2676: Start of 'ora.asm' on 'maclean1' succeeded	

# It is also advisable to open up the ORACLE_BASE containing CRS_HOME to mode 777, to avoid possible permission problems
[root@maclean1 ~]# chmod 777 /g01

4. Change the ASM instance's asm_diskstring to the new ASM DISK PATH:

[root@maclean1 ~]# su - grid

[grid@maclean1 ~]$ sqlplus  / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Sun Jul 15 04:40:40 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter system set asm_diskstring='/dev/rasm*';

System altered.

SQL> alter diskgroup systemdg mount;

Diskgroup altered.

SQL> create spfile from memory;

File created.

SQL> startup force mount;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted

SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /g01/grid/app/11.2.0/grid/dbs/
                                                 spfile+ASM1.ora

SQL> show parameter disk

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      SYSTEMDG
asm_diskstring                       string      /dev/rasm*

SQL> create pfile from spfile;

File created.

SQL> create spfile='+SYSTEMDG' from pfile;

File created.

SQL> startup force;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +SYSTEMDG/maclean-cluster/asmp
                                                 arameterfile/registry.253.7886
                                                 82933
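As an optional cross-check (assuming the 11.2 asmcmd from the Grid home is in the grid user's PATH), asmcmd spget prints the ASM SPFILE location registered in the GPnP profile, which should match the +SYSTEMDG path shown above:

```shell
# Sketch: verify the registered ASM SPFILE location with asmcmd spget;
# guarded so the snippet is harmless on a box without the Grid software.
if command -v asmcmd >/dev/null 2>&1; then
    asmcmd spget    # expect the +SYSTEMDG/.../registry.* path shown above
else
    echo "asmcmd not in PATH"
fi
```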

We have now changed asm_diskstring successfully and refreshed the SPFILE stored in the ASM DISKGROUP; since ASM uses a shared SPFILE, no further action is normally needed on the other nodes.

5. Use crsctl replace votedisk to relocate the voting disks:

[root@maclean1 ~]# crsctl replace votedisk +systemdg

Successful addition of voting disk 864a00efcfbe4f42bfd0f4f6b60472a0.
Successful addition of voting disk ab14d6e727614f29bf53b9870052a5c8.
Successful addition of voting disk 754c03c168854f46bf2daee7287bf260.
Successful addition of voting disk 9ed58f37f3e84f28bfcd9b101f2af9f3.
Successful addition of voting disk 4ce7b7c682364f12bf4df5ce1fb7814e.
Successfully replaced voting disk group with +systemdg.
CRS-4266: Voting file(s) successfully replaced		

[root@maclean1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   864a00efcfbe4f42bfd0f4f6b60472a0 (/dev/rasm-diskb) [SYSTEMDG]
 2. ONLINE   ab14d6e727614f29bf53b9870052a5c8 (/dev/rasm-diskc) [SYSTEMDG]
 3. ONLINE   754c03c168854f46bf2daee7287bf260 (/dev/rasm-diskd) [SYSTEMDG]
 4. ONLINE   9ed58f37f3e84f28bfcd9b101f2af9f3 (/dev/rasm-diske) [SYSTEMDG]
 5. ONLINE   4ce7b7c682364f12bf4df5ce1fb7814e (/dev/rasm-diskf) [SYSTEMDG]
Located 5 voting disk(s).
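A quick way to confirm all voting files came back is to count the ONLINE entries in the crsctl output. A sketch, fed here from a saved copy of the transcript above (on a live node you would pipe "crsctl query css votedisk" into grep directly):

```shell
# Sketch: count ONLINE voting files from a saved crsctl transcript.
votedisk_out=' 1. ONLINE   864a00efcfbe4f42bfd0f4f6b60472a0 (/dev/rasm-diskb) [SYSTEMDG]
 2. ONLINE   ab14d6e727614f29bf53b9870052a5c8 (/dev/rasm-diskc) [SYSTEMDG]
 3. ONLINE   754c03c168854f46bf2daee7287bf260 (/dev/rasm-diskd) [SYSTEMDG]
 4. ONLINE   9ed58f37f3e84f28bfcd9b101f2af9f3 (/dev/rasm-diske) [SYSTEMDG]
 5. ONLINE   4ce7b7c682364f12bf4df5ce1fb7814e (/dev/rasm-diskf) [SYSTEMDG]'
printf '%s\n' "$votedisk_out" | grep -c ' ONLINE '   # prints 5
```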

[root@maclean1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2844
         Available space (kbytes) :     259276
         ID                       :  879001605
         Device/File Name         :  +SYSTEMDG
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

The voting disks have been replaced onto the new ASM DISKs, and both the votedisk and the OCR are confirmed usable.

6. Restart the CRS stack:

[root@maclean1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'maclean1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'maclean1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'maclean1'
CRS-2673: Attempting to stop 'ora.asm' on 'maclean1'
CRS-2677: Stop of 'ora.mdnsd' on 'maclean1' succeeded
CRS-2677: Stop of 'ora.asm' on 'maclean1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'maclean1'
CRS-2677: Stop of 'ora.ctssd' on 'maclean1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'maclean1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'maclean1'
CRS-2677: Stop of 'ora.cssd' on 'maclean1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'maclean1'
CRS-2677: Stop of 'ora.gipcd' on 'maclean1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'maclean1'
CRS-2677: Stop of 'ora.gpnpd' on 'maclean1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'maclean1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@maclean1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.BACKUPDG.dg
               ONLINE  ONLINE       maclean1                                     
ora.DATA.dg
               ONLINE  ONLINE       maclean1                                     
ora.LISTENER.lsnr
               ONLINE  ONLINE       maclean1                                     
ora.SYSTEMDG.dg
               ONLINE  ONLINE       maclean1                                     
ora.asm
               ONLINE  ONLINE       maclean1                 Started             
ora.gsd
               OFFLINE OFFLINE      maclean1                                     
ora.net1.network
               ONLINE  ONLINE       maclean1                                     
ora.ons
               ONLINE  ONLINE       maclean1                                     
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       maclean1                                     
ora.cvu
      1        ONLINE  ONLINE       maclean1                                     
ora.maclean1.vip
      1        ONLINE  ONLINE       maclean1                                     
ora.maclean2.vip
      1        ONLINE  INTERMEDIATE maclean1                 FAILED OVER         
ora.oc4j
      1        ONLINE  OFFLINE                               STARTING            
ora.prod.db
      1        ONLINE  OFFLINE                               Instance Shutdown,S 
                                                             TARTING             
      2        ONLINE  OFFLINE                                                   
ora.scan1.vip
      1        ONLINE  ONLINE       maclean1

Because the shared ASM SPFILE was updated above, the other nodes should generally have no problem; after a simple restart, CRS works normally on them as well.

The startup dependencies in 11gR2 RAC+ASM are fairly involved, so even with the steps above you may still run into issues when changing the ASM DISK PATH. If you cannot resolve them, please ask in the original thread for this article. Thanks!

10g ASM lost disk log

In 10g, a storage power loss that takes all ASM disks of one failgroup offline can leave the RDBMS INSTANCE hung for several seconds up to a minute. The relevant ASM logs follow:

alert.log:

Tue Jun 19 15:37:19 GMT+08:00 2012NOTE: assigning ARB0 to group 2/0xa15117e7 (SDATA)
Tue Jun 19 15:37:20 GMT+08:00 2012NOTE: F1X0 copy 2 relocating from 10:2 to 9:2
NOTE: F1X0 copy 3 relocating from 65534:4294967294 to 65534:4294967294
Tue Jun 19 15:37:20 GMT+08:00 2012NOTE: X->S down convert bast on F1B3 bastCount=2
NOTE: X->S down convert bast on F1B3 bastCount=3
NOTE: X->S down convert bast on F1B3 bastCount=4
NOTE: X->S down convert bast on F1B3 bastCount=5
NOTE: X->S down convert bast on F1B3 bastCount=6
NOTE: X->S down convert bast on F1B3 bastCount=7
NOTE: X->S down convert bast on F1B3 bastCount=8
NOTE: X->S down convert bast on F1B3 bastCount=9
NOTE: X->S down convert bast on F1B3 bastCount=10
NOTE: X->S down convert bast on F1B3 bastCount=11
NOTE: X->S down convert bast on F1B3 bastCount=12
NOTE: X->S down convert bast on F1B3 bastCount=13
NOTE: X->S down convert bast on F1B3 bastCount=14
NOTE: X->S down convert bast on F1B3 bastCount=15
NOTE: X->S down convert bast on F1B3 bastCount=16
NOTE: X->S down convert bast on F1B3 bastCount=17
NOTE: X->S down convert bast on F1B3 bastCount=18
NOTE: X->S down convert bast on F1B3 bastCount=19
NOTE: X->S down convert bast on F1B3 bastCount=20
NOTE: X->S down convert bast on F1B3 bastCount=21
Tue Jun 19 15:37:41 GMT+08:00 2012SQL> alter diskgroup data  add failgroup fg2  disk    '/dev/rhdiskpower0' force, '/dev/rhdiskpower1' force , '/dev/rhdiskpower2' force , '/dev/rhdiskpower3' force 
Tue Jun 19 15:37:41 GMT+08:00 2012NOTE: reconfiguration of group 1/0x5e1117e5 (DATA), full=1
Tue Jun 19 15:37:42 GMT+08:00 2012NOTE: initializing header on grp 1 disk DATA_0004
NOTE: initializing header on grp 1 disk DATA_0006
NOTE: initializing header on grp 1 disk DATA_0008
NOTE: initializing header on grp 1 disk DATA_0009
NOTE: cache opening disk 4 of grp 1: DATA_0004 path:/dev/rhdiskpower0
NOTE: cache opening disk 6 of grp 1: DATA_0006 path:/dev/rhdiskpower1
NOTE: cache opening disk 8 of grp 1: DATA_0008 path:/dev/rhdiskpower2
NOTE: cache opening disk 9 of grp 1: DATA_0009 path:/dev/rhdiskpower3
NOTE: requesting all-instance disk validation for group=1
Tue Jun 19 15:37:42 GMT+08:00 2012NOTE: disk validation pending for group 1/0x5e1117e5 (DATA)
SUCCESS: validated disks for 1/0x5e1117e5 (DATA)
Tue Jun 19 15:37:44 GMT+08:00 2012NOTE: PST update: grp = 1
Tue Jun 19 15:37:44 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
NOTE: group DATA: relocated PST to: disk 0004 (PST copy 1)
Tue Jun 19 15:37:44 GMT+08:00 2012NOTE: requesting all-instance membership refresh for group=1
Tue Jun 19 15:37:44 GMT+08:00 2012NOTE: membership refresh pending for group 1/0x5e1117e5 (DATA)
SUCCESS: refreshed membership for 1/0x5e1117e5 (DATA)
Tue Jun 19 15:38:06 GMT+08:00 2012SQL> alter diskgroup data rebalance power 10 
Tue Jun 19 15:38:06 GMT+08:00 2012NOTE: PST update: grp = 1
NOTE: requesting all-instance membership refresh for group=1
Tue Jun 19 15:38:06 GMT+08:00 2012NOTE: membership refresh pending for group 1/0x5e1117e5 (DATA)
SUCCESS: refreshed membership for 1/0x5e1117e5 (DATA)
Tue Jun 19 15:38:32 GMT+08:00 2012SQL> alter diskgroup sdata rebalance power 10 
Tue Jun 19 15:38:32 GMT+08:00 2012ERROR: ORA-1013 thrown in ARB0 for group number 2
Tue Jun 19 15:38:32 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb0_5374362.trc:
ORA-01013: user requested cancel of current operation
Tue Jun 19 15:38:32 GMT+08:00 2012NOTE: stopping process ARB0
Tue Jun 19 15:38:35 GMT+08:00 2012NOTE: rebalance interrupted for group 2/0xa15117e7 (SDATA)
Tue Jun 19 15:38:35 GMT+08:00 2012NOTE: starting rebalance of group 1/0x5e1117e5 (DATA) at power 10
Starting background process ARB0
Tue Jun 19 15:38:35 GMT+08:00 2012NOTE: PST update: grp = 2
NOTE: requesting all-instance membership refresh for group=2
Tue Jun 19 15:38:35 GMT+08:00 2012Starting background process ARB1
ARB0 started with pid=20, OS id=5898374
Tue Jun 19 15:38:35 GMT+08:00 2012Starting background process ARB2
ARB1 started with pid=21, OS id=5374366
Tue Jun 19 15:38:35 GMT+08:00 2012Starting background process ARB3
ARB2 started with pid=22, OS id=4456744
Tue Jun 19 15:38:35 GMT+08:00 2012Starting background process ARB4
ARB3 started with pid=23, OS id=8782054
Tue Jun 19 15:38:35 GMT+08:00 2012Starting background process ARB5
ARB4 started with pid=24, OS id=8454504
Tue Jun 19 15:38:35 GMT+08:00 2012Starting background process ARB6
ARB5 started with pid=25, OS id=5963930
Tue Jun 19 15:38:35 GMT+08:00 2012Starting background process ARB7
ARB6 started with pid=26, OS id=6357166
Tue Jun 19 15:38:35 GMT+08:00 2012Starting background process ARB8
ARB7 started with pid=27, OS id=7209164
Tue Jun 19 15:38:35 GMT+08:00 2012Starting background process ARB9
ARB8 started with pid=28, OS id=6488284
Tue Jun 19 15:38:35 GMT+08:00 2012NOTE: membership refresh pending for group 2/0xa15117e7 (SDATA)
NOTE: assigning ARB0 to group 1/0x5e1117e5 (DATA)
ARB9 started with pid=29, OS id=5308840
Tue Jun 19 15:38:35 GMT+08:00 2012NOTE: assigning ARB1 to group 1/0x5e1117e5 (DATA)
NOTE: assigning ARB2 to group 1/0x5e1117e5 (DATA)
NOTE: assigning ARB3 to group 1/0x5e1117e5 (DATA)
NOTE: assigning ARB4 to group 1/0x5e1117e5 (DATA)
NOTE: assigning ARB5 to group 1/0x5e1117e5 (DATA)
NOTE: assigning ARB6 to group 1/0x5e1117e5 (DATA)
NOTE: assigning ARB7 to group 1/0x5e1117e5 (DATA)
Tue Jun 19 15:38:36 GMT+08:00 2012NOTE: F1X0 copy 2 relocating from 5:4294967294 to 9:2
NOTE: F1X0 copy 3 relocating from 65534:4294967294 to 65534:4294967294
Tue Jun 19 15:38:36 GMT+08:00 2012NOTE: assigning ARB8 to group 1/0x5e1117e5 (DATA)
NOTE: assigning ARB9 to group 1/0x5e1117e5 (DATA)
Tue Jun 19 15:38:37 GMT+08:00 2012NOTE: X->S down convert bast on F1B3 bastCount=2
NOTE: X->S down convert bast on F1B3 bastCount=3
NOTE: X->S down convert bast on F1B3 bastCount=4
NOTE: X->S down convert bast on F1B3 bastCount=5
NOTE: X->S down convert bast on F1B3 bastCount=6
NOTE: X->S down convert bast on F1B3 bastCount=7
NOTE: X->S down convert bast on F1B3 bastCount=8
NOTE: X->S down convert bast on F1B3 bastCount=9
Tue Jun 19 15:38:38 GMT+08:00 2012SUCCESS: refreshed membership for 2/0xa15117e7 (SDATA)
Tue Jun 19 15:38:39 GMT+08:00 2012NOTE: X->S down convert bast on F1B3 bastCount=10
NOTE: X->S down convert bast on F1B3 bastCount=11
NOTE: X->S down convert bast on F1B3 bastCount=12
NOTE: X->S down convert bast on F1B3 bastCount=13
NOTE: X->S down convert bast on F1B3 bastCount=14
NOTE: X->S down convert bast on F1B3 bastCount=15
NOTE: X->S down convert bast on F1B3 bastCount=16
NOTE: X->S down convert bast on F1B3 bastCount=17
NOTE: X->S down convert bast on F1B3 bastCount=18
NOTE: X->S down convert bast on F1B3 bastCount=19
NOTE: X->S down convert bast on F1B3 bastCount=20
NOTE: X->S down convert bast on F1B3 bastCount=21
Tue Jun 19 15:44:26 GMT+08:00 2012NOTE: cache initiating offline of disk 6  group 1
Tue Jun 19 15:44:26 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb4_8454504.trc:
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 7
Additional information: 10496000
Additional information: -1
Tue Jun 19 15:44:26 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb9_5308840.trc:
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 7
Additional information: 10500096
Additional information: -1
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: IO Failed.  au:5127 diskname:/dev/rhdiskpower0
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: IO Failed.  au:5125 diskname:/dev/rhdiskpower3
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: process 7274698 initiating offline of disk 6.3277973293 (DATA_0006) with mask 0x3 in group 1
Tue Jun 19 15:44:26 GMT+08:00 2012	 rq:110d431d8 buffer:11086b800 au_offset(bytes):0 iosz:1048576 operation:1
Tue Jun 19 15:44:26 GMT+08:00 2012	 rq:110d42c90 buffer:11086b800 au_offset(bytes):0 iosz:1048576 operation:1
Tue Jun 19 15:44:26 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb5_5963930.trc:
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 7
Additional information: 10504192
Additional information: -1
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: Disk 6 in group 1 in mode: 0x7,state: 0x2 will be taken offline (local:y)
Tue Jun 19 15:44:26 GMT+08:00 2012	 status:2
Tue Jun 19 15:44:26 GMT+08:00 2012	 status:2
Tue Jun 19 15:44:26 GMT+08:00 2012ERROR: failed to copy file +DATA.291, extent 716
Tue Jun 19 15:44:26 GMT+08:00 2012NOTE: PST update: grp = 1, dsk = 6, mode = 0x6
Tue Jun 19 15:44:26 GMT+08:00 2012ERROR: failed to copy file +DATA.291, extent 906
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: IO Failed.  au:5129 diskname:/dev/rhdiskpower1
Tue Jun 19 15:44:26 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb3_8782054.trc:
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 7
Additional information: 10479616
Additional information: -1
Tue Jun 19 15:44:26 GMT+08:00 2012	 rq:110d42fc8 buffer:11086b800 au_offset(bytes):0 iosz:1048576 operation:1
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: IO Failed.  au:5117 diskname:/dev/rhdiskpower2
Tue Jun 19 15:44:26 GMT+08:00 2012	 status:2
Tue Jun 19 15:44:26 GMT+08:00 2012ERROR: failed to copy file +DATA.291, extent 1355
Tue Jun 19 15:44:26 GMT+08:00 2012	 rq:110d43020 buffer:11086b800 au_offset(bytes):0 iosz:1048576 operation:1
Tue Jun 19 15:44:26 GMT+08:00 2012	 status:2
ERROR: failed to copy file +DATA.291, extent 1567
Tue Jun 19 15:44:26 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_gmon_7602242.trc:
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 7
Additional information: 2056
Additional information: -1
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: Disk 4 in group 1 in mode: 0x7,state: 0x2 will be taken offline (local:y)
Tue Jun 19 15:44:26 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_gmon_7602242.trc:
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 7
Additional information: 2056
Additional information: -1
WARNING: Disk 8 in group 1 in mode: 0x7,state: 0x2 will be taken offline (local:y)
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: IO Failed.  au:5128 diskname:/dev/rhdiskpower0
	 rq:110d431d8 buffer:11086b800 au_offset(bytes):0 iosz:1048576 operation:1
	 status:2
ERROR: failed to copy file +DATA.291, extent 764
Tue Jun 19 15:44:26 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_gmon_7602242.trc:
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 7
Additional information: 2056
Additional information: -1
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: Disk 9 in group 1 in mode: 0x7,state: 0x2 will be taken offline (local:y)
NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
Tue Jun 19 15:44:26 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb7_7209164.trc:
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 7
Additional information: 10498048
Additional information: -1
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: IO Failed.  au:5126 diskname:/dev/rhdiskpower3
	 rq:110d42f48 buffer:11086b800 au_offset(bytes):0 iosz:1048576 operation:1
	 status:2
ERROR: failed to copy file +DATA.291, extent 1130
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: IO Failed.  au:5127 diskname:/dev/rhdiskpower3
	 rq:110d43258 buffer:11086b800 au_offset(bytes):0 iosz:1048576 operation:1
	 status:2
ERROR: failed to copy file +DATA.291, extent 582
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: IO Failed.  au:5118 diskname:/dev/rhdiskpower2
	 rq:110d432d8 buffer:11086b800 au_offset(bytes):0 iosz:1048576 operation:1
	 status:2
ERROR: failed to copy file +DATA.291, extent 1015
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: IO Failed.  au:5117 diskname:/dev/rhdiskpower2
	 rq:110d42d10 buffer:11086b800 au_offset(bytes):0 iosz:1048576 operation:1
	 status:2
ERROR: failed to copy file +DATA.291, extent 1445
Tue Jun 19 15:44:26 GMT+08:00 2012WARNING: IO Failed.  au:5129 diskname:/dev/rhdiskpower1
	 rq:110d42f48 buffer:11086b800 au_offset(bytes):0 iosz:1048576 operation:1
	 status:2
ERROR: failed to copy file +DATA.291, extent 1243
Tue Jun 19 15:44:28 GMT+08:00 2012NOTE: PST update: grp = 1, dsk = 6, mode = 0x4
Tue Jun 19 15:44:28 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
NOTE: cache closing disk 6 of grp 1: DATA_0006
Tue Jun 19 15:44:31 GMT+08:00 2012NOTE: DBWR successfully wrote to at least one mirror side
NOTE: cache initiating offline of disk 8  group 1
WARNING: process 7274698 initiating offline of disk 8.3277973294 (DATA_0008) with mask 0x3 in group 1
NOTE: PST update: grp = 1, dsk = 8, mode = 0x6
Tue Jun 19 15:44:31 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
Tue Jun 19 15:44:34 GMT+08:00 2012NOTE: PST update: grp = 1, dsk = 8, mode = 0x4
Tue Jun 19 15:44:34 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
NOTE: cache closing disk 8 of grp 1: DATA_0008
Tue Jun 19 15:44:37 GMT+08:00 2012NOTE: DBWR successfully wrote to at least one mirror side
NOTE: cache initiating offline of disk 9  group 1
WARNING: process 7274698 initiating offline of disk 9.3277973295 (DATA_0009) with mask 0x3 in group 1
NOTE: PST update: grp = 1, dsk = 9, mode = 0x6
Tue Jun 19 15:44:37 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
Tue Jun 19 15:44:37 GMT+08:00 2012WARNING: IO Failed.  au:16 diskname:/dev/rhdiskpower4
	 rq:110446a30 buffer:7000000127b4000 au_offset(bytes):770048 iosz:4096 operation:1
	 status:2
NOTE: cache initiating offline of disk 5  group 2
WARNING: process 5046626 initiating offline of disk 5.3277973287 (SDATA_0005) with mask 0x3 in group 2
NOTE: PST update: grp = 2, dsk = 5, mode = 0x6
Tue Jun 19 15:44:38 GMT+08:00 2012WARNING: found another non-responsive disk 9.3277973291 (SDATA_0009) that will be taken offline 
WARNING: found another non-responsive disk 8.3277973290 (SDATA_0008) that will be taken offline 
WARNING: found another non-responsive disk 7.3277973289 (SDATA_0007) that will be taken offline 
WARNING: found another non-responsive disk 6.3277973288 (SDATA_0006) that will be taken offline 
WARNING: Disk 9 in group 2 in mode: 0x7,state: 0x2 will be taken offline (local:y)
WARNING: Disk 8 in group 2 in mode: 0x7,state: 0x2 will be taken offline (local:y)
WARNING: Disk 7 in group 2 in mode: 0x7,state: 0x2 will be taken offline (local:y)
WARNING: Disk 6 in group 2 in mode: 0x7,state: 0x2 will be taken offline (local:y)
NOTE: group SDATA: relocated PST to: disk 0000 (PST copy 0)
Tue Jun 19 15:44:40 GMT+08:00 2012NOTE: PST update: grp = 1, dsk = 9, mode = 0x4
Tue Jun 19 15:44:40 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
Tue Jun 19 15:44:40 GMT+08:00 2012NOTE: PST update: grp = 2, dsk = 5, mode = 0x4
Tue Jun 19 15:44:40 GMT+08:00 2012NOTE: group SDATA: relocated PST to: disk 0000 (PST copy 0)
NOTE: cache closing disk 9 of grp 1: DATA_0009
Tue Jun 19 15:44:43 GMT+08:00 2012NOTE: DBWR successfully wrote to at least one mirror side
NOTE: cache initiating offline of disk 4  group 1
WARNING: process 7274698 initiating offline of disk 4.3277973292 (DATA_0004) with mask 0x3 in group 1
NOTE: PST update: grp = 1, dsk = 4, mode = 0x6
NOTE: cache closing disk 5 of grp 2: SDATA_0005
Tue Jun 19 15:44:43 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
Tue Jun 19 15:44:43 GMT+08:00 2012NOTE: LGWR successfully wrote to at least one mirror side
Tue Jun 19 15:44:43 GMT+08:00 2012ERROR: ORA-15081 thrown in ARB8 for group number 1
Tue Jun 19 15:44:43 GMT+08:00 2012ERROR: ORA-15080 thrown in ARB7 for group number 1
Tue Jun 19 15:44:43 GMT+08:00 2012ERROR: ORA-15080 thrown in ARB9 for group number 1
Tue Jun 19 15:44:43 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb8_6488284.trc:
ORA-15081: failed to submit an I/O operation to a disk
Tue Jun 19 15:44:43 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb7_7209164.trc:
ORA-15080: synchronous I/O operation to a disk failed
Tue Jun 19 15:44:43 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb9_5308840.trc:
ORA-15080: synchronous I/O operation to a disk failed
Tue Jun 19 15:44:43 GMT+08:00 2012ERROR: ORA-15081 thrown in ARB6 for group number 1
Tue Jun 19 15:44:43 GMT+08:00 2012ERROR: ORA-15080 thrown in ARB5 for group number 1
Tue Jun 19 15:44:43 GMT+08:00 2012ERROR: ORA-15081 thrown in ARB2 for group number 1
Tue Jun 19 15:44:43 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb6_6357166.trc:
ORA-15081: failed to submit an I/O operation to a disk
Tue Jun 19 15:44:43 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb5_5963930.trc:
ORA-15080: synchronous I/O operation to a disk failed
Tue Jun 19 15:44:43 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb2_4456744.trc:
ORA-15081: failed to submit an I/O operation to a disk
Tue Jun 19 15:44:43 GMT+08:00 2012ERROR: ORA-15080 thrown in ARB3 for group number 1
Tue Jun 19 15:44:43 GMT+08:00 2012ERROR: ORA-15081 thrown in ARB0 for group number 1
Tue Jun 19 15:44:43 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb3_8782054.trc:
ORA-15080: synchronous I/O operation to a disk failed
Tue Jun 19 15:44:43 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb0_5898374.trc:
ORA-15081: failed to submit an I/O operation to a disk
Tue Jun 19 15:44:43 GMT+08:00 2012NOTE: stopping process ARB7
Tue Jun 19 15:44:43 GMT+08:00 2012ERROR: ORA-15080 thrown in ARB4 for group number 1
Tue Jun 19 15:44:43 GMT+08:00 2012NOTE: stopping process ARB8
NOTE: stopping process ARB9
NOTE: stopping process ARB6
NOTE: stopping process ARB5
Tue Jun 19 15:44:43 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb4_8454504.trc:
ORA-15080: synchronous I/O operation to a disk failed
NOTE: stopping process ARB2
NOTE: stopping process ARB3
NOTE: stopping process ARB0
NOTE: stopping process ARB4
Tue Jun 19 15:44:43 GMT+08:00 2012ERROR: ORA-15081 thrown in ARB1 for group number 1
Tue Jun 19 15:44:43 GMT+08:00 2012Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_arb1_5374366.trc:
ORA-15081: failed to submit an I/O operation to a disk
Tue Jun 19 15:44:43 GMT+08:00 2012NOTE: stopping process ARB1
Tue Jun 19 15:44:46 GMT+08:00 2012NOTE: PST update: grp = 1, dsk = 4, mode = 0x4
Tue Jun 19 15:44:46 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
Tue Jun 19 15:44:47 GMT+08:00 2012WARNING: rebalance not completed for group 1/0x5e1117e5 (DATA)
Tue Jun 19 15:44:47 GMT+08:00 2012SUCCESS: rebalance completed for group 1/0x5e1117e5 (DATA) 
NOTE: PST update: grp = 1
Tue Jun 19 15:44:47 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
NOTE: cache closing disk 4 of grp 1: DATA_0004
Tue Jun 19 15:44:49 GMT+08:00 2012NOTE: DBWR successfully wrote to at least one mirror side
NOTE: cache initiating offline of disk 7  group 2
WARNING: process 7274698 initiating offline of disk 7.3277973289 (SDATA_0007) with mask 0x3 in group 2
NOTE: PST update: grp = 2, dsk = 7, mode = 0x6
Tue Jun 19 15:44:50 GMT+08:00 2012NOTE: group SDATA: relocated PST to: disk 0000 (PST copy 0)
Tue Jun 19 15:44:52 GMT+08:00 2012NOTE: PST update: grp = 2, dsk = 7, mode = 0x4
Tue Jun 19 15:44:52 GMT+08:00 2012NOTE: group SDATA: relocated PST to: disk 0000 (PST copy 0)
NOTE: cache closing disk 7 of grp 2: SDATA_0007
Tue Jun 19 15:44:56 GMT+08:00 2012NOTE: DBWR successfully wrote to at least one mirror side
Tue Jun 19 15:44:56 GMT+08:00 2012WARNING: offline disk number 5 has references (10684 AUs)
WARNING: offline disk number 7 has references (10676 AUs)
NOTE: PST update: grp = 1
Tue Jun 19 15:44:56 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
Tue Jun 19 15:44:56 GMT+08:00 2012WARNING: IO Failed.  au:0 diskname:/dev/rhdiskpower16
	 rq:11087eed0 buffer:110881200 au_offset(bytes):0 iosz:4096 operation:0
	 status:2
WARNING: IO Failed.  au:0 diskname:/dev/rhdiskpower16
	 rq:11087eed0 buffer:110881200 au_offset(bytes):0 iosz:4096 operation:0
	 status:2
WARNING: IO Failed.  au:0 diskname:/dev/rhdiskpower20
	 rq:11087f1e0 buffer:110882400 au_offset(bytes):0 iosz:4096 operation:0
	 status:2
WARNING: IO Failed.  au:0 diskname:/dev/rhdiskpower20
	 rq:11087f1e0 buffer:110882400 au_offset(bytes):0 iosz:4096 operation:0
	 status:2
WARNING: IO Failed.  au:0 diskname:/dev/rhdiskpower6
	 rq:11087f800 buffer:110884800 au_offset(bytes):0 iosz:4096 operation:0
	 status:2
WARNING: IO Failed.  au:0 diskname:/dev/rhdiskpower6
	 rq:11087f800 buffer:110884800 au_offset(bytes):0 iosz:4096 operation:0
	 status:2
SUCCESS: refreshed membership for 2/0xa15117e7 (SDATA)
Tue Jun 19 15:44:56 GMT+08:00 2012NOTE: X->S down convert bast on F1B3 bastCount=22
Tue Jun 19 15:44:56 GMT+08:00 2012NOTE: membership refresh pending for group 1/0x5e1117e5 (DATA)
SUCCESS: refreshed membership for 1/0x5e1117e5 (DATA)
Tue Jun 19 15:44:59 GMT+08:00 2012WARNING: PST-initiated drop disk 1(1578178533).4(3277973292) (DATA_0004)
NOTE: PST update: grp = 1
Tue Jun 19 15:45:00 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
Tue Jun 19 15:45:00 GMT+08:00 2012NOTE: requesting all-instance membership refresh for group=1
Tue Jun 19 15:45:00 GMT+08:00 2012NOTE: membership refresh pending for group 1/0x5e1117e5 (DATA)
SUCCESS: refreshed membership for 1/0x5e1117e5 (DATA)
Tue Jun 19 15:45:02 GMT+08:00 2012SUCCESS: PST-initiated disk drop completed
Tue Jun 19 15:45:05 GMT+08:00 2012NOTE: starting rebalance of group 1/0x5e1117e5 (DATA) at power 1
Starting background process ARB0
ARB0 started with pid=20, OS id=10486162
Tue Jun 19 15:45:05 GMT+08:00 2012NOTE: assigning ARB0 to group 1/0x5e1117e5 (DATA)
Tue Jun 19 15:45:05 GMT+08:00 2012NOTE: F1X0 copy 2 relocating from 9:2 to 5:4294967294
NOTE: F1X0 copy 3 relocating from 65534:4294967294 to 65534:4294967294
Tue Jun 19 15:45:05 GMT+08:00 2012NOTE: X->S down convert bast on F1B3 bastCount=2
NOTE: X->S down convert bast on F1B3 bastCount=3
NOTE: X->S down convert bast on F1B3 bastCount=4
NOTE: X->S down convert bast on F1B3 bastCount=5
NOTE: X->S down convert bast on F1B3 bastCount=6
NOTE: X->S down convert bast on F1B3 bastCount=7
NOTE: X->S down convert bast on F1B3 bastCount=8
NOTE: X->S down convert bast on F1B3 bastCount=9
NOTE: X->S down convert bast on F1B3 bastCount=10
NOTE: X->S down convert bast on F1B3 bastCount=11
NOTE: X->S down convert bast on F1B3 bastCount=12
NOTE: X->S down convert bast on F1B3 bastCount=13
NOTE: X->S down convert bast on F1B3 bastCount=14
NOTE: X->S down convert bast on F1B3 bastCount=15
NOTE: X->S down convert bast on F1B3 bastCount=16
NOTE: X->S down convert bast on F1B3 bastCount=17
NOTE: X->S down convert bast on F1B3 bastCount=18
NOTE: X->S down convert bast on F1B3 bastCount=19
NOTE: X->S down convert bast on F1B3 bastCount=20
NOTE: X->S down convert bast on F1B3 bastCount=21
NOTE: cache closing disk 6 of grp 2: SDATA_0006
NOTE: cache closing disk 8 of grp 2: SDATA_0008
NOTE: cache closing disk 9 of grp 2: SDATA_0009
NOTE: X->S down convert bast on F1B3 bastCount=22
Tue Jun 19 15:45:07 GMT+08:00 2012NOTE: membership refresh pending for group 2/0xa15117e7 (SDATA)
SUCCESS: refreshed membership for 2/0xa15117e7 (SDATA)
Tue Jun 19 15:45:08 GMT+08:00 2012SUCCESS: PST-initiated disk drop completed
Tue Jun 19 15:50:41 GMT+08:00 2012SUCCESS: refreshed membership for 2/0xa15117e7 (SDATA)
Tue Jun 19 15:55:53 GMT+08:00 2012NOTE: stopping process ARB0
Tue Jun 19 15:55:56 GMT+08:00 2012WARNING: rebalance not completed for group 1/0x5e1117e5 (DATA)
Tue Jun 19 15:55:56 GMT+08:00 2012SUCCESS: rebalance completed for group 1/0x5e1117e5 (DATA) 
NOTE: PST update: grp = 1
Tue Jun 19 15:55:56 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
Tue Jun 19 15:55:56 GMT+08:00 2012SUCCESS: disk number 4 force dropped offline
WARNING: offline disk number 5 has references (20926 AUs)
SUCCESS: disk number 6 force dropped offline
WARNING: offline disk number 7 has references (20924 AUs)
SUCCESS: disk number 8 force dropped offline
SUCCESS: disk number 9 force dropped offline
NOTE: PST update: grp = 1
Tue Jun 19 15:55:56 GMT+08:00 2012NOTE: group DATA: relocated PST to: disk 0000 (PST copy 0)
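The alert.log excerpt above walks the full failure sequence: I/O errors trigger `initiating offline` for each disk, the PST is updated and relocated, the cache closes each disk, and after the offline-disk timer expires the disks are `force dropped offline`. A minimal sketch of tallying those three event types from such a log (a hypothetical helper for reading excerpts like this, not an Oracle tool) might look like:

```python
import re

def summarize_asm_log(text):
    """Tally disk offline / cache-close / force-drop events from an ASM alert.log excerpt."""
    summary = {"offline": [], "closed": [], "force_dropped": []}
    for line in text.splitlines():
        # "WARNING: process N initiating offline of disk D.INC (NAME) with mask 0x3 in group G"
        m = re.search(r"initiating offline of disk (\d+)\.\d+ \((\w+)\).* in group (\d+)", line)
        if m:
            summary["offline"].append((int(m.group(3)), int(m.group(1)), m.group(2)))
        # "NOTE: cache closing disk D of grp G: NAME"
        m = re.search(r"cache closing disk (\d+) of grp (\d+): (\w+)", line)
        if m:
            summary["closed"].append((int(m.group(2)), int(m.group(1)), m.group(3)))
        # "SUCCESS: disk number D force dropped offline"
        m = re.search(r"disk number (\d+) force dropped offline", line)
        if m:
            summary["force_dropped"].append(int(m.group(1)))
    return summary

log = """\
WARNING: process 7274698 initiating offline of disk 8.3277973294 (DATA_0008) with mask 0x3 in group 1
NOTE: cache closing disk 8 of grp 1: DATA_0008
SUCCESS: disk number 4 force dropped offline
"""
print(summarize_asm_log(log))
```

Running it over the full excerpt shows the same story the log tells line by line: every offlined disk is later closed, and the ones still unreachable when the timer expires are force dropped.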

RBAL trace:

*** 2012-06-19 15:44:56.076
kfgbDoorBellBast: BAST release invoked, gn=1
kfgbDoorBellBast: BAST released, gn=1
NOTE: PST update: grp = 1
kfgbDoorBellArm: value block read: 00000001
*** 2012-06-19 15:44:56.101
WARNING: IO Failed.  au:0 diskname:/dev/rhdiskpower16
	 rq:11087eed0 buffer:110881200 au_offset(bytes):0 iosz:4096 operation:0
	 status:2
WARNING: IO Failed.  au:0 diskname:/dev/rhdiskpower16
	 rq:11087eed0 buffer:110881200 au_offset(bytes):0 iosz:4096 operation:0
	 status:2
WARNING: IO Failed.  au:0 diskname:/dev/rhdiskpower20
	 rq:11087f1e0 buffer:110882400 au_offset(bytes):0 iosz:4096 operation:0
	 status:2
WARNING: IO Failed.  au:0 diskname:/dev/rhdiskpower20
	 rq:11087f1e0 buffer:110882400 au_offset(bytes):0 iosz:4096 operation:0
	 status:2
WARNING: IO Failed.  au:0 diskname:/dev/rhdiskpower6
	 rq:11087f800 buffer:110884800 au_offset(bytes):0 iosz:4096 operation:0
	 status:2
WARNING: IO Failed.  au:0 diskname:/dev/rhdiskpower6
	 rq:11087f800 buffer:110884800 au_offset(bytes):0 iosz:4096 operation:0
	 status:2
kfgbDoorBellArm: value block read: 00000003
kfgbDoorBellBast: BAST release invoked, gn=1
kfgbDoorBellBast: BAST released, gn=1
kfgbDoorBellArm: value block read: 00000003
kfgbRebalGrp: queued rebalance (power 1) for group 1/0x5e1117e5 (DATA)
kfgbDoorBellArm: value block read: 00000003
*** 2012-06-19 15:45:07.568
kfgbDoorBellBast: BAST release invoked, gn=2
kfgbDoorBellBast: BAST released, gn=2
kfgbDoorBellArm: value block read: 00000001
*** 2012-06-19 15:50:38.645
kfgbDoorBellBast: BAST release invoked, gn=2
kfgbDoorBellBast: BAST released, gn=2
kfgbDoorBellArm: value block read: 00000002
*** 2012-06-19 15:55:56.239
kfgbRelease: de-queued rebalance of group 1/0x5e1117e5 (DATA)
kfgbDoorBellArm: value block read: 00000003
kfgbExpellNow: checking for empty ASM disks, gn=1
NOTE: PST update: grp = 1
NOTE: PST update: grp = 1
kfgbDoorBellArm: value block modify: 00000004
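In the RBAL trace, `kfgbRebalGrp: queued rebalance` and `kfgbRelease: de-queued rebalance` bracket the rebalance, and each `***` line carries the wall-clock timestamp for the entries that follow. Pairing the two markers with the nearest preceding timestamp gives a rough rebalance duration; a sketch (hypothetical helper, assuming the trace layout shown above):

```python
from datetime import datetime

def rebalance_window(trace_text):
    """Estimate rebalance duration by pairing the queued / de-queued markers
    in an RBAL trace with the most recent '***' timestamp line."""
    ts = queued = dequeued = None
    for line in trace_text.splitlines():
        if line.startswith("***"):
            ts = datetime.strptime(line[4:].strip(), "%Y-%m-%d %H:%M:%S.%f")
        elif "queued rebalance" in line and "de-queued" not in line:
            queued = ts
        elif "de-queued rebalance" in line:
            dequeued = ts
    return (dequeued - queued) if queued and dequeued else None

trace = """\
*** 2012-06-19 15:44:56.101
kfgbRebalGrp: queued rebalance (power 1) for group 1/0x5e1117e5 (DATA)
*** 2012-06-19 15:55:56.239
kfgbRelease: de-queued rebalance of group 1/0x5e1117e5 (DATA)
"""
print(rebalance_window(trace))  # 0:11:00.138000
```

For the trace above this gives roughly eleven minutes between queueing and de-queueing, which matches the `starting rebalance` / `rebalance completed` pair in the alert.log.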


RDBMS ALERT.LOG

Tue Jun 19 15:09:22 GMT+08:00 2012Thread 2 advanced to log sequence 37 (thread recovery)
Tue Jun 19 15:15:15 GMT+08:00 2012WARNING: failed to write mirror side 0 of virtual extent 0 of file 280 in group 1 
WARNING: process 6029392 initiating offline of disk 9.3279471124 (DATA_0009) with mask 0x3 in group 1
Tue Jun 19 15:15:15 GMT+08:00 2012Errors in file /u01/app/oracle/admin/culprodb/bdump/culprodb1_smon_4915536.trc:
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 7
Additional information: 7871200
Additional information: -1
Tue Jun 19 15:15:15 GMT+08:00 2012WARNING: failed to read mirror side 1 of virtual extent 32 
logical extent 0 of file 284 in group 1 from disk 7 allocation unit 4294967295; if possible, will try another mirror side 
Tue Jun 19 15:15:15 GMT+08:00 2012WARNING: failed to write mirror side 0 of virtual extent 2 of file 277 in group 1 
Tue Jun 19 15:15:15 GMT+08:00 2012WARNING: process 4849996 initiating offline of disk 7.3279471122 (DATA_0007) with mask 0x3 in group 1
WARNING: failed to write mirror side 0 of virtual extent 2 of file 277 in group 1 
WARNING: process 4849996 initiating offline of disk 7.3279471122 (DATA_0007) with mask 0x3 in group 1
Tue Jun 19 15:28:39 GMT+08:00 2012SUCCESS: disk DATA_0008 (8.3279471123) dropped from diskgroup DATA
SUCCESS: disk DATA_0009 (9.3279471124) dropped from diskgroup DATA
Tue Jun 19 15:35:10 GMT+08:00 2012Errors in file /u01/app/oracle/admin/culprodb/bdump/culprodb1_asmb_5046620.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Tue Jun 19 15:35:10 GMT+08:00 2012ASMB: terminating instance due to error 15064
Tue Jun 19 15:35:10 GMT+08:00 2012System state dump is made for local instance
System State dumped to trace file /u01/app/oracle/admin/culprodb/bdump/culprodb1_diag_4391248.trc
Tue Jun 19 15:35:11 GMT+08:00 2012Shutting down instance (abort)
License high water mark = 12
Tue Jun 19 15:35:11 GMT+08:00 2012Trace dumping is performing id=[cdmp_20120619153510]
Tue Jun 19 15:35:15 GMT+08:00 2012Instance terminated by ASMB, pid = 5046620
Tue Jun 19 15:35:16 GMT+08:00 2012Instance terminated by USER, pid = 7274646
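The RDBMS alert.log stacks its errors in cause-to-effect order: OS-level I/O failures (ORA-27091/ORA-27072) lead to the lost ASM connection (ORA-15064/ORA-03113), and ASMB then aborts the instance. A quick way to pull that chain out of a long excerpt is to collect the distinct ORA- codes in first-seen order (a hypothetical helper for triage, not an Oracle utility):

```python
import re

def ora_errors(log_text):
    """Collect distinct ORA- error codes, in first-seen order, from an alert.log excerpt."""
    seen = []
    for code in re.findall(r"ORA-(\d{5})", log_text):
        if code not in seen:
            seen.append(code)
    return seen

log = """\
ORA-27091: unable to queue I/O
ORA-27072: File I/O error
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
ASMB: terminating instance due to error 15064
"""
print(ora_errors(log))  # ['27091', '27072', '15064', '03113']
```

Reading the resulting list left to right reproduces the causal chain, ending at the error ASMB cites when terminating the instance.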
