ASM file metadata operation等待事件

什么是” ASM file metadata operation “等待事件?

这是一个内部的undocumented等待事件:

“ASM file metadata operation” event is associated with database (instance) communication with ASM (instance). If the database has files in ASM disk group(s) it needs to access disk group(s), get extent maps for files that it already has, get updated extent info (e.g. after a rebalance), create new files in ASM etc.

High waits for ‘ksv master wait’ while doing an ASM file metadata operation were reported when a data migration utility was running. This wait was also seen for a drop of a tablespace.

Many waits for “ASM metadata file operation” may be seen under kfnConnect
when sessions query ASM group statistics in a DB instance.

Rediscovery Notes:
Waits occur under repeated kfnConnect calls when querying certain
views based on fixed (X$) views, such as V$ASM_DISKGROUP_STAT.

At this time on the ASM only a few process(one or two) will be on non-idle
wait events such as “ASM file metadata operation” .

WAITING FOR ‘ASM FILE METADATA OPERATION’ DURING RMAN BACKUP & ASM HANGS

【日常管理】Asm Diskgroup增加磁盘add disk


show parameter power

 alter system set asm_power_limit=0; 



alter diskgroup MACLEAN add FAILGROUP FailgroupA disk '/asmdisks/asmdiskA28';
alter diskgroup MACLEAN add FAILGROUP FailgroupA disk '/asmdisks/asmdiskA27';
alter diskgroup MACLEAN add FAILGROUP FailgroupA disk '/asmdisks/asmdiskA26';
alter diskgroup MACLEAN add FAILGROUP FailgroupA disk '/asmdisks/asmdiskA25';
alter diskgroup MACLEAN add FAILGROUP FailgroupA disk '/asmdisks/asmdiskA24';
alter diskgroup MACLEAN add FAILGROUP FailgroupA disk '/asmdisks/asmdiskA23';
alter diskgroup MACLEAN add FAILGROUP FailgroupA disk '/asmdisks/asmdiskA22';
alter diskgroup MACLEAN add FAILGROUP FailgroupB disk '/asmdisks/asmdiskB50';
alter diskgroup MACLEAN add FAILGROUP FailgroupB disk '/asmdisks/asmdiskB49';
alter diskgroup MACLEAN add FAILGROUP FailgroupB disk '/asmdisks/asmdiskB48';
alter diskgroup MACLEAN add FAILGROUP FailgroupB disk '/asmdisks/asmdiskB45';
alter diskgroup MACLEAN add FAILGROUP FailgroupB disk '/asmdisks/asmdiskB44';
alter diskgroup MACLEAN add FAILGROUP FailgroupB disk '/asmdisks/asmdiskB43';
alter diskgroup MACLEAN add FAILGROUP FailgroupB disk '/asmdisks/asmdiskB42';





alter diskgroup MACLEAN rebalance power 11;
select * from v$asm_operation;

CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

客户的一套11.2.0.1 RAC系统采用ASM diskgroup 存放ocr和votedisk,该REG diskgroup中的某个LUN disk由于硬件的原因损坏了,导致冗余的votedisk表决磁盘有一个处于OFFLINE状态,客户希望能删除该OFFLINE的votedisk并新增一个可用的。

在删除该votedisk文件时出现了CRS-4258的错误,错误如下:

crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. OFFLINE  5b3380d6367e4f94bf19e9db5f2f684e ()  []
 2. ONLINE   6802e6d139354fb3bf95725dd01a02fd (/dev/ocr2) [REG]
 3. ONLINE   a433d51ebd2d4facbfc8e95b017f5393 (/dev/asm-disk1) [REG]
 4. ONLINE   3784d344bffa4f6ebff21c4dd3c873bd (/dev/asm-disk2) [REG]
Located 4 voting disk(s).

crsctl delete css votedisk 5b3380d6367e4f94bf19e9db5f2f684e
CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

居然无法移除ASM存储上的voting files,太搞笑了。

客户在MOS上找到了CRS-4258相关问题的Note:

CRS-4258: Addition and Deletion of Voting Files are not Allowed Because the Voting Files are on ASM in 11gR2 [ID 1060146.1]
Applies to:
Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.1 - Release: 11.2 to 11.2
Information in this document applies to any platform.
Symptoms

CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM in 11gR2.


Changes
Stale voting files are seen after accidently dropping one of ASM disks belonging to the ASM diskgroup where voting files are stored.
And CRS-4258 occurs when trying to delete the stale voting files using crsctl delete css votedisk FUID.

[root@grid]# crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 5b91aad0a2184f3dbfa8f970e8ae4d49 (/dev/oracleasm/disks/ASM10) [PLAY]
2. ONLINE 53b1b40b73164f9ebf3f498f6d460187 (/dev/oracleasm/disks/ASM9) [PLAY]
3. OFFLINE 82dfd04b96f14f6dbf36f5a62b118f61 () []

[root@grid]# crsctl delete css votedisk 82dfd04b96f14f6dbf36f5a62b118f61
CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM
Cause

1. Seeing stale voting files is due to bug 9024611.

2. "delete" command is not available , only "replace" command is available when voting files are stored on  ASM diskgroup.    

    Please see Oracle Clusterware Administration and Deployment Guide11g Release 2 (11.2)

Solution


1. This issue is permanently fixed in 11.2.0.2.0.

2. Apply patch 9024611. Please contact Oracle support if this patch is not available on your platform.

3. If CSS has stale voting files even after applying patch 9024611, do the following workaround -

WORKAROUND:
Do something to trigger ASM to try to relocate the voting file.

e.g)  $ crsctl replace votedisk  +asm_disk_group   --- Put available ASM diskgroup

        $ crsctl query css votedisk         --- Check if voting files are all online on the new ASM diskgroup
        $ crsctl replace votedisk +PLAY    -- Put the original ASM diskgroup where voting files were 

4. If the workaround above cannot be followed for any reason then you can request the patch for unpublished bug 9409327 for your platform.

References
BUG:9294664 - NOT ABLE TO REMOVE THE VOTEDISK WHICH IS OFFILNE

Hdr: 9294664 11.2.0.1 PCW 11.2.0.1 ADMUTL PRODID-5 PORTID-226 9024611
Abstract: NOT ABLE TO REMOVE THE VOTEDISK WHICH IS OFFILNE

PROBLEM:
--------
crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   9f7f4f7f798d4f69bfe31653894421a2 (ORCL:GRID1) [GRID]
2. OFFLINE  a9b785a59c3c4f67bf15babc67ffb79a () []
3. OFFLINE  29988f37fa794f12bfea3f3672c99609 () []
4. ONLINE   a8b3a040195c4f54bfce8ef21bd4fa07 (ORCL:GRID3) [GRID]
5. ONLINE   a1e4fbd9df6f4f67bf8fc12fe9780721 (ORCL:GRID2) [GRID]
Located 5 voting disk(s).


[root@sdc-drrac01 grid]# crsctl delete css votedisk 
a9b785a59c3c4f67bf15babc67ffb79a
CRS-4258: Addition and deletion of voting files are not allowed because the 
voting files are on ASM

DIAGNOSTIC ANALYSIS:
--------------------
Ct is performing some voting disk failover scenarios in which he has removed 
the 2 votedisk which were on ASM buy drop disk using asmlib and after that 
recreating the disk again and start the cluster in exclusive mode and start 
the ASM and mount the diskgourp So that rebalancing has been done but after 
that

 crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   9f7f4f7f798d4f69bfe31653894421a2 (ORCL:GRID1) [GRID]
2. OFFLINE  a9b785a59c3c4f67bf15babc67ffb79a () []
3. OFFLINE  29988f37fa794f12bfea3f3672c99609 () []
4. ONLINE   a8b3a040195c4f54bfce8ef21bd4fa07 (ORCL:GRID3) [GRID]
5. ONLINE   a1e4fbd9df6f4f67bf8fc12fe9780721 (ORCL:GRID2) [GRID]
Located 5 voting disk(s).

and not able to drop the vote disk which is offiline 

WORKAROUND:
-----------
n/a

RELATED BUGS:
-------------
as per bug 9024611 tried the workaround:

but while running 

crsctl css votedisk delete 

we got syntax error and found that there is no command with crsctl css ...

这个Note说明不能移除ASM存储内voting files的问题在11.2.0.2.0上已经解决了,也可以通过安装one-off patch 9024611来修复。

但是实际在11.2.0.2上测试可以发现仍旧无法删除ASM上的voting files:

root@rh2 ~]# crsctl query crs  releaseversion
Oracle High Availability Services release version on the local node is [11.2.0.2.0]

[root@rh2 ~]# crsctl query crs  activeversion
Oracle Clusterware active version on the cluster is [11.2.0.2.0]


[grid@rh2 ~]$ /s01/grid/OPatch/opatch lsinventory
Invoking OPatch 11.2.0.1.1

Oracle Interim Patch Installer version 11.2.0.1.1
Copyright (c) 2009, Oracle Corporation.  All rights reserved.


Oracle Home       : /s01/grid
Central Inventory : /s01/app/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 11.2.0.1.1
OUI version       : 11.2.0.2.0
OUI location      : /s01/grid/oui
Log file location : /s01/grid/cfgtoollogs/opatch/opatch2011-08-04_18-50-34PM.log

Patch history file: /s01/grid/cfgtoollogs/opatch/opatch_history.txt

Lsinventory Output file location : /s01/grid/cfgtoollogs/opatch/lsinv/lsinventory2011-08-04_18-50-34PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (1): 

Oracle Grid Infrastructure                                           11.2.0.2.0
There are 1 products installed in this Oracle Home.


There are no Interim patches installed in this Oracle Home.


Rac system comprising of multiple nodes
  Local node = rh2
  Remote node = rh3

--------------------------------------------------------------------------------

OPatch succeeded.


[root@rh2 ~]# crsctl delete css votedisk a433d51ebd2d4facbfc8e95b017f5393

CRS-4258: Addition and deletion of voting files are not allowed because the voting files are on ASM

又是一个伪修复的Bug….!!

无法,寄希望与replace能解决问题,结果发现:

crsctl replace votedisk +DATA
Failed to create voting files on disk group DATA.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.

方法四中指出的unpublished bug 9409327(Patch 9409327: OFFLINE VF ENTRY REMAINS AFTER PATCH FOR BUG 9024611),目前仅在IBM AIX on POWER Systems (64-bit)的11.2.0.1上有对应的补丁。

利用UDEV服务解决RAC ASM存储设备名

<Why ASMLIB and why not?>我们介绍了使用ASMLIB作为一种专门为Oracle Automatic Storage Management特性设计的内核支持库(kernel support library)的优缺点,同时建议使用成熟的UDEV方案来替代ASMLIB。

这里我们就给出配置UDEV的具体步骤,还是比较简单的:

1.确认在所有RAC节点上已经安装了必要的UDEV包

[root@rh2 ~]# rpm -qa|grep udev
udev-095-14.21.el5

2.通过scsi_id获取设备的块设备的唯一标识名,假设系统上已有LUN sdc-sdp

for i in c d e f g h i j k l m n o p ;
do
echo "sd$i" "`scsi_id -g -u -s /block/sd$i` ";
done

sdc 1IET_00010001
sdd 1IET_00010002
sde 1IET_00010003
sdf 1IET_00010004
sdg 1IET_00010005
sdh 1IET_00010006
sdi 1IET_00010007
sdj 1IET_00010008
sdk 1IET_00010009
sdl 1IET_0001000a
sdm 1IET_0001000b
sdn 1IET_0001000c
sdo 1IET_0001000d
sdp 1IET_0001000e 

以上列出于块设备名对应的唯一标识名

3.创建必要的UDEV配置文件,

首先切换到配置文件目录

[root@rh2 ~]# cd /etc/udev/rules.d

定义必要的规则配置文件

[root@rh2 rules.d]# touch 99-oracle-asmdevices.rules 

[root@rh2 rules.d]# cat 99-oracle-asmdevices.rules
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010001", NAME="ocr1", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010002", NAME="ocr2", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010003", NAME="asm-disk1",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010004", NAME="asm-disk2",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010005", NAME="asm-disk3",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010006", NAME="asm-disk4",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010007", NAME="asm-disk5",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010008", NAME="asm-disk6",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010009", NAME="asm-disk7",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000a", NAME="asm-disk8",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000b", NAME="asm-disk9",  OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000c", NAME="asm-disk10", OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000d", NAME="asm-disk11", OWNER="grid",  GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000e", NAME="asm-disk12", OWNER="grid",  GROUP="asmadmin", MODE="0660"

Result 为/sbin/scsi_id -g -u -s %p的输出--Match the returned string of the last PROGRAM call. This key may be
used in any following rule after a PROGRAM call.
按顺序填入刚才获取的唯一标识名即可

OWNER为安装Grid Infrastructure的用户,在11gr2中一般为grid,GROUP为asmadmin
MODE采用0660即可

NAME为UDEV映射后的设备名,
建议为OCR和VOTE DISK创建独立的DISKGROUP,为了容易区分将该DISKGROUP专用的设备命名为ocr1..ocrn的形式
其余磁盘可以根据其实际用途或磁盘组名来命名

4.将该规则文件拷贝到其他节点上
[root@rh2 rules.d]# scp 99-oracle-asmdevices.rules Other_node:/etc/udev/rules.d

5.在所有节点上启动udev服务,或者重启服务器即可

[root@rh2 rules.d]# /sbin/udevcontrol reload_rules
[root@rh2 rules.d]# /sbin/start_udev
Starting udev:                                            [  OK  ]

6.检查设备是否到位

[root@rh2 rules.d]# cd /dev
[root@rh2 dev]# ls -l ocr*
brw-rw---- 1 grid asmadmin 8, 32 Jul 10 17:31 ocr1
brw-rw---- 1 grid asmadmin 8, 48 Jul 10 17:31 ocr2

[root@rh2 dev]# ls -l asm-disk*
brw-rw---- 1 grid asmadmin 8,  64 Jul 10 17:31 asm-disk1
brw-rw---- 1 grid asmadmin 8, 208 Jul 10 17:31 asm-disk10
brw-rw---- 1 grid asmadmin 8, 224 Jul 10 17:31 asm-disk11
brw-rw---- 1 grid asmadmin 8, 240 Jul 10 17:31 asm-disk12
brw-rw---- 1 grid asmadmin 8,  80 Jul 10 17:31 asm-disk2
brw-rw---- 1 grid asmadmin 8,  96 Jul 10 17:31 asm-disk3
brw-rw---- 1 grid asmadmin 8, 112 Jul 10 17:31 asm-disk4
brw-rw---- 1 grid asmadmin 8, 128 Jul 10 17:31 asm-disk5
brw-rw---- 1 grid asmadmin 8, 144 Jul 10 17:31 asm-disk6
brw-rw---- 1 grid asmadmin 8, 160 Jul 10 17:31 asm-disk7
brw-rw---- 1 grid asmadmin 8, 176 Jul 10 17:31 asm-disk8
brw-rw---- 1 grid asmadmin 8, 192 Jul 10 17:31 asm-disk9

11.2.0.2 asmcmd lsdg show incorrect diskgroup number

今天在给ASM扩磁盘组的时候发现11.2.0.2上asmcmd中lsdg命令所显示的磁盘组数不正确,现象如下:

Node A lsdg正常:
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576    142976    50596                0           50596              0             Y  DATA/
MOUNTED  EXTERN  N         512   4096  1048576     51200    50742                0           50742              0             N  FRA/

Node B lsdg仅显示DATA磁盘组
ASMCMD> lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576    142976    50596                0           50596              0             Y  DATA/

从Node B节点上ASM实例的动态视图查询显示正常:

SQL> select name from v$asm_diskgroup;

NAME
------------------------------
DATA
FRA


ASM diag info:
SQL> select sysdate "Date and Time" from dual;

Date and Time
--------------------
28-JUN-2011 23:30:27

SQL> 
SQL> select * from v$asm_diskgroup order by 1;
select * from v$asm_disk order by 1, 2, 3;
select * from gv$asm_operation order by 1;
select * from v$version where banner like '%Database%' order by 1;
select * from gv$asm_client order by 1;

prompt

prompt ASM Disk Groups
prompt ===============

select group_number  "Group"
,      name          "Group Name"
,      state         "State"
,      type          "Type"
,      total_mb/1024 "Total GB"
,      free_mb/1024  "Free GB"
from   v$asm_diskgroup
/

prompt
prompt ASM Disks
prompt ==============

col "Group"          form 999
col "Disk"           form 999
col "Header"         form a9
col "Mode"           form a8

GROUP_NUMBER NAME                           SECTOR_SIZE BLOCK_SIZE ALLOCATION_UNIT_SIZE STATE           TYPE      TOTAL_MB    FREE_MB HOT_USED_MB COLD_USED_MB REQUIRED_MIRROR_FREE_MB USABLE_FILE_MB OFFLINE_DISKS COMPATIBILITY                                                DATABASE_COMPATIBILITY                                       V
------------ ------------------------------ ----------- ---------- -------------------- --------------- ------- ---------- ---------- ----------- ------------ ----------------------- -------------- ------------- ------------------------------------------------------------ ------------------------------------------------------------ -
           0 FRA                                      0       4096                    0 DISMOUNTED                       0          0           0            0              0               0             0 0.0.0.0.0                                                    0.0.0.0.0                                                    N
           1 DATA                                   512       4096              1048576 MOUNTED         EXTERN      142976      50596           0        92380              0           50596             0 11.2.0.0.0                                                   10.1.0.0.0                                                   Y

SQL> col "Redundancy"     form a10
col "Failure Group"  form a10
col "Path"           form a19

 select group_number  "Group"
,      disk_number   "Disk"
,      header_status "Header"

GROUP_NUMBER DISK_NUMBER COMPOUND_INDEX INCARNATION MOUNT_S HEADER_STATU MODE_ST STATE           REDUNDA LIBRARY                                                              OS_MB   TOTAL_MB    FREE_MB HOT_USED_MB COLD_USED_MB NAME                    FAILGROUP                      LABEL
------------ ----------- -------------- ----------- ------- ------------ ------- --------------- ------- ---------------------------------------------------------------- ---------- ---------- ---------- ----------- ------------ ------------------------------ ------------------------------ -------------------------------
PATH                                                                                                                                                                    UDID                                                              PRODUCT                          CREATE_DATE          MOUNT_DATE           REPAIR_TIMER      READS    WRITES  READ_ERRS WRITE_ERRS  READ_TIME WRITE_TIME BYTES_READ BYTES_WRITTEN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------- -------------------------------- -------------------- -------------------- ------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- -------------
P HASH_VALUE  HOT_READS HOT_WRITES HOT_BYTES_READ HOT_BYTES_WRITTEN COLD_READS COLD_WRITES COLD_BYTES_READ COLD_BYTES_WRITTEN V SECTOR_SIZE FAILGRO
- ---------- ---------- ---------- -------------- ----------------- ---------- ----------- --------------- ------------------ - ----------- -------
           0           0              0  3915932260 CLOSED  MEMBER       ONLINE  NORMAL          UNKNOWN System                                                               51200           0          0           0            0
/dev/raw/raw8                                                                                                                                                             28-JUN-2011 17:08:16 28-JUN-2011 17:08:25             0
           0                                                                                                                  N         512 REGULAR

           1           0       16777216  3915932274 CACHED  MEMBER       ONLINE  NORMAL          UNKNOWN System                                                        8096        8096       2827           0         5269 DATA_0000                      DATA_0000
/dev/raw/raw1                                                                                                                                                             29-MAR-2011 22:39:53 31-MAY-2011 06:42:41             0         26          1          0          0    .948212    .001678    1118208          4096
  1452020686          0          0              0                 0          0           0               0                  0 Y         512 REGULAR

           1           1       16777217  3915932273 CACHED  MEMBER       ONLINE  NORMAL          UNKNOWN System                                                        8096        8096       2861           0         5235 DATA_0001                      DATA_0001
/dev/raw/raw2                                                                                                                                                             29-MAR-2011 22:39:53 31-MAY-2011 06:42:41             0         15          0          0          0    .868808          0    1105920             0
   559735068          0          0              0                 0          0           0               0                  0 N         512 REGULAR

           1           2       16777218  3915932272 CACHED  MEMBER       ONLINE  NORMAL          UNKNOWN System                                                        8096        8096       2859           0         5237 DATA_0002                      DATA_0002
/dev/raw/raw3                                                                                                                                                             29-MAR-2011 23:36:28 31-MAY-2011 06:42:41             0         14          0          0          0    .738089          0      57344             0
  2179807744          0          0              0                 0          0           0               0                  0 N         512 REGULAR

           1           4       16777220  3915932270 CACHED  MEMBER       ONLINE  NORMAL          UNKNOWN System                                                        8192        8192       2896           0         5296 DATA_0004                      DATA_0004
/dev/raw/raw5                                                                                                                                                             21-APR-2011 00:57:45 31-MAY-2011 06:42:41             0         14          0          0          0    .738083          0      57344             0
  3104258115          0          0              0                 0          0           0               0                  0 N         512 REGULAR

           1           5       16777221  3915932271 CACHED  MEMBER       ONLINE  NORMAL          UNKNOWN System                                                        8096        8096       2856           0         5240 DATA_0005                      DATA_0005
/dev/raw/raw4                                                                                                                                                             30-MAY-2011 20:36:59 31-MAY-2011 06:42:41             0         15          0          0          0    .911934          0    1105920             0
  2075303794          0          0              0                 0          0           0               0                  0 N         512 REGULAR

           1           7       16777223  3915932269 CACHED  MEMBER       ONLINE  NORMAL          UNKNOWN System                                                               51200       51200      18149           0        33051 DATA_0007                      DATA_0007
/dev/raw/raw6                                                                                                                                                             28-JUN-2011 17:09:22 28-JUN-2011 17:09:22             0         39          5          0          0    .998594   6.022413     159744         20480
,      mode_status   "Mode"
  2166785847          0          0              0                 0         18           0           73728                  0 N         512 REGULAR

           1           8       16777224  3915932268 CACHED  MEMBER       ONLINE  NORMAL          UNKNOWN System                                                               51200       51200      18148           0        33052 DATA_0008                      DATA_0008
/dev/raw/raw7                                                                                                                                                             28-JUN-2011 17:09:58 28-JUN-2011 17:09:58             0         25          0          0          0    .760427          0     102400             0
  1385034587          0          0              0                 0          6           0           24576                  0 N         512 REGULAR


8 rows selected.

SQL> ,      state         "State"
,      redundancy    "Redundancy"
,      total_mb      "Total MB"
,      free_mb       "Free MB"
,      name          "Disk Name"
,      failgroup     "Failure Group"
,      path          "Path"
from   v$asm_disk
order by group_number
,        disk_number

/

prompt
prompt Instances currently accessing these diskgroups
prompt ==============================================

no rows selected

SQL> 
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

SQL> 
select c.group_number  "Group"
,      g.name          "Group Name"
,      c.instance_name "Instance"

   INST_ID GROUP_NUMBER INSTANCE_NAME                                                    DB_NAME  STATUS       SOFTWARE_VERSION                                            COMPATIBLE_VERSION
---------- ------------ ---------------------------------------------------------------- -------- ------------ ------------------------------------------------------------ ------------------------------------------------------------
         1            1 +ASM1                                                            +ASM     CONNECTED    11.2.0.2.0                                                  11.2.0.2.0
         1            2 PROD1                                                            PROD     CONNECTED    11.2.0.2.0                                                  11.2.0.0.0
         1            1 PROD1                                                            PROD     CONNECTED    11.2.0.2.0                                                  11.2.0.0.0
         2            1 +ASM2                                                            +ASM     CONNECTED    11.2.0.2.0                                                  11.2.0.2.0

SQL> SQL> 
SQL> SQL> ASM Disk Groups
SQL> ===============
SQL> SQL>   2    3    4    5    6    7    8  from   v$asm_client c
,      v$asm_diskgroup g
where  g.group_number=c.group_number
/


     Group Group Name                State           Type      Total GB Free GB
---------- ------------------------- --------------- ------- ---------- -------
         1 DATA                      MOUNTED         EXTERN     139.625      49
         0 FRA                       DISMOUNTED                       0       0

SQL> SQL> 
SQL> ASM Disks
SQL> ==============
SQL> SQL> SQL> SQL> SQL> SQL> SQL> SQL> SQL> SQL>   2    3  prompt
  4    5    6    7    8    9   10   11   12   13   14   15  SQL> prompt Report the Percentage of Imbalance in all Mounted Diskgroups
prompt ==============================================
select dfail, count(dfail) from
(
select disk, count(failgroup) as dfail
from x$kfdpartner, v$asm_disk where
number_kfdpartner=disk_number and grp=group_number

Group Disk Header    Mode     State           Redundancy   Total MB    Free MB Disk Name                      Failure Gr Path
----- ---- --------- -------- --------------- ---------- ---------- ---------- ------------------------------ ---------- -------------------
    0    0 MEMBER    ONLINE   NORMAL          UNKNOWN             0          0                                           /dev/raw/raw8
    1    0 MEMBER    ONLINE   NORMAL          UNKNOWN          8096       2827 DATA_0000                      DATA_0000  /dev/raw/raw1
    1    1 MEMBER    ONLINE   NORMAL          UNKNOWN          8096       2861 DATA_0001                      DATA_0001  /dev/raw/raw2
    1    2 MEMBER    ONLINE   NORMAL          UNKNOWN          8096       2859 DATA_0002                      DATA_0002  /dev/raw/raw3
    1    4 MEMBER    ONLINE   NORMAL          UNKNOWN          8192       2896 DATA_0004                      DATA_0004  /dev/raw/raw5
    1    5 MEMBER    ONLINE   NORMAL          UNKNOWN          8096       2856 DATA_0005                      DATA_0005  /dev/raw/raw4
    1    7 MEMBER    ONLINE   NORMAL          UNKNOWN         51200      18149 DATA_0007                      DATA_0007  /dev/raw/raw6
    1    8 MEMBER    ONLINE   NORMAL          UNKNOWN         51200      18148 DATA_0008                      DATA_0008  /dev/raw/raw7

8 rows selected.

SQL> SQL> 
SQL> Instances currently accessing these diskgroups
SQL> ==============================================
SQL> SQL>   2    3    4    5    6    7  group by disk, failgroup
)
group by dfail; 

select g.name as "GROUP", d.name as "DISK", d.failgroup, fcnt, pcnt,
decode(pcnt - fcnt, 0, 'MUST', 'SHOULD') as action from

Group Group Name                Instance
----- ------------------------- ----------------------------------------------------------------
    1 DATA                      +ASM2

SQL> SQL> 
SQL> Report the Percentage of Imbalance in all Mounted Diskgroups
SQL> ==============================================
SQL>   2    3    4    5    6    7    8  (select gnum, DISK1, failgroup, count(failgroup) as fcnt from
(select gnum, DISK1
from
(
select d.group_number as gnum, disk as disk1,
count(distinct failgroup) as dfail
from x$kfdpartner, v$asm_disk_stat d where
number_kfdpartner=disk_number and grp=d.group_number

no rows selected

SQL> SQL>   2    3    4    5    6    7    8    9   10   11  and active_kfdpartner=1
 12  group by d.group_number, disk
 13  ), v$asm_disk_stat
 14  where dfail < 3
 15  and disk1=disk_number
 16  and gnum=group_number),
 17  x$kfdpartner, v$asm_disk_stat d where
 18  number_kfdpartner=disk_number and grp=d.group_number and grp=gnum
 19  and disk1=disk
 20  and active_kfdpartner=1
 21  group by gnum, disk1, failgroup),
 22  (select grp, disk, count(disk) as pcnt from x$kfdpartner where
 23  active_kfdpartner=1 group by grp, disk),
 24  v$asm_diskgroup_stat g, v$asm_disk_stat d
 25  where gnum=grp and gnum=g.group_number and gnum=d.group_number and
 26  disk=disk1 and disk=disk_number and
 27  ((fcnt = 1 and (pcnt - fcnt) > 3) or ((pcnt - fcnt) = 0))
 28  /

no rows selected

SQL> 
SQL> col TYPE form a15
SQL> col FILE_NUMBER form 9999 head FILE_NUM
SQL> col GROUP_NUMBER form 9999 head GR_NUM
SQL> col GB for 9999.99
SQL> 
SQL> select GROUP_NUMBER   ,
  2   FILE_NUMBER          ,
  3   COMPOUND_INDEX       ,
  4   INCARNATION          ,
  5   BLOCK_SIZE           ,
  6   BLOCKS               ,
  7   BYTES/1024/1024/1024 GB ,
  8   TYPE                 ,
  9   STRIPED              ,
 10   CREATION_DATE        ,
 11   MODIFICATION_DATE
 12  from v$asm_file
 13  where TYPE != 'ARCHIVELOG'
 14  /


GR_NUM FILE_NUM COMPOUND_INDEX INCARNATION BLOCK_SIZE     BLOCKS       GB TYPE            STRIPE CREATION_DATE        MODIFICATION_DATE
------ -------- -------------- ----------- ---------- ---------- -------- --------------- ------ -------------------- --------------------
     1      253       16777469   747096005        512          3      .00 ASMPARAMETERFIL COARSE 29-MAR-2011 22:40:05 29-MAR-2011 22:00:00
                                                                          E

     1      255       16777471   747096007       4096      66591      .25 OCRFILE         COARSE 29-MAR-2011 22:40:07 28-JUN-2011 19:00:00
     1      256       16777472   747100091       8192     116481      .89 DATAFILE        COARSE 29-MAR-2011 23:48:10 28-JUN-2011 18:00:00
     1      257       16777473   747100091       8192     122881      .94 DATAFILE        COARSE 29-MAR-2011 23:48:11 28-JUN-2011 18:00:00
     1      258       16777474   747100093       8192      78721      .60 DATAFILE        COARSE 29-MAR-2011 23:48:12 28-JUN-2011 18:00:00
     1      259       16777475   747100093       8192     883201     6.74 DATAFILE        COARSE 29-MAR-2011 23:48:12 28-JUN-2011 18:00:00
     1      260       16777476   747100215      16384       1545      .02 CONTROLFILE     FINE   29-MAR-2011 23:50:14 28-JUN-2011 19:00:00
     1      261       16777477   747100215      16384       1545      .02 CONTROLFILE     FINE   29-MAR-2011 23:50:14 28-JUN-2011 19:00:00
     1      262       16777478   747100217        512     102401      .05 ONLINELOG       COARSE 29-MAR-2011 23:50:16 28-JUN-2011 22:00:00
     1      263       16777479   747100219        512     102401      .05 ONLINELOG       COARSE 29-MAR-2011 23:50:18 28-JUN-2011 22:00:00
     1      264       16777480   747100219        512     102401      .05 ONLINELOG       COARSE 29-MAR-2011 23:50:19 28-JUN-2011 20:00:00
     1      265       16777481   747100223        512     102401      .05 ONLINELOG       COARSE 29-MAR-2011 23:50:22 28-JUN-2011 20:00:00
     1      266       16777482   747100227       8192    1889537    14.42 TEMPFILE        COARSE 29-MAR-2011 23:50:27 28-JUN-2011 23:00:00
     1      267       16777483   747100231       8192      44241      .34 DATAFILE        COARSE 29-MAR-2011 23:50:31 28-JUN-2011 19:00:00
     1      268       16777484   747100469       8192     144001     1.10 DATAFILE        COARSE 29-MAR-2011 23:54:29 28-JUN-2011 18:00:00
prompt
     1      269       16777485   747100547        512     102401      .05 ONLINELOG       COARSE 29-MAR-2011 23:55:47 28-JUN-2011 19:00:00
     1      270       16777486   747100549        512     102401      .05 ONLINELOG       COARSE 29-MAR-2011 23:55:48 28-JUN-2011 19:00:00
     1      271       16777487   747100549        512     102401      .05 ONLINELOG       COARSE 29-MAR-2011 23:55:49 28-JUN-2011 19:00:00
     1      272       16777488   747100553        512     102401      .05 ONLINELOG       COARSE 29-MAR-2011 23:55:52 28-JUN-2011 19:00:00
     1      273       16777489   747100553        512          9      .00 PARAMETERFILE   COARSE 29-MAR-2011 23:55:53 12-MAY-2011 17:00:00
prompt free ASM disks and their paths
     1      274       16777490   754587291       8192       2561      .02 DATAFILE        COARSE 23-JUN-2011 15:34:50 28-JUN-2011 18:00:00
     1      419       16777635   754281899       4096         91      .00 DUMPSET         COARSE 20-JUN-2011 02:44:58 20-JUN-2011 02:00:00
     1      657       16777873   755047049       8192     256001     1.95 TEMPFILE        COARSE 28-JUN-2011 23:17:28 28-JUN-2011 23:00:00
     1      658       16777874   755032881       8192    8171521    62.34 DATAFILE        COARSE 28-JUN-2011 19:21:20 28-JUN-2011 22:00:00

24 rows selected.

SQL> SQL> 
SQL> free ASM disks and their paths
SQL> prompt ===========================
===========================
SQL> select header_status , mode_status, path from V$asm_disk
  2  where header_status in ('FORMER','CANDIDATE')
  3  /

show parameter asm
show parameter size
show parameter proc
show parameter cluster
show parameter instance_type
show parameter instance_name

show parameter pfile

show sga

spool off
no rows selected

SQL> SQL> 
NAME                                 TYPE            VALUE
------------------------------------ --------------- ------------------------------
asm_diskgroups                       string
asm_diskstring                       string          /dev/raw/raw*
asm_power_limit                      integer         10
asm_preferred_read_failure_groups    string
SQL> 
NAME                                 TYPE            VALUE
------------------------------------ --------------- ------------------------------
db_cache_size                        big integer     0
large_pool_size                      big integer     12M
max_dump_file_size                   string          unlimited
sga_max_size                         big integer     272M
shared_pool_reserved_size            big integer     7969177
shared_pool_size                     big integer     0
sort_area_size                       integer         65536
workarea_size_policy                 string          AUTO
SQL> 
NAME                                 TYPE            VALUE
------------------------------------ --------------- ------------------------------
processes                            integer         100
SQL> 
NAME                                 TYPE            VALUE
------------------------------------ --------------- ------------------------------
cluster_database                     boolean         TRUE
cluster_database_instances           integer         4
cluster_interconnects                string
SQL> 
NAME                                 TYPE            VALUE
------------------------------------ --------------- ------------------------------
instance_type                        string          asm
SQL> 
NAME                                 TYPE            VALUE
------------------------------------ --------------- ------------------------------
instance_name                        string          +ASM2
SQL> SQL> 
NAME                                 TYPE            VALUE
------------------------------------ --------------- ------------------------------
spfile                               string          +DATA/rh-cluster/asmparameterf
                                                     ile/registry.253.747096005
SQL> SQL> 
Total System Global Area  283930624 bytes
Fixed Size                  2225792 bytes
Variable Size             256539008 bytes
ASM Cache                  25165824 bytes

原来是因为lsdg只列出已经mount的diskgroup,因为在Node B上FRA磁盘组没有mount所以不被列出,我本来以为是Bug造成的。
感谢Liqin Zhang网友的指出。

Oracle等待事件kfk:async disk IO

kfk: async disk IO等待事件是ASM下异步的System I/O等待事件,kfk内核层面在disk_asynch_io=true时被激活。当rbal或其他ASM相关后台进程在维护ASM磁盘组时可能进入kfk: async disk IO等待。

SQL> col name for a20
SQL> col PARAMETER1 for a10
SQL> col PARAMETER2 for a10
SQL> col PARAMETER3 for a10
SQL> col WAIT_CLASS for a15

SQL> select name,parameter1,parameter2,parameter3,wait_class from v$event_name where name='kfk: async disk IO';

NAME                 PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS
-------------------- ---------- ---------- ---------- ---------------
kfk: async disk IO   count      intr       timeout    System I/O

SQL> select * from v$version;    

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> select name,value from v$system_parameter where name in ('instance_type','asm_power_limit');

NAME                 VALUE
-------------------- ----------
instance_type        asm
asm_power_limit      10

SQL> conn / as sysasm
Connected.

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug event 10046 trace name context forever,level 8;
Statement processed.
SQL> alter diskgroup data check all;

Diskgroup altered.

SQL> oradebug event 10046 trace name context off;
Statement processed.

SQL> oradebug tracefile_name;
/s01/orabase/diag/asm/+asm/+ASM1/trace/+ASM1_ora_29405.trc

=====================trace=====================
PARSING IN CURSOR #140442102181424 len=30 dep=0 uid=0 oct=193 
lid=0 tim=1313673029551496 hv=2849532521 ad='6bd58b50' sqlid='ft5h7dunxhum9'
alter diskgroup data check all
END OF STMT
PARSE #140442102181424:c=1999,e=14171,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=1313673029551493
WAIT #140442102181424: nam='Disk file operations I/O' ela= 573 FileOperation=2 fileno=0 filetype=15 obj#=-1 
WAIT #140442102181424: nam='Disk file operations I/O' ela= 33 FileOperation=2 fileno=0 filetype=15 obj#=-1 
WAIT #140442102181424: nam='Disk file operations I/O' ela= 29 FileOperation=2 fileno=0 filetype=15 obj#=-1 
WAIT #140442102181424: nam='kfk: async disk IO' ela= 941 count=1 intr=0 timeout=4294967295 obj#=-1

fdp_checkDsk(): 20
----- Abridged Call Stack Trace -----
ksedsts()+461<-kfdp_checkDsk()+476<-kfdCheck()+1649<-kfgCheck()+477<-kfxdrvAl
terOne()+5976<-kfxdrvAlter()+2287<-kfxdrvEntry()+1306<-opiexe()+20028<-opiosq
0()+3993<-kpooprx()+274<-kpoal8()+800<-opiodr()+910<-ttcpip()+2289<-opitsk()+
1670<-opiino()+966<-opiodr()+910
<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<
-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----
WAIT #140442102181424: nam='rdbms ipc reply' ela= 1610 from_process=19 timeou
t=2147483647 p3=0 obj#=-1 tim=1313673029798048
kfdp_checkDsk(): 21
----- Abridged Call Stack Trace -----
ksedsts()+461<-kfdp_checkDsk()+476<-kfdCheck()+1649<-kfgCheck()+477<-kfxdrvAl
terOne()+5976<-kfxdrvAlter()+2287<-kfxdrvEntry()+1306<-opiexe()+20028<-opiosq
0()+3993<-kpooprx()+274<-kpoal8()+800<-opiodr()+910<-ttcpip()+2289<-opitsk()+
1670<-opiino()+966<-opiodr()+910
<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<
-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----
WAIT #140442102181424: nam='rdbms ipc reply' ela= 1677 from_process=19 timeou
t=2147483647 p3=0 obj#=-1 tim=1313673029885645
kfdp_checkDsk(): 22
----- Abridged Call Stack Trace -----
ksedsts()+461<-kfdp_checkDsk()+476<-kfdCheck()+1649<-kfgCheck()+477<-kfxdrvAl
terOne()+5976<-kfxdrvAlter()+2287<-kfxdrvEntry()+1306<-opiexe()+20028<-opiosq
0()+3993<-kpooprx()+274<-kpoal8()+800<-opiodr()+910<-ttcpip()+2289<-opitsk()+
1670<-opiino()+966<-opiodr()+910
<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<
-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----
WAIT #140442102181424: nam='rdbms ipc reply' ela= 1350 from_process=19 timeou
t=2147483647 p3=0 obj#=-1 tim=1313673029970397
kfdp_checkDsk(): 23
----- Abridged Call Stack Trace -----
kfdp_checkDsk(): 24
----- Abridged Call Stack Trace -----
ksedsts()+461<-kfdp_checkDsk()+476<-kfdCheck()+1649<-kfgCheck()+477<-kfxdrvAl
terOne()+5976<-kfxdrvAlter()+2287<-kfxdrvEntry()+1306<-opiexe()+20028<-opiosq
0()+3993<-kpooprx()+274<-kpoal8()+800<-opiodr()+910<-ttcpip()+2289<-opitsk()+
1670<-opiino()+966<-opiodr()+910
<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+252<-main()+201<
-__libc_start_main()+244<-_start()+36
----- End of Abridged Call Stack Trace -----

Script:收集ASM诊断信息

收集ASM诊断信息最佳的工具仍是RDA,对于不能使用RDA的环境可以采用如下脚本:

 

spool asm_diag1.txt
set pagesize 1000
set lines 500
col "Group Name"   form a25
col "Disk Name"    form a30
col "State"  form a15
col "Type"   form a7
col "Free GB"   form 9,999
alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';
select sysdate "Date and Time" from dual;

select * from v$asm_diskgroup order by 1;
select * from v$asm_disk order by 1, 2, 3;
select * from gv$asm_operation order by 1;
select * from v$version where banner like '%Database%' order by 1;
select * from gv$asm_client order by 1;

prompt

prompt ASM Disk Groups
prompt ===============

select group_number  "Group"
,      name          "Group Name"
,      state         "State"
,      type          "Type"
,      total_mb/1024 "Total GB"
,      free_mb/1024  "Free GB"
from   v$asm_diskgroup
/

prompt
prompt ASM Disks
prompt ==============

col "Group"          form 999
col "Disk"           form 999
col "Header"         form a9
col "Mode"           form a8
col "Redundancy"     form a10
col "Failure Group"  form a10
col "Path"           form a19

 select group_number  "Group"
,      disk_number   "Disk"
,      header_status "Header"
,      mode_status   "Mode"
,      state         "State"
,      redundancy    "Redundancy"
,      total_mb      "Total MB"
,      free_mb       "Free MB"
,      name          "Disk Name"
,      failgroup     "Failure Group"
,      path          "Path"
from   v$asm_disk
order by group_number
,        disk_number

/

prompt
prompt Instances currently accessing these diskgroups
prompt ==============================================

select c.group_number  "Group"
,      g.name          "Group Name"
,      c.instance_name "Instance"
from   v$asm_client c
,      v$asm_diskgroup g
where  g.group_number=c.group_number
/

prompt
prompt Report the Percentage of Imbalance in all Mounted Diskgroups
prompt ==============================================
select dfail, count(dfail) from
(
select disk, count(failgroup) as dfail
from x$kfdpartner, v$asm_disk where
number_kfdpartner=disk_number and grp=group_number
group by disk, failgroup
)
group by dfail; 

select g.name as "GROUP", d.name as "DISK", d.failgroup, fcnt, pcnt,
decode(pcnt - fcnt, 0, 'MUST', 'SHOULD') as action from
(select gnum, DISK1, failgroup, count(failgroup) as fcnt from
(select gnum, DISK1
from
(
select d.group_number as gnum, disk as disk1,
count(distinct failgroup) as dfail
from x$kfdpartner, v$asm_disk_stat d where
number_kfdpartner=disk_number and grp=d.group_number
and active_kfdpartner=1
group by d.group_number, disk
), v$asm_disk_stat
where dfail < 3
and disk1=disk_number
and gnum=group_number),
x$kfdpartner, v$asm_disk_stat d where
number_kfdpartner=disk_number and grp=d.group_number and grp=gnum
and disk1=disk
and active_kfdpartner=1
group by gnum, disk1, failgroup),
(select grp, disk, count(disk) as pcnt from x$kfdpartner where
active_kfdpartner=1 group by grp, disk),
v$asm_diskgroup_stat g, v$asm_disk_stat d
where gnum=grp and gnum=g.group_number and gnum=d.group_number and
disk=disk1 and disk=disk_number and
((fcnt = 1 and (pcnt - fcnt) > 3) or ((pcnt - fcnt) = 0))
/

col TYPE form a15
col FILE_NUMBER form 9999 head FILE_NUM
col GROUP_NUMBER form 9999 head GR_NUM
col GB for 9999.99

select GROUP_NUMBER   ,
 FILE_NUMBER          ,
 COMPOUND_INDEX       ,
 INCARNATION          ,
 BLOCK_SIZE           ,
 BLOCKS               ,
 BYTES/1024/1024/1024 GB ,
 TYPE                 ,
 STRIPED              ,
 CREATION_DATE        ,
 MODIFICATION_DATE
from v$asm_file
where TYPE != 'ARCHIVELOG'
/

prompt
prompt free ASM disks and their paths
prompt ===========================
select header_status , mode_status, path from V$asm_disk
where header_status in ('FORMER','CANDIDATE')
/

show parameter asm
show parameter size
show parameter proc
show parameter cluster
show parameter instance_type
show parameter instance_name

show parameter pfile

show sga

spool off

 

Code to be run on the ASM instance. Use file asmdebug.sql

 

set newpage none
set linesize 100
spool /tmp/asmdebug.out
--
-- Get a timestamp
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MM:SS') from dual;
--
-- Diskgroup information
set head off
select 'Diskgroup Information' from dual;
set head on
column name format a15
column DG# format 99
select group_number DG#, name, state, type, total_mb, free_mb from
v$asm_diskgroup;
--
-- Get the # of Allocation Units  per DG
set head off
select 'Number of AUs per diskgroup' from dual;
set head on
select count(number_kfdat) AU_count, group_kfdat DG# from x$kfdat
group by group_kfdat;
--
-- Get the # of Allocation Units  per DiskGroup and Disk
set head off
select 'Number of AUs per Diskgroup,Disk' from dual;
col "group#,disk#" for a30
set head on
select count(*)AU_count, GROUP_KFDAT||','||number_kfdat "group#,disk#" from x$kfdat group by GROUP_KFDAT,number_kfdat;
--
-- Get the # of allocated (V) and free (F) Allocation Units
set head off
select 'Number of allocated (V) and free (F) Allocation Units' from dual;
col "VF" for a2
set head on
select GROUP_KFDAT "group#", number_kfdat "disk#", v_kfdat "VF", count(*)
from x$kfdat
group by GROUP_KFDAT, number_kfdat, v_kfdat;
--
-- Get the # of Allocation Units per ASM file
set head off
select 'Number of AUs per ASM file ordered by AU count for metadata only'
from dual;
set head on
select count(XNUM_KFFXP) AU_count,  NUMBER_KFFXP file#, GROUP_KFFXP DG# from x$kffxp where NUMBER_KFFXP < 256
group by NUMBER_KFFXP, GROUP_KFFXP
order by count(XNUM_KFFXP) ;
--
-- Get the # of Allocation Units per ASM file by file alias.  Change the
-- system_created Y|N depending if you want the short or long ASM name
set head off
select 'Number of AUs per ASM file ordered by AU count.  This is for non
metadata' from dual;
col name format a60
set head on
select GROUP_KFFXP, NUMBER_KFFXP, name, count(*)
from x$kffxp, v$asm_alias
where GROUP_KFFXP=GROUP_NUMBER and NUMBER_KFFXP=FILE_NUMBER and
system_created='Y'
     group by GROUP_KFFXP, NUMBER_KFFXP, name
     order by GROUP_KFFXP, NUMBER_KFFXP;
--
-- Get partner information.  This is really only useful if redundancy is other than
-- external.
set head off
select 'The following shows the disk to partner relationship.  This is really only
useful if using normal or high redundancy.' from dual;
set head on
select grp DG#, disk, NUMBER_KFDPARTNER partner, PARITY_KFDPARTNER parity, ACTIVE_KFDPARTNER active
from x$kfdpartner;
--
-- Another look at file utilization.
set head off
set linesize 132
select 'bytes is the sum of AUs with data in them * 1024^2
space is the sum of all AUs allocated for this file * 1024^2'
from dual;
set head on
col Name format a60
select f.group_number, f.file_number, bytes, space, space/(1024*1024) "InMB", a.name "Name"
from v$asm_file f, v$asm_alias a
where f.group_number=a.group_number and f.file_number=a.file_number
    and system_created='Y'
    order by f.group_number, f.file_number;
--
-- Get robust disk information
set linesize 400
col failgroup format a20
col label format a20
col name format a40
col path format a40
set head off
select 'Robust disk information' from dual;
set head on
select GROUP_NUMBER, DISK_NUMBER, INCARNATION, MOUNT_STATUS, HEADER_STATUS, MODE_STATUS, STATE, LIBRARY, TOTAL_MB, FREE_MB,
NAME, FAILGROUP, LABEL, PATH, CREATE_DATE, MOUNT_DATE, READS,
WRITES, READ_ERRS, WRITE_ERRS, READ_TIME, WRITE_TIME, BYTES_READ, BYTES_WRITTEN
from v$asm_disk;
--
spool off

 

 

Code to be executed on the database instances using this ASM instance. Use file rdbmsdebug.sql

 

set newpage none
spool /tmp/rdbmsdebug.out
--
-- Get a timestamp
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MM:SS') from dual;
--
-- Get datafile information as the database sees it
set head off
select 'V$DATAFILE information' from dual;
set head on
set linesize 132
col name format a60
select file#, name, block_size, blocks, bytes, bytes/(1024*1024) "InMB",
status from v$datafile;
--
-- Get controlfile information as the database sees it
set head off
select 'V$CONTROLFILE information' from dual;
set head on
select * from v$controlfile;
--
-- Get archivelog information as the database sees it
set head off
select 'GV$ARCHIVED_LOG information' from dual;
set head on
select name, thread#, sequence#, blocks*block_size "size", status
status from gv$archived_log
order by thread#,sequence#;
--
-- Get redolog information as the database sees it
set head off
select 'v$log and v$logfile information' from dual;
set head on
col member format a60
select a.group#, member, thread#, sequence#, bytes, a.status
from v$log a, v$logfile b where a.group# = b.group#
order by thread#;
--
-- Get tempfiles information as the database sees it
set head off
select 'GV$TEMPFILE information' from dual;
set head on
col name format a60
select INST_ID,TS#,FILE#, RFILE#,NAME, CREATION_CHANGE# , CREATION_TIME,  STATUS, BYTES, BLOCKS, CREATE_BYTES, BLOCK_SIZE
from gv$tempfile order by inst_id, file#;
spool off

 

Other items to collect:

System error logs
Alert logs from all database and ASM instances
All recent trace files from all databases involved and all of the ASM instances
All “.trc” files from the ASM instances

 

check ASM disk space

 

1) Determine which (if any) disks contain no free space (ie are below the threshold)
select group_kfdat "group #",
       number_kfdat "disk #",
       count(*) "# AU's"
from x$kfdat a
where v_kfdat = 'V'
and not exists (select *
                from x$kfdat b
                where a.group_kfdat = b.group_kfdat
                and a.number_kfdat = b.number_kfdat
                and b.v_kfdat = 'F')
group by GROUP_KFDAT, number_kfdat;

If no rows are returned ... the following query can also be used

select disk_number "Disk #", free_mb
from v$asm_disk
where group_number = *** disk group number ***
order by 2;

If rows are returned from the first query ... or FREE_MB is less than 100mb in the second ... then there is probably insufficient disk space to allow a rebalance to occur ... Note the Disk #'s for later

2) Determine which files have allocation units on the disk(s) that are on exhausted disks

select name, file_number
from v$asm_alias
where group_number in (select group_kffxp
                                           from x$kffxp
                                           where group_kffxp=*** disk group number ***
                                           and disk_kffxp in (*** disk list from #1 above ***)
                                           and au_kffxp != 4294967294
                                           and number_kffxp >= 256)
and file_number in (select number_kffxp
                                  from x$kffxp
                                  where group_kffxp=*** disk group number ***
                                  and disk_kffxp in (*** disk list from #1 above ***)
                                  and au_kffxp != 4294967294
                                  and number_kffxp >= 256)
and system_created='Y';

3) Free up space so that the rebalance can occur

Using the file list from #2 above ... we will need to either drop or move tablespace(s)/datafile(s) such that all disks that are exhausted have at least 100mb free ...

NOTE ... the AU count above ... should relate to 1mb AU size ... so if a single file ... with at least 100 au's can be dropped or moved ... this
should be sufficient to free up enough space to allow the rebalance to occur

Droppable tablespaces may be things like:
    * temporary tablespaces
    * index tablespaces (assuming you know how to rebuild the indexes)

If none of the tablespaces are droppable then the tablespace(s)/datafile(s) will need to be
    * moved to another diskgroup (at least temporarily) ...
    * dropped using RMAN (with the database shutdown) and will be restored later

   Note 330103.1  How to Move Asm Database Files From one Diskgroup To Another ?

4) Check to see if there is sufficient FREE_MB on the problem disks

select disk_number "Disk #", free_mb
from v$asm_disk
where disk_group = *** disk group number ***
and disk_number in (*** disk list from #1 above ***)
order by 2;

 

Check Diskgroup Balance

If you want to perform an imbalance check for all mounted diskgroups, run the script in MOS Note 367445.1.

If you want to determine if you already have imbalance for a file in an existing diskgroup, use the following query:

select disk_kffxp, sum(size_kffxp) from x$kffxp where group_kffxp=AAA and number_kffxp=BBB and lxn_kffxp=0 group by disk_kffxp order by 2;

Breakdown of input/output is as follows:

  • AAA is the group_number in v$asm_alias
  • BBB is file_number in v$asm_alias
  • disk_kffxp gives us the disk number.
  • size_kffxp is used such that we account for variable sized extents.
  • sum(size_kffxp) provides the number of AUs that are on that disk.
  • lxn_kffxp is used in the query such that we go after only the primary extents, not secondary extents

If you want to check balance from an IO perspective, query the statistics in v$asm_disk_iostat before and after running a large SQL statement. For example, if the running a large query that does just reads, the reads and read_bytes columns should be roughly the same for all disks in the diskgroup.

Oracle内部错误:ORA-00600[kfioTranslateIO03]一例

一套Linux x86-64上的11.2.0.2 RAC+ASM系统,其中一个节点出现了ORA-00600[kfioTranslateIO03]内部错误,其具体日志如下:

=============================alert.log===============================
adrci> show alert -tail -f 

2011-05-30 20:29:12.657000 +08:00
Starting background process RSMN
RSMN started with pid=31, OS id=22084
ORACLE_BASE not set in environment. It is recommended
that ORACLE_BASE be set in the environment
Reusing ORACLE_BASE from an earlier startup = /s01/orabase
ALTER DATABASE MOUNT /* db agent *//* {0:7:3} */
This instance was first to mount
2011-05-30 20:29:15.026000 +08:00
Sweep [inc][100831]: completed
Sweep [inc2][100831]: completed
NOTE: Loaded library: System
ORA-15025: could not open disk "/dev/raw/raw1"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
ORA-15025: could not open disk "/dev/raw/raw2"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
ORA-15025: could not open disk "/dev/raw/raw3"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
ORA-15025: could not open disk "/dev/raw/raw5"
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 9
SUCCESS: diskgroup DATA was mounted
NOTE: dependency between database PROD and diskgroup resource ora.DATA.dg is established
Errors in file /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ckpt_22056.trc  (incident=104831):

ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], [] 

Incident details in: /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_104831/PROD1_ckpt_22056_i104831.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

adrci> show problem 

ADR Home = /s01/orabase/diag/rdbms/prod/PROD1:
*************************************************************************
PROBLEM_ID           PROBLEM_KEY                                                 LAST_INCIDENT
-------------------- ----------------------------------------------------------- --------------------
2                    ORA 7445 [kghdmp_new()+1133]                                18387
3                    ORA 7445 [kghfnd()+2672]                                    20701
5                    ORA 7445 [kcldmp()+246]                                     28229
6                    ORA 7445 [kclxle()+311]                                     28230
1                    ORA 4031                                                    56918
4                    ORA 445                                                     90278
7                    ORA 600 [kfioTranslateIO03]                                 108831

adrci> show incident -mode detail -p "incident_id=108831"

ADR Home = /s01/orabase/diag/rdbms/prod/PROD1:
*************************************************************************

**********************************************************
INCIDENT INFO RECORD 1
**********************************************************
   INCIDENT_ID                   108831
   STATUS                        ready
   CREATE_TIME                   2011-05-30 20:31:55.484000 +08:00
   PROBLEM_ID                    7
   CLOSE_TIME
   FLOOD_CONTROLLED              none
   ERROR_FACILITY                ORA
   ERROR_NUMBER                  600
   ERROR_ARG1                    kfioTranslateIO03
   ERROR_ARG2
   ERROR_ARG3
   ERROR_ARG4
   ERROR_ARG5
   ERROR_ARG6
   ERROR_ARG7
   ERROR_ARG8
   ERROR_ARG9
   ERROR_ARG10
   ERROR_ARG11
   ERROR_ARG12
   SIGNALLING_COMPONENT          ASM
   SIGNALLING_SUBCOMPONENT
   SUSPECT_COMPONENT
   SUSPECT_SUBCOMPONENT
   ECID
   IMPACTS                       0
   PROBLEM_KEY                   ORA 600 [kfioTranslateIO03]
   FIRST_INCIDENT                96831
   FIRSTINC_TIME                 2011-05-30 20:24:40.372000 +08:00
   LAST_INCIDENT                 108831
   LASTINC_TIME                  2011-05-30 20:31:55.484000 +08:00
   IMPACT1                       0
   IMPACT2                       0
   IMPACT3                       0
   IMPACT4                       0
   KEY_NAME                      ProcId
   KEY_VALUE                     19.1
   KEY_NAME                      Client ProcId
   KEY_VALUE                     oracle@rh2.oracle.com.22504_139763918456544
   KEY_NAME                      SID
   KEY_VALUE                     397.1
   OWNER_ID                      1
   INCIDENT_FILE                 /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_108831/PROD1_ckpt_22504_i108831.trc
   OWNER_ID                      1
   INCIDENT_FILE                 /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ckpt_22504.trc
1 rows fetched

===================================trace===================================

adrci> view /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_108831/PROD1_ckpt_22504_i108831.trc

Dump continued from file: /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ckpt_22504.trc
ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []

========= Dump for incident 108831 (ORA 600 [kfioTranslateIO03]) ========
----- Beginning of Customized Incident Dump(s) -----
kfioRqSet=0x7f1d524151c0 parent=0x7fffb2642d30 gn=(64.0) cnt=0
  size=32768 vxn=0 byte offset=16384 buf offset=0
  tried[0]=0 tried[1]=0 tried[2]=0 tried[3]=0 tried[4]=0 tried[5]=0
  skipped[0]=0 skipped[1]=0 skipped[2]=0 skipped[3]=0 skipped[4]=0 skipped[5]=0
parent :
DDE: Ending a split invocation on error recording!
----- End of Customized Incident Dump(s) -----

*** 2011-05-30 20:31:55.548
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.

----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+36        call     kgdsdst()            000000000 ? 000000000 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000001 ? 000000002 ?
ksedst1()+98         call     skdstdst()           000000000 ? 000000000 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksedst()+34          call     ksedst1()            000000000 ? 000000001 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbkedDefDump()+2741  call     ksedst()             000000000 ? 000000001 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksedmp()+36          call     dbkedDefDump()       000000003 ? 000000002 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
ksfdmp()+64          call     ksedmp()             000000003 ? 000000002 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgexPhaseII()+1764  call     ksfdmp()             000000003 ? 000000002 ?
                                                   7FFFB2634D58 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgexExplicitEndInc  call     dbgexPhaseII()       7F1D5281F710 ? 7F1D52822500 ?
()+750                                             7FFFB2640890 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgeEndDDEInvocatio  call     dbgexExplicitEndInc  7F1D5281F710 ? 7F1D52822500 ?
nImpl()+767                   ()                   7FFFB2640890 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgeEndSpltInvokOnR  call     dbgeEndDDEInvocatio  7F1D5281F710 ? 7F1D52822500 ?
ec()+265                      nImpl()              7FFFB2640890 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbgePostErrorKGE()+  call     dbgeEndSpltInvokOnR  7F1D5281F710 ? 7F1D52822500 ?
248                           ec()                 7FFFB2640890 ? 000000001 ?
                                                   000000000 ? 000000002 ?
dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   000000000 ? 7F1D52830E40 ?
63                                                 000003AE9 ? 000000000 ?
                                                   100000000 ? 000000002 ?
kgeade()+351         call     dbkePostKGE_kgsf()   00B7C8EA0 ? 7F1D52830E40 ?
                                                   000003AE9 ? 000000000 ?
                                                   100000000 ? 000000002 ?
kgerelv()+135        call     kgeade()             00B7C8EA0 ? 00B7C9050 ?
                                                   7F1D52830E40 ? 000003AE9 ?
                                                   100000000 ? 000000002 ?
kserecl0()+157       call     kgerelv()            00B7C8EA0 ? 7F1D52830E40 ?
                                                   000003AE9 ? 00952980C ?
                                                   7FFFB2641C10 ? 000000000 ?
kfioErrorRecord()+7  call     kserecl0()           00B7C8EA0 ? 7F1D52830E40 ?
6                                                  000003AE9 ? 000000005 ?
                                                   7FFFB2641C60 ? 000000000 ?
kfiorq_dump()+129    call     kfioErrorRecord()    7FFFB2642D30 ? 7F1D52830E40 ?
                                                   000003AE9 ? 000000005 ?
                                                   7FFFB2641C60 ? 000000000 ?
kfioRqSetDump()+565  call     kfiorq_dump()        7FFFB2642D30 ? 7F1D52830E40 ?
                                                   000003AE9 ? 000000005 ?
                                                   7FFFB2641C60 ? 000000000 ?
kfioTranslateIO()+3  call     kfioRqSetDump()      7F1D524151C0 ? 7F1D52830E40 ?
079                                                000003AE9 ? 000000005 ?
kfioRqSetPrepare()+  call     kfioTranslateIO()    7F1D524151C0 ? 7F1D52415098 ?
1017                                               7FFFB26421D4 ? 7FFFB26421D0 ?
                                                   0D4F338B0 ? 000000000 ?
kfioSubmitIO()+2852  call     kfioRqSetPrepare()   7F1D524151C0 ? 7F1D52415098 ?
                                                   7FFFB26425D8 ? 7FFFB2642608 ?
                                                   0D4F338B0 ? 000000000 ?
kfioRequestPriv()+1  call     kfioSubmitIO()       7FFFB2642E10 ? 000000001 ?
94                                                 7FFFB26425D8 ? 7FFFB2642608 ?
                                                   0D4F338B0 ? 000000000 ?
kfioRequest()+701    call     kfioRequestPriv()    000000000 ? 000000001 ?
                                                   7FFFB2642E18 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksfd_kfioRequest()+  call     kfioRequest()        7FFFB2642E10 ? 000000001 ?
644                                                7FFFB2642E18 ? 000000001 ?
                                                   000000000 ? 7FFF00000000 ?
ksfd_osmio()+1050    call     ksfd_kfioRequest()   7FFFB2642E10 ? 000000001 ?
                                                   7FFFB2642E18 ? 000000001 ?
                                                   000000000 ? 000000000 ?
ksfd_io()+2717       call     ksfd_osmio()         000000001 ?
                                                   FFFFFFFFB2642D30 ?
                                                   FFFFFFFFB2642D30 ?
                                                   0D400A0B0 ? 000008000 ?
                                                   7FFFB2643170 ?
ksfdread()+576       call     ksfd_io()            0D400A0B0 ? 000000001 ?
                                                   7F1D52417E00 ? 000008000 ?
                                                   000000000 ? 000000703 ?
kcc_identify_file()  call     ksfdread()           0D400A0B0 ? 000000001 ?
+309                                               7F1D52417E00 ? 000008000 ?
                                                   000000000 ? 000000703 ?
kcc_identify()+225   call     kcc_identify_file()  0D400A0B0 ? 7F1D52417E00 ?
                                                   000000000 ? 060019450 ?
                                                   060019630 ? 0DAC34670 ?
kccida()+225         call     kcc_identify()       000000000 ? 7F1D52417E00 ?
                                                   060019630 ? 7FFFB26434A4 ?
                                                   000000000 ? 0DAC34670 ?
ksbabs()+771         call     kccida()             7FFFB2643B08 ? 7F1D52417E00 ?
                                                   060019630 ? 7FFFB26434A4 ?
                                                   000000000 ? 0DAC34670 ?
ksbrdp()+971         call     ksbabs()             7FFFB2643B08 ? 7F1D52417E00 ?
                                                   060019630 ? 7FFFB26434A4 ?
                                                   000000000 ? 0DAC34670 ?

adrci> view /s01/orabase/diag/rdbms/prod/PROD1/trace/PROD1_ckpt_22504.trc

NOTE: disk 4 is missing from group 1
Incident 108831 created, dump file: /s01/orabase/diag/rdbms/prod/PROD1/incident/incdir_108831/PROD1_ckpt_22504_i108831.trc
ORA-00600: internal error code, arguments: [kfioTranslateIO03], [], [], [], [], [], [], [], [], [], [], []

=========Start of 'kfiorq = [0x7fffb2642d30]' dumping =========
        Status            =  UNKWOWN
        Flags             =  READ |  SYNC
        Mirror side       = 0
        Fib               = 0xd4f338b0
        Offset            = 1
        buffer ptr        = 0x7f1d52417e00
        Rcount            = 32768
        err_kfiorq        = 15081
        Inflight disk IO  = 0
        Completed disk IO = 0
        Oracle error      = 0
        Intended zone     = 48
  ===Dump of all attached kfiodrq's===
=========End of 'kfiorq = [0x7fffb2642d30]' dumping =========

parent :
############# kfiofib = 0xd4f338b0 #################
Diskgroup Name     =
File number        = 261.747100215
File type          = 1
Flags              = 10
Blksize            = 16384
File size          = 1131 blocks
Blk one offset     = 1
Redundancy         = 17
Physical blocksz   = 512
Open name          = +DATA/prod/controlfile/current.261.747100215
Fully-qualified nm =+DATA/prod/controlfile/current.261.747100215
Mapid             = 2
Slave ID          = -1
Connection        = 0x(nil)
############################################
Error ORA-600 signaled at ksedsts()+461<-ksf_short_stack()+77<-kge_snap_callstack()+63<-kge_sigtrace_dump()+69<-kgepop()+712<-kgersel()+175<-kfioTranslateIO()+3138<-kfi
oRqSetPrepare()+1022<-kfioSubmitIO()+2857<-kfioRequestPriv()+199<-kfioRequest()+706<-ksfd_kfioRequest()+649<-ksfd_osmio()+1055<-ksfd_io()+2722<-ksfdread()+581<-kcc_iden
tify_file()+314<-kcc_identify()+230<-kccida()+230<-ksbabs()+771<-ksbrdp()+971<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<
-__libc_start_main()+244<-_start()+36
ERROR: unrecoverable error ORA-600 raised in ASM I/O path; terminating process 22504
----- Abridged Call Stack Trace -----
ksedsts()+461<-kfioRequest()+2157<-ksfd_kfioRequest()+649<-ksfd_osmio()+1055<-ksfd_io()+2722<-ksfdread()+581<-kcc_identify_file()+314<-kcc_identify()+230<-kccida()+230<
-ksbabs()+771<-ksbrdp()+971<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252
<-main()+201<-__libc_start_main()+244<-_start()+36 ----- End of Abridged Call Stack Trace ----- *** 2011-05-30 20:31:56.271 KSU: Terminating fatal process 'oracle@rh2.oracle.com (CKPT)' adrci> ips create package
Created package 2 without any contents, correlation level typical

adrci> ips add problem 7 package 2
Added problem 7 to package 2

adrci> ips finalize package 2
Finalized package 2

adrci> ips generate package 2 in /tmp
Generated package 2 in file /tmp/IPSPKG_20110531224208_COM_1.zip, mode complete

诊断发现由于ASM diskgroup磁盘组中的磁盘设备文件/dev/raw/raw*的权限被修改成了0600,而这些裸设备的拥有者为grid用户,导致oracle用户无法读写这些裸设备,通过将设备文件的权限修改为0660,解决了该问题。

Why ASMLIB and why not?

ASMLIB是一种基于Linux module,专门为Oracle Automatic Storage Management特性设计的内核支持库(kernel support library)。

长久以来我们对ASMLIB的认识并不全面,这里我们来具体了解一下使用ASMLIB的优缺点。

理论上我们可以从ASMLIB API中得到的以下益处:

  1. 总是使用direct,async IO
  2. 解决了永久性设备名的问题,即便在重启后设备名已经改变的情况下
  3. 解决了文件权限、拥有者的问题
  4. 减少了I/O期间从用户模式到内核模式的上下文切换,从而可能降低cpu使用率
  5. 减少了文件句柄的使用量
  6. ASMLIB API提供了传递如I/O优先级等元信息到存储设备的可能

虽然从理论上我们可以从ASMLIB中得到性能收益,但实践过程中这种优势是几乎可以忽略的,没有任何性能报告显示ASMLIB对比Linux上原生态的udev设备管理服务有任何性能上的优势。在Oracle官方论坛上有一篇<ASMLib and Linux block devices>讨论ASMLIB性能收益的帖子,你可以从中看到”asmlib wouldn’t necessarily give you much of an io performance benefit, it’s mainly for ease of management as it will find/discover the right devices for you, the io effect of asmlib is large the same as doing async io to raw devices.”的评论,实际上使用ASMLIB和直接使用裸设备(raw device)在性能上没有什么差别。

ASMLIB可能带来的缺点:

  1. 对于多路径设备(multipathing)需要在/etc/sysconfig/oracleasm-_dev_oracleasm配置文件中设置ORACLEASM_SCANORDER及ORACLEASM_SCANEXCLUDE,以便ASMLIB能找到正确的设备文件,具体可以参考Metalink Note<How To Setup ASM & ASMLIB On Native Linux Multipath Mapper disks? [ID 602952.1]>
  2. 因为ASM INSTANCE使用ASMLIB提供的asm disk,所以增加了额外的层面
  3. 每次Linux Kernel更新,都需要替换新的ASMLIB包
  4. 增加了因人为错误造成宕机downtime的可能
  5. 使用ASMLIB意味着要花费更多时间去创建和维护
  6. 因为ASMLIB的存在,可能引入更多的bug,这是我们最不想看到的
  7. 使用ASMLIB创建的disk,其disk header并不会和普通的asm disk header有什么不同,仅仅是在头部多出了ASMLIB的属性空间。

结论:
我个人的观点是尽可能不要使用ASMLIB,当然这不是DBA个人所能决定的事情。另一方面这取决于个人习惯,在rhel 4的早期发行版本中没有提供udev这样的设备管理服务,这导致在rhel 4中大量的ASM+RAC组合的系统使用ASMLIB , 经网友指出udev 作为kernel 2.6的新特性被引入,在rhel4的初始版本中就已经加入了udev绑定服务,但是在rhel4时代实际udev的使用并不广泛(In Linux 2.6, a new feature was introduced to simplify device management and hot plug capabilities. This feature is called udev and is a standard package in RHEL4 or Oracle
Enterprise Linux 4 (OEL4) as well as Novell’s SLES9 and SLES10.)。如果是在RHEL/OEL 5中那么你已经有充分的理由利用udev而放弃ASMLIB。

Reference:
ASMLIB Performance vs Udev
RAC+ASM 3 years in production Stories to share
How To Setup ASM & ASMLIB On Native Linux Multipath Mapper disks? [ID 602952.1]
ASMLib and Linux block devices

Discover Your Missed ASM Disks

经常有网友在构建10g/11g中ASM存储环境的时候遇到ASM磁盘无法识别的问题,虽然已经为存储设备赋予了适当的权限,也为ASM实例修改了asm_diskstring初始化参数,可是在DBCA的ASM Diskgroup创建页面里就是无法显示候选的ASM Disk磁盘。

实际上因为ASM存储方式比起裸设备或GPFS来说更为黑盒,我们也无法利用ASM instance中的一些动态性能视图或内部视图来排查造成这一问题的原因,使得这类问题显得十分棘手。

下面我来介绍一种使用操作系统调用追踪工具来排查ASM无法找到磁盘问题的方法。

[oracle@vrh1 raw]$ cd /dev/rdsk 

/* 演示中我们要用到的三个裸设备位于/dev/rdsk下 */

[oracle@vrh1 rdsk]$ ls
hdisk1  hdisk2  hdisk3

[oracle@vrh1 rdsk]$ sqlplus / as sysdba

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Prod
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL>  select path from v$asm_disk;
no rows selected

SQL> alter system set asm_diskstring='/dev/rdsk/hdisk*';
System altered.

SQL> select * from v$asm_disk;
no rows selected

/* 以上虽然设置了asm_diskstring参数,ASM instance依然无法在指定路径下找到合适的设备
    这是为什么呢?!
*/

[oracle@vrh1 rdsk]$ ps -ef|grep rbal|grep -v grep
oracle   31375     1  0 15:40 ?        00:00:00 asm_rbal_+ASM1

/* 找出当前ASM实例中的rbal后台进程  */

/* 针对该后台进程做系统调用的trace,一般Linux上使用strace,而Unix上使用truss */

strace -f -o /tmp/asm_rbal.trc -p $OS_PID_OF_RBAL_BGPROCESS
truss  -ef -o /tmp/asm_rbal.trc -p $OS_PID_OF_RBAL_BGPROCESS

[oracle@vrh1 rdsk]$ strace -f -o /tmp/asm_rbal.trc  -p 31375
Process 31375 attached - interrupt to quit

/* 在另外一个终端窗口中打开sqlplus登录ASM实例,执行对v$ASM_DISK或者X$KFDSK的查询 */

SQL> select * from x$kfdsk;

no rows selected

/* 完成以上查询后使用Ctrl+C中断strace命令,并分析生成的system call trace */

/* 因为在不同Unix平台上rbal后台进程可能使用不同的system call function函数以达到相同的目的,
    所以在你的平台上可能并不像在Linux上使用open和access2个关键词搜索就可以得到答案了!     */

[oracle@vrh1 rdsk]$ cat /tmp/asm_rbal.trc |egrep "open|access"
31375 open("/dev/rdsk", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 15
31375 access("/dev/rdsk/hdisk3", R_OK|W_OK) = -1 EACCES (Permission denied)
31375 access("/dev/rdsk/hdisk2", R_OK|W_OK) = -1 EACCES (Permission denied)
31375 access("/dev/rdsk/hdisk1", R_OK|W_OK) = -1 EACCES (Permission denied)

可以看到以上系统调用中使用access函数访问候选的hdisk*ASM磁盘设备时出现了”Permission denied”的问题,很显然是因为权限不足导致了ASM实例无法利用这部分的磁盘设备,我们来纠正这一问题后再次尝试:

[root@vrh1 ~]# chown oracle:dba /dev/rdsk/hdisk*

SQL> select path_kfdsk from x$kfdsk;

PATH_KFDSK
--------------------------------------------------------------------------------
/dev/rdsk/hdisk3
/dev/rdsk/hdisk2
/dev/rdsk/hdisk1

SQL> select path from v$asm_disk;

PATH
--------------------------------------------------------------------------------
/dev/rdsk/hdisk3
/dev/rdsk/hdisk1
/dev/rdsk/hdisk2

以上丢失ASM Disk诊断方法的原理是在查询v$asm_disk或者x$kfdsk视图时Oracle会让RBAL这个ASM特别的后台进程去访问asm_diskstring参数指定路径下的所有可用设备,只要我们了解了在访问过程中RBAL遭遇的问题,那么一般来说都可以很简单地予以解决,如果strace/truss也无法给予你任何启示的话,也许你不得不去提交一个SR让Oracle Support来进一步协助你了,当然在正式提交SR之前你有必要再次确认一下你的ASM Disk是否都满足了以下的这些硬指标:

1.合理地设置ASM_DISKSTRING参数,若没有设置ASM_DISKSTRING参数那么ASM实例会尝试到默认的一些路径去搜索磁盘设备,这些默认路径在不同操作系统上略有不同:

操作系统 默认搜索路径
Solaris /dev/rdsk/*
Windows \\.\orcldisk*
Linux /dev/raw/*
Linux with ASMLIB ORCL:*
Linux with ASMLIB /dev/oracleasm/disks/*
HPUX /dev/rdsk/*
HP-UX(Tru 64) /dev/rdsk/*
AIX /dev/*

2.候选磁盘设备应当属于安装ASM的Oracle软件的用户,否则使用chown或卷管理软件修改磁盘拥有者,并保证磁盘被正确加载

3.候选磁盘设备应当设置合理的权限,一般为660,如果存在问题那么可以临时设为770以便测试

4.RAC环境中需要注意所有的磁盘设备应当在所有节点都可见,建议使用cluvfy工具验证

沪ICP备14014813号-2

沪公网安备 31010802001379号