【转】Find block in ASM

转自asmsupportguy blog

In the post Where is my data I have shown how to locate and extract an Oracle datafile block from ASM. To make things easier, I have now created a Perl script find_block.pl that automates the process – you provide the datafile name and the block number, and the script generates the command to extract the data block from ASM.

 

find_block.pl

 

#!$ORACLE_HOME/perl/bin/perl -w
#
# The find_block.pl constructs the command(s) to extract a block from ASM.
# For a complete info about this script see ASM Support Guy blog post:
# http://asmsupportguy.blogspot.com/2014/10/find-block-in-asm.html
#
# Copyright (C) 2014 Bane Radulovic
#
# This program is free software: you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software
# Foundation, either version 3 of the License, or any later version.
# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details
# at http://www.gnu.org/licenses/.
#
# Version 1.00, Oct 2014
# The initial release.
#
# Version 1.01, Oct 2014
# Minor improvements.
#
# Version 1.02, Oct 2014
# Added support for AFD disks.
#
# Version 1.03, Nov 2014
# Added sanity checks, e.g. if the requested block is reasonable,
# if the specified filename is valid, etc.
#
# Version 1.04, Nov 2014
# Improved the check for Exadata storage cell based disk.
#
use strict;
use DBI;
use DBD::Oracle qw(:ora_session_modes);
use POSIX;
# Handle the version query
die "find_block.pl version 1.04\n"
 if ( $ARGV[0] =~ /^-v/i );
# Check the number of input arguments
die "Usage: \$ORACLE_HOME/perl/bin/perl find_block.pl filename block\n"
 unless ( @ARGV == 2 );
# Get the filename from the first input argument
my $filename = shift @ARGV;
# Check if the filename makes sense.
# The 'minimum' filename is +DGNAME/filename,
# i.e. it has to begin with the '+' followed by a disk group name,
# followed by at least one '/', followed by directory or file name...
die "Error: The $filename is not a valid file name.\n"
 unless ( $filename =~ /^\+\w/ && $filename =~ /\/\w/ );
# Get the disk group name out of the user specified filename
my $diskgroup_name = substr($filename, 1, index($filename, "/") -1 );
# Get the ASM file name out of the user specified filename
my $asmfile = substr($filename, rindex($filename, "/") +1 );
# Get the block number from the second input argument
my $block_number = shift @ARGV;
# Check if the block number is an integer
die "Usage: \$ORACLE_HOME/perl/bin/perl find_block.pl filename block\n"
 unless ( $block_number =~ /^\d+$/ );
# Check if the ASM SID is set
die "Error: ASM SID not set.\n"
 unless ( $ENV{ORACLE_SID} =~ /\+ASM/ );
# Connect to the (local) ASM instance
my $dbh = DBI->connect('dbi:Oracle:', "", "", { ora_session_mode => ORA_SYSDBA })
 or die "$DBI::errstr\n";
# Check if the disk group exists and if it is mounted
my $group_number = &asm_diskgroup("group_number", $diskgroup_name);
die "Error: Disk group $diskgroup_name not mounted or does not exist.\n"
 unless ( $group_number );
# Check if the user specified file exists in the disk group
my $file_number = &asm_alias("file_number", $asmfile, $group_number);
die "Error: File $asmfile does not exist in disk group $diskgroup_name.\n"
 unless ( $file_number );
# Get the block size for the file
my $block_size = &asm_file("block_size", $group_number, $file_number);
# Get the number of blocks in the file
my $file_blocks = &asm_file("blocks", $group_number, $file_number);
# Check if the user specified block number makes sense
die "Error: Block range for file $asmfile is: 0 - $file_blocks.\n"
 unless ( $block_number >= 0 && $block_number <= $file_blocks );
# Get the disk group AU size
my $au_size = &asm_diskgroup("allocation_unit_size", $diskgroup_name);
# Work out the blocks per AU and the virtual extent number
my $blocks_per_au = $au_size/$block_size;
my $xnum_kffxp = floor($block_number/$blocks_per_au);
# Get the disk and AU numbers into the @disk_au array
my @disk_au = &asm_kffxp($file_number, $group_number, $xnum_kffxp);
die "Could not get any disk and AU numbers for file $asmfile.\n"
 unless ( @disk_au );
# Get the disk path(s) and generate the block extract command(s)
while ( @disk_au ) {
 # Do not assume anything
 my $storage_cell = "FALSE";
 # Get the disk number from @disk_au
 my $disk_number = shift @disk_au;
 # Get the AU number from @disk_au
 my $au_number = shift @disk_au;
 # Get the path for that disk number
 my $path = &asm_disk("path", $group_number, $disk_number);
 # If there is no path move to the next disk
 if ( ! $path ) {
  next;
  }
 # If ASMLIB is in use, the path will return ORCL:DISKNAME.
 # Set the path to /dev/oracleasm/disks/DISKNAME
 elsif ( $path =~ /ORCL:(.*)/ ) {
  $path = "/dev/oracleasm/disks/".$1;
  }
 # If ASM Filter Driver (AFD) is in use, the path will return AFD:DISKNAME.
 # Get the actual path from /dev/oracleafd/disks/DISKNAME
 elsif ( $path =~ /AFD:(.*)/ ) {
  if ( ! open AFDDISK, "/dev/oracleafd/disks/".$1 ) { next }
  else { chomp($path = <AFDDISK>) }
  }
 # For Exadata storage cell based disk, the path will start with o/IP address
 elsif ( $path =~ /^o\/\d{1,3}\./ ) {
  $storage_cell = "TRUE";
  }
 if ( $storage_cell eq "TRUE" ) {
  # Construct the kfed command for Exadata storage cell based disk
  # dev=$path ausz=$au_size aunum=$au_number blksz=$block_size blknum=$block_number
  # The grep filters out the kfed stuff
  print "kfed read dev=$path ausz=$au_size aunum=$au_number blksz=$block_size blknum=$block_number | grep -iv ^kf > block_$block_number.txt\n";
  }
 else {
  # Construct the dd command
  # if=$path bs=$block_size count=1 skip=$skip of=block_$block_number.dd
  my $skip=$au_number*$blocks_per_au + $block_number%$blocks_per_au;
  print "dd if=$path bs=$block_size count=1 skip=$skip of=block_$block_number.dd\n";
  }
 }
# We are done. Disconnect from the (local) ASM instance
$dbh->disconnect;
# Subs
# Get a column from v$asm_file for a given group number and file number
sub asm_file {
 my $col = shift @_;
 my $group_number = shift @_;
 my $file_number = shift @_;
 my $sql = $dbh->prepare("select $col from v\$asm_file where group_number=$group_number and file_number=$file_number");
 $sql->execute;
 my $col_value = $sql->fetchrow_array;
 $sql->finish;
 return $col_value;
 }
# Get a column from v$asm_alias for a given (file) name and group number
sub asm_alias {
 my $col = shift @_;
 my $name = shift @_;
 my $group_number = shift @_;
 my $sql = $dbh->prepare("select $col from v\$asm_alias where lower(name)=lower('$name') and group_number=$group_number");
 $sql->execute;
 my $col_value = $sql->fetchrow_array;
 $sql->finish;
 return $col_value;
 }
# Get a column from v$asm_diskgroup for a given disk group name
sub asm_diskgroup {
 my $col = shift @_;
 my $name = shift @_;
 my $sql = $dbh->prepare("select $col from v\$asm_diskgroup where name=upper('$name')");
 $sql->execute;
 my $col_value = $sql->fetchrow_array;
 $sql->finish;
 return $col_value;
 }
# Get a column from v$asm_disk for a given group number and disk number
sub asm_disk {
 my $col = shift @_;
 my $group_number = shift @_;
 my $disk_number = shift @_;
 my $sql = $dbh->prepare("select $col from v\$asm_disk where group_number=$group_number and disk_number=$disk_number");
 $sql->execute;
 my $col_value = $sql->fetchrow_array;
 $sql->finish;
 return $col_value;
 }
# Get the disk and AU numbers from x$kffxp for a given virtual extent number.
# This will return one row for an external redundancy file,
# two rows for a normal redundancy and three rows for a high redundancy.
# Well, it will return an array with disk and AU pairs, not rows.
sub asm_kffxp {
 my $file_number = shift @_;
 my $group_number = shift @_;
 my $xnum = shift @_;
 # The @disk_au array to hold the disk number, AU number rows
 my @disk_au;
 my $sql = $dbh->prepare("select disk_kffxp, au_kffxp from x\$kffxp where number_kffxp=$file_number and group_kffxp=$group_number and xnum_kffxp=$xnum");
 $sql->execute;
 # Expecting one disk number and one AU number per row
 while ( my @row = $sql->fetchrow_array) {
  # Add each (element of the) row to @disk_au array
  foreach ( @row ) { push @disk_au, $_ }
  }
 $sql->finish;
 return @disk_au;
 }

The find_block.pl is a Perl script that constructs the dd or the kfed command to extract a block from ASM. It should work with all Linux and Unix ASM versions and with local (non-flex) ASM in the standalone (single instance) or cluster environments.

 

The script should be run as the ASM/Grid Infrastructure owner, using the perl binary in the ASM oracle home. In a cluster environment, the script can be run from any node. Before running the script, set the ASM environment and make sure the ORACLE_SID, ORACLE_HOME, LD_LIBRARY_PATH, etc are set correctly. For ASM versions 10g and 11gR1, also set the environment variable PERL5LIB, like this:

 

export PERL5LIB=$ORACLE_HOME/perl/lib/5.8.3:$ORACLE_HOME/perl/lib/site_perl

 

Run the script as follows:

 

$ORACLE_HOME/perl/bin/perl find_block.pl filename block

 

Where:

  • filename is the name of the file from which to extract the block. For a datafile, the file name can be obtained from the database instance with SELECT NAME FROM V$DATAFILE.
  • block is the block number to be extracted from ASM.

 

The output should look like this:

 

dd if=[ASM disk path] … of=block_N.dd

 

Or in Exadata:

 

kfed read dev=[ASM disk path] … > block_N.txt

 

If the file redundancy is external, the script would generate a single command. For a normal redundancy file, the script would generate two commands, and for the high redundancy file the script would generate three commands.

 

Example with ASM version 10.2.0.1

 

The first example is with a single instance ASM version 10.2.0.1. I first create the table and insert some data, in the database instance, of course.

 

[oracle@cat10g ~]$ sqlplus / as sysdba

 

SQL*Plus: Release 10.2.0.1.0 – Production on [date]

 

SQL> create table TAB1 (name varchar2(16)) tablespace USERS;

 

Table created.

 

SQL> insert into TAB1 values (‘CAT’);

 

1 row created.

 

SQL> insert into TAB1 values (‘DOG’);

 

1 row created.

 

SQL> commit;

 

Commit complete.

 

SQL> select ROWID, NAME from TAB1;

 

ROWID              NAME

—————— ——————————–

AAANE+AAEAAAAGHAAA CAT

AAANE+AAEAAAAGHAAB DOG

 

SQL> select DBMS_ROWID.ROWID_BLOCK_NUMBER(‘AAANE+AAEAAAAGHAAA’) “Block” from dual;

 

Block

———

391

 

SQL> select t.name “Tablespace”, f.name “Datafile”

from v$tablespace t, v$datafile f

where t.ts#=f.ts# and t.name=’USERS’;

 

Tablespace   Datafile

———— ————————————–

USERS        +DATA/cat/datafile/users.259.783204313

 

SQL>

 

Switch to the ASM environment, set PERL5LIB, and run the script.

 

$ export PERL5LIB=$ORACLE_HOME/perl/lib/5.8.3:$ORACLE_HOME/perl/lib/site_perl

$ $ORACLE_HOME/perl/bin/perl find_block.pl +DATA/cat/datafile/users.259.783204313 391

dd if=/dev/oracleasm/disks/ASMDISK01 bs=8192 count=1 skip=100359 of=block_391.dd

$

 

From the output of the find_block.pl, I see that the specified file is external redundancy, as the script produced a single dd command. Run the dd command:

 

$ dd if=/dev/oracleasm/disks/ASMDISK01 bs=8192 count=1 skip=100359 of=block_391.dd

$

 

Looking at the content of the block_3237.dd file, with the od utility, I see the data inserted in the table:

 

$ od -c block_391.dd | tail -3

0017740 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 , 001

0017760 001 003 D O G , 001 001 003 C A T 001 006 u   G

0020000

$

 

Example with ASM version 12.1.0.1 in Exadata

 

In Exadata we cannot use the dd command to extract the block, as the ASM disks are not visible from the database server. To get the database block, we can use the kfedtool, so the find_block.pl will construct a kfed command that can be used to extract the block from ASM.

 

Let’s have a look at an example with ASM version 12.1.0.1, in a two node cluster, with the datafile in a pluggable database in Exadata.

 

As in the previous example, I first create the table and insert some data.

 

$ sqlplus / as sysdba

 

SQL*Plus: Release 12.1.0.1.0 Production on [date]

 

SQL> alter pluggable database BR_PDB open;

 

Pluggable database altered.

 

SQL> show pdbs

 

CON_ID CON_NAME OPEN MODE   RESTRICTED

—— ——– ———– ———-

2 PDB$SEED READ ONLY   NO

5 BR_PDB   READ WRITE  NO

 

SQL>

 

$ sqlplus bane/welcome1@BR_PDB

 

SQL*Plus: Release 12.1.0.1.0 Production on [date]

 

SQL> create table TAB1 (n number, name varchar2(16)) tablespace USERS;

 

Table created.

 

SQL> insert into TAB1 values (1, ‘CAT’);

 

1 row created.

 

SQL> insert into TAB1 values (2, ‘DOG’);

 

1 row created.

 

SQL> commit;

 

Commit complete.

 

SQL> select t.name “Tablespace”, f.name “Datafile”

from v$tablespace t, v$datafile f

where t.ts#=f.ts# and t.name=’USERS’;

 

Tablespace Datafile

———- ———————————————

USERS      +DATA/CDB/054…/DATAFILE/users.588.860861901

 

SQL> select ROWID, NAME from TAB1;

 

ROWID              NAME

—————— —-

AAAWYEABfAAAACDAAA CAT

AAAWYEABfAAAACDAAB DOG

 

SQL> select DBMS_ROWID.ROWID_BLOCK_NUMBER(‘AAAWYEABfAAAACDAAA’) “Block number” from dual;

 

Block number

————

131

 

SQL>

 

Switch to the ASM environment, and run the script.

 

$ $ORACLE_HOME/perl/bin/perl find_block.pl +DATA/CDB/0548068A10AB14DEE053E273BB0A46D1/DATAFILE/users.588.860861901 131

kfed read dev=o/192.168.1.9/DATA_CD_03_exacelmel05 ausz=4194304 aunum=16212 blksz=8192 blknum=131 | grep -iv ^kf > block_131.txt

kfed read dev=o/192.168.1.11/DATA_CD_09_exacelmel07 ausz=4194304 aunum=16267 blksz=8192 blknum=131 | grep -iv ^kf > block_131.txt

 

Note that the find_block.pl generated two commands, as that datafile is normal redundancy. Run one of the commands:

 

$ kfed read dev=o/192.168.1.9/DATA_CD_03_exacelmel05 ausz=4194304 aunum=16212 blksz=8192 blknum=131 | grep -iv ^kf > block_131.txt

$

 

Review the content of the block_131.txt file (note that this is a text file). Sure enough I see my DOG and my CAT:

 

$ more block_131.txt

FD5106080 00000000 00000000 …  […………….]

Repeat 501 times

FD5107FE0 00000000 00000000 …  [……..,……D]

FD5107FF0 012C474F 02C10202 …  [OG,……CAT..,-]

$

 

Find any block

 

The find_block.pl can be used to extract a block from any file stored in ASM. Just for fun, I ran the script on a controlfile and a random block:

 

$ $ORACLE_HOME/perl/bin/perl find_block.pl +DATA/CDB/CONTROLFILE/current.289.843047837 5

kfed read dev=o/192.168.1.9/DATA_CD_10_exacelmel05 ausz=4194304 aunum=73 blksz=16384 blknum=5 | grep -iv ^kf > block_5.txt

kfed read dev=o/192.168.1.11/DATA_CD_01_exacelmel07 ausz=4194304 aunum=66 blksz=16384 blknum=5 | grep -iv ^kf > block_5.txt

kfed read dev=o/192.168.1.10/DATA_CD_04_exacelmel06 ausz=4194304 aunum=78 blksz=16384 blknum=5 | grep -iv ^kf > block_5.txt

$

 

Keen observer will notice that the script worked out the correct block size for the controlfile (16k) and that it generated three different commands. While the disk group DATA is normal redundancy, the controlfile is high redundancy (default redundancy for the controlfile in ASM).

 

Conclusion

 

The find_block.pl is a Perl script that construct the dd or the kfed command to extract a block from a file in ASM. In most cases we want to extract a block from a datafile, but the script can be used to extract a block from a controlfile, redo log or any other file in ASM.

 

If the file is external redundancy, the script will generate a single command, that can be used to extract the block from the ASM disk.

 

If the file is normal redundancy, the script will generate two commands, that can be used to extract the (copies of the same) block from two different ASM disks. This can be handy, for example in cases where a corruption is reported against one of the blocks and for some reason the ASM cannot repair it.

 

If the file is high redundancy, the script will generate three commands.

 

To use the script you don’t have to know the file redundancy, the block size or any other file attribute. All that is required is the file name and the block number.

Oracle ORA-15196 ORACLE ASMトラブル解析

プロのOracle Databaseの復旧サービスを提供
携帯番号: +86 13764045638 メール:service@parnassusdata.com

一般的、ASMメータデータブロックが(metadata block)にとてつもないトラブルが検証された場合に限って、この文で紹介するORA-15196エラを報告する。

 

ORA-15196エラのフォーマットは:

 

ORA-15196: invalid ASM block header [1st] [2nd] [3rd] [4th] [5th != 6th]

 

関連する変数の意味は:

 

1st, このエラを引き起こしたOracleカーネル関数の名前とコードの行数

2nd, トラブルを引き起こしたエリアの名前を検証する

3rd , そのブロックに格納されたASM番号      http://www.parnassusdata.com/

4th,そのブロックに格納されたASMブロック番号

5th, 2ndで指定していたエリアで実際に格納していた数値

6th,  2ndで指定していたエリアで格納したい数値

 

 

例えば:

 

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

 

 

 

kfc.c:7997は今回のORA-15196エラを引き起こしたコードはkfc.cでカーネルソースコードは7997のコード

endian_kfbh :トラブルがあるエリアの名前を検証する。endian_kfbhはendianの属性をスクライブするときに使われる。

そのブロックに格納されたASM番号:1

そのブロックに格納されたASM番号:93

2nd 2ndで指定していたエリアで実際に格納していた数値

2ndで指定していたエリアで格納したい数値:0

 

 

 

前のバラメタを解析する

 

  • 検証でトラブルのエリア名前を探し出した。

ASM metadataメータデータはいろんな構造により構造された。例えば、FILE directoryファイルディレクトリ、Disk Directoryディスクディレクトリ、,Active Change Directory(ACDC)これらの情報はASM の第一号ファイルから255号ファイルに格納される。各ファイルはASM Extentに構造される。またExtentは大きさが4096 bytesのASM blockで構造される(Metadata Fileに限って、このようになるが、Oracle DatafileデータファイルはDB_BLOCK_SIZE大きさのデータブロックで構造される)。ここで、各ASM BLOCKに必要なブロックヘッダKFBH,を含んでいて、以下のようになる:

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                       1 ; 0x004: blk=1

kfbh.block.obj:                       1 ; 0x008: file=1

kfbh.check:                   325804796 ; 0x00c: 0x136b62fc

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

 

 

 

KFBHはせいぜいC言語ヘッダファイルで定義されたStructureだけで、定義は以下の通りになる:

 

endian_kfbh       /* endianness of writer              */

hard_kfbh         /* H.A.R.D. magic # and block size   */

type_kfbh      /* metadata block type               */

datfmt_kfbh       /* metadata block data format        */

block_kfbh       /* block location of this block      */

check_kfbh        /* check value to verify consistency */

fcn_kfbh         /* change number of last change      */

spare1_kfbh       /* zero pad out to 32 bytes          */

spare2_kfbh       /* zero pad out to 32 bytes          */

 

 

以上各エリアもトラブルトラブルのエリアを検証する。例えば:

 

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

ORA-15196 INVALID ASM BLOCK HEADER [KFC.C 8064] [CHECK_KFBH] [2147483827] [2]

 

 

  • ASM Blockに格納されたASMオブジェクト番号object_number

 

各ASM metadata blockは指定していたASM 構造メータファイルに該当している。kfbh.block.objはブロックヘッダのオブジェクト番号情報で、ASMのファイル番号でもある。例えば以下のように、つまり、1号ファイルKFBTYP_FILEDIR File Directory

 

[oracle@mlab2 ~]$ kfed  read /oracleasm/asm-disk01 aun=2 blkn=1 aus=4194304|less

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                       1 ; 0x004: blk=1

kfbh.block.obj:                       1 ; 0x008: file=1

kfbh.check:                   325804796 ; 0x00c: 0x136b62fc

kfbh.fcn.base:                        0 ; 0x010: 0x00000000

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

 

 

ASM File number: ASM Metadata: kfbh.type の関係

 

ASM File number ASM Metadata kfbh.type
1 File Directory KFBTYP_FILEDIR
2 Disk Directory KFBTYP_DISKDIR
3 Active Change Directory (ACD) KFBTYP_CHNGDIR
4 Continous Operations Directory (COD) KFBTYP_COD_DATA
5 Template Directory KFBTYP_TMPLTDIR
6 Alias Directory KFBTYP_ALIASDIR
9 Attributes Directory KFBTYP_ATTRDIR
12 Staleness Directory KFBTYP_STALEDIR

 

 

別のmetadata構造、例えばPST、KFBTYP_DISKHEADなど。そのkfbh.block.objはいつも2147483648で、つまり16進数 0x 80000000、例えば:

 

[oracle@mlab2 ~]$ kfed  read /oracleasm/asm-disk01 aun=0 blkn=0 aus=4194304|less

kfbh.endian:                          1 ; 0x000: 0x01

kfbh.hard:                          130 ; 0x001: 0x82

kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD

kfbh.datfmt:                          1 ; 0x003: 0x01

kfbh.block.blk:                       0 ; 0x004: blk=0

kfbh.block.obj:              2147483648 ; 0x008: disk=0

kfbh.check:                   852050979 ; 0x00c: 0x32c94423

kfbh.fcn.base:                   106569 ; 0x010: 0x0001a049

kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000

kfbh.spare1:                          0 ; 0x018: 0x00000000

kfbh.spare2:                          0 ; 0x01c: 0x00000000

 

 

 

 

 

  • ASM blockに格納されたブロック番号 block number

 

一つのASM Fileは複数のExtentによって構造される。一つのExtentもAllocation Unitである。複数のallocation Units(Variable Extent Size)もいける。 一つのExtentは複数のASM metadata block(一般的に大きさは4096 bytes)に構造される。ディフォルト1MB AUで、一つのextentは256のASM metadata blockに該当する。

 

 

kfbh.block.blkはあるファイルのブロック番号と意味している。例えば、kfbh.block.blk=93。では、あるファイルに格納されたextentにある。けど、そのextentはDisk groupにどんなASM diskのどんなAUでも構造できる。

 

 

  • 2nd指定していたエリアで実際に格納した数値

 

つまり二つ目に実際に格納された数値。

 

 

  • 2nd指定していたエリアで理論的に格納した数値

つまり二つ目に理論的に格納された数値。

 

例えば:

 

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

 

 

これは一つ目のファイルの93ブロックのendian_kfbhエリアがkfc.cの7997コードに対して検証するときに見つけ出した。実際に格納された数値は211が実際にendian_kfbh一般的に二つの数値を含んでいて: 0x01 あるいは 0x00。0x01はLittle Endianで、0x00はBig Endianである。 211という数値はいけない数値で、大事な一つ目の93ブロックに損害が現れたと意味している。

 

 

ORA-15196に関するBUG Noteは以下の通り:

Bug 14545129 – ORA-15196_ INVALID ASM BLOCK HEADER [KXDAM
Bug 14740185 – ASM REPORTED CORRUPTED AT BLOCKS _ ORA-15196_ INVALID ASM BLOCK HEADER [KFC
Bug 5554692 – Error ORA-15196 reporting ASM block header invalid ora-15196 after ORA-.pdf
ORA-15196 and ORA-600 [kfgpn lclenq] Resizing Datafile or Dropping User (Doc ID 759379.1)
ORA-15196 WITH ASM DISKS LARGER THAN 2TB (Doc ID 736891.1)
Querying V$ASM_FILE Gives ORA-15196 After ASM Was Upgraded From 10gR2 To 11gR2 with an AU size  1M (Doc ID 1145365.1)
Bug 11801536 ORA-15196 INVALID ASM BLOCK HEADER [KFC.C 8064] [CHECK_KFBH] [2147483827]
Bug 13605059 – ORA-15196_ INVALID ASM BLOCK HEADER [KFC

 

 

 

 

 

けど、ORA-15196エラを引き起こしたのはORACLE自身のBUGではなく、ORACLE以外の原因で導いた結果だから:

 

 

 

 

 

 

 

  1. ASMが使っているディスクはOSレベルからフォーマットされた。あるいはヘッダが上書きされた。例えば、AIXでchdevがpvヘッダを設定した。
  2. ASMが使っているディスクがファイルシステムとされた。一般的に、これは誤操作によるものである。例えば、SAシステム管理者はあるASMディスクがオペレーションシステムに使われたから、無駄使いと認定して、mkfsする。
  3. IOリンクあるいはストレージトラブル。例えば書き込みをなくしたなど
  4. 別のアプリによるもの

 

 

 

  • 必要なデータ

 

 

ORA-15196 に対して、詩檀は以下のデータを収集することを勧めている:

 

Oracle アラームログalert.logと関連するtraceファイル

トラブルがあるASM Diskの300MBデータのDD

 

まずはalert.logの内容でトラブルが起こったASM Disk番号を確認する:

WARNING: cache failed to read fn=1 blk=80 from disk(s): 0

ORA-15196: invalid ASM block header [kfc.c:7997] [endian_kfbh] [1] [93] [211 != 0]

 

ここでdisk(s):0。DISK_NUMBER=0のASM DISKで後でddコマンドで300MBデータをコピする:

 

$dd if=<device path> of=/tmp/disk.dd bs= 1048576 count=300

ORA-15042 ORA-15040 ORA-15032 ASM add disk

プロのOracle Databaseの復旧サービスを提供
携帯番号: +86 13764045638 メール:service@parnassusdata.com

こんな可能性もある:ORACLE ASMがディスクを無事に追加したが,disk groupのrebalanceが始まっていない。けど、追加したdiskにはハードウェアトラブルが起きて、ディスクを追加したあと、ディスクヘッダに書き込むすべてのメータデータもなくした。それに、diskgroupは外部のリダンダンシーで、つまりEXTERNAL Redundancyである。だからそのdiskgroupはあるDISKに追加した。そのDISKのmetadataを全部なくしたから、diskgroup がまともにMOUNTできない。
disk headerのKFBTYP_DISKHEADだけじゃなく、追加したdiskのすべてのmetadataもなくしたから、KFBTYP_DISKHEADの情報をkfed mergeでリカバリできない。特別な人工処置でこのトラブルを避けられる。
以下のインスタンスのように:

SUCCESS: diskgroup TESTDG03 was created
NOTE: cache deleting context for group TESTDG03 1/0x86485c30
NOTE: cache registered group TESTDG03 number=1 incarn=0xab385c36
NOTE: cache began mount (first) of group TESTDG03 number=1 incarn=0xab385c36
NOTE: Assigning number (1,3) to disk (/oracleasm/asm-disk04)
NOTE: Assigning number (1,2) to disk (/oracleasm/asm-disk03)
NOTE: Assigning number (1,1) to disk (/oracleasm/asm-disk02)
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Thu Jan 29 08:21:07 2015
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 92 for pid 20, osid 20176
Thu Jan 29 08:21:07 2015
NOTE: cache opening disk 0 of grp 1: TESTDG03_0000 path:/oracleasm/asm-disk01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 1: TESTDG03_0001 path:/oracleasm/asm-disk02
NOTE: cache opening disk 2 of grp 1: TESTDG03_0002 path:/oracleasm/asm-disk03
NOTE: cache opening disk 3 of grp 1: TESTDG03_0003 path:/oracleasm/asm-disk04
NOTE: cache mounting (first) external redundancy group 1/0xAB385C36 (TESTDG03)
NOTE: cache recovered group 1 to fcn 0.0
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Thu Jan 29 08:21:07 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR found thread 1 closed at ABA 0.10750
NOTE: LGWR mounted thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR opening thread 1 at fcn 0.0 ABA 2.0
NOTE: setting 11.2 start ABA for group TESTDG03 thread 1 to 2.0
NOTE: cache mounting group 1/0xAB385C36 (TESTDG03) succeeded
NOTE: cache ending mount (success) of group TESTDG03 number=1 incarn=0xab385c36
GMON querying group 1 at 93 for pid 13, osid 4612
Thu Jan 29 08:21:07 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup TESTDG03 was mounted
SUCCESS: CREATE DISKGROUP TESTDG03 EXTERNAL REDUNDANCY DISK ‘/oracleasm/asm-disk01’ SIZE 129500M ,
‘/oracleasm/asm-disk02’ SIZE 128800M ,
‘/oracleasm/asm-disk03’ SIZE 129200M ,
‘/oracleasm/asm-disk04’ SIZE 128800M ATTRIBUTE ‘compatible.asm’=’11.2.0.0.0′,’au_size’=’1M’ /* ASMCA */
Thu Jan 29 08:21:07 2015
NOTE: diskgroup resource ora.TESTDG03.dg is online
NOTE: diskgroup resource ora.TESTDG03.dg is updated
Thu Jan 29 08:21:23 2015
SQL> alter diskgroup testdg03 add disk ‘/oracleasm/asm-disk06’
ORA-15032: not all alterations performed
ORA-15260: permission denied on ASM disk group
ERROR: alter diskgroup testdg03 add disk ‘/oracleasm/asm-disk06’
Thu Jan 29 08:21:31 2015
SQL> alter diskgroup testdg03 add disk ‘/oracleasm/asm-disk06’
NOTE: Assigning number (1,4) to disk (/oracleasm/asm-disk06)
NOTE: requesting all-instance membership refresh for group=1
NOTE: initializing header on grp 1 disk TESTDG03_0004
NOTE: requesting all-instance disk validation for group=1
Thu Jan 29 08:21:32 2015
NOTE: skipping rediscovery for group 1/0xab385c36 (TESTDG03) on local instance.
NOTE: requesting all-instance disk validation for group=1
NOTE: skipping rediscovery for group 1/0xab385c36 (TESTDG03) on local instance.
NOTE: initiating PST update: grp = 1
Thu Jan 29 08:21:32 2015
GMON updating group 1 at 94 for pid 21, osid 22706
NOTE: PST update grp = 1 completed successfully
NOTE: membership refresh pending for group 1/0xab385c36 (TESTDG03)
GMON querying group 1 at 95 for pid 13, osid 4612
NOTE: cache opening disk 4 of grp 1: TESTDG03_0004 path:/oracleasm/asm-disk06
GMON querying group 1 at 96 for pid 13, osid 4612
SUCCESS: refreshed membership for 1/0xab385c36 (TESTDG03)
SUCCESS: alter diskgroup testdg03 add disk ‘/oracleasm/asm-disk06’
NOTE: Attempting voting file refresh on diskgroup TESTDG03
Thu Jan 29 08:22:09 2015
SQL> alter diskgroup testdg03 dismount
NOTE: cache dismounting (clean) group 1/0xAB385C36 (TESTDG03)
NOTE: messaging CKPT to quiesce pins Unix process pid: 22730, image: oracle@mlab2.oracle.com (TNS V1-V3)
Thu Jan 29 08:22:10 2015
NOTE: LGWR doing clean dismount of group 1 (TESTDG03)
NOTE: LGWR closing thread 1 of diskgroup 1 (TESTDG03) at ABA 2.15
NOTE: cache dismounted group 1/0xAB385C36 (TESTDG03)
Thu Jan 29 08:22:10 2015
GMON dismounting group 1 at 97 for pid 21, osid 22730
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
SUCCESS: diskgroup TESTDG03 was dismounted
NOTE: cache deleting context for group TESTDG03 1/0xab385c36
Thu Jan 29 08:22:10 2015
NOTE: diskgroup resource ora.TESTDG03.dg is offline
SUCCESS: alter diskgroup testdg03 dismount
NOTE: diskgroup resource ora.TESTDG03.dg is updated
SQL> alter diskgroup testdg03 mount
NOTE: cache registered group TESTDG03 number=1 incarn=0x83f85c5f
NOTE: cache began mount (first) of group TESTDG03 number=1 incarn=0x83f85c5f
NOTE: Assigning number (1,3) to disk (/oracleasm/asm-disk04)
NOTE: Assigning number (1,2) to disk (/oracleasm/asm-disk03)
NOTE: Assigning number (1,1) to disk (/oracleasm/asm-disk02)
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Thu Jan 29 08:22:22 2015
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 100 for pid 21, osid 22730
Thu Jan 29 08:22:22 2015
NOTE: Assigning number (1,4) to disk ()
GMON querying group 1 at 101 for pid 21, osid 22730
NOTE: cache dismounting (clean) group 1/0x83F85C5F (TESTDG03)
NOTE: messaging CKPT to quiesce pins Unix process pid: 22730, image: oracle@mlab2.oracle.com (TNS V1-V3)
NOTE: dbwr not being msg’d to dismount
NOTE: lgwr not being msg’d to dismount
NOTE: cache dismounted group 1/0x83F85C5F (TESTDG03)
NOTE: cache ending mount (fail) of group TESTDG03 number=1 incarn=0x83f85c5f
NOTE: cache deleting context for group TESTDG03 1/0x83f85c5f
GMON dismounting group 1 at 102 for pid 21, osid 22730
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
ERROR: diskgroup TESTDG03 was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk “4” is missing from group number “1”
ERROR: alter diskgroup testdg03 mount
Thu Jan 29 08:27:37 2015
SQL> alter diskgroup testdg03 mount
NOTE: cache registered group TESTDG03 number=1 incarn=0x56985c64
NOTE: cache began mount (first) of group TESTDG03 number=1 incarn=0x56985c64
NOTE: Assigning number (1,3) to disk (/oracleasm/asm-disk04)
NOTE: Assigning number (1,2) to disk (/oracleasm/asm-disk03)
NOTE: Assigning number (1,1) to disk (/oracleasm/asm-disk02)
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Thu Jan 29 08:27:43 2015
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 105 for pid 21, osid 23017
NOTE: cache opening disk 0 of grp 1: TESTDG03_0000 path:/oracleasm/asm-disk01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 1: TESTDG03_0001 path:/oracleasm/asm-disk02
NOTE: cache opening disk 2 of grp 1: TESTDG03_0002 path:/oracleasm/asm-disk03
NOTE: cache opening disk 3 of grp 1: TESTDG03_0003 path:/oracleasm/asm-disk04
NOTE: cache mounting (first) external redundancy group 1/0x56985C64 (TESTDG03)
NOTE: cache recovered group 1 to fcn 0.609
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Thu Jan 29 08:27:43 2015
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR found thread 1 closed at ABA 2.15
NOTE: LGWR mounted thread 1 for diskgroup 1 (TESTDG03)
NOTE: LGWR opening thread 1 at fcn 0.609 ABA 3.16
NOTE: cache mounting group 1/0x56985C64 (TESTDG03) succeeded
NOTE: cache ending mount (success) of group TESTDG03 number=1 incarn=0x56985c64
GMON querying group 1 at 106 for pid 13, osid 4612
Thu Jan 29 08:27:43 2015
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup TESTDG03 was mounted
SUCCESS: alter diskgroup testdg03 mount
Thu Jan 29 08:27:43 2015
NOTE: diskgroup resource ora.TESTDG03.dg is online
NOTE: diskgroup resource ora.TESTDG03.dg is updated
Thu Jan 29 08:33:52 2015
SQL> alter diskgroup testdg03 check all norepair
NOTE: starting check of diskgroup TESTDG03
Thu Jan 29 08:33:52 2015
GMON checking disk 0 for group 1 at 107 for pid 21, osid 23017
GMON checking disk 1 for group 1 at 108 for pid 21, osid 23017
GMON checking disk 2 for group 1 at 109 for pid 21, osid 23017
GMON checking disk 3 for group 1 at 110 for pid 21, osid 23017
ERROR: no kfdsk for (4)
ERROR: check of diskgroup TESTDG03 found 1 total errors
ORA-15049: diskgroup “TESTDG03” contains 1 error(s)
ORA-15032: not all alterations performed
ORA-15049: diskgroup “TESTDG03” contains 1 error(s)
ERROR: alter diskgroup testdg03 check all norepair
Thu Jan 29 08:34:07 2015
SQL> alter diskgroup testdg03 check all
NOTE: starting check of diskgroup TESTDG03
Thu Jan 29 08:34:07 2015
GMON checking disk 0 for group 1 at 111 for pid 21, osid 23017
GMON checking disk 1 for group 1 at 112 for pid 21, osid 23017
GMON checking disk 2 for group 1 at 113 for pid 21, osid 23017
GMON checking disk 3 for group 1 at 114 for pid 21, osid 23017
ERROR: no kfdsk for (4)
ERROR: check of diskgroup TESTDG03 found 1 total errors
ORA-15049: diskgroup “TESTDG03” contains 1 error(s)
ORA-15032: not all alterations performed
ORA-15049: diskgroup “TESTDG03” contains 1 error(s)
ERROR: alter diskgroup testdg03 check all
SQL> alter diskgroup testdg03 check all repair
NOTE: starting check of diskgroup TESTDG03
GMON checking disk 0 for group 1 at 115 for pid 21, osid 23017
GMON checking disk 1 for group 1 at 116 for pid 21, osid 23017
GMON checking disk 2 for group 1 at 117 for pid 21, osid 23017
GMON checking disk 3 for group 1 at 118 for pid 21, osid 23017
ERROR: no kfdsk for (4)
ERROR: check of diskgroup TESTDG03 found 1 total errors
ORA-15049: diskgroup “TESTDG03” contains 1 error(s)
ORA-15032: not all alterations performed
ORA-15049: diskgroup “TESTDG03” contains 1 error(s)
ERROR: alter diskgroup testdg03 check all repair

[oracle@mlab2 oracleasm]$ oerr ora 15042
15042, 00000, “ASM disk \”%s\” is missing from group number \”%s\” ”
// *Cause: The specified disk, which is a necessary part of a diskgroup,
// could not be found on the system.
// *Action: Check the hardware configuration.
//
ORA-15042トラブルの原因はディスクを追加するときにすべてのメータデータをなくした。けど、追加したばかりのディスクが意味があるデータをまだrebalanceしていない。ASMはディスクを追加したから、そのディスクを使えるようになってから、diskgroupをmountできる。DISKGROUPはMOUNTの状態だけでディスクをドロップできるから。それについてユーザーがそのDISKを強制的にDROPできなくなったと思っている。それでループに落ちた:ディスクをドロップしたいなら、DISKGROUPをMOUNTする必要があるが、DISKGROUPをMOUNTするにはディスクをドロップ必要がある。

このトラブルに対して、ASM metadataを修正すれば避けられる。

【Oracleデータベースリカバリ】ORA-00600: [16513], [1403], [20]例

プロのOracle Databaseの復旧サービスを提供
携帯番号: +86 13764045638 メール:service@parnassusdata.com

 

あるデータベースシステムがストレージトラブルでASM diskgroup損害が起きた。
あるデータベースはストレージのせいでASM diskgroupがこわれた。後のリカバリでopen database段階でORA-00600: [16513], [1403], [20]が現れた:ORA-00600: internal error code, arguments: [16513], [1403], [20], [], [], [], [], []

ORA-00600: internal error code, arguments: [16513], [1403], [20], [], [], [], [], []
Current SQL statement for this session:
alter database open
—– Call Stack Trace —–
calling call entry argument values in hex
location type point (? means dubious value)
——————– ——– ——————– —————————-
ssd_unwind_bp: unhandled instruction at 0x76add1 instr=f
ksedst()+31 call ksedst1() 000000000 ? 000000001 ?
7FFF04273960 ? 7FFF042739C0 ?
7FFF04273900 ? 000000000 ?
ksedmp()+610 call ksedst() 000000000 ? 000000001 ?
7FFF04273960 ? 7FFF042739C0 ?
7FFF04273900 ? 000000000 ?
ksfdmp()+63 call ksedmp() 000000003 ? 000000001 ?
7FFF04273960 ? 7FFF042739C0 ?
7FFF04273900 ? 000000000 ?
kgeriv()+176 call ksfdmp() 006AE9A20 ? 000000003 ?
7FFF04273960 ? 7FFF042739C0 ?
7FFF04273900 ? 000000000 ?
kgesiv()+119 call kgeriv() 006AE9A20 ? 00DAC5398 ?
000000000 ? 2B749C907FA0 ?
7FFF04273900 ? 000000000 ?
ksesic2()+215 call kgesiv() 006AE9A20 ? 00DAC5398 ?
000004081 ? 000000002 ?
7FFF042748A0 ? 000000000 ?
kqdpts()+351 call ksesic2() 006AE9A20 ? 000000000 ?
00000057B ? 000000000 ?
000000014 ? 000000000 ?
kqrlfc()+314 call kqdpts() 55DF9CE78 ? 000000000 ?
00000057B ? 000000000 ?
000000014 ? 000000000 ?
kqlbplc()+153 call kqrlfc() 55DF9CE78 ? 7FFF04274BC0 ?
00000057B ? 000000000 ?
000000014 ? 000000000 ?
kqlblfc()+263 call kqlbplc() 000000000 ? 7FFF04274BC0 ?
00000057B ? 000000000 ?
000000014 ? 000000000 ?
adbdrv()+58009 call kqlblfc() 000000000 ? 7FFF0427B450 ?
00000057B ? 000000000 ?
000000014 ? 000000000 ?
opiexe()+13745 call adbdrv() 000000000 ? 7FFF0427B450 ?
536FEDDF8 ? 000000000 ?
000000014 ? 000000000 ?
opiosq0()+3398 call opiexe() 000000004 ? 000000000 ?
7FFF0427C60C ? 000000001 ?
000000014 ? 000000000 ?
kpooprx()+318 call opiosq0() 000000003 ? 00000000E ?
7FFF0427C938 ? 0000000A4 ?
000000000 ? 600000013 ?
kpoal8()+783 call kpooprx() 7FFF0427FB1C ? 7FFF0427DB48 ?
000000013 ? 000000001 ?
000000000 ? 600000013 ?
opiodr()+1184 call kpoal8() 00000005E ? 000000017 ?
ORA-01555 caused by SQL statement below (SQL ID: 4krwuz0ctqxdt, SCN: 0x0cdc.c2a516cc):

ORA-00600: internal error code, arguments: [16513], [1403], [20], [], [], [], [], []
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [16513], [1403], [20], [], [], [], [], []
Error 704 happened during db open, shutting down database
USER: terminating instance due to error 704

そのopen database が失敗したときにORA-00600: [16513]エラがよく現れる。そのstack callエラによる:kqrlfc=>kqdpts=>エラを報告する
kqrlfc =>KQR Kernel SQL Row cache management component Load Fixed Cache 代表row cacheディクショナリーメモリーをロードする
kqdpts => Kernel Query Dictionary Patch Time Stamps
その16513 は 16500 dict/rowcache this layer provides support to load / cache Oracle’s dictionary in memory in the library cacheに属している; つまりディクショナリーをロードするときにoracleは重大トラブルが起こった。
そのトラブルは人工的にデータディクショナリーをpatchすることで避けられた。

【Oracle ASMリカバリ】V$ASM_DISK HEADER_STATUSがProvisionedのトラブル解析を示される

プロのOracle Databaseの復旧サービスを提供
携帯番号: +86 13764045638 メール:service@parnassusdata.com

 

 

ユーザーが前に使用可能なdiskgroup をmountするときに,ASMのアラームログで以下のエラが示される:

 

SQL> ALTER DISKGROUP ALL MOUNT
Tue Jul 19 09:31:09 2005
Loaded ASM Library – Generic Linux, version 1.0.0 library for asmlib interface
Tue Jul 19 09:31:09 2005
NOTE: cache registered group DBFILE_GRP number=1 incarn=0xc3fd9b7d
NOTE: cache registered group FLASHBACK_GRP number=2 incarn=0xc40d9b7e
NOTE: cache dismounting group 1/0xC3FD9B7D (DBFILE_GRP)
NOTE: dbwr not being msg’d to dismount
ERROR: diskgroup DBFILE_GRP was not mounted
NOTE: cache dismounting group 2/0xC40D9B7E (FLASHBACK_GRP)
NOTE: dbwr not being msg’d to dismount
ERROR: diskgroup FLASHBACK_GRP was not mounted

[oracle@vrh8 ~]$ oerr ora 15032
15032, 00000, “not all alterations performed”
// *Cause: At least one ALTER DISKGROUP action failed.
// *Action: Check the other messages issued along with this summary error.
//
[oracle@vrh8 ~]$ oerr ora 15063
15063, 00000, “ASM discovered an insufficient number of disks for diskgroup \”%s\””
// *Cause: ASM was unable to find a sufficient number of disks belonging to the
// diskgroup to continue the operation.
// *Action: Check that the disks in the diskgroup are present and functioning,
// that the owner of the ORACLE binary has read/write permission to
// the disks, and that the ASM_DISKSTRING initialization parameter
// has been set correctly. Verify that ASM discovers the appropriate
// disks by querying V$ASM_DISK from the ASM instance.
//

主に三つのエラが現れる:
ORA-15032: not all alterations performed
ORA-15063: diskgroup “FLASHBACK_GRP” lacks quorum of 2 PST disks; 0 found
ORA-15063: diskgroup “DBFILE_GRP” lacks quorum of 2 PST disks; 0 found

SQL> select path, MOUNT_STATUS, HEADER_STATUS, MODE_STATUS, STATE from v$asm_disk;
PATH MOUNT_S HEADER_STATUS MODE_ST STATE
————- ——- ————- ——- ——–
/dev/raw/raw1 CLOSED PROVISIONED ONLINE NORMAL

一般的に、このトラブルの原因はハードウェアトラブルあるいはストレージハードウェアがアップグレードしている。

Kfedでdisk headerをチェックする:
kfed read /dev/raw/raw1
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.driver.provstr: ORCLDISKASM1 ; 0x000: length=12
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: ASM1 ; 0x028: length=4
kfdhdb.grpname: DBFILE_GRP ; 0x048: length=10
kfdhdb.fgname: ASM1 ; 0x068: length=4
kfdhdb.capname: ; 0x088: length=0
kfdhdb.dskname: ASM1 ; 0x028: length=4
kfdhdb.grpname: DBFILE_GRP ; 0x048: length=10
kfdhdb.fgname: ASM1 ; 0x068: length=4
kfdhdb.capname: ; 0x088: length=0

kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER, KFDHDR_MEMBERを観察して、そのheader状態が実際にはMEMBERだと説明してください。けどV$ASM_DISK.HEADER_STATUS はPROVISIONEDと映して、二つには一致していない。

disk headerでkfdhdb.hdrsts がdiskの状態を標識した。次のテーブルでいくつかの状態のディスクライブを含んでいる:
 

kfdhdb.hdrsts ディスクライブ
MEMBER 今のdiskgroupのdiskに属している
FORMER このdiskは前にあるdiskgroupに属しているが、そのdiskgroupも削除された
CANDIDATE 裸のマシンを使うときに、新しいdiskgroupに使われるdisk
PROVISIONED Asmlibを使うと、新しいdiskgroupに使われるdisk

 

もしkfed  readがその状態は0x027: KFDHDR_MEMBERの場合に、V$ASM_DISKがPROVISIONEDと示すわけにはいかない。

Checksumが正確ではない場合に、V$ASM_DISK.HEADER_STATUSがPROVISIONEDと示すかもしれない。

例えば、ハードウェアトラブルがheaderの部分が正常を示すが、あるいはchecksumだけが正確ではないと示される。Kfedでそのトラブルをリカバリできる。流石に、 kfed、amdu 、adhuはASM三神器である。

 

【Oracle ASMデータリカバリ】ORA-15066 & too many offline disks in PST (grp 1)

プロのOracle Databaseの復旧サービスを提供
携帯番号: +86 13764045638 メール:service@parnassusdata.com

ある10.2.0.5 ASMシステムで、ストレージトラブルでnormal redundancy diskgroupに四つのfailgroupの中で、二つのfailgroupのASM DISKがアクセス出来ない。もし10gの環境でASM diskをアクセス出来ないであれば、disk dropをdiskgroupから出す。

 

エラは以下の通り:

 

Sat Nov 15 17:05:36 CST 2013WARNING: PST-initiated drop disk 1(739802527).1(3915943320) WARNING: PST-initiated drop disk 1(739802527).3(3915943321) Sat Nov 15 17:05:36 CST 2013NOTE: PST update: grp = 1Sat Nov 15 17:05:36 CST 2013ERROR: too many offline disks in PST (grp 1)Sat Nov 15 17:05:36 CST 2013ERROR: ORA-15066 signalled during reconfiguration of diskgroup DATADGNOTE: requesting all-instance membership refresh for group=1Sat Nov 15 17:05:36 CST 2013NOTE: membership refresh pending for group 1/0x2c187d9f (DATADG)WARNING: rejecting drop force of disk number 1WARNING: rejecting drop force of disk number 3SUCCESS: refreshed membership for 1/0x2c187d9f (DATADG)Sat Nov 15 17:05:39 CST 2013ERROR: PST-initiated disk drop failedSat Nov 15 17:05:39 CST 2013ERROR: PST-initiated MANDATORY DISMOUNT of group DATADGNOTE: cache dismounting group 1/0x2C187D9F (DATADG) Sat Nov 15 17:05:39 CST 2013NOTE: halting all I/Os to diskgroup DATADG

normal redundancyなので,せいぜい一つのfailgroupしかなくせない。ASMは二つのfailgroupをdropしてみると、diskgroupを強制的にDISMOUNTされる。つまり即PST-initiated MANDATORY DISMOUNT of group DATADG。

 

PST-initiated drop disk =》 PST初期化drop disk

ERROR: too many offline disks in PST (grp 1) ==> ASMはこれだけのdiskをdropするとデータをなくすから、too many offline disk

 

WARNING: rejecting drop force of disk number 1
WARNING: rejecting drop force of disk number 3  ==> ASMはこれらのdiskをdropすることを拒否している。

ERROR: PST-initiated disk drop failed==> PSTは drop diskを初期化することに失敗した。

ERROR: PST-initiated MANDATORY DISMOUNT of group ORADATA==》 強制的にdismount diskgroup

 

以上のように、asm diskgroupに対するメンテナンスはループに落ちる。つまり mount diskgroup => drop diskを開始して=> drop diskが多すぎたから、asm が強制的にdismount diskgroupした => 人工的にmount diskgroupする。 そのASM Diskgroupは実際に使えなくなった。

 

隠しバラメタをセットすることで、あるいは人工的にpatch PSTするの方法でこのトラブルを避けられるが、実際に実行したい時にASM底のデータ構造に詳しいエンジニアが現場の状態によって実行するので、ここでは割愛します。

Oracle ASM no read quorum in group: required 1, found 0 disks & ORA-15063 ORA-15017 ORA-15032

プロのOracle Databaseの復旧サービスを提供
携帯番号: +86 13764045638 メール:service@parnassusdata.com

 

ASMを使うときにASM Diskgroupが現れて、MOUNTできなくなった上に、ASMインスタンスのalert.logに以下のようなエラが現れた:
ERROR: no read quorum in group: required 1, found 0 disks
ORA-15032: not all alterations performed
ORA-15017: diskgroup “TESTDG04” cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup “TESTDG04”
ERROR: /* ASMCMD */ALTER DISKGROUP testdg04 MOUNT
これはASM Diskgroup上の部分、Metadataメータデータにロジック損害/一致していない事情が現れたと意味している。この場合にはkfed repairでリカバリしてください (10.2.0.5バーション後)
kfed repairでうまくいかない場合には、ASM底の構造に詳しいプロ技術員が人工的にロジック損害を修復するという方法しかない。

ASMCMD> mount testdg04
ORA-15032: not all alterations performed
ORA-15017: diskgroup “TESTDG04” cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup “TESTDG04” (DBD ERROR: OCIStmtExecute)

SQL> /* ASMCMD */ALTER DISKGROUP testdg04 MOUNT
NOTE: cache registered group TESTDG04 number=1 incarn=0x22485c89
NOTE: cache began mount (first) of group TESTDG04 number=1 incarn=0x22485c89
NOTE: Assigning number (1,0) to disk (/oracleasm/asm-disk01)
Fri Jan 30 02:33:05 2015
ERROR: no read quorum in group: required 1, found 0 disks
NOTE: cache dismounting (clean) group 1/0x22485C89 (TESTDG04)
NOTE: messaging CKPT to quiesce pins Unix process pid: 8724, image: oracle@mlab2.oracle.com (TNS V1-V3)
NOTE: dbwr not being msg’d to dismount
NOTE: lgwr not being msg’d to dismount
NOTE: cache dismounted group 1/0x22485C89 (TESTDG04)
NOTE: cache ending mount (fail) of group TESTDG04 number=1 incarn=0x22485c89
NOTE: cache deleting context for group TESTDG04 1/0x22485c89
GMON dismounting group 1 at 139 for pid 20, osid 8724
NOTE: Disk in mode 0x8 marked for de-assignment
ERROR: diskgroup TESTDG04 was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup “TESTDG04” cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup “TESTDG04”
ERROR: /* ASMCMD */ALTER DISKGROUP testdg04 MOUNT
Fri Jan 30 02:33:05 2015
NOTE: No asm libraries found in the system

SQL> create diskgroup testdg04 external redundancy disk ‘/oracleasm/asm-disk01’;

Diskgroup created.
[oracle@mlab2 ~]$ kfed read /oracleasm/asm-disk01 aun=1 blkn=0|less
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 17 ; 0x002: KFBTYP_PST_META
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 256 ; 0x004: blk=256
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 4150372545 ; 0x00c: 0xf761a8c1
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdpHdrBv0.time.hi: 33015746 ; 0x000: HOUR=0x2 DAYS=0x1e MNTH=0x1 YEAR=0x7df
kfdpHdrBv0.time.lo: 2005330944 ; 0x004: USEC=0x0 MSEC=0x1bb SECS=0x38 MINS=0x1d
kfdpHdrBv0.last: 2 ; 0x008: 0x00000002
kfdpHdrBv0.next: 2 ; 0x00c: 0x00000002
kfdpHdrBv0.copyCnt: 1 ; 0x010: 0x01
kfdpHdrBv0.version: 0 ; 0x011: 0x00
kfdpHdrBv0.ub2spare: 0 ; 0x012: 0x0000
kfdpHdrBv0.incarn: 1 ; 0x014: 0x00000001
kfdpHdrBv0.copy[0]: 0 ; 0x018: 0x0000
kfdpHdrBv0.copy[1]: 0 ; 0x01a: 0x0000
kfdpHdrBv0.copy[2]: 0 ; 0x01c: 0x0000
kfdpHdrBv0.copy[3]: 0 ; 0x01e: 0x0000
kfdpHdrBv0.copy[4]: 0 ; 0x020: 0x0000
kfdpHdrBv0.dtaSz: 1 ; 0x022: 0x0001
ub1[0]: 2 ; 0x024: 0x02
ub1[1]: 0 ; 0x025: 0x00
ub1[2]: 0 ; 0x026: 0x00
ub1[3]: 0 ; 0x027: 0x00
ub1[4]: 0 ; 0x028: 0x00
ub1[5]: 0 ; 0x029: 0x00
ub1[6]: 0 ; 0x02a: 0x00
ub1[7]: 0 ; 0x02b: 0x00
ub1[8]: 0 ; 0x02c: 0x00
ub1[9]: 0 ; 0x02d: 0x00

自分でうまくいかないときに詩檀ソフトORACLEデータベースリカバリチームに助けを求めてください。
携帯番号: 13764045638 メール:service@parnassusdata.com

ORACLE PRMは詩檀ソフトに独立で開發したORACLEデータベースディザスターリカバリソフトである。
ORACLE PRMをダウンロードするにはこちら:http://parnassusdata.com/sites/default/files/ParnassusData_PRMForOracle_3206.zip
PRM ガイドブックhttp://www.parnassusdata.com/sites/default/files /ParnassusData%20Recovery%20Manager%20For%20Oracle%20Database%E7%94%A8%E6%88%B7%E6%89%8B%E5%86%8C%20v0.3.pdf
[oracle@mlab2 trace]$ oerr ora 15063
15063, 00000, “ASM discovered an insufficient number of disks for diskgroup \”%s\””
// *Cause: ASM was unable to find a sufficient number of disks belonging to the
// diskgroup to continue the operation.
// *Action: Check that the disks in the diskgroup are present and functioning,
// that the owner of the ORACLE binary has read/write permission to
// the disks, and that the ASM_DISKSTRING initialization parameter
// has been set correctly. Verify that ASM discovers the appropriate
// disks by querying V$ASM_DISK from the ASM instance.
//
[oracle@mlab2 trace]$ oerr ora 15017
15017, 00000, “diskgroup \”%s\” cannot be mounted”
// *Cause: The disk group could not be mounted by the ASM instance either
// of an IO error or due to other reasons explained in alert log or
// trace files.
// *Action: Check for additional errors reported. Correct the error and
// retry the operation.
//
[oracle@mlab2 trace]$ oerr ora 15032
15032, 00000, “not all alterations performed”
// *Cause: At least one ALTER DISKGROUP action failed.
// *Action: Check the other messages issued along with this summary error.
//

Oracle ASM kfed repairで何をできるか?

オフィシャルな説明によると、kfed repairコマンドについての説明は極めて簡略である:
Recover the disk header from the redundant copy of it maintained on an unused portion of the disk. 
主にdisk headerのヘッダ4096 bytesのKFBTYP_DISKHEAD構造に使われる。
このリカバリは10.2.0.5後のDisk Header自動バックアップ機能に基づいている。
PSTでつまりAU=1の最後の二つのデータブロックで自動的にバックアップされた(Read from PST(AU 1)’s penultimate Block) KFBTYP_DISKHEAD。
 
例えば:
[oracle@mlab2 oracleasm]$ kfed read asm-disk04 aun=1 blkn=254|less
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483651 ; 0x008: disk=3
kfbh.check:                    98849704 ; 0x00c: 0x05e453a8
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr:         ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]:            0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]:            0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]:            0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]:            0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]:            0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]:            0 ; 0x01c: 0x00000000
kfdhdb.compat:                168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum:                        3 ; 0x024: 0x0003
kfdhdb.grptyp:                        3 ; 0x026: KFDGTP_HIGH
kfdhdb.hdrsts:                        3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname:              DATA1_0003 ; 0x028: length=10
kfdhdb.grpname:                   DATA1 ; 0x048: length=5
kfdhdb.fgname:               DATA1_0003 ; 0x068: length=10
kfdhdb.capname:                         ; 0x088: length=0
kfdhdb.crestmp.hi:             33006980 ; 0x0a8: HOUR=0x4 DAYS=0xc MNTH=0x9 YEAR=0x7de
kfdhdb.crestmp.lo:           2555232256 ; 0x0ac: USEC=0x0 MSEC=0x370 SECS=0x4 MINS=0x26
kfdhdb.mntstmp.hi:             33008247 ; 0x0b0: HOUR=0x17 DAYS=0x13 MNTH=0xa YEAR=0x7de
kfdhdb.mntstmp.lo:           3341018112 ; 0x0b4: USEC=0x0 MSEC=0xf9 SECS=0x32 MINS=0x31
kfdhdb.secsize:                     512 ; 0x0b8: 0x0200
kfdhdb.blksize:                    4096 ; 0x0ba: 0x1000
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact:                    113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize:                  128800 ; 0x0c4: 0x0001f720
kfdhdb.pmcnt:                         3 ; 0x0c8: 0x00000003
kfdhdb.fstlocn:                       1 ; 0x0cc: 0x00000001
kfdhdb.altlocn:                       2 ; 0x0d0: 0x00000002
:

ASMというバックアップがあるから、kfed repairでASM disk header最初の4096 bytes
をリカバリできるが、そのバックアップは4096バイトのmetadataしかバックアップできないから、
metadata全体に対応できない上に(KFBTYP_DISKHEADを除いて)
ほかの大量なmetadataメータデータが必要としている。
前に現場にトラブルを見つけ出したら、kfed repairもPST KFBTYP_PST_METAの
一部のロジックエラ、損害をリカバリできるが、自身の環境で再現できない。
 
kfed repairの処理プロセスは以下の通り:
 
[oracle@mlab2 oracleasm]$ dd if=/dev/zero of=asm-disk04 bs=4096 count=1 conv=notrunc
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 6.9811e-05 seconds, 58.7 MB/s
[oracle@mlab2 oracleasm]$ 
[oracle@mlab2 oracleasm]$ kfed read asm-disk04
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
kfbh.spare2:                          0 ; 0x01c: 0x00000000
7F70E2F06400 00000000 00000000 00000000 00000000  [................]
  Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]

[oracle@mlab2 oracleasm]$ 
[oracle@mlab2 oracleasm]$ 
[oracle@mlab2 oracleasm]$ www.askmac.cn
[oracle@mlab2 oracleasm]$ 
[oracle@mlab2 oracleasm]$ strace -o  kfed1.log kfed repair asm-disk04

munmap(0x7fd80aa2d000, 143360)          = 0
stat("asm-disk04", {st_mode=S_IFREG|0644, st_size=135056588800, ...}) = 0
access("asm-disk04", F_OK)              = 0
statfs("asm-disk04", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=961422084, f_bfree=454635689, f_bavail=405808746, f_files=244187136, f_ffree=24
4182184, f_fsid={-138681668, -1790432782}, f_namelen=255, f_frsize=4096}) = 0
open("asm-disk04", O_RDWR)              = 7
lseek(7, 2088960, SEEK_SET)             = 2088960
read(7, "\1\202\1\1\0\0\0\0\3\0\0\200\250S\344\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(7, 0, SEEK_SET)                   = 0
read(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(7, 0, SEEK_SET)                   = 0
write(7, "\1\202\1\1\0\0\0\0\3\0\0\200\250S\344\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
close(7)

以上のシーンではKFED でKFBTYP_PST_METAのメータデータをリカバリするプロセスを再現していない。
オフセットが2088960の(2088960/4096=510=255+255 つまりAUN=1最後の二つのブロック)の4096バイト,
そしてOFFSET=0の位置に書き込む。
 
 
そして以下のKFEDメータコードで、ほかのmetadataをリカバリする手がかりを見つけ出せる:
 


#define KFEDOP_REPAIR ((kfedop)13)      /* Repair ASM disk header            */
www.askmac.cn

  case KFEDOP_REPAIR:
    /* Read from PST(AU 1)'s penultimate Block */
    cx->aunum_kfedcx  = (ub4)1;
    cx->blknum_kfedcx = (ub4)(bfact - 2);

    if (!kfedReadBlk(cx))
      goto done;

    /* Validate the Disk Header block read from PST */
    if(!kfedValidateBlk(cx, KFBTYP_DISKHEAD))
      goto done;

    /* Fix the block number and checksum in the buffer */
    if (!kfedFixBackupHeader(cx)) www.askmac.cn
      goto done;

    /* Write to Disk Header(AU 0 and Block 0) */
    cx->aunum_kfedcx  = (ub4)0;
    cx->blknum_kfedcx = (ub4)0;

    if (!kfedWriteBlk(cx))
      goto done;

    break;

以上のコードはKFEDOP_REPAIR操作がPST(AU 1)’s penultimate Blockを読み取り。
つまりAUN=1の最後の二つのブロックである。読み取れない時にはエラになる。
もし読み取れるであれば、PSTから読み取れたDISK headerのblockをテストする。
そして、その中のblock numberとchecksumの数値をFIXする。disk header、つまりAUN=0 BLKN=0のところを書き出す。。
 
とにかく、このようなASMのトラブルがあったら、DISK HEADERをバックアップしたあと(バックアップするには200MBが必要としている)。
DISK HEADER自動的なバックアップがある場合に、KFED REPAIRしてみてください。
成功できない場合に、metadata診断や人工リカバリの仕事がいっぱい待っているから。。

【Oracle ASM数据恢复】V$ASM_DISK HEADER_STATUS显示为Provisioned的问题解析

当用户加载mount一个之前可用的diskgroup 时,将在ASM的告警日志中看到下面的错误:

 

如果自己搞不定可以找ASKMACLEAN专业ORACLE数据库修复团队成员帮您恢复!

 

SQL> ALTER DISKGROUP ALL MOUNT
Tue Jul 19 09:31:09 2005
Loaded ASM Library - Generic Linux, version 1.0.0 library for asmlib interface
Tue Jul 19 09:31:09 2005
NOTE: cache registered group DBFILE_GRP number=1 incarn=0xc3fd9b7d
NOTE: cache registered group FLASHBACK_GRP number=2 incarn=0xc40d9b7e
NOTE: cache dismounting group 1/0xC3FD9B7D (DBFILE_GRP)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DBFILE_GRP was not mounted
NOTE: cache dismounting group 2/0xC40D9B7E (FLASHBACK_GRP)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup FLASHBACK_GRP was not mounted

[oracle@vrh8 ~]$ oerr ora 15032
15032, 00000, "not all alterations performed"
// *Cause:  At least one ALTER DISKGROUP action failed.
// *Action: Check the other messages issued along with this summary error.
//
[oracle@vrh8 ~]$ oerr ora 15063
15063, 00000, "ASM discovered an insufficient number of disks for diskgroup \"%s\""
// *Cause:  ASM was unable to find a sufficient number of disks belonging to the
//          diskgroup to continue the operation.
// *Action: Check that the disks in the diskgroup are present and functioning, 
//          that the owner of the ORACLE binary has read/write permission to 
//          the disks, and that the ASM_DISKSTRING initialization parameter 
//          has been set correctly.  Verify that ASM discovers the appropriate 
//          disks by querying V$ASM_DISK from the ASM instance.
//

 

 

 

主要是出现三个错误:
ORA-15032: not all alterations performed
ORA-15063: diskgroup “FLASHBACK_GRP” lacks quorum of 2 PST disks; 0 found
ORA-15063: diskgroup “DBFILE_GRP” lacks quorum of 2 PST disks; 0 found

 

 

检查V$ASM_DISK 的HEADER_STATUS 显示为PROVISIONED,甚至当这个disk没有被asmlib label过:

 

 

SQL> select path, MOUNT_STATUS, HEADER_STATUS, MODE_STATUS, STATE from v$asm_disk;
PATH MOUNT_S HEADER_STATUS MODE_ST STATE
------------- ------- ------------- ------- --------
/dev/raw/raw1 CLOSED PROVISIONED ONLINE NORMAL

 

 

 

导致该问题的原因一般是硬件故障,或者由于存储的固件升级

 

使用kfed来检查disk header:

kfed read /dev/raw/raw1
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfdhdb.driver.provstr: ORCLDISKASM1 ; 0x000: length=12
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: ASM1 ; 0x028: length=4
kfdhdb.grpname: DBFILE_GRP ; 0x048: length=10
kfdhdb.fgname: ASM1 ; 0x068: length=4
kfdhdb.capname: ; 0x088: length=0
kfdhdb.dskname: ASM1 ; 0x028: length=4
kfdhdb.grpname: DBFILE_GRP ; 0x048: length=10
kfdhdb.fgname: ASM1 ; 0x068: length=4
kfdhdb.capname: ; 0x088: length=0

 

 

观察kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER,  KFDHDR_MEMBER说明其header状态实际是MEMBER,而V$ASM_DISK.HEADER_STATUS 则显示为PROVISIONED,2者并不匹配。

 

在disk header中kfdhdb.hdrsts 标记了本disk的状态, 下表显示了几个状态的描述:

 

kfdhdb.hdrsts 描述
MEMBER 属于当前diskgroup的disk
FORMER 这个disk以前属于一个diskgroup,现在这个diskgroup被删除了
CANDIDATE 当使用裸设备,一个新的可以被diskgroup所用的disk
PROVISIONED 当使用asmlib,一个新的可以被diskgroup所用的disk

 

 

如果kfed  read发现其状态为0x027: KFDHDR_MEMBER,则V$ASM_DISK不应当显示PROVISIONED。

我们发现当checksum不正确时可能导致V$ASM_DISK.HEADER_STATUS显示为PROVISIONED。

例如当发生硬件故障导致header的部分显得正常而仅仅checksum不正确。那么我们可以通过kfed来修复该问题的, kfed不愧和 amdu 、adhu合称ASM 三神器。

 

 

 

 

ASM Metadata元数据介绍

1.    ASM Metadata介绍

 

 

ASM基础概念:

 

  • ASM的最小存储单位是一个”allocation unit”(AU),通常为1MB,在Exadata上推荐为4MB
  • ASM的核心是存储文件
  • 文件被划分为多个文件片,称之为”extent”
  • 11g之前extent的大小总是为一个AU,11g之后一个extent可以是1 or 8 or 64个AU
  • ASM使用file extent map维护文件extent的位置
  • ASM在LUN DISK的头部header维护其元数据,而非数据字典
  • 同时RDBMS DB会在shared pool中缓存file extent map,当server process处理IO时使用
  • 因为ASM instance使用类似于普通RDBMS的原理的instance/crash recovery,所以ASM instance奔溃后总是能复原的

 

ASM将任何文件以AU大小均匀分布在Disk Group的所有Disk上。每一个ASM Disk均被维护以保持同样的使用比率。这保证同一个Disk Group中的所有Disk的IO负载基本一致。由于ASM在一个Disk Group中的磁盘上的负载均衡,所以为同一个物理磁盘的不同区域划分为2个ASM Disk不会对性能有所影响;而同一个物理磁盘上划分2个不同分区置于不同的2个Disk Group则有效。

当ASM Disk Group启用冗余时单个ASM Disk仅是一个失败单元。对于该ASM Disk的写失败在10g会自动从该Disk Group drop掉该Disk,前提是该Disk的丢失被容许。

 

Allocation Unit

每一个ASM Disk都被划分为许多个AU allocation units(单个AU 的大小在 1MB ~64MB,注意总是2的次方MB)。而且AU allocation unit也是Disk Group的基本分配单元。一个ASM Disk上的可用空间总是整数倍个AU。在每一个ASM Disk的头部均有一个表,该表的每一条记录代表该ASM Disk上的一个AU。文件的extent指针(pointer)给出了ASM Disk Number磁盘号和AU号,这就描述了该extent的物理位置。由于所有的空间操作都以AU为单位,所以不存在所谓ASM碎片这样的概念和问题。

一个AU(1M~64M)足够小,以便一个文件总是要包含很多个AU,这样就可以分布在很多磁盘上,也不会造成热点。一个AU又足够大以便能够在一个IO操作中访问它,以获得更加的吞吐量,也能提供高效的顺序访问。访问一个AU的时间将更多的消耗在磁盘传输速率上而非花在寻找AU头上。对于Disk Group的重新平衡也是对每一个AU逐次做的。

 

 

asm disk的前50个AU(50MB)是为asm metadata保留的

ASM 的前255个file number是为metadata file保留的,文件号从1开始, file numner=1的1号文件为ASM的file directory;普通的ASM File的file number从256开始

 

 

可以使用脚本 GetAsmDH.sh来定期备份 ASM header的metadata数据, GetAsmDH.sh本质是使用 dd 来备份asm header 前1MB的数据

GetAsmDH.sh:

 

 

#!/bin/sh

mkdir /tmp/HC 2> /dev/null
rm -f /tmp/HC/asmdisks.lst 2> /dev/null
rm -f /tmp/HC/asm_diskh.sh 2> /dev/null

echo " "
echo "############################################"
echo " 1) Collecting Information About the Disks:"
echo "############################################"

sqlplus '/nolog' <<eof
connect / as sysdba
set linesize 90
col path format a60
set heading off
set head off
set feedback off
spool /tmp/HC/asmdisks.lst
select group_number,disk_number,path from v\$asm_disk_stat where group_number > 0 order by group_number,disk_number;
spool off;
eof
echo " "
echo " "

ls -l /tmp/HC/asmdisks.lst


echo " "
echo "############################################"
echo " 2) Generating "asm_diskh.sh" script."
echo "############################################"
echo " "

grep -v SQL /tmp/HC/asmdisks.lst > /tmp/HC/asmdisks_tmp.lst

mv /tmp/HC/asmdisks_tmp.lst /tmp/HC/asmdisks.lst 

sed 's/ORCL:/\/dev\/oracleasm\/disks\//g' </tmp/HC/asmdisks.lst>/tmp/HC/asmdisks_NEW.lst

mv /tmp/HC/asmdisks_NEW.lst /tmp/HC/asmdisks.lst



cat /tmp/HC/asmdisks.lst|while read LINE
do
comm=`echo $LINE|awk '{print "dd if="$3 " of=/tmp/HC/dsk_"$1"_"$2".dd bs=1048576 count=1"}'`
echo $comm >> /tmp/HC/asm_diskh.sh
done 

chmod 700 /tmp/HC/asm_diskh.sh

ls -l /tmp/HC/asm_diskh.sh

echo " "
echo "############################################"
echo " 3) Executing asm_diskh.sh script to "
echo " generate dd dumps."
echo "############################################"
echo " "


### For display only
/tmp/HC/asm_diskh.sh 2> /dev/null
ls -l /tmp/HC/*dd

echo " "
echo "############################################"
echo " 4) Compressing dd dumps in the next format:"
echo " (asm_dd_header_all_<date_time>.tar)"
echo "############################################"
echo " "




NOW=$(date +"%m-%d-%Y_%T")

tar -cvf /tmp/HC/asm_dd_header_all_$NOW.tar /tmp/HC/*.dd 2> /dev/null

compress /tmp/HC/asm_dd_header_all_$NOW.tar

ls -l /tmp/HC/*.Z












当需要恢复 ASM header时可以直接 将备份的文件dd回去 ,但是要注意dd加上conv=notrunc
其他一些有用的脚本:

 

以下脚本用于ASM修复disk header时:
 
 
1. dd各种有用的metadata block :
 
#! /bin/sh
rm /tmp/kfed_DH.out /tmp/kfed_FS.out /tmp/kfed_BK.out /tmp/kfed_FD.out /tmp/kfed_DD.out /tmp/kfed_PST.out
for i in `ls /dev/asm-disk*`
do
echo $i >> /tmp/kfed_DH.out
kfed read $i >> /tmp/kfed_DH.out
echo $i >> /tmp/kfed_FS.out
kfed read $i blkn=1 >> /tmp/kfed_FS.out
echo $i >> /tmp/kfed_BK.out
kfed read $i aun=1 blkn=254 >> /tmp/kfed_BK.out
echo $i >> /tmp/kfed_FD.out
kfed read $i aun=2 blkn=1 >> /tmp/kfed_FD.out
echo $i >> /tmp/kfed_DD.out
kfed read $i aun=2 blkn=2 >> /tmp/kfed_DD.out
echo $i >> /tmp/kfed_PST.out
kfed read $i aun=1 blkn=2 >> /tmp/kfed_PST.out
done
 
 
 
kfed_DH.out ==>KFBTYP_DISKHEAD      aun=0 blkn=0
kfed_FS.out ==>  KFBTYP_FREESPC      aun=1 blkn=0
kfed_BK.out  ==> KFBTYP_DISKHEAD DISK HEAD BACKUP   aun=1 blkn=254
kfed_FD.out  ==> KFBTYP_FILEDIR   aun=2  blkn=1
kfed_DD.out  ==> KFBTYP_FILEDIR  aun=2 blkn=2
kfed_PST.out ==> KFBTYP_PST_NONE aun=1 blkn=2
 
2 . Query ASM header from SQL:
 
 
spool asm_info.html
set pagesize 1000
set linesize 250
set feedback off
col bytes format 999,999,999,999
col space format 999,999,999,999
col gn format 999
col name format a20
col au format 99999999
col state format a12
col type format a12
col total_mb format 999,999,999
col free_mb format 999,999,999
col od format 999
col compatibility format a12
col dn format 999
col mount_status format a12
col header_status format a12
col mode_status format a12
col mode format a12
col failgroup format a20
col label format a12
col path format a45
col path1 format a40
col path2 format a40
col path3 format a40
col bytes_read format 999,999,999,999,999
col bytes_written format 999,999,999,999,999
col cold_bytes_read format 999,999,999,999,999
col cold_bytes_written format 999,999,999,999,999

alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS' ;

select to_char(sysdate, 'DD-MON-YYYY HH24:MI:SS' ) current_time from dual;
select group_number gn, name, allocation_unit_size au, state, type, total_mb, free_mb, offline_disks od, compatibility from v$asm_diskgroup;
select group_number gn,disk_number dn, mount_status, header_status,mode_status,state, total_mb, free_mb,name, failgroup, label, path,create_date, mount_date from v$asm_disk order by group_number, disk_number;

break on g_n skip 1
break on failgroup skip 1
compute sum of t_mb f_mb on failgroup
compute count of failgroup on failgroup

select g.group_number g_n,g.disk_number d_n,g.name , g.path , g.total_mb t_mb,g.free_mb f_mb,g.failgroup from v$asm_disk g order by g_n, failgroup, d_n;
SET MARKUP HTML ON
set echo on
select 'THIS ASM REPORT WAS GENERATED AT: ==)> ' , sysdate " " from dual;
select 'HOSTNAME ASSOCIATED WITH THIS ASM INSTANCE: ==)> ' , MACHINE " " from v$session where program like '%SMON%';
select * from v$asm_diskgroup;
SELECT * FROM V$ASM_DISK ORDER BY GROUP_NUMBER,DISK_NUMBER;
SELECT * FROM V$ASM_CLIENT;
select * from V$ASM_ATTRIBUTE;
select * from v$asm_operation;
select * from v$version;
show parameter
show sga
spool off
exit
 
 
 
AMDU result:
 
 
Placeholder for AMDU binaries and using with ASM 10g (Doc ID 553639.1)

amdu -diskstring '/dev/asm-disk*' -dump 'MACLEAN_DG' -noimage
 
 
 
4. 脚本查找LISTHEAD
 
 
#!/bin/bash
# Usage: scan.sh     
i=0
size=0
asize=$2
rm list.txt
echo AUSZIE=$asize
while [ 1 ]
do
kfed read $1 ausz=$asize aunum=$i blknum=0 | grep LISTHEAD > list.txt
size=$(stat -c %s list.txt)
if [ $size -gt 0 ]; then
  echo LISTHEAD is found in AU=$i FILE=lhAU$i.txt
  kfed read $1 ausz=$asize aunum=$i blknum=0 text=lhAU$i.txt
fi
i=$[$i+1]
if [ $i -eq $3 ]; then
  echo $3 AUs scanned
  exit 0
fi
done
 
 
使用方法:
 
[grid@vmac1 tmp]$ ./scan.sh /dev/asm-diskb 1048576 10
AUSZIE=1048576
LISTHEAD is found in AU=2 FILE=lhAU2.txt
10 AUs scanned

沪ICP备14014813号-2

沪公网安备 31010802001379号