[ASM Internals] _asm_kill_unresponsive_clients & _asm_healthcheck_timeout

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> select pid, pname from v$process;

PID PNAME
---------- -----
1
2 PMON
3 PSP0
4 VKTM
5 GEN0
6 DIAG
7 DBRM
8 PING
9 ACMS
10 DIA0
11 LMON
12 LMD0
13 LMS0
14 RMS0
15 LMHB
16 MMAN
17 DBW0
18 LGWR
19 CKPT
20 SMON
21 RECO
22 RBAL
23 ASMB
24 MMON
25 MMNL
26 MARK
27 SMCO
28 LCK0
29 RSMN
30 J000
31 ARC1
32 J001
33 W000
34 W001
35
36 ARC0
37 ARC2
38 ARC3
39 GTX0
40 RCBG
41 QMNC
42
43 Q000
44 O000
45 CJQ0
51 Q001
53 GCR0

23 ASMB
SQL> oradebug setorapid 23;
Oracle pid: 23, Unix process pid: 5771, image: oracle@maclean1.oracle.com (ASMB)
SQL> oradebug short_stack;
ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1876<-sspuser()+112<-__sighandler()<-read()+14<-ntpfprd()+115<-nsbasic_brc()+376<-nsbrecv()+69<-nioqrc()+485<-ttcdrv()+1461<-nioqwa()+61<-upirtrc()+1385<-upirtr()+148<-kpurcs()+34<-OCIKDispatch()+42<-kfnOpExecuteWithWait()+722<-kfnbRun()+5370<-ksbrdp()+971<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36
SQL>
SQL> oradebug suspend;
Statement processed.
SQL> oradebug short_stack;
ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1876<-ksdxsus()+1101<-ksdxffrz()+40<-ksdxcb()+1876<-sspuser()+112<-__sighandler()<-read()+14<-ntpfprd()+115<-nsbasic_brc()+376<-nsbrecv()+69<-nioqrc()+485<-ttcdrv()+1461<-nioqwa()+61<-upirtrc()+1385<-upirtr()+148<-kpurcs()+34<-OCIKDispatch()+42<-kfnOpExecuteWithWait()+722<-kfnbRun()+5370<-ksbrdp()+971<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+252<-main()+201<-__libc_start_main()+244<-_start()+36

2013-05-21 09:19:50.952000 -04:00
Unix process pid: 5771, image: oracle@maclean1.oracle.com (ASMB) flash frozen [ command #2 ]

 

 

WARNING: client [PROD1:PROD] not responsive for 214s; state=0x1. killing pid 17379
WARNING: client [PROD1:PROD] cleanup delayed; waited 274s, pid 17379 mbr 0x1

WARNING: client [PROD1:PROD1] not responsive for 206s; state=0x1. killing pid 7865 
Thu Feb 07 10:13:37 2013
WARNING: client [PROD1:PROD1] cleanup delayed; waited 266s, pid 7865 mbr 0x2
Thu Feb 07 10:14:27 2013
System State dumped to trace file /u01/app/oracle/diag/asm/+asm/+ASM2/trace/+ASM2_ora_28130.trc
Thu Feb 07 10:14:37 2013
WARNING: client [PROD1:PROD1] cleanup delayed; waited 326s, pid 7865 mbr 0x2

 

_asm_kill_unresponsive_clients = TRUE (kill unresponsive ASM clients)
_asm_healthcheck_timeout = 180 (seconds until the health check takes action)

 

NOTE: parameter _asm_kill_unresponsive_clients is disabled: not ejecting client
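Both of these are ASM instance underscore parameters (they also appear in the hidden parameter list later in this post), so they should only be changed at the direction of Oracle Support. As a minimal sketch of what such a change would look like on a test system (the values are illustrative assumptions, not recommendations):

SQL> -- on the ASM instance, as SYSASM; illustrative values only
SQL> alter system set "_asm_kill_unresponsive_clients"=false scope=spfile sid='*';
SQL> alter system set "_asm_healthcheck_timeout"=300 scope=spfile sid='*';
SQL> -- restart the ASM instance for the spfile change to take effect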

ASM + AMDU

AMDU just extracts data from an ASM file system onto a regular file system.  The output generates database datafiles with an extension ".f".  All you have to do is an ALTER DATABASE RENAME FILE ... TO ... command and you can use these extracted files to mount and open your database.

 

Little bit of history of our unique situation which will hopefully make the following steps clear:
1) We were using two ASM diskgroups, +DATA and +FRA
2) The DATA group had 16 disks of 1.03 TB each
3) The DATA group got messed up because we had some corruption in the disk headers, due to accidentally adding 2 new disks to the DATA diskgroup that were over 2 TB in size.
4) We then removed these 2 new disks by creating a new diskgroup (DATA1) with these 2 disks.
5) The FRA diskgroup was OK and it contained a copy of the controlfile, multiplexed online redo logs and multiplexed archived logs.  This is one recommendation I can't stress enough => multiplex your controlfile, online logs and archived logs.  We also had a pfile converted from an spfile available as well.  Recovery becomes more of a pain in the balls without these things; I'd strongly advise at least making sure these things are backed up.
6) The result is that we were left with the DATA diskgroup thinking it was supposed to have these 2 disks as members, while the 2 disks have no disk header information about the DATA diskgroup - those 2 disks think they're part of the diskgroup DATA1 only.

Damn, what a mess.  But the good news is, we fixed it using AMDU.  AMDU is just a bit-by-bit data extraction tool.  It's similar to RMAN only in that it extracts your Oracle data to another location.  It doesn't check for corruption or anything else, it just copies stuff from one place to another.  There are no tricks or tuning or fancy parameters in AMDU.  It's just a straightforward, simple extraction tool.

AMDU isn't documented and Oracle Support will tell you not to do the following steps without guidance from Support, but if you're just wanting to mess around with AMDU for educational purposes or are in a totally desperate situation, then you can try the following at your own risk.  Although we did use this on our live environment.  Oh, and I forgot to mention - we didn't have ANY backup at all because of a mixup in hardware.  So we were basically running live without a net.  Anyway, AMDU did save the day, here's how:

1) Find the file names for each datafile in your ASM diskgroup.  Our looked something like this:
+FRA/orcl/datafile/media.260.739318209
+DATA/orcl/datafile/system.256.739321475
+DATA/orcl/datafile/sysaux.257.739321555
+DATA/orcl/datafile/undotbs.258.739321589
+DATA/orcl/datafile/users.259.739321609

These file names are another thing I strongly recommend you keep in a separate text file in a safe location.  And while I'm thinking about it, also keep the DBID returned from RMAN when you connect to it.  You'll need RMAN after AMDU is done, so it's good to check it out anyway.
The important part we needed from the datafiles was the digits ".260", ".256", ".257", ".258", ".259".  We had this because we were using OMF with our ASM.  I'm not sure how it works if you're not using OMF but I can imagine some research and experimentation would solve that.
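If there are many datafiles, the middle number can be pulled out of each OMF name with a bit of shell; a convenience sketch using one of the names listed above:

$ echo "+DATA/orcl/datafile/system.256.739321475" | awk -F. '{print $(NF-1)}'
256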

2) Next you have to make a place to let AMDU extract data to.  This needs to be a file system and it needs to be at least the same size as the storage needed for the DB.  For example, our DB was 9 TB in size, therefore we needed to create a 9 TB file system.  We made it 10 TB just to be sure.
AMDU just extracts data from an ASM file system.  It doesn't check or verify anything.

3) Next, make sure AMDU works.  What you’ll do here is a -dump of the metadata which AMDU will use to find the data to extract.  This command will produce 3 small files on your file system in the same directory from which you launch the amdu command. This can be run harmlessly.
$ amdu -diskstring '/dev/rdsk/*' -dump DATA

If you get a Bus Error (core dump), try making sure your NLS_ parameters are cleared and also try export LD_LIBRARY_PATH=/path/of/amdu
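For example (a sketch assuming amdu was unpacked into /opt/amdu, a hypothetical path):

$ unset NLS_LANG
$ export LD_LIBRARY_PATH=/opt/amdu:$LD_LIBRARY_PATH
$ /opt/amdu/amdu -diskstring '/dev/rdsk/*' -dump DATA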

4) Check the report.txt file for anything strange or any errors.  In the report.txt file we had, we saw things like this:
AMDU-00201: Disk N0018: '/dev/rdsk/c7t60080E5000185EB00000037A4D0DE994d0s6'
AMDU-00209: Corrupt block found: Disk N0038 AU [1] block [254] type [0]
AMDU-00201: Disk N0038: '/dev/rdsk/c7t60080E5000185EB20000039B4D0DE979d0s6'
AMDU-00209: Corrupt block found: Disk N0040 AU [1] block [254] type [0]
AMDU-00201: Disk N0040: '/dev/rdsk/c7t60080E5000185EB20000039D4D0DE9B5d0s6'
AMDU-00209: Corrupt block found: Disk N0020 AU [1] block [254] type [0]
AMDU-00201: Disk N0020: '/dev/rdsk/c7t60080E5000185EB00000037C4D0DE9CDd0s6'
AMDU-00209: Corrupt block found: Disk N0042 AU [1] block [254] type [0]
AMDU-00201: Disk N0042: '/dev/rdsk/c7t60080E5000185EB20000039F4D0DE9F3d0s6'
AMDU-00209: Corrupt block found: Disk N0056 AU [1] block [254] type [0]

but it turned out this was OK and not cause for concern.  I know I said earlier AMDU doesn't check for corruption, but the errors above come from a second pass in which each allocation unit is checked for corruption and I/O errors by default.  Even if we did have corruption, we still needed to extract with AMDU and then do an RMAN backup to another set of disks, which would verify whether we have corruption and where it is.  Then we could work on fixing it.

5) When you're ready to extract your data using AMDU, do this:
$ cd <directory where you want to extract the data to>
$ amdu -diskstring '/dev/rdsk/*' -extract 'DATA.258'

where -diskstring is the same setting you have in your ASM instance for the asm_diskstring parameter, and -extract requires the parameters <diskgroup>.<middle_number_of_the_OMF_file_name>

This will extract the ASM datafile to your file system.  It’ll create a folder in the directory where you launched the amdu command from and create two files in that folder called DATA_258.f  and report.txt.  The file DATA_258.f is basically the same thing that you’ll get if you did a regular extract from an ASM file system to a regular file system.  The DATA_258.f file can be used by the database with just a few configuration changes to your database.

6) Now that we have the file extracted to a file system, we have to tell the DB to start using this ".f" file.  This can be accomplished with the standard RENAME FILE command while the database is mounted:
ALTER DATABASE RENAME FILE '+DATA/orcl/datafile/system.258.738387863' TO '/test/amdu_2010_12_31_01_29_39/DATA2_258.f';

7) Follow the above steps for all your datafiles.  Don't forget the tempfile, or recreate it.

8) You'll also have to do the standard DBA stuff to get the database instance open, like modifying the pfile to point to your controlfiles (if they're in different places from where they're supposed to be), etc ...

9) If you had multiplexed your online redo logs and archived logs on your DATA diskgroup, you'll have to drop the ENTIRE online redo log groups and recreate them.  Well, create new ones first on +FRA or your file system, then drop the entire redo log groups.  You can create more after rebuilding your DATA diskgroup.

10) Now that your data is off the ASM diskgroup and you have your database open and running on the amdu-extracted files, you can clear the disk headers on your ASM disks, rebuild your ASM diskgroup, launch an RMAN backup, then do an RMAN switch to copy command.  See below:

a) Clear the disk headers using:
dd if=/dev/zero of=/dev/rdsk/c7t60080E5000185EB2000003924D0DE891d0s6 bs=8192 count=12800
where /dev/rdsk/c7t60080E5000185EB2000003924D0DE891d0s6 is the location of a disk in your corrupted diskgroup.  Repeat for every disk in your diskgroup.

b) Recreate your diskgroup:
create diskgroup DATA1 external redundancy disk '/dev/rdsk/c*****'
where /dev/rdsk/****** is the location of a disk you want to add to the diskgroup

c) With your DB in MOUNT state, you can switch over the SYSTEM, SYSAUX, UNDO tablespaces using these commands:
RMAN> copy datafile 1 to '+DATA';
RMAN> switch datafile 1 to copy;

To move other tablespaces while the DB is open:

backup as copy tablespace MEDIA format '+DATA';  => where tablespace MEDIA is associated with datafile #5

RMAN> sql 'alter database datafile 5 offline';
RMAN> list copy of datafile 5;
RMAN> switch datafile 5 to copy;
RMAN> list copy of datafile 5;
RMAN> recover datafile 5;
RMAN> sql 'alter database datafile 5 online';
RMAN> report schema;
RMAN> backup current controlfile;

And you’re back in business

List of ASM hidden parameters

NAME                                        SDESC
------------------------------------------- ----------------------------------------
_asm_access                                 ASM File access mechanism
_lm_asm_enq_hashing                         if TRUE makes ASM use enqueue master hashing for fusion locks
_ges_diagnostics_asm_dump_level             systemstate level on global enqueue diagnostics blocked by ASM
_asm_runtime_capability_volume_support      runtime capability for volume support returns supported
_asm_disable_multiple_instance_check        Disable checking for multiple ASM instances on a given node
_asm_disable_amdu_dump                      Disable AMDU dump
_asmsid                                     ASM instance id
_asm_global_dump_level                      System state dump level for ASM asserts
_remote_asm_publish_caps                    Publish instance type in SQLPLUS
_remote_asm                                 remote ASM configuration
_asm_allow_system_alias_rename              if system alias renaming is allowed
_asm_instlock_quota                         ASM Instance Lock Quota
asm_diskstring                              disk set locations for discovery
_asm_disk_repair_time                       seconds to wait before dropping a failing disk
asm_preferred_read_failure_groups           preferred read failure groups
_asm_disable_profilediscovery               disable profile query for discovery
_asm_imbalance_tolerance                    hundredths of a percentage of inter-disk imbalance to tolerate
_asm_shadow_cycle                           Inverse shadow cycle requirement
_asm_primary_load_cycles                    True if primary load is in cycles, false if extent counts
_asm_primary_load                           Number of cycles/extents to load for non-mirrored files
_asm_secondary_load_cycles                  True if secondary load is in cycles, false if extent counts
_asm_secondary_load                         Number of cycles/extents to load for mirrored files
asm_diskgroups                              disk groups to mount automatically
asm_diskgroups2                             disk groups to mount automatically set 2
asm_diskgroups3                             disk groups to mount automatically set 3
asm_diskgroups4                             disk groups to mount automatically set 4
asm_power_limit                             number of parallel relocations for disk rebalancing
_asm_log_scale_rebalance                    Rebalance power uses logarithmic scale
_asm_sync_rebalance                         Rebalance uses sync I/O
_asm_ausize                                 allocation unit size
_asm_blksize                                metadata block size
_asm_acd_chunks                             initial ACD chunks created
_asm_partner_target_disk_part               target maximum number of disk partners for repartnering
_asm_partner_target_fg_rel                  target maximum number of failure group relationships for repartnering
_asm_automatic_rezone                       automatically rebalance free space across zones
_asm_rebalance_plan_size                    maximum rebalance work unit
_asm_rebalance_space_errors                 number of out of space errors allowed before aborting rebalance
_asm_libraries                              library search order for discovery
_asm_maxio                                  Maximum size of individual I/O request
_asm_allow_only_raw_disks                   Discovery only raw devices
_asm_fob_tac_frequency                      Timeout frequency for FOB cleanup
_asm_emulate_nfs_disk                       Emulate NFS disk test event
_asm_allow_lvm_resilvering                  Enable disk resilvering for external redundancy
_asm_lsod_bucket_size                       ASM lsod bucket size
_asm_iostat_latch_count                     ASM I/O statistics latch count
_asm_disable_smr_creation                   Do Not create smr
_asm_wait_time                              Maximum time to wait before asmb exits
_asm_skip_diskval_check                     skip client side discovery for disk revalidate
_asm_skip_resize_check                      skip the checking of the clients for s/w compatibility for resize
_asm_skip_rename_check                      skip the checking of the clients for s/w compatibility for rename
_asm_direct_con_expire_time                 Expire time for idle direct connection to ASM instance
_asm_check_for_misbehaving_cf_clients       check for misbehaving CF-holding clients
_asm_diag_dead_clients                      diagnostics for dead clients
_asm_disable_ufg_dump                       disable terminated umbilicus diagnostic
_asm_reserve_slaves                         reserve ASM slaves for CF txns
_asm_kill_unresponsive_clients              kill unresponsive ASM clients
_asm_disable_async_msgs                     disable async intra-instance messaging
_asm_remote_client_timeout                  timeout before killing disconnected remote clients
_asm_allow_unsafe_reconnect                 attempt unsafe reconnect to ASM
_asm_disable_ufgmemberkill                  disable ufg member kill
_asm_nodekill_escalate_time                 secs until escalating to nodekill if fence incomplete
_asm_healthcheck_timeout                    seconds until health check takes action
_asm_stripewidth                            ASM file stripe width
_asm_stripesize                             ASM file stripe size
_asm_random_zone                            Random zones for new files
_asm_serialize_volume_rebalance             Serialize volume rebalance
_asm_force_quiesce                          Force diskgroup quiescing
_asm_dba_threshold                          ASM Disk Based Allocation Threshold
_asm_dba_batch                              ASM Disk Based Allocation Max Batch Size
_asm_dba_spcchk_thld                        ASM Disk Based Allocation Space Check Threshold
_asm_usd_batch                              ASM USD Update Max Batch Size
_asm_fail_random_rx                         Randomly fail some RX enqueue gets
_asm_max_redo_buffer_size                   asm maximum redo buffer size
_asm_max_cod_strides                        maximum number of COD strides
_asm_evenread                               ASM Even Read level
_asm_evenread_alpha                         ASM Even Read Alpha
_asm_evenread_alpha2                        ASM Even Read Second Alpha
_asm_evenread_faststart                     ASM Even Read Fast Start Threshold
_asm_noevenread_diskgroups                  List of disk groups having even read disabled
_asm_networks                               ASM network subnet addresses
_asm_dbmsdg_nohdrchk                        dbms_diskgroup.checkfile does not check block headers
_asm_root_directory                         ASM default root directory
_asm_allowdegeneratemounts                  Allow force-mounts of DGs w/o proper quorum
_asm_hbeatiowait                            number of secs to wait for PST Async Hbeat IO return
_asm_hbeatwaitquantum                       quantum used to compute time-to-wait for a PST Hbeat check
_asm_repairquantum                          quantum (in 3s) used to compute elapsed time for disk drop
_asm_emulmax                                max number of concurrent disks to emulate I/O errors
_asm_emultimeout                            timeout before emulation begins (in 3s ticks)
_asm_kfdpevent                              KFDP event
_asm_storagemaysplit                        PST Split Possible
_asm_avoid_pst_scans                        Avoid PST Scans
_asm_compatibility                          default ASM compatibility level
_asm_proxy_startwait                        Maximum time to wait for ASM proxy connection
_asm_admin_with_sysdba                      Does the sysdba role have administrative privileges on ASM?
_asm_allow_appliance_dropdisk_noforce       Allow DROP DISK/FAILUREGROUP NOFORCE on ASM Appliances
_asm_appliance_config_file                  Appliance configuration file name
_asm_scrub_limit                            ASM disk scrubbing power
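For reference, a comparable list can be pulled straight out of an ASM instance. A sketch of the usual fixed-table query (x$ksppi holds parameter names and descriptions, x$ksppcv the current values; run as SYSASM, and note it only matches names beginning with _asm):

set linesize 200 pagesize 1400
col name  format a45
col value format a20
col sdesc format a70
select i.ksppinm name, cv.ksppstvl value, i.ksppdesc sdesc
  from x$ksppi i, x$ksppcv cv
 where i.indx = cv.indx
   and i.ksppinm like '\_asm%' escape '\'
 order by 1;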

 

 

 

Script: ASM repair - locating the LISTHEAD and dumping metadata with kfed

The following scripts are used when repairing an ASM disk header:

 

 

1. Dump the various useful metadata blocks with kfed:

 

#!/bin/sh
# Dump the key ASM metadata blocks of every /dev/asm-disk* device with kfed
rm -f /tmp/kfed_DH.out /tmp/kfed_FS.out /tmp/kfed_BK.out /tmp/kfed_FD.out /tmp/kfed_DD.out /tmp/kfed_PST.out
for i in `ls /dev/asm-disk*`
do
echo $i >> /tmp/kfed_DH.out
kfed read $i >> /tmp/kfed_DH.out                  # disk header, aun=0 blkn=0
echo $i >> /tmp/kfed_FS.out
kfed read $i blkn=1 >> /tmp/kfed_FS.out           # free space table, aun=0 blkn=1
echo $i >> /tmp/kfed_BK.out
kfed read $i aun=1 blkn=254 >> /tmp/kfed_BK.out   # disk header backup copy (1 MB AU), aun=1 blkn=254
echo $i >> /tmp/kfed_FD.out
kfed read $i aun=2 blkn=1 >> /tmp/kfed_FD.out     # file directory, aun=2 blkn=1
echo $i >> /tmp/kfed_DD.out
kfed read $i aun=2 blkn=2 >> /tmp/kfed_DD.out     # file directory, aun=2 blkn=2
echo $i >> /tmp/kfed_PST.out
kfed read $i aun=1 blkn=2 >> /tmp/kfed_PST.out    # PST block, aun=1 blkn=2
done

 

 

 

kfed_DH.out  ==> KFBTYP_DISKHEAD                       aun=0 blkn=0
kfed_FS.out  ==> KFBTYP_FREESPC                        aun=0 blkn=1
kfed_BK.out  ==> KFBTYP_DISKHEAD (disk header backup)  aun=1 blkn=254
kfed_FD.out  ==> KFBTYP_FILEDIR                        aun=2 blkn=1
kfed_DD.out  ==> KFBTYP_FILEDIR                        aun=2 blkn=2
kfed_PST.out ==> KFBTYP_PST_NONE                       aun=1 blkn=2

 

2. Query ASM disk and diskgroup information from SQL:

 

 

spool asm_info.html
set pagesize 1000
set linesize 250
set feedback off
col bytes format 999,999,999,999
col space format 999,999,999,999
col gn format 999
col name format a20
col au format 99999999
col state format a12
col type format a12
col total_mb format 999,999,999
col free_mb format 999,999,999
col od format 999
col compatibility format a12
col dn format 999
col mount_status format a12
col header_status format a12
col mode_status format a12
col mode format a12
col failgroup format a20
col label format a12
col path format a45
col path1 format a40
col path2 format a40
col path3 format a40
col bytes_read format 999,999,999,999,999
col bytes_written format 999,999,999,999,999
col cold_bytes_read format 999,999,999,999,999
col cold_bytes_written format 999,999,999,999,999

alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS' ;

select to_char(sysdate, 'DD-MON-YYYY HH24:MI:SS' ) current_time from dual;
select group_number gn, name, allocation_unit_size au, state, type, total_mb, free_mb, offline_disks od, compatibility from v$asm_diskgroup;
select group_number gn,disk_number dn, mount_status, header_status,mode_status,state, total_mb, free_mb,name, failgroup, label, path,create_date, mount_date from v$asm_disk order by group_number, disk_number;

break on g_n skip 1
break on failgroup skip 1
compute sum of t_mb f_mb on failgroup
compute count of failgroup on failgroup

select g.group_number g_n,g.disk_number d_n,g.name , g.path , g.total_mb t_mb,g.free_mb f_mb,g.failgroup from v$asm_disk g order by g_n, failgroup, d_n;
SET MARKUP HTML ON
set echo on
select 'THIS ASM REPORT WAS GENERATED AT: ==)> ' , sysdate " " from dual;
select 'HOSTNAME ASSOCIATED WITH THIS ASM INSTANCE: ==)> ' , MACHINE " " from v$session where program like '%SMON%';
select * from v$asm_diskgroup;
SELECT * FROM V$ASM_DISK ORDER BY GROUP_NUMBER,DISK_NUMBER;
SELECT * FROM V$ASM_CLIENT;
select * from V$ASM_ATTRIBUTE;
select * from v$asm_operation;
select * from v$version;
show parameter
show sga
spool off
exit

 

 

 

AMDU result:

 

 

Placeholder for AMDU binaries and using with ASM 10g (Doc ID 553639.1)

amdu -diskstring '/dev/asm-disk*' -dump 'MACLEAN_DG' -noimage

 

 

 

4. Script to find the LISTHEAD

 

 

#!/bin/bash
# Usage: scan.sh <path> <AU size> <disk size in AU>
# Scan every AU of the given disk with kfed until a block of type KFBTYP_LISTHEAD is found.
i=0
size=0
asize=$2
rm -f list.txt
echo AUSIZE=$asize
while [ 1 ]
do
kfed read $1 ausz=$asize aunum=$i blknum=0 | grep LISTHEAD > list.txt
size=$(stat -c %s list.txt)
if [ $size -gt 0 ]; then
  echo LISTHEAD is found in AU=$i FILE=lhAU$i.txt
  kfed read $1 ausz=$asize aunum=$i blknum=0 text=lhAU$i.txt
fi
i=$[$i+1]
if [ $i -eq $3 ]; then
  echo $3 AUs scanned
  exit 0
fi
done

 

 

Usage:

 

[grid@vmac1 tmp]$ ./scan.sh /dev/asm-diskb 1048576 10
AUSIZE=1048576
LISTHEAD is found in AU=2 FILE=lhAU2.txt
10 AUs scanned

Lost ASM disk header causes ORA-15032, ORA-15040, ORA-15042 and the diskgroup cannot be mounted

Cases where a lost ASM disk header leads to ORA-15032, ORA-15040 and ORA-15042 and the diskgroup cannot be mounted are not uncommon; here we show how to resolve this.

 

 

If you cannot fix it yourself, the ParnassusData (诗檀软件) professional Oracle database recovery team can help you recover!

ParnassusData professional database recovery team

Service hotline: 13764045638    QQ: 47079569    Email: service@parnassusdata.com

 

 

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> alter diskgroup datadg mount;
alter diskgroup datadg mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "5" is missing from group number "1"
ERROR: alter diskgroup datadg mount
Wed Mar 13 07:42:03 2013
SQL> alter diskgroup datadg mount
NOTE: cache registered group DATADG number=1 incarn=0xccb845cd
NOTE: cache began mount (first) of group DATADG number=1 incarn=0xccb845cd
NOTE: Assigning number (1,2) to disk (/dev/asm-diskg)
NOTE: Assigning number (1,1) to disk (/dev/asm-diskf)
NOTE: Assigning number (1,0) to disk (/dev/asm-diske)
Wed Mar 13 16:42:09 2013
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 20 for pid 27, osid 5439
NOTE: Assigning number (1,5) to disk ()
GMON querying group 1 at 21 for pid 27, osid 5439
NOTE: cache dismounting (clean) group 1/0xCCB845CD (DATADG)
NOTE: messaging CKPT to quiesce pins Unix process pid: 5439, image: oracle@vmac1 (TNS V1-V3)
NOTE: dbwr not being msg’d to dismount
NOTE: lgwr not being msg’d to dismount
NOTE: cache dismounted group 1/0xCCB845CD (DATADG)
NOTE: cache ending mount (fail) of group DATADG number=1 incarn=0xccb845cd
NOTE: cache deleting context for group DATADG 1/0xccb845cd
GMON dismounting group 1 at 22 for pid 27, osid 5439
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
ERROR: diskgroup DATADG was not mounted
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "5" is missing from group number "1"
ERROR: alter diskgroup datadg mount
Wed Mar 13 16:42:10 2013
ASM Health Checker found 1 new failures

 

[grid@vmac1 ~]$ kfed read /dev/asm-diskh
kfbh.endian: 0 ; 0x000: 0x00
kfbh.hard: 0 ; 0x001: 0x00
kfbh.type: 0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt: 0 ; 0x003: 0x00
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 0 ; 0x008: file=0
kfbh.check: 0 ; 0x00c: 0x00000000
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
7FA1DA233400 00000000 00000000 00000000 00000000 […………….]
Repeat 255 times
KFED-00322: Invalid content encountered during block traversal: [kfbtTraverseBlock][Invalid OSM block type][][0]
col path for a20
set linesize 200 pagesize 1400
select path,header_status,state from v$asm_disk;
PATH HEADER_STATUS STATE
-------------------- ------------------------------------ ------------------------
/dev/asm-diskh CANDIDATE NORMAL
/dev/asm-diskg MEMBER NORMAL
/dev/asm-diskf MEMBER NORMAL
/dev/asm-diske MEMBER NORMAL
/dev/asm-diskc MEMBER NORMAL
/dev/asm-diskd MEMBER NORMAL
/dev/asm-diskb MEMBER NORMAL

7 rows selected.

[grid@vmac1 ~]$ kfed repair /dev/asm-diskh
KFED-00320: Invalid block num1 = [0], num2 = [1], error = [endian_kfbh]

 

[grid@vmac1 ~]$ kfed repair /dev/asm-diskh ausz=1048576
KFED-00320: Invalid block num1 = [0], num2 = [1], error = [endian_kfbh]
1. Shut down the ASM instance.

 
2. First back up the header of the problem ASM disk:

dd if=<bad disk> of=<file> bs=4096 count=1
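For example, for the damaged disk in this case (the output file name is just an illustration):

[grid@vmac1 ~]$ dd if=/dev/asm-diskh of=/tmp/asm-diskh.hdr.bak bs=4096 count=1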

 
3. Check the existing disks to find the one that holds file 1 block 1 (F1B1):

[grid@vmac1 ~]$ kfed read /dev/asm-diske |grep f1b1
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002

[grid@vmac1 ~]$
[grid@vmac1 ~]$ kfed read /dev/asm-diskf |grep f1b1
kfdhdb.f1b1locn: 0 ; 0x0d4: 0x00000000

[grid@vmac1 ~]$ kfed read /dev/asm-diskg |grep f1b1
kfdhdb.f1b1locn: 0 ; 0x0d4: 0x00000000

 

Here asm-diske shows a non-zero f1b1locn value, so it holds file 1 block 1; this can be confirmed by checking whether the AU it points to (AU 2) is of type KFBTYP_LISTHEAD:

[grid@vmac1 ~]$ kfed read /dev/asm-diske aun=2|grep kfbh.type
kfbh.type: 5 ; 0x002: KFBTYP_LISTHEAD

 

 

If the lost disk held file 1 block 1 (F1B1), scan all AUs on that disk until KFBTYP_LISTHEAD is found; if no LISTHEAD can be found, there is no choice but to recreate the diskgroup.

 
Starting with 11.1.0.7 (10.2.0.5 for 10g, so try to avoid ASM on releases older than 10.2.0.5), every write committed to the ASM disk header (AU 0, block 0) is also copied into AU 1, in the second-to-last block. The location of that block depends on the AU size:

Allocation Unit Size    Block Number on AU 1
1048576                 254
4194304                 1022
8388608                 2046
16777216                4094
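The pattern behind this table: assuming the default 4 KB ASM metadata block size (_asm_blksize), the backup header sits in the second-to-last block of AU 1, i.e. at block AU_size/4096 - 2. A quick check for the 1 MB case:

$ echo $(( 1048576 / 4096 - 2 ))
254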

 

First use kfed to verify whether a valid disk header exists at that location; otherwise find the correct header manually, using the table above as a reference:

[grid@vmac1 ~]$ kfed read /dev/asm-diske ausz=1048576 aun=1 blkn=254|less
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 254 ; 0x004: blk=254
kfbh.block.obj: 2147483648 ; 0x008: disk=0
kfbh.check: 2086475720 ; 0x00c: 0x7c5d17c8
kfbh.fcn.base: 31322 ; 0x010: 0x00007a5a
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
Then repair the disk header with the kfed repair command:

[grid@vmac1 ~]$ kfed repair /dev/asm-diskh ausz=1048576

 

 
If the automatic backup of the disk header has also been lost, the following error is reported:

KFED-00320: Invalid block num1 = [0], num2 = [1], error = [endian_kfbh]

 

If the kfed repair above does not work, the disk header has to be recovered manually:

 

For releases before 10.2.0.5, where kfed repair is not available:

 

Find a good disk header on a disk that is in the same diskgroup as the problem disk and does not hold F1B1, for example asm-diskf here:
[grid@vmac1 ~]$ kfed read /dev/asm-diskf |grep f1b1
kfdhdb.f1b1locn: 0 ; 0x0d4: 0x00000000

 

Save its header with kfed read <device name> > fix.txt:
[grid@vmac1 ~]$ kfed read /dev/asm-diskf > fix.txt
Edit fix.txt and modify three fields: kfdhdb.dsknum, kfdhdb.dskname and kfdhdb.fgname.

Also refer to the information in the alert.log:

[grid@vmac1 trace]$ grep "cache opening" alert_+ASM1.log
NOTE: cache opening disk 0 of grp 1: SYSTEDG_0000 path:/dev/asm-diskb
NOTE: cache opening disk 1 of grp 1: SYSTEDG_0001 path:/dev/asm-diskc
NOTE: cache opening disk 2 of grp 1: SYSTEDG_0002 path:/dev/asm-diskd
NOTE: cache opening disk 0 of grp 1: SYSTEDG_0000 path:/dev/asm-diskb
NOTE: cache opening disk 1 of grp 1: SYSTEDG_0001 path:/dev/asm-diskc
NOTE: cache opening disk 2 of grp 1: SYSTEDG_0002 path:/dev/asm-diskd
NOTE: cache opening disk 0 of grp 1: SYSTEDG_0000 path:/dev/asm-diskb
NOTE: cache opening disk 1 of grp 1: SYSTEDG_0001 path:/dev/asm-diskc
NOTE: cache opening disk 2 of grp 1: SYSTEDG_0002 path:/dev/asm-diskd
NOTE: cache opening disk 0 of grp 1: SYSTEDG_0000 path:/dev/asm-diskb
NOTE: cache opening disk 1 of grp 1: SYSTEDG_0001 path:/dev/asm-diskc
NOTE: cache opening disk 2 of grp 1: SYSTEDG_0002 path:/dev/asm-diskd
NOTE: cache opening disk 0 of grp 1: SYSTEDG_0000 path:/dev/asm-diskb
NOTE: cache opening disk 1 of grp 1: SYSTEDG_0001 path:/dev/asm-diskc
NOTE: cache opening disk 2 of grp 1: SYSTEDG_0002 path:/dev/asm-diskd
NOTE: cache opening disk 0 of grp 1: SYSTEDG_0000 path:/dev/asm-diskb
NOTE: cache opening disk 1 of grp 1: SYSTEDG_0001 path:/dev/asm-diskc
NOTE: cache opening disk 2 of grp 1: SYSTEDG_0002 path:/dev/asm-diskd
NOTE: cache opening disk 0 of grp 2: DATADG_0000 path:/dev/asm-diske
NOTE: cache opening disk 1 of grp 2: DATADG_0001 path:/dev/asm-diskf
NOTE: cache opening disk 2 of grp 2: DATADG_0002 path:/dev/asm-diskg
NOTE: cache opening disk 0 of grp 2: DATADG_0000 path:/dev/asm-diske
NOTE: cache opening disk 1 of grp 2: DATADG_0001 path:/dev/asm-diskf
NOTE: cache opening disk 2 of grp 2: DATADG_0002 path:/dev/asm-diskg
NOTE: cache opening disk 0 of grp 1: DATADG_0000 path:/dev/asm-diske
NOTE: cache opening disk 1 of grp 1: DATADG_0001 path:/dev/asm-diskf
NOTE: cache opening disk 2 of grp 1: DATADG_0002 path:/dev/asm-diskg
NOTE: cache opening disk 0 of grp 2: SYSTEDG_0000 path:/dev/asm-diskb
NOTE: cache opening disk 1 of grp 2: SYSTEDG_0001 path:/dev/asm-diskc
NOTE: cache opening disk 2 of grp 2: SYSTEDG_0002 path:/dev/asm-diskd
NOTE: cache opening disk 5 of grp 1: DATADG_0005 path:/dev/asm-diskh

 

 
Original content of fix.txt:

[grid@vmac1 ~]$ egrep "dsknum|grptyp|hdrsts|dskname|grpname|fgname" fix.txt
kfdhdb.dsknum: 1 ; 0x024: 0x0001
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATADG_0001 ; 0x028: length=11
kfdhdb.grpname: DATADG ; 0x048: length=6
kfdhdb.fgname: DATADG_0001 ; 0x068: length=11

 

After modification:

[grid@vmac1 ~]$ egrep "dsknum|grptyp|hdrsts|dskname|grpname|fgname" fix.txt
kfdhdb.dsknum: 5 ; 0x024: 0x0005
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATADG_0005 ; 0x028: length=11
kfdhdb.grpname: DATADG ; 0x048: length=6
kfdhdb.fgname: DATADG_0005 ; 0x068: length=11

 

kfbh.block.obj also needs to be modified:
[grid@vmac1 ~]$ grep kfbh.block.obj fix.txt
kfbh.block.obj: 2147483649 ; 0x008: disk=1

2147483649 ==> 0x80000001
The low-order part of 0x80000001 is the ASM disk number and should equal kfdhdb.dsknum; here it becomes 0x80000005 ==> 2147483653
[grid@vmac1 ~]$ grep kfbh.block.obj fix.txt
kfbh.block.obj: 2147483653 ; 0x008: disk=5
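In other words, kfbh.block.obj of a disk header is 0x80000000 plus the disk number; the decimal value can be double-checked from the shell:

$ echo $(( 0x80000000 + 5 ))
2147483653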

 
If ASMLIB on the Windows platform is used, you would also have the hassle of modifying kfdhdb.driver.reserved[0]; fortunately very few people use ASMLIB on Windows.

 

 

 

Next, look at aunum=2 blknum=2 to find the disk directory; kfed read the aunum=2 blknum=2 location of the disk that holds F1B1:

kfed read <device name> aunum=2 blknum=2 | more
[grid@vmac1 ~]$ kfed read /dev/asm-diske|grep f1b1
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002

 

[grid@vmac1 ~]$ kfed read /dev/asm-diske aunum=2 blknum=2|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 2 ; 0x004: blk=2
kfbh.block.obj: 1 ; 0x008: file=1
kfbh.check: 322527999 ; 0x00c: 0x133962ff
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfffdb.node.incarn: 1 ; 0x000: A=1 NUMM=0x0
kfffdb.node.frlist.number: 4294967295 ; 0x004: 0xffffffff

…………………….
kfffde[0].xptr.au: 3 ; 0x4a0: 0x00000003
kfffde[0].xptr.disk: 0 ; 0x4a4: 0x0000
kfffde[0].xptr.flags: 0 ; 0x4a6: L=0 E=0 D=0 S=0
kfffde[0].xptr.chk: 41 ; 0x4a7: 0x29
kfffde[1].xptr.au: 4294967295 ; 0x4a8: 0xffffffff
kfffde[1].xptr.disk: 65535 ; 0x4ac: 0xffff
kfffde[1].xptr.flags: 0 ; 0x4ae: L=0 E=0 D=0 S=0
kfffde[1].xptr.chk: 42 ; 0x4af: 0x2a
==> the disk directory is located at aunum=3 on disk=0

NOTE: cache opening disk 0 of grp 2: DATADG_0000 path:/dev/asm-diske ==> still asm-diske
[grid@vmac1 ~]$ kfed read /dev/asm-diske aunum=3 blknum=0|more
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 6 ; 0x002: KFBTYP_DISKDIR
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 2 ; 0x008: file=2
kfbh.check: 389127513 ; 0x00c: 0x17319d59
kfbh.fcn.base: 31299 ; 0x010: 0x00007a43
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kffdnd.bnode.incarn: 1 ; 0x000: A=1 NUMM=0x0
kffdnd.bnode.frlist.number: 4294967295 ; 0x004: 0xffffffff
kffdnd.bnode.frlist.incarn: 0 ; 0x008: A=0 NUMM=0x0
kffdnd.overfl.number: 4294967295 ; 0x00c: 0xffffffff
kffdnd.overfl.incarn: 0 ; 0x010: A=0 NUMM=0x0
kffdnd.parent.number: 0 ; 0x014: 0x00000000
kffdnd.parent.incarn: 1 ; 0x018: A=1 NUMM=0x0
kffdnd.fstblk.number: 0 ; 0x01c: 0x00000000
kffdnd.fstblk.incarn: 1 ; 0x020: A=1 NUMM=0x0
kfddde[0].entry.incarn: 1 ; 0x024: A=1 NUMM=0x0
kfddde[0].entry.hash: 0 ; 0x028: 0x00000000
kfddde[0].entry.refer.number:4294967295 ; 0x02c: 0xffffffff
kfddde[0].entry.refer.incarn: 0 ; 0x030: A=0 NUMM=0x0
………………………………………………….

The kfddde structures are the disk directory entries; only entries whose kfddde[n].entry.incarn has A=1 are allocated records, while A=0 means the record has been deleted.

[grid@vmac1 ~]$ grep "kfddde\[5\]" disk.txt
kfddde[5].entry.incarn: 1 ; 0x8e4: A=1 NUMM=0x0
kfddde[5].entry.hash: 5 ; 0x8e8: 0x00000005
kfddde[5].entry.refer.number:4294967295 ; 0x8ec: 0xffffffff
kfddde[5].entry.refer.incarn: 0 ; 0x8f0: A=0 NUMM=0x0
kfddde[5].dsknum: 5 ; 0x8f4: 0x0005
kfddde[5].state: 2 ; 0x8f6: KFDSTA_NORMAL
kfddde[5].ddchgfl: 132 ; 0x8f7: 0x84
kfddde[5].dskname: DATADG_0005 ; 0x8f8: length=11
kfddde[5].fgname: DATADG_0005 ; 0x918: length=11
kfddde[5].crestmp.hi: 32984459 ; 0x938: HOUR=0xb DAYS=0xc MNTH=0x3 YEAR=0x7dd
kfddde[5].crestmp.lo: 2470649856 ; 0x93c: USEC=0x0 MSEC=0xc8 SECS=0x34 MINS=0x24
kfddde[5].failstmp.hi: 0 ; 0x940: HOUR=0x0 DAYS=0x0 MNTH=0x0 YEAR=0x0
kfddde[5].failstmp.lo: 0 ; 0x944: USEC=0x0 MSEC=0x0 SECS=0x0 MINS=0x0
kfddde[5].timer: 0 ; 0x948: 0x00000000
kfddde[5].size: 5120 ; 0x94c: 0x00001400
kfddde[5].srRloc.super.hiStart: 0 ; 0x950: 0x00000000
kfddde[5].srRloc.super.loStart: 0 ; 0x954: 0x00000000
kfddde[5].srRloc.super.length: 0 ; 0x958: 0x00000000
kfddde[5].srRloc.incarn: 0 ; 0x95c: 0x00000000
kfddde[5].dskrprtm: 0 ; 0x960: 0x00000000
kfddde[5].zones[0].start: 0 ; 0x964: 0x00000000
kfddde[5].zones[0].size: 5120 ; 0x968: 0x00001400
kfddde[5].zones[0].used: 2 ; 0x96c: 0x00000002

 

Back to editing fix.txt: adjust crestmp.hi and crestmp.lo to match the information shown above; if they already match, no change is needed.

Before:

[grid@vmac1 ~]$ egrep "hi|lo" fix.txt
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 2147483653 ; 0x008: disk=5
kfdhdb.crestmp.hi: 32983191 ; 0x0a8: HOUR=0x17 DAYS=0x4 MNTH=0x2 YEAR=0x7dd
kfdhdb.crestmp.lo: 2328519680 ; 0x0ac: USEC=0x0 MSEC=0x299 SECS=0x2c MINS=0x22
kfdhdb.mntstmp.hi: 32984468 ; 0x0b0: HOUR=0x14 DAYS=0xc MNTH=0x3 YEAR=0x7dd
kfdhdb.mntstmp.lo: 1231840256 ; 0x0b4: USEC=0x0 MSEC=0x319 SECS=0x16 MINS=0x12
kfdhdb.fstlocn: 1 ; 0x0cc: 0x00000001
kfdhdb.altlocn: 2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn: 0 ; 0x0d4: 0x00000000
kfdhdb.grpstmp.hi: 32983191 ; 0x0e4: HOUR=0x17 DAYS=0x4 MNTH=0x2 YEAR=0x7dd
kfdhdb.grpstmp.lo: 2328331264 ; 0x0e8: USEC=0x0 MSEC=0x1e1 SECS=0x2c MINS=0x22
After:

kfdhdb.crestmp.hi: 32984459 ; 0x938: HOUR=0xb DAYS=0xc MNTH=0x3 YEAR=0x7dd
kfdhdb.crestmp.lo: 2470649856 ; 0x93c: USEC=0x0 MSEC=0xc8 SECS=0x34 MINS=0x24
kfdhdb.mntstmp.hi: 32984468 ; 0x0b0: HOUR=0x14 DAYS=0xc MNTH=0x3 YEAR=0x7dd
kfdhdb.mntstmp.lo: 1231840256 ; 0x0b4: USEC=0x0 MSEC=0x319 SECS=0x16 MINS=0x12

 
Then merge the header back onto the disk with the kfed merge command:

kfed merge <device name> text=fix.txt

[grid@vmac1 ~]$ kfed merge /dev/asm-diskh text=fix.txt

 

If ASMLIB is in use, repair the ASMLIB information in the header with the following commands:

 

/etc/init.d/oracleasm force-renamedisk /dev/sdbg1 <ASMLIB Disk Name>
/etc/init.d/oracleasm scandisks
/etc/init.d/oracleasm listdisks

 

 

Then start the ASM instance to nomount:

SQL> startup nomount;
SQL> col path for a20
SQL> set linesize 200 pagesize 1400
SQL> select path,header_status,state from v$asm_disk;

PATH HEADER_STATUS STATE
-------------------- ------------------------------------ ------------------------
/dev/asm-diskh MEMBER NORMAL
/dev/asm-diskg MEMBER NORMAL
/dev/asm-diskf MEMBER NORMAL
/dev/asm-diske MEMBER NORMAL
/dev/asm-diskc MEMBER NORMAL
/dev/asm-diskd MEMBER NORMAL
/dev/asm-diskb MEMBER NORMAL

7 rows selected.
Check that the header status is now MEMBER:

 

 

 

[grid@vmac1 ~]$ kfed read /dev/asm-diskh
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: blk=0
kfbh.block.obj: 2147483653 ; 0x008: disk=5
kfbh.check: 3412972861 ; 0x00c: 0xcb6dd53d
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000

 

Then run: alter diskgroup <problem diskgroup> mount;

If the above steps went well, the diskgroup can be mounted normally:
SQL>
SQL> alter diskgroup datadg mount;

Diskgroup altered.
NOTE: cache registered group DATADG number=1 incarn=0x01c845f0
NOTE: cache began mount (first) of group DATADG number=1 incarn=0x01c845f0
NOTE: Assigning number (1,5) to disk (/dev/asm-diskh)
NOTE: Assigning number (1,2) to disk (/dev/asm-diskg)
NOTE: Assigning number (1,1) to disk (/dev/asm-diskf)
NOTE: Assigning number (1,0) to disk (/dev/asm-diske)
Wed Mar 13 19:39:49 2013
NOTE: GMON heartbeating for grp 1
GMON querying group 1 at 56 for pid 27, osid 8690
NOTE: cache opening disk 0 of grp 1: DATADG_0000 path:/dev/asm-diske
NOTE: F1X0 found on disk 0 au 2 fcn 0.31322
NOTE: cache opening disk 1 of grp 1: DATADG_0001 path:/dev/asm-diskf
NOTE: cache opening disk 2 of grp 1: DATADG_0002 path:/dev/asm-diskg
NOTE: cache opening disk 5 of grp 1: DATADG_0005 path:/dev/asm-diskh
NOTE: cache mounting (first) external redundancy group 1/0x01C845F0 (DATADG)
Wed Mar 13 19:39:49 2013
* allocate domain 1, invalid = TRUE
kjbdomatt send to inst 2
Wed Mar 13 19:39:49 2013
NOTE: attached to recovery domain 1
NOTE: starting recovery of thread=1 ckpt=11.2351 group=1 (DATADG)
NOTE: advancing ckpt for group 1 (DATADG) thread=1 ckpt=11.2351
NOTE: cache recovered group 1 to fcn 0.33763
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Wed Mar 13 19:39:49 2013
NOTE: LGWR attempting to mount thread 1 for diskgroup 1 (DATADG)
NOTE: LGWR found thread 1 closed at ABA 11.2350
NOTE: LGWR mounted thread 1 for diskgroup 1 (DATADG)
NOTE: LGWR opening thread 1 at fcn 0.33763 ABA 12.2351
NOTE: cache mounting group 1/0x01C845F0 (DATADG) succeeded
NOTE: cache ending mount (success) of group DATADG number=1 incarn=0x01c845f0
GMON querying group 1 at 57 for pid 18, osid 2911
Wed Mar 13 19:39:49 2013
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
SUCCESS: diskgroup DATADG was mounted
SUCCESS: alter diskgroup datadg mount
Wed Mar 13 19:39:49 2013
NOTE: diskgroup resource ora.DATADG.dg is online
NOTE: diskgroup resource ora.DATADG.dg is updated
Wed Mar 13 19:39:59 2013
NOTE: client PROD1:PROD registered, osid 10169, mbr 0x1
Wed Mar 13 19:40:11 2013
NOTE: ASM client PROD1:PROD disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /g01/orabase/diag/asm/+asm/+ASM1/trace/+ASM1_ora_10169.trc

 

 

 

 
Note that the repair above only covers the case where roughly the first 20 MB of the disk (the figure is not exact), including the ASM header, has been lost; if more has been lost, the diskgroup may still fail to mount.
If ASM file number 4, the Continuing Operations Directory (COD), has also been lost, the diskgroup is generally very hard to mount; in the example below, once the first 22 MB are lost, even kfed merge cannot bring it back.
SELECT x.xnum_kffxp "Extent",
x.au_kffxp "AU",
x.disk_kffxp "Disk #",
d.name "Disk name"
FROM x$kffxp x, v$asm_disk_stat d
WHERE x.group_kffxp=d.group_number
and x.disk_kffxp=d.disk_number
and x.group_kffxp=1
and x.number_kffxp=4
ORDER BY 1, 2;

Extent AU Disk # Disk name
---------- ---------- ---------- ------------------------------------------------------------------------------------------
0 21 5 DATADG_0005
1 16 1 DATADG_0001
2 33 2 DATADG_0002
3 34 0 DATADG_0000
4 22 5 DATADG_0005
5 34 2 DATADG_0002
6 35 0 DATADG_0000
7 33 1 DATADG_0001

SQL> alter diskgroup datadg dismount;

Diskgroup altered.

[grid@vmac1 ~]$ dd if=/dev/zero of=/dev/asm-diskh bs=1024k count=20
20+0 records in
20+0 records out
20971520 bytes (21 MB) copied, 0.0165823 s, 1.3 GB/s

 
[grid@vmac1 ~]$ kfed merge /dev/asm-diskh text=fix.txt

SQL> alter diskgroup datadg mount;

Diskgroup altered.

SQL> alter diskgroup datadg mount;

Diskgroup altered.
SQL> alter diskgroup datadg dismount;

Diskgroup altered.

[grid@vmac1 ~]$ dd if=/dev/zero of=/dev/asm-diskh bs=1024k count=21
21+0 records in
21+0 records out
22020096 bytes (22 MB) copied, 0.0182842 s, 1.2 GB/s
[grid@vmac1 ~]$ kfed merge /dev/asm-diskh text=fix.txt
SQL> alter diskgroup datadg mount;

Diskgroup altered.

 

[grid@vmac1 ~]$ dd if=/dev/zero of=/dev/asm-diskh bs=1024k count=22
22+0 records in
22+0 records out
23068672 bytes (23 MB) copied, 0.0312157 s, 739 MB/s
[grid@vmac1 ~]$ kfed merge /dev/asm-diskh text=fix.txt
SQL> alter diskgroup datadg mount;
alter diskgroup datadg mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15130: diskgroup "DATADG" is being dismounted
ORA-15066: offlining disk "DATADG_0005" in group "DATADG" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26077] [endian_kfbh] [4] [0] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26077] [endian_kfbh] [4] [0] [0 != 1]

 

Errors in file /g01/orabase/diag/asm/+asm/+ASM1/trace/+ASM1_ora_8690.trc:
ORA-15196: invalid ASM block header [kfc.c:26077] [endian_kfbh] [4] [0] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26077] [endian_kfbh] [4] [0] [0 != 1]
ERROR: cache failed to read group=1(DATADG) fn=4 blk=0 from disk(s): 5(DATADG_0005)
ORA-15196: invalid ASM block header [kfc.c:26077] [endian_kfbh] [4] [0] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26077] [endian_kfbh] [4] [0] [0 != 1]
NOTE: cache initiating offline of disk 5 group DATADG
NOTE: process _user8690_+asm1 (8690) initiating offline of disk 5.3915953639 (DATADG_0005) with mask 0x7e in group 1
WARNING: Disk 5 (DATADG_0005) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 1, dsk = 5/0xe968b5e7, mask = 0x6a, op = clear
GMON updating disk modes for group 1 at 108 for pid 27, osid 8690
ERROR: Disk 5 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 1)
WARNING: Offline of disk 5 (DATADG_0005) in group 1 and mode 0x7f failed on ASM inst 1
Wed Mar 13 20:00:56 2013
NOTE: halting all I/Os to diskgroup 1 (DATADG)
System State dumped to trace file /g01/orabase/diag/asm/+asm/+ASM1/trace/+ASM1_ora_8690.trc
NOTE: AMDU dump of disk group DATADG created at /g01/orabase/diag/asm/+asm/+ASM1/trace
ERROR: ORA-15130 signalled during mount of diskgroup DATADG
NOTE: cache dismounting (clean) group 1/0xEB784617 (DATADG)
NOTE: messaging CKPT to quiesce pins Unix process pid: 8690, image: oracle@vmac1 (TNS V1-V3)
NOTE: LGWR doing non-clean dismount of group 1 (DATADG)
NOTE: LGWR sync ABA=18.2360 last written ABA 18.2360
kjbdomdet send to inst 2
detach from dom 1, sending detach message to inst 2
Wed Mar 13 20:00:57 2013
List of instances:
1 2
Dirty detach reconfiguration started (new ddet inc 1, cluster inc 12)
Global Resource Directory partially frozen for dirty detach
* dirty detach – domain 1 invalid = TRUE
0 GCS resources traversed, 0 cancelled
Dirty Detach Reconfiguration complete
freeing rdom 1
WARNING: dirty detached from domain 1
NOTE: cache dismounted group 1/0xEB784617 (DATADG)
NOTE: cache ending mount (fail) of group DATADG number=1 incarn=0xeb784617
NOTE: cache deleting context for group DATADG 1/0xeb784617
GMON dismounting group 1 at 109 for pid 27, osid 8690
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
NOTE: Disk in mode 0x8 marked for de-assignment
ERROR: diskgroup DATADG was not mounted
ORA-15032: not all alterations performed
ORA-15130: diskgroup "DATADG" is being dismounted
ORA-15066: offlining disk "DATADG_0005" in group "DATADG" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26077] [endian_kfbh] [4] [0] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26077] [endian_kfbh] [4] [0] [0 != 1]
ERROR: alter diskgroup datadg mount
SQL> alter diskgroup datadg mount;
alter diskgroup datadg mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15130: diskgroup "DATADG" is being dismounted
ORA-15066: offlining disk "DATADG_0005" in group "DATADG" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26077] [endian_kfbh] [4] [0] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26077] [endian_kfbh] [4] [0] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26077] [endian_kfbh] [4] [0] [0 !=1]
ORA-15196: invalid ASM block header [kfc.c:26077] [endian_kfbh] [4] [0] [0 !=1]

[grid@vmac1 trace]$ dd if=/dev/asm-diske of=/dev/asm-diskh bs=4096 skip=3 seek=3 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000617397 s, 6.6 MB/s

 

[grid@vmac1 trace]$ dd if=/dev/asm-diske of=/dev/asm-diskh bs=4096 skip=4 seek=4 count=1
kfffde[0].xptr.au: 21 ; 0x4a0: 0x00000015
kfffde[0].xptr.disk: 5 ; 0x4a4: 0x0005
kfffde[0].xptr.flags: 0 ; 0x4a6: L=0 E=0 D=0 S=0
kfffde[0].xptr.chk: 58 ; 0x4a7: 0x3a
kfffde[1].xptr.au: 16 ; 0x4a8: 0x00000010

 
[grid@vmac1 trace]$ dd if=/dev/asm-diske of=/dev/asm-diskh bs=1048576 skip=21 seek=21 count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00296742 s, 353 MB/s

 

ORA-15066: offlining disk "DATADG_0005" in group "DATADG" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26077] [obj_kfbl] [4] [0] [3 != 4]
ORA-15196: invalid ASM block header [kfc.c:26077] [obj_kfbl] [4] [0] [3 != 4]
ERROR: alter diskgroup datadg mount force

 

How to diagnose ASMLIB problems

Although I do not recommend using ASMLIB to bind device names (see the article "Why ASMLIB and why not?"), there are far more articles introducing ASMLIB than UDEV, so a lot of people who are not very familiar with RAC installation and configuration still insist on using ASMLIB. Because configuring ASMLIB is not that simple, quite a few of them run into problems while implementing ASMLIB before the actual RAC installation, and ASMLIB also causes plenty of trouble during day-to-day use.

Here is a summary of the ASMLIB diagnostic approach, using the following commands:

 

cat /etc/sysconfig/oracleasm


1) uname -a
2) rpm -qa | grep ^oracleasm
3) rpm -V oracleasmlib
4) multipath -ll

1) output of command line

# rpm -V oracleasm-support

# /etc/init.d/oracleasm scandisks

# /etc/init.d/oracleasm listdisks

# ls -l -R /dev/oracleasm/

# ls -l /etc/sysconfig/oracleasm

# cat /etc/sysconfig/oracleasm

# mount

2) oracleasm log file

/var/log/oracleasm

3) sosreport

By default the "sos" package should be installed into EL4u6 or later.
(If not, please download the sos package from ULN https://linux.oracle.com)

You just need type command "sosreport" as root user, and press "Enter" or "yes" for all the questions.

The sosreport will run for several minutes, according to different system, the running time might be more longer.
Once completed, "sosreport" will generate a compressed sosreport-xx-xx.bz2 file under /tmp. 


[summary]
- confirm the system builds the asm disks on multipath devices
- modify /etc/udev/rules.d/90-dm.rules
- currently both nodes can find the asm disks from scandisks and listdisks
- sharon.honor will try the installer again; if necessary get help from the application (RAC) team


1. Reboot the box.

2. Run the following commands
#fdisk -l
#multipath -ll
#blkid
#cat /etc/sysconfig/oracleasm
#cat /etc/sysconfig/oracleasm-_dev_oracleasm

#uptime
#/etc/init.d/oracleasm start
#/etc/init.d/oracleasm listdisks

#uptime
#/etc/init.d/oracleasm scandisks

Please also modify the /etc/sysconfig/oracleasm-_dev_oracleasm with below.
ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"


The devices that asmlib will scan are controlled in the /etc/sysconfig/oracleasm file
with the "scanorder" and "scanexclude" parameters.

ORACLE DUL and ASM

amdu

Always good to know that starting from 11g amdu is included in the distribution. Amdu is the ASM dump utility. Apart
from dumping ASM metadata it also has an -extract option to extract a file from an ASM disk group. It is a standalone
utility; if needed you can even use it with 10g.
amdu help=yes prints information on how to use it.

asm 11g

Asm 11g uses variable extents; this is not supported in the 10.2 version of DUL. Ask for a beta version if you
encounter this.

control.dul for asm

DUL needs to know where the asm disks are. DUL does not do automatic discovery of the disks. This is the file I use
for testing:
disk /11gr2/oradata/dsk0
disk /11gr2/oradata/dsk1
disk /11gr2/oradata/dsk2
disk /11gr2/oradata/dsk3
disk /11gr2/oradata/dsk4
disk /11gr2/oradata/dsk5
+DG11G/TST/DATAFILE/SYSTEM.260.740158099
+DG11G/TST/DATAFILE/SYSAUX.261.740158145
+DG11G/TST/DATAFILE/UNDOTBS1.262.740158177
+DG11G/TST/DATAFILE/USERS.264.740158249
+DG11G/TST/TEMPFILE/TEMP.263.740158181
/11gr2/rdbms/dbs/bigfilets
/11gr2/rdbms/dbs/assm.dbf
First list all the asm disks; DUL will find out the diskgroups etc. from header inspection. If you did not configure any
datafiles in a diskgroup, DUL tries to list the files it finds, otherwise it is silent.
The essential bits of a file name are the diskgroup name and the file number; the combination is used to identify the
datafile in the diskgroup.
to be continued.

Asm Instance Parameter Best Practice

10g/11gR1:
processes = 25 + (10 + [max number of concurrent database file creations and file extend operations possible]) * n

11.2.0.3:
In 11.2.0.3, the PROCESSES parameter defaults to "available CPU cores * 80 + 40" (in the ASM spfile). As the default value for MEMORY_TARGET is based on PROCESSES, it can be insufficient if there is a large number of CPU cores or a large number of diskgroups, which can cause issues (i.e. the Grid Infrastructure stack fails to stop with ORA-04031 etc.) per Bug:13605735 & Bug:12885278. It is recommended to increase the value of MEMORY_MAX_TARGET & MEMORY_TARGET before upgrading/installing to 11.2.0.3 (does not apply to 10g ASM):

 

SQL> alter system set memory_max_target=4096m scope=spfile;
SQL> alter system set memory_target=1536m scope=spfile;

-- The number 1536m has proven to be sufficient for most environments; the change will not be effective until the next restart.

 

alter system set "_DISABLE_REBALANCE_COMPACT"=true;

 

On Exadata

ASM_POWER_LIMIT=4 (the default value at deployment time)
ASM 4031

Oracle strongly recommends that you use Automatic Memory Management (AMM) for ASM. Automatic Memory Management, automatically manages the memory-related parameters for ASM instances with the MEMORY_TARGET parameter. AMM is enabled by default on ASM instances, even when the MEMORY_TARGET parameter is not explicitly set. The default value used for MEMORY_TARGET (272 MB) is acceptable for most environments. This is the only parameter that you need to set for complete ASM memory management. You can also increase MEMORY_TARGET dynamically, up to the value of the MEMORY_MAX_TARGET parameter, just as you can for a database instance.

Note: For Linux environments, automatic memory management will not work if /dev/shm is not available or is sized smaller than MEMORY_TARGET. For Enterprise Linux Release 5, /dev/shm is configured to be half the size of the system memory by default. You can adjust this by adding a size option to the entry for /dev/shm in /etc/fstab. For more details, see the man page for the mount command.
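For example, an /etc/fstab entry along these lines (the 4g size is only an illustration) makes /dev/shm large enough for a 4 GB MEMORY_TARGET:

tmpfs   /dev/shm   tmpfs   defaults,size=4g   0 0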

Note: The minimum MEMORY_TARGET for ASM is 256 MB in the SPFILE. If you set MEMORY_TARGET to a lower value, Oracle Database increases the value to 256 MB automatically.
If you are not using Automatic Memory Management, then the default value for this parameter is suitable for most environments.
=)> For 32-bit environments 32 MB is the default and minimum requirement for an ASM instance, but 128 MB is recommended.

=)> On 64-bit platforms 88 MB are required for an ASM instance, recommended values is 150 MB.
When you do not use Automatic Memory Management in a database instance, the SGA parameter settings for a database instance may require minor modifications to support ASM. When you use Automatic Memory Management, the sizing data discussed below can be treated as informational only or as supplemental information to help determine the appropriate values that you should use for the SGA. Oracle highly recommends using automatic memory management.
The following are configuration guidelines for Shared Pool sizing on the database instance (when Automatic Memory Management is not used):
SHARED_POOL_SIZE initialization parameter. Aggregate the values from the following queries to obtain the current database storage size that is either on Oracle ASM or stored in Oracle ASM. Next, determine the redundancy type and calculate the SHARED_POOL_SIZE using the aggregated value as input.

 

SELECT SUM(bytes)/(1024*1024*1024) FROM V$DATAFILE;

SELECT SUM(bytes)/(1024*1024*1024) FROM V$LOGFILE a, V$LOG b
WHERE a.group#=b.group#;

SELECT SUM(bytes)/(1024*1024*1024) FROM V$TEMPFILE
WHERE status='ONLINE';
o For disk groups using external redundancy, every 100 GB of space needs 1 MB of extra shared pool plus 2 MB.

o For disk groups using normal redundancy, every 50 GB of space needs 1 MB of extra shared pool plus 4 MB.

o For disk groups using high redundancy, every 33 GB of space needs 1 MB of extra shared pool plus 6 MB.
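For convenience, the three queries above can be rolled into one rough estimate. The sketch below applies the external redundancy rule (1 MB of extra shared pool per 100 GB, plus 2 MB); adjust the divisor and constant for normal or high redundancy:

select round(total_gb, 1) total_gb,
       ceil(total_gb/100) + 2 extra_shared_pool_mb
  from (select ((select nvl(sum(bytes),0) from v$datafile)
              + (select nvl(sum(b.bytes),0) from v$logfile a, v$log b where a.group# = b.group#)
              + (select nvl(sum(bytes),0) from v$tempfile where status = 'ONLINE')
               ) / (1024*1024*1024) total_gb
          from dual);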
Important Note:

In 11.2.0.3, the default PROCESSES is increased based on the number of CPU cores, and the default MEMORY_TARGET is based on PROCESSES. If in 11.2.0.2 customers explicitly set MEMORY_TARGET to some value that may not be big enough for 11.2.0.3, then when they upgrade to 11.2.0.3 ASM will fail to start with the error "memory_target is too small". An additional check for MEMORY_TARGET should be added to the upgrade prerequisite check.

You can unset MEMORY_TARGET so that ASM can use the default value, but if MEMORY_TARGET is explicitly set, please make sure it’s large enough, following the next rules:
1) If PROCESSES parameter is explicitly set:

The MEMORY_TARGET should be set to no less than:

256M + PROCESSES * 132K (64bit)

or

256M + PROCESSES * 120K (32bit)
2) If PROCESSES parameter is not set:

The MEMORY_TARGET should be set to no less than:

256M + (available_cpu_cores * 80 + 40) * 132K (64bit)
or

256M + (available_cpu_cores * 80 + 40) * 120K (32bit)
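A worked example under an illustrative assumption (a 64-bit server with 8 CPU cores and PROCESSES not set explicitly): the default PROCESSES would be 8 * 80 + 40 = 680, so MEMORY_TARGET should be no less than 256 MB + 680 * 132 KB, roughly 344 MB:

$ echo $(( 256*1024 + 680*132 )) KB
351904 KB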

Oracle internal view X$KFFXP

X$KFFXP is an important internal view of the ASM (Automatic Storage Management) feature. It reflects the file extent map: ASM splits each file into multiple pieces called extents, and the locations on disk where these extents are stored are what we usually call allocation units (AU).

KFF stands for Kernel File, so X$KFFXP is the Kernel File Extent Map; each row of this internal view represents one extent.

The meanings of its columns are as follows:

 

GROUP_KFFXP       diskgroup number (1 - 63). ASM disk group number; join with v$asm_disk and v$asm_diskgroup

NUMBER_KFFXP      file number for the extent. ASM file number; join with v$asm_file and v$asm_alias

COMPOUND_KFFXP    (group_kffxp << 24) + file #. File identifier; join with compound_index in v$asm_file

INCARN_KFFXP      file incarnation number; join with incarnation in v$asm_file

PXN_KFFXP         physical extent number within the file

XNUM_KFFXP        logical extent number within the file (bit 31 set if indirect; mirrored extents have the same value)

LXN_KFFXP         logical extent number: 0 and 1 identify the primary/mirror extent, 2 identifies the file header allocation unit (hypothesis); used in queries so that only the primary extents, not the secondary extents, are picked up

DISK_KFFXP        disk on which the AU is located. Disk number where the extent is allocated; join with v$asm_disk

AU_KFFXP          AU number on the disk, i.e. the relative position of the allocation unit from the beginning of the disk; the allocation unit size (1 MB by default) is in v$asm_diskgroup

Starting with 11g two new columns were added: CHK_KFFXP and SIZE_KFFXP.

CHK_KFFXP   unknown; possibly some kind of check value in the range [0-256]

SIZE_KFFXP  size_kffxp is used such that we account for variable sized extents.
sum(size_kffxp) provides the number of AUs that are on that disk.

 

At the instance level, the ASM diskgroup AU and stripe size are controlled by two hidden parameters: _asm_ausize (1048576) and _asm_stripesize (131072). Starting with 11g, one extent may contain multiple AUs.

 

The following script queries the mapping between files and ASM attributes such as their extents:

 

set linesize 140 pagesize 1400
col "FILE NAME" format a40
set head on
select NAME         "FILE NAME",
       NUMBER_KFFXP "FILE NUMBER",
       XNUM_KFFXP   "EXTENT NUMBER",
       DISK_KFFXP   "DISK NUMBER",
       AU_KFFXP     "AU NUMBER",
       SIZE_KFFXP   "NUMBER of AUs"
  from x$kffxp, v$asm_alias
 where GROUP_KFFXP = GROUP_NUMBER
   and NUMBER_KFFXP = FILE_NUMBER
   and system_created = 'Y'
   and lxn_kffxp = 0
 order by name;

Understanding the MAP file generated by the AMDU tool

AMDU is the metadata dump tool Oracle developed for ASM; its full name is ASM Metadata Dump Utility (AMDU). In the earlier article 《使用AMDU工具从无法MOUNT的DISKGROUP中抽取数据文件》 we introduced how AMDU extracts database files; here we explain the meaning of the MAP file that AMDU generates when it is used in dump mode.

In dump mode AMDU generates both image files of the diskgroup and a MAP file:

 

 

[oracle@lab1 oracle.SupportTools]$ ./amdu -diskstring '/dev/asm*' -dump DATA
amdu_2012_09_24_02_14_12/

AMDU-00204: Disk N0002 is in currently mounted diskgroup DATA
AMDU-00201: Disk N0002: '/dev/asm-diskb'

[oracle@lab1 oracle.SupportTools]$ cd amdu_2012_09_24_02_14_12/

[oracle@lab1 amdu_2012_09_24_02_14_12]$ head -10 DATA.map 

N0002 D0000 R00 A00000000 F00000000 I0 E00000000 U00 C00256 S0001 B0000000000  
N0002 D0000 R00 A00000001 F00000000 I0 E00000000 U00 C00256 S0001 B0001048576  
N0002 D0000 R00 A00000002 F00000001 I0 E00000000 U00 C00256 S0001 B0002097152  
N0002 D0000 R00 A00000003 F00000002 I0 E00000000 U00 C00256 S0001 B0003145728  
N0002 D0000 R00 A00000004 F00000003 I0 E00000000 U00 C00256 S0001 B0004194304  
N0002 D0000 R00 A00000005 F00000003 I0 E00000002 U00 C00256 S0001 B0005242880  
N0002 D0000 R00 A00000006 F00000003 I0 E00000004 U00 C00256 S0001 B0006291456  
N0002 D0000 R00 A00000007 F00000003 I0 E00000006 U00 C00256 S0001 B0007340032  
N0002 D0000 R00 A00000008 F00000003 I0 E00000008 U00 C00256 S0001 B0008388608  
N0002 D0000 R00 A00000009 F00000003 I0 E00000010 U00 C00256 S0001 B0009437184

 

 

The AMDU MAP file is an ASCII file whose contents describe the data in the image files of a given diskgroup. AMDU creates one map file per diskgroup, and each MAP file corresponds to one set of image files. Every line in the map file corresponds to an allocation unit (AU) that has been dumped to an image file. It is possible for an AU to have a record in the map file even though no data for it was actually written to the image file. Every line in the map file has the same fields and field lengths. The lines are ordered by the position of the data in the image files, and they also contain the absolute position of the data in the image file, so that AUs can be tracked under various sort orders.

The fields described below appear in every line of the map file. They are separated by spaces, and each field starts with a letter followed by a zero-padded number. The meaning of each field is as follows:

Disk Report Number (Nxxxx): e.g. N0002. Every ASM disk discovered by AMDU is assigned a disk report number. This number is also written, together with other information about the disk, to the AMDU report file. It is possible for two discovered disks in the same diskgroup to have the same disk number; in that case the two disks are assigned different disk report numbers.

Disk Number (Dxxxx): e.g. D0000. This is the disk number extracted from the ASM disk header. If the extracted disk number is invalid or the disk header cannot be recognized, it is set to 9999.

Disk Repeat (Rxx): e.g. R00. Normally always 0; it is only incremented when AMDU finds multiple disks with the same disk number in the same diskgroup.

Allocation Unit (Axxxxxxxx): e.g. A00000000. The AU location on the ASM disk where the data is stored. If the ASM disk exceeds 100 TB with a 1 MB AU, this field overflows its 8 digits.

File Number (Fxxxxxxxx): e.g. F00000000. The number of the ASM file in the diskgroup that owns this extent. Numbers below 256 are ASM metadata or ASM registry information. For physically addressed metadata the file number is 00000000.

Indirect flag (Ix): e.g. I0. 0 if the extent belongs to a file, otherwise 1.

Extent Number (Exxxxxxxx): e.g. E00000000. The physical extent number within the file. It is the index into the file extent map that database instances use to locate the AU. For a two-way mirrored file, even extent numbers are primary extents and odd ones are the secondary copies. For physically addressed metadata, it is the extent number within that metadata.

AU within extent (Uxx): e.g. U00. Large files may use large extents spanning multiple AUs.

Block count (Cxxxxx): e.g. C00256. The number of blocks copied from the AU into the image file; blocks are generally 4 KB in size.

Image File Sequence Number (Sxxxx): e.g. S0001. Since a single dumped image file does not exceed 2 GB, this field identifies which image file the data was dumped to.

Byte Offset in Image File (Bxxxxxxxxxx): e.g. B0001048576. The position of this data within the image file.

Corrupt Block Flag (X): if the AU contains corrupt blocks, the line ends with an X; normally this field is just a space.
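Because every field carries a fixed prefix letter, the map file is easy to slice with ordinary text tools. A small sketch (the file number F00000256 is only an example; field 4 is the AU, field 7 the extent number and field 11 the byte offset in the image file):

$ awk '$5 == "F00000256" {print $4, $7, $11}' DATA.map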

 
