首先来了解下PST quorum 是什么意思:
如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!
诗檀软件专业数据库修复团队
服务热线 : 13764045638 QQ号:47079569 邮箱:service@parnassusdata.com
Partner and Status Table
一般来说aun=1 是保留给Partner and Status Table(PST)的拷贝使用的。 一般5个ASM DISK将包含一份PST拷贝。多数的PST内容必须相同且验证有效。否则无法判断哪些ASM DISK实际拥有相关数据。
在 PST中每一条记录对应Diskgroup中的一个ASM DISK。每一条记录会对一个ASM disk枚举其partners的ASM DISK。同时会有一个flag来表示该DISK是否是ONLINE可读写的。这些信息对recovery是否能做很重要。
PST表的Blkn=0是PST的header,存放了如下的信息:
- Timestamp to indicate PST is valid
- Version number to compare with other PST copies
- List of disks containing PST copies
- Bit map for shadow paging updates
PST的最后一个块是heartbeat block,当diskgroup mount时其每3秒心跳更新一次。
以下为PST header
kfed read /oracleasm/asm-disk01 aun=1 blkn=0 aus=4194304 |less kfbh.endian: 1 ; 0x000: 0x01 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 17 ; 0x002: KFBTYP_PST_META kfbh.datfmt: 2 ; 0x003: 0x02 kfbh.block.blk: 1024 ; 0x004: blk=1024 kfbh.block.obj: 2147483648 ; 0x008: disk=0 kfbh.check: 3813974007 ; 0x00c: 0xe3549ff7 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 kfdpHdrPairBv1.first.super.time.hi:32999670 ; 0x000: HOUR=0x16 DAYS=0x7 MNTH=0x2 YEAR=0x7de kfdpHdrPairBv1.first.super.time.lo:1788841984 ; 0x004: USEC=0x0 MSEC=0x3e4 SECS=0x29 MINS=0x1a kfdpHdrPairBv1.first.super.last: 2 ; 0x008: 0x00000002 kfdpHdrPairBv1.first.super.next: 2 ; 0x00c: 0x00000002 kfdpHdrPairBv1.first.super.copyCnt: 5 ; 0x010: 0x05 kfdpHdrPairBv1.first.super.version: 1 ; 0x011: 0x01 kfdpHdrPairBv1.first.super.ub2spare: 0 ; 0x012: 0x0000 kfdpHdrPairBv1.first.super.incarn: 1 ; 0x014: 0x00000001 kfdpHdrPairBv1.first.super.copy[0]: 0 ; 0x018: 0x0000 kfdpHdrPairBv1.first.super.copy[1]: 1 ; 0x01a: 0x0001 kfdpHdrPairBv1.first.super.copy[2]: 2 ; 0x01c: 0x0002 kfdpHdrPairBv1.first.super.copy[3]: 3 ; 0x01e: 0x0003 kfdpHdrPairBv1.first.super.copy[4]: 4 ; 0x020: 0x0004 kfdpHdrPairBv1.first.super.dtaSz: 15 ; 0x022: 0x000f kfdpHdrPairBv1.first.asmCompat:186646528 ; 0x024: 0x0b200000 kfdpHdrPairBv1.first.newCopy[0]: 0 ; 0x028: 0x0000 kfdpHdrPairBv1.first.newCopy[1]: 0 ; 0x02a: 0x0000 kfdpHdrPairBv1.first.newCopy[2]: 0 ; 0x02c: 0x0000 kfdpHdrPairBv1.first.newCopy[3]: 0 ; 0x02e: 0x0000 kfdpHdrPairBv1.first.newCopy[4]: 0 ; 0x030: 0x0000 kfdpHdrPairBv1.first.newCopyCnt: 0 ; 0x032: 0x00 kfdpHdrPairBv1.first.contType: 1 ; 0x033: 0x01 kfdpHdrPairBv1.first.spares[0]: 0 ; 0x034: 0x00000000 kfdpHdrPairBv1.first.spares[1]: 0 ; 0x038: 0x00000000 kfdpHdrPairBv1.first.spares[2]: 0 ; 0x03c: 0x00000000 kfdpHdrPairBv1.first.spares[3]: 0 ; 0x040: 0x00000000 kfdpHdrPairBv1.first.spares[4]: 0 ; 0x044: 0x00000000 kfdpHdrPairBv1.first.spares[5]: 0 ; 0x048: 0x00000000 kfdpHdrPairBv1.first.spares[6]: 0 ; 0x04c: 0x00000000 kfdpHdrPairBv1.first.spares[7]: 0 ; 0x050: 0x00000000 kfdpHdrPairBv1.first.spares[8]: 0 ; 0x054: 0x00000000 kfdpHdrPairBv1.first.spares[9]: 0 ; 0x058: 0x00000000 kfdpHdrPairBv1.first.spares[10]: 0 ; 0x05c: 0x00000000 kfdpHdrPairBv1.first.spares[11]: 0 ; 0x060: 0x00000000 kfdpHdrPairBv1.first.spares[12]: 0 ; 0x064: 0x00000000 kfdpHdrPairBv1.first.spares[13]: 0 ; 0x068: 0x00000000 kfdpHdrPairBv1.first.spares[14]: 0 ; 0x06c: 0x00000000 kfdpHdrPairBv1.first.spares[15]: 0 ; 0x070: 0x00000000 kfdpHdrPairBv1.first.spares[16]: 0 ; 0x074: 0x00000000 kfdpHdrPairBv1.first.spares[17]: 0 ; 0x078: 0x00000000 kfdpHdrPairBv1.first.spares[18]: 0 ; 0x07c: 0x00000000 kfdpHdrPairBv1.first.spares[19]: 0 ; 0x080: 0x00000000
- super.time wall clock time of last PST commit
- super.last last committed content version number
- super.next next available content version number
- super.copyCnt # of disks holding PST copies
- super.version version of PST header format
- super.ub2spare pad to ub4 align
- super.incarn incarnation of <copy> list
- super.copy[0] disks holding the PST copies
- super.dtaSz data entries in PST
- newCopy[0] new disks holding PST copies
- newCopyCnt new # disks holding PST copies
以下为PST table block:
kfed read /oracleasm/asm-disk02 aun=1 blkn=3 aus=4194304 |less kfbh.endian: 1 ; 0x000: 0x01 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 18 ; 0x002: KFBTYP_PST_DTA kfbh.datfmt: 2 ; 0x003: 0x02 kfbh.block.blk: 1027 ; 0x004: blk=1027 kfbh.block.obj: 2147483649 ; 0x008: disk=1 kfbh.check: 4204644293 ; 0x00c: 0xfa9dc7c5 kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 kfdpDtaEv1[0].status: 127 ; 0x000: I=1 V=1 V=1 P=1 P=1 A=1 D=1 kfdpDtaEv1[0].fgNum: 1 ; 0x002: 0x0001 kfdpDtaEv1[0].addTs: 2022663849 ; 0x004: 0x788f66a9 kfdpDtaEv1[0].partner[0]: 49154 ; 0x008: P=1 P=1 PART=0x2 kfdpDtaEv1[0].partner[1]: 49153 ; 0x00a: P=1 P=1 PART=0x1 kfdpDtaEv1[0].partner[2]: 49155 ; 0x00c: P=1 P=1 PART=0x3 kfdpDtaEv1[0].partner[3]: 49166 ; 0x00e: P=1 P=1 PART=0xe kfdpDtaEv1[0].partner[4]: 49165 ; 0x010: P=1 P=1 PART=0xd kfdpDtaEv1[0].partner[5]: 49164 ; 0x012: P=1 P=1 PART=0xc kfdpDtaEv1[0].partner[6]: 49156 ; 0x014: P=1 P=1 PART=0x4 kfdpDtaEv1[0].partner[7]: 49163 ; 0x016: P=1 P=1 PART=0xb kfdpDtaEv1[0].partner[8]: 10000 ; 0x018: P=0 P=0 PART=0x2710 kfdpDtaEv1[0].partner[9]: 0 ; 0x01a: P=0 P=0 PART=0x0 kfdpDtaEv1[0].partner[10]: 0 ; 0x01c: P=0 P=0 PART=0x0 kfdpDtaEv1[0].partner[11]: 0 ; 0x01e: P=0 P=0 PART=0x0 kfdpDtaEv1[0].partner[12]: 0 ; 0x020: P=0 P=0 PART=0x0 kfdpDtaEv1[0].partner[13]: 0 ; 0x022: P=0 P=0 PART=0x0 kfdpDtaEv1[0].partner[14]: 0 ; 0x024: P=0 P=0 PART=0x0 kfdpDtaEv1[0].partner[15]: 0 ; 0x026: P=0 P=0 PART=0x0 kfdpDtaEv1[0].partner[16]: 0 ; 0x028: P=0 P=0 PART=0x0 kfdpDtaEv1[0].partner[17]: 0 ; 0x02a: P=0 P=0 PART=0x0 kfdpDtaEv1[0].partner[18]: 0 ; 0x02c: P=0 P=0 PART=0x0 kfdpDtaEv1[0].partner[19]: 0 ; 0x02e: P=0 P=0 PART=0x0 kfdpDtaEv1[1].status: 127 ; 0x030: I=1 V=1 V=1 P=1 P=1 A=1 D=1 kfdpDtaEv1[1].fgNum: 2 ; 0x032: 0x0002 kfdpDtaEv1[1].addTs: 2022663849 ; 0x034: 0x788f66a9 kfdpDtaEv1[1].partner[0]: 49155 ; 0x038: P=1 P=1 PART=0x3 kfdpDtaEv1[1].partner[1]: 49152 ; 0x03a: P=1 P=1 PART=0x0 kfdpDtaEv1[1].partner[2]: 49154 ; 0x03c: P=1 P=1 PART=0x2 kfdpDtaEv1[1].partner[3]: 49166 ; 0x03e: P=1 P=1 PART=0xe kfdpDtaEv1[1].partner[4]: 49157 ; 0x040: P=1 P=1 PART=0x5 kfdpDtaEv1[1].partner[5]: 49156 ; 0x042: P=1 P=1 PART=0x4 kfdpDtaEv1[1].partner[6]: 49165 ; 0x044: P=1 P=1 PART=0xd kfdpDtaEv1[1].partner[7]: 49164 ; 0x046: P=1 P=1 PART=0xc kfdpDtaEv1[1].partner[8]: 10000 ; 0x048: P=0 P=0 PART=0x2710 kfdpDtaEv1[1].partner[9]: 0 ; 0x04a: P=0 P=0 PART=0x0 kfdpDtaEv1[1].partner[10]: 0 ; 0x04c: P=0 P=0 PART=0x0 kfdpDtaEv1[1].partner[11]: 0 ; 0x04e: P=0 P=0 PART=0x0 kfdpDtaEv1[1].partner[12]: 0 ; 0x050: P=0 P=0 PART=0x0 kfdpDtaEv1[1].partner[13]: 0 ; 0x052: P=0 P=0 PART=0x0 kfdpDtaEv1[1].partner[14]: 0 ; 0x054: P=0 P=0 PART=0x0 kfdpDtaEv1[1].partner[15]: 0 ; 0x056: P=0 P=0 PART=0x0 kfdpDtaEv1[1].partner[16]: 0 ; 0x058: P=0 P=0 PART=0x0
- kfdpDtaEv1[0].status: 127 ; 0×000: I=1 V=1 V=1 P=1 P=1 A=1 D=1 disk status
- fgNum fail group number
- addTs timestamp of the addition to the diskgroup
- kfdpDtaEv1[0].partner[0]: 49154 ; 0×008: P=1 P=1 PART=0×2 partner list
AUN=1 的最后一个block为KFBTYP_HBEAT 心跳表:
[oracle@mlab2 hzy]$ kfed read /oracleasm/asm-disk02 aun=1 blkn=1023 aus=4194304 |less kfbh.endian: 1 ; 0x000: 0x01 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 19 ; 0x002: KFBTYP_HBEAT kfbh.datfmt: 2 ; 0x003: 0x02 kfbh.block.blk: 2047 ; 0x004: blk=2047 kfbh.block.obj: 2147483649 ; 0x008: disk=1 kfbh.check: 1479766671 ; 0x00c: 0x5833728f kfbh.fcn.base: 0 ; 0x010: 0x00000000 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000 kfbh.spare2: 0 ; 0x01c: 0x00000000 kfdpHbeatB.instance: 1 ; 0x000: 0x00000001 kfdpHbeatB.ts.hi: 32999734 ; 0x004: HOUR=0x16 DAYS=0x9 MNTH=0x2 YEAR=0x7de kfdpHbeatB.ts.lo: 3968041984 ; 0x008: USEC=0x0 MSEC=0xe1 SECS=0x8 MINS=0x3b kfdpHbeatB.rnd[0]: 1065296177 ; 0x00c: 0x3f7f2131 kfdpHbeatB.rnd[1]: 857037208 ; 0x010: 0x33155998 kfdpHbeatB.rnd[2]: 2779184235 ; 0x014: 0xa5a6fc6b kfdpHbeatB.rnd[3]: 2660793989 ; 0x018: 0x9e987e85
- kfdpHbeatB.instance instance id
- kfdpHbeatB.ts.hi timestamp
- kfdpHbeatB.rnd[0] 随机加盐
- External Redundancy一般有一个PST
- Normal Redundancy至多有个3个PST
- High Redundancy 至多有5个PST
如下场景中PST 可能被重定位:
- 存有PST的ASM DISK不可用了(当ASM启东时)
- ASM DISK OFFLINE了
- 当对PST的读写发生了I/O错误
- disk被正常DROP了
- 在读取其他ASM metadata之前会先检查PST
- 当ASM实例被要求mount diskgroup时,GMON进程会读取diskgroup中所有磁盘去找到和确认PST拷贝
- 如果他发现有足够的PST,那么会mount diskgroup
- 之后,PST会被缓存在ASM缓存中,以及GMON的PGA中并使用排他的PT.n.0锁保护
- 同集群中的其他ASM实例也将缓存PST到GMON的PGA,并使用共享PT.n.o锁保护
- 仅仅那个持有排他锁的GMON能更新磁盘上的PST信息
- 每一个ASM DISK上的AUN=1均为PST保留,但只有几个磁盘上真的有PST数据
如果出现diskgroup 无法mount的错误,且alert.log中出现如下信息,则可能是丢失了必要数量的PST了:
Wed Jan 01 21:34:37 IST 2014
SQL> ALTER DISKGROUP ALL MOUNT
Wed Jan 01 21:34:38 IST 2014
NOTE: cache registered group DATA number=1 incarn=0x1c58c060
Wed Jan 01 21:34:38 IST 2014
ERROR: no PST quorum in group 1: required 2, found 0 >>>>>>>>>>>>>>>>> HERE
Wed Jan 01 21:34:38 IST 2014
NOTE: cache dismounting group 1/0x1C58C060 (DATA)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DATA was not mounted
Wed Jan 01 22:37:51 IST 2014
该PST quorum丢失的问题常由以下几个原因导致:
- ASM DISK丢失
- ASM DISK corrupted损坏
- 部分ASM DISK的AUN=1 PST部分损坏,或者被数据不完整
- 不当的ASM_DISKSTRING参数设置
- 不当的ASM DISK权限设置
对于该no PST quorum问题的常见对策:
- 重建diskgroup
- 手动修复PST(十分复杂)
必要的诊断数据收集如下:
1. The complete ASM alertfile (please not only a part). That will help me to understand the history. 2. Have you used multipath devices (e.g. using Linux Device Mapper) as base device for creating your asmlib disks ? Or have you just used single path devices ('sd' devices) for your asmlib disks ? 3. Which oracle user's do you have created in the affected environment ? Please show all oracle users together them with their groups. 4. Please upload a spoolfile with the output from the next commands commands: $> cat /etc/*release $> uname -a $> rpm -qa |grep oracleasm $> df -ha $> ls -l /dev/oracleasm/disks/* $> /etc/init.d/oracleasm status $> /usr/sbin/oracleasm-discover $> /usr/sbin/oracleasm-discover 'ORCL:*' $> /etc/init.d/oracleasm scandisks $> /etc/init.d/oracleasm listdisks $> /etc/init.d/oracleasm querydisk $> ls -ltr $ORACLE_HOME/bin/oracle $> id -a oracle $> cat /etc/group 5. Upload the next files: => /var/log/messages => /etc/sysconfig/oracleasm /etc/sysconfig/oracleasm-_dev_oracleasm 6. Provide a spoolfile with the next 'kfod' output: $GRID_HOME/bin/kfod asm_diskstring='ORCL:*' disks=all $GRID_HOME/bin/kfod disks=all $GRID_HOME/bin/kfod asm_diskstring='/dev/oracleasm/disks/*' disks=all 7. Provide anather spoolfile with the output of the next commands (to run as GRID user): $GRID_HOME/crsctl check crs $GRID_HOME/crsctl stat res -t $GRID_HOME/crsctl stat res -t -init 8. To see the ASM_discovery string both in the ASM instance spfile and in the CRS profile (gpnp) provide the next informations: a) Via ASMCMD: $ ./asmcmd dsget b) Via GPNPTOOL: $ ./gpnptool get c) $ cat $ORACLE_HOME/gpnp/*/profiles/peer/profile.xml
Comment