system01.dbf损坏的oracle数据库恢复

system01.dbf损坏的oracle数据库恢复

诗檀软件工程师于2014 年11 月26 日通过修正文件中的坏块。 成功从原数据库中救出数据文件,并操作诗檀公司救援工具PRM-DUL 从数据文件中抽取出客户业务数据。所得数据已经倒入到临时数据库ORCL1 中,以方便用户查询验证。

由于数据文件内部的坏块量较少,因此保证了90%以上的业务数据可成功抽取,完成任务。

现场ORCL 数据库采用11.2.0.1.0 版本单节点, 数据存储使用普通文件存储模式,其操作平台为Windows Server。

当时数据库情况
1. Alert Log 历史追溯
通过追溯日志发现,于2014 年1 月5 日已出现坏块读取错误。

Sun Jan 05 06:00:48 2014
GATHER_STATS_JOB encountered errors. Check the trace file.
Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_j003_192.trc:
ORA-01115: IO error reading block from file (block # )
ORA-01110: data file 2:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSAUX01.DBF'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。

然后这种坏块现象也蔓延到System 表空间文件 SYSTEM01.DBF

Mon Feb 17 02:00:00 2014
Clearing Resource Manager plan via parameter Errors in file
d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dbrm_3016.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-01115: IO error reading block from file (block # )
ORA-01110: data file 1:'G:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Unable to restore resource manager plan to '':
ORA-02097: parameter cannot be modified because specified
value is invalid
ORA-00604: error occurred at recursive SQL level 1
ORA-01115: IO error reading block from file (block # )
ORA-01110: data file 1:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM0
1.DBF'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败,

并开始出现内部错误:

Mon Feb 24 11:37:36 2014
Exception [type: ACCESS_VIOLATION,
UNABLE_TO_READ] [ADDR:0xD4A0000] [PC:0x6D17E20,
_kgghash()+686]
Errors in file
d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora
_5432.trc (incident=9762):
ORA-07445: 出现异常错误: 核心转储 [kgghash()+686]
[ACCESS_VIOLATION] [ADDR:0xD4A0000] [PC:0x6D17E20]
[UNABLE_TO_READ] []
Incident details in:
d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdi
r_9762\orcl_ora_5432_i9762.trc

之后此类错误陆续出现在REDO log 以及TEMP 文件,甚至在用户业务文件.ORA 中也出现了坏块。

Sat May 24 08:27:46 2014
Errors in file
d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_m0
00_5072.trc (incident=9763):
ORA-00603: ORACLE server session terminated by fatal error

ORA-01115: IO error reading block from file (block # )
ORA-01110: data file 3:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTB
S01.DBF'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
ORA-01115: IO error reading block from file (block # )
ORA-01110: data file 3:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTB
S01.DBF'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
ORA-01115: IO error reading block from file (block # )
ORA-01110: data file 3:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTB
S01.DBF'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
ORA-01115: IO error reading block from file (block # )
ORA-01110: data file 2:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSAUX0
1.DBF'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Erro
Incident details in:
d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdi
r_9763\orcl_m000_5072_i9763.trc
opidrv aborting process M000 ospid (5072) as a result of
ORA-603
Thu Sep 11 22:02:02 2014
Errors in file

d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_j00
0_4392.trc:
ORA-12012: error on auto execute of job 128120
ORA-01115: IO error reading block from file (block # )
ORA-01110: data file 6:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\BJ_ZCGL.
ORA'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
ORA-06512: at "SYS.DBMS_ADVISOR", line 201
ORA-06512: at "SYS.DBMS_SPACE", line 1798
ORA-06512: at "SYS.DBMS_SPACE", line 1871

由于最终坏块蔓延到控制文件,最终导致宕机。

Thu Oct 09 21:49:52 2014
Errors in file
d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_m0
00_4676.trc:
ORA-00202: control file:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\CONTRO
L01.CTL'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Errors in file
d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_m0
00_4676.trc:
ORA-00204: error in reading (block 22, # blocks 1) of control
file
ORA-00202: control file:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\CONTRO
L01.CTL'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Thu Oct 09 21:54:01 2014
Errors in file
d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_lgw
r_3036.trc:
ORA-00202: control file:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\CONTRO
L01.CTL'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Errors in file
d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_lgw
r_3036.trc:
ORA-00204: error in reading (block 22, # blocks 1) of control
file
ORA-00202: control file:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\CONTRO
L01.CTL'
ORA-27070: async read/write failed
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
LGWR (ospid: 3036): terminating the instance due to error 204

 

之后客户及第三方服务机构曾多次尝试打开数据库失败。由于数据库文件坏块已基本蔓延至所有数据文件。一旦数据库奔
溃后,即便修复了控制文件(据日志观察,10 月31 日对控制文件进行过修复),由于系统文件无法通过自检,数据库打开失败
也就成了必然事件。

 

 

Wed Nov 19 11:37:05 2014
SMON: Restarting fast_start parallel rollback
Errors in file
d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora
_3568.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01115: 从文件 读取块时出现 IO 错误 (块 # )
ORA-01110: 数据文件 1:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM0
1.DBF'
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Errors in file
d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora
_3568.trc:
ORA-00604: 递归 SQL 级别 1 出现错误
ORA-01115: 从文件 读取块时出现 IO 错误 (块 # )
ORA-01110: 数据文件 1:
'G:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM0
1.DBF'
ORA-27070: 异步读取/写入失败
OSD-04006: ReadFile() 失败, 无法读取文件
O/S-Error: (OS 23) 数据错误(循环冗余检查)。
Error 604 happened during db open, shutting down database
USER (ospid: 3568): terminating the instance due to error 604

用户曾试图将相关数据文件copy 到其他无坏块盘,但由于无法完成坏块读取,导致操作失败。

数据库配置

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 -
Production
With the Partitioning, OLAP, Data Mining and Real
Application Testing options.
Using parameter settings in server-side spfile
D:\APP\ADMINISTRATOR\PRODUCT.2.0\DBHOME_
1\DATABASE\SPFILEORCL.ORA
System parameters with non-default values:
processes = 150
memory_target = 1232M
control_files =
"G:\APP\ADMINISTRATOR\ORADATA\ORCL\CONTRO
L01.CTL"
control_files =
"D:\APP\ADMINISTRATOR\FLASH_RECOVERY_AREA\
ORCL\CONTROL02.CTL"
db_block_size = 8192
compatible = "11.2.0.0.0"
db_recovery_file_dest =
"D:\app\Administrator\flash_recovery_area"
db_recovery_file_dest_size= 3852M
undo_tablespace = "UNDOTBS1"
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
dispatchers = "(PROTOCOL=TCP)
(SERVICE=orclXDB)"
audit_file_dest =
"D:\APP\ADMINISTRATOR\ADMIN\ORCL\ADUMP"
audit_trail = "DB"
db_name = "orcl"
open_cursors = 300
diagnostic_dest = "D:\APP\ADMINISTRATOR".

下图为数据库Mount 后,数据文件情况:

oradata

数据库救援处理

诗檀工程师在分析其数据库情况后,按以下步骤进行数据库数据救援操作:

1. 首先使用文件修复拷贝工具将所有数据库文件copy 至I 盘(磁盘为用户提供的无问题盘)。

System01.dbf

oradata

工程师使用PRM-DUL 工具打开了对应数据库文件:

1) 打开prm.bat
2) 使用字典模式并倒入数据文件:
SYSTEM01.DBF
SYSAUX01.DBF
BJ_ZCGL.ORA
EXAMPLE01.DBF
USERS01.DBF
3)通过观察发现需要导出Schema 下的数据

 

4. 使用PRM-DUL DataBridge 方式将用户业务数据抽取并成功倒入到临时数据库ORCL1 中。

QQ截图20160321140302

 

1. 由于坏块使得重新将客户系统恢复已无意义。建议第三方重新部署软件相应数据库应用设置并将救援出的客户数据重新导入

2. 用户的数据库无备份,无归档。并缺少相应数据库专业人员进行维护及日志检查。数据库出现问题在初期未能及时解决,使得坏块扩散,最终造成客户损失。

建议:开启备份和日志归档,派遣数据库相关人员做定期健康检查。

 

Comment

*

沪ICP备14014813号-2

沪公网安备 31010802001379号