system01.dbf损坏的oracle数据库恢复
诗檀软件工程师于2014 年11 月26 日通过修正文件中的坏块。 成功从原数据库中救出数据文件,并操作诗檀公司救援工具PRM-DUL 从数据文件中抽取出客户业务数据。所得数据已经倒入到临时数据库ORCL1 中,以方便用户查询验证。
由于数据文件内部的坏块量较少,因此保证了90%以上的业务数据可成功抽取,完成任务。
现场ORCL 数据库采用11.2.0.1.0 版本单节点, 数据存储使用普通文件存储模式,其操作平台为Windows Server。
当时数据库情况
1. Alert Log 历史追溯
通过追溯日志发现,于2014 年1 月5 日已出现坏块读取错误。
Sun Jan 05 06:00:48 2014 GATHER_STATS_JOB encountered errors. Check the trace file. Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_j003_192.trc: ORA-01115: IO error reading block from file (block # ) ORA-01110: data file 2: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSAUX01.DBF' ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 23) 数据错误(循环冗余检查)。
然后这种坏块现象也蔓延到System 表空间文件 SYSTEM01.DBF
Mon Feb 17 02:00:00 2014 Clearing Resource Manager plan via parameter Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_dbrm_3016.trc: ORA-00604: error occurred at recursive SQL level 1 ORA-01115: IO error reading block from file (block # ) ORA-01110: data file 1:'G:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM01.DBF' ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 23) 数据错误(循环冗余检查)。 Unable to restore resource manager plan to '': ORA-02097: parameter cannot be modified because specified value is invalid ORA-00604: error occurred at recursive SQL level 1 ORA-01115: IO error reading block from file (block # ) ORA-01110: data file 1: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM0 1.DBF' ORA-27070: async read/write failed OSD-04006: ReadFile() 失败,
并开始出现内部错误:
Mon Feb 24 11:37:36 2014 Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0xD4A0000] [PC:0x6D17E20, _kgghash()+686] Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora _5432.trc (incident=9762): ORA-07445: 出现异常错误: 核心转储 [kgghash()+686] [ACCESS_VIOLATION] [ADDR:0xD4A0000] [PC:0x6D17E20] [UNABLE_TO_READ] [] Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdi r_9762\orcl_ora_5432_i9762.trc
之后此类错误陆续出现在REDO log 以及TEMP 文件,甚至在用户业务文件.ORA 中也出现了坏块。
Sat May 24 08:27:46 2014 Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_m0 00_5072.trc (incident=9763): ORA-00603: ORACLE server session terminated by fatal error ORA-01115: IO error reading block from file (block # ) ORA-01110: data file 3: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTB S01.DBF' ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 23) 数据错误(循环冗余检查)。 ORA-01115: IO error reading block from file (block # ) ORA-01110: data file 3: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTB S01.DBF' ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 23) 数据错误(循环冗余检查)。 ORA-01115: IO error reading block from file (block # ) ORA-01110: data file 3: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\UNDOTB S01.DBF' ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 23) 数据错误(循环冗余检查)。 ORA-01115: IO error reading block from file (block # ) ORA-01110: data file 2: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSAUX0 1.DBF' ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Erro Incident details in: d:\app\administrator\diag\rdbms\orcl\orcl\incident\incdi r_9763\orcl_m000_5072_i9763.trc opidrv aborting process M000 ospid (5072) as a result of ORA-603 Thu Sep 11 22:02:02 2014 Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_j00 0_4392.trc: ORA-12012: error on auto execute of job 128120 ORA-01115: IO error reading block from file (block # ) ORA-01110: data file 6: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\BJ_ZCGL. ORA' ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 23) 数据错误(循环冗余检查)。 ORA-06512: at "SYS.DBMS_ADVISOR", line 201 ORA-06512: at "SYS.DBMS_SPACE", line 1798 ORA-06512: at "SYS.DBMS_SPACE", line 1871
由于最终坏块蔓延到控制文件,最终导致宕机。
Thu Oct 09 21:49:52 2014 Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_m0 00_4676.trc: ORA-00202: control file: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\CONTRO L01.CTL' ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 23) 数据错误(循环冗余检查)。 Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_m0 00_4676.trc: ORA-00204: error in reading (block 22, # blocks 1) of control file ORA-00202: control file: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\CONTRO L01.CTL' ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 23) 数据错误(循环冗余检查)。 Thu Oct 09 21:54:01 2014 Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_lgw r_3036.trc: ORA-00202: control file: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\CONTRO L01.CTL' ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 23) 数据错误(循环冗余检查)。 Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_lgw r_3036.trc: ORA-00204: error in reading (block 22, # blocks 1) of control file ORA-00202: control file: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\CONTRO L01.CTL' ORA-27070: async read/write failed OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 23) 数据错误(循环冗余检查)。 LGWR (ospid: 3036): terminating the instance due to error 204
之后客户及第三方服务机构曾多次尝试打开数据库失败。由于数据库文件坏块已基本蔓延至所有数据文件。一旦数据库奔
溃后,即便修复了控制文件(据日志观察,10 月31 日对控制文件进行过修复),由于系统文件无法通过自检,数据库打开失败
也就成了必然事件。
Wed Nov 19 11:37:05 2014 SMON: Restarting fast_start parallel rollback Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora _3568.trc: ORA-00604: 递归 SQL 级别 1 出现错误 ORA-01115: 从文件 读取块时出现 IO 错误 (块 # ) ORA-01110: 数据文件 1: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM0 1.DBF' ORA-27070: 异步读取/写入失败 OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 23) 数据错误(循环冗余检查)。 Errors in file d:\app\administrator\diag\rdbms\orcl\orcl\trace\orcl_ora _3568.trc: ORA-00604: 递归 SQL 级别 1 出现错误 ORA-01115: 从文件 读取块时出现 IO 错误 (块 # ) ORA-01110: 数据文件 1: 'G:\APP\ADMINISTRATOR\ORADATA\ORCL\SYSTEM0 1.DBF' ORA-27070: 异步读取/写入失败 OSD-04006: ReadFile() 失败, 无法读取文件 O/S-Error: (OS 23) 数据错误(循环冗余检查)。 Error 604 happened during db open, shutting down database USER (ospid: 3568): terminating the instance due to error 604
用户曾试图将相关数据文件copy 到其他无坏块盘,但由于无法完成坏块读取,导致操作失败。
数据库配置
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options. Using parameter settings in server-side spfile D:\APP\ADMINISTRATOR\PRODUCT.2.0\DBHOME_ 1\DATABASE\SPFILEORCL.ORA System parameters with non-default values: processes = 150 memory_target = 1232M control_files = "G:\APP\ADMINISTRATOR\ORADATA\ORCL\CONTRO L01.CTL" control_files = "D:\APP\ADMINISTRATOR\FLASH_RECOVERY_AREA\ ORCL\CONTROL02.CTL" db_block_size = 8192 compatible = "11.2.0.0.0" db_recovery_file_dest = "D:\app\Administrator\flash_recovery_area" db_recovery_file_dest_size= 3852M undo_tablespace = "UNDOTBS1" remote_login_passwordfile= "EXCLUSIVE" db_domain = "" dispatchers = "(PROTOCOL=TCP) (SERVICE=orclXDB)" audit_file_dest = "D:\APP\ADMINISTRATOR\ADMIN\ORCL\ADUMP" audit_trail = "DB" db_name = "orcl" open_cursors = 300 diagnostic_dest = "D:\APP\ADMINISTRATOR".
下图为数据库Mount 后,数据文件情况:
数据库救援处理
诗檀工程师在分析其数据库情况后,按以下步骤进行数据库数据救援操作:
1. 首先使用文件修复拷贝工具将所有数据库文件copy 至I 盘(磁盘为用户提供的无问题盘)。
System01.dbf
工程师使用PRM-DUL 工具打开了对应数据库文件:
1) 打开prm.bat
2) 使用字典模式并倒入数据文件:
SYSTEM01.DBF
SYSAUX01.DBF
BJ_ZCGL.ORA
EXAMPLE01.DBF
USERS01.DBF
3)通过观察发现需要导出Schema 下的数据
4. 使用PRM-DUL DataBridge 方式将用户业务数据抽取并成功倒入到临时数据库ORCL1 中。
1. 由于坏块使得重新将客户系统恢复已无意义。建议第三方重新部署软件相应数据库应用设置并将救援出的客户数据重新导入
2. 用户的数据库无备份,无归档。并缺少相应数据库专业人员进行维护及日志检查。数据库出现问题在初期未能及时解决,使得坏块扩散,最终造成客户损失。
建议:开启备份和日志归档,派遣数据库相关人员做定期健康检查。
Comment