Recreate failovered primary database using Flashback Database
很多朋友一直对DataGuard的fast-start failover持怀疑的态度;通过observer的观察,(Fast-Start Failover)FSFO提供了一种在primary数据库不可用情况下自动故障切换到standby数据库的能力。造成很多朋友不愿意使用FSFO的原因之一是故障切换后不得不重建原primary数据库,不过如果我们能配合使用10g中的闪回数据库特性的话,这一重建数据库的工作便也可以一劳永逸了。
/* 启用fast_start failover的前提之一是在primary和standby库上都启用flashback database */ SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production PL/SQL Release 11.2.0.2.0 - Production CORE 11.2.0.2.0 Production TNS for Linux: Version 11.2.0.2.0 - Production NLSRTL Version 11.2.0.2.0 - Production /* 当前的primary库情况 */ SQL> select open_mode ,database_role,flashback_on from v$database; OPEN_MODE DATABASE_ROLE FLASHBACK_ON -------------------- ---------------- ------------------ READ WRITE PRIMARY YES SQL> alter session set nls_date_format='YYYY-MM-DD hh24:mi:ss'; Session altered. SQL> select OLDEST_FLASHBACK_TIME from v$flashback_database_log; OLDEST_FLASHBACK_TI ------------------- 2011-02-19 22:40:39 /* 当前的standby库情况 */ SQL> select open_mode ,database_role,flashback_on from v$database; OPEN_MODE DATABASE_ROLE FLASHBACK_ON -------------------- ---------------- ------------------ READ ONLY WITH APPLY PHYSICAL STANDBY YES SQL> alter session set nls_date_format='YYYY-MM-DD hh24:mi:ss'; Session altered. SQL> select OLDEST_FLASHBACK_TIME from v$flashback_database_log; OLDEST_FLASHBACK_TI ------------------- 2011-02-19 22:34:56 DGMGRL> edit configuration set property faststartfailoverthreshold=10; Property "faststartfailoverthreshold" updated DGMGRL> show configuration verbose; Configuration - pro Protection Mode: MaxAvailability Databases: sbdb - Primary database prod - Physical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS /* 另开一个终端启动observer */ DGMGRL> start observer Observer started DGMGRL> show fast_start failover Fast-Start Failover: DISABLED Threshold: 10 seconds Target: (none) Observer: rh3.oracle.com Lag Limit: 30 seconds Shutdown Primary: TRUE Auto-reinstate: TRUE Configurable Failover Conditions Health Conditions: Corrupted Controlfile YES Corrupted Dictionary YES Inaccessible Logfile NO Stuck Archiver NO Datafile Offline YES Oracle Error Conditions: (none) /* 启动快速故障切换 */ DGMGRL> enable fast_start failover Enabled. /* 记录实际切换前的时间 */ [maclean@rh3 ~]$ date Sat Feb 19 23:35:44 CST 2011 [maclean@rh2 ~]$ ps -ef|grep pmon|grep -v grep maclean 25165 1 0 22:58 ? 00:00:00 ora_pmon_SBDB maclean 26080 1 0 23:04 ? 00:00:00 ora_pmon_PROD [maclean@rh2 ~]$ kill -9 25165 /* 通过kill pmon进程造成主库crash,引发fast-start failover */ /* observer观察到primary库意外终止后,实施了快速故障切换 */ 23:37:27.92 Saturday, February 19, 2011 Initiating Fast-Start Failover to database "prod"... Performing failover NOW, please wait... Failover succeeded, new primary is "prod" 23:37:33.69 Saturday, February 19, 2011 23:39:17.98 Saturday, February 19, 2011 Initiating reinstatement for database "sbdb"... Reinstating database "sbdb", please wait... Error: ORA-16653: failed to reinstate database Failed. Reinstatement of database "sbdb" failed 23:39:26.30 Saturday, February 19, 2011 23:40:05.00 Saturday, February 19, 2011 Initiating reinstatement for database "sbdb"... Reinstating database "sbdb", please wait... Error: ORA-16653: failed to reinstate database Failed. Reinstatement of database "sbdb" failed 23:40:09.24 Saturday, February 19, 2011 /* 接下来我们通过闪回数据库来将失败切换后的primary库flashback到faliover之前以便重用*/ SQL> startup mount; SQL> flashback database to timestamp to_timestamp('2011-02-19 23:30:44','YYYY-MM-DD hh24:mi:ss'); flashback database to timestamp to_timestamp('2011-02-19 23:35:44','YYYY-MM-DD hh24:mi:ss') * ERROR at line 1: ORA-38754: FLASHBACK DATABASE not started; required redo log is not available ORA-38762: redo logs needed for SCN 1143987 to SCN End-of-Redo ORA-38761: redo log sequence 10 in thread 1, incarnation 3 could not be accessed /* 缺少最近的归档日志而无法完成闪回,需要到目前的primary库中查找 */ SQL> select name from v$archived_log where FIRST_CHANGE#<1143987 and NEXT_CHANGE#>1143987; NAME -------------------------------------------------------------------------------- /s01/fast_recovery_area/PROD/archivelog/2011_02_19/o1_mf_1_10_6ozpzh9c_.arc SQL> alter database register physical logfile '/s01/fast_recovery_area/PROD/archivelog/2011_02_19/o1_mf_1_10_6ozpzh9c_.arc'; Database altered. SQL> flashback database to timestamp to_timestamp('2011-02-19 23:30:44','YYYY-MM-DD hh24:mi:ss'); Flashback complete. /* 成功闪回后,observer将主动去尝试reinstate这个目前可用的standby库,如以下日志*/ 23:46:20.16 Saturday, February 19, 2011 Initiating reinstatement for database "sbdb"... Reinstating database "sbdb", please wait... Operation requires shutdown of instance "SBDB" on database "sbdb" Shutting down instance "SBDB"... ORA-01109: database not open Database dismounted. ORACLE instance shut down. Operation requires startup of instance "SBDB" on database "sbdb" Starting instance "SBDB"... ORACLE instance started. Database mounted. Continuing to reinstate database "sbdb" ... Reinstatement of database "sbdb" succeeded 23:47:13.49 Saturday, February 19, 2011 DGMGRL> show configuration verbose Configuration - pro Protection Mode: MaxAvailability Databases: prod - Primary database sbdb - (*) Physical standby database (*) Fast-Start Failover target Fast-Start Failover: ENABLED Threshold: 10 seconds Target: sbdb Observer: rh3.oracle.com Lag Limit: 30 seconds (not in use) Shutdown Primary: TRUE Auto-reinstate: TRUE Configuration Status: SUCCESS /*可以通过show configuration命令监控到当前primary与standby库都处于可用状态, 且fast_start failover也为启用状态 */
Oracle数据库升级前必要的准备工作
Oracle数据库升级向来是一门纷繁复杂的工程,DBA需要为产品数据库的升级耗费大量时间精力在准备工作上;因为其升级复杂度高,所以即便做了较为充分的准备仍可能在升级过程中遇到意想不到的问题,为了更高效地完成升级任务和减少停机时间,我们有必要为升级工作营造一种”舒适的”防御式的数据库”氛围”:
1.为了保障升级后的数据库性能,我们有必要在升级前有效地收集数据库的性能统计信息,以便升级后若发生性能问题可以做出对比:
- 为了保证性能统计信息真实有效,有必要在数据库升级前的一个月即开展收集工作
- 收集的性能统计信息应当尽可能的精确真实
- 在Oracle 8i/9i中使用Statspack性能报表,将快照级别设置为6或更高,设置快照间隔为30分钟,在具体升级前将perfstat用户使用exp工具导出,参考Metalink文档Note:466350.1介绍了若何对比升级前后的Statspack快照
- 在Oracle 10g/11g中使用AWR自动负载仓库性能报告,保证采集30天左右的快照,快照间隔最好为30-60分钟;之后可以使用dbms_swrf_internal.awr_extract存储过程将AWR导出到dumpfile文件,在升级完成后载入这部分AWR信息,并可以使用DBMS_WORKLOAD_REPOSITORY.AWR_DIFF_REPORT_HTML函数对比升级前后的性能
2.正式升级前的防御性措施:
- 过多的审计信息可能会导致升级速度下降,可以在升级前将审计数据导出,并清理审计字典基表:
截断SYS.AUD$基表: SQL>TRUNCATE TABLE SYS.AUD$;
- 同样的有必要清理10g后出现的回收站:
清理DBA回收站: SQL>purge DBA_RECYCLEBIN;
- 移除一些”过期”的参数,设置这些参数的原因很有可能是为了修正原版本上的一些问题,例如我们都会做的设置event参数;但在新版本中这些参数是否仍有必要设置是一个值得讨论的问题,当然你完全可以就此事去提交一个SR:
这些"过期"参数可能包括:过老的如optimizer_features_enable=8.1.7.4,_always_semi_join=off,_unnest_subquery=false 或者event = "10061 trace name context forever, level 10",如此之类等等。
- 为数据库中的数据字典收集统计信息:
在Oracle 9i中可以执行以下过程收集数据字典统计信息, SQL> exec DBMS_STATS.GATHER_SCHEMA_STATS ('SYS', options => 'GATHER',estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, method_opt => 'FOR ALL COLUMNS SIZE AUTO', cascade => TRUE); 在Oracle10g/11g中收集字典统计信息可以由GATHER_DICTIONARY_STATS存储过程来完成: SQL> exec DBMS_STATS.GATHER_DICTIONARY_STATS;
- 为策万全,我们有必要为回退数据库升级任务做好准备,10g以前只能通过备份恢复来完成,10g以后我们可以利用闪回数据库的还原点特性来回退数据库,但需要注意以下几点:
- 利用还原点要求数据库处于归档且打开flashback database的模式下
- 在特性仅在版本10.2之后可用
- 必须保证闪回回复区flashback recovery area有足够的磁盘空间
- 注意在升级后不要立即修改compatible参数,restore point无法跨越compatible工作
/* 首先我们在正式升级前创建一个有效的保证闪回数据库的还原点 */ SQL> create restore point pre11gupgrd guarantee flashback database; Restore point created. /* 确认以上4个注意后,我们可以大胆放心地实施升级工作了 */ SQL> shutdown immediate; .............. SQL> @?/rdbms/admin/catupgrd.sql ............. upgrade failed /* 在升级过程中出现了不可绕过的错误时,我们可能不得不回退数据库到还原点,也就是升级前*/ /* 关闭实例后,还原环境到10g下 */ SQL> startup mount; /* 正式闪回到还原点pre11gupgrd */ SQL> flashback database to restore point pre11gupgrd; Flashback complete. SQL> alter database open; alter database open * ERROR at line 1: ORA-01589: must use RESETLOGS or NORESETLOGS option for database open SQL> alter database open resetlogs; /* 以resetlogs打开数据库 */ /* 之后有必要删除这一个还原点 */ SQL> select * from v$restore_point; SCN DATABASE_INCARNATION# GUA STORAGE_SIZE ---------- --------------------- --- ------------ TIME --------------------------------------------------------------------------- NAME -------------------------------------------------------------------------------- 5081633 3 YES 15941632 08-FEB-11 08.20.33.000000000 PM PRE11GUPGRD SQL> drop restore point pre11gupgrd; Restore point dropped.
- 下载最新版本的预升级检查脚本(pre-upgrade check script),如utlu102i.sql / utlu111i.sql / utlu112i.sql;Metalink文档Note:884522.1 <How to Download and Run Oracle’s Database Pre-Upgrade Utility> 指出了各版本utluxxx脚本的下载地址
/* 将升级信息spool到日志文件中 */ SQL> SPOOL /tmp/UPGRADE/utlu112i.log SQL> @/tmp/UPGRADE/utlu112i.sql
- 需要关注SYS和SYSTEM用户模式下的失效对象,有必要在升级前修复所有的失效对象:
SELECT UNIQUE object_name, object_type, owner FROM dba_objects WHERE status = 'INVALID';
- 在升级完成后推荐执行utlrp.sql脚本以重新编译(Recompile)对象,从11.1.0.7开始升级前后的失效对象将自动对比,执行?/rdbms/admin/utluiobj.sql脚本可以列出对比信息,同时基表registry$sys_inv_objs和registry$nonsys_inv_objs分别列出了数据库中失效的sys或非sys对象:
SQL> select * from v$version; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production PL/SQL Release 11.2.0.1.0 - Production CORE 11.2.0.1.0 Production TNS for Linux: Version 11.2.0.1.0 - Production NLSRTL Version 11.2.0.1.0 - Production SQL> @?/rdbms/admin/utluiobj.sql . Oracle Database 11.1 Post-Upgrade Invalid Objects Tool 02-08-2011 22:23:22 . This tool lists post-upgrade invalid objects that were not invalid prior to upgrade (it ignores pre-existing pre-upgrade invalid objects). . Owner Object Name Object Type . SH FWEEK_PSCAT_SALES_MV MATERIALIZED VIEW PL/SQL procedure successfully completed.
3.解决升级过程中失效的组件(component)
- 确保该部分组件确实被link到目前的Oracle软件2进制可执行文件或库文件中
- 如果确认不会用到某些组件(component),想要通过手动彻底移除这部分组件(亦或者希望reinstall重新安装这部分组件),那么可以参考以下文档:
Note:472937.1 Information On Installed Database Components/Schemas Note.300056.1 Debug and Validate Invalid Objects Note:753041.1 How to diagnose Components with NON VALID status Note.733667.1 How to Determine if XDB is Being Used in the Database? 组件升级失败实例1:数据库从10.2升级到11.2,在10g的环境中Database Vault组件已经安装, Database Vault组件在升级relink前被turned off,在升级到11.2的过程中XDB组件升级失败; 其原因在于安装或切换Database Vault将使得XDB组件失效,或者由Bug 8942758引起。 解决方案是在升级前执行utlrp.sql脚本重新编译失效对象和组件,在此例中执行utlrp.sql可以使XDB组件valid. 组件升级失败实例2:数据库从10.2.0.4升级到11.1.0.7,在升级过程中"ORACLE SERVER"组件失效; 其原因在于DMBS_SQLPA包引用了某个不存在的列,该问题可以参考metalink文档782735.1和Notes:605317.1/736353.1。 有效的解决方案是: 1.在升级前将SYS.PLAN_TABLE$基表或者同义词PUBLIC.PLAN_TABLE DROP掉 2.若已执行了升级操作并遭遇了该问题,那么可以使用以下手段修复该问题: @catplan.sql -- recreate the plan table @dbmsxpln.sql -- reload dbms_xplan spec @prvtxpln.plb -- reload dbms_xplan implementation @prvtspao.plb -- reload dbms_sqlpa alter package SYS.DBMS_SUMADVISOR compile ; alter package SYS.DBMS_SUMADVISOR compile body;
4. 使用例如AIX上的slibclean等命令清理操作系统环境,在少数专有平台上不清理载入的共享库文件可能导致升级失败
5.在执行catupgrd.sql脚本正式升级前打开sqlplus的echo输出,将升级过程中所有的输出信息转储到日志文件中:
SQL> set echo on SQL> SPOOL /tmp/upgrade.log SQL> @catupgrd.sql SQL> spool off
DBUA图形化升级工具默认使用spool和”echo”输出,这些日志可以在$ORACLE_HOME/cfgtoollogs/dbua//upgrade/目录下找到。