Slide:Upgrade 11.2.0.1 RAC DB/RDBMS to 11.2.0.2 in Linux By Maclean

Recreate failovered primary database using Flashback Database

很多朋友一直对DataGuard的fast-start failover持怀疑的态度;通过observer的观察,(Fast-Start Failover)FSFO提供了一种在primary数据库不可用情况下自动故障切换到standby数据库的能力。造成很多朋友不愿意使用FSFO的原因之一是故障切换后不得不重建原primary数据库,不过如果我们能配合使用10g中的闪回数据库特性的话,这一重建数据库的工作便也可以一劳永逸了。

/* 启用fast_start failover的前提之一是在primary和standby库上都启用flashback database */

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

/* 当前的primary库情况 */

SQL> select open_mode ,database_role,flashback_on from v$database;
OPEN_MODE            DATABASE_ROLE    FLASHBACK_ON
-------------------- ---------------- ------------------
READ WRITE           PRIMARY          YES


SQL> alter session set nls_date_format='YYYY-MM-DD hh24:mi:ss';

Session altered.

SQL> select OLDEST_FLASHBACK_TIME  from v$flashback_database_log;

OLDEST_FLASHBACK_TI
-------------------
2011-02-19 22:40:39

/* 当前的standby库情况 */
SQL> select open_mode ,database_role,flashback_on from v$database;

OPEN_MODE            DATABASE_ROLE    FLASHBACK_ON
-------------------- ---------------- ------------------
READ ONLY WITH APPLY PHYSICAL STANDBY YES

SQL> alter session set nls_date_format='YYYY-MM-DD hh24:mi:ss';

Session altered.
SQL> select OLDEST_FLASHBACK_TIME  from v$flashback_database_log;

OLDEST_FLASHBACK_TI
-------------------
2011-02-19 22:34:56


DGMGRL> edit configuration set property faststartfailoverthreshold=10;
Property "faststartfailoverthreshold" updated

DGMGRL> show configuration verbose;

Configuration - pro

  Protection Mode: MaxAvailability
  Databases:
    sbdb - Primary database
    prod - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

/* 另开一个终端启动observer */

DGMGRL> start observer
Observer started

DGMGRL> show fast_start failover

Fast-Start Failover: DISABLED

  Threshold:        10 seconds
  Target:           (none)
  Observer:         rh3.oracle.com
  Lag Limit:        30 seconds
  Shutdown Primary: TRUE
  Auto-reinstate:   TRUE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Offline               YES

  Oracle Error Conditions:
    (none)


/* 启动快速故障切换  */

DGMGRL> enable fast_start failover
Enabled.

/* 记录实际切换前的时间 */

[maclean@rh3 ~]$ date
Sat Feb 19 23:35:44 CST 2011

[maclean@rh2 ~]$ ps -ef|grep pmon|grep -v grep
maclean  25165     1  0 22:58 ?        00:00:00 ora_pmon_SBDB
maclean  26080     1  0 23:04 ?        00:00:00 ora_pmon_PROD

[maclean@rh2 ~]$ kill -9 25165

/* 通过kill pmon进程造成主库crash,引发fast-start failover */

/* observer观察到primary库意外终止后,实施了快速故障切换 */

23:37:27.92  Saturday, February 19, 2011
Initiating Fast-Start Failover to database "prod"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "prod"
23:37:33.69  Saturday, February 19, 2011
23:39:17.98  Saturday, February 19, 2011
Initiating reinstatement for database "sbdb"...
Reinstating database "sbdb", please wait...
Error: ORA-16653: failed to reinstate database

Failed.
Reinstatement of database "sbdb" failed
23:39:26.30  Saturday, February 19, 2011

23:40:05.00  Saturday, February 19, 2011
Initiating reinstatement for database "sbdb"...
Reinstating database "sbdb", please wait...
Error: ORA-16653: failed to reinstate database

Failed.
Reinstatement of database "sbdb" failed
23:40:09.24  Saturday, February 19, 2011

/* 接下来我们通过闪回数据库来将失败切换后的primary库flashback到faliover之前以便重用*/

SQL> startup mount;

SQL> flashback database to timestamp to_timestamp('2011-02-19 23:30:44','YYYY-MM-DD hh24:mi:ss');
flashback database to timestamp to_timestamp('2011-02-19 23:35:44','YYYY-MM-DD hh24:mi:ss')
*
ERROR at line 1:
ORA-38754: FLASHBACK DATABASE not started; required redo log is not available
ORA-38762: redo logs needed for SCN 1143987 to SCN End-of-Redo
ORA-38761: redo log sequence 10 in thread 1, incarnation 3 could not be
accessed

/* 缺少最近的归档日志而无法完成闪回,需要到目前的primary库中查找 */

SQL> select name from v$archived_log where FIRST_CHANGE#<1143987 and NEXT_CHANGE#>1143987;

NAME
--------------------------------------------------------------------------------
/s01/fast_recovery_area/PROD/archivelog/2011_02_19/o1_mf_1_10_6ozpzh9c_.arc

SQL> alter database register physical logfile '/s01/fast_recovery_area/PROD/archivelog/2011_02_19/o1_mf_1_10_6ozpzh9c_.arc';
Database altered.


SQL> flashback database to timestamp to_timestamp('2011-02-19 23:30:44','YYYY-MM-DD hh24:mi:ss');
Flashback complete.

/* 成功闪回后,observer将主动去尝试reinstate这个目前可用的standby库,如以下日志*/

23:46:20.16  Saturday, February 19, 2011
Initiating reinstatement for database "sbdb"...
Reinstating database "sbdb", please wait...
Operation requires shutdown of instance "SBDB" on database "sbdb"
Shutting down instance "SBDB"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "SBDB" on database "sbdb"
Starting instance "SBDB"...
ORACLE instance started.
Database mounted.
Continuing to reinstate database "sbdb" ...
Reinstatement of database "sbdb" succeeded
23:47:13.49  Saturday, February 19, 2011

DGMGRL> show configuration verbose

Configuration - pro

  Protection Mode: MaxAvailability
  Databases:
    prod - Primary database
    sbdb - (*) Physical standby database

  (*) Fast-Start Failover target

Fast-Start Failover: ENABLED

  Threshold:        10 seconds
  Target:           sbdb
  Observer:         rh3.oracle.com
  Lag Limit:        30 seconds (not in use)
  Shutdown Primary: TRUE
  Auto-reinstate:   TRUE

Configuration Status:
SUCCESS

/*可以通过show configuration命令监控到当前primary与standby库都处于可用状态,
   且fast_start failover也为启用状态 */

Oracle数据库升级前必要的准备工作

Oracle数据库升级向来是一门纷繁复杂的工程,DBA需要为产品数据库的升级耗费大量时间精力在准备工作上;因为其升级复杂度高,所以即便做了较为充分的准备仍可能在升级过程中遇到意想不到的问题,为了更高效地完成升级任务和减少停机时间,我们有必要为升级工作营造一种”舒适的”防御式的数据库”氛围”:

1.为了保障升级后的数据库性能,我们有必要在升级前有效地收集数据库的性能统计信息,以便升级后若发生性能问题可以做出对比:

  • 为了保证性能统计信息真实有效,有必要在数据库升级前的一个月即开展收集工作
  • 收集的性能统计信息应当尽可能的精确真实
  • 在Oracle 8i/9i中使用Statspack性能报表,将快照级别设置为6或更高,设置快照间隔为30分钟,在具体升级前将perfstat用户使用exp工具导出,参考Metalink文档Note:466350.1介绍了若何对比升级前后的Statspack快照
  • 在Oracle 10g/11g中使用AWR自动负载仓库性能报告,保证采集30天左右的快照,快照间隔最好为30-60分钟;之后可以使用dbms_swrf_internal.awr_extract存储过程将AWR导出到dumpfile文件,在升级完成后载入这部分AWR信息,并可以使用DBMS_WORKLOAD_REPOSITORY.AWR_DIFF_REPORT_HTML函数对比升级前后的性能

2.正式升级前的防御性措施:

  • 过多的审计信息可能会导致升级速度下降,可以在升级前将审计数据导出,并清理审计字典基表:
截断SYS.AUD$基表:
SQL>TRUNCATE TABLE SYS.AUD$;
  • 同样的有必要清理10g后出现的回收站:
清理DBA回收站:
SQL>purge DBA_RECYCLEBIN;
  • 移除一些”过期”的参数,设置这些参数的原因很有可能是为了修正原版本上的一些问题,例如我们都会做的设置event参数;但在新版本中这些参数是否仍有必要设置是一个值得讨论的问题,当然你完全可以就此事去提交一个SR:
这些"过期"参数可能包括:过老的如optimizer_features_enable=8.1.7.4,_always_semi_join=off,_unnest_subquery=false
或者event = "10061 trace name context forever, level 10",如此之类等等。
  • 为数据库中的数据字典收集统计信息:
在Oracle 9i中可以执行以下过程收集数据字典统计信息,
SQL> exec DBMS_STATS.GATHER_SCHEMA_STATS
     ('SYS', options => 'GATHER',estimate_percent =>
      DBMS_STATS.AUTO_SAMPLE_SIZE, method_opt => 'FOR
      ALL COLUMNS SIZE AUTO', cascade => TRUE);

在Oracle10g/11g中收集字典统计信息可以由GATHER_DICTIONARY_STATS存储过程来完成:
SQL> exec DBMS_STATS.GATHER_DICTIONARY_STATS;
  • 为策万全,我们有必要为回退数据库升级任务做好准备,10g以前只能通过备份恢复来完成,10g以后我们可以利用闪回数据库的还原点特性来回退数据库,但需要注意以下几点:
    • 利用还原点要求数据库处于归档且打开flashback database的模式下
    • 在特性仅在版本10.2之后可用
    • 必须保证闪回回复区flashback recovery area有足够的磁盘空间
    • 注意在升级后不要立即修改compatible参数,restore point无法跨越compatible工作
/* 首先我们在正式升级前创建一个有效的保证闪回数据库的还原点 */

SQL> create restore point pre11gupgrd guarantee flashback database;
Restore point created.

/* 确认以上4个注意后,我们可以大胆放心地实施升级工作了 */
SQL> shutdown immediate;
..............
SQL> @?/rdbms/admin/catupgrd.sql
.............
upgrade failed

/* 在升级过程中出现了不可绕过的错误时,我们可能不得不回退数据库到还原点,也就是升级前*/

/* 关闭实例后,还原环境到10g下 */

SQL> startup mount;

/* 正式闪回到还原点pre11gupgrd */
SQL> flashback database to restore point pre11gupgrd;
Flashback complete.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open

SQL>  alter database open resetlogs;

/* 以resetlogs打开数据库 */

/* 之后有必要删除这一个还原点 */
SQL> select * from v$restore_point;

       SCN DATABASE_INCARNATION# GUA STORAGE_SIZE
---------- --------------------- --- ------------
TIME
---------------------------------------------------------------------------
NAME
--------------------------------------------------------------------------------
   5081633                     3 YES     15941632
08-FEB-11 08.20.33.000000000 PM
PRE11GUPGRD

SQL> drop restore point pre11gupgrd;
Restore point dropped.
  • 下载最新版本的预升级检查脚本(pre-upgrade check script),如utlu102i.sql / utlu111i.sql / utlu112i.sql;Metalink文档Note:884522.1 <How to Download and Run Oracle’s Database Pre-Upgrade Utility> 指出了各版本utluxxx脚本的下载地址
/* 将升级信息spool到日志文件中 */
SQL> SPOOL /tmp/UPGRADE/utlu112i.log
SQL> @/tmp/UPGRADE/utlu112i.sql
  • 需要关注SYS和SYSTEM用户模式下的失效对象,有必要在升级前修复所有的失效对象:
SELECT UNIQUE object_name, object_type, owner
  FROM dba_objects
 WHERE status = 'INVALID';
  • 在升级完成后推荐执行utlrp.sql脚本以重新编译(Recompile)对象,从11.1.0.7开始升级前后的失效对象将自动对比,执行?/rdbms/admin/utluiobj.sql脚本可以列出对比信息,同时基表registry$sys_inv_objs和registry$nonsys_inv_objs分别列出了数据库中失效的sys或非sys对象:
SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> @?/rdbms/admin/utluiobj.sql
.
Oracle Database 11.1 Post-Upgrade Invalid Objects Tool 02-08-2011 22:23:22
.
This tool lists post-upgrade invalid objects that were not invalid
prior to upgrade (it ignores pre-existing pre-upgrade invalid objects).
.
Owner                     Object Name                     Object Type
.
SH            FWEEK_PSCAT_SALES_MV               MATERIALIZED VIEW

PL/SQL procedure successfully completed.

3.解决升级过程中失效的组件(component)

  • 确保该部分组件确实被link到目前的Oracle软件2进制可执行文件或库文件中
  • 如果确认不会用到某些组件(component),想要通过手动彻底移除这部分组件(亦或者希望reinstall重新安装这部分组件),那么可以参考以下文档:
Note:472937.1 Information On Installed Database Components/Schemas
Note.300056.1 Debug and Validate Invalid Objects
Note:753041.1 How to diagnose Components with NON VALID status
Note.733667.1 How to Determine if XDB is Being Used in the Database?

组件升级失败实例1:数据库从10.2升级到11.2,在10g的环境中Database Vault组件已经安装,
Database Vault组件在升级relink前被turned off,在升级到11.2的过程中XDB组件升级失败;
其原因在于安装或切换Database Vault将使得XDB组件失效,或者由Bug 8942758引起。
解决方案是在升级前执行utlrp.sql脚本重新编译失效对象和组件,在此例中执行utlrp.sql可以使XDB组件valid.

组件升级失败实例2:数据库从10.2.0.4升级到11.1.0.7,在升级过程中"ORACLE SERVER"组件失效;
其原因在于DMBS_SQLPA包引用了某个不存在的列,该问题可以参考metalink文档782735.1和Notes:605317.1/736353.1。
有效的解决方案是:
1.在升级前将SYS.PLAN_TABLE$基表或者同义词PUBLIC.PLAN_TABLE DROP掉
2.若已执行了升级操作并遭遇了该问题,那么可以使用以下手段修复该问题:
@catplan.sql -- recreate the plan table
@dbmsxpln.sql -- reload dbms_xplan spec
@prvtxpln.plb -- reload dbms_xplan implementation
@prvtspao.plb -- reload dbms_sqlpa
alter package SYS.DBMS_SUMADVISOR compile ;
alter package SYS.DBMS_SUMADVISOR compile body;

4. 使用例如AIX上的slibclean等命令清理操作系统环境,在少数专有平台上不清理载入的共享库文件可能导致升级失败

5.在执行catupgrd.sql脚本正式升级前打开sqlplus的echo输出,将升级过程中所有的输出信息转储到日志文件中:

SQL> set echo on

SQL> SPOOL /tmp/upgrade.log

SQL> @catupgrd.sql
SQL> spool off

DBUA图形化升级工具默认使用spool和”echo”输出,这些日志可以在$ORACLE_HOME/cfgtoollogs/dbua//upgrade/目录下找到。

沪ICP备14014813号-2

沪公网安备 31010802001379号