在《GoldenGate实现Live Standby主备库切换(1)》中我们介绍了如何针对GoldenGate Live standby环境执行计划内的Switchover切换。除去计划内的主备切换,实际生产中更多的故障切换发生在主机故障或主库不可用的情况下,这种情况下一般我们已经无法在Primary上停止应用及extract了;当我们在这样的情况下failover到Standby上后如同在DataGuard环境下一样即便Primary上的数据库恢复了我们也无法直接进行回切了,需要做的是重新配置Primary上的OGG并将Standby上的数据以initial load的形式还原回去,在数据重新同步后才能再切换到Primary上。下面我们就来介绍如何在计划外的情况下从主库failover到备库,并尝试回切:
1. 使用lag replicat命令了解standby上的replicat的延迟情况,若返回"At EOF (end of file)"则说明replicat已应用所有trail中的数据到备库上。 GGSCI (rh3.oracle.com) 1> info all Program Status Group Lag Time Since Chkpt MANAGER RUNNING EXTRACT STOPPED EXTSTD2 00:00:00 23:42:47 EXTRACT STOPPED PUMPSTD2 00:00:00 23:41:29 REPLICAT RUNNING REPSTD1 00:00:00 00:00:00 GGSCI (rh3.oracle.com) 5> lag replicat repstd1 Sending GETLAG request to REPLICAT REPSTD1 ... Last record lag: 5 seconds. At EOF, no more records to process. 2. 停止standby上的replicat GGSCI (rh3.oracle.com) 6> stop replicat repstd1 Sending STOP request to REPLICAT REPSTD1 ... Request processed. 3. 在standby上执行必要的赋予DML权限,启动triggers触发器和cascade delete约束的脚本 4. 启动standby上的extract, 在此之前先确认Standby上的data pump group不被启动,以保证trail文件堆积在standby上 GGSCI (rh3.oracle.com) 15> info all Program Status Group Lag Time Since Chkpt MANAGER RUNNING EXTRACT STOPPED EXTSTD2 00:00:00 24:04:16 EXTRACT STOPPED PUMPSTD2 00:00:00 24:02:57 REPLICAT STOPPED REPSTD1 00:00:00 00:00:06 GGSCI (rh3.oracle.com) 16> start extstd2 Sending START request to MANAGER ... EXTRACT EXTSTD2 starting 5. 此时可以将应用切换到standby上了 ============================================================================== 以上步骤完成了故障切换到Standby的过程,接下来我们尝试将应用还原到primary上 1.如果原primary主机已损毁则需要重装Oracle软件,并重建Primary系统上的Goldengate软件目录 2.从primary端启动GGSCI命令 3.删除primary上相关的extract及EXTTRAIL,并重建 GGSCI (rh2.oracle.com) 6> delete extract extstd1 Deleted EXTRACT EXTSTD1. GGSCI (rh2.oracle.com) 7> delete exttrail /d01/ext/cl GGSCI (rh2.oracle.com) 14> add extract extstd1,tranlog,begin now EXTRACT added. GGSCI (rh2.oracle.com) 15> add exttrail /d01/ext/cl,megabytes 100,extract extstd1 EXTTRAIL added. 4. 在primary上启动Manager GGSCI (rh2.oracle.com) 18> start Manager Manager started. 5. 接着在primary上执行disable trigger触发器和cascade delete约束的脚本 6. 在standby上对执行热备份(逻辑,物理的均可);并记录该热备的结束时间 7. 使用standby上的热备份来完成primary上的initial load后,再以HANDLECOLLISIONS选项启动Standby上的replicat GGSCI (rh2.oracle.com) 22> view params repstd2 -- Identify the Replicat group: REPLICAT repstd2 -- State that source and target definitions are identical: ASSUMETARGETDEFS -- Specify database login information as needed for the database: userid maclean, password maclean HANDLECOLLISIONS -- Specify tables for delivery: MAP clinic.*, TARGET clinic.*; -- Exclude specific tables from delivery if needed: -- MAPEXCLUDEGGSCI (rh2.oracle.com) 23> start replicat repstd2 Sending START request to MANAGER ... REPLICAT REPSTD2 starting 8.并启动standby上的data pump group,将堆积的trail文件传输到Primary上 GGSCI (rh3.oracle.com) 19> start pumpstd2 Sending START request to MANAGER ... EXTRACT PUMPSTD2 starting 9.使用info replicat观察primary上的replicat,观察其进度是否已晚于完成初始化导出的时间 10. 禁用primary上目前使用的HANDLECOLLISIONS选项 GGSCI (rh2.oracle.com) 26> send replicat repstd2,NOHANDLECOLLISIONS 11. 关闭之前切换到Standby上的一切应用 12. 12.若需要进行数据验证则关闭Standby上的extract、pump及Primary上的replicat: GGSCI (rh2.oracle.com) 31> lag replicat repstd2 Sending GETLAG request to REPLICAT REPSTD2 ... Last record lag: 3 seconds. At EOF, no more records to process. GGSCI (rh3.oracle.com) 28> stop extstd2 Sending STOP request to EXTRACT EXTSTD2 ... Request processed. GGSCI (rh3.oracle.com) 26> stop pumpstd2 Sending STOP request to EXTRACT PUMPSTD2 ... Request processed. GGSCI (rh2.oracle.com) 34> stop replicat repstd2 Sending STOP request to REPLICAT REPSTD2 ... Request processed. /* 使用Oracle GoldenGate Veridata等工具验证数据一致性, 若不一致则修复 */ standby库上: SQL> select sum(t2) from tv; SUM(T2) ---------- 5355944997 primary库上: SQL> select sum(t2) from tv; SUM(T2) ---------- 5355944997 13.在primary系统上赋予应用相关DML权限,启用触发器及删除约束 14. 修改primary系统上的extract group的begin time为当前,启动Primary到Standby的extract、pump及replicat GGSCI (rh2.oracle.com) 36> alter extstd1 ,begin now EXTRACT altered. GGSCI (rh2.oracle.com) 52> start extract extstd1 Sending START request to MANAGER ... EXTRACT EXTSTD1 starting GGSCI (rh2.oracle.com) 53> start extract pumpstd1 Sending START request to MANAGER ... EXTRACT PUMPSTD1 starting GGSCI (rh3.oracle.com) 3> start repstd1 Sending START request to MANAGER ... REPLICAT REPSTD1 starting 此时系统切换回原始的primary->standby状态. That's great!