How a GoldenGate process consumes memory

Question:
We are using GoldenGate to replicate data from Oracle 9.2.0.8 on Solaris 8 SPARC 64-bit (GoldenGate version 10.4.0.31 Build 001) to Oracle RAC 11.2.0.1 on Solaris 10 SPARC 64-bit (GoldenGate version 10.4.0.19 Build 002). Both the GoldenGate Extract and Replicat processes are working fine. The summary below describes our GoldenGate setup.

Extract Side                              | Replicat Side
Hostname: HK8SN020                        | Hostname: HK8SP226 (HK8SP227 has no GoldenGate client; all GoldenGate processes run on HK8SP226)
Oracle 9.2.0.8 (32-bit binary)            | Oracle 11.2.0.1 (64-bit binary)
Solaris 8 SPARC 64-bit kernel             | Solaris 10 SPARC 64-bit kernel
GoldenGate version 10.4.0.31 Build 001    | GoldenGate version 10.4.0.19 Build 002

However, on 27-Mar-2010 we found that server memory utilization on the Solaris 10 host HK8SP226 had been rising unexpectedly and continuously since around 01:30. By around 03:20, memory utilization had reached 100%. At around 05:20, memory utilization suddenly dropped back to normal. We compared the "sar -r" output with the Solaris server message log and found that at 05:21:44 the GoldenGate Replicat process terminated with the error "malloc 2097152 bytes failed". After that, the server memory appears to have been released and utilization returned to its normal level.
We suspect the abnormal server memory usage was caused by the GoldenGate Replicat process. Can you please help investigate and find the root cause?

Answer:
GoldenGate replicates only committed transactions. It stores the operations of each transaction in a managed virtual-memory pool known as a cache until it receives either a commit or a rollback for that transaction. One global cache operates as a shared resource of an Extract process. The following sub-pools of virtual memory are allocated from the global cache: (1) one sub-pool per log reader thread for most transaction row data, and (2) one sub-pool for BLOB data and possibly other large items.

Within each sub-pool, individual buffers are allocated from the global cache, each one containing information related to a transaction that is being processed by GoldenGate. The sizes of the initial and incremental buffers are controlled by the CACHEBUFFERSIZE option of CACHEMGR.

The global cache size is controlled by the CACHESIZE option of CACHEMGR. The cache manager keeps a GoldenGate process working within the soft limit of its global cache size, allocating virtual memory (not physical memory) only on demand. The actual amount of physical memory that is used by any GoldenGate process is controlled by the operating system, not by the GoldenGate program.

The GoldenGate cache manager simply takes advantage of the operating system's memory-management functions to ensure that GoldenGate processes work in a sustained and efficient manner. Within the cache, OGG uses standard virtual-memory techniques: it allocates and manages active buffers efficiently, recycles old buffers instead of paging to disk when possible, and pages less-used information to disk when necessary.

When the cache manager (COM) initializes, by default it first determines how much virtual memory the OS has available for the process and uses that to determine what CACHESIZE should be. The default CACHESIZE is 8 GB on 64-bit systems and 2 GB on 32-bit systems.

The available virtual memory is reported with the PROCESS VM AVAIL FROM OS value in the report file. The CACHESIZE value will either be rejected or sized down if it is larger than, or sufficiently close to, the amount of virtual memory that is available to the process.

The CACHESIZE value will always be a power of two, rounded down from the value of PROCESS VM AVAIL FROM OS, unless the latter is itself a power of two, in which case it is halved. After the specified size is consumed by data, the memory manager will try to free up memory by paging data to disk or by reusing aged buffers, before requesting more memory from the system.
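
For example, following the rule above: if PROCESS VM AVAIL FROM OS reports roughly 6 GB, CACHESIZE is rounded down to 4 GB (the next lower power of two); if it reports exactly 8 GB, CACHESIZE is halved to 4 GB.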

The memory manager generates statistics that can be viewed with the SEND EXTRACT or SEND REPLICAT command when used with the CACHEMANAGER option. The statistics show the size of the memory pool, the paging frequency, the size of the transactions, and other information that creates a system profile.

Based on this profile, you might need to make adjustments to the memory cache if you see performance problems that appear to be related to file caching. The first step is to modify the CACHESIZE and CACHEPAGEOUTSIZE parameters. You might need to use a higher or lower cache size, a higher or lower page size, or a combination of both, based on the size and type of transactions that are being generated. You might also need to adjust the initial memory allocation with the CACHEBUFFERSIZE option. It is possible, however, that operating system constraints could limit the effect of modifying any components of the CACHEMGR parameter. In particular, if the operating system has a small per-process virtual memory limit, it will force more file caching, regardless of the CACHEMGR configuration.
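
As a sketch of requesting those statistics from GGSCI (the group name rep1 is a placeholder, and the exact spelling of the option can vary between releases, so treat this as an illustration rather than exact syntax):

GGSCI> send replicat rep1, cachemanager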

Once CACHESIZE is set to 1 GB, the GoldenGate process will use up to 1 GB of virtual memory and will then use swap space on disk.
If CACHESIZE is explicitly set in the process parameter file, CACHEMGR will use only that value (1 GB in this example); otherwise it defaults to the platform-dependent value (32-bit or 64-bit) described above. If the operating system imposes a very low virtual-memory limit, it will force more file caching regardless of the CACHEMGR configuration. There is always a performance difference between caching in memory buffers and file caching, since file caching involves read and write I/O.

So try to set an explicit CACHESIZE for the GoldenGate processes (Extract/Replicat). Edit the respective source Extract and target Replicat parameter files, add the CACHEMGR parameter with the options shown below, and restart the processes.

Syntax:
CACHEMGR CACHEBUFFERSIZE <size>, CACHESIZE <size>, CACHEDIRECTORY <path> [, CACHEDIRECTORY <path>]
Example:
CACHEMGR CACHEBUFFERSIZE 64KB, CACHESIZE 1GB, CACHEDIRECTORY /ggs/dirtmp, CACHEDIRECTORY /ggs2/temp

So once CACHESIZE is set to 1 GB, the GoldenGate process will use up to 1 GB of virtual memory and only then start paging to disk.

Does GoldenGate use Remote Procedure Call (RPC)?

Question:
My customer is using Oracle GoldenGate to synchronize data from an 11.2.0.1 RAC (on Solaris 10 SPARC 64-bit) to a 9.2.0.8 standalone database (on Solaris SPARC 64-bit). The customer recently ran an OS security scan against one node of the 11.2.0.1 RAC. The scan flags the RAC node (HK8SP226) with a potential security issue: "rpcstatd: RPC statd remote file creation and removal". My customer suspects that GoldenGate uses RPC for file transfer. Could you please confirm whether GoldenGate needs RPC? If GoldenGate does not need RPC, I will ask my customer to disable this service on the server.

Supplementary information from the security scan result.

———————————————————————-

hk8sp226 {172.20.104.178} Solaris
H
rpcstatd: RPC statd remote file creation and removal
Remote Procedure Call (RPC) statd maintains state information in cooperation with RPC lockd to provide crash and recovery
functionality for file locking across the Network File System (NFS). Statd does not validate information received from a remote lockd.
By sending to the statd service an RCP or RDIST request including references to the parent directory (“..”), an attacker can provide
false information to the rpc.statd file, allowing the creation of a file in an arbitrary directory on the host. This can be used to overwrite
pre-existing files or create new files on the host.

Answer:
Oracle GoldenGate transfers trail files to the remote host over TCP/IP, which internally uses RPC for the data transfer. You can let the customer know this and exclude it from the hardening check.

Configuring GoldenGate exception handling and logging Oracle errors (1)

GoldenGate itself does not provide an exception-handling routine. By default, any operational failure encountered by a Replicat process causes it to terminate abnormally (ABEND) and roll the transaction back to the most recent checkpoint. In a production environment this is often far from ideal. When doing a seamless data migration we use the HANDLECOLLISIONS and NOHANDLECOLLISIONS parameters, which control whether Replicat tries to resolve duplicate-record and missing-record errors; but should such errors really just be ignored? Only maintenance staff who know the application can answer that. What we need to do is record the details of these errors as soon as they occur.

This means configuring an exception handler in each Replicat that records which Replicat process, and which data, caused the error.

Next we will build such an exception handler, which captures the Oracle errors that occur during replication while still allowing Replicat to continue its work after these failures:

1. First, create an exception log table to record the error details:
SQL> conn maclean/maclean
Connected.
SQL> create table exception_log
  2  ( replicat_name varchar2(10),
  3    table_name varchar2(100),
  4    errno number,
  5    dberrmsg varchar2(4000),
  6    optype varchar2(20),
  7    errtype varchar2(20),
  8    logrba number,
  9    logposition number,
 10    committimestamp timestamp);

Table created.

SQL> alter table exception_log add primary key (logrba,logposition,committimestamp);
Table altered.

The exception log table above should be created under the GoldenGate administrator account; it can record exception data for all Replicats.

2. Modify each Replicat's parameter file to add the exception handler:
GGSCI (rh3.oracle.com) 59> view params rep1

replicat rep1
userid maclean,password maclean
ASSUMETARGETDEFS
discardfile /s01/discard/rep1.log,append,megabytes 10
REPERROR (DEFAULT, EXCEPTION)
REPERROR (DEFAULT2,ABEND)
map defs.tbc, target defs.tbc;
map defs.tbc, target maclean.exception_log, 
EXCEPTIONSONLY,
INSERTALLRECORDS,
COLMAP (   replicat_name = "rep1"
, table_name = @GETENV ("GGHEADER", "TABLENAME")
, errno = @GETENV ("LASTERR", "DBERRNUM")
, dberrmsg = @GETENV ("LASTERR", "DBERRMSG") 
, optype = @GETENV ("LASTERR", "OPTYPE")
, errtype = @GETENV ("LASTERR", "ERRTYPE")
, logrba = @GETENV ("GGHEADER", "LOGRBA")
, logposition = @GETENV ("GGHEADER", "LOGPOSITION")
, committimestamp = @GETENV ("GGHEADER", "COMMITTIMESTAMP"));

The REPERROR parameter controls how the Replicat process responds to errors that occur during mapping.
DEFAULT represents a global error type, i.e. every error other than those explicitly specified.
DEFAULT2 applies when DEFAULT errors are handled with EXCEPTION: it covers all errors from MAP statements that do not define an exception mapping.

3. Stop and restart the Replicat process
GGSCI (rh3.oracle.com) 60> stop rep1

Sending STOP request to REPLICAT REP1 ...
Request processed.


GGSCI (rh3.oracle.com) 61> start rep1

Sending START request to MANAGER ...
REPLICAT REP1 starting

4. The above completes the exception-handling configuration for the single table defs.tbc; the related application can now be started.

5. By manually tampering with data or otherwise putting the two sides out of sync, it is easy to trigger the exception handler and have the error details recorded in our exception_log table;
we can then query the recorded rows:
SQL> col dberrmsg for a1;
SQL> col table_name for a10;
SQL> select * from exception_log;
 
REPLICAT_NAME TABLE_NAME      ERRNO D OPTYPE               ERRTYPE                  LOGRBA LOGPOSITION COMMITTIMESTAMP
------------- ---------- ---------- - -------------------- -------------------- ---------- ----------- ---------
rep1          DEFS.TBC         1403   PK UPDATE            DB                          231    59259920 23-DEC-10 1
rep1          DEFS.TBC         1403   PK UPDATE            DB                          231    59260812 23-DEC-10 1
rep1          DEFS.TBC         1403   PK UPDATE            DB                          231    94620688 23-DEC-10 1
rep1          DEFS.TBC         1403   PK UPDATE            DB                          231    94621580 23-DEC-10 1
rep1          DEFS.TBC         1403   PK UPDATE            DB                          231    94682640 23-DEC-10 1
rep1          DEFS.TBC         1403   PK UPDATE            DB                          231    94683532 23-DEC-10 1
 
6 rows selected

As shown above, these are ordinary Oracle 1403 "no data found" errors raised by primary-key (PK) updates.

The exception handler can record more than just this; it can also be used to record the before and after images of UPDATE operations, information that comes in handy for conflict resolution.
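
As a sketch of that idea (all of the following is hypothetical and not part of the configuration above: it assumes GETUPDATEBEFORES is set in the Extract parameter file so that before images reach the trail, that exception_log has been extended with before_value and after_value columns, and that the source table has a column named t2):

map defs.tbc, target maclean.exception_log,
EXCEPTIONSONLY,
INSERTALLRECORDS,
COLMAP ( replicat_name = "rep1"
, table_name = @GETENV ("GGHEADER", "TABLENAME")
, dberrmsg = @GETENV ("LASTERR", "DBERRMSG")
, before_value = @BEFORE (t2)
, after_value = t2 );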

How to handle DDL in a GoldenGate environment without DDL replication?

Question: GoldenGate is configured to replicate DML only. When the source has DDL changes, how should they be handled on the source and target?

Answer:
By default GoldenGate captures only DML changes. If you need to make a DDL change, stop the Extract and Replicat processes, apply the DDL change, and then restart the Extract and Replicat processes; only then will Extract be able to capture changes against the new table structure.

It is advisable to stop the Extract, data pump and Replicat processes, make the DDL change, and then restart the Extract, data pump and Replicat.
The procedure above will not have any impact on DDL replication.

Alternatively, you can follow the procedure below:
1. Stop the Extract.
2. Stop the data pump.
3. If the Replicat has lag, leave it running so it can drain the lag.
4. Apply the DDL on the source while the Replicat is still running.
5. Stop the Replicat once lag = 0.
6. Apply the DDL on the target.
7. Start the Extract, data pump and Replicat.

Question: After performing the DDL on the source, do we need to delete and re-add the trandata on the table that was changed?

Answer:
Yes, that is correct. After making the DDL change you need to delete and re-add the trandata on the table that was changed.
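
A minimal GGSCI sketch of that step (the login and the schema/table name scott.emp are placeholders, not taken from the question above):

GGSCI> dblogin userid maclean, password maclean
GGSCI> delete trandata scott.emp
GGSCI> add trandata scott.emp
GGSCI> info trandata scott.emp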

GoldenGate Build for Oracle 8i

BUILD REQUEST INSTALLATION MEDIA:
BUILD REQUEST: Oracle GoldenGate build for 8i (32-bit) on Windows (XP, 2003) - 1040_WINNT.zip
BUILD REQUEST: Sun Solaris 5.8 SPARC (64-bit), Oracle 8i, GG version 10.0.0.52_00 - ggs_solaris8_sparc_ora81_32bit_v10_0_0_52_001.tar
GGS v10.4.0.31.001 build for HP-UX 11.11 PA-RISC 64-bit / Oracle 8.1 - ggs_hpux1111_pa_ora81_32bit_v10_4_0_31_001.tar

Implementing primary/standby switchover with GoldenGate Live Standby (2)

In "Implementing primary/standby switchover with GoldenGate Live Standby (1)" we described how to perform a planned switchover in a GoldenGate Live Standby environment. Beyond planned switchovers, in real production most failovers happen when the primary host fails or the primary database becomes unavailable; in that case we usually can no longer stop the application or the Extract on the primary. After failing over to the standby under such circumstances, just as in a Data Guard environment, we cannot simply switch back even once the primary database has been recovered. Instead we must reconfigure OGG on the primary and restore the standby's data back to it via an initial load; only after the data has been resynchronized can we switch back to the primary. Below we walk through how to fail over from the primary to the standby in an unplanned situation, and then attempt to switch back:

1.
Use the lag replicat command to check the Replicat's lag on the standby; if it returns "At EOF (end of file)", the Replicat has applied all of the data in the trail to the standby database.
GGSCI (rh3.oracle.com) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     STOPPED     EXTSTD2     00:00:00      23:42:47    
EXTRACT     STOPPED     PUMPSTD2    00:00:00      23:41:29    
REPLICAT    RUNNING     REPSTD1     00:00:00      00:00:00  

GGSCI (rh3.oracle.com) 5> lag replicat repstd1

Sending GETLAG request to REPLICAT REPSTD1 ...
Last record lag: 5 seconds.
At EOF, no more records to process.

2.
Stop the Replicat on the standby.
GGSCI (rh3.oracle.com) 6> stop replicat repstd1

Sending STOP request to REPLICAT REPSTD1 ...
Request processed.

3.
On the standby, run the necessary scripts to grant DML privileges to the application and to enable triggers and cascade-delete constraints.
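
A minimal sketch of the trigger-enabling part of such a script, assuming the application schema is CLINIC and that the script is run by a user with the ALTER ANY TRIGGER privilege (both assumptions; the script run on the old primary is the symmetric DISABLE case):

BEGIN
  -- enable every trigger owned by the application schema
  FOR t IN (SELECT owner, trigger_name FROM dba_triggers WHERE owner = 'CLINIC') LOOP
    EXECUTE IMMEDIATE 'ALTER TRIGGER "' || t.owner || '"."' || t.trigger_name || '" ENABLE';
  END LOOP;
END;
/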

4.
Start the Extract on the standby.
Before doing so, make sure the data pump group on the standby has not been started, so that the trail files accumulate on the standby.
GGSCI (rh3.oracle.com) 15> info all
Program     Status      Group       Lag           Time Since Chkpt
MANAGER     RUNNING                                           
EXTRACT     STOPPED     EXTSTD2     00:00:00      24:04:16    
EXTRACT     STOPPED     PUMPSTD2    00:00:00      24:02:57    
REPLICAT    STOPPED     REPSTD1     00:00:00      00:00:06    

GGSCI (rh3.oracle.com) 16> start extstd2
Sending START request to MANAGER ...
EXTRACT EXTSTD2 starting

5.
The application can now be switched over to the standby.

==============================================================================
The steps above complete the failover to the standby. Next we try to move the application back to the primary.
1. If the original primary host was destroyed, reinstall the Oracle software and rebuild the GoldenGate software directory on the primary system.
2. Start GGSCI on the primary.
3. Delete the existing Extract and EXTTRAIL on the primary, then re-create them:

GGSCI (rh2.oracle.com) 6> delete extract extstd1
Deleted EXTRACT EXTSTD1.

GGSCI (rh2.oracle.com) 7> delete exttrail /d01/ext/cl

GGSCI (rh2.oracle.com) 14> add extract extstd1,tranlog,begin now
EXTRACT added.

GGSCI (rh2.oracle.com) 15> add exttrail /d01/ext/cl,megabytes 100,extract extstd1
EXTTRAIL added.

4.
Start the Manager on the primary:
GGSCI (rh2.oracle.com) 18> start Manager
Manager started.

5.
Next, on the primary, run the scripts that disable the triggers and cascade-delete constraints.

6.
Take a hot backup (logical or physical) of the standby and record the time at which the backup completes.

7.
After using the standby's hot backup to complete the initial load on the primary, start the Replicat on the primary (the one that applies data coming from the standby) with the HANDLECOLLISIONS option:
GGSCI (rh2.oracle.com) 22> view params repstd2

-- Identify the Replicat group:
REPLICAT repstd2
-- State that source and target definitions are identical:
ASSUMETARGETDEFS
-- Specify database login information as needed for the database:
userid maclean, password maclean
HANDLECOLLISIONS
-- Specify tables for delivery:
MAP clinic.*, TARGET clinic.*;
-- Exclude specific tables from delivery if needed:
-- MAPEXCLUDE 


GGSCI (rh2.oracle.com) 23> start replicat repstd2

Sending START request to MANAGER ...
REPLICAT REPSTD2 starting

8. Also start the data pump group on the standby to ship the accumulated trail files to the primary:
GGSCI (rh3.oracle.com) 19> start pumpstd2
Sending START request to MANAGER ...
EXTRACT PUMPSTD2 starting

9. Use info replicat to watch the Replicat on the primary and check whether its progress has moved past the time at which the initial-load export completed.
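
For example (a sketch; the group name comes from the configuration above):

GGSCI> info replicat repstd2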


10.
Disable the HANDLECOLLISIONS option currently in use on the primary:
GGSCI (rh2.oracle.com) 26> send replicat repstd2,NOHANDLECOLLISIONS

11.
Shut down all the applications that were previously switched over to the standby.

12.
If data validation is required, stop the Extract and data pump on the standby and the Replicat on the primary:
GGSCI (rh2.oracle.com) 31> lag replicat repstd2
Sending GETLAG request to REPLICAT REPSTD2 ...
Last record lag: 3 seconds.
At EOF, no more records to process.

GGSCI (rh3.oracle.com) 28> stop extstd2
Sending STOP request to EXTRACT EXTSTD2 ...
Request processed.


GGSCI (rh3.oracle.com) 26> stop pumpstd2
Sending STOP request to EXTRACT PUMPSTD2 ...
Request processed.

GGSCI (rh2.oracle.com) 34> stop replicat repstd2
Sending STOP request to REPLICAT REPSTD2 ...
Request processed.

/* Use a tool such as Oracle GoldenGate Veridata to verify data consistency,
   and repair any discrepancies found
*/

On the standby database:
SQL> select sum(t2) from tv;

   SUM(T2)
----------
5355944997

On the primary database:
SQL> select sum(t2) from tv;

   SUM(T2)
----------
5355944997

13. On the primary system, grant the application the necessary DML privileges and enable the triggers and cascade-delete constraints.

14.
Alter the begin time of the Extract group on the primary to the current time, then start the primary-to-standby Extract, data pump and Replicat:
GGSCI (rh2.oracle.com) 36> alter extstd1 ,begin now
EXTRACT altered.


GGSCI (rh2.oracle.com) 52> start extract extstd1

Sending START request to MANAGER ...
EXTRACT EXTSTD1 starting


GGSCI (rh2.oracle.com) 53> start extract pumpstd1

Sending START request to MANAGER ...
EXTRACT PUMPSTD1 starting


GGSCI (rh3.oracle.com) 3> start repstd1

Sending START request to MANAGER ...
REPLICAT REPSTD1 starting

The system is now back in its original primary->standby state.
That's great!

How to understand GoldenGate report file statistics

- Do total processed records = inserts + updates + deletes + discards + ignores?
Generally, yes: total processed records = inserts + updates + deletes + discards + ignores.

- Are update collisions part of updates?
Yes.

- Are delete collisions part of deletes?
Yes.

- Do total successfully processed records = inserts + updates + deletes?
Yes.

- Can HANDLECOLLISIONS handle the ORA-00001 error?
No, HANDLECOLLISIONS cannot handle ORA-00001, which is a unique constraint violation error.
HANDLECOLLISIONS controls whether or not Replicat tries to resolve duplicate-record
and missing-record errors when applying SQL on the target.
These errors can occur during an initial load, when data from source tables is
being loaded to target tables while Oracle GoldenGate is replicating
transactional changes that are being made to those tables.
You can read more about HANDLECOLLISIONS and NOHANDLECOLLISIONS in the Oracle GoldenGate Windows and UNIX Reference Guide.
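
As a sketch of where the parameter goes (the table name is a placeholder), HANDLECOLLISIONS can be placed in the Replicat parameter file globally, as in the Live Standby example later in this document, or scoped to a single MAP statement:

HANDLECOLLISIONS
MAP scott.emp, TARGET scott.emp;

-- or, per mapping:
MAP scott.emp, TARGET scott.emp, HANDLECOLLISIONS;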

- How can we find the values of the bind variables when an UPDATE statement raises an error?
If you want to capture the conflicting records, you can write them to the discard file.
The conflicting record can also be written to a trace table.

- How do we capture the conflicting records to the discard file?
A discard file is written when Extract or Replicat encounters an error condition, or when a database error is generated by a DDL operation.
The discard file is generally used for troubleshooting.

The location of the discard file is determined by the DISCARDFILE parameter in the Extract or Replicat parameter file.

DISCARDFILE <file name> [, APPEND | PURGE] [, MEGABYTES <n>]

Where:

<file name> is the discard file name.
APPEND adds new content to existing content if the file already exists.
PURGE purges the file before writing new content.
MEGABYTES <n> sets the maximum size of the file, in megabytes. The default is 1 MB.
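
For example, using a hypothetical path (the APPEND and MEGABYTES settings mirror the rep1 parameter file shown earlier in this document):

DISCARDFILE /ggs/dirrpt/rep1.dsc, APPEND, MEGABYTES 10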

Implementing primary/standby switchover with GoldenGate Live Standby (1)

Oracle GoldenGate supports an active-passive bidirectional configuration: OGG replicates all data changes from the active primary database to the standby, and the standby, while continuously synchronizing data, stays ready to take over for both planned and unplanned outages. This is what we call a Live Standby. Here we focus on the steps to configure an Oracle GoldenGate Live Standby system and on the actual switchover procedure.

SQL> conn clinic/clinic
Connected.
SQL> drop table tv;
Table dropped.

SQL> create table tv (t1 int primary key,t2 int,t3 varchar2(30));
Table created.

SQL> drop sequence seqt1;
Sequence dropped.

SQL> create sequence seqt1 start with 1 increment by 1;
Sequence created.

declare
  rnd number(9,2);
begin
   for i in 1..100000 loop
     insert into tv values(seqt1.nextval,i*dbms_random.value,'MACLEAN IS TESTING');
     commit;
   end loop;
end;
/

/* The script above creates test data under an application account on the primary database.
    Next we can use any of several tools to load the initial data into the standby; if the data
    needs to be migrated online during this process, refer to "Using GoldenGate for online data migration".
*/

/* Note that in a Live Standby environment we usually need to replicate sequences as well, so that the business can keep running after switching to the standby */

/* After initializing the standby data, make sure it is fully consistent with the primary */
primary :
SQL> select sum(t2) from tv;

   SUM(T2)
----------
2498624495

SQL> select last_number from user_sequences;

LAST_NUMBER
-----------
     100001

standby:
SQL> select sum(t2) from tv;

   SUM(T2)
----------
2498624495

SQL> select last_number from user_sequences;

LAST_NUMBER
-----------
     100001

With the preparation above complete, we can move on to configuring GoldenGate Live Standby itself, which involves the following steps:

  1. Configure the Extract, Replicat and data pump from the primary to the standby; this step is not much different from ordinary one-way replication.
  2. Configure the Extract, Replicat and data pump from the standby to the primary.
  3. Start the Extract, Replicat and data pump from the primary to the standby.

Next we walk through the whole configuration:

1.
Create the primary-to-standby Extract, data pump and Replicat:
GGSCI (rh2.oracle.com) 10> dblogin userid maclean
Password: 
Successfully logged into database.

GGSCI (rh2.oracle.com) 11> add trandata clinic.*
Logging of supplemental redo data enabled for table CLINIC.TV


GGSCI (rh2.oracle.com) 4> add extract extstd1,tranlog,begin now
EXTRACT added.

GGSCI (rh2.oracle.com) 5> add exttrail /d01/ext/cl,megabytes 100,extract extstd1
EXTTRAIL added.

GGSCI (rh2.oracle.com) 7> view params extstd1

-- Identify the Extract group:
EXTRACT extstd1
-- Specify database login information as needed for the database:
userid maclean, password maclean
-- Specify the local trail that this Extract writes to:
EXTTRAIL /d01/ext/cl
-- Specify sequences to be captured:
SEQUENCE clinic.seqt1;
-- Specify tables to be captured:
TABLE clinic.*;
-- Exclude specific tables from capture if needed:
-- TABLEEXCLUDE 

GGSCI (rh2.oracle.com) 17> add extract pumpstd1,exttrailsource /d01/ext/cl,begin now
EXTRACT added.

GGSCI (rh2.oracle.com) 98> add rmttrail /d01/rmt/cl,megabytes 100,extract pumpstd1
RMTTRAIL added.

GGSCI (rh2.oracle.com) 129> view params pumpstd1
-- Identify the data pump group:
EXTRACT pumpstd1
-- Specify database login information as needed for the database:
userid maclean, password maclean
RMTHOST rh3.oracle.com, MGRPORT 7809
-- Specify the remote trail on the standby system:
RMTTRAIL /d01/rmt/cl
-- Pass data through without mapping, filtering, conversion:
PASSTHRU
sequence clinic.seqt1;
Table clinic.*;


On the standby, configure the primary-to-standby Replicat:

GGSCI (rh3.oracle.com) 4> add replicat repstd1,exttrail /d01/rmt/cl,begin now
REPLICAT added.

GGSCI (rh3.oracle.com) 49> view params repstd1
-- Identify the Replicat group:
REPLICAT repstd1
-- State that source and target definitions are identical:
ASSUMETARGETDEFS
-- Specify database login information as needed for the database:
userid maclean, password maclean
-- Specify tables for delivery:
MAP clinic.*, TARGET clinic.*;
-- Exclude specific tables from delivery if needed:
-- MAPEXCLUDE 

2.
Create the standby-to-primary Extract, data pump and Replicat:

GGSCI (rh3.oracle.com) 51> dblogin userid maclean
Password: 
Successfully logged into database.

GGSCI (rh3.oracle.com) 52> add trandata clinic.*
Logging of supplemental redo data enabled for table CLINIC.TV.

/* Do not forget to enable supplemental logging for the relevant tables on the standby side as well */

GGSCI (rh3.oracle.com) 53> add extract extstd2,tranlog,begin now
EXTRACT added.

GGSCI (rh3.oracle.com) 54> add exttrail /d01/ext/cl,megabytes 100,extract extstd2
EXTTRAIL added.

GGSCI (rh3.oracle.com) 58> view params extstd2
-- Identify the Extract group:
EXTRACT extstd2
-- Specify database login information as needed for the database:
userid maclean, password maclean
-- Specify the local trail that this Extract writes to:
EXTTRAIL /d01/ext/cl
-- Specify sequences to be captured:
SEQUENCE clinic.seqt1;
-- Specify tables to be captured:
TABLE clinic.*;
-- Exclude specific tables from capture if needed:
-- TABLEEXCLUDE 

GGSCI (rh3.oracle.com) 59> add extract pumpstd2,exttrailsource /d01/ext/cl,begin now
EXTRACT added.

GGSCI (rh3.oracle.com) 60> add rmttrail /d01/rmt/cl,megabytes 100,extract pumpstd2
RMTTRAIL added.

GGSCI (rh3.oracle.com) 63> view params pumpstd2

-- Identify the data pump group:
EXTRACT pumpstd2
-- Specify database login information as needed for the database:
userid maclean, password maclean
RMTHOST rh2.oracle.com, MGRPORT 7809
-- Specify the remote trail on the standby system:
RMTTRAIL /d01/rmt/cl
-- Pass data through without mapping, filtering, conversion:
PASSTHRU
sequence clinic.seqt1;
Table clinic.*;

On the primary, configure the Replicat:


GGSCI (rh2.oracle.com) 136> add replicat repstd2,exttrail /d01/rmt/cl,begin now,checkpointtable maclean.ck
REPLICAT added.

GGSCI (rh2.oracle.com) 138> view params repstd2

-- Identify the Replicat group:
REPLICAT repstd2
-- State that source and target definitions are identical:
ASSUMETARGETDEFS
-- Specify database login information as needed for the database:
userid maclean, password maclean
-- Specify tables for delivery:
MAP clinic.*, TARGET clinic.*;
-- Exclude specific tables from delivery if needed:
-- MAPEXCLUDE 

3.
With the OGG configuration above complete, start the primary-to-standby Extract, data pump and Replicat:
GGSCI (rh2.oracle.com) 141> start extstd1
Sending START request to MANAGER ...
EXTRACT EXTSTD1 starting


GGSCI (rh2.oracle.com) 142> start pumpstd1
Sending START request to MANAGER ...
EXTRACT PUMPSTD1 starting



GGSCI (rh3.oracle.com) 70> start repstd1
Sending START request to MANAGER ...
REPLICAT REPSTD1 starting

/* If you did this configuration while the application was offline, the application can be enabled at this point */

Next we run a planned primary/standby switchover drill:

1.
First stop all applications on the primary, just as for a Data Guard switchover; the switchover can only be clean when there are no active transactions.
2.
On the primary, use the LAG command to check the Extract's delay; a response such as "At EOF, no more records to process" means all transactions have been extracted.
GGSCI (rh2.oracle.com) 144> lag extstd1
Sending GETLAG request to EXTRACT EXTSTD1 ...
Last record lag: 0 seconds.
At EOF, no more records to process.

Once at EOF, stop the Extract:
GGSCI (rh2.oracle.com) 146> stop extstd1 
Sending STOP request to EXTRACT EXTSTD1 ...
Request processed.

3.
Similarly, run the LAG command against the data pump; a response such as "At EOF, no more records to process" means all extracted data has been shipped to the standby.
GGSCI (rh2.oracle.com) 147> lag pumpstd1
Sending GETLAG request to EXTRACT PUMPSTD1 ...
Last record lag: 3 seconds.
At EOF, no more records to process.

Once at EOF, stop the data pump:
GGSCI (rh2.oracle.com) 148> stop pumpstd1
Sending STOP request to EXTRACT PUMPSTD1 ...
Request processed.

4.
Check the Replicat's progress on the standby; a response of "At EOF, no more records to process." means all records have been replicated.
GGSCI (rh3.oracle.com) 71> lag repstd1
Sending GETLAG request to REPLICAT REPSTD1 ...
Last record lag: 5 seconds.
At EOF, no more records to process.

Once at EOF, stop the Replicat:

GGSCI (rh3.oracle.com) 72> stop repstd1
Sending STOP request to REPLICAT REPSTD1 ...
Request processed.

5.
Next, on the standby, grant the application user the necessary insert, update and delete privileges, and enable the various triggers and cascade-delete constraints;
the corresponding operations on the primary are to revoke the application's privileges and disable its triggers and cascade-delete constraints.
The reason for doing this is to guarantee that, at any time, the database currently playing the standby role accepts no manual or application-driven data changes other than those applied by OGG,
so that the data on the primary and standby stays consistent.
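
A minimal sketch of those privilege changes (the application user app_user and the single table clinic.tv are assumptions; a real script would cover every application object):

-- on the new primary (the former standby): open it up to the application
GRANT INSERT, UPDATE, DELETE ON clinic.tv TO app_user;

-- on the new standby (the former primary): lock the application out
REVOKE INSERT, UPDATE, DELETE ON clinic.tv FROM app_user;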

6.
Alter the start time of the Extract on the former standby to the current time, so that it does not try to mine the earlier redo logs:

GGSCI (rh3.oracle.com) 75> alter extstd2 ,begin now
EXTRACT altered.

GGSCI (rh3.oracle.com) 76> start extstd2

Sending START request to MANAGER ...
EXTRACT EXTSTD2 starting


If the standby-to-primary data pump and Replicat were not started earlier, they can be started now:

GGSCI (rh3.oracle.com) 78> start pumpstd2

Sending START request to MANAGER ...
EXTRACT PUMPSTD2 starting

GGSCI (rh2.oracle.com) 161> start repstd2

Sending START request to MANAGER ...
REPLICAT REPSTD2 starting

7. At this point we can officially start the application on the former standby, which is now the primary.


Next we attempt to switch back to the original primary:
1. The preliminary steps are similar to the previous switchover: first stop all applications on the former standby (the current primary),
then use the LAG command to confirm the progress of the Extract and Replicat, and stop them once confirmed.
Complete the maintenance work on the original primary: granting privileges, enabling triggers, and so on.

2. Alter the start time of the Extract on the original primary to the current time, so that it does not process the earlier redo logs:
GGSCI (rh2.oracle.com) 165> alter extract extstd1,begin now
EXTRACT altered.

3. At this point we can start the application on the original primary, which is now the primary again.

4. Start the primary-to-standby Extract, data pump and Replicat that we configured first:

GGSCI (rh2.oracle.com) 166> start extstd1

Sending START request to MANAGER ...
EXTRACT EXTSTD1 starting

GGSCI (rh2.oracle.com) 171> start pumpstd1

Sending START request to MANAGER ...
EXTRACT PUMPSTD1 starting

GGSCI (rh3.oracle.com) 86> start repstd1

Sending START request to MANAGER ...
REPLICAT REPSTD1 starting


The above completes a planned Switchover between the primary and standby in an OGG Live Standby configuration. That's Great!

Does Oracle GoldenGate support parallel DML?

GoldenGate works by mining the redo logs to capture data changes in the database. If we insert data with parallel DML, the resulting direct-path inserts appear to generate only a small amount of redo. Can OGG's log mining still correctly capture the data changes made by such parallel DML? Let's test it:

SQL> select le.leseq "Current log sequence No",
  2         100 * cp.cpodr_bno / le.lesiz "Percent Full",
  3         (cpodr_bno - 1) * 512 "Current Offset",
  4         le.lesiz * 512 - cpodr_bno * 512 "Left space"
  5    from x$kcccp cp, x$kccle le
  6   where LE.leseq = CP.cpodr_seq
  7     and bitand(le.leflg, 24) = 8;

Current log sequence No Percent Full Current Offset Left space
----------------------- ------------ -------------- ----------
                    177   78.5112305       82324480   22532608

/* The query above shows the actual redo write position:
    Current Offset is the position the current log file has been written up to,
    and Left space is the space remaining in the current log file.
82324480 (Current Offset) + 22532608 (Left space) + 512 (redo header) = logfile size = le.lesiz * redo block size
*/

SQL> alter system switch logfile;
System altered.

SQL> select le.leseq "Current log sequence No",
  2         100 * cp.cpodr_bno / le.lesiz "Percent Full",
  3         (cpodr_bno - 1) * 512 "Current Offset",
  4         le.lesiz * 512 - cpodr_bno * 512 "Left space"
  5    from x$kcccp cp, x$kccle le
  6   where LE.leseq = CP.cpodr_seq
  7     and bitand(le.leflg, 24) = 8;

Current log sequence No Percent Full Current Offset Left space
----------------------- ------------ -------------- ----------
                    179   .002441406           2048  104855040

/* The starting point is log file No 179 */

SQL> select le.leseq "Current log sequence No",
  2         100 * cp.cpodr_bno / le.lesiz "Percent Full",
  3         (cpodr_bno - 1) * 512 "Current Offset",
  4         le.lesiz * 512 - cpodr_bno * 512 "Left space"
  5    from x$kcccp cp, x$kccle le
  6   where LE.leseq = CP.cpodr_seq
  7     and bitand(le.leflg, 24) = 8;

Current log sequence No Percent Full Current Offset Left space
----------------------- ------------ -------------- ----------
                    180    58.277832       61108224   43748864

/* An ordinary, non-parallel DML insert generated 104855040 + 61108224, roughly 158 MB, of redo */


SQL> truncate table tv;
Table truncated.

SQL> select le.leseq "Current log sequence No",
  2         100 * cp.cpodr_bno / le.lesiz "Percent Full",
  3         (cpodr_bno - 1) * 512 "Current Offset",
  4         le.lesiz * 512 - cpodr_bno * 512 "Left space"
  5    from x$kcccp cp, x$kccle le
  6   where LE.leseq = CP.cpodr_seq
  7     and bitand(le.leflg, 24) = 8;

Current log sequence No Percent Full Current Offset Left space
----------------------- ------------ -------------- ----------
                    180   60.6469727       63592448   41264640

/* The starting point is offset 63592448 in log file No 180 */

SQL> alter session enable parallel dml;
Session altered.

/* Enable parallel DML at the session level */

SQL> set autotrace on;
SQL> insert /*+ parallel(tv,4) */ into tv select * from sample;

3640772 rows created.
Execution Plan
----------------------------------------------------------
ERROR:
ORA-12838: cannot read/modify an object after modifying it in parallel
SP2-0612: Error generating AUTOTRACE EXPLAIN report
Statistics
----------------------------------------------------------
        111  recursive calls
       1168  db block gets
      17850  consistent gets
      17745  physical reads
      97944  redo size
        815  bytes sent via SQL*Net to client
        750  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
    3640772  rows processed

/* The autotrace statistics show the parallel insert generating only 97944 bytes of redo */

SQL> commit;
Commit complete.

SQL> select le.leseq "Current log sequence No",
  2         100 * cp.cpodr_bno / le.lesiz "Percent Full",
  3         (cpodr_bno - 1) * 512 "Current Offset",
  4         le.lesiz * 512 - cpodr_bno * 512 "Left space"
  5    from x$kcccp cp, x$kccle le
  6   where LE.leseq = CP.cpodr_seq
  7     and bitand(le.leflg, 24) = 8;

Current log sequence No Percent Full Current Offset Left space
----------------------- ------------ -------------- ----------
                    182   10.4882813       10997248   93859840

/* In reality, however, the log sequence advanced from 180 to 182, so the actual redo generated is roughly 41264640 + 104857600 + 10997248, about 150 MB */

/* In other words, the redo that autotrace reports for a parallel DML statement is far less than what was actually generated
    (the session-level statistics count only the coordinator session's redo, not that of the parallel execution slaves);
    the full redo is still written to the logs, which is why Extract is able to capture all the data changes
    caused by the parallel DML.
*/

GGSCI (rh2.oracle.com) 59> stats load2
Sending STATS request to EXTRACT LOAD2 ...

Start of Statistics at 2010-12-16 20:17:35.

Output to /s01/new/ze:

Extracting from CLINIC.TV to CLINIC.TV:

*** Total statistics since 2010-12-16 20:17:24 ***
        Total inserts                           923555.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        923555.00

*** Daily statistics since 2010-12-16 20:17:24 ***
        Total inserts                           923555.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        923555.00

*** Hourly statistics since 2010-12-16 20:17:24 ***
        Total inserts                           923555.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        923555.00

*** Latest statistics since 2010-12-16 20:17:24 ***
        Total inserts                           923555.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        923555.00

End of Statistics.


GGSCI (rh2.oracle.com) 60> !
stats load2

Sending STATS request to EXTRACT LOAD2 ...

Start of Statistics at 2010-12-16 20:17:37.

Output to /s01/new/ze:

Extracting from CLINIC.TV to CLINIC.TV:

*** Total statistics since 2010-12-16 20:17:24 ***
        Total inserts                           1090336.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        1090336.00

*** Daily statistics since 2010-12-16 20:17:24 ***
        Total inserts                           1090336.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        1090336.00

*** Hourly statistics since 2010-12-16 20:17:24 ***
        Total inserts                           1090336.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        1090336.00

*** Latest statistics since 2010-12-16 20:17:24 ***
        Total inserts                           1090336.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        1090336.00

End of Statistics.


GGSCI (rh2.oracle.com) 61> !
stats load2

Sending STATS request to EXTRACT LOAD2 ...

Start of Statistics at 2010-12-16 20:17:39.

Output to /s01/new/ze:

Extracting from CLINIC.TV to CLINIC.TV:

*** Total statistics since 2010-12-16 20:17:24 ***
        Total inserts                           1249284.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        1249284.00

*** Daily statistics since 2010-12-16 20:17:24 ***
        Total inserts                           1249284.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        1249284.00

*** Hourly statistics since 2010-12-16 20:17:24 ***
        Total inserts                           1249284.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        1249284.00

*** Latest statistics since 2010-12-16 20:17:24 ***
        Total inserts                           1249284.00
        Total updates                                0.00
        Total deletes                                0.00
        Total discards                               0.00
        Total operations                        1249284.00

End of Statistics.

/* As you can see, Total inserts in the Extract's statistics keeps climbing, which shows that Extract correctly captured
    all of the direct-path insert operations triggered by the parallel INSERT.
*/

Does Oracle GoldenGate support parallel DML?
The conclusion is clearly: yes.

GoldenGate can't extract data from compressed tables

OGG currently does not support Extract for compressed tables (available in 10g and later); if Extract mines DML records belonging to a compressed table, it abends:

SQL> alter table sales compress 2;
Table altered.

SQL>  update sales set  AMOUNT_SOLD= AMOUNT_SOLD +1 where rownum<200;
199 rows updated.

SQL> commit;
Commit complete.

extract report:
***********************************************************************
**                     Run Time Messages                             **
***********************************************************************
2010-12-12 23:08:44  INFO    OGG-01517  Position of first record processed Sequence 51, 
RBA 12853264, SCN 0.53443895, 2010-12-12 11:08:42 PM.
TABLE resolved (entry CLINIC.SALES):
  Table CLINIC.SALES;

Source Context :
  SourceModule            : [ggdb.ora.dbx]
  SourceID                : [/home/ecloud/workspace/Build_FBO_OpenSys_r11.1.1.0.0_078_[34093]/perforce/src/gglib/ggdbora/dbxoci.c]
  SourceFunction          : [get_object_attributes]
  SourceLine              : [5122]
  ThreadBacktrace         : [20] elements
                          : [/home/maclean/gg/extract(CMessageContext::AddThreadContext()+0x26) [0x7f6596]]
                          : [/home/maclean/gg/extract(CMessageFactory::CreateMessage(
CSourceContext*, unsigned int, ...)+0x7b2) [0x7ed032]]
                          : [/home/maclean/gg/extract(_MSG_ERR_ORACLE_COMPRESSED_TABLE
(CSourceContext*, DBString<777> const&, CMessageFactory::MessageDisposition)+0x92) [0x79c842]]
                          : [/home/maclean/gg/extract(gl_get_odbc_table_def(file_def*, short, char*)+0x6a4) [0x6a7ee4]]
                          : [/home/maclean/gg/extract [0x54e03e]]
                          : [/home/maclean/gg/extract [0x54f078]]
                          : [/home/maclean/gg/extract(get_map_entry(char*, int, __wc*, int)+0x841) [0x550cf1]]
                          : [/home/maclean/gg/extract [0x62ec62]]
                          : [/home/maclean/gg/extract(WILDCARD_check_table(char const*, char const*,
 int, unsigned int*, int, unsigned int, DBString<777>*, int)+0x320) [0x630560]]
                          : [/home/maclean/gg/extract(REDOORA_lookup_table(file_def**, log_context_t*, int)+0x1398) [0x6ce678]]
                          : [/home/maclean/gg/extract [0x70b81c]]
                          : [/home/maclean/gg/extract(REDOORAOUT_get_committed_record(objectpool*, 
cache_object*, log_context_t*, chkpt_context_t*, short, int, int, int*, char*, int*, int*, int*, char*, 
short*, int*, char*)+0x449) [0x713619]]
                          : [/home/maclean/gg/extract [0x6fc77e]]
                          : [/home/maclean/gg/extract(REDOORAOPS_process_ops
(subrec_info_t*, objectpool*, objectpool*, cache_object**, log_context_t*, redo_thread_t*, 
chkpt_context_t*, long, int, int, int, char*, int*, int*, char*, short*, int*, char*)+0x5fa) [0x700bca]]
                          : [/home/maclean/gg/extract(REDO_read(char*, int*, long*, char*, short*, int, char*)+0xb6a) [0x6ca6fa]]
                          : [/home/maclean/gg/extract(READ_EXTRACT_RECORD(chkpt_context_t*, 
short, char*, __std_rec_hdr*, int*, int*, long*, short, short, char*, short*, char*, short*, 
long*, char*)+0x344) [0x4f0824]]
                          : [/home/maclean/gg/extract [0x56ab17]]
                          : [/home/maclean/gg/extract(main+0xfaa) [0x56fc3a]]
                          : [/lib64/libc.so.6(__libc_start_main+0xf4) [0x39f1c1d994]]
                          : [/home/maclean/gg/extract(__gxx_personality_v0+0x1f2) [0x4e5ada]]

2010-12-12 23:10:11  ERROR   
OGG-01433  Failed to validate table CLINIC.SALES. 
The table is compressed and extract will not be able to extract data from Oracle logs.
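
As a practical aside (a generic dictionary query, not taken from the report above; for partitioned tables the compression attribute lives in DBA_TAB_PARTITIONS instead), the compressed tables in a source schema can be identified before configuring Extract with something like:

SQL> select owner, table_name, compression
  2    from dba_tables
  3   where owner = 'CLINIC' and compression = 'ENABLED';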
