An 11.2.0.2 logical standby database on Linux x86-64 hit ORA-26786: "A row with key exists but has conflicting column(s)", which left the APPLIER processes unable to continue. The details are as follows:
1) ### Primary OS Details ###
Hostname - vrh6
OS details - [oracle@vrh6 ~]$ uname -a
Linux vrh6 2.6.18-194.8.1.el5 #1 SMP Wed Jun 23 10:52:51 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

2) ### Standby OS Details ###
Hostname - vrh7
[oracle@vrh7 sqlscripts]$ uname -a
Linux vrh7 2.6.18-194.8.1.el5 #1 SMP Wed Jun 23 10:52:51 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
DB Version - 11.2.0.2

3) ### Primary DB Details ###
11.2.0.2

4) ### Standby DB Details ###
11.2.0.2

5) ### Standby Configuration Type ###
Logical Standby(s) Only.

6) ### RAC Usage ###
Primary only uses RAC.

7) ### Documentation Reference ###
We configured the logical standby successfully, but the log apply process on the standby is very slow and is lagging by many archive log files.

8) ### Data Guard Management Method ###
Data Guard GUI (Console/Grid Control).

9) ### Issue or Question Details ###
The logical standby was applying logs very slowly, so we raised the LCR cache size to 3058M. After this the performance improved, but we still see a lag of about 3-4 hours.
The symptom was that log apply on the logical standby was extremely slow, and enlarging the LCR cache did not help.
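For reference, the LCR cache on a logical standby is sized with the MAX_SGA apply parameter of DBMS_LOGSTDBY.APPLY_SET (value in megabytes). Below is a minimal sketch of how the 3058M value mentioned in the SR could be set; stopping and restarting SQL Apply around the call is the conservative pattern, since not every apply parameter can be changed while apply is running:

SQL> alter database stop logical standby apply;
SQL> exec dbms_logstdby.apply_set('MAX_SGA', 3058);   -- LCR cache size in MB, figure taken from the SR above
SQL> alter database start logical standby apply immediate;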
After an SR was opened for this issue, MOS asked for diagnostic information on both the primary and the logical standby to be collected with the following scripts:
Script to Collect Data Guard Logical Standby Diagnostic Information (Doc ID 241512.1)
Script to Collect Data Guard Primary Site Diagnostic Information (Doc ID 241374.1)

I reviewed the trace file and could not see any errors or stuck SQL Apply. Please refer to the notes:
Oracle10g Data Guard SQL Apply Troubleshooting (Doc ID 312434.1)
MAA - SQL Apply Best Practices 10gR2 (Doc ID 387450.1)
to verify the SQL Apply progress and review the tuning recommendations in note 387450.1.
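While waiting for the diagnostic output, the apply and mining progress can also be eyeballed directly on the standby. A minimal sketch using the standard V$LOGSTDBY_PROGRESS and V$LOGSTDBY_STATS views (the statistic names vary slightly by release, so the LIKE filters are only indicative):

SQL> -- how far apply has progressed versus what has been mined
SQL> select applied_scn, applied_time, mining_scn, latest_scn, latest_time
  2    from v$logstdby_progress;

SQL> -- coordinator and transaction counters, to see whether apply is moving at all
SQL> select name, value
  2    from v$logstdby_stats
  3   where name like '%transactions%' or name like 'coordinator%';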
Analysis of the script output showed that log apply on the logical standby had in fact stopped:
The logical standby database is now almost stopped. In the last two days it has not applied any changes; the log files just say "loading previously applied transaction information". In the alert_log file it keeps mining the same set of log files over and over, and it has been mining the same files for about 6 hours now. The logical standby is lagging production by 4 days.

This morning I enabled a few parameters as per the 10G best-practices document. Here are the parameters:

Exec DBMS_LOGSTDBY.APPLY_SET('MAX_SERVERS',24);
ALTER SYSTEM SET PARALLEL_MAX_SERVERS=24 SCOPE=BOTH;
Exec DBMS_LOGSTDBY.APPLY_SET('LOG_AUTO_DELETE', 'TRUE');

We are getting the error below on the logical standby. I am uploading the alert log file as well.

Tue Aug 30 02:22:04 2011
LOGSTDBY: SQL Apply about to stop with ORA-26786
Tue Aug 30 02:22:04 2011
LOGSTDBY: SQL Apply about to stop with ORA-26786
Tue Aug 30 02:22:04 2011
LOGSTDBY: SQL Apply about to stop with ORA-26786
Tue Aug 30 02:22:04 2011
LOGSTDBY: SQL Apply about to stop with ORA-26786
Tue Aug 30 02:22:04 2011
LOGSTDBY: SQL Apply about to stop with ORA-26786
LOGSTDBY: SQL Apply finished logging error information
LOGSTDBY Apply process AS02 server id=2 pid=37 OS id=23940 stopped
Errors in file /d02/app/prrp081/diag/rdbms/prrp081/prrp081/trace/prrp081_as02_23940.trc:
ORA-26786: A row with key ("COGS_OM_LINE_ID", "PAC_COST_TYPE_ID") = (84243, ) exists but has conflicting column(s) "LAST_UPDATE_LOGIN", "ORIGINAL_SHIPPED_QTY", "REQUEST_ID" in table BOM.CST_REVENUE_COGS_MATCH_LINES
LOGSTDBY: SQL Apply finished logging error information
LOGSTDBY Apply process AS05 server id=5 pid=40 OS id=23946 stopped
Tue Aug 30 02:22:05 2011
Errors in file /d02/app/prrp081/diag/rdbms/prrp081/prrp081/trace/prrp081_lsp0_30249.trc:
ORA-26808: Apply process AS02 died unexpectedly.
ORA-26786: A row with key ("COGS_OM_LINE_ID", "PAC_COST_TYPE_ID") = (84243, ) exists but has conflicting column(s) "LAST_UPDATE_LOGIN", "ORIGINAL_SHIPPED_QTY", "REQUEST_ID" in table BOM.CST_REVENUE_COGS_MATCH_LINES
Errors in file /d02/app/prrp081/diag/rdbms/prrp081/prrp081/trace/prrp081_as05_23946.trc:
ORA-26786: A row with key ("COGS_OM_LINE_ID", "PAC_COST_TYPE_ID") = (84243, ) exists but has conflicting column(s) "LAST_UPDATE_LOGIN", "ORIGINAL_SHIPPED_QTY", "REQUEST_ID" in table BOM.CST_REVENUE_COGS_MATCH_LINES

We restarted the apply process yesterday and it was still showing "ORA-16106: loading previously applied transaction information". When I look at the alert.log file, it has been mining the same archive log files since yesterday (more than 30 hours by now). My archive log files are piling up and the gap is widening. The 10G best practices were implemented yesterday.
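As an aside, whether those APPLY_SET calls actually took effect can be verified on the standby from the DBA_LOGSTDBY_PARAMETERS view; a minimal sketch, with the parameter names taken from the SR text above:

SQL> select name, value
  2    from dba_logstdby_parameters
  3   where name in ('MAX_SERVERS', 'LOG_AUTO_DELETE', 'MAX_SGA');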
sys@vrh7> ;
  1* SELECT sid,SERIAL#,TYPE, LOGSTDBY_ID,STATUS_CODE, STATUS FROM V$LOGSTDBY_PROCESS
sys@vrh7> /

  SID    SERIAL# TYPE                           LOGSTDBY_ID STATUS_CODE STATUS
----- ---------- ------------------------------ ----------- ----------- ------------------------------------------------------------
  754          9 COORDINATOR                             -1       16116 ORA-16116: no work available
 1010         63 ANALYZER                                 0       16116 ORA-16116: no work available
 1257        165 APPLIER                                  1       16116 ORA-16116: no work available
 1509         51 APPLIER                                  2       16116 ORA-16116: no work available
 1760         59 APPLIER                                  3       16116 ORA-16116: no work available
   10         51 APPLIER                                  4       16116 ORA-16116: no work available
  254         59 APPLIER                                  5       16116 ORA-16116: no work available
  508        267 APPLIER                                  6       16116 ORA-16116: no work available
  757         81 APPLIER                                  7       16116 ORA-16116: no work available
 1009         53 APPLIER                                  8       16116 ORA-16116: no work available
 1256         73 APPLIER                                  9       16116 ORA-16116: no work available
 1510         55 APPLIER                                 10       16116 ORA-16116: no work available
 1762         69 APPLIER                                 11       16116 ORA-16116: no work available
   11         51 APPLIER                                 12       16116 ORA-16116: no work available
  256         59 APPLIER                                 13       16116 ORA-16116: no work available
  504         85 APPLIER                                 14       16116 ORA-16116: no work available
  760        105 APPLIER                                 15       16116 ORA-16116: no work available
 1006         63 APPLIER                                 16       16116 ORA-16116: no work available
 1255        161 APPLIER                                 17       16116 ORA-16116: no work available
 1511         51 APPLIER                                 18       16116 ORA-16116: no work available
 1761         59 APPLIER                                 19       16116 ORA-16116: no work available
  252         61 READER                                1024       16242 ORA-16242: Processing log file (thread# 1, sequence# 11585)
  510         67 BUILDER                               1025       16119 ORA-16119: building transaction at SCN 0x0abb.d76aec86
 1004         55 PREPARER                              1026       16116 ORA-16116: no work available
    5         61 PREPARER                              1027       16116 ORA-16116: no work available

25 rows selected.

The logical standby is a week behind the primary database, and log mining keeps happening on the same files when we restart the standby apply. At this speed it will never catch up and clear the backlog of archive files. I see some data discrepancy errors in the alert log file on table BOM.CST_REVENUE_COGS_MATCH_LINES. The apply is not updating many of the records in this table, complaining "ORA-26786: A row with key ("COGS_OM_LINE_ID", "PAC_COST_TYPE_ID") = (84243, ) exists but has conflicting column(s) "ORIGINAL_SHIPPED_QTY" in table BOM.CST_REVENUE_COGS_MATCH_LINES". I ran the logical standby diag script and am uploading the output. Please do the needful ASAP as this is directly causing revenue/business loss.
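Before acting on the MOS recommendation that follows, the exact transaction behind the ORA-26786 can be pinned down on the standby via DBA_LOGSTDBY_EVENTS, in the same way the note below does for ORA-1403. A minimal sketch, assuming STATUS_CODE carries the ORA error number as it does in that note's example:

SQL> select event_time, xidusn, xidslt, xidsqn, status, event
  2    from dba_logstdby_events
  3   where status_code = 26786
  4   order by event_time;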
The final recommendation from MOS:
It looks like you have identified the issue. You may want to skip the stuck table or transactions and re-instantiate the table once both databases are synced. Please refer to the notes below:

Oracle10g Data Guard SQL Apply Troubleshooting (Doc ID 312434.1)
Handling ORA-1403 ora-12801 on logical standby apply (Doc ID 1178284.1)

Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.8 to 11.2.0.1.0 - Release: 9.2 to 11.2
Information in this document applies to any platform.

Purpose
Logical standby apply may report ora-1403: No data found during apply. This note explains how to check which transaction is failing and ways to fix that problematic transaction.

Sample Alert log contents
==================
LOGMINER: Begin mining logfile: D:\ORADATA\LOGI\SREDO05.LOG
LOGSTDBY Apply process P012 started with pid=50 OS id=5300
LOGMINER: session# = 41, builder process P003 started with pid=41 OS id=4740
Tue Aug 17 09:53:08 2010
LOGSTDBY status: ORA-01403: no data found
LOGSTDBY Apply process P006 pid=44 OS id=2892 stopped
Tue Aug 17 09:53:08 2010
Errors in file c:\oracle\product.2.0\admin\logi\bdump\logi_lsp0_2144.trc:
ORA-12801: error signaled in parallel query server P006
ORA-01403: no data found
LOGSTDBY Apply process P006 pid=44 OS id=2892 stopped
Tue Aug 17 09:53:08 2010
Errors in file c:\oracle\product.2.0\admin\logi\bdump\logi_lsp0_2144.trc:
ORA-12801: error signaled in parallel query server P006
ORA-01403: no data found

Last Review Date
August 18, 2010

Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details
ORA-1403 is mostly caused by one of the reasons below:
1) A skip rule defined earlier on DML operations in the logical standby database, causing a data mismatch between the PRIMARY and STANDBY databases.
2) A user modified the data directly in the logical standby database after setting the database guard status to OFF.

Solution 1
**************
1) Identify the failing SQL statement using DBA_LOGSTDBY_EVENTS on the logical standby database:

SQL> select XIDUSN, XIDSLT, XIDSQN, status, event from dba_logstdby_events order by event_time;

    XIDUSN     XIDSLT     XIDSQN STATUS                         EVENT
---------- ---------- ---------- ------------------------------ -------------------------------
         5         22       1826 ORA-01403: no data found       delete from "SHANKAR"."TEST1" where "ID" = 1 and "ENAME" = 'shankar' and ROWID = 'AAAAAAAABAAAJHaAAA'

You should not compare the ROWID of the PRIMARY and STANDBY databases, as they will be completely different; SQL Apply does not locate the row based on the ROWID shown in the SQL statement above.

2) Skip the failing transaction on the logical standby database. Execute the procedure dbms_logstdby.skip_transaction(XIDUSN, XIDSLT, XIDSQN) for the failing transaction.

SQL> exec dbms_logstdby.skip_transaction (5,22,1826);

3) Restart SQL Apply.

SQL> alter database start logical standby apply immediate;

If using the broker:
DGMGRL> edit database logi set state='online';

Solution 2
*************
If the data mismatch is huge and causes repeated ora-1403 on the same table, it is advisable to instantiate the complete table from PRIMARY to STANDBY.

TableName : TEST1
SchemaName : SHANKAR

1. At the primary, as the sys user, grant the roles below to the user whose table(s) is being reinstantiated:

SQL> grant SELECT_CATALOG_ROLE to SHANKAR;
SQL> grant EXP_FULL_DATABASE to SHANKAR;
SQL> grant IMP_FULL_DATABASE to SHANKAR;

2. As the sys user, create a connected user database link at the logical standby.
SQL> create public database link synctable connect to shankar identified by shankar using 'prim';

Database link created.

prim is the service name in TNSNAMES.ORA that points to the primary database.

3. Verify the database link from the logical standby to ensure that it gets information from the primary:

SQL> select db_unique_name,database_role from v$database@synctable;

DB_UNIQUE_NAME                 DATABASE_ROLE
------------------------------ ----------------
PRIM                           PRIMARY

4. Stop SQL Apply if it is already running. You can skip this step otherwise.

SQL> select count(*) from v$logstdby;   ## will return zero rows if SQL Apply is not running
SQL> alter database stop logical standby apply;

If using the BROKER:
DGMGRL> edit database logi set state='log-apply-off';

5. Remove the skip rules in the logical standby database, if there are any, from the table that is being instantiated:

SQL> select STATEMENT_OPT,NAME from DBA_LOGSTDBY_SKIP where OWNER='SHANKAR';
SQL> exec dbms_logstdby.unskip('SCHEMA_DDL', 'SHANKAR','TEST1');
SQL> exec dbms_logstdby.unskip('DML', 'SHANKAR','TEST1');

6. From the logical standby, start instantiating the table as the sys user:

SQL> exec dbms_logstdby.instantiate_table('SHANKAR','TEST1','SYNCTABLE')

Note: Ensure that the schema name, table name and dblink name are in uppercase.

7. Once the instantiation is done, resume SQL Apply:

SQL> alter database start logical standby apply immediate;

If using the BROKER:
DGMGRL> edit database logi set state='online';

8. At the primary, revoke the roles granted to the application user SHANKAR:

SQL> revoke SELECT_CATALOG_ROLE from SHANKAR;
SQL> revoke EXP_FULL_DATABASE from SHANKAR;
SQL> revoke IMP_FULL_DATABASE from SHANKAR;

Solution 3
*************
If SQL Apply fails even though the data is in sync, you may need to log a service request with Oracle Support, with the events below set on the logical standby and the issue reproduced.

1) On the logical standby, stop SQL Apply:

SQL> alter database stop logical standby apply;

If using the broker:
DGMGRL> edit database logi set state='log-apply-off';

2) Set the events below on the logical standby (you may back up and remove all the files in /bdump so that you can zip and upload all the files generated in /bdump to Oracle Support):

SQL> alter system set max_dump_file_size=unlimited;
SQL> alter system set events '1349 trace name context forever, level 4095';
SQL> alter system set events '16300 trace name context forever, level 15';
SQL> alter system set events '26700 trace name context forever, level 1544';
SQL> alter system set events '10308 trace name context forever, level 8';
SQL> alter system set events '1403 trace name errorstack level 3';

3) Start SQL Apply:

SQL> alter database start logical standby apply immediate;

If using the broker:
DGMGRL> edit database logi set state='online';

4) Monitor the alert log to ensure the issue is reproduced.

Errors in file c:\oracle\product.2.0\admin\logi\bdump\logi_lsp0_4532.trc:
ORA-12801: error signaled in parallel query server P006
ORA-01403: no data found

Log a service request and upload the alert log and the trace files generated in the /bdump folder.

5) Turn off the trace events set earlier:

SQL> alter system set events '1349 trace name context off';
SQL> alter system set events '16300 trace name context off';
SQL> alter system set events '26700 trace name context off';
SQL> alter system set events '10308 trace name context off';
SQL> alter system set events '1403 trace name context off';

6) End
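Mapping the note's Solution 2 onto this particular case, a minimal sketch of how the advice could be applied on the logical standby might look as follows. The XIDUSN/XIDSLT/XIDSQN values are placeholders to be read from DBA_LOGSTDBY_EVENTS, and PRIM_LINK is a hypothetical database link pointing at the primary service; only the owner BOM and the table CST_REVENUE_COGS_MATCH_LINES come from the alert log above:

SQL> -- stop SQL Apply before touching skip rules or instantiating
SQL> alter database stop logical standby apply;

SQL> -- skip the transaction reported with ORA-26786 (substitution values are placeholders)
SQL> exec dbms_logstdby.skip_transaction(&XIDUSN, &XIDSLT, &XIDSQN);

SQL> -- re-instantiate the conflicting table from the primary over the dblink
SQL> exec dbms_logstdby.instantiate_table('BOM','CST_REVENUE_COGS_MATCH_LINES','PRIM_LINK');

SQL> -- resume SQL Apply
SQL> alter database start logical standby apply immediate;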