SCN may jump in a distributed transaction with dblink

2009/06/21 by admin 1 Comment

在分布式事务环境中数据库的SCN可能瞬间暴增，这种行为被称作SCN jump in distributed transaction，注意这种现象是正常的。在发生分布式事务的2个数据库中，SCN较低的DB会将SCN和较高的那个数据库同步，这有时候会造成我们的一些误解认为可能是Hot backup等操作引起了SCN的猛增。

我们来具体看一下这种现象:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for 32-bit Windows: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> select * from global_name;

GLOBAL_NAME
--------------------------------------------------------------------------------
www.askmac.cn

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
    5072553

SQL> select current_scn from v$database@PROD;

CURRENT_SCN
-----------
    7798262

SQL> insert into testlink@PROD values(1);

1 row created.

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
    7798282

由于这种SCN跳跃可能引起ORA-600 [2252]内部错误，相关的bug case.

Filed Under: Oracle Tagged With: dblink, distributed transaction, system change number

Comments

Ask_Maclean_liu_Oracle says

2015/02/14 at 20:57

由于这种SCN跳跃可能引起ORA-600 [2252]内部错误，相关的bug case:Hdr: 11767824 11.1.0.7 RDBMS 11.1.0.7 RECOVERY PRODID-5 PORTID-23 9254170Abstract: HIGH SCN VALUES / ORA-600 [2252] ERRORSPROBLEM:——–Customer has the following distributed environment:NDRPRO: 10.2.0.4.0 – 64bit O/s solaris 5.9OPTIMA: 11.1.0.7.0 O/s solaris 5.10 The following errors occurred on NDRPRO resulting in instance crash by RECO:Mon Feb 7 19:48:08 2011Errors in file /orasys/app/oracle/admin/ndrpro/bdump/ndrpro_j000_3807.trc:ORA-600: internal error code, arguments: [2252], [2832], [1879521022], [],[], [], [], []Mon Feb 7 19:48:23 2011Error 600 trapped in 2PC on transaction 356.59.431981. Cleaning up.Error stack returned to user:ORA-600: internal error code, arguments: [2252], [2832], [1879521022], [],[], [], [], []Mon Feb 7 19:48:23 2011Errors in file /orasys/app/oracle/admin/ndrpro/bdump/ndrpro_j000_3807.trc:ORA-603: ORACLE server session terminated by fatal errorORA-600: internal error code, arguments: [2252], [2832], [1879521022], [],[], [], [], []Mon Feb 7 19:48:37 2011DISTRIB TRAN NDRPRO.WORLD.5e02d205.356.59.431981 is local tran 356.59.431981 (hex=164.3b.6976d) insert pending collecting tran, scn=12165222955616 (hex=b10.6fcafa60)Mon Feb 7 19:48:38 2011HS: Created new FDS instance definition in server DDHS: Instance id = 673591, instance name = hsgcs21p (classODBC10.2.0.4.0_300)Mon Feb 7 19:48:38 2011Errors in file /orasys/app/oracle/admin/ndrpro/bdump/ndrpro_reco_19784.trc:ORA-600: internal error code, arguments: [2252], [2832], [1880544809], [],[], [], [], []Mon Feb 7 19:48:43 2011Errors in file /orasys/app/oracle/admin/ndrpro/bdump/ndrpro_j004_28599.trc:ORA-12012: error on auto execute of job 7212828ORA-6576: not a valid function or procedure nameMon Feb 7 19:48:54 2011Errors in file /orasys/app/oracle/admin/ndrpro/bdump/ndrpro_reco_19784.trc:ORA-600: internal error code, arguments: [2252], [2832], [1880544809], [],[], [], [], []Mon Feb 7 19:49:02 2011Errors in file /orasys/app/oracle/admin/ndrpro/bdump/ndrpro_reco_19784.trc:ORA-600: internal error code, arguments: [2252], [2832], [1880544809], [],[], [], [], []Mon Feb 7 19:49:02 2011RECO: terminating instance due to error 476Mon Feb 7 19:49:02 2011ORA-476: RECO process terminated with errorMon Feb 7 19:49:02 2011Errors in file /orasys/app/oracle/admin/ndrpro/bdump/ndrpro_lgwr_19767.trc:ORA-476: RECO process terminated with errorDIAGNOSTIC ANALYSIS:——————–As I understand it, it is expected behavior to the SCN to jump in adistributed transaction to the SCN of the database with the highest SCN.However, the SCN jump is so severe it causes the ORA-600 [2252] to besignaled because the new SCN is greater than the max reasonable SCN asexplained in Bug 9254170.We do see an SCN jump in the archives, but at OPTIMA, not NDR:SQL> select INST_ID,THREAD#,SEQUENCE#,FIRST_CHANGE#,NEXT_CHANGE# fromsystem.OPTIMA_ARCHIVED where (NEXT_CHANGE#-FIRST_CHANGE#)>200000000000 orderby INST_ID,THREAD#,SEQUENCE#,FIRST_CHANGE#,NEXT_CHANGE#; INST_ID THREAD# SEQUENCE# FIRST_CHANGE# NEXT_CHANGE#———- ———- ———- ——————– ——————– 5 5 3586 11305401088656 11533047607155 5 6 3965 11305409961956 11533047580831 6 5 3586 11305401088656 11533047607155 6 6 3965 11305409961956 11533047580831 7 5 3586 11305401088656 11533047607155 7 6 3965 11305409961956 11533047580831 8 5 3586 11305401088656 11533047607155 8 6 3965 11305409961956 115330475808318 rows selected.WORKAROUND:———–Set distributed recovery off and purge the problem distributed transaction,which had gone in-doubt.RELATED BUGS:————-9254170.REPRODUCIBILITY:—————-TEST CASE:———-STACK TRACE:————kcmchk kcsadjn k2rlog npicon0 k2nrec k2vrec ksbcti k2vdrv ksbcti ksbabsksbrdp opiripSUPPORTING INFORMATION:———————–Uploading the following supporting information:1) alert_ndrpro.log: Alert log for local database which was terminated byRECO2) ndr_archived.dmp: exp dump of v$archived_log from NDR3) optima_archived.dmp: exp dump of v$archived_log from OPTIMA4) ndrpro_j035_18426.trc: trace file from ora-600 [2252]5) ndrpro_reco_19728.trc: trace file from ora-600 [2252]6) ndrpro_opatchlsinv.txt: opatch lsinventory output for NDR7) 2pc_neighbors.txt: DBA_2PC_NEIGHBORS output for NDR 2pc_pending.txt: DBA_2PC_PENDING output for NDRUploaded two files, SCN-mbo2.log and SCN-optima.log, to the bug under/upload/bug11/bug11767824. These two files contain the hourly snapshots ofthe SCNs in the two databases as has been advised earlier. Customer wants toverify and confirm if the increase in SCNs as noticed in these logs are inline with expectations that the test DB would catch up within Optima in the8~22 hours period as had been outlined earlier. Thanks! Also, another division of AT&T is facing the same issue in one of theirapplication suites (Telegence) different from this one (Optima), and there isa separate Sev1 bug on that – bug 11727879 – and that division is alsorolling back the patch following the advice from this bug. So, forconvenience, it would be great if both bugs are linked and owned by the sameowner/team. Thanks! -Govardhanan.Customer has provided the below information regarding the sysdate and SCN atthe time they got the ORA-600 errors (within 30 mins after rolling back thepatch)Database Proj1 – SCN: b171e7c38a5Database NSO – SCN:b171e55f805Sysdate information from Proj1 and NSO databases———————————————–PROJ1 date: SYSTIMESTAMP—————————————————————————SYS_EXTRACT_UTC(SYSTIMESTAMP)—————————————————————————28-FEB-11 05.40.59.792873 AM -06:0028-FEB-11 11.40.59.792873 AMNSO:SYSTIMESTAMP—————————————————————————SYS_EXTRACT_UTC(SYSTIMESTAMP)—————————————————————————28-FEB-11 03.40.58.977912 AM -08:0028-FEB-11 11.40.58.977912 AMAde Transaction:kgiyer_diag_11767824_10.2.0.4.0Files Changed: 1. rdbms/mesg/e9858.msg 2. rdbms/notes/backport.txt 3. rdbms/src/server/dict/libcache/opikgl.c 4. rdbms/src/server/progint/distrib/k2r.c 5. rdbms/src/server/progint/distrib/k2s.c 6. rdbms/src/server/progint/opi/opiexe.c 7. rdbms/src/server/rcv/if/kcm.h 8. rdbms/src/server/rcv/if/kcs.h 9. rdbms/src/server/rcv/kcm.c 10. rdbms/src/server/rcv/kcs.c 11. rdbms/src/server/rcv/kcv.c 12. rdbms/src/server/txn/disttx/k2g.c 13. rdbms/src/server/txn/if/k2g.h 14. rdbms/src/server/txn/if/ktu.h 15. rdbms/src/server/txn/lcltx/ktu.c 16. rdbms/src/server/vos/if/kspare.h 17. rdbms/src/server/vos/ksdp.c 18. rdbms/src/server/vos/ksdt.c 19. rdbms/src/server/vos/kspt.c Build Instructions:recomp rdbms/mesg/e9858.msgrecomp rdbms/src/server/dict/libcache/opikgl.crecomp rdbms/src/server/progint/distrib/k2r.crecomp rdbms/src/server/progint/distrib/k2s.crecomp rdbms/src/server/progint/opi/opiexe.crecomp rdbms/src/server/rcv/kcm.crecomp rdbms/src/server/rcv/kcs.crecomp rdbms/src/server/rcv/kcv.crecomp rdbms/src/server/txn/disttx/k2g.crecomp rdbms/src/server/txn/lcltx/ktu.crecomp rdbms/src/server/vos/ksdp.crecomp rdbms/src/server/vos/ksdt.crecomp rdbms/src/server/vos/kspt.ccd rdbms/src/server/vos; $ORACLE_HOME/rdbms/bin/genksms > ksms.s; /usr/bin/as-o ksms.o ksms.s;link oracleDeliverables:rdbms/mesg/oraus.msgopikgl.o k2r.o k2s.o opiexe.o kcm.o kcs.o kcv.o k2g.o ktu.o ksdp.o ksdt.okspt.o : libserver10.aUsage Instructions:Please refer to Readme InstructionsPerformance Implications:Out of the box, the patch is diagnostics only.There will be a minor performance hit due to the various diagnostic events.Base Label Used:RDBMS_10.2.0.4.0_LINUX_RELEASERollback Instructions:If the below hidden parameters were not used after applying the patch – _max_reasonable_scn_rate _reasonable_scn_offset_seconds- then rolling back the patch is straightforward. If the above parameterswere used, then dev help will be requied to safely rollback the patch.Online Patch:NOReadme Instructions:Also appended to backport.txt in above diag txn.PART 1——The purpose of this patch is primarily to track SCN jumps in a database dueto distributed transactions and to dump information about the remotedatabases participating in the distributed transaction. This will allow us totrack back to the database which is communicating its very high SCN to theother databases in the distributed transaction. Once this database (suspectdatabase) is identified, then an attempt will be made to study the alertlogs, trace dumps and redo logs of the suspect database. If that does notgive enough information or if the suspect database is found to have receivedits SCN from another database through a distributed transaction, then thesame patch will be applied on the suspect databsae and the process repeated.Hence this diagnostic will be an ITERATIVE ATTEMPT at getting to the rootcause of the high SCN issue in ATT databases..In order to collect information about SCN jumps in the database, afterapplying the patch, ct is requested to set the following events – GSCN_ALERT_WRAP_DELTA GSCN_ALERT_BASE_DELTA- either statically by setting below in init.ora – event=”GSCN_TRACKER_SET trace name GSCN_ALERT_WRAP_DELTA level 0xFFFFFFFE” event=”GSCN_TRACKER_SET trace name GSCN_ALERT_BASE_DELTA level 0x100000″- or dynamically using – alter system set events ‘immediate trace name GSCN_ALERT_WRAP_DELTA level0xFFFFFFFE’; alter system set events ‘immediate trace name GSCN_ALERT_BASE_DELTA level0x100000’;The level values have the following meaning -For GSCN_ALERT_WRAP_DELTA, the level indicates the higher two bytes of a sixbyte SCN number.For GSCN_ALERT_BASE_DELTA, the level indicates the lower four bytes of a sixbyte SCN number.Note that 0 and 0xFFFFFFFF are invalid level values.To dynamically disable the above event use – alter system set events ‘immediate trace name GSCN_ALERT_WRAP_DELTA level0xFFFFFFFE’; alter system set events ‘immediate trace name GSCN_ALERT_BASE_DELTA level0xFFFFFFFE’;The level values for the above events taken together is the SCN adjustthreshold. Whenever there is an SCN jump in the database that is more than orequal to the SCN adjust threshold, diagnostics will be dumped to trace. Forexample, the settings of – alter system set events ‘immediate trace name GSCN_ALERT_WRAP_DELTA level0xFFFFFFFE’; alter system set events ‘immediate trace name GSCN_ALERT_BASE_DELTA level0x100000’;- translates to an SCN value of 0x0000.00100000(GSCN_ALERT_WRAP_DELTA.GSCN_ALERT_BASE_DELTA). Remember 0xFFFFFFFE means 0since zero is an invalid value for the Oracle event syntax.So for the above example, distributed transaction diagnostics will be dumpedwhen there is an SCN jump of more than 1048576 (0x100000) SCNs in thedatabase.The default settings for the above events translates to 0x0001.00000000 (ie.about 4 billion SCNs). Since ATT databases were observed to have frequent SCNjumps in the 1 million range, it is recommened that ct set the above eventsaccordingly.When the SCN adjust threshold is triggered, diagnostics will dumped to atrace file. The location of the trace file will be printed in the alert logalongwith the message – Detected high global SCN adjust delta!- The trace file will contain the relevant traces between the lines – Dumping global SCN tracker diagnosticsand End of global SCN tracker diagnostics- At anytime, ct can check the settings for the above events by doing – alter session set events ‘immediate trace name DUMP_GSCN_TRACKER_SETTINGSlevel 1’;- which will dump the value of the above events to alert log.Note that the above events will NOT prevent the SCN jump from going through.They will only collect diagnostics. Ensure MAX_DUMP_FILE_SIZE is set to ahealthy value so as to not run out of dump space. The following is a sample testcase to verify the above behavior -1. Prepare two databases that will perform distributed transactions with eachother. In my environment, I have two databases with the following details -patched database—————-VERSION = 10.2.0.4.0GLOBAL NAME = SRC4.US.ORACLE.COMDBID = 0x6288911DTABLE = SCOTT.EMPDBLINK = DSTI.US.ORACLE.COM- by doing – alter database rename global_name to src4.us.oracle.com; conn scott/tiger create database link dsti connect to bob identified by bob using ‘dst’;non-patched database——————–VERSION = 11.1.0.7.2GLOBAL NAME = DST7.US.ORACLE.COMDBID = 0x22A2CC7ATABLE = BOB.EMPDBLINK = SRCD.US.ORACLE.COM- by doing – alter database rename global_name to dst7.us.oracle.com; create user bob identified by bob; grant dba to bob; create table bob.emp as select * from scott.emp; conn bob/bob create database link srcd connect to scott identified by tiger using ‘src’;2. Set the SCN adjust threshold to 100000 SCNs in SRC4 – alter system set events ‘immediate trace name GSCN_ALERT_WRAP_DELTA level0xFFFFFFFE’; alter system set events ‘immediate trace name GSCN_ALERT_BASE_DELTA level100000’;You can check it is set correctly by running – alter session set events ‘immediate trace name DUMP_GSCN_TRACKER_SETTINGSlevel 1’;- and checking the alert log which should look lie -GSCNTRACKER Settings – GSCN_ALERT_RATE = 0 GSCN_ALERT_DELTA = 0x0000.000186a0 GSCN_ABORT_ON_DELTA_ALERT = NO GSCN_SAFE_ERROR_ON_SCN_CHECK = NO3. Bump up the SCN on DST7Check the current SCN on both databases using – select to_char(current_scn, ‘xxxxxxxxxxxx’) from v$database;- I get -SRC4 = 0x0000.00047236DST7 = 0x0000.00060174Now, we increase the SCN on DST7 by using the ADJUST_SCN event -(Note: do not use this event outside of your test environment)set the following hidden parameter in init.ora on DST7 database and bouncethe database. _allow_error_simulation=trueNow, the SCN is increased by doing – alter system set events ‘immediate trace name adjust_scn level ‘;where translates to (n*0x40000000) as the target SCN value. ie. n = 1will set the SCN as 0x0000.40000000. For the testcase pick a value that islarger than the current SCN on both databases by atleast 100000 SCNs (ourearlier SCN adjust threshold value).So, run on DST7 – alter system set events ‘immediate trace name adjust_scn level 1’; select to_char(current_scn, ‘xxxxxxxxxxxx’) from v$database;- I get -SRC4 = 0x0000.0004760aDST7 = 0x0000.400002a1 4. Initiate a distributed transaction from DST7 to SRC4Run on DST7 -conn bob/bobBEGIN update emp set ename = ‘QUEEN’ where empno=7839; update emp@srcd set ename = ‘QUEEN’ where empno=7839;END;/In the alert log of SRC4, I find -GSCNTRACKER(0x0000.400002a3): Detected high global SCN adjust delta! Alert delta : 0x0000.000186a0 Observed delta : 0x0000.3ffb8c88 Old SCN : 0x0000.0004761b New SCN : 0x0000.400002a3Dumping to trace file /ade/kgiyer_t4/oracle/rdbms/log/t4_ora_11104.trcFrom /ade/kgiyer_t4/oracle/rdbms/log/t4_ora_11104.trc, I see (relevantextract only) -Dumping global SCN tracker diagnostics…Dumping current session distributed transaction data – Parent USERNAME = BOB Parent USERID = 64 Parent DBNAME = DST7.US.ORACLE.COM Parent DBID = a7cc2a22 My DBNAME = SRC4.US.ORACLE.COM My DBID = d1198826… Short stack dump:ksdxfstk()+19<-ksdxdocmdmult()+3494<-ksudumpses()+765<-kcsgstdsd()+855<-kcsgstdsdec()+969<-kcsgstdd()+1088<-kcsgstaad()+411<-kcslaj2()+552<-kcsadj2()+287<-kcsadjn()+272<-k2serv()+412<-opiodr()+976<-ttcpip()+1085<-opitsk()+1054<-opiino()+821<-opiodr()+976<-opidrv()+466<-sou2o()+91<-opimai_real()+117<-main()+111<-__libc_start_main()+211…End of global SCN tracker diagnostics srcJul=0 srcJtim=0 cps=1 From BOB@DST7.US.ORACLE.COM (a7cc2a22) This SCOTT@SRC4.US.ORACLE.COM (d1198826)- The above tells me that an SCN adjust of 0x3ffb8c88 (1073450120) SCNs wasattempted.The call stack (k2serv()) tells me that this was an incoming distributedtransaction.The transaction was initiated by user BOB on database DST7.US.ORACLE.COM withDBID=0x22a2cc7a.SRC4.US.ORACLE.COM.Now, from here we can move to database DST7 and study its traces tounderstand why it had a very high SCN.Notice that the current SCN on SRC4 has increased -SRC4 = 0x0000.400005d2DST7 = 0x0000.400002a2 5. Now, we initiate a distributed transaction from SRC4 to DST7First we bump up the SCN another notch on DST7 – alter system set events ‘immediate trace name adjust_scn level 2’; select to_char(current_scn, ‘xxxxxxxxxxxx’) from v$database;- I get -SRC4 = 0x0000.400005d7DST7 = 0x0000.80000001 Run on SRC4 -conn scott/tigerBEGIN update emp set ename = ‘QUEEN’ where empno=7839; update emp@dsti set ename = ‘QUEEN’ where empno=7839;END;/In the alert log of SRC4, I find -GSCNTRACKER(0x0000.80000022): Detected high global SCN adjust delta! Alert delta : 0x0000.000186a0 Observed delta : 0x0000.3ffffa31 Old SCN : 0x0000.400005f1 New SCN : 0x0000.80000022Dumping to trace file /ade/kgiyer_t4/oracle/rdbms/log/t4_ora_11960.trcFrom /ade/kgiyer_t4/oracle/rdbms/log/t4_ora_11960.trc, I see (relevantextract only) -Dumping global SCN tracker diagnosticsDumping all network connections -0001) NCO (0xf7103920) ACTIVE = YES LAST SUB = NO IN TXN = NO CONN USERNAME = BOB AUTH USERNAME = USERID = 38 DBNAME = DST7.US.ORACLE.COM DBID = a7cc2a22 DBLINK = DSTI.US.ORACLE.COM CONNECT STRING = dst NCODEF FLAG = 0x2080 NCOLINK FLAG = 0x0000 HOSTDEF FLAG = 0x40002da9 TRAIT = 00000003 COMMIT STRENGTH = 1 LAST CMD RCVD = 5 SESSION NUM = 10Dumping current session distributed transaction data – Parent USERNAME = SCOTT Parent USERID = 38 Parent DBNAME = Parent DBID = My DBNAME = SRC4.US.ORACLE.COM My DBID = d1198826… Short stack dump:ksdxfstk()+19<-ksdxdocmdmult()+3494<-ksudumpses()+765<-kcsgstdsd()+855<-kcsgstdsdec()+969<-kcsgstdd()+1088<-kcsgstaad()+411<-kcslaj2()+552<-kcsadj2()+287<-kcsadjn()+272<-k2rlog()+860<-npicon0()+5012<-npicon()+41<-opikgl()+1632<-opiodr()+976<-__PGOSF112_rpidrus()+159<-skgmstack()+139<-rpidru()+98<-rpiswu2()+334<-rpidrv()+1199<-rpikgl()+262<-kqlrlk()+562<-kqlrld()+212<-kqllod()+6365<-kglobld()+821<-kglobpn()+881<-kglpim()+263<-kglpin()+943<-kglgob()+287<-qcdlgbo()+466<-qcdlgob()+734<-qcsfgob()+215<-qcsprfro()+437<-qcsprfro_tree()+321<-qcsprfro_tree()+118<-qcspafq()+102<-qcspqbDescendents()+249<-qcspqb()+167<-qcsdrv()+243<-qcitrans()+742<-qcisem()+395<-ph2csql_analyze()+749<-ph2stm()+1829<-ph2sms()+260<-ph2blo()+311<-ph2stm()+7445<-ph2uni()+359<-ph2dr2()+200<-ph2drv()+200<-phpsem()+32<-phpcmp()+1078<-pcisem2()+238<-kkxsem()+371<-opitca()+10353<-__PGOSF238_kksFullTypeCheck()+20<-rpiswu2()+334<-kksLoadChild()+5879<-kxsGetRuntimeLock()+1314<-kksfbc()+12752<-kkspsc0()+847<-kksParseCursor()+132<-opiosq0()+1315<-kpooprx()+215<-kpoal8()+673<-opiodr()+976<-ttcpip()+1085<-opitsk()+1054<-opiino()+821<-opiodr()+976<-opidrv()+466<-sou2o()+91<-opimai_real()+117<-main()+111<-__libc_start_main()+211 Current SQL Statement: BEGIN update emp set ename = ‘QUEEN’ where empno=7839; update emp@dsti set ename = ‘QUEEN’ where empno=7839;END;…End of global SCN tracker diagnosticsGSCNTRACKER: k2rlog: nco:0xf7103920 srcSCN: 0x0000.400005f1 -> adjSCN:0x0000.80000022 srcJul=2455626 srcJtim=63201 cps=1 This SCOTT@SRC4.US.ORACLE.COM (d1198826) To @DST7.US.ORACLE.COM (a7cc2a22)- The above tells me that an SCN adjust of 0x3ffffa31 (1073740337) SCNs wasattempted.The call stack (k2rlog()) tells me that this was an outgoing distributedtransaction.The transaction was initiated by user SCOTT on database SRC4.US.ORACLE.COMThe transaction was trying to connect (k2rlog) to BOB (CONN USERNAME) schemaon DST7.US.ORACLE.COM with DBID=a7cc2a22 through dblink DSTI.US.ORACLE.COM.Now, from here we can move to database DST7 and study its traces tounderstand why it had a very high SCN.So this covers both incoming and outgoing distributed transactions.Notice that the current SCN on SRC4 has increased -SRC4 = 0x0000.800000e7DST7 = 0x0000.80000101 6. Now we will try an incoming distributed transaction on SRC4 and abort thetxn if it exceeds the SCN adjust threshold.On SRC4 enable the abort option using – alter system set events ‘immediate trace name GSCN_ABORT_ON_DELTA_ALERTlevel 1’;- now if there is any SCN bump of more than 100000 SCNs on SRC4, then errorORA-10892 will be thrown.First we bump up the SCN another notch on DST7 – alter system set events ‘immediate trace name adjust_scn level 3’; select to_char(current_scn, ‘xxxxxxxxxxxx’) from v$database;- I get -SRC4 = 0x0000.80000140DST7 = 0x0000.c0000003 Run on DST7 -conn bob/bobBEGIN update emp set ename = ‘QUEEN’ where empno=7839; update emp@srcd set ename = ‘QUEEN’ where empno=7839;END;/- you will see that above SQL errors out with -ERROR at line 1:ORA-10892: SCN adjust delta exceeds thresholdORA-2063: preceding line from SRCDORA-6512: at line 3Checking the current SCN on SRC4, I get -SRC4 = 0x0000.80000162DST7 = 0x0000.c000014f – notice that the SCN on SRC4 did not increase. The alert log and sessiontrace files contain the diagnostic dump for the SCN adjust as before.Now bump up the SCN on SRC4 (disable GSCN_ABORT_ON_DELTA_ALERT first else thebump will not go through) – alter system set events ‘immediate trace name GSCN_ABORT_ON_DELTA_ALERTlevel 0xFFFFFFFE’; alter system set events ‘immediate trace name adjust_scn level 3’; select to_char(current_scn, ‘xxxxxxxxxxxx’) from v$database; alter system set events ‘immediate trace name GSCN_ABORT_ON_DELTA_ALERTlevel 1’;- I get -SRC4 = 0x0000.c000009fDST7 = 0x0000.c00001de Now try running on DST7 -conn bob/bobBEGIN update emp set ename = ‘QUEEN’ where empno=7839; update emp@srcd set ename = ‘QUEEN’ where empno=7839;END;/- you will see that above succeeds because the SCN adjust was less than thethreshold value of 100000.End of testcase.PART 2——The diagnostic patch contains other features that are disabled out of thebox. Ct has to enable them if they wish to use these features. Below is adescription of these features -1. SCN rate trackingIn addition to tracking SCN jumps, it is also possible to track the systemSCN rate every 30 seconds. If this rate is found to be greater than a SCNrate threshold, then diagnostics will be dumped. This option is disabled bydefault. The SCN rate threshold can be set using the following event – GSCN_ALERT_RATE- either statically by setting below in init.ora – event=”GSCN_TRACKER_SET trace name GSCN_ALERT_RATE level 50000″- or dynamically using – alter system set events ‘immediate trace name GSCN_ALERT_RATE level 50000’;- To disable this event, use – alter system set events ‘immediate trace name GSCN_ALERT_RATE level0xFFFFFFFE’;In the above example, the SCN rate threshold is 50000. ie. If the average SCNgrowth rate in the database is found to be greater than or equal to 50000SCNs per second, then all active session related diagnostics will be dumpedto a trace file. The location of the trace file will be printed in the alertlog alongwith the message – Detected high global SCN growth rate!- The trace file will contain the relevant traces between the lines – Dumping global SCN tracker diagnosticsand End of global SCN tracker diagnosticsThis is useful in identifying misbehaving sessions w.r.t SCN increments.2. Controlling distributed transactionsThe patch allows database to raise an external error (ORA-10892) wheneverthere is an attempt to adjust database SCN by more than the SCN adjustthreshold (GSCN_ALERT_WRAP_DELTA.GSCN_ALERT_BASE_DELTA). The advantage ofdoing this is that distributed transactions that attempt to increase thedatabase SCN by a high value will be aborted due to the above error and theSCN will not be adjusted. This can be considered as a stop gap measure tocontrol SCN growth while the root cause is investigated.This option is disabled by default. It can be enabled by using the followingevent – GSCN_ABORT_ON_DELTA_ALERT- either statically by setting below in init.ora – event=”GSCN_TRACKER_SET trace name GSCN_ABORT_ON_DELTA_ALERT level 1″- or dynamically using – alter system set events ‘immediate trace name GSCN_ABORT_ON_DELTA_ALERTlevel 1’;- To disable this event, use – alter system set events ‘immediate trace name GSCN_ABORT_ON_DELTA_ALERTlevel 0xFFFFFFFE’;3. Avoiding ORA-600[2252] from crashing databaseThe ORA-600[2252] error is raised if an attempt is made to increase thecurrent database SCN beyong the max reasonable SCN limit. Now, in some cases(eg. during DB startup), this error may be raised because the current SCN isalready more than the max reasonable SCN limit. This was observed at ATT whenthey applied the patch to increase the calcualated max SCN value and thenrolled back the patch. At that time, the current SCN had already progressedbeyond the pre-patch max SCN value. So database startup after rolling backthe patch failed with ORA-600[2252]. Another way to simulate this would be toshutdown the database, turn back the system clock and try to startup thedatabase. Since max SCN calculation is based on the system clock, this wouldresult in a lower max SCN value than the current SCN. However, in most othercases, this error is raised before the current SCN is allowed to increasebeyond the max reasonable SCN (this is the primary reason for raising theerror). Now when an internal error is raised in a database backgroundprocess, then it is a fatal condition and the instance crashes. In ct’s case,the distributed transaction recovery process (RECO) hit ORA-600[2252] leadingto the database crash. This could be avoided if an external error instead ofan internal error is raised for the case where an attempt is made to increasethe current SCN beyond the max reasonable SCN but where the current SCNitself is still less than the max reasonable SCN. This feature is disabled bydefault. It can be enabled by using the following event – GSCN_SAFE_ERROR_ON_SCN_CHECK- either statically by setting below in init.ora – event=”GSCN_TRACKER_SET trace name GSCN_SAFE_ERROR_ON_SCN_CHECK level 1″- or dynamically using – alter system set events ‘immediate trace name GSCN_SAFE_ERROR_ON_SCN_CHECKlevel 1’;- To disable this event, use – alter system set events ‘immediate trace name GSCN_SAFE_ERROR_ON_SCN_CHECKlevel 0xFFFFFFFE’;4. Controlling max reasonable SCN calculationThe patch allows the ct to increase the max reasonable SCN calculation usingthe hidden init.ora parameters – _max_reasonable_scn_rate _reasonable_scn_offset_seconds- These parameters are set to a default value such that the computed maxreasonable SCN value is the same as that of pre-patch databases._max_reasonable_scn_rate This has to be set to a value that reflects the usual SCN rate per secondfor the database due to normal transactions. This is a static parameter andhas to be set in init.ora and cannot be modified dynamically. This parametercan be set to a value in the range (16384 to 262144). Its default value is16384._reasonable_scn_offset_seconds This parameter can be used to introduce an offset to the system timeduring the max reasonable SCN calculation. This allows to compensate foralready high global SCNs on a database. This parameter should be used insteadof changing the _max_reasonable_scn_rate to compensate for high global SCNs.This is a static parameter and has to be set in init.ora and cannot bemodified dynamically. This parameter can be set to a value in the range (0 to2147483647). Its default value is 0.In order to find the correct value for this parameter for a particulardatabase, use the following steps – a. first find the target SCN – this is either the global SCN or the SCN inthe ORA-600 [2252] error parameters(first and second). To find the current global SCN, use query – select to_char(current_scn,’XXXXXXXXXXXX’) from v$database; b. convert the target SCN to an event level as below – if target SCN is 0xaaaa.bbbbcccc, then level will be 0xbbbbaaaa c. when the db is in either mount or open state run – alter session set events ‘immediate trace name DUMP_SCN_CHECK level0xbbbbaaaa’; note that 0xbbbbaaaa should be replaced by whatever value wascalculated in step b. d. check alert log for the recommended value for this parameter as outputin the line – Add _reasonable_scn_offset_seconds by nnn e. add nnn to existing value of _reasonable_scn_offset_seconds and set itin init.ora as – _reasonable_scn_offset_seconds=mmmEnd of Readme Instructions.Ade Transaction:kgiyer_diag_11767824_11.1.0.7.2_mlrBuild Instructions and Deliverables:See backport.txtUsage Instructions:Please refer to Readme Instructions in backport.txtPerformance Implications:Out of the box, the patch is diagnostics only.There will be a minor performance hit due to the various diagnostic events. Base Label Used:RDBMS_11.1.0.7.2PSU_LINUX_RELEASERollback Instructions:If the below hidden parameters were not used after applying the patch – _max_reasonable_scn_rate _reasonable_scn_offset_seconds- then rolling back the patch is straightforward. If the above parameterswere used, then dev help will be requied to safely rollback the patch. Online Patch:NO Readme Instructions:Appended to backport.txt in above diag txn.Uploaded alert log files from optima toI have requested this information from the customer but I was of theunderstanding that the customer and support in turn had already provided allthis information to you from the parent SR.Ade Transaction:kgiyer_diag_11767824_11.1.0.7.2_mlr3 Build Instructions and Deliverables:See backport.txt Usage Instructions:Please refer to Readme Instructions in backport.txt Performance Implications:Out of the box, the patch is diagnostics only.There will be a minor performance hit due to the various diagnostic events. Base Label Used:RDBMS_11.1.0.7.2PSU_LINUX_RELEASE Rollback Instructions:If the below hidden parameters were not used after applying the patch – _max_reasonable_scn_rate _reasonable_scn_offset_seconds- then rolling back the patch is straightforward. If the above parameterswere used, then dev help will be requied to safely rollback the patch. Online Patch:NO Readme Instructions:Appended to backport.txt in above diag txn.*** 03/27/11 04:27 pm ***Pasting README addendum for reference (this is in addition to README at ***KGIYER 03/06/11 01:25 am ***) -ADDENDUM – PATCH V2The revised version of this diag patch introduces periodic SCN tracking.The rationale for this feaure is that -It is important for further RCA to be able to understand the SCN growth trendaccurately and differentiate between intrinsic and extrinsic SCN growthcontributors. This differentiation will allow us to judge whether the currentDB is the source of its high SCN growth or if the growth is largelyinfluenced by communication with a remote DB.The earlier events were triggered on a SCN change threshold and so did nottrack every SCN jump. Earlier, when there were many SCN jumps within a shortperiod of time, it triggered a lot of diagnostic dumps within a short periodleading to performance issues. Also it becomes difficult to extract atime-based SCN growth trend from such traces.Hece, the periodic SCN tracking feature introduceds a new diagnostic event -GSCN_TRACKER_SECONDS – which controls the periodicity of the SCN diagnosticdump (default 60 seconds).The periodicity can be set — either statically by setting below in init.ora – event=”GSCN_TRACKER_SET trace name GSCN_TRACKER_SECONDS level 120″- or dynamically using – alter system set events ‘immediate trace name GSCN_TRACKER_SECONDS level120’;- To disable this event, use – alter system set events ‘immediate trace name GSCN_TRACKER_SECONDS level0xFFFFFFFE’;The output generated by this diagnostic event and the pervious ones have beencondensed for easier extraction and analysis. A description of the diagnosticouput format is given below. -GSCN_TRACKER_SECONDS ouputs the following line (example) every 60 seconds inckpt process trace file -GSCNT SCN RATE: 03/22/2011 11:50:26 pscn=0x0005.00000267 cscn=0x0005.4000001dxadj=0x0000.00000002 iadj=0x0000.3ffffd94 incr=0x0000.00000020 in 60 seconds- wherepscn (prev scn) is the scn 60 seconds ago,cscn (current scn) is the current system scn,xadj (external adjust) is the amount of scn jump (in hexadecimal) due tocommunication with remote DB,iadj (internal adjust) is the amount of scn jump (in hexadecimal) generatedwithin the same DBincr (increments) is the amount of SCN increments which is a derived valuefrom the previous four statisticsIncase xadj is non-zero in the above dump, an additional line (example) isoutput -GSCNT SCN RATE: 03/22/2011 11:50:26 In the last 60 seconds, the maximumextrinsic SCN jump was 0x0000.00000001 due to outgoing distributedtransaction(end) with DST7.US.ORACLE.COM (a7cc2a22)- which give the remote DB name which caused the highest SCN jump withing the60 second measurement period. Sometimes the priodic SCN tracking code may notbe able to acquire a latch to update the SCN jump statistics. Rather thanforcing an error, the SCN jump info is ouput to alert log along with thetimestamp starting with the phrase – ‘GSCNT SCN ADJUST (LATCH_GET_FAIL)’.This jump would show up in the SCN stat output as SCN increment.Incase GSCN_ALERT_RATE event is set (disabled by default) and if incr persecond exceeds this value, then the above line is also output to alert log.By default, active session dumps and error stack have been disabled to keepthe diagnostic dump size manageable. In case active session dump is requiredduring GSCN_ALERT_RATE event trigger, it can be set (example also enableserror stack dump) — either statically by setting below in init.ora – event=”GSCN_TRACKER_DUMP trace name GSCN_SESSION_INFO level 1; nameGSCN_ERROR_STACK level 1; name context forever, level 1″;- or dynamically using – alter system set events ‘GSCN_TRACKER_DUMP trace name GSCN_SESSION_INFOlevel 1; name GSCN_ERROR_STACK level 1; name context forever, level 1’;- To disable the action session or error stack dump, remove the correspondingevent in above specificationIncase GSCN_ALERT_WRAP_DELTA.GSCN_ALERT_BASE_DELTA is set (default0x0001.00000000) and if an external SCN adjust exceeds this value, then thefollowing line (example) is dumped to alert log and session trace file -GSCNT SCN ADJUST: 03/22/2011 12:00:51 oscn=0x0005.80000565cscn=0x0005.c0000065 adj=0x0000.3ffffb00 due to outgoing distributed – whereoscn (old scn) is the scn before jumpcscn (current scn) is the scn after jumpadj (adjust) is the amount of scn jump (in hexadecimal)the rest are remote DB and DBID name for the corresponding distributedtransaction.- In addition to above, distributed transaction info and active DB link infois dumped to session trace file. Error stack dump is disabled by default butcan be enabled using GSCN_TRACKER_DUMP event as mentioned above.Out of the box, the diagnostics will output SCN stats every 60 seconds inckpt trace file and dump distributed transaction info to session tracewhenever an externl SCN jump is atleastGSCN_ALERT_WRAP_DELTA.GSCN_ALERT_BASE_DELTA SCNs.Please upload alert log and ckpt trace file after collecting the diagnostics.

回复

SCN may jump in a distributed transaction with dblink

Comments

Comment 取消回复