How to troubleshooting RAC Vip Problem

1.- Please provide the output of the following commands from each node:

 srvctl config nodeapps -n <nodename> -a -g -s -l
 ifconfig -a
 cat /etc/hosts

2.- Please set debug mode for VIP resources and reproduce the problem. Please take note of the time of the test:

 a.- As root user, issue the command :
 crsctl debug log res "<ora.dbtest2.vip>:5"

 (note: replace ora.dbtest2.vip for each of your vip resources)

 b.- Take note of node, date, time

 c.- Reproduce the problem

 d.- You may turn off debugging with command :
 crsctl debug log res "<ora.dbtest2.vip>:0"

3.- Set OS watcher as the following note explain:

 Note 301137.1: OS Watcher User Guide - upload output of OS Watcher

4.- Collect from each node:

 a.- Os log files
 /var/log/messages

 b.- Os watcher stats for the time of the test

 c.- CRS log files:
 From the $ORA_CRS_HOME, run the following commands as root

 * $script /tmp/diag.log
 * $env
 * $id
 * $cd $ORA_CRS_HOME/bin
 o Execute diagcollection.pl by passing the crs_home as the following
 o export ORA_CRS_HOME=/u01/crs
 o $ORA_CRS_HOME/bin/diagcollection.pl -crshome=$ORA_CRS_HOME --collect

This will create the crsData_<hostname>.tar.gz, ocrData_<hostname>.tar.gz,
oraData_<hostname>.tar.gz and basData_<hostname>.tar.gz. Additionally in 11gR2,
there will be os_<hostname>.tar.gz and ipd_

 d.- If Vendor clusterware is not used then upload the oprocd logs.
They are in /var/opt/oracle/`hostname`/ on most platforms.

了解sqlplus中的copy命令

复制表数据是Oracle dba和应用开发人员的日常工作之一,常见的三种复制表数据的方式是:

  1. 以INSERT ..SELECT的方式将目标数据由源表插入到源表中,以此种方法复制表上的数据时可以灵活运行SELECT语句剔除那些无用的数据;同时也可以利用append nologging和并行parallel等特性加速数据复制速度.通过使用DBLINK数据库远程连接,INSERT…SELECT方式可以在数据库间复制表数据,当然要比在本库中复制表数据效率下降一些.
  2. 使用Oracle标准的导入导出工具exp/imp或者在10g以后使用DataPump数据泵技术,此种方法可以在数据库之间复制表数据,也可以用QUERY选项指定Where条件以筛选导出数据,10g中server端模式的数据泵极大地提高了导入导出的效率.
  3. 此外我们也可以使用SQLPLUS的copy命令在本库中或者在数据库间复制表上的数据.

使用copy命令复制表数据的这种方式用的较少,我们来着重介绍一下:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> conn maclean/maclean
Connected.

SQL> create table copy_sample as select * from dba_objects;
Table created.

SQL> copy
usage: COPY FROM <db> TO <db> <opt> <table> { (<cols>) } USING <sel>
 <db>   : database string, e.g., hr/your_password@d:chicago-mktg
 <opt>  : ONE of the keywords: APPEND, CREATE, INSERT or REPLACE
 <table>: name of the destination table
 <cols> : a comma-separated list of destination column aliases
 <sel>  : any valid SQL SELECT statement
A missing FROM or TO clause uses the current SQL*Plus connection.

1.使用copy命令在本地同一数据库复制表上的数据

SQL>copy from maclean/maclean@clinica.rh2.oracle.com create copy_table using select * from copy_sample;

Array fetch/bind size is 15. (arraysize is 15)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
Table COPY_TABLE created.

   51041 rows selected from maclean@clinica.rh2.oracle.com.
   51041 rows inserted into COPY_TABLE.
   51041 rows committed into COPY_TABLE at DEFAULT HOST connection.

/*  以上命令中from指定了源表所在的数据库连接字符,另可以用to关键字指定目标对象所在数据库连接字符串;
    using字句使用SELECT语句指定了所需复制的数据,因为是查询语句所以数据可以来源于多个表的复杂查询结果
*/

/*  需要注意的是from或to必须指定其一,
    否则出现:"SP2-0495: FROM and TO clauses both missing; specify at least one"的提示,
    若不指定from或者to,则默认其为SQLPLUS当前连接的数据库及模式
*/

/* 可以通过加大sqlplus的arraysize加速copy表数据  */

SQL>drop table copy_table;
Table dropped.

SQL>set arraysize 5000;

SQL>copy from maclean/maclean@clinica.rh2.oracle.com create copy_table using select * from copy_sample;

Array fetch/bind size is 5000. (arraysize is 5000)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
Table COPY_TABLE created.

   51041 rows selected from maclean@clinica.rh2.oracle.com.
   51041 rows inserted into COPY_TABLE.
   51041 rows committed into COPY_TABLE at DEFAULT HOST connection.

2.使用copy命令在数据库间复制表数据

SQL>copy from maclean/maclean@clinica.rh2.oracle.com to maclean/maclean@clinicb.rh3.oracle.com create
copy_table_from_another_db using select * from copy_sample;

Array fetch/bind size is 15. (arraysize is 15)
Will commit when done. (copycommit is 0)
Maximum long size is 80. (long is 80)
Table COPY_TABLE_FROM_ANOTHER_DB created.
   51041 rows selected from maclean@clinica.rh2.oracle.com.
   51041 rows inserted into COPY_TABLE_FROM_ANOTHER_DB.
   51041 rows committed into COPY_TABLE_FROM_ANOTHER_DB at maclean@clinicb.rh3.oracle.com.

SQL> conn maclean/maclean@clinicb.rh3.oracle.com
Connected.

SQL> select count(*) from copy_table_from_another_db;
  COUNT(*)
----------
     51041

3.使用不同的复制copy形式:

copy命令存在多种数据复制的模式,分别为:
1.append模式,将查询到数据插入到目标表中;若目标表不存在,则会创建目标表:
2.create模式,首先创建目标表,之后将数据插入到目标表中;若目标表已存在,copy命令将报错:
ERROR:
ORA-00955: name is already used by an existing object

3.insert模式,只负责将数据插入到目标表.若目标表不存在则copy命令直接报错;当使用insert模式时,需要保证using字句指定的列和目标表上的列一一对应

4.replace模式,负责替换目标表中的数据.若目标表不存在,使用replace模式时copy命令会创建目标表并插入数据;若目标表存在,则copy会将该表先drop掉再重建并插入数据

How to setup Oracle Streams Bi-Directional

构建Oracle双向流复制是一个十分复杂的过程,我写这个文档的目的是尽量有条理地列出所需做的工作,帮助DBA更有效的建设流复制环境。

1.以scott模式为复制示例,一般只要在创建数据库时选择了安装sample schema,都会存在该scott模式;至少保证源库中存在该schema,以便可以初始化到目标库中。

2.在源和目标2个数据库中创建strmadmin流管理用户,当然你也可以选用其他名字。同时在2个库中都要创建streams使用的表空间,以便让logmnr使用它:
[Read more…]

searchmysite

这是一个搜索结果预留页面, Page For Search Result

Patch your 10g Oracle database to PSU 10.2.0.4.5

有人会问不是10.2.0.5已经release了吗?为什么还要装10.2.0.4版本的PSU(Patch set Update)呢?做出这样的决定很大程度上是处于无奈的,毕竟Oracle的Bug有那么多,百年大计安全第一;10.2.0.4的稳定性在10g的范畴内可说是有目共睹的,而多达5个版本的PSU也进一步保证了能在10.2.0.4基础上不做大的变更,只修复合并一些小的bug。继10.2.0.4.4后(10.2.0.4.4有CRS和DB 2个版本)在July 13.2010推出了10.2.0.4.5 PSU,该PSU只有1.3MB大小,合并了:9714832, 9713537, 9678695, 9655017, 9173248, 8309642, 9678697, 9678690,9573054, 9654991等多个patch;因为10.2.0.4.5 PSU需要在10.2.0.4.4的基础上才能apply,所以我们需要首先应用10.2.0.4.4 PSU补丁。

PSU 10.2.0.4.5 的安装步骤:

1.如果你是全新的fresh安装的话,需要安装数据库Server软件10.2.0.1,之后升级到10.2.0.4;此外因为是fresh installation,还不存在数据库,所以只要是以custom database形式创建的新库都可以skip Post installation的步骤(也就是说不用执行脚本),而以DBCA中General, Data Warehouse, Transaction Processing形式创建新库需要执行catbundle.sql升级脚本(Loading Modified SQL Files into the Database)。
2.安装Oracle Opatch软件版本11.2.0.0,对应的patch号为6880880,使用10.2.0.4默认的Opatch软件在prereq和apply阶段会遇到Bug,并出现以下问题:

Invoking prereq "checkconflictagainstohwithdetail"
Prereq "checkConflictAgainstOHWithDetail" not executed
Unable to create Patch Object.
Exception occured : Patch ID is null.

Invoking prereq "checkconflictagainstohwithdetail"
The location ".//README.html" is not a directory or a valid patch zip file.
Prereq "checkConflictAgainstOHWithDetail" not executed
PrereqSession failed: Invalid patch location.

可以从MOS下载到p6880880_112000_$PLATFORM.zip的介质,解压后将OPatch部分的内容覆盖到原$ORACLE_HOME/OPatch目录:
[Read more…]

High Contrast for My Oracle Supports

My Oracle supports现在可以使用High Contrast高对比度界面设置功能了,目前提供2种选项:Black Text on White或者White Text on Black。个人比较喜欢黑底白字这类高对比度的界面,不过感觉现在这个White Text on Black还有待微调:

登录后的Dashboard:

这次的高亮界面因该是My Oracle Support Release 5.2 Upgrade升级中客户体验改善的一部分:

Prepare for My Oracle Support Release 5.2 [ID 1245514.1]

What is being announced?

Oracle Global Customer Support is pleased to announce that on October 29, 2010, My Oracle Support was upgraded to Release 5.2.

This upgrade is part of Oracle’s ongoing commitment to improving your support experience and providing a single integrated support platform across all products. My Oracle Support Release 5.2 will integrate Sun systems, processes, and tools into the My Oracle Support and Global Single Instance (GSI) infrastructure. Our goal is to consolidate systems and utilize global best practices, as well as provide a number of requested enhancements for existing My Oracle Support users. Although Release 5.2 enables Sun functionality, Sun customers and employees will not fully migrate to My Oracle Support until the weekend of December 10-12, 2010.

Below are some of the enhancements included in My Oracle Support Release 5.2. Comprehensive release notes covering all updates will be available soon under the News section on your My Oracle Support Dashboard.

Improvements to Knowledge Search
The new version of My Oracle Support incorporates a set of easy-to-use, quick-action features that enable customers to more quickly find the self-help knowledge they need.

Oracle has added the capability to filter knowledge based on the products in an uploaded configuration or from a Support Identifier number, thereby targeting searches for greater efficiency and relevance. We are enabling this through PowerView functionality; for more information, please review the recorded training titled Using PowerView in My Oracle Support located in Note 603505.1.

We have also made improvements to the Related Knowledge Search Results that are returned during the Service Request (SR) creation process. We have improved the precision of articles you are shown and we have added My Oracle Support Community content to the Service Request knowledge search. These improvements will help bring you the most relevant information that can help solve your SR.

Another knowledge search feature included in this release of My Oracle Support is Automated Search Expansion. For example, a term such as db will automatically include the word database in the search keywords to bring another level of user-friendliness and relevance to the process.

Finally, users have the ability to search any combination of the Knowledge Base, Archived Articles, Bug Database, Communities, or Documentation categories. This functionality provides users with the flexibility to filter knowledge even more precisely.

Checking Version Certification
Do you need to see if Oracle has certified your product versions to work together? The My Oracle Support Certify process has been dramatically updated, making the verification of product version compatibility much easier for specific product lines.

Choose your SR Severity
Users have the ability to set their own Service Request severity levels (1, 2, 3, or 4) in My Oracle Support Release 5.2.

Additional Features

* Enhanced Knowledge Management functionality and bug fixes
* Improved health checks with Oracle Configuration Manager (OCM) data
* Patch recommendations via Oracle Configuration Manager for specific product lines

What do you need to do?

* Be aware that My Oracle Support is upgraded to Release 5.2 and become familiar with the new features now available.
* Please review Note 875942.1 and attend one of the upcoming live webinars covering Support Tools and Processes. Alternately, you may watch one of the recorded training viewlets available as part of your Oracle Support contract.
* You can also check out the Communities link within My Oracle Support for targeted help from Oracle engineers and customers.
* If you plan on upgrading any of your Oracle products soon, please take a look at Note 250.1 and check out the new Upgrade Advisors pages available to help with specific upgrade paths.

Who to contact for more information?
Click on the “Contact Us” button at the top of the My Oracle Support dashboard page for feedback and assistance from your Oracle support team.


How does RECORDLENGTH affect your exp speed?

exp导出工具的几个常用参数想必大家都很熟悉;有时为了加速导出作业我们会加上direct=y,进一步的可能就会设置RECORDLENGTH参数了,Oracle官方对这个参数的定义是:length of IO record;这个解释过于简单了,偶有余暇来探究一下RECORDLENGTH的工作原理:

[maclean@rh2 test]$ exp help=y

Export: Release 10.2.0.4.0 - Production on Mon Nov 8 17:26:34 2010
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
You can let Export prompt you for parameters by entering the EXP
command followed by your username/password:

     Example: EXP SCOTT/TIGER

Or, you can control how Export runs by entering the EXP command followed
by various arguments. To specify parameters, you use keywords:

     Format:  EXP KEYWORD=value or KEYWORD=(value1,value2,...,valueN)
     Example: EXP SCOTT/TIGER GRANTS=Y TABLES=(EMP,DEPT,MGR)
               or TABLES=(T1:P1,T1:P2), if T1 is partitioned table

USERID must be the first parameter on the command line.

Keyword    Description (Default)      Keyword      Description (Default)
--------------------------------------------------------------------------
USERID     username/password          FULL         export entire file (N)
BUFFER     size of data buffer        OWNER        list of owner usernames
FILE       output files (EXPDAT.DMP)  TABLES       list of table names
COMPRESS   import into one extent (Y) RECORDLENGTH length of IO record
GRANTS     export grants (Y)          INCTYPE      incremental export type
INDEXES    export indexes (Y)         RECORD       track incr. export (Y)
DIRECT     direct path (N)            TRIGGERS     export triggers (Y)
LOG        log file of screen output  STATISTICS   analyze objects (ESTIMATE)
ROWS       export data rows (Y)       PARFILE      parameter filename
CONSISTENT cross-table consistency(N) CONSTRAINTS  export constraints (Y)

OBJECT_CONSISTENT    transaction set to read only during object export (N)
FEEDBACK             display progress every x rows (0)
FILESIZE             maximum size of each dump file
FLASHBACK_SCN        SCN used to set session snapshot back to
FLASHBACK_TIME       time used to get the SCN closest to the specified time
QUERY                select clause used to export a subset of a table
RESUMABLE            suspend when a space related error is encountered(N)
RESUMABLE_NAME       text string used to identify resumable statement
RESUMABLE_TIMEOUT    wait time for RESUMABLE
TTS_FULL_CHECK       perform full or partial dependency check for TTS
VOLSIZE              number of bytes to write to each tape volume
TABLESPACES          list of tablespaces to export
TRANSPORT_TABLESPACE export transportable tablespace metadata (N)
TEMPLATE             template name which invokes iAS mode export

Export terminated successfully without warnings.

/* 针对使用direct=y直接路径,未使用recordlength的作业进行strace系统调用追踪 */

[maclean@rh2 test]$ strace -o exp1.trace exp maclean/maclean file=tv.dmp tables=tv direct=y

Export: Release 10.2.0.4.0 - Production on Mon Nov 8 17:27:23 2010
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in US7ASCII character set and UTF8 NCHAR character set
server uses ZHS16GBK character set (possible charset conversion)

About to export specified tables via Direct Path ...
. . exporting table                             TV    6602624 rows exported
Export terminated successfully without warnings.

[maclean@rh2 test]$ ps -ef|grep exp
maclean  12265 12198 55 17:27 pts/2    00:00:02 strace -o exp1.trace exp maclean/maclean file=tv.dmp tables=tv direct=y
maclean  12266 12265 24 17:27 pts/2    00:00:01 exp                 file=tv.dmp tables=tv direct=y
maclean  12269 12233  0 17:27 pts/3    00:00:00 grep exp
[maclean@rh2 test]$ pmap -x 12266
12266:   exp                 file=tv.dmp tables=tv direct=y
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000     608     280       0 r-x--  exp
0000000000698000      16      16       8 rwx--  exp
000000000069c000      40      40      40 rwx--    [ anon ]
000000000f330000     876     656     656 rwx--    [ anon ]
00000039f1800000     112     100       0 r-x--  ld-2.5.so
00000039f1a1b000       4       4       4 r-x--  ld-2.5.so
00000039f1a1c000       4       4       4 rwx--  ld-2.5.so
00000039f1c00000    1336     452       0 r-x--  libc-2.5.so
00000039f1d4e000    2044       0       0 -----  libc-2.5.so
00000039f1f4d000      16      16       8 r-x--  libc-2.5.so
00000039f1f51000       4       4       4 rwx--  libc-2.5.so
00000039f1f52000      20      20      20 rwx--    [ anon ]
00000039f2000000     520      20       0 r-x--  libm-2.5.so
00000039f2082000    2044       0       0 -----  libm-2.5.so
00000039f2281000       4       4       4 r-x--  libm-2.5.so
00000039f2282000       4       4       4 rwx--  libm-2.5.so
00000039f2400000       8       8       0 r-x--  libdl-2.5.so
00000039f2402000    2048       0       0 -----  libdl-2.5.so
00000039f2602000       4       4       4 r-x--  libdl-2.5.so
00000039f2603000       4       4       4 rwx--  libdl-2.5.so
00000039f2800000      88      60       0 r-x--  libpthread-2.5.so
00000039f2816000    2044       0       0 -----  libpthread-2.5.so
00000039f2a15000       4       4       4 r-x--  libpthread-2.5.so
00000039f2a16000       4       4       4 rwx--  libpthread-2.5.so
00000039f2a17000      16       4       4 rwx--    [ anon ]
00000039f5c00000      84      24       0 r-x--  libnsl-2.5.so
00000039f5c15000    2044       0       0 -----  libnsl-2.5.so
00000039f5e14000       4       4       4 r-x--  libnsl-2.5.so
00000039f5e15000       4       4       4 rwx--  libnsl-2.5.so
00000039f5e16000       8       0       0 rwx--    [ anon ]
00002b6ebde28000       8       8       8 rwx--    [ anon ]
00002b6ebde2a000   18632    5024       4 r-x--  libclntsh.so.10.1
00002b6ebf05c000    2044       0       0 -----  libclntsh.so.10.1
00002b6ebf25b000     656     656     640 rwx--  libclntsh.so.10.1
00002b6ebf2ff000     112      32      32 rwx--    [ anon ]
00002b6ebf31b000    3004    1280     416 r-x--  libnnz10.so
00002b6ebf60a000    1020       0       0 -----  libnnz10.so
00002b6ebf709000     708     256     256 rwx--  libnnz10.so
00002b6ebf7ba000       4       4       4 rwx--    [ anon ]
00002b6ebf7d8000     432     164     164 rwx--    [ anon ]
00002b6ebf844000     400       8       0 r-x--  timezlrg.dat
00002b6ebf8c5000      40      28       0 r-x--  libnss_files-2.5.so
00002b6ebf8cf000    2044       0       0 -----  libnss_files-2.5.so
00002b6ebface000       4       4       4 r-x--  libnss_files-2.5.so
00002b6ebfacf000       4       4       4 rwx--  libnss_files-2.5.so
00002b6ebfad0000       4       4       4 rwx--    [ anon ]
00007ffff2cb3000      84      64      64 rwx--    [ stack ]
ffffffffff600000    8192       0       0 -----    [ anon ]
----------------  ------  ------  ------
total kB           51408    9276    2380
[maclean@rh2 test]$ cd /proc/12266/fd
[maclean@rh2 fd]$ ls -l
总计 0
lrwx------ 1 maclean oinstall 64 11-08 17:27 0 -> /dev/pts/2
lrwx------ 1 maclean oinstall 64 11-08 17:27 1 -> /dev/pts/2
lrwx------ 1 maclean oinstall 64 11-08 17:27 2 -> /dev/pts/2
lr-x------ 1 maclean oinstall 64 11-08 17:27 3 -> /s01/10gdb/rdbms/mesg/expus.msb
lr-x------ 1 maclean oinstall 64 11-08 17:27 4 -> /s01/10gdb/oracore/mesg/lrmus.msb
lr-x------ 1 maclean oinstall 64 11-08 17:27 5 -> /s01/10gdb/rdbms/mesg/ocius.msb
lrwx------ 1 maclean oinstall 64 11-08 17:27 6 -> socket:[56016]
l-wx------ 1 maclean oinstall 64 11-08 17:27 7 -> /home/maclean/test/tv.dmp
l-wx------ 1 maclean oinstall 64 11-08 17:27 8 -> pipe:[56017]
lr-x------ 1 maclean oinstall 64 11-08 17:27 9 -> pipe:[56018]

/* 同时分析该exp进程的地址空间及打开文件描述符,可以看到这里的fd=>7指向了导出文件tv.dmp */

节选trace部分内容:
read(9, "\7\333\0\0\6\0\0\0\0\0\10\0 \0\0009:57\5\0VALID\1\0N\1\0N"..., 2064) = 2064
read(9, "2008-03-12:00:39:58\5\0VALID\1\0N\1\0N"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0LID\1\0N\1\0N\1\0N\3\0SYS\v\0I_S"..., 2064) = 2064
read(9, "\376\377\3\0\302\5$\3\0\302\5$\5\0INDEX\7\0xl\3\f\1(<\7\0xl"..., 1958) = 1958
read(9, "\0\327\0\0\6\0\0\0\0\00059\5\0VALID\1\0N\1\0N\1\0N\5\0OU"..., 2064) = 215
write(7, "9:57\5\0VALID\1\0N\1\0N\1\0N\3\0SYS\t\0ATEMP"..., 4096) = 4096
write(7, "\0N\1\0N\3\0SYS\v\0I_SUMDEP$_2\376\377\3\0\302\5\17\3\0"..., 4096) = 4096
write(8, "\0r\0\0\6\0\0\0\0\0\3[\235@\177\202\277n+\0\0\0 \0\0\0\0\0\0\200a\314"..., 114) = 114
read(9, "\7\333\0\0\6\0\0\0\0\0\10\0 \0\0ALID\1\0N\1\0N\1\0N\5\0OU"..., 2064) = 2064
read(9, "\0N\1\0N\1\0N\3\0SYS\t\0MON_MODS$\376\377\3\0\302\5T\3"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\1)\1\7\0xl\3\f\1)\1\23\0002008-03-"..., 2064) = 2064
read(9, "ABLE\7\0xl\3\f\1)\26\7\0xl\3\f\1)\26\23\0002008-03-"..., 1958) = 1958
read(9, "\0\327\0\0\6\0\0\0\0\0\302\0061\3\0\302\0061\5\0INDEX\7\0xl\3\f\1"..., 2064) = 215
write(7, "ALID\1\0N\1\0N\1\0N\5\0OUTLN\17\0OL$NODE_OL"..., 4096) = 4096
write(7, ")\1\7\0xl\3\f\1)\1\23\0002008-03-12:00:40:00"..., 4096) = 4096
write(8, "\0r\0\0\6\0\0\0\0\0\3[\236@\177\202\277n+\0\0\0 \0\0\0\0\0\0\200a\314"..., 114) = 114 

/* 文件描述符为9的是一个管道文件,不难猜测其数据来源于TV表所在数据文件;
   fd为8的因该是system call的输出管道,可以不用关心  */ 

[maclean@rh2 test]$ strace -o exp2.trace exp maclean/maclean file=tv.dmp tables=tv direct=y recordlength=32768

Export: Release 10.2.0.4.0 - Production on Mon Nov 8 18:31:07 2010

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in US7ASCII character set and UTF8 NCHAR character set
server uses ZHS16GBK character set (possible charset conversion)

About to export specified tables via Direct Path ...
. . exporting table                             TV    6602624 rows exported
Export terminated successfully without warnings.

节选的recordlength为32k时的system call trace:

read(9, "\7\333\0\0\6\0\0\0\0\0\10\0\200\0\0ONYM\7\0xl\3\f\2\t9\7\0xl"..., 2064) = 2064
read(9, "\5\0VALID\1\0N\1\0N\1\0N\7\0OLAPSYS\10\0CWM$U"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0002:01:09:00\5\0VALID\1\0N\1\0"..., 2064) = 2064
read(9, "\0N\7\0OLAPSYS\26\0CWM2$AWDIMCREATEACC"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\0\303\5T\2\5\0TABLE\7\0xl\3\f\2\n\4\7"..., 2064) = 2064
read(9, "\5T\25\4\0\303\5T\25\5\0INDEX\7\0xl\3\f\2\n\5\7\0xl\3\f\2"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\4\0\303\5T'\5\0INDEX\7\0xl\3\f\2\n\5"..., 2064) = 2064
read(9, "ID\1\0N\1\0N\1\0N\7\0OLAPSYS\23\0CWM2$HIERA"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0007:32\5\0VALID\1\0N\1\0N\1\0N\7\0"..., 2064) = 2064
read(9, "-10-20:20:07:33\5\0VALID\1\0N\1\0N\1\0N\7"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\0VALID\1\0N\1\0N\1\0N\7\0OLAPS"..., 2064) = 2064
read(9, "\0CWM2_OLAP_MEASURE\376\377\4\0\303\5U\"\376\377\7\0PA"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\0N\1\0N\1\0N\6\0PUBLIC\23\0CWM2"..., 2064) = 2064
read(9, "008-03-12:01:09:11\5\0VALID\1\0N\1\0N\1"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\0N\7\0OLAPSYS\35\0ALL$OLAP2"..., 2064) = 2064
read(9, "TTR_USES\376\377\4\0\303\5V\v\376\377\4\0VIEW\7\0xl\3\f\2\n"..., 1958) = 1958
read(9, "\3\v\0\0\6\0\0\0\0\0\2\n\23\7\0xn\n\24\25\10'\23\0002008-03-"..., 2064) = 779
write(7, "ONYM\7\0xl\3\f\2\t9\7\0xl\3\f\2\t9\23\0002008-03-"..., 4096) = 4096
write(7, "008-03-12:01:09:02\5\0VALID\1\0N\1\0N\1"..., 28672) = 28672

/* 以上可以看到exp进程首先了累计(accumulates)读取了32k左右的数据到缓存中,之后以2个write调用分别写出4096和28672 bytes
    的数据,也就是在读取32k数据后全部写出再依次循环,不同于上面的累计读取8k然后写出8k
*/

/* 对比pmap报告可以发现内存使用也发生了变化 */

00002b9f6e034000     708     256     256 rwx--  libnnz10.so
00002b9f6e0e5000       4       4       4 rwx--    [ anon ]
00002b9f6e103000     432     212     212 rwx--    [ anon ]
00002b9f6e16f000     400       8       0 r-x--  timezlrg.dat
00002b9f6e1f0000      40      28       0 r-x--  libnss_files-2.5.so

/* 相对应的是exp进程的匿名块anon(00002b9f6e103000     432     212     212 rwx--    [ anon ])部分rss和Dirty,
    由原来的164k增长到了212k,增长了48k
*/

[maclean@rh2 test]$ strace -o exp3.trace exp maclean/maclean file=tv.dmp tables=tv direct=y recordlength=65535

Export: Release 10.2.0.4.0 - Production on Mon Nov 8 17:53:25 2010
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in US7ASCII character set and UTF8 NCHAR character set
server uses ZHS16GBK character set (possible charset conversion)

About to export specified tables via Direct Path ...
. . exporting table                             TV    6602624 rows exported
Export terminated successfully without warnings.

/* 以65535 recordlength导出的strace 记录 */

read(9, "\7\333\0\0\6\0\0\0\0\0\10\377\377\0\0MADMIN\21\0SYS_IOT_T"..., 2064) = 2011
read(9, "\7\333\0\0\6\0\0\0\0\00010-09-29:18:14:08\5\0VAL"..., 2064) = 2064
read(9, "4:09\5\0VALID\1\0N\1\0N\1\0N\3\0SYS\23\0AQ$AQ"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\4\0\303\6 b\5\0INDEX\7\0xn\t\36\17!0"..., 2064) = 2064
read(9, ":21\5\0VALID\1\0N\1\0N\1\0N\3\0SYS\4\0HR11\376\377"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0J$\4\0P101\4\0\303\6!>\4\0\303\6\"9\17\0"..., 2064) = 2064
read(9, "P101\4\0\303\6!O\4\0\303\6\">\17\0INDEX PARTITIO"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0I1CCOL$\4\0P101\4\0\303\6!a\4\0\303"..., 2064) = 2064
read(9, ":44\5\0VALID\1\0N\1\0N\1\0N\3\0SYS\16\0WRH$_P"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0N_HISTORY_PK\31\0WRH$_ACT"..., 2064) = 2064
read(9, "SQLSTAT_PK\31\0WRH$_SQLSTA_14284528"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\0002010-10-18:20:48:28\5\0"..., 2064) = 2064
read(9, ";\f\23\0002010-10-20:19:58:11\5\0VALID\1\0"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0RT$\376\377\4\0\303\6$W\376\377\5\0TABLE\7\0"..., 2064) = 2064
read(9, "38C00004$$\376\377\4\0\303\6%(\4\0\303\6%(\3\0LOB\7\0x"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\6%G\376\377\5\0TABLE\7\0xn\n\24\25\10\34\7"..., 2064) = 2064
read(9, "VALID\1\0N\1\0N\1\0N\3\0SYS\30\0GV_OLAPI_SE"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\1\0N\1\0N\3\0SYS\31\0OLAPI_MEM"..., 2064) = 2064
read(9, "n\n\24\25\t\n\7\0xn\n\24\25\t\n\23\0002010-10-20:20:0"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\4\0TYPE\7\0xn\n\24\25\t\v\7\0xn\n\24\25"..., 2064) = 2064
read(9, "\16\23\0002010-10-20:20:08:12\5\0VALID\1\0N"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\7\0xn\n\24\25\t\25\23\0002010-10-20:"..., 2064) = 2064
read(9, "ITIONSTRUCT\376\377\4\0\303\6(\4\376\377\4\0TYPE\7\0xn\n"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\4\0VIEW\7\0xn\n\24\25\t\27\7\0xn\n\24\25"..., 2064) = 2064
read(9, " PARTITION\7\0xn\v\7\0238\10\7\0xn\v\7\0238\10\23\00020"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0-11-07:18:55:08\5\0VALID"..., 2064) = 2064
read(9, "ITION\7\0xn\v\7\0238\n\7\0xn\v\7\0238\n\23\0002010-11"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\000010-11-07:22:52:09\5\0VA"..., 2064) = 2064
read(9, "-11-07:22:52:10\5\0VALID\1\0N\1\0N\1\0N\6"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0T_SOO_BUFFER_BUSY\376\377\4\0\303"..., 2064) = 2064
read(9, "\0xn\n\36\27\5\33\7\0xn\n\37\27\6\5\23\0002010-10-23:01"..., 1958) = 1958
read(9, "\7\333\0\0\6\0\0\0\0\0\6+ \17\0TABLE PARTITION\7\0"..., 2064) = 2064
read(9, "09\5\0VALID\1\0N\1\0N\1\0N\3\0SYS\27\0WRH$_DB"..., 1477) = 1477
write(7, "0-09-29:17:56:12\5\0VALID\1\0N\1\0N\1\0N"..., 4096) = 4096
write(7, "05465\376\377\4\0\303\6\37\\\4\0\303\6\37\\\5\0INDEX\7\0xn\t\35"..., 61440) = 61440

/* 这里exp首先累计(accumulates)读取了65536 bytes的数据到缓存中,然后以2次write写出这些记录到导出文件中
   换而言之可以把recordlength看做direct patch时的buffer size
*/

那么recordlength是如何影响exp的效率的呢?由以上演示可知在使用direct直接路径读取方式时,recordlength并不影响读取(read),不管我们设置如何的recordlength,exp进程所需做的read总工作量都是一样的。区别在于write的次数和单次写出的数据量,以不设置recordlength为例,其写出8192字节数据需要做2次的write,则当dump文件为600MB时需要写出153600次;而设置recordlength为65535时,其写出(4096+61440)=65536也仅需要做2次write call,当dump文件为600MB时仅需要写出19200次。可知当使用更大的recordlength时可以有效减少exp作业所需的CPU时间和实际IO次数,以极大地加速导出速度;表现到IO层的话,就是用户可以显示地使用iostat等工具看到在导出阶段的写出IO大幅上升。使用recordlength的直接代价是exp进程会消耗使用更多的内存(一般在100~200k左右),以现在的眼光看来这点内存无关紧要。

需要注意的是recordlength现在存在一个最大值(maximum)为65535,设置超过65536的recordlength,exp程序都会返回如:Note: RECORDLENGTH=65536 truncated to 65535的信息;早期版本(734 or 8i)中不存在这个限制,也说明了exp的开发人员逐渐意识到大于65535的 recordlength无益于加速导出。

metalink文档[ID 134966.1]指出了如何判断recordlength的默认值:

Problem Description
——————-
You want to export the entire database and move it to a different platform.
The RECORDLENGTH parameter must be specified during the export since each
platform has a different default.
How do you determine the default RECORDLENGTH value for each platform?
Solution Description
——————–
Look for the BUFSIZ-Parameter in the File /usr/include/stdio.h file to determine
the default size of the output record length.
If no value is specified for the BUFSIZ parameter then a default of 512 bytes
will be used.
Explanation
———–
If the RECORDLENGTH parameter is not explicitly set on EXP/IMP, ORACLE will use the
value specified for BUFSIZ in the /usr/include/stdio.h file to determine the
size of it’s output record length.
The BUFFER parameter applies ONLY to conventional path Export. It has no effect on a direct path Export. This BUFFER parameter specifies the size (in bytes) of the buffer used to fetch rows. It determines the maximum number of rows in an array, fetched by Export. For direct path Export, use the RECORDLENGTH parameter to specify the size of the buffer that Export uses for writing to the export file.

The RECORDLENGTH parameter specifies the length (in bytes) of the file record. You can use this parameter to specify the size of the Export I/O buffer (highest value is 64 kb). Changing the RECORDLENGTH parameter affects only the size of data that accumulates before writing to disk. It does not affect the operating system file block size. If you do not define this parameter, it defaults to your platform-dependent value for BUFSIZ (1024 bytes in most cases).

BUFSIZ本身是UNIX/LINUX系统中控制I/O buffersize的参数,有兴趣研究的话可以参考著名的apue书(W.Richard Stevens的<Advanced Programming in the UNIX Environment>,这本书可以做任何人的Unix C启蒙书:

The value of BUFSIZ is chosen on each system so as to make stream I/O efficient. So it is a good idea to use BUFSIZ as the size for the buffer when you call setvbuf. Actually, you can get an even better value to use for the buffer size by means of the fstat system call: it is found in the st_blksize field of the file attributes.Sometimes people also use BUFSIZ as the allocation size of buffers used for related purposes, such as strings used to receive a line of input with fgets (see section 12.8 Character Input). There is no particular reason to use BUFSIZ for this instead of any other integer, except that it might lead to doing I/O in chunks of an efficient size.

How to trouble shooting Library cache lock/pin

1.Troubleshooting Library Cache: Lock, Pin and Load Lock (Doc ID 444560.1)
What is "Library cache lock" ?

This event controls the concurrency between clients of the library cache. It acquires a lock on
the object handle so that either:

* One client can prevent other clients from accessing the same object.

* The client can maintain a dependency for a long time (for example, so that no other client can change the object).

This lock is also obtained to locate an object in the library cache.
Library cache lock will be obtained on database objects referenced during parsing or compiling of
SQL or PL/SQL statements (table, view, procedure, function, package, package body, trigger, index, cluster, synonym).
The lock will be released at the end of the parse or compilation.

Cursors (SQL and PL/SQL areas), pipes and any other transient objects do not use this lock.
Library cache lock is not deadlock sensitive and the operation is synchronous.

Parameters:

* handle address
Address of the object being loaded.

* lock address
Address of the load lock being used. This is not the same thing as a latch or an enqueue, it is a State Object.

* Mode
Indicates the data pieces of the object which need to be loaded.

* Namespace
The name of the object namespace as it is displayed in V$DB_OBJECT_CACHE view

How can Library cache lock be reduced?

In general , high contention on library cache lock is usually a result of an under-sized shared pool or
non-sharing of sql. Some ways of reducing the contention are:

* Reduce the reloads by increasing the shared pool size as the locks may take a long time if the pool is undersized.

* Increase sharing by setting the cursor_sharing to similar or force.
Be aware this may change the execution plan; so setting the parameter should be thoroughly tested.

* Reduce the invalidations by running the batch jobs to collect statistics or any other maintenance jobs
separately from OLTP.

Note 122793.1 How to Find which Session is Holding a Particular Library Cache Lock

Known Bugs:

Note:10018789.8Spin in kgllock / DB hang with high library cache lock waits
Note:7423411.8Process may hang waiting for "library cache load lock" with no holder
Note:7706138.8Process may hang waiting for "library cache load lock" with no holder
Note:9675816.8Bug 9675816 - Self deadlock with 'library cache lock' waits

2.How to Find which Session is Holding a Particular Library Cache Lock (Doc ID 122793.1)

Method 1: Systemstate Analysis

For older versions you can use the following syntax that is also possible in higher versions.The level 266 is
not available before 9.2.0.6

alter session set max_dump_file_size=unlimited;
alter session set events 'immediate trace name systemstate level 10'
Oracle will create a systemstate tracefile in your USER_DUMP_DEST directory.

METHOD 2: EXAMINE THE X$KGLLK TABLE
select sid,saddr from v$session where event= 'library cache lock';

SID SADDR
---------- --------
16 572ed244

select kgllkhdl Handle,kgllkreq Request, kglnaobj Object
from x$kgllk where kgllkses = '572ed244'
and kgllkreq > 0;

HANDLE REQUEST OBJECT
-------- ---------- ------------------------------------------------------------
62d064dc 2 EMPLOYEES

A:It's possible that one library cache lock can block all the other sessions if this table need to be
queried by other sessions.

The problem is why the library cache lock was held for so long and wasn't released.
Usually, library cache lock only cost a few milliseconds.

It could be these Known Bugs:

Note:10018789.8 Spin in kgllock / DB hang with high library cache lock waits
Note:7423411.8 Process may hang waiting for "library cache load lock" with no holder
Note:7706138.8 Process may hang waiting for "library cache load lock" with no holder
Note:9675816.8 Bug 9675816 - Self deadlock with 'library cache lock' waits

For more information, you can refer note 444560.1 and 122793.1.

To understand the root cause, we need more information to analyze.
You may take systemstate and hanganalyze next time and we'll help you to analyze them.

1. Please generate systemstate dump as sysdba:
SQL>conn / as sysdba;
SQL>alter session set max_dump_file_size = unlimited;
SQL>alter session set events 'immediate trace name systemstate level 10';
Wait for some some seconds
SQL>alter session set events 'immediate trace name systemstate level 10';
Wait for some some seconds
SQL>alter session set events 'immediate trace name systemstate level 10';

2. Open another session as sysdba:
SQL>conn / as sysdba;
SQL>oradebug setmypid
SQL>oradebug unlimit;
SQL>oradebug dump hanganalyze 3
Wait for some seconds
SQL>oradebug dump hanganalyze 3
Wait for some seconds
SQL>oradebug dump hanganalyze 3

3. The generated files will be under udump. Please upload these files.
$cd udump
$ls –ltr

ARCHIVER ERROR ORA-00354: CORRUPT REDO LOG BLOCK HEADER

Problem Description:
ORA-16038: log 2 sequence# 13831 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 2 thread 1: ‘/oradata/3/TOOLS/stdby_redo/srl1.log’

LOG FILE
---------------
Filename = alert_TOOLS5_from_1021.log
See ...

...
Wed Oct 28 11:41:59 2009
Primary database is in MAXIMUM AVAILABILITY mode
Standby controlfile consistent with primary
RFS[1]: Successfully opened standby log 1: '/oradata/3/TOOLS/stdby_redo/srl0.log'
Wed Oct 28 11:42:00 2009
ARC0: Log corruption near block 604525 change 10551037679542 time ?
Wed Oct 28 11:42:00 2009
Errors in file /tools/oracle/admin/TOOLS/bdump/tools_arc0_2143.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 604525 change 10551037679542 time 10/28/2009 11:29:50
ORA-00312: online log 2 thread 1: '/oradata/3/TOOLS/stdby_redo/srl1.log'
ARC0: All Archive destinations made inactive due to error 354
Wed Oct 28 11:42:00 2009
ARC0: Closing local archive destination LOG_ARCHIVE_DEST_2: '/oradata/3/TOOLS/archive/dgarc/1_13831_635534096.arc' (error 354)
(TOOLS)
Committing creation of archivelog '/oradata/3/TOOLS/archive/dgarc/1_13831_635534096.arc' (error 354)
ARCH: Archival stopped, error occurred. Will continue retrying
Wed Oct 28 11:42:05 2009
ORACLE Instance TOOLS - Archival Error
Wed Oct 28 11:42:05 2009
ORA-16038: log 2 sequence# 13831 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 2 thread 1: '/oradata/3/TOOLS/stdby_redo/srl1.log'
Wed Oct 28 11:42:05 2009
Errors in file /tools/oracle/admin/TOOLS/bdump/tools_arc0_2143.trc:
ORA-16038: log 2 sequence# 13831 cannot be archived
ORA-00354: corrupt redo log block header
ORA-00312: online log 2 thread 1: '/oradata/3/TOOLS/stdby_redo/srl1.log'
Wed Oct 28 11:43:04 2009
ARCH: Archival stopped, error occurred. Will continue retrying
Wed Oct 28 11:43:04 2009
ORACLE Instance TOOLS - Archival Error
Wed Oct 28 11:43:04 2009
Primary database is in MAXIMUM AVAILABILITY mode
Changing standby controlfile to RESYNCHRONIZATION level
Wed Oct 28 11:43:04 2009
ORA-16014: log 1 sequence# 13832 not archived, no available destinations
ORA-00312: online log 1 thread 1: '/oradata/3/TOOLS/stdby_redo/srl0.log'
Wed Oct 28 11:43:04 2009
Errors in file /tools/oracle/admin/TOOLS/bdump/tools_arc1_2145.trc:
ORA-16014: log 1 sequence# 13832 not archived, no available destinations
ORA-00312: online log 1 thread 1: '/oradata/3/TOOLS/stdby_redo/srl0.log'
RFS[1]: Successfully opened standby log 2: '/oradata/3/TOOLS/stdby_redo/srl1.log'
Wed Oct 28 11:43:13 2009
RFS[3]: Archived Log: '/oradata/3/TOOLS/archive/dgarc/1_13831_635534096.arc'
Wed Oct 28 11:43:14 2009
RFS LogMiner: Registered logfile [/oradata/3/TOOLS/archive/dgarc/1_13831_635534096.arc] to LogMiner session id [4]
Wed Oct 28 11:43:15 2009
LOGMINER: Begin mining logfile for session 4 thread 1 sequence 13831, /oradata/3/TOOLS/archive/dgarc/1_13831_635534096.arc
Wed Oct 28 11:44:03 2009
RFS[3]: Archived Log: '/oradata/3/TOOLS/archive/dgarc/1_13832_635534096.arc'
...
LOG FILE
---------------
Filename = alert_TOOLS6_from_1021.log
See ...

...
Wed Oct 28 11:16:01 2009
Thread 1 advanced to log sequence 13830 (LGWR switch)
Current log# 8 seq# 13830 mem# 0: /oradata/1/redo/TOOLS/redo1a.log
Current log# 8 seq# 13830 mem# 1: /oradata/2/redo/TOOLS/redo1b.log
Current log# 8 seq# 13830 mem# 2: /oradata/3/redo/TOOLS/redo1c.log
Wed Oct 28 11:29:50 2009
LGWR: Standby redo logfile selected to archive thread 1 sequence 13831
LGWR: Standby redo logfile selected for thread 1 sequence 13831 for destination LOG_ARCHIVE_DEST_2
Wed Oct 28 11:29:50 2009
Thread 1 advanced to log sequence 13831 (LGWR switch)
Current log# 9 seq# 13831 mem# 0: /oradata/1/redo/TOOLS/redo2a.log
Current log# 9 seq# 13831 mem# 1: /oradata/2/redo/TOOLS/redo2b.log
Current log# 9 seq# 13831 mem# 2: /oradata/3/redo/TOOLS/redo2c.log
Wed Oct 28 11:41:59 2009
LGWR: Standby redo logfile selected to archive thread 1 sequence 13832
LGWR: Standby redo logfile selected for thread 1 sequence 13832 for destination LOG_ARCHIVE_DEST_2
Wed Oct 28 11:41:59 2009
Thread 1 advanced to log sequence 13832 (LGWR switch)
Current log# 10 seq# 13832 mem# 0: /oradata/1/redo/TOOLS/redo3a.log
Current log# 10 seq# 13832 mem# 1: /oradata/2/redo/TOOLS/redo3b.log
Current log# 10 seq# 13832 mem# 2: /oradata/3/redo/TOOLS/redo3c.log
Wed Oct 28 11:43:04 2009
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Standby redo logfile selected to archive thread 1 sequence 13833
LGWR: Standby redo logfile selected for thread 1 sequence 13833 for destination LOG_ARCHIVE_DEST_2
Wed Oct 28 11:43:04 2009
Thread 1 advanced to log sequence 13833 (LGWR switch)
Current log# 11 seq# 13833 mem# 0: /oradata/1/redo/TOOLS/redo4a.log
Current log# 11 seq# 13833 mem# 1: /oradata/2/redo/TOOLS/redo4b.log
Current log# 11 seq# 13833 mem# 2: /oradata/3/redo/TOOLS/redo4c.log
Wed Oct 28 11:45:04 2009
Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED
LGWR: Standby redo logfile selected to archive thread 1 sequence 13834
LGWR: Standby redo logfile selected for thread 1 sequence 13834 for destination LOG_ARCHIVE_DEST_2
Wed Oct 28 11:45:05 2009
Thread 1 advanced to log sequence 13834 (LGWR switch)
Current log# 8 seq# 13834 mem# 0: /oradata/1/redo/TOOLS/redo1a.log
Current log# 8 seq# 13834 mem# 1: /oradata/2/redo/TOOLS/redo1b.log
Current log# 8 seq# 13834 mem# 2: /oradata/3/redo/TOOLS/redo1c.log
Wed Oct 28 11:46:03 2009
Thread 1 cannot allocate new log, sequence 13835
Checkpoint not complete
Current log# 8 seq# 13834 mem# 0: /oradata/1/redo/TOOLS/redo1a.log
Current log# 8 seq# 13834 mem# 1: /oradata/2/redo/TOOLS/redo1b.log
Current log# 8 seq# 13834 mem# 2: /oradata/3/redo/TOOLS/redo1c.log
Wed Oct 28 11:46:10 2009
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Standby redo logfile selected to archive thread 1 sequence 13835
LGWR: Standby redo logfile selected for thread 1 sequence 13835 for destination LOG_ARCHIVE_DEST_2
Wed Oct 28 11:46:11 2009
Thread 1 advanced to log sequence 13835 (LGWR switch)
Current log# 9 seq# 13835 mem# 0: /oradata/1/redo/TOOLS/redo2a.log
Current log# 9 seq# 13835 mem# 1: /oradata/2/redo/TOOLS/redo2b.log
Current log# 9 seq# 13835 mem# 2: /oradata/3/redo/TOOLS/redo2c.log
Wed Oct 28 11:48:03 2009
Thread 1 cannot allocate new log, sequence 13836
Checkpoint not complete
Current log# 9 seq# 13835 mem# 0: /oradata/1/redo/TOOLS/redo2a.log
Current log# 9 seq# 13835 mem# 1: /oradata/2/redo/TOOLS/redo2b.log
Current log# 9 seq# 13835 mem# 2: /oradata/3/redo/TOOLS/redo2c.log
Wed Oct 28 11:48:06 2009
...

From the standby, as at 2009-10-28, 11:42, when the archiver tried to archive the standby
redo logfile. it encountered this error:

ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 604525 change 10551037679542 time 10/28/2009 11:29:50
ORA-00312: online log 2 thread 1: '/oradata/3/TOOLS/stdby_redo/srl1.log'

Errors in file /tools/oracle/admin/TOOLS/bdump/tools_arc0_2143.trc

The real logfile is retrieved from primary by the standby RFS process, then the log apply continue as usual.
The fact that the standby redo logs are corrupted and identified as corrupt by the ARC process , makes it clear that there could be some sort of I/O errors which has caused.
Reviewing the alert.log file it is clear that the RFS process fetched the new copy of the file which is corrupted and the issue has been resolved.
This is more an issue to be concentrated from the system adminisration end to determine in case there are any issues at the I.O subsystem
.

list some Script to Collect Data Guard Primary Site Diagnostic Information:

Overview
——–
This script is intended to provide an easy method to provide information
necessary to troubleshoot Data Guard issues.

Script Notes
————-
This script is intended to be run via sqlplus as the SYS or Internal user.

Script
——-
– – – – – – – – – – – – – – – – Script begins here – – – – – – – – – – – – – – – –

— NAME: dg_prim_diag.sql (Run on PRIMARY with a LOGICAL or PHYSICAL STANDBY)
— ————————————————————————
— Copyright 2002, Oracle Corporation
— LAST UPDATED: 2/23/04

— Usage: @dg_prim_diag
— ————————————————————————
— PURPOSE:
— This script is to be used to assist in collection information to help
— troubeshoot Data Guard issues with an emphasis on Logical Standby.
— ————————————————————————
— DISCLAIMER:
— This script is provided for educational purposes only. It is NOT
— supported by Oracle World Wide Technical Support.
— The script has been tested and appears to work as intended.
— You should always run new scripts on a test instance initially.
— ————————————————————————
— Script output is as follows:

set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,’Mondd_hhmi’) timecol,
‘.out’ spool_extension from sys.dual;
column output new_value dbname
select value || ‘_’ output
from v$parameter where name = ‘db_name’;
spool dg_prim_diag_&&dbname&&timestamp&&suffix
set linesize 79
set pagesize 35
set trim on
set trims on
alter session set nls_date_format = ‘MON-DD-YYYY HH24:MI:SS’;
set feedback on
select to_char(sysdate) time from dual;

set echo on

— In the following the database_role should be primary as that is what
— this script is intended to be run on. If protection_level is different
— than protection_mode then for some reason the mode listed in
— protection_mode experienced a need to downgrade. Once the error
— condition has been corrected the protection_level should match the
— protection_mode after the next log switch.

column role format a7 tru
column name format a10 wrap

select name,database_role role,log_mode,
protection_mode,protection_level
from v$database;

— ARCHIVER can be (STOPPED | STARTED | FAILED). FAILED means that the
— archiver failed to archive a log last time, but will try again within 5
— minutes. LOG_SWITCH_WAIT The ARCHIVE LOG/CLEAR LOG/CHECKPOINT event log
— switching is waiting for. Note that if ALTER SYSTEM SWITCH LOGFILE is
— hung, but there is room in the current online redo log, then value is
— NULL

column host_name format a20 tru
column version format a9 tru

select instance_name,host_name,version,archiver,log_switch_wait
from v$instance;

— The following query give us information about catpatch.
— This way we can tell if the procedure doesn’t match the image.

select version, modified, status from dba_registry
where comp_id = ‘CATPROC’;

— Force logging is not mandatory but is recommended. Supplemental
— logging must be enabled if the standby associated with this primary is
— a logical standby. During normal operations it is acceptable for
— SWITCHOVER_STATUS to be SESSIONS ACTIVE or TO STANDBY.

column force_logging format a13 tru
column remote_archive format a14 tru
column dataguard_broker format a16 tru

select force_logging,remote_archive,
supplemental_log_data_pk,supplemental_log_data_ui,
switchover_status,dataguard_broker
from v$database;

— This query produces a list of all archive destinations. It shows if
— they are enabled, what process is servicing that destination, if the
— destination is local or remote, and if remote what the current mount ID
— is.

column destination format a35 wrap
column process format a7
column archiver format a8
column ID format 99
column mid format 99

select dest_id “ID”,destination,status,target,
schedule,process,mountid mid
from v$archive_dest order by dest_id;

— This select will give further detail on the destinations as to what
— options have been set. Register indicates whether or not the archived
— redo log is registered in the remote destination control file.

set numwidth 8
column ID format 99

select dest_id “ID”,archiver,transmit_mode,affirm,async_blocks async,
net_timeout net_time,delay_mins delay,reopen_secs reopen,
register,binding
from v$archive_dest order by dest_id;

— The following select will show any errors that occured the last time
— an attempt to archive to the destination was attempted. If ERROR is
— blank and status is VALID then the archive completed correctly.

column error format a55 wrap

select dest_id,status,error from v$archive_dest;

— The query below will determine if any error conditions have been
— reached by querying the v$dataguard_status view (view only available in
— 9.2.0 and above):

column message format a80

select message, timestamp
from v$dataguard_status
where severity in (‘Error’,’Fatal’)
order by timestamp;

— The following query will determine the current sequence number
— and the last sequence archived. If you are remotely archiving
— using the LGWR process then the archived sequence should be one
— higher than the current sequence. If remotely archiving using the
— ARCH process then the archived sequence should be equal to the
— current sequence. The applied sequence information is updated at
— log switch time.

select ads.dest_id,max(sequence#) “Current Sequence”,
max(log_sequence) “Last Archived”
from v$archived_log al, v$archive_dest ad, v$archive_dest_status ads
where ad.dest_id=al.dest_id
and al.dest_id=ads.dest_id
group by ads.dest_id;

— The following select will attempt to gather as much information as
— possible from the standby. SRLs are not supported with Logical Standby
— until Version 10.1.

set numwidth 8
column ID format 99
column “SRLs” format 99
column Active format 99

select dest_id id,database_mode db_mode,recovery_mode,
protection_mode,standby_logfile_count “SRLs”,
standby_logfile_active ACTIVE,
archived_seq#
from v$archive_dest_status;

— Query v$managed_standby to see the status of processes involved in
— the shipping redo on this system. Does not include processes needed to
— apply redo.

select process,status,client_process,sequence#
from v$managed_standby;

— The following query is run on the primary to see if SRL’s have been
— created in preparation for switchover.

select group#,sequence#,bytes from v$standby_log;

— The above SRL’s should match in number and in size with the ORL’s
— returned below:

select group#,thread#,sequence#,bytes,archived,status from v$log;

— Non-default init parameters.

set numwidth 5
column name format a30 tru
column value format a48 wra
select name, value
from v$parameter
where isdefault = ‘FALSE’;

spool off

– – – – – – – – – – – – – – – – Script ends here – – – – – – – – – – – – – – – –

another one:

Overview
——–

This script is intended to provide an easy method to provide information
necessary to troubleshoot Data Guard issues.

Script Notes
————-

This script is intended to be run via sqlplus as the SYS or Internal user.

Script
——-

– – – – – – – – – – – – – – – – Script begins here – – – – – – – – – – – – – – – –

— NAME: DG_phy_stby_diag.sql
— ————————————————————————
— AUTHOR:
— Michael Smith – Oracle Support Services – DataServer Group
— Copyright 2002, Oracle Corporation
— ————————————————————————
— PURPOSE:
— This script is to be used to assist in collection information to help
— troubeshoot Data Guard issues.
— ————————————————————————
— DISCLAIMER:
— This script is provided for educational purposes only. It is NOT
— supported by Oracle World Wide Technical Support.
— The script has been tested and appears to work as intended.
— You should always run new scripts on a test instance initially.
— ————————————————————————
— Script output is as follows:

set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,’Mondd_hhmi’) timecol,
‘.out’ spool_extension from sys.dual;
column output new_value dbname
select value || ‘_’ output
from v$parameter where name = ‘db_name’;
spool dgdiag_phystby_&&dbname&&timestamp&&suffix
set lines 200
set pagesize 35
set trim on
set trims on
alter session set nls_date_format = ‘MON-DD-YYYY HH24:MI:SS’;
set feedback on
select to_char(sysdate) time from dual;

set echo on


— ARCHIVER can be (STOPPED | STARTED | FAILED) FAILED means that the archiver failed
— to archive a — log last time, but will try again within 5 minutes. LOG_SWITCH_WAIT
— The ARCHIVE LOG/CLEAR LOG/CHECKPOINT event log switching is waiting for. Note that
— if ALTER SYSTEM SWITCH LOGFILE is hung, but there is room in the current online
— redo log, then value is NULL

column host_name format a20 tru
column version format a9 tru
select instance_name,host_name,version,archiver,log_switch_wait from v$instance;

— The following select will give us the generic information about how this standby is
— setup. The database_role should be standby as that is what this script is intended
— to be ran on. If protection_level is different than protection_mode then for some
— reason the mode listed in protection_mode experienced a need to downgrade. Once the
— error condition has been corrected the protection_level should match the protection_mode
— after the next log switch.

column ROLE format a7 tru
select name,database_role,log_mode,controlfile_type,protection_mode,protection_level
from v$database;

— Force logging is not mandatory but is recommended. Supplemental logging should be enabled
— on the standby if a logical standby is in the configuration. During normal
— operations it is acceptable for SWITCHOVER_STATUS to be SESSIONS ACTIVE or NOT ALLOWED.

column force_logging format a13 tru
column remote_archive format a14 tru
column dataguard_broker format a16 tru
select force_logging,remote_archive,supplemental_log_data_pk,supplemental_log_data_ui,
switchover_status,dataguard_broker from v$database;

— This query produces a list of all archive destinations and shows if they are enabled,
— what process is servicing that destination, if the destination is local or remote,
— and if remote what the current mount ID is. For a physical standby we should have at
— least one remote destination that points the primary set but it should be deferred.

COLUMN destination FORMAT A35 WRAP
column process format a7
column archiver format a8
column ID format 99

select dest_id “ID”,destination,status,target,
archiver,schedule,process,mountid
from v$archive_dest;

— If the protection mode of the standby is set to anything higher than max performance
— then we need to make sure the remote destination that points to the primary is set
— with the correct options else we will have issues during switchover.

select dest_id,process,transmit_mode,async_blocks,
net_timeout,delay_mins,reopen_secs,register,binding
from v$archive_dest;

— The following select will show any errors that occured the last time an attempt to
— archive to the destination was attempted. If ERROR is blank and status is VALID then
— the archive completed correctly.

column error format a55 tru
select dest_id,status,error from v$archive_dest;

— Determine if any error conditions have been reached by querying thev$dataguard_status
— view (view only available in 9.2.0 and above):

column message format a80
select message, timestamp
from v$dataguard_status
where severity in (‘Error’,’Fatal’)
order by timestamp;

— The following query is ran to get the status of the SRL’s on the standby. If the
— primary is archiving with the LGWR process and SRL’s are present (in the correct
— number and size) then we should see a group# active.

select group#,sequence#,bytes,used,archived,status from v$standby_log;

— The above SRL’s should match in number and in size with the ORL’s returned below:

select group#,thread#,sequence#,bytes,archived,status from v$log;

— Query v$managed_standby to see the status of processes involved in the
— configuration.

select process,status,client_process,sequence#,block#,active_agents,known_agents
from v$managed_standby;

— Verify that the last sequence# received and the last sequence# applied to standby
— database.

select al.thrd “Thread”, almax “Last Seq Received”, lhmax “Last Seq Applied”
from (select thread# thrd, max(sequence#) almax
from v$archived_log
where resetlogs_change#=(select resetlogs_change# from v$database)
group by thread#) al,
(select thread# thrd, max(sequence#) lhmax
from v$log_history
where first_time=(select max(first_time) from v$log_history)
group by thread#) lh
where al.thrd = lh.thrd;

— The V$ARCHIVE_GAP fixed view on a physical standby database only returns the next
— gap that is currently blocking redo apply from continuing. After resolving the
— identified gap and starting redo apply, query the V$ARCHIVE_GAP fixed view again
— on the physical standby database to determine the next gap sequence, if there is
— one.

select * from v$archive_gap;

— Non-default init parameters.

set numwidth 5
column name format a30 tru
column value format a50 wra
select name, value
from v$parameter
where isdefault = ‘FALSE’;

spool off

– – – – – – – – – – – – – – – – Script ends here – – – – – – – – – – – – – – – –

Dream features in Oracle Database 12g?

We are wondering some new features from Oracle database 12g in the immediate future:

  1. Oracle database 12g automatic upgrade,do not need human intervention anymore
  2. Oracle database 12g self-organizing Real application cluster, any instance created on any platform can be added into the cluster anytime
  3. Oracle database 12g automatic detected recovery, It can be recovered by itself like terminator!
  4. Oracle database 12g super compression, human don’t need care about compression any more, Oracle will compress data appropriately

to be continued……………

沪ICP备14014813号-2

沪公网安备 31010802001379号