ORA-00600:[32695], [hash aggregation can't be done]错误一例

2010/09/03 by admin 3 Comments

还是那个hash group by算法的问题，日志文件中出现以下记录:

*** ACTION NAME:(SQL 窗口 - 新建) 2010-09-03 14:27:54.594
*** MODULE NAME:(PL/SQL Developer) 2010-09-03 14:27:54.594
*** SERVICE NAME:(HQYDB1) 2010-09-03 14:27:54.594
*** SESSION ID:(3205.17923) 2010-09-03 14:27:54.594
*** 2010-09-03 14:27:54.594
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [32695], [hash aggregation can't be done], [], [], [], [], [], []
Current SQL statement for this session:
create table zou_201008_cell_id as
select /* g_all_cdr02,60 */
calling_num mobile_number,
lac,
lpad(cell_id,5,'0') cell_id,
count(*) c,
sum(call_duration) call_duration,
sum(decode(record_type,'00',1,0)*call_duration) moc_call_duration,
sum(decode(record_type,'01',1,0)*call_duration) mtc_call_duarion
from  g_all_cdr02
where substr(calling_num,1,7) in (select mobile_prefix from zou_mobile_prefix)
group by
calling_num ,
lac,
lpad(cell_id,5,'0')
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              100000000 ? 11055A9A0 ?
ksedmp+0290          bl       ksedst               104A54870 ?
ksfdmp+0018          bl       03F30204
kgeriv+0108          bl       _ptrgl
kgesiv+0080          bl       kgeriv               1050BE654 ? 1050BE604 ?
                                                   0000027E5 ? 080000000 ?
                                                   07FFFFFFF ?
ksesic1+0060         bl       kgesiv               43300000FFFF5310 ?
                                                   4530000000000000 ?
                                                   000000071 ? 000000001 ?
                                                   000000000 ?
qeshPartitionBuildH  bl       01F9CA24
D+04bc
qeshGBYOpenScan2+02  bl       qeshPartitionBuildH  0000027E5 ? 1105C06C0 ?
34                            D
qeshGBYOpenScan+001  bl       qeshGBYOpenScan2     FFFFFFFFFFF5740 ? 11055A938 ?
8                                                  000000000 ? 000000010 ?
qerghFetch+05e8      bl       qeshGBYOpenScan      000001000 ?
rwsfcd+0054          bl       _ptrgl
qerltFetch+036c      bl       03F2EB1C
ctcdrv+4160          bl       01F9C898
opiexe+2884          bl       ctcdrv               100000001 ? 100000001 ?
                                                   110467F30 ?
opiosq0+19f0         bl       opiexe               FFFFFFFFFFF8B50 ?
                                                   2824422142420820 ?
                                                   FFFFFFFFFFF8C10 ?
kpooprx+0168         bl       opiosq0              300000000 ? 000000000 ?
                                                   000000000 ? A4000000000000 ?
kpoal8+0400          bl       kpooprx              FFFFFFFFFFFB464 ?
                                                   FFFFFFFFFFFB068 ?
                                                   1BF000001BF ? 100000001 ?
                                                   000000000 ? A40000000000A4 ?
                                                   000000000 ? 1103878F8 ?
opiodr+0ae0          bl       _ptrgl
ttcpip+1020          bl       _ptrgl
opitsk+1124          bl       01F9F2A0
opiino+0990          bl       opitsk               000000000 ? 000000000 ?
opiodr+0ae0          bl       _ptrgl
opidrv+0484          bl       01F9E0E8
sou2o+0090           bl       opidrv               3C02DC1BBC ? 44065F000 ?
                                                   FFFFFFFFFFFF3A0 ?
opimai_real+01bc     bl       01F9B9F4
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0098         bl       main                 000000000 ? 000000000 ?

--------------------- Binary Stack Dump ---------------------

这次是因为应用人员不了解alter session的作用域，在PL/SQL Developer工具中的不同窗口(也就是不在同一会话中)中执行了”alter session set “_gby_hash_aggregation_enabled” = false;”和涉及group by操作的SQL，并导致了unpublished bug:6471770被触发。
我们比较容易地workaround绕过这个Bug:


/* 在会话级别设置优化参数_gby_hash_aggregation_enabled */

alter session set "_gby_hash_aggregation_enabled" = false;

/* 或者在语句中加入NO_USE_HASH_AGGREGATION的 hint */

select  /*+ NO_USE_HASH_AGGREGATION */ ....

以上提及的unpublished bug:6471770据称在10.2.0.5,11.1.0.7,11.2.0.1版本中被修正了。

Filed Under: Oracle Tagged With: cbo, hash group by, hint, internal error, optimizer, Oracle Bug, sort, SQL tuning

【Oracle数据恢复】ORA-00600[6711]错误一例

2010/09/01 by admin 5 Comments

一套Linux上的10.2.0.4系统，日志中频繁出现ORA-00600[6711]内部错误:

如果自己搞不定可以找ASKMACLEAN专业ORACLE数据库修复团队成员帮您恢复!

Wed Sep  1 21:24:30 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_smon_5622.trc:
ORA-00600: internal error code, arguments: [6711], [4256248], [1], [4256242], [0], [], [], []
Wed Sep  1 21:24:31 2010
Non-fatal internal error happenned while SMON was doing logging scn->time mapping.

MOS上有一个关于6711内部错误十分简单的Note,该文档声称出现6711错误极有可能是部分类型为簇(cluster)的数据字典表存在潜在的讹误，这个Note甚至没有告诉我们该错误argument参数的意义。
不过其实我们可以猜出来,因为是和corruption相关的错误，那么实际上可能关联的几个因素无非是obj#,file#,block#；4256248和4256242 两个数字像极了Data Block Address，把他们当做dba来看待，也就指向了1号数据文件的61938块和61944数据块，我们来看看这些块属于哪个对象：

SQL> set linesize 200;
SQL> select segment_name, segment_type
  2    from dba_extents
  3   where relative_fno = 1
  4     and (61938 between block_id and block_id + blocks or
  5         61944 between block_id and block_id + blocks);

SEGMENT_NAME                                                                      SEGMENT_TYPE
--------------------------------------------------------------------------------- ------------------
SMON_SCN_TO_TIME                                                                  CLUSTER

不出意料是一个cluster，SMON_SCN_TO_TIME是SMON_SCN_TIME表的基簇，SMON_SCN_TIME表用以记录数据库中scn对应的时间戳。我们直接查看用以创建数据字典的sql.bsq文件，可以进一步了解他们的结构:

cat $ORACLE_HOME/rdbms/admin/sql.bsq|grep -A 24 "create cluster smon_scn_to_time"
create cluster smon_scn_to_time (
  thread number                         /* thread, compatibility */
)
/
create index smon_scn_to_time_idx on cluster smon_scn_to_time
/
create table smon_scn_time (
  thread number,                         /* thread, compatibility */
  time_mp number,                        /* time this recent scn represents */
  time_dp date,                          /* time as date, compatibility */
  scn_wrp number,                        /* scn.wrp, compatibility */
  scn_bas number,                        /* scn.bas, compatibility */
  num_mappings number,
  tim_scn_map raw(1200),
  scn number default 0,                  /* scn */
  orig_thread number default 0           /* for downgrade */
) cluster smon_scn_to_time (thread)
/

create unique index smon_scn_time_tim_idx on smon_scn_time(time_mp)
/

create unique index smon_scn_time_scn_idx on smon_scn_time(scn)
/

从以上脚本可以看到这个簇上存在多个索引，我们需要进一步validate验证所有这些对象:

SQL> analyze table SMON_SCN_TIME validate structure;
Table analyzed.

SQL>analyze table SMON_SCN_TIME validate structure cascade;
Table analyzed.

SQL> analyze cluster SMON_SCN_TO_TIME validate structure;
Cluster analyzed.

SQL> analyze cluster SMON_SCN_TO_TIME validate structure cascade;
analyze cluster SMON_SCN_TO_TIME validate structure cascade
*
ERROR at line 1:
ORA-01499: table/index cross reference failure - see trace file

到这里问题已经很清晰了，问题出在SMON_SCN_TO_TIME的索引smon_scn_to_time_idx身上，极有可能是该索引上出现了逻辑讹误。所幸有问题的仅仅是索引，找出问题所在后要解决就显得容易得多了:

SQL> alter index smon_scn_to_time_idx rebuild ;

Index altered.

/* 在索引出现讹误的情况下仅仅rebuild往往是无效的，在我们rebuild的同时告警日志中再次出现了ORA-00600[6711]错误 !!! */

/* 我们需要的彻底把有问题的索引drop掉，并再次创建!!! */

SQL> drop index smon_scn_to_time_idx ;

Index dropped.

SQL> create index smon_scn_to_time_idx on cluster smon_scn_to_time;

Index created.

/* 至此问题解决，告警日志中不再出现错误! * /

/* That's great! * /

Filed Under: Oracle Tagged With: cluster, force open, Oracle Corruption, Oracle索引index, smon_scn_time

Oracle SQL Developer 的一个Bug

2010/08/30 by admin 1 Comment

Oracle SQL Developer是Oracle近几年来推出的一款免费的图形界面SQL PL/SQL开发工具，在11g中它甚至被集成到了Oracle server软件中；如果你平常有用这款IDE的话，大概会注意到它提供了十分实用的Database copy,diff,export功能。

当在Oracle SQL developer中使用sys用户登录数据库，并尝试导出某些schema的ddl定义时，可能会丢失一些constraint约束的ddl信息，如以下例子:

我们尝试导出scott用户的表，索引以及约束的相关ddl，当使用sys用户登录数据库时，Oracle SQL developer给出的SQL:

--------------------------------------------------------
--  File created - Monday-August-30-2010
--------------------------------------------------------
--------------------------------------------------------
--  DDL for Table BONUS
--------------------------------------------------------

  CREATE TABLE "BONUS"
   (	"ENAME" VARCHAR2(10),
	"JOB" VARCHAR2(9),
	"SAL" NUMBER,
	"COMM" NUMBER
   ) ;
--------------------------------------------------------
--  DDL for Table DEPT
--------------------------------------------------------

  CREATE TABLE "DEPT"
   (	"DEPTNO" NUMBER(2,0),
	"DNAME" VARCHAR2(14),
	"LOC" VARCHAR2(13)
   ) ;
--------------------------------------------------------
--  DDL for Table EMP
--------------------------------------------------------

  CREATE TABLE "EMP"
   (	"EMPNO" NUMBER(4,0),
	"ENAME" VARCHAR2(10),
	"JOB" VARCHAR2(9),
	"MGR" NUMBER(4,0),
	"HIREDATE" DATE,
	"SAL" NUMBER(7,2),
	"COMM" NUMBER(7,2),
	"DEPTNO" NUMBER(2,0)
   ) ;
--------------------------------------------------------
--  DDL for Table SALGRADE
--------------------------------------------------------

  CREATE TABLE "SALGRADE"
   (	"GRADE" NUMBER,
	"LOSAL" NUMBER,
	"HISAL" NUMBER
   ) ;

--------------------------------------------------------
--  DDL for Index PK_DEPT
--------------------------------------------------------

  CREATE UNIQUE INDEX "PK_DEPT" ON "DEPT" ("DEPTNO")   ;
--------------------------------------------------------
--  DDL for Index PK_EMP
--------------------------------------------------------

  CREATE UNIQUE INDEX "PK_EMP" ON "EMP" ("EMPNO")   ;

以上SQL中未给出PK和FK约束的ddl信息。

如果使用scott用户登录数据库，导出自身schema的table,index,constraint定义信息时:

--------------------------------------------------------
--  File created - Monday-August-30-2010
--------------------------------------------------------
--------------------------------------------------------
--  DDL for Table BONUS
--------------------------------------------------------

  CREATE TABLE "BONUS"
   (	"ENAME" VARCHAR2(10),
	"JOB" VARCHAR2(9),
	"SAL" NUMBER,
	"COMM" NUMBER
   ) ;
--------------------------------------------------------
--  DDL for Table DEPT
--------------------------------------------------------

  CREATE TABLE "DEPT"
   (	"DEPTNO" NUMBER(2,0),
	"DNAME" VARCHAR2(14),
	"LOC" VARCHAR2(13)
   ) ;
--------------------------------------------------------
--  DDL for Table EMP
--------------------------------------------------------

  CREATE TABLE "EMP"
   (	"EMPNO" NUMBER(4,0),
	"ENAME" VARCHAR2(10),
	"JOB" VARCHAR2(9),
	"MGR" NUMBER(4,0),
	"HIREDATE" DATE,
	"SAL" NUMBER(7,2),
	"COMM" NUMBER(7,2),
	"DEPTNO" NUMBER(2,0)
   ) ;
--------------------------------------------------------
--  DDL for Table SALGRADE
--------------------------------------------------------

  CREATE TABLE "SALGRADE"
   (	"GRADE" NUMBER,
	"LOSAL" NUMBER,
	"HISAL" NUMBER
   ) ;

--------------------------------------------------------
--  Constraints for Table EMP
--------------------------------------------------------

  ALTER TABLE "EMP" ADD CONSTRAINT "PK_EMP" PRIMARY KEY ("EMPNO") ENABLE;

--------------------------------------------------------
--  Constraints for Table DEPT
--------------------------------------------------------

  ALTER TABLE "DEPT" ADD CONSTRAINT "PK_DEPT" PRIMARY KEY ("DEPTNO") ENABLE;
--------------------------------------------------------
--  DDL for Index PK_EMP
--------------------------------------------------------

  CREATE UNIQUE INDEX "PK_EMP" ON "EMP" ("EMPNO")   ;
--------------------------------------------------------
--  DDL for Index PK_DEPT
--------------------------------------------------------

  CREATE UNIQUE INDEX "PK_DEPT" ON "DEPT" ("DEPTNO")  ;

--------------------------------------------------------
--  Ref Constraints for Table EMP
--------------------------------------------------------

  ALTER TABLE "EMP" ADD CONSTRAINT "FK_DEPTNO" FOREIGN KEY ("DEPTNO")
	  REFERENCES "DEPT" ("DEPTNO") ENABLE;

Oracle SQL developer给出了正确的SQL，包含FK和PK约束。
这个Bug在8月(2010 Aug)被确认将影响此前所有版本的Oracle SQL developer。

Filed Under: Oracle Tagged With: constraint, Oracle Bug, sql developer

Oracle 11g中数据库能有多大？

2010/08/29 by admin Leave a Comment

根据11g release 2文档《Oracle® Database Reference 11g Release 2 (11.2)》，我们可以计算出11g中数据库大小的一个理论极限:

Physical Database Limits

Item	Type of Limit	Limit Value
Database Block Size	Minimum	`2048` bytes; must be a multiple of operating system physical block size
Database Block Size	Maximum	Operating system dependent; never more than `32` KB
Database Blocks	Minimum in initial extent of a segment	2 blocks
Database Blocks	Maximum per datafile	Platform dependent; typically 2²² – 1 blocks
Controlfiles	Number of control files	`1` minimum; `2` or more (on separate devices) strongly recommended
Controlfiles	Size of a control file	Dependent on operating system and database creation options; maximum of `20,000` x (database block size)
Database files	Maximum per tablespace	Operating system dependent; usually `1022`
Database files	Maximum per database	65533 May be less on some operating systems Limited also by size of database blocks and by the `DB_FILES` initialization parameter for a particular instance
Database extents	Maximum per dictionary managed tablespace	`4` GB * physical block size (with K/M modifier); `4` GB (without K/M modifier)
Database extents	Maximum per locally managed (uniform) tablespace	`2` GB * physical block size (with K/M modifier); `2` GB (without K/M modifier)
Database file size	Maximum	Operating system dependent. Limited by maximum operating system file size; typically 2²² or `4` MB blocks
`MAXEXTENTS`	Default value	Derived from tablespace default storage or `DB_BLOCK_SIZE` initialization parameter
`MAXEXTENTS`	Maximum	Unlimited
Redo Log Files	Maximum number of logfiles	Limited by value of `MAXLOGFILES` parameter in the `CREATE DATABASE` statementControl file can be resized to allow more entries; ultimately an operating system limit
Redo Log Files	Maximum number of logfiles per group	Unlimited
Redo Log File Size	Minimum size	`4 MB`
Redo Log File Size	Maximum Size	Operating system limit; typically `2` GB
Tablespaces	Maximum number per database	`64 K` Number of tablespaces cannot exceed the number of database files because each tablespace must include at least one file
Bigfile Tablespaces	Number of blocks	A bigfile tablespace contains only one datafile or tempfile, which can contain up to approximately 4 billion ( 2³² ) blocks. The maximum size of the single datafile or tempfile is 128 terabytes (TB) for a tablespace with 32 K blocks and 32 TB for a tablespace with 8 K blocks.
Smallfile (traditional) Tablespaces	Number of blocks	A smallfile tablespace is a traditional Oracle tablespace, which can contain 1022 datafiles or tempfiles, each of which can contain up to approximately 4 million (2²²) blocks.
External Tables file	Maximum size	Dependent on the operating system.An external table can be composed of multiple files.

11g中当采用32 K块时单个表空间最大可以扩展到128 TB，而一个数据库最多拥有64 K个表空间，则可以得出单个数据库的理论最大值为128TB *64 K= 8192 PB= 8EB，该数据与Oracle 10g 数据库的容量持平，换而言之11g中数据库的容量并未增加。

Youtube视频网站一年产生的信息量大约在800TB，世界上1000强的公司所有数据库的总数据量约合1EB(平均一个公司1PB)，而64位可寻址内存的最大容量为16EB，理论上我们可以把所有这些少于8EB的信息存储到您的笔记本上安装的Oracle数据库中，而且可以将它们完全缓存到内存中以供读取。

在几年前没有人相信TB级别的数据库会在不久后普及，但是我们今天可以很自信地说随着TB级别的数据库不断涌现，无论从硬件角度或者软件角度支撑该数量级的数据库的solution都已经十分成熟了。

在11g未来的日子里，我们将注目于EB级数据库的出现！

Filed Under: Oracle Tagged With: database, ORA, Oracle, Oracle 11g新特性, physical, RAC

发一个Oracle Dba招聘启事

2010/08/28 by admin Leave a Comment

公司招聘初中级Oracle Dba。主要向客户提供oracle数据库技术支持服务。
工作地点在上海，具体薪资视乎应聘者的技术水平。

具体要求：
1.熟悉Oracle基础理论和知识，对Oracle技术有持久的兴趣
2.熟悉数据库常见错误和简单性能问题的诊断
3.不对经验作强制要求，当然有经验更佳
4.熟悉unix/solaris操作系统/存储等硬件环境者优先
5.拥有ocp证书者优先

如有意向，请将简历发至 maclean.liu@parnassusdata.com
该职位长期招聘中！

Filed Under: Oracle Tagged With: ORA, Oracle, RAC

Database Force open example

2010/08/27 by admin 3 Comments

帮网友强制打开了一个没有备份的测试库，这个库没有备份也没有打开归档，因为之前也出现过active日志文件损毁，一直使用隐式参数才能正常打开:

                 _allow_resetlogs_corruption= TRUE

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线： 13764045638 QQ号:47079569 邮箱：service@parnassusdata.com

这次一开始这个库报ORA-600[2662]错误:

Mon Aug 23 09:37:00 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_852096.trc:
ORA-00600: internal error code, arguments: [2662], [0], [130131504], [0], [130254136], [4264285], [], []
Mon Aug 23 09:37:02 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_852096.trc:
ORA-00600: internal error code, arguments: [2662], [0], [130131506], [0], [130254136], [4264285], [], []

ORA-600 [2662] “Block SCN is ahead of Current SCN”错误是当数据块中的SCN领先于current SCN，由于后台进程或服务进程都会比对UGA中的dependent SCN和数据库当前的SCN，如果数据库当前SCN小于dependent SCN，那么该进程就会报ORA-600 [2662]错误，如果遭遇该错误的是服务进程，那么服务进程一般会异常终止；如果遭遇该错误的是后台进程譬如SMON,则会导致实例CRASH。
ORA-600 [2662]错误可以能由以下几种情况引起:
1.启用隐含参数_ALLOW_RESETLOGS_CORRUPTION后，以resetlogs形式打开数据库；这种情况下发生2662错误，根本原因是没有完全前滚导致控制文件中的SCN滞后于数据块中的SCN。
2.硬件故障导致数据库没法写控制文件和联机日志文件
3.错误的部分恢复数据库
4.恢复了控制文件，但是没有使用recover database using backup controlfile进行恢复
5.数据库crash后设置了_DISABLE_LOGGING隐含参数
6.在并行服务器环境中DLM存在问题

该错误的5个参数的具体含义如下:
ARGUMENTS:
Arg [a] Current SCN WRAP
Arg [b] Current SCN BASE
Arg [c] dependent SCN WRAP
Arg [d] dependent SCN BASE
Arg [e] Where present this is the DBA where the dependent SCN came from.

我们的case当中dependent SCN为130254136，而当前SCN为130131506，其差值为122630；从以上告警日志中可以看到数据库的当前SCN是在不断缓慢增长的，当我们遭遇到2662错误时，很滑稽的一点是只要不断重启数据库保持current SCN的增长，一段时间后2662错误会不药而愈。当然我们也可以不用这种笨办法，10015事件可以帮助我们调整数据库当前SCN:

/* 当数据库处于mount状态，可以使用10015事件来调整scn */

alter session set events '10015 trace name adjust_scn level 1';

/* 这里可以设置level 2..10等 (level 1是在每次打开数据库时scn增加1000k)*/

/* 需要注意的是10g某些版本不同于9i，需要设置隐式参数_allow_error_simulation，才能真正增进scn */

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE 10.2.0.4.0 Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

SQL> col current_scn format 999,999,999,999

SQL> select current_scn from v$database;
CURRENT_SCN
-----------
1141408

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 1653518336 bytes
Fixed Size 2213896 bytes
Variable Size 989857784 bytes
Database Buffers 654311424 bytes
Redo Buffers 7135232 bytes
Database mounted.

SQL> alter session set events '10015 trace name adjust_scn level 1';
Session altered.

SQL> alter database open;
Database altered.

SQL> select current_scn from v$database;
CURRENT_SCN
-----------
1142031

/* 可以看到current_scn并未大量增加，10.2.0.4上默认10015 adjust_scn不被触发 */

SQL> alter system set "_allow_error_simulation"=true scope=spfile;
System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup mount;
ORACLE instance started.
Total System Global Area 1653518336 bytes
Fixed Size 2213896 bytes
Variable Size 989857784 bytes
Database Buffers 654311424 bytes
Redo Buffers 7135232 bytes
Database mounted.

SQL> alter session set events '10015 trace name adjust_scn level 1';
Session altered.

SQL> alter database open;
Database altered.

SQL>select current_scn from v$database;
CURRENT_SCN
----------------
1,073,741,980

在接手之前该网友已经通过反复重启数据库将数据库的当前SCN提高到dependent SCN的127037138；原以为这样就可以打开数据库了，谁知道又出现了一下错误:

Wed Aug 25 07:43:53 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_929958.trc:
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Wed Aug 25 07:43:53 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_929958.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Wed Aug 25 07:43:53 2010
Error 704 happened during db open, shutting down database

bootstrap自举过程中遭遇了ORA-600 [4000]错误，该错误一般当Oracle尝试读取数据字典(主要是undo$基表)中记录的USN对应的回滚段失败引起.，通过设置隐式参数_corrupted_rollback_segments可以一定程度上规避该错误，强制打开数据库,其Argument[a]代表造成读取失败的USN(undo segment number)，但实际上有问题的回滚段可能不止这一个:

/* 通过strings工具从system表空间上找回各回滚段的名字  */
$strings system.dbf |grep _SYSSMU|less
_SYSSMU1$
_SYSSMU2$
_SYSSMU3$
_SYSSMU4$
_SYSSMU5$
_SYSSMU6$
_SYSSMU7$
_SYSSMU8$
_SYSSMU9$
.........
alter system set "_corrupted_rollback_segments"='(_SYSSMU1$, _SYSSMU2$, _SYSSMU3$, _SYSSMU4$, _SYSSMU5$, _SYSSMU6$, _SYSSMU7$, _SYSSMU8$, _SYSSMU9$, _SYSSMU10$, _SYSSMU11$, _SYSSMU12$)' scope=spfile;
System altered.

/* 即便设置了_corrupted_rollback_segments隐式参数，也还有一定概率遭遇4000错误，尝试加上10513事件，并多次重启数据库 */

SQL> alter system set event='10513 trace name context forever,level 2' scope=spfile;
System altered.

/* 再次出现4000 错误 */
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_1016014.trc:
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Thu Aug 26 09:43:39 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_1016014.trc:
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00600: internal error code, arguments: [4000], [8], [], [], [], [], [], []
Thu Aug 26 09:43:39 2010
Error 704 happened during db open, shutting down database

/* 再次重启后发现4000错误不再出现 * /

再次重启发现不再出现ORA-600[4000]错误，但在字典检查阶段Oracle认为数据文件227不匹配于当前的incarnation:

Thu Aug 26 11:13:22 2010
Dictionary check beginning
Thu Aug 26 09:46:00 2010
Errors in file /oracle/QAS/saptrace/usertrace/qas_ora_897162.trc:
ORA-01177: data file does not match dictionary - probably old incarnation
ORA-01110: data file 227: '/oracle/QAS/sapdata2/qas_192/qas.data196'
Error 1177 happened during db open, shutting down database
USER: terminating instance due to error 1177
Instance terminated by USER, pid = 897162

初步判断出现ORA-01177可能为2种可能性:
1.数据字典出现讹误，227号文件对应的incarnation信息不正确
2.在之前的某次resetlogs open过程中，227号文件头由于某些原因没有正确更新incarnation信息

针对这样的情况如果一定要找回该数据文件上的数据的话只能通过手动修改数据字典或文件头，当然也可以尝试使用一些直接从数据文件上抽取数据的工具。
因为这是一次友情协助，就没有继续深入下去，通过重建控制文件并跳过该数据文件解决了:

CREATE CONTROLFILE REUSE DATABASE "QAS" RESETLOGS  NOARCHIVELOG
--  SET STANDBY TO MAXIMIZE PERFORMANCE
    MAXLOGFILES 255
    MAXLOGMEMBERS 3
    MAXDATAFILES 254
    MAXINSTANCES 50
    MAXLOGHISTORY 36302
LOGFILE
  GROUP 1 (
    '/oracle/QAS/redolog/redolog11A.dbf',
    '/oracle/QAS/redolog/redolog11B.dbf'
  ) SIZE 500M,
  GROUP 2 (
    '/oracle/QAS/redolog/redolog12A.dbf',
    '/oracle/QAS/redolog/redolog12B.dbf'
  ) SIZE 500M
-- STANDBY LOGFILE
DATAFILE
  '/oracle/QAS/sapdata1/system_1/system.data1',
........
  '/oracle/QAS/sapdata2/qas_192/qas.data195'
CHARACTER SET WE8DEC
Thu Aug 26 14:04:50 2010
Successful mount of redo thread 1, with mount id 2117500093
Thu Aug 26 14:04:50 2010
Completed: CREATE CONTROLFILE REUSE DATABASE "QAS" RESETLOGS
Thu Aug 26 14:05:05 2010
alter database mount
Thu Aug 26 14:05:05 2010
ORA-1100 signalled during: alter database mount...
Thu Aug 26 14:05:15 2010
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 1125281471596
Resetting resetlogs activation ID 0 (0x0)
Online log 1 of thread 1 was previously cleared
Thu Aug 26 14:05:36 2010
Assigning activation ID 2117500093 (0x7e367cbd)
Thread 1 opened at log sequence 1
  Current log# 2 seq# 1 mem# 0: /oracle/QAS/redolog/redolog12A.dbf
  Current log# 2 seq# 1 mem# 1: /oracle/QAS/redolog/redolog12B.dbf
Successful open of redo thread 1
Thu Aug 26 14:05:36 2010
SMON: enabling cache recovery
Thu Aug 26 14:05:36 2010
Dictionary check beginning
Tablespace 'PSAPTEMP' #2 found in data dictionary,
but not in the controlfile. Adding to controlfile.
File #227 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00227' in the controlfile.
This file can no longer be recovered so it must be dropped.
File #228 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00228' in the controlfile.
This file can no longer be recovered so it must be dropped.
File #229 found in data dictionary but not in controlfile.
Creating OFFLINE file 'MISSING00229' in the controlfile.
This file can no longer be recovered so it must be dropped.
Dictionary check complete
Thu Aug 26 14:05:38 2010
SMON: enabling tx recovery
Thu Aug 26 14:05:38 2010
Database Characterset is WE8DEC
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: alter database open resetlogs

这个case告诉我们，测试库并不一定就不重要，测试库也是需要备份的。

Filed Under: Oracle Tagged With: database, internal error, ORA, ora-00600, Oracle, RAC, sql, standby

【Oracle数据恢复】ORA-600[4194]错误一例

2010/08/26 by admin 5 Comments

ORA-600[4194]内部错误一般由重做记录与回滚记录不匹配引发。Oracle在验证Undo record number时，会对比redo change 和回滚段中的undo record number，若发现2者存在差异则报该4194错误。其错误argument[a][b]，a代表回滚块中的最大undo record number，b代表重做日志中记录的undo record number。这个错误可能由回滚段或者redo log日志文件讹误引起。

ORA-00600[4194]错误的根本原因是 redo记录与回滚段(rollback/undo)记录之间的不一致。当ORACLE在验证undo记录时相对应的变化需要应用到undo数据块的最大undo记录上，此时若检验出错则会报ORA-00600[4194]

此错误不像ORA-600[2662]或ORA-600[4000]错误那样必然导致数据库无法打开，因为它很少出现在前滚阶段；当数据库被打开，smon开始执行事务恢复或一些回滚段的管理工作时则很有可能触发该错误。

ORA-600[4194]的2个的含义：

Arg [a] Maximum Undo record number in Undo block
Arg [b] Undo record number from Redo block

这个ORA-600[4194] 报错属于ORACLE内核从cache层到事务undo处理，可能的影响是进程失败或者可能的回滚段坏块。

可能的bug 包括：

8240762 10.2.0.5,
11.1.0.7.10,
11.2.0.1
Undo corruptions with ORA-600 [4193]/ORA-600 [4194] or ORA-600 [4137] /
SMON may spin to recover transaction

3210520 9.2.0.5, 10.1.0.2 OERI[kjccqmg:esm] / OERI[4194] / corruption possible in RAC

792610 8.0.6.0, 8.1.6.0 Rollback segment corruption

对于非自举对象non-bootstrap对象对应的undo记录可以通过如下方法搞定，如果涉及到的对象是bootstrap系统对象则可能需要手动 bbed来修复，如果自己搞不定可以找ASKMACLEAN专业数据库修复团队成员帮您恢复：

来具体看一下错误记录:

Thu Aug 26 18:58:50 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_smon_6587.trc:
ORA-01595: error freeing extent (3) of rollback segment (4))
ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []
Thu Aug 26 18:58:50 2010
..............
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_j000_6630.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2 change 1617922 time 08/26/2010 18:35:39
ORA-00334: archived log: '/s01/10gdb/flash_recovery_area/YOUYUS/onlinelog/o1_mf_3_65psr4on_.log'
Thu Aug 26 19:00:31 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_j000_6630.trc:
ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []
Thu Aug 26 19:00:34 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_j000_6630.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2 change 1617922 time 08/26/2010 18:35:39
ORA-00334: archived log: '/s01/10gdb/flash_recovery_area/YOUYUS/onlinelog/o1_mf_3_65psr4on_.log'
ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []
Thu Aug 26 19:00:35 2010
Errors in file /s01/10gdb/admin/YOUYUS/bdump/youyus_j000_6630.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 2 change 1617922 time 08/26/2010 18:35:39
ORA-00334: archived log: '/s01/10gdb/flash_recovery_area/YOUYUS/onlinelog/o1_mf_3_65psr4on_.log'

ORA-00600: internal error code, arguments: [4194], [53], [41], [], [], [], [], []

如果你因为ORA-600[4194]错误导致数据库无法打开，那么可以尝试设置以下事件:

SQL> alter system set event='10513 trace name context forever,level 2 : 10512 trace name context forever,level 1: 10511 trace name context forever,level 2: 10510 trace name context forever,level 1' scope=spfile;
System altered.

/* 10513事件用以阻止SMON在启动数据库后执行事务恢复(transaction recovery) */
/* 10512事件用以阻止SMON shrink rollback segment */
/* 10511事件用以阻止SMON check to cleanup undo dictionary */
/* 10500事件用以阻止SMON check to offline pending offline rollback segment */

SQL> alter system set undo_management=MANUAL scope=spfile;
System altered.

SQL> shutdown immediate;
ORA-03113: end-of-file on communication channel

SQL> startup mount;
ORACLE instance started.

Total System Global Area 2634022912 bytes
Fixed Size 2086288 bytes
Variable Size 2382367344 bytes
Database Buffers 234881024 bytes
Redo Buffers 14688256 bytes
Database mounted.
SQL> alter database open;

Database altered.

SQL> create undo tablespace undoc datafile size 300M;

SQL> alter system set undo_management=AUTO scope=spfile;
System altered.

SQL> alter system set undo_tablespace=undoc scope=spfile;
System altered.

SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 2634022912 bytes
Fixed Size 2086288 bytes
Variable Size 2382367344 bytes
Database Buffers 234881024 bytes
Redo Buffers 14688256 bytes
Database mounted.

SQL> alter database open;
Database altered.

/* 通过重建undo表空间可以避免一些4194错误，但不是全部 */

/* 这个库目前处于随时会crash的不可控状态，我们必须要导出数据并导入到新库中 * /

/* 这种情况下direct方式 可能可以规避一些意外错误 */

[maclean@rh2 dump]$ exp maclean/maclean file=full_maclean.dmp owner=maclean direct=y statistics=none
Export: Release 10.2.0.4.0 - Production on Thu Aug 26 21:18:40 2010
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Export done in ZHS16GBK character set and UTF8 NCHAR character set

About to export specified users ...
. exporting pre-schema procedural objects and actions
. exporting foreign function library names for user MACLEAN
. exporting PUBLIC type synonyms
. exporting private type synonyms
. exporting object type definitions for user MACLEAN
About to export MACLEAN's objects ...
. exporting database links
. exporting sequence numbers
. exporting cluster definitions
. about to export MACLEAN's tables via Direct Path ...
Table SYS_EXPORT_TABLE_01 will be exported in conventional path.
. . exporting table SYS_EXPORT_TABLE_01 256 rows exported
Table SYS_EXPORT_TABLE_02 will be exported in conventional path.
. . exporting table SYS_EXPORT_TABLE_02 257 rows exported
Table SYS_EXPORT_TABLE_03 will be exported in conventional path.
..............
exporting refresh groups and children
. exporting dimensions
. exporting post-schema procedural objects and actions
. exporting statistics
Export terminated successfully with warnings.

/* we are lucky! */

Filed Under: Oracle Tagged With: crash, force open, hidden parameter, internal error, ora-00600, recover

如何修复重编译Datapump工具expdp/impdp

2010/08/26 by admin Leave a Comment

数据泵工具expdp/impdp是10g中引发的服务器端导入导出外部工具，虽然是外部的binary，但是实际expdp/impdp都依赖于内部的PL/SQL package主要是(dbms_datapump)，在很多情况下我们需要修复或重新加载Datapump工具，方法如下：

对于版本10.1:

1. Catdp.sql orders the installation of all its components including 
   the Metadata API which was previously installed separately.
   By default catproc.sql invoke this script.

SQL >@ $ORACLE_HOME/rdbms/admin/catdp.sql

2. dbmspump.sql will create DBMS procedures for dataPUMP

SQL >@ $ORACLE_HOME/rdbms/admin/dbmspump.sql

对于版本10.2:

1. Catdph.sql will Re-Install DataPump types and views

SQL >@ $ORACLE_HOME/rdbms/admin/catdph.sql 

Note: If XDB is installed, then it is required to run "catmetx.sql" script also.

Use this code to verify if XDB is installed:

SQL> select substr(comp_name,1,30) comp_name, substr(comp_id,1,10)
     comp_id,substr(version,1,12) version,status from dba_registry;

Sample output if XDB installed,

Oracle XML Database            XDB        -version-   VALID

2. prvtdtde.plb will Re-Install tde_library packages

SQL >@ $ORACLE_HOME/rdbms/admin/prvtdtde.plb 

3. Catdpb.sql will Re-Install DataPump packages

SQL >@ $ORACLE_HOME/rdbms/admin/catdpb.sql 

4.Dbmspump.sql will Re-Install DBMS DataPump objects

SQL >@ $ORACLE_HOME/rdbms/admin/dbmspump.sql 

5. To recompile  invalid objects, if any

SQL >@ $ORACLE_HOME/rdbms/admin/utlrp.sql

对于版本11g :

1. Catproc.sql 

SQL >@ $ORACLE_HOME/rdbms/admin/catproc.sql 

2. To recompile invalid objects, if any

SQL >@ $ORACLE_HOME/rdbms/admin/utlrp.sql

示例使用方法FOR 10.2.0.5:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

select substr(comp_name,1,30) comp_name, substr(comp_id,1,10)
     comp_id,substr(version,1,12) version,status from dba_registry;

确认XDB未安装

@$ORACLE_HOME/rdbms/admin/catdph.sql 

....

Package created.

Grant succeeded.

SQL> @ $ORACLE_HOME/rdbms/admin/prvtdtde.plb 

Library created.

No errors.

Package created.

Synonym created.

Package created.

@ $ORACLE_HOME/rdbms/admin/catdpb.sql 

@ $ORACLE_HOME/rdbms/admin/dbmspump.sql 

@ $ORACLE_HOME/rdbms/admin/utlrp.sql

同时可以把expdp、impdp的binary重新编译一遍

cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk iexpdp iimpdp

- Linking Datapump Export utility (expdp)
rm -f /s01/oracle/product/10.2.0.5/db_1/rdbms/lib/expdp
gcc -o /s01/oracle/product/10.2.0.5/db_1/rdbms/lib/expdp -L/s01/oracle/product/10.2.0.5/db_1/rdbms/lib/ -L/s01/oracle/product/10.2.0.5/db_1/lib/ -L/s01/oracle/product/10.2.0.5/db_1/lib/stubs/   /s01/oracle/product/10.2.0.5/db_1/rdbms/lib/s0udexp.o  /s01/oracle/product/10.2.0.5/db_1/rdbms/lib/defopt.o -ldbtools10 -lclntsh  `cat /s01/oracle/product/10.2.0.5/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /s01/oracle/product/10.2.0.5/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lclient10 -lnnetd10  -lvsn10 -lcommon10 -lgeneric10 -lmm -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /s01/oracle/product/10.2.0.5/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /s01/oracle/product/10.2.0.5/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lclient10 -lnnetd10  -lvsn10 -lcommon10 -lgeneric10   -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lclient10 -lnnetd10  -lvsn10 -lcommon10 -lgeneric10 -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10   `cat /s01/oracle/product/10.2.0.5/db_1/lib/sysliblist` -Wl,-rpath,/s01/oracle/product/10.2.0.5/db_1/lib -lm    `cat /s01/oracle/product/10.2.0.5/db_1/lib/sysliblist` -ldl -lm   -L/s01/oracle/product/10.2.0.5/db_1/lib
mv -f /s01/oracle/product/10.2.0.5/db_1/bin/expdp /s01/oracle/product/10.2.0.5/db_1/bin/expdpO
mv /s01/oracle/product/10.2.0.5/db_1/rdbms/lib/expdp /s01/oracle/product/10.2.0.5/db_1/bin/expdp
chmod 751 /s01/oracle/product/10.2.0.5/db_1/bin/expdp

 - Linking Datapump Import utility (impdp)
rm -f /s01/oracle/product/10.2.0.5/db_1/rdbms/lib/impdp
gcc -o /s01/oracle/product/10.2.0.5/db_1/rdbms/lib/impdp -L/s01/oracle/product/10.2.0.5/db_1/rdbms/lib/ -L/s01/oracle/product/10.2.0.5/db_1/lib/ -L/s01/oracle/product/10.2.0.5/db_1/lib/stubs/   /s01/oracle/product/10.2.0.5/db_1/rdbms/lib/s0udimp.o  /s01/oracle/product/10.2.0.5/db_1/rdbms/lib/defopt.o -ldbtools10 -lclntsh  `cat /s01/oracle/product/10.2.0.5/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /s01/oracle/product/10.2.0.5/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lclient10 -lnnetd10  -lvsn10 -lcommon10 -lgeneric10 -lmm -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat /s01/oracle/product/10.2.0.5/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat /s01/oracle/product/10.2.0.5/db_1/lib/ldflags`    -lnsslb10 -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lclient10 -lnnetd10  -lvsn10 -lcommon10 -lgeneric10   -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lclient10 -lnnetd10  -lvsn10 -lcommon10 -lgeneric10 -lsnls10 -lnls10  -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10   `cat /s01/oracle/product/10.2.0.5/db_1/lib/sysliblist` -Wl,-rpath,/s01/oracle/product/10.2.0.5/db_1/lib -lm    `cat /s01/oracle/product/10.2.0.5/db_1/lib/sysliblist` -ldl -lm   -L/s01/oracle/product/10.2.0.5/db_1/lib
mv -f /s01/oracle/product/10.2.0.5/db_1/bin/impdp  /s01/oracle/product/10.2.0.5/db_1/bin/impdpO
mv /s01/oracle/product/10.2.0.5/db_1/rdbms/lib/impdp /s01/oracle/product/10.2.0.5/db_1/bin/impdp
chmod 751 /s01/oracle/product/10.2.0.5/db_1/bin/impdp

[oracle@vrh8 lib]$ ls -l $ORACLE_HOME/bin/*pdp
-rwxr-x--x 1 oracle oinstall 228377 Aug 26 09:15 /s01/oracle/product/10.2.0.5/db_1/bin/expdp
-rwxr-x--x 1 oracle oinstall 233704 Aug 26 09:15 /s01/oracle/product/10.2.0.5/db_1/bin/impdp

Filed Under: Oracle Tagged With: datapump, dbms_datapump

SGA_MAX_SIZE,SGA_TARGET以及PRE_PAGE_SGA参数

2010/08/26 by admin 4 Comments

10g引入ASMM后SGA_TARGET取代shared_pool_size,db_cache_size等参数，成为DBA关注的主要SGA内存管理参数；有不少情况下SGA_TARGET参数会设置为一个小于SGA_MAX_SIZE的值(这样做在多实例情况下更具灵活性)。但不少人会问，这样岂不是要浪费一部分物理内存吗？Oracle会为实例分配SGA_MAX_SIZE大小的内存的，SGA_TARGET要设得和SGA_MAX_SIZE一样大才合理啊！

让我们来看看实际的情况:

SQL> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

/*  linux上的10.2.0.4  */

SQL> show parameter sga
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     FALSE
pre_page_sga                         boolean     FALSE
sga_max_size                         big integer 3600M
sga_target                           big integer 368M

SQL> col component for a25;
SQL> select component, current_size / 1024 / 1024, last_oper_type
  2    from v$sga_dynamic_components
  3   where current_size != 0;

COMPONENT                 CURRENT_SIZE/1024/1024 LAST_OPER_TYP
------------------------- ---------------------- -------------
shared pool                                   96 STATIC
large pool                                    16 STATIC
java pool                                     16 STATIC
DEFAULT buffer cache                         224 INITIALIZING

/* 此时的SGA_TARGET为368M，而SGA_MAX_SIZE为3600M */

/* 我们来看一下实际的物理内存使用情况  */

/* 以root用户登录，因为我们需要用到清理文件系统缓存的命令 */

[maclean@rh2 ~]$ su - root
Password:

[root@rh2 ~]# sync
[root@rh2 ~]# sync

/* sync 命令用以写出文件系统脏缓存，类似于Oracle的checkpoint手动检查点 */

[root@rh2 ~]# echo 3 > /proc/sys/vm/drop_caches
[root@rh2 ~]# echo 3 > /proc/sys/vm/drop_caches

/*  将proc文件系统下的drop_caches文件置为3，清理文件系统缓存，以免其干扰我们的实验 */

[root@rh2 ~]# free
             total       used       free     shared    buffers     cached
Mem:       4046352     429328    3617024          0        296     176100
-/+ buffers/cache:     252932    3793420
Swap:     10241428          0   10241428

/*  4g内存中仅有419M处于使用状态，其他均为free。可见Oracle没有为实例分配大小等于SGA_MAX_SIZE的内存空间，而接近于SGA_TARGET_SIZE的大小 */
[root@rh2 ~]# ps -ef|grep pmon|grep -v grep
maclean   6361     1  0 18:35 ?        00:00:00 ora_pmon_YOUYUS

[root@rh2 ~]# pmap -x 6361
6361:   ora_pmon_YOUYUS
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000  100412    7300       0 r-x--  oracle
000000000680f000     544     180      64 rwx--  oracle
0000000006897000     148     104     104 rwx--    [ anon ]
000000001e9d0000     672     532     532 rwx--    [ anon ]
0000000060000000 3688448    1044     388 rwxs-    [ shmid=0x390005 ]

/* 利用pmap工具探测Oracle后台进程的内存地址空间，可以看到这里虚拟共享内存段(也就是SGA)的大小为3602M */

[root@rh2 ~]# ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x74018366 3112960    root      600        4          0
0x00000000 3473409    root      644        80         2
0x74018365 3080194    root      600        4          0
0x00000000 3506179    root      644        16384      2
0x00000000 3538948    root      644        280        2
0x1cdd16e8 3735557    maclean   640        3776970752 15

/* 使用ipcs 命令也可以观察到这个大小为3602M的共享内存段 */

/* 可以猜测Oracle在这里使用malloc函数或类似的方法实现虚拟内存的分配，没有像calloc函数那样对空间中的每一位都初始为0,保证了不浪费物理内存  */

/*  我们动态修改SGA_TARGET参数来看看*  /

SQL> alter system set sga_target=3000M;

System altered.

SQL> select component, current_size / 1024 / 1024, last_oper_type
  2    from v$sga_dynamic_components
  3   where current_size != 0;

COMPONENT                 CURRENT_SIZE/1024/1024 LAST_OPER_TYP
------------------------- ---------------------- -------------
shared pool                                   96 STATIC
large pool                                    16 STATIC
java pool                                     16 STATIC
DEFAULT buffer cache                        2864 INITIALIZING

/*   BUFFER_CACHE 的空间大幅增加 * /

SQL> !free
             total       used       free     shared    buffers     cached
Mem:       4046352    3187784     858568          0        536    2889920
-/+ buffers/cache:     297328    3749024
Swap:     10241428          0   10241428

/* used mem部分上升到3113M，随着SGA_TARGET的动态增长而增长了 */

/* 如果我们改小SGA_TARGET呢？ */

SQL> alter system set sga_target=368M;

System altered.

SQL> select component, current_size / 1024 / 1024, last_oper_type
  2    from v$sga_dynamic_components
  3   where current_size != 0;

COMPONENT                 CURRENT_SIZE/1024/1024 LAST_OPER_TYP
------------------------- ---------------------- -------------
shared pool                                   96 STATIC
large pool                                    16 STATIC
java pool                                     16 STATIC
DEFAULT buffer cache                         224 SHRINK

/* 可以看到BUFFER CACHE最近执行了SHRINK收缩操作，SIZE下降到224M */

QL> !free
             total       used       free     shared    buffers     cached
Mem:       4046352    3204608     841744          0       1596    2912888
-/+ buffers/cache:     290124    3756228
Swap:     10241428          0   10241428

/* 此时OS层仍认为used memory部分为3130M;但可以放心，它们是可被其他进程复用的  * /

官方对pre_page_sga参数的定义是”PRE_PAGE_SGA determines whether Oracle reads the entire SGA into memory at instance startup. Operating system page table entries are then prebuilt for each page of the SGA. This setting can increase the amount of time necessary for instance startup, but it is likely to decrease the amount of time necessary for Oracle to reach its full performance capacity after startup.” 它决定Oracle是否在启动阶段将整个SGA读入内存，实际上在启动阶段SGA的内容是十分稀疏的，当pre_page_sga为TRUE时Oracle所要做的是向OS所要SGA_MAX_SIZE大小的实际物理页。

我们来看一下这个参数的实际效果:

SQL> alter system set sga_max_size=2500M scope=spfile;
System altered.

SQL> alter system set pre_page_sga=true scope=spfile;
System altered.

SQL> startup force ;
ORACLE instance started.

Total System Global Area 2634022912 bytes
Fixed Size                  2086288 bytes
Variable Size            2382367344 bytes
Database Buffers          234881024 bytes
Redo Buffers               14688256 bytes
Database mounted.
Database opened.

[root@rh2 ~]# echo 3 > /proc/sys/vm/drop_caches

SQL> show parameter sga

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     FALSE
pre_page_sga                         boolean     TRUE
sga_max_size                         big integer 2512M
sga_target                           big integer 368M

SQL> !free
             total       used       free     shared    buffers     cached
Mem:       4046352    2895256    1151096          0        648    2623692
-/+ buffers/cache:     270916    3775436
Swap:     10241428     104936   10136492

/*  设置pre_page_sga参数为true后，Oracle实际分配的物理内存不再局限于SGA_TARGET，而是在实例生命周期内几乎恒等于SGA_MAX_SIZE指定的值了 * /

/*  当SGA_TARGET配合pre_page_sga使用时，这的确可能造成一种浪费 ! * /

10g中Oracle已经强烈不建议使用pre_page_sga参数了，在ASMM下它带来的问题可不小!

Filed Under: Oracle Tagged With: database, ORA, Oracle, RAC, sql

undo backup optimization does not work on 11.2.0.1?

2010/08/25 by admin 2 Comments

Backup Undo Optimization是11g的新特性之一，RMAN将避免备份撤销表空间上那些已提交事务的撤销数据。且该特性无法被禁用(You can enable and disable backup optimization, but backup undo optimization is built-in behavior.)。

我们在11.2.0.1版本上具体测试一下这个新特性:

SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

/*  为了避免undo自动调优干扰我们的测试，修改_undo_autotune参数禁用该特性 */

SQL> alter system set "_undo_autotune"=false;
System altered.

/* 创建一个新的undo表空间，清理现场 */
SQL> create undo tablespace UNDOTEST datafile size 500M autoextend on next 50M maxsize unlimited;
Tablespace created.

SQL> alter system set undo_tablespace=UNDOTEST;
System altered.

/* 列出相关的undo参数，可以看到这里undo_retention参数设为极短的10s */

SQL> show parameter undo;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
_undo_autotune                       boolean     FALSE
undo_management                      string      AUTO
undo_retention                       integer     10
undo_tablespace                      string      UNDOTEST

RMAN> list backup;

specification does not match any backup in the repository

RMAN> backup tablespace UNDOTEST;

RMAN> list backup of tablespace UNDOTEST;

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
4       Full    1.90M      DISK        00:00:00     25-AUG-10

/* undo表空间初始的备份大小为1.9M  */

SQL> conn maclean/maclean
Connected.

SQL> select count(*) from YOUYUS;
  COUNT(*)
----------
    579808

/* YOUYUS表有大约60万条数据，我们批量删除这些数据，将产生大量的undo*/

SQL> delete YOUYUS;
579808 rows deleted.

/*  此时再次执行备份undo表空间操作 */

RMAN> backup tablespace UNDOTEST;

RMAN>  list backup of tablespace UNDOTEST;

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
5       Full    134.43M    DISK        00:00:03     25-AUG-10

/*  在存在大量active undo数据的情况下，备份文件增大到134M */

SQL> commit;
Commit complete.

SQL> exec dbms_lock.sleep(20);
PL/SQL procedure successfully completed.

SQL> select status,sum(blocks) from dba_undo_extents group by status;
STATUS    SUM(BLOCKS)
--------- -----------
UNEXPIRED        2696
EXPIRED         32936

/* commit后等待20s,确定没有active的撤销段 */

RMAN> backup tablespace UNDOTEST;

RMAN> list backup of tablespace UNDOTEST;
BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
6       Full    134.49M    DISK        00:00:02     25-AUG-10

/* 备份文件还要大于commit前，undo backup optimization居然没有起作用？ */

/* 这个会是BUG吗？ */

根据以上情况我提交了SR，ORACLE GCS给出的回复:

Bug 6399468: UNDO OPTIMIZATION
====> Undo optimization was changed from ‘all undo not needed’ to ‘undo not in use older than 1 hour’
In unpublished bug 6399468 DEV has confirmed the Undo optimization was changed from ‘all undo not needed’ to ‘undo not in use older than 1 hour’. For your last test case please wait for 1 hour and try backup again.

居然又是一个unpublished的BUG，Oracle DEV部门确认了backup undo optimization所避免备份的是1个小时以上未被尝试使用的undo，而非所有不再需要的undo。这是典型的开发部门和文档撰写部门间没有充分交流造成的问题！

/* 尝试等待3600s */

SQL> exec dbms_lock.sleep(3600);
PL/SQL procedure successfully completed.

/* 3600s还真漫长....... */

RMAN> backup tablespace UNDOTEST;

RMAN> list backup of tablespace UNDOTEST;
BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
10      Full    15.01M     DISK        00:00:00     25-AUG-10

/* 备份集缩小到15m,undo backup optimization起到了作用！*/

That's great!

Filed Under: Oracle Tagged With: database, ORA, Oracle, Oracle 11g新特性, Oracle Bug, RAC, sql

Physical Database Limits

诗檀软件专业数据库修复团队

服务热线 ： 13764045638 QQ号:47079569 邮箱：service@parnassusdata.com

这个case告诉我们，测试库并不一定就不重要，测试库也是需要备份的。

10g中Oracle已经强烈不建议使用pre_page_sga参数了，在ASMM下它带来的问题可不小!

服务热线： 13764045638 QQ号:47079569 邮箱：service@parnassusdata.com