转储控制文件信息

当前控制文件的内容可以通过CONTROLF dump以文本形式转储到实例参数指定的目录下。

各级别CONTROLF转储的区别:

转储级别 转储内容
1 仅包含文件头信息
2 包括文件头,数据库信息记录,检查点进程记录
3 所有记录类型,针对循环重用的记录类型仅保留最早及最近的记录
4 如上,包含4条最新的可重用类型的记录
5+ 如上,包含的可重用类型记录为2*level条
SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump controlf 3;
Statement processed.
SQL> oradebug tracefile_name;
/s01/rac10g/admin/PROD/udump/prod_ora_3710.trc

以下例而言,使用oradebug命令启动转储,将会把当前控制文件信息以文本形式转储到该服务进程对应的用户转储文件中,文本中记录了level 3所对应的内容。

SQL> oradebug setmypid;
Statement processed.
SQL> oradebug dump controlf 3;
Statement processed.
SQL> oradebug tracefile_name;
/s01/rac10g/admin/PROD/udump/prod_ora_3710.trc

oradebug必须以SYSDBA形式登陆后方能使用,你也可以以非SYSDBA的身份通过alter session形式启动转储:

SQL> alter session set events 'immediate trace name controlf level 3';

Session altered.

[gview file=”http://askmac.cn/wp-content/uploads/resource/Controlfile.doc”]

巧用close_trace命令释放误删trace文件

可能很多朋友都遇到过这样的情况,在UNIX/Linux上定期清理Oracle日志文件夹时可能删除到仍被后台进程open着的trace文件,即某些后台进程一直持有着这些”被已经误删了的“打开文件的描述符(fd),这种情况下文件系统上该文件实际占用的空间是不会被释放的,这就造成使用df命令查看文件系统剩余空间和用du命令查看文件夹空间使用量时数值不一致的问题。此外因为是后台进程持有这些打开文件描述符,所以我们无法像kill服务进程一样来解决该问题(部分后台进程是可以kill的,不建议这样做)。oradebug是sqlplus中威力强大的debug命令,我们可以通过该命令发起多种trace/dump,其中也包括了close_trace事件;close_trace事件可以让指定进程关闭其正持有的trace文件。

下面我们就来演示下相关操作:

[maclean@rh2 ~]$ ps -ef|grep ora_|grep -v grep
maclean   7281     1  0 16:35 ?        00:00:00 ora_pmon_PROD
maclean   7283     1  0 16:35 ?        00:00:00 ora_psp0_PROD
maclean   7285     1  0 16:35 ?        00:00:00 ora_mman_PROD
maclean   7287     1  0 16:35 ?        00:00:00 ora_dbw0_PROD
maclean   7289     1  0 16:35 ?        00:00:00 ora_lgwr_PROD
maclean   7291     1  0 16:35 ?        00:00:00 ora_ckpt_PROD
maclean   7293     1  0 16:35 ?        00:00:00 ora_smon_PROD
maclean   7295     1  0 16:35 ?        00:00:00 ora_reco_PROD
maclean   7297     1  0 16:35 ?        00:00:00 ora_cjq0_PROD
maclean   7299     1  0 16:35 ?        00:00:00 ora_mmon_PROD
maclean   7301     1  0 16:35 ?        00:00:00 ora_mmnl_PROD
maclean   7303     1  0 16:35 ?        00:00:00 ora_d000_PROD
maclean   7305     1  0 16:35 ?        00:00:00 ora_s000_PROD
maclean   7313     1  0 16:35 ?        00:00:00 ora_qmnc_PROD
maclean   7430     1  0 16:35 ?        00:00:00 ora_q000_PROD
maclean   7438     1  0 16:36 ?        00:00:00 ora_q001_PROD

/* lgwr是著名的Oracle后台进程,在这个启动的实例中其系统进程号为7289*/

[maclean@rh2 ~]$ ls -l /proc/7289/fd        /* linux上的proc文件系统可以很方便我们探测进程信息*/
total 0
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 0 -> /dev/null
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 1 -> /dev/null
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 10 -> /dev/zero
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 11 -> /dev/zero
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 12 -> /s01/rac10g/rdbms/mesg/oraus.msb
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 13 -> /s01/rac10g/dbs/hc_PROD.dat
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 14 -> /s01/rac10g/dbs/lkPROD
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 15 -> /s01/rac10g/oradata/PROD/controlfile/o1_mf_64q6xphj_.ctl
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 16 -> /s01/rac10g/flash_recovery_area/PROD/controlfile/o1_mf_64q6xpms_.ctl
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 17 -> /s01/rac10g/oradata/PROD/onlinelog/o1_mf_1_64q6xrsr_.log
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 18 -> /s01/rac10g/flash_recovery_area/PROD/onlinelog/o1_mf_1_64q6xsoy_.log
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 19 -> /s01/rac10g/oradata/PROD/onlinelog/o1_mf_2_64q6xths_.log
l-wx------ 1 maclean oinstall 64 Jul 26 16:38 2 -> /s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 20 -> /s01/rac10g/flash_recovery_area/PROD/onlinelog/o1_mf_2_64q6xv9o_.log
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 21 -> /s01/rac10g/oradata/PROD/onlinelog/o1_mf_3_64q6xw1b_.log
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 22 -> /s01/rac10g/flash_recovery_area/PROD/onlinelog/o1_mf_3_64q6xwv0_.log
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 23 -> /s01/rac10g/oradata/PROD/datafile/o1_mf_system_64q6wd5j_.dbf
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 24 -> /s01/rac10g/oradata/PROD/datafile/o1_mf_undotbs1_64q6wd7f_.dbf
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 25 -> /s01/rac10g/oradata/PROD/datafile/o1_mf_sysaux_64q6wd5m_.dbf
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 26 -> /s01/rac10g/oradata/PROD/datafile/o1_mf_users_64q6wd89_.dbf
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 27 -> /s01/rac10g/oradata/PROD/datafile/o1_mf_temp_64q6xyox_.tmp
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 28 -> /s01/rac10g/rdbms/mesg/oraus.msb
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 3 -> /dev/null
lr-x------ 1 maclean oinstall 64 Jul 26 16:38 4 -> /dev/null
l-wx------ 1 maclean oinstall 64 Jul 26 16:38 5 -> /s01/rac10g/admin/PROD/udump/prod_ora_7279.trc
l-wx------ 1 maclean oinstall 64 Jul 26 16:38 6 -> /s01/rac10g/admin/PROD/bdump/alert_PROD.log
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 7 -> /s01/rac10g/dbs/lkinstPROD (deleted)
lrwx------ 1 maclean oinstall 64 Jul 26 16:38 8 -> /s01/rac10g/dbs/hc_PROD.dat
l-wx------ 1 maclean oinstall 64 Jul 26 16:38 9 -> /s01/rac10g/admin/PROD/bdump/alert_PROD.log

/*可以看到lgwr进程相关trace文件为/s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc,对应打开文件描述符为2*/

[maclean@rh2 ~]$ ls -lh /s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc
-rw-r----- 1 maclean oinstall 1.7M Jul 26 16:37 /s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc

[maclean@rh2 ~]$ rm -f /s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc

/*尝试删除该trace文件*/

[maclean@rh2 ~]$ ls -l /proc/7289/fd|grep lgwr
l-wx------ 1 maclean oinstall 64 Jul 26 16:38 2 -> /s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc (deleted)

/*文件已处在deleted状态,但lgwr进程仍持有该文件相关的文件描述符,这个时候该文件占有的空间并不会被释放*/

[maclean@rh2 ~]$ lsof|grep lgwr
oracle 7289   maclean    2w   REG 8,2   1702391 3867134 /s01/rac10g/admin/PROD/bdump/prod_lgwr_7289.trc (deleted)

[maclean@rh2 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Production on Mon Jul 26 17:03:04 2010

Copyright (c) 1982, 2010, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> oradebug setospid 7289;
Oracle pid: 6, Unix process pid: 7289, image: oracle@rh2 (LGWR)
SQL> oradebug flush;             /*写出trace buffer内容到trace文件*/
Statement processed.
SQL> oradebug close_trace;
Statement processed.
/*close_trace能够释放指定Oracle进程正打开着的文件,To close the current trace file use*/
SQL> host
[maclean@rh2 ~]$ lsof|grep lgwr

[maclean@rh2 ~]$ ls -l /proc/7289/fd/|grep lgwr
[maclean@rh2 ~]$
/* 从进程相关的fd文件夹中查找不到原来的trace文件;close_trace命令成功释放了该文件,并回收了磁盘空间。*/

【书籍推荐】Oracle 8i Internal Services

直到今天仍是经典,无可替代,绝不落伍。想了解Oracle internal 必读的一本书,每次翻开影印本都会有所收获,感谢作者的无私贡献!
[gview file=”http://askmac.cn/wp-content/uploads/resource/oracle8i.internal.services.for.waits.latches.locks.pdf”]

【书籍推荐】Expert Oracle Practices

类似于《Oracle dba日记》,由国外一群Oracle专家合著,够fashion!

Expert_Oracle_Practices

【转】网络制图法(Internet Cartography)

fackbook的技术专家之一Carlos Bueno在这周一发表了这篇关于有趣的网络制图(Internet Cartography)的文章,如果你恰好”无法正常浏览“facebook的页面,那么也可以读读我所引用的:

Every generation likes to think it reinvents the world from scratch. But some things are shaped by history and geography as much as anything. Mountains, rivers, archipelagos, and long terrestrial crossings play a big role in deciding where, how, and how well different parts of the Earth get connected.

This is a map of the global telegraph network from 110 years ago side-by-side with the internet of today:

One way to see the internet is as a physical manifestation of trade volume between cities, on a 40-year moving average. That is about how long it takes for economic ties to develop, demand to rise, and high-volume communications routes to be financed and built. Once built, these links tend to stick around.

Governments and empires have come and gone, bandwidth has increased a billion-fold, but the network has the same general shape it had back when Mark Twain was sending witty telegrams. The only big change since then is greater ties between the US and Asia.

Just from looking at where the cables go you can guess how long it would take to send a message. A telegram from San Francisco to Hong Kong in 1901 must have taken many hops through British Empire cables to Europe, through the Middle East, and so on. London to New York was fast and direct. The vestiges of the Spanish and Portuguese Empires show up in the many links between South America, the Caribbean archipelago, and the Iberian peninsula.

A cool thing is that you can measure these relative latencies yourself, using the present-day internet. If you run a website with a decent amount of worldwide traffic, you can use that traffic to map out how the internet responds with regards to you, and see how that matches with the gross structure of the ‘net.

I wrote about a cheap and cheerful way to generate this data last year, and the code has since been open-sourced as part of Yahoo’s Boomerang measurement framework. The basic idea is to have your users perform two tiny network requests: one to a throwaway hostname generated in the moment, like 8j48sas.dns.example.net/A.gif, then another to a different single-pixel image on the same host, 8j48sas.dns.example.net/B.gif. The first request will require a DNS lookup, TCP handshake, and HTTP transaction. The second only needs to do the TCP and HTTP steps. Now you have fuzzy measurements of how long it took to do a full HTTP round-trip (B) and to do a full end-to-end DNS lookup (A – B).

Real-world data on DNS performance is generally considered hard to come by. The domain name system is designed with caching and intermediaries at all levels, so you as a site owner only see part of the story during normal operation. You can buy data like this from commercial services like Gomez or Keynote, or get it yourself if you happen to have, say, a browser plugin installed on millions of computers. Otherwise, this Javascript method is less accurate but works well enough.
xkcd.com/192

Here is a chart of median (50th percentile) DNS latencies experienced by a random sample of Facebook users, broken down by country. As you can see, there are several lines crowding together at the bottom. That is the US and parts of Europe like the UK and Belgium. Facebook’s DNS servers tend to be physically close to users in those countries. Spain and France are a bit higher up, and the rest of the graph is a mix of Asian and South American countries. [1]

The median value only tells part of the story. Here is the worldwide DNS latency data as a density plot, to show the distribution. Notice that a substantial number of users took more than 500 milliseconds just to look up a hostname. This is the uncached worst-case, of course, but it’s something to keep in mind.


HTTP Latencies
Here is the chart for measurement B, the TCP + HTTP latency. This better reflects the real “geography” of the internet, because the HTTP requests travel all of the way back to our web tiers in the United States. There is much less volatility in these measurements day-to-day; it’s controlled more by basic network conditions and speed-of-light and less by the health of various DNS recursors around the world.

How low can you go?
So how fast are these links between countries, compared to what is possible? Below is a chart of the same median HTTP latency data, averaged over a week. The short light-grey bars represent the theoretical minimum. If you could carve a direct line between any two spots along the surface of the planet, this grey bar would be the internet round trip time between the US and the given country. [2]

We can learn a lot of things from this chart. The most obvious is that HTTP latency between Asia and the US is worse than US-Europe. The Pacific Ocean is wider than the Atlantic, of course, but raw distance is not the only factor. Economics and local geography play their part.

Look at the ratios between the black bars (real) and the grey bars (theoretical). Both the fastest European and Asian countries have real-world latencies at or below 2X the theoretical minimum, which is pretty impressive. Few technologies get within spitting distance of the physical limits of the universe.

These low-multiple countries tend to have fortunate geography, or a strong history of economic relations with the United States, or both. Other countries with less-strong trade ties, such as Spain, or lots of little islands like the Philippines, have multiples nearer to 2.5X and above. While Australia is a bit farther than Thailand it’s 15% closer as far as the internet is concerned. More investment has been put in by the cable operators to make that route fast and wide. In fact, Australia (population 22M) a comparable amount of bandwidth to the US as all of South America (population 385M).

The multiples of South American countries start at 3.5X and go up from there. North-South routes are hurt by an unlucky trifecta of mountains, long land crossings, and archipelagos. There is only one cable that serves the Pacific side from Los Angeles to Panama. It’s hard to justify building lots of capacity on the Pacific side, because the Andes mountains cut off that part of the continent from the rest. Most traffic follows a long and painful path across the entire length of the US to the Atlantic, then takes a right turn and down another 800 miles of the Florida peninsula. It exits Miami and immediately hits a congested maze of cables, hopping in and out of the water as it navigates the islands of the Caribbean. Someday South America will get better connected, but natural barriers drive the costs way up.

There are other interesting cases such as Belgium, which has the lowest latency and lowest multiple (1.6X) of any European country. The reason is that Belgium is well-placed as an internet nexus, being a) close to Britain but away from the Channel and b) geographically convenient for branching off into the rest of Europe.

Try this at home
These measurements are very skewed towards the United States. It would be awesome to see measurements from other spots and different traffic patterns from around the world. The code to collect this data (and a lot more) is open-source and simple to implement. So try the experiment for yourself and let us know what you find.

Carlos Bueno, an Engineer at Facebook, loves pinging the tubes.

Notes
[1] This chart generally agrees with data gathered by Yahoo and Microsoft. The data is very US-centric; the picture will be quite different if you were to run the experiment from a site based on another continent. Facebook’s servers are largely in the US, so naturally we care most about how to get bits from here to there and less about, say, between India and Saudi Arabia.

[2] The theoretical minimum latency is calculated using the average speed of light through optical fiber, over a hypothetical cable laid in a great circle line between the town of Independence, Kansas and the centroid of the given country. This time is multiplied by 4 to approximate the two round-trips necessary to complete a TCP handshake and HTTP transaction. You can read all about Great Circle routes and the speed of light through fiber in Wikipedia, or just use Wolfram Alpha to do it for you.

_shared_pool_reserved_pct or shared_pool_reserved_size with ASMM

共享池是Oracle著名的SGA的一个重要组成部分,当我们尝试从共享池中分配较大的连续区域时(默认来说是4400bytes),我们可能会用到共享池中的保留区域(也叫保留池);注意Oracle总是会先尝试扫描普通共享池的空闲列表,之后才尝试扫描保留池的空闲列表,无论所需分配的内存块是否超过隐式参数_shared_pool_reserved_min_alloc所指定的值。

什么?你看到过的4031描述文档是用以下伪代码描述分配流程的:

large, scan reserved list
if (chunk found)
check chunk size and perhaps truncate
if (chunk is not found)
scan regular free list
if (chunk found)
check chunk size and perhaps truncate
all done
if (chunk is not found)
do LRU operations and repeat

small, scan regular free list
if (chunk found)
check chunk size and perhaps truncate
all done
if (chunk is not found)
do LRU operations and repeat

那么来看看以下测试:

SQL> alter system set "_shared_pool_reserved_pct"=5 scope=spfile;

System altered.

SQL> startup frce;
SP2-0714: invalid combination of STARTUP options
SQL> startup force;
ORACLE instance started.

Total System Global Area 3154116608 bytes
Fixed Size                  2099616 bytes
Variable Size            2197816928 bytes
Database Buffers          939524096 bytes
Redo Buffers               14675968 bytes
Database mounted.
Database opened.
SQL>  select free_space from v$shared_pool_reserved;

FREE_SPACE
----------
3525368

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
3   WHERE x.inst_id = USERENV ('Instance')
4   AND y.inst_id = USERENV ('Instance')
5   AND x.indx = y.indx
6  AND x.ksppinm LIKE '%_shared_pool_reserved_min_alloc%';

NAME                            VALU DESCRIB
------------------------------- ---- ---------------------------------------------------------------------
_shared_pool_reserved_min_alloc 4400 minimum allocation size in bytes for reserved area of shared pool

SQL> select count(*) from x$ksmsp where ksmchsiz>4400 and ksmchcom!='free memory';

COUNT(*)
----------
64

SQL>  exec dbms_workload_repository.create_snapshot;

PL/SQL procedure successfully completed.
SQL> select count(*) from x$ksmsp where ksmchsiz>4400 and ksmchcom!='free memory';

COUNT(*)
----------
67                            /* 方才调用的存储过程成功在共享池中分配到3个大于4400 byte的Chunk,接下来看保留池大小变化)
SQL>  select free_space from v$shared_pool_reserved;

FREE_SPACE
----------
3525368               /* 保留池大小没有发生变化,很显然3个大于4400 byte的Chunk是从regular free list上获取的,而非reserved free list/

以上实验中我们通过调用awr快照存储过程,模拟了从共享池中分配大于4400字节Chunk的操作,实验结果是在保留池有足够空闲空间的情况下,Oracle仍尝试在普通共享池区域中分配了这些连续内存,故而通过查询内部视图x$ksmsp虽然发现了多出了三个大于4400 byte的Chunk,然而保留池的空闲量并未减少。由此可证即便是要分配大于4400字节的内存块,Oracle也会先尝试搜索普通空闲列表,在普通空闲列表上无适应尺寸的连续空间时,才会尝试扫描保留池的空闲列表。

言归正题,我们可以通过2个参数控制保留池的大小:shared_pool_reserved_size和_shared_pool_reserved_pct。这2个参数的区别在于普通参数shared_pool_reserved_size以数值形式制定保留池的大小,这个数值采用在10g的ASMM(自动管理的SGA内存管理)特性的环境中是不会随共享池的大小变化而浮动的;不同于此,隐式参数_shared_pool_reserved_pct作为一个比例值,可以协同ASMM中共享池的变化而适应变化。在讨论经典4031错误的数个文档中,都有介绍到如果在ASMM环境中,设置_shared_pool_reserved_pct往往要好过shared_pool_reserved_size,它使你的共享池更具可收缩性!

纸上得来终觉浅,我们来看看_shared_pool_reserved_pct的实际效果:

SQL> alter system set sga_max_size=3000M scope=spfile;

System altered.

SQL> alter system set sga_target=3000M scope=spfile;

System altered.

SQL> alter system set shared_pool_size=500M;

System altered.

SQL> alter system set "_shared_pool_reserved_pct"=50 scope=spfile;

System altered.

SQL> startup force ;
ORACLE instance started.

Total System Global Area 3154116608 bytes
Fixed Size                  2099616 bytes
Variable Size             570426976 bytes
Database Buffers         2566914048 bytes
Redo Buffers               14675968 bytes
Database mounted.
Database opened.
SQL> select free_space from v$shared_pool_reserved;

FREE_SPACE
----------
21158280
SQL> alter system set shared_pool_size=2000M ;  /*ASMM下手动修改shared_pool_size,模拟共享池自动扩展的情况*/

System altered.

SQL> select free_space from v$shared_pool_reserved;

FREE_SPACE
----------
21158280                          /*  ohhh!好像跟我们预期的差别挺大,保留池大小没变*/

让我们跑下这段产生反复硬解析的SQL:

begin
for i in 1..200000 loop
execute immediate 'select 2 from dual where 1='||i;
end loop;
end;
/

SQL> select free_space from v$shared_pool_reserved;

FREE_SPACE
----------
296215920                                          /* 这样好了,我们如愿了,SGA真"动态" /

SQL>  alter system set shared_pool_size=300M;   /*尝试收缩ASMM下的共享池*/

System altered.

SQL>  alter system flush shared_pool;

System altered.

SQL> select free_space from v$shared_pool_reserved;

FREE_SPACE
----------
296215920                           /* 我们甚至无法flush 掉这些内存,这挺要命的 /

SQL> select name ,value/1024/1024 "SIZE MB" from v$system_parameter where name in ('sga_target','sga_max_size','shared_pool_size','db_cache_size','java_pool_size','large_pool_size','db_keep_cache_size');

NAME                    SIZE MB
-------------------- ----------
sga_max_size               3008
shared_pool_size            304
large_pool_size              16
java_pool_size               16
sga_target                 3008
db_cache_size               512
db_keep_cache_size            0

可以看到我们还有很多“没有分配”的SGA内存,我们来加大高速缓存看看:
SQL> alter system set db_cache_size=1000M;
alter system set db_cache_size=1000M
*
ERROR at line 1:
ORA-32017: failure in updating SPFILE
ORA-00384: Insufficient memory to grow cache      /* ohh 因为无法回收保留池的大量内存,导致了SGA其他组件无法扩展/

_shared_pool_reserved_pct的默认值5%可以满足绝大多数情况,通过上述实验证明设置该percent参数可以使保留池大小随SGA动态调整而扩大;但通过再次调整shared_pool_size和flush shared_pool手段都无法回收过度分配的保留池空间,这会导致其他组件无法正常扩展;因而我们在10gASMM的背景下,通过设置_shared_pool_reserved_pct可以获得更好的效果,但因为存在回收空间的问题,该参数也不宜设置过大,如果默认值在您的场景中显得过小,那么您可以尝试使用5-20这个区间内的值,超过20%的话往往就会造成负面的影响了。

facebook发布了Tornado V1.0

想要在您的站点上实时体验海量活跃用户的负载?我们来介绍一种最新的方式,这种新的实时网络框架被命名为Tornado(中文为龙卷风);Facebook从去年秋天开始研发该软件,最近以开源协议发布了该软件的1.0版本。

Tornado是基于Python开发的实时网页服务器,理论上可以支持上万的连续连接,使用长轮询方式进行实时数据传递。它是构成FriendFeed的核心技术,FriendFeed最初由2名前Google员工及多名网络社区领导人协作开发,在2009年FriendFeed被facebook所收购。Facebook公司现在的CTO:Bret Taylor进一步扩展了该软件。

Bret Taylor在过去几年中曾是Google Reader的主要开发者之一,之后他加入了real-time网络社区,并同Taylor一起启动了FriendFeed项目。之后FriendFeed被facebook收购,Darnell重返thing实验室,并领导将Tornado开发到现在的V1.0版本。

可以从这里进入Tornado的官网并下载下载到源代码tar包。

Twitter将启动其在犹他州的客户数据中心

常用twitter的用户可能感觉到了,该网站在过去几个月中出过一些过载导致无法访问的故障。世界杯期间每天300000新建用户的增长是造成过载的一个重要因素。这也推动了twitter建设自己的数据仓库存储中心。他们正在建设的一个数据中心,位于盐湖城。

虽然毫无疑问该中心将不及苹果在北卡罗来纳州耗资10亿美元建设的数据中心庞大。 Twitter发言人称他们正加紧建设一个为自身定制的数据中心,并将在今年启动。

“拥有独立的数据中心,将给予网站更大的容量,以适应用户的增长”Twitter的jean-Paul Cozzatti在其技术博客中写到。

“在该数据中心,Twitter将有能力完全控制其网络及系统配置,将占用大块商用面积,并使用特殊设计的电源和冷却设备。该数据中心将采用多个供应商提供的服务器,并且运行开源操作系统和应用程序”。

直到最近,Twitter仍使用由日本电信电话株式会社NTT美国公司在海湾地区建设的数据中心。”我们仍将和NTT美国合作管理现有的中心,这尚是我们首次定制数据中心”,一位Twitter发言人这样告诉我们。

这是自facebook在1月份公开其独立数据中心以来大型社交网站的又一重大举措。facebook的数据中心位于俄勒冈州,那里聚集了众多其他公司的Datacenter,包括亚马逊和谷歌。巨型网络公司扎堆于是有原因的,这里可以提供廉价的电能和适宜的气候(够凉爽),以及公司税收优惠。

近期Twitter出现了因大量无法正常访问的用户投诉(最主要的恐怕是有时无法注册新用户)而引发的公关危机。Twitter公司的博客大致叙述了他们的问题,Cozzatti-Twitter公司的主要技术负责人之一的博客也详细地描述了该问题。最主要的问题在于每周一,Twitter的主用户数据库会因为一个查询而卡住,此时整个系统都被锁定了。他们不得不重启该数据库,这个过程历时超过12小时!现在你或许理解他们需要对系统拥有更多控制权的苦衷了:)

”我们时常对比在各种在收缩,维护,调整Twitter这一飞翔中的火箭的工作”Cozzatti写到。

诊断ORA-08103错误

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

 

ORA-08103问题的诊断最好是能生成8103错误的ERROR STACK TRACE, 在TRACE中会记录具体引发8103的对象的OBJ和OBJD,这便于我们定位可能存在corruption的对象。

问题在于往往前台进程遇到ORA-08103错误不会在后台生成TRACE文件,这需要我们手动设置8103 触发ERRORSTACK的EVENTS:

 

ALTER SYSTEM SET  EVENTS  ‘ 8103 TRACE NAME ERRORSTACK LEVEL 3’;

解决思路包括:
1. 通过OBJD和DBA定位到具体的表名和块号
2. 有条件的情况下对该表做一个analyze .. validate structure
3. 有条件的情况下对该表所在tablespace做一个 dbms_space_admin.ASSM_TABLESPACE_VERIFY
4. 有条件的情况下move这张表或者相关的分区,尝试绕过该问题
5. 有条件的情况下降该表或分区移动到MSSM表空间上,绕过该问题

execute dbms_space_admin.tablespace_verify(‘&tablespace_name’)
oradebug setmypid
oradebug tracefile_name

 

execute dbms_space_admin.assm_tablespace_verify(‘&tablespace_name’,dbms_space_admin.TS_VERIFY_BITMAPS)
oradebug setmypid
oradebug tracefile_name

 

 

 

[oracle@nas ~]$ oerr ora 8103
08103, 00000, "object no longer exists"
// *Cause: The object has been deleted by another user since the operation
// began, or a prior incomplete recovery restored the database to
// a point in time during the deletion of the object.
// *Action: Delete the object if this is the result of an incomplete
// recovery.

@ Using the call stack trace arguments to identify the block producing the ORA-8103.

@ struct kcbds
@ {
@ ktid kcbdstid; /* full relative DBA plus object number */
@ .....@ struct ktid /* relative dba + objd */
@ {
@ kdbafr dbr_ktid; /* a relative dba */
@ kobjd objd_ktid; /* data object number */
@ kobjn objn_ktid; /* dictionary object number */@ struct kdbafr /* full relative dba */
@ {
@ ktsn tsn_kdbafr; 4bytes /* a tablespace number */
@ krdba dba_kdbafr; 4bytes /* a relative dba */
@ };

@ alter session set db_file_multiblock_read_count=1;
@ alter session set events '8103 trace name errorstack level 3';

@ kcbgtcr(kcbds *ds,...

@ ktecgshx(sdes, ...)
@ kcbds *sdes;

@ ktecgetsh(cdes, ...)
@ kcbds *cdes;

@ Example from a trace file with function ktecgshx being called by kteinicnt1:

@ kteinicnt1()+796 CALL ktecgshx() FFFFFFFF7FFF8F78 ?
@ 000000003 ? 000000004 ?
@ 0000001BC ? 000000000 ?
@ 1007AA000 ?

@ Argument/Register addr=0xFFFFFFFF7FFF8F78.
@ Dump of memory from 0xFFFFFFFF7FFF8F38 to 0xFFFFFFFF7FFF9078
@ FFFFFFFF7FFF8F30 00000000 00000000 [........]
@ FFFFFFFF7FFF8F40 00000000 00000000 FFFFFFFF 00000001 [................]
@ FFFFFFFF7FFF8F50 00000000 00000000 00000000 00000000 [................]
@ Repeat 1 times
@ FFFFFFFF7FFF8F70 00000000 00000000 0000000C 01006402 [..............d.]
After increase in load, queries against ASSM table intermittently fail with ORA-8103 when executed in
parallel if there are concurrent updates performed on the table.

This appears to only manifest when access is in parallel.

Cause

This is caused by Bug 5637976 ORA-8103 EVEN WITH THE WORKAROUND FROM Bug 3569503 fixed in 11.1g.

Concurrent inserts and direct path exports on an ASSM table causes ORA-8103/ORA-1410.
This is due to the fact that newly formatted blocks between low and high water mark do not get flushed to disk and query sees old copies from disk.

Rediscovery Information:
1. Concurrent inserts and exports on ASSM tables
2. ORA-8103/ORA-1410
3. redo dump shows 'ktspbfredo - Format Pagetable Datablock' for that rdba

REDO RECORD - Thread:2 RBA: 0x00045b.001887a1.0028 LEN: 0x008c VLD: 0x01
SCN: 0x0578.6eddf7be SUBSCN: 1 07/19/2012 12:11:00
CHANGE #1 TYP:1 CLS: 4 AFN:370 DBA:0x5ca5f32e OBJ:1638047 SCN:0x0578.6eddf7bd SEQ: 1 OP:13.17
ktsphfredo - Format Pagetable Segment Header
StartDBA 0x5ca5f32b nblks: 32 ForceL3 :1 Tsn: 15 objd: 1638047

REDO RECORD - Thread:2 RBA: 0x00045b.001887a5.0198 LEN: 0x008c VLD: 0x01
SCN: 0x0578.6eddf7c7 SUBSCN: 1 07/19/2012 12:11:00
CHANGE #1 TYP:1 CLS: 4 AFN:284 DBA:0x4718cbee OBJ:1638047 SCN:0x0578.6eddf7c2 SEQ: 1 OP:13.17
ktsphfredo - Format Pagetable Segment Header

BH (70000039ffb5108) file#: 370 rdba: 0x5ca5f32e (370/2487086) class: 7 ba: 70000039f230000
set: 94 blksize: 32768 bsi: 0 set-flg: 0 pwbcnt: 0
dbwrid: 5 obj: 1638047 objn: 148393 tsn: 15 afn: 370
hash: [700000fde5e6380,700000fde5e6380] lru: [7000005e7fcbdc0,700000b91fb4ce8]
lru-flags: hot_buffer
ckptq: [NULL] fileq: [NULL] objq: [700000f7c3f8288,70000063cfbac28]
st: SCURRENT md: NULL tch: 2 le: 70000069bff76a0
flags: remote_transfer
LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
buffer tsn: 15 rdba: 0x5ca5f32e (370/2487086)
scn: 0x0578.6eded558 seq: 0x01 flg: 0x00 tail: 0xd5582401
frmt: 0x02 chkval: 0x0000 type: 0x24=PAGETABLE EXTENT MAP BLOCK
Hex dump of block: st=0, typ_found=1

EMB Dump: 
Map Header:: next 0x4718cbee #extents: 1112 obj#: 1638047 flag: 0x10000000
Inc # 0 
Extent Map
-----------------------------------------------------------------
0x5ca5f32b length: 32 
0x5ceff1eb length: 32 
0x5d15360b length: 32 
0x5d5ddbcb length: 32 
0x5d9d106b length: 32
0x5dc000ab length: 32
0x5e09e1ab length: 32
0x5e4a8c0b length: 32
0x5e80d24b length: 32
0x5ec9a10b length: 32
0x5f009feb length: 32
0x5f40b74b length: 32
0x5f895f2b length: 32
0x5fd254cb length: 32

BH (700000dbcfc0ea8) file#: 284 rdba: 0x4718cbee (284/1625070) class: 7 ba: 700000dbc750000
set: 67 blksize: 32768 bsi: 0 set-flg: 0 pwbcnt: 0
dbwrid: 2 obj: 1638047 objn: 148393 tsn: 15 afn: 284
hash: [700000fdc387588,700000fdc387588] lru: [7000002f1fbcf90,700000a77fcfc30]
lru-flags: hot_buffer
ckptq: [NULL] fileq: [NULL] objq: [700000fc67dd420,700000453fb1828]
st: SCURRENT md: NULL tch: 143 le: 700000665fd8200
flags: remote_transfer
LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [65535]
buffer tsn: 15 rdba: 0x4718cbee (284/1625070)
scn: 0x0578.6ee3867a seq: 0x01 flg: 0x00 tail: 0x867a2401
frmt: 0x02 chkval: 0x0000 type: 0x24=PAGETABLE EXTENT MAP BLOCK
Hex dump of block: st=0, typ_found=1

EMB Dump:
Map Header:: next 0x00000000 #extents: 1983 obj#: 1638047 flag: 0x10000000
Inc # 0
Extent Map
-----------------------------------------------------------------
0x4718cbeb length: 32
0x475b598b length: 32
0x47989f6b length: 32
0x47d84f2b length: 32

ORA-8103 - objd: 1638108 objn: 1338416 tsn: 15 rdba: 0x4b8bf059

ksedmp: internal or fatal error
ORA-08103: object no longer exists
Current SQL statement for this session:

----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              000000001 ? 000000000 ?
ksedmp+0290          bl       ksedst               104C23090 ?
ksddoa+0308          bl       _ptrgl
ksdpcg+0104          bl       ksddoa               110490160 ? 11048ACB8 ?
ksdpec+00e8          bl       ksdpcg               FFFFFFFFFFEEF20 ?
                                                   700000010007FE0 ?
                                                   FFFFFFFFFFEEFF0 ?
ksfpec+00a4          bl       03F37234
kgesev+007c          bl       _ptrgl
ksesec0+0048         bl       kgesev               000007FE8 ? 104FD1FE0 ?
                                                   000000000 ? 000000000 ?
                                                   FFFFFFFFFFEF410 ?
kteinicnt1+0384      bl       01FC3F98
qertbFetch+0288      bl       03F386EC
qertqoFetch+0298     bl       01FC3FD8
qerpx_resume+0370    bl       01FC3FD8
qerpxFetch+0e08      bl       qerpx_resume         000000000 ? 11055A520 ?
rwsfcd+0054          bl       _ptrgl
insfch+00b4          bl       _ptrgl
insdrv+042c          bl       insfch               104C2BAE8 ? 000000000 ?
inscovexe+02d8       bl       insdrv               1104A81B0 ?
insExecStmtExecIniE  bl       _ptrgl
ngine+005c
insexe+0318          bl       insExecStmtExecIniE  000000000 ? 000000400 ?
                              ngine                11048A818 ?
opiexe+2840          bl       insexe               1104BF320 ? FFFFFFFFFFF1678 ?
opipls+1888          bl       opiexe               FFFFFFFFFFF29C8 ?
                                                   FFFFFFFFFFF2AB0 ?
                                                   FFFFFFFFFFF2968 ?
opiodr+0b2c          bl       _ptrgl
rpidrus+01dc         bl       opiodr               66FFFF47D0 ? 6FFFF4800 ?
                                                   FFFFFFFFFFF5900 ? A00000000 ?
skgmstack+00c8       bl       _ptrgl
rpidru+0088          bl       skgmstack            000000003 ? 000000003 ?
                                                   000000002 ? 000000000 ?
                                                   FFFFFFFFFFF50B0 ?
rpiswu2+0368         bl       _ptrgl
rpidrv+097c          bl       rpiswu2              70000100553C598 ? 000000000 ?
                                                   700000010003520 ? 110566428 ?
                                                   110566464 ? 96FFFF5B30 ?
                                                   1104C6010 ? 000000000 ?

Argument/Register addr=0x0FFFFFFFFFFEF410.
Dump of memory from 0x0FFFFFFFFFFEF3D0 to 0x0FFFFFFFFFFEF510
FFFFFFFFFFEF3D0 00000000 00000000 00000001 1048A818 [.............H..]
FFFFFFFFFFEF3E0 00000000 00002000 00000001 1019C060 [...... ........`]
FFFFFFFFFFEF3F0 0FFFFFFF FFFEF5E0 48220080 00000B9D [........H"......]
FFFFFFFFFFEF400 00000000 00000000 00000000 00000000 [................]
FFFFFFFFFFEF410 0000000F 4B8BF059 0018FEDC 00146C30 [....K..Y......l0]
FFFFFFFFFFEF420 00080003 00007FE8 00000000 100733A8 [..............3.]

00146C30=> 1338416=> ORA-8103 - objd: 1638108 objn: 1338416

kjbhistory[0xbf059.12e0000,(pkey 4294967295.0)(where 1)]
*** 2012-07-19 15:05:23.818
GLOBAL CACHE ELEMENT DUMP (address: 70000018cfe95a0):
id1: 0xbf059 id2: 0x12e0000 pkey: INVALID block: (302/782425)
lock: NC rls: 0x0000 acq: 0x0003 latch: 20
flags: 0xc1 fair: 0 recovery: 0 fpin: 'ktewh25: kteinicnt'
bscn: 0x578.6ee51801 bctx: 0 write: 0 scan: 0x0 xflg: 0 xid: 0x0.0.0
lcp: 700000fd843f070 lnk: [700000fd843f090,700000fd843f090] lch: [700000bdbfbb338,700000bdbfbb338]
seq: 25664 hist: 7 352 477 329 144:6 384 7 352 477 329
LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT:
flg: 0x00080000 state: READING mode: EXCL
pin: 'ktewh25: kteinicnt'
addr: 700000bdbfbb228 obj: 1638108 cls: SEG HEAD bscn: 0x577.a4f2674f

Note= OERR: ORA-8103 "object no longer exists" / Troubleshooting, Diagnostic and Solution (Doc ID 8103.1)
==>
Cause
ORA-8103 is caused by an invalid block type. The block header has an invalid block type or the 
block type inside the block is not expected; e.g. a data block (Type=6) was expected but the 
actual block information is not a data block (Type!=6).

ORA-8103 is also caused by an unexpected data_object_id where it is changing for the involved 
objects while the affected SQL statement is executed.

$sqlplus / as sysdba

Note: please replace literal '<owner>' with actual owner 

--------------------<
set lines 500
set long 9999
set pages 999
set serveroutput on size 1000000
set feedback off
SET MARKUP HTML ON SPOOL ON HEAD "<TITLE>SQL*Plus Report</title><STYLE TYPE='TEXT/CSS'><!--BODY {background: ffffc6} --></STYLE>"
spool query_result.html
set echo off
alter session set nls_date_format = 'yyyy/mm/dd hh24:mi:ss';
SELECT * FROM DBA_TAB_MODIFICATIONS where table_owner = '<owner>'
and table_name in ('RAW_BORM','MG_34_FEE_DTL','RAW_BOIS','MG_34_CA_AMT_BK','RAW_BLDVNI');
spool off
SET MARKUP HTML OFF
set echo on
-------------------->

1. run the hcheck script against the database "using note hcheck.sql" script to check for 
known problems in Oracle8i, Oracle9i, Oracle10g and Oracle 11g (Doc ID 136697.1) and provide the output to SR.
Please do not provide a print screen, but the spool file obtained

2. set event for ORA 8103 to capture the errorstack
alter system set events='8103 trace name errorstack, level 3';

3. wait for the error to reproduce and upload the trace file created for the error

ora-00600:[17281], [1001]一例

检查告警日志发现出现ora-600:[17281],[1001]记录,该数据库版本为10.2.0.4:

ORA-00600: internal error code, arguments: [17281], [1001], [0x70000042F5E54F8], [], [], [], [], []
ORA-01001: invalid cursor

分析该600错误产生的trace文件,发现当时运行的语句是一段匿名块:

Current SQL statement for this session:
declare
t_owner varchar2(30);
t_name  varchar2(30);
procedure check_mview is
dummy integer;
begin
if :object_type = ‘TABLE’ then
select 1 into dummy
from sys.all_objects
where owner = :object_owner
and object_name = :object_name
and object_type = ‘MATERIALIZED VIEW’
and rownum = 1;
:object_type := ‘MATERIALIZED VIEW’;
end if;
exception
when others then null;
end;
begin
:sub_object := null;
if :deep != 0 then
begin
if :part2 is null then
select constraint_type, owner, constraint_name
into :object_type, :object_owner, :object_name
from sys.all_constraints c
where c.constraint_name = :part1 and c.owner = user
and rownum = 1;
else
select constraint_type, owner, constraint_name, :part3
into :object_type, :object_owner, :object_name, :sub_object
from sys.all_constraints c
where c.constraint_name = :part2 and c.owner = :part1
and rownum = 1;
end if;
if :object_type = ‘P’ then :object_type := ‘PRIMARY KEY’; end if;
if :object_type = ‘U’ then :object_type := ‘UNIQUE KEY’; end if;
if :object_type = ‘R’ then :object_type := ‘FOREIGN KEY’; end if;
if :object_type = ‘C’ then :object_type := ‘CHECK CONSTRAINT’; end if;
return;
exception
when no_data_found then null;
end;
end if;
:sub_object := :part2;
if (:part2 is null) or (:part1 != user) then
begin
select object_type, user, :part1
into :object_type, :object_owner, :object_name
from sys.all_objects
where owner = user
and object_name = :part1
and object_type in (‘MATERIALIZED VIEW’, ‘TABLE’, ‘VIEW’, ‘SEQUENCE’, ‘PROCEDURE’, ‘FUNCTION’, ‘PACKAGE’, ‘TYPE’, ‘TRIGGER’, ‘SYNONYM’)
and rownum = 1;
if :object_type = ‘SYNONYM’ then
select s.table_owner, s.table_name
into t_owner, t_name
from sys.all_synonyms s
where s.synonym_name = :part1
and s.owner = user
and rownum = 1;
select o.object_type, o.owner, o.object_name
into :object_type, :object_owner, :object_name
from sys.all_objects o
where o.owner = t_owner
and o.object_name = t_name
and object_type in (‘MATERIALIZED VIEW’, ‘TABLE’, ‘VIEW’, ‘SEQUENCE’, ‘PROCEDURE’, ‘FUNCTION’, ‘PACKAGE’, ‘TYPE’, ‘TRIGGER’, ‘SYNONYM’)
and rownum = 1;
end if;
:sub_object := :part2;
if :part3 is not null then
:sub_object := :sub_object || ‘.’ || :part3;
end if;
check_mview;
return;
exception
when no_data_found then null;
end;
end if;
begin
select s.table_owner, s.table_name
into t_owner, t_name
from sys.all_synonyms s
where s.synonym_name = :part1
and s.owner = ‘PUBLIC’
and rownum = 1;
select o.object_type, o.owner, o.object_name
into :object_type, :object_owner, :object_name
from sys.all_objects o
where o.owner = t_owner
and o.object_name = t_name
and object_type in (‘MATERIALIZED VIEW’, ‘TABLE’, ‘VIEW’, ‘SEQUENCE’, ‘PROCEDURE’, ‘FUNCTION’, ‘PACKAGE’, ‘TYPE’, ‘TRIGGER’, ‘SYNONYM’)
and rownum = 1;
check_mview;
return;
exception
when no_data_found then null;
end;
:sub_object := :part3;
begin
select o.object_type, o.owner, o.object_name
into :object_type, :object_owner, :object_name
from sys.all_objects o
where o.owner = :part1
and o.object_name = :part2
and object_type in (‘MATERIALIZED VIEW’, ‘TABLE’, ‘VIEW’, ‘SEQUENCE’, ‘PROCEDURE’, ‘FUNCTION’, ‘PACKAGE’, ‘TYPE’, ‘TRIGGER’, ‘SYNONYM’)
and rownum = 1;
check_mview;
return;
exception
when no_data_found then null;
end;
begin
if :part2 is null and :part3 is null
then
select ‘USER’, null, :part1
into :object_type, :object_owner, :object_name
from sys.all_users u
where u.username = :part1
and rownum = 1;
return;
end if;
exception
when no_data_found then null;
end;
begin
if :part2 is null and :part3 is null and :deep != 0
then
select ‘ROLE’, null, :part1
into :object_type, :object_owner, :object_name
from sys.session_roles r
where r.role = :part1
and rownum = 1;
return;
end if;
exception
when no_data_found then null;
end;
:object_owner := null;
:object_type := null;
:object_name := null;
:sub_object := null;
end;
—– Call Stack Trace —–
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
——————– ——– ——————– —————————-
ksedst+001c          bl       ksedst1              088424844 ? 041124844 ?
ksedmp+0290          bl       ksedst               104A54870 ?
ksfdmp+0018          bl       03F30204
kgeriv+0108          bl       _ptrgl
kgeasi+0118          bl       kgeriv               1104722C8 ? 1101B87C0 ?
104AFA0FC ? 7000000100067F8 ?
000000000 ?
kgicli+0188          bl       kgeasi               110195A58 ? 110447310 ?
438100004381 ? 200000002 ?
200000002 ? 000000000 ?
0000003E9 ? 000000002 ?
kgidlt+0398          bl       kgicli               110450A70 ? 104C734E0 ?
kgidel+0018          bl       kgidlt               FFFFFFFFFFF90B8 ? 000000000 ?
000000003 ? 000000000 ?
FFFFFFFFFFF9468 ?
perabo+00ac          bl       kgidel               FFFFFFFFFFF8D20 ? 0000000FF ?
perdcs+0050          bl       perabo               000000000 ? 000000820 ?
000000000 ?
peidcs+01dc          bl       perdcs               110477618 ? 000000000 ?
kkxcls+00a4          bl       peidcs               FFFFFFFFFFF9468 ? 110477618 ?
kxsClean+0044        bl       kkxcls               1100DD338 ?
kxsCloseXsc+0444     bl       kxsClean             FFFFFFFFFFF9760 ?
kksCloseCursor+031c  bl       kxsCloseXsc          110478688 ? 110281FB0 ?
opicca+00c4          bl       kksCloseCursor       104BD9640 ?
opiclo+00a0          bl       opicca               10013D940 ?
kpoclsa+0050         bl       03F32B00
opiodr+0ae0          bl       _ptrgl
ttcpip+1020          bl       _ptrgl
opitsk+1124          bl       01F9F2A0
opiino+0990          bl       opitsk               000000000 ? 000000000 ?
opiodr+0ae0          bl       _ptrgl
opidrv+0484          bl       01F9E0E8
sou2o+0090           bl       opidrv               3C02DC1BBC ? 44065F000 ?
FFFFFFFFFFFF3A0 ?
opimai_real+01bc     bl       01F9B9F4
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0098         bl       main                 000000000 ? 000000000 ?



first argument为17281,该代码对应为在关闭游标时发生错误事件。
发生错误时的调用栈为:kkxcls->peidcs->perdcs->perabo->kgidel->kgidlt->kgicli,通过以上调用栈与argument信息在600 lookup工具中查询,可以发现bug:[6051353]:

Hdr: 6051353 10.1.0.45 THIN 10.1.0.5 PRODID-972 PORTID-212 ORA-600
Abstract: ORA-600[17281] ORA-[1001]

*** 05/14/07 07:09 am ***
TAR:
----
17450130.600

PROBLEM:
--------
Oracle 10.1.0.5 64-bit
AIX5L 64-bit server

Following the application of CPUJAN2007 patch, the database is giving
internal errors.
Tha alert log sjows:
  ORA-600: internal error code, arguments: [17281], [1001],
[0x70000001E792DC8], [], [], [], [], []
  ORA-1001: invalid cursor

Trace file shows the failing statement is an insert.

INSERT INTO V_RPMORGRESOURCEPOSITION
(ORGID, RESOURCEID, EFFECTIVEDAY, TERMINATIONDAY, OWNEDMW,
COMMITTEDMW, AVAILABLEMW, UNOFFEREDMW, FRRCOMMITTEDMW )
SELECT :B12 , :B11 , EFFECTIVEDAY, LEAST(:B10 , :B9 ),
NVL(OWNEDMW,0) + NVL(:B8 ,0), NVL(COMMITTEDMW,0) + NVL(:B7 ,0),
NVL(AVAILABLEMW,0) + NVL(:B6 ,0), NVL(UNOFFEREDMW,0) + NVL(:B5 ,0),
NVL(FRRCOMMITTEDMW,0) + NVL(:B4 ,0)
FROM RPMORGRESOURCEPOSITION
WHERE ORGID = :B3 AND RESOURCEID = :B2 AND EFFECTIVEDAY = :B1

    Other information:
        O/S info: user: , term: , ospid: 1234, machine: esu03als
        client info: smartino@PJM
        application name: eRpm
        action name: QueryZonalLoadObligationDetail
        last wait for 'SQL*Net message from client'

DIAGNOSTIC ANALYSIS:
--------------------
the function stack exactly matches 4359111
kgicli kgidlt kgidel perabo
    Bug 4359111 - STRESS ORA-600[17281] WHEN RUNNING GMT APPLICATION13
But this is already fixed in 10.1.0.5

The 'client' (esu03als) is a Linux server and there is no oracle running
there.

The AIX SysAdmin. who manages the esu03als server says this about the
application:

    "The way RPM connects to our database is by using DBCP (Apache Jakarta
     Project -- see
     As far as I can tell, the release (jar) contained within the RPM
     delivery is version 1.1 of DBCP which was released on 2003-10-20."

This couls also be Bug 5392685/5413487 but tis doesn't seem to have a
resolution.

Also could be Bug 5366763
  possible workaround - set session_cached_cursors to 0
  But this is already set to 0.

Originally the Customer thought this error started after applying CPUJAN2007,
but it is appearing on another database where the CPU patch is not applied.

WORKAROUND:
-----------
none

RELATED BUGS:
-------------
4359111
5413487
5366763

REPRODUCIBILITY:
----------------
intermittent

TEST CASE:
----------
none

STACK TRACE:
------------
          ksedmp ksfdmp kgeriv kgeasi
          kgicli kgidlt kgidel perabo perdcs peidcs kkxcls2 kxtcln
          kxsClean kksCloseCursor opicca opiclo kpoclsa opiodr
          ttcpip opitsk opiino opiodr opidrv sou2o
          main

其调用栈完全一致,可以基本确定2者的关联性。但该文档叙述bug发生在10.1.0.5的AIX版本上,且据称该Bug之前已在“Bug 4359111 – STRESS ORA-600[17281] WHEN RUNNING GMT APPLICATION13”中声明并在10.1.0.5上修复,看起来又是一个伪修复的漏洞。另外一个文档叙述了同样的错误发生在10.2.0.4上:

Hdr: 8337808 10.2.0.4 RDBMS 10.2.0.4 PRG INTERFACE PRODID-5 PORTID-46 ORA-600 4359111
Abstract: ORA-600 [17281] [1001] EVEN AFTER APPLYING THE PATCH 4359111

In prod_ora_12985.trc we see that:
 O/S info: user: Arun?Sharma, term: ARUN, ospid: 2556:3300,
 machine: BVM-EDP\ARUN
Got the ORA-600 at 2009-04-17 13:27:21 .

From the trace this was likely because the PLSQL block in
cursor #2 has an instantiation entry indicating that it
has cursor #3 open:
 INSTANTIATION OBJECT: object=0xf60e9ed0
 type="PL/SQL"[0] lock=0xa383a928 handle=0xb48def6c body=(nil)
 flags=[40] executions=0
 CURSORS: size=4 count=1 next=3
 index cursor      tag  context flags
 ----- ------ -------- -------- ---------------
     2      3 0xf60d56f4 0xf60f832c LRU/PRS/[03]
            ^here

But there is no cursor#3 so it has likely been closed independently

So the immediate thing to do would be to find out what this OS
user (Arun Sharma) was doing at 13.27 on 17th April from the
ARUN machine under OS pid 2556:3300 ?
 - what was the client program ?
 - what is this clients Oracle RSF version ?
 - what was being done in that client at that time.

It is unlikely that the user will remember such fine detail
so you may want to track the alert log closely and as soon
as you seen the ORA-600 find the O/S user , machine etc..
and try to contact them ASAP to confirm what they were doing
etc.. If we can get a handle on the client version / program /
actions that may help. Beyond that the next step is likely
to need a diagnostic on the server side to note cursor close
operations from the client without extranoues additional trace.

From the above update his client is TOAD using 8.0.6.0 on Windows.
TOAD is known to be affected by bug 4359111 so
you should upgrade this client to a version where
bug 4359111 is fixed. (4359111 is a CLIENT SIDE fix)

Marking as an unconfirmed duplicate of bug 4359111
as it looks like some specific client may be connecting
which does not have that patch in place.

与以上文档描述相同,trace中存在以下记录:

INSTANTIATION OBJECT: object=1105e4fe8
type=”PL/SQL”[0] lock=70000044155a8b0 handle=70000042f5e54f8 body=0 level=0
flags=[40] executions=0
CURSORS: size=4 count=3 next=5
index cursor      tag  context flags
—– —— ——– ——– —————
2      4 11049ecc8 110525f60 LRU/PRS/[03]
3      6 11049ecc8 1105261f0 LRU/PRS/[03]
4      5 11049ecc8 110526338 LRU/PRS/[03]

但实际上这里cursor 3所打开的cursor#:4,5,6均不存在,所以cursor# 3也被单独关闭了。文档中问题是由toad引起的,首先toad连接数据库是不需要安装Oracle client的,它通过一些客制化过的c/c++的接口连接到DB;如文档所述可以确定toad V8.0.6.0受到 Bug4359111的影响,而我们的环境中是通过PL/SQL developer连接到数据库的,该工具需要用到Oracle client,而开发人员安装的Oracle client一般为9.2.0.1,极有可能是这一较低版本的客户端软件造成了问题发生,到这里触发Bug的条件基本清晰了。

8i/9i的oracle client虽然仍能够连接到10g,但难保不发生一些兼容性问题或者将早期版本中的Bug再次代入,Oracle对这些连接形式或已不提供技术支持,或提供扩展模式(可能收费)的技术支持。以下列表列出了各版本Server-client的兼容性:



  • #1 – See Note 207319.1
  • #2 – An ORA-3134 error is incorrectly reported if a 10g client tries to connect to an 8.1.7.3 or lower server. See Note 3437884.8 .
  • #3 – An ORA-3134 error is correctly reported when attempting to connect to this version.
  • #4 – There are problems connecting from a 10g client to 8i/9i where one is EBCDIC based. See Note 3564573.8
  • #5 – For connections between 10.2 (or higher) and 9.2 the 9.2 end MUST be at 9.2.0.4 or higher. Connections between 10.2 (or higher) and 9.2.0.1, 9.2.0.2 or 9.2.0.3 are not supported.
  • #6 – For connections between 11.1 (or higher) and 10.1 / 10.2 the 10g end MUST be at 10.1.0.5 / 10.2.0.2 (or higher) respectively in order to use PLSQL between those versions. See Note 4511371.8 for more details.

其实在我们升级或迁移Oracle数据库的时候就因该考虑到客户端软件也需要升级到合适版本才能满足今后兼容性及应用程序健壮度的要求,当然客户端软件并不一定只是oracle client,它可能是jdbc,也许是odbc,也许是dbi等等。

沪ICP备14014813号-2

沪公网安备 31010802001379号