共享池中的NETWORK BUFFER

2010/08/05 by admin 8 Comments

中午休闲时在itpub看到一个关于network buffer占用大量内存的求助帖,帖子原文如下:

各位大侠们，请教个问题。昨天遇到一个solaris10平台下的oracle10g(10.2.0.4)数据库报共享内存不足，发现数据库的sga_target才2512M，而在v$sgastat视图中查到的
shared pool–>NETWORK BUFFER就有1848744416字节，是什么引起network buffer这么大呢，在udmp目录下1分钟产生几个跟 ORA-4031相关的文件。

==================
SQL> show parameter sga

NAME                               TYPE       VALUE
———————————— ———– ——————————
lock_sga                            boolean    FALSE
pre_page_sga                      boolean    FALSE
sga_max_size                      big integer 2512M
sga_target                         big integer 2512M
SQL> show parameter share

NAME                               TYPE       VALUE
———————————— ———– ——————————
hi_shared_memory_address          integer    0
max_shared_servers                integer
shared_memory_address             integer    0
shared_pool_reserved_size          big integer 72142028
shared_pool_size                   big integer 0
shared_server_sessions             integer
shared_servers                      inte

NETWORK BUFFER对我们来说或许有些陌生，那是因为绝大多数场合都采用dedicated server模式，共享服务器模式下NETWORK BUFFER将被大量使用。MOS文档[741523.1]叙述了NETWORK BUFFER的主要用途:

On 10.2, after upgrading from 9iR2, the following error occurs:

ORA-07445: exception encountered: core dump [] [] [] [] [] []

plus

Dispatcher Trace file contains an ORA-4031 Diagnostic trace, with:
Allocation request for: NETWORK BUFFER

…followed by…

found dead dispatcher ‘D000’, pid = (12, 1)

The amount of memory used by NETWORK BUFFERs in the shared pool has significantly grown between 9.2 and 10.2. The side-effect is to run-out of Shared Pool memory (reporting an ORA-4031), when a large number of sessions are connecting to the server (in the order of 1000’s).

While a session is being established, we allocate 3 buffers each of 32k in size. After the session is established, we use the 3 SDU-sized buffers, however we do not deallocate the 3x32k buffer we allocated initially.

This issue has been logged in unpublished Bug 5410481.

Additionally, there is Bug 6907529.

NS buffers are allocated based on the SDU specified by the user. The negotiated SDU could be considerably lower. The difference between these two is wasted.

For example, the dispatcher specifies an SDU of 32k. Clients, by default, use an SDU of 8k. The remaining 24k is never used.

Issue in Bug 6907529 is fixed in 11.2.

Bug 5410481 is fixed in 10.2.0.3.

As a workaround to 5410481, the ADDRESS part of DISPATCHERS parameter can be used to specify a smaller SDU size.

For example:
DISPATCHERS=”(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp))(SDU=8192))”

To implement the change;

connect to the database as SYSDBA

alter system set dispatchers='(address=(protocol=tcp)(host=IP-Address)(sdu=8192))(dispatchers=DispatcherCount)’ scope=spfile;

re-start the database

你可能会问SDU是什么？Oracle NET缓存的数据以SDU为基本单位，SDU即 session data unit，一般默认为8192 bytes。当这些数据单元被写满，或被client读取时，他们将被传递给Oracle Network层(oracle network layer)。譬如Data Guard环境中redo传输的每个Chunk往往要大于8192 bytes，那么默认的SDU就不太适用。当有大量重做数据要传输到standby库时，增大SDU buffer的大小可以改善Oracle的网络性能。你可以很方便的通过修改sqlnet.ora配置文件来修改SDU,如在该文件内加入以下条目:
DEFAULT_SDU_SIZE=32767 /*修改全局默认SDU到32k*/
当然你也可以在tnsnames.ora中定义服务别名时个别指定SDU，下文我们会用到。
如上文所述在版本10.2.0.3以前当会话建立时，Oracle会以dispatchers参数定义的SDU为单位，分配3个单位的NETWORK BUFFER，而实际上client端可能并未指定和dispatchers一致的SDU，若dispatchers中定义的SDU为32k，而client使用默认的8k SDU,则一个会话可能要浪费3*32-3*8=72k的NETWORK BUFFER。

为什么共享服务器模式下会用到共享池中的NETWORK BUFFER，而独享服务器模式下没有呢？因为在独享服务器模式下每个会话所分配的三个SDU是从PGA中获取的；当使用共享服务器模式时会话与服务进程形成一对多的映射关系，这三个SDU 的NETWORK BUFFER同UGA一样转移到了SGA中。

下面我们通过实践来进一步验证。

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production
/*实验服务器端采用10.2.0.4版本*/
SQL> show parameter dispatch

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
dispatchers                          string      (address=(protocol=tcp)(host=1
                                                        92.168.1.103)(sdu=32768))(SERV
                                                        ICE=cXDB)(dispatchers=10)
/*dispatchers中指定了SDU为32k*/

C:\Windows\System32>tnsping cXDB
TNS Ping Utility for 32-bit Windows: Version 11.2.0.1.0 - Production on 05-8月 -2010 22:51:27
Copyright (c) 1997, 2010, Oracle.  All rights reserved.
已使用的参数文件:
D:\tools\adminstratorg\orahome\network\admin\sqlnet.ora
已使用 TNSNAMES 适配器来解析别名
尝试连接 (DESCRIPTION = (SDU=8192) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.103)(PORT = 1521))) (CONNECT_DATA = (SERVER = SHARED) (SERVICE_NAME = cXDB)))
OK (30 毫秒)
/* client端采用11.2.0.1版本,定义了共享服务器模式的服务别名，显式指定SDU为8192字节*/

这里我们要用到一个简单的java程序，用来模拟大量会话登录；这个程序很傻瓜，但是总比你一个个开SQLPLUS要明智的多:

/*这是一个很简单的java程序，登录远程数据库，并尝试打开600个回话,并且都指定了SDU为8192*/
package javaapplication2;
import oracle.jdbc.*;
import java.sql.*;
public class Main
{
    public static void main(String[] args) throws SQLException
    {
        try
        {
            Class.forName("oracle.jdbc.driver.OracleDriver");
        }
        catch(Exception e )
        {
        }
        Connection cnn1=DriverManager.getConnection("jdbc:oracle:thin:@(DESCRIPTION = (SDU=8192) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.103)(PORT = 1521))) (CONNECT_DATA = (SERVER = SHARED) (SERVICE_NAME = cXDB)))", "system", "password");
        Statement stat1=cnn1.createStatement();
        ResultSet rst1=stat1.executeQuery("select * from v$version");
        while(rst1.next())
        {
            System.out.println(rst1.getString(1));
        }
        Connection m[]=new Connection[2000];
        Statement s[]=new Statement[2000];
        ResultSet r[]=new ResultSet[2000];
        int i=0;
        while(i<600)
        {
            try
            {
                m[i]=DriverManager.getConnection("jdbc:oracle:thin:@(DESCRIPTION = (SDU=8192) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.103)(PORT = 1521))) (CONNECT_DATA = (SERVER = SHARED) (SERVICE_NAME = cXDB)))", "system", "password");
            }
            catch (Exception em)
            {
                System.out.println(em.getMessage());
            }
            try
            {
                Thread.sleep(3);
            }
            catch (Exception e)
            {
            }
            s[i]=m[i].createStatement();
            m[i].setAutoCommit(false);
            i++;
            System.out.println(i+"is ok !");
        }
        System.out.println("We are waiting!");
        try
        {
            Thread.sleep(1000);
        }
        catch (Exception e)
        {
        }
    }
}

编译上面这段程序，尝试执行看看，执行的同时留意观察NETWORK BUFFER:

SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';

NAME                       POOL              BYTES
-------------------------- ------------ ----------
NETWORK BUFFER             shared pool      328080

java -jar ora_network_buffer_test_8.jar
/*启动编译后的测试程序*/

SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';
NAME                       POOL              BYTES
-------------------------- ------------ ----------
NETWORK BUFFER             shared pool    69608200

SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';
NAME                       POOL              BYTES
-------------------------- ------------ ----------
NETWORK BUFFER             shared pool      348960
/*会话终止后，NETWORK BUFFER回缩*/

修改上述程序中的SDU到32k，重新编译后再次测试
java -jar ora_network_buffer_test_32.jar

SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';

NAME                       POOL              BYTES
-------------------------- ------------ ----------
NETWORK BUFFER             shared pool      328080

SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';

NAME                       POOL              BYTES
-------------------------- ------------ ----------
NETWORK BUFFER             shared pool    99148576
/*可以看到同样的会话数量，client端SDU增大到32k后，NETWORK BUFFER有了大幅增长*/

我们修改dispatchers参数中的SDU到8k看看
SQL> alter system set dispatchers='';

System altered.

SQL> alter system set dispatchers='(address=(protocol=tcp)(host=192.168.1.103)(sdu=8192))(SERVICE=cXDB)(dispatchers=10)';

System altered.
SQL> show parameter dispatchers

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
dispatchers                          string      (address=(protocol=tcp)(host=1
                                                        92.168.1.103)(sdu=8192))(SERVI
                                                        CE=cXDB)(dispatchers=10)
SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';

NAME                       POOL              BYTES
-------------------------- ------------ ----------
NETWORK BUFFER             shared pool      328080

java -jar ora_network_buffer_test_32.jar

SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';

NAME                       POOL              BYTES
-------------------------- ------------ ----------
NETWORK BUFFER             shared pool    99148552
/*看起来dispatcher中的SDU优先级并没有client中的高*/
我们再来看看client中SDU为8k的情况
SQL> show parameter dispatchers

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
dispatchers                          string      (address=(protocol=tcp)(host=1
                                                        92.168.1.103)(sdu=8192))(SERVI
                                                        CE=cXDB)(dispatchers=10)
SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';

NAME                       POOL              BYTES
-------------------------- ------------ ----------
NETWORK BUFFER             shared pool      328080

java -jar ora_network_buffer_test_8.jar

SQL> select name,pool,bytes from v$sgastat where name like '%NETWORK%';
NAME                       POOL              BYTES
-------------------------- ------------ ----------
NETWORK BUFFER             shared pool    69608200
/*与dispatchers中为32k,而client为8k时一样*/

由以上实践可知10.2.0.4之后，NETWORK BUFFER的使用量由客户端设定的SDU和共享服务器会话数决定。我在之前的博文中曾经列出过TNS协议的几个基础类描述(见《Oracle 网络TNS协议的几个基础类描述》),其中Session包含了setSDU(int i)方法，其代码如下:

public void setSDU(int i)
{
if(i <= 0) sdu = 2048;
else if(i > 32767)
sdu = 32767;
else if(i < 512)
sdu = 512;
else
sdu = i;
}

由以上代码可知，客户端设定的SDU时，其最大最小值分别为32k和512bytes，大于32k时被强制设为32k,而小于512bytes时被强制设为512bytes,若设定SDU<0，则被强制修正为2048 bytes,在512 bytes- 32767 bytes之间则为原值不变。

Filed Under: Oracle Tagged With: memory, ORA-07445, sdu, shared pool, shared servers, sqlnet

Comments

admin says

2010/08/06 at 00:06

Applies to:
Oracle Server – Enterprise Edition – Version: 11.1.0.6
This problem can occur on any platform.
Symptoms

When trying to run RMAN backup the following error occur (It might occur during other operations and is not predicated on RMAN running):

ORA-04031: unable to allocate 36424 bytes of shared memory (“shared pool”,”unknown object”,”sga
heap(1,1)”,”krbmror”)

1. The DISPATCHERS parameter has been set for XDB as below:

dispatchers = “(PROTOCOL=TCP) (SERVICE=compdataXDB)”

2. The trace file and output from v$sgastat shows high allocation under “NETWORK BUFFER” component of shared pool:

Trace file:
~~~~~~~~
Allocation request for: NETWORK BUFFER
Heap: 000000000FA7E6D8, size: 32832

Subpool 1
—————-
“free memory ” 152960736
“NETWORK BUFFER ” -1536212680 <== the negative value indicates over 2G used, reported incorrectly in trace file due to unpublished Bug 7258873 v$sgastat: ~~~~~~~ SQL> select * from v$sgastat where pool like ‘shared%’ order by bytes;
POOL NAME BYTES
———— ————————– ———-
shared pool free memory 139557432
shared pool NETWORK BUFFER 7053951568 —-> High

3. The alert log shows many errors as below:

dispatcher ‘D000’ encountered error getting listening address
ORA-108 : opidrv aborting process D000 ospid (1876_3744)
Process debug not enabled via parameter _debug_enable
found dead dispatcher ‘D000’, pid = (19, 136)

4. After a restart of database the instance will be fine but after period of time the NETWORK BUFFER component keeps growing excessively.
Cause

Consistent growth of NETWORK BUFFER component is consuming a lot of space in shared pool.
It causes insufficient space in shared pool.

This is due to incorrect dispatcher settings for XDB.
dispatchers = “(PROTOCOL=TCP) (SERVICE=compdataXDB)”

Bug 8573881 was raised for this issue.
Solution

1. Remove DISPATCHERS parameter for XDB if there is no XDB usage in database.

2. Restart the database instance.

回复
admin says

2010/08/06 at 00:10

Hdr: 6801602 10.2.0.3.0 RDBMS 10.2.0.3.0 MEMORY MGMT PRODID-5 PORTID-87 6333663
Abstract: UGA MEMORY GREW DRAMATICALLY AND CONSUME 80% OF LARGE POOL

PROBLEM:
——–
Problem history:

–The Database was recently upgraded from 9i to 10g
–On 10th of Jan,database started to suffer from ora-4031 errors on shared
pool,The failure size for ORA-4031 is around 32K for network buffer

–shared servers is being used,and the symptoms match that described in
Bug 5410481:Abstract: 4031, SHARED SERVER NETWORK BUFFER SIZE 32K

–suggest to CT to decrease the SDU being used to be 2k (SDU=2920),that
resolved the shared pool issue.

–However Database suffers from ora-4031 on large pool,UGA memory grew
dramatically and fragmented the free memory to an extent that a request for a
large chunk of memory (1,000,024) failed.

DIAGNOSTIC ANALYSIS:
——————–
Trace file shows the following:

Allocation request for: bind var buf
Heap: 455f65bc0, size: 1000024
******************************************************
HEAP DUMP heap name=”large pool” desc=400001990 >> the problem is in the
Large Pool now
extent sz=0xfe0 alt=216 het=32767 rec=9 flg=-126 opc=0
parent=0 owner=0 nex=0 xsz=0x1
******************************************************
HEAP DUMP heap name=”kxs-heap-b” desc=455f65bc0
extent sz=0x400 alt=32767 het=32767 rec=0 flg=2 opc=0
parent=4aefea7d8 owner=0 nex=0 xsz=0x1000000
Subheap has 196352 bytes of memory allocated

So it appears the workaround worked to clear out the excessive memory growth
in the Shared Pool. Now we see large ‘session heap’ (UGA) allocations in
the Large Pool and signs that the Large Pool is too small for the workload.

==============================
Memory Utilization of Subpool 1
================================
Allocation Name Size
_________________________ __________
“free memory ” 300,326,992
“miscellaneous ” 0
“session heap ” 1,310,285,744 << 1G for UGA memory --Ct increase large pool up to 2.5 GB,however we can still see that during heavy activity UGA memory is still using quite a bit of the memory resources in the Large Pool --The AWR reports show a yo-yo effect of memory freed and then allocated once again to UGA (session heap). The Large Pool trace shows almost 95% of the Large Pool is UGA. --ried using: sqlplus /nolog connect / as sysdba oradebug setorapid oradebug unlimit
oradebug dump heapdump 536870916
oradebug tracefile_name <<< this command provides the full path and filename of the trace file just generated >>>

to get a heapdump of the large pool but it fails with:

SQL> oradebug dump heapdump 536870916
ORA-1012: Nicht angemeldet

and the trace file generated does not have the required info.

WORKAROUND:
———–

RELATED BUGS:
————-

REPRODUCIBILITY:
—————-

TEST CASE:
———-

STACK TRACE:
————

SUPPORTING INFORMATION:
———————–

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-

DIAL-IN INFORMATION:
——————–

IMPACT DATE:
————

Can you please provide more information about the sizing of the SGA:

1. What was the SGA size and the individual SGA components in 9i?
1.1. How was the SGA size determined in 9i?
1.2. Please provide details of the factors taken into account when
calculating the large pool size in 9i (I presume customer was making use of
shared servers in 9i).

2. What was the initial SGA size and the individual SGA components in 10g?
2.1. How was the SGA size determined in 10g?
2.2. How was the large pool size determined in 10g (please provide all
factors taken into account)?
2.3. Was there any change in load (e.g. more users, more batch processing,
etc) or load type since moving to 10g? If yes, please provide details.
2.4. What changes were made to init parameters since the problem was
observed? You mentioned that SDU was modified (part of DISPATCHERS
parameter). From the provided AWR report, I see that large pool size was
1.5GB on 29 Jan 2008, and the RDA shows the large pool to be 2.5GB on 6 Feb
2008. We need to know about all changes and their effects on the problem.

3. You said that customer upgraded from 9i to 10g. In the alert log (from
RDA) I see CREATE DATABASE statement, on 22 Dec 2007. Can you please clarify
if this was an upgrade or not, and what you meant by mentioning 9i?

3. Why is customer not using automatic SGA?
3.1. Did they have problems with auto SGA? Were they running into Oracle
bug(s)? If so, please provide details.

4. About the ORA-4031 error
4.1. Is customer still getting the error? If yes, can we please have the
trace file(s) for the error and the relevant portion of the alert log (say 2
weeks or from the last instance restart before the error).
4.2. Is the UGA memory usage going up and down over a period of time (if yes,
please specify the time interval, the load on the system, users/sessions
details for low/med/high loads, etc), or is it steadily going up and
eventually ending up with ORA-4031?
4.3. It would be good to provide few AWR reports that illustrate this, as the
single AWR report (that was provided) is not of much use 🙁

Based on the above I would suggest:

1. Increase _large_pool_min_alloc to 65536 (note that this will be the
default value in 10.2.0.4). Leave large pool at 2GB (although I would think
that it needs to be larger). Run for a while or run the tests and see if
this makes any difference. Look at AWR and see if shared pool latch waits go
away from the top 5 events.

1.1. If we still get the error, please provide the ORA-4031 dump files, AWR
reports and the alert log for the relevant period.

2. If the above helps, work on the long term solution for the SGA. Customer
needs to review their (session and memory) requirements and calculate the SGA
size accordingly. Once they get that right, they should move to automatic
SGA. The whole point of auto SGA is not to worry about different pool
allocations…

回复
admin says

2010/08/06 at 00:14

Hdr: 6371207 10.2.0.2 RDBMS 10.2.0.2 SHARED SERVER PRODID-5 PORTID-226 ORA-4031 5410481
Abstract: ORA-4031 – REQUESTING 32K NETWORK BUFFER SIZE BUT SDU=2K

PROBLEM:
——–
Dispatchers are trying to allocate a “NETWORK BUFFER” (memory size: 32856)
from shared pool and it is failing with ORA-4031.
At this time users get ORA-3113
and at the end the dispatcher fails with TNS-12531 due to ORA-4031 :
error encountered when accepting new connection:
NS Primary Error: TNS-12531: TNS:cannot allocate memory

DIAGNOSTIC ANALYSIS:
——————–
Trace file shows :

*** 10:19:10.249
=================================
Begin 4031 Diagnostic Information
=================================
The following information assists Oracle in diagnosing
causes of ORA-4031 errors. This trace may be disabled
by setting the init. ora _4031 _dump_bitvec = 0
=====================================
Allocation Request Summary Informaton
=====================================
Current information setting: 04014fff
SGA Heap Dump Interval=3600 seconds
Dump Interval=300 seconds
Last Dump Time=08/28/2007 10:19:10
Dump Count=1
Allocation request for: NETWORK BUFFER
Heap: 0x6003d300, size: 32856
******************************************************
HEAP DUMP heap name=”sga heap(2,0)” desc=0x6003d300
extent sz=0xfe0 alt=216 het=32767 rec=9 flg=-126 opc=0
parent=(nil) owner=(nil) nex=(nil) xsz=0x1000000

There’s one bug which can probably explain this Bug:5410481

But cust. is using SDU=2048 to workaround this :
dispatchers =
(dispatchers=4)
(listener=(address_list=

However, the trace file shows that the “NETWORK BUFFER” requested is size:
32856, a little bit over 32k and that’s probably the reason for the
shared_pool fragmentation and final ORA-4031.

WORKAROUND:
———–
Do not use Shared servers? unacceptable for customer

RELATED BUGS:
————-
5410481

REPRODUCIBILITY:
—————-
At cust. site.

TEST CASE:
———-
n/a

STACK TRACE:
————
—– Call Stack Trace —–
calling call entry argument values in hex
location type point (? means dubious value)
——————– ——– ——————–
—————————-
ksedst()+31 call ksedst1() 000000000 ? 000000000 ?
7FBFFFCA40 ? 7FBFFFCAA0 ?
7FBFFFC9E0 ? 000000000 ?
ksm_4031_dump()+139 call ksedst() 000000000 ? 000000000 ?
9 7FBFFFCA40 ? 7FBFFFCAA0 ?
7FBFFFC9E0 ? 000000000 ?
ksmasg()+312 call ksm_4031_dump() 004014FFF ? 060000058 ?
06003D300 ? 7FBFFFCAA0 ?
000008058 ? 005398FD8 ?
kghnospc()+2570 call ksmasg() 004014FFF ? 060000058 ?
06003D300 ? 000008058 ?
0600000A0 ? 7FBFFFD650 ?
kghalo()+2159 call kghnospc() 006463B60 ? 06003D300 ?
000008058 ? 06003D34C ?
001012000 ? 000000000 ?
kmnsbm()+578 call kghalo() 000000000 ? 06003D300 ?
000008025 ? 07FFFFFFF ?
000008058 ? 000008058 ?
nsbal()+428 call kmnsbm() 2E4485BE0 ? 000008015 ?
000008025 ? 07FFFFFFF ?
000008058 ? 000008058 ?
nsbalc()+123 call nsbal() 0064DABC0 ? 0065CCBC8 ?
000008015 ? 07FFFFFFF ?
000008058 ? 000008058 ?
nsdo()+17070 call nsbalc() 0065CC950 ? 000007FFF ?
000000000 ? 07FFFFFFF ?
000008058 ? 000008058 ?
nsopen()+2315 call nsdo() 2A9749EB80 ? 000000041 ?
000000000 ? 7FBFFFDCD0 ?
7FBFFFDCE8 ? 000000000 ?
nsanswer()+539 call nsopen() 2A9749EB80 ? 000000003 ?
7FBFFFE020 ? 2A9749ECC8 ?
2A9749EC34 ? 000000000 ?
kmnans()+37 call nsanswer() 0064DABC0 ? 2A9749EB80 ?
2A9749EDD0 ? 2A9749ECC8 ?
2A9749EC34 ? 00646AAB8 ?
kmdahd()+385 call kmnans() 0064DABC0 ? 00646AAB8 ?
2A9749EB80 ? 2A9749ECC8 ?
2A9749EBF8 ? 00646AAB8 ?
kmdmai()+5156 call kmdahd() 00646AAB8 ? 00646AAB8 ?
2A9749EB80 ? 2A9749ECC8 ?
2A9749EBF8 ? 00646AAB8 ?
kmmrdp()+564 call kmdmai() 2F1FFF790 ? 00646AAB8 ?
2A9749EB80 ? 2A9749ECC8 ?
2A9749EBF8 ? 00646AAB8 ?
opirip()+1196 call kmmrdp() 2F1FFF790 ? 00646AAB8 ?
2A9749EB80 ? 2A9749ECC8 ?
2A9749EBF8 ? 00646AAB8 ?

SUPPORTING INFORMATION:
———————–
This problem started today after migrating the database from 32 to 64 bits,
no parameter changes were done at that time.
The problem was temporarily solved after increasing the shared_pool and
reducing db_block_buffers, however their main concern is that the problem
will probably arise again even with a bigger shared pool.

The question is why are we making 32K allocation requests when we have
sdu=2048 for dispatchers?

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-

DIAL-IN INFORMATION:
——————–

IMPACT DATE:
————

回复
admin says

2010/08/06 at 00:18

Hdr: 7673071 10.2.0.3 NET 10.2.0.3 PRODID-115 PORTID-269 ORA-7445 7282773
Abstract: ORA-4031 IN “NETWORK BUFFER” BRINGS ORA-7445 AND DISPTCHER DOWN

PROBLEM:
——–
When dispatcher receives ORA-4031 in “NETWORK BUFFER”, dispatcher also
receives ORA-7445 and process goes down.

———————————————————–
=====================================
Allocation Request Summary Informaton
=====================================
Current information setting: 04014fff
SGA Heap Dump Interval=3600 seconds
Dump Interval=300 seconds
Last Dump Time=12/26/2008 19:37:16
Dump Count=1
Allocation request for: NETWORK BUFFER <----(*) Heap: 04E425A4, size: 32812 ----------------------------------------------------------- DIAGNOSTIC ANALYSIS: -------------------- In dispatcher trace, error is raised by the following procedure. ORA-4031 -> TNS-12531,TNS-12613 -> ORA-7445

This ORA-7445 error occurs only when ORA-4031 occurs.
So we understand that we are able to avoid ORA-7445 errors by
preventing occurence of ORA-4031 errors.
But most of ORA-4031 occurs by tuning issue, so it is impossible to
avoid this error completely.

Normally when ORA-4031 occurs, dispatcher process are alive and
following requests are handled without problem.
But by this ORA-7445, dispathers go down and many sessions (talking
with that dispatcher process) are gotten errors.
So we think this behavior is a problem.

Similar phenomenon is reported at BUG:5986729.
We can reproduce the problem, and get additonal info.
Therefore please proceed the investigation about this issue.

WORKAROUND:
———–
Avoiding the ORA-4031, but it’s impossible to avoid ORA-4031 completely

RELATED BUGS:
————-
BUG:5986729 DISPACHER DIED BY ORA-7445 AT NSTOCONTROLRTO

REPRODUCIBILITY:
—————-
Same problem reproduces in my ct’s site and my house.
We have comfired on R10.2.0.3 Windows x86 32bit.

TEST CASE:
———-
I will upload testcase later.

STACK TRACE:
————
stack of ORA-7445 is as follows.

ksnwait nsevwait ksliwat kslwaitns_timed kskthbwt kslwait kmdmai
kmmrdp opidrv sou2o opimai_real opimai BackgroundThreadStart@4

SUPPORTING INFORMATION:
———————–
n/a

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-
n/a

DIAL-IN INFORMATION:
——————–
n/a

IMPACT DATE:
————
n/a

*** 12/28/08 10:24 pm ***
I uploaded alert and trace with ORA-7445 on /upload/bug7673071
Linux shows good symbol for network functions.
But we see several different stack trace.

10.2.0.3
o nstoToqGetKey nlsqCompare nlsqInsert nstoToqRefresh nstoArmEventATO
nstoPostNTConn nsopen nsanswer kmnans kmdahd kmdmai kmmrdp opirip …
o nserrbc nserrbd nstoHandleEventTo nstoToqWalk nsevwait ksnwait
ksliwat kslwaitns_timed kskthbwt kslwait kmdmai kmmrdp opirip …
o nserror nserrbc nserrbd nstoHandleEventTO nstoToqWalk nsevwait
ksnwait ksliwat kslwaitns_timed kskthbwt kslwait kmdmai kmmrdp …
10.2.0.4
o nlqudeq nlsqSpliceOut nlsqRemove nstoToqRefresh nstoCleanupTOC
nsevidh nsevrec nsevwait ksnwait ksliwat kslwaitns_timed
kskthbwt kslwait kmdmai kmmrdp …
11.1.0.7
o nlqudeq nlsqSpliceOut nlsqRemove nstoToqRefresh nstoCleanupTOC
nsevidh nsevrec nsevwait ksnwait ksliwat kslwait kmdmai kmmrdp …

Before getting SEGV, following errors are recorded in trace file.

error encountered when accepting new connection:
NS Primary Error: TNS-12531: TNS:cannot allocate memory
NS Secondary Error: TNS-12613: Message 12613 not found;

回复
admin says

2010/08/06 at 00:19

Hdr: 2672096 8.1.7.3 RDBMS 8.1.7.3 SHARED SERVER PRODID-5 PORTID-89
Abstract: MTS CONNECTIONS HANG DISPATCHER ALLOCATING “NETWORK BUFFER”

PROBLEM:
——–
After the database is migrated from 7.3.3 to 8.1.7.3, connection to database
via mts hang.
Customer first ran into bug 1666549 and implemented sdu=32000 as a workaround
as described in bug.

Customer then sees scenario of Bug 2660758, That bug was closed as dup of
1666549. The workaround was to decently tune the shared pool and large pool.

However, this customer has upped the shared pool and large pool to 1Gb and
they
are still observing the behaviour that the shared pool gets so fragmented that

no contiguous piece can be found for the 32k net buffers.

THe large pool is not maxing out (>400 free) so there is no evidence PQ
activity is forcing net pakcges out of the large pool into the shared pool.

Also, once this happens the dispatcher seems to wait for SGA mem allocation.
Previous open bug same customer:

2628505 close as duplicate of bug 1666549

DIAGNOSTIC ANALYSIS:
——————–
Inplemented workaround from bug 1666549 by adding the parameter SDU=32000 and
modify the init.ora (mts_dispatchers), problem still there.

Oracle Cousultants onsite investigation:

fter bouncing the instance we see that mts continues to use the shared pool
for the network buffer 32064 bytes, we have configured sqlnet to use an
sdu=32000 as a workaround to bug 1666549. We are now hitting bug 2660758.

What is happening is that the shared pool is fragmenting, until we no longer
have sufficient continuous memory chunks of size 32064 inorder to be used by
the network buffer. It appears that ORACLE only coalesces chunks in the
shared
pool if there are 10 continous chunks free. We have tried flushing the shared

pool every 15 minutes but it appears that this does not coalesce the
fragmented shared pool. It appears that we only succeed in coalescing the
shared pool by bouncing the system. This is not an acceptable workaround for
the client as we are in a 7X24 environment. I roughly estimate that under
normal conditions with the current configuration we can probably only operate
2 to 4 days until it fragments.

WORKAROUND:
———–
no workwound other than restarting database

RELATED BUGS:
————-
Bug:2628505 as duplicate bug 1666549.
bug 2660758.

REPRODUCIBILITY:
—————-
Frequent

TEST CASE:
———-
on customer production site.

STACK TRACE:
————

SUPPORTING INFORMATION:
———————–

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-
Bing Said +61 3 8616 4119

DIAL-IN INFORMATION:
——————–

IMPACT DATE:
————

2628505 was logged for this customer and closed as dup of bug 1666549. Cust
already large_pool of 1 gig which was the w/a.
currently the problem occurs every 2 – 4 days where the system hangs and need
to be re-booted. As the system is currently operational I am downgrading this
to P2.

The issue for bug 1666549 is addressed in 9i and I would suggest the customer
upgrade to this version as the fix(es) involved are not backportable to 8.1.7

回复
admin says

2010/08/06 at 12:04

Applies to:

Oracle Net Services – Version: 8.1.7.4.0 to 10.1.0.2.0
Information in this document applies to any platform.
Applications using OracleNet, applications requiring performance tuning with respect to OracleNet.
Goal

==================
AIM OF THIS ARTICLE
==================

++ What is the relation between MTU, SDU, and TDU.
++ Explain how applications are affected by the above paramters.
++ To build upon and concisely explain some important points of the following Notes:

1) Note:67983.1 “Oracle Net Performance Tuning”
2) Note:44694.1 “SQL*Net Packet Sizes (SDU & TDU Parameters)”
Solution

Basically, MTU stands for Maximum Transmission Unit. It defines the maximum size of a packet that can be transferred in one frame over a network. Hence, translating this to the TCP/IP suite, it would mean that if the MTU is 1500 bytes, 20 bytes of those 1500 bytes are lost to the IP header, and 20 bytes to the TCP packet header. Thus, a maximum of only 1460 bytes of actual application data can be transferred in one batch,ie , in a single packet, over the network. (This doesn’t really “waste” any bytes, its just that they are unusable from the application point of view).

In Oracle terms, this 1460 bytes of data consists of two more headers – the NS header, and the NT header. When any application passes its data via OracleNet, the NS layer of OracleNet receives this application data, and splits it up into its own packets (NS packets). The size of this packet is determined by the SDU size (SDU stands for Session Data Unit). By default, this size, ie, SDU, is set to 2048 bytes.

The NS layer, after splitting the application data into packets, send EACH NS packet to the NT layer. The NT layer is the layer of OracleNet that does the actual interaction with the real Transport layer, ie, with TCP in our case. The NT layer will further split up EACH packet is received from NS into further packets, this time, determined by the TDU size (TDU stands for Transmission Data Unit).

Thus, effectively, how many NT packets are generated will actually determine how efficiently data is being sent over the network.

The following notes can be referred –

1) Note: 44694.1 “SQL*Net Packet Sizes (SDU & TDU Parameters)”
2) Note: 67983.1 “Oracle Net Performance Tuning”

Note 1) gives an introduction to SDU and TDU, but note 2) gives a very clear excerpt on exactly how SDU and TDU settings affect your performance over the network, in relation to the MTU of the network.

The excerpt in note 2) that one intending to tune an application should be looking at is –

”
3. Setting the SDU size

From a SQL*Net perspective, the only thing that can be tuned is the SDU size. SDU stands for Session Data Unit. SDU is a buffer at the Network Session (NS) layer from which SQL*Net packets are passed via TDU (Transport Data Unit) at the Network Transport (NT) layer, to the underlying protocol. By default in version 2.3 (7.3), SDU was set to 4K. However, it changes to 2K in Net*8 and Oracle*Net (8.0 and higher).

Below is a real life example where a customer’s application performance was improved by tuning the SDU size.

A company had a large Global Information System written in OCI, where information was initially stored in flat files and then passed via the OCI application into Oracle 8.0.4. The application read the flat file in fixed chunks and then inserted them via Net8 into the database. The data was read from the flat files in 4K blocks but, when passed via OCI over Net8, the amount of data passed over the line equated to 2900 bytes.

With the default SDU size of 2K, each block of data was handled as follows:

OCI App —-> Net*8 2900 bytes passed
NS —-> NT 2048 bytes (2K SDU size is being used)
TCP Layer 1460 bytes – packet 1
588 bytes – packet 2
NS —-> NT 852 bytes
TCP Layer 852 bytes – packet 3

If we set SDU to 4K, each block will be handled as follows:

OCI App —-> Net*8 2900 bytes passed
NS —-> NT 2900 bytes (4K SDU size is being used)
TCP Layer 1460 bytes – packet 1
1450 bytes – packet 2

So for each 4K block the OCI application passes to Net8, two TCP packets are sent instead of three by changing the SDU size. Ignoring Acknowledgment packets (ACK’s), given that each flat file was 30MB in size, there would be 7680 X 4K blocks of data to send. With SDU at 2K, 23040 packets of data are
sent. With SDU set at 4K, 15360 packets are sent, a saving of 7680 data packets.

It could be argued that the same amount of data is being passed over the wire, and it should take the same amount of time given a fixed bandwidth. However each packet has Data Link Control (DLC), TCP, and IP header information which take extra bandwidth. For example 7680 extra packets will use 405K of extra header data. Secondly, each packet will take a finite amount of time to pass through the Network Interface Card (NIC), routers etc, all of which will add time to the overall transaction. Also, if the ACK’s in the equation included, packets on the network are lower with SDU set to 4K.

To understand this it is necessary to understand how ACK’s work. In the above example, where 2K SDU size was used then there will be three data packets sent. The way TCP works is that if a data packet is sent and if that is the end of transmission, an ACK will be sent back. If some data is sent and it fills two
packets, an ACK will be sent on receipt of two consecutive TCP segments. If three data packets are sent and the transmission ends, an ACK will be sent for the first two data packets and an ACK for the third packet. Therefore with an SDU of 2K, there will be three data packets and therefore get two ACK packets per 2900 bytes of data.

With an SDU of 4K there are only two packets per 2900 bytes sent therefore only one ACK is sent. So with an SDU of 2K, 15360 ACK’s are sent. And with an SDU of 4K set there will 7680 ACK packets, a saving of 7680 ACK packets.

So as demonstrated by setting SDU appropriately there can be a marked improvement in network and application performance.

How To –>

To set SDU size, edit the TNSNAMES.ORA and also the LISTENER.ORA on the desired server, as follows (if either one is not set, the default SDU will be used for SQL*Net communication):

#TNSNAMES.ORA

ukp9930.world =
(DESCRIPTION =
(SDU=4096) #<– SDU can be set from 512 bytes to 32Kb
(ADDRESS_LIST =
(ADDRESS =
(PROTOCOL = TCP)
(Host = ukp9930)
(Port = 1521)
)
)
(CONNECT_DATA = (SERVICE_NAME = rep1))
)

#LISTENER.ORA

LISTENER =
(ADDRESS_LIST =
(ADDRESS =(PROTOCOL= TCP)(Host= UKP9930)(Port= 1521))
)

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SDU=4096) <———- SDU has to match the setting in
(SID_NAME = rep1) tnsnames otherwise it will negotiate to
) the lowest of the two sizes
)

Again getting the best SDU setting comes down to a bit of maths.

This is all that can be done from an Application or SQL*Net perspective. However if this is done right but performance is still an issue, then there are other things which are degrading performance on the WAN.

================
END OF EXCERPT
================

Hence, the principle is that we make the TDU a multiple of your SDU, and the (TDU+40) a multiple of the MTU. The SDU's default is 2K, and the TDU is 32K. Normally, the SDU and TDU defaults are best kept in that way. One can however follow the above suggestions for further tuning.

Finally, the speed of the network interfaces (network cards) does not have any direct bearing on the above parameters. Thus, whether the network card is of speed 100Mega bit per second or 1Giga bit per second, it is the MTU size that will bear upon the above parameters.

回复
admin says

2010/08/06 at 12:05

Applies to:

Oracle Net Services – Version: 8.0.5.0.0 to 11.1.0.6.0
Information in this document applies to any platform.
Purpose

Oracle Net (formally SQL*Net) allows limited control over the packet sizes via the parameters SDU (Session Data Unit) and TDU (Transport Data Unit). These control the sizes of the ‘Session’ and ‘Transport’ OSI layer buffers respectively. The TDU parameter has been deprecated in the Oracle Net v8.0 and beyond and is ignored. It is only mentioned here for backward compatibility.
Scope and Application

This document covers the Session Data Unit configuration parameter.
SQL*Net Packet Sizes (SDU & TDU Parameters)

Packet Sizes

SDU is the Session Data Unit of the NS layer and regulates the size of the sent and read data to the NT layer. SDU values range from 512 to 32767 bytes with a default of 2048 bytes (this range depends on version). To minimize Oracle Net packet header overhead and message fragmentation, set the SDU size as a multiple of the MSS (Maximum Segment Size of the Network Protocol being used). Keep in mind that the old parameter, TDU, was the same as the Maximum Transmission Unit (MTU) of Ethernet we are now familiar with, yet with older Physical Layer protocols it was not called MTU so Oracle used the term of TDU.

To calculate the MSS:
MSS = MTU – TCP header size – IP header size

For TCP over Ethernet:
MTU (or TDU) – 1500 bytes for Ethernet
TCP – 20 bytes
IP – 20 bytes

This leaves 1460 bytes (approximately as sometimes the TCP header is larger) for the MSS for TCP/IP over Ethernet. The Transparent Network Substrate (TNS) header is an additional 30 bytes. So the real amount of data that can be sent will be 1430. Since the TNS header is included in the TCP data packet, you include the TNS header in the SDU size. For instance, if you had 5720 bytes of data to send, it will send it in four TCP packets (5720/1430=4). There will be four TNS packets sent, so this adds 30 bytes per packet for the TNS header (1430+30=1460). Adding the 40 bytes of TCP/IP header, you will get four full Ethernet packets sent (1460+40=1500). In order to get TCP/IP Windowing to function efficiently, you should configure the TCP Send and Receive buffers accordingly (which is not covered in this paper).

TDU is the Transport Data Unit and regulates the size of the sent and read data at the Transparent Network Substrate Network Transport ITNS NT) layer (the layer that communicates to the Operating System protocol layer). Overriding the default TDU size of 32767 bytes is normally not required, and in Oracle v8.0 and later it cannot be configured. Values for TDU range from 0 to 32767. Not setting TDU will set its value to the default for that protocol adapter (also the same as setting TDU=0). A TDU value of 1, for example, will cause the NT layer to read an write only 1 byte of data from, or to, the actual protocol layer. The NS layer will append all of the data together. For this reason it is advisable to let Sql*Net obtain as much data as it can from the network by using the default values.

The SQL*Plus “arraysize” parameter (a parameter that determines the number of rows fetched per network trip) will obviously play a major role in performance since more fetched rows means more fetched bytes of data.

For a connection to have an SDU size greater than 2048, the client and the server must specify a value the larger value for the SDU. The database will then select the lower of the 2 values.

SDU Configuration:

To configure the SDU to work, ensure the SDU values appear in all the relevant places. Here are some examples for the 2 main locations:

1.Client’s TNSNAMES.ORA: The parameters must appear in the DESCRIPTION clause.

TEST =
(DESCRIPTION =
(SDU=8192)
(TDU=8192) <- 8.0 TDU position
(ADDRESS =(PROTOCOL = TCP)(HOST = bill.johndoe.com)(PORT = 1521))
(CONNECT_DATA = (SID = V920)))

LISTENER.ORA: The parameters must appear in the SID_DESC clause.
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SDU = 8192) <- Setting SDU to 8192 for this SID
(TDU = 8192) <- 8.0 TDU position
(SID_NAME = V920)
(ORACLE_HOME = /oracle/product/9.2.0)))

2. From releases 9.0.1.5, 9.2.0.4 and 10.2.0.1 onwards, the default SDU size can now be changed for connections using SERVICE_NAME. ie Dynamic registeration. Where as example 1, is for SID only, which is static registration

SQLNET.ORA

DEFAULT_SDU_SIZE = 8192
This now means SDU can be set on a per connection basis. Either by adding SDU parameter to local naming configuration file (TNSNAMES.ORA) and the listener configuration file (LISTENER.ORA), or by setting SDU for all Oracle Net connections with the parameter DEFAULT_SDU_SIZE in the SQLNET.ORA file on the client and server.

Shared Server Configuration

If using shared server processes, set the SDU size in the DISPATCHERS parameter as follows:
DISPATCHERS="(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP))(SDU=8192))"

Note: For Oracle8, use the parameter MTS_DISPATCHERS.

Ensure that the SDU size matches the value configured for the client, as the Server will select the smaller of the values.

回复
admin says

2010/08/06 at 12:06

How to Determine SDU Value Being Negotiated Between Client and Server
Applies to:

Oracle Net Services
Information in this document applies to any platform.
Checked for relevance on 04-FEB-2010
Goal

How to determine, via Oracle Net tracing, if the DEFAULT_SDU_SIZE parameter is working. i.e. Is the new SDU packet size being negotiated properly between client and server?
Solution

After adding DEFAULT_SDU_SIZE setting to the sqlnet.ora file at both client and server then restarting the listener, generate a client trace from any remote client.

Sqlnet.ora file at the server example:

DEFAULT_SDU_SIZE=8192

Use the same parameter at the client side sqlnet.ora file.

Edit the client’s sqlnet.ora file and add the following syntax:

TRACE_DIRECTORY_CLIENT =
TRACE_FILE_CLIENT = CLIENT
TRACE_LEVEL_CLIENT = 16

Now establish a connection to the database from the remote client and immediately exit.

Then review the cli*.trc file on the client and check for a line similar to the following :

[21-JAN-2010 21:48:14:821] nsconneg: vsn=314, lov=300, opt=0x81, sdu=8192, tdu=32767, ntc=0x7f08
[21-JAN-2010 21:48:14:821] nsconneg: vsn=314, gbl=0x81, sdu=8192, tdu=32767
Note that the increased value, sdu=8192, was negotiated between client and server.

回复

共享池中的NETWORK BUFFER

Comments

Comment 取消回复