IO校准(IO Calibration)特性可以帮助我们了解存储系统的真实性能,以进一步判断I/O性能问题是由数据库还是存储系统自身引起的。I/O校准特性(IO Calibration)通过对Oracle数据文件的随机I/O访问存储介质,其结论值更符合数据库IO性能的真实情况。
在使用该特性前,我们要确保满足以下条件:
- 调用该存储过程需要用到SYSDBA权限
- TIME_STATISTICS 参数为true:
SQL> show parameter timed_statistics;
NAME TYPE VALUE
———————————— ———– ——————————
timed_statistics boolean TRUE
- 必须打开异步IO;注意在使用文件系统时,FILESYSTEMIO_OPTIONS需设为SETALL或ASYNC,否则Oracle不会启用异步IO。
- 可以通过以下查询检验是否启用了异步IO:
SQL> SELECT NAME, ASYNCH_IO 2 FROM V$DATAFILE F, V$IOSTAT_FILE I 3 WHERE F.FILE# = I.FILE_NO 4 AND FILETYPE_NAME = 'Data File'; NAME ASYNCH_IO -------------------------------------------------------------------------------- --------- D:\TOOLS\ADMINSTRATORG\ORADATA\PROD\DATAFILE\O1_MF_SYSTEM_65DN8HXT_.DBF ASYNC_ON D:\TOOLS\ADMINSTRATORG\ORADATA\PROD\DATAFILE\O1_MF_SYSAUX_65DN8J18_.DBF ASYNC_ON D:\TOOLS\ADMINSTRATORG\ORADATA\PROD\DATAFILE\O1_MF_UNDOTBS1_65DN8J1S_.DBF ASYNC_ON D:\TOOLS\ADMINSTRATORG\ORADATA\PROD\DATAFILE\O1_MF_USERS_65DN8J2X_.DBF ASYNC_ON
IO Calibration特性可以通过DBMS_RESOURCE_MANAGER.CALIBRATE_IO存储过程调用;该过程会对Oracle数据文件引发一系列IO敏感的只读工作负载(由1MB大小的随机IO组成),从而判断存储系统所能持续的最大IOPS(每秒最大IO请求数)和MBPS(每秒IO传输速率)。为了使结果更具代表性,应当保持IO Calibration测试过程中整个数据库是空闲的,没有其他IO负载损耗。
让我们具体来看一下CALIBRATE_IO存储过程的调用方法:
-- Initiate an I/O calibration -- -- Input arguments: -- num_physical_disks - Approximate number of physical disks in -- the database storage -- max_latency - Maximum tolerable latency in milliseconds for -- database-block-sized IO requests -- -- Output arguments: -- max_iops - Maximum number of I/O requests per second that can be -- sustained. The I/O requests are randomly-distributed, -- database-block-sized reads. -- max_mbps - Maximum throughput of I/O that can be sustained, -- expressed in megabytes per second. The I/O requests -- are randomly-distributed, 1 megabyte reads. -- actual_latency - Average latency of database-block-sized I/O -- requests at "max_iops" rate, expressed in milliseconds. -- PROCEDURE calibrate_io ( num_physical_disks IN PLS_INTEGER DEFAULT 1, max_latency IN PLS_INTEGER DEFAULT 20, max_iops OUT PLS_INTEGER, max_mbps OUT PLS_INTEGER, actual_latency OUT PLS_INTEGER );
其中输入参数num_physical_disks用以指定数据库存储所用物理磁盘的近似数;max_latency指定了客户所能容忍的最大IO 延迟时间,单位为ms;一般我们认为IO平均等待在10ms左右是一种较好的表现,远大于这个数字往往说明IO负载过高了,这里我们假定15ms是我们的deadline。
SQL> set serveroutput on; SQL> DECLARE 2 lat INTEGER; 3 iops INTEGER; 4 mbps INTEGER; 5 BEGIN 6 -- DBMS_RESOURCE_MANAGER.CALIBRATE_IO (disk_count,max_latency , iops, mbps, lat); 7 DBMS_RESOURCE_MANAGER.CALIBRATE_IO (2, 15, iops, mbps, lat); 8 9 DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops); 10 DBMS_OUTPUT.PUT_LINE ('latency = ' || lat); 11 dbms_output.put_line('max_mbps = ' || mbps); 12 end; 13 / max_iops = 127 latency = 15 max_mbps = 38 PL/SQL procedure successfully completed /*平均延迟为15ms时,最大iops为127,mbps为38*/ /*执行过程中也可以通过V$IO_CALIBRATION_STATUS视图了解进度*/ SQL> select * from V$IO_CALIBRATION_STATUS; STATUS CALIBRATION_TIME ------------- -------------------------------------------------------------------------------- IN PROGRESS 11-8月 -10 12.02.35.121 下午 SQL> alter session set nls_timestamp_format='HH24:MI'; Session altered SQL> col start_time for a10; SQL> col end_time for a10; SQL> select * from DBA_RSRC_IO_CALIBRATE; START_TIME END_TIME MAX_IOPS MAX_MBPS MAX_PMBPS LATENCY NUM_PHYSICAL_DISKS ---------- ---------- ---------- ---------- ---------- ---------- ------------------ 12:55 12:59 127 38 14 15 2 /*DBA_RSRC_IO_CALIBRATE视图记录了IO CALIBRATION的历史信息*/
[gview file=”http://askmac.cn/wp-content/uploads/resource/calibrate_io.pdf”]
Hdr: 7047806 11.1.0.6 RDBMS 11.1.0.6 VOS PRODID-5 PORTID-226 ORA-600
Abstract: ORA-600 [12240] AFTER INTERRUPTING DBMS_RESOURCE_MANAGER.CALIBRATE_IO
PROBLEM:
——–
To reproduce, simply run and interrupt (Ctrl-C)
dbms_resource_manager.calibrate_io.
Then you are going to see a few cs (Calibration Slave?) background processes
that are waiting on “KSV master wait”.
A foreground is trying to kill these processes and its trace file contains
the following:
rdel: Failed to kill slaves
After a few tries, the process gives up with an oeri(12240) error:
rdel: Failed to kill slaves
Incident 105769 created, dump file:
/ade/orabench_lin/oracle/log/diag/rdbms/dwmain/MAIN/incident/incdir_105769/MAI
N_ora_13684_i105769.trc
ORA-600: internal error code, arguments: [12240], [], [], [], [], [], [],
[], [], [], [], []
At this point pmon takes the lead and tries to cleanup this now dead process
but cannot and generate the following in its trace file:
*** 15:14:01.372
found process 0x4c71495c pid=21 serial=123 ospid = 13684 dead
rdel: Failed to kill slaves
DIAGNOSTIC ANALYSIS:
——————–
I have reproduced this in-house on label RDBMS_MAIN_LINUX_080507 with ASM.
WORKAROUND:
———–
RELATED BUGS:
————-
REPRODUCIBILITY:
—————-
TEST CASE:
———-
STACK TRACE:
————
SUPPORTING INFORMATION:
———————–
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
—————————————-
DIAL-IN INFORMATION:
——————–
IMPACT DATE:
————
When shutting down the database, these cs slaves generate trace files with
this sort of content:
ORA-448: normal completion of background process
ORA-448: normal completion of background process
—– SQL Statement (None) —–
Current SQL information unavailable – no cursor.
—– Call Stack Trace —–
calling call entry argument values in hex
location type point (? means dubious value)
——————– ——– ——————–
—————————-
skdstdst()+38 call kgdsdst() BFFFA088 ? 2 ?
ksedst1()+81 call skdstdst()+42949672 BFFFA088 ? 0 ? 1 ? 88787CE
?
58 E5A0C78 ? 88787CE ?
ksedst()+30 call ksedst1()+308268239 B79C1C88 ? B7BF058C ?
5 2050033 ? 0 ? FF ? 2004 ?
dbkedDefDump()+960 call ksedst()+4294967266 0 ? 4BA29BBC ? 4BA29BBC ?
70A068CE ? 0 ? 4CC507A4 ?
ksedmp()+35 call dbkedDefDump()+4294 1 ? 0 ?
966336
ksfdcse()+7883 call ksedmp()+4294967261 1 ? 13B ? BFFFA4F8 ? E ?
4C716978 ? 20008000 ?
kcfcse()+550 call ksfdcse()+429495941 BFFFAB94 ? 0 ? B79D8024 ?
3 182715 ? 3B039D ? 15 ?
ksvrdp()+12669 call 00000000 0 ? 0 ? 0 ?
opirip()+662 call ksvrdp()+539424192 6F746365 ? 64207972 ?
2073656F ? 20746F6E ?
73697865 ? 6F662074 ?
opidrv()+500 call opirip()+4294966634 32 ? 4 ? BFFFC628 ?
sou2o()+71 call opidrv()+4294966796 32 ? 4 ? BFFFC628 ?
RELEASE NOTES:
]]A state transition was not occuring on CTRL-C
]]causing the ORA-600 in this bug.
]]
REDISCOVERY INFORMATION:
If you perform a CTRL-C while doing IO calibration and
you see this trace in the alert log, you may have hit this
bug.
WORKAROUND:
None