SQL Performance Analyzer SPA常用脚本汇总

SPA常用脚本汇总

附件为 一个SPA报告 spa_buffergets_summary

 

SQL 性能分析器 SQL Performance Analyzer SPA

Oracle Database 11g 引入了 SQL 性能分析器;使用该工具可以准确地评估更改对组成工作量的 SQL 语句的影响。SQL 性能分析器可帮助预测潜在的更改对 SQL 查询工作量的性能影响。这种功能可向 DBA 提供有关 SQL 语句性能的详细信息,例如,执行前后的统计信息,提高或降低性能的语句。这样一来,您就可以执行诸如以下操作的操作:在测试环境中进行更改,以确定数据库升级是否会改进工作量性能。

 

  1. 11g 的新增功能
  2. 目标用户:DBA、QA、应用程序开发人员
  3. 帮助预测系统更改对 SQL 工作量响应时间的影响
  4. 建立不同版本的 SQL 工作量性能(即 SQL 执行计划和执行统计信息)
  5. 以串行方式执行 SQL(不考虑并发性)
  6. 分析性能差异
  7. 提供对单个 SQL 的细粒度性能分析
  8. 与 SQL 优化指导集成在一起以优化回归

SQL 性能分析器:使用情形
SQL 性能分析器可用于预测和防止会影响 SQL 执行计划结构的任何数据库环境更改所带来的潜在性能问题。这些更改可以包括(但不限于)以下任何一种更改:

  1. 数据库升级
  2. 实施优化建议
  3. 更改方案
  4. 收集统计信息
  5. 更改数据库参数
  6. 更改操作系统和硬件

 

DBA 甚至可以使用 SQL 性能分析器为最复杂的环境预测先期更改导致的 SQL 性能更改。例如,随着应用程序在开发周期中的变化,数据库应用程序开发人员可以测试对方案、 数据库对象和重写应用程序的更改,以减轻任何潜在的性能影响。
使用 SQL 性能分析器还可以比较 SQL 性能统计信息。

SQL 性能分析器:概要

1.  收集 SQL:在这个阶段中,将收集用于表示生产系统中的 SQL 工作量的 SQL 语句集。可以使用 SQL 优化集或自动工作量资料档案库 (AWR) 来捕获要传送的信息。因为 AWR 本质上是捕获高负载的 SQL,所以应考虑修改默认的 AWR 快照设置和捕获的顶级 SQL,以确保 AWR 捕获最大数量的 SQL 语句。这可以确保捕获更加完整的 SQL 工作量。

2.  传送:在这个阶段中,应将得到的工作量结果传送到测试系统。从生产系统导出 STS,然后将 STS 导入到测试系统。

3.  计算“之前版本”性能:在进行任何更改之前,执行 SQL 语句,收集评估将来的更改对工作量性能的可能影响所需的基线信息。在此阶段收集的信息给出了系统工作量当前状态的一个快照。性能数据包括:

-执行计划(如由解释计划生成的计划)
-执行统计信息(如由占用时间、缓冲获取次数、磁盘读取次数和已处理的行数组成的信息)

4. 进行更改:获得了之前版本数据后,可以实施计划的更改,然后开始查看对性能的影响。

5.  计算“之后版本”性能:在数据库环境中进行了更改之后才执行此步骤。SQL 工作量的每个语句都在虚拟执行(仅收集统计信息)模式下运行,收集与步骤 3 所捕获的信息相同的信息。

6.  比较和分析 SQL 性能:在获得了两个版本的 SQL 工作量性能数据后,可以通过比较之后版本与之前版本的数据来进行性能分析。比较的根据是执行统计信息,如所用时间、CPU 时间和缓冲区获取次数等。

7.  优化回归的 SQL:在此阶段中,已经准确地确认了哪些 SQL 语句在进行数据库更改时可能导致性能问题。在此阶段中可以使用任何一种数据库工具来优化系统。例如,可以对确认的语句使用 SQL 优化指导或访问指导,然后实施相应的建议。也可以使用在步骤 3 中捕获的计划植入 SQL 计划管理 (SPM) 以确保计划保持不变。在实施了任何优化操作后,应重复该过程来创建新的之后版本,然后分析性能差异以确保新的性能是可接受的。
默认情况下SPA若涉及到DML语句则只有查询部分Query会被执行,但是貌似是从11.2开始可以执行完全的DML了,需要加入参数EXECUTE_FULLDML,但是该参数目前有一些BUG:

Bug 10428438 : WITH EXECUTE_FULLDML ROWS IS ALWAYS SET TO 0 11.2.0.1

Bug 14635522 : SPA SHOULD CAPTURE AND REPLAY TRANSACTIONS 11.2.0.3

 

By default, only the query portion of DMLs is executed. Using APIs, you can execute the full DML by using the EXECUTE_FULLDML task parameter.EXECUTE_FULLDML when set to TRUE executes DML statement fully, including acquiring row locks and modifying rows; When EXECUTE_FULLDML is set to FALSE (the default value is false) to execute only the query part of the DML without modifying data. When TRUE, SQL Performance Analyzer will issue a rollback following DML execution to prevent persistent changes from being made by the DML. So SPA does not make make any change to the data in the tables.

 

执行方法如下:

 

execute DBMS_SQLPA.SET_ANALYSIS_TASK_PARAMETER(task_name   => 'TASK_21137', -
                                               parameter   => 'EXECUTE_FULLDML', -
                                               value       => 'TRUE');

 

 

 

 

从cursor cache中收集tuning set, 持续12分钟,间隔5秒钟

 

 

begin
DBMS_SQLTUNE.CREATE_SQLSET (sqlset_name => 'MAC_SPA');
dbms_sqltune.capture_cursor_cache_sqlset(
sqlset_name => 'MAC_SPA' ,
time_limit => 12*60,
repeat_interval => 5);
end ;
/

basic_filter=> q'# module like 'DWH_TEST%' and sql_text not like '%applicat%' and parsing_schema_name in ('APPS') #'

basic_filter   => 'sql_text LIKE ''%my_objects%'' and parsing_schema_name = ''SPA_TEST_USER''',

==>过滤条件使用

 

从当前cursor cache中匹配条件 获得SQLset ROW

 

 

SELECT sql_id, sql_text 
FROM table(DBMS_SQLTUNE.SELECT_CURSOR_CACHE('buffer_gets > 500')) 
ORDER BY sql_id;

SELECT * 
FROM table(DBMS_SQLTUNE.SELECT_CURSOR_CACHE('sql_id = ''4rm4183czbs7j'''));

 DECLARE
  cur sys_refcursor;
BEGIN
  OPEN cur FOR
    SELECT value(P) 
    FROM table(DBMS_SQLTUNE.SELECT_CURSOR_CACHE) P;

  -- Process each statement (or pass cursor to load_sqlset).

  CLOSE cur;
END;
/

 -- create the tuning set
EXEC DBMS_SQLTUNE.CREATE_SQLSET('MAC_SPA');
-- populate the tuning set from the cursor cache
DECLARE
 cur DBMS_SQLTUNE.SQLSET_CURSOR;
BEGIN
 OPEN cur FOR
   SELECT VALUE(P)
     FROM table(
       DBMS_SQLTUNE.SELECT_CURSOR_CACHE(
         'parsing_schema_name <> ''SYS'' AND elapsed_time > 5000000',
          NULL, NULL, NULL, NULL, 1, NULL,
         'ALL')) P;

DBMS_SQLTUNE.LOAD_SQLSET(sqlset_name => 'MAC_SPA',
                        populate_cursor => cur);

END;
/

 

 

从AWR快照中加载SQLset ROW到SQL TUNING SET

 

 

DECLARE
  cur sys_refcursor;
BEGIN
  OPEN cur FOR
    SELECT VALUE (P) 
    FROM table(dbms_sqltune.select_workload_repository(4146,4161)) P;

  -- Process each statement (or pass cursor to load_sqlset)
  DBMS_SQLTUNE.LOAD_SQLSET(sqlset_name => 'MAC_SPA',
                        populate_cursor => cur);
  CLOSE cur;
END;
/

 

 

 

将SQL TUNING SET Pack到表中:

 

 

set echo on
select name,statement_count from dba_sqlset;

drop table maclean.pack_sqlset purge;

exec DBMS_SQLTUNE.CREATE_STGTAB_SQLSET('PACK_SQLSET','MACLEAN');

exec DBMS_SQLTUNE.PACK_STGTAB_SQLSET('MAC_SPA','SYS','PACK_SQLSET','MACLEAN');

SQL> desc maclean.pack_sqlset;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 NAME                                               VARCHAR2(30)
 OWNER                                              VARCHAR2(30)
 DESCRIPTION                                        VARCHAR2(256)
 SQL_ID                                             VARCHAR2(13)
 FORCE_MATCHING_SIGNATURE                           NUMBER
 SQL_TEXT                                           CLOB
 PARSING_SCHEMA_NAME                                VARCHAR2(30)
 BIND_DATA                                          RAW(2000)
 BIND_LIST                                          SQL_BIND_SET
 MODULE                                             VARCHAR2(48)
 ACTION                                             VARCHAR2(32)
 ELAPSED_TIME                                       NUMBER
 CPU_TIME                                           NUMBER
 BUFFER_GETS                                        NUMBER
 DISK_READS                                         NUMBER
 DIRECT_WRITES                                      NUMBER
 ROWS_PROCESSED                                     NUMBER
 FETCHES                                            NUMBER
 EXECUTIONS                                         NUMBER
 END_OF_FETCH_COUNT                                 NUMBER
 OPTIMIZER_COST                                     NUMBER
 OPTIMIZER_ENV                                      RAW(1000)
 PRIORITY                                           NUMBER
 COMMAND_TYPE                                       NUMBER
 FIRST_LOAD_TIME                                    VARCHAR2(19)
 STAT_PERIOD                                        NUMBER
 ACTIVE_STAT_PERIOD                                 NUMBER
 OTHER                                              CLOB
 PLAN_HASH_VALUE                                    NUMBER
 PLAN                                               SQL_PLAN_TABLE_TYPE
 SPARE1                                             NUMBER
 SPARE2                                             NUMBER
 SPARE3                                             BLOB
 SPARE4                                             CLOB

 

 

 

将测试对应 schema的数据和 上述PACK TABLE 导出导入到 目标测试库中:

 

set echo on
exec DBMS_SQLTUNE.UNPACK_STGTAB_SQLSET('MAC_SPA','SYS',TRUE,'PACK_SQLSET','MACLEAN');
alter system flush buffer_cache;
alter system flush shared_pool;

 

 

创建SPA任务 并运行;

 

 

var sts_task varchar2(64);
exec :sts_task:= dbms_sqlpa.create_analysis_task(task_name => '10g_11g_spa',description => 'experiment for 10gR2 to 11gR2 upgrade',sqlset_name=> 'MAC_SPA');

PL/SQL procedure successfully completed.

var exe_task varchar2(64);
exec :exe_task:=dbms_sqlpa.execute_analysis_task(task_name=>'10g_11g_spa',execution_name=>'10g_trail',execution_type=>'CONVERT SQLSET',execution_desc=>'10g sql trail');

var exe_task varchar2(64);
exec :exe_task:=dbms_sqlpa.execute_analysis_task(task_name=>'10g_11g_spa',execution_name=>'11g_trail',execution_type=>'TEST EXECUTE',execution_desc=>'11g sql trail');

 

 

 

执行任务比较

 

 

 

比较CPU_TIME
EXEC dbms_sqlpa.execute_analysis_task( -
  task_name => '10g_11g_spa', -
  execution_name => 'compare_10g_112_cpu', -
  execution_type => 'COMPARE PERFORMANCE', -
  execution_params => dbms_advisor.arglist('COMPARISON_METRIC','CPU_TIME','EXECUTION_NAME1','10g_trail','EXECUTION_NAME2','11g_trail'), -
  execution_desc => 'Compare 10g SQL Trace Performance to 11g Test-Execute for CPU_TIME')
  /

比较BUFFER_GETS
EXEC dbms_sqlpa.execute_analysis_task( -
  task_name => '10g_11g_spa', -
  execution_name => 'compare_10g_112_buffergets', -
  execution_type => 'COMPARE PERFORMANCE', -
  execution_params => dbms_advisor.arglist('COMPARISON_METRIC','BUFFER_GETS','EXECUTION_NAME1','10g_trail','EXECUTION_NAME2','11g_trail'), -
  execution_desc => 'Compare 10g SQL Trace Performance to 11g Test-Execute for BUFFER_GETS')
  /

比较实际执行时长 

begin 
DBMS_SQLPA.EXECUTE_ANALYSIS_TASK( 
task_name => 'SPA_TEST', 
execution_type => 'COMPARE PERFORMANCE', 
execution_name => 'Compare_elapsed_time', 
execution_params => dbms_advisor.arglist('execution_name1', '10g_trail', 'execution_name2', '11g_trail', 'comparison_metric', 'elapsed_time') ); 
end; 
/

比较物理读

begin 
DBMS_SQLPA.EXECUTE_ANALYSIS_TASK( 
task_name => '10g_11g_spa', 
execution_type => 'COMPARE PERFORMANCE', 
execution_name => 'Compare_physical_reads0', 
execution_params => dbms_advisor.arglist('execution_name1', '10g_trail', 'execution_name2', '11g_trail', 'comparison_metric', 'disk_reads') ); 
end; 
/

Set the comparison_metric parameter to specify an expression of execution 
statistics to use in the performance impact analysis. Possible values include 
the following metrics or any combination of them: elapsed_time (default), 
cpu_time, buffer_gets, disk_reads, direct_writes, and optimizer_cost.

 

 

 

获得SPA报告:

 

 

 

set long 100000 longchunksize 100000 linesize 200 head off feedback off echo off 
spool spa_report_elapsed_time.html 
SELECT dbms_sqlpa.report_analysis_task('SPA_TEST', 'HTML', 'ALL','ALL', execution_name=>'Compare_elapsed_time') FROM dual; 
spool off

产生buffergets 比较report    

set heading off long 100000000 longchunksize 10000 echo off;
set linesize 1000 trimspool on;
spool buffergets_summary.html
select xmltype(dbms_sqlpa.report_analysis_task('10g_11g_spa',
                                                'html',
                                                'typical',
                                                'all',
                                                null,
                                                100,
                                                'compare_10g_112_buffergets')).getclobval(0,0)
from dual;
spool off

产生errors比较report 
spool errors_summary.html
select xmltype(dbms_sqlpa.report_analysis_task('10g_11g_spa',
                                                'html',
                                                'errors',
                                                'summary',
                                                null,
                                                100,
                                                '11g_trail')).getclobval(0,0)
from dual;
spool off

产生unsupport比较report 
spool unsuppor_all.html
select xmltype(dbms_sqlpa.report_analysis_task('10g_11g_spa',
                                                'html',
                                                'unsupported',
                                                'all',
                                                null,
                                                100,
                                                '11g_trail')).getclobval(0,0)
from dual;
spool off

 

 

 

 

 

execution_type
Type of the action to perform by the function. If NULL it will default to the value of the DEFAULT_EXECUTION_TYPE parameter. Possible values are:
[TEST] EXECUTE – test-execute every SQL statement and collect its execution plans and execution statistics. The resulting plans and statistics will be stored in the advisor framework. This is default.
EXPLAIN PLAN – generate explain plan for every statement in the SQL workload. This is similar to the EXPLAIN PLAN command. The resulting plans will be stored in the advisor framework in association with the task.
COMPARE [PERFORMANCE] – analyze and compare two versions of SQL performance data. The performance data is generated by test-executing or generating explain plan of the SQL statements. Use this option when two executions of type EXPLAIN_PLAN or TEST_EXECUTE already exist in the task
CONVERT SQLSET – used to read the statistics captured in a SQL Tuning Set and model them as a task execution. This can be used when you wish to avoid executing the SQL statements because valid data for the experiment already exists in the SQL Tuning Set.

 

 

For 9i Upgrade to 10g

 

 

exec dbms_stats.gather_system_stats(gathering_mode=>'NOWORKLOAD');

alter system set "_optim_peek_user_binds"=false;           ==> 禁用BIND PEEK特性,该特性在10g中有

exec DBMS_STATS.SET_PARAM( 'method_opt','FOR ALL COLUMNS SIZE 1' );
commit;

9i 

?/rdbms/admin/dbmssupp

exec dbms_support.start_trace(binds=>TRUE, waits=> FALSE);

exec dbms_support.stop_trace;

exec dbms_support.start_trace_in_session(sid=>sid,serial=>ser, binds=>TRUE, waits=>FALSE);

select sid,serial# from v$SESSION WHERE ... ;

exec dbms_support.stop_trace_in_session(sid=>SID,serial=>ser);

create table mapping_table tablespace USERS as
select object_id id, owner, substr(object_name, 1, 30) name
  from dba_objects
 where object_type not in ('CONSUMER GROUP',
                           'EVALUATION CONTEXT',
                           'FUNCTION',
                           'INDEXTYPE',
                           'JAVA CLASS',
                           'JAVA DATA',
                           'JAVA RESOURCE',
                           'LIBRARY',
                           'LOB',
                           'OPERATOR',
                           'PACKAGE',
                           'PACKAGE BODY',
                           'PROCEDURE',
                           'QUEUE',
                           'RESOURCE PLAN',
                           'SYNONYM',
                           'TRIGGER',
                           'TYPE',
                           'TYPE BODY')
union all
select user_id id, username owner, null name from dba_users;

declare
  mycur dbms_sqltune.sqlset_cursor;
begin
  dbms_sqltune.create_sqlset('9i_prod_wkld');
    open mycur for
      select value(p)
      from table(dbms_sqltune.select_sql_trace(
                   directory=>'SPADIR',
                   file_name=>'%trc',
                   mapping_table_name => 'MAPPING_TABLE',
                   select_mode => dbms_sqltune.single_execution)) p;
  dbms_sqltune.load_sqlset(
    sqlset_name => '9i_prod_wkld',
    populate_cursor => mycur,
    commit_rows => 1000);

  close mycur;
end;
/

create user spadba identified by oracle;
grant dba to spadba;
grant all on dbms_sqlpa to spadba;

create public database link to10g connect to spadba identified by oracle using 'STRINGS';

var sts_task varchar2(64);
exec :sts_task:= dbms_sqlpa.create_analysis_task(task_name => '9i_11g_spa1',description => 'experiment for 9i to 11gR2 upgrade',sqlset_name=> '9i_prod_wkld');

var exe_task varchar2(64);
exec :exe_task:=dbms_sqlpa.execute_analysis_task(task_name=>'9i_11g_spa1',execution_name=>'9i_trail1',execution_type=>'CONVERT SQLSET',execution_desc=>'9i sql trail generated from sts');

dbms_sqlpa.execute_analysis_task(task_name=>'9i_11g_spa1',execution_name=>'10g_trail1',execution_type=>'TEST EXECUTE',execution_desc=>'10g trail test',-
execution_params=>dbms_advisor.arglist('DATABASE_LINK','DBLINKNAME'));

select sofar,totalwork from V$ADVISOR_PROGRESS where task_id=<TID>;

Oracle OLTP表压缩技术

OLTP压缩

 

对于启用11g OLTP压缩特性的表而言,当发生INSERT时若块内部空间的阈值未达到则不作压缩,若INSERT后达到阈值则触发压缩,如上图所示。 该压缩一般与commit/rollback无关。

若存在多个字段,则可以共享使用符号表:

 

OLTP压缩1

 

 

表空间级别指定OLTP压缩:

create tablespace TablespaceName datafile ‘……..’  default COMPRESS FOR OLTP;

表级别指定压缩:

create table OLTPCOMP (t1 int,t2 varchar2(200))                    COMPRESS FOR OLTP;

 

将表修改为COMPRESS FOR OLTP 但对现有块不压缩

alter table TableName  COMPRESS FOR OLTP;

move并将表修改为COMPRESS FOR OLTP

alter table TableName MOVE COMPRESS FOR OLTP;

 

可以通过在线重定义在线修改为 压缩:

BEGIN
  DBMS_REDEFINITION.CAN_REDEF_TABLE('SH','SALES',DBMS_REDEFINITION.CONS_USE_PK);
END;
/

create table SALES_TMP compress for oltp as select * from SALES where 1=2;
alter table SALES_TMP add primary key(col1);

BEGIN
  DBMS_REDEFINITION.START_REDEF_TABLE(
    uname => 'SH', orig_table => 'SALES', int_table  => 'SALES_TMP',
    col_mapping => NULL, options_flag => DBMS_REDEFINITION.CONS_USE_PK);
END;
/

BEGIN
  DBMS_REDEFINITION.SYNC_INTERIM_TABLE('SH','SALES', 'SALES2');
END;
/


BEGIN
  DBMS_REDEFINITION.FINISH_REDEF_TABLE('SH','SALES', 'SALES2');
END;
/

drop table SALES_TMP;


 

11gR2游标共享新特性带来的一些问题以及_cursor_features_enabled、_cursor_obsolete_threshold和106001 event

版本11gR2中引入cursor sharing游标共享和mutex互斥锁增强的一些特性,而这些特性也带来了一些问题(主要体现在版本11.2.0.1和11.2.0.2上,11.2.0.3上基本已经修复)。

Cursor Obsolescence游标废弃是一种SQL Cursor游标管理方面的增强特性,该特性启用后若parent cursor父游标名下的子游标child cursor总数超过一定的数目,则该父游标parent cursor将被废弃,同时一个新的父游标将被开始。 这样做有2点好处:

  • 避免进程去扫描长长的子游标列表child cursor list以找到一个合适的子游标child cursor
  • 废弃的游标将在一定时间内被age out,其占用的内存可以被重新利用

 

实际在版本10g中就引入了该Cursor Obsolescence游标废弃特性,当时child cursor 的总数阀值是1024, 但是这个阀值在11g中被移除了,这导致出现一个父游标下大量child cursor即high version count的发生;由此引发了一系列的版本11.2.0.3之前的cursor sharing 性能问题,主要症状是版本11.2.0.1和11.2.0.2上出现大量的Cursor: Mutex S 和 library cache lock等待事件。

增强补丁Enhancement patch《Bug 10187168 – Enhancement to obsolete parent cursors if VERSION_COUNT exceeds a threshold》就该问题引入了新的隐藏参数_cursor_obsolete_threshold(Number of cursors per parent before obsoletion.),该”_cursor_obsolete_threshold”参数用以指定子游标总数阀值,若一个父游标的child cursor count<=>version count高于”_cursor_obsolete_threshold”,则触发Cursor Obsolescence游标废弃特性。

 

注意版本11.2.0.3中默认就有”_cursor_obsolete_threshold”了,而且默认值为100。

 

对于版本11.1.0.7、11.2.0.1和11.2.0.2则都有该Bug 10187168的bug backport存在,从2011年5月开始就有相关针对的one-off backport补丁存在。 但是这些one-off backport补丁不使用”_cursor_obsolete_threshold”参数。在版本11.1.0.7、11.2.0.1和11.2.0.2上需要设置合适的”_cursor_features_enabled”(默认值为2)参数,并设置必要的106001 event,该event的level值即是child cursor count的阀值,必须设置该106001事件后该特性才生效。

但是请注意 “_cursor_features_enabled”参数需要重启实例方能生效。而”_cursor_obsolete_threshold”参数和106001 event则可以在线启用、禁用。

对于不同的版本而言,一般推荐打上最新的PSU补丁,并根据补丁的README提示或者咨询Oracle Support获得关于该版本上Cursor Obsolescence问题的信息:
针对不同版本设置 “_cursor_features_enabled”+106001 event的方法:

 

版本11.1.0.7

SQL> alter system set "_cursor_features_enabled"=18 scope=spfile;

System altered.

SQL> alter system set event='106001 trace name context forever,level 1024' scope=spfile;

System altered.

并重启实例

版本11.2.0.1

SQL> alter system set "_cursor_features_enabled"=34 scope=spfile;

System altered.

SQL> alter system set event='106001 trace name context forever,level 1024' scope=spfile;

System altered.

版本11.2.0.2

SQL> alter system set "_cursor_features_enabled"=1026 scope=spfile;

System altered.

SQL> alter system set event='106001 trace name context forever,level 1024' scope=spfile;

System altered.

11gR2 Experience -> If using cursor_sharing = “FORCE” or “SIMILAR”
1) ORA-600 errors as workload increases [kkspsc0: basehd]
or [kglLockOwnersListAppend-ovf] - applied patches to address
2) AWR showing -> cursor: mutex S and library cache lock
1. Download and apply the 11.2.0.2.3PSU Patch 11724916
2. Enable event 106001 to address Bug 10187168.
To enable the fix "_cursor_features_enabled" needs to be set
3) Oracle 11.2.0.2.2 PSU (Patch Set Update) includes new parameters that you can tweak
based on workload characteristics. Even more fixes have been added
Note: 10411618 - Enhancement to add different "Mutex" wait schemes [ID 10411618.8]
4) 11.2.0.3 Has many Mutex enhancement’s

106001 level
The level is used to specify the maximum number of child cursors 
that a parent can have before we obsolete
the parent cursor and create a new parent.
Doing the above can help reduce mutex waits, 
memory consumption and other side effects seen when we see
many child cursors for a given parent.

 

 

 

题外话是11.2.0.3中的Mutex增强了很多,不要再跟着初学者论坛的那帮家伙一起愚蠢地大喊:”虽然版本升级 8i=>9i=>10g=>11g=>12c,但是我觉得oracle里面基础、核心的东西一直都没变了”这种神话了, 你一直浮游在Oracle的表面怎么可能知道Kernel到底有多大的变化?!!

VKTM进程消耗大量CPU的问题

11g中引入了VKTM后台进程,VKTM是virtual keeper of time的缩写,该进程负责提供时钟时间(每秒更新一次)以及参考时间服务(每20ms更新一次,仅在进程高优先级情况下可用),该参考时间服务用于各种基于时间间隔的度量。  VKTM在SGA中发布这些计时信息,以便各种RDBMS Client可以廉价和快速了解时间信息。Wall-clock 时钟时间每一秒更新一次且单调递增。 而参考时间计数(Reference-time)则每20ms更新一次,且仅当VKTM运行在高优先级情况下时可用。

 

在某些环境下VKTM持续消耗较多的CPU,特别是在虚拟化的环境中例如Vmware、Vbox等; 对于这些虚拟化环境若是非产品production环境,则可以考虑将VKTM进程不要运行在高优先级上,虽然这会导致Reference-time参考时间计数不可用,但是实际不会产生必要的性能度量不可用的问题。 在11g中默认_high_priority_processes 隐藏参数指定了LMS*和VKTM运行在高优先级下,可以通过修改该参数,仅让LMS运行在高优先级下,这样VKTM所消耗的CPU将明显下降。 当让我们不建议在产品环境中这样做,如果你确实要这样做,建议优先咨询Oracle Support。

使用方法如下,注意需要重启RDBMS实例方才生效:

 

 

 

SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
 FROM SYS.x$ksppi x, SYS.x$ksppcv y
 WHERE x.inst_id = USERENV ('Instance')
 AND y.inst_id = USERENV ('Instance')
 AND x.indx = y.indx
AND x.ksppinm like '%high%';

SQL> alter system set "_high_priority_processes"='LMS*' scope=spfile;

System altered.

之后重启INSTANCE

shutdown immediate;
startup;

【11gR2新特性】Large Partition Extents:_partition_large_extents & _index_partition_large_extents

11.2.0.2中引入了_partition_large_extents 的新特性,在extent size auto allocate的表空间上若创建分区表,则分区的initial extent size为8M。
试看下面的例子:

 

 

_partition_large_extents
TRUE
Enables large extent allocation for partitioned tables

_index_partition_large_extents
FALSE
Enables large extent allocation for partitioned indices

SQL> select ALLOCATION_TYPE from dba_tablespaces where tablespace_name='USERS';

ALLOCATION_TYPE
------------------
SYSTEM

@partition.sql        ==》测试用CREATE TABLE脚本,将在users表空间上创建一个有384个子分区的表

 alter session set deferred_segment_creation=false;

SQL> select count(*) from dba_tab_subpartitions where table_name='MACLEAN_PARTITION';

  COUNT(*)
----------
       384

SQL> select blocks,count(*) from dba_Extents where segment_name='MACLEAN_PARTITION' group by blocks;

    BLOCKS   COUNT(*)
---------- ----------
      1024        384

SQL> alter session set "_partition_large_extents"=false;

会话已更改。

@partition.sql

SQL> select count(*) from dba_tab_subpartitions where table_name='MACLEAN_PARTITION';

  COUNT(*)
----------
       384

SQL>  select blocks,count(*) from dba_Extents where segment_name='MACLEAN_PARTITION' group by blocks;

    BLOCKS   COUNT(*)
---------- ----------
         8        384

 

可以看到在11.2.0.2 以后_partition_large_extents参数默认打开(default : TRUE), 且当表空间allocation_type=SYSTEM时  创建的初始化分区大小为8M。

实际该特性可能在导致以下2个问题:

  • 若配合deferred_segment_creation=false使用则创建具有大量分区的表时,耗时大幅增长
  • 若配合deferred_segment_creation=false使用则建好后空表的所占空间为8M*分区数目,对比关闭该特性的情况会大非常多

 

具体还可以参考以下文档:
Initial Extent Size of a Partition Changed To 8MB From 64KB

Applies to:
Oracle Server – Enterprise Edition – Version: 11.2.0.2 and later   [Release: 11.2 and later ]
Information in this document applies to any platform.
Symptoms
A partition of a partitioned table is created with default initial extent size 8MB vs 64KB in previous versions.

Changes
Upgrade to 11.2.0.2.
Cause
Hidden parameter _partition_large_extents was introduced in 11.2.0.2 to enable large extent allocation for partitioned tables if created in autoallocate locally managed tablespaces with default extent size.
The default value for the parameter is TRUE. This hidden parameter supersedes parameter cell_partition_large_extents which was introduced in 11.2.0.1.
Solution
The hidden parameter _partition_large_extents, introduced in 11.2.0.2, affects exadata and non-exadata databases, it has the two values TRUE, FALSE. The default value is TRUE which means all partitioned objects in the db will start with 8MB extents if created in autoallocate locally managed tablespaces.

Similarly parameter _index_partition_large_extents was introduced for partitioned indexes, the default value for this one is FALSE.

Smaller size extents impacts performance of operations like load and scan, typically full table scan FTS (multi-block IO). For partitioned objects with many partitions, the fact of having many partitions leads to more small extents with autoallocation than for a nonpartitioned table; more smaller extents lead to more expensive space allocation/deallocation operations, and the I/O performance of scanning can be impacted due to more extents with smaller extent size.

The change of the default allocation size for partitioned tables was introduced together with the enhanced  deferred segment creation support for partitioned tables in 11.2.0.2 (deferred_segment_creation parameter, default TRUE).
Using both new defaults ensures that a negative impact from pre-created partitioned tables could be ruled out since empty partitions will no longer allocate any space.

If the user forcefully sets the parameter _partition_large_extents to FALSE (either at session or at instance level), then the pre-11.2.0.2 behaviour (of starting from 64k extent size) will be restored.

The user can also override the default extent allocation on a per statement base by specifying INITIAL extent in the CREATE TABLE … / ALTER TABLE … ADD PARTITION … command as well as by changing the default storage initial for the tablespace involved e.g. ALTER TABLESPACE … DEFAULT STORAGE (INITIAL 65536);
 

 

 

了解Database Replay Capture内部原理

Database Replay是11g中很酷的特性,对于workload capture的内部工作原理大家理解的不多,这里就介绍一下。

对于Workload Capture需要考虑的因素:

  • 负载捕获文件消耗定量的磁盘空间,这些捕获文件是2进程文件,无法直接阅读,有测试表明在大并发量的OLTP环境中可以达到capture 10分钟占用1G磁盘空间
  • 数据库重启:
    • 可能是保证捕获所有事务的可靠重放的唯一路径
      • 使用startup restrict启动实例,避免不完整的事务捕获
      • 启动capture会取消restrict模式
    • 基于负载类型重启不是必要的
  • 为重放目的恢复数据库的多种方法:
    • 基于scn或时间的物理恢复
    • 逻辑恢复应用数据
    • 闪回或者快照数据
  • Capture可以指定过滤器作为捕获workload子集的方法
  • 需要的权限包括SYSDBA、SYSOPER和合适的OS权限
  • 性能消耗:
    • 在TPCC测试中capture的性能损耗为4.5%
    • 对于每个session会多消耗64KB的内存
    • 必要的Workload Capture耗费文件系统磁盘空间

 

 

这里需要注意的有2点, 对于RAC集群存在workload capture  file的目录必须位于共享文件系统上,否则start_capture时会报错。

对于每个session会多消耗64KB的内存,这是由于本质上写出负载信息到workload  capture file的同样是Server Process服务进程本身,但是这种写出并非在parse解析或execution执行阶段,Server Process将其LOGON、LOGOFF、SQL执行等信息记录存放在PGA中,主要是WCR Capture PG、WCR Capture PGA中,当PGA中的工作负载历史记录达到一定数目时,Server Process本身负责写出这些数据到WCR文件中,在写出这些WCR文件时Server Process进入’WCR: capture file IO write’等待事件。

与WCR相关的等待事件另有:

 

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL> select name  from v$event_name where name like '%WCR%';

NAME
----------------------------------------------------------------
WCR: replay client notify
WCR: replay clock
WCR: replay lock order
WCR: replay paused
WCR: RAC message context busy
WCR: capture file IO write
WCR: Sync context busy
latch: WCR: sync
latch: WCR: processes HT

11g中还多出了部分为WCR而设的LATCH 

  1* select name,gets from v$latch where name like '%WCR%'
SQL> /

NAME                                 GETS
------------------------------ ----------
WCR: kecu cas mem                       3
WCR: kecr File Count                   37
WCR: MMON Create dir                    1
WCR: ticker cache                       0
WCR: sync                             495
WCR: processes HT                       0
WCR: MTS VC queue                       0

7 rows selected.

 

 

我们通过如下演示具体说明Database Replay Capture的内部原理

 

 

1. 首先打开capture dbms_workload_capture.start_capture

 CREATE OR REPLACE DIRECTORY dbcapture AS '/home/oracle/dbcapture';

execute dbms_workload_capture.start_capture('CAPTURE','DBCAPTURE',default_action=>'INCLUDE');

SQL> select id,name,status,start_time,end_time,connects,user_calls,dir_path from 
dba_workload_captures where id = (select max(id) from dba_workload_captures) ;

        ID
----------
NAME
--------------------------------------------------------------------------------
STATUS                                   START_TIM END_TIME    CONNECTS
---------------------------------------- --------- --------- ----------
USER_CALLS
----------
DIR_PATH
--------------------------------------------------------------------------------
         1
CAPTURE
IN PROGRESS                              08-DEC-12                   11

        ID
----------
NAME
--------------------------------------------------------------------------------
STATUS                                   START_TIM END_TIME    CONNECTS
---------------------------------------- --------- --------- ----------
USER_CALLS
----------
DIR_PATH
--------------------------------------------------------------------------------
       167
/home/oracle/dbcapture

2. 窥探 capture file目录

[oracle@mlab2 dbcapture]$ ls -lR
.:
total 8
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 cap
drwxr-xr-x 3 oracle oinstall 4096 Dec  8 07:24 capfiles
-rw-r--r-- 1 oracle oinstall    0 Dec  8 07:24 wcr_cap_00001.start

./cap:
total 4
-rw-r--r-- 1 oracle oinstall 91 Dec  8 07:24 wcr_scapture.wmd

./capfiles:
total 4
drwxr-xr-x 12 oracle oinstall 4096 Dec  8 07:24 inst1

./capfiles/inst1:
total 40
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 08:31 aa
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ab
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ac
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ad
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ae
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 af
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ag
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ah
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 ai
drwxr-xr-x 2 oracle oinstall 4096 Dec  8 07:24 aj

./capfiles/inst1/aa:
total 316
-rw-r--r-- 1 oracle oinstall   1762 Dec  8 07:25 wcr_c6cdah0000001.rec
-rw-r--r-- 1 oracle oinstall  16478 Dec  8 07:28 wcr_c6cf1h0000002.rec
-rw-r--r-- 1 oracle oinstall   1772 Dec  8 07:29 wcr_c6cjdh0000004.rec
-rw-r--r-- 1 oracle oinstall   1535 Dec  8 07:29 wcr_c6cnah0000005.rec
-rw-r--r-- 1 oracle oinstall   1821 Dec  8 07:41 wcr_c6cpfh0000007.rec
-rw-r--r-- 1 oracle oinstall   1815 Dec  8 07:33 wcr_c6cq6h000000a.rec
-rw-r--r-- 1 oracle oinstall   1535 Dec  8 07:34 wcr_c6cxmh000000h.rec
-rw-r--r-- 1 oracle oinstall   1427 Dec  8 07:41 wcr_c6cxvh000000j.rec
-rw-r--r-- 1 oracle oinstall   1425 Dec  8 07:41 wcr_c6czph000000k.rec
-rw-r--r-- 1 oracle oinstall   2398 Dec  8 07:49 wcr_c6dqfh000000q.rec
-rw-r--r-- 1 oracle oinstall 259321 Dec  8 08:35 wcr_c6du7h000000r.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 07:55 wcr_c6f6yh000000t.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 08:28 wcr_c6h3qh0000013.rec

./capfiles/inst1/ab:
total 0

./capfiles/inst1/ac:
total 0

./capfiles/inst1/ad:
total 0

./capfiles/inst1/ae:
total 0

./capfiles/inst1/af:
total 0

./capfiles/inst1/ag:
total 0

./capfiles/inst1/ah:
total 0

./capfiles/inst1/ai:
total 0

./capfiles/inst1/aj:
total 0

[oracle@mlab2 dbcapture]$ cd ./capfiles/inst1/aa
[oracle@mlab2 aa]$ ls -l
total 316
-rw-r--r-- 1 oracle oinstall   1762 Dec  8 07:25 wcr_c6cdah0000001.rec
-rw-r--r-- 1 oracle oinstall  16478 Dec  8 07:28 wcr_c6cf1h0000002.rec
-rw-r--r-- 1 oracle oinstall   1772 Dec  8 07:29 wcr_c6cjdh0000004.rec
-rw-r--r-- 1 oracle oinstall   1535 Dec  8 07:29 wcr_c6cnah0000005.rec
-rw-r--r-- 1 oracle oinstall   1821 Dec  8 07:41 wcr_c6cpfh0000007.rec
-rw-r--r-- 1 oracle oinstall   1815 Dec  8 07:33 wcr_c6cq6h000000a.rec
-rw-r--r-- 1 oracle oinstall   1535 Dec  8 07:34 wcr_c6cxmh000000h.rec
-rw-r--r-- 1 oracle oinstall   1427 Dec  8 07:41 wcr_c6cxvh000000j.rec
-rw-r--r-- 1 oracle oinstall   1425 Dec  8 07:41 wcr_c6czph000000k.rec
-rw-r--r-- 1 oracle oinstall   2398 Dec  8 07:49 wcr_c6dqfh000000q.rec
-rw-r--r-- 1 oracle oinstall 259321 Dec  8 08:35 wcr_c6du7h000000r.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 07:55 wcr_c6f6yh000000t.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 08:28 wcr_c6h3qh0000013.rec
[oracle@mlab2 aa]$ ls -l |wc -l
14

以上负载目录共14个文件

3. 我们LOGON一个新的Server Process

[oracle@mlab2 ~]$ sqlplus  / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Dec 8 08:37:40 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

会多出一个wcr文件

[oracle@mlab2 aa]$ ls -ltr
total 316
-rw-r--r-- 1 oracle oinstall   1762 Dec  8 07:25 wcr_c6cdah0000001.rec
-rw-r--r-- 1 oracle oinstall  16478 Dec  8 07:28 wcr_c6cf1h0000002.rec
-rw-r--r-- 1 oracle oinstall   1772 Dec  8 07:29 wcr_c6cjdh0000004.rec
-rw-r--r-- 1 oracle oinstall   1535 Dec  8 07:29 wcr_c6cnah0000005.rec
-rw-r--r-- 1 oracle oinstall   1815 Dec  8 07:33 wcr_c6cq6h000000a.rec
-rw-r--r-- 1 oracle oinstall   1535 Dec  8 07:34 wcr_c6cxmh000000h.rec
-rw-r--r-- 1 oracle oinstall   1425 Dec  8 07:41 wcr_c6czph000000k.rec
-rw-r--r-- 1 oracle oinstall   1427 Dec  8 07:41 wcr_c6cxvh000000j.rec
-rw-r--r-- 1 oracle oinstall   1821 Dec  8 07:41 wcr_c6cpfh0000007.rec
-rw-r--r-- 1 oracle oinstall   2398 Dec  8 07:49 wcr_c6dqfh000000q.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 07:55 wcr_c6f6yh000000t.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 08:28 wcr_c6h3qh0000013.rec
-rw-r--r-- 1 oracle oinstall 259321 Dec  8 08:35 wcr_c6du7h000000r.rec
-rw-r--r-- 1 oracle oinstall      0 Dec  8 08:37 wcr_c6hp4h0000018.rec

但是新出现的wcr_c6hp4h0000018.rec 是空的

SQL> select spid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat));

SPID
------------------------
14293

该新的服务进程的操作系统进程号为14293, 可以看到该进程的打开文件描述符,其中就包含了wcr_c6hp4h0000018.rec

[oracle@mlab2 ~]$ ls -l /proc/14293/fd
total 0
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 0 -> /dev/null
l-wx------ 1 oracle oinstall 64 Dec  8 08:39 1 -> /dev/null
lrwx------ 1 oracle oinstall 64 Dec  8 08:39 10 -> /u01/app/oracle/product/11201/db_1/rdbms/audit/CRMV_ora_14293_1.aud
l-wx------ 1 oracle oinstall 64 Dec  8 08:39 11 -> /u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trc
l-wx------ 1 oracle oinstall 64 Dec  8 08:39 12 -> pipe:[34585895]
l-wx------ 1 oracle oinstall 64 Dec  8 08:39 13 -> /u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trm
l-wx------ 1 oracle oinstall 64 Dec  8 08:39 2 -> /dev/null
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 3 -> /dev/null
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 4 -> /dev/null
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 5 -> /u01/app/oracle/product/11201/db_1/rdbms/mesg/oraus.msb
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 6 -> /proc/14293/fd
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 7 -> /dev/zero
lrwx------ 1 oracle oinstall 64 Dec  8 08:39 8 -> /home/oracle/dbcapture/capfiles/inst1/aa/wcr_c6hp4h0000018.rec
lr-x------ 1 oracle oinstall 64 Dec  8 08:39 9 -> pipe:[34585894]

也可以通过lsof查到

[root@mlab2 ~]# lsof|grep wcr_c6hp4h0000018.rec
oracle    14293    oracle    8u      REG                8,1          0   17629644 /home/oracle/dbcapture/capfiles/inst1/aa/wcr_c6hp4h0000018.rec

可以得出一个结论,一个Server Process对应一个WCR REC文件,当Server Process LOGON时生成该文件

3.执行少量SQL语句:

SQL> select 1 from dual;

         1
----------
         1

SQL> /

         1
----------
         1

[oracle@mlab2 aa]$ strings wcr_c6hp4h0000018.rec

==》执行少量SQL无任何时, 不写出任何数据

执行某个超长SQL语句后,可以在下面看到其文件内容包含该WCR文件的版本号,LOGON的登录信息和所执行过的SQL语句历史,但是不包含执行计划

[oracle@mlab2 aa]$ strings wcr_c6hp4h0000018.rec
11.2.0.3.0
*File header info. (Shadow process='14293')
 D0576B5D710A34F4E043B201A8C0ECFE
SYS;
NLS_LANGUAGE?
AMERICAN>
NLS_TERRITORY?
AMERICA>
NLS_CURRENCY?
NLS_ISO_CURRENCY?
AMERICA>
NLS_NUMERIC_CHARACTERS?
NLS_CALENDAR?   GREGORIAN>
NLS_DATE_FORMAT?        DD-MON-RR>
NLS_DATE_LANGUAGE?
AMERICAN>
NLS_CHARACTERSET?
AL32UTF8>
NLS_SORT?
BINARY>
NLS_TIME_FORMAT?
HH.MI.SSXFF AM>
NLS_TIMESTAMP_FORMAT?
DD-MON-RR HH.MI.SSXFF AM>
NLS_TIME_TZ_FORMAT?
HH.MI.SSXFF AM TZR>
NLS_TIMESTAMP_TZ_FORMAT?
DD-MON-RR HH.MI.SSXFF AM TZR>
NLS_DUAL_CURRENCY?
NLS_SPECIAL_CHARS?
NLS_NCHAR_CHARACTERSET?
UTF8>
NLS_COMP?
BINARY>
NLS_LENGTH_SEMANTICS?
BYTE>
NLS_NCHAR_CONV_EXCP?
FALSE
(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/app/oracle/product/11201/db_1/bin/oracle)(ARGV0=oracleCRMV)(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))')(DETACH=NO))(CONNECT_DATA=(CID=(PROGRAM=sqlplus)(HOST=mlab2.oracle.com)(USER=oracle))))
,T$sqlplus@mlab2.oracle.com (TNS V1-V3)U
        tselect spid from v$process where addr = ( select paddr from v$session where sid=(select distinct sid from v$mystat))
        `       _
select 1 from dual
select 1 from dual

执行某些错误语句时甚至会记录其错误信息
[oracle@mlab2 aa]$ strings wcr_c6hp4h0000018.rec

9`9_^B
create table vva(t1 int)
`:_i
:`:_iB
`;_^
;`;_^B
create table vva(t1 int)
`_i
>`>_iB
FusC
`?_^
?`?_^B
FvWC
        _begin
for i in 1..50000 loop
execute immediate 'select 1 from dual where 2='||i;
end loop;
end;

若SERVER PROCESS LOGOFF 则会有如下信息

C`E_    B
k^2C

以上说明Server Process并不胡在parse或execution阶段写WCR文件,而是将这些数据存放在PGA中,达到一定大小阀值后才写出,所以其性能影响小,但是如果在WCR数据写出前就意外终止则,这些数据将丢失。

4. 窥视进程信息

SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump processstate 10;
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trc

从processstate 文件中可以发现历史等待信息包括 WCR: capture file IO write,这是Server process在写WCR 文件

     3: waited for 'SQL*Net message to client'
        driver id=0x62657100, #bytes=0x1, =0x0
        wait_id=139 seq_num=140 snap_id=1
        wait times: snap=0.000007 sec, exc=0.000007 sec, total=0.000007 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 0.934091 sec of elapsed time
     4: waited for 'latch: shared pool'
        address=0x60106b20, number=0x133, tries=0x0
        wait_id=138 seq_num=139 snap_id=1
        wait times: snap=0.000066 sec, exc=0.000066 sec, total=0.000066 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 1.180690 sec of elapsed time
     5: waited for 'WCR: capture file IO write'
        =0x0, =0x0, =0x0
        wait_id=137 seq_num=138 snap_id=1
        wait times: snap=0.000189 sec, exc=0.000189 sec, total=0.000189 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 3.122783 sec of elapsed time
     6: waited for 'WCR: capture file IO write'
        =0x0, =0x0, =0x0
        wait_id=136 seq_num=137 snap_id=1
        wait times: snap=0.000191 sec, exc=0.000191 sec, total=0.000191 sec
        wait times: max=infinite
        wait counts: calls=0 os=0
        occurred after 3.053132 sec of elapsed time
     7: waited for 'WCR: capture file IO write'

5.窥视PGA内存信息

SQL> oradebug dump heapdump 536870917;
Statement processed.

grep WCR /u01/app/oracle/diag/rdbms/crmv/CRMV/trace/CRMV_ora_14293.trc

  Chunk     7fb1b606bfc0 sz=    65600    freeable  "WCR Capture PG "  ds=0x7fb1b6115f90
  Chunk     7fb1b6111e18 sz=     4224    freeable  "WCR Capture PG "  ds=0x7fb1b6115f90
  Chunk     7fb1b6112e98 sz=     4184    freeable  "WCR Capture PG "  ds=0x7fb1b6115f90
  Chunk     7fb1b6113ef0 sz=     4224    freeable  "WCR Capture PG "  ds=0x7fb1b6115f90
  Chunk     7fb1b6114f70 sz=     4104    recreate  "WCR Capture PG "  latch=(nil)
  Chunk     7fb1b6115f78 sz=      160    freeable  "WCR Capture PGA"
  Chunk     7fb1b6116018 sz=     3248    freeable  "WCR Capture PGA"
  Subheap ds=0x7fb1b6115f90  heap name=  WCR Capture PG  size=           82336
HEAP DUMP heap name="WCR Capture PG"  desc=0x7fb1b6115f90
FIVE LARGEST SUB HEAPS for heap name="WCR Capture PG"   desc=0x7fb1b6115f9

PGA中存在WCR Capture PG 和WCR Capture PGA的freeable or recreate内存chunk,这些内存可以从Server Process返回给OS

  Chunk     7fb1b606bfc0 sz=    65600    freeable  "WCR Capture PG "  ds=0x7fb1b6115f90

sz=    65600=》 64k 这可能就是文档描述的64k内存,但是实际可以看到使用总量超过了64k

WCR是否可能引起内存泄露呢?:)!

6.隐藏参数

控制WCR CAPTURE的参数至少有以下2个

SQL> SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
  2   FROM SYS.x$ksppi x, SYS.x$ksppcv y
  3   WHERE x.inst_id = USERENV ('Instance')
  4   AND y.inst_id = USERENV ('Instance')
  5   AND x.indx = y.indx
  6  AND x.ksppinm in ('_capture_buffer_size','_wcr_control');

NAME                 VALUE                DESCRIB
-------------------- -------------------- ------------------------------------------------------------
_wcr_control         0                    Oracle internal test WCR parameter used ONLY for testing!
_capture_buffer_size 65536                To set the size of the PGA I/O recording buffers

其中_capture_buffer_size 控制PGA中WCR BUFFER的SIZE,默认为64k
_wcr_control 控制WCR的内部行为,具体尚不清晰

7.结束capture 

SQL> execute dbms_workload_capture.finish_capture;

PL/SQL procedure successfully completed.

DBMS_WORKLOAD_CAPTURE.START_CAPTURE(): Starting database capture at 12/08/2012 07:24:54
DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE(): Stopped database capture (not all sessions  could flush their capture buffers) at 12/08/2012 10:29:49

 

 

综上所述,我们得出结论:

1. 负责写WCR WORKLOAD CAPTURE文件的不是什么后台进程,而是Server Process服务进程(前台进程)
2. 每一个server process对应一个WCR文件
3. Server Proess会写出LOGON、LOGOFF、SQL执行以及错误信息到WCR文件中
4. Server Process的写不是立即模式Immediate mode,而是将数据存放在PGA的(WCR Capture) subheap中,当数据达到一定大小或一定时间(timeout才写出一次)
5. 再次强调, Server Process的写不是立即模式Immediate mode,即capture不发生在parse或者execution阶段(很多人认为capture发生在parse时这种说法可以证明为不正确,parse并不受到capture的影响),而仅仅是将LOGON、SQL历史(不包括执行计划)放在PGA的WCR Capture内存子堆中,条件触发才写出,所以其性能损耗小,按照tpcc的测试结论为4.5%
6. 隐藏参数_capture_buffer_size 控制PGA中WCR BUFFER的SIZE,默认为64k
7. WCR Capture文件虽然是binrary 2进制文件,但并未加密,所以第三方软件理论上也可以读取该WCR capture file
8. WCR: capture file IO write等待事件是Server Process在写WCR文件
9. 执行dbms_workload_capture.finish_capture;会让现有Server Process FLUSH WCR buffer,但是就ALERT.LOG的自解释信息看”Stopped database capture (not all sessions  could flush their capture buffers)”,这种flush不是必然能回收内存的 

 

那些在11gR2中可能惹祸的新特性,一张列表帮助你摆脱升级11gR2带来的烦恼

有很多朋友因为11gR2那些潜在的特性可能给升级后系统稳定运行带来麻烦而无法鼓足升级到11gR2的勇气,实际Oracle在开发新版本RDBMS软件时引入的一些特性有很好的理念的,但是往往这些理念会给已稳定的应用环境带来变数,最显著的就是10g/9i升级到11gR2时的执行计划稳定性,此外adaptive cursor sharing 自适应游标、automatic serial direct path自动判断串行直接路径读、deferred segment creation、GC read mostly DRM…….等等的一系列特性已经在大量的案例中被证明是不适合于大量国产Application的。

 

我在这篇文章里想做的是给出一张列表,能够将11gR2的优化器optimizer特性、和其他的如上列的这些可能引起问题的特性通过参数的方式给出一张列表,你可以选择性的禁用这些特性,前提是你的Applicaiton就该特性经过充分的测试,如果没有时间或者环境来测试这些新特性,那么还不如禁用这些特性,禁用新特性的结果也仅仅是回到老版本(一般是10gR2 10.2.0.4)的默认表现上来。

 

你肯定要问:” 如果都禁用了11gR2的特性,那么我还升级做什么?”

 

回答是: 首先这里给出的是一张禁用11gR2特性列表,如果你对部分特性已经很熟悉,那么你可以选择性而非全部地禁用这些特性,如果不熟悉也测试不了,那么无畏给稳定的系统引入不确定因素。其次这里列出的仅仅是11gR2部分默认已启用的可能”惹祸”的特性, 其他的一些特性例如flashback archive、securefile,它们默认不开启,本身需要你去手动打开才会生效,并不会受到这张列表的影响。

 

注意,为了避免滥用,我仅列出这张列表的部分内容,如果你确实需要该列表,那么请去T.askmac.cn论坛下载,下载地址传送

 

 

 

REM ===============================FOR STABLE OPTIMIZER===================================

alter system set "_enable_automatic_sqltune"=false scope=both;       Automatic SQL Tuning Advisory enabled parameter  

#以下优化器参数均可以在session/system级别设置,一般优化器参数均可以在线修改            
alter session set "_serial_direct_read"=false;
alter session set "_nlj_batching_enabled" = 0; 
alter session set "_optimizer_undo_cost_change" = '10.2.0.4'; -- 11.2.0.1
alter session set "_optimizer_null_aware_antijoin" = false; -- true
alter session set "_optimizer_extend_jppd_view_types" = false; -- true
alter session set "_replace_virtual_columns" = false; -- true
alter session set "_first_k_rows_dynamic_proration" = false; -- true
alter session set "_bloom_pruning_enabled" = false; -- true
alter session set "_optimizer_multi_level_push_pred" = false; -- true
alter session set "_optimizer_group_by_placement" = false; -- true
alter session set "_optimizer_extended_cursor_sharing_rel" = none; -- simple
alter session set "_optimizer_adaptive_cursor_sharing" = false; -- true
alter session set "_optimizer_improve_selectivity" = false ; -- true
alter session set "_optimizer_enable_density_improvements" = false; -- true
alter session set "_optimizer_native_full_outer_join" = off; -- force
alter session set "_optimizer_enable_extended_stats" = false; -- true
alter session set "_nlj_batching_enabled" = 0; -- 1
alter session set "_optimizer_extended_stats_usage_control" = 255; -- 224
alter session set "_bloom_folding_enabled" = false; -- true
alter session set "_optimizer_coalesce_subqueries" = false; -- true
alter session set "_optimizer_fast_pred_transitivity" = false; -- true
alter session set "_optimizer_fast_access_pred_analysis" = false; -- true
alter session set "_optimizer_unnest_disjunctive_subq" = false; -- true
alter session set "_optimizer_unnest_corr_set_subq" = false; -- true
alter session set "_optimizer_distinct_agg_transform" = false; -- true
alter session set "_aggregation_optimization_settings" = 32; -- 0
alter session set "_optimizer_connect_by_elim_dups" = false; -- true
alter session set "_optimizer_eliminate_filtering_join" = false; -- true
alter session set "_connect_by_use_union_all" = old_plan_mode; --true
alter session set "_optimizer_join_factorization" = false; -- true
alter session set "_optimizer_use_cbqt_star_transformation" = false; -- true
alter session set "_optimizer_table_expansion" = false ; -- true
alter session set "_and_pruning_enabled" = false ; -- true
alter session set "_optimizer_distinct_placement" = false ; -- true
alter session set "_optimizer_use_feedback" = false ; -- true
alter session set "_optimizer_try_st_before_jppd" = false ; -- true

REM ===============================MEMORY===================================

alter system set "_memory_imm_mode_without_autosga"=false scope=both; 
alter system set "_enable_shared_pool_durations"=false scope=spfile;

REM ===============================SEGMENT==================================
alter system set deferred_segment_creation=false; 

.................

11g中如何禁用自动统计信息收集作业

11g中如何禁用自动统计信息收集作业?

因为11g中auto stats gather job被集成到auto task中,所以与10g中的禁用方式不一样:

 

 

SQL> select client_name,status from DBA_AUTOTASK_CLIENT;

CLIENT_NAME                                                      STATUS
---------------------------------------------------------------- --------
auto optimizer stats collection                                  ENABLED
auto space advisor                                               ENABLED
sql tuning advisor                                               ENABLED

begin
DBMS_AUTO_TASK_ADMIN.DISABLE(client_name => 'auto optimizer stats collection',
operation => NULL,
window_name => NULL);
end;
/
PL/SQL procedure successfully completed.

SQL>  select client_name,status from DBA_AUTOTASK_CLIENT;

CLIENT_NAME                                                      STATUS
---------------------------------------------------------------- --------
auto optimizer stats collection                                  DISABLED
auto space advisor                                               ENABLED
sql tuning advisor                                               ENABLED

针对11.2 RAC丢失OCR和Votedisk所在ASM Diskgroup的恢复手段

之前有同学在我的Oracle Allstarts群里讨论关于丢失包含ocr和votedisk的ASM diskgroup导致11gR2 RAC cluster无法正常启动的问题,最早我在《在11gR2 RAC中修改ASM DISK Path磁盘路径》一文中介绍了,如何在不启动CRS的情况下启动11.2中的ASM实例并实施操作, 这里也需要用到同样的命令”crsctl start crs -excl -nocrs “;

 

如果自己搞不定可以找诗檀软件专业ORACLE数据库修复团队成员帮您恢复!

诗檀软件专业数据库修复团队

服务热线 : 13764045638   QQ号:47079569    邮箱:service@parnassusdata.com

 

注意以下的恢复手段,针对ASM中单独的ocr或者单独的votedisk丢失也有效,因为11.2中普遍把ocr和votedisk存放在ASM中,而ASM的启动又依赖于ocr和votedisk,所以在丢失ocr或votedisk仍一一者都会导致cluter无法正常启动;这里我们仅仅讨论如何让CRS正常启动,如果丢失的diskgroup中还存放有数据库的话,数据的恢复不属于本篇文章的讨论范畴。

 

前提:恢复的前提是你仍有和故障前一样多的ASM LUN DISK,且你有OCR的自动备份,注意默认情况下每4个小时会自动备份一次,只要你没有删除$GI_HOME,一般都会有备份可用;不要求有votedisk备份

 

恢复场景: 利用dd命令清空ocr和votedisk所在diskgroup header,模拟diskgroup corruption:

 

 

1. 检查votedisk和 ocr备份

[root@vrh1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   a853d6204bbc4feabfd8c73d4c3b3001 (/dev/asm-diskh) [SYSTEMDG]
 2. ONLINE   a5b37704c3574f0fbf21d1d9f58c4a6b (/dev/asm-diskg) [SYSTEMDG]
 3. ONLINE   36e5c51ff0294fc3bf2a042266650331 (/dev/asm-diski) [SYSTEMDG]
 4. ONLINE   af337d1512824fe4bf6ad45283517aaa (/dev/asm-diskj) [SYSTEMDG]
 5. ONLINE   3c4a349e2e304ff6bf64b2b1c9d9cf5d (/dev/asm-diskk) [SYSTEMDG]
Located 5 voting disk(s).

su - grid

[grid@vrh1 ~]$ ocrconfig -showbackup
PROT-26: Oracle Cluster Registry backup locations were retrieved from a local copy

vrh1     2012/08/09 01:59:56     /g01/11.2.0/maclean/grid/cdata/vrh-cluster/backup00.ocr

vrh1     2012/08/08 21:59:56     /g01/11.2.0/maclean/grid/cdata/vrh-cluster/backup01.ocr

vrh1     2012/08/08 17:59:55     /g01/11.2.0/maclean/grid/cdata/vrh-cluster/backup02.ocr

vrh1     2012/08/08 05:59:54     /g01/11.2.0/grid/cdata/vrh-cluster/day.ocr

vrh1     2012/08/08 05:59:54     /g01/11.2.0/grid/cdata/vrh-cluster/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available

2. 彻底关闭所有节点上的clusterware ,OHASD 

crsctl stop has -f

3. GetAsmDH.sh ==> GetAsmDH.sh是ASM disk header的备份脚本 
请养成良好的习惯,做危险操作前备份asm header

[grid@vrh1 ~]$ ./GetAsmDH.sh 

############################################
 1) Collecting Information About the Disks:
############################################

SQL*Plus: Release 11.2.0.3.0 Production on Thu Aug 9 03:28:13 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

SQL> Connected.
SQL> SQL> SQL> SQL> SQL> SQL> SQL> 
           1           0 /dev/asm-diske
           1           1 /dev/asm-diskd
           2           0 /dev/asm-diskb
           2           1 /dev/asm-diskc
           2           2 /dev/asm-diskf
           3           0 /dev/asm-diskh
           3           1 /dev/asm-diskg
           3           2 /dev/asm-diski
           3           3 /dev/asm-diskj
           3           4 /dev/asm-diskk
SQL> SQL> Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

-rw-r--r-- 1 grid oinstall 1048 Aug  9 03:28 /tmp/HC/asmdisks.lst

############################################
 2) Generating asm_diskh.sh script.
############################################

-rwx------ 1 grid oinstall 666 Aug  9 03:28 /tmp/HC/asm_diskh.sh

############################################
 3) Executing  asm_diskh.sh script to 
    generate dd  dumps.
############################################

-rw-r--r-- 1 grid oinstall 1048576 Aug  9 03:28 /tmp/HC/dsk_1_0.dd
-rw-r--r-- 1 grid oinstall 1048576 Aug  9 03:28 /tmp/HC/dsk_1_1.dd
-rw-r--r-- 1 grid oinstall 1048576 Aug  9 03:28 /tmp/HC/dsk_2_0.dd
-rw-r--r-- 1 grid oinstall 1048576 Aug  9 03:28 /tmp/HC/dsk_2_1.dd
-rw-r--r-- 1 grid oinstall 1048576 Aug  9 03:28 /tmp/HC/dsk_2_2.dd
-rw-r--r-- 1 grid oinstall 1048576 Aug  9 03:28 /tmp/HC/dsk_3_0.dd
-rw-r--r-- 1 grid oinstall 1048576 Aug  9 03:28 /tmp/HC/dsk_3_1.dd
-rw-r--r-- 1 grid oinstall 1048576 Aug  9 03:28 /tmp/HC/dsk_3_2.dd
-rw-r--r-- 1 grid oinstall 1048576 Aug  9 03:28 /tmp/HC/dsk_3_3.dd
-rw-r--r-- 1 grid oinstall 1048576 Aug  9 03:28 /tmp/HC/dsk_3_4.dd

############################################
 4) Compressing dd dumps in the next format:
    (asm_dd_header_all_.tar)
############################################

/tmp/HC/dsk_1_0.dd
/tmp/HC/dsk_1_1.dd
/tmp/HC/dsk_2_0.dd
/tmp/HC/dsk_2_1.dd
/tmp/HC/dsk_2_2.dd
/tmp/HC/dsk_3_0.dd
/tmp/HC/dsk_3_1.dd
/tmp/HC/dsk_3_2.dd
/tmp/HC/dsk_3_3.dd
/tmp/HC/dsk_3_4.dd
./GetAsmDH.sh: line 81: compress: command not found
ls: /tmp/HC/*.Z: No such file or directory
[grid@vrh1 ~]$

4. 使用dd 命令 破坏ocr和votedisk所在diskgroup

[root@vrh1 ~]# dd if=/dev/zero of=/dev/asm-diskh bs=1024k count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00423853 seconds, 247 MB/s

[root@vrh1 ~]# dd if=/dev/zero of=/dev/asm-diskg bs=1024k count=1 
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0045179 seconds, 232 MB/s

[root@vrh1 ~]# dd if=/dev/zero of=/dev/asm-diski bs=1024k count=1 
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00469976 seconds, 223 MB/s

[root@vrh1 ~]# dd if=/dev/zero of=/dev/asm-diskj bs=1024k count=1 
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00344262 seconds, 305 MB/s
[root@vrh1 ~]# dd if=/dev/zero of=/dev/asm-diskk bs=1024k count=1 
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0053518 seconds, 196 MB/s

5. 在一个节点上尝试重新启动HAS

[root@vrh1 ~]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.

 

 

 

但是因为ocr和votedisk所在diskgroup丢失,所以CSS将无法正常启动,如以下日志所示:

 

 

alertvrh1.log 

[cssd(5162)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /g01/11.2.0/grid/log/vrh1/cssd/ocssd.log
2012-08-09 03:35:41.207
[cssd(5162)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /g01/11.2.0/grid/log/vrh1/cssd/ocssd.log
2012-08-09 03:35:56.240
[cssd(5162)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /g01/11.2.0/grid/log/vrh1/cssd/ocssd.log
2012-08-09 03:36:11.284
[cssd(5162)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /g01/11.2.0/grid/log/vrh1/cssd/ocssd.log
2012-08-09 03:36:26.305
[cssd(5162)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /g01/11.2.0/grid/log/vrh1/cssd/ocssd.log
2012-08-09 03:36:41.328

ocssd.log
2012-08-09 03:40:26.662: [    CSSD][1078700352]clssnmReadDiscoveryProfile: voting file discovery string(/dev/asm*)
2012-08-09 03:40:26.662: [    CSSD][1078700352]clssnmvDDiscThread: using discovery string /dev/asm* for initial discovery
2012-08-09 03:40:26.662: [   SKGFD][1078700352]Discovery with str:/dev/asm*:

2012-08-09 03:40:26.662: [   SKGFD][1078700352]UFS discovery with :/dev/asm*:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]Fetching UFS disk :/dev/asm-diskf:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]Fetching UFS disk :/dev/asm-diskb:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]Fetching UFS disk :/dev/asm-diskj:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]Fetching UFS disk :/dev/asm-diskh:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]Fetching UFS disk :/dev/asm-diskc:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]Fetching UFS disk :/dev/asm-diskd:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]Fetching UFS disk :/dev/asm-diske:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]Fetching UFS disk :/dev/asm-diskg:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]Fetching UFS disk :/dev/asm-diski:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]Fetching UFS disk :/dev/asm-diskk:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]OSS discovery with :/dev/asm*:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]Handle 0xdf22a0 from lib :UFS:: for disk :/dev/asm-diskf:

2012-08-09 03:40:26.665: [   SKGFD][1078700352]Handle 0xf412a0 from lib :UFS:: for disk :/dev/asm-diskb:

2012-08-09 03:40:26.666: [   SKGFD][1078700352]Handle 0xf3a680 from lib :UFS:: for disk :/dev/asm-diskj:

2012-08-09 03:40:26.666: [   SKGFD][1078700352]Handle 0xf93da0 from lib :UFS:: for disk :/dev/asm-diskh:

2012-08-09 03:40:26.667: [    CSSD][1078700352]clssnmvDiskVerify: Successful discovery of 0 disks
2012-08-09 03:40:26.667: [    CSSD][1078700352]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2012-08-09 03:40:26.667: [    CSSD][1078700352]clssnmvFindInitialConfigs: No voting files found
2012-08-09 03:40:26.667: [    CSSD][1078700352](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds

 

 

 

正式的恢复ocr和votedisk所在diskgroup的步骤如下:

 

 

1. 以-excl -nocrs 方式启动cluster,这将可以启动ASM实例 但不启动CRS

 [root@vrh1 vrh1]# crsctl start crs -excl -nocrs 

CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'vrh1'
CRS-2676: Start of 'ora.mdnsd' on 'vrh1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'vrh1'
CRS-2676: Start of 'ora.gpnpd' on 'vrh1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'vrh1'
CRS-2672: Attempting to start 'ora.gipcd' on 'vrh1'
CRS-2676: Start of 'ora.cssdmonitor' on 'vrh1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'vrh1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'vrh1'
CRS-2672: Attempting to start 'ora.diskmon' on 'vrh1'
CRS-2676: Start of 'ora.diskmon' on 'vrh1' succeeded
CRS-2676: Start of 'ora.cssd' on 'vrh1' succeeded
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'vrh1'
CRS-2672: Attempting to start 'ora.ctssd' on 'vrh1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'vrh1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'vrh1'
CRS-2676: Start of 'ora.ctssd' on 'vrh1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'vrh1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'vrh1'
CRS-2676: Start of 'ora.asm' on 'vrh1' succeeded

 

 

 

2.重建原ocr和votedisk所在diskgroup,注意compatible.asm必须是11.2:

 

 

[root@vrh1 vrh1]# su - grid
[grid@vrh1 ~]$ sqlplus  / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Thu Aug 9 04:16:58 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> create diskgroup systemdg high redundancy disk '/dev/asm-diskh','/dev/asm-diskg','/dev/asm-diski','/dev/asm-diskj','/dev/asm-diskk'
 ATTRIBUTE 'compatible.rdbms' = '11.2', 'compatible.asm' = '11.2';

 

 

 

3.从ocr backup中恢复ocr并做ocrcheck检验:

 

 

[root@vrh1 ~]# ocrconfig -restore /g01/11.2.0/grid/cdata/vrh-cluster/backup00.ocr

[root@vrh1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       3180
         Available space (kbytes) :     258940
         ID                       : 1238458014
         Device/File Name         :  +systemdg
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

 

 

 

4. 准备恢复votedisk ,可能会遇到下面的错误:

 

[grid@vrh1 ~]$ crsctl replace votedisk  +SYSTEMDG
CRS-4602: Failed 27 to add voting file 2e4e0fe285924f86bf5473d00dcc0388.
CRS-4602: Failed 27 to add voting file 4fa54bb0cc5c4fafbf1a9be5479bf389.
CRS-4602: Failed 27 to add voting file a109ead9ea4e4f28bfe233188623616a.
CRS-4602: Failed 27 to add voting file 042c9fbd71b54f5abfcd3ab3408f3cf3.
CRS-4602: Failed 27 to add voting file 7b5a8cd24f954fafbf835ad78615763f.
Failed to replace voting disk group with +SYSTEMDG.
CRS-4000: Command Replace failed, or completed with errors.

 

 

需要重新配置一下ASM的参数,并重启ASM:

 

 

SQL> alter system set asm_diskstring='/dev/asm*';

System altered.

SQL> create spfile from memory;

File created.

SQL>  startup force mount;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted

SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /g01/11.2.0/grid/dbs/spfile+AS
                                                 M1.ora

[grid@vrh1 trace]$  crsctl replace votedisk  +SYSTEMDG
CRS-4256: Updating the profile
Successful addition of voting disk 85edc0e82d274f78bfc58cdc73b8c68a.
Successful addition of voting disk 201ffffc8ba44faabfe2efec2aa75840.
Successful addition of voting disk 6f2a25c589964faabf6980f7c5f621ce.
Successful addition of voting disk 93eb315648454f25bf3717df1a2c73d5.
Successful addition of voting disk 3737240678964f88bfbfbd31d8b3829f.
Successfully replaced voting disk group with +SYSTEMDG.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced

 

 

 

 

5. 重启has服务,检验cluster是否正常:

 

 

[root@vrh1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

[root@vrh1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   85edc0e82d274f78bfc58cdc73b8c68a (/dev/asm-diskh) [SYSTEMDG]
 2. ONLINE   201ffffc8ba44faabfe2efec2aa75840 (/dev/asm-diskg) [SYSTEMDG]
 3. ONLINE   6f2a25c589964faabf6980f7c5f621ce (/dev/asm-diski) [SYSTEMDG]
 4. ONLINE   93eb315648454f25bf3717df1a2c73d5 (/dev/asm-diskj) [SYSTEMDG]
 5. ONLINE   3737240678964f88bfbfbd31d8b3829f (/dev/asm-diskk) [SYSTEMDG]
Located 5 voting disk(s).

[root@vrh1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.BACKUPDG.dg
               ONLINE  ONLINE       vrh1                                         
ora.DATA.dg
               ONLINE  ONLINE       vrh1                                         
ora.LISTENER.lsnr
               ONLINE  ONLINE       vrh1                                         
ora.LSN_MACLEAN.lsnr
               ONLINE  ONLINE       vrh1                                         
ora.SYSTEMDG.dg
               ONLINE  ONLINE       vrh1                                         
ora.asm
               ONLINE  ONLINE       vrh1                     Started             
ora.gsd
               OFFLINE OFFLINE      vrh1                                         
ora.net1.network
               ONLINE  ONLINE       vrh1                                         
ora.ons
               ONLINE  ONLINE       vrh1                                         
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr  https://www.askmac.cn
      1        ONLINE  ONLINE       vrh1                                         
ora.cvu
      1        OFFLINE OFFLINE                                                   
ora.oc4j
      1        OFFLINE OFFLINE                                                   
ora.scan1.vip
      1        ONLINE  ONLINE       vrh1                                         
ora.vprod.db
      1        ONLINE  OFFLINE                                                   
      2        ONLINE  OFFLINE                                                   
ora.vrh1.vip
      1        ONLINE  ONLINE       vrh1                                         
ora.vrh2.vip
      1        ONLINE  INTERMEDIATE vrh1                     FAILED OVER

在11gR2 RAC中修改ASM DISK Path磁盘路径

有同学在T.askmac.cn上提问关于修改11gR2中ASM DISK的路径问题,具体问题如下:

 

aix 6.1,grid 11.2.0.3+asm11.2.0.3+rac

建数据库的时候使用的是aix自带的多路径软件mpio,建了diskgroup

现在改造成veritas dmp多路径,已经修改了asm的disk_strings=/dev/vx/rdmp/*,crs/asm启动的时候已经可以识别到磁盘/dev/vx/rdmp/开头的磁盘,但是读取不回原来的diskgroup信息。

crs启动的报错日志:
2012-07-13 15:07:29.748: [ GPNP][1286]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2108 clsgpnp_profileCallUrlInt] get-profile call to url “ipc://GPNPD_ggtest1” disco “” [f=0 claimed- host: cname: seq: auth:]
2012-07-13 15:07:29.762: [ GPNP][1286]clsgpnp_profileCallUrlInt: [at clsgpnp.c:2236 clsgpnp_profileCallUrlInt] Result: (0) CLSGPNP_OK. Successful get-profile CALL to remote “ipc://GPNPD_ggtest1” disco “”
2012-07-13 15:07:29.762: [ CSSD][1286]clssnmReadDiscoveryProfile: voting file discovery string(/dev/vx/rdmp/*)
2012-07-13 15:07:29.762: [ CSSD][1286]clssnmvDDiscThread: using discovery string /dev/vx/rdmp/* for initial discovery
2012-07-13 15:07:29.762: [ SKGFD][1286]Discovery with str:/dev/vx/rdmp/*:

2012-07-13 15:07:29.762: [ SKGFD][1286]UFS discovery with :/dev/vx/rdmp/*:

2012-07-13 15:07:29.769: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_919:

2012-07-13 15:07:29.770: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_212:

2012-07-13 15:07:29.770: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_211:

2012-07-13 15:07:29.770: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_210:

2012-07-13 15:07:29.770: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_209:

2012-07-13 15:07:29.771: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_181:

2012-07-13 15:07:29.771: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/v_df8000_180:

2012-07-13 15:07:29.771: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/disk_3:

2012-07-13 15:07:29.771: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/disk_2:

2012-07-13 15:07:29.771: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/disk_1:

2012-07-13 15:07:29.771: [ SKGFD][1286]Fetching UFS disk :/dev/vx/rdmp/disk_0:

2012-07-13 15:07:29.771: [ SKGFD][1286]OSS discovery with :/dev/vx/rdmp/*:

2012-07-13 15:07:29.771: [ SKGFD][1286]Handle 1115e7510 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_916:

2012-07-13 15:07:29.772: [ SKGFD][1286]Handle 1118758b0 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_912:

2012-07-13 15:07:29.773: [ SKGFD][1286]Handle 1118d9cf0 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_908:

2012-07-13 15:07:29.773: [ SKGFD][1286]Handle 1118da450 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_904:

2012-07-13 15:07:29.773: [ SKGFD][1286]Handle 1118dad70 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_903:

2012-07-13 15:07:29.802: [ CLSF][1286]checksum failed for disk:/dev/vx/rdmp/v_df8000_916:
2012-07-13 15:07:29.803: [ SKGFD][1286]Lib :UFS:: closing handle 1115e7510 for disk :/dev/vx/rdmp/v_df8000_916:

2012-07-13 15:07:29.803: [ SKGFD][1286]Lib :UFS:: closing handle 1118758b0 for disk :/dev/vx/rdmp/v_df8000_912:

2012-07-13 15:07:29.804: [ SKGFD][1286]Handle 1115e6710 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_202:

2012-07-13 15:07:29.808: [ SKGFD][1286]Handle 1115e7030 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_201:

2012-07-13 15:07:29.809: [ SKGFD][1286]Handle 1115e7ad0 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_200:

2012-07-13 15:07:29.809: [ SKGFD][1286]Handle 1118733f0 from lib :UFS:: for disk :/dev/vx/rdmp/v_df8000_199:

2012-07-13 15:07:29.816: [ CLSF][1286]checksum failed for disk:/dev/vx/rdmp/v_df8000_186:
2012-07-13 15:07:29.816: [ SKGFD][1286]Lib :UFS:: closing handle 1118de5d0 for disk :/dev/vx/rdmp/v_df8000_186:

2012-07-13 15:07:29.816: [ CSSD][1286]clssnmvDiskVerify: Successful discovery of 0 disks
2012-07-13 15:07:29.816: [ CSSD][1286]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2012-07-13 15:07:29.816: [ CSSD][1286]clssnmvFindInitialConfigs: No voting files found
2012-07-13 15:07:29.816: [ CSSD][1286](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2012-07-13 15:07:30.169: [ CSSD][1029]clssgmExecuteClientRequest(): type(37) size(80) only connect and exit messages are allowed before lease acquisition proc(1115e4870) client(0)

 

 

其需求为修改ASM DISK PATH的磁盘设备路径,但是由于11gR2 RAC+ASM的特殊性,导致CRS无法正常启动,虽然使用crsctl start crs -excl -nocrs的方式可以启动CSS服务和ASM实例,

 

但是最后还是报(clssnmCompleteInitVFDiscovery: Voting file not found),这是由于Voteing file投票磁盘的位置没有合理更新导致的。

 

 

这里我们来学习一下,如何正确的修改11gR2 RAC+ASM下的ASM DISK路径:

 

 

1.我们来营造一个修改ASM DISK路径的环境,这里我们使用UDEV的设备名绑定服务,并利用UDEV将原本的ASM DISK从/dev/asm-disk* 修改为 /dev/rasm-disk*的形式, 这只需要修改udev rule文件即可实现:

 

 

[grid@maclean1 ~]$ export  ORACLE_HOME=/g01/grid/app/11.2.0/grid

[grid@maclean1 ~]$ /g01/grid/app/11.2.0/grid/bin/sqlplus  / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Sun Jul 15 04:09:28 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> show parameter diskstri 

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskstring                       string      /dev/asm*

 

可以看到当前ASM实例使用的asm_diskstring 为/dev/asm*, 切换到root用户修改UDEV RULE文件 :

 

 

[root@maclean1 rules.d]# cp 99-oracle-asmdevices.rules  99-oracle-asmdevices.rules.bak
[root@maclean1 rules.d]# vi 99-oracle-asmdevices.rules

[root@maclean1 rules.d]# cat 99-oracle-asmdevices.rules

KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB09cadb31-cfbea255_", NAME="rasm-diskb", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB5f097069-59efb82f_", NAME="rasm-diskc", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB4e1a81c0-20478bc4_", NAME="rasm-diskd", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VBdcce9285-b13c5a27_", NAME="rasm-diske", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB82effe1a-dbca7dff_", NAME="rasm-diskf", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB950d279f-c581cb51_", NAME="rasm-diskg", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB14400d81-651672d7_", NAME="rasm-diskh", OWNER="grid", GROUP="asmadmin", MODE="0660"
KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="SATA_VBOX_HARDDISK_VB31b1237b-78aa22bb_", NAME="rasm-diski", OWNER="grid", GROUP="asmadmin", MODE="0660"

 

 

以上修改了原始99-oracle-asmdevices.rules的UDEV RULE规则文件,生成的设备名被修改为/dev/rasm-disk*的形式,不同于之前的ASM DISK设备名, 这要求我们后续的一系列操作来保证RAC CRS可以正常启动。

 

当前运行时的votedisk和ocr 存放位置:

 

[root@maclean1 rules.d]# /g01/grid/app/11.2.0/grid/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   6896bfc3d1464f9fbf0ea9df87e023ad (/dev/asm-diskb) [SYSTEMDG]
 2. ONLINE   58eb81b656084ff2bfd315d9badd08b7 (/dev/asm-diskc) [SYSTEMDG]
 3. ONLINE   6bf7324625c54f3abf2c942b1e7f70d9 (/dev/asm-diskd) [SYSTEMDG]
 4. ONLINE   43ad8ae20c354f5ebf7083bc30bf94cc (/dev/asm-diske) [SYSTEMDG]
 5. ONLINE   4c225359d51b4f93bfba01080664b3d7 (/dev/asm-diskf) [SYSTEMDG]
Located 5 voting disk(s).

[root@maclean1 rules.d]# /g01/grid/app/11.2.0/grid/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2844
         Available space (kbytes) :     259276
         ID                       :  879001605
         Device/File Name         :  +SYSTEMDG
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

 

 

因为votedisk file的位置具体指向一个ASM DISK,所以后续我们会需要crsctl replace votedisk, 我们现在重启LINUX OS:

 

 

[root@maclean1 rules.d]# init 6

rebooting ............

[root@maclean1 dev]# ls -l *asm*
brw-rw---- 1 grid asmadmin 8,  16 Jul 15 04:15 rasm-diskb
brw-rw---- 1 grid asmadmin 8,  32 Jul 15 04:15 rasm-diskc
brw-rw---- 1 grid asmadmin 8,  48 Jul 15 04:15 rasm-diskd
brw-rw---- 1 grid asmadmin 8,  64 Jul 15 04:15 rasm-diske
brw-rw---- 1 grid asmadmin 8,  80 Jul 15 04:15 rasm-diskf
brw-rw---- 1 grid asmadmin 8,  96 Jul 15 04:15 rasm-diskg
brw-rw---- 1 grid asmadmin 8, 112 Jul 15 04:15 rasm-diskh
brw-rw---- 1 grid asmadmin 8, 128 Jul 15 04:15 rasm-diski

 

 

重启后自动获得了形如/dev/rasm-disk*的ASM DISK,查阅ASM日志可以发现css服务仍搜索/dev/asm*路径来获得ASM DISK,但是这将导致找不到任何有效的ASM DISK:

 

more /g01/grid/app/11.2.0/grid/log/maclean1/cssd/ocssd.log

2012-07-15 04:17:45.208: [ SKGFD][1099548992]Discovery with str:/dev/asm*:

2012-07-15 04:17:45.208: [   SKGFD][1099548992]UFS discovery with :/dev/asm*:

2012-07-15 04:17:45.208: [   SKGFD][1099548992]OSS discovery with :/dev/asm*:

2012-07-15 04:17:45.208: [ CSSD][1099548992]clssnmvDiskVerify: Successful discovery of 0 disks
2012-07-15 04:17:45.208: [    CSSD][1099548992]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2012-07-15 04:17:45.208: [    CSSD][1099548992]clssnmvFindInitialConfigs: No voting files found
2012-07-15 04:17:45.208: [    CSSD][1099548992](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. 
Retrying discovery in 15 seconds
2012-07-15 04:17:45.251: [    CSSD][1096661312]clssgmExecuteClientRequest(): type(37) size(80) only connect and 
exit messages are allowed before lease acquisition proc(0x26a8ba0) client((nil))
2012-07-15 04:17:45.251: [    CSSD][1096661312]clssgmDeadProc: proc 0x26a8ba0
2012-07-15 04:17:45.251: [    CSSD][1096661312]clssgmDestroyProc: cleaning up proc(0x26a8ba0) con(0xfe6) skgpid  
ospid 3751 with 0 clients, refcount 0
2012-07-15 04:17:45.252: [    CSSD][1096661312]clssgmDiscEndpcl: gipcDestroy 0xfe6
2012-07-15 04:17:45.829: [    CSSD][1096661312]clssscSelect: cookie accept request 0x2318ea0
2012-07-15 04:17:45.829: [    CSSD][1096661312]clssgmAllocProc: (0x2659480) allocated
2012-07-15 04:17:45.830: [    CSSD][1096661312]clssgmClientConnectMsg: properties of cmProc 0x2659480 - 1,2,3,4,5
2012-07-15 04:17:45.830: [    CSSD][1096661312]clssgmClientConnectMsg: Connect from con(0x114e) proc(0x2659480) pid(3751) 
version 11:2:1:4, properties: 1,2,3,4,5
2012-07-15 04:17:45.830: [    CSSD][1096661312]clssgmClientConnectMsg: msg flags 0x0000
2012-07-15 04:17:45.939: [    CSSD][1096661312]clssscSelect: cookie accept request 0x253ddd0
2012-07-15 04:17:45.939: [    CSSD][1096661312]clssscevtypSHRCON: getting client with cmproc 0x253ddd0
2012-07-15 04:17:45.939: [    CSSD][1096661312]clssgmRegisterClient: proc(3/0x253ddd0), client(61/0x26877b0)
2012-07-15 04:17:45.939: [    CSSD][1096661312]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are 
allowed before lease acquisition proc(0x253ddd0) client(0x26877b0)
2012-07-15 04:17:45.939: [    CSSD][1096661312]clssgmDiscEndpcl: gipcDestroy 0x1174
2012-07-15 04:17:46.070: [    CSSD][1096661312]clssscSelect: cookie accept request 0x26368a0
2012-07-15 04:17:46.070: [    CSSD][1096661312]clssscevtypSHRCON: getting client with cmproc 0x26368a0
2012-07-15 04:17:46.070: [    CSSD][1096661312]clssgmRegisterClient: proc(5/0x26368a0), client(50/0x26877b0)

 

 

由于11gR2中CRS服务依赖于ASM,因为ocr存放在ASM中,所以ASM若无法有效启动,这导致CRS服务也无法正常工作:

[root@maclean1 ~]# crsctl check has
CRS-4638: Oracle High Availability Services is online

[root@maclean1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager

 

 

2. 以上是修改ASM DISK PATH磁盘路径的现状,我们需要通过以下操作来恢复CRS:

首先彻底关闭OHASD服务:

 

[root@maclean1 ~]# crsctl stop has -f 

CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'maclean1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'maclean1'
CRS-2673: Attempting to stop 'ora.crf' on 'maclean1'
CRS-2677: Stop of 'ora.mdnsd' on 'maclean1' succeeded
CRS-2677: Stop of 'ora.crf' on 'maclean1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'maclean1'
CRS-2677: Stop of 'ora.gipcd' on 'maclean1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'maclean1'
CRS-2677: Stop of 'ora.gpnpd' on 'maclean1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'maclean1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

 

 

3. 以-excl -nocrs方式启动CRS,这将仅启动ASM 实例而不会启动CRS服务:

 

 

 [root@maclean1 ~]# crsctl start crs -excl -nocrs 

CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'maclean1'
CRS-2676: Start of 'ora.mdnsd' on 'maclean1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'maclean1'
CRS-2676: Start of 'ora.gpnpd' on 'maclean1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'maclean1'
CRS-2672: Attempting to start 'ora.gipcd' on 'maclean1'
CRS-2676: Start of 'ora.cssdmonitor' on 'maclean1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'maclean1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'maclean1'
CRS-2672: Attempting to start 'ora.diskmon' on 'maclean1'
CRS-2676: Start of 'ora.diskmon' on 'maclean1' succeeded
CRS-2676: Start of 'ora.cssd' on 'maclean1' succeeded
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on 'maclean1'
CRS-2672: Attempting to start 'ora.ctssd' on 'maclean1'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'maclean1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'maclean1'
CRS-2676: Start of 'ora.ctssd' on 'maclean1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'maclean1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'maclean1'
CRS-2676: Start of 'ora.asm' on 'maclean1' succeeded	

#建议同时修改CRS_HOME所在的ORACLE_BASE为777权限,避免可能的问题
[root@maclean1 ~]# chmod 777 /g01

 

 

 

4.修改ASM实例的disk_strings为当前的ASM DISK PATH信息:

 

[root@maclean1 ~]# su - grid

[grid@maclean1 ~]$ sqlplus  / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Sun Jul 15 04:40:40 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter system set asm_diskstring='/dev/rasm*';

System altered.

SQL> alter diskgroup systemdg mount;

Diskgroup altered.

SQL> create spfile from memory;

File created.

SQL> startup force mount;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted

SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /g01/grid/app/11.2.0/grid/dbs/
                                                 spfile+ASM1.ora

SQL> show parameter disk

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
asm_diskgroups                       string      SYSTEMDG
asm_diskstring                       string      /dev/rasm*

SQL> create pfile from spfile;

File created.

SQL> create spfile='+SYSTEMDG' from pfile;

File created.

SQL> startup force;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ASM instance started

Total System Global Area  283930624 bytes
Fixed Size                  2227664 bytes
Variable Size             256537136 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      +SYSTEMDG/maclean-cluster/asmp
                                                 arameterfile/registry.253.7886
                                                 82933

 

 

以上成功修改了asm_diskstring ,且更新了ASM DISKGROUP上的SPFILE , 由于ASM使用共享的SPFILE所以其他节点上一般无需在做其他操作。

 

5. crsctl replace votedisk 命令将votedisk重置位置:

 

 

[root@maclean1 ~]# crsctl replace votedisk +systemdg

Successful addition of voting disk 864a00efcfbe4f42bfd0f4f6b60472a0.
Successful addition of voting disk ab14d6e727614f29bf53b9870052a5c8.
Successful addition of voting disk 754c03c168854f46bf2daee7287bf260.
Successful addition of voting disk 9ed58f37f3e84f28bfcd9b101f2af9f3.
Successful addition of voting disk 4ce7b7c682364f12bf4df5ce1fb7814e.
Successfully replaced voting disk group with +systemdg.
CRS-4266: Voting file(s) successfully replaced		

[root@maclean1 ~]# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   864a00efcfbe4f42bfd0f4f6b60472a0 (/dev/rasm-diskb) [SYSTEMDG]
 2. ONLINE   ab14d6e727614f29bf53b9870052a5c8 (/dev/rasm-diskc) [SYSTEMDG]
 3. ONLINE   754c03c168854f46bf2daee7287bf260 (/dev/rasm-diskd) [SYSTEMDG]
 4. ONLINE   9ed58f37f3e84f28bfcd9b101f2af9f3 (/dev/rasm-diske) [SYSTEMDG]
 5. ONLINE   4ce7b7c682364f12bf4df5ce1fb7814e (/dev/rasm-diskf) [SYSTEMDG]
Located 5 voting disk(s).

[root@maclean1 ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2844
         Available space (kbytes) :     259276
         ID                       :  879001605
         Device/File Name         :  +SYSTEMDG
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded

 

 

以上replace了votedisk到新的 ASM DISK上,并确认votedisk和OCR均为可用状态。

 

6.重启CRS服务:

 

 

[root@maclean1 ~]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'maclean1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'maclean1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'maclean1'
CRS-2673: Attempting to stop 'ora.asm' on 'maclean1'
CRS-2677: Stop of 'ora.mdnsd' on 'maclean1' succeeded
CRS-2677: Stop of 'ora.asm' on 'maclean1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'maclean1'
CRS-2677: Stop of 'ora.ctssd' on 'maclean1' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'maclean1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'maclean1'
CRS-2677: Stop of 'ora.cssd' on 'maclean1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'maclean1'
CRS-2677: Stop of 'ora.gipcd' on 'maclean1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'maclean1'
CRS-2677: Stop of 'ora.gpnpd' on 'maclean1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'maclean1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@maclean1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.BACKUPDG.dg
               ONLINE  ONLINE       maclean1                                     
ora.DATA.dg
               ONLINE  ONLINE       maclean1                                     
ora.LISTENER.lsnr
               ONLINE  ONLINE       maclean1                                     
ora.SYSTEMDG.dg
               ONLINE  ONLINE       maclean1                                     
ora.asm
               ONLINE  ONLINE       maclean1                 Started             
ora.gsd
               OFFLINE OFFLINE      maclean1                                     
ora.net1.network
               ONLINE  ONLINE       maclean1                                     
ora.ons
               ONLINE  ONLINE       maclean1                                     
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       maclean1                                     
ora.cvu
      1        ONLINE  ONLINE       maclean1                                     
ora.maclean1.vip
      1        ONLINE  ONLINE       maclean1                                     
ora.maclean2.vip
      1        ONLINE  INTERMEDIATE maclean1                 FAILED OVER         
ora.oc4j
      1        ONLINE  OFFLINE                               STARTING            
ora.prod.db
      1        ONLINE  OFFLINE                               Instance Shutdown,S 
                                                             TARTING             
      2        ONLINE  OFFLINE                                                   
ora.scan1.vip
      1        ONLINE  ONLINE       maclean1

 

 

因为上面更新了ASM共享使用的SPFILE,所以其他节点上一般不会存在问题,直接重启后CRS即可正常工作。

 

由于11gR2 RAC+ASM的启动依赖较为复杂,虽然通过以上操作但你仍可能在修改ASM DISK PATH路径时遇到一些问题, 若无法解决请去该问题的原帖提问,谢谢配合!

 

沪ICP备14014813号-2

沪公网安备 31010802001379号