log file sync

从Exadata Storage Software 11.2.2.4开始引入了Exadata Smart Flash Logging的新特性，该特性允许LGWR进程将redo同时并行写入flash cache 和 disk controller 中，只要在flash cache 和 disk controller中有任意一个率先写完成就会通知RDBMS数据库继续工作，该特性着眼于改善了Exadata对Redo写入的响应时间和吞吐量。(The feature allows redo writes to be written to both flash cache and disk controller cache, with an acknowledgement sent to the RDBMS as soon as either of these writes complete; this improves response times and thoughput.)；特别有利于改善log file sync等待事件是主要性能瓶颈的系统。

当频繁写redo重做日志的IO成为Exadata一体机的主要性能瓶颈的时候，Exadata开发部门自然想到了通过DBM上已有的flashcache来减少响应时间的办法。但是又要保证不会因为flashcache的引入而导致redo关键日志信息的丢失：

The main problem that number of writes in redo logs is very high, even there are no activity in database. Therefore Flash Cache disk will reach his write limit very fast – some days or months (I am not see exact test results). In this way you will lost flash cache disk and all data on them. But losing redo logs is very unpleasant case of database unavailability, which can lie in big downtime and possible data loss.
As you already know, 11.2.2.4.0 introduced the Smart Flash Log feature. For customers that are not in 11.2.2.4.0, don’t suggest putting the redo logs on a diskgroup that uses griddisks carved from flashdisks. There are different issues when using redo logs on the flashcache in previous versions and those should be avoided.

只要是安装过Exadata Storage Software 11.2.2.4补丁的系统都会隐式地启用该Exadata Smart Flash Logging特性，但是它同时也要求数据库版本要大于Database 11.2.0.2 Bundle Patch 11。

Metalink目前没有介绍如何在已经启用Exadata Smart Flash Logging的DBM上禁用(disable)该特性。

实际每个cell会分配512MB的flashcache用于Smart Flash Logging，因此现在每个cell的可用flash空间为 364.75Gb 。

不仅局限于Online Redo Log可以受益于Smart Flash Logging，Standby Redo Log 也可以从该特性中得到性能提升，前提是满足必要的软件版本组合cell patch 11.2.2.4 and Database 11.2.0.2 Bundle Patch 11 or greate。

可以通过CellCLI 命令行了解现有的Smart Flash Logging配置，若有输出则说明配置了Smart Flash Logging。

CellCLI> LIST FLASHLOG DETAIL

更多信息可以参考文档”Exadata Smart Flash Logging Explained”，引用如下:

Smart Flash Logging works as follows. When receiving a redo log write request, Exadata will do
parallel writes to the on-disk redo logs as well as a small amount of space reserved in the flash
hardware. When either of these writes has successfully completed the database will be
immediately notified of completion. If the disk drives hosting the logs experience slow response
times, then the Exadata Smart Flash Cache will provide a faster log write response time.
Conversely, if the Exadata Smart Flash Cache is temporarily experiencing slow response times
(e.g., due to wear leveling algorithms), then the disk drive will provide a faster response time.
Given the speed advantage the Exadata flash hardware has over disk drives, log writes should be
written to Exadata Smart Flash Cache, almost all of the time, resulting in very fast redo write
performance. This algorithm will significantly smooth out redo write response times and provide
overall better database performance.

The Exadata Smart Flash Cache is not used as a permanent store for redo data – it is just a
temporary store for the purpose of providing fast redo write response time. The Exadata Smart
Flash Cache is a cache for storing redo data until this data is safely written to disk. The Exadata
Storage Server comes with a substantial amount of flash storage. A small amount is allocated for
database logging and the remainder will be used for caching user data. The best practices and
configuration of redo log sizing, duplexing and mirroring do not change when using Exadata
Smart Flash Logging. Smart Flash Logging handles all crash and recovery scenarios without
requiring any additional or special administrator intervention beyond what would normally be
needed for recovery of the database from redo logs. From an end user perspective, the system
behaves in a completely transparent manner and the user need not be aware that flash is being
used as a temporary store for redo. The only behavioral difference will be consistently low
latencies for redo log writes.

By default, 512 MB of the Exadata flash is allocated to Smart Flash Logging. Relative to the 384
GB of flash in each Exadata cell this is an insignificant investment for a huge performance
benefit. This default allocation will be sufficient for most situations. Statistics are maintained to
indicate the number and frequency of redo writes serviced by flash and those that could not be
serviced, due to, for example, insufficient flash space being allocated for Smart Flash Logging.
For a database with a high redo generation rate, or when many databases are consolidated on to
one Exadata Database Machine, the size of the flash allocated to Smart Flash Logging may need
to be enlarged. In addition, for consolidated deployments, the Exadata I/O Resource Manager
(IORM) has been enhanced to enable or disable Smart Flash Logging for the different databases
running on the Database Machine, reserving flash for the most performance critical databases.

以及<Exadata Smart Flash Cache Features and the Oracle Exadata Database Machine>官方白皮书，公开的文档地址：

[gview file=”http://www.oracle.com/technetwork/database/exadata/exadata-smart-flash-cache-366203.pdf”]

Script

set echo off
set feedback off
column timecol new_value timestamp
column spool_extension new_value suffix
select to_char(sysdate,'Mondd_hh24mi') timecol,
'.out' spool_extension from sys.dual;
column output new_value dbname
select value || '_' output
from v$parameter where name = 'db_name';
spool lfsdiag_&&dbname&&timestamp&&suffix
set trim on
set trims on
set lines 130
set pages 100
set verify off
alter session set optimizer_features_enable = '10.2.0.4';

PROMPT LFSDIAG DATA FOR &&dbname&&timestamp
PROMPT Note: All timings are in milliseconds (1000 milliseconds = 1 second)

PROMPT
PROMPT IMPORTANT PARAMETERS RELATING TO LOG FILE SYNC WAITS:
column name format a40 wra
column value format a40 wra
select inst_id, name, value from gv$parameter
where ((value is not null and name like '%log_archive%') or
name like '%commit%' or name like '%event=%' or name like '%lgwr%')
and name not in (select name from gv$parameter where (name like '%log_archive_dest_state%'
and value = 'enable') or name = 'log_archive_format')
order by 1,2,3;

PROMPT
PROMPT HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS:
PROMPT
PROMPT APPROACH: Look at the wait distribution for log file sync waits
PROMPT by looking at "wait_time_milli". Look at the high wait times then
PROMPT see if you can correlate those with other related wait events.
column event format a40 wra
select inst_id, event, wait_time_milli, wait_count
from gv$event_histogram
where event in ('log file sync','gcs log flush sync',
'log file parallel write','wait for scn ack',
'log file switch completion','gc cr grant 2-way',
'gc buffer busy','gc current block 2-way') or
event like '%LGWR%' or event like '%LNS%'
order by 2 desc,1,3;

PROMPT
PROMPT ORDERED BY WAIT_TIME_MILLI
select inst_id, event, wait_time_milli, wait_count
from gv$event_histogram
where event in ('log file sync','gcs log flush sync',
'log file parallel write','wait for scn ack',
'log file switch completion','gc cr grant 2-way',
'gc buffer busy','gc current block 2-way')
or event like '%LGWR%' or event like '%LNS%'
order by 3,1,2 desc;

PROMPT
PROMPT REDO WRITE STATS
PROMPT
PROMPT "redo write time" in centiseconds (100 per second)
PROMPT 11.1: "redo write broadcast ack time" in centiseconds (100 per second)
PROMPT 11.2: "redo write broadcast ack time" in microseconds (1000 per millisecond)
column value format 99999999999999999999
column milliseconds format 99999999999999.999
select v.version, ss.inst_id, ss.name, ss.value,
decode(substr(version,1,4),
'11.1',decode (name,'redo write time',value*10,
'redo write broadcast ack time',value*10),
'11.2',decode (name,'redo write time',value*10,
'redo write broadcast ack time',value/1000),
decode (name,'redo write time',value*10)) milliseconds
from gv$sysstat ss, v$instance v
where name like 'redo write%' and value > 0
order by 1,2,3;

PROMPT
PROMPT ASH THRESHOLD...
PROMPT
PROMPT This will be the threshold in milliseconds for average log file sync
PROMPT times. This will be used for the next queries to look for the worst
PROMPT 'log file sync' minutes. Any minutes that have an average log file
PROMPT sync time greater than the threshold will be analyzed further.
column threshold_in_ms new_value threshold format 999999999.999
select min(threshold_in_ms) threshold_in_ms
from (select inst_id, to_char(sample_time,'Mondd_hh24mi') minute,
avg(time_waited)/1000 threshold_in_ms
from gv$active_session_history
where event = 'log file sync'
group by inst_id,to_char(sample_time,'Mondd_hh24mi')
order by 3 desc)
where rownum <= 5;

PROMPT
PROMPT ASH WORST MINUTES FOR LOG FILE SYNC WAITS:
PROMPT
PROMPT APPROACH: These are the minutes where the avg log file sync time
PROMPT was the highest (in milliseconds).
column event format a30 tru
column program format a35 tru
column total_wait_time format 999999999999.999
column avg_time_waited format 999999999999.999
select to_char(sample_time,'Mondd_hh24mi') minute, inst_id, event,
sum(time_waited)/1000 TOTAL_WAIT_TIME , count(*) WAITS,
avg(time_waited)/1000 AVG_TIME_WAITED
from gv$active_session_history
where event = 'log file sync'
group by to_char(sample_time,'Mondd_hh24mi'), inst_id, event
having avg(time_waited)/1000 > &&threshold
order by 1,2;

PROMPT
PROMPT ASH LFS BACKGROUND PROCESS WAITS DURING WORST MINUTES:
PROMPT
PROMPT APPROACH: What is LGWR doing when 'log file sync' waits
PROMPT are happening? LMS info may be relevent for broadcast
PROMPT on commit and LNS data may be relevant for dataguard.
PROMPT If more details are needed see the ASH DETAILS FOR WORST
PROMPT MINUTES section at the bottom of the report.
column inst format 999
column event format a30 tru
column program format a35 wra
select to_char(sample_time,'Mondd_hh24mi') minute, inst_id inst, program, event,
sum(time_waited)/1000 TOTAL_WAIT_TIME , count(*) WAITS,
avg(time_waited)/1000 AVG_TIME_WAITED
from gv$active_session_history
where to_char(sample_time,'Mondd_hh24mi') in (select to_char(sample_time,'Mondd_hh24mi')
from gv$active_session_history
where event = 'log file sync'
group by to_char(sample_time,'Mondd_hh24mi'), inst_id
having avg(time_waited)/1000 > &&threshold)
and (program like '%LGWR%' or program like '%LMS%' or
program like '%LNS%' or event = 'log file sync')
group by to_char(sample_time,'Mondd_hh24mi'), inst_id, program, event
order by 1,2,3,5 desc, 4;

PROMPT
PROMPT AWR WORST AVG LOG FILE SYNC SNAPS:
PROMPT
PROMPT APPROACH: These are the AWR snaps where the average 'log file sync'
PROMPT times were the highest.
column begin format a12 tru
column end format a12 tru
column name format a13 tru
select dhs.snap_id, dhs.instance_number inst, to_char(dhs.begin_interval_time,'Mondd_hh24mi') BEGIN,
to_char(dhs.end_interval_time,'Mondd_hh24mi') END,
en.name, se.time_waited_micro/1000 total_wait_time, se.total_waits,
se.time_waited_micro/1000 / se.total_waits avg_time_waited
from dba_hist_snapshot dhs, wrh$_system_event se, v$event_name en
where (dhs.snap_id = se.snap_id and dhs.instance_number = se.instance_number)
and se.event_id = en.event_id and en.name = 'log file sync' and
dhs.snap_id in (select snap_id from (
select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
from wrh$_system_event se, v$event_name en
where se.event_id = en.event_id and en.name = 'log file sync'
order by avg_time_waited desc)
where rownum < 4)
order by 1,2;

PROMPT
PROMPT AWR REDO WRITE STATS
PROMPT
PROMPT "redo write time" in centiseconds (100 per second)
PROMPT 11.1: "redo write broadcast ack time" in centiseconds (100 per second)
PROMPT 11.2: "redo write broadcast ack time" in microseconds (1000 per millisecond)
column stat_name format a30 tru
select v.version, ss.snap_id, ss.instance_number inst, sn.stat_name, ss.value,
decode(substr(version,1,4),
'11.1',decode (stat_name,'redo write time',value*10,
'redo write broadcast ack time',value*10),
'11.2',decode (stat_name,'redo write time',value*10,
'redo write broadcast ack time',value/1000),
decode (stat_name,'redo write time',value*10)) milliseconds
from wrh$_sysstat ss, wrh$_stat_name sn, v$instance v
where ss.stat_id = sn.stat_id
and sn.stat_name like 'redo write%' and ss.value > 0
and ss.snap_id in (select snap_id from (
select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
from wrh$_system_event se, v$event_name en
where se.event_id = en.event_id and en.name = 'log file sync'
order by avg_time_waited desc)
where rownum < 4)
order by 1,2,3;

PROMPT
PROMPT AWR LFS AND OTHER RELATED WAITS FOR WORST LFS AWRs:
PROMPT
PROMPT APPROACH: These are the AWR snaps where the average 'log file sync'
PROMPT times were the highest. Look at related waits at those times.
column name format a40 tru
select se.snap_id, se.instance_number inst, en.name,
se.total_waits, se.time_waited_micro/1000 total_wait_time,
se.time_waited_micro/1000 / se.total_waits avg_time_waited
from wrh$_system_event se, v$event_name en
where se.event_id = en.event_id and (en.name in
('log file sync','gcs log flush sync',
'log file parallel write','wait for scn ack',
'log file switch completion','gc cr grant 2-way',
'gc buffer busy','gc current block 2-way')
or en.name like '%LGWR%' or en.name like '%LNS%')
and se.snap_id in (select snap_id from (
select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
from wrh$_system_event se, v$event_name en
where se.event_id = en.event_id and en.name = 'log file sync'
order by avg_time_waited desc)
where rownum < 4)
order by 1, 6 desc;

PROMPT
PROMPT AWR HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS FOR WORST LFS AWRs:
PROMPT Note: This query won't work on 10.2 - ORA-942
PROMPT
PROMPT APPROACH: Look at the wait distribution for log file sync waits
PROMPT by looking at "wait_time_milli". Look at the high wait times then
PROMPT see if you can correlate those with other related wait events.
select eh.snap_id, eh.instance_number inst, en.name, eh.wait_time_milli, eh.wait_count
from wrh$_event_histogram eh, v$event_name en
where eh.event_id = en.event_id and
(en.name in ('log file sync','gcs log flush sync',
'log file parallel write','wait for scn ack',
'log file switch completion','gc cr grant 2-way',
'gc buffer busy','gc current block 2-way')
or en.name like '%LGWR%' or en.name like '%LNS%')
and snap_id in (select snap_id from (
select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
from wrh$_system_event se, v$event_name en
where se.event_id = en.event_id and en.name = 'log file sync'
order by avg_time_waited desc)
where rownum < 4)
order by 1,3 desc,2,4;

PROMPT
PROMPT ORDERED BY WAIT_TIME_MILLI
PROMPT Note: This query won't work on 10.2 - ORA-942
select eh.snap_id, eh.instance_number inst, en.name, eh.wait_time_milli, eh.wait_count
from wrh$_event_histogram eh, v$event_name en
where eh.event_id = en.event_id and
(en.name in ('log file sync','gcs log flush sync',
'log file parallel write','wait for scn ack',
'log file switch completion','gc cr grant 2-way',
'gc buffer busy','gc current block 2-way')
or en.name like '%LGWR%' or en.name like '%LNS%')
and snap_id in (select snap_id from (
select se.snap_id, se.time_waited_micro/1000 / se.total_waits avg_time_waited
from wrh$_system_event se, v$event_name en
where se.event_id = en.event_id and en.name = 'log file sync'
order by avg_time_waited desc)
where rownum < 4)
order by 1,4,2,3 desc;

PROMPT
PROMPT ASH DETAILS FOR WORST MINUTES:
PROMPT
PROMPT APPROACH: If you cannot determine the problem from the data
PROMPT above, you may need to look at the details of what each session
PROMPT is doing during each 'bad' snap. Most likely you will want to
PROMPT note the times of the high log file sync waits, look at what
PROMPT LGWR is doing at those times, and go from there...
column program format a45 wra
column sample_time format a25 tru
column event format a30 tru
column time_waited format 999999.999
column p1 format a40 tru
column p2 format a40 tru
column p3 format a40 tru
select sample_time, inst_id inst, session_id, program, event, time_waited/1000 TIME_WAITED,
p1text||': '||p1 p1,p2text||': '||p2 p2,p3text||': '||p3 p3
from gv$active_session_history
where to_char(sample_time,'Mondd_hh24mi') in (select
to_char(sample_time,'Mondd_hh24mi')
from gv$active_session_history
where event = 'log file sync'
group by to_char(sample_time,'Mondd_hh24mi'), inst_id
having avg(time_waited)/1000 > &&threshold)
order by 1,2,3,4,5;

select to_char(sysdate,'Mondd hh24:mi:ss') TIME from dual;

spool off

PROMPT
PROMPT OUTPUT FILE IS: lfsdiag_&&dbname&&timestamp&&suffix
PROMPT

Exadata Smart Flash Logging新特性

Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql)

Script