Original link: http://www.dbaleet.org/get_exadata_cell_hidden_parameter_and_exadata_reverse_offloading/
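An important aspect of offloading on the Exadata Database Machine is that the db nodes and the cell nodes are aware of each other's load.
For example, when the database nodes make heavy use of smart scan, the work is offloaded to the cell side. In some cases this can push the CPU load on the cell nodes very high, or even overload them, while the db nodes, having offloaded their work to the cells, sit relatively idle.
This is where reverse offloading comes in. When a cell finds that its CPU utilization is too high, it stops applying smart I/O filtering to part of the queries that would otherwise use smart scan and instead returns the data to the db node as ordinary block I/O, which relieves the CPU load on the cell. Once the cell's load drops, it resumes smart scan. This process is called reverse offloading. Of course, if the CPU on the db node is also high at that point, the cell does not push the blocks back as block I/O. The whole process is transparent to the database.
So in some cases, do not believe the rumor that Exadata does not need indexes, even if every scan is a smart scan!
Nor does it mean that whenever smart scan is possible you must force it. Excessive smart scanning can drive the CPU on the cell nodes very high, and in that situation creating a suitable index and not using smart scan may well be the wise choice.
A dedicated statistic named "cell physical IO bytes sent directly to DB node to balance CPU" tracks reverse offloading.
For example: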
SQL> select name, value from v$sysstat where name like '%balance%';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes sent directly to DB node to balance CPU            0
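The description of this statistic is:
The number of I/O bytes sent back to the database server for processing due to CPU usage on Oracle Exadata Storage Server.
In other words, it is the number of I/O bytes the cell returned to the db node as ordinary block I/O because its own CPU utilization was too high.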
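To put the raw byte count in context, it can help to compare it with the volume of I/O that was eligible for offload in the first place. The sketch below is only an illustration: it assumes the values have already been fetched from v$sysstat into a dictionary, and it assumes that the statistic "cell physical IO bytes eligible for predicate offload" is a reasonable denominator. The function name pushback_share is hypothetical.

```python
from typing import Mapping, Optional

# Statistic named in this post.
PUSHBACK_STAT = "cell physical IO bytes sent directly to DB node to balance CPU"
# Assumption: used here as the denominator; not mentioned in the original post.
ELIGIBLE_STAT = "cell physical IO bytes eligible for predicate offload"


def pushback_share(stats: Mapping[str, int]) -> Optional[float]:
    """Return the share of offload-eligible bytes that were pushed back
    to the db node, or None when nothing was eligible for offload."""
    eligible = stats.get(ELIGIBLE_STAT, 0)
    if eligible <= 0:
        return None
    return stats.get(PUSHBACK_STAT, 0) / eligible


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sample = {PUSHBACK_STAT: 250, ELIGIBLE_STAT: 1000}
    print(pushback_share(sample))
```

A share that stays at zero suggests reverse offloading has not kicked in since instance startup, which matches the VALUE of 0 in the query output above.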
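So which parameter controls this process? Unfortunately, although I knew the feature existed, I could not recall the parameter offhand. All I remembered was that it is a hidden parameter set on the cell side. If all of the cell's hidden parameters were laid out in front of me, I would recognize it at a glance. The trouble was that I had no list of all the cell hidden parameters and did not know how to obtain one. At least for now, I do not know of any command or view that shows every hidden parameter in use on the cell side.
After thinking about it for a while...
I figured that since a systemstate dump on the cell side dumps the memory of the cell storage software, the cellsrv dump should contain all of the current parameters.
So I tried a systemstate dump on the cell side. I described the method in an earlier post on how to take a systemstate dump on an Exadata cell node.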
[root@dm01cel01 ~]# cellcli
CellCLI: Release 12.1.1.1.0 - Production on Tue Oct 08 02:24:45 MDT 2013

Copyright (c) 2007, 2013, Oracle. All rights reserved.
Cell Efficiency Ratio: 589

CellCLI> alter cell events="immediate cellsrv.cellsrv_statedump(0,0)"
Dump sequence #2 has been written to /opt/oracle/cell/log/diag/asm/cell/dm01cel01/trace/svtrc_27579_59.trc
Cell dm01cel01 successfully altered
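Open /opt/oracle/cell/log/diag/asm/cell/dm01cel01/trace/svtrc_27579_59.trc and search for "parameter" to get a list of the cell's current hidden parameters (unfortunately, there is no way for now to get descriptions of these hidden parameters), as shown below: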
+ Parameters for process at dump time
Dumping configuration parameter values
Unable to lookup value for parameter local_ipaddresses
ipaddress1 = 192.168.10.11/22 (default = NULL)
Unable to lookup value for parameter ipaddress2
Unable to lookup value for parameter ipaddress3
Unable to lookup value for parameter ipaddress4
version = 0.0 (default = )
_cell_max_pll_pred_writes = 36
_cell_pred_writes_autotune_enabled = TRUE
_cell_max_pll_pred_reads = 36
_cell_pred_reads_autotune_enabled = TRUE
_cell_max_flash_largeios = 48
_cell_num_threads_in_short_wait = 40
_cell_max_pll_pred_filters = 24 (default = 0)
_cell_pred_filters_autotune_enabled = TRUE
_cell_pred_filter_max_iosize = 1073741824
_cell_num_threads = 100
_cell_num_buffers = 5000
_cell_num_1mb_buffers = 8000 (default = 0)
_cell_num_1mb_bwr_buffers = 180
_cell_num_1mb_brr_buffers = 180
_cell_max_dynbufs_memsize = 3072 (default = 0)
_cell_listener_port = 5042
_cell_listener_backlog = 1000
_cell_listener_pll_jobs = 23
_cell_listener_req_batch = 100
_cell_num_0_byte_recv_ports = 4
_ms_cell_ioctl_timeout = 30000
_cell_iorm_test_mode = FALSE
_cell_iorm_perf_stats = FALSE
_cell_iorm_wl_mode = 0
_cell_iorm_hipri_alloc = 0
_cell_iorm_medpri_alloc = 0
_cell_iorm_lowpri_alloc = 0
_cell_iorm_asm_alloc = 0
_cell_iorm_lutil_limit = 0
_cell_iorm_hints_enabled = FALSE
_iorm_hint0 = -1
_iorm_priority0 = -1
_iorm_hint1 = -1
_iorm_priority1 = -1
_iorm_hint2 = -1
_iorm_priority2 = -1
_iorm_hint3 = -1
_iorm_priority3 = -1
_iorm_hint4 = -1
_iorm_priority4 = -1
_iorm_hint5 = -1
_iorm_priority5 = -1
_iorm_hint6 = -1
_iorm_priority6 = -1
_iorm_hint7 = -1
_iorm_priority7 = -1
_cell_iorm_pri_catidx = -1
_cell_iorm_pri_dbidx = -1
_cell_iorm_pri_cgidx = -1
_cell_iorm_enable = TRUE
_cell_iorm_max_io = 0
_cell_iorm_max_lio = 0
_cell_iorm_conc_writes = 0
_cell_iorm_deadline = 0
_cell_iorm_fake_dbs = 0
_cell_hard_disable = FALSE
_cell_raise_softassert_on_harderr = FALSE
_cell_enable_ossnet_checksum = FALSE
_cell_enable_skgxp_stats = TRUE
_skgxp_udp_use_tcb = TRUE
_skgxp_udp_use_tcb_client = TRUE
_cell_memory_tracing = TRUE
_cell_dmpsga_enabled = FALSE
_cell_enable_dynamic_credits = TRUE
_cell_num_ios_per_predjob = 10
_cell_num_pred_flashio_corrupt_retries = 1000
_cell_pred_dump_disk_onclose = FALSE
_cell_pred_polling_ctl_enabled = TRUE
_cell_pred_sim_block_byteord_conv = FALSE
_cell_max_kuty_failure_diagnostics = 0
_cell_print_all_params = FALSE
_cell_pred_disable_destbuf_refill = FALSE
_cell_smartio_passthru_enabled = FALSE
_cell_pred_no_predio_limit = FALSE
_cell_pred_enable_io_buffer_eviction = TRUE
_cell_pred_enable_dest_buffer_eviction = TRUE
_cell_pred_enable_flashio = TRUE
_cell_snapshot_bufsize = 1
_cell_snapshot_interval = 100
_cell_gen_time_stats_level = 1
_cell_gen_time_stats_timer_level = 0
_cell_force_split_gdisk = FALSE
_cell_testlevel = 0
_cell_max_receive_buffers_per_port = 600
_cell_num_8k_buffers = 10000
_cell_num_16k_buffers = 5000
_cell_num_32k_buffers = 5000
_cell_num_64k_buffers = 5000
_cell_max_receive_buffers_8k_port = 1000
_cell_max_receive_buffers_1mb_port = 50
_cell_crash_on_error = 0
_cell_crash_on_error_skip_n = 0
_cell_1mb_buffers_hugepage_support = TRUE
_skgxp_udp_interface_detection_time_secs = 4
_skgxp_gen_ant_ping_misscount = 2
_disable_diskmon_tcp_monitor = FALSE
_disable_diskmon_subnet_manager_query = FALSE
_skgxp_min_zcpy_len = 2147483647
_skgxp_min_rpc_rcv_zcpy_len = 2147483647
_skgxp_zcpy_flags = 2147483647
_skgxp_ctx_flags1 = 0
_skgxp_ctx_flags1mask = 0
_skgxp_dynamic_protocol = 0
_skgxp_inets = 0
_skgxpg_last_parameter = 27
_skgxp_ant_options = 0
_libcell_enable_libcell_interrupts = 1
_cell_rcvport_hist_size = 0
_skgxp_gen_rpc_no_path_check_in_sec = 1
_skgxp_gen_rpc_timeout_in_sec = 300
_skgxp_gen_ant_off_rpc_timeout_in_sec = 30
_reconnect_to_cell_freq_in_sec = 2
_reconnect_to_cell_attempts = 7
_disconnect_to_cell_attempts = 2
_reconnect_controls_reset_interval = 60
_dskm_disable_reconnect_to_cell = FALSE
_cell_disable_resource_leak_check = FALSE
_cell_disable_ant_check_reid = FALSE
_cell_disable_proactive_drop = FALSE
_cell_server_event =
_cell_client_event =
_cell_reserve_hugepage_memory_mb = 24
_cell_tolerates_max_backward_drift_microsecs = 300000
_cell_num_sched_log_entries = 8192
_cell_storage_index_columns = 8 (default = 0)
_cell_storage_index_partial_reads_threshold_percent = 85
_cell_storage_index_partial_rd_sectors = 512
_cell_enable_storage_index_for_loads = TRUE
_cell_enable_storage_index_for_writes = TRUE
_cell_storage_index_diag_mode = 0
_cell_storage_index_sizing_factor = 2
_cell_pred_max_smartio_sessions = 3820 (default = 0)
_cell_pred_max_con_ccfilter = 23 (default = 14)
_cell_pred_max_con_filters = 0
_cell_pred_num_ios_toissue_keptobj = 2
_cell_pred_max_cus_per_filter = 1
_cell_load_timezone_during_boot = TRUE
_cell_sendport_private_rqh_pool_size = 10
_cell_sendport_global_rqh_num_pools = 512
_cell_sendport_global_rqh_pool_maxincr = 150
_cell_capability_version = 0
_cell_iolat_stats_disable = FALSE
_cell_pred_mapelem_split_size = -1
_cell_perf_flags = 0
_cell_enable_sbuf_check = FALSE
_cell_disable_crash_dump_enhancement = FALSE
_cell_buffer_expiration_hours = 48
_cell_object_expiration_hours = 24
_cell_pred_max_num_outstanding_ios = 1000
_cell_mutex_stats = 0
_cell_port_activity_threshold = 300000
_cell_ant_port_activity_threshold = 1800000
_cell_ant_port_noopen_threshold = 60000
_cell_in_lrg_testing = FALSE
_cell_io_hang_reboot = TRUE
_cell_iohang_wtfc_reboot = FALSE
_cell_io_hang_time = 90
_cell_io_hang_kill_time = 95
_cell_io_hang_disable = FALSE
_cell_write_simulate_hard_error_freq = 0
_cell_assert_on_flash_data_corruption = 0
_cell_memalloc_analysis = disabled
_cell_flashcache_diag_reads_frequency = 0
_cell_read_flash_data_verif_level = 3
_cell_read_flash_gdisk_verif_level = 3
_cell_max_retry_on_read_flash_gdisk_verif_err = 2
_cell_enable_read_verif_on_these_gdisks =
_cell_enable_read_verif_on_gdisk_first_N_MB = -1
_cell_flash_cache_sanity_checking = 0
_cellrsdef_fast_restart = 1
_cell_max_memory = 22355 (default = 0)
_cell_max_connections = 1500 (default = 0)
_cell_sga_lowmem_threshold_size = 1024 (default = 0)
_cell_nomem_threshold_enabled = TRUE
_cell_sga_lowmem_threshold_enabled = TRUE
_cell_disable_heap_summary = FALSE
_cell_disable_flashcache_hung_io_handling = FALSE
_cell_flashcache_aging_writes_enabled = TRUE
_cell_flashcache_lru_max_hot_percent = 50
_cell_flashcache_max_FDOM_outst_ios = 70
_cell_populate_flash_max_FDOM_outst_wr_ios = 100
_cell_auto_close_fd_interval = 120
_cell_dump_sga_on_oom_exception = FALSE
_cell_quarantine_manager_disabled = FALSE
_cell_qm_disable_sql_step_quarantine = FALSE
_cell_qm_disable_disk_region_quarantine = FALSE
_cell_qm_db_quarantine_threshold = 3
_cell_qm_offload_quarantine_threshold = 3
_cell_thread_max_trace_file_size = -1
_cell_redolog_fast_ack = FALSE
_cell_max_tsld_hd_svctm_ms = 500
_cell_max_tsld_fd_svctm_ms = 100
_cell_max_rltv_svctm_ratio = 10
_cell_svctm_ratio_wt = 100
_cell_err_num_wt = 20
_cell_iopoor_perf_disable = FALSE
_cell_disable_flashcache_db_blk_chksum = FALSE
_cell_disable_flash_gdisk_db_blk_chksum = FALSE
_cell_auto_dump_errstack = TRUE
_cell_perf_max_hd_proa_fail = 0
_cell_perf_max_fd_proa_fail = 16
_cell_max_hd_hung_reboot = 2
_cell_max_fd_hung_reboot = 9
_cell_si_max_num_diag_mode_dumps = 20
_cell_fc_persistence_max_io_retry = 4
_cell_fc_persistence_state = 0
_cell_fc_md_shadow_paging_enabled = FALSE
_cell_fc_bootstrap_timeout = 20000000
_cell_fc_cache_mirror_writes = 1
_cell_fc_dw_batch_size = 1
_cell_simulate_railroad_crashes = FALSE
_cell_qm_max_simulated_railroad_crashes = 2
_cell_latency_warning_threshold =
_cell_latency_threshold_check_interval = 360000
_cell_latency_threshold_print_warning = FALSE
_cell_si_expensive_debug_tracing = FALSE
_cell_poor_perf_schedule_time = 5
_cell_iohang_schedule_time = 1
_cell_io_hang_drop_flash = TRUE
_cell_io_hang_drop_hard = FALSE
_cell_assert_unsafe_allocmem = FALSE
_cell_fplib_fix_control = 0
Unable to lookup value for parameter _lost_cache_detect
_cell_num_vers_check_fail_messages = 0
_cell_qm_db_quarantine_time_threshold = 86400
_cell_flashlog_flags = 0
_cell_flashlog_max_active_table_size = 1024
_cell_secure_erase_power = 5
_cell_mpp_cpu_freq = 2
_ms_listener_port = 5043
_cell_mpp_threshold = 90
_cell_mpp_max_pushback = 50
_si_write_diag_disable = FALSE
_cell_max_cellsrvstat_sessions = 3
_cell_si_diag_mode_force = FALSE
_cell_tracefile_max_size = 1610612736
_cell_bwrite_si_build_disabled = FALSE
_cell_state_dump_options = 0
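Searching a dump this long by eye is tedious, so a small script can pull the parameters into a dictionary. What follows is a minimal sketch, not an official tool. It assumes the trace file lists one parameter per line in the form "name = value" with an optional "(default = ...)" suffix, as the dump above suggests. The names CellParam and parse_cell_params are hypothetical.

```python
import re
from typing import Dict, NamedTuple, Optional


class CellParam(NamedTuple):
    value: str               # current value as text; may be empty
    default: Optional[str]   # text inside "(default = ...)", or None if absent


# Assumed line shape: name = value (default = x), where the suffix is optional.
_PARAM_RE = re.compile(r"(\w+) =\s*(.*?)(?:\s*\(default = (.*)\))?")


def parse_cell_params(text: str) -> Dict[str, CellParam]:
    """Collect 'name = value' lines from the parameter section of a cell dump.

    Lines that do not fit the assumed shape, such as the
    'Unable to lookup value for parameter ...' messages, are skipped.
    """
    params: Dict[str, CellParam] = {}
    for raw in text.splitlines():
        match = _PARAM_RE.fullmatch(raw.strip())
        if match:
            params[match.group(1)] = CellParam(match.group(2), match.group(3))
    return params


if __name__ == "__main__":
    sample = """\
Unable to lookup value for parameter ipaddress2
_cell_max_pll_pred_filters = 24 (default = 0)
_cell_server_event =
_cell_mpp_threshold = 90
_cell_mpp_max_pushback = 50
"""
    for name, param in parse_cell_params(sample).items():
        if "_mpp_" in name:
            print(name, param.value)
```

Filtering on a substring such as "_mpp_" narrows several hundred entries down to a handful of candidates.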
With this parameter list in hand, I recognized it.
The hidden parameters that control this feature are:
_cell_mpp_threshold = 90
_cell_mpp_max_pushback = 50
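_cell_mpp_threshold is the cell CPU threshold, 90% by default, and _cell_mpp_max_pushback is the maximum proportion of I/O returned as block I/O, 50% by default.
In other words, by default, if the CPU on a cell node exceeds 90%, the cell returns block I/O that has not been filtered by smart scan to the db node, and this I/O accounts for at most 50% of the total.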
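The interplay of the two parameters can be pictured with a toy model. This is purely illustrative and is not how cellsrv is actually implemented. The linear ramp between the threshold and 100% CPU is my assumption, and so is db_busy_pct, a made-up cutoff standing in for "the db node CPU is also high". Only the defaults of 90 and 50 come from the dump above.

```python
def pushback_percent(cell_cpu: float,
                     db_cpu: float,
                     threshold: float = 90.0,     # _cell_mpp_threshold default
                     max_pushback: float = 50.0,  # _cell_mpp_max_pushback default
                     db_busy_pct: float = 90.0    # hypothetical, not an Oracle parameter
                     ) -> float:
    """Toy model: percentage of smart scan I/O sent back as plain block I/O."""
    if cell_cpu <= threshold:
        return 0.0   # cell is not overloaded, so keep filtering on the cell
    if db_cpu >= db_busy_pct:
        return 0.0   # db node is busy too, so nothing is pushed back
    # Assumed linear ramp from 0 at the threshold up to max_pushback at 100% CPU.
    ramp = (min(cell_cpu, 100.0) - threshold) / (100.0 - threshold)
    return ramp * max_pushback


if __name__ == "__main__":
    for cell_cpu, db_cpu in [(80, 20), (95, 20), (100, 20), (95, 95)]:
        print(cell_cpu, db_cpu, pushback_percent(cell_cpu, db_cpu))
```

Whatever the real algorithm looks like, the cap means that under the defaults at least half of the smart scan I/O is still filtered on the cell.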
We can change these defaults in the cellinit.ora parameter file. The value currently in memory can also be changed as follows:
CellCLI> alter cell events = "immediate cellsrv.cellsrv_setparam('_cell_mpp_threshold','75')"
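If the same change has to be prepared for several parameters, the command text can be generated rather than typed by hand. This small sketch only builds the string shown above and does not run anything against a cell. The helper name setparam_command and its input checks are my own additions.

```python
import re


def setparam_command(name: str, value: str) -> str:
    """Build the CellCLI command text for changing one hidden parameter in memory."""
    if not re.fullmatch(r"_\w+", name):
        raise ValueError("expected a hidden parameter name starting with '_'")
    if "'" in value or '"' in value:
        raise ValueError("value must not contain quote characters")
    return ('alter cell events = '
            f'"immediate cellsrv.cellsrv_setparam(\'{name}\',\'{value}\')"')


if __name__ == "__main__":
    print(setparam_command("_cell_mpp_threshold", "75"))
```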
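Disclaimer:
The procedure and hidden parameters above are for reference only. Do not set these hidden parameters except under the guidance of Oracle Support. I accept no responsibility for any loss resulting from any negative impact.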