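I systematically (once or twice a day) experience freezes on a database server.
All of a sudden, all COMMITs get stuck in the "init" state; they pile up (up to 20-30), and almost no other queries are executed. After about 30 seconds, the server resumes normal operation.
I have dug through a lot of data (below), which suggests that something is happening in InnoDB, but I am having a hard time finding a specific cause.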
In the process list:
- during the problem, I see only COMMITs
- before it, there are no long-running operations
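I have also gone through the binary log, but there is nothing suspicious:
- during the problem, nothing is written
- before it, there are only small queries
In fact, the binary log that starts shortly before the problem is fairly small; in ROW mode, I would expect it to be large if there had been a huge update operation.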
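I considered a disk problem, but the data does not match the pattern of, for example, a disk suddenly slowing down.
What could be the cause? For this type of problem I would expect a large write query, but I cannot find any.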
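The system is a dedicated server with 128 GB of RAM and dual Intel Xeon E5-2680 v2 CPUs @ 2.80GHz. Storage is a mirror of Seagate 15k disks; the operating system is Ubuntu 14.04, and the files are stored on a separate ext4 partition.
MySQL is v5.6.24, with a 24 GB buffer pool allocated.
The query cache is disabled.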
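Charts, global status, and InnoDB variables follow, with some notes.
In all charts, each X unit is one second. The approximate interval of the problem is highlighted by vertical lines.
Note that for each second of the problem, I archived the full process list, the global status variables, and the InnoDB status.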
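Most of the status variables sampled this way are cumulative counters, so per-second charts come from differencing consecutive snapshots. The following is a minimal sketch of that step, assuming each snapshot has already been parsed from `SHOW GLOBAL STATUS` into a dict; the helper name `per_second_deltas` and the sample numbers are hypothetical and are not data from this server.

```python
from typing import Dict, List


def per_second_deltas(samples: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Convert cumulative counter snapshots, taken one second apart,
    into per-interval deltas (one dict per pair of consecutive snapshots)."""
    deltas = []
    for prev, curr in zip(samples, samples[1:]):
        # Only difference counters that are present in both snapshots.
        deltas.append({name: curr[name] - prev[name] for name in curr if name in prev})
    return deltas


# Hypothetical snapshots, one per second (illustrative values only).
samples = [
    {"Com_commit": 1000, "Innodb_data_fsyncs": 500},
    {"Com_commit": 1040, "Innodb_data_fsyncs": 520},
    {"Com_commit": 1040, "Innodb_data_fsyncs": 521},  # commits stall here
]

deltas = per_second_deltas(samples)
print(deltas)
```

Differencing only makes sense for counters such as `Com_commit` or `Innodb_data_fsyncs`; gauges such as `Innodb_buffer_pool_pages_dirty` are better plotted as raw values.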
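InnoDB buffer pool pages
There is a clear spike in dirty/free pages:
and in the buffer pool size:
The flushed pages show a very interesting behavior; there is no big spike, but InnoDB starts flushing continuously; after the critical interval, flushing spikes:
Innodb_buffer_pool_wait_free
Always 0.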
Log file
There is an increase in the data written to the log file, although not a large one:
Disk operations
Pending operations (fsyncs/reads/writes) do not change significantly, but actual fsyncs drop:
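Commands
Almost none of the commands (Com_*) are executed during the problem. The following are the only ones executed, and they clearly plummet: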
Current global status (partial)
+-----------------------------------------------+----------------+
| Aborted_clients | 4767955 |
| Aborted_connects | 33 |
| Binlog_cache_disk_use | 47747 |
| Binlog_cache_use | 68002226 |
| Binlog_stmt_cache_disk_use | 0 |
| Binlog_stmt_cache_use | 682187 |
| Bytes_received | 810930341088 |
| Bytes_sent | 6128156018408 |
| Com_admin_commands | 66190088 |
| Com_alter_table | 5837 |
| Com_analyze | 9742 |
| Com_begin | 328167299 |
| Com_change_db | 216 |
| Com_commit | 327839739 |
| Com_create_db | 13 |
| Com_create_index | 1 |
| Com_create_table | 877609 |
| Com_delete | 13720394 |
| Com_delete_multi | 89347 |
| Com_drop_db | 14 |
| Com_drop_table | 740674 |
| Com_insert | 130474570 |
| Com_insert_select | 572148 |
| Com_kill | 98 |
| Com_load | 1 |
| Com_lock_tables | 3 |
| Com_release_savepoint | 28061 |
| Com_rename_table | 978 |
| Com_replace | 7995547 |
| Com_replace_select | 46493 |
| Com_rollback | 284617 |
| Com_rollback_to_savepoint | 3 |
| Com_savepoint | 28062 |
| Com_select | 1664064423 |
| Com_set_option | 152369114 |
| Com_show_create_table | 64205947 |
| Com_show_databases | 11 |
| Com_show_engine_status | 1375041 |
| Com_show_fields | 97123409 |
| Com_show_keys | 1 |
| Com_show_master_status | 66 |
| Com_show_processlist | 2624788 |
| Com_show_slave_status | 1250533 |
| Com_show_status | 1591300 |
| Com_show_table_status | 198 |
| Com_show_tables | 90403059 |
| Com_show_triggers | 197 |
| Com_show_variables | 1580763 |
| Com_show_warnings | 3 |
| Com_unlock_tables | 4 |
| Com_update | 39673285 |
| Com_update_multi | 1259352 |
| Compression | OFF |
| Connections | 5426115 |
| Created_tmp_disk_tables | 108340111 |
| Created_tmp_files | 633100 |
| Created_tmp_tables | 350818717 |
| Delayed_errors | 0 |
| Delayed_insert_threads | 0 |
| Delayed_writes | 0 |
| Flush_commands | 1 |
| Handler_commit | 2607344684 |
| Handler_delete | 59374751 |
| Handler_discover | 0 |
| Handler_external_lock | 6579792721 |
| Handler_mrr_init | 0 |
| Handler_prepare | 379741472 |
| Handler_read_first | 122813781 |
| Handler_read_key | 300828885004 |
| Handler_read_last | 68546 |
| Handler_read_next | 1973974492391 |
| Handler_read_prev | 535277275 |
| Handler_read_rnd | 25166576467 |
| Handler_read_rnd_next | 403233196007 |
| Handler_rollback | 7441860 |
| Handler_savepoint | 56123 |
| Handler_savepoint_rollback | 4 |
| Handler_update | 7089849963 |
| Handler_write | 45088211806 |
| Innodb_buffer_pool_dump_status | not started |
| Innodb_buffer_pool_load_status | not started |
| Innodb_buffer_pool_pages_data | 1376429 |
| Innodb_buffer_pool_bytes_data | 22551412736 |
| Innodb_buffer_pool_pages_dirty | 54 |
| Innodb_buffer_pool_bytes_dirty | 884736 |
| Innodb_buffer_pool_pages_flushed | 511407857 |
| Innodb_buffer_pool_pages_free | 8196 |
| Innodb_buffer_pool_pages_misc | 188235 |
| Innodb_buffer_pool_pages_total | 1572860 |
| Innodb_buffer_pool_read_ahead_rnd | 0 |
| Innodb_buffer_pool_read_ahead | 43857258 |
| Innodb_buffer_pool_read_ahead_evicted | 1057 |
| Innodb_buffer_pool_read_requests | 3228343017942 |
| Innodb_buffer_pool_reads | 244620532 |
| Innodb_buffer_pool_wait_free | 0 |
| Innodb_buffer_pool_write_requests | 89857610256 |
| Innodb_data_fsyncs | 252778445 |
| Innodb_data_pending_fsyncs | 0 |
| Innodb_data_pending_reads | 0 |
| Innodb_data_pending_writes | 0 |
| Innodb_data_read | 4856234004480 |
| Innodb_data_reads | 300490939 |
| Innodb_data_writes | 688782200 |
| Innodb_data_written | 20083187664896 |
| Innodb_dblwr_pages_written | 511407857 |
| Innodb_dblwr_writes | 80028802 |
| Innodb_have_atomic_builtins | ON |
| Innodb_log_waits | 83 |
| Innodb_log_write_requests | 6576724198 |
| Innodb_log_writes | 91699110 |
| Innodb_os_log_fsyncs | 92727019 |
| Innodb_os_log_pending_fsyncs | 0 |
| Innodb_os_log_pending_writes | 0 |
| Innodb_os_log_written | 3324697918464 |
| Innodb_page_size | 16384 |
| Innodb_pages_created | 131874643 |
| Innodb_pages_read | 296401190 |
| Innodb_pages_written | 511407857 |
| Innodb_row_lock_current_waits | 5 |
| Innodb_row_lock_time | 193034202 |
| Innodb_row_lock_time_avg | 607 |
| Innodb_row_lock_time_max | 121631 |
| Innodb_row_lock_waits | 317570 |
| Innodb_rows_deleted | 59374751 |
| Innodb_rows_inserted | 16430811966 |
| Innodb_rows_read | 2466321863882 |
| Innodb_rows_updated | 108284843 |
| Innodb_num_open_files | 4 |
| Innodb_truncated_status_writes | 0 |
| Innodb_available_undo_logs | 128 |
| Key_blocks_not_flushed | 0 |
| Key_blocks_unused | 857368 |
| Key_blocks_used | 6054 |
| Key_read_requests | 2074128816 |
| Key_reads | 0 |
| Key_write_requests | 292337091 |
| Key_writes | 0 |
| Last_query_cost | 0.000000 |
| Last_query_partial_plans | 0 |
| Max_used_connections | 157 |
| Not_flushed_delayed_rows | 0 |
| Open_files | 24 |
| Open_streams | 0 |
| Open_table_definitions | 352 |
| Open_tables | 528 |
| Opened_files | 437024702 |
| Opened_table_definitions | 1775335 |
| Opened_tables | 914419 |
| Prepared_stmt_count | 0 |
| Queries | 4907804173 |
| Questions | 4841613101 |
| Select_full_join | 46349 |
| Select_full_range_join | 82799 |
| Select_range | 266986250 |
| Select_range_check | 536 |
| Select_scan | 304460971 |
| Slow_launch_threads | 0 |
| Slow_queries | 144689 |
| Sort_merge_passes | 3055697 |
| Sort_range | 119144232 |
| Sort_rows | 30616561566 |
| Sort_scan | 115294967 |
| Table_locks_immediate | 3285742243 |
| Table_locks_waited | 0 |
| Table_open_cache_hits | 3445246883 |
| Table_open_cache_misses | 169145 |
| Table_open_cache_overflows | 0 |
| Threads_cached | 43 |
| Threads_connected | 64 |
| Threads_created | 278 |
| Threads_running | 2 |
| Uptime | 8670805 |
| Uptime_since_flush_status | 8670805 |
+-----------------------------------------------+----------------+
InnoDB variables
+------------------------------------------+------------------------+
| Variable_name | Value |
+------------------------------------------+------------------------+
| innodb_adaptive_flushing | ON |
| innodb_adaptive_flushing_lwm | 10 |
| innodb_adaptive_hash_index | ON |
| innodb_adaptive_max_sleep_delay | 150000 |
| innodb_additional_mem_pool_size | 8388608 |
| innodb_api_bk_commit_interval | 5 |
| innodb_api_disable_rowlock | OFF |
| innodb_api_enable_binlog | OFF |
| innodb_api_enable_mdl | OFF |
| innodb_api_trx_level | 0 |
| innodb_autoextend_increment | 64 |
| innodb_autoinc_lock_mode | 1 |
| innodb_buffer_pool_dump_at_shutdown | OFF |
| innodb_buffer_pool_dump_now | OFF |
| innodb_buffer_pool_filename | <removed> |
| innodb_buffer_pool_instances | 8 |
| innodb_buffer_pool_load_abort | OFF |
| innodb_buffer_pool_load_at_startup | OFF |
| innodb_buffer_pool_load_now | OFF |
| innodb_buffer_pool_size | 25769803776 |
| innodb_change_buffer_max_size | 25 |
| innodb_change_buffering | all |
| innodb_checksum_algorithm | innodb |
| innodb_checksums | ON |
| innodb_cmp_per_index_enabled | OFF |
| innodb_commit_concurrency | 0 |
| innodb_compression_failure_threshold_pct | 5 |
| innodb_compression_level | 6 |
| innodb_compression_pad_pct_max | 50 |
| innodb_concurrency_tickets | 5000 |
| innodb_data_file_path | ibdata1:12M:autoextend |
| innodb_data_home_dir | <removed> |
| innodb_disable_sort_file_cache | OFF |
| innodb_doublewrite | ON |
| innodb_fast_shutdown | 1 |
| innodb_file_format | Antelope |
| innodb_file_format_check | ON |
| innodb_file_format_max | Antelope |
| innodb_file_per_table | OFF |
| innodb_flush_log_at_timeout | 1 |
| innodb_flush_log_at_trx_commit | 1 |
| innodb_flush_method | O_DIRECT |
| innodb_flush_neighbors | 1 |
| innodb_flushing_avg_loops | 30 |
| innodb_force_load_corrupted | OFF |
| innodb_force_recovery | 0 |
| innodb_ft_aux_table | |
| innodb_ft_cache_size | 8000000 |
| innodb_ft_enable_diag_print | OFF |
| innodb_ft_enable_stopword | ON |
| innodb_ft_max_token_size | 84 |
| innodb_ft_min_token_size | 3 |
| innodb_ft_num_word_optimize | 2000 |
| innodb_ft_result_cache_limit | 2000000000 |
| innodb_ft_server_stopword_table | |
| innodb_ft_sort_pll_degree | 2 |
| innodb_ft_total_cache_size | 640000000 |
| innodb_ft_user_stopword_table | |
| innodb_io_capacity | 200 |
| innodb_io_capacity_max | 2000 |
| innodb_large_prefix | OFF |
| innodb_lock_wait_timeout | 120 |
| innodb_locks_unsafe_for_binlog | OFF |
| innodb_log_buffer_size | 8388608 |
| innodb_log_compressed_pages | ON |
| innodb_log_file_size | 134217728 |
| innodb_log_files_in_group | 3 |
| innodb_log_group_home_dir | <removed> |
| innodb_lru_scan_depth | 1024 |
| innodb_max_dirty_pages_pct | 75 |
| innodb_max_dirty_pages_pct_lwm | 0 |
| innodb_max_purge_lag | 0 |
| innodb_max_purge_lag_delay | 0 |
| innodb_mirrored_log_groups | 1 |
| innodb_monitor_disable | |
| innodb_monitor_enable | |
| innodb_monitor_reset | |
| innodb_monitor_reset_all | |
| innodb_old_blocks_pct | 37 |
| innodb_old_blocks_time | 1000 |
| innodb_online_alter_log_max_size | 134217728 |
| innodb_open_files | 300 |
| innodb_optimize_fulltext_only | OFF |
| innodb_page_size | 16384 |
| innodb_print_all_deadlocks | OFF |
| innodb_purge_batch_size | 300 |
| innodb_purge_threads | 1 |
| innodb_random_read_ahead | OFF |
| innodb_read_ahead_threshold | 56 |
| innodb_read_io_threads | 4 |
| innodb_read_only | OFF |
| innodb_replication_delay | 0 |
| innodb_rollback_on_timeout | OFF |
| innodb_rollback_segments | 128 |
| innodb_sort_buffer_size | 1048576 |
| innodb_spin_wait_delay | 6 |
| innodb_stats_auto_recalc | ON |
| innodb_stats_method | nulls_equal |
| innodb_stats_on_metadata | OFF |
| innodb_stats_persistent | ON |
| innodb_stats_persistent_sample_pages | 20 |
| innodb_stats_sample_pages | 8 |
| innodb_stats_transient_sample_pages | 8 |
| innodb_status_output | OFF |
| innodb_status_output_locks | OFF |
| innodb_strict_mode | OFF |
| innodb_support_xa | ON |
| innodb_sync_array_size | 1 |
| innodb_sync_spin_loops | 30 |
| innodb_table_locks | ON |
| innodb_thread_concurrency | 0 |
| innodb_thread_sleep_delay | 10000 |
| innodb_undo_directory | . |
| innodb_undo_logs | 128 |
| innodb_undo_tablespaces | 0 |
| innodb_use_native_aio | ON |
| innodb_use_sys_malloc | ON |
| innodb_version | 5.6.24 |
| innodb_write_io_threads | 4 |
+------------------------------------------+------------------------+
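As a rough cross-check of the log file figures, the redo log capacity and the lifetime-average redo write rate can be worked out from the values posted above. This is only back-of-the-envelope arithmetic: it averages over the whole uptime, so it says nothing about bursts, and the variable names in the sketch are just local labels.

```python
# Values copied from the status and variable dumps above.
innodb_log_file_size = 134217728        # bytes per redo log file
innodb_log_files_in_group = 3
innodb_os_log_written = 3324697918464   # bytes of redo written since startup
uptime_seconds = 8670805

# Total redo log space available before the log wraps around.
redo_capacity_bytes = innodb_log_file_size * innodb_log_files_in_group

# Average redo write rate over the whole uptime (not the peak rate).
avg_redo_bytes_per_sec = innodb_os_log_written / uptime_seconds

# Time needed, at the average rate, to write one full redo log's worth of data.
seconds_to_fill_redo = redo_capacity_bytes / avg_redo_bytes_per_sec

print(f"redo capacity: {redo_capacity_bytes / 2**20:.0f} MiB")
print(f"average redo rate: {avg_redo_bytes_per_sec / 1024:.0f} KiB/s")
print(f"time to fill redo log at average rate: {seconds_to_fill_redo / 60:.1f} min")
```

At the average rate, the 384 MiB of redo space is cycled through in roughly a quarter of an hour; during write bursts this would happen faster, which may be worth comparing against the timing of the flushing behavior in the charts.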
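You may need to tune writes for InnoDB and possibly for the ext4 volume.
ASPECT #1
I notice that you have innodb_write_io_threads set to 4 (the default). You need to increase it so that dirty pages can be flushed more robustly to their respective .ibd files. Please set it to 16.
ASPECT #2
Pauses in performing writes may be due to ext4. Why?
You have innodb_flush_method set to O_DIRECT. This is supposed to make disk writes more stable. Nevertheless, I have a surprise for you. About a year ago, I answered "ib_logfile open with O_SYNC when innodb_flush_method=O_DSYNC". In that answer I mentioned a Percona blog post saying that O_DIRECT is faked on ext4 with recent kernels.
After a long period of experimentation, I found that the problem was the infamous swap insanity.
The issue was solved by using the innodb-numa-interleave option.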
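"Swap insanity" usually refers to a NUMA imbalance: one memory node runs out of free pages and the kernel starts swapping even though another node still has plenty of free memory. One way to look for such an imbalance is to compare free memory per node. The sketch below assumes a Linux host that exposes per-node figures in `/sys/devices/system/node/node*/meminfo`; the function names and the sample text are hypothetical and are not measurements from this server.

```python
import re
from typing import Dict

# Matches lines such as "Node 0 MemFree:          524288 kB".
_LINE = re.compile(r"Node\s+(\d+)\s+(\w+):\s+(\d+)\s+kB")


def parse_node_meminfo(text: str) -> Dict[int, Dict[str, int]]:
    """Parse per-node meminfo text into {node_id: {field: kilobytes}}.
    Fields whose names contain non-word characters are simply skipped."""
    nodes: Dict[int, Dict[str, int]] = {}
    for match in _LINE.finditer(text):
        node, field, kb = int(match.group(1)), match.group(2), int(match.group(3))
        nodes.setdefault(node, {})[field] = kb
    return nodes


def free_fraction(nodes: Dict[int, Dict[str, int]]) -> Dict[int, float]:
    """Return the fraction of each node's memory that is currently free."""
    return {n: f["MemFree"] / f["MemTotal"] for n, f in nodes.items()}


# Hypothetical input: node 0 is almost exhausted while node 1 is half free.
sample = """\
Node 0 MemTotal:       67108864 kB
Node 0 MemFree:          524288 kB
Node 1 MemTotal:       67108864 kB
Node 1 MemFree:        33554432 kB
"""

fractions = free_fraction(parse_node_meminfo(sample))
print(fractions)
```

On a real host, the input would be the concatenated contents of the per-node meminfo files, sampled around the time of a freeze; a large gap between nodes would be consistent with the diagnosis, though it does not prove it.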