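We have a MySQL primary running 5.7.40, and we are testing an upgrade to 8.0.36 (this is on AWS RDS). We have 2 MySQL 8.0 replicas and 2 MySQL 5.7 replicas replicating from the primary. At almost the same moment, both MySQL 8.0 replicas stopped replicating and complained about HA_ERR_FOUND_DUPP_KEY: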
Replica SQL for channel '': Worker 2 failed executing transaction '87953f5d-7595-11ed-830d-02f4790d85ab:57805008598' at source log mysql-bin-changelog.676514, end_log_pos 52858646; Could not execute Write_rows event on table ebdb.bike_issues; Duplicate entry '177235118' for key 'bike_issues.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's source log mysql-bin-changelog.676514, end_log_pos 52858646, Error_code: MY-001062
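The strange part is that although the two replicas complain about different key values, the timestamps are almost identical (a few milliseconds apart). The MySQL 5.7 replicas are running fine, so apparently nothing went wrong on the primary side. Nothing shows up in the primary's log at that time either.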
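The table it complains about is a very frequently written table, and our MySQL 8.0 replicas have now been running for more than a week without any replication issues. We use row-based, GTID-based replication (gtid_mode ON, enforce_gtid_consistency ON).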
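I was able to restore replication temporarily by setting slave_exec_mode to IDEMPOTENT. After replication caught up, I checked the error logs: the second replica's error log has no error for the key the first replica failed on, and vice versa, i.e. they failed on different keys. Could this be a problem on the receiver side of replication? Or a bug caused by the version mismatch? Or some mismatched MySQL variable?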
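For context on what the workaround does: as we understand it, IDEMPOTENT mode suppresses duplicate-key and key-not-found errors, and for a row-based Write_rows event that hits an existing key, the replicated row replaces the one already on the replica. That gets the applier moving again but silently hides the divergence. The Python sketch below is a toy model of that behavior, not MySQL's applier code; it assumes a table keyed only by id and ignores the secondary unique keys (token, idempotency_key). Note also that since 8.0.26 the variable is named replica_exec_mode, with slave_exec_mode kept as a deprecated alias.

```python
# Toy model of a replica applying a row-based Write_rows event. Hypothetical
# illustration only: a "table" is a dict keyed by primary key `id`, and
# secondary unique keys are ignored.


class DuplicateKeyError(Exception):
    """Stands in for error 1062 (HA_ERR_FOUND_DUPP_KEY)."""


def apply_write_rows(table: dict[int, dict], row: dict, exec_mode: str = "STRICT") -> None:
    """Insert `row`; STRICT stops on a duplicate key, IDEMPOTENT overwrites it."""
    if exec_mode not in ("STRICT", "IDEMPOTENT"):
        raise ValueError(f"unknown exec_mode: {exec_mode!r}")
    pk = row["id"]
    if pk in table and exec_mode == "STRICT":
        # In this model the applier stops here, as the real SQL thread did.
        raise DuplicateKeyError(f"Duplicate entry '{pk}' for key 'bike_issues.PRIMARY'")
    # New key, or IDEMPOTENT mode: the replicated row wins and any local row is lost.
    table[pk] = dict(row)


if __name__ == "__main__":
    # A row that already exists on the replica before the source's row arrives.
    replica_table = {177235118: {"id": 177235118, "issue_type": "written_locally"}}
    replicated_row = {"id": 177235118, "issue_type": "from_source"}

    try:
        apply_write_rows(replica_table, replicated_row, "STRICT")
    except DuplicateKeyError as exc:
        print("STRICT:", exc)

    apply_write_rows(replica_table, replicated_row, "IDEMPOTENT")
    print("IDEMPOTENT:", replica_table[177235118]["issue_type"])  # from_source
```

The point of the model is the last line: in IDEMPOTENT mode the row that was already on the replica is gone, so the mode is a stopgap rather than a fix.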
How can I debug this further? What could be causing this behavior?
Edit #1
Output of SHOW CREATE TABLE on the primary
Create Table: CREATE TABLE `bike_issues` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`issue_type` varchar(191) COLLATE utf8mb4_bin NOT NULL,
`bike_id` int(11) NOT NULL,
`reported_at` datetime NOT NULL,
`resolved_at` datetime DEFAULT NULL,
`resolution_type` varchar(191) COLLATE utf8mb4_bin DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`token` varchar(191) COLLATE utf8mb4_bin DEFAULT NULL,
`idempotency_key` varchar(191) COLLATE utf8mb4_bin DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_bike_issues_on_token` (`token`),
UNIQUE KEY `index_bike_issues_on_idempotency_key` (`idempotency_key`),
KEY `by_bike_id_issue_type_resolved_at` (`bike_id`,`issue_type`,`resolved_at`) USING BTREE,
KEY `index_bike_issues_on_resolved_at` (`resolved_at`)
) ENGINE=InnoDB AUTO_INCREMENT=177614144 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
On the MySQL 8.0 replica
Create Table: CREATE TABLE `bike_issues` (
`id` int NOT NULL AUTO_INCREMENT,
`issue_type` varchar(191) COLLATE utf8mb4_bin NOT NULL,
`bike_id` int NOT NULL,
`reported_at` datetime NOT NULL,
`resolved_at` datetime DEFAULT NULL,
`resolution_type` varchar(191) COLLATE utf8mb4_bin DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`token` varchar(191) COLLATE utf8mb4_bin DEFAULT NULL,
`idempotency_key` varchar(191) COLLATE utf8mb4_bin DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_bike_issues_on_token` (`token`),
UNIQUE KEY `index_bike_issues_on_idempotency_key` (`idempotency_key`),
KEY `by_bike_id_issue_type_resolved_at` (`bike_id`,`issue_type`,`resolved_at`) USING BTREE,
KEY `index_bike_issues_on_resolved_at` (`resolved_at`)
) ENGINE=InnoDB AUTO_INCREMENT=177614497 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
On the primary
mysql> show variables like '%binlog%';
+--------------------------------------------+----------------------+
| Variable_name | Value |
+--------------------------------------------+----------------------+
| binlog_cache_size | 32768 |
| binlog_checksum | NONE |
| binlog_direct_non_transactional_updates | OFF |
| binlog_error_action | IGNORE_ERROR |
| binlog_format | ROW |
| binlog_group_commit_sync_delay | 0 |
| binlog_group_commit_sync_no_delay_count | 0 |
| binlog_gtid_simple_recovery | ON |
| binlog_max_flush_queue_time | 0 |
| binlog_order_commits | ON |
| binlog_row_image | FULL |
| binlog_rows_query_log_events | OFF |
| binlog_stmt_cache_size | 32768 |
| binlog_transaction_dependency_history_size | 25000 |
| binlog_transaction_dependency_tracking | COMMIT_ORDER |
| innodb_api_enable_binlog | OFF |
| innodb_locks_unsafe_for_binlog | OFF |
| log_statements_unsafe_for_binlog | OFF |
| max_binlog_cache_size | 18446744073709547520 |
| max_binlog_size | 134217728 |
| max_binlog_stmt_cache_size | 18446744073709547520 |
| sync_binlog | 2000 |
+--------------------------------------------+----------------------+
mysql> show variables like '%slave%';
+------------------------------+-----------------------+
| Variable_name | Value |
+------------------------------+-----------------------+
| init_slave | |
| log_slave_updates | ON |
| log_slow_slave_statements | ON |
| pseudo_slave_mode | OFF |
| rpl_stop_slave_timeout | 31536000 |
| slave_allow_batching | OFF |
| slave_checkpoint_group | 512 |
| slave_checkpoint_period | 300 |
| slave_compressed_protocol | OFF |
| slave_exec_mode | STRICT |
| slave_load_tmpdir | /rdsdbdata/tmp |
| slave_max_allowed_packet | 1073741824 |
| slave_net_timeout | 60 |
| slave_parallel_type | DATABASE |
| slave_parallel_workers | 0 |
| slave_pending_jobs_size_max | 16777216 |
| slave_preserve_commit_order | OFF |
| slave_rows_search_algorithms | TABLE_SCAN,INDEX_SCAN |
| slave_skip_errors | OFF |
| slave_sql_verify_checksum | ON |
| slave_transaction_retries | 10 |
| slave_type_conversions | |
| sql_slave_skip_counter | 0 |
+------------------------------+-----------------------+
mysql> show variables where variable_name in ('gtid_mode', 'enforce_gtid_consistency', 'innodb_flush_log_at_trx_commit');
+--------------------------------+-------+
| Variable_name | Value |
+--------------------------------+-------+
| enforce_gtid_consistency | ON |
| gtid_mode | ON |
| innodb_flush_log_at_trx_commit | 2 |
+--------------------------------+-------+
On the MySQL 8.0 replica
mysql> show variables like '%binlog%';
+------------------------------------------------+----------------------+
| Variable_name | Value |
+------------------------------------------------+----------------------+
| binlog_cache_size | 32768 |
| binlog_checksum | NONE |
| binlog_direct_non_transactional_updates | OFF |
| binlog_encryption | OFF |
| binlog_error_action | ABORT_SERVER |
| binlog_expire_logs_auto_purge | ON |
| binlog_expire_logs_seconds | 2592000 |
| binlog_format | ROW |
| binlog_group_commit_sync_delay | 0 |
| binlog_group_commit_sync_no_delay_count | 0 |
| binlog_gtid_simple_recovery | ON |
| binlog_max_flush_queue_time | 0 |
| binlog_order_commits | ON |
| binlog_rotate_encryption_master_key_at_startup | OFF |
| binlog_row_event_max_size | 8192 |
| binlog_row_image | FULL |
| binlog_row_metadata | MINIMAL |
| binlog_row_value_options | |
| binlog_rows_query_log_events | OFF |
| binlog_stmt_cache_size | 32768 |
| binlog_transaction_compression | OFF |
| binlog_transaction_compression_level_zstd | 3 |
| binlog_transaction_dependency_history_size | 25000 |
| binlog_transaction_dependency_tracking | COMMIT_ORDER |
| innodb_api_enable_binlog | OFF |
| log_statements_unsafe_for_binlog | OFF |
| max_binlog_cache_size | 18446744073709547520 |
| max_binlog_size | 134217728 |
| max_binlog_stmt_cache_size | 18446744073709547520 |
| sync_binlog | 1000 |
+------------------------------------------------+----------------------+
mysql> show variables where variable_name in ('gtid_mode', 'enforce_gtid_consistency', 'innodb_flush_log_at_trx_commit');
+--------------------------------+-------+
| Variable_name | Value |
+--------------------------------+-------+
| enforce_gtid_consistency | ON |
| gtid_mode | ON |
| innodb_flush_log_at_trx_commit | 2 |
+--------------------------------+-------+
mysql> show variables like '%replica%';
+-----------------------------------------------+----------------+
| Variable_name | Value |
+-----------------------------------------------+----------------+
| group_replication_consistency | EVENTUAL |
| init_replica | |
| innodb_replication_delay | 0 |
| log_replica_updates | ON |
| log_slow_replica_statements | OFF |
| pseudo_replica_mode | OFF |
| replica_allow_batching | ON |
| replica_checkpoint_group | 512 |
| replica_checkpoint_period | 300 |
| replica_compressed_protocol | OFF |
| replica_exec_mode | STRICT |
| replica_load_tmpdir | /rdsdbdata/tmp |
| replica_max_allowed_packet | 1073741824 |
| replica_net_timeout | 60 |
| replica_parallel_type | LOGICAL_CLOCK |
| replica_parallel_workers | 32 |
| replica_pending_jobs_size_max | 134217728 |
| replica_preserve_commit_order | ON |
| replica_skip_errors | OFF |
| replica_sql_verify_checksum | ON |
| replica_transaction_retries | 10 |
| replica_type_conversions | |
| replication_optimize_for_static_plugin_config | OFF |
| replication_sender_observe_commit_only | OFF |
| rpl_stop_replica_timeout | 31536000 |
| skip_replica_start | ON |
| sql_replica_skip_counter | 0 |
+-----------------------------------------------+----------------+
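Comparing these variable lists by eye is awkward because 8.0 reports the renamed replica_* names (the rename happened in 8.0.26) while 5.7 still reports slave_*. A small script can line them up. The sketch below is a hypothetical helper that normalizes names with a plain "slave" to "replica" substitution, an assumption that holds for the variables posted here but not necessarily for every variable. Two caveats: the primary's slave_* values only matter when it acts as a replica, so the 5.7 replicas (which kept working) are the more meaningful baseline; and a diff only narrows down candidates, it does not show that any of these settings caused the errors.

```python
# Hypothetical helper: diff SHOW VARIABLES output between a 5.7 server and an
# 8.0 replica. MySQL 8.0.26 renamed most slave_* variables to replica_*, so old
# names are normalized before comparing. The rename rule is a simple string
# substitution and is an assumption that holds for the variables shown here.


def normalize(name: str) -> str:
    """slave_parallel_workers -> replica_parallel_workers, log_slave_updates -> log_replica_updates."""
    return name.replace("slave", "replica")


def diff_variables(source: dict[str, str], replica: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {normalized_name: (source_value, replica_value)} for shared names whose values differ."""
    src = {normalize(k): v for k, v in source.items()}
    rep = {normalize(k): v for k, v in replica.items()}
    return {k: (src[k], rep[k]) for k in sorted(src.keys() & rep.keys()) if src[k] != rep[k]}


if __name__ == "__main__":
    # A subset of the values posted above (primary first, then the 8.0 replica).
    source = {
        "binlog_error_action": "IGNORE_ERROR", "sync_binlog": "2000", "binlog_row_image": "FULL",
        "slave_exec_mode": "STRICT", "slave_parallel_type": "DATABASE",
        "slave_parallel_workers": "0", "slave_preserve_commit_order": "OFF",
        "slave_allow_batching": "OFF",
    }
    replica = {
        "binlog_error_action": "ABORT_SERVER", "sync_binlog": "1000", "binlog_row_image": "FULL",
        "replica_exec_mode": "STRICT", "replica_parallel_type": "LOGICAL_CLOCK",
        "replica_parallel_workers": "32", "replica_preserve_commit_order": "ON",
        "replica_allow_batching": "ON",
    }
    for name, (src_value, rep_value) in diff_variables(source, replica).items():
        print(f"{name:32} source={src_value:14} replica={rep_value}")
```

On the posted values, the differences are mostly the multi-threaded applier settings (32 workers, LOGICAL_CLOCK, commit order preserved, batching on) plus sync_binlog and binlog_error_action.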
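We enabled binary logging and were able to figure this out when it happened again. The replicas had read_only=0, and an application-level bug wrote to them in extremely rare cases.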
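This fits the symptoms: a stray INSERT on a replica takes the next auto-increment id locally, and when the source later assigns that same id to its own row, the replicated Write_rows event collides with it, so each replica fails on whichever ids its own stray writes used. (The 8.0 replica's AUTO_INCREMENT of 177614497 being ahead of the primary's 177614144 in Edit #1 is consistent with this, although the two snapshots may not have been taken at the same moment.) Beyond keeping read_only=1 on the replicas, GTIDs offer a direct check: with gtid_mode=ON, a transaction committed on a replica gets that replica's own server_uuid, so it shows up in the replica's gtid_executed. On the server, GTID_SUBTRACT(replica_set, source_set) returns exactly those errant transactions. The Python sketch below is a hypothetical helper that does the same subtraction offline. It handles only the untagged uuid:interval[:interval] format that 5.7 and 8.0.36 print. The replica UUID in the example is made up.

```python
# Hypothetical helper: find "errant" GTIDs, i.e. transactions in a replica's
# gtid_executed that are missing from the source's. Handles only untagged GTID
# sets of the form "uuid:1-5:7,uuid2:1-3".

Interval = tuple[int, int]


def parse_gtid_set(text: str) -> dict[str, list[Interval]]:
    """Parse a GTID set string into {uuid: sorted list of (start, end) intervals}."""
    result: dict[str, list[Interval]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid.lower(), [])
        for r in ranges:
            start, _, end = r.partition("-")
            intervals.append((int(start), int(end or start)))
    return {uuid: sorted(ivs) for uuid, ivs in result.items()}


def subtract(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Return the parts of sorted intervals `a` not covered by sorted intervals `b`."""
    out: list[Interval] = []
    for lo, hi in a:
        cur = lo
        for b_lo, b_hi in b:
            if b_hi < cur or b_lo > hi:
                continue  # no overlap with the remaining part of [cur, hi]
            if b_lo > cur:
                out.append((cur, b_lo - 1))
            cur = max(cur, b_hi + 1)
            if cur > hi:
                break
        if cur <= hi:
            out.append((cur, hi))
    return out


def errant_gtids(replica_executed: str, source_executed: str) -> dict[str, list[Interval]]:
    """Transactions the replica has executed that the source has not."""
    rep = parse_gtid_set(replica_executed)
    src = parse_gtid_set(source_executed)
    errant = {}
    for uuid, intervals in rep.items():
        left = subtract(intervals, src.get(uuid, []))
        if left:
            errant[uuid] = left
    return errant


def format_gtid_set(gtids: dict[str, list[Interval]]) -> str:
    """Render {uuid: intervals} back into MySQL's GTID set text format."""
    parts = []
    for uuid, ivs in sorted(gtids.items()):
        ranges = ":".join(str(s) if s == e else f"{s}-{e}" for s, e in ivs)
        parts.append(f"{uuid}:{ranges}")
    return ",".join(parts)


if __name__ == "__main__":
    SOURCE_UUID = "87953f5d-7595-11ed-830d-02f4790d85ab"  # from the error message above
    # Hypothetical snapshots. Read the replica's @@GLOBAL.gtid_executed first, then
    # the source's, so every source transaction the replica has is also in the source set.
    replica_executed = f"{SOURCE_UUID}:1-57805008597,\n3e11fa47-71ca-11e1-9e33-c80aa9429562:1-2"
    source_executed = f"{SOURCE_UUID}:1-57805008600"
    print(format_gtid_set(errant_gtids(replica_executed, source_executed)))
    # prints: 3e11fa47-71ca-11e1-9e33-c80aa9429562:1-2
```

Running a check like this periodically against each replica would flag local writes as soon as they happen, well before a duplicate key stops the applier.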