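I have three nodes that I want to set up as a Percona XtraDB Cluster (PXC). I bootstrapped the first node and joined the second, but for some reason the third node cannot join. The configuration is the same on all nodes, since I simply copy-pasted it: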
[mysqld]
# Galera
wsrep_cluster_address = gcomm://10.1.5.100,10.1.5.101,10.1.5.102
wsrep_cluster_name = db-test
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_provider=/usr/lib64/galera3/libgalera_smm.so
wsrep_provider_options = "gcache.size=256M"
wsrep_slave_threads = 16 # 2~3 times with CPU
wsrep_sst_auth = "sstuser:sstPwd#123"
wsrep_sst_method = xtrabackup-v2
I am running the nodes on CentOS 7.x. Below is the status of the two PXC nodes that are already up and running:
| wsrep_ist_receive_seqno_end | 0 |
| wsrep_incoming_addresses | 10.1.5.100:3306,10.1.5.101:3306 |
| wsrep_cluster_weight | 2 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0/0/0/0/0 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | 8d59ca0f-cd35-11e8-863c-d79869fa6d80 |
| wsrep_cluster_conf_id | 4 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | ac97f711-cad5-11e8-8f39-be9d0594cdb9 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 0 |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <[email protected]> |
| wsrep_provider_version | 3.31(rf216443) |
| wsrep_ready | ON |
+----------------------------------+-----------------------------------------+
71 rows in set (0.01 sec)
Here are the errors from the error log of the third node, which fails to join:
backup-v2|10.1.5.102:4444/xtrabackup_sst//1
2018-10-11T09:20:03.278884-00:00 2 [Note] WSREP: Auto Increment Offset/Increment re-align with cluster membership change (Offset: 1 -> 2) (Increment: 1 -> 3)
2018-10-11T09:20:03.278997-00:00 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-10-11T09:20:03.279155-00:00 2 [Note] WSREP: Assign initial position for certification: 69, protocol version: 4
2018-10-11T09:20:03.279626-00:00 0 [Note] WSREP: Service thread queue flushed.
2018-10-11T09:20:03.280052-00:00 2 [Note] WSREP: Check if state gap can be serviced using IST
2018-10-11T09:20:03.280145-00:00 2 [Note] WSREP: Local state seqno is undefined (-1)
2018-10-11T09:20:03.280445-00:00 2 [Note] WSREP: State gap can't be serviced using IST. Switching to SST
2018-10-11T09:20:03.280510-00:00 2 [Note] WSREP: Failed to prepare for incremental state transfer: Local state seqno is undefined: 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():549. IST will be unavailable.
2018-10-11T09:20:03.287673-00:00 0 [Note] WSREP: Member 1.0 (db-test-3.pd.local) requested state transfer from '*any*'. Selected 0.0 (db-test-2.pd.local)(SYNCED) as donor.
2018-10-11T09:20:03.287850-00:00 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 69)
2018-10-11T09:20:03.288073-00:00 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2018-10-11T09:20:03.288225-00:00 2 [Note] WSREP: GCache history reset: ac97f711-cad5-11e8-8f39-be9d0594cdb9:0 -> ac97f711-cad5-11e8-8f39-be9d0594cdb9:69
2018-10-11T09:20:38.988120-00:00 0 [Warning] WSREP: 0.0 (db-test-2.pd.local): State transfer to 1.0 (db-test-3.pd.local) failed: -32 (Broken pipe)
2018-10-11T09:20:38.988274-00:00 0 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():766: Will never receive state. Need to abort.
2018-10-11T09:20:38.988366-00:00 0 [Note] WSREP: gcomm: terminating thread
2018-10-11T09:20:38.988493-00:00 0 [Note] WSREP: gcomm: joining thread
2018-10-11T09:20:38.988942-00:00 0 [Note] WSREP: gcomm: closing backend
2018-10-11T09:20:38.995070-00:00 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(NON_PRIM,8d59ca0f,3)
memb {
d3167260,0
}
joined {
}
left {
}
partitioned {
8d59ca0f,0
e3def063,0
}
)
2018-10-11T09:20:38.995334-00:00 0 [Note] WSREP: Current view of cluster as seen by this node
view ((empty))
2018-10-11T09:20:38.996612-00:00 0 [Note] WSREP: gcomm: closed
2018-10-11T09:20:38.996837-00:00 0 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Terminated
2018-10-11T09:20:47.767946+00:00 WSREP_SST: [ERROR] Removing /var/lib/mysql//xtrabackup_galera_info file due to signal
2018-10-11T09:20:47.788109+00:00 WSREP_SST: [ERROR] Removing file due to signal
2018-10-11T09:20:47.808425+00:00 WSREP_SST: [ERROR] ******************* FATAL ERROR **********************
2018-10-11T09:20:47.818240+00:00 WSREP_SST: [ERROR] Error while getting data from donor node: exit codes: 143 143
2018-10-11T09:20:47.828411+00:00 WSREP_SST: [ERROR] ******************************************************
2018-10-11T09:20:47.840006+00:00 WSREP_SST: [ERROR] Cleanup after exit with status:32
Below are the errors from the node that was selected as the donor:
2018/10/11 09:20:38 socat[22418] E connect(5, AF=2 10.1.5.102:4444, 16): No route to host
2018-10-11T09:20:38.805798+00:00 WSREP_SST: [ERROR] ******************* FATAL ERROR **********************
2018-10-11T09:20:38.818683+00:00 WSREP_SST: [ERROR] Error while sending data to joiner node: exit codes: 0 1
2018-10-11T09:20:38.832059+00:00 WSREP_SST: [ERROR] ******************************************************
2018-10-11T09:20:38.846813+00:00 WSREP_SST: [ERROR] Cleanup after exit with status:32
2018-10-11T09:20:38.985060-00:00 0 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.1.5.102:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --mysqld-version '5.7.23-23-57' --binlog 'db-test-2-bin' --gtid 'ac97f711-cad5-11e8-8f39-be9d0594cdb9:69' : 32 (Broken pipe)
2018-10-11T09:20:38.985552-00:00 0 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '10.1.5.102:4444/xtrabackup_sst//1' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --mysqld-version '5.7.23-23-57' --binlog 'db-test-2-bin' --gtid 'ac97f711-cad5-11e8-8f39-be9d0594cdb9:69'
2018-10-11T09:20:38.990613-00:00 0 [Warning] WSREP: 0.0 (db-test-2.pd.local): State transfer to 1.0 (db-test-3.pd.local) failed: -32 (Broken pipe)
2018-10-11T09:20:38.990815-00:00 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 69)
2018-10-11T09:20:38.997784-00:00 0 [Note] WSREP: declaring e3def063 at tcp://10.1.5.100:4567 stable
2018-10-11T09:20:38.997807-00:00 0 [Note] WSREP: Member 0.0 (db-test-2.pd.local) synced with group.
2018-10-11T09:20:38.998230-00:00 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 69)
2018-10-11T09:20:38.998277-00:00 0 [Note] WSREP: forgetting d3167260 (tcp://10.1.5.102:4567)
2018-10-11T09:20:38.998806-00:00 13 [Note] WSREP: Synchronized with group, ready for connections
2018-10-11T09:20:38.999112-00:00 13 [Note] WSREP: Setting wsrep_ready to true
2018-10-11T09:20:38.999198-00:00 13 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-10-11T09:20:39.003491-00:00 0 [Note] WSREP: Node 8d59ca0f state primary
2018-10-11T09:20:39.005025-00:00 0 [Note] WSREP: Current view of cluster as seen by this node
view (view_id(PRIM,8d59ca0f,4)
memb {
8d59ca0f,0
e3def063,0
}
joined {
}
left {
}
partitioned {
d3167260,0
}
)
2018-10-11T09:20:39.005270-00:00 0 [Note] WSREP: Save the discovered primary-component to disk
2018-10-11T09:20:39.009691-00:00 0 [Note] WSREP: forgetting d3167260 (tcp://10.1.5.102:4567)
2018-10-11T09:20:39.010097-00:00 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2018-10-11T09:20:39.011037-00:00 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: eb0b1f21-cd36-11e8-8ac8-c60fb82759c9
2018-10-11T09:20:39.019171-00:00 0 [Note] WSREP: STATE EXCHANGE: sent state msg: eb0b1f21-cd36-11e8-8ac8-c60fb82759c9
2018-10-11T09:20:39.021665-00:00 0 [Note] WSREP: STATE EXCHANGE: got state msg: eb0b1f21-cd36-11e8-8ac8-c60fb82759c9 from 0 (db-test-2.pd.local)
2018-10-11T09:20:39.021786-00:00 0 [Note] WSREP: STATE EXCHANGE: got state msg: eb0b1f21-cd36-11e8-8ac8-c60fb82759c9 from 1 (db-test-1.pd.local)
2018-10-11T09:20:39.021861-00:00 0 [Note] WSREP: Quorum results:
version = 4,
component = PRIMARY,
conf_id = 3,
members = 2/2 (primary/total),
act_id = 69,
last_appl. = 0,
protocols = 0/9/3 (gcs/repl/appl),
group UUID = ac97f711-cad5-11e8-8f39-be9d0594cdb9
2018-10-11T09:20:39.021999-00:00 0 [Note] WSREP: Flow-control interval: [141, 141]
2018-10-11T09:20:39.022058-00:00 0 [Note] WSREP: Trying to continue unpaused monitor
2018-10-11T09:20:39.022774-00:00 17 [Note] WSREP: REPL Protocols: 9 (4, 2)
2018-10-11T09:20:39.023163-00:00 17 [Note] WSREP: New cluster view: global state: ac97f711-cad5-11e8-8f39-be9d0594cdb9:69, view# 4: Primary, number of nodes: 2, my index: 0, protocol version 3
2018-10-11T09:20:39.023209-00:00 17 [Note] WSREP: Setting wsrep_ready to true
2018-10-11T09:20:39.023256-00:00 17 [Note] WSREP: Auto Increment Offset/Increment re-align with cluster membership change (Offset: 1 -> 1) (Increment: 3 -> 2)
2018-10-11T09:20:39.023373-00:00 17 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-10-11T09:20:39.023540-00:00 17 [Note] WSREP: Assign initial position for certification: 69, protocol version: 4
2018-10-11T09:20:39.023832-00:00 0 [Note] WSREP: Service thread queue flushed.
2018-10-11T09:20:44.480289-00:00 0 [Note] WSREP: cleaning up d3167260 (tcp://10.1.5.102:4567)
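When I bootstrap the third node as its own cluster, it runs fine. But when I then stop the first two nodes of the other cluster and try to join them to the new cluster, they cannot join either. I can ping and telnet to the first two cluster nodes from the third node, and vice versa. I even tried stopping all nodes and bootstrapping the cluster from scratch, but that did not help.
What exactly is going on here?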
First of all, thank you for providing enough debugging information; not everyone does.
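Your SST (the full data copy to the new node) is failing. Specifically, socat on the donor fails with "No route to host" when it tries to connect to 10.1.5.102:4444, which tells you that the new host is not reachable from the donor you pasted, at least not on that port. This is not really a cluster configuration problem but an operating system or network problem: the port may be closed, a firewall may be running, or something else on the network is in the way. A successful ping alone does not rule this out, because a firewall can allow ICMP while rejecting TCP traffic to port 4444. Try pinging the other host from the donor, or run a test netcat on port 4444, to find where the connection breaks. Once the host is reachable, your SST should succeed and the node should join the cluster. Usually it is some silly mistake, such as a firewall blocking one of the ports in use, wrong permissions on the data directory, a wrong user, and so on.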
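The port check suggested above can also be scripted. The following is only a minimal sketch in Python, not part of PXC: the helper names `check_port` and `check_node` are made up for illustration, and the port list assumes the default PXC ports (3306 for MySQL, 4444 for SST, 4567 for group communication, 4568 for IST). Keep in mind that port 4444 normally has a listener on the joiner only while an SST is in progress, so "Connection refused" on 4444 outside of an SST is expected, whereas "No route to host" points at a firewall or routing problem.

```python
import socket

# Default ports assumed here: 3306 (MySQL), 4444 (SST),
# 4567 (group communication), 4568 (IST).
PXC_PORTS = (3306, 4444, 4567, 4568)


def check_port(host, port, timeout=3.0):
    """Attempt a TCP connection and return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "open"
    except OSError as exc:
        # "No route to host" usually means a firewall rejected the packet;
        # "Connection refused" means the host answered but nothing is listening.
        return False, exc.strerror or str(exc)


def check_node(host, ports=PXC_PORTS, timeout=3.0):
    """Check every port on one host and return {port: (ok, detail)}."""
    return {port: check_port(host, port, timeout) for port in ports}


# Example, run on the donor against the joiner from the logs:
#   for port, (ok, detail) in check_node("10.1.5.102").items():
#       print(port, "OK" if ok else "FAILED", detail)
```

Running it from the donor towards the joiner, and then in the opposite direction, shows which ports are blocked in which direction.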
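If this is a test setup, you can try changing the SST method to a different one to help with debugging (mysqldump, for example, only uses the MySQL port, so it is simpler).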