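I recently replaced an HDD that backs a brick in a GlusterFS cluster. I was able to map the new HDD back to the brick and get GlusterFS to replicate onto it successfully.
However, one part of the process did not work for me. I tried to run the "heal" command on the volume with the replaced brick, but kept running into this problem: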
$ gluster volume heal nova
Locking failed on c551316f-7218-44cf-bb36-befe3d3df34b. Please check log file for details.
Locking failed on ae62c691-ae55-4c99-8364-697cb3562668. Please check log file for details.
Locking failed on cb78ba3c-256f-4413-ae7e-aa5c0e9872b5. Please check log file for details.
Locking failed on 79a6a414-3569-482c-929f-b7c5da16d05e. Please check log file for details.
Locking failed on 5f43c6a4-0ccd-424a-ae56-0492ec64feeb. Please check log file for details.
Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
Locking failed on 6c0111fc-b5e7-4350-8be5-3179a1a5187e. Please check log file for details.
Locking failed on 88fcb687-47aa-4921-b3ab-d6c3b330b32a. Please check log file for details.
Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.
The logs basically echo the output above, specifically:
$ tail etc-glusterfs-glusterd.vol.log
[2015-08-03 23:08:03.289249] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.289258] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on c7416c1f-494b-4a95-b48d-6c766c7bce14. Please check log file for details.
[2015-08-03 23:08:03.289279] W [rpc-clnt-ping.c:199:rpc_clnt_ping_cbk] 0-management: socket or ib related error
[2015-08-03 23:08:03.289827] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.289858] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on d73de03a-0f66-4619-89ef-b73c9bbd800e. Please check log file for details.
[2015-08-03 23:08:03.290509] E [glusterd-syncop.c:562:_gd_syncop_mgmt_lock_cbk] 0-management: Could not find peer with ID d827a48e-627f-0000-0a00-000000000000
[2015-08-03 23:08:03.290529] E [glusterd-syncop.c:111:gd_collate_errors] 0-: Locking failed on 4a780f57-37e4-4f1b-9c34-187a0c7e44bf. Please check log file for details.
[2015-08-03 23:08:03.290597] E [glusterd-syncop.c:1804:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2015-08-03 23:07:03.351603] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2015-08-03 23:07:03.351644] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
These other logs also received messages while I was attempting the above:
$ ls -ltr
-rw------- 1 root root 41704 Aug 2 12:07 glfsheal-nova.log
-rw------- 1 root root 15986 Aug 2 12:07 cmd_history.log-20150802
-rw------- 1 root root 290359 Aug 3 19:07 var-lib-nova-instances.log
-rw------- 1 root root 221829 Aug 3 19:07 glustershd.log
-rw------- 1 root root 195472 Aug 3 19:07 nfs.log
-rw------- 1 root root 61831116 Aug 3 19:07 var-lib-nova-mnt-92ef2ec54fd18595ed18d8e6027a1b3d.log
-rw------- 1 root root 3504 Aug 3 19:08 cmd_history.log
-rw------- 1 root root 89294 Aug 3 19:08 cli.log
-rw------- 1 root root 136421 Aug 3 19:08 etc-glusterfs-glusterd.vol.log
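Looking through them, it is not clear whether any of them relate to this particular problem.
With the setup above, I initially thought I could only run the heal command from the master node of the GlusterFS cluster, but it turned out that my real problem was that the 11 nodes in the cluster were running 2 different versions of GlusterFS.
Once I realized this, I updated all nodes to the latest version of GlusterFS (3.7.3) and was able to run the heal from any node, as one would expect.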
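
For anyone hitting the same thing, a rough way to spot a mixed-version cluster might look like the sketch below. The hostnames in `NODES` and the helper names `collect_versions` and `check_versions` are hypothetical. It also assumes passwordless SSH to each node and that the first line of `glusterfs --version` carries the version number as its second field.

```shell
#!/bin/sh
# Assumption: NODES lists hypothetical hostnames; replace with your own peers.
NODES="gluster01 gluster02 gluster03"

# Print "host version" for each node. Assumes passwordless SSH and that the
# first line of `glusterfs --version` has the version as its second field.
collect_versions() {
    for host in $NODES; do
        ver=$(ssh "$host" glusterfs --version | head -n 1 | awk '{print $2}')
        printf '%s %s\n' "$host" "$ver"
    done
}

# Read "host version" lines on stdin; return 0 if every node reports the
# same version, 1 if more than one distinct version is seen.
check_versions() {
    distinct=$(awk 'NF >= 2 {print $2}' | sort -u)
    count=$(printf '%s\n' "$distinct" | awk 'NF {n++} END {print n+0}')
    if [ "$count" -gt 1 ]; then
        echo "version mismatch:"
        printf '%s\n' "$distinct"
        return 1
    fi
    echo "all nodes run: $distinct"
}

# Usage (not run automatically):
#   collect_versions | check_versions
```

If the check reports more than one version, bringing every node to the same release before retrying `gluster volume heal` is what resolved the locking failures in my case.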