Questions asked by Nikita Kipriyanov - server

Nikita Kipriyanov

Asked: 2024-10-07 20:42:57 +0800 CST

PVE: LVM initializes using the wrong backing devices (the non-multipath components of a multipath device)

8

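I have a server running Proxmox VE 8.2 (based on Debian 12), fully updated, attached to a multipath SAN. We were allocated space there: ten devices of 2T each. Each device is seen 16 times (the number of paths), the paths are mapped to 10 multipath virtual devices, and those are collected into a single LVM volume group in which one logical volume was carved. This part works without problems.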

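It was initialized "on the fly" and ran fine until the host was rebooted. After that I noticed that whenever I run any LVM-related command (lvs, etc.) in the system shell, it emits warnings of the following form: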

WARNING: Device mismatch detected for <LV> which is accessing <list of devices like
 /dev/sdak, /dev/sdaz, /dev/sdbn, ...> instead of <list of the devices
 like /dev/mapper/oamdwhdg-01, /dev/mapper/oamdwhdg-02, ...>

These /dev/mapper/oamdwhdg-XX devices are indeed the multipath devices; for operational reasons they are named that way by the following in /etc/multipath.conf:

multipaths {
...
    multipath {
        wwid 36...
        alias oamdwhdg-04
    }
...
}

I.e., for whatever reason, the LV got mapped using the backing devices rather than the multipath virtual devices.

So I updated the filter in /etc/lvm/lvm.conf, and it now looks as follows:

global_filter=[ "a|/dev/sda3|", "a|/dev/mapper/oamdwhdg|", "r|.*|" ]

(I updated it rather than added it: previously it was global_filter=["r|/dev/zd.*|","r|/dev/rbd.*|"], with the comment "added by pve-manager to avoid scanning ZFS zvols and Ceph rbds". I believe my filter is much stricter than that one.)

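/dev/sda3 is the local disk on which PVE is installed. It holds the pve VG, which does not depend on the SAN, so it is the only /dev/sd... device that is not multipathed and is whitelisted.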

I tested this filter with vgscan, and it showed that both volume groups can be found.
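To make the intended effect of the filter concrete, here is a minimal Python sketch of how such a filter list is evaluated. It assumes the rules documented for lvm.conf: each entry is a (accept) or r (reject) followed by a delimited regex, the first matching entry decides, and a path matched by no entry is accepted. Python's re module stands in for LVM's own regex engine, only one path name per device is considered (real LVM also looks at a device's other names), and the helper names parse_filter and is_accepted are made up for illustration.

```python
import re

# The global_filter from the post, copied verbatim.
GLOBAL_FILTER = ["a|/dev/sda3|", "a|/dev/mapper/oamdwhdg|", "r|.*|"]


def parse_filter(entries):
    """Turn lvm.conf-style filter entries into (accept, compiled_regex) pairs."""
    rules = []
    for entry in entries:
        if len(entry) < 3 or entry[0] not in ("a", "r") or entry[-1] != entry[1]:
            raise ValueError(f"malformed filter entry: {entry!r}")
        # entry[1] is the delimiter; the regex sits between the two delimiters.
        rules.append((entry[0] == "a", re.compile(entry[2:-1])))
    return rules


def is_accepted(path, rules):
    """First matching rule decides; a path that matches nothing is accepted."""
    for accept, pattern in rules:
        if pattern.search(path):  # unanchored, so it may match anywhere in the path
            return accept
    return True


rules = parse_filter(GLOBAL_FILTER)
for path in ("/dev/sda3", "/dev/mapper/oamdwhdg-04", "/dev/sdak", "/dev/zd0"):
    verdict = "accepted" if is_accepted(path, rules) else "rejected"
    print(f"{path}: {verdict}")
```

Under these assumptions the raw path devices such as /dev/sdak are rejected and only /dev/sda3 and the oamdwhdg multipath aliases pass. The sketch says nothing about which device nodes exist at the moment LVM runs inside the initramfs.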

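Then I ran update-initramfs and rebooted. During boot, the server dropped into the emergency shell. However, when I pressed Ctrl+D there, it booted almost normally: the multipath VG was visible but not activated (as if vgchange -ay oamdwhdg had not been run). I then activated it by hand with that command without any problem, and it has been working normally since.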

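I suspect this happens because multipath is not properly initialized before LVM in the initramfs, so LVM tries to set up the mapping using the /dev/sdXX devices. But I don't understand why, after adding the filter, it drops into the emergency shell.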

Two (closely related) questions here: why does it drop into the emergency shell, i.e. what went wrong, and how do I make it work as intended?

Nikita Kipriyanov

Asked: 2019-12-04 22:53:12 +0800 CST

BareOS BackupCatalog job stuck as terminated in the Director, RunAfterJob is not run

1

This looks strange. I have been using Bacula, and now BareOS, for more than 10 years, but now one system shows strange behaviour and I don't know why or how to fix it.

When it runs the daily backups, everything works fine until it reaches the BackupCatalog job, which is configured to run after everything else.

The job appears to have terminated successfully (JobStatus=T in the list jobs table):

*list jobs
...
+-------+---------------+--------------+---------------------+------+-------+----------+-----------------+-----------+
| JobId | Name          | Client       | StartTime           | Type | Level | JobFiles | JobBytes        | JobStatus |
+-------+---------------+--------------+---------------------+------+-------+----------+-----------------+-----------+
...
| 5,475 | BackupCatalog | kantor-fd    | 2019-12-04 02:56:40 | B    | F     |       21 |      27,364,860 | T         |
+-------+---------------+--------------+---------------------+------+-------+----------+-----------------+-----------+

However, in the messages log file I don't see the usual summary for this last job. The log file ends like this:

19-Nov 02:32 kantor-dir JobId 5398: shell command: run BeforeJob "/usr/lib/bareos/scripts/make_catalog_backup.pl Kantor"
19-Nov 02:33 kantor-dir JobId 5398: Start Backup JobId 5398, Job=BackupCatalog.2019-11-18_23.10.00_10
19-Nov 02:33 kantor-dir JobId 5398: Using Device "FileStorage" to write.
19-Nov 02:33 kantor-sd JobId 5398: Volume "Kantor-2018-01-08_08:48:50" previously written, moving to end of data.
19-Nov 02:33 kantor-sd JobId 5398: Ready to append to end of Volume "Kantor-2018-01-08_08:48:50" size=4716094462
19-Nov 02:33 kantor-sd JobId 5398: Elapsed time=00:00:05, Transfer rate=5.663 M Bytes/second

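That's all. Note that the RunAfterJob script does not appear to have been executed. If I run it by hand, it works (the exported catalog database dump file gets deleted). However, that is not the only job of the RunAfterJob script.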

I expected the log to end with something like the following. All the other jobs have it:

19-Nov 02:32 kantor-dir JobId 5397: Bareos kantor-dir 16.2.6 (02Jun17):
  Build OS:               x86_64-pc-linux-gnu debian Debian GNU/Linux buster/sid
  JobId:                  5397
  Job:                    FTP.2019-11-18_23.05.00_09
...
  FD termination status:  OK
  SD termination status:  OK
  Termination:            Backup OK

19-Nov 02:32 kantor-dir JobId 5397: Begin pruning Jobs older than 1 month 10 days .
...

In addition, the director status looks strange:

*status dir
kantor-dir Version: 16.2.6 (02 June 2017) x86_64-pc-linux-gnu debian Debian GNU/Linux buster/sid
Daemon started 03-Dec-19 11:10. Jobs: run=4, running=1 mode=0 db=mysql
 Heap: heap=135,168 smbytes=222,459 max_bytes=236,758 bufs=543 max_bufs=594

Scheduled Jobs:
...
====

Running Jobs:
Console connected at 04-Dec-19 09:03
 JobId Level   Name                       Status
======================================================================
  5475 Full    BackupCatalog.2019-12-03_23.10.00_08 has terminated
====

Terminated Jobs:

 JobId  Level    Files      Bytes   Status   Finished        Name 
====================================================================
...
  5471  Incr      6,591    7.499 G  OK       03-Dec-19 23:15 termsrv
  5472  Incr        427    11.37 G  OK       03-Dec-19 23:44 1C
  5473  Incr          3    3.198 G  OK       04-Dec-19 02:56 Oracle
  5474  Incr      5,797    2.600 G  OK       04-Dec-19 02:56 FTP


Client Initiated Connections (waiting for jobs):
...
====

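I.e. the job above is listed under "Running Jobs", yet its status says it has terminated. It is not listed under "Terminated Jobs", as if the director still had something left to do.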
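The symptom can be detected mechanically. The following is a minimal Python sketch that scans saved status dir output for jobs that sit in the "Running Jobs" section while reporting "has terminated". The sample text is an excerpt of the output quoted in this post; the row layout is inferred from that one sample only, and the names running_jobs and JOB_ROW are illustrative rather than anything BareOS provides.

```python
import re

# Excerpt of the "status dir" output quoted in the post.
STATUS_DIR = """\
Running Jobs:
Console connected at 04-Dec-19 09:03
 JobId Level   Name                       Status
======================================================================
  5475 Full    BackupCatalog.2019-12-03_23.10.00_08 has terminated
====

Terminated Jobs:
"""

# JobId, Level, Name, then free-text status up to the end of the line.
JOB_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(.*\S)\s*$")


def running_jobs(status_text):
    """Return the job rows found between 'Running Jobs:' and the closing '====' line."""
    jobs, in_section = [], False
    for line in status_text.splitlines():
        if line.startswith("Running Jobs:"):
            in_section = True
        elif in_section and line.strip() == "====":
            break
        elif in_section:
            match = JOB_ROW.match(line)
            if match:
                job_id, level, name, status = match.groups()
                jobs.append({"jobid": int(job_id), "level": level,
                             "name": name, "status": status})
    return jobs


stuck = [job for job in running_jobs(STATUS_DIR) if job["status"] == "has terminated"]
for job in stuck:
    print(f"JobId {job['jobid']} ({job['name']}) is still listed as running")
```

A check like this only flags the condition; it does not explain why the director keeps the job in the running list.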

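It hung in this state for six hours. I also see something odd with the times: the StartTime in the table and in the log file differ by half an hour, although the system date and MySQL select NOW(); are in sync.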

After a director restart, the director status looks more reasonable:

Running Jobs:
Console connected at 04-Dec-19 09:06
No Jobs running.
====
No Terminated Jobs.

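This all started two weeks ago. If I leave it hanging, all subsequently scheduled jobs wait indefinitely for this stuck job, which means no backups get done.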

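I have a feeling this could be a problem with the job's RunAfterJob script, but it is the stock script as shipped, and if I run it by hand, it works. The job definition itself is also stock; the only modification is that I added compression=GZIP to the FileSet, but I do that every time and it has never caused any problem.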

What should I look for? How do I fix it?

Update:

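The problem has gone away, and I don't understand why. Backups have been working for at least two days, and nothing seems to get stuck.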

Nikita Kipriyanov

Asked: 2019-11-26 03:44:06 +0800 CST

Debian on HP Gen9: the latest hpssacli appears to be too old

0

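I installed a fresh Debian 10 Buster system on a server, an HPE DL360 Gen9. It has a P440ar adapter, which works with the "new" hpsa driver. As far as I remember, the RAID was configured with the built-in "pre-boot" GUI utility. All firmware was updated to the latest version, so I believe that utility is at the latest version too.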

Now I have to set up RAID status monitoring for a Zabbix server.

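Arrays on hpsa are managed with the hpssacli utility (the old hpacucli supports the cciss driver, which doesn't apply to me). I have a wrapper script, run from the Zabbix agent, that can discover and query the status of every array in the system. The script simply calls hpssacli, then parses and adjusts its output for Zabbix. I have been doing it this way for years.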

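On this freshly set-up system I ran into trouble. I tried HPE's own SDR MCP repository, but it doesn't support buster yet (HPE is very slow to update its repositories), so I just found the latest hpssacli deb and installed it. It appears to be hpssacli-2.40-13.0_amd64.deb, dated 2016-06-28 17:55.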

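However, when I try to run it, it says that my array was configured with a newer version of the software and that my version is too old to manage it: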

root@vh3:~# wget https://downloads.linux.hpe.com/SDR/repo/mcp/pool/non-free/hpssacli-2.40-13.0_amd64.deb
--2019-11-25 14:13:38--  https://downloads.linux.hpe.com/SDR/repo/mcp/pool/non-free/hpssacli-2.40-13.0_amd64.deb
Распознаётся downloads.linux.hpe.com (downloads.linux.hpe.com)… 15.249.152.85
Подключение к downloads.linux.hpe.com (downloads.linux.hpe.com)|15.249.152.85|:443... соединение установлено.
HTTP-запрос отправлен. Ожидание ответа… 200 OK
Длина: 8237034 (7,9M)
Сохранение в: «hpssacli-2.40-13.0_amd64.deb»

hpssacli-2.40-13.0_amd64.deb                    100%[====================================================================================================>]   7,85M   394KB/s    за 22s     

2019-11-25 14:14:01 (363 KB/s) - «hpssacli-2.40-13.0_amd64.deb» сохранён [8237034/8237034]

root@vh3:~# ls
hpssacli-2.40-13.0_amd64.deb
root@vh3:~# dpkg -i hpssacli-2.40-13.0_amd64.deb 
Выбор ранее не выбранного пакета hpssacli.
(Чтение базы данных … на данный момент установлено 57199 файлов и каталогов.)
Подготовка к распаковке hpssacli-2.40-13.0_amd64.deb …
Распаковывается hpssacli (2.40-13.0) …
Настраивается пакет hpssacli (2.40-13.0) …
Обрабатываются триггеры для man-db (2.8.5-2) …
root@vh3:~# hpssacli ctrl all show

Smart Array P440ar in Slot 0 (Embedded) 

APPLICATION UPGRADE REQUIRED: This controller has been configured with a more
                              recent version of software.
                              To prevent data loss, configuration changes to
                              this controller are not allowed.
                              Please upgrade to the latest version to be able
                              to continue to configure this controller.
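For illustration, here is a minimal Python sketch of the kind of parsing a monitoring wrapper could apply to this output. It is not my actual wrapper script: the line format is inferred solely from the single "hpssacli ctrl all show" sample above, the banner is assumed to refer to the controller line that precedes it, and parse_controllers is a hypothetical helper name.

```python
import re

# Excerpt of the "hpssacli ctrl all show" output quoted above.
CTRL_ALL_SHOW = """
Smart Array P440ar in Slot 0 (Embedded)

APPLICATION UPGRADE REQUIRED: This controller has been configured with a more
                              recent version of software.
"""

# Matches lines such as "Smart Array P440ar in Slot 0 (Embedded)".
CONTROLLER = re.compile(r"^(?P<model>\S.*?) in Slot (?P<slot>\d+)\b")
UPGRADE_BANNER = "APPLICATION UPGRADE REQUIRED"


def parse_controllers(output):
    """Collect controllers and flag those followed by the upgrade banner."""
    controllers = []
    for line in output.splitlines():
        match = CONTROLLER.match(line)
        if match:
            controllers.append({"model": match["model"],
                                "slot": int(match["slot"]),
                                "upgrade_required": False})
        elif UPGRADE_BANNER in line and controllers:
            # Assumption: the banner belongs to the controller listed just before it.
            controllers[-1]["upgrade_required"] = True
    return controllers


for ctrl in parse_controllers(CTRL_ALL_SHOW):
    note = " (configuration locked: newer tool required)" if ctrl["upgrade_required"] else ""
    print(f"slot {ctrl['slot']}: {ctrl['model']}{note}")
```

Exposing the banner as a separate flag lets monitoring keep reporting status while still making it visible that configuration changes are blocked.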

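While this does not seem to prevent my script from monitoring the controller status, I would also like to be able to manage the controller from the OS, so that in the future I can add drives and create more arrays without stopping the system.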

I also tried the hwraid.le-vert.net repo, but it has no hpssacli utility (it only has hpacucli, at least for buster).

What should I do? Where can I find this "latest" version, and how do I work out which version I need?

Nikita Kipriyanov

Asked: 2019-10-16 01:16:09 +0800 CST

Linux: the ZFS analogue of pvmove - how do I move data off a vdev?

8

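I need to expand the disk capacity of a server. The pool was started with 1 TB disks and later extended with 2 TB disks. There is more than 1 TB of free space, i.e. all the data would easily fit on the 2 TB part, but at the moment it is allocated on the 1 TB disks.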

In fact, these "disks" are hardware (PERC) RAID1 arrays, built on a pair of 1 TB disks and a pair of 2 TB disks respectively.

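I want to replace the 1 TB disks with 3 TB ones. Replacing them "one by one" is not really an option. In principle, this hardware RAID can have its disks replaced one at a time and the array then grown to fill the disks. However, I want to avoid that path, because it leaves fairly long periods during which disk redundancy is lost.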

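I want to move all the data off the 1 TB array, then remove it and replace it with the 3 TB one. Everything must be done online, with the system running and zero downtime.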

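With LVM, the procedure would be quite simple and well understood:

pvmove all allocated data off the 1 TB physical disk (the VD, in RAID terms)
vgreduce to remove that PV from the VG, and pvremove to wipe the PV metadata
delete the 1 TB array with megacli (PERC is a rebranded LSI/Avago MegaRAID SAS)
physically replace the disks
assemble a new array with megacli
create a new PV and add it to the VG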

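This is a routine I have done before. Every step is well understood; at each step I have full control over what is happening, and if something goes wrong I always know how to proceed, and so on.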
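The LVM part of that routine can be written down as a dry-run plan. The Python sketch below only builds and prints the command lines; it executes nothing. The volume group name vg0 and the device names /dev/sdb and /dev/sdc are hypothetical placeholders, and the megacli steps are left as a comment because their exact invocation depends on the controller layout.

```python
import shlex


def lvm_swap_plan(vg, old_pv, new_pv):
    """Return the LVM commands of the procedure as argv lists (nothing is executed)."""
    return [
        ["pvmove", old_pv],        # move all allocated extents off the old PV
        ["vgreduce", vg, old_pv],  # drop the now-empty PV from the VG
        ["pvremove", old_pv],      # wipe the PV label and metadata
        # Here: delete the old array, swap the disks, build the new array (megacli).
        ["pvcreate", new_pv],      # initialise the new array as a PV
        ["vgextend", vg, new_pv],  # add the new PV to the VG
    ]


plan = lvm_swap_plan("vg0", "/dev/sdb", "/dev/sdc")
for argv in plan:
    print(shlex.join(argv))
```

Printing the plan first keeps each step reviewable before anything touches the disks, which is the property I want to keep when doing the same thing with ZFS.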

How do I carry out the same procedure with ZFS, safely and with full awareness of what each step does?

In case it matters:

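The server is a Dell PowerEdge R730
The OS is Proxmox VE 6.0, which is based on Debian 10.1. It was installed from the PVE ISO image, i.e. not converted from a Debian installation.
The system does not depend on this pool, since it runs from a separate set of SSDs assembled into another pool
The pool hosts some VM virtual disks that do not need high performance. However, the data is valuable and I cannot tolerate losing it, so the procedure should be clear and well understood
The system is in constant use, but the users will tolerate a performance hit while the data is being migrated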

Nikita Kipriyanov

Asked: 2016-04-05 06:01:03 +0800 CST

Cannot rejoin a MariaDB Galera Cluster node to the cluster after a software update. wsrep_* options in my.cnf seem to be ignored

0

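I have been running a Galera cluster for a while: MariaDB version 10.0.20, Galera version 25.3.10, all on Gentoo. The cluster worked fine, and nodes were restarted several times for maintenance and so on. When I started mysql on any node, it loaded the wsrep_provider library, which initiated SST, and the node eventually joined the cluster.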

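Now the time has come for a big upgrade. I chose the "rolling upgrade" path described at http://galeracluster.com/documentation-webpages/upgrading.html#id1, took the selected node down and updated the software there.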

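For the impatient, a short summary of what follows: mariadb starts up and ignores all wsrep_* configuration options, and if I set them by hand in the running instance, it appears to hang.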

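Two questions:
- Why does it silently ignore my config options? How do I make it respect them?
- How do I start replication, i.e. how do I join the node to the cluster?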

Everything got upgraded. Now I have MariaDB 10.1.13 and Galera 25.3.15. The configuration in /etc/mysql/my.cnf was merged. It now looks like this (mysqld section only):

[mysqld]
character-set-server            = utf8
user                            = mysql
port                            = 3306
socket                          = /var/run/mysqld/mysqld.sock
pid-file                        = /var/run/mysqld/mysqld.pid
log-error                       = /var/log/mysql/mysqld.err
basedir                         = /usr
datadir                         = /var/lib/mysql
skip-external-locking
key_buffer_size                 = 16M
max_allowed_packet              = 4M
table_open_cache                = 400
sort_buffer_size                = 512K
net_buffer_length               = 16K
read_buffer_size                = 256K
read_rnd_buffer_size            = 512K
myisam_sort_buffer_size         = 8M
lc_messages_dir                 = /usr/share/mysql
lc_messages                     = en_US

log-bin
server-id                       = 1

tmpdir                          = /tmp/

innodb_buffer_pool_size = 128M
innodb_data_file_path = ibdata1:10M:autoextend:max:128M
innodb_log_file_size = 48M
innodb_log_buffer_size = 8M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_file_per_table

binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="www_cluster"
wsrep_cluster_address="gcomm://192.168.4.28,192.168.5.28,192.168.5.29"
wsrep_sst_method=rsync

I triple-checked this file to make sure my wsrep_* options are in the [mysqld] section. These options are exactly the same as before the upgrade.
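That manual check can also be scripted. Below is a minimal Python sketch that lists which wsrep_* options sit in the [mysqld] section, using a shortened excerpt of the file above. It assumes the file can be read as plain INI text, which real my.cnf files with !include directives would break, and wsrep_options is a hypothetical helper name. The extra look at wsrep_on is my own addition: the MariaDB 10.1 documentation lists that option as defaulting to OFF, but whether that explains the behaviour described here is not established in this post.

```python
import configparser

# Shortened excerpt of the [mysqld] section quoted above.
MY_CNF = """\
[mysqld]
user                            = mysql
skip-external-locking
log-bin
binlog_format=ROW
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="www_cluster"
wsrep_cluster_address="gcomm://192.168.4.28,192.168.5.28,192.168.5.29"
wsrep_sst_method=rsync
"""


def wsrep_options(cnf_text, section="mysqld"):
    """Return the wsrep_* options of one section, with surrounding quotes removed."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # flags such as skip-external-locking carry no value
        interpolation=None,    # do not treat % specially
        delimiters=("=",),     # ':' appears inside values such as gcomm://...
        strict=False,
    )
    parser.read_string(cnf_text)
    return {key: (value or "").strip('"')
            for key, value in parser[section].items()
            if key.startswith("wsrep_")}


options = wsrep_options(MY_CNF)
for key, value in sorted(options.items()):
    print(f"{key} = {value}")
if "wsrep_on" not in options:
    print("note: wsrep_on is not set in [mysqld]")
```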

However, the mysql service now starts as a plain standalone server rather than as a cluster node:

2016-04-04 16:30:46 139950286550848 [Warning] No argument was provided to --log-bin and neither --log-basename or --log-bin-index where used;  This may cause repliction to break when this server acts as a master and has its hostname changed! Please use '--log-basename=www1' or '--log-bin=mysqld-bin' to avoid this problem.
2016-04-04 16:30:46 139950286550848 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2016-04-04 16:30:46 139950286550848 [Note] InnoDB: The InnoDB memory heap is disabled
2016-04-04 16:30:46 139950286550848 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-04-04 16:30:46 139950286550848 [Note] InnoDB: Memory barrier is not used
2016-04-04 16:30:46 139950286550848 [Note] InnoDB: Compressed tables use zlib 1.2.8
2016-04-04 16:30:46 139950286550848 [Note] InnoDB: Using Linux native AIO
2016-04-04 16:30:46 139950286550848 [Note] InnoDB: Using generic crc32 instructions
2016-04-04 16:30:46 139950286550848 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2016-04-04 16:30:46 139950286550848 [Note] InnoDB: Completed initialization of buffer pool
2016-04-04 16:30:46 139950286550848 [Note] InnoDB: Highest supported file format is Barracuda.
2016-04-04 16:30:47 139950286550848 [Note] InnoDB: 128 rollback segment(s) are active.
2016-04-04 16:30:47 139950286550848 [Note] InnoDB: Waiting for purge to start
2016-04-04 16:30:47 139950286550848 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.28-76.1 started; log sequence number 116616176
2016-04-04 16:30:47 139949849761536 [Note] InnoDB: Dumping buffer pool(s) not yet started
2016-04-04 16:30:47 139950286550848 [Note] Plugin 'FEEDBACK' is disabled.
2016-04-04 16:30:47 139950286550848 [Note] Server socket created on IP: '0.0.0.0'.
2016-04-04 16:30:47 139950286550848 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.13-MariaDB'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  Source distribution

If I query the wsrep_* status variables, they show that no provider is loaded:

MariaDB [(none)]> SHOW STATUS LIKE 'wsrep%';
+--------------------------+----------------------+
| Variable_name            | Value                |
+--------------------------+----------------------+
| wsrep_cluster_conf_id    | 18446744073709551615 |
| wsrep_cluster_size       | 0                    |
| wsrep_cluster_state_uuid |                      |
| wsrep_cluster_status     | Disconnected         |
| wsrep_connected          | OFF                  |
| wsrep_local_bf_aborts    | 0                    |
| wsrep_local_index        | 18446744073709551615 |
| wsrep_provider_name      |                      |
| wsrep_provider_vendor    |                      |
| wsrep_provider_version   |                      |
| wsrep_ready              | OFF                  |
| wsrep_thread_count       | 0                    |
+--------------------------+----------------------+
12 rows in set (0.00 sec)

It looks like it does not respect the wsrep_provider and wsrep_cluster_address options (and, I believe, all the other wsrep_* options).
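The same conclusion can be drawn programmatically from saved client output. This is a minimal Python sketch that parses the ASCII table printed by the mysql client into a dictionary and checks the provider fields; the sample holds a few rows copied from the output above, and parse_status is an illustrative helper, not part of any MariaDB tooling.

```python
# A few rows of the SHOW STATUS LIKE 'wsrep%' output quoted above.
STATUS_TABLE = """\
+--------------------------+----------------------+
| Variable_name            | Value                |
+--------------------------+----------------------+
| wsrep_cluster_size       | 0                    |
| wsrep_cluster_status     | Disconnected         |
| wsrep_connected          | OFF                  |
| wsrep_provider_name      |                      |
| wsrep_ready              | OFF                  |
+--------------------------+----------------------+
"""


def parse_status(table_text):
    """Parse a two-column mysql client table into {variable: value}."""
    status = {}
    for line in table_text.splitlines():
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Variable_name":
            status[cells[0]] = cells[1]
    return status


status = parse_status(STATUS_TABLE)
provider_loaded = bool(status.get("wsrep_provider_name"))
print("provider loaded:", provider_loaded)
print("wsrep_ready:", status["wsrep_ready"])
```

An empty wsrep_provider_name is what the check keys on; values that themselves contain a | character would need a stricter parser.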

I then tried setting them globally by hand in the running mariadb instance:

MariaDB [(none)]> set global wsrep_provider='/usr/lib/galera/libgalera_smm.so';
Query OK, 0 rows affected (0.02 sec)
MariaDB [(none)]> SHOW STATUS LIKE 'wsrep%';
+------------------------------+-----------------------------------+
| Variable_name                | Value                             |
+------------------------------+-----------------------------------+
| wsrep_apply_oooe             | 0.000000                          |
| wsrep_apply_oool             | 0.000000                          |
| wsrep_apply_window           | 0.000000                          |
| wsrep_causal_reads           | 0                                 |
| wsrep_cert_deps_distance     | 0.000000                          |
| wsrep_cert_index_size        | 0                                 |
| wsrep_cert_interval          | 0.000000                          |
| wsrep_cluster_conf_id        | 18446744073709551615              |
| wsrep_cluster_size           | 0                                 |
| wsrep_cluster_state_uuid     |                                   |
| wsrep_cluster_status         | Disconnected                      |
| wsrep_commit_oooe            | 0.000000                          |
| wsrep_commit_oool            | 0.000000                          |
| wsrep_commit_window          | 0.000000                          |
| wsrep_connected              | OFF                               |
| wsrep_flow_control_paused    | 0.000000                          |
| wsrep_flow_control_paused_ns | 0                                 |
| wsrep_flow_control_recv      | 0                                 |
| wsrep_flow_control_sent      | 0                                 |
| wsrep_incoming_addresses     |                                   |
| wsrep_last_committed         | 18446744073709551615              |
| wsrep_local_bf_aborts        | 0                                 |
| wsrep_local_cached_downto    | 18446744073709551615              |
| wsrep_local_cert_failures    | 0                                 |
| wsrep_local_commits          | 0                                 |
| wsrep_local_index            | 18446744073709551615              |
| wsrep_local_recv_queue       | 0                                 |
| wsrep_local_recv_queue_avg   | 0.000000                          |
| wsrep_local_recv_queue_max   | 0                                 |
| wsrep_local_recv_queue_min   | 0                                 |
| wsrep_local_replays          | 0                                 |
| wsrep_local_send_queue       | 0                                 |
| wsrep_local_send_queue_avg   | 0.000000                          |
| wsrep_local_send_queue_max   | 0                                 |
| wsrep_local_send_queue_min   | 0                                 |
| wsrep_local_state            | 0                                 |
| wsrep_local_state_comment    | Initialized                       |
| wsrep_local_state_uuid       |                                   |
| wsrep_protocol_version       | 18446744073709551615              |
| wsrep_provider_name          | Galera                            |
| wsrep_provider_vendor        | Codership Oy <info@codership.com> |
| wsrep_provider_version       | 3.15(r8459459)                    |
| wsrep_ready                  | OFF                               |
| wsrep_received               | 0                                 |
| wsrep_received_bytes         | 0                                 |
| wsrep_repl_data_bytes        | 0                                 |
| wsrep_repl_keys              | 0                                 |
| wsrep_repl_keys_bytes        | 0                                 |
| wsrep_repl_other_bytes       | 0                                 |
| wsrep_replicated             | 0                                 |
| wsrep_replicated_bytes       | 0                                 |
| wsrep_thread_count           | 0                                 |
+------------------------------+-----------------------------------+
52 rows in set (0.00 sec)

The corresponding log entries:

2016-04-04 16:45:19 139950170823424 [Note] WSREP: Stop replication
2016-04-04 16:45:19 139950170823424 [Note] WSREP: Provider was not loaded, in stop replication
2016-04-04 16:45:19 139950170823424 [Note] WSREP: Initial position: 913d310a-9360-11e4-9d31-922a1e5d98fe:63026
2016-04-04 16:45:19 139950170823424 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2016-04-04 16:45:19 139950170823424 [Note] WSREP: wsrep_load(): Galera 3.15(r8459459) by Codership Oy <info@codership.com> loaded successfully.
2016-04-04 16:45:19 139950170823424 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2016-04-04 16:45:19 139950170823424 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2016-04-04 16:45:19 139950170823424 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.4.28; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false;
2016-04-04 16:45:19 139949719164672 [Note] WSREP: Service thread queue flushed.
2016-04-04 16:45:19 139950170823424 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1

Then I set the cluster address:

MariaDB [(none)]> set global wsrep_cluster_address='gcomm://192.168.4.28,192.168.5.28,192.168.5.29';    
Query OK, 0 rows affected (3.55 sec)

MariaDB [(none)]> SHOW STATUS LIKE 'wsrep%';
+------------------------------+-------------------------------------------------------+
| Variable_name                | Value                                                 |
+------------------------------+-------------------------------------------------------+
| wsrep_apply_oooe             | 0.000000                                              |
| wsrep_apply_oool             | 0.000000                                              |
| wsrep_apply_window           | 0.000000                                              |
| wsrep_causal_reads           | 0                                                     |
| wsrep_cert_deps_distance     | 0.000000                                              |
| wsrep_cert_index_size        | 0                                                     |
| wsrep_cert_interval          | 0.000000                                              |
| wsrep_cluster_conf_id        | 471                                                   |
| wsrep_cluster_size           | 3                                                     |
| wsrep_cluster_state_uuid     | 913d310a-9360-11e4-9d31-922a1e5d98fe                  |
| wsrep_cluster_status         | Primary                                               |
| wsrep_commit_oooe            | 0.000000                                              |
| wsrep_commit_oool            | 0.000000                                              |
| wsrep_commit_window          | 0.000000                                              |
| wsrep_connected              | ON                                                    |
| wsrep_evs_delayed            |                                                       |
| wsrep_evs_evict_list         |                                                       |
| wsrep_evs_repl_latency       | 0.0456547/0.049131/0.0560754/0.00491047/3             |
| wsrep_evs_state              | OPERATIONAL                                           |
| wsrep_flow_control_paused    | 0.000000                                              |
| wsrep_flow_control_paused_ns | 0                                                     |
| wsrep_flow_control_recv      | 0                                                     |
| wsrep_flow_control_sent      | 0                                                     |
| wsrep_gcomm_uuid             | fd416235-fa6b-11e5-867c-72011d51ce8d                  |
| wsrep_incoming_addresses     | 192.168.5.28:3306,192.168.5.29:3306,192.168.4.28:3306 |
| wsrep_last_committed         | 18446744073709551615                                  |
| wsrep_local_bf_aborts        | 0                                                     |
| wsrep_local_cached_downto    | 18446744073709551615                                  |
| wsrep_local_cert_failures    | 0                                                     |
| wsrep_local_commits          | 0                                                     |
| wsrep_local_index            | 2                                                     |
| wsrep_local_recv_queue       | 0                                                     |
| wsrep_local_recv_queue_avg   | 0.000000                                              |
| wsrep_local_recv_queue_max   | 1                                                     |
| wsrep_local_recv_queue_min   | 0                                                     |
| wsrep_local_replays          | 0                                                     |
| wsrep_local_send_queue       | 0                                                     |
| wsrep_local_send_queue_avg   | 0.000000                                              |
| wsrep_local_send_queue_max   | 1                                                     |
| wsrep_local_send_queue_min   | 0                                                     |
| wsrep_local_state            | 1                                                     |
| wsrep_local_state_comment    | Joining: receiving State Transfer                     |
| wsrep_local_state_uuid       |                                                       |
| wsrep_protocol_version       | 7                                                     |
| wsrep_provider_name          | Galera                                                |
| wsrep_provider_vendor        | Codership Oy <info@codership.com>                     |
| wsrep_provider_version       | 3.15(r8459459)                                        |
| wsrep_ready                  | OFF                                                   |
| wsrep_received               | 1                                                     |
| wsrep_received_bytes         | 266                                                   |
| wsrep_repl_data_bytes        | 0                                                     |
| wsrep_repl_keys              | 0                                                     |
| wsrep_repl_keys_bytes        | 0                                                     |
| wsrep_repl_other_bytes       | 0                                                     |
| wsrep_replicated             | 0                                                     |
| wsrep_replicated_bytes       | 0                                                     |
| wsrep_thread_count           | 2                                                     |
+------------------------------+-------------------------------------------------------+
57 rows in set (0.00 sec)

The respective log entries:

2016-04-04 16:45:19 139949719164672 [Note] WSREP: Service thread queue flushed.
2016-04-04 16:45:19 139950170823424 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2016-04-04 16:49:00 139950170520320 [Note] WSREP: Stop replication
2016-04-04 16:49:02 139950170520320 [Note] WSREP: Start replication
2016-04-04 16:49:02 139950170520320 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2016-04-04 16:49:02 139950170520320 [Note] WSREP: protonet asio version 0
2016-04-04 16:49:02 139950170520320 [Note] WSREP: Using CRC-32C for message checksums.
2016-04-04 16:49:02 139950170520320 [Note] WSREP: backend: asio
2016-04-04 16:49:02 139950170520320 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2016-04-04 16:49:02 139950170520320 [Note] WSREP: restore pc from disk failed
2016-04-04 16:49:02 139950170520320 [Note] WSREP: GMCast version 0
2016-04-04 16:49:02 139950170520320 [Note] WSREP: (fd416235, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-04-04 16:49:02 139950170520320 [Note] WSREP: (fd416235, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-04-04 16:49:02 139950170520320 [Note] WSREP: EVS version 0
2016-04-04 16:49:02 139950170520320 [Note] WSREP: gcomm: connecting to group 'www_cluster', peer '192.168.4.28:,192.168.5.28:,192.168.5.29:'
2016-04-04 16:49:02 139950170520320 [Warning] WSREP: (fd416235, 'tcp://0.0.0.0:4567') address 'tcp://192.168.4.28:4567' points to own listening address, blacklisting
2016-04-04 16:49:02 139950170520320 [Note] WSREP: (fd416235, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2016-04-04 16:49:02 139950170520320 [Note] WSREP: declaring 4c4f4779 at tcp://192.168.5.28:4567 stable
2016-04-04 16:49:02 139950170520320 [Note] WSREP: declaring b75760e5 at tcp://192.168.5.29:4567 stable
2016-04-04 16:49:02 139950170520320 [Note] WSREP: Node 4c4f4779 state prim
2016-04-04 16:49:02 139950170520320 [Note] WSREP: view(view_id(PRIM,4c4f4779,710) memb {
        4c4f4779,0
        b75760e5,0
        fd416235,0
} joined {
} left {
} partitioned {
})
2016-04-04 16:49:02 139950170520320 [Note] WSREP: save pc into disk
2016-04-04 16:49:03 139950170520320 [Note] WSREP: gcomm: connected
2016-04-04 16:49:03 139950170520320 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-04-04 16:49:03 139950170520320 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-04-04 16:49:03 139950170520320 [Note] WSREP: Opened channel 'www_cluster'
2016-04-04 16:49:03 139948892088064 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2016-04-04 16:49:03 139948892088064 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2016-04-04 16:49:03 139948892088064 [Note] WSREP: STATE EXCHANGE: sent state msg: feddde16-fa6b-11e5-86f6-fb86e3a56ec8
2016-04-04 16:49:03 139948892088064 [Note] WSREP: STATE EXCHANGE: got state msg: feddde16-fa6b-11e5-86f6-fb86e3a56ec8 from 0 (www2)
2016-04-04 16:49:03 139948892088064 [Note] WSREP: STATE EXCHANGE: got state msg: feddde16-fa6b-11e5-86f6-fb86e3a56ec8 from 1 (galera)
2016-04-04 16:49:03 139949710771968 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.55477S), skipping check
2016-04-04 16:49:03 139948892088064 [Note] WSREP: STATE EXCHANGE: got state msg: feddde16-fa6b-11e5-86f6-fb86e3a56ec8 from 2 ()
2016-04-04 16:49:03 139948892088064 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 470,
        members    = 2/3 (joined/total),
        act_id     = 63180,
        last_appl. = -1,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 913d310a-9360-11e4-9d31-922a1e5d98fe
2016-04-04 16:49:03 139948892088064 [Note] WSREP: Flow-control interval: [28, 28]
2016-04-04 16:49:03 139948892088064 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 63180)
2016-04-04 16:49:03 139950170217216 [Note] WSREP: State transfer required:
        Group state: 913d310a-9360-11e4-9d31-922a1e5d98fe:63180
        Local state: 00000000-0000-0000-0000-000000000000:-1
2016-04-04 16:49:03 139950170217216 [Note] WSREP: New cluster view: global state: 913d310a-9360-11e4-9d31-922a1e5d98fe:63180, view# 471: Primary, number of nodes: 3, my index: 2, protocol version 3
2016-04-04 16:49:03 139950170217216 [Warning] WSREP: Gap in state sequence. Need state transfer.
2016-04-04 16:49:03 139948883695360 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.4.28' --datadir '/var/lib/mysql/'  --defaults-file '/etc/mysql/my.cnf'  --parent '6067' --binlog 'mysqld-bin' '
2016-04-04 16:49:03 139950170217216 [Note] WSREP: Prepared SST request: rsync|192.168.4.28:4444/rsync_sst
2016-04-04 16:49:03 139950170217216 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-04-04 16:49:03 139950170217216 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-04-04 16:49:03 139949719164672 [Note] WSREP: Service thread queue flushed.
2016-04-04 16:49:03 139950170217216 [Note] WSREP: Assign initial position for certification: 63180, protocol version: 3
2016-04-04 16:49:03 139949719164672 [Note] WSREP: Service thread queue flushed.
2016-04-04 16:49:03 139950170217216 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (913d310a-9360-11e4-9d31-922a1e5d98fe): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2016-04-04 16:49:03 139948892088064 [Note] WSREP: Member 2.0 () requested state transfer from '*any*'. Selected 0.0 (www2)(SYNCED) as donor.
2016-04-04 16:49:03 139948892088064 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 63180)
2016-04-04 16:49:03 139950170217216 [Note] WSREP: Requesting state transfer: success, donor: 0
2016-04-04 16:49:05 139949710771968 [Note] WSREP: (fd416235, 'tcp://0.0.0.0:4567') turning message relay requesting off

Now the node just keeps sitting in "Joining: receiving State Transfer".
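To see where the joiner stopped, the state changes can be pulled out of the log. The Python sketch below extracts the "Shifting X -> Y" lines; the sample contains the three such lines from the joiner log above, and state_transitions is a hypothetical helper. On a node that completes its join I would expect further shifts to JOINED and SYNCED after JOINER, which is an expectation based on Galera's usual state names and not something shown in this log.

```python
import re

# The three "Shifting" lines from the joiner log quoted above.
JOINER_LOG = """\
2016-04-04 16:49:03 139950170520320 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-04-04 16:49:03 139948892088064 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 63180)
2016-04-04 16:49:03 139948892088064 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 63180)
"""

SHIFT = re.compile(r"WSREP: Shifting (\S+) -> (\S+) \(TO: (-?\d+)\)")


def state_transitions(log_text):
    """Return (old_state, new_state, seqno) for every state shift in the log."""
    return [(m.group(1), m.group(2), int(m.group(3)))
            for m in SHIFT.finditer(log_text)]


transitions = state_transitions(JOINER_LOG)
last_state = transitions[-1][1] if transitions else None
if transitions:
    path = [transitions[0][0]] + [new for _, new, _ in transitions]
    print(" -> ".join(path))
print("last state reached:", last_state)
```

Here the last state reached is JOINER, which matches the "Joining: receiving State Transfer" status that the node keeps reporting.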

However, the selected donor node (www2) says it has completed the state transfer:

160404 16:49:04 [Note] WSREP: (4c4f4779, 'tcp://0.0.0.0:4567') address 'tcp://192.168.5.28:4567' pointing to uuid 4c4f4779 is blacklisted, skipping
160404 16:49:04 [Note] WSREP: (4c4f4779, 'tcp://0.0.0.0:4567') address 'tcp://192.168.5.28:4567' pointing to uuid 4c4f4779 is blacklisted, skipping
160404 16:49:04 [Note] WSREP: (4c4f4779, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
160404 16:49:04 [Note] WSREP: (4c4f4779, 'tcp://0.0.0.0:4567') address 'tcp://192.168.5.28:4567' pointing to uuid 4c4f4779 is blacklisted, skipping
160404 16:49:04 [Note] WSREP: (4c4f4779, 'tcp://0.0.0.0:4567') address 'tcp://192.168.5.28:4567' pointing to uuid 4c4f4779 is blacklisted, skipping
160404 16:49:04 [Note] WSREP: declaring b75760e5 at tcp://192.168.5.29:4567 stable
160404 16:49:04 [Note] WSREP: declaring fd416235 at tcp://192.168.4.28:4567 stable
160404 16:49:04 [Note] WSREP: Node 4c4f4779 state prim
160404 16:49:04 [Note] WSREP: view(view_id(PRIM,4c4f4779,710) memb {
    4c4f4779,0
    b75760e5,0
    fd416235,0
} joined {
} left {
} partitioned {
})
160404 16:49:04 [Note] WSREP: save pc into disk
160404 16:49:04 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
160404 16:49:04 [Note] WSREP: STATE_EXCHANGE: sent state UUID: feddde16-fa6b-11e5-86f6-fb86e3a56ec8
160404 16:49:04 [Note] WSREP: STATE EXCHANGE: sent state msg: feddde16-fa6b-11e5-86f6-fb86e3a56ec8
160404 16:49:04 [Note] WSREP: STATE EXCHANGE: got state msg: feddde16-fa6b-11e5-86f6-fb86e3a56ec8 from 0 (www2)
160404 16:49:04 [Note] WSREP: STATE EXCHANGE: got state msg: feddde16-fa6b-11e5-86f6-fb86e3a56ec8 from 1 (galera)
160404 16:49:05 [Note] WSREP: STATE EXCHANGE: got state msg: feddde16-fa6b-11e5-86f6-fb86e3a56ec8 from 2 ()
160404 16:49:05 [Note] WSREP: Quorum results:
    version    = 3,
    component  = PRIMARY,
    conf_id    = 470,
    members    = 2/3 (joined/total),
    act_id     = 63180,
    last_appl. = 63152,
    protocols  = 0/7/3 (gcs/repl/appl),
    group UUID = 913d310a-9360-11e4-9d31-922a1e5d98fe
160404 16:49:05 [Note] WSREP: Flow-control interval: [28, 28]
160404 16:49:05 [Note] WSREP: New cluster view: global state: 913d310a-9360-11e4-9d31-922a1e5d98fe:63180, view# 471: Primary, number of nodes: 3, my index: 0, protocol version 3
160404 16:49:05 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
160404 16:49:05 [Note] WSREP: REPL Protocols: 7 (3, 2)
160404 16:49:05 [Note] WSREP: Service thread queue flushed.
160404 16:49:05 [Note] WSREP: Assign initial position for certification: 63180, protocol version: 3
160404 16:49:05 [Note] WSREP: Service thread queue flushed.
160404 16:49:05 [Note] WSREP: Member 2.0 () requested state transfer from '*any*'. Selected 0.0 (www2)(SYNCED) as donor.
160404 16:49:05 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 63180)
160404 16:49:06 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
160404 16:49:06 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '192.168.4.28:4444/rsync_sst' --auth '(null)' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --defaults-group-suffix ''  --binlog 'mysqld-bin' --gtid '913d310a-9360-11e4-9d31-922a1e5d98fe:63180''
160404 16:49:06 [Note] WSREP: sst_donor_thread signaled with 0
160404 16:49:06 [Note] WSREP: Flushing tables for SST...
160404 16:49:06 [Note] WSREP: Provider paused at 913d310a-9360-11e4-9d31-922a1e5d98fe:63180 (9555)
160404 16:49:06 [Note] WSREP: Tables flushed.
WSREP_SST: [INFO] Preparing binlog files for transfer: (20160404 16:49:07.005)
mysqld-bin.000033
160404 16:49:07 [Note] WSREP: (4c4f4779, 'tcp://0.0.0.0:4567') turning message relay requesting off
160404 16:51:39 [Note] WSREP: resuming provider at 9555
160404 16:51:39 [Note] WSREP: Provider resumed.
160404 16:51:40 [Note] WSREP: 0.0 (www2): State transfer to 2.0 () complete.
160404 16:51:40 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 63180)
160404 16:51:40 [Note] WSREP: Member 0.0 (www2) synced with group.
160404 16:51:40 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 63180)
160404 16:51:40 [Note] WSREP: Synchronized with group, ready for connections
160404 16:51:40 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
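
To compare what each side thinks happened, the state transitions can be pulled out of both logs with a short script. This is only a sketch: the regular expression and the helper names (`state_transitions`, `final_state`) are illustrative and not part of Galera, and the sample strings are just the "Shifting" lines copied from the two logs above.

```python
import re

# Matches Galera state transitions such as
# "WSREP: Shifting PRIMARY -> JOINER (TO: 63180)".
SHIFT_RE = re.compile(r"WSREP: Shifting (\S+) -> (\S+) \(TO: (\d+)\)")


def state_transitions(log_text):
    """Return (old_state, new_state, seqno) tuples in the order they were logged."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in SHIFT_RE.finditer(log_text)
    ]


def final_state(log_text):
    """Return the last state the node shifted into, or None if no shift was logged."""
    transitions = state_transitions(log_text)
    return transitions[-1][1] if transitions else None


# "Shifting" lines copied from the joiner log above.
joiner_log = """\
2016-04-04 16:49:03 139948892088064 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 63180)
"""

# "Shifting" lines copied from the donor (www2) log above.
donor_log = """\
160404 16:49:05 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 63180)
160404 16:51:40 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 63180)
160404 16:51:40 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 63180)
"""

print("joiner:", final_state(joiner_log))
print("donor: ", final_state(donor_log))
```

With these excerpts the donor returns to SYNCED, while the joiner's last logged transition is still into JOINER, which matches the symptom described here.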

What should I do now? How can I continue with the node update?
