Questions [pacemaker] (unix)

Marko Todoric
Asked: 2019-06-28 23:14:45 +0800 CST

PCS stonith (fencing) kills both nodes of a two-node cluster if the first node goes down


I have configured a two-node cluster of physical servers (HP ProLiant DL560 Gen8) with pcs (corosync/pacemaker/pcsd). I have also configured fencing on them using fence_ilo4.

If one node goes down (by DOWN I mean a power loss), something strange happens: the second node dies too. Fencing kills it as well, leaving both servers offline.

How can I correct this behavior?

What I tried was adding "wait_for_all: 0" and "expected_votes: 1" to the quorum section of /etc/corosync/corosync.conf, but it still kills the node.
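
For reference, a two-node votequorum section in /etc/corosync/corosync.conf typically looks like the sketch below (assuming corosync 2.x; the asker's actual file may differ). Note that the options interact: two_node: 1 forces expected_votes to 2 and implicitly re-enables wait_for_all, so an explicit wait_for_all: 0 is needed to let a lone node keep quorum.

quorum {
    provider: corosync_votequorum
    # two-node mode: quorum is retained with a single surviving vote
    two_node: 1
    # overridden here so a node coming up alone does not block
    # waiting for its (possibly dead) peer
    wait_for_all: 0
}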

At some point I will have to perform maintenance on one of the servers and shut it down. If that happens, I do not want the other node to go down as well.
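
For planned maintenance, the usual pcs workflow is to drain the node and stop the cluster stack on it cleanly, rather than powering it off while corosync is still running (a sketch using this cluster's node names):

# move all resources off the node, then stop cluster services on it
pcs cluster standby kvm_aquila-01
pcs cluster stop kvm_aquila-01
# ... do the maintenance, then rejoin:
pcs cluster start kvm_aquila-01
pcs cluster unstandby kvm_aquila-01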

Here is some output:

[root@kvm_aquila-02 ~]# pcs quorum status
Quorum information
------------------
Date:             Fri Jun 28 09:07:18 2019
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          2
Ring ID:          1/284
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           1  
Flags:            2Node Quorate 

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
         1          1         NR kvm_aquila-01
         2          1         NR kvm_aquila-02 (local)


[root@kvm_aquila-02 ~]# pcs config show
Cluster Name: kvm_aquila
Corosync Nodes:
 kvm_aquila-01 kvm_aquila-02
Pacemaker Nodes:
 kvm_aquila-01 kvm_aquila-02

Resources:
 Clone: dlm-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: dlm (class=ocf provider=pacemaker type=controld)
   Operations: monitor interval=30s on-fail=fence (dlm-monitor-interval-30s)
               start interval=0s timeout=90 (dlm-start-interval-0s)
               stop interval=0s timeout=100 (dlm-stop-interval-0s)
 Clone: clvmd-clone
  Meta Attrs: interleave=true ordered=true 
  Resource: clvmd (class=ocf provider=heartbeat type=clvm)
   Operations: monitor interval=30s on-fail=fence (clvmd-monitor-interval-30s)
               start interval=0s timeout=90s (clvmd-start-interval-0s)
               stop interval=0s timeout=90s (clvmd-stop-interval-0s)
 Group: test_VPS
  Resource: test (class=ocf provider=heartbeat type=VirtualDomain)
   Attributes: config=/shared/xml/test.xml hypervisor=qemu:///system migration_transport=ssh
   Meta Attrs: allow-migrate=true is-managed=true priority=100 target-role=Started 
   Utilization: cpu=4 hv_memory=4096
   Operations: migrate_from interval=0 timeout=120s (test-migrate_from-interval-0)
               migrate_to interval=0 timeout=120 (test-migrate_to-interval-0)
               monitor interval=10 timeout=30 (test-monitor-interval-10)
               start interval=0s timeout=300s (test-start-interval-0s)
               stop interval=0s timeout=300s (test-stop-interval-0s)

Stonith Devices:
 Resource: kvm_aquila-01 (class=stonith type=fence_ilo4)
  Attributes: ipaddr=10.0.4.39 login=fencing passwd=0ToleranciJa pcmk_host_list="kvm_aquila-01 kvm_aquila-02"
  Operations: monitor interval=60s (kvm_aquila-01-monitor-interval-60s)
 Resource: kvm_aquila-02 (class=stonith type=fence_ilo4)
  Attributes: ipaddr=10.0.4.49 login=fencing passwd=0ToleranciJa pcmk_host_list="kvm_aquila-01 kvm_aquila-02"
  Operations: monitor interval=60s (kvm_aquila-02-monitor-interval-60s)
Fencing Levels:

Location Constraints:
Ordering Constraints:
  start dlm-clone then start clvmd-clone (kind:Mandatory)
Colocation Constraints:
  clvmd-clone with dlm-clone (score:INFINITY)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: kvm_aquila
 dc-version: 1.1.19-8.el7_6.4-c3c624ea3d
 have-watchdog: false
 last-lrm-refresh: 1561619537
 no-quorum-policy: ignore
 stonith-enabled: true

Quorum:
  Options:
    wait_for_all: 0

[root@kvm_aquila-02 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: kvm_aquila-02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
 Last updated: Fri Jun 28 09:14:11 2019
 Last change: Thu Jun 27 16:23:44 2019 by root via cibadmin on kvm_aquila-01
 2 nodes configured
 7 resources configured

PCSD Status:
  kvm_aquila-02: Online
  kvm_aquila-01: Online
[root@kvm_aquila-02 ~]# pcs status
Cluster name: kvm_aquila
Stack: corosync
Current DC: kvm_aquila-02 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Fri Jun 28 09:14:31 2019
Last change: Thu Jun 27 16:23:44 2019 by root via cibadmin on kvm_aquila-01

2 nodes configured
7 resources configured

Online: [ kvm_aquila-01 kvm_aquila-02 ]

Full list of resources:

 kvm_aquila-01  (stonith:fence_ilo4):   Started kvm_aquila-01
 kvm_aquila-02  (stonith:fence_ilo4):   Started kvm_aquila-02
 Clone Set: dlm-clone [dlm]
     Started: [ kvm_aquila-01 kvm_aquila-02 ]
 Clone Set: clvmd-clone [clvmd]
     Started: [ kvm_aquila-01 kvm_aquila-02 ]
 Resource Group: test_VPS
     test   (ocf::heartbeat:VirtualDomain): Started kvm_aquila-01

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
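
One thing worth noting in the configuration above: both fence_ilo4 devices list both hosts in pcmk_host_list, so each node believes either device can fence either host, and in a two-node split both nodes can shoot at the same time. A commonly cited mitigation (a sketch, not a verified fix for this cluster, and it assumes each device's ipaddr points at the iLO of the same-named node) is to restrict each device to the one host it actually powers and to add a random fencing delay so the two nodes cannot kill each other simultaneously:

# illustrative values; pcmk_delay_max staggers the fence actions
pcs stonith update kvm_aquila-01 pcmk_host_list="kvm_aquila-01" pcmk_delay_max=15
pcs stonith update kvm_aquila-02 pcmk_host_list="kvm_aquila-02" pcmk_delay_max=15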
pacemaker cluster
  • 1 answer
  • 2113 views
Morphinz
Asked: 2018-09-26 02:03:43 +0800 CST

All servers panic with a core dump when I start corosync


I upgraded my servers, then started the corosync service on them one by one. I started it on 3 servers first and waited 5 minutes; then I started corosync on the next 4 servers, and at that moment all 7 servers crashed. I have been using corosync for 5 years. I was running:

Kernel: 4.14.32-1-lts
Corosync 2.4.2-1 
Pacemaker 1.1.18-1

I have never seen this before. I guess something is badly broken in the new corosync version! After the upgrade I am running:

Kernel: 4.14.70-1-lts
Corosync 2.4.4-3 
Pacemaker 2.0.0-1


Here is my corosync.conf: https://paste.ubuntu.com/p/7KCq8pHKn3/

Can you tell me how to find the cause of the problem?

Sep 25 08:56:03 SRV-2 corosync[29089]:   [TOTEM ] A new membership (10.10.112.10:56) was formed. Members joined: 7
Sep 25 08:56:03 SRV-2 corosync[29089]:   [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 corosync[29089]:   [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 corosync[29089]:   [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 corosync[29089]:   [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 corosync[29089]:   [QUORUM] Members[7]: 1 2 3 4 5 6 7
Sep 25 08:56:03 SRV-2 corosync[29089]:   [MAIN  ] Completed service synchronization, ready to provide service.
Sep 25 08:56:03 SRV-2 corosync[29089]:   [VOTEQ ] Waiting for all cluster members. Current votes: 7 expected_votes: 28
Sep 25 08:56:03 SRV-2 systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Sep 25 08:56:03 SRV-2 systemd[1]: Started Process Core Dump (PID 43798/UID 0).
Sep 25 08:56:03 SRV-2 systemd[1]: corosync.service: Main process exited, code=dumped, status=11/SEGV
Sep 25 08:56:03 SRV-2 systemd[1]: corosync.service: Failed with result 'core-dump'.
Sep 25 08:56:03 SRV-2 kernel: watchdog: watchdog0: watchdog did not stop!
Sep 25 08:56:03 SRV-2 systemd-coredump[43799]: Process 29089 (corosync) of user 0 dumped core.

                                                      Stack trace of thread 29089:
                                                      #0  0x0000000000000000 n/a (n/a)
Write failed: Broken pipe


coredumpctl info
           PID: 23658 (corosync)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Mon 2018-09-24 09:50:58 +03 (1 day 3h ago)
  Command Line: corosync
    Executable: /usr/bin/corosync
 Control Group: /system.slice/corosync.service
          Unit: corosync.service
         Slice: system.slice
       Boot ID: 79d67a83f83c4804be6ded8e6bd5f54d
    Machine ID: 9b1ca27d3f4746c6bcfcdb93b83f3d45
      Hostname: SRV-1
       Storage: /var/lib/systemd/coredump/core.corosync.0.79d67a83f83c4804be6ded8e6bd5f54d.23658.153777185>
       Message: Process 23658 (corosync) of user 0 dumped core.

                Stack trace of thread 23658:
                #0  0x0000000000000000 n/a (n/a)

           PID: 5164 (corosync)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Tue 2018-09-25 08:56:03 +03 (4h 9min ago)
  Command Line: corosync
    Executable: /usr/bin/corosync
 Control Group: /system.slice/corosync.service
          Unit: corosync.service
         Slice: system.slice
       Boot ID: 2f49ec6cdcc144f0a8eb712bbfbd7203
    Machine ID: 9b1ca27d3f4746c6bcfcdb93b83f3d45
      Hostname: SRV-1
       Storage: /var/lib/systemd/coredump/core.corosync.0.2f49ec6cdcc144f0a8eb712bbfbd7203.5164.1537854963>
       Message: Process 5164 (corosync) of user 0 dumped core.

                Stack trace of thread 5164:
                #0  0x0000000000000000 n/a (n/a)

I cannot find any more logs, so I cannot dig further into the problem.
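
Since systemd-coredump did capture the crashes, one way to dig deeper (a sketch; it assumes gdb and the corosync debug-symbol packages are installed) is to open the newest core directly in gdb and take a full backtrace:

# load the most recent corosync core dump into gdb
coredumpctl gdb corosync

# then at the gdb prompt:
(gdb) bt full

With symbols installed, the backtrace should point at the failing function instead of the bare 0x0000000000000000 frame shown above.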

linux pacemaker
  • 1 answer
  • 221 views
yesOrMaybeWhatever
Asked: 2018-03-31 03:38:46 +0800 CST

Data synchronization in an active-passive Pacemaker cluster


I am planning to use pacemaker and corosync, managed with pcs, to provide HA for our two frontend servers, which serve a Perl application, in an active-passive cluster. I have configured resources for the service IP and the Apache application daemon; what I still need to do is configure replication, meaning that the whole /opt/application directory gets synced from whichever node is active to the passive node. What is the right way to implement this? Both frontends are ESX VMs.

Thanks!
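
One pattern often used for this (a sketch; it assumes /opt/application can live on its own block device, and the resource and device names below are hypothetical) is to replicate the volume with DRBD and let pacemaker promote and mount it on whichever node is active:

# DRBD resource plus a master/slave wrapper (pcs 0.9.x syntax)
pcs resource create app_drbd ocf:linbit:drbd drbd_resource=application \
    op monitor interval=29s role=Master op monitor interval=31s role=Slave
pcs resource master app_drbd_ms app_drbd \
    master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
# mount the replicated device only where DRBD is promoted
pcs resource create app_fs ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/opt/application fstype=ext4
pcs constraint colocation add app_fs with master app_drbd_ms INFINITY
pcs constraint order promote app_drbd_ms then start app_fs

A cruder alternative is a periodic rsync from the active to the passive node, at the cost of replication lag.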

synchronization pacemaker
  • 1 answer
  • 1649 views
krs4keshara
Asked: 2017-12-31 21:04:53 +0800 CST

Corosync/Pacemaker: pcs equivalent of a crm command


I know that for high availability with corosync and pacemaker, the crm utility has long been the preferred way to manage a cluster. It is now deprecated, and we are told to use the pcs utility instead, which is supposed to do everything we used to do with crm.

What I am stuck on now is finding the pcs equivalent of this command:

crm node attribute <node_name> set <resource_name> <some_parameters>.

If I try pcs node, no such command set is available.
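
For reference, on the pcs releases shipped with CentOS 7 the node-attribute functionality lives under pcs property; later pcs versions also added a pcs node attribute subcommand. A sketch with hypothetical attribute names:

# pcs 0.9.x counterpart of "crm node attribute <node> set <name> <value>"
pcs property set --node node1 my_attribute=my_value

# newer pcs releases also accept:
pcs node attribute node1 my_attribute=my_value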

I am on CentOS 7.2, running a Percona master-slave cluster.

pacemaker corosync
  • 1 answer
  • 1518 views
