Jim's questions

Jim
Asked: 2019-12-18 23:16:18 +0800 CST

Pacemaker won't start because of a duplicate node, but the duplicate node can't be removed because Pacemaker won't start

  • 0

OK! Really new to Pacemaker/Corosync, like 1-day new.

Software: Ubuntu 18.04 LTS and the versions associated with that distro:

Pacemaker: 1.1.18

Corosync: 2.4.3

I accidentally removed the nodes from my entire test cluster (3 nodes).

When I tried to restore everything using the pcsd GUI, it failed because the nodes had been "purged". Cool.

So: I grabbed the last copy of corosync.conf from the "primary" node, copied it to the other two nodes, fixed the bindnetaddr in each node's conf, and ran pcs cluster start on the "primary" node.
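
Roughly what that looked like, from memory (the exact invocations are a best guess):

scp /etc/corosync/corosync.conf postgres-sb:/etc/corosync/
scp /etc/corosync/corosync.conf region-ctrl-2:/etc/corosync/
# after editing bindnetaddr in each copy, on the "primary" node:
pcs cluster start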

One of the nodes failed to start. I looked at the pacemaker status on that node and got the following exception:

Dec 18 06:33:56 region-ctrl-2 crmd[1049]:     crit: Nodes 1084777441 and 2 share the same name 'region-ctrl-2': shutting down

I tried running crm_node -R --force 1084777441 on the machine where pacemaker wouldn't start, but of course pacemaker wasn't running, so I got a crmd: connection refused (111) error. So I ran the same command on one of the healthy nodes instead; it reported no errors, but the node never goes away, and pacemaker on the affected machine keeps failing with the same error.

So I decided to tear down the whole cluster yet again. I purged all the packages from the machines, reinstalled everything fresh, copied corosync.conf back to the machines and fixed it up, and re-created the cluster. I got the exact same bloody error.

So this node named 1084777441 is not a machine I created; it's one the cluster created for me. Earlier that day I had realized I was using IP addresses in corosync.conf rather than names. I fixed /etc/hosts on the machines and removed the IP addresses from the corosync config, which is how I inadvertently deleted my entire cluster in the first place (I had removed the nodes that were listed as IP addresses).

Here is my corosync.conf:

totem {
    version: 2
    cluster_name: maas-cluster
    token: 3000
    token_retransmits_before_loss_const: 10
    clear_node_high_bit: yes
    crypto_cipher: none
    crypto_hash: none

    interface {
        ringnumber: 0
        bindnetaddr: 192.168.99.225
        mcastport: 5405
        ttl: 1
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on

    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 3
    two_node: 1
}

nodelist {
    node {
        ring0_addr: postgres-sb
        nodeid: 3
    }

    node {
        ring0_addr: region-ctrl-2
        nodeid: 2
    }

    node {
        ring0_addr: region-ctrl-1
        nodeid: 1
    }
}

The only thing in this conf that differs between the nodes is bindnetaddr.

There seems to be a chicken-and-egg problem here, unless there is some way I don't know of to remove a node from a flat-file or sqlite database somewhere, or some other, more authoritative way to remove a node from the cluster.
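
The closest thing I've found to such a "flat-file database" is the local state kept by Pacemaker and Corosync. A guess at wiping it on the broken node, with the services stopped (paths are those of the stock Ubuntu 18.04 packages):

systemctl stop pacemaker corosync
rm -f /var/lib/pacemaker/cib/cib*    # cached CIB copies and signatures
rm -f /var/lib/corosync/*            # ring/quorum state
systemctl start corosync pacemaker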

ADDITIONAL

I have made sure the hostnames in /etc/hosts match on every machine. I forgot to mention that.

127.0.0.1 localhost
127.0.1.1 postgres
192.168.99.224 postgres-sb
192.168.99.223 region-ctrl-1
192.168.99.225 region-ctrl-2

192.168.7.224 postgres-sb
192.168.7.223 region-ctrl-1
192.168.7.225 region-ctrl-2


# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

I decided to try starting from scratch. I apt remove --purge'd corosync*, pacemaker*, crmsh, and pcs. I rm -rf'd /etc/corosync. I kept a copy of corosync.conf from each machine.
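
Concretely, on each machine (globs quoted so the shell passes them to apt):

apt remove --purge 'corosync*' 'pacemaker*' crmsh pcs
rm -rf /etc/corosync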

I reinstalled everything on every machine. I copied the saved corosync.conf back to /etc/corosync/ and restarted corosync on all the machines.

I still get the same error. This has to be a bug in one of these components!

So it appears that crm_get_peer fails to recognize that the host named region-ctrl-2 is the one assigned nodeid 2 in corosync.conf, and the node instead gets the auto-assigned ID 1084777441. That makes no sense to me. The machine's hostname is region-ctrl-2, set in /etc/hostname and /etc/hosts, and confirmed via uname -n. corosync.conf explicitly assigns an ID to the machine named region-ctrl-2, yet something evidently fails to recognize that assignment from corosync and assigns the non-random ID 1084777441 to the host instead. How do I fix this?
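
One observation that may matter: 1084777441 is exactly this host's ring address, 192.168.99.225, read as a 32-bit integer with the high bit cleared (which is what clear_node_high_bit: yes does). That suggests corosync is auto-generating the nodeid from the interface address rather than matching the nodelist entry:

$ printf '%d\n' $(( (192<<24 | 168<<16 | 99<<8 | 225) & 0x7FFFFFFF ))
1084777441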

LOGS

    info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
    info: get_cluster_type:     Detected an active 'corosync' cluster
    info: qb_ipcs_us_publish:   server name: pacemakerd
    info: pcmk__ipc_is_authentic_process_active:        Could not connect to lrmd IPC: Connection refused
    info: pcmk__ipc_is_authentic_process_active:        Could not connect to cib_ro IPC: Connection refused
    info: pcmk__ipc_is_authentic_process_active:        Could not connect to crmd IPC: Connection refused
    info: pcmk__ipc_is_authentic_process_active:        Could not connect to attrd IPC: Connection refused
    info: pcmk__ipc_is_authentic_process_active:        Could not connect to pengine IPC: Connection refused
    info: pcmk__ipc_is_authentic_process_active:        Could not connect to stonith-ng IPC: Connection refused
    info: corosync_node_name:   Unable to get node name for nodeid 1084777441
  notice: get_node_name:        Could not obtain a node name for corosync nodeid 1084777441
    info: crm_get_peer: Created entry ea4ec23e-e676-4798-9b8b-00af39d3bb3d/0x5555f74984d0 for node (null)/1084777441 (1 total)
    info: crm_get_peer: Node 1084777441 has uuid 1084777441
    info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777441] - corosync-cpg is now online
  notice: cluster_connect_quorum:       Quorum acquired
    info: crm_get_peer: Created entry 882c0feb-d546-44b7-955f-4c8a844a0db1/0x5555f7499fd0 for node postgres-sb/3 (2 total)
    info: crm_get_peer: Node 3 is now known as postgres-sb
    info: crm_get_peer: Node 3 has uuid 3
    info: crm_get_peer: Created entry 4e6a6b1e-d687-4527-bffc-5d701ff60a66/0x5555f749a6f0 for node region-ctrl-2/2 (3 total)
    info: crm_get_peer: Node 2 is now known as region-ctrl-2
    info: crm_get_peer: Node 2 has uuid 2
    info: crm_get_peer: Created entry 5532a3cc-2577-4764-b9ee-770d437ccec0/0x5555f749a0a0 for node region-ctrl-1/1 (4 total)
    info: crm_get_peer: Node 1 is now known as region-ctrl-1
    info: crm_get_peer: Node 1 has uuid 1
    info: corosync_node_name:   Unable to get node name for nodeid 1084777441
  notice: get_node_name:        Defaulting to uname -n for the local corosync node name
 warning: crm_find_peer:        Node 1084777441 and 2 share the same name: 'region-ctrl-2'
    info: crm_get_peer: Node 1084777441 is now known as region-ctrl-2
    info: pcmk_quorum_notification:     Quorum retained | membership=32 members=3
  notice: crm_update_peer_state_iter:   Node region-ctrl-1 state is now member | nodeid=1 previous=unknown source=pcmk_quorum_notification
  notice: crm_update_peer_state_iter:   Node postgres-sb state is now member | nodeid=3 previous=unknown source=pcmk_quorum_notification
  notice: crm_update_peer_state_iter:   Node region-ctrl-2 state is now member | nodeid=1084777441 previous=unknown source=pcmk_quorum_notification
    info: crm_reap_unseen_nodes:        State of node region-ctrl-2[2] is still unknown
    info: pcmk_cpg_membership:  Node 1084777441 joined group pacemakerd (counter=0.0, pid=32765, unchecked for rivals)
    info: pcmk_cpg_membership:  Node 1 still member of group pacemakerd (peer=region-ctrl-1:900, counter=0.0, at least once)
    info: crm_update_peer_proc: pcmk_cpg_membership: Node region-ctrl-1[1] - corosync-cpg is now online
    info: pcmk_cpg_membership:  Node 3 still member of group pacemakerd (peer=postgres-sb:976, counter=0.1, at least once)
    info: crm_update_peer_proc: pcmk_cpg_membership: Node postgres-sb[3] - corosync-cpg is now online
    info: pcmk_cpg_membership:  Node 1084777441 still member of group pacemakerd (peer=region-ctrl-2:3016, counter=0.2, at least once)
  pengine:     info: crm_log_init:      Changed active directory to /var/lib/pacemaker/cores
     lrmd:     info: crm_log_init:      Changed active directory to /var/lib/pacemaker/cores
     lrmd:     info: qb_ipcs_us_publish:        server name: lrmd
  pengine:     info: qb_ipcs_us_publish:        server name: pengine
      cib:     info: crm_log_init:      Changed active directory to /var/lib/pacemaker/cores
    attrd:     info: crm_log_init:      Changed active directory to /var/lib/pacemaker/cores
    attrd:     info: get_cluster_type:  Verifying cluster type: 'corosync'
    attrd:     info: get_cluster_type:  Assuming an active 'corosync' cluster
    info: crm_log_init: Changed active directory to /var/lib/pacemaker/cores
    attrd:   notice: crm_cluster_connect:       Connecting to cluster infrastructure: corosync
      cib:     info: get_cluster_type:  Verifying cluster type: 'corosync'
      cib:     info: get_cluster_type:  Assuming an active 'corosync' cluster
    info: get_cluster_type:     Verifying cluster type: 'corosync'
    info: get_cluster_type:     Assuming an active 'corosync' cluster
  notice: crm_cluster_connect:  Connecting to cluster infrastructure: corosync
    attrd:     info: corosync_node_name:        Unable to get node name for nodeid 1084777441
      cib:     info: validate_with_relaxng:     Creating RNG parser context
     crmd:     info: crm_log_init:      Changed active directory to /var/lib/pacemaker/cores
     crmd:     info: get_cluster_type:  Verifying cluster type: 'corosync'
     crmd:     info: get_cluster_type:  Assuming an active 'corosync' cluster
     crmd:     info: do_log:    Input I_STARTUP received in state S_STARTING from crmd_init
    attrd:   notice: get_node_name:     Could not obtain a node name for corosync nodeid 1084777441
    attrd:     info: crm_get_peer:      Created entry af5c62c9-21c5-4428-9504-ea72a92de7eb/0x560870420e90 for node (null)/1084777441 (1 total)
    attrd:     info: crm_get_peer:      Node 1084777441 has uuid 1084777441
    attrd:     info: crm_update_peer_proc:      cluster_connect_cpg: Node (null)[1084777441] - corosync-cpg is now online
    attrd:   notice: crm_update_peer_state_iter:        Node (null) state is now member | nodeid=1084777441 previous=unknown source=crm_update_peer_proc
    attrd:     info: init_cs_connection_once:   Connection to 'corosync': established
    info: corosync_node_name:   Unable to get node name for nodeid 1084777441
  notice: get_node_name:        Could not obtain a node name for corosync nodeid 1084777441
    info: crm_get_peer: Created entry 5bcb51ae-0015-4652-b036-b92cf4f1d990/0x55f583634700 for node (null)/1084777441 (1 total)
    info: crm_get_peer: Node 1084777441 has uuid 1084777441
    info: crm_update_peer_proc: cluster_connect_cpg: Node (null)[1084777441] - corosync-cpg is now online
  notice: crm_update_peer_state_iter:   Node (null) state is now member | nodeid=1084777441 previous=unknown source=crm_update_peer_proc
    attrd:     info: corosync_node_name:        Unable to get node name for nodeid 1084777441
    attrd:   notice: get_node_name:     Defaulting to uname -n for the local corosync node name
    attrd:     info: crm_get_peer:      Node 1084777441 is now known as region-ctrl-2
    info: corosync_node_name:   Unable to get node name for nodeid 1084777441
  notice: get_node_name:        Defaulting to uname -n for the local corosync node name
    info: init_cs_connection_once:      Connection to 'corosync': established
    info: corosync_node_name:   Unable to get node name for nodeid 1084777441
  notice: get_node_name:        Defaulting to uname -n for the local corosync node name
    info: crm_get_peer: Node 1084777441 is now known as region-ctrl-2
      cib:   notice: crm_cluster_connect:       Connecting to cluster infrastructure: corosync
      cib:     info: corosync_node_name:        Unable to get node name for nodeid 1084777441
      cib:   notice: get_node_name:     Could not obtain a node name for corosync nodeid 1084777441
      cib:     info: crm_get_peer:      Created entry a6ced2c1-9d51-445d-9411-2fb19deab861/0x55848365a150 for node (null)/1084777441 (1 total)
      cib:     info: crm_get_peer:      Node 1084777441 has uuid 1084777441
      cib:     info: crm_update_peer_proc:      cluster_connect_cpg: Node (null)[1084777441] - corosync-cpg is now online
      cib:   notice: crm_update_peer_state_iter:        Node (null) state is now member | nodeid=1084777441 previous=unknown source=crm_update_peer_proc
      cib:     info: init_cs_connection_once:   Connection to 'corosync': established
      cib:     info: corosync_node_name:        Unable to get node name for nodeid 1084777441
      cib:   notice: get_node_name:     Defaulting to uname -n for the local corosync node name
      cib:     info: crm_get_peer:      Node 1084777441 is now known as region-ctrl-2
      cib:     info: qb_ipcs_us_publish:        server name: cib_ro
      cib:     info: qb_ipcs_us_publish:        server name: cib_rw
      cib:     info: qb_ipcs_us_publish:        server name: cib_shm
      cib:     info: pcmk_cpg_membership:       Node 1084777441 joined group cib (counter=0.0, pid=0, unchecked for rivals)
linux
  • 1 answer
  • 2177 Views
Jim
Asked: 2016-08-23 10:54:36 +0800 CST

nfs mounting a rw nexenta share for all users on a linux client - failing

  • 2

The situation:

We use a Nexenta appliance for NFS file service (NFS v3). We are sharing a path with anonymous read/write access.

We have a Linux host on which we want to mount the exported read/write share and allow anonymous read/write access to it on that client machine. Unfortunately, what ends up happening is that root can write to the share but unprivileged users cannot.

Mounting the share with any of the following:

# mount -t nfs -o rw 10.10.xx.xx:/path/to/share /mnt/mounted_path
# mount -t nfs -o rw,users 10.10.xx.xx:/path/to/share /mnt/mounted_path
# mount.nfs 10.10.xx.xx:/path/to/share /mnt/mounted_path -o rw

We have a specific user intended for anonymous use of the share, so I tried mounting the volume as that user:

# su - shared_user
$ sudo mount.nfs 10.10.xx.xx:/path/to/share /mnt/mounted_path -w -o user=shared_user,rw
mount.nfs: an incorrect mount option was specified

Went and did some reading: http://nfs.sourceforge.net/nfs-howto/ar01s06.html.

Decided to try the following, which produced these errors when attempting to mount:

# mount -t nfs -orw,no_root_squash 10.10.xx.xx:/path/to/share /mnt/mounted_path
mount.nfs: an incorrect mount option was specified
# mount -t nfs -orw,nroot_squash 10.10.xx.xx:/path/to/share /mnt/mounted_path  # for grins and giggles.
mount.nfs: an incorrect mount option was specified
$ sudo mount.nfs 10.10.xx.xx:/path/to/share /mnt/mounted_path -w -o user=shared_user,rw,no_root_squash
mount.nfs: an incorrect mount option was specified
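
My current suspicion from the HOWTO is that no_root_squash and friends are export options that belong on the server, not client mount options, which would explain mount.nfs rejecting them. On a Linux server the export side would look something like the line below (the subnet and anon uid/gid are placeholders, and Nexenta presumably has its own equivalent knob):

# /etc/exports on a Linux NFS server (illustrative)
/path/to/share 10.10.0.0/16(rw,no_root_squash,anonuid=1000,anongid=1000)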

Am I just SOL, or am I fundamentally missing something here? Is what I'm after some kind of holy grail? I've never used NFS heavily, so I'm a bit out of my depth. TIA!

linux nfs readwrite
  • 1 answer
  • 2055 Views
Jim
Asked: 2016-05-11 14:32:51 +0800 CST

Disabling the internal Intel X710 LLDP agent

  • 6

We do specialized hardware configuration that makes heavy use of LLDP. We have some new racks of servers that all use Intel X710 10Gb NICs, and LLDP suddenly stopped working. Our LLDP setup is simple: enable LLDP with the default TLVs on the TOR (top-of-rack) switch, enable LLDP on the Linux image with lldpad (CentOS 6.5), and pull neighbor information with lldptool. This has worked on thousands of machines in the past. Only on these machines, with these NICs, has the whole thing stopped working.

Packet dumps from both the switch and the servers show that frames are being sent correctly from server to switch, and conversely that the switch is correctly receiving the server's frames and sending TLV frames back to the server. However, the server never receives the switch's TLV frames, which left us scratching our heads. We put other machines with different NICs on the same TOR and they pick up LLDP data as expected.

I asked Google...

According to this link, these X710s appear to be running an internal LLDP agent that intercepts the LLDP frames coming from the switch. The firmware on the affected machines where we see this happening:

# ethtool -i eth2
driver: i40e
version: 1.3.47
firmware-version: 4.53 0x80001e5d 17.0.10
bus-info: 0000:01:00.2
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

The documented way of disabling the internal LLDP agent on the NIC doesn't work. I'm still digging regardless, but I figure I have a couple of options:

  1. Find the correct way to disable the internal LLDP agent on the NIC and keep pulling LLDP data on these machines with the existing methods - preferred (see the sketch after this list).
  2. Live with the NIC's LLDP agent and find a way to extract the neighbor TLVs from the NIC.
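
For option 1, the two mechanisms I've seen suggested (neither verified against this exact 4.53 firmware, so treat both as guesses) are a driver private flag on newer i40e versions and a debugfs command on older ones:

# newer i40e drivers expose a private flag:
ethtool --set-priv-flags eth2 disable-fw-lldp on

# older drivers take a command via debugfs (bus-info from ethtool -i):
echo lldp stop > /sys/kernel/debug/i40e/0000:01:00.2/command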

Has anyone else run into the same or a similar problem with these cards, and if so, how did you solve it?

I figure that if I wanted to consume the internal agent's data it would be exposed through ethtool or snmp, but I haven't yet found a way to display that information.

TIA

EDIT For the record, when I try the steps outlined in the Intel forums, I get the following output:

root@host (~)# find /sys/kernel/debug/
/sys/kernel/debug/
root@host (~)# mkdir /sys/kernel/debug/i40e
mkdir: cannot create directory `/sys/kernel/debug/i40e': No such file or directory
dell intel centos6 lldp
  • 4 answers
  • 11171 Views
Jim
Asked: 2014-01-01 17:34:52 +0800 CST

How to fix NginX/Phusion/Passenger "zsh:1: no such file or directory: passenger/buildout/agents/SpawnPreparer"

  • 0

Hopefully this is the right site to ask this question on.

I have a web service that I'm trying to set up as a Ruby Sinatra app using NginX and Phusion Passenger. The problem is that when I start the server and test the site, I get the following error:

zsh:1: no such file or directory: passenger/buildout/agents/SpawnPreparer

Now, I've been able to determine that the SpawnPreparer is being called by zsh, though I have no idea why. I don't even use zsh, and it wasn't used when the server was built. What I'd like to know is whether Passenger can be configured in nginx.conf to use a different shell to spawn its processes. If so, how?
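
My working theory (unconfirmed) is that Passenger spawns the app through the account's login shell from /etc/passwd rather than through anything set in nginx.conf, so pointing that shell back at bash might sidestep the broken zsh. app_user below is a stand-in for whatever user the app runs as:

# check which shell the app user gets, then switch it to bash
grep app_user /etc/passwd
chsh -s /bin/bash app_user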

This is Phusion Passenger 4.0.14.

Thanks!

nginx
  • 1 answer
  • 1208 Views
