Ondřej Žižka
Asked: 2023-06-02 07:53:12 +0800 CST

Samba mount hangs the system


I have set up a cluster with 2 nodes + a quorum device using pcs:

[root@konor2 etc]# pcs status
Cluster name: wildflycluster
Status of pacemakerd: 'Pacemaker is running' (last updated 2023-06-01 09:52:35 +02:00)
Cluster Summary:
  * Stack: corosync
  * Current DC: konor2c (version 2.1.5-7.el9-a3f44794f94) - partition with quorum
  * Last updated: Thu Jun  1 09:52:36 2023
  * Last change:  Thu Jun  1 06:03:53 2023 by root via cibadmin on konor2c
  * 2 nodes configured
  * 4 resource instances configured

Node List:
  * Online: [ konor2c ]
  * OFFLINE: [ konor1c ]

Full List of Resources:
  * Resource Group: wildfly-resources-grp:
    * wildfly-vip       (ocf:heartbeat:IPaddr2):         Started konor2c
    * wildfly-server    (systemd:wildfly):       Started konor2c
    * smb-mount-it      (systemd:home-jboss-mnt-protector-IT.mount):     Started konor2c
    * smb-mount-transmek        (systemd:home-jboss-mnt-protector-transmek.mount):       Started konor2c

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
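A quick way to see which of these resources depend on systemd mount units is to parse the resource lines of the pcs status output. The following is a minimal Python sketch that assumes the line layout shown above (`* name (agent): State node`). It is not an official pcs parser, and `parse_resources` and `systemd_mount_units` are hypothetical helpers.

```python
import re
from typing import List, NamedTuple, Optional

# Hypothetical parser; the line layout is assumed from the `pcs status` output above, e.g.
#   "    * wildfly-vip       (ocf:heartbeat:IPaddr2):         Started konor2c"
RESOURCE_RE = re.compile(
    r"^\s*\*\s+(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+(?P<state>\S+)(?:\s+(?P<node>\S+))?\s*$"
)


class Resource(NamedTuple):
    name: str
    agent: str            # e.g. "systemd:wildfly" or "ocf:heartbeat:IPaddr2"
    state: str            # e.g. "Started" or "Stopped"
    node: Optional[str]   # node the resource runs on, if one is printed


def parse_resources(status_text: str) -> List[Resource]:
    """Return every line of `pcs status` output that looks like a resource entry."""
    resources = []
    for line in status_text.splitlines():
        m = RESOURCE_RE.match(line)
        if m:
            resources.append(Resource(m["name"], m["agent"], m["state"], m["node"]))
    return resources


def systemd_mount_units(resources: List[Resource]) -> List[str]:
    """Pick out the resources that are managed through systemd .mount units."""
    return [
        r.agent.split(":", 1)[1]
        for r in resources
        if r.agent.startswith("systemd:") and r.agent.endswith(".mount")
    ]


if __name__ == "__main__":
    sample = """\
  * Resource Group: wildfly-resources-grp:
    * wildfly-vip       (ocf:heartbeat:IPaddr2):         Started konor2c
    * wildfly-server    (systemd:wildfly):       Started konor2c
    * smb-mount-it      (systemd:home-jboss-mnt-protector-IT.mount):     Started konor2c
"""
    for unit in systemd_mount_units(parse_resources(sample)):
        print(unit)
```

Lines such as "Resource Group:" or "Online: [ ... ]" do not contain a parenthesised agent, so the regular expression skips them.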

/etc/fstab contains the following SMB mounts:

//protector/data         /home/jboss/mnt/protector/data          cifs  noauto,vers=3.0,_netdev,credentials=/etc/ucr/vu-d.crd,domain=unmz,uid=wildfly,noexec,nosuid,mapchars,file_mode=0664,dir_mode=0775,nounix,nobrl 0
//protector/IT         /home/jboss/mnt/protector/IT          cifs  noauto,vers=3.0,_netdev,credentials=/etc/ucr/vu-d.crd,domain=unmz,uid=wildfly,noexec,nosuid,mapchars,file_mode=0664,dir_mode=0775,nounix,nobrl 0 0
//protector/transmek   /home/jboss/mnt/protector/transmek    cifs  noauto,vers=3.0,_netdev,credentials=/etc/ucr/vu-d.crd,domain=unmz,uid=wildfly,noexec,nosuid,mapchars,file_mode=0664,dir_mode=0775,nounix,nobrl 0 0
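The pcs resources refer to these shares by systemd mount-unit name (for example home-jboss-mnt-protector-IT.mount), which systemd derives from the mount point. The Python sketch below is a simplified re-implementation of that path escaping (the real tool is `systemd-escape --path --suffix=mount`), useful for checking that each cifs line in /etc/fstab maps to the unit a resource expects. It is an illustration only: `escape_path` and `cifs_mount_units` are hypothetical helpers, and fstab's octal escapes such as `\040` are not handled.

```python
from typing import Dict


def escape_path(path: str) -> str:
    """Simplified (hypothetical) re-implementation of systemd path escaping:
    drop redundant slashes, turn "/" into "-", and hex-escape everything
    except ASCII alphanumerics, ":", "_" and a non-leading "."."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-"  # the root directory "/" maps to "-"
    norm = "/".join(parts)
    out = []
    for i, ch in enumerate(norm):
        if ch == "/":
            out.append("-")
        elif ch.isascii() and (ch.isalnum() or ch in ":_.") and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # e.g. "-" becomes "\x2d" so it cannot be confused with a path separator
            out.extend(f"\\x{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out)


def cifs_mount_units(fstab_text: str) -> Dict[str, str]:
    """Map each cifs mount point in fstab text to its .mount unit name."""
    units = {}
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip blank lines, comments and malformed entries
        mount_point, fstype = fields[1], fields[2]
        if fstype == "cifs":
            units[mount_point] = escape_path(mount_point) + ".mount"
    return units
```

Applied to the three fstab lines above, this yields home-jboss-mnt-protector-data.mount, home-jboss-mnt-protector-IT.mount and home-jboss-mnt-protector-transmek.mount; only the last two appear as resources in the pcs status output.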

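I can mount the shares with the mount command, and I can also mount them with systemctl. Everything then works fine, but after a while (somewhere between 2 and 20 hours; I haven't found the trigger yet) the cifsiod process starts consuming a lot of CPU. After some time it consumes all of the CPU, and the VM has to be restarted manually from VMware vCenter. In /var/log/messages there are messages like these: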

May 30 22:10:00 konor1 systemd[1]: Finished system activity accounting tool.
May 30 22:11:20 konor1 pacemaker-controld[419997]: notice: State transition S_IDLE -> S_POLICY_ENGINE
May 30 22:11:20 konor1 pacemaker-schedulerd[419996]: notice: Calculated transition 196, saving inputs in /var/lib/pacemaker/pengine/pe-input-104.bz2
May 30 22:11:20 konor1 pacemaker-controld[419997]: notice: Transition 196 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-104.bz2): Complete
May 30 22:11:20 konor1 pacemaker-controld[419997]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
May 30 22:19:12 konor1 pacemaker-controld[419997]: notice: High CPU load detected: 3.990000
May 30 22:19:13 konor1 kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:0:1456407]
May 30 22:19:13 konor1 kernel: Modules linked in: tls nls_utf8 cifs cifs_arc4 rdma_cm iw_cm ib_cm ib_core cifs_md4 dns_resolver nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock sunrpc vfat fat intel_rapl_msr intel_rapl_common vmw_balloon rapl pcspkr vmw_vmci i2c_piix4 joydev xfs libcrc32c sr_mod cdrom ata_generic vmwgfx drm_ttm_helper ttm drm_kms_helper ahci syscopyarea sysfillrect sysimgblt fb_sys_fops libahci ata_piix sd_mod drm t10_pi sg crct10dif_pclmul crc32_pclmul crc32c_intel libata ghash_clmulni_intel vmxnet3 vmw_pvscsi serio_raw dm_mirror dm_region_hash dm_log dm_mod fuse
May 30 22:19:13 konor1 kernel: CPU: 0 PID: 1456407 Comm: kworker/0:0 Kdump: loaded Not tainted 5.14.0-284.11.1.el9_2.x86_64 #1
May 30 22:19:13 konor1 kernel: Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.18227214.B64.2106252220 06/25/2021
May 30 22:19:13 konor1 kernel: Workqueue: cifsiod smb2_reconnect_server [cifs]
May 30 22:19:13 konor1 kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x21/0x30
May 30 22:19:13 konor1 kernel: Code: 82 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 66 90 ba 01 00 00 00 8b 07 85 c0 75 0d f0 0f b1 17 85 c0 75 f2 c3 cc cc cc cc f3 90 <eb> e9 e9 38 fe ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
May 30 22:19:13 konor1 kernel: RSP: 0018:ffffb00087187d78 EFLAGS: 00000202
May 30 22:19:13 konor1 kernel: RAX: 0000000000000001 RBX: ffff9cdc14b62800 RCX: 000000364c970000
May 30 22:19:13 konor1 kernel: RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff9cdc14b60828
May 30 22:19:13 konor1 kernel: RBP: ffff9cdc14b60828 R08: ffffb00087187e38 R09: 0000000000000000
May 30 22:19:13 konor1 kernel: R10: ffffb00087187ce8 R11: ffff9cdc3594dc00 R12: 0000000000000000
May 30 22:19:13 konor1 kernel: R13: ffff9cdc14b60800 R14: 000000000000ffff R15: 000000000000ffff
May 30 22:19:13 konor1 kernel: FS:  0000000000000000(0000) GS:ffff9cdcb9c00000(0000) knlGS:0000000000000000
May 30 22:19:13 konor1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 30 22:19:13 konor1 kernel: CR2: 00007fa14a882000 CR3: 00000001ab010003 CR4: 00000000000606f0
May 30 22:19:13 konor1 kernel: Call Trace:
May 30 22:19:13 konor1 kernel: <TASK>
May 30 22:19:13 konor1 kernel: _raw_spin_lock+0x25/0x30
May 30 22:19:13 konor1 kernel: smb2_reconnect.part.0+0x3f/0x5f0 [cifs]
May 30 22:19:13 konor1 kernel: ? set_next_entity+0xda/0x150
May 30 22:19:13 konor1 kernel: smb2_reconnect_server+0x203/0x5f0 [cifs]
May 30 22:19:13 konor1 kernel: ? __tdx_hypercall+0x80/0x80
May 30 22:19:13 konor1 kernel: process_one_work+0x1e5/0x3c0
May 30 22:19:13 konor1 kernel: ? rescuer_thread+0x3a0/0x3a0
May 30 22:19:13 konor1 kernel: worker_thread+0x50/0x3b0
May 30 22:19:13 konor1 kernel: ? rescuer_thread+0x3a0/0x3a0
May 30 22:19:13 konor1 kernel: kthread+0xd6/0x100
May 30 22:19:13 konor1 kernel: ? kthread_complete_and_exit+0x20/0x20
May 30 22:19:13 konor1 kernel: ret_from_fork+0x1f/0x30
May 30 22:19:13 konor1 kernel: </TASK>
May 30 22:19:23 konor1 corosync-qdevice[933368]: Server didn't send echo reply message on time
May 30 22:19:34 konor1 corosync-qdevice[933368]: Connect timeout
May 30 22:19:41 konor1 kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 52s! [kworker/0:0:1456407]
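In this trace the stuck task is a kworker running the cifsiod workqueue item smb2_reconnect_server, spinning in native_queued_spin_lock_slowpath. To see how often this happens and how long each lockup lasts, the soft-lockup reports can be extracted from /var/log/messages. The Python sketch below assumes the exact message layout shown in the excerpt; `scan_soft_lockups` and `SoftLockup` are hypothetical names, and pairing each report with the next "Workqueue:" line is a heuristic.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Patterns assumed from the /var/log/messages excerpt (hypothetical helper, not a standard tool).
LOCKUP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+)\]"
)
WORKQUEUE_RE = re.compile(r" kernel: Workqueue: (?P<wq>.+?)\s*$")


@dataclass
class SoftLockup:
    timestamp: str
    host: str
    cpu: int
    stuck_seconds: int
    task: str
    workqueue: Optional[str] = None  # e.g. "cifsiod smb2_reconnect_server [cifs]"


def scan_soft_lockups(lines) -> List[SoftLockup]:
    """Collect soft-lockup reports; attach the first 'Workqueue:' line that follows each one."""
    events: List[SoftLockup] = []
    for line in lines:
        m = LOCKUP_RE.match(line)
        if m:
            events.append(SoftLockup(m["ts"], m["host"], int(m["cpu"]), int(m["secs"]), m["task"]))
            continue
        w = WORKQUEUE_RE.search(line)
        if w and events and events[-1].workqueue is None:
            events[-1].workqueue = w["wq"]
    return events


if __name__ == "__main__":
    sample = [
        "May 30 22:19:13 konor1 kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:0:1456407]",
        "May 30 22:19:13 konor1 kernel: Workqueue: cifsiod smb2_reconnect_server [cifs]",
    ]
    for ev in scan_soft_lockups(sample):
        print(ev.timestamp, ev.host, f"CPU#{ev.cpu}", f"{ev.stuck_seconds}s", ev.task, ev.workqueue or "-")
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.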

And so on... it keeps repeating. In corosync.log I found the following messages (from a different day):

I, [2023-05-22T09:57:32.101 #00000]     INFO -- : 200 GET /remote/get_configs?cluster_name=wildflycluster (10.10.51.46) 3.75ms
I, [2023-05-22T10:06:42.066 #00000]     INFO -- : 200 GET /remote/get_configs?cluster_name=wildflycluster (10.10.51.47) 4.13ms
I, [2023-05-22T10:06:42.271 #00012]     INFO -- : Config files sync started
I, [2023-05-22T10:06:42.272 #00012]     INFO -- : SRWT Node: konor2 Request: get_configs
I, [2023-05-22T10:06:42.272 #00012]     INFO -- : Connecting to: https://konor2:2224/remote/get_configs?cluster_name=wildflycluster
I, [2023-05-22T10:06:42.272 #00012]     INFO -- : SRWT Node: konor1 Request: get_configs
I, [2023-05-22T10:06:42.272 #00012]     INFO -- : Connecting to: https://konor1:2224/remote/get_configs?cluster_name=wildflycluster
I, [2023-05-22T10:07:05.272 #00012]     INFO -- : Config files sync finished
I, [2023-05-22T10:07:35.262 #00000]     INFO -- : 200 GET /remote/get_configs?cluster_name=wildflycluster (10.10.51.46) 7.95ms
I, [2023-05-22T10:16:42.015 #00013]     INFO -- : Config files sync started
I, [2023-05-22T10:16:42.016 #00013]     INFO -- : SRWT Node: konor2 Request: get_configs
I, [2023-05-22T10:16:42.016 #00013]     INFO -- : Connecting to: https://konor2:2224/remote/get_configs?cluster_name=wildflycluster
I, [2023-05-22T10:16:42.016 #00013]     INFO -- : SRWT Node: konor1 Request: get_configs
I, [2023-05-22T10:16:42.016 #00013]     INFO -- : Connecting to: https://konor1:2224/remote/get_configs?cluster_name=wildflycluster
I, [2023-05-22T10:16:42.016 #00013]     INFO -- : No response from: konor1 request: get_configs, error: couldnt_connect
I, [2023-05-22T10:16:42.016 #00013]     INFO -- : No response from: konor2 request: get_configs, error: couldnt_connect
I, [2023-05-22T10:16:42.016 #00013]     INFO -- : Config files sync finished

It looks like the server lost network connectivity.
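To line up these connectivity losses with the kernel soft lockups, the "No response from" entries (failed config-sync requests to the nodes on port 2224) can be pulled out with their timestamps. This Python sketch assumes the line format shown in the excerpt; `connection_failures` and `Failure` are hypothetical helpers, not part of pcs.

```python
import re
from datetime import datetime
from typing import List, NamedTuple

# Layout assumed from the log excerpt, e.g.
#   "I, [2023-05-22T10:16:42.016 #00013]     INFO -- : No response from: konor1 request: get_configs, error: couldnt_connect"
NO_RESPONSE_RE = re.compile(
    r"^\w, \[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}) #\d+\]\s+\w+ -- : "
    r"No response from: (?P<node>\S+) request: (?P<request>[^,\s]+), error: (?P<error>\S+)"
)


class Failure(NamedTuple):
    when: datetime
    node: str
    request: str
    error: str


def connection_failures(lines) -> List[Failure]:
    """Return one Failure per 'No response from' line, in log order."""
    failures = []
    for line in lines:
        m = NO_RESPONSE_RE.match(line)
        if m:
            failures.append(
                Failure(datetime.fromisoformat(m["ts"]), m["node"], m["request"], m["error"])
            )
    return failures
```

On the excerpt above it reports two failures at the same instant (10:16:42.016), one for konor1 and one for konor2.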

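I have another standalone VM (not clustered) with the same SMB mounts, and its mounts have been running for days without any problems. When I run the cluster without the SMB mounts, it also runs for days without problems. The issue occurs on both VMs of the cluster. I have also reinstalled them, but the behavior is the same.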

Have you seen anything similar? Do you have any suggestions on how to troubleshoot this? Thanks for any hints.

Tags: mount