谁能阐明我在下面列出的差异。也许可以解释为什么 NetworkManager 做的不同。请告知我们是否可以将 NetworkManager 更改为更像非 NetworkManager 场景。
两台 CentOS 7.8 服务器都使用 dhclient,但其中一台由 NetworkManager 控制。两者每隔几天都有相同的开关/NIC 关闭/向上事件(此时无法控制 - 出于多种原因,而且我们是远程的)
使用 NetworkManager 的服务器#0 在停机/停机后立即尝试请求 DHCP。它无法从 DHCP 获得任何响应(另一个交换机问题),然后取消 DHCP 事务并将状态更改为超时。然后它什么也不做,除非重新启动 NetworkManager(显然这只能在控制台完成)。请看下面的整个序列。
未使用 NetworkManager 的服务器#1 通过这些停机/停机中断恢复正常,似乎它只是在整个 NIC 停机时保持其租约,甚至不更新 NIC,只是继续使用它的 IP!稍后,它能够以常规租用超时间隔更新 DHCP。请看下面的整个序列。
请让我知道我是否可以将 NetworkManager 更改为更像普通的 dhclient。也许可以将其配置为在关闭/启动后仅保留当前租约,并以常规租约超时间隔续订?谢谢!!
服务器#0:
-- Last regular DHCP renew:
Feb 26 09:31:21 server0 dhclient[4766]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x58eefe09)
Feb 26 09:31:21 server0 dhclient[4766]: DHCPACK from 10.20.20.131 (xid=0x58eefe09)
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5084] dhcp4 (enp96s0f0): address 10.20.20.223
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5090] dhcp4 (enp96s0f0): plen 22 (255.255.252.0)
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5090] dhcp4 (enp96s0f0): gateway 10.20.20.1
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5090] dhcp4 (enp96s0f0): lease time 18000
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5090] dhcp4 (enp96s0f0): nameserver '10.20.20.49'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5091] dhcp4 (enp96s0f0): nameserver '10.20.20.48'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5091] dhcp4 (enp96s0f0): domain name 'dom.com'
Feb 26 09:31:21 server0 NetworkManager[3701]: <info> [1614349881.5091] dhcp4 (enp96s0f0): state changed bound -> bound
Feb 26 09:31:21 server0 dhclient[4766]: bound to 10.20.20.223 -- renewal in 8129 seconds.
Feb 26 09:31:21 server0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 09:31:21 server0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 09:31:21 server0 nm-dispatcher: req:1 'dhcp4-change' [enp96s0f0]: new request (4 scripts)
Feb 26 09:31:21 server0 nm-dispatcher: req:1 'dhcp4-change' [enp96s0f0]: start running ordered scripts...
-- Random switch outage:
Feb 26 10:49:10 SERVER0 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Down
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info> [1614354556.8263] device (enp96s0f0): state change: activated -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info> [1614354556.8467] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 4766
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info> [1614354556.8468] dhcp4 (enp96s0f0): state changed bound -> done
Feb 26 10:49:16 SERVER0 NetworkManager[3701]: <info> [1614354556.8679] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:49:16 SERVER0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 10:49:16 SERVER0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:1 'down' [enp96s0f0]: new request (4 scripts)
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:1 'down' [enp96s0f0]: start running ordered scripts...
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:2 'connectivity-change': new request (4 scripts)
Feb 26 10:49:16 SERVER0 nm-dispatcher: req:2 'connectivity-change': start running ordered scripts...
Feb 26 10:58:46 SERVER0 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: None
-- Machine is not accessible
-- NetworkManager tries to recover and request DHCP:
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6768] device (enp96s0f0): carrier: link connected
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6783] device (enp96s0f0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6823] policy: auto-activating connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6835] device (enp96s0f0): Activation: starting connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6837] device (enp96s0f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6844] manager: NetworkManager state is now CONNECTING
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.6848] device (enp96s0f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.7360] device (enp96s0f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.7369] dhcp4 (enp96s0f0): activation: beginning transaction (timeout in 45 seconds)
Feb 26 10:58:46 SERVER0 NetworkManager[3701]: <info> [1614355126.7435] dhcp4 (enp96s0f0): dhclient started with pid 44653
Feb 26 10:58:46 SERVER0 dhclient[44653]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x161525b4)
Feb 26 10:58:54 SERVER0 dhclient[44653]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x161525b4)
Feb 26 10:59:13 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 3 (xid=0x2f70b1a3)
Feb 26 10:59:16 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 6 (xid=0x2f70b1a3)
Feb 26 10:59:22 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 9 (xid=0x2f70b1a3)
Feb 26 10:59:31 SERVER0 dhclient[44653]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 14 (xid=0x2f70b1a3)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <warn> [1614355171.8451] dhcp4 (enp96s0f0): request timed out
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8451] dhcp4 (enp96s0f0): state changed unknown -> timeout
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8540] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 44653
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8541] dhcp4 (enp96s0f0): state changed timeout -> done
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8545] device (enp96s0f0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8553] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <warn> [1614355171.8559] device (enp96s0f0): Activation: failed for connection 'enp96s0f0'
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8563] device (enp96s0f0): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8606] policy: auto-activating connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8615] device (enp96s0f0): Activation: starting connection 'enp96s0f0' (7bdb7768-49c5-4cc4-a740-ee0a86cd90d5)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8617] device (enp96s0f0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
-- NetworkManager tries to recover and request DHCP again following a different process:
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8624] manager: NetworkManager state is now CONNECTING
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.8628] device (enp96s0f0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.9420] device (enp96s0f0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.9429] dhcp4 (enp96s0f0): activation: beginning transaction (timeout in 45 seconds)
Feb 26 10:59:31 SERVER0 NetworkManager[3701]: <info> [1614355171.9489] dhcp4 (enp96s0f0): dhclient started with pid 44712
Feb 26 10:59:32 SERVER0 dhclient[44712]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x5bd6c866)
Feb 26 10:59:36 SERVER0 dhclient[44712]: DHCPREQUEST on enp96s0f0 to 255.255.255.255 port 67 (xid=0x5bd6c866)
Feb 26 10:59:44 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 5 (xid=0x3ffbeab4)
Feb 26 10:59:49 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 5 (xid=0x3ffbeab4)
Feb 26 10:59:54 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 7 (xid=0x3ffbeab4)
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5823] device (enp96s0f0): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5846] device (enp96s0f0): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5850] device (enp96s0f0): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5869] manager: NetworkManager state is now CONNECTED_LOCAL
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5982] manager: NetworkManager state is now CONNECTED_SITE
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5988] policy: set 'enp96s0f0' (enp96s0f0) as default for IPv6 routing and DNS
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.5992] device (enp96s0f0): Activation: successful, device activated.
Feb 26 10:59:59 SERVER0 NetworkManager[3701]: <info> [1614355199.6003] manager: NetworkManager state is now CONNECTED_GLOBAL
Feb 26 10:59:59 SERVER0 systemd: Starting Network Manager Script Dispatcher Service...
Feb 26 10:59:59 SERVER0 systemd: Started Network Manager Script Dispatcher Service.
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:1 'up' [enp96s0f0]: new request (4 scripts)
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:1 'up' [enp96s0f0]: start running ordered scripts...
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:2 'connectivity-change': new request (4 scripts)
Feb 26 10:59:59 SERVER0 nm-dispatcher: req:2 'connectivity-change': start running ordered scripts...
Feb 26 11:00:01 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 14 (xid=0x3ffbeab4)
Feb 26 11:00:15 SERVER0 dhclient[44712]: DHCPDISCOVER on enp96s0f0 to 255.255.255.255 port 67 interval 21 (xid=0x3ffbeab4)
-- NetworkManager cancels and times out and does nothing anymore
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <warn> [1614355216.8456] dhcp4 (enp96s0f0): request timed out
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info> [1614355216.8463] dhcp4 (enp96s0f0): state changed unknown -> timeout
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info> [1614355216.8649] dhcp4 (enp96s0f0): canceled DHCP transaction, DHCP client pid 44712
Feb 26 11:00:16 SERVER0 NetworkManager[3701]: <info> [1614355216.8650] dhcp4 (enp96s0f0): state changed timeout -> done
服务器#1:
-- Last regular DHCP renew:
Feb 26 10:34:00 server1 dhclient[5252]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x71bfdb34)
Feb 26 10:34:00 server1 dhclient[5252]: DHCPACK from 10.20.20.131 (xid=0x71bfdb34)
Feb 26 10:34:02 server1 dhclient[5252]: bound to 10.20.20.224 -- renewal in 8195 seconds.
-- Random switch outage:
Feb 26 10:49:10 server1 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Down
Feb 26 10:58:46 server1 kernel: i40e 0000:60:00.0 enp96s0f0: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: None
-- Machine is accessible during this time!
-- Next regular DHCP renew:
Feb 26 12:50:37 server1 dhclient[5252]: DHCPREQUEST on enp96s0f0 to 10.20.20.131 port 67 (xid=0x71bfdb34)
Feb 26 12:50:37 server1 dhclient[5252]: DHCPACK from 10.20.20.131 (xid=0x71bfdb34)
Feb 26 12:50:39 server1 dhclient[5252]: bound to 10.20.20.224 -- renewal in 8611 seconds.
在 NetworkManager 中,设备具有整体的逻辑状态。这就是你在
nmcli device
.如果设备已连接(激活),则它可能无法从 DHCP 获取地址(或者,稍后可能会发生 DHCP 超时)。取决于
ipv4.dhcp-timeout
(您可以设置为无穷大),一段时间后 DHCP 将被视为失败。发生这种情况时,设备可能会完全停机。这取决于设置ipv4.may-fail
。如果ipv4.may-fail=no
,则 DHCP 失败对激活来说是致命的,并且设备会关闭。如果没有,只要你有 IPv6 地址,整体状态还是算不错的。在这种情况下,应无限期重试 DHCP,同时设备保持激活/启动状态。另一方面,如果设备由于故障而停机,它就有资格再次自动连接(至少,如果你设置了它
connection.autoconect=yes
)。此自动连接循环最多重复connection.autoconnect-retries
多次,然后自动连接被阻止 5 分钟,然后再次开始。这就是它应该的样子。但是对于 CentOS7.8,我不确定这一切是否如我所说的那样有效。你说,“那么它什么也不做,除非 NetworkManager 重新启动”。您确定吗?你等得够久吗?DHCP 失败后,它可能会后退一点。您粘贴的日志在此之后完成。
调试 NetworkManager 时,调试日志更有用。
level=TRACE
在 NetworkManager.conf 中配置日志记录。也许
ipv4.may-fail=no
会有帮助?然后至少设备会关闭,并且自动连接周期将再次开始。顺便说一句,如果您希望 NetworkManager 在拔下电缆时让设备保持运行状态(您似乎喜欢 dhclient),那么在
man NetworkManager.conf
.