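This is a new cluster built on bare metal with Kubespray.
calicoctl reports that some BGP peers are not in the Established state. StatefulSet members cannot communicate with each other, and most Ingress requests take about 10 seconds to open the sample Nginx page.
All other components look fine: etcd, the pods, and the output of both sudo kubectl get cs and sudo kubectl cluster-info dump.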
The calico-node pods on master-1 (192.168.250.111) and node-1 (192.168.250.112) report no errors in their logs.
The calico-node pods on master-2 (192.168.240.111) and node-2 (192.168.240.112) report this error in their logs:
bird: BGP: Unexpected connect from unknown address 192.168.240.240 (port 36597)
- This IP is the VPN router's IP (the gateway for these servers).
The calico-node pods on master-3 (192.168.230.111) and node-3 (192.168.230.112) report this error in their logs:
bird: BGP: Unexpected connect from unknown address 192.168.230.230 (port 35029)
- This IP is the VPN router's IP (the gateway for these servers).
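The log messages above can be checked mechanically. Below is a minimal Python sketch, assuming the calico-node logs have been saved as text (for example via kubectl logs). It extracts the source address of each "Unexpected connect" message and keeps the ones that are not mesh peers. The peer list is hardcoded from the addresses in this post. The same_subnet helper and its /24 default are a hypothetical heuristic: an unknown source inside the node's own /24 suggests a local gateway that is rewriting source addresses.

```python
import ipaddress
import re

# Matches BIRD log lines such as:
# bird: BGP: Unexpected connect from unknown address 192.168.240.240 (port 36597)
UNEXPECTED = re.compile(
    r"Unexpected connect from unknown address (?P<addr>\d+\.\d+\.\d+\.\d+) \(port (?P<port>\d+)\)"
)

# Mesh peers taken from this cluster (assumption: adjust for another cluster).
KNOWN_PEERS = {
    "192.168.250.111", "192.168.250.112",
    "192.168.240.111", "192.168.240.112",
    "192.168.230.111", "192.168.230.112",
}


def unexpected_sources(log_lines, known_peers=KNOWN_PEERS):
    """Return (address, port) for every BGP connect from an address that is not a mesh peer."""
    hits = []
    for line in log_lines:
        m = UNEXPECTED.search(line)
        if m and m.group("addr") not in known_peers:
            hits.append((m.group("addr"), int(m.group("port"))))
    return hits


def same_subnet(addr, node_ip, prefix=24):
    """Hypothetical heuristic: is addr inside the node's own /prefix network?"""
    net = ipaddress.ip_network(f"{node_ip}/{prefix}", strict=False)
    return ipaddress.ip_address(addr) in net


if __name__ == "__main__":
    import sys
    # Usage (pod name is a placeholder):
    #   kubectl logs -n kube-system <calico-node-pod> | python3 bird_unknown.py 192.168.240.112
    node_ip = sys.argv[1]
    for addr, port in unexpected_sources(sys.stdin):
        where = "local subnet (gateway/NAT?)" if same_subnet(addr, node_ip) else "remote"
        print(f"{addr}:{port} {where}")
```

On node-2 both reported addresses would land in the local-subnet bucket, which matches the observation that 192.168.240.240 is the VPN router.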
192.168.250.112 (node-1):
era@server-node-1:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+--------------------------------+
| 192.168.250.111 | node-to-node mesh | up | 19:54:47 | Established |
| 192.168.240.111 | node-to-node mesh | start | 19:54:35 | Active Socket: Connection |
| | | | | reset by peer |
| 192.168.230.111 | node-to-node mesh | up | 20:42:31 | Established |
| 192.168.240.112 | node-to-node mesh | start | 19:54:35 | Active Socket: Connection |
| | | | | reset by peer |
| 192.168.230.112 | node-to-node mesh | up | 20:42:30 | Established |
+-----------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-1:~$
192.168.240.112 (node-2):
era@server-node-2:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+--------------------------------+
| 192.168.250.111 | node-to-node mesh | start | 19:52:09 | Passive |
| 192.168.240.111 | node-to-node mesh | up | 19:54:37 | Established |
| 192.168.230.111 | node-to-node mesh | start | 19:52:09 | Active Socket: Connection |
| | | | | reset by peer |
| 192.168.250.112 | node-to-node mesh | start | 19:52:09 | Passive |
| 192.168.230.112 | node-to-node mesh | start | 19:52:09 | Active Socket: Connection |
| | | | | reset by peer |
+-----------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-2:~$
192.168.230.112 (node-3):
era@server-node-3:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+-------------+
| 192.168.250.111 | node-to-node mesh | up | 20:42:31 | Established |
| 192.168.240.111 | node-to-node mesh | start | 19:51:59 | Passive |
| 192.168.230.111 | node-to-node mesh | up | 19:54:25 | Established |
| 192.168.250.112 | node-to-node mesh | up | 20:42:30 | Established |
| 192.168.240.112 | node-to-node mesh | start | 19:51:59 | Passive |
+-----------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-3:~$
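To compare the three tables, here is a minimal Python sketch. It parses the IPv4 table printed by calicoctl node status into a dictionary and lists the peers whose INFO column does not start with Established. It assumes the exact five-column layout shown above, including INFO text that wraps onto a continuation row with an empty PEER ADDRESS cell. Other calicoctl versions may format the table differently.

```python
def parse_bgp_peers(status_text):
    """Parse the IPv4 BGP table of `calicoctl node status` into {peer: {"state", "info"}}."""
    peers = {}
    last = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip borders ("+---+") and non-table lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 5 or cells[0] == "PEER ADDRESS":
            continue  # skip the header row and anything malformed
        addr, _peer_type, state, _since, info = cells
        if addr:
            peers[addr] = {"state": state, "info": info}
            last = addr
        elif last is not None and info:
            # A wrapped INFO cell continues on a row whose PEER ADDRESS is empty.
            peers[last]["info"] += " " + info
    return peers


def not_established(peers):
    """Sorted list of peers whose INFO column does not start with 'Established'."""
    return sorted(a for a, p in peers.items() if not p["info"].startswith("Established"))


if __name__ == "__main__":
    import sys
    # Usage: sudo calicoctl node status | python3 bgp_peers.py
    for peer in not_established(parse_bgp_peers(sys.stdin.read())):
        print(peer)
```

Fed the node-2 table, it would list every peer except 192.168.240.111.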
I tried pinning the exact network interface to see if that would help. It didn't:
era@server-master-1:~$ kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=ens3
daemonset.apps/calico-node env updated
I tested port 179 with nc from every node and master to every other node and master, and all connections succeeded.
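A successful nc connection to port 179 only shows that a TCP handshake completes. It does not show which source address the far end sees. BIRD matches incoming connections against its configured peer addresses, so if something on the path rewrites the source, BIRD would log "unknown address" even though nc works. The following minimal Python sketch accepts one TCP connection and reports the client address as seen by the receiving host. It assumes a free port (17900 is an arbitrary, hypothetical choice, since BIRD in calico-node already holds 179 on the host network), and that any firewall allows that port.

```python
import socket


def make_listener(host="0.0.0.0", port=17900):
    """Open a TCP listening socket on an arbitrary free port (17900 is a hypothetical choice)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def accept_one(srv, timeout=None):
    """Accept one connection and return the client (ip, port) as seen by this host."""
    srv.settimeout(timeout)
    conn, addr = srv.accept()
    conn.close()
    return addr


if __name__ == "__main__":
    listener = make_listener()
    try:
        ip, port = accept_one(listener)
        print(f"connection seen from {ip}:{port}")
    finally:
        listener.close()
```

For example, run it on master-2 (192.168.240.111) and connect from node-1 with nc 192.168.240.111 17900. If it prints 192.168.240.240 instead of 192.168.250.112, the VPN is rewriting source addresses. That would explain why nc succeeds while BIRD rejects the session.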
The OS is Ubuntu 18.04.
Any suggestions on how to debug this in Calico? Any hint that gets me closer to a solution would be useful.
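UPDATE
I found that the problem is related to missing routes.
Below is the output from 192.168.250.112. It cannot reach the node and master in 192.168.240.x because there are no routes to them: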
era@server-node-1:~$ ip route | grep tun
10.233.76.0/24 via 192.168.230.112 dev tunl0 proto bird onlink
10.233.77.0/24 via 192.168.230.111 dev tunl0 proto bird onlink
10.233.79.0/24 via 192.168.250.111 dev tunl0 proto bird onlink
era@server-node-1:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+--------------------------------+
| 192.168.250.111 | node-to-node mesh | up | 21:39:05 | Established |
| 192.168.240.111 | node-to-node mesh | start | 19:54:35 | Connect Socket: Connection |
| | | | | reset by peer |
| 192.168.230.111 | node-to-node mesh | up | 20:42:31 | Established |
| 192.168.240.112 | node-to-node mesh | start | 19:54:35 | Connect Socket: Connection |
| | | | | reset by peer |
| 192.168.230.112 | node-to-node mesh | up | 20:42:30 | Established |
+-----------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-1:~$
Below is the output from 192.168.240.112. It cannot reach the nodes and masters in 192.168.250.x and 192.168.230.x because there are no routes to them:
era@server-node-2:~$ ip r | grep tunl
10.233.66.0/24 via 192.168.240.111 dev tunl0 proto bird onlink
era@server-node-2:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+--------------------------------+
| 192.168.250.111 | node-to-node mesh | start | 19:52:10 | Passive |
| 192.168.240.111 | node-to-node mesh | up | 19:54:38 | Established |
| 192.168.230.111 | node-to-node mesh | start | 22:05:18 | Active Socket: Connection |
| | | | | reset by peer |
| 192.168.250.112 | node-to-node mesh | start | 19:52:10 | Passive |
| 192.168.230.112 | node-to-node mesh | start | 22:05:22 | Active Socket: Connection |
| | | | | reset by peer |
+-----------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-2:~$
Below is the output from 192.168.230.112. It cannot reach the node and master in 192.168.240.x because there are no routes to them:
era@server-node-3:~$ ip r | grep tunl
10.233.77.0/24 via 192.168.230.111 dev tunl0 proto bird onlink
10.233.79.0/24 via 192.168.250.111 dev tunl0 proto bird onlink
10.233.100.0/24 via 192.168.250.112 dev tunl0 proto bird onlink
era@server-node-3:~$ sudo calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+-------------+
| 192.168.250.111 | node-to-node mesh | up | 21:36:50 | Established |
| 192.168.240.111 | node-to-node mesh | start | 19:51:59 | Passive |
| 192.168.230.111 | node-to-node mesh | up | 19:54:25 | Established |
| 192.168.250.112 | node-to-node mesh | up | 20:42:30 | Established |
| 192.168.240.112 | node-to-node mesh | start | 19:51:59 | Passive |
+-----------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
era@server-node-3:~$
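The route listings above can be cross-checked against the BGP peer list. The following minimal Python sketch collects the gateways of tunl0 routes from ip route output and reports the peers that no route points at. It assumes IPIP mode (pod routes on tunl0, as in this cluster) and that each healthy peer advertises at least one pod CIDR block. Under that assumption, a peer with no route via its address is one whose routes were never learned.

```python
def tunnel_gateways(route_text, dev="tunl0"):
    """Map gateway IP -> list of destination CIDRs for routes '<cidr> via <gw> dev <dev> ...'."""
    gateways = {}
    for line in route_text.splitlines():
        fields = line.split()
        if "via" not in fields or "dev" not in fields:
            continue
        via_i, dev_i = fields.index("via"), fields.index("dev")
        if via_i + 1 >= len(fields) or dev_i + 1 >= len(fields):
            continue
        if fields[dev_i + 1] != dev:
            continue  # e.g. the default route on the physical interface
        gateways.setdefault(fields[via_i + 1], []).append(fields[0])
    return gateways


def peers_without_routes(route_text, peers, dev="tunl0"):
    """Sorted list of BGP peers that no route on `dev` points at."""
    have = tunnel_gateways(route_text, dev)
    return sorted(p for p in peers if p not in have)


if __name__ == "__main__":
    import sys
    # Usage: ip route | python3 missing_routes.py 192.168.250.111 192.168.240.111 ...
    for peer in peers_without_routes(sys.stdin.read(), sys.argv[1:]):
        print("no tunl0 route via", peer)
```

With the node-1 output from the update, it would return 192.168.240.111 and 192.168.240.112. These are the same two peers stuck in the node-1 BGP table.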
So why are these routes missing, and how can I get them added? If I add them manually, they are removed automatically.
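The problem was that NAT was applied on the VPN TUN interface (layer 3). Calico does not support this, or at least I'm not aware of a working NATed setup.
Solution: use routing instead of NAT.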