我正在尝试按照本指南在 Hetzner Cloud 上设置 HA Kubernetes 集群。我创建了 6 台服务器、3 台控制平面主机和 3 台工作人员。尝试使用 kubeadm 将第二台服务器加入集群时,出现以下错误:
在 k8s-server-1 上:
Jul 06 14:09:01 k8s-server-1 kubelet[8059]: E0706 14:09:01.430599 8059 controller.go:187] failed to update lease, error: rpc error: code = Unknown desc = context deadline exceeded
Jul 06 14:08:54 k8s-server-1 kubelet[8059]: E0706 14:08:54.370142 8059 controller.go:187] failed to update lease, error: rpc error: code = Unknown desc = context deadline exceeded
Jul 06 14:08:51 k8s-server-1 kubelet[8059]: E0706 14:08:51.762075 8059 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"k8s-server-1\": Get \"https://my.kubernetes.test:6443/api/v1/nodes/k8s-server-1?resourceVersion=0&timeout=10s\": context deadline exceeded"
Jul 06 14:08:47 k8s-server-1 kubelet[8059]: E0706 14:08:47.325309 8059 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-apiserver-k8s-server-1.168f32516b37209a", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-apiserver-k8s-server-1", UID:"10b8928a4f8e5e0b449a40ab35a3efdc", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{kube-apiserver}"}, Reason:"Unhealthy", Message:"Readiness probe failed: HTTP probe failed with statuscode: 500", Source:v1.EventSource{Component:"kubelet", Host:"k8s-server-1"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd0ee49429a, ext:115787424848, loc:(*time.Location)(0x74c3600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd16f1a0a1d, ext:117801107410, loc:(*time.Location)(0x74c3600)}}, Count:2, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Patch "https://my.kubernetes.test:6443/api/v1/namespaces/kube-system/events/kube-apiserver-k8s-server-1.168f32516b37209a": read tcp 192.168.178.2:60934->192.168.178.8:6443: use of closed network connection'(may retry after sleeping)
Jul 06 14:08:47 k8s-server-1 kubelet[8059]: E0706 14:08:47.324053 8059 controller.go:187] failed to update lease, error: rpc error: code = Unknown desc = context deadline exceeded
Jul 06 14:08:46 k8s-server-1 kubelet[8059]: I0706 14:08:46.986663 8059 status_manager.go:566] "Failed to get status for pod" podUID=10b8928a4f8e5e0b449a40ab35a3efdc pod="kube-system/kube-apiserver-k8s-server-1" error="etcdserver: request timed out"
在 k8s-server-2 上:
Jul 06 14:09:04 k8s-server-2 kubelet[6685]: E0706 14:09:04.072247 6685 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"weave-net-9fldg.168f3252093de42e", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"weave-net-9fldg", UID:"88743b7a-aa81-4948-be9b-78c4bbf436fe", APIVersion:"v1", ResourceVersion:"714", FieldPath:"spec.initContainers{weave-init}"}, Reason:"Pulled", Message:"Successfully pulled image \"docker.io/weaveworks/weave-kube:2.8.1\" in 6.525660057s", Source:v1.EventSource{Component:"kubelet", Host:"k8s-server-2"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd1997fa82e, ext:11173601176, loc:(*time.Location)(0x74c3600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd1997fa82e, ext:11173601176, loc:(*time.Location)(0x74c3600)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'rpc error: code = Unknown desc = context deadline exceeded' (will not retry!)
Jul 06 14:08:57 k8s-server-2 kubelet[6685]: E0706 14:08:57.993540 6685 controller.go:144] failed to ensure lease exists, will retry in 400ms, error: Get "https://my.kubernetes.test:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k8s-server-2?timeout=10s": context deadline exceeded
Jul 06 14:08:57 k8s-server-2 kubelet[6685]: I0706 14:08:57.352989 6685 scope.go:111] "RemoveContainer" containerID="9e05ad27088c41bdd02bd0d32a16706fc6eab6e458031f0714c9a56541f8f222"
Jul 06 14:08:56 k8s-server-2 kubelet[6685]: E0706 14:08:56.992481 6685 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"weave-net-9fldg.168f3252093de42e", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"weave-net-9fldg", UID:"88743b7a-aa81-4948-be9b-78c4bbf436fe", APIVersion:"v1", ResourceVersion:"714", FieldPath:"spec.initContainers{weave-init}"}, Reason:"Pulled", Message:"Successfully pulled image \"docker.io/weaveworks/weave-kube:2.8.1\" in 6.525660057s", Source:v1.EventSource{Component:"kubelet", Host:"k8s-server-2"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd1997fa82e, ext:11173601176, loc:(*time.Location)(0x74c3600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd1997fa82e, ext:11173601176, loc:(*time.Location)(0x74c3600)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://my.kubernetes.test:6443/api/v1/namespaces/kube-system/events": read tcp 192.168.178.3:47722->192.168.178.8:6443: use of closed network connection'(may retry after sleeping)
Jul 06 14:08:56 k8s-server-2 kubelet[6685]: E0706 14:08:56.990109 6685 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"k8s-server-2\": Get \"https://my.kubernetes.test:6443/api/v1/nodes/k8s-server-2?timeout=10s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
Jul 06 14:08:56 k8s-server-2 kubelet[6685]: I0706 14:08:56.989160 6685 scope.go:111] "RemoveContainer" containerID="9e05ad27088c41bdd02bd0d32a16706fc6eab6e458031f0714c9a56541f8f222"
Jul 06 14:08:56 k8s-server-2 kubelet[6685]: E0706 14:08:56.988865 6685 kubelet.go:1683] "Failed creating a mirror pod for" err="Post \"https://my.kubernetes.test:6443/api/v1/namespaces/kube-system/pods\": read tcp 192.168.178.3:47722->192.168.178.8:6443: use of closed network connection" pod="kube-system/etcd-k8s-server-2"
Jul 06 14:08:54 k8s-server-2 kubelet[6685]: E0706 14:08:54.210098 6685 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 10s restarting failed container=etcd pod=etcd-k8s-server-2_kube-system(22b3a914daf1bef98cb01ddd7868523d)\"" pod="kube-system/etcd-k8s-server-2" podUID=22b3a914daf1bef98cb01ddd7868523d
Jul 06 14:08:54 k8s-server-2 kubelet[6685]: I0706 14:08:54.208472 6685 scope.go:111] "RemoveContainer" containerID="9e05ad27088c41bdd02bd0d32a16706fc6eab6e458031f0714c9a56541f8f222"
Jul 06 14:08:54 k8s-server-2 kubelet[6685]: E0706 14:08:54.208199 6685 kubelet.go:1683] "Failed creating a mirror pod for" err="rpc error: code = Unknown desc = context deadline exceeded" pod="kube-system/etcd-k8s-server-2"
Jul 06 14:08:53 k8s-server-2 kubelet[6685]: E0706 14:08:53.347043 6685 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-proxy-2z5js.168f3250c7fc2120", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-proxy-2z5js", UID:"0ac8fe5d-7332-4a4d-abee-48c6d4dee38f", APIVersion:"v1", ResourceVersion:"711", FieldPath:"spec.containers{kube-proxy}"}, Reason:"Started", Message:"Started container kube-proxy", Source:v1.EventSource{Component:"kubelet", Host:"k8s-server-2"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd04243d720, ext:5783805064, loc:(*time.Location)(0x74c3600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd04243d720, ext:5783805064, loc:(*time.Location)(0x74c3600)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'rpc error: code = Unknown desc = context deadline exceeded' (will not retry!)
Jul 06 14:08:53 k8s-server-2 kubelet[6685]: I0706 14:08:53.269542 6685 scope.go:111] "RemoveContainer" containerID="e2664d16d53ff5ae6de27fe52e84651791bca1ca70a6987c9a4e3e7318eaa174"
Jul 06 14:08:47 k8s-server-2 kubelet[6685]: I0706 14:08:47.194425 6685 scope.go:111] "RemoveContainer" containerID="7aaa63419740b5e30cc76770abc92dfbabe1f48d4d812b4abc89168f73e46d51"
Jul 06 14:08:46 k8s-server-2 kubelet[6685]: I0706 14:08:46.987598 6685 status_manager.go:566] "Failed to get status for pod" podUID=778e041efc75c1983cbb59f2b3d46d09 pod="kube-system/kube-controller-manager-k8s-server-2" error="etcdserver: request timed out"
Jul 06 14:08:46 k8s-server-2 kubelet[6685]: E0706 14:08:46.986807 6685 controller.go:144] failed to ensure lease exists, will retry in 200ms, error: etcdserver: request timed out
Jul 06 14:08:46 k8s-server-2 kubelet[6685]: E0706 14:08:46.986800 6685 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"k8s-server-2\": etcdserver: request timed out"
服务器列表:| 姓名 | 公共IP | 私有IP | | --- | --- | --- | | k8s-服务器-1 | 192.168.178.2 | 10.23.1.2 | | k8s-服务器-2 | 192.168.178.3 | 10.23.1.3 | | k8s-服务器-3 | 192.168.178.4 | 10.23.1.4 | | k8s-worker-1 | 192.168.178.5 | 10.23.1.5 | | k8s-worker-2 | 192.168.178.6 | 10.23.1.6 | | k8s-worker-3 | 192.168.178.7 | 10.23.1.7 |
此外,k8s-server-* 应用了以下防火墙规则(仅适用于通过公共 IP 路由的流量,不适用于私有网络内部): | 方向 | 港口 | 来源/目的地 | | --- | --- | --- | | 入口 | 80 | 任何| | 入口 | 第443章 任何| | 入口 | 22 | 静态公司IP | | 入口 | 6443 | 静态公司IP | | 出口 | 任何| 任何|
在同一网络中有一个负载均衡器,将流量路由到 k8s-server-1。它的公共 IP 是 192.168.178.8,私有 IP 是 10.23.1.8。
我在两个节点上运行的内容:
apt-get update
apt-get install apt-transport-https ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
apt-get update
apt-get install docker-ce docker-ce-cli containerd.io
systemctl enable docker.service
systemctl enable containerd.service
cat <<EOF | sudo tee /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m"
},
"storage-driver": "overlay2"
}
EOF
systemctl enable docker
systemctl daemon-reload
systemctl restart docker
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
apt-get update
apt-get install -y apt-transport-https ca-certificates curl
curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
...在服务器 1 上:
kubeadm config images pull
kubeadm init --apiserver-advertise-address=10.23.1.2 --control-plane-endpoint "my.kubernetes.test:6443" --upload-certs
mkdir ~/.kube
cp /etc/kubernetes/admin.conf ~/.kube/config
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
watch kubectl get pod -n kube-system
watch kubectl get nodes
...在服务器 2 上:
kubeadm config images pull
kubeadm join my.kubernetes.test:6443 --token XXXXX.XXXXX --discovery-token-ca-cert-hash sha256:XXXXXXXXXX --control-plane --certificate-key XXXXXXXXXX
我也可以通过将
--apiserver-advertise-address
参数添加到kubeadm join
命令来解决问题。