This is where the investigation started: CoreDNS would not stay up for more than a few seconds, failing like this:
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
ingress-nginx ingress-nginx-controller-8xcl9 1/1 Running 0 11h
ingress-nginx ingress-nginx-controller-hwhvk 1/1 Running 0 11h
ingress-nginx ingress-nginx-controller-xqdqx 1/1 Running 2 (10h ago) 11h
kube-system calico-kube-controllers-684bcfdc59-cr7hr 1/1 Running 0 11h
kube-system calico-node-62p58 1/1 Running 2 (10h ago) 11h
kube-system calico-node-btvdh 1/1 Running 0 11h
kube-system calico-node-q5bkr 1/1 Running 0 11h
kube-system coredns-8474476ff8-dnt6b 0/1 CrashLoopBackOff 1 (3s ago) 5s
kube-system coredns-8474476ff8-ftcbx 0/1 Error 1 (2s ago) 5s
kube-system dns-autoscaler-5ffdc7f89d-4tshm 1/1 Running 2 (10h ago) 11h
kube-system kube-apiserver-hyzio 1/1 Running 4 (10h ago) 11h
kube-system kube-controller-manager-hyzio 1/1 Running 4 (10h ago) 11h
kube-system kube-proxy-2d8ls 1/1 Running 0 11h
kube-system kube-proxy-c6c4l 1/1 Running 4 (10h ago) 11h
kube-system kube-proxy-nzqdd 1/1 Running 0 11h
kube-system kube-scheduler-hyzio 1/1 Running 5 (10h ago) 11h
kube-system kubernetes-dashboard-548847967d-66dwz 1/1 Running 0 11h
kube-system kubernetes-metrics-scraper-6d49f96c97-r6dz2 1/1 Running 0 11h
kube-system nginx-proxy-dyzio 1/1 Running 0 11h
kube-system nginx-proxy-zyzio 1/1 Running 0 11h
kube-system nodelocaldns-g9wxh 1/1 Running 0 11h
kube-system nodelocaldns-j2qc9 1/1 Running 4 (10h ago) 11h
kube-system nodelocaldns-vk84j 1/1 Running 0 11h
kube-system registry-j5prk 1/1 Running 0 11h
kube-system registry-proxy-5wbhq 1/1 Running 0 11h
kube-system registry-proxy-77lqd 1/1 Running 0 11h
kube-system registry-proxy-s45p4 1/1 Running 2 (10h ago) 11h
kubectl describe on that pod didn't add much to the picture:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 67s default-scheduler Successfully assigned kube-system/coredns-8474476ff8-dnt6b to zyzio
Normal Pulled 25s (x4 over 68s) kubelet Container image "k8s.gcr.io/coredns/coredns:v1.8.0" already present on machine
Normal Created 25s (x4 over 68s) kubelet Created container coredns
Normal Started 25s (x4 over 68s) kubelet Started container coredns
Warning BackOff 6s (x11 over 66s) kubelet Back-off restarting failed container
But looking at the logs did:
$ kubectl logs coredns-8474476ff8-dnt6b -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 5b233a0166923d642fdbca0794b712ab
CoreDNS-1.8.0
linux/amd64, go1.15.3, 054c9ae
[FATAL] plugin/loop: Loop (127.0.0.1:49048 -> :53) detected for zone ".", see https://coredns.io/plugins/loop#troubleshooting. Query: "HINFO 2906344495550081187.9117452939332601176."
How nice of the error to link to the troubleshooting docs! I started going through that page and found that my /etc/resolv.conf does indeed contain the problematic local IP, nameserver 127.0.0.53.
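This is visible directly on the node (only the relevant line shown):

$ cat /etc/resolv.conf
nameserver 127.0.0.53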
I also found the real DNS IPs in /run/systemd/resolve/resolv.conf. But now the question is: how do I do what the troubleshooting doc describes? It says:
Add the following to your kubelet config yaml: resolvConf: <path-to-your-real-resolv-conf-file> (or via command line flag --resolv-conf deprecated in 1.10). Your "real" resolv.conf is the one that contains the actual IPs of your upstream servers, and no local/loopback address. This flag tells kubelet to pass an alternate resolv.conf to Pods. For systems using systemd-resolved, /run/systemd/resolve/resolv.conf is typically the location of the "real" resolv.conf, although this can be different depending on your distribution.
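I assume the addition itself would look something like this (my own sketch based on the quote above, using the systemd-resolved path; I haven't verified where this file lives):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /run/systemd/resolve/resolv.conf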
So, the questions are:
- how do I find, or where do I create, the mentioned kubelet config yaml,
- at what level should I specify the resolvConf value, and
- can it take multiple values? I have two nameservers defined. Should they be given as separate entries or as an array?
/etc/resolv.conf is located on each of your nodes. You can edit it by SSHing into the node. Then you have to restart the kubelet for the change to take effect (if that doesn't work, restart the node with sudo reboot).
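For example, on a node where the kubelet runs as a systemd service (the unit name may differ between distributions):

$ sudo systemctl restart kubelet
$ sudo systemctl status kubelet    # verify it came back up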
The /home/kubernetes/kubelet-config.yaml file (also located on each of your nodes) contains the kubelet's configuration. You can create a new resolv.conf file and point to it with the resolvConf field.

Important note: the new configuration will only apply to pods created after the update. It is strongly recommended to drain your node before changing the configuration.
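Assuming /run/systemd/resolve/resolv.conf is the file with your real upstream servers, the change boils down to a single line in the kubelet config (the other fields stay as they are):

resolvConf: /run/systemd/resolve/resolv.conf

And to drain the node first (substitute your node name):

$ kubectl drain <node-name> --ignore-daemonsets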
The kubelet configuration documentation states that resolvConf is of type string, so it probably accepts only a single value.
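In other words, the two nameservers don't go into resolvConf itself; they go inside the single file that resolvConf points to, one per line in standard resolv.conf syntax (the addresses here are placeholders):

nameserver 10.0.0.2
nameserver 10.0.0.3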