
Question

Volker Raschek

Asked: 2022-01-12 14:18:15 +0800 CST

archlinux: kubernetes - coredns is not working properly


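I have installed kubernetes v1.23.0 with Arch Linux as the distribution. The cluster consists of one master node and one worker node. Both systems are KVM-based virtual machines.

When a pod makes a DNS query, it times out whenever the service forwards the request to a coredns pod instance running on the other kubernetes node.

So I suspect that the network provider is not working properly or that some settings (kernel modules, sysctl, etc.) are not set, because the client does receive a response when the request is forwarded to the locally running coredns pod instance. These are my debugging steps:

Before starting to debug, I raised the log level of coredns by adding log to the coredns configmap.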

# kubectl get -n kube-system configmaps coredns -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

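I deployed my network container as a debugging environment, so that I can test the different coredns instances with network tools such as dig and nslookup.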

kubectl apply -f https://raw.githubusercontent.com/volker-raschek/network-tools/master/network-tools.yml

The following coredns pods and services are available:

kubectl get pod,service -n kube-system -l k8s-app=kube-dns -o wide
NAME                          READY   STATUS    RESTARTS   AGE   IP          NODE                  NOMINATED NODE   READINESS GATES
pod/coredns-64897985d-cgxmv   1/1     Running   0          24h   10.85.0.4   archlinux-x86-64-000  <none>           <none>
pod/coredns-64897985d-l9ftl   1/1     Running   0          24h   10.85.0.3   archlinux-x86-64-001  <none>           <none>

NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
service/kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   24h   k8s-app=kube-dns

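I exec into my network pod and query the IP address of google.com via the coredns service. As you can see, the commands take very different amounts of time. Unfortunately, I cannot reproduce the timeout error via the service: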

# kubectl exec  network-tools -- time dig A google.com @10.96.0.10 +short
142.250.185.238
real    0m 5.02s
user    0m 0.02s
sys 0m 0.00s
# kubectl exec  network-tools -- time dig A google.com @10.96.0.10 +short
142.250.185.238
real    0m 0.03s
user    0m 0.01s
sys 0m 0.00s
# kubectl exec  network-tools -- time dig A google.com @10.96.0.10 +short
142.250.185.238
real    0m 10.03s
user    0m 0.01s
sys 0m 0.01s

Now I restrict the queries to the individual coredns pods. Note that the pod coredns-64897985d-cgxmv with IP 10.85.0.4 is running on the other node.

pod/coredns-64897985d-l9ftl / 10.85.0.3

# kubectl exec  network-tools -- time dig A google.com @10.85.0.3 +short
142.251.36.206
real    0m 0.09s
user    0m 0.00s
sys 0m 0.01s

pod/coredns-64897985d-cgxmv / 10.85.0.4

This is the timeout error when explicitly querying the coredns pod on the other node.

# kubectl exec  network-tools -- time dig A google.com @10.85.0.4 +short

; <<>> DiG 9.16.20 <<>> A google.com @10.85.0.4 +short
;; global options: +cmd
;; connection timed out; no servers could be reached

Command exited with non-zero status 9
real    0m 15.02s
user    0m 0.02s
sys 0m 0.00s
command terminated with exit code 9
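The measured times look like whole multiples of 5 seconds. Assuming dig's documented defaults (+time=5, i.e. 5 seconds per try, and +tries=3), the following Python sketch converts each real time from the runs above into the number of attempts that went unanswered. The names timed_out_attempts and measurements are illustrative only.

```python
# Sketch: estimate how many dig attempts timed out before an answer arrived.
# Assumption: dig's defaults of +time=5 (seconds per try) and +tries=3;
# adjust both constants if other options are passed to dig.
DIG_TIMEOUT_S = 5.0
DIG_TRIES = 3


def timed_out_attempts(real_seconds: float, timeout: float = DIG_TIMEOUT_S) -> int:
    """Number of whole per-try timeouts contained in the elapsed wall time."""
    return int(real_seconds // timeout)


# "real" times copied from the dig runs above (service IP and pod IPs).
measurements = {
    "10.96.0.10 run 1": 5.02,
    "10.96.0.10 run 2": 0.03,
    "10.96.0.10 run 3": 10.03,
    "10.85.0.3": 0.09,
    "10.85.0.4": 15.02,
}

for label, real in measurements.items():
    lost = timed_out_attempts(real)
    status = "no answer" if lost >= DIG_TRIES else "answered"
    print(f"{label}: {real:.2f}s -> {lost} timed-out attempt(s), {status}")
```

Under that assumption, the 5.02 s and 10.03 s queries through the service lost one and two attempts respectively before an answer arrived, while the 15.02 s query to 10.85.0.4 used up all three tries. That pattern is consistent with the service sometimes picking the unreachable remote pod.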

The following logs were written by the coredns pods:

pod/coredns-64897985d-l9ftl / 10.85.0.3

# kubectl logs -n kube-system coredns-64897985d-l9ftl 
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.6
linux/amd64, go1.17.1, 13a9191
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/reload: Running configuration MD5 = 3d3f6363f05ccd60e0f885f0eca6c5ff
[INFO] Reloading complete
[INFO] 127.0.0.1:54962 - 9983 "HINFO IN 4683476401105705616.5032820535498752139. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.058383302s
[INFO] 10.85.0.1:24999 - 26748 "A IN google.com. udp 51 false 4096" NOERROR qr,rd,ra 1549 0.070006969s
[INFO] 10.85.0.1:6142 - 9467 "A IN google.com. udp 51 false 4096" NOERROR qr,aa,rd,ra 1549 0.000959536s
[INFO] 10.85.0.1:2544 - 20425 "A IN google.com. udp 51 false 4096" NOERROR qr,aa,rd,ra 1549 0.00065977s
[INFO] 10.85.0.1:26782 - 372 "A IN google.com. udp 51 false 4096" NOERROR qr,aa,rd,ra 1549 0.001292768s
[INFO] 10.85.0.1:62687 - 27302 "A IN google.com. udp 51 false 4096" 
...

pod/coredns-64897985d-cgxmv / 10.85.0.4

# kubectl logs -n kube-system coredns-64897985d-cgxmv
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.6
linux/amd64, go1.17.1, 13a9191
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/reload: Running configuration MD5 = 3d3f6363f05ccd60e0f885f0eca6c5ff
[INFO] Reloading complete
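To compare the two logs systematically, a small Python sketch can count the query lines each pod actually received. It assumes the default line format of the CoreDNS log plugin as it appears in the output above; QUERY_RE and count_queries are hypothetical names, not part of CoreDNS.

```python
import re
from collections import Counter

# Matches a complete query line in the CoreDNS `log` plugin format seen above, e.g.
# [INFO] 10.85.0.1:24999 - 26748 "A IN google.com. udp 51 false 4096" NOERROR qr,rd,ra 1549 0.070006969s
QUERY_RE = re.compile(
    r'^\[INFO\] (?P<client>\S+):(?P<port>\d+) - (?P<id>\d+) '
    r'"(?P<qtype>\S+) (?P<qclass>\S+) (?P<name>\S+) (?P<proto>\S+) \d+ \S+ \d+" '
    r'(?P<rcode>\S+) (?P<flags>\S+) (?P<rsize>\d+) (?P<duration>[\d.]+)s$'
)


def count_queries(log_text: str) -> Counter:
    """Count complete query lines per (client, rcode).

    Startup messages and truncated lines (no rcode yet) do not match and are skipped.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = QUERY_RE.match(line.strip())
        if match:
            counts[(match["client"], match["rcode"])] += 1
    return counts


# Excerpts copied from the two `kubectl logs` outputs above.
log_l9ftl = """\
[INFO] Reloading complete
[INFO] 10.85.0.1:24999 - 26748 "A IN google.com. udp 51 false 4096" NOERROR qr,rd,ra 1549 0.070006969s
[INFO] 10.85.0.1:6142 - 9467 "A IN google.com. udp 51 false 4096" NOERROR qr,aa,rd,ra 1549 0.000959536s
[INFO] 10.85.0.1:62687 - 27302 "A IN google.com. udp 51 false 4096"
"""
log_cgxmv = "[INFO] Reloading complete\n"

print("l9ftl:", dict(count_queries(log_l9ftl)))
print("cgxmv:", dict(count_queries(log_cgxmv)))
```

On these excerpts, coredns-64897985d-l9ftl shows NOERROR answers to 10.85.0.1, while coredns-64897985d-cgxmv has no query lines at all. That suggests, though it does not prove, that the cross-node packets never reach the remote pod rather than being answered slowly.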

To narrow down the problem, I reinstalled my cluster via Ansible and installed Calico instead of Flannel using the command below. The same problem occurs there.

$ helm install calico projectcalico/tigera-operator --version v3.21.3

I used the kubeadm installation guide to initialize the cluster, running the following kubeadm init command:

$ kubeadm init \
  --apiserver-advertise-address=192.168.179.101 \
  --apiserver-cert-extra-sans=api.example.com \
  --control-plane-endpoint=192.168.179.100 \
  --cri-socket=unix:///var/run/crio/crio.sock \
  --pod-network-cidr=10.244.0.0/16 \
  --upload-certs
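One detail that may be worth checking: both coredns pods listed earlier have addresses in 10.85.0.0/16, while the cluster was initialized with --pod-network-cidr=10.244.0.0/16. A pod IP outside that range hints (an inference, not something stated in this post) that addresses were assigned from some other CNI configuration on the nodes rather than by the installed network plugin. A minimal Python check; outside_pod_cidr is a hypothetical helper:

```python
import ipaddress

# Value passed to `kubeadm init --pod-network-cidr` above.
POD_NETWORK_CIDR = ipaddress.ip_network("10.244.0.0/16")

# Pod IPs reported by `kubectl get pod -o wide` earlier in this post.
coredns_pod_ips = {
    "coredns-64897985d-cgxmv": "10.85.0.4",
    "coredns-64897985d-l9ftl": "10.85.0.3",
}


def outside_pod_cidr(ips: dict, cidr=POD_NETWORK_CIDR) -> list:
    """Return the names of pods whose IP is not inside the pod network CIDR."""
    return sorted(
        name for name, ip in ips.items()
        if ipaddress.ip_address(ip) not in cidr
    )


print(outside_pod_cidr(coredns_pod_ips))
```

If the check reports pods, inspecting the CNI configuration files on each node is a reasonable next step.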

The kernel module br_netfilter is loaded and the sysctl properties are set, but the problem still exists. I have run out of ideas and need advice from the experts here. Below is a list of my kernel modules, sysctl settings and other information.
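Since br_netfilter and the sysctls are reported as set, it may help to verify on every node the three values listed in the Kubernetes container-runtime prerequisites directly from /proc/sys. A minimal Python sketch; check_sysctls is a hypothetical helper:

```python
from pathlib import Path

# Settings listed in the Kubernetes container-runtime prerequisites.
REQUIRED_SYSCTLS = {
    "net.bridge.bridge-nf-call-iptables": "1",
    "net.bridge.bridge-nf-call-ip6tables": "1",
    "net.ipv4.ip_forward": "1",
}


def check_sysctls(proc_sys: Path = Path("/proc/sys"), required: dict = REQUIRED_SYSCTLS) -> dict:
    """Map each sysctl that differs from the requirement to its actual value.

    A value of None means the file does not exist (for the bridge-nf-call
    keys this usually means br_netfilter is not loaded).
    """
    mismatches = {}
    for key, expected in required.items():
        path = proc_sys / key.replace(".", "/")
        actual = path.read_text().strip() if path.exists() else None
        if actual != expected:
            mismatches[key] = actual
    return mismatches


if __name__ == "__main__":
    problems = check_sysctls()
    for key, actual in problems.items():
        print(f"{key}: expected {REQUIRED_SYSCTLS[key]}, got {actual}")
    if not problems:
        print("all required sysctls are set")
```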

I hope someone can help me.

kernel information

$ uname -a
Linux archlinux-x86-64-000 5.10.90-1-lts #1 SMP Wed, 05 Jan 2022 13:07:40 +0000 x86_64 GNU/Linux

kernel modules

$ lsmod | sort
ac97_bus               16384  1 snd_soc_core
aesni_intel           372736  0
agpgart                53248  4 intel_agp,intel_gtt,ttm,drm
atkbd                  36864  0
bpf_preload            16384  0
bridge                274432  1 br_netfilter
br_netfilter           32768  0
cec                    61440  1 drm_kms_helper
cfg80211              983040  0
crc16                  16384  1 ext4
crc32c_generic         16384  0
crc32c_intel           24576  3
crc32_pclmul           16384  0
crct10dif_pclmul       16384  1
cryptd                 24576  2 crypto_simd,ghash_clmulni_intel
crypto_simd            16384  1 aesni_intel
drm                   577536  5 drm_kms_helper,qxl,drm_ttm_helper,ttm
drm_kms_helper        278528  3 qxl
drm_ttm_helper         16384  1 qxl
ext4                  933888  1
failover               16384  1 net_failover
fat                    86016  1 vfat
fb_sys_fops            16384  1 drm_kms_helper
fuse                  167936  1
ghash_clmulni_intel    16384  0
glue_helper            16384  1 aesni_intel
i2c_i801               36864  0
i2c_smbus              20480  1 i2c_i801
i8042                  36864  0
intel_agp              24576  0
intel_gtt              24576  1 intel_agp
intel_pmc_bxt          16384  1 iTCO_wdt
intel_rapl_common      28672  1 intel_rapl_msr
intel_rapl_msr         20480  0
ip6_udp_tunnel         16384  1 vxlan
ip_set                 57344  0
ip_tables              32768  0
ipt_REJECT             16384  0
ip_vs                 184320  6 ip_vs_rr,ip_vs_sh,ip_vs_wrr
ip_vs_rr               16384  0
ip_vs_sh               16384  0
ip_vs_wrr              16384  0
irqbypass              16384  1 kvm
iTCO_vendor_support    16384  1 iTCO_wdt
iTCO_wdt               16384  0
jbd2                  151552  1 ext4
joydev                 28672  0
kvm                   933888  1 kvm_intel
kvm_intel             331776  0
ledtrig_audio          16384  1 snd_hda_codec_generic
libcrc32c              16384  4 nf_conntrack,nf_nat,nf_tables,ip_vs
libps2                 20480  2 atkbd,psmouse
llc                    16384  2 bridge,stp
lpc_ich                28672  0
mac_hid                16384  0
mbcache                16384  1 ext4
Module                  Size  Used by
mousedev               24576  0
net_failover           24576  1 virtio_net
nf_conntrack          172032  6 xt_conntrack,nf_nat,xt_nat,nf_conntrack_netlink,xt_MASQUERADE,ip_vs
nf_conntrack_netlink    57344  0
nf_defrag_ipv4         16384  1 nf_conntrack
nf_defrag_ipv6         24576  2 nf_conntrack,ip_vs
nf_nat                 57344  3 xt_nat,nft_chain_nat,xt_MASQUERADE
nfnetlink              20480  4 nft_compat,nf_conntrack_netlink,nf_tables,ip_set
nf_reject_ipv4         16384  1 ipt_REJECT
nf_tables             249856  183 nft_compat,nft_counter,nft_chain_nat
nft_chain_nat          16384  7
nft_compat             20480  122
nft_counter            16384  84
nls_iso8859_1          16384  0
overlay               147456  18
pcspkr                 16384  0
psmouse               184320  0
qemu_fw_cfg            20480  0
qxl                    73728  0
rapl                   16384  0
rfkill                 28672  2 cfg80211
rng_core               16384  1 virtio_rng
serio                  28672  6 serio_raw,atkbd,psmouse,i8042
serio_raw              20480  0
snd                   114688  8 snd_hda_codec_generic,snd_hwdep,snd_hda_intel,snd_hda_codec,snd_timer,snd_compress,snd_soc_core,snd_pcm
snd_compress           32768  1 snd_soc_core
snd_hda_codec         172032  2 snd_hda_codec_generic,snd_hda_intel
snd_hda_codec_generic    98304  1
snd_hda_core          110592  3 snd_hda_codec_generic,snd_hda_intel,snd_hda_codec
snd_hda_intel          57344  0
snd_hwdep              16384  1 snd_hda_codec
snd_intel_dspcfg       28672  1 snd_hda_intel
snd_pcm               147456  7 snd_hda_intel,snd_hda_codec,soundwire_intel,snd_compress,snd_soc_core,snd_hda_core,snd_pcm_dmaengine
snd_pcm_dmaengine      16384  1 snd_soc_core
snd_soc_core          327680  1 soundwire_intel
snd_timer              49152  1 snd_pcm
soundcore              16384  1 snd
soundwire_bus          90112  3 soundwire_intel,soundwire_generic_allocation,soundwire_cadence
soundwire_cadence      36864  1 soundwire_intel
soundwire_generic_allocation    16384  1 soundwire_intel
soundwire_intel        45056  1 snd_intel_dspcfg
stp                    16384  1 bridge
syscopyarea            16384  1 drm_kms_helper
sysfillrect            16384  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
ttm                   114688  2 qxl,drm_ttm_helper
udp_tunnel             20480  1 vxlan
usbhid                 65536  0
veth                   32768  0
vfat                   24576  0
virtio_balloon         24576  0
virtio_blk             20480  2
virtio_console         40960  0
virtio_net             61440  0
virtio_pci             28672  0
virtio_rng             16384  0
vxlan                  77824  0
xhci_pci               20480  0
xhci_pci_renesas       20480  1 xhci_pci
x_tables               53248  11 xt_conntrack,xt_statistic,nft_compat,xt_tcpudp,xt_addrtype,xt_nat,xt_comment,ipt_REJECT,ip_tables,xt_MASQUERADE,xt_mark
xt_addrtype            16384  2
xt_comment             16384  64
xt_conntrack           16384  13
xt_mark                16384  12
xt_MASQUERADE          20480  6
xt_nat                 16384  7
xt_statistic           16384  3
xt_tcpudp              20480  15

sysctl

My sysctl properties are on Pastebin, because they exceed Server Fault's maximum number of characters.

1 Answer

Mikolaj S. · Answer 1 · 2022-01-22T01:12:44+08:00

Posting a community wiki answer based on the GitHub topic "Files included in CentOS RPM interfere with CNI operation" and the ArchLinux wiki page "CRI-O". Feel free to expand it.

The issue is known and described in the ArchLinux wiki:

Warning: Arch installs the plugins provided by cni-plugins to both /usr/lib/cni and /opt/cni/bin but most other plugins (e.g. in-cluster deployments, kubelet managed plugins, etc) by default only install to the second directory.

CRI-O is only configured to look for plugins in the first directory and as a result, any plugins in the second directory are unavailable without some configuration changes.

The workaround is presented in this post by the user bart0sh:

I'm using the following workaround:

install cri-o

delete everything from /etc/cni/net.d

setup node with kubeadm

install cni plugin (weave, flannel, calico, etc)

In addition, the user volker-raschek presented the following configuration:

Besides the CNI configurations under /etc/cni/net.d, the extension of the plugin_dirs property was missing in the crio.conf configuration, because crio doesn't seem to look in /opt/cni for the rolled out plugins by default. I created a drop-in file.
$ cat /etc/crio/crio.conf.d/00-plugin-dir.conf
[crio.network]
plugin_dirs = [
 "/opt/cni/bin/",
 "/usr/lib/cni",
]
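The effect of extending plugin_dirs can be pictured as a search over the listed directories for an executable named after the "type" in each CNI config. The Python sketch below illustrates that idea under the assumption of a first-match lookup in the configured order; it is not CRI-O's implementation, and find_plugin as well as the plugin names in the loop are hypothetical examples.

```python
import os
from pathlib import Path

# The plugin_dirs list from the drop-in file above, in the same order.
PLUGIN_DIRS = ["/opt/cni/bin/", "/usr/lib/cni"]


def find_plugin(name: str, dirs=PLUGIN_DIRS):
    """Return the path of the first executable file called `name` in `dirs`, or None.

    Assumption: a first-match search over the configured directories;
    this only illustrates the idea and is not CRI-O's actual code.
    """
    for directory in dirs:
        candidate = Path(directory) / name
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)
    return None


if __name__ == "__main__":
    # Hypothetical plugin names; use the "type" values from your /etc/cni/net.d files.
    for plugin in ["bridge", "portmap", "flannel", "calico"]:
        print(plugin, "->", find_plugin(plugin) or "NOT FOUND")
```

Run on a node, a NOT FOUND for the network plugin's own binary would match the situation the ArchLinux wiki describes, where the binary exists only in a directory CRI-O does not search.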
