AskOverflow.Dev


Questions tagged [kubeadm] (server)

Tamino Elgert
Asked: 2022-02-28 06:24:15 +0800 CST

Kubernetes Nginx Ingress with cert-manager and Let's Encrypt does not allow wildcards in domain names

  • 0

I have a self-hosted Kubernetes cluster with an Nginx Ingress. Cert-manager is also running on the cluster, and I am trying to obtain valid SSL certificates from Let's Encrypt. Everything works, and I get valid certificates for example.com, www.example.com, or app1.example.com, but not for the generic wildcard *.example.com. If I enter a wildcard under spec.tls.hosts in the ingress in any way, no certificate is generated for me. I get the output

kubectl get certificate

NAME              READY   SECRET            AGE
tls-test-cert     False   tls-electi-cert   20h

kubectl get CertificateRequest

NAME                    APPROVED   DENIED   READY   ISSUER                REQUESTOR                                         AGE
tls-test-cert-8jw75     True                False   letsencrypt-staging   system:serviceaccount:cert-manager:cert-manager   18m

kubectl describe CertificateRequest

[...]
Status:
  Conditions:
    Last Transition Time:  2022-02-27T13:54:38Z
    Message:               Certificate request has been approved by cert-manager.io
    Reason:                cert-manager.io
    Status:                True
    Type:                  Approved
    Last Transition Time:  2022-02-27T13:54:38Z
    Message:               Waiting on certificate issuance from order gateway/tls-test-cert-8jw75-1425588341: "pending"
    Reason:                Pending
    Status:                False
    Type:                  Ready
Events:
  Type    Reason           Age   From          Message
  ----    ------           ----  ----          -------
  Normal  cert-manager.io  18m   cert-manager  Certificate request has been approved by cert-manager.io
  Normal  OrderCreated     18m   cert-manager  Created Order resource gateway/tls-test-cert-8jw75-1425588341
  Normal  OrderPending     18m   cert-manager  Waiting on certificate issuance from order gateway/tls-test-cert-8jw75-1425588341: ""

My Nginx ingress (I swapped my domain for example.com for this post):

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-management
  namespace: gateway
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: "letsencrypt-staging"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
spec:
  ingressClassName: nginx
  tls:
  - secretName: tls-test-cert
    hosts:
      - example.com
      - '*.example.com'
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: test-gateway
                port:
                  number: 80
    - host: '*.example.com'
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: test-gateway
                port:
                  number: 80

The issuer (I redacted my email here):

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
  namespace: cert-manager
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: *******
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
      - http01:
          ingress:
            class: nginx

My reverse proxy (test-gateway) definitely works and forwards all subdomains to my site. Thanks in advance for any ideas about what might be causing this.

nginx kubernetes kubeadm lets-encrypt cert-manager
  • 1 answer
  • 1061 Views
arjunbnair
Asked: 2022-01-03 02:49:07 +0800 CST

Kubernetes API server fails to register the master node

  • 1

I was trying to create a Kubernetes cluster using kubeadm. I spun up an Ubuntu 18.04 server, installed docker (making sure docker.service was running), and installed kubeadm, kubelet, and kubectl.

These are the steps I performed:

sudo apt-get update
sudo apt install apt-transport-https ca-certificates curl software-properties-common -y
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu `lsb_release -cs` test"
sudo apt update
sudo apt install docker-ce
sudo systemctl enable docker
sudo systemctl start docker

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo apt-get install kubeadm kubelet kubectl -y
sudo apt-mark hold kubeadm kubelet kubectl 
kubeadm version
swapoff -a

Also, to configure the Docker cgroup driver, I edited /etc/systemd/system/kubelet.service.d/10-kubeadm.conf. In the file I added Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd" and commented out Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml".

/etc/systemd/system/kubelet.service.d/10-kubeadm.conf for reference:

# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
#Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

After this I ran systemctl daemon-reload and systemctl restart kubelet. kubelet.service was running fine.
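
As an aside, the Kubernetes documentation usually recommends aligning the two components by switching Docker itself to the systemd cgroup driver rather than overriding the kubelet drop-in; a common /etc/docker/daemon.json sketch (the log and storage settings are the documented defaults, not taken from this post):

```json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" },
  "storage-driver": "overlay2"
}
```

After editing, Docker needs a restart (`systemctl restart docker`) before retrying `kubeadm init`.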

Next, I ran sudo kubeadm init --pod-network-cidr=10.244.0.0/16 and got the following error:

root@ip-172-31-1-238:/home/ubuntu# kubeadm init --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.23.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [ip-172-31-1-238 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.31.1.238]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [ip-172-31-1-238 localhost] and IPs [172.31.1.238 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [ip-172-31-1-238 localhost] and IPs [172.31.1.238 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

    Unfortunately, an error has occurred:  
            timed out waiting for the condition  

    This error is likely caused by:  
            - The kubelet is not running  
            - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)  

    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:  
            - 'systemctl status kubelet'  
            - 'journalctl -xeu kubelet'  

    Additionally, a control plane component may have crashed or exited when started by the container runtime.  
    To troubleshoot, list all containers using your preferred container runtimes CLI.  

    Here is one example how you may list all Kubernetes containers running in docker:  
            - 'docker ps -a | grep kube | grep -v pause'  
             Once you have found the failing container, you can inspect its logs with:  
            - 'docker logs CONTAINERID'  

After running systemctl status kubelet.service, the kubelet seems to be running fine.
However, after running journalctl -xeu kubelet, I get the following logs:

kubelet.go:2347] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized"
kubelet.go:2422] "Error getting node" err="node \"ip-172-31-1-238\" not found"
kubelet.go:2422] "Error getting node" err="node \"ip-172-31-1-238\" not found"
controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://172.31.1.238:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/ip-172-31-1-238?timeout=10s": dial tcp 172.31.1.238:6443: connect: connection refused
kubelet.go:2422] "Error getting node" err="node \"ip-172-31-1-238\" not found"
kubelet.go:2422] "Error getting node" err="node \"ip-172-31-1-238\" not found"
kubelet_node_status.go:70] "Attempting to register node" node="ip-172-31-1-238"
kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://172.31.1.238:6443/api/v1/nodes\": dial tcp 172.31.1.238:6443: connect: connection refused" node="ip-172-31-1-238"
kubelet.go:2422] "Error getting node" err="node \"ip-172-31-1-238\" not found"

Versions:
Docker: Docker version 20.10.12, build e91ed57
Kubeadm: {Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:39:51Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}

I am not sure whether this is a connectivity problem between the kube API server and the kubelet.
Does anyone know how to fix this?

linux docker kubernetes kubeadm kubectl
  • 3 answers
  • 11938 Views
arjunbnair
Asked: 2021-12-31 12:08:05 +0800 CST

kubelet service is not running on the Kubernetes master node (it keeps fluctuating)

  • 0

I was trying to create a Kubernetes cluster using kubeadm. I spun up an Ubuntu 18.04 server, installed docker (making sure docker.service was running), and installed kubeadm, kubelet, and kubectl.

These are the steps I performed:

sudo apt-get update
sudo apt-get install docker.io -y
sudo systemctl enable docker
sudo systemctl start docker
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add
sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"
sudo apt-get install kubeadm kubelet kubectl -y
sudo apt-mark hold kubeadm kubelet kubectl 
kubeadm version
swapoff -a

sudo hostnamectl set-hostname master-node
sudo kubeadm init --pod-network-cidr=10.244.0.0/16

After running sudo kubeadm init --pod-network-cidr=10.244.0.0/16, I got the following error:

root@ip-172-31-10-50:/home/ubuntu# sudo kubeadm init --pod-network-cidr=192.168.0.0/16
[init] Using Kubernetes version: v1.23.1
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.

I tried running kubeadm init --pod-network-cidr with both Flannel's CIDR (10.244.0.0/16) and Calico's CIDR (192.168.0.0/16). However, I got the same error.

Also, I observed that the status of kubelet on my EC2 instance keeps fluctuating. When I run systemctl status kubelet.service, sometimes it is not running and sometimes it is. This happens on its own. I think this is why kubeadm init fails, because kubelet-check clearly says: "It seems like the kubelet isn't running or healthy".

After running systemctl status kubelet.service, the error is:

root@ip-172-31-10-50:/home/ubuntu# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Wed 2021-12-29 17:52:35 UTC; 3s ago
     Docs: https://kubernetes.io/docs/home/
  Process: 22901 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
 Main PID: 22901 (code=exited, status=1/FAILURE)

When I keep running systemctl status kubelet.service, a few seconds later kubelet.service appears to be running, and a few seconds after that it fails again.

...skipping...
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Thu 2021-12-30 18:50:49 UTC; 125ms ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 12895 (kubelet)
    Tasks: 9 (limit: 4686)
   CGroup: /system.slice/kubelet.service
           └─12895 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/conf

I am not sure why kubelet fluctuates this way.
Does anyone know how to fix this?
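
As background, a kubelet that restarts every few seconds before `kubeadm init` has finished is expected behavior: the unit starts with `--config=/var/lib/kubelet/config.yaml`, exits because kubeadm has not written that file yet, and systemd's auto-restart brings it back up. The concrete exit reason can be read from the journal, e.g. (a sketch):

```shell
journalctl -xeu kubelet --no-pager | tail -n 50
```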

ubuntu amazon-web-services docker kubernetes kubeadm
  • 2 answers
  • 2485 Views
Ted
Asked: 2021-12-17 12:08:53 +0800 CST

kubeadm token create fails on a self-signed CA certificate

  • 1

I am trying to deploy a k8s cluster with kubespray on an OpenStack cluster of Ubuntu servers. The installation fails when kubeadm tries to initialize the cloud provider by submitting a POST request to the keystone endpoint xxx:5000/v3/ to create a bootstrap token. kubelet.service fails to start because the keystone endpoint is signed by a self-signed certificate; see below. I saved the CA certificate from the keystone endpoint and put it on the master node in /etc/kubernetes/ssl/, where kubelet and kubeadm look for certificates. I also updated /etc/kubernetes/kubeadm-config.yaml per the documentation here and here, and I have updated the kubeadm join-defaults configuration to include "unsafeSkipCAVerification: true", but kubelet.service still fails on the self-signed certificate. kubeadm should authenticate via the username/password stored in the /etc/kubernetes/cloud_config file, and I have verified that those values are correct. I am not sure where I can change this behavior. Any guidance would be appreciated.

ubuntu:/etc/kubernetes# kubeadm config print join-defaults
apiVersion: kubeadm.k8s.io/v1beta3
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
  apiServerEndpoint: kube-apiserver:6443
  token: abcdef.0123456789abcdef
  unsafeSkipCAVerification: true
  timeout: 5m0s
  tlsBootstrapToken: abcdef.0123456789abcdef
kind: JoinConfiguration
  nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: mdap-node-01
  taints: null

The kubelet stack trace:

 Dec 15 22:19:51 ubuntu kubelet[388780]: E1215 22:19:51.760564  388780 server.go:294] "Failed to run kubelet" err="failed to run Kubelet: could not init cloud provider \"openstack\": Post \"https://XXX.XXX.XXX.132:5000/v3/auth/tokens\": x509: certificate signed by unknown authority"
 Dec 15 22:19:51 ubuntu systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE


FAILED - RETRYING: Create kubeadm token for joining nodes with 24h expiration (default) (4 retries left).Result was: {
"attempts": 2,
"changed": false,
"cmd": [
    "/usr/local/bin/kubeadm",
    "--kubeconfig",
    "/etc/kubernetes/admin.conf",
    "token",
    "create"
],
"delta": "0:01:15.035670",
"end": "2021-12-16 15:03:22.901080",
"invocation": {
    "module_args": {
        "_raw_params": "/usr/local/bin/kubeadm --kubeconfig /etc/kubernetes/admin.conf token create",
        "_uses_shell": false,
        "argv": null,
        "chdir": null,
        "creates": null,
        "executable": null,
        "removes": null,
        "stdin": null,
        "stdin_add_newline": true,
        "strip_empty_ends": true,
        "warn": true
    }
},
"msg": "non-zero return code",
"rc": 1,
"retries": 6,
"start": "2021-12-16 15:02:07.865410",
"stderr": "timed out waiting for the condition\nTo see the stack trace of this error execute with --v=5 or higher",
"stderr_lines": [
    "timed out waiting for the condition",
    "To see the stack trace of this error execute with --v=5 or higher"
],
"stdout": "",
"stdout_lines": []
}
kubernetes openstack kubeadm
  • 1 answer
  • 640 Views
Oana
Asked: 2021-12-08 05:20:24 +0800 CST

VMWare Workstation and Windows 10: cannot connect to a server hosted on a VM

  • 1

I have configured a private 2-node Kubernetes cluster on VMWare Workstation 15. I am using MetalLB and Calico. The ingress service and the ingress look like:

xxx@c1-cp1:~/Desktop$ kubectl get svc -n ingress-controller-2
NAME                                         TYPE           CLUSTER-IP       EXTERNAL-IP      PORT(S)                      AGE
wsnginx-ingress-nginx-controller             LoadBalancer   10.109.117.222   192.168.44.136   80:30167/TCP,443:30680/TCP   24h
wsnginx-ingress-nginx-controller-admission   ClusterIP      10.105.103.165   <none>           443/TCP                      24h
xxx@c1-cp1:~/Desktop$ kubectl get ing apollo-ingress
NAME             CLASS     HOSTS                ADDRESS          PORTS   AGE
apollo-ingress   wsnginx   test.xxx.com   192.168.44.136   80      3h17m

I am using a NAT network adapter and static IPs. My port forwarding is configured as follows:

[screenshot: NAT port-forwarding configuration]

curl -D- http://192.168.44.136 -H 'Host: test.xxx.com' from the VM returns a 200 status, but I cannot reach it from the Win10 host at 127.0.0.1:8080, because I get a 404 NGINX Not Found.

Can you help me? What am I doing wrong? How can I expose this on my private network? Thanks!
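
As a general note, the NGINX ingress controller serves its default 404 page whenever no rule matches the request's Host header, so browsing to 127.0.0.1:8080 (Host: 127.0.0.1) will not match the test.xxx.com rule. From the Windows host the header can be supplied explicitly, assuming the NAT rule forwards port 8080 to the VM (a sketch):

```shell
curl -D- http://127.0.0.1:8080/ -H "Host: test.xxx.com"
```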


Update: I am not sure whether this is the right approach, but I managed to connect from the host by changing the Ingress resource a little: I commented out the host parameter, like

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: apollo-ingress
spec:
  ingressClassName: wsnginx
  rules:
    #- host: test.xxx.com
    - http:
        paths:
          - backend:
              service:
                name: apollo-service
                port: 
                  number: 80
            path: /
            pathType: Prefix

Now my ingress looks like this:

NAMESPACE   NAME                                                   CLASS     HOSTS                        ADDRESS          PORTS     AGE
default     ingress.networking.k8s.io/apollo-ingress               wsnginx   *                            192.168.44.136   80        3h31m

It seems I can now reach it from my host machine as well. I have a REST API, so I just opened http://127.0.0.1:8080 from the browser.

vmware-workstation kubernetes kubeadm bare-metal nginx-ingress
  • 1 answer
  • 157 Views
Daigo
Asked: 2021-12-01 22:10:33 +0800 CST

How can I modify the CoreDNS ConfigMap before bootstrapping a cluster with kubeadm?

  • 1

I need to build my on-premises Kubernetes cluster with kubeadm.

Since my environment has no DNS, I have to modify CoreDNS's ConfigMap so that it does not contain the forward section.

After deploying the cluster I can edit the ConfigMap with kubectl edit cm coredns -n kube-system, but it takes some time for CoreDNS to work properly after the modification, which could be a problem in my production environment.

Is it possible to edit this ConfigMap before running kubeadm init?
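
One possible approach (a sketch; flag availability depends on the kubeadm version in use): skip the CoreDNS addon phase during init and deploy a customized CoreDNS afterwards, so the modified ConfigMap is in place before CoreDNS ever starts:

```shell
# Bootstrap the control plane without deploying the stock CoreDNS addon
kubeadm init --skip-phases=addon/coredns

# Apply a customized CoreDNS ConfigMap/Deployment here, or re-run
# the addon phase separately once the desired ConfigMap exists:
kubeadm init phase addon coredns --kubeconfig /etc/kubernetes/admin.conf
```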

kubernetes kubeadm
  • 1 answer
  • 268 Views
Daigo
Asked: 2021-11-25 00:25:37 +0800 CST

Offline installation of Kubernetes fails when using containerd as the CRI

  • 1

For certain reasons I have to build a bare-metal Kubernetes cluster without an Internet connection.

Since dockershim is deprecated, I decided to use containerd as the CRI, but the offline installation with kubeadm fails with a timeout when running kubeadm init.

    Unfortunately, an error has occurred:
            timed out waiting for the condition

    This error is likely caused by:
            - The kubelet is not running
            - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
            - 'systemctl status kubelet'
            - 'journalctl -xeu kubelet'

I can see a lot of error logs from journalctl -u kubelet -f:

11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.473188    9299 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://133.117.20.57:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/rhel8?timeout=10s": dial tcp 133.117.20.57:6443: connect: connection refused
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.533555    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:25 rhel8 kubelet[9299]: I1124 16:25:25.588986    9299 kubelet_node_status.go:71] "Attempting to register node" node="rhel8"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.589379    9299 kubelet_node_status.go:93] "Unable to register node with API server" err="Post \"https://133.117.20.57:6443/api/v1/nodes\": dial tcp 133.117.20.57:6443: connect: connection refused" node="rhel8"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.634625    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.735613    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.835815    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:25 rhel8 kubelet[9299]: E1124 16:25:25.936552    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.036989    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.137464    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.238594    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.338704    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:26 rhel8 kubelet[9299]: E1124 16:25:26.394465    9299 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"rhel8.16ba6aab63e58bd8", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"rhel8", UID:"rhel8", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"Starting", Message:"Starting kubelet.", Source:v1.EventSource{Component:"kubelet", Host:"rhel8"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc05f9812b2b227d8, ext:5706873656, loc:(*time.Location)(0x55a228f25680)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc05f9812b2b227d8, ext:5706873656, loc:(*time.Location)(0x55a228f25680)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://133.117.20.57:6443/api/v1/namespaces/default/events": dial tcp 133.117.20.57:6443: connect: connection refused'(may retry after sleeping)
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.143503    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.244526    9299 kubelet.go:2407] "Error getting node" err="node \"rhel8\" not found"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.302890    9299 remote_runtime.go:116] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.2\": failed to pull image \"k8s.gcr.io/pause:3.2\": failed to pull and unpack image \"k8s.gcr.io/pause:3.2\": failed to resolve reference \"k8s.gcr.io/pause:3.2\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.2\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:39732->[::1]:53: read: connection refused"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.302949    9299 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.2\": failed to pull image \"k8s.gcr.io/pause:3.2\": failed to pull and unpack image \"k8s.gcr.io/pause:3.2\": failed to resolve reference \"k8s.gcr.io/pause:3.2\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.2\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:39732->[::1]:53: read: connection refused" pod="kube-system/kube-scheduler-rhel8"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.302989    9299 kuberuntime_manager.go:815] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"k8s.gcr.io/pause:3.2\": failed to pull image \"k8s.gcr.io/pause:3.2\": failed to pull and unpack image \"k8s.gcr.io/pause:3.2\": failed to resolve reference \"k8s.gcr.io/pause:3.2\": failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.2\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:39732->[::1]:53: read: connection refused" pod="kube-system/kube-scheduler-rhel8"
11 24 16:25:27 rhel8 kubelet[9299]: E1124 16:25:27.303080    9299 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-rhel8_kube-system(e5616b23d0312e4995fcb768f04aabbb)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-rhel8_kube-system(e5616b23d0312e4995fcb768f04aabbb)\\\": rpc error: code = Unknown desc = failed to get sandbox image \\\"k8s.gcr.io/pause:3.2\\\": failed to pull image \\\"k8s.gcr.io/pause:3.2\\\": failed to pull and unpack image \\\"k8s.gcr.io/pause:3.2\\\": failed to resolve reference \\\"k8s.gcr.io/pause:3.2\\\": failed to do request: Head \\\"https://k8s.gcr.io/v2/pause/manifests/3.2\\\": dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:39732->[::1]:53: read: connection refused\"" pod="kube-system/kube-scheduler-rhel8" podUID=e5616b23d0312e4995fcb768f04aabbb

When I do the same thing with an Internet connection, the installation succeeds. And when using docker instead of containerd, the installation completes successfully even without an Internet connection.
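
One detail visible in the logs above: containerd's CRI plugin is trying to pull its default sandbox image k8s.gcr.io/pause:3.2, which is impossible offline. The sandbox image can be pinned to one that has been loaded locally in /etc/containerd/config.toml (excerpt sketch; the tag is an example, use whichever pause image is actually present):

```toml
# /etc/containerd/config.toml (excerpt)
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "k8s.gcr.io/pause:3.5"
```

followed by `systemctl restart containerd`.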

kubernetes kubeadm containerd
  • 1 answer
  • 1309 Views
Daigo
Asked: 2021-10-26 18:04:04 +0800 CST

kubernetes coredns is in CrashLoopBackOff with a "no nameservers found" error

  • 1

I tried to build Kubernetes with kubeadm on my bare-metal server, with containerd as the CRI, but it seems coredns fails to start after installing the CNI (weave-net).

The two coredns pods are now in "CrashLoopBackOff" status, and their logs are:

plugin/forward: no nameservers found

And "kubectl describe pod" shows the following:

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  4m52s (x9 over 13m)    default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled         4m7s                   default-scheduler  Successfully assigned kube-system/coredns-58cf647449-8pq7k to k8s
  Normal   Pulled            3m13s (x4 over 4m6s)   kubelet            Container image "localhost:5000/coredns:v1.8.4" already present on machine
  Normal   Created           3m13s (x4 over 4m6s)   kubelet            Created container coredns
  Normal   Started           3m13s (x4 over 4m6s)   kubelet            Started container coredns
  Warning  Unhealthy         3m13s                  kubelet            Readiness probe failed: Get "http://10.32.0.3:8181/ready": dial tcp 10.32.0.3:8181: connect: connection refused
  Warning  BackOff           2m54s (x12 over 4m5s)  kubelet            Back-off restarting failed container

If I add some settings such as "nameserver 8.8.8.8" to /etc/resolv.conf, the coredns pods start running. However, at the moment I don't use any external DNS at all; and when using Docker as the CRI, coredns runs fine even though there are no settings in /etc/resolv.conf.

Is it possible to handle this without setting some upstream DNS servers in resolv.conf?

Server information:

OS: RedHat Enterprise Linux 8.4
cri: containerd 1.4.11
cni: weave-net 1.16
tools: kubeadm, kubectl, kubelet 1.22.1

I have also tried calico as the CNI, but the result was the same.
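
For background, the stock Corefile installed by kubeadm contains `forward . /etc/resolv.conf`, so the forward plugin fails with "no nameservers found" when the host's resolv.conf lists no nameservers. One way to handle an environment with no upstream DNS is to drop that line from the coredns ConfigMap (a sketch of the stock template with the forward line removed; edit via `kubectl -n kube-system edit cm coredns`):

```
.:53 {
    errors
    health {
       lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       fallthrough in-addr.arpa ip6.arpa
       ttl 30
    }
    prometheus :9153
    cache 30
    loop
    reload
    loadbalance
}
```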

domain-name-system kubernetes kubeadm
  • 2 answers
  • 1460 Views
Chris G.
Asked: 2021-10-14 13:20:48 +0800 CST

Custom root certificates for kubeconfig files

  • 0

Running kubeadm init phase certs apiserver --config kubeadm.yaml

Is it possible to use multiple/custom root certificates for user groups / kubectl / config files?

I ask because I would like to grant access on a per-project basis, then delete the custom root certificate, but keep the "original" root certificate for special kubectl admins.

I have seen that you can use an ssh tunnel as a first line of defense, to protect the root certificate public key. But you still need to distribute the public signing certificate, even if it sits behind ssh public/private keys. So maybe there is a way to use an ssh tunnel, and to put custom certificates into certificatesDir: /etc/kubernetes/pki?

kubeadm.yaml

apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta1
certificatesDir: /etc/kubernetes/pki

I know you can use --insecure-skip-tls-verify in the config file, but that seems like a bad idea.

kubernetes kubeadm
  • 1 answer
  • 90 Views
Daigo
Asked: 2021-10-03 04:13:07 +0800 CST

Kubeadm with containerd cannot use locally loaded images

  • 0

I am trying to build Kubernetes with containerd on a bare-metal server (RHEL8).

There is no Internet connection, so I downloaded the required images (e.g. k8s.gcr.io/kube-scheduler:v1.22.1) manually and loaded them with "ctr image import".

The images appear to have been loaded successfully.

#ctr images ls -q
k8s.gcr.io/coredns/coredns:v1.8.4
k8s.gcr.io/etcd:3.5.0-0
k8s.gcr.io/kube-apiserver:v1.22.1
k8s.gcr.io/kube-controller-manager:v1.22.1
k8s.gcr.io/kube-proxy:v1.22.1
k8s.gcr.io/kube-scheduler:v1.22.1
k8s.gcr.io/pause:3.5

Then I ran "kubeadm init", but it failed with an ImagePull error.

#kubeadm init --kubernetes-version=1.22.1 --cri-socket=/run/containerd/containerd.sock
[init] Using Kubernetes version: v1.22.1
[preflight] Running pre-flight checks
        [WARNING FileExisting-tc]: tc not found in system path
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:

How can I get kubeadm to use the local images? Or can these preflight errors be ignored?

Edit: this procedure (loading the images manually instead of running kubeadm config images pull) worked fine with docker on CentOS7.
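
One containerd-specific detail worth checking: kubelet and kubeadm talk to containerd's CRI plugin, which looks for images in the k8s.io namespace, while a bare `ctr image import` places them in the `default` namespace. Importing into the CRI namespace looks like this (a sketch; the tar filename is an example):

```shell
# Import each saved image archive into the "k8s.io" namespace used by the CRI plugin
ctr -n k8s.io images import kube-scheduler.tar

# List what the CRI plugin will actually see
ctr -n k8s.io images ls -q
```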

kubernetes kubeadm containerd rhel8
  • 2 answers
  • 1745 Views

© 2023 AskOverflow.DEV All Rights Reserved