In a Kubernetes 1.25 cluster with two Ubuntu control-plane nodes, after finding that the certificates under /etc/kubernetes/pki had expired, we ran "kubeadm certs renew all" on both control-plane nodes and then rebooted them.
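For reference, the sequence on each control-plane node was roughly the following (a sketch; kubeadm certs check-expiration and kubeadm certs renew all are the standard kubeadm subcommands, and the reboot was meant to restart the control-plane static pods so they pick up the renewed certificates):

sudo kubeadm certs check-expiration   # list expiry dates of the kubeadm-managed certs
sudo kubeadm certs renew all          # renew all kubeadm-managed certificates
sudo kubeadm certs check-expiration   # confirm the new expiry dates
sudo reboot                           # restart so the static pods reload the certs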
After the reboot, kube-apiserver no longer starts and its docker container keeps crashing.
The docker logs complain that the OIDC server (a Keycloak pod) cannot be reached, so the OIDC connector appears to be required for the API server to start:
{"log":"I0805 13:24:51.011009 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.\n","stream":"stderr","time":"2024-08-05T13:24:51.011063776Z"}
{"log":"I0805 13:24:51.011027 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.\n","stream":"stderr","time":"2024-08-05T13:24:51.011072396Z"}
{"log":"I0805 13:24:51.013060 1 etcd.go:292] \"Using watch cache\" resource=\"customresourcedefinitions.apiextensions.k8s.io\"\n","stream":"stderr","time":"2024-08-05T13:24:51.013133647Z"}
{"log":"E0805 13:25:01.010261 1 oidc.go:335] oidc authenticator: initializing plugin: Get \"https://10.10.2.123/admin/master/console/#/MYORGANIZATION/.well-known/openid-configuration\": dial tcp 10.10.2.123:443: connect: connection refused\n","stream":"stderr","time":"2024-08-05T13:25:01.010450438Z"}
{"log":"E0805 13:25:11.009612 1 oidc.go:335] oidc authenticator: initializing plugin: Get \"https://10.10.2.123/admin/master/console/#/MYORGANIZATION/.well-known/openid-configuration\": dial tcp 10.10.2.123:443: connect: connection refused\n","stream":"stderr","time":"2024-08-05T13:25:11.009785705Z"}
The kubelet logs show that the kubelet cannot find the first Kubernetes node (expected, since kube-apiserver is not working):
Aug 05 16:33:21 mynode1hostname kubelet[3556]: E0805 16:33:21.973266 3556 kubelet.go:2448] "Error getting node" err="node \"mynode1hostname\" not found"
Aug 05 16:33:22 mynode1hostname kubelet[3556]: E0805 16:33:22.032233 3556 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://10.10.2.123:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/mynode1hostname?timeout=10s": dial tcp 10.10.2.123:6443: connect: connection refused
Aug 05 16:33:22 mynode1hostname kubelet[3556]: E0805 16:33:22.074320 3556 kubelet.go:2448] "Error getting node" err="node \"mynode1hostname\" not found"
Aug 05 16:33:22 mynode1hostname kubelet[3556]: E0805 16:33:22.083596 3556 file.go:182] "Provided manifest path is a directory, not recursing into manifest path" path="/etc/kubernetes/manifests/BACKUP"
Aug 05 16:33:22 mynode1hostname kubelet[3556]: E0805 16:33:22.174820 3556 kubelet.go:2448] "Error getting node" err="node \"mynode1hostname\" not found"
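As an aside, the file.go message is only informational: the kubelet treats each file directly under /etc/kubernetes/manifests as a static pod manifest and skips subdirectories, so the BACKUP directory is ignored. To rule it out entirely, the backups can be moved out of the static-pod directory (a sketch; the target path is arbitrary):

sudo mv /etc/kubernetes/manifests/BACKUP /etc/kubernetes/manifests-backup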
The kube-apiserver.yaml file is fairly standard. We tried to remove the OIDC dependency by commenting out its lines, but to no avail: the kubelet still complains that OIDC cannot be reached (so OIDC looks like a circular deadlock, since the kubelet cannot come up without OIDC, and the Keycloak pod cannot start because the kubelet is failing, and so on):
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 10.10.2.123:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=10.10.2.123
    - --anonymous-auth=true
    - --allow-privileged=true
    - --audit-log-path=/var/log/kube-apiserver.log
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-issuer=https://kubernetes.default.svc.cluster.local
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --token-auth-file=/etc/kubernetes/passwords.csv
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    # - --oidc-issuer-url=https://10.10.2.123/admin/master/console/#/MYORGANIZATION
    # - --oidc-client-id=Kubernetes
    # - --oidc-username-claim=username
    # - --oidc-groups-claim=group
    - --v=5
    image: registry.k8s.io/kube-apiserver:v1.25.11
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 10.10.2.123
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-apiserver
    readinessProbe:
      failureThreshold: 3
      httpGet:
        host: 10.10.2.123
        path: /readyz
        port: 6443
        scheme: HTTPS
      periodSeconds: 1
      timeoutSeconds: 15
    resources:
      requests:
        cpu: 250m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 10.10.2.123
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/ca-certificates
      name: etc-ca-certificates
      readOnly: true
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
    - mountPath: /usr/local/share/ca-certificates
      name: usr-local-share-ca-certificates
      readOnly: true
    - mountPath: /usr/share/ca-certificates
      name: usr-share-ca-certificates
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/ca-certificates
      type: DirectoryOrCreate
    name: etc-ca-certificates
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
  - hostPath:
      path: /usr/local/share/ca-certificates
      type: DirectoryOrCreate
    name: usr-local-share-ca-certificates
  - hostPath:
      path: /usr/share/ca-certificates
      type: DirectoryOrCreate
    name: usr-share-ca-certificates
status: {}
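For completeness, this is roughly how we forced the kubelet to recreate the static pod after each manifest edit and inspected the result (a sketch; docker is the runtime here, and the certificate check assumes the renewed apiserver.crt should now be served on 6443):

# Let the kubelet tear down and recreate the static pod:
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sleep 20
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/
# Inspect the container and its logs (docker runtime):
docker ps -a | grep kube-apiserver
docker logs --tail 50 <container-id>
# Check which certificate is actually served on the secure port:
echo | openssl s_client -connect 10.10.2.123:6443 2>/dev/null | openssl x509 -noout -dates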
Has anyone seen something similar after renewing the Kubernetes certificates and rebooting the nodes? I searched SO but found no clear match for this case.
Thanks in advance.