In a Kubernetes 1.25 cluster with two Ubuntu control-plane nodes, after noticing that the certificates under /etc/kubernetes/pki had expired, we ran 'kubeadm promote all' on both control-plane nodes and rebooted them.
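For context, this is roughly the renewal workflow we were aiming for (a sketch based on the kubeadm documentation, not our exact shell history; run on each control-plane node):

# Show which kubeadm-managed certificates are expired or about to expire
sudo kubeadm certs check-expiration

# Renew all kubeadm-managed certificates (apiserver, etcd client, front-proxy, kubeconfig-embedded certs, ...)
sudo kubeadm certs renew all

# If a kubeconfig file still embeds an expired certificate, move it aside and regenerate it
sudo kubeadm init phase kubeconfig all

# Finally restart the control-plane static pods (or reboot the node) so they pick up the new certificates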
Since the reboot, the kube-apiserver no longer starts and its Docker container keeps crash-looping.
The Docker logs complain about the OIDC server (a Keycloak pod) being unreachable, so the OIDC connector looks like a hard requirement for the API server to start (a quick connectivity check follows the log excerpt):
{"log":"I0805 13:24:51.011009 1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.\n","stream":"stderr","time":"2024-08-05T13:24:51.011063776Z"}
{"log":"I0805 13:24:51.011027 1 plugins.go:161] Loaded 11 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,PodSecurity,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.\n","stream":"stderr","time":"2024-08-05T13:24:51.011072396Z"}
{"log":"I0805 13:24:51.013060 1 etcd.go:292] \"Using watch cache\" resource=\"customresourcedefinitions.apiextensions.k8s.io\"\n","stream":"stderr","time":"2024-08-05T13:24:51.013133647Z"}
{"log":"E0805 13:25:01.010261 1 oidc.go:335] oidc authenticator: initializing plugin: Get \"https://10.10.2.123/admin/master/console/#/MYORGANIZATION/.well-known/openid-configuration\": dial tcp 10.10.2.123:443: connect: connection refused\n","stream":"stderr","time":"2024-08-05T13:25:01.010450438Z"}
{"log":"E0805 13:25:11.009612 1 oidc.go:335] oidc authenticator: initializing plugin: Get \"https://10.10.2.123/admin/master/console/#/MYORGANIZATION/.well-known/openid-configuration\": dial tcp 10.10.2.123:443: connect: connection refused\n","stream":"stderr","time":"2024-08-05T13:25:11.009785705Z"}
The kubelet logs say the kubelet cannot find the first Kubernetes node (expected, since the kube-apiserver is not running; a quick check of the node state follows the excerpt):
Aug 05 16:33:21 mynode1hostname kubelet[3556]: E0805 16:33:21.973266 3556 kubelet.go:2448] "Error getting node" err="node \"mynode1hostname\" not found"
Aug 05 16:33:22 mynode1hostname kubelet[3556]: E0805 16:33:22.032233 3556 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://10.10.2.123:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/mynode1hostname?timeout=10s": dial tcp 10.10.2.123:6443: connect: connection refused
Aug 05 16:33:22 mynode1hostname kubelet[3556]: E0805 16:33:22.074320 3556 kubelet.go:2448] "Error getting node" err="node \"mynode1hostname\" not found"
Aug 05 16:33:22 mynode1hostname kubelet[3556]: E0805 16:33:22.083596 3556 file.go:182] "Provided manifest path is a directory, not recursing into manifest path" path="/etc/kubernetes/manifests/BACKUP"
Aug 05 16:33:22 mynode1hostname kubelet[3556]: E0805 16:33:22.174820 3556 kubelet.go:2448] "Error getting node" err="node \"mynode1hostname\" not found"
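This is roughly how the state looks on the node itself (a sketch, output trimmed): the apiserver container is crash-looping, nothing answers on the secure port, and the kubelet just keeps retrying:

# The kube-apiserver container keeps exiting and being re-created
sudo docker ps -a | grep kube-apiserver

# Nothing answers on the secure port
curl -k https://10.10.2.123:6443/livez

# The kubelet itself is running and retrying against the dead API server
sudo journalctl -u kubelet -f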
The kube-apiserver.yaml manifest is fairly standard. We tried to remove the OIDC dependency by commenting out its lines, without success: the logs still complain about OIDC being unreachable, so this looks like a chicken-and-egg deadlock (the API server cannot start without reaching OIDC, and the Keycloak pod cannot start because the API server is down, and so on). The manifest is below, followed by the checks we did after editing it:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 10.10.2.123:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=10.10.2.123
    - --anonymous-auth=true
    - --allow-privileged=true
    - --audit-log-path=/var/log/kube-apiserver.log
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-issuer=https://kubernetes.default.svc.cluster.local
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --token-auth-file=/etc/kubernetes/passwords.csv
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    # - --oidc-issuer-url=https://10.10.2.123/admin/master/console/#/MYORGANIZATION
    # - --oidc-client-id=Kubernetes
    # - --oidc-username-claim=username
    # - --oidc-groups-claim=group
    - --v=5
    image: registry.k8s.io/kube-apiserver:v1.25.11
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 10.10.2.123
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-apiserver
    readinessProbe:
      failureThreshold: 3
      httpGet:
        host: 10.10.2.123
        path: /readyz
        port: 6443
        scheme: HTTPS
      periodSeconds: 1
      timeoutSeconds: 15
    resources:
      requests:
        cpu: 250m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 10.10.2.123
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/ca-certificates
      name: etc-ca-certificates
      readOnly: true
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
    - mountPath: /usr/local/share/ca-certificates
      name: usr-local-share-ca-certificates
      readOnly: true
    - mountPath: /usr/share/ca-certificates
      name: usr-share-ca-certificates
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/ca-certificates
      type: DirectoryOrCreate
    name: etc-ca-certificates
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
  - hostPath:
      path: /usr/local/share/ca-certificates
      type: DirectoryOrCreate
    name: usr-local-share-ca-certificates
  - hostPath:
      path: /usr/share/ca-certificates
      type: DirectoryOrCreate
    name: usr-share-ca-certificates
status: {}
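After commenting the OIDC flags we mainly checked that the kubelet had actually picked up the edited manifest and that no stale copy was being used; roughly along these lines (<container-id> is a placeholder; the /etc/kubernetes/manifests/BACKUP directory from the kubelet log above is ignored by the kubelet because it is a directory):

# Only the active static pod manifests should sit directly in /etc/kubernetes/manifests
ls -l /etc/kubernetes/manifests /etc/kubernetes/manifests/BACKUP

# The kubelet should re-create the kube-apiserver container shortly after the manifest changes
sudo docker ps -a | grep kube-apiserver

# The freshly created container's logs should no longer mention the oidc authenticator
sudo docker logs <container-id> 2>&1 | grep -i oidc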
Has anyone run into a similar case after renewing the Kubernetes certificates and rebooting the nodes? I searched SO but found nothing that clearly matches this case.
Thanks in advance.