It looks like I can't delete the Helm release. Its status is stuck in DELETING, and the associated Kubernetes cleanup job has failed as well, without any indication of what might have caused the failure.
Have you seen this behavior before? How can it be resolved? I also ran the kubectl commands used in the cleanup container by hand, but still nothing.
Thanks!
The relevant command output is attached here:
helm ls --all prometheus-operator --debug
NAME                     REVISION  UPDATED                   STATUS    CHART                      APP VERSION  NAMESPACE
prometheus-operator      1         Mon Aug  5 17:22:14 2019  DELETING  prometheus-operator-6.3.1  0.31.1       monitoring
prometheus-operator-v2   1         Mon Aug  5 19:26:20 2019  DEPLOYED  prometheus-operator-6.4.0  0.31.1       monitoring
kubectl get job prometheus-operator-operator-cleanup -n monitoring
NAME                                    COMPLETIONS  DURATION  AGE
prometheus-operator-operator-cleanup    0/1          19h       19h
kubectl describe jobs/prometheus-operator-operator-cleanup -n monitoring
Name: prometheus-operator-operator-cleanup
Namespace: monitoring
Selector: controller-uid=c6bfd107-b79a-11e9-a527-42010aa80121
Labels: app=prometheus-operator-operator
chart=prometheus-operator-6.3.1
heritage=Tiller
release=prometheus-operator
Annotations: helm.sh/hook: pre-delete
helm.sh/hook-delete-policy: hook-succeeded
helm.sh/hook-weight: 3
Parallelism: 1
Completions: 1
Start Time: Mon, 05 Aug 2019 19:04:59 +0300
Pods Statuses: 0 Running / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=prometheus-operator-operator
chart=prometheus-operator-6.3.1
controller-uid=c6bfd107-b79a-11e9-a527-42010aa80121
heritage=Tiller
job-name=prometheus-operator-operator-cleanup
release=prometheus-operator
Service Account: prometheus-operator-operator
Containers:
kubectl:
Image: k8s.gcr.io/hyperkube:v1.12.1
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
kubectl delete alertmanager --all; kubectl delete prometheus --all; kubectl delete prometheusrule --all; kubectl delete servicemonitor --all; sleep 10; kubectl delete crd alertmanagers.monitoring.coreos.com; kubectl delete crd prometheuses.monitoring.coreos.com; kubectl delete crd prometheusrules.monitoring.coreos.com; kubectl delete crd servicemonitors.monitoring.coreos.com; kubectl delete crd podmonitors.monitoring.coreos.com;
Environment: <none>
Mounts: <none>
Volumes: <none>
Events: <none>
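Since the job shows 0 Running / 0 Succeeded / 0 Failed and no events, a couple of extra checks that might surface whether a pod was ever created for it (just a sketch, reusing the job name and namespace from the output above):

kubectl get pods -n monitoring -l job-name=prometheus-operator-operator-cleanup
kubectl get events -n monitoring --field-selector involvedObject.name=prometheus-operator-operator-cleanup
kubectl logs -n monitoring -l job-name=prometheus-operator-operator-cleanup --tail=50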
Check this. You might just need to run:
kubectl edit jobs/prometheus-operator-operator-cleanup -n monitoring
and remove the finalizers block from the resource.
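If editing the job by hand is awkward, the finalizers can also be cleared non-interactively with a patch; a sketch (only relevant if the job actually carries a finalizers entry in its metadata):

kubectl patch job prometheus-operator-operator-cleanup -n monitoring --type=merge -p '{"metadata":{"finalizers":null}}'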
Found the issue. I don't know why there were no events in the job description, but when I ran the delete again I was able to inspect the logs of the cleanup pod it created.
The problem was caused by an incomplete (interrupted) deletion of the helm release. During that first deletion the prometheus operator's service account and its associated clusterrolebinding + clusterrole were removed, so on the second helm delete attempt the cleanup job no longer had the permissions it needed to delete everything that had been left behind by the first attempt.
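In case someone else runs into this: since missing RBAC was the root cause, one possible recovery is to check what the cleanup job's service account can still do, recreate the missing RBAC objects, and then re-run the purge. A rough sketch; the service account name comes from the job spec above, while prometheus-operator-cleanup-temp and the use of cluster-admin are my own shortcuts, not what the chart itself creates:

# check whether the service account used by the cleanup job can still delete the CRDs
kubectl auth can-i delete prometheuses.monitoring.coreos.com --as=system:serviceaccount:monitoring:prometheus-operator-operator -n monitoring
kubectl auth can-i delete crd --as=system:serviceaccount:monitoring:prometheus-operator-operator

# recreate the service account and grant it enough rights to finish the cleanup
# (cluster-admin is a blunt temporary choice; the chart's own clusterrole is narrower)
kubectl create serviceaccount prometheus-operator-operator -n monitoring
kubectl create clusterrolebinding prometheus-operator-cleanup-temp --clusterrole=cluster-admin --serviceaccount=monitoring:prometheus-operator-operator

# re-run the delete so the pre-delete hook can complete, and drop the release record
helm delete prometheus-operator --purge

# remove the temporary binding afterwards
kubectl delete clusterrolebinding prometheus-operator-cleanup-temp

The temporary binding is only needed because the interrupted first delete had already removed the chart's own clusterrole and clusterrolebinding.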