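I'm deploying an application with ArgoCD. The deployment manifests include a Job that performs some one-time initialization for the application. The Job resource looks like this: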
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app.kubernetes.io/instance: house
    app.kubernetes.io/name: step-certificates
  name: create-acme-provisioner
  namespace: step-certificates
spec:
  backoffLimit: 100
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: house
        app.kubernetes.io/name: step-certificates
    spec:
      containers:
      - command:
        - /bin/bash
        - -c
        - |
          while ! step ca health; do
            echo "waiting for ca"
            sleep 1
          done
          if ! step ca provisioner list | grep -q '"name": "acme"'; then
            step ca provisioner add acme --type ACME \
              --admin-subject step \
              --password-file /home/step/secrets/passwords/password \
              --admin-provisioner "Admin JWK"
          fi
        image: cr.step.sm/smallstep/step-ca:0.22.1
        name: create-acme-provisioner
        volumeMounts:
        - mountPath: /home/step/certs
          name: certs
          readOnly: true
        - mountPath: /home/step/config
          name: config
          readOnly: true
        - mountPath: /home/step/secrets
          name: secrets
          readOnly: true
        - mountPath: /home/step/secrets/passwords
          name: ca-password
          readOnly: true
      restartPolicy: Never
      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
      volumes:
      - configMap:
          name: step-certificates-certs
        name: certs
      - configMap:
          name: step-certificates-config
        name: config
      - name: secrets
        secret:
          secretName: step-certificates-secrets
      - name: ca-password
        secret:
          secretName: step-certificates-ca-password
  ttlSecondsAfterFinished: 60
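This works as intended: the Job fails a few times while the main application is starting up, but then it runs to completion and everything looks fine: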
$ kubectl get pods
NAME                            READY   STATUS      RESTARTS   AGE
create-acme-provisioner-7zhp2   0/1     Completed   0          12s
step-certificates-0             2/2     Running     0          54m
$ kubectl get jobs
NAME                      COMPLETIONS   DURATION   AGE
create-acme-provisioner   1/1           3s         20s
The problem is that ArgoCD keeps re-syncing the Job resource every minute, so the Job runs again... and again... and so on. The logs from the argocd-application-controller pod look like this:
time="2022-09-30T16:20:42Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:114442fcfb789190cfb9e7353a636369e7113c01,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:batch,Kind:Job,Name:create-acme-provisioner,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[CreateNamespace=true],} { true} [] {-1 &Backoff{Duration:30s,Factor:*2,MaxDuration:10m,}}}" application=step-certificates-infra
time="2022-09-30T16:20:42Z" level=info msg="Tasks (dry-run)" application=step-certificates-infra syncId=00259-Dpgma tasks="[Sync/0 resource batch/Job:step-certificates/create-acme-provisioner nil->obj (,,)]"
time="2022-09-30T16:20:42Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:20:42Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:20:42Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'job.batch/create-acme-provisioner created'" application=step-certificates-infra kind=Job name=create-acme-provisioner namespace=step-certificates phase=Sync syncId=00259-Dpgma
time="2022-09-30T16:21:45Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:114442fcfb789190cfb9e7353a636369e7113c01,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:batch,Kind:Job,Name:create-acme-provisioner,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[CreateNamespace=true],} { true} [] {-1 &Backoff{Duration:30s,Factor:*2,MaxDuration:10m,}}}" application=step-certificates-infra
time="2022-09-30T16:21:45Z" level=info msg="Tasks (dry-run)" application=step-certificates-infra syncId=00260-KsLXq tasks="[Sync/0 resource batch/Job:step-certificates/create-acme-provisioner nil->obj (,,)]"
time="2022-09-30T16:21:45Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:21:45Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:21:45Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'job.batch/create-acme-provisioner created'" application=step-certificates-infra kind=Job name=create-acme-provisioner namespace=step-certificates phase=Sync syncId=00260-KsLXq
time="2022-09-30T16:22:49Z" level=info msg="Initialized new operation: {&SyncOperation{Revision:114442fcfb789190cfb9e7353a636369e7113c01,Prune:true,DryRun:false,SyncStrategy:nil,Resources:[]SyncOperationResource{SyncOperationResource{Group:batch,Kind:Job,Name:create-acme-provisioner,Namespace:,},},Source:nil,Manifests:[],SyncOptions:[CreateNamespace=true],} { true} [] {-1 &Backoff{Duration:30s,Factor:*2,MaxDuration:10m,}}}" application=step-certificates-infra
time="2022-09-30T16:22:49Z" level=info msg="Tasks (dry-run)" application=step-certificates-infra syncId=00261-itFqU tasks="[Sync/0 resource batch/Job:step-certificates/create-acme-provisioner nil->obj (,,)]"
time="2022-09-30T16:22:49Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:22:49Z" level=info msg="Applying resource Job/create-acme-provisioner in cluster: https://10.96.0.1:443, namespace: step-certificates"
time="2022-09-30T16:22:49Z" level=info msg="Adding resource result, status: 'Synced', phase: 'Running', message: 'job.batch/create-acme-provisioner created'" application=step-certificates-infra kind=Job name=create-acme-provisioner namespace=step-certificates phase=Sync syncId=00261-itFqU
Why is ArgoCD re-syncing this resource, and how do I get it to stop?
I figured out what was going on.
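The Job is configured with ttlSecondsAfterFinished (documented in the Kubernetes docs). I misread the documentation and thought this would clean up the Pods created by the Job, but it actually causes the Job itself to be deleted once it has finished. Since the Job is managed by ArgoCD, each time Kubernetes deletes it because of the ttlSecondsAfterFinished setting, ArgoCD sees that a managed resource is missing and re-creates it, which runs the Job again. As @SYN suggested in the comments, an alternative to simply dropping ttlSecondsAfterFinished is to configure the Job as an ArgoCD PostSync hook with a hook-delete-policy: when ArgoCD successfully syncs the application it creates the Job, and when the Job succeeds ArgoCD deletes it.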
This means the Job runs once on every sync, but that's fine. It no longer runs every 60 seconds.
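For reference, here is a minimal sketch of what the hook variant of the Job's metadata might look like. The two annotations are ArgoCD's standard resource-hook annotations; the HookSucceeded delete policy is an assumption on my part, since the text above does not say which policy was chosen. The pod template is elided and would stay as in the question, so this fragment is not applicable as-is.

```yaml
# Sketch only, not the exact manifest used. HookSucceeded is an assumed
# policy; ArgoCD also supports HookFailed and BeforeHookCreation.
apiVersion: batch/v1
kind: Job
metadata:
  name: create-acme-provisioner
  namespace: step-certificates
  annotations:
    # Run the Job after the rest of the application has synced successfully.
    argocd.argoproj.io/hook: PostSync
    # Let ArgoCD delete the Job once it has completed successfully.
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  backoffLimit: 100
  # ttlSecondsAfterFinished is left out here: cleanup is delegated to the
  # hook delete policy instead of the Kubernetes TTL controller.
  template:
    # ... pod template unchanged from the question ...
```

With this layout ArgoCD treats the Job as a hook rather than as part of the application's tracked resources, so its removal after completion should not by itself trigger another sync.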