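I am using a systemd unit file to control a Python process running on a server (systemd v247).
The process must be restarted 60 seconds after it exits, whether it failed or succeeded, unless it fails 5 times within 600 seconds.
The unit file references another service that sends an email notification on failure.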
/etc/systemd/system/python-test.service
[Unit]
After=network.target
OnFailure=mailer@%n.service
[Service]
Type=simple
ExecStart=/home/debian/tmp.py
# Any exit status other than 0 is considered an error
SuccessExitStatus=0
StandardOutput=append:/var/log/python-test.log
StandardError=append:/var/log/python-test.log
# Always restart service 60sec after exit
Restart=always
RestartSec=60
# Stop restarting the service after 5 consecutive failures within a 600 s interval
StartLimitInterval=600
StartLimitBurst=5
[Install]
WantedBy=multi-user.target
/etc/systemd/system/mailer@.service
[Unit]
After=network.target
[Service]
Type=oneshot
ExecStart=/home/debian/mailer.py --to "[email protected]" --subject "Systemd service %I failed" --message "A systemd service failed %I on %H"
[Install]
WantedBy=multi-user.target
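During basic testing, triggering OnFailure worked fine. However, once I added the following section to the unit file, OnFailure was only triggered after 5 consecutive failures: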
StartLimitInterval=600
StartLimitBurst=5
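This is not the behavior I want: I would like to be notified every time the process fails, even if the burst limit has not been reached yet.
When checking the process status, the output is different when the burst limit has not been reached: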
● python-test.service
Loaded: loaded (/etc/systemd/system/python-test.service; disabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Thu 2022-12-22 19:51:23 UTC; 2s ago
Process: 1421600 ExecStart=/home/debian/tmp.py (code=exited, status=1/FAILURE)
Main PID: 1421600 (code=exited, status=1/FAILURE)
CPU: 31ms
Dec 22 19:51:23 test-vps systemd[1]: python-test.service: Failed with result 'exit-code'.
than when it has been reached:
● python-test.service
Loaded: loaded (/etc/systemd/system/python-test.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2022-12-22 19:52:02 UTC; 24s ago
Process: 1421609 ExecStart=/home/debian/tmp.py (code=exited, status=1/FAILURE)
Main PID: 1421609 (code=exited, status=1/FAILURE)
CPU: 31ms
Dec 22 19:51:56 test-vps systemd[1]: python-test.service: Failed with result 'exit-code'.
Dec 22 19:52:02 test-vps systemd[1]: python-test.service: Scheduled restart job, restart counter is at 5.
Dec 22 19:52:02 test-vps systemd[1]: Stopped python-test.service.
Dec 22 19:52:02 test-vps systemd[1]: python-test.service: Start request repeated too quickly.
Dec 22 19:52:02 test-vps systemd[1]: python-test.service: Failed with result 'exit-code'.
Dec 22 19:52:02 test-vps systemd[1]: Failed to start python-test.service.
Dec 22 19:52:02 test-vps systemd[1]: python-test.service: Triggering OnFailure= dependencies.
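The difference between the two status outputs above can be illustrated with a rough toy model. This is only a sketch that matches the logs shown here, not systemd's actual implementation; the names `StartLimiter` and `simulate` are hypothetical. It assumes the process always exits non-zero, that Restart=always keeps the unit in "activating (auto-restart)" while starts are still allowed, and that the unit only enters "failed" (where OnFailure= fires) once a start is refused by the limit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StartLimiter:
    """Toy start rate limit: at most `burst` starts per `interval` seconds."""
    interval: float = 600.0  # mirrors StartLimitInterval=600
    burst: int = 5           # mirrors StartLimitBurst=5
    window_start: Optional[float] = None
    starts: int = 0

    def allow_start(self, now: float) -> bool:
        # Open a fresh window on the first start or once the old one expired.
        if self.window_start is None or now - self.window_start > self.interval:
            self.window_start = now
            self.starts = 1
            return True
        if self.starts < self.burst:
            self.starts += 1
            return True
        return False  # corresponds to "Start request repeated too quickly"


def simulate(failures: int, restart_delay: float) -> List[str]:
    """Return the unit state reached after each failed run (simplified)."""
    limiter = StartLimiter()
    now = 0.0
    limiter.allow_start(now)  # the initial start counts toward the limit
    states: List[str] = []
    for _ in range(failures):
        now += restart_delay  # process failed; a restart is attempted later
        if limiter.allow_start(now):
            states.append("activating (auto-restart)")
        else:
            states.append("failed, OnFailure= triggered")
            break
    return states
```

Under these assumptions, `simulate(5, 60.0)` yields four "activating (auto-restart)" entries followed by a single "failed, OnFailure= triggered" entry, which is the pattern I am seeing: no notification for the first four failures, one notification on the fifth.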
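I could not find anything explaining how to change when OnFailure is triggered in the unit file.
Is there a way to send the notification email every time the process fails while still keeping the burst limit?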