我注意到 postfix 日志有问题,缺少有关某些电子邮件传递状态的信息。该问题影响了大约 1% 的电子邮件。
“健康”日志:
<server># grep 8EB992EFBB44 postfix_log/mail04.log
Jun 5 03:09:29 mail04 postfix/smtpd[8537]: 8EB992EFBB44: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun 5 03:09:29 mail04 postfix/cleanup[34349]: 8EB992EFBB44: message-id=<[email protected]>
Jun 5 03:12:02 mail04 postfix/qmgr[76377]: 8EB992EFBB44: from=<[email protected]>, size=48845, nrcpt=1 (queue active)
Jun 5 03:15:12 mail04 postfix/smtp[35058]: 8EB992EFBB44: to=<[email protected]>, relay=mx.baz.com[xxx.xxx.xxx.xxx]:25, conn_use=70, delay=343, delays=153/190/0/0.24, dsn=2.0.0, status=sent (250 ok)
Jun 5 03:15:12 mail04 postfix/qmgr[76377]: 8EB992EFBB44: removed
“破碎”日志:
<server># grep F3C362EF37CA postfix_log/mail04.log
Jun 5 04:03:27 mail04 postfix/smtpd[39666]: F3C362EF37CA: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun 5 04:03:27 mail04 postfix/cleanup[41287]: F3C362EF37CA: message-id=<[email protected]>
Jun 5 04:03:28 mail04 postfix/qmgr[76377]: F3C362EF37CA: from=<[email protected]>, size=48892, nrcpt=1 (queue active)
** here should be a log line from postfix/smtp but there is none **
Jun 5 04:03:29 mail04 postfix/qmgr[76377]: F3C362EF37CA: removed
背景资料:
系统:FreeBSD xxx.xxx.xxx 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64
Postfix 安装在 jail 中。日志在同一台机器上,日志目录通过 nullfs 挂载。该站点有高负载峰值,导致磁盘(本地)以 100% 的速度运行。
更新
日志每天轮换,当前大小约为 500MB。
我通过将 99000 条消息排队到同一目的地进行了测试(以排除 dns/network/mx 问题)。5715 邮件没有任何 DSN 记录。失败的消息队列时间随着时间的推移均匀分布,我没有看到任何时间限制的问题。
一些未送达的电子邮件:
envelopeid | processed_time
--------------+----------------------------
8D7652EF3BAE | 2012-06-06 13:19:11.072715
DD53A2EF3C5C | 2012-06-06 13:33:24.374783
8C52F2EF4E3F | 2012-06-06 13:39:15.810616
BBC572EF525C | 2012-06-06 13:44:22.762812
E95822EF54D1 | 2012-06-06 13:52:01.134533
839DD2EF4FBB | 2012-06-06 14:13:48.511236
017EE2EF6234 | 2012-06-06 15:04:48.618963
这些只是一些选择,此类未送达电子邮件的记录几乎每秒都会发生。
<server># egrep '(8D7652EF3BAE|BBC572EF525C|017EE2EF6234)' mail04.log
Jun 6 13:19:10 mail04 postfix/smtpd[20350]: 8D7652EF3BAE: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun 6 13:19:10 mail04 postfix/cleanup[21024]: 8D7652EF3BAE: message-id=<[email protected]>
Jun 6 13:19:10 mail04 postfix/qmgr[7939]: 8D7652EF3BAE: from=<[email protected]>, size=63718, nrcpt=1 (queue active)
Jun 6 13:19:11 mail04 postfix/qmgr[7939]: 8D7652EF3BAE: removed
Jun 6 13:44:22 mail04 postfix/smtpd[20346]: BBC572EF525C: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun 6 13:44:22 mail04 postfix/cleanup[24811]: BBC572EF525C: message-id=<[email protected]>
Jun 6 13:44:22 mail04 postfix/qmgr[7939]: BBC572EF525C: from=<[email protected]>, size=63758, nrcpt=1 (queue active)
Jun 6 15:04:49 mail04 postfix/smtpd[20344]: 017EE2EF6234: client=xxx.xxx.xxx[xxx.xxx.xxx.xxx]
Jun 6 15:04:49 mail04 postfix/cleanup[35585]: 017EE2EF6234: message-id=<[email protected]>
Jun 6 15:04:49 mail04 postfix/qmgr[7939]: 017EE2EF6234: from=<[email protected]>, size=63706, nrcpt=1 (queue active)
<server>#
<server># find /var/spool/postfix/active/ -type f -print | wc -l
1
<server>#
重要提示:正如您在上面看到的,一些电子邮件事件中没有该removed
行。