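This worked in procmail, but procmail appears to have been abandoned in September 2001. I have a rule that detects when UTF-8 is used in the "To:" header to write my name with emoji or non-Latin characters. When I tried the same thing in Pigeonhole, Dovecot's Sieve implementation, I was dismayed to find that it seems to discard some of the data.
Ref.: Sieve rules in RFC 5228
Ref.: Dovecot Pigeonhole implementation
What I tried: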
require ["fileinto"];
if header :contains ["to", "from"] "=?utf-8?B?" { fileinto "Junk"; }
elsif address :contains :all ["to", "from"] "=?utf-8?B?" { fileinto "Junk"; }
Using this sample data:
From: "=?utf-8?B?TWluaSBXdQ==?=" <[email protected]>
To: "=?utf-8?B?Q1VTVA==?=" <[email protected]>
Subject: =?utf-8?B?UmU6TWljcm9jaGlwIFRleGFzIE9mZmVy?=
Date: Mon, 20 Mar 2023 16:12:50 +0900

Hello potential customer! Please stop whatever you're
doing and pay attention to me!
What I get:
sieve-test -Tlevel=matching -t - /tmp/badmail.sieve /tmp/badmail.txt
## Started executing script 'badmail'
2: header test
2: starting `:contains' match with `i;ascii-casemap' comparator:
2: extracting `to' headers from message
2: matching value `"CUST" <[email protected]>'
2: with key `=?utf-8?B?' => 0
2: extracting `from' headers from message
2: matching value `"Mini Wu" <[email protected]>'
2: with key `=?utf-8?B?' => 0
2: finishing match with result: not matched
2: jump if result is false
2: jumping to line 3
3: address test
3: starting `:contains' match with `i;ascii-casemap' comparator:
3: extracting `to' headers from message
3: parsing address header value `"=?utf-8?B?Q1VTVA==?=" <[email protected]>'
3: address value `[email protected]'
3: extracting `all' part from address <[email protected]>
3: matching value `[email protected]'
3: with key `=?utf-8?B?' => 0
3: extracting `from' headers from message
3: parsing address header value `"=?utf-8?B?TWluaSBXdQ==?=" <[email protected]>'
3: address value `[email protected]'
3: extracting `all' part from address <[email protected]>
3: matching value `[email protected]'
3: with key `=?utf-8?B?' => 0
3: finishing match with result: not matched
3: jump if result is false
3: jumping to line 3
## Finished executing script 'badmail'
Implicit keep: store message in folder: INBOX
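The trace output logs "=?utf-8?B?...", so I know Pigeonhole has seen it. But both the 'header' test and the 'address' test discard that data before performing the match. I also tried :comparator "i;octet" instead of the default "i;ascii-casemap", with the same result.
How can I test against the raw header rather than these interpreted values?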
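So.. you are not actually trying to distinguish "emoji or non-Latin characters", but rather the details of how characters are transmitted on the wire‡?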
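I cannot think of a way to make Sieve return the raw bytes. You could work around the problem by doing the matching in the mail server, for example by using Postfix's (RFC 2047-ignorant) header_checks feature to add a custom marker header, and then checking for the presence of that marker header in Sieve.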
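The workaround above could look roughly like the sketch below. It rests on several assumptions: the header name X-Raw-Encoded-Word is made up for illustration, the Postfix rule (shown as a comment) assumes a regexp-type table referenced by header_checks, and the pattern simply mirrors the `=?utf-8?B?` key from the question. A sender could add the same header themselves, so you may prefer a name that is hard to guess.

```sieve
# Assumed Postfix side: one line in a regexp-type table referenced by
# header_checks (X-Raw-Encoded-Word is a hypothetical marker header name):
#
#   /^(To|From):.*=\?utf-8\?B\?/  PREPEND X-Raw-Encoded-Word: yes
#
# Postfix matches against the raw, undecoded header line, so the
# encoded-word prefix is still visible at that point.

require ["fileinto"];

# Sieve side: file the message if Postfix added the marker header.
if exists "X-Raw-Encoded-Word" {
    fileinto "Junk";
}
```

The `exists` test is part of the base Sieve specification (RFC 5228), so no extra `require` entry is needed beyond `fileinto`.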
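Even today, I doubt this will be a reliable classification criterion for the foreseeable future. Any relaying SMTP server, up to and including the one that delivers to Sieve, may add encoding where there was none before as part of message conversion. Some mail clients add encoding where it is not needed, while others omit it even when they should use it. Detecting such incidental differences may not statistically single out the same kind of message.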
‡ Ordinary mail has few options other than this seemingly superfluous encoding: Dovecot does not yet guarantee 8-bit-clean transport such as SMTPUTF8.