I have this code, but I am not actually getting the email text.
Do I have to decode the email text?
import sys
import imaplib
import getpass
import email
import email.header
from email.header import decode_header
import base64

def read(username, password, sender_of_interest):
    # Login to INBOX
    imap = imaplib.IMAP4_SSL("imap.mail.com", 993)
    imap.login(username, password)
    imap.select('INBOX')
    # Use search(), not status()
    # Print all unread messages from a certain sender of interest
    if sender_of_interest:
        status, response = imap.uid('search', None, 'UNSEEN', 'FROM {0}'.format(sender_of_interest))
    else:
        status, response = imap.uid('search', None, 'UNSEEN')
    if status == 'OK':
        unread_msg_nums = response[0].split()
    else:
        unread_msg_nums = []
    data_list = []
    for e_id in unread_msg_nums:
        data_dict = {}
        e_id = e_id.decode('utf-8')
        _, response = imap.uid('fetch', e_id, '(RFC822)')
        html = response[0][1].decode('utf-8')
        email_message = email.message_from_string(html)
        data_dict['mail_to'] = email_message['To']
        data_dict['mail_subject'] = email_message['Subject']
        data_dict['mail_from'] = email.utils.parseaddr(email_message['From'])
        #data_dict['body'] = email_message.get_payload()[0].get_payload()
        data_dict['body'] = email_message.get_payload()
        data_list.append(data_dict)
    print(data_list)
    # Mark them as seen
    #for e_id in unread_msg_nums:
        #imap.store(e_id, '+FLAGS', '\Seen')
    imap.logout()
    return data_dict
So I do this:
print('Getting the email text bodies ... ')
emailData = read(usermail, pw, sender_of_interest)
print('Got the data!')
for key in emailData.keys():
    print(key, emailData[key])
The output is:
mail_to [email protected]
mail_subject 获取 json 文件
mail_from ('Pedro Rodriguez', ' [email protected] ')
body [<email.message.Message object at 0x7f7d9f928df0>, <email.message.Message object at 0x7f7d9f928f70>]
How do I actually get the email text?
I tried the suggestion, but it seems to fail because the message is multipart and has a text file attached:
from email.policy import default  # required for policy=default below

def read(username, password, sender_of_interest):
    # Login to INBOX
    imap = imaplib.IMAP4_SSL("imap.qq.com", 993)
    imap.login(username, password)
    imap.select('INBOX')
    # Use search(), not status()
    # Print all unread messages from a certain sender of interest
    if sender_of_interest:
        status, response = imap.uid('search', None, 'UNSEEN', 'FROM {0}'.format(sender_of_interest))
    else:
        status, response = imap.uid('search', None, 'UNSEEN')
    if status == 'OK':
        unread_msg_nums = response[0].split()
    else:
        unread_msg_nums = []
    data_list = []
    for e_id in unread_msg_nums:
        data_dict = {}
        e_id = e_id.decode('utf-8')
        _, response = imap.uid('fetch', e_id, '(RFC822)')
        email_message = email.message_from_bytes(response[0][1], policy=default)
        #html = response[0][1].decode('utf-8')
        #email_message = email.message_from_string(html)
        data_dict['mail_to'] = email_message['To']
        data_dict['mail_subject'] = email_message['Subject']
        data_dict['mail_from'] = email.utils.parseaddr(email_message['From'])
        #data_dict['body'] = email_message.get_payload()[0].get_payload()
        #data_dict['body'] = email_message.get_payload()[0]
        data_dict['body'] = email_message.get_body('html', 'text').get_payload(decode=True)
        data_list.append(data_dict)
    print(data_list)
    # Mark them as seen
    #for e_id in unread_msg_nums:
        #imap.store(e_id, '+FLAGS', '\Seen')
    imap.logout()
    return data_dict
I get this error:
emailData = read(usermail, pw, sender_of_interest)
Traceback (most recent call last):
  File "/usr/lib/python3.10/idlelib/run.py", line 578, in runcode
    exec(code, self.locals)
  File "<pyshell#126>", line 1, in <module>
  File "<pyshell#125>", line 29, in read
TypeError: MIMEPart.get_body() takes from 1 to 2 positional arguments but 3 were given
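The TypeError in the traceback above arises because get_body accepts a single preferencelist sequence rather than several positional strings. Below is a minimal sketch of the difference, assuming a hand-built two-alternative message (its subject and text are invented for illustration, so no IMAP server is needed). Note also that the documented preference names are 'related', 'html' and 'plain', so 'text' is not one of them.

```python
from email.message import EmailMessage

# Hypothetical message built locally; stands in for a fetched email.
msg = EmailMessage()
msg['Subject'] = 'demo'
msg.set_content('plain text version')                            # text/plain part
msg.add_alternative('<div>html version</div>', subtype='html')   # text/html part

# Wrong: two positional strings raise TypeError, as in the traceback above.
try:
    msg.get_body('html', 'plain')
except TypeError as exc:
    error_text = str(exc)

# Right: one tuple, listed in order of preference.
html_part = msg.get_body(preferencelist=('html', 'plain'))
plain_part = msg.get_body(preferencelist=('plain',))

print(error_text)
print(html_part.get_content_type())
print(plain_part.get_content().strip())
```

Passing the tuple by keyword is optional, but it makes the intent harder to misread.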
I also imported BeautifulSoup to get the text out of the HTML:
from bs4 import BeautifulSoup
from email.policy import default  # required for policy=default below

# this seems to work
def read(username, password, sender_of_interest):
    # Login to INBOX
    imap = imaplib.IMAP4_SSL("imap.qq.com", 993)
    imap.login(username, password)
    imap.select('INBOX')
    # Use search(), not status()
    # Print all unread messages from a certain sender of interest
    if sender_of_interest:
        status, response = imap.uid('search', None, 'UNSEEN', 'FROM {0}'.format(sender_of_interest))
    else:
        status, response = imap.uid('search', None, 'UNSEEN')
    if status == 'OK':
        unread_msg_nums = response[0].split()
    else:
        unread_msg_nums = []
    data_list = []
    for e_id in unread_msg_nums:
        data_dict = {}
        e_id = e_id.decode('utf-8')
        _, response = imap.uid('fetch', e_id, '(RFC822)')
        email_message = email.message_from_bytes(response[0][1], policy=default)
        html = response[0][1].decode('utf-8')
        data_dict['mail_to'] = email_message['To']
        data_dict['mail_subject'] = email_message['Subject']
        data_dict['mail_from'] = email.utils.parseaddr(email_message['From'])
        body = email_message.get_body(('html', 'text')).get_payload(decode=True)
        soup = BeautifulSoup(body, 'html.parser')
        div_bs4 = soup.find('div')
        text = div_bs4.string
        data_dict['body'] = text
        data_list.append(data_dict)
    print(data_list)
    # Mark them as seen
    #for e_id in unread_msg_nums:
        #imap.store(e_id, '+FLAGS', '\Seen')
    imap.logout()
    return data_dict
The output for body is now:
'body': '你能得到附件吗?
Now all I need to do is get the attachment!
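For the attachment, one possible approach is the iter_attachments() method that messages parsed with policy=default provide. The following is only a sketch: it builds a message locally with a hypothetical data.json attachment to stand in for the bytes returned by the fetch, and the file name and contents are invented for illustration.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

# Build a stand-in for the bytes that imap.uid('fetch', ...) would return.
outgoing = EmailMessage()
outgoing['Subject'] = 'demo with attachment'
outgoing.set_content('see attached')
outgoing.add_attachment(b'{"answer": 42}', maintype='application',
                        subtype='json', filename='data.json')
raw_bytes = outgoing.as_bytes()

# Parse the same way read() does: bytes in, EmailMessage out.
email_message = message_from_bytes(raw_bytes, policy=default)

attachments = []
for part in email_message.iter_attachments():
    name = part.get_filename()                # may be None if no name was given
    payload = part.get_payload(decode=True)   # bytes, transfer encoding undone
    attachments.append((name, payload))

print(attachments)
```

In read(), the same loop could run on email_message inside the for loop, with each payload written to disk or stored in data_dict as needed.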
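Depending on what exactly you mean by "text", you probably want the get_body method. But you are thoroughly mangling the email before you get to that point. What you receive from the server is not "HTML", and converting it to a string in order to call message_from_string on it is roundabout and error-prone. What you get are bytes; use the message_from_bytes method directly. (This avoids all kinds of problems when the bytes are not UTF-8; the message_from_string method only really made sense in Python 2, which did not have an explicit bytes type.)

Using a policy selects the (no longer very) new EmailMessage API; you need Python 3.3+ for this to be available. The older legacy email.message.Message class does not have this method, but should be avoided in new code for many other reasons as well. This could fail for multipart messages with nontrivial nested structures; the get_body method without arguments can return a multipart/alternative message part, and then you have to take it from there. You have not specified what your messages are expected to look like, so I will not delve further into that. More fundamentally, you probably need a more nuanced picture of how modern email is structured. See: What are the "parts" in a multipart email?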
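To make the advice concrete, here is a minimal sketch of parsing the raw bytes directly and asking for the body. The extract_text helper and the sample message are assumptions made for illustration; in the question's code, the bytes would come from response[0][1].

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def extract_text(raw_bytes):
    """Return the best text body of a raw RFC 822 message, or None."""
    message = message_from_bytes(raw_bytes, policy=default)
    # Prefer plain text, fall back to HTML; the valid names are
    # 'related', 'html' and 'plain'.
    body = message.get_body(preferencelist=('plain', 'html'))
    if body is None:  # e.g. a message that carries only an attachment
        return None
    # get_content() undoes the transfer encoding and decodes the charset.
    return body.get_content()


# Stand-in for response[0][1]: a multipart/alternative message with
# non-ASCII text, which is where manual .decode('utf-8') calls tend to break.
sample = EmailMessage()
sample['Subject'] = 'demo'
sample.set_content('Grüße aus der Mail')
sample.add_alternative('<div>Grüße aus der Mail</div>', subtype='html')

print(extract_text(sample.as_bytes()))
```

Because get_content() returns a str for text parts, no explicit decode step is needed; if only an HTML part exists, the returned string is HTML and still has to be stripped of markup separately.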