What travels on a network are streams of bytes, also known in networking jargon as octets. Bytes can, of course, represent text, via any of several possible encodings. However, what you want to send over the network often has more structure than just a stream of text or bytes. The Multipurpose Internet Mail Extensions (MIME) and other encoding standards bridge the gap by specifying how to represent structured data as bytes or text. While often originally designed for email, such encodings are also used on the web and in many other networked systems. Python supports such encodings through many library modules, such as base64, quopri, and uu (covered in “Encoding Binary Data as ASCII Text”), and the modules of the email package (covered in “MIME and Email Format Handling”).
The email package handles parsing, generation, and manipulation of MIME files such as email messages, Network News (NNTP) posts, HTTP interactions, and so on. The Python standard library also contains other modules that handle some parts of these jobs. However, the email package offers a complete and systematic approach to these important tasks. We suggest you use email, not the older modules that partially overlap with parts of email’s functionality. email, despite its name, has nothing to do with receiving or sending email; for such tasks, see the modules poplib and smtplib, covered in “Email Protocols”. email deals with handling MIME messages (which may or may not be mail) after you receive them, or constructing them properly before you send them.
The email package supplies two factory functions that return an instance m of the class email.message.Message. These functions rely on the class email.parser.Parser, but the factory functions are handier and simpler. Therefore, we do not cover the email.parser module further in this book.
message_from_string |
Builds |
message_from_file |
Builds |
In v3 only, email also supplies two similar factory functions to build message objects from bytestrings and binary files:
message_from_bytes |
Builds |
message_from_binary_file |
Builds |
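As a minimal sketch of the factory functions named above, the following parses the same small message from a text string and from a bytestring. The addresses and message text are made up for illustration.

```python
import email

# A tiny message: headers, a blank line, then the body.
raw_text = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Just a short note.\n"
)

# Parse from a text string.
m = email.message_from_string(raw_text)

# Parse the same message from a bytestring (v3 only).
mb = email.message_from_bytes(raw_text.encode("ascii"))

subject = m["Subject"]      # header lookup by name
body = m.get_payload()      # for a non-multipart message, a string
```

message_from_file and message_from_binary_file work the same way, but take an open file object (text or binary, respectively) instead of a string.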
The email.message module supplies the class Message. All parts of the email package make, modify, or use instances of Message. An instance m of Message models a MIME message, including headers and a payload (data content). To create an initially empty m, call Message with no arguments. More often, you create m by parsing via the factory functions message_from_string and message_from_file of email, or by other indirect means such as the classes covered in “Creating Messages”. m’s payload can be a string, a single other instance of Message, or (for a multipart message) a list of other Message instances.
You can set arbitrary headers on email messages you’re building. Several Internet RFCs specify headers for a wide variety of purposes. The main applicable RFC is RFC 2822. An instance m of the Message class holds headers as well as a payload. m is a mapping, with header names as keys and header value strings as values.
To make m more convenient, the semantics of m as a mapping are different from those of a dict. m’s keys are case-insensitive. m keeps headers in the order in which you add them, and the methods keys, values, and items return lists of headers in that order. m can have more than one header named key: m[key] returns an arbitrary such header (or None when the header is missing), and del m[key] deletes all of them (it’s not an error if the header is missing).
To get a list of all headers with a certain name, call m.get_all(key). len(m) returns the total number of headers, counting duplicates, not just the number of distinct header names. When there is no header named key, m[key] returns None and does not raise KeyError (i.e., behaves like m.get(key)); del m[key] does nothing in this case, and m.get_all(key) returns None (or the default value you pass as its second argument), not an empty list. In v2, you cannot loop directly on m; loop on m.keys() instead.
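The mapping semantics described above can be tried out with a short sketch such as the following. The header names and values are arbitrary examples.

```python
from email.message import Message

m = Message()
m["Received"] = "from a.example.com"
m["Received"] = "from b.example.com"   # assignment appends, it does not replace
m["Subject"] = "Test"

count = len(m)                          # counts duplicate headers too
same = m["subject"] == m["SUBJECT"]     # keys are case-insensitive
all_received = m.get_all("received")    # every value, in insertion order
missing = m["X-No-Such-Header"]         # None, not a KeyError
names = m.keys()                        # header names, in insertion order

del m["Received"]                       # removes both Received headers
del m["X-No-Such-Header"]               # no error when the header is absent
remaining = m.keys()
```

Because assignment appends rather than replaces, a common idiom for replacing a header is to delete it first and then assign the new value.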
An instance m of Message supplies the following attributes and methods that deal with m’s headers and payload.
add_header |
Like
When When a parameter’s value contains non-ASCII characters, specify it as a tuple with three items, |
as_string |
Returns the entire message as a string. When |
attach |
Adds |
epilogue |
The attribute |
get_all |
Returns a list with all values of headers named |
get_boundary |
Returns the string value of the |
get_charsets |
Returns the list |
get_content_maintype |
Returns |
get_content_subtype |
Returns |
get_content_type |
Returns |
get_filename |
Returns the string value of the |
get_param |
Returns the string value of parameter |
get_params |
Returns the parameters of |
get_payload |
Returns When |
get_unixfrom |
Returns the envelope header string for |
is_multipart |
Returns |
preamble |
Attribute |
set_boundary |
Sets the |
set_payload |
Sets |
set_unixfrom |
Sets the envelope header string for |
walk |
Returns an iterator on all parts and subparts of |
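To see how several of these methods fit together, here is a small sketch that builds a two-part message by hand and then inspects it. The subject, filename, and payloads are placeholders chosen for illustration.

```python
from email.message import Message

# Outer container: after attach, its payload is a list of parts.
outer = Message()
outer["Subject"] = "Report"
outer.add_header("Content-Type", "multipart/mixed")

# First part: plain text. Named arguments become header parameters.
text = Message()
text.add_header("Content-Type", "text/plain", charset="us-ascii")
text.set_payload("See the attached file.\n")
outer.attach(text)

# Second part: an attachment with a filename parameter.
data = Message()
data.add_header("Content-Type", "application/octet-stream")
data.add_header("Content-Disposition", "attachment", filename="report.bin")
data.set_payload("placeholder payload")
outer.attach(data)

# walk yields the container itself first, then each part in order.
types = [part.get_content_type() for part in outer.walk()]
filename = data.get_filename()
charset = text.get_param("charset")
```

In practice, the MIME classes covered in “Creating Messages” are usually a more convenient way to build such messages than calling add_header and attach directly.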
The email.generator module (named email.Generator in v2) supplies the class Generator, which you can use to generate the textual form of a message m. m.as_string() and str(m) may be sufficient, but Generator gives you more flexibility. You instantiate the Generator class with a mandatory argument and two optional arguments:
Generator |
This emits |
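A minimal sketch of using Generator in v3 follows. It writes a message to an in-memory text buffer, passing the two optional arguments by their v3 names, mangle_from_ and maxheaderlen. The message content is made up, and the body deliberately starts with "From " to show the effect of mangle_from_.

```python
import io
from email.message import Message
from email.generator import Generator

m = Message()
m["Subject"] = "Status"
m.set_payload("From here on, all is well.\n")

# Generator writes to any file-like object that has a write method.
buf = io.StringIO()
g = Generator(buf, mangle_from_=True, maxheaderlen=60)
g.flatten(m, unixfrom=False)

# With mangle_from_ true, body lines that start with "From " get a
# leading ">", so they cannot be mistaken for mbox envelope lines.
text = buf.getvalue()
```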
The email package supplies modules with names that, in v2, start with 'MIME', each module supplying a subclass of Message named just like the module. In v3, the modules are in the subpackage email.mime, and the modules’ names are lowercase (for example, email.mime.text in v3, instead of email.MIMEText in v2). Although imported from different modules in v2 and v3, the class names are the same in both versions.
These classes make it easier to create Message instances of various MIME types. The MIME classes are as follows:
The email.encoders module (in v3) supplies functions that take a non-multipart message m as their only argument, encode m’s payload, and set m’s headers appropriately. In v2, the module’s name is titlecase, email.Encoders.
encode_base64 |
Uses Base64 encoding, optimal for arbitrary binary data. |
encode_noop |
Does nothing to |
encode_quopri |
Uses Quoted Printable encoding, optimal for text that is almost but not fully ASCII (see “The quopri Module”). |
encode_7or8bit |
Does nothing to |
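As an illustration of how an encoder function changes both payload and headers, this sketch (v3 module names) stores a few arbitrary bytes in a MIMEBase instance and applies encode_base64. The content type and the data are placeholders.

```python
from email import encoders
from email.mime.base import MIMEBase

raw = bytes(range(8))                   # arbitrary binary data

part = MIMEBase("application", "octet-stream")
part.set_payload(raw)
encoders.encode_base64(part)            # encodes the payload, sets the header

cte = part["Content-Transfer-Encoding"]
encoded = part.get_payload()            # the Base64 text now stored in part
round_trip = part.get_payload(decode=True)   # decoded back to the raw bytes
```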
The email.utils module (in v3) supplies several functions useful for email processing. In v2, the module’s name is titlecase, email.Utils.
formataddr |
|
formatdate |
|
getaddresses |
Parses each item of |
mktime_tz |
|
parseaddr |
Parses string |
parsedate |
Parses string |
parsedate_tz |
Like |
quote |
Returns a copy of string |
unquote |
Returns a copy of string |
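The following sketch exercises several of these functions in v3. The names, addresses, and date string are sample values, not taken from any real message.

```python
from email import utils

# Split an address into a (real name, email address) pair.
name, addr = utils.parseaddr("Alice Example <alice@example.com>")

# Build a header value back from such a pair.
header_value = utils.formataddr(("Bob Example", "bob@example.com"))

# getaddresses takes a list of header values, each possibly holding
# several comma-separated addresses, and returns a list of pairs.
pairs = utils.getaddresses(["a@example.com, B <b@example.com>"])

# Parse a date string into a 10-item tuple (the last item is the
# offset from UTC in seconds), convert it to seconds since the epoch,
# then format that timestamp back as a date string in GMT.
parsed = utils.parsedate_tz("Mon, 20 Nov 1995 19:12:08 -0500")
timestamp = utils.mktime_tz(parsed)
formatted = utils.formatdate(timestamp, usegmt=True)
```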
The email package helps you both in reading and composing email and email-like messages (but it’s not involved in receiving and transmitting such messages: those tasks belong to different and separate modules covered in Chapter 19). Here is an example of how to use email to read a possibly multipart message and unpack each part into a file in a given directory:
import os, email

def unpack_mail(mail_file, dest_dir):
    ''' Given file object mail_file, open for reading, and dest_dir, a
        string that is a path to an existing, writable directory,
        unpack each part of the mail message from mail_file to a
        file within dest_dir.
    '''
    with mail_file:
        msg = email.message_from_file(mail_file)
    for part_number, part in enumerate(msg.walk()):
        if part.get_content_maintype() == 'multipart':
            # we'll get each specific part later in the loop,
            # so, nothing to do for the 'multipart' itself
            continue
        dest = part.get_filename()
        if dest is None: dest = part.get_param('name')
        if dest is None: dest = 'part-{}'.format(part_number)
        # In real life, make sure that dest is a reasonable filename
        # for your OS; otherwise, mangle that name until it is
        with open(os.path.join(dest_dir, dest), 'wb') as f:
            f.write(part.get_payload(decode=True))
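The comment about making sure that dest is a reasonable filename matters because part names come from the message itself and cannot be trusted. One possible approach is sketched below. The helper safe_filename is hypothetical (it is not part of the email package), and the set of allowed characters is an assumption you may need to adapt to your OS.

```python
import os
import re

def safe_filename(name, default="part"):
    """Hypothetical helper: reduce an untrusted name to a plain filename."""
    # Treat both kinds of slash as separators and keep only the last
    # component, so names like '../../etc/passwd' cannot escape dest_dir.
    name = os.path.basename(name.replace("\\", "/"))
    # Replace anything other than letters, digits, dot, dash, underscore.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Reject empty names and names made only of dots ('.' and '..').
    if not name.strip("."):
        return default
    return name
```

A function such as unpack_mail could pass dest through a helper like this just before joining it with dest_dir.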
And here is an example that performs roughly the reverse task, packaging all files that are directly under a given source directory into a single file suitable for mailing:
import os, email.message   # Message lives in email.message in v3

def pack_mail(source_dir, **headers):
    ''' Given source_dir, a string that is a path to an existing,
        readable directory, and arbitrary header name/value pairs
        passed in as named arguments, packs all the files directly
        under source_dir (assumed to be plain text files) into a
        mail message returned as a MIME-formatted string.
    '''
    msg = email.message.Message()
    for name, value in headers.items():
        msg[name] = value
    msg['Content-type'] = 'multipart/mixed'
    # os.walk yields (dirpath, dirnames, filenames); take the filenames
    # of the top-level directory only
    filenames = next(os.walk(source_dir))[-1]
    for filename in filenames:
        m = email.message.Message()
        m.add_header('Content-type', 'text/plain', name=filename)
        with open(os.path.join(source_dir, filename), 'r') as f:
            m.set_payload(f.read())
        msg.attach(m)
    return msg.as_string()
The best way to handle email-like messages is with the email package. However, some other modules covered in Chapters 19 and 20, in v2 only, use instances of the class rfc822.Message or its subclass, mimetools.Message. This section covers the subset of these classes’ functionality that you need to make effective use, in v2, of the modules covered in Chapters 19 and 20.
An instance m of the class Message in either of these v2-only modules is a mapping, with the headers’ names as keys and the corresponding header value strings as values. Keys and values are strings, and keys are case-insensitive. m supports all mapping methods except clear, copy, popitem, and update. get and setdefault default to '' instead of None. The instance m also supplies convenience methods (e.g., to combine getting a header’s value and parsing it as a date or an address). For such purposes, we suggest you use the functions of the module email.utils (covered in “The email.utils Module”): use m just as a mapping.
When m is an instance of mimetools.Message, m supplies additional methods:
Several kinds of media (e.g., email messages) can contain only ASCII text. When you want to transmit arbitrary binary data via such media, you need to encode the data as ASCII text strings. The Python standard library supplies modules that support the standard encodings known as Base64, Quoted Printable, and UU.
The base64 module supports the encodings specified in RFC 3548 as Base16, Base32, and Base64. Each of these encodings is a compact way to represent arbitrary binary data as ASCII text, without any attempt to produce human-readable results. base64 supplies 10 functions: 6 for Base64, plus 2 each for Base32 and Base16. The 6 Base64 functions are:
b64decode |
Decodes Base64-encoded bytestring |
b64encode |
Encodes bytestring |
standard_b64decode |
Like |
standard_b64encode |
Like |
urlsafe_b64decode |
Like |
urlsafe_b64encode |
Like |
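A short sketch of the Base64 functions follows. The sample bytes are chosen so that the standard encoding contains the characters + and /, which the URL-safe variant replaces with - and _.

```python
import base64

data = b"\xfb\xff\xfe binary \x00 data"     # arbitrary sample bytes

encoded = base64.b64encode(data)            # bytestring of ASCII characters
decoded = base64.b64decode(encoded)         # back to the original bytes

# The URL-safe variant uses '-' and '_' where standard Base64 uses
# '+' and '/', so the result can be embedded in URLs and filenames.
url_encoded = base64.urlsafe_b64encode(data)
url_decoded = base64.urlsafe_b64decode(url_encoded)
```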
The four Base16 and Base32 functions are:
In v3 only, the module also supplies functions to encode and decode the non-standard but popular encodings Base85 and Ascii85, which, while not codified in RFCs nor compatible with each other, can offer space savings of 15% by using larger alphabets for encoded bytestrings. See the online docs for details on those functions.
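To get a feel for the size trade-offs among these encodings, the following sketch encodes the same 20 arbitrary bytes with each of them and compares the lengths of the results. b85encode is available in v3 only.

```python
import base64

data = bytes(range(20))             # 20 arbitrary bytes

b16 = base64.b16encode(data)        # 2 output bytes per input byte
b32 = base64.b32encode(data)        # 8 output bytes per 5 input bytes
b64 = base64.b64encode(data)        # 4 output bytes per 3 input bytes
b85 = base64.b85encode(data)        # 5 output bytes per 4 input bytes

sizes = {"b16": len(b16), "b32": len(b32), "b64": len(b64), "b85": len(b85)}

# Each encoding has a matching decode function that restores the data.
restored = [
    base64.b16decode(b16),
    base64.b32decode(b32),
    base64.b64decode(b64),
    base64.b85decode(b85),
]
```

The larger the alphabet, the shorter the encoded result: for this input, Base85 needs 25 bytes where Base64 needs 28 and Base16 needs 40.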
The quopri module supports the encoding specified in RFC 1521 as Quoted Printable (QP). QP can represent any binary data as ASCII text, but it’s mainly intended for data that is mostly textual, with a small number of characters with the high bit set (i.e., characters outside the ASCII range). For such data, QP produces results that are both compact and reasonably human-readable. The quopri module supplies four functions:
decode |
Reads the binary file-like object |
decodestring |
Decodes bytestring |
encode |
Reads binary file-like object |
encodestring |
Encodes bytestring |
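As a small illustration of why QP suits mostly-ASCII text, this sketch encodes a Latin-1 bytestring that contains a single non-ASCII byte. The sample text is arbitrary.

```python
import quopri

text = "café au lait".encode("latin-1")     # one byte outside the ASCII range

encoded = quopri.encodestring(text)         # only the é byte gets escaped
decoded = quopri.decodestring(encoded)      # back to the original bytes
```

The encoded form, `caf=E9 au lait`, remains readable, while Base64 would turn the same bytes into an opaque string.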
The uu module supports the classic Unix-to-Unix (UU) encoding, as implemented by the Unix programs uuencode and uudecode. UU starts encoded data with a begin line, giving the filename and permissions of the file being encoded, and ends it with an end line. Therefore, UU encoding lets you embed encoded data in otherwise unstructured text, while Base64 encoding relies on the existence of other indications of where the encoded data starts and finishes. The uu module supplies two functions: