email: Representing email Headers

RFC 5322 is the base standard that describes the format of email messages. It derives from the older RFC 822 standard, which came into widespread use at a time when most email was composed of ASCII characters only. RFC 5322 is written on the assumption that email contains only 7-bit ASCII characters, and it leaves dealing with non-ASCII characters to the MIME RFCs.

As email has been deployed worldwide, it has become internationalized, such that language-specific character sets can now be used in email messages. The base standard still requires email messages to be transferred using only 7-bit ASCII characters, so a number of RFCs have been written describing how to encode email containing non-ASCII characters into RFC 5322-compliant format. These RFCs include RFC 2045, RFC 2046, RFC 2047, and RFC 2231. The email package supports these standards in this module, with help from the email.charset and email.cte modules.

The email package represents message headers using its header classes. Headers can be created using the header() factory function. All header classes come in two varieties, depending on the input data type: a bytes form and a string form. A convenience attribute, data_type, can be used to tell which type a particular header is, but in most cases an application does not need to worry about the type of individual headers, only the types of the message objects with which it is dealing. String headers store and manipulate a string representation of the header data containing arbitrary Unicode code points, while bytes headers store and manipulate a bytes representation of the header data encoded according to RFC 2047.

Which specific header class is used to represent a header is controlled by the headers registry contained in the policy object in use when the header is created. All standard policy objects use the same headers registry, which recognizes all the header fields required by RFC 5322, plus a number of additional headers commonly found in email messages. Unstructured headers, which include any header not recognized by the headers registry, are represented by the __baseclass__ class, which in the standard registry is the UnstructuredHeader class. This class is used as the base class for all headers.

A string header can be converted into a bytes header via its encode() method. This will perform the necessary RFC 2047 encoding of the header. Conversely, a bytes header can be converted into a string header via its decode() method, which will decode the field contents into unicode. The message classes automatically handle encoding and decoding header objects as required.

Here is an example of creating a subject that contains non-ASCII characters:

>>> from email.headers import header
>>> sheader = header('Subject', 'pöstal')
>>> sheader.data_type
'string'
>>> print(sheader)
Subject: pöstal
>>> bheader = sheader.encode()
>>> bheader.data_type
'bytes'
>>> print(bheader)
Subject: =?utf-8?q?p=C3=B6stal?=
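
The RFC 2047 encoding step can be cross-checked with the standard library's existing email.header module, which is separate from the header() factory described here. The following is a minimal sketch under that assumption; the helper names encode_subject and decode_subject are illustrative only and are not part of the API documented in this section.

```python
from email.header import Header, decode_header, make_header


def encode_subject(text, charset="utf-8"):
    """Return an RFC 2047 encoded form of an unstructured header value."""
    # Header.encode() returns an ASCII-only string; text tagged with a
    # non-ASCII charset is wrapped in RFC 2047 encoded words.
    return Header(text, charset=charset).encode()


def decode_subject(encoded):
    """Decode RFC 2047 encoded words back into a Unicode string."""
    return str(make_header(decode_header(encoded)))


encoded = encode_subject("pöstal")
print(encoded)                  # ASCII-only wire form
print(decode_subject(encoded))  # round-trips to the original text
```

The round trip mirrors the encode() and decode() conversion between string and bytes headers, although the stdlib helpers work on plain strings rather than on header objects.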

Either header can be added to a message object and the message object will convert it as needed:

>>> from email.message import StringMessage, BytesMessage
>>> from sys import stdout
>>> smsg = StringMessage()
>>> smsg.append_header(sheader)
>>> for line in smsg.serialize():
...     stdout.write(line)
Subject: pöstal
>>> bmsg = BytesMessage()
>>> bmsg.append_header(sheader)
>>> for line in bmsg.serialize():
...     stdout.write(line)
Subject: =?utf-8?q?p=C3=B6stal?=
>>> bmsg['Subject'].data_type
'bytes'

Factory Functions

email.headers.header(name, value=None, policy=PolicySet(Strict()))

header is a factory function that returns an appropriate class based on the type of its value argument (bytes versus string) and the value of the name argument.

name is the name of the header (for example, ‘Subject’). Case is not significant in header names, and is normalized for those header names that appear in the policy’s header registry. Names that do not appear in the header registry are case preserved with the exception of the first letter, which is made upper case if it is not already. Under the default policy, names are restricted to characters in the range 33 to 126 inclusive, except for the colon (‘:’), regardless of whether the header is of data type string or bytes.
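
The name rules above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the package's implementation: KNOWN_NAMES is a hypothetical stand-in for the policy's header registry, normalize_name is a made-up helper, and rejecting an empty name is an extra assumption not stated in the text.

```python
# Hypothetical stand-in for the header registry:
# lower-cased name -> canonical spelling.
KNOWN_NAMES = {
    "subject": "Subject",
    "message-id": "Message-ID",
    "mime-version": "MIME-Version",
}


def normalize_name(name, known_names=KNOWN_NAMES):
    """Validate and normalize a header field name (illustrative only)."""
    if not name:
        # Assumption: an empty name is treated as invalid.
        raise ValueError("header name must not be empty")
    for ch in name:
        # Allowed characters: code points 33..126, excluding the colon.
        if not 33 <= ord(ch) <= 126 or ch == ":":
            raise ValueError(f"invalid character {ch!r} in header name {name!r}")
    canonical = known_names.get(name.lower())
    if canonical is not None:
        # Registered names are normalized to the registry's spelling.
        return canonical
    # Unregistered names keep their case, except for the first letter.
    return name[0].upper() + name[1:]


print(normalize_name("SUBJECT"))     # Subject
print(normalize_name("x-myHeader"))  # X-myHeader
```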

value is the body of the header. The body may not include carriage return or linefeed characters. Values for headers of data type string are otherwise unrestricted. Under the default policy, headers of data type bytes must contain only printable ASCII characters (characters in the range 33 to 126 inclusive), spaces and tabs. Further restrictions may apply to headers of either data type when the name appears in the header registry associated with a structured header subclass. If the value is None, then an empty instance is created, and the value can be set later (through the value attribute for unstructured headers, or through specialized methods for structured headers).
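
The restrictions on value under the default policy can be sketched as follows. The function check_value is hypothetical and covers only the rules stated above for unstructured headers; it does not model the further restrictions that structured header subclasses may apply.

```python
def check_value(value):
    """Raise ValueError if value breaks the basic restrictions (sketch)."""
    if value is None:
        # An empty header; the value is expected to be set later.
        return
    if isinstance(value, bytes):
        if b"\r" in value or b"\n" in value:
            raise ValueError("header value may not contain CR or LF")
        for byte in value:
            # Printable ASCII (33..126), space (32) and tab (9) only.
            if not (33 <= byte <= 126 or byte in (0x20, 0x09)):
                raise ValueError(f"byte {byte:#04x} not allowed in a bytes header")
    else:
        # String values are unrestricted apart from CR and LF.
        if "\r" in value or "\n" in value:
            raise ValueError("header value may not contain CR or LF")


check_value("pöstal")              # accepted: string data type
check_value(b"plain text\there")   # accepted: printable ASCII, space, tab
```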

policy sets the default policy for this header (see policy).

The class obtained from the header registry in the policy, based on the value of name, is combined with the __baseclass__ class from the registry and one of the two classes __bytesclass__ or __stringclass__ from the registry to construct the actual class whose instance is returned.
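
One way to picture this class construction is with Python's three-argument type() call. The sketch below is a toy model, not the package's code: the classes are empty stand-ins, DateHeader is a hypothetical structured header class, the order of the base classes is an assumption, and treating a value of None as a string header is also an assumption.

```python
class UnstructuredHeader:
    """Stand-in for the registry's __baseclass__ entry."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value


class StringHeaderMixin:
    data_type = "string"


class BytesHeaderMixin:
    data_type = "bytes"


class DateHeader:
    """Hypothetical structured header class, keyed by header name."""


REGISTRY = {
    "__baseclass__": UnstructuredHeader,
    "__stringclass__": StringHeaderMixin,
    "__bytesclass__": BytesHeaderMixin,
    "date": DateHeader,
}


def header(name, value=None, registry=REGISTRY):
    """Build a class from the registry entries and return an instance."""
    type_key = "__bytesclass__" if isinstance(value, bytes) else "__stringclass__"
    bases = [registry[type_key]]
    # Guard against a header name colliding with the special entries.
    key = name.lower()
    specialized = None if key.startswith("__") else registry.get(key)
    if specialized is not None:
        bases.append(specialized)
    bases.append(registry["__baseclass__"])
    cls = type("_" + "".join(b.__name__ for b in bases), tuple(bases), {})
    return cls(name, value)


h = header("Date", "Mon, 1 Jan 2001 00:00:00 +0000")
print(type(h).__name__, h.data_type)
```

Unrecognized names simply skip the specialized class, so the result is the base class combined with the string or bytes mixin.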

Header Classes

class email.headers.StringHeaderMixin

This is the value for the __stringclass__ entry in the default header registry.

data_type
Read-only; the value is 'string'.
encode(charset='utf-8')
Encode a string header into a bytes header. charset will be used to encode any characters in the value into bytes as needed, and therefore will also appear as the charset in any resulting RFC 2047 encoded words.

class email.headers.BytesHeaderMixin

This is the value for the __bytesclass__ entry in the default header registry.

data_type
Read-only; the value is 'bytes'.
decode(charset='us-ascii', errors='strict')
Decode a bytes header into a string header. charset will be used to decode any non-ASCII range characters in the value, and/or any non-ASCII range characters obtained from RFC 2047 encoded words whose charset is unknown-8bit. errors is passed to the codecs module (see codec-base-classes for possible values). The default charset of us-ascii and the default value of strict for errors mean that the presence of such non-ASCII characters will result in a UnicodeDecodeError. On the other hand, in most cases providing a charset will result in mojibake.
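
The trade-off between strict decoding and supplying a charset can be seen with plain bytes.decode() calls, which take the same kind of charset and errors arguments. This sketch uses only built-in codecs and does not involve the header classes themselves; the variable names are illustrative.

```python
raw = "pöstal".encode("utf-8")  # the non-ASCII bytes a header might carry

# Strict us-ascii decoding refuses the non-ASCII bytes.
strict_result = None
try:
    raw.decode("us-ascii", errors="strict")
except UnicodeDecodeError as exc:
    strict_result = f"UnicodeDecodeError: {exc.reason}"

# A lenient error handler substitutes U+FFFD for each undecodable byte.
replaced = raw.decode("us-ascii", errors="replace")

# A wrong charset decodes without error but produces mojibake.
mojibake = raw.decode("latin-1")

# Only the charset the bytes were really written in recovers the text.
correct = raw.decode("utf-8")

print(strict_result)
print(replaced, mojibake, correct)
```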
class email.headers.UnstructuredHeader(name, value, policy=PolicySet())

This is the value for the __baseclass__ entry in the default header registry.

name is the name of the header. It should be in normalized form, as it will not be processed.

value is the field body. It should be in unfolded form (that is, it should not contain any carriage returns or line feeds). It is not validated during object creation.

policy
The default policy in effect for this header (see policy). Changing the policy may invalidate the defects list; use validate() to refresh it if needed.
policy_override
A Policy object providing policy settings that will override settings from the default policy. Changing the policy may invalidate the defects list; use validate() to refresh it if needed.
raw_data
The raw data that was parsed to create this header. The parser should set this attribute after creating the header. This attribute is used if the header is serialized with a policy that has use_raw_data_if_possible set.
name
The ‘field name’ in RFC 5322 parlance. The value is whatever was passed in to the constructor, but in normal circumstances will be the field name in normalized form (see header()).
value
The ‘field body’ in RFC 5322 parlance. The value is whatever was passed in to the constructor, but in normal circumstances will be an unfolded field body (see header()).
defects
A list of Defect instances corresponding to RFC compliance issues found while processing the header data, if the policy in use records them. Defects can be noted during any method call, so the defects list may be incomplete unless validate() has been called. Its initial value is an empty list.
validate(policy=None)
Make sure that all detectable defects are registered. Calling this method clears the defects list, which is then repopulated if and only if the policy in use records defects (the default policy does so).
serialize(policy=None)
A generator returning the lines resulting from folding the header according to RFC 5322 folding rules, using the formatting parameters specified by policy. The generated lines include the end of line characters specified by the policy.
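
A simplified picture of such a folding generator is given below. It is a sketch only: fold_header and its max_line_length and newline parameters are hypothetical stand-ins for the policy settings, it breaks only at single spaces, and it makes no attempt to handle structured headers or RFC 2047 encoded words.

```python
def fold_header(name, value, max_line_length=78, newline="\r\n"):
    """Yield folded lines for 'name: value', each ending in newline.

    Greedy folding at spaces: a line break is inserted before a space,
    so every continuation line starts with whitespace as RFC 5322
    requires. A single word longer than max_line_length is not split.
    """
    words = f"{name}: {value}".split(" ")
    line = words[0]
    for word in words[1:]:
        if len(line) + 1 + len(word) > max_line_length:
            yield line + newline
            line = " " + word  # the leading space marks a continuation
        else:
            line += " " + word
    yield line + newline


value = "the quick brown fox jumps over the lazy dog near the river bank"
lines = list(fold_header("Subject", value, max_line_length=30))
for line in lines:
    print(repr(line))
```

Removing the line endings from the generated lines restores the original unfolded header, which is the property unfolding relies on.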
__str__()
A representation of the header as a single line (name and value separated by a colon and a space). For a bytes header this will be the ASCII representation decoded to Unicode, and is equivalent to serializing the header with Policy(max_line_length=None, must_be_7bit=True, newline=''), and calling decode('ASCII') on the resulting byte string.
__eq__(other)
Two headers are equal if they have the same base types, their names compare equal without regard to case, and their values compare equal (see value_equal()).
__ne__(other)
not __eq__(other)
value_equal(other)
The values of unstructured headers are considered equal if they compare equal under simple string comparison. Bytes header values are decoded to strings before comparison. If a UnicodeDecodeError is raised during the decoding of either value, then the byte strings are compared.
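
The comparison rules can be sketched on plain (name, value) pairs. This is a rough model under stated assumptions: headers_equal and this standalone value_equal are illustrative functions, "same base types" is approximated by the value having the same Python type, and decoding is plain charset decoding without any RFC 2047 handling.

```python
def value_equal(a, b, charset="us-ascii"):
    """Compare two header values, decoding bytes values first (sketch)."""
    if isinstance(a, bytes) and isinstance(b, bytes):
        try:
            return a.decode(charset) == b.decode(charset)
        except UnicodeDecodeError:
            # Fall back to comparing the raw byte strings.
            return a == b
    return a == b


def headers_equal(h1, h2):
    """Compare two (name, value) pairs using the rules described above."""
    (name1, value1), (name2, value2) = h1, h2
    return (
        type(value1) is type(value2)          # approximates "same base types"
        and name1.lower() == name2.lower()    # names ignore case
        and value_equal(value1, value2)
    )


print(headers_equal(("Subject", "hi"), ("SUBJECT", "hi")))   # True
print(headers_equal(("Subject", "hi"), ("Subject", b"hi")))  # False
```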