RFC 5322 is the base standard that describes the format of email messages. It derives from the older RFC 822 standard, which came into widespread use at a time when most email was composed of ASCII characters only. RFC 5322 assumes that email contains only 7-bit ASCII characters and leaves the handling of non-ASCII characters to the MIME RFCs.
As email has been deployed worldwide, it has become internationalized, so that language-specific character sets can now be used in email messages. The base standard still requires email messages to be transferred using only 7-bit ASCII characters, so a number of RFCs have been written describing how to encode email containing non-ASCII characters into an RFC 5322-compliant format. These RFCs include RFC 2045, RFC 2046, RFC 2047, and RFC 2231. The email package supports these standards in this module, with help from the email.charset and email.cte modules.
The email package represents message headers using its header classes. Headers can be created using the header() factory function. All header classes come in two varieties, depending on the input data type: a bytes form and a string form. A convenience attribute, data_type, can be used to tell which type a particular header is, but in most cases an application does not need to worry about the type of individual headers, only the types of the message objects with which it is dealing. String headers store and manipulate a string representation of the header data containing arbitrary Unicode code points, while bytes headers store and manipulate a bytes representation of the header data encoded according to RFC 2047.
Which specific header class is used to represent a header is controlled by the headers registry contained in the policy object in use when the header is created. All standard policy objects use the same headers registry, which recognizes all the header fields required by RFC 5322, plus a number of additional headers commonly found in email messages. Any header that the headers registry does not recognize is treated as unstructured. Unstructured headers are represented by the __baseclass__ class, which in the standard registry is the UnstructuredHeader class. This class is also used as the base class for all other headers.
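The lookup-with-fallback behaviour of a headers registry can be tried out with the registry that ships in the current standard library, email.headerregistry.HeaderRegistry. This is only a sketch: it assumes that the standard library registry is a reasonable stand-in for the one described here, it is not the header() API of this module, and the header name X-Example-Note and the address are made up for illustration.

```python
from email.headerregistry import (
    AddressHeader,
    HeaderRegistry,
    UnstructuredHeader,
)

# Assumption: the stdlib HeaderRegistry stands in for the registry
# described in the text; it is not the header() factory itself.
registry = HeaderRegistry()

# A name the registry recognizes is given a specialized header class.
to_header = registry("To", "Ann Example <ann@example.com>")

# A name the registry does not recognize falls back to the
# unstructured class.
custom_header = registry("X-Example-Note", "anything goes here")

print(isinstance(to_header, AddressHeader))
print(isinstance(custom_header, UnstructuredHeader))
```

The structured header exposes parsed data (here, to_header.addresses), while the unrecognized header is kept as plain text.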
A string header can be converted into a bytes header via its encode() method, which performs the necessary RFC 2047 encoding of the header. Conversely, a bytes header can be converted into a string header via its decode() method, which decodes the field contents into Unicode. The message classes automatically handle encoding and decoding header objects as required.
Here is an example of creating a subject that contains non-ASCII characters:
>>> from email.headers import header
>>> sheader = header('Subject', 'pöstal')
>>> sheader.data_type
'string'
>>> print(sheader)
Subject: pöstal
>>> bheader = sheader.encode()
>>> bheader.data_type
'bytes'
>>> print(bheader)
Subject: =?utf-8?q?p=C3=B6stal?=
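The same RFC 2047 round trip can be reproduced with the email.header module of the current standard library. This is a sketch under the assumption that email.header.Header is an acceptable stand-in for the string and bytes header pair described here; it does not use the header() factory. It also encodes the text under two charsets to show that the encoded word always names the charset its bytes are in.

```python
from email.header import Header, decode_header, make_header

# Assumption: email.header.Header stands in for the header classes
# described in the text.
text = "pöstal"

# Encode the same text under two charsets. Each result is an RFC 2047
# encoded word that records which charset was used.
utf8_word = Header(text, charset="utf-8").encode()
latin1_word = Header(text, charset="iso-8859-1").encode()

# Decoding reverses the process and recovers the Unicode text.
utf8_roundtrip = str(make_header(decode_header(utf8_word)))
latin1_roundtrip = str(make_header(decode_header(latin1_word)))

print(utf8_word)
print(latin1_word)
print(utf8_roundtrip == text and latin1_roundtrip == text)
```

Both encoded forms are pure ASCII, and both decode to the same string even though the encoded bytes differ between charsets.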
Either header can be added to a message object and the message object will convert it as needed:
>>> from email.message import StringMessage, BytesMessage
>>> from sys import stdout
>>> smsg = StringMessage()
>>> smsg.append_header(sheader)
>>> for line in smsg.serialize():
...     stdout.write(line)
Subject: pöstal
>>> bmsg = BytesMessage()
>>> bmsg.append_header(sheader)
>>> for line in bmsg.serialize():
...     stdout.write(line)
Subject: =?utf-8?q?p=C3=B6stal?=
>>> bmsg['Subject'].data_type
'bytes'
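A comparable message-level conversion can be observed with the current standard library, where the policy rather than the message class decides how headers are serialized. This sketch assumes that EmailMessage combined with the default and SMTPUTF8 policies is a fair analogue of the BytesMessage and StringMessage behaviour shown here; it does not use those classes.

```python
from email import message_from_string, policy
from email.message import EmailMessage

# Assumption: EmailMessage plus policies stands in for the
# StringMessage/BytesMessage classes described in the text.
msg = EmailMessage()
msg["Subject"] = "pöstal"

# Default policy: the header is serialized as 7-bit ASCII using
# RFC 2047 encoding.
wire_form = msg.as_string()

# SMTPUTF8 policy: non-ASCII header text is written unencoded.
utf8_form = msg.as_string(policy=policy.SMTPUTF8)

# Parsing the ASCII form decodes the header back into Unicode.
parsed = message_from_string(wire_form, policy=policy.default)

print(wire_form.isascii())
print("pöstal" in utf8_form)
print(parsed["Subject"] == "pöstal")
```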
header is a factory function that returns an instance of an appropriate class, chosen based on the type of its value argument (bytes versus string) and the value of the name argument.
name is the name of the header (for example, ‘Subject’). Case is not significant in header names, and is normalized for those header names that appear in the policy’s header registry. Names that do not appear in the header registry are case-preserved with the exception of the first letter, which is made upper case if it is not already. Under the default policy, names are restricted to characters in the range 33 to 126 inclusive, excluding the colon (‘:’), regardless of whether the header is of data type string or bytes.
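The naming rules can be summarized in a few lines of code. The function normalize_header_name and the KNOWN_NAMES table below are hypothetical and exist only to illustrate the rules as stated; they are not part of the package, and the rejection of empty names is an added assumption.

```python
# Hypothetical table of registry names, keyed by lowercase form.
KNOWN_NAMES = {"subject": "Subject", "message-id": "Message-ID"}


def normalize_header_name(name: str) -> str:
    """Apply the header naming rules described above (illustrative only)."""
    if not name:
        # Assumption: an empty name is rejected; the text does not say.
        raise ValueError("header name must not be empty")
    for ch in name:
        # Allowed: code points 33 to 126 inclusive, excluding the colon.
        if not 33 <= ord(ch) <= 126 or ch == ":":
            raise ValueError(f"invalid character {ch!r} in header name")
    canonical = KNOWN_NAMES.get(name.lower())
    if canonical is not None:
        # Names found in the registry are normalized to its spelling.
        return canonical
    # Other names keep their case, except that the first letter is
    # made upper case.
    return name[0].upper() + name[1:]


print(normalize_header_name("SUBJECT"))
print(normalize_header_name("x-my-Header"))
```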
value is the body of the header. The body may not include carriage return or linefeed characters. Values for headers of data type string are otherwise unrestricted. Under the default policy, headers of data type bytes must contain only printable ASCII characters (characters in the range 33 to 126 inclusive), spaces, and tabs. Further restrictions may apply to headers of either data type when the header registry associates the name with a structured header subclass. If the value is None, then an empty instance is created, and the value can be set later (through the value attribute for unstructured headers, or through specialized methods for structured headers).
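The restrictions on value for unstructured headers can be expressed as a small check. The function check_header_value is hypothetical and only mirrors the rules as stated for the default policy; restrictions specific to structured headers are not modelled.

```python
def check_header_value(value):
    """Raise ValueError if value breaks the rules above (illustrative only)."""
    if value is None:
        # An empty header; the value is expected to be set later.
        return
    if isinstance(value, bytes):
        for byte in value:
            # Bytes headers: printable ASCII (33 to 126), space, or tab.
            # This also rules out carriage return and linefeed.
            if not (33 <= byte <= 126 or byte in (0x20, 0x09)):
                raise ValueError(f"byte {byte:#04x} not allowed in header value")
    else:
        # String headers: anything except carriage return or linefeed.
        if "\r" in value or "\n" in value:
            raise ValueError("header value must not contain CR or LF")


check_header_value("pöstal")           # accepted: string headers allow Unicode
check_header_value(b"plain\tascii ok")  # accepted: printable ASCII, tab, space
```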
policy sets the default policy for this header (see policy).
The class obtained from the header registry in the policy, based on the value of name, is combined with the __baseclass__ class from the registry and one of the two classes __bytesclass__ or __stringclass__ from the registry to construct the actual class whose instance is returned.
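One way to picture this composition is dynamic class creation with type(). Every class name, the SKETCH_REGISTRY table, and the order of the base classes in this sketch are assumptions made for illustration; the text does not specify how the package combines the classes internally.

```python
class BaseSketch:
    """Stands in for the registry's __baseclass__ entry."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class StringSketch:
    """Stands in for the registry's __stringclass__ entry."""
    data_type = "string"


class BytesSketch:
    """Stands in for the registry's __bytesclass__ entry."""
    data_type = "bytes"


class DateSketch:
    """Stands in for a structured class registered under a name."""
    structured = True


# Hypothetical registry mapping lowercase names to specific classes.
SKETCH_REGISTRY = {"date": DateSketch}


def build_header_class(name, value):
    """Combine name-specific, type-specific, and base classes into one."""
    specific = SKETCH_REGISTRY.get(name.lower())
    form = BytesSketch if isinstance(value, bytes) else StringSketch
    # Assumption: the name-specific class comes first, the base class last.
    bases = tuple(c for c in (specific, form, BaseSketch) if c is not None)
    return type("_" + bases[0].__name__, bases, {})


date_header = build_header_class("Date", "1 Jan 2000")("Date", "1 Jan 2000")
other_header = build_header_class("X-Note", b"hi")("X-Note", b"hi")
print(date_header.data_type, other_header.data_type)
```

Composing the class at creation time keeps the name-specific behaviour and the string-versus-bytes behaviour independent of each other, so neither needs to know about the other.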
This is the value for the __stringclass__ entry in the default header registry.
This is the value for the __bytesclass__ entry in the default header registry.
This is the value for the __baseclass__ entry in the default header registry.
name is the name of the header. It should be in normalized form, as it will not be processed.
value is the field body. It should be in unfolded form (that is, it should not contain any carriage returns or line feeds). It is not validated during object creation.
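Because the value is not validated, a caller holding a folded field body needs to unfold it first. RFC 5322 defines unfolding as removing any line break that is immediately followed by a space or tab. The helper below is a hypothetical sketch of that rule, not a function provided by the package; as a leniency it also accepts a bare linefeed as a line break.

```python
import re


def unfold(folded: str) -> str:
    """Remove each line break that is followed by a space or tab."""
    # The whitespace after the break is kept; only the break is removed.
    return re.sub(r"\r?\n(?=[ \t])", "", folded)


folded_value = "a rather long subject line\r\n that was folded"
unfolded_value = unfold(folded_value)
print(unfolded_value)
```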