:mod:`email`: Representing email Headers
----------------------------------------

.. module:: email.headers
   :synopsis: Representing structured and unstructured headers that may or may not contain non-ASCII characters.

:rfc:`5322` is the base standard that describes the format of email messages.
It derives from the older :rfc:`822` standard, which came into widespread use
at a time when most email was composed of ASCII characters only. :rfc:`5322`
is a specification written assuming email contains only 7-bit ASCII
characters, and leaves dealing with non-ASCII characters to the MIME RFCs.

As email has been deployed worldwide, it has become internationalized, such
that language-specific character sets can now be used in email messages. The
base standard still requires email messages to be transferred using only 7-bit
ASCII characters, so a number of RFCs have been written describing how to
encode email containing non-ASCII characters into :rfc:`5322`\ -compliant
format. These RFCs include :rfc:`2045`, :rfc:`2046`, :rfc:`2047`, and
:rfc:`2231`. The :mod:`email` package supports these standards in this module,
with help from the :mod:`email.charset` and :mod:`email.cte` modules.

The email package represents message headers using its header classes.
Headers can be created using the :func:`header` factory function. All header
classes come in two varieties, depending on the input data type: a bytes form
and a string form. A convenience attribute,
:attr:`~StringHeaderMixin.data_type`, can be used to tell which type a
particular header is, but in most cases an application does not need to worry
about the type of individual headers, only about the types of the message
objects it is dealing with. String headers store and manipulate a string
representation of the header data containing arbitrary Unicode code points,
while bytes headers store and manipulate a bytes representation of the header
data encoded according to :rfc:`2047`.
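The string-to-bytes conversion described above produces :rfc:`2047` encoded
words. As an illustration only, the following sketch uses the existing
standard-library ``email.header`` module (its ``Header`` class and its
``decode_header()`` and ``make_header()`` functions), not the
:mod:`email.headers` API documented in this section, to encode a non-ASCII
value into an encoded word and decode it back:

.. code-block:: python

   from email.header import Header, decode_header, make_header

   # Encode a value containing a non-ASCII character as an RFC 2047
   # encoded word, using UTF-8 as the charset.
   encoded = Header('pöstal', 'utf-8').encode()
   print(encoded)  # =?utf-8?q?p=C3=B6stal?=

   # decode_header() splits the encoded word into (bytes, charset) pairs;
   # make_header() rebuilds a Header whose str() is the original text.
   decoded = str(make_header(decode_header(encoded)))
   print(decoded)  # pöstal

The encoded word carries the UTF-8 bytes of ``ö`` (``=C3=B6``). ``=F6`` is the
Latin-1 encoding of the same character and would be correct only in an encoded
word labelled ``iso-8859-1``.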
Which specific header class is used to represent a header is controlled by the
:attr:`~email.policy.PolicySet.headers` registry contained in the policy
object in use when the header is created. All standard policy objects use the
same headers registry, which recognizes all the header fields required by
:rfc:`5322`, plus a number of additional headers commonly found in email
messages. Unstructured headers (any header that is not recognized by the
headers registry is treated as unstructured) are represented by the
``__baseclass__`` class, which in the standard registry is the
:class:`UnstructuredHeader` class. This class is also used as the base class
for all headers.

A string header can be converted into a bytes header via its
:meth:`~StringHeaderMixin.encode` method, which performs the necessary
:rfc:`2047` encoding of the header. Conversely, a bytes header can be
converted into a string header via its :meth:`~BytesHeaderMixin.decode`
method, which decodes the field contents into Unicode. The message classes
automatically handle encoding and decoding header objects as required.

Here is an example of creating a subject that contains non-ASCII characters::

    >>> from email.headers import header
    >>> sheader = header('Subject', 'pöstal')
    >>> sheader.data_type
    'string'
    >>> print(sheader)
    Subject: pöstal
    >>> bheader = sheader.encode()
    >>> bheader.data_type
    'bytes'
    >>> print(bheader)
    Subject: =?utf-8?q?p=C3=B6stal?=

Either header can be added to a message object, and the message object will
convert it as needed::

    >>> from email.message import StringMessage, BytesMessage
    >>> from sys import stdout
    >>> smsg = StringMessage()
    >>> smsg.append_header(sheader)
    >>> for line in smsg.serialize():
    ...     stdout.write(line)
    Subject: pöstal
    >>> bmsg = BytesMessage()
    >>> bmsg.append_header(sheader)
    >>> for line in bmsg.serialize():
    ...     stdout.write(line)
    Subject: =?utf-8?q?p=C3=B6stal?=
    >>> bmsg['Subject'].data_type
    'bytes'


Factory Functions
~~~~~~~~~~~~~~~~~

.. function:: header(name, value=None, policy=PolicySet(Strict()))

   :func:`header` is a factory function that returns an instance of the
   appropriate header class, chosen according to the type of its *value*
   argument (bytes versus string) and the value of its *name* argument.

   *name* is the name of the header (for example, ``'Subject'``). Case is not
   significant in header names, and is normalized for those header names that
   appear in the policy's header registry. Names that do not appear in the
   header registry are case preserved, with the exception of the first letter,
   which is made upper case if it is not already. Under the default policy,
   names are restricted to characters in the range 33 to 126 inclusive, except
   for the colon (``:``), regardless of whether the header is of data type
   string or bytes.

   *value* is the body of the header. The body may not include carriage return
   or line feed characters. Values for headers of data type string are
   otherwise unrestricted. Under the default policy, headers of data type bytes
   must contain only printable ASCII characters (characters in the range 33 to
   126 inclusive), spaces, and tabs. Further restrictions may apply to headers
   of either data type when the name appears in the header registry associated
   with a structured header subclass. If the value is :const:`None`, an empty
   instance is created, and the value can be set later (through the
   :attr:`~UnstructuredHeader.value` attribute for unstructured headers, or
   through specialized methods for structured headers).

   *policy* sets the default policy for this header (see :mod:`~email.policy`).
   The class obtained from the header registry in the policy, based on the
   value of *name*, is combined with the ``__baseclass__`` class from the
   registry and with one of the two classes ``__bytesclass__`` or
   ``__stringclass__`` from the registry, to construct the actual class whose
   instance is returned.


Header Classes
~~~~~~~~~~~~~~

.. class:: StringHeaderMixin()

   This is the value for the ``__stringclass__`` entry in the default header
   registry.

   .. attribute:: data_type

      Read only, value ``'string'``.

   .. method:: encode(charset='utf-8')

      Encode a string header into a bytes header. *charset* will be used to
      encode any characters in the value into bytes as needed, and therefore
      will also appear as the charset in any resulting :rfc:`2047` encoded
      words.


.. class:: BytesHeaderMixin()

   This is the value for the ``__bytesclass__`` entry in the default header
   registry.

   .. attribute:: data_type

      Read only, value ``'bytes'``.

   .. method:: decode(charset='us-ascii', errors='strict')

      Decode a bytes header into a string header. *charset* will be used to
      decode any non-ASCII bytes in the value, and any non-ASCII bytes obtained
      from :rfc:`2047` encoded words whose charset is ``unknown-8bit``.
      *errors* is passed to the :mod:`codecs` module (see
      :ref:`codec-base-classes` for possible values). The default charset of
      ``us-ascii`` and the default value of ``strict`` for *errors* mean that
      the presence of such non-ASCII bytes will result in a
      :exc:`UnicodeDecodeError`. On the other hand, because the actual charset
      of such bytes is unknown, supplying a *charset* is a guess, and in most
      cases will result in mojibake__.

      __ http://en.wikipedia.org/wiki/Mojibake


.. class:: UnstructuredHeader(name, value, policy=PolicySet())

   This is the value for the ``__baseclass__`` entry in the default header
   registry.

   *name* is the name of the header. It should be in normalized form, as it
   will not be processed.

   *value* is the field body. It should be in unfolded form (that is, it should
   not contain any carriage returns or line feeds). It is not validated during
   object creation.

   .. attribute:: policy

      The default policy in effect for this header (see :mod:`~email.policy`).
      Changing the policy may invalidate the :attr:`defects` list; use
      :meth:`validate` to refresh it if needed.

   .. attribute:: policy_override

      A :class:`~email.policy.Policy` object providing policy settings that
      will override settings from the default policy. Changing the policy may
      invalidate the :attr:`defects` list; use :meth:`validate` to refresh it
      if needed.

   .. attribute:: raw_data

      The raw data that was parsed to create this header. The parser should
      set this attribute after creating the header. This attribute is used if
      the header is serialized with a policy that has
      :attr:`use_raw_data_if_possible` set.

   .. attribute:: name

      The "field name" in :rfc:`5322` parlance. Its value is whatever was
      passed to the constructor, but under normal circumstances it will be the
      field name in normalized form (see :func:`header`).

   .. attribute:: value

      The "field body" in :rfc:`5322` parlance. Its value is whatever was
      passed to the constructor, but under normal circumstances it will be an
      unfolded field body (see :func:`header`).

   .. attribute:: defects

      A list of :class:`~email.errors.Defect` instances corresponding to RFC
      compliance issues found while processing the header data, if the policy
      in use records them. Defects can be noted during any method call, so the
      list may be incomplete unless :meth:`validate` has been called. Its
      initial value is an empty list.

   .. method:: validate(policy=None)

      Make sure that all detectable defects are registered. Calling this
      method clears the :attr:`defects` list, which is then repopulated if and
      only if the policy in use records defects (the default policy does).

   .. method:: serialize(policy=None)

      A generator yielding the lines that result from folding the header
      according to :rfc:`5322` folding rules, using the formatting parameters
      specified by *policy*. The generated lines include the end-of-line
      characters specified by the policy.

   .. method:: __str__()

      A representation of the header as a single line (name and value
      separated by a colon and a space). For a bytes header this will be the
      ASCII representation decoded to Unicode, and is equivalent to
      serializing the header with
      ``Policy(max_line_length=None, must_be_7bit=True, newline='')`` and
      calling :meth:`bytes.decode` with ``'ascii'`` on the resulting byte
      string.

   .. method:: __eq__(other)

      Two headers are equal if they have the same base types, their names
      compare equal without regard to case, and their values compare equal
      (see :meth:`value_equal`).

   .. method:: __ne__(other)

      Equivalent to ``not self.__eq__(other)``.

   .. method:: value_equal(other)

      The values of unstructured headers are considered equal if they compare
      equal under simple string comparison. Bytes header values are decoded to
      strings before comparison. If a :exc:`UnicodeDecodeError` is raised
      during the decoding of either value, then the byte strings are compared.

   .. method:: unfold_header(headerstring)

      Takes an input string containing carriage return and/or line feed
      characters and returns the data unfolded according to slightly modified
      :rfc:`5322` unfolding rules. The modification to the strict
      interpretation of the :rfc:`5322` rules is that white space used for
      folding is compressed into a single space during unfolding. The RFC does
      not specify this behavior, but it is the most useful behavior for
      handling email messages generated by the vast majority of email handling
      software.

      This method should not be needed by most applications, but is exposed
      for applications that need to manipulate the raw data directly.
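The modified unfolding rule described for
:meth:`UnstructuredHeader.unfold_header` can be sketched as a standalone
function. ``unfold`` below is a hypothetical helper written for illustration,
not the implementation of that method. It assumes that folding white space is
any run of spaces or tabs before a line break (``CRLF``, ``CR``, or ``LF``)
together with the spaces or tabs that start the continuation line, and it
replaces each such run with a single space:

.. code-block:: python

   import re

   # Hypothetical helper illustrating the "compress folding white space to a
   # single space" rule; it is not the email package's own code. A fold is
   # optional spaces/tabs, a line break (CRLF, CR or LF), and at least one
   # space or tab at the start of the continuation line.
   _FOLD = re.compile(r'[ \t]*(?:\r\n|\r|\n)[ \t]+')


   def unfold(headerstring):
       """Return *headerstring* with each fold replaced by a single space."""
       # Drop the line break that terminates the last line of the header.
       headerstring = headerstring.rstrip('\r\n')
       return _FOLD.sub(' ', headerstring)


   folded = 'Subject: This is a long\r\n   subject line\r\n\tthat was folded\r\n'
   print(unfold(folded))  # Subject: This is a long subject line that was folded

Strict :rfc:`5322` unfolding, which removes only the ``CRLF`` pairs, would
instead keep the three spaces and the tab:
``'Subject: This is a long   subject line\tthat was folded'``.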