.. index:: email6

2011-03-28 Policy Framework First Draft
=======================================

Last week turned out to be mostly about tests and bugs. As per my last post,
I moved the tests into a test package. Then I went on to add a bunch of
`additional tests`_ developed by Michael Henry at the PyCon sprints. More
tests are always good before starting to modify code, right?

.. _additional tests: http://bugs.python.org/issue11589

Michael's tests had revealed a couple bugs, though, so I then went on to apply
the `fix`_ for those bugs, which included a `rewritten algorithm`_ for encoding
strings as quoted printable. I adapted the algorithm proposed by Michael, then
discovered a different and probably `better algorithm`_ had already been
proposed a while back and gotten lost in the tracker. That proposed patch was
against the email package in Python2, though, and the corresponding code in
Python3 has a different interface, so the patch wasn't easily adapted. Since
there are other changes that need to be made to the quoted printable encoder,
I have deferred implementing the better algorithm until I get as far as
touching that code for the email6 work.

.. _fix: http://bugs.python.org/issue11590
.. _rewritten algorithm: http://bugs.python.org/issue11606
.. _better algorithm: http://bugs.python.org/issue5803

There was also a `bug`_ in the Email5 API that I wanted to fix before starting
to make API changes. When you deal with "dirty" headers in Email5.1, you may
get back a ``Header`` object when querying a header. Now, the normal way to
deal with crazy headers in Email5 is to pass them to ``decode_header`` to get
the pairs of character sets and original bytes from the wire out. But
``decode_header`` wasn't accepting a ``Header`` object for ``decoding``. My
first approach was to try shifting back to returning strings even when the
header was "dirty", by wrapping them up in encoded words with the
``unknown-8bit`` charset.
That more or less worked, but doing it that way would mean making some other
changes to methods such as ``get_param`` to handle headers that had gotten
re-encoded into encoded words. This was far from optimal. The reporter of the
bug pointed out that I had carefully documented that ``Message`` would return a
``Header`` if the source header had unencoded non-ASCII bytes in it, which made
changing this behavior in a bug fix release a non-starter. So I gave in and
just fixed ``decode_header`` to handle ``Header`` objects. Since *all* headers
in email6 will be a (new type of) ``Header`` object, programmers may as well
get used to dealing with them.

.. _bug: http://bugs.python.org/issue11584

For email6 itself, there is now a `feature branch`_ where I will do the patch
development for email6 before applying the changes to the main cpython
repository. The branch is named ``email6``, of course. Anyone may browse or
clone this repository to take a look at the current state of development.

.. _feature branch: http://hg.python.org/features/email6

And that current state is that I have checked in the first draft of the Policy
framework. This consists of a new module, `policy.py`_, the associated
documentation, `policy.rst`_, and a set of tests, `test_policy.py`_.

.. _policy.py: http://hg.python.org/features/email6/file/email6/Lib/email/policy.py
.. _policy.rst: http://hg.python.org/features/email6/file/email6/Doc/library/email.policy.rst
.. _test_policy.py: http://hg.python.org/features/email6/file/email6/Lib/test/test_email/test_policy.py

The basic idea is that a ``Policy`` object is an immutable container for a
bunch of attributes and callback hooks. You can call a ``Policy`` object to
get a new one with some of the defaults changed. And you can add them
together, with the non-default settings from the right operand overriding those
from the left operand.

So far we have policies such as:

* default
* SMTP
* HTML
* Strict

*default* may get renamed *email6*.
I'd prefer 'default', since that's what I'd like it to be by the time we get to
Python 3.4. The actual default policy when I start adding the parameter to
other classes and functions will be *email5*, though, so the name *default* for
email6 is probably not going to work.

The *SMTP* policy is just like default, but generates "wire format" line
separators (``\r\n``). *HTML* is like *SMTP*, but does not wrap headers.
*Strict* sets a flag that will (once I implement it) cause the parser to raise
errors when it encounters defects instead of just keeping track of them.

Using *Strict* is where you can see the utility of adding policies together::

    >>> StrictSMTP = SMTP + Strict

You could use StrictSMTP to parse an incoming SMTP message where you wanted
your program to blow up if the message was invalid. (When would you ever want
that? I don't know, but someone probably will!)

So far I've only defined one hook, ``register_defect``. You could subclass
``Policy`` and define your own ``register_defect`` method that would, say, log
all defects to a log file, thus giving you some idea of the quality of the
email being processed by your program, even if you did nothing else with the
defect info.

Now we'll see what the Email SIG thinks of this implementation, and meanwhile
I'll be adding policy arguments to the parser and generator classes.
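
To make the behaviour described above concrete (immutable, callable to derive a
variant, addable with the right operand winning, plus a ``register_defect``
hook), here is a toy sketch. It is not the code in the feature branch. The
attribute names ``linesep``, ``max_line_length`` and ``raise_on_defect``, their
default values, and the ``LoggingPolicy`` subclass are all assumptions made up
for the illustration.

```python
import logging


class Policy:
    """Toy model of an immutable policy object (illustrative only)."""

    # Class-level defaults. These names and values are assumptions.
    linesep = "\n"
    max_line_length = 78
    raise_on_defect = False

    def __init__(self, **overrides):
        for name, value in overrides.items():
            if not hasattr(type(self), name):
                raise TypeError(f"{name!r} is not a valid policy attribute")
            # Bypass our own __setattr__, which forbids changes.
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        raise AttributeError("Policy objects are immutable")

    def __call__(self, **changes):
        """Return a new policy: this one's overrides plus `changes`."""
        settings = dict(self.__dict__)  # only explicitly set values live here
        settings.update(changes)
        return type(self)(**settings)

    def __add__(self, other):
        """Combine two policies; the right operand's overrides win."""
        settings = dict(self.__dict__)
        settings.update(other.__dict__)
        return type(self)(**settings)

    def register_defect(self, obj, defect):
        """Hook: by default, just keep track of the defect on the object."""
        obj.defects.append(defect)


class LoggingPolicy(Policy):
    """Hypothetical subclass that also logs every defect it is told about."""

    def register_defect(self, obj, defect):
        logging.getLogger("email.defects").warning("defect: %s", defect)
        super().register_defect(obj, defect)


# Building the named policies from the post out of the toy class.
default = Policy()
SMTP = default(linesep="\r\n")
HTML = SMTP(max_line_length=None)
Strict = default(raise_on_defect=True)
StrictSMTP = SMTP + Strict
```

In this sketch only explicitly overridden values are stored on the instance, so
"non-default settings" simply means whatever is in the instance dictionary;
that is what lets ``SMTP + Strict`` keep the ``\r\n`` separator from the left
operand while picking up the strict flag from the right.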