Last week turned out to be mostly about tests and bugs. As per my last post, I moved the tests into a test package. Then I went on to add a bunch of additional tests developed by Michael Henry at the PyCon sprints. More tests are always good before starting to modify code, right?
Michael’s tests revealed a couple of bugs, though, so I next applied the fixes for those bugs, which included a rewritten algorithm for encoding strings as quoted-printable. I adapted the algorithm Michael proposed, then discovered that a different and probably better algorithm had been proposed a while back and had gotten lost in the tracker. That patch was against the email package in Python 2, though, and the corresponding code in Python 3 has a different interface, so it wasn’t easily adapted. Since other changes need to be made to the quoted-printable encoder anyway, I have deferred implementing the better algorithm until I touch that code as part of the email6 work.
There was also a bug in the Email5 API that I wanted to fix before starting to make API changes. When you deal with “dirty” headers in Email5.1 (headers containing non-ASCII bytes that are not wrapped in encoded words), querying a header may return a Header object rather than a string. The normal way to deal with such headers in Email5 is to pass them to decode_header, which gives you back pairs of the original bytes from the wire and their character sets. But decode_header did not accept a Header object. My first approach was to go back to returning strings even when the header was “dirty”, by wrapping the offending bytes in encoded words with the unknown-8bit charset. That more or less worked, but it would have meant changing other methods, such as get_param, so that they could handle headers that had been re-encoded into encoded words. That was far from optimal. In addition, the reporter of the bug pointed out that I had carefully documented that Message returns a Header if the source header has unencoded non-ASCII bytes in it, which made changing this behavior in a bug-fix release a non-starter. So I gave in and just fixed decode_header to handle Header objects. Since all headers in email6 will be a (new type of) Header object, programmers may as well get used to dealing with them.
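A small sketch of the situation the fix addresses. It assumes a current Python 3, where message_from_bytes still defaults to the compat32 policy that preserves the email5 behavior; the sample message bytes are made up for illustration.

```python
import email
from email.header import Header, decode_header

# A header carrying a raw, unencoded non-ASCII byte (0xE9), i.e. "dirty".
raw = b'Subject: caf\xe9\n\nbody\n'

msg = email.message_from_bytes(raw)  # default policy: compat32
subject = msg['Subject']

# For a dirty header, compat32 hands back a Header object, not a str.
print(isinstance(subject, Header))   # True

# decode_header accepts the Header and returns (bytes, charset) pairs;
# the original wire bytes come back tagged with the unknown-8bit charset.
print(decode_header(subject))        # [(b'caf\xe9', 'unknown-8bit')]
```

Because the bytes come back untouched, a program can still try its own guess at the real charset after the fact.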
For email6 itself, there is now a feature branch where I will develop the email6 patches before applying the changes to the main cpython repository. The branch is named email6, of course. Anyone may browse or clone this repository to see the current state of development.
At this point I have checked in the first draft of the Policy framework. It consists of a new module, policy.py; its documentation, policy.rst; and a set of tests, test_policy.py.
The basic idea is that a Policy object is an immutable container for a bunch of attributes and callback hooks. You can call a Policy object to get a new one with some of its settings changed. You can also add policies together, with the non-default settings from the right operand overriding those from the left operand.
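To make the calling and adding behavior concrete, here is a minimal sketch of an immutable policy container. It is not the code in policy.py: the setting names (linesep, max_line_length, raise_on_defect) and the choice to store only explicitly set values in the instance dict are assumptions made for illustration.

```python
class Policy:
    """Hypothetical sketch of an immutable policy, not the real policy.py."""

    # Illustrative settings and their defaults (assumed names).
    _defaults = {
        'linesep': '\n',
        'max_line_length': 78,
        'raise_on_defect': False,
    }

    def __init__(self, **kw):
        for name, value in kw.items():
            if name not in self._defaults:
                raise TypeError(f'{name!r} is not a policy setting')
            # Write straight into the instance dict; __setattr__ is blocked.
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Only reached when the instance has no explicit value for name.
        try:
            return self._defaults[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        raise AttributeError(f'{type(self).__name__} objects are immutable')

    def __call__(self, **kw):
        # New policy: keep this one's explicit settings, then apply kw.
        settings = dict(self.__dict__)
        settings.update(kw)
        return type(self)(**settings)

    def __add__(self, other):
        if not isinstance(other, Policy):
            return NotImplemented
        # The right operand's explicit (non-default) settings win.
        return self(**other.__dict__)

    def __repr__(self):
        args = ', '.join(f'{k}={v!r}' for k, v in self.__dict__.items())
        return f'{type(self).__name__}({args})'

    def register_defect(self, obj, defect):
        # Hook: by default, just remember the defect on the parsed object.
        obj.defects.append(defect)


default = Policy()
wire = default(linesep='\r\n')          # call: copy with one setting changed
strict = Policy(raise_on_defect=True)
combined = wire + strict                # add: right operand overrides left
print(combined)                         # Policy(linesep='\r\n', raise_on_defect=True)
```

Storing only explicitly set values is what makes “non-default settings” well defined for addition: anything absent from the instance dict falls through to the class defaults.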
So far we have policies such as:
- default
- SMTP
- HTML
- Strict
The default policy may get renamed to email6. I’d prefer ‘default’, since that’s what I’d like it to be by the time we get to Python 3.4. However, when I start adding the policy parameter to other classes and functions, the actual default will be email5, so using the name default for the email6 policy is probably not going to work.
The SMTP policy is just like default, but generates “wire format” line separators (\r\n). HTML is like SMTP, but does not wrap headers. Strict sets a flag that will (once I implement it) cause the parser to raise an error when it encounters a defect instead of just keeping track of it. Strict is also where you can see the usefulness of adding policies together:
>>> StrictSMTP = SMTP + Strict
You could use StrictSMTP to parse an incoming SMTP message when you want your program to blow up if the message is invalid. (When would you ever want that? I don’t know, but someone probably will!)
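For comparison, here is the same composition using the email.policy module that eventually shipped in Python 3.3 and later, where the names differ slightly from this draft (HTTP rather than HTML, lowercase strict). The malformed message is an arbitrary example: a line without a colon inside the header block is one defect the parser reports.

```python
import email
from email import errors, policy

# Released names: policy.SMTP and policy.strict. The right operand's
# non-default settings override the left operand's.
strict_smtp = policy.SMTP + policy.strict

print(repr(strict_smtp.linesep))    # '\r\n'  (from SMTP)
print(strict_smtp.raise_on_defect)  # True    (from strict)
print(policy.HTTP.max_line_length)  # None    (headers are not wrapped)

# A line without a colon inside the header block is a parsing defect.
bad = b'Subject: hi\nnot a header line\n\nbody\n'

# Non-strict: the defect is recorded on the message and parsing continues.
lenient = email.message_from_bytes(bad, policy=policy.SMTP)
print([type(d).__name__ for d in lenient.defects])

# Strict: the same defect is raised as an exception instead.
try:
    email.message_from_bytes(bad, policy=strict_smtp)
except errors.MessageDefect as exc:
    print('rejected:', type(exc).__name__)
```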
So far I’ve only defined one hook, register_defect. You could subclass Policy and define your own register_defect method that, say, logs every defect to a file. That would give you some idea of the quality of the email your program is processing, even if you did nothing else with the defect information.
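A sketch of such a subclass, written against the released email.policy API of Python 3.3 and later (where the hook is register_defect(obj, defect)) rather than against this draft. LoggingPolicy, parse_with_logging, and the logger name are hypothetical names chosen for illustration.

```python
import email
import logging
from email.policy import EmailPolicy

# Hypothetical logger name; attach a file handler to it in real use.
defect_log = logging.getLogger('mail.defects')


class LoggingPolicy(EmailPolicy):
    """An EmailPolicy that logs each defect before recording it."""

    def register_defect(self, obj, defect):
        # Called via handle_defect() when raise_on_defect is False.
        defect_log.warning('%s: %s', type(defect).__name__, defect)
        # Keep the normal behavior: append the defect to obj.defects.
        super().register_defect(obj, defect)


def parse_with_logging(data: bytes):
    """Parse raw message bytes, logging any defects the parser finds."""
    return email.message_from_bytes(data, policy=LoggingPolicy())
```

Pointing the logger at a file, for example with logging.basicConfig(filename=...), would give the running record of defects described above, while the defects stay available on each message as usual.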
Now we’ll see what the Email SIG thinks of this implementation, and meanwhile I’ll be adding policy arguments to the parser and generator classes.