Email Package Issue Summary

This is an attempt to categorize the open (at the time of inclusion) bug reports relevant to the email package. It is a work in progress.

Issues that appear to be fixable in the current package

Note that regardless of whether these actually get fixed in the current package, unit tests for them must be included in email6.

issue7472: email.encoders.encode_7or8bit(): typo “iso-2202”. “iso-2022” is correct. (closed)

issue4487: Add utf8 alias for email charsets (closed)

issue4768: email.generator.Generator object bytes/str crash - b64encode() bug? (closed)
This can be “fixed” in the current package. Victor’s patch may or may not be appropriate, depending on the decision made about issue 4769 (closed).
issue5277: email message.get_params() and related methods sometimes fail. (closed)
Bug in header parsing. Patch and test available, just needs to be reviewed and applied.
issue5610: email feedparser.py CRLFLF bug: $ vs \Z (closed)
How to handle mixed line endings properly during parsing.
issue6465: email.feedparser regular expression bug (NLCRE_crack) (closed)
Similar to issue 5610 (closed), but regarding a CRLF split across input chunks.
issue6598: calling email.utils.make_msgid frequently has a non-trivial probability of generating colliding ids
Issue has patch and tests, just needs to be applied.
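
A rough way to probe the collision concern in issue 6598 is to generate a batch of ids in a tight loop and count the repeats. This is only a sketch: the batch size and the fixed domain (passed to avoid a hostname lookup on every call, and assuming a Python version where make_msgid accepts a domain argument) are illustrative choices, and a zero count on one run does not prove uniqueness.

```python
from email.utils import make_msgid


def count_duplicate_msgids(n=10_000, domain="example.com"):
    """Generate n Message-IDs back to back and return how many repeat."""
    ids = [make_msgid(domain=domain) for _ in range(n)]
    return len(ids) - len(set(ids))


duplicates = count_duplicate_msgids()
print("duplicate ids:", duplicates)
```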

issue7304: email.message.Message.set_payload and as_string given charset ‘us-ascii’ plus 8bit data produces invalid message

issue1555570: email parser incorrectly breaks headers with a CRLF at 8192 (closed)
Corner case in feedparser when handling CRLF that gets broken across its chunk-read boundary. Tony Nelson provides a fix and a test.
issue7143: get_payload(decode=True) eats last newline in base64 encoded payload (closed)
If a base64 encoded part ends with an (encoded) newline, that newline is incorrectly stripped by get_payload.
issue8769: Straightforward usage of email package fails to round-trip (closed)
If an encoded word contains a , or ; and is long, it gets wrapped at the , or ;, thereby breaking the encoded word into invalid chunks. Andrew has a patch that may or may not be general enough to apply.
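
The chunk-boundary cases described in issues 6465 and 1555570 could be covered by a test along the following lines. It is a sketch rather than the test attached to either issue: the sample message and the helper name parse_in_chunks are made up for illustration. The input is split so that the first chunk ends with the CR and the second begins with the LF.

```python
from email.feedparser import FeedParser

RAW = "Subject: test\r\nTo: someone@example.com\r\n\r\nbody\r\n"


def parse_in_chunks(text, split_at):
    """Feed text to a FeedParser in two pieces, split at the given index."""
    parser = FeedParser()
    parser.feed(text[:split_at])
    parser.feed(text[split_at:])
    return parser.close()


# Split between the CR and the LF that terminate the first header line.
split_at = RAW.index("\r\n") + 1
msg = parse_in_chunks(RAW, split_at)
print(msg.keys())
```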

Email relevant issues in modules the email package uses

mimetypes

issue4963: mimetypes.guess_extension result changes after mimetypes.init()
We wouldn’t reload mimetypes, but if an application program does it could affect the results generated by email, so we have an interest in seeing this fixed.
issue6626: show Python mimetypes module some love
This issue is of interest primarily because it shows someone interested in working on mimetypes.
issue6763: Crash on mac os x leopard in mimetypes.guess_type (or PyObject_Malloc) (closed)
Tracked down to a threading issue. Since email might well be used in a threaded app, this issue could be of concern.
issue1043134: Add preferred extensions for MIME types
Possibly not relevant, but worth noting.

datetime

issue665194: datetime-RFC2822 roundtripping (closed)
Proposal to add support for RFC2822 dates to the datetime module. Alternatively (or additionally), parsedate and formatdate could accept datetimes. (It seems like email6 would be a good opportunity to add datetime support to the email package.) Issue includes a patch for datetime.
issue5207: extend strftime/strptime format for RFC3339 and RFC2822
Related suggestion for allowing RFC2822 (and RFC3339) dates to be parsed by strptime.
issue762963: timemodule.c: Python loses current timezone
I don’t know if this affects email or not, but it sounds like it might.
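
As an illustration of the round-tripping discussed in issue 665194, later Python 3 releases provide format_datetime and parsedate_to_datetime in email.utils. The sketch below assumes such a version; it is not the patch from the issue, and the sample date is arbitrary.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# An aware datetime with a fixed UTC-4 offset.
original = datetime(2010, 5, 1, 12, 30, 0, tzinfo=timezone(timedelta(hours=-4)))

# datetime -> RFC 2822 date string -> datetime
as_header = format_datetime(original)
round_tripped = parsedate_to_datetime(as_header)

print(as_header)
print(round_tripped == original)
```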

others

issue1753718: base64 “legacy” functions violate RFC 3548
Doc bug in a related module, should be fixed so that email consumers pick the correct API if they use the module directly.
issue1466065: base64 module ignores non-alphabet characters (closed)
If we want to be thorough in reporting defects, the proposed ‘validate’ option could be useful.
issue4770: binascii module, inconsistent behavior: some functions accept unicode string input (closed)
This issue raises several concerns with how binascii works, and how the email package makes use of it, mostly with regards to transcoding transfer encodings. It also raises an issue with the fact that email has been encoding using ‘raw-unicode-escape’, which appears to be incorrect.
issue843590: ‘macintosh’ encoding alias for ‘mac_roman’ (closed)
This is actually pretty specific to the email package as the proposed alias is incorrect according to the RFCs but encountered “in the wild” in email messages.
issue4769: b64decode should accept strings or bytes (closed)
Affects the API we call.
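
The 'validate' option mentioned under issue 1466065 is available on base64.b64decode in later Python 3 releases. A minimal sketch of the difference, assuming such a version and using a made-up payload with one stray character:

```python
import base64
import binascii

# "hello" in base64 is "aGVsbG8="; a stray '!' has been inserted.
dirty = b"aGVs!bG8="

# Default behavior: characters outside the alphabet are silently discarded.
lenient = base64.b64decode(dirty)

# With validate=True the same input is rejected, which would let a
# parser record a defect instead of quietly accepting the data.
try:
    base64.b64decode(dirty, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True

print(lenient, strict_rejected)
```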

Header Parsing

folding

issue4696: email module does not unfold headers
The email package currently does not really deal correctly with folding and unfolding headers according to the RFC. In the new API I would expect the unfolded header value to be returned by the main API, while the raw data API would be used to obtain the folded header. Note that per RFC 5322 unfolding is done before semantic parsing.
issue1974: email.MIMEText.MIMEText.as_string incorrectly folding long subject header (closed)
More on folding. Python used to insert a tab<linebreak> combo to fold long headers, and this is clearly wrong. The fix to this bug was to use <space><linebreak>, which is still wrong but less so. Barry notes in the issue that this should be “done right” in 3.x, but needs an API change.
issue5612: whitespace folding in the email package could be better ;-) (closed)

Chris notes that the fix to issue 1974 (closed) only works because header folding currently collapses multiple whitespaces into single blanks, but that is not standards compliant (folding should not modify existing whitespace). He says that this is better in 3.1, but it looks to me like it is even more broken, unless I’m doing something stupid (quite possible): in 3.x the multiple spaces are preserved in the MIMEText instance, but despite the docs for str/encode, the header is not folded when str/as_string are called. Worse, when Generator.flatten is used, the header appears to be truncated at the first occurrence of multiple spaces.

I don’t think it is practical to fix this issue in 2.7, even though that is what the bug report is against. Backporting the email6 module would appear to be the only practical way to address the folding issues for 2.7.

issue504152: rfc822 long header continuation broken (closed)
The granddaddy of all header folding issues. Amazing how long this issue has been with us. This issue contains some of Barry’s API thoughts.
issue968430: error flattening complex smime signed message
Here the header folding problems (whitespace stripping) are interfering with signed attachments. We should use the examples from this issue to create test cases for the new header parser/generator.
issue1670765: email.Generator: no header wrapping for multipart/signed (closed)

According to this report the fact that headers are refolded at all interferes with signed messages. Patch provided that deals with just stopping folding for signed parts.

NB: here is a pointer to the part of RFC 1847 that indicates the multipart/signed body must not be modified in transit.

The fix from this issue has been committed, which solves part of the problem. The issue includes additional tests for the whitespace modification problem.

issue1372770: email.Header should preserve original FWS (closed)
This is a succinct description of how I think the email package should handle folding whitespace when doing folding.
issue1590744: mail message parsing glitch
Yet Another Folding Bug (or, at least, parser/generator invertibility bug).
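
The distinction drawn under issue 4696 between the folded raw value and the unfolded value can be seen by parsing the same message under two policies. This sketch assumes a Python 3 version that ships email.policy; the message text is made up, and the policy objects shown here post-date this summary, so treat it as one possible shape of the behavior rather than the planned API.

```python
import email
from email import policy

RAW = "Subject: a long subject\n that continues\n\nbody\n"

# The backward-compatible policy returns the header value with the fold intact.
legacy = email.message_from_string(RAW, policy=policy.compat32)

# The newer default policy returns the unfolded value.
modern = email.message_from_string(RAW, policy=policy.default)

print(repr(legacy["Subject"]))
print(repr(str(modern["Subject"])))
```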

Address Parsing

issue963906: Unicode email address helper (closed)
Feature request specific to handling unicode in email addresses. The tests and some of the logic (the IDNA logic) are relevant to email6, but the class as proposed is not.
issue1025395: email.Utils.parseaddr fails to parse valid addresses
Some useful address parsing test cases that the email package probably currently gets wrong, or at least doesn’t handle broken data as well as it should.
issue1050268: rfc822.parseaddr is broken, breaks sendmail call in smtplib (closed)
Problems with interpretation of quoting in addresses. Provides some interesting test cases.
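
A quoting test case of the kind these issues call for might look like the following. The address is invented for illustration and is not taken from any of the reports: a comma inside a quoted display name must not be treated as an address separator.

```python
from email.utils import parseaddr

# The comma is inside a quoted string, so this is a single address.
name, addr = parseaddr('"Doe, John" <john@example.com>')

print(name)
print(addr)
```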

dates and times

issue1155362: Bugs in parsedate_tz (closed)
Suggestion that parsedate_tz allow the timezone to be not preceded by a blank. None of the posters have seen this outside of spam. The parser already handles a ‘+’ with no preceding space.
issue1162477: Parsing failures in parsedate_tz (closed)
Another broken date representation that could be handled: dots instead of colons in the time.
issue1194222: parsedate and Y2K (closed)
Handling two digit years correctly to handle dates generated by older programs.
issue1454285: test_parsedate_acceptable_to_time_functions+DST == :-( (closed)
One of the test cases fails if the system timezone is Australia/Melbourne.
issue748843: Let Email.Utils.parsedate use last 3 timetuple items (closed)
Proposal to have parsedate() automatically update the returned tuple so that all returned fields are valid. More or less rejected on efficiency grounds, but perhaps worth thinking about in terms of the “cleanness” of the API, since we are mostly ignoring efficiency concerns during initial development.
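
The two-digit year handling from issue 1194222 could be pinned down with a test like this sketch. The date strings are made up, and the expected years reflect the behavior of current Python 3 releases, not necessarily the version the issue was filed against.

```python
from email.utils import parsedate_tz

# Two-digit years as produced by older programs.
recent = parsedate_tz("Mon, 16 Nov 09 13:32:02 +0100")
older = parsedate_tz("Tue, 16 Nov 99 13:32:02 +0100")

# Index 0 is the year; index 9 is the UTC offset in seconds.
print(recent[0], older[0], recent[9])
```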

Other

issue2679: email.feedparser regex duplicate
This is a code refactoring request, and would probably be rendered obsolete by a full RFC2822 parser.
issue3169: email/header.py doesn’t handle Base64 headers that have been insufficiently padded. (closed)
This one is independent of the parser and has to do with repairing bad data (Postel’s law).
issue3609: does parse_header really belong in CGI module?
cgi still has this function in py3k, as well as parse_multipart (also mentioned in the issue). Logically the email package should be supplying these kinds of tools to CGI, so this issue should be considered in the use cases when designing the new API.
issue5871: email.header.Header too lax with embeded newlines (closed)
The current email package does not properly encode newlines in headers. This one is almost a security issue since it could lead to email header injection attacks against web sites that use the email module for generating outbound email.
issue6302: Add decode_header_as_string method to email.utils (closed)
This issue will be addressed by the new dual bytes/string API.
issue795081: email.Message param parsing problem II
This is for the permissive version of the parser and suggests an unquoting heuristic that the poster says has proven to be robust in the face of dirty Internet data.
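
To make the header injection risk from issue 5871 concrete, the sketch below tries to smuggle a second header into a Subject value. It assumes a Python 3 version that provides email.message.EmailMessage, which rejects such values at assignment time; the hostile string is invented for illustration.

```python
from email.message import EmailMessage

msg = EmailMessage()

# A value that tries to append a Bcc header via an embedded newline.
hostile = "Hello\nBcc: victim@example.com"

try:
    msg["Subject"] = hostile
    rejected = False
except ValueError:
    # The newer message class refuses header values containing line breaks.
    rejected = True

print("rejected:", rejected)
```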

Parser/Generator

issue4661: email.parser: impossible to read messages encoded in a different encoding (closed)
This issue could be the poster child for the need to rewrite the email package.
issue724459: Add documentation about line endings in email messages.
Discussion of general Python philosophy about handling line endings: use \n internally, and any module that writes to the wire should convert to CRLF (smtplib, imaplib, etc). The issue is a request for a doc enhancement, and it certainly applies to the design of the email package. This issue needs to be addressed at the fundamental design level.
issue1349106: email.Generator does not separate headers with “\r\n” (closed)
IMO, despite (or because of) issue 724459, the generator should have an API for creating standards compliant output using CRLF regardless of platform. This is to support consumers of the package that do communicate on the wire, and may also be necessary in order to fully support handling mixed line endings (see issue 975330). The default, however, should continue to be \n, because that’s what general Python programs expect the line end discipline to be.
issue975330: Inconsistent newline handling in email module
The new API must be consistent in how newlines are handled in text parts, regardless of what encoding happens. This issue interacts directly with issue 724459.
issue6942: email.generator.Generator memory consumption
A performance/resource usage enhancement proposal.
issue1243730: Big speedup in email message parsing
Another performance enhancement, by eliminating some uses of re in favor of direct string manipulation.
issue740495: API enhancement: poplib.MailReader()
Request, essentially, for an additional API for FeedParser that would make life easier when using data returned from poplib: accepting a list of lines.
issue1440472: email.Generator is not idempotent (closed)
The parser/generator are not currently inverses. We intend to fix this in email6.
issue1459867: Message.as_string should use “mangle_from_=unixfrom”? (closed)
In addition, __str__ and as_string don’t respect the unixfrom flag value set on the Message object.
issue1443866: email 3.0+ stops parsing headers prematurely (closed)
Current parser treats non-header lines that start in column 1 as the end of the headers. It is not clear that this is in fact wrong as long as a defect is recorded, but we should consider how smart it is possible/reasonable to be about detecting the start of the body when the message is ill-formed.
issue1443875: email/charset.py convert() patch
Essentially a request to allow non-strict decoding (‘replace’) at the application program’s request.
issue1672568: silent error in email.message.Message.get_payload (closed)
Example of where “never fail on query” and “don’t let errors pass silently” may conflict in the parser. Should be considered in the API design.
issue1243654: Faster output if message already has a boundary (closed)
Optimization, but raises the issue of what should happen if a message’s boundary is already defined.
issue8008: Allow Arbitrary OpenID providers in this bug tracker (closed)
The current method of handling string input (turning it into a StringIO so it looks like a file to FeedParser) has memory (and speed) consequences when the input string is large.
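
The line ending discipline discussed under issues 724459 and 1349106 (\n internally, CRLF available on request for the wire) can be sketched as follows. This assumes a Python 3 version with email.policy and EmailMessage, which post-date this summary; the message content is made up.

```python
from email import policy
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "line endings"
msg.set_content("first line\nsecond line\n")

# Default serialization keeps the native "\n" convention.
native = msg.as_string()

# Asking for the SMTP policy yields CRLF line endings for use on the wire.
wire = msg.as_bytes(policy=policy.SMTP)

print(repr(native))
print(repr(wire))
```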

Miscellaneous

issue4766: email documentation needs to be precise about strings/bytes (closed)
This will get fixed as we rewrite the docs to explain the new API.
issue8050: smtplib SMTP.sendmail (TypeError: expected string or buffer) (closed)
While this was an invalid bug, as I note in the comments having the facility somewhere in the stdlib to pass a Message object to SMTP.sendmail would be very handy.

General issues from consumers of the email package

issue747320: rfc2822 formatdate functionality duplication
A general call for removing code duplication, but the point is made that some RFCs have slightly different formats. However, logging in particular should use email instead of having its own. There is a further note that there are other places in the stdlib where duplication of email services occurs, but no pointers in the issue.

xmlrpclib

issue7606: test_xmlrpc fails with non-ascii path (closed)
xmlrpclib needs to handle non-ASCII characters in http headers. The http spec calls for such data to be RFC 2047 encoded, so the email package has a role to play here.
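
For reference, RFC 2047 encoding of a non-ASCII value with the existing email.header API looks roughly like this. The sample text is arbitrary, and the sketch only shows the encoding step, not how xmlrpclib would call it.

```python
from email.header import Header, decode_header

original = "Grüße"

# Encode as an RFC 2047 encoded word; the result is pure ASCII.
encoded = Header(original, "utf-8").encode()

# Decode it again to confirm the value survives the round trip.
(raw, charset), = decode_header(encoded)
decoded = raw.decode(charset)

print(encoded)
print(decoded)
```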

urllib

issue4733: Add a “decode to declared encoding” version of urlopen to urllib
I’m listing this issue because it involves headers and charsets and the fact that the data coming in “on the wire” for http is binary. However, the offered enhancement patch doesn’t appear to directly involve the email package.
issue4773: HTTPMessage not documented and has inconsistent API across Py2/Py3
urlopen returns a Message object in Py3 currently. The issue proposes to hide the Py2/Py3 differences behind a simpler (documented) API. The issue here for email6 is to make sure we update that wrapper to use the appropriate new API calls from email6. (Or we may have to create it if no one else fixes the issue by the time email6 lands).

httplib

issue4403: regression from 2.6: smtplib.py requiring ascii for sending messages (closed)
Here smtplib needs to put bytes on the wire correctly, and the email package is the logical way to do this. At the very least there are smtplib doc issues here, and perhaps some use cases.
issue5053: http.client.HTTPMessage.getallmatchingheaders() always returns []
http.client has a function getallmatchingheaders that could be replaced by the current get_all from Message. The resolution of this issue will be affected by the transition to headers always being Header objects, since that in fact changes the API that http.client is exposing if http.client switches to using get_all. It would arguably not be a bad thing for this API to return header objects, but it means we need to think about the proposed compatibility layer in a wider context than just the email package itself.
issue7370: BaseHTTPServer reinventing rfc822 date formatting
Suggested refactoring...the patch uses rfc822 but of course it should be the email module’s formatdate (but see issue 5207 as well).
issue8318: Deprecation of multifile inappropriate or incomplete (closed)
The use case that triggered this (parsing range objects) may or may not be handled by the current MIME implementation, but certainly needs to be. We may wish to consider other possible applications of multifile, which is a bit more general than MIME parsing, and whether or not we want to provide an API to support those uses. In any case the multifile docs need to be updated with transition instructions.

CGI

issue4953: cgi module cannot handle POST with multipart/form-data in 3.x (closed)
Because FeedParser can’t parse bytes, CGI can’t handle upload of binary files.
issue1367631: maximum length not enforced in cgi.parse()
If CGI is converted to use email for parsing forms, then this use case (maxlen) should be considered.

nntplib

Issues that could have an effect on a backport

issue1813: Codec lookup failing under turkish locale (closed)
Python uses the “wrong” toupper/tolower methods by default, which can cause some problems in certain locales. This problem does not exist (modulo the issues addressed by RFC 3454) in py3k, where upper and lower are locale-independent.

2.x only issues

issue2848: Remove mimetools usage from the stdlib (closed)

issue4212: email.LazyImporter does not use absolute imports (closed)
Python 3 does not use LazyImporter.
issue1379416: email.Header encode() unicode P2.6 (closed)
Looks like it might be a simple fix.
issue1368247: unicode in email.MIMEText and email/Charset.py (closed)
doc bug, really
issue1555842: email package and Unicode strings handling (closed)
Unicode isn’t handled right. Big surprise.
issue1681333: email.header unicode fix (closed)
Runs of ‘us-ascii’ encoded words lose intermediate spaces. The py3 code does not have the call to _normalize, which may be the source of the problem in py2.
issue1685453: email package should work better with unicode (closed)
Master issue for unicode bugs.