.. index:: pair: email6; open issues

Email Package Issue Summary
===========================

This is an attempt to categorize the open (at the time of inclusion) bug
reports relevant to the email package. It is a work in progress.

.. index:: triple: email6; open issues; fixable

Issues that appear to be fixable in the current package
-------------------------------------------------------

Note that regardless of whether these actually get fixed in the current
package, unit tests for them *must* be included in email6.

:titledissue:`7472`

:titledissue:`4487`

:titledissue:`4768`
    This can be "fixed" in the current package. Victor's patch may or may
    not be appropriate, depending on the decision made about :issue:`4769`.

:titledissue:`5277`
    Bug in header parsing. Patch and test available, just needs to be
    reviewed and applied.

:titledissue:`5610`
    How to handle mixed line endings properly during parsing.

:titledissue:`6465`
    Similar to :issue:`5610`, but regarding a CRLF split across input chunks.

:titledissue:`6598`
    Issue has patch and tests, just needs to be applied.

:titledissue:`7304`

:titledissue:`1555570`
    Corner case in feedparser when handling CRLF that gets broken across
    its chunk-read boundary. Tony Nelson provides a fix and a test.

:titledissue:`7143`
    If a base64 encoded part ends with an (encoded) newline, that newline
    is incorrectly stripped by get_payload.

:titledissue:`8769`
    If an encoded word contains a , or ; and is long, it gets wrapped at the
    , or ;, thereby breaking the encoded word into invalid chunks. Andrew has
    a patch that may or may not be general enough to apply.

..
.. index::
   single: email6; open issues in dependencies
   single: open issues; in email6 dependencies

Email relevant issues in modules the email package uses
-------------------------------------------------------

mimetypes
~~~~~~~~~

:titledissue:`4963`
    We wouldn't reload mimetypes, but if an application program does, it
    could affect the results generated by email, so we have an interest in
    seeing this fixed.

:titledissue:`6626`
    This issue is of interest primarily because it shows someone interested
    in working on mimetypes.

:titledissue:`6763`
    Tracked down to a threading issue. Since email might well be used in a
    threaded app, this issue could be of concern.

:titledissue:`1043134`
    Possibly not relevant, but worth noting.

datetime
~~~~~~~~

:titledissue:`665194`
    Proposal to add support for RFC2822 dates to the :mod:`datetime` module.
    Alternatively (or additionally), parsedate and formatdate could accept
    datetimes. (It seems like email6 would be a good opportunity to add
    datetime support to the email package.) Issue includes a patch for
    datetime.

:titledissue:`5207`
    Related suggestion for allowing RFC2822 (and RFC3339) dates to be parsed
    by strptime.

:titledissue:`762963`
    I don't know if this affects email or not, but it sounds like it might.

others
~~~~~~

:titledissue:`1753718`
    Doc bug in a related module, should be fixed so that email consumers
    pick the correct API if they use the module directly.

:titledissue:`1466065`
    If we want to be thorough in reporting defects, the proposed 'validate'
    option could be useful.

:titledissue:`4770`
    This issue raises several concerns with how binascii works, and how the
    email package makes use of it, mostly with regards to transcoding
    transfer encodings. It also raises an issue with the fact that email has
    been encoding using 'raw-unicode-escape', which appears to be incorrect.

:titledissue:`843590`
    This is actually pretty specific to the email package as the proposed
    alias is incorrect according to the RFCs but encountered "in the wild"
    in email messages.

:titledissue:`4769`
    Affects the API we call.

.. index:: triple: email6; open issues; header parsing

Header Parsing
--------------

folding
~~~~~~~

:titledissue:`4696`
    The email package currently does not really deal correctly with folding
    and unfolding headers according to the RFC. In the new API I would expect
    the unfolded header value to be returned by the main API, while the raw
    data API would be used to obtain the folded header. Note that per
    `RFC 5322`__ unfolding is done *before* semantic parsing.

    __ http://tools.ietf.org/html/rfc5322#section-3.2.2

:titledissue:`1974`
    More on folding. Python used to insert a tab combo to fold long headers,
    and this is clearly wrong. The fix to this bug was to use , which is
    still wrong but less so. Barry notes in the issue that this should be
    "done right" in 3.x, but needs an API change.

:titledissue:`5612`
    Chris notes that the fix to :issue:`1974` only works because header
    folding currently collapses multiple whitespaces into single blanks, but
    that is not standards compliant (folding should not modify existing
    whitespace). He says that this is better in 3.1, but it looks to me like
    it is even more broken, unless I'm doing something stupid (quite
    possible): in 3.x the multiple spaces are preserved in the MIMEText
    instance, but despite the docs for str/encode, the header is not folded
    when str/as_string are called. Worse, when Generator.flatten is used, the
    header appears to be truncated at the first occurrence of multiple
    spaces. I don't think it is practical to fix this issue in 2.7, even
    though that is what the bug report is against. Backporting the email6
    module would appear to be the only practical way to address the folding
    issues for 2.7.

:titledissue:`504152`
    The granddaddy of all header folding issues.
    Amazing how long this issue has been with us. This issue contains some
    of Barry's API thoughts.

:titledissue:`968430`
    Here the header folding problems (whitespace stripping) are interfering
    with signed attachments. We should use the examples from this issue to
    create test cases for the new header parser/generator.

:titledissue:`1670765`
    According to this report the fact that headers are refolded at all
    interferes with signed messages. Patch provided that deals with just
    stopping folding for signed parts. NB: `here`__ is a pointer to the part
    of :rfc:`1847` that indicates the multipart/signed body must *not* be
    modified in transit.

    __ http://tools.ietf.org/html/rfc1847#page-4

    The fix from this issue has been committed, which solves part of the
    problem. The issue includes additional tests for the whitespace
    modification problem.

:titledissue:`1372770`
    This is a succinct description of how I think the email package should
    handle folding whitespace when doing folding.

:titledissue:`1590744`
    Yet Another Folding Bug (or, at least, parser/generator invertibility
    bug).

Address Parsing
~~~~~~~~~~~~~~~

:titledissue:`963906`
    Feature request specific to handling unicode in email addresses. The
    tests and some of the logic (the IDNA logic) are relevant to email6, but
    the class as proposed is not.

:titledissue:`1025395`
    Some useful address parsing test cases that the email package probably
    currently gets wrong, or at least doesn't handle broken data as well as
    it should.

:titledissue:`1050268`
    Problems with interpretation of quoting in addresses. Provides some
    interesting test cases.

RFC2047 related
~~~~~~~~~~~~~~~

:titledissue:`1078919`
    Python generates something RFC compliant that other mailers don't
    understand, and what they do understand is more readable. So perhaps we
    could generate the more human-friendly version.

:titledissue:`1079`
    The issue is that encoded atoms are not always recognized correctly. The
    best way to fix this would be to use a full RFC2822 parser.

:titledissue:`2658`
    Similar to above, but involving newline parsing.

:titledissue:`1210680`
    This looks like a bug in Gmail's interpretation of RFC2047 which I hope
    has since been fixed, but it might be worth implementing anyway as a
    courtesy to other possibly broken mail clients.

:titledissue:`1467619`
    The current RFC2047 decoder is eating spaces that aren't between the
    encoded words. This should be fixed by the new header parser. I've
    confirmed that the simplistic fix in the issue causes different
    whitespace problems, so Barry is correct that this issue can only be
    solved correctly by rewriting the parser.

:titledissue:`1690608`
    Note that I suspect the lack of this support causes the email package to
    generate headers that are not quite RFC compliant, since if I'm reading
    the RFC correctly, in order to be compliant the name must be encoded as a
    standalone encoded word, not joined to the rest of the address as is
    currently done.

:titledissue:`9286`
    A test case where it looks like parseaddr returns the wrong result
    (merwok wok@rusty --> merwokwok@rusty).

dates and times
~~~~~~~~~~~~~~~

:titledissue:`1155362`
    Suggestion that parsedate_tz allow the timezone to be not preceded by a
    blank. None of the posters have seen this outside of spam. The parser
    already handles a '+' with no preceding space.

:titledissue:`1162477`
    Another broken date representation that could be handled: dots instead
    of colons in the time.

:titledissue:`1194222`
    Handling two digit years correctly to handle dates generated by older
    programs.

:titledissue:`1454285`
    One of the test cases fails if the system timezone is
    Australia/Melbourne.

:titledissue:`748843`
    Proposal to have :func:`~email.utils.parsedate` automatically update the
    returned tuple so that all returned fields are valid. More or less
    rejected on efficiency grounds, but perhaps worth thinking about in terms
    of the "cleanness" of the API, since we are mostly ignoring efficiency
    concerns during initial development.
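
As a rough illustration of the datetime support discussed in the date-related
issues above, here is a minimal sketch that turns the tuple returned by
``email.utils.parsedate_tz`` into an aware ``datetime``. It assumes Python 3;
the helper name ``parse_rfc2822_date`` is hypothetical and is not part of any
proposed email6 API, and treating a missing zone as UTC is an assumption made
for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_tz


def parse_rfc2822_date(value):
    """Return an aware datetime for an RFC 2822 date string, or None.

    Hypothetical helper for illustration only.
    """
    parsed = parsedate_tz(value)
    if parsed is None:
        # parsedate_tz could not make sense of the input.
        return None
    offset = parsed[9]  # seconds east of UTC; may be None on some versions
    if offset is None:
        # Assumption: treat a missing zone as UTC.
        tz = timezone.utc
    else:
        tz = timezone(timedelta(seconds=offset))
    # The first six fields are year, month, day, hour, minute, second.
    return datetime(*parsed[:6], tzinfo=tz)


dt = parse_rfc2822_date("Mon, 20 Nov 1995 19:12:08 -0500")
print(dt.isoformat())
```

A wrapper like this leaves the tolerant parsing in one place, so the lenient
cases from the issues above (missing blank before the zone, dots in the time)
could be tested against the tuple parser alone.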

Other
~~~~~

:titledissue:`2679`
    This is a code refactoring request, and would probably be rendered
    obsolete by a full RFC2822 parser.

:titledissue:`3169`
    This one is independent of the parser and has to do with repairing bad
    data (Postel's law).

:titledissue:`3609`
    cgi still has this function in py3k, as well as parse_multipart (also
    mentioned in the issue). Logically the email package should be supplying
    these kinds of tools to CGI, so this issue should be considered in the
    use cases when designing the new API.

:titledissue:`5871`
    The current email package does not properly encode newlines in headers.
    This one is almost a security issue since it could lead to email header
    injection attacks against web sites that use the email module for
    generating outbound email.

:titledissue:`6302`
    This issue will be addressed by the new dual bytes/string API.

:titledissue:`795081`
    This is for the permissive version of the parser and suggests an
    unquoting heuristic that the poster says has proven to be robust in the
    face of dirty Internet data.

.. index:: triple: email6; open issues; MIME

MIME related
------------

:titledissue:`1823`
    We need to do the right thing if a charset is set on a multipart.
    Currently it is possible to produce an invalid content-transfer-encoding
    by doing so, which subsequently screws up message generation. In the new
    design, the MIME multipart subclass should check the charset when set.

:titledissue:`5423`
    Conceptually related to :issue:`1823`. I'm not quite sure why Chris
    broke it out into a separate issue.

:titledissue:`1874`
    This is related to :issue:`1823`. The current design doesn't register a
    defect for a multipart with an invalid content-transfer-encoding, because
    the parser only ever instantiates Message objects. In the new design it
    will instantiate a multipart-specific subclass, and that subclass could
    find and register the defect. The issue includes a patch for the current
    parser code, though.

:titledissue:`4177`
    Although the title says crash, it's really just consuming excessive
    resources. The issue points up the need to consider how to handle large
    MIME objects in an efficient fashion. This goes beyond the concern for
    storing large parts on disk, to dealing with the fact that handling a
    *single* large part in memory may be something we need to be concerned
    about. This, however, would seem to be a later-stage optimization and not
    something we should worry about too much in the initial design.

:titledissue:`5803`
    This is a performance issue and can be postponed as long as needed for
    email6, but if we wind up rewriting the quoprimime module we should take
    it into account.

:titledissue:`6521`
    A documentation issue, which should be resolved by the new bytes/string
    API and its accompanying documentation. The current docs for the current
    package could be fixed, though.

:titledissue:`626452`
    Proposal to support Content-ID and Message-ID uniform resource locators.
    Might be possible to sneak in an implementation while rewriting other
    code, but is certainly lower priority than other work.

:titledissue:`634412`
    Like previous, for multipart/related.

:titledissue:`3244`
    This is a feature request, and is for use with urllib and httplib, but
    may well belong in the email module's mime support.

:titledissue:`1043706`
    Barry suggests part of an interface and links to the email list
    discussion of this topic.

:titledissue:`1525919`
    Problems with the current API and controlling the application of
    transfer encodings. What this issue makes clear is that the API for
    generating transport encoded parts needs work. The design of including a
    transport encoding registry should make it possible to have an API that
    is both simpler and less error prone than what we currently have.

:titledissue:`8054`
    MIMEText reportedly encodes message bodies in chunks when handed unicode
    input, which confuses some mail clients. Not sure if this bug applies to
    3.x.

..
.. index:: triple: email6; open issues; Parser/Generator

Parser/Generator
----------------

:titledissue:`4661`
    This issue could be the poster child for the need to rewrite the email
    package.

:titledissue:`724459`
    Discussion of general Python philosophy about handling line endings: use
    :code:`\n` internally, and any module that writes to the wire should
    convert to CRLF (smtplib, imaplib, etc.). The issue is a request for a
    doc enhancement, and it certainly applies to the design of the email
    package. *This issue needs to be addressed at the fundamental design
    level*.

:titledissue:`1349106`
    IMO, despite (or because of) :issue:`724459`, the generator should have
    an API for creating standards compliant output using CRLF regardless of
    platform. This is to support consumers of the package that do communicate
    on the wire, and may also be necessary in order to fully support handling
    mixed line endings (see :issue:`975330`). The default, however, should
    continue to be :code:`\n`, because that's what general Python programs
    expect the line end discipline to be.

:titledissue:`975330`
    The new API must be consistent in how newlines are handled in text
    parts, regardless of what encoding happens. This issue interacts directly
    with :issue:`724459`.

:titledissue:`6942`
    A performance/resource usage enhancement proposal.

:titledissue:`1243730`
    Another performance enhancement, by eliminating some uses of re in favor
    of direct string manipulation.

:titledissue:`740495`
    Request, essentially, for an additional API for FeedParser that would
    make life easier when using data returned from poplib: accepting a list
    of lines.

:titledissue:`1440472`
    The parser/generator are not currently inverses. We intend to fix this
    in email6.

:titledissue:`1459867`
    In addition, __str__ and as_string don't respect the unixfrom flag value
    set on the Message object.

:titledissue:`1443866`
    Current parser treats non-header lines that start in column 1 as the end
    of the headers.
    It is not clear that this is in fact wrong as long as a defect is
    recorded, but we should consider how smart it is possible/reasonable to
    be about detecting the start of the body when the message is ill-formed.

:titledissue:`1443875`
    Essentially a request to allow non-strict decoding ('replace') at the
    application program's request.

:titledissue:`1672568`
    Example of where "never fail on query" and "don't let errors pass
    silently" may conflict in the parser. Should be considered in the API
    design.

:titledissue:`1243654`
    Optimization, but raises the issue of what should happen if a message's
    boundary is already defined.

:titledissue:`8008`
    The current method of handling string input (turning it into a StringIO
    so it looks like a file to FeedParser) has memory (and speed)
    consequences when the input string is large.

.. index:: triple: email6; open issues; library clients

Miscellaneous
-------------

:titledissue:`4766`
    This will get fixed as we rewrite the docs to explain the new API.

:titledissue:`8050`
    While this was an invalid bug, as I note in the comments, having the
    facility somewhere in the stdlib to pass a Message object to
    SMTP.sendmail would be very handy.

General issues from consumers of the email package
--------------------------------------------------

:titledissue:`747320`
    A general call for removing code duplication, but the point is made that
    some RFCs have slightly different formats. However, logging in particular
    should use email instead of having its own. There is a further note that
    there are other places in the stdlib where duplication of email services
    occurs, but no pointers in the issue.

xmlrpclib
~~~~~~~~~

:titledissue:`7606`
    xmlrpclib needs to handle non-ASCII characters in HTTP headers. The HTTP
    spec calls for such data to be :rfc:`2047` encoded, so the email package
    has a role to play here.
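
To make the :rfc:`2047` point concrete, here is a minimal sketch, assuming
Python 3 and the current ``email.header`` module, of encoding a non-ASCII
header value into encoded-word form and decoding it again. The sample string
and the variable names are illustrative only; nothing here is taken from the
xmlrpclib issue or its patch.

```python
from email.header import Header, decode_header, make_header

# A sample header value containing non-ASCII characters (illustrative only).
raw = "Gr\u00fc\u00dfe aus K\u00f6ln"

# Encode as an RFC 2047 encoded word; the result is pure ASCII text.
encoded = Header(raw, charset="utf-8").encode()
print(encoded)

# Decode the encoded word back into a unicode string.
decoded = str(make_header(decode_header(encoded)))
print(decoded == raw)
```

Whether the encoder picks base64 or quoted-printable for the encoded word is
left to the library, so callers should not depend on the exact encoded form.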

urllib
~~~~~~

:titledissue:`4733`
    I'm listing this issue because it involves headers and charsets and the
    fact that the data coming in "on the wire" for http is binary. However,
    the offered enhancement patch doesn't appear to directly involve the
    email package.

:titledissue:`4773`
    urlopen returns a Message object in Py3 currently. The issue proposes to
    hide the Py2/Py3 differences behind a simpler (documented) API. The issue
    here for email6 is to make sure we update that wrapper to use the
    appropriate new API calls from email6. (Or we may have to create it if no
    one else fixes the issue by the time email6 lands.)

httplib
~~~~~~~

:titledissue:`4403`
    Here smtplib needs to put bytes on the wire correctly, and the email
    package is the logical way to do this. At the very least there are
    smtplib doc issues here, and perhaps some use cases.

:titledissue:`5053`
    http.client has a function :func:`getallmatchingheaders` that could be
    replaced by the current :func:`get_all` from :class:`Message`. The
    resolution of this issue will be affected by the transition to headers
    always being Header objects, since that in fact changes the API that
    http.client is exposing if http.client switches to using get_all. It
    would arguably not be a bad thing for this API to return header objects,
    but it means we need to think about the proposed compatibility layer in a
    wider context than just the email package itself.

:titledissue:`7370`
    Suggested refactoring...the patch uses rfc822 but of course it should be
    the email module's formatdate (but see :issue:`5207` as well).

:titledissue:`8318`
    The use case that triggered this (parsing range objects) may or may not
    be handled by the current MIME implementation, but certainly needs to be.
    We may wish to consider other possible applications of multifile, which
    is a bit more general than MIME parsing, and whether or not we want to
    provide an API to support those uses. In any case the multifile docs need
    to be updated with transition instructions.
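
For reference when weighing :issue:`5053`, this is a small sketch of what
``get_all`` on a ``Message`` returns today, assuming Python 3 and the current
email package. The sample message text is made up for the example.

```python
from email import message_from_string

# A made-up message with a repeated header.
msg = message_from_string(
    "Received: a\n"
    "Received: b\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)

# Header names are matched case-insensitively; values come back as a list.
received = msg.get_all("received")
print(received)

# A missing header yields None unless a fallback object is supplied.
print(msg.get_all("X-Missing"))
print(msg.get_all("X-Missing", []))
```

The values are plain strings here; if headers become Header objects, as the
issue discussion anticipates, callers of this method would see that change.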

CGI
~~~

:titledissue:`4953`
    Because :class:`~email.parser.FeedParser` can't parse bytes, CGI can't
    handle upload of binary files.

:titledissue:`1367631`
    If CGI is converted to use email for parsing forms, then this use case
    (maxlen) should be considered.

nntplib
~~~~~~~

.. index:: triple: email6; open issues; backport

Issues that could have an effect on a backport
----------------------------------------------

:titledissue:`1813`
    Python uses the "wrong" toupper/tolower methods by default, which can
    cause some problems in certain locales. This problem does not exist
    (modulo the issues addressed by :rfc:`3454`) in py3k, where upper and
    lower are locale-independent.

.. index:: triple: email6; open issues; 2.x only

2.x only issues
---------------

:titledissue:`2848`

:titledissue:`4212`
    Python 3 does not use LazyImporter.

:titledissue:`1379416`
    Looks like it might be a simple fix.

:titledissue:`1368247`
    Doc bug, really.

:titledissue:`1555842`
    Unicode isn't handled right. Big surprise.

:titledissue:`1681333`
    Runs of 'us-ascii' encoded words lose intermediate spaces. The py3 code
    does not have the call to _normalize, which may be the source of the
    problem in py2.

:titledissue:`1685453`
    Master issue for unicode bugs.