.. index::
   pair: email6; open issues

Email Package Issue Summary
===========================

This is an attempt to categorize the open (at the time of inclusion)
bug reports relevant to the email package.  It is a work in progress.


.. index::
   triple: email6; open issues; fixable

Issues that appear to be fixable in the current package
-------------------------------------------------------

Note that regardless of whether these actually get fixed in the current
package, unit tests for them *must* be included in email6.

:titledissue:`7472`

:titledissue:`4487`

:titledissue:`4768`
    This can be "fixed" in the current package.  Victor's patch may
    or may not be appropriate, depending on the decision made about
    :issue:`4769`.

:titledissue:`5277`
    Bug in header parsing.  Patch and test available, just needs to
    be reviewed and applied.

:titledissue:`5610`
    How to handle mixed line endings properly during parsing.

:titledissue:`6465`
    Similar to :issue:`5610`, but regarding a CRLF split across input
    chunks.

:titledissue:`6598`
    Issue has patch and tests, just needs to be applied.

:titledissue:`7304`

:titledissue:`1555570`
    Corner case in feedparser when handling CRLF that gets broken
    across its chunk-read boundary.  Tony Nelson provides a fix
    and a test.

:titledissue:`7143`
    If a base64 encoded part ends with an (encoded) newline, that newline
    is incorrectly stripped by get_payload.

:titledissue:`8769`
    If an encoded word contains a , or ; and is long, it gets wrapped
    at the , or ;, thereby breaking the encoded word into invalid
    chunks.  Andrew has a patch that may or may not be general enough
    to apply.
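
The chunk-boundary cases in :issue:`6465` and :issue:`1555570` lend
themselves to a simple regression test.  The sketch below is not taken
from either issue; the helper names and the sample message are made up
for illustration.  It feeds the same message to
:class:`~email.feedparser.FeedParser` split at every possible offset and
records any offset whose result differs from a single-shot parse.

.. code-block:: python

    from email.feedparser import FeedParser

    # Hypothetical sample message using CRLF line endings throughout.
    SAMPLE = (
        "Subject: chunk test\r\n"
        "From: sender@example.com\r\n"
        "\r\n"
        "line one\r\n"
        "line two\r\n"
    )


    def parse_in_two_chunks(text, split_at):
        """Parse ``text`` by feeding it to FeedParser in two pieces."""
        parser = FeedParser()
        parser.feed(text[:split_at])
        parser.feed(text[split_at:])
        return parser.close()


    def summarize(msg):
        """Reduce a message to comparable data: header pairs plus payload."""
        return msg.items(), msg.get_payload()


    reference = summarize(parse_in_two_chunks(SAMPLE, len(SAMPLE)))

    # Every split point, including those that land between a CR and its LF,
    # should produce the same parse as the unsplit input.
    mismatches = [
        offset
        for offset in range(1, len(SAMPLE))
        if summarize(parse_in_two_chunks(SAMPLE, offset)) != reference
    ]

An empty ``mismatches`` list means the parser is insensitive to where the
input was cut; any offsets it does contain point at the failing boundary.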


.. index::
   single: email6; open issues in dependencies
   single: open issues; in email6 dependencies

Email relevant issues in modules the email package uses
-------------------------------------------------------


mimetypes
~~~~~~~~~

:titledissue:`4963`
    We wouldn't reload mimetypes, but if an application program does
    it could affect the results generated by email, so we have an
    interest in seeing this fixed.

:titledissue:`6626`
    This issue is of interest primarily because it shows someone
    interested in working on mimetypes.

:titledissue:`6763`
    Tracked down to a threading issue.  Since email might well be used
    in a threaded app, this issue could be of concern.

:titledissue:`1043134`
    Possibly not relevant, but worth noting.


datetime
~~~~~~~~

:titledissue:`665194`
    Proposal to add support for RFC2822 dates to the :mod:`datetime`
    module.  Alternatively (or additionally), parsedate and formatdate
    could accept datetimes.  (It seems like email6 would be a good
    opportunity to add datetime support to the email package.)
    Issue includes a patch for datetime.

:titledissue:`5207`
    Related suggestion for allowing RFC2822 (and RFC3339) dates to be
    parsed by strptime.

:titledissue:`762963`
    I don't know if this affects email or not, but it sounds like it might.
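
To make the ``parsedate``/``formatdate`` alternative in :issue:`665194`
concrete, here is a minimal sketch of converting between :rfc:`2822` date
strings and :class:`~datetime.datetime` objects.  The helper name
``rfc2822_to_datetime`` is hypothetical, and the sketch assumes a Python
version that provides :class:`datetime.timezone` and
:func:`email.utils.format_datetime`; it is not the patch attached to the
issue.

.. code-block:: python

    from datetime import datetime, timedelta, timezone
    from email.utils import format_datetime, parsedate_tz


    def rfc2822_to_datetime(value):
        """Convert an RFC 2822 date string to a datetime (hypothetical helper)."""
        parsed = parsedate_tz(value)
        if parsed is None:
            raise ValueError("unparseable date: %r" % (value,))
        offset = parsed[9]
        if offset is None:
            # No usable zone information: return a naive datetime.
            return datetime(*parsed[:6])
        return datetime(*parsed[:6], tzinfo=timezone(timedelta(seconds=offset)))


    stamp = rfc2822_to_datetime("Tue, 10 Aug 2010 14:30:00 -0400")
    round_tripped = format_datetime(stamp)

Keeping the UTC offset on the resulting object is what allows the value to
be formatted back into a header without consulting the local timezone.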


others
~~~~~~

:titledissue:`1753718`
    Doc bug in a related module, should be fixed so that email
    consumers pick the correct API if they use the module directly.

:titledissue:`1466065`
    If we want to be thorough in reporting defects, the proposed
    'validate' option could be useful.

:titledissue:`4770`
    This issue raises several concerns with how binascii works, and how
    the email package makes use of it, mostly with regard to transcoding
    transfer encodings.  It also raises an issue with the fact that
    email has been encoding using 'raw-unicode-escape', which appears
    to be incorrect.

:titledissue:`843590`
    This is actually pretty specific to the email package as the proposed
    alias is incorrect according to the RFCs but encountered "in the wild"
    in email messages.

:titledissue:`4769`
    Affects the API we call.


.. index::
   triple: email6; open issues; header parsing

Header Parsing
--------------

folding
~~~~~~~

:titledissue:`4696`
    The email package currently does not really deal correctly with folding
    and unfolding headers according to the RFC.  In the new API I would expect
    the unfolded header value to be returned by the main API, while the
    raw data API would be used to obtain the folded header.  Note that
    per `RFC 5322`__ unfolding is done *before* semantic parsing.

__ http://tools.ietf.org/html/rfc5322#section-3.2.2

:titledissue:`1974`
    More on folding.  Python used to insert a tab<linebreak> combo to
    fold long headers, and this is clearly wrong.  The fix to this bug
    was to use <space><linebreak>, which is still wrong but less so.
    Barry notes in the issue that this should be "done right" in 3.x,
    but needs an API change.

:titledissue:`5612`
    Chris notes that the fix to :issue:`1974` only works because header
    folding currently collapses multiple whitespaces into single blanks,
    but that is not standards compliant (folding should not modify
    existing whitespace).  He says that this is better in 3.1, but it
    looks to me like it is even more broken, unless I'm doing something
    stupid (quite possible): in 3.x the multiple spaces are preserved
    in the MIMEText instance, but despite the docs for str/encode,
    the header is not folded when str/as_string are called.  Worse,
    when Generator.flatten is used, the header appears to be truncated
    at the first occurrence of multiple spaces.

    I don't think it is practical to fix this issue in 2.7, even though
    that is what the bug report is against.  Backporting the email6
    module would appear to be the only practical way to address the
    folding issues for 2.7.

:titledissue:`504152`
    The granddaddy of all header folding issues.  Amazing how long this
    issue has been with us.  This issue contains some of Barry's API
    thoughts.

:titledissue:`968430`
    Here the header folding problems (whitespace stripping) are
    interfering with signed attachments.  We should use the examples from
    this issue to create test cases for the new header parser/generator.

:titledissue:`1670765`
    According to this report the fact that headers are refolded at all
    interferes with signed messages.  Patch provided that deals with
    just stopping folding for signed parts.

    NB: `here`__ is a pointer to the part of :rfc:`1847` that indicates
    the multipart/signed body must *not* be modified in transit.

    __ http://tools.ietf.org/html/rfc1847#page-4

    The fix from this issue has been committed, which solves part of
    the problem.  The issue includes additional tests for the
    whitespace modification problem.

:titledissue:`1372770`
    This is a succinct description of how I think the email package
    should handle folding whitespace when doing folding.

:titledissue:`1590744`
    Yet Another Folding Bug (or, at least, parser/generator
    invertibility bug).
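
As a point of reference for the issues above, :rfc:`5322` section 2.2.3
describes unfolding as removing any CRLF that is immediately followed by
whitespace, leaving the whitespace itself untouched.  A minimal sketch of
that rule follows; the function name ``unfold`` is illustrative, and
accepting a bare ``\n`` as a line ending is an added leniency, not part
of the RFC.

.. code-block:: python

    import re

    # A line break (CRLF, or a bare LF as a leniency) followed by a space or
    # tab.  The lookahead leaves the whitespace in place, so nothing is
    # collapsed.
    _FOLD_RE = re.compile(r"\r?\n(?=[ \t])")


    def unfold(raw_value):
        """Return ``raw_value`` with folding line breaks removed."""
        return _FOLD_RE.sub("", raw_value)


    folded = "This is a long subject\r\n   with three spaces after the fold"
    unfolded = unfold(folded)

Because only the line break is removed, the three spaces after the fold
survive, which is the property the signed-message issues depend on.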


Address Parsing
~~~~~~~~~~~~~~~

:titledissue:`963906`
    Feature request specific to handling unicode in email addresses.
    The tests and some of the logic (the IDNA logic) are relevant
    to email6, but the class as proposed is not.

:titledissue:`1025395`
    Some useful address parsing test cases that the email package
    probably currently gets wrong, or at least doesn't handle
    broken data as well as it should.

:titledissue:`1050268`
    Problems with interpretation of quoting in addresses.  Provides
    some interesting test cases.

RFC2047 related
~~~~~~~~~~~~~~~

:titledissue:`1078919`
    Python generates something RFC compliant that other mailers don't
    understand, and what they do understand is more readable.  So
    perhaps we could generate the more human friendly version.

:titledissue:`1079`
    The issue is that encoded atoms are not always recognized correctly.
    The best way to fix this would be to use a full RFC2822 parser.

:titledissue:`2658`
    Similar to above, but involving newline parsing.

:titledissue:`1210680`
    This looks like a bug in gmail's interpretation of RFC2047 which
    I hope has since been fixed, but it might be worth implementing
    anyway as a courtesy to other possibly broken mail clients.

:titledissue:`1467619`
    The current RFC2047 decoder is eating spaces that aren't between the
    encoded words.  This should be fixed by the new header parser.
    I've confirmed that the simplistic fix in the issue causes different
    whitespace problems, so Barry is correct that this issue can only
    be solved correctly by rewriting the parser.

:titledissue:`1690608`
    Note that I suspect the lack of this support causes the email package
    to generate headers that are not quite RFC compliant, since if I'm
    reading the RFC correctly in order to be compliant the name must be
    encoded as a standalone encoded word, not joined to the rest of the
    address as is currently done.

:titledissue:`9286`
    A test case where it looks like parseaddr returns the wrong result
    (merwok wok@rusty --> merwokwok@rusty).
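
The whitespace rule behind :issue:`1467619` comes from :rfc:`2047`:
whitespace separating two adjacent encoded words is ignored on display,
while whitespace between an encoded word and ordinary text is kept.  The
sketch below illustrates the expected results.  It assumes a Python version
that provides :mod:`email.policy`, and the helper name ``decoded_subject``
is hypothetical.

.. code-block:: python

    from email import message_from_string, policy


    def decoded_subject(raw_value):
        """Parse ``raw_value`` as a Subject header and return the decoded text."""
        source = "Subject: " + raw_value + "\n\n"
        msg = message_from_string(source, policy=policy.default)
        return str(msg["Subject"])


    # Two adjacent encoded words: the separating space should disappear.
    adjacent = decoded_subject("=?utf-8?q?foo?= =?utf-8?q?bar?=")
    # Encoded word followed by plain text: the space should be preserved.
    mixed = decoded_subject("=?utf-8?q?foo?= bar")

Cases like these two are a reasonable starting point for the header parser
test suite, since a decoder that gets one right by accident tends to get
the other wrong.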


dates and times
~~~~~~~~~~~~~~~

:titledissue:`1155362`
    Suggestion that parsedate_tz allow the timezone to be not preceded
    by a blank.  None of the posters have seen this outside of spam.
    The parser already handles a '+' with no preceding space.

:titledissue:`1162477`
    Another broken date representation that could be handled: dots
    instead of colons in the time.

:titledissue:`1194222`
    Handling two digit years correctly to handle dates generated by
    older programs.

:titledissue:`1454285`
    One of the test cases fails if the system timezone is Australia/Melbourne.

:titledissue:`748843`
    Proposal to have :func:`~email.utils.parsedate` automatically update
    the returned tuple so that all returned fields are valid.  More or
    less rejected on efficiency grounds, but perhaps worth thinking
    about in terms of the "cleanness" of the API, since we are mostly
    ignoring efficiency concerns during initial development.
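
:rfc:`2822` section 4.3 gives a rule for the two-digit years mentioned in
:issue:`1194222`: values 00 through 49 are read as 2000 plus the value,
and values 50 through 99 (and any three-digit year) as 1900 plus the value.
A sketch of that rule, using a hypothetical helper name, might look like
this:

.. code-block:: python

    def expand_obsolete_year(year_text):
        """Interpret a year field using the RFC 2822 obsolete-syntax rule."""
        digits = len(year_text.strip())
        year = int(year_text)
        if digits == 2:
            # 00-49 map to 2000-2049, 50-99 map to 1950-1999.
            return year + 2000 if year < 50 else year + 1900
        if digits == 3:
            # Three-digit years are offsets from 1900.
            return year + 1900
        return year


    examples = {text: expand_obsolete_year(text) for text in ("10", "50", "101")}

Whether the email package should follow this cutoff exactly or apply a
different heuristic is a design decision this sketch does not settle.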


Other
~~~~~

:titledissue:`2679`
    This is a code refactoring request, and would probably be rendered
    obsolete by a full RFC2822 parser.

:titledissue:`3169`
    This one is independent of the parser and has to do with repairing
    bad data (Postel's law).

:titledissue:`3609`
    cgi still has this function in py3k, as well as parse_multipart (also
    mentioned in the issue).  Logically the email package should be
    supplying these kinds of tools to CGI, so this issue should be
    considered in the use cases when designing the new API.

:titledissue:`5871`
    The current email package does not properly encode newlines in
    headers.  This one is almost a security issue since it could lead
    to email header injection attacks against web sites that use the
    email module for generating outbound email.

:titledissue:`6302`
    This issue will be addressed by the new dual bytes/string API.

:titledissue:`795081`
    This is for the permissive version of the parser and suggests an
    unquoting heuristic that the poster says has proven to be robust
    in the face of dirty Internet data.
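
The injection risk in :issue:`5871` comes from a header value that contains
a line break not followed by whitespace, which a receiving parser reads as
the start of a new header.  One possible guard, sketched here with a
hypothetical function name, rejects such values before they reach the
generator.  It is an illustration of the check, not the fix proposed in
the issue.

.. code-block:: python

    import re

    _LINE_BREAK_RE = re.compile(r"\r\n|\r|\n")


    def check_header_value(value):
        """Raise ValueError if ``value`` could start a new header line."""
        first, *continuations = _LINE_BREAK_RE.split(value)
        for line in continuations:
            # A legitimate continuation line begins with a space or tab.
            if line[:1] not in (" ", "\t"):
                raise ValueError("header value contains an unfolded line break")
        return value


    safe = check_header_value("folded\r\n value")

A value such as ``"x\nBcc: victim@example.com"`` fails the check, while a
properly folded value passes through unchanged.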


.. index::
   triple: email6; open issues; MIME

MIME related
------------

:titledissue:`1823`
    We need to do the right thing if a charset is set on a multipart.
    Currently it is possible to produce an invalid content-transfer-encoding
    by doing so, which subsequently screws up message generation.  In the new
    design, the MIME multipart subclass should check the charset when set.

:titledissue:`5423`
    Conceptually related to :issue:`1823`.  I'm not quite sure why Chris
    broke it out into a separate issue.

:titledissue:`1874`
    This is related to :issue:`1823`.  The current design doesn't register
    a defect for a multipart with an invalid content-transfer-encoding,
    because the parser only ever instantiates Message objects.  In the
    new design it will instantiate a multipart-specific subclass, and
    that subclass could find and register the defect.  The issue includes
    a patch for the current parser code, though.

:titledissue:`4177`
    Although the title says crash, it's really just consuming excessive
    resources.  The issue points up the need to consider how to handle
    large MIME objects in an efficient fashion.  This goes beyond the
    concern for storing large parts on disk, to dealing with the fact that
    handling a *single* large part in memory may be something we need
    to be concerned about.  This, however, would seem to be a later-stage
    optimization and not something we should worry about too much
    in the initial design.

:titledissue:`5803`
    This is a performance issue and can be postponed as long as needed
    for email6, but if we wind up rewriting the quoprimime module we
    should take it into account.

:titledissue:`6521`
    A documentation issue, which should be resolved by the new bytes/string
    API and its accompanying documentation.  The current docs for the
    current package could be fixed, though.

:titledissue:`626452`
    Proposal to support Content-ID and Message-ID uniform resource
    locators.  Might be possible to sneak in an implementation
    while rewriting other code, but is certainly lower priority than
    other work.

:titledissue:`634412`
    Like previous, for multipart/related.

:titledissue:`3244`
    This is a feature request, and is for use with urllib and httplib,
    but may well belong in the email module's mime support.

:titledissue:`1043706`
    Barry suggests part of an interface and links to the email list
    discussion of this topic.

:titledissue:`1525919`
    Problems with the current API and controlling the application of
    transfer encodings.  What this issue makes clear is that the API for
    generating transport encoded parts needs work.  The design of including
    a transport encoding registry should make it possible to have an API
    that is both simpler and less error prone than what we currently have.

:titledissue:`8054`
    MIMEText reportedly encodes message bodies in chunks when handed
    unicode input, which confuses some mail clients.  Not sure if this
    bug applies to 3.x.


.. index::
   triple: email6; open issues; Parser/Generator

Parser/Generator
----------------

:titledissue:`4661`
    This issue could be the poster child for the need to rewrite the email
    package.

:titledissue:`724459`
    Discussion of general Python philosophy about handling line endings:
    use :code:`\n` internally, and any module that writes to the wire should
    convert to CRLF (smtplib, imaplib, etc).  The issue is a request for
    a doc enhancement, and it certainly applies to the design of the
    email package. *This issue needs to be addressed at the fundamental
    design level*.

:titledissue:`1349106`
    IMO, despite (or because of) :issue:`724459`, the generator should
    have an API for creating standards compliant output using CRLF
    regardless of platform.  This is to support consumers of the package
    that do communicate on the wire, and may also be necessary in order
    to fully support handling mixed line endings (see :issue:`975330`).
    The default, however, should continue to be :code:`\n`, because that's
    what general python programs expect the line end discipline to be.

:titledissue:`975330`
    The new API must be consistent in how newlines are handled in text
    parts, regardless of what encoding happens.  This issue interacts
    directly with :issue:`724459`.

:titledissue:`6942`
    A performance/resource usage enhancement proposal.

:titledissue:`1243730`
    Another performance enhancement, by eliminating some uses of
    re in favor of direct string manipulation.

:titledissue:`740495`
    Request, essentially, for an additional API for FeedParser that
    would make life easier when using data returned from poplib:
    accepting a list of lines.

:titledissue:`1440472`
    The parser/generator are not currently inverses.  We intend to fix
    this in email6.

:titledissue:`1459867`
    In addition, __str__ and as_string don't respect the unixfrom
    flag value set on the Message object.

:titledissue:`1443866`
    Current parser treats non-header lines that start in column 1 as
    the end of the headers.  It is not clear that this is in fact
    wrong as long as a defect is recorded, but we should consider
    how smart it is possible/reasonable to be about detecting the
    start of the body when the message is ill-formed.

:titledissue:`1443875`
    Essentially a request to allow non-strict decoding ('replace')
    at the application program's request.

:titledissue:`1672568`
    Example of where "never fail on query" and "don't let errors pass
    silently" may conflict in the parser.  Should be considered in the
    API design.

:titledissue:`1243654`
    Optimization, but raises the issue of what should happen if a message's
    boundary is already defined.

:titledissue:`8008`
    The current method of handling string input (turning it into a
    StringIO so it looks like a file to FeedParser) has memory (and speed)
    consequences when the input string is large.
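
For the line-ending question raised by :issue:`724459` and
:issue:`1349106`, the following sketch shows the kind of API being asked
for: ``\n`` by default, CRLF on request.  It assumes a Python version in
which :meth:`email.generator.Generator.flatten` accepts a ``linesep``
argument, and the helper name ``flatten_message`` is made up for this
example.

.. code-block:: python

    from email import message_from_string
    from email.generator import Generator
    from io import StringIO


    def flatten_message(msg, linesep=None):
        """Serialize ``msg``; ``linesep=None`` keeps the generator's default."""
        buffer = StringIO()
        Generator(buffer).flatten(msg, linesep=linesep)
        return buffer.getvalue()


    msg = message_from_string("Subject: hello\n\nfirst line\nsecond line\n")
    # Default discipline, as general Python programs expect.
    native = flatten_message(msg)
    # Wire format for consumers that write directly to a socket.
    wire = flatten_message(msg, linesep="\r\n")

Keeping the choice in the generator call, instead of in the message object,
lets the same parsed message be written both ways.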


.. index::
   triple: email6; open issues; library clients

Miscellaneous
-------------

:titledissue:`4766`
    This will get fixed as we rewrite the docs to explain the new API.

:titledissue:`8050`
    While this was an invalid bug, as I note in the comments having the
    facility somewhere in the stdlib to pass a Message object to 
    SMTP.sendmail would be very handy.


General issues from consumers of the email package
--------------------------------------------------

:titledissue:`747320`
    A general call for removing code duplication, but the point is made
    that some RFCs have slightly different formats.  However, logging
    in particular should use email instead of having its own.  There is
    a further note that there are other places in the stdlib where
    duplication of email services occurs, but no pointers in the issue.

xmlrpclib
~~~~~~~~~

:titledissue:`7606`
    xmlrpclib needs to handle non-ASCII characters in http headers.
    The http spec calls for such data to be :rfc:`2047` encoded,
    so the email package has a role to play here.

urllib
~~~~~~

:titledissue:`4733`
    I'm listing this issue because it involves headers and charsets and
    the fact that the data coming in "on the wire" for http is binary.
    However, the offered enhancement patch doesn't appear to directly
    involve the email package.

:titledissue:`4773`
    urlopen returns a Message object in Py3 currently.  The issue proposes
    to hide the Py2/Py3 differences behind a simpler (documented) API.
    The issue here for email6 is to make sure we update that wrapper
    to use the appropriate new API calls from email6.  (Or we may have
    to create it if no one else fixes the issue by the time email6
    lands).

httplib
~~~~~~~

:titledissue:`4403`
    Here smtplib needs to put bytes on the wire correctly, and the
    email package is the logical way to do this.  At the very least
    there are smtplib doc issues here, and perhaps some use cases.

:titledissue:`5053`
    http.client has a function :func:`getallmatchingheaders` that
    could be replaced by the current :func:`get_all` from :class:`Message`.
    The resolution of this issue will be affected by the transition to
    headers always being Header objects, since that in fact changes
    the API that http.client is exposing if http.client switches to
    using get_all.  It would arguably not be a bad thing for this API
    to return header objects, but it means we need to think about the
    proposed compatibility layer in a wider context than just the email
    package itself.

:titledissue:`7370`
    Suggested refactoring...the patch uses rfc822 but of course it should
    be the email module's formatdate (but see :issue:`5207` as well).

:titledissue:`8318`
    The use case that triggered this (parsing range objects) may or may not be
    handled by the current MIME implementation, but certainly needs to be.  We
    may wish to consider other possible applications of multifile, which is a
    bit more general than MIME parsing, and whether or not we want to provide
    an API to support those uses.  In any case the multifile docs need to be
    updated with transition instructions.
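
For :issue:`5053`, the behavior that would replace
``getallmatchingheaders`` can be seen with the current ``Message`` API.
The header names and values below are invented for the example.

.. code-block:: python

    from email import message_from_string

    raw = (
        "Set-Cookie: a=1\n"
        "Host: www.example.com\n"
        "Set-Cookie: b=2\n"
        "\n"
    )
    msg = message_from_string(raw)

    # get_all matches names case-insensitively and returns the values in
    # order of appearance.
    cookies = msg.get_all("set-cookie")
    # A missing header yields the failobj argument instead of raising.
    missing = msg.get_all("X-Missing", failobj=[])

Note that ``get_all`` returns header values only, so any caller that relies
on receiving complete header lines would need adjusting.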


CGI
~~~

:titledissue:`4953`
    Because :class:`~email.parser.FeedParser` can't parse bytes, CGI
    can't handle upload of binary files.

:titledissue:`1367631`
    If CGI is converted to use email for parsing forms, then this
    use case (maxlen) should be considered.


nntplib
~~~~~~~


.. index::
   triple: email6; open issues; backport

Issues that could have an effect on a backport
----------------------------------------------

:titledissue:`1813`
    Python uses the "wrong" toupper/tolower methods by default, which
    can cause some problems in certain locales.  This problem does not
    exist (modulo the issues addressed by :rfc:`3454`) in py3k, where
    upper and lower are locale-independent.


.. index::
   triple: email6; open issues; 2.x only

2.x only issues
---------------

:titledissue:`2848`

:titledissue:`4212`
    Python 3 does not use LazyImporter.

:titledissue:`1379416`
    Looks like it might be a simple fix.

:titledissue:`1368247`
    doc bug, really

:titledissue:`1555842`
    Unicode isn't handled right.  Big surprise.

:titledissue:`1681333`
    Runs of 'us-ascii' encoded words lose intermediate spaces.  The py3
    code does not have the call to _normalize that may be the source of
    the problem in py2.

:titledissue:`1685453`
    Master issue for unicode bugs.