加载中...

classification
Title: email.MIMEText.MIMEText.as_string incorrectly folding long subject header
Type:
Components: Library (Lib) Versions: Python 2.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: barry Nosy List: aalbrecht, barry, cjw296, salty-horse, splorgdar
Priority: Keywords: patch

Created on 2008-01-30 14:48 by cjw296, last changed 2008-06-26 08:01 by cjw296.

Files
File name Uploaded Description Edit Remove
test.py splorgdar, 2008-05-23 21:49 reproduction case
py_issue1974.diff aalbrecht, 2008-06-22 12:06 Init Header class in test case without continuation_ws keyword
Messages
msg61866 (view) Author: Chris Withers (cjw296) Date: 2008-01-30 14:48
Somewhere in email.MIMEText.MIMEText.as_string (I'm not sure where) long
subject headers are folded using a newline followed by a tab:

>>> from email.MIMEText import MIMEText
>>> m = MIMEText('foo')
>>> m['Subject']='AA '*40
>>> m.as_string()
'Content-Type: text/plain; charset="us-ascii"\nMIME-Version:
1.0\nContent-Transfer-Encoding: 7bit\nSubject: AA AA AA AA AA AA AA AA
AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA\n\tAA AA AA AA AA AA AA AA
AA AA AA AA AA AA AA AA AA \n\nfoo'

While RFC 822 section 3.1.1 doesn't forbid this type of folding, the
current behaviour mis-renders in both Thunderbird and Outlook.

Messages generated by Thunderbird and Outlook represent the above
subject as follows:

'Content-Type: text/plain; charset="us-ascii"\nMIME-Version:
1.0\nContent-Transfer-Encoding: 7bit\nSubject: AA AA AA AA AA AA AA AA
AA AA AA AA AA AA AA AA AA AA AA AA AA AA AA\n AA AA AA AA AA AA AA AA
AA AA AA AA AA AA AA AA AA \n\nfoo'

Could the email library be adjusted to do the same? Years of wondering
why mails re-sent by mailman, mails from any number of python web
systems and recent complaints from customers of mine of mis-rendered
emails could be solved by doing so.
msg67270 (view) Author: Steve Baker (splorgdar) Date: 2008-05-23 21:33
I'm also hitting this problem. I'm including a demo script. The minimal
condition for me to reproduce this bug is that the subject contain at
least 76 characters of any kind, followed by a space, followed by at
least 3 characters of any kind.
msg68562 (view) Author: Andi Albrecht (aalbrecht) Date: 2008-06-22 12:06
For me this issue seems to be a duplicate of issue1645148.

In my opinion the test case that checks if headers created by strings or
Header instances are equal is incorrect. It shouldn't set the
continuation whitespace explicitly when creating a Headers instance. I
would expect that

  A) msg['aheader'] = 'a long string that breaks, but shortened here'

and

  B) msg['aheader'] = Header('a long string that breaks, but shortened
here')

result in the same output. But for A) a Header instance is initialized
in Generator._write_headers() with continuation_ws set to `\t` and for
B) the default ' ' is used (see http://bugs.python.org/msg31102).

I'm uploading a patch that modifies test_string_headerinst_eq() to what
I think it should look like with this message. Of course, this test will
fail at the moment, but maybe you agree that the Header instance in this
test should be initialized without explicitly setting continuation_ws.
msg68568 (view) Author: Chris Withers (cjw296) Date: 2008-06-22 16:06
Andi, I'm in total agreement with you :-)
(so if this bug could get fixed, both issues could get closed)
msg68630 (view) Author: Andi Albrecht (aalbrecht) Date: 2008-06-23 14:45
IMO, the best way to resolve this is to use the default continuation
whitespace when a Header class is created from a string. But I'm pretty
unsure about the default whitespace character. After having a look at a
random set of mails in my inbox it seems that in most cases a tab is
used for all headers except for the subject (excluding mails sent from
Python scripts of course...). I would prefer '\t'; it's allowed and
results in a cleaner output - but it won't solve the problem with some
email clients which was the starting point for this issue (a short note
in the docs should resolve this anyway... ;-)

But I wonder if for some reason setting headers from strings
intentionally differs from using the Header class? There are some tests
that explicitly test both cases and expect different output. In
TestLongHeaders it's

 test_long_unbreakable_lines_with_continuation()
 test_long_lines_with_different_header(self)
 test_long_received_header() [continuation_ws set to tab here]

So, what I would do is
 * remove continuation_ws keyword from Generator._write_headers()
 * make '\t' the default cont. whitespace for headers (Header class)
 * update test cases and the documentation accordingly

Do you think that this is the right way to go? Or would it break too
much code outside stdlib if the default ws changes?
msg68634 (view) Author: Barry A. Warsaw (barry) Date: 2008-06-23 14:59
I'm pretty convinced that this stuff is broken and needs serious
attention.  Unfortunately, I won't have time to address the email
package before 2.6 and 3.0 get released.  My current work lives at

bzr+ssh://pythonbzr@code.python.org/python/users/barry/30email

but it hasn't been updated for a while.  I highly suggest taking this to
the email-sig to see if anybody else has spare cycles to work in this
for now.
msg68658 (view) Author: Chris Withers (cjw296) Date: 2008-06-23 23:06
Again, in total agreement with Andi *EXCEPT*:

Please use ' ' instead of '\t' for the continuation character.

It's the \t that gets mis-rendered by Outlook and Thunderbird (at the
very least!) and since ' ' is also valid according to the RFC, we make
life much easier for ourselves if we just stick to that...
msg68665 (view) Author: Andi Albrecht (aalbrecht) Date: 2008-06-24 04:33
FWIW, I've uploaded a patch to codereview: 

http://codereview.appspot.com/2407

It uses a space character as Chris suggested.
msg68750 (view) Author: Ori Avtalion (salty-horse) Date: 2008-06-25 19:42
I think there's been a little misinterpretation of the standard in the
comments above.

It's important to note that RFC 2822 basically defines folding as
"adding a CRLF before an existing whitespace in the original message". 

See http://tools.ietf.org/html/rfc2822#section-2.2.3

It does *not* allow prepending folded lines with extra characters that
were not in the original message such as '\t' or ' '.

This is exactly what _encode_chunks does in header.py:
    joiner = NL + self._continuation_ws

(Note that the email package docs and Header docstring use the word
'prepend' which is reflects the error in the code).

With a correct implementation, why would I want to choice of which type
of character to line-break on when folding?
The whole notion of controlling the value of continuation_ws seems wrong.

However, changing the default continuation_ws to ' ', as the patch
suggests, will output syntactically correct headers in the majority of
cases (due to other bugs that remove trailing whitespace and merge
consecutive whitespace into one character).


All in all, I agree with the change of the default continuation_ws due
to its lucky side-effects, but as Barry hinted, the algorithm needs some
serious work to really output valid headers.

Some examples of the good and bad behaviors:

>>> from email.Header import Header
>>> l = ['<%d@dom.ain>' % i for i in range(8)]

>>> # this turns out fine
>>> Header(' '.join(l), continuation_ws=' ').encode()
'<0@dom.ain> <1@dom.ain> <2@dom.ain> <3@dom.ain> <4@dom.ain>
<5@dom.ain>\n <6@dom.ain> <7@dom.ain>'

# This does not fold even though it should
>>> Header('\t'.join(l), continuation_ws=' ').encode()
'<0@dom.ain>\t<1@dom.ain>\t<2@dom.ain>\t<3@dom.ain>\t<4@dom.ain>\t<5@dom.ain>\t<6@dom.ain>\t<7@dom.ain>'

# And here the 4-char whitespace is shrinked into one
>>> Header('    '.join(l), continuation_ws=' ').encode()
'<0@dom.ain> <1@dom.ain> <2@dom.ain> <3@dom.ain> <4@dom.ain>
<5@dom.ain>\n <6@dom.ain> <7@dom.ain>'
msg68767 (view) Author: Chris Withers (cjw296) Date: 2008-06-26 08:01
Ori,

I do agree with both you and Barry but is there any chance someone could
make the one-character change to make the /t a space so we can stop
seeing weirdness in common mail clients?

Perhaps a separate issue could be raised to refactor this all correctly?
History
Date User Action Args
2008-06-26 08:01:52cjw296setmessages: + msg68767
2008-06-25 19:42:05salty-horsesetnosy: + salty-horse
messages: + msg68750
2008-06-24 04:34:00aalbrechtsetmessages: + msg68665
2008-06-23 23:06:35cjw296setmessages: + msg68658
2008-06-23 14:59:24barrysetmessages: + msg68634
2008-06-23 14:45:33aalbrechtsetmessages: + msg68630
2008-06-22 16:06:05cjw296setmessages: + msg68568
2008-06-22 12:06:53aalbrechtsetfiles: + py_issue1974.diff
keywords: + patch
messages: + msg68562
nosy: + aalbrecht
2008-05-23 21:49:11splorgdarsetfiles: - test
2008-05-23 21:49:02splorgdarsetfiles: + test.py
2008-05-23 21:33:33splorgdarsetfiles: + test
nosy: + splorgdar
messages: + msg67270
2008-02-01 15:52:13gvanrossumsetassignee: barry
nosy: + barry
2008-01-30 14:48:04cjw296create