Fedora and Python 2

Posted Apr 11, 2018 23:58 UTC (Wed) by mjblenner (subscriber, #53463)
In reply to: Fedora and Python 2 by togga
Parent article: Fedora and Python 2

> Both of these crash in python3. Bugs or features?

Features.

> python2 -c "import json; print(json.dumps(b'xx'))"

JSON is UTF-{8|16|32}. What, exactly, do you want python to do with random bytes?
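For illustration, a minimal sketch of what Python 3 does here and one way to be explicit about it. The helper name `dumps_bytes` and the choice of UTF-8 are assumptions for the example, not something from the thread.

```python
import json

raw = b"xx"  # hypothetical payload, same literal as in the quoted command


def dumps_bytes(data: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes with an explicit encoding, then serialize as a JSON string."""
    return json.dumps(data.decode(encoding))


# Python 3 refuses to guess an encoding for bytes and raises TypeError.
try:
    json.dumps(raw)
    rejected = False
except TypeError:
    rejected = True

# Saying which encoding the bytes are in makes the call succeed.
encoded = dumps_bytes(raw)
print(rejected, encoded)
```

If the bytes are not text at all, something like base64 would be the usual way to carry them inside JSON.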

> echo -n -e "\xFF" | python2 -c "import sys; print(repr(sys.stdin.read()))"

Use sys.stdin.buffer to get bytes rather than UTF-8.
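A rough sketch of the difference, using an in-memory stream in place of the real stdin so it runs without a pipe. The helper `make_stdin` is hypothetical, and the sketch assumes a UTF-8 text layer with strict error handling; the real `sys.stdin` follows the locale.

```python
import io


def make_stdin(payload: bytes) -> io.TextIOWrapper:
    """Build a text stream over bytes, shaped like sys.stdin (assumed UTF-8, strict)."""
    return io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")


# Text layer: the lone 0xFF byte is not valid UTF-8, so decoding fails.
text_stream = make_stdin(b"\xff")
try:
    text_stream.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Binary layer: .buffer hands back the bytes untouched.
byte_stream = make_stdin(b"\xff")
raw = byte_stream.buffer.read()
print(decode_failed, repr(raw))
```

In a real script the last step would simply be `sys.stdin.buffer.read()`.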

Posted Apr 12, 2018 8:15 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

> Use sys.stdin.buffer to get bytes rather than UTF-8.

And remember to never, ever mix them in the same program. Oh, and it'll mostly work if you forget .buffer in one place. It'll just crash with bad data sometimes.

Posted Apr 12, 2018 10:24 UTC (Thu) by mjblenner (subscriber, #53463)

> And remember to never, ever mix them in the same program. Oh, and it'll mostly work if you forget .buffer in one place. It'll just crash with bad data sometimes.

Never ever mix them because if you do it will mostly work? (OK...)

Anyway, sounds like you need

PYTHONIOENCODING="utf-8:surrogateescape"

or use open(0, 'rb') or something, depending on what you're trying to do.
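A small sketch of what the surrogateescape error handler does, assuming UTF-8 as the nominal encoding. The sample bytes are made up for the example. Undecodable bytes become lone surrogates in the string and turn back into the original bytes on encoding, so the data survives a round trip.

```python
raw = b"ok \xff"  # hypothetical input containing one byte that is not valid UTF-8

# Decoding: 0xFF cannot be decoded, so it is smuggled through as U+DCFF.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original byte.
restored = text.encode("utf-8", errors="surrogateescape")

print(ascii(text), restored == raw)
```

Setting PYTHONIOENCODING as above applies the same handler to the standard streams, while open(0, 'rb') skips decoding altogether.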

Posted Apr 12, 2018 17:05 UTC (Thu) by togga (guest, #53103)

Thanks for the heads up in the new world of Python. I figure I should expect the user to have set this PYTHONIOENCODING variable to "random" to begin with.

Should scripts then always start by setting this parameter, or is it too late by then? Are we talking shell wrappers here, or refusing to start if it is set incorrectly?
If we do multiple things with multiple needs for encoding, do we need different settings for different incoming data, in other words set it with each read?

Posted Apr 12, 2018 17:52 UTC (Thu) by smurf (subscriber, #17840)

You can't both expect programs to work with whatever random cruft you feed them, *and* to keep your data safe.

Setting the encoding to whatever is actually used is simple enough – besides, that stuff happens to work correctly when your data and your locale match. Surprise: they usually do. And if you want to process binary data, then use "sys.stdin/out.buffer" (or binary mode). This is documented.
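As a sketch of choosing the encoding per stream rather than globally: wrap the binary layer in a text wrapper with whatever encoding that particular input uses. The helper name `text_reader` and the sample encodings are assumptions for the example; in-memory streams stand in for real input.

```python
import io


def text_reader(binary_stream, encoding: str) -> io.TextIOWrapper:
    """Wrap a binary stream in a text layer with an explicitly chosen encoding."""
    return io.TextIOWrapper(binary_stream, encoding=encoding)


# Two inputs carrying the same text in different encodings.
latin_input = io.BytesIO("café".encode("latin-1"))
utf8_input = io.BytesIO("café".encode("utf-8"))

latin = text_reader(latin_input, "latin-1").read()
utf8 = text_reader(utf8_input, "utf-8").read()

print(latin == utf8)
```

For standard input the same idea would read `text_reader(sys.stdin.buffer, "latin-1")`, which does not depend on the environment the script was started in.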

On the other hand, allowing a random mix of differently-encoded strings (which is what Python2 or Perl do) and then trying to disentangle the resulting mojibake (or even figure out what causes it) is a frustrating and sometimes futile exercise in preventing data loss after it's too late. Been there, done that, bitten the carpet.

"Explicit is better than implicit" is one of Python's mottos. I happen to think that it's helpful. If you don't, well, there are other languages.

Posted Apr 12, 2018 21:59 UTC (Thu) by togga (guest, #53103)

I haven't got a clue what you're talking about, but given Python's dynamic typing this line was quite amusing :-)

> "Explicit is better than implicit" is one of Python's mottos. I happen to think that it's helpful. If you don't, well, there are other languages.

Posted Apr 12, 2018 16:54 UTC (Thu) by togga (guest, #53103)

"What, exactly, do you want python to do with random bytes?"

Python 2 just keeps them as is, which works wonderfully.

"Use sys.stdin.buffer to get bytes rather than UTF-8."

Thanks. It worked. I made a compatible version. Awesome.

$ echo -n -e "\xFF" | python3 -c "import sys; S=type('S', (bytes,), {'__repr__': lambda s: bytes.__repr__(s)[1:]}); read_stdin=sys.stdin.buffer.read; sys.stdin.read = lambda: S(read_stdin()); print(repr(sys.stdin.read()))"
'\xff'