Fedora and Python 2
Posted Apr 11, 2018 23:58 UTC (Wed) by mjblenner (subscriber, #53463) In reply to: Fedora and Python 2 by togga
Parent article: Fedora and Python 2
Features.
> python2 -c "import json; print(json.dumps(b'xx'))"
JSON is UTF-{8|16|32}. What, exactly, do you want python to do with random bytes?
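In Python 3, json.dumps() does not accept bytes, so the quoted one-liner raises TypeError. The following is a minimal sketch that makes the decoding step explicit, assuming the bytes are meant to be UTF-8. The helper name dumps_bytes is hypothetical:

```python
import json


def dumps_bytes(data: bytes, encoding: str = "utf-8") -> str:
    """Serialize bytes as a JSON string after an explicit decode.

    dumps_bytes is a hypothetical helper. It raises UnicodeDecodeError if
    the bytes are not valid in the chosen encoding, instead of guessing.
    """
    return json.dumps(data.decode(encoding))


# Passing bytes directly is rejected by Python 3's json module.
try:
    json.dumps(b"xx")
except TypeError as exc:
    print("TypeError:", exc)

print(dumps_bytes(b"xx"))  # → "xx"
```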
> echo -n -e "\xFF" | python2 -c "import sys; print(repr(sys.stdin.read()))"
Use sys.stdin.buffer to get bytes rather than UTF-8.
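You can reproduce the difference without a pipe. Here an io.TextIOWrapper stands in for sys.stdin, which is an assumption made for testability. Its .buffer attribute is the underlying binary stream, just as sys.stdin.buffer is. Under UTF-8, reading through the text layer fails on the byte 0xFF, while the binary layer returns the byte untouched:

```python
import io


def read_raw(text_stream: io.TextIOWrapper) -> bytes:
    """Read everything from the binary layer beneath a text stream.

    In a real program, pass sys.stdin; its .buffer is the byte stream.
    """
    return text_stream.buffer.read()


def read_text(text_stream: io.TextIOWrapper) -> str:
    """Read through the decoding text layer (what sys.stdin.read() does)."""
    return text_stream.read()


# Simulate `echo -n -e "\xFF" | python3 ...` with strict UTF-8 decoding.
fake_stdin = io.TextIOWrapper(io.BytesIO(b"\xff"), encoding="utf-8")
try:
    read_text(fake_stdin)
except UnicodeDecodeError as exc:
    print("text read failed:", exc.reason)

# A fresh stream, because the failed read may already have consumed input.
fake_stdin = io.TextIOWrapper(io.BytesIO(b"\xff"), encoding="utf-8")
print(repr(read_raw(fake_stdin)))  # → b'\xff'
```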
Posted Apr 12, 2018 8:15 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
Posted Apr 12, 2018 10:24 UTC (Thu)
by mjblenner (subscriber, #53463)
Never ever mix them because if you do it will mostly work? (OK...)
Anyway, sounds like you need
PYTHONIOENCODING="utf-8:surrogateescape"
or use open(0, 'rb') or something, depending on what you're trying to do.
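You can check directly on strings what the surrogateescape error handler does. On decode, undecodable bytes become lone surrogates in the range U+DC80 to U+DCFF. On encode, those surrogates turn back into the original bytes. Data therefore survives a round trip even when it is not valid UTF-8. A minimal sketch:

```python
def roundtrip(raw: bytes) -> bytes:
    """Decode and re-encode with surrogateescape.

    This mirrors what PYTHONIOENCODING="utf-8:surrogateescape" makes
    sys.stdin (decoding) and sys.stdout (encoding) do.
    """
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


sample = b"caf\xc3\xa9 \xff"  # valid UTF-8 "café", then a stray 0xFF byte
text = sample.decode("utf-8", errors="surrogateescape")
print(ascii(text))            # → 'caf\xe9 \udcff'
print(roundtrip(sample) == sample)  # → True
```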
Posted Apr 12, 2018 17:05 UTC (Thu)
by togga (guest, #53103)
Scripts should then always start by setting this parameter, or is it too late? Are we talking shell wrappers here, or refusing to start if it is set incorrectly?
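It is not necessarily too late once the script is running. One option, sketched here, is to detach the binary buffer and wrap it in a fresh io.TextIOWrapper with the encoding and error handler you want, without touching the environment. Python 3.7 and later also provide sys.stdin.reconfigure(encoding=..., errors=...), which works as long as nothing has been read from the stream yet. The helper name rewrap is hypothetical:

```python
import io


def rewrap(stream, encoding="utf-8", errors="surrogateescape"):
    """Return a new text stream over the binary buffer of `stream`.

    Hypothetical helper. detach() hands back the underlying buffer and
    leaves the old wrapper unusable, so the old wrapper cannot later close
    the buffer. Call it before reading anything, or buffered data is lost.
    Typical use at the top of a script: sys.stdin = rewrap(sys.stdin)
    """
    return io.TextIOWrapper(stream.detach(), encoding=encoding, errors=errors)


old = io.TextIOWrapper(io.BytesIO(b"\xff"), encoding="utf-8")
new = rewrap(old)
print(ascii(new.read()))  # → '\udcff'
```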
Posted Apr 12, 2018 17:52 UTC (Thu)
by smurf (subscriber, #17840)
Setting the encoding to whatever is actually used is simple enough – besides, that stuff happens to work correctly when your data and your locale match. Surprise: they usually do. And if you want to process binary data, then use "sys.stdin/out.buffer" (or binary mode). This is documented.
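For the binary case, a filter can stay entirely in bytes by using the .buffer attributes on both ends. This sketch takes the streams as parameters, so it can be exercised with io.BytesIO, and copies in chunks. The name copy_binary and the 64 KiB chunk size are illustrative choices, not something from the thread:

```python
import io


def copy_binary(src, dst, chunk_size=64 * 1024):
    """Copy raw bytes from src to dst with no decoding step at all.

    In a real pipe: copy_binary(sys.stdin.buffer, sys.stdout.buffer),
    which is unaffected by the locale or PYTHONIOENCODING.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
    dst.flush()


out = io.BytesIO()
copy_binary(io.BytesIO(b"\xff\x00abc"), out)
print(out.getvalue())  # → b'\xff\x00abc'
```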
On the other hand, allowing a random mix of differently-encoded strings (which is what Python2 or Perl do) and then trying to disentangle the resulting mojibake (or even figure out what causes it) is a frustrating and sometimes futile exercise in preventing data loss after it's too late. Been there, done that, bitten the carpet.
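The mojibake described here is easy to produce. Encode text with one codec and decode it with another: nothing fails, and the text is simply wrong. This small demonstration uses UTF-8 and Latin-1 as the mismatched pair:

```python
original = "caf\u00e9"                   # "café"
utf8_bytes = original.encode("utf-8")    # b'caf\xc3\xa9'
garbled = utf8_bytes.decode("latin-1")   # no error is raised
print(garbled)                           # → cafÃ©

# Undoing it only works if you know exactly which wrong codec was applied.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)              # → True
```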
"Explicit is better than implicit" is one of Python's mottos. I happen to think that it's helpful. If you don't, well, there are other languages.
Posted Apr 12, 2018 21:59 UTC (Thu)
by togga (guest, #53103)
> "Explicit is better than implicit" is one of Python's mottos. I happen to think that it's helpful. If you don't, well, there are other languages.
Posted Apr 12, 2018 16:54 UTC (Thu)
by togga (guest, #53103)
Python2 just keeps them as is, and it works wonderfully.
"Use sys.stdin.buffer to get bytes rather than UTF-8."
Thanks. It worked. I made a compatible version. Awesome.
$ echo -n -e "\xFF" | python3 -c "import sys; S=type('S', (bytes,), {'__repr__': lambda s: bytes.__repr__(s)[1:]}); read_stdin=sys.stdin.buffer.read; sys.stdin.read = lambda: S(read_stdin()); print(repr(sys.stdin.read()))"
'\xff'
And remember to never, ever mix them in the same program. Oh, and it'll mostly work if you forget .buffer in one place. It'll just crash with bad data sometimes.
If we do multiple things with multiple needs for encoding, do we need different settings for different incoming data? In other words, do we set it with each read?
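On the question of different encodings for different incoming data, one approach is to read every source as bytes and decode each one with its own codec, instead of relying on a single process-wide setting. This sketch assumes each source's encoding is known in advance. The function name, source names, and encodings are hypothetical and only illustrate the idea:

```python
import io


def decode_sources(sources, encodings, errors="strict"):
    """Decode each binary source with the encoding assigned to it.

    sources:   mapping of name -> binary file object (e.g. open(path, "rb"))
    encodings: mapping of name -> codec name
    """
    return {
        name: stream.read().decode(encodings[name], errors=errors)
        for name, stream in sources.items()
    }


sources = {
    "legacy_log": io.BytesIO("na\u00efve".encode("latin-1")),
    "api_reply": io.BytesIO("na\u00efve".encode("utf-8")),
}
encodings = {"legacy_log": "latin-1", "api_reply": "utf-8"}
decoded = decode_sources(sources, encodings)
print(decoded["legacy_log"] == decoded["api_reply"])  # → True
```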