Python 2.8?
Posted Jan 12, 2017 16:11 UTC (Thu) by JFlorian (guest, #49650)
In reply to: Python 2.8? by epa
Parent article: Python 2.8?
Posted Jan 12, 2017 16:42 UTC (Thu)
by xnox (guest, #63320)
[Link]
Posted Jan 12, 2017 23:19 UTC (Thu)
by lsl (subscriber, #86508)
[Link] (8 responses)
Even Nick Coghlan's notes on Python 3[1] hint at the fact that people writing systems or networking code were thrown under the bus for the alleged benefit of "high-level application code" and supposedly better Windows integration.
Except that the former is way too fuzzy a concept to be useful (what program doesn't use the file system at all?) and the latter is just plain wrong. Look at Go for how to do it right. It has excellent Unicode support on Windows, yet manages to present a sane interface to programmers, converting to UTF-16 only when calling into the system.
If the obvious way to open a file specified in argv is broken, your language is doing it wrong. If I have to re-open stdin/stdout in some kind of weird "binary" mode just so that my program works, your language is, again, wrong.
[1] http://python-notes.curiousefficiency.org/en/latest/pytho...
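For illustration, here is a minimal sketch of the byte-transparent tool this comment has in mind: open each argv entry in binary mode and write to the binary layer of stdout (sys.stdout.buffer) instead of the text wrapper. The names copy_bytes and main are made up for this example, and it assumes the stock CPython 3 standard streams.

```python
import sys


def copy_bytes(path, out):
    """Copy the file at `path` to the binary stream `out`, byte for byte."""
    with open(path, 'rb') as src:
        while True:
            chunk = src.read(65536)
            if not chunk:
                break
            out.write(chunk)


def main(argv=None, out=None):
    """Concatenate the named files onto `out` without decoding anything."""
    # sys.stdout is a text wrapper; its .buffer attribute is the underlying
    # binary stream, so no re-opening of stdout is needed.
    argv = sys.argv[1:] if argv is None else argv
    out = sys.stdout.buffer if out is None else out
    for arg in argv:
        copy_bytes(arg, out)
```

On POSIX, undecodable bytes in argv arrive as surrogate escapes in the str, and open() turns them back into the original bytes, so the path never has to be valid UTF-8 for this to work.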
Posted Jan 13, 2017 0:55 UTC (Fri)
by foom (subscriber, #14868)
[Link] (7 responses)
https://www.python.org/dev/peps/pep-0529/
So, now, finally, Python supports a sane API to access files: use byte strings on all platforms. Too bad it came too late... That'd be a really good candidate for importing into Python 2.8, though!
Posted Jan 13, 2017 3:19 UTC (Fri)
by excors (subscriber, #95769)
[Link] (6 responses)
It seems to convert to something that's nearly but not quite UTF-8:
So you need to treat them as opaque byte strings, not as encoded Unicode even on Windows.
Hmm, but how are you meant to use byte strings with open()? I would have thought this should work, but it doesn't:
Posted Jan 13, 2017 19:12 UTC (Fri)
by foom (subscriber, #14868)
[Link] (2 responses)
Regarding [open(f, 'r').close() for f in os.listdir(b'.')]: That sounds like a bug, at least to me.
Posted Jan 13, 2017 19:53 UTC (Fri)
by excors (subscriber, #95769)
[Link] (1 responses)
os.open() seems to do the right thing with byte string filenames, but I guess it would be nice if open() did too. So I think this claim:
> Python 3.6 has actually, finally, fixed the way it deals with Windows path APIs
is unfortunately a bit premature.
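A possible workaround, sketched under the assumption that os.open() handles byte-string names as described above: get a file descriptor from os.open() and hand that to open(), which accepts an integer descriptor in place of a path. The helper name read_by_bytes_name is hypothetical.

```python
import os


def read_by_bytes_name(directory, name):
    """Read a file whose name is a byte string, going through os.open()."""
    # Keep the whole path as bytes so nothing is ever decoded.
    path = os.path.join(os.fsencode(directory), name)
    # O_BINARY only exists on Windows; fall back to 0 elsewhere.
    fd = os.open(path, os.O_RDONLY | getattr(os, 'O_BINARY', 0))
    # open() takes ownership of the descriptor and closes it on exit.
    with open(fd, 'rb') as f:
        return f.read()
```

Something like `[read_by_bytes_name(b'.', f) for f in os.listdir(b'.')]` would then avoid passing the byte-string name to open() directly.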
Posted Jan 13, 2017 23:04 UTC (Fri)
by foom (subscriber, #14868)
[Link]
Posted Jan 14, 2017 0:08 UTC (Sat)
by vstinner (subscriber, #42675)
[Link] (2 responses)
Python 3.6 now uses UTF-8/surrogatepass on Windows in os.fsdecode() / os.fsencode(). Fortunately, these functions are almost never used on Windows, since Windows has native support for Unicode. For example, command-line arguments, directory listings, the hostname, and so on: Windows returns this data directly as Unicode.
The surrogatepass error handler is required to support the same character set as Windows. Windows does allow surrogate characters in filenames. It's really weird and does not conform to the Unicode standard, which forbids these characters in UTF-* encodings.
Python 3 respects the Unicode standard: surrogate characters are not allowed in the UTF-8 codec, for example. That made it possible to implement nice new error handlers for UTF-8: surrogateescape, surrogatepass, etc. By the way, Python 3.6 has an interesting new "namereplace" error handler.
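To make the difference concrete, here is a small platform-independent sketch of what the surrogatepass handler does to a name containing a lone surrogate. The filename is the one from the session quoted in this thread; the variable names are only for illustration.

```python
# A name containing a lone surrogate, which Windows permits in filenames.
name = 'test-\ud800.txt'

# The strict UTF-8 codec refuses to encode a lone surrogate.
try:
    name.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# surrogatepass encodes it anyway, giving the "nearly UTF-8" byte string.
raw = name.encode('utf-8', 'surrogatepass')

# Decoding with the same handler restores the original string.
roundtrip = raw.decode('utf-8', 'surrogatepass')
```

Here raw is b'test-\xed\xa0\x80.txt', which is why a plain .decode('utf-8') on such a name raises UnicodeDecodeError.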
Posted Jan 15, 2017 21:11 UTC (Sun)
by mathstuf (subscriber, #69389)
[Link]
But Python isn't a fan of these kinds of type safety (implicit casting with exceptions would probably be fine though and would have better error messages than "file from readdir does not exist" errors).
Posted Jan 20, 2017 4:17 UTC (Fri)
by foom (subscriber, #14868)
[Link]
I know you clarified later in your comment, but I'd just like to emphasize (esp since a lot of people seem to say that without clarification): Windows APIs absolutely *do not* return "Unicode". Instead, they deal with arrays of 16-bit values, which, when you're lucky, can be decoded via UTF-16 into a unicode string.
And, just like decoding Linux "UTF-8" paths to a unicode string might fail due to the path not actually being valid UTF-8, decoding a Windows "UTF-16" path might fail due to the path not actually being valid UTF-16.
In both OSes, in order to avoid errors, you'll want to tell the unicode decoder/encoder to allow the invalid input bytestrings and transform to something nonsensical but reversible. Or, alternatively, avoid decoding the paths into unicode at all, and just leave them in their native 8/16-bit bytestring representations.
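As a rough sketch of the "nonsensical but reversible" approach in Python 3: surrogateescape for 8-bit paths and surrogatepass for 16-bit ones. The sample byte strings are invented for illustration, not taken from a real filesystem.

```python
# An 8-bit path that is not valid UTF-8 (0xE9 is Latin-1 for e-acute).
posix_raw = b'caf\xe9.txt'
# surrogateescape maps each undecodable byte to a lone surrogate
# in the range U+DC80..U+DCFF instead of raising an error.
posix_text = posix_raw.decode('utf-8', 'surrogateescape')
# Encoding with the same handler gives back the exact original bytes.
posix_back = posix_text.encode('utf-8', 'surrogateescape')

# A 16-bit path holding an unpaired high surrogate (invalid UTF-16),
# written here as little-endian bytes.
win_raw = b'\x00\xd8'
# surrogatepass lets the lone surrogate through in both directions.
win_text = win_raw.decode('utf-16-le', 'surrogatepass')
win_back = win_text.encode('utf-16-le', 'surrogatepass')
```

The decoded strings are not meaningful text, but they survive the round trip, so they can be handed back to the OS unchanged.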
Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
>>> import os
>>> open('test-\ud800.txt', 'w').close()
>>> os.listdir('.')
['test-\ud800.txt']
>>> os.listdir(b'.')
[b'test-\xed\xa0\x80.txt']
>>> [f.decode('utf-8') for f in os.listdir(b'.')]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <listcomp>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 5: invalid continuation byte
>>> [open(f, 'r').close() for f in os.listdir(b'.')]
FileNotFoundError: [Errno 2] No such file or directory: b'test-\xed\xa0\x80.txt'