Python 2.8?

Posted Jan 12, 2017 16:11 UTC (Thu) by JFlorian (guest, #49650)
In reply to: Python 2.8? by epa
Parent article: Python 2.8?

Or how about we put that effort into completing the port to Python 3 and kick Python 2 to the curb where it belongs? I don't mean that in a snide way, but if an organization is really tied to something ancient like RHEL5, it must be because they really want no change... at all.

Python 2.8?

Posted Jan 12, 2017 16:42 UTC (Thu) by xnox (guest, #63320)

My understanding is that the people who worked on the Py3k migration for OpenStack are no longer employed full-time at HPE, so OpenStack will be stuck at 2.7. Who knows whether it is still a viable project, given that the largest public clouds are proprietary: AWS, Azure, GCE, and Rackspace (which has its own additions, so it is semi-OpenStack but not quite; at least that's my perception of Rackspace).

Python 2.8?

Posted Jan 12, 2017 23:19 UTC (Thu) by lsl (subscriber, #86508)

Some people actually prefer Python 2 as a language. I tend to agree with that. The Unicode thing in Python 3 is just too much of a mess.

Even Nick Coghlan's notes on Python 3[1] hint at the fact that people writing systems or networking code were thrown under the bus for the alleged benefit of "high-level application code" and supposedly better Windows integration.

Except that the former is way too fuzzy a concept to be useful (what program doesn't use the file system at all?) and the latter is just plain wrong. Look at Go for how to do it right. It has excellent Unicode support on Windows, yet manages to present a sane interface to programmers, converting to UTF-16 only when calling into the system.

If the obvious way to open a file specified in argv is broken, your language is doing it wrong. If I have to re-open stdin/stdout in some kind of weird "binary" mode just so that my program works, your language is, again, wrong.

[1] http://python-notes.curiousefficiency.org/en/latest/pytho...
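For comparison, here is a minimal sketch of the two operations complained about above as they look in current Python 3. It assumes the PEP 383 behaviour on POSIX, where argv entries are decoded with the surrogateescape handler so that the resulting str can be handed straight back to open(). The helper names read_file_from_arg and copy_binary are hypothetical, chosen only for illustration.

```python
import sys


def read_file_from_arg(arg):
    """Return the raw bytes of the file named by one sys.argv entry.

    Assumption: on POSIX, undecodable bytes in argv arrive as lone
    surrogates (PEP 383), and open() encodes them back to the original
    bytes, so no manual conversion is needed here.
    """
    with open(arg, 'rb') as f:
        return f.read()


def copy_binary(src, dst, chunk_size=65536):
    """Copy raw bytes from one binary stream to another without decoding."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)


def main():
    # sys.stdin.buffer / sys.stdout.buffer are the underlying binary
    # streams; using them avoids re-opening stdin/stdout in binary mode.
    if len(sys.argv) > 1:
        sys.stdout.buffer.write(read_file_from_arg(sys.argv[1]))
    else:
        copy_binary(sys.stdin.buffer, sys.stdout.buffer)
```

Whether this counts as "the obvious way" is exactly the point in dispute; the sketch only shows that the binary streams are reachable without re-opening anything.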

Python 2.8?

Posted Jan 13, 2017 0:55 UTC (Fri) by foom (subscriber, #14868)

Ironically, Python 3.6 has actually, finally, fixed the way it deals with Windows path APIs when given byte strings rather than Unicode strings -- it now uses the *W APIs, and recodes into UTF-8.

https://www.python.org/dev/peps/pep-0529/

So, now, finally, Python supports a sane API to access files: use byte strings on all platforms. Too bad it came too late... That'd be a really good candidate for importing into Python 2.8, though!

Python 2.8?

Posted Jan 13, 2017 3:19 UTC (Fri) by excors (subscriber, #95769)

It seems to convert to something that's nearly but not quite UTF-8:

Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32

>>> import os

>>> open('test-\ud800.txt', 'w').close()

>>> os.listdir('.')
['test-\ud800.txt']

>>> os.listdir(b'.')
[b'test-\xed\xa0\x80.txt']

>>> [f.decode('utf-8') for f in os.listdir(b'.')]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 5: invalid continuation byte

So you need to treat them as opaque byte strings, not as encoded Unicode, even on Windows.

Hmm, but how are you meant to use byte strings with open()? I would have thought this should work, but it doesn't:

>>> [open(f, 'r').close() for f in os.listdir(b'.')]
FileNotFoundError: [Errno 2] No such file or directory: b'test-\xed\xa0\x80.txt'

Python 2.8?

Posted Jan 13, 2017 19:12 UTC (Fri) by foom (subscriber, #14868)

Regarding [f.decode('utf-8') for f in os.listdir(b'.')]: Apparently Python no longer allows surrogates in its UTF-8 codec by default; you need to use .decode('utf-8', errors='surrogatepass') instead.

Regarding [open(f, 'r').close() for f in os.listdir(b'.')]: That sounds like a bug, at least to me.
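A minimal sketch of the surrogatepass suggestion, using the byte string from the listing above rather than a real directory. The helper name decode_listed_name is hypothetical.

```python
# The name returned by os.listdir(b'.') in the session above.
listed = b'test-\xed\xa0\x80.txt'


def decode_listed_name(raw):
    """Decode a byte-string filename, letting encoded surrogates through."""
    return raw.decode('utf-8', errors='surrogatepass')


# The strict codec rejects the encoded surrogate.
try:
    listed.decode('utf-8')
    strict_decodes = True
except UnicodeDecodeError:
    strict_decodes = False

name = decode_listed_name(listed)

# The conversion is reversible as long as the same handler is used.
round_tripped = name.encode('utf-8', errors='surrogatepass')
```

Here name is 'test-\ud800.txt', the same string that the str variant of os.listdir() returned.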

Python 2.8?

Posted Jan 13, 2017 19:53 UTC (Fri) by excors (subscriber, #95769)

The open() issue is not restricted to weird surrogate cases - it fails with pretty much any non-ASCII filename, like 'test-\u00c0.txt' ("No such file or directory: b'test-\xc3\x80.txt'"). Looks like it actually tries to open 'test-\u00c3\u20ac.txt', i.e. open() is always decoding the filename as Win-1252, which doesn't seem especially helpful. (This is with Python 3.6.0 on English-language Windows 7.)

os.open() seems to do the right thing with byte string filenames, but I guess it would be nice if open() did too. So I think this claim:

> Python 3.6 has actually, finally, fixed the way it deals with Windows path APIs

is unfortunately a bit premature.
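The mis-decoding described above can be reproduced without touching the file system. This is only a sketch of the suspected mechanism (UTF-8 bytes reinterpreted as Win-1252), not a trace of what open() does internally.

```python
original = 'test-\u00c0.txt'

# The byte-string form of the name, as reported in the error message.
utf8_bytes = original.encode('utf-8')

# Reading those UTF-8 bytes as Windows-1252 yields two different characters:
# 0xC3 becomes U+00C3 and 0x80 becomes U+20AC (the euro sign).
misread = utf8_bytes.decode('cp1252')

print(ascii(misread))
```

The result matches the name that open() appeared to look for, which supports the Win-1252 explanation.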

Python 2.8?

Posted Jan 13, 2017 23:04 UTC (Fri) by foom (subscriber, #14868)

D'oh.. Guess that's what I get for not testing before praising it... 😐

Python 2.8?

Posted Jan 14, 2017 0:08 UTC (Sat) by vstinner (subscriber, #42675)

> It seems to convert to something that's nearly but not quite UTF-8:

Python 3.6 now uses UTF-8 with the surrogatepass error handler on Windows in os.fsdecode() / os.fsencode(). Fortunately, these functions are almost never needed on Windows, since Windows has native support for Unicode. For example, when you read command line arguments, list the filenames in a directory, or get the hostname, Windows returns the data directly as Unicode.

The surrogatepass error handler is required to support the same character set as Windows. Windows does allow surrogate characters in filenames. It's really weird and does not conform to the Unicode standard, which forbids these characters in UTF-* encodings.

Python 3 respects the Unicode standard: surrogate characters are not allowed in the UTF-8 codec, for example. That made it possible to implement nice new error handlers for UTF-8: surrogateescape, surrogatepass, etc. By the way, Python 3.5 added an interesting "namereplace" error handler.
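A short sketch of the other two handlers named above, using made-up sample data. It assumes Python 3.5 or later, since namereplace does not exist in earlier versions.

```python
# Latin-1 bytes for "café"; the final 0xE9 is not valid UTF-8.
raw = b'caf\xe9'

# surrogateescape smuggles each undecodable byte through as a lone
# surrogate in the range U+DC80..U+DCFF ...
text = raw.decode('utf-8', errors='surrogateescape')

# ... and turns it back into the original byte when encoding.
restored = text.encode('utf-8', errors='surrogateescape')

# The strict UTF-8 codec refuses to encode a lone surrogate.
try:
    text.encode('utf-8')
    strict_encodes = True
except UnicodeEncodeError:
    strict_encodes = False

# namereplace substitutes \N{...} escapes for unencodable characters.
named = 'caf\u00e9'.encode('ascii', errors='namereplace')
```

Note that surrogateescape and namereplace are generic handlers usable with other codecs too, whereas surrogatepass is specific to the UTF-8, UTF-16 and UTF-32 codecs.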

Python 2.8?

Posted Jan 15, 2017 21:11 UTC (Sun) by mathstuf (subscriber, #69389)

Rust handles this by having an "OsStr" type at system call boundaries related to paths (and, IIRC, environment variables and process arguments). Strings can be converted to them easily (they implement AsRef<OsStr>), but getting them back out requires an explicit to_str call (which can fail) or the to_string_lossy version (which uses replacement characters for unconvertible sequences). On Windows, there is a purely internal "WTF-8" encoding, which is UTF-8 extended to represent the unpaired surrogates that Windows allows. This keeps the encodings from getting mixed up and lets the real POSIX policy of "filenames are sequences of nonzero, non-/ bytes" be handled gracefully, without LANG screwing up your code because assumptions were made based on it.

But Python isn't a fan of these kinds of type safeties (implicit casting with exceptions would probably be fine though, and would give better error messages than "file from readdir does not exist" errors).

Python 2.8?

Posted Jan 20, 2017 4:17 UTC (Fri) by foom (subscriber, #14868)

> Windows return data directly as Unicode

I know you clarified later in your comment, but I'd just like to emphasize (especially since a lot of people seem to say that without clarification): Windows APIs absolutely *do not* return "Unicode". Instead, they deal with arrays of 16-bit values, which, when you're lucky, can be decoded via UTF-16 into a Unicode string.

And, just like decoding Linux "UTF-8" paths to a Unicode string might fail due to the path not actually being valid UTF-8, decoding a Windows "UTF-16" path might fail due to the path not actually being valid UTF-16.

On both OSes, in order to avoid errors, you'll want to tell the Unicode decoder/encoder to allow the invalid input and transform it into something nonsensical but reversible. Or, alternatively, avoid decoding the paths into Unicode at all, and just leave them in their native 8-bit or 16-bit string representations.
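The "nonsensical but reversible" approach can be sketched with Python's own codecs. The four helper names are hypothetical, and the choice of surrogateescape for 8-bit paths and surrogatepass for 16-bit paths is an assumption made for illustration, not a description of what any particular Python version does internally.

```python
def posix_path_to_text(raw):
    """Decode an 8-bit path; invalid bytes become lone surrogates."""
    return raw.decode('utf-8', errors='surrogateescape')


def text_to_posix_path(text):
    """Reverse posix_path_to_text(), restoring the original bytes."""
    return text.encode('utf-8', errors='surrogateescape')


def windows_path_to_text(units):
    """Decode little-endian 16-bit units; unpaired surrogates pass through."""
    return units.decode('utf-16-le', errors='surrogatepass')


def text_to_windows_path(text):
    """Reverse windows_path_to_text(), restoring the original units."""
    return text.encode('utf-16-le', errors='surrogatepass')


# An 8-bit name that is not valid UTF-8.
posix_raw = b'bad-\xff.txt'
posix_text = posix_path_to_text(posix_raw)

# A 16-bit name with an unpaired high surrogate between 'a' and 'b'.
windows_raw = b'a\x00\x00\xd8b\x00'
windows_text = windows_path_to_text(windows_raw)
```

In both cases the intermediate str is not well-formed Unicode, but encoding it with the same handler returns exactly the bytes that went in.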