Szorc: Mercurial's Journey to and Reflections on Python 3
Posted Jan 19, 2020 18:36 UTC (Sun) by anselm (subscriber, #2796)
In reply to: Szorc: Mercurial's Journey to and Reflections on Python 3 by smurf
Parent article: Szorc: Mercurial's Journey to and Reflections on Python 3
We can debate until we're all blue in the face whether that was the right way to do it and whether the resulting incompatibilities were justified and whether "surrogateescape" should be the default for UTF8ifying random bytes you can reasonably expect to be ASCII these days, but without acknowledging said real problems this isn't going anywhere.
Python has recently (for Python values of “recently”, i.e., in Python 3.4) acquired a pathlib module that purports to enable system-independent handling of file and directory names. Presumably the way forward towards fixing the whole mess as far as file names are concerned is to handle non-UTF-8 file names in this module; they could be kept as “bags of bytes” under the hood, with best-effort conversions to UTF-8 or bytes available but not mandatory. The Path class already includes methods that will open, read, and write files and list the content of directories (returning more Path objects) etc., so one could presumably go quite far without ever having to convert a path name to UTF-8.
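To make the "bags of bytes" idea a bit more concrete, here is a minimal sketch of how far one can already get on a POSIX system. It assumes the usual behaviour of os.fsdecode() and os.fsencode() there (the surrogateescape error handler), and the helper names path_from_bytes and path_to_bytes are made up for illustration; they are not part of pathlib. The Path still holds a str internally, but the original bytes survive the round trip without anyone having to produce valid UTF-8.

```python
import os
from pathlib import Path


def path_from_bytes(raw: bytes) -> Path:
    """Hypothetical helper: build a Path from raw file-name bytes."""
    # On POSIX, os.fsdecode() uses the surrogateescape error handler, so
    # bytes that cannot be decoded are kept as lone surrogates in the str.
    return Path(os.fsdecode(raw))


def path_to_bytes(path: Path) -> bytes:
    """Hypothetical helper: recover the raw bytes behind a Path."""
    # os.fsencode() accepts path-like objects (Python 3.6+) and reverses
    # the surrogateescape mapping applied by os.fsdecode().
    return os.fsencode(path)


raw_name = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8
p = path_from_bytes(raw_name)

# Ordinary Path operations work without touching the encoding question.
child = Path("some_dir") / p

# The original bytes come back unchanged.
assert path_to_bytes(p) == raw_name
assert os.fsencode(child.name) == raw_name
```

This only covers the round trip through Path; it says nothing about what happens once such a name has to be printed or handed to code that insists on real UTF-8.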
The problem is that there are various places in the library that expect path names as strings and can't deal with Path objects, and these would need to be fixed. As I said, it might be a possible solution for the future.