|
|
Subscribe / Log in / New account

Szorc: Mercurial's Journey to and Reflections on Python 3

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 17, 2020 16:51 UTC (Fri) by excors (subscriber, #95769)
In reply to: Szorc: Mercurial's Journey to and Reflections on Python 3 by anselm
Parent article: Szorc: Mercurial's Journey to and Reflections on Python 3

> If you don't care what the file names look like, you shouldn't have an issue with Python using surrogate escapes, which it will if it encounters a file name that's not UTF-8.

You'll have an issue in Python when you say print("Opening file %s" % sys.argv[1]) or print(*os.listdir()), and it throws UnicodeEncodeError instead of printing something that looks nearly correct.

You can see the file in ls, tab-complete it in bash, pass it to Python on the command line, pass it to open() in Python, and it works; but then you call an API like print() that doesn't use surrogateescape by default and it fails. (It works in Python 2 where everything is bytes, though of course Python 2 has its own set of encoding problems.)

Anyway, I think this thread started with the comment that Mercurial's maintainers didn't want to "use Unicode for filenames", and I still think that's not nearly as simple or good an idea as it sounds. Filenames are special things that need special handling, and surrogateescape is not a robust solution. Any program that deals seriously with files (like a VCS) ought to do things properly, and Python doesn't provide the tools to help with that, which is a reason to discourage use of Python (especially Python 3) for programs like Mercurial.


to post comments


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds