Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 16, 2020 9:12 UTC (Thu) by nim-nim (subscriber, #34454)
In reply to: Szorc: Mercurial's Journey to and Reflections on Python 3 by Cyberax
Parent article: Szorc: Mercurial's Journey to and Reflections on Python 3

Because the whole system relies on a Rust-specific Display() method. That won't be supported by, or compatible with, all the other apps out there that will encounter filenames emitted by Rust programs and expect them to work as normal filename arguments.

Posted Jan 16, 2020 10:28 UTC (Thu) by smurf (subscriber, #17840)

Well, when you print anything you're expected to use the current locale. If you can't because the string is not displayable, tough luck.

The idea that file names are somehow privileged to not require that went out of the window a long time ago. It doesn't matter one whit whether the code printing said file name is written in Python, Rust, Go, C++, Z80 assembly, or LOLCODE.

If you want a standard way to carry non-UTF-8 pseudo-printable data (e.g. Latin-1 filenames from the stone ages), no problem: either use the surrogateescape method or do proper shell quoting. The "write the non-UTF-8 data" method is fine only when limited to streams that are known to be binary. "find -print0" comes to mind.
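To make that concrete, here is a minimal Python sketch of the two options, assuming a UTF-8 environment. The filename and the helper write_names_nul_terminated() are made up for illustration; only the "surrogateescape" and "backslashreplace" error handlers are standard Python.

```python
import io

# Hypothetical filename: "café.txt" encoded as Latin-1, so byte 0xE9 is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# surrogateescape maps each undecodable byte to a lone surrogate (0xE9 -> U+DCE9).
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", errors="surrogateescape")

# For a text stream, escape the surrogate rather than writing the raw byte.
printable = text_name.encode("utf-8", errors="backslashreplace").decode("utf-8")


def write_names_nul_terminated(names, stream):
    """Hypothetical helper: write raw names to a binary stream, NUL-terminated,
    in the spirit of "find -print0"."""
    for name in names:
        stream.write(name + b"\0")


# Demo on an in-memory binary stream; a real tool might use sys.stdout.buffer.
demo_stream = io.BytesIO()
write_names_nul_terminated([raw_name], demo_stream)
```

The string form is only safe to print after escaping; the untouched bytes go only to the binary stream.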

Posted Jan 16, 2020 16:21 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

> Well, when you print anything you're expected to use the current locale. If you can't because the string is not displayable, tough luck.
And what if you need to write a transparent proxy that needs to cope with non-UTF-8 headers?

Posted Jan 16, 2020 17:43 UTC (Thu) by smurf (subscriber, #17840)

You ask the tool's author to please add "surrogateescape" to their .encode("utf-8")/.decode("utf-8") calls, either unconditionally or via some special mode. Or to transparently pass unencodable headers as bytes, either …[ditto]. Or you submit a patch to do that yourself. Or you fork the code.
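As a rough sketch of what such a change might look like (the function names and the header are invented for illustration, not taken from any particular tool):

```python
def decode_header(raw: bytes) -> str:
    """Hypothetical helper: decode a header line, keeping invalid bytes as surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_header(text: str) -> bytes:
    """Hypothetical helper: inverse of decode_header(), restoring the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# Made-up header with a Latin-1 byte (0xFC) that is not valid UTF-8.
raw_header = b"X-User-Name: J\xfcrgen"

# A strict decode rejects the header outright.
try:
    raw_header.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# With surrogateescape the header can be handled as str and forwarded byte-for-byte.
as_text = decode_header(raw_header)
name, _, value = as_text.partition(": ")
forwarded = encode_header(f"{name}: {value}")
```

The code in between can keep working with str; it just must not print or re-encode the value strictly without escaping it first.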

None of this is rocket science. Headers are supposed to be valid ASCII strings, after all, so why blame the people who try to adhere to the standard? Yes, this could have been easier from the beginning, but that's why Python 3.8 is a whole lot better at this than 3.0.

Posted Jan 16, 2020 17:53 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

> None of this is rocket science. Headers are supposed to be valid ASCII strings, after all, so why blame the people who try to adhere to the standard?
Reality (that stubborn thing that doesn't go away) has agents that don't obey the standard. So a transparent proxy must accommodate them.

Posted Jan 17, 2020 6:11 UTC (Fri) by smurf (subscriber, #17840)

I know that. But most of the world is not a transparent proxy.