Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 13, 2020 20:32 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
In reply to: Szorc: Mercurial's Journey to and Reflections on Python 3 by pj
Parent article: Szorc: Mercurial's Journey to and Reflections on Python 3

> I think your statement begs the question: which users?
Indeed. Python maintainers decided to pick and choose them. Only good-behaving users who like eating their veggies ( https://snarky.ca/porting-to-python-3-is-like-eating-your... ) were allowed in.

As a result, Py3 has lost several huge codebases that started migrating to Go instead. Other projects like Mercurial or OpenStack started migration at the very last moment, because of 2.7 EoL.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 13, 2020 21:01 UTC (Mon) by vstinner (subscriber, #42675) [Link] (25 responses)

The OpenStack project is way larger than Mercurial and has way more dependencies. OpenStack is more than 2 million lines of Python code. The OpenStack migration started in 2013, and I helped port about 90% of the unit tests of almost all major projects (all except Nova and Swift, which were less open to Python 3 changes), and I helped port many 3rd party dependencies to Python 3 as well. All OpenStack projects have had a mandatory python3 CI since 2016 to, at least, not regress on what was already ported. See https://wiki.openstack.org/wiki/Python3 for more information. (I stopped working on OpenStack 2 years ago, so I don't know the current status.)

Like Mercurial, Twisted is heavily based on bytes (it is a networking framework), and it was ported successfully to Python 3 a few years ago. Twisted can now be used with asyncio.

I tried to help port Mercurial to Python 3, but its maintainers were not really open to discussing Python 3 when I tried. Well, I wanted to use Unicode for filenames, and they didn't want to hear this idea. I gave up ;-)

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 13, 2020 22:16 UTC (Mon) by excors (subscriber, #95769) [Link] (23 responses)

> Well, I wanted to use Unicode for filenames, they didn't want to hear this idea.

The article mentions that issue: POSIX filenames are arbitrary byte strings. There is simply no good lossless way to decode them to Unicode. (There's PEP 383 but that produces strings that are not quite Unicode, e.g. it becomes impossible to encode them as UTF-16, so that's not good). And Windows filenames are arbitrary uint16_t strings, with no good lossless way to decode them to Unicode. For an application whose job is to manage user-created files, it's not safe to make assumptions about filenames; it has to be robust to whatever the user throws at it.

(The article also mentions the solution, as implemented in Rust: filenames are a platform-specific string type, with lossy conversions to Unicode if you really want that (e.g. to display to users).)

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 13, 2020 23:19 UTC (Mon) by vstinner (subscriber, #42675) [Link] (12 responses)

> The article mentions that issue: POSIX filenames are arbitrary byte strings. There is simply no good lossless way to decode them to Unicode.

On Python 3, there is a good practical solution for that: Python uses surrogateescape error handler (PEP 383) by default for filenames. It escapes undecodable bytes as Unicode surrogate characters.

Read my articles https://vstinner.github.io/python30-listdir-undecodable-f... and https://vstinner.github.io/pep-383.html for the history of Unicode usage for filenames in the early days of Python 3 (Python 3.0 and Python 3.1).

The problem is that the UTF-8 codec of Python 2 doesn't respect the Unicode standard: it does encode surrogate characters. The Python 3 codec doesn't encode them, which makes it possible to use the surrogateescape error handler with UTF-8.
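A minimal sketch of the round trip described above, using only the standard codecs. The byte string `raw` is a made-up example of a Latin-1 encoded filename, and `strict_encode_fails` is a hypothetical helper added for illustration:

```python
# A filename as raw bytes: "caf\xe9.txt" is Latin-1, so it is not valid UTF-8.
raw = b"caf\xe9.txt"

# Decoding with surrogateescape maps the undecodable byte 0xE9 to U+DCE9.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")


def strict_encode_fails(text: str) -> bool:
    """Return True if a strict UTF-8 encode rejects the string."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False
```

The round trip is lossless only as long as every encode step uses the same error handler; a strict encode of `name` is rejected.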

> And Windows filenames are arbitrary uint16_t strings, with no good lossless way to decode them to Unicode.

I'm not sure of which problem you're talking about.

If you care about getting the same character on Windows and Linux (ex: é letter = U+00E9), you should encode the filename differently. Storing the filename as Unicode in the application is a convenient way to do that. That's why Python prefers Unicode for filenames. But it also accepts filenames as bytes.
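To illustrate the U+00E9 point: the same character becomes different byte sequences depending on the encoding a platform uses. This is only a sketch; which encoding a given filesystem actually uses is an assumption that depends on the system and its locale.

```python
# One character, U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
name = "\u00e9"

# Typical on a modern Linux system with a UTF-8 locale.
as_utf8 = name.encode("utf-8")

# The 16-bit code unit form used by the Windows wide-character APIs.
as_utf16 = name.encode("utf-16-le")

# An older 8-bit charset, as found in legacy filenames.
as_latin1 = name.encode("latin-1")
```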

> For an application whose job is to manage user-created files, it's not safe to make assumptions about filenames; it has to be robust to whatever the user throws at it.

Well, it is where I gave up :-)

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 13, 2020 23:29 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

> I'm not sure of which problem you're talking about.
A VCS must be able to round-trip files on the same FS. Even if they are not encoded correctly.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 13, 2020 23:37 UTC (Mon) by roc (subscriber, #30627) [Link] (3 responses)

It sounds to me like on Windows you can round-trip arbitrary native filenames through Python "Unicode" strings because in both systems the strings are simply a list of 16-bit code units (which are normally interpreted as UTF-16 but may not be valid UTF-16). So maybe that 'surrogateescape' hack is enough. (But only because Python3 Unicode strings don't have to be valid Unicode after all.)

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 2:22 UTC (Tue) by excors (subscriber, #95769) [Link] (2 responses)

Python strings aren't 16-bit code units. b'\xf0\x92\x8d\x85'.decode('utf-8') is '\U00012345' with length 1, which is sensible.

You can't create a string like '\U00123456' (SyntaxError) or chr(0x123456) (ValueError); it's limited to the 21-bit range. But you *can* create a string like '\udccc' and Python will happily process it, at least until you try to encode it. '\udccc'.encode('utf-8') throws UnicodeEncodeError.

If you use the special decoding mode, b'\xcc'.decode('utf-8', 'surrogateescape') gives '\udccc'. If you (or some library) do that, now your application is tainted with not-really-Unicode strings, and I think if you ever try to encode without surrogateescape then you'll risk getting an exception.

If you tried to decode Windows filenames as round-trippable UCS-2, like

>>> import struct
>>> ''.join(chr(c) for c, in struct.iter_unpack(b'>H', b'\xd8\x08\xdf\x45'))
'\ud808\udf45'

then you'd be introducing a third type of string (after Unicode and Unicode-plus-surrogate-escapes) which seems likely to make things even worse.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 2:44 UTC (Tue) by excors (subscriber, #95769) [Link]

> I think if you ever try to encode without surrogateescape then you'll risk getting an exception

Incidentally, that seems to include the default encoding performed by print() (at least in Python 3.6 on my system):

>>> import os
>>> for f in os.listdir('.'): print(f)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udccc' in position 4: surrogates not allowed

os.listdir() will surrogateescape-decode and functions like open() will surrogateescape-encode the filenames, but that doesn't help if you've got e.g. logging code that touches the filenames too.
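One possible way to keep logging code from raising on such names is to escape the lone surrogates before output. This is a sketch, not something the thread prescribes; `printable` is a hypothetical helper name, and the result is for display only (it is no longer the real filename):

```python
def printable(name: str) -> str:
    """Return a display-safe copy of a possibly surrogate-escaped filename.

    Characters that strict UTF-8 cannot encode (such as the lone
    surrogates produced by surrogateescape) are replaced by backslash
    escapes; everything else is left unchanged.
    """
    return name.encode("utf-8", "backslashreplace").decode("utf-8")


# Example: a name as os.listdir() might return it for the raw bytes b"data\xcc.txt".
escaped = printable("data\udccc.txt")
```

Keep the original `str` (or the bytes from `os.fsencode`) for any call that touches the filesystem, and use the escaped copy only for messages.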

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 4:47 UTC (Tue) by roc (subscriber, #30627) [Link]

Thanks for clearing that up.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 16, 2020 8:08 UTC (Thu) by marcH (subscriber, #57642) [Link]

> A VCS must be able to round-trip files on the same FS

Yet all VCS provide some sort of auto.crlf insanity, go figure.

Just in case someone wants to use Notepad-- from the last decade.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 13, 2020 23:32 UTC (Mon) by roc (subscriber, #30627) [Link] (1 responses)

Huh, so Python3 "Unicode" strings aren't even necessarily valid Unicode :-(.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 16, 2020 17:40 UTC (Thu) by kjp (guest, #39639) [Link]

And the zen of python was forgotten long ago. Explicit is better than implicit? Errors should not pass silently? Nah. Let's just add math operators to dictionaries. Python has no direction, no stewardship, and I think it's been taken over by windows and perl folks.

Python: It's a [unstable] scripting language. NOT a systems or application language.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 1:37 UTC (Tue) by excors (subscriber, #95769) [Link] (2 responses)

> On Python 3, there is a good practical solution for that: Python uses surrogateescape error handler (PEP 383) by default for filenames. It escapes undecodable bytes as Unicode surrogate characters.

But then you end up with a "Unicode" string in memory which can't be safely encoded as UTF-8 or UTF-16, so it's not really a Unicode string at all. (As far as I can see, the specifications are very clear that UTF-* can't encode U+D800..U+DFFF. An implementation that does encode/decode them is wrong or is not Unicode.)

That means Python applications that assume 'str' is Unicode are liable to get random exceptions when encoding properly (i.e. without surrogateescape).

> > And Windows filenames are arbitrary uint16_t strings, with no good lossless way to decode them to Unicode.
>
> I'm not sure of which problem you're talking about.

Windows (with NTFS) lets you create a file whose name is e.g. "\ud800". The APIs all handle filenames as strings of wchar_t (equivalent to uint16_t), so they're perfectly happy with that file. But it's clearly not a valid string of UTF-16 code units (because it would be an unpaired surrogate) so it can't be decoded, and it's not a valid string of Unicode scalar values so it can't be directly encoded as UTF-8 or UTF-16. It's simply not Unicode.

In practice most native Windows applications and APIs treat filenames as effectively UCS-2, and they never try to encode or decode so they don't care about surrogates, though the text rendering APIs try to decode as UTF-16 and go a bit weird if that fails. Python strings aren't UCS-2 so it has to convert to 'str' somehow, but there's no correct way to do that conversion.
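The unpaired-surrogate case can be reproduced in Python without touching a filesystem. The sketch below uses the standard `surrogatepass` error handler, which lets the UTF-16 codec pass lone surrogates through; `can_encode` is a hypothetical helper. As far as I know, CPython on Windows uses a similar handler for its filesystem encoding, but treat that as an assumption and check `sys.getfilesystemencodeerrors()` on your system.

```python
# A one-character name consisting of an unpaired high surrogate.
lone = "\ud800"


def can_encode(text: str, encoding: str, errors: str = "strict") -> bool:
    """Return True if text encodes without raising UnicodeEncodeError."""
    try:
        text.encode(encoding, errors)
    except UnicodeEncodeError:
        return False
    return True


# With surrogatepass, the code unit 0xD800 is written out as-is (little endian).
units = lone.encode("utf-16-le", "surrogatepass")

# Decoding with the same handler gives the original string back.
back = units.decode("utf-16-le", "surrogatepass")
```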

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 6:04 UTC (Tue) by ssmith32 (subscriber, #72404) [Link] (1 responses)

Microsoft refers to it as an "extended character set":

https://docs.microsoft.com/en-us/windows/win32/fileio/nam...

Also, whatever your complaints are about whatever language, with respect to filenames, the win32 api is worse.

It's amazingly inconsistent. The level of insanity is just astonishing, especially if you're going across files created with the win api, and the .net libs.

You *have to p/invoke to read some files, and use the long filepath prefix, which doesn't support relative paths. And that's just the start.

Admittedly, I haven't touched it for almost a decade in any serious fashion, but, based on the docs linked above, it doesn't seem much has changed.

It's remarkable how easy they make it to write files that are quite hard to open..

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 15:35 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

> It's amazingly inconsistent. The level of insanity is just astonishing
Just wait until you see POSIX!

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 15, 2020 0:26 UTC (Wed) by gracinet (guest, #89400) [Link]

Hey Victor,

don't forget that Mercurial has to cope with filenames in its history that are 25 years old. Yes, that predates Mercurial but some of the older repos have had a long history as CVS then SVN.

Factor in the very strong stability requirements and the fact that any risk of changing a hash value is to be avoided, and it's no wonder a VCS is one of the last to take the plunge. It's really not a matter of size of the codebase in this case.

Note: I wasn't directly involved in Mercurial at the time you were engaging with the project about that, I hope some good came out of it anyway.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 2:18 UTC (Tue) by flussence (guest, #85566) [Link]

This was a sore point in Perl 6 too for many years due to its over-eagerness to destructively normalise everything on read. It fixed it eventually by adding a special encoding, similar to how Java has Modified UTF-8. It's not perfect, but without mandating a charset and normalisation at the filesystem level (something only Apple's dared to do) nothing is.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 7:57 UTC (Tue) by epa (subscriber, #39769) [Link] (2 responses)

How many Mercurial users store non-Unicode file names in a repository? Perhaps if the Mercurial developers had declared that from now on hg requires Unicode-clean filenames, their port to Python 3 would have gone much smoother.

If you do want a truly arbitrary ‘bag of bytes’ not just for file contents but for names too, I have the feeling you’d probably be using a different tool anyway.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 15:39 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

> Perhaps if the Mercurial developers had declared that from now on hg requires Unicode-clean filenames

Losing the ability to read history of when the tool did not have such a restriction would not be a good thing. Losing the ability to manipulate those files (even to rename them to something valid) would also be tricky if it failed up front about bad filenames.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 15, 2020 18:58 UTC (Wed) by hkario (subscriber, #94864) [Link]

it's easy to end up with malformed names in file system

just unzip a file from non-UTF-8 system, you're almost guaranteed to get mojibake as a result; then blindly commit files to the VCS and bam, you're set

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 11:35 UTC (Tue) by dvdeug (guest, #10998) [Link] (5 responses)

Which means there's no way to reliably share a Mercurial repository between Windows and Unix. You can either accept all filenames or make repositories portable between Windows and Unix, not both. Note that even pretending that you can support both systems ignores those whole "arbitrary byte strings" and "arbitrary uint16_t strings" issues. I'd certainly feel comfortable with Mercurial and other tools rejecting junk file names, though I can see where people with old 8-bit charset filenames in their history could have problems.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 11:59 UTC (Tue) by roc (subscriber, #30627) [Link] (4 responses)

> You can either accept all filenames or make repositories portable between Windows and Unix, not both.

You can accept all filenames and make repositories portable between Windows and Unix if they have valid Unicode filenames. AFAIK that's what Mercurial does, and I hope it's what git does.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 12:33 UTC (Tue) by dezgeg (subscriber, #92243) [Link] (3 responses)

Not quite enough... Let's not forget the portability troubles of Mac, where the filesystem does Unicode (de)normalization behind the application's back.
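The normalization issue can be shown with the standard `unicodedata` module. This is only an illustration of the two forms; which form a particular Mac filesystem stores or returns is not something this sketch checks.

```python
import unicodedata

# Precomposed form (NFC): a single code point, U+00E9.
nfc = "\u00e9"

# Decomposed form (NFD): "e" followed by U+0301 COMBINING ACUTE ACCENT.
nfd = unicodedata.normalize("NFD", nfc)

# The two strings render identically but compare unequal,
# and they encode to different bytes.
same_as_written = nfc == nfd
same_after_nfc = unicodedata.normalize("NFC", nfd) == nfc
```

A tool that stores the name it wrote and later compares it with the name the filesystem reports can therefore see a "different" file unless it normalizes both sides first.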

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 13:21 UTC (Tue) by roc (subscriber, #30627) [Link] (1 responses)

OK sure. The point is: you can preserve native filenames, and also ensure that repos are portable to any OS/filesystem that can represent the repo filenames correctly. That's what I want any VCS to do.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 15:51 UTC (Tue) by Wol (subscriber, #4433) [Link]

Do what git does with line endings, maybe?

They had a load of grief with mixed Windows/linux repos, so there's now a switch that says "convert cr/lf on checkout/checkin".

Add a switch that says "enforce valid utf-8/utf-16/Apple filenames, and sort out the mess at checkout/checkin".

If that's off by default, or on by default for new repos, or whatever, then at least NEW stuff will be sane, even if older stuff isn't.

Cheers,
Wol

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 15:42 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

There are also the invalid path components on Windows. Other than the reserved names and characters, space and `.` are not allowed to be the end of a path component. All the gritty details: https://docs.microsoft.com/en-us/windows/win32/fileio/nam...
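A rough sketch of a portability check for a single path component, based on the rules summarized above. `windows_problems` is a hypothetical helper, and the reserved lists below are my reading of the Microsoft naming rules rather than an exhaustive implementation:

```python
# Device names that Windows reserves, with or without an extension.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

# Characters that may not appear in a Windows path component.
RESERVED_CHARS = set('<>:"/\\|?*')


def windows_problems(component: str) -> list:
    """List the reasons a path component would be rejected on Windows."""
    problems = []
    if any(ch in RESERVED_CHARS or ord(ch) < 32 for ch in component):
        problems.append("reserved character")
    # "aux.c" is as reserved as "AUX": only the part before the first dot counts.
    if component.split(".")[0].upper() in RESERVED_NAMES:
        problems.append("reserved name")
    # "." and ".." are directory references, not names ending in a dot.
    if component not in (".", "..") and component.endswith((" ", ".")):
        problems.append("trailing space or dot")
    return problems
```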

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 13, 2020 23:57 UTC (Mon) by prometheanfire (subscriber, #65683) [Link]

OpenStack is working on dropping python2 support this cycle. The problem is going to be ongoing support for older versions that still support python2. Just over the weekend the gate crashed on the newest setuptools version being installed in python2 when it's python3 only. It's gonna be rough, and we at least semi-prepared for this.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 5:40 UTC (Tue) by ssmith32 (subscriber, #72404) [Link] (35 responses)

So confused by this (but I don't really follow projects in either language.. well some Go ones that were always Go).

Python to Go seems like a weird switch. I tend to use them for very different tasks.

Unless you're bound to GCP as a platform or something similar.

But you're not the only one mentioning this: what projects have I missed that made the switch?

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 16:02 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (34 responses)

> Python to Go seems like a weird switch. I tend to use them for very different tasks.
It actually is not, if you're writing something that is not a Jupyter notebook.

Stuff like command-line utilities and servers works really well in Go.

Several huge Python projects are migrating to Go as a result.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 17:06 UTC (Tue) by mgedmin (subscriber, #34497) [Link] (33 responses)

> Several huge Python projects are migrating to Go as a result.

Can you name them?

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 17:10 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (32 responses)

YouTube is one high-profile example. Salesforce also did a lot of rewriting internally.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 18:09 UTC (Tue) by nim-nim (subscriber, #34454) [Link] (31 responses)

Youtube migrating to the Google programming language is not surprising.

As for the rest, a lot of infra-related things are being rewritten in Go just because containers (k8s and docker both use Go). That has little to do with the benefits offered by the language. It’s good old network effects. When you’re the container language, and everyone wants to do containers, being decent is sufficient to carry the day.

No one will argue that Go is less than decent. Many will argue it’s more than decent, but that’s irrelevant for its adoption curve.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 18:25 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Rewriting a project the scope of Youtube is not a small thing. And from my sources, Py2.7->3 migration was one of the motivating factors. After all, if you're rewriting everything then why not switch a language as well?

Mind you, Google actually tried to fix some of the Python issues by trying JIT compilation with unladen-swallow project before that.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 19:05 UTC (Tue) by rra (subscriber, #99804) [Link]

Go is way, way faster than Python, consumes a lot less memory, and doesn't have the global interpreter lock so has much better multithreading. That's why you see a lot of infrastructure code move to Go.

For most applications, the speed changes don't matter and other concerns should dominate. But for core infrastructure code for large cloud providers, they absolutely matter in key places, and Python is not a good programming language for high-performance networking code.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 18, 2020 13:26 UTC (Sat) by HelloWorld (guest, #56129) [Link] (28 responses)

> No one will argue that Go is less than decent.
I will. Go is the single worst programming language design that achieved any kind of popularity in the last 10 years at least. It is archaic and outdated in pretty much every imaginable way. It puts stuff into the language that doesn't belong there, like containers and concurrency, and doesn't provide you with the tools that are needed to implement these where they belong, which is in a library. The designers of this programming language are actively pushing us back into the 1970s, and many people appear to be applauding that. It's nothing short of appalling.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 18, 2020 14:22 UTC (Sat) by smurf (subscriber, #17840) [Link] (27 responses)

Hmm. I agree with much of that. However: concurrency definitely does belong in a modern language. Just not the unstructured way Go does it. Cf. https://en.wikipedia.org/wiki/Structured_concurrency – which notes that the reasonable way to do it is via some cancellation mechanism, which also needs to be built into the language to be effective – but Go doesn't have one.

The other major gripe with Go which you missed, IMHO, is its appalling error handling; the requirement to return an "(err,result)" tuple and checking "err" *everywhere* (as opposed to a plain "result" and propagating exceptions via catch/throw or try/raise or however you'd call it) causes a "yay unreadable code" LOC explosion and can't protect against non-functional errors (division by zero, anybody?).

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 0:09 UTC (Sun) by HelloWorld (guest, #56129) [Link] (26 responses)

> Hmm. I agree with much of that. However: concurrency definitely does belong in a modern language.
It absolutely does not. Concurrency is an ever-evolving, complex topic, and if you bake any particular approach into the language, it's impossible to change it when we discover better ways of doing it. Java tried this and failed miserably (synchronized keyword). Scala didn't put it into the language. Instead, what happened is that people came up with better and better libraries. First you had Scala standard library Futures, which was a vast improvement over anything Java had to offer at the time. But they were inefficient (many context switches), had no way to interrupt a concurrent computation or safely handle resources (open file handles etc.) and made stack traces useless. Over the years, a series of better and better libraries (Monix, cats-effect) were developed, and now the ZIO library solves every single one of these and a bunch more. And you know what? Two years from now, ZIO will be better still, or we'll have a new library that is even better.

By contrast, Scala does have language support for exceptions. It's pretty much the same as Java's try/catch/finally, how did that hold up? It's a steaming pile of crap. It interacts poorly with concurrency, it easily leads to resource leaks, it's hard to compose, it doesn't tell you which errors can occur where, and everybody who knows what they're doing is using a library instead, because libraries like ZIO don't have *any* of these problems.

So based on that experience, you're going to have a hard time convincing me that concurrency needs language support. Feel free to try anyway, but try ZIO first.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 1:03 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (25 responses)

The best and most maintainable way to write servers is still thread-per-request. Go makes that easy with its lightweight threads. Much better than any async library I've tried.

It really is that simple.

Plus, Go has a VERY practical runtime with zero dependency executables and a good interactive GC. It's amazing how much better Golang's simple mark&sweep is when compared to Java's neverending morass of CMS or G1GC (that constantly require 'tuning').

Sure, I would like a bit more structured concurrency in Go, but this can come later once Go team rolls out generics.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 5:47 UTC (Sun) by HelloWorld (guest, #56129) [Link] (24 responses)

> The best and most maintainable way to write servers is still thread-per-request. Go makes that easy with its lightweight threads. Much better than any async library I've tried.

Apparently you haven't tried ZIO, because it beats the pants off anything Go can do.

It really is that simple.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 6:01 UTC (Sun) by HelloWorld (guest, #56129) [Link] (23 responses)

I'll give just one example. With ZIO it is possible to terminate a fiber without writing custom code (e. g. checking a shared flag) and without leaking resources that the fiber may have acquired. This is simply not possible in Go.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 7:58 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (22 responses)

ZIO is not even in contention, since it's built by pure functional erm... how to say it politely... adherents.

Meanwhile, Go is written by practical engineers. Cancellation and timeouts are done through the use of an explicitly passed context.Context, and resource cleanups are done through deferred blocks.

These two simple methods in practice allow complicated systems comprising hundreds of thousands of LOC to work reliably, while being easy to develop and iterate on, without requiring multi-minute waits for one compile/run cycle.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 10:25 UTC (Sun) by smurf (subscriber, #17840) [Link] (16 responses)

Passing a Context around, checking every function return for errors, and manually terminating no-longer-needed Goroutines isn't exactly free. It bloats the code, it's error prone, it's too easy to get wrong accidentally, and it makes the code much less readable.

If you come across a better paradigm sometime in the future, then bake it into a new version of the language and/or its libraries, and add interoperability features. Python3 is doing this, incidentally: asyncio is a heap of unstructured callbacks that evolved from somebody noticing that you can use "yield from" to build a coroutine runner, then Trio came along with a much better concept that actually enforces structure. Today the "anyio" module affords the same structured concept on top of asyncio, and in some probably-somewhat-distant future asyncio will support all that natively.

Languages, and their standard libraries, evolve.

With Go, this transition to Structured Concurrency is not going to happen any time soon because contexts and structure are nice-to-have features which are entirely optional and not supported by most libraries, thus it's much easier to simply ignore all that fancy structured stuff (another boilerplate argument you need to pass to every goroutine and another clause to add to every "select" because, surprise, there's no built-in cancellation? get real) and plod along as usual. The people in charge of Go do not want to change that. Their choice, just as it's my choice not to touch Go.
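For comparison, here is a small sketch of what built-in cancellation looks like with Python's standard asyncio (not Trio or anyio, to keep it dependency-free). The `worker` coroutine and the `events` list are made up for the example: the timeout cancels the task, and the `finally` block still runs, so the "resource" is released without passing a context object around.

```python
import asyncio

events = []


async def worker():
    """Pretend to hold a resource while doing slow work."""
    events.append("acquired")
    try:
        await asyncio.sleep(10)      # cancelled long before this finishes
        events.append("finished")
    finally:
        events.append("released")    # runs on normal exit and on cancellation


async def main():
    try:
        # wait_for cancels worker() when the timeout expires.
        await asyncio.wait_for(worker(), timeout=0.01)
    except asyncio.TimeoutError:
        return "timed out"
    return "completed"


outcome = asyncio.run(main())
```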

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 12:35 UTC (Sun) by HelloWorld (guest, #56129) [Link] (15 responses)

> If you come across a better paradigm sometime in the future, then bake it into a new version of the language
You haven't yet demonstrated a single advantage of putting this into the language rather than a library, which is much more flexible and easier to evolve. Your thinking that this needs to be done in the language is probably a result of too much exposure to crippled languages like Python.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 14:35 UTC (Sun) by smurf (subscriber, #17840) [Link] (14 responses)

Go doesn't have automatic cleanup. Each call to "open file" requires a "defer close file". The point of structured code is that it's basically impossible, or at least a lot harder, to violate the structural requirements.

NB, Python also has the whole thing in a library. This is not primarily about language features. The problem is that it is simply impossible to add this to Go without either changing the language, or forcing people to write even more convoluted code.

Python basically transforms "result = foo(); return result" into what Go would call "err, result = foo(Context); if (err) return err, nil; return nil,result" behind the scenes. (If you also want to handle cancellations, it gets even worse – and handling cancellation is not optional if you want a correct program.) I happen to believe that forcing each and every programmer to explicitly write the latter code instead of the former, for pretty much every function call whatsoever, is an unproductive waste of everybody's time. So don't talk to me about Python being crippled, please.
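A sketch of the Python side of that comparison, with made-up function names (`resource`, `parse`, `load`, `run`): the intermediate function contains no error check at all, the exception propagates through it, and the `with` block still cleans up on the way out.

```python
from contextlib import contextmanager

log = []


@contextmanager
def resource(name):
    """Hypothetical resource whose cleanup always runs when the block exits."""
    log.append(f"open {name}")
    try:
        yield name
    finally:
        log.append(f"close {name}")


def parse(text):
    return int(text)            # raises ValueError on bad input


def load(text):
    with resource("config"):
        return parse(text)      # no explicit error check here


def run(text):
    try:
        return load(text)
    except ValueError:
        return None             # the error is handled once, at the top
```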

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 21:17 UTC (Sun) by HelloWorld (guest, #56129) [Link] (13 responses)

> NB, Python also has the whole thing in a library. This is not primarily about language features.

It very much is about language features. Python has dedicated language support for list comprehensions, concurrency and error handling. But there is no need for that. Consider these:

x = await getX()
y = await getY(x)
return x + y

[ x + y
  for x in getX()
  for y in getY(x)
]

The structure is the same: we obtain an x, then we obtain a y that depends on x (expressed by the fact that getY takes x as a parameter), then we return x + y. The details are of course different, because in one case we obtain x from an async task, and in the other we obtain x from a list, but there's nevertheless a common structure. Hence, Scala offers syntax that covers both of these use cases:

for {
  x <- getX()
  y <- getY(x)
} yield x + y

And this is a much better solution than what Python does, because now you get to write generic code that works in a wide variety of contexts including error handling, concurrency, optionality, nondeterminism, statefulness and many, many others that we can't even imagine today.

> Python basically transforms "result = foo(); return result" into what Go would call "err, result = foo(Context); if (err) return err, nil; return nil,result" behind the scenes. (If you also want to handle cancellations, it gets even worse – and handling cancellation is not optional if you want a correct program.) I happen to believe that forcing each and every programmer to explicitly write the latter code instead of the former, for pretty much every function call whatsoever, is an unproductive waste of everybody's time. So don't talk to me about Python being crippled, please.

This is a false dichotomy. Not having error handling built into the language doesn't mean you have to check for errors on every call.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 20, 2020 10:33 UTC (Mon) by smurf (subscriber, #17840) [Link] (12 responses)

> It very much is about language features.

Well, sure, if you have a nice functional language where everything is lazily evaluated then of course you can write generic code that doesn't care whether the evaluation involves a loop or a context switch or whatever.

But while Python is not such a language, neither is Go, so in effect you're shifting the playing ground here.

> Not having error handling built into the language doesn't mean you have to check for errors on every call.

No? then what else do you do? Pass along a Haskell-style "Maybe" or "Either"? that's just error checking by a different name.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 20, 2020 12:51 UTC (Mon) by HelloWorld (guest, #56129) [Link] (1 responses)

> Well, sure, if you have a nice functional language where everything is lazily evaluated then of course you can write generic code that doesn't care whether the evaluation involves a loop or a context switch or whatever.

You don't need lazy evaluation for this to work. Scala is not lazily evaluated and it works great there.

> No? then what else do you do? Pass along a Haskell-style "Maybe" or "Either"? that's just error checking by a different name.

You can factor out the error checking code into a function, so you don't need to write it more than once. After all, this is what we do as programmers: we detect common patterns, like “call a function, fail if it failed and proceed if it didn't” and factor them out into functions. This function is called flatMap in Scala, and it can be used like so:

getX().flatMap { x =>
  getY(x).map { y =>
    x + y
  }
}

But this is arguably hard to read, which is why we have for comprehensions. The following is equivalent to the above code:

for {
  x <- getX()
  y <- getY(x)
} yield x + y

I would argue that if you write it like this, it is no harder to read than what Python gives you:

x = getX()
y = getY(x)
return x + y

But the Scala version is much more informative. Every function now tells you in its type how it might fail (if at all), which is a huge boon to maintainability. You can also easily see which function calls might return an error, because you use <- instead of = to obtain their result. It is also much more flexible, because it is not limited to error handling: the same syntax works for concurrency and other effects. And it is compositional, meaning that a function that is both concurrent and can fail works too.
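For readers who think in Python, the "factor the error check out into a function" idea can be sketched roughly as follows. Ok, Err, flat_map, get_x, get_y and total are all hypothetical names chosen for this illustration; none of them is a standard-library API, and a type checker would be needed to get anything like Scala's compile-time guarantees.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T

@dataclass(frozen=True)
class Err:
    reason: str

def flat_map(result, f):
    """Call f on success; pass the first error along untouched."""
    return f(result.value) if isinstance(result, Ok) else result

def get_x():
    return Ok(2)

def get_y(x):
    # Fails for non-positive input, purely to have an error case to show.
    return Ok(x * 10) if x > 0 else Err("x must be positive")

def total(x_result):
    # The error check is written once, inside flat_map, not at every call site.
    return flat_map(x_result, lambda x: flat_map(get_y(x), lambda y: Ok(x + y)))
```

Calling total(get_x()) gives Ok(22), while total(Ok(0)) and total(Err("boom")) both come back as the Err value without total containing a single explicit error check.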

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 20, 2020 18:30 UTC (Mon) by darwi (subscriber, #131202) [Link]

> But the Scala version is much more informative. Every function now tells you in its type how it might fail (if at all), which is a huge boon to maintainability

A long time ago (~2013), I worked as a backend SW engineer. We migrated our code from Java (~50K lines) to Scala (~7K lines, same functionality).

After the transition was complete, not a single NullPointerException was seen anywhere in the system, thanks to the Option[T] generics and pattern matching on Some()/None. It really made a huge difference.

NULL is a mistake in computing that no modern language should imitate :-( After my Scala experience, I dread using any language that openly accepts NULLs (Python 3, when used in a large 20k+ line codebase, included!).

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 20, 2020 15:46 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (9 responses)

> Pass along a Haskell-style "Maybe" or "Either"? that's just error checking by a different name.

Yes, but with these types, *ignoring* (or passing on in Python) the error takes explicit steps rather than being implicit. IMO, that's a *far* better default. I would think the Zen of Python agrees…

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 20, 2020 17:21 UTC (Mon) by HelloWorld (guest, #56129) [Link] (8 responses)

> Yes, but with these types, *ignoring* (or passing on in Python) the error takes explicit steps rather than being implicit. IMO, that's a *far* better default.

No, passing the error on does not take explicit steps, because the monadic bind operator (>>=) takes care of that for us. And that's a Good Thing, because in the vast majority of cases that is what you want to do. The problem with exceptions isn't that error propagation is implicit (that is actually a feature), but that they interact poorly with the type system, resources that need to be closed, concurrency, etc.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 20, 2020 18:28 UTC (Mon) by smurf (subscriber, #17840) [Link] (6 responses)

Python doesn't have a problem with resources to be closed (that's what "with foo() as bar"-style context managers are for), nor concurrency (assuming that you use Trio or anyio).

Typing exceptions is an unsolved problem; conceivably it could be handled by a type checker like mypy. However, in actual practice most code is documented as possibly-raising a couple of well-known "special" exceptions derived from some base type ("HTTPError"), but might actually raise a couple of others (network error, cancellation, character encoding …). Neither listing them all separately (much too tedious) nor using a catch-all BaseException (defeats the purpose) is a reasonable solution.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 20, 2020 22:44 UTC (Mon) by HelloWorld (guest, #56129) [Link] (2 responses)

> Python doesn't have a problem with resources to be closed (that's what "with foo() as bar"-style context managers are for), nor concurrency (assuming that you use Trio or anyio).

Sure, you can solve every problem that arises from adding magic features to the language by adding yet more magic. First, they added exceptions. But that interacted poorly with resource cleanup, so they added `with` to fix that. Then they realized that this fix interacts poorly with asynchronous code, and they added `async with` to cope with that. So yes, you can do it that way, because given enough thrust, pigs fly just fine. But you have yet to demonstrate a single advantage that comes from doing so.

On the other hand, there are trivial things that can't be done with `with`. For instance, if you want to acquire two resources, do stuff and then release them, you can just nest two `with` statements. But what if you want to acquire one resource for each element in a list? You can't, because that would require you to nest `with` statements as many times as there are elements in the list. In Scala with a decent library (ZIO or cats-effect), resources are a Monad, and lists have a traverse method that works with ALL monads, including the one for resources and the one for asynchronous tasks. But while asyncio.gather (which is basically the traverse equivalent for asynchronous code) does exist, there's no such thing in contextlib, which proves my point exactly: you end up with code that is constrained to specific use cases when it could be generic and thus much easier to learn because it works the same for _all_ monads.
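For reference, the asyncio.gather pattern referred to here looks roughly like the sketch below. The names traverse and fetch_length are hypothetical and exist only for this example; asyncio.gather and asyncio.run are the real standard-library calls.

```python
import asyncio

async def fetch_length(word):
    # Stand-in for real asynchronous work; yields to the event loop once.
    await asyncio.sleep(0)
    return len(word)

async def traverse(items, f):
    """Run f on every item concurrently and collect the results in order."""
    return await asyncio.gather(*(f(item) for item in items))

lengths = asyncio.run(traverse(["hg", "git", "svn"], fetch_length))
```

Note that this helper only works for coroutines, which is the complaint being made: it is a traverse for one specific context, not for all of them.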

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 21, 2020 6:51 UTC (Tue) by smurf (subscriber, #17840) [Link] (1 responses)

> But what if you want to acquire one resource for each element in a list?

You use an [Async]ExitStack. It's even in contextlib.
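A minimal sketch of what that looks like, assuming a toy resource() context manager and an events log that exist only for this illustration (ExitStack and enter_context are the actual contextlib API):

```python
from contextlib import ExitStack, contextmanager

events = []

@contextmanager
def resource(name):
    # Toy resource that records when it is opened and closed.
    events.append(f"open {name}")
    try:
        yield name
    finally:
        events.append(f"close {name}")

def use_all(names):
    # One resource per list element, without nesting `with` statements.
    with ExitStack() as stack:
        handles = [stack.enter_context(resource(n)) for n in names]
        events.append("use " + ",".join(handles))
    return handles

use_all(["a", "b", "c"])
```

The stack releases everything in reverse order of acquisition when the block exits, including when an exception is raised partway through the list.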

Yes, functional languages with Monads and all that stuff in them are super cool. No question. They're also super hard to learn compared to, say, Python.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 21, 2020 14:32 UTC (Tue) by HelloWorld (guest, #56129) [Link]

> You use an [Async]ExitStack. It's even in contextlib.
You can *always* write more code to fix any problem. That isn't the issue here, it's about code reuse. ExitStack shouldn't be needed, and neither should AsyncExitStack. These aren't solutions but symptoms.

> They're also super hard to learn compared to, say, Python.
For the first time, you're actually making an argument for putting the things in the language. But I'm not buying it, because I see how much more stuff I need to learn about in Python that just isn't necessary in fp. There's no ExitStack or AsyncExitStack in ZIO. There's no `with` statement. There's no try/except/finally, there's no ternary operator, no async/await, no assignment expressions, none of that nonsense. It's all just functions and equational reasoning. And equational reasoning is awesome _because_ it is so simple that we can teach it to high school students.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 21, 2020 2:32 UTC (Tue) by HelloWorld (guest, #56129) [Link] (2 responses)

I also think you're conflating two separate issues when it comes to error handling: language and library design. On the language side, this is mostly a solved problem. All you need is sum types, because they allow you to express that a computation either succeeded with a certain type or failed with another. The rest can be done in libraries.

If listing the errors that an operation can throw is too tedious, I would argue that that is not a language problem but a library design problem, because if you can't even list the errors that might happen in your function, you can't reasonably expect people to handle them either. You need to constrain the number of ways in which a function can fail, normally by categorising them in some way (e.g. technical errors vs. business domain errors). I think this is actually yet another way in which strongly typed functional programming pushes you towards better API design.
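As a rough illustration of that kind of categorisation, here is a Python sketch in which a function can fail in exactly two documented ways. Everything in it (pay, describe, Receipt, TechnicalError, DomainError, PaymentResult) is invented for the example, and Python only checks the union with an external type checker such as mypy, not at runtime.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class TechnicalError:
    detail: str  # e.g. a timeout or an unreachable service

@dataclass(frozen=True)
class DomainError:
    detail: str  # e.g. a business rule that was violated

@dataclass(frozen=True)
class Receipt:
    amount: int

# The return type lists every outcome a caller has to think about.
PaymentResult = Union[Receipt, TechnicalError, DomainError]

def pay(amount: int, balance: int, network_up: bool = True) -> PaymentResult:
    if not network_up:
        return TechnicalError("payment gateway unreachable")
    if amount > balance:
        return DomainError("insufficient funds")
    return Receipt(amount)

def describe(result: PaymentResult) -> str:
    if isinstance(result, Receipt):
        return f"paid {result.amount}"
    if isinstance(result, TechnicalError):
        return f"retry later: {result.detail}"
    return f"rejected: {result.detail}"
```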

Unfortunately Scala hasn't proceeded along this path as far as I would like, because much of the ecosystem is based on cats-effect where type-safe error handling isn't the default. ZIO does much better, which is actually a good example of how innovation can happen when you implement these things in libraries as opposed to the language. Java has checked exceptions, and they're utterly useless now that everything is async...

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 21, 2020 7:13 UTC (Tue) by smurf (subscriber, #17840) [Link] (1 responses)

> Java has checked exceptions, and they're utterly useless now that everything is async

… and unstructured.

The Java people have indicated that they're going to migrate their async concepts towards Structured Concurrency, at which point they'll again be (somewhat) useful.

> If listing the errors that an operation can throw is too tedious, I would argue that that is not a language problem but a library design problem

That's one side of the coin. The other is that IMHO a library which insists on re-packaging every kind of error under the sun in its own exception type is intensely annoying because that loses or hides information.

There's not much commonality between a Cancellation, a JSON syntax error, a character encoding problem, or an HTTP 50x error, yet an HTTP client library might conceivably raise any one of those. And personally I have no problem with that – I teach my code to retry any 40x errors with exponential back-off and leave the rest to "retry *much* later and alert a human", thus the next-higher error handler is the Exception superclass anyway.
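A sketch of that policy, under stated assumptions: HTTPError here is a stand-in class defined for the example rather than any particular client library's exception, and retry_with_backoff, is_retryable and flaky are hypothetical names. Which status codes count as retryable is left to the caller's predicate.

```python
import time

class HTTPError(Exception):
    """Stand-in for a client library's HTTP error type (assumption)."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

def retry_with_backoff(call, is_retryable, attempts=4, base_delay=0.5, sleep=time.sleep):
    for attempt in range(attempts):
        try:
            return call()
        except HTTPError as exc:
            # Give up on non-retryable errors and on the final attempt.
            if not is_retryable(exc) or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...

# Demo: fail twice, then succeed; record delays instead of really sleeping.
delays = []
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise HTTPError(408)
    return "ok"

result = retry_with_backoff(flaky, lambda e: 400 <= e.status < 500, sleep=delays.append)
```

Anything that is not an HTTPError (cancellation, encoding problems and so on) passes straight through to whatever outer handler catches Exception.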

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Mar 19, 2020 16:52 UTC (Thu) by bjartur (guest, #67801) [Link]

Nesting result types explicitly is helpful because it makes you wonder when an exponential backoff is appropriate.

How about getWeather :: String → DateTime → IO (DnsResponse (TcpSession (HttpResponse (Json Weather)))) where each layer can fail? Of course, there's some leeway in choosing how to layer the types (although handling e.g. out-of-memory errors this way would be unreasonable IMO).

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 21, 2020 22:42 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

Even `>>=` is explicit error handling here (as it is with all the syntactic sugar that boils down to it). Using convenience operators like `>>=`, Rust's Result::and_then, or other similar methods is explicitly handling error conditions. Because the compiler knows about them, it can clean up all the resources and such in a known way, versus the unwinder figuring out what to do.

As a code reviewer, I find implicit codepaths harder to reason about, and they leave me less confident when reviewing such code (though the bar may also be lower in these cases because error reporting of escaping exceptions may be louder, ignoring instances of the `except BaseException: pass` anti-pattern).

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 11:23 UTC (Sun) by HelloWorld (guest, #56129) [Link] (4 responses)

> Zio is not even in contention, since it's built by pure functional erm... how to say it politely... adherents.

You're free to stick with purely dysfunctional programming then. Have fun!

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 18:49 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

Indeed. It's way superior because it's actually used in practice.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 21:19 UTC (Sun) by HelloWorld (guest, #56129) [Link] (2 responses)

Well, so is pure FP, which you obviously have no clue about and reject for purely ideological reasons. Oh well, your loss.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 21:25 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

I actually worked on a rather large project in Haskell (a CPU circuit simulator) and I don't have many fond memories about it. I also spent probably several months in aggregate waiting for Scala code to compile.

My verdict is that pure FP languages are used only for ideological reasons and are totally impractical otherwise.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 19, 2020 22:42 UTC (Sun) by HelloWorld (guest, #56129) [Link]

So you worked on a bad Haskell project and that somehow makes functional programming impractical? That's not how logic works, but it does explain your unsubstantiated knee-jerk reactions to everything fp.

> I also spent probably several months in aggregate waiting for Scala code to compile.
There is some truth to this; it would be nice if the compiler were faster. That said, it has become significantly faster over the years and it's not nearly slow enough to make programming in Scala “totally impractical”. And the fact that I was able to name a very simple problem (“make an asynchronous operation interruptible without writing (error-prone) custom code and without leaking resources”) that has a trivial solution with ZIO and no solution at all in Go proves that pure fp has nothing to do with ideology. It solves real-world problems. There's a reason why React took over in the frontend space: it works better than anything else because it's functional.