Szorc: Mercurial's Journey to and Reflections on Python 3
Posted Jan 13, 2020 20:32 UTC (Mon) by Cyberax (✭ supporter ✭, #52523). In reply to: Szorc: Mercurial's Journey to and Reflections on Python 3 by pj
Parent article: Szorc: Mercurial's Journey to and Reflections on Python 3
Indeed. Python maintainers decided to pick and choose them. Only good-behaving users who like eating their veggies ( https://snarky.ca/porting-to-python-3-is-like-eating-your... ) were allowed in.
As a result, Py3 has lost several huge codebases that started migrating to Go instead. Other projects like Mercurial or OpenStack only started migrating at the very last moment, because of the 2.7 EoL.
Posted Jan 13, 2020 21:01 UTC (Mon)
by vstinner (subscriber, #42675)
Like Mercurial, Twisted (a networking framework) is heavily based on bytes, and it was successfully ported to Python 3 a few years ago. Twisted can now be used with asyncio.
I tried to help port Mercurial to Python 3, but its maintainers were not really open to discussing Python 3 at the time. Well, I wanted to use Unicode for filenames, and they didn't want to hear about this idea. I gave up ;-)
Posted Jan 13, 2020 22:16 UTC (Mon)
by excors (subscriber, #95769)
The article mentions that issue: POSIX filenames are arbitrary byte strings. There is simply no good lossless way to decode them to Unicode. (There's PEP 383, but that produces strings that are not quite Unicode; e.g. they become impossible to encode as UTF-16, so that's not good.) And Windows filenames are arbitrary uint16_t strings, with no good lossless way to decode them to Unicode. For an application whose job is to manage user-created files, it's not safe to make assumptions about filenames; it has to be robust to whatever the user throws at it.
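A minimal Python illustration of the "no lossless way" point (the two filenames are made-up examples, and this only covers UTF-8 with the standard strict and "replace" error handlers): strict decoding rejects a Latin-1 name outright, and lossy decoding maps two different names to the same string.

```python
# Two different POSIX filenames: "café" and "cafè", written in Latin-1 rather than UTF-8.
name_a = b"caf\xe9"
name_b = b"caf\xe8"


def strict_decode_fails(raw: bytes) -> bool:
    """Return True if raw is not valid UTF-8 (strict is the default handler)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# The lossy "replace" handler turns each undecodable sequence into U+FFFD...
lossy_a = name_a.decode("utf-8", "replace")
lossy_b = name_b.decode("utf-8", "replace")

# ...so two distinct files collapse into the same Unicode name.
print(strict_decode_fails(name_a), lossy_a == lossy_b)  # True True
```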
(The article also mentions the solution, as implemented in Rust: filenames are a platform-specific string type, with lossy conversions to Unicode if you really want that (e.g. to display to users).)
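A rough Python analogue of that design, as a sketch only: FileName and list_names below are hypothetical helpers, not Rust's OsString/OsStr and not a standard Python API. The raw bytes are the identity of the name, and the only conversion to str is explicitly lossy and meant for display.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileName:
    """Hypothetical platform filename type: the raw bytes are the source of truth."""
    raw: bytes

    def display(self) -> str:
        # Deliberately lossy: undecodable bytes become U+FFFD.
        # Only for showing to users; use self.raw to reopen the file.
        return self.raw.decode("utf-8", "replace")


def list_names(directory: bytes = b".") -> List[FileName]:
    # Passing a bytes path makes os.listdir() return bytes names.
    return [FileName(n) for n in os.listdir(directory)]


a = FileName(b"caf\xe9")
b = FileName(b"caf\xe8")
print(ascii(a.display()), ascii(b.display()))  # 'caf\ufffd' 'caf\ufffd'
print(a == b)                                  # False: equality uses the bytes
```

Keeping equality on the bytes means two names that merely look the same on screen are never confused.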
Posted Jan 13, 2020 23:19 UTC (Mon)
by vstinner (subscriber, #42675)
On Python 3, there is a good practical solution for that: Python uses the surrogateescape error handler (PEP 383) by default for filenames. It escapes undecodable bytes as Unicode surrogate characters.
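A small sketch of what surrogateescape does, using explicit codec calls instead of os.fsdecode()/os.fsencode() so the result doesn't depend on the platform or locale (the filename is a made-up example): each undecodable byte 0xNN becomes the lone surrogate U+DCNN, and encoding with the same handler restores the original bytes.

```python
raw = b"caf\xe9.txt"  # not valid UTF-8: 0xE9 is a Latin-1 "é"

# Decoding: the undecodable byte 0xE9 is smuggled through as U+DCE9.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\udce9.txt'

# Encoding with the same handler turns U+DCE9 back into the byte 0xE9.
assert name.encode("utf-8", "surrogateescape") == raw
```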
Read my articles https://vstinner.github.io/python30-listdir-undecodable-f... and https://vstinner.github.io/pep-383.html for the history of Unicode usage for filenames in the early days of Python 3 (Python 3.0 and Python 3.1).
The problem is that the UTF-8 codec of Python 2 doesn't respect the Unicode standard: it encodes surrogate characters. The Python 3 codec refuses to encode them, which makes it possible to use the surrogateescape error handler with UTF-8.
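The difference can be observed from Python 3 itself, assuming the "surrogatepass" handler is a fair stand-in for the old Python 2 behaviour (it writes surrogates out as three-byte sequences): the strict Python 3 decoder rejects those bytes, so no valid UTF-8 input ever decodes to a surrogate, and U+DC80..U+DCFF are free to represent escaped bytes.

```python
s = "\udccc"  # the lone surrogate that surrogateescape uses for the byte 0xCC

# "surrogatepass" mimics a codec that encodes surrogates, as Python 2's did.
legacy = s.encode("utf-8", "surrogatepass")

# Python 3's strict decoder rejects those bytes as invalid UTF-8, so a
# surrogate can never come out of a *valid* UTF-8 decode...
try:
    legacy.decode("utf-8")
    strict_accepts = True
except UnicodeDecodeError:
    strict_accepts = False

# ...which leaves U+DC80..U+DCFF free to stand for raw bytes unambiguously.
escaped = s.encode("utf-8", "surrogateescape")

print(legacy, strict_accepts, escaped)  # b'\xed\xb3\x8c' False b'\xcc'
```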
> And Windows filenames are arbitrary uint16_t strings, with no good lossless way to decode them to Unicode.
I'm not sure of which problem you're talking about.
If you care about getting the same character on Windows and Linux (e.g. the letter é = U+00E9), you have to encode the filename differently on each platform. Storing the filename as Unicode in the application is a convenient way to do that. That's why Python prefers Unicode for filenames, but it also accepts filenames as bytes.
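For example (a sketch with a made-up filename): the same str encodes to different bytes for a typical UTF-8 Linux filesystem and for the UTF-16 little-endian wide-character APIs on Windows, yet both decode back to the same string.

```python
name = "caf\u00e9"  # "café"

as_utf8 = name.encode("utf-8")       # typical Linux filename bytes
as_utf16 = name.encode("utf-16-le")  # what the Windows wide-character APIs store

print(as_utf8)   # b'caf\xc3\xa9'
print(as_utf16)  # b'c\x00a\x00f\x00\xe9\x00'

# Both decode back to the very same str, which is what the application keeps.
assert as_utf8.decode("utf-8") == as_utf16.decode("utf-16-le") == name
```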
> For an application whose job is to manage user-created files, it's not safe to make assumptions about filenames; it has to be robust to whatever the user throws at it.
Well, that is where I gave up :-)
Posted Jan 13, 2020 23:29 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
Posted Jan 13, 2020 23:37 UTC (Mon)
by roc (subscriber, #30627)
Posted Jan 14, 2020 2:22 UTC (Tue)
by excors (subscriber, #95769)
You can't create a string like '\U00123456' (SyntaxError) or chr(0x123456) (ValueError); code points are limited to U+0000..U+10FFFF. But you *can* create a string like '\udccc' and Python will happily process it, at least until you try to encode it: '\udccc'.encode('utf-8') throws UnicodeEncodeError.
If you use the special decoding mode, b'\xcc'.decode('utf-8', 'surrogateescape') gives '\udccc'. If you (or some library) do that, your application is now tainted with not-really-Unicode strings, and I think that if you ever try to encode without surrogateescape, you'll risk getting an exception.
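A quick check of those behaviours in CPython 3 (the code points are arbitrary examples; raises() is a small hypothetical helper). Note that the cut-off is U+10FFFF, not the 21-bit maximum: 0x110000 and 0x123456 would both fit in 21 bits.

```python
def raises(exc_type, fn, *args) -> bool:
    """Return True if fn(*args) raises exc_type."""
    try:
        fn(*args)
    except exc_type:
        return True
    return False


# Code points stop at U+10FFFF, even though 0x110000 still fits in 21 bits.
print(raises(ValueError, chr, 0x110000))  # True

# A lone surrogate is a perfectly ordinary str object...
s = b"\xcc".decode("utf-8", "surrogateescape")
print(s == "\udccc", len(s))  # True 1

# ...until something encodes it strictly (e.g. writing a log to a UTF-8 file).
print(raises(UnicodeEncodeError, s.encode, "utf-8"))  # True
```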
If you tried to decode Windows filenames as round-trippable UCS-2, like
>>> ''.join(chr(c) for c, in struct.iter_unpack(b'>H', b'\xd8\x08\xdf\x45'))
then you'd be introducing a third type of string (after Unicode and Unicode-plus-surrogate-escapes) which seems likely to make things even worse.
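To spell out what that expression does: treating the four bytes as two independent UCS-2 code units yields two lone surrogates, whereas a real UTF-16 decoder combines the same pair into the single code point U+12345. A minimal side-by-side comparison:

```python
import struct

raw = b"\xd8\x08\xdf\x45"  # big-endian UTF-16 surrogate pair for U+12345

# "UCS-2" view: one str character per 16-bit unit; pairs are not combined.
ucs2 = "".join(chr(c) for (c,) in struct.iter_unpack(">H", raw))

# Proper UTF-16 decoding combines the pair into one code point.
utf16 = raw.decode("utf-16-be")

print(ascii(ucs2), len(ucs2))    # '\ud808\udf45' 2
print(ascii(utf16), len(utf16))  # '\U00012345' 1
```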
Posted Jan 14, 2020 2:44 UTC (Tue)
by excors (subscriber, #95769)
Incidentally, that seems to include the default encoding performed by print() (at least in Python 3.6 on my system):
>>> for f in os.listdir('.'): print(f)
os.listdir() will surrogateescape-decode and functions like open() will surrogateescape-encode the filenames, but that doesn't help if you've got e.g. logging code that touches the filenames too.
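One way to keep such names from blowing up print() or logging is to convert them explicitly for display. for_display() below is a hypothetical helper, not a standard API, and it assumes the name was decoded from UTF-8 with surrogateescape: it recovers the original bytes and then decodes them with the visible, lossy "backslashreplace" handler.

```python
def for_display(name: str) -> str:
    """Return a printable, lossy rendering of a surrogateescape-decoded name."""
    raw = name.encode("utf-8", "surrogateescape")   # recover the original bytes
    return raw.decode("utf-8", "backslashreplace")  # show bad bytes as \xNN


name = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
print(for_display(name))  # caf\xe9.txt
```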
Posted Jan 14, 2020 4:47 UTC (Tue)
by roc (subscriber, #30627)
Posted Jan 16, 2020 8:08 UTC (Thu)
by marcH (subscriber, #57642)
Yet every VCS provides some sort of autocrlf insanity, go figure.
Just in case someone wants to use Notepad-- from the last decade.
Posted Jan 13, 2020 23:32 UTC (Mon)
by roc (subscriber, #30627)
Posted Jan 16, 2020 17:40 UTC (Thu)
by kjp (guest, #39639)
Python: It's a [unstable] scripting language. NOT a systems or application language.
Posted Jan 14, 2020 1:37 UTC (Tue)
by excors (subscriber, #95769)
But then you end up with a "Unicode" string in memory which can't be safely encoded as UTF-8 or UTF-16, so it's not really a Unicode string at all. (As far as I can see, the specifications are very clear that UTF-* can't encode U+D800..U+DFFF. An implementation that does encode/decode them is wrong or is not Unicode.)
That means Python applications that assume 'str' is Unicode are liable to get random exceptions when encoding properly (i.e. without surrogateescape).
> > And Windows filenames are arbitrary uint16_t strings, with no good lossless way to decode them to Unicode.
Windows (with NTFS) lets you create a file whose name is e.g. "\ud800". The APIs all handle filenames as strings of wchar_t (equivalent to uint16_t), so they're perfectly happy with that file. But it's clearly not a valid string of UTF-16 code units (because it would be an unpaired surrogate) so it can't be decoded, and it's not a valid string of Unicode scalar values so it can't be directly encoded as UTF-8 or UTF-16. It's simply not Unicode.
In practice most native Windows applications and APIs treat filenames as effectively UCS-2, and they never try to encode or decode so they don't care about surrogates, though the text rendering APIs try to decode as UTF-16 and go a bit weird if that fails. Python strings aren't UCS-2 so it has to convert to 'str' somehow, but there's no correct way to do that conversion.
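The Windows case can be simulated on any platform, as a sketch: below, a single little-endian wchar_t holding 0xD800 is decoded as UTF-16. Strict decoding fails; the "surrogatepass" handler keeps the unpaired surrogate, but the resulting str then cannot be encoded strictly as either UTF-8 or UTF-16.

```python
def raises(exc_type, fn, *args) -> bool:
    """Return True if fn(*args) raises exc_type."""
    try:
        fn(*args)
    except exc_type:
        return True
    return False


wchar_units = b"\x00\xd8"  # one little-endian UTF-16 code unit: 0xD800, unpaired

# Strict UTF-16 rejects the unpaired surrogate...
print(raises(UnicodeDecodeError, wchar_units.decode, "utf-16-le"))  # True

# ...while "surrogatepass" keeps it, giving a str that is not valid Unicode text.
name = wchar_units.decode("utf-16-le", "surrogatepass")
print(ascii(name))  # '\ud800'

# Neither UTF-8 nor UTF-16 will encode it strictly.
print(raises(UnicodeEncodeError, name.encode, "utf-8"),
      raises(UnicodeEncodeError, name.encode, "utf-16-le"))  # True True
```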
Posted Jan 14, 2020 6:04 UTC (Tue)
by ssmith32 (subscriber, #72404)
https://docs.microsoft.com/en-us/windows/win32/fileio/nam...
Also, whatever your complaints about any language's handling of filenames, the Win32 API is worse.
It's amazingly inconsistent. The level of insanity is just astonishing, especially if you're working across files created with the Win32 API and the .NET libraries.
You *have* to P/Invoke to read some files, and use the long-path prefix (\\?\), which doesn't support relative paths. And that's just the start.
Admittedly, I haven't touched it for almost a decade in any serious fashion, but, based on the docs linked above, it doesn't seem much has changed.
It's remarkable how easy they make it to write files that are quite hard to open.
Posted Jan 14, 2020 15:35 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
Posted Jan 15, 2020 0:26 UTC (Wed)
by gracinet (guest, #89400)
Don't forget that Mercurial has to cope with filenames in its history that are 25 years old. Yes, that predates Mercurial, but some of the older repositories have had a long history as CVS and then SVN repositories.
Factor in the very strong stability requirements and the fact that any risk of changing a hash value is to be avoided, and it's no wonder a VCS is one of the last to take the plunge. It's really not a matter of codebase size in this case.
Note: I wasn't directly involved in Mercurial at the time you were engaging with the project about that, I hope some good came out of it anyway.
Posted Jan 14, 2020 2:18 UTC (Tue)
by flussence (guest, #85566)
Posted Jan 14, 2020 7:57 UTC (Tue)
by epa (subscriber, #39769)
If you do want a truly arbitrary ‘bag of bytes’ not just for file contents but for names too, I have the feeling you’d probably be using a different tool anyway.
Posted Jan 14, 2020 15:39 UTC (Tue)
by mathstuf (subscriber, #69389)
Losing the ability to read history of when the tool did not have such a restriction would not be a good thing. Losing the ability to manipulate those files (even to rename them to something valid) would also be tricky if it failed up front about bad filenames.
Posted Jan 15, 2020 18:58 UTC (Wed)
by hkario (subscriber, #94864)
Just unzip an archive from a non-UTF-8 system; you're almost guaranteed to get mojibake as a result. Then blindly commit the files to the VCS and bam, you're set.
Posted Jan 14, 2020 11:35 UTC (Tue)
by dvdeug (guest, #10998)
Posted Jan 14, 2020 11:59 UTC (Tue)
by roc (subscriber, #30627)
You can accept all filenames and make repositories portable between Windows and Unix if they have valid Unicode filenames. AFAIK that's what Mercurial does, and I hope it's what git does.
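A sketch of that policy, with a hypothetical helper name (this is not how Mercurial or git actually implement it): store whatever bytes you are given, and treat a name as portable only when it is valid UTF-8. Python's strict UTF-8 decoder also rejects encoded surrogates, so anything it accepts can be represented in UTF-16 on Windows. Other Windows restrictions, such as reserved characters and device names, are deliberately ignored here.

```python
def is_portable_name(raw: bytes) -> bool:
    """Hypothetical check: can this POSIX filename be recreated on Windows?"""
    try:
        text = raw.decode("utf-8")  # strict: also rejects encoded surrogates
    except UnicodeDecodeError:
        return False
    text.encode("utf-16-le")        # cannot fail for strictly decoded text
    return True


for raw in (b"caf\xc3\xa9", b"caf\xe9", b"\xed\xa0\x80"):
    print(raw, is_portable_name(raw))
# b'caf\xc3\xa9' True
# b'caf\xe9' False
# b'\xed\xa0\x80' False
```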
Posted Jan 14, 2020 12:33 UTC (Tue)
by dezgeg (subscriber, #92243)
Posted Jan 14, 2020 13:21 UTC (Tue)
by roc (subscriber, #30627)
Posted Jan 14, 2020 15:51 UTC (Tue)
by Wol (subscriber, #4433)
They had a load of grief with mixed Windows/Linux repos, so there's now a switch that says "convert CR/LF on checkout/checkin".
Add a switch that says "enforce valid utf-8/utf-16/Apple filenames, and sort out the mess at checkout/checkin".
If that's off by default, or on by default for new repos, or whatever, then at least NEW stuff will be sane, even if older stuff isn't.
Cheers,
Posted Jan 14, 2020 15:42 UTC (Tue)
by mathstuf (subscriber, #69389)
Posted Jan 13, 2020 23:57 UTC (Mon)
by prometheanfire (subscriber, #65683)
Posted Jan 14, 2020 5:40 UTC (Tue)
by ssmith32 (subscriber, #72404)
Python to Go seems like a weird switch. I tend to use them for very different tasks.
Unless you're bound to GCP as a platform or something similar.
But you're not the only one mentioning this: what projects have I missed that made the switch?
Posted Jan 14, 2020 16:02 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
Stuff like command-line utilities and servers works really well in Go.
Several huge Python projects are migrating to Go as a result.
Posted Jan 14, 2020 17:06 UTC (Tue)
by mgedmin (subscriber, #34497)
Can you name them?
Posted Jan 14, 2020 17:10 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
Posted Jan 14, 2020 18:09 UTC (Tue)
by nim-nim (subscriber, #34454)
As for the rest, a lot of infra-related things are being rewritten in Go simply because of containers (Kubernetes and Docker are both written in Go). That has little to do with the benefits offered by the language; it's good old network effects. When you're the container language, and everyone wants to do containers, being decent is sufficient to carry the day.
No one will argue that Go is less than decent. Many will argue it’s more than decent, but that’s irrelevant for its adoption curve.
Posted Jan 14, 2020 18:25 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
Mind you, Google had actually tried to fix some of Python's issues before that, by attempting JIT compilation with the Unladen Swallow project.
Posted Jan 14, 2020 19:05 UTC (Tue)
by rra (subscriber, #99804)
For most applications, the speed changes don't matter and other concerns should dominate. But for core infrastructure code for large cloud providers, they absolutely matter in key places, and Python is not a good programming language for high-performance networking code.
Posted Jan 18, 2020 13:26 UTC (Sat)
by HelloWorld (guest, #56129)
Posted Jan 18, 2020 14:22 UTC (Sat)
by smurf (subscriber, #17840)
The other major gripe with Go which you missed, IMHO, is its appalling error handling: the requirement to return a "(result, err)" pair and check "err" *everywhere* (as opposed to returning a plain "result" and propagating exceptions via catch/throw or try/raise or whatever you'd call it) causes a "yay unreadable code" LOC explosion, and it can't protect against non-functional errors (division by zero, anybody?).
Posted Jan 19, 2020 0:09 UTC (Sun)
by HelloWorld (guest, #56129)
By contrast, Scala does have language support for exceptions. It's pretty much the same as Java's try/catch/finally, how did that hold up? It's a steaming pile of crap. It interacts poorly with concurrency, it easily leads to resource leaks, it's hard to compose, it doesn't tell you which errors can occur where, and everybody who knows what they're doing is using a library instead, because libraries like ZIO don't have *any* of these problems.
So based on that experience, you're going to have a hard time convincing me that concurrency needs language support. Feel free to try anyway, but try ZIO first.
Posted Jan 19, 2020 1:03 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
It really is that simple.
Plus, Go has a VERY practical runtime with zero-dependency executables and a good interactive GC. It's amazing how much better Golang's simple mark&sweep is compared to Java's never-ending morass of CMS or G1GC (which constantly require 'tuning').
Sure, I would like a bit more structured concurrency in Go, but this can come later once Go team rolls out generics.
Posted Jan 19, 2020 5:47 UTC (Sun)
by HelloWorld (guest, #56129)
Apparently you haven't tried ZIO, because it beats the pants off anything Go can do.
It really is that simple.
Posted Jan 19, 2020 6:01 UTC (Sun)
by HelloWorld (guest, #56129)
Posted Jan 19, 2020 7:58 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
Meanwhile, Go is written by practical engineers. Cancellation and timeouts are done through an explicitly passed context.Context, and resource cleanup is done through deferred blocks.
These two simple methods in practice allow complicated systems comprising hundreds of thousands of LOC to work reliably, while being easy to develop and iterate on, without multi-minute waits for each compile/run cycle.
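For comparison, a rough asyncio analogue of that Go pattern, as a sketch rather than a translation of any particular Go code (worker() is hypothetical): asyncio.wait_for() plays the role of a context deadline by cancelling the awaited work, and try/finally plays the role of defer.

```python
import asyncio

cleaned_up = []


async def worker(name: str) -> str:
    """Hypothetical slow operation that holds a resource."""
    try:
        await asyncio.sleep(10)   # pretend to do slow I/O
        return name
    finally:
        cleaned_up.append(name)   # runs on success *and* on cancellation


async def main() -> str:
    try:
        # Like a context deadline: cancel the work after 0.05 s.
        return await asyncio.wait_for(worker("job-1"), timeout=0.05)
    except asyncio.TimeoutError:
        return "timed out"


result = asyncio.run(main())
print(result, cleaned_up)  # timed out ['job-1']
```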
Posted Jan 19, 2020 10:25 UTC (Sun)
by smurf (subscriber, #17840)
If you come across a better paradigm sometime in the future, then bake it into a new version of the language and/or its libraries, and add interoperability features. Python3 is doing this, incidentally: asyncio is a heap of unstructured callbacks that evolved from somebody noticing that you can use "yield from" to build a coroutine runner, then Trio came along with a much better concept that actually enforces structure. Today the "anyio" module affords the same structured concept on top of asyncio, and in some probably-somewhat-distant future asyncio will support all that natively.
Languages, and their standard libraries, evolve.
With Go, this transition to Structured Concurrency is not going to happen any time soon because contexts and structure are nice-to-have features which are entirely optional and not supported by most libraries, thus it's much easier to simply ignore all that fancy structured stuff (another boilerplate argument you need to pass to every goroutine and another clause to add to every "select" because, surprise, there's no built-in cancellation? get real) and plod along as usual. The people in charge of Go do not want to change that. Their choice, just as it's my choice not to touch Go.
Posted Jan 19, 2020 12:35 UTC (Sun)
by HelloWorld (guest, #56129)
Posted Jan 19, 2020 14:35 UTC (Sun)
by smurf (subscriber, #17840)
NB, Python also has the whole thing in a library. This is not primarily about language features. The problem is that it is simply impossible to add this to Go without either changing the language, or forcing people to write even more convoluted code.
Python basically transforms "result = foo(); return result" into what Go would call "err, result = foo(Context); if (err) return err, nil; return nil,result" behind the scenes. (If you also want to handle cancellations, it gets even worse – and handling cancellation is not optional if you want a correct program.) I happen to believe that forcing each and every programmer to explicitly write the latter code instead of the former, for pretty much every function call whatsoever, is an unproductive waste of everybody's time. So don't talk to me about Python being crippled, please.
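To make the comparison concrete, a sketch that emulates the Go convention in Python (the parse_port/load_port functions are made up; Go returns the error as the last value): with exceptions, the intermediate function propagates failures without any code of its own, while the Go-style version has to check and forward the error by hand at each call.

```python
from typing import Optional, Tuple


def parse_port(text: str) -> int:
    return int(text)  # raises ValueError on bad input


def load_port_exc(text: str) -> int:
    # Exception style: failures propagate with no extra code here.
    port = parse_port(text)
    return port


def parse_port_go(text: str) -> Tuple[Optional[int], Optional[Exception]]:
    try:
        return int(text), None
    except ValueError as err:
        return None, err


def load_port_go(text: str) -> Tuple[Optional[int], Optional[Exception]]:
    # Go style: every call site checks and forwards the error explicitly.
    port, err = parse_port_go(text)
    if err is not None:
        return None, err
    return port, None


print(load_port_go("8080"))     # (8080, None)
print(load_port_go("http")[1])  # invalid literal for int() with base 10: 'http'
try:
    load_port_exc("http")
except ValueError as err:
    print("propagated:", err)
```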
Posted Jan 19, 2020 21:17 UTC (Sun)
by HelloWorld (guest, #56129)
Posted Jan 20, 2020 10:33 UTC (Mon)
by smurf (subscriber, #17840)
Well, sure, if you have a nice functional language where everything is lazily evaluated then of course you can write generic code that doesn't care whether the evaluation involves a loop or a context switch or whatever.
But while Python is not such a language, neither is Go, so in effect you're shifting the playing ground here.
> Not having error handling built into the language doesn't mean you have to check for errors on every call.
No? then what else do you do? Pass along a Haskell-style "Maybe" or "Either"? that's just error checking by a different name.
Posted Jan 20, 2020 12:51 UTC (Mon)
by HelloWorld (guest, #56129)
Posted Jan 20, 2020 18:30 UTC (Mon)
by darwi (subscriber, #131202)
Long time ago (~2013), I worked as a backend SW engineer. We transformed our code from Java (~50K lines) to Scala (~7K lines, same functionality).
After the transition was complete, not a single NullPointerException was seen anywhere in the system, thanks to the Option[T] generics and pattern matching on Some()/None. It really made a huge difference.
NULL is a mistake in computing that no modern language should imitate :-( After my Scala experience, I dread using any language that openly accepts NULLs (python3, when used in a large 20k+ code-base, included!).
Posted Jan 20, 2020 15:46 UTC (Mon)
by mathstuf (subscriber, #69389)
Yes, but with these types, *ignoring* (or passing on in Python) the error takes explicit steps rather than being implicit. IMO, that's a *far* better default. I would think the Zen of Python agrees…
Posted Jan 20, 2020 17:21 UTC (Mon)
by HelloWorld (guest, #56129)
No, passing the error on does not take explicit steps, because the monadic bind operator (>>=) takes care of that for us. And that's a Good Thing, because in the vast majority of cases that is what you want to do. The problem with exceptions isn't that error propagation is implicit (that is actually a feature), but that it interacts poorly with the type system, resources that need to be closed, concurrency, etc.
Posted Jan 20, 2020 18:28 UTC (Mon)
by smurf (subscriber, #17840)
Typing exceptions is an unsolved problem; conceivably it could be handled by a type checker like mypy. However, in actual practice most code is documented as possibly-raising a couple of well-known "special" exceptions derived from some base type ("HTTPError"), but might actually raise a couple of others (network error, cancellation, character encoding …). Neither listing them all separately (much too tedious) nor using a catch-all BaseException (defeats the purpose) is a reasonable solution.
Posted Jan 20, 2020 22:44 UTC (Mon)
by HelloWorld (guest, #56129)
On the other hand, there are trivial things that can't be done with
Posted Jan 21, 2020 6:51 UTC (Tue)
by smurf (subscriber, #17840)
You use an [Async]ExitStack. It's even in contextlib.
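A sketch of the ExitStack approach applied to the "one resource per list element" case discussed in this thread (resource() is a made-up stand-in for files, locks or connections): each enter_context() call registers a cleanup, and all of them run in reverse order when the with block exits, including when an exception is raised partway through.

```python
from contextlib import ExitStack, contextmanager

log = []


@contextmanager
def resource(name: str):
    """Hypothetical resource that records when it is acquired and released."""
    log.append(f"open {name}")
    try:
        yield name.upper()
    finally:
        log.append(f"close {name}")


names = ["a", "b", "c"]  # the number of resources is only known at run time
with ExitStack() as stack:
    handles = [stack.enter_context(resource(n)) for n in names]
    log.append(f"use {handles}")

print(log)
# ['open a', 'open b', 'open c', "use ['A', 'B', 'C']", 'close c', 'close b', 'close a']
```

contextlib.AsyncExitStack offers the same pattern for async context managers via enter_async_context().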
Yes, functional languages with Monads and all that stuff in them are super cool. No question. They're also super hard to learn compared to, say, Python.
Posted Jan 21, 2020 14:32 UTC (Tue)
by HelloWorld (guest, #56129)
> They're also super hard to learn compared to, say, Python.
Posted Jan 21, 2020 2:32 UTC (Tue)
by HelloWorld (guest, #56129)
If listing the errors that an operation can throw is too tedious, I would argue that that is not a language problem but a library design problem, because if you can't even list the errors that might happen in your function, you can't reasonably expect people to handle them either. You need to constrain the number of ways that a function can fail in, normally by categorising them in some way (e. g. technical errors vs. business domain errors). I think this is actually yet another way in which strongly typed functional programming pushes you towards better API design.
Unfortunately Scala hasn't proceeded along this path as far as I would like, because much of the ecosystem is based on cats-effect where type-safe error handling isn't the default. ZIO does much better, which is actually a good example of how innovation can happen when you implement these things in libraries as opposed to the language. Java has checked exceptions, and they're utterly useless now that everything is async...
Posted Jan 21, 2020 7:13 UTC (Tue)
by smurf (subscriber, #17840)
… and unstructured.
The Java people have indicated that they're going to migrate their async concepts towards Structured Concurrency, at which point they'll again be (somewhat) useful.
> If listing the errors that an operation can throw is too tedious, I would argue that that is not a language problem but a library design problem
That's one side of the coin. The other is that IMHO a library which insists on re-packaging every kind of error under the sun in its own exception type is intensely annoying, because that loses or hides information.
There's not much commonality between a Cancellation, a JSON syntax error, a character encoding problem, or an HTTP 50x error, yet an HTTP client library might conceivably raise any one of those. And personally I have no problem with that – I teach my code to retry any 40x errors with exponential back-off and leave the rest to "retry *much* later and alert a human", thus the next-higher error handler is the Exception superclass anyway.
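A minimal sketch of that kind of policy, with hypothetical names throughout (TransientError, PermanentError, fetch_with_retry): errors are sorted into a retryable category and everything else, retryable ones get exponential back-off, and the rest propagate immediately to a higher-level handler. The sleep function is a parameter only so the example runs instantly.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical category: worth retrying (e.g. an overloaded server)."""


class PermanentError(Exception):
    """Hypothetical category: retrying will not help; escalate instead."""


def fetch_with_retry(op: Callable[[], T], attempts: int = 5,
                     base_delay: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise                         # out of retries: escalate
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise AssertionError("unreachable")


# Usage: an operation that fails twice, then succeeds.
calls = []


def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("try again")
    return "ok"


delays = []
print(fetch_with_retry(flaky, sleep=delays.append), delays)  # ok [0.5, 1.0]
```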
Posted Mar 19, 2020 16:52 UTC (Thu)
by bjartur (guest, #67801)
Nesting result types explicitly is helpful because it makes you wonder when an exponential backoff is appropriate.
How about
Posted Jan 21, 2020 22:42 UTC (Tue)
by mathstuf (subscriber, #69389)
As a code reviewer, implicit codepaths are harder to reason about and don't make me as confident when reviewing such code (though the bar may also be lower in these cases because error reporting of escaping exceptions may be louder ignoring the `except BaseException: pass` anti-pattern instances).
Posted Jan 19, 2020 11:23 UTC (Sun)
by HelloWorld (guest, #56129)
You're free to stick with purely dysfunctional programming then. Have fun!
Posted Jan 19, 2020 18:49 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
Posted Jan 19, 2020 21:19 UTC (Sun)
by HelloWorld (guest, #56129)
Posted Jan 19, 2020 21:25 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
My verdict is that pure FP languages are used only for ideological reasons and are totally impractical otherwise.
Posted Jan 19, 2020 22:42 UTC (Sun)
by HelloWorld (guest, #56129)
> I also spent probably several months in aggregate waiting for Scala code to compile.
A VCS must be able to round-trip files on the same FS. Even if they are not encoded correctly.
'\ud808\udf45'
UnicodeEncodeError: 'utf-8' codec can't encode character '\udccc' in position 4: surrogates not allowed
>
> I'm not sure of which problem you're talking about.
Just wait until you see POSIX!
Wol
It actually is not, if you're writing something that is not a Jupyter notebook.
I will. Go is the single worst programming language design that achieved any kind of popularity in the last 10 years at least. It is archaic and outdated in pretty much every imaginable way. It puts stuff into the language that doesn't belong there, like containers and concurrency, and doesn't provide you with the tools that are needed to implement these where they belong, which is in a library. The designers of this programming language are actively pushing us back into the 1970s, and many people appear to be applauding that. It's nothing short of appalling.
It absolutely does not. Concurrency is an ever-evolving, complex topic, and if you bake any particular approach into the language, it's impossible to change it when we discover better ways of doing it. Java tried this and failed miserably (synchronized keyword). Scala didn't put it into the language. Instead, what happened is that people came up with better and better libraries. First you had Scala standard library Futures, which was a vast improvement over anything Java had to offer at the time. But they were inefficient (many context switches), had no way to interrupt a concurrent computation or safely handle resources (open file handles etc.) and made stack traces useless. Over the years, a series of better and better libraries (Monix, cats-effect) were developed, and now the ZIO library solves every single one of these and a bunch more. And you know what? Two years from now, ZIO will be better still, or we'll have a new library that is even better.
You haven't yet demonstrated a single advantage of putting this into the language rather than a library, which is much more flexible and easier to evolve. Your thinking that this needs to be done in the language is probably a result of too much exposure to crippled languages like Python.
> NB, Python also has the whole thing in a library. This is not primarily about language features.
It very much is about language features. Python has dedicated language support for list comprehensions, concurrency and error handling. But there is no need for that. Consider these:
x = await getX()
y = await getY(x)
return x + y
[ x + y
for x in getX()
for y in getY(x)
]
The structure is the same: we obtain an x, then we obtain a y that depends on x (expressed by the fact that getY takes x as a parameter), then we return x + y. The details are of course different, because in one case we obtain x from an async task, and in the other we obtain x from a list, but there's nevertheless a common structure. Hence, Scala offers syntax that covers both of these use cases:
for {
  x <- getX()
  y <- getY(x)
} yield x + y
And this is a much better solution than what Python does, because now you get to write generic code that works in a wide variety of contexts including error handling, concurrency, optionality, nondeterminism, statefulness and many, many others that we can't even imagine today.
> Python basically transforms "result = foo(); return result" into what Go would call "err, result = foo(Context); if (err) return err, nil; return nil,result" behind the scenes. (If you also want to handle cancellations, it gets even worse – and handling cancellation is not optional if you want a correct program.) I happen to believe that forcing each and every programmer to explicitly write the latter code instead of the former, for pretty much every function call whatsoever, is an unproductive waste of everybody's time. So don't talk to me about Python being crippled, please.
This is a false dichotomy. Not having error handling built into the language doesn't mean you have to check for errors on every call.
> Well, sure, if you have a nice functional language where everything is lazily evaluated then of course you can write generic code that doesn't care whether the evaluation involves a loop or a context switch or whatever.
You don't need lazy evaluation for this to work. Scala is not lazily evaluated and it works great there.
> No? then what else do you do? Pass along a Haskell-style "Maybe" or "Either"? that's just error checking by a different name.
You can factor out the error checking code into a function, so you don't need to write it more than once. After all, this is what we do as programmers: we detect common patterns, like “call a function, fail if it failed and proceed if it didn't” and factor them out into functions. This function is called flatMap in Scala, and it can be used like so:
getX().flatMap { x =>
  getY(x).map { y =>
    x + y
  }
}
But this is arguably hard to read, which is why we have "for" comprehensions. The following is equivalent to the above code:
for {
  x <- getX()
  y <- getY(x)
} yield x + y
I would argue that if you write it like this, it is no harder to read than what Python gives you:
x = getX()
y = getY(x)
return x + y
But the Scala version is much more informative. Every function now tells you in its type how it might fail (if at all), which is a huge boon to maintainability. You can also easily see which function calls might return an error, because you use `<-` instead of `=` to obtain their result. And it is much more flexible, because it's not limited to error handling: the same syntax works for concurrency and other effects as well. It's also compositional, meaning that if your function is concurrent and can fail, that works too.
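For readers coming from Python, here is a rough transcription of both Scala snippets above. It assumes a hand-rolled `Ok`/`Err` result type (not a standard-library API); `get_x`, `get_y` and the `do` decorator are hypothetical. `flat_map` mirrors Scala's `flatMap`, and a generator-driven decorator approximates the `for` comprehension, with `x = yield ...` standing in for `x <- ...`.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any


@dataclass(frozen=True)
class Ok:
    value: Any

    def flat_map(self, f):
        return f(self.value)  # success: continue with the next step

    def map(self, f):
        return Ok(f(self.value))


@dataclass(frozen=True)
class Err:
    error: str

    def flat_map(self, f):
        return self  # failure: skip the rest and propagate the error

    def map(self, f):
        return self


def do(gen_fn):
    """Drive a generator so that `x = yield result` acts like Scala's `x <- result`."""
    @wraps(gen_fn)
    def run(*args, **kwargs):
        gen = gen_fn(*args, **kwargs)
        try:
            step = next(gen)
            while True:
                if isinstance(step, Err):
                    gen.close()
                    return step  # the first failure short-circuits the block
                step = gen.send(step.value)  # bind the unwrapped value
        except StopIteration as stop:
            return Ok(stop.value)  # the return value plays the role of `yield x + y`
    return run


# Hypothetical operations that may fail.
def get_x(n=1):
    return Ok(n)


def get_y(x):
    return Ok(x * 10) if x > 0 else Err("x must be positive")


# getX().flatMap { x => getY(x).map { y => x + y } }
def add_flat_map(n=1):
    return get_x(n).flat_map(lambda x: get_y(x).map(lambda y: x + y))


# for { x <- getX(); y <- getY(x) } yield x + y
@do
def add_for(n=1):
    x = yield get_x(n)
    y = yield get_y(x)
    return x + y
```

One design caveat: a Python generator can be resumed only once per `yield`, so this trick works for "at most one result" effects such as `Ok`/`Err`, but not for list-style nondeterminism, where Scala's `flatMap` calls the continuation many times.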
Python doesn't have a problem with resources that need to be closed (that's what "with foo() as bar"-style context managers are for), nor with concurrency (assuming that you use Trio or anyio).
Sure, you can solve every problem that arises from adding magic features to the language by adding yet more magic. First, they added exceptions. But that interacted poorly with resource cleanup, so they added `with` to fix that. Then they realized that this fix interacts poorly with asynchronous code, and they added `async with` to cope with that. So yes, you can do it that way, because given enough thrust, pigs fly just fine. But you have yet to demonstrate a single advantage that comes from doing so.
`with`. For instance, if you want to acquire two resources, do stuff and then release them, you can just nest two `with` statements. But what if you want to acquire one resource for each element in a list? You can't, because that would require you to nest `with` statements as many times as there are elements in the list. In Scala with a decent library (ZIO or cats-effect), resources are a Monad, and lists have a `traverse` method that works with ALL monads, including the one for resources and the one for asynchronous tasks. But while `asyncio.gather` (which is basically the `traverse` equivalent for asynchronous code) does exist, there's no such thing in `contextlib`, which proves my point exactly: you end up with code that is constrained to specific use cases when it could be generic and thus much easier to learn because it works the same for _all_ monads.
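For reference, the standard library handles the one-resource-per-element case with `contextlib.ExitStack`, which the next reply brings up. A minimal sketch; the `resource` context manager and the `log` list are hypothetical, for illustration only.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def resource(name, log):
    """Hypothetical resource that records acquisition and release in `log`."""
    log.append(f"acquire {name}")
    try:
        yield name.upper()
    finally:
        log.append(f"release {name}")


def use_all(names, log):
    # One ExitStack stands in for an arbitrary number of nested `with` statements.
    with ExitStack() as stack:
        handles = [stack.enter_context(resource(n, log)) for n in names]
        log.append(f"using {handles}")
    # On exit, resources are released in reverse order of acquisition.
    return handles
```

If acquiring a later element raises, `ExitStack` still releases the ones that were already entered.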
You can *always* write more code to fix any problem. That isn't the issue here; it's about code reuse. `ExitStack` shouldn't be needed, and neither should `AsyncExitStack`. These aren't solutions but symptoms.
For the first time, you're actually making an argument for putting these things in the language. But I'm not buying it, because I see how much more stuff I need to learn about in Python that just isn't necessary in fp. There's no ExitStack or AsyncExitStack in ZIO. There's no `with` statement. There's no try/except/finally, there's no ternary operator, no async/await, no assignment expressions, none of that nonsense. It's all just functions and equational reasoning. And equational reasoning is awesome _because_ it is so simple that we can teach it to high school students.
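For comparison, this is roughly what the Python side of the argument looks like when resources and concurrency meet: `contextlib.AsyncExitStack` holds a variable number of async resources, and `asyncio.gather` runs work over them concurrently. A sketch only; `connection` and `fetch` are hypothetical stand-ins for real I/O.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager


@asynccontextmanager
async def connection(name, log):
    """Hypothetical async resource; records open/close events in `log`."""
    await asyncio.sleep(0)  # stand-in for an async connect
    log.append(f"open {name}")
    try:
        yield name
    finally:
        log.append(f"close {name}")


async def fetch(conn):
    await asyncio.sleep(0)  # stand-in for real I/O
    return f"data from {conn}"


async def fetch_all(names, log):
    async with AsyncExitStack() as stack:
        # Acquire one connection per name (sequentially, to keep cleanup simple).
        conns = [await stack.enter_async_context(connection(n, log)) for n in names]
        # gather runs the fetches concurrently, roughly like traverse over tasks.
        results = await asyncio.gather(*(fetch(c) for c in conns))
    return results
```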
getWeather :: String -> DateTime ->
  IO (DnsResponse (TcpSession (HttpResponse (Json Weather))))
where each layer can fail? Of course, there's some leeway in choosing how to layer the types (although handling e.g. out-of-memory errors this way would be unreasonable IMO).
There is some truth to this; it would be nice if the compiler were faster. That said, it has become significantly faster over the years, and it's not nearly slow enough to make programming in Scala “totally impractical”. And the fact that I was able to name a very simple problem (“make an asynchronous operation interruptible without writing (error-prone) custom code and without leaking resources”) that has a trivial solution with ZIO and no solution at all in Go proves that pure fp has nothing to do with ideology. It solves real-world problems. There's a reason why React took over in the frontend space: it works better than anything else because it's functional.
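As a point of comparison on the Python side (this is asyncio's mechanism, not ZIO's, and makes no claim about Go), `asyncio.wait_for` cancels the inner operation when its timeout expires, and `async with` releases the resource while the cancellation unwinds. A minimal sketch; `TrackedResource` and `slow_operation` are hypothetical.

```python
import asyncio


class TrackedResource:
    """Hypothetical async resource that records whether it is currently held."""

    def __init__(self):
        self.held = False
        self.events = []

    async def __aenter__(self):
        self.held = True
        self.events.append("acquired")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Runs on normal exit, on exceptions and on cancellation alike.
        self.held = False
        self.events.append("released")
        return False  # never swallow the exception (including CancelledError)


async def slow_operation(resource):
    async with resource:
        await asyncio.sleep(10)  # stands in for long-running work
        return "done"


async def main():
    resource = TrackedResource()
    try:
        # wait_for cancels slow_operation once the timeout expires.
        result = await asyncio.wait_for(slow_operation(resource), timeout=0.01)
    except asyncio.TimeoutError:
        result = "timed out"
    return result, resource
```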