
Fedora ponders the Python 2 end game

Fedora ponders the Python 2 end game

Posted Aug 1, 2017 19:28 UTC (Tue) by togga (subscriber, #53103)
Parent article: Fedora ponders the Python 2 end game

2020 could be a good year to have migrated from Python for new projects. I have never seen a more disruptive and painful version change in any major language, and many of the changes seem to lack any apparent reason. Python 3, for me, also falls short in the "just works" department where Python 2 shined. The question is: what platform is the future in the quick-and-dirty, "just works" productivity department, without the performance issues of Python?



Fedora ponders the Python 2 end game

Posted Aug 1, 2017 20:46 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (35 responses)

I know of several large Python2 codebases that are migrating to Google Go.

Fedora ponders the Python 2 end game

Posted Aug 1, 2017 22:57 UTC (Tue) by iabervon (subscriber, #722) [Link] (19 responses)

This is the perfect time to migrate from Python 2 to Go. You'll be done just in time to find out firsthand whether Go 2 actually meets their goal of not splitting the Go ecosystem...

Fedora ponders the Python 2 end game

Posted Aug 1, 2017 23:52 UTC (Tue) by dgm (subscriber, #49227) [Link] (2 responses)

So, can we say it's time to Go, but not to Go 2? What would Dijkstra say about that?

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 3:36 UTC (Wed) by tome (subscriber, #3171) [Link] (1 responses)

He'd say ouch but he'd be unharmed.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 3:49 UTC (Wed) by cry_regarder (subscriber, #50545) [Link]

That is a fairly considered opinion.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 6:04 UTC (Wed) by togga (subscriber, #53103) [Link]

Fair point. Maybe a more robust solution is not a single platform but rather a "floating" set of them with some common, stable but flexible base?

Although Python was stable for a "good while", giving some nice years of productivity, it seems smart to stick with something that won't intentionally or unintentionally screw you over after a number of years due to, for instance, one person's or company's decision (I recall having seen this before...).

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 16:46 UTC (Wed) by khim (subscriber, #9252) [Link] (14 responses)

It's not that hard to "not split the Go ecosystem". FORTRAN did it (and the way the language is used changed in pretty significant ways over time), C did it (with the transition to C99), even C++ did it (twice: with the transition from C to C++98, and with the transition to C++11, which is NOT 100% compatible with C++98). Only the Python developers decided they don't have to use the tried-and-true approach and would instead force everyone to rewrite everything.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 18:48 UTC (Wed) by drag (guest, #31333) [Link] (13 responses)

You pointed out several examples of programs being broken by language versioning in Fortran and C/C++ and then in the same paragraph claimed that the python developers are the only ones that are guilty of this.

That's very contradictory.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 19:53 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (10 responses)

There are degrees of breakage - nobody expects software to be 100% perfect. But C++ tried hard to preserve compatibility with C, and while it was not perfect, it was close enough that incompatibilities could be fixed easily. The gains were also quite major in the case of Fortran and C->C++.

Py3 did a huge compatibility break that required major changes and a lot of unexpected breakages. And for no real gain.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 23:19 UTC (Wed) by anselm (subscriber, #2796) [Link] (9 responses)

I don't know. For example, IIRC there are all sorts of subtle differences between C and C++, to a point where a valid C program doesn't necessarily work the same way in C++. By contrast, it is possible to write code that works in both Python 2.7 and 3.x, and the Python developers have made changes in recent versions of Python 3.x that improve compatibility even more.

Personally I prefer Python 3 because, among other things, strings work a lot better than they used to in Python 2. Making the transition is a one-time hassle but as far as I'm concerned it is worth it.
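
For illustration, a minimal sketch of the str/bytes split being alluded to here (Python 3 only; names are purely illustrative):

s = "möp"
b = s.encode("utf-8")
print(type(s), type(b))  # <class 'str'> <class 'bytes'>
try:
    s + b                # Python 2 would try to coerce, sometimes blowing up far from the real bug
except TypeError as e:
    print("mixing str and bytes:", e)

Python 3 refuses to silently mix the two types, which is the main thing people mean when they say strings "work better".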

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 0:05 UTC (Thu) by khim (subscriber, #9252) [Link] (2 responses)

> For example, IIRC there are all sorts of subtle differences between C and C++, to a point where a valid C program doesn't necessarily work the same way in C++

That's why even today there are separate C parsers in gcc and clang, and why you can link together modules written in C and C++.

> By contrast, it is possible to write code that works in both Python 2.7 and 3.x

By contrast? "Normal" C code is also C++ code; all the changes and possible incompatibilities are explicitly listed in the C++ standard (annex C), and in general the "natural" case is that code written for the old version works in the new version - only some odd corner cases are broken!

Compare that to Python, where "normal" Python 3 code is completely incompatible with Python 2, and where the compatibility features which allowed one to write 2/3 code only arrived later, when the developers "suddenly" discovered that people are just not in a hurry to spend countless hours doing pointless work for no good reason.
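
For illustration, a minimal sketch of the kind of straddling code that only became practical later in the transition (the __future__ imports landed in 2.6, and u'' literals came back in 3.3):

from __future__ import print_function, unicode_literals

def greet(name):
    # "{}".format() keeps everything in the text type on both 2.7 and 3.x
    return "Hello, {}!".format(name)

print(greet("world"))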

The Python 2 to Python 3 transition may not be the worst transition of its type (the PHP 6 and Perl 6 transition attempts were even worse), but it's certainly the worst one that hasn't killed the language (PHP 6 died and Perl 6 didn't - but in both cases the original implementation survived and remains in wide use).

> the Python developers have made changes in recent versions of Python 3.x that improve compatibility even more

Sure, but all that work was an obvious afterthought, while it's certainly the most important part of any such transition.

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 9:31 UTC (Thu) by mpr22 (subscriber, #60784) [Link]

"Normal" C code is emphatically not C++ code, because normal C code often uses malloc() and seldom explicitly casts the return value to the desired pointer type because C's implicit conversion rules say that void * is implicitly castable to any other pointer type and programmers are often lazy. C++ discarded that implicit conversion.

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 13:25 UTC (Thu) by niner (subscriber, #26151) [Link]

How can Perl 6 be worse when it's quite possible to do a piecemeal upgrade of a codebase from Perl 5 to Perl 6? That is, if such a change is even desired instead of just combining the best parts of both languages. No Perl 5 developer was left out in the rain, and those who want to can use Perl 6. What about this was in any way worse than leaving people who cannot upgrade behind and forcing countless pointless man-years of effort on the rest, including distributions?

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 0:06 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

C vs. C++ breakage happened gradually; early C++ versions were pretty much "C with classes". I've seen million-line C applications translated into C++ by simply renaming the files and making a few easy changes. And C is a compiled language, so that helps a lot.

And if everything else fails, you can always #include a C-based API with minimal fuss even in modern C++ using 'extern "C" {}' blocks.

There's nothing comparable in the Python world. The transition was abrupt, it required quite a lot of changes, and since Python is an interpreted language you actually have to test everything.

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 0:10 UTC (Thu) by khim (subscriber, #9252) [Link] (4 responses)

> Personally I prefer Python 3 because, among other things, strings work a lot better than they used to in Python 2.

Actually the situation with strings in Python 2 is awful, and Python 3 made it even worse. Why do you think WTF-8 was added to Rust? Why do you think Go still considers strings a sequence of bytes, with no strings attached? A world where only nice Unicode strings exist is a utopia! That's why they were forced to throw away the notion that file names are strings and introduce path-like objects! And I'm sure that's not the end.

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 0:37 UTC (Thu) by anselm (subscriber, #2796) [Link] (2 responses)

I can only speak for myself, but I'm way happier with strings (and byte sequences) in Python 3 than I used to be with strings (and Unicode strings) in Python 2. They pretty much seem to do what I expect them to do, and given a little care it is reasonably easy to write programs that work. Of course I may not be clever enough to appreciate how “awful” Python's string handling really is.

OTOH, I don't really care about WTF-8 in rust nor what Go considers a string because (so far) I'm not using either of those languages, and have no plans to do so in the foreseeable future.

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 2:20 UTC (Thu) by khim (subscriber, #9252) [Link] (1 responses)

> Of course I may not be clever enough to appreciate how “awful” Python's string handling really is.

My favorite example was Anaconda a few years back. Pick the text installer (because you are dealing with a small VM), pick the Russian language and go through everything. On the very last screen it tries to show you the "everything is done" message, which is in KOI8-R instead of UTF-8 - an exception is thrown and the whole installation rolled back. Just PERFECT handling of strings.

> OTOH, I don't really care about WTF-8 in rust nor what Go considers a string because (so far) I'm not using either of those languages, and have no plans to do so in the foreseeable future.

That's OK. If your goal is scripts which kinda-sorta-work-if-you-are-lucky, then Python or, heck, even bash will work. If you want robustness then Python is not for you.

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 15:24 UTC (Thu) by intgr (subscriber, #39733) [Link]

People seem to have the wrong assumption about paths in Python 3. Python does actually properly handle filenames that aren't valid UTF-8; they are escaped with certain Unicode codepoints: https://www.python.org/dev/peps/pep-0383/ (I guess that's like WTF-8 in Rust). I think that's a pretty good compromise: it does the right thing with properly encoded paths (nearly all paths are) but still remains functional with paths that aren't.
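
A minimal sketch of PEP 383 in action, assuming a Linux system with a UTF-8 locale (file name chosen for illustration):

import os

os.mkdir("demo")
open(b"demo/m\xf6p.txt", "wb").close()  # a raw Latin-1 0xf6 byte, not valid UTF-8
name = os.listdir("demo")[0]            # decoded with the surrogateescape handler
print(repr(name))                       # 'm\udcf6p.txt'
print(os.fsencode(name))                # b'm\xf6p.txt' - the original bytes, intact

The mis-encoded byte survives the round trip through str, so such a file can still be opened, renamed or deleted by name.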

> On the very last screen it tries to show you "everything is done" message which is in KOI8-R instead of UTF-8 - with exception being thrown and whole installation rolled back. Just PERFECT handling of strings.

Yes, that's exactly the behavior I want. There was a bug in the program (or its translation) and a good programming environment should immediately throw an error rather than proceed with some unexpected behavior. Even environments that used to play very fast and loose with types and ignore errors, like MySQL and PHP, have recently become significantly stricter. Otherwise, in complex programs, you will end up with latent errors that are much harder to debug, and often with data loss.

> If you want robustness then python is not for you.

Erm, in one breath you complain about Python being too strict and now you complain that it's not robust?

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 7:51 UTC (Thu) by roc (subscriber, #30627) [Link]

WTF-8 was not really "added" to Rust. There's a crate for it, that's all.

OTOH OsString has been part of Rust for a long time exactly because sometimes you need to deal with weird non-Unicode platform strings.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 23:48 UTC (Wed) by khim (subscriber, #9252) [Link]

Sorry. Thought it would be obvious from the context, but perhaps not. I mean: Mixed programs, in which packages written in Go 2 import packages written in Go 1 and vice versa, must work effortlessly during a transition period of multiple years.

Fortran 90 introduced free-form source input and arrays were redesigned from scratch (and ended up pretty bad: they were designed with Cray CPUs in mind and are not a good fit for modern CPUs) - yet old code can still be compiled with a Fortran 2015 compiler! The same with C, C++ and other languages: "old style" is just a switch away and, most importantly, mixed programs, in which packages written in XXX import packages written in YYY and vice versa, work effortlessly during a transition period of multiple years.

Fortran developers certainly learned from experience (Fortran 77 was not widely used for many years after its introduction because it didn't support some features of old Fortran 66, and thus old modules could not be intermixed with new ones), and C and C++ developers (and many, many, many others) have learned from it too (e.g. Delphi introduced new class-style types and new strings - but the old ones remained available for years). That experience was certainly well known to the Python community - they just chose to ignore it.

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 16:18 UTC (Thu) by smoogen (subscriber, #97) [Link]

I expect the point being missed is that every compiler toolkit would compile both for N years. The C compilers would work with K&R C, ANSI C and C99 while emitting warnings. They would then drop K&R C or make it require an explicit flag. Then they would only emit warnings for mixing ANSI and C99. Then ANSI needed an explicit flag, and then it was C99 only. All in all, it took pretty much 15 years from C99 for the various compilers to transition.

The same with Fortran. You could mix and match IV and 77 code. Then you needed a flag, then you needed a flag for F90, etc. etc. And the same for C++ code. The transitions for each were slow, and Azathoth knows how many compiler writers went mad having to support that kind of code. Which I expect is what Guido was trying to avoid.


Fedora ponders the Python 2 end game

Posted Aug 2, 2017 4:11 UTC (Wed) by wahern (subscriber, #37304) [Link] (7 responses)

At work the recent Dirty Cow kernel patches broke the JVM. Something similar could happen with Go some day (or maybe already has?), which could be much worse as breaks in backward compatibility would require rebuilding every single application. That could prove a nightmare.

I know Linux tries to maintain strong compatibility guarantees, but both the kernel and, more typically, distributions fall short of that goal. For example, the deprecation of the sysctl(2) syscall broke descriptor-less acquisition of system randomness. (And because getrandom(2) wasn't added until recently, several long-term RHEL releases suffer from a flat-out regression in behavior and sandboxing capabilities.) For all their stodginess, commercial Unices like Solaris had good track records in this regard. (Perhaps this was their downfall!) On balance Go's static compilation model works well for most teams and is in fact a unique advantage, but this could change.

External interpreters are more resilient in this regard. There's a reason the ELF specification is so complex, and why enterprise Unix systems evolved toward aggressive dynamic linking: namely, so you could modify and upgrade discrete components in a more fine-grained manner. On a platform like OpenBSD, which ruthlessly changes its ABI, a static compilation model is much more costly. As a recent DEF CON presentation suggests, improved security might require more frequent breaks in backward compatibility as a practical matter.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 7:14 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> At work the recent Dirty Cow kernel patches broke the JVM.

The mainline kernel has never broken the JVM in released versions. Linus Torvalds would eat anybody who tried to do this on purpose. And Go has a critical mass of users, so any kernel-level breakage will be obvious. Go's static linking and independence from libc is a stroke of genius in my opinion. You just drop a binary in place and it's immediately usable.

This would also be a problem for Python, if you're using a C-based module.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 9:43 UTC (Wed) by wahern (subscriber, #37304) [Link] (5 responses)

> Go's static linking and independence from libc is a stroke of genius in my opinion.

It's a return to historic Unix. Plan 9 intentionally rejected dynamic linking, too.

I agree that in current production Linux environments static linking provides a significant net benefit. Static linking decouples components, allowing you to iterate development more easily. But the spirit behind static linking is predicated on your codebase being well maintained, your programmers being responsive to changes in interfaces, and operating system interfaces being minimal and extremely stable. This is how things _should_ be, ideally. But if this were true in reality, then CentOS and RHEL wouldn't dominate on the backs of their ABI compatibility guarantees, and people wouldn't sweat kernel upgrades or even deploying alternative operating systems. "Well maintained" does not describe most code bases; nor does "responsive" describe most engineers. Have you upgraded your vendor kernels (which would have broken the JVM), or recompiled all your C and cgo programs, to mitigate Stack Clash?

It's no coincidence that Plan 9 represents almost every resource as a file or set of files. A file-based object contract only needs four interfaces at the language level to manipulate it: open, read, write, and close. Moreover, in Plan 9 control channels rigorously restrict themselves to ASCII-based string protocols. That provides an incredibly stable contract for process-process and kernel-process integration, providing opportunities for rich and deep code reuse without sacrificing reliable backward and forward compatibility. Theoretically Plan 9 is a finished operating system: you aren't going to need to wait around for the kernel to provide improved sandboxing or entropy gathering, as it already provides all the interfaces you need or will ever get.

But Linux isn't Plan 9. New interfaces like seccomp aren't generally implemented via file APIs for very practical reasons. Even interfaces like epoll() were a regression in this regard, as an early predecessor to epoll() had you open /dev/poll instead of via a new syscall. Linux added getrandom(2) to supplement /dev/urandom because the Unix filesystem namespace model is fundamentally limited when it comes to achieving "everything is a file" semantics--unprivileged, unlimited namespace manipulation breaks deeply embedded assumptions, making it a security nightmare and necessitating difficult trade-offs when you wish to leverage namespace manipulation for, e.g., sandboxing. I could go on endlessly about how Linux extensions subtly break the object-as-file interface contract.

There are similar problems wrt resource management when you rely heavily on multi-threaded processes. Go isn't designed for simply spinning up thousands of goroutines in a single process; both Go and Plan 9 were designed to make it easy (and are predicated upon the ability) to scale a service across thousands of machines, in which case you really don't care about the resiliency of a single process or even a single machine.[1] But that kind of model doesn't work for enterprise databases or IoT devices, or in situations where communicating processes implicitly depend on the resiliency of a small set of processes (local or remote) for persistence. Do you implement two-phase commit every time you perform IPC? That kind of model for achieving robustness is neither universally applicable nor even universally useful, and it makes demands of its own. In practice execution is never perfect even if you try, but that's just as true when writing a highly distributed system as when trying to achieve resiliency using more typical approaches.

As I said before, dynamic linking and complex ABIs didn't arise in a vacuum. Torvalds' guarantee about backward compatibility is meaningful precisely because it permits static compilation of user processes, so you can decouple kernel upgrades from application process upgrades. But that guarantee wouldn't be required for different operating system models. It's not an absolute guarantee, especially given that almost nobody runs mainline kernels directly--you'd be an idiot to do so from a security perspective. And the kernel-process boundary isn't the only place where you care about independently upgrading components.[2] If we don't appreciate why it came about and assume that returning to a static linking model will solve everything, we're doomed to recapitulate all our previous choices. Instead, we need to learn to recognize the many contexts where the static model breaks down, and continue working to improve the utility and ease of use of the existing tools.

[1] For example, Go developers harbor no pretense about making OOM recoverable, and AFAICT generally believe that enabling strict memory accounting is pointless. That makes it effectively impossible to design and deploy high-reliability Go apps for a small number of hosts that don't risk randomly crashing the entire system under heavy load, such as during a DDoS. (Guesstimating free memory is inherently susceptible to TOCTTOU races. And OOM can kill any number of processes, ultimately requiring, directly or indirectly, restarting the host or at least restarting all the daemon processes which shared persistent state, dropping existing connections.) It's one thing to say that you're _usually_ better off designing your software to be "web scale". I wholeheartedly agree with that approach, at least in the abstract. It's something else entirely to bake that perspective into the very bones of your language, at least insofar as you claim that the language can be used as a general alternative to other systems languages like C. (Which AFAIK is not actually something the Go developers claim.)

[2] Upgrading the system OpenSSL is much easier than rebuilding every deployed application. Go already had its Heartbleed: see the Ticketbleed bug at https://blog.filippo.io/finding-ticketbleed/

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 19:40 UTC (Wed) by drag (guest, #31333) [Link]

> "Well maintained" does not describe most code bases; nor does "responsive" describe most engineers.

If your developers suck and your projects have bad habits, then static vs. dynamic dependencies are not going to save you. It's going to suck in slightly different ways, but it's still going to suck.

> There are similar problems wrt resource management when you rely heavily on multi-threaded processes. Go isn't designed for simply spinning up thousands of goroutines in a single process; both Go and Plan 9 were designed to make it easy (and predicated upon the ability to) scale a service across thousands of machines, in which case you really don't care about the resiliency of a single process or even a single machine.

Multi-threaded processes are for performance within an application. Dealing with resiliency is an entirely separate question.

If you have an application or database that isn't able to spread across multiple machines and can't deal with hardware failures and process crashes, then you have a bad architecture that needs to be fixed.

Go users shoot for 'stateless applications', which allow easy scalability and resiliency. Getting fully stateless applications is not easy and many times not even possible, but when it is possible it brings tremendous benefits and cost savings. It's the same type of mentality that says that functional programming is better than procedural programming. It's the same idea that is used for RESTful applications and APIs.

Eventually, however, you need to store state and for that you want to use databases of one type or another.

The same rules apply, however, when it comes to resiliency. You have to design the database architecture with the anticipation that hardware is going to fail, databases crash and file systems corrupt. It's just that the constraints are much higher and thus so are the costs.

It really has nothing to do with static vs. dynamic binaries, however, except that Go's approach makes it a lot easier to deploy and manage services/applications that are not packaged or tracked by distributions. The number of things actually tracked and managed by distributions is just a tiny fraction of the software that people run on Linux.

> For example, Go developers harbor no pretense about making OOM recoverable, and AFAICT generally believe that enabling strict memory accounting is pointless. That makes it effectively impossible to design and deploy high reliability Go apps for a small number of hosts that don't risk randomly crashing the entire system under heavy load, such as during a DDoS.

When you are dealing with an OOMing app it is effectively blocked anyway. It's not like your Java application is going to be able to serve up customer data when it's stuck using 100% of your CPU and 100% of your swap, furiously garbage collecting to try to recover in the middle of a DDoS.

In many situations the quickest and best approach is just to 'kill -9' the processes or power cycle the hardware. Having an application that is able to just die and automatically restart quickly is a big win in these sorts of situations. As soon as the DDoS ends you are right back to running fresh new processes. With big applications with thousands of threads and massive single processes eating up GBs of memory... it's a crap shoot whether or not they are going to be able to recover or enter a good state after a severe outage. Now you are stuck tracking down bad PIDs in a datacenter with hundreds or thousands of them.

> And the kernel-process boundary isn't the only place where you care about independently upgrading components.

No, but right now with Linux distributions it's the only boundary that really exists.

Linux userland exists as one big spider web of interconnected dependencies. There really are no 'layers' there. There really aren't any 'boundaries' there... It's all one big mush. Some libraries do a very good job of providing strong ABI assurances, but for the most part it doesn't happen.

Whether it's easier to just upgrade a single library, or whether that's even possible, or whether it's easier to recompile a program... all of this is very much 'it depends'. It really, really depends: on the application, how it's built, how the library is managed, and thousands and thousands of other factors.

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 7:10 UTC (Thu) by mjthayer (guest, #39183) [Link] (2 responses)

The thought that always goes through my head when I hear this discussion is "why not link statically but shell out to openssl(1) for encrypted connections?" I am sure there is a good reason, which people wiser than me (I am no OpenSSL expert, with respect to either the library or the command line tool) will tell me. This is not specific to OpenSSL, of course; it applies just as much to image format translation.

Generally, there must be a security risk threshold below which static linking can make sense. If you are at risk from a vulnerability in a dependency, you are presumably at risk from vulnerabilities in your own code too, so you have to be vigilant and do updates from time to time anyway. The point where most updates are due to security issues in components is probably the point where the threshold has been passed.
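
A toy sketch of the "shell out" idea (Python here purely for brevity; certificate verification, error handling and a real protocol are all omitted, and the host is illustrative):

import subprocess

# Start openssl(1) as a child process and let it own the TLS session.
proc = subprocess.Popen(
    ["openssl", "s_client", "-quiet", "-connect", "example.com:443"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE)
proc.stdin.write(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
proc.stdin.flush()
print(proc.stdout.readline())  # first line of the HTTP response

Upgrading the system's openssl binary then fixes the TLS code for every static binary that shells out to it.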

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 7:46 UTC (Thu) by mjthayer (guest, #39183) [Link] (1 responses)

Having posted that and tried to anticipate the responses, I have to admit that I don't know whether shelling out to openssl(1) is really such a big gain over dynamic linking in most Linux use cases. The main one I see is when you are shipping a binary outside of the distribution and don't want to support multiple OpenSSL shared library ABI versions.

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 13:45 UTC (Thu) by niner (subscriber, #26151) [Link]

Funny that you bring this up in a news post that's dealing with "python2/3" vs. "python" command name. What do you hope to gain by shelling out to openssl instead of dynamically linking? The binary still has an interface that may or may not change in incompatible ways. At least for dynamic libraries there's support for versioning. As the article demonstrates, there is no such thing for system commands.

Fedora ponders the Python 2 end game

Posted Aug 4, 2017 1:23 UTC (Fri) by lsl (subscriber, #86508) [Link]

> Go already had it's Heartbleed: see the Ticketbleed bug at https://blog.filippo.io/finding-ticketbleed/

Uhm, that is not a Go issue at all, but a bug in the TLS implementation of some F5 load balancer appliances. You could just trigger the bug using the Go TLS library client-side, as it uses TLS session identifiers of a different size than OpenSSL (which was probably the only thing the vendor tested with).

When you sent a 1-byte session ID to these F5 appliances, they still assumed it was 32 bytes long (as that's what OpenSSL and NSS happen to use) and echoed back your session ID plus the 31 bytes immediately following it in memory.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 5:51 UTC (Wed) by togga (subscriber, #53103) [Link] (6 responses)

How does Go stand as glue language?

What would it entail to set up an interactive environment integrating with for instance current scientific python 3rd party modules like Numpy/SciPy/Matplotlib?

[Go] >> pyimport("matplotlib.pyplot", "plt")
[Go] >> plt.ion()
[Go] >> plt.figure()
[Go] >> x = get_perf_data().view(np.recarray)
[Go] >> plt.plot(x.timestamp, x.value, 'x')

Go perhaps has a better ecosystem than old Python for integrating with web applications, opening up a more portable platform?

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 7:08 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Go is decent, but there aren't that many glue libraries available in Go (yet) or stuff like Jupyter.

However, Go nicely scales upward - you can write million-line-scale maintainable systems in it, more easily than in Python.

For your plotlib example: https://github.com/gonum/plot/wiki/Example-plots

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 8:08 UTC (Wed) by wahern (subscriber, #37304) [Link] (4 responses)

Go relies on its own calling conventions in order to implement dynamically growable stacks at the machine-code level. Invoking C functions (or C-ABI-compatible functions) requires acquiring and jumping to a specially reserved stack. This can be problematic for many reasons, especially when you make heavy use of FFI.

Also, Go uses automatic garbage collection, but it doesn't provide anchoring primitives or similar interfaces that allow you to express complex cross-language ownership relationships in a way that is visible to the garbage collector. In a language like Perl or Lua I can easily create Lua objects which reference C objects which in turn reference Perl objects, persist these references past function invocation, and in a well-defined manner that cooperates with Perl's and Lua's garbage collectors.[1] I'm less familiar with Python but I presume it's relatively easy to do this as well, to some significant degree.

In short, for threading, stack management, and garbage collection--the three most fundamental aspects of any language--Go performs a lot of the work at the compilation phase and in a way that conflicts with the C ABI, which is the de facto plane for cross-language interfacing. This was a very deliberate design choice, as for something like dynamically growable stacks (necessary for goroutines, which are absolutely fundamental to Go) you must eschew easy compatibility. Likewise, defining stable garbage collector interfaces for FFI is very difficult if you wish to preserve opportunities for refactoring or optimizing your implementation. Basically, Go is perhaps the one language least suitable as a glue language.

For a good overview of the issues one needs to take into account when designing a good glue language, read the short paper, "Passing a Language through the Eye of a Needle: How the embeddability of Lua impacted its design", at http://queue.acm.org/detail.cfm?id=1983083

[1] Cyclical references can be problematic. But Lua, at least, provides support for anchoring cyclical references in a way that the garbage collector can safely detect and break. In particular, either by making sure cycles pass through the uservalue stash of userdata objects, or for other anchoring strategies (e.g. anchors created in the global registry table indexed by C values) by making use of ephemeron tables.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 9:04 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

Yes. However, most of Go's interfacing is one-way - calling C libraries from Go - and it's very easy to do with cgo. You can just import C headers and call functions right away.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 10:44 UTC (Wed) by wahern (subscriber, #37304) [Link] (1 responses)

A glue language, as I understand it, is a language which makes it easy to assemble an application where much of the heavy lifting (complex low-level logic and resource management, but often not high-level business logic, policy, or orchestration) is implemented by external components loaded into the runtime process. Being able to easily invoke C routines is only the most minimal requirement; necessary but hardly sufficient.

A strongly typed language usually makes for a poor glue language, IME, in terms of productivity. Where a glue language makes sense, it's because strict typing (or the particular characteristics of the language's typing) is a poor fit for the whole application, but you still want the benefit of that stricter typing (early bug detection, performance) for certain components. Because the types of one language almost never map cleanly to the types of another, you usually want the freedom that looser typing can provide in the glue language. And dynamically typed languages usually make introspection and dependency injection much easier; certainly more natural. That makes it easier to generate or instantiate clean bindings to complex interfaces (i.e. interfaces to software worthwhile to reuse from a different language), to write regression tests, and to refactor and iterate faster at the higher level of the application stack.

None of that describes Go. Rather, with Go you would usually solve your problems using the unique tools that Go provides. For example, to achieve good dependency injection you either leverage its duck typing or switch to CSP-style patterns by interfacing across channels. In other words, because Go provides relatively unique features for a systems language--for example, lexical closures that bind dynamically-allocated mutable objects--when Go is your primary language you should have less reason to resort to another language. Also, given how much people appreciate the freedom that Go's static compilation model provides--something which you've expressed in this thread--utilizing libraries via FFI would seem especially costly relative to most other languages.

Certainly languages other than Go are more suitable for particular tasks, and existing library implementations sufficiently useful to sometimes be worth the cost of binding. But the cost+benefit doesn't make Go a very good glue language as a general matter. The cost is higher because of the otherwise unnecessary constraints imposed when doing FFI, and the general mismatch between Go's unique runtime and every other runtime; and the benefit relatively less because Go has really strong language constructs often lacking in even dynamically typed languages. And this is generally what I hear from Go developers--to avoid cgo; that you quickly run into headaches for anything non-trivial; that if you find yourself relying on cgo too much you're probably doing it wrong. That's not something you should be hearing about a glue language, and not something you hear nearly as often in the context of Lua, Perl, or Python.

Fedora ponders the Python 2 end game

Posted Dec 8, 2017 22:41 UTC (Fri) by togga (subscriber, #53103) [Link]

With numpy announcing that it will drop Python 2 support, this topic is hotter than ever.

"A strongly typed language usually makes for a poor glue language, IME, in terms of productivity."

My thinking is that this might be compensated for by Go's quick compile times when autogenerating the wrapping layer. Such a wrapping layer would be an order of magnitude faster than Python's ctypes. With available debug data (introspection of C interfaces) it should be feasible?
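
For reference, the ctypes baseline being compared against might look like this minimal sketch (libc's strlen as a stand-in for a real third-party library; every call crosses the FFI boundary through interpreted dispatch):

import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t
print(libc.strlen(b"hello"))  # 5

A generated, compiled wrapper avoids that per-call dispatch overhead, which is where the order-of-magnitude claim comes from.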

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 19:33 UTC (Wed) by bokr (guest, #58369) [Link]

You may be interested to read what guile 2.2 does with multiple
languages,

https://www.gnu.org/software/guile/manual/html_node/index...

Specifically the following teaser is more than a teaser for 2.2, and
you can find much more elsewhere (presumably documentation will
catch up and have some direct links from here ;-)
E.g.,

https://www.gnu.org/software/guile/manual/html_node/Compi...

Various formats for the guile 2.2 manual can be had via
https://www.gnu.org/software/guile/manual/

BTW, note that calling things by unconfusing names was recognized
as important pretty long ago (400 BC+- ;-)
https://en.wikipedia.org/wiki/Rectification_of_names

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 6:40 UTC (Wed) by joib (subscriber, #8541) [Link]

> The questions is what platform is the future in the quick and dirty, "just works" productivity-department without the performance issues of Python?

Haskell?

No, I'm sort of serious.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 8:55 UTC (Wed) by rsidd (subscriber, #2582) [Link]

«The questions is what platform is the future in the quick and dirty, "just works" productivity-department without the performance issues of Python?»

Julia. For scientific computing anyway (I'm a scientist). Just as easy to write as python, and if you're careful about declaring types etc when required, blazing fast. And has most of the math stuff from numpy etc built-in, has a Jupyter front-end, can use matplotlib, etc. But it has some startup overhead (being llvm-based) so probably not suitable for scripting.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 9:37 UTC (Wed) by mikapfl (subscriber, #84646) [Link] (21 responses)

I think python3 is a safe bet for the quick-and-dirty, "just works" productivity department. Also, it actually performs faster than python2. I don't really see where you need "quick and dirty" and "high performance" together, but if you need quick-and-dirty first and then want to improve performance, I'd say python3+numba is a great combination.
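
A toy sketch of that workflow (function name illustrative; numba's @njit compiles the plain-Python loop to machine code on first call):

import numpy as np
from numba import njit

@njit
def sum_of_squares(xs):
    # an ordinary Python loop, JIT-compiled by numba
    total = 0.0
    for x in xs:
        total += x * x
    return total

print(sum_of_squares(np.arange(1e6)))

You write it quick-and-dirty first, then add the decorator to the hot function when profiling says you need to.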

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 11:00 UTC (Wed) by niner (subscriber, #26151) [Link] (20 responses)

How on earth can Python 3 be considered "just works" when Python 3 was the thing that broke everything? How should one ever trust the Python core developers again after this massive screw up? Especially since they have not shown any sign of even acknowledging their mistake.

No, the answer is "it depends". Some names have already been mentioned. For "quick and dirty", "just works" and "takes backwards compatibility really seriously" I'd personally just use Perl.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 11:55 UTC (Wed) by Kamilion (guest, #42576) [Link] (17 responses)

Huh.
*Holds up a copy of Perl 6 and Parrot Essentials (2003)*

*connects to VM*
~$ perl --version

This is perl 5, version 22, subversion 1 (v5.22.1) built for x86-linux-gnu-thread-multi (with 58 registered patches, see perl -V for more detail)
Copyright 1987-2015, Larry Wall

*scratches head*

Sooooooo, where's the perl6 I tried to learn 14 years ago, before resorting to learning python?

I mean, yeah, python3's taken a few years, but when you say you'd personally just use Perl, and after saying "Especially since they have not shown any sign of even acknowledging their mistake.", I find myself feeling somewhat amused and confused at the same time.

On the flip side, I try to take a lot of care that python code I touch runs in both 2.7 and 3.4+.
https://github.com/kamilion/customizer/commit/f0b5521ef30...

Fixing the compatibility issues between the versions using u'{}'.format(data) to ensure all data was unicode, even under python 2, wasn't really terribly difficult.
I've had more trouble trying to bring some old python 2.3 code back to life.
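
For instance, the trick mentioned above boils down to something like this (runs unchanged on 2.7 and 3.3+, where u'' literals were reinstated by PEP 414; the data is illustrative):

# -*- coding: utf-8 -*-
data = b"m\xc3\xb6p".decode("utf-8")
label = u"{}".format(data)     # unicode on 2.7, str on 3.x - the text type either way
print(type(label).__name__)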

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 12:05 UTC (Wed) by lkundrak (subscriber, #43452) [Link]

Perl 5 and (perhaps unfortunately named) Perl 6 are different languages. Unlike Python 2, Perl 5 is actively maintained and doing well.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 12:08 UTC (Wed) by niner (subscriber, #26151) [Link] (15 responses)

Perl 6 is a different language than Perl 5. It's a clean break, but unlike the Python 2 -> 3 transition it at least brings real improvements: being able to use multiple CPU cores in Perl 6 code (no GIL), state-of-the-art Unicode support rivaled only by Swift, reactive programming, real grammars instead of just regular expressions, and lots more.

And again in contrast to Python 2 -> 3, Perl 5 code can still be used via https://github.com/niner/Inline-Perl5/
There's no need to port whole code bases to Perl 6. Perl 5 and Perl 6 code can live side by side in the same program, allowing for a piecemeal transition if so desired.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 12:23 UTC (Wed) by anselm (subscriber, #2796) [Link] (14 responses)

So the Perl 6 developers did everything right except when they tried to figure out a name for their new language.

Frankly, I think that the issues with the Python-2-to-3 transition have been wildly overhyped. It is quite possible to write code that runs with both Python 2.7 and Python 3.x, and there are automated tools to help with that. The main mistake the Python developers made was to underestimate the time it would take to adapt various popular third-party libraries, but by now this is pretty much finished. I personally went over to Python-3-only for new code a while ago and haven't looked back.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 12:30 UTC (Wed) by niner (subscriber, #26151) [Link] (4 responses)

At work we sit on roughly 1.5 million Python 2 expressions baked into this lovely templating language called DTML [1]. There's no way anyone would pay for porting those and no existing tool will help us with that. So come 2020, we can either just continue to use an unsupported Python 2.7, probably compiled from source, or try to get rid of Python entirely. Our system already compiles most of those to Perl code before executing. For getting rid of Python 2 completely however, we'll have to be able to compile whole Python scripts including function definitions (but luckily no class definitions). Lots of work, but at least then we will be able to stay backwards compatible forever. Both with old code and with existing know how.

[1] http://docs.zope.org/zope2/zope2book/DTML.html#using-pyth...

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 14:10 UTC (Wed) by anselm (subscriber, #2796) [Link] (1 responses)

Oh. DTML. That sucks. I feel your pain :^(

As far as I'm concerned, Zope did look like a good idea in the early 2000s or so but quickly became a liability. Fortunately I managed to move off it soon afterward. I'm using Django today, which by now works great with Python 3.

Incidentally, there seem to be enough people in situations similar to yours that Python 2.7 support might not go away completely in 2020. It's just that the head Python guys said it won't be them providing that support. In effect, starting today you people have more than two years to pool your money and get Python 2.7 LTS organised. It's not as if the code base required loads of TLC to keep running, so this may be cheaper in the end than moving millions of lines of code to something else.

DTML

Posted Aug 2, 2017 14:21 UTC (Wed) by corbet (editor, #1) [Link]

Way back around then, when I was working to replace the initial PHP-based LWN site, I did a prototype in Zope and DTML. Then I got distracted for a few months. When I came back to it I couldn't understand anything I'd done and had to start over from the beginning trying to figure out what all the weird wrapper layers did. I concluded that it would always be that way and started over with something else... I've never regretted that decision.

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 16:35 UTC (Wed) by smoogen (subscriber, #97) [Link]

Or someone will offer a commercial branch of python-2.7 which has security updates applied to it for another decade. I mean that was the major reason for getting a paid version of Sun Fortran or C in the late 90's.. to keep that K&R and Fortran IV going :).

Fedora ponders the Python 2 end game

Posted Aug 2, 2017 18:54 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Python 2 will be maintained past 2020, there are too many projects dependent on it. RHEL 7 will be supported until 2027 at least and it contains Py2.

Fedora ponders the Python 2 end game

Posted Aug 4, 2017 1:49 UTC (Fri) by lsl (subscriber, #86508) [Link] (8 responses)

> The main mistake the Python developers made was to underestimate the time it would take to adapt various popular third-party libraries, but by now this is pretty much finished.

I don't think so. I still see Python programs (allegedly supporting Python 3) heavily shitting themselves upon encountering data that cannot be decoded to Unicode. We're talking external data here, like stuff coming in over the network or user-created file names.

Fedora ponders the Python 2 end game

Posted Aug 4, 2017 4:23 UTC (Fri) by smckay (guest, #103253) [Link] (7 responses)

Does Python 3 make it hard to handle malformed text? I mainly use Java and write backend code so the complaint about non-Unicode filenames is hard to understand. Does that happen a lot? Is it a client-side issue? For me the solution would be to tell ops to stop being cute and use normal filenames. :)

It does sound like the Unicode codecs have significant problems if exceptions are part of the default behavior. It's not like you can tell the socket/pipe/file to stop pulling your leg and cough up the *good* data. Rule #1 of text handling: do the best you can with what you're given and hope no one notices the ÃÆ.
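
For what it's worth, the exception is only the default, not the only option; a minimal sketch (Python 3, bytes chosen for illustration):

data = b"m\xf6p.txt"  # Latin-1 bytes, not valid UTF-8
# data.decode("utf-8")  # the default 'strict' handler raises UnicodeDecodeError
print(repr(data.decode("utf-8", errors="replace")))          # 'm\ufffdp.txt'
print(repr(data.decode("utf-8", errors="surrogateescape")))  # lossless, round-trippable

So "do the best you can" is available per call site; you just have to ask for it.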

Wrong file name encoding made easy

Posted Aug 4, 2017 5:17 UTC (Fri) by mbunkus (subscriber, #87248) [Link] (2 responses)

> I mainly use Java and write backend code so the complaint about non-Unicode filenames is hard to understand. Does that happen a lot?

It's really trivial to recreate. Take a non-ASCII file name on Windows, e.g. "möp.txt". Compress it into a ZIP, copy it to Linux and unzip it there. You'll end up with:

$ unzip the-zip.zip
Archive: the-zip.zip
inflating: mp.txt
$ ls -l
total 28
-rw-rw-r-- 1 mosu vj 18617 Aug 3 20:27 'm'$'\366''p.txt'
-rwxr--r-- 1 mosu vj 4272 Aug 4 07:07 the-zip.zip
$ rm m$'\366'p.txt
$ 7z x the-zip.zip
…snipped output…
$ ls -l
total 28
-rw-rw-r-- 1 mosu vj 18617 Aug 3 20:27 'm'$'\302\224''p.txt'
-rwxr--r-- 1 mosu vj 4272 Aug 4 07:07 the-zip.zip

The reason is simple: stupid file formats. There is no spec for file name encoding in ZIPs, and no file name encoding indicator either. One could always use 7z, of course, where the spec states that file names be encoded in one of the Unicode encodings; but Windows doesn't come with support for 7z out of the box, only for ZIP. Aaaaand try getting people to use some new(ish) format and see how much success you have :(

Another thing that happens more often than I'd like is some mail program or other getting MIME wrong somehow, so that an attachment with non-ASCII characters in its file name gets saved with the wrong encoding, resulting in similar brokenness.

Wrong file name encoding made easy

Posted Aug 7, 2017 20:15 UTC (Mon) by cesarb (subscriber, #6266) [Link] (1 responses)

> There are no specs for file name encoding in ZIPs. There's no file name encoding indicator either.

Actually, there is. See appendix D of the ZIP spec at https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT which says:

- If bit 11 is set, the filename is in UTF-8
- If bit 11 is unset, the filename is in CP437
- The UTF-8 filename can also be in extra record 0x7075
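
Python's zipfile module implements exactly this rule; a sketch of checking it by hand (archive name taken from the example above, and the cp1252 re-guess is purely a heuristic for illustration):

import zipfile

with zipfile.ZipFile("the-zip.zip") as zf:
    for info in zf.infolist():
        if info.flag_bits & 0x800:  # bit 11 set: name was stored as UTF-8
            print("utf-8 name:", info.filename)
        else:
            # zipfile decoded the name as CP437 per the spec; undo that
            # and re-guess with a more likely Windows codepage
            raw = info.filename.encode("cp437")
            print("cp437 name, re-guessed:", raw.decode("cp1252", "replace"))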

Wrong file name encoding made easy

Posted Aug 8, 2017 2:40 UTC (Tue) by smckay (guest, #103253) [Link]

Ah, so when we run into a non-compliant zip file it is time to barf and die because we are running Python 3. I think I understand now.

Fedora ponders the Python 2 end game

Posted Aug 4, 2017 8:36 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

The right idea is to avoid Unicode decoding altogether. Just treat input as a stream of bytes for as long as you can.

For example, I have recently struggled with a Py3-based proxy server that failed with one broken client that sends non-ASCII header names.
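
I.e. something along these lines (toy example, header bytes invented for illustration):

raw = b"X-Weird-\xc3\xa9-Header: some value\r\n"
name, _, value = raw.partition(b":")
# still bytes end to end; decode (or reject) only at the presentation edge
print(name.strip(), value.strip())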

Fedora ponders the Python 2 end game

Posted Aug 5, 2017 21:14 UTC (Sat) by flussence (guest, #85566) [Link] (2 responses)

A lot of modern languages try to force the Unicode issue by hiding the data from the programmer behind a stone wall of abstraction. I've learned from trying to use several of them that there's just no sane default behaviour for all strings. Latin-1 is bad because it forces you to jump through hoops to handle human-readable text correctly; Unicode is *worse* because it forces you to jump through hoops to handle machine-readable text correctly (humans are generally more forgiving parsers).

The most programmer-abusive thing I've tried to use lately is actually perl6's binary types. There's no concept of endianness so the 16/32 bit wide variants are completely unusable for I/O… and they *only* work with I/O functions. You can't pack, unpack, recast to array or any other kind of high level operation on them. The rest of the language is (mostly) sane, but this forgotten corner has all the readability and portability of disassembled SIMD code with the performance of said asm being emulated in a high level language.

Fedora ponders the Python 2 end game

Posted Aug 7, 2017 11:12 UTC (Mon) by niner (subscriber, #26151) [Link] (1 responses)

"and they *only* work with I/O functions. You can't pack, unpack, recast to array or any other kind of high level operation on them."

That's not exactly true:
> perl6 -e '"Ödögödöckö".encode.say'
utf8:0x<c3 96 64 c3 b6 67 c3 b6 64 c3 b6 63 6b c3 b6>

> perl6 -e '"Ödögödöckö".encode.List.say'
(195 150 100 195 182 103 195 182 100 195 182 99 107 195 182)

> perl6 -e 'use experimental :pack; pack("NNN", 1, 2, 3).say'
Buf:0x<00 00 00 01 00 00 00 02 00 00 00 03>

> perl6 -e 'use experimental :pack; pack("NNN", 1, 2, 3).unpack("NNN").say'
(1 2 3)

Fedora ponders the Python 2 end game

Posted Aug 9, 2017 20:58 UTC (Wed) by flussence (guest, #85566) [Link]

Those are the 8-bit variants, of course they work. People notice when something as fundamental as UTF-8 breaks (usually!)

I'm talking about things like this:
> perl6 -e 'use experimental :pack; buf32.new(0x10203040, 1, 2, 3, 4).unpack("N").say'
4538991231697411

Or, here's a basic “real world” example I just made up: Read two equal-size image files in farbfeld format, alpha composite them, and write out the result to a new file… what would idiomatic perl6 code for that look like? Probably shorter than this comment if these bits of the language worked, but they don't.

Sorry for what looks like nitpicking some obscure corner of the language, but I've seen a few too many people get burned out exploring these dark corners; they receive the silent treatment when they point out the language is getting in their way, and subsequently ragequit. There's a lot of this broken window syndrome outside of the cool-oneliner-demo APIs, and it's been like this since forever.

Fedora ponders the Python 2 end game

Posted Aug 3, 2017 10:03 UTC (Thu) by Otus (subscriber, #67685) [Link] (1 responses)

> How on earth can Python 3 be considered "just works" when Python 3 was the thing that broke everything?

Python 3 just works if you are starting from scratch.

Had Python 2 not had as much adoption as it did, the breakage would have been a good idea. As it was, they should have deprecated one thing at a time, in a manner that would have been backwards compatible for a couple of releases. (And left out the meaningless changes, some of which have since been rolled back.)

> How should one ever trust the Python core developers again after this massive screw up?

That is the clincher. I hope they've learned the lesson.

Fedora ponders the Python 2 end game

Posted Dec 22, 2017 20:30 UTC (Fri) by togga (subscriber, #53103) [Link]

No. Python3 is the core problem here (along with end of life for Python2). At least for me, developing in python3 is cumbersome and full of practical issues. Python3 is just not a productive language for me, and therefore I'm seeking alternatives.

