Hovmöller: Moving a large and old codebase to Python3 [LWN.net]

2to3 was a bad idea

Posted Feb 21, 2018 2:13 UTC (Wed) by vstinner (subscriber, #42675) [Link] (28 responses)

I recently gave a talk "Python 3: 10 years later" at FOSDEM: https://fosdem.org/2018/schedule/event/python3/

I explained that it took the Python community a few years to understand that 2to3 was a bad idea. Dropping Python 2 support immediately is simply not possible for many good reasons. Adding Python 3 support now seems obvious, but it wasn't the first approach promoted in the community.

It's not just the 2to3 tool, but also Python 2.7 and 3.x which were designed to force developers to drop Python 2 support at once. Again, it took a few years to adjust Python 2.7 and 3.x to ease the migration to Python 3.

2to3 was a bad idea

Posted Feb 21, 2018 7:47 UTC (Wed) by niner (subscriber, #26151) [Link] (27 responses)

What I do not understand to this day is why make it so hard on the users in the first place? Why not let programmers use both Python 2 and Python 3 in the same program, as is possible with Perl 5 and Perl 6? That way a program could be upgraded piece by piece instead of all at once.

2+3 in the same codebase

Posted Feb 21, 2018 9:09 UTC (Wed) by smurf (subscriber, #17840) [Link]

Hindsight is 20/20. At the time there were too many semantic differences.

Also, in Perl the internals can easily discover which environment they're currently executing in (as opposed to the environment they have been created in). That's far more difficult (and thus inefficient) in Python, but would have been required for str-vs.-bytes, semantics of bytes/bytearray, iterators, …

2to3 was a bad idea

Posted Feb 21, 2018 10:22 UTC (Wed) by vstinner (subscriber, #42675) [Link] (10 responses)

> like it's possible with Perl 5 and Perl 6

I may be wrong, but I understood that hacks like Inline-Perl5 to run Perl5 in Perl6 use two processes. The Perl6 process spawns a Perl5 process and then does RPC calls and exchanges data. It works. But it's maybe not the perfect solution. There are many implementations of Perl6, but none were forked from Perl5. So Perl6 breaks the full C API, and is very different from Perl5. I don't know Perl well, but this recent open letter gives an idea of the situation:
https://www.perl.com/article/an-open-letter-to-the-perl-c...

If you read my slides, Python 3 was forked from Python 2 on purpose. Backward compatibility and reducing the number of differences between Python 2 and Python 3 were part of the Python 3 design. Technically, it's possible to write Python code that runs on both Python 2 and Python 3 with a few changes (see the six module). I consider that this part is a success.
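As a rough illustration of the "few changes" mentioned above, a module can opt in to Python 3 behaviour with `__future__` imports and define a small alias for the text type. This is only a minimal sketch that does not use six itself; the names `PY2`, `text_type` and `describe` are made up for the example.

```python
# Intended to run unchanged on Python 2.7 and Python 3.x.
from __future__ import division, print_function, unicode_literals

import sys

PY2 = sys.version_info[0] == 2

# Hypothetical alias: `unicode` on Python 2, `str` on Python 3.
# The conditional expression never evaluates `unicode` on Python 3.
text_type = unicode if PY2 else str  # noqa: F821


def describe(value):
    """Return a text (never bytes) description of a ratio."""
    # With `division` imported, 1 / 2 is 0.5 on both versions.
    return text_type("half is {0}").format(1 / value)


print(describe(2))
```

Libraries such as six package up aliases of this kind so that each project does not have to reinvent them.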

> What I do not understand - to this day is why make it so hard on the users in the first place? Why not let programmers use both Python 2 and Python 3 in the same program (...)

Honestly, I don't think that it's for a technical reason. Technically, you can do many things. In practice, Python is written by a small team of developers working in their free time. Developers prefer to work on a fresh code base without all the design issues of Python 2. But many changes were made in Python 2.7 and Python 3.x to make the migration simpler.

Python prefers to regularly make tiny backward incompatible changes (prepared with deprecation warnings in the previous release) to reduce the technical debt inside Python. Perl works differently. In Perl, you can ask to get the old behaviour in one file (ex: run as Perl 5.14 in Perl 5.20).

IMHO it's more a deliberate choice to regularly polish the language and its standard library.
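The deprecation cycle described above can be sketched with the standard `warnings` module. This is an illustration only: `old_api` and `new_api` are hypothetical names, not functions from Python itself.

```python
import warnings


def new_api():
    """Hypothetical replacement function."""
    return 42


def old_api():
    """Hypothetical function scheduled for removal in a later release."""
    warnings.warn(
        "old_api() is deprecated; use new_api() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to old_api
    )
    return new_api()


# DeprecationWarning is often hidden by default, so opt in explicitly here.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_api()

print(result, [str(w.message) for w in caught])
```

Running a test suite with `python -W error::DeprecationWarning` turns such warnings into exceptions, which is one way to find deprecated calls before the release that removes them.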

It's not easy to compare Python and Perl because of these differences. But I let you decide between Perl6 and Python3 which one succeeded. Hint: Perl developers like to repeat that Perl6 is a different language that should have a different name.

Perl5 continues to evolve whereas the Python community decided to "un-schedule" Python 2.8.

2to3 was a bad idea

Posted Feb 21, 2018 11:32 UTC (Wed) by niner (subscriber, #26151) [Link] (9 responses)

Inline::Perl5 works with a single process using Perl 5's embedding interface. So yes, there are two interpreters/virtual machines, but they are in the same process. Data has to be marshalled/converted between the two and one needs shims to make e.g. classes of the one language available in the other. Inline::Perl5 makes this rather seamless though with the effect that for most cases it just magically works(tm). Of course Perl's flexibility as a language helps in this. Note that there are also Inline::Python modules both for Perl 5 and Perl 6. They allow for the same kind of interoperability between Python and Perl.

For compatibility between Python 2 and Python 3 code, embedding is not an option since ironically their internals are too similar, i.e. they share symbol names. But combining the two is still possible using two processes and IPC, like you imagined Inline::Perl5 would work. That would have been an option since day one of Python3K. Of course it's not perfect and performance would be an issue. But there are countless cases where something like this could have spared users a lot of pain and made it easier for people to start using Python 3, even though some essential 3rd party libraries had not migrated yet. And it'd have made it easier for library authors to upgrade without losing their user base. It could even still help today.
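The two-process approach described above could look roughly like the following sketch, in which a parent interpreter sends a JSON request to a child interpreter over a pipe. Everything here is an assumption made for illustration: the line-based JSON protocol, the `WORKER` script, and the function name are invented, and `sys.executable` merely stands in for the path of the other Python version. It is not how Inline::Perl5 works, since that runs in a single process.

```python
import json
import subprocess
import sys

# Code executed by the child interpreter. It sticks to syntax that is valid
# on both Python 2.7 and Python 3, so either version could play this role.
WORKER = r"""
import json, sys
for line in iter(sys.stdin.readline, ""):
    request = json.loads(line)
    reply = {"result": request["text"].upper()}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


def call_other_interpreter(text, interpreter=sys.executable):
    """Send one request to a child interpreter and return its reply."""
    proc = subprocess.Popen(
        [interpreter, "-c", WORKER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # exchange text rather than bytes
    )
    out, _ = proc.communicate(json.dumps({"text": text}) + "\n")
    return json.loads(out)["result"]


print(call_other_interpreter("legacy library call"))
```

A real bridge would keep the child alive between calls and would need a richer marshalling scheme for objects and exceptions, which is where the performance and comfort issues mentioned above come in.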

As the author of Inline::Perl5 I can tell you that getting it to a point where I could use Perl 5's database support in Perl 6 was an effort of two fun afternoons of hacking. Once you've got a basic communications channel open, it's really just a matter of making it more comfortable to use and blurring the borders between languages.

Regarding the success of Python 3 vs. Perl 6 I can share a user's perspective: my company is stuck on Python 2 with no sane way to upgrade while we are already using some Perl 6 code in production. And the sole reason for the latter is the ability to combine both Perls in a system.

2to3 was a bad idea

Posted Feb 22, 2018 6:59 UTC (Thu) by mb (subscriber, #50428) [Link] (8 responses)

>For compatibility between Python 2 and Python 3 code, embedding is not an option

I don't think this is the case.

It is easily possible to have a program that runs on 3.x and 2.7, as long as new 3.x-only features and old deprecated pre-2.7 features are not used.
So in the transition period it's just needed to first port to 2.7 and then to 3.x.
Code embedding is not needed at all, because the syntax is fully compatible. The rest is handled by the highly dynamic nature of the language. You can just do if 3: foo; else bar for API differences in the libraries.
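The "if 3: foo; else bar" pattern above could be written like this for one library whose location differs between the two versions. It is a minimal sketch; `host_of` is a made-up helper name.

```python
import sys

if sys.version_info[0] >= 3:
    from urllib.parse import urlparse  # Python 3 location
else:
    from urlparse import urlparse      # Python 2 location


def host_of(url):
    """Return the network location part of a URL on either version."""
    return urlparse(url).netloc


print(host_of("https://fosdem.org/2018/schedule/event/python3/"))
```

Only the import is version-specific; the rest of the module is identical on both interpreters.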

2to3 was a bad idea

Posted Feb 22, 2018 7:41 UTC (Thu) by niner (subscriber, #26151) [Link] (6 responses)

The point is that you have to port all your code including all your dependencies and their dependencies and ... to 2.7 and then again all your code and all those dependencies to 3.x. If that were easy or quick, it wouldn't have taken Python 3 10 years to gain acceptance.

2to3 was a bad idea

Posted Feb 22, 2018 8:35 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link] (5 responses)

>If that were easy or quick, it wouldn't have taken Python 3 10 years to gain acceptance.

2.7 didn't exist 10 years back nor did 3.x releases that bridged the gap between 2.x and 3.x releases. If 3.0 had all of that relevant functionality of the latest 3.x release and 2.7 was released before 3.0 and the appropriate libraries like six or future were released along with 3.0 release, the time taken would be less but the Python community didn't have all the insight into this that they have now.

2to3 was a bad idea

Posted Feb 22, 2018 18:36 UTC (Thu) by raven667 (subscriber, #5198) [Link] (4 responses)

> If 3.0 had all of that relevant functionality of the latest 3.x release and 2.7 was released before 3.0 and the appropriate libraries like six or future were released along with 3.0 release, the time taken would be less but the Python community didn't have all the insight into this that they have now.

I'm not sure of that. I imagine that there were Python developers who were pointing out the transition difficulties at the time, who saw the engineering and social challenges, but were ignored by the leadership. It might be good for someone more familiar with the details to go back and figure out who was right and who was wrong, to have some accountability for those decisions, to identify who has a good intuition for likely future outcomes and who doesn't. One of humanity's super-powers over other living things is the ability to accurately predict the future, to envision consequences. But at the time decisions get made you don't know for certain what the outcome is likely to be, and without looking back you can't assign weight to conflicting opinions and worldviews, so every decision is like it's being made for the first time.

2to3 was a bad idea

Posted Feb 23, 2018 20:20 UTC (Fri) by rahvin (guest, #16953) [Link] (3 responses)

> It might be good for someone more familiar with the details to go back and figure out who was right and who was wrong, to have some accountability for those decisions, to identify who has a good intuition for likely future outcomes and who doesn't.

I don't think pointing fingers will get you anywhere. Regardless of who was responsible, they appear to have mostly figured it out at this point. People hopefully learned from the mistake. Besides, it's probably more than one or even two people that made the initial decision. IME pointing fingers is literally the worst thing you can do. It's a destructive process to attempt to assign blame, and it often destroys team cohesion.

2to3 was a bad idea

Posted Feb 26, 2018 21:46 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (2 responses)

No finger pointing was mentioned. The idea is to analyze how the decision-making process works (or doesn't) and how to improve it. Those who care about blame will find some and assign it anyways.

2to3 was a bad idea

Posted Feb 26, 2018 22:16 UTC (Mon) by raven667 (subscriber, #5198) [Link]

Yes, that is what I meant, playing some blame and shame game is not effective to generate learning, only the avoidance of pain, whereas accountability and analysis of the decision making process, after the outcomes are known, can generate better decisions in the future about the future.

2to3 was a bad idea

Posted Mar 6, 2018 11:02 UTC (Tue) by dgm (subscriber, #49227) [Link]

I mostly agree. The question is not about pointing fingers, but about avoiding such bad decisions in the future. For that, looking at the root cause is the most important thing. Were there any good arguments that were being ignored? If so, why? Of course, you have to balance that with analysis paralysis. Decisions need to be made in reasonable time, otherwise they are not any good.

Who got it right is important, but only so much. You have to take into account the broken clock effect (you know, even a broken clock gives the right time twice a day), so better start listening to people with future-viewing super powers once they have proved themselves, at least, a couple of times.

2to3 was a bad idea

Posted Feb 26, 2018 21:49 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

Yes, Python and C code could be Python2 and Python3 compatible, but that requires code changes. True embedding means I can link to libpython2.7.so *and* libpython3.5m.so at the same time and mediate between them (see Lunatic[1] for how it could work). The use of the same symbols and global variable names prevents this approach.

[1]https://labix.org/lunatic-python

2to3 was a bad idea

Posted Feb 21, 2018 10:49 UTC (Wed) by ceplm (subscriber, #41334) [Link] (14 responses)

The dark secret of all py2k -> py3k converters (of whom I am one) is that they are a bit ashamed of most of the work they did.

Only during porting did we find out that our glorious test suite is actually completely useless for supporting really large refactoring.

And especially those of us who were not blessed/cursed with a native language that fits in the ISO-8859-1 encoding were, for the first time in their lives, forced to stop pretending that the whole world fits into that encoding. They had to read https://wp.me/p83KNI-eH for the first time in their lives, and many of us have never forgiven the pain it caused them (https://is.gd/vZMzJ0 ;)).

And yes, I think people with ISO-8859-1 languages outside of US-ASCII are even worse than Americans. The latter are (sometimes, rarely) a bit humble because they know their native character encoding is insufficient. The French, Germans and other ISO 8859-1 natives look down on Americans, smugly persuaded that they know all about foreign languages, and yet they know nothing.

And yes, a bit of self-promotion. I have just finished porting M2Crypto to dual py2k/py3k; it is, with a bit of marketing hype, the most complete set of Python bindings for OpenSSL (especially useful if you want more than getting 's' in your https). It is available on PyPI, and the GitLab repo with all issues, merge requests and all that jazz is at https://gitlab.com/m2crypto/m2crypto/ .

2to3 was a bad idea

Posted Feb 22, 2018 15:12 UTC (Thu) by nim-nim (subscriber, #34454) [Link] (9 responses)

??? French and German need € which is not in ISO-8859-1

French needs Œ/œ which requires at least iso-8859-15 (and it was always easier to migrate directly to UTF-8 than get iso-8859-15 properly supported by US-centric tools)

German also needs capitalized ss nowadays

Replace French and German people by "French and German programmers using Windows with CP 1252 and pretending it is ISO-8859-1" and you may be closer to the truth.

2to3 was a bad idea

Posted Feb 22, 2018 15:44 UTC (Thu) by ceplm (subscriber, #41334) [Link]

Sorry, you are right. At least, s/ISO 8859-1/ISO 8859-15/g of course.

2to3 was a bad idea

Posted Feb 22, 2018 22:52 UTC (Thu) by anselm (subscriber, #2796) [Link] (6 responses)

German also needs capitalized ss nowadays

Do we, now? I live in Germany and I don't think I've ever seen a capitalized ß anywhere except in press releases from the Unicode committee.

2to3 was a bad idea

Posted Feb 23, 2018 4:39 UTC (Fri) by spaetz (guest, #32870) [Link] (1 responses)

German news site Tagesschau claims that this will affect family names in identity cards. So, there will be effects.

2to3 was a bad idea

Posted Feb 23, 2018 11:45 UTC (Fri) by vstinner (subscriber, #42675) [Link]

"""
Python doesn't make politic :-) Python implements Unicode standard.

Python 3.7 currently uses Unicode 10.0:

haypo@selma$ ./python
Python 3.7.0a0 (heads/master:b903067, Jun 30 2017, 11:49:25)
>>> unicodedata.unidata_version
'10.0.0'

It seems like Unicode 10 still uses "SS":

>>> 'ß'.upper()
'SS'

The German government has to change the Unicode standard :-) Please report this "bug" to the Unicode standard :-) So I closes this issue as "third party.
"""

https://bugs.python.org/issue30810
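For readers who want to check the behaviour in the quoted session themselves, here is a small Python 3 sketch. It only uses `str` methods and the standard `unicodedata` module; the variable names are made up.

```python
import unicodedata

sharp_s = "\u00df"          # ß, LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # ẞ, the capital form discussed in this thread

# The capital letter exists in Unicode as its own code point.
print(unicodedata.name(capital_sharp_s))

# Uppercasing ß still follows the Unicode mapping to "SS",
# as in the session quoted above.
print(sharp_s.upper())

# Lowercasing the capital form gives ß back.
print(capital_sharp_s.lower() == sharp_s)

# casefold() is meant for caseless comparison and expands ß to "ss".
print(sharp_s.casefold())
```

So the capital letter is representable and lowercases as expected; only the default uppercase mapping of ß is unchanged.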

2to3 was a bad idea

Posted Feb 23, 2018 7:00 UTC (Fri) by tdz (subscriber, #58733) [Link] (2 responses)

Think of typesetting with small capitals (https://en.wikipedia.org/wiki/Small_caps). The old single-size ß never looked good in these cases.

2to3 was a bad idea

Posted Feb 25, 2018 0:46 UTC (Sun) by anselm (subscriber, #2796) [Link] (1 responses)

You don't use “ß” when typesetting in small caps.

2to3 was a bad idea

Posted Feb 26, 2018 9:54 UTC (Mon) by smurf (subscriber, #17840) [Link]

You didn't before the capital 'ß' existed. You do now. I've seen real-life examples.

2to3 was a bad idea

Posted Feb 23, 2018 10:03 UTC (Fri) by nim-nim (subscriber, #34454) [Link]

Like any encoding change it will take time to be adopted, with lots of techies pretending it does not exist and blocking non-techies from using it, in order to avoid dealing with migrations.

(That's where the smug people ceplm complains about come from, they will pretend their native language only needs ISO-8859-1, and work in an en_US locale with a qwerty keyboard, in the hope it convinces @boss to postpone i18n work. They mostly only end up convincing themselves.)

IIRC capital ss adoption was not uniform in German-speaking countries, with some sold on it and others more cautious.

2to3 was a bad idea

Posted Feb 28, 2018 10:19 UTC (Wed) by tedd (subscriber, #74183) [Link]

German also needs capitalized ss nowadays

Oh God, too much Reddit. I thought there was a WWII joke in there somewhere.

2to3 was a bad idea

Posted Feb 23, 2018 0:05 UTC (Fri) by raiph (guest, #89283) [Link] (3 responses)

It seems to me that the python language team:

* is unaware that Unicode is the planet's standard for text characters (seems unlikely); or

* has decided to permanently limit python's built in character handling to work only with the subset of human languages represented by the most politically powerful within the tech community in the 90s (if true, seems incredible); or

* is aware that Python 3 ignores this issue and is contemplating a Python 4 that will radically alter its character handling.

Does anyone know which of these three applies?

2to3 was a bad idea

Posted Feb 23, 2018 8:59 UTC (Fri) by smurf (subscriber, #17840) [Link] (2 responses)

Python 3 does *not* ignore this issue. Its strings are all Unicode. Python2 did. Its unicode-vs.-string handling worked for test cases but was an unhandle-able mess when you needed to keep binary and Unicode data separate. Python3 introduced that separation but, as was said, the test cases were completely unable to capture the real-life difficulties that approach caused.

My opinion: better late than never.
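The separation between binary and Unicode data described above can be seen in a few lines of Python 3. This is just an illustrative sketch with made-up variable names.

```python
# Bytes, as they would arrive from a socket or a file opened in binary mode.
raw = "Hovm\u00f6ller".encode("utf-8")

# Bytes and text are different types; mixing them fails loudly
# instead of silently guessing an encoding as Python 2 often did.
try:
    raw + "!"
except TypeError as exc:
    mixing_error = exc

# Crossing from bytes to text is an explicit step with a named encoding.
text = raw.decode("utf-8")

# Byte count versus character count: ö takes two bytes in UTF-8.
print(len(raw), len(text))
```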

2to3 was a bad idea

Posted Feb 23, 2018 10:58 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

There is no magic Unicode data. It simply doesn’t exist.

2to3 was a bad idea

Posted Feb 23, 2018 20:26 UTC (Fri) by rahvin (guest, #16953) [Link]

Given what python went through they might be able to write one!

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 21, 2018 7:13 UTC (Wed) by theophrastus (guest, #80847) [Link]

My py2 to py3 situation can be considered unusual and likely small compared with any here, but I did discover for myself a particularly labyrinthine rats-nest that was neither touched by '2to3' nor seen coming by any of us more casual users: specific interactions with other processes. In short, we had a lot of python routines which communicated with a lot of networked resources via a dozen different aging protocols (direct subprocesses, UDP, NFS, ssh, old socket libraries, and the like), and it appeared completely random whether those streams came in as strings or bytes, with some starting as one and switching to the other once the connection was established. So the subsequent functions handling all the data had to be split according to whether bytes or strings had to be processed at intermediate levels, particularly those which applied regular-expression parsing.

There were more than a few of us who were ready to throw up our hands and stick with python2. But we persisted, and I think we've got our in-house data harvesting all converted to python3 now. But I've never encountered a process where the solution felt more Pyrrhic. Too bad there aren't merit badges for this sort of thing.
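One common way to contain the kind of bytes-or-strings uncertainty described above is to normalise data once, at the boundary, so that downstream parsing only ever sees text. The following Python 3 sketch is not the poster's solution; `to_text`, `parse_status`, the UTF-8 default and the `errors="replace"` policy are all assumptions chosen for illustration.

```python
import re


def to_text(data, encoding="utf-8"):
    """Hypothetical helper: decode bytes at the boundary, pass text through."""
    if isinstance(data, (bytes, bytearray)):
        # Assumed policy: never raise, mark undecodable bytes instead.
        return bytes(data).decode(encoding, errors="replace")
    return data


# One text pattern serves every caller once the input is normalised.
# Applying a str pattern directly to bytes would raise TypeError on Python 3.
STATUS = re.compile(r"status=(\w+)")


def parse_status(chunk):
    """Extract the status field from a chunk of either bytes or text."""
    match = STATUS.search(to_text(chunk))
    return match.group(1) if match else None


print(parse_status(b"host=a status=ok"), parse_status("host=b status=down"))
```

The alternative, keeping parallel bytes and text versions of every pattern, is what leads to the split functions the comment describes.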

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 21, 2018 14:43 UTC (Wed) by jhoblitt (subscriber, #77733) [Link] (3 responses)

Why would six be preferred over future? The latter allows code to be slowly converted to be 'native' python3 compatible while python2 support can be dropped after full conversion by simply removing the import(s).

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 21, 2018 21:52 UTC (Wed) by ceplm (subscriber, #41334) [Link] (2 responses)

Well, first of all, because I have never heard about future until now. Unfortunately, it doesn't seem to be the only reason.

The nice thing about six is that it is so simple, just one file, that it doesn't have to be a dependency of your project; it can simply be bundled and updated whenever you feel like it.

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 21, 2018 22:54 UTC (Wed) by jhoblitt (subscriber, #77733) [Link]

At $day_job, we've migrated several 100K lines of py2, with external scientific users, to py3 using future and it has been fairly painless. We're planning to completely drop support for py2 in the next couple of months (finally...).

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 23, 2018 10:05 UTC (Fri) by jengelh (guest, #33263) [Link]

And then there's python-nine, which is python-six turned the right way (according to them).

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 21, 2018 20:53 UTC (Wed) by togga (subscriber, #53103) [Link] (7 responses)

After too much day-to-day py2/py3 experience and after reading all the stories, my advice is to migrate away from this "continuous migration language". I've never seen such issues in other languages. Apart from that, performance (especially multithreading) is still way off. There are more modern, more efficient and more stable languages out there.

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 21, 2018 22:06 UTC (Wed) by ceplm (subscriber, #41334) [Link] (6 responses)

You apparently haven't seen much:

* How well are you doing with your Perl 6 code? (Not to mention some nasty incompatibilities between some minor versions of Perl 5, e.g., in relation to Unicode).

* Ruby 1.8 -> 1.9? Do I need to say more (that's minor version again) And of course, 1.9 -> 2.0 wasn't without its pains as well.

* NodeJS? v0.10.* from 2013 and it is now EOS. All of them based on V8 with various levels of incompatibility.

* Java 5, 6, or sorry 7, and whatever version it is now? Each one of them needs substantial fixes and corrections.

And of course, PHP managed to be incompatible between its major versions in similar manner (<5, 5+) even without resolving difficult problems py3k resolved (ehm, PHP still doesn't have proper Unicode support, does it?).

The lesson is simple: do have your test suite ready and be prepared to spend some time before upgrading.

Actually, ignoring the py2k->py3k transition, I would argue that Python is way more stable than others: programs from Python 1.5 (that's December 31, 1997) usually run quite well on 2.7.

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 21, 2018 22:46 UTC (Wed) by sfeam (subscriber, #2841) [Link]

"I would argue that Python is a way more stable than others: program from Python 1.5 (that's December 31, 1997) run usually quite well on 2.7". That is very much not my experience. Incremental porting to 2.3 2.4 2.5 2.7 has each time been a major headache, to the point that much older code is never ported at all. This leads to production machines with parallel installation of all python versions back to 2.4 just so that existing 3rd-party applications can continue to run.

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 21, 2018 23:08 UTC (Wed) by jhoblitt (subscriber, #77733) [Link]

Some of these points aren't really comparable to the py2 ->3 migration.

Early perl5's unicode handling problems were fairly minor and string handling generally just worked. py3 still suffers from encoding problems even when everything is marked as `u` strings and all file I/O has an explicit encoding declared. Often, an encoding exception bubbles up from some library with no indication of the file being read or of the offending string.
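When the reading code is under your own control, the missing context can be added by wrapping the decode step. The following is a minimal sketch, not a fix for third-party libraries; `read_text` is a hypothetical wrapper and the temporary file exists only to demonstrate the failure.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Hypothetical wrapper that reports which file failed to decode."""
    try:
        with io.open(path, encoding=encoding) as handle:
            return handle.read()
    except UnicodeDecodeError as exc:
        # Re-raise with the file name, which the original exception lacks.
        raise ValueError(
            "cannot decode {0} as {1}: {2}".format(path, encoding, exc)
        )


# Demonstration: a file containing a Latin-1 byte that is invalid UTF-8.
fd, sample = tempfile.mkstemp()
os.write(fd, b"caf\xe9\n")
os.close(fd)
try:
    read_text(sample)
except ValueError as err:
    message = str(err)
finally:
    os.remove(sample)

print(message)
```

The re-raised message names the file and the encoding, while the original exception text still carries the byte offset of the offending data.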

There is no ambiguity with `use v6;`, which does not even compare to the wheel-of-fish that `/usr/bin/python` has been even after PEP394, which did not come out until after every distro had done something different.

My impression of the ruby 1.8.x -> 1.9.x migration being bumpy was because many of the new 1.9 idioms were not supported by 1.8. This led to folks bending over backwards to keep gems compatible with 1.8. Other than C code, can you give an example of 1.8 code that's broken under 2.4? AFAIK, Ruby has never had a world breaking change.

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 22, 2018 1:02 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

There are several points you’re missing.

Perl5/6 is the only comparable break but unlike Python, Perl5 is well supported and is still being developed. There are ways to integrate both languages in one process. And last but very much not least, Perl6 doesn’t have GIL.

I don’t know much about Ruby or NodeJS, but Java is almost perfectly backwards compatible. I’m still using a library compiled for Java5 on the newest Java9. They are also code-compatible.

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 22, 2018 5:49 UTC (Thu) by jezuch (subscriber, #52988) [Link] (2 responses)

> * Java 5, 6, or sorry 7, and whatever version it is now? Each one of them needs substantial fixes and corrections.

It doesn't. Upgrading Java is a headache for devops but not for programmers. (Changes in GC algorithm behaviour are much more risky than any other.)

And your snipe about "whatever version it is now" is misguided (it's 9 btw). Java was famously slow-moving but that's changing now.

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 22, 2018 15:50 UTC (Thu) by ceplm (subscriber, #41334) [Link] (1 responses)

My main point was that if somebody hopes he finds a platform where he can survive without proper test suite and constant maintenance, he is wrong. And looking over the shoulders of my JBoss colleagues, it is very much true about Java as it is about Python or any other platform I was talking about. Yes, it is Java 9 now, no offense meant.

Hovmöller: Moving a large and old codebase to Python3

Posted Feb 25, 2018 9:50 UTC (Sun) by cpitrat (subscriber, #116459) [Link]

Having a good test suite allows you to find where the version change breaks you; it doesn't reduce the amount of changes you have to make (it actually increases it significantly). If you have a million lines of Python code, migrating from 2 to 3 is costly. Other migrations mentioned above are way less disruptive and can, AFAIK, be done progressively (deprecated features can be replaced before the change).