Linux distributions and Python 2

Posted Jun 11, 2018 19:09 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
In reply to: Linux distributions and Python 2 by hkario
Parent article: Linux distributions and Python 2

> they will never migrate off of it, as bad as it sounds.
Bad?

Having a completely stable language is actually a blessing.

Linux distributions and Python 2

Posted Jun 11, 2018 19:18 UTC (Mon) by Otus (subscriber, #67685) [Link] (19 responses)

Yeah, people still use last millennium's C versions.

Unfortunately, (c)python has tied the language and compiler/interpreter version quite strongly.

Linux distributions and Python 2

Posted Jun 11, 2018 19:23 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (16 responses)

C upgrades are basically backwards compatible. Upgrading from C99 to C11 usually requires only minor adjustments.

Py27 to Py3 migration is very painful and gains you pretty much nothing.

Linux distributions and Python 2

Posted Jun 11, 2018 19:55 UTC (Mon) by Otus (subscriber, #67685) [Link] (8 responses)

And yet this year's gcc releases see fit to support -std=c90 et al.

I will be completely unsurprised if 2.7 continues to see use ten years from now. I'd prefer to use it if I knew it would be supported. As is, I'm grudgingly making sure anything new also works under some 3.x.

Linux distributions and Python 2

Posted Jun 13, 2018 7:54 UTC (Wed) by codewiz (subscriber, #63050) [Link] (6 responses)

I'd be surprised if anything shipped as part of current Linux distros still needed to be built in C90 mode. It's there just in case, and it's very easy for gcc to support because, as others have noted, the C and C++ languages have never broken compatibility to the level that Python did.

If C99 had suddenly switched the meaning of 'char' from being an 8-bit integer to a Unicode codepoint, you can bet that we'd still be using C90 today, officially supported or not.

Linux distributions and Python 2

Posted Jun 13, 2018 8:53 UTC (Wed) by gevaerts (subscriber, #21521) [Link] (5 responses)

As far as I know, a C90 compiler can have char be a Unicode codepoint just as well as a C99 one.

Linux distributions and Python 2

Posted Jun 13, 2018 10:13 UTC (Wed) by codewiz (subscriber, #63050) [Link] (4 responses)

There's wchar_t, but it's a distinct type from char: http://en.cppreference.com/w/cpp/language/types#Character...

While the standard doesn't even bother saying whether char is signed or unsigned, no sane C compiler would dare switch the representation of C strings overnight. Incrementally switching a large codebase from 8-bit character strings to Unicode would be nearly impossible. Pure insanity! Which is exactly why the transition to Python3 has been going on and on for the past 10 years in spite of the considerable effort the community put into it :-)

Linux distributions and Python 2

Posted Jun 13, 2018 10:55 UTC (Wed) by excors (subscriber, #95769) [Link] (3 responses)

wchar_t is a bit awkward since on Windows it is only 16 bits wide, so it can't actually hold every Unicode codepoint.

Win32 did go through an ANSI->Unicode(ish) switch (with Windows NT, I presume), and tried to make it easy for applications to migrate incrementally. If you start with char* and "strings" and strcmp() and MessageBoxA(), you can gradually replace it with TCHAR* and _T("strings") and _tcscmp and MessageBox, and it will compile exactly the same as before. Then you #define UNICODE and suddenly it all gets macroed into WCHAR*/L"strings"/wcscmp/MessageBoxW etc, and you probably get a load of build errors, and you fix some before getting bored and going back to ANSI mode. After enough iterations you might get something that compiles properly with the Unicode API. Then you probably keep the TCHAR/_T/etc macros - it seems standard practice is to use those instead of the Unicode-only versions, even if you know you're never going back to ANSI.

I guess the difference with Python is that Microsoft saw the transition as an unending process, and they knew they would have to support the ANSI APIs forever, so they accepted the ugly macros as the cost of supporting both. Python expected the transition to finish quickly so they could kill off Python 2, so it seems they focused more on designing a nice end state (with arguable success) and less on easing the transition between the two states.

Linux distributions and Python 2

Posted Jun 13, 2018 11:56 UTC (Wed) by codewiz (subscriber, #63050) [Link] (2 responses)

And let's not forget _MBCS, hehe :-)

I guess Microsoft can be excused for the messy Unicode transition. They were among the first to attempt it, and at the time they were still saddled with support for a myriad of encodings from the DOS era. Ouch! :-)

But in 2008 there was no excuse whatsoever for not using the awesome UTF-8 encoding as the internal representation of strings. It was already obvious which way the wind was blowing, and pretty much every dynamic language transitioned to UTF-8, except Python. Actually, PHP 6 also tried moving to UTF-16, but after struggling with performance regressions and compatibility issues for several years, they finally realized it was a very poor decision and ditched the entire release. Then PHP 7 came out with UTF-8 strings.

Later programming languages like Go and Rust could have picked either UTF-8 or UTF-32 without compatibility concerns, and they still went with UTF-8 for simplicity and performance. You lose the O(1) random access to Unicode codepoints, sure, but how often do you really do that? Whereas having to re-encode all text on the I/O boundary is a major inconvenience and causes all sorts of round-trip issues.

Linux distributions and Python 2

Posted Jun 16, 2018 18:39 UTC (Sat) by epa (subscriber, #39769) [Link] (1 responses)

Does it really matter how the Python 3 implementation represents Unicode strings internally? It could use UTF-8, or UTF-16, or UCS-4, or even some wacky encoding of its own, and in principle you wouldn't notice any difference. I think what you are saying is that there should be an easier implicit conversion between byte strings and Unicode strings, with UTF-8-ish encoding and decoding rules used by default.

Linux distributions and Python 2

Posted Jun 24, 2018 14:40 UTC (Sun) by codewiz (subscriber, #63050) [Link]

Well, it does matter for both performance and memory usage, which are observable properties of a program as much as its output.

Earlier versions of the C++ standard intentionally left some things unspecified to give implementations a certain degree of freedom in choosing the internal representations of data structures, etc. It turned out to be a terrible idea in practice, because some implementations, including GCC, chose to make std::string copy-on-write with reference counting, some had a small-string optimization which would save an allocation, and others would append the \0 on the fly only when someone invoked c_str(), occasionally causing the string to be reallocated.

This caused portability issues where a valid and idiomatic C++ program which would normally execute in 1 second on Linux could take hours and run out of memory on a different standard-compliant run-time which would copy all the strings. And there was no reasonable way to fix the performance bug without causing unpredictable performance regressions in other valid programs.

What I'm getting at here is that the performance characteristics of strings, dictionaries and lists are part of the contract. Even if left unspecified, a lot of code in the wild will start depending on them. And once you specify the exact behavior and complexity of all the basic operations on a container, this leaves very little room for changing the internal representation in a significant way.

Over time, Python 3 grew some clever string optimizations to mitigate the overhead of converting strings on the I/O boundary. But these optimizations are inherently data dependent. Let's say I want to read a 100MB html file into a string and then send it over a socket. This will take roughly 100MB of RAM in Python 2, while a Python 3 program will jump from 100MB to 400MB after someone inserted a single emoji in the middle of the file.
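A rough way to observe the effect described above, assuming CPython 3.3 or later (where the flexible string representation of PEP 393 applies). The sizes are scaled down from 100MB to about 1MB, and the exact byte counts are implementation details rather than language guarantees:

```python
import sys

N = 10 ** 6

# One million ASCII characters: CPython stores these at 1 byte per character.
ascii_text = "a" * N

# The same text with a single non-BMP character (an emoji) in the middle:
# CPython now stores the whole string at 4 bytes per character.
emoji_text = ascii_text[: N // 2] + "\U0001F600" + ascii_text[N // 2 :]

ascii_size = sys.getsizeof(ascii_text)
emoji_size = sys.getsizeof(emoji_text)

print("ASCII only:", ascii_size, "bytes")
print("with emoji:", emoji_size, "bytes")
print("ratio: %.1f" % (emoji_size / ascii_size))
```

On other Python implementations the ratio may differ, which is rather the point being made about unspecified representations.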

While dynamic, garbage-collected languages are expected to have less predictable runtime behavior, the idea that user-controlled input can undo an important optimization and thereby double or even quadruple memory usage is terrifying. This could even be used as a DoS vector against Python servers running in containers with hard memory limits. For similar reasons, dicts typically employ a cryptographic hash function to prevent DoS attacks based on causing too many collisions, which would trigger worst-case O(n) lookups for every key.

And this is just one of the several issues that a clever internal representation will bring over a simple one. Another tricky one that comes to mind is round-tripping utf-8 and other encodings, including input containing invalid codepoints. I've seen Python backup software which failed to restore an archive due to a filename containing an invalid utf-8 character (that file came from AmigaOS, which didn't have utf-8, and I never noticed because all cli tools simply handled the filename without trying to read too much into it :-).
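To illustrate the round-trip problem with a hypothetical filename (this is not the backup software mentioned above, only a sketch): strict decoding rejects a byte sequence that is not valid UTF-8, while the `surrogateescape` error handler from PEP 383 lets the original bytes survive a trip through `str`.

```python
# A Latin-1 style filename as raw bytes; a lone 0xE9 is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Strict decoding refuses the input outright.
try:
    raw_name.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# "surrogateescape" smuggles the bad byte through as a lone surrogate,
# so encoding with the same handler restores the original bytes.
as_text = raw_name.decode("utf-8", errors="surrogateescape")
round_tripped = as_text.encode("utf-8", errors="surrogateescape")

# The escaped text is not encodable strictly, so code that forgets the
# handler on the way out (e.g. when printing or archiving) still fails.
try:
    as_text.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

print(strict_decode_ok, round_tripped == raw_name, strict_encode_ok)
```

`os.fsdecode()` and `os.fsencode()` apply this handler on most POSIX systems, which is why filenames returned by `os.listdir()` usually round-trip as long as they are only passed back to OS functions.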

These are the kinds of concerns you wish a good language runtime would hide from you, so you don't need to audit your codebase to make sure nobody accidentally used the built-in string class to read arbitrary user input. Go and Rust took the simple and elegant approach of declaring that strings are internally represented the same way they're represented externally. Python 2 was essentially already the same way, so isn't it weird that it was perceived as a defect so serious as to justify a 10-year-long migration to get it fixed?

Linux distributions and Python 2

Posted Jun 14, 2018 8:44 UTC (Thu) by kooky (subscriber, #92468) [Link]

I wouldn't say you get nothing.

I've moved several large Flask web apps from python2 to python3. It has been very satisfying.

We were getting quite a few `unicode` trouble tickets. The fix was usually just to run under python3. Changes to the code were usually very minor and done in a few hours.

Linux distributions and Python 2

Posted Jun 11, 2018 20:19 UTC (Mon) by malefic (guest, #37306) [Link] (6 responses)

> Py27 to Py3 migration is very painful

Having migrated numerous codebases, I disagree. For high quality codebases with decent test coverage the migration is straightforward. Instagram, which isn't small by any metric, made their transition with several months' worth of work by a few engineers (https://www.youtube.com/watch?v=66XoCk79kjM)

> gains you pretty much nothing.

A decade of language development is not "nothing". Even if you don't care about numerous language improvements, the real-world performance gains alone are a very good reason to upgrade (https://twitter.com/victorstinner/status/865973964421476352, https://bit.ly/2JEIzkh)

Linux distributions and Python 2

Posted Jun 11, 2018 20:37 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

Py3 is often slower than Py2: https://stackoverflow.com/questions/37052139/string-perfo...

You can't say beforehand if your codebase is going to do any better with Py3. I've seen it go both ways.

Linux distributions and Python 2

Posted Jun 12, 2018 5:54 UTC (Tue) by k8to (guest, #15413) [Link]

I think it strongly depends how the str/bytes split affects your particular code.

I've brought very significant codebases to 3 without any significant pain, just cleanups that should have been done anyway, and I've also had code that required huge pain because it wanted to deal mostly with byte strings, but so much of the python3 library doesn't want byte strings.
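A small sketch of where the str/bytes split tends to bite, using only standard-library calls under Python 3; the path components and strings are made up for illustration:

```python
import os.path

# Concatenating text and bytes is a TypeError in Python 3
# (Python 2 would have coerced implicitly).
try:
    "header: " + b"value"
    mixing_error = None
except TypeError as exc:
    mixing_error = exc

# Many stdlib functions accept either type, but not a mix of both.
joined = os.path.join(b"data", b"file.bin")  # bytes in, bytes out
try:
    os.path.join("data", b"file.bin")
    mixed_join_ok = True
except TypeError:
    mixed_join_ok = False

# A classic porting bug: str() on bytes does not decode, it returns the repr.
repr_pitfall = str(b"abc")

print(type(mixing_error).__name__, type(joined).__name__, mixed_join_ok, repr_pitfall)
```

Code that mostly handles text hits these errors rarely; code that mostly handles byte strings hits them everywhere, which matches the two experiences described above.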

Linux distributions and Python 2

Posted Jun 14, 2018 3:48 UTC (Thu) by lambda (subscriber, #40735) [Link] (3 responses)

> For high quality codebases with decent test coverage the migration is straightforward

Good thing that most code out there is a high-quality code base with decent test coverage then, right?

I just ported a 750 line script from Python 2 to Python 3; it's only fairly recently that I've found there are enough reasons to do so (in this case, availability of Python 3 and not Python 2 in the base install of our distro, plus the typing module and mypy to add static types), and enough barriers have been removed.

Here are the steps I went through to make sure the port was working:

1. Run 2to3 over the script.
2. Run autopep8 to fix a few issues introduced by 2to3.
3. Run pylint and flake8, and fix up the issues found.
4. Add type annotations, check types with mypy, and fix up type issues that had crept in.
5. Write unit tests (this had originally been a quick script, so not covered by unit tests, but had grown enough that they'd be useful for this port), and fix issues discovered by the unit tests that still remained even after the mypy and pylint fixups.
6. Realize that I'd ported to Python 3.6 while the target system was running 3.5, so set up tox and fix up more things that differed between 3.5 and 3.6 (tests that worked in 3.6 due to deterministic dict ordering started failing non-deterministically in 3.5; thanks to tox for printing the random number seed on each test run, which made that easier to reproduce and fix).
7. Actually try building my package and tests on the target system, and fix more test issues due to an older version of pytest on the target system.
8. Finally, actually run the result on the target system, and even after all of the steps above (pylint, flake8, mypy, unit tests with pretty good coverage) still have issues to fix.
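The dict-ordering difference between 3.5 and 3.6 mentioned in the steps above can be sketched as follows. The function names and data are hypothetical, not taken from the script in question; the point is only that output built from dict iteration order is stable in CPython 3.6 (and guaranteed from 3.7) but arbitrary in 3.5, and that sorting removes the dependency.

```python
def summarize(counts):
    """Order-dependent: the output follows dict iteration order."""
    return ", ".join("%s=%d" % (key, value) for key, value in counts.items())


def summarize_sorted(counts):
    """Order-independent: keys are sorted before formatting."""
    return ", ".join("%s=%d" % (key, value) for key, value in sorted(counts.items()))


counts = {"b": 2, "a": 1}

# A test comparing summarize(counts) against a fixed string passes on 3.6+
# but can fail from run to run on 3.5, where hash randomization affects order.
print(summarize(counts))

# This comparison is safe on every version.
print(summarize_sorted(counts))
```

Comparing parsed structures (or sets) instead of formatted strings in the tests is another way to avoid the same trap.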

And this is for a fairly simple little script, dealing with some files which just contain ASCII text and ASCII filenames.

Our main codebase uses Twisted, using PB and Jelly to pass objects over the network. There were no automated tests when I started at the company, and while we've been trying to add tests for new features or code that we're touching, we still have pretty poor coverage. Additionally, it is a networked application that also has to interact with networked filesystems on Windows, macOS, and GNU/Linux, which makes the encoding issues even more fun to deal with. Oh, and most of what is jellied and shared between systems consists of dicts, with different value types for different keys.

At some point, we are going to have to port this to Python 3, but I imagine it's going to take a long time, cause a lot of grief for the developers in the process, and still probably cause regressions for customers even if we're as thorough as we can be by adding unit tests, integration tests, and type annotations.

Linux distributions and Python 2

Posted Jun 15, 2018 15:13 UTC (Fri) by togga (subscriber, #53103) [Link] (2 responses)

Your comment actually makes a case for migrating away from python altogether. I've made many attempts at py3 over the years and in most cases py3 is a regression, especially when using python as a glue or scripting language. Syntax candy and numerous ways to work around a broken threading model don't make up for the downsides.

My tip is to start by reusing existing python modules from another, more "down to earth" language like go, and then migrate away. The Python language is obviously not long-term stable.

Linux distributions and Python 2

Posted Jun 28, 2018 6:24 UTC (Thu) by larslehtonen (guest, #125318) [Link] (1 responses)

Go is the real Python 3. Is that still at all controversial?

Linux distributions and Python 2

Posted Jun 28, 2018 13:15 UTC (Thu) by anselm (subscriber, #2796) [Link]

Yes.

Linux distributions and Python 2

Posted Jun 13, 2018 9:08 UTC (Wed) by gdt (subscriber, #6284) [Link] (1 responses)

> Yeah, people still use last millennium's C versions.

C had a moment not dissimilar to Python's, with the move from K&R to ANSI C.

It was one of the defining moments in the development of free software. Some popular Unixen took the opportunity to charge for their ANSI C compilers, which led people to discover GCC and then the GNU Project's other tools.

Linux distributions and Python 2

Posted Jun 14, 2018 3:55 UTC (Thu) by lsl (subscriber, #86508) [Link]

> C had a moment not dissimilar to Python's, with the move from K&R to ANSI C.

The difference being that I can still use code written in the K&R dialect and link it together with new sources into a single program. Heck, I can even use K&R style as well as C11 features in the same translation unit. Sure, you get a -Wimplicit-int warning and it's a GCC thing but it works.

Linux distributions and Python 2

Posted Jun 11, 2018 20:59 UTC (Mon) by remicardona (guest, #99141) [Link] (12 responses)

> Having a completely stable language is actually a blessing.

And that's fine, if you're willing to maintain CPython 2.7 when upstream _finally_ cuts the cord.

As far as I'm concerned, there's more than enough py3 work around that I can actively decline py27 offers. Maybe I'm exaggerating a bit, but to me, py27 is now as foreign a language as node or php. I've worked with it and taught it for a long time, but I don't want to deal with its idiosyncrasies. As a professional trainer/teacher, py3 is way easier to teach than py27.

And most companies are either on board with py3 or they're finally realizing that py27 is a dead end and they're going to have a hard time attracting developers (juniors or seasoned) with py27 code bases.

Again, nothing wrong with py27 per se, but good luck finding actively maintained packages that consider py27 as a blessing. Good luck finding devs willing to work on py27 projects. Good luck finding a supported/maintained py27 distribution. And good luck migrating to py3 the day py27 is no longer usable within your organization.

And I mean that sincerely: good luck.

Linux distributions and Python 2

Posted Jun 11, 2018 21:15 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> And that's fine, if you're willing to maintain CPython 2.7 when upstream _finally_ cuts the cord.
It'll be supported by RHEL well into 2027. I don't have to care much about it.

Py3 is not really a different language, it's the same old Py2 with added inconvenience. I've done migrations Py2->3 and haven't found anything worth the effort so far. And I really detest the efforts to force the migration by actively making Py2 more and more inconvenient to use.

Personally, a major motivation for me to finally switch to PyX would be GIL removal.

Linux distributions and Python 2

Posted Jun 12, 2018 12:01 UTC (Tue) by hkario (subscriber, #94864) [Link] (1 responses)

> it's the same old Py2 with added inconvenience

that's, like, your opinion, man

seriously, I'd love to drop python 2 to be able to use generators to their full advantage (`yield from` would make my code so much cleaner)
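As a small illustration of the `yield from` point (available since Python 3.3), here is a hypothetical recursive generator written both ways; the function names and sample data are made up for this sketch:

```python
def flatten_py2_style(tree):
    """Works on Python 2 and 3: re-yield every item from the sub-generator."""
    for node in tree:
        if isinstance(node, list):
            for leaf in flatten_py2_style(node):
                yield leaf
        else:
            yield node


def flatten(tree):
    """Python 3 only: delegate to the sub-generator with `yield from`."""
    for node in tree:
        if isinstance(node, list):
            yield from flatten(node)
        else:
            yield node


nested = [1, [2, [3, 4]], 5]
print(list(flatten_py2_style(nested)))
print(list(flatten(nested)))
```

Beyond saving a loop, `yield from` also forwards `send()`, `throw()` and the sub-generator's return value, which the explicit loop does not.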

so, sorry, but if you really think that Py3 is just py2 with a few "inconvenient" items added then I don't think you can claim to know Py3

> Personally, a major motivation for me to finally switch to PyX would be GIL removal.

well, it won't happen in py2.7, that's certain

Linux distributions and Python 2

Posted Jun 14, 2018 20:12 UTC (Thu) by togga (subscriber, #53103) [Link]

> well, it won't happen in py2.7, that's certain

http://pypy.org

Linux distributions and Python 2

Posted Jun 14, 2018 9:13 UTC (Thu) by LtWorf (subscriber, #124958) [Link] (3 responses)

> it's the same old Py2 with added inconvenience

Something tells me that you never ventured outside of writing English, because in any other language, having strings in unicode rather than ASCII is a huge improvement.

Then, Python3 has type annotations, which lets you use mypy to do some static checks on the code.

I wrote typedload (https://github.com/ltworf/typedload) to load json-like data into typed data structures, so that once loaded, you know your data has the correct types and can safely be passed around.
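A hand-rolled, standard-library-only sketch of the general idea of loading json-like data into a typed structure. This is not typedload's API; `User` and `load_user` are hypothetical names, and the class syntax assumes Python 3.6 or later:

```python
from typing import List, NamedTuple


class User(NamedTuple):
    name: str
    tags: List[str]


def load_user(data: dict) -> User:
    """Validate a json-like dict once, at the boundary, and return typed data."""
    name = data["name"]
    tags = data["tags"]
    if not isinstance(name, str):
        raise TypeError("name must be a string")
    if not (isinstance(tags, list) and all(isinstance(tag, str) for tag in tags)):
        raise TypeError("tags must be a list of strings")
    return User(name=name, tags=tags)


user = load_user({"name": "ada", "tags": ["admin"]})
print(user.name, user.tags)
```

With the annotations in place, mypy can check every later use of `user.name` and `user.tags` statically, so the runtime check only has to happen once.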

My grandmother did not know how to use a computer, that doesn't mean that computers are useless.

Linux distributions and Python 2

Posted Jun 14, 2018 17:21 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

JFYI, my native language is Russian so I’m acutely aware of encoding problems since so many of them were used for Cyrillic scripts. I can speak Ukrainian and understand Polish and Czech. I used to be able to speak German but haven’t practiced it in decades and lost most of it. Now I’m also studying Mandarin Chinese for a change.

Static annotations are nice, but are kinda beside the point.

Linux distributions and Python 2

Posted Jun 16, 2018 16:33 UTC (Sat) by lsl (subscriber, #86508) [Link] (1 responses)

Except that Python 2 strings aren't ASCII and support Unicode perfectly well: you just put UTF-8 inside them.

Linux distributions and Python 2

Posted Sep 19, 2019 16:11 UTC (Thu) by LtWorf (subscriber, #124958) [Link]

They work but are very error prone, because a character might take any number of elements, so you have all sorts of issues when looping over or manipulating them.

With unicode objects, you know that they are not arrays of bytes and that a conversion is needed, and you get an error if you didn't do the conversion that you were supposed to do.
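The difference can be sketched in Python 3, using a `bytes` object to stand in for what a Python 2 `str` holding UTF-8 would contain; the sample word is arbitrary:

```python
text = "na\u00efve"            # "naive" with a diaeresis: 5 code points
utf8 = text.encode("utf-8")    # the bytes a Python 2 str would have held

print(len(text), len(utf8))    # code points vs. bytes

# Slicing the text keeps whole code points.
text_prefix = text[:3]

# Slicing the bytes at the same index cuts the two-byte sequence in half.
byte_prefix = utf8[:3]
try:
    byte_prefix.decode("utf-8")
    byte_prefix_valid = True
except UnicodeDecodeError:
    byte_prefix_valid = False

print(repr(text_prefix), repr(byte_prefix), byte_prefix_valid)
```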

Linux distributions and Python 2

Posted Jun 15, 2018 15:28 UTC (Fri) by togga (subscriber, #53103) [Link] (4 responses)

> Good luck finding devs willing to work on py27 projects.

My experience is the opposite. With a py3 project, you are almost guaranteed to end up in encoding hell. For me, py3 projects raise the "over-engineered" or "system architecture by syntax convenience" flags and you may find yourself in a world of software pain.

Linux distributions and Python 2

Posted Mar 5, 2021 11:22 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (3 responses)

One could go and read a 10 minutes tutorial…

Linux distributions and Python 2

Posted Mar 5, 2021 12:02 UTC (Fri) by pizza (subscriber, #46) [Link] (2 responses)

"go read a tutorial" does not change the data your python3 code encounters out in the real world.

Linux distributions and Python 2

Posted Mar 5, 2021 15:59 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (1 responses)

It helps in knowing how to not write buggy code that handles encodings.

Anyway for "random data in the world" python3 is way better than python2. Which is why the change was made.

In python2 you couldn't really rely on ranges or indexes on strings, because you never knew if you would be splitting a unicode sequence and creating something broken.

But, it worked most of the time… it would just fail when unexpected unicode sequences appeared.

Linux distributions and Python 2

Posted Mar 5, 2021 16:15 UTC (Fri) by foom (subscriber, #14868) [Link]

In python3 you can't really rely on ranges or indexes on strings either, because you never know if you will be splitting a unicode grapheme cluster and creating something broken.

But, it works most of the time… it will just fail when unexpected unicode sequences appear (combining accents, or emoji skin tone modifiers, or flags, or ...)
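A short sketch of the grapheme-cluster problem in Python 3, using only the standard library. The sample strings are arbitrary, and note that the standard library has no grapheme-aware slicing; normalization only helps in the first case:

```python
import unicodedata

# "e" followed by a combining acute accent: one visible character, two code points.
decomposed = "e\u0301"
accent_lost = decomposed[:1]                      # just "e", accent dropped
composed = unicodedata.normalize("NFC", decomposed)

# A flag is a pair of regional indicator symbols (here F + R).
flag = "\U0001F1EB\U0001F1F7"
half_flag = flag[:1]                              # a lone regional indicator

# Thumbs up followed by a skin tone modifier.
thumbs = "\U0001F44D\U0001F3FD"
plain_thumbs = thumbs[:1]                         # modifier dropped

print(len(decomposed), len(composed), len(flag), len(thumbs))
```

Normalizing to NFC merges the accent into a single code point, but no normalization form merges flags or skin tone modifiers, so index-based slicing can still split what the user sees as one character.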

Contrast this with perl6, which has built-in support for correctly preserving grapheme clusters in its string methods.