Fedora and Python 2

By Jake Edge
April 4, 2018

It has been known for quite some time that Python 2 will reach its end of life in 2020—after being extended by five years from its original 2015 expiry. After that, there will be no support, bug fixes, or security patches for Python 2, at least from the Python Software Foundation and the core developers. Some distributions will need to continue to support the final Python 2 release, however, since their support windows extend past that date; the enterprise and long-term support distributions will likely be supporting it well into the 2020s and possibly beyond. But even shorter-support-cycle distributions need to consider their plan for a sweeping change of this sort—in less than two years.

There was talk of having the actual end of life (EOL) occur at a party at PyCon 2020, but a mid-March query to the python-dev mailing list helped nail down the date once and for all. Currently, the only supported branch in the 2.x family is Python 2.7, which is up to 2.7.14 and is scheduled to have a 2.7.15 release sometime in 2018. It seems likely there will be at least one more release before EOL, which Python benevolent dictator for life (BDFL) Guido van Rossum proclaimed will be January 1, 2020:

The way I see the situation for 2.7 is that EOL is January 1st, 2020, and there will be no updates, not even source-only security patches, after that date. Support (from the core devs, the PSF, and python.org) stops completely on that date. If you want support for 2.7 beyond that day you will have to pay a commercial vendor. Of course it's open source so people are also welcome to fork it. But the core devs have toiled long enough, and the 2020 EOL date (an extension from the originally [announced] 2015 EOL!) was announced with sufficient lead time and fanfare that I don't feel bad about stopping to support it at all.

Benjamin Peterson, who is the 2.7 release manager, agreed, though he cautioned that the final 2.7 release may not literally be made on new year's day 2020. Others took notice of the date, including Petr Viktorin and the other maintainers of the python2 package for Fedora. Viktorin posted a message to the Fedora devel mailing list on behalf of all of the nine python2 maintainers that noted the EOL date and their intent to "orphan" the python2 package:

Fedora still has more than 3000 packages depending on python2 – many more than we can support without upstream help. We (rightly) don't have the authority to say "please drop your unneeded python2 subpackages, or let us drop them for you" [0]. The next best thing we *can* say is: "if Fedora is to keep python2 alive, we won't be the ones doing it – at least not at the current magnitude".

The first Fedora release that would be affected by the EOL date is probably Fedora 30, which is likely to land in the first half of 2019—and be supported into 2020. But, Viktorin argued, it makes sense to get started now by removing python2 dependencies for packages that don't really need them:

Unlike most other orphanings, we have some thousands of dependent packages, so a lot of time and care is required. In case no one steps up, we'd like to start dropping Python 2 support from dependent packages *now*, starting with ported libraries on whose python2 version nothing in Fedora depends. (We keep a list of those at [1].) Of course, we're ready to make various compromises with interested packagers, as long as there's an understanding that we won't just support python2 forever.

There was some confusion about what was being suggested but, in general, the reaction was positive. A rude complaint that the problem was essentially impossible to solve was met with strong disagreement. As Richard W.M. Jones pointed out: "it's hard to argue with a plan which has been pre-announced *2 years* in advance. If only all Fedora changes were given such a generous runway." But Randy Barlow wondered if the proposed incremental approach was right:

I'm +1 to the idea of dropping Python 2 support in general, but I'm not sure we should really do it gradually (which is what would effectively happen if some packagers start dropping now and others later, and others not at all). It seems to me like it'd be cleaner to have a release note on Fedora 30 that's just "Python 2 support dropped" and do it all at once.

That kind of cataclysmic approach might work for the Python code actually shipped by Fedora, but there is plenty of other code out there to consider. Python is, after all, a programming language, so there is an unknowable amount of Python 2 running on Fedora users' machines right now. A more cautious approach gives them time to notice and upgrade; as Gerald Henriksen put it:

By gradually (or sooner than Fedora 30) getting rid of all the libraries and other Python 2 stuff it at least gives the option for those people who get surprised to fix things before the Python interpreter itself goes EOL and doesn't get security fixes.

It should be possible to continue supporting Python 2.7 into 2020 and beyond by piggybacking on the work that the enterprise distributions will be doing. It is also possible, though perhaps not all that likely, that few or no security flaws will be found in the language after it drops out of its support window. RHEL 7 and CentOS 7 ship Python 2.7; both of those distributions will receive updates until 2024. That should help with keeping Python 2 alive, Kevin Kofler said; borrowing patches from RHEL/CentOS is something he has been doing for Qt and kdelibs for some time. As Viktorin pointed out, the Fedora Python SIG is already maintaining some EOL Python versions; it will do the same for Python 2.7:

As Python SIG we maintain old Python versions like 2.6 or 3.3 *today* – but just for developers who need to test backwards compatibility of their upstream libraries; we don't want to see them used as a base for Fedora packages. Why? To make sure Fedora packages work with modern Python, and to have only one time-sensitive place to concentrate on when a critical security fix comes. We want to put Python 2.7 in the same situation.

Part of the reason to start dropping Python 2 packages now is to figure out which packages can do it now and which ones will need additional help or coordination in the next few years.

Beyond just backward compatibility, though, Viktorin and company have another reason they are willing to maintain Python 2.7 past its EOL, which is mentioned in the original email: "support exceptionally important non–security critical applications, if their upstreams don't manage to port to Python 3 in time". However, if there are others who think they have a better approach to handling the EOL (or are willing to pick up the regular python2 package maintenance, rather than moving to a python27 "legacy" package as is planned), then the Python team wants to alert them to its plans. Viktorin expresses some skepticism that folks outside of the Python SIG will truly be in a position to take over, but doesn't want to foreclose that possibility.

This is not the first time that Fedora has discussed the switch. Back in August 2017, we looked at a discussion of where /usr/bin/python will point in a post-python2 world. Other distributions are grappling with the issue as well. A year ago, it was discussed on the debian-python mailing list (and again in August 2017), it is on the radar for openSUSE, and it recently came up for Ubuntu, as well. Each is working out how to highlight the problem areas for Python-2-only packages in their repositories and to make the switch to Python 3 smoothly. We will be seeing more of these kinds of discussions, across the Linux world (and beyond), as time ticks down to 2020.

The switch from Python 2 to 3 is a huge job; one might guess that it is orders of magnitude larger than anyone had anticipated back in the heady days of Python 3000 (around 2007, say). That is a testament to the popularity of the language and the various tools and frameworks it has spawned; it also likely serves as an abject warning for other projects that might ever consider a compatibility break of that nature. In the mid-to-late 2020s, with the transition presumably well behind them, the Python core developers (and community as a whole) will be due for a huge sigh of relief. But it will take work all over the free-software world, including by distributions like Fedora, in order to get there.




Fedora and Python 2

Posted Apr 5, 2018 6:27 UTC (Thu) by cyperpunks (subscriber, #39406) [Link] (74 responses)

Does this make sense at all? Python 3 is basically a new language, porting is anything
from trivial to very, very complex.

The ordering problem might be the worst: can't port since dependency A is missing, while
A needs dep B which needs C, which needs A.

How many billions of lines of Python 2 are out there?
There are 134 418 packages on PyPI; how many of those support Python 3?

When will the last Python 2 script execute? In 50-60 years?

Fedora and Python 2

Posted Apr 5, 2018 9:28 UTC (Thu) by ecree (guest, #95790) [Link] (44 responses)

> "Python 3 is basically a new language"
The real problem is that while it's a _different_ language, it's not monotonically a _better_ language. In particular, if you're trying to write the kind of system software that shouldn't care what text encoding its input uses and shouldn't crash when fed badly-encoded text, working around Python 3's halpful Unicode handling leads to considerable friction; you basically have to either pretend that everything in the outside world is Latin-1, or use bytes objects _everywhere_ and accept that half the standard library won't work. Ah well, at least they eventually added %-formatting to bytes objects.

Fedora and Python 2

Posted Apr 5, 2018 9:49 UTC (Thu) by k8to (guest, #15413) [Link] (43 responses)

This situation makes me really sad.

When python 3 was more or less announced, I was sure they had the right idea, because the weird exceptions you'd get when a str hit a unicode object in python2 were just no good. But now that I've had time to experience the python 3 way, I think it's worse. I usually would rather deal with bytes. There are so many situations where the code I deal with doesn't know the encoding of the bytes it's receiving, and python3 doesn't give me a reasonable way to accept those bytes and use most of the tools I would use in python2.

One of those unintended outcomes sort of things, I think. It feels like python3 was a few years too early to pick the right strategy with unicode.

Fedora and Python 2

Posted Apr 5, 2018 10:09 UTC (Thu) by Sesse (subscriber, #53779) [Link] (5 responses)

A few years early? They were several years after Perl 5 figured out their strategy, which continues to work well to this day and _didn't_ cause massive disruption.

(Perl 6 is a different story altogether…)

Fedora and Python 2

Posted Apr 5, 2018 17:21 UTC (Thu) by k8to (guest, #15413) [Link] (4 responses)

Okay, maybe the perl developers are smarter. Or maybe they just got luckier.
I certainly didn't believe that "let the container have bytes, and describe the encoding if known" was the right way around the time py3k was being clarified, but I do now. I think the industry at large hadn't come to that decision at all at that time.

Fedora and Python 2

Posted Apr 5, 2018 18:52 UTC (Thu) by smurf (subscriber, #17840) [Link] (1 responses)

They got luckier. The user-visible part of UTF-8 handling in Perl vs. Python2 is roughly comparable in terms of interoperability of non-UTF8 vs. UTF8 strings, except that they were a bit more clever with their default encodings so that "simple" things you may want to do "just work".

However, I have personally converted a largish console-based Perl5 codebase from running on Latin-1 (or Latin-8 to be exact, as soon as the € happened) to UTF-8. Let me tell you that this is an exercise in hunting down annoying hard-to-reproduce bugs that you wouldn't wish on your worst enemy. We had the whole gamut – from mojibake in the database through strings which crashed the interpreter when printed to tearing our hair out trying to write code that works correctly in both locales. The only thing that saved us is the fact that you can unscramble real-world mixed UTF-8/Latin8 content safely, thanks to the way UTF-8 is encoded.

Python3's way of strict separation between bytes and strings may be more annoying when you start off, esp. on Windows, but IMHO it's a whole lot easier to make sure that the end result is actually correct.

Fedora and Python 2

Posted Apr 7, 2018 23:21 UTC (Sat) by flussence (guest, #85566) [Link]

I'll vouch for Perl 5's Unicode support being painful. Even in code where you start off with control over the encodings, you still have to know: what “use utf8” does, when to use “use feature 'unicode_strings'”, that {de,en}code_utf8 are a trap to avoid, the difference between “:encoding(utf-8)” and “:encoding(utf8)” (the latter is another trap!) and things like which DBI plugins use UTF-8 by default and which will mangle data by default (and the switch to enable unicode is usually in a different place for each one).

Those are all things I encountered just in a single project. To be fair it was one where I ended up needing to write custom Encode::* modules, so maybe not representative… but it's still a lot of pain for the sake of not breaking code written pre-Y2K.

Fedora and Python 2

Posted Apr 6, 2018 18:35 UTC (Fri) by tialaramex (subscriber, #21167) [Link] (1 responses)

There certainly are examples where Python maintainers tried to be far cleverer than they needed to be, and then (because humans aren't clever, we're dumb, that's lesson #1) this blew up in their faces.

Python's TLS implementation wants to check whether the TLS server has presented a certificate that's valid for the name of the TLS server you're trying to connect to. This makes sense, certainly as a default, if I said I wanted a TLS connection to foo.example.com, I definitely shouldn't need to roll my own certificate validator, or even explicitly say "Wait, this connection I have, is it _really_ to foo.example.com?" because duh, of course I want those things, let the handful of people who don't want checks ask *not* to check.

And all the people involved in designing this stuff at the IETF stage were conscious that this problem could be hard, and we don't want hard because this is a security system. So the certificates have DNS A-labels inside them. All you need to do is match the DNS name in the certificate (written with A-labels, ie ASCII) against the DNS name you looked up in your DNS system, which is also written with A-labels, ie ASCII. This is really boring and easy. Users don't necessarily understand A-labels, they might be gibberish, but the presentation layer is quite separate, securing that is a UX problem and not relevant to TLS or other low level components. All the tricky human stuff is pushed into the layer that was already dealing with humans, everything that's machine-to-machine needn't care about human cultural complexities like language.

Too simple for Python though, they decided to handle everything with U-labels so they can mark all the types "str". So now suddenly this low-level bit banging code that's supposed to securely move packets ends up with the entire i18n system baked into it, and inherits mysterious presentation layer related bugs. Problem with the matching? Oh sorry, you need to go fix this whole other Python sub-system that has nothing whatsoever to do with TLS ...

Eventually, literally in February this year, sanity finally prevailed and the latest Python 3 actually just does what the RFCs said it ought to do in the first place, massively simplifying the code _and_ making it more correct. Most users will never notice, because this was after all the Right Thing anyway.
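
As a rough illustration of the A-label matching described above, here is a minimal Python sketch. The function names are hypothetical, it relies on the standard library's `idna` codec (which implements the older IDNA 2003 rules), and it deliberately ignores wildcards and everything else a real certificate validator has to handle.

```python
def to_a_label(hostname: str) -> bytes:
    """Convert a possibly non-ASCII hostname to its ASCII (A-label) form."""
    # The "idna" codec leaves pure-ASCII labels untouched, so lower-case here.
    return hostname.encode("idna").lower()


def names_match(cert_dns_name: bytes, requested: str) -> bool:
    """Compare the ASCII name from a certificate with the requested name."""
    # Both sides are plain ASCII at this point; no i18n logic is needed.
    return cert_dns_name.lower() == to_a_label(requested)


print(to_a_label("bücher.example"))
print(names_match(b"xn--bcher-kva.example", "bücher.example"))
```

The point of the sketch is only that, once both names are in A-label form, the comparison is a byte-for-byte ASCII match.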

Fedora and Python 2

Posted Apr 9, 2018 13:01 UTC (Mon) by cortana (subscriber, #24596) [Link]

Presumably this is https://bugs.python.org/issue28414 for anyone else who was interested in the background...

Fedora and Python 2

Posted Apr 5, 2018 11:56 UTC (Thu) by pabs (subscriber, #43278) [Link] (32 responses)

If you don't care about encoding, can't you just read bytes and write bytes and then get the appropriate jumble of bytes?

Fedora and Python 2

Posted Apr 5, 2018 16:19 UTC (Thu) by k8to (guest, #15413) [Link] (31 responses)

But the python2 you were using that worked with bytes now insists on working with strings.

Fedora and Python 2

Posted Apr 6, 2018 13:55 UTC (Fri) by barryascott (subscriber, #80640) [Link] (30 responses)

python 3 defaults the text APIs to use unicode. That works for me.

If you know that it's bytes that you care about, use the APIs that give you the bytes.

You can get the env vars as bytes, see file system names as bytes and read files in bytes.

I do not understand the criticism.

Barry

Fedora and Python 2

Posted Apr 8, 2018 12:59 UTC (Sun) by togga (guest, #53103) [Link] (29 responses)

Python2 is excellent as a glue language, combining components together, just because it didn't care about the contents in strings. Everything with a C/API is easily accessible, a C string is after all a null-terminated sequence of bytes. Python2's weak points (eg performance, threading, GIL, ...) could be masked by work being done behind C libraries and numpy (with workarounds like numexpr). Python3 changes all this on a fundamental level:
>>> import ctypes
>>> t = type('iface', (ctypes.Structure,), {'_fields_': [(b'c_string_symbol', ctypes.CFUNCTYPE(ctypes.c_uint32))]})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '_fields_' must be a sequence of (name, C type) pairs

Python3 is no longer a powerful glue environment (just look at the BCC encode/decode mess) and has no apparent strong side anymore, maybe syntax like a new BASIC.

After successfully using Python2 for 14 years I now don't recommend python for anything. There are more modern languages like Go with simple syntax, which can also be used interactively and, contrary to both python2 and python3, don't need a ton of workarounds to be efficient.

Fedora and Python 2

Posted Apr 8, 2018 13:43 UTC (Sun) by smurf (subscriber, #17840) [Link] (28 responses)

Did you bother to submit a bug report / feature request to allow bytestrings?
And why are you using them, instead of strings, in the first place?

Fedora and Python 2

Posted Apr 8, 2018 19:16 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (26 responses)

Because symbols in libraries can contain invalid UTF-8 sequences?

Fedora and Python 2

Posted Apr 8, 2018 19:59 UTC (Sun) by smurf (subscriber, #17840) [Link] (23 responses)

Possible but unlikely – I never saw a symbol that wasn't ASCII. Obviously so unlikely that the code's author didn't consider that case.

Anyway, the right way to fix this is to report a bug. Bitch about it on LWN only when whoever is responsible for the code refuses to fix it.

Fedora and Python 2

Posted Apr 8, 2018 20:06 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (21 responses)

Well, I see. You prefer your programs to work possibly most of the time.

I prefer code that works 100% of the time, barring unrelated hardware/system software issues.

Fedora and Python 2

Posted Apr 8, 2018 21:12 UTC (Sun) by smurf (subscriber, #17840) [Link] (20 responses)

Wrong. I prefer problems to be fixed instead of bitched about.

To that effect I will now take that example code and do what somebody else should have done long ago, i.e. file a bug.

Fedora and Python 2

Posted Apr 8, 2018 21:27 UTC (Sun) by mjblenner (subscriber, #53463) [Link] (10 responses)

That example code:

>>> t = type('iface', (ctypes.Structure,), {'_fields_': [(b'c_string_symbol', ctypes.CFUNCTYPE(ctypes.c_uint32))]})

isn't really a bug. The 'c_string_symbol' is the python-side handle to the C structure field. In python3 it needs to be unicode (i.e like the python source file), since you do something like

>>> s = t(some_dll[b'c_string_symbol']) # bytes used here to get the C function symbol
>>> s.c_string_symbol()
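
A minimal, self-contained sketch of that distinction follows. It is only an illustration: the `Iface` class and the `answer` callback are hypothetical, and a Python callback stands in for a function that would normally be looked up in a shared library.

```python
import ctypes

# Prototype for a C function taking no arguments and returning uint32_t.
PROTO = ctypes.CFUNCTYPE(ctypes.c_uint32)

# The field name is the Python-side attribute name, so it is a str.
Iface = type(
    "Iface",
    (ctypes.Structure,),
    {"_fields_": [("c_string_symbol", PROTO)]},
)


def answer():
    """Stand-in for a real C function; returns a fixed value."""
    return 42


callback = PROTO(answer)      # keep a reference so it is not collected
s = Iface(callback)           # store the function pointer in the structure
result = s.c_string_symbol()  # call it through the str-named field
print(result)
```

With a real library, the function pointer would come from a lookup such as `some_dll[b'c_string_symbol']`, which is the step where the bytes name is used.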

Fedora and Python 2

Posted Apr 8, 2018 22:41 UTC (Sun) by smurf (subscriber, #17840) [Link] (9 responses)

This does mean that symbols which aren't well-formed UTF8 are inaccessible. There should be a way to get around this restriction, even if the only tools that actually generate those beasts are code obfuscators (if that).

On the other hand: Python is written in UTF-8 (duh) and Python's way to access symbols is by using attributes (also duh). Requiring code to cater to corner cases that don't actually occur in the real world is a surefire recipe for code bloat but doesn't help anybody.

Fedora and Python 2

Posted Apr 8, 2018 23:00 UTC (Sun) by mjblenner (subscriber, #53463) [Link] (8 responses)

> This does mean that symbols which aren't well-formed UTF8 are inaccessible.

Uh, no. The bit that gets the symbols from the dll is using bytes. This bit:

some_dll[b'c_function_name']

You can't refer to it in python by the same random bytes though (why would that matter?).

Fedora and Python 2

Posted Apr 11, 2018 22:06 UTC (Wed) by togga (guest, #53103) [Link] (7 responses)

"You can't refer to it in python by the same random bytes though (why would that matter?)."

Because scripting is mostly about automation. It's just broken in this case to convert these symbols to another representation. This is just one example of many in this theme since many of the libraries and third party extensions need a string representation.

In a glue language scenario (which has been a strong side of Python), if I want to grab symbol (or blob) X from system A, handle it and pass it to system B, I do not want to have Y=f(X) as intermediate representation and then do the inverse function before passing it to B.

Simple things such as reading from a UART or a socket have become a mine-field as soon as you want to use something in Python dealing with strings. Especially annoying when developing and things might not be that clean, nice and tidy.

This whole problem-domain is something Python3 has created. For me, Python itself lacks both in performance and multi-threading and doesn't really have anything language-wise (apart from lots of third party libraries) to compensate for this loss of productivity.

Fedora and Python 2

Posted Apr 11, 2018 23:52 UTC (Wed) by smurf (subscriber, #17840) [Link] (1 responses)

> This whole problem-domain is something Python3 has created.

Well, all I can say is that my experience (both with Perl5 and Python2) is rather different. Our fight with Perl5, incrementally switching our corporate code base to UTF8 compatibility, was … ugly.

Thus I'm very happy about the fact that Python3 spews a large unfriendly stack dump to your terminal when I forget to specify how an external byte stream is encoded. While it's somewhat annoying when you JUST KNOW that all your data is UTF8, or latin1, or randombytes … things change, and when it "suddenly" isn't, you get mojibake. Or worse. No thanks.

Fedora and Python 2

Posted Apr 12, 2018 16:29 UTC (Thu) by togga (guest, #53103) [Link]

I use Python as a script language in the sense of a glue language that adapts to the world, you seem to have the ambition to change the world to adapt to Python. Python and UTF8, is not THAT good :-) For me Python is rather quite old, bloated and tired, and I don't even want to get started with UTF8 as some sort of universal data representation...

The latter sounds to me like utopia and an endless job for achieving nothing but explains lots of the attitudes of the Python community.

Fedora and Python 2

Posted Apr 12, 2018 1:12 UTC (Thu) by mjblenner (subscriber, #53463) [Link] (1 responses)

> It's just broken in this case to convert these symbols to another representation.

OK. I kind of get where you're coming from. Although I'm a bit confused. Or you're a bit confused.

ctypes is an ABI interface, so having the structure field be a different name to the function symbol is of no relevance for using python to glue various other C functions together (even when passing that structure around).

i.e. here:

> type('iface', (ctypes.Structure,), {'_fields_': [(b'c_string_symbol', ...

Anyway, the easy answer is to just use python strings there. ctypes function symbol lookup converts strings to utf-8, so 99%+ of the time, this will work.

Otherwise...

Decode the symbol name to a string with errors='surrogateescape' for use in python, and use the same error handler to encode back to the original bytes for getting the symbol out of the library.

Or you could add a layer of indirection between the structure field names and the function symbols.
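
The surrogateescape round trip suggested above can be sketched as follows. The helper names are hypothetical; the only library feature used is the standard `surrogateescape` error handler, and UTF-8 is assumed as the nominal encoding.

```python
def symbol_to_str(raw: bytes) -> str:
    """Turn arbitrary symbol bytes into a str without losing information."""
    # Bytes that are not valid UTF-8 become lone surrogates (U+DC80..U+DCFF).
    return raw.decode("utf-8", errors="surrogateescape")


def str_to_symbol(name: str) -> bytes:
    """Recover the original bytes from a str made by symbol_to_str()."""
    return name.encode("utf-8", errors="surrogateescape")


raw = b"sym\xff\xea\xa9"           # not valid UTF-8
name = symbol_to_str(raw)          # usable as an ordinary str in Python
assert str_to_symbol(name) == raw  # the round trip is lossless
```

Note that such a string cannot be written out with a strict UTF-8 encoder, so it has to be converted back with the same error handler before it leaves the program.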

Fedora and Python 2

Posted Apr 12, 2018 16:36 UTC (Thu) by togga (guest, #53103) [Link]

> "Or you're a bit confused."

I'm not confused, I've just experienced lots of issues I didn't have before Python3's software castle in the air.

> "the easy answer is to just use python strings"

Isn't this kind of bloated. These strings can come from anywhere and might not even be visible in Python code at all.

> "Decode the symbol name to a string with errors='surrogateescape' for use in python, and use the same error handler to encode back to the original bytes for getting the symbol out of the library."
> "Or you could add a layer of indirection between the structure field names and the function symbols."

You mean use Python3 and stick with tons of workarounds and issues just for it's sake? Change the whole world to Python3? I value my time much more than that.

Fedora and Python 2

Posted Apr 12, 2018 23:47 UTC (Thu) by dvdeug (guest, #10998) [Link] (2 responses)

I'm not feeling it here. Having a text file in multiple encodings is incredibly fragile and a pain to work with. If you have to deal with external, non-ASCII symbols in your program, you're going to want to change the names to something you can work with in Python, not a random set of bits. If you're automatically generating code and don't care about the Python symbol names, then encoding them using base64 is trivial (and again, you don't care about the Python symbol names so why do you care if they're line noise or encoded line noise?)

If you're just passing something from system A to system B, you shouldn't have to change the data. But there's a fairly thin region where you can choose to not unmangle something and still expect to be able to do anything with it. Stuff not being clean, nice and tidy is all the more reason to make sure you know exactly how the data you're handling is formatted.

Fedora and Python 2

Posted Apr 15, 2018 15:13 UTC (Sun) by togga (guest, #53103) [Link] (1 responses)

> "then encoding them using base64 is trivial (and again, you don't care about the Python symbol names so why do you care if they're line noise or encoded line noise?)"

1. Doesn't scale. Changing representation requires one additional pass over the data. Python is already slow to begin with.
2. Accessing human readable symbols is convenient when needed by scripts, tests or debug.

Fedora and Python 2

Posted Apr 15, 2018 20:53 UTC (Sun) by dvdeug (guest, #10998) [Link]

It adds time linear in the amount of text being processed. Since it only needs to touch text being processed, and processing it already takes at least time linear in the amount of text being processed, it does not change the Big O of your operation at all. It scales by definition. I suspect even in Python it will always be trivial in the amount of time it takes, but the issue is certainly not whether it scales.

We're not talking about human readable symbols; we're talking about "non-ASCII symbols in your program" that aren't Unicode. Even if the editor mangles it for you, how is bsymFFEAA9 worse than \xff\xea\xa9? Something slightly smarter than base64 would preserve human readable names and only mangle unreadable names, but the only case where not worrying about mangling is going to cause problems in Python 3 is when it's not human readable.
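
One possible shape for such a "slightly smarter" mangling is sketched below. Everything here is an assumption for illustration: the `bsym` prefix is borrowed from the example name above, hex is used instead of base64 so that the result stays a valid identifier, and names that already start with the prefix are mangled too so the mapping stays reversible.

```python
PREFIX = "bsym"  # hypothetical marker for mangled names


def mangle(raw: bytes) -> str:
    """Keep readable ASCII identifiers; hex-encode everything else."""
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        text = None
    if text is not None and text.isidentifier() and not text.startswith(PREFIX):
        return text
    return PREFIX + raw.hex().upper()


def unmangle(name: str) -> bytes:
    """Inverse of mangle()."""
    if name.startswith(PREFIX):
        return bytes.fromhex(name[len(PREFIX):])
    return name.encode("ascii")


print(mangle(b"printf"))
print(mangle(b"\xff\xea\xa9"))
```

Both functions make a single pass over the name, so the extra cost is linear in its length.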

Fedora and Python 2

Posted Apr 11, 2018 22:31 UTC (Wed) by togga (guest, #53103) [Link] (8 responses)

Both of these crash in python3. Bugs or features?

$ python2 -c "import json; print(json.dumps(b'xx'))"
"xx"
$ echo -n -e "\xFF" | python2 -c "import sys; print(repr(sys.stdin.read()))"
'\xff'

Fedora and Python 2

Posted Apr 11, 2018 23:39 UTC (Wed) by smurf (subscriber, #17840) [Link]

JSON doesn't have a non-UTF8 string data type. Thus you can't encode bytes to JSON.
You can change that. See "pydoc3 json".

Your second example works when you set a locale in which \xFF is a valid character.

$ echo -n -e "\xFF" | env LC_ALL=iso-8859-1 python3 -c "import sys; print(repr(sys.stdin.read()))"
'\udcff'
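
For the JSON case, one way to "change that" is the `default` hook of `json.dumps`, paired with `object_hook` on the way back. This is a minimal sketch; the `__bytes__` wrapper key and the choice of base64 are assumptions made for the example, not anything JSON or the json module prescribes.

```python
import base64
import json


def encode_bytes(obj):
    """json.dumps() calls this for objects it cannot serialize itself."""
    if isinstance(obj, bytes):
        return {"__bytes__": base64.b64encode(obj).decode("ascii")}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def decode_bytes(dct):
    """json.loads() calls this for every decoded JSON object."""
    if set(dct) == {"__bytes__"}:
        return base64.b64decode(dct["__bytes__"])
    return dct


text = json.dumps({"payload": b"\xff"}, default=encode_bytes)
restored = json.loads(text, object_hook=decode_bytes)
print(text)
print(restored)
```

The resulting document is ordinary JSON, so other consumers can read it, but they need to know the wrapper convention to get the bytes back.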

Fedora and Python 2

Posted Apr 11, 2018 23:58 UTC (Wed) by mjblenner (subscriber, #53463) [Link] (6 responses)

> Both of these crash in python3. Bugs or features?

Features.

> python2 -c "import json; print(json.dumps(b'xx'))"

JSON is UTF-{8|16|32}. What, exactly, do you want python to do with random bytes?

> echo -n -e "\xFF" | python2 -c "import sys; print(repr(sys.stdin.read()))"

Use sys.stdin.buffer to get bytes rather than UTF-8.

Fedora and Python 2

Posted Apr 12, 2018 8:15 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

> Use sys.stdin.buffer to get bytes rather than UTF-8.
And remember to never, ever mix them in the same program. Oh, and it'll mostly work if you forget .buffer in one place. It'll just crash with bad data sometimes.

Fedora and Python 2

Posted Apr 12, 2018 10:24 UTC (Thu) by mjblenner (subscriber, #53463) [Link] (3 responses)

> And remember to never, ever mix them in the same program. Oh, and it'll mostly work if you forget .buffer in one place. It'll just crash with bad data sometimes.

Never ever mix them because if you do it will mostly work? (OK...)

Anyway, sounds like you need

PYTHONIOENCODING="utf-8:surrogateescape"

or use open(0, 'rb') or something, depending on what you're trying to do.
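
The two options mentioned above can be sketched like this. The helper names are hypothetical, and in-memory streams stand in for standard input so the example does not depend on how the script is invoked.

```python
import io


def read_bytes(text_stream) -> bytes:
    """Bypass the text layer and read the raw bytes underneath it."""
    return text_stream.buffer.read()


def read_text_lossless(binary_stream) -> str:
    """Decode as UTF-8, keeping undecodable bytes as lone surrogates."""
    wrapper = io.TextIOWrapper(
        binary_stream, encoding="utf-8", errors="surrogateescape"
    )
    return wrapper.read()


# Stand-ins for sys.stdin and sys.stdin.buffer.
fake_stdin = io.TextIOWrapper(io.BytesIO(b"\xff"), encoding="utf-8")
print(repr(read_bytes(fake_stdin)))

text = read_text_lossless(io.BytesIO(b"ok\xff"))
print(ascii(text))
```

In a real script, `read_bytes(sys.stdin)` would do the equivalent of `sys.stdin.buffer.read()`, while `read_text_lossless` mirrors the per-process `PYTHONIOENCODING` setting for a single stream.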

Fedora and Python 2

Posted Apr 12, 2018 17:05 UTC (Thu) by togga (guest, #53103) [Link] (2 responses)

Thanks for the heads up in the new world of Python. I figure I should expect the user to have set this PYTHONIOENCODING variable to "random" to begin with.

Scripts should then always start by setting this parameter, or is it too late? Are we talking shell wrappers here or refuse to start if set incorrectly?
If we do multiple things with multiple needs for encoding, do we need different settings for different incoming data, in other words set it with each read?
Fedora and Python 2

Posted Apr 12, 2018 17:52 UTC (Thu) by smurf (subscriber, #17840) [Link] (1 responses)

You can't both expect programs to work with whatever random cruft you feed them, *and* to keep your data safe.

Setting the encoding to whatever is actually used is simple enough – besides, that stuff happens to work correctly when your data and your locale match. Surprise: they usually do. And if you want to process binary data, then use "sys.stdin/out.buffer" (or binary mode). This is documented.

On the other hand, allowing a random mix of differently-encoded strings (which is what Python2 or Perl do) and then trying to disentangle the resulting mojibake (or even figure out what causes it) is a frustrating and sometimes futile exercise in preventing data loss after it's too late. Been there, done that, bitten the carpet.

"Explicit is better than implicit" is one of Python's mottos. I happen to think that it's helpful. If you don't, well, there are other languages.

Fedora and Python 2

Posted Apr 12, 2018 21:59 UTC (Thu) by togga (guest, #53103) [Link]

I haven't got a clue what you're talking about but given python's dynamic typing this line was quite amusing :-)

> "Explicit is better than implicit" is one of Python's mottos. I happen to think that it's helpful. If you don't, well, there are other languages.

Fedora and Python 2

Posted Apr 12, 2018 16:54 UTC (Thu) by togga (guest, #53103) [Link]

"What, exactly, do you want python to do with random bytes?"

Python2 just keeps them as is, works wonderfully.

"Use sys.stdin.buffer to get bytes rather than UTF-8."

Thanks. It worked. I made a compatible version. Awesome.

$ echo -n -e "\xFF" | python3 -c "import sys; S=type('S', (bytes,), {'__repr__': lambda s: bytes.__repr__(s)[1:]}); read_stdin=sys.stdin.buffer.read; sys.stdin.read = lambda: S(read_stdin()); print(repr(sys.stdin.read()))"
'\xff'
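For readability, the one-liner above can be unpacked roughly as follows. This is only a sketch of the same trick: the class and function names are invented here, and an io.BytesIO object stands in for sys.stdin.buffer.

```python
import io


class Py2StyleBytes(bytes):
    """bytes subclass whose repr drops the leading 'b', as in the one-liner."""

    def __repr__(self):
        # bytes.__repr__ yields "b'...'"; slicing off the first character
        # leaves the Python 2 look, "'...'".
        return bytes.__repr__(self)[1:]


def read_py2_style(binary_stream):
    """Read raw bytes and wrap them so they print the way Python 2 str did."""
    return Py2StyleBytes(binary_stream.read())


# In a real script the argument would be sys.stdin.buffer.
data = read_py2_style(io.BytesIO(b"\xff"))
print(repr(data))
```

The wrapped object is still a bytes instance, so nothing is decoded; only the printed form changes.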

Fedora and Python 2

Posted Apr 11, 2018 21:41 UTC (Wed) by togga (guest, #53103) [Link]

"Anyway, the right way to fix this is to report a bug. Bitch about it on LWN only when whoever is responsible for the code refuses to fix it."

Is it really? Python3 has clearly chosen a design-path incompatible with my use-cases and this is on a fundamental level not fixed by a "bug report".

For me, discussing on LWN isn't bitching. It's a very good platform to discuss and transfer information, ideas and knowledge. Now that my Py2 floor is starting to crack, I find LWN a good source for figuring out what to do from here. Also, since I'm invested in Py2, I'm motivated to put some work into alternatives, and this is one place to find people in similar situations who could help.

I'd advise joining us with a positive attitude and constructive ideas; I think we need less bitch-talk.

Fedora and Python 2

Posted Apr 12, 2018 4:33 UTC (Thu) by njs (subscriber, #40338) [Link] (1 responses)

Struct field names aren't symbols, and don't appear anywhere in libraries; they're just convenience names for use in your code.

Ctypes does support using bytestrings for symbols:

In [4]: libc = ctypes.CDLL("libc.so.6")

In [5]: libc[b"sprintf"]
Out[5]: <_FuncPtr object at 0x7f14c0f88110>

So I think this criticism is simply mistaken.

Fedora and Python 2

Posted Apr 12, 2018 5:44 UTC (Thu) by njs (subscriber, #40338) [Link]

Whoops, this was already addressed; I just misread the threading.

Fedora and Python 2

Posted Apr 11, 2018 21:29 UTC (Wed) by togga (guest, #53103) [Link]

"Did you bother to submit a bug report / feature request to allow bytestrings?"
These end up as attributes; that would not just be a bug report but would propagate everywhere and probably bring up the need for a Python4.

"And why are you using them, instead of strings, in the first place?"
Because these are not hand-written in python scripts; they are read from external sources in various places, such as C strings from C APIs.

Fedora and Python 2

Posted Apr 5, 2018 12:53 UTC (Thu) by mordae (guest, #54701) [Link] (3 responses)

> But now that I've had time to experience the python 3 way, I think it's worse. I usually would rather deal with bytes.

I don't really get this. Python 3 made the encoding handling explicit and much more predictable. There is an implicit conversion of bytes into str assuming ASCII, which I think is sensible nowadays (with EBCDIC gone). And apart from that, it's impossible to convert str into bytes without being explicit about the encoding.

> There are so many situations where the code I deal with doesn't know the encoding of the bytes its receiving...

Can you name one?

Fedora and Python 2

Posted Apr 5, 2018 16:11 UTC (Thu) by lsl (subscriber, #86508) [Link] (1 responses)

> > There are so many situations where the code I deal with doesn't know the encoding of the bytes its receiving...
>
> Can you name one?

Reading from standard input. Reading a file system directory's contents. Opening a file with user-specified name. The values of environment variables. All kinds of networking protocols that only reserve a small subset of ASCII (e.g. '\n') for control purposes with the rest being opaque bytes.
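For cases like these, Python 3 offers the surrogateescape error handler, which lets undecodable bytes ride along inside a str and come back out unchanged. A minimal sketch follows, using a made-up file name; as far as I know, os.fsdecode() and os.fsencode() apply the same idea to file names on POSIX systems, and os.listdir() returns bytes when it is given a bytes path.

```python
# A file name containing a byte that is not valid UTF-8 (hypothetical example).
raw_name = b"report-\xff.txt"

# Strict decoding refuses the name outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles the bad byte through as a lone surrogate code point.
as_text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = as_text.encode("utf-8", errors="surrogateescape")
```

The resulting str is fine for passing back to the operating system, but it cannot be encoded strictly, so printing it or writing it to a UTF-8 file still needs an explicit decision.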

Fedora and Python 2

Posted Apr 5, 2018 16:23 UTC (Thu) by k8to (guest, #15413) [Link]

The really egregious one is running commands for their output on Windows.

It's very common on that platform that the encoding of command output is special to the command. Some things follow the system selected locale, some don't. Some even mix fixed strings in ascii with filename based output in utf-16 and crap like that.

Python3 would be okay for this if bytes were as full-featured as Python2 strs used to be, but they're really not. A lot of the standard library really insists on strings.

Fedora and Python 2

Posted Apr 6, 2018 11:15 UTC (Fri) by jwilk (subscriber, #63328) [Link]

> There is an implicit conversion of bytes into str assuming ASCII

Python 2 has implicit bytes→unicode conversion. Python 3 doesn't have it.
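A small sketch of what that difference looks like in Python 3; the variable names are arbitrary.

```python
text = "abc"
data = b"abc"

# Mixing str and bytes raises instead of silently decoding as ASCII.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Comparison does not convert either; a str never equals a bytes object.
equal_without_decoding = (text == data)

# The conversion has to be spelled out, encoding included.
equal_after_decoding = (data.decode("ascii") == text)
```

Under Python 2 the concatenation would have produced a unicode object by decoding the byte string as ASCII, which works until the first non-ASCII byte shows up.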

Fedora and Python 2

Posted Apr 5, 2018 11:03 UTC (Thu) by dunlapg (guest, #57764) [Link] (4 responses)

Python 3 is basically a new language; porting is anything from trivial to very, very complex.

People are still maintaining awk, sed, etc. From an ecosystem point of view, it seems like it would be a good idea to have python2 become something similar -- a 'utility' language which is just expected to exist on all Linux systems, which gets fixes but very little active development.

It's just a little matter of finding people to do that...

Fedora and Python 2

Posted Apr 5, 2018 12:32 UTC (Thu) by pbonzini (subscriber, #60935) [Link] (3 responses)

Python is a few orders of magnitude more code than sed or awk, not counting the millions of lines in libraries (sed and awk are more or less monolithic).

Fedora and Python 2

Posted Apr 5, 2018 14:09 UTC (Thu) by dunlapg (guest, #57764) [Link] (2 responses)

Yes, sorry if it wasn't clear: I didn't in any way mean to imply that python 2 was as small as awk or as easy to maintain: rather, just that it was as important to the ecosystem, even if it never grew or developed further than it already has.

Also, just to be clear, I'm not trying to say that any particular individual or group has an obligation to continue supporting python 2; I am saying that the world would be a better place if "someone" stepped up to do it; and if I were RedHat I'd be planning on having a fully python-2-compatible system maintained forward indefinitely as one of my selling points.

Fedora and Python 2

Posted Apr 5, 2018 16:25 UTC (Thu) by k8to (guest, #15413) [Link] (1 responses)

It kind of is, but the burden is quite high, given all the included functionality. I'm sure you recognize that, but I'm doubtful there will be a sufficiency of interest. I'd be happily proven wrong.

Fedora and Python 2

Posted Apr 5, 2018 17:17 UTC (Thu) by k8to (guest, #15413) [Link]

(Mainly I'm thinking about network protocols here. http, ssl, etc)

Fedora and Python 2

Posted Apr 5, 2018 12:40 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

About 40.000 say so in the categories (https://pypi.python.org/pypi?%3Aaction=browse), more or less the same as Python 2. 16.000 say that they support both.

Fedora and Python 2

Posted Apr 5, 2018 15:41 UTC (Thu) by southey (guest, #9466) [Link] (21 responses)

Yes this makes complete sense!
Don't forget that users can still install Python 2 from source in their distribution of choice and users can create their own Python 2 distribution as well. But if users want to have the benefits of a distribution then users need to support their distribution and get those Python 2 packages ported over. This would also be true for PyPI but PyPI is not a Linux distribution and PyPI needs to become more proactive as well.

Arguments based on dependencies are no longer valid. It will be 10 years in December since Python 3 was released and many of the frequently used packages have already been ported to Python 3. So if a dependency has not been ported by now then users should have no expectations that it will be ported in the future nor should users expect that the package will continue to be supported. So unless users can get it ported, users need to start replacing those packages with something else that is supported in Python 3.

Fedora and Python 2

Posted Apr 6, 2018 12:50 UTC (Fri) by bandrami (guest, #94229) [Link] (5 responses)

> and users can create their own Python 2 distribution as well

Though they will get sued if they have the temerity to call that Python 2 distribution "Python"

Fedora and Python 2

Posted Apr 12, 2018 4:50 UTC (Thu) by njs (subscriber, #40338) [Link] (4 responses)

They will not. If they start *changing the language* and still call it "Python", then they'll get a polite request to change the name to avoid confusing people and protect the "Python" trademark – that's what happened with Tauthon. But the poster you're replying to isn't talking about that, they're talking about making regular old Python 2 available outside of the normal distribution channels, and that's totally ok and very common (see pyenv, conda, etc.).

Fedora and Python 2

Posted Apr 17, 2018 13:27 UTC (Tue) by bandrami (guest, #94229) [Link] (3 responses)

You're leaving out a HUGE part there. Someone took over development of Python 2 *after it had been abandoned by its original team* and was threatened with legal action for keeping the name if he maintained that language.

Can you think of any other situation in free software where a team that has abandoned a codebase has not just discouraged someone willing from taking over maintenance, but in fact used legal pressure to prevent it?

Fedora and Python 2

Posted Apr 17, 2018 14:00 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link]

>Can you think of any other situation in free software where a team that has abandoned a codebase has not just discouraged someone willing from taking over maintenance

That's not what happened here. Maintenance of the codebase is perfectly fine. The name however is not free to use for forks. That is a situation that is quite common in Free software projects.

Fedora and Python 2

Posted Apr 17, 2018 20:22 UTC (Tue) by smurf (subscriber, #17840) [Link]

What do you mean, "abandoned"? It's not. The Python 2 branch contains a whole lot more changes (looking at the last half year) than Tauthon.

Fedora and Python 2

Posted Apr 28, 2018 20:07 UTC (Sat) by flussence (guest, #85566) [Link]

>Can you think of any other situation in free software where a team that has abandoned a codebase has not just discouraged someone willing from taking over maintenance, but in fact used legal pressure to prevent it?
A few years back the libav gang attempted to sue the legitimate FFmpeg project out of existence over its use of the logo. They failed because they didn't actually own any rights to the image to begin with; a third party contributed it. Hasn't stopped them using it, mind you.

This Python naming dispute isn't an act of malice - it's a simple trademark defence. Mozilla does exactly the same thing, which is why we have IceCat, IceWeasel, PaleMoon, Seamonkey etc.

Fedora and Python 2

Posted Apr 8, 2018 13:52 UTC (Sun) by togga (guest, #53103) [Link] (14 responses)

"So unless users can get it ported, users need to start replacing those packages with something else that is supported in Python 3."

Considering that the Python "community" (or BDFL) has, for obvious reasons, failed to switch its users from 2 to 3 over a period of 10 years and has finally resorted to threats and end-of-support to do so, I wonder if this is a sound stance to take. Buying into these tactics has had a negative impact on the open source community. The Py2/Py3 range of problems is now at (or surpassing) the annoyance level of multi-platform path-separation, argument-quoting, and EOL issues. Fedora should answer the following questions for its users:

Why should users accept regressions to migrate to Python3?
Why would this situation not happen all over again with Python4, when Guido is disappointed with something else in Python3?

I think Fedora should stay classy and recommend migration away from a BDFL-controlled language gone awry. Ideally a fork would have happened long before this point.

Fedora and Python 2

Posted Apr 8, 2018 18:55 UTC (Sun) by smurf (subscriber, #17840) [Link] (13 responses)

> for obvious reasons

Yeah. Hindsight is always 20/20. Where were you 10 years ago? Did you know better at the time? I certainly didn't.

Python is not any more BDFL-"controlled" than any other language.

Fedora and Python 2

Posted Apr 8, 2018 19:28 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (11 responses)

Everybody makes mistakes. It's fine.

The correct way is to acknowledge them and fix them. PHP actually did this with ill-fated PHP6, which also had started with "convert-the-world-to-magic-Unicode" approach. But after several years they realized that it's not going to work, so the fork was abandoned and most of the interesting stuff had been backported into PHP5. And the next migration attempt (PHP7) was designed to have only fairly minor breaking changes.

Yet Python persists in the belief that "they know better" and "just because". That "Unicode is like eating veggies" ( https://nothingbutsnark.svbtle.com/porting-to-python-3-is... ).

Well, as a result a lot of large codebases are now migrating away from Python entirely. Google now has a transparent Golang interoperability layer built specifically for Youtube and Dropbox is just moving to Golang piecewise.

Fedora and Python 2

Posted Apr 8, 2018 21:09 UTC (Sun) by smurf (subscriber, #17840) [Link] (10 responses)

PHP has enough problems that personally I wouldn't use it as a positive example for anything whatsoever, but that's just me. IMHO the PHP6-full-of-Unicode idea was doomed to fail the second they decided to base their work on UTF-16. It's now umpteen years later (well, not quite, but you get the idea) and PHP still doesn't have any better support for Unicode than 5.whatever. That's not "fixing" in my book.

Python2's indiscriminate support for mixing text and binary data caused a ton of hard to track bugs. "the day Django stops supporting Python 2 they will be able to rip out a ton of code that exists purely because it was so easy to mix text and binary data and get it wrong" (cited from the page you linked to). Yes it's annoying to suddenly be required to think about whether your data is binary or text, and if the latter what encoding, but frankly you should have done that in the first place – the real problem is Python2's inability to support that distinction.

And yes there were some mistakes in early Python 3.x versions, but "acknowledge and fix mistakes" happened, e.g. by ditching 2to3 in favor of six and by extending support for Py2 – not by throwing in the towel because the whole thing was doomed to fail from the beginning. It obviously isn't; Python3 uptake may be slower than expected initially, but it's still way faster than that of Perl6. :-P

If Go is a better fit than Python2/3 for whatever it is you want to do, well, go for it. I'm not going to infer any perceived inferiority of Python2 vs. Python3 from that. It's probably more like "if we need to spend some effort to switch over anyway, let's examine what else might work even better".

Fedora and Python 2

Posted Apr 9, 2018 2:17 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> PHP6-full-of-Unicode idea was doomed to fail the second they decided to base their work on UTF-16
Why is that? What is wrong with UTF-16?

> "the day Django stops supporting Python 2 they will be able to rip out a ton of code that exists purely because it was so easy to mix text and binary data and get it wrong" (cited from the page you linked to).
I really doubt it.

> If Go is a better fit than Python2/3 for whatever it is you want to do, well, go for it. I'm not going to infer any perceived inferiority of Python2 vs. Python3 from that. It's probably more like "if we need to spend some effort to switch over anyway, let's examine what else might work even better".
Exactly. And this is self-inflicted entirely.

Fedora and Python 2

Posted Apr 9, 2018 2:58 UTC (Mon) by roc (subscriber, #30627) [Link] (5 responses)

I know nothing about PHP but

> What is wrong with UTF-16?

Outside UTF-16-based platforms (Windows, Java, JS), UTF-16 is strictly worse than UTF-8:
* Twice the space usage for ASCII (almost never more compact than UTF-8 on any text in practice)
* Multi-code-point characters are rarer, therefore less well tested
* Byte-order-dependent
* Needs special logic to sort lexicographically
More details: http://utf8everywhere.org/
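Three of the bullet points above can be checked directly with Python's codecs. This is just an illustrative sketch; the sample characters are arbitrary picks (U+E000 is a private-use code point in the BMP, U+10000 is the first code point that needs a surrogate pair in UTF-16).

```python
# Space: ASCII text takes twice as many bytes in UTF-16.
ascii_text = "hello"
utf8_len = len(ascii_text.encode("utf-8"))
utf16_len = len(ascii_text.encode("utf-16-le"))

# Byte order: the same character has two different UTF-16 serializations.
little_endian = "A".encode("utf-16-le")
big_endian = "A".encode("utf-16-be")

# Sorting: UTF-8 bytes sort in code point order, UTF-16 code units do not.
bmp_high = "\ue000"
astral = "\U00010000"
pair = [astral, bmp_high]

by_code_point = sorted(pair)
by_utf8_bytes = sorted(pair, key=lambda s: s.encode("utf-8"))
by_utf16_units = sorted(pair, key=lambda s: s.encode("utf-16-be"))
```

The UTF-16 ordering flips because the surrogate pair for U+10000 starts with 0xD800, which compares lower than 0xE000.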

Fedora and Python 2

Posted Apr 9, 2018 7:27 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

So why did Py3 default to it on Windows and many Linuxes until Py3.3?

Fedora and Python 2

Posted Apr 9, 2018 10:09 UTC (Mon) by smurf (subscriber, #17840) [Link]

Presumably they now know better. Correcting mistakes and all that.

Fedora and Python 2

Posted Apr 9, 2018 23:49 UTC (Mon) by roc (subscriber, #30627) [Link] (2 responses)

I don't know.

People use UTF16 a lot more than they should because in the 90s everyone was taught that 2-byte code points was *the* way to support Unicode, and Windows, Java, JS and others designed their APIs around that assumption. That kinda made sense if you believed, as many people did, that the 2-byte encoding with no multi-code-point characters (UCS2) would suffice for every Unicode character forever. That teaching was pervasive enough that many people came to *define* Unicode support as 2-byte code points, and some still do. So I'm not surprised that Python people got it wrong.

It's all a multibillion-dollar mistake :-(. Platform and application vendors did a ton of work to shift to UCS2 and then (implicitly) UTF16, to end up with a worse encoding than UTF8, a lot more work than if they'd just reinterpreted their byte strings as UTF8. Linux got this right, I think largely by just delaying Unicode support until it was clear UTF16 was a loser.

People at the Unicode consortium who set everyone down the UTF16 path really should fess up and apologize. Their mistake caused vast resources to be wasted and has actually caused Unicode support to be worse than it otherwise would have been.

Fedora and Python 2

Posted Apr 12, 2018 23:18 UTC (Thu) by dvdeug (guest, #10998) [Link]

There's people at Unicode still pissed at UTF-8. We could have had one standard line end marker, one paragraph marker, proper dashes and quotes, but instead we got UTF-8 and forced Unicode to be a fancier ASCII. UTF-1 was sort of ASCII compatible back at the start of Unicode, but it was ... horrifying. Doing "mod 190" to encode characters, anyone?

The other argument is that nobody could have sold a 32 bit encoding in the early 1990s. In 1996, they declared that it was going to have to expand from 16 bit to 32 bit (or 20.1 bit). In 2001, Deseret was one of the first scripts encoded above FFFF because they needed to start encoding stuff up there, but they didn't want to start with scripts people were going to fight to keep in the BMP. And yet it wasn't until 2010 that MySQL, even in UTF-8, supported characters above FFFF. Unless they've made some changes since I last checked, it still bites people that MySQL charset utf8 is for FFFF and below only, and utf8mb4 is needed to actually encode UTF-8.

With a bunch of foresight on everyone's part, it might have been better. But pushing a 32 bit encoding in 1990 could also have mired the idea and left us working with an ISO 2022-style pile of encodings or at least stalled things by a decade where more legacy data in legacy encodings, and even more legacy encodings, were created, and more protocols were designed around the idea that everything has its own encoding instead of everything being in a fixed encoding, or at least a Unicode-compatible encoding.
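The FFFF boundary mentioned above is easy to see from the encoded lengths: code points up to U+FFFF need at most three bytes in UTF-8, while anything higher needs four. A rough sketch, in which fits_in_bmp is a hypothetical helper one might use to check text before storing it in a column limited to three-byte UTF-8:

```python
def utf8_width(ch):
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_bmp(text):
    """True if every code point is U+FFFF or below (three UTF-8 bytes at most)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


euro = "\u20ac"              # EURO SIGN, inside the BMP
deseret = "\U00010400"       # DESERET CAPITAL LETTER LONG I, above FFFF

euro_width = utf8_width(euro)
deseret_width = utf8_width(deseret)
```

Text for which fits_in_bmp() returns False is the kind that would need utf8mb4 in the MySQL case described above.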

Han Unification means UCS-2 was doomed and UTF-16 makes no sense

Posted Apr 15, 2018 1:14 UTC (Sun) by DHR (guest, #81356) [Link]

When UNICODE-1 (ucs-2) was designed, with a maximum of 64k code points, it was 100% foreseeable that this was a mistake.

The key was that it required "Han unification". Japanese, traditional Chinese, Korean, and simplified Chinese symbols would have to share code points. This was never going to be acceptable to people using those languages.

The analogy I heard was: would the Greek and Roman alphabets share code points? Alpha and A are really the same, are they not? How about Aleph? No way!

The fact is that UNICODE-1 was doomed before birth.

UTF-16 was always a bad idea. Some tried to ignore that and we live with that mistake.

<https://en.wikipedia.org/wiki/Han_unification>

UTF-8 was designed by the Plan9 folks. Quite early. On a napkin. Some of them later brought us Go.

Fedora and Python 2

Posted Apr 9, 2018 10:56 UTC (Mon) by niner (subscriber, #26151) [Link] (2 responses)

2 years after its first stable release, we're using Perl 6 in production and installed from distro repositories. And we use it to extend an existing Perl 5 code base thanks to interoperability features.

2 years after Python 3's first stable release, I couldn't even install it from a distro repo. And 10 years after said release, I still can't use it in combination with our existing Python 2 code.

I wonder what you use as base for your update comparison. Because at the 2 year mark, I'd call it a hands down victory for Perl 6 and we don't know what the 10 year mark's gonna be like. And anyway, I didn't even know there was a contest.

Fedora and Python 2

Posted Apr 10, 2018 8:16 UTC (Tue) by OttoErickson (guest, #122996) [Link] (1 responses)

Not being a Perl Monger I'm genuinely surprised to hear that Perl 6 is being used in production. I was under the impression that Perl 6 had largely been abandoned; it would seem that my impression was incorrect. In all honesty, this is not a criticism or flame of Perl; I've never really worked with Perl—any version—so I haven't kept up with it. I'd love to see an article on the topic of Perl 6 here on LWN.

Fedora and Python 2

Posted Apr 12, 2018 8:11 UTC (Thu) by niner (subscriber, #26151) [Link]

To get a feeling for activity in Perl 6, I can heartily recommend https://p6weekly.wordpress.com/

Fedora and Python 2

Posted Apr 11, 2018 21:05 UTC (Wed) by togga (guest, #53103) [Link]

"Where were you 10 years ago? did you know better at the time? I certainly didn't."

Since I was a heavy Numpy/Scipy/etc user 10 years ago, I just followed Python3 development occasionally and did not really make any effort to use it. Using Python for prototyping algorithms, simulations, debug and test scripts, I never really encountered any problem that Python3 tried to solve. Since I use Python for debugging and interface a lot with C APIs, Python3 issues really pile up when I try to use it. Personally I'm on an exit strategy now, but unfortunately there is a lot of Python code out there, especially since its "easy syntax" makes it popular at many workplaces. In the last couple of years I've been struck by py2/py3 issues on an almost day-to-day basis. I find most of them unnecessary and annoying; I haven't seen anything like this in any other language I use.

Fedora and Python 2

Posted Apr 10, 2018 17:43 UTC (Tue) by sionescu (subscriber, #59410) [Link]

Breaking changes are the inherent reality of evolving open-source software because there's no enterprise structure that charges customers high enough for the level of indefinite compatibility that you seem to wish for.
If you want an unchanging dynamic-like language try Common Lisp :)

Fedora and Python 2

Posted Apr 5, 2018 15:13 UTC (Thu) by arjan (subscriber, #36785) [Link] (6 responses)

Working on a distro that started this transition some time ago.... the reality is that the only hard stragglers that we suffer from are projects where Red Hat folks are primary drivers (such as Ceph and until recently Ansible, although that got fixed for py3)....

Fedora and Python 2

Posted Apr 6, 2018 10:16 UTC (Fri) by misc (subscriber, #73730) [Link] (2 responses)

In the case of Ansible, the main issue is that the software needs to work on both python 2 and python 3. No matter what people think, there will be python 2 for a long time, and customers pay for that.

And to make sure things work on python 3, you need extensive test-suite coverage, and we are not there yet for Ansible, due to multiple factors such as "mocking cloud services API is tedious" and "people submit more modules and new stuff than tests and bugfixes". There is a balance between ease of contribution and having a perfect PR.

All the code has passed the compilation check for python 3 for a long time, but this is Python, so that is not enough to verify that it doesn't crash in weird ways. And even with complete test coverage, you can uncover edge cases (for example, something returning "changed" when nothing was changed; testing that would more or less require doubling the resources for testing, and likely the time too).

And we had a catch-22 situation. People didn't use it on python 3 because it wasn't officially supported, but we couldn't support it because we were not sure it would work well enough, and installing python 2 was an easy enough workaround. Fortunately, the porting work was started, and after a while it became just "let's try with python 3 and submit patches until it's good enough". For example, that's what I did: I took the git repo and tried to deploy openshift (because that was the biggest codebase I could find at that time) until it worked. It now works well enough, but I focused on my use case; I didn't look at all the cloud modules, as it seemed that we were also waiting on upstreams to release python 3 versions (that, and the fact that I didn't use them).

Ansible is also really easy to test on python 3, just change one line in inventory. It shouldn't eat your babies or anything, even if nobody can promise that (but we also don't promise that for python 2 execution, and yet people use it, so...).

Fedora and Python 2

Posted Apr 6, 2018 12:44 UTC (Fri) by arjan (subscriber, #36785) [Link]

maybe my language was imprecise.. ansible used to be a problem, but we're now using it in py3 mode without issues, so it is a positive example...

I understand the "would like to keep py2 compat" angle since unfortunately not all current enterprise distros ship with py3 as a good option...

Fedora and Python 2

Posted Apr 9, 2018 10:00 UTC (Mon) by OttoErickson (guest, #122996) [Link]

> No matter what people think, there will be python 2 for a long time, and customers pay for that.

Job security.

Fedora and Python 2

Posted Apr 6, 2018 20:04 UTC (Fri) by flussence (guest, #85566) [Link] (2 responses)

On my Gentoo box (where I've gone out of my way to minimise python2 deps), the main obstacle is LLVM, which is a requirement to browse the web or use graphics cards. Google is a serial offender for this stuff with nearly all their software having multiple python2 dependencies, but I've been purging that whenever I can.
There's also GIMP, but they don't even keep up to date with their own toolkit any more…

Fedora and Python 2

Posted Apr 6, 2018 20:34 UTC (Fri) by comio (guest, #115526) [Link]

10 kudos for your Gentoo. I tried also to reduce the number of python2 packages.

Fedora and Python 2

Posted Apr 6, 2018 20:54 UTC (Fri) by rahulsundaram (subscriber, #21946) [Link]

> There's also GIMP, but they don't even keep up to date with their own toolkit any more…

GTK hasn't been the GIMP toolkit in many many years. It is ancient history

Fedora and Python 2

Posted Apr 5, 2018 20:19 UTC (Thu) by hsivonen (subscriber, #91034) [Link] (26 responses)

It would be good if the distros pooled effort to maintain Tauthon (https://github.com/naftaliharris/tauthon/blob/master/READ...) and shipped it as the package providing the /usr/bin/python program.

Fedora and Python 2

Posted Apr 5, 2018 23:09 UTC (Thu) by smurf (subscriber, #17840) [Link] (24 responses)

Cute name, but the declining rate of changes since September 2017 seems to indicate that some work would be required to revive it.

Frankly, I don't see the appeal. You need Python 3 features, you use Python 3.

Fedora and Python 2

Posted Apr 6, 2018 3:38 UTC (Fri) by jhoblitt (subscriber, #77733) [Link] (1 responses)

What does that provide that `future` doesn't?

Fedora and Python 2

Posted Apr 6, 2018 13:09 UTC (Fri) by smurf (subscriber, #17840) [Link]

A whole damn lot. async/await, "yield from", type annotations, … read the linked-to web page.

Fedora and Python 2

Posted Apr 11, 2018 21:19 UTC (Wed) by togga (guest, #53103) [Link] (21 responses)

"You need Python 3 features, you use Python 3."

What if you need some Python 3 features but without the Python3 encode/decode string-hell?

Except for GIL and threading, Python2 was quite a productive language. Tauthon would be a nice starting point for distros wanting to support and maintain legacy code.

My understanding is that most of the Py2 to Py3 conversions out there have been a waste of time. It is sad to see Numpy in the middle of this.

Fedora and Python 2

Posted Apr 12, 2018 14:26 UTC (Thu) by ceplm (subscriber, #41334) [Link] (19 responses)

> What if you need some Python 3 features but without the Python3 encode/decode string-hell?

There is no encode/decode hell, there are only programmers who should peel onions in submarine (https://wp.me/p83KNI-eH).

Fedora and Python 2

Posted Apr 12, 2018 19:41 UTC (Thu) by togga (guest, #53103) [Link]

I get it. When peeling onions in a submarine, all the encode/decode issues don't feel like hell anymore. Your referenced article made the same progress as the onions regarding the Py3 design issue.

Fedora and Python 2

Posted Apr 12, 2018 20:29 UTC (Thu) by peniblec (subscriber, #111147) [Link] (17 responses)

Correct me if I’m wrong, but Joel’s point in this article is that:

It does not make sense to have a string without knowing what encoding it uses. […] If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly.

To paraphrase, if you have to display any kind of text to a human user, you (or your programming environment) must explicitly know what encoding to use to translate the byte streams you carry around into intelligible characters.

Now AFAIU, when people complain about the “encode/decode string-hell” they are not really disputing this. From what I gather, these people deplore that by default, various parts of Python 3’s standard library expect their inputs to be Unicode characters, in contexts where there is no reason for them to be.

Personally, while I enjoy Python 3 overall, I agree that the decision to have streams default to meat-world characters rather than bytes is debatable. Not every program has to deal with human-readable strings.

Let’s say though that we all collectively agreed that Python having a bias toward human text is a good thing: let’s assume that dealing with byte-streams that do not map to Unicode characters is so rare that having to sprinkle a few bs and .buffers here and there is not a deal-breaker.

Even then, Python’s approach to human text feels somewhat naive: lengths, indexing, iteration and comparison are all based on code points, which AFAIU do not really represent anything meaningful in meat-space.

For example, Python 3 thinks that 'é' != 'é' because one is 'e'+'\N{COMBINING ACUTE ACCENT}' and the other is '\N{LATIN SMALL LETTER E WITH ACUTE}'. My French AZERTY keyboard makes typing the latter straightforward; I understand that GTK applications make it easy to type the former with “e Control-Shift-U 301”.

I can’t think of a program geared toward human interaction that should consider these two strings different. Python does offer unicodedata.normalize() to solve this specific problem; must we rely on every text-handling Python program out there to make its input go through this function? Arguably, shouldn’t the language abstract these minutiae away from us?
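To make the 'é' case concrete, here is a short sketch of the comparison and of the unicodedata.normalize() workaround. The same_text helper is hypothetical, and NFC is just one reasonable choice of normalization form.

```python
import unicodedata

composed = "\N{LATIN SMALL LETTER E WITH ACUTE}"
decomposed = "e\N{COMBINING ACUTE ACCENT}"

# Both render as the same glyph, but str works on code points.
naive_equal = (composed == decomposed)
lengths = (len(composed), len(decomposed))


def same_text(a, b, form="NFC"):
    """Compare two strings after bringing both to one normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


normalized_equal = same_text(composed, decomposed)
```

Every comparison, dictionary key, and length check in a program would need this kind of step, which is the burden being described here.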

tl;dr: While Joel’s article is a classic and a must-read, I’m not sure it addresses the problems raised by Python 3’s critics:

  • the language’s preference toward meat-space characters adds hoops to jump through when dealing with genuine byte-streams;

  • the language’s naive handling of these meat-space characters adds hoops to jump through when dealing with those too.

Fedora and Python 2

Posted Apr 12, 2018 22:45 UTC (Thu) by dvdeug (guest, #10998) [Link] (3 responses)

There's certainly an argument for normalization, but every person annoyed by Python 3 would likely be more pissed off if, by default, it silently changed text when reading it in. Imagine a text editor where you opened your new novel, "Nous étions à l’étude, quand le Proviseur entra, ..." and it changed it to "Nous étions à l’étude, quand le Proviseur entra, ..." (the same on screen, but with the accented characters encoded differently), and you fed it back to git to discover a diff that changed every single line in the file.

Fedora and Python 2

Posted Apr 13, 2018 6:05 UTC (Fri) by peniblec (subscriber, #111147) [Link] (2 responses)

I may not have thought enough about this, but couldn't this text
editor normalize tokens only for some operations (e.g.
character-count, searching) and otherwise preserve the file's content,
only effectively changing the parts the user actually edited?

Fedora and Python 2

Posted Apr 14, 2018 6:26 UTC (Sat) by dvdeug (guest, #10998) [Link] (1 responses)

If Python normalized text by default, the text editor would have a hard time doing that.

Fedora and Python 2

Posted Apr 14, 2018 11:48 UTC (Sat) by peniblec (subscriber, #111147) [Link]

OK. Let’s say Python’s string type uses normalization/grapheme clusters/nanomachines to correctly compare sequences of Unicode characters. Would that necessarily make a text editor overzealously normalize your whole file, thus polluting your patch?

I don’t know how actual text editors do it, but I imagine that their representation of your file’s content is more nuanced than simply “whatever open(filename) returned”. I would assume that they represent a “file” as sequences of opaque “word” or “line” objects, each of those objects having methods to

  • get their position in the file’s byte-stream (start and end offset, cached once decoded), so that the editor knows where to apply changes;

  • get their “canonical” Unicode representation, so that the editor can do whatever an editor is supposed to do with meat-space characters (comparison for search-and-replace, length computation for line-wrapping).

So with such a design, I don’t think “Python’s str canonicalizing behind your back” would necessarily lead to “OMG this commit is full of extraneous crap introduced by this dumb Python text editor”. Again, I might not have thought enough about this, maybe the above does nothing to solve the problem.
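A rough sketch of that design, under stated assumptions: the Token class and the tokenize_lines, find and serialize helpers are hypothetical names, tokens are whole lines, and the file is UTF-8. It is not how any real editor is known to work.

```python
import unicodedata
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Token:
    """Hypothetical opaque span of the file: raw bytes plus start offset."""
    start: int
    raw: bytes

    @property
    def end(self) -> int:
        return self.start + len(self.raw)

    def canonical(self) -> str:
        # Normalized view, used only for search, counting and the like.
        return unicodedata.normalize("NFC", self.raw.decode("utf-8"))

def tokenize_lines(data: bytes) -> List[Token]:
    tokens, offset = [], 0
    for line in data.splitlines(keepends=True):
        tokens.append(Token(offset, line))
        offset += len(line)
    return tokens

def find(tokens: List[Token], needle: str) -> List[Token]:
    wanted = unicodedata.normalize("NFC", needle)
    return [t for t in tokens if wanted in t.canonical()]

def serialize(tokens: List[Token]) -> bytes:
    # Untouched tokens are written back byte-for-byte.
    return b"".join(t.raw for t in tokens)

# First line uses the decomposed spelling, second the precomposed one.
data = "e\u0301tude\n\u00e9tude\n".encode("utf-8")
tokens = tokenize_lines(data)
hits = find(tokens, "\u00e9tude")
```

Here the search matches both lines regardless of spelling, while serialize(tokens) returns the original bytes unchanged, so a diff would only show tokens the user actually replaced.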

(Congratulations, you’ve nerd-sniped me into designing a text editor ;) )

Alternative workaround: teach our diffing tools to normalize text before computing differences :D

They do already let us skip whitespace changes, for example, which is a subclass of the more general category of “things computers care about despite being mostly irrelevant to meatbags”.

Fedora and Python 2

Posted Apr 12, 2018 23:07 UTC (Thu) by HelloWorld (guest, #56129) [Link]

It's interesting how many languages get this wrong. For instance, Java doesn't even give you code points, much less grapheme clusters. Instead, it gives you 16-bit “char” values (“code units” in Unicode-speak) and then you have some methods like codePointAt that give you the code point at some (char-based) position in the string. And when you want to iterate over the code points in a string, I don't know how you're supposed to get from one index to the next, i.e. whether you need to increase the index by one or by two. It might be that you need to compare against Character.MAX_LOW_SURROGATE, but I'm not sure… Needless to say it doesn't help you dealing with grapheme clusters at all, apparently you're supposed to use third-party libraries like icu4j. All in all, it's a clusterfuck (ba-dum tss!)
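To illustrate the code unit versus code point distinction, here is a sketch in Python rather than Java: it splits a string into UTF-16 code units and walks them, advancing by one or two depending on whether a surrogate pair is found. The function names are made up for the example. (In Java itself, Character.charCount() answers the "one or two" question for a given code point.)

```python
def utf16_code_units(s):
    """Return the 16-bit UTF-16 code units of s as a list of ints."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

def iter_code_points(units):
    """Walk UTF-16 code units, combining surrogate pairs into code points."""
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            yield 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            i += 2   # a surrogate pair occupies two code units
        else:
            yield unit
            i += 1   # a BMP character occupies one code unit

s = "a\U0001F600"                       # 'a' followed by a non-BMP character
units = utf16_code_units(s)
code_points = list(iter_code_points(units))
```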

Fedora and Python 2

Posted Apr 13, 2018 1:38 UTC (Fri) by smurf (subscriber, #17840) [Link] (7 responses)

> the language’s naive handling of these meat-space characters adds hoops to jump through when dealing with those too.

You need to normalize Unicode before doing meaningful things with it. That's a given in any programming language.

You might find fault with the people who invented Unicode. Blaming your (non-)choice of programming language isn't going to help, except that I can think of lots of ways to make it worse. Just look at Java.

Fedora and Python 2

Posted Apr 13, 2018 6:23 UTC (Fri) by peniblec (subscriber, #111147) [Link] (1 responses)

If normalization is so obviously needed before dealing with Unicode strings, wouldn’t it make sense for languages to take care of it by default?

For example, a language’s string-comparison function could automatically make normalized copies of its operands and compare these; users who actually want to compare codepoints could use something like list(s1.codepoints()) == list(s2.codepoints()).
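As a sketch of what such a comparison could look like, here is a hypothetical str subclass (HumanStr is an invented name, and NFC is an assumed choice of normalization form) that compares normalized copies and offers an explicit codepoints() escape hatch.

```python
import unicodedata

class HumanStr(str):
    """Hypothetical string type whose equality ignores canonical differences."""

    def _key(self):
        return unicodedata.normalize("NFC", str(self))

    def __eq__(self, other):
        if isinstance(other, str):
            return self._key() == unicodedata.normalize("NFC", str(other))
        return NotImplemented

    def __ne__(self, other):
        # str defines its own __ne__, so it has to be overridden as well.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Keep hashing consistent with the normalized equality.
        return hash(self._key())

    def codepoints(self):
        """Explicit access to the raw code points."""
        return [ord(c) for c in str(self)]

a = HumanStr("e\u0301")   # decomposed
b = HumanStr("\u00e9")    # precomposed
```

With this, a == b holds while a.codepoints() == b.codepoints() does not. Every comparison pays for two normalizations, which is the performance concern raised in the comment.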

(Not sure what iteration should produce by default, though. Grapheme clusters?)

Maybe performance would take such a hit that it makes sense to let the user ask for normalization explicitly.

Disclaimer: I don’t actually know any language which deals with Unicode strings this way; then again, I don’t actually know many languages.

Fedora and Python 2

Posted Apr 13, 2018 6:47 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Yes, that's exactly what Perl 6 did. It encodes the text into grapheme clusters, the stuff that people think about as characters. They can be directly indexed, used in splits and so on. As far as I know, that's the only mainstream(-ish) language that does this.
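For comparison, Python's standard library has no grapheme-cluster segmentation. The sketch below is only a crude approximation, not the full Unicode segmentation rules: it assumes a cluster is a base character followed by any combining marks, which misses cases such as emoji sequences and Hangul. The function name is made up.

```python
import unicodedata

def approx_graphemes(s):
    """Group each base character with the combining marks that follow it.

    Approximation only: real grapheme cluster boundaries are more involved.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # attach the mark to the previous cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

word = "e\u0301te\u0301"            # "été" spelled with combining accents
clusters = approx_graphemes(word)
print(len(word), len(clusters))
```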

I personally wouldn't have been so opposed to Py3 if it were to do the same. Unicode is a hard problem and full support of it might require compromises.

Fedora and Python 2

Posted Apr 13, 2018 12:44 UTC (Fri) by HelloWorld (guest, #56129) [Link] (4 responses)

Apparently canonicalisation isn't the solution either. I found this interesting comment elsewhere:
https://mortoray.com/2013/11/27/the-string-type-is-broken...

The essential bit: “Unfortunately, the standard normalisation forms are buggy, and under the current stability policy, cannot be fixed. One example of this that I know is U+387 GREEK ANO TELEIA, which wrongly decomposes canonically (!) into U+00B7 MIDDLE DOT (the Greek name even means literally “upper dot”). This means that some processes may choose to avoid normalisation, because, even the canonical forms risk losing important information.”

Fedora and Python 2

Posted Apr 13, 2018 16:39 UTC (Fri) by ceplm (subscriber, #41334) [Link]

The standard reply to every "foo is known to be buggy" is "And what's the bug number?" Also, I would ask the author of the comment why the bug cannot be fixed. It doesn't make sense to me.

Fedora and Python 2

Posted Apr 13, 2018 17:48 UTC (Fri) by sfeam (subscriber, #2841) [Link] (2 responses)

U+0387 doesn't "decompose" into anything. It's not a combining form. It is an example of a character in one alphabet whose common written form happens to look like a character from some other alphabet or set of conventional symbols. Because they look similar [in typical fonts] people tend to type whichever is more convenient. But neither one is the "canonical form" of the other. A more familiar pair would be Greek letter "mu" (U+03BC) and the scientific prefix "micro" (U+00B5). The existence of such pairs can be a problem, but it's a different problem than canonicalization. While it might make sense to be suspicious of micro signs appearing in what is otherwise a Greek alphabet URL, it would be a bad idea to replace all micro signs with "mu" (or vice versa) in a document that happened to include both Greek text and quantities in SI units.

Fedora and Python 2

Posted Apr 13, 2018 19:38 UTC (Fri) by jwilk (subscriber, #63328) [Link] (1 responses)

>>> import unicodedata
>>> unicodedata.normalize('NFD', u'\u0387') == u'\xB7'
True

Fedora and Python 2

Posted Apr 13, 2018 21:37 UTC (Fri) by sfeam (subscriber, #2841) [Link]

Well that's a bug then, isn't it.

Fedora and Python 2

Posted Apr 13, 2018 10:30 UTC (Fri) by ceplm (subscriber, #41334) [Link] (3 responses)

My point with Joel's article was that, in my experience, a large share of the people complaining about Python 3's encoding/decoding hell are those who still believe that "a string is a bunch of bytes" is enough, because they live in the bubble of languages where it is enough (i.e., English and Western European languages). I recently converted M2Crypto to straddle py2k and py3k, and I had no problems with Unicode encoding/decoding. What I did have problems with, and plenty of them, was that the messy str/unicode/bytes situation in py2k completely obscured what was really going on. Instead of blaming py3k, I keep blaming py2k and those programmers living in their cuckoo-land delusion that "a character is one byte".

And yes, I agree that the implementation in py3k is not perfect, and conversion from on-wire eight-bit-per-character data to a proper str is sometimes problematic, but a lot of work has already been spent on it, and the situation is not so bleak that I would call it any kind of hell.

Certainly, compared to the disaster that py2k was, py3k is a huge improvement.

Fedora and Python 2

Posted Apr 14, 2018 13:38 UTC (Sat) by peniblec (subscriber, #111147) [Link] (2 responses)

Fair enough. I’ve mostly only worked in Python 3 codebases, and the only place where I hear people debate the str-vs-bytes business is on LWN.

That restricts my sample of arguments against Python 3 to the high-level design issues I mentioned; I have not been “in the trenches” migrating sloppy code to Python 3. In my imagination the “characters are bytes” camp (and their code) had been dissolved during the noughties; I guess that was wishful thinking :)

Fedora and Python 2

Posted Apr 14, 2018 15:15 UTC (Sat) by excors (subscriber, #95769) [Link] (1 responses)

Perhaps the issue is that some people (particularly people on LWN) are more interested in systems programming, and their programs typically deal with protocols and file formats and APIs that are primarily byte-based and occasionally contain human-readable text, whereas other people are more interested in e.g. web programming where their data is primarily human-readable inputs and outputs and modern Unicode-based file formats (HTML, CSS, etc).

People in the first category might understand Unicode perfectly well, but they often need to deal with e.g. filenames (which aren't really Unicode on Linux or Windows), or with e.g. HTTP headers (where the encoding is unclearly specified and real data often violates the specification anyway), and they want a language that makes it easy and natural to process data like that. Python 3 makes it less easy and less natural than Python 2, since the language and the libraries tend to default to Unicode strings, so those people are unhappy. Meanwhile people in the second category prefer having everything be Unicode by default, since that's all they use anyway. Neither side is wrong or ignorant, they just have different use cases and different requirements, and Python failed to find a way to satisfy both groups.
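The filename case can be sketched with the surrogateescape error handler, which is the mechanism os.fsdecode() and os.fsencode() rely on for non-Windows filesystems. The byte string below is an invented example of a Latin-1 encoded name on a UTF-8 system.

```python
raw_name = b"caf\xe9.txt"   # Latin-1 bytes; not valid UTF-8

# Strict decoding rejects the name outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles the undecodable byte through as a lone surrogate.
as_text = raw_name.decode("utf-8", errors="surrogateescape")
round_trip = as_text.encode("utf-8", errors="surrogateescape")

# The resulting str cannot be encoded (e.g. printed) without the same handler.
try:
    as_text.encode("utf-8")
    printable = True
except UnicodeEncodeError:
    printable = False
```

The bytes survive the round trip, but the intermediate str is not real Unicode text, which is the kind of friction described above.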

Fedora and Python 2

Posted Apr 14, 2018 16:48 UTC (Sat) by SiB (subscriber, #4048) [Link]

Exactly!

In our department (physics) we use python for data analysis and for instrumentation control (including space flight). Python 3 is perfectly fine for the data analysis. Instrumentation control uses the python repl as commanding interface, where python 2 is still ahead.

Fedora and Python 2

Posted Apr 28, 2018 20:22 UTC (Sat) by RooTer (guest, #91640) [Link]

> What if you need some Python 3 features but without the Python3 encode/decode string-hell?

Having developed Python apps in both Python 2 and 3 for years, I would say the encode/decode hell exists in the Python 2 realm, not 3.
It seems as if stupid `UnicodeDecodeError`s plague almost every Python 2 project, and a switch to Python 3 would be a good idea just for the clear str/bytes distinction.

Fedora and Python 2

Posted Apr 6, 2018 11:33 UTC (Fri) by Otus (subscriber, #67685) [Link]

Or perhaps pypy would be a better alternative?


Copyright © 2018, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds