Fedora and Python 2
It has been known for quite some time that Python 2 will reach its end of life in 2020—after being extended by five years from its original 2015 expiry. After that, there will be no support, bug fixes, or security patches for Python 2, at least from the Python Software Foundation and the core developers. Some distributions will need to continue to support the final Python 2 release, however, since their support windows extend past that date; the enterprise and long-term support distributions will likely be supporting it well into the 2020s and possibly beyond. But even shorter-support-cycle distributions need to consider their plan for a sweeping change of this sort—in less than two years.
There was talk of having the actual end of life (EOL) occur at a party at PyCon 2020, but a mid-March query to the python-dev mailing list helped nail down the date once and for all. Currently, the only supported branch in the 2.x family is Python 2.7, which is up to 2.7.14 and is scheduled to have a 2.7.15 release sometime in 2018. It seems likely there will be at least one more release before EOL, which Python benevolent dictator for life (BDFL) Guido van Rossum proclaimed will be January 1, 2020:
Benjamin Peterson, who is the 2.7 release manager, agreed, though he cautioned that the final 2.7 release may not literally be made on new year's day 2020. Others took notice of the date, including Petr Viktorin and the other maintainers of the python2 package for Fedora. Viktorin posted a message to the Fedora devel mailing list on behalf of all of the nine python2 maintainers that noted the EOL date and their intent to "orphan" the python2 package:
The first Fedora release that would be affected by the EOL date is probably Fedora 30, which is likely to land in the first half of 2019—and be supported into 2020. But, Viktorin argued, it makes sense to get started now by removing python2 dependencies for packages that don't really need them:
There was some confusion about what was being suggested but, in general, the reaction was positive. A rude complaint that the problem was essentially impossible to solve was met with strong disagreement. As Richard W.M. Jones pointed out: "it's hard to argue with a plan which has been pre-announced *2 years* in advance. If only all Fedora changes were given such a generous runway." But Randy Barlow wondered if the proposed incremental approach was right:
That kind of cataclysmic approach might work for the Python code actually shipped by Fedora, but there is plenty of other code out there to consider. Python is, after all, a programming language, so there is an unknowable amount of Python 2 running on Fedora users' machines right now. A more cautious approach gives them time to notice and upgrade; as Gerald Henriksen put it:
It should be possible to continue supporting Python 2.7 into 2020 and beyond by piggybacking on the work that the enterprise distributions will be doing. It is also possible, though perhaps not all that likely, that few or no security flaws will be found in the language after it drops out of its support window. RHEL 7 and CentOS 7 ship Python 2.7; both of those distributions will receive updates until 2024. That should help with keeping Python 2 alive, Kevin Kofler said; borrowing patches from RHEL/CentOS is something he has been doing for Qt and kdelibs for some time. As Viktorin pointed out, the Fedora Python SIG is already maintaining some EOL Python versions; it will do the same for Python 2.7:
Part of the reason to start dropping Python 2 packages now is to figure out which packages can do it now and which ones will need additional help or coordination in the next few years.
Beyond just backward compatibility, though, Viktorin and company have another reason they are willing to maintain Python 2.7 past its EOL, which is mentioned in the original email: "support exceptionally important non–security critical applications, if their upstreams don't manage to port to Python 3 in time". However, if there are others who think they have a better approach to handling the EOL (or are willing to pick up the regular python2 package maintenance, rather than moving to a python27 "legacy" package as is planned), then the Python team wants to alert them to its plans. Viktorin expresses some skepticism that folks outside of the Python SIG will truly be in a position to take over, but doesn't want to foreclose that possibility.
This is not the first time that Fedora has discussed the switch. Back in August 2017, we looked at a discussion of where /usr/bin/python will point in a post-python2 world. Other distributions are grappling with the issue as well. A year ago, it was discussed on the debian-python mailing list (and again in August 2017), it is on the radar for openSUSE, and it recently came up for Ubuntu, as well. Each is working out how to highlight the problem areas for Python-2-only packages in their repositories and to make the switch to Python 3 smoothly. We will be seeing more of these kinds of discussions, across the Linux world (and beyond), as time ticks down to 2020.
The switch from Python 2 to 3 is a huge job; one might guess that it is orders of magnitude larger than anyone had anticipated back in the heady days of Python 3000 (around 2007, say). That is a testament to the popularity of the language and the various tools and frameworks it has spawned; it also likely serves as an abject warning for other projects that might ever consider a compatibility break of that nature. In the mid-to-late 2020s, with the transition presumably well behind them, the Python core developers (and community as a whole) will be due for a huge sigh of relief. But it will take work all over the free-software world, including by distributions like Fedora, in order to get there.
Posted Apr 5, 2018 6:27 UTC (Thu)
by cyperpunks (subscriber, #39406)
[Link] (74 responses)
Python 3 is basically a new language; porting is anything from trivial to very, very complex. The ordering problem might be the worst: can't port since dependency A is missing, while A needs dep B which needs C, which needs A.
There are 134 418 packages on PyPI; how many of those support Python 3?
How many billions of lines of Python 2 are out there?
When will the last Python 2 script execute? In 50-60 years?
Posted Apr 5, 2018 9:28 UTC (Thu)
by ecree (guest, #95790)
[Link] (44 responses)
The real problem is that while it's a _different_ language, it's not monotonically a _better_ language. In particular, if you're trying to write the kind of system software that shouldn't care what text encoding its input uses and shouldn't crash when fed badly-encoded text, working around Python 3's halpful Unicode handling leads to considerable friction; you basically have to either pretend that everything in the outside world is Latin-1, or use bytes objects _everywhere_ and accept that half the standard library won't work. Ah well, at least they eventually added %-formatting to bytes objects.
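(As an aside, the %-formatting on bytes mentioned here landed in Python 3.5 via PEP 461. A minimal sketch, with an assumed host value, of building a protocol line without decoding anything:)

host = b"example.com"                                    # assumed value, for illustration
request = b"GET / HTTP/1.1\r\nHost: %s\r\n\r\n" % host   # stays bytes end to end
print(request)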
Posted Apr 5, 2018 9:49 UTC (Thu)
by k8to (guest, #15413)
[Link] (43 responses)
When python 3 was more or less announced, I was sure they had the right idea, because the weird exceptions you'd get when a str hit a unicode object in python2 were just no good. But now that I've had time to experience the python 3 way, I think it's worse. I usually would rather deal with bytes. There are so many situations where the code I deal with doesn't know the encoding of the bytes it's receiving, and python3 doesn't give me a reasonable way to accept those bytes and use most of the tools I would use in python2.
One of those unintended outcomes sort of things, I think. It feels like python3 was a few years too early to pick the right strategy with unicode.
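(One common workaround, sketched here rather than taken from the comment: decoding as Latin-1 maps every byte to exactly one code point, so bytes of unknown encoding can be pushed through str-only APIs and recovered unchanged afterward.)

raw = b"field=\xfe\xff"                 # bytes of unknown encoding, not valid UTF-8
text = raw.decode("latin-1")            # always succeeds: one code point per byte
assert text.encode("latin-1") == raw    # and the mapping reverses losslessly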
Posted Apr 5, 2018 10:09 UTC (Thu)
by Sesse (subscriber, #53779)
[Link] (5 responses)
(Perl 6 is a different story altogether…)
Posted Apr 5, 2018 17:21 UTC (Thu)
by k8to (guest, #15413)
[Link] (4 responses)
Posted Apr 5, 2018 18:52 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (1 responses)
However, I have personally converted a largish console-based Perl5 codebase from running on Latin-1 (or Latin-8 to be exact, as soon as the € happened) to UTF-8. Let me tell you that this is an exercise in hunting down annoying hard-to-reproduce bugs that you wouldn't wish on your worst enemy. We had the whole gamut – from mojibake in the database through strings which crashed the interpreter when printed to tearing our hair out trying to write code that works correctly in both locales. The only thing that saved us is the fact that you can unscramble real-world mixed UTF-8/Latin8 content safely, thanks to the way UTF-8 is encoded.
Python3's way of strict separation between bytes and strings may be more annoying when you start off, esp. on Windows, but IMHO it's a whole lot easier to make sure that the end result is actually correct.
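(A minimal sketch of the unscrambling trick being described, using Latin-1 as a stand-in for the legacy encoding: byte sequences that happen to form valid UTF-8 almost never occur by accident, so a failed UTF-8 decode is a reliable signal to fall back.)

def unscramble(raw):
    # Try UTF-8 first; fall back to the legacy single-byte encoding on failure.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")

print(unscramble("café".encode("utf-8")))    # 'café'
print(unscramble("café".encode("latin-1")))  # 'café'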
Posted Apr 7, 2018 23:21 UTC (Sat)
by flussence (guest, #85566)
[Link]
Those are all things I encountered just in a single project. To be fair it was one where I ended up needing to write custom Encode::* modules, so maybe not representative… but it's still a lot of pain for the sake of not breaking code written pre-Y2K.
Posted Apr 6, 2018 18:35 UTC (Fri)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
Python's TLS implementation wants to check whether the TLS server has presented a certificate that's valid for the name of the TLS server you're trying to connect to. This makes sense, certainly as a default, if I said I wanted a TLS connection to foo.example.com, I definitely shouldn't need to roll my own certificate validator, or even explicitly say "Wait, this connection I have, is it _really_ to foo.example.com?" because duh, of course I want those things, let the handful of people who don't want checks ask *not* to check.
And all the people involved in designing this stuff at the IETF stage were conscious that this problem could be hard, and we don't want hard because this is a security system. So the certificates have DNS A-labels inside them. All you need to do is match the DNS name in the certificate (written with A-labels, ie ASCII) against the DNS name you looked up in your DNS system, which is also written with A-labels, ie ASCII. This is really boring and easy. Users don't necessarily understand A-labels, they might be gibberish, but the presentation layer is quite separate, securing that is a UX problem and not relevant to TLS or other low level components. All the tricky human stuff is pushed into the layer that was already dealing with humans, everything that's machine-to-machine needn't care about human cultural complexities like language.
Too simple for Python though, they decided to handle everything with U-labels so they can mark all the types "str". So now suddenly this low-level bit banging code that's supposed to securely move packets ends up with the entire i18n system baked into it, and inherits mysterious presentation layer related bugs. Problem with the matching? Oh sorry, you need to go fix this whole other Python sub-system that has nothing whatsoever to do with TLS ...
Eventually, literally in February this year, sanity finally prevailed and the latest Python 3 actually just does what the RFCs said it ought to do in the first place, massively simplifying the code _and_ making it more correct. Most users will never notice, because this was after all the Right Thing anyway.
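(For illustration of the U-label/A-label distinction, a sketch using Python's built-in idna codec — which implements the older IDNA 2003 rules, not the ssl module's internals; the third-party idna package implements IDNA 2008:)

ulabel = "bücher.example"          # U-label: the Unicode form users see
alabel = ulabel.encode("idna")     # A-label: pure ASCII, what DNS and certificates carry
print(alabel)                      # b'xn--bcher-kva.example'
print(alabel.decode("idna"))       # back to 'bücher.example' at the presentation layer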
Posted Apr 9, 2018 13:01 UTC (Mon)
by cortana (subscriber, #24596)
[Link]
Posted Apr 5, 2018 11:56 UTC (Thu)
by pabs (subscriber, #43278)
[Link] (32 responses)
Posted Apr 5, 2018 16:19 UTC (Thu)
by k8to (guest, #15413)
[Link] (31 responses)
Posted Apr 6, 2018 13:55 UTC (Fri)
by barryascott (subscriber, #80640)
[Link] (30 responses)
If you know that it's bytes that you care about, use the APIs that give you the bytes.
You can get the env vars as bytes, see file system names as bytes and read files in bytes.
I do not understand the criticism.
Barry
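(A sketch of the bytes-oriented APIs Barry is referring to, assuming a POSIX system; os.environb is not available on Windows, and the file path is only an example:)

import os, sys

path = os.environb.get(b"PATH", b"")        # environment variables as bytes
names = os.listdir(b".")                    # bytes in, bytes filenames out
with open(b"/etc/os-release", "rb") as f:   # file contents as bytes
    data = f.read()
sys.stdout.buffer.write(b"\n".join(names) + b"\n")   # raw bytes on stdout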
Posted Apr 8, 2018 12:59 UTC (Sun)
by togga (guest, #53103)
[Link] (29 responses)
Python3 is no longer a powerful glue environment (just look at the BCC encode/decode mess) and has no apparent strong side anymore, maybe syntax like a new BASIC.
After successfully using Python2 for 14 years I now don't recommend python for anything. There are more modern languages like Go with simple syntax, which can also be used interactively and, contrary to both python2 and python3, don't need a ton of workarounds to be efficient.
Posted Apr 8, 2018 13:43 UTC (Sun)
by smurf (subscriber, #17840)
[Link] (28 responses)
Posted Apr 8, 2018 19:16 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (26 responses)
>>> import ctypes
>>> t = type('iface', (ctypes.Structure,), {'_fields_': [(b'c_string_symbol', ctypes.CFUNCTYPE(ctypes.c_uint32))]})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '_fields_' must be a sequence of (name, C type) pairs
Posted Apr 8, 2018 19:59 UTC (Sun)
by smurf (subscriber, #17840)
[Link] (23 responses)
And why are you using them, instead of strings, in the first place?
Anyway, the right way to fix this is to report a bug. Bitch about it on LWN only when whoever is responsible for the code refuses to fix it.
Posted Apr 8, 2018 20:06 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (21 responses)
I prefer code that works 100% of the time, barring unrelated hardware/system software issues.
Posted Apr 8, 2018 21:12 UTC (Sun)
by smurf (subscriber, #17840)
[Link] (20 responses)
To that effect I will now take that example code and do what somebody else should have done long ago, i.e. file a bug.
Posted Apr 8, 2018 21:27 UTC (Sun)
by mjblenner (subscriber, #53463)
[Link] (10 responses)
>>> t = type('iface', (ctypes.Structure,), {'_fields_': [(b'c_string_symbol', ctypes.CFUNCTYPE(ctypes.c_uint32))]})
isn't really a bug. The 'c_string_symbol' is the python-side handle to the C structure field. In python3 it needs to be unicode (i.e like the python source file), since you do something like
>>> s = t(some_dll[b'c_string_symbol']) # bytes used here to get the C function symbol
>>> s.c_string_symbol()
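(A runnable sketch of the split being described, against libc on a typical glibc Linux system rather than the code under discussion: the symbol lookup accepts bytes, while the Python-side field name has to be str.)

import ctypes, ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# The C symbol can be looked up with bytes (or str)...
strlen = libc[b"strlen"]
strlen.restype = ctypes.c_size_t
strlen.argtypes = [ctypes.c_char_p]
print(strlen(b"hello"))                  # 5

# ...but the Python-side handle (a structure field name, i.e. an attribute)
# must be str, like any other Python 3 identifier:
try:
    type("iface", (ctypes.Structure,),
         {"_fields_": [(b"fn", ctypes.CFUNCTYPE(ctypes.c_uint32))]})
except TypeError as err:
    print(err)   # '_fields_' must be a sequence of (name, C type) pairs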
Posted Apr 8, 2018 22:41 UTC (Sun)
by smurf (subscriber, #17840)
[Link] (9 responses)
On the other hand: Python is written in UTF-8 (duh) and Python's way to access symbols is by using attributes (also duh). Requiring code to cater to corner cases that don't actually occur in the real world is a surefire recipe for code bloat but doesn't help anybody.
Posted Apr 8, 2018 23:00 UTC (Sun)
by mjblenner (subscriber, #53463)
[Link] (8 responses)
Uh, no. The bit that gets the symbols from the dll is using bytes. This bit:
some_dll[b'c_function_name']
You can't refer to it in python by the same random bytes though (why would that matter?).
Posted Apr 11, 2018 22:06 UTC (Wed)
by togga (guest, #53103)
[Link] (7 responses)
Because scripting is mostly about automation. It's just broken in this case to convert these symbols to another representation. This is just one example of many in this theme, since many of the libraries and third party extensions need string representations.
In a glue language scenario (which has been a strong side of Python), if I want to grab symbol (or blob) X from system A, handle it and pass it to system B, I do not want to have Y=f(X) as intermediate representation and then do the inverse function before passing it to B.
Simple things such as reading from a UART or a socket have become a minefield as soon as you want to use something in Python dealing with strings. Especially annoying when developing and things might not be that clean, nice and tidy.
This whole problem-domain is something Python3 has created. For me, Python itself lacks both in performance and multi-threading and doesn't really have anything language-wise (apart from lots of third party libraries) to compensate for this loss of productivity.
Posted Apr 11, 2018 23:52 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (1 responses)
Well, all I can say is that my experience (both with Perl5 and Python2) is rather different. Our fight with Perl5, incrementally switching our corporate code base to UTF8 compatibility, was … ugly.
Thus I'm very happy about the fact that Python3 spews a large unfriendly stack dump to your terminal when I forget to specify how an external byte stream is encoded. While it's somewhat annoying when you JUST KNOW that all your data is UTF8, or latin1, or randombytes … things change, and when it "suddenly" isn't, you get mojibake. Or worse. No thanks.
Posted Apr 12, 2018 16:29 UTC (Thu)
by togga (guest, #53103)
[Link]
The latter sounds to me like utopia and an endless job for achieving nothing but explains lots of the attitudes of the Python community.
Posted Apr 12, 2018 1:12 UTC (Thu)
by mjblenner (subscriber, #53463)
[Link] (1 responses)
OK. I kind of get where you're coming from. Although I'm a bit confused. Or you're a bit confused.
ctypes is an ABI interface, so having the structure field be a different name to the function symbol is of no relevance for using python to glue various other C functions together (even when passing that structure around).
i.e. here:
> type('iface', (ctypes.Structure,), {'_fields_': [(b'c_string_symbol', ...
Anyway, the easy answer is to just use python strings there. ctypes function symbol lookup converts strings to utf-8, so 99%+ of the time, this will work.
Otherwise...
Decode the symbol name to a string with errors='surrogateescape' for use in python, and use the same error handler to decode back to the original bytes for getting the symbol out of the library.
Or you could add a layer of indirection between the structure field names and the function symbols.
Posted Apr 12, 2018 16:36 UTC (Thu)
by togga (guest, #53103)
[Link]
I'm not confused; I've just experienced lots of issues I didn't have before Python3's software castle in the air.
> "the easy answer is to just use python strings"
Isn't this kind of bloated? These strings can come from anywhere and might not even be visible in Python code at all.
> "Decode the symbol name to a string with errors='surrogateescape' for use in python, and use the same error handler to decode back to the original bytes for getting the symbol out of the library."
You mean use Python3 and stick with tons of workarounds and issues just for its sake? Change the whole world to Python3? I value my time much more than that.
Posted Apr 12, 2018 23:47 UTC (Thu)
by dvdeug (guest, #10998)
[Link] (2 responses)
If you're just passing something from system A to system B, you shouldn't have to change the data. But there's a fairly thin region where you can choose to not unmangle something and still expect to be able to do anything with it. Stuff not being clean, nice and tidy is all the more reason to make sure you know exactly how the data you're handling is formatted.
Posted Apr 15, 2018 15:13 UTC (Sun)
by togga (guest, #53103)
[Link] (1 responses)
1. Doesn't scale. Changing representation requires one additional pass over the data. Python is already slow to begin with.
2. Accessing human readable symbols is convenient when needed by scripts, tests or debug.
Posted Apr 15, 2018 20:53 UTC (Sun)
by dvdeug (guest, #10998)
[Link]
We're not talking about human readable symbols; we're talking about "non-ASCII symbols in your program" that aren't Unicode. Even if the editor mangles it for you, how is bsymFFEAA9 worse than \xff\xea\xa9? Something slightly smarter than base64 would preserve human readable names and only mangle unreadable names, but the only case where not worrying about mangling is going to cause problems in Python 3 is when it's not human readable.
Posted Apr 11, 2018 22:31 UTC (Wed)
by togga (guest, #53103)
[Link] (8 responses)
$ python2 -c "import json; print(json.dumps(b'xx'))"
Posted Apr 11, 2018 23:39 UTC (Wed)
by smurf (subscriber, #17840)
[Link]
Your second example works when you set a locale in which \xFF is a valid character.
$ echo -n -e "\xFF" | env LC_ALL=iso-8859-1 python3 -c "import sys; print(repr(sys.stdin.read()))"
Posted Apr 11, 2018 23:58 UTC (Wed)
by mjblenner (subscriber, #53463)
[Link] (6 responses)
Features.
> python2 -c "import json; print(json.dumps(b'xx'))"
JSON is UTF-{8|16|32}. What, exactly, do you want python to do with random bytes?
> echo -n -e "\xFF" | python2 -c "import sys; print(repr(sys.stdin.read()))"
Use sys.stdin.buffer to get bytes rather than UTF-8.
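(For concreteness, a sketch of what the Python 3 equivalents of the two one-liners above might look like under these suggestions — decode explicitly before json.dumps, and read raw bytes from sys.stdin.buffer:)

$ python3 -c "import json; print(json.dumps(b'xx'.decode('ascii')))"
"xx"
$ echo -n -e "\xFF" | python3 -c "import sys; print(repr(sys.stdin.buffer.read()))"
b'\xff'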
Posted Apr 12, 2018 8:15 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
And remember to never, ever mix them in the same program. Oh, and it'll mostly work if you forget .buffer in one place. It'll just crash with bad data sometimes.
Posted Apr 12, 2018 10:24 UTC (Thu)
by mjblenner (subscriber, #53463)
[Link] (3 responses)
Never ever mix them because if you do it will mostly work? (OK...)
Anyway, sounds like you need
PYTHONIOENCODING="utf-8:surrogateescape"
or use open(0, 'rb') or something, depending on what you're trying to do.
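(A sketch of what those two suggestions look like, in the style of the one-liners above; the '\udcff' is the surrogate-escaped form of the undecodable byte:)

$ echo -n -e "\xFF" | PYTHONIOENCODING=utf-8:surrogateescape python3 -c "import sys; print(repr(sys.stdin.read()))"
'\udcff'
$ echo -n -e "\xFF" | python3 -c "import sys; print(repr(open(0, 'rb').read()))"
b'\xff'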
Posted Apr 12, 2018 17:05 UTC (Thu)
by togga (guest, #53103)
[Link] (2 responses)
Scripts should then always start by setting this parameter, or is it too late? Are we talking shell wrappers here, or refusing to start if set incorrectly? If we do multiple things with multiple needs for encoding, do we need different settings for different incoming data, in other words set it with each read?
Posted Apr 12, 2018 17:52 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (1 responses)
Setting the encoding to whatever is actually used is simple enough – besides, that stuff happens to work correctly when your data and your locale match. Surprise: they usually do. And if you want to process binary data, then use "sys.stdin/out.buffer" (or binary mode). This is documented.
On the other hand, allowing a random mix of differently-encoded strings (which is what Python2 or Perl do) and then trying to disentangle the resulting mojibake (or even figure out what causes it) is a frustrating and sometimes futile exercise in preventing data loss after it's too late. Been there, done that, bitten the carpet.
"Explicit is better than implicit" is one of Python's mottos. I happen to think that it's helpful. If you don't, well, there are other languages.
Posted Apr 12, 2018 21:59 UTC (Thu)
by togga (guest, #53103)
[Link]
> "Explicit is better than implicit" is one of Python's mottos. I happen to think that it's helpful. If you don't, well, there are other languages.
Posted Apr 12, 2018 16:54 UTC (Thu)
by togga (guest, #53103)
[Link]
Python2 just keeps them as is, works wonderful.
"Use sys.stdin.buffer to get bytes rather than UTF-8."
Thanks. It worked. I made a compatible version. Awesome.
$ echo -n -e "\xFF" | python3 -c "import sys; S=type('S', (bytes,), {'__repr__': lambda s: bytes.__repr__(s)[1:]}); read_stdin=sys.stdin.buffer.read; sys.stdin.read = lambda: S(read_stdin()); print(repr(sys.stdin.read()))"
Posted Apr 11, 2018 21:41 UTC (Wed)
by togga (guest, #53103)
[Link]
Is it really? Python3 has clearly chosen a design-path incompatible with my use-cases and this is on a fundamental level not fixed by a "bug report".
For me, discussing on LWN isn't bitching. It's one very good platform to discuss and transfer information, ideas and knowledge. Now that my Py2 floor is starting to crack, I find LWN one good source to know what to do from here. Also, since I'm invested in Py2 I'm motivated to put some work into alternatives, and here is one place to find people in similar situations that could help.
I'd advise joining in with a positive attitude and constructive ideas; I think we need less bitch-talk.
Posted Apr 12, 2018 4:33 UTC (Thu)
by njs (subscriber, #40338)
[Link] (1 responses)
Ctypes does support using bytestrings for symbols:
In [4]: libc = ctypes.CDLL("libc.so.6")
In [5]: libc[b"sprintf"]
So I think this criticism is simply mistaken.
Posted Apr 12, 2018 5:44 UTC (Thu)
by njs (subscriber, #40338)
[Link]
Posted Apr 11, 2018 21:29 UTC (Wed)
by togga (guest, #53103)
[Link]
"And why are you using them, instead of strings, in the first place?"
Posted Apr 5, 2018 12:53 UTC (Thu)
by mordae (guest, #54701)
[Link] (3 responses)
I don't really get this. Python 3 made the encoding handling explicit and much more predictable. There is an implicit conversion of bytes into str assuming ASCII, which I think is sensible nowadays (with EBCDIC gone). And apart from that, it's impossible to convert str into bytes without being explicit about the encoding.
> There are so many situations where the code I deal with doesn't know the encoding of the bytes it's receiving...
Can you name one?
Posted Apr 5, 2018 16:11 UTC (Thu)
by lsl (subscriber, #86508)
[Link] (1 responses)
Reading from standard input. Reading a file system directory's contents. Opening a file with user-specified name. The values of environment variables. All kinds of networking protocols that only reserve a small subset of ASCII (e.g. '\n') for control purposes with the rest being opaque bytes.
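(Python 3's stock answer for most of those cases is the surrogateescape error handler, used by os.fsdecode()/os.fsencode() and by the filename and environment APIs. A small sketch, with an assumed filename:)

import os

weird = b"report-\xff.txt"           # a filename that isn't valid UTF-8
as_str = os.fsdecode(weird)          # e.g. 'report-\udcff.txt' under a UTF-8 locale
assert os.fsencode(as_str) == weird  # the original bytes round-trip exactly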
Posted Apr 5, 2018 16:23 UTC (Thu)
by k8to (guest, #15413)
[Link]
It's very common on that platform that the encoding of command output is special to the command. Some things follow the system selected locale, some don't. Some even mix fixed strings in ascii with filename based output in utf-16 and crap like that.
Python3 would be okay for this if bytes were as full-featured as Python2 strs used to be, but they're really not. A lot of the standard library really insists on strings.
Posted Apr 6, 2018 11:15 UTC (Fri)
by jwilk (subscriber, #63328)
[Link]
Python 2 has implicit bytes→unicode conversion. Python 3 doesn't have it.
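(The difference as a pair of illustrative one-liners; the exact TypeError wording varies between Python 3 versions:)

$ python2 -c "print('a' + u'b')"     # the byte string is silently decoded as ASCII
ab
$ python3 -c "print(b'a' + 'b')"     # no implicit conversion in Python 3
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: can't concat str to bytes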
Posted Apr 5, 2018 11:03 UTC (Thu)
by dunlapg (guest, #57764)
[Link] (4 responses)
People are still maintaining awk, sed, etc. From an ecosystem point of view, it seems like it would be a good idea to have python2 become something similar -- a 'utility' language which is just expected to exist on all Linux systems, which gets fixes but very little active development.
It's just a little matter of finding people to do that...
Posted Apr 5, 2018 12:32 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link] (3 responses)
Posted Apr 5, 2018 14:09 UTC (Thu)
by dunlapg (guest, #57764)
[Link] (2 responses)
Yes, sorry if it wasn't clear: I didn't in any way mean to imply that python 2 was as small as awk or as easy to maintain: rather, just that it was as important to the ecosystem, even if it never grew or developed further than it already has.
Also, just to be clear, I'm not trying to say that any particular individual or group has an obligation to continue supporting python 2; I am saying that the world would be a better place if "someone" stepped up to do it; and if I were RedHat I'd be planning on having a fully python-2-compatible system maintained forward indefinitely as one of my selling points.
Posted Apr 5, 2018 16:25 UTC (Thu)
by k8to (guest, #15413)
[Link] (1 responses)
Posted Apr 5, 2018 17:17 UTC (Thu)
by k8to (guest, #15413)
[Link]
Posted Apr 5, 2018 12:40 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link]
Posted Apr 5, 2018 15:41 UTC (Thu)
by southey (guest, #9466)
[Link] (21 responses)
Arguments based on dependencies are no longer valid. It will be 10 years in December since Python 3 was released and many of the frequently used packages have already been ported to Python 3. So if a dependency has not been ported by now, then users should have no expectation that it will be ported in the future, nor should they expect that the package will continue to be supported. So unless users can get it ported, they need to start replacing those packages with something else that is supported in Python 3.
Don't forget that users can still install Python 2 from source in their distribution of choice and users can create their own Python 2 distribution as well. But if users want to have the benefits of a distribution then users need to support their distribution and get those Python 2 packages ported over. This would also be true for PyPI but PyPI is not a Linux distribution and PyPI needs to become more proactive as well.
Posted Apr 6, 2018 12:50 UTC (Fri)
by bandrami (guest, #94229)
[Link] (5 responses)
Though they will get sued if they have the temerity to call that Python 2 distribution "Python"
Posted Apr 12, 2018 4:50 UTC (Thu)
by njs (subscriber, #40338)
[Link] (4 responses)
Posted Apr 17, 2018 13:27 UTC (Tue)
by bandrami (guest, #94229)
[Link] (3 responses)
Can you think of any other situation in free software where a team that has abandoned a codebase has not just discouraged someone willing from taking over maintenance, but in fact used legal pressure to prevent it?
Posted Apr 17, 2018 14:00 UTC (Tue)
by rahulsundaram (subscriber, #21946)
[Link]
That's not what happened here. Maintenance of the codebase is perfectly fine. The name however is not free to use for forks. That is a situation that is quite common in Free software projects.
Posted Apr 17, 2018 20:22 UTC (Tue)
by smurf (subscriber, #17840)
[Link]
Posted Apr 28, 2018 20:07 UTC (Sat)
by flussence (guest, #85566)
[Link]
A few years back the libav gang attempted to sue the legitimate FFmpeg project out of existence over its use of the logo. They failed because they didn't actually own any rights to the image to begin with; a third party contributed it. Hasn't stopped them using it, mind you.
This Python naming dispute isn't an act of malice - it's a simple trademark defence. Mozilla does exactly the same thing, which is why we have IceCat, IceWeasel, PaleMoon, Seamonkey etc.
Posted Apr 8, 2018 13:52 UTC (Sun)
by togga (guest, #53103)
[Link] (14 responses)
Considering that the Python "community" (or BDFL) for obvious reasons has failed to switch it's users from 2 to 3 during a period of 10 years and then finally resided to threats and end-of-support for this, I'd wonder if this is a sound stance to take. Buying this tactics has had negative impact on the open source community. Py2/Py3 range of problems is now at the same (or surpassing) the annoyance levels of multi-platform path-separation, argument-quoting- and eol-issues. Fedora should answer the following question for their users:
Why should users accept regressions to migrate to Python3?
I think Fedora should stay classy and recommend migration away from a BDFL-controlled language gone awry. Ideally a fork would have happened long before this point.
Posted Apr 8, 2018 18:55 UTC (Sun)
by smurf (subscriber, #17840)
[Link] (13 responses)
Yeah. Hindsight is always 20/20. Where were you 10 years ago? did you know better at the time? I certainly didn't.
Python is not any more BDFL-"controlled" than any other language.
Posted Apr 8, 2018 19:28 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (11 responses)
The correct way is to acknowledge them and fix them. PHP actually did this with ill-fated PHP6, which also had started with "convert-the-world-to-magic-Unicode" approach. But after several years they realized that it's not going to work, so the fork was abandoned and most of the interesting stuff had been backported into PHP5. And the next migration attempt (PHP7) was designed to have only fairly minor breaking changes.
Yet Python persists in the belief that "they know better" and "just because". That "Unicode is like eating veggies" ( https://nothingbutsnark.svbtle.com/porting-to-python-3-is... ).
Well, as a result a lot of large codebases are now migrating away from Python entirely. Google now has a transparent Golang interoperability layer built specifically for Youtube and Dropbox is just moving to Golang piecewise.
Posted Apr 8, 2018 21:09 UTC (Sun)
by smurf (subscriber, #17840)
[Link] (10 responses)
Python2's indiscriminate support for mixing text and binary data caused a ton of hard to track bugs. "the day Django stops supporting Python 2 they will be able to rip out a ton of code that exists purely because it was so easy to mix text and binary data and get it wrong" (cited from the page you linked to). Yes it's annoying to suddenly be required to think about whether your data is binary or text, and if the latter what encoding, but frankly you should have done that in the first place – the real problem is Python2's inability to support that distinction.
And yes there were some mistakes in early Python 3.x versions, but "acknowledge and fix mistakes" happened, e.g. by ditching 2to3 in favor of six and by extending support for Py2 – not by throwing in the towel because the whole thing was doomed to fail from the beginning. It obviously isn't; Python3 uptake may be slower than expected initially, but it's still way faster than that of Perl6. :-P
If Go is a better fit than Python2/3 for whatever it is you want to do, well, go for it. I'm not going to infer any perceived inferiority of Python2 vs. Python3 from that. It's probably more like "if we need to spend some effort to switch over anyway, let's examine what else might work even better".
Posted Apr 9, 2018 2:17 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (6 responses)
> "the day Django stops supporting Python 2 they will be able to rip out a ton of code that exists purely because it was so easy to mix text and binary data and get it wrong" (cited from the page you linked to).
> If Go is a better fit than Python2/3 for whatever it is you want to do, well, go for it. I'm not going to infer any perceived inferiority of Python2 vs. Python3 from that. It's probably more like "if we need to spend some effort to switch over anyway, let's examine what else might work even better".
Posted Apr 9, 2018 2:58 UTC (Mon)
by roc (subscriber, #30627)
[Link] (5 responses)
> What is wrong with UTF-16?
Outside UTF-16-based platforms (Windows, Java, JS), UTF-16 is strictly worse than UTF-8:
* Twice the space usage for ASCII (almost never more compact than UTF-8 on any text in practice)
* Multi-code-point characters are rarer, therefore less well tested
* Byte-order-dependent
* Needs special logic to sort lexicographically
More details: http://utf8everywhere.org/
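(Two of those points are easy to see from Python itself; a small illustration:)

s = "hello, wörld"
print(len(s.encode("utf-8")))    # 13 bytes
print(len(s.encode("utf-16")))   # 26 bytes: a 2-byte BOM plus 12 two-byte code units
print(s.encode("utf-16")[:2])    # b'\xff\xfe' on a little-endian machine: byte-order dependence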
Posted Apr 9, 2018 7:27 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
Posted Apr 9, 2018 10:09 UTC (Mon)
by smurf (subscriber, #17840)
[Link]
Posted Apr 9, 2018 23:49 UTC (Mon)
by roc (subscriber, #30627)
[Link] (2 responses)
People use UTF16 a lot more than they should because in the 90s everyone was taught that 2-byte code points was *the* way to support Unicode, and Windows, Java, JS and others designed their APIs around that assumption. That kinda made sense if you believed, as many people did, that the 2-byte encoding with no multi-code-point characters (UCS2) would suffice for every Unicode character forever. That teaching was pervasive enough that many people came to *define* Unicode support as 2-byte code points, and some still do. So I'm not surprised that Python people got it wrong.
It's all a multibillion-dollar mistake :-(. Platform and application vendors did a ton of work to shift to UCS2 and then (implicitly) UTF16, to end up with a worse encoding than UTF8, a lot more work than if they'd just reinterpreted their byte strings as UTF8. Linux got this right, I think largely by just delaying Unicode support until it was clear UTF16 was a loser.
People at the Unicode consortium who set everyone down the UTF16 path really should fess up and apologize. Their mistake caused vast resources to be wasted and has actually caused Unicode support to be worse than it otherwise would have been.
Posted Apr 12, 2018 23:18 UTC (Thu)
by dvdeug (guest, #10998)
[Link]
The other argument is that nobody could have sold a 32 bit encoding in the early 1990s. In 1996, they declared that it was going to have to expand from 16 bit to 32 bit (or 20.1 bit). In 2001, Deseret was one of the first scripts encoded above FFFF because they needed to start encoding stuff up there, but they didn't want to start with scripts people were going to fight to keep in the BMP. And yet it wasn't until 2010 that MySQL, even in UTF-8, supported characters above FFFF. Unless they've made some changes since I checked last time, it still bites people that MySQL charset utf8 is for FFFF and below only, and utf8mb4 is needed to actually encode UTF-8.
With a bunch of foresight on everyone's part, it might have been better. But pushing a 32 bit encoding in 1990 could also have mired the idea and left us working with an ISO 2022-style pile of encodings or at least stalled things by a decade where more legacy data in legacy encodings, and even more legacy encodings, were created, and more protocols were designed around the idea that everything has its own encoding instead of everything being in a fixed encoding, or at least a Unicode-compatible encoding.
Posted Apr 15, 2018 1:14 UTC (Sun)
by DHR (guest, #81356)
[Link]
The key was it require "Han unification". Japanese, traditional Chinese, Korean, and simplified Chinese symbols would have to share code points. This was never going to be acceptable to people using those languages.
The analogy I heard was: would the Greek and Roman alphabets share code points? Alpha and A are really the same, are they not? How about Aleph? No way!
The fact is that UNICODE-1 was doomed before birth.
UTF-16 was always a bad idea. Some tried to ignore that and we live with that mistake.
<https://en.wikipedia.org/wiki/Han_unification>
UTF-8 was designed by the Plan9 folks. Quite early. On a napkin. Some of them later brought us Go.
Posted Apr 9, 2018 10:56 UTC (Mon)
by niner (subscriber, #26151)
[Link] (2 responses)
2 years after Python 3's first stable release, I couldn't even install it from a distro repo. And 10 years after said release, I still can't use it in combination with our existing Python 2 code.
I wonder what you use as base for your update comparison. Because at the 2 year mark, I'd call it a hands down victory for Perl 6 and we don't know what the 10 year mark's gonna be like. And anyway, I didn't even know there was a contest.
Posted Apr 10, 2018 8:16 UTC (Tue)
by OttoErickson (guest, #122996)
[Link] (1 responses)
Not being a Perl Monger I'm genuinely surprised to hear that Perl 6 is being used in production. I was under the impression that Perl 6 had largely been abandoned; it would seem that my impression was incorrect.
In all honesty, this is not a criticism or flame of Perl; I've never really worked with Perl—any version—so I haven't kept up with it. I'd love to see an article on the topic of Perl 6 here on LWN.
Posted Apr 12, 2018 8:11 UTC (Thu)
by niner (subscriber, #26151)
[Link]
Posted Apr 11, 2018 21:05 UTC (Wed)
by togga (guest, #53103)
[Link]
Since I was a heavy Numpy/Scipy/etc user 10 years ago, I just followed Python3 development occasionally and did not really make any effort to use it. Using it for prototyping algorithms, simulations, debug and test-scripts, I never really encountered any problem Python3 tried to solve. Since I use Python for debug and interface a lot with C/API:s, Python3 issues really pile up when trying to use it. Personally I'm on an exit strategy now, but unfortunately there is a lot of Python code out there, especially since it's an "easy syntax", popular at many workplaces. In the last couple of years I've been struck by py2/py3 issues on an almost day-to-day basis. I find most of them unnecessary and annoying; I haven't seen anything like this in any other language I use.
Posted Apr 10, 2018 17:43 UTC (Tue)
by sionescu (subscriber, #59410)
[Link]
If you want an unchanging dynamic-like language try Common Lisp :)
Posted Apr 5, 2018 15:13 UTC (Thu)
by arjan (subscriber, #36785)
[Link] (6 responses)
Posted Apr 6, 2018 10:16 UTC (Fri)
by misc (subscriber, #73730)
[Link] (2 responses)
And to make sure things work on python 3, you need extensive coverage from the test suite, and we are not there yet for Ansible, due to multiple factors, such as "mocking cloud services API is tedious" and "people submit more modules and new stuff than tests and bugfixes". There is a balance between ease of contribution and having a perfect PR.
All the code has passed compilation checks for python 3 for a long time, but this being Python, that is not enough to verify that it doesn't crash in weird ways. And even with complete test coverage, you can uncover edge cases (for example, something returning changed when nothing was changed; testing that would kinda require doubling the resources for testing, and likely the time too).
And we had a catch-22 situation. People didn't use it on python 3 because it wasn't supported officially, but we couldn't support it because we were not sure it would work well enough, and there were enough workarounds by installing python 2. Hopefully, the porting work was started and after a while it became just "let's try with python 3 and submit patches until that's good enough". For example, that's what I did: took the git repo, tried to deploy openshift (because that was the biggest codebase I could find at that time) until it worked. And now that works well enough, but I focused on my use case; I didn't look at all cloud modules, as it seemed that we were also waiting on upstream to release python 3 versions (this and the fact that I didn't use them).
Ansible is also really easy to test on python 3, just change one line in inventory. It shouldn't eat your babies or anything, even if nobody can promise that (but we also don't promise that for python 2 execution, and yet people use it, so...).
Posted Apr 6, 2018 12:44 UTC (Fri)
by arjan (subscriber, #36785)
[Link]
I understand the "would like to keep py2 compat" angle since unfortunately not all current enterprise distros ship with py3 as a good option...
Posted Apr 9, 2018 10:00 UTC (Mon)
by OttoErickson (guest, #122996)
[Link]
Job security.
Posted Apr 6, 2018 20:04 UTC (Fri)
by flussence (guest, #85566)
[Link] (2 responses)
There's also GIMP, but they don't even keep up to date with their own toolkit any more…
Posted Apr 6, 2018 20:34 UTC (Fri)
by comio (guest, #115526)
[Link]
Posted Apr 6, 2018 20:54 UTC (Fri)
by rahulsundaram (subscriber, #21946)
[Link]
GTK hasn't been the GIMP toolkit in many many years. It is ancient history
Posted Apr 5, 2018 20:19 UTC (Thu)
by hsivonen (subscriber, #91034)
[Link] (26 responses)
Posted Apr 5, 2018 23:09 UTC (Thu)
by smurf (subscriber, #17840)
[Link] (24 responses)
Frankly, I don't see the appeal. You need Python 3 features, you use Python 3.
Posted Apr 6, 2018 3:38 UTC (Fri)
by jhoblitt (subscriber, #77733)
[Link] (1 responses)
Posted Apr 6, 2018 13:09 UTC (Fri)
by smurf (subscriber, #17840)
[Link]
Posted Apr 11, 2018 21:19 UTC (Wed)
by togga (guest, #53103)
[Link] (21 responses)
What if you need some Python 3 features but without the Python3 encode/decode string-hell?
Except for GIL and threading, Python2 was quite a productive language. Tauthon would be a nice starting point for distros wanting to support and maintain legacy code.
My understanding is that most Py2-to-Py3 conversions out there have been a waste of time. It is sad to see Numpy in the middle of this.
Posted Apr 12, 2018 14:26 UTC (Thu)
by ceplm (subscriber, #41334)
[Link] (19 responses)
There is no encode/decode hell, there are only programmers who should peel onions in submarine (https://wp.me/p83KNI-eH).
Posted Apr 12, 2018 19:41 UTC (Thu)
by togga (guest, #53103)
[Link]
Posted Apr 12, 2018 20:29 UTC (Thu)
by peniblec (subscriber, #111147)
[Link] (17 responses)
Correct me if I’m wrong, but Joel’s point in this article is that: "It does not make sense to have a string without knowing what encoding it uses. […] If you have a string, in memory, in a file, or in an email message, you have to know what encoding it is in or you cannot interpret it or display it to users correctly."
To paraphrase, if you have to display any kind of text to a human user, you (or your programming environment) must explicitly know what encoding to use to translate the byte streams you carry around into intelligible characters.
Now AFAIU, when people complain about the “encode/decode string-hell” they are not really disputing this. From what I gather, these people deplore that by default, various parts of Python 3’s standard library expect their inputs to be Unicode characters, in contexts where there is no reason for them to be. Personally, while I enjoy Python 3 overall, I agree that the decision to have streams default to meat-world characters rather than bytes is debatable. Not every program has to deal with human-readable strings.
Let’s say though that we all collectively agreed that Python having a bias toward human text is a good thing: let’s assume that dealing with byte-streams that do not map to Unicode characters is so rare that having to sprinkle a few `b`s and `.buffer`s here and there is not a deal-breaker.
Even then, Python’s approach to human text feels somewhat naive: lengths, indexing, iteration and comparison are all based on code points, which AFAIU do not really represent anything meaningful in meat-space. For example, Python 3 thinks that 'é' != 'é' because one is 'e'+'\N{COMBINING ACUTE ACCENT}' and the other is '\N{LATIN SMALL LETTER E WITH ACUTE}'. My French AZERTY keyboard makes typing the latter straightforward; I understand that GTK applications make it easy to type the former with “e Control-Shift-U 301”. I can’t think of a program geared toward human interaction that should consider these two strings different. Python does offer unicodedata.normalize() to solve this specific problem; must we rely on every text-handling Python program out there to make its input go through this function? Arguably, shouldn’t the language abstract this minutiae away from us?
tl;dr: While Joel’s article is a classic and a must-read, I’m not sure it addresses the problems raised by Python 3’s critics: the language’s preference toward meat-space characters adds hoops to jump through when dealing with genuine byte-streams; the language’s naive handling of these meat-space characters adds hoops to jump through when dealing with those too.
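(The example above in code, together with the normalization step being discussed; a small illustration, not part of the comment:)

import unicodedata

composed = "\N{LATIN SMALL LETTER E WITH ACUTE}"   # 'é' as one code point
decomposed = "e\N{COMBINING ACUTE ACCENT}"         # 'é' as two code points
print(composed == decomposed)                      # False
print(len(composed), len(decomposed))              # 1 2
print(unicodedata.normalize("NFC", decomposed) == composed)   # True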
Posted Apr 12, 2018 22:45 UTC (Thu)
by dvdeug (guest, #10998)
[Link] (3 responses)
Posted Apr 13, 2018 6:05 UTC (Fri)
by peniblec (subscriber, #111147)
[Link] (2 responses)
Couldn't the editor normalize tokens only for some operations (e.g. character-count, searching) and otherwise preserve the file's content, only effectively changing the parts the user actually edited?
Posted Apr 14, 2018 6:26 UTC (Sat)
by dvdeug (guest, #10998)
[Link] (1 responses)
Posted Apr 14, 2018 11:48 UTC (Sat)
by peniblec (subscriber, #111147)
[Link]
OK. Let’s say Python’s string type uses normalization/grapheme clusters/nanomachines to correctly compare sequences of Unicode characters. Would that necessarily make a text editor overzealously normalize your whole file, thus polluting your patch?
I don’t know how actual text editors do it, but I imagine that their representation of your file’s content is more nuanced than simply “whatever open(filename) returned”. I would assume that they represent a “file” as sequences of opaque “word” or “line” objects, each of those objects having methods to get their position in the file's byte-stream (start and end offset, cached once decoded), so that the editor knows where to apply changes; and to get their “canonical” Unicode representation, so that the editor can do whatever an editor is supposed to do with meat-space characters (comparison for search-and-replace, length computation for line-wrapping).
So with such a design, I don’t think “Python’s str canonicalizing behind your back” would necessarily lead to “OMG this commit is full of extraneous crap introduced by this dumb Python text editor”. Again, I might not have thought enough about this, maybe the above does nothing to solve the problem.
(Congratulations, you’ve nerd-sniped me into designing a text editor ;) )
Alternative workaround: teach our diffing tools to normalize text before computing differences :D They do already let us skip whitespace changes, for example, which is a subclass of the more general category of “things computers care about despite being mostly irrelevant to meatbags”.
Posted Apr 12, 2018 23:07 UTC (Thu)
by HelloWorld (guest, #56129)
[Link]
https://mortoray.com/2013/11/27/the-string-type-is-broken...
Posted Apr 13, 2018 1:38 UTC (Fri)
by smurf (subscriber, #17840)
[Link] (7 responses)
You need to normalize Unicode before doing meaningful things with it. That's a given in any programming language.
You might find fault with the people who invented Unicode. Blaming your (non-)choice of programming language isn't going to help, except that I can think of lots of ways to make it worse. Just look at Java.
Posted Apr 13, 2018 6:23 UTC (Fri)
by peniblec (subscriber, #111147)
[Link] (1 responses)
If normalization is so obviously needed before dealing with Unicode strings, wouldn’t it make sense for languages to take care of it by default? For example, a language’s string-comparison function could automatically make normalized copies of its operands and compare these; users who actually want to compare codepoints could use something like list(s1.codepoints()) == list(s2.codepoints()).
(Not sure what iteration should produce by default, though. Grapheme clusters?)
Maybe performance would take such a hit that it makes sense to let the user ask for normalization explicitly.
Disclaimer: I don’t actually know any language which deals with Unicode strings this way; then again, I don’t actually know many languages.
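(A sketch of what such a normalizing string type could look like; NormStr and codepoints() are hypothetical names, not an existing API:)

import unicodedata

class NormStr(str):
    """Toy string type that compares by NFC-normalized value."""
    def __eq__(self, other):
        return (unicodedata.normalize("NFC", str(self)) ==
                unicodedata.normalize("NFC", str(other)))
    def __hash__(self):
        return hash(unicodedata.normalize("NFC", str(self)))
    def codepoints(self):
        return [ord(c) for c in str(self)]   # the explicit escape hatch

assert NormStr("e\u0301") == NormStr("\u00e9")                            # normalized comparison
assert NormStr("e\u0301").codepoints() != NormStr("\u00e9").codepoints()  # raw code points still differ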
Posted Apr 13, 2018 6:47 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link]
I personally wouldn't have been so opposed to Py3 if it were to do the same. Unicode is a hard problem and full support of it might require compromises.
Posted Apr 13, 2018 12:44 UTC (Fri)
by HelloWorld (guest, #56129)
[Link] (4 responses)
The essential bit: “Unfortunately, the standard normalisation forms are buggy, and under the current stability policy, cannot be fixed. One example of this that I know is U+387 GREEK ANO TELEIA, which wrongly decomposes canonically (!) into U+00B7 MIDDLE DOT (the Greek name even means literally “upper dot”). This means that some processes may choose to avoid normalisation, because, even the canonical forms risk losing important information.”
Posted Apr 13, 2018 16:39 UTC (Fri)
by ceplm (subscriber, #41334)
[Link]
Posted Apr 13, 2018 17:48 UTC (Fri)
by sfeam (subscriber, #2841)
[Link] (2 responses)
U+0387 doesn't "decompose" into anything. It's not a combining form. It is an example of a character in one alphabet whose common written form happens to look like a character from some other alphabet or set of conventional symbols. Because they look similar [in typical fonts] people tend to type whichever is more convenient. But neither one is the "canonical form" of the other. A more familiar pair would be Greek letter "mu" (U+03BC) and the scientific prefix "micro" (U+00B5). The existence of such pairs can be a problem, but it's a different problem than canonicalization. While it might make sense to be suspicious of micro signs appearing in what is otherwise a Greek alphabet URL, it would be a bad idea to replace all micro signs with "mu" (or vice versa) in a document that happened to include both Greek text and quantities in SI units.
Posted Apr 13, 2018 10:30 UTC (Fri)
by ceplm (subscriber, #41334)
[Link] (3 responses)
And yes, I agree that the implementation in py3k is not perfect, and conversion between on-wire eight-bit-per-character data and proper str is sometimes problematic, but a lot of work has been spent on it already and the situation is not so bleak that I would call it whatever hell.
Certainly, compared to the disaster py2k was, py3k is a huge improvement.
Posted Apr 14, 2018 13:38 UTC (Sat)
by peniblec (subscriber, #111147)
[Link] (2 responses)
Fair enough. I’ve mostly only worked in Python 3 codebases, and the only place where I hear people debate the str-vs-bytes business is on LWN. That restricts my sample of arguments against Python 3 to the high-level design issues I mentioned; I have not been “in the trenches” migrating sloppy code to Python 3. In my imagination the “characters are bytes” camp (and their code) had been dissolved during the noughties; I guess that was wishful thinking :)
Posted Apr 14, 2018 15:15 UTC (Sat)
by excors (subscriber, #95769)
[Link] (1 responses)
People in the first category might understand Unicode perfectly well, but they often need to deal with e.g. filenames (which aren't really Unicode on Linux or Windows), or with e.g. HTTP headers (where the encoding is unclearly specified and real data often violates the specification anyway), and they want a language that makes it easy and natural to process data like that. Python 3 makes it less easy and less natural than Python 2, since the language and the libraries tend to default to Unicode strings, so those people are unhappy. Meanwhile people in the second category prefer having everything be Unicode by default, since that's all they use anyway. Neither side is wrong or ignorant, they just have different use cases and different requirements, and Python failed to find a way to satisfy both groups.
Posted Apr 14, 2018 16:48 UTC (Sat)
by SiB (subscriber, #4048)
[Link]
In our department (physics) we use python for data analysis and for instrumentation control (including space flight). Python 3 is perfectly fine for the data analysis. Instrumentation control uses the python repl as commanding interface, where python 2 is still ahead.
Posted Apr 28, 2018 20:22 UTC (Sat)
by RooTer (guest, #91640)
[Link]
Having developed python apps in both Python 2 and 3 for years, I would say the encode/decode hell exists in Python 2 realm, not 3.
Posted Apr 6, 2018 11:33 UTC (Fri)
by Otus (subscriber, #67685)
[Link]
Seems like stupid `UnicodeDecodeError`s plague almost every Python 2 project, and a switch to Python 3 would be a good idea just for the clear str/bytes distinction.
Fedora and Python 2
You can change that. See "pydoc3 json".
'\udcff'
Why would this situation not happen all over again with Python4, when Guido is disappointed with something else in Python3?
Why is that? What is wrong with UTF-16?
I really doubt it.
Exactly. And this is self-inflicted entirely.