
Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 0:02 UTC (Tue) by rgmoore (✭ supporter ✭, #75)
In reply to: Szorc: Mercurial's Journey to and Reflections on Python 3 by pj
Parent article: Szorc: Mercurial's Journey to and Reflections on Python 3

One of the points made in the blog post, though, is that the creators of Python 3 did some really stupid stuff that made it needlessly difficult to write code that worked in both versions. The specific example that stood out to me was the use of prefixes to specify whether a string literal was a string of bytes or of Unicode code points. In Python 2, it was possible to write b'' for a byte string and u'' for a unicode string. Python 3 kept the b'' syntax but initially eliminated u'' for unicode strings, and only brought it back when users complained. That hurt people trying to move from Python 2 to Python 3 without providing any benefit to people starting with Python 3.
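For anyone who wants to check the literal prefixes in question, here is a minimal sketch. It assumes Python 3.3 or later, where the u'' prefix is accepted again (PEP 414); the variable names are just for illustration.

```python
# Assumes Python 3.3+, where the u'' prefix is accepted again (PEP 414).
text = u'caf\u00e9'      # same type as a plain '...' literal on Python 3
raw = b'caf\xc3\xa9'     # bytes literal: the UTF-8 encoding of the text above

# On Python 3 these are two distinct types; converting is always explicit.
assert type(text) is str
assert type(raw) is bytes
assert text.encode('utf-8') == raw
```

The same source line `u'...'` is also valid Python 2, which is what makes the prefix useful for code that has to run on both.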



Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 10:12 UTC (Tue) by smurf (subscriber, #17840) [Link] (14 responses)

The u'' syntax was removed because the initial idea was that people would use 2to3 and similar tools to convert their code base to Python3 once, and they'd be done. Given the initial goal of quickly converting the whole infrastructure to Python3 that could even have worked.

What happened instead was an intense period of slowly converting to Py3, heaps of code that use "import six", and modules that ran, and run, with both 2 and 3 once some of those nits were reverted. And they were.

Thus, IMHO accusations of Python core developers not listening to (some of) their users are for the most part really unfounded. Hindsight is 20/20, yes they could have done some things better, but frankly my compassion for people who take their own sweet time to port their code to Python3 and complain *now*, when there's definitely no more chance to add anything to Py2 to make the transition easier, is severely limited.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 14, 2020 15:54 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (13 responses)

> The u'' syntax was removed because the initial idea was that people would use 2to3 and similar tools to convert their code base to Python3 once, and they'd be done.
That wouldn't have worked because Python 3.0 lacked many required features, like being able to use format strings with bytes. They got re-added only in Python 3.5 released in late 2014 ( https://www.python.org/dev/peps/pep-0461/ ). So for many projects realistic porting could begin around 2015 when it trickled down to major distros.

These concerns were raised back in 2008, but Py3 developers ignored them because it was clear (to them) that only bad code needed it and developers should shut up and eat their veggies.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 22, 2020 18:47 UTC (Wed) by togga (subscriber, #53103) [Link] (12 responses)

> "like being able to use format strings with bytes. They got re-added only in Python 3.5 released in late 2014"

And then obviously removed later on.

Python 2.7.17 >>> b'{x}'.format(x=10)
'10'

Python 3.7.5 >>> b'{x}'.format(x=10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'format'

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 22, 2020 19:47 UTC (Wed) by foom (subscriber, #14868) [Link] (11 responses)

In python 3.5, the "legacy" % formatting,
>>> b'%d' % (55,)
was supported again, but *not* the new and recommended format function.
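A small sketch of the difference, assuming Python 3.5 or later (PEP 461). The header text and the choice of ASCII for the encode step are illustrative assumptions, not anything from the article.

```python
# Assumes Python 3.5+ (PEP 461): %-interpolation works directly on bytes.
header = b'Content-Length: %d\r\n' % (55,)

# bytes has no .format() method. One workaround is to format as str and
# encode afterwards; ASCII is an assumption about the payload here.
same_header = 'Content-Length: {n}\r\n'.format(n=55).encode('ascii')

# Record whether bytes objects expose .format() on this interpreter.
has_format = hasattr(b'', 'format')
```

The workaround only helps when the surrounding data really is text in a known encoding; for arbitrary binary payloads %-formatting is the only built-in option.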

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 22, 2020 19:57 UTC (Wed) by togga (subscriber, #53103) [Link] (10 responses)

What an irony that the new and recommended format function is not working with latest Python3.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 23, 2020 8:50 UTC (Thu) by smurf (subscriber, #17840) [Link] (9 responses)

That's not an irony, it actually makes sense. %- and .format-formatting are typically used in different contexts.
.format was never "recommended" on bytestrings, in fact it was initially proposed for Python3. Neither was %, but lots of older code uses it in contexts which end up byte-ish when you migrate to Py3. That usage never was prevalent for .format, so why should the Python devs incur the additional complexity of adding it to bytes?

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 23, 2020 14:18 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (7 responses)

> That usage never was prevalent for .format, so why should the Python devs incur the additional complexity of adding it to bytes?

So instead the burden is put on the coder to have to think about whether bytes or strings will be threaded through their code and can't use the newer API if they might have bytes floating about?

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 23, 2020 16:48 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

You're not supposed to use bytes. Bytes are unhealthy and bad for you. Fake Unicode all the way!

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 25, 2020 2:24 UTC (Sat) by togga (subscriber, #53103) [Link] (5 responses)

I think not being able to migrate developers to python3 for 10 years took a toll on pride, resulting in politics and statements rather than sane language development. A number of weird decisions (some described in the article) point at that.

There is no reason for not
* allowing byte strings as attributes
* being consistent with types and syntax for byte strings and strings
* being consistent with format options for strings and byte strings
* etc.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 25, 2020 11:03 UTC (Sat) by smurf (subscriber, #17840) [Link] (4 responses)

Python3 never had bytestrings as attributes, so I don't know how that would follow.

Python3 source code is Unicode. Python attribute access is written as "object.attr". This "attr" part therefore must be Unicode. Why would you want to use anything else? If you need bytestrings as keys, or integers for that matter, use a dict.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 25, 2020 21:44 UTC (Sat) by togga (subscriber, #53103) [Link] (3 responses)

Python source code doesn't have to be unicode. The encoding of the source code has nothing to do with attributes.

>Why would you want to use anything else?
Mostly due to library APIs requiring attributes for many things. This is a big source of py3 encode/decode errors.

>"use a dict."
This is what attributes do:

>>> a=type('A', (object,), {})()
>>> setattr(a, 'b', 22)
>>> setattr(a, b'c', 12)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: attribute name must be string, not 'bytes'
>>> a.__dict__
{'b': 22}
>>> type(a.__dict__)
<class 'dict'>
>>> a.__dict__[b'c']=12
>>> a.__dict__
{'b': 22, b'c': 12}

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 29, 2020 12:32 UTC (Wed) by smurf (subscriber, #17840) [Link] (2 responses)

> Python source code doesn't have to be unicode

Sure, if you want to be pedantic you can use "-*- coding: iso-8859-1 -*-" (or whatever) in the first two lines and write your code in Latin1 (or whatever), but that's just the codec Python uses to read the source. It's still decoded to Unicode internally.
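A quick way to see this, as a sketch: compile a chunk of Latin-1 source bytes with a coding cookie and look at what the literal turns into. The file name passed to `compile` and the variable names are made up for the example.

```python
# Source bytes written in Latin-1, with a PEP 263 coding cookie on line 1.
# The byte 0xE9 is "e with acute accent" in Latin-1.
source = b"# -*- coding: iso-8859-1 -*-\nname = 'caf\xe9'\n"

namespace = {}
# compile() reads the cookie and decodes the source before parsing it.
exec(compile(source, '<latin1-example>', 'exec'), namespace)

name = namespace['name']
assert type(name) is str   # a Unicode string, not Latin-1 bytes
assert len(name) == 4      # four code points, whatever the file encoding was
```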

> >"use a dict."
> This is what attributes do:

Currently. In CPython. Other Python implementations, or future versions of CPython, may or may not use what is, or looks like, a generic dict to do that.

Yes, I do question why anybody would want to use attributes which then can't be accessed with `obj.attr` syntax. That's useless.
Also, it's not just bytes: arbitrary strings frequently contain hyphens or dots, or even start with digits.

Use a (real) dict.

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Feb 12, 2020 20:40 UTC (Wed) by togga (subscriber, #53103) [Link] (1 responses)

>> " That's useless."

As I said above, it's a necessity due to the design of library APIs. Examples of needed, otherwise unnecessary, encode/decode are plenty (and error-prone). The article mentions a few; I've already mentioned ctypes, where for instance structure field names (often read from binary APIs such as C strings, etc.) are required to be attributes.
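To make the ctypes case concrete, a minimal sketch: the field names below are hypothetical stand-ins for names read from a binary interface, and the ASCII decode is an assumption about how such names are encoded.

```python
import ctypes

# Hypothetical field names as they might arrive from a binary API (C strings).
raw_field_names = [b'width', b'height']

# Field names become attributes, and attribute names must be str on
# Python 3, so each name needs a decode step first.
# ASCII is an assumption; C identifiers normally are ASCII.
fields = [(name.decode('ascii'), ctypes.c_int) for name in raw_field_names]


class Size(ctypes.Structure):
    _fields_ = fields


size = Size(3, 4)
assert size.width == 3
assert getattr(size, 'height') == 4
```

Every lookup that starts from a name in bytes form needs the same decode, which is where the extra encode/decode calls come from.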

This thread has become a bit off topic. The interesting question for me is Python 2to3 language regressions or which migrations are feasible; that stage was done in ~2010 to 2013 with several failed Python3 migration attempts. Nothing of value has changed since. Half of my Python2 use-cases are not suited for Python3 due to its design choices, and I do not intend to squeeze any problem into a tool not suited for it. That's more of a fact.

The question in the back of my head is about the other half of my use-cases, the ones that fit Python3. Given the experience of python leadership attitudes, decisions, migration pains, etc., of which the article is one example, is python3 a sound language choice for new projects?

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Feb 12, 2020 20:47 UTC (Wed) by togga (subscriber, #53103) [Link]

>> interesting question for me is Python 2to3 language regressions

Oops.. it should read the opposite. "is not" Python 2to3 language regressions

Szorc: Mercurial's Journey to and Reflections on Python 3

Posted Jan 25, 2020 13:26 UTC (Sat) by foom (subscriber, #14868) [Link]

The new format method was added because it was thought to be a better syntax for doing formatting. The % formatting was only kept in python3 (for strings) because it didn't seem feasible to migrate everyone's existing format strings, which might even be stored externally in config files.

Given the invention of better format syntax, forcing the continued use of the worse/legacy % format syntax for bytestrings seems a somewhat mystifying decision.

It's not as if the only use of bytestrings is in code converted from python 2...

