Python identifiers, PEP 8, and consistency
While there are few rules on the names of variables, classes, functions, and so on (i.e. identifiers) in the Python language, there are some guidelines on how those things should be named. But, of course, those guidelines were not always followed in the standard library, especially in the early years of the project. A suggestion to add aliases to the standard library for identifiers that do not follow the guidelines seems highly unlikely to go anywhere, but it led to an interesting discussion on the python-ideas mailing list.
To a first approximation, a Python identifier can be any sequence of Unicode code points that correspond to characters, but it cannot start with a numeral nor be the same as one of the 35 reserved keywords. That leaves a lot of room for expressiveness (and some confusion) in those names. There is, however, PEP 8 ("Style Guide for Python Code") that has some naming conventions for identifiers, but the PEP contains a caveat: "The naming conventions of Python's library are a bit of a mess, so we'll never get this completely consistent".
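As a rough illustration of those two rules, the standard library already exposes both checks: `str.isidentifier()` for the lexical rule and `keyword.iskeyword()` for reserved words. The helper name `is_usable_identifier` below is made up for this sketch, and the keyword count it prints depends on the interpreter version.

```python
import keyword


def is_usable_identifier(name: str) -> bool:
    """Return True if name is lexically valid and not a reserved keyword.

    Hypothetical helper, named here only for illustration.
    """
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_usable_identifier("current_thread"))  # valid, PEP 8-style name
print(is_usable_identifier("currentThread"))   # valid too; style is not enforced
print(is_usable_identifier("2fast"))           # invalid: starts with a numeral
print(is_usable_identifier("class"))           # invalid: reserved keyword
print(is_usable_identifier("naïve"))           # valid: non-ASCII letters are allowed

# Number of reserved keywords in the running interpreter (version dependent).
print(len(keyword.kwlist))
```

Note that the language accepts both `current_thread` and `currentThread`; the choice between them is purely a matter of convention, which is where PEP 8 comes in.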
But consistency is just what Matt del Valle was after when he proposed making aliases for identifiers in the standard library that do not conform to the PEP 8 conventions. The idea cropped up after he read the documentation for the threading module in the standard library, which has a note near the top about deprecating the camel-case function names in the module in favor of others that are in keeping with the guidelines in PEP 8. The camel-case names are still present, but were deprecated in Python 3.10 in favor of names that are lower case, sometimes with underscores (e.g. threading.current_thread() instead of threading.currentThread()).
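One common way to keep an old name alive while steering users to a new one is a thin wrapper that emits a DeprecationWarning and then forwards the call. The sketch below shows that general pattern only; the names `current_thread_name`, `currentThreadName`, and `deprecated_alias` are invented for the example and are not part of threading, and this is not necessarily how threading itself is implemented.

```python
import functools
import warnings


def current_thread_name():
    """Stand-in for a PEP 8-style function (hypothetical, not a real API)."""
    return "MainThread"


def deprecated_alias(func, old_name):
    """Return a wrapper that warns about old_name, then calls func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name}() is deprecated, use {func.__name__}() instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return func(*args, **kwargs)
    return wrapper


# The camel-case spelling stays available, but warns on every call.
currentThreadName = deprecated_alias(current_thread_name, "currentThreadName")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = currentThreadName()

print(result)
print(caught[0].category.__name__)
print(caught[0].message)
```

The cost that comes up later in the discussion is visible even here: both spellings must be maintained and documented for as long as the old one exists.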
The PEP
PEP 8 suggests that function names "should be lowercase, with words separated by underscores as necessary to improve readability", which is what the changes for threading do. In addition, the PEP says that names for variables, methods, and arguments should follow the function convention, while types and classes should use camel case (as defined by the PEP, which includes an initial capital letter, unlike other camel-case definitions out there). Del Valle calls that form of capitalization "PascalCase" and noted that there are various inconsistencies in capitalization in the standard library:
I realize that large chunks of the stdlib predates pep8 and therefore use various non-uniform conventions. For example, the logging module is fully camelCased, and many core types like `str` and `list` don't use PascalCase as pep8 recommends. The `collections` module is a veritable mosaic of casing conventions, with some types like `deque` and `namedtuple` being fully lowercased while others like `Counter` and `ChainMap` are PascalCased.
Given the precedent in threading, he wondered if it would be feasible to "add aliases across the board for all public-facing stdlib types and functions that don't follow pep8-recommended casing". The "wart" of inconsistent naming conventions in his code bothers him, perhaps more than it should, he said, but he thought others might feel similarly, which could perhaps lead to the problem being solved rather than endured. Beyond that, though, it makes it somewhat more difficult to teach good practices in the language:
I always try to cover pep8 very early to discourage people I'm training from internalizing bad habits, and it means you have to explain that the very standard library itself contains style violations that would get flagged in most modern code reviews, and that they just have to keep in mind that despite the fact that the core language does it, they should not.
Reactions
Overall, the reception was rather chilly, though not universally so. The commenters generally acknowledged that there are some unfortunate inconsistencies, but the pain of making a change like what he proposed is too high for the value it would provide. Eric V. Smith put it this way:
The cost of having two ways to name things for the indefinite future is too high. Not only would you have to maintain it in the various Python implementations, you'd have to explain why code uses "str" or "Str", or both.
Among Del Valle's suggested changes were aliasing the "type functions" to their PascalCase equivalents (e.g. str() to Str()), as Smith mentions. But that would be a fundamental change with no real upside and a high cost, Smith said. Mike Miller agreed with that, but wondered if there might be some middle ground, noting some common confusion with the datetime module:
One of my biggest peeves is this:
import datetime # or
from datetime import datetime
Which is often confusing... is that the datetime module or the class someone chose at random in this module? A minor thorn that… just doesn't go away.
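The ambiguity Miller describes can be seen by binding both objects under distinct names. The aliases `datetime_module` and `datetime_class` below are chosen only for this illustration.

```python
import datetime as datetime_module               # the module
from datetime import datetime as datetime_class  # the class inside it

print(type(datetime_module).__name__)   # module
print(type(datetime_class).__name__)    # type

# Same class, reached by two different spellings.
print(datetime_module.datetime is datetime_class)  # True
```

In code that uses the bare name `datetime`, a reader has to check the import line to know which of the two objects is meant.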
Neil Girdhar also thought that changing str() and friends was "way too ambitious. But some minor cleanup might not be so pernicious?"
On the other hand, Jelle Zijlstra brought some first-hand experience with changes of this sort to the discussion. He had worked on explicitly deprecating (i.e. with DeprecationWarning) some of the camel-case identifiers in the threading module; "in retrospect I don't feel like that was a very useful contribution. It just introduces churn to a bunch of codebases and makes it harder to write multiversion code."
Chris Angelico had a number of objections to Del Valle's ideas, but existing code that already reuses the names of some of the identifiers is particularly problematic:
Absolutely no value in adding aliases for everything, especially things that can be shadowed. It's not hugely common, but suppose that you deliberately shadow the name "list" in your project - now the List alias has become disconnected from it, unless you explicitly shadow that one as well. Conversely, a much more common practice is to actually use the capitalized version as a variant:
class List(list): ...
This would now be shadowing just one, but not the other, of the built-ins. Confusion would abound.
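Angelico's second scenario can be simulated by temporarily installing a `List` alias into the builtins module. No such built-in exists; it is added here purely as an assumption for the demonstration and removed again at the end.

```python
import builtins

# Hypothetical: pretend the interpreter shipped a PascalCase alias for list.
builtins.List = list


class List(list):
    """A project-level subclass, as in the quoted example."""


# Inside this module, the module-level class hides the built-in alias...
print(List is list)            # False
# ...while the lower-case name still refers to the original type.
print(builtins.List is list)   # True
print(issubclass(List, list))  # True

# Clean up the simulated alias.
del builtins.List
```

After the class definition, `List` and `list` name two different types in this module, even though a reader who knows about the alias would expect them to be interchangeable.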
Angelico, along with others in the thread, pointed to the first section of PEP 8, which is titled "A Foolish Consistency is the Hobgoblin of Little Minds" (from the Ralph Waldo Emerson quote). That section makes it clear that the PEP is meant as a guide; consistency is most important at the function and module level, with project-level consistency being next in line. Any of those is more important than rigidly following the guidelines. As Angelico put it: "When a style guide becomes a boat anchor, it's not doing its job."
Paul Moore had a more fundamental objection to aliasing the type functions, noting that the PEP does not actually offer clear-cut guidance. He quoted from the "Naming Conventions" section and showed how it led to ambiguity:
"""
Names that are visible to the user as public parts of the API should follow conventions that reflect usage rather than implementation.
"""
To examine some specific cases, lists are a type, but list(...) is a function for constructing lists. The function-style usage is far more common than the use of list as a type name (possibly depending on how much of a static typing advocate you are...). So "list" should be lower case by that logic, and therefore according to PEP 8. And str() is a function for getting the string representation of an object as well as being a type - so should it be "str" or "Str"? That's at best a judgement call (usage is probably more evenly divided in this case), but PEP 8 supports both choices. Or to put it another way, "uniform" casing is a myth, if you read PEP 8 properly.
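Moore's point that these names serve as both types and constructor-style functions is easy to check interactively. The snippet below is only a small illustration of that dual role, not an argument for either casing.

```python
# list is a class (a type), yet it is usually called like a function.
print(isinstance(list, type))   # True
print(list("abc"))              # ['a', 'b', 'c']
print(isinstance([], list))     # True

# str plays both roles too: converting a value and naming a type.
print(str(42))                  # 42
print(isinstance("42", str))    # True
```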
But there are tools, such as the flake8 linter, that try to rigidly apply the PEP 8 "rules" to a code base; some projects enforce the use of these tools before commits can be made. But linters cannot really determine the intent of the programmer, so they are inflexible and are probably not truly appropriate as an enforcement mechanism. Moore said:
Unfortunately, this usually (in my experience) comes about through a "slippery slope" of people saying that mandating a linter will stop endless debates over style preferences, as we'll just be able to say "did the linter pass?" and move on. This of course ignores the fact that (again, in my experience) far *more* time is wasted complaining about linter rules than was ever lost over arguments about style :-(
Changes
Del Valle acknowledged that "some awkward shadowing edge-cases are the strongest argument against this proposal", but Angelico disagreed: "The strongest argument is churn - lots and lots of changes for zero benefit." Del Valle recognized that the winds were strongly blowing against the sweeping changes he had suggested, but in the hopes of "salvaging *something* out of it" he reduced the scope substantially: "Add pep8-compliant aliases for camelCased public-facing names in the stdlib (such as logging and unittest) in a similar manner as was done with threading".
While Ethan Furman was in favor of such a change, others who had also mentioned the inconsistencies in unittest and logging did not follow suit. Most who replied to Furman recommended switching to pytest instead of unittest for testing, though alternatives to logging were not really on offer.
Guido van Rossum had a succinct response to the idea: "One thought: No." That essentially put the kibosh on it (not formally, of course, but Van Rossum's opinion carries a fair amount of weight), so Del Valle withdrew it entirely. It is clear there was no groundswell of support for it, even in more limited guises, but the discussion touched on various aspects of the language and its history. It seems clear that if Python had been developed in one fell swoop, rather than being added to in a piecemeal fashion over decades, different choices would have been made. More (or even fully) consistent identifiers within the project's code base may well have been part of that.
But, at this point, it is far too late for a retrofit, at least for many; even if everyone agreed on how to change things, the upheaval, code churn, and dual-naming would be messy. And the gain, while not zero, is not huge. Beyond that, the day when the inconsistent names could actually be removed is extremely distant—likely never, in truth. So users and teachers of the language will need to keep in mind some semi-strange inconsistencies in the darker corners, warts, which exist in all programming (and other) languages. Humans are not consistent beasts, after all.
Posted Nov 30, 2021 23:25 UTC (Tue)
by iabervon (subscriber, #722)
[Link] (2 responses)
Of course, by this metric, readers of PEP 8 are almost certainly not going to be naming a class that ought to be lowercase, and few packages outside of the standard library's builtins would even do it (exceptions being, perhaps, construct, numpy, and pandas). But I think it would be good (as a retcon) to specify for educational purposes. "If I tell you it's a pathlib.Path, you have some idea what to expect; if I tell you it's a str, you really don't."
Posted Dec 1, 2021 7:44 UTC (Wed)
by epa (subscriber, #39769)
[Link] (1 responses)
In Python, one answer might be to make the language case insensitive. Wait, hear me out… suppose you can declare particular source files or classes to be case insensitive, so callers can use any case. The library code can then be made consistent. An automatic linter or cleaner will convert code to the canonical form used in the library. If you do that cleanup, you can then go back to strict case matching. Or you might choose to keep working in case-insensitive mode and get warnings, rather than hard errors, on mismatches. Effectively making it a code style issue rather than some non-negotiable semantic question, which it doesn’t need to be.
That doesn’t do much for studlyCaps versus underscores, I admit. I guess if you must have aliases, they should be declared specially so that linter tools (and the compiler’s own diagnostics) can help the programmer. They should not just be a bunch of wrapper functions.
Posted Dec 1, 2021 19:35 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link]
> That doesn’t do much for studlyCaps versus underscores, I admit. I guess if you must have aliases, they should be declared specially so that linter tools (and the compiler’s own diagnostics) can help the programmer. They should not just be a bunch of wrapper functions.
You don't need wrapper functions in Python. Python functions and classes are first-class objects, so you can just do this:
def original_name(*args): ...
alias = original_name
If a linter is unable to figure that out, it needs to be redesigned from the ground up.
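A runnable version of that two-line idea shows that the alias is the very same function object, not a wrapper. The function body here is a placeholder added for the sketch.

```python
def original_name(*args):
    """Placeholder function; it simply returns its arguments."""
    return args


# Plain assignment binds a second name to the same object.
alias = original_name

print(alias is original_name)   # True
print(alias.__name__)           # original_name
print(alias(1, 2))              # (1, 2)
```

One consequence is that introspection still reports the original name, so tracebacks and `help()` output would show `original_name` even when the code calls `alias`.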
* Except for the case-insensitive strings. Those are terrible.
Posted Dec 1, 2021 2:45 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
I hate this section of the PEP. You can cite it as justification for literally any violation of PEP 8, regardless of whether there's a good reason to violate PEP 8. In my experience, it's very uncommon for people citing this section of the PEP to actually explain the underlying justification for making an exception, even though that very section explicitly states that "Consistency with this style guide is important."
As for consistency within the standard library, or within individual modules... that's a terrible argument. The standard library is wholly inconsistent with itself, and the collections module is an excellent example of this inconsistency. This is actually an argument *in favor* of picking a specific style and standardizing on it, rather than continuing with the status quo. Of course, if you have to pick one style, you should probably pick the PEP 8 style, and so this is not a refutation at all.
Posted Dec 1, 2021 14:43 UTC (Wed)
by martin.langhoff (guest, #61417)
[Link] (2 responses)
That's one of the points where having a lead everyone trusts to make calls in these gray zones is key.
At the other extreme of the range, there's PHP to consider :-)
Posted Dec 1, 2021 19:39 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link]
Posted Dec 13, 2021 17:19 UTC (Mon)
by nye (subscriber, #51576)
[Link]
If anything, I would say that PHP is *closer* to the consistent end of the range than Python is, at least if you consider its trajectory. Many of the changes of the last four or five years have been in the direction of improving consistency - sometimes in ways that are similar to those rejected in this Python discussion. Notable are certain cases where "backwards compatibility" was the only defence for otherwise indefensible behaviour, like the weirdness in `implode()` - PHP is definitely more willing than post-3.0 Python to break compatibility in favour of correctness, and consistency is increasingly a part of that.
Granted there are a lot of standard library functions which are simple wrappers around C functions, and in those cases they normally opt for "consistency" with the C function rather than with each other. I personally wish it had fewer of these very shallow wrappers, especially for very widely used libraries like cURL, but I can see the argument.
Posted Dec 1, 2021 17:04 UTC (Wed)
by jmaa (guest, #128856)
[Link]
In the spirit of quantifying mistakes, let's try a Fermi-style estimate of how much money and time might be wasted in migrating codebases to use new consistent identifiers, in a scenario where the original names are deprecated. I pulled these numbers from my ass, but they sound plausible:
1 000 000 000 lines of actively maintained Python code
0.001 incidence ratio of deprecation
1 minute per incidence
$40 hourly programmer wage
So that's around 16666 wasted work hours, and $666666 down the drain. This only counts the direct migration work, not all of the political drama and dread related to backwards-incompatible version upgrades, or the costs incurred by botched migrations.
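The arithmetic behind those totals can be reproduced from the commenter's four inputs (1 000 000 000 lines, a 0.001 incidence ratio, 1 minute per incidence, and $40 per hour). The variable names are chosen for this sketch, and the inputs remain the commenter's guesses rather than measured data.

```python
lines_of_code = 1_000_000_000   # actively maintained Python lines (guess)
incidence_ratio = 0.001         # fraction of lines touching a deprecated name
minutes_per_incidence = 1       # time to fix each occurrence
hourly_wage_usd = 40            # programmer cost per hour

incidences = lines_of_code * incidence_ratio
hours = incidences * minutes_per_incidence / 60
cost_usd = hours * hourly_wage_usd

print(int(hours))      # 16666
print(int(cost_usd))   # 666666
```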
Posted Dec 1, 2021 17:24 UTC (Wed)
by pj (subscriber, #4506)
[Link] (2 responses)
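I feel like people who want that level of consistency could make a `pep8ed` module that would be just aliases.
from pep8ed.datetime import DateTime, TimeDelta
They could even write code to monkeypatch such aliases in so all they'd have to do is something like:
from pep8er import pep8ify
import datetime
pep8ify(datetime)
now = datetime.DateTime.now()
...once this kind of thing is done, it arguably could go live with stdlib. It'd be cute to be able to
import __foolish_consistency__
to enable such a feature :)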
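A minimal sketch of what such a monkeypatching helper might look like follows. The `pep8ify` name comes from the comment, but no such package is known to exist; the implementation and the `PASCAL_CASE_ALIASES` table are assumptions made for illustration. The mapping is spelled out by hand because word boundaries cannot be inferred from names like `timedelta`.

```python
import datetime

# Hypothetical alias table: old lower-case class name -> PascalCase alias.
PASCAL_CASE_ALIASES = {
    "date": "Date",
    "time": "Time",
    "datetime": "DateTime",
    "timedelta": "TimeDelta",
    "timezone": "TimeZone",
}


def pep8ify(module, aliases=PASCAL_CASE_ALIASES):
    """Attach PascalCase aliases to module for the classes listed in aliases."""
    for old_name, new_name in aliases.items():
        obj = getattr(module, old_name, None)
        # Only alias classes, and never overwrite an existing attribute.
        if isinstance(obj, type) and not hasattr(module, new_name):
            setattr(module, new_name, obj)
    return module


pep8ify(datetime)
now = datetime.DateTime.now()

print(datetime.DateTime is datetime.datetime)   # True
print(isinstance(now, datetime.datetime))       # True
```

Because the helper mutates the shared module object, every other importer of `datetime` in the same process sees the new names as well, which is part of why this kind of patching is usually kept out of library code.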
Posted Dec 2, 2021 6:32 UTC (Thu)
by buck (subscriber, #55985)
[Link] (1 responses)
Posted Dec 30, 2021 3:45 UTC (Thu)
by moxfyre (guest, #13847)
[Link]
`from __future__ import foolish_consistency`
Posted Dec 2, 2021 13:47 UTC (Thu)
by smitty_one_each (subscriber, #28989)
[Link]
Yeah, `from datetime import datetime` is an eyesore, but it makes an important point about the history of the project.
The newbie marks a turning point when the basics of the language/library are understood, and delving into the 'hysterical raisins' why things are the way they are becomes of interest.
This sort of totalitarian cleanup would be appropriate for a Python4, when the GIL is relaxed, and the Promised Land entered.