Python identifiers, PEP 8, and consistency

By Jake Edge
November 30, 2021

While there are few rules on the names of variables, classes, functions, and so on (i.e. identifiers) in the Python language, there are some guidelines on how those things should be named. But, of course, those guidelines were not always followed in the standard library, especially in the early years of the project. A suggestion to add aliases to the standard library for identifiers that do not follow the guidelines seems highly unlikely to go anywhere, but it led to an interesting discussion on the python-ideas mailing list.

To a first approximation, a Python identifier can be any sequence of Unicode code points that correspond to characters, but they cannot start with a numeral nor be the same as one of the 35 reserved keywords. That leaves a lot of room for expressiveness (and some confusion) in those names. There is, however, PEP 8 ("Style Guide for Python Code") that has some naming conventions for identifiers, but the PEP contains a caveat: "The naming conventions of Python's library are a bit of a mess, so we'll never get this completely consistent".
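Those two constraints can be checked from within Python itself. What follows is a minimal sketch using the standard `str.isidentifier()` method and the `keyword` module; the helper name `is_usable_identifier` and the sample names are illustrative assumptions, not anything from the discussion.

```python
import keyword


def is_usable_identifier(name: str) -> bool:
    """Return True if `name` is a syntactically valid, non-reserved identifier."""
    # str.isidentifier() applies the language's Unicode-aware identifier rules;
    # keyword.iskeyword() rejects reserved words such as "class".
    return name.isidentifier() and not keyword.iskeyword(name)


# Sample names chosen for illustration only.
for candidate in ["current_thread", "currentThread", "naïve", "2fast", "class"]:
    print(f"{candidate!r}: {is_usable_identifier(candidate)}")
```

Note that both `current_thread` and `currentThread` pass: the language accepts either spelling, so the choice between them is purely a matter of convention.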

But consistency is just what Matt del Valle was after when he proposed making aliases for identifiers in the standard library that do not conform to the PEP 8 conventions. The idea cropped up after reading the documentation for the threading module in the standard library, which has a note near the top about deprecating the camel-case function names in the module for others that are in keeping with the guidelines in PEP 8. The camel-case names are still present, but were deprecated in Python 3.10 in favor of names that are lower case, sometimes with underscores (e.g. threading.current_thread() instead of threading.currentThread()).
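One common way to keep an old name working while steering users to a new one is a thin wrapper that emits a DeprecationWarning. The sketch below uses hypothetical function names (`current_worker()` and `currentWorker()`) to illustrate that general pattern; it is not a copy of the threading module's code.

```python
import warnings


def current_worker():
    """PEP 8-style name; the preferred spelling (hypothetical function)."""
    return "worker-1"


def currentWorker():
    """Deprecated camel-case alias that forwards to current_worker()."""
    warnings.warn(
        "currentWorker() is deprecated, use current_worker() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return current_worker()


# Record warnings so the deprecation can be observed rather than just printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = currentWorker()

print(result, [w.category.__name__ for w in caught])
```

The old name keeps working, but every call site now produces a warning, which is the kind of churn that comes up later in the discussion.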

The PEP

PEP 8 suggests that function names "should be lowercase, with words separated by underscores as necessary to improve readability", which is what the changes for threading do. In addition, the PEP says that names for variables, methods, and arguments should follow the function convention, while types and classes should use camel case (as defined by the PEP, which includes an initial capital letter, unlike other camel-case definitions out there). Del Valle calls that form of capitalization "PascalCase" and noted that there are various inconsistencies in capitalization in the standard library:

I realize that large chunks of the stdlib predates pep8 and therefore use various non-uniform conventions. For example, the logging module is fully camelCased, and many core types like `str` and `list` don't use PascalCase as pep8 recommends. The `collections` module is a veritable mosaic of casing conventions, with some types like `deque` and `namedtuple` being fully lowercased while others like `Counter` and `ChainMap` are PascalCased.
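The mixture in `collections` is easy to see by sorting its public classes by casing. This is only a rough sketch: it looks at the names in `collections.__all__` and keeps those that are classes, so `namedtuple` (a factory function rather than a class) is left out.

```python
import collections

lowercase, capwords = [], []
for name in collections.__all__:
    obj = getattr(collections, name)
    if isinstance(obj, type):  # keep classes only; namedtuple is a function
        (lowercase if name.islower() else capwords).append(name)

print("lowercase classes:", sorted(lowercase))
print("CapWords classes: ", sorted(capwords))
```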

Given the precedent in threading, he wondered if it would be feasible to "add aliases across the board for all public-facing stdlib types and functions that don't follow pep8-recommended casing". The "wart" of inconsistent naming conventions in his code bothers him, perhaps more than it should, he said, but he thought others might feel similarly, which could perhaps lead to the problem being solved rather than endured. Beyond that, though, it makes it somewhat more difficult to teach good practices in the language:

I always try to cover pep8 very early to discourage people I'm training from internalizing bad habits, and it means you have to explain that the very standard library itself contains style violations that would get flagged in most modern code reviews, and that they just have to keep in mind that despite the fact that the core language does it, they should not.

Reactions

Overall, the reception was rather chilly, though not universally so. The commenters generally acknowledged that there are some unfortunate inconsistencies, but felt that the pain of making a change like the one he proposed is too high for the value it would provide. Eric V. Smith put it this way:

The cost of having two ways to name things for the indefinite future is too high. Not only would you have to maintain it in the various Python implementations, you'd have to explain why code uses "str" or "Str", or both.

Among Del Valle's suggested changes were aliasing the "type functions" to their PascalCase equivalents (e.g. str() to Str()), as Smith mentions. But that would be a fundamental change with no real upside and a high cost, Smith said. Mike Miller agreed with that, but wondered if there might be some middle ground, noting some common confusion with the datetime module:

One of my biggest peeves is this:
    import datetime # or
    from datetime import datetime
Which is often confusing... is that the datetime module or the class someone chose at random in this module? A minor thorn that… just doesn't go away.
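The ambiguity Miller describes comes from the module and the class sharing one name. A small sketch of the two imports side by side follows; the `dt` alias is a common idiom, used here as an assumption and not something proposed in the thread.

```python
import datetime as dt          # the module, bound to an unambiguous alias
from datetime import datetime  # the class that shares the module's name

print(type(dt).__name__)        # module
print(type(datetime).__name__)  # type
print(dt.datetime is datetime)  # True: same class, reached two ways
```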

Neil Girdhar also thought that changing str() and friends was "way too ambitious. But some minor cleanup might not be so pernicious?" On the other hand, Jelle Zijlstra brought some first-hand experience with changes of this sort to the discussion. He had worked on explicitly deprecating (i.e. with DeprecationWarning) some of the camel-case identifiers in the threading module; "in retrospect I don't feel like that was a very useful contribution. It just introduces churn to a bunch of codebases and makes it harder to write multiversion code."

Chris Angelico had a number of objections to Del Valle's ideas, but existing code that already reuses the names of some of the identifiers is particularly problematic:

Absolutely no value in adding aliases for everything, especially things that can be shadowed. It's not hugely common, but suppose that you deliberately shadow the name "list" in your project - now the List alias has become disconnected from it, unless you explicitly shadow that one as well. Conversely, a much more common practice is to actually use the capitalized version as a variant:
class List(list):
    ...
This would now be shadowing just one, but not the other, of the built-ins. Confusion would abound.
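The disconnect Angelico describes can be simulated without touching the real built-ins. In this sketch, `fake_builtins` stands in for a hypothetical builtins namespace that carries a `List` alias, and `LoggingList` is an invented project class; a `ChainMap` mimics the module-then-builtins order of global name lookup.

```python
import builtins
from collections import ChainMap

# Hypothetical: pretend the proposal added "List" next to "list" in builtins.
fake_builtins = {"list": builtins.list, "List": builtins.list}


class LoggingList(builtins.list):
    """A project-specific replacement for the built-in list (illustrative only)."""


# The project's module namespace shadows only the lowercase name.
module_globals = {"list": LoggingList}

# Global name lookup checks the module namespace first, then builtins.
scope = ChainMap(module_globals, fake_builtins)

print(scope["list"] is LoggingList)    # True: the shadowed name wins
print(scope["List"] is builtins.list)  # True: the alias still means the original
```

After the shadowing, the two spellings refer to different classes, which is exactly the confusion the quote warns about.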

Angelico, along with others in the thread, pointed to the first section of PEP 8, which is titled "A Foolish Consistency is the Hobgoblin of Little Minds" (from the Ralph Waldo Emerson quote). That section makes it clear that the PEP is meant as a guide; consistency is most important at the function and module level, with project-level consistency being next in line. Any of those is more important than rigidly following the guidelines. As Angelico put it: "When a style guide becomes a boat anchor, it's not doing its job."

Paul Moore had a more fundamental objection to aliasing the type functions, noting that the PEP does not actually offer clear-cut guidance. He quoted from the "Naming Conventions" section and showed how it led to ambiguity:

"""
Names that are visible to the user as public parts of the API should follow conventions that reflect usage rather than implementation.
"""

To examine some specific cases, lists are a type, but list(...) is a function for constructing lists. The function-style usage is far more common than the use of list as a type name (possibly depending on how much of a static typing advocate you are...). So "list" should be lower case by that logic, and therefore according to PEP 8. And str() is a function for getting the string representation of an object as well as being a type - so should it be "str" or "Str"? That's at best a judgement call (usage is probably more evenly divided in this case), but PEP 8 supports both choices. Or to put it another way, "uniform" casing is a myth, if you read PEP 8 properly.

But there are tools, such as the flake8 linter, that try to rigidly apply the PEP 8 "rules" to a code base; some projects enforce the use of these tools before commits can be made. But linters cannot really determine the intent of the programmer, so they are inflexible and are probably not truly appropriate as an enforcement mechanism. Moore said:

Unfortunately, this usually (in my experience) comes about through a "slippery slope" of people saying that mandating a linter will stop endless debates over style preferences, as we'll just be able to say "did the linter pass?" and move on. This of course ignores the fact that (again, in my experience) far *more* time is wasted complaining about linter rules than was ever lost over arguments about style :-(

Changes

Del Valle acknowledged that "some awkward shadowing edge-cases are the strongest argument against this proposal", but Angelico disagreed: "The strongest argument is churn - lots and lots of changes for zero benefit." Del Valle recognized that the winds were strongly blowing against the sweeping changes he had suggested, but in the hopes of "salvaging *something* out of it" he reduced the scope substantially: "Add pep8-compliant aliases for camelCased public-facing names in the stdlib (such as logging and unittest) in a similar manner as was done with threading".

While Ethan Furman was in favor of such a change, others who had also mentioned the inconsistencies in unittest and logging did not follow suit. Most who replied to Furman recommended switching to pytest instead of unittest for testing, though alternatives to logging were not really on offer.

Guido van Rossum had a succinct response to the idea: "One thought: No." That essentially put the kibosh on it (not formally, of course, but Van Rossum's opinion carries a fair amount of weight), so Del Valle withdrew it entirely. It is clear there was no groundswell of support for it, even in more limited guises, but the discussion touched on various aspects of the language and its history. It seems clear that if Python had been developed in one fell swoop, rather than being added to in a piecemeal fashion over decades, different choices would have been made. More (or even fully) consistent identifiers within the project's code base may well have been part of that.

But, at this point, it is far too late for a retrofit, at least for many; even if everyone agreed on how to change things, the upheaval, code churn, and dual-naming would be messy. And the gain, while not zero, is not huge. Beyond that, the day when the inconsistent names could actually be removed is extremely distant—likely never, in truth. So users and teachers of the language will need to keep in mind some semi-strange inconsistencies in the darker corners, warts, which exist in all programming (and other) languages. Humans are not consistent beasts, after all.



Python identifiers, PEP 8, and consistency

Posted Nov 30, 2021 23:25 UTC (Tue) by iabervon (subscriber, #722) [Link] (2 responses)

I think it's useful to use lowercase words for some classes that are generic structure elements, to distinguish them from classes with business logic or application-specific semantics. I think it ought to be TimeDelta to indicate that it's got a particular interpretation but float to indicate that it could be lots of different things from a percentage to a number of seconds. Classes where you can have literals or special syntax for creating them or anything you get out of JSON would be lowercase. If you have a field or variable with a lowercase type, or a function that returns a lowercase type or takes lowercase type arguments, you should think about how your naming, comments, and documentation explain how to interpret the value. If a function returns a CamelCase type, the type's documentation should provide that explanation instead, and the function's documentation can just talk about which one it returns or how it is constructed.

Of course, by this metric, readers of PEP 8 are almost certainly not going to be naming a class that ought to be lowercase, and few packages outside of the standard library's builtins would even do it (exceptions being, perhaps, construct, numpy, and pandas). But I think it would be good (as a retcon) to specify for educational purposes. "If I tell you it's a pathlib.Path, you have some idea what to expect; if I tell you it's a str, you really don't."

Python identifiers, PEP 8, and consistency

Posted Dec 1, 2021 7:44 UTC (Wed) by epa (subscriber, #39769) [Link] (1 responses)

I like the approach taken by Haskell, where types and constructors must start with a capital letter, and ordinary functions must not.

In Python, one answer might be to make the language case insensitive. Wait, hear me out… suppose you can declare particular source files or classes to be case insensitive, so callers can use any case. The library code can then be made consistent. An automatic linter or cleaner will convert code to the canonical form used in the library. If you do that cleanup, you can then go back to strict case matching. Or you might choose to keep working in case-insensitive mode and get warnings, rather than hard errors, on mismatches. Effectively making it a code style issue rather than some non-negotiable semantic question, which it doesn’t need to be.

That doesn’t do much for studlyCaps versus underscores, I admit. I guess if you must have aliases, they should be declared specially so that linter tools (and the compiler’s own diagnostics) can help the programmer. They should not just be a bunch of wrapper functions.

Python identifiers, PEP 8, and consistency

Posted Dec 1, 2021 19:35 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

Having spent a fair amount of time working in a case-insensitive Pascal-like language (hello, Skyrim modding!), this is actually a lot less painful than it sounds.* However, your capitalization will get irregular if you don't have a standard style and some kind of automated enforcement mechanism. If your language supports variable shadowing (which Papyrus does not, but Python obviously does), then it might be more painful because you will have a larger set of variables potentially "in scope" and there is greater potential for collisions, which the compiler will not warn you about.

> That doesn’t do much for studlyCaps versus underscores, I admit. I guess if you must have aliases, they should be declared specially so that linter tools (and the compiler’s own diagnostics) can help the programmer. They should not just be a bunch of wrapper functions.

You don't need wrapper functions in Python. Python functions and classes are first-class objects, so you can just do this:

def original_name(*args): ...

alias = original_name

If a linter is unable to figure that out, it needs to be redesigned from the ground up.

* Except for the case-insensitive strings. Those are terrible.
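A runnable version of the plain-assignment alias described in this comment might look like the following sketch; the function body is a placeholder added for illustration. It also shows one thing that linters and tracebacks will see: the alias is the same object and keeps the original `__name__`.

```python
def original_name(*args):
    """Return the arguments unchanged (placeholder body for illustration)."""
    return args


alias = original_name  # a second name bound to the same function object

print(alias is original_name)  # True
print(alias.__name__)          # original_name
print(alias(1, 2))             # (1, 2)
```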

Python identifiers, PEP 8, and consistency

Posted Dec 1, 2021 2:45 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (3 responses)

> Angelico, along with others in the thread, pointed to the first section of PEP 8, which is titled "A Foolish Consistency is the Hobgoblin of Little Minds" (from the Ralph Waldo Emerson quote). That section makes it clear that the PEP is meant as a guide; consistency is most important at the function and module level, with project-level consistency being next in line. Any of those is more important than rigidly following the guidelines. As Angelico put it: "When a style guide becomes a boat anchor, it's not doing its job."

I hate this section of the PEP. You can cite it as justification for literally any violation of PEP 8, regardless of whether there's a good reason to violate PEP 8. In my experience, it's very uncommon for people citing this section of the PEP to actually explain the underlying justification for making an exception, even though that very section explicitly states that "Consistency with this style guide is important."

As for consistency within the standard library, or within individual modules... that's a terrible argument. The standard library is wholly inconsistent with itself, and the collections module is an excellent example of this inconsistency. This is actually an argument *in favor* of picking a specific style and standardizing on it, rather than continuing with the status quo. Of course, if you have to pick one style, you should probably pick the PEP 8 style, and so this is not a refutation at all.

Python identifiers, PEP 8, and consistency

Posted Dec 1, 2021 14:43 UTC (Wed) by martin.langhoff (guest, #61417) [Link] (2 responses)

_Too much_ consistency can asphyxiate. Python has generally struck a good balance at the consistent and tidy end of the spectrum.

That's one of the points where having a lead everyone trusts to make calls in these gray zones is key.

At the other extreme of the range, there's PHP to consider :-)

Python identifiers, PEP 8, and consistency

Posted Dec 1, 2021 19:39 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

It is obvious that there is a tradeoff. The problem is, I've seen people say "there is a tradeoff, therefore the status quo is fine." But that's a non sequitur. The fact that a tradeoff exists has nothing to do with the question of whether or not the status quo is at the right point of that tradeoff. Arguments should be grounded in the specific facts of each individual case, and you almost never see that when people start bringing out the "foolish consistency" quote.

Python identifiers, PEP 8, and consistency

Posted Dec 13, 2021 17:19 UTC (Mon) by nye (subscriber, #51576) [Link]

> At the other extreme of the range, there's PHP to consider :-)

If anything, I would say that PHP is *closer* to the consistent end of the range than Python is, at least if you consider its trajectory. Many of the changes of the last four or five years have been in the direction of improving consistency - sometimes in ways that are similar to those rejected in this Python discussion. Notable are certain cases where "backwards compatibility" was the only defence for otherwise indefensible behaviour, like the weirdness in `implode()` - PHP is definitely more willing than post-3.0 Python to break compatibility in favour of correctness, and consistency is increasingly a part of that.

Granted there are a lot of standard library functions which are simple wrappers around C functions, and in those cases they normally opt for "consistency" with the C function rather than with each other. I personally wish it had fewer of these very shallow wrappers, especially for very widely used libraries like cURL, but I can see the argument.

Python identifiers, PEP 8, and consistency

Posted Dec 1, 2021 17:04 UTC (Wed) by jmaa (guest, #128856) [Link]

Remember when Tony Hoare called the null pointer a billion dollar mistake?

In the spirit of quantifying mistakes, let's try a Fermi-style estimate of how much money and time might be wasted in migrating codebases to use new consistent identifiers, in a scenario where the original names are deprecated. I pulled these numbers from my ass, but they sound plausible:

1 000 000 000 lines of actively maintained Python code
0.001 incidence ratio of deprecation
1 minute per incidence
40$ hourly programmer wage

So that's around 16666 wasted work hours, and $666666 down the drain. This only counts the direct migration work, not all of the political drama and dread related to backwards-incompatible version upgrades, or the costs incurred by botched migrations.
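The arithmetic behind those totals can be reproduced in a few lines. The inputs are the commenter's own guesses, copied as given; only the variable names are invented here.

```python
# Inputs are the rough guesses from the comment, not measured data.
lines_of_code = 1_000_000_000
incidence_ratio = 0.001      # fraction of lines touched by a deprecation
minutes_per_incidence = 1
hourly_wage_usd = 40

incidences = lines_of_code * incidence_ratio
hours = incidences * minutes_per_incidence / 60
cost_usd = hours * hourly_wage_usd

print(f"{incidences:,.0f} incidences, {hours:,.0f} hours, ${cost_usd:,.0f}")
```

A million one-minute fixes comes to roughly 16,667 hours and about $666,667, matching the figures above after rounding.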

Python identifiers, PEP 8, and consistency

Posted Dec 1, 2021 17:24 UTC (Wed) by pj (subscriber, #4506) [Link] (2 responses)

I feel like people who want that level of consistency could make a `pep8ed` module that would be just aliases.

from pep8ed.datetime import DateTime, TimeDelta

they could even write code to monkeypatch such aliases in so all they'd have to do is something like:

from pep8er import pep8ify
import datetime

pep8ify(datetime)

now = datetime.DateTime.now()

...once this kind of thing is done, it arguably could go live with stdlib. It'd be cute to be able to

import __foolish_consistency__

to enable such a feature :)
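A rough sketch of what such a `pep8ify()` helper might look like is below. Everything about it is an assumption: the function signature, the explicit alias mapping (word boundaries such as "TimeDelta" cannot be derived from "timedelta" automatically), and the choice to patch the module in place.

```python
import datetime
import types


def pep8ify(module: types.ModuleType, aliases: dict[str, str]) -> None:
    """Add CapWords aliases to `module` for the classes named in `aliases`."""
    for old_name, new_name in aliases.items():
        obj = getattr(module, old_name)
        if not isinstance(obj, type):
            raise TypeError(f"{old_name!r} is not a class")
        # Refuse to overwrite an unrelated attribute; re-applying is harmless.
        if getattr(module, new_name, obj) is not obj:
            raise ValueError(f"{new_name!r} already exists in {module.__name__}")
        setattr(module, new_name, obj)


# Hypothetical mapping; these alias names are not part of the real datetime module.
pep8ify(datetime, {"datetime": "DateTime", "timedelta": "TimeDelta"})

now = datetime.DateTime.now()
print(type(now) is datetime.datetime)  # True: the alias is the same class
```

Because this monkeypatches a shared module object, the aliases become visible to every other importer in the process, which is one reason to treat it as a thought experiment rather than a recommendation.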

Python identifiers, PEP 8, and consistency

Posted Dec 2, 2021 6:32 UTC (Thu) by buck (subscriber, #55985) [Link] (1 responses)

Indeed, this got me thinking of the "six" module and wondering whether an "eight" module would allow one to opt into PEP 8-ified naming, but your post has me sold on `import __foolish_consistency__` instead.

Python identifiers, PEP 8, and consistency

Posted Dec 30, 2021 3:45 UTC (Thu) by moxfyre (guest, #13847) [Link]

Puhleez! That should be…

`from __future__ import foolish_consistency`

Python identifiers, PEP 8, and consistency

Posted Dec 2, 2021 13:47 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

This conversation gets at diminishing returns.

Yeah, `from datetime import datetime` is an eyesore, but it makes an important point about the history of the project.

The newbie marks a turning point when the basics of the language/library are understood, and delving into the 'hysterical raisins' why things are the way they are becomes of interest.

This sort of totalitarian cleanup would be appropriate for a Python4, when the GIL is relaxed, and the Promised Land entered.


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds