Development

An "enum" for Python 3

By Jake Edge
May 22, 2013

Designing an enumeration type (i.e. "enum") for a language may seem like a straightforward exercise, but the recently "completed" discussions over Python's PEP 435 show that it has a few wrinkles. The discussion spanned several long threads in two mailing lists (python-ideas, python-devel) going back to January in this particular iteration, but the idea is far older than that. A different approach was suggested in PEP 354, which was proposed in 2005 but rejected at that time, largely due to lack of widespread interest. A 2010 discussion also led nowhere (at least in terms of the standard library), but the most recent discussions finally bore fruit: Guido van Rossum accepted PEP 435 on May 9.

The basic idea is to have a class that implements an enum, which, in Python, might look a lot like:

    from enum import Enum

    class Color(Enum):
        red = 1
        green = 2
        blue = 3
That would allow using Color.green (and the others) as a constant, effectively. Not only would Color.blue have a value, but it would also have a name ('blue') and an order (based on the declaration order). Enums can also be iterated over, so that:
    for color in Color:
        print(color, color.name, color.value)
gives:
    Color.red red 1
    Color.green green 2
    Color.blue blue 3
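
Members can also be looked up by value or by name. What follows is a minimal sketch, assuming the enum module as it later shipped in the standard library (Python 3.4 and newer); the Color class is the one from the example above, repeated so that the snippet runs on its own.

```python
from enum import Enum

class Color(Enum):
    red = 1
    green = 2
    blue = 3

# Look a member up by its value or by its name.
by_value = Color(2)
by_name = Color['green']

# Both lookups hand back the very same member object.
same_member = by_value is by_name is Color.green

# Iteration follows declaration order.
names = [color.name for color in Color]
```

Because each member is a singleton, identity checks with "is" work as well as equality tests.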

Along the way, there were several different enum proposals made. Ethan Furman offered one that incorporated multiple types of enum, including ones for bit flags, string-valued enums, and automatically numbered sequences. Alex Stewart came up with a different syntax for defining enums to avoid the requirement to specify each numeric value. Neither made it to the PEP stage, though pieces of both were adopted into the first draft of PEP 435, which was authored by Eli Bendersky and Barry Warsaw.

There are a couple of fairly obvious motivations for adding enums, which were laid out in the PEP. An immutable set of related, constant values is a useful construct. Making them their own type, rather than just using sequences of some other basic type (like integer or string), means that error checking can be done (e.g. no duplicates) and that nonsensical operations can raise errors (e.g. Color.blue * 42). Finally, it is convenient to be able to declare enum members once but to still be able to get a string representation of the member name (i.e. without some kind of overt assignment like: green.name='green').
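
Those motivations can be seen in a short sketch, again assuming the enum module as it shipped in the standard library (Python 3.4 and newer). The Broken class is a hypothetical name used only to trigger the duplicate-name check.

```python
from enum import Enum

class Color(Enum):
    red = 1
    green = 2
    blue = 3

# Arithmetic on a plain enum member is rejected.
try:
    Color.blue * 42
    multiply_error = None
except TypeError as exc:
    multiply_error = exc

# Members cannot be rebound once the class exists.
try:
    Color.red = 10
    rebind_error = None
except AttributeError as exc:
    rebind_error = exc

# Reusing a member name inside one class body is an error.
try:
    class Broken(Enum):  # hypothetical class, for illustration only
        red = 1
        red = 2
    duplicate_error = None
except TypeError as exc:
    duplicate_error = exc

# The member name comes for free, with no separate assignment.
label = Color.green.name
```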

Some of the use cases mentioned early in the discussion of the PEP are for values like stdin and stdout, the flags for socket() or seek() calls, HTTP error codes, opcodes from the dis (Python bytecode disassembly) module, and so forth. One of the questions that was immediately raised about the original version of the PEP was its insistence that "Enums are not integers!", so ordered comparisons like:

    Color.red < Color.green
would raise an exception, though equality tests would not:
    print(Color.green == 2)
    True
To some, that seemed to run directly counter to the whole idea of an enum type, but allowing ordered comparisons has some unexpected consequences as Warsaw described. Two different enums could be compared with potentially nonsensical results:
    print(MyAnimal.cat == YourAnimal.dog)
    True
In general, the belief is that "named integers" is a small subset of the use cases for enums, and that most uses do not need ordered comparisons. But, the final accepted PEP does have an IntEnum variant that provides the ordering desired by some. IntEnum members are also a subclass of int, so they can be used to replace user-facing constants in the standard library that are already treated as integers (e.g. HTTP error codes, socket() and seek() flags, etc.).
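
The difference between the two classes can be sketched as follows, assuming the standard library's enum module (Python 3.4 and newer). Status is a hypothetical enum standing in for the kind of integer constants mentioned above.

```python
from enum import Enum, IntEnum

class Color(Enum):
    red = 1
    green = 2

class Status(IntEnum):  # hypothetical name, for illustration only
    ok = 200
    not_found = 404

# Ordered comparison of plain Enum members raises TypeError.
try:
    Color.red < Color.green
    ordering_allowed = True
except TypeError:
    ordering_allowed = False

# IntEnum members order like the integers they wrap...
int_ordering = Status.ok < Status.not_found

# ...and can stand in wherever an int is expected.
is_int = isinstance(Status.not_found, int)
equals_int = Status.not_found == 404
bucket = Status.not_found // 100
```

Code that already compares a status against a bare integer keeps working if the constant is replaced by an IntEnum member, which is the compatibility argument made above.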

A second revision of the PEP was posted in April, after lengthy discussion both in python-devel and python-ideas. Furman offered up another proposal, this time as an unnumbered PEP with four separate classes for different types of enums. Two different views of enums arose in the discussion, as Furman summarized:

There seems to be two basic camps: those that think an enum should be valueless, and have nothing to do with an integer besides using it to select the appropriate enumerator [...] and those for whom the integer is an integral part of the enumeration, whether for sorting, comparing, selecting an index, or whatever.

The critical aspect of using or not using an integer as the base type is: what happens when an enumerator from one class is compared to an enumerator from another class? If the base type is int and they both have the same value, they'll be equal -- so much for type safety; if the base type is object, they won't be equal, but then you lose your easy to use int aspect, your sorting, etc.

Worse, if you have the base type be an int, but check for enumeration membership such that Color.red == 1 == Fruit.apple, but Color.red != Fruit.apple, you open a big can of worms because you just broke equality transitivity (or whatever it's called). We don't want that.
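
The trade-off Furman describes can be illustrated with the two base classes that ended up in the standard library's enum module (Python 3.4 and newer). This is only a sketch; all four class names are hypothetical.

```python
from enum import Enum, IntEnum

class Color(Enum):
    red = 1

class Fruit(Enum):
    apple = 1

class IntColor(IntEnum):
    red = 1

class IntFruit(IntEnum):
    apple = 1

# Object-based members: type safe, but no integer behavior.
plain_cross_equal = Color.red == Fruit.apple
plain_equals_int = Color.red == 1

# Integer-based members: convenient, but unrelated enums compare equal.
int_cross_equal = IntColor.red == IntFruit.apple
int_chain = IntColor.red == 1 == IntFruit.apple
```

In both cases equality stays transitive: the plain members equal neither each other nor 1, while the integer-based members equal both.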

Furman's proposal looked overly complex to Bendersky and others commenting on a fairly short python-ideas thread. Meanwhile in python-devel, another monster thread was spinning up. The first objection to the revised PEP was in raising a NotImplementedError when doing ordered comparisons of enum members. That was quickly dispatched with a recognition that TypeError made far more sense. Other issues, such as the ordered comparison issue that was handled with IntEnum in the final version, did not resolve quite as quickly.

One question, originally raised by Antoine Pitrou, concerned the type of the enum members. The early PEP revisions considered Color.red to not be an instance of the Color class, and Warsaw strongly defended that view. At some level, that makes sense (since the members are actually attributes of the class), but it is confusing in other ways. In a sub-thread, Van Rossum, Warsaw, and others looked at the pros and cons of the types of enum members, as well as implementation details of various options. In the end, Van Rossum made some pronouncements on various features, including the question of member type, so:

    isinstance(Color.blue, Color)
    True
is now an official part of the specification.

As Python's benevolent dictator for life (BDFL), which is Van Rossum's only-semi-joking title, he can put an end to arguments and/or "bikeshedding" about language features. In the same thread, he made some further pronouncements (along with a plea for a halt to the bikeshedding). It is a privilege that he exercises infrequently, but it is clearly useful to the project to have someone in that role. As with Linus Torvalds for the kernel, it can be quite helpful to have someone who can stop a seemingly endless thread.

Van Rossum's edicts came after Furman summarized the outstanding issues (after a summary request from Georg Brandl). That is a fairly common occurrence in long-running Python threads: someone will try to boil down the differences into a concise list of outstanding issues. Another nice feature of Python discussions is their tone, which is generally respectful and flame-free. Participants certainly disagree, sometimes strenuously, but the tone is refreshingly different from many other projects' mailing lists.

Not everyone is happy with the end result for enums, however. Andrew Cooke is particularly sad about the outcome. He points out that several expected behaviors for enums are not present in PEP 435:

    class Color(Enum):
        red = 1
        green = 1
is not an error; Color.green is an alias for Color.red (a dubious "feature", he noted with a bit of venom). In addition, there is a way to avoid having to assign values for each enum member (auto-numbering, essentially), but its syntax is clunky:
    Color = Enum('Color', 'red green blue')
Beyond having to repeat the class name as a string (which violates the "don't repeat yourself" (DRY) principle), it starts the numbering from one, rather than zero. Nick Coghlan responded to Cooke's complaints by more or less agreeing with the criticism. There is still room for improvement in Python enums, but PEP 435 represents a solid step forward, according to Coghlan.
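
Both complaints are easy to reproduce. The sketch below assumes the enum module as it shipped in the standard library (Python 3.4 and newer). It also uses that module's unique decorator, which is not discussed above, to show one way to opt out of aliasing; StrictColor and Shade are hypothetical names.

```python
from enum import Enum, unique

class Color(Enum):
    red = 1
    green = 1  # same value: an alias for red, not a new member

is_alias = Color.green is Color.red
members = list(Color)  # aliases are skipped during iteration

# The unique decorator turns duplicate values into an error.
try:
    @unique
    class StrictColor(Enum):  # hypothetical class, for illustration only
        red = 1
        green = 1
    unique_error = None
except ValueError as exc:
    unique_error = exc

# Functional API: members are numbered automatically, starting at 1.
Shade = Enum('Shade', 'red green blue')
first_value = Shade.red.value
shade_values = [shade.value for shade in Shade]
```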

It is instructive to watch the design of a language feature play out in public, as it does for Python (and other languages). Enums are something that the developers will have to live with for a long time, so it is not surprising that there would be lots of participation and examination of the feature from many different angles. While PEP 435 probably didn't completely satisfy anyone's full set of requirements, there is still room for more features, both in the standard library and elsewhere, as Coghlan pointed out. The story of enums in Python likely does not end here.

Python and implicit string concatenation

By Jake Edge
May 22, 2013

In a posting with a title hearkening back to a famous letter of old ("Implicit string literal concatenation considered harmful?"), Guido van Rossum opened up a python-ideas discussion about a possible feature deprecation. In particular, he had been recently bitten by a hard-to-find bug in his code because of Python's implicit string concatenation. He wondered if it was perhaps time to consider deprecating the feature, which was added for "flimsy" reasons, he said. As the discussion shows, however, getting rid of dubious language features is a tricky task at best.

The problem stems from Python's behavior when faced with two adjacent string literals: it concatenates the two strings. Van Rossum ran into difficulties and got an argument count exception because he forgot a comma. He wrote:

    foo('a' 'b')
when he really wanted:
    foo('a', 'b')
The former passes one argument (the string "ab"), while the latter passes two. This implicit concatenation can cause similar (but potentially even harder to spot) problems in things like lists:
    [ 'a', 'b',
      'c', 'd'
      'e', 'f' ]
which creates a five-element list with "de" as the fourth element. The reason Van Rossum added the feature to Python is, by his own admission, questionable: "I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros". Beyond that, the string concatenation operator ("+") is evaluated at compile time when both operands are string literals, so there is no runtime penalty to constructing long strings that way. Given all of that, should the feature be dumped in some future version of Python?
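
Both failure modes can be reproduced in a few lines. In this sketch, foo() is a hypothetical two-argument function standing in for the one in Van Rossum's code, which the posting does not show.

```python
def foo(first, second):
    """Hypothetical two-argument function, for illustration only."""
    return first + second

# The missing comma turns two arguments into the single string 'ab',
# so the call fails with an argument count error.
try:
    foo('a' 'b')
    call_error = None
except TypeError as exc:
    call_error = exc

# The same slip inside a list raises nothing; two elements are
# silently merged into one.
items = ['a', 'b',
         'c', 'd'
         'e', 'f']
```

The function call at least fails loudly; the list simply ends up one element short.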

There were some who quickly jumped on the deprecation bandwagon. It is, evidently, a common problem—one that is rather irritating to hit. But other commenters noted some problems in blindly requiring the + operator. Perhaps the biggest problem is that the "%" interpolation operator has higher precedence than +, which means:

    print("long string %d " +
          "another long string %s" % (2, 'foo'))
would not work at all using the "+" operator, but would work fine when omitting it. In the thread, it was mentioned that the % operator may be slated for eventual deprecation itself, but the same problem exists with its replacement: the format() string method. So, a simple substitution that adds + simply won't work in all cases and additional parentheses will be required.
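
The precedence problem, and the parentheses needed to work around it, can be sketched as follows. The variable names are illustrative only.

```python
args = (2, 'foo')

# % binds more tightly than +, so only the second literal is formatted,
# and it has one placeholder for two values.
try:
    broken = "long string %d " + "another long string %s" % args
except TypeError as exc:
    broken = exc

# Adjacent literals are joined before % is applied.
juxtaposed = "long string %d " "another long string %s" % args

# With +, the concatenation has to be parenthesized explicitly.
parenthesized = ("long string %d " + "another long string %s") % args

# str.format() has the same problem, but fails silently: only the
# second literal is formatted and the unused argument is ignored.
partial = "first {} " + "second {}".format(*args)
```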

Another alternative, using triple-quoted strings, has its own set of problems, mostly related to handling indentation inside those strings. Adding some kind of "dedent" (i.e. un-indent) operation or syntax into the language might help. Currently:

    '''This string
       will have "extra"
       spaces'''
will result in a string with multiple spaces before "will" and "spaces".
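
A run-time workaround already exists in the standard library's textwrap.dedent() function, though it is a function call rather than the kind of syntax being discussed in the thread. A minimal sketch, assuming every line of the literal is given the same indentation:

```python
import textwrap

# Continuation lines keep the indentation used in the source file.
raw = '''This string
   will have "extra"
   spaces'''

# Starting the literal with a backslash-newline lets every line share
# one indentation level, which dedent() then strips.
dedented = textwrap.dedent('''\
    This string
    will not have "extra"
    spaces''')
```

Note that dedent() removes only whitespace common to all lines, so it would leave the first literal unchanged, since its first line has no indentation.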

There was also a suggestion to add a new "..." operator that would indicate a continued string, but Stephen J. Turnbull (and others) didn't think that justified adding a new operator (the ellipsis), especially given that the symbol is already used for custom list slice operations.

Additional suggestions included a new string modifier (prefacing the string literal with "m" for example) to indicate a continued string:

    a = [m'abc'
          'def']
Another "farfetched" idea was to allow compile-time string processors to be specified for string literals:
    !merge!'''\
        abc
        def'''
which would run the "merge" processor on the string, creating "abcdef".

There was no end to the suggested alternatives to implicit concatenation—there's a reason the mailing list is called python-ideas after all—but participants were fairly evenly split on whether to deprecate the feature. There were enough complaints about doing so that it seems unlikely that concatenation by juxtaposition will be deprecated any time soon. As Van Rossum noted, though, it is a feature that would almost certainly never pass muster if it were proposed today. Furthermore:

I do realize that this will break a lot of code, and that's the only reason why we may end up punting on this, possibly until Python 4, or forever. But I don't think the feature is defensible from a language usability POV. It's just about backward compatibility at this point.

Van Rossum's lament should be a helpful reminder for language designers. It is very difficult to get rid of a feature once it has been added. That is especially true for a low-level syntax element like literal strings; just ask the developer of make about the tab requirement for makefiles. For Python, concatenation by juxtaposition seems likely to be around for a long time to come, perhaps forever.

Brief items

Quotes of the week

So the current state of the art is just to copy & paste ScreenInit and friends from another driver, because the documentation wouldn't actually be any shorter than the hundreds of lines of code.
Daniel Stone (thanks to Arthur Huillet)

Ask a programmer to review 10 lines of code, he'll find 10 issues. Ask him to do 500 lines and he'll say it looks good.
Giray Özil (from a retweet by Randall Arnold)

QEMU 1.5.0 released

Version 1.5.0 of the QEMU hardware emulator is out. "This release was developed in a little more than 90 days by over 130 unique authors averaging 20 commits a day. This represents a year-to-year growth of over 38 percent making it the most active release in QEMU history." Some of the new features include KVM-on-ARM support, a native GTK+ user interface, and lots of hardware support and performance improvements. See the change log for lots of details.

Perl 5.18.0 released

The Perl 5.18.0 release is out. "Perl v5.18.0 represents approximately 12 months of development since Perl v5.16.0 and contains approximately 400,000 lines of changes across 2,100 files from 113 authors." See this perldelta page for details on what has changed.

New Python releases

Several point releases of Python are now available. Benjamin Peterson announced the release of 2.7.5 on May 15, and Georg Brandl announced 3.2.5 and 3.3.2 on May 16. The primary focus of the releases is regression fixes, so users are encouraged to upgrade.

Newsletters and articles

Development newsletters from the past week

Blender dives into 3D printing industry (Libre Graphics World)

Libre Graphics World looks at the use of Blender in 3D printing; the recent 2.67 release includes a "3D printing toolbox." "While Blender cannot help with making actual devices easier to use, it definitely could improve designing printable objects. And that's exactly what happened last week, when Blender 2.67 was released."

Page editor: Nathan Willis

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds