By Jake Edge
May 22, 2013
Designing an enumeration type (i.e. "enum") for a language may seem like a
straightforward exercise, but the recently "completed" discussions over
Python's PEP 435
show that it has a few wrinkles. The discussion spanned several long
threads in two mailing lists
(python-ideas, python-dev) going back to January in this particular
iteration, but the
idea is far older than that. A different approach was suggested in PEP 354, which was
proposed in
2005 but rejected at that time, largely due to lack of widespread interest.
A 2010
discussion also led nowhere (at least in terms of the standard
library), but the most recent discussions finally bore fruit: Guido van
Rossum accepted PEP 435 on May 9.
The basic idea is to have a class that implements an enum, which, in
Python, might look a lot like:
    from enum import Enum

    class Color(Enum):
        red = 1
        green = 2
        blue = 3
That would effectively allow Color.green (and the others) to be used as a
constant.
Not only would
Color.blue have a value, but it would also have a
name ('blue') and an order (based on the declaration order). Enums can
also be iterated over, so that:
    for color in Color:
        print(color, color.name, color.value)
gives:
    Color.red red 1
    Color.green green 2
    Color.blue blue 3
Along the way, there were several different enum proposals made. Ethan
Furman offered one that incorporated
multiple types of enum, including ones for bit flags, string-valued enums,
and automatically numbered sequences. Alex Stewart came up with a different syntax for defining
enums to avoid the requirement to specify each numeric value. Neither made
it to the PEP
stage, though pieces of both were adopted
into the first draft of PEP 435, which was
authored by Eli Bendersky and Barry Warsaw.
There are a couple of fairly obvious motivations for adding enums, which
were laid out in the PEP. An immutable set of related, constant values
is a useful construct. Making them their own type, rather than just using
sequences of some other basic type (like integer or string), means that
error checking can be done (e.g. no duplicates) and that nonsensical
operations can raise errors (e.g. Color.blue * 42).
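As a rough illustration of that error checking, here is a minimal sketch that assumes the enum module that later shipped in the standard library (Python 3.4 and newer) as the implementation of PEP 435. The try_multiply() helper is hypothetical and exists only for this example.

```python
from enum import Enum


class Color(Enum):
    red = 1
    green = 2
    blue = 3


def try_multiply(member, factor):
    """Hypothetical helper: report what happens when a member is multiplied."""
    try:
        return member * factor
    except TypeError as exc:
        return "TypeError: {}".format(exc)


# Arithmetic on a plain Enum member is rejected rather than silently computed.
result = try_multiply(Color.blue, 42)
print(result)

# The member name is available without any separate assignment.
print(Color.green.name)
```

Because Color.blue is not an integer, the multiplication has no sensible meaning and the interpreter refuses it instead of producing 126.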
Finally, it is convenient to be able to declare enum members once but to
still be able to get a string representation of the member name
(i.e. without some kind of overt assignment like: green.name='green').
Some of the use cases mentioned early in
the discussion of the PEP are for values like stdin and
stdout, the
flags for socket() or seek() calls, HTTP error codes,
opcodes from the
dis (Python bytecode disassembly) module, and so forth. One of
the questions that was
immediately raised about the original
version of the PEP was its insistence that "Enums are not
integers!", so ordered comparisons like:
    Color.red < Color.green
would raise an exception, though equality tests would not:
    print(Color.green == 2)
    True
To some, that seemed to run directly counter to the whole idea of an enum
type, but allowing ordered comparisons has some
unexpected consequences as Warsaw
described. Two different enums could be compared with potentially
nonsensical results:
    print(MyAnimal.cat == YourAnimal.dog)
    True
In general, the belief is that "named integers" is a small subset of the
use cases for enums, and that most uses do not need ordered comparisons.
But, the final accepted PEP does have an
IntEnum variant
that provides the ordering desired by some. IntEnum members are also a
subclass of
int, so they can be used to replace user-facing
constants in the
standard library that are already treated as integers (e.g. HTTP error codes,
socket() and
seek() flags, etc.).
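The difference between the two variants can be sketched as follows, again assuming the standard library enum module from Python 3.4 and later. The Whence class and its member names are hypothetical; its values simply mirror the integers that seek() already accepts, and the ordered() helper is only there to capture the exception.

```python
import io
from enum import Enum, IntEnum


class Color(Enum):
    red = 1
    green = 2


class Whence(IntEnum):
    # Hypothetical names; the values match the integers seek() already takes.
    start = 0
    current = 1
    end = 2


def ordered(a, b):
    """Return a < b, or the string "TypeError" if the comparison is refused."""
    try:
        return a < b
    except TypeError:
        return "TypeError"


plain_result = ordered(Color.red, Color.green)   # plain Enum: no ordering
int_result = ordered(Whence.start, Whence.end)   # IntEnum: ordered like ints

# An IntEnum member is an int, so it can be passed where an int is expected.
position = io.BytesIO(b"abcdef").seek(-1, Whence.end)

print(plain_result, int_result, position)
```

The last line is the compatibility argument in miniature: code that has always passed a bare integer to seek() keeps working when the constant becomes an IntEnum member.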
A second revision of the PEP was posted in
April, after lengthy discussion in both python-dev and python-ideas.
Furman offered up another proposal, this
time as an unnumbered
PEP with four separate classes for different types of enums. Two
different views of enums
arose in the discussion, as Furman summarized:
There seems to be two basic camps: those that think an enum
should be valueless, and have nothing to do with an integer besides using
it to select the appropriate enumerator [...] and those for whom the
integer is an integral part
of the enumeration, whether for sorting, comparing, selecting an index, or
whatever.
The critical aspect of using or not using an integer as the base type is:
what happens when an enumerator from one class is compared to an enumerator
from another class? If the base type is int and they both have the same
value, they'll be equal -- so much for type safety; if the base type is
object, they won't be equal, but then you lose your easy to use int aspect,
your sorting, etc.
Worse, if you have the base type be an int, but check for enumeration
membership such that Color.red == 1 == Fruit.apple, but Color.red !=
Fruit.apple, you open a big can of worms because you just broke equality
transitivity (or whatever it's called). We don't want that.
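The trade-off Furman describes can be seen in a small sketch. It assumes the behavior of the enum module as it eventually shipped in Python 3.4, which may differ from the drafts under discussion at the time; all four class names are hypothetical.

```python
from enum import Enum, IntEnum


class Color(Enum):
    red = 1


class Fruit(Enum):
    apple = 1


class IntColor(IntEnum):
    red = 1


class IntFruit(IntEnum):
    apple = 1


# Object-based members: type-safe, but no integer behavior at all.
object_based = Color.red == Fruit.apple
plain_vs_int = Color.red == 1

# Integer-based members: convenient, but unrelated enums compare equal.
int_based = IntColor.red == IntFruit.apple

# Equality stays transitive in the integer case: all three links hold.
chain = (IntColor.red == 1, 1 == IntFruit.apple, IntColor.red == IntFruit.apple)

print(object_based, plain_vs_int, int_based, chain)
```

Neither variant breaks transitivity; each simply picks one side of the trade-off and applies it consistently.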
Furman's proposal looked overly complex to Bendersky and others commenting
on a fairly short python-ideas thread. Meanwhile, in python-dev, another
monster thread was spinning up. The first objection to the revised PEP was
to raising a NotImplementedError when doing ordered comparisons of enum
members. That was quickly dispatched with a recognition that
TypeError made far more sense. Other issues, such as the ordered
comparison issue that was handled with IntEnum in the final version, did
not resolve quite as
quickly.
One question, originally raised by Antoine
Pitrou, concerned the type of the enum members. The early
PEP revisions considered Color.red to not be an instance of the
Color class, and Warsaw strongly defended that view. At some
level, that makes sense (since the members
are actually attributes of the class), but it is confusing in other ways.
In a sub-thread, Van Rossum, Warsaw, and
others looked at the pros and cons of the types of enum members, as well as
implementation details of various options. In the end, Van Rossum made some pronouncements on various
features, including the question of member type, so:
    isinstance(Color.blue, Color)
    True
is now an official part of the specification.
As Python's benevolent dictator for life (BDFL), which is Van Rossum's only-semi-joking title, he can put an end to arguments and/or "bikeshedding"
about language features. In the same thread, he made some further pronouncements (along with a plea for
a halt to the bikeshedding). It is a privilege
that he exercises infrequently, but it is clearly useful to the project to
have someone in that role. Much like Linus Torvalds for the kernel, it can
be quite helpful to have someone who can stop a seemingly endless thread.
Van Rossum's edicts came after Furman summarized the outstanding issues (after a
summary request from Georg Brandl). That is a fairly common occurrence in
long-running Python threads: someone will try to boil down the differences
into a concise list of outstanding issues. Another nice feature of Python
discussions is their tone, which is generally respectful and flame-free.
Participants certainly disagree, sometimes strenuously, but the tone
is refreshingly different from many other projects' mailing lists.
Not everyone is happy with the end result for enums, however. Andrew Cooke
is particularly sad
about the outcome. He points out that several expected behaviors for
enums are not present in PEP 435:
    class Color(Enum):
        red = 1
        green = 1
is not an error; Color.green is an alias for Color.red
(a dubious "feature", he noted with a bit of venom).
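A short sketch of the aliasing behavior, assuming the standard library enum module from Python 3.4 and later. The unique decorator used at the end is something that module provides, not something described in the discussion above, and the define_unique_color() wrapper is hypothetical.

```python
from enum import Enum, unique


class Color(Enum):
    red = 1
    green = 1  # same value: becomes an alias for red, not a new member


same_object = Color.green is Color.red
iterated = [member.name for member in Color]   # aliases are skipped
all_names = list(Color.__members__)            # aliases are included


def define_unique_color():
    """Try the same definition under @unique and report the outcome."""
    try:
        @unique
        class StrictColor(Enum):
            red = 1
            green = 1
    except ValueError:
        return "ValueError"
    return "accepted"


print(same_object, iterated, all_names, define_unique_color())
```

Those who share Cooke's expectation can opt in to the stricter check per class, while the default remains permissive.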
In addition, there is a way to avoid having to assign values for each enum
member (auto-numbering, essentially), but its syntax is clunky:
    Color = Enum('Color', 'red green blue')
Beyond having to repeat the class name as a string (which violates the
"don't repeat yourself" (DRY) principle), it starts the numbering from one,
rather than zero. Nick Coghlan
responded
to Cooke's complaints by more or less agreeing with the criticism. There
is still room for improvement in Python enums, but PEP 435 represents a
solid step forward, according to Coghlan.
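The numbering that Cooke objected to can be checked directly. This is a minimal sketch assuming the standard library enum module from Python 3.4 and later; the ZeroColor name and the explicit name/value pairs are a hypothetical workaround, not a syntax proposed in the thread.

```python
from enum import Enum

# The functional API: the class name is repeated as a string, and the
# automatically assigned values start at one.
Color = Enum('Color', 'red green blue')
values = [(member.name, member.value) for member in Color]
print(values)

# Hypothetical workaround: supply (name, value) pairs to start from zero.
names = 'red green blue'.split()
ZeroColor = Enum('ZeroColor', [(name, i) for i, name in enumerate(names)])
zero_values = [(member.name, member.value) for member in ZeroColor]
print(zero_values)
```

The workaround still repeats the class name, so it addresses only the starting value, not the DRY complaint.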
It is instructive to watch the design of a language feature play out in
public as it does for Python (and other languages). Enums are something
that the developers will
have to live with for a long time, so it is not surprising that there would
be lots of participation and examination of the feature from many different
angles. While PEP 435 probably didn't completely satisfy anyone's full set
of requirements, there is still room for more features, both in the
standard library and elsewhere, as Coghlan pointed out. The story of enums
in Python likely does not end here.
By Jake Edge
May 22, 2013
In a posting with a title hearkening back
to a famous letter of old ("Implicit string literal concatenation
considered harmful?"), Guido van Rossum opened up a python-ideas
discussion about
a possible feature deprecation. In particular, he had been recently bitten by a
hard-to-find bug in his code because of Python's implicit string
concatenation. He wondered if it was perhaps time to consider deprecating
the feature, which was added for "flimsy" reasons, he said.
As the
discussion shows, however, getting rid of dubious language features is a
tricky task at best.
The problem stems from Python's behavior when faced with two adjacent
string literals: it concatenates the two strings.
Van Rossum ran into difficulties and got an
argument count exception because he forgot a comma. He wrote:
    foo('a' 'b')
when he really wanted:
    foo('a', 'b')
The former passes one argument (the string "ab"), while the latter passes
two. This implicit concatenation can cause similar (but potentially even
harder to spot)
problems in things like lists:
    [ 'a', 'b',
      'c', 'd'
      'e', 'f' ]
which creates a five-element list with "de" as the fourth element. The
reason Van Rossum added the feature to Python is, by his own admission,
questionable: "I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
inside macros". Beyond that, when the string concatenation operator
("+") joins two literals, it is evaluated at compile time, so there is
no runtime penalty to
constructing long strings that way. Given all of that, should the feature
be dumped in some future version of Python?
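Both pitfalls are easy to reproduce. In the sketch below, count_args() is a hypothetical stand-in for the foo() of the example, used only to show how many arguments actually arrive.

```python
def count_args(*args):
    """Hypothetical stand-in for foo(): report how many arguments arrived."""
    return len(args)


one = count_args('a' 'b')    # missing comma: a single argument, "ab"
two = count_args('a', 'b')   # what was intended

# The missing comma after 'd' silently merges two elements.
letters = [ 'a', 'b',
            'c', 'd'
            'e', 'f' ]

print(one, two, letters)
```

Nothing in either case is a syntax error, which is why the mistake tends to surface far from where it was made.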
There were some who quickly jumped on the deprecation bandwagon. It is,
evidently, a
common problem—one that is rather irritating to hit. But other
commenters noted some problems in blindly requiring the +
operator. Perhaps the biggest problem is that the "%"
interpolation operator has higher precedence than +, which means:
    print("long string %d " +
          "another long string %s" % (2, 'foo'))
would not work at all using the "+" operator, because the % would apply
only to the second literal, but would work fine when omitting it. In the
thread, it was mentioned that the % operator may be slated for eventual
deprecation itself, but the same problem exists with its replacement: the
format() string method. So, a simple substitution that adds + simply
won't work in all cases and additional parentheses will be required.
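The precedence problem, and the extra parentheses needed to work around it, can be sketched like this. The three function names are hypothetical and only serve to keep the variants side by side.

```python
def with_plus():
    """% binds tighter than +, so only the second literal gets formatted."""
    try:
        return "long string %d " + "another long string %s" % (2, 'foo')
    except TypeError:
        return "TypeError"


def with_juxtaposition():
    """Adjacent literals are merged before % is applied."""
    return ("long string %d "
            "another long string %s" % (2, 'foo'))


def with_parentheses():
    """The + version works once the concatenation is grouped explicitly."""
    return ("long string %d " +
            "another long string %s") % (2, 'foo')


print(with_plus())
print(with_juxtaposition())
print(with_parentheses())
```

In the first variant the second literal has one placeholder but receives two values, so the formatting fails before the concatenation is ever attempted.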
Another alternative, using triple-quoted strings, has its own set of
problems, mostly related to handling indentation inside those strings.
Adding some kind of "dedent" (i.e. un-indent) operation or syntax into the
language might help. Currently:
    '''This string
       will have "extra"
       spaces'''
will result in a string with multiple spaces before "will" and "spaces".
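The indentation problem shows up as soon as the literal sits inside an indented block. The sketch below contrasts it with textwrap.dedent() from the standard library, which strips common leading whitespace at run time; that is a library call rather than the language-level operation being discussed, and both function names are hypothetical.

```python
import textwrap


def indented():
    """The continuation lines keep the source indentation inside the string."""
    return '''This string
        will have "extra"
        spaces'''


def dedented():
    """Indent every line equally, then strip the common prefix at run time."""
    return textwrap.dedent('''\
        This string
        will have no "extra"
        spaces''')


print(repr(indented()))
print(repr(dedented()))
```

Unlike implicit concatenation, this approach pays a small cost on every call, which is part of why a compile-time alternative was being sought.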
There was also a suggestion to add a new "..." operator that would
indicate a continued string, but Stephen J. Turnbull (and others) didn't think that justified adding a new operator
(the ellipsis), especially given that the symbol is already used for
custom list slice operations.
Additional suggestions included a new
string modifier (prefacing the string literal with "m" for example) to
indicate a
continued string:
    a = [m'abc'
         'def']
Another "farfetched" idea was to allow compile-time string
processors to be specified for string literals:
!merge!'''\
abc
def'''
which would run the "merge" processor on the string, creating "abcdef".
There was no end to the suggested alternatives to implicit
concatenation—there's a
reason the mailing list is called python-ideas after all—but
participants
were fairly evenly split on whether to deprecate the feature.
There were enough complaints about doing so that it seems unlikely
that concatenation by
juxtaposition will be deprecated any time soon. As Van Rossum noted, though, it is a feature that would
almost certainly never pass muster if it were proposed today. Furthermore:
I do realize that this will break a lot of code, and that's the only
reason why we may end up punting on this, possibly until Python 4, or
forever. But I don't think the feature is defensible from a language
usability POV. It's just about backward compatibility at this point.
Van Rossum's lament should be a helpful reminder for language designers.
It is very difficult to get rid of a feature once it has been added. That is
especially true for a low-level syntax element like literal
strings—just ask
the developer of
make about the tab
requirement for makefiles. For Python, concatenation by juxtaposition
seems likely to be around for a long time to come, perhaps forever.
Brief items
So the current state of the art is just to copy & paste ScreenInit and
friends from another driver, because the documentation wouldn't
actually be any shorter than the hundreds of lines of code.
—
Daniel Stone (thanks to Arthur Huillet)
Ask a programmer to review 10 lines of code, he'll find 10 issues. Ask him to do 500 lines and he'll say it looks good.
—
Giray Özil (from a retweet by Randall Arnold)
Version 1.5.0 of the QEMU hardware emulator is out. "This release
was developed in a little more than 90 days by over 130 unique authors
averaging 20 commits a day. This represents a year-to-year growth of over
38 percent making it the most active release in QEMU history." Some
of the new features include KVM-on-ARM support, a native GTK+ user
interface, and lots of hardware support and performance improvements. See
the change log for lots of
details.
The Perl 5.18.0 release is out. "Perl v5.18.0 represents approximately 12 months of development since Perl
v5.16.0 and contains approximately 400,000 lines of changes across 2,100
files from 113 authors." See
this perldelta
page for details on what has changed.
Several point releases of Python are now available. Benjamin Peterson announced the release of 2.7.5 on May 15, and Georg Brandl announced 3.2.5 and 3.3.2 on May 16. The primary focus of the releases is regression fixes, so users are encouraged to upgrade.
Newsletters and articles
Libre Graphics World looks
at the use of Blender in 3D printing; the recent 2.67 release includes
a "3D printing toolbox." "While Blender cannot help with making
actual devices easier to use, it definitely could improve designing
printable objects. And that's exactly what happened last week, when Blender
2.67 was released."
Page editor: Nathan Willis