
Development

New operators for Python dicts?

By Jake Edge
March 4, 2015

The Python dictionary is a commonly used data structure that supports a rich set of operations. But there are some operations that it lacks—two operators in particular: "+" and "+=". That lack is the subject of a recent discussion on the python-ideas mailing list. There are questions about the precise semantics of the operators, but there is also something of an existential question about the need for operators whose semantics can already be handled using existing operations.

Some background

Dictionaries (or dicts) are also known as associative arrays or hashes in other languages. In essence, they map some key, which is usually—but not always—a string, to some other value. A simple example:

    >>> a_dict = { 'a' : 3, 9 : 7, 'foo' : 'bar' }
    >>> a_dict['a']
    3
    >>> a_dict[9]
    7
    >>> a_dict['bar']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 'bar'
That sets up a_dict as a dict with three elements, then shows accessing various elements of the dict. The last key, "bar", is not present, so attempting to access it results in a runtime KeyError exception.

One of the other fundamental Python types is the list, which provides an ordered sequence of objects, in many ways like arrays in other languages.

    >>> b_list = [ 1, 2, 3 ]
    >>> b_list[2]
    3
But lists have two operators that dicts lack. In particular, lists can be concatenated using + and +=:
    >>> b_list + b_list
    [1, 2, 3, 1, 2, 3]
    >>> b_list += [ 4, 5, 6 ]
    >>> b_list
    [1, 2, 3, 4, 5, 6]
Doing something similar with dicts, though, leads to an exception:
    >>> a_dict + a_dict
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'dict' and 'dict'
A TypeError is also raised for the += operator when used on dicts.

Adding + and +=

But Ian Lee would like to see that change. He first raised the issue in a brief subthread on the python-dev mailing list in a discussion about code review for PEP 448. Lee subsequently moved the topic to python-ideas, where he suggested adding both the + and += operators for dicts to put them on an equal footing with lists.

The semantics of adding two dicts that have no keys in common seem clear: the result is a dict with all of the key/value pairs from both operands. For +, the result is a new dict, while += modifies the dict on the left. The only real question is what to do when there are duplicate keys. Dicts already have an update() method that takes the value for a duplicate key from the argument dict:

    >>> a_dict = { 'a' : 1, 'b' : 2 }
    >>> a_dict.update( { 'b' : 9 } )
    >>> a_dict
    {'a': 1, 'b': 9}
Lee suggested using "last setter wins" semantics for the new operators, as the update() method does, so the value for a duplicate key comes from the right operand:
    >>> a_dict = { 'a' : 1, 'b' : 2 }
    >>> b_dict = { 'b' : 'bar', 'c' : 'baz' }
    >>> a_dict + b_dict
    { 'a' : 1, 'b' : 'bar', 'c' : 'baz' }
    >>> b_dict += a_dict
    >>> b_dict
    { 'a' : 1, 'b' : 2, 'c' : 'baz' }
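The proposed semantics can be tried out with a dict subclass. The following is a minimal sketch, assuming "last setter wins" as Lee proposed; the class name AddDict is hypothetical and not part of Python. It implements + as copy-then-update and += as an in-place update().

```python
class AddDict(dict):
    """Hypothetical dict subclass sketching the proposed + and += operators."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        # + builds a new dict: copy the left operand, then let the right win
        result = AddDict(self)
        result.update(other)
        return result

    def __iadd__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        # += modifies the left operand in place, exactly like update()
        self.update(other)
        return self


a_dict = AddDict({'a': 1, 'b': 2})
b_dict = AddDict({'b': 'bar', 'c': 'baz'})
print(a_dict + b_dict)   # 'b' comes from the right operand
b_dict += a_dict         # b_dict itself is updated
print(b_dict)
```

Returning NotImplemented for non-dict operands lets Python raise its usual TypeError instead of failing inside update().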

Donald Stufft liked the idea behind the change, but didn't like using +. He would rather use "|" to try to make it clearer that it is really more of a set union operation, rather than a concatenation or addition. Ethan Furman, though, sees + as a generic operator for combining things. On the other hand: "I suppose I could come around to '|', though -- it does ease the tension around the behavior of duplicate keys", he said.
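Stufft's set-union framing can be sketched the same way, using __or__() and __ior__(), the special methods behind | and |=. This is a minimal sketch; the class name UnionDict is hypothetical. The keys() views of Python 3 dicts already support | as a set operation, so the keys of the merged dict are exactly the union of the operands' keys.

```python
class UnionDict(dict):
    """Hypothetical dict subclass sketching | and |= as a set-like union."""

    def __or__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        result = UnionDict(self)
        result.update(other)  # duplicate keys: the right operand wins
        return result

    def __ior__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        self.update(other)    # |= updates the left operand in place
        return self


left = UnionDict({'a': 1, 'b': 2})
right = {'b': 'bar', 'c': 'baz'}
merged = left | right
# Key views support set operations, so the key sets can be compared directly.
print(merged.keys() == left.keys() | right.keys())
```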

After a bit of a digression through the question of commutativity (the operators would not be commutative, but that is hardly unique: string concatenation isn't commutative either, for example), Marc-André Lemburg explained that he didn't see the need for +, though += could be useful:
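The commutativity point is easy to check with plain dicts. In this sketch, merged() is a hypothetical helper standing in for the proposed + with "last setter wins" semantics; with a shared key, swapping the operands changes which value survives, while dicts with disjoint keys merge to the same result either way.

```python
def merged(left, right):
    """Hypothetical helper: the proposed dict + spelled with copy() and update()."""
    result = left.copy()
    result.update(right)  # the right operand wins on duplicate keys
    return result


a = {'x': 1, 'y': 2}
b = {'y': 'two', 'z': 3}

# With a shared key, operand order decides which value survives.
print(merged(a, b)['y'], merged(b, a)['y'])
# String concatenation is not commutative either.
print('ab' + 'cd' == 'cd' + 'ab')
```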

However, I don't really see the point in having an operation that takes two dictionaries, creates a new empty one and updates this with both sides of the operand. It may be theoretically useful, but it results in the same poor performance you have in string concatenation.

In applications, you normally just need the update functionality for dictionaries. If you do need a copy, you can create a copy explicitly - but those cases are usually rare.
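Lemburg's performance concern can be made concrete. The following minimal sketch assumes a hypothetical CountingDict whose + copies its left operand, as the proposal implies, and counts the dicts it builds. Folding five dicts together with + creates a new dict for every operator, and each copy contains everything merged so far, much like repeated string concatenation. A single result dict updated in place (the hypothetical merge_all()) avoids those intermediate copies.

```python
class CountingDict(dict):
    """Hypothetical dict with a + operator that counts the dicts it builds."""
    created = 0

    def __add__(self, other):
        CountingDict.created += 1
        result = CountingDict(self)  # copies every key of the left operand
        result.update(other)
        return result


def merge_all(*dicts):
    """Build one new dict and update it in place for each argument."""
    result = {}
    for d in dicts:
        result.update(d)
    return result


parts = [CountingDict({i: i}) for i in range(5)]
total = parts[0]
for part in parts[1:]:
    total = total + part     # each + copies everything merged so far
print(CountingDict.created)  # one new dict per +: four for five operands
print(total == merge_all(*parts))
```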

Having one of those operators without the other seems a bit strange to some, though. Operators in Python are implemented as special methods on objects, so a + b becomes a.__add__(b) (similarly, += uses the __iadd__() special method). Dicts could pick up an __iadd__() method (or the | equivalent: __ior__()), but most developers, especially those new to the language, would probably expect + to work if += did.
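Python's rules for += make that asymmetry visible. When a type defines __add__() but not __iadd__(), x += y falls back to x = x + y, so the name is rebound to a new object; when only __iadd__() exists, += works but + raises TypeError. A minimal sketch with two hypothetical dict subclasses:

```python
class PlusOnly(dict):
    """Hypothetical dict subclass that defines __add__ but no __iadd__."""

    def __add__(self, other):
        result = PlusOnly(self)
        result.update(other)
        return result


class InPlaceOnly(dict):
    """Hypothetical dict subclass that defines __iadd__ but no __add__."""

    def __iadd__(self, other):
        self.update(other)
        return self


# Without __iadd__, d += x falls back to d = d + x, which rebinds d.
d = PlusOnly({'a': 1})
alias = d
d += {'b': 2}
print(d is alias, alias)    # alias still names the old, unchanged dict

# With only __iadd__, += updates in place, but + has nothing to call.
e = InPlaceOnly({'a': 1})
e += {'b': 2}
plus_failed = False
try:
    e + {'c': 3}
except TypeError:
    plus_failed = True
print(e, plus_failed)
```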

Other options

In the case of duplicated keys, there are (at least) two other options. An exception could be raised when combining two dicts that have keys in common, as Greg Ewing suggested, though that might be surprising. Another option would be to apply the addition operator to the two values, but that might cause its own set of surprises:

    >>> a_dict = { 'a' : 2, 'b' : 'foo' }
    >>> b_dict = { 'a' : 4, 'b' : 'foo' }
    >>> c_dict = { 'b' : 3 }
    >>> a_dict + b_dict
    { 'a' : 6, 'b' : 'foofoo' }
    >>> b_dict + c_dict
    ...
    TypeError: cannot concatenate 'str' and 'int' objects
Either the addition/concatenation or the exception might well surprise developers.
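Both alternative policies can be sketched as plain functions. The names merge_strict() and merge_adding() are hypothetical, and raising ValueError for Ewing's option is an assumption; the thread did not settle on an exception type. The adding version reproduces both surprises from the example: strings concatenate, and mixing a str with an int raises TypeError.

```python
def merge_strict(left, right):
    """Ewing's option: refuse to combine dicts that share any key."""
    shared = left.keys() & right.keys()   # key views support set intersection
    if shared:
        # ValueError is an assumed choice of exception for this sketch
        raise ValueError('duplicate keys: %r' % sorted(shared, key=repr))
    result = dict(left)
    result.update(right)
    return result


def merge_adding(left, right):
    """The other option: apply + to the two values of a duplicated key."""
    result = dict(left)
    for key, value in right.items():
        if key in result:
            result[key] = result[key] + value   # may itself raise TypeError
        else:
            result[key] = value
    return result


a_dict = {'a': 2, 'b': 'foo'}
b_dict = {'a': 4, 'b': 'foo'}
c_dict = {'b': 3}
print(merge_adding(a_dict, b_dict))   # numbers add, strings concatenate

strict_failed = False
try:
    merge_strict(a_dict, c_dict)      # both dicts have the key 'b'
except ValueError as exc:
    strict_failed = True
    print('ValueError:', exc)

add_failed = False
try:
    merge_adding(b_dict, c_dict)      # 'foo' + 3 cannot be added
except TypeError as exc:
    add_failed = True
    print('TypeError:', exc)
```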

Lee summarized the ideas and approaches from early on in the thread in a kind of pre-PEP document.

Even though there is a lot of precedent for operators like + and +=, Steven D'Aprano argued that they are actually flawed ideas that should not be propagated further. In his view, the fact that lists have those operators is no argument for adding them elsewhere:

It is *unfortunate* that += works with lists and tuples because + works, not a feature to emulate. Python made the best of a bad deal with augmented assignments: a syntax which works fine in C doesn't *quite* work cleanly in Python, but demand for it [led] to it being supported. The consequence is that every generation of Python programmers now need to learn for themselves that += on non-numeric types has surprising corner cases. Usually the hard way.

D'Aprano described one of those corner cases (which also appears in the Python FAQ) for the tuple immutable sequence type:

    >>> t = ([], None)
    >>> t[0] += [1]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment
    >>> t
    ([1], None)
In the example, t is an immutable sequence of two items: an empty list and None. Because of the way Python handles the += operator, the exception isn't raised until after the "desired" change has been made to the list. As Andrew Barnert explained, Python essentially turns the statement into:
    from operator import getitem, setitem
    setitem(t, 0, getitem(t, 0).__iadd__([1]))
It is the setitem() that fails, but the __iadd__() has already succeeded in changing the list object. Another way to look at it would be:
    >>> l = t[0]
    >>> l += [1]
    >>> t[0] = l
The final assignment is where that sequence fails, but the list object l has already been modified. That "feature" is—at best—a language wart.
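If dicts gained an in-place operator, they would inherit the same corner case. A minimal sketch, using a hypothetical dict subclass whose __iadd__() behaves like update(): the dict inside the tuple is modified before the tuple's item assignment fails.

```python
class AddDict(dict):
    """Hypothetical dict subclass whose += behaves like update()."""

    def __iadd__(self, other):
        self.update(other)
        return self


t = (AddDict({'a': 1}), None)
try:
    t[0] += {'b': 2}   # __iadd__ runs first, then tuple item assignment fails
except TypeError as exc:
    print('TypeError:', exc)
print(t)               # the dict inside the tuple was modified anyway
```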

The subject of dict.__add__() comes up on python-ideas with some frequency, and it is clear there are strong feelings on all of the different sides. Stufft thinks it would make a nice "mini-addition" to the language that might make newer versions a little more attractive:

Similarly doing:
    new_dict = dict1.copy()
    new_dict.update(dict2)
Isn't confusing or particularly hard to use, however being able to type that as new_dict = dict1 + dict2 is more succinct, cleaner, and just a little bit nicer. It adds another small reason why, taken with the other small reasons, someone might want to drop an older version of Python for a newer version.

But Stephen J. Turnbull is not convinced that the semantics are clear enough for most people to find the operators' behavior obvious. He noted that four different ways to handle the duplicate-key problem had been proposed, and added two more, possibly tongue in cheek. In addition, since there are existing ways to perform those operations, adding another would violate the Python "there's only one way to do it" (TOOWTDI) guideline.

Early on, Lee indicated that he would try to shepherd a PEP through the process to see if the operators could be added to dicts. Brett Cannon agreed with that idea:

I think a PEP that finally settled this idea would be good, even if it just ends up being a historical document as to why dicts don't have an __add__ method. Obviously there is a good amount of support both for and against the idea.

That's where things stand now. No PEP has yet appeared, though it seems likely that one will. It is an interesting question in that both sides seem to see their choice as the "obvious" one. There is precedent in that lists have the two operators, but that precedent does lead to some corner cases and warts. Even if the PEP were to be accepted, it would only be a feature for some upcoming version of Python 3—features are no longer being added to Python 2. One suspects that in the end it will come down to what benevolent dictator for life (BDFL) Guido van Rossum thinks—so far he has been silent in the thread.


Brief items

Quotes of the week

"We'll only ship proprietary unauditable software from huge companies with a history of backdoors." wasn't the Lenovo change we needed
John Sullivan

"We as an industry need to fix this." is business-speak for "Magic stuff-fixing elves should come in at night while we sleep and fix this."
Don Marti

I've found that hopping over on #samba-technical on Freenode is usually a good way to corner a team member to get a review on a small patch.

Although they might pretend to be an Eliza bot written in LISP if they're being lazy. Don't fall for playing the Turing Test game with them; they play to lose. :)

Scott Lovenberg (thanks to Michael Wood)


LLVM 3.6 Released

Version 3.6 of the LLVM compiler suite is out. Changes include "many many bug fixes, optimization improvements, support for more proposed C++1z features in Clang, better native Windows compatibility, embedding LLVM IR in native object files, Go bindings, and more." Details can be found in the LLVM 3.6 release notes and the Clang 3.6 release notes.


VLC 2.2.0 released

Version 2.2.0 of the VLC media player has been released. According to the announcement, highlights in the new version include automatic, hardware-accelerated rotation of portrait-orientation videos such as those shot on smartphones, resuming playback at the last point watched in the previous session, in-application download and installation of extensions, support for interactive Blu-ray menus, and "compatibility with a very large number of unusual codecs". The release is available for Linux, Android, and Android TV, plus various Windows and Apple platforms.


IPython 3.0 released

The IPython interactive development system project has announced its 3.0 release. "Support for languages other than Python is greatly improved, notebook UI has been significantly redesigned, and a lot of improvement has happened in the experimental interactive widgets. The message protocol and document format have both been updated, while maintaining better compatibility with previous versions than prior updates. The notebook webapp now enables editing of any text file, and even a web-based terminal (on Unix platforms)." (LWN looked at IPython in 2014).


GNU GNATS 4.2.0 available

Version 4.2.0 of the GNU GNATS bug-tracking tool has been released. This is the first major update of the program in ten years; among the many improvements are the use of Automake as the build system, a license update (GPLv3), and multiple portability fixes.


Buildroot 2015.02 released

Buildroot 2015.02 is available; the changes include reworked handling of shared libraries, a new warning whenever unsafe paths are encountered, and support for new processor architectures.


Calligra 2.9 Released

Version 2.9 of the Calligra office suite has been released. New in this release is the "Calligra Gemini" edition, which allows the same document to be worked on in the desktop environment and on a touchscreen tablet. The release also adds support for Microsoft Word documents in the Okular document viewer, along with numerous updates and fixes.


Newsletters and articles

Development newsletters from the past week


The state of Linux gaming in the SteamOS era (Ars Technica)

Ars Technica takes a look at Linux gaming and at the effect SteamOS has already had on it. The article also considers the future and where SteamOS might (or might not) take things. "This all brings up another major question for SteamOS followers: how long is this "beta" going to last, exactly? While Valve has unquestionably built a viable Linux gaming market from practically nothing, the company's lackadaisical development timeline might be holding the market back from growing even more. In the last year, the initial excitement behind the SteamOS beta launch seems to have given way to "Valve Time" malaise in some ways."


Page editor: Nathan Willis


Copyright © 2015, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds