New operators for Python dicts?
The Python dictionary is a commonly used data structure that supports a rich set of operations. But there are some operations that it lacks—two operators in particular: "+" and "+=". That lack is the subject of a recent discussion on the python-ideas mailing list. There are questions about the precise semantics of the operators, but there is also something of an existential question about the need for operators whose semantics can already be handled using existing operations.
Some background
Dictionaries (or dicts) are also known as associative arrays or hashes in other languages. In essence, they map some key, which is usually—but not always—a string, to some other value. A simple example:
    >>> a_dict = { 'a' : 3, 9 : 7, 'foo' : 'bar' }
    >>> a_dict['a']
    3
    >>> a_dict[9]
    7
    >>> a_dict['bar']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 'bar'
That sets up a_dict as a dict with three elements, then shows accessing various elements of the dict. The last key, "bar", is not present, so attempting to access it results in a runtime KeyError exception.
One of the other fundamental Python types is the list, which provides an ordered sequence of objects, in many ways like arrays in other languages.
    >>> b_list = [ 1, 2, 3 ]
    >>> b_list[2]
    3
But lists have two operators that dicts lack. In particular, lists can be concatenated using + and +=:
    >>> b_list + b_list
    [1, 2, 3, 1, 2, 3]
    >>> b_list += [ 4, 5, 6 ]
    >>> b_list
    [1, 2, 3, 4, 5, 6]
Doing something similar with dicts, though, leads to an exception:
    >>> a_dict + a_dict
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'dict' and 'dict'
A TypeError is also raised for the += operator when used on dicts.
Adding + and +=
But Ian Lee would like to see that change. He first raised the issue in a brief subthread on the python-dev mailing list in a discussion about code review for PEP 448. Lee subsequently moved the topic to python-ideas, where he suggested adding both the + and += operators for dicts to put them on an equal footing with lists.
The semantics of adding two dicts that have no keys in common seems clear: the result is a dict with all of the key/value pairs from both operands. For +, the result is a new dict, while += modifies the dict on the left. The only real question is what to do when there are duplicate keys. Dicts already have an update() method that takes the value for a duplicate key from the argument dict:
    >>> a_dict = { 'a' : 1, 'b' : 2 }
    >>> a_dict.update( { 'b' : 9 } )
    >>> a_dict
    {'a': 1, 'b': 9}
Lee suggested using the "last setter wins" rule for the new operators, as the update() method does. So the value for a duplicate key comes from the right operand:
    >>> a_dict = { 'a' : 1, 'b' : 2 }
    >>> b_dict = { 'b' : 'bar', 'c' : 'baz' }
    >>> a_dict + b_dict
    { 'a' : 1, 'b' : 'bar', 'c' : 'baz' }
    >>> b_dict += a_dict
    >>> b_dict
    { 'a' : 1, 'b' : 2, 'c' : 'baz' }
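The proposed behavior can be approximated today with a small subclass. The following is only a sketch of the "last setter wins" semantics described above; the class name AddableDict is hypothetical and is not part of Lee's proposal.

```python
class AddableDict(dict):
    """Hypothetical dict subclass sketching the proposed + and += semantics."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        result = AddableDict(self)  # shallow copy of the left operand
        result.update(other)        # right operand wins on duplicate keys
        return result

    def __iadd__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        self.update(other)          # modify the left operand in place
        return self


a_dict = AddableDict({'a': 1, 'b': 2})
b_dict = AddableDict({'b': 'bar', 'c': 'baz'})

combined = a_dict + b_dict          # new object; a_dict is left untouched
print(combined)

alias = b_dict
b_dict += a_dict                    # same object, updated in place
print(b_dict, alias is b_dict)
```

Note that + builds a new dict and leaves both operands alone, while += keeps the identity of the left operand, which mirrors how the two operators behave for lists.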
Donald Stufft liked the idea behind the change, but didn't like using +. He would rather use "|" to try to make it clearer that it is really more of a set union operation, rather than a concatenation or addition. Ethan Furman, though, sees + as a generic operator for combining things. On the other hand: "I suppose I could come around to '|', though -- it does ease the tension around the behavior of duplicate keys", he said.
After a bit of a digression through a question of commutativity (the proposed operators would not be commutative, but that is hardly unique; string concatenation is not commutative either, for example), Marc-André Lemburg explained that he didn't see the need for +, though += could be useful:
In applications, you normally just need the update functionality for dictionaries. If you do need a copy, you can create a copy explicitly - but those cases are usually rare.
Having one of those operators without the other seems a bit strange to some, though. Operators in Python are implemented as special methods on objects, so a + b becomes a.__add__(b) (similarly, += uses the __iadd__() special method). Dicts could pick up an __iadd__() method (or the | equivalent: __ior__()), but most developers, especially those new to the language, would probably expect + to work if += did.
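The relationship between the two special methods is not symmetric, which a short experiment can show. In this sketch, OnlyAdd and OnlyIAdd are hypothetical classes invented for illustration: a type that defines only __add__() still supports += (Python falls back to + and rebinds the name), while a type that defines only __iadd__() does not support + at all.

```python
class OnlyAdd:
    """Hypothetical type that defines + but not +=."""
    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return OnlyAdd(self.items + other.items)


class OnlyIAdd:
    """Hypothetical type that defines += but not +."""
    def __init__(self, items):
        self.items = list(items)

    def __iadd__(self, other):
        self.items.extend(other.items)
        return self


x = OnlyAdd([1])
original_x = x
x += OnlyAdd([2])            # no __iadd__, so this acts like x = x + ...
x_was_rebound = x is not original_x

y = OnlyIAdd([1])
original_y = y
y += OnlyIAdd([2])           # uses __iadd__; same object, changed in place

try:
    y + OnlyIAdd([3])        # no __add__ (and no __radd__) is available
    plus_works = True
except TypeError:
    plus_works = False

print(x.items, x_was_rebound, y.items, y is original_y, plus_works)
```

So a dict with only __iadd__() would be in the second situation: += would work and + would raise a TypeError, which is the mismatch the paragraph above worries about.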
Other options
In the case of duplicated keys, there are (at least) two other options. An exception could be raised when combining two dicts that have keys in common, as Greg Ewing suggested, though that might be surprising. Another option would be to apply the addition operator to the two values, but that might cause its own set of surprises:
    >>> a_dict = { 'a' : 2, 'b' : 'foo' }
    >>> b_dict = { 'a' : 4, 'b' : 'foo' }
    >>> c_dict = { 'b' : 3 }
    >>> a_dict + b_dict
    { 'a' : 6, 'b' : 'foofoo' }
    >>> b_dict + c_dict
    ...
    TypeError: cannot concatenate 'str' and 'int' objects
Either the addition/concatenation or the exception might well surprise developers.
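Both alternatives are easy to prototype as plain functions, which makes the trade-offs concrete. This is a sketch only: the helper names merge_strict and merge_adding are hypothetical, and the choice of ValueError for the duplicate-key case is an assumption, since the thread as summarized here does not name an exception type.

```python
from collections import Counter


def merge_strict(left, right):
    """Hypothetical helper: refuse to merge dicts that share any key."""
    common = left.keys() & right.keys()
    if common:
        # ValueError is an assumed choice of exception type.
        raise ValueError("duplicate keys: %r" % sorted(common, key=repr))
    result = dict(left)
    result.update(right)
    return result


def merge_adding(left, right):
    """Hypothetical helper: apply + to the values of duplicate keys."""
    result = dict(left)
    for key, value in right.items():
        if key in result:
            result[key] = result[key] + value  # may raise TypeError
        else:
            result[key] = value
    return result


a_dict = {'a': 2, 'b': 'foo'}
b_dict = {'a': 4, 'b': 'foo'}
c_dict = {'b': 3}

added = merge_adding(a_dict, b_dict)

try:
    merge_adding(b_dict, c_dict)   # 'foo' + 3 cannot be added
    mixed_types_ok = True
except TypeError:
    mixed_types_ok = False

try:
    merge_strict(a_dict, b_dict)   # 'a' and 'b' appear in both
    strict_ok = True
except ValueError:
    strict_ok = False

# collections.Counter already defines + by adding the counts per key.
counts = Counter(a=2) + Counter(a=4, b=1)
print(added, mixed_types_ok, strict_ok, counts)
```

The value-adding behavior is what collections.Counter provides for counts, which is one reason a reader might expect it from dicts, and also why it only makes sense when all the values are of compatible types.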
Lee summarized the ideas and approaches from early on in the thread in a kind of a pre-PEP document.
Even though there is a lot of precedent for operators like + and +=, Steven D'Aprano argued that they are actually flawed ideas that should not be further propagated. The fact that lists have those operators is not for the better:
D'Aprano described one of those corner cases (which also appears in the Python FAQ) for the tuple immutable sequence type:
    >>> t = ([], None)
    >>> t[0] += [1]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment
    >>> t
    ([1], None)
In the example, t is an immutable sequence of two items: an empty list and None. Because of the way Python handles the += operator, the exception isn't raised until after the "desired" change has been made to the list. As Andrew Barnert explained, Python essentially turns the statement into:
    setitem(t, 0, getitem(t, 0).__iadd__([1]))
It is the setitem() that fails, but the __iadd__() has already succeeded in changing the list object. Another way to look at it would be:
    >>> l = t[0]
    >>> l += [1]
    >>> t[0] = l
The final assignment is where that sequence fails, but the list object l has already been modified. That "feature" is, at best, a language wart.
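The corner case can be reproduced in a script as well as at the prompt. The sketch below catches the exception and then checks the state of the tuple; the second half spells out Barnert's expansion using functions from the standard operator module, which is an approximation of what the interpreter does rather than its literal implementation.

```python
import operator

t = ([], None)
inner = t[0]
message = None
try:
    t[0] += [1]
except TypeError as exc:
    message = str(exc)

# The exception was raised, yet the list inside the tuple has changed.
print(message)
print(t, t[0] is inner)

# Roughly the same steps, written out with the operator module.
t2 = ([], None)
failed_on_setitem = False
try:
    updated = operator.iadd(operator.getitem(t2, 0), [1])  # succeeds
    operator.setitem(t2, 0, updated)                       # raises TypeError
except TypeError:
    failed_on_setitem = True

print(t2, failed_on_setitem)
```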
The subject of dict.__add__() comes up on python-ideas with some frequency, and it is clear there are strong feelings on all of the different sides. Stufft thinks it would make a nice "mini-addition" to the language that might make newer versions a little more attractive:
    new_dict = dict1.copy()
    new_dict.update(dict2)
Isn't confusing or particularly hard to use, however being able to type that as new_dict = dict1 + dict2 is more succinct, cleaner, and just a little bit nicer. It adds another small reason why, taken with the other small reasons, someone might want to drop an older version of Python for a newer version.
But Stephen J. Turnbull is not convinced that the semantics are so clear that what the operators do would be obvious to most. He noted that four different ways to handle the duplicate-key problem had been proposed and added two more, possibly with tongue in cheek. In addition, since there are existing ways to perform those operations, adding another violates the Python "there's only one way to do it" (TOOWTDI) guideline.
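For reference, here is a sketch of some existing spellings of "merge two dicts into a new one". The copy-and-update form is the one quoted above; the dict() call with keyword arguments only works when the keys of the second dict are strings; and the unpacking form comes from PEP 448, so it assumes Python 3.5 or later. The variable names are illustrative.

```python
dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 'bar', 'c': 'baz'}

# 1. Copy, then update: works in every Python version.
merged_copy = dict1.copy()
merged_copy.update(dict2)

# 2. dict() with keyword arguments: requires dict2's keys to be strings.
merged_ctor = dict(dict1, **dict2)

# 3. PEP 448 unpacking in a dict display (assumes Python 3.5 or later).
merged_unpack = {**dict1, **dict2}

print(merged_copy)
print(merged_copy == merged_ctor == merged_unpack)
```

In all three the right-hand dict wins on duplicate keys, the same rule that update() applies.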
Early on, Lee indicated that he would try to shepherd a PEP through the process to see if the operators could be added to dicts. Brett Cannon agreed with that idea:
That's where things stand now. No PEP has yet appeared, though it seems likely that one will. It is an interesting question in that both sides seem to see their choice as the "obvious" one. There is precedent in that lists have the two operators, but that precedent does lead to some corner cases and warts. Even if the PEP were to be accepted, it would only be a feature for some upcoming version of Python 3—features are no longer being added to Python 2. One suspects that in the end it will come down to what benevolent dictator for life (BDFL) Guido van Rossum thinks—so far he has been silent in the thread.
Posted Mar 5, 2015 14:30 UTC (Thu) by zyga (subscriber, #81533)
Consider:
    a = 1
    a += 1
This calls int.__add__ as int.__iadd__ doesn't exist.
+= will always work if + is defined
__iadd__ may not be defined for certain types
Posted Mar 5, 2015 17:37 UTC (Thu) by dashesy (guest, #74652)
What about parallel case of sets?
Posted Mar 7, 2015 1:20 UTC (Sat) by droundy (subscriber, #4559)
In particular, sets use | and |= where + and += are being proposed for dicts.
Posted Mar 9, 2015 7:18 UTC (Mon) by marcH (subscriber, #57642)
Mmmm... let's see which confused languages mix up feature and implementation...
(Cheap shot; sorry)