
Python dictionary "addition" and "subtraction"

By Jake Edge
March 13, 2019

A proposal to add a new dictionary operator for Python has spawned a PEP and two large threads on the python-ideas mailing list. To a certain extent, it is starting to look a bit like the "PEP 572 mess"; there are plenty of opinions on whether the feature should be implemented and how it should be spelled, for example. As yet, there has been no formal decision made on how the new steering council will be handling PEP pronouncements, though a review of open PEPs is the council's "highest priority". This PEP will presumably be added into the process; it is likely too late to be included in Python 3.8 even if it were accepted soon, so there is plenty of time to figure it all out before 3.9 is released sometime in 2021.

João Matos raised the idea in a post at the end of February. (That email is largely an HTML attachment, which is not getting archived cleanly; its contents can be seen here or quoted by others in the thread.) In that message, he suggested adding two new operators for dictionaries: "+" and "+=". They would be used to merge two dictionaries:

    tmp = dict_a + dict_b      # results in all keys/values from both dict_a and dict_b
    dict_a += dict_b           # same but done "in place"

In both cases, any key that appears in both dictionaries would take its value from dict_b. There are several existing ways to perform this "update" operation, including using the dictionary unpacking operator "**" specified in PEP 448 ("Additional Unpacking Generalizations"):

    tmp = { **dict_a, **dict_b }     # like tmp = dict_a + dict_b
    dict_a = { **dict_a, **dict_b }  # like dict_a += dict_b

Or, as Rhodri James pointed out, one can also use the dictionary update() method:

    tmp = dict_a.copy(); tmp.update(dict_b)  # like tmp = dict_a + dict_b
    dict_a.update(dict_b)                    # like dict_a += dict_b
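
As a concrete illustration, the following minimal sketch (the dictionary contents are made up for the example) checks that the existing spellings agree with each other and that the right-hand dictionary wins for a shared key:

```python
# Hypothetical sample data; "b" appears in both dictionaries.
dict_a = {"a": 1, "b": 2}
dict_b = {"b": 20, "c": 30}

# PEP 448 unpacking: later entries override earlier ones.
unpacked = {**dict_a, **dict_b}

# copy() followed by update(): same result, dict_a is left untouched.
copied = dict_a.copy()
copied.update(dict_b)

print(unpacked)            # {'a': 1, 'b': 20, 'c': 30}
print(unpacked == copied)  # True
print(dict_a)              # {'a': 1, 'b': 2}
```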

Matos's idea drew the attention of Guido van Rossum, who liked it. It is analogous to the "+=" operator for mutable sequence types, he said.

This is likely to be controversial. But I like the idea. After all, we have `list.extend(x)` ~~ `list += x`. The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.

There were some questions about non-commutative "+" operators (i.e. where a+b is not the same as b+a), but several pointed out that there are a number of uses of the "+" operator in Python that are not commutative, including string concatenation (e.g. 'a'+'b' is not the same as 'b'+'a'). Steven D'Aprano suggested that instead of "addition", the operation should be treated like a set union, which uses the "|" operator. He had some other thoughts on new dictionary operators as well:

That also suggests d1 & d2 for the intersection between two dicts, but which value should win?

More useful than intersection is, I think, dict subtraction: d1 - d2 being a new dict with the keys/values from d1 which aren't in d2.
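
Neither operation requires new syntax today. As a rough sketch of the semantics D'Aprano describes (the helper names `dict_subtract` and `dict_intersect` are hypothetical, and letting the left operand's value win in the intersection is only one possible answer to his question), both can be written as comprehensions:

```python
def dict_subtract(d1, d2):
    """Return a new dict with the items of d1 whose keys are not in d2."""
    return {k: v for k, v in d1.items() if k not in d2}


def dict_intersect(d1, d2):
    """Return a new dict with the keys common to both.

    Taking the value from d1 is an assumption made for this sketch.
    """
    return {k: v for k, v in d1.items() if k in d2}


d1 = {"a": 1, "b": 2, "c": 3}
d2 = {"b": 20, "d": 40}
print(dict_subtract(d1, d2))   # {'a': 1, 'c': 3}
print(dict_intersect(d1, d2))  # {'b': 2}
```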

All of this hearkens back to a 2015 python-ideas thread that was summarized here at LWN. As a comment from Raymond Hettinger on a Python bug tracker issue notes, however, Van Rossum rejected the idea back in 2015. Some things have changed since then: for one, Van Rossum is no longer the final decision-maker on changes to the language (though his opinion still carries a fair amount of weight, of course), but he has also changed his mind on the feature.

Hettinger made it clear that he opposes adding a new "+" operator, both in another comment on the bug and in a python-ideas post. He does not see this operation as being "additive" in some sense, so using "+" does not seem like the right choice. In addition, there are already readable alternatives, including using the collections.ChainMap() class:

[...] I'm not sure we actually need a short-cut for "d=e.copy(); d.update(f)". Code like this comes-up for me perhaps once a year. Having a plus operator on dicts would likely save me five seconds per year.

If the existing code were in the form of "d=e.copy(); d.update(f); d.update(g); d.update(h)", converting it to "d = e + f + g + h" would be a tempting but algorithmically poor thing to do (because the behavior is quadratic). Most likely, the right thing to do would be "d = ChainMap(e, f, g, h)" for a zero-copy solution or "d = dict(ChainMap(e, f, g, h))" to flatten the result without incurring quadratic costs. Both of those are short and clear.
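
One detail is worth checking when making that substitution: `ChainMap` searches its first mapping first, so the leftmost dictionary wins, whereas a chain of `update()` calls lets the last one win. A small sketch with made-up dictionaries shows the difference, and how reversing the argument order reproduces the `update()` result:

```python
from collections import ChainMap

# Hypothetical sample data with overlapping keys.
e = {"x": 1, "y": 1}
f = {"y": 2, "z": 2}
g = {"z": 3}

# The copy-and-update chain: the last dictionary wins.
d = e.copy()
d.update(f)
d.update(g)

# ChainMap searches its maps left to right, so the first dictionary wins.
left_wins = dict(ChainMap(e, f, g))
# Reversing the arguments gives the same mapping as the update() chain.
last_wins = dict(ChainMap(g, f, e))

print(d == last_wins)  # True
print(d == left_wins)  # False
```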

Several thought "|" made more sense than "+", but that was not universal. For the most part, any opposition to the idea is similar to Hettinger's: dictionary addition is unneeded and potentially confusing. It also violates "some of the unifying ideas about Python", Hettinger said.

Van Rossum suggested that D'Aprano write a PEP for a new dictionary operator. Van Rossum also thought the PEP should include dictionary subtraction for consideration, though that idea seemed to have even less support among participants in the thread. D'Aprano did create PEP 584 ("Add + and - operators to the built-in dict class"); that post set off another long thread.

As part of that, Jimmy Girardet asked what problem is being solved by adding a new operator, given that there are several alternatives that do the same thing. Stefan Behnel noted that "+" for lists and tuples, along with "|" for sets, provide a kind of basic expression for combining those types, but that dictionaries lack that, which is "a gap in the language". Adding such an operator would "enable the obvious syntax for something that should be obvious". Van Rossum has similar thoughts:

[...] even if you know about **d in simpler contexts, if you were to ask a typical Python user how to combine two dicts into a new one, I doubt many people would think of {**d1, **d2}. I know I myself had forgotten about it when this thread started! If you were to ask a newbie who has learned a few things (e.g. sequence concatenation) they would much more likely guess d1+d2.

There is an implementation of the PEP available, but it is still seemingly a long way from being accepted—if it ever is. The PEP itself may need some updating. It would seem that the arguments for switching to "|" may be gaining ground. Early on, Van Rossum favored "+", but there have been some strong arguments in favor of switching. More recently, Van Rossum said that he is "warming up to '|' as well". Other than his arguments early on, there have been few others who were strongly advocating "+" over "|".

There have also been discussions of corner cases and misunderstandings that might arise from the operators. But the main objections seem to mostly fall, as they did in the PEP 572 "discussion", on the question of each person's "vision" for the language. It could certainly be argued that the PEP 572 assignment expression (i.e. using ":=" for assignments within statements) is a more radical change to the overall language, but many of the arguments against dictionary "addition" sound eerily familiar.

It is, as yet, unclear how PEPs will be decided; that will be up to the steering council. It may well be somewhat telling that none of the other four steering council members have been seriously involved in the discussion, but it is also the case that many may have tuned out python-ideas as a forum that is difficult to keep up with. Only time will tell what happens with PEP 584 (and the rest of the PEPs that are pending).



Python dictionary "addition" and "subtraction"

Posted Mar 13, 2019 16:52 UTC (Wed) by pj (subscriber, #4506) [Link] (3 responses)

Can't this just be monkey-patched in by anyone who wants it? One of the joys of python, no?

Python dictionary "addition" and "subtraction"

Posted Mar 13, 2019 17:17 UTC (Wed) by smurf (subscriber, #17840) [Link] (2 responses)

You can create a subclass of "dict" with extra methods, but you can't do that on "dict" directly. Built-in types are immutable. If you want a language you can do this with, use Ruby.
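
A minimal sketch of that subclass approach follows; the class name `PlusDict` is hypothetical, and the last-wins behavior simply mirrors `update()`. The final lines show that assigning the same method onto the built-in `dict` is refused:

```python
class PlusDict(dict):
    """dict subclass with "+" and "+=" spelled in terms of update()."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        result = PlusDict(self)   # shallow copy of the left operand
        result.update(other)      # right operand wins on shared keys
        return result

    def __iadd__(self, other):
        self.update(other)        # in place, like dict_a.update(dict_b)
        return self


merged = PlusDict(a=1, b=2) + {"b": 20, "c": 30}
print(merged)  # {'a': 1, 'b': 20, 'c': 30}

try:
    dict.__add__ = PlusDict.__add__   # patching the built-in type itself
except TypeError as exc:
    print("refused:", exc)
```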

Python dictionary "addition" and "subtraction"

Posted Mar 13, 2019 19:26 UTC (Wed) by ms-tg (subscriber, #89231) [Link] (1 responses)

> You can create a subclass of "dict" with extra methods, but you can't do that on "dict" directly. Built-in types are immutable. If you want a language you can do this with, use *Ruby*.

Definitely!

In fact, Ruby provides a way to do it *without monkey-patching* called Refinements:
https://ruby-doc.org/core-2.5.3/doc/syntax/refinements_rd...

So if you wanted to extend the Ruby Hash class (equivalent to Python Dict) in this way with method #+ and #+=, you could create the following refinement module:

    module HashWithPlus
      refine Hash do
        alias_method :+, :merge
        alias_method :"+=", :merge!
      end
    end

Then, in any lexical scope you can say:

    using HashWithPlus

    { "a" => 1 } + { "b" => 2}
    => {"a"=>1, "b"=>2}

With no monkey-patching! The refined Hash will only be visible in scopes that explicitly use the refinement.

(just sharing for those interested)

Python dictionary "addition" and "subtraction"

Posted Mar 14, 2019 10:00 UTC (Thu) by sheepgoesbaaa (guest, #98005) [Link]

Thanks, I enjoyed reading that :)

Python dictionary "addition" and "subtraction"

Posted Mar 13, 2019 19:14 UTC (Wed) by iabervon (subscriber, #722) [Link] (8 responses)

My intuition for "a+b" is that the second one takes precedence, while my intuition for "a|b" is that the first one takes precedence (like "a.get(key) or b.get(key)").

Python dictionary "addition" and "subtraction"

Posted Mar 13, 2019 21:35 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (7 responses)

Indeed, the problem is that:

1. Single pipe is traditionally commutative (bitwise OR, set union, etc.). It has been used non-commutatively (shell pipelines), but that usage is so far afield that it provides no intuition here.
2. Double pipe (or the "or" keyword) traditionally short-circuits to the left. I've never seen it short-circuit to the right.
3. Plus is sometimes commutative, but its noncommutative uses traditionally preserve the order of items (concatenation) rather than allowing one item to "override" another. So the closest analogue here would be c = ChainMap(a, b) (so that you have c.maps == [a, b]) as Hettinger suggests... but that actually gives precedence to the left! ChainMap.new_child() does give precedence to the "right" in a sense, but it's type-asymmetric (self is a ChainMap, the argument usually isn't), and probably should not be an operator at all.
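
The two `ChainMap` behaviors mentioned in the last point can be seen in a short sketch (the dictionaries are invented for illustration):

```python
from collections import ChainMap

a = {"k": "from a"}
b = {"k": "from b"}

c = ChainMap(a, b)
print(c.maps == [a, b])  # True
print(c["k"])            # from a   (leftmost map wins)

# new_child() pushes its argument onto the front of the chain,
# so the "right-hand" mapping now takes precedence.
child = ChainMap(a).new_child(b)
print(child.maps == [b, a])  # True
print(child["k"])            # from b
```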

Python dictionary "addition" and "subtraction"

Posted Mar 13, 2019 22:11 UTC (Wed) by iabervon (subscriber, #722) [Link] (6 responses)

For "a+b", I was thinking that "a+=b" seems like it should do something for every element of b, and what "a.update(b)" does feels like what "a+=b" would most likely do. Last-wins also matches what would happen if you just made a new dict and set all the items from a and b in the order they appear, so it feels right for concatenation.

I'd sort of guess that, if you've got an "or"-like thing, even if it doesn't short-circuit (i.e., evaluates all of its arguments), it'll pick between distinguishable values with the same truth value as if it were short-circuiting.

Python dictionary "addition" and "subtraction"

Posted Mar 14, 2019 15:41 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (5 responses)

My intuition for "a + b", where "a" and "b" are both dictionaries, would be that keys which exist in both "a" and "b" have their values combined recursively with the same "+" operator.

>>> { 'apple': 5, 'pear': 7 } + { 'strawberry': 2, 'pear': 10 }
{ 'apple': 5, 'pear': 17, 'strawberry': 2 }
>>> { 'x': 'abc', 'z': { 'a': 13 } } + { 'x': 'def', 'y': 'ijk', 'z': { 'a': 9, 'b': 10 } }
{ 'x': 'abcdef', 'y': 'ijk', 'z': { 'a': 22, 'b': 10 } }

This would make it similar to the Semigroup operator (<>) in Haskell.
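
A rough Python sketch of that recursive behavior is below. The function name `deep_add` is hypothetical, and it assumes that values under a shared key are either both dictionaries or support `+` with each other:

```python
def deep_add(a, b):
    """Merge two dicts; values under shared keys are combined with "+",
    recursing when both values are themselves dicts."""
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_add(result[key], value)
        else:
            result[key] = result[key] + value
    return result


print(deep_add({'apple': 5, 'pear': 7}, {'strawberry': 2, 'pear': 10}))
# {'apple': 5, 'pear': 17, 'strawberry': 2}
```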

Python dictionary "addition" and "subtraction"

Posted Mar 14, 2019 17:17 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (4 responses)

That already exists, it's called collections.Counter.
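
For integer values, `Counter` does reproduce the first example above, though it does not recurse into nested dictionaries and its `+` drops any result that is not positive:

```python
from collections import Counter

total = Counter({'apple': 5, 'pear': 7}) + Counter({'strawberry': 2, 'pear': 10})
print(total == Counter({'apple': 5, 'pear': 17, 'strawberry': 2}))  # True

# Counts that sum to zero or less are discarded by "+".
cancelled = Counter({'apple': 1}) + Counter({'apple': -1})
print(cancelled)  # Counter()
```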

Python dictionary "addition" and "subtraction"

Posted Mar 15, 2019 11:29 UTC (Fri) by jani (subscriber, #74547) [Link] (3 responses)

Doesn't that require the dict values to be integers? The proposal above only expects them to have + and - operators.

Python dictionary "addition" and "subtraction"

Posted Mar 15, 2019 23:56 UTC (Fri) by quietbritishjim (subscriber, #114117) [Link] (2 responses)

The values in Counter don't need to be integers, or even numbers. Both the constructor and the update method can take either iterables or mappings, and the iterable versions put integer counts in while the mapping versions just directly use the values (with the + operator, in the case of update). There's a note in the documentation [1] (scroll down to grey box) saying the implementation of each method makes minimal assumptions on the value type.

[1] https://docs.python.org/3/library/collections.html#collec...

Python dictionary "addition" and "subtraction"

Posted Mar 17, 2019 11:35 UTC (Sun) by jani (subscriber, #74547) [Link] (1 responses)

Given two dictionaries mapping string keys to list values, how do you actually use Counter to concatenate the lists (list + operation) for keys that exist in both dictionaries?

Python dictionary "addition" and "subtraction"

Posted Mar 20, 2019 20:48 UTC (Wed) by quietbritishjim (subscriber, #114117) [Link]

Hmm, it turns out it doesn't work. The obvious thing is to use +:

    Counter({'a': [1], 'b': [1, 2]}) + Counter({'a': [2, 3], 'b': [3]})

This fails for two reasons:

  • The + operator is guaranteed to keep only positive results, so it compares each summed value against zero, which fails for lists.
  • If the set of keys is different on the two sides then it does an addition with 0 (this only applies to keys missing in the left hand list, but that is presumably an implementation detail).

An alternative is to use the update() method, which doesn't have a restriction to positive values so doesn't compare against zero:

    c = Counter({'a': [1], 'b': [1, 2]})
    c.update(Counter({'a': [2, 3], 'b': [3]}))

But this doesn't work either:

  • As with the + operator, missing keys are interpreted as having value 0.
  • The values are passed to + in the opposite order than you would expect, so the above example results in c['a'] == [2,3,1]! This is true in Python 3.7 but not Python 2.7.

I think this is a bug in the documentation, which seems to say that these should be possible (at least for update()), or even a straight up bug in the code. But in fairness it is an unusual use of the class.
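
The behavior described above can be reproduced with a short sketch. It reflects the CPython implementation of `Counter` as of Python 3.7, as reported in this comment; other versions or implementations may differ:

```python
from collections import Counter

left = Counter({'a': [1], 'b': [1, 2]})
right = Counter({'a': [2, 3], 'b': [3]})

# "+" compares each summed value against zero, which lists do not support.
try:
    left + right
except TypeError as exc:
    print("+ failed:", exc)

# update() adds without comparing, but with the incoming value on the left.
c = Counter({'a': [1], 'b': [1, 2]})
c.update(right)
print(c['a'])  # [2, 3, 1] on CPython 3.7, per the comment above

# A key missing from the counter is treated as 0, and list + int fails.
try:
    Counter({'a': [1]}).update({'z': [9]})
except TypeError as exc:
    print("update failed:", exc)
```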

TMTOWTDI

Posted Mar 13, 2019 20:53 UTC (Wed) by jccleaver (guest, #127418) [Link]

There's definitely some overlap between these and perl6's more arcane operators, but in perl 5 and perl 6 the trivial solution is just:

> %hash2 = (%hash1, %hash2); # %hash2 values will take priority

There are other ways to handle it if you need to watch memory usage in pathological cases, but sometimes simple things should be simple.

Python dictionary "addition" and "subtraction"

Posted Mar 14, 2019 9:27 UTC (Thu) by NRArnot (subscriber, #3033) [Link] (1 responses)

I'm very lukewarm. What is wrong with using .copy() and .update() methods? "Explicit is better than implicit" and methods are certainly more explicit.

But if there is support, I'd argue strongly against using + and - operators. This is far more likely to lead to accidents than using |. This latter operator is far less frequently used, and seeing it always alerts one to the fact that something out of the ordinary is going on. Also one can visualize a set as a dict where all values are the same and irrelevant, such as True. In fact back before we had sets, that's what we used to do. So extending the set | operator to apply similarly to dicts makes reasonable sense.

Subtraction I like even less. Non-commutativity I can live with, but subtraction completely ignores the values in the dict being "Subtracted". A much better idea would be a dict.subtract( other) method.

Python dictionary "addition" and "subtraction"

Posted Mar 15, 2019 19:00 UTC (Fri) by k8to (guest, #15413) [Link]

Sometimes I think language projects just try to incorporate change because solving problems means making changes in the minds of most people, while accepting the status quo as fine can feel like not solving a problem.

I find the methods quite clear, and they provide good hints about what sort of data type we're working with, which helps in code that doesn't bother to provide that information in an easily accessible way. Even with the relatively constrained set of semantics for built-in types in basic python, I often find production code where the developers haven't really understood the provided toolkit. I'm not a fan of adding semantic synonyms that offer no specific utility when in practice it's clear that programmers are often already operating beyond their understanding of their tools.

Python dictionary "addition" and "subtraction"

Posted Mar 21, 2019 3:08 UTC (Thu) by DHR (guest, #81356) [Link]

The operator that this is most like is ";". I know that sounds weird, but think about it. Variables defined in the environment before a ; are combined with variables defined after it. If there is an overlap, the definitions after the semicolon win.

Personally, I don't see that this is a good addition to the language. It won't get a lot of use. If the operator "+" is used, I bet that more examples will be accidents rather than intentional.

Dictionaries are functions, in the mathematical sense. If they were relations, then + would make more sense.


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds