Late-bound argument defaults for Python
Python supports default values for arguments to functions, but those defaults are evaluated at function-definition time. A proposal to add defaults that are evaluated when the function is called has been discussed at some length on the python-ideas mailing list. The idea came about, in part, due to yet another resurrection of the proposal for None-aware operators in Python. Late-bound defaults would help with one use case for those operators, but there are other, stronger reasons to consider their addition to the language.
In Python, the defaults for arguments to a function can be specified in the function definition, but, importantly, they are evaluated in the scope where the function is defined. So default arguments cannot refer to other arguments to the function, as those are only available in the scope of the function itself when it gets called. For example:
def foo(a, b = None, c = len(a)):
    ...
That definition specifies that a has no default, b defaults to None if no argument gets passed for it, and c defaults to the value of len(a). But that expression will not refer to the a in the argument list; it will instead look for an a in the scope where the function is defined. That is probably not what the programmer intended. If no a is found, the function definition will fail with a NameError.
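A minimal sketch of that behavior in current Python follows. The names foo and a mirror the example above; the module-level list, the bar function, and the result variables are assumptions added purely for illustration.

```python
# Module-level name; this is the `a` that the default expression sees.
a = [1, 2, 3]

def foo(a, b=None, c=len(a)):
    # c's default was computed once, when the def statement ran,
    # from the module-level a (length 3), not from the parameter a.
    return c

result = foo("hello")
print(result)  # 3, even though len("hello") is 5

# With no `a` in the enclosing scope, the def statement itself fails.
del a
try:
    def bar(a, b=None, c=len(a)):
        return c
except NameError as exc:
    definition_failed = True
    print("definition failed:", exc)
else:
    definition_failed = False
```

Note that the error, when it happens, is raised while the function is being defined, not when it is called.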
Idea
On October 24, Chris Angelico introduced his proposal for late-bound arguments. He used an example function, derived from the bisect.bisect_right() function in the standard library, to demonstrate the idea. The function's arguments are specified as follows:
def bisect(a, x, lo=0, hi=None):
He noted that there is a disparity between lo and hi: "It's clear what value lo gets if you omit it. It's less clear what hi gets." Early in his example function, hi is actually set to len(a) if it is None. Effectively, None is being used as a placeholder (or sentinel value) because Python has no way to directly express the idea that hi should default to the length of a. He proposed new syntax to identify hi as a late-bound argument:
def bisect(a, x, lo=0, hi=:len(a)):
The "=:" would indicate that if no argument is passed for hi, the expression would be evaluated in the context of the call and assigned to hi before any of the function's code is run. It is interesting to note that the documentation for bisect.bisect_right() linked above looks fairly similar to Angelico's idea (just lacking the colon) even though the actual code in the library uses a default value of None. It is obviously useful to know what the default will be without having to dig into the code.
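For comparison, the placeholder idiom that works in Python today can be sketched as follows. The loop is a simplified right-bisection written for this illustration, not the standard library's code; only the signature comes from the example above.

```python
def bisect(a, x, lo=0, hi=None):
    # None stands in for "the length of a", which cannot be written
    # as an ordinary (early-bound) default.
    if hi is None:
        hi = len(a)
    # Simplified right-bisection: find the insertion point after any
    # existing entries equal to x.
    while lo < hi:
        mid = (lo + hi) // 2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo

print(bisect([1, 2, 4, 4, 7], 4))        # hi defaults to len(a)
print(bisect([1, 2, 4, 4, 7], 4, hi=3))  # caller supplies hi explicitly
```

The signature tells a reader only that hi defaults to None; the real default is visible only in the body.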
In his post, Angelico said that in cases where None is a legitimate value, there is another way to handle the default, but it also obscures what the default will be:
And the situation only gets uglier if None is a valid argument, and a unique sentinel is needed; this standard idiom makes help() rather unhelpful:
_missing = object()
def spaminate(thing, count=_missing):
    if count is _missing: count = thing.getdefault()
Proposal: Proper syntax and support for late-bound argument defaults.
def spaminate(thing, count=:thing.getdefault()):
    ...
[...]
The purpose of this change is to have the function header define, as fully as possible, the function's arguments. Burying part of that definition inside the function is arbitrary and unnecessary.
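The sentinel idiom from that quote can be run as-is in current Python. In this sketch, the Thing class and its return value are hypothetical stand-ins, and inspect.signature() is used to show roughly what help() would display for the default.

```python
import inspect

_missing = object()  # unique sentinel, so that None stays a usable argument

class Thing:
    """Hypothetical stand-in for whatever object spaminate() receives."""
    def getdefault(self):
        return 3

def spaminate(thing, count=_missing):
    if count is _missing:
        count = thing.getdefault()
    return count

print(spaminate(Thing()))        # falls back to thing.getdefault()
print(spaminate(Thing(), None))  # None is passed through untouched

# The documented default is an opaque object repr, not the real rule.
signature_text = str(inspect.signature(spaminate))
print(signature_text)
```

The printed signature shows count defaulting to something like "<object object at 0x...>", which says nothing about thing.getdefault().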
The first order of business in these kinds of discussions is the inevitable bikeshedding about how the operator is spelled. Angelico chose a "deliberately subtle" syntax, noting that in many cases it will not matter when the argument is bound. It is visually similar to the walrus operator (":="), but that is not legal in a function definition, so there should be no ambiguity, he said.
Ethan Furman liked the idea but would rather see a different operator (perhaps "?=") used because of the potential confusion with the walrus operator. Guido van Rossum was also in favor of the feature, but had his spelling suggestion as well:
I like that you're trying to fix this wart! I think that using a different syntax may be the only way out. My own bikeshed color to try would be `=>`, assuming we'll introduce `(x) => x+1` as the new lambda syntax, but I can see problems with both as well :-).
New syntax for lambda expressions has also been discussed, with most settling on "=>" as the best choice, in part because "->" is used for type annotations; some kind of "arrow" operator is commonly used in other languages for defining anonymous functions.
Several others were similarly in favor of late-bound defaults and many seemed to be happy with Van Rossum's spelling, but Brendan Barnwell was opposed to both; he was concerned that it would "encourage people to cram complex expressions into the function definition". Since it would only truly be useful (that is, readable) for a simpler subset of defaults, it should not be added, he said. Furthermore:
To me, this is definitely not worth adding special syntax for. I seem to be the only person around here who detests "ASCII art" "arrow" operators but, well, I do, and I'd hate to see them used for this. The colon or alternatives like ? or @ are less offensive but still too inscrutable to be used for something that can already be handled in a more explicit way.
But Steven D'Aprano did not think that the addition of late-bound defaults would "cause a large increase in the amount of overly complex default values". Angelico was also skeptical that the feature was some sort of bad-code attractant. "It's like writing a list comprehension; technically you can put any expression into the body of it, but it's normally going to be short enough to not get unwieldy." In truth, any feature can be abused; this one does not look to them to be particularly worse in that regard.
PEP 671
Later that same day, Angelico posted a draft of PEP 671 ("Syntax for late-bound function argument defaults"). In it, he adopted the "=>" syntax, though he noted a half-dozen other possibilities. He also fleshed out the specification of the default expression and some corner cases:
The expression is saved in its source code form for the purpose of inspection, and bytecode to evaluate it is prepended to the function's body.
Notably, the expression is evaluated in the function's run-time scope, NOT the scope in which the function was defined (as are early-bound defaults). This allows the expression to refer to other arguments.
Self-referential expressions will result in UnboundLocalError::
def spam(eggs=>eggs): # Nope
Multiple late-bound arguments are evaluated from left to right, and can refer to previously-calculated values. Order is defined by the function, regardless of the order in which keyword arguments may be passed.
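Since "=>" is not valid in any released Python, the left-to-right rule in that excerpt can only be approximated. The sketch below emulates a hypothetical signature, def span(data, lo=>0, hi=>lo + len(data)), with a sentinel; the span function and the _UNSET name are assumptions made for this illustration, and the emulation is not how the PEP would implement the feature.

```python
_UNSET = object()  # hypothetical sentinel meaning "no argument was passed"

def span(data, lo=_UNSET, hi=_UNSET):
    # Stands in for: def span(data, lo=>0, hi=>lo + len(data))
    # Defaults are filled in following the order of the parameter list,
    # so hi's expression can see the value that lo ended up with.
    if lo is _UNSET:
        lo = 0
    if hi is _UNSET:
        hi = lo + len(data)
    return lo, hi

print(span("abcd"))                # both defaults computed
print(span("abcd", lo=2))          # hi is derived from the passed-in lo
print(span(hi=9, data="abcd"))     # keyword order at the call site is irrelevant
```

The order of the checks in the body follows the parameter list, which matches the rule that evaluation order is defined by the function rather than by the call.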
But one case, which had been raised by Ricky Teachey in the initial thread, was discussed at some length when Jonathan Fine asked about the following function definition:
def puzzle(*, a=>b+1, b=>a+1):
    return a, b
Angelico was inclined to treat that as a syntax error, "since permitting it would open up some hard-to-track-down bugs". Instead it could be some kind of run-time error in the case where neither argument is passed, he said. He was concerned that allowing "forward references" to arguments that have yet to be specified (e.g. b in a=>b+1 above) will be confusing and hard to explain. D'Aprano suggested handling early-bound argument defaults before their late-bound counterparts and laid out a new process for argument handling that was "consistent and understandable". In particular, he saw no reason to make some kinds of late-bound defaults into a special case:
Note that step 4 (evaluating the late-bound defaults) can raise *any* exception at all (it's an arbitrary expression, so it can fail in arbitrary ways). I see no good reason for trying to single out UnboundLocalError for extra protection by turning it into a syntax error.
Angelico noted that it was still somewhat difficult for even experienced Python programmers to keep straight, but, in addition, he had yet to hear of a real use case. Erik Demaine offered two examples, "though they are a bit artificial"; he said that simply evaluating the defaults in left-to-right order (based on the function definition) was reasonably easy to understand. Angelico said that any kind of reordering of the evaluation was not being considered; as he sees it:
The two options on the table are:
1) Allow references to any value that has been provided in any way
2) Allow references only to parameters to the left
Option 2 is a simple SyntaxError on compilation (you won't even get as far as the def statement). Option 1 allows everything all up to the point where you call it, but then might raise UnboundLocalError if you refer to something that wasn't passed.
The permissive option allows mutual references as long as one of the arguments is provided, but will give a peculiar error if you pass neither. I think this is bad API design.
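The behavior of the permissive option can be approximated in current Python. This is a sketch only: the _UNSET sentinel is an assumption, and the explicit raise imitates the UnboundLocalError that the proposal would produce when a=>b+1 is evaluated before b has a value.

```python
_UNSET = object()  # hypothetical sentinel meaning "no argument was passed"

def puzzle(*, a=_UNSET, b=_UNSET):
    # Stands in for: def puzzle(*, a=>b+1, b=>a+1)
    # Defaults are evaluated left to right, so a's expression runs first.
    if a is _UNSET:
        if b is _UNSET:
            # Under the proposal, b would still be an unbound local here.
            raise UnboundLocalError("b referenced before it has a value")
        a = b + 1
    if b is _UNSET:
        b = a + 1
    return a, b

print(puzzle(a=1))   # b is derived from a
print(puzzle(b=1))   # a is derived from b
try:
    puzzle()         # neither is passed: the "peculiar error"
except UnboundLocalError as exc:
    print("error:", exc)
```

Passing either argument works, which is what D'Aprano and Demaine wanted to allow; passing neither fails only at call time, which is the API-design problem Angelico objected to.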
Van Rossum pointed out that the syntax-error option would break new ground: "Everywhere else in Python, undefined names are runtime errors (NameError or UnboundLocalError)." Angelico sees the error in different terms, though, noting that mismatches in global and local scope are a syntax error; he gave an example:
>>> def spam():
...     ham
...     global ham
...
  File "<stdin>", line 3
SyntaxError: name 'ham' is used prior to global declaration
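Both positions can be checked with the built-in compile() function. In this sketch, the first snippet is Angelico's example; the second snippet and its names (eggs, undefined_name) are assumptions added to illustrate Van Rossum's point that an undefined name only fails when the code runs.

```python
# Angelico's example: a scope mismatch is rejected by the compiler.
scope_mismatch = """
def spam():
    ham
    global ham
"""
try:
    compile(scope_mismatch, "<example>", "exec")
except SyntaxError as exc:
    rejected_at_compile_time = True
    print("compile-time error:", exc.msg)
else:
    rejected_at_compile_time = False

# An undefined name compiles without complaint and fails only when called.
undefined_use = """
def eggs():
    return undefined_name
"""
namespace = {}
exec(compile(undefined_use, "<example>", "exec"), namespace)
try:
    namespace["eggs"]()
except NameError as exc:
    failed_at_run_time = True
    print("run-time error:", exc)
else:
    failed_at_run_time = False
```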
He also gave a handful of different function definitions that were subtly different using the new feature; he was concerned about the "bizarre inconsistencies" that can arise, because they "are difficult to explain unless you know exactly how everything is implemented internally". He would prefer to see real-world use cases for the feature to decide whether it should be supported at all, but was adamant that the strict left-to-right interpretation was easier to understand:
If this should be permitted, there are two plausible semantic meanings for these kinds of constructs:
1) Arguments are defined left-to-right, each one independently of each other
2) Early-bound arguments and those given values are defined first, then late-bound arguments
The first option is much easier to explain [...]
D'Aprano explained that the examples cited were not particularly hard to understand and fell far short of the "bizarre inconsistencies" bar. There is a clear need to treat the early-bound and late-bound defaults differently:
However there is a real, and necessary, difference in behaviour which I think you missed:
def func(x=x, y=>x) # or func(x=x, @y=x)
The x=x parameter uses global x as the default. The y=x parameter uses the local x as the default. We can live with that difference. We *need* that difference in behaviour, otherwise these examples won't work:
def method(self, x=>self.attr) # @x=self.attr
def bisect(a, x, lo=0, hi=>len(a)) # @hi=len(a)
Without that difference in behaviour, probably fifty or eighty percent of the use-cases are lost. (And the ones that remain are mostly trivial ones of the form arg=[].) So we need this genuine inconsistency.
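The scoping difference that D'Aprano describes can be seen with an emulation in current Python. In this sketch, the early-bound x=x is real syntax, while y=>x is approximated with a hypothetical _UNSET sentinel; the specific values are assumptions chosen for illustration.

```python
x = 10             # module-level x, captured by the early-bound default
_UNSET = object()  # hypothetical sentinel standing in for y=>x

def func(x=x, y=_UNSET):
    # x=x copied the module-level value (10) when the def statement ran.
    # The y default reads the local parameter x when the function is called.
    if y is _UNSET:
        y = x
    return x, y

x = 99  # rebinding the global afterward does not change func's default

print(func())      # x from the definition-time global, y from the local x
print(func(5))     # y follows the x that was passed in
print(func(5, 7))  # both supplied by the caller
```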
As can be seen, D'Aprano prefers a different color for the bikeshed: using "@" to prepend late-bound default arguments. He also said that Angelico had perfectly explained the "harder to explain" option in a single sentence; both are equally easy to explain, D'Aprano said. Beyond that, it does not make sense to "prohibit something as a syntax error because it *might* fail at runtime". In a followup message, he spelled that out further:
We don't do this:
y = x+1 # Syntax error, because x might be undefined
and we shouldn't make this a syntax error
def func(@spam=eggs+1, @eggs=spam-1):
either just because `func()` with no arguments raises. So long as you pass at least one argument, it works fine, and that may be perfectly suitable for some uses.
Winding down
While many of the participants in the threads seem reasonably happy—or at least neutral—on the idea, there is some difference of opinion on the details as noted above. But several thread participants are looking for a more general "deferred evaluation" feature, and are concerned that late-bound argument defaults will preclude the possibility of adding such a feature down the road. Beyond that, Eric V. Smith wondered about how late-bound defaults would mesh with Python's function-introspection features. Those parts of the discussion got a little further afield from Angelico's proposal, so they merit further coverage down the road.
At first blush, Angelico's idea to fix this "wart" in Python seems fairly straightforward, but the discussion has shown that there are multiple facets to consider. It is not quite as simple as "let's add a way to evaluate default arguments when the function is called"—likely how it was seen at the outset. That is often the case when looking at new features for an established language like Python; there is a huge body of code that needs to stay working, but there are also, sometimes conflicting, aspirations for features that could be added. It is a tricky balancing act.
As with many python-ideas conversations, there were multiple interesting sub-threads, touching on language design, how to teach Python (and this feature), how other languages handle similar features (including some discussion of ALGOL thunks), the overall complexity of Python as it accretes more and more features, and, of course, additional bikeshedding over the spelling. Meanwhile, Angelico has been working on a proof-of-concept implementation, so PEP 671 (et al.) seems likely to be under discussion for some time to come.
