From late-bound arguments to deferred computation, part 1

By Jake Edge
August 16, 2022

Back in November, we looked at a Python proposal to have function arguments with defaults that get evaluated when the function is called, rather than when it is defined. The article suggested that the discussion surrounding the proposal was likely to continue on for a ways—which it did—but it had died down by the end of last year. That all changed in mid-June, when the already voluminous discussion of the feature picked up again; once again, some people thought that applying the idea only to function arguments was too restrictive. Instead, a more general mechanism to defer evaluation was touted as something that could work for late-bound arguments while being useful for other use cases as well.

Background

Python programmers can specify default values for function arguments, which get used if no value is passed for the argument when the function is called. But that default value is determined by evaluating the expression at definition time, so the following will not do what is likely intended:

    def fn(a, max=len(a)):
        ...
That might look like max will default to the length of the argument a, but in reality it will default to the length of some other a, found in an enclosing scope at the time the definition is executed. If no such a exists, executing the definition will raise a NameError.
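A minimal sketch of that behavior follows. The module-level list a and the helper g are illustrative names only; the point is that len(a) runs once, when the def statement executes, and never again.

```python
# Illustrative module-level name; the default expression below binds to it.
a = [1, 2, 3]

def fn(a, max=len(a)):      # len(a) is evaluated here, using the outer list
    return max

print(fn([]))               # 3, not len([])
print(fn.__defaults__)      # (3,)

# With no outer name available, the definition itself fails.
try:
    exec("def g(b, size=len(b)): pass", {})
    definition_failed = False
except NameError:
    definition_failed = True

print(definition_failed)    # True
```

The stored value in fn.__defaults__ shows that the default is a plain object computed ahead of time, not an expression that is re-run on each call.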

Chris Angelico wanted to provide a mechanism to tell the interpreter to evaluate the default expression when the function was called, which would fix that particular problem. He proposed using "=>" to indicate a late-bound default, so the definition of max would change to max=>len(a) in the example above. To that end, he authored PEP 671 ("Syntax for late-bound function argument defaults") in October and posted it to the python-ideas mailing list, which set off the first round of discussion that was covered in our article.

On December 1, he announced an updated version of the PEP, which set off an even longer discussion thread. The level of opposition to the feature had grown in the interim, it would seem. For some, the feature did not truly provide enough "bang for the buck"; there are simply not enough real-world uses of it to offset the cost in terms of development and maintenance. Beyond that, adding a feature like this increases the cognitive load of the language, which affects existing developers as well as those just learning Python.

But there was also a possible alternative that kept cropping up in the various discussions: deferred computation. Having a way to specify that an expression should not be evaluated until its value is truly needed would provide a more general solution—or so it is argued. But no PEP, or even concrete description, for deferred computation emerged, so proponents of PEP 671 were often frustrated by the amorphous nature of the supposed alternative. Things got rather heated in the discussion, to the point that a list moderator placed python-ideas in moderated mode for a day, but it all wound down without any sort of resolution in mid-December.

More discussion

That all changed in mid-June when the thread was revived by two posts, one from "Bluenix" opposed to the PEP and another by Steve Jorgensen strongly in favor of it. Angelico suggested that Bluenix create their own PEP "instead of just hating on my proposal"; he noted that it is more work to do so, of course, "but it would also be a lot more useful".

In response to Jorgensen, Rob Cliffe asked about the status of the PEP. Angelico said that he had wearied of the earlier discussion, so he had back-burnered the PEP; he had suggestions for how to potentially revive it, though it would seem that folks decided on more python-ideas discussion instead. That discussion got heated and contentious at times, as well; it seems clear that all "sides" have dug into their positions and there is little—or no—room for changing minds.

Another use case where Angelico sees value for late-bound defaults is with a default collection such as a list. The "obvious" way to write that probably does not do what the programmer expects:

    # naive version
    def fn(arg=[]):    # creates the list once, at definition time
        ...

    # better version, but more verbose
    def fn(arg=None):
        if arg is None:
            arg = []   # creates the list each time
        ...

    # PEP 671 version
    def fn(arg=>[]):
        ...
The "naive" version will reuse the same list every time fn() is called without passing arg; the other two would create a new empty list every time. The extra code needed in the second version could be eliminated with PEP 671, but "I don't find that burdensome enough to want syntax", Stephen J. Turnbull said. Paul Moore agreed:
It's somewhat attractive in theory, adding a short form for something which people do tend to write out "longhand" at the moment. But the saving is relatively small, and there are a number of vaguely annoying edge cases that probably don't come up often, but overall just push the proposal into the "annoyingly complicated" area. The net result is that we get something that *might* help with a minor annoyance, but the cost in (theoretical, but necessary) complexity is just a bit too high.
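The difference between the naive and the more verbose versions can be observed directly in current Python. This is only a sketch; the function names naive and with_sentinel are made up for illustration.

```python
def naive(arg=[]):
    # The same list object is reused on every call that omits arg.
    arg.append(1)
    return arg

def with_sentinel(arg=None):
    # A fresh list is created on each call that omits arg.
    if arg is None:
        arg = []
    arg.append(1)
    return arg

first = naive()
second = naive()
print(second)             # [1, 1]
print(first is second)    # True: both names refer to the shared default

print(with_sentinel())    # [1]
print(with_sentinel())    # [1]
```

The PEP 671 spelling, arg=>[], is intended to behave like with_sentinel() without the explicit None check.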

One of those complexities is how to handle default values that refer to other arguments. The PEP specifies that late-bound defaults are evaluated in left-to-right order as specified in the function definition, but even then there are oddities, such as:

    def fn(n=>len(items), items=[]):
        ...
If items were a late-bound default, a call to fn() with no arguments would be illegal, because of the left-to-right rule. The PEP waffles a bit on what should happen with a (confusing) function definition like that; "implementations MAY choose to do so in two separate passes, first for all passed arguments and early-bound defaults, and then a second pass for late-bound defaults". As Moore noted, though, those and other edge cases should be rare.
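The ordering question can be approximated in today's Python with a sentinel. In the sketch below, _UNSET, fn_two_pass, and fn_all_late are hypothetical names, and raising UnboundLocalError in the strict left-to-right case is an assumption about how such a failure would surface rather than something taken from an implementation.

```python
_UNSET = object()   # hypothetical marker meaning "the caller passed nothing"

def fn_two_pass(n=_UNSET, items=[]):
    # Emulates fn(n=>len(items), items=[]) with two passes: items is
    # early-bound, so it already has a value when n is computed.
    if n is _UNSET:
        n = len(items)
    return n

def fn_all_late(n=_UNSET, items=_UNSET):
    # Emulates fn(n=>len(items), items=>[]) strictly left to right:
    # n is computed before items has received its default.
    if n is _UNSET:
        if items is _UNSET:
            raise UnboundLocalError("items has no value yet")
        n = len(items)
    if items is _UNSET:
        items = []
    return n

print(fn_two_pass())               # 0
print(fn_two_pass(items=[1, 2]))   # 2
print(fn_all_late(items=[1, 2]))   # 2

try:
    fn_all_late()
except UnboundLocalError as exc:
    print("error:", exc)
```

Passing items explicitly works in both variants; only the call with no arguments distinguishes them.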

Deferred computation

As before, David Mertz and Steven D'Aprano were the main proponents of at least considering a generalized deferred-computation mechanism before setting the late-bound argument syntax in stone. Mertz said that he still opposed the PEP, but that he might be swayed if a "soft keyword" version was proposed. Soft keywords are new keywords that are added to the language in a way that does not preclude their existing uses as variable and function names. New language keywords are generally avoided because of their impact on existing code, but the switch to a PEG parser for CPython allows context-specific keywords; for example, the match and case keywords for the structural pattern matching feature are soft keywords.

Mertz suggested that a keyword version of late-bound defaults, using later, defer, or delay, for example, would make the function-argument handling less of a special case. That keyword could be used in other contexts if the more-general deferred-computation mechanism appears. But it is clear that some participants were losing their patience with the constant "deferred computation" refrain in the absence of any kind of concrete specification (or even description) of how that might look. Cliffe said that the late-bound defaults and deferred computation (or "lazy evaluation" as he called it) are orthogonal to each other. He thought that raising lazy evaluation in the context of PEP 671 was off topic—or even FUD.

D'Aprano disagreed with the idea that the two are unrelated, however; "Late-bound defaults is a very small subset of lazy evaluation." There is a subtle difference, however, that Angelico raised (and not for the first time): the scope of the default evaluation specified in the PEP is rather different from what a generalized lazy scheme would provide. The PEP specifies that the evaluation of the default expression is done in the function's run-time scope:

    def fn(stuff, max=>len(stuff)):
        stuff.append(1)
        print(max)
So a call to fn([]) would print 0 since that is the length of stuff in the function before any code in it is run. But if max were deferred, thus only evaluated when it was needed for the print() call, it would print 1. There is some further weirdness that was not directly mentioned there, however: in the deferred case, if the print() call is moved up a line to before the append, it would print 0 as well. So the default argument would change value depending on where it is referred to in the function.
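Neither behavior exists as syntax today, but both can be approximated to see the difference. In this sketch, _UNSET, late_bound, and deferred are hypothetical names; the deferred variant uses a lambda as a stand-in for whatever a real deferred object might be.

```python
_UNSET = object()   # hypothetical marker meaning "the caller passed nothing"

def late_bound(stuff, max=_UNSET):
    # PEP 671 style: the default is computed as the function begins.
    if max is _UNSET:
        max = len(stuff)
    stuff.append(1)
    return max

def deferred(stuff, max=_UNSET):
    # Deferred style: the default is computed only when it is first needed.
    compute_max = (lambda: len(stuff)) if max is _UNSET else (lambda: max)
    stuff.append(1)
    return compute_max()

print(late_bound([]))    # 0: evaluated before the append
print(deferred([]))      # 1: evaluated after the append
print(deferred([], 5))   # 5: an explicit argument is unaffected
```

Moving the compute_max() call above the append in deferred() would make it return 0 as well, which is the position-dependent behavior described above.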

To a certain extent, D'Aprano waved away that objection. Angelico responded to that, clearly frustrated, by demanding that D'Aprano actually go implement what he was describing; "You'll find it's a LOT more problematic than you claim." He also suggested that D'Aprano simply specify what a deferred evaluation object would look like for Python and how it could be used to implement late-bound defaults, but Angelico's rhetoric gets in the way of what he is trying to say, as Brendan Barnwell pointed out. Moore also suggested toning things down; he said that Angelico is forgetting that there are other participants who may just be trying to follow along:

Asking for more concrete examples of the proposed alternative is reasonable. Getting frustrated when they are not provided is understandable, but doesn't help. Calling the proposed alternative "vapourware" just doubles down on the uncompromising "implement it or I'll ignore you" stance. And replying with increasingly frustrated posts that end up at a point where people like me can't even work out how we'd go looking to see whether anyone *had* provided concrete examples of deferred evaluation just makes things worse. All of which could have been avoided by simply including an early argument posted here in the PEP, under rejected alternatives, with a link to the post and a statement that "this was proposed as an alternative, but there's not enough detail provided to confirm how it would replace the existing proposal".

Moore's message refers back to an earlier exchange, where D'Aprano pushed for deferred evaluation to be added as part of the "rejected ideas" section of the PEP, which Angelico resisted. But Angelico did give some reasons why he felt the two concepts were not fully compatible:

Generic "lazy evaluation" is sufficient to do *some* of what can be done with late-bound argument defaults, but it is not sufficient for everything [...]

[...] Generic lazy evaluation should be processed at some point where the parameter is used, but late-bound defaults are evaluated as the function begins. They are orthogonal.

At the time, Moore suggested including some of that explanation in the PEP. Had that happened, perhaps some (small part) of the unpleasantness in the thread could have been avoided. Angelico did add some text about deferred evaluation back into the PEP a few days later, though.

Part 2

On June 21, Mertz posted the first version of his proto-PEP for "Generalized deferred computation". It was meant to flesh out the idea of a mechanism to defer the evaluation of Python expressions and to incorporate late-bound defaults as well. Stay tuned for part 2, which looks at the proposal and the reactions to it.



Posted Aug 17, 2022 12:26 UTC (Wed) by cew5550 (guest, #122770)

Given the 'later' keyword mechanism I wonder if this would be possible:

later def f(n):
    pass

Posted Aug 17, 2022 13:08 UTC (Wed) by mathstuf (subscriber, #69389)

Not having looked at any of the PEPs for it…but Python already has deferred evaluation. It even has syntax to say "evaluate here":

x = []
deferred = lambda: len(x)
print(deferred()) # 0
x.append(1)
print(deferred()) # 1

It even does exactly what is wanted (AFAICT) as to the "what if the backing data is changed before asking for the value?" behavior.

Posted Aug 17, 2022 14:16 UTC (Wed) by cew5550 (guest, #122770)

Not quite. To use the example in the article, this will not work:

foo = lambda n: max(n)

def bar(a, foo(a)):
    pass

Posted Aug 17, 2022 16:16 UTC (Wed) by NYKevin (subscriber, #129325)

I think the idea is to do something like this:

def foo(x=lambda: []):
    return x() + [1, 2, 3]

The problem is, for a more complicated function you either end up evaluating x() more than once, or you have to decide when to evaluate it and assign the result to a new variable. If the new syntax is tightly scoped to late-binding of arguments, then having it just be evaluated once at the top of the function is not too bad... but if you want a generalized deferred evaluation syntax, this gets a lot more complicated and weird.

Currently, if you want to have magic stuff happen when you evaluate a variable, that variable has to be a dotted attribute reference (i.e. something like foo.bar). There is no way to override the evaluation of a regular variable by itself, and overriding the evaluation of an attribute is generally understood to be an implementation detail of the owning class or module (i.e. it is expected that the magic still "behaves like an attribute" - doesn't have observable side effects, is cheap to evaluate, only uses data that is owned by the class, etc.). Allowing magic to happen just because you wrote the name of a variable is, IMHO, a bridge too far, because there is no obvious encapsulation boundary behind which the magic can hide.

Of course, languages like Haskell get along just fine with stuff like this. But Haskell was designed from the ground up to have lazy evaluation and referential transparency. Python, by contrast, is very much an impure language, and is usually written in an imperative style. I find it hard to believe that they're going to introduce impure lazy evaluation without causing a lot of headaches and confusion.

Posted Aug 17, 2022 20:30 UTC (Wed) by barryascott (subscriber, #80640)

And now I cannot call foo with my list as foo wants a function.

foo([1,2,3])

The cure you offer is worse than the status quo.

Barry

Posted Aug 17, 2022 20:46 UTC (Wed) by amarao (guest, #87073)

Deferred computation may be a cure for slow Python startup. If modules are loaded in deferred mode (meaning that, with import => foo, foo is not loaded until it is used), we might shrink script start time down from a shameful 300ms...

Posted Aug 17, 2022 21:17 UTC (Wed) by mb (subscriber, #50428)

Long Python startup times are essentially a fixed problem of the distant past. Python used to take several hundred milliseconds to start, 10 years ago. Today:
$ time python3 -c 'import sys; sys.exit(0)'
real    0m0,016s

Posted Aug 18, 2022 8:28 UTC (Thu) by amarao (guest, #87073)

It's an artificial example with a single builtin import.

Try this:

```
import yaml
import requests
import pytest
import argparse
import collections
```

On my machine:

time python3 1.py

real 0m0.345s
user 0m0.318s
sys 0m0.024s

For comparison:

time vim --help >/dev/null
real 0m0.007s
user 0m0.007s

And with Python, every additional import slows things down. If I add two or three more modules to the imports, I get to 0.5s for an otherwise empty module (imports only). I think the dist-packages are the worst. Python's built-in batteries are often simple modules (in /usr/lib/python3/), but all installed packages are _packages_, and they are hellishly slow to import.

Maybe deferring imports could shrink those shameful 300ms down to something < 100ms (on real scripts with a normal number of imports).

Posted Aug 18, 2022 18:35 UTC (Thu) by jwilk (subscriber, #63328)

10 years ago people were using mostly Python 2.7, which is still much faster at starting than Python 3.10. (And much slower than bash or perl.)

Posted Aug 17, 2022 21:26 UTC (Wed) by marcH (subscriber, #57642)

> Cliffe said that the late-bound defaults and deferred computation (or "lazy evaluation" as he called it) are orthogonal to each other.

I admittedly got lost in the subtle differences and lost track of who calls what how. On the other hand, Cliffe is certainly not the only one to use the "lazy evaluation" terminology. According to Wikipedia, it has been in use since the 70's.

Posted Aug 17, 2022 22:13 UTC (Wed) by iabervon (subscriber, #722)

I really think the syntax should be:

def foo(x?, max?):
  x ?= []
  max ?= len(x)
  ...

That is, x is a local, and it starts out unassigned unless the caller provides a value. If it is unassigned (which is different from having any value at all), it will be assigned to an expression evaluated during the execution of the function in the scope of the function, using a distinctive statement that help() can identify if it occurs in straight-line code at the beginning of the function. Also, this syntax matches the explanatory pseudocode in the "how to teach this" section with respect to where everything is, and doesn't look like anything that deferred evaluation would be related to.

Posted Aug 18, 2022 12:12 UTC (Thu) by tialaramex (subscriber, #21167)

Unlike most of the proposals I don't instinctively hate this.

It solves the stated problem narrowly, rather than adding a bunch of extra features which I'm sure somebody wants but aren't actually related to this and should argue their own merits.

It looks like what's actually happening, when the function call executes we perform these assignments, so, logically, they happen then and not during some earlier phase. If you mean to do A, then B, then C, as code you can just write that, and if a new programmer writes C, then B, then A and is surprised, they can be told that it happens in the order they wrote, rather than referred to a sub-sub-paragraph of some document they've never seen as would inevitably happen if foo(x => [], max => len(x)) has different behaviour from bar(max=> len(x), x => []) or worse, has different behaviour only on some implementations or in some circumstances.

However, one remaining thing I don't like is "unassigned". Magic extra states are bad, and this looks like a magic extra state. That's what Hoare's billion dollar mistake is, in essence; let's try not to keep doing that. If I have a boolean, and it's not true, it should be false (not "it should compare false" or "it should act as false in boolean context" or other weasel words; it should actually be false). A boolean having three (or more!) possible states is obviously going to result in mistakes. I'm not sure how to avoid that here.

Posted Aug 18, 2022 13:38 UTC (Thu) by NRArnot (subscriber, #3033)

I also don't instinctively dislike this. Is it actually a new state, or is it a singleton value that's distinct from None only in that it's not intended to be explicitly referred to? It exists behind the scenes and if it did have a name _Unassigned, then these two would be equivalent
    arg ?= default
    arg = arg if arg is not _Unassigned else default
(cf. the obscure fun with object identity for small integers versus large ones). Or just stick to None and hide it behind the question marks. It matters only if somebody wants to explicitly pass None as a value through such a function as a non-default value. Which is sort of perverse, and it would be explicitly documented as "you can't do this". The language might even be defined such that explicitly passing None to a question-marked argument immediately raises an exception. No compatibility issues, since the question-marked argument language feature doesn't yet exist.

Posted Aug 18, 2022 16:49 UTC (Thu) by iabervon (subscriber, #722)

It's not actually magic state that's new to Python. You can get unbound locals today:

def a():
  if b():
    x = c()
  ...
  if d():
    print(x)
  ...

If b() returns a false value, x will be unbound, and if d() then returns a true value, the access will raise an UnboundLocalError (which is a good property in magic extra state: observing it is an unsubtle error with no way of continuing the expression despite it). My proposal would just add another way to get into that situation, and a way to handle being in it cleanly. It would mean that

def f(x?):
   ...

is effectively

def f(...):
  if [the caller provided a value for x]:
    x = [that value]

with only the parts in brackets being things you can't write yourself today. (Those being what people would like to care about without conflating it with anything else.)

Posted Aug 18, 2022 18:46 UTC (Thu) by NYKevin (subscriber, #129325)

Problem is, this syntax has already been proposed to mean "if x is None: x = []" to align with the already-in-Python typing.Optional[T] type hint, and the null-coalescing or Maybe-coalescing operators in several non-Python languages. Giving it a totally new unbound-coalescing meaning would be pretty far off the beaten track, especially when you consider that other languages have been moving away from allowing variables to be uninitialized in the first place (C and C++ allow it on pain of UB, most other languages either shout at you at compile time or just zero-initialize everything automatically).


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds