Development

Delayed execution for Python

By Jake Edge
March 15, 2017

A discussion about more efficient debug logging on the python-ideas mailing list quickly morphed into thinking about a more general construct for Python: a delayed evaluation mechanism. The basic problem is that Python will fully evaluate arguments to functions even when they will end up being unused. That is always inefficient but may not seriously affect performance except for expensive argument calculations, where that inefficiency can really add up.

The discussion started with a post from Barry Scott, who was looking for a way to avoid the costly evaluation of arguments to a debug logging routine. The value of the arguments would never be used because debugging was disabled. He had code something like the following:

    debug = False

    def debuglog(msg):
        if debug:
            print('Debug: %s' % (msg,))

    debuglog('Info: %s' % (expensive(),))

A way to avoid the call to expensive(), since the value would never be used, is what Scott was seeking. Several people suggested using lambda to create a "callable" object and to pass that to debuglog():

    def debuglog(msg):
        if debug:
            if callable(msg):
                msg = msg()
            print('Debug: %s' % (msg,))

    debuglog(lambda: 'Info: %s' % (expensive(),))

Python's lambda expressions create an anonymous function (i.e. a callable object), so the effect here is to pass a function that builds the string rather than the string itself. That function will only be called in debuglog() if debugging messages are enabled, thus avoiding the expensive() call when it is not needed.
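A minimal runnable sketch of the difference follows. The call counter and the body of expensive() are stand-ins invented for illustration; only debuglog() mirrors the code above.

```python
debug = False
calls = 0  # counts how often expensive() actually runs (illustration only)

def expensive():
    """Stand-in for a costly computation."""
    global calls
    calls += 1
    return sum(range(1000))

def debuglog(msg):
    if debug:
        # Accept either a ready-made string or a zero-argument callable.
        if callable(msg):
            msg = msg()
        print('Debug: %s' % (msg,))

# Eager: the argument is built, and expensive() runs, before debuglog()
# is even entered.
debuglog('Info: %s' % (expensive(),))
eager_calls = calls

# Deferred: only the lambda object is created; its body never runs
# because debug is False.
debuglog(lambda: 'Info: %s' % (expensive(),))
deferred_calls = calls - eager_calls
```

With debugging disabled, the eager call still pays for expensive() once, while the lambda version does not call it at all.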

Victor Stinner suggested using a preprocessor to remove the debug code for production as PEP 511 (which he authored) would allow if it gets accepted. But Marc-Andre Lemburg said: "preprocessors are evil, let's please not have them in Python :-)". Instead, he recommended using the __debug__ flag and the -O command-line option for production, which will cause Python to not generate any code for if __debug__: blocks. It will, however, also eliminate assert statements, which Stinner saw as a flaw in that plan.
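As a rough sketch of the __debug__ approach, the guard moves to the call site. The expensive() body and the counter are again invented for illustration.

```python
calls = 0  # counts how often expensive() runs (illustration only)

def expensive():
    """Stand-in for a costly computation."""
    global calls
    calls += 1
    return 42

def debuglog(msg):
    print('Debug: %s' % (msg,))

# When run with "python -O", __debug__ is False and no code is generated
# for this block, so expensive() is never called. Without -O, __debug__
# is True and the block runs normally.
if __debug__:
    debuglog('Info: %s' % (expensive(),))

expected_calls = 1 if __debug__ else 0
```

The cost is that every logging call needs its own guard, and the choice is made when the interpreter starts rather than at run time.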

Meanwhile, Steven D'Aprano noted that he has sometimes thought that Python should have some kind of delayed (or lazy) evaluation construct. For example purposes, he used a construct like "<#expensive()#>" as a way to create a "lightweight 'thunk' for delayed evaluation". Python does not have thunks, however; D'Aprano believes that adding them would help solve the delayed/lazy evaluation problem:

That's what thunks could give us, if only we had a clear picture of how they would work, when they would be evaluated, and what syntax they should use.

That thread led Joseph Hackman to propose a new delayed keyword to denote expressions that should be lazily evaluated. Here is an example of how the new keyword would be used:

    debuglog('Info: %s' % (delayed: expensive(),))

Hackman went on to describe how he envisioned it working:

Unlike 'lambda' which returns a function (so the receiver must be lambda-aware), delayed execution blocks are for all purposes values. The first time the value (rather than location) is read, or any method on the delayed object is called, the expression is executed and the delayed expression is replaced with the result. (Thus, the delayed expression is only [ever] evaluated once).
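The "evaluated at most once, on first use" part of that description can be approximated in current Python with a wrapper object. The Delayed class below is a hypothetical illustration, not Hackman's design; it only forces the value for str() and attribute access, whereas the proposal would make the delayed object stand in for its value everywhere.

```python
class Delayed:
    """Hypothetical wrapper: runs a zero-argument callable at most once."""

    _UNSET = object()  # sentinel meaning "not evaluated yet"

    def __init__(self, func):
        self._func = func
        self._value = Delayed._UNSET

    def _force(self):
        # Evaluate on first use, then keep the result.
        if self._value is Delayed._UNSET:
            self._value = self._func()
            self._func = None
        return self._value

    def __str__(self):
        return str(self._force())

    def __getattr__(self, name):
        # Only reached for attributes Delayed itself does not have.
        if name.startswith('_'):
            raise AttributeError(name)
        return getattr(self._force(), name)

calls = 0

def expensive():
    """Stand-in for a costly computation."""
    global calls
    calls += 1
    return 'result'

d = Delayed(expensive)
calls_before = calls       # nothing has been evaluated yet
text = 'Info: %s' % (d,)   # %s calls str(d), which forces evaluation
upper = d.upper()          # method call reuses the cached value
```

Creating the wrapper costs almost nothing; expensive() runs once when the string is formatted and is not run again for the later method call.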

Joseph Jevnik pointed to the lazy module, which uses a decorator to identify functions that should be lazily evaluated. David Mertz also mentioned the Dask parallel computing library, which has a delayed() function to indicate deferred evaluation. In the earlier thread, comparisons were made to generators and Future objects, as well. Some of those might serve as inspiration for implementing lazy evaluation in the core language.

One problem with the proposal was pointed out by D'Aprano: adding new keywords to Python (or any language) is difficult. They are not backward compatible and will break working code. Something like "delayed" is quite likely to have been used by vast numbers of programs as a variable or function name, for example. He also had questions about the semantics of delayed, especially with regard to exactly what would actually trigger the evaluation; that is, when does the value become concrete?

D'Aprano and Chris Angelico asked some questions about how it would all work. Hackman would like to see it remain simple:

As for what triggers execution? I think everything except being on the right side of an assignment. Even identity. So if a delayed expression would evaluate to None, then code that checks is None should return true. I think this is important to ensure that no code needs to be changed to support this feature.
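The identity requirement is the part that a wrapper object in today's Python cannot satisfy, which suggests why interpreter support would be needed. The sketch below is an observation about current Python using a hypothetical Proxy class, not anything posted in the thread.

```python
class Proxy:
    """Hypothetical wrapper that forwards equality to a computed value."""

    def __init__(self, func):
        self._func = func

    def __eq__(self, other):
        return self._func() == other

p = Proxy(lambda: None)

# Equality can be overridden, so the wrapper can claim to equal None.
equal_to_none = (p == None)

# "is" compares object identity and cannot be overridden, so the
# wrapper is never identical to None.
identical_to_none = (p is None)
```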

Angelico asked about putting delayed objects into collections (e.g. lists, dictionaries); would that cause the object to be evaluated or not? It is an important question, since variable-length argument lists and keyword arguments are both packaged into collections before being passed to the function. But Hackman thinks it should all work out (with "some TLC") because the actual, concrete values are not required to put objects into lists or as values in dictionaries.
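The point about collections can be seen with ordinary callables today: packing arguments into the args tuple and kwargs dictionary only stores references and does not run anything. The thunk() and collect() helpers below are hypothetical names used for illustration.

```python
evaluated = []  # records the order in which thunks are actually run

def thunk(name):
    """Return a zero-argument callable that records when it is run."""
    def run():
        evaluated.append(name)
        return name.upper()
    return run

def collect(*args, **kwargs):
    # args is a tuple and kwargs is a dict; building them only stores
    # references to the callables.
    return list(args) + list(kwargs.values())

items = collect(thunk('a'), thunk('b'), key=thunk('c'))
evaluated_after_packing = list(evaluated)  # nothing has run yet

results = [item() for item in items]       # explicit evaluation
```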

There was some inevitable bikeshedding about names as well. Abe Dillon suggested "lazy" as a shorter keyword. Joshua Morton did some research on GitHub repositories and found that "deferred" seems to be the least used among the terms he tried ("delay, defer, lazy, delayed, deferred, etc."), but it still appears 350,000 times. There was also discussion of whether the colon was needed or desired, without any clear-cut consensus.

In a third thread, Michel Desmoulin described another use case for lazy (or delayed). He has some code that is similar to the following:

    val = conf.get('setting', load_from_db('setting'))

But that means that the setting is queried from the database each time. He would rather do something like:

    val = conf.get('setting', lazy load_from_db('setting'))

That, of course, led to ideas for other ways to approach the problem, but they suffered from a few disadvantages. One could make conf a subclass of dict and define a __missing__() method to call load_from_db(), as Angelico suggested. Or create a different kind of dict subclass, as Markus Meskanen proposed. Both of those make some assumptions as Desmoulin pointed out:

- I have access to the code instantiating conf;
- all code using conf are using load_from_db as a default value;
- load_from_db exists for all code using the conf object
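For reference, a minimal sketch of the __missing__() approach might look like the following. The Conf class, the caching of the loaded value, and the load_from_db() body are assumptions made for illustration. Note that __missing__() is only consulted for conf[key] lookups, not for conf.get().

```python
db_queries = []  # records which keys were looked up (illustration only)

def load_from_db(key):
    """Hypothetical stand-in for a database lookup."""
    db_queries.append(key)
    return 'db-value-for-' + key

class Conf(dict):
    def __missing__(self, key):
        # Called by conf[key] only when the key is absent.
        value = load_from_db(key)
        self[key] = value  # cache the result (an assumption, not required)
        return value

conf = Conf(setting='local')

# Eager default: the database is queried even though 'setting' exists.
eager = conf.get('setting', load_from_db('setting'))
queries_after_eager = len(db_queries)

present = conf['setting']  # found locally, no query
missing = conf['other']    # absent, so __missing__() loads it
again = conf['other']      # now cached, no second query
```

This only works when the code creating conf can be changed to use the subclass, which is the first of the assumptions listed above.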

He went on to outline a couple of other features that are not "needed" in Python (e.g. list comprehensions), but he conceded that "you don't need lazy, it's just convenient". He further expanded on that in another post, showing the reasons he thinks that the other solutions, while potentially workable in some cases, are not particularly convenient or elegant:

lazy is not only practical, but it's also beautiful. It reads well. It solves a problem we all have on a regular basis.

Yes we can live without it. I mean, Python is already incredibly convenient, of course whatever we suggest now is going to be a cherry on top of the language cake.

Hackman plans to write up a PEP for lazy, though nothing has been posted yet. But Lemburg is unconvinced that the feature can be specified well enough to be useful:

For the discussion, it would help if you'd write up a definition of where the lazy evaluation should finally happen, which use cases would be allowed or not and how the compiler could detect these.

IMO, there are much better ways to write code which only evaluates expensive code when really needed.

I don't see how "lazy" could automate this in a well defined, helpful and obvious way, simply because the side effects of moving evaluation from the place of definition to an arbitrary other place in the code are not easy to manage.

In the end, any new keyword is going to be a hard sell. Some other syntax might make the feature more palatable, but BDFL Guido van Rossum is notoriously averse to operators that seem "Perl-esque", so that will need to be avoided as well. The lazy feature certainly garnered some support in the threads, but few core developers participated, so it is hard to know how the PEP might be received.

Brief items

Development quotes of the week

If there’s one thing I’ve learned in open source, it’s this: the more work you do, the more work gets asked of you. There is no solution to that problem that I’m aware of.

— Nolan Lawson

If you’re using a popular programming language in your field, chances are that any problem you encounter has already been solved.

— Curtis Miller

Blessed be the day (last friday actually) when I realized that instead of waiting for "some of those geeks on github" to complete the french translation, I could actually contribute by doing it myself :)

— Nicolas Robadey (Thanks to Tomas Pospisek)

LLVM 4.0.0 released

The LLVM 4.0.0 release is out. "This release is the result of the community's work over the past six months, including: use of profile data in ThinLTO, more aggressive aggressive dead code elimination, experimental support for coroutines, experimental AVR target, better GNU ld compatibility and significant performance improvements in LLD, as well as improved optimizations, many bug fixes and more." The LLVM compiler project has moved to a new numbering scheme with this release, where the first number increments with each major release.

MATE 1.18 released

Version 1.18 of the MATE desktop has been released. "The release is focused on completing the migration to GTK3+ and adopting new technologies to replace some of deprecated components MATE Desktop 1.16 still relied on."

SciPy 0.19.0

SciPy 0.19.0 has been released. "SciPy 0.19.0 is the culmination of 7 months of hard work. It contains many new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Moreover, our development attention will now shift to bug-fix releases on the 0.19.x branch, and on adding new features on the master branch."

U-Boot v2017.03 is released

U-Boot 2017.03 has been released. "So, some biggish news. As things stand today, this is the last release where the Blackfin and SPARC, unless a new maintainer wants to step in. The currently listed maintainers haven't gotten back to me of late."

Newsletters and articles

Haas: Parallel Query v2

Robert Haas describes the many parallelism enhancements in the upcoming PostgreSQL 10 release. "The Gather node introduced in PostgreSQL 9.6 gathers results from all workers in an arbitrary order. That's fine if the data that the workers were producing had no particular ordering anyway, but if each worker is producing sorted output, then it would be nice to gather those results in a way that preserves the sort order. This is what Gather Merge does. It can speed up queries where it's useful for the results of the parallel portion of the plan to have a particular sort order, and where the parallel portion of the plan produces enough rows that performing an ordinary Gather followed by a Sort would be expensive."

Page editor: Rebecca Sobol