Development
Delayed execution for Python
A discussion about more efficient debug logging on the python-ideas mailing list quickly morphed into thinking about a more general construct for Python: a delayed evaluation mechanism. The basic problem is that Python fully evaluates the arguments to a function even when they end up unused. That is always wasteful, but it only becomes a real performance problem when the arguments are expensive to calculate, where the inefficiency can really add up.
The discussion started with a post from Barry Scott, who was looking for a way to avoid the costly evaluation of arguments to a debug logging routine. The value of the arguments would never be used because debugging was disabled. He had code something like the following:
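```python
debug = False

def expensive():
    return sum(range(10**6))  # hypothetical stand-in for a costly computation

def debuglog(msg):
    if debug:
        print('Debug: %s' % (msg,))

# expensive() runs even though debugging is off and msg is never printed
debuglog('Info: %s' % (expensive(),))
```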
Scott was looking for a way to avoid calling expensive(), since its value would never be used. Several people suggested using lambda to create a "callable" object and passing that to debuglog():
```python
def debuglog(msg):
    if debug:
        if callable(msg):
            msg = msg()  # only now is the message (and expensive()) computed
        print('Debug: %s' % (msg,))

debuglog(lambda: 'Info: %s' % (expensive(),))
```
Python's lambda expressions create a temporary function (i.e. a callable object), so the effect here is to pass a function that builds the string, rather than the string itself. The string is only built, and expensive() only called, inside debuglog() when debugging messages are enabled.
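For comparison, Python's standard logging module already defers part of this work: with a call like logger.debug("Info: %s", value), the % formatting only happens if the record is actually emitted. The argument itself, however, is still evaluated before the call. The sketch below shows both behaviors; the logger name and the call-counting expensive() are hypothetical, and an explicit Logger.isEnabledFor() check is used to skip the costly argument entirely.

```python
import logging

logger = logging.getLogger("debuglog-demo")  # hypothetical logger name
logger.setLevel(logging.INFO)  # debug messages are disabled

calls = []

def expensive():
    calls.append(1)  # count how often the costly work actually runs
    return "costly result"

# logging defers the '%' formatting until a record is emitted, but Python
# still evaluates expensive() before logger.debug() is even called.
logger.debug("Info: %s", expensive())

# An explicit level check avoids evaluating the argument at all.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("Info: %s", expensive())

print(len(calls))  # 1
```

The guard works, but it has to be written at every call site, which is the kind of boilerplate the thread hoped to avoid.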
Victor Stinner suggested using a preprocessor to remove the debug code for production, as PEP 511 (which he authored) would allow if it gets accepted. But Marc-Andre Lemburg said: "preprocessors are evil, let's please not have them in Python :-)". Instead, he recommended using the __debug__ flag and the -O command-line option for production, which will cause Python to not generate any code for if __debug__: blocks. It will, however, also eliminate assert statements, which Stinner saw as a flaw in that plan.
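The trade-off with __debug__ and -O can be observed directly by running the same small program with and without the option. In this sketch, the embedded script is hypothetical; it is run in a fresh interpreter via subprocess, and -E is passed so that a PYTHONOPTIMIZE environment variable cannot change the result.

```python
import subprocess
import sys

# Hypothetical demo script: debug output lives in an "if __debug__:" block,
# and an assert shows that -O removes assertions as well.
SCRIPT = """
def expensive():
    return 'costly result'

if __debug__:
    print('Debug: Info: %s' % (expensive(),))

try:
    assert False, 'assert still active'
except AssertionError:
    print('assert ran')
"""

def run(*flags):
    """Run SCRIPT in a new interpreter with the given flags; return its stdout."""
    result = subprocess.run(
        [sys.executable, "-E", *flags, "-c", SCRIPT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

normal = run()         # debug block and assert both execute
optimized = run("-O")  # both are compiled away; nothing is printed
```

With -O, both the debug block and the assert disappear, which is exactly the side effect Stinner objected to.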
Meanwhile, Steven D'Aprano noted that he has sometimes thought that Python should have some kind of delayed (or lazy) evaluation construct. For illustration, he used a construct like "<#expensive()#>" as a way to create a "lightweight 'thunk' for delayed evaluation". Python does not have thunks, however; D'Aprano believes that adding them would help solve the delayed/lazy evaluation problem:
That thread led Joseph Hackman to propose a new delayed keyword to denote expressions that should be lazily evaluated. Here is an example of how the new keyword would be used:
```
debuglog('Info: %s', (delayed: expensive(),))
```
Hackman went on to describe how he envisioned it working:
Joseph Jevnik pointed to the lazy module, which uses a decorator to identify functions that should be lazily evaluated. David Mertz also mentioned the Dask parallel computing library, which has a delayed() function to indicate deferred evaluation. In the earlier thread, comparisons were made to generators and Future objects, as well. Some of those might serve as inspiration for implementing lazy evaluation in the core language.
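Without new syntax, the closest thing to a thunk in today's Python is an object that wraps a zero-argument callable and evaluates it on first demand. The Thunk class below is a hypothetical sketch of that idea, not D'Aprano's proposed <#...#> construct and not the API of the lazy module or Dask; the key difference from a bare lambda is that the result is cached, so the computation runs at most once.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")
_UNSET = object()  # sentinel meaning "not evaluated yet"


class Thunk(Generic[T]):
    """A deferred computation that runs at most once, on the first force()."""

    def __init__(self, fn: Callable[[], T]) -> None:
        self._fn = fn
        self._value = _UNSET

    def force(self) -> T:
        if self._value is _UNSET:
            self._value = self._fn()  # evaluate on first demand only
        return self._value


calls = []

def expensive() -> int:
    calls.append(1)  # record each real evaluation
    return 42

t = Thunk(expensive)  # nothing is computed yet
first = t.force()     # expensive() runs here
second = t.force()    # cached value; expensive() is not called again
```

Note that the caller still has to call force() explicitly; a language-level construct would presumably decide for itself when the value becomes concrete, which is where the hard questions arise.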
One problem with the proposal was pointed out by D'Aprano: adding new keywords to Python (or any language) is difficult. They are not backward compatible and will break working code. Something like "delayed" is quite likely to have been used by vast numbers of programs as a variable or function name, for example. He also had questions about the semantics of delayed, especially with regard to exactly what would actually trigger the evaluation; that is, when does the value become concrete?
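The compatibility concern is easy to see: neither "delayed" nor "lazy" is reserved today, so existing code may freely use them as names. This sketch checks that with the standard keyword module and compiles a small, hypothetical piece of existing code that would become a syntax error if either word turned into a hard keyword.

```python
import keyword

# Neither name is reserved in current Python.
print(keyword.iskeyword("delayed"), keyword.iskeyword("lazy"))  # False False

# Hypothetical existing code using both names as ordinary identifiers.
SOURCE = """
def delayed(fn):
    return fn

lazy = delayed(lambda: 42)
"""
code = compile(SOURCE, "<existing-code>", "exec")  # compiles today
namespace = {}
exec(code, namespace)
```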
D'Aprano and Chris Angelico asked some questions about how it would all work. Hackman would like to see it remain simple:
Angelico asked about putting delayed objects into collections (e.g. lists, dictionaries); would that cause the object to be evaluated or not? It is an important question, since variable-length argument lists (*args) and keyword arguments (**kwargs) are both packaged into collections before being passed to the function. But Hackman thinks it should all work out (with "some TLC") because the actual, concrete values are not required to put objects into lists or as values in dictionaries.
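Hackman's point can be approximated today by using zero-argument lambdas as stand-ins for delayed objects. In this sketch (all names hypothetical), the callables are packed into the *args tuple and the **kwargs dict without being called; the costly work only happens when the receiver asks for a value.

```python
calls = []

def expensive():
    calls.append(1)  # record every evaluation
    return "costly result"

def receiver(*args, **kwargs):
    # *args arrives as a tuple and **kwargs as a dict; building them
    # does not call the callables stored inside.
    return args, kwargs

args, kwargs = receiver(lambda: expensive(), extra=lambda: expensive())
before = len(calls)  # 0: nothing has been evaluated yet

# Evaluation happens only when a value is explicitly requested.
value = args[0]()
```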
There was some inevitable bikeshedding about names as well. Abe Dillon suggested "lazy" as a shorter keyword. Joshua Morton did some research on GitHub repositories and found that "deferred" seems to be the least used among the terms he tried ("delay, defer, lazy, delayed, deferred, etc."), but it still appears 350,000 times. There was also discussion of whether the colon was needed or desired, without any clear-cut consensus.
In a third thread, Michel Desmoulin described another use case for lazy (or delayed). He has some code that is similar to the following:
```python
val = conf.get('setting', load_from_db('setting'))
```

But that means that the setting is queried from the database each time, even when conf already has a value for it. He would rather do something like:

```
val = conf.get('setting', lazy load_from_db('setting'))
```
That, of course, led to ideas for other ways to approach the problem, but they suffered from a few disadvantages. One could make conf a subclass of dict and define a __missing__() method to call load_from_db(), as Angelico suggested. Another option, proposed by Markus Meskanen, was a different kind of dict subclass. Both of those make some assumptions, as Desmoulin pointed out:
- all code using conf are using load_from_db as a default value;
- load_from_db exists for all code using the conf object
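The difference between the two approaches can be sketched as follows. Here load_from_db() is a hypothetical stand-in that records each "query"; with a plain dict, get() receives an already-computed default, while the __missing__() subclass only queries for absent keys. Caching the loaded value in the dict is an extra design choice of this sketch, not necessarily part of Angelico's suggestion.

```python
calls = []

def load_from_db(key):
    """Hypothetical stand-in for a database query; records every call."""
    calls.append(key)
    return "db:%s" % key

# With a plain dict, the default is computed before get() runs, so the
# "database" is queried even though the key is present.
plain = {"setting": "from-conf"}
val = plain.get("setting", load_from_db("setting"))

class Conf(dict):
    """dict subclass that falls back to load_from_db() for absent keys."""

    def __missing__(self, key):
        # dict.__getitem__ calls this only when the key is absent.
        value = load_from_db(key)
        self[key] = value  # cache it so each key is loaded at most once
        return value

conf = Conf(setting="from-conf")
a = conf["setting"]  # present: no database query
b = conf["other"]    # absent: load_from_db("other") runs
c = conf["other"]    # now cached: no second query
```

One caveat: __missing__() is only consulted by conf[key]; conf.get(key) still returns None (or the supplied default) for an absent key, so code has to use subscripting for the fallback to apply.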
He went on to outline a couple of other features that are not "needed" in Python (e.g. list comprehensions), but he conceded that "you don't need lazy, it's just convenient". He further expanded on that in another post, showing the reasons he thinks that the other solutions, while potentially workable in some cases, are not particularly convenient or elegant:
Yes we can live without it. I mean, Python is already incredibly convenient, of course whatever we suggest now is going to be a cherry on top of the language cake.
Hackman plans to write up a PEP for lazy, though nothing has been posted yet. But Lemburg is unconvinced that the feature can be specified well enough to be useful:
IMO, there are much better ways to write code which only evaluates expensive code when really needed.
I don't see how "lazy" could automate this in a well defined, helpful and obvious way, simply because the side effects of moving evaluation from the place of definition to an arbitrary other place in the code are not easy to manage.
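Lemburg's concern about side effects can be illustrated with plain functions standing in for deferred expressions. In this hypothetical sketch, the same expression is evaluated once at the point of definition and once later, after shared state has changed, and the two results differ.

```python
settings = {"level": "low"}

def describe():
    return "level is %s" % settings["level"]

eager = describe()    # evaluated now, at the point of definition
deferred = describe   # a callable; evaluation happens whenever it is called

settings["level"] = "high"  # state changes before the deferred value is used

print(eager)       # level is low
print(deferred())  # level is high
```

With an explicit callable the reader can at least see where evaluation happens; with an implicit lazy value, the moment of evaluation, and thus the result, would depend on where the value is first used.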
In the end, any new keyword is going to be a hard sell. Some other syntax might make the feature more palatable, but BDFL Guido van Rossum is notoriously averse to operators that seem "Perl-esque", so that will need to be avoided as well. The lazy feature certainly garnered some support in the threads, but few core developers participated, so it is hard to know how the PEP might be received.
Brief items
Development quotes of the week
LLVM 4.0.0 released
The LLVM 4.0.0 release is out. "This release is the result of the community's work over the past six months, including: use of profile data in ThinLTO, more aggressive aggressive dead code elimination, experimental support for coroutines, experimental AVR target, better GNU ld compatibility and significant performance improvements in LLD, as well as improved optimizations, many bug fixes and more." The LLVM compiler project has moved to a new numbering scheme with this release, where the first number increments with each major release.
MATE 1.18 released
Version 1.18 of the MATE desktop has been released. "The release is focused on completing the migration to GTK3+ and adopting new technologies to replace some of deprecated components MATE Desktop 1.16 still relied on."
SciPy 0.19.0
SciPy 0.19.0 has been released. "SciPy 0.19.0 is the culmination of 7 months of hard work. It contains many new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Moreover, our development attention will now shift to bug-fix releases on the 0.19.x branch, and on adding new features on the master branch."
U-Boot v2017.03 is released
U-Boot 2017.03 has been released. "So, some biggish news. As things stand today, this is the last release where the Blackfin and SPARC, unless a new maintainer wants to step in. The currently listed maintainers haven't gotten back to me of late."
Newsletters and articles
Development newsletters
- Emacs news (March 13)
- These Weeks in Firefox (March 14)
- What's cooking in git.git (March 8)
- What's cooking in git.git (March 10)
- What's cooking in git.git (March 13)
- What's cooking in git.git (March 14)
- Git Rev News (March 15)
- OCaml Weekly News (March 14)
- Perl Weekly (March 13)
- PostgreSQL Weekly News (March 12)
- Python Weekly (March 9)
- Ruby Weekly (March 9)
- This Week in Rust (March 14)
- Wikimedia Tech News (March 13)
Haas: Parallel Query v2
Robert Haas describes the many parallelism enhancements in the upcoming PostgreSQL 10 release. "The Gather node introduced in PostgreSQL 9.6 gathers results from all workers in an arbitrary order. That's fine if the data that the workers were producing had no particular ordering anyway, but if each worker is producing sorted output, then it would be nice to gather those results in a way that preserves the sort order. This is what Gather Merge does. It can speed up queries where it's useful for the results of the parallel portion of the plan to have a particular sort order, and where the parallel portion of the plan produces enough rows that performing an ordinary Gather followed by a Sort would be expensive."
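The idea behind Gather Merge, combining several already-sorted streams while preserving the sort order instead of concatenating them and sorting again, can be shown in miniature with Python's heapq.merge(). This is only an analogy with hypothetical worker output, not how PostgreSQL implements the node.

```python
import heapq

# Hypothetical sorted output from three parallel workers.
worker_outputs = [
    [1, 4, 9],
    [2, 3, 10],
    [5, 6, 7, 8],
]

# "Gather" followed by "Sort": collect rows in arbitrary order, then sort them all.
gathered_then_sorted = sorted(row for rows in worker_outputs for row in rows)

# Gather Merge style: repeatedly take the smallest head among the sorted
# streams, so the overall order is preserved without a full re-sort.
merged = list(heapq.merge(*worker_outputs))
```

Both produce the same ordered result, but the merge only ever compares the current head of each stream, which is why it pays off when the parallel portion of the plan produces many rows.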
Page editor: Rebecca Sobol