Reducing Python's startup time
The startup time for the Python interpreter has been discussed by the core developers and others numerous times over the years; optimization efforts are made periodically as well. Startup time can dominate the execution time of command-line programs written in Python, especially if they import a lot of other modules. Python's startup time is worse than that of some other scripting languages, and more recent versions of the language take more than twice as long to start up as earlier versions (e.g. 3.7 versus 2.7). The most recent iteration of the startup time discussion has played out in the python-dev and python-ideas mailing lists since mid-July. This time, the focus has been on the collections.namedtuple() data structure that is used in multiple places throughout the standard library and in other Python modules, but the discussion has been more wide-ranging than simply that.
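For readers who want a feel for the numbers involved, the following is a rough sketch of one way to measure interpreter startup on a given machine. It is not the methodology used in the mailing-list threads; the helper name startup_seconds() is made up for illustration, and the timings include the operating system's process-creation overhead, so they are only useful for comparisons on the same system.

```python
import statistics
import subprocess
import sys
import time


def startup_seconds(extra_args=(), runs=10):
    """Return the median wall-clock time to start the interpreter and exit.

    Hypothetical helper for illustration; the result also includes the
    cost of spawning the child process.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # "-c pass" runs an empty program, so nearly all the time is startup.
        subprocess.run([sys.executable, *extra_args, "-c", "pass"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    default = startup_seconds()
    # -S skips the automatic "import site" step at startup.
    no_site = startup_seconds(["-S"])
    print(f"default startup: {default * 1000:.1f} ms")
    print(f"with -S:         {no_site * 1000:.1f} ms")
```

On Python 3.7 and later, running the interpreter with -X importtime (for example, python -X importtime -c "import functools") prints a per-module breakdown of import cost, which is usually a better starting point than whole-process timing when hunting for a slow import.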
A "named tuple" is a way to assign field names to elements in a Python tuple object. The canonical example is to create a Point class using the namedtuple() factory:
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
The elements of the named tuple can then be accessed using the field names (e.g. p.x) in addition to the usual p[0] mechanism.
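A slightly fuller sketch of the same Point example shows that named and positional access are interchangeable and that the result is still an ordinary tuple. Only documented collections.namedtuple features are used; the variable name moved is just for illustration.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)

# Access by field name and by position gives the same values.
assert p.x == p[0] == 1
assert p.y == p[1] == 2

# A named tuple is still a tuple, so unpacking and comparison work as usual.
x, y = p
assert isinstance(p, tuple)
assert p == (1, 2)

# Field names are exposed on the class; instances are immutable,
# so "changing" a field means building a new instance.
print(Point._fields)      # ('x', 'y')
moved = p._replace(x=10)
print(moved)              # Point(x=10, y=2)
```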
A bug filed in November 2016 identified namedtuple() as a culprit in increasing the startup time for importing the functools standard library module. The suggested solution was to replace the namedtuple() call with its equivalent Python code that was copied from the _source attribute of a class created with namedtuple(). The _source attribute contains the pure Python implementation of the named tuple class. Using that code directly eliminates the need to create and execute some of it at import time (which is what namedtuple() does).
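To make the "create and execute code at import time" step concrete, here is a heavily simplified sketch of a class factory that works the same general way: it fills in a source template and runs it with exec(). The templates and the name make_record_class() are hypothetical and are not the standard library's code. The real _source attribute was removed in Python 3.7, so this sketch stores its own copy of the generated text.

```python
from operator import itemgetter

# Hypothetical templates for illustration; the real namedtuple() template
# was considerably longer.
_CLASS_TEMPLATE = """\
class {typename}(tuple):
    __slots__ = ()
    _fields = {fields!r}

    def __new__(cls, {args}):
        return tuple.__new__(cls, ({args},))

"""
_FIELD_TEMPLATE = "    {name} = property(itemgetter({index}))\n"


def make_record_class(typename, field_names):
    """Build a tuple subclass by generating source text and exec()ing it."""
    fields = tuple(field_names)
    source = _CLASS_TEMPLATE.format(
        typename=typename, fields=fields, args=", ".join(fields)
    )
    source += "".join(
        _FIELD_TEMPLATE.format(name=name, index=index)
        for index, name in enumerate(fields)
    )
    namespace = {"itemgetter": itemgetter, "__name__": f"record_{typename}"}
    # This is the expensive part: the text is parsed, compiled, and run
    # every time the factory is called, typically at import time.
    exec(source, namespace)
    cls = namespace[typename]
    cls._source = source  # keep the generated text around for inspection
    return cls


Point = make_record_class("Point", ["x", "y"])
p = Point(1, 2)
print(p.x, p[1])      # 1 2
print(Point._source)  # the generated class definition
```

Pasting the printed source into a module in place of the factory call is, in essence, what the bug report proposed for functools: the class is then compiled once along with the rest of the module instead of being generated on every import.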
There are a few problems with that approach, including the fact that any updates or fixes to what namedtuple() produces would not be reflected in functools. Beyond that, though, named tuple developer Raymond Hettinger was not convinced there was a real problem:
Nick Coghlan agreed with Hettinger's assessment:
Hettinger closed the bug, though it was reopened in December to consider a different approach using Argument Clinic and subsequently closed again for more or less the same reasons. That's where it stood until mid-July when Jelle Zijlstra added a comment that pointed to a patch to speed up named tuple creation by avoiding some of the exec() calls. It was mostly compatible with the existing implementation, though it did not support the _source attribute. That led to a classic "bug war", of sorts, where people kept reopening the bug, only to see it be immediately closed again. It is clear that some felt that the arguments for closing the bug were not particularly compelling.
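The general idea of avoiding exec() can be sketched as follows. This is not Zijlstra's patch; it is a minimal illustration, with the hypothetical helper name make_record_class_no_exec(), of building an equivalent tuple subclass from closures and type() so that no source text has to be generated and compiled.

```python
from operator import itemgetter


def make_record_class_no_exec(typename, field_names):
    """Hypothetical helper: build a tuple subclass with type(), no exec()."""
    fields = tuple(field_names)

    def __new__(cls, *args):
        # Only positional arguments are supported in this sketch.
        if len(args) != len(fields):
            raise TypeError(
                f"{typename} expected {len(fields)} arguments, got {len(args)}"
            )
        return tuple.__new__(cls, args)

    def __repr__(self):
        body = ", ".join(f"{n}={v!r}" for n, v in zip(fields, self))
        return f"{typename}({body})"

    namespace = {
        "__slots__": (),
        "_fields": fields,
        "__new__": __new__,
        "__repr__": __repr__,
    }
    # One read-only property per field, each fetching a fixed tuple index.
    for index, name in enumerate(fields):
        namespace[name] = property(itemgetter(index))
    return type(typename, (tuple,), namespace)


Point = make_record_class_no_exec("Point", ["x", "y"])
p = Point(1, 2)
print(p, p.x, p[1])   # Point(x=1, y=2) 1 2
```

The tradeoff is visible even in this small version: there is no generated source to expose as _source, and the constructor takes *args rather than having a real signature with named parameters, which is one thing generated code provides easily.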
After several suggestions that the proper way to override the bug-closing decisions made by Hettinger and Coghlan was to take the issue to python-dev, Antoine Pitrou did just that. According to Pitrou, the two main complaints about the proposed fix were that it eliminated the _source attribute and that "optimizing startup cost is supposedly not worth the effort". Pitrou argued that _source is effectively unused by any Python code that he could find and that startup optimizations are quite useful:
In addition, the _source attribute is something of an odd duck: it would seem to be part of the private interface because it is prefixed with an underscore, yet it is meant to be used as a learning tool, which is not typical for Python objects. The underscore was used so that source could be used as a tuple field name but, as Hettinger noted, it probably should have been named differently (e.g. source_). But he is adamant that there are benefits to having that attribute, mostly from a learning and understanding standpoint.
Ever the pragmatist, Guido van Rossum offered something of a compromise. He agreed with Pitrou about the need to optimize named tuple class creation, but hoped that it would still be possible to support Hettinger's use case:
[...] Concluding, I think we should move on from the original implementation and optimize the heck out of namedtuple. The original has served us well. The world is constantly changing. Python should adapt to the (happy) fact that it's being used for systems larger than any of us could imagine 15 years ago.
As might be guessed, a pronouncement like that from Van Rossum, Python's benevolent dictator for life (BDFL), led Hettinger to reconsider: "Okay, then Nick and I are overruled. I'll move Jelle's patch forward. We'll also need to lazily generate _source but I don't think that will be hard." He did add "one minor grumble", however, regarding the complexity of the CPython code:
That tradeoff between complexity and performance is one that has played out in many different development communities over the years—the kernel community faces it regularly. Part of the problem is that the negative effects of a performance optimization may not be seen for a long time. As Coghlan put it:
Van Rossum's pronouncement set off a predictable bikeshedding frenzy around named tuple enhancements that eventually moved to python-ideas and may be worthy of a further look at some point. But there was also some pushback regarding Hettinger's repeated contention that shaving a few milliseconds here and there from the Python startup time was not an important goal. As Barry Warsaw said:
Gregory P. Smith pointed to the commonly mentioned command-line utilities as one place where startup time matters, but also described another problematic area:
[...] In real world applications you do not control the bulk of the code that has chosen to use namedtuple. They're scattered through 100-1000s of other transitive dependency libraries (not just the standard library), the modification of each of which faces hurdles both technical and non-technical in nature.
The discussion (and a somewhat dismissive tweet from Hettinger [Note: Hettinger strongly disclaims the "dismissive" characterization.]) led Victor Stinner to start a new thread on python-dev to directly discuss the interpreter startup time, separate from the named tuple issue. He collected some data that showed that the startup time for the in-development Python 3.7 is 2.3 times longer than Python 2.7. He also compared the startup of the Python-based Mercurial source code management system to that of Git (Mercurial is 45 times slower) as well as comparing the startup times of several other scripting languages (Python falls into the middle of the pack there). In the thread, Pitrou pointed out the importance of "anecdotal data", which Hettinger's tweet had dismissed:
Python has come a long way from its roots as a teaching language. There is clearly going to be some tension between the needs of languages geared toward teaching and those of languages used for production-quality applications of various kinds. That means there is a balance to be struck, which is something the core developers (and, in particular, Van Rossum) have been good at over the years. One suspects that startup time—and the named tuple implementation—can be optimized without sacrificing that.
