May 6, 2009
This article was contributed by Nathan Willis
Google uses Python for many of its engineering projects, from internal
server monitoring and reporting to outward-facing products like Google
Groups, so it is no surprise that the company wants to improve Python
application performance. A group of Google developers is working on a new
optimization branch of Python dubbed Unladen Swallow, with
the goal of a five-fold speed increase over the trunk. The team plans to reach
that goal by adding just-in-time compilation and a new virtual machine
design, all while retaining source compatibility for Python application
developers.
Unladen Swallow's lead developers Collin Winter, Jeffrey Yasskin, and
Thomas Wouters have long been core developers for the CPython project, the reference
implementation and most widespread interpreter for the Python language.
All three are Google employees, and others contribute their "twenty percent
time" to Unladen Swallow, but the group insists that it is a Python
project, not an effort owned by Google.
Winter said the origin of the idea dates back to his work on the
web-based code review tool Mondrian,
when the team's attempts at optimization repeatedly hit limitations in
CPython, such as the Global Interpreter Lock (GIL), the mutex that prevents
multiple threads from executing Python bytecode in parallel on
multiprocessor or multi-core machines. While researching
potential speed-ups and changes, Winter and the other Google engineers
eventually decided that the long-range ideas they had in mind were
significant enough to warrant making a separate branch. Plus, doing so
would give them the chance to stress-test their ideas before trying to roll
them back into CPython.
The Concept: a bird's eye view
The core of the Unladen Swallow team's planned improvements
is to remove performance bottlenecks in the Python virtual machine (VM)
design, leaving the rest of the interpreter — not to mention the
substantial runtime library — relatively untouched. The long-term
plan is to replace CPython's existing stack-based VM with a
register-based VM built with code from the Low
Level Virtual Machine (LLVM) project, and to add a just-in-time
compiler (JIT) on top of the new VM. Other performance-based improvements
are welcome at the same time, and the team has several in store based on
their talks with heavy Python users.
Using a JIT will speed up execution by compiling to machine code, thus
eliminating the overhead of fetching, decoding, and dispatching Python
opcodes. "In CPython," Winter explained, "this overhead
is significant; some minor tweaks were made to CPython 2.7 that netted a
15% speed-up with relatively little work."
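To make the fetch, decode, and dispatch overhead concrete, here is a minimal sketch of a stack-based interpreter loop. It is a toy written for illustration only: the opcode names (`LOAD`, `ADD`, `STORE`) and the function `run_stack_vm` are hypothetical and are not CPython's real opcodes or eval loop. Each pass through the `while` loop is bookkeeping that compiled machine code would not need.

```python
# Toy stack-based interpreter. Opcode names are invented for this sketch
# and do not correspond to CPython's real bytecode.
def run_stack_vm(code, variables):
    stack = []
    pc = 0            # index of the next instruction
    dispatches = 0    # how many times the loop had to dispatch
    while pc < len(code):
        opcode, arg = code[pc]          # fetch
        pc += 1
        dispatches += 1
        if opcode == "LOAD":            # decode and dispatch
            stack.append(variables[arg])
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "STORE":
            variables[arg] = stack.pop()
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))
    return dispatches


# a = b + c, expressed as four stack instructions
program = [("LOAD", "b"), ("LOAD", "c"), ("ADD", None), ("STORE", "a")]
env = {"b": 2, "c": 3}
steps = run_stack_vm(program, env)
print(env["a"], steps)  # prints: 5 4
```

A single addition costs four trips through the loop here; a JIT aims to replace that loop with native instructions that do the work directly.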
Adding the JIT presents a good opportunity to switch from a stack-based
VM to LLVM's register-based design, which Winter said will net its own
performance benefits. The merits of stack- versus register-based VMs are
still debated, but Winter cites a 2005 study
[PDF] from the Lua project showcasing
the empirical benefits of the register-based design.
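One argument commonly made in that debate is that a register-based design dispatches fewer instructions for the same work, at the cost of wider instructions. The sketch below illustrates the idea with a toy register machine; the instruction layout and the name `run_register_vm` are assumptions made for this example and do not describe LLVM or any real VM.

```python
# Toy register-style interpreter. Each instruction names its destination
# and both operands, so no separate load or store steps are needed.
def run_register_vm(code, registers):
    dispatches = 0
    for opcode, dest, src1, src2 in code:
        dispatches += 1
        if opcode == "ADD":
            registers[dest] = registers[src1] + registers[src2]
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))
    return dispatches


# The same statement, a = b + c, in both styles (hypothetical encodings).
stack_program = [("LOAD", "b"), ("LOAD", "c"), ("ADD", None), ("STORE", "a")]
register_program = [("ADD", "a", "b", "c")]

regs = {"a": 0, "b": 2, "c": 3}
register_steps = run_register_vm(register_program, regs)
print(len(stack_program), register_steps, regs["a"])  # prints: 4 1 5
```

Whether fewer, wider instructions win in practice depends on the workload and the implementation, which is why the question is settled by measurement rather than by counting instructions in a toy like this one.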
Unladen Swallow is based on Python 2.6.1, which is not the most recent
release. CPython 3.0, released in
December of 2008, implements the backward-incompatible
3.0 version of the language. Because the majority of Python code in the wild
— and in use at Google — is still written for Python 2.x, the
Unladen Swallow team decided to focus its efforts on the earlier version
where more benefits would be felt. Because Unladen Swallow uses the CPython
source as its base, Python users can expect it to retain 100% source
compatibility.
Still, Winter said, the team does keep in close contact with Python
designer Guido van Rossum (himself a Google employee) and other members of
the CPython team. "In our discussions with Guido and others about
how and where to merge our changes back into CPython, the idea has been
proposed that Unladen Swallow should merge into 3.x. 3.x is the future of
the language, and if 3.x is significantly faster than 2.x, that's an
obvious incentive to port applications and libraries to 3.x. None of that
is set in stone, and Guido may well change his mind."
Recent sightings
The team has set a tight development schedule
for Unladen Swallow, making quarterly milestone releases. The
first release, 2009Q1, was limited in scope, aiming for a 25 to 35% speed
increase over vanilla CPython by making less than drastic changes to the
code. The changes include a new eval loop reimplemented using vmgen, several
improvements to the garbage collector (notably better tracking of
long-lived objects, so that the collector can make fewer collection runs),
and improvements to the data
serialization module cPickle, which the developers said will benefit web
applications in particular. Several obscure Python opcodes were also
removed and replaced with functionally-equivalent Python functions, which
reduces code size without affecting performance.
Unladen Swallow 2009Q1 is available as source code only for the time
being, and can be checked out as a branch
from the project's public Subversion repository. No specific compilation
instructions are provided because this release closely follows the upstream
CPython, but the developers do recommend building in 64-bit mode in order
to take the fullest advantage of the performance increases.
Since speed of execution is the goal, the team performs regular benchmarks
on the code. The thirteen benchmark tests in the suite are based on
real-world performance tests designed to highlight practical application
tasks, particularly for web applications. The results
of the tests on Unladen Swallow 2009Q1 versus CPython 2.6.1 are posted on
the project wiki; Unladen Swallow ranges from 7.43% faster to 157.17%
faster, beating CPython on every benchmark.
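For readers comparing these percentages with the project's five-fold goal, the following sketch converts between the two forms. It assumes that "N% faster" means the speed ratio is 1 + N/100; the wiki may define its figures differently, so treat the conversion as illustrative. The function names are made up for this example.

```python
# Assumption: "N% faster" means new_speed / old_speed == 1 + N / 100.
def speedup_factor(percent_faster):
    return 1.0 + percent_faster / 100.0


def percent_faster(factor):
    return (factor - 1.0) * 100.0


# The two extremes quoted from the 2009Q1 benchmark results.
for pct in (7.43, 157.17):
    print("%.2f%% faster is about %.2fx the speed" % (pct, speedup_factor(pct)))

# Under the same reading, a five-fold speed-up is a 400% increase.
print(percent_faster(5.0))  # prints: 400.0
```

Under that reading, the best 2009Q1 result is roughly 2.6 times the speed of CPython 2.6.1, about halfway to the five-fold target.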
Work is underway now on Unladen Swallow 2009Q2, which will focus on
replacing the existing CPython VM with an equivalent built using LLVM.
Elsewhere in the ecosystem
Other open source projects have sought to improve Python application
execution using some of the same ideas. Psyco was an earlier JIT for
Python, later superseded by the PyPy project.
PyPy's primary goal is not performance, though; rather, it is to build a
Python implementation in Python itself. Stackless Python implements
concurrency through the use of its own scheduler and special primitives
called "tasklets." Finally, the Parrot project is implementing Python on
its own register-based VM.
In some ways, Unladen Swallow is more ambitious than these other
projects, particularly when you consider the rapid pace of development laid
out in the road map. On the other hand, Unladen Swallow starts from the
CPython 2.6.1 code base, and its team includes many CPython developers, which
greatly improves the chances that its changes will one day be blessed as
the official CPython release. Many of the 2009Q1 changes have already been
sent upstream
to CPython, and the door is still wide open for the 3.0 series should the
JIT and new VM deliver real-world performance increases anywhere
close to the expected 400 percent.