Unladen swallow: accelerating Python
Google uses Python for many of its engineering projects, from internal server monitoring and reporting to outward-facing products like Google Groups, so it is no surprise that the company wants to improve Python application performance. A group of Google developers is working on a new optimization branch of Python dubbed Unladen Swallow, with the goal of a five-fold speed increase over the trunk. It will achieve that goal by adding just-in-time compilation and a new virtual machine design, all while retaining source compatibility for Python application developers.
Unladen Swallow's lead developers Collin Winter, Jeffrey Yasskin, and Thomas Wouters have long been core developers for the CPython project, the reference implementation and most widespread interpreter for the Python language. All three are Google employees, and others contribute their "twenty percent time" to Unladen Swallow, but the group insists that it is a Python project, not an effort owned by Google.
Winter said the origin of the idea dates back to his work on the web-based code review tool Mondrian, when the team's attempts at optimization repeatedly hit limitations in CPython, such as the Global Interpreter Lock (GIL), the mutex that keeps threads from executing Python code in parallel on multiprocessor or multi-core machines. While researching potential speed-ups and changes, Winter and the other Google engineers eventually decided that the long-range ideas they had in mind were significant enough to warrant making a separate branch. Doing so would also give them the chance to stress-test their ideas before trying to roll them back into CPython.
The Concept: a bird's eye view
The core of the Unladen Swallow team's planned improvements is to remove performance bottlenecks in the Python virtual machine (VM) design, leaving the rest of the interpreter — not to mention the substantial runtime library — relatively untouched. The long-term plan is to replace CPython's existing stack-based VM with a register-based VM built with code from the Low Level Virtual Machine (LLVM) project, and to add a just-in-time compiler (JIT) on top of the new VM. Other performance-based improvements are welcome at the same time, and the team has several in store based on their talks with heavy Python users.
Using a JIT will speed up execution by compiling to machine code, thus eliminating the overhead of fetching, decoding, and dispatching Python opcodes. "In CPython," Winter explained, "this overhead is significant; some minor tweaks were made to CPython 2.7 that netted a 15% speed-up with relatively little work."
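To make that overhead concrete, the sketch below uses the standard `dis` module to list the opcodes behind a one-line function. It assumes a present-day CPython 3 interpreter rather than the 2.6 code base Unladen Swallow started from, and the function `add_scaled` is a made-up example. Opcode names and counts differ between CPython releases, so the printed listing is illustrative only.

```python
import dis


def add_scaled(a, b, c):
    # Hypothetical example function; any small function would do.
    return a + b * c


# Each Instruction is one bytecode operation that the interpreter's
# eval loop has to fetch, decode and dispatch at run time.
instructions = list(dis.get_instructions(add_scaled))

for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```

Even this single expression expands into several opcodes, each of which passes through the interpreter loop; a JIT aims to replace that loop with machine code for the function.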
Adding the JIT presents a good opportunity to switch from a stack-based VM to LLVM's register-based design, which Winter said will net its own performance benefits. The relative merits of stack- and register-based VMs are a matter of ongoing debate, but Winter cites a 2005 study from the Lua project showcasing the empirical benefits of the register-based design.
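The difference between the two designs can be sketched with a pair of toy interpreters. Both are assumptions made for illustration: the instruction formats, the names `run_stack` and `run_register`, and the temporaries `t0` and `t1` are invented here and do not reflect CPython's or LLVM's actual instruction sets.

```python
def run_stack(program, env):
    """Toy stack machine: operands are pushed and popped implicitly."""
    stack = []
    for op, arg in program:
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "ADD":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        elif op == "MUL":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs * rhs)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack.pop()


def run_register(program, env):
    """Toy register machine: each instruction names its operands and result."""
    regs = dict(env)
    for op, dst, lhs, rhs in program:
        if op == "ADD":
            regs[dst] = regs[lhs] + regs[rhs]
        elif op == "MUL":
            regs[dst] = regs[lhs] * regs[rhs]
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    # The result is whatever the last instruction wrote.
    return regs[program[-1][1]]


env = {"a": 2, "b": 3, "c": 4}

# a + b * c on the stack machine: five instructions.
stack_program = [
    ("LOAD", "a"),
    ("LOAD", "b"),
    ("LOAD", "c"),
    ("MUL", None),
    ("ADD", None),
]

# The same expression on the register machine: two instructions.
register_program = [
    ("MUL", "t0", "b", "c"),
    ("ADD", "t1", "a", "t0"),
]

print(run_stack(stack_program, env), len(stack_program))
print(run_register(register_program, env), len(register_program))
```

The register version dispatches fewer instructions, but each one carries more operands, which is roughly the trade-off the debate turns on. The toy says nothing about how large the effect is in a real VM.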
Unladen Swallow is based on Python 2.6.1, which is not the most recent release. Python 3.0, released in December 2008, implements the backward-incompatible 3.0 version of the language. Because the majority of Python code in the wild (and in use at Google) is still written for Python 2.x, the Unladen Swallow team decided to focus its efforts on the earlier version, where more benefits would be felt. Because Unladen Swallow uses the CPython source as its base, Python users can expect it to retain 100% source compatibility.
Still, Winter said, the team does keep in close contact with Python designer Guido van Rossum (himself a Google employee) and other members of the CPython team. "In our discussions with Guido and others about how and where to merge our changes back into CPython, the idea has been proposed that Unladen Swallow should merge into 3.x. 3.x is the future of the language, and if 3.x is significantly faster than 2.x, that's an obvious incentive to port applications and libraries to 3.x. None of that is set in stone, and Guido may well change his mind."
Recent sightings
The team has set a tight development schedule for Unladen Swallow, making quarterly milestone releases. The first release, 2009Q1, was limited in scope, aiming for a 25 to 35% speed increase over vanilla CPython by making less than drastic changes to the code. The changes include a new eval loop reimplemented using vmgen, several improvements to the garbage collector (better tracking of long-lived objects, so that the collector can make fewer collection runs), and improvements to the data serialization module cPickle, which the developers said will benefit web applications in particular. Several obscure Python opcodes were also removed and replaced with functionally-equivalent Python functions, which reduces code size without affecting performance.
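For readers unfamiliar with the collector being tuned, the sketch below inspects it through the standard `gc` module. It assumes a present-day CPython 3 interpreter and shows only the stock collector's settings, not the Unladen Swallow patches. The variable names are illustrative.

```python
import gc

# CPython's cyclic collector groups tracked objects into generations.
# The thresholds control how often each generation is examined.
thresholds = gc.get_threshold()

# Current allocation and collection counters, one per generation.
counts = gc.get_count()

print("thresholds per generation:", thresholds)
print("current counts per generation:", counts)

# A full collection can also be forced by hand; it returns the number
# of unreachable objects found.
unreachable = gc.collect()
print("unreachable objects found:", unreachable)
```

The fewer times long-lived objects have to be re-examined, the less time a long-running process spends in the collector, which is the kind of saving the 2009Q1 changes were after.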
Unladen Swallow 2009Q1 is available as source code only for the time being, and can be checked out as a branch from the project's public Subversion repository. No specific compilation instructions are provided because this release closely follows the upstream CPython, but the developers do recommend building in 64-bit mode in order to take the fullest advantage of the performance increases.
Since speed of execution is the goal, the team performs regular benchmarks on the code. The thirteen benchmark tests in the suite are based on real-world performance tests designed to highlight practical application tasks, particularly for web applications. The results of the tests on Unladen Swallow 2009Q1 versus CPython 2.6.1 are posted on the project wiki; Unladen Swallow ranges from 7.43% faster to 157.17% faster, beating CPython on every benchmark.
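The percentages are easier to compare with the five-fold goal once they are converted to speed ratios. The article does not state the formula behind the wiki figures, so the sketch below assumes one common reading: "N% faster" means the baseline takes N% longer than the candidate. The function names `percent_faster` and `speed_ratio` are hypothetical.

```python
def percent_faster(baseline_seconds, candidate_seconds):
    """Assumed definition: extra time the baseline needs, relative to the candidate."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0


def speed_ratio(percent):
    """Convert 'N% faster' (under the assumption above) into a speed multiple."""
    return 1.0 + percent / 100.0


# A run that drops from 5 seconds to 1 second is 400% faster, i.e. five-fold.
print(percent_faster(5.0, 1.0))
print(speed_ratio(400.0))

# The best 2009Q1 result quoted above, expressed as a multiple.
print(round(speed_ratio(157.17), 4))
```

Under that assumption, the best 2009Q1 benchmark runs a little over two and a half times as fast as CPython 2.6.1, still well short of the five-fold target.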
Work is underway now on Unladen Swallow 2009Q2, which will focus on replacing the existing CPython VM with an equivalent built using LLVM.
Elsewhere in the ecosystem
Other open source projects have sought to improve Python application execution using some of the same ideas. Psyco was an earlier JIT for Python, but it was later superseded by the PyPy project. PyPy's primary goal is not performance, though; rather, it is to build a Python implementation in Python itself. Stackless Python implements concurrency through the use of its own scheduler and special primitives called "tasklets." Finally, the Parrot project is implementing Python on its own register-based VM.
In some ways, Unladen Swallow is more ambitious than these other projects, particularly when you consider the rapid pace of development laid out in the road map. On the other hand, Unladen Swallow starts from the CPython 2.6.1 code base, and incorporates many CPython developers, which greatly improves the chances that its changes will one day be blessed as the official CPython release. Many of the 2009Q1 changes have already been sent upstream to CPython, and the door is still wide open for the 3.0 series should the JIT and VM performance deliver real-world performance increases anywhere close to the expected 400 percent.
Index entries for this article | |
---|---|
GuestArticles | Willis, Nathan |
Posted May 7, 2009 8:53 UTC (Thu) by djc (subscriber, #56880)
Posted May 7, 2009 9:20 UTC (Thu) by niner (subscriber, #26151)
Posted May 7, 2009 9:26 UTC (Thu) by brouhaha (subscriber, #1698)
On the other hand, the semantic level of LLVM is somewhat lower than that of Parrot.
Posted May 11, 2009 19:45 UTC (Mon) by chromatic (guest, #26207)
Posted May 7, 2009 10:55 UTC (Thu) by Frej (guest, #4165)
Or from the horse's mouth... http://llvm.org/Users.html
I'm not saying it's the best or wisest choice. I'm not qualified to do that ;) - just that it's far from starting afresh.
Posted May 7, 2009 11:00 UTC (Thu) by niner (subscriber, #26151)
Posted May 7, 2009 14:12 UTC (Thu) by mjthayer (guest, #39183)
Posted May 7, 2009 9:57 UTC (Thu) by epa (subscriber, #39769)
Shed Skin accepts a subset of Python and translates it to C++ which is then compiled to pure native code. It would be interesting to see how its performance compares to Unladen Swallow.
Posted May 7, 2009 14:03 UTC (Thu) by k8to (guest, #15413)
Posted May 22, 2009 14:13 UTC (Fri) by pboddie (guest, #50784)
The sad thing is that because of people going round and pointing the finger at numerous projects claiming that they're "useless", progress on some of the more promising ones has been very slow. I'm not convinced that whole-program analysis will give the best bang for the buck with Python, but given that the author of Shed Skin is, as far as I'm aware, the only guy really doing anything in this area in Python and in public, calling it "useless" is just a continuation of the trend of narrow-mindedness that pushes everything but the current "favourite" to the margins, leading the developers of these marginalised projects to make pessimistic multi-year estimates about when their projects will supposedly be "useful" enough for the finger-pointers.
Posted Jun 2, 2009 14:18 UTC (Tue) by k8to (guest, #15413)
It might make a useful tool for writing small subsets of Python code for special purposes. RPython, for example, is not going to be adopted outside the PyPy world because it's far less useful than Python for writing real code.
People put forth Shed Skin typically as a general Python implementation, which it isn't. As a general Python implementation it is useless, because it is not.
Maybe it's the next generation of Pyrex.
Posted May 8, 2009 18:49 UTC (Fri) by amk (subscriber, #19)
Posted May 7, 2009 12:52 UTC (Thu) by faassen (guest, #1676)
Perhaps considering the aggressive road map, as you mention, but there is no Python interpreter project more ambitious than PyPy, which is one of the projects you make this comparison with. Unladen Swallow is an incremental improvement project. PyPy is a conceptual-rethink, giant-leap-forward style of project. Unladen Swallow is therefore far less risky, but I'd also call PyPy rather more ambitious in comparison.
Posted May 8, 2009 21:03 UTC (Fri) by man_ls (guest, #15091)
Posted May 7, 2009 21:35 UTC (Thu) by kune (guest, #172)
It's certainly super-smart to address the compatibility issue by starting with the CPython source code. It also has the nice effect that patches are merged upstream. The next CPython releases will have improved performance based on work done in the project.
Whether the LLVM implementation will really lead to the performance targets set by the project is an open question. A concern is also that LLVM requires a C++ compiler, making support on exotic platforms more difficult.
Regardless of those concerns, Unladen Swallow is a project worth starting.
Posted May 8, 2009 2:40 UTC (Fri) by jamesh (guest, #1159)
There are many C extensions that depend on the GIL for safe operation. Borrowing references, using it to synchronise access to their own state, etc. Much of the C API is written such that it requires the GIL to be used safely too.
I do think that removing the GIL is a worthy goal if they can achieve it without significant performance decrease, but it is one area where it will be difficult to keep compatibility.
Posted May 7, 2009 23:43 UTC (Thu) by dag- (guest, #30207)
If marketing reasons were to bring it only to Python 3.x, I guess the sentiment towards Python in general could backfire in the community. Nobody likes to be forced (not even gently) if not needed.
Posted May 8, 2009 6:33 UTC (Fri) by kune (guest, #172)
backend, when with Parrot, there's already an effort on its way to do a register-based VM Python implementation. Seems like duplicated effort for no obvious gain to me.
It has been used by Apple in their graphics pipeline from OS X 10.5.
It's the future of some Linux drivers as well, known as Gallium3D. I think it has already been merged....
implementation based on one.
The article forgot to mention IronPython, which runs on the CLI virtual machine (.NET / Mono). Although perhaps that just runs on top of a virtual machine, rather than compiling Python programs directly into VM opcodes?
Other Python implementations
The author said "In some ways"; don't know about PyPy, but I'd say that a target of a 400% speed increase is pretty ambitious. Integration in Python 3.x seems quite ambitious too.
Only in some ways