
Unladen swallow: accelerating Python

May 6, 2009

This article was contributed by Nathan Willis

Google uses Python for many of its engineering projects, from internal server monitoring and reporting to outward-facing products like Google Groups, so it is no surprise that the company wants to improve Python application performance. A group of Google developers is working on a new optimization branch of Python dubbed Unladen Swallow, with the goal of a five-fold speed increase over the trunk. It will achieve that goal by adding just-in-time compilation and a new virtual machine design, all while retaining source compatibility for Python application developers.

Unladen Swallow's lead developers Collin Winter, Jeffrey Yasskin, and Thomas Wouters have long been core developers for the CPython project, the reference implementation and most widespread interpreter for the Python language. All three are Google employees, and others contribute their "twenty percent time" to Unladen Swallow, but the group insists that it is a Python project, not an effort owned by Google.

Winter said the origin of the idea dates back to his work on the web-based code review tool Mondrian, when the team's attempts at optimization repeatedly hit limitations in CPython, such as the Global Interpreter Lock (GIL), the mutex that allows only one thread at a time to execute Python bytecode and so prevents threads from running in parallel on multiprocessor or multi-core machines. While researching potential speed-ups and changes, Winter and the other Google engineers eventually decided that the long-range ideas they had in mind were significant enough to warrant making a separate branch. Doing so would also give them the chance to stress-test their ideas before trying to roll them back into CPython.

The Concept: a bird's eye view

The core of the Unladen Swallow team's planned improvements is to remove performance bottlenecks in the Python virtual machine (VM) design, leaving the rest of the interpreter — not to mention the substantial runtime library — relatively untouched. The long-term plan is to replace CPython's existing stack-based VM with a register-based VM built with code from the Low Level Virtual Machine (LLVM) project, and to add a just-in-time compiler (JIT) on top of the new VM. Other performance-based improvements are welcome at the same time, and the team has several in store based on their talks with heavy Python users.

Using a JIT will speed up execution by compiling to machine code, thus eliminating the overhead of fetching, decoding, and dispatching Python opcodes. "In CPython," Winter explained, "this overhead is significant; some minor tweaks were made to CPython 2.7 that netted a 15% speed-up with relatively little work."
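
To make the opcode overhead concrete, the sketch below uses the standard dis module to list the bytecode instructions that CPython fetches, decodes, and dispatches for a one-line function. The function add is purely illustrative, and the exact opcode names and counts vary between CPython versions.

```python
import dis


def add(a, b):
    # Illustrative function: one line of source, several bytecode instructions.
    return a + b


# Each instruction below is fetched, decoded, and dispatched by the
# interpreter's eval loop; a JIT would instead emit machine code for the
# function and skip that per-instruction dispatch.
instructions = list(dis.get_instructions(add))
opnames = [ins.opname for ins in instructions]

for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
print("bytecode instructions in add():", len(instructions))
```

Even this trivial function costs several trips through the dispatch loop per call, which is the overhead Winter describes.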

Adding the JIT presents a good opportunity to switch from a stack-based VM to LLVM's register-based design, which Winter said will net its own performance benefits. The relative merits of stack- and register-based VMs are the subject of ongoing debate, but Winter cites a 2005 study from the Lua project showcasing the empirical benefits of the register-based design.
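
The difference between the two designs can be sketched with a pair of toy interpreters. This is only an illustration: the opcode names, instruction formats, and helper functions are invented here and are not those of CPython or LLVM. Both programs compute a = b + c; the stack machine needs four dispatches, the register machine one.

```python
def run_stack_vm(program, env):
    """Toy stack machine: operands are pushed and popped implicitly."""
    stack = []
    dispatched = 0
    for op, arg in program:
        dispatched += 1
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            env[arg] = stack.pop()
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return dispatched


def run_register_vm(program, regs):
    """Toy register machine: each instruction names its operands explicitly."""
    dispatched = 0
    for op, dst, src1, src2 in program:
        dispatched += 1
        if op == "ADD":
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return dispatched


# The same statement, a = b + c, encoded for each (hypothetical) machine.
stack_program = [("LOAD", "b"), ("LOAD", "c"), ("ADD", None), ("STORE", "a")]
register_program = [("ADD", "a", "b", "c")]

stack_env = {"b": 2, "c": 3}
register_env = {"b": 2, "c": 3}
stack_count = run_stack_vm(stack_program, stack_env)
register_count = run_register_vm(register_program, register_env)

print("stack VM:    a =", stack_env["a"], "dispatches =", stack_count)
print("register VM: a =", register_env["a"], "dispatches =", register_count)
```

Fewer dispatches is not automatically a win, since register instructions are wider and more costly to decode. That trade-off is what the debate is about.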

Unladen Swallow is based on Python 2.6.1, which is not the most recent release. Python 3.0, which implements the backward-incompatible 3.0 version of the language, was released in December of 2008. Because the majority of Python code in the wild (and in use at Google) is still written for Python 2.x, the Unladen Swallow team decided to focus its efforts on the earlier version, where more benefits would be felt. Because Unladen Swallow uses the CPython source as its base, Python users can expect it to retain 100% source compatibility.

Still, Winter said, the team does keep in close contact with Python designer Guido van Rossum (himself a Google employee) and other members of the CPython team. "In our discussions with Guido and others about how and where to merge our changes back into CPython, the idea has been proposed that Unladen Swallow should merge into 3.x. 3.x is the future of the language, and if 3.x is significantly faster than 2.x, that's an obvious incentive to port applications and libraries to 3.x. None of that is set in stone, and Guido may well change his mind."

Recent sightings

The team has set a tight development schedule for Unladen Swallow, making quarterly milestone releases. The first release, 2009Q1, was limited in scope, aiming for a 25 to 35% speed increase over vanilla CPython without drastic changes to the code. The changes include a new eval loop reimplemented using vmgen, several improvements to the garbage collector (better tracking of long-lived objects, so that the collector can make fewer collection runs), and improvements to the data serialization module cPickle, which the developers said will benefit web applications in particular. Several obscure Python opcodes were also removed and replaced with functionally equivalent Python functions, which reduces code size without affecting performance.

Unladen Swallow 2009Q1 is available as source code only for the time being, and can be checked out as a branch from the project's public Subversion repository. No specific compilation instructions are provided because this release closely follows the upstream CPython, but the developers do recommend building in 64-bit mode in order to take the fullest advantage of the performance increases.

Since speed of execution is the goal, the team performs regular benchmarks on the code. The thirteen benchmark tests in the suite are based on real-world performance tests designed to highlight practical application tasks, particularly for web applications. The results of the tests on Unladen Swallow 2009Q1 versus CPython 2.6.1 are posted on the project wiki; Unladen Swallow ranges from 7.43% faster to 157.17% faster, beating CPython on every benchmark.
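
The percentages above are easier to compare once converted to speed-up factors. The sketch below assumes that "X% faster" means the ratio of old to new running time, minus one; the article does not state the formula, and the timings passed to percent_faster are hypothetical. Under that reading, the project's five-fold goal corresponds to a 400% increase.

```python
def percent_faster(old_seconds, new_seconds):
    """Percent speed increase, assuming (old / new - 1) * 100."""
    return (old_seconds / new_seconds - 1.0) * 100.0


def speedup_factor(percent):
    """Convert a percent speed increase back into a multiplicative factor."""
    return 1.0 + percent / 100.0


# Hypothetical timings: a run that drops from 5 s to 1 s is five times as fast.
print(percent_faster(5.0, 1.0))          # 400.0

# The two extremes reported for 2009Q1, expressed as factors.
print(round(speedup_factor(7.43), 4))    # 1.0743
print(round(speedup_factor(157.17), 4))  # 2.5717
```

On this assumption, the best 2009Q1 benchmark runs a little over two and a half times as fast as CPython 2.6.1, about half of the five-fold target.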

Work is underway now on Unladen Swallow 2009Q2, which will focus on replacing the existing CPython VM with an equivalent built using LLVM.

Elsewhere in the ecosystem

Other open source projects have sought to improve Python application execution using some of the same ideas. Psyco was an earlier JIT for Python, which was later superseded by the PyPy project. PyPy's primary goal is not performance, though; rather, it is to build a Python implementation in Python itself. Stackless Python implements concurrency through the use of its own scheduler and special primitives called "tasklets." Finally, the Parrot project is implementing Python on its own register-based VM.

In some ways, Unladen Swallow is more ambitious than these other projects, particularly when you consider the rapid pace of development laid out in the road map. On the other hand, Unladen Swallow starts from the CPython 2.6.1 code base and involves many CPython developers, which greatly improves the chances that its changes will one day be blessed as part of the official CPython release. Many of the 2009Q1 changes have already been sent upstream to CPython, and the door is still wide open for the 3.0 series should the JIT and the new VM deliver real-world performance increases anywhere close to the expected 400 percent.



Unladen swallow: accelerating Python

Posted May 7, 2009 8:53 UTC (Thu) by djc (subscriber, #56880) [Link]

I wouldn't call Unladen Swallow more ambitious than PyPy. PyPy's approach of allowing Python (or Python-like RPython code) to generate a pretty fast interpreter for any language is rather impressive. Even more so when you factor in that they now have a (prototype of a) JIT compiler that's generated for your interpreter and some extremely fast GC implementations (with which it can perform better than C code on some benchmarks).

Unladen swallow: accelerating Python

Posted May 7, 2009 9:20 UTC (Thu) by niner (subscriber, #26151) [Link] (5 responses)

I wonder why the Unladen Swallow developers start fresh with an LLVM
backend, when with Parrot, there's already an effort under way to do a
register-based VM Python implementation. Seems like duplicated effort for
no obvious gain to me.

Unladen swallow: accelerating Python

Posted May 7, 2009 9:26 UTC (Thu) by brouhaha (subscriber, #1698) [Link] (1 responses)

Does Parrot compile to native code? (I haven't looked at it in that much detail.) LLVM generally does. LLVM also has a large number of optional optimization passes that can be applied.

On the other hand, the semantic level of LLVM is somewhat lower than that of Parrot.

Unladen swallow: accelerating Python

Posted May 11, 2009 19:45 UTC (Mon) by chromatic (guest, #26207) [Link]

Parrot has a nascent JIT, but it doesn't currently compile to native code. Parrot's intent right now is to provide excellent compiler tools so that multiple languages can interoperate at a calling conventions level. Optimization is a secondary priority. (It's still a priority, but it's not the primary priority at the moment.)

Unladen swallow: accelerating Python

Posted May 7, 2009 10:55 UTC (Thu) by Frej (guest, #4165) [Link] (2 responses)

I don't think that using LLVM is starting afresh.
It has been used by Apple in their graphics pipeline since OS X 10.5.
It's the future of some Linux drivers as well, known as Gallium3D. I think it has already been merged....

Or from the horse's mouth... http://llvm.org/Users.html

I'm not saying it's the best or wisest choice. I'm not qualified to do that ;) - just that it's far from starting afresh.

Unladen swallow: accelerating Python

Posted May 7, 2009 11:00 UTC (Thu) by niner (subscriber, #26151) [Link]

I didn't mean starting a register-based VM afresh, but a Python
implementation based on one.

Unladen swallow: accelerating Python

Posted May 7, 2009 14:12 UTC (Thu) by mjthayer (guest, #39183) [Link]

And according to that page, PyPy has LLVM as a target too.

Other Python implementations

Posted May 7, 2009 9:57 UTC (Thu) by epa (subscriber, #39769) [Link] (4 responses)

The article forgot to mention IronPython, which runs on the CLI virtual machine (.NET / Mono). Although perhaps that just runs on top of a virtual machine, rather than compiling Python programs directly into VM opcodes?

Shed Skin accepts a subset of Python and translates it to C++ which is then compiled to pure native code. It would be interesting to see how its performance compares to Unladen Swallow.

Other Python implementations

Posted May 7, 2009 14:03 UTC (Thu) by k8to (guest, #15413) [Link] (2 responses)

Shed Skin is useless. It pretends Python is a statically typed language.

Other Python implementations

Posted May 22, 2009 14:13 UTC (Fri) by pboddie (guest, #50784) [Link] (1 responses)

Useless? If you consider the restrictions of Shed Skin and those of RPython, which is used as the implementation language in PyPy, there's a lot of overlap.

The sad thing is that because of people going round and pointing the finger at numerous projects claiming that they're "useless", progress on some of the more promising ones has been very slow. I'm not convinced that whole-program analysis will give the best bang for the buck with Python, but given that the author of Shed Skin is, as far as I'm aware, the only guy really doing anything in this area in Python and in public, calling it "useless" is just a continuation of the trend of narrow-mindedness that pushes everything but the current "favourite" to the margins, leading the developers of these marginalised projects to make pessimistic multi-year estimates about when their projects will supposedly be "useful" enough for the finger-pointers.

Other Python implementations

Posted Jun 2, 2009 14:18 UTC (Tue) by k8to (guest, #15413) [Link]

Okay, how about: Shed Skin is useless for real-world Python software.

It might make a useful tool for writing small subsets of Python code for special purposes. RPython, for example, is not going to be adopted outside the PyPy world because it's far less useful than Python for writing real code.

People typically put forth Shed Skin as a general Python implementation, which it isn't. As a general Python implementation it is useless, because it is not one.

Maybe it's the next generation of Pyrex.

Other Python implementations

Posted May 8, 2009 18:49 UTC (Fri) by amk (subscriber, #19) [Link]

Of some relevance: video of the "Python VMs" panel discussion at PyCon 2009 is at http://blip.tv/file/1947197/ .

Unladen swallow: accelerating Python

Posted May 7, 2009 12:52 UTC (Thu) by faassen (guest, #1676) [Link] (1 responses)

"In some ways, Unladen Swallow is more ambitious than these other projects"

Perhaps considering the aggressive road map, as you mention, but there is no Python interpreter project more ambitious than PyPy, which is one of the projects you make this comparison with. Unladen Swallow is an incremental improvement project. PyPy is a conceptual-rethink, giant-leap-forward style of project. Unladen Swallow is therefore far less risky, but I'd also call PyPy rather more ambitious in comparison.

Only in some ways

Posted May 8, 2009 21:03 UTC (Fri) by man_ls (guest, #15091) [Link]

The author said "In some ways"; don't know about PyPy, but I'd say that a target of a 400% speed increase is pretty ambitious. Integration in Python 3.x seems quite ambitious too.

Unladen swallow: accelerating Python

Posted May 7, 2009 21:35 UTC (Thu) by kune (guest, #172) [Link] (1 responses)

Unladen Swallow does support Python C modules. All the other implementations (IronPython, Jython and PyPy) don't, and they are not decidedly faster than CPython. The developers from Google looked at Jython, but the missing C module support was a major reason not to follow that road.

It's certainly super-smart to address the compatibility issue by starting with the CPython source code. It also has the nice effect that patches are merged upstream. The next CPython releases will have improved performance based on work done in the project.

Whether the LLVM implementation will really lead to the performance targets set by the project is an open question. A concern is also that LLVM requires a C++ compiler, making support on exotic platforms more difficult.

Regardless of those concerns Unladen Swallow is a project worth being started.

Unladen swallow: accelerating Python

Posted May 8, 2009 2:40 UTC (Fri) by jamesh (guest, #1159) [Link]

It will be interesting to see how much compatibility they maintain as they move forward with trying to remove the global interpreter lock.

There are many C extensions that depend on the GIL for safe operation. Borrowing references, using it to synchronise access to their own state, etc. Much of the C API is written such that it requires the GIL to be used safely too.

I do think that removing the GIL is a worthy goal if they can achieve it without significant performance decrease, but it is one area where it will be difficult to keep compatibility.

Unladen swallow: accelerating Python

Posted May 7, 2009 23:43 UTC (Thu) by dag- (guest, #30207) [Link] (1 responses)

It would be sad if existing performance improvements from Unladen Swallow based on Python 2.x would not go into, e.g., a Python 2.7 release. Especially as a lot of Python 2 applications may be around for a long time.

If marketing reasons would bring it to only Python 3.x, I guess the sentiment towards Python in general could backfire in the community. Nobody likes to be forced (not even gently) if not needed.

Unladen swallow: accelerating Python

Posted May 8, 2009 6:33 UTC (Fri) by kune (guest, #172) [Link]

A number of patches have already gone upstream and will be in 2.7 and 3.x. The issue is the integration of LLVM.


Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds