
PyPy: the other new compiler project

By Jonathan Corbet
May 19, 2010
We have recently seen a lot of attention paid to projects like LLVM. Even though the GNU Compiler Collection is developing at a rapid pace, there are people in the community who are interested in seeing different approaches taken, preferably with a newer code base. LLVM is not where all the action is, though. For the last few years (since 2003, actually), a relatively stealthy project called PyPy has been trying to shake up the compiler landscape in its own way.

On the face of it, PyPy looks like an academic experiment: it is an implementation of the Python 2.5 interpreter which is, itself, written in Python. One might thus expect it to be more elegant in its code than the standard, C-implemented interpreter (usually called CPython), but rather slower in its execution. If one runs PyPy under CPython, the result is indeed somewhat slow, but that is not how things are meant to be done. When running in its native mode, PyPy can be surprising.

PyPy is actually written in a subset of Python called RPython ("restricted Python"). Many of the features and data types of Python are available, but there are rules. Variables are restricted to data of one type. Only built-in types can be used in for loops. There is no creation of classes or functions at run time, and the generator feature is not supported. And so on. The result is a version of the language which, while still clearly Python, looks a bit more like C.
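
As a rough illustration (not checked against any particular PyPy release), here is the flavor of code that RPython accepts and rejects:

    # Accepted: each variable keeps a single type throughout.
    def longest_name(names):
        longest = ''
        for name in names:        # iterating over a built-in list is fine
            if len(name) > len(longest):
                longest = name
        return longest

    # Rejected: 'x' changes type, and a function is created at run time.
    def not_rpython(flag):
        x = 1
        if flag:
            x = 'one'             # variable switches from int to str
        def helper():             # run-time function creation
            return x
        return helper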

Running the RPython-based interpreter in CPython is supported; it is fully functional, if a bit slow. Running in this mode can be good for debugging. But the production version of PyPy is created in a rather different way: the PyPy hackers have created a multi-step compiler which is able to translate an RPython program into a lower-level language. That language might be C, in which case the result can be compiled and linked in the usual way. But the target language is not fixed; the translator is able to output code for the .NET or Java virtual machines as well. That means that the PyPy interpreter can be easily targeted to whatever runtime environment works best.
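
To make that concrete, a minimal RPython program ready for translation might look like the sketch below; the details of the target() convention varied between PyPy releases, so treat this as illustrative only:

    import sys

    def entry_point(argv):
        # RPython discipline: 'n' is an integer everywhere it is used.
        n = len(argv) - 1
        print n
        return 0

    def target(driver, args):
        # The translator calls target() to discover the entry point.
        return entry_point, None

    if __name__ == '__main__':
        # Run directly under CPython for debugging, as described above.
        entry_point(sys.argv)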

The result works. It currently implements all of the features of Python 2.5, with very few exceptions. There are some behavioral differences due to, for example, the use of a different garbage-collection algorithm; PyPy can be slower to call destructors than CPython is. Python extensions written in C can be used, though one gets the sense that this feature is still stabilizing. PyPy is able to run complex applications like Django and Twisted. On the other hand, for now, it only runs on 32-bit x86 systems, it is described as "memory-hungry," and Python 3 support seems to be a relatively distant goal.

Beyond that, it's fast. PyPy includes a built-in just-in-time compiler (JIT); it is, in a sense, a platform for the creation of JITs for various targets. The result is an interpreter which, much of the time, is significantly faster than CPython. For the curious, the PyPy Speed Center contains lots of benchmark results, presented in a slick, JavaScript-heavy interface. PyPy does not always beat CPython, but it often does so convincingly, and speed appears to be a top priority for the PyPy developers. The speed of PyPy may eventually prove compelling enough that, as Alex Gaynor suggests, many of us will be using it routinely instead of CPython in the near future.

There are some other interesting features as well. There is a stackless Python mode which supports microthreaded, highly-concurrent applications. There is a sandboxed mode which intercepts all external library calls and hands them over to a separate policy daemon for authorization. And so on.
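
A minimal sketch of the stackless mode, assuming the Stackless Python API (tasklets and channels) that PyPy emulates:

    import stackless

    def worker(channel, n):
        # Each tasklet is a cheap microthread; many thousands can coexist.
        channel.send(n * n)

    def main():
        channel = stackless.channel()
        for i in range(5):
            stackless.tasklet(worker)(channel, i)
        # receive() blocks this tasklet and lets the workers run.
        results = [channel.receive() for _ in range(5)]
        print results

    main()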

What really catches your editor's eye, though, is the concept of PyPy as a generalized compiler for the creation of JITs for high-level languages. The translation process is flexible, to the point that it can easily accommodate stackless mode, interesting optimizations, or experimentation with different language features. The object model can be (and has been) tweaked to support tainting and tracing features. And the system as a whole is not limited to the creation of JIT compilers for Python; projects are underway to implement a number of other languages, including Prolog, Smalltalk, and JavaScript.

It could easily be argued that PyPy incorporates much of the sort of innovation which many people have said never happens with free software. And it is all quite well documented. This is a project which is not afraid of ambitious goals, and which appears to be able to achieve those goals; it will be interesting to watch over the next few years.



PyPy: the other new compiler project

Posted May 20, 2010 1:36 UTC (Thu) by ustcchenjian (guest, #37311) [Link]

Great post; surely it will be watched by many people, including me.

The concept of PyPy as a generalized compiler is cool, and PyPy's realization of it is cooler.

I love it!!

PyPy: the other new compiler project

Posted May 20, 2010 6:18 UTC (Thu) by ekj (guest, #1524) [Link] (14 responses)

It seems to me that this observation is a lot more general.

It is frequently said that high-level languages (e.g. Python) provide better developer productivity at the cost of slower execution. It used to be true that, for real speed, you hand-coded the inner loop in assembly.

With every passing generation of CPU, though, that has become less and less true. There are simply so many optimizations, and so much complexity, that most mortal programmers are in practice unable to write the body of a loop in a more optimized way than an optimizing compiler can.

That is, compilers translating C to assembler do a better job of it than human beings can.

PyPy seems to demonstrate that the same is true at a higher level, at least in some situations. Writing code in (R)Python and having PyPy translate that to C can, in many situations, give a program that runs faster than it would have had you written it in C in the first place.

Handcoding assembler, increasingly, doesn't pay -- the compiler does it better.

Could it be that handcoding C *ALSO* doesn't pay, and that you'd tend to be better off writing the program in a higher-level language and having a compiler translate it? Or is that too general a claim, one that fails to hold broadly despite holding when the program you're writing is a Python interpreter?

PyPy: the other new compiler project

Posted May 20, 2010 6:51 UTC (Thu) by Tjebbe (guest, #34055) [Link]

That is what the Java people claim, and perhaps it is true. I am not sure we are quite there yet; whenever I talk to those people about what I do (in C and C++), they make the above statement (of course, after the usual "all C is insecure"), but a "prove me wrong" (with something more than a benchmark) usually ends the conversation.

So I'm very interested to see if this project will :)

Sorry, but this is not true at all

Posted May 20, 2010 7:12 UTC (Thu) by khim (subscriber, #9252) [Link] (11 responses)

> There are simply so many optimizations, and so much complexity, that most mortal programmers are in practice unable to write the body of a loop in a more optimized way than an optimizing compiler can.

Small functions (like memcpy) are still much faster when hand-coded in assembler. And more complex functions are often written in "today's assembler": intrinsics for NEON or SSE, where each "function call" generates just one known instruction. You give up the register-allocation duty, but that is done to save coding time; it does not speed up execution.

Now, if we are talking about megabytes of code, the compiler works better than a human, but that's because there is not enough time to carefully hand-optimize huge amounts of code; in effect, a human's -O0 code is being compared with the compiler's -O3 code...

> Writing code in (R)Python and having PyPy translate that to C can, in many situations, give a program that runs faster than it would have had you written it in C in the first place.

Care to present your benchmarks? I'm seeing comparisons between PyPy (a moderately fast JIT) and CPython (an extremely slow interpreter); I don't see where PyPy beats C. Such benchmarks are hard to write correctly (the language interfaces are too different), but usually when your C library is slower than the Python one (be it CPython or PyPy in any incarnation), it's because it does 100 times more work (and you are throwing away 99% of what it did), or it spends lots of time going from Python to C and back (the switch is usually slower than either C or Python).

> Handcoding assembler, increasingly, doesn't pay -- the compiler does it better.

Sorry, but this is not true at all. The compiler does it worse, unless it recognizes some precomputed pattern. For example, a compiler multiplies a register by a small number in the best possible way - much better than a human. Why? It's easy: compiler writers tried all possible combinations (billions of them) and selected the best one for each number below some cut-off (this is how ICC does it; GCC does it worse because it only contains a few rules). But this approach hits the wall really fast. In general, hand-coded assembly still wins - if you take your time and write good assembler for your CPU.

The problem here is timing: to write good hand-coded assembler for the P4 for a sizable program, you'll need 5-10 years. By that time the P4 will be history, the Core 2 and Atom will be kings, and you'll be behind once again. That's why you use the hybrid approach cited above: it's just faster to write code with intrinsics, so you have some hope of shipping a product while the CPU is still in use.

> Could it be that handcoding C *ALSO* doesn't pay, and that you'd tend to be better off writing the program in a higher-level language and having a compiler translate it? Or is that too general a claim, one that fails to hold broadly despite holding when the program you're writing is a Python interpreter?

It does not apply in general, and it does not apply here. PyPy wins because it has a JIT while CPython is a pure interpreter.

Sorry, but this is not true at all

Posted May 20, 2010 12:04 UTC (Thu) by djc (subscriber, #56880) [Link] (1 responses)

Here's a microbenchmark where PyPy outperformed C:

http://morepypy.blogspot.com/2008/01/rpython-can-be-faste...

(AIUI PyPy has gotten a lot better since...)

There are lies, damn lies and microbenchmarks...

Posted May 20, 2010 16:53 UTC (Thu) by khim (subscriber, #9252) [Link]

This is exactly what I'm talking about: the creators of a compiler always know where they can beat everyone else. Also note that even the author of the benchmark in question readily admits they are comparing totally different algorithms! This benchmark is almost entirely tied to the speed of the allocator: while the C version uses a hand-coded allocator to make the speed of malloc less relevant, it's not the best allocator out there.

Sure, changes in the algorithm can buy you more speed than micro-optimizations, but... how is that related to the topic under discussion?

Sorry, but this is not true at all

Posted May 20, 2010 16:20 UTC (Thu) by intgr (subscriber, #39733) [Link] (7 responses)

I agree with everything else that you stated; however:

> CPython (an extremely slow interpreter)

Compared to JITs it is slow, yes. Compared to other *interpreters*, CPython is the fastest one that exists; in most cases it outperforms other interpreters such as Perl, PHP, and, needless to say, Ruby.

Considering that the Python language is way more dynamic than PHP (dynamic typing; class definitions, functions, operator overloads, and so on, including magic methods like __getattr__, can change at any point at run time), I think that is a real achievement.
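
That dynamism is easy to demonstrate; everything below is plain Python, rebinding behavior at run time:

    class Proxy(object):
        def __init__(self, target):
            self._target = target
        def __getattr__(self, name):
            # Called for any attribute not found normally; what
            # 'p.anything' means cannot be known until run time.
            return getattr(self._target, name)

    p = Proxy([1, 2, 3])
    print p.count(2)          # resolved through __getattr__ at run time

    # Even operator behavior can change at any moment:
    Proxy.__len__ = lambda self: len(self._target)
    print len(p)              # works now; it did not a moment ago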

Okay, as far as the interpreters go it's not so bad....

Posted May 20, 2010 17:18 UTC (Thu) by khim (subscriber, #9252) [Link] (3 responses)

I'm not sure I want to start a debate about interpreters, but in all cases we are talking not about percentages but about multiples, compared to C (let alone assembler) speed. A JIT beats the interpreter, and a compiler beats the JIT on real tasks. Assembler beats everything... if you give the programmer time - and we are talking years here, so it's just not practical.

The JIT case is very interesting. People often think that a JIT can outperform the compiler (we just need to wait a few more years), but it's just not so in practice. The reason is simple: cache. While the number of transistors in a CPU grows every year, the number of transistors in a CPU core is essentially constant (think L1 cache: 20 years ago, 8K in the 486; 10 years ago, 128K in the Athlon; today... still 128K and often less). This means that the JIT uses a very scarce resource for its work, so while artificial samples can be created where a JIT outperforms simple PBO (profile-based optimization), in real programs in practice it almost always loses.

Okay, as far as the interpreters go it's not so bad....

Posted May 20, 2010 17:24 UTC (Thu) by intgr (subscriber, #39733) [Link]

Do note that my above post is agreeing with you:
> I agree with everything else that you stated

I didn't want to start a "debate"; I just thought it was unfair to call CPython an "extremely slow interpreter", because it's not.

Okay, as far as the interpreters go it's not so bad....

Posted May 22, 2010 5:15 UTC (Sat) by salimma (subscriber, #34460) [Link] (1 responses)

I'm not convinced cache is much of an issue for long-running applications -- for those, one should compare the performance of a Java or C# application after the JIT is no longer being triggered, with a C/C++ equivalent.

It does not matter...

Posted May 22, 2010 9:48 UTC (Sat) by khim (subscriber, #9252) [Link]

Your loss can be big or small, but you can't win:

  1. If the JIT determines at some point that it's no longer needed and "disconnects" - it's just a version of PBO.
  2. If the JIT determines that the situation is static but checks from time to time that it hasn't changed - you lose a little.
  3. If the JIT actively works and recompiles everything all the time - you lose big.

You can only ever win if the JIT recompiles stuff constantly (so PBO in a normal compiler can't cope) AND the workload does not depend on the L1 cache all that much (so the loss from the JIT's work is more than compensated by its optimizations). This situation can easily be created in tests but almost never occurs in real life.

Sorry, but this is not true at all

Posted May 22, 2010 15:58 UTC (Sat) by nix (subscriber, #2304) [Link]

> Compared to other *interpreters*, CPython is the fastest one that exists
That is extremely debatable. In general, when given a choice between speed and implementation clarity, Python has gone for the latter.

Lua is one example of an interpreter immensely faster than CPython (partly simply because it is smaller: the entire interpreter fits in L2 cache on my machine; Python will barely fit in L3.)

Sorry, but this is not true at all

Posted May 23, 2010 15:17 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> Python is the fastest interpreter.

Ha. hAHahhahahaHAHAH!

It doesn't even use computed goto to make a threaded interpreter. Try running Erlang in your spare time - it's much faster than CPython. Or various Forth interpreters.

CPython and computed goto

Posted May 23, 2010 17:33 UTC (Sun) by scottt (guest, #5028) [Link]

> It doesn't even use computed goto to make a threaded interpreter.

The 'release31-maint' branch of CPython does use computed goto; see USE_COMPUTED_GOTOS in Python/ceval.c.

Sorry, but this is not true at all

Posted May 21, 2010 15:42 UTC (Fri) by vonbrand (subscriber, #4458) [Link]

The delightful "Writing Efficient Programs" by Jon Bentley (sadly long out of print, but his "Programming Pearls" contains the gist of it) tells you what to do to make programs go faster or use less memory. First you have to measure where the performance drains are; that turns out not to be at all evident (programmers are notoriously bad at guessing them!). Look at the architecture of the program, and check for more efficient algorithms.

Then go look at the "small picture": typical programs spend 95% of their time in 5% of their code. If you make that 5% go twice as fast, your program goes almost twice as fast; futzing around with the rest gives almost no improvement. Only if rewriting in your high-level language hits the wall should you consider rewriting in a lower-level language.

Plus, never forget that hacking a program for efficiency has a cost in maintainability, and only under rare circumstances is the added programmer time of extreme measures worth the savings in computer time (and with Moore's law it is getting ever harder to justify).
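
In Python itself, the measuring step is cheap; a minimal sketch with the standard cProfile module (slow_part is a made-up hot spot for illustration):

    import cProfile
    import pstats

    def slow_part(n):
        # Deliberately quadratic: the kind of 5% hot spot profiling finds.
        total = 0
        for i in range(n):
            for j in range(n):
                total += i * j
        return total

    def main():
        slow_part(500)

    cProfile.run('main()', 'profile.out')
    pstats.Stats('profile.out').sort_stats('cumulative').print_stats(5)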

PyPy: the other new compiler project

Posted May 20, 2010 13:07 UTC (Thu) by liljencrantz (guest, #28458) [Link]

Often-repeated wisdom, but I've rarely seen it work like that in practice. Interpreted languages usually have huge memory overheads that force non-trivial data out of the caches, drastically lowering performance on real-world workloads.

That said, I think the trade off of increased programmer productivity but decreased program speed is actually often the right choice.

PyPy: the other new compiler project

Posted May 20, 2010 6:38 UTC (Thu) by eru (subscriber, #2753) [Link]

> It could easily be argued that PyPy incorporates much of the sort of innovation which many people have said never happens with free software.

Seems to me this is just the kind of innovation that has always happened in free software: experimental languages and language implementations have usually been distributed as some kind of free software, since long before the GNU project and Linux. With languages, the programmers are their own users, and are in the best position to implement and test their ideas. The problems with innovation occur when this is not the case, such as with office software, where the end-user and the implementor are people with very different skills and points of view.

Python 3 support

Posted May 20, 2010 7:04 UTC (Thu) by man_ls (guest, #15091) [Link]

I find myself happy to read things like: "Python 3 support seems to be a relatively distant goal". The reason is that I maintain a small Python package, and support for Python 3 is also a distant goal for me. Currently I need to support as far back as Python 2.4 (for Mac OS X users), so all relatively recent improvements in the language are of no use to me. Imagine what my package can gain with new Python features in the next decade: nothing.

With big (but incompatible) leaps forward like Python 3 the situation is not going to improve. As Python 3 is incompatible with Python 2.x (unless some very awkward constructions are used), network effects can either work for or against migrating everything to Python 3, and so far it seems that inertia will be too hard to overcome. And we humble Python coders will be happy with the situation: Python 2 is here to stay.
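
The incompatibility starts at the very surface of the language; for example:

    # Python 2: print is a statement.
    print "hello"

    # Python 3: print is a function, and the line above is a SyntaxError.
    print("hello")

    # One of the awkward constructions that runs unchanged on both:
    import sys
    sys.stdout.write("hello\n")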

PyPy: the other new compiler project

Posted May 20, 2010 7:36 UTC (Thu) by hppnq (guest, #14462) [Link] (4 responses)

> What really catches your editor's eye, though, is the concept of PyPy as a generalized compiler for the creation of JITs for high-level languages.

Insert obligatory reference to Parrot.

PyPy: the other new compiler project

Posted May 20, 2010 16:31 UTC (Thu) by intgr (subscriber, #39733) [Link]

Parrot is a single JIT; it restricts you to a certain virtual machine and bytecode format. PyPy is a framework for JITs, where adapting it to a new language or bytecode is no harder than writing an interpreter in RPython. In fact, you could directly turn source code into JIT-ed machine code without any intermediary bytecode (like Chrome's V8 JIT does for JavaScript).
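
A minimal sketch of what "writing an interpreter in RPython" looked like around this time, assuming the pypy.rlib.jit module of the era (the module path and the one-letter bytecode here are illustrative):

    from pypy.rlib.jit import JitDriver

    # 'greens' identify a position in the interpreted program; 'reds' are
    # the mutable state the JIT must track.
    jitdriver = JitDriver(greens=['pc', 'program'], reds=['acc'])

    def interpret(program):
        pc = 0
        acc = 0
        while pc < len(program):
            jitdriver.jit_merge_point(pc=pc, program=program, acc=acc)
            op = program[pc]
            if op == 'i':       # made-up bytecode: increment
                acc += 1
            elif op == 'd':     # made-up bytecode: decrement
                acc -= 1
            pc += 1
        return acc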

Now, I'm not at all convinced that this approach is a good thing. After all, it took many years to reach the PyPy 1.2 milestone, and its optimization capabilities are still pretty primitive compared to other JITs.

PyPy: the other new compiler project

Posted May 20, 2010 18:49 UTC (Thu) by erwbgy (subscriber, #4104) [Link] (1 responses)

> That means that the PyPy interpreter can be easily targeted to whatever runtime environment works best.

Presumably this means that PyPy could target the Parrot virtual machine if they wanted to.

PyPy: the other new compiler project

Posted May 20, 2010 23:08 UTC (Thu) by alvieboy (guest, #51617) [Link]

They can target Parrot. But Parrot is just another VM - although I'm very excited about it.

LLVM is also interesting, as a framework. Python, being purely object-oriented, is quite complex (let's say, quite large) in its specification. Unlike Perl, where you can use three basic types (all pointers) [SV, AV and HV] to perform almost everything you need (SV is quite complex, yes), in Python you have to define a huge structure to handle even simple classes and simple methods. I almost gave up on Python (for embedded scripting) for that same reason. Lua does not suit either. I ended up with nothing.

Now, regarding VM/compilers:

Another thing most of those VMs, compilers, and compiler generators assume is the general availability of registers. Although registers exist in vast numbers on x86, ARM, MIPS, SH, you name it, other architectures, like Forth processors, do not actually have them. Everyone seems to assume registers exist, so such architectures often must emulate registers to run the generated code or pseudocode.

I'd like to see a VM that implements algorithms instead of low-level register operations. That would be a fantastic innovation.

PyPy: the other new compiler project

Posted May 20, 2010 19:49 UTC (Thu) by wingo (guest, #26929) [Link]

Everyone who writes a compiler and VM eventually sees it as a toolkit that everyone else should use. I should know, I maintain Guile ;-)

Can you write general-purpose code in RPython?

Posted May 20, 2010 10:44 UTC (Thu) by epa (subscriber, #39769) [Link] (1 responses)

The article mentions that PyPy is written in a restricted, somewhat more statically-typed dialect of Python called RPython. Instead of writing Python code with occasional C extensions for speed, surely it would be possible to write RPython code for speed-critical sections, and have that compiled to native code?

Can you write general-purpose code in RPython?

Posted May 20, 2010 11:05 UTC (Thu) by spiv (guest, #9031) [Link]

That's roughly what Pyrex and Cython do: provide a restricted variant of Python (with optional C type declarations!) that is compiled to C for you.
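
For flavor, a small hypothetical Cython/Pyrex-style function; the cdef declarations give the compiler C types to work with:

    # fib.pyx - compiled to C; the function remains callable from Python.
    def fib(int n):
        cdef int i
        cdef long a = 0, b = 1
        for i in range(n):
            a, b = b, a + b
        return a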

PyPy: the other new compiler project

Posted May 31, 2010 1:14 UTC (Mon) by obi (guest, #5784) [Link]

It'd be interesting to hear whether they made the same architectural decisions as the Rubinius project (Ruby in Ruby), which, by the way, recently hit 1.0.


Copyright © 2010, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds