By Jonathan Corbet
May 19, 2010
We have recently seen a lot of attention paid to projects like LLVM. Even
though the GNU Compiler Collection is developing at a rapid pace, there are
people in the community who are interested in seeing different approaches
taken, preferably with a newer code base. LLVM is not where all the action
is, though. For the last few years (since 2003, actually), a relatively
stealthy project called PyPy has been trying to shake up the compiler
landscape in its own way.
On the face of it, PyPy looks like an academic experiment: it is an
implementation of the Python 2.5 interpreter which is, itself, written in
Python. One might thus expect its code to be more elegant than that of the
standard, C-implemented interpreter (usually called CPython), but its
execution to be rather slower. If one runs PyPy under CPython, the result is
indeed somewhat slow, but that is not how things are meant to be done.
When running in its native mode, PyPy can be surprising.
PyPy is actually written in a subset of Python called RPython ("restricted
Python"). Many of the features and data types of Python are available, but
there are rules. Each variable may hold data of only one type. Only
built-in types can be used in for loops. There is
no creation of classes or functions at run time, and the generator feature
is not supported. And so on. The result is a version of the language
which, while still clearly Python, looks a bit more like C.
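The rules above can be made concrete with a small sketch. Both functions
below are ordinary Python that CPython runs without complaint; the first
(the hypothetical sum_of_squares) is written to stay within the restrictions
just described, while the second (the equally hypothetical dynamic_style)
breaks several of them. Neither has been fed to the actual RPython
translator, so treat this as an illustration of the rules as stated rather
than a tested RPython program; the restrictions are enforced when the
translator analyzes a program, not by CPython.

```python
from typing import List


def sum_of_squares(values: List[int]) -> int:
    # Intended to follow the rules described above (not checked with the
    # real translator): `total` always holds an int, the loop runs over a
    # built-in list, and no classes or functions are created at run time.
    total = 0
    for v in values:
        total += v * v
    return total


def dynamic_style(values):
    # Legal Python, but it breaks several of the rules described above.

    # The variable `result` changes type: None, then int, then str.
    result = None
    result = len(values)
    result = str(result)

    # A class is created at run time with type().
    Point = type("Point", (object,), {"x": 0, "y": 0})

    # A generator expression uses the unsupported generator feature.
    squares = (v * v for v in values)
    return result, Point, sum(squares)


print(sum_of_squares([1, 2, 3]))     # 14
print(dynamic_style([1, 2, 3])[2])   # 14
```

Both versions compute the same sum; the difference is that only the first
gives a whole-program analysis a single, fixed type for every variable.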
Running the RPython-based interpreter in CPython is supported; it is fully
functional, if a bit slow. Running in this mode can be good for
debugging. But the production version of PyPy is created in a rather
different way: the PyPy hackers have created a multi-step compiler which is
able to translate an RPython program into a lower-level language. That
language might be C, in which case the result can be compiled and linked in
the usual way. But the target language is not fixed; the translator is
able to output code for the .NET or Java virtual machines as well. That
means that the PyPy interpreter can easily be targeted at whatever
runtime environment works best.
The result works. It currently implements all of the features of Python
2.5, with very few exceptions. There are some behavioral differences due
to, for example, the use of a different garbage-collection algorithm; PyPy
can be slower to call destructors than CPython is. Python extensions
written in C can be used, though one gets the sense that this feature is
still stabilizing. PyPy is able to run complex applications like Django
and Twisted. On the other hand, for now,
it only runs on 32-bit x86 systems, it is described as "memory-hungry," and
Python 3 support seems to be a relatively distant goal.
Beyond that, it's fast. PyPy includes a built-in just-in-time compiler
(JIT); indeed, PyPy is, in a sense, a platform for the creation of JITs for
various targets.
The result is an interpreter which, much of the time, is significantly
faster than CPython. For the curious, the PyPy Speed Center contains lots of
benchmark results, presented in a slick, JavaScript-heavy interface. PyPy
does not always beat CPython, but it often does so convincingly, and speed
appears to be a top priority for the PyPy developers. It may well be that
the speed of PyPy will eventually prove compelling enough that, as Alex
Gaynor suggests, many of us will be using PyPy routinely instead of CPython
in the near future.
There are some other interesting features as well. There is a stackless Python mode which supports
microthreaded, highly-concurrent applications. There is a sandboxed mode
which intercepts all external library calls and hands them over to a
separate policy daemon for authorization. And so on.
What really catches your editor's eye, though, is the concept of PyPy as a
generalized compiler for the creation of JITs for high-level languages.
The translation process is flexible, to the point that it can easily
accommodate stackless mode, interesting optimizations, or experimentation
with different language features. The object model can be (and has been)
tweaked to support tainting and tracing features. And the system as a
whole is not limited to the creation of JIT compilers for Python; projects
are underway to implement a number of other languages, including Prolog,
Smalltalk, and JavaScript.
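The JIT-generator idea can be illustrated with a toy example. The sketch
below is a hypothetical interpreter for a made-up accumulator language,
written in the plain loop-and-dispatch style that an interpreter destined
for the PyPy translator would use. In PyPy's framework, the interpreter
author marks the dispatch loop with hints through a JitDriver object,
naming "green" variables (which identify a position in the user's program)
and "red" variables (which hold run-time state); the translator then derives
a tracing JIT from the interpreter itself. The import path shown is the one
used in current PyPy sources and should be treated as an assumption for any
particular version; when it is unavailable, a no-op stand-in is used so the
sketch runs under ordinary CPython. The opcodes are invented, and the code
has not been checked against the translator.

```python
try:
    # Assumed path; only present in a PyPy/RPython source checkout.
    from rpython.rlib.jit import JitDriver
except ImportError:
    class JitDriver(object):
        # No-op stand-in so this sketch runs under plain CPython.
        def __init__(self, greens, reds):
            self.greens = greens
            self.reds = reds

        def jit_merge_point(self, **live_vars):
            pass

        def can_enter_jit(self, **live_vars):
            pass


# Hypothetical opcodes; each instruction is an (opcode, argument) pair.
INC = 0        # acc += arg
DEC = 1        # acc -= arg
JUMP_IF = 2    # if acc != 0, jump to index arg
HALT = 3       # stop and return the accumulator

# Greens locate us in the user program; reds change as it runs.
jitdriver = JitDriver(greens=['pc', 'code'], reds=['acc', 'steps'])


def interpret(code, acc):
    pc = 0
    steps = 0
    while True:
        # Top of the dispatch loop: the point the JIT traces from.
        jitdriver.jit_merge_point(pc=pc, code=code, acc=acc, steps=steps)
        op = code[pc]
        arg = code[pc + 1]
        steps += 1
        if op == INC:
            acc += arg
            pc += 2
        elif op == DEC:
            acc -= arg
            pc += 2
        elif op == JUMP_IF:
            if acc != 0:
                pc = arg
                # A taken jump closes a loop in the user's program, which
                # is where the JIT may decide to start compiling.
                jitdriver.can_enter_jit(pc=pc, code=code, acc=acc,
                                        steps=steps)
            else:
                pc += 2
        elif op == HALT:
            return acc, steps
        else:
            raise ValueError("unknown opcode")


# Count down from 5: DEC 1; JUMP_IF back to 0; HALT.
print(interpret([DEC, 1, JUMP_IF, 0, HALT, 0], 5))   # (0, 11)
```

Under CPython the hints do nothing and this is simply a slow interpreter.
The point of the design is that nothing in the hints is specific to Python,
which is what makes it plausible to get a JIT for a Prolog, Smalltalk, or
JavaScript interpreter written in the same style.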
It could easily be argued that PyPy incorporates much of the sort of
innovation which many people have said never happens with free software.
And it is all quite well documented. This is a project which is not
afraid of ambitious goals,
and which appears to be able to achieve those goals; it will be interesting
to watch over the next few years.