Development

PyPy and software transactional memory

By Jake Edge
February 26, 2014

Software transactional memory (STM) is an ambitious feature that is being worked on for the PyPy interpreter. STM is an alternative to locking that is meant to eventually replace the (in)famous global interpreter lock (GIL) in the PyPy version of Python. That should also help multi-threaded Python programs scale better. But STM is tricky, necessitating two separate rewrites of that code for PyPy—including one announced at the beginning of February.

The idea behind STM is relatively straightforward. Shared data is processed inside of "transactions" that can either fully complete or, if there are conflicts, be rolled back. While a thread is processing a transaction, it records any shared object that is read or written. At the end of the transaction, the thread verifies that no other thread has changed any of the values participating in the transaction (i.e. those recorded). If that check passes, the changes are committed and become visible to other threads. If not, the changes are rolled back and the work is retried in a new transaction.
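The commit-or-retry cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PyPy's implementation: the Store, Transaction, Conflict, and run names are hypothetical, conflicts are detected with a per-key version counter (a choice made for this sketch), and the code is a single-threaded simulation, so it ignores the problem of making the commit step itself atomic.

```python
class Conflict(Exception):
    """Raised at commit time when a recorded value was changed by someone else."""


class Store:
    """Shared data: each key maps to a (version, value) pair."""

    def __init__(self, **initial):
        self.data = {key: (0, value) for key, value in initial.items()}


class Transaction:
    def __init__(self, store):
        self.store = store
        self.seen = {}    # key -> version observed when first read or written
        self.writes = {}  # key -> new value, buffered until commit

    def read(self, key):
        if key in self.writes:          # read our own uncommitted write
            return self.writes[key]
        version, value = self.store.data[key]
        self.seen.setdefault(key, version)
        return value

    def write(self, key, value):
        # Record the version so that conflicting writers are detected too.
        self.seen.setdefault(key, self.store.data[key][0])
        self.writes[key] = value

    def commit(self):
        # Verify that nothing we recorded has changed since we looked at it.
        for key, version in self.seen.items():
            if self.store.data[key][0] != version:
                raise Conflict(key)     # buffered writes are simply dropped
        for key, value in self.writes.items():
            version, _ = self.store.data[key]
            self.store.data[key] = (version + 1, value)


def run(store, body, max_retries=10):
    """Run body(txn) in a fresh transaction, retrying on conflict."""
    for _ in range(max_retries):
        txn = Transaction(store)
        try:
            result = body(txn)
            txn.commit()
            return result
        except Conflict:
            continue
    raise RuntimeError("too many conflicts")
```

Two transactions that both read and update the same key illustrate the rollback: whichever commits second fails its check and has to be rerun, for example through run().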

A bit of history

PyPy, which is a Python interpreter written in Python, has always had a focus on performance. That may seem a little contradictory, given that it is written in a language not particularly known for its performance characteristics, but PyPy uses various techniques to overcome that. First, it uses a restricted subset of Python (called RPython) that can easily be statically analyzed. Second, PyPy uses a just-in-time (JIT) compiler to increase its performance.

STM has been on the project's radar for a few years. PyPy project lead Armin Rigo posted a message to the pypy-dev mailing list about STM back in 2011, but he referenced a similar idea for Python from 2003. Rigo went on to post a PyPy blog message calling for adding STM to PyPy. Seven months later, in March 2012, a call for donations to support STM went out and has so far gathered roughly half of the $50,400 target. Work on the feature started soon after the donation call.

Rigo (and Remi Meier, who works with him on STM for PyPy) then went back to the drawing board in June 2013. Originally, the STM piece was written in C, while the garbage collector was written in RPython. That turned out to be problematic, so they turned to an all-C approach. But, as Rigo stressed, STM in PyPy is a research project—it might take a few iterations before things all come together.

And some status

Another of those iterations came about recently. Rigo and Meier have come up with a different way to organize PyPy's memory to better support STM. It will require a full rewrite of the C STM library, but once that is done, the existing pypy-stm code should be fairly easily moved to it. Early indications are that the new mechanism is significantly faster:

This means that we are looking forward to a result that is much better than originally predicted. The pypy-stm has chances to run at a one-thread speed that is only "n%" slower than the regular pypy-jit, for a value of "n" that is optimistically 15 --- but more likely some number around 25 or 50. This is seriously better than the original estimate, which was "between 2x and 5x". It would mean that using pypy-stm is quite worthwhile even with just two cores.

As outlined in a draft design document, the idea behind the new mechanism came out of the discovery of the remap_file_pages() Linux system call. Using remap_file_pages() means the new library will not be portable to other operating systems, at least initially. There are some thoughts on how to add it for Windows and others, but that is still a ways off.

The basic idea is to mmap() enough memory for N copies of the shared data, where N is the number of threads. By using remap_file_pages(), each thread's portion of that memory area is actually mapped to the same set of pages, so all threads share the same memory even though each thread sees that memory at a different virtual address. Some trickery with the Clang compiler causes code to be emitted that uses the %gs register as an offset to shift the memory accesses by that amount. It is similar to the way the %fs register is used by the POSIX threads (Pthreads) library for thread-local variables.

[Figure: all pages shared. Two threads (T1, T2) are running, each with its address space pointing into the shared region.]
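The effect of several virtual addresses resolving to the same physical pages can be seen, in a much simpler setting, with Python's mmap module. The sketch below is an analogy only: it maps one temporary file twice with ordinary shared mappings rather than calling remap_file_pages(), there is no %gs-relative addressing, and the function name shared_views_demo is made up for the example. It assumes a Linux (or other Unix) system.

```python
import mmap
import tempfile


def shared_views_demo():
    """Map the same page twice and show that a write through one view
    is visible through the other."""
    with tempfile.TemporaryFile() as backing:
        backing.truncate(mmap.PAGESIZE)          # one page of backing storage

        # Two separate mappings (different virtual addresses) of the same
        # page; on Unix the default mapping type is shared.
        view_t1 = mmap.mmap(backing.fileno(), mmap.PAGESIZE)
        view_t2 = mmap.mmap(backing.fileno(), mmap.PAGESIZE)

        view_t1[0:5] = b"hello"                  # "thread 1" writes
        seen_by_t2 = bytes(view_t2[0:5])         # "thread 2" reads

        view_t1.close()
        view_t2.close()
        return seen_by_t2


print(shared_views_demo())
```

In the design described in the article, the per-thread views live inside a single large mmap() region and are selected by the %gs offset; here they are simply two independent mapping objects.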

When a thread wants to modify an object, the page that it resides in is "unshared" by another call to remap_file_pages() that remaps the address back into the thread-local part of the mapped memory: "i.e. stop the corresponding %gs-relative, thread-local page from mapping to the same physical page as others". Much like a copy-on-write operation, unsharing causes a new page to be allocated and the contents of the corresponding shared page to be copied to it.

[Figure: one page being modified. Thread T2 is modifying an object, so it points to the private copy made.]
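The copy-on-write comparison can be made concrete with a private mapping. The following sketch uses Python's mmap.ACCESS_COPY mode, which gives a copy-on-write view of a file; it is an analogy for unsharing, not the remap_file_pages() call that the new library uses, and unshare_demo is a hypothetical name. It assumes a Linux (or other Unix) system.

```python
import mmap
import tempfile


def unshare_demo():
    """Write through a private (copy-on-write) view and show that the
    shared view of the same page is left untouched."""
    with tempfile.TemporaryFile() as backing:
        backing.write(b"shared".ljust(mmap.PAGESIZE, b"\0"))
        backing.flush()

        # A shared view, standing in for what every other thread sees.
        shared = mmap.mmap(backing.fileno(), mmap.PAGESIZE)
        # A private view: the first write to it copies the page.
        private = mmap.mmap(backing.fileno(), mmap.PAGESIZE,
                            access=mmap.ACCESS_COPY)

        private[0:6] = b"local!"                 # modify the private copy

        result = (bytes(shared[0:6]), bytes(private[0:6]))
        shared.close()
        private.close()
        return result
```

Calling unshare_demo() returns the shared contents unchanged alongside the modified private copy, which is the situation thread T2 is in after unsharing a page.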

Each transaction will record the objects it accesses: privately it records the objects it reads and publicly records those it writes. Before an object is modified, it is unshared, thus copied to the thread-local page as described above; it can then be rolled back by copying the shared page again. In addition, only one transaction is allowed to write to a given object—other transactions are aborted if they attempt to write to it. Once a transaction completes successfully, which means there were no reads of data changed by another thread (writes can't conflict due to the "one transaction writing an object" rule mentioned above), the page is copied back to the shared area.
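A rough model of that bookkeeping, under several assumptions, might look like the following. The Heap, Txn, and Abort names are invented for the illustration, objects stand in for pages, and a per-object version counter is used to detect reads of data changed by another thread; the article does not say how the real library detects that. The model is single-threaded and ignores atomicity of the commit step.

```python
class Abort(Exception):
    """Raised when a transaction has to be rolled back."""


class Heap:
    def __init__(self, **objects):
        self.shared = dict(objects)                    # committed state
        self.version = {name: 0 for name in objects}   # assumption: versions
        self.writer = {}     # public record: object name -> writing transaction


class Txn:
    def __init__(self, heap):
        self.heap = heap
        self.read_log = {}   # private record: name -> version seen
        self.private = {}    # "unshared" copies holding our modifications

    def read(self, name):
        if name in self.private:
            return self.private[name]
        self.read_log.setdefault(name, self.heap.version[name])
        return self.heap.shared[name]

    def write(self, name, value):
        # Only one transaction may write a given object at a time.
        owner = self.heap.writer.setdefault(name, self)
        if owner is not self:
            self.rollback()
            raise Abort(name)
        self.private[name] = value

    def commit(self):
        # Succeeds only if nothing we read was changed by another commit.
        for name, seen in self.read_log.items():
            if self.heap.version[name] != seen:
                self.rollback()
                raise Abort(name)
        for name, value in self.private.items():       # copy back to shared
            self.heap.shared[name] = value
            self.heap.version[name] += 1
        self._release()

    def rollback(self):
        # Dropping the private copies leaves the shared state as it was.
        self._release()

    def _release(self):
        for name in [n for n, t in self.heap.writer.items() if t is self]:
            del self.heap.writer[name]
        self.private.clear()
        self.read_log.clear()
```

Because a second writer is aborted immediately, commit() only has to check the read log, which matches the observation that writes cannot conflict at commit time.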

It seems like a clever approach and, as the preliminary benchmarks indicate, one that has some significant performance advantages. But it is important to remember Rigo's warning that it is a research project. Significant progress has been made before, but needed to be redone—that could happen again. But, with luck, this approach will lead to a PyPy without a GIL but with the performance to allow it to be used for multi-threaded programs on systems with two cores or more.

Brief items

Quotes of the week

Long term, I don't want to get shivved in the parking lot because my code broke.

— Carlo Flores at SCALE 12x, on why he decided to write his testing framework "Smoke" in minimalist fashion.

It's a giant Death Star-shaped marshmallow; I'm not sure "applying pressure" does anything.

— Owen DeLong at SCALE 12x, on applying pressure for AT&T to adopt IPv6.

LXC 1.0 released

The LXC (Linux Containers) development team has announced the release of LXC 1.0. It comes with lots of new features including fully unprivileged containers, a stable API (with a five-year commitment for security and bug fix updates), official bindings for Python, Lua, Go, and Ruby, support for cloning and snapshotting containers, and more. "LXC 1.0 features a wide variety of improvements to container security, a consistent set of tools, updated documentation and an API with multiple bindings. We are confident that this is the best LXC release yet and that our users will find it reliable and easy to use. A series of blog posts on LXC and LXC 1.0 features is also available: https://www.stgraber.org/2013/12/20/lxc-1-0-blog-post-series"

systemd 209

Lennart Poettering has announced the release of systemd 209. In the roughly five months since 208, systemd has seen a lot of changes including support for kdbus (albeit with an unstable API for now). Two new tools, systemd-networkd and systemd-socket-proxyd, have been added. Several libraries have been combined into a single libsystemd to reduce code duplication. Lots more changes are listed in the announcement. Unlike earlier releases, it is not yet available for Fedora Rawhide due to an ARM build problem. "This is a massive new release, it includes a lot of new code. You probably don't want to base your LTS release on this. We hope to return to a shorter release cycle now to stabilize the new code. Expect a couple of bugfix releases over the next weeks."

The first of those bugfix releases, systemd 210, is also available; beyond fixes, it also adds another (though smaller) set of new features.

EFL 1.9 is available

Version 1.9 of the Enlightenment Foundation Libraries (EFL) has been released, accompanied by the corresponding tool sets. Among the new features are support for the AT-SPI2 accessibility interfaces, GStreamer 1.0 support for Emotion players, XPresent support, text handling improvements, and a set of Evas Filters, which the release announcement describes as "a combination of filters used to apply specific effects to an Evas Object. For the moment, these effects are specific to the Text Objects. The filters can be applied to an object using a simple script language specifically designed for these effects. Commands of these script language include blend, blur, grow, displace, transform and more."

Newsletters and articles

Development newsletters from the past week

What's cooking in git.git (February 25)
GNU Toolchain Update (February 24)
LLVM Weekly (February 24)
OCaml Weekly News (February 25)
Perl Weekly (February 24)
PostgreSQL Weekly News (February 23)
Python Weekly (February 20)
Ruby Weekly (February 20)
This Week in Rust (February 24)
Tor Weekly News (February 26)

Kügler: Next means Focus on the Core

Sebastian Kügler looks at the "Plasma Next" project on his blog. Plasma is the workspace portion of the KDE environment, and Next is the new version that will be released separately from KDE applications and the underlying libraries. It is based on KDE Frameworks 5 and Qt5; the first stable release is planned for June. "One of my favourite new features that has recently landed is Marco [Martin]'s work on contrast behind translucent dialogs, which hugely improves readability in many cases, and make "the old Plasma" almost look bland in comparison. We've cleaned up quite a lot of workflows, not by making them any different, but by removing visual noise in between. The idea is to polish common elements to feel fresh and like an upgrade to users, but not entirely different. In the UI, known behavioral patterns are kept in place, with more pronounced core functions, and less fuzz around them. We're aiming at keeping all the functionality and adaptability in place. To the user, the migration to Plasma Next should feel like an upgrade, not something completely new, but trusted after a bigger step in its evolution, yet recognizably true to its values."

KDE's Next Generation Semantic Search (KDE.news)

An article over at KDE.news looks at the next generation of semantic search for the KDE desktop. "The upcoming release of KDE Applications (version 4.13) will introduce the next step in the effort to improve the performance and stability of search features in KDE software. The improved Semantic Search is lighter on resources and more reliable than it was previously, but, thanks to considerable reuse of existing code, it is mature and offers a complete feature set. Users will find that features such as search are exposed in the same, familiar manner - but searching in a variety of applications will be faster and more reliable."

Servo: Inside Mozilla's mission to reinvent the web browser (ZDNet)

Here's a ZDNet article looking at Mozilla's "Servo" project, an attempt to make web browsers perform better and more securely on multi-core systems. "Servo takes a different approach to current browsers. It splits the work to compute the layout, render content and execute scripts on a web page into three tasks, each of which it can carry out in parallel. The browser's ability to carry out these tasks at the same time stems from the nature of the Servo's underlying programming language, Rust, which has been developed by Mozilla for several years and is nearing version 1.0."

Page editor: Nathan Willis