Gilectomy

By Jake Edge
June 2, 2016

Python's (in)famous global interpreter lock (GIL), which effectively serializes multi-threaded access to the interpreter (thus hampering concurrency using threads), has long been seen as something that Python could do without. But there are both technical and political hurdles to clear before the GIL can be removed. Larry Hastings presented his thoughts and progress on doing a "gilectomy" to the CPython interpreter at the 2016 Python Language Summit.

Hastings said that he has a proof-of-concept solution that gets around the technical and political problems. There are two questions that often get asked: "Could we remove the GIL?" and "Should we remove the GIL?" It is clear that it can be removed, he said, because IronPython and Jython have already done so. The answer to the second is "maybe"; it will depend on what it buys versus the technical debt it incurs. But, he said, he is going to keep trying to remove the GIL until either it gets removed or everyone tells him to stop.

The GIL was added in 1992 by Guido van Rossum; since then, the world has changed, but Python hasn't. Now, everything, including eyeglasses, is multi-core. Python, however, cannot really take advantage of these cores using threads.

There are four technical considerations that need to be addressed, he said. Reference counting for the garbage collector is one. There is also a need to look at the globals and statics in the interpreter and make them per-thread variables. The C extension parallelism and reentrancy issues need to be handled, as do places in the code where atomicity is required.

There are also three political considerations. Van Rossum has said that he will only consider removing the GIL if it does not negatively impact the performance of single-threaded programs. Breaking all of the C extensions, which is the outcome of some other GIL-removing projects, is not reasonable. Removing the GIL must also not over-complicate the code.

There are some potential solutions to the reference counting issue that should not be considered, Hastings said. Both tracing garbage collection and software transactional memory might perform reasonably, but both are likely to be quite complicated and to break all of the C extensions.

So reference counting remains in his proof of concept. That means using atomic increment and decrement operations, which leads to a 30% performance hit right off the bat. As more threads are added it gets worse. He has an idea about "buffered reference counting", but did not have time to describe that at the summit. For global data, PyThreadState can be used to make it per-thread data. He has added fine-grained locking to the small-block allocator so that it can be used by multiple threads as well.
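To see why the reference count has to be updated atomically once several threads can touch the same object, here is a minimal Python sketch. It is only an analogy: the `RefCounted` class, `churn()`, and `run_demo()` are hypothetical names, and a `threading.Lock` stands in for the atomic increment and decrement instructions that the C code would use. It is not the proof of concept's implementation.

```python
import threading


class RefCounted:
    """Toy object whose reference count is guarded by a lock.

    The lock only imitates the guarantee of an atomic increment/decrement
    in C; this is not how CPython stores or updates its reference counts.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.refcount = 1          # the creator holds the first reference
        self.freed = False

    def incref(self):
        with self._lock:           # the read-modify-write is one indivisible step
            self.refcount += 1

    def decref(self):
        with self._lock:
            self.refcount -= 1
            if self.refcount == 0:
                self.freed = True  # stand-in for deallocating the object


def churn(obj, rounds):
    """Take and drop a reference repeatedly, as a busy thread would."""
    for _ in range(rounds):
        obj.incref()
        obj.decref()


def run_demo(n_threads=4, rounds=10_000):
    """Share one object among several threads and return it afterward."""
    obj = RefCounted()
    threads = [threading.Thread(target=churn, args=(obj, rounds))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj
```

After `run_demo()`, the count is back to the single reference held by the creator, and the object is freed only when that last reference is dropped. Without the indivisible update, a lost increment could free an object that is still in use.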

Parallelism is simply something that C extensions will have to live with. It makes the lives of extension developers more difficult, but there really is no way to soften that blow, he said. In order to enforce atomicity, he has added a lock API to CPython (with "macros to hide it behind") so that all mutable objects get locked before accessing them. He noted that "mutable" refers to the C objects, not Python objects, so even immutable objects in Python, like strings, are still mutable from the perspective of the interpreter.

Hastings laid out a set of five rules for locking in CPython to ensure that locking functions smoothly. Locks must be recursive and objects must be self-locking wherever possible. The reference count cannot be touched except through the defined interface and the object type is immutable. The latter drew a question about the desirability of changing object types, but Hastings said that there will be some things that have to be given up to facilitate the removal of the GIL.

When code needs to take multiple locks, it should do it in address order. Finally, the kernel should not be involved in taking the lock unless there is contention. On Linux, that maps to a futex; Windows and Mac OS X have equivalent functionality.
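Two of those rules, recursive self-locking objects and taking multiple locks in address order, can be illustrated with a small Python sketch. The `Account` class, `transfer()`, and `run_demo()` are hypothetical names invented for this example; the real locks live in CPython's C code. The sketch relies on `id()` returning the object's address, which is a CPython implementation detail, though any fixed total order would serve the same purpose.

```python
import threading


class Account:
    """Hypothetical self-locking object: every method takes the object's own lock."""

    def __init__(self, balance):
        self._lock = threading.RLock()   # recursive: the owning thread may re-acquire
        self.balance = balance

    def deposit(self, amount):
        with self._lock:
            self.balance += amount

    def withdraw(self, amount):
        with self._lock:
            self.deposit(-amount)        # nested acquire works because the lock is recursive


def transfer(src, dst, amount):
    """Lock both objects in one global order so two transfers cannot deadlock."""
    # Sort by id(), which is the object's address in CPython.
    first, second = sorted((src, dst), key=id)
    with first._lock, second._lock:
        src.withdraw(amount)
        dst.deposit(amount)


def run_demo(rounds=2_000):
    """Run transfers in opposite directions, the classic deadlock setup."""
    a, b = Account(1_000), Account(1_000)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    t1 = threading.Thread(target=a_to_b)
    t2 = threading.Thread(target=b_to_a)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance
```

If each thread instead locked its source object first, one thread could hold `a` while waiting for `b` and the other could hold `b` while waiting for `a`. Sorting by address means both threads always contend for the same lock first.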

His proof of concept lives in the same source tree as the regular CPython interpreter, which can be configured to run with or without the GIL. One thing that might be possible if the GIL-removal work pans out is to enforce best practices on C extensions, since there will be a new API. The GIL removal is somewhat complicated, so it may fail that particular political consideration, he said.

Hastings briefly described his eight-point plan to remove the GIL (after noting Van Rossum's 2007 "It isn't Easy to Remove the GIL" blog post). It is presumably based on the process he took with his "toy" proof of concept. It starts by adding the atomic increment/decrement, adds locks to various types (dict, list) and free lists, on through murdering the GIL and fixing up the tests.

He showed the results of a "dumb test" he ran using the proof of concept. It calculated the Fibonacci sequence in seven threads. It was roughly 3.5x slower than the standard CPython interpreter in terms of wall time and 25x slower in terms of CPU time (because seven threads were running). That is not as good as he had hoped for in this early stage (he was shooting for only 2x slower), but there are still a lot of low-hanging optimization possibilities.
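The two figures are consistent with each other: if all seven threads stay busy for a run that takes 3.5 times as long, the CPU time comes to roughly 3.5 × 7 ≈ 25 times the baseline. The article does not show the test itself, so the following is only a guess at its general shape; `fib()`, `run_threads()`, and the parameter values are assumptions made for illustration.

```python
import threading
import time


def fib(n):
    """Deliberately naive recursive Fibonacci, used as a CPU-bound workload."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def run_threads(n_threads=7, n=20):
    """Compute fib(n) in each thread; return results, wall time, and CPU time."""
    results = [None] * n_threads

    def worker(i):
        results[i] = fib(n)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]

    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time summed over the process's threads
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return results, wall, cpu
```

Under the GIL, the wall and CPU times from a run like this stay close together, since only one thread executes bytecode at a time. In an interpreter without the GIL, CPU time can be several times the wall time, which is why the two slowdown figures differ so much.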

The open questions ("apart from 'should we do it at all?'" he said with a grin) are about things like separating read and write locks or allowing user-settable locks in the language itself. It might also make sense to look at running multiple interpreters in the same process—GIL-removal time might be the right point to add that feature.

He concluded the talk by noting that he had "Gilectomy" stickers available and a GitHub repository set up for those interested. He said he was planning to "sprint" on the project right after the main PyCon conference; "I have T-shirts if you sprint with me."

There wasn't a lot of time for questions, but there were a few. One person asked about how Gilectomy impacts PyPy. Hastings said he didn't know, but thought that project was more prepared for these kinds of changes than CPython is. Nick Coghlan commented that there is a fair amount of code out there that should be doing locking but isn't; the programs are getting away with it mostly because the GIL (or, as another person suggested, CPU scheduling) protects it. Eliminating the GIL will expose those programs. Hastings also noted that, unfortunately, one of the costs of Gilectomy will be breaking some C extensions, though he is unsure how many.
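The kind of code Coghlan described might look like the unlocked method in the following sketch. The `Counter` class and `hammer()` helper are hypothetical, written only to contrast a read-modify-write that depends on luck with one that takes a lock.

```python
import threading


class Counter:
    """Shared counter with one unprotected and one protected increment."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        # Read, add, store: another thread can run between these steps,
        # in which case one of the two updates is lost.
        current = self.value
        self.value = current + 1

    def safe_increment(self):
        with self._lock:               # the whole update happens under the lock
            self.value += 1


def hammer(use_lock, n_threads=4, rounds=20_000):
    """Increment a fresh counter from several threads; return the final value."""
    counter = Counter()
    step = counter.safe_increment if use_lock else counter.unsafe_increment

    def worker():
        for _ in range(rounds):
            step()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

`hammer(True)` always returns `n_threads * rounds`. `hammer(False)` may return the same value on a given run, or something smaller; how often updates are lost depends on the interpreter and on thread scheduling, which is why such bugs can stay hidden for a long time.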

[As evidence of the Python community's interest in removing the GIL, Hastings took a photo of the (overly) full room where he gave a Gilectomy talk at PyCon later in the week. It can be seen on the right.]


Gilectomy

Posted Jun 3, 2016 1:43 UTC (Fri) by mathstuf (subscriber, #69389)

All this work sounds worse than just excising all global state from libpython and changing the API to just take a PyContext* object to all functions, similar to Lua. It would even allow one to run multiple interpreters in a single process without it being a roller coaster of horror, anger, and despair.

Gilectomy

Posted Jun 3, 2016 3:11 UTC (Fri) by smurf (subscriber, #17840)

… and invalidating all the C extension code out there. Passing that pointer to every function under the sun isn't free, either.

A thread-specific variable to hold the PyContext*?

Gilectomy

Posted Jun 3, 2016 3:29 UTC (Fri) by mathstuf (subscriber, #69389)

As if the GIL becoming a no-op, and the bad behavior that exposes, doesn't mean that all C extensions need a once-over anyway. A thread-specific variable might do for a compatibility layer where the existing API just uses that as an implicit variable to the new API functions, but that probably wouldn't work so well for multi-threaded contexts. And doing just that half-solution means you still can't use two interpreters in a single thread.

Gilectomy

Posted Jun 3, 2016 6:03 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

It's high time for the Python extension API to be redesigned. The existing API can be adapted through a shim layer, as in PyPy, to keep current extensions viable during the transition period.

Gilectomy

Posted Jun 3, 2016 10:52 UTC (Fri) by cortana (subscriber, #24596)

Can't macros be used to keep source compatibility for extension code that isn't updated to use an explicit PyContext pointer?

Gilectomy

Posted Jun 3, 2016 11:00 UTC (Fri) by mathstuf (subscriber, #69389)

Hrm, then you can't take the address of those functions anymore. Either way, I think just taking the sour pill and doing the API right would have saved a lot more time than the GIL has wasted for developers, counting Python users, mailing list discussions, and attempts to excise it.

Gilectomy

Posted Jun 3, 2016 2:35 UTC (Fri) by Cyberax (✭ supporter ✭, #52523)

I'm really glad nobody is promoting STM anymore. Perhaps in 10 more years we'll finally get a free-threaded Python interpreter.

Gilectomy

Posted Jun 6, 2016 10:02 UTC (Mon) by salimma (subscriber, #34460)

The video of the presentation is now up on YouTube (as submitted to Hacker News).