Subinterpreter support for Python

By Jake Edge
May 15, 2018

Eric Snow kicked off the 2018 edition of the Python Language Summit with a look at getting a better story for multicore Python by way of subinterpreters. Back in 2015, we looked at his efforts at that point; things have been progressing since. There is more to do, of course, so he is hoping to attract more developers to work on the project.

Snow has been a core developer since 2012 and has "seen some interesting stuff" over that time. He has been working on the subinterpreters scheme for four years or so.

The problem is that programmers expect to be able to take advantage of multiple cores, whether they really need to or not. The Python multicore story is murky, at best, which leads to a perception problem. If you start talking about threads, the global interpreter lock (GIL) rears its head. He got involved in trying to change things after a coworker expressed frustration with Python because of its multicore story; the coworker indicated that the company they worked for was moving away from Python because of multicore issues. That got him motivated to try to do something, he said.

So he suggested looking around at other languages' multicore support; JavaScript web workers is one example of a successful solution in that space. The key attributes of that mechanism are that the workers are independent, isolated from the others, and there are efficient means for them to cooperate.

CPython already has most of a solution that does that, but it is hidden away in the largely unused subinterpreter feature. Subinterpreters will allow multiple Python interpreters per process and there is the potential for zero-copy data sharing between them. But subinterpreters share the GIL, so that needs to be changed in order to make it multicore friendly. In his opinion, subinterpreters are the best avenue to address the multicore problem for Python. They can do so without breaking backward compatibility with the extensions written in C, which is not true of some of the other ideas for better multicore scalability (e.g. PyPy).
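As a rough illustration of what "zero-copy" means for raw memory, the sketch below uses the built-in memoryview type inside a single interpreter. It is only an analogy: nothing here crosses an interpreter boundary, and the names buffer, view, and copied are made up for the example.

```python
# Zero-copy access to raw memory within ONE interpreter (an analogy only;
# this does not demonstrate sharing between subinterpreters).
buffer = bytearray(b"hello world")   # hypothetical payload

# A memoryview exposes the same underlying memory without copying it.
view = memoryview(buffer)[6:11]

# Converting a slice to bytes, by contrast, makes an independent copy.
copied = bytes(buffer[6:11])

# Mutating the original is visible through the view, but not in the copy.
buffer[6] = ord("W")

shared_result = bytes(view)   # reflects the mutation
copied_result = copied        # still the old contents
view.release()
```

The view and the buffer refer to the same bytes, so no data moves when the view is created; that is the property the zero-copy idea would need to preserve across interpreters.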

There are some missing pieces, most of which are addressed in PEP 554, which is his "Multiple Interpreters in the Stdlib" proposal that is targeting Python 3.8 (which is ostensibly planned for October 2019, though, later in the summit, there was discussion of releasing it sooner). There is already a C API for subinterpreters, but it needs to be exposed to Python programs from a module in the standard library. There also needs to be a way to pass data between the interpreters. Both of those are addressed in PEP 554. Another piece is to stop sharing the GIL, which is something that looks "totally doable", Snow said.

Maintaining the isolation between the interpreters and managing the shared resources will be two of the challenges. The sys module contains a lot of state that will need to be compartmentalized. There is a separate effort aimed at cleaning up some of the cruft that has accumulated in CPython over the decades. PEP 432 proposes a restructuring of the CPython startup process, but many of the ideas there would be helpful to the subinterpreter effort. In particular, it consolidates the interpreter's runtime state; all of the static global variables are moved to a single structure. At a minimum that is helpful to get an idea of what all of the global state is.
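The value of gathering global state into one structure can be sketched in Python, though the real work is in CPython's C code. The following is a toy analogy under that assumption; RuntimeState, its fields, and import_module() are hypothetical names and do not correspond to anything in CPython or PEP 432.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeState:
    """Hypothetical container for what used to be scattered globals."""
    modules: dict = field(default_factory=dict)
    warnings_filters: list = field(default_factory=list)
    recursion_limit: int = 1000


def import_module(state: RuntimeState, name: str) -> str:
    """Toy 'import' that caches into the given state, not into a global."""
    return state.modules.setdefault(name, f"<module {name}>")


# With all state in one structure, creating a second, independent
# copy of it is just a matter of instantiating the structure again.
main_state = RuntimeState()
sub_state = RuntimeState(recursion_limit=500)

import_module(main_state, "json")
```

Once every function takes its state explicitly, the two instances cannot interfere with each other, and the structure itself doubles as an inventory of what the global state is.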

The only real current user of subinterpreters that Snow is aware of is mod_wsgi, which implements the Python Web Server Gateway Interface (WSGI) for the Apache web server. There is also a list of subinterpreter bugs that he showed, which need to be addressed; many of those were reported by the mod_wsgi developers. There are some testing gaps too. A subinterpreters test has been merged for 3.7; Snow hopes that PEP 554 is approved and lands for Python 3.8 with even more tests.

The PEP provides for a shared-nothing concurrency model. It has a minimal Python API in an interpreters module. It also adds channels to pass immutable objects between interpreters. A subinterpreter will retain its state, so the interpreter can be "primed" with modules and other setup in advance of its use. He suggested that those interested should read the PEP, which includes several of the examples that he quickly ran through.
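The message-passing pattern described here can be sketched with tools that already exist in the standard library. The example below is not the PEP 554 API: it uses a thread and queue.Queue objects as stand-ins for a subinterpreter and its channels, and the threads still share memory and the GIL. The names worker(), run_demo(), inbox, and outbox are invented for the illustration.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Stand-in for code running in another interpreter."""
    # "Priming": do the expensive setup once, before any work arrives.
    table = {n: n * n for n in range(10)}
    while True:
        message = inbox.get()
        if message is None:          # sentinel: no more work
            break
        # Only immutable objects (here, ints) cross the channel.
        outbox.put(table[message])


def run_demo() -> list:
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for n in (2, 3, 4):
        inbox.put(n)
    inbox.put(None)
    thread.join()
    return [outbox.get() for _ in range(3)]


results = run_demo()
```

The worker never touches the caller's objects directly; everything it learns arrives through the inbox, and everything it reports leaves through the outbox, which is the discipline a shared-nothing model enforces.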

There are a few blockers for PEP 554, he said. He would like to put an interpreters module out on the Cheese Shop (i.e. the Python Package Index or PyPI) so that he can get more feedback on the implementation. There are some open questions to be addressed and the PEP needs to be updated and reposted—something he hoped to get to before PyCon is over on May 17.

The ultimate goal is to improve and clarify the multicore support in Python as he described in a September 2016 blog post. That was written as something of a post-mortem on the project, when he thought he was ready to give up. He went away and came back; "I'm OK now", Snow said with a chuckle.

His high-level plan is broken up into two phases. The first is to implement PEP 554, expose subinterpreters in Python, and support passing some objects over channels. Part of that would be to improve the isolation enough to make the feature usable, but the key piece of the puzzle is to stop sharing the GIL. Phase two would build on that base; it would allow C extension modules for subinterpreters by getting rid of all of the static globals in the interpreter, turning them into per-interpreter state.

One kind of global state that does need to move in phase one is the allocators, which need to change into per-interpreter allocators. Once that happens, the GIL can follow, so that there will be a GIL per interpreter. There are lots of things that can be done in phase two and beyond, but he is hoping to get some others to help out to reach the goal of subinterpreter support in 3.8. This is not his area of expertise, Snow said, but he recently started working for Microsoft, which generously allows him to work on Python one day a week.

Thomas Wouters noted that the examples passed code to be run in the subinterpreter as a string, which is rather painful to do in a language with significant white space; will there be support for passing functions to be run instead? Snow agreed that it is needed and could be added.
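The whitespace problem with code passed as a string is easy to reproduce with exec(), which is used below purely as a stand-in for running source in a subinterpreter. This is a sketch under that assumption; run_source() and caller() are hypothetical helpers, and textwrap.dedent() is one common workaround rather than anything the PEP prescribes.

```python
import textwrap


def run_source(source: str) -> dict:
    """Run source text in a fresh namespace and return that namespace.

    exec() in a new dict is only a stand-in for a subinterpreter here.
    """
    namespace = {}
    # dedent() strips the indentation that the string picked up from
    # being written inside an indented block of the calling code.
    exec(textwrap.dedent(source), namespace)
    return namespace


def caller() -> int:
    # Without dedent(), this indented snippet would raise IndentationError.
    ns = run_source("""
        total = 0
        for i in range(5):
            total += i
    """)
    return ns["total"]


answer = caller()
```

Passing a function object instead of a string would sidestep the indentation handling entirely, which is the convenience Wouters was asking about.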

Wouters also wanted to know how many existing users of subinterpreters would be broken once they no longer share the GIL. Programs are known to use the serialization of the GIL to protect their own data structures from concurrent updates, so changing the GIL is likely to lead to unexpected race conditions and the like. Snow acknowledged that and agreed with Larry Hastings that keeping the shared GIL as an option might be the solution for that.
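The general remedy for code that leans on the GIL for safety is to make the locking explicit. The sketch below shows that idea with ordinary threads in one interpreter; it is an illustration of the principle only, since the code at risk is mostly C extensions and embedding applications. Counter and hammer() are names invented for the example.

```python
import threading


class Counter:
    """Shared counter that does not depend on the GIL for correctness."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The read-modify-write below is made atomic by the lock, not by
        # any assumption about how the interpreter schedules threads.
        with self._lock:
            self.value += 1


def hammer(counter: Counter, times: int) -> None:
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [
    threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_value = counter.value
```

Code written this way keeps working whether the GIL is shared, per-interpreter, or absent, because its correctness no longer depends on the interpreter serializing access.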


Posted May 15, 2018 19:31 UTC (Tue) by epa (subscriber, #39769)

If it's a shared-nothing model, why have multiple interpreters per process? Why not implement them by forking subprocesses instead?

Posted May 15, 2018 19:46 UTC (Tue) by ballombe (subscriber, #9523)

I do not know about Python, but generally in this kind of scheme, a thread still shares some memory with its parent, but not with its siblings.

Posted May 15, 2018 21:17 UTC (Tue) by roc (subscriber, #30627)

It's not a shared-nothing model: "there is the potential for zero-copy data sharing between them".

Posted May 18, 2018 6:36 UTC (Fri) by njs (subscriber, #40338)

But so far the goal is just to share bytestrings (i.e. raw memory), which is already pretty cheap to do between subprocesses. Anything more complicated requires some way to handle refcounting efficiently without the GIL, and there are some hand-wavy ideas, but no one has found a clear, convincing solution for this yet :-/.

Posted May 18, 2018 6:54 UTC (Fri) by andresfreund (subscriber, #69562)

> But so far the goal is just to share bytestrings (i.e. raw memory), which is already pretty cheap to do between subprocesses

It's not that cheap. You basically have to use POSIX or SysV shm and a protocol on top of that, generating mappings between all the processes.

Posted May 18, 2018 8:13 UTC (Fri) by epa (subscriber, #39769)

Right, so it's slower than using threads, which can make a difference for highly optimized programs written in C or C++. Does it really make a significant difference in an interpreted language like Python?

Even if you didn't have shared memory at all and just had to copy data structures from one subprocess to another, I still wouldn't expect the overhead to be that high in the context of interpreted Python code. Copying a few megabytes of data in machine code will almost always be much faster than operating on a pointer-chasing data structure with a bytecode interpreter. So is this a case of premature optimization?

Posted May 18, 2018 22:11 UTC (Fri) by roc (subscriber, #30627)

If you have some kind of pipeline parallelism you might pass ownership of large objects from thread to thread many times. Being able to do this without copying (and associated memory allocation and deallocation) could be important even in Python.

OTOH it's true that it might be easier to fork processes and use shared memory for data sharing than to use threads. There are a bunch of tradeoffs here --- on one hand, higher overheads for process creation/destruction (especially on Windows) and context switching, difficulty of sharing pointer-based structures across processes, processes being less ergonomic for users than threads ... on the other hand, the difficulties of multithreading described in this article.

Posted May 19, 2018 8:21 UTC (Sat) by njs (subscriber, #40338)

Right, the problem is that currently, no one knows how to make passing ownership of objects/pointer-based structures between subinterpreters cheaper than passing it between subprocesses.

Unless that changes, the calculus is: subinterpreters have somewhat cheaper startup, context switching, and byte-copies, at the cost of significantly increased complexity in the interpreter and breaking at least some existing code (e.g. NumPy would need an unknown but non-trivial amount of work to make it compatible with subinterpreters; there's lots of legacy C code that uses global caches and assumes they'll be protected by the GIL).

Startup costs can often be amortized (and starting a subinterpreter is expensive enough that you'll want to architect your app to do this anyway), context switching and byte copies are perhaps two of the most heavily optimized operations on modern computers, and we're talking about an interpreted language where small differences in low-level operations are often going to be lost in the noise. So, I'm not convinced yet, but we'll see...

Posted May 15, 2018 21:47 UTC (Tue) by davidstrauss (guest, #85867)

One reason might be that forking is not very portable off of Linux and similar kernels, and Python supports Windows.

Posted May 16, 2018 17:36 UTC (Wed) by flussence (guest, #85566)

Yep, this is the reason Perl 5.8 has a threads module - it's basically a poor-man's fork(), for a poor-man's OS.

The documentation for that module also begins with a large admonition telling people not to use it, because bolting on shared-nothing subinterpreters turned out to be a mistake that created a ton of complexity for next to no benefit. I gave it a try once; they weren't kidding.

Posted May 16, 2018 1:23 UTC (Wed) by ringerc (subscriber, #3071)

If you're embedding Python in a process that itself uses threading, the GIL sharing sucks incredibly badly.

Subinterpreters would be a huge blessing to things like databases that support embedded Python procedures.

For example, PostgreSQL would quite like to support a threaded mode, but pl/python just wouldn't work as things stand.

Posted May 16, 2018 4:02 UTC (Wed) by andresfreund (subscriber, #69562)

> For example, PostgreSQL would quite like to support a threaded mode, but pl/python just wouldn't work as things stand.

Wonder if we could just use dlmopen() to load it into a separate namespace, which IIRC would force the GIL to be unshared.

Posted May 17, 2018 9:22 UTC (Thu) by ringerc (subscriber, #3071)

That'd be interesting. Unsure about portability. "The dlmopen() function is a GNU extension". Solaris is OK, doesn't look so good for Windows or OS X, no idea about BSDs.

Might result in crying when combined with Python C extensions and Python's own dlopen()ing.

Posted May 16, 2018 16:43 UTC (Wed) by nescafe (subscriber, #45063)

Huh, sounds like the threading model TCL has used for quite some time now -- each thread gets its own interpreter, and the language allows for threads to communicate natively by sending events to each other.

Posted Sep 11, 2019 18:41 UTC (Wed) by catchmonster (guest, #134361)

Eric has some good points here.
Multicore out of the box is the way to go. A shared-nothing concurrency model would, in my opinion, be a great start.
A minimalistic approach to subinterpreters, with a separate GIL per interpreter, would allow us all to test and build upon success...