Subinterpreters for Python
A project that has been floating around in the Python world for a number of years is now working its way toward inclusion into the language—or not. "Subinterpreters", which are separate Python interpreters that can currently be created via the C API for extensions, are seen by some as a way to get a more Go-like concurrency model for Python. The first step toward that goal is to expose that API in the standard library. But there are questions about whether subinterpreters are actually a desirable feature for Python at all, as well as whether the hoped-for concurrency improvements will materialize.
PEP 554
Eric Snow's PEP 554 ("Multiple Interpreters in the Stdlib") would expose the existing subinterpreter support from the C API in the standard library. That would allow Python programs to use multiple separate interpreters; the PEP also proposes to add a way to share some data types between the instances. The eventual goal is to allow those subinterpreters to run in parallel, but the implementation is not there yet.
In particular, giving each subinterpreter its own global interpreter lock (GIL) is not (yet) on the table. The GIL prevents multiple threads from executing Python bytecode at the same time. It exists mainly because the CPython memory-management code and garbage collector are not thread-safe. But the existence of the GIL has meant that other features, C-based extensions for example, depend on it for proper functioning. There have been efforts to remove the GIL from Python along the way, including the Gilectomy project. Subinterpreters are seen by some as another way of addressing the "GIL problem".
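The effect of the GIL on CPU-bound threads can be observed with a small experiment. The following is a minimal sketch, not a benchmark: the function names are made up for illustration, and the timings depend on the machine and on how the interpreter was built. On a CPython build with a GIL, the threaded run typically takes about as long as the sequential one, since only one thread executes bytecode at a time.

```python
import threading
import time


def count_down(n):
    """Pure-Python, CPU-bound busy loop."""
    while n > 0:
        n -= 1


def run_sequential(n, jobs):
    """Run the loop `jobs` times, one after another, in the calling thread."""
    for _ in range(jobs):
        count_down(n)


def run_threaded(n, jobs):
    """Run the loop `jobs` times, each in its own thread."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def timed(func, *args):
    """Return the wall-clock seconds taken by func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def compare(n=2_000_000, jobs=2):
    """Return (sequential_seconds, threaded_seconds) for the same total work."""
    return timed(run_sequential, n, jobs), timed(run_threaded, n, jobs)


if __name__ == "__main__":
    seq, thr = compare()
    print(f"sequential: {seq:.3f}s  threaded: {thr:.3f}s")
```

Both runs perform the same total amount of work, so any difference between the two numbers comes from how the threads are scheduled rather than from the workload.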
The PEP proposes adding an interpreters module to the standard library that will allow the creation of subinterpreters as follows:
interp = interpreters.create()
Interpreters can then run code passed as a string to the run()
method. Data is not shared between these interpreters unless it is done
explicitly by using "channels" created this way:
recv, send = interpreters.create_channel()
As might be guessed, simple objects (e.g. bytes, strings, integers) can then be
sent and received using the send() and recv() methods of
the corresponding channel objects.
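Since the module proposed by the PEP may not be available in a given Python installation, the explicit-sharing idea can be imitated with ordinary threads. The sketch below is only a stand-in under stated assumptions: it builds a channel pair on top of queue.Queue, the class names SendEnd and RecvEnd are hypothetical, and the set of allowed types is limited to the examples the article mentions. It does not use or reproduce the PEP's implementation.

```python
import queue
import threading

# Assumption: only the "simple objects" named in the article may cross a channel.
SHAREABLE_TYPES = (bytes, str, int)


class SendEnd:
    """Hypothetical sending half of a channel."""

    def __init__(self, q):
        self._q = q

    def send(self, obj):
        if not isinstance(obj, SHAREABLE_TYPES):
            raise TypeError(f"cannot send {type(obj).__name__} over a channel")
        self._q.put(obj)


class RecvEnd:
    """Hypothetical receiving half of a channel."""

    def __init__(self, q):
        self._q = q

    def recv(self):
        # Blocks until an object is available.
        return self._q.get()


def create_channel():
    """Return a (recv, send) pair, mirroring the order used in the article."""
    q = queue.Queue()
    return RecvEnd(q), SendEnd(q)


def doubler(requests, replies):
    """Worker that only sees what arrives on its request channel."""
    while True:
        item = requests.recv()
        if item == "stop":  # sentinel chosen for this sketch
            break
        replies.send(item * 2)


def demo():
    req_recv, req_send = create_channel()
    rep_recv, rep_send = create_channel()
    worker = threading.Thread(target=doubler, args=(req_recv, rep_send))
    worker.start()
    results = []
    for n in (1, 2, 3):
        req_send.send(n)
        results.append(rep_recv.recv())
    req_send.send("stop")
    worker.join()
    return results


if __name__ == "__main__":
    print(demo())
```

The worker and the caller exchange data only through the two channels, which is the discipline the PEP's channels are meant to encourage; the threads here still share one interpreter, so nothing enforces that isolation.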
The run() method blocks until the subinterpreter completes, though it can be executed in a separate thread, as an example from the PEP using the threading module shows:
import interpreters  # the module proposed by the PEP
import threading

interp = interpreters.create()
def run():
    interp.run('print("during")')
t = threading.Thread(target=run)
print('before')
t.start()
print('after')
Because the GIL is shared between all of the interpreters, however, the concurrency gains are minimal. In the most recent revisions, the PEP tries to make it clear that exposing the feature from the C API is worth doing regardless of what happens with the GIL:
PEP 554 has been around since 2017, but Snow thinks it is getting ready for "pronouncement" (a decision to accept or reject it) now. While he believes there is value to exposing the interface in its own right, the PEP has had trouble separating itself from the ongoing GIL work; PEP 554 could perhaps be added to Python 3.9, though the GIL changes are not complete. In mid-April, Snow posed a question to the python-dev mailing list, wondering if it made sense to hold off on the PEP until 3.10 because there is no per-interpreter GIL.
While PEP 554 might be accepted and the implementation ready in time for 3.9, the separate effort toward a per-interpreter GIL is unlikely to be sufficiently done in time. That will likely happen in the next couple months (for 3.10).
So...would it be sufficiently problematic for users if we land PEP 554 in 3.9 without per-interpreter GIL?
His main concern is that users will be confused and frustrated by encountering subinterpreters with a shared GIL, which will have lots of limitations; that might keep them from reconsidering the feature when those limitations are lifted for 3.10. He listed four options for proceeding: merging it without the GIL changes, merging it but marking it as a "provisional" module, not merging until the GIL changes are ready, and not merging but adding a 3.9-only subinterpreters module to the Python Package Index (PyPI). He was in favor of the first or the second option.
C extensions
But others are concerned that adding subinterpreter support to the standard library will put additional burdens onto the developers of C-based extensions. Those extensions sometimes use global variables, which do not play well with subinterpreters—whether they are created via the existing C API or the proposed standard library interpreters module. That means that using subinterpreters could lead to strange, hard-to-find problems when combined with extensions.
CPython core developer Nathaniel Smith, who is also a core developer of the C-based extension NumPy, was particularly unhappy with the proposal:
NumPy core developer Sebastian Berg chimed in as well. He suggested that it could take up to a solid year of work to support subinterpreters in NumPy. He also said that the proposal to raise an exception when subinterpreters import extensions that are not subinterpreter-ready is helpful, though it likely will still lead to bugs being filed against the extensions. The PEP proposes to raise ImportError for any extension that does not support PEP 489 ("Multi-phase extension module initialization"); multi-phase initialization eliminates the problems with global state variables for the extensions by moving them into their own module-specific dictionary object.
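The difference between process-wide state and per-module state can be illustrated with a pure-Python analogy. This is only a sketch under loud assumptions: it does not use the C API or PEP 489 itself, the names PROCESS_WIDE and load_module_copy are invented for illustration, and a shared dictionary stands in for a C static variable. Each call to load_module_copy plays the role of one interpreter importing its own copy of the same module.

```python
import types

# Stand-in for a C `static` variable: one copy for the whole process,
# visible to every module copy no matter which "interpreter" loaded it.
PROCESS_WIDE = {"calls": 0}

MODULE_SOURCE = '''
calls = 0  # per-module state, stored in this module's own dictionary


def bump():
    global calls
    calls += 1                 # touches only this module copy
    process_wide["calls"] += 1  # touches state shared by all copies
    return calls
'''


def load_module_copy(name):
    """Create an independent module object from the same source text."""
    module = types.ModuleType(name)
    module.process_wide = PROCESS_WIDE
    exec(MODULE_SOURCE, module.__dict__)
    return module


def demo():
    before = PROCESS_WIDE["calls"]
    first = load_module_copy("ext_copy_a")   # hypothetical first interpreter
    second = load_module_copy("ext_copy_b")  # hypothetical second interpreter
    first.bump()
    first.bump()
    second.bump()
    # Per-module counters stay separate; the process-wide one mixes both.
    return first.calls, second.calls, PROCESS_WIDE["calls"] - before


if __name__ == "__main__":
    print(demo())
```

The per-module counters remain independent, while the shared counter accumulates calls from both copies; that kind of cross-talk is what extension authors are worried about when state lives in C globals.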
Both Smith and Berg are skeptical of the existing C-level subinterpreter support. Berg said: "I believe you must consider subinterpreters basically a non-feature at this time. It has neither users nor reasonable ecosystem support", while Smith said that he might write a PEP to propose that subinterpreters be completely eliminated from Python. Snow replied to Berg that there are existing users, however:
That's not to say that alone justifies exposing the C-API, of course. :)
Benefits?
Beyond the concerns about extensions, though, Smith is not convinced of the benefits for concurrency that could eventually come from subinterpreter support. PEP 554 is careful not to directly connect the interpreters module with the eventual plan to stop sharing the GIL between subinterpreters, though it is clearly the eventual goal for some. Smith is skeptical of that plan as well:
Berg concurred to a certain extent. He said that there is a need for a wider vision, beyond the PEP's smaller goals, to explain what the plans are for subinterpreters so that a fuller picture can be considered. Snow agreed that there was a need for better documentation, an informational PEP or other justification document, though that has not appeared as yet. Ultimately, the decision on the PEP rests with Antoine Pitrou, who is the delegate for the PEP. He is generally favorably inclined toward it:
He had some concrete suggestions on things to improve in the API and suggested that the feature be added provisionally (effectively option two in Snow's original message). He also explicitly solicited more feedback. Mark Shannon reviewed the PEP and said that he was in favor of the idea, but that it did not make sense to add the module to the standard library without showing that it would be beneficial for parallelism:
If per-[subinterpreter] GILs are possible then, and only then, sub-interpreters will provide true parallelism and (limited) shared memory concurrency.
The problem is that we don't know whether we can implement per-[subinterpreter] GILs without too large a negative performance impact. I think we can, but we can't say so for certain.
Snow disagreed, not surprisingly, but Shannon put together a table comparing different existing approaches to concurrency in Python with PEP 554 and an "ideal" communicating sequential processes (CSP) model. Go's concurrency model is roughly based around CSP; adding it to Python has also been tried along the way. Shannon said:
As it stands, multiprocessing [is] a better fit for CSP than PEP 554.
IMO, sub-interpreters only become a useful option for concurrency if they allow true parallelism and are not much more expensive than threads.
Snow sees concurrency as something of a side issue, but he is thinking of taking up the suggestion by Berg and others to more fully document the complete plan:
There was plenty of other discussion, but Snow eventually deferred the PEP until the 3.10 time frame:
It is an interesting feature and one that numerous core developers think could really help the performance of Python programs on multiple cores. But, without the GIL changes, it is difficult to know for sure whether it will be a substantial win. As Smith put it: "[...] the new concurrency model in PEP 554 has never actually been used, and it isn't even clear whether it's useful at all. Designing useful concurrency models is *stupidly* hard." We will have to wait to see if subinterpreters can clear that hurdle.