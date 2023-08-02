GIL removal and the Faster CPython project

The Python global interpreter lock (GIL) has long been a barrier to increasing the performance of programs by using multiple threads—the GIL serializes access to the interpreter's virtual machine such that only one thread can be executing Python code at any given time. There are other mechanisms to provide concurrency for the language, but the specter of the GIL—and its reality as well—have often been cited as a major negative for Python. Back in October 2021, Sam Gross introduced a proof-of-concept, no-GIL version of the language. It was met with a lot of excitement at the time, but seemed to languish to a certain extent for more than a year; now, the Python Steering Council has announced its intent to accept the no-GIL feature. It will still be some time before it lands in a released Python version—and there is the possibility that it all has to be rolled back at some point—but there are several companies backing the effort, which gives it all a good chance to succeed.
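
The serialization described above can be observed with a small experiment. What follows is a minimal sketch, assuming a conventional with-GIL CPython; the names `count_primes`, `run_sequential`, and `run_threaded` are illustrative and do not come from any of the projects discussed here. It runs the same CPU-bound jobs one after another and then on separate threads. On a with-GIL interpreter the two timings are typically similar, since only one thread executes Python bytecode at a time.

```python
import threading
import time


def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_sequential(limits):
    """Run each job one after another on the calling thread."""
    return [count_primes(limit) for limit in limits]


def run_threaded(limits):
    """Run each job on its own thread and collect results in order."""
    results = [None] * len(limits)

    def worker(index, limit):
        results[index] = count_primes(limit)

    threads = [threading.Thread(target=worker, args=(i, limit))
               for i, limit in enumerate(limits)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    jobs = [20_000] * 4
    for runner in (run_sequential, run_threaded):
        start = time.perf_counter()
        runner(jobs)
        # Timings vary by machine and build, so none are shown here.
        print(f"{runner.__name__}: {time.perf_counter() - start:.2f}s")
```

Both runners return the same results; only the elapsed time can differ, and how much it differs depends on the interpreter build and the hardware.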

After its introduction in 2021, and the discussion around that, the next public appearance for the feature was at the 2022 Python Language Summit in April. Gross gave a talk about his no-GIL fork in the hopes of getting some tacit agreement on proceeding with the work. That agreement was not forthcoming, in part because the full details and implications of a no-GIL interpreter were not really known. Meanwhile, the Faster CPython project, which came about in mid-2021, had been working along on its plan to increase the single-threaded performance of the interpreter. Mark Shannon reported on the status of that effort at the 2022 summit as well. He also authored PEP 659 ("Specialized Adaptive Interpreter") that describes the kinds of changes being made, some of which have found their way into Python 3.10 and 3.11.

At this year's PyCon, two members of the Faster CPython team gave talks describing the techniques they have been using to improve the performance of the interpreter: Brandt Bucher looked at adaptive instructions, while Shannon described memory layout improvements and other optimizations. Given the GIL, nearly all existing Python programs are single-threaded, so improving the performance of those programs will effectively speed up the entire Python world. One of the concerns raised about no-GIL Python is its impact on single-threaded programs.

PEP 703

In January 2023, Łukasz Langa posted the first version of PEP 703 ("Making the Global Interpreter Lock Optional in CPython"), which is authored by Gross; Langa is sponsoring the PEP as a core developer. As might be guessed, that set off a lengthy thread, with, once again, a lot of excitement. There were also some concerns expressed with regard to the implications of not having a GIL, especially for Python extensions written in C; since the GIL protects that code from many concurrency problems, removing it might well lead to bugs.
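
The extension concern can be illustrated at the Python level. The sketch below is an analogy only, not extension code: `HitCounter` and `hammer` are hypothetical names, and the class stands in for extension state that is updated in more than one step. In a C extension, the GIL makes such a multi-step update effectively atomic as long as the function neither releases the GIL nor calls back into the interpreter; without a GIL, the update needs its own lock, as it does here.

```python
import threading


class HitCounter:
    """Hypothetical stand-in for extension state updated in two steps."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hits = {}

    def record(self, key):
        # The read and the write below form one logical update; the lock
        # keeps another thread from interleaving between them.
        with self._lock:
            current = self._hits.get(key, 0)
            self._hits[key] = current + 1

    def total(self, key):
        with self._lock:
            return self._hits.get(key, 0)


def hammer(counter, key, repeats, n_threads):
    """Call counter.record(key) `repeats` times from each of `n_threads` threads."""
    def worker():
        for _ in range(repeats):
            counter.record(key)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


counter = HitCounter()
hammer(counter, "page", repeats=10_000, n_threads=8)
print(counter.total("page"))  # 80000, since every update is lock-protected
```

Removing the `with self._lock` lines would leave the read-modify-write sequence open to lost updates, which is the kind of latent bug the thread participants worried about.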

One thing that everyone wants to avoid is another "flag day" transition like that of Python 2 to 3. The huge and unfortunate impact of Python 3 being incompatible with its predecessor was not foreseen—the core developers vastly underestimated the growing popularity of the language, for one thing—but that mistake will not be repeated. Any switch to remove the GIL will need to smoothly work with code that is not (yet) ready for it.

There was a question from Shannon about "what people think is a acceptable slowdown for single-threaded code". To a large extent, that question went unanswered in the thread, but he had estimated an impact "in the 15-20% range, but it could be more, depending on the impact on PEP 659".
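
A back-of-the-envelope calculation can put such a slowdown in context. The sketch below assumes Amdahl's law as a simple model; that model, the `parallel_fraction` parameter, and the function names are assumptions made for illustration and were not part of the thread. Here `overhead` is the fractional increase in single-threaded run time (0.20 for a 20% slowdown), and the question is how many threads a program needs before free threading pays for itself.

```python
def net_speedup(overhead, parallel_fraction, threads):
    """Speedup relative to today's single-threaded run, under Amdahl's law.

    overhead: fractional increase in single-threaded run time (0.20 = 20%).
    parallel_fraction: share of the work that can run in parallel (0.0 to 1.0).
    threads: number of threads that can truly run at once.
    """
    amdahl = 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)
    return amdahl / (1.0 + overhead)


def break_even_threads(overhead, parallel_fraction, max_threads=1024):
    """Smallest thread count with net speedup >= 1, or None if never reached."""
    for threads in range(1, max_threads + 1):
        if net_speedup(overhead, parallel_fraction, threads) >= 1.0:
            return threads
    return None


# Assumed 20% single-threaded overhead, for three kinds of program.
print(break_even_threads(0.20, 1.0))  # 2: fully parallel work wins quickly
print(break_even_threads(0.20, 0.3))  # 3: mostly serial work needs more threads
print(break_even_threads(0.20, 0.1))  # None: too little parallel work to recover
```

Under this simplified model, a program that cannot parallelize more than a small share of its work never recovers the single-threaded cost, which is why the size of that cost mattered so much in the discussion.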

Another Faster CPython team member, Eric Snow, posted a lengthy analysis with a bunch of questions, which he summarized as: "tl;dr I'm really excited by this proposal but have significant concerns, which I genuinely hope the PEP can address." He noted that he was the author of a "competing" concurrency option in PEP 684 ("A Per-Interpreter GIL"), along with the related PEP 683 ("Immortal Objects, Using a Fixed Refcount"), though he does not truly see multiple sub-interpreters, each with its own GIL, as being incompatible with the no-GIL work. Much of his concern was focused on the impacts on the C extensions (which is also a problem for PEP 684, though to a lesser extent), but single-threaded performance was also mentioned. Gross replied that the impact on the extensions was not completely negative:

There are also substantial benefits to extension module maintainers. The PEP includes quotes from a number of maintainers of widely used C API extensions who suffer the complexity of working around the GIL. For example, Zachary DeVito, PyTorch core developer, wrote "On three separate occasions in the past couple of months… I spent an order-of-magnitude more time figuring out how to work around GIL limitations than actually solving the particular problem."

Updated PEP

The thread had mostly run its course by the end of January. In early May, Gross posted an updated version of PEP 703, along with an implementation based on the in-progress Python 3.12. There was just one response early on (which Gross replied to). On May 12, Gross asked the steering council to decide on the PEP. As it turned out, there was still a lot more discussion to go before any decision would be made.

On June 2, Shannon posted a performance assessment of the PEP with some pretty eye-opening numbers (that were disputed) on the impact of the changes; his estimates of the impact ranged from 11 to 30%. He also noted that removing the GIL had some negative impacts on the existing and planned Faster CPython work:

The adaptive specializing interpreter relies on the GIL; it is not thread-friendly. If NoGIL is accepted, then some redesign of our optimization strategy will be necessary. While this is perfectly possible, it does have a cost. The effort spent on this redesign and resulting implementation is not being spent on actual optimizations.
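
A toy model may help show why in-place specialization and threads interact badly. The sketch below is not CPython's implementation; `AdaptiveAdd`, its warm-up threshold, and its counters are all invented for illustration. It mimics the general idea behind PEP 659: an instruction observes its operands, rewrites itself into a fast path for the types it keeps seeing, and falls back when that guess turns out to be wrong.

```python
class AdaptiveAdd:
    """Toy model of one self-specializing 'add' instruction (illustrative only)."""

    WARMUP = 3  # hypothetical threshold: specialize after this many int adds

    def __init__(self):
        self.handler = self._generic  # shared, mutable "instruction" slot
        self.int_hits = 0
        self.deopts = 0

    def __call__(self, a, b):
        return self.handler(a, b)

    def _generic(self, a, b):
        # Slow path: handles any types and watches for a stable pattern.
        if type(a) is int and type(b) is int:
            self.int_hits += 1
            if self.int_hits >= self.WARMUP:
                self.handler = self._int_only  # rewrite in place
        else:
            self.int_hits = 0
        return a + b

    def _int_only(self, a, b):
        # Fast path guarded by a type check; fall back if the guard fails.
        if type(a) is int and type(b) is int:
            return a + b
        self.deopts += 1
        self.int_hits = 0
        self.handler = self._generic  # de-specialize
        return self._generic(a, b)


add = AdaptiveAdd()
for i in range(5):
    add(i, 1)
print(add.handler.__name__)  # _int_only
print(add("a", "b"))         # ab
print(add.handler.__name__)  # _generic
```

The `handler` slot and the counters are shared by every caller and are mutated without any locking. That is harmless when only one thread can run at a time, but with free threading two threads could specialize and de-specialize the same slot concurrently, so a design along these lines would need added synchronization or a different structure.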

Shannon has noted that he is not a fan of the free-threading, shared-memory concurrency model; his assessment ends with a suggestion that sub-interpreters provide a better concurrency solution with fewer of the performance and other concerns that no-GIL brings. Others, including steering council member Gregory P. Smith, found that analysis to be somewhat oversimplified. Langa posted benchmark numbers that showed considerably less impact than Shannon's estimates. Langa followed that up with some additional results that correspond closely with what Gross had reported in the PEP.

Guido van Rossum, who heads up the Faster CPython team, wanted to ensure that everyone learned from the mistakes made in the past:

If there's one lesson we've learned from the Python 2 to 3 transition, it's that it would have been very beneficial if Python 2 and 3 code could coexist in the same Python interpreter. We blew it that time, and it set us back by about a decade. Let's not blow it this time. If we're going forward with nogil (and I'm not saying we are, but I can't exclude it), let's make sure there is a way to be able to import extensions requiring the GIL in a nogil interpreter without any additional shenanigans – neither the application code nor the extension module should have to be modified in any way [...]

Meanwhile, Smith replied to Gross's steering-council request (and copied it to the forum thread):

The steering council is going to take its time on this. A huge thank you for working to keep it up to date! We're not ready to simply pronounce on 703 as it has a HUGE blast radius. [...] That does not mean "no" to this. There is demand for it. (personally, I've wanted this since forever!) It's just that it won't be easy and we'll need to consider the entire ecosystem and how to smoothly allow such a change to happen without breaking the world. I'm glad to see the continued discuss thread with faster-cpython folks in particular piping up. The intersection between this work and ongoing single threaded performance improvements will always be high and we don't want to hamper that in the near term.

Gross largely disagreed with Shannon's assessment and, in particular, with his characterization of threading. He was also, seemingly, somewhat unhappy with Smith's reply:

You wrote that the Steering Council's decision does not mean "no," but the steering council has not set a bar for acceptance, stated what evidence is actually needed, nor said when a final decision will be made. Given the expressed demand for PEP 703, it makes sense to me for the steering committee to develop a timeline for identifying the factors it may need to consider and for determining the steps that would be required for the change to happen smoothly. Without these timelines and milestones in place, I would like to explain that the effect of the Steering Council's answer is a "no" in practice. I have been funded to work on this for the past few years with the milestone of submitting the PEP along with a comprehensive implementation to convince the Python community. Without specific concerns or a clear bar for acceptance, I (and my funding organization) will have to treat the current decision-in-limbo as a "no" and will be unable to pursue the PEP further.

That obviously put pressure on the council, as did the users who were clamoring for a no-GIL Python, but the decision is clearly not a simple one. On June 14, more pressure was applied from the Faster CPython team. Van Rossum described some of the costs of no-GIL, but also expressed concern about waiting for a decision:

We've had a group discussion about how our work would be affected by free threading. Our key conclusion is that merging nogil will set back our current results by a significant amount of time, and in addition will reduce our velocity in the future. We don't see this as a reason to reject nogil – it's just a new set of problems we would have to overcome, and we expect that our ultimate design would be quite different as a result. But there is a significant cost, and it's not a one-time cost. We could use help from someone (Sam?) who has experience thinking about the problems posed by the new environment. [...] In the meantime we're treading water, unsure whether to put our efforts in continuing with the current plan, or in designing a new, thread-safe optimization architecture.

Fast, free threading

The next day, Shannon started a new thread (titled: "A fast, free threading Python") that described three possible options for a way forward. It started with a lengthy description of the tradeoffs for optimization of a dynamic language like Python. Of the three aspects that he thinks need to be considered (single-threaded performance, parallelism, and mutability), the last has mostly been glossed over in earlier discussions, "but it is key":

It isn't quite:
    Performance, parallelism, mutability: pick two.
but more like:
    Performance, parallelism, mutability: pick one to restrict.

He also cautioned that there are some unknowns:

Performing the optimizations necessary to make Python fast in a free-threading environment will need some original research. That makes it more costly and a lot more risky.

The options for the steering council amount to choosing a fast single-threaded interpreter as currently planned, a no-GIL free-threading interpreter with an unknown (but non-zero) impact on single-threaded performance, or both at the same time. His preference is for both, but he is concerned that the council might choose no-GIL without also committing to the rest of the work needed:

Please don't choose option 2 [no-GIL] hoping that we will get option 3 [both], because "someone will sort out the performance issue". They won't, unless the resources are there. If we must choose option 1 [current Faster CPython plans] or 2, then I think it has to be option 1. It gives us a speedup at much lower cost in CPUs and energy, by doing things more efficiently rather than just throwing lots of cores at the problem.

Marc-André Lemburg asked about a phased approach, where, effectively, GIL or no-GIL would be chosen at the command line; over time, the two could slowly be merged. "Or would this not be feasible because the 'slow merge' would actually require redesigning the whole specialization approach?" Smith replied that he thinks that is more or less what PEP 703 is proposing; even though Shannon basically recommended against it, Smith thinks pursuing both at once is possible:

I'd more or less expect work on specialization for to proceed in parallel without worrying if those benefits cannot yet be available in a free threaded build for a few of releases. Turning it mostly into an additional code maintenance and test matrix burden on the CPython core dev side to keep both our still-primary single threaded GIL based interpreter and the experimental free threaded build working. I figure this is basically exactly what Mark claims not to want. Presumably due to the interim added build and maintenance complexity. But also seems like the most likely way to get to his "both" option 3 that I suspect we all magically wish would just happen.

Smith followed that up by noting that free threading will need to be addressed at some point; even if the Faster CPython plans work out and Python 3.15 is five times faster than Python 3.10, nobody will "be satisfied at 'just 5x' in the end". Van Rossum agreed, but was also concerned that the council "might be betting on hope as a strategy" by choosing no-GIL and hoping for the best:

Like Mark, I hope that you're choosing (3) – like Mark says, it's clearly the best option. But we will need to be honest about it, and accept that we need more resources to improve single-threaded performance. (And, as I believe someone already pointed out, it will also be harder to do future maintenance on CPython's C code, since so much of it is now exposed to potential race conditions. This is a problem for a language that's for a large part maintained by volunteers.)

The talk of "more resources" led Itamar Oren to wonder what that means: "It's not clear to me to what extent the SC [steering council] is in a position to tie PEP acceptance or rejection to allocation of funding." Van Rossum replied that Microsoft was committed to continue funding the team and that "our charter is not limited to single-threaded performance work", but that there is extra work to do in a no-GIL world:

Meanwhile, we can start adapting the specialization and optimization work to a no-GIL world, with the goal of obtaining Mark's Option 3 (free threading and faster per-thread performance). Ideally we would reach a state where we can make no-GIL the one and only build mode without a drop in single-threaded performance (important for apps that haven't been re-architected, e.g. apps that currently use multi-processing, or algorithms that are hard to parallelize). It is this latter step (getting to Option 3) that requires extra resources – for example, it would be great if Meta or another tech company could spare some engineers with established CPython internals experience to help the core dev team with this work. Finally, I want to re-emphasize that while Microsoft has a team using the Faster CPython moniker, we don't intend to own CPython performance – we believe in good citizenship and want to contribute in a way that puts our skills and experience to the best possible use for the Python community.

Van Rossum did not just pick Meta out of a hat here; Gross works for the company, which presumably funded his no-GIL work, and the Cinder CPython fork is maintained by a team at Meta. Carl Meyer said that he expected the Cinder team to work on no-GIL Python. In fact, on July 7, Meyer announced that Meta would fund work on the no-GIL interpreter:

If PEP 703 is accepted, Meta can commit to support in the form of three engineer-years (from engineers experienced working in CPython internals) between the acceptance of PEP 703 and the end of 2025, to collaborate with the core dev team on landing the PEP 703 implementation smoothly in CPython and on ongoing improvements to the compatibility and performance of nogil CPython.

On July 19, Anaconda followed suit. Stan Seibert said that the company would fund work on the "packaging challenges that will be associated with adopting PEP 703, including any work on pip, cibuildwheel, and conda-forge that will be needed to get nogil-compatible packages into the hands of the Python community". Those funding commitments likely helped the council reach a verdict, but the results of a core-developer poll on no-GIL also pushed the council in the direction of accepting the PEP. That poll showed that 87% of 46 voters thought that free-threaded Python should be actively pursued and that 63% of 38 voters were willing to help support and maintain a no-GIL Python based on PEP 703.

Steering council decision

On July 28, council member Thomas Wouters announced that the council would be accepting PEP 703, though it was "still working on the acceptance details". The idea would be to introduce the no-GIL version of the interpreter in order to give everyone a chance to figure out what pieces are missing, so that they can be filled in before no-GIL becomes the default and, eventually, the only version of Python. The time frame for that transition is estimated to be around five years, but there will be no repeat of earlier mistakes:

We do not want another Python 3 situation, so any changes in third-party code needed to accommodate no-GIL builds should just work in with-GIL builds (although backward compatibility with older Python versions will still need to be addressed). This is not Python 4. We are still considering the requirements we want to place on ABI compatibility and other details for the two builds and the effect on backward compatibility.

As was noted in the various discussions, there is more to removing the GIL than simply adopting a PEP. Wouters made it clear that the core developers will need to gain experience with no-GIL Python so that they can lead the rest of the community:

We will probably need to figure out new C APIs and Python APIs as we sort out thread safety in existing code. We also need to bring along the rest of the Python community as we gain those insights and make sure the changes we want to make, and the changes we want them to make, are palatable.

If the Python community finds that the switch is "just going to be too disruptive for too little gain", the council wants to be able to change its mind anytime before declaring no-GIL as the default mode for the language. Wouters outlined the steps that the council sees, starting with a short-term (perhaps for Python 3.13, which is due in October 2024) experimental no-GIL build of the interpreter that core developers and others can try out. In the medium term, no-GIL would be a supported option, but not the default; when that happens depends a lot on how quickly the community adopts and supports the no-GIL build. In the long term, no-GIL would be the default build and the GIL would be completely excised ("without unnecessarily breaking backward compatibility"). Along the way, periodic reviews will be needed:
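
For readers who want to check which kind of interpreter they are running, a small probe can be sketched. It assumes an interpreter new enough (CPython 3.13 or later) to expose `sys._is_gil_enabled()` and the `Py_GIL_DISABLED` build variable; on older versions neither is present, and the sketch simply reports that the GIL is in use. The helper name `describe_gil` is hypothetical.

```python
import sys
import sysconfig


def describe_gil():
    """Report whether this build supports free threading and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; absent or 0 otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in CPython 3.13; assume True when missing.
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(probe()) if probe is not None else True
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


print(describe_gil())
```

The two values are reported separately because a free-threaded build can still run with the GIL turned on, so the build type alone does not say whether threads are actually running in parallel.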

Throughout the process we (the core devs, not just the SC) will need to re-evaluate the progress and the suggested timelines. We don't want this to turn into another ten year backward compatibility struggle, and we want to be able to call off PEP 703 and find another solution if it looks to become problematic, and so we need to regularly check that the continued work is worth it.

As might be guessed, that spawned multiple congratulatory and excited-for-the-future responses, though there are a few who think that keeping the GIL would be a better choice for the language. The announcement presumably also sent the Faster CPython folks back to their drawing boards; though there were some accusations of turf wars in the discussions, that did not really seem to be the case. The Faster CPython team simply wanted to ensure that all of the costs were taken into consideration; overall, the team seems quite excited to work on surmounting the challenges of producing a no-GIL interpreter, with minimal (or, ideally, no) performance impact on single-threaded code.

It is quite a turning point in the history of the language, but the work is (obviously) not done yet. There is a huge amount of researching, coding, testing, experimenting, documenting, and so on between here and a no-GIL-only version of the language in, say, Python 3.17 in October 2028. One guesses that the work will not be done, then, either—there will be more optimizations to be found and applied if there is still funding available to do so. Meanwhile, we have yet to dig into the details of the PEP itself; that will come soon. We will be keeping an eye on the no-GIL development process as it plays out over the coming years as well.

