GIL removal and the Faster CPython project
The Python global interpreter lock (GIL) has long been a barrier to increasing the performance of programs by using multiple threads—the GIL serializes access to the interpreter's virtual machine such that only one thread can be executing Python code at any given time. There are other mechanisms to provide concurrency for the language, but the specter of the GIL—and its reality as well—have often been cited as a major negative for Python. Back in October 2021, Sam Gross introduced a proof-of-concept, no-GIL version of the language. It was met with a lot of excitement at the time, but seemed to languish to a certain extent for more than a year; now, the Python Steering Council has announced its intent to accept the no-GIL feature. It will still be some time before it lands in a released Python version—and there is the possibility that it all has to be rolled back at some point—but there are several companies backing the effort, which gives it all a good chance to succeed.
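Purely as an editorial illustration (the function names below are made up), the effect of that serialization is easy to see: CPU-bound, pure-Python work run in two threads takes about as long as running it sequentially, because the threads take turns holding the GIL:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def spin(n=5_000_000):
        # Pure-Python, CPU-bound work; the thread running it holds the GIL.
        total = 0
        for i in range(n):
            total += i
        return total

    def timed(label, fn):
        start = time.perf_counter()
        fn()
        print(f"{label}: {time.perf_counter() - start:.2f}s")

    # Two chunks of work, one after the other.
    timed("sequential", lambda: (spin(), spin()))

    # The same work in two threads: on a GIL build this takes roughly as long,
    # because only one thread can execute Python bytecode at a time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        timed("two threads", lambda: list(pool.map(spin, [5_000_000, 5_000_000])))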
After its introduction in 2021, and the discussion around that, the next public appearance for the feature was at the 2022 Python Language Summit in April. Gross gave a talk about his no-GIL fork in the hopes of getting some tacit agreement on proceeding with the work. That agreement was not forthcoming, in part because the full details and implications of a no-GIL interpreter were not really known. Meanwhile, the Faster CPython project, which came about in mid-2021, had been working along on its plan to increase the single-threaded performance of the interpreter. Mark Shannon reported on the status of that effort at the 2022 summit as well. He also authored PEP 659 ("Specialized Adaptive Interpreter") that describes the kinds of changes being made, some of which have found their way into Python 3.10 and 3.11.
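On CPython 3.11 or later, that specialization can be observed directly; the following editorial sketch is not from the article, and the exact specialized instruction names it prints are implementation details that change between versions:

    import dis

    def add(a, b):
        return a + b

    # Warm the function up with int arguments so the adaptive interpreter
    # (PEP 659) has a chance to specialize the generic BINARY_OP instruction.
    for _ in range(1000):
        add(1, 2)

    # With adaptive=True (new in 3.11), dis shows the specialized instructions
    # actually in use, e.g. a BINARY_OP variant specialized for int operands.
    dis.dis(add, adaptive=True)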
At this year's PyCon, two of the Faster CPython team gave talks describing the techniques they have been using to improve the performance of the interpreter: Brandt Bucher looked at adaptive instructions, while Shannon described memory layout improvements and other optimizations. Given the GIL, nearly all existing Python programs are single threaded, so improving the performance of those programs will effectively speed up the entire Python world. One of the concerns that has been heard about no-GIL Python is what its impact on single-threaded programs would be.
PEP 703
In January 2023, Łukasz Langa posted the first version of PEP 703 ("Making the Global Interpreter Lock Optional in CPython") that is authored by Gross; Langa is sponsoring the PEP as a core developer. As might be guessed, that set off a lengthy thread, with, once again, a lot of excitement. There were also some concerns expressed with regard to the implications of not having a GIL, especially for Python extensions written in C; since the GIL protects that code from many concurrency problems, removing it might well lead to bugs.
One thing that everyone wants to avoid is another "flag day" transition like that of Python 2 to 3. The huge and unfortunate impact of Python 3 being incompatible with its predecessor was not foreseen—the core developers vastly underestimated the growing popularity of the language, for one thing—but that mistake will not be repeated. Any switch to remove the GIL will need to smoothly work with code that is not (yet) ready for it.
There was a question from Shannon about "what people think is an acceptable slowdown for single-threaded code". To a large extent, that question went unanswered in the thread, but he had estimated an impact "in the 15-20% range, but it could be more, depending on the impact on PEP 659".
Another Faster CPython team member, Eric Snow, posted a lengthy analysis with a bunch of questions, which he summarized as: "tl;dr I'm really excited by this proposal but have significant concerns, which I genuinely hope the PEP can address." He noted that he was the author of a "competing" concurrency option in PEP 684 ("A Per-Interpreter GIL"), along with the related PEP 683 ("Immortal Objects, Using a Fixed Refcount"), though he does not truly see multiple sub-interpreters, each with their own GIL, as being incompatible with the no-GIL work. Much of his concern was focused on the impacts on the C extensions (which is also a problem for PEP 684, though to a lesser extent), but single-threaded performance was also mentioned. Gross replied that the impact on the extensions was not completely negative:
There are also substantial benefits to extension module maintainers. The PEP includes quotes from a number of maintainers of widely used C API extensions who suffer the complexity of working around the GIL. For example, Zachary DeVito, PyTorch core developer, wrote "On three separate occasions in the past couple of months… I spent an order-of-magnitude more time figuring out how to work around GIL limitations than actually solving the particular problem."
Updated PEP
The thread had mostly run its course by the end of January. In early May, Gross posted an updated version of PEP 703, along with an implementation based on the in-progress Python 3.12. There was just one response early on (which Gross replied to). On May 12, Gross asked the steering council to decide on the PEP. As it turned out, there was still a lot more discussion to go before any decision would be made.
On June 2, Shannon posted a performance assessment of the PEP with some pretty eye-opening numbers (that were disputed) on the impact of the changes; his estimates of the impact ranged from 11 to 30%. He also noted that removing the GIL had some negative impacts on the existing and planned Faster CPython work:
The adaptive specializing interpreter relies on the GIL; it is not thread-friendly. If NoGIL is accepted, then some redesign of our optimization strategy will be necessary. While this is perfectly possible, it does have a cost. The effort spent on this redesign and resulting implementation is not being spent on actual optimizations.
Shannon has noted that he is not a fan of the free-threading, shared-memory concurrency model; his assessment ends with a suggestion that sub-interpreters provide a better concurrency solution with fewer of the performance and other concerns that no-GIL brings. Others, including steering council member Gregory P. Smith, found that analysis to be somewhat oversimplified. Langa posted benchmark numbers that showed considerably less impact than Shannon's estimates, and followed that up with some additional results that correspond closely with what Gross had reported in the PEP.
Guido van Rossum, who heads up the Faster CPython team, wanted to ensure that everyone learned from the mistakes made in the past:
If there's one lesson we've learned from the Python 2 to 3 transition, it's that it would have been very beneficial if Python 2 and 3 code could coexist in the same Python interpreter. We blew it that time, and it set us back by about a decade.

Let's not blow it this time. If we're going forward with nogil (and I'm not saying we are, but I can't exclude it), let's make sure there is a way to be able to import extensions requiring the GIL in a nogil interpreter without any additional shenanigans – neither the application code nor the extension module should have to be modified in any way [...]
Meanwhile, Smith replied to Gross's steering-council request (and copied it to the forum thread):
The steering council is going to take its time on this. A huge thank you for working to keep it up to date! We're not ready to simply pronounce on 703 as it has a HUGE blast radius.

[...] That does not mean "no" to this. There is demand for it. (personally, I've wanted this since forever!) It's just that it won't be easy and we'll need to consider the entire ecosystem and how to smoothly allow such a change to happen without breaking the world.
I'm glad to see the continued discuss thread with faster-cpython folks in particular piping up. The intersection between this work and ongoing single threaded performance improvements will always be high and we don't want to hamper that in the near term.
Gross largely disagreed with Shannon's assessment and, in particular, with his characterization of threading. He was also, seemingly, somewhat unhappy with Smith's reply:
You wrote that the Steering Council's decision does not mean "no," but the steering council has not set a bar for acceptance, stated what evidence is actually needed, nor said when a final decision will be made. Given the expressed demand for PEP 703, it makes sense to me for the steering committee to develop a timeline for identifying the factors it may need to consider and for determining the steps that would be required for the change to happen smoothly.

Without these timelines and milestones in place, I would like to explain that the effect of the Steering Council's answer is a "no" in practice. I have been funded to work on this for the past few years with the milestone of submitting the PEP along with a comprehensive implementation to convince the Python community. Without specific concerns or a clear bar for acceptance, I (and my funding organization) will have to treat the current decision-in-limbo as a "no" and will be unable to pursue the PEP further.
That obviously put pressure on the council, as did the users who were clamoring for a no-GIL Python, but the decision is clearly not a simple one. On June 14, more pressure was applied from the Faster CPython team. Van Rossum described some of the costs of no-GIL, but also expressed concern about waiting for a decision:
We've had a group discussion about how our work would be affected by free threading. Our key conclusion is that merging nogil will set back our current results by a significant amount of time, and in addition will reduce our velocity in the future. We don't see this as a reason to reject nogil – it's just a new set of problems we would have to overcome, and we expect that our ultimate design would be quite different as a result. But there is a significant cost, and it's not a one-time cost. We could use help from someone (Sam?) who has experience thinking about the problems posed by the new environment.

[...] In the meantime we're treading water, unsure whether to put our efforts in continuing with the current plan, or in designing a new, thread-safe optimization architecture.
Fast, free threading
The next day, Shannon started a new thread (titled: "A fast, free threading Python") that described three possible options for a way forward. It started with a lengthy description of the tradeoffs for optimization of a dynamic language like Python. Of the three aspects that he thinks need to be considered, single-threaded performance, parallelism, and mutability, the last has mostly been glossed over in earlier discussions, "but it is key":
It isn't quite:

Performance, parallelism, mutability: pick two.

but more like:

Performance, parallelism, mutability: pick one to restrict.
He also cautioned that there are some unknowns:
Performing the optimizations necessary to make Python fast in a free-threading environment will need some original research. That makes it more costly and a lot more risky.
The options for the steering council amount to choosing a fast single-threaded interpreter as currently planned, a no-GIL free-threading interpreter with an unknown (but non-zero) impact on single-threaded performance, or both at the same time. His preference is for both, but he is concerned that the council might choose no-GIL without also committing to the rest of the work needed:
Please don't choose option 2 [no-GIL] hoping that we will get option 3 [both], because "someone will sort out the performance issue". They won't, unless the resources are there.

If we must choose option 1 [current Faster CPython plans] or 2, then I think it has to be option 1. It gives us a speedup at much lower cost in CPUs and energy, by doing things more efficiently rather than just throwing lots of cores at the problem.
Marc-André Lemburg asked about a phased approach, where, effectively, GIL or no-GIL were chosen at the command line; over time, the two could slowly be merged. "Or would this not be feasible because the 'slow merge' would actually require redesigning the whole specialization approach?"
" Smith replied
that he thinks that is more or less what PEP 703 is proposing; even
though Shannon basically recommended against it, Smith thinks pursuing
both at once is possible:
I'd more or less expect work on specialization for to proceed in parallel without worrying if those benefits cannot yet be available in a free threaded build for a few of releases. Turning it mostly into an additional code maintenance and test matrix burden on the CPython core dev side to keep both our still-primary single threaded GIL based interpreter and the experimental free threaded build working.I figure this is basically exactly what Mark claims not to want. Presumably due to the interim added build and maintenance complexity. But also seems like the most likely way to get to his "both" option 3 that I suspect we all magically wish would just happen.
Smith followed that up by noting that free threading will need to be addressed at some point; even if the Faster CPython plans work out and Python 3.15 is five times faster than Python 3.10, nobody will "be satisfied at 'just 5x' in the end".

Van Rossum agreed, but was also concerned that the council "might be betting on hope as a strategy" by choosing no-GIL and hoping for the best.
Like Mark, I hope that you're choosing (3) – like Mark says, it's clearly the best option. But we will need to be honest about it, and accept that we need more resources to improve single-threaded performance. (And, as I believe someone already pointed out, it will also be harder to do future maintenance on CPython's C code, since so much of it is now exposed to potential race conditions. This is a problem for a language that's for a large part maintained by volunteers.)
The talk of "more resources" led Itamar Oren to
wonder
what that means: "It's not clear to me to what extent the SC
[steering council] is in a position to tie PEP acceptance or rejection to
allocation of funding.
" Van Rossum replied
that Microsoft was committed to continue funding the team and that "our
charter
is not limited to single-threaded performance work
", but that there is
extra work to do in a no-GIL world:
Meanwhile, we can start adapting the specialization and optimization work to a no-GIL world, with the goal of obtaining Mark's Option 3 (free threading and faster per-thread performance). Ideally we would reach a state where we can make no-GIL the one and only build mode without a drop in single-threaded performance (important for apps that haven't been re-architected, e.g. apps that currently use multi-processing, or algorithms that are hard to parallelize).

It is this latter step (getting to Option 3) that requires extra resources – for example, it would be great if Meta or another tech company could spare some engineers with established CPython internals experience to help the core dev team with this work.
Finally, I want to re-emphasize that while Microsoft has a team using the Faster CPython moniker, we don't intend to own CPython performance – we believe in good citizenship and want to contribute in a way that puts our skills and experience to the best possible use for the Python community.
Van Rossum did not just choose Meta out of a hat, here; Gross works for the company, which presumably funded his no-GIL work, and the Cinder CPython fork is maintained by a team at Meta. Carl Meyer said that he expected the Cinder team to work on no-GIL Python. In fact, on July 7, Meyer announced that Meta would fund work on the no-GIL interpreter:
If PEP 703 is accepted, Meta can commit to support in the form of three engineer-years (from engineers experienced working in CPython internals) between the acceptance of PEP 703 and the end of 2025, to collaborate with the core dev team on landing the PEP 703 implementation smoothly in CPython and on ongoing improvements to the compatibility and performance of nogil CPython.
On July 19, Anaconda followed suit. Stan Seibert said that the company would fund work on the "packaging challenges that will be associated with adopting PEP 703, including any work on pip, cibuildwheel, and conda-forge that will be needed to get nogil-compatible packages into the hands of the Python community". Some of that funding commitment likely helped the council reach a verdict, but the results of a core-developer poll on no-GIL also pushed the council in the direction of accepting the PEP. That poll showed 87% of 46 voters thought that free-threaded Python should be actively pursued and 63% of 38 voters said that they were willing to help support and maintain a no-GIL Python based on PEP 703.
Steering council decision
On July 28, council member Thomas Wouters announced that the council would be accepting PEP 703, though it was "still working on the acceptance details". The idea would be to introduce the no-GIL version of the interpreter in order to give everyone a chance to figure out what pieces are missing, so that they can be filled in before no-GIL becomes the default and, eventually, the only version of Python. The time frame for that transition is estimated to be around five years, but there will be no repeat of earlier mistakes:
We do not want another Python 3 situation, so any changes in third-party code needed to accommodate no-GIL builds should just work in with-GIL builds (although backward compatibility with older Python versions will still need to be addressed). This is not Python 4. We are still considering the requirements we want to place on ABI compatibility and other details for the two builds and the effect on backward compatibility.
As was noted in the various discussions, there is more to removing the GIL than simply adopting a PEP. Wouters made it clear that the core developers will need to gain experience with no-GIL Python so that they can lead the rest of the community:
We will probably need to figure out new C APIs and Python APIs as we sort out thread safety in existing code. We also need to bring along the rest of the Python community as we gain those insights and make sure the changes we want to make, and the changes we want them to make, are palatable.
If the Python community finds that the switch is "just
going to be too disruptive for too little gain
", the council wants to
be able to change its mind anytime before declaring no-GIL as the default
mode for the language. He outlined the steps that the council sees,
starting with a short-term (perhaps for Python 3.13, which is due in
October 2024) experimental no-GIL build of the interpreter that core
developers and others can try out. In the medium term, no-GIL would be a
supported option, but not the default; when that happens depends a lot on
how quickly the community adopts and supports the no-GIL build. In the
long term,
no-GIL would be the default build and the GIL would be completely excised
("without unnecessarily breaking backward compatibility
"). Along
the way, periodic reviews will be needed:
Throughout the process we (the core devs, not just the SC) will need to re-evaluate the progress and the suggested timelines. We don't want this to turn into another ten year backward compatibility struggle, and we want to be able to call off PEP 703 and find another solution if it looks to become problematic, and so we need to regularly check that the continued work is worth it.
As might be guessed, that spawned multiple congratulatory and excited-for-the-future responses, though there are a few who think that keeping the GIL would be a better choice for the language. The announcement presumably also sent the Faster CPython folks back to their drawing boards; though there were some accusations of turf wars in the discussions, that did not really seem to be the case. The Faster CPython team simply wanted to ensure that all of the costs were taken into consideration; overall, the team seems quite excited to work on surmounting the challenges of producing a no-GIL interpreter, with minimal (or, ideally, no) performance impact on single-threaded code.
It is quite a turning point in the history of the language, but the work is (obviously) not done yet. There is a huge amount of researching, coding, testing, experimenting, documenting, and so on between here and a no-GIL-only version of the language in, say, Python 3.17 in October 2028. One guesses that the work will not be done, then, either—there will be more optimizations to be found and applied if there is still funding available to do so. Meanwhile, we have yet to dig into the details of the PEP itself; that will come soon. We will be keeping an eye on the no-GIL development process as it plays out over the coming years as well.
Index entries for this article:
Python: CPython
Python: Global interpreter lock (GIL)
Python: Python Enhancement Proposals (PEP)/PEP 703
Posted Aug 3, 2023 2:47 UTC (Thu) by milesrout (subscriber, #126894) (12 responses)
It seems to me that if the Faster CPython people have made such large performance improvements to single-threaded Python recently, and the removal of the GIL slows down the performance of single-threaded Python to erase those changes, then it's pretty hard to argue that it counts as a serious performance regression. After all, it's just slowing down the language to where it was a couple of years ago. In the meantime computers have got faster and the GIL removal will let people create huge speedups in other areas.
And then there's the fact that if you're really concerned about single-threaded CPU-bound performance in $current_year then that's pretty niche, and expecting that performance in Python is even more niche. If you care about CPU throughput then why would you only use an 1/8th of your CPU by going single-threaded? And for that matter, why would you only use about 1/50th or 1/100th of your CPU by using pure Python?
Posted Aug 3, 2023 6:42 UTC (Thu) by maniax (subscriber, #4509) (11 responses)
In my experience with parallelization and using more CPUs, the threaded, lock/mutex/semaphore model scales worse than having separate, isolated processes that do not touch anyone else's memory. Most high-performance systems run one thread/process per CPU, pin it there, and have a way of passing data back and forth which is not "all the time", but "task" based ("process this transaction and give me the result"). One example is the Python multiprocessing module (although it had a few very nasty known bugs) - for a lot of work it's trivial to add and use it, and it gives you pretty good scalability.
(A nasty known bug that seems to have been fixed in Python 3, according to https://pythonspeed.com/articles/python-multiprocessing/ - the main process was threaded, used fork(), and thus was able to create some nasty deadlocks; they weren't following POSIX, which says that for a threaded process, after fork() the only thing you should do is execve().)
So in general, faster single-thread performance and multiple interpreter instances should be faster in pretty much all cases than a threaded locking model.
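A minimal editorial sketch of the task-based, no-shared-state style described above, using the standard multiprocessing module (the worker function and pool size are made-up placeholders); the "spawn" start method is the documented way to avoid the fork()-while-threaded hazard mentioned in the comment:

    import multiprocessing as mp

    def handle_transaction(item):
        # Hypothetical per-task work; each worker has its own interpreter
        # and memory, so no locking is needed here.
        return item * item

    if __name__ == "__main__":
        # "spawn" starts fresh worker processes instead of fork()ing a
        # possibly-threaded parent, avoiding the deadlocks described above.
        ctx = mp.get_context("spawn")
        with ctx.Pool(processes=4) as pool:
            results = pool.map(handle_transaction, range(16))
        print(results)

The "forkserver" start method is a documented alternative on POSIX systems with a similar effect.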
Posted Aug 3, 2023 7:15 UTC (Thu) by jem (subscriber, #24231)
Posted Aug 3, 2023 8:48 UTC (Thu) by kleptog (subscriber, #1183) (2 responses)
But it is use-case dependant. If you're executing CPU-heavy tasks in parallel, then multiple threads with no shared mutable state are going to beat the pants off anything that needs to think about locking. On the other hand if you're using threading for I/O tasks (basically asyncio-with-added-data-races) then the message passing might be a problem. OTOH, you see many such programs having I/O threads just signalling a main thread anyway so it's just a design choice.
The most extreme version of this I've seen is Erlang, where your programs spawn hundreds maybe thousands of threads (the Actor model) because they're cheap and message-passing is fast. Its data model however is very simple, I just don't see Python ever getting message passing fast enough to make that feasible.
I guess the discussion is really about what kind of language Python wants to be: easy to use or high performance parallel processing. TANSTAAFL. The idea of "you can just start threads and data races are your problem" is I guess "easy to use"?
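The I/O half of that use-case split is easy to demonstrate; in this editorial sketch the sleep stands in for a network request, and the threads scale even under the GIL because blocking calls release the lock while they wait:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def fetch(i):
        time.sleep(0.5)  # stand-in for a network request; sleeping releases the GIL
        return i

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(fetch, range(10)))
    # Roughly 0.5s rather than 5s, because the threads spend their time
    # blocked rather than executing Python bytecode.
    print(f"{len(results)} tasks in {time.perf_counter() - start:.2f}s")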
Posted Aug 3, 2023 12:02 UTC (Thu) by ballombe (subscriber, #9523) (1 response)

I have come to the same conclusion. "Embarrassingly parallel" should be renamed to "Efficiently parallel".

I know that countless computer science Ph.D. theses have been written on how to do shared-mutable-state parallelism, but in practice, no-shared-mutable-state parallelism is always more efficient and more robust. It is just a matter of doing parallelism at the right level. It is also relatively easy to add support for no-shared-mutable-state parallelism to a language in a foolproof way that does not allow the user to create race conditions and data dependencies, and it is easier to document.
Posted Aug 11, 2023 17:19 UTC (Fri) by Lennie (subscriber, #49641)
Because the multi-process model should not degrade in performance from doing this work.
Posted Aug 3, 2023 15:02 UTC (Thu) by zorro (subscriber, #45643) (3 responses)

I don't really understand the controversy. If instead of writing

for i in range(len(list)):
    foo(list[i])

I can write

parallel for i in range(len(list)):
    foo(list[i])

and get an N x speedup, then that is huge. No sub-interpreters, locking or message passing in sight.

If no-GIL allows me to parallelize my loops in a simple way so that my CPU cores don't go to waste then I'm all for it.
Posted Aug 3, 2023 15:16 UTC (Thu) by mb (subscriber, #50428) (2 responses)

That depends on whether your list contains items that point to global mutable state or if your function accesses global mutable state directly. Therefore, it's not as simple as marking a loop for parallel execution. And that is exactly where subtle breakages can and will occur.

In Python there is no way to statically check this.

In Rust it is statically checked to the extent that the compiler forces you to add dynamic checks or locks if it can't be checked statically. That's why such easy to use parallelism is easy and safe to implement in Rust. Most subtle breakages are extremely unlikely. Some types of bugs are outright impossible.
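A small editorial example of the kind of subtle breakage being described: even with the GIL, an unsynchronized read-modify-write on shared state can lose updates, because the interpreter may switch threads between the load and the store:

    import threading

    counter = 0

    def work(n=100_000):
        global counter
        for _ in range(n):
            counter += 1  # load, add, store: the GIL can be released in between

    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Not guaranteed to be 400000: increments can be lost when two threads
    # interleave their read-modify-write sequences.
    print(counter)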
Posted Aug 3, 2023 15:28 UTC (Thu) by zorro (subscriber, #45643) (1 response)
Posted Aug 3, 2023 17:15 UTC (Thu) by bluss (guest, #47454)
Posted Aug 4, 2023 2:45 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) (2 responses)
The problem with isolated processes is that sometimes you need to share the data. Something simple like a shared read-mostly cache becomes nearly impossible. You also can't use simple parallel algorithms when needed.
Posted Aug 4, 2023 8:57 UTC (Fri) by kleptog (subscriber, #1183) (1 response)
Having threads which don't share data doesn't mean you *can't* share data, it just mean that accessing shared data is an explicit action. As a thread you can assume that all your objects are yours, no locking required.
To take Erlang as an example, you either have a process (Erlang is a VM so the threads are called processes) that owns the shared structure and other tasks access it by sending messages, or you use the global ETS table which is a global shared dict that can be accessed atomically from any process.
The idea is to keep the amount of shared mutable data to a minimum, so you only need locking for the moments you actually access the shared state, rather than pessimistically assuming *every* access might access shared state. If you have an algorithm that requires many threads to modify the same data simultaneously, it's going to suck whatever you do. What the kernel does is only lock data as required and assume the developer knows what they're doing, which does work but is probably not appropriate for Python (which must always preserve its internal state).
Though RCU is a really neat trick which works in many cases.
Posted Aug 5, 2023 4:42 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)
Python does _not_ have (concurrent) threads. That's the point. So nothing you do matters.
You can try to work around this by using subprocesses and objects in shared memory, but this never works out well.
Posted Aug 3, 2023 7:57 UTC (Thu) by roc (subscriber, #30627) (8 responses)
It would be helpful if the motivation section for PEP 703 carefully explained why in each case PEP 684 with one interpreter per thread would not mitigate the issue, or could not be enhanced to mitigate the issue. There would still be inter-interpreter communication overhead, but that seems like a much easier problem to solve than PEP 703. E.g. maybe you could build an object migration API that exposes an inter-interpreter object queue, where pushing an object into the queue consumes the object reference. If the object refcount is 1 you can scan the object subgraph, and if any inner objects have refcounts > 1 you copy them or immortalize them, then hand off ownership of the entire object graph to the destination interpreter.
Posted Aug 3, 2023 9:07 UTC (Thu) by mb (subscriber, #50428) (2 responses)
There already are Python interpreters without a GIL, like Jython and pypy-stm.
Posted Aug 3, 2023 9:35 UTC (Thu) by roc (subscriber, #30627) (1 response)
They're not exactly mainstream though.
Posted Aug 18, 2023 12:54 UTC (Fri) by sammythesnake (guest, #17693)
It's a shame, though - STM in particular was a pretty cool *concept*, even if making it work kinda wants hypothetical hardware support :-P
Posted Aug 3, 2023 9:39 UTC (Thu) by roc (subscriber, #30627)
Posted Aug 3, 2023 21:18 UTC (Thu) by bluss (guest, #47454) (1 response)
Posted Aug 4, 2023 9:01 UTC (Fri) by zorro (subscriber, #45643)
I'm sure there is some task and thread management overhead under the hood when using this .NET feature, but it does not sound as heavyweight as a Python sub-interpreter (I could be wrong about that)
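For comparison, the closest standard-library spelling of that kind of parallel loop in Python is an executor map, sketched here with placeholder names; under the GIL it only pays off when the per-item work releases the lock (I/O, NumPy, and so on), which is exactly the limitation a no-GIL build would lift for pure-Python functions:

    from concurrent.futures import ThreadPoolExecutor

    def foo(item):          # placeholder for the real per-item work
        return item * item

    items = range(100)      # placeholder input

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(foo, items))

A ProcessPoolExecutor can be substituted today for CPU-bound work, at the cost of pickling the items and results across process boundaries.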
Posted Aug 5, 2023 9:06 UTC (Sat) by ssmith32 (subscriber, #72404) (1 response)
- ensure existing python code works the same
- ensure Faster Python (single thread performance) isn't impacted long term. Most notably, by the time no-gil is the default.
So.. you shouldn't see single thread performance go down, and you'll only get into race-condition territory if you actively decide to change how you write your python code.
Posted Aug 8, 2023 18:12 UTC (Tue) by roc (subscriber, #30627)
You could make the entire JIT state per-thread, I suppose, though that will be slow since you'll have an extra layer of indirection to get to your PICs etc. Otherwise you'll be synchronizing on it some way or another.
You'll definitely have unavoidable overheads making sure that Python object state (e.g. refcounts) isn't corrupted by data races. Sure, do deferred refcounting etc, but you need extra machinery that at some point you'll have to pay for. Also the complexity of the Python implementation is going to soar.
Cost of Python 3 transition not foreseen?

Posted Aug 18, 2023 14:48 UTC (Fri) by faassen (guest, #1676)

> "flag day" transition like that of Python 2 to 3. The huge
> and unfortunate impact of Python 3 being incompatible
> with its predecessor was not foreseen—the core
> developers vastly underestimated the growing
> popularity of the language, for one thing—but that
> mistake will not be repeated.

First of all, I do appreciate that the lesson was learned!

But this is not the first time that this site has mentioned that the huge impact of Python 3 not being compatible with Python 2 was not foreseen. I beg to differ, and I have a post from 2007 to prove it:

https://blog.startifact.com/posts/older/brief-python-3000...

> It won't be easy to motivate a customer to pay for
> porting activity that will bring no new features to them
> whatsoever. People will therefore continue to run this
> code on Python 2.x. Since Python 2.x code doesn't work
> on Python 3.x, it won't be accessible to people who made
> the jump. Since Python 3.x code doesn't work on Python
> 2.x, it won't be accessible to those with existing code
> bases who can't make the jump any time soon. As a
> result, two Python communities for a period of what I
> expect to be 5 to 10 years.
> ...
> Python 3 is a serious risk to the Python community.

Now I was wrong about "two Python communities", but otherwise the point stands.

Follow-up posts:

https://blog.startifact.com/posts/older/python-3-worries-...

Perhaps this obscure blog wasn't noticed by anyone? No: this statement by me in particular caught the attention of a prominent Python core developer, who didn't like it at all and we argued about it (in person) several times:

> You, the core developers, are causing a huge risk to the
> Python community by splitting it asunder for a period
> of years, and increase the code maintenance costs of all
> Python developers significantly due to this transition.

More context:

https://blog.startifact.com/posts/older/the-purpose-to-my...

And the final post in that sequence:

https://blog.startifact.com/posts/older/communicating-wit...

So shall we put to rest the notion that the cost of the Python 2 to Python 3 transition wasn't foreseen? Some people did foresee it. It was just not taken seriously at the time.