Tcl helped Guile Scheme

Posted Oct 3, 2024 20:04 UTC (Thu) by NYKevin (subscriber, #129325)
In reply to: Tcl helped Guile Scheme by foom
Parent article: Tcl/Tk 9.0 released

> You must somehow ensure you're calling into the correct Python interpreter state. But all the well-lit paths are broken and incorrect -- they assume that either you already have a python interpreter context associated with the current thread, or that there is a single "main" interpreter that you want to use. (See https://github.com/python/cpython/issues/59956)

Unfortunately, there's just not much that can be done to properly fix that API. The whole point of PyGILState is to provide a convenient way for foreign code to call into Python without having to know any details of how Python has been initialized or configured. From the perspective of this API, you could be calling into a /usr/bin/python process from a C extension, calling into an embedded interpreter from the program doing the embedding, or any number of more complicated cases. In principle, I think you can even call into this API from a ptrace-injected thread.

The obvious problem is that, if the foreign code does not know which interpreter it wants, then the API has no hope of selecting the right one in the general case. And if the foreign code does know which interpreter it wants, it can call PyThreadState_New() etc. instead of using PyGILState. That's slightly more verbose, but frankly not by much (a pair of helper functions or macros similar to Py_BEGIN_ALLOW_THREADS would have some marginal utility, but each would only be two or three lines long). The larger problem is that you need to structure your C code so that you know which interpreter you want, either because you created it in the first place, or because you arrange for Python to call into you before you call into Python.

Offering this API in the first place was perhaps imprudent in retrospect, but I'm sure there are plenty of cases where it is very useful.

Tcl helped Guile Scheme

Posted Oct 3, 2024 20:19 UTC (Thu) by anselm (subscriber, #2796)

> The obvious problem is that, if the foreign code does not know which interpreter it wants, then the API has no hope of selecting the right one in the general case.

Tcl addresses this by making the first parameter to most of the function calls in its C API a pointer to a Tcl_Interp structure (which contains the state of the Tcl interpreter in question). That way, it is always clear which Tcl interpreter is meant when you call Tcl from your own C code.
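
To make the convention concrete, here is a minimal toy sketch in C. Everything in it (MiniInterp, mini_create_command(), mini_invoke(), mini_set_result(), app_count_cmd()) is hypothetical and is not part of Tcl; it only mimics the shape of real calls such as Tcl_CreateObjCommand() and Tcl_SetObjResult(), which take a Tcl_Interp * as their first argument, and of command procedures, which are handed the interpreter that invoked them. Two interpreters can register the same command without their state getting mixed up, because every call names its interpreter explicitly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy interpreter, loosely shaped like Tcl's C API.
 * None of these names exist in libtcl. */
typedef struct MiniInterp MiniInterp;

/* Command callbacks are handed the interpreter that invoked them. */
typedef int (*MiniCmdProc)(void *clientData, MiniInterp *interp, int arg);

#define MINI_MAX_CMDS 8

struct MiniInterp {
    struct {
        const char *name;
        MiniCmdProc proc;
        void *clientData;
    } cmds[MINI_MAX_CMDS];
    int ncmds;
    char result[64];   /* per-interpreter result string */
};

/* Every entry point takes the interpreter as its first argument. */
int mini_create_command(MiniInterp *interp, const char *name,
                        MiniCmdProc proc, void *clientData) {
    if (interp->ncmds == MINI_MAX_CMDS) {
        return -1;
    }
    interp->cmds[interp->ncmds].name = name;
    interp->cmds[interp->ncmds].proc = proc;
    interp->cmds[interp->ncmds].clientData = clientData;
    interp->ncmds++;
    return 0;
}

void mini_set_result(MiniInterp *interp, const char *s) {
    snprintf(interp->result, sizeof interp->result, "%s", s);
}

/* Look up a command in this interpreter only and run it. */
int mini_invoke(MiniInterp *interp, const char *name, int arg) {
    for (int i = 0; i < interp->ncmds; i++) {
        if (strcmp(interp->cmds[i].name, name) == 0) {
            return interp->cmds[i].proc(interp->cmds[i].clientData, interp, arg);
        }
    }
    mini_set_result(interp, "unknown command");
    return -1;
}

/* Hypothetical application-provided command: it never has to guess
 * which interpreter called it, because the caller passed it in. */
int app_count_cmd(void *clientData, MiniInterp *interp, int arg) {
    int *counter = clientData;
    char buf[32];
    *counter += arg;
    snprintf(buf, sizeof buf, "%d", *counter);
    mini_set_result(interp, buf);
    return 0;
}
```

The key point is in app_count_cmd(): it is told which interpreter invoked it, so the result lands in the right place without any global lookup.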

This design is one of the consequences of Tcl originally being intended as an embedded extension language (the application, usually written in C or C++, would create a new Tcl interpreter via the Tcl C API and use it to run Tcl code, which would then presumably use functionality that the application had provided to that interpreter in the form of Tcl commands), rather than as a free-standing programming language with an escape hatch into C. Later on, when Tcl acquired the ability to load dynamic shared libraries, it became more popular to provide the “application” as a shared library loaded from tclsh or wish (a tclsh that is already linked with Tk), rather than the other way round, with a C application that brought in libtcl.so.

Tcl helped Guile Scheme

Posted Oct 4, 2024 3:12 UTC (Fri) by foom (subscriber, #14868)

Yes, perhaps the best way to fix the Python GILState APIs is to simply deprecate them and recommend an interpreter-aware alternative.

ISTM that a lot of the time, C code will be doing something like invoking a Python callback that had been recorded earlier (potentially on a different thread). It would be "easy" for such code to record the interpreter pointer at the same time as it records the function pointer. Or at least it would be, if there were proper documentation stating that this is required and demonstrating how to do it.
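
A minimal sketch of that idea in C, under loudly stated assumptions: ToyInterp, toy_current_interp, toy_save_callback(), toy_fire_callback() and toy_note_interp() are hypothetical stand-ins, not CPython API, and the "current interpreter" is modeled as a plain thread-local pointer. In real code the recording step would presumably use PyInterpreterState_Get() while Python is calling in, and the firing step would attach with PyThreadState_New()/PyEval_RestoreThread() and detach with PyThreadState_Clear()/PyThreadState_DeleteCurrent(), as described in the reply below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not CPython types or functions. */
typedef struct ToyInterp {
    const char *name;
} ToyInterp;

/* Roughly plays the role of the thread-local "current thread state". */
_Thread_local ToyInterp *toy_current_interp = NULL;

typedef void (*ToyCallback)(void *arg);

/* The proposed fix: keep the interpreter next to the function pointer. */
typedef struct {
    ToyCallback fn;
    void *arg;
    ToyInterp *interp;
} ToySavedCallback;

/* Called while Python is calling into us, so the current interpreter
 * is the one this callback belongs to. */
ToySavedCallback toy_save_callback(ToyCallback fn, void *arg) {
    ToySavedCallback cb = { fn, arg, toy_current_interp };
    assert(cb.interp != NULL);
    return cb;
}

/* Later, possibly on another thread: attach to the recorded interpreter
 * (not "the main one" or whatever happens to be current), call, detach. */
void toy_fire_callback(const ToySavedCallback *cb) {
    ToyInterp *previous = toy_current_interp;
    toy_current_interp = cb->interp;
    cb->fn(cb->arg);
    toy_current_interp = previous;
}

/* Hypothetical example callback: records which interpreter it ran under. */
void toy_note_interp(void *arg) {
    *(ToyInterp **)arg = toy_current_interp;
}
```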

Tcl helped Guile Scheme

Posted Oct 4, 2024 18:46 UTC (Fri) by NYKevin (subscriber, #129325)

It is documented, albeit badly (IMHO). See https://docs.python.org/3/c-api/init.html

Unfortunately, the correct procedure is rather path-dependent and can be complicated in some edge cases, which is probably why the PyGILState functions exist. Here are some examples:

* If Python previously called into you from the same thread, then that thread would usually be managed by Python, and so you probably already have the GIL. Then there would be nothing to do. The only exception I can think of is if you called into Python first (and then it called you back), which implies that you already figured out how to acquire the GIL, so just do whatever you did again.
* If Python previously called into you from a different thread, then, while Python is calling into you (and you therefore hold the GIL), you can use PyInterpreterState *interp = PyInterpreterState_Get() to get the current interpreter and save that pointer. Then, on the new thread, call PyEval_RestoreThread(PyThreadState_New(interp)) (error checking elided). When you're done calling into Python, you will probably want to call PyThreadState_Clear(PyThreadState_Get()); PyThreadState_DeleteCurrent() to free the object created by PyThreadState_New() (and implicitly release the GIL). If you want to keep the thread state around for multiple calls into Python, you can use PyEval_SaveThread()/PyEval_RestoreThread() to release and re-acquire the GIL in the usual manner (the Py_BEGIN_ALLOW_THREADS convenience macro also works), but you should eventually free it when you are finished calling into Python. You need a separate thread state object for each thread that will call into Python. Technically, you need one for each thread-interpreter combination, but I'm assuming you're not planning to interleave interactions with multiple interpreters on a single thread, because then you have to use PyThreadState_Swap() and the complexity really starts to make your code hard to read.

Frankly, those two bullets cover the vast majority of "real" use cases. But for full feature parity, there's a long tail of other cases to consider:

* If Python did not previously call into you at all, then you have a couple of further options. You could use the "main" interpreter (the first interpreter initialized for the process, as returned by PyInterpreterState_Main()), but that's really only suitable if you're not going to interact with any pre-existing Python objects and just need an arbitrary interpreter. In that case, it is probably better to create your own private sub-interpreter instead, ideally with its own GIL (which you can do by acquiring the main interpreter's GIL and then calling Py_NewInterpreterFromConfig(...) to create the sub-interpreter). OTOH, creating lots of tiny interpreters is probably not optimal either, so there's a certain amount of nuance here, and you may need to experiment with different setups and benchmark the performance. Creating a private interpreter is most likely to be worth it if you know that the rest of the process makes extensive use of the main interpreter (or of some sub-interpreter that shares the main interpreter's GIL), and least likely to be worth it if yours is the only code in the whole process calling into Python.
* The other possibility is that you know which interpreter you want, because you created it in the first place. But then this reduces to the "Python previously called you from a different thread" case, since creating an interpreter puts your thread into (more or less) the same state as if Python just called into it from that interpreter. It also implicitly releases the parent interpreter's GIL (if you request a separate GIL for the child interpreter), so you need not worry about doing that.
* PyGILState_Ensure() can also be called recursively if you do not know whether you hold the GIL, and so for full parity, we need to have a way to check whether you have the GIL already (to avoid self-deadlock in the case that you do have the GIL). You can use PyGILState_Check() for that, and unlike the rest of the PyGILState API, this ought to be reasonably compatible with multiple interpreters. That's because it is reading the same thread-local variable that the rest of the PyThreadState API interacts with, so it should not "care" whether you are acquiring the GIL via PyGILState or with the interpreter-aware functions. The caveat is that it is possible to clear this variable without actually releasing the GIL (e.g. with PyThreadState_Swap()), in which case this function can lie to you. So don't do that and then try to query whether the GIL is held - PyThreadState_Swap() requires holding the GIL anyway, so I'm not entirely sure there's even a use case for doing this in the first place.
* Believe it or not, we're still not done, because recursive calls to PyGILState_Ensure() can be interleaved with the rest of the PyThreadState API, including PyEval_SaveThread(), so it also has to check for that case explicitly and re-acquire the lock if it has been released since the last call. I'm... not really sure why you would do that, but apparently it is a thing you can do.
* Note also that Python's thread-local storage API can be used without holding the GIL and stores arbitrary void* values, so you can record additional information there if it is useful to do so. That is exactly how PyGILState works internally: it maintains a counter of how many times you have called into it in order to figure out when to destroy the thread state, and this system is even designed to avoid destroying manually-created thread states, so that it can be safely interleaved with PyThreadState_New() and company.
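
The bookkeeping in the last few bullets can be sketched as a toy, single-threaded model in C. This is not CPython's code: ToyThreadState, toy_gil_owner, toy_tss and the toy_* functions are hypothetical, the "GIL" is just a pointer to the owning state (a real lock would block where the toy merely asserts), and the thread-local pointer stands in for the TSS slot. The model shows the nesting counter, a manually created state starting at a count of 1 so that matched ensure/release pairs never free it, re-acquiring after a save, and a PyGILState_Check()-style query.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy, single-threaded model; not CPython code. */
typedef struct ToyThreadState {
    int gilstate_counter;   /* nesting depth of ensure() calls */
} ToyThreadState;

typedef enum { TOY_GIL_LOCKED, TOY_GIL_UNLOCKED } ToyGILState;

ToyThreadState *toy_gil_owner = NULL;         /* toy "GIL": owning state */
_Thread_local ToyThreadState *toy_tss = NULL; /* state bound to this thread */
int toy_live_states = 0;                      /* lets us observe frees */

/* Stand-ins for PyEval_RestoreThread()/PyEval_SaveThread(). */
void toy_restore_thread(ToyThreadState *ts) {
    assert(toy_gil_owner == NULL);   /* a real GIL would block here */
    toy_gil_owner = ts;
}

void toy_save_thread(void) {
    toy_gil_owner = NULL;
}

/* Manual creation: the counter starts at 1, so matched
 * ensure()/release() pairs never bring it to 0 and free it. */
ToyThreadState *toy_threadstate_new(void) {
    ToyThreadState *ts = calloc(1, sizeof *ts);
    if (ts == NULL) abort();
    ts->gilstate_counter = 1;
    toy_live_states++;
    if (toy_tss == NULL) toy_tss = ts;
    return ts;
}

/* Check-style query: is this thread's bound state the current owner? */
bool toy_gilstate_check(void) {
    return toy_gil_owner != NULL && toy_gil_owner == toy_tss;
}

ToyGILState toy_gilstate_ensure(void) {
    ToyThreadState *ts = toy_tss;
    bool has_gil;
    if (ts == NULL) {
        /* No state for this thread yet: create one that we own.
         * calloc leaves its counter at 0. */
        ts = calloc(1, sizeof *ts);
        if (ts == NULL) abort();
        toy_live_states++;
        toy_tss = ts;
        has_gil = false;
    } else {
        has_gil = (toy_gil_owner == ts);
    }
    if (!has_gil) {
        toy_restore_thread(ts);   /* also re-acquires after a save */
    }
    ts->gilstate_counter++;
    return has_gil ? TOY_GIL_LOCKED : TOY_GIL_UNLOCKED;
}

void toy_gilstate_release(ToyGILState old) {
    ToyThreadState *ts = toy_tss;
    assert(ts != NULL && toy_gil_owner == ts);
    ts->gilstate_counter--;
    assert(ts->gilstate_counter >= 0);
    if (ts->gilstate_counter == 0) {
        /* Outermost release of a state ensure() created: destroy it,
         * which also drops the toy GIL. */
        toy_gil_owner = NULL;
        toy_tss = NULL;
        free(ts);
        toy_live_states--;
    } else if (old == TOY_GIL_UNLOCKED) {
        toy_save_thread();
    }
}
```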

Fair warning: I have read the documentation for these functions. I have not actually tried to write code against them. I did read some of CPython's source code to spot-check a few points of uncertainty, and I'm fairly sure that most if not all of the above is basically correct (in particular: PyGILState_Ensure() does indeed call PyEval_RestoreThread() to acquire the GIL if necessary, so I'm reasonably convinced that it is legal to create a new thread state from scratch and "restore" it), but it is always possible that I have missed something.

Support for sub-interpreters

Posted Oct 5, 2024 11:26 UTC (Sat) by cgm (guest, #173850)

Wow! That's a pretty good argument for using Tcl!

Support for sub-interpreters

Posted Oct 6, 2024 8:28 UTC (Sun) by NYKevin (subscriber, #129325)

Yes, but then I have to learn a new language. I can't throw a stone without hitting three Python scripts written by random people, so I have to know Python.

Support for sub-interpreters

Posted Oct 8, 2024 14:43 UTC (Tue) by dskoll (subscriber, #1630)

Tcl is a very easy language to learn. Most experienced programmers could learn it in a day or two.

But yes, given the relative popularity of languages, knowing Tcl isn't much of an advantage wrt getting work or understanding other software.