
Progress in wrangling the Python C API

By Jake Edge
November 7, 2023

There has been a lot of action for the Python C API in the last month or so—much of it organizational in nature. As predicted in our late September article on using the "limited" C API in the standard library, the core developer sprint in October was the scene of some discussions about the API and the plans for it. Out of those discussions have come two PEPs, one of which describes the API, its purposes, strengths, and weaknesses, while the other would establish a C API working group to coordinate and oversee the development and maintenance of it.

Working group

In mid-October, Guido van Rossum announced PEP 731 ("C API Working Group Charter") as the first visible outcome of the meetings at the sprint. If approved by the steering council, it would establish a working group of the five PEP authors (Van Rossum, Petr Viktorin, Victor Stinner, Steve Dower, and Irit Katriel) to oversee the C API, and to steer it in ways that are analogous to what the council does for Python. There are multiple contentious issues surrounding the API, the PEP states, so there is a need for a dedicated group of core developers to work through them: "The general feeling is that there are too many stakeholders, proposals, requirements, constraints, and conventions, to make progress without having a small trusted group of deciders."

Some presentations on the C API at the 2023 Python language summit led to a site for gathering problems with the API. It has collected more than 60 different accounts of problems that people are experiencing using, maintaining, and extending the API. The PEP gives a rough summary of the kinds of problems that would be under the purview of the working group:

Despite many discussions and in-person meetings at core developer sprints and Language Summits, and a thorough inventory of the problems and stakeholders of the C API, no consensus has been reached about many contentious issues, including, but not limited to:
  • Conventions for designing new API functions;
  • How to deal with compatibility;
  • What's the best strategy for handling errors;
  • The future of the Stable ABI and the Limited API;
  • Whether to switch to a handle-based API convention (and how).

Beyond just gathering problems, though, the effort has expanded to gather potential solutions in two other repositories: API evolution for relatively non-controversial, common-sense changes, and API revolution "for radical or controversial API changes" that will definitely need to go through the PEP process.

The reaction to the announcement was generally positive, though there were suggestions that the working group should include some other stakeholders, such as developers of C extensions. The PEP states that the working group will be made up of at least three core developers, but that "members should consider the needs of the various stakeholders carefully". The PEP notes that the group serves at the pleasure of the steering council; thus the council provides a check on the actions of the group. As Dower noted, the stakeholders are not being left out:

Since the WG would be proposing changes that are only directly binding on the core development team, I'm okay with the core developer requirement.

If the WG doesn't appear to be soliciting contributions from prominent users of the API and factoring them in, that's a reason to go to the steering council with a complaint.

Being directly on the WG isn't a prerequisite to contribute. It's just a burden of having to take responsibility for the decisions and how they impact other competing interests.

Analysis

On November 1, Katriel announced an outcome from this year's language summit: PEP 733 ("An Evaluation of Python's Public C API"). Its list of nearly 30 authors reads like a "who's who" of Python core developers—perhaps reflecting the attendees of the summit—but the document was coordinated by Katriel. Effectively, it is a summary of the problems that were collected, categorized into nine separate problem areas, along with a look at the various stakeholders and their requirements. Beyond that, it also describes some of the history of the API, its purposes, and its strengths ("to make sure that they are preserved"):

As mentioned in the introduction, the C API enabled the development and growth of the Python ecosystem over the last three decades, while evolving to support use cases that it was not originally designed for. This track record in itself is indication of how effective and valuable it has been.

For one thing, the stakeholders have diversified over the years. The C API first came about "as the internal interface between CPython's interpreter and the Python layer", but it was later exposed for third-party developers to use for extending CPython and to embed the interpreter in their own applications. Since then, new Python implementations have arisen that also need to use the C API; in addition, multiple projects seek to provide bindings or a better API for Python extensions in C (e.g. Cython), Rust (e.g. PyO3), and other languages and frameworks. Those projects use the C API in various ways, as well.
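For readers who have never written one, a minimal extension module using the traditional C API looks roughly like the following sketch; the module and function names ("example", add_one()) are invented for the illustration:

    /* A minimal CPython extension module using the traditional C API;
       the names used here are invented for illustration. */
    #define PY_SSIZE_T_CLEAN
    #include <Python.h>

    /* A trivial function exposed to Python: return its argument plus one. */
    static PyObject *
    example_add_one(PyObject *self, PyObject *arg)
    {
        long value = PyLong_AsLong(arg);
        if (value == -1 && PyErr_Occurred()) {
            return NULL;                /* propagate the conversion error */
        }
        return PyLong_FromLong(value + 1);
    }

    static PyMethodDef example_methods[] = {
        {"add_one", example_add_one, METH_O, "Return the argument plus one."},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef example_module = {
        PyModuleDef_HEAD_INIT, "example", NULL, -1, example_methods
    };

    PyMODINIT_FUNC
    PyInit_example(void)
    {
        return PyModule_Create(&example_module);
    }

Once built, such a module can be imported from Python like any pure-Python module.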

The overarching problem that has been identified with the C API is the difficulty of maintaining and evolving it, in part due to all of the different stakeholders and their competing needs. A process for incremental evolution, with deprecations and eventual removals of some parts of the API, could be a possible way forward; another option is periodic upheaval of the API via redesigns "each of which learns from the mistakes of the past and is not shackled by backwards compatibility requirements (in the meantime, new API elements may be added, but nothing can ever be removed)". Between those two extremes is a compromise approach that fixes "issues which are easy or important enough to tackle incrementally, and leaving others alone".

But the CPython core developers have different opinions on how to change the API, which is "an ongoing source of disagreements". So a fundamental framework for changes needs to come about:

Any new C API needs to come with a clear decision about the model that its maintenance will follow, as well as the technical and organizational processes by which this will work.

If the model does include provisions for incremental evolution of the API, it will include processes for managing the impact of the change on users [Issue 60], perhaps through introducing an external backwards compatibility module [Issue 62], or a new API tier of "blessed" functions [Issue 55].

Another problem area is the specification of the API, or, in truth, the lack thereof. Currently it is defined as "whatever CPython does" in a particular version; the documentation provides some amount of specification, but it is insufficient to verify any of the different API levels. That leads to unexpected changes to the API between versions, for one thing. The API also exposes more of the internals of CPython than is intended—or desired—and is C-specific, so other languages need to parse and handle C language constructs of various sorts.
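As a small illustration of that C-specificity, even basic object inspection goes through C macros (more recently, static inline functions) that historically expanded to direct reads of fields inside CPython's object structs—exactly the kind of construct that a non-C binding has to re-create:

    /* Illustration only: Py_TYPE() traditionally expanded to a direct read of
       the ob_type field inside the PyObject struct, tying callers to
       CPython's in-memory object layout. */
    #include <Python.h>

    static int
    is_exact_list(PyObject *obj)
    {
        /* Equivalent to "type(obj) is list" in Python. */
        return Py_TYPE(obj) == &PyList_Type;
    }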

The different API levels (or tiers), "which provide different tradeoffs of stability vs API evolution, and sometimes performance", are also a source of problems. The stable ABI, which is used by binary extensions that are built using the limited version of the C API, is "incomplete and not widely adopted"; there are differing opinions on whether it is worth keeping at all, but, if it is kept, it needs to support multiple ABI versions in a single binary. The limited API needs some changes as well. Meanwhile, there are inconsistencies in the way that CPython private functions are named and there may be a need to add an "unsafe" tier that provides functions that remove error checking for performance purposes.
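To make the tiers concrete: an extension opts into the limited API (and thus the stable ABI) at build time by defining Py_LIMITED_API before including Python.h; the version value below is only an example of the mechanism:

    /* Request the limited API as of CPython 3.10; the resulting extension
       uses only stable-ABI symbols and can be loaded, unmodified, by later
       CPython releases.  The version value here is just an example. */
    #define Py_LIMITED_API 0x030A0000
    #define PY_SSIZE_T_CLEAN
    #include <Python.h>

    /* From here on, declarations outside the limited API are omitted from
       the headers, so accidental use typically fails at compile time. */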

Most of the other categories listed in the PEP cover design choices in parts of the API, in particular object reference management, object creation, type definition, and error handling, that are now seen as sub-optimal. Evolving (or replacing) those is desired, but that will have to be worked out once the overall maintenance scheme is determined. There are a handful of implementation flaws and some missing features that need attention as well.
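To give a flavor of the reference-management and error-handling choices at issue, here is a hedged sketch of what extension code does today; build_pair() is an invented helper, not part of CPython:

    #include <Python.h>

    /* Build a 2-tuple from two borrowed references.  Every call can fail and
       return NULL, and the caller must track reference ownership by hand:
       PyTuple_New() returns a new reference, while PyTuple_SET_ITEM() steals
       one, hence the explicit Py_INCREF() calls. */
    static PyObject *
    build_pair(PyObject *first, PyObject *second)
    {
        PyObject *tuple = PyTuple_New(2);
        if (tuple == NULL) {
            return NULL;            /* exception already set */
        }
        Py_INCREF(first);
        PyTuple_SET_ITEM(tuple, 0, first);
        Py_INCREF(second);
        PyTuple_SET_ITEM(tuple, 1, second);
        return tuple;
    }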

As can be seen, there is a lot for the working group (and steering council) to address with regard to the C API. First up is whether the council is ready to approve the working group charter; given that the council has already indicated that it is in favor of the idea, it seems likely that approval will come fairly quickly.

Current C API changes

Meanwhile, though, some work has been going on for the Python 3.13 release that is due next year. In fact, at the end of October, Stefan Behnel raised some complaints about the large amount of changes that appeared in the first 3.13 alpha release:

Hundreds of functions were removed, renamed, replaced with different functions. Header files were removed. Names were removed from header files. Macros were changed, even a few publicly documented ones. And the default answer to "how do I replace this usage" seems to be "we'll try to design a blessed replacement in time".

The changes were extensive and disruptive enough that he (provocatively) wondered if the release should be called "Python 4". But, as Jean Abou Samra pointed out, this was all part of Stinner's longstanding plan to clarify the public versus private C API. It is, however, clearly disruptive, so others joined Behnel in thinking that things were moving too quickly.

Stinner maintains that the situation is manageable and that he is planning to devote much of his time over the next few months toward getting extensions working. He has marked many functions as private, which caused the breakage that Behnel and others encountered, but plans to "public-ize" various API elements as needed to support the existing C API users so that all of them are working by the time the first 3.13 beta is released in May.
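Extension authors coping with such removals typically bridge the gap with version guards until a public replacement exists; the pattern looks roughly like the sketch below, where _PyExample_Helper() is a hypothetical private function invented for the illustration, not a real CPython name:

    #include <Python.h>

    /* Hypothetical compatibility shim: on 3.13 and later, where the private
       helper is assumed to be gone, fall back to public API; older versions
       keep using it.  _PyExample_Helper() is an invented name. */
    #if PY_VERSION_HEX >= 0x030D0000
    static inline int
    compat_is_true(PyObject *obj)
    {
        return PyObject_IsTrue(obj);    /* public replacement */
    }
    #else
    #  define compat_is_true(obj) _PyExample_Helper(obj)
    #endif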

Stinner opened a GitHub issue to track the problems. As can be seen there, opinions differ on how to address the problems that arise; Stinner would like to fix them one-by-one, but others think reverting the changes makes more sense. One complicating factor, as steering-council member Gregory P. Smith noted, is that there is another big development going on in the 3.13 development tree: removing the global interpreter lock (GIL) from the interpreter. Those changes are described in PEP 703 ("Making the Global Interpreter Lock Optional in CPython") and it is important not to impede progress in that area:

We need to treat 3.13 as a more special than usual release and aim to minimize compatibility headaches for existing project code. That way more things that build and run on 3.12 build can run on 3.13 as is or with minimal work.

This will enable ecosystem code owners to focus on the bigger picture task of enabling existing code to be built and tested on an experimental pep703 free-threading build rather than having a pile of unrelated cleanup trivia blocking that.

One senses that a directional shift in Stinner's current C API work may be in the offing; he did point out that the week-old Cython 3.0.5 release has preliminary support for 3.13-alpha1, which is an indication that his plan is generally working. One thing that will not be shifting, however, is the version to Python 4—ever, according to Smith. Should the major version of Python need to change at some point, it seems that four "shalt thou not count", as with The Holy Hand Grenade of Antioch. Five, of course, should be "right out", at least according to Monty Python—for Python, the language, though, we will just have to wait and see.



Progress in wrangling the Python C API

Posted Nov 8, 2023 4:09 UTC (Wed) by 5fdb1f (guest, #156654) (6 responses)

Could the C API be "revolution"-ized enough to make the Rust bindings more ergonomic (see: https://alexgaynor.net/2022/oct/23/buffers-on-the-edge/) or would that break too much on the C side?

Progress in wrangling the Python C API

Posted Nov 8, 2023 13:49 UTC (Wed) by encukou (guest, #83287)

That particular issue is looking for a volunteer comfortable with C. My offer for mentorship still stands: https://discuss.python.org/t/20314

Progress in wrangling the Python C API

Posted Nov 9, 2023 2:10 UTC (Thu) by NYKevin (subscriber, #129325) (4 responses)

This is an unfortunate situation where Python is offering semantics that (safe) Rust is unable to model, mostly because Python's semantics are little more than "here's a pointer, here's a flag telling you whether or not you can write to it, also by the way I'm going to hand out the same pointer and flag to anyone else who asks."

(I mean, yes, it is a little more complicated than that. Python actually tells you all sorts of metadata about how the buffer is laid out in memory, whether it is multi-dimensional, etc., and can even hand out non-contiguous buffers. But in terms of what you are allowed to do with it, it's pretty much just "a pointer and a writable flag.")
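(For reference, a minimal sketch of what "a pointer and a writable flag" amounts to for a C consumer of the buffer protocol; fill_with_zeros() is just an illustrative name, not a CPython function.)

    #define PY_SSIZE_T_CLEAN
    #include <Python.h>
    #include <string.h>

    static int
    fill_with_zeros(PyObject *exporter)
    {
        Py_buffer view;
        /* Request a writable, C-contiguous buffer; the exporter raises
           BufferError if it cannot (or will not) provide one. */
        if (PyObject_GetBuffer(exporter, &view, PyBUF_CONTIG) < 0) {
            return -1;
        }
        memset(view.buf, 0, (size_t)view.len);
        PyBuffer_Release(&view);   /* must pair with every successful GetBuffer */
        return 0;
    }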

In general, Python's attitude is that, if you want to fiddle with the buffer protocol, you should be doing it as an implementation detail of some object that you entirely control, or else you should be using some kind of explicit locking which is not managed by the interpreter. But Python is also totally fine with a library writing "lol this is not thread-safe, don't even try" in their documentation and calling it a day. And frankly, that insouciance cannot be made compatible with "let's have a sound Rust wrapper type that can safely interact with arbitrary Python buffer protocol objects." Python does not provide the guarantees that Rust wants, nor does Python provide a standardized mechanism for Rust to ask for such guarantees.

What I think needs to happen is one (or more) of the following:

* Rust code may only use the buffer protocol to interact with objects that it owns or mutably borrows (in Rust terms) and that have a (Python) refcount of 1 (or which have not returned from their Python constructors yet). Additionally, the object must not be compatible with (Python) weak references (so that the refcount is correct). This combination of requirements is very limiting. I'm not sure how easy it is to prove all of these requirements are satisfied at compile time, but you could dynamically check them, so this could be used to build a safe primitive out of unsafe code.
* Python starts offering flags that indicate a buffer is const or exclusive. const is a stronger condition than read-only, because it means the object won't change of its own accord (or by some means other than the buffer protocol). But const can probably only be offered for objects that Python considers immutable, and exclusive raises questions of how the locking is supposed to work. What happens if a legacy client asks for a (writable) buffer when the new API considers the request unacceptable? Does it just get a BufferError? That probably breaks something, but there's no obvious alternative behavior.
* Python requires only one client to have mutable access to a given buffer at a given time. It is technically possible to do this without breaking ABI compatibility, because buffer accessors are already required to call PyBuffer_Release when they are done with the buffer (so you can keep track of who is reading or writing the buffer at any given time). However, this would be semantically incompatible with the existing behavior and would likely result in problems anyway. For example, there's currently nothing wrong with a client eagerly acquiring all of its buffers up-front, figuring out which ones alias, and then managing those aliases with a separate set of locks on the C side, without informing Python of the locks at all. If you added a lock to the buffer protocol, then that client would break because it would be unable to acquire the same buffer more than once. It is, frankly, unreasonable to break that client just to make things easier for Rust - they used the API as advertised, they're not doing anything particularly sketchy or unusual, and there's currently no way to inform Python of the locks even if you want to (nor does the current version of Python have the slightest idea what to do with that information).
* Python offers an advisory locking mechanism for the buffer protocol. This doesn't really solve anything because clients can ignore it, but maybe you deprecate non-locked access and try to force everybody to take locks in a future version. Single-threaded clients will probably be unimpressed by that.
* Rust treats Python buffers as equivalent to raw pointers, and stops trying to build safe wrappers for the protocol as a whole. Instead, build wrappers specific to each type you want to interact with. Unfortunately, that's probably not a whole lot easier than trying to wrap the whole protocol, but at least you can somewhat depend on the semantics of the specific C extension, which might be able to offer stronger guarantees than Python does.

Progress in wrangling the Python C API

Posted Nov 9, 2023 2:32 UTC (Thu) by DemiMarie (subscriber, #164188) (2 responses)

Another option is to tell Python programmers “don’t do that”. Python already allows accessing arbitrary memory via ctypes, so there is nothing new here.

Rust could also view this as an array of AtomicU8, which is safe if not very useful.

Progress in wrangling the Python C API

Posted Nov 9, 2023 7:38 UTC (Thu) by NYKevin (subscriber, #129325) (1 response)

> Another option is to tell Python programmers “don’t do that”. Python already allows accessing arbitrary memory via ctypes, so there is nothing new here.

That's... completely irrelevant to everything.

ctypes enables Python code to access memory owned by a C object (or, in principle, any sort of data structure that can be described in C-like terms), as well as to call C functions (and functions with C bindings). The buffer protocol enables C extensions to access memory owned by another C extension (or a builtin CPython type), and it also incidentally has a very small Python API[1] that enables Python to act as glue code for this process.[2] Neither is a substitute for the other, and telling Python programmers not to write glue code will not change much of anything (they will ignore you because "writing glue code" is Python's main use case).

[1]: https://docs.python.org/3/library/stdtypes.html#memoryview
[2]: Python can read from these buffers, but the whole point of the buffer protocol is to enable direct C-to-C memory access (i.e. without holding the GIL or executing Python bytecode). The object that gave you the buffer probably already exposes __getitem__ etc. to Python anyway.

Progress in wrangling the Python C API

Posted Nov 11, 2023 9:38 UTC (Sat) by DemiMarie (subscriber, #164188)

The reason I mentioned ctypes is to show that one can already corrupt memory from within Python. “Don’t do that” = “if you change memory that you should not change and something breaks, you get to keep both pieces.”

Progress in wrangling the Python C API

Posted Nov 10, 2023 7:17 UTC (Fri) by himi (subscriber, #340)

It seems to me that the Python C API would probably benefit from formalising the ownership and mutability rules to match Rust's - not just because it makes interfacing Python with Rust easier, but because it would improve the lot of /everything/ interfacing with Python. Rust's rules are simple, fairly easy to reason about, and powerful, even if you don't have the whole language enforcing them - it seems like it'd be a good model for /anything/ that wants to manage access to shared memory, particularly across a natural boundary like an extension API.

If I've understood things from previous discussions about this properly (and option 3 in your comment supports this) the API already kind of supports some of this, though not really very well - from memory it might be doable with some new flags that plugins supporting the new model would set and check, while old plugins would ignore/not set and get the same behaviour they do now. Then it's just a matter of making sure the Python side maintained its end of the new model, and off you go. Of course, mixing new and old would probably Not Work(tm) in interesting ways, but mixing and matching ABIs is rarely a good idea . . .

You probably wouldn't want to try and enforce the rules with locking and so forth (though in the no-gil world maybe it'd be necessary?) - trusting that the plugin didn't ignore the flags that were set, or lie about the flags /it/ set, would seem like the best approach. After all, it's at least making explicit promises; as it stands there are no promises at all, let alone actual guard rails anywhere.

It may also be the case that changes on the Python side to support the new model would break some users - but isn't that kind of the situation anyone using the current C API has to deal with across major version changes? And if the whole C API is being reworked (hopefully also properly formalised, stabilised and documented) then isn't that a great time to implement this kind of breaking change?


Copyright © 2023, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds