
Using the limited C API for the Python stdlib?

By Jake Edge
September 20, 2023

The "limited" C API for CPython extensions has been around for well over a decade at this point, but it has not seen much uptake. It is meant to give extensions an API that will allow binaries built with it to be used for multiple versions of CPython, because those binaries will only access the stable ABI that will not change when CPython does. Victor Stinner has been working on a better definition of the API; as part of that work, he suggested that some of the C extensions in the standard library start using it in an effort for CPython to "eat its own dog food". The resulting discussion showed that there is still a fair amount of confusion about this API, and about the thrust of Stinner's overall plan.

The limited API comes from PEP 384 ("Defining a Stable ABI"), but that is largely a historical document at this point. The C API Stability document and developers guide both have more up-to-date information. There are several APIs available that extensions can use, but only the limited API provides ABI stability between major releases of CPython (e.g. from 3.11 to 3.12); packages using the other APIs will need to be rebuilt in order to ensure that they work with a new major (or even minor, in the case of the unstable API) release.

At the end of August, Stinner wondered about switching some of the C-based extensions in the standard library to use the limited API. The goal is to more extensively test the API and to promote it by example: "Using private C API functions and the internal C API should be the exception, not the default in Python stdlib". While the standard library itself is rebuilt and packaged with a new CPython release, other extensions will benefit from moving to the stable ABI (also known as "abi3"), which comes from using the limited API:

The stable ABI makes the distribution of package binaries easier. For example, binaries are already available before the new Python is being released! It makes newer Python usable since the first day of its release, because it's simply the same binary for all Python versions. (One binary per platform+architecture is still needed.)

It turns out that at least a few standard library modules are already using the limited API, so all that is needed in order to "convert" them is a line that declares that the module uses the limited API:

    #define Py_LIMITED_API 0x030d0000  /* value is version 3.13 */

Defining Py_LIMITED_API hides symbols (functions and other interface elements) that are not part of the limited API, so that they cannot be used in the extension code. Other modules can be converted with only minor changes to them, Stinner said. There are some standard library modules that will not be changed, yet, because their performance suffers from being unable to use the internal API. He reported a performance regression in the C extension for the statistics module as part of closing his pull request to switch it to the limited API; in the end, he decided that doing so made little sense. But, for many other modules, "there is no significant impact on performance".
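The value given to Py_LIMITED_API uses the same packed layout as CPython's PY_VERSION_HEX, with the major version in the top byte and the minor version in the next one; that is how 0x030d0000 comes to mean 3.13. The sketch below shows that arithmetic in plain C. The function names are made up for illustration and are not part of CPython, and the api_visible() helper is only an assumption about the general shape of a version comparison, not a copy of anything in the CPython headers.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a major.minor version the way PY_VERSION_HEX does:
 * major in bits 24-31, minor in bits 16-23. The low 16 bits
 * (micro, release level, serial) are left at zero, matching
 * the 0x030d0000 value used for Py_LIMITED_API. */
uint32_t limited_api_hex(unsigned major, unsigned minor)
{
    assert(major <= 0xff && minor <= 0xff);
    return ((uint32_t)major << 24) | ((uint32_t)minor << 16);
}

/* Unpack the two fields again. */
unsigned hex_major(uint32_t hex) { return (hex >> 24) & 0xff; }
unsigned hex_minor(uint32_t hex) { return (hex >> 16) & 0xff; }

/* Hypothetical helper (not a CPython function): would an API that was
 * added to the limited API in version `added_in` be visible to an
 * extension that declares `target` as its Py_LIMITED_API value? */
int api_visible(uint32_t target, uint32_t added_in)
{
    return target >= added_in;
}
```

For instance, limited_api_hex(3, 13) produces 0x030d0000, the value in the line above. Because the fields are packed from most significant to least significant, an ordinary integer comparison is enough to order two versions.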

Stinner tried to start converting standard library modules back in 2020, but ran into a few different problems, which have now largely been resolved. Beyond performance degradation, he also has encountered API calls that are not part of the limited API, but perhaps should be considered for inclusion. He wondered what other core developers thought about converting some standard library modules.

Barry Warsaw liked the idea as a way for the project to test out the limited API itself. He thought that doing so might also help if it was decided to move some modules out of the standard library, since a Python Package Index (PyPI) replacement could then have a binary wheel ready and waiting for new CPython releases. Alex Gaynor was also in favor:

I like the idea of eating our own dog food. I'm also the author of several of those packages that use abi3 wheels, so I have a strong [interest] in the limited API becoming better :)

I'm also sympathetic to the people who will say, "eating our own dog food isn't a good enough reason to lose performance", so I think it would be a very good outcome of this process if, wherever we identify areas for improvement by eating our own dog food, we make the dog food taste better.

But Guido van Rossum was less enthused with the idea; he thought it would lead to a lot of churn and a bunch of pull requests (PRs) "that few people care to review, and that will increase everybody's frustration (not just yours) with how hard it is to get people to review PRs". The standard library modules are not broken, so he wondered why they were being "fixed":

"Eat your own dogfood" is a fine idea, and I think it's great to apply it to new modules. Just like we sometimes add [type] annotations to new code, despite our general reluctance to add annotations to existing code (especially stdlib code). I feel the same ought to apply here: let's not try to "fix" existing modules, because they aren't broken, and ultimately there is no reason for the stdlib to use the limited API.

Stinner replied with a list of links to commits and issue discussions as background about the effort, going back to 2018, but Van Rossum was concerned that the underlying motivation was somewhat suspect:

The argument seems to be "dogfooding is good" and possibly that stdlib modules are used widely as "example code" so best practices should be followed? Those aren't technical reasons though – IMO this smells like technical solutions for social problems.

But Stinner said that there is an underlying technical reason as well: "the limited C API is badly tested by Python itself". That has led to finding bugs after a release had already been made; if some parts of the standard library were built and tested with the limited API, those bugs could be found and fixed well before a release is made. In addition, converting real extensions will help show any gaps in the API functions available in the limited API.

Van Rossum suggested moving slowly with any changes to the standard library and wanted to discuss the C API at the upcoming core developer sprint in October. He also outlined his understanding of the different APIs that are available for CPython, along with what the guarantees are for extensions that use them. It is a somewhat complicated picture that Stinner is trying to clarify as part of his work.

But the stable ABI has been around for quite some time at this point, Marc-André Lemburg said, and has seen limited adoption, so perhaps the effort should be redirected into helping extensions remain compatible with a range of Python versions. Those extensions would need to be recompiled for new CPython major versions, "but that's easily done using cibuildwheel". That tool can build and test binary wheels for multiple operating systems and Python versions as part of a project's continuous-integration (CI) process.

The tooling for building extensions has not helped with adoption of the limited API, though, Paul Moore said. Currently, the tools default to using the full C API for extensions, so that is generally what extension authors do; if that changed, adoption might grow substantially. Beyond that, Petr Viktorin pointed out that cibuildwheel only helps with extensions that are on PyPI; applications that use CPython as a way to create their own plugins and extensions want to use the stable ABI so they can work with multiple CPython versions. A Vim commit outlined the situation well, he said.

In a lengthy message, Stinner described the overall problem he is trying to help solve: having extensions be more quickly available at the time a new CPython is released. He works on Fedora, which will be shipping the newly minted CPython 3.12 (due in early October) in Fedora 39, which is slated for mid-October; the hope is to have up-to-date versions of most of the popular extensions available by that time. He sees the limited API (thus stable ABI) as being a key facilitator of that for future CPython releases. "If we can help maintainers to move towards the limited C API, you can expect having more C extensions to be usable at day 1 of Python 3.13 release." That will also help the maintainers of the extensions, since users will not be clamoring for them to update their extension as soon as a new Python is available.

In Van Rossum's mind, it is "the requirement that once 3.x.0 is released all 3rd party packages should be instantly available" that is the root cause of the problem; he suggested resetting user expectations since that is never going to be achievable. Viktorin wondered what kind of time frame would be reasonable to expect for most third-party extensions to become available. Van Rossum replied that it has generally taken a few months after the release to get to that point, but thought that package maintainers should be encouraged to start putting together wheels for their modules once the first release candidate of a new CPython is released. He is also "unhappy about the pressure I am currently feeling to make it our fault if not every 3rd party package works on day one".

As might be guessed, Stinner disagreed with much of that. He has no silver bullet, but getting more packages to use the limited API will lead to more of them being available on day one. Meanwhile, maintainers should not be subjected to additional pressure to update their builds; they "prefer to work on new features, or fix their own bugs, rather than following Python C API changes". Van Rossum tired of the discussion, however, and wanted to wait until they could talk about the issue face-to-face in October.

There were other sub-topics in the thread, of course, but the question of what to do for the standard library, if anything, will presumably be the subject of a lively discussion at the sprint. Van Rossum seems unconvinced that the stable ABI has much to offer ("I still feel that the Stable ABI is a solution largely in search of a problem"), but other core developers (and extension authors) disagree. In the end, it seems unlikely that there will be any movement away from supporting the limited API, though the effort to broaden its reach—in CPython itself at least—is still up in the air.




Posted Sep 21, 2023 8:19 UTC (Thu) by intgr (subscriber, #39733)

I'm wondering, is this "limited API" or ABI going to remain compatible after moving to a GIL-less Python? (https://lwn.net/Articles/939981/)


Posted Sep 22, 2023 2:45 UTC (Fri) by vstinner (subscriber, #42675)

I suggest reading this discussion, which is about PEP 703 ("no GIL") and the stable ABI: https://discuss.python.org/t/python-abis-and-pep-703/34018 In short, Sam Gross sees the stable ABI as a way to ship a single binary that works both on Python with a GIL and on Python with no GIL.


Posted Sep 21, 2023 15:52 UTC (Thu) by mathstuf (subscriber, #69389)

> He is also "unhappy about the pressure I am currently feeling to make it our fault if not every 3rd party package works on day one".

But I thought we were learning lessons from the Python 2 -> 3 migration failure? If we're still getting paper cuts every minor release…

Alas, if we're waiting here, I can't test my project until all of its runtime deps are also available, so the long-tail of "ready for 3.12" can be *months* before "oh, I found a bug" can even reliably happen.

> Van Rossum seems unconvinced that the stable ABI has much to offer ("I still feel that the Stable ABI is a solution largely in search of a problem")

I maintain a project that has 1+GB per release to cover the platforms, architectures, and Python versions supported. If I could instead build just one wheel per platform, I'm sure PyPI maintainers would appreciate the reduction in required storage (though I do end up deleting past releases to make room, something archivists probably cringe at). I know my CI machines would appreciate the reduced highly-duplicate work (`sccache` can only do so much).

I've started looking at using the stable API, but I've always found "one more thing" that is missing and run out of time to continue investigations.


Posted Sep 22, 2023 9:56 UTC (Fri) by cyperpunks (subscriber, #39406)

>> He is also "unhappy about the pressure I am currently feeling to make it our fault if not every 3rd party package works on day one".

>But I thought we were learning lessons from the Python 2 -> 3 migration failure? If we're still getting paper cuts every minor release…

Python treats all releases the same way; there are no LTS releases with fewer changes and more stability. Only the patch number changes, and in the view of the users that means no incompatible changes.

Why is this so hard for Python maintainers to understand?


Posted Sep 26, 2023 21:15 UTC (Tue) by salimma (subscriber, #34460)

I wonder how this relates to the HPy project - which aims to provide a better C API for Python.

https://hpyproject.org/

Both that and abi3 seem better than the full C API for extensions that can live with the API provided.


Copyright © 2023, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds