Using the limited C API for the Python stdlib?
The "limited" C API for CPython extensions has been around for well over a decade at this point, but it has not seen much uptake. It is meant to give extensions an API that allows binaries built with it to be used with multiple versions of CPython, because those binaries only access the stable ABI, which does not change when CPython does. Victor Stinner has been working on a better definition for the API; as part of that work, he suggested that some of the C extensions in the standard library start using it in an effort for CPython to "eat its own dog food". The resulting discussion showed that there is still a fair amount of confusion about this API, and about the thrust of Stinner's overall plan.
The limited API comes from PEP 384 ("Defining a Stable ABI"), but that is largely a historical document at this point. The C API Stability document and the developer's guide both have more up-to-date information. There are several APIs available that extensions can use, but only the limited API provides ABI stability between major releases of CPython (e.g. from 3.11 to 3.12); packages using the other APIs will need to be rebuilt in order to ensure that they work with a new major (or even minor, in the case of the unstable API) release.
At the end of August, Stinner wondered about switching some of the C-based extensions in the standard library to use the limited API. The goal is to more extensively test the API and to promote it by example: "Using private C API functions and the internal C API should be the exception, not the default in Python stdlib". While the standard library itself is rebuilt and packaged with a new CPython release, other extensions will benefit from moving to the stable ABI (also known as "abi3"), which comes from using the limited API:
The stable ABI makes the distribution of package binaries easier. For example, binaries are already available before the new Python is being released! It makes newer Python usable since the first day of its release, because it's simply the same binary for all Python versions. (One binary per platform+architecture is still needed.)
It turns out that at least a few standard library modules already restrict themselves to the limited API, so all that is needed in order to "convert" them is a line that declares that the module uses the limited API:
#define Py_LIMITED_API 0x030d0000 /* value is version 3.13 */
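The hexadecimal value in that line follows the same layout as CPython's PY_VERSION_HEX: the major version sits in the top byte and the minor version in the next one. As a rough sketch of that encoding (the helper names here are hypothetical and are not part of CPython), the value can be packed and unpacked like this:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; only the bit layout
 * (major in bits 31-24, minor in bits 23-16) comes from CPython. */

/* Pack major.minor into the 0xMMmm0000 form used for Py_LIMITED_API. */
uint32_t limited_api_hex(unsigned major, unsigned minor)
{
    assert(major <= 0xff && minor <= 0xff);  /* each field is one byte */
    return ((uint32_t)major << 24) | ((uint32_t)minor << 16);
}

/* Extract the major version from a packed value. */
unsigned hex_major(uint32_t hex) { return (hex >> 24) & 0xffu; }

/* Extract the minor version from a packed value. */
unsigned hex_minor(uint32_t hex) { return (hex >> 16) & 0xffu; }
```

With these helpers, limited_api_hex(3, 13) produces 0x030d0000, which matches the value in the line above; an extension that wanted to stay compatible with 3.12 would presumably use 0x030c0000 instead.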
Defining Py_LIMITED_API hides symbols (functions and other
interface elements)
that are not part of the limited API, so that they cannot be used in the
extension code.
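The hiding is done with ordinary preprocessor guards in the headers. The following is a simplified, self-contained sketch of that pattern and not an excerpt from CPython: every DEMO_ and demo_ name is a hypothetical stand-in, and the "header" section is inlined so that the example compiles without Python installed.

```c
#include <assert.h>

/* Stand-in for Py_LIMITED_API; here the extension targets 3.12. */
#define DEMO_LIMITED_API 0x030c0000

/* ---- stand-in for a header shipped with the interpreter ---- */
int demo_stable(void);              /* limited API: always declared */

#ifndef DEMO_LIMITED_API
int demo_internal(void);            /* full API only */
#define DEMO_HAS_INTERNAL 1
#else
#define DEMO_HAS_INTERNAL 0         /* declaration was skipped */
#endif

/* Added to the limited API in "3.13": visible to full-API builds and
 * to limited-API builds that target 3.13 or later. */
#if !defined(DEMO_LIMITED_API) || DEMO_LIMITED_API >= 0x030d0000
int demo_added_in_3_13(void);
#define DEMO_HAS_3_13 1
#else
#define DEMO_HAS_3_13 0
#endif
/* ---- end of stand-in header ---- */

/* Stand-in for the interpreter's implementation of the stable call. */
int demo_stable(void) { return 42; }

/* Extension code can only call what the header left visible. */
int demo_extension_entry(void)
{
    int value = demo_stable();
    assert(value == 42);
    /* A call to demo_internal() here has no declaration in scope,
     * so a modern compiler rejects or warns about it. */
    return value;
}

/* Number of the three demo functions that remain declared. */
int demo_visible_count(void)
{
    return 1 + DEMO_HAS_INTERNAL + DEMO_HAS_3_13;
}
```

Because the macro is set to a 3.12 value, both the internal function and the one added in "3.13" are left undeclared; raising the value to 0x030d0000 would expose the latter while still hiding the former.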
Other modules can be converted with only minor changes to them, Stinner said. There are some standard library modules that will not be changed yet, because their performance suffers from being unable to use the internal API. He reported a performance regression in the C extension for the statistics module as part of closing his pull request to switch it to the limited API; in the end, he decided that doing so made little sense. But, for many other modules, "there is no significant impact on performance".
Stinner tried to start converting standard library modules back in 2020, but ran into a few different problems, which have now largely been resolved. Beyond performance degradation, he has also encountered API calls that are not part of the limited API, but perhaps should be considered for inclusion. He wondered what other core developers thought about converting some standard library modules.
Barry Warsaw liked the idea as a way for the project to test out the limited API itself. He thought that doing so might also help if it was decided to move some modules out of the standard library, since a Python Package Index (PyPI) replacement could then have a binary wheel ready and waiting for new CPython releases. Alex Gaynor was also in favor:
I like the idea of eating our own dog food. I'm also the author of several of those packages that use abi3 wheels, so I have a strong [interest] in the limited API becoming better :) I'm also sympathetic to the people who will say, "eating our own dog food isn't a good enough reason to lose performance", so I think it would be a very good outcome of this process if, wherever we identify areas for improvement by eating our own dog food, we make the dog food taste better.
But Guido van Rossum was less enthused with the idea; he thought it would lead to a lot of churn and a bunch of pull requests (PRs) "that few people care to review, and that will increase everybody's frustration (not just yours) with how hard it is to get people to review PRs". The standard library modules are not broken, so he wondered why they were being "fixed":
"Eat your own dogfood" is a fine idea, and I think it's great to apply it to new modules. Just like we sometimes add [type] annotations to new code, despite our general reluctance to add annotations to existing code (especially stdlib code). I feel the same ought to apply here: let's not try to "fix" existing modules, because they aren't broken, and ultimately there is no reason for the stdlib to use the limited API.
Stinner replied with a list of links to commits and issue discussions as background about the effort, going back to 2018, but Van Rossum was concerned that the underlying motivation was somewhat suspect:
The argument seems to be "dogfooding is good" and possibly that stdlib modules are used widely as "example code" so best practices should be followed? Those aren't technical reasons though – IMO this smells like technical solutions for social problems.
But Stinner said that there is an underlying technical reason as well: "the limited C API is badly tested by Python itself". That has led to finding bugs after a release had already been made; if some parts of the standard library were built and tested with the limited API, those bugs could be found and fixed well before a release is made. In addition, converting real extensions will help show any gaps in the API functions available in the limited API.
Van Rossum suggested moving slowly with any changes to the standard library and wanted to discuss the C API at the upcoming core developer sprint in October. He also outlined his understanding of the different APIs that are available for CPython, along with what the guarantees are for extensions that use them. It is a somewhat complicated picture that Stinner is trying to clarify as part of his work.
But the stable ABI has been around for quite some time at this point, Marc-André Lemburg said, and has seen limited adoption, so perhaps the effort should be redirected into helping extensions remain compatible with a range of Python versions. Those extensions would need to be recompiled for new CPython major versions, "but that's easily done using cibuildwheel". That tool can build and test binary wheels for multiple operating systems and Python versions as part of a project's continuous-integration (CI) process.
The tooling for building extensions has not helped with adoption of the limited API, though, Paul Moore said. Currently, the tools default to using the full C API for extensions, so that is generally what extension authors do; if that changed, adoption might grow substantially. Beyond that, Petr Viktorin pointed out that cibuildwheel only helps with extensions that are on PyPI; applications that use CPython as a way to create their own plugins and extensions want to use the stable ABI so they can work with multiple CPython versions. A Vim commit outlined the situation well, he said.
In a lengthy message, Stinner described the overall problem he is trying to help solve: having extensions be more quickly available at the time a new CPython is released. He works on Fedora, which will be shipping the newly minted CPython 3.12 (due in early October) in Fedora 39, which is slated for mid-October; the hope is to have up-to-date versions of most of the popular extensions available by that time. He sees the limited API (thus stable ABI) as being a key facilitator of that for future CPython releases. "If we can help maintainers to move towards the limited C API, you can expect having more C extensions to be usable at day 1 of Python 3.13 release." That will also help the maintainers of the extensions, since users will not be clamoring for them to update their extension as soon as a new Python is available.
In Van Rossum's mind, it is "the requirement that once 3.x.0 is released all 3rd party packages should be instantly available" that is the root cause of the problem; he suggested resetting user expectations since that is never going to be achievable. Viktorin wondered what kind of time frame would be reasonable to expect for most third-party extensions to become available. Van Rossum replied that it has generally taken a few months after the release to get to that point, but thought that package maintainers should be encouraged to start putting together wheels for their modules once the first release candidate of a new CPython is released. He is also "unhappy about the pressure I am currently feeling to make it our fault if not every 3rd party package works on day one".
As might be guessed, Stinner disagreed with much of that. He has no silver bullet, but getting more packages to use the limited API will lead to more of them being available on day one. Meanwhile, maintainers should not be subjected to additional pressure to update their builds; they "prefer to work on new features, or fix their own bugs, rather than following Python C API changes". Van Rossum tired of the discussion, however, and wanted to wait until they could talk about the issue face-to-face in October.
There were other sub-topics in the thread, of course, but the question of what to do for the standard library, if anything, will presumably be the subject of a lively discussion at the sprint. Van Rossum seems unconvinced that the stable ABI has much to offer ("I still feel that the Stable ABI is a solution largely in search of a problem"), but other core developers (and extension authors) disagree. In the end, it seems unlikely that there will be any movement away from supporting the limited API, though the effort to broaden its reach (in CPython itself, at least) is still up in the air.
