Leading items
Welcome to the LWN.net Weekly Edition for November 9, 2023
This edition contains the following feature content:
- Reducing patch postings to linux-kernel: the famously high volume of the linux-kernel mailing list makes life harder for both developers and email providers. There is currently a proposal under discussion that would redirect the posting of many patches in an attempt to mitigate this problem.
- Guest-first memory for KVM: a form of address-space isolation meant to protect guests from a compromised host.
- The first half of the 6.7 merge window: what the first 9,800 commits brought for 6.7.
- The BPF-programmable network device: a new virtual network interface that provides significant performance improvements for some workloads.
- Progress in wrangling the Python C API: the C-language API to the Python interpreter has a number of problems; developers are slowly working out a way to address some of them.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Reducing patch postings to linux-kernel
The linux-kernel mailing list famously gets an enormous amount of email on a daily basis; the volume is so high that various email providers try to rate-limit it, which can lead to huge backlogs on the sending side and, of course, delayed mail. Part of the reason there is so much traffic is that nearly every patch gets copied to the mailing list, even when it may be unnecessary to do so. A proposed change would start shunting some of that patch email aside and, as might be guessed, has both supporters and detractors, but the discussion does highlight some of the different ways the mailing list is used by kernel developers.
In a post that was not sent to linux-kernel, but to the Kernel Summit discussion list (ksummit-discuss) and the kernel.org users mailing list, Konstantin Ryabitsev described the problems he sees with the current status of the list. It has nearly 3000 subscribers and is copied on nearly every patch because of a wildcard entry at the end of the MAINTAINERS file; that, in turn, leads to around 3.1 million messages being delivered to inboxes daily based on the roughly 33,000 monthly posts to the list. But that does not work well when delivering to a whole bunch of Gmail addresses:
due to gmail's quota policies, this generally results in anywhere from 50K to 200K messages stuck in the queue all trying to deliver to gmail and being deferred with "this account is receiving too much mail"
His suggested solution is to reroute the wildcard entry so that patches go to the patches@lists.linux.dev mailing list rather than linux-kernel. That will (eventually) reduce the volume on the list, thus "unclog the outgoing queues and speed up mail delivery for everyone". Currently, the get_maintainer.pl tool, which is often used with other tools like "git send-email", will pick up the entry for "THE REST" at the end of MAINTAINERS; the entry says to send everything to linux-kernel. The linux-patches list is available, for those who want it, via the lei tool or by anonymous POP3 for those who want to receive the patches that way—at Gmail, for instance. But direct subscriptions to linux-patches will be vetted so that the mechanism for overwhelming email providers with patches does not recur.
Joe Perches, who maintains get_maintainer.pl and other development tools, called it an "excellent idea"; Borislav Petkov said that CCing patches to linux-kernel was meant for archiving purposes, so a separate list should work just fine for that instead. Others, though, disagreed with one of the downsides of the current situation that Ryabitsev had listed: "due to the sheer volume of messages, LKML is generally seen as useless for holding any actual discussions".
Eric W. Biederman said that he at least skims linux-kernel with some frequency; Christoph Hellwig agreed, adding that he does start discussions on that list as well. Meanwhile, Willy Tarreau suggested that it is a good way to keep abreast of kernel developments:
This way every day I can have a quick glance at all subjects there, that's how I discover new topics, patch series, discussions etc. I think that a non negligible number of LKML subscribers are there for this exact reason.
He said that he would personally miss the patches that got moved to the other list, but he also questioned how much improvement the change would actually bring. Others also wondered about how much traffic would be reduced; Pavel Machek thought that the number of patches picked up by the wildcard was fairly low. Paolo Bonzini said that it might be the configuration of git send-email that was actually causing the patches to be posted to linux-kernel, which means local changes would be needed to alter it; other developers did not think that was a common configuration, however. Laurent Pinchart pointed out that the "submitting patches" document is rather ambiguous about sending patches to linux-kernel:
linux-kernel@vger.kernel.org should be used by default for all patches, but the volume on that list has caused a number of developers to tune it out. Please do not spam unrelated lists and unrelated people, though.
This should be updated, even if for the only reason that the text is quite confusing (in my opinion at least, I'm not sure if it means LKML should be used for all patches, or shouldn't).
He also noted that, unlike some of the other responders, he has completely tuned out linux-kernel since the volume of email "drowns the useful information in noise for me". Greg Kroah-Hartman said that switching the tools to do the right thing will help; for example, the linux-usb mailing list is specified as where patches for the USB subsystem should go, but get_maintainer.pl still also lists linux-kernel for USB patches. The change suggested by Ryabitsev would avoid doing that, "which is a good thing and should cut down on the overall size over time".
If that change is made, though, Julia Lawall wondered how she would be able to review "the discussion that led up to a commit"; currently, looking at linux-kernel is "the obvious place to go for that". Dan Carpenter suggested "lore.kernel.org and b4 and lei", while Pratyush Yadav provided a detailed description of how to use those tools for tracking down a discussion. That procedure will find patches and discussions that occurred before he subscribed, which makes it "more powerful and complete than subscribing to mailing lists".
Ryabitsev pointed out that all of the lists that are archived at lore.kernel.org are actually indexed together, so that searches using lore.kernel.org/all will find message IDs or subjects in all of the kernel lists—including linux-patches.
There may well be a discussion about the idea in the Kernel Summit track at the upcoming Linux Plumbers Conference, or perhaps the Maintainer Summit the following day. That is presumably one reason that Ryabitsev posted to ksummit-discuss, though Carpenter noted that it makes a better forum for general topics than linux-kernel these days.
There are probably as many different kernel-development styles as there are kernel developers—a number that grows with each release—so finding common ground between them all is difficult, if not outright impossible. The problems with mail delivery these days are real, sadly, and it is most certainly not only Gmail that causes those kinds of woes. Given a problem that only seems to get worse over time, some kind of mailing-list fix is going to be needed; Ryabitsev's plan seems a reasonable approach that may well help. Beyond that, those who are using the large, free email providers may want to consider voluntarily switching their linux-kernel subscription elsewhere in order to improve the service and reliability of the mailing list for everyone else.
Guest-first memory for KVM
One of the core objectives of any confidential-computing implementation is to protect a guest system's memory from access by actors outside of the guest itself. The host computer and hypervisor are part of the group that is to be excluded from such access; indeed, they are often seen as a threat in their own right. Hardware vendors have added features like memory encryption to make memory inaccessible to the host, but such features can be difficult to use and are not available on all CPUs, so there is ongoing interest in software-only solutions that can improve confidentiality. The guest-first memory patch set, posted by Sean Christopherson and containing work by several developers, looks poised to bring some software-based protection to an upcoming kernel release.
Protecting memory from the host in the absence of encryption tends to rely on address-space isolation — arranging things so that the host has no path by which to access a guest's memory. The protection in this case is less complete — an overtly hostile host kernel can undo it — but it can be effective against many host-side exploits. Back in 2020, the KVM protected memory work created a new hypercall with which a guest could request that the host unmap a range of memory in use by that guest; that would render the host system (at both the kernel and user-space levels) unable to access that memory. That work ran into a number of problems, though, and never found its way into the mainline.
The guest-first-memory work takes a similar approach, but it moves the control to the host and reduces the available protection. Specifically, it adds a new KVM ioctl() command, called KVM_CREATE_GUEST_MEMFD, that takes a size in bytes as a parameter and returns a new file descriptor. The operation is similar to memfd_create(), in that the returned descriptor refers to an anonymous file, with the requested size, that lives entirely in memory. The differences are that this memfd is tied to the virtual machine for which it was created, and it cannot be mapped into user space on the host (or into any other virtual machine). This memory can be mapped into the guest's "physical" address space, though, with a variant on the usual KVM memory-management operations.
With this operation, the hypervisor can allocate memory resources for a guest without being able to access that memory itself. That protects the guest from having its memory contents disclosed or modified, either by accident or by malicious behavior on the part of a (possibly compromised) hypervisor. Unlike some previous attempts (including KVM protected memory), this operation does not take the affected memory out of the host kernel's direct memory map. Thus, while a guest using this memory is protected from user-space threats on the host, it could still be attacked by a compromised kernel. The bar to a successful attack has been raised significantly, but the protection is not total.
There are a number of advantages to using guest-first memory, according to the patch description. Currently, KVM does not allow guests to have a higher level of access to memory than the hypervisor does; if memory is to be mapped writable in the guest, it must be both mapped and writable in the hypervisor as well, even if the hypervisor has no need to be able to write that memory. Guest-first memory, by dispensing with the hypervisor mapping entirely, clearly gets around that problem.
Guest-first memory can also be useful in the presence of hardware-based memory encryption. Encrypted memory is already protected from access by the hypervisor; should the hypervisor attempt to do so anyway, the CPU will generate a trap, which is likely to lead to the hypervisor's demise. If that memory is not mapped into the hypervisor to begin with, though, it cannot be touched by accident. Unmappable memory can also be useful for the development and debugging of hypervisors meant to work with hardware-based confidential-computing features, even on hardware lacking those features.
Longer term, this feature may also be useful for the management of dedicated memory pools; a guest memfd could be set up on the pool without the need for access from the host. It could, perhaps, allow memory for guest systems to be managed (on the host) without using struct page at all, reducing overhead on the host and increasing the isolation of that memory. Also with an eye on the longer term, this patch series creates a more general concept of a "protected virtual machine" that is intended to be a container for confidential-computing mechanisms within KVM.
Meanwhile, though, guest-first memory has the downside that it cannot be migrated, meaning that host-side memory-management processes (such as compaction) will have to work around it. This limitation was seen as a significant problem when KVM protected memory was under discussion, but it has not been addressed in this series and will not be "at least not in the foreseeable future".
Even so, Paolo Bonzini (the KVM maintainer) has let it be known that he plans to apply this series after the 6.7 merge window with the idea of getting it into linux-next and, later, pushing it upstream for the 6.8 kernel release. He also said that he intends to apply the series to a future RHEL kernel, meaning that guest-first memory will show up in an RHEL release at some point in the (presumably not too distant) future. That is still unlikely to happen, though, before guest-first memory has landed in the mainline and the API has settled down.
Some settling may be required; this is a 35-part patch series adding nearly 3,000 lines of code, so it would not be surprising if, even after 13 revisions, there were some adjustments needed. Still, it looks like progress is being made on a multi-year effort to increase the amount of address-space isolation afforded to guest systems. With luck, users of shared cloud systems (of whom there are a lot) will all benefit from this sort of hardening.
The first half of the 6.7 merge window
As of this writing, 9,842 non-merge changesets have found their way into the mainline repository since the 6.7 merge window opened. Nearly a third of those consist of the entire bcachefs development history but, even discounting that, there has been a lot of material landing for the next release. Read on for a summary of the most interesting changes pulled so far in this development cycle.
Architecture-specific
- It is now possible to enable or disable 32-bit emulation on x86-64 kernels with the ia32_emulation= command-line parameter. This allows 32-bit emulation to be turned off where it is not needed. That, in theory, reduces the kernel's attack surface, since the 32-bit compatibility interfaces are seen as being less well tested than the rest of the kernel API. This option, though, allows the capability to be present for those who use it. The IA32_EMULATION_DEFAULT_DISABLED configuration option controls whether 32-bit emulation is enabled by default.
- S390x and 32-bit Arm systems now support the current set (cpuv4) of BPF instructions.
- After years of discussion, support for the ia64 ("Itanium") architecture has been removed. Not everybody is happy about this decision, though, and Linus Torvalds has indicated that he would be open to restoring ia64 support — but only after seeing it properly maintained out-of-tree for a year.
Core kernel
- The futex2 API has been merged, providing an alternative to the single, multiplexed futex() system call. The new API also adds features for better performance on NUMA systems and support for sizes other than 32 bits. [Update: those features were not actually a part of this merge and will presumably show up in a future release; apologies for the error.]
- It is now possible to use binfmt_misc to add new binary formats within unprivileged namespaces; see this commit for more information.
- A set of Rust bindings for workqueues has been added; this commit contains some examples of their use.
- Cpusets have a new "remote partition" mode that makes some configuration tasks easier; see this documentation commit for more information.
- BPF programs can now make use of per-CPU kptrs; a small amount of information is available in this changelog.
- Support for BPF exceptions (which are best thought of as a way to force an immediate exit from a BPF program) has been added. See this article and this changelog for more information.
- The io_uring subsystem now supports a number of new operations. IORING_OP_READ_MULTISHOT will perform multiple reads from a file descriptor until a buffer fills. IORING_OP_WAITID is an asynchronous version of waitid(). SOCKET_URING_OP_GETSOCKOPT and SOCKET_URING_OP_SETSOCKOPT implement getsockopt() and setsockopt().
- Io_uring has also gained support for futex operations, though only a subset of the futex API is implemented now.
Filesystems and block I/O
- The fscrypt subsystem can now encrypt data in units smaller than the filesystem block size; this commit includes some documentation on this feature.
- The Btrfs filesystem has added a new "stripe tree" data structure; its initial use is to implement RAID0 and RAID1 on zoned block devices, but it is expected to eventually address a number of longstanding problems with higher RAID levels in Btrfs in general. This out-of-tree document provides more information.
- Btrfs has also added "simple quotas", which address some of the performance problems experienced with full quota support. Simple quotas only track extents in the subvolume where they were created, resulting in a much simpler calculation that is, as a consequence, unable to account for shared extents. The feature is undocumented in-tree, but this cover letter gives an overview.
- The bcachefs filesystem has finally been merged, though marked as "experimental" for now. The merge contains nearly 2,800 commits, not a single one of which adds documentation. There is information on this filesystem at bcachefs.org.
- The kernel has gained support for TLS encryption for NVMe-TCP.
Hardware support
- Clock: Cirrus Logic ep93xx timers, Amlogic S4 SoC PLL and peripheral clock controllers, TI TWL6032 clock controllers, Qualcomm SM8550 camera clock controllers, and Qualcomm SM4450 global clock controllers.
- Graphics: JDI LPM102A188A DSI panels, Raydium RM692E5-based DSI panels, and Solomon SSD132x OLED displays.
- Miscellaneous: Xilinx Versal DDR memory controllers, Analog Devices MAX77503 regulators, Mitsumi MM8013 fuel gauges, Qualcomm PM8916 BMS-VM fuel gauges, Qualcomm PM8916 linear battery chargers, Ampere Coresight performance monitoring units, Nuvoton NPCM BMC sdhci-pltfm controllers, and Qualcomm QSEECOM interfaces.
- Networking: Loongson1 GMAC Ethernet controllers, Intel data path function devices, digital phase-locked-loop controllers, I3C-connected MCTP devices, and Mediatek MT7925-based wireless interfaces.
Miscellaneous
- Rust 1.73.0 is now the version needed to build the Rust-for-Linux code.
Networking
- The fair queuing packet scheduler has gained a number of performance improvements: "This series brings a 5% throughput increase in intensive tcp_rr workload, and 13% increase for (unpaced) UDP packets."
- The TCP protocol can now optionally support microsecond-resolution timestamps on a per-route basis; this changelog includes instructions on how to enable this feature.
- There is a new form of virtual network device where the transmit logic is entirely provided by a BPF program; this changelog has a bit more information.
- The TCP authentication option (RFC 5925) is now supported; it supersedes the older, MD5-based authentication mechanism. This commit contains documentation on how TCP-AO works and how to use it.
Virtualization and containers
- The iommufd subsystem can now perform dirty-tracking for DMA operations. According to the merge message: "This can be used to generate a record of what memory is being dirtied by DMA activities during a VM migration process. A VMM like qemu will combine the IOMMU dirty bits with the CPU's dirty log to determine what memory to transfer."
Internal kernel changes
- There is a new "lightweight queue" implementation which is "a FIFO single-linked queue that only requires a spinlock for dequeueing, which happens in process context. Enqueueing is atomic with no spinlock and can happen in any context." There is no documentation outside of the kerneldoc comments in the source.
- Also added is "objpool", which is "a scalable implementation of high performance queue for object allocation and reclamation". The usage of this feature can be seen in this test module.
There is still a fair amount of work sitting in linux-next, most of which can be expected to land in the mainline before the end of the merge window. That, in turn, should happen on November 12. Keep an eye on LWN for our second-half summary once the merge window closes.
The BPF-programmable network device
Containers and virtual machines on Linux communicate with the world via virtual network devices. This arrangement makes the full power of the Linux networking stack available, but it imposes the full overhead of that stack as well. Often, the routing of this networking traffic can be handled with relatively simple logic; the BPF-programmable network device, which was merged for the 6.7 kernel release, makes it possible to avoid expensive network processing, in at least some cases.
When a guest (either a container or a virtual machine) sends data over the network in current systems, that data first enters the network stack within that guest, where it is formed into packets and sent out through the virtual interface. On the host side, that packet is received and handled, once again within the network stack. If the packet is destined for a peer outside of the host, the packet will be routed to a (real) network interface for retransmission. The guest's data has made it into the world, but only after having passed through two network stacks.
The new device, named "netkit", aims to short out some of that overhead. It is, in some sense, a typical virtual device in that a packet transmitted at one end will only pass through the host system's memory before being received at the other. The difference is in how transmission works. Every network-interface driver provides a net_device_ops structure containing a large number of function pointers — as many as 90 in the 6.6 kernel. One of those is ndo_start_xmit():
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev);
This function's job is to initiate the transmission of the packet found in skb by way of the indicated device dev. In a typical virtual device, this function will immediately "receive" the packet into the network stack on the peer side with a call to a function like netif_rx(). The netkit device, though, behaves a bit differently.
When this virtual interface is set up, it is possible to load one or more BPF programs into each side of the interface. Since netkit BPF programs can affect traffic routing on the host side, only the host is allowed to load these programs for either the host or the guest. The ndo_start_xmit() callback provided by netkit will, rather than just passing the packet back into the network stack, invoke each of the attached programs in sequence, passing the packet to each. The BPF programs are able to modify the packet (to change the destination device, for example), and are expected to return a value saying what should be done next:
- NETKIT_NEXT: continue processing with the next BPF program in the series (if any). If there are no more programs to invoke, this return is treated like NETKIT_PASS.
- NETKIT_PASS: immediately pass the packet into the receiving side's network stack without calling any other BPF programs.
- NETKIT_DROP: immediately drop the packet.
- NETKIT_REDIRECT: immediately redirect the packet to a new network device, queuing it for transmission without the need to pass through the host's network stack.
Each interface can be configured with a default policy (either NETKIT_PASS or NETKIT_DROP) that applies if there is no BPF program loaded to make the decision. Most of the time, the right policy is probably to drop the packet, ensuring that no traffic leaks out of the guest until the interface is fully configured to handle it.
There are performance gains to be had if the decision to drop a packet can be made as soon as possible. Unwanted network traffic can often come in great quantities, so the less time spent on it, the better. But, as the changelog states, the best performance gains may come from the ability to redirect packets without re-entering the network stack:
For example, if the BPF program determines that the skb must be sent out of the node, then a redirect to the physical device can take place directly without going through per-CPU backlog queue. This helps to shift processing for such traffic from softirq to process context, leading to better scheduling decisions/performance.
According to the slides from a 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit talk, guests operating through the netkit device (which was called "meta" at that time) are able to attain TCP data-transmission rates that are just as high as can be had by running directly on the host. The performance penalty for running within a guest has, in other words, been entirely removed.
Given the potential performance gains for some users, it's not surprising that this patch series, posted by Daniel Borkmann but also containing work by Nikolay Aleksandrov, was merged quickly. It was first posted to the BPF mailing list on September 26, went through four revisions there, and was applied for the 6.7 merge window one month later. This feature will not be for all users but, for those who are deploying network-intensive applications within containers or virtual machines, it could be appealing indeed.
Progress in wrangling the Python C API
There has been a lot of action for the Python C API in the last month or so—much of it organizational in nature. As predicted in our late September article on using the "limited" C API in the standard library, the core developer sprint in October was the scene of some discussions about the API and the plans for it. Out of those discussions have come two PEPs, one of which describes the API, its purposes, strengths, and weaknesses, while the other would establish a C API working group to coordinate and oversee the development and maintenance of it.
Working group
In mid-October, Guido van Rossum announced PEP 731 ("C API Working Group Charter") as the first visible outcome of the meetings at the sprint. If approved by the steering council, it would establish a working group of the five PEP authors (Van Rossum, Petr Viktorin, Victor Stinner, Steve Dower, and Irit Katriel) to oversee the C API, and to steer it in ways that are analogous to what the council does for Python. There are multiple contentious issues surrounding the API, the PEP states, so there is a need for a dedicated group of core developers to work through them: "The general feeling is that there are too many stakeholders, proposals, requirements, constraints, and conventions, to make progress without having a small trusted group of deciders."
Some presentations on the C API at the 2023 Python language summit led to a site for gathering problems with the API. It has collected more than 60 different accounts of problems that people are experiencing using, maintaining, and extending the API. The PEP gives a rough summary of the kinds of problems that would be under the purview of the working group:
Despite many discussions and in-person meetings at core developer sprints and Language Summits, and a thorough inventory of the problems and stakeholders of the C API, no consensus has been reached about many contentious issues, including, but not limited to:
- Conventions for designing new API functions;
- How to deal with compatibility;
- What's the best strategy for handling errors;
- The future of the Stable ABI and the Limited API;
- Whether to switch to a handle-based API convention (and how).
Beyond just gathering problems, though, the effort has expanded to gather potential solutions in two other repositories: API evolution for relatively non-controversial, common-sense changes, and API revolution "for radical or controversial API changes" that will definitely need to go through the PEP process.
The reaction to the announcement was generally positive, though there were suggestions that the working group should include some other stakeholders, such as developers of C extensions. The PEP states that the working group will be made up of at least three core developers, but that "members should consider the needs of the various stakeholders carefully". The PEP notes that the group serves at the pleasure of the steering council; thus the council provides a check on the actions of the group. As Dower noted, the stakeholders are not being left out:
Since the WG would be proposing changes that are only directly binding on the core development team, I'm okay with the core developer requirement.
If the WG doesn't appear to be soliciting contributions from prominent users of the API and factoring them in, that's a reason to go to the steering council with a complaint.
Being directly on the WG isn't a prerequisite to contribute. It's just a burden of having to take responsibility for the decisions and how they impact other competing interests.
Analysis
On November 1, Katriel announced an outcome from this year's language summit: PEP 733 ("An Evaluation of Python's Public C API"). It has a list of nearly 30 authors that reads like a "who's who" of Python core developers—perhaps reflecting the attendees of the summit—but was coordinated by Katriel. Effectively, it is a summary of the problems that were collected, categorized into nine separate problem areas, along with a look at the various stakeholders and their requirements. Beyond that, it also describes some of the history of the API, its purposes, and its strengths ("to make sure that they are preserved"):
As mentioned in the introduction, the C API enabled the development and growth of the Python ecosystem over the last three decades, while evolving to support use cases that it was not originally designed for. This track record in itself is indication of how effective and valuable it has been.
For one thing, the stakeholders have diversified over the years. The C API first came about "as the internal interface between CPython's interpreter and the Python layer", but it was later exposed for third-party developers to use for extending CPython and to embed the interpreter in their own applications. Since then, new Python implementations have arisen that also need to use the C API; in addition, multiple projects seek to provide bindings or a better API for Python extensions in C (e.g. Cython), Rust (e.g. PyO3), and other languages and frameworks. Those projects use the C API in various ways, as well.
The overarching problem that has been identified with the C API is the difficulty of maintaining and evolving it, in part due to all of the different stakeholders and their competing needs. A process for incremental evolution, with deprecations and eventual removals of some parts of the API, could be a possible way forward; another option is periodic upheaval of the API via redesigns "each of which learns from the mistakes of the past and is not shackled by backwards compatibility requirements (in the meantime, new API elements may be added, but nothing can ever be removed)". Between those two extremes is a compromise approach that fixes "issues which are easy or important enough to tackle incrementally, and leaving others alone".
But the CPython core developers have different opinions on how to change the API, which is "an ongoing source of disagreements". So a fundamental framework for changes needs to come about:
Any new C API needs to come with a clear decision about the model that its maintenance will follow, as well as the technical and organizational processes by which this will work.
If the model does include provisions for incremental evolution of the API, it will include processes for managing the impact of the change on users [Issue 60], perhaps through introducing an external backwards compatibility module [Issue 62], or a new API tier of "blessed" functions [Issue 55].
Another problem area is the specification of the API, or, in truth, the lack thereof. Currently it is defined as "whatever CPython does" in a particular version; the documentation provides some amount of specification, but it is insufficient to verify any of the different API levels. That leads to unexpected changes to the API between versions, for one thing. The API also exposes more of the internals of CPython than is intended—or desired—and is C-specific, so other languages need to parse and handle C language constructs of various sorts.
The different API levels (or tiers), "which provide different tradeoffs of stability vs API evolution, and sometimes performance", are also a source of problems. The stable ABI, which is used by binary extensions that are built using the limited version of the C API, is "incomplete and not widely adopted"; there are differing opinions on whether it is worth keeping at all, but, if it is kept, it needs to support multiple ABI versions in a single binary. The limited API needs some changes as well. Meanwhile, there are inconsistencies in the way that CPython private functions are named, and there may be a need to add an "unsafe" tier that provides functions that remove error checking for performance purposes.
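The stable-ABI tier is visible from Python itself: extensions built against it are tagged with an "abi3" filename suffix that the import machinery recognizes on any later 3.x release. A minimal sketch (the exact suffixes printed vary by platform and build):

```python
import importlib.machinery

# CPython tries these filename suffixes, in order, when importing an
# extension module: a version-specific suffix first, then the
# version-independent stable-ABI ("abi3") suffix.
for suffix in importlib.machinery.EXTENSION_SUFFIXES:
    print(suffix)
```

On a typical Linux build this prints something like ".cpython-312-x86_64-linux-gnu.so", ".abi3.so", and ".so"; an extension shipped as "foo.abi3.so" is promising to use only the limited API, which is why incompleteness in that API matters to distributors of binary wheels.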
Most of the other categories listed in the PEP cover choices that were made in parts of the API, in particular, object reference management, object creation, type definition, and error handling, that are now seen as sub-optimal. Evolving (or replacing) those is desired, but that will have to be worked out once the overall maintenance scheme is determined. There are a handful of implementation flaws and some missing features that need attention as well.
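Reference management is the most pervasive of those choices: every C extension must manually balance each Py_INCREF() with a matching Py_DECREF(). The effect can be sketched from Python through ctypes (using Py_IncRef()/Py_DecRef(), the function forms of those macros), rather than in C, for brevity:

```python
import ctypes
import sys

obj = object()
before = sys.getrefcount(obj)

# Take an extra reference, exactly as C extension code would with
# Py_INCREF(); the object's reference count goes up by one.
ctypes.pythonapi.Py_IncRef(ctypes.py_object(obj))
assert sys.getrefcount(obj) == before + 1

# Every added reference must be released again, or the object can
# never be freed; unbalanced counts are a classic extension bug.
ctypes.pythonapi.Py_DecRef(ctypes.py_object(obj))
assert sys.getrefcount(obj) == before
```

It is precisely this manual bookkeeping, exposed directly to every API user, that the "sub-optimal choices" category of the PEP targets.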
As can be seen, there is a lot for the working group (and steering council) to address with regard to the C API. First up is whether the council is ready to approve the working group charter; given that the council has already indicated that it is in favor of the idea, it seems likely that approval will come fairly quickly.
Current C API changes
Meanwhile, though, some work has been going on for the Python 3.13 release that is due next year. In fact, at the end of October, Stefan Behnel raised some complaints about the large amount of changes that appeared in the first 3.13 alpha release:
Hundreds of functions were removed, renamed, replaced with different functions. Header files were removed. Names were removed from header files. Macros were changed, even a few publicly documented ones. And the default answer to "how do I replace this usage" seems to be "we'll try to design a blessed replacement in time".
The changes were extensive and disruptive enough that he (provocatively) wondered if the release should be called "Python 4". But, as Jean Abou Samra pointed out, this was all part of Stinner's longstanding plan to clarify the public versus private C API. It is, however, clearly disruptive, so others joined Behnel in thinking that things were moving too quickly.
Stinner maintains that the situation is manageable and that he is planning to devote much of his time over the next few months toward getting extensions working. He has marked many functions as private, which caused the breakage that Behnel and others encountered, but plans to "public-ize" various API elements as needed to support the existing C API users so that all of them are working by the time the first 3.13 beta is released in May.
Stinner opened a GitHub issue to track the problems. As can be seen there, opinions differ on how to address the problems that arise; Stinner would like to fix them one-by-one, but others think reverting the changes makes more sense. One complicating factor, as steering-council member Gregory P. Smith noted, is that there is another big development going on in the 3.13 development tree: removing the global interpreter lock (GIL) from the interpreter. Those changes are described in PEP 703 ("Making the Global Interpreter Lock Optional in CPython") and it is important not to impede progress in that area:
We need to treat 3.13 as a more special than usual release and aim to minimize compatibility headaches for existing project code. That way more things that build and run on 3.12 build can run on 3.13 as is or with minimal work.

This will enable ecosystem code owners to focus on the bigger picture task of enabling existing code to be built and tested on an experimental pep703 free-threading build rather than having a pile of unrelated cleanup trivia blocking that.
One senses that a directional shift in Stinner's current C API work may be in the offing; he did point out that the week-old Cython 3.0.5 release has preliminary support for 3.13-alpha1, which is an indication that his plan is generally working. One thing that will not be shifting, however, is the version to Python 4—ever, according to Smith. Should the major version of Python need to change at some point, it seems that four "shalt thou not count", as with The Holy Hand Grenade of Antioch. Five, of course, should be "right out", at least according to Monty Python—for Python, the language, though, we will just have to wait and see.
Page editor: Jonathan Corbet