LWN.net Weekly Edition for June 26, 2025
Welcome to the LWN.net Weekly Edition for June 26, 2025
This edition contains the following feature content:
- Libxml2's "no security embargoes" policy: an overloaded maintainer stops playing the security game.
- GNOME deepens systemd dependencies: the GNOME project has decided to start making more use of functionality from the systemd project.
- How to write Rust in the kernel: part 1: a getting-started guide for kernel developers who would like to explore using Rust.
- Who are kernel defconfigs for?: an attempt to change the x86 defconfig file exposes some disagreements over which configuration defaults make sense for the kernel.
- A distributed filesystem for archival systems: ngnfs: the final installment in our LSFMM+BPF coverage looks at a new filesystem focused on archival uses.
- Getting extensions to work with free-threaded Python: now that no-GIL Python is possible, what is needed to get extensions to play along?
- Asterinas: a new Linux-compatible kernel project: an effort to produce a safer kernel in Rust.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Libxml2's "no security embargoes" policy
Libxml2, an XML parser and toolkit, is an almost perfect example of the successes and failures of the open-source movement. In the 25 years since its first release, it has been widely adopted by open-source projects, for use in commercial software, and for government use. It also illustrates that while many organizations love using open-source software, far fewer see value in helping to sustain it. That has led libxml2's current maintainer to reject security embargoes and sparked a discussion about maintenance terms for free and open-source projects.
A short libxml2 history
The original libxml, also known as gnome-xml, was written by Daniel Veillard for the GNOME project. He also developed its successor, libxml2, which was released in early 2000 under the MIT license, even though GNOME applications tended to be under the GPLv2.
In the early 2000s, Veillard seemed eager to have others adopt
libxml2 outside the GNOME project. It was originally hosted on its own
site rather than on GNOME infrastructure. Libxml2 is written in C, but
had language
bindings for C++, Java, Pascal, Perl, PHP, Python, Ruby, and
more. The landing page listed a slew of standards implemented by
libxml2, as well as the variety of operating systems that it
supported, and boasted that it "passed all 1800+ tests from the
OASIS XML Tests Suite". The "reporting
bugs and getting help" page gave extensive guidance on how to
report bugs, and also noted that Veillard would attend to bugs or
missing features "in a timely fashion". The page, captured by
the Internet Archive in 2004, makes no mention of handling security
reports differently than bug reports—but those were simpler
times.
One can see why organizations felt comfortable, and even encouraged, to adopt libxml2 for their software. Why reinvent the extremely complicated wheel when someone else has not only done it but also bragged about their wheel's suitability for purpose and given it a permissive license to boot?
By the late 2000s, the project had matured, and the pace of releases slowed accordingly. Veillard continued to maintain the project, but skimming through the GNOME xml mailing list shows that his attention was largely elsewhere. Nick Wellnhofer began to make regular contributions to the project around 2013, and by 2017 he was doing a great deal of work on it, eventually doing most of the work on releases—though Veillard was still officially sending them out. Wellnhofer was also making similar contributions to a related project, libxslt, a processor for Extensible Stylesheet Language Transformations (XSLT), which are used to transform XML documents into other XML documents, HTML, plain text, and other formats.
I want my libxml2
In April 2021, Stefan Behnel complained
that it had been almost 18 months since the last libxml2
release. "There have been a lot of fixes during that time, so, may
I kindly ask what's hindering a new release?" Veillard replied
that the reason was that he was too busy with work, and there was
"something I would need to get in before a release". That
something seems to be a security fix for CVE-2021-3541,
a flaw in libxml2 that could lead to a denial of service. The releases
of libxml2 2.9.11,
which fixed the CVE, and 2.9.12
seem to have been the last contributions from Veillard to the project.
Wellnhofer had become the de facto maintainer of libxml2 and
libxslt as Veillard was fading away from them, but he temporarily stepped
down in July 2021. He had been able to fund his work through
Chrome bug bounties and other Google programs, but: "returns from
security research are diminishing quickly and I see no way to obtain a
minimal level of funding anymore".
Veillard thanked
Wellnhofer for his work, and said he was not sure that he would be
able to ensure the same level of care for the projects on his own:
"that's obvious for anybody monitoring those lists lately".
In January 2022, Wellnhofer announced that he was able to resume maintenance of libxml2 and libxslt through 2022, thanks to a donation from Google. He planned to move the projects to GNOME's infrastructure and resume releases, plus set up an official way to sponsor libxml2 development. Ultimately, he chose Open Source Collective as a fiscal host. (LWN covered Open Source Collective in 2024.) To date, it appears that the project has received the immense sum of $11,000, most of which was in the form of a $10,000 donation from Google, which appears to be the funding Wellnhofer received for maintenance of libxml2 through 2022.
Irresponsible behavior
Fast-forwarding to 2025, Wellnhofer opened an issue on May 8 in the libxml2 GitLab repository to announce a new security policy for the project. He said that he was spending several hours each week dealing with security issues, which was unsustainable for an unpaid volunteer.
As an example of what Wellnhofer was faced with, and a hint as to what may have been the final straw, there are currently four bugs marked with the security label in the libxml2 issue tracker. Three of those were opened on May 7 by Nikita Sveshnikov, a security researcher who works for a company called Positive Technologies. One of the issues is a report about a null-pointer dereference that could lead to a denial of service. It includes a request for Wellnhofer to provide a CVE number for the vulnerability and information about an expected patch date. Note that neither libxml2 nor GNOME is a CVE Numbering Authority (CNA).
One can debate whether the vulnerabilities reported by Sveshnikov and other researchers have much value. Wellnhofer argues he has fixed about 100 similar bugs and does not consider that class of bugs to be security-critical. Even if it is a valid security flaw, it is clear why it might rankle a maintainer. The report is not coming from a user of the project, and it comes with no attempt at a patch to fix the vulnerability. It is another demand on an unpaid maintainer's time so that, apparently, a security research company can brag about the discovery to promote its services.
If Wellnhofer follows the script expected of a maintainer, he will spend hours fixing the bugs, corresponding with the researcher, and releasing a new version of libxml2. Sveshnikov and Positive Technologies will put another notch in their CVE belts, but what does Wellnhofer get out of the arrangement? Extra work, an unwanted CVE, and negligible real-world benefit for users of libxml2.
So, rather than honoring embargoes and dealing with deadlines for
security fixes, Wellnhofer would rather treat security issues like any
other bug; the issues would be made public as soon as they were
reported and fixed whenever maintainers had time. Wellnhofer also
announced that he was stepping down as the libxslt maintainer and said
it was unlikely that it would ever be maintained again. It was even
more unlikely, he said, with security researchers "breathing down
the necks of volunteers."
Treating security flaws as regular bugs might make some downstream users nervous, but Wellnhofer hopes it will encourage more contributions:
The more I think about it, the more I realize that this is the only way forward. I've been doing this long enough to know that most of the secrecy around security issues is just theater. All the "best practices" like OpenSSF Scorecards are just an attempt by big tech companies to guilt trip OSS maintainers and make them work for free.
GNOME contributor Michael Catanzaro worried
that security flaws would be exploited in the wild if they
were treated like regular bugs, and suggested alternate strategies for
Wellnhofer if he was burning out. He agreed that "wealthy
corporations" with a stake in libxml2 security issues should help
by becoming maintainers. If not, "then the consequence is security
issues will surely reach the disclosure deadline (whatever it is set
to) and become public before they are fixed".
Wellnhofer was not interested in finding ways to put a band-aid on the problem; he said that it would be better for the health of the project if companies stopped using it altogether:
The point is that libxml2 never had the quality to be used in mainstream browsers or operating systems to begin with. It all started when Apple made libxml2 a core component of all their OSes. Then Google followed suit and now even Microsoft is using libxml2 in their OS outside of Edge. This should have never happened. Originally it was kind of a growth hack, but now these companies make billions of profits and refuse to pay back their technical debt, either by switching to better solutions, developing their own or by trying to improve libxml2.
The behavior of these companies is irresponsible. Even if they claim otherwise, they don't care about the security and privacy of their users. They only try to fix symptoms.
He added that he would love to mentor new maintainers for libxml2,
"but there simply aren't any candidates".
The viewpoint expressed by Wellnhofer is understandable, though one might argue about the assertion that libxml2 was not of sufficient quality for mainstream use. It was certainly promoted on the project web site as a capable and portable toolkit for the purpose of parsing XML. Open-source proponents spent much of the late 1990s and early 2000s trying to entice companies to trust the quality of projects like libxml2, so it is hard to blame those companies now for believing it was suitable for mainstream use at the time.
However, Wellnhofer's point that these companies have not looked to improve or care for libxml2 in the intervening years is entirely valid. It seems to be a case of "out of sight, out of mind"; as long as there are no known CVEs plaguing the many open-source libraries that these applications depend on, nobody at Apple, Google, Microsoft, or any of the other companies seems to care much about the upkeep of these projects. When a vulnerability is found, the maintainer is seemingly expected to spring into action out of a sense of responsibility to the larger ecosystem.
Safe to say no
Wellnhofer's arguments about corporate behavior have struck a chord
with several people in the open-source community. Ariadne Conill,
a long-time open-source contributor, observed
that corporations using open source had responded with "regulatory
capture of the commons" instead of contributing to the software
they depend on.
She suggested
that maintainers lacked the "psychological safety" to easily
say no. They can say no to corporate requests; doing so, however, means
weighing that "the cost of doing so may negatively impact the project's ability
to meet its end goal". In that light, maintainers may opt to
concede to requests for free labor rather than risk the
unknown consequences.
In response to Wellnhofer's change in security policy for libxml2,
Mike Hoye proposed
that projects adopt public maintenance terms that would indicate
"access to code is no promise of access to people". The terms
for a project would be included as a MAINTENANCE-TERMS.md
file in the top-level directory, similar to the README.md and
CONTRIBUTING.md files included with many projects these
days. The sample
maintenance terms that Hoye provided state that the software is
provided as-is and disclaim any promises, including response time,
disclosure schedules, or any "non-contractual obligations or
conventions, regardless of their presumed urgency or
severity".
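A file following Hoye's outline might look something like this sketch; the wording here is illustrative, modeled on the terms described above, and is not Hoye's actual sample text:

```markdown
# MAINTENANCE-TERMS.md

This software is provided AS-IS, as stated in its license.

- Access to the code is no promise of access to the people who wrote it.
- The maintainers make no commitments about response times, release
  schedules, or disclosure timelines, regardless of the presumed urgency
  or severity of an issue.
- Nothing in this project creates non-contractual obligations of any
  kind; if you need guarantees, arrange (and fund) them with the
  maintainers explicitly.
```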
Hoye said that the point of the maintenance terms is to deliberately build a culture of social permission where maintainers feel safe saying "no". Otherwise, he said:
Someday, somebody's going to come to you and say, I'm from Apple, I'm from Amazon, I'm from Project Zero and you need to drop everything because your project is the new heartbleed or Log4j or who knows what and the world is falling over and if that psychological offramp isn't there, if you haven't laid out clearly what PROVIDED AS-IS means and how you're going to act about it ahead of time, saying "I'll be at my kid's recital" or "I'm on vacation" or just "no" is extremely difficult.
Chris Siebenmann said
that he thinks that Wellnhofer's rejection of security embargoes is
"an early sign of more of this to come, as more open source
maintainers revolt". The current situation, Siebenmann said, is
increasingly bad for the maintainers involved and is not
sustainable. He now draws a sharp distinction between the corporate
use of open-source software versus independent projects, such as
Debian or the BSDs, run by volunteers; he expects that others will be
doing the same in the future.
Maintainers may not want to say no to other volunteers. But,
Siebenmann said, if a corporation shows up with a security issue, they
can point to the maintenance terms—because corporations are not using
open source as part of a cooperative venture and are not people
"even if they employ people who make 'people open source'
noises".
Wellnhofer's stance and Hoye's idea seem to be resonating with other maintainers who have strong feelings about corporate open-source behavior. Whether open-source maintainers adopt MAINTENANCE-TERMS.md files as a common practice remains to be seen. The increasing frequency of conversations about funding open source, and whether corporations are doing their share, does suggest that something needs to change soon if open source is to be sustainable and not just a sucker's game for maintainers.
GNOME deepens systemd dependencies
Adrian Vovk, a GNOME contributor and member of its release team, recently announced in a blog post that GNOME would be adding new dependencies on systemd, and soon. The idea is to shed GNOME's homegrown service manager in favor of using systemd, and to improve GNOME's ability to run concurrent user sessions. However, the move is also going to throw a spanner in the works for the BSDs and Linux distributions without systemd when the changes take effect in the GNOME 49 release that is set for September.
Vovk's announcement started by noting that GNOME does not have a formal, well-defined policy about systemd dependencies. The rule of thumb, he said, was that GNOME doesn't absolutely depend on systemd, but some individual features of GNOME may break without it. But there is no project-wide policy that dictates that the project should avoid depending on systemd, even though GNOME has historically been available on many non-Linux operating systems and Linux distributions that do not use systemd as their service manager.
The now-retired
GNOME wiki has a "What
We Release" page published nearly 12 years ago that explained the GNOME release
team's philosophy on dependencies and non-Linux systems clearly; the project is
focused on "a tightly-integrated desktop environment based on the GNOME Shell
running on a GNU-based operating system with a Linux kernel". Any non-Linux
usage, such as running GNOME on a BSD, is considered a secondary concern.
Systemd, even then, was listed as a component that was encouraged but not required by GNOME. Wayland—which is soon to be the only supported display system for GNOME—was also named as a recommended (but not required) component. The page hasn't been ported to the GNOME Project Handbook that is still maintained, but GNOME's philosophy toward non-Linux usage and favoring systemd has not changed, even if it is not currently codified as a formal policy.
More systemd, fewer hacks
For about a decade, GNOME has had at least one strong dependency on systemd: its user-session manager, systemd-logind. In 2015, support for the ConsoleKit framework was completely removed in favor of systemd-logind. However, Vovk said, it is possible to use elogind—which is logind "extracted" from systemd as a standalone daemon. That made it possible for BSD distributions and others to run GNOME without systemd itself as a dependency. Now, GNOME is going to be gaining two more dependencies on systemd—and those will not be so easy to swap out. The first is systemd-userdbd, which will be used by the GNOME Display Manager (GDM).
Vovk said that GNOME and systemd do not support running more than one graphical session under a single user account—but GDM may need to display multiple login screens at once, so that multiple users can log into their own GNOME sessions on a system. So, GDM needs to start each graphical login session as a unique user.
To do this GDM has relied on "legacy behaviors and
straight-up hacks" to provide multiple graphical sessions. Now,
GDM can use systemd-userdbd to allocate user accounts
dynamically and run each login screen as a unique user. That means
that the hacks are going away, and GDM will require
systemd-userdbd. Note that the unique users are only needed
for the login screen instance; when a user logs in, the GNOME session
runs as that user.
At some point, Vovk said, the systemd-userdbd dependency will extend further
to replace the AccountsService
daemon that GNOME uses to access user accounts and information about those
accounts. "Now that systemd's userdb enables rich user records, we can start work on
replacing AccountsService." Vovk said that AccountsService was meant to be a
temporary solution, but it has now been in use for 15 years. In a
discussion about the move, Rich Felker asked why a
fallback implementation couldn't just pull user data from
/etc/passwd and /etc/shadow as it had always
done. Vovk said:
For one, /etc/passwd doesn't have any form of rich information about the user. No profile pictures, no ability to export any kind of user settings to the login screen, etc. It barely has the ability to store a display name (which is actually "GECOS" instead, which is... complicated for historical reasons). Userdb is json, so we can add new stuff to it whenever we want instead of going in and implementing weird side databases.
The JSON user records used by systemd-userdbd can include many things that are not available in the GECOS field. That format, which can trace its heritage back to the early 1960s and General Electric's General Comprehensive Operating System (GECOS), is more than a bit long in the tooth. The systemd JSON user-record format, on the other hand, can contain not only more biographical information about users, but also additional security credentials, resource-management settings, and more.
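As a rough illustration, an abbreviated JSON user record might look like the following; the field names are drawn from systemd's JSON user-record documentation, but the particular values and the small selection of fields shown are invented for this example:

```json
{
    "userName": "grete",
    "realName": "Grete Example",
    "uid": 60312,
    "gid": 60312,
    "homeDirectory": "/home/grete",
    "shell": "/usr/bin/bash",
    "memoryMax": 2147483648,
    "tasksMax": 8192
}
```

Because the format is extensible JSON rather than colon-separated fields, new keys can be added without inventing side databases next to /etc/passwd.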
Goodbye GNOME service manager
GNOME has had a built-in service manager since its 2.x days that can be used by gnome-session
to start and manage services. GNOME has
mostly used systemd for service management since the 3.34 release, but it kept
the built-in service manager as a fallback when systemd is unavailable—and
because the "hacks" used by GDM for multi-seat support were incompatible with
systemd. Since those are going away, Vovk said, the built-in service manager will be
"completely unused and untested", so it is being
removed. He said that getting rid of the legacy service manager would also make
it possible to implement a session save
and restore feature.
Vovk submitted a merge
request to remove the built-in service manager on June 6. In the discussion there,
Pablo Correa Gomez, who is a GNOME Foundation board member and has been
working on adding systemd to
PostmarketOS to make it easier to support GNOME (and KDE), said that he had
conflicting feelings about the change. Wearing his GNOME-maintainer hat, he admitted
that he had been "bitten many times before by the poor, old gnome-session
management code". He agreed that the old code needed to go. But he
was terrified by the change as a downstream maintainer:
We, at postmarketOS are working, and getting very close to good systemd integration. But Alpine Linux and quite some others are not. Removing the built-in startup on GNOME session basically means breaking support for quite a lot of people, and giving others quite a headache. Generally speaking, I think that is not good for anybody.
He said that the change needed public communication, a call for feedback, and a
decision from the release team on the timing. He also thanked Vovk for doing the work
and necessary cleanups "not just specifically related to systemd".
Vovk replied
that the session manager was a blocker for a feature to be included in
GNOME 49, presumably the dynamic
users for GDM greeter sessions, though he did not specify. He
noted that the announcement about the change was coming, and it would
set out exactly what would need to be implemented to use GNOME 49
without systemd. "In short, I don't think it's an unreasonable
amount of integration work." He also said that the removal of
GNOME's session manager had already been discussed with the release
team and approved.
In his announcement blog post, Vovk apologized for the short
timeline, but said: "this blog post could only be published after I
knew how exactly I'm splitting up gnome-session into separate launcher
and main D-Bus service processes". One might quibble with that—certainly it's possible to announce
that a component other projects depend on might be dropped soon, even
if the exact details are not yet known. The affected parties will
usually appreciate the courtesy of a heads-up and the opportunity to
collaborate, as Gomez indicated.
What are the options for distributions that do not use systemd? Vovk suggests that
they "consider using systemd"; failing that, they will have to implement
replacements for the systemd components as has been done with elogind. He goes into
some detail about what would be required: replacing the GNOME built-in service
manager with another, rewriting the systemd unit files to work with the
alternative service manager, as well as replacing gnome-session-ctl, which is a
utility that coordinates between GDM, the D-Bus service, and systemd. That's all
before upgrading to GNOME 49—in a future GNOME release, probably 50, non-systemd
distributions will also need to implement even more systemd-userdbd
features:
Finally: You should implement the necessary infrastructure for the userdb Varlink API to function. Once AccountsService is dropped and GNOME starts to depend more on userdb, the alternate code path will be removed from GDM.
Reactions
Alpine Linux founder Natanael Copa said that the move was
frustrating: Alpine uses musl, instead of the
GNU C Library (glibc), and musl is not currently supported by systemd. Luca Boccassi
said that systemd was
open to supporting musl as long as someone else reimplements the features musl
lacks. Copa replied
that "nobody has the time, will and skills to do the actual work, so now it looks
like we are losing GNOME".
The Chimera Linux
distribution, which we covered in January, uses
the Dinit service
manager and musl as its C library. Its founder, Nina Kolesa, said
that she was not expecting GNOME to drop the gnome-session legacy code
now; she expected it to be dropped much earlier, "given it's pretty
much unused in everything except gdm and non-systemd
distros". Replacing gnome-session in Chimera had been "the plan
since forever", so GNOME's decision to drop it just hurried things
along a bit. Kolesa dismissed complaints about the decision:
The legacy handling code is kinda terrible and janky, doing away with it is a good thing overall and if your non-systemd service manager is worth anything at all, you *can* replicate the same (better) approach.
On GNOME's Discourse forum, Noé Lopez said
that he was looking into porting gnome-session to the GNU Shepherd service manager but
needed more guidance than was available in Vovk's blog post. He
understood the desire to drop old code, but collaboration was needed
to avoid dropping a huge part of GNOME's user base. GNOME contributor
Emmanuele Bassi said
that he didn't think that a huge part of GNOME's user base was running
a non-systemd-based distribution in 2025. Vovk replied
to Lopez
with a link to the merge request and invited him to reach out on the
systemd mailing list with questions about systemd-userdbd. "I'll be
happy to assist there".
Must GNOME be portable?
GNOME embracing systemd as a required component and/or a Linux-only desktop
environment has been a recurring topic for many years now, going back to at least
2009 when Christian Schaller called
for the release team to declare GNOME "a Linux desktop system as opposed to a
desktop system for any Unix-like system". LWN covered Bastien Nocera's plan
to make systemd a hard requirement for the power plugin in the GNOME
settings daemon in 2012.
In addition to his activities on the GNOME release team, Vovk is also a contributor
to GNOME OS, a distribution for testing and
development of the GNOME desktop. At least, that's what it is today. Vovk has written
about his vision to turn GNOME OS into an image-based "daily-drivable general purpose
OS" with the goal of making it suitable for non-enthusiasts. Part of
that vision includes dulling the "sharp edges" of Linux on the desktop in
favor of a GNOME OS optimized for usability. From that perspective, cutting away
the legacy bits that are needed for portability and focusing on a fully-controlled
platform makes some sense.
As "mort" said on the Lobste.rs discussion forum:
There are many great alternatives which do slot neatly into the "desktop environment" hole, most notably KDE and XFCE. These are great projects and using one of them is probably a much better idea than to try to extract only the desktop environment-like part of GNOME to get it to run outside of the intended GNOME system. Personally, I'm excited to see what the GNOME folks are able to do when unconstrained by old traditions like "everything should be a self-contained puzzle piece which can slot together with any arbitrary set of other puzzle pieces".
GNOME has been flirting with systemd as a hard requirement for a long time; perhaps it's time to make it official so that there is no expectation that GNOME is portable beyond Linux systems with systemd. That will disappoint some users, perhaps many, but other desktop projects would likely welcome them—and their non-systemd requirements—with open arms.
How to write Rust in the kernel: part 1
The Linux kernel is seeing a steady accumulation of Rust code. As it becomes more prevalent, maintainers may want to know how to read, review, and test the Rust code that relates to their areas of expertise. Just as kernel C code is different from user-space C code, so too is kernel Rust code somewhat different from user-space Rust code. That fact makes Rust's extensive documentation of less use than it otherwise would be, and means that potential contributors with user-space experience will need some additional instruction. This article is the first in a multi-part series aimed at helping existing kernel contributors become familiar with Rust, and helping existing Rust programmers become familiar with what the kernel does differently from the typical Rust project.
In order to lay the groundwork for the rest of the articles in this series, this first article gives a high-level overview of installing and configuring Rust tooling, as well as an explanation of how Rust fits into the kernel's existing build system. Future articles will cover how Rust fits into the kernel's maintainership model, what goes into writing a driver in Rust, the design of the Rust interfaces to the rest of the kernel, and hints about specific things to look for when reviewing Rust code.
Prerequisites
While support for Rust on GCC is catching up, and the rustc_codegen_gcc code generation backend is now capable of compiling the Rust components of the kernel, the Rust for Linux project currently only supports building with plain rustc. Since rustc uses LLVM, the project also recommends building the kernel as a whole with Clang while working on Rust code (although mixing GCC on the C side and LLVM on the Rust side does work). The build also requires bindgen to build the C/Rust API bindings, and a copy of the Rust standard library so that it can be built with the flags that the kernel requires. Building the kernel in the recommended way therefore requires Clang, lld, LLVM, the Rust compiler, the source form of the Rust standard library, and bindgen, at a minimum.
Many Linux distributions package sufficiently current versions of all of these; the Rust quick start documentation gives distribution-specific installation instructions. The minimum version of rustc required is 1.78.0, released in May 2024. The Rust for Linux project has committed to not raising the minimum required version unnecessarily. According to Miguel Ojeda, the current informal plan is to stick with the version included in Debian stable, once that catches up with the current minimum (likely later this year).
Developers working on Rust should probably also install Clippy (Rust's linter), rustdoc (Rust's documentation building tool), and rust-analyzer (the Rust language server), but these are not strictly required. The Rust for Linux project tries to keep the code free of linter warnings, so patches that introduce new warnings may be frowned upon by maintainers. Invoking make rustavailable in the root of the kernel source will check that the necessary tools have compatible versions installed. Indeed, all the commands discussed here should be run from the root of the repository. The make rust-analyzer command will set up configuration files for rust-analyzer that should allow it to work seamlessly with an editor that has language server support, such as Emacs or Vim.
Rust code is controlled by two separate kernel configuration values. CONFIG_RUST_IS_AVAILABLE is automatically set when compatible tooling is available; CONFIG_RUST (available under "General Setup → Rust support") controls whether any Rust code is actually built, and depends on the first option. Unlike the vast majority of user-space Rust projects, the kernel does not use Cargo, Rust's package manager and build tool. Instead, the kernel's makefiles directly invoke the Rust compiler in the same way they would a C compiler. So adding an object to the correct make target is all that is needed to build a Rust module:
obj-$(CONFIG_RUST) += object_name.o
The code directly enabled by CONFIG_RUST is largely the support code and bindings between C and Rust, and is therefore not a representative sample of what most Rust driver code actually looks like. Enabling the Rust sample code (under "Kernel hacking → Sample kernel code → Rust samples") may provide a more representative sample.
Testing
Rust's testing and linting tools have also been integrated into the kernel's existing build system. To run Clippy, add CLIPPY=1 to the make invocation; this performs a special build of the kernel with debugging options enabled that make it unsuitable for production use, and so should be done with care. make rustdoc will build a local copy of the Rust documentation, which also checks for some documentation warnings, such as missing documentation comments or malformed intra-documentation links. The tests can be run with kunit.py, the kernel's white-box unit-testing tool. The tool does need additional arguments to set the necessary configuration variables for a Rust build:
```sh
./tools/testing/kunit/kunit.py run --make_options LLVM=1 \
    --kconfig_add CONFIG_RUST=y --arch=<host architecture>
```
Actually locating a failing test case could trip up people familiar with KUnit tests, though. Unlike the kernel's C code, which typically has KUnit tests written in separate files, Rust code tends to have tests in the same file as the code that it is testing. The convention is to use a separate Rust module to keep the test code out of the main namespace (and enable conditional compilation, so it's not included in release kernels). This module is often (imaginatively) called "test", and must be annotated with the #[kunit_tests(test_name)] macro. That macro is implemented in rust/macros/kunit.rs; it looks through the annotated module for functions marked with #[test] and sets up the needed C declarations for KUnit to automatically recognize the test cases.
Rust does have another kind of test that doesn't correspond directly to a C unit test, however. A "doctest" is a test embedded in the documentation of a function, typically showing how the function can be used. Because it is a real test, a doctest can be relied upon to remain current in a way that a mere example may not. Additionally, doctests are rendered as part of the automatically generated Rust API documentation. Doctests run as part of the KUnit test suite as well, but must be specifically enabled (under "Kernel hacking → Rust hacking → Doctests for the `kernel` crate").
An example of a function with a doctest (lightly reformatted from the Rust string helper functions) looks like this:
````rust
/// Strip a prefix from `self`. Delegates to [`slice::strip_prefix`].
///
/// # Examples
///
/// ```
/// # use kernel::b_str;
/// assert_eq!(
///     Some(b_str!("bar")),
///     b_str!("foobar").strip_prefix(b_str!("foo"))
/// );
/// assert_eq!(
///     None,
///     b_str!("foobar").strip_prefix(b_str!("bar"))
/// );
/// assert_eq!(
///     Some(b_str!("foobar")),
///     b_str!("foobar").strip_prefix(b_str!(""))
/// );
/// assert_eq!(
///     Some(b_str!("")),
///     b_str!("foobar").strip_prefix(b_str!("foobar"))
/// );
/// ```
pub fn strip_prefix(&self, pattern: impl AsRef<Self>) -> Option<&BStr> {
    self.deref()
        .strip_prefix(pattern.as_ref().deref())
        .map(Self::from_bytes)
}
````
Normal comments in Rust code begin with //. Documentation comments, which are processed by various tools, start with /// (to comment on the following item) or //! (to comment on the containing item). These are equivalent:
```rust
/// Documentation
struct Name {
    ...
}

struct Name {
    //! Documentation
    ...
}
```
Documentation comments are analogous to the specially formatted /** comments used in the kernel's C code. In this doctest, the assert_eq!() macro (an example of the other kind of macro invocation in Rust) is used to compare the return value of the .strip_prefix() method to what it should be.
Quick reference
```sh
# Check Rust tools are installed
make rustavailable

# Build kernel with Rust enabled
# (After customizing .config)
make LLVM=1

# Run tests
./tools/testing/kunit/kunit.py \
    run \
    --make_options LLVM=1 \
    --kconfig_add CONFIG_RUST=y \
    --arch=<host architecture>

# Run linter
make LLVM=1 CLIPPY=1

# Check documentation
make rustdoc

# Format code
make rustfmt
```
Finally, Rust code can also include kernel selftests, the kernel's third way to write tests. These need to be configured on an individual basis, using the kernel-configuration snippets in the tools/testing/selftests/rust directory. Kselftests are intended to be run on a machine booted with the corresponding kernel, and can be run with make TARGETS="rust" kselftest.
Formatting
Rust's syntax is complex. This has been one of several sticking points in adoption of the language, since people often feel that it makes the language difficult to read. That problem cannot wholly be solved with formatting tools, but they do help. Rust's canonical formatting tool is called rustfmt, and if it is installed, it can be run with make rustfmt to reformat all the Rust code in the kernel.
Next
Building and testing Rust code is necessary, but not sufficient, to review Rust code. It may be enough to get one started experimenting with the existing Rust code in the kernel, however. Next up, we will do an in-depth comparison between a simple driver module and its Rust equivalent, as an introduction to the kernel's Rust driver abstractions.
Who are kernel defconfigs for?
Working on the kernel can be a challenging task but, for many, configuring a kernel build can be the largest obstacle to getting started. The kernel has thousands of configuration options; many of those, if set incorrectly, will result in a kernel that does not work on the target system. The key to helping users with complex configuration problems is to provide reasonable defaults but, in the kernel community, there is currently little consensus around what those defaults should be.

The kernel's configuration options control many aspects of how a kernel will be built. There is one for every device driver, for example, and usually for the subsystems to which the drivers belong as well. Failing to enable a needed driver will result in a non-working kernel; enabling all drivers will result in a long-running build process and a massive kernel at the end. Configuration options also control the availability of many features and system calls, the application of security mitigations and policies, many debugging features, subarchitecture support, and more. Many of the options either depend on or conflict with others.
Working through this maze of configuration options is not fun; the kernel provides some tools for configuration editing, but they can only provide so much help. There are make options to either enable or disable the entire set of configuration options; these are useful for build testing, but are rarely used to create a kernel that will actually run somewhere. A bit more help can be had from make localmodconfig, which examines the modules loaded in the currently running kernel, then generates a configuration to build just those modules. Eric Raymond once attempted to turn the configuration process into an adventure game, but that did little to tame the monsters found therein.
In the end, many people simply start with a known-working configuration (perhaps from their distributor), adjust it as necessary, and try to not think about it again for as long as the resulting kernel works. Linus Torvalds has often said that the configuration system is a problem:
People, I've said this before, and apparently I need to say it again: the kernel config is likely the nastiest part of building a local kernel, and the biggest impediment to people actually building their own kernels. And people building their own kernel is the first step to becoming a kernel developer.
There is one other operation, make defconfig, that will create an architecture-specific configuration that is intended to be a good starting point, providing a set of reasonable defaults that will create a working kernel. Ingo Molnar, one of the x86 maintainers, has recently come to the conclusion that the x86 default configuration has drifted from those goals; he posted a patch series that aimed to modernize that configuration:
Historically the x86 defconfigs aimed to be distro kernel work-alikes with fewer drivers and a substantially shorter build time. We regularly ask our contributors to test their changes on x86 defconfigs, and we frequently analyze code generation on such kernels as well. In practice, over the past couple of years this goal has diverged from what actual modern Linux distributions do these days, and this series aims to correct that divergence.
The series makes quite a few changes to the existing default configuration:
- Virtualization changes include enabling guest support for the Xen, Jailhouse, ACRN, and Hyper-V hypervisors, along with Intel's TDX confidential-computing mechanism. Support for running as a KVM host is also enabled.
- The current x86 default configuration does not include BPF; Molnar's patch set turns it on, including the BPF security module.
- It enables a number of memory-management options, including memory hotplugging, kernel samepage merging (KSM), transparent huge pages, userfaultfd, memory control groups, and the multi-generational LRU.
- On the core-kernel front, features like core scheduling, NUMA balancing, namespaces, pressure-stall information, and BSD process accounting are enabled.
- A number of debugging options are also enabled, including the kgdb debugger, UBSAN, function profiling, and more.
The series generated few comments in general, but Peter Zijlstra complained, suggesting that the memory-management changes (and enabling KSM in particular) were "a giant security fail". Torvalds, in turn, strongly told Molnar to stop this work, saying that the default configuration should be for "normal people". Options that are useful to cloud providers (such as the virtualization subsystems) should not be enabled, he said. The fact that all distributors enable a specific option is also not, in his mind, an argument for enabling that option in the default configuration. Torvalds would seemingly like to see the configuration system made easier to navigate, but this is evidently not the way he wants that to be done.
Molnar responded with the detailed reasoning behind his changes, saying that he wants easier kernel configuration for the people who contribute patches to the kernel. That group includes developers working for cloud providers, who Torvalds does not see as "normal people". Molnar asked:
Why not make the defconfig work out of the box for the testing environments of a broader group of our actual contributors, as long as the build cost isn't overly high?
He added that kernel developers tend to work (and test) with configurations that differ significantly from what the distributors are shipping, and that can lead to problems downstream when distributors enable the options the developers are avoiding.
Despite laying out his reasoning, Molnar is seemingly not ready for a fight. He has removed most of the changes from the patch series, saying:
These commits are not coming back. Clearly my approach of using the lowest common denominator of distro kernel configs is not appreciated and I have no desire whatsoever to fight such pushback.
Torvalds did not respond, and nobody else has jumped in, so the conversation ended there. For the time being at least, the x86 default configurations will differ significantly from those used by distributors, and will lack features that many people might prefer to have in their built kernels. It may stay that way for some time. The kernel's build system is an area where few developers choose to go; it is maintained (and probably only really understood) by a single developer. If a consensus cannot be reached even on a set of basic defaults, it seems unlikely that more significant improvements can be expected.
A distributed filesystem for archival systems: ngnfs
A new filesystem was the topic of a session led by Zach Brown at the 2025 Linux Storage, Filesystem, Memory Management, and BPF Summit (LSFMM+BPF). The ngnfs filesystem is not a "next generation" NFS, as might be guessed from the name; Brown said that he did not think about that linkage ("I hate naming so much") until it was pointed out to him by Chuck Lever in an email. It is, instead, a filesystem for enormous data sets that are mostly stored offline.
He works for Versity, which has an "archival software stack" that is used for products storing "really big data sets with a ton of files that have mostly been tiered off to archive". That means there are no file contents that are online any longer, which is the weirdest thing for a filesystem developer to wrap their head around, he said. The filesystem is metadata-heavy, with the archival agent making mostly metadata changes to extended attributes (xattrs) that describe where the file contents are currently stored. That includes information like what tier the data is in and what its location is on the media (e.g. tape).
The archive tiers have "large aggregate bandwidth" that would swamp a single host that was driving the system. So it is a distributed, POSIX filesystem that is, for example, "feeding eight machines that all have a bunch of attached tape drives". That is the context for the filesystem, Brown said: "a whole bunch of files, mostly metadata churn, but, annoyingly, as the file contents flow around, we need a bunch of aggregate bandwidth so it's not just one node doing all this".
He called the filesystem "engine fs" and said that the name had come from "next generation" when he began working on it; he used "ngn" and always pronounced it as "engine". That left him with a blind spot that NFS was embedded in the name, he said with a laugh.
![Zach Brown [Zach Brown]](https://static.lwn.net/images/2025/lsfmb-brown-sm.png)
His experience with GlusterFS, ocfs2, and other distributed filesystems has led him to try to remove the choke points (or bottlenecks) that he has observed in those other filesystems. The idea with ngnfs is to minimize the path between the two required elements: the application endpoint and the persistent-storage endpoint. Many competing systems have an enormous amount of other "stuff that you flow through to do all this work" for things like locking; it makes those systems hard to understand and to reason about, he said.
There are three "big ideas" behind ngnfs, though none of them are revolutionary; "this is just my brain solving this set of constraints in the way that it finds least awful", Brown said with a chuckle. There are per-device user-space servers, so that each archive device in the fleet has a processor in front of it. There is a network protocol that the servers speak to network endpoints. Finally, there is a client that is "building POSIX behavior by doing reads and writes across the blocks" provided by the servers.
All of that should sound familiar, he said, "but it's how we build the protocol and the behavior of the client as it gets its sets of blocks that makes this a little different".
The network protocol is pretty minimal; there is "almost nothing there" in the Git tree. The protocol is block-oriented, with small, fixed-size blocks and the expected read and write operations. Writing is a little more complicated because it is doing distributed writes across all of the servers. The read and write operations have additional cache-coherency information so that readers can, for example, specify that they will be writing a block as well; there are no higher-level locks for operations such as rename, because the locks are at the block level. This cache-coherency protocol is "kind of the heart of why this is interesting".
Because there is an intelligent endpoint on the server side, it can help make some decisions for clients. So, not all of the operations are simply reads and writes; there are some "richer commands that let it [the server] explore the blocks and make choices for you". He didn't want to get too deep into details, but block allocation is an area requiring server intelligence.
The client is the most interesting piece to him. The key thing to understand about the client "is the way we make these block modifications safe". For most kernel filesystems, there is a mutex that is protecting a set of blocks, so those that are protected can be read or written, but ngnfs has done away with those mutexes. Instead, the blocks are assembled into transaction objects; if they are being modified, the client has write access to all of the blocks, so they can be dirtied in memory; "when someone else needs them, they'll all leave as an atomic write". Reads also use the transaction objects, but there is no need to track dirty blocks.
Brown realized that attendees would immediately be thinking about ABBA deadlocks; that is what the client code is set up to avoid. The client attempts to get block access in block-traversal order, but that order can change, so the client is structured to use "trylocks", which attempt to acquire a lock but do not block if it cannot be obtained. If that fails, the client has to unwind and reacquire the access to the needed blocks. There is overhead in doing that, he said, but by localizing it in the client, the block-granular locking scheme can be used, so more widespread locking can be avoided. Writeback caching is "the big motivation for doing this"; the classic example is an untar, which just dirties a bunch of blocks in memory and "you don't have round trips for every logical operation".
Josef Bacik asked about how ngnfs handles its metadata-heavy workload; he has seen people struggle with those kinds of workloads on other filesystems, adding metadata servers and other complexity. Brown said that it all comes down to blocks. It will seem familiar to filesystem developers if they look at it as the "dumbest, dirt-stupid block filesystem, [then] spray those blocks over the network with a coherent protocol". Those blocks include everything: inode blocks, indirect blocks, directory-entry (dirent) blocks, extended attributes, and so on.
Christian Brauner asked if the client already existed. Brown said "sort of"; in the Git repository, there is a debugfs network client that has some thin wrappers around virtual filesystem (VFS) operations for file creation, rename, and things like that. There is also a server that does the block I/O.
Jeff Layton asked about file locking, which is not currently implemented, Brown said. There have been no requests for it, but if it is requested, it would be done in a block-centric manner. The applications that are being used do not fight over files, so there is no real need for locking, he thinks. "Until they do and then you're going to have to deal with it", Layton said and Brown acknowledged.
Brauner asked if there were any VFS changes that were needed for ngnfs. Brown said that there were not; all of the transactions, trylocks, and retries would be handled in the client implementation.
Beyond the block-granular contention, which is helpful in naturally avoiding the need for higher-level locking, he is most excited by the online-repair possibilities offered by ngnfs, Brown said as he was wrapping up. Clients can do "incoherent reads", where the blocks may be stale or undergoing modification, but the repair process can examine whatever the server has available. If the data is inconsistent in some way, an entire range can be rewritten with a compare-and-exchange operation; the server may recognize that the blocks have changed and require the repair operation to get new blocks. The whole repair process can be done in parallel on multiple clients to constantly ensure that the blocks stay consistent.
Getting extensions to work with free-threaded Python
One of the biggest changes to come to the Python world is the addition of the free-threading interpreter, which eliminates the global interpreter lock (GIL) that kept the interpreter thread-safe, but also serialized multi-threaded Python code. Over the years, the GIL has been a source of complaints about the scalability of Python code using threads, so many developers have been looking forward to the change, which has been an experimental feature since Python 3.13 was released in October 2024. Making the free-threaded version work with the rest of the Python ecosystem, especially native extensions, is an ongoing effort, however; Nathan Goldbaum and Lysandros Nikolaou spoke at PyCon US 2025 about those efforts.
Goldbaum began by noting that Python has "superpowers" in part because of its ability to call into native code. For example, when using NumPy, what looks like Python actually calls into C code; the interpreter mediates that, so the Python programmer does not even know. Typically, that native code is written in C, C++, or Rust. Up until recently, the GIL has always been part of the way that the interpreter mediates access.
He showed the slide above and said that it would be used as the basis for multiple parts of the talk. In it, each thread spool represents a thread running in a native Python extension (such as NumPy). Each spool has a lock icon representing the GIL; some, such as those doing I/O or making native function calls, are unlocked, while one that is calling into the CPython C API is locked. Two other spools are "grayed" out because they are waiting for the GIL so that they can call into the C API. As the diagram shows, there is some parallelism available even with the GIL, but multiple threads needing to use the C API will be serialized by the GIL.
![Nathan Goldbaum [Nathan Goldbaum]](https://static.lwn.net/images/2025/pycon-goldbaum-sm.png)
There is an additional detail in the slide that he wanted to highlight: the plug and receptacle between the interpreter runtime and the thread spool. In the GIL-enabled build, obtaining the GIL also means that the thread is attached to the Python runtime, so all of the threads but the one holding the GIL are in the unplugged state. He did not go into further detail, but the idea is that attached threads are registered with the interpreter runtime so that they can make calls into the C API. The attached versus detached (unplugged) state is not really a useful distinction for the GIL-enabled build, he said, but it does make a difference for the free-threaded build.
He put up an updated slide for the free-threaded interpreter, which looked similar; the differences were a lack of locks (because there is no GIL) and that all of the threads calling into the C API were attached to the interpreter runtime and were running. You still need to be attached to the runtime in order to call the C API and he emphasized that the API to do so has not changed. There are two ways to attach (PyGILState_Ensure() and the Py_END_ALLOW_THREADS macro) and two corresponding ways to detach (PyGILState_Release() and the Py_BEGIN_ALLOW_THREADS macro). The free-threaded build will just maintain the existing API that extensions are already using "so, by default, most things will just kind of work", which means there is less to do than might be guessed to run existing extensions on the free-threaded build.
One problem area that does need attention, however, is extensions that rely on global state. Goldbaum showed an example from NumPy 2.1 (though it was not verbatim) where there was a static variable for its print mode that would get set from Python code. That was "horribly broken" with the free-threaded build because multiple threads could be setting it at once; it could result in the options for, say, printing an array changing while the array is being printed.
Ecosystem migration
Nikolaou then stepped up to talk about the work that a team from Quansight Labs (where he and Goldbaum work) and Meta (where free-threaded Python was born) had done to jumpstart the ecosystem migration to the free-threaded build. He put up a slide with nearly 20 different Python projects and said that the team had spent time on getting those working with the free-threaded build. The team started with build systems and bindings generators, like Meson, Conda, Cython, and PyO3; some members are working on CPython directly, while others are working up the stack on things like NumPy, SciPy, Matplotlib, scikit-learn, and pandas. Beyond that, work has been done on various projects in the surrounding ecosystem like Pillow, pyca/cryptography, PyYAML, and AIOHTTP.
He pointed to two web sites that are tracking compatibility. Hugo van Kemenade has a site tracking free-threaded wheels that are available for the top 360 packages on the Python Package Index (PyPI). Similarly, Donghee Na has the Free-threaded Python Library Compatibility Checker, which builds packages with the free-threaded interpreter daily and shows the successes and failures. The team has also been working on the Python Free-Threading Guide to help the long tail of projects that will be making any needed changes themselves.
![Lysandros Nikolaou [Lysandros Nikolaou]](https://static.lwn.net/images/2025/pycon-nikolaou-sm.png)
It is important to understand that there are two separate builds of the Python interpreter for 3.13 and the upcoming 3.14: one with the GIL and one where it is disabled by default (i.e. the free-threaded build). Getting the free-threaded Python (often specified as 3.13t or 3.14t) is fairly straightforward, Nikolaou said; it can be installed in parallel with the standard interpreter. It is available from Linux distributions, Homebrew, Conda, uv, and more.
Native extensions do not automatically come with support for free-threaded Python; extensions need to declare that they support it. Trying to use an extension that does not make that declaration on a free-threaded Python build will result in a RuntimeWarning that the GIL has been enabled. Contrary to what many people think, the GIL is not gone, and "probably will not be gone in the future as well", he said.
When a package is being ported to support free threading, lots of documentation should be added to describe exactly what is and is not supported. For things that are not supported, the documentation should provide alternatives. The team has found that it is important to encourage user feedback, because that can provide the use cases and can help guide the developers to the areas that need attention. SciPy does this particularly well, he said; it explicitly lists classes and functions that can and cannot be used by multithreaded code. It also raises exceptions when objects are shared between threads in unsupported ways, which is a good practice.
Native data races are an area that needs attention when porting extensions to support free threading. Data races are undefined behavior, which, in C and C++, "is particularly evil and we should be avoiding it". He put up a classic example of a global counter that is being incremented in a loop; if multiple threads are executing that code, the results are undefined. He did not directly say it, but the existing races may not have occurred because of the GIL or were not encountered because multithreaded Python programs were fairly rare.
Using sanitizers, such as ThreadSanitizer, and other tools can detect these kinds of problems, but multithreaded testing is also needed to flush them out. To that end, Quansight Labs has released pytest-run-parallel, a pytest plugin that stress-tests a package's tests in multiple threads.
Early release of packages with free-threading support is something that works well to speed the porting process. Problems that users encounter (and report) will help find outstanding issues, but it also helps the ecosystem. "Having just one dependency in your dependency tree that does not support free threading means that the GIL will be re-enabled at run time", which makes testing the free-threaded build harder.
Mutexes
A tool that can be used to deal with global state problems is a mutex, Goldbaum said after returning to the stage. A mutex is like the GIL, in that only one thread can hold it and others must wait for it in order to continue executing, but a mutex has a more limited scope. So, instead of code that is problematic when multiple threads are using it, such as:
```c
int counter = 42;

void increment() {
    counter++;
}
```

A mutex can be used to protect counter from being accessed and incremented by multiple threads at once:
```c
int counter = 42;
static PyMutex mutex = {0};

void increment() {
    PyMutex_Lock(&mutex);
    counter++;
    PyMutex_Unlock(&mutex);
}
```
Another common reason why a mutex might be needed is to wrap calls into a non-thread-safe library. He showed an example from Pillow where calls into the FreeType library were wrapped in mutex locks and unlocks using a macro that was a no-op for GIL-enabled builds. The team used a single global mutex for the library and wrapped all of the calls into the thread-unsafe FreeType API.
Whenever you use more than one lock, though, there is the possibility of deadlocks, Goldbaum said. One way to avoid deadlocks is to use atomic operations, which allow multiple threads to safely change shared variables. Atomics are also a low-level way to tell the compiler to not reorder code in ways that can introduce timing issues, he said. Atomics allow writing algorithms that can avoid locking because the programmer can precisely control the order of operations to avoid the need for locks.
Atomics are a "huge topic" and it is easy to write incorrect code using them. He recommended the Rust Atomics and Locks book, which is freely available online; it is how he learned about atomics. Even for those who do not know Rust, the book provides useful information that is applicable to any language that exposes native atomics.
Caches are another problem area for multithreaded code; caches are good for single-threaded performance, but they are "bad for threads". One quick way to make progress on porting a cache-using package for free threading is to disable any caches that are not critical. Any caches that remain are just sources of global state that need to be protected from access by multiple threads.
Using one-time initialization APIs, such as Rust's OnceLock, to populate a cache can avoid problems for multithreaded code. But, because the one-time initialization APIs will block other threads while one thread does the work, it can result in deadlocks, either with the GIL for GIL-enabled builds or with the garbage collector on free-threaded builds. The PyO3 Rust bindings for Python provide the OnceLockExt trait to avoid this problem; extensions written in C or C++ will need to find a way to do something similar.
Mutable data structures are perhaps the source of the biggest problems for free-threaded support. Any time two or more threads have access to a mutable object, there is the potential for non-deterministic behavior. The general picture that developers should have in their heads is a classic triangle with "thread safety", "scalability and performance", and "simplicity" as the three vertices. "If you're really lucky, you can choose two; sometimes you can only choose one." Goldbaum believes there is a lot of room to develop thread-safe primitives that are optimized for different use cases.
He noted that the native debuggers (LLDB and GDB) were useful tools when developing or porting a Python extension. He also suggested that anyone working on an extension in any language—"except maybe in Rust, but even then"—use AddressSanitizer and ThreadSanitizer. He pointed to Docker images for CPython built with ThreadSanitizer as a possibility for use in continuous-integration (CI) testing. There is also advice on debugging as part of the free-threading guide.
The future
He sees bindings generators as "the future in a free-threaded world", as opposed to writing extensions using the raw C API for CPython. For C++, that likely means using pybind11 or nanobind; Cython should strongly be considered for C. Rust extensions should use PyO3; he thinks that Rust and PyO3 is the best choice for writing thread-safe extensions "or even just native extensions at all for Python".
For new extension projects, he thinks Rust really should be the only choice, but, for those who feel differently, Rust should at least be "strongly considered". It is easy to write incorrect extension code in C and C++; "Rust prevents a lot of issues". He is not the only one who thinks so; he put up a slide from David Hewitt's Python Language Summit talk earlier in the conference that showed roughly 30% of new PyPI projects have at least some Rust in them.
There is a need to coordinate between libraries in the free-threaded world. For example, the threadpoolctl module is used by NumPy to limit the number of threads that OpenBLAS starts in its thread pool; if too many threads are spawned on a system, there will be problems with resources and contention. Integration between libraries that are creating their own thread pools will be needed to ensure that the system does not get overwhelmed.
Rethinking mutable state should be on the agenda, as well, Goldbaum said as the session wound down. Making more types of immutable data structures available would be helpful. Reworking the buffer protocol with something like borrow checking would make it easier to share byte buffers. Currently the buffer protocol allows arbitrary reads and writes in buffers, which is problematic with multiple threads.
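The writable-versus-immutable distinction can be seen from Python itself. This sketch uses the stdlib memoryview type, which exposes the C-level buffer protocol: a view of a bytearray allows in-place writes (and thus races between threads), while a view of an immutable bytes object rejects them.

```python
# memoryview hands out a writable window into a bytearray's memory, so any
# thread holding such a view can mutate the buffer underneath other readers.
buf = bytearray(b"hello")
view = memoryview(buf)
view[0] = ord("j")         # in-place write through the view
print(bytes(buf))          # b'jello' -- the shared buffer changed

# A view of an immutable bytes object is read-only; writes raise TypeError.
frozen = bytes(buf)
try:
    memoryview(frozen)[0] = ord("m")
except TypeError:
    print("read-only view")
```

Something like borrow checking would, in effect, make the writable case exclusive: only one holder could mutate the buffer at a time, while shared views would behave like the read-only case.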
A YouTube video of the talk is available.
[Thanks to the Linux Foundation for its travel sponsorship that allowed me to travel to Pittsburgh for PyCon US.]
Asterinas: a new Linux-compatible kernel project
Asterinas is a new Linux-ABI-compatible kernel project written in Rust, based on what the authors call a "framekernel architecture". The project overlaps somewhat with the goals of the Rust for Linux project, but approaches the problem space from a different direction by trying to get the best from both monolithic and microkernel designs.
What's a framekernel?
The framekernel concept is explained in the September 2024 paper "Framekernel: A Safe and Efficient Kernel Architecture via Rust-based Intra-kernel Privilege Separation" by Yuke Peng et al. A fuller version of the paper was published in early June.
Traditionally, monolithic kernels lump everything into one kernel-mode address space, whereas microkernels only implement a minimal trusted computing base (TCB) in kernel space and rely on user-mode services for much of the operating system's functionality. This separation implies the use of interprocess communication (IPC) between the microkernel and those services. This IPC often has a performance impact, which is a big part of why microkernels have remained relatively unpopular.
The core of Asterinas's "framekernel" design is the encapsulation of all code that needs Rust's unsafe features inside a library, enabling the rest of the kernel (the services) to be developed using safe abstractions. Those services remain within the kernel's address space, but only have access to the resources that the core library gives to them. This design is meant to improve the safety of the system while retaining the simple and performant shared-memory architecture of monolithic kernels. The Asterinas book on the project's website provides a nice architectural mission statement and overview.
The aptness of the "framekernel" nomenclature can perhaps be debated. The frame part refers to the development framework wrapping the unsafe parts behind a memory-safe API. The concept of the TCB is, of course, not exclusive to microkernel architectures but, because there are strong incentives to strictly scrutinize and, in some contexts, even formally verify the TCB of a system, keeping the TCB as small as possible is a central aspect of microkernel designs.
An update on the project is available on the Asterinas blog in the June 4 post titled "Kernel Memory Safety: Mission Accomplished". The post explains the team's motivations and the need for the industry to address memory-safety problems; it provides some illustrations that explain how the framekernel is different from monolithic kernels and microkernels. It also takes a moment to emphasize that the benefits of Rust don't stop with memory safety; there are improvements to soundness as well. Perhaps most importantly, the post highlights the upcoming Asterinas presentation at the 2025 USENIX Annual Technical Conference.
Related work
In their paper, the authors compare Asterinas to prior Rust-based operating-system work, exploring the benefits of the language's memory-safety features and explaining how Asterinas differs from that previous work. Specifically, the paper contrasts Asterinas with RedLeaf, an operating system written in Rust and presented at the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI '20) in 2020. Asterinas uses hardware isolation to permit running user-space programs written in any programming language, aims to be general-purpose, and provides a Linux-compatible ABI; RedLeaf, by contrast, is a microkernel that deliberately forgoes the hardware's isolation features in favor of language-based isolation, and has different goals.
Another project of interest is Tock, an embedded operating system that targets systems-on-chip (SoCs) with limited hardware-protection functionality. Like Asterinas, Tock divides the kernel into a trusted core that is allowed to use unsafe and untrusted "capsules" that are not. As mentioned, Asterinas relies on hardware protection and is not intended for embedded use, which differentiates it from Tock.
It bears mentioning that the Rust for Linux project, which is introducing Rust code into the upstream Linux kernel, has goals similar to those of Asterinas. It, too, aims to encapsulate kernel interfaces in safe abstractions so that drivers can be written in Rust without any need for unsafe.
Work toward formal verification
One goal of shrinking the TCB of an operating system is to make it feasible to have it formally verified. In February 2025, the Asterinas blog featured a post detailing plans to do just that. The best known formally verified kernel is seL4, an L4-family microkernel.
Asterinas aims to use the framekernel approach to achieve a system that has a small, formally verified TCB akin to a lean microkernel, but also a simple shared-memory architecture with Linux ABI compatibility, all at the same time. This is a radical departure from any previously formally verified kernel; the blog post describes those kernels as deliberately small and limited compared to "full-fledged, UNIX-style OSes".
The Asterinas project is collaborating with a security-auditing company called CertiK to use Verus to formally verify the kernel. There is an extensive report available from CertiK on how Asterinas was audited and the issues that were found.
Libraries and tools
The Asterinas kernel is only one result of the project. The other two are OSTD, described as "a Rust OS framework that facilitates the development of and innovation in OS kernels written in Rust", and OSDK, a Cargo add-on to assist with the development, building, and testing of kernels based on OSTD.
There are four stated goals for OSTD as a separate crate. One is to lower the entry bar for operating-system innovation and to lay the groundwork for newcomers to operating-system development. The second is to enhance memory safety for operating systems written in Rust; other projects can benefit from its encapsulation and abstraction of low-level operations. The third is to promote code reuse across Rust-based operating-system projects. The fourth is to boost productivity by enabling testing of new code in user mode, allowing developers to iterate without having to reboot.
It is worth emphasizing that the kernels that can be written with OSTD do not have to be Linux-compatible or, in any way, Unix-like. The APIs provided are more generic than that; they are memory-safe abstractions for functionality like x86 hardware management, booting, virtual memory, SMP, tasks, users, and timers. Like most Rust crates, OSTD is documented on docs.rs.
Asterinas reports Intel, among others, as a sponsor of the project. Intel's interest is likely related to its Trust Domain Extensions (TDX) feature, which provides hardware support for isolating virtual machines and encrypting their memory. The Asterinas book has a brief section on TDX, and the OSDK supports it.
OSTD, or at least the parts of it that Asterinas uses, is essentially the restricted TCB that is allowed to use unsafe. As an illustrative example, the buffer code in the network kernel component uses DMA, locking, allocation, and virtual-memory functionality from OSTD through memory-safe APIs.
Current state
Asterinas was first released under the Mozilla Public License in early 2024; it has undergone rapid development over the past year. GitHub lists 45 individual committers, but the majority of the commits are from a handful of PhD students from the Southern University of Science and Technology, Peking University, and Fudan University, as well as a Chinese company called Ant Group, which is a sponsor of Asterinas.
At the time of writing, Asterinas supports two architectures, x86 and RISC-V. In the January blog post linked above, it was reported that Asterinas supported 180 Linux system calls, but the number has since grown to 206 on x86. As of version 6.7, Linux has 368 system calls in total, so there is some way to go yet.
Overall, Asterinas is in early development. There have been no releases, release announcements, changelogs, or much of anything other than Git tags and a short installation guide in the documentation. The Dependents tab of the OSTD crate on crates.io shows that no unrelated, published crate yet uses OSTD.
It does not seem like Asterinas is able to run any real applications yet. Issue #1868 in Asterinas's repository outlines preliminary plans toward a first distribution. The initial focus is on a custom initramfs and some rudimentary user-space applications, followed by being able to run Docker. There are also initial plans to bootstrap a distribution based on Nix. Notably (but unsurprisingly), the issue mentions that Asterinas does not support loading Linux kernel modules, nor does it ever plan to.
Near-future goals
The Roadmap section of the Asterinas book says that the near-term goals are to expand the support for CPU architectures and hardware, as well as to focus on real-world usability in the cloud by providing a host OS for virtual machines. Apparently, the support for Linux virtio devices is already there, so a major hurdle has already been cleared. In particular, the Chinese cloud market, in the form of Aliyun (also known as Alibaba Cloud) is a focus. The primary plans involve creating a container host OS with a tight, formally verified TCB and support for some trusted-computing features in Intel hardware, for the Chinese cloud service.
While both Rust for Linux and Asterinas have similar goals (providing a safer kernel by relying on Rust's memory safety), their scopes and approaches are different. Rust for Linux focuses on safe abstractions strictly for new device drivers to be written in safe Rust, but this leaves the rest of the kernel untouched. Asterinas, on the other hand, aims to build a whole new kernel from the ground up, restricting the unsafe-permitting core to the absolute minimum, which can then be formally verified. Asterinas also focuses on containers and cloud computing, at least for now, while Rust for Linux looks to benefit the whole of the Linux ecosystem.
Despite the stated cloud focus, there is more going on; for example, there is work toward supporting X11 and Xfce. Also, OSTD could, of course, prove interesting for OS-development enthusiasts irrespective of the Asterinas project, but so far it remains unknown and untested by a wider audience.
Asterinas is certainly a refreshingly innovative take on principles for operating-system development, leaning on the safety and soundness foundations provided by the Rust language and compiler. So far, it is at an early, exploratory stage driven by enthusiastic Chinese researchers and has not seen any serious practical use, but it is worth keeping an eye on. It will be interesting to see the reception it gets from the Rust for Linux team and the Linux community at large.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: LSFMM+BPF book; tag2upload; PostmarketOS 25.06; Firefox 140.0; NLnet funding; Quotes; ...
- Announcements: Newsletters, conferences, security updates, patches, and more.