LWN.net Weekly Edition for April 25, 2024
Welcome to the LWN.net Weekly Edition for April 25, 2024
This edition contains the following feature content:
- Linus and Dirk chat about AI, XZ, hardware, and more: the latest iteration of a traditional Open Source Summit session.
- Gentoo bans AI-created contributions: the Gentoo project feels the need to take action against increased usage of machine-learning systems.
- Existential types in Rust: a type-system improvement that helps asynchronous use cases (and beyond).
- Rust for embedded Linux kernels: embedded developer Fabien Parent describes his approach to getting useful Rust code into the mainline kernel.
- Warning about WARN_ON(): the kernel community debates the use of the WARN_ON() family of macros.
- Weighted memory interleaving and new system calls: giving applications more control over how their memory is placed in heterogeneous-memory systems.
- A change in direction for security-module stacking?: Linus Torvalds questions the development direction of the kernel's security-module subsystem.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Linus and Dirk chat about AI, XZ, hardware, and more
One of the mainstays of the Linux Foundation's Open Source Summit is the "fireside chat" (sans fire) between Linus Torvalds and Dirk Hohndel to discuss open source and Linux kernel topics of the day. On April 17, at Open Source Summit North America (OSSNA) in Seattle, Washington, they held with tradition and discussed a range of topics including proper whitespace parsing, security, and the current AI craze.
Calm and boring
It has been a number of years since the Linux kernel was the exciting new technology. That seems to suit Torvalds just fine. Hohndel opened the discussion by asking where things stand with kernel development right now. Torvalds responded that the kernel was at 6.9-rc4 and that things seemed to be "calm and boring," which is as it should be for a 30-year-old project. Those seeking excitement, he said, should seek out "one of the hype areas".
But Linux does have a lot of drama and high-stakes discussions, Hohndel said. "A really important topic that has once again reared its head is tabs versus spaces." Torvalds rolled his eyes at this and muttered, but he was unable to resist taking the bait.
The topic was Torvalds taking exception to a patch that replaced a tab with a space character to make Kconfig files easier to parse. He showed his displeasure by purposely adding a few more tabs to a different Kconfig file that would trip up unwary programs. In the commit message, he wrote that it wasn't clear which tool had failed to parse tabs correctly, but that it was in need of fixing: "Because if you can't parse tabs as whitespace, you should not be parsing the kernel Kconfig files".
Naturally, this got widespread attention. This, Torvalds said, "is the kind of excitement you get in the kernel community". Hohndel agreed, and said that the reason this garnered attention is because there's "not enough other drama" to focus on.
Hardware bugs
With that the discussion sailed on to more substantive topics, such as hardware bugs. Torvalds said that the security bugs in hardware have been very frustrating. He wasn't complaining about the work required to address hardware vulnerabilities, however, but the secrecy that comes into play when working to fix the issues. That, he complained, was not how he liked to work.
I love the development model where you can talk to people and work on interesting stuff, and the security issues we've had over the last decade have kind of destroyed that for me.
If it weren't for the secrecy, said Torvalds, "the challenges would otherwise be pretty interesting".
Another frustration for Torvalds is how long it takes to address a bug in the hardware itself. "We can react quite quickly in software", he said, "but then the hardware people are saying, oh, we have five generations of hardware that we can't fix after the fact". And, because the next couple of generations of hardware are already designed, it will take a few more years before new hardware can work around it.
Hohndel asked whether RISC-V, an open hardware platform, was going to be an improvement. Torvalds said that his fear was that RISC-V would make all the same mistakes that we've seen a decade earlier with x86. They'll be fixed more quickly, he predicted, because "by now people have learned something". But looking into a future with RISC-V being widely deployed, he expects to see the same problems that x86 and Arm have had.
Hohndel noted that the RISC-V work will be done in the open, providing an opportunity to come in early and say "we've tried this, and it doesn't work". Torvalds dismissed the idea that open hardware meant flaws would be caught in development, though, and said that there's a "big gulf between software and hardware people" that is hard to work across.
The XZ incident
The XZ backdoor was a dominant topic in presentations and hallway conversations at OSSNA this year. Naturally, Hohndel steered the conversation from hardware security to that topic.
Open source, Torvalds said, relies on "a certain amount of trust". Trusting people around you to do the right thing. Not only open source, he noted, but proprietary software as well. Communities and companies must depend on trusting people, and that trust can be violated.
Torvalds is no stranger to violations of trust. He gave the example of the University of Minnesota (UMN) sending patches with intentional bugs as part of an experiment. Kernel maintainers caught the bad patches, and were really upset about being experimented on. That study was interesting, he said, but "they didn't do it very well".
While the study was poorly executed, most people would agree the UMN incident was not malicious. The XZ backdoor was malicious, and Torvalds pointed out that "nobody had any explicit gates in place to try to catch this". Despite that, though, "it was caught fairly quickly". Not because of procedures or processes, but because it was found randomly when a developer noticed something wrong. But random is good, Torvalds said. It's not possible to always have specific rules in place to catch everything, and when there are rules in place, attackers can try to work around them. The fact that XZ was caught, and quickly, "does imply a fairly strong amount of stability".
However, he did say that this event is a wake-up call, and there are now "a lot of people looking into various measures in the kernel". The biggest defense, said Torvalds, is "a healthy community". The Linux kernel community is that, with an "incredibly big, incredibly entwined and connected community where there are multi-year and multi-decade relationships", he said. It is also, he was quick to point out, an outlier. Many open-source projects are run by just a few people, or just one, whereas "we have 1,000 people that basically participate in every single release every couple of months". So what the kernel does can't apply to 99 percent of other projects.
Here, Hohndel called on the audience to get involved. "Each of you works for a company, have your company adopt a couple of such projects and just participate." Every bit helps, he said, read the code, be part of review, "provide moral support to the maintainers".
But while the bad actors are out there and draw attention, they're not the main problem, according to Torvalds. The main problem is that there will continue to be bugs because no one is perfect, and those need solving as well.
AI
To bring the topic back to something "fun and entirely uncontroversial", Hohndel decided to steer the chat in the direction of AI. If you want to double your salary, he said, just add "AI" to your title. Until it takes all the jobs. "What I find so interesting is this idea that [generative] AI is going to be the end of programmers, the end of authors", Hohndel said. Even Torvalds would be replaced by an AI model.
"Finally!", Torvalds joked. "I hate the hype", he said, but he does find the technology interesting. It has also had some positives, like bringing companies to the table for kernel development. "For example, a company like NVIDIA—who is not exactly famous for being great at interacting with the kernel community—has been much more active". Suddenly, he said, they started caring about Linux. So it has had a positive impact.
However, he cautioned that people should take a wait-and-see approach. And he was optimistic about the technology making it easier to catch kernel bugs. "Making the tools smarter is not a bad thing", he said, but warned against "gloom and doom" or over-hyping the technology.
What's next
Torvalds is no stranger to making tools. A little project called "Git" has also had an enormous impact on the industry. But that, Hohndel pointed out, was more than a decade ago. People want to know "what's next": when will we see another major project?
If Torvalds has his way, the answer to the question is never. "I say that because every single project I've started always started from me being frustrated with other people being incompetent." Or, he added, with their money-grubbing. So he hopes that he doesn't find himself in that situation again, or "that there will be somebody else who solves my problems".
Right now, Torvalds said, "I don't have any huge problems, Linux for me solved all the problems I had way back in '92 or '93". If others hadn't found it useful, "I would not have continued".
By this time, the pair had run out of time. Hohndel said he had many more questions, but they would have to save them for Hong Kong, where the next Open Source Summit will be held in August.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to this event.]
Gentoo bans AI-created contributions
Gentoo Council member Michał Górny posted an RFC to the gentoo-dev mailing list in late February about banning "'AI'-backed (LLM/GPT/whatever) contributions" to the Gentoo Linux project. Górny wrote that the spread of the "AI bubble" indicated a need for Gentoo to formally take a stand on AI tools. After a lengthy discussion, the Gentoo Council voted unanimously this week to adopt his proposal and ban contributions generated with AI/ML tools.
The case against
In his RFC, he laid out three broad areas of concern: copyrights, quality, and ethics. On the copyright front, he argued that LLMs are trained on copyrighted material and that the companies behind them are unconcerned with copyright violations. "In particular, there's a good risk that these tools would yield stuff we can't legally use."
He questioned the quality of LLM output, though he did allow that LLMs might "provide good assistance if you are careful enough". But, he said, there is no guarantee that contributors are aware of the risks. He minced no words about his view of the ethics of AI use: Górny took issue with everything from the energy consumption driven by AI to labor issues and "all kinds of spam and scam". The only reasonable course of action, he said, would be to ban the use of those tools altogether in creating works for Gentoo:
In other words, explicitly forbid people from using ChatGPT, Bard, GitHub Copilot, and so on, to create ebuilds, code, documentation, messages, bug reports and so on for use in Gentoo.
He added that this only extended to works created expressly for the Gentoo project, and did not encompass upstream projects using things like ChatGPT. Andreas K. Hüttel asked whether there were objections to packaging AI software for Gentoo. This did not elicit a response in favor or against on the list, but the AI policy page expressly mentions that the policy does not prohibit packaging AI-related software.
Is this necessary?
Rich Freeman wrote that he thought it made sense to consider the use of AI, but suggested that the Gentoo developer certificate of origin (DCO) already had the necessary language to prohibit AI-generated contributions. "Perhaps we ought to just re-advertise the policy that already exists?" He also poked at the ethical case laid out by Górny, and suggested it would alienate some contributors even if the majority of the project was in favor. Freeman said it was not a bad idea to reiterate that Gentoo didn't want contributions that were just piped out of a GPT application into forums, bug reports, commits, and so on, but he didn't think that required any new policy.
Ulrich Mueller replied that there is overlap with existing policy, but he did not find the proposal redundant, and he supported the idea of a clarification on how to deal with AI-generated code. Sam James agreed with the proposal but worried that it was "slightly performative [...] given that we can't really enforce it." Górny wrote that it was unlikely that the project could detect these contributions, or that it would want to actively pursue finding them. The point, he said, is to make a statement that they are undesirable.
Oskari Pirhonen wanted to know about cases where a contributor uses ChatGPT to help with writing documentation or commit messages (but not code) because they don't have "an excellent grasp of English". If those contributions explicitly called out AI-generated content, would they be acceptable? Górny said that would not help much, and dismissed the quality of content generated by ChatGPT. Mueller wanted to know where the line was: "Are translation tools like DeepL allowed? I don't see much of a copyright issue for these."
In a rare dissent, Matt Jolly responded that Gentoo would always have poor-quality contributions, and could simply use common sense to filter out low-quality LLM material. "We already have methods for weeding out low quality contributions and bad faith contributors - let's trust in these and see what we can do to strengthen these tools and processes." He argued in favor of using LLMs for code documentation, and asked why he had to type out an explanation of what his code does if an LLM can generate something that only requires some editing. The proposal, he said, was a bad idea; banning LLMs "at this point is just throwing the baby out with the bathwater". Guidelines would be fine, even a ban on completely AI-generated works, but he was opposed to "pre-emptively banning useful tools".
James replied that tools trained on Gentoo's current repository should be OK, as should using LLMs to assist with commit messages. But, he said, a lot of FOSS projects were seeing too much AI spam and were not interested in picking the "possibly good" parts out.
David Seifert responded in support of the RFC and asked if it could be added to the next Gentoo Council meeting agenda. Górny said that he had been asked for a specific motion and provided this language:
It is expressly forbidden to contribute to Gentoo any content that has been created with the assistance of Natural Language Processing artificial intelligence tools. This motion can be revisited, should a case been made over such a tool that does not pose copyright, ethical and quality concerns.
Approved
Given the ratio of comments in favor of banning AI-generated contributions to objections to such a ban, it is not surprising that the council voted to accept Górny's proposal. Now the question is how Gentoo implements the ban. In an emailed response to questions, Górny said that Gentoo is relying on trust in its contributors to adhere to the policy rather than trying to police contributions to see if they were generated with AI/ML tools:
In both cases, our primary goal is to make it clear what's acceptable and what's not, and politely ask our contributors to respect that. If we receive contributions that contain really "weird" mistakes, the kind that [do not] seem likely to be caused by a human error, we're going to start asking questions, but I think that's the best we can do.
As AI/ML continues to dominate the tech industry's agenda, Gentoo is unusual in looking to shut it out rather than trying to join the party. How well the policy works, and how soon it is tested, will be interesting to see.
Existential types in Rust
For several years, contributors to the Rust project have been working to improve support for asynchronous code. The benefits of these efforts are not confined to asynchronous code, however. Members of the Rust community have been working toward adding explicit existential types to Rust since 2017. Existential types are not a common feature of programming languages (something the RFC acknowledges), so the motivation for their inclusion might be somewhat obscure.
The benefits of static type systems are well-known, but they do have some downsides as well. Type systems, especially complex ones, can make writing type signatures painful and produce complicated error messages. A recent comment on Hacker News showed an example of types added to the popular SQLAlchemy Python library, lamenting: "Prior to the introduction of types in Python, I thought I wanted it. Now I hate them."
These complaints are hardly new; they drove C++ and Java to adopt auto and var keywords for variable declarations, respectively, in order to save programmers from having to actually write down the lengthy types assigned to values in their programs. Both of these features reduce the burden associated with complex types by letting the compiler do some of the work and infer the relevant types from context. These mechanisms don't represent a complete solution, and cause their own set of problems, however. Using them, it is easy to accidentally change the inferred type of a variable in a way that breaks the program. And the resulting error messages still refer to the full, unreadable types. Additionally, local type inference doesn't help with types in function signatures.
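The pitfall is easy to reproduce with Rust's own let inference (an illustrative sketch; the average() function is invented here, not taken from the article):

```rust
// The variable's type is whatever the initializer produces, so editing the
// initializer silently changes the inferred type, and any resulting error
// shows up at the use sites rather than at the declaration.
fn average(values: &[f64]) -> f64 {
    let total = values.iter().sum::<f64>(); // inferred as f64
    // If `total` were instead computed from integer data, the division
    // below would stop compiling, with the error pointing here rather
    // than at the line that actually changed.
    total / values.len() as f64
}
```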
There are solutions to all of these problems — the C++ committee introduced concepts, which can help simplify some complex types behind an interface, partially to address this — but Rust has been trying to avoid falling into the trap altogether, despite an increasingly complex type system of its own. Existential types are one mechanism intended to make dealing with complex types easier. Unfortunately, they are also currently not well-explained or well-understood outside a few specific niches. The RFC calls this out as one problem with the current status quo:
The problem stems from a poor understanding of what "existential types" are — which is entirely unsurprising: existential types are a technical type theoretic concept that are not widely encountered outside type theory (unlike universally-quantified types, for instance). In discussions about existential types in Rust, these sorts of confusions are endemic.
Existential types get their name from mathematical logic via the existential quantifier, but the realization of the concept in an actual programming language like Rust is a good deal less abstract. Simply put, existential types are types that exist, but which cannot be directly manipulated outside of their scope. Normal generic types (referred to as universally-quantified types in the quote above) let the caller of a function decide what concrete type the function should be called with. In this circumstance, the function can only interact with values of this type as opaque values, because it doesn't know what type the caller will choose. Existential types invert the direction of that control, letting the function itself decide what concrete type should be used, while the caller of the function must now treat the values as being of an unknown type.
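The contrast can be made concrete in a few lines of stable Rust (a sketch for illustration; the function names are invented):

```rust
use std::fmt::Debug;

// Universal quantification: the *caller* chooses T, so the function body
// can only treat `value` opaquely, through its Debug bound.
fn show<T: Debug>(value: T) -> String {
    format!("{value:?}")
}

// Existential: the *callee* chooses the concrete type (a filtered Range
// here); the caller only learns "some type implementing Iterator".
fn evens() -> impl Iterator<Item = u8> {
    (0u8..10).filter(|n| n % 2 == 0)
}
```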
Existential types today
Rust actually already has a limited form of existential types, just not by that name. Instead, the Rust documentation refers to them as impl Trait types. They allow the programmer to say that a function takes or returns some type that implements a trait, without actually saying what that type is. For example, the caller of this function can use the return value as an iterator, but cannot see what type it has (in this case, a Range):
    fn example() -> impl Iterator<Item = u8> {
      0..10
    }
impl Trait types are useful for abstracting away API details without introducing any kind of runtime indirection. At compile time, the compiler knows the specific concrete type that underlies an impl Trait type, but it doesn't need that type explicitly written out, nor does it need to complicate error messages by showing it. In contrast to a mechanism like auto, changing the body of the function in a way that results in returning a type incompatible with the type signature (in this case, one that is not an Iterator) still causes a type error.
Abstracting away the inferred type like this is especially useful for asynchronous functions, which are syntactic sugar for functions that return impl Future. Since asynchronous functions return existential types under the hood, any limitations or improvements to existential types affect asynchronous functions as well. Existential types are also useful for returning closure types, which do not actually have names in Rust. (A design decision made for efficiency reasons that C++ actually shares — it permits better inlining of anonymous functions.)
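That desugaring can be demonstrated with the standard library alone. In this sketch (the noop_waker() and poll_once() helpers are invented for illustration), an await-free async function is polled once and completes immediately, showing that an async fn really is a function returning an existential impl Future type:

```rust
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Sugar for `fn double(x: u8) -> impl Future<Output = u8>`; the compiler
// picks the concrete future type, and the caller never sees its name.
async fn double(x: u8) -> u8 {
    x * 2
}

// A minimal waker that does nothing, sufficient for a future that never
// actually suspends.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Poll the future a single time; an await-free async fn is Ready at once.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let mut fut = Box::pin(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    }
}
```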
In 2018, Rust gained the ability to have impl Trait types as the argument to a function as well. However, these types still remain quite constrained compared to full existential types. For example, they can't appear in type aliases, or be stored inside structures. It's only in December 2023 with Rust 1.75 that they were allowed as return values from trait methods.
Existential types in the future
But there is one more subtle restriction on impl Trait types — every occurrence in the program refers to a different concrete type. Two functions that both return impl Debug, for example, could very well return two entirely different types. This makes it hard to write a collection of functions (such as implementations of the same interface for different configurations or architectures) that are all guaranteed to return the same type, without explicitly writing out that type.
There is a workaround for that use case, but it involves a layer of run-time indirection by making functions return a trait object — a heap-allocated structure full of function pointers that presents an opaque interface to a value. Using trait objects is a poor substitute for existential types for a few reasons. For one, it has a noticeable performance overhead because it prevents static method resolution and function inlining. For a language that prides itself on providing zero-cost abstractions, requiring programs to use runtime indirection is unacceptable. For another, returning trait objects can't quite express the same guarantees that existential types can.
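The trait-object workaround looks like this in practice (an illustrative sketch; the function names are invented):

```rust
use std::fmt::Debug;

// A trait object erases the concrete type behind a heap allocation and a
// vtable, so both functions share the single return type Box<dyn Debug>.
fn make_char() -> Box<dyn Debug> {
    Box::new('a')
}

fn make_int() -> Box<dyn Debug> {
    Box::new(42u8)
}

// Unlike two distinct `impl Debug` types, trait objects can live in one
// collection, at the cost of dynamic dispatch on every method call.
fn both() -> Vec<Box<dyn Debug>> {
    vec![make_char(), make_int()]
}
```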
The next step on the road toward full existential types is allowing them to be used in type aliases, which would make their use more consistent with other types in Rust. That change would allow programmers to write things like this:
    type Foo = impl Debug;
    fn function1() -> Foo {
      'a'
    }
    fn function2() -> Foo {
      'b'
    }
Critically, these functions are now guaranteed to return values of the same type, which lets programmers express patterns that were not previously possible. This is also the missing piece to allow impl Trait types to be stored in structures. In current Rust, the concrete type underlying an impl Trait type is only inferred when processing a function's arguments or return types — which is sufficient for the existing uses of existential types, but not for full existential types. When support for existential types in Rust is fully complete, the compiler should be able to infer the type of a member of a structure from how it's used. For now, permitting existential types in type aliases as the RFC does provides a workaround:
    struct Bar {
      item: impl Debug, // Error, can't infer underlying type
    }
    // Code using the RFC:
    type Quux = impl Debug;
    struct Bar {
      item: Quux,
    }
    // Later uses of 'Quux' let the compiler infer a concrete type.
    fn function3() -> Quux {
      42
    }
This should cover a number of use cases, because the most common reason to store a value of an existential type in a structure is that it is produced by some method, and is not otherwise storable except by converting it to a trait object.
This work is the last major step toward existential types that can be used in all the same ways as Rust's existing types. The RFC points out the confusion that the current piecemeal solution causes as one reason to want a version of existential types that can be used everywhere: "it is valuable from documentation and explanatory angles to unify the uses of impl Trait so that these types of questions never even arise."
Glen De Cauwsemaecker commented on the work in November 2023, saying that he had tried to use asynchronous functions in some of his networking code, but had run into serious usability problems when combining asynchronous functions with traits. After struggling to express the interface he wanted, he ended up using the experimental feature for existential type aliases:
The feature and RFC tracked in this issue works beautifully. It has none of the ergonomic pitfalls, requires no breaking changes in existing future code, in general plays very nice with Rust and the way it handles async code through futures. It just works.
Despite positive endorsements like that, work on bringing full existential types to Rust has not exactly been smooth. In keeping with the Rust community's approach to building complex features, extensions to impl Trait types have trickled in over time as small chunks of the whole feature. For example, programmers can now write trait methods that return an impl Trait type, which is internally de-sugared to an associated existential type alias — but writing an associated existential type alias by hand is not yet supported. Rust 2024 is also expected to change how impl Trait types capture lifetime information.
This piecemeal approach means that there are still design questions about how existential types should interact in some cases with the rest of Rust's increasingly complicated type system. Another feature currently in development is "associated type defaults", which would permit specifying a default value for a trait's associated type. How this would interact with existential type aliases is still up in the air.
Even though the road to bringing existential types to Rust has been long, it does seem likely that the last remaining design problems will be sorted out in the near future. Existential types would, among their ancillary benefits, make writing asynchronous functions in certain contexts (such as storing their returned impl Future values in a structure, among other uses) a good deal more ergonomic. Polishing Rust's story for asynchronous programming is one of the roadmap goals for Rust 2024, and the focus of substantial effort by Rust's contributors.
Rust for embedded Linux kernels
The Rust programming language, it is hoped, will bring a new level of safety to the Linux kernel. At the moment, though, there are still a number of impediments to getting useful Rust code into the kernel. In the Embedded Open Source Summit track of the Open Source Summit North America, Fabien Parent provided an overview of his work aimed at improving the infrastructure needed to write, in Rust, the device drivers needed by embedded systems; there is still some work to be done.
Parent started with the case for using Rust in the kernel; it may not be a proper justification, he said, but it is true that Rust is one of the most admired languages in use. C is about 50 years old and has not changed much since the C89 standard came out. It has the advantage of a simple syntax that is easy to learn, and it is efficient for writing low-level code. But C also makes it easy to write code containing undefined behavior, and it lacks memory-management features.
Rust, instead, is about ten years old and has a new release every six weeks. It is harder to learn, and forces developers to come up to speed on concepts like ownership and borrowing. But the code produced is efficient; Rust's abstractions are meant to be zero-cost, with the verification work done at compile time. Rust forces developers to handle errors, eliminating another frequent cause of bugs.
Thus, he said, it makes sense to use Rust in the kernel, hopefully leading to safer code overall. There is basic Rust support in the kernel now, but it is focused on driver code. There is currently no plan to support core-kernel code written in Rust, partly because the LLVM-based rustc compiler, which is the only viable compiler for Rust code currently, does not support all of the architectures that the kernel does. Rust support in the kernel is still considered to be experimental.
There are some drawbacks to using Rust in the kernel, starting with the current drivers-only policy. Most kernel vulnerabilities, he said, are not actually in driver code; instead, they appear in core code like networking and filesystems. As long as Rust is not usable there, it cannot help address these problems. Adding Rust, of course, will complicate the maintenance of the kernel, forcing maintainers to learn another language. The abstractions needed to interface Rust to the rest of the kernel are all new code, some of which may well contain bugs of its own.
Parent became interested in Rust after stumbling across a sample GPIO driver in Rust on LWN. He immediately started trying to write some kernel code in Rust, but failed soon thereafter. At this point, there simply is not a lot of kernel code that a new developer can use to learn from. So, instead, he went and rewrote all of his custom tools in Rust; after that, he was better prepared to work on the kernel.
There are, he said, a lot of people trying to contribute to the Rust-for-Linux effort; there is an online registry containing much of that work. But many of the basic abstractions needed for useful Rust code still are not in the mainline, and that is preventing others from making progress. The work that is seemingly advancing, including support for graphics drivers, Android's binder, and filesystems like PuzzleFS, is not useful for the embedded work that Parent is interested in. Most of this work has been done on x86 systems, with the exception of the Apple M1 GPU driver. Many of the key abstractions needed for embedded work are missing from the mainline kernel; many of them do exist, but out of tree and often unmaintained.
Parent had a long list of requirements for embedded systems, starting with support for the Rust language on 64-bit Arm systems; that, at least, has been merged for the upcoming 6.9 kernel release. Many abstractions for subsystems like clocks, pin control, run-time power management, regulators, and so on are not yet there. The abstractions have proved to be a challenge; maintainers will not merge code that is not used elsewhere in the kernel, but drivers cannot be merged until the abstractions are there. That leads to a situation where a lot of people are involved, each of whom is waiting on pieces from the others. That makes it hard to get the pieces upstream.
Parent's objective is to write simple drivers with minimal dependencies, each of which can be used to get a small number of abstractions upstream. He gave as an example a regulator driver that needs a relatively small set of abstractions, including those for platform drivers, regulators, regmap, I2C drivers, and Open Firmware for probing. He will be trying to get that set upstream; from there, work can proceed to more complex drivers.
The (conspicuously undocumented) regmap interface was called out for how it can showcase the advantages of Rust. Regmap eases access to devices that export an array of registers for configuration and operation. The Rust regmap abstraction allows the provision of a type-safe interface, built on top of the regmap_field API, that is generated with some "macro magic". The type checking allows the interface to ensure that register operations use the correct data types with each register, catching a number of common errors.
Parent's next step is to upstream a lot of this work, a task that, he acknowledges, will be difficult. But, if nothing else, he has learned a few lessons, starting with the fact that abstractions are more complex than one might expect, and they will have bugs. One problematic area is in ownership of resources; that is going to be hard to nail down for as long as there are extensive interfaces between the Rust and C sides. He advised other Rust developers to not try to write complete abstractions at the outset; instead, only the parts that are actually needed should be implemented.
Linked lists, a famous point of difficulty for Rust in general, present a special hazard in kernel code. The Rust compiler likes to move data around as a program runs; if that data happens to be a structure containing linked-list pointers, moving it will break the list and create hard-to-find bugs. Adding a list_head structure to an existing C structure can, as a result, break a Rust abstraction built on that structure in ways that are hard to detect automatically. The way he talked about this problem suggested a certain amount of hard-earned experience.
Even so, he summarized, writing kernel code in Rust makes a lot of things easier. Error handling is much more straightforward, and the compiler can ensure that developers have handled all possible values. Driver code tends to be a lot shorter and, he said, if the code compiles, it is likely to work.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to this event.]
Warning about WARN_ON()
Kernel developers, like conscientious developers for many projects, will often include checks in the code for conditions that are never expected to occur, but which would indicate a serious problem should that expectation turn out to be incorrect. For years, developers have been encouraged (to put it politely) to avoid using assertions that crash the machine for such conditions unless there is truly no alternative. Increasingly, though, use of the kernel's WARN_ON() family of macros, which developers were told to use instead, is also being discouraged.
A longstanding way to test for a condition that cannot be recovered from is the BUG_ON() macro, which includes a test for the unexpected condition:
    /* This can never happen, honest, would I lie? */
    BUG_ON(foo_ptr == NULL);
A BUG_ON() call leads directly to a kernel panic, resulting (usually) in the machine being rebooted. There are times when there is no alternative, but use of BUG_ON() has been discouraged for years. Crashing the machine deprives the user of any chance of reacting to the problem or saving work and can make it harder to track down the source of the problem. Even so, there are something like 12,000 BUG_ON() instances in the kernel (not counting BUILD_BUG_ON(), which only affects the build process and is not discouraged in the same way).
Instead, developers are told to use WARN_ON(), which puts a traceback into the kernel log but does not crash the machine (in theory, at least, but keep reading). The kernel's coding-style document says:
Do not add new code that uses any of the BUG() variants, such as BUG(), BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not required if there is no reasonable way to at least partially recover.
Increasingly, though, developers are being told to avoid using WARN_ON() as well. There are a couple of reasons for that. One is that any WARN_ON() that can be triggered from user space can be used, at a minimum, to spam the system log, obscuring other events and perhaps affecting system performance. The other reason, though, applies even to WARN_ON() calls that cannot be triggered in this way.
The kernel contains a sysctl knob called panic_on_warn. It does exactly what its name suggests: if this option is set, any WARN_ON() call will cause the system to panic. In essence, it turns WARN_ON() calls into BUG_ON() calls. This option is set by users who see any warning as a sufficiently suspicious event that, when one happens, it is better to kill the system and start over. Such users include many Android devices and the host kernels run at cloud providers (and beyond). Any WARN_ON() that actually triggers, in other words, has the potential to bring down a lot of machines.
The same coding-style document advises developers that this outcome is something that panic_on_warn users have explicitly opted into:
However, the existence of panic_on_warn users is not a valid reason to avoid the judicious use of WARN*(). That is because, whoever enables panic_on_warn has explicitly asked the kernel to crash if a WARN*() fires, and such users must be prepared to deal with the consequences of a system that is somewhat more likely to crash.
The current pressure against WARN_ON() use is not entirely consistent with this advice, though. Thus, Alex Elder was recently motivated to send a patch changing the advice given in the coding-style document. Gone is the language suggesting that panic_on_warn users were getting what they asked for; the new text reads:
The existence of this option is not a valid reason to avoid the judicious use of warnings. There are other options: ``dev_warn*()`` and ``pr_warn*()`` issue warnings but do **not** cause the kernel to crash. Use these if you want to prevent such panics.
Christoph Hellwig was quick to call this change "wronger than wrong": "If you set panic_on_warn you get to keep the pieces". Laurent Pinchart pointed out that the suggested alternatives are not the same; they are much easier to ignore and, thus, less effective at getting developers to fix the problem that the warning is trying to draw attention to. Greg Kroah-Hartman, though, was happy to see this change. The recommendation to avoid panic_on_warn has been ignored, he said, so new WARN_ON() calls should not be added.
To summarize the situation: over the years, BUG_ON() has been seen as so destructive that developers are simply told not to use it at all. The WARN_ON() macro has, instead, taken its place; but in settings where panic_on_warn is set, the end result of a WARN_ON() call is essentially the same. So, naturally, use of WARN_ON() is also now discouraged much of the time.
Whether the proposed documentation change will be applied is unclear; the kernel's befuddled documentation maintainer, who has happily not been appointed the arbiter of the kernel's coding style, makes a point of not applying coding-style changes in the absence of a clear consensus. It is not clear that a consensus on this change exists currently. Regardless of that change, though, developers will continue to be encouraged toward logging functions like pr_warn() instead of WARN_ON() — until somebody inevitably adds a panic_on_pr_warn sysctl knob and the whole process starts over again.
Weighted memory interleaving and new system calls
Gregory Price recently posted version 4 of a patch set that adds support for weighted memory interleaving — allowing a process's memory to be distributed between non-uniform memory access (NUMA) nodes in a more controlled way. According to the performance measurements he includes, the patch set could provide a significant improvement for computers with network-attached memory. The patch set also introduces new system calls and paves the way for future extensions intended to give processes more control over their own memory.
Modern computers can have a variety of kinds of memory in use at the same time. Not just traditional NUMA between separate banks of RAM within the same computer, but also memory distributed across a data center, like Compute Express Link (CXL) attached memory. These technologies allow computers to support much larger amounts of memory, at the cost of significantly complicating memory management and slower memory access speeds.
Current Linux kernels group different kinds of memory into tiers based on their latency. LWN covered how they interact with an earlier version of Price's patch set in October. The kernel also allows configuring processes to have different pages of their memory resident on different NUMA nodes. This spreads out the load between the separate parts of memory, but it's not perfect. For one thing, banks of memory can have different available bandwidths. The current default behavior is to assign allocated pages to different nodes in a round-robin way, which could over-allocate the bandwidth of the least-capable bank, even if other banks have more available capacity.
Price's patch set lets users specify unique weights for each NUMA node, and uses those weights when distributing freshly allocated pages. These weights are configured globally, but can be applied to specific processes using the kernel's NUMA memory-policy support. Only tasks that have the new MPOL_WEIGHTED_INTERLEAVE memory policy will use the weights.
The cover letter of the patch set includes a performance comparison (contributed by several different people) demonstrating how much better weighted interleaving can perform than the default round-robin scheme. In brief, it compares four settings for the same workloads: plain DRAM, CXL attached memory with the default interleaving policy, CXL memory with global weights according to bandwidth, and "targeted" weights that use different settings for the executable code, stack, and heap of the process. The default interleaving policy is on average 78% slower than DRAM. The global weights bring that performance to between 6% slower and 4% faster than DRAM depending on workload, and correctly chosen targeted weights push the performance to 2.5% to 4% better than DRAM.
Targeted weights have such a dramatic effect because different areas of a process's memory can have different access patterns that give an advantage to one memory policy or another. Memory policies for a whole process or a specific area of memory are configured with set_mempolicy() and mbind() respectively:
    long set_mempolicy(int mode, const unsigned long *nodemask,
                       unsigned long maxnode);
    long mbind(void addr[.len], unsigned long len, int mode,
               const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
                                            / ULONG_WIDTH],
               unsigned long maxnode, unsigned int flags);
The signature of mbind() introduces problems for weighted memory interleaving, however; the signature cannot be extended, because it is running up against the limits of how many arguments can be provided to a system call (at most six). Price's patch set rectifies this by introducing a new system call — mbind2() — that takes a structure as an argument, but otherwise performs the same function.
    struct mpol_args {
      __u16 mode;
      __u16 mode_flags;
      __s32 home_node;
      __u64 pol_maxnodes;
      __aligned_u64 *pol_nodes;
      /* Optional: interleave weights for MPOL_WEIGHTED_INTERLEAVE */
      unsigned char *il_weights;    /* of size MAX_NUMNODES */
    };
    mbind2(unsigned long addr, unsigned long len, struct mpol_args *args,
           size_t size, unsigned long flags);
The mpol_args struct is intended to be extensible over time, which is why the system call includes the structure's size. Any new extensions to the memory-policy infrastructure can add options to the end of the structure, and old callers won't need to be updated. Price's patch set also adds set_mempolicy2() and get_mempolicy2() using the same scheme, to support setting or retrieving task-wide weights, respectively.
Like existing memory-policy settings, Price's new weighted-interleaving policy is not a hard-and-fast rule. Setting a new memory policy does not migrate existing pages (unless the MPOL_MF_MOVE option is specified), and if the preferred NUMA node has no more memory available, the kernel will fall back to using the next available NUMA node. Price spelled this out explicitly in response to concerns brought up during review:
This interface does not limit memory usage of a particular node, it distributes data according to the requested policy.
Nuanced distinction, but important. If nodes become exhausted, tasks are still free to allocate memory from any node in the nodemask, even if it violates the requested mempolicy.
This is consistent with the existing behavior of mempolicy.
The weighted-interleaving patch set has gathered relatively little commentary, perhaps because the idea itself has been in progress for a long while. Price posted a related patch set in November that would have changed set_mempolicy() to take a process ID as an additional argument. The change would allow privileged processes to set memory policies for other processes. At the time, Price described the November patch set as being designed "to make mempolicy more flexible and extensible, such as adding interleave weights (which may need to change at runtime due to hotplug events)". Because mbind() already passes six parameters, however, that change would have needed new system calls as well. The November patch set did not end up being merged. Now that Price's newer weighted-interleaving patch set introduces the needed system calls, it is possible that another version of the older patch set will follow once the weighted-interleaving one is accepted.
Price's weighted-interleaving patch set does seem likely to be merged [Update: a reader points out that some of these changes, but not the new system calls, were merged under a different name as part of 6.9], given the impressive number of Suggested-by tags in the cover letter and the minimal objections from Ying Huang and Geert Uytterhoeven, who reviewed it. It seems as though many people are eager to have more control over how their processes' memory is distributed.
A change in direction for security-module stacking?
The long-running effort to complete the work on stacking (or composing) the Linux security modules (LSMs) recently encountered a barrier—in the form of a "suggestion" to discontinue it from Linus Torvalds. His complaint revolved around the indirect function calls that are used to implement LSMs, but he also did not think much of the effort to switch away from those calls. While it does not appear that a major course-change is in store for LSMs, it is clear that Torvalds is not happy with the direction of that subsystem.
In an April 9 post to the linux-security-module mailing list, Torvalds decried the stacking plans in part because "we just had *another* hardware security issue with speculated indirect branches go public". He was referring to the branch history injection flaw that was the most recent in a long line of speculative-hardware vulnerabilities. Torvalds said that he recognized that stacking LSMs was a "design decision and a target" for over a decade, but it needs a rethink:
So I say "suggestion" in the subject line, but really think it needs to be more than that: this whole "nested LSM" stuff as a design goal just needs to be all rolled back, and the new design target is "one LSM, enabled statically at build time, without the need for indirect calls".
He also said that he was aware of KP Singh's work to use static calls to avoid the indirect function calls in LSMs, but seemed to suggest that the patches were "random hacks".
There are some seeming misunderstandings in Torvalds's complaints, however. For one thing, there is no "nesting" of LSMs—"stacking" either really—the security solutions are composed, instead. A given hook function in the core kernel will effectively traverse the list of active LSMs, calling the corresponding hook function if present for an active LSM, until it gets a denial, which short-circuits the rest of the calls. If no LSM denies the access, it is allowed.
There are a number of real use cases for having multiple LSMs active in the kernel. Our 2022 article on the feature describes the history of how we have gotten to this point and why it is important to be able to enable multiple LSMs on current systems. It has been possible to compose any number of "minor" LSMs for years now, but the final push is on to allow more than one "major" LSM (e.g. SELinux, Smack, AppArmor) to be enabled. The main reason behind the need for that is containers, so that a Fedora container that uses SELinux can run on an Ubuntu host that uses AppArmor, for example.
In a response to Torvalds, Kees Cook patiently pointed out some of that history, including the reasons behind the LSM-stacking work. In the end, he said, it has simplified things to the point where subsystems that logically should be LSMs could be switched:
The general "LSM stacking" work, while it does add a certain kind of complexity, has actually made the many pre-existing manual layering of LSMs much more sane and simpler to reason about. Now the security hooks are no longer a random sprinkling of calls tossed over the core kernel, and are now all singularly well defined. This started years ago with pulling the "capabilities" checking into a well-defined LSM, and continued from there for things like Yama, and has now finally reached the last, and perhaps most historically invasive, LSM: IMA/EVM [Integrity Measurement Architecture/Extended Verification Module], which is finally now a well defined LSM too.
I don't think it's sane to demand that LSM stacking be removed. That's just not the world we live in -- we have specific and large scale needs for the infrastructure that is in place.
Cook also disagreed with the characterization of static calls, noting that they have been needed by the LSM subsystem for over a year just for the performance benefits. But Torvalds strongly disagreed; he said that the reason for stacking is: "Just because you people cannot agree". He also explained that it was not static calls themselves that were random hacks, but that the use of them for LSMs is, in part because of the random-seeming limit of 11 levels of "nesting". His parting shot was to further paint the LSMs as an attack vector against the kernel.
As might be guessed, Cook saw things differently. He noted, again, that stacking has been around for quite some time now; his current system has five separate LSMs activated, not to mention the capabilities LSM that is always present. "Stacking" is not removable at this point, but, beyond that, the most recent vulnerability is not in the LSM subsystem: "the attack vector is broken CPUs". In addition, the array to hold the static calls needs to have a limit, and there are 11 LSMs available for the kernel, which is why that number was chosen.
LSM maintainer Paul Moore was rather unhappy with another part of Torvalds's message. For whatever reason, Torvalds was unable to resist taking a shot at the LSM subsystem and its developers in his initial message:
Yes, I realize that the "security" in the LSM name is a bad joke, and that to a first level approximation the LSM people don't actually care about real security, and that the goal is just "policy".
Moore wondered if the insult was really just rooted in stress from yet another hardware flaw affecting the kernel, but even so, the effects will be borne by the LSM developers. Because of who he is, Torvalds's words have much greater weight, Moore said. It is thus rather ironic that Torvalds is asking—"(demanding? it's hard to say at this point)"—those he just insulted to rework their subsystem. Moore pointed out that insults are not likely to be particularly motivating. Beyond that, as Cook had pointed out, it is far too late to remove stacking entirely. The LSM developers will act on Torvalds's email, Moore said, but the first step is to reduce the performance penalty of the indirect calls—and, in the process, mitigate the hardware security flaws they expose—by getting the LSM static calls patches merged. "The rest will need more discussion, preferably after things have cooled down and we can all look at things with a more objective lens."
Casey Schaufler, who has been pushing the full LSM-stacking work upstream for 12 years or more at this point, replied to Torvalds's complaints by agreeing with some of his points. As with other developers, he is completely in favor of replacing the indirect calls, but is unsure what they should be replaced with if static calls are not the right approach. "While I can't change the brain dead behavior of 21st century hardware I am perfectly willing to re-write the entire $%^&*( LSM layer if it can be done in a way that makes you happy." But Moore said that Schaufler should not head down that path; Moore has no plans to move to "a single-LSM approach to satisfy a spur of the moment comment triggered by the latest hardware flaw". He repeated his plan to convert the LSMs to use static calls "and go from there".
Meanwhile, Greg Wettstein thought that more sweeping changes are needed for LSMs in order to support "an environment where there are going to be multiple and potentially industry specific mandated security controls". While he agrees that the performance and attack-vector characteristics of indirect branches need to be mitigated, he does not see static calls as the right path, at least given the current LSM architecture.
There needs to be an 'LSM' architecture that allows a security policy to be implemented for a process hierarchy, just like we do for all other resource namespaces. Platform owners are then free to select whether they want to implement multiple orthogonal security controls or lock the platform into a single control of their choosing.
While that may sound like a situation tailor-made for a BPF solution, he cautioned against that approach, as well, citing the discussion about a recent patch. There have been no replies to his post, however, which may be an indication that radical changes along those lines are fairly unlikely.
In truth, Torvalds's post seems to have been made in haste—coupled with serious unhappiness about the latest hardware flaw. Backing out all of the LSM-stacking work seems well-nigh impossible at this point, especially considering the user-space compatibility guarantees that Torvalds himself regularly enforces. Beyond that, the container use case for multiple major LSMs is not going away either, so some sort of solution will be needed there. The LSM development community seems willing to engage on alternate solutions, but one suspects that what has come out of more than a decade of effort will eventually be adopted.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: GitHub malware; Tille elected; Fedora 40; QEMU 9.0; Open Home Foundation; udev-hid-bpf; Firefox crash reporting; Quotes; ...
- Announcements: Newsletters, conferences, security updates, patches, and more.
![Linus Torvalds and Dirk Hohndel](https://static.lwn.net/images/2024/Dirk-Linus-ossna-sm.png)