LWN.net Weekly Edition for April 25, 2024
Welcome to the LWN.net Weekly Edition for April 25, 2024
This edition contains the following feature content:
- Linus and Dirk chat about AI, XZ, hardware, and more: the latest iteration of a traditional Open Source Summit session.
- Gentoo bans AI-created contributions: the Gentoo project feels the need to take action against increased usage of machine-learning systems.
- Existential types in Rust: a type-system improvement that helps asynchronous use cases (and beyond).
- Rust for embedded Linux kernels: embedded developer Fabien Parent describes his approach to getting useful Rust code into the mainline kernel.
- Warning about WARN_ON(): the kernel community debates the use of the WARN_ON() family of macros.
- Weighted memory interleaving and new system calls: giving applications more control over how their memory is placed in heterogeneous-memory systems.
- A change in direction for security-module stacking?: Linus Torvalds questions the development direction of the kernel's security-module subsystem.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Linus and Dirk chat about AI, XZ, hardware, and more
One of the mainstays of the Linux Foundation's Open Source Summit is the "fireside chat" (sans fire) between Linus Torvalds and Dirk Hohndel to discuss open source and Linux kernel topics of the day. On April 17, at Open Source Summit North America (OSSNA) in Seattle, Washington, they held with tradition and discussed a range of topics including proper whitespace parsing, security, and the current AI craze.
Calm and boring
It has been a number of years since the Linux kernel was the exciting new technology. That seems to suit Torvalds just fine. Hohndel opened the discussion by asking where things stand with kernel development right now. Torvalds responded that the kernel was at 6.9-rc4 and that things seemed to be "calm and boring," which is as it should be for a 30-year-old project. Those seeking excitement, he said, should seek out "one of the hype areas".
But Linux does have a lot of drama and high-stakes discussions, Hohndel said. "A really important topic that has once again reared its head is tabs versus spaces." Torvalds rolled his eyes at this and muttered, but he was unable to resist taking the bait.
The topic was Torvalds taking exception to a patch that replaced a tab with a space character to make Kconfig files easier to parse. He showed his displeasure by purposely adding a few more tabs to a different Kconfig file that would trip up unwary programs. In the commit message, he wrote that it wasn't clear which tool had failed to parse tabs correctly, but that it was in need of fixing: "Because if you can't parse tabs as whitespace, you should not be parsing the kernel Kconfig files".
Naturally, this got widespread attention. This, Torvalds said, "is the kind of excitement you get in the kernel community". Hohndel agreed, and said that the reason this garnered attention is because there's "not enough other drama" to focus on.
Hardware bugs
With that the discussion sailed on to more substantive topics, such as hardware bugs. Torvalds said that the security bugs in hardware have been very frustrating. He wasn't complaining about the work required to address hardware vulnerabilities, however, but the secrecy that comes into play when working to fix the issues. That, he complained, was not how he liked to work.
I love the development model where you can talk to people and work on interesting stuff, and the security issues we've had over the last decade have kind of destroyed that for me.
If it weren't for the secrecy, said Torvalds, "the challenges would otherwise be pretty interesting".
Another frustration for Torvalds is how long it takes to address a bug in the hardware itself. "We can react quite quickly in software", he said, "but then the hardware people are saying, oh, we have five generations of hardware that we can't fix after the fact". And, because the next couple of generations of hardware are already designed, it will take a few more years before new hardware can work around it.
Hohndel asked whether RISC-V, an open hardware platform, was going to be an improvement. Torvalds said that his fear was that RISC-V would make all the same mistakes that we've seen a decade earlier with x86. They'll be fixed more quickly, he predicted, because "by now people have learned something". But looking into a future with RISC-V being widely deployed, he expects to see the same problems that x86 and Arm have had.
Hohndel noted that the RISC-V work will be done in the open, providing an opportunity to come in early and say "we've tried this, and it doesn't work". Torvalds dismissed the idea that open hardware meant flaws would be caught in development, though, and said that there's a "big gulf between software and hardware people" that is hard to work across.
The XZ incident
The XZ backdoor was a dominant topic in presentations and hallway conversations at OSSNA this year. Naturally, Hohndel steered the conversation from hardware security to that topic.
Open source, Torvalds said, relies on "a certain amount of trust". Trusting people around you to do the right thing. Not only open source, he noted, but proprietary software as well. Communities and companies must depend on trusting people, and that trust can be violated.
Torvalds is no stranger to violations of trust. He gave the example of the University of Minnesota (UMN) sending patches with intentional bugs as part of an experiment. Kernel maintainers caught the bad patches, and were really upset about being experimented on. That study was interesting, he said, but "they didn't do it very well".
While the study was poorly executed, most people would agree the UMN incident was not malicious. The XZ backdoor was malicious, and Torvalds pointed out that "nobody had any explicit gates in place to try to catch this". Despite that, though, "it was caught fairly quickly". Not because of procedures or processes, but because it was found randomly when a developer noticed something wrong. But random is good, Torvalds said. It's not possible to always have specific rules in place to catch everything, and when there are rules in place, attackers can try to work around them. The fact that XZ was caught, and quickly, "does imply a fairly strong amount of stability".
However, he did say that this event is a wake-up call, and there are now "a lot of people looking into various measures in the kernel". The biggest defense, said Torvalds, is "a healthy community". The Linux kernel community is that, with an "incredibly big, incredibly entwined and connected community where there are multi-year and multi-decade relationships", he said. It is also, he was quick to point out, an outlier. Many open-source projects are run by just a few people, or just one, whereas "we have 1,000 people that basically participate in every single release every couple of months". So what the kernel does can't apply to 99 percent of other projects.
Here, Hohndel called on the audience to get involved. "Each of you works for a company, have your company adopt a couple of such projects and just participate." Every bit helps, he said, read the code, be part of review, "provide moral support to the maintainers".
But while the bad actors are out there and draw attention, they're not the main problem, according to Torvalds. The main problem is that there will continue to be bugs because no one is perfect, and those need solving as well.
AI
To bring the topic back to something "fun and entirely uncontroversial", Hohndel decided to steer the chat in the direction of AI. If you want to double your salary, he said, just add "AI" to your title. Until it takes all the jobs. "What I find so interesting is this idea that [generative] AI is going to be the end of programmers, the end of authors", Hohndel said. Even Torvalds would be replaced by an AI model.
"Finally!", Torvalds joked. "I hate the hype", he said, but he does find the technology interesting. It has also had some positives, like bringing companies to the table for kernel development. "For example, a company like NVIDIA—who is not exactly famous for being great at interacting with the kernel community—has been much more active". Suddenly, he said, they started caring about Linux. So it has had a positive impact.
However, he cautioned that people should take a wait-and-see approach. And he was optimistic about the technology making it easier to catch kernel bugs. "Making the tools smarter is not a bad thing", he said, but warned against "gloom and doom" or over-hyping the technology.
What's next
Torvalds is no stranger to making tools. A little project called "Git" has also had an enormous impact on the industry. But that, Hohndel pointed out, was more than a decade ago. People want to know "what's next": when will we see another major project?
If Torvalds has his way, the answer to the question is never. "I say that because every single project I've started always started from me being frustrated with other people being incompetent." Or, he added, with their money-grubbing. So he hopes that he doesn't find himself in that situation again, or "that there will be somebody else who solves my problems".
Right now, Torvalds said, "I don't have any huge problems, Linux for me solved all the problems I had way back in '92 or '93". If others hadn't found it useful, "I would not have continued".
By this time, the pair had run out of time. Hohndel said he had many more questions, but they would have to save them for Hong Kong, where the next Open Source Summit will be held in August.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to this event.]
Gentoo bans AI-created contributions
Gentoo Council member Michał Górny posted an RFC to the gentoo-dev mailing list in late February about banning "'AI'-backed (LLM/GPT/whatever) contributions" to the Gentoo Linux project. Górny wrote that the spread of the "AI bubble" indicated a need for Gentoo to formally take a stand on AI tools. After a lengthy discussion, the Gentoo Council voted unanimously this week to adopt his proposal and ban contributions generated with AI/ML tools.
The case against
In his RFC, he laid out three broad areas of concern: copyrights, quality, and ethics. On the copyright front, he argued that LLMs are trained on copyrighted material and that the companies behind them are unconcerned with copyright violations. "In particular, there's a good risk that these tools would yield stuff we can't legally use."
He questioned the quality of LLM output, though he did allow that LLMs might "provide good assistance if you are careful enough". But, he said, there is no guarantee that contributors are aware of the risks. He minced no words about his view of the ethics of AI use: Górny took issue with everything from the energy consumption driven by AI to labor issues and "all kinds of spam and scam". The only reasonable course of action, he said, would be to ban the use of those tools altogether in creating works for Gentoo:
In other words, explicitly forbid people from using ChatGPT, Bard, GitHub Copilot, and so on, to create ebuilds, code, documentation, messages, bug reports and so on for use in Gentoo.
He added that this only extended to works created expressly for the Gentoo project, and did not encompass upstream projects using things like ChatGPT. Andreas K. Hüttel asked whether there were objections to packaging AI software for Gentoo. This did not elicit a response in favor or against on the list, but the AI policy page expressly mentions that the policy does not prohibit packaging AI-related software.
Is this necessary?
Rich Freeman wrote that he thought it made sense to consider the use of AI, but suggested that the Gentoo developer certificate of origin (DCO) already had the necessary language to prohibit AI-generated contributions. "Perhaps we ought to just re-advertise the policy that already exists?" He also poked at the ethical case laid out by Górny, and suggested it would alienate some contributors even if the majority of the project was in favor. Freeman said it was not a bad idea to reiterate that Gentoo didn't want contributions that were just piped out of a GPT application into forums, bug reports, commits, and so on, but he didn't think that required any new policy.
Ulrich Mueller replied that there is overlap with existing policy, but he did not find the proposal redundant, and he supported the idea of a clarification on how to deal with AI-generated code. Sam James agreed with the proposal but worried that it was "slightly performative [...] given that we can't really enforce it." Górny wrote that it was unlikely that the project could detect these contributions, or that it would want to actively pursue finding them. The point, he said, is to make a statement that they are undesirable.
Oskari Pirhonen wanted to know about cases where a contributor uses ChatGPT to help with writing documentation or commit messages (but not code) because they don't have "an excellent grasp of English". If those contributions explicitly called out AI-generated content, would they be acceptable? Górny said that would not help much, and dismissed the quality of content generated by ChatGPT. Mueller wanted to know where the line was: "Are translation tools like DeepL allowed? I don't see much of a copyright issue for these."
In a rare dissent, Matt Jolly responded that Gentoo would always have poor-quality contributions, and could simply use common sense to filter out low-quality LLM material. "We already have methods for weeding out low quality contributions and bad faith contributors - let's trust in these and see what we can do to strengthen these tools and processes." He argued in favor of using LLMs for code documentation, and asked why he had to type out an explanation of what his code does if an LLM can generate something that only requires some editing. The proposal, he said, was a bad idea; banning LLMs "at this point is just throwing the baby out with the bathwater". Guidelines would be fine, even a ban on completely AI-generated works, but he was opposed to "pre-emptively banning useful tools".
James replied that tools trained on Gentoo's current repository should be OK, as should using LLMs to assist with commit messages. But, he said, a lot of FOSS projects were seeing too much AI spam and were not interested in picking the "possibly good" parts out.
David Seifert responded in support of the RFC and asked if it could be added to the next Gentoo Council meeting agenda. Górny said that he had been asked for a specific motion and provided this language:
It is expressly forbidden to contribute to Gentoo any content that has been created with the assistance of Natural Language Processing artificial intelligence tools. This motion can be revisited, should a case been made over such a tool that does not pose copyright, ethical and quality concerns.
Approved
Given the ratio of comments in favor of banning AI-generated contributions to objections to such a ban, it is not surprising that the council voted to accept Górny's proposal. Now the question is how Gentoo implements the ban. In an emailed response to questions, Górny said that Gentoo is relying on trust in its contributors to adhere to the policy rather than trying to police contributions to see if they were generated with AI/ML tools:
In both cases, our primary goal is to make it clear what's acceptable and what's not, and politely ask our contributors to respect that. If we receive contributions that contain really "weird" mistakes, the kind that [do not] seem likely to be caused by a human error, we're going to start asking questions, but I think that's the best we can do.
As AI/ML continues to dominate the tech industry's agenda, Gentoo is unusual in looking to shut it out rather than trying to join the party. How well the policy works, and how soon it is tested, will be interesting to see.
Existential types in Rust
For several years, contributors to the Rust project have been working to improve support for asynchronous code. The benefits of these efforts are not confined to asynchronous code, however. Members of the Rust community have been working toward adding explicit existential types to Rust since 2017. Existential types are not a common feature of programming languages (something the RFC acknowledges), so the motivation for their inclusion might be somewhat obscure.
The benefits of static type systems are well-known, but they do have some downsides as well. Type systems, especially complex ones, can make writing type signatures painful and produce complicated error messages. A recent comment on Hacker News showed an example of types added to the popular SQLAlchemy Python library, lamenting: "Prior to the introduction of types in Python, I thought I wanted it. Now I hate them."
These complaints are hardly new; they drove C++ and Java to adopt auto and var keywords for variable declarations, respectively, in order to save programmers from having to actually write down the lengthy types assigned to values in their programs. Both of these features reduce the burden associated with complex types by letting the compiler do some of the work and infer the relevant types from context. These mechanisms don't represent a complete solution, and cause their own set of problems, however. Using them, it is easy to accidentally change the inferred type of a variable in a way that breaks the program. And the resulting error messages still refer to the full, unreadable types. Additionally, local type inference doesn't help with types in function signatures.
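The pitfall is easy to reproduce with Rust's own let inference (an illustrative sketch; the average() function is invented here, not taken from the article):

```rust
// The variable's type is whatever the initializer produces, so editing the
// initializer silently changes the inferred type, and any resulting error
// shows up at the use sites rather than at the declaration.
fn average(values: &[f64]) -> f64 {
    let total = values.iter().sum::<f64>(); // inferred as f64
    // If `total` were instead computed from integer data, the division
    // below would stop compiling, with the error pointing here rather
    // than at the line that actually changed.
    total / values.len() as f64
}
```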
There are solutions to all of these problems — the C++ committee introduced concepts, which can help simplify some complex types behind an interface, partially to address this — but Rust has been trying to avoid falling into the trap altogether, despite an increasingly complex type system of its own. Existential types are one mechanism intended to make dealing with complex types easier. Unfortunately, they are also currently not well-explained or well-understood outside a few specific niches. The RFC calls this out as one problem with the current status quo:
The problem stems from a poor understanding of what "existential types" are — which is entirely unsurprising: existential types are a technical type theoretic concept that are not widely encountered outside type theory (unlike universally-quantified types, for instance). In discussions about existential types in Rust, these sorts of confusions are endemic.
Existential types get their name from mathematical logic via the existential quantifier, but the realization of the concept in an actual programming language like Rust is a good deal less abstract. Simply put, existential types are types that exist, but which cannot be directly manipulated outside of their scope. Normal generic types (referred to as universally-quantified types in the quote above) let the caller of a function decide what concrete type the function should be called with. In this circumstance, the function can only interact with values of this type as opaque values, because it doesn't know what type the caller will choose. Existential types invert the direction of that control, letting the function itself decide what concrete type should be used, while the caller of the function must now treat the values as being of an unknown type.
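The contrast can be made concrete in a few lines of stable Rust (a sketch for illustration; the function names are invented):

```rust
use std::fmt::Debug;

// Universal quantification: the *caller* chooses T, so the function body
// can only treat `value` opaquely, through its Debug bound.
fn show<T: Debug>(value: T) -> String {
    format!("{value:?}")
}

// Existential: the *callee* chooses the concrete type (a filtered Range
// here); the caller only learns "some type implementing Iterator".
fn evens() -> impl Iterator<Item = u8> {
    (0u8..10).filter(|n| n % 2 == 0)
}
```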
Existential types today
Rust actually already has a limited form of existential types, just not by that name. Instead, the Rust documentation refers to them as impl Trait types. They allow the programmer to say that a function takes or returns some type that implements a trait, without actually saying what that type is. For example, the caller of this function can use the return value as an iterator, but cannot see what type it has (in this case, a Range):
    fn example() -> impl Iterator<Item = u8> {
      0..10
    }
impl Trait types are useful for abstracting away API details without introducing any kind of runtime indirection. At compile time, the compiler knows the specific concrete type that underlies an impl Trait type, but it doesn't need that type explicitly written out, nor does it need to complicate error messages by showing it. In contrast to a mechanism like auto, changing the body of the function in a way that results in returning a type incompatible with the type signature (in this case, one that is not an Iterator) still causes a type error.
Abstracting away the inferred type like this is especially useful for asynchronous functions, which are syntactic sugar for functions that return impl Future. Since asynchronous functions return existential types under the hood, any limitations or improvements to existential types affect asynchronous functions as well. Existential types are also useful for returning closure types, which do not actually have names in Rust. (A design decision made for efficiency reasons that C++ actually shares — it permits better inlining of anonymous functions.)
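That desugaring can be demonstrated with the standard library alone. In this sketch (the noop_waker() and poll_once() helpers are invented for illustration), an await-free async function is polled once and completes immediately, showing that an async fn really is a function returning an existential impl Future type:

```rust
use std::future::Future;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Sugar for `fn double(x: u8) -> impl Future<Output = u8>`; the compiler
// picks the concrete future type, and the caller never sees its name.
async fn double(x: u8) -> u8 {
    x * 2
}

// A minimal waker that does nothing, sufficient for a future that never
// actually suspends.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Poll the future a single time; an await-free async fn is Ready at once.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let mut fut = Box::pin(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    }
}
```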
In 2018, Rust gained the ability to have impl Trait types as the argument to a function as well. However, these types still remain quite constrained compared to full existential types. For example, they can't appear in type aliases, or be stored inside structures. It's only in December 2023 with Rust 1.75 that they were allowed as return values from trait methods.
Existential types in the future
But there is one more subtle restriction on impl Trait types — every occurrence in the program refers to a different concrete type. Two functions that both return impl Debug, for example, could very well return two entirely different types. This makes it hard to write a collection of functions (such as implementations of the same interface for different configurations or architectures) that are all guaranteed to return the same type, without explicitly writing out that type.
There is a workaround for that use case, but it involves a layer of run-time indirection by making functions return a trait object — a heap-allocated structure full of function pointers that presents an opaque interface to a value. Using trait objects is a poor substitute for existential types for a few reasons. For one, it has a noticeable performance overhead because it prevents static method resolution and function inlining. For a language that prides itself on providing zero-cost abstractions, requiring programs to use runtime indirection is unacceptable. For another, returning trait objects can't quite express the same guarantees that existential types can.
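The trait-object workaround looks like this in practice (an illustrative sketch; the function names are invented):

```rust
use std::fmt::Debug;

// A trait object erases the concrete type behind a heap allocation and a
// vtable, so both functions share the single return type Box<dyn Debug>.
fn make_char() -> Box<dyn Debug> {
    Box::new('a')
}

fn make_int() -> Box<dyn Debug> {
    Box::new(42u8)
}

// Unlike two distinct `impl Debug` types, trait objects can live in one
// collection, at the cost of dynamic dispatch on every method call.
fn both() -> Vec<Box<dyn Debug>> {
    vec![make_char(), make_int()]
}
```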
The next step on the road toward full existential types is allowing them to be used in type aliases, which would make their use more consistent with other types in Rust. That change would allow programmers to write things like this:
    type Foo = impl Debug;
    fn function1() -> Foo {
      'a'
    }
    fn function2() -> Foo {
      'b'
    }
Critically, these functions are now guaranteed to return values of the same type, which lets programmers express patterns that were not previously possible. This is also the missing piece to allow impl Trait types to be stored in structures. In current Rust, the concrete type underlying an impl Trait type is only inferred when processing a function's arguments or return types — which is sufficient for the existing uses of existential types, but not for full existential types. When support for existential types in Rust is fully complete, the compiler should be able to infer the type of a member of a structure from how it's used. For now, permitting existential types in type aliases as the RFC does provides a workaround:
    struct Bar {
      item: impl Debug, // Error, can't infer underlying type
    }
    // Code using the RFC:
    type Quux = impl Debug;
    struct Bar {
      item: Quux,
    }
    // Later uses of 'Quux' let the compiler infer a concrete type.
    fn function3() -> Quux {
      42
    }
This should cover a number of use cases, because the most common reason to store a value of an existential type in a structure is that it is produced by some method, and is not otherwise storable except by converting it to a trait object.
This work is the last major step toward existential types that can be used in all the same ways as Rust's existing types. The RFC points out the confusion that the current piecemeal solution causes as one reason to want a version of existential types that can be used everywhere: "it is valuable from documentation and explanatory angles to unify the uses of impl Trait so that these types of questions never even arise."
Glen De Cauwsemaecker commented on the work in November 2023, saying that he had tried to use asynchronous functions in some of his networking code, but had run into serious usability problems when combining asynchronous functions with traits. After struggling to express the interface he wanted, he ended up using the experimental feature for existential type aliases:
The feature and RFC tracked in this issue works beautifully. It has none of the ergonomic pitfalls, requires no breaking changes in existing future code, in general plays very nice with Rust and the way it handles async code through futures. It just works.
Despite positive endorsements like that, work on bringing full existential types to Rust has not exactly been smooth. In keeping with the Rust community's approach to building complex features, extensions to impl Trait types have trickled in over time as small chunks of the whole feature. For example, programmers can now write trait methods that return an impl Trait type, which is internally de-sugared to an associated existential type alias — but writing an associated existential type alias by hand is not yet supported. Rust 2024 is also expected to change how impl Trait types capture lifetime information.
This piecemeal approach means that there are still design questions about how existential types should interact in some cases with the rest of Rust's increasingly complicated type system. Another feature currently in development is "associated type defaults", which would permit specifying a default value for a trait's associated type. How this would interact with existential type aliases is still up in the air.
Even though the road to bringing existential types to Rust has been long, it does seem likely that the last remaining design problems will be sorted out in the near future. Existential types would, among their ancillary benefits, make writing asynchronous functions in certain contexts (such as storing their returned impl Future values in a structure, among other uses) a good deal more ergonomic. Polishing Rust's story for asynchronous programming is one of the roadmap goals for Rust 2024, and the focus of substantial effort by Rust's contributors.
Rust for embedded Linux kernels
The Rust programming language, it is hoped, will bring a new level of safety to the Linux kernel. At the moment, though, there are still a number of impediments to getting useful Rust code into the kernel. In the Embedded Open Source Summit track of the Open Source Summit North America, Fabien Parent provided an overview of his work aimed at improving the infrastructure needed to write, in Rust, the device drivers needed by embedded systems; there is still some work to be done.
Parent started with the case for using Rust in the kernel; it may not be a proper justification, he said, but it is true that Rust is one of the most admired languages in use. C is about 50 years old and has not changed much since the C89 standard came out. It has the advantage of a simple syntax that is easy to learn, and it is efficient for writing low-level code. But C also makes it easy to write code containing undefined behavior, and it lacks memory-management features.
Rust, instead, is about ten years old and has a new release every six weeks. It is harder to learn, and forces developers to come up to speed on concepts like ownership and borrowing. But the code produced is efficient; Rust's abstractions are meant to be zero-cost, with the verification work done at compile time. Rust forces developers to handle errors, eliminating another frequent cause of bugs.
Thus, he said, it makes sense to use Rust in the kernel, hopefully leading to safer code overall. There is basic Rust support in the kernel now, but it is focused on driver code. There is currently no plan to support core-kernel code written in Rust, partly because the LLVM-based rustc compiler, which is the only viable compiler for Rust code currently, does not support all of the architectures that the kernel does. Rust support in the kernel is still considered to be experimental.
There are some drawbacks to using Rust in the kernel, starting with the current drivers-only policy. Most kernel vulnerabilities, he said, are not actually in driver code; instead, they appear in core code like networking and filesystems. As long as Rust is not usable there, it cannot help address these problems. Adding Rust, of course, will complicate the maintenance of the kernel, forcing maintainers to learn another language. The abstractions needed to interface Rust to the rest of the kernel are all new code, some of which may well contain bugs of its own.
Parent became interested in Rust after stumbling across a sample GPIO driver in Rust on LWN. He immediately started trying to write some kernel code in Rust, but failed soon thereafter. At this point, there simply is not a lot of kernel code that a new developer can use to learn from. So, instead, he went and rewrote all of his custom tools in Rust; after that, he was better prepared to work on the kernel.
There are, he said, a lot of people trying to contribute to the Rust-for-Linux effort; there is an online registry containing much of that work. But many of the basic abstractions needed for useful Rust code still are not in the mainline, and that is preventing others from making progress. The work that is seemingly advancing, including support for graphics drivers, Android's binder, and filesystems like PuzzleFS, is not useful for the embedded work that Parent is interested in. Most of this work has been done on x86 systems, with the exception of the Apple M1 GPU driver. Many of the key abstractions needed for embedded work are missing from the mainline kernel; many of them do exist, but out of tree and often unmaintained.
Parent had a long list of requirements for embedded systems, starting with support for the Rust language on 64-bit Arm systems; that, at least, has been merged for the upcoming 6.9 kernel release. Many abstractions for subsystems like clocks, pin control, run-time power management, regulators, and so on are not yet there. The abstractions have proved to be a challenge; maintainers will not merge code that is not used elsewhere in the kernel, but drivers cannot be merged until the abstractions are there. That leads to a situation where a lot of people are involved, each of whom is waiting on pieces from the others. That makes it hard to get the pieces upstream.
Parent's objective is to write simple drivers with minimal dependencies, each of which can be used to get a small number of abstractions upstream. He gave as an example a regulator driver that needs a relatively small set of abstractions, including those for platform drivers, regulators, regmap, I2C drivers, and Open Firmware for probing. He will be trying to get that set upstream; from there, work can proceed to more complex drivers.
The (conspicuously undocumented) regmap interface was called out for how it can showcase the advantages of Rust. Regmap eases access to devices that export an array of registers for configuration and operation. The Rust regmap abstraction allows the provision of a type-safe interface, built on top of the regmap_field API, that is generated with some "macro magic". The type checking allows the interface to ensure that register operations use the correct data types with each register, catching a number of common errors.
Parent's next step is to upstream a lot of this work, a task that, he acknowledges, will be difficult. But, if nothing else, he has learned a few lessons, starting with the fact that abstractions are more complex than one might expect, and they will have bugs. One problematic area is in ownership of resources; that is going to be hard to nail down for as long as there are extensive interfaces between the Rust and C sides. He advised other Rust developers to not try to write complete abstractions at the outset; instead, only the parts that are actually needed should be implemented.
Linked lists, a famous point of difficulty for Rust in general, present a special hazard in kernel code. The Rust compiler likes to move data around as a program runs; if that data happens to be a structure containing linked-list pointers, moving it will break the list and create hard-to-find bugs. Adding a list_head structure to an existing C structure can, as a result, break a Rust abstraction built on that structure in ways that are hard to detect automatically. The way he talked about this problem suggested a certain amount of hard-earned experience.
Even so, he summarized, writing kernel code in Rust makes a lot of things easier. Error handling is much more straightforward, and the compiler can ensure that developers have handled all possible values. Driver code tends to be a lot shorter and, he said, if the code compiles, it is likely to work.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our travel to this event.]
Warning about WARN_ON()
Kernel developers, like conscientious developers for many projects, will often include checks in the code for conditions that are never expected to occur, but which would indicate a serious problem should that expectation turn out to be incorrect. For years, developers have been encouraged (to put it politely) to avoid using assertions that crash the machine for such conditions unless there is truly no alternative. Increasingly, though, use of the kernel's WARN_ON() family of macros, which developers were told to use instead, is also being discouraged.
A longstanding way to test for a condition that cannot be recovered from is the BUG_ON() macro, which includes a test for the unexpected condition:
    /* This can never happen, honest, would I lie? */
    BUG_ON(foo_ptr == NULL);
A BUG_ON() call leads directly to a kernel panic, resulting (usually) in the machine being rebooted. There are times when there is no alternative, but use of BUG_ON() has been discouraged for years. Crashing the machine deprives the user of any chance of reacting to the problem or saving work and can make it harder to track down the source of the problem. Even so, there are something like 12,000 BUG_ON() instances in the kernel (not counting BUILD_BUG_ON(), which only affects the build process and is not discouraged in the same way).
Instead, developers are told to use WARN_ON(), which puts a traceback into the kernel log but does not crash the machine (in theory, at least, but keep reading). The kernel's coding-style document says:
Do not add new code that uses any of the BUG() variants, such as BUG(), BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not required if there is no reasonable way to at least partially recover.
Increasingly, though, developers are being told to avoid using WARN_ON() as well. There are a couple of reasons for that. One is that any WARN_ON() that can be triggered from user space can be used, at a minimum, to spam the system log, obscuring other events and perhaps affecting system performance. The other reason, though, applies even to WARN_ON() calls that cannot be triggered in this way.
The kernel contains a sysctl knob called panic_on_warn. It does exactly what its name suggests: if this option is set, any WARN_ON() call will cause the system to panic. In essence, it turns WARN_ON() calls into BUG_ON() calls. This option is set by users who see any warning as a sufficiently suspicious event that, when one happens, it is better to kill the system and start over. Such users include many Android devices and the host kernels run at cloud providers (and beyond). Any WARN_ON() that actually triggers, in other words, has the potential to bring down a lot of machines.
The same coding-style document advises developers that this outcome is something that panic_on_warn users have explicitly opted into:
However, the existence of panic_on_warn users is not a valid reason to avoid the judicious use of WARN*(). That is because, whoever enables panic_on_warn has explicitly asked the kernel to crash if a WARN*() fires, and such users must be prepared to deal with the consequences of a system that is somewhat more likely to crash.
The current pressure against WARN_ON() use is not entirely consistent with this advice, though. Thus, Alex Elder was recently motivated to send a patch changing the advice given in the coding-style document. Gone is the language suggesting that panic_on_warn users were getting what they asked for; the new text reads:
The existence of this option is not a valid reason to avoid the judicious use of warnings. There are other options: ``dev_warn*()`` and ``pr_warn*()`` issue warnings but do **not** cause the kernel to crash. Use these if you want to prevent such panics.
Christoph Hellwig was quick to call this change "wronger than wrong": "If you set panic_on_warn you get to keep the pieces". Laurent Pinchart pointed out that the suggested alternatives are not the same; they are much easier to ignore and, thus, less effective at getting developers to fix the problem that the warning is trying to draw attention to. Greg Kroah-Hartman, though, was happy to see this change. The recommendation to avoid panic_on_warn has been ignored, he said, so new WARN_ON() calls should not be added.
To summarize the situation: over the years, BUG_ON() has been seen as so destructive that developers are simply told not to use it at all. The WARN_ON() macro has, instead, taken its place; but in settings where panic_on_warn is set, the end result of a WARN_ON() call is essentially the same. So, naturally, use of WARN_ON() is also now discouraged much of the time.
Whether the proposed documentation change will be applied is unclear; the kernel's befuddled documentation maintainer, who has happily not been appointed the arbiter of the kernel's coding style, makes a point of not applying coding-style changes in the absence of a clear consensus. It is not clear that a consensus on this change exists currently. Regardless of that change, though, developers will continue to be encouraged toward logging functions like pr_warn() instead of WARN_ON() — until somebody inevitably adds a panic_on_pr_warn sysctl knob and the whole process starts over again.
Weighted memory interleaving and new system calls
Gregory Price recently posted version 4 of a patch set that adds support for weighted memory interleaving — allowing a process's memory to be distributed between non-uniform memory access (NUMA) nodes in a more controlled way. According to the performance measurements he includes, the patch set could provide a significant improvement for computers with network-attached memory. The patch set also introduces new system calls and paves the way for future extensions intended to give processes more control over their own memory.
Modern computers can have a variety of kinds of memory in use at the same time. Not just traditional NUMA between separate banks of RAM within the same computer, but also memory distributed across a data center, like Compute Express Link (CXL) attached memory. These technologies allow computers to support much larger amounts of memory, at the cost of significantly complicating memory management and slower memory access speeds.
Current Linux kernels group different kinds of memory into tiers based on their latency. LWN covered how they interact with an earlier version of Price's patch set in October. The kernel also allows configuring processes to have different pages of their memory resident on different NUMA nodes. This spreads out the load between the separate parts of memory, but it's not perfect. For one thing, banks of memory can have different available bandwidths. The current default behavior is to assign allocated pages to different nodes in a round-robin way, which could over-allocate the bandwidth of the least-capable bank, even if other banks have more available capacity.
Price's patch set lets users specify unique weights for each NUMA node, and uses those weights when distributing freshly allocated pages. These weights are configured globally, but can be applied to specific processes using the kernel's NUMA memory-policy support. Only tasks that have the new MPOL_WEIGHTED_INTERLEAVE memory policy will use the weights.
The cover letter of the patch set includes a performance comparison (contributed by several different people) demonstrating how much better weighted interleaving can perform than the default round-robin scheme. In brief, it compares four settings for the same workloads: plain DRAM, CXL attached memory with the default interleaving policy, CXL memory with global weights according to bandwidth, and "targeted" weights that use different settings for the executable code, stack, and heap of the process. The default interleaving policy is on average 78% slower than DRAM. The global weights bring that performance to between 6% slower and 4% faster than DRAM depending on workload, and correctly chosen targeted weights push the performance to 2.5% to 4% better than DRAM.
Targeted weights have such a dramatic effect because different areas of a process's memory can have different access patterns that give an advantage to one memory policy or another. Memory policies for a whole process or a specific area of memory are configured with set_mempolicy() and mbind() respectively:
    long set_mempolicy(int mode, const unsigned long *nodemask,
                       unsigned long maxnode);
    long mbind(void addr[.len], unsigned long len, int mode,
               const unsigned long nodemask[(.maxnode + ULONG_WIDTH - 1)
                                            / ULONG_WIDTH],
               unsigned long maxnode, unsigned int flags);
The signature of mbind() introduces problems for weighted memory interleaving, however; the signature cannot be extended, because it is running up against the limits of how many arguments can be provided to a system call (at most six). Price's patch set rectifies this by introducing a new system call — mbind2() — that takes a structure as an argument, but otherwise performs the same function.
    struct mpol_args {
      __u16 mode;
      __u16 mode_flags;
      __s32 home_node;
      __u64 pol_maxnodes;
      __aligned_u64 *pol_nodes;
      /* Optional: interleave weights for MPOL_WEIGHTED_INTERLEAVE */
      unsigned char *il_weights;    /* of size MAX_NUMNODES */
    };
    mbind2(unsigned long addr, unsigned long len, struct mpol_args *args,
           size_t size, unsigned long flags);
The mpol_args struct is intended to be extensible over time, which is why the system call includes the structure's size. Any new extensions to the memory-policy infrastructure can add options to the end of the structure, and old callers won't need to be updated. Price's patch set also adds set_mempolicy2() and get_mempolicy2() using the same scheme, to support setting or retrieving task-wide weights, respectively.
Like existing memory-policy settings, Price's new weighted-interleaving policy is not a hard-and-fast rule. Setting a new memory policy does not migrate existing pages (unless the MPOL_MF_MOVE option is specified), and if the preferred NUMA node has no more memory available, the kernel will fall back to using the next available NUMA node. Price spelled this out explicitly in response to concerns brought up during review:
This interface does not limit memory usage of a particular node, it distributes data according to the requested policy.
Nuanced distinction, but important. If nodes become exhausted, tasks are still free to allocate memory from any node in the nodemask, even if it violates the requested mempolicy.
This is consistent with the existing behavior of mempolicy.
The weighted-interleaving patch set has gathered relatively little commentary, perhaps because the idea itself has been in progress for a long while. Price posted a related patch set in November that would have changed set_mempolicy() to take a process ID as an additional argument. The change would allow privileged processes to set memory policies for other processes. At the time, Price described the November patch set as being designed "to make mempolicy more flexible and extensible, such as adding interleave weights (which may need to change at runtime due to hotplug events)". Because mbind() already passes six parameters, however, that change would have needed new system calls as well. The November patch set did not end up being merged. Now that Price's newer weighted-interleaving patch set introduces the needed system calls, it is possible that another version of the older patch set will follow once the weighted-interleaving one is accepted.
Price's weighted-interleaving patch set does seem likely to be merged [Update: a reader points out that some of these changes, but not the new system calls, were merged under a different name as part of 6.9], given the impressive number of Suggested-by tags in the cover letter and the minimal objections from Ying Huang and Geert Uytterhoeven, who reviewed it. It seems as though many people are eager to have more control over how their processes' memory is distributed.
A change in direction for security-module stacking?
The long-running effort to complete the work on stacking (or composing) the Linux security modules (LSMs) recently encountered a barrier—in the form of a "suggestion" to discontinue it from Linus Torvalds. His complaint revolved around the indirect function calls that are used to implement LSMs, but he also did not think much of the effort to switch away from those calls. While it does not appear that a major course-change is in store for LSMs, it is clear that Torvalds is not happy with the direction of that subsystem.
In an April 9 post to the linux-security-module mailing list, Torvalds decried the stacking plans in part because "we just had *another* hardware security issue with speculated indirect branches go public". He was referring to the branch history injection flaw that was the most recent in a long line of speculative-hardware vulnerabilities. Torvalds said that he recognized that stacking LSMs was a "design decision and a target" for over a decade, but it needs a rethink:
So I say "suggestion" in the subject line, but really think it needs to be more than that: this whole "nested LSM" stuff as a design goal just needs to be all rolled back, and the new design target is "one LSM, enabled statically at build time, without the need for indirect calls".
He also said that he was aware of KP Singh's work to use static calls to avoid the indirect function calls in LSMs, but seemed to suggest that the patches were "random hacks".
There are some seeming misunderstandings in Torvalds's complaints, however. For one thing, there is no "nesting" of LSMs—"stacking" either really—the security solutions are composed, instead. A given hook function in the core kernel will effectively traverse the list of active LSMs, calling the corresponding hook function if present for an active LSM, until it gets a denial, which short-circuits the rest of the calls. If no LSM denies the access, it is allowed.
There are a number of real use cases for having multiple LSMs active in the kernel. Our 2022 article on the feature describes the history of how we have gotten to this point and why it is important to be able to enable multiple LSMs on current systems. It has been possible to compose any number of "minor" LSMs for years now, but the final push is on to allow more than one "major" LSM (e.g. SELinux, Smack, AppArmor) to be enabled. The main reason behind the need for that is containers, so that a Fedora container that uses SELinux can run on an Ubuntu host that uses AppArmor, for example.
In a response to Torvalds, Kees Cook patiently pointed out some of that history, including the reasons behind the LSM-stacking work. In the end, he said, it has simplified things to the point where subsystems that logically should be LSMs could be switched:
The general "LSM stacking" work, while it does add a certain kind of complexity, has actually made the many pre-existing manual layering of LSMs much more sane and simpler to reason about. Now the security hooks are no longer a random sprinkling of calls tossed over the core kernel, and are now all singularly well defined. This started years ago with pulling the "capabilities" checking into a well-defined LSM, and continued from there for things like Yama, and has now finally reached the last, and perhaps most historically invasive, LSM: IMA/EVM [Integrity Measurement Architecture/Extended Verification Module], which is finally now a well defined LSM too.
I don't think it's sane to demand that LSM stacking be removed. That's just not the world we live in -- we have specific and large scale needs for the infrastructure that is in place.
Cook also disagreed with the characterization of static calls, noting that they have been needed by the LSM subsystem for over a year just for the performance benefits. But Torvalds strongly disagreed; he said that the reason for stacking is: "Just because you people cannot agree". He also explained that it was not static calls themselves that were random hacks, but that the use of them for LSMs is, in part because of the random-seeming limit of 11 levels of "nesting". His parting shot was to further paint the LSMs as an attack vector against the kernel.
As might be guessed, Cook saw things differently. He noted, again, that stacking has been around for quite some time now; his current system has five separate LSMs activated, not to mention the capabilities LSM that is always present. "Stacking" is not removable at this point, but, beyond that, the most recent vulnerability is not in the LSM subsystem: "the attack vector is broken CPUs". In addition, the array to hold the static calls needs to have a limit, and there are 11 LSMs available for the kernel, which is why that number was chosen.
LSM maintainer Paul Moore was rather unhappy with another part of Torvalds's message. For whatever reason, Torvalds was unable to resist taking a shot at the LSM subsystem and its developers in his initial message:
Yes, I realize that the "security" in the LSM name is a bad joke, and that to a first level approximation the LSM people don't actually care about real security, and that the goal is just "policy".
Moore wondered if the insult was really just rooted in stress from yet another hardware flaw affecting the kernel, but even so, the effects will be borne by the LSM developers. Because of who he is, Torvalds's words have much greater weight, Moore said. It is thus rather ironic that Torvalds is asking—"(demanding? it's hard to say at this point)"—those he just insulted to rework their subsystem. Moore pointed out that insults are not likely to be particularly motivating. Beyond that, as Cook had pointed out, it is far too late to remove stacking entirely. The LSM developers will act on Torvalds's email, Moore said, but the first step is to reduce the performance penalty of the indirect calls—and, in the process, mitigate the hardware security flaws they expose—by getting the LSM static calls patches merged. "The rest will need more discussion, preferably after things have cooled down and we can all look at things with a more objective lens."
Casey Schaufler, who has been pushing the full LSM-stacking work upstream for 12 years or more at this point, replied to Torvalds's complaints by agreeing with some of his points. As with other developers, he is completely in favor of replacing the indirect calls, but is unsure what they should be replaced with if static calls are not the right approach. "While I can't change the brain dead behavior of 21st century hardware I am perfectly willing to re-write the entire $%^&*( LSM layer if it can be done in a way that makes you happy." But Moore said that Schaufler should not head down that path; Moore has no plans to move to "a single-LSM approach to satisfy a spur of the moment comment triggered by the latest hardware flaw". He repeated his plan to convert the LSMs to use static calls "and go from there".
Meanwhile, Greg Wettstein thought that more sweeping changes are needed for LSMs in order to support "an environment where there are going to be multiple and potentially industry specific mandated security controls". While he agrees that the performance and attack-vector characteristics of indirect branches need to be mitigated, he does not see static calls as the right path, at least given the current LSM architecture.
There needs to be an 'LSM' architecture that allows a security policy to be implemented for a process hierarchy, just like we do for all other resource namespaces. Platform owners are then free to select whether they want to implement multiple orthogonal security controls or lock the platform into a single control of their choosing.
While that may sound like a situation tailor-made for a BPF solution, he cautioned against that approach, as well, citing the discussion about a recent patch. There have been no replies to his post, however, which may be an indication that radical changes along those lines are fairly unlikely.
In truth, Torvalds's post seems to have been made in haste—coupled with serious unhappiness about the latest hardware flaw. Backing out all of the LSM-stacking work seems well-nigh impossible at this point, especially considering the user-space compatibility guarantees that Torvalds himself regularly enforces. Beyond that, the container use case for multiple major LSMs is not going away either, so some sort of solution will be needed there. The LSM development community seems willing to engage on alternate solutions, but one suspects that what has come out of more than a decade of effort will eventually be adopted.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: GitHub malware; Tille elected; Fedora 40; QEMU 9.0; Open Home Foundation; udev-hid-bpf; Firefox crash reporting; Quotes; ...
- Announcements: Newsletters, conferences, security updates, patches, and more.
![Linus Torvalds and Dirk Hohndel](https://static.lwn.net/images/2024/Dirk-Linus-ossna-sm.png)