
LWN.net Weekly Edition for September 30, 2021

Welcome to the LWN.net Weekly Edition for September 30, 2021

This edition contains the following feature content:

  • A fork for the time-zone database?: a dispute over pre-1970 time-zone history leads to talk of forking tzdb.
  • Improvements to GCC's -fanalyzer option: the state of GCC's static analyzer and what is coming in GCC 12.
  • Two security improvements for GCC: call-used register wiping and automatic initialization of stack variables.
  • Taming the BPF superpowers: an auditing alternative to signed BPF programs.
  • The 2021 Kernel Maintainers Summit: coverage of this invitation-only gathering, including sessions on the UMN episode, requirements for accelerator drivers, and the trouble with upstreaming.

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Comments (none posted)

A fork for the time-zone database?

By Jake Edge
September 28, 2021

A controversy about the handling of the Time Zone Database (tzdb) has been brewing since May, but has come to a head in recent weeks. Changes that were proposed to simplify the main database file have some consequences in terms of time-zone history and changes to the representation of some zones. Those changes have upset a number of users of the database—to the point where some have called for a fork. A September 25 release of tzdb with some, but not all, of the changes seems unlikely to resolve the conflict.

The time-zone database is meant to track time-zone information worldwide for time periods starting at the Unix epoch of January 1, 1970. But, over the years, it has accumulated a lot of data on time zones and policies (e.g. daylight savings time) going back many years before the epoch. As with anything that governments and politicians get involved with, which time zone a country (or part of a larger country) is in, whether it participates in daylight savings time (DST), and when the DST switches are made, are arbitrary and subject to change, seemingly at whim. Tzdb has been keeping up with these changes so that computer programs can handle time correctly since 1986 or so, when it was often called the "Olson database" after its founder, Arthur David Olson.

Merging time zones

Back in May, tzdb maintainer Paul Eggert proposed two changes, one of which was winnowed out during the discussion. The other was to merge zones (identified by location strings, such as "Europe/Berlin") that have the same post-1970 history under a single name. The entries that got merged out would be maintained as links to the merged zone and their pre-1970 history would be moved into the backzone file that is also distributed with the database. Which zone name would be the "winner" is based on the rules embodied in the tzdb theory document; the most populous choice would be the main zone, while the others would be relegated to the status of links.
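In the tzdata source format, a merged-out zone is reduced to a one-line link to the surviving zone, with its own Zone entry (and pre-1970 history) moving to backzone. Under the original proposal, for example, Europe/Oslo would have appeared in the main file only as something like this (simplified illustration):

  # Link  TARGET         LINK-NAME
  Link    Europe/Berlin  Europe/Oslo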

The maintainer of the Joda-Time library for date and time handling in Java, Stephen Colebourne, disagreed strongly with the change and asked that it be reverted. One of the main problems is that programs using tzdb and looking at pre-1970 dates for places that lost out in the merge will get incorrect time-zone information. The pre-1970 time-zone information for those regions will, effectively, be lost, he said:

[...] many places (eg Anguilla, Antigua and Aruba) are now sharing time-zone history, and that history is **from some other zone**. That seems completely unacceptable.

While I understand the motivation to remove the burden of pre-1970, that cannot come at the cost of giving a place the history of somewhere completely different.

In another message, he pointed out other problems that he saw with the changes, which are partly political in nature:

For example, Norway's and Sweden's time zone history is being wiped out in favour of that of Germany. Can no-one here see the political sensitivity in that?

This has a very serious impact on Joda-Time because it normalizes time-zone IDs. (It treats a Link as the key to the normalization, so anything at the weak end of a Link is replaced by the ID at the strong end. You might complain that it shouldn't do that, but it has operated that way for 20 years...)

This code:

  DateTimeZone zone = DateTimeZone.forID("Europe/Stockholm");
  System.out.println(zone);

will print "Europe/Berlin" if this change is not reverted. I consider this to be catastrophic.

Since Berlin and Stockholm (Oslo, also, as mentioned elsewhere) share the same post-1970 time-zone history, they would all get merged under "Europe/Berlin" as Berlin is the most populous. The pre-1970 history for those countries (Sweden and Norway) would get moved to the backzone file; the data is still available, but many applications have gotten used to getting accurate historical information without consulting backzone. For his part, Eggert is trying to solve an unfairness problem within the database:

Why should we maintain Norway and Sweden's time zone histories, when we don't maintain the histories for Guangdong, KwaZulu-Natal, Thanh Hóa, or Uttar Pradesh? Aside from politics, these regions are similar: although all the regions have distinct timestamp histories with data that I can cite, all the regions can be merged into other tzdb regions (Norway into Berlin, Guangdong into Shanghai, etc.) if we consistently limit tzdb's scope to regions that differ after 1970. Given all that, why should Norway and Sweden continue to be special?

These are not particularly-obscure examples, as Guangdong etc. all have more people than Norway or Sweden do. It would be political to continue to focus on Norway and Sweden while excluding Guangdong etc. purely for reasons unrelated to timekeeping.

He also pointed out that Joda-Time already has to deal with these kinds of merges; earlier releases, including the 2021a release from January, have merged zones. "Whatever techniques people use for these longstanding links should also work for the new links." Eggert is convinced that he is solving a real problem here:

The current patch was not prompted by purism. It was prompted by a complaint from a user who made a good point about the politics of tzdb 2021a, which can reasonably be interpreted to favor countries like Norway etc. over countries like Kosovo etc. Rejecting this kind of complaint and saying "we've always done it that way" is not a promising path forward.

While he believes that users will not really be affected by the changes and that the merge process (which has been ongoing for a number of years in a slower, less-visible manner) is working well, some disagreed. Derick Rethans, who maintains the date and time handling for PHP, Hack, and MongoDB, said that the cleanups are making things worse, in part because they ignore backward compatibility. Colebourne was even more blunt:

Let me be clear - this change cannot stand. The reliability of TZDB has declined considerably over the past few years, but it is time to say enough is enough. This is where the line in the sand needs to be drawn.

Part of the problem is that currently backzone has a fair amount of poor quality data that got shifted out of the main database long ago for that reason. Moving well-researched historical data into that file (for, say, Sweden) makes it difficult to distinguish the two. Eggert said that, currently, the database can be built either with or without the backzone data, but it's an all-or-nothing choice; that could perhaps change moving forward:

For example, if a downstream user wants the 'backzone' entry for Europe/Stockholm which is well-documented, but doesn't want backzone's America/Montreal entry because it's not well-attested and is most likely wrong, the user could specify a list of backzone names that includes Europe/Stockholm but excludes America/Montreal. I think it would not be too much work to add something like this to the tzdb code.

A problem with that approach is that applications may just generally consult whatever tzdb the operating system has installed. Today that means they will get proper time-zone information for, say, Norway on pre-1970 dates, but down the road they would not, unless the operating system builds a version including some of the data from backzone. Different choices of exactly which data to include could easily create incompatibilities between systems for pre-1970 dates.
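For reference, the choice of whether to compile in the backzone data is made at build time; roughly, assuming the standard tzdb Makefile:

  # default build: only data meant to be accurate for 1970 and later
  make install
  # also compile the pre-1970 history kept in the backzone file
  make PACKRATDATA=backzone install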

Charter breach?

On June 3, Colebourne formally requested a reversion of the time-zone merging because he said it breached the RFC 6557 charter. For one thing, the TZ Coordinator (i.e. Eggert) "has not taken into account the views of the mailing list" as required by the charter. Furthermore, the cleanups are not within the scope of the charter, he said.

Multiple people spoke up in support of Colebourne's message, though not all of them agreed that Eggert's plans were a breach of the charter. The clear consensus in that thread, though, was that the changes should be reverted so that some other solution could be found. Maintainers of the date and time code for multiple projects were opposed to the changes, though perhaps not as an official position of the project, at least yet. While the backzone was mentioned, it is not truly a workable solution for projects like PostgreSQL, as pointed out by Tom Lane:

However, the Postgres project is finding itself in a hard place precisely because we *didn't* adopt backzone. We reasoned that the default set of zones was the preferred thing and thus would be the most likely to remain stable. Now, not only is the default different (which perhaps we could live with), but there's no way at all to get the old default. That's not okay, and it seems to me to fly in the face of most understandings of software backwards compatibility, never mind any tzdb-specific rules.

Eggert did not directly address the breach claim, but a few days later posted a compromise idea that would provide a build flag to create the database in two different ways: as it was before the merges and as it is with them. But the question then becomes: which is the default? Many applications do not build the database, but use it as distributed in the tarfile, Colebourne said. Eggert was not opposed to providing an alternate tarfile, but did not seem inclined to revert the merges that he proposed.

On the other hand, Colebourne and seemingly everyone else participating in the thread are willing to work on some kind of technical solution that solves the problems, but think that the current merges should be reverted first. There are ways to derive the two different views of the data, Colebourne said, but that requires keeping the existing data in the main database file. That file can be processed to automatically perform the merges that Eggert wants, but the reverse is not true. In another message, he described the situation as an impasse, saying that there had been many requests for reversion and "no requests to retain it".

There are technical solutions available to reduce the amount of data published to downstream users, but the starting point must be a fully populated database, not one that is logically broken. The next action must be to revert. Then we can agree on any technical measures necessary.

Colebourne started a thread on what data tzdb should contain. It described the kinds of data present in the database, how they are used, and the problems that need to be addressed with them. It offered up a proposal based on his plan to automatically process the file to create the merges for regions that share post-1970 history, but to retain the existing data so that pre-1970 history did not move to backzone. The proposal was received positively, though there was some constructive criticism; Eggert did not really participate in that thread, however.

Samoa

For a few months, that is where things stood. The development version of tzdb had the merges Eggert proposed, along with various other fixes made along the way. On September 13, in response to a query about the status of the merge changes, Eggert said that a new release was not imminent. But, then, along came Samoa.

On September 20, Geoffrey D. Bennett posted a notice that on September 15 Samoa had decided to stop switching to daylight savings time. That meant that tzdb needed to change to reflect that—and before the September 26 DST-switch date. As Eggert put it: "That's not much notice".

Later that day, Colebourne posted "Preparing to fork tzdb". The imminent release that seemed likely to contain those changes meant that a fork was needed in order to maintain the zones as they are in the 2021a release, he said. He would prefer that Eggert revert the changes, but:

In the event that the tzdb maintainer does not revert, consideration must be given to forking the project. The purpose of the fork would initially be to maintain the tzdb data set as it was prior to the dispute. This would then be released in parallel to the original tzdb to ensure that downstream projects do not each do their own thing (ie. to minimize incompatibilities downstream).

Colebourne asked if there was support for such a fork and whether there were people or organizations willing to assist. For the most part, the reaction to the idea of a fork was unfavorable; there were exceptions (Lane and Rethans, for example). Eliot Lear noted several downsides, including confusion among users and implementers, as well as fragmentation of expertise between the two. He suggested proposing changes to RFC 6557 as a way forward. In a somewhat similar vein, Emily Crandall Fleischman suggested invoking the procedure to replace the coordinator as a better alternative than a fork.

Eggert said that the fork would be discriminatory and that it would take a lot of work to fix the fork:

Such a fork would arbitrarily discriminate against countries like Angola and Niger, and in favor of countries like Norway and Sweden.

A primary goal of the recent patches was to avoid racial or national preferences that were present in the previous setup. Arguably these preferences were not intentional, or were apparent and not real; however, that's not an argument I would want to defend.

He suggested working together on technical solutions to resolve the problems that stem from the changes. He also objected to the idea that the data was getting "wiped out" by the changes. But, as Lane pointed out, including the backzone data "does *not* reproduce what was formerly the default set of zones". He said that it might be technically correct to argue that the data is not going away, but that does not really reflect the reality of the situation:

I'm all for improving equity in tzdb's coverage, but I think it should be done by adding coverage for underserved areas, not removing data from areas that had been well-covered. And let's make no mistake: removing data from the default build is removing data, for many downstream users who won't have an opportunity to make their own decisions about what their platforms provide.

Colebourne renewed his call for a reversion. Like Lane, he believes that adding more data is the solution to the problem, but another possibility would be to remove all pre-1970 data from tzdb by moving it to backzone. In the meantime, the path forward is clear in his mind:

The *only* good faith move you can make right now is to revert the patch. I'm quite happy to discuss practical solutions once that is done. If 2021b is released with the disputed patch then the fork will occur, and you as TZ coordinator will have directly caused the fork.

As suggested, Colebourne also polled the list to see if there was a consensus that a change to the coordinator is needed. The results of that were a resounding "no", which he acknowledged in the thread. His June appeal to the Internet Engineering Steering Group (IESG) about a breach in the charter was answered on September 22. Murray S. Kucherawy said that he disagreed with Colebourne's arguments, though that is not necessarily the final answer if he wishes to pursue it further. Colebourne said that both of the formal options for relief had been tried and failed:

The potential options remaining are to fork the project or to solve the issue. For the avoidance of doubt, my preferred option would be to solve the issue.

He noted that there was strong support for the idea of releasing 2021b with just the minimal changes needed to support the Samoa change, then taking a week or so to calm everything down and start to work on other solutions. He asked Eggert if he would do so, but it is perhaps not surprising that Eggert declined. He is concerned that the discrimination problems are now more visible because of the dispute, so he needs to act now:

Unfortunately, the equity issue has broadened and is now visible outside our little community, and I really and sincerely doubt whether it'd be a good idea for us to do nothing about it now. We need to establish that we are fixing the problem and are not deferring action to a never-never land of arcane bureaucracy, and we need to do so in terms that will be clear to outsiders.

He did compromise to some extent by proposing to only merge nine zones for 2021b, rather than the 30+ he proposed to begin with. That would provide evidence that progress is being made, while avoiding the biggest problem area:

[...] the idea is to revert most (but not all) of the objected-to changes. In particular, this will revert the changes to Europe/Oslo and Europe/Stockholm, which have drawn the most objections. The idea is to take the first step now, and to take more steps in future releases (which should not be distant-future releases, as we need to continue to make and exhibit a good-faith effort to fix the problem).

That was not acceptable to Colebourne, again unsurprisingly. But meanwhile, the clock was ticking. Other proposals were made; Russ Allbery wanted to reframe the debate by changing the "naming layer", while Lane tried to find a way to maintain the existing set of zones (and all of their historical data) going forward. Colebourne summarized the whole issue regarding pre-1970 data, while attempting to be even-handed; that led to yet another enormous thread, though he asked that only actual corrections be posted. He also put out a lengthy blog post about the dispute.

2021b release

On September 25, Eggert released version 2021b of tzdb with the merges of nine zones as his amended proposal indicated. He followed up the release announcement with a justification for the release and the choices made in it. Eggert simply sees the 30+ changes he proposed in May as the endgame for a process that started in 2013, though he did acknowledge problems with making changes to so many zones at once, thus the reduction to nine for 2021b.

Historical data is mostly only used by astrology programs, he said, and it is "typically grossly inadequate for realistic use outside the named location". Tzdb focuses on accurate data for 1970 and beyond; his efforts to merge zones are part of that. Now that the fairness issue has come to the fore, it is time to deal with it:

Norway and Sweden have triggered concerns, much more so than similar changes made (for example) to Angola and Congo in tzdb 2014g.

[...] It's a bad look for us that so much concern about Norway and Sweden has appeared on this mailing list, even though hardly anybody seems to have cared about Angola and Congo. It'll be an even worse look if we ignore this issue weeks, months or even years after it's been made clear to us.

[...] With all this in mind, issuing 2021b now is a significant step toward equity in tzdb. It will let us say that we are moving toward a fair process, and will give us the opportunity and motivation to improve on that process and to address and balance the various other concerns that have recently appeared on the mailing list.

As might be guessed, Colebourne was very unhappy with the release.

In summary, I am livid with the high-handed approach you have taken wrt the release of 2021b. Despite near unanimity of the mailing list requesting you to release 2021a+MinimalChanges, you progressed 9 out of the 30 link merges based on a rationale that you acknowledge is not universally accepted.

He said that he would be taking a few days away from the issue, but planned to "start a positive discussion as to what the next steps can be" after that. In his blog post he noted that he would be looking into alternatives, such as perhaps moving a fork of tzdb under the Unicode Common Locale Data Repository (CLDR) project. It turns out that Eggert is not opposed to CLDR being involved in some fashion. Perhaps some kind of compromise can be found in that direction.

The dispute spanned multiple, gigantic threads in May, June, and September. The call to fork tzdb also spawned several heads-up emails to LWN; thanks for those. It is the most visible thing to happen in the normally quiet tzdb arena since the 2011 lawsuit against Olson and the subsequent move of tzdb under the Internet Corporation for Assigned Names and Numbers (ICANN).

It is a little hard to see how users of tzdb are served by making pre-1970 zone information worse for some places, even if those places had been "elevated" in status incorrectly along the way. Dumping that information into the backzone file is tantamount to losing it completely, though the historical record of those moves could be used to reconstruct things. Perhaps a separate "historical tzdb" project is needed that better serves the needs of astrologers and others who need that kind of information. It would be plausible to use the existing tzdb contents as at least a starting point—perhaps more than that.

The timing crunch caused by Samoa's late decision on DST changes was not only disruptive for tzdb, but also for residents of Samoa, as Eggert noted. Two weeks is not a lot of time to get the word out, even outside of the computer realm, but many computers and devices did not magically update to 2021b, so they switched to DST as (previously) scheduled.

It is also unfortunate that the coordinator took the opportunity to lock in these controversial changes on (relatively) short notice over the vehement opposition of some. However inequitable the zone choices in 2021a (and before) were, things had been that way for a long time; disrupting users and developers to create a kind of fait accompli is not a particularly good look either. There are already questions on the mailing list about what distributions and other tzdb users should do with the changes. Taking a bit more time to come up with a scheme that addressed all of the concerns, then making all of the changes at once using that mechanism in a month or two hardly seems burdensome—or unfair—but here we are.

Comments (223 posted)

Improvements to GCC's -fanalyzer option

By Jonathan Corbet
September 23, 2021

LPC
For the second year in a row, the GNU Tools Cauldron (the annual gathering of GNU toolchain developers) has been held as a dedicated track at the online Linux Plumbers Conference. For the 2021 event, that track started with a talk by David Malcolm on his work with the GCC -fanalyzer option, which provides access to a number of static-analysis features. Quite a bit has been happening with -fanalyzer and more is on the way with the upcoming GCC 12 release, including, possibly, a set of checks that have already found at least one vulnerability in the kernel.

When GCC is invoked with -fanalyzer, it runs a module that creates an "exploded graph" combining information on the state of the program's control and data flow. That state includes an abstract representation of memory contents, known constraints on the values of variables, and information like whether the code might be running in a signal handler. The analyzer then uses this graph to try to explore all of the interesting paths through the code to see what might happen.

The GCC 10 release added 15 new warnings for potential errors like freeing memory twice, using memory after freeing it, unsafe calls within signal handlers, possible writing of sensitive data to a log file, and more. The double-free detection was the motivating factor that led to the development of many of those checks. Five more warnings came with GCC 11, including a check that memory obtained from one allocator is not accidentally freed to a different one and a check for undefined behavior in bit shifts. Also in GCC 11 is support for plugins that extend the analyzer. One example plugin can be found in the test suite; it checks for misuse of the Python global interpreter lock (GIL).
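A minimal sketch of the kind of bug these warnings target (a made-up example, not one from the GCC test suite):

  /* Build with: gcc -fanalyzer double-free.c */
  #include <stdlib.h>

  int main(void)
  {
      char *buf = malloc(16);

      if (buf == NULL)
          return 1;
      free(buf);
      free(buf);    /* the analyzer flags this second free() */
      return 0;
  }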

Recently, Malcolm was wondering what he should focus on for GCC 12. He had originally thought that he would add C++ support, but then concluded that he would rather improve the functionality for C instead. Before coming to that conclusion, he had implemented new and delete support in GCC 11 and was looking at exception handling as the next challenge, but that turned out to be "quite involved". Meanwhile, Google Summer of Code student Ankur Saini had added support for virtual functions. So some progress had been made, but Malcolm's work will continue to be focused on C for now.

There were a couple of problems for C that drew his attention, one of which is buffer-overflow detection. He has an experimental implementation that can capture the size of dynamic allocations in symbolic form and issue diagnostics about reads and writes that might go beyond that size. But determining when those warnings should be generated is hard; the code as it is now produces far too many false positives ("a wall of noise") and is not really useful.

It occurred to him, though, that there could be a way to find one specific class of problems: places where an attacker is able to influence whether an access is valid or not. That leads to the problem of taint detection and determining where the trust boundaries in the program are, which turns out to be a hard problem for arbitrary C code. It might, though, be possible to annotate specific programs and get useful diagnostics. His attention went to the kernel, which has a well-defined trust boundary and an API for moving data across that boundary. By annotating functions like copy_from_user() and system-call handlers, it might be possible to find code that does not properly sanitize user-supplied data.

GCC has an attribute (access) that describes how data moves through a particular variable or function. Malcolm added two new values (untrusted_read and untrusted_write) for that attribute to mark data that is read from (or written to) an untrusted location. Thus, for example, the data read into the kernel by copy_from_user() would be marked as untrusted_read. He added a new tainted attribute for functions as well; it indicates that all arguments to that function should be treated as untrusted. By tweaking one macro in the kernel headers, he was able to mark all of the system-call handlers in the kernel with this tainted attribute. Similar things can be done with, for example, callbacks for kernel-generated filesystems.
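The attribute names here follow Malcolm's prototype as described above; they are not part of a released GCC, so the following is a sketch of the idea rather than working syntax:

  /* Prototype sketch only: mark a (hypothetical) system-call handler so
     that the analyzer treats every argument as attacker-controlled. */
  __attribute__((tainted))
  long sys_frobnicate(int fd, unsigned long user_value);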

With those annotations, there are two categories of problems that the analyzer can detect: information leaks and using tainted data. Information leaks happen when uninitialized data is written back to user space. This case is relatively easy to detect — or, at least, he thought it would be. As an example of this type of problem, Malcolm brought up CVE-2017-18549, a driver bug that wrote random stack data back to user space. The uninitialized data that was written in this case was padding within a structure that had otherwise been fully initialized; the analyzer was able to find this problem. Getting this to work required refactoring the handling of uninitialized-data tracking; it was not a small task.
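A user-space analogue of that bug class might look like this (hypothetical code, not the driver from the CVE): every named field is initialized, but the compiler-inserted padding after flag is not, and it leaks when the structure is written out.

  #include <unistd.h>

  struct reply {
      char flag;       /* padding bytes usually follow this member */
      long value;
  };

  void send_reply(int fd)
  {
      struct reply r;

      r.flag = 1;
      r.value = 42;
      write(fd, &r, sizeof(r));    /* the padding goes out uninitialized */
  }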

A similar problem comes about in code that reads data from user space, modifies it, then copies the result back, possibly to a different location. If the read fails, the kernel may be working with uninitialized data, which it will then duly write to user space. Handling this required bifurcating the analysis to handle the case when copy_from_user() fails. Once that was done, the analyzer also gained the ability to handle realloc(), which has three possible outcomes.
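The three realloc() outcomes can be seen in a small hypothetical helper; the analyzer must now follow each of these paths separately:

  #include <stdlib.h>

  char *grow(char *buf, size_t new_size)
  {
      char *p = realloc(buf, new_size);

      if (p == NULL) {    /* failure: buf is still allocated and usable */
          free(buf);
          return NULL;
      }
      return p;           /* success: p may equal buf or be a moved block */
  }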

The tainted-data case comes about when, for example, a user-supplied value is used as an array index. It is harder to detect but also seems more important, since vulnerabilities of this type can often be exploited to compromise the kernel. Consider, for example, CVE-2011-0521, where kernel code would read a signed "size" value, check it against the maximum allowable value, then use it without checking for a negative value. The improved analyzer is able to catch this case.
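Reduced to a hypothetical fragment, the pattern behind that CVE is an incomplete bounds check on a signed, attacker-supplied value:

  struct device_state {
      int table[64];
  };

  void set_entry(struct device_state *s, int index)  /* index comes from user space */
  {
      if (index > 63)         /* upper bound only; negative values get through */
          return;
      s->table[index] = 0;    /* a negative index writes outside the array */
  }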

He is still working on a prototype implementation of this functionality to show to the world; as part of that, he has developed the world's worst kernel module, which contains as many problems as he can come up with. Making the analyzer work with the full kernel, though, is complicated by the fact that the kernel uses a lot of inline assembly code. He has added some basic handling for this, but it doesn't look at the actual opcodes.

He's been running the result on upstream kernels, and has found one real vulnerability already; it has been reported but is not yet fixed or disclosed. For this reason, Malcolm's latest work still lives in an internal company repository; it finds vulnerabilities and he doesn't want to release a zero-day-finding tool until the problems it turns up have been fixed. Much of the rest of the work is in the GCC 12 trunk now. He hopes to be able to finish this work and upstream it by the end of the GCC 12 stage 1 period (this page describes the GCC development-cycle stages).

More information on this project can be found in the GCC wiki.

Comments (24 posted)

Two security improvements for GCC

By Jonathan Corbet
September 24, 2021

LPC
It has often been said that the competition between the GCC and LLVM compilers is good for both of them. One place where that competition shows up is in the area of security features; if one compiler adds a way to harden programs, the other is likely to follow suit. Qing Zhao's session at the 2021 Linux Plumbers Conference told the story of how GCC successfully played catch-up for two security-related features that were of special interest to the kernel community.

Call-used register wiping

Zhao started with a list of security features that kernel developers had been asking for, noting that the LLVM Clang compiler already had a number of them, but GCC did not. She has been working to fill in that gap, starting with the feature known as "call-used register wiping" — clearing the contents of registers used by a function before returning. There are a couple of reasons why one might want this feature in a compiler.

The first of those is to frustrate return-oriented programming (ROP) attacks, which feature regularly in published exploits. A ROP attack works by chaining together a set of "gadgets" — code fragments that perform some useful (to the attacker) function followed by a return. If an attacker can place the right series of "return addresses" on the stack, they can string together a collection of gadgets and get the kernel to do just about anything that they want.

ROP attacks must usually, sooner or later, call some other kernel function to carry out a needed task; the called function will look at the processor registers for its parameters. Making a ROP attack work thus requires getting the right values into those registers; clearing the registers at each function return can be highly effective at frustrating those attacks. It breaks the chain of gadgets that the attacker is trying to assemble.

The other reason to clear registers on return, of course, is to prevent information leaks. It is often surprising to see what attackers can learn from whatever data may have been left in a CPU register.

So clearing registers is good, but there is still the question of which registers need clearing. If the objective is frustrating ROP attacks, clearing only the registers that are used for function parameters is sufficient. Protecting against information leaks, instead, requires clearing all of the registers used. A related question is whether registers should be zeroed or set to random values. For GCC, zero was seen as the safest choice, since it is the least likely to produce values that seem meaningful to other code. It also leads to a smaller and faster implementation.

This functionality is part of the GCC 11 release, controlled by the -fzero-call-used-regs= compiler option, which has a number of possible values to control which registers should be cleared. There is also a new function attribute (zero_call_used_regs) that can be used to control register clearing for a specific function. The implementation is in the form of a new compiler pass that looks at all exit blocks, finds each return instruction, computes the set of registers to clear (which includes tracking which registers are actually used), and emits the instructions to actually perform the clearing. This functionality initially supported the x86 and Arm64 architectures; SPARC was added a bit later.
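The option and the attribute can be combined; a hypothetical example of hardening a single function (the function name is made up):

  /* Build everything with, for example:
   *     gcc -fzero-call-used-regs=used-gpr ...
   * or annotate individual functions: */
  __attribute__((zero_call_used_regs("all")))
  void process_secret(const unsigned char *key, unsigned int len)
  {
      /* ... all call-used registers are cleared before this returns ... */
  }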

Support for register clearing when compiling with GCC was merged into the mainline kernel for the 5.15 release; the changelog notes that it reduces the number of usable ROP gadgets in the kernel by about 20%.

Stack variable initialization

The C programming language famously specifies that automatic (stack) variables are not initialized by the compiler. If code uses such a variable before assigning a value to it, it will be working with garbage data that can lead to all kinds of problems. Erroneous outcomes are clearly one of those, but it gets worse; if an attacker can find a way to place a value on the stack where an automatic variable is allocated, they may well be able to compromise the system. If an uninitialized variable is used as a lock, the result can be uncontrolled race conditions. This is all worth avoiding.

There are a number of tools around that can try to detect the use of uninitialized stack variables. Both GCC and Clang support the -Wuninitialized option, which causes warnings to be emitted at compile time, for example. Both compilers also have a -fsanitize= option to detect these usages at run time. Beyond the compilers, tools like Valgrind can be used to find uninitialized-variable usage.

These tools are useful, but they have their limits, Zhao said. Static (compile-time) tools can only perform analysis within individual functions, which can require making assumptions about what other functions do. Their ability to detect problems with uninitialized array elements or values accessed via pointers is limited. So they miss problems while, at the same time, failing to prune out infeasible paths through the code and generating false-positive warnings. Dynamic (run-time) tools cannot cover all paths, so they will miss problems; they also impose a significant run-time overhead.

Starting with the upcoming GCC 12 release, the -ftrivial-auto-var-init= option will control the automatic initialization of on-stack variables. Its default value, uninitialized, maintains the current behavior. If it is set to pattern, variables will be initialized to values that are likely to result in crashes if they are used; this option is intended for debugging use. Setting it to zero, instead, simply initializes all on-stack variables to zero; this option is for hardening production code. There is a new variable attribute (uninitialized) that can be used to mark variables that are deliberately not initialized.
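A hypothetical example of how the new option and attribute interact (the names here are made up):

  /* Build with: gcc -ftrivial-auto-var-init=zero file.c
     (or =pattern for debugging builds). */
  void handle_request(void)
  {
      int count;          /* zero-filled (or pattern-filled) by the option */
      char scratch[4096] __attribute__((uninitialized));
                          /* deliberately opted out, e.g. for performance */

      /* ... use count and scratch ... */
  }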

Regardless of the setting of this new option, the compiler will still issue warnings if -Wuninitialized is set. The idea behind this option is not to "fork the language", but to add an extra level of safety; code that fails to properly initialize variables should still be fixed. This work was committed to the GCC trunk in early September; there are some bugs still in need of fixing that should be taken care of soon.

Zhao didn't talk about support for this feature in the kernel. Clang has had support for this option for a while, though, and the kernel can make use of it, so making use of GCC's support once it is available will be straightforward. That should help prevent whole classes of bugs, and may spell the beginning of the end for the structleak GCC plugin that is supported by the kernel now. While the development of these features was driven by a kernel wishlist, they should both prove useful well beyond the kernel context.

The video for this talk is available on YouTube.

Comments (19 posted)

Taming the BPF superpowers

By Jonathan Corbet
September 29, 2021

LPC
Work toward the signing of BPF programs has been finding its way into recent mainline kernel releases; it is intended to improve security by limiting the BPF programs that can be successfully loaded into the kernel. As John Fastabend described in his "Watching the super powers" session at the 2021 Linux Plumbers Conference, this new feature has the potential to completely break his tools. But rather than just complain, he decided to investigate solutions; the result is an outline for an auditing mechanism that brings greater flexibility to the problem of controlling which programs can be run.

The kernel has had the ability to enforce signatures on loadable modules for years, so it makes sense to consider creating the same mechanism for BPF programs. But, while kernel modules and BPF programs look similar — both are code loaded into the kernel from user space, after all — there are some significant differences between them. The safety of kernel modules is entirely dependent on the diligence of developers. They are built and distributed via the usual channels, are tied to specific kernel versions, and can last for years; they present a stable API to user space. BPF programs, instead, benefit from safety built into (and enforced by) the loader. They are often dynamically built and optimized, they are patched at run time to avoid being tied to kernel versions, and they have a different lifetime; often, they are created on the fly and quickly thrown away. These differences suggest that the same signing mechanism might not work equally well for both types of program.

Fastabend covered the BPF signing scheme; curious readers can find a more complete description in this article. In short: BPF program loading is a complicated, multi-step process involving numerous system calls; the signature is meant to cover this entire process. That is done by loading yet another BPF program to handle the process; this mechanism is mostly implemented, but there are some details left to be worked out.

There are certainly advantages to this mechanism, he said. If Alice and Bob have signed BPF programs, they can use them as usual. If Eve comes along with an unsigned program meant to eavesdrop on the kernel, that program will not be loaded and Eve will be frustrated. But there is also a cost: if Alice is generating programs on the fly, those programs will not be signed and will no longer be loadable. The keys used to sign programs should not be present on the system, so signing cannot be done on the fly and Alice's workflow will be blocked. Alice, too, will be frustrated despite being a legitimate user.

This is not just a hypothetical case; a lot of tooling works that way now. Perhaps the best-known example is bpftrace, but it's not the only one. The P4 system defines a domain-specific language for the management of networking data paths. Some of Fastabend's work on Cilium is aimed at run-time optimization of BPF programs. PcapRecorder is an XDP-based clone of the venerable tcpdump utility. And so on. None of these tools can work in an environment where BPF programs must be signed.

A lot of the security goals can be achieved, he said, by just making use of the fs-verity mechanism supported by Linux now. With fs-verity, read-only files can be signed and the kernel will check the signature on each access. If the file has been corrupted somehow, the signature will not match and access to the file will be blocked. So one thing that can be done is to use fs-verity to sign the program that loads BPF programs into the kernel. The system will automatically ensure that this program is not tampered with, and the set of keys that can create valid signatures can be restricted.
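A rough sketch of that flow, assuming the fsverity-utils command-line tool and a signing certificate that the kernel already trusts (the file paths here are hypothetical):

  # sign the BPF-loading program offline
  fsverity sign /usr/bin/bpf-loader loader.sig --key=signing.pem --cert=signing.crt
  # enable verity on the installed file; the kernel verifies it on every access
  fsverity enable /usr/bin/bpf-loader --signature=loader.sig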

But it is possible to go further than that, Fastabend said. Using some sort of policy engine, which could be another BPF program or a Linux security module, the kernel can look at the key that was used to sign any given program and associate a set of privileges with it. At its most basic, there could be a single "can load BPF programs" privilege, which would be similar in effect to attaching the CAP_BPF capability to the program. The system could be more fine-grained than that, even, by controlling actions like access to maps. With this sort of mechanism, he said, signature checking on the BPF objects themselves will be unnecessary.

Consider the case where Alice's BPF-using process is somehow corrupted at run time. Signing of BPF programs will not save the system in this case; the corrupted user-space code can still do things like change values in BPF maps, change the attachment points for programs, and more. In other words, signing a BPF program gives little assurance that said program will run correctly in a hostile environment. A more flexible policy mechanism might do better, though, and could block attempts by a program to exceed its boundaries. Perhaps unsigned programs could be allowed to load, but they would not have the ability to write to user space or access kernel memory, for example. Access to pinned maps could be denied as well.

This mechanism is not yet implemented, but he has some ideas about how it could be done. The LLVM compiler can attach attributes to objects; it could be taught to record all of the helper functions that a program calls, all of its map operations, and so on. The BPF verifier would then confirm that the program stays within those limits, and the supervisor mechanism could allow or deny a specific program based on the attributes. All that's left is to figure out how all this would actually work.

Fastabend concluded by reiterating his goal of ensuring that dynamically generated BPF programs keep working. Program signing seems like the wrong solution; it is only useful in cases where the signed programs won't change. With an appropriate set of policy rules, it should be possible to safely allow a system to run unsigned BPF programs. In the brief discussion period that followed, Alexei Starovoitov (the author of the existing signing work) noted with enthusiasm that there are many other types of permissions that could be added. The maximum number of instructions allowed in a program would be one example. So there appears to be interest in this idea, but the real proof, as always, is in the code.

The video of this session is available on YouTube.

Comments (4 posted)

The 2021 Kernel Maintainers Summit

By Jonathan Corbet
September 27, 2021
The Kernel Maintainers Summit is an invitation-only gathering of top-level kernel subsystem maintainers; it is concerned mostly with process-oriented issues that are not easily worked out on the mailing lists. There was no maintainers summit in 2020; plans had been made to hold it in an electronic form, but there turned out to be a lack of things to talk about. In 2021, though, a number of interesting topics turned up, so an online gathering was held on September 24 as part of the Linux Plumbers Conference.

Topics discussed by the nearly 30 developers attending this year's gathering were:

  • Looking back at the UMN episode; what can the community learn from the University of Minnesota's attempt to get bad patches into the kernel?
  • Requirements for accelerator drivers: when is it appropriate to require the existence of a free user-space implementation before accepting a driver into the kernel?
  • The trouble with upstreaming: even experienced kernel developers can encounter frustration getting code into the mainline kernel. What, if anything, can be done to create more uniform acceptance criteria and give developers more certainty that they will be successful?
  • How to recruit more kernel maintainers: a short and not hugely conclusive session on addressing the maintainer shortage.
  • Using Rust for kernel development: what will be needed to get the Rust for Linux patches merged?
  • Is Linus happy? The traditional closing session gave Linus Torvalds a chance to talk about parts of the process he would like to see work better; it was a short discussion this year.

The 2021 Kernel Maintainers Summit ended with the expression of fervent wishes that next year's gathering could be held in person.

Comments (none posted)

Looking back at the UMN episode

By Jonathan Corbet
September 27, 2021

Maintainers summit
Earlier this year, a bad patch sent by a researcher from the University of Minnesota (UMN) set off a bit of a crisis within the kernel development community when it became known that some (other) patches from UMN were deliberate attempts to insert vulnerabilities into the kernel. Some months after that episode had been resolved, the 2021 Maintainers Summit revisited the issue to see if there are any lessons to be learned from it.

Greg Kroah-Hartman started off by posting a link to a presentation he put together with David Wheeler on the UMN episode. He described the events as "the university sent some crap patches, we caught them". The community, he said, is pretty much over it now. The university apologized, and meanwhile the wider security community, which has been worried about the prospect of Trojan-horse patches for years, was thankful that all of this had come out and gotten people thinking about this kind of problem.

Recently, UMN has reached out to kernel developers, asking how it can restart its involvement with the kernel community; Kroah-Hartman has put them in touch with a kernel developer who will guide them. He is working on writing a document on how research groups should collaborate with the development community; he promised to post a draft over the weekend.

Kees Cook noted that the UMN community is large and has had a number of people moving through it. There were two issues that arose in April: low code quality from UMN in general, and one bad actor. Even that actor was not truly malicious, he said, "just dumb", but nobody in UMN caught his activities in time. Kroah-Hartman said that this episode woke up a lot of people; we were lucky that we caught it. He also offered his apology for yelling at the UMN researchers; he gets to be mad once per year, he said, and this was the time for this year.

Ted Ts'o said that the assembled group should consider more general issues of code quality and how much attention should be paid to security both before and after code is submitted. He mentioned the discovery of a wide set of security problems in the just-merged ksmbd file server, which had evidently been discussed in private for a while before the topic spilled over onto the linux-kernel list. We are continuing to put security bugs into the kernel, and that seems unlikely to change, he said.

Kroah-Hartman then claimed to have written more security bugs than anybody else; in general, core developers are responsible for the most security problems in the kernel. We are all "known good actors who are accidentally malicious", he said. Cook agreed that bug creation was almost entirely "volume based"; the more code a developer writes, the more bugs they create.

Ts'o tried to return the conversation to malicious actors, noting that the UMN developers "weren't smart" about how they tried to add bugs to the kernel. But what if there are malicious actors who are smarter? The only solution, he said, was better tools to try to detect security issues. Kroah-Hartman closed the session by saying that the community has to get better at catching all of the bugs it creates, regardless of whether they are intentional or not.

Comments (9 posted)

Requirements for accelerator drivers

By Jonathan Corbet
September 27, 2021

Maintainers summit
In August, a long-running dispute over drivers for AI accelerators flared up in the kernel community. Drivers for graphics accelerators are required to have at least one open-source implementation of the user-space side of the driver (which is where most of the logic is). Drivers for other types of accelerators have not, so far, been held to that same standard, which has created some friction within the community and an inconsistent experience for developers. The 2021 Maintainers Summit took up this issue in the hope of creating a more coherent policy.

Greg Kroah-Hartman is the subsystem maintainer who has accepted a number of accelerator drivers without applying the open-user-space standard. He started off the session by saying that he can't tell developers of these drivers "no" when there are no standard requirements he can point them to. Dave Airlie, the DRM (graphics subsystem) maintainer, said that his subsystem does indeed have those kinds of standards, but acknowledged that it is a lot to ask that those standards be applied generally. Saying "no", he said, is the best way to get developers to put in the effort to do things right; if the bar is set too low, developers drop their code in, then disappear. Saying "no" makes them engage more.

We need, he continued, to be more responsible for the bigger picture, and that means that we need information about how the hardware the kernel drives actually works. That becomes especially true for drivers that use certain parts of the kernel API, and the DMA-BUF API in particular. DMA-BUF is a mechanism for drivers to interface with each other; a new driver using that API will be talking to other complex drivers that "have been through all the hoops". There is no desire to compromise the operation of those drivers through interaction with a new driver whose developers have not joined the community.

The Habana AI-accelerator driver, which is what has set off most of the controversy, is actually better than most, Airlie said. But it will still create security problems. Developers of drivers like this are not experts on creating secure kernel APIs. Kroah-Hartman said that, if drivers like the Habana one are kept out of the kernel, they'll still use APIs like DMA-BUF, nobody will see it, and the result will be far worse. But Airlie repeated that DMA-BUF is a line he does not want to see crossed.

Kroah-Hartman said he could understand this rule for graphics drivers, but for drivers like Habana's there is no standard that he can apply. Airlie answered that the Habana accelerator is a GPU at heart; one could implement OpenCL on top of it — something he didn't know until Habana open-sourced its compiler. If a vendor is making a compute-only graphics card, he asked, why should they have to jump through hoops when their competitors don't?

So, Kroah-Hartman asked, where is the line where drivers need to come with an open-source user-space implementation? Airlie said that the Habana driver was put forward as "not a GPU driver", but now it is using DMA-BUF. That is where the line should be. This is the standard that the InfiniBand subsystem is using as well now.

Arnd Bergmann said that there are a couple of cases here. For accelerators that can run anything, like GPUs can, it makes sense for the drivers to go through the DRM subsystem and adhere to DRM's rules. For devices with a more defined purpose, though, it might be better for them to come in with a custom kernel interface and lower requirements.

Kroah-Hartman said that maintainers have to make value judgments; a lot of new subsystems are submitted to him and he needs to make a decision on each. What should he do? Airlie reiterated that the line should be drawn at the DMA-BUF and DMA fence APIs. "A little driver sitting in a corner" can be merged without a lot of rules, but accelerators inevitably reach the point where they need to use DMA-BUF. There is no point in running an accelerator without access to DMA or graphics. These devices start simple, he said, but once they go toward production they get more complex. Kroah-Hartman agreed to uphold the DMA-BUF line.

Airlie said that one problem with graphics is that there is no common user-space API for all GPUs, just "a small discoverability API". These devices are so different that any attempt to create a standard API would just get in the way. But that means there is little control over the API that any given driver provides. As it is, graphics developers are often finding interfaces in the drivers that user space has never made use of.

The DMA-BUF line is good, he said, because using that API brings developers into contact with the experts. There is great value in having a community developer who knows where the user-space code for a device is; that makes it possible to see which interfaces are actually needed, among other things. Vendors need to release the compiler for their devices so that the DRM developers can see what the hardware is capable of. If the device can perform DMA directly, for example, the API has to prevent users from accessing that capability.

Torvalds speaks

At this point, Linus Torvalds unmuted to say that he wanted to argue against some of the points that Airlie was making. Airlie is coming from a subsystem where the community has 25 years of experience; there is history and a community. The developers know how these devices will work. When new people come, though, we don't want to create a high bar to entry. Yes, they will do things wrong the first time, but until we let them into the kernel, they will not learn how to do things right.

For this reason, Torvalds is in favor of accepting new code and letting different groups make their own mistakes. The DRM developers, after all, screwed a lot of things up badly; that was part of the path toward their current API. The DRM developers certainly didn't come in knowing all that they know now. The same will be true of companies like Habana; they will do things wrong, but if we block them, they will never get things right. That also is why he let ksmbd and the ntfs3 filesystem in for 5.15.

In other words, he continued, the community should be open to taking in new subsystems, but should also be more proactive in throwing them out again. If a subsystem causes problems for others, out it goes. If there is no user space for it, why keep it around? If the Habana driver creates trouble, it can be thrown out.

The security issue, he said, is "pure bullshit". Hardware engineers already own the machine, and we cannot protect ourselves against what a device might do; we should worry more about maintainability than security. We can't fix hardware-level security problems, but we can try to ensure maintainability. Torvalds did say, though, that Airlie was correct when it comes to use of DMA-BUF. Drivers are mostly independent, but use of DMA-BUF is the point where they start to interact with each other, and that can bring in maintainability problems.

Tests

Kees Cook spoke up to say that code quality is the real issue here, and that we just don't have enough automated analysis to assure that quality anywhere in the kernel. Where are the tests for ksmbd, he asked? In general, there is no way to find out where the tests (if any) for a given body of code are. He is not a big fan of BPF, he said, but the BPF verifier is comprehensive and prevents a lot of problems.

Ted Ts'o said that the syzbot fuzzing system can be annoying, but it is also great. Network filesystems (like ksmbd) are missing a good fuzzing solution. Accelerators, each of which has its own instruction-set architecture, will be an even bigger problem. Airlie, though, said that tests don't help if nothing uses the API that the tests are exercising. They don't help with maintainability, since they can't tell maintainers which APIs are actively in use. That is why the DRM developers insist on an active user space, something with life and developers who can answer questions.

Chris Mason said that, as somebody who has been pushing vendors to get their code into the kernel, he feels that the community is creating an environment where the NVIDIA model (proprietary drivers) is the most efficient way to go. The harder it is to get code upstream, the harder it is to convince vendors to do the right thing. So lowering the bar seems like the right thing to do. We have to let vendors go through the process of doing things wrong and feel the associated pain.

Torvalds talked about a recent conversation with a large company, where he was surprised to learn that some core developers employed there don't always want to work upstream. They tend to be worried about the one workload they care about, so they hack on that and don't worry about the larger case. "Upstream first" is a goal, he said, but it cannot be a hard requirement. We want developers to want to work with us; we don't want to be the straightjacket that they have to work within. That means we have to be somewhat flexible.

Kroah-Hartman said that he wants to take new stuff; that makes its developers a part of the community and gets them to care about us. Airlie replied that vendors only care about customers, so they will only care about us if their customers do. Kroah-Hartman continued that we all have to work together and be accepting; we can't force the creation of a unified API across competitors before the code is merged. Mason rephrased that as "we can't make them drink the Kool-Aid if we don't let them into the restaurant".

Ts'o brought this extended discussion to a close by asking if the group was converging on any sort of consensus. Is there anything that can be said to unify the criteria for driver acceptance? Airlie said that there seemed to be agreement on drawing the line at use of DMA-BUF and the fence API, and nobody disagreed with him. There was a bit of trailing discussion on how to notice when that line was crossed. I suggested moving those interfaces into a module namespace, and Kroah-Hartman posted a patch to that effect a few days later.

Comments (10 posted)

The trouble with upstreaming

By Jonathan Corbet
September 27, 2021

Maintainers summit
The kernel development community loudly encourages developers to get their code into the upstream kernel. The actual experience of merging code into the mainline is often difficult, though, to the point that some developers (and their employers) simply give up on the idea. The 2021 Kernel Maintainers Summit spent some time discussing the ways in which the community makes things harder for developers without coming up with a lot of ways to make things better.

Upstreaming agony

Ted Ts'o started off the session by referring to a Kernel Summit talk held earlier in the week where a couple of Chrome OS developers reviewed their efforts to get closer to the mainline. He referred specifically to slide 10, titled "Upstreaming agony", from the presentation slide deck. That slide talks about how some maintainers are far more responsive than others, how some require massive unrelated changes, and more; "oh sure we can apply this two liner… *after* you rewrite the subsystem". This variability, Ts'o said, can be off-putting for new developers and can be the reason why many developers choose not to participate in the kernel community at all. What, he asked, can be done to improve this situation?

Linus Torvalds answered that there have been similar discussions at many maintainer gatherings; part of the problem is just that not all people are the same, but it is also true that subsystems are different from each other. The memory-management subsystem affects everything, he said, and will thus be harder to change than some random driver. We do have pain points, and some maintainers do make things too hard, but he doesn't see a productive way forward to solve the problem. "Naming and shaming" of problematic maintainers might help, he suggested.

Dave Airlie mentioned some of the problems that vendors experience; bringing up a new system-on-chip (SoC) can involve upstreaming dozens of drivers, each of which has a different path — involving different acceptance criteria — into the mainline. That does not make things easy. Arnd Bergmann responded that, for Arm-based SoCs, there is a workaround to this problem in the form of his arm-soc tree. Drivers can go in via that path.

I could not resist the urge to point out that, in previous discussions on this topic, the group had decided to create a maintainer handbook with subsystem profiles where each subsystem's quirks could be documented. The maintainer handbook exists now, but the section for subsystem profiles is nearly empty, so we still have almost no documentation to help patch submitters work with specific subsystems. Torvalds suggested going the other way, whereby developers would document their problems working with specific subsystems. Maintainers don't think about these things, he said, but patch submitters have no choice.

I expressed my lack of enthusiasm for the task of handling patches that add criticism of other maintainers.

Chris Mason reminded the group of the discussion at the end of the previous session, which touched on developers who are happier working within their companies than with the upstream community. Even people seen as core developers can feel that way. Getting code upstream is an uncertain process, Mason said; it makes the patches better, but at the cost of making life harder for developers. It would be good to find ways to add more certainty to the process. At the Monday discussion on folios, developer Matthew Wilcox said that he did not know which tests to run to show the performance impact of his work. When such a long-time core developer doesn't know that, we have done something wrong, Mason said.

Torvalds answered that, when it comes to memory management, there are no benchmarks, just "thousands of different things" that developers run. He mentioned the Phoronix benchmarks, saying that they have found a "ridiculous" number of problems. The benchmarks are not great, but what is great is that there is a person who follows up and is tracking performance over time. He gets annoyed at the syzbot tests, he said, because they are run on massive Intel machines that will show a performance regression just because a cache line moved. For performance testing, that is not hugely helpful. [Note that Torvalds said "syzbot", but it seems likely he meant to refer to the Intel 0day tests].

Al Viro added that a report of a regression is interesting, but even more interesting is the standard deviation of the results. He has seen many claims of improvement that, once one looks at the data, turn out to be "basically noise". Torvalds noted that the syzbot tests do at least provide the standard deviation with their results.

Thomas Gleixner said that the community can document formal criteria, but it's much harder to document expectations with regard to code design. The fact that there are different criteria in different areas makes that harder. And, in any case, people don't listen to that sort of guidance and just "cram things into corners". Mason asked if the tone of some of the guidance given to developers might make things harder; sometimes they can feel like they are getting beaten up all the way through the review process. Gleixner replied that he mostly asks questions, and he spends time explaining issues to inexperienced developers. He gives up, though, when dealing with ten-year veterans who won't listen. Mark Brown added that the community has discussed tone many times before; it often comes down to differences in culture.

At the close of this mostly inconclusive session, Ts'o brought back the point that many developers want to work outside of the community because it is easier. It is not always clear how to fix that; many areas that developers can work on affect a huge variety of users. It is easy to make a patch that works well for one workload but which trashes others. In large companies, developers are rewarded for solving that company's problems; the rewards for contributing back to the community and improving the kernel for everybody are harder to find. We would do well, he said, to find ways to point out and reward developers who have made the kernel generally better.

A common problem, he continued, is the case of the first developer for a new hardware feature being asked to make a new framework that works in the general case. The user interrupts patch set is currently experiencing that, he said; this is a new Intel feature, but developers are asking how it will work for other architectures. That is a hard question to answer when other CPU manufacturers haven't said whether or how they will make that feature available. We need to go easier on the first developer in a new area, he said.

At that point the room fell silent and the discussion moved on to the next topic.

Comments (4 posted)

How to recruit more kernel maintainers

By Jonathan Corbet
September 27, 2021

Maintainers summit
The kernel development process depends on its subsystem maintainers, who are often overworked and, as a result, grumpy. At the 2021 Kernel Maintainers Summit, Ted Ts'o brought up the topic of maintainer recruitment and retention, but failed to elicit a lot of new ideas from the assembled group.

Making life easier for developers — the topic of the previous session — is important, but perhaps one of the best ways to do that would be to bring in more maintainers. How, Ts'o asked, can we do that? The community should also be thinking about the problems of succession, given our maintainers are not getting any younger. Dave Airlie responded that the real question should be: how do we encourage companies to pay for maintainers? Companies will pay developers, but they are far less interested in supporting the maintainer role.

Linus Torvalds said that he fully agreed with that sentiment. He has been talking with companies and telling them that they need to encourage their developers to take on more roles, and to work into the maintainer role in particular. There should be, he said, one maintainer in a company for every ten developers, but companies are nowhere near that ratio. Thomas Gleixner said that one way to help in that regard would be for companies to give their developers time to work on their own projects.

Greg Kroah-Hartman said that the only way for developers to be able to work as maintainers within companies is for it to be a part of their job — something they will be evaluated on at the end of the year. This has to come from the top down, he said. Airlie answered, though, that maintainership can come from the bottom up as well; the DRM subsystem has a group review structure that requires developers to help review code. When the need arises, that makes it easy to pull developers up into maintainer roles. Chris Mason said that, at Facebook, maintainership is in the job description, and the company has had good success with that.

Ts'o concluded the session by suggesting that this might be a topic for the Linux Foundation Technical Advisory Board to consider; perhaps the board could draw up a set of recommendations for companies.

Comments (2 posted)

Using Rust for kernel development

By Jonathan Corbet
September 27, 2021

Maintainers summit
The Rust for Linux developers were all over the 2021 Linux Plumbers Conference and had many fruitful discussions there. At the Maintainers Summit, Miguel Ojeda stepped away from Plumbers to talk about Rust in a different setting. What will it take to get the Rust patches merged? The answers he got were encouraging, even if not fully committal.

Ojeda started by asking the group whether the community wanted Rust in the kernel. If it goes in, he said, it should do so as a first-class citizen. In his discussions he has encountered a number of kernel developers who are interested in the language; many of them are quite open to it. He has gotten help from a number of those developers in the process. Some groups, including the Android team, actively want it, he said.

Linus Torvalds answered that the kernel community will "almost certainly" do a trial with the language, but that the Rust developers need to accept that it's a trial. It is not necessary to convince everybody in the kernel community before this can happen, but there does need to be a certain level of buy-in from the subsystem maintainers who will be directly affected at the beginning. The support from "fake Linus" (GPIO maintainer Linus Walleij) and Greg Kroah-Hartman is a good start, he said; buy-in from a majority of kernel maintainers is not required. Torvalds had looked at the patches a few months ago when they were posted, and nothing therein made him say "no way"; he has not seen any postings since, though. If Rust support is not merged, Torvalds concluded, it will never get to be good enough for real use in the kernel.

Kroah-Hartman said that the Rust patches are looking a lot better, but aren't ready yet. The Rust GPIO driver that Wedson Almeida Filho posted in July was "awesome", and he added that a number of filesystem developers are interested in Rust. That could be a good place to work, since the virtual filesystem APIs in the kernel are relatively stable.

Kees Cook suggested that WiFi or Bluetooth drivers could be a good place to use Rust; Kroah-Hartman answered that he would love to drop all of the current Bluetooth drivers. He said (to groans from the group) that he knows of an upcoming phone that is shipping with 100 out-of-tree drivers; he suggested that developers interested in Rust pick ten of those and see how things work.

Dave Airlie said that some maintainers will certainly be scared by the addition of a new language; they are going to have to take the time to learn it by writing something useful. They will need to know that there are resources out there to help them in dealing with Rust code in their subsystems. Torvalds said that Rust is not that hard to read, even if the error-handling patterns are very different. Anybody who can review patches should be able to pick up enough Rust to review code in that language.
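
As a rough, generic illustration of what Torvalds was alluding to (this is not code from the kernel patches, and the configuration path is made up), Rust replaces C's pattern of checking a return value after every call with the Result type and the ? operator, which propagates errors automatically and cannot be silently ignored:

    use std::fs;
    use std::io;
    use std::path::Path;

    // Where C code would follow each call with an explicit
    // "if (ret < 0) goto err;" check, the ? operator returns early with
    // the error, and the compiler warns if a Result is simply dropped.
    fn read_config(path: &Path) -> Result<String, io::Error> {
        let raw = fs::read_to_string(path)?;   // propagate any I/O error
        Ok(raw.trim().to_string())             // wrap the success value
    }

    fn main() {
        match read_config(Path::new("/etc/hypothetical.conf")) {
            Ok(cfg) => println!("config: {}", cfg),
            Err(e) => eprintln!("failed to read config: {}", e),
        }
    }

The shape is different from C, but the control flow remains easy to follow for a reviewer who has never written Rust, which seems to be the point Torvalds was making.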

Ted Ts'o suggested that the Rust developers should post patches more regularly — every week or two. Developers will look when something shows up in their inbox; that is the way to get their attention.

Airlie said that there are examples of Rust code at the edges of the kernel, such as drivers. Has any work gone into putting Rust into the core, with C code at the edges? Ojeda answered that the Rust developers are not trying to rewrite things in the core kernel; instead, they are making a set of abstractions so that drivers can be written in safe Rust. A C driver using a Rust core would lose a lot of the advantages of using Rust, he said; once you go to Rust, you want to stay there.
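
The general shape of those abstractions can be sketched without reference to the real kernel crate. In the purely hypothetical example below (none of these names exist in the actual patches), the unsafe calls that would normally come from generated C bindings are confined to a single wrapper type; the "driver" code using it stays in safe Rust and cannot leak the underlying resource or release it twice, because cleanup happens in Drop:

    mod bindings {
        // Stand-ins for unsafe, machine-generated C bindings; in a real
        // kernel build these would call into C. Here they allocate and
        // free a dummy object so the example runs on its own, and the
        // null check below only mirrors what a fallible binding might do.
        pub unsafe fn fake_dev_acquire(id: u32) -> *mut u8 {
            Box::into_raw(Box::new(id as u8))
        }
        pub unsafe fn fake_dev_release(dev: *mut u8) {
            drop(Box::from_raw(dev));
        }
    }

    /// Safe wrapper: the unsafe calls live here and nowhere else.
    pub struct Device {
        raw: *mut u8,
    }

    impl Device {
        pub fn acquire(id: u32) -> Option<Device> {
            let raw = unsafe { bindings::fake_dev_acquire(id) };
            if raw.is_null() { None } else { Some(Device { raw }) }
        }
    }

    impl Drop for Device {
        fn drop(&mut self) {
            // Release happens exactly once, when the wrapper goes away.
            unsafe { bindings::fake_dev_release(self.raw) };
        }
    }

    fn main() {
        // "Driver" code: entirely safe Rust, no manual cleanup needed.
        if let Some(_dev) = Device::acquire(7) {
            println!("device acquired; released automatically on drop");
        }
    }

A C driver calling into a Rust core would see none of these guarantees, since the compiler can only enforce them on the Rust side of the boundary; that is the asymmetry Ojeda was describing.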

Ts'o raised the often-heard concern of wide-ranging, cross-subsystem changes. How hard are those going to be when there is Rust code involved too? It is fine if the GPIO maintainer buys in, since that subsystem is reasonably well contained. But if, say, the filesystem developers have to make a change that breaks the Rust GPIO interface, what happens then? Ojeda repeated that kernel maintainers need to buy into the change, and that they need to be ready to review changes to their subsystem. Developers from other subsystems who want to make a change in a Rust-using subsystem will need to learn the language. The Rust developers can provide help, but it will not be enough; if the kernel maintainers want Rust, they are going to have to help. Mark Brown said that, even when maintainers are enthusiastic, review from the Rust developers to be sure that work is "tasteful" will be important. After all, he just started learning the language one week ago and doesn't know what he is doing at this point.

Arnd Bergmann said that putting the Rust code into a corner of the kernel tree is not the right approach. In the current patches there is one top-level Rust directory, but it would be far better to distribute the Rust code among the affected subsystems. As little of that code as possible should be in a central location. Ojeda answered that there needs to be some general support code, but that a lot of things could be split out of the current kernel crate and put into subsystem-specific crates.

Al Viro asked about the stability of the Rust toolchain, noting that the current requirement to use the latest Rust compiler is a problem. At least once a month he ends up bisecting a problem over three or four years of history; if a different compiler is needed at each bisection step, things are not going to work. Ojeda said that the kernel work is using unstable Rust features now, and that is a problem; it does make compatibility harder. For now, the Rust patches support a single Rust version for each kernel release, and they cannot guarantee that a newer compiler will work later on. So yes, bisection would require changing compiler versions.
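
To make the toolchain problem concrete: a crate that opts into not-yet-stabilized features only builds with a nightly (or specifically pinned) compiler. The standalone snippet below illustrates that mechanism; it is not code from the kernel tree, and allocator_api is used here only as one example of a feature that remains unstable. It performs a fallible allocation of the kind kernel code wants, since the kernel cannot simply panic when memory runs out:

    // A stable rustc rejects this attribute outright; only a nightly
    // (or otherwise pinned) toolchain will build the crate.
    #![feature(allocator_api)]

    use std::alloc::{Allocator, Global, Layout};

    fn main() {
        let layout = Layout::new::<u64>();
        // Fallible allocation through the still-unstable Allocator trait;
        // failure is a value to handle rather than a panic.
        match Global.allocate(layout) {
            Ok(ptr) => {
                println!("allocated {} bytes", layout.size());
                unsafe { Global.deallocate(ptr.cast(), layout) };
            }
            Err(_) => eprintln!("allocation failed; handled without panicking"),
        }
    }

Every such feature gate is a potential compatibility break between compiler releases, which is what makes bisecting across kernel versions so awkward until the needed features are stabilized.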

If, however, Rust support gets into the mainline kernel, the situation might change. That would put pressure on the Rust development community to stabilize the needed features in the language — though there is never a guarantee that this will actually happen. Sooner or later, though, it will be possible to build a kernel using only stable Rust features; at that point the compatibility problems should go away. He would understand, he said, if the community chose to not merge Rust support until that happens.

Ts'o said that, if the Rust community wants the public-relations win that would (presumably) come from use in the kernel, it might well feel moved to stabilize the needed features. Ojeda said that he had been invited to RustConf this year, which was a good sign, but then his submission on Rust in the kernel was rejected — a less-good sign. The consideration of Rust for the kernel was then highlighted at that conference, though, so there is definitely some interest in the Rust community at some level.

Thomas Gleixner said that he is not opposed to the experiment, and he likes some of the concepts in the language. He is worried, though, about "the blank page where the memory model should be". Ojeda said that the Rust community is taking its memory-model cues from the C++11 and C11 standards, but nothing is finalized. During a conversation earlier that week, Ojeda said, he encouraged Paul McKenney to go to the Rust community now and tell it how things could be; this is an opportunity to fix issues in the C memory model and do things right.

Torvalds concluded the session by reiterating that bringing Rust into the kernel is an experiment; the community is just dipping its toes into the water. It will take years to sort things out and determine whether it actually works in the kernel. He is positive toward Rust, he said, and likes the language; he wants a safer language, especially for driver development. But, he admonished at the end, he does not expect to rewrite the whole kernel in the Rust language.

Comments (24 posted)

Conclusion: is Linus happy?

By Jonathan Corbet
September 27, 2021

Maintainers summit
The final session of the Kernel Maintainers Summit is traditionally given over to Linus Torvalds, who uses the time to talk about any pain points he is encountering in the process and what can be done to make things run more smoothly. At the 2021 Summit, that session was brief indeed. It would appear that, even with its occasional glitch, the kernel development process is working smoothly.

Torvalds started by saying that the 5.15 merge window was not the easiest he has ever experienced. Part of the problem, he suggested, was that the merge window came at the end of the (northern-hemisphere) summer; much of Europe had been on vacation, and that led to a lot of pull requests showing up at the end of the merge window. In general, though, things are working. His biggest annoyance, perhaps, is having to say the same things over and over during each merge window. The core maintainers know how the process works, but those in less central positions tend to make the same mistakes repeatedly; when he takes over 100 pull requests during a merge window, it can add up to a fair amount of irritation.

Overall, though, Torvalds said that the community is doing pretty well. He also feels that he, personally, is not a bottleneck who is slowing others down.

He touched briefly on the issue of the folio patches, which were not merged for 5.15. These patches, he said, are reworking a core data structure that has been in the kernel since nearly the beginning. Delaying that work for one development cycle is just not a big problem. In general, he said, if there is controversy around a pull request, he will not actually do the pull.

Torvalds closed by saying that he would like to get feedback from developers if there is any subsystem that is particularly problematic for developers to work within. He is unable to help a lot on his own when it comes to subsystems that he does not know well.

At that point, Torvalds was finished, and the group was seemingly tired. The 2021 Maintainers Summit came to a close with no further discussion.

Comments (4 posted)

Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Briefs: Authenticated boot and disk encryption; TAB election results; coreutils 9.0; Youth hacking for freedom; Quotes; ...
  • Announcements: Newsletters; conferences; security updates; kernel patches; ...

Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds