LWN.net Weekly Edition for April 7, 2022

Welcome to the LWN.net Weekly Edition for April 7, 2022

This edition contains the following feature content:

Debian still having trouble with merged /usr: many years after other distributions have done away with the separation of /usr from the root filesystem, Debian is still struggling with it.
Indirect branch tracking for Intel CPUs: the kernel gains the ability to use a new(ish) control-flow integrity mechanism.
A security fix briefly breaks DMA: a look at the complications of DMA and how a seemingly simple security fix made things go wrong.
5.18 Merge window, part 2: the changes pulled into the mainline in the latter half of the 5.18 merge window.
Gathering multiple system parameters in a single call: a proposed new system call to gather filesystem (and beyond) information runs into resistance.

This week's edition also includes these inner pages:

Brief items: Brief news items from throughout the community.
Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Debian still having trouble with merged /usr

By Jake Edge
April 5, 2022

The addition of the "/usr merge" feature has been something of a longstanding mess in the Debian world. It seems like a relatively innocuous change, which is in keeping with the practice of most other distributions at this point; it effectively eliminates the top-level /bin, /sbin, and /lib* directories in order to move their contents to the corresponding locations under /usr. But ever since we first covered the feature introduction for Debian, more than six years ago, it has been a recurring series of headaches within that community. Recent events have seemingly only prolonged the pain, though perhaps the end is in sight.

Background

The /usr merge idea was first raised in "The Case for the /usr Merge" by Lennart Poettering in 2012. It came out of the systemd community, but was not meant to be systemd-specific, though, as the comment thread on that LWN post indicates, the idea was controversial at least in part because of its association with systemd. At its core, it was a straightforward plan to move the files in /bin to /usr/bin, then make a symbolic link so that /bin would resolve to /usr/bin; the same would be done for the other top-level directories of interest (/sbin and /lib*).

The main reasons behind the merge were compatibility with other Unix systems and making it easier to build packages for Linux distributions; upstream software, build tools, and the like would no longer have to treat Linux specially. Anything that relied on /bin/foo or /usr/bin/foo would find what it was looking for, but everything would live under /usr. That would permit things like an immutable, distribution-supplied /usr, simpler snapshots of the system state, and so on.

Fedora, as is often the case, took the lead and implemented merged /usr for Fedora 17 in 2012. Other distributions followed suit, with most of the mainstream distributions following the Fedora approach of having a "flag day" switch; new or upgraded versions of the distribution would either install a merged /usr or switch the system to use that as part of an upgrade. Supporting both the old and the new layout simultaneously was not something those distributions chose to do.

Debian took a more incremental approach, in part because it strives not to make wholesale changes to users' systems like those required by a flag-day upgrade to a merged /usr. In 2016, the ability to voluntarily switch to that scheme was added, then some attempts were made for newer versions of the distribution to be installed with a merged /usr by default. In 2018, the Debian build systems were switched to use that layout, but problems cropped up because packages built under a merged /usr would sometimes fail to work on non-merged systems. The location of some files was being resolved at build time to point into /usr/bin (for example), but those files only existed in /bin on the non-merged systems.

The build system changes were reverted, so that both types of systems could be supported. But the problem led Ian Jackson to ask the Debian technical committee (often abbreviated as CTTE or ctte), which is the ultimate arbiter of technical disputes within the project, to override the decision by the debootstrap developers to make merged /usr be the default. Debootstrap is used to create a base system in a directory that can then be used as a chroot environment for building additional packages, though it has other uses as well.

The committee declined to overrule the debootstrap maintainers, but did give a lengthy summary of the cases for and against /usr merge. The committee also used its advisory role to note that the then-under-development Debian 10 ("buster") would support both system types in a "weak" form, but suggested that it was desirable that Debian 11 ("bullseye") move toward a "middle" ground with better support for merged /usr.

Debian, being Debian, discussed the issue multiple times along the way, at varying levels of intensity and rancor. Supporting both types of systems was difficult in various ways, leading some to say that either /usr merge should be made mandatory—or scrapped entirely. At the end of 2020, Ansgar Burchardt asked the technical committee to decide whether Debian 12 ("bookworm") should only support merged /usr systems. At the end of January 2021, the question was put to a vote, and the result was unanimous in favor of doing so:

The Technical Committee resolves that Debian 'bookworm' should support only the merged-usr root filesystem layout, dropping support for the non-merged-usr layout.
Until after the release of 'bullseye', any implementation of this resolution must be done in the 'experimental' distribution, or otherwise kept out of the critical paths for the release of 'bullseye'.

Symlink farms

That would seem to resolve the situation, in favor of a flag day for Debian 12, but, as might be guessed, things did not quite work out that way. During the discussion on the vote, committee member Simon McVittie expressed concern that there were two competing visions of what a merged /usr meant. He wondered: "Should we be more specific than this in what we vote on, to avoid later having to adjudicate between developers who say that a particular implementation is or isn't merged-usr?" That comment would foreshadow the events to come.

The "standard" way to support /usr merge, as described above, is sometimes called "merged /usr via aliased directories". It is what the usrmerge tool, which got the ball rolling for Debian in 2016, does; that mechanism is tracked on the UsrMerge page on the Debian wiki. An alternative, which is championed by dpkg maintainer Guillem Jover, uses "symlink farms", which McVittie described as follows:

an arrangement where all regular files that have traditionally been in /bin, /sbin, /lib and /lib64 are physically located in /usr, with /bin etc. becoming "symlink farms" containing symlinks like /bin/sh -> /usr/bin/sh, /lib/ld-linux.so.2 -> /usr/lib/ld-linux.so.2 and so on

There was some discussion in the bug entry about the advantages of the two different arrangements, but most of the participants seemed to think that the symlink-farm approach was not desirable, viable, or both. Along the way, McVittie realized that the earlier decision on debootstrap seemed to indicate that the committee thought that the aliased-directories mechanism was the singular approach to solving the /usr merge problem. His vote reflected that understanding.

The discussion moved to the mailing lists, where a certain amount of consensus was found, at least in some areas. McVittie opened a bug asking the committee to resolve the aliased directories versus symlink farm question along with others swirling around the upcoming /usr merge transition for bookworm. He was asking the committee to use its formal advisory role to offer guidance on how that transition should proceed.

A lengthy statement was put out for a vote by the committee; once again, it passed unanimously. The advice clearly rejected the symlink-farm approach, in favor of aliased directories, and made it clear that packages could not assume that systems have a merged /usr until Debian 13 ("trixie") development starts. Beyond that, the committee advised that upgrading from either type of system to Debian 12 should work, packages should build and function on both types of systems (until after Debian 12), and a few other items. The bug was marked as "done" on October 18 and, once again, it seemed that the situation was resolved.

Dpkg resistance

Meanwhile, though, the dpkg maintainer was adamantly opposed to the aliased-directories approach; his complaints were described in a MergedUsr page in the dpkg team's section of the wiki. It presented a long list of problems with that approach because the mechanism is handled outside of dpkg itself:

This approach goes behind dpkg's back, and has caused and [does] now cause problems due to the aliased directories, as multiple pathnames canonicalize into different pathnames that point to the same dentry, which can mess up anything that handles pathnames in databases and similar. Note: dpkg has supported for a long time symlinked directories as a way to allow local admins to manage filesystem size constraints, by moving directories contents into other partitions, but has never supported aliased directories via symlinks pointing to other directories already tracked by dpkg. This approach makes it impossible to know what is the canonical (from dpkg's PoV) pathname for an object just by looking at the filesystem, so it is trivial for users use the wrong one.
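The aliasing that this complaint describes can be reproduced in a scratch directory. The following is a minimal sketch, not anything taken from dpkg or usrmerge: the function name `aliased_names_share_inode()`, the helper `join()`, and the `/tmp/usrmerge-demo-XXXXXX` template are all hypothetical, and it assumes a POSIX system with a writable /tmp. It builds `usr/bin/tool`, adds a `bin -> usr/bin` directory symlink, and checks that two different pathname strings reach the same inode.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build "root/rel" in buf; the demo paths are short, so truncation is a bug. */
static void join(char *buf, size_t len, const char *root, const char *rel)
{
    int n = snprintf(buf, len, "%s/%s", root, rel);

    assert(n > 0 && (size_t)n < len);
}

/*
 * Returns 1 if bin/tool and usr/bin/tool are different names for the
 * same inode, 0 if they are not, and -1 if the scratch tree could not
 * be set up.
 */
int aliased_names_share_inode(void)
{
    char root[] = "/tmp/usrmerge-demo-XXXXXX";
    char usr[PATH_MAX], usr_bin[PATH_MAX], bin[PATH_MAX];
    char tool_usr[PATH_MAX], tool_bin[PATH_MAX];
    struct stat st_usr, st_bin;
    int fd, result = -1;

    if (mkdtemp(root) == NULL)
        return -1;

    join(usr, sizeof(usr), root, "usr");
    join(usr_bin, sizeof(usr_bin), root, "usr/bin");
    join(bin, sizeof(bin), root, "bin");
    join(tool_usr, sizeof(tool_usr), root, "usr/bin/tool");
    join(tool_bin, sizeof(tool_bin), root, "bin/tool");

    if (mkdir(usr, 0755) != 0 || mkdir(usr_bin, 0755) != 0)
        goto out;
    fd = open(tool_usr, O_CREAT | O_WRONLY, 0755);
    if (fd < 0)
        goto out;
    close(fd);

    /* The aliased directory: bin is a symlink to usr/bin. */
    if (symlink("usr/bin", bin) != 0)
        goto out;

    if (stat(tool_usr, &st_usr) != 0 || stat(tool_bin, &st_bin) != 0)
        goto out;

    /* Different strings, same underlying file. */
    result = strcmp(tool_usr, tool_bin) != 0 &&
             st_usr.st_dev == st_bin.st_dev &&
             st_usr.st_ino == st_bin.st_ino;
out:
    unlink(tool_usr);
    unlink(bin);
    rmdir(usr_bin);
    rmdir(usr);
    rmdir(root);
    return result;
}
```

A database keyed on pathname strings would treat `bin/tool` and `usr/bin/tool` as two entries even though this function returns 1 for them; that mismatch is the kind of problem the quoted text is pointing at.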

Many of the entries in the list were marked with a symbol indicating: "This approach is considered broken by design and is unsupported by dpkg. dpkg-buildinfo marks packages built on these systems as tainted in the .buildinfo file." The symlink-farm approach is superior, the page says, even though it still has the aliasing problem for the symbolic links that get installed:

This still suffers mild aliasing problems, but only for pathnames that will end up in both locations due to backwards compatibility symlinks, which should keep decreasing, and any such issue will self-heal over time, eventually ending up being just a handful of them. The big difference is that in the long term this is a tiny and decreasing bounded set of pathnames that might end up causing problems, while the other layout is an unbounded set affecting all pathnames permanently. Another thing that makes a difference is that the object is a symlink, and not the same object being accessed from different pathnames.

Both that page and the dpkg FAQ entry recommend using the dpkg-fsys-usrunmess utility, which "undoes the merged-/usr-via-aliased-dirs mess". So it was perhaps unsurprising when Josh Triplett reported in mid-March that dpkg had started to emit a warning when it was installed or upgraded:

Setting up dpkg (1.21.2) ...
dpkg: warning: System unsupported due to merged-usr-via-aliased-dirs.
dpkg: warning: See <https://wiki.debian.org/Teams/Dpkg/FAQ#broken-usrmerge>.

This escalation seems in direct contradiction to the tech-ctte decision in 994388. Moreover, this seems to effectively use package maintainer scripts as a means of directing a complaint at Debian users that has not gotten traction in other forums, and then directing such users at a wiki page that contradicts a prior project decision.

His report was aimed at the committee bug that resulted in the decision back in October. As Russ Allbery noted, that kind of a warning "will be perceived by users as an official declaration from Debian as a project that their system configuration is unsupported, while simultaneously this is the default installation mode for new systems and something that we have elsewhere said is a correct system configuration". Burchardt said that the warning was already causing confusion and asked the committee to decide quickly on how to handle it "to avoid this becoming yet another energy drain (we had several sufficiently long enough threads about this topic already)".

Bugs and fixes

There is a question of what "supported" means in the context of /usr merge, Helmut Grohne said; he was added to the committee at the beginning of the year after two members reached their term limit and retired from it. He said:

At this point, neither merged-/usr nor unmerged-/usr is supported well. Both are broken in one way or another and nobody steps up to fix the mess. In particular, the dpkg maintainer does not support merged-/usr in dpkg (which is his constitutional right as long as he does not block reasonable patches), but neither does anyone else. As such I find it difficult to disagree with the content of the warning. I do see how it confuses people. It definitely does not reach people who could do something about [it]. Rather it takes users as hostages.

As had been mentioned in some of the (many) threads surrounding this issue, Ubuntu has successfully made the move to a merged /usr; as Luca Boccassi put it: "And on top of new installations, old installations of Ubuntu upgrading to 21.10 and/or the soon-to-be-released 22.04 have been [forcefully] migrated too. They are not blocked, unsupported, or broken." There have been few problems with that transition, at least according to /usr merge proponents, so whatever bugs exist are not show-stopping. But Allbery cautioned against taking that too far:

We know that merged-/usr is buggy, in that one can construct a set of package operations that leave the system in an invalid state. We have a project disagreement over how serious those bugs are. No one is stepping forward to fix those bugs, which is indeed quite unfortunate. I personally strongly disagree with the belief that simply because Ubuntu hasn't seen many instances of this class of bugs while using a package set where people have not moved files between packages and out of /lib and /bin very much if at all, it is acceptable to leave dpkg in that buggy state.
[...] I personally am disappointed that the folks who have been pushing merged-/usr forward are willing to leave dpkg in a known-buggy state without attempting to patch it to fix the remaining issues. I realize that there are various obstacles in successfully doing that, not all of which are technical, but I want to believe that Debian is the sort of project that will do the hard work (both technical and social) to fix edge cases and maintain a high level of consistency and correctness.

But Triplett disagreed with that characterization:

It does not seem at all obvious that such patches would have been accepted, given the repeated vehement objections from the dpkg maintainer about the chosen approach. Those objections did not invite contribution; at every point, the assertion was that usrmerge was broken, not that dpkg needed help supporting it.

Boccassi said that looking at Ubuntu's experience is legitimate; the problems identified are relatively small and "it's believed to be impossible to get them fixed" over the maintainer's objections. He asked what else would be a reasonable way forward. Allbery replied that proponents of the change should create a patch to fix the problems that have been identified in dpkg even though it might be rejected. In part, it comes down to finding something that is actionable under the powers of the technical committee, he said in a follow-up: "It's difficult, procedurally, for the TC to do anything about a theoretical patch that someone could write but hasn't written."

Committee chair Sean Whitton agreed with that analysis, though he did say that he thought the committee could take action to get the warning changed or removed; other members concurred with that view. Shortly thereafter, Triplett pointed out that the warning message had changed, though it was not really any better in his view.

Grohne also wanted to see fixes for dpkg to better support merged /usr. He even mentioned the possibility of reverting the change:

If the resulting bugs do not get fixed, we may need to consider other means for limiting their impact. The most obvious method here is revisiting the decision and considering whether the /usr-merge may have failed. On a process level, it certainly has failed. At some point, we may need to look at a bigger picture than the technical one. If the people driving the change are not able to do it, then maybe we should not have that change in the first place and revert back to the known working state. Of course that route is not without cost.

A patch arrives

But as the thread progressed, Boccassi noted that a link to a dpkg patch had just been posted to IRC by a user named "uau". Allbery (among others) was elated to see the patch and immediately had some feedback on ways to improve it. Committee member Gunnar Wolf thought that the "patch seems sane from a first, very much 10000m-point-of-view"; he wondered if it had "been shared with Guillem, or included in any relevant bug report".

Boccassi followed up with some more information from the author of the patch, again from IRC. The author said "that some time ago the patch was presented to the dpkg maintainer, who rejected it with an answer along the lines of the usual 'usrmerge is broken by design', with no further comment". Boccassi wondered if those who had been looking for a patch of this nature would "try and take it forward themselves". Based on that conversation with the author, the license of the patch is the same as for dpkg, so that should not be a barrier.

Grohne said that he remembered some more concrete criticism of the patch and that the patch itself says that it is incomplete, so it cannot go into dpkg in its current form. He lamented the state of the transition, and once again raised the specter of reverting the whole thing:

The more and more I have to deal with the /usr-merge the more I get disappointed by how badly this transition is planned and carried out. In principle, the technical merits seem solvable to me, but the total failure on the process level leaves me wish for a revert. I am really surprised that instead of improving the process, you carry on with that destructive attitude. Given this, it seems unsurprising that Guillem does not want to interact with you. Of course that's not an excuse for implementing the recent changes to dpkg. The communication is clearly failing on both sides, which is why we're here at the ctte again.

Burchardt asked that Grohne stop suggesting a revert unless he wanted to start the process of asking the committee (or the developers by way of a general resolution) to do so. There are lots of problems with making that change at this point, so: "If you don't want that solution, please don't suggest it repeatedly: it's non-motivating to spend any further time."

Things went further off the rails on March 29, with a non-maintainer upload (NMU) for dpkg made by Bastian Blank that removed the warning and stopped installing dpkg-fsys-usrunmess. An NMU is something of a hostile act for a package with an active maintainer. As reported by Triplett, the NMU was quickly overridden by Jover with a comment: "This also clears a bullying NMU." As Triplett put it: "we're now even further into full-blown 'fights in the archive' territory".

In his most recent message (as of this writing), Grohne said that Boccassi was not being constructive and that he was not surprised Jover did not want to communicate with him about the problems. He reiterated some of the problems that have occurred with the feature, noting that the current problem was largely not technical:

It has a history of its proponents not fixing the resulting bugs, but deferring them to others and/or denying/downgrading them. I've definitely spent more than a week on fixing /usr-merge breakage excluding the time discussing it. It is not working fine at all. Possibly, it is fixable on a technical level, but it is totally broken socially. Please stop this unconstructive behaviour.
While this may sound single-sided, Guillem's behaviour wrt /usr-merge cannot be described as constructive either. Rest assured, that side of the picture is not being ignored. That should also be evident from dpkg.git at this time.

The dpkg Git repository does show a recent change by Jover that removes the offending warning, but only on Debian systems. The commit message could certainly be interpreted as non-constructive, as well, however. For example: "On Debian some people seem so offended by a (factual) warning, that the obvious recourse has been for them to bully and abuse." The communication between proponents of /usr merge and the maintainer of the critical package-manager for the distribution seems utterly broken at this point—probably irretrievably so.

It is not clear where things go from here. The technical committee has repeatedly made its feelings known on the path it wants to see the project take, but it cannot order anyone to do anything, exactly. Debian developers are volunteers and its packages are the personal fiefdoms of their maintainers. The Debian Constitution does allow the committee to override developer decisions—to not accept a patch, for example—but someone has to voluntarily do the work to bring the late-breaking dpkg patch up to snuff.

It seems like some project elder (in terms of experience, not necessarily age) could steer this kind of change through the rocks. While it would appear to be vanishingly unlikely that the committee would backtrack on its three earlier decisions, even though there are two new members since they were made, it is not impossible, at least in theory. The bugs that exist in dpkg with the aliased-directories version of /usr merge may simply live on as irritations for a time, since they do not really seem to rise to the level of a release-blocking bug, at least so far. We will all have to wait and see how it turns out in the next episode of "As the Debian Turns" ...

Indirect branch tracking for Intel CPUs

By Jonathan Corbet
March 31, 2022

"Control-flow integrity" (CFI) is a set of technologies intended to prevent an attacker from redirecting a program's control flow and taking it over. One of the approaches taken by CFI is called "indirect branch tracking" (IBT); its purpose is to prevent an attacker from causing an indirect branch (a function call via a pointer variable, for example) to go to an unintended place. IBT for Intel processors has been under development for some time; after an abrupt turn, support for protecting the kernel with IBT has been merged for the upcoming 5.18 release.

The kernel, like many C programs, makes extensive use of indirect branches. As a simple example, consider system calls; user space provides a number indicating which system call is required, and the kernel responds by looking up the appropriate function from a table (using that number) and calling that function via an indirect branch. Function pointers abound in the kernel; among other things, they are used to implement its vaguely object-oriented programming model.
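As a rough user-space illustration of that table lookup (not the kernel's actual system-call code), the sketch below dispatches on a number through an array of function pointers. Every name in it (`demo_dispatch()`, `demo_table`, and the two handlers) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Every handler in this toy table has the same prototype. */
typedef long (*demo_syscall_fn)(long arg);

static long demo_answer(long arg)
{
    (void)arg;
    return 4242;
}

static long demo_double(long arg)
{
    return 2 * arg;
}

/* The "system call table": the call number is the array index. */
static const demo_syscall_fn demo_table[] = {
    demo_answer,    /* number 0 */
    demo_double,    /* number 1 */
};

#define DEMO_NR_CALLS (sizeof(demo_table) / sizeof(demo_table[0]))

long demo_dispatch(unsigned long nr, long arg)
{
    demo_syscall_fn handler;

    if (nr >= DEMO_NR_CALLS)
        return -ENOSYS;     /* unknown call number */

    handler = demo_table[nr];
    assert(handler != NULL);

    /* This call through a pointer is the indirect branch. */
    return handler(arg);
}
```

If an attacker could overwrite an entry in a writable table like this one, the final `handler(arg)` call would go wherever the corrupted pointer says, which is the situation that indirect branch tracking is meant to constrain.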

If an attacker is able to somehow corrupt a variable that is used for indirect branches, they may be able to redirect the kernel's execution flow to an arbitrary location. That could result in unintended function calls; on complex processors like x86, it is also possible to get interesting results by jumping into the middle of a multi-byte instruction. Exploit techniques like return-oriented programming and jump-oriented programming depend on this kind of redirection.

IBT is meant as a defense against jump-oriented programming; it works by trying to ensure that the target of every indirect branch is, in fact, intended to be reached that way. There are a number of approaches to IBT, each with its own advantages and disadvantages. For example, the kernel gained support for a compiler-implemented IBT mechanism during the 5.13 development cycle. In this mode, the compiler routes every indirect branch through a "jump table", ensuring that the target is not only meant to be reached by indirect branches, but that the prototype of the called function matches what the caller is expecting. This approach works, at the cost of a fair amount of compile-time and run-time overhead.

Intel's IBT

The Intel IBT approach is rather simpler, but it has the advantage of being supported by the hardware and, as a result, being faster. If IBT is enabled, the CPU will ensure that every indirect branch lands on a special instruction (endbr32 or endbr64), which executes as a no-op; if anything else is found, the processor will raise a control-protection (#CP) exception. Unlike the more complete scheme described above, IBT cannot ensure that the target of an indirect branch matches the caller's expectations, but it can ensure that the target was meant to be reached in this way.
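The landing-pad rule can be modeled in a few lines of ordinary C. This is only a simulation of the check over a byte array, with a hypothetical function name (`demo_ibt_target_ok()`); it models 64-bit mode alone, so it looks only for `endbr64`, whose encoding is the four bytes F3 0F 1E FA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Machine encoding of the endbr64 instruction. */
static const unsigned char DEMO_ENDBR64[4] = { 0xf3, 0x0f, 0x1e, 0xfa };

/*
 * Model of the IBT check: an indirect branch to text[target] is allowed
 * only if an endbr64 instruction starts exactly at that offset. A real
 * CPU would raise #CP where this function returns false.
 */
bool demo_ibt_target_ok(const unsigned char *text, size_t len, size_t target)
{
    assert(text != NULL);

    if (target > len || len - target < sizeof(DEMO_ENDBR64))
        return false;

    return memcmp(text + target, DEMO_ENDBR64, sizeof(DEMO_ENDBR64)) == 0;
}
```

For a function consisting of `endbr64; xor %eax,%eax; ret` (bytes F3 0F 1E FA 31 C0 C3), offset 0 passes the check while offset 4, the middle of the function, does not. Note that the check says nothing about whether the function's prototype matches what the caller expected.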

Turning on a mechanism like this will only work if every possible target of an indirect branch begins with one of the endbr instructions. For the most part, this task can be handled by the compiler; both GCC (as of GCC 9) and Clang (as of version 14) implement the -fcf-protection=branch option and will insert these instructions when it is present. That doesn't help with all of the assembly code in the kernel, though. So the bulk of the work (in terms of changesets) is devoted to adding endbr instructions wherever they seem to be needed.

One other small complication comes about when the kernel calls into somebody else's code, which may not have been built with IBT in mind. The kernel does not call outside code often, but one big exception is the system's firmware, which must often be invoked to carry out specific functions. To be safe, the kernel makes a point of turning off IBT around calls into firmware. The current implementation also turns off IBT when giving control to user space.

The need to add endbr instructions to all indirect jump targets sets a potential trap for the future; developers may add assembly functions and forget that instruction. If they do their testing without IBT enabled, the omission will not be noticed, and it may not pop up until some extremely inconvenient time after the faulty work has been merged. To prevent this eventuality, the kernel's objtool utility has been enhanced to check all indirect branches and ensure that all targets are appropriately annotated.

With that checking in place, though, there's another step that can be taken: objtool can also make a list of all functions containing endbr instructions that can never be called via an indirect branch. Those functions do not need that annotation, and the kernel would be a little more secure without them. So the kernel build process takes that list from objtool and "seals" those functions by overwriting the endbr with a nop4 instruction. That reduces the number of targets an attacker can still choose from when IBT is enabled.
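The sealing step amounts to patching four bytes at each listed address. The sketch below imitates it on a byte array; it is not the kernel's code. The name `demo_seal()` and the offset-list format are hypothetical, and the replacement bytes (0F 1F 40 00, one common four-byte NOP encoding) are an assumption; the kernel may write a different four-byte sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const unsigned char DEMO_ENDBR64[4] = { 0xf3, 0x0f, 0x1e, 0xfa };
/* One valid four-byte NOP encoding; assumed here for illustration. */
static const unsigned char DEMO_NOP4[4] = { 0x0f, 0x1f, 0x40, 0x00 };

/*
 * Overwrite the endbr64 at each offset in sealable[] with a four-byte
 * NOP. Offsets that are out of range, or that do not currently hold an
 * endbr64, are skipped. Returns the number of sites actually sealed.
 */
size_t demo_seal(unsigned char *text, size_t len,
                 const size_t *sealable, size_t count)
{
    size_t i, sealed = 0;

    assert(text != NULL && (sealable != NULL || count == 0));

    for (i = 0; i < count; i++) {
        size_t off = sealable[i];

        if (off > len || len - off < sizeof(DEMO_ENDBR64))
            continue;
        if (memcmp(text + off, DEMO_ENDBR64, sizeof(DEMO_ENDBR64)) != 0)
            continue;

        memcpy(text + off, DEMO_NOP4, sizeof(DEMO_NOP4));
        sealed++;
    }
    return sealed;
}
```

Because the replacement has the same length as the original instruction, direct calls to a sealed function still work unchanged; only indirect branches to it would now fault.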

As Peter Zijlstra pointed out, there is another, perhaps surprising advantage to removing the unneeded endbr instructions. The kernel limits the functions that are available to loadable modules, and proprietary modules are limited even further. It is a common technique for proprietary modules to look up the non-exported functions they need in the kernel's symbol table, then call them via an indirect branch, thus bypassing the kernel's limitations. But, with IBT enabled, any function lacking an endbr instruction will no longer be callable in this way.

An indirect path to the mainline

The effort to get Intel IBT support into the Linux kernel has been ongoing for some time; the first patches implementing support (for user-space code rather than for the kernel) were posted by Yu-cheng Yu in 2018. This work then seemingly became one of those flying-Dutchman patches that continually cross the mailing lists without ever managing to land in the mainline; version 30 was posted in August 2021 and seemed no closer to merging. A similar fate befell the user-space shadow-stack patches, which were recently taken over by Rick Edgecombe after many previous revisions.

Late last year, Peter Zijlstra decided to create a separate Intel IBT implementation to protect the kernel itself; the first version was posted last November after Zijlstra evidently "hacked this up on Friday night / Saturday morning". The work evolved quickly, and the fourth revision, posted in early March, is the code that was merged for 5.18.

That is where things stand today. IBT is supported, for kernel code only, in Intel processors starting with the Tiger Lake generation, which hit the market in late 2020. It is not a perfect tool, but it will raise the bar for attackers on systems where it is present and enabled. Meanwhile, it is not clear when (or whether) user-space support will find its way into the kernel; many of the 30 revisions posted so far have received no comments at all.

A security fix briefly breaks DMA

By Jonathan Corbet
April 1, 2022

In theory, direct memory access (DMA) operations are simple to understand; a device transfers data directly to or from a memory buffer managed by the CPU. Almost all contemporary devices perform DMA, since it would not be possible to obtain the needed performance without it. Like so many things, DMA turns out to be a bit more complicated in practice. That complexity led to an erroneous patch, intended to improve security, breaking DMA for some devices in 5.17 and some stable kernels.

In the simple model of a DMA transfer, there is a buffer in memory that can be accessed, for either reading or writing, by both the CPU and the peripheral device, and either side can access the buffer at will. Some systems actually work this way but, in most others, various complications come into play. For example, there is almost certainly a set of caches between the CPU and the memory buffer. If the CPU writes data to the buffer with the intent of transferring it to the device, that data may be resident in the cache for some time before being flushed to main memory. If the device is instructed to read the data from the buffer, it may not see the data in cache, resulting in incorrect (corrupted) data being written to the device.

Similarly, on many architectures, the CPU cache may be entirely unaware of data written to the DMA buffer by the device. If the CPU tries to read that data, it could instead get stale data from the cache, once again resulting in corrupted data. Long experience in the kernel-development community has shown that users have a certain tendency to become irate when that happens.

Ownership, direction, and bounce buffers

The kernel's DMA-support layer has grown a set of mechanisms designed to prevent data corruption and the unhappiness that follows from it. In particular, the DMA API can avoid the pitfalls that come with systems that are not cache-coherent, where copies of data in different locations may not be synchronized with each other. When dealing with non-cache-coherent (or "streaming") mappings, driver code must keep track of two important attributes: the ownership of the buffer and the direction that the data is moving. Ownership describes which side (the CPU or the device) is able to access the buffer at any given time, while data direction describes what the owner will be doing with the buffer. There are two functions (among others) that manage these attributes for a given DMA buffer:

    void dma_sync_single_for_device(struct device *dev, dma_addr_t addr,
                                    size_t size, enum dma_data_direction dir);
    void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
                                 size_t size, enum dma_data_direction dir);

When a DMA buffer is first mapped, the ownership belongs to the CPU. A call to dma_sync_single_for_device() sets the ownership to the device, while dma_sync_single_for_cpu() brings the ownership back to the CPU. The dir argument to both functions describes the direction that data is moving; it is one of DMA_NONE, DMA_TO_DEVICE, DMA_FROM_DEVICE, or DMA_BIDIRECTIONAL. Note that terms like "read" and "write" are not used here, since in almost any DMA transaction one side is reading from the buffer while the other is writing to it.

The combination of the ownership and the direction tells the DMA layer what needs to be done at any given time. If the CPU is passing ownership to the device, and the direction is DMA_TO_DEVICE, then the DMA layer must take care to flush any data from the CPU caches to the buffer so that the device sees the correct data. When taking ownership back from the device, with a direction of DMA_FROM_DEVICE, the DMA layer should instead invalidate any cached data, since it may well be incorrect. As long as the driver is careful to set the ownership and direction of the buffer properly at all times, there should be no problems with data corruption.
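
The flush-and-invalidate rules above can be sketched as a small user-space model. This is only an illustration, not kernel code: struct toy_buf, the toy_* functions, and the TOY_* direction names are all hypothetical, and the "cache" is simply a second array standing in for the CPU's private view of the buffer.

```c
#include <string.h>

#define BUF_SIZE 8

/* Hypothetical stand-ins for DMA_TO_DEVICE and DMA_FROM_DEVICE. */
enum toy_dir { TOY_TO_DEVICE, TOY_FROM_DEVICE };

/* Toy model: "cache" is what the CPU sees, "memory" is what the device sees. */
struct toy_buf {
    unsigned char cache[BUF_SIZE];
    unsigned char memory[BUF_SIZE];
};

/* Hand the buffer to the device: flush CPU writes so the device sees them. */
static void toy_sync_for_device(struct toy_buf *b, enum toy_dir dir)
{
    if (dir == TOY_TO_DEVICE)
        memcpy(b->memory, b->cache, BUF_SIZE);  /* models a cache flush */
}

/* Take the buffer back: discard stale cached data and reread memory. */
static void toy_sync_for_cpu(struct toy_buf *b, enum toy_dir dir)
{
    if (dir == TOY_FROM_DEVICE)
        memcpy(b->cache, b->memory, BUF_SIZE);  /* models an invalidate */
}
```

In this model, data written to cache is invisible in memory until toy_sync_for_device() is called with TOY_TO_DEVICE, and data placed in memory by the "device" only shows up in cache after toy_sync_for_cpu() with TOY_FROM_DEVICE.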

There is one other complication to cover in order to understand what went wrong. There are times when the device can perform DMA, but it cannot directly access the buffer that the CPU is using. The buffer may be in a region of memory that the device cannot reach, it may be split into too many discontiguous pieces, or there may be an I/O memory-management unit (IOMMU) complicating the picture. When this happens, a separate DMA-layer module (called the "software I/O translation lookaside buffer" or SWIOTLB) allocates a "bounce buffer" that is accessible to the device.

In this situation, data must often be copied between the two buffers. If the CPU is sending data to the device, for example, data must be copied ("bounced") from the CPU's buffer to the bounce buffer, and the address of the bounce buffer handed to the device. Data will be copied in the opposite direction when data is being received from the device. The good news is that the DMA API is able to handle this case transparently as long as the ownership calls described above are properly used. Device drivers need not know whether a bounce buffer is in use or not.

A problem with bounce buffers

There is a potential security problem with bounce buffers, though. If data is being transferred from the device to the CPU, the DMA code will, when ownership of the DMA buffer returns to the CPU, copy the contents of the bounce buffer back to the CPU's buffer. If, however, the device did not write a full buffer's worth of data, then some of the data copied out of the bounce buffer will have originated from somewhere else. It may come from a previous I/O operation, or from an entirely unrelated kernel subsystem. The device driver is then likely to copy this data back to user space; indeed, the buffer might be directly mapped into user space to begin with. The end result is that data is leaked from the kernel to user space.
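
A toy user-space model may make the leak easier to see. Everything here is hypothetical (struct toy_mapping and the toy_* helpers are not kernel interfaces); the sketch only assumes that returning ownership to the CPU copies the entire bounce buffer back, as described above.

```c
#include <string.h>

#define BUF_SIZE 16

/* Toy model of a driver buffer shadowed by a SWIOTLB-style bounce buffer. */
struct toy_mapping {
    char cpu_buf[BUF_SIZE];   /* what the driver (and user space) sees */
    char bounce[BUF_SIZE];    /* what the device sees */
};

/* Hypothetical device that writes only len bytes of its answer. */
static void toy_device_write(struct toy_mapping *m, const char *data,
                             size_t len)
{
    memcpy(m->bounce, data, len);
}

/* Ownership returns to the CPU: the whole bounce buffer is copied back,
 * including any bytes that the device never touched. */
static void toy_sync_for_cpu(struct toy_mapping *m)
{
    memcpy(m->cpu_buf, m->bounce, BUF_SIZE);
}
```

If the bounce buffer still holds data from an earlier user and the device writes only four bytes, the remaining twelve bytes of cpu_buf end up containing that older data after toy_sync_for_cpu().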

It makes sense to zero the bounce buffer before setting it up, but DMA buffers are often used many times, and it is not normal to zero them between operations. So even a bounce buffer that was zeroed at the beginning may accumulate unrelated data and expose it to user space. A real-world example of this problem was deemed CVE-2018-1000204 and fixed in 2018, but that fix is only partial, since it only zeroes the buffer at allocation time.

Halil Pasic wrote a more complete fix that was merged in February; a followup fix was added in early March. It works by changing the behavior of dma_sync_single_for_device(). If the data direction is DMA_TO_DEVICE, the contents of the CPU's buffer must be copied to the bounce buffer, and the DMA API has always done that. Pasic's patch caused that same copy to happen for all operations, even when the direction is DMA_FROM_DEVICE.

Normally, it would not make sense to copy data into the bounce buffer for DMA_FROM_DEVICE; the device is just going to overwrite it anyway. But this copy ensures that, if the device only writes part of the bounce buffer, the remainder of the CPU's buffer will not be overwritten by random data when the bounce buffer is copied back; instead, it will get back a copy of the data that was already there. In a sense, it makes the end result reflect what the device has actually done, in that only data written by the device will change in the CPU's buffer.
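
The effect of that extra copy can be shown with a small user-space model. It is a sketch of the behavior described here, not the actual SWIOTLB code; struct toy_mapping and both functions are hypothetical names.

```c
#include <string.h>

#define BUF_SIZE 16

/* Toy model: a CPU buffer and the bounce buffer that the device sees. */
struct toy_mapping {
    char cpu_buf[BUF_SIZE];
    char bounce[BUF_SIZE];
};

/* Patched behavior as described above: every hand-off to the device
 * copies the CPU buffer into the bounce buffer, whatever the direction. */
static void toy_sync_for_device(struct toy_mapping *m)
{
    memcpy(m->bounce, m->cpu_buf, BUF_SIZE);
}

/* Ownership returns to the CPU: the whole bounce buffer is copied back. */
static void toy_sync_for_cpu(struct toy_mapping *m)
{
    memcpy(m->cpu_buf, m->bounce, BUF_SIZE);
}
```

With this ordering, a device that writes only the first few bytes of the bounce buffer leaves the rest of cpu_buf exactly as it was before the transfer, since the stale bounce-buffer contents were overwritten by the hand-off copy.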

The latter fix was merged for the 5.17-rc8 development kernel — less than two weeks before the final 5.17 release — and quickly found its way into the 5.16.15 and 5.15.29 stable updates.

Regression

On March 23, Oleksandr Natalenko reported that this change broke the ath9k wireless driver. There followed an extended discussion where it took a while to figure out what was really going on. Robin Murphy was initially incredulous:

I'm extremely puzzled how any driver could somehow be dependent on non-device-written data getting replaced with random crap, given that it wouldn't happen with a real IOMMU, or if SWIOTLB just didn't need to bounce, and the data would hardly be deterministic either.

The source of the problem was eventually narrowed down to some code in ath_edma_get_buffers(). This code executes a sequence that looks like this:

    dma_sync_single_for_cpu(..., DMA_FROM_DEVICE);
    if (!a_packet_is_ready()) {
        dma_sync_single_for_device(..., DMA_FROM_DEVICE);
        return false;
    }

The code is taking ownership of the buffer, checking to see if the device has put a packet there, then returning ownership to the device if there is no packet available. The key point here is that, while this is happening, the device is still writing to the buffer; that packet could be arriving while the driver is checking for it. This procedure worked just fine until the change went in; with the new behavior, the dma_sync_single_for_device() call copies the CPU buffer into the bounce buffer, potentially overwriting data that was just placed there by the device. This happens often enough, it seems, to reliably break the device.
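
The race can be reproduced in a toy user-space model. The names below (struct toy_mapping, toy_poll_with_race(), the copy_always flag) are hypothetical, and the device's write is simulated by a memcpy() placed between the two sync calls; this is a sketch of the sequence described above rather than the driver's actual code.

```c
#include <stdbool.h>
#include <string.h>

#define BUF_SIZE 8

/* Toy model: a CPU buffer and the bounce buffer that the device sees. */
struct toy_mapping {
    char cpu_buf[BUF_SIZE];
    char bounce[BUF_SIZE];
};

/* Ownership returns to the CPU: the bounce buffer is copied back. */
static void toy_sync_for_cpu(struct toy_mapping *m)
{
    memcpy(m->cpu_buf, m->bounce, BUF_SIZE);
}

/* A DMA_FROM_DEVICE-style hand-off; copy_always selects the patched
 * behavior (true) or the older do-nothing behavior (false). */
static void toy_sync_for_device(struct toy_mapping *m, bool copy_always)
{
    if (copy_always)
        memcpy(m->bounce, m->cpu_buf, BUF_SIZE);
}

/* One polling round with a packet arriving in the middle of the check.
 * Returns true if the packet is still in the bounce buffer afterward. */
static bool toy_poll_with_race(struct toy_mapping *m, bool copy_always)
{
    toy_sync_for_cpu(m);                      /* CPU looks: nothing yet */
    memcpy(m->bounce, "PACKET!!", BUF_SIZE);  /* device writes meanwhile */
    toy_sync_for_device(m, copy_always);      /* driver hands it back */
    return memcmp(m->bounce, "PACKET!!", BUF_SIZE) == 0;
}
```

In this model the packet survives when copy_always is false and is wiped out by the stale CPU-side copy when it is true.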

Developers like Christoph Hellwig initially saw the ath9k behavior as a bug, and felt that the problem should be fixed there. Murphy described this behavior as "a violation of the DMA API". The problem was that the device was allowed to keep writing to the DMA buffer even though the ownership had shifted to the CPU. It seemed, for a moment, that the ath9k driver could be fixed and the bounce-buffering change could remain.

Linus Torvalds disagreed strongly, though, for a few different reasons. He noted that ath9k might not be the only driver that shows this kind of problem; it just showed up first there because those adapters are widely used. If other drivers do similar things, users and developers could end up chasing bugs for a long time. Some of those bugs might well make it into production kernel releases before being noticed.

More to the point, though, he stated that the ath9k driver's behavior was correct, and the bounce-buffer change was not. Specifically, he pointed out that the dma_sync_single_for_device() call specified DMA_FROM_DEVICE. In that situation, he claimed, the bounce-buffer implementation should do nothing at all; the data is coming from the device, so the CPU has no business copying data into the bounce buffer. So the only right thing to do is to revert the patch.

In the end, reverting the commit is exactly what happened, though it was later partly reinstated to preserve the parts of the patch that were not problematic. So the ath9k driver works again, and potential bugs in an unknown number of other drivers have been avoided. The problem of leaking random data out of the bounce buffer remains unsolved; a different approach will need to be found to resolve that one.


5.18 Merge window, part 2

By Jonathan Corbet
April 4, 2022

Linus Torvalds released the 5.18-rc1 kernel prepatch on April 3, after having pulled 13,207 non-merge changesets into the mainline repository. This merge window has thus not only been turbulent, with a significant number of regressions and refused pull requests, it has also been relatively busy. Just over 9,000 of those changesets were pulled after the first 5.18 merge window summary was written; the time has come to catch up with the remainder of changes merged for this development cycle.

The most interesting changes pulled in the latter part of the 5.18 merge window include:

Architecture-specific

Live patching is now supported for 32-bit PowerPC systems.
The RISC-V architecture has gained support for the "Sv57" page-table format. This is a five-level format with support for 57-bit virtual addresses.
The RISC-V perf implementation has been ripped out and replaced with one based on the SBI PMU and Sscofpmf extensions; that allows for support of most perf features.
The RISC-V SBI CPU-idle extension is supported as of 5.18.
RISC-V has also gained support for restartable sequences.
The Intel software-defined silicon driver, which allows Intel to control which features of a given processor can be used, has been merged.
Support for AMD's "host system management port", which is "an interface to provide OS-level software with access to system management functions via a set of mailbox registers", has been merged. A small amount of additional information can be found in this documentation commit.

Core kernel

The BPF type format (BTF) mechanism can now annotate variables that refer to user-space memory. Among other things, this gives the verifier a better way to detect and check user-space memory accesses. See this merge changelog for more information.
The BPF program-packing memory allocator has been merged; it allows for more efficient memory use in systems with a large number of loaded BPF programs.
The MADV_DONTNEED madvise() command now works with hugetlb pages.
There is a new madvise() operation, MADV_DONTNEED_LOCKED, which will (like MADV_DONTNEED) cause the reclaim of the indicated pages. Unlike MADV_DONTNEED, though, this operation even applies to pages that have been locked into memory; the pages are forced out, but their "locked" status does not change. So if the affected pages are faulted back in, they will be locked again. This changelog explains the reasoning behind this functionality.
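
As an illustration of how a program might use the new madvise() operation, here is a minimal sketch. The drop_pages() helper is a hypothetical name, and the fallback definition of MADV_DONTNEED_LOCKED (for building against older headers) assumes the value used by the 5.18 generic UAPI headers. The helper also assumes that an EINVAL return means the running kernel does not know the new operation, which is not the only possible cause of that error.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_DONTNEED_LOCKED
#define MADV_DONTNEED_LOCKED 24  /* assumed: value from the 5.18 UAPI headers */
#endif

/*
 * Ask the kernel to drop the pages in [addr, addr + len), including pages
 * that are locked into memory. If the kernel rejects the new operation
 * with EINVAL, fall back to plain MADV_DONTNEED, which only works for
 * unlocked pages. Returns 0 on success, -1 with errno set on failure.
 */
static int drop_pages(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_DONTNEED_LOCKED) == 0)
        return 0;
    if (errno != EINVAL)
        return -1;
    return madvise(addr, len, MADV_DONTNEED);
}
```

For a private anonymous mapping, the dropped pages read back as zeroes the next time they are touched.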

Filesystems and block I/O

Device-mapper I/O accounting has been significantly reworked, resulting in much more accurate accounting for targets like dm-crypt.
The Reiserfs filesystem has been deprecated with an eye toward removal in 2025.
Support for write streams has been removed from the block subsystem. There are currently no devices supporting that functionality and no real prospect of any being added.
64-Bit integrity checksums on NVMe devices are now supported.
The exfat filesystem has a new mount option (keep_last_dots) that will cause it to not strip trailing periods from file names; this makes the filesystem incompatible with Windows, which does strip trailing periods.

Hardware support

Clock: Microchip PolarFire clock controllers, Renesas RZ/G2L clock controllers, Renesas 9-series PCIe clock generators, NXP i.MX93 clock controllers, StarFive JH7100 audio clocks, Apple M1 numerically controlled oscillators, Qualcomm QCM2290, SM6125, and SM6350 display clock controllers, Qualcomm SM6350 graphics clock controllers, and Allwinner H616/R329 RTC clock controllers.
Graphics: ChromeOS privacy-screen controllers, ITE IT6505 DisplayPort bridges, Solomon SSD130x OLED displays, and MIPI DBI-compatible panels.
Industrial I/O: Semtech SX9324 and SX9360 proximity sensors, Analog Devices ADXL367 3-axis accelerometers, Analog Devices ADMV1014 microwave downconverters, Analog Devices ADA4250 instrumentation amplifiers, Analog Devices ADMV4420 K-band downconverters, and Analog Devices LTC2688 digital-to-analog converters.
Miscellaneous: SiGma Micro-based keyboards, Airoha EN7523 GPIO controllers, uPI uG3105 battery monitors, Injoinic IP5xxx power bank ICs, Macronix external hardware ECC engines, Silergy SY7636A temperature sensors, Maxim Semiconductor MAX77714 power-management ICs, AMD PSP I2C semaphores, MediaTek ADSP mailbox controllers, ASPEED PECI controllers, Layerscape security fuse processors, Sunplus on-chip controllers, Sunplus UARTs, Rockchip NANENG COMBO PHYs, MediaTek keypads, and Imagis IST30xxC touchscreens.
Networking: Realtek RTL8367S Ethernet switches, Davicom dm9051 SPI Ethernet controllers, Fungible Ethernet adapters, MediaTek MT7986 wireless MACs, MediaTek MT7921U 802.11ax 2x2:2SS wireless adapters, Lynx 28G SerDes PHYs, and I2C-connected Management Controller Transport Protocol (MCTP) devices implementing the DSP0237 specification.
Pin control and GPIO: Broadcom BCM4908 pin controllers, Meson s4 pin controllers, Sunplus SP7021 PinMux and GPIO controllers, Renesas R8A779F0 pin-function controllers, Mediatek MT8186 pin controllers, NXP i.MX93 pin controllers, Nuvoton WPCM450 pin and GPIO controllers, and Qualcomm SC8280xp pin controllers.
USB: Richtek RT1719 Sink Only Type-C USB controllers and Qualcomm embedded USB debuggers.

Miscellaneous

As usual, the user-space perf tools have seen a long list of improvements; see this merge commit for a summary.

Networking

The bridge subsystem now has support for multiple spanning trees; see this merge commit for more information.
The process of instrumenting the networking code to expose the reason for packet drops continues.
BPF programs attached to network control groups can now use a couple of new helper functions to explicitly set the return value for system calls. This enables the communication of better information about why a given system call was rejected.
Packet transmission from BPF programs run with BPF_PROG_RUN is now supported. See this merge commit, this changelog, and this documentation patch for more information.
Fragment support has been added to the express data path (XDP) mechanism, allowing the processing of jumbo frames and more. See this commit for more information.
The teardown of network namespaces has been significantly accelerated, which is important for some large systems with a lot of network traffic.

Security-related

The strict memcpy() bounds checking patches have been merged. This work should help to catch a range of memory-safety problems before they ever make it into a production kernel.
The kernel is now compiled with the -Warray-bounds and -Wzero-length-bounds warnings enabled. This is the culmination of a long-term effort to eliminate zero-length arrays and related tricks from the kernel code.
Indirect branch tracking control-flow integrity has been added for the x86 architecture. This feature prevents indirect branches from being redirected to locations that were not intended as the target of such a branch. Specifically, all indirect branches must land on an ENDBR instruction.
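
For readers unfamiliar with the zero-length-array idiom mentioned above, the following sketch shows the C99 flexible-array-member form that such arrays are typically converted to. The struct toy_msg type and toy_msg_new() are hypothetical examples, not kernel code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * The old GNU idiom would declare "unsigned char data[0];" here. A C99
 * flexible array member says the same thing in standard C: the array is
 * sized when the object is allocated.
 */
struct toy_msg {
    size_t len;
    unsigned char data[];   /* flexible array member; must be last */
};

/* Allocate a message with room for len payload bytes and copy them in. */
static struct toy_msg *toy_msg_new(const void *src, size_t len)
{
    struct toy_msg *m = malloc(sizeof(*m) + len);

    if (!m)
        return NULL;
    m->len = len;
    memcpy(m->data, src, len);
    return m;
}
```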

Virtualization and containers

The virtio-crypto device has gained support for encryption with RSA. Documentation seems to be nonexistent, but this commit may be comprehensible to somebody.

Internal kernel changes

As described in this article, the kernel is now compiled against the C11 language standard rather than C89.
The new "fprobe" mechanism allows for quick function-call tracing when the full features of ftrace are not needed; see this documentation commit for more information.
The build system now supports two new environment variables, USERCFLAGS and USERLDFLAGS; they can be used to pass additional options to the compiler and linker, respectively.
There have been more significant changes to the internal support code for network filesystems; see this merge changelog for an overview.
The long-deprecated PCI DMA API has been removed; drivers should be using the regular DMA API instead.

The 5.18 kernel now moves into the stabilization phase, where the bugs that inevitably crept in with all of those new features will (hopefully) be found and fixed. Assuming the normal schedule holds, the final 5.18 release can be expected on May 22 or 29.


Gathering multiple system parameters in a single call

By Jake Edge
April 6, 2022

Running a command like lsof, which lists the open files on the system along with information about the process that has each file open, takes a lot of system calls, mostly to read a small amount of information from many /proc files. Providing a new interface to combine those calls into a single system call (or, at least, fewer of them) is the target of Miklos Szeredi's getvalues() RFC patch that was posted on March 22. While the proposal does not look like it is going far, at least in its current form, it did spark some discussion of the need (or lack thereof) for a way to reduce this kind of overhead, and it explored some alternative ways to get there via code that already exists in the kernel.

getvalues()

In his post, Szeredi highlighted the performance problem: "Calling open/read/close for many small files is inefficient". Running lsof on his desktop resulted in around 60,000 calls to read small amounts of data from /proc files; "90% of those are 128 bytes or less". But another problem that getvalues() tries to address is the fragmentation of the interfaces for gathering system information on Linux:

For files we have basic stat, statx, extended attributes, file attributes (for which there are two overlapping ioctl interfaces). For mounts and superblocks we have stat*fs as well as /proc/$PID/{mountinfo,mountstats}. The latter also has the problem on not allowing queries on a specific mount.

His proposed solution is a system call with the following prototype, which uses a new structure type:

    struct name_val {
	    const char *name;		/* in */
	    struct iovec value_in;	/* in */
	    struct iovec value_out;	/* out */
	    uint32_t error;		/* out */
	    uint32_t reserved;
    };

    int getvalues(int dfd, const char *path, struct name_val *vec, size_t num,
	          unsigned int flags);

It will look up an object (which he calls $ORIGIN) using dfd and path, as with openat(); flags is used to modify the path-based lookup. vec is an array of num entries for the parameters of interest. getvalues() will return the number of values filled in or an error.

The name field in struct name_val is where most of the action is. It consists of a string in a kind of new micro-language that describes the value of interest, using prefixes to identify different types of information. From the post:

mnt                    - list of mount parameters
mnt:mountpoint         - the mountpoint of the mount of $ORIGIN
mntns                  - list of mount ID's reachable from the current root
mntns:21:parentid      - parent ID of the mount with ID of 21
xattr:security.selinux - the security.selinux extended attribute
data:foo/bar           - the data contained in file $ORIGIN/foo/bar

The prefix can be omitted if it is the same as that of the previous entry in vec, so a "mnt:mountpoint" followed by a ":parentid" would imply the "mnt" prefix on the latter. value_in provides a buffer to hold the value retrieved; passing a NULL for iov_base in the struct iovec will reuse the previous entry's buffer. That allows a single buffer to be used for multiple retrieved values with getvalues() stepping through the buffer as needed. value_out will hold the address of where the value was stored, which is useful for shared buffers, and its length. If an error occurs, its code will be stored in error.
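
The prefix-inheritance rule can be modeled in a few lines of user-space C. This is not code from the patch: toy_expand_name() is a hypothetical helper, and it assumes that the prefix is everything before the first colon of the previous entry's name, which the posting's examples suggest but do not spell out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Expand a name whose prefix was omitted (it starts with ':') by borrowing
 * the prefix of the previous entry. prev may be NULL for the first entry.
 * Returns 0 on success, or -1 if there is no prefix to inherit or the
 * result does not fit in out.
 */
static int toy_expand_name(const char *prev, const char *name,
                           char *out, size_t outlen)
{
    int n;

    if (name[0] != ':') {
        n = snprintf(out, outlen, "%s", name);
    } else {
        const char *colon;
        size_t plen;

        if (!prev)
            return -1;
        colon = strchr(prev, ':');
        plen = colon ? (size_t)(colon - prev) : strlen(prev);
        n = snprintf(out, outlen, "%.*s%s", (int)plen, prev, name);
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Given "mnt:mountpoint" as the previous name, ":parentid" expands to "mnt:parentid".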

It is a fairly straightforward interface, though it does add yet another (simple) parser into the kernel. Szeredi also posted a sample program that shows how it can be used.

Reaction

Casey Schaufler pointed out that the open/read/close problem could be addressed without all of the rest of the generality with an openandread() system call or similar. He also had some questions and comments about the interface, some of its shortcuts, and its behavior in the presence of errors. Greg Kroah-Hartman noted that he had posted a proposal for a readfile() system call that would address the overhead problem as well. It was the subject of an LWN article just over two years ago. But it turns out that he found little real-world performance improvement using readfile(), which is part of why it was never merged. "Do you have anything real that can use this that shows a speedup?"
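
For reference, the pattern that proposals like openandread() and readfile() aim to collapse looks something like the sketch below; read_small_file() is a hypothetical helper written for this illustration, not part of any of the patches discussed.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes of path into buf using the classic sequence of
 * three system calls. Returns the number of bytes read, or -1 on error.
 */
static ssize_t read_small_file(const char *path, char *buf, size_t len)
{
    ssize_t n;
    int fd = open(path, O_RDONLY);   /* system call 1 */

    if (fd < 0)
        return -1;
    n = read(fd, buf, len);          /* system call 2 */
    close(fd);                       /* system call 3 */
    return n;
}
```

A tool that reads tens of thousands of small /proc files pays for all three calls on every file, which is the overhead under discussion.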

Bernd Schubert thought that network filesystems could benefit, because operations could be batched up rather than sent individually over the wire. He said that because there is no readfile() (or its equivalent) available, network filesystem protocols are not adding combined operations for open/read/close. But J. Bruce Fields said that NFSv4 already has compound operations, "so you can do OPEN+READ+CLOSE in a single round trip". So far, at least, the NFS client does not actually use it, but the protocol support is there.

While Christian Brauner was in favor of better ways to query filesystem information, he was concerned about the ease-of-use for getvalues():

I would really like if we had interfaces that are really easy to use from userspace comparable to statx for example. I know having this generic as possible was the goal but I'm just a bit uneasy with such interfaces. They become cumbersome to use in userspace.
[...] Would it be really that bad if we added multiple syscalls for different types of info? For example, querying mount information could reasonably be a more focussed separate system call allowing to retrieve detailed mount propagation info, flags, idmappings and so on. Prior approaches to solve this in a completely generic way have gotten us not very far too so I'm a bit worried about this aspect too.

But Szeredi thinks that the generality of the interface is important for the future. A system call like statx() could perhaps be added for filesystem information (e.g. statfsx()), but that only works for data that can be represented in a flat structure. Hierarchical data has to be represented in some other way. He would like to see some kind of unified interface to gather information from multiple different sources in the kernel, both textual and binary, that uses hierarchical namespaces (a la file paths) for data that does not have a flat structure—rather than a collection of ad hoc interfaces that get added over time.

Kroah-Hartman pointed to two different mechanisms that might be used, starting with the KVM binary_stats.c interface, "which tried to create a 'generic' api, but ended up just making something to work for KVM as they got tired of people ignoring their more intrusive patch sets". But Szeredi said that the KVM mechanism would not be easily used for things like extended attributes (xattrs) that do not have a fixed size. Kroah-Hartman followed that up with a suggestion to look at varlink as a possible protocol for transferring the data.

Ted Ts'o was not sure what problem getvalues() was truly solving. He noted that an lsof on his laptop did not take an inordinate amount of time, so the performance argument does not really make sense to him. As for ease-of-use, he suggested adding user-space libraries that gather up the data from various sources "to make life easier for application programmers". He had other concerns as well:

Each new system call, especially with all of the parsing that this one is going to use, is going to be an additional attack surface, and an additional new system call that we have to maintain --- and for the first 7-10 years, userspace programs are going to have to use the existing open/read/close interface since enterprise kernels stick [around] for a L-O-N-G time, so any kind of ease-of-use argument isn't really going to help application programs until RHEL 10 becomes obsolete.

If the open/read/close problem is real for some filesystems (e.g. network or FUSE), Christoph Hellwig said, a better way to address it would be with an io_uring operation. "And even on that I need to be sold first." The readfile() article linked above also has a section on a mechanism to support that use case with io_uring.

Linus Torvalds was skeptical of the whole concept. Coalescing the open/read/close cycle has been shown to make little difference from a performance standpoint, and he did not think that the more general query interface was particularly compelling either:

With the "open-and-read" thing, the wins aren't that enormous.
And getvalues() isn't even that. It's literally a [specialty] interface for a very special thing. Yes, I'm sure it avoids several system calls. Yes, I'm sure it avoids parsing strings etc. But I really don't think this is something we want to do unless people can show enormous and real-world examples of where it makes such a huge difference that we absolutely have to do it.

Virtual xattrs?

Dave Chinner pointed out that the XFS filesystem has a somewhat similar ioctl() command (XFS_IOC_ATTRMULTI_BY_HANDLE) that is used to dump and restore extended attributes in batches. He suggested that idea could be further extended:

I've said in the past when discussing things like statx() that maybe everything should be addressable via the xattr namespace and set/queried via xattr names regardless of how the filesystem stores the data. The VFS/filesystem simply translates the name to the storage location of the information. It might be held in xattrs, but it could just be a flag bit in an inode field.
Then we just get named xattrs in batches from an open fd.

He said that the values that Szeredi envisions being available via getvalues() could simply be mapped into an xattr namespace and retrieved using "a new, cleaner version of xattr batch APIs that have been around for 20-odd years already". Schaufler cautioned that there is a "significant and vocal set of people who dislike xattrs passionately", but if that problem could be solved, Chinner's approach had a lot going for it. "You could even provide getvalues() on top of it."

Szeredi seemed amenable to the idea, though he wondered about information from elsewhere in the system. Amir Goldstein said that there is already precedent for "virtual xattrs" in the CIFS filesystem, so that idea could be extended to mount information and statistics of various kinds: "I don't see a problem with querying attributes of a mount/sb the same way as long as the namespace is clear about what is the object that is being queried (e.g. getxattr(path, "fsinfo.sbiostats.rchar",...)."

Chinner also noted that using the xattr interface would provide "a symmetrical API for -changing- values". Instead of using some other mechanism (e.g. configfs) to change system parameters, they could be done with a setxattr() call. "That retains the simplicity of proc and sysfs attributes in that you can change them just by writing a new value to the file...."
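
From user space, querying a value through the existing xattr interface might look like the following sketch. The xattr_get() helper is hypothetical; it uses only the standard getxattr() call, and any "virtual" attribute names such as the fsinfo example quoted above are proposals from the discussion, not attributes that current kernels provide.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Fetch one extended attribute: probe for its size, allocate a buffer,
 * then read the value. On success, returns a NUL-terminated buffer that
 * the caller must free() and stores the value's length in *len. Returns
 * NULL on any error, including an attribute that changed size between
 * the two calls.
 */
static char *xattr_get(const char *path, const char *name, ssize_t *len)
{
    char *buf;
    ssize_t size = getxattr(path, name, NULL, 0);   /* probe for size */

    if (size < 0)
        return NULL;
    buf = malloc((size_t)size + 1);
    if (!buf)
        return NULL;
    size = getxattr(path, name, buf, (size_t)size); /* fetch the value */
    if (size < 0) {
        free(buf);
        return NULL;
    }
    buf[size] = '\0';
    *len = size;
    return buf;
}
```

The symmetrical write path that Chinner mentions would use setxattr() with the same path and name.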

The discussion more or less wound down after that. The xattrs-based idea seemed reasonably popular and much of the infrastructure to use it is already present in the kernel in various forms. So, while getvalues() itself does not have a path toward merging, seemingly, the idea behind it could perhaps be preserved in a somewhat different form. So far, patches for that have not appeared, but perhaps that is something we will see before too long.


Security quote of the week

[...] What if we have a scenario where a third party authenticates the client (by verifying that they have a valid token issued by their ID provider) and then uses that to issue their own token that's much longer lived? Well, now the client has a long-lived token sitting on it. And if anyone copies that token to another device, they can now pretend to be that client.
This is, sadly, depressingly common. A lot of services will verify the user, and then issue an oauth token that'll expire some time around the heat death of the universe. If a client system is compromised and an attacker just copies that token to another system, they can continue to pretend to be the legitimate user until someone notices (which, depending on whether or not the service in question has any sort of audit logs, and whether you're paying any attention to them, may be once screenshots of your data show up on Twitter).

— Matthew Garrett


Kernel release status

The current development kernel is 5.18-rc1, released on April 3. Linus said: "In fact, at least in pure commits, this has been a bigger merge window than we've had in some time. But let's hope it's all smooth sailing this release." In the end, 13,207 non-merge changesets were merged during this merge window.

Stable updates: 4.14.275, consisting mostly of backports of a set of arm64 Spectre mitigations, was released on April 2.

The first stable updates after the close of the merge window tend to be large, and the next set doesn't disappoint: 5.17.2 (1,126 patches), 5.16.19 (1,017), 5.15.33 (913), and 5.10.110 (599) are due on April 7.


Cook: Security things in Linux v5.10

Kees Cook catches up with the security-related changes in the 5.10 kernel, released at the end of 2020.

With static branches, an if/else choice can be hard-coded, instead of being run-time evaluated every time. Such branches can be updated too (the kernel just rewrites the code to switch around the “branch”). All these principles apply to static calls as well, but they’re for replacing indirect function calls (i.e. a call through a function pointer) with a direct call (i.e. a hard-coded call address). This eliminates the need for Spectre mitigations (e.g. RETPOLINE) for these indirect calls, and avoids a memory lookup for the pointer. For hot-path code (like the scheduler), this has a measurable performance impact. It also serves as a kind of Control Flow Integrity implementation: an indirect call got removed, and the potential destinations have been explicitly identified at compile-time.


An XFS 5.19 roadmap

XFS filesystem users may be interested in this message from Dave Chinner, who has taken back XFS maintainership for the 5.19 development cycle. It contains his plans for that cycle, which include bringing in a number of large patch sets.

I'd really like to try getting the merge bottlenecks we've had recently unstuck, so there are a few patchsets I want to try to get reviewed, tested and merged for 5.19. Hopefully not too many surprises will get in the way and so some planning to try to minimises surprised might be a good thing.


Quote of the week

The universal deployment of IP networks on Avian Carriers is facing a multi-decade delay. After operators discovered that birds are not real (now confirmed by the US Government), work began to first understand the many quirks of the drones' firmware before proceeding with wider-scale deployment. No clear timelines exist at this point in time.

— RFC 9225


Claws Mail 4.1.0 released

Version 4.1.0 of the Claws Mail email client is out. New features include text zooming in the message view, improvements to a number of preferences, a "keyword warner" plugin to give a warning before sending a message containing any (user-defined) keywords, and more.


Behnel: Cython is 20!

On his blog, Stefan Behnel writes about the 20th anniversary of Cython, which is a compiler for Python extensions written in C, for wrapping C libraries in order to provide Python bindings for them, and for embedding Python into other applications. It is used by NumPy, scikit-learn (and other scikit-* extensions), pandas, and more.

On April 4th, 2002, Greg Ewing published the first release of Pyrex 0.1.
Already at the time, it was invented and designed as a compiler that extended the Python language with C data types to build extension modules for CPython. A design that survived the last 20 years, and that made Pyrex, and then Cython, a major corner stone of the Python data ecosystem. And way beyond that.
Now, on April 4th, 2022, its heir Cython is still very much alive and serves easily hundreds of thousands of developers worldwide, day to day.


Emacs 28.1 released

Version 28.1 of the Emacs editor has been released. The announcement says little about what's in this release, but there are a lot of details in the NEWS file. Significant changes include native compilation of ELisp files, support for running the editor in a seccomp() sandbox, improved emoji support, and much more. Wayland support did not make it into this release, but is already merged for version 29.

Comments (10 posted)

Firefox 99.0 released

Version 99.0 of the Firefox browser has been released. "The Linux sandbox has been strengthened: processes exposed to web content no longer have access to the X Window system (X11)".

Comments (9 posted)

LXD 5.0 LTS released

Version 5.0 LTS of the LXD container-management system has been released. This is a long-term-support release, which will be supported into 2027. New features include disk and USB hotplug support, the ability to start with degraded networking, and more; see this forum post for more information.

Full Story (comments: 3)

Rust Lang Roadmap for 2024

The Rust language team has put up a blog entry describing the plans for the language over the next couple of years or so.

More precise analyses, less rigamarole: Make the compiler better able to recognize when code is correct via improvements to the borrow checker, type inference, and so forth. Identify and eliminate "boilerplate" patterns like having to copy-and-paste the same set of where clauses everywhere.
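As a rough illustration of the "boilerplate" the roadmap refers to, here is a minimal sketch in present-day Rust. The function names (`count_items`, `most_common`) and the particular set of trait bounds are hypothetical, chosen only for this example; they do not come from the blog entry. The point is simply that the same `where` clause has to be written out on every function that shares the bounds.

```rust
use std::collections::HashMap;
use std::fmt::Display;
use std::hash::Hash;

// Hypothetical helper: count how often each item occurs in a slice.
fn count_items<T>(items: &[T]) -> HashMap<T, usize>
where
    T: Clone + Eq + Hash + Display,
{
    let mut counts = HashMap::new();
    for item in items {
        *counts.entry(item.clone()).or_insert(0) += 1;
    }
    counts
}

// Hypothetical helper: return the most frequent item, rendered as a string.
// Note that the where clause is an exact copy of the one above.
fn most_common<T>(items: &[T]) -> Option<String>
where
    T: Clone + Eq + Hash + Display,
{
    count_items(items)
        .into_iter()
        .max_by_key(|(_, n)| *n)
        .map(|(item, _)| item.to_string())
}
```

Calling `most_common(&["a", "b", "a"])` here yields `Some("a".to_string())`. Every additional function working on the same kind of `T` would need the same four-bound clause repeated again, which is the sort of repetition the language team says it wants to identify and eliminate.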

Comments (none posted)

Boucher: rustc_codegen_gcc can now bootstrap rustc

On his blog, Antoni Boucher updates the status of rustc_codegen_gcc, which "is a GCC codegen for rustc, meaning that it can be loaded by the existing rustc frontend, but benefits from GCC by having more architectures supported and having access to GCC’s optimizations". A significant milestone has been reached: "the GCC codegen has made enough progress to be able to compile rustc itself". For the Rust programming language, rustc is the standard compiler, so this work will eventually allow programs to be built for a number of architectures that are not supported by rustc. He also made progress beyond just building the compiler as he "was able to compile rustc using the GCC codegen and use the resulting rustc to compile a Hello World".

Comments (49 posted)

DistroWatch Weekly April 4

openSUSE Tumbleweed Review of the Week April 1

SparkyLinux News April 1

Ubuntu Weekly Newsletter April 2

Emacs News April 4

GCC 12.0.1 Status Report April 4

GCC Rust monthly report April 4

What's cooking in git.git March 30

Git Rev News March 31

This Week in GNOME April 1

LibreOffice project and community recap April 1

LLVM Weekly April 4

OCaml Weekly News April 5

Perl Weekly April 4

Python Weekly Newsletter March 31

Weekly Rakudo News April 4

Ruby Weekly News March 31

This Week in Rust March 30

Wikimedia Tech News April 4

Fedora FESCO meeting minutes April 5

openSUSE board meeting minutes March 28

openSUSE board meeting minutes February 28

openSUSE Release Engineering minutes April 6

Perl Steering Council meeting minutes April 1

Free Software Supporter April

Kernel code of conduct committee report March 31

CFP Deadlines: April 7, 2022 to June 6, 2022

The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.

Deadline	Event Dates	Event	Location
April 14	June 2 to June 4	openSUSE Conference 2022	Nürnberg, Germany
April 15	July 17 to July 24	Debconf 2022	Prizren, Kosovo
April 15	July 7 to July 9	Free Silicon Conference 2022	Paris, France
April 26	May 17 to May 19	Yocto Project Summit 2022.05	Online
May 1	June 10 to June 12	South East Linux Fest	Charlotte, NC, USA
May 2	September 6 to September 8	Research Software Engineers' Conference 2022	Newcastle, UK
May 15	July 21	Icinga Camp Berlin	Berlin, Germany
May 26	September 15 to September 18	EuroBSDCon 2022	Vienna, Austria
May 30	September 13 to September 16	Open Source Summit Europe	Dublin, Ireland
May 30	August 23 to August 24	Open Source Summit Latin America	Online

If the CFP deadline for your event does not appear here, please tell us about it.

Events: April 7, 2022 to June 6, 2022

The following event listing is taken from the LWN.net Calendar.

Date(s)	Event	Location
April 5 to April 7	Cephalocon 2022	Portland, OR, US
April 7 to April 9	FOSSASIA Summit	Online
April 20 to April 22	foss-north Online 2022	Online
April 22 to April 24	LinuxFest Northwest 2022	Online
April 22 to April 23	Grazer Linuxtage 2022	Graz, Austria
April 23 to April 24	FLISOL 2022	Latinoamérica
April 27 to May 3	PyCon	Salt Lake City, UT, US
April 29 to May 1	Linux Application Summit 2022	Rovereto, Italy
May 2 to May 4	Linux Storage, Filesystem, and Memory-Management Summit	Palm Springs CA, USA
May 10 to May 11	Red Hat Summit 2022	Boston, US
May 10 to May 11	HOT - Heidelberg OSADL Talks	Online
May 13	PostgreSQL Conference Germany	Leipzig, Germany
May 13 to May 14	Fedora Linux 36 Release Party
May 17 to May 19	Yocto Project Summit 2022.05	Online
May 24 to May 27	PGCon	Online
May 30 to May 31	Embedded Recipes	Paris, France
May 31 to June 2	sambaXP	Göttingen, Germany
June 1 to June 3	Kernel Recipes	Paris, France
June 1 to June 4	BSDCan	Online
June 2	Devconf.CZ Mini IRL	Brno, Czech Republic
June 2 to June 4	openSUSE Conference 2022	Nürnberg, Germany

If your event does not appear here, please tell us about it.

Alert summary March 31, 2022 to April 6, 2022

Dist.	ID	Release	Package	Date
Arch Linux	ASA-202204-2		polkit	2022-04-05
Arch Linux	ASA-202204-1		postgresql	2022-04-05
Arch Linux	ASA-202204-4		rizin	2022-04-05
Arch Linux	ASA-202204-3		zlib	2022-04-05
Debian	DLA-2969-1	LTS	asterisk	2022-04-03
Debian	DLA-2966-1	LTS	libgc	2022-03-30
Debian	DLA-2962-2	LTS	pjproject	2022-03-31
Debian	DLA-2970-1	LTS	qemu	2022-04-04
Debian	DLA-2967-1	LTS	wireshark	2022-03-31
Debian	DLA-2968-1	LTS	zlib	2022-04-02
Debian	DSA-5111-1	stable	zlib	2022-04-01
Fedora	FEDORA-2022-2558f14c58	F34	389-ds-base	2022-04-01
Fedora	FEDORA-2022-40544b5314	F35	389-ds-base	2022-04-01
Fedora	FEDORA-2022-ad2b0ad61b	F34	cobbler	2022-03-31
Fedora	FEDORA-2022-445ec90e7c	F35	cobbler	2022-03-31
Fedora	FEDORA-2022-cd2c5e0634	F35	fish	2022-04-05
Fedora	FEDORA-2022-e85e37206b	F35	gdal	2022-04-05
Fedora	FEDORA-2022-1f981071eb	F34	ghc-cmark-gfm	2022-04-02
Fedora	FEDORA-2022-1f981071eb	F34	ghc-hakyll	2022-04-02
Fedora	FEDORA-2022-1f981071eb	F34	gitit	2022-04-02
Fedora	FEDORA-2022-d0fe2a444a	F35	libkiwix	2022-04-04
Fedora	FEDORA-2022-56942dc7c5	F34	mingw-fribidi	2022-04-05
Fedora	FEDORA-2022-8c2af4ba24	F35	mingw-fribidi	2022-04-05
Fedora	FEDORA-2022-e85e37206b	F35	mingw-gdal	2022-04-05
Fedora	FEDORA-2022-b0a85ed1b3	F34	mingw-openexr	2022-04-05
Fedora	FEDORA-2022-f2e0d16c90	F35	mingw-openexr	2022-04-05
Fedora	FEDORA-2022-9515529c96	F35	mingw-openjpeg2	2022-03-31
Fedora	FEDORA-2022-ee15b98ea1	F34	mingw-python-pillow	2022-04-05
Fedora	FEDORA-2022-64332f2a7c	F35	mingw-python-pillow	2022-04-05
Fedora	FEDORA-2022-e85e37206b	F35	mingw-python3	2022-04-05
Fedora	FEDORA-2022-9515529c96	F35	openjpeg2	2022-03-31
Fedora	FEDORA-2022-9e88b5d8d7	F34	openssl	2022-04-03
Fedora	FEDORA-2022-1f981071eb	F34	pandoc	2022-04-02
Fedora	FEDORA-2022-1f981071eb	F34	pandoc-citeproc	2022-04-02
Fedora	FEDORA-2022-1f981071eb	F34	patat	2022-04-02
Fedora	FEDORA-2022-29c30bc7ef	F34	phoronix-test-suite	2022-04-04
Fedora	FEDORA-2022-cce05f0e5e	F35	phoronix-test-suite	2022-04-04
Fedora	FEDORA-2022-ee15b98ea1	F34	python-pillow	2022-04-05
Fedora	FEDORA-2022-64332f2a7c	F35	python-pillow	2022-04-05
Fedora	FEDORA-2022-3f78aabf8c	F34	seamonkey	2022-04-04
Fedora	FEDORA-2022-6043a7b938	F34	skopeo	2022-04-02
Fedora	FEDORA-2022-eda0e65b01	F35	skopeo	2022-04-02
Mageia	MGASA-2022-0130	8	chromium-browser-stable	2022-04-05
Mageia	MGASA-2022-0126	8	golang	2022-03-31
Mageia	MGASA-2022-0128	8	libtiff	2022-04-03
Mageia	MGASA-2022-0129	8	openjpeg2	2022-04-03
Mageia	MGASA-2022-0123	8	openvpn	2022-03-30
Mageia	MGASA-2022-0127	8	php-smarty	2022-04-03
Mageia	MGASA-2022-0125	8	wavpack	2022-03-31
Mageia	MGASA-2022-0124	8	zlib	2022-03-31
openSUSE	openSUSE-SU-2022:1100-1	15.3	389-ds	2022-04-04
openSUSE	openSUSE-SU-2022:0100-1	15.3 SLE15	abcm2ps	2022-03-31
openSUSE	openSUSE-SU-2022:0096-1	15.3 SLE15	fish3	2022-03-31
openSUSE	openSUSE-SU-2022:0097-1	15.3 SLE15	icingaweb2	2022-03-31
openSUSE	openSUSE-SU-2022:1065-1	15.3	kernel-firmware	2022-03-31
openSUSE	openSUSE-SU-2022:0098-1	15.3 SLE15	nextcloud	2022-03-31
openSUSE	openSUSE-SU-2022:0095-1	15.3	openSUSE-build-key	2022-03-31
openSUSE	openSUSE-SU-2022:0103-1	15.3	opera	2022-04-04
openSUSE	openSUSE-SU-2022:1091-1	15.3 15.4	python	2022-04-01
openSUSE	openSUSE-SU-2022:1064-1	15.3	python2-numpy	2022-03-31
openSUSE	openSUSE-SU-2022:1059-1	15.3	salt	2022-03-30
openSUSE	openSUSE-SU-2022:1073-1	15.3	yaml-cpp	2022-04-01
openSUSE	openSUSE-SU-2022:1061-1	15.3 15.4	zlib	2022-03-30
Oracle	ELSA-2022-9260	OL6	Extended Lifecycle Support (ELS) Unbreakable Enterprise kernel	2022-04-05
Oracle	ELSA-2022-9257	OL6	httpd	2022-04-01
Oracle	ELSA-2022-9260	OL7	kernel	2022-04-05
Red Hat	RHSA-2022:1253-01	OSP16.2	Red Hat OpenStack Platform 16.2 (python-waitress)	2022-04-06
Red Hat	RHSA-2022:1173-01	EL6	httpd	2022-04-04
Red Hat	RHSA-2022:1139-01	EL7.3	httpd	2022-04-02
Red Hat	RHSA-2022:1138-01	EL7.4	httpd	2022-04-02
Red Hat	RHSA-2022:1136-01	EL7.6	httpd	2022-04-02
Red Hat	RHSA-2022:1137-01	EL7.7	httpd	2022-04-02
Red Hat	RHSA-2022:1198-01	EL7	kernel	2022-04-05
Red Hat	RHSA-2022:1213-01	EL8.2	kernel	2022-04-05
Red Hat	RHSA-2022:1199-01	EL7	kernel-rt	2022-04-05
Red Hat	RHSA-2022:1209-01	EL8.2	kernel-rt	2022-04-05
Red Hat	RHSA-2022:1185-01	EL7	kpatch-patch	2022-04-05
Red Hat	RHSA-2022:1186-01	EL8.2	kpatch-patch	2022-04-05
Scientific Linux	SLSA-2022:1198-1	SL7	kernel	2022-04-06
Slackware	SSA:2022-095-01		mozilla	2022-04-05
Slackware	SSA:2022-089-01		vim	2022-03-30
SUSE	SUSE-SU-2022:1102-1	MP4.1 SLE15 SES7	389-ds	2022-04-04
SUSE	SUSE-SU-2022:1100-1	MP4.2 SLE15	389-ds	2022-04-04
SUSE	SUSE-SU-2022:14934-1	SLE11	expat	2022-03-31
SUSE	SUSE-SU-2022:1065-1	MP4.2 SLE15	kernel-firmware	2022-03-31
SUSE	SUSE-SU-2022:1093-1	SLE12	libreoffice	2022-04-04
SUSE	SUSE-SU-2022:1113-1	OS8 OS9 SLE12	mozilla-nss	2022-04-05
SUSE	SUSE-SU-2022:14936-1	SLE11	mozilla-nss	2022-04-05
SUSE	SUSE-SU-2022:1041-1	SLE15	opensc	2022-03-30
SUSE	SUSE-SU-2022:1091-1	MP4.2 SLE15	python	2022-04-01
SUSE	SUSE-SU-2022:1064-1	MP4.2 SLE15	python2-numpy	2022-03-31
SUSE	SUSE-SU-2022:1044-1	SLE12	python3	2022-03-30
SUSE	SUSE-SU-2022:1094-1	SLE12	python36	2022-04-04
SUSE	SUSE-SU-2022:1060-1	MP4.1 SLE15 SES7	salt	2022-03-30
SUSE	SUSE-SU-2022:1059-1	MP4.2 SLE15	salt	2022-03-30
SUSE	SUSE-SU-2022:1051-1	SLE12	salt	2022-03-30
SUSE	SUSE-SU-2022:1057-1	SLE15	salt	2022-03-30
SUSE	SUSE-SU-2022:1058-1	SLE15 SES6	salt	2022-03-30
SUSE	SUSE-SU-2022:1105-1	OS9 SLE12	util-linux	2022-04-04
SUSE	SUSE-SU-2022:1103-1	SLE12	util-linux	2022-04-04
SUSE	SUSE-SU-2022:1108-1	SLE15	util-linux	2022-04-04
SUSE	SUSE-SU-2022:1073-1	MP4.1 MP4.2 SLE15	yaml-cpp	2022-04-01
SUSE	SUSE-SU-2022:1072-1	SLE12	yaml-cpp	2022-04-01
SUSE	SUSE-SU-2022:1061-1	MP4.1 MP4.2 SLE15 SES6 SES7	zlib	2022-03-30
SUSE	SUSE-SU-2022:1043-1	OS8 SLE12	zlib	2022-03-30
SUSE	SUSE-SU-2022:1062-1	OS9 SLE12	zlib	2022-03-30
SUSE	SUSE-SU-2022:14929-1	SLE11	zlib	2022-03-30
Ubuntu	USN-5356-1	18.04	dosbox	2022-03-31
Ubuntu	USN-5365-1	20.04 21.10	h2database	2022-04-05
Ubuntu	USN-5358-1	18.04 20.04 21.10	linux, linux-aws, linux-azure, linux-gcp, linux-hwe-5.13, linux-hwe-5.4, linux-kvm, linux-oracle, linux-oracle-5.4	2022-03-30
Ubuntu	USN-5357-1	16.04 18.04	linux, linux-aws, linux-azure-4.15, linux-dell300x, linux-hwe, linux-kvm, linux-snapdragon	2022-03-30
Ubuntu	USN-5361-1	14.04 16.04	linux, linux-aws, linux-kvm, linux-lts-xenial	2022-03-31
Ubuntu	USN-5358-2	18.04 20.04 21.10	linux-aws-5.4, linux-azure, linux-gcp, linux-gcp-5.13, linux-gcp-5.4, linux-gke, linux-gke-5.4, linux-gkeop, linux-gkeop-5.4	2022-03-31
Ubuntu	USN-5357-2	16.04 18.04	linux-aws-hwe, linux-gcp-4.15, linux-oracle	2022-03-31
Ubuntu	USN-5362-1	20.04	linux-intel-5.13	2022-03-31
Ubuntu	USN-5359-1	18.04 20.04	rsync	2022-03-31
Ubuntu	USN-5360-1	18.04 20.04	tomcat9	2022-03-31
Ubuntu	USN-5354-1	18.04 20.04 21.10	twisted	2022-03-31
Ubuntu	USN-5364-1	20.04 21.10	waitress	2022-04-05
Ubuntu	USN-5355-2	14.04 16.04	zlib	2022-03-30
Ubuntu	USN-5355-1	18.04 20.04 21.10	zlib	2022-03-30

Full Story (comments: none)

Linus Torvalds Linux 5.18-rc1 Apr 03

Sebastian Andrzej Siewior v5.17.1-rt17 Apr 01

Clark Williams 5.15.32-rt39 Mar 31

Greg Kroah-Hartman Linux 4.14.275 Apr 02

Arnd Bergmann ARM: ARMv5 multiplatform conversions Apr 05

Tong Tiangen arm64: add machine check safe support Apr 06

guoren@kernel.org riscv: Add COMPAT mode support for rv64 Apr 02

Thomas Gleixner x86/fpu/xsave: Add XSAVEC support and XGETBV1 utilization Apr 04

Kirill A. Shutemov TDX Guest: TDX core support Apr 06

Beau Belgrave tracing/user_events: Update user_events ABI from Apr 01

Eric DeVolder crash: Kernel handling of CPU and memory hot un/plug Apr 01

Dmitrii Dolgov Priorities for bpf progs attached to the same tracepoint Apr 03

Liam Howlett Introducing the Maple Tree Apr 04

John Ogness implement threaded console printing Apr 05

Martin Povišer Apple Macs machine-level ASoC driver Mar 31

Yunfei Dong media: mtk-vcodec: support for M8192 decoder Mar 31

niravkumar.l.rabara@intel.com Add Altera hardware mutex driver Apr 01

Sameer Pujar ASRC support on Tegra186 and later Mar 31

Michael Walle hwmon: add lan9668 driver Mar 31

jason-jh.lin Add Mediatek Soc DRM (vdosys0) support for mt8195 Apr 01

frank zago WCH CH341 GPIO and SPI support Mar 31

Sui Jingfeng drm/loongson: add drm driver for loongson display controller Apr 02

Caleb Connolly power: supply: introduce the Qualcomm smb2 Apr 01

Jishnu Prakash thermal: qcom: Add support for PMIC5 Gen2 ADC_TM Apr 03

Chuanhong Guo spi: add support for Mediatek SPI-NAND controller Apr 03

Yusuf Khan drivers: ddcci: add drivers for DDCCI Apr 03

Sumit Gupta Tegra234 cpufreq driver support Apr 04

Biju Das Add Renesas RZ/G2UL Type-1 {SoC,SMARC EVK} support Apr 02

Chris Morgan power: supply: Add Support for RK817 Charger Apr 04

Richard Fitzgerald ASoC: Add a driver for the Cirrus Logic CS35L45 Smart Amplifier Apr 05

Satya Priya Add Qualcomm Technologies, Inc. PM8008 regulator driver Apr 05

Alex Bennée rpmb subsystem, uapi and virtio-rpmb driver Apr 05

Jane Chu DAX poison recovery Apr 05

Jarrett Schultz platform: surface: Introduce Surface XBL Driver Apr 05

Ashish Mhetre memory: tegra: Add MC channels and error logging Apr 06

Tony Huang Add mmc driver for Sunplus SP7021 SOC Apr 06

Yicong Yang Add support for HiSilicon PCIe Tune and Trace device Apr 06

Puranjay Mohan PRUSS Remoteproc, Platform APIS, and Ethernet Driver Apr 06

Xuan Zhuo virtio pci support VIRTIO_F_RING_RESET (refactor vring) Apr 06

Manikanta Pubbisetty add support for WCN6750 Apr 06

Cheng Xu Elastic RDMA Adapter (ERDMA) driver Apr 06

Miquel Raynal RZN1 RTC support Apr 05

Chris Packham arm64: mvebu: Support for Marvell 98DX2530 (and variants) Apr 06

Aswath Govindraju AM62: Add support for AM62 USB wrapper driver Apr 06

Nicolas Dufresne H.264 Field Decoding Support for Frame-based Decoders Mar 31

Andy Shevchenko gpiolib: Two new helpers and way toward fwnode Apr 01

Leon Romanovsky Add gratuitous ARP support to RDMA-CM Apr 04

Jeffle Xu fscache,erofs: fscache-based on-demand read semantics Mar 31

Jeff Layton ceph+fscrypt: fully-working prototype Mar 31

Stephen Brennan fs/dcache: Per directory amortized negative dentry pruning Mar 31

Dai Ngo NFSD: Initial implementation of NFSv4 Courteous Server Mar 31

Qu Wenruo btrfs: add subpage support for RAID56 Apr 01

Omar Sandoval btrfs: add send/receive support for reading/writing compressed data Apr 04

Jeff Layton ceph+fscrypt: full support Apr 05

Christoph Hellwig use block_device based APIs in block layer consumers Apr 06

Chandan Babu R xfs: Extend per-inode extent counters Apr 06

Charan Teja Kalla mm: shmem: support POSIX_FADV_[WILL|DONT]NEED for shmem files Mar 31

Daniel Verkamp mm/memfd: MFD_NOEXEC for memfd_create Apr 01

Yang Shi Make khugepaged collapse readonly FS THP more consistent Apr 04

Peter Xu userfaultfd-wp: Support shmem and hugetlbfs Apr 04

Kirill A. Shutemov mm, x86/cc: Implement support for unaccepted memory Apr 06

Jason A. Donenfeld random: opportunistically initialize on /dev/urandom reads Apr 05

Eric Snowberg Add CA enforcement keyring restrictions Apr 05

Stanislav Fomichev bpf: cgroup_sock lsm flavor Apr 05

Kuppuswamy Sathyanarayanan Add TDX Guest Attestation support Mar 30

Gavin Shan Support SDEI Virtualization Apr 03

Matthew Rosato KVM: s390: enable zPCI for interpretive execution Apr 04

Tony Krowiak s390/vfio-ap: dynamic configuration support Apr 04

Suravee Suthikulpanit Introducing AMD x2APIC Virtualization (x2AVIC) support. Apr 05

Marc Zyngier KVM: arm64: vgic-v3: MMIO-based LPI invalidation and co Apr 05
