LWN.net Weekly Edition for October 7, 2021

Welcome to the LWN.net Weekly Edition for October 7, 2021

This edition contains the following feature content:

This week's edition also includes these inner pages:

  • Brief items: Brief news items from throughout the community.
  • Announcements: Newsletters, conferences, security updates, patches, and more.

Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.

Rolling stable kernels

By Jake Edge
October 6, 2021

Sasha Levin, one of the maintainers of the stable kernels, gave a presentation at Open Source Summit North America 2021 on a proposal for a different way to handle the stable tree. He noted that throughout most of the kernel's history, version numbers did not really mean anything, but that the versioning scheme suggests that they do, which leads to a disconnect between how the kernels are seen versus how they are actually maintained. He proposed making a "rolling stable" release that provides users what they need—timely fixes to their kernel—without forcing them to choose to switch to a new version number.

Context

He began with the history of kernel versioning schemes and how they have evolved over the years. Starting with version 0.01, Linus Torvalds used a version-numbering mechanism that is common among smaller projects: if there were just a few small features, bump the number a little bit; if there were bigger features or more of them, bump it by more. When Torvalds decided that the kernel had enough of a critical mass to be ready for wider use, he released Linux 1.0.

[Sasha Levin]

Once 1.0 was out, Linux got more users and those users were not necessarily kernel developers; they wanted to develop applications and run their own workloads. They did not want to run on a development version of the kernel, which is where the idea of a stable kernel came from. The kernel proceeded with a model where one branch was for the stable kernel and the other was for doing development.

Fixes were backported from the development branch to the stable branch; once the development branch was ready for release, it would become the next stable branch. The kernel continued with the concept that versions were bumped for new features but the lack of central management meant that it was not clear when a release was ready to be made. In the period between the 2.0 and 2.6 kernels, the process kind of ground to a halt, Levin said.

The process of going from 2.4 to 2.6 was "an adventure that took three years". That led the community to recognize that the process was not working and that eventually "it would take an infinite amount of time to go to the next version", he said. So, during the 2.6 era, the kernel developers came up with a new approach; instead of feature-based releases, they would switch to a time-based release process.

Eventually, the idea that evolved was that new kernels would be released every ten weeks. The kernel developers also started saying that version numbers did not matter—and they really didn't. But that is an example of "engineers doing marketing", Levin said, because the version-numbering scheme did not change, nor did the lack of meaning for version numbers get clearly communicated to customers.

That led to the now-familiar two-week merge window, followed by seven or eight release candidates, then a release. Every kernel was considered stable, so there was no separate development kernel. Some customers wanted more stability, which is where the idea of long-term stable (LTS) releases came from, he said. This model worked so well that the kernel stayed on 2.6.x for roughly eight years.

Expectations

But mistakes were made in switching to the new time-based development process. While kernel developers ignored version numbers, customers did not, and the community did not sell the idea of time-based versions well. Since the numbers were still present, customers assumed they meant something, so a move from, say, 3.20 to 4.0 was "big and scary" because the major version had changed.

Staying on 2.6.x for so long also had some ramifications. It was done in part to help customers get used to the new system, but to some extent, they got too comfortable. When the kernel developers wanted to move to 3.0, they found that various assumptions about the kernel version were baked into some products, so workarounds had to be added at that time.
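To make the problem concrete, the following C sketch contrasts a fragile version check with a more tolerant parser. It is purely illustrative: the talk did not say which assumptions products had made, so fragile_is_modern() is a hypothetical example of that kind of check, and struct kver, parse_kver(), and kver_at_least() are names invented here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical types and helpers for illustration only. */
struct kver {
    int major, minor, patch;
};

/*
 * Parse a release string such as "5.14.9", "3.0", or "5.10.70-rc1".
 * Components that are missing default to 0. Returns 0 on success and
 * -1 if the string does not begin with at least "major.minor".
 */
int parse_kver(const char *release, struct kver *out)
{
    assert(release != NULL && out != NULL);
    out->major = out->minor = out->patch = 0;
    int n = sscanf(release, "%d.%d.%d",
                   &out->major, &out->minor, &out->patch);
    return n >= 2 ? 0 : -1;
}

/* True if v is at least major.minor, comparing numerically. */
int kver_at_least(const struct kver *v, int major, int minor)
{
    if (v->major != major)
        return v->major > major;
    return v->minor >= minor;
}

/*
 * A hypothetical fragile check: it treats the literal prefix "2.6."
 * as the mark of a modern kernel, so it rejects "3.0" and everything
 * released after it.
 */
int fragile_is_modern(const char *release)
{
    return strncmp(release, "2.6.", 4) == 0;
}
```

With these helpers, "3.0" parses as major 3, minor 0 and compares as newer than 2.6, while the prefix-based check rejects it. Comparing parsed numbers rather than string prefixes keeps working regardless of how the numbering changes.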

Customers were getting used to minor-version updates (e.g. from 2.6.y to 2.6.y+1), but they started to make assumptions about their meanings too, Levin said. Once the minor number got large, they treated it as a "more minor" change. In their eyes, 2.6.1 to 2.6.2 was an important change that needed an immediate upgrade, but 2.6.30 to 2.6.31 was a lesser change that could be deferred until 2.6.35 or something. Of course, to the kernel community, there was no difference.

To address that, the plan was made to move to 3.0 and to increase the second number (minor version) with each release, effectively dropping the use of the third element of the version number for mainline releases. Levin called this the "finger and toes" version-number scheme, because Torvalds said that he could count up to around 20 "minor" releases before running out of digits and needing to increase the major version number. So after 3.19, 4.0 was released.

To kernel developers, version numbers continued to have no meaning other than as a measure of where things were on the timeline, he said; users were encouraged to run the latest version. But if you tell customers they need to move from 3.20 to 4.0, they will consider it a multi-year effort that requires planning and paperwork; "so they will only do it when they have time, which is never". Going from 3.20 to 4.0 seems like a big jump, and 4.0 to 4.2 seems much smaller, even though the latter brings roughly double the number of changes.

The kernel community has gotten better over time at testing, avoiding performance regressions, not breaking user space, and other things that should make it easier for users to migrate. But there is still resistance to doing so, in part because the kernel developers have not sold users on the plan.

Some distributions and other users have listened, however, and are keeping up with the latest kernels or the LTS versions. They have recognized that the releases are simply time-based, so upgrades are not actually all that scary. Beyond that, users want the latest shiny features ("BPF, the new I/O scheduler, they want all the performance they can get") and they want all of that immediately. They do not want backports of more recent features onto older kernels. They also want frequent security updates. All of that aligns well with what the kernel developers want—and deliver.

Enterprise kernels are less popular these days in part because "maintaining an ancient kernel has become exponentially difficult". The kernel moves so fast that there are more and more backports that need to be done, each of which complicates the maintenance of the enterprise kernel. In practice, most of the enterprise kernels have become "frankenkernels"; they contain backports of the majority of the current kernel and the version number they claim is not at all accurate.

Rolling

In the "LTS world", users would pick a kernel LTS series, say 5.10.x, and keep upgrading their kernels to the latest in that series until that became impossible. When that happened, they would pick a new LTS series, switch to that, and continue from there. Kernel developers have been trying to sell the idea that they should switch sooner; when the next release comes out (in this case, 5.11.x), they should start using that series.

But making that switch takes a conscious effort by the user; they also have to pick a time but "there's never a good time to say 'we're changing our kernel', especially this often", he said. Beyond that, there are some technical hurdles to switching to a new kernel Git branch mid-stream, there may be patches that need to be forward-ported, and so on.

So, Levin asked, "what can be done to improve this process?" The kernel community provides the same guarantees no matter whether the change is between major versions, minor versions, or stable versions; "it promises that it will not break user space". It might break user space by accident, but it will fix the kernel if it does. Those breakages can occur on any kind of upgrade, though; no upgrade type is more or less prone to accidental breakage. "Version numbers still, really truly don't matter".

The idea would be to "eliminate the friction point" where customers need to choose to move to the next series. Currently, the stable and LTS trees have a branch for each mainline release; effectively, the stable trees get forked every time Torvalds does a release. The mainline Git tree just continues on; once the next merge window opens, patches bound for the next release get committed to the main branch. There is no equivalent mechanism for the stable tree.

The stable maintainers want to create a different structure that eliminates the need for users to switch branches each time they move to a new series. It would provide the same guarantees as the current trees and run on the same schedule. But it would roll forward by default, as does Torvalds's tree.

The idea is to use "Git magic" to make it all work; there is no cherry picking of patches. Instead a "hack on Git merges" is used, which preserves the SHA1 values as they are in the existing stable branches. The Git tags survive and the SHA1 preservation means that the folks doing verification to prevent supply-chain attacks can do so more easily. Bisection still works, authors don't change so Git blame works, and so on. The stable maintainers just "manipulate Git a little bit to give the appearance that all this development happened in a single branch", he said. It is simply a change in "how we present kernel trees to the end user".

Currently, users will be on a stable branch or an LTS branch (one of several), but under this model, they would instead be on the stable or the LTS branch (the single rolling one). They are not using some random kernel version, but are instead using the version that the kernel developers have recommended for the best experience. From that, they will get all of the latest features, bug fixes, and security updates.

Concerns

There are, of course, some concerns about this approach that he has heard in feedback from companies and users. The first is that there is an increased need for testing under the rolling model that is being pushed onto the users. But he disagrees. Currently users think they do not need to retest much if they go from x.y.50 to x.y.51, but that is not true. The same is true with rolling. Testing must be done whenever the kernel changes.

Another area of worry is that the rolling model effectively "masks the merge window inside of it". One day the rolling stable tree might be on 5.7.15 and the next on 5.8.1; within that pull are a lot of patches that are from the 5.8 merge window, which is a big step. But it all goes back to testing, he said; it is possible that the merge window introduced a lot of bugs, but it is also possible that a single stable patch does the same thing.

The schedules are staying the same, he said. The rolling stable and LTS branches are just derivatives of the current branches of the stable tree. And the new model does not require users to do anything on a different schedule; it simply gives them the latest kernel when they decide to upgrade and do a Git pull.

The rolling model is not a radical departure, nor is it a new idea. "If you are using Debian, that's kind of how it works"; the Debian testing kernels are more or less the equivalent of the rolling stable branch, he said. He is confident that the rolling model will work because there are distributions and devices that already use the model in some form; "they like how it works". Overall, it is an attempt to address some of the mistakes made in the past around versioning and what the stable developers told users about what they should use, Levin said.

The rolling stable and rolling LTS branches already exist; rolling stable is at 5.14.9, as of this writing, while rolling LTS is at 5.10.70. As mentioned, they are just derivatives of the existing stable and LTS branches; there is nothing in them that is not in those other two. "If you want to follow the development philosophy of the Linux kernel, these branches are going to make it super easy for you".

In answer to a question, Levin said that there is some time between a mainline release and the corresponding changes ending up on the rolling stable tree. Once 5.15 is released, they will wait until after the 5.16 merge window closes before incorporating the changes from 5.15.

He was asked about the future of the LTS releases, as well. He noted that the LTS idea is something of a contradiction; on one hand, the stable maintainers recommend always using the latest LTS, but on the other, LTS releases are supported for five years. His personal hope is that LTS releases go away over time, but he does not realistically see that happening anytime soon because there is not enough testing and qualification being done. He expects, though, that in 20 years there will be no LTS kernels.

[I would like to thank LWN subscribers for supporting my travel to Seattle for Open Source Summit North America.]

Moving Google toward the mainline

By Jake Edge
October 5, 2021

Two Google engineers came to Open Source Summit North America 2021 to talk about a project to change the way the company creates and maintains the kernel it runs in its data centers on its production systems. Andrew Delgadillo and Dylan Hatch described the current production kernel (Prodkernel) and the problems that occur because it is so far from the mainline. Project Icebreaker is an effort to change that and to provide a near-mainline kernel for development and testing within Google; the talk looked at the project, its risks, its current status, and its plans.

Prodkernel

Delgadillo began the talk with Prodkernel, which runs on most of the machines in Google's data centers; those systems are used for handling search requests and "other jobs both externally facing and internally facing", he said. Prodkernel consists of around 9000 patches on top of an older upstream Linux kernel. Those patches implement various internal APIs (e.g. for Google Fibers), provide hardware support, add performance optimizations, and contain other "tweaks that are needed to run binaries that we use at Google". Every two years or so, those patches are rebased onto a newer kernel version, which presents a number of challenges. For one thing, there are a lot of changes in the kernel in two years; even if the rebase of a feature seems to go well, tracking down any bugs that crop up involves a "very large search space".

[Andrew Delgadillo]

There were some specific internal needs and timelines that drove the need for Prodkernel, he said, which is why Google could not simply use the mainline and push any of the extra features needed into that. He gave some examples of the features that are needed for Google's production workloads but that are not available in the mainline; those included a need to be able to set quality of service values from user space for outgoing network traffic, to have specific rules for the out-of-memory (OOM) killer, and to add a new API for cooperative scheduling in user space.

One of the big problems with Prodkernel is that it detracts from upstream participation, he said. Any features that Google creates for production are developed and tested on Prodkernel, which can be as much as two years behind the mainline kernel. If the developer wants to propose the feature for the mainline, the Prodkernel model imposes two main hurdles. For one, the feature needs to be rebased to the mainline, which may be a difficult task when the delta between the versions is large. Even if that gets done, there is a question of the testing of the feature. The feature has been validated on Prodkernel with production workloads, but now it has been taken to a new environment and been combined with totally new source code. That new combination cannot be tested in production because the mainline lacks the other features of Prodkernel.

Google's workloads tend to uncover bottlenecks, deadlocks, and other types of bugs, but the use of Prodkernel means that the information is not really useful to others. If Google is running something close to an LTS stable kernel, for example, reporting the bug might lead the team to a fix that could be backported; generally, though, finding and fixing the bugs is something that Google has to do for itself, he said. In addition, any fixes are probably not useful to anyone else since they apply to a years-old kernel. Also, any bugs that have been fixed in more recent kernels do not get picked up until either they are manually found and backported or the next rebase is done.

The rebasing process is "extremely costly" because it takes away time that developers could be working on other things. Each patch has to have any conflicts resolved with respect to the upstream kernel; it may well be that the developer has not even looked at the code for a year or more since they developed it, but they have to dig back in to port it forward. Then, of course, the resulting kernel has to be revalidated with Google's workloads. Bugs that are found in that process can be difficult to track down. Kernel bisection is one way, of course, but conflicts from rebasing need to be resolved at every step; that could perhaps be automated but it still makes for a difficult workflow, he said.

The delay associated with the rebasing effort worsens the problems with upstream participation, which makes the next rebase take that much more time. It is a pretty clear example of technical debt, Delgadillo said, and it just continues to grow. Each Prodkernel rebase increases the number of patches, which lengthens the time it takes to do the next one; it is not sustainable. So an effort is needed to reduce the technical debt, which will free up time for more upstream participation—thus further reducing the technical debt.

Icebreaker

[Dylan Hatch]

Hatch then introduced Project Icebreaker, which is a new kernel project at Google with two main goals. The first is to stay close to the upstream kernel; the idea is to release a new Icebreaker kernel for every major upstream kernel release. Those releases would be made "on time, we want to stay caught up with upstream Linux". That will provide developers with a platform for adding features that is close enough to the mainline that those features can be proposed upstream.

The second goal is to be able to run arbitrary Google binaries in production on that kernel. It would be "a real production kernel" that would allow validating the upstream changes in advance of the Prodkernel rebase. Under the current scheme, Google has been "piling everything into the tail end of this two-year period", he said. With Icebreaker, that testing can begin almost as soon as a new mainline kernel gets released.

Those goals are important because the team needs "better participation upstream". Developers working on features for kernels far removed from the current mainline have a hard time seeing the benefit of getting that feature upstream. There is a lot of work that needs to be done to untangle the feature from Prodkernel, test it on the mainline kernel, and then propose it upstream—all of which may or may not result in the feature being merged. The alternative is to simply wait for the rebase; time will be made available to do that work, but once the new Prodkernel is qualified, it is already too late for the feature to go upstream.

Having kernels closer to mainline will also help Google qualify and verify all of the upstream patches that much sooner. Rather than waiting until the two years are up and doing a huge rebase and retest effort, the work can be amortized over time.

Structure

There are two sides to consider when looking at the structure of the Icebreaker effort, he said. On one side is how features can be developed in order to get them deployed on an Icebreaker kernel. On the other is how those patches need to be upgraded in order to get them onto a new mainline kernel for the next Icebreaker release.

Icebreaker creates a fork from the mainline at the point of a major release. It adds a bunch of feature branches (also known as "topic branches") to that, each of which is a self-contained set of patches for one chunk of functionality that is being added by Google. That is useful in and of itself, because each of those branches is effectively a patch series that could be proposed upstream; "so you are starting with something upstreamable and not going the other way around", Hatch said.

Development proceeds on those feature branches, with bug fixes and new functionality added as needed. Eventually, those feature branches get merged into subsystem-specific staging branches for testing. The staging branches then get merged into a next branch for the release. The next branch is an Icebreaker kernel that is "ready to go, but it still has its roots in these feature branches", he said. After the release is made, a "fan-out merge into the staging branches" is done, in order to synchronize them with the release version. Importantly, this fan-out merge is not done into the feature branches. Those stay in a pristine upstreamable state.

By following the life of one of these feature branches, we can see how the upgrade process goes, he said. When a new mainline kernel is released, a new branch for the feature is created and the branch for the earlier kernel is merged onto it. The SHA1 values for the commits on the earlier feature branch are maintained and the conflict resolution is contained in the merge commit.

Bug handling is easier with this workflow. The bugs can be fixed on the earliest supported feature branch where they occur and then merged into all of the successive feature branches. The SHA1 of the commit that introduced the bug and that of the fix will remain the same on those other branches. There is no need to carry extra metadata to track the different fix patches in each of the different supported kernel versions.

The Icebreaker model is much more upstream-friendly than the current Prodkernel scheme, Hatch said. The Icebreaker feature branches are done on an up-to-date mainline kernel and they get tested that way, so the test results are more relevant to upstream. This will allow developers to propose features for the mainline far more easily. Much of the Icebreaker branch structure and the like can be seen in the diagrams in the slides from the talk.

Risks

"There are some risks with Icebreaker, unfortunately", he said. One of the bigger ones is that there needs to be a lot of feature branch testing. There may be a tendency to treat those branches like a file cabinet, where patches are stored and merged into wherever they are needed. But that is not useful if it is not known whether "it builds or boots or passes any tests".

Thus it is important to validate just the feature branch before merging it elsewhere. If it is known that it was working before the merge, then any subsequent breakage will have been caused by something in the merge. Otherwise, it is just complicating the whole process to merge a feature in an unknown state into a new tree. The same goes when upgrading to a new mainline kernel version, he said.

The dependencies between features could be a risk for Icebreaker as well. The model is that features are mostly self-contained, but that is not completely true; there are some dependencies. They can range from APIs being used in another feature to performance optimizations that are needed for a feature to do its job correctly. That could be handled by resolving the dependencies on the staging branch, but those branches are not carried along to the next Icebreaker kernel, only the feature branches are.

The answer is to do merges between feature branches, which does work, but adds some complexities into the mix. There is a need to figure out which branches can be merged into each other. "How crazy can we let these merges become?", he asked. There are no rules for when two feature branches should simply be turned into a single feature branch or when there is utility in keeping them separate; those things will have to be determined on a case-by-case basis.

Another risk is that Icebreaker is much less centralized than the Prodkernel process is. Feature owners and subsystem maintainers within Google will need to participate and buy into this new workflow and model. They will need to trust that this new Icebreaker plan—confusing and complicated in the early going—will actually lead to better outcomes and a reduction in the technical debt.

The last risk that Hatch noted is that features in Icebreaker do actually need to get upstream or it will essentially devolve into Prodkernel again. If more and more patches are added to Icebreaker without a corresponding reduction as features are merged upstream, the team will not be able to keep up with the mainline. The production kernel team needs to take advantage of the fact that Icebreaker is so close to mainline and get its features upstream.

Status and plans

Delgadillo took over to talk about the status of Icebreaker. At the time of the talk, it was based on the 5.13 kernel, while the 5.15 kernel was in the release-candidate stage. So the project is essentially one major release behind the mainline, which is "a lot closer than we have ever been".

[Dylan Hatch & Andrew Delgadillo]

In the process, some 30 patches were dropped from the tree because they were merged upstream. Out of 9000 patches being carried, 30 may not sound like a lot, he said, but it is a start. It is not something that would have happened without a project like Icebreaker. The team is working on 5.14 now and was able to drop 12 feature branches as part of that. Those were for features Google was backporting from the mainline, but that does not need to be done for recent kernels. That is another reduction in the technical debt, he said. Hopefully that process will get "easier and easier as we go along".

In addition, issues have been found and fixed, then either reported upstream or sent to the stable trees for backporting. That is not something that happened frequently with Prodkernel because it was so far behind the mainline. In general, they were build fixes and the like, he said, but they were useful to others, which is a plus.

Looking forward, Icebreaker plans to catch up to upstream at 5.15 or 5.16, which will be a turning point for the project. It will be "riding the wave" of upstream at that point, which will allow "us to relax the cadence at which we need to update our tree", he said. One of the problems that has occurred is that feature maintainers have had to rebase and fix conflicts every three or four weeks as Icebreaker worked on catching up; in the Prodkernel model, that would only happen every two years or so. Once the project has caught up, there will only need to be rebases every ten or so weeks, aligned with the mainline schedule.

Testing the Icebreaker feature branches on top of mainline kernel release candidates is also something the project would like to do. That would allow Google to participate in the release-candidate process and help test those kernels. Once Icebreaker is aligned with mainline, it will make upstream development of features possible in a way that simply could not be done with Prodkernel.

At that point, Delgadillo and Hatch took questions. The first was about the plans for Prodkernel: will there still be the giant, two-year rebase? Hatch said that for now Icebreaker and Prodkernel would proceed in parallel. Delgadillo noted that Icebreaker is new, and has not necessarily worked out all of its kinks. He also said that while Icebreaker is meant to be functionally equivalent to Prodkernel, it may not be at parity performance-wise. It is definitely a goal to run these kernels in production, but that has not really happened yet.

Readers may find it interesting to contrast this talk with one from the 2009 Kernel Summit that gives a perspective on how things worked 12 years ago.

[I would like to thank LWN subscribers for supporting my travel to Seattle for Open Source Summit North America.]

User-space interrupts

By Jonathan Corbet
September 30, 2021

The term "interrupt" brings to mind a signal that originates in the hardware and which is handled in the kernel; even software interrupts are a kernel concept. But there is, it seems, a use case for enabling user-space processes to send interrupts directly to each other. An upcoming Intel processor generation includes support for this capability; at the 2021 Linux Plumbers Conference, Sohil Mehta ran a Kernel-Summit session on how Linux might support that feature.

[Sohil Mehta]

At its core, Mehta began, the user-space interrupts (or simply "user interrupts") feature is a fast way to do event signaling. It delivers signals directly to user space, bypassing the kernel to achieve lower latency. Our existing interprocess communication mechanisms all have limitations, he said. The synchronous mechanisms often require a dedicated thread, have high latency, and are generally inefficient. Asynchronous mechanisms (signals, for example) have even higher latency. So often the only alternative is polling, which wastes CPU time. It would be nice to have a fast, efficient alternative.

That alternative is user-space interrupts, which will first appear in Intel's "Sapphire Rapids" CPUs. RFC patches supporting this feature were posted in mid-September. Those patches support user-to-user signaling without going through the kernel; instead, the new SENDUIPI instruction allows one process to send an interrupt directly to another process. Future versions will also include kernel-to-user signaling and, eventually, interrupts sent directly to user space from devices.

Mehta put up some benchmark results (which can be seen in the slides) showing that user-space interrupts are nine times faster than using eventfd(), and 16 times faster than using pipes or signals. The advantage is lower if the receiving process is blocked in the kernel, since it is not possible to avoid a context switch in that case. Even then, user-space interrupts are 10% faster for the recipient, and significantly faster for the sender, which need not enter the kernel at all. Florian Weimer asked how user-space interrupts compared to futexes, but evidently that testing has not been done.
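For readers unfamiliar with the eventfd() baseline used in that comparison, here is a minimal sketch of signaling through an eventfd. It is not the benchmark code from the talk; the helper names notify() and wait_for_events() are invented for this example. Note that both the sender and the receiver make a system call, which is the overhead that user-space interrupts are meant to avoid on the sending side.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sender side: add `count` to the eventfd counter, waking any reader.
 * This is a write() system call, so the sender enters the kernel.
 * Returns 0 on success, -1 on error.
 */
int notify(int efd, uint64_t count)
{
    assert(efd >= 0);
    ssize_t n = write(efd, &count, sizeof(count));
    return n == (ssize_t)sizeof(count) ? 0 : -1;
}

/*
 * Receiver side: block until the counter is nonzero, then return its
 * value; the kernel resets the counter to zero (non-semaphore mode).
 * Returns 0 on error.
 */
uint64_t wait_for_events(int efd)
{
    uint64_t count = 0;

    assert(efd >= 0);
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return 0;
    return count;
}
```

A descriptor created with eventfd(0, 0) accumulates notifications, so two calls to notify() with counts 1 and 2 followed by one wait_for_events() yield 3.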

Use cases for this feature include fast interprocess communication, of course. User-mode CPU schedulers should be able to benefit from it, as can user-space I/O stacks (networking, for example). Getting the full benefit from this feature will require enhancements to libraries like libevent and liburing. There are no real-world applications using this feature yet, Mehta said; he is interested in hearing about other applications that might benefit from it. Ted Ts'o suggested host-to-guest wakeups in virtualization environments; evidently that use case is being investigated, but there are no real results yet.

For any number of good reasons, user-space processes cannot just arbitrarily send interrupts to others; there is some setup required. On the receiving side, it all starts with a call to:

    uintr_register_handler(handler, flags);

Where handler() is the function that is called to handle a user-space interrupt, and flags must be zero. The definition of the handler function requires a bit of special care; its prototype is:

    void __attribute__ ((interrupt))
        handler(struct __uintr_frame *frame, unsigned long long vector);

The next step is to create at least one file descriptor associated with this handler:

    int uintr_create_fd(u64 vector, unsigned int flags);

Here, vector is a number between zero and 63; one file descriptor can be created for each vector. The process then hands that file descriptor to the sending side. If the sender is another thread in the same process, the hand-off is trivial; otherwise a Unix-domain socket can be used to transfer the descriptor. The sender then performs its setup with:

    int uintr_register_sender(int fd, unsigned int flags);

Where fd is the file descriptor passed by the recipient and flags, as always, is zero. The return value is a handle that can be used with the _senduipi() intrinsic that is supported by GCC 11 to actually send an interrupt to the receiver.

Actual delivery of the interrupt depends on what the receiver is doing at the time; if that process is running in user space, the handler function will be called immediately with the appropriate vector number. Once the handler returns, execution will continue at the point of interruption. If the receiver is blocked in a system call in the kernel, the interrupt will be delivered on return to user space without interrupting the in-progress system call. There is a uintr_wait() system call in the patch set that will block until a user-space interrupt arrives then return immediately, but it is described as a "placeholder" until the desired behavior for this case is worked out.

Prakash Sangappa asked whether it was really necessary to exchange the file descriptor with all senders; in a system where there could be large numbers of senders, that could get expensive. Mehta replied that there are a couple of optimizations that are being looked at. Ts'o asked whether user-space interrupts could be broadcast to multiple recipients; the answer is that broadcast is not supported.

Arnd Bergmann wanted to know if any thought had been given to emulating this feature on older CPUs. The answer appears to be yes; the kernel will trap the relevant instructions and transparently emulate them. Mehta asked for feedback on the emulation mechanism and, in particular, whether it should be implemented for other architectures. Bergmann discouraged that idea, saying that if user-space interrupts are implemented for those architectures, they will surely not be compatible with the emulated version. Emulation for other architectures, he said, should only be done once those architectures have defined their own instructions.

Greg Kroah-Hartman asked whether the Clang compiler has support for the _senduipi() intrinsic; that support is being worked on, but is not yet ready. Kroah-Hartman also asked for more details on workloads that benefit from this feature, to which Mehta replied that he did not have anything specific to point to yet.

Mehta closed the session (which was running out of time) by asking what should happen when the recipient is blocked in a system call. As mentioned, the current patch set waits for the system call to return before delivering the interrupt. Should the behavior be changed to be closer to signals, with the interrupt delivered immediately and the system call returning with an EINTR status? Nobody had an opinion to share on that question, so the session ended there.

The video of this talk is available on YouTube.


How Red Hat uses GitLab for kernel development

By Jonathan Corbet
October 1, 2021

LPC
Much of the free-software development world has adopted Git forges (such as GitHub, GitLab, or sourcehut) with enthusiasm. The kernel community has not. Reasons for that reticence vary, but one that is often heard is that these forges simply don't work well at the scale needed for the kernel project. At a Kernel-Summit session during the 2021 Linux Plumbers Conference, Donald Zickus and Prarit Bhargava sought to show how Red Hat has put GitLab to good use to support its kernel team. Not only can these forges work for kernel development, they said, but moving to a forge can bring a number of advantages.

The transition

Red Hat has transitioned its kernel team from "an old Patchwork server" to GitLab in the last year, Zickus began. Prior to the change, the team had a fairly traditional, email-based workflow that got harder to manage as the patch volume increased. Red Hat has a number of strict rules regarding patch review and getting acknowledgments from the appropriate people; tracking the readiness of patches as they went through this gauntlet, which involves a lot of manual work, became increasingly hard. Reviewers didn't know which patches they should be looking at, and the continuous-integration (CI) system was bolted on.

It was time to make a change, so the company turned to GitLab.

Bhargava briefly introduced the Lab tool, which provides a command-line interface to many GitLab features. Perhaps ironically, this tool is hosted on GitHub. A lot of developers prefer a command-line interface, he said.

In general, kernel maintainers tend to have their own scripts; each maintainer's tooling is different. Some maintainers would detect certain kinds of errors, while others would not. GitLab's ability to run scripts on actions has replaced much of this customization, ensuring that each patch is treated consistently and has the proper signoffs, includes the (apparently mandatory) Bugzilla ID, etc. Patches that come up short in one way or another can be labeled as needing attention.

Email, Bhargava said, makes it easy to comment on patches. It is rather less easy for maintainers to sift through the resulting volume of messages. GitLab is able to thread comments and replies, all organized per merge request, making the process easier. All of this is tied to a "big fat 'approve' button" to allow a merge to proceed. At this point, he said, he's not seeing developers using email-based approvals anymore.

The upstream kernel uses the MAINTAINERS file to determine who should review a given patch, which is another step for contributors to remember. Within Red Hat, that process has now been automated; when a merge request is generated, maintainers and reviewers are assigned automatically based on an owners.yaml file. There are two categories of review, depending on whether the reviewer's approval is required. Interested developers can sign up for notifications for changes to specific areas.

Previously, CI was added to the email-based process, separated from the generation of patches. Nothing required its use. In the new system, CI is integrated directly. While CI systems are not exciting to most maintainers, he said, they should be; they add a lot of stability to the kernel. With some of the CI testing done for the kernel as a whole, developers don't even know that the testing is happening; GitLab makes CI testing explicit and visible.

Zickus took over to say that the experience with GitLab has not been entirely smooth; they have found various problems over time. GitLab has worked with them to resolve these problems, which were mostly with the API and tooling. Red Hat also has a dedicated group working on issues that GitLab has not been able to resolve; there is a "strategic partnership" between the two companies.

There are some open issues, of course, including managing the chain of trust: pull requests for the kernel need to be properly signed. Better logs for merge requests would be helpful. Perhaps the biggest concern, though, has to do with making GitLab into a single point of failure; what if the company is bought out by somebody who is hostile to Red Hat and its goals? In that case, it would be relatively easy to pull all of the necessary data out of the system; the Git trees are already mirrored elsewhere. They have a script now that can take all of the comments from GitLab and dump them into a public-inbox instance.

Bhargava closed the prepared talk by saying that it is still not 100% clear that GitLab is the best way forward, even though Red Hat is fairly deeply invested in it at this point. But the Git forge approach is worthwhile, he said. There were a lot of worries about making this kind of transition that turned out not to be real problems.

Discussion

Greg Kroah-Hartman started the discussion by (somewhat jokingly) congratulating the speakers for having integrated all of Gerrit's functionality into GitLab. But, he asked, how many patches are they really managing this way? Zickus answered that they manage 15-16,000 patches for each RHEL update, every three months. Bhargava said that, when they sat down to look at forges two years ago, Gerrit didn't have many of the features they were looking for; perhaps it has gotten better since.

Ted Ts'o said that his biggest worry is collaboration between subsystems. If information for one subsystem is "orphaned" in GitLab, that is going to make life harder for developers elsewhere. He is worried about discussions in particular; comments in Gerrit, for example, are not available anywhere else. The kernel community could consider hosting a GitLab instance at, say, kernel.org, but then some patch comments would live there, while others would be on the mailing lists. Developers would have to search in two places to get the full picture. Unless email can remain a first-class citizen in the development process, he said, a workable transition is hard to see.

Zickus answered that Red Hat is running an email bridge now that is used to ease the transition for developers. It is not intended to be kept around as a long-term solution, though.

Konstantin Ryabitsev said that he is not excited about the possibility of hosting a forge instance on kernel.org. Hosting the Bugzilla instance there has not been a good experience; it is mostly abandoned, but he is stuck cleaning out the spam that accumulates there. Once a tool is added, it is almost impossible to remove, since a couple of people inevitably become dependent on it. So it will have to be maintained forever.

A bigger issue, though, has to do with robustness. If kernel.org is not reachable, kernel development still goes on; an outage is inconvenient but not really a big problem. Adding a central forge, though, risks creating a situation where, should it go down, no work can get done. Imagine a situation, he said, where there is a zero-day vulnerability affecting billions of devices, and an attacker wants to prevent it from being patched. Attacking a crucial piece of infrastructure like a central forge might then look like a good idea. If the answer is "fall back to email", then nothing has really been solved.

Zickus said that GitLab is replicated and stored on the Google cloud, to which Ryabitsev responded that the Google cloud is unavailable in parts of the world like Russia and China. A large cloud provider is not a good solution, he said; on the other hand, self-hosting brings its own set of scaling and expense problems. Zickus said that Red Hat has a large testing team in China, and GitLab works well there. If that were to change, though, it would be a real problem.

Ryabitsev said he is not opposed to subsystems switching to a forge, as long as it doesn't become "a place where discussions go to die". Currently, if the infrastructure goes away, the "fossil record" is still around on sites like lore.kernel.org. Zickus said that dumping conversations to a public-inbox solves some of that problem. He then summarized the conversation by saying that there isn't really opposition to using the tool, but there are worries about preserving the conversations that happen there.

As time ran out, Ts'o said that this session was not the end of the conversation. Forges like GitLab point to what could be in our future, he said, and features like automatic CI testing are a really good idea.

The video of this talk is available on YouTube.


Rust and GCC, two different ways

By Jonathan Corbet
October 4, 2021

LPC
Developers working in languages like C or C++ have access to two competing compilers — GCC and LLVM — either of which can usually get the job done. Rust developers, though, are currently limited to the LLVM-based rustc compiler. While rustc works well, there are legitimate reasons for developers to wish for an alternative. As it turns out, there are two different ways to compile Rust using GCC under development, though neither is ready at the moment. Developers of both approaches came to the 2021 Linux Plumbers Conference to present the status of their work.

rustc_codegen_gcc

First up was Antoni Boucher to talk about the rustc_codegen_gcc project. The rustc compiler, he started, is based on LLVM; among other things, that means that it does not support all of the architectures that GCC supports. What rustc does have, though, is an API that allows plugging in an alternative code generator. This API can be used to plug in the GCC code-generation machinery via libgccjit. That is the approach taken by rustc_codegen_gcc.

Why would this be a useful thing to do? Boucher said that the Rust language is increasing in popularity, but it needs support for more architectures than LLVM can provide. The Rust for Linux work, in particular, is highlighting this issue, but there are a lot of other users out there too. Developers of embedded systems would benefit from better architecture support, as would the Firefox browser.

A number of Rust features are supported by rustc_codegen_gcc now, including basic and aggregate types, variables, functions, atomics, thread-local storage, inline assembly, numerous intrinsics, and more. The compiler has support in the Compiler Explorer. The libcore tests pass, as do most of the user-interface tests. As an experiment, this compiler has been used to build Rust code for the m68k architecture; this work is still in an early stage, Boucher said, but it shows that it is indeed possible to build Rust programs for platforms that LLVM does not support.

There are still some problems to solve. A number of attributes still need support, as does generation of debug information. The quality of the generated code is not always the best. More work must be done to support new architectures. Link-time optimization is not yet supported, and so on. This work has also required a set of changes to libgccjit, most of which are still in review.

There are some other issues too, including the need to use a patched version of GCC until all of the changes have been merged upstream. Even then, there will be a need to backport those patches to allow the use of older GCC versions, which is important to be able to compile the kernel.

Even so, this project seems reasonably well advanced. Boucher noted that there was an active pull request to add rustc_codegen_gcc to the rustc compiler, but that's no longer true; that merge was done on September 29.

Native GCC

Philip Herron then took over to talk about the native GCC front end for Rust, known as gccrs. Rather than being a hybrid of LLVM and GCC, this compiler is a full implementation of the Rust language in the GNU toolchain. This work is written in C++ (easier to bootstrap, he said) and is intended to become a part of mainline GCC. It uses the existing binutils, and reuses the official Rust libraries (such as libcore, libstd, and libproc).

Once again, the speaker raised the question of "why?". He likes big projects, he said, so this one looked attractive. It makes for an interesting contrast with how the problems are solved in LLVM, and is a good opportunity to see how GCC handles a modern, high-level language. Once the work is done, it will be useful to compare the results in terms of code size, register allocation, and energy efficiency.

There are a lot of benefits that come from having an independent implementation of Rust, he said. Tight integration with GCC will be useful for a number of projects, which will be able to benefit from GCC plugins as well. A GCC-based Rust compiler will make it easier to bootstrap rustc on new platforms. Support for link-time optimization, which tends not to work well in mixed-compiler situations, should be improved. And, of course, GCC brings support for more target architectures.

Work toward Rust support in GCC got started back in 2014, Herron said, but then it stalled out; the language was evolving too fast for the GCC developers to keep up with it. This effort was then restarted in 2019; the recent interest in Rust for the kernel is helping to drive this project. Various companies, including Open Source Security and Embecosm, are supporting development of the GCC-based Rust compiler. There is a detailed plan in place for a "minimum viable product" compiler to be released by the end of 2022.

Thus far, working support exists for core data structures and most control flow, though some of the control-flow work is still in progress. Generics and trait resolution work. Future work includes macros, imports, unstable features, and intrinsics. Amusingly, the current compiler can build "hello world", but it requires using unsafe code; the lack of macros means that println!() is unavailable and the C printf() function must be called instead.

Work planned for further in the future includes the borrow checker, which will be done in cooperation with the Polonius project. Incremental compilation is on the list, as is backporting the front-end to older GCC versions. Longer term, it is hoped that this work will help to drive Rust compiler compatibility testing in general.

The video of these talks is available on YouTube.


New features coming in Julia 1.7

October 4, 2021

This article was contributed by Lee Phillips

Julia is an open-source programming language and ecosystem for high-performance scientific computing; its development team has made the first release candidate for version 1.7 available for testing on Linux, BSD, macOS, and Windows. Back in May, we looked at the increased performance that arrived with Julia 1.6, its last major release. In this article we describe some of the changes and new features in the language and its libraries that are coming in 1.7.

Historically, Julia's release candidates have been close to the finished product, and most users who would like to work with the new features can safely download binaries of version 1.7rc1 from Julia headquarters in the "upcoming release" section. Officially, however, the current version is not "production ready"; the developers welcome bug reports to the GitHub issue tracker.

Syntax changes

An exhaustive list of all of the changes can be found in the release notes. We will look at some of those, starting with a small handful of adjustments to the language syntax to add an extra measure of concision and expressiveness. As with all language alterations since version 1.0, nothing in 1.7 should create breakage except in rare cases.

A new form of array concatenation presents one of these rare cases. Previously, the semicolon was used for concatenation along the first dimension, and it retains that meaning. But repeated semicolons were treated as a single semicolon, and now they have a new significance. This should only break programs where a repeated semicolon is present as a typo.

Currently, the semicolon operator works as follows:

    v1 = [1, 2, 3]
    v2 = [4, 5, 6]
    [v1; v2]   # results in [1, 2, 3, 4, 5, 6]

But the operator is extended in version 1.7: n semicolons now concatenate along the nth dimension. New dimensions are created as needed. For example, the result of [v1;; v2] is to create a new second dimension and join along it, producing the 3×2 matrix:

    1  4
    2  5
    3  6

The new operator syntax allows us to create a third dimension concisely: [v1;;; v2] gets us a 3×1×2 array, with:

    1
    2
    3

as the first plane and:

    4
    5
    6

as the second plane.

Thinking about indexing can help clarify the results of matrix concatenation, but it is important to remember that Julia, unlike many other languages, uses 1-based indexing. Our initial vectors were 1D, so they're indexed with a single dimension: v2[2] = 5. When joining them together along a new second dimension, the result is 2D, so it has two indices: [v1;; v2][2, 1] = 2. The result of [v1;;; v2] is a 3D array; before the third dimension is added, a second dimension must exist, but the concatenation is along the third dimension, so the result has a shape of 3×1×2. Indexing of this 3D array uses three indices: [v1;;; v2][3, 1, 2] = 6. The second index only goes to 1 because there is one column.
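The shapes and index values described above can be checked directly. What follows is a minimal sketch, assuming Julia 1.7 or later, since the repeated-semicolon syntax does not parse in earlier releases. The names m and t are introduced here purely for illustration. The sketch also compares the new syntax against the long-standing hcat() and cat() functions, which should build the same arrays:

```julia
v1 = [1, 2, 3]
v2 = [4, 5, 6]

m = [v1;; v2]     # join along a new second dimension
t = [v1;;; v2]    # join along a new third dimension

@assert size(m) == (3, 2)
@assert size(t) == (3, 1, 2)

# The index examples from the text (remember: 1-based indexing).
@assert v2[2] == 5
@assert m[2, 1] == 2
@assert t[3, 1, 2] == 6

# Equivalent spellings that also work before 1.7.
@assert m == hcat(v1, v2)
@assert t == cat(v1, v2; dims=3)
```

The semicolon forms are shorthand; code that must also run on older Julia versions can keep using hcat() and cat().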

There are also two new destructuring features. The first one is most easily explained with an example. Given a struct:

    struct S
        a
        b
    end

we can instantiate it with newS = S(4, 5). As before, we can access the two properties of newS with newS.a and newS.b, which yield 4 and 5. The new feature allows us to do this:

    (; a, b) = newS

Now a has the value 4 and b has the value 5.
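Property destructuring matches on names rather than positions, which a short experiment can show. This is a sketch assuming Julia 1.7 or later; the Point struct and the variable names are hypothetical and exist only for this illustration:

```julia
# Hypothetical struct used only for this illustration.
struct Point
    x
    y
end

p = Point(4, 5)

# Names are matched against property names, so the order on the
# left-hand side does not matter.
(; y, x) = p
@assert x == 4 && y == 5

# The same syntax works on named tuples; properties that are not
# named on the left-hand side (here, y) are simply not extracted.
nt = (x = 10, y = 20, z = 30)
(; z, x) = nt
@assert x == 10 && z == 30
```

Because the lookup is by name, adding or reordering fields in the struct should not silently change what ends up in each variable, which is not true of positional destructuring.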

The second destructuring feature eliminates a potential source of bugs when doing a destructuring operation into a mutable container. If we define a vector with a = [1, 2], what would be the result of a[2], a[1] = a? Anyone writing this probably intends to switch the positions of the elements, changing a to [2, 1]. But in previous versions of the language, the result was [1, 1], because a was being mutated as it was iterated during the destructuring.

To see why it worked this way up to now, the following illustrates the steps the language would take when evaluating the operation:

    a[2] = a[1]    # now the vector a == [ 1, 1 ]
    a[1] = a[2]    # not quite what we were after

The new version does what most people probably expect by deferring the mutation during iteration. (Of course, this is another breaking change in the rare cases where people actually depended on the mutating behavior.)
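For code that has to behave the same way on both sides of this change, one option is to build the right-hand side as a tuple first. The sketch below shows that portable spelling next to the form discussed above; the version check uses v"1.7-" so that 1.7 prereleases such as release candidates are included, and the expected values in each branch are taken from the behavior described in this article:

```julia
# Portable spelling: the right-hand-side tuple is built before any
# assignment happens, so this swaps the elements in every version.
a = [1, 2]
a[1], a[2] = a[2], a[1]
@assert a == [2, 1]

# The form discussed above; its result depends on the Julia version.
b = [1, 2]
b[2], b[1] = b
if VERSION >= v"1.7-"
    @assert b == [2, 1]    # mutation is deferred: a real swap
else
    @assert b == [1, 1]    # b was mutated while being iterated
end
```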

Unicode refinements

Julia continues to refine its embrace of Unicode as part of the syntax. From the start, Julia has allowed juxtaposition to mean multiplication in cases where it wasn't ambiguous, so if y = 7 then 2y was 14. It has also allowed Unicode square- and cube-root symbols to have their familiar mathematical meaning. Now we can combine juxtaposition with these symbols, as this REPL session illustrates:

    julia> x = 64
    64

    julia> √x
    8.0

    julia> 3√x
    24.0

    julia> ∛x
    4.0

    julia> 3∛x
    12.0

Infix operators in Julia are simply functions that can be used with an alternative syntax. For example, the plus operator is really a function: 3 + 4 is the same as +(3, 4).

The programmer can use a wide variety of Unicode symbols as the names of functions, but in general these names cannot be used as infix operators unless they're given special treatment by the language. Julia 1.7 defines two new symbols that can be used this way: ⫪ and ⫫, which can be entered in the REPL using \Top and \Bot followed by TAB. They have the same precedence as comparison operators such as >.

As a little example of what one might do with this, the following REPL session defines a function that tests if the absolute value of its first argument is larger than the absolute value of its second:

    julia> function ⫪(a, b)
               return abs(a) > abs(b)
           end
    ⫪ (generic function with 1 method)

    julia> ⫪(-8, 3)
    true

    julia> ⫪(-9, -12)
    false

    julia> -9 ⫪ -12
    false

The last input shows that this new symbol can be used as an infix operator.

The ability to use Unicode for names and operators helps to make Julia programs look more like math. However, Unicode is notorious for containing distinct characters that appear identical, a circumstance that has led directly to a class of domain-name vulnerabilities. Having different characters that are visually indistinguishable can obviously be a source of serious bugs.

One way Julia tries to prevent this problem is by ensuring that such visually identical characters have identical semantics. Version 1.7 defines three Unicode centered dots, with code points U+00b7, U+0387, and U+22c5 (they all look like this: ·) as functionally identical. They can be used as infix operators. I tested this by defining a function using one of the dots, and found that I could call my function, including in infix form, with either of the others. The LinearAlgebra package uses the centered dot for a vector inner product.
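A test along those lines can be written so that the code points are explicit rather than depending on which look-alike character survives copy and paste. The following is a sketch, assuming Julia 1.7 or later; it builds each expression from a string with Meta.parse() and evaluates it, using the dot product that LinearAlgebra exports. The vectors u and w are made up for the example:

```julia
using LinearAlgebra   # exports the dot-product operator

u = [1, 2, 3]
w = [4, 5, 6]

# Build the expression "u <dot> w" once per code point and evaluate it.
for dot in ('\u00b7', '\u0387', '\u22c5')
    expr = Meta.parse("u $dot w")
    @assert eval(expr) == 32     # 1*4 + 2*5 + 3*6
end

# The Unicode minus sign (U+2212) is parsed like the ASCII hyphen.
@assert eval(Meta.parse("5 \u2212 3")) == 2
```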

The new version also brings in the Unicode minus sign (U+2212, or \minus in the REPL) to mean the same thing as the ASCII hyphen that we generally use for subtraction.

New REPL features

The REPL has long featured something called "paste mode", where a user could paste in an example session, such as the ones above, directly into the REPL; it would automatically strip out the prompts, recognize user input, and generally do the right thing. This has now been extended to all of the REPL modes (pkg, shell, and help) in addition to the normal mode; it even switches modes automatically based on the prompt string in the pasted text.

The help mode shows documentation mainly by formatting docstrings supplied by the programmer at the function or module level. Now, in case a module is missing a docstring (not uncommon, especially with small packages), help will look around in the package directory for a README file and print the one closest to the module in question. In any case, it will print the list of exported names. A similar feature existed in some pre-1.0 Julia versions, so this is more of a revival than something new.

If the REPL user does something that returns a large array, the REPL will print an abbreviated form, using ellipses to indicate skipped elements. However, until now, the REPL would mercilessly dump any enormous string that it was asked to display. In Julia version 1.7, long strings are elided in a similar manner as long arrays. The REPL will print the first three and a half lines or so, followed by a notation like ⋯ 13554 bytes ⋯, and then the final three and a half lines. The show() function can be used to see the whole string.

If the user attempts to import an uninstalled package, now the REPL will offer some advice:

    julia> using Example
    │ Package Example not found, but a package named Example is available from a
    │ registry. 
    │ Install package?
    │   (@v1.7) pkg> add Example 
    └ (y/n) [y]: 

Answering "y" is all that is needed, whereas before users would need to know to enter package mode and use the add command. This should be particularly helpful for beginners.

Changes in the standard library

The well-established deleteat!(v, i) function mutates the Vector v by deleting elements at the indices in the list i. The new Julia version adds its twin, keepat!(v, i), which is also a mutating function as indicated by the "!" convention. It effectively deletes the elements for all of the indexes not present in i.
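The two functions are complementary, as a small example can show. This sketch assumes Julia 1.7 or later for keepat!(); the vectors are invented for the illustration, and the index lists are given in sorted order:

```julia
v = [10, 20, 30, 40, 50]
keepat!(v, [1, 3, 5])      # keep only the elements at these indices
@assert v == [10, 30, 50]

w = [10, 20, 30, 40, 50]
deleteat!(w, [2, 4])       # delete the complementary set of indices
@assert w == [10, 30, 50]
```

Both calls modify their argument in place and leave the same three elements behind.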

Previously, redirecting both standard error and standard output in a Julia session was an exercise in verbosity requiring four levels of nesting. Now, there is a new library function that makes this much easier:

    redirect_stdio(p, stdout="stdout.txt", stderr="stderr.txt")   

This will call the function p() and redirect its print statements to the supplied paths. This new function will be most convenient when used with a do block, which creates an anonymous function and passes it as the first argument to a function call. In this way we can wrap anything we want inside the redirect_stdio() function:

    redirect_stdio(stdout="stdout.txt", stderr="stderr.txt") do
        println("Julia has rational numbers:")
        println(1//5 + 2//5)
    end

After executing this in the REPL we will have an empty stderr.txt file and a stdout.txt file with contents:

    Julia has rational numbers:
    3//5
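For comparison, the nested form that redirect_stdio() replaces might look something like the following. This is only a sketch of one plausible pre-1.7 spelling, not code taken from the release notes; it writes into a scratch directory from mktempdir() instead of the current directory, and the variable names are made up for the example:

```julia
dir = mktempdir()                        # scratch directory for this sketch
outpath = joinpath(dir, "stdout.txt")
errpath = joinpath(dir, "stderr.txt")

# Four nested levels: two to open the files, two to redirect the streams.
open(outpath, "w") do out
    open(errpath, "w") do err
        redirect_stdout(out) do
            redirect_stderr(err) do
                println("Julia has rational numbers:")
                println(1//5 + 2//5)
            end
        end
    end
end

@assert read(outpath, String) == "Julia has rational numbers:\n3//5\n"
@assert isempty(read(errpath, String))
```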

Like other languages, Julia has tuples that are immutable vectors; it has a named version of tuples as well:

    tup = (1, 2, 3)
    ntup = (a=1, b=2, c=3)
    # ntup[:a] == ntup.a == 1

As shown in the comment, named tuples can be indexed using a name (e.g. ntup[:a]), which is equivalent to the property-style access (e.g. ntup.a). Subsets of ordinary tuples can be extracted by indexing with a vector of indices:

    tup[ [1, 3]  ]    # yields a new tuple: (1, 3)

New methods added to the getindex() function, which performs indexing behind the scenes, mean that we can now index named tuples with a vector of symbols:

    ntup[ [:a, :c] ]   # yields a new named tuple: (a = 1, c = 3)

This new feature makes named tuple indexing consistent with indexing of ordinary tuples.

The existing replace() function can make a single substitution on a string. It has been enhanced to accept any number of replacement patterns, which are applied left to right and "simultaneously". The release notes use the "simultaneously" term, which simply means that replaced substrings are not subject to further replacements. The new function is both convenient and significantly faster than the regular-expression techniques that people usually resort to. A simple example should clarify how it works:

    julia> s = "abc"
    "abc"

    julia> replace(s, "b" => "XX", "c" => "Z")
    "aXXZ"

    julia> replace(s, "c" => "Z", "Z" => "WWW")
    "abZ"

    julia> replace(s, "ab" => "x", "bc" => "y")
    "xc"

The last two examples show how the feature works in the presence of multiple possible matches.
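The difference between "simultaneous" replacement and chaining single replacements is easiest to see side by side. The sketch below assumes Julia 1.7 or later for the multi-pattern calls; the variable name chained and the "a-b" string are invented for the illustration:

```julia
s = "abc"

# Chained calls: the second call sees the output of the first, so the
# freshly inserted "Z" is itself replaced.
chained = replace(replace(s, "c" => "Z"), "Z" => "WWW")
@assert chained == "abWWW"

# One multi-pattern call: replaced text is not scanned again.
@assert replace(s, "c" => "Z", "Z" => "WWW") == "abZ"

# This makes swaps possible without a temporary placeholder.
@assert replace("a-b", "a" => "b", "b" => "a") == "b-a"
```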

Significant improvements related to pseudo-random number generation are arriving with version 1.7. The default generator has been swapped with one that has better performance in terms of time, memory consumption, and statistical properties. The new generator also makes it easier to perform reproducible parallel computations.

Julia supports several types of parallel and concurrent computation. Most parallel computation in Julia is organized around tasks, which are similar to coroutines in other languages. The new random number generator is "task local", which means that, during a parallel computation, each task gets its own instance of the generator. The same sequence of pseudo-random numbers will be generated on each task, independent of the allocation to threads. This allows for reproducible simulations using random numbers, even with algorithms that dynamically create tasks during run time.
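A small experiment can illustrate the reproducibility claim. This is a sketch assuming Julia 1.7 or later; simulate() is a hypothetical helper written for this example. It seeds the generator of the current task, spawns a few tasks that each draw one number, and collects the results in spawn order:

```julia
using Random

# Hypothetical helper: seed the current task's generator, spawn a few
# tasks that each draw one number, and collect the results in order.
function simulate(seed)
    Random.seed!(seed)
    tasks = [Threads.@spawn rand() for _ in 1:4]
    return fetch.(tasks)
end

first_run  = simulate(1234)
second_run = simulate(1234)

# Same seed and same task structure: the same numbers, however the
# scheduler happened to place the tasks on threads.
@assert first_run == second_run
```

Running this with different thread counts (for example, by starting Julia with a different -t option) should not change the values either, since each task's generator is derived from its parent task rather than from the thread it runs on.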

That brings us to one of the most significant advances arriving with the new version. Previously, in a multi-threaded computation, once a task was assigned to a thread, it was stuck on that thread forever. Julia version 1.7 introduces task migration. Now the scheduler can move tasks among all available threads, potentially helping with load balancing and parallel efficiency. In a followup article we'll take a detailed look at this in the context of a brief tutorial on parallel computation in Julia.

Conclusion

The number of new features and improvements in the upcoming version of Julia is impressive, especially coming just six months after version 1.6. Development seems to have entered a mature phase with steady incremental but substantial improvements in performance, consistency, and programmer convenience.

Julia's position as a major platform for science and engineering computation seems secure, with inroads into a wide variety of disciplines. It's also a free-software success story, with every layer—the language, its 5,000 public packages, and the LLVM compiler it's built on—developed in the open, with contributions welcome from everyone. Julia's package manager, a subsystem integrated into the language and the REPL, eases the process of making those contributions, and allows the end user to keep the 5,000 packages straight. In a followup article we will look at how that system works from the points of view of the developer and the user.


Page editor: Jonathan Corbet


Copyright © 2021, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds