LWN.net Weekly Edition for June 29, 2023
Welcome to the LWN.net Weekly Edition for June 29, 2023
This edition contains the following feature content:
- Delegating privilege with BPF tokens: another attempt at making BPF functionality available to less-privileged users.
- Ongoing LSFMM+BPF 2023 coverage:
- Removing the kthread freezer: pushing forward the removal of this long unloved API.
- Converting filesystems to iomap: a discussion on the iomap documentation, especially as it relates to converting existing filesystems to use it.
- Development statistics for 6.4: where the code in the 6.4 kernel release came from.
- JupyterLab 4.0: a development environment for education and research: a look at the new features in this major JupyterLab release.
- Reports from OSPM 2023, part 3: the final set of reports from the 2023 Conference on Power Management and Scheduling in the Linux Kernel.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Delegating privilege with BPF tokens
The quest to enable limited use of BPF features in unprivileged processes continues. In the previous episode, an attempt to use authoritative Linux security module (LSM) hooks for this purpose was strongly rejected by the LSM developers. BPF developer Andrii Nakryiko has now returned with a new mechanism based on a privilege-conveying token. That approach, too, has run into some resistance, but a solution for the strongest concerns might be in sight.

Nakryiko (and his employer) would like the ability to allow a process to carry out a limited set of BPF operations without needing to hold any special capabilities. Currently, most BPF operations require (at least) the CAP_BPF capability, so code that needs to use BPF functionality must be run with privilege that often goes beyond what is actually needed. The security module implemented in Nakryiko's previous attempt could have been used to allow specific operations as controlled by the security policy, but this module required authoritative hooks (security hooks that grant access that would otherwise be denied); such hooks are not allowed in the kernel. Thus, necessarily, the new approach takes a different tack.
In early June, Nakryiko posted a patch set implementing the concept of a "BPF token" that can be used to convey limited, BPF-related capabilities from one process to another. A privileged supervisor process can use a new command to the bpf() system call, BPF_TOKEN_CREATE, to create a token, which is returned in the form of a special file descriptor. The creator specifies the operations that the token is meant to enable; these include creating maps (with control over which types of maps can be created), loading BPF type format (BTF) data, loading programs, and creating more tokens.
There is a flag that causes the kernel to ignore any abilities requested that do not actually exist; its purpose is to ease the task of writing code that works across multiple kernel versions, some of which may not support all operations. This option can also be used to create a token that is valid for any supported operation — even those that do not exist when the code is written.
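As a concrete (if speculative) illustration, a supervisor's token-creation call might look something like the sketch below. The command number, flag, and attribute layout are assumptions drawn from the posted patches, which had not been merged when this was written; none of these names should be treated as a stable interface.

```c
/* Hypothetical sketch of the proposed flow: BPF_TOKEN_CREATE and the
 * attribute layout are assumptions based on the patch posting, not part of
 * any released kernel. */
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define BPF_TOKEN_CREATE 36            /* hypothetical command number */

struct bpf_token_create_attr {         /* hypothetical attribute layout */
	unsigned int flags;            /* e.g. "ignore unknown abilities" */
	unsigned long long allowed_cmds;
	unsigned long long allowed_map_types;
};

int main(void)
{
	struct bpf_token_create_attr attr;
	int token_fd;

	memset(&attr, 0, sizeof(attr));
	attr.allowed_cmds = 1ULL << 0;  /* purely illustrative bit */

	/* On a kernel carrying the patches, this would return a token file
	 * descriptor; elsewhere it fails with EINVAL. */
	token_fd = syscall(__NR_bpf, BPF_TOKEN_CREATE, &attr, sizeof(attr));
	if (token_fd < 0) {
		perror("BPF_TOKEN_CREATE (expected to fail on unpatched kernels)");
		return 1;
	}
	printf("got BPF token fd %d\n", token_fd);
	/* The fd can now be pinned or handed to a workload via SCM_RIGHTS. */
	return 0;
}
```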
Once created, a token can be passed to another process with the usual SCM_RIGHTS mechanism. It is also possible to "pin" a token into the BPF filesystem, making it available to any process that is able to access that filesystem. Pinning can be a way to inject a BPF token into a running container, for example. Since the BPF filesystem is namespace-aware, pinning a token into a specific container's filesystem does not make that token globally visible.
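There is nothing token-specific about the SCM_RIGHTS side; a token descriptor travels like any other file descriptor over a Unix-domain socket. A minimal sender, for reference:

```c
/* Send a file descriptor (a BPF token, say) to another process over a
 * Unix-domain socket; SCM_RIGHTS passing is entirely standard and is not
 * changed by the token work. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

int send_fd(int sock, int fd)
{
	char dummy = 'x';
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;      /* "this message carries fds" */
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0);     /* receiver gets a duplicate fd */
}
```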
Most bpf() calls use a command-specific structure in the sprawling bpf_attr union. When token support is added to a specific command, that command's structure gains a new integer field where the caller can place their token. If a token is present and grants the ability to carry out the requested operation, the request will proceed regardless of whether the calling process has the needed capabilities. As is the case with BPF generally, a value of zero indicates "no file descriptor" (and thus no token), so file descriptor zero cannot be used to represent a BPF token.
The first posting of this work drew a response from security developer Casey Schaufler, who was unenthusiastic:
Token based privilege has a number of well understood weaknesses, none of which I see addressed here. I also have a real problem with the notion of "trusted unprivileged" where trust is established by a user space application. Ignoring the possibility of malicious code for the moment, the opportunity for accidental privilege leakage is huge.
Later, in response to a request from Nakryiko, Schaufler described some of the weaknesses he was talking about; most of them involved a token leaking out of its intended container and being abused by an attacker. Nakryiko responded that this mechanism was intended to be used in high-trust environments where the attacker shouldn't exist, but Schaufler said that was inadequate, and that the security mechanism had to ensure that it could not be abused in that way.
Undeterred, Nakryiko posted a new version of the patch set a few days later with only minimal changes. This time, it was Toke Høiland-Jørgensen who raised concerns about this approach:
I am not convinced that this token-based approach is a good way to solve this: having the delegation mechanism be one where you can basically only grant a perpetual delegation with no way to retract it, no way to check what exactly it's being used for, and that is transitive (can be passed on to others with no restrictions) seems like a recipe for disaster.
He went on to suggest the creation of a privileged process that could receive BPF requests via remote procedure calls and apply whatever policy made sense before executing them. Nakryiko responded that this design would not work well in practice — an answer that was echoed by Hao Luo, who described Google's experience with that pattern.
Djalal Harouni also expressed concerns that tokens could leak between containers, and suggested that a BPF token should be an attribute of a specific BPF filesystem instance. That, he said, would help to attach the token to a specific namespace, preventing leakage and matching how other credentials are handled. Christian Brauner agreed with that suggestion.
In response, Nakryiko acknowledged the concern and the suggested solution:
The main worry is that BPF token, once issued, could be illegally/uncontrollably passed outside of container, intentionally or not. And by having this association with mount namespace (through BPF FS) we automatically limit the sharing to only contain that has access to that BPF FS.
He suggested a slightly different implementation, though, based on his desire to allow a namespace to have more than one token: the creation of a BPF token could include a file descriptor identifying a BPF filesystem instance. The resulting token could only be pinned into that specific filesystem instance, and would be prevented, somehow, from leaving the mount namespace where that filesystem instance exists; "specific details to be worked out".
Version 3 of the patch set, posted on June 22, implemented a step in this direction. In this version, creating a token and pinning it into a BPF filesystem are done in a single operation, and it is no longer possible to pin a token after creation. That will keep tokens from being pinned outside of the intended context, but does not address the possibility that a token could be deliberately leaked via SCM_RIGHTS. So Nakryiko's objective that a BPF token "cannot leave the boundaries of that mount namespace" has not yet been fully achieved.
Whether that change is enough to address the concerns that have been expressed remains to be seen. Then we will have to see whether the more security-oriented developers in the community are willing to accept a token-based mechanism in general. If not, it would probably be a good time for them to suggest a workable alternative. Should no such problems arise, though, BPF tokens may make an appearance in a near-future kernel release.
Removing the kthread freezer
The final day of the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit featured three separate sessions led by Luis Chamberlain (he also led a plenary on day two); the first of those was a filesystem session on the status of the kthread-freezer-removal effort. The kthread freezer is meant to help filesystems freeze their state in order to suspend or hibernate the system, but since at least 2015, the freezer has been targeted for removal. Things did not change much a year later, nor by LSFMM in 2018 when Chamberlain had picked up Jiri Kosina's removal effort; this year, Chamberlain was back to try to push things along.
It may come as a surprise to some that freezing filesystems in preparation for suspending the system has been broken in Linux for years, he began. There is no unified mechanism to freeze filesystems and if there is a lot of I/O going on, it can lead to a system hang when resuming, which is not quite what users are looking for.
The problem comes about because the kthread-freezer API, which was added to help stop in-flight I/O during suspend operations, has sloppy semantics and is used somewhat haphazardly. The control group (cgroup) freezer was broken in the kernel the last time the kthread-freezer topic was discussed at LSFMM, but he wondered if that was still the case. It has been fixed, Aleksa Sarai said, which required a new cgroup filesystem, another attendee added. There were also problems with the freezing process racing with the automounter, Chamberlain said, but no one in the room seemed to know about the status of that; "I guess we'll have to keep that in mind".
![Luis Chamberlain](https://static.lwn.net/images/2023/lsfmb-chamberlain2-sm.png)
There are some ordering problems that will need to be resolved, which eventually may require building a directed acyclic graph (DAG) of the filesystem superblocks so that freezing and thawing can be done in the right order. He said that Al Viro has lamented the fact that he implemented the LOOP_CHANGE_FD ioctl() command so that Fedora live installations could jump directly to the newly installed filesystem; that breaks the expected ordering when iterating through superblocks, so suspending those systems may be broken. RAID can also introduce ordering oddities. The assumption is that the ordering is consistent when iterating forward and backward over the superblocks; it likely holds for most users on laptops and mobile devices, who are the ones that predominantly do suspends in any case.
Chamberlain wondered if there was a need for a mechanism to notify user-space applications that a suspend was coming in order to give them some time to quiesce. Ted Ts'o said that kind of notification exists in Windows, but the applications need to be given some amount of time to actually quiesce; if that process does not complete, the suspend needs to go on without them. Implementing the notification is not hard, "that's just plumbing" using D-Bus or something similar.
Handling network block devices is another problem area that was identified eight years ago, Ts'o said; "everyone said 'yeah, that's hard' and they all backed away slowly". David Howells noted that FUSE filesystems add complexity to the problem as well, since there are both kernel and user-space pieces that have to be frozen. Amir Goldstein pointed out that the checkpoint/restore developers have already been dealing with these kinds of complexities, which might serve as a model.
Lennart Poettering said that there is already a bunch of infrastructure in systemd for doing the user-space notification. If applications are interested in getting the notification, they can get it from systemd, which will give them a few seconds to react if needed. He noted that the suspend-then-hibernate sequence, which hibernates the system after a period of time in suspend mode, currently wakes up all of user space for a brief time before the hibernate, which is "just stupid". So there is work underway to leave all of user space frozen, using the cgroup freezer, except for the small piece that oversees the switch to hibernating. Jan Kara said that the kernel will still have to unfreeze the filesystems so that the overseer process can check the battery status and the like.
Chamberlain said that it sounded like the user-space side of the problem was largely solved at this point. He wanted to talk about what's next after the kthread-freezer calls get removed from the filesystems. That removal is done using Coccinelle semantic patches. His most recent patch is for the core of the automatic kernel freeze and resume code that will replace the kthread freezer API; the previous RFC patch set from January has the removal for more than a dozen filesystems using the Coccinelle rules.
He wondered if it makes sense to go ahead and remove the use of the API in other parts of the kernel. The API was added to allow filesystems to stop I/O in flight, he said, so it is probably being used incorrectly elsewhere. Jeff Layton said that the API is being used in NFS, and he is not convinced that is being done correctly, so he would like help removing the kthread freezer from there. Sarai said that cgroup v1 still uses the kthread freezer and he does not know why it was not changed to match cgroup v2; there will need to be a discussion about that before the API can be completely removed. Howells noted that all of the network filesystems will have some of the same problems that Layton is concerned about. Chamberlain wrapped things up by saying that the removal can be done incrementally, working through filesystems and subsystems one by one.
Note that the video for this session is mislabeled with the name of the Chamberlain-led iomap-conversion-status session, which took place right after. As might be guessed, the video for that session is titled "Removal of kthread freezer next steps".
Converting filesystems to iomap
A discussion that largely centered around the documentation of iomap, which provides a block-mapping interface for modern filesystems, was led by Luis Chamberlain at the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit. There is an ongoing process of converting filesystems to use iomap, in order to leave buffer heads behind and to better support folios, so the intent was to get feedback on the documentation from developers who are working on those conversions. One of the concrete outcomes of the session was a plan to move that documentation from its current location on the KernelNewbies wiki into the kernel documentation.
Hannes Reinecke said that the lack of clear units in the iomap documentation confused him; were things specified in bytes, sectors, pages, or something else? In addition, there are many different operation-function pointers, in three different struct *_ops, that need to be provided by a filesystem, but it was not clear to him what each of them was meant to do. Chamberlain said that it had also confused him when he started looking at iomap, but the basic idea is that there are lots of different types of operations, many with flags or options of various sorts, so the myriad of ops are just meant to split those out into their own separate functions. The alternative is a single function with lots of complexity to handle all of the different possibilities. Reinecke said that he was fine with having all of those functions, but that the documentation did not (yet) explain what all the operations were for.
The documentation tries to explain what is needed to convert a filesystem to use iomap, Chamberlain said. There are sections for direct I/O, buffered I/O, file-extent mapping, and so on. Iomap provides an iterator for ranges; it tries to replace the existing block-range operations. As was discussed in the earlier buffer-head session, though, there are no helpers for metadata operations in iomap. Filesystems have to implement their own metadata handling, as XFS does, or continue to use buffer heads for that. Adding helpers to iomap is possible, but may not be all that useful because the filesystems that have their own metadata operations are not likely to want to switch to something new, he said.
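For readers unfamiliar with the interface, its heart is struct iomap_ops, whose iomap_begin() callback asks the filesystem to map a byte range and describe the result. The following simplified sketch shows the shape of it; the prototypes follow include/linux/iomap.h as of 6.4, but the body is hypothetical ("myfs" pretends its file blocks map 1:1 to disk blocks) and is not taken from any real filesystem.

```c
/* Simplified sketch of the central iomap mapping callback; all units are
 * bytes. The body is hypothetical, not from any real filesystem. */
#include <linux/iomap.h>

static int myfs_iomap_begin(struct inode *inode, loff_t pos, loff_t length,
			    unsigned flags, struct iomap *iomap,
			    struct iomap *srcmap)
{
	unsigned int blkbits = inode->i_blkbits;

	/* A real filesystem would look up the extent covering "pos" in its
	 * metadata, reporting IOMAP_HOLE when nothing is mapped; here we
	 * pretend everything maps 1:1 to disk. */
	iomap->type = IOMAP_MAPPED;
	iomap->offset = (pos >> blkbits) << blkbits;  /* round down to block */
	iomap->addr = iomap->offset;                  /* fake disk address */
	iomap->length = 1 << blkbits;                 /* one block at a time */
	iomap->bdev = inode->i_sb->s_bdev;
	return 0;
}

static const struct iomap_ops myfs_iomap_ops = {
	.iomap_begin	= myfs_iomap_begin,
	/* .iomap_end is optional and often unneeded for simple filesystems */
};
```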
Reinecke summarized the current thinking on iomap; it is the interface that new filesystems should be using and, as discussed in the large-block-size session, it will be the only way for filesystems to support block sizes larger than the page size. He noted that the patch set allowing buffer heads to be configured out of the kernel may not really be useful, though, because UEFI systems need a VFAT filesystem, which currently requires buffer heads. He has patches to convert VFAT to iomap, which are partially working at this point, so that problem may go away in time.
The suggested order for reworking filesystems in the iomap documentation should be switched, Reinecke said; it currently has direct I/O as the first thing to convert, but he thought it should be left for last. Josef Bacik said that Btrfs has been doing the conversion and it started with direct I/O, because changing the buffered-I/O path requires reworking a lot more code; he thinks that the direct-I/O conversion is more straightforward for filesystems to tackle first.
Ted Ts'o cautioned that doing conversions on the simpler filesystems first may not be the right path either. Iomap is missing some of the necessary infrastructure to make the process less painful; metadata reads and writes are a prime example of that. In addition, many of the simple filesystems do not support direct I/O at all, so they cannot start there; meanwhile they do need the ability to read and write metadata, so asking them to convert right now is likely to result in developers who "run away screaming".
Jan Kara said that there are two facets to the iomap conversion: handling the data path with iomap, which is ready to be done now, and removing buffer heads, which is a separate question that requires a "sane story for them". It is important to recognize that filesystems cannot be forced to fully convert to iomap, Reinecke said; that is the eventual goal, but it may never be reached. Kara said that he had patches queued that convert the ext2 direct-I/O data path to iomap; those patches also include some VFS changes that will make that conversion easier for the simpler filesystems. The more complex filesystems, such as ext4, Btrfs, and XFS, do not need those changes because they already have internal helpers. He is working with Ritesh Harjani on converting the rest of the ext2 data path to iomap.
The next step would be to convert the metadata handling, but there are not good answers for that yet. Reinecke said he had been working with others to provide helpers that will allow filesystems to request data transfers in sizes smaller than a page, and get back a folio and offset into it where the data is located; for his purposes, it does not really matter if 512 bytes or a whole page is read as long as he knows where to get the data he is interested in. Then, the sub-page write piece needs to be worked out; once those pieces are in place, the conversion of the metadata paths can be tackled.
Harjani came in over the remote link to talk about the work he has been doing on the buffered-I/O path for ext2. There are some open problems, one of which is being addressed by a patch series under review for sub-page dirty tracking. Another issue is that the BH_Boundary flag is currently used for filesystems, like ext2, that can have indirect blocks that are discontinuous; if the BIO covering the range gets rearranged, it can lead to a sub-optimal data-access pattern. The flag is not supported in iomap, but probably needs to be the next piece addressed after the dirty tracking.
Ts'o said that the issue really only affected filesystems that use V7-Unix-style indirect blocks, which VFAT, for example, does not use; modern filesystems use extent mapping instead. So this may be an example of something that iomap may want to support for better performance for those older filesystems like ext2, minix, and UFS, but it may be decided that the performance without adding the feature is good enough and "we'll live with a performance hit on those older filesystems".
This is another reason that the documentation should make it clear that iomap is still being developed; the interfaces that eventually shake out may be different than what is there today. The documentation may change over time because people are working to make it easier for filesystems to use iomap, but that is still under construction. "We shouldn't promise that it is going to be easy, because it is not easy ... yet."
Support for the older filesystems is generally only needed to be able to access the filesystems, Reinecke said; there is no real need to ensure that they are particularly fast. For things like VFAT or the ISO CD-ROM filesystem, slowing them down slightly will not really be noticed; they were slow to begin with, after all. So he suggested not spending a lot of time making things faster for those cases; "if you care, write a different filesystem".
Chamberlain noted that Kara had mentioned the Linux Test Project (LTP) test suite as one that is good to use for testing these kinds of changes, but wondered if there were others. Kara said that there are direct-I/O tests in fstests that can also be used.
There has been a lot of work done by Goldwyn Rodrigues on locking, Chamberlain said, that needs to get into the kernel so that Btrfs can convert more than just the direct-I/O path to iomap. Rodrigues came in remotely to say that the worst part of the problems he has been tackling, extent locking within the page lock, is mostly working at this point, though there are "a couple of hackish patches". He is hoping to get the patch set out for review soon, which will presumably lead to some better ideas for the hacks.
Kara reported to Harjani that he had just poked around in the code and did not think the BH_Boundary flag will be much of a problem. It is meant to tell the block layer that the filesystem needs to submit the read before it has the information to submit further reads, but iomap simply returns each contiguous extent, so it implicitly handles that case. Kara said that Harjani can ignore the boundary-handling issue and "it will be mostly fine".
Chamberlain closed the session by asking attendees, particularly those who are working on converting filesystems to use iomap, to review the documentation on the wiki. He outlined how to get edit rights there and suggested that developers simply reflect their comments in the wiki text itself; in a kernel release or two, it could be submitted to the mainline. An attendee said that "sooner is better", and others agreed, so Chamberlain said that he would simply post it to the mailing list for review.
Development statistics for 6.4
The 6.4 kernel was released on June 25 after a nine-week development cycle. By that point, 14,835 non-merge changesets had been pulled into the mainline kernel, a slight increase from 6.3 (14,424 changesets) but still lower than many other development cycles. As usual, LWN has taken a look at those changesets, who contributed them, and what the most active developers were up to.

The work in 6.4 was contributed by 1,980 developers, 282 of whom made their first kernel contribution during this development cycle. The most active 6.4 developers were:
Most active 6.4 developers
By changesets:

  Uwe Kleine-König             781   5.3%
  Krzysztof Kozlowski          499   3.4%
  Rob Herring                  200   1.3%
  Ian Rogers                   200   1.3%
  Konrad Dybcio                146   1.0%
  Thomas Zimmermann            132   0.9%
  AngeloGioacchino Del Regno   126   0.8%
  Hans de Goede                121   0.8%
  Christoph Hellwig            118   0.8%
  Ville Syrjälä                116   0.8%
  Tom Rix                      115   0.8%
  Nick Alcock                  112   0.8%
  Johannes Berg                111   0.7%
  Darrick J. Wong              111   0.7%
  Philipp Hortmann             101   0.7%
  Geert Uytterhoeven            88   0.6%
  Greg Kroah-Hartman            87   0.6%
  Manivannan Sadhasivam         85   0.6%
  Eric Dumazet                  84   0.6%
  Bart Van Assche               83   0.6%

By changed lines:

  Ian Rogers                  167443  17.4%
  Hawking Zhang               123915  12.8%
  Eduard Zingerman             25322   2.6%
  Laurent Pinchart             17210   1.8%
  Ping-Ke Shih                 16062   1.7%
  Darrick J. Wong              11027   1.1%
  Uwe Kleine-König             10159   1.1%
  Benjamin Tissoires            8623   0.9%
  Konrad Dybcio                 8421   0.9%
  Jani Nikula                   7982   0.8%
  Jiri Slaby                    7645   0.8%
  AngeloGioacchino Del Regno    7353   0.8%
  Krzysztof Kozlowski           7285   0.8%
  Hans de Goede                 7068   0.7%
  Paul Gortmaker                7011   0.7%
  Tony Nguyen                   6834   0.7%
  Jeffrey Hugo                  6718   0.7%
  Wolfram Sang                  6665   0.7%
  Devi Priya                    6036   0.6%
  Qu Wenruo                     5617   0.6%
The 6.3 merge window included a patch from Uwe Kleine-König adding a new function pointer to struct platform_driver. Noting that the driver core ignores the return value from the remove() function, he decided to make that function return void instead. There are, however, many drivers defining that function — more than could be changed at that time. So, rather than changing remove(), he added remove_new(), which behaves in the same way with the exception that it returns void; that made it possible to convert drivers at leisure.
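The shape of such a conversion is simple. A schematic example with a made-up driver (not one of the actual patches) might look like this:

```c
#include <linux/platform_device.h>

/* Hypothetical driver, for illustration only. */
static void foo_teardown(struct platform_device *pdev) { /* ... */ }

/* Before: the driver core ignores this return value entirely. */
static int foo_remove(struct platform_device *pdev)
{
	foo_teardown(pdev);
	return 0;   /* returning an error here would change nothing */
}

/* After: the same logic, moved to the void-returning callback added in
 * 6.3; only the assignment in the driver structure changes along with
 * it. */
static void foo_remove_new(struct platform_device *pdev)
{
	foo_teardown(pdev);
}

static struct platform_driver foo_driver = {
	.driver		= { .name = "foo" },
	.remove_new	= foo_remove_new,   /* was: .remove = foo_remove */
};
```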
"At leisure" may not describe what happened in 6.4, though, where Kleine-König contributed 781 changesets, almost all of them converting drivers to remove_new(). It's worth noting that we are likely to see a repeat of this performance; the plan calls for renaming remove_new() back to remove() (and updating all the drivers again) once the conversion is complete. Krzysztof Kozlowski, whose work (mostly in the devicetree subtree) would have normally put him easily into the top position, came in a distant second this time around. Ian Rogers made a number of enhancements to the perf tool, Rob Herring worked mostly on devicetree improvements, and Konrad Dybcio hacked on various system-on-chip drivers and devicetree files.
Rogers also made it to the top of the "lines changed" column by contributing updated event definitions for perf. Hawking Zhang added the obligatory set of amdgpu register definitions. Eduard Zingerman reworked many of the BPF self-tests, adding a lot of inline assembly code; this patch is a typical example. Laurent Pinchart deleted a number of unused camera-sensor drivers while adding the i.MX8 ISI driver, and Ping-Ke Shih added a set of static tables to the rtw89 WiFi driver.
The top testers and reviewers this time around were:
Test and review credits in 6.4
Tested-by:

  Daniel Wheeler              159  12.9%
  Chen-Yu Tsai                 61   5.0%
  Marek Szyprowski             36   2.9%
  Abhinav Kumar                27   2.2%
  Sachin Sant                  21   1.7%
  Joel Fernandes               20   1.6%
  Zqiang                       18   1.5%
  Tommaso Merciai              17   1.4%
  Philipp Hortmann             16   1.3%
  Tony Zhu                     16   1.3%
  Arnaldo Carvalho de Melo     15   1.2%
  Marek Szlosek                15   1.2%

Reviewed-by:

  Simon Horman                327   3.9%
  Konrad Dybcio               208   2.5%
  Krzysztof Kozlowski         197   2.4%
  AngeloGioacchino Del Regno  151   1.8%
  David Sterba                134   1.6%
  Rob Herring                 127   1.5%
  Chen-Yu Tsai                118   1.4%
  Dmitry Baryshkov            116   1.4%
  Dave Chinner                115   1.4%
  Hans de Goede               113   1.4%
  Christoph Hellwig           112   1.3%
  Geert Uytterhoeven          104   1.3%
For 6.4, 1,064 commits (7% of the total) carried Tested-by tags, while 6,392 (43%) had Reviewed-by tags. That is a significant drop since 6.3 for both types of tags. The Tested-by tags, in particular, clearly do not reflect the actual testing activity that is taking place in the kernel community.
A total of 230 employers (that could be identified) supported work on 6.4, a slight increase from 6.3. The most active employers were:
Most active 6.4 employers
By changesets:

  Intel                  1542  10.4%
  Linaro                 1505  10.1%
  Google                 1137   7.7%
  (Unknown)              1086   7.3%
  Red Hat                 881   5.9%
  Pengutronix             826   5.6%
  AMD                     624   4.2%
  (None)                  582   3.9%
  Meta                    467   3.1%
  Oracle                  384   2.6%
  NVIDIA                  381   2.6%
  SUSE                    355   2.4%
  IBM                     349   2.4%
  Qualcomm                300   2.0%
  (Consultant)            262   1.8%
  Collabora               236   1.6%
  Renesas Electronics     224   1.5%
  Huawei Technologies     221   1.5%
  NXP Semiconductors      195   1.3%
  Microsoft               183   1.2%

By lines changed:

  Google               191963  19.9%
  AMD                  156235  16.2%
  Intel                 73179   7.6%
  (Unknown)             67648   7.0%
  Linaro                42260   4.4%
  Red Hat               39293   4.1%
  Qualcomm              28096   2.9%
  (None)                23696   2.5%
  (Consultant)          23488   2.4%
  SUSE                  23393   2.4%
  Meta                  20978   2.2%
  Realtek               20830   2.2%
  NVIDIA                19050   2.0%
  Oracle                16942   1.8%
  IBM                   15768   1.6%
  Renesas Electronics   14519   1.5%
  Pengutronix           11598   1.2%
  MediaTek              10956   1.1%
  Collabora             10066   1.0%
  Microsoft              9824   1.0%
As usual, there are not a lot of surprises to be found in these results.
"Lines changed" is, like commit counts, a poor proxy for software productivity, but it's hard to find a better one. So, your editor has concluded, one might as well just go nuts with the "lines changed" metric over the long term. With the use of git blame and a certain amount of CPU time, it is possible to look at who is "blamed" for every line in the kernel source — who is the last developer to have touched it, in other words.
Running this analysis on the 6.4-rc7 kernel turns up 22,612 developers who have touched at least one line — 2,135 of whom have touched exactly one line. The developers who have left the biggest imprint on the 6.4 kernel are:
  Developer                  Lines     Pct
  Linus Torvalds           2159025    5.9%
  Alex Deucher             1177105    3.2%
  Hawking Zhang             840838    2.3%
  Huang Rui                 479002    1.3%
  Mauro Carvalho Chehab     417086    1.1%
  Aurabindo Pillai          383629    1.1%
  Oded Gabbay               292611    0.8%
  Ian Rogers                271905    0.7%
  Leo Li                    228680    0.6%
  Bhawanpreet Lakha         206275    0.6%
  Qingqing Zhuo             198516    0.5%
  Aaron Liu                 193174    0.5%
  Ping-Ke Shih              184453    0.5%
  Larry Finger              172346    0.5%
  Ben Skeggs                170190    0.5%
  Roman Li                  164743    0.5%
  Mark Brown                158765    0.4%
  David Howells             158530    0.4%
  Hans de Goede             157719    0.4%
  Laurent Pinchart          145900    0.4%
  James Smart               145427    0.4%
  Kalle Valo                144362    0.4%
  Hans Verkuil              142093    0.4%
  Johannes Berg             139245    0.4%
  Takashi Iwai              131543    0.4%
  Feifei Xu                 127510    0.4%
  Christoph Hellwig         127129    0.3%
  Thierry Reding            118462    0.3%
  Linus Walleij             115289    0.3%
  David S. Miller            97207    0.3%
Torvalds does not write a lot of kernel code these days, and hasn't for some time; his position at the top of the list is the enduring legacy of the initial Git commit of the 2.6.12 kernel in 2005. Many of the other developers on that list — Alex Deucher, Hawking Zhang, Huang Rui, Aurabindo Pillai, Leo Li, Qingqing Zhuo, Aaron Liu, and Roman Li — are there primarily as the result of having contributed amdgpu header files, though Deucher's work is quite a bit broader than that. Mauro Carvalho Chehab has left a big footprint as the result of many years of intensive work in the media subsystem and the conversion of much kernel documentation to the RST format.
A few of the developers on the above list are there as the result of consistent kernel work over a period of decades, but that clearly is not the main variable being measured here. Those wanting to see more can view the top 2,000 results separately. Note that no attempt has been made to join multiple entries resulting from name changes, typos, or mailing-list mangling.
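For the curious, the mechanics of such a tally are straightforward. The sketch below counts lines per author for a single file using git blame --line-porcelain; the real analysis iterates over the whole tree and aggregates the results, so this little program is an illustration of the approach rather than the tool actually used.

```c
/* Count lines per author for one file via "git blame --line-porcelain";
 * summing this over every file in the tree (and a lot of CPU time) yields
 * a table like the one above. Illustration only. */
#include <stdio.h>
#include <string.h>

#define MAX_AUTHORS 4096

static struct { char name[128]; long lines; } tally[MAX_AUTHORS];
static int nauthors;

int main(int argc, char **argv)
{
	char cmd[512], line[1024];
	FILE *p;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	snprintf(cmd, sizeof(cmd),
		 "git blame --line-porcelain HEAD -- '%s'", argv[1]);
	p = popen(cmd, "r");
	if (!p) {
		perror("popen");
		return 1;
	}
	while (fgets(line, sizeof(line), p)) {
		int i;

		/* Each source line's metadata includes an "author " record. */
		if (strncmp(line, "author ", 7) != 0)
			continue;
		line[strcspn(line, "\n")] = '\0';
		for (i = 0; i < nauthors; i++)
			if (strcmp(tally[i].name, line + 7) == 0)
				break;
		if (i == nauthors && nauthors < MAX_AUTHORS)
			snprintf(tally[nauthors++].name,
				 sizeof(tally[0].name), "%s", line + 7);
		if (i < nauthors)
			tally[i].lines++;
	}
	pclose(p);

	for (int i = 0; i < nauthors; i++)
		printf("%8ld  %s\n", tally[i].lines, tally[i].name);
	return 0;
}
```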
As of this writing, there are about 11,330 changesets waiting in linux-next for the 6.5 merge window. LWN will, of course, be watching as that work pours into the mainline and yet another development cycle runs its course. The kernel development community continues to run at full speed.
JupyterLab 4.0: a development environment for education and research
JupyterLab is a web-based development environment widely used by data scientists, engineers, and educators for data visualization, data analysis, prototyping, and interactive learning materials. The Jupyter community has recently announced the release of JupyterLab 4.0, introducing lots of new features and performance improvements to enhance its capabilities both in research and educational settings.
JupyterLab's umbrella project, Jupyter, focuses on creating free and open-source software for interactive computing across all programming languages, using the three-clause BSD license for all of its projects. Jupyter evolved from IPython, which is an interactive shell for Python that later added support for other interpreted languages. Jupyter's core concept is the computational notebook: a shareable document that combines computer code, plain language descriptions, data tables, visualizations, and even interactive controls like sliders for changing parameters.
LWN looked at JupyterLab's first beta release in 2018, but it has made quite a bit of progress since then. JupyterLab is a full-fledged web-based development environment to create these computational notebooks, which are organized into input and output cells. Users type Python code (or code in any of the other more than 40 supported programming languages) into an input cell. After the user presses shift-return, the code is evaluated and the output is displayed in an output cell below it.
Not only is this an excellent pedagogical tool for teachers, but JupyterLab is also indispensable for researchers when experimenting or building prototypes (see the screen shot below for an experiment analyzing signals from a brain-computer interface). Its popularity among data scientists is no coincidence: rather than re-running a Python script that processes a large data set every time the script is modified, JupyterLab allows users to iteratively develop a notebook with data loading in one cell and processing steps in other cells. Users can then fix bugs and add functionality to the processing cells and re-run only those cells without having to reload the data. Of course, this is also possible with the Python REPL, but without the clear cell-based approach.
Using JupyterLab
JupyterLab can be installed as a Python package (jupyterlab) through PyPI and conda-forge. The 4.0 release is also available from the official repositories in the upcoming Fedora 39 (package jupyterlab), in openSUSE Tumbleweed (package python-jupyterlab), and in Arch Linux (package jupyterlab). Debian and Ubuntu don't have the package in their official repositories. After installation, JupyterLab is started by typing jupyter lab at the command line. This opens a new tab or window in the default web browser, displaying the development environment.
At the left of the JupyterLab window, a file browser displays the contents of the current directory, while the panel on the right shows the "launcher". This is where the developer creates new notebooks, text files, or Markdown files. Additionally, the developer can open a console with a Python REPL or a shell from the launcher. Opening a new launcher adds a tab by default, but by dragging the tab's title bar, it can be converted into a new panel and arranged horizontally or vertically. That can be used for simultaneously viewing multiple notebooks or other files, such as CSV files with data used in the notebook's analysis (see the screen shot below with an analysis of the frequency of egg laying by my chickens).
Code editor and performance improvements
As the changelog for JupyterLab 4.0 reveals, the latest release includes significant improvements. The code editor that JupyterLab uses for its cells, CodeMirror, has been updated to version 6. Its most notable enhancement is better customization capabilities. For example, in previous versions, users had to modify settings separately for each type of cell, the file editor, and the console editor. Now, they can change their settings for all of these in one location and override some settings for specific cell types, such as hiding line numbers only for Markdown cells.
The new CodeMirror version also loads large notebooks more quickly. This is just one of the performance improvements featured in the latest JupyterLab release. For example, the upgrade from MathJax version 2 to 3 improves rendering times for mathematical equations. Other optimizations have been made to the CSS rules for JupyterLab's web interface to improve browser performance when many HTML elements are present on a page. JupyterLab 4.0 also introduces notebook windowing: when setting this feature in the Notebook part of the Settings panel to "full", JupyterLab only renders the parts of a notebook that are currently visible in the browser window; the other parts are only rendered when scrolling makes them visible. However, this setting is not enabled by default, as it may have side effects if some cell outputs are displaying HTML iframe elements. All of these performance improvements should be noticeable when working with larger notebooks.
Extensible architecture
JupyterLab is designed as an extensible environment. JupyterLab extensions customize or enhance JupyterLab, for example by providing a new theme, a file viewer or editor for specific file types, or a renderer for specific types of output cells in notebooks.
JupyterLab 3 introduced the ability to install extensions as Python packages via pip. Now JupyterLab 4.0 builds on this feature, making extensions more discoverable from its web interface. The Extension Manager (accessible in the left sidebar by clicking on the jigsaw icon) by default displays extensions from PyPI, at least for Python packages that have the Trove classifier "Framework :: Jupyter :: JupyterLab :: Extensions :: Prebuilt" in their package metadata. However, there is no check to ensure that the extension is compatible with the current JupyterLab version. Consequently, it is possible that an extension found in the Extension Manager will fail to run.
Extension developers need to be aware of numerous breaking changes in the API from JupyterLab 3.x to 4.0 and have to modify affected extensions accordingly. The extension system and packaging have also been changed in JupyterLab 4.0. Fortunately, there is an upgrade script that helps with this migration by creating the necessary files for packaging the extension and updating dependencies to package versions compatible with JupyterLab 4.0. The extension tutorial has also been updated.
Real-time collaboration
JupyterLab is not only a tool for individual development; recent versions have also improved the possibilities for real-time collaboration. This feature allows multiple developers to work on the same notebook simultaneously. When editing the same document, users can see the cursors from other users in the editor, and a side panel displays all connected users. Under the hood, this is based on Yjs, which is a JavaScript framework for shared editing.
Since not all users require real-time collaboration, JupyterLab 4.0 has separated this feature into a distinct package, jupyterlab-collaboration. Just like the main JupyterLab package, this can be installed from PyPI or conda-forge, and it can also be installed from the Extension Manager. After this, JupyterLab needs to be started with the --collaborative option to enable the collaborative mode. Moreover, JupyterLab's web server only listens for local connections by default, while collaborators will obviously need remote access. There are options to allow the server to listen for remote connections (e.g. --ip=0.0.0.0); when it starts up it will display a URL for remote access to the notebook, though this feature is not well documented (except in the help text). TLS can also be enabled by adding options for the certificate and key file on the command line.
Some flaws of Jupyter notebooks still remain
While Jupyter notebooks offer a convenient way to develop Python programs interactively, they have some drawbacks compared to traditional Python development, and JupyterLab 4.0 doesn't change this. To begin with, the cell-based approach can lead to less structured and more linear code. Additionally, notebooks are designed to be standalone, whereas in conventional Python development, functions and classes are often organized into separate modules according to their purpose. Although it's possible to write traditional Python modules and import them into a notebook, this forces the developer to switch between two modes of programming. Consequently, many developers simply adhere to the linear, cell-based structure of their notebook and copy and paste code they want to reuse from another notebook. This approach does not promote code reuse and modularity, making it challenging to maintain and understand Jupyter notebooks, especially in larger projects.
Another point of concern is that notebook cells can be executed out of order. This is convenient while iteratively developing code because it allows going back to a previously executed cell, altering its code to fix a bug, and executing it again before proceeding with another cell. However, this can lead to unexpected results and can make it difficult to understand the flow of the code if the developer forgets to run other cells that depend on the changed cells. This contrasts with traditional Python development, which involves executing code sequentially in a file or in the REPL, providing a clearer code flow.
When storing Jupyter notebooks into version control, another problem emerges. A notebook is saved in JSON format, combining its code and output (text as well as images) in a single file. This makes it difficult to track changes with version-control systems like Git. In particular, with non-deterministic tasks, such as those in machine learning, merely running the notebook can result in significant changes in the output cells, resulting in a large diff even if the code cells remain unchanged. Fortunately, a tool like nbdime can be used to display only the relevant differences in Jupyter notebooks. Another option is to always clear the output cells of a notebook before committing changes to Git.
Conclusion
JupyterLab 4.0 represents a significant step forward in the development of interactive notebooks for education and research. With its enhanced code editor, performance improvements, new Extension Manager, and expanded real-time collaboration capabilities, JupyterLab continues to be an invaluable tool for data scientists, engineers, and educators. Although the notebook concept still comes with its limitations, it's clear that JupyterLab offers a big productivity boost for many who are working with data. Give it a try for your next project.
Reports from OSPM 2023, part 3
The fifth conference on Power Management and Scheduling in the Linux Kernel (abbreviated "OSPM") was held on April 17 to 19 in Ancona, Italy. LWN was not there, unfortunately, but the attendees of the event have gotten together to write up summaries of the discussions that took place and LWN has the privilege of being able to publish them. Reports from the third and final day of the event appear below.
Proxy execution
Author: John Stultz (video)

The last day of OSPM began with a talk and discussion on proxy execution, a generalized form of priority inheritance. Proxy execution is an idea that has been worked on for a number of years by several core kernel contributors without getting much traction.
Android doesn't use realtime scheduling for applications, but it still needs to prioritize foreground applications over background applications. This is usually done via control-group mechanisms to restrict background tasks, allowing foreground applications to get more CPU time. Unfortunately, as is commonly seen with realtime priorities, Android devices frequently experience priority inversion. Low-priority background tasks may occasionally grab a mutex that an important foreground task requires, but the background task is restricted from running to release the lock.
The classic solution to priority inversion is priority inheritance via rt_mutexes, but this doesn't help, because the completely fair scheduler (CFS) doesn't choose which task to run based on strict priority order, as is done with realtime scheduling; the same problem appears in SCHED_DEADLINE as well, where deadlines instead of priorities would need to be inherited. So a more general form of priority inheritance is needed; that is what proxy execution provides.
The simple idea for proxy execution is that one can treat the scheduler's task-selection algorithm as a black box, and let it select the most important task at any time to run. But instead of just choosing from the currently runnable tasks, we want it to also consider mutex-blocked tasks when selecting a task to run. Then, should the task it selects be blocked on a mutex, we instead select the mutex owner to run as the selected task's "proxy", but run it with the entire scheduler context of the selected task.
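A toy model may make the idea more concrete. The following user-space sketch (purely illustrative, bearing no resemblance to the kernel code) separates the "scheduler context" used for selection from the "execution context" that actually runs:

```c
/* Toy illustration of proxy execution: selection considers blocked tasks,
 * but if the pick is blocked on a mutex, the mutex owner runs on the
 * picked task's behalf. A sketch of the concept, not the kernel code. */
#include <stdio.h>
#include <stddef.h>

struct task {
	const char *name;
	int weight;              /* stand-in for the scheduler context */
	struct task *blocked_on; /* owner of the mutex this task waits on */
};

/* Black-box "scheduler": pick the most important task, including ones
 * that are blocked on a mutex. */
static struct task *pick_task(struct task **tasks, int n)
{
	struct task *best = NULL;

	for (int i = 0; i < n; i++)
		if (!best || tasks[i]->weight > best->weight)
			best = tasks[i];
	return best;
}

int main(void)
{
	struct task bg = { "background", 1, NULL };
	struct task fg = { "foreground", 10, &bg }; /* waits on bg's lock */
	struct task *tasks[] = { &fg, &bg };
	struct task *pick = pick_task(tasks, 2);
	struct task *exec = pick;

	/* Follow the blocked_on chain: the lock owner executes, but it is
	 * charged to (and scheduled with) the picked task's context. */
	while (exec->blocked_on)
		exec = exec->blocked_on;

	printf("scheduler context: %s, execution context: %s\n",
	       pick->name, exec->name);
	return 0;
}
```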
This elegant and simple idea, unfortunately, has a number of edge cases that needed to be resolved for it to work. As the complexity grows, this idea of simply running a lock-holding task on behalf of a blocked task starts to break down. It begins to make sense that implementing this idea has stymied previous developers, and one might wonder if it's worth it.
However, the reason the Android team is continuing to push this effort is that the early results seen with it are favorable: it avoids the long-tail outlier latencies caused by priority inversion, giving more deterministic behavior for foreground applications.
To spur discussion, I covered a number of risks and issues, some recently addressed and others still to be resolved, as well as some half-formed ideas for avoiding some of the complex migrations.
Peter Zijlstra mentioned that work on split reservations in SCHED_DEADLINE, as well as on CFS deadline servers, parallels the splitting of the kernel's task data into separate scheduler and execution contexts for proxy execution, so there is a potential for sharing some logic and cleanup there. On the issue of pick_next_task() having side effects, he observed that the task returned is the one that runs, so the side effects don't matter. Zijlstra noted that similar changes are needed for core scheduling to separate pick_next_task() and set_next_task().
Joel Fernandes and Steven Rostedt also contributed ideas to the discussion on how migrations might be avoided, as well as how task-tree migrations might be amortized so that the cost is not paid all at once.
Additionally, for the concern about optimistic spinning being disabled and impacting performance, Zijlstra suggested that we might be able to avoid the complexity of blocked-task migration in the case where the lock holder is actively running on a CPU, restoring optimistic spinning in that case. Zijlstra also pushed to get actual numbers on how costly the additional migration and selection retries are, so that we have a real sense of the concern rather than just a theoretical one. This seemed to align with some of the SCHED_DEADLINE bandwidth-inheritance efforts as well. Dario Faggioli pointed out that the spinning was actually necessary for some of the EDF research algorithms, causing the CPU time of the waiting task to be consumed so that later deadlines aren't affected. The implications of this were unclear to me, so follow-up discussions were planned for a later time.
(See also: this article on proxy execution.)
Sched_ext: Pluggable scheduling with BPF
Author: David Vernet (video)

Sched_ext is a new extensible scheduling class that allows system-wide scheduling policies to be implemented in BPF. The talk began by introducing sched_ext at a high level, and enumerating the reasons that it was built. These reasons include rapid and safe experimentation, bespoke scheduling policies for individual applications, enabling quick rollouts of Spectre mitigations (such as for L1TF, where the CPU scheduler can help), and enabling some complexity to be moved to user space. For Meta, sched_ext has allowed experiments with different features, some of which will soon be sent upstream for inclusion in CFS. Additionally, it has allowed Meta to build bespoke schedulers that have resulted in multi-percent improvements in throughput and p99 latency for its main web workloads.
At this point, Zijlstra pointed out that most Spectre mitigations had nothing to do with the CPU scheduler, and questioned the value of the "quick rollout" mechanism. It was clarified that the intention was that it could be beneficial if and when any vulnerabilities are discovered in the future that can be mitigated with scheduling. Another attendee asked why Meta was interested in upstreaming to CFS if it could just run its own schedulers. The response was that Meta is an upstream-first company, with few internal patches used in its deployed kernel. Additionally, many of the kernel engineers at Meta (including Chris Mason, who was in attendance) are long-time maintainers or contributors in core subsystems, and believe strongly in the general benefits of open source.
The talk continued with an overview of the interface of sched_ext, describing the struct sched_ext_ops callback mechanism and the dispatch queue (DSQ) abstraction, both of which are described in the LWN article linked above.
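To give a flavor of that interface, a minimal scheduler fragment, modeled loosely on the scx_simple example that accompanies the patch set, might look like the following; the header name, helper names, and signatures were still in flux at the time, so treat all of them as assumptions rather than a stable API.

```c
/* Loosely modeled on the patch set's scx_simple example; names and
 * signatures were still changing at the time and are assumptions here. */
#include "scx_common.bpf.h"   /* header name from the patch set's tools dir */

char _license[] SEC("license") = "GPL"; /* non-GPL programs are rejected */

/* Enqueue every runnable task into the global dispatch queue (DSQ); the
 * core then consumes tasks from it in FIFO order. */
void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
{
	scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
}

SEC(".struct_ops")
struct sched_ext_ops simple_ops = {
	.enqueue	= (void *)simple_enqueue,
	.name		= "simple",
};
```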
Zijlstra said that sched_ext was just a worse version of the debugfs scheduler knobs, in that vendors such as Oracle would do nonstandard things such as shipping their own scheduler rather than using CFS. Mason responded that, if a scheduler had a feature which provided compelling performance, and if that feature was missing from CFS, then we would simply have to figure out how to enable it in CFS so that it could be used without sacrificing performance. Steven Rostedt pointed out that sched_ext is a different proposal than the debugfs knobs because sched_ext sits below CFS, and therefore wouldn't impose any maintainership burden on Zijlstra.
As the presentation continued, there were several other concerns raised by the audience. Thomas Gleixner claimed that using BPF could expose the scheduler to user-space-API stability constraints due to the BPF scheduler program exposing interfaces to its user-space counterpart. It was pointed out in response that sched_ext uses the struct_ops interface, which is purely internal to the kernel and provides no ABI guarantees. This point echoes the sentiment shared by Linus Torvalds at the 2022 Kernel Maintainers Summit.
Another concern was that sched_ext would discourage upstream contributions to CFS. Sched_ext intends to try and mitigate this possibility by encouraging users to upstream their schedulers in similar fashion to kernel modules. Schedulers that are out-of-tree may break any time if the scheduler's struct_ops interface changes, whereas in-tree schedulers will be updated to avoid build and performance regressions. The bar for upstreaming sched_ext schedulers can also be lower than for CFS changes. Upstreaming to CFS requires carefully integrating a feature into a complex code base and ensuring that it doesn't regress any of the existing features. With sched_ext, developers can upstream their BPF schedulers, and then add their features to CFS at a later time once they are proven to be worth the complexity and maintainership cost. Finally, it was pointed out that all sched_ext programs must be GPLv2, and will be rejected by the verifier otherwise.
The talk concluded with everyone going out to enjoy an espresso.
A push/pull model for timers
Author: Anna-Maria Behnsen (video)

When a timer expires, it is better to run its handler function on an active CPU rather than waking an idle CPU for this purpose. The current approach to prevent handling of timer-wheel timers on idle CPUs uses heuristics at enqueue time. The heuristic calculates the best CPU, which hopefully will not be idle when the timer expires. It is the so-called "push model" — push the timer to another CPU at enqueue time. This model has two problems: heuristics are not reliable, and most timer-wheel timers are canceled or rearmed before they expire. This means wasting cycles during the enqueue operation to generate a possibly incorrect solution that has a high probability of not being used anyway.
The future state would be a pull model. To support this model, a hierarchy is created at boot time as CPUs are brought online. When a CPU goes idle, all timers that are not pinned to that CPU (and are not deferrable) will be enqueued into the hierarchy. The hierarchy is used to build groups where a single CPU acts as migrator and ensures that enqueued timers are expired in time and are "pulled" onto a busy CPU.
Testing is an important step to validate this approach. Several people volunteered to integrate the sixth version (now seventh) of the implementation into their test environment.
A discussion about scheduler behavior was raised by both the audience and the speaker: when a timer expires on a remote CPU, the CPU where the timer was originally started gets woken up. This is caused by the completely fair scheduler, which assumes that the context is still cache-hot and, for performance reasons, wakes that CPU up again instead of reloading the context on the CPU that is actually busy. The outcome of the discussion was that this behavior could be changed; patch proposals are welcome.
There is still an open point in the new approach regarding the handling of deferrable timers. Most of them could be mapped onto non-pinned timers, but those that carry the "pinned" flag in addition to the "deferrable" flag cannot simply be mapped. Some people in the audience offered to help get rid of the five remaining users, which are mainly users of deferrable work.
Dynamic Energy Model
Author: Lukasz Luba (video)

This talk was about four new features that are going to be part of the energy model (EM) framework. These features form a set and have been called the "dynamic energy model". The first feature allows users to modify the model values at run time to better reflect the current hardware; an example would be to reflect increased static power (leakage) due to higher temperature of the SoC in gaming workloads. The patch set is available on the mailing list. The feature was presented at the 2022 Linux Plumbers Conference, but this time some plots have been added showing how the CPU power can increase over time.
The second feature aims to improve the simulation performed in the energy-aware scheduler (EAS) while doing the task placement during wakeup. The hardware dependency issue described in the talk creates a situation where the energy used for performing the computation for two tiny tasks can be far higher than the simulation in the kernel. It is due to shared voltage and frequency domain between the little cores in a big.LITTLE system and the dynamic shared unit (DSU), which contains the L3 cache. They both can perform dynamic voltage and frequency scaling (DVFS) to save power, but L3 cache is an important factor in three-gear SoCs — big.LITTLE systems with a third performance level provided by an even bigger "big" CPU.
The performance (latency and throughput) of the L3 cache for big CPUs is crucial. To maintain that performance, the system may increase the frequency (and voltage) for the domain containing the little CPUs and the DSU, even when the little CPUs don't require that. This behavior of the hardware is not modeled in the current energy model, but the new feature is going to change that. The results from a simple experiment showed a power saving of ~50% for two tiny tasks in that specific scenario, ~35mW in total. According to the little cores' energy model, ~40mW corresponds to the average power of one CPU core running fully loaded at 1GHz, so the saving shouldn't be ignored.
The third feature was about modeling the cost of waking a CPU from a deep idle state. Currently, EAS does not take into account whether a CPU is in a deep idle state and whether it is worth waking it up for a small task. That wakeup energy cost might not be so small for a big CPU, and it might be better to find another CPU that can handle the small task.
The last feature was about introducing a new field in the energy model that reflects the performance of the CPU for each frequency. The performance of the CPU might not be linear with the frequency. It can also vary for different workloads (e.g. integer vs. floating-point heavy). That doesn't fit well with the current software model. Luba also showed slides with power variation for a CPU core running at fixed frequency. The power varied for different benchmarks.
Those two pieces of information create a variety of power vs. performance curves for the CPU. This has been called the "power and performance profile", and there can be many of those. There was also a reference to Morten Rasmussen's talk at the 2022 Linux Plumbers Conference, when he showed four energy models with those different characteristics. With the run-time modifications to the energy model, it would be possible to plug in a new power and performance profile for a long-running application (like video conferencing) to improve decisions in EAS, better utilize the hardware, and save the battery.
Eco-friendly Linux kernel development: minimizing energy consumption during CI/CD
Author: Andrea Righi (video)

In this talk, Righi presented KernelCraft (since renamed to virtme-ng), a tool that allows users to quickly and efficiently build and run tests with custom kernels.
Back in the old days, Righi tested kernels on the same PC he was using for development, which resulted in multiple filesystem corruptions and lost work. With virtualization, he was able to test kernels more safely, but redeploying and reconfiguring a corrupted VM was still not ideal. Righi then discovered virtme (a tool written by Andy Lutomirski), which allows one to run a kernel while virtualizing the entire filesystem of the host (exported read-only using the 9p filesystem). In this way, it is possible to avoid the hassle of redeploying the test VM if something goes wrong. Unfortunately, virtme is unmaintained at the moment, so Righi decided to fork the project and create KernelCraft.
This tool generates a minimal kernel configuration that includes only the essential support required to run the kernel inside a QEMU instance. The tool then builds the kernel and uses virtme to create a live copy-on-write snapshot of the host system (using overlayfs with tmpfs to handle writes inside the guest).
Peter Zijlstra suggested taking a look at the kvm_guest.config make target in the kernel, while Steven Rostedt suggested looking at ktest.pl in the kernel source tree for generating a minimal .config.
According to his tests, Righi is able to build a kernel from scratch, run a simple uname -r inside it, and collect the result on the host in only 82 seconds, compared to over an hour for a typical compile-deploy-test cycle with an Ubuntu kernel; the energy saved in the process is around a factor of ten.
In conclusion, this tool has proved useful in CI/CD scenarios and for kernel bisections, as it allows significant time and energy savings. Beyond that, Righi aimed to share the tools and environment that he uses in his kernel development and debugging work; sharing our respective workflows can be extremely useful, as even seemingly minor details can help or inspire others to become more involved in kernel development.
Brief items
Security
Kernel development
Kernel release status
The 6.4 kernel was released on June 25; Linus said:
Most of the stuff in my mailbox the last week has been about upcoming things for 6.5, and I already have 15 pull requests pending. I appreciate all you proactive people.

But that's for tomorrow. Today we're all busy build-testing the newest kernel release, and checking that it's all good. Right?
Headline features in this release include: generic iterators for BPF, the removal of the SELinux runtime disable knob, the removal of the SLOB memory allocator, linear address masking support on Intel CPUs, process-level samepage merging control, support for user trace events, more infrastructure for writing kernel modules in Rust, per-VMA locks, and much more. See the LWN merge-window summaries (part 1, part 2), and the (in-progress) KernelNewbies 6.4 page for the details.
Stable updates: 6.3.10, 6.1.36, 5.15.119, 5.10.186, 5.4.249, 4.19.288, and 4.14.320 were released on June 28.
Distributions
AlmaLinux's response to Red Hat's policy change
The AlmaLinux organization has posted a message describing the impact of Red Hat's decision to stop releasing the source to the RHEL distribution and how AlmaLinux will respond.
In the immediate term, our plan is to pull from CentOS Stream updates and Oracle Linux updates to ensure security patches continue to be released. These updates will be carefully curated to ensure they are 1:1 compatible with RHEL, while not violating Red Hat’s licensing, and will be vetted and tested just like all of our other releases.
Update: Rocky Linux has also sent out a release on the subject. "There will be no disruption or change for any Rocky Linux users, collaborators, or partners".
Kuhn: A Comprehensive Analysis of the GPL Issues With the Red Hat Enterprise Linux (RHEL) Business Model
Over on the Software Freedom Conservancy blog, Policy Fellow and Hacker-in-Residence Bradley M. Kuhn analyzes the recent changes to Red Hat Enterprise Linux (RHEL) source availability in light of the GPL. It contains some interesting information about two alleged GPL violations that came about because the company's business model is structured in a way that brings it too close to non-compliance with the license, he said:

Perhaps the biggest problem with a murky business model that skirts the line of GPL compliance is that violations can and do happen — since even a minor deviation from the business model clearly violates the GPL agreements. Pre-IBM Red Hat deserves a certain amount of credit, as SFC is aware of only two documented incidents of GPL violations that have occurred since 2006 regarding the RHEL business model. We've decided to share some general details of these violations for the purpose of explaining where this business model can so easily cross the line.

[...] In another violation incident, we learned that Red Hat, in a specific non-USA country, was requiring that any customer who lowered the number of RHEL machines under service contract with Red Hat sign an additional agreement. This additional agreement promised that the customer had deleted every copy of RHEL in their entire organization other than the copies of RHEL that were currently contracted for service with Red Hat. Again, this is a "further restriction". The GPL agreements give everyone the unfettered right to make and keep as many copies of the software as they like, and a distributor of GPL'd software may not require a user to attest that they've deleted these legitimate, licensed copies of third-party-licensed software under the GPL. SFC informed Red Hat's legal department of this violation, and we were assured that this additional agreement would no longer be presented to any Red Hat customers in the future.
McGrath: Red Hat’s commitment to open source
Red Hat's Mike McGrath responds to the many criticisms aimed at the company since it changed its policy regarding RHEL source code.
Ultimately, we do not find value in a RHEL rebuild and we are not under any obligation to make things easier for rebuilders; this is our call to make. That brings me to CentOS Stream, of which there is immense confusion. I acknowledge that this is a change in a longstanding tradition where we went above and beyond, and change like this can cause some confusion. That confusion manifested as accusations about us going closed-source and about alleged GPL violations. There is CentOS Stream the binary deliverable, and CentOS Stream the source repository. The CentOS Stream gitlab source is where we build RHEL releases, in the open for all to see. To call RHEL “closed source” is categorically untrue and inaccurate. CentOS Stream moves faster than RHEL, so it might not be on HEAD, but the code is there. If you can’t find it, it’s a bug – please let us know.
Distributions quote of the week
So here is the reality with security updates. The vast majority of security updates are shipped in RHEL 3-9 months after we fix them, because minimizing the quantity of updates is an important goal in RHEL to reduce update churn for customers, so we only want to release quick fixes for issues that pose serious risk. (Most security issues are just not very urgent.) This means you get most security fixes drastically sooner in CentOS Stream than you would in RHEL.

However, higher-severity security updates do get fixed in RHEL first. Developers are not permitted to fix higher-severity security issues in CentOS Stream until after the fix is shipped in at least one RHEL update. We're encouraged to do so immediately after the fix ships in RHEL, so there *should* only be a minor delay of, say, one or two business days for the developer to notice the update has shipped. So in general, CentOS Stream *should* generally be ahead of RHEL and ideally only slightly behind for the more serious CVEs.

— Michael Catanzaro
Development
Ekstrand: NVK update: Enabling new extensions, conformance status & more
Faith Ekstrand has provided an update on the status of the NVK Vulkan driver for NVIDIA GPUs.
Probably the single most common question I get from folks is, "When will NVK be in upstream mesa?" The short answer is that it'll be upstreamed along with the new kernel API. The new API is going to be required in order to implement Vulkan correctly in a bunch of cases. Even though it mostly works on top of upstream nouveau, I don't want to be maintaining support for that interface for another 10 years when it only partially works.

We don't yet have an exact timetable for when the new API will be ready. I'm currently hoping that we get it all upstream this year but I can't say when exactly.
Page editor: Jake Edge
Announcements
Newsletters
Distributions and system administration
Development
Calls for Presentations
CFP Deadlines: June 29, 2023 to August 28, 2023
The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.
Deadline | Event Dates | Event | Location |
---|---|---|---|
July 2 | November 3–November 5 | Ubuntu Summit | Riga, Latvia |
July 7 | September 13–September 14 | All Systems Go! 2023 | Berlin, Germany |
July 17 | October 17–October 19 | X.Org Developers Conference 2023 | A Coruña, Spain |
July 21 | September 13 | eBPF Summit 2023 | online |
July 29 | September 8–September 9 | OLF Conference 2023 | Columbus, OH, US |
July 31 | October 7–October 8 | LibreOffice - Ubuntu Conference Asia 2023 | Surakarta, Indonesia |
July 31 | November 7–November 9 | Open Source Monitoring Conference | Berlin, Germany |
July 31 | November 27–November 29 | Deutsche Open Stack Tage | Berlin, Germany |
August 6 | November 13–November 15 | Linux Plumbers Conference | Richmond, VA, US |
August 12 | September 22–September 24 | Jesień Linuksowa 2023 | Gliwice, Poland |
August 13 | September 10–September 17 | DebConf 23 | Kochi, India |
August 14 | September 14–September 16 | Kieler Open Source und Linux Tage | Kiel, Germany |
August 18 | October 5–October 6 | PyConZA | Durban, South Africa |
August 19 | September 2–September 3 | Open Source Conference Albania 2023 | Tirana, Albania |
August 20 | September 25–September 26 | GStreamer Conference 2023 | A Coruña, Spain |
August 27 | October 30–November 3 | Netdev 0x17 | Vancouver, Canada |
If the CFP deadline for your event does not appear here, please tell us about it.
Upcoming Events
Events: June 29, 2023 to August 28, 2023
The following event listing is taken from the LWN.net Calendar.
Date(s) | Event | Location |
---|---|---|
June 28–June 30 | Embedded Open Source Summit | Prague, Czech Republic |
July 13–July 16 | Free and Open Source Yearly | Portland, OR, US |
July 15–July 21 | aKademy 2023 | Thessaloniki, Greece |
July 26–July 31 | GUADEC | Riga, Latvia |
August 2–August 4 | Flock to Fedora | Cork, Ireland |
August 5–August 6 | FrOSCon 18 | Hochschule Bonn-Rhein-Sieg, Germany |
If your event does not appear here, please tell us about it.
Security updates
Alert summary June 22, 2023 to June 28, 2023
Dist. | ID | Release | Package | Date |
---|---|---|---|---|
Debian | DSA-5438-1 | stable | asterisk | 2023-06-22 |
Debian | DLA-3466-1 | LTS | avahi | 2023-06-21 |
Debian | DSA-5439-1 | stable | bind9 | 2023-06-25 |
Debian | DLA-3471-1 | LTS | c-ares | 2023-06-26 |
Debian | DLA-3467-1 | LTS | hsqldb | 2023-06-22 |
Debian | DSA-5437-1 | stable | hsqldb | 2023-06-21 |
Debian | DLA-3468-1 | LTS | hsqldb1.8.0 | 2023-06-22 |
Debian | DSA-5436-1 | stable | hsqldb1.8.0 | 2023-06-21 |
Debian | DLA-3472-1 | LTS | libx11 | 2023-06-26 |
Debian | DLA-3469-1 | LTS | lua5.3 | 2023-06-23 |
Debian | DLA-3465-1 | LTS | minidlna | 2023-06-21 |
Debian | DSA-5434-1 | stable | minidlna | 2023-06-21 |
Debian | DLA-3470-1 | LTS | owslib | 2023-06-25 |
Debian | DSA-5435-1 | stable | trafficserver | 2023-06-21 |
Debian | DSA-5435-2 | stable | trafficserver | 2023-06-22 |
Debian | DLA-3464-1 | LTS | xmltooling | 2023-06-21 |
Fedora | FEDORA-2023-1b99669138 | F37 | chromium | 2023-06-27 |
Fedora | FEDORA-2023-9ea5d6e289 | F38 | dav1d | 2023-06-24 |
Fedora | FEDORA-2023-edb993aeaf | F37 | dotnet6.0 | 2023-06-24 |
Fedora | FEDORA-2023-401e38c388 | F38 | dotnet6.0 | 2023-06-24 |
Fedora | FEDORA-2023-e6d5cb11bb | F37 | dotnet7.0 | 2023-06-24 |
Fedora | FEDORA-2023-ee819d655b | F38 | dotnet7.0 | 2023-06-24 |
Fedora | FEDORA-2023-c7f63322b5 | F38 | kubernetes | 2023-06-27 |
Fedora | FEDORA-2023-d22162d9ba | F38 | mingw-dbus | 2023-06-24 |
Fedora | FEDORA-2023-eb9bec6e8c | F37 | tang | 2023-06-23 |
Fedora | FEDORA-2023-3e84bba241 | F38 | tang | 2023-06-23 |
Fedora | FEDORA-2023-92686b3e8b | F37 | trafficserver | 2023-06-23 |
Fedora | FEDORA-2023-2e6bead58b | F38 | trafficserver | 2023-06-23 |
Fedora | FEDORA-2023-6ad6467a06 | F38 | vim | 2023-06-25 |
Fedora | FEDORA-2023-ab291ca614 | F38 | wabt | 2023-06-26 |
Mageia | MGASA-2023-0207 | 8 | docker-docker-registry | 2023-06-28 |
Mageia | MGASA-2023-0205 | 8 | libcap | 2023-06-28 |
Mageia | MGASA-2023-0206 | 8 | libx11 | 2023-06-28 |
Mageia | MGASA-2023-0204 | 8 | mediawiki | 2023-06-28 |
Mageia | MGASA-2023-0210 | 8 | python-requests | 2023-06-28 |
Mageia | MGASA-2023-0211 | 8 | python-tornado | 2023-06-28 |
Mageia | MGASA-2023-0209 | 8 | sofia-sip | 2023-06-28 |
Mageia | MGASA-2023-0208 | 8 | sqlite | 2023-06-28 |
Mageia | MGASA-2023-0212 | 8 | xonotic | 2023-06-28 |
Oracle | ELSA-2023-3582 | OL8 | .NET 6.0 | 2023-06-21 |
Oracle | ELSA-2023-3593 | OL8 | .NET 7.0 | 2023-06-22 |
Oracle | ELSA-2023-3592 | OL9 | .NET 7.0 | 2023-06-21 |
Oracle | ELSA-2023-3577 | OL9 | nodejs:18 | 2023-06-21 |
Oracle | ELSA-2023-3741 | OL7 | c-ares | 2023-06-21 |
Oracle | ELSA-2023-3741 | OL7 | c-ares | 2023-06-22 |
Oracle | ELSA-2023-3584 | OL8 | c-ares | 2023-06-21 |
Oracle | ELSA-2023-3579 | OL7 | firefox | 2023-06-21 |
Oracle | ELSA-2023-3579 | OL7 | firefox | 2023-06-22 |
Oracle | ELSA-2023-3590 | OL8 | firefox | 2023-06-21 |
Oracle | ELSA-2023-3589 | OL9 | firefox | 2023-06-21 |
Oracle | ELSA-2023-12394 | OL8 | kernel | 2023-06-21 |
Oracle | ELSA-2023-3725 | OL9 | less | 2023-06-21 |
Oracle | ELSA-2023-3711 | OL9 | libtiff | 2023-06-21 |
Oracle | ELSA-2023-3715 | OL9 | libvirt | 2023-06-21 |
Oracle | ELSA-2023-3722 | OL9 | openssl | 2023-06-22 |
Oracle | ELSA-2023-3714 | OL9 | postgresql | 2023-06-22 |
Oracle | ELSA-2023-3555 | OL7 | python | 2023-06-21 |
Oracle | ELSA-2023-3555 | OL7 | python | 2023-06-22 |
Oracle | ELSA-2023-3591 | OL8 | python3 | 2023-06-22 |
Oracle | ELSA-2023-3594 | OL8 | python3.11 | 2023-06-21 |
Oracle | ELSA-2023-3585 | OL9 | python3.11 | 2023-06-21 |
Oracle | ELSA-2023-3661 | OL8 | texlive | 2023-06-22 |
Oracle | ELSA-2023-3661 | OL9 | texlive | 2023-06-21 |
Oracle | ELSA-2023-3563 | OL7 | thunderbird | 2023-06-21 |
Oracle | ELSA-2023-3563 | OL7 | thunderbird | 2023-06-22 |
Oracle | ELSA-2023-3588 | OL8 | thunderbird | 2023-06-21 |
Oracle | ELSA-2023-3587 | OL9 | thunderbird | 2023-06-21 |
Red Hat | RHSA-2023:3741-01 | EL7 | c-ares | 2023-06-21 |
Red Hat | RHSA-2023:3847-01 | EL8 | kernel | 2023-06-27 |
Red Hat | RHSA-2023:3852-01 | EL8.1 | kernel | 2023-06-27 |
Red Hat | RHSA-2023:3723-01 | EL9 | kernel | 2023-06-21 |
Red Hat | RHSA-2023:3819-01 | EL8 | kernel-rt | 2023-06-27 |
Red Hat | RHSA-2023:3708-01 | EL9 | kernel-rt | 2023-06-21 |
Red Hat | RHSA-2023:3853-01 | EL8.1 | kpatch-patch | 2023-06-27 |
Red Hat | RHSA-2023:3705-01 | EL9 | kpatch-patch | 2023-06-21 |
Red Hat | RHSA-2023:3725-01 | EL9 | less | 2023-06-21 |
Red Hat | RHSA-2023:3839-01 | EL8 | libssh | 2023-06-27 |
Red Hat | RHSA-2023:3827-01 | EL8 | libtiff | 2023-06-27 |
Red Hat | RHSA-2023:3711-01 | EL9 | libtiff | 2023-06-21 |
Red Hat | RHSA-2023:3715-01 | EL9 | libvirt | 2023-06-21 |
Red Hat | RHSA-2023:3722-01 | EL9 | openssl | 2023-06-21 |
Red Hat | RHSA-2023:3714-01 | EL9 | postgresql | 2023-06-21 |
Red Hat | RHSA-2023:3780-01 | EL8 | python27:2.7 | 2023-06-22 |
Red Hat | RHSA-2023:3777-01 | EL8.2 | python27:2.7 | 2023-06-22 |
Red Hat | RHSA-2023:3810-01 | EL8.6 | python27:2.7 | 2023-06-27 |
Red Hat | RHSA-2023:3796-01 | EL8.6 | python3 | 2023-06-26 |
Red Hat | RHSA-2023:3781-01 | EL8 | python38:3.8, python38-devel:3.8 | 2023-06-26 |
Red Hat | RHSA-2023:3776-01 | EL8.6 | python39:3.9 and python39-devel:3.9 | 2023-06-22 |
Red Hat | RHSA-2023:3811-01 | EL8 | python39:3.9, python39-devel:3.9 | 2023-06-27 |
Red Hat | RHSA-2023:3821-01 | EL8 | ruby:2.7 | 2023-06-27 |
Red Hat | RHSA-2023:3840-01 | EL8 | sqlite | 2023-06-27 |
Red Hat | RHSA-2023:3837-01 | EL8 | systemd | 2023-06-27 |
Red Hat | RHSA-2023:3822-01 | EL8 | virt:rhel, virt-devel:rhel | 2023-06-27 |
Scientific Linux | SLSA-2023:3741-1 | SL7 | c-ares | 2023-06-22 |
Slackware | SSA:2023-172-01 | | bind | 2023-06-21 |
Slackware | SSA:2023-173-01 | | cups | 2023-06-22 |
Slackware | SSA:2023-172-02 | | kernel | 2023-06-21 |
SUSE | SUSE-SU-2023:2571-1 | MP4.2 MP4.3 SLE15 SLE39 SLE-m5.1 SLE-m5.2 SLE-m5.3 SLE-m5.4 SES7 SES7.1 oS15.4 oS15.5 osM5.3 | Salt | 2023-06-21 |
SUSE | SUSE-SU-2023:2650-1 | MP4.0 MP4.1 MP4.2 MP4.3 SLE15 oS15.4 oS15.5 | amazon-ssm-agent | 2023-06-27 |
SUSE | SUSE-SU-2023:2656-1 | SLE12 | amazon-ssm-agent | 2023-06-27 |
SUSE | SUSE-SU-2023:2667-1 | MP4.3 SLE15 oS15.4 | bind | 2023-06-28 |
SUSE | SUSE-SU-2023:2605-1 | MP4.3 SLE15 SLE-m5.3 SLE-m5.4 oS15.4 osM5.3 | bluez | 2023-06-22 |
SUSE | SUSE-SU-2023:2562-1 | OS9 SLE12 | bluez | 2023-06-21 |
SUSE | SUSE-SU-2023:2628-1 | MP4.0 MP4.1 MP4.2 MP4.3 SLE15 oS15.4 oS15.5 | cloud-init | 2023-06-23 |
SUSE | SUSE-SU-2023:2665-1 | MP4.3 SLE15 oS15.4 oS15.5 | cosign | 2023-06-27 |
SUSE | SUSE-SU-2023:2616-1 | MP4.2 MP4.3 SLE15 SLE-m5.2 SLE-m5.3 SLE-m5.4 SES7 SES7.1 oS15.4 oS15.5 osM5.3 | cups | 2023-06-22 |
SUSE | SUSE-SU-2023:2224-2 | oS15.5 | curl | 2023-06-21 |
SUSE | SUSE-SU-2023:2618-1 | SLE15 oS15.5 | dav1d | 2023-06-23 |
SUSE | SUSE-SU-2023:2273-2 | SLE15 oS15.5 | geoipupdate | 2023-06-21 |
SUSE | SUSE-SU-2023:2297-2 | SLE15 oS15.5 | golang-github-vpenso-prometheus_slurm_exporter | 2023-06-23 |
SUSE | SUSE-SU-2023:2617-1 | MP4.0 MP4.1 MP4.2 MP4.3 SLE15 oS15.5 | google-cloud-sap-agent | 2023-06-23 |
SUSE | openSUSE-SU-2023:0137-1 | osB15 | guile1, lilypond | 2023-06-27 |
SUSE | SUSE-SU-2023:2242-2 | SLE15 oS15.5 | java-1_8_0-openjdk | 2023-06-23 |
SUSE | openSUSE-SU-2023:0157-1 | osB15 | keepass | 2023-06-28 |
SUSE | SUSE-SU-2023:2651-1 | MP4.1 SLE15 SES7 | kernel | 2023-06-27 |
SUSE | SUSE-SU-2023:2611-1 | MP4.2 SLE15 SLE-m5.1 SLE-m5.2 SES7.1 oS15.4 | kernel | 2023-06-22 |
SUSE | SUSE-SU-2023:2653-1 | MP4.3 SLE15 SLE-m5.3 SLE-m5.4 oS15.4 osM5.3 | kernel | 2023-06-27 |
SUSE | SUSE-SU-2023:2654-1 | MP4.3 SLE15 oS15.4 | kubernetes1.24 | 2023-06-27 |
SUSE | SUSE-SU-2023:2664-1 | SLE15 oS15.5 | kubernetes1.24 | 2023-06-27 |
SUSE | SUSE-SU-2023:2614-1 | MP4.2 MP4.3 SLE15 SLE-m5.2 SLE-m5.3 SLE-m5.4 SES7 SES7.1 oS15.4 oS15.5 osM5.3 | libX11 | 2023-06-22 |
SUSE | SUSE-SU-2023:2652-1 | SLE15 oS15.5 | libvirt | 2023-06-27 |
SUSE | SUSE-SU-2023:2096-2 | SLE15 oS15.5 | netty, netty-tcnative | 2023-06-21 |
SUSE | SUSE-SU-2023:2663-1 | MP4.3 SLE15 oS15.4 | nodejs16 | 2023-06-27 |
SUSE | SUSE-SU-2023:2655-1 | SLE12 | nodejs16 | 2023-06-27 |
SUSE | SUSE-SU-2023:2669-1 | MP4.3 SLE15 oS15.4 oS15.5 | nodejs18 | 2023-06-28 |
SUSE | SUSE-SU-2023:2662-1 | SLE12 | nodejs18 | 2023-06-27 |
SUSE | SUSE-SU-2023:2608-1 | MP4.3 SLE15 oS15.4 oS15.5 | ntp | 2023-06-22 |
SUSE | SUSE-SU-2023:2609-1 | SLE12 | ntp | 2023-06-22 |
SUSE | SUSE-SU-2023:2604-1 | MP4.3 SLE15 SLE-m5.1 SLE-m5.2 SLE-m5.3 SLE-m5.4 oS15.4 oS15.5 osM5.3 | open-vm-tools | 2023-06-22 |
SUSE | SUSE-SU-2023:2624-1 | OS9 SLE12 | openssl-1_0_0 | 2023-06-23 |
SUSE | SUSE-SU-2023:2648-1 | MP4.3 SLE15 SLE-m5.3 SLE-m5.4 oS15.4 osM5.3 | openssl-1_1 | 2023-06-27 |
SUSE | SUSE-SU-2023:2623-1 | OS9 SLE12 | openssl-1_1 | 2023-06-23 |
SUSE | SUSE-SU-2023:2622-1 | SLE15 | openssl-1_1 | 2023-06-23 |
SUSE | SUSE-SU-2023:29171-1 | SLE15 oS15.5 | openssl-1_1 | 2023-06-23 |
SUSE | SUSE-SU-2023:2620-1 | SLE15 oS15.5 | openssl-3 | 2023-06-23 |
SUSE | SUSE-SU-2023:2621-1 | SLE12 | openvswitch | 2023-06-23 |
SUSE | SUSE-SU-2023:2610-1 | MP4.3 SLE15 oS15.4 oS15.5 | php8 | 2023-06-22 |
SUSE | openSUSE-SU-2023:0154-1 | SLE12 | phpMyAdmin | 2023-06-28 |
SUSE | SUSE-SU-2023:2561-1 | MP4.3 SLE15 SLE-m5.3 SLE-m5.4 oS15.4 oS15.5 | python-reportlab | 2023-06-21 |
SUSE | SUSE-SU-2023:2619-1 | MP4.3 SLE15 oS15.4 oS15.5 | python-sqlparse | 2023-06-23 |
SUSE | SUSE-SU-2023:2603-1 | MP4.3 SLE15 oS15.4 oS15.5 | rustup | 2023-06-22 |
SUSE | SUSE-SU-2023:2572-1 | SLE15 SES7 | salt | 2023-06-21 |
SUSE | SUSE-SU-2023:2668-1 | OS9 SLE12 | sqlite3 | 2023-06-28 |
SUSE | SUSE-SU-2023:2253-2 | SLE15 oS15.5 | terraform-provider-aws | 2023-06-21 |
SUSE | SUSE-SU-2023:2261-2 | SLE15 oS15.5 | terraform-provider-null | 2023-06-21 |
SUSE | SUSE-SU-2023:2607-1 | MP4.2 SLE15 SES7 SES7.1 oS15.4 | webkit2gtk3 | 2023-06-22 |
SUSE | SUSE-SU-2023:2647-1 | MP4.3 SLE15 oS15.4 oS15.5 | webkit2gtk3 | 2023-06-27 |
SUSE | SUSE-SU-2023:2606-1 | OS9 SLE12 | webkit2gtk3 | 2023-06-22 |
Ubuntu | USN-6183-1 | 20.04 22.04 22.10 23.04 | bind9 | 2023-06-21 |
Ubuntu | USN-6184-1 | 20.04 22.04 22.10 23.04 | cups | 2023-06-22 |
Ubuntu | USN-6161-2 | 22.04 22.10 23.04 | dotnet6, dotnet7 | 2023-06-23 |
Ubuntu | USN-6189-1 | 22.10 23.04 | etcd | 2023-06-27 |
Ubuntu | USN-6185-1 | 20.04 | linux-aws, linux-azure, linux-bluefield, linux-gcp, linux-gke, linux-gkeop, linux-ibm, linux-kvm, linux-oracle, linux-raspi | 2023-06-22 |
Ubuntu | USN-6186-1 | 23.04 | linux-azure, linux-gcp, linux-ibm, linux-kvm, linux-oracle | 2023-06-22 |
Ubuntu | USN-6187-1 | 22.10 | linux-ibm | 2023-06-22 |
Ubuntu | USN-6188-1 | 14.04 16.04 | openssl | 2023-06-22 |
Kernel patches of interest
Kernel releases
Architecture-specific
Build system
Core kernel
Development tools
Device drivers
Device-driver infrastructure
Documentation
Filesystems and block layer
Memory management
Networking
Security-related
Virtualization and containers
Miscellaneous
Page editor: Jonathan Corbet