LWN.net Weekly Edition for September 28, 2023
Welcome to the LWN.net Weekly Edition for September 28, 2023
This edition contains the following feature content:
- AI from a legal perspective: Van Lindberg looks at AI and the law in an OSSEU talk.
- Moving the kernel to large block sizes: doing I/O on larger-than-page-size blocks has been of interest to storage developers and manufacturers for a while; where does the effort stand?
- Revisiting the kernel's preemption models (part 1): a patch to speed up clearing huge pages turns into a discussion on how the kernel handles being preempted.
- User-space spinlocks with help from rseq(): implementing adaptive spinlocks for user space requires some kernel support, which is now present, but may not actually turn out to be worth the effort.
- The PuzzleFS container filesystem: another new kernel filesystem for container use cases is being proposed; this one is written in Rust.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
AI from a legal perspective
The AI boom is clearly upon us, but there are still plenty of questions swirling around this technology. Some of those questions are legal ones and there have been lawsuits filed to try to get clarification—and perhaps monetary damages. Van Lindberg is a lawyer who is well-known in the open-source world; he came to Open Source Summit Europe 2023 in Bilbao, Spain to try to put the current work in AI into its legal context.
Lindberg began by introducing himself; he has been involved in computer law for around 25 years at this point. Throughout that time he has also worked in open source (notably serving as General Counsel for the Python Software Foundation). He has also been working on AI issues since 2008, so he is well-positioned to assist in those matters now that "the entire world started going crazy about AI".
![Van Lindberg](https://static.lwn.net/images/2023/osseu-lindberg-sm.png)
He asked the audience whether they primarily identify as technologists or members of the legal field; the response was "not quite half and half", he said, but noted that he would disappoint both groups, "just in different parts of the presentation". His technical description of the techniques being used in AI today would perhaps be somewhat boring for the technologists, while the cases he would talk about are likely already known to those with legal training. But, he said, the good news is that the intersection of AI and law is such a rapidly developing field that there should be something new in his talk for everyone.
In the talk, he would only be focusing on "generative machine learning"; he is aware of the older AI research, but "generative ML is the thing that has really started to drive this AI revolution". It is also much more interesting and challenging from a legal perspective. He would be looking at the advances in the field over just the last five years, though most of what he would present is from the last three or four years, he said.
His presentation is a shortened version of his paper that was published earlier in 2023. It looks at how models are trained and the inferences they perform, along with the copyright implications from a US perspective. The talk would be US-law-focused, as well, because that is "where a lot of the current action is". He pointed to a UK-law-focused article for those who are interested; so far, he has been unable to find a similar work that looks at the issue from an EU-law perspective.
Models
When it is applied to AI, the term "model" is misunderstood by a lot of people, Lindberg said. Many think of it as "the magic black box that does what I want it to do", but it is important to understand what models really are, how they work, and how they are trained, in order to "apply the correct legal analysis". He used an analogy to try to give the audience a "good mental picture" of what is going on with models.
Imagine someone is given the job of "art inspector" and is tasked with inspecting all of the art in the Louvre museum. When he is hired, he knows "absolutely nothing about art"; he does not know what makes it good or bad, what the different types of art are, and so on. He sets out to fix that lack of knowledge by measuring everything he can think of with regard to each work of art: size, weight, materials used, colors employed, creation date and location, etc. He also measures random things like the number of syllables in the artist's name and what corner they choose to sign their work in; he records all of this information in his notebook (i.e. database).
That work starts to get pretty boring, so he invents a game: before he measures something, he is going to use what he already knows to make a guess about the measurement. At first, his guesses are terrible, but after looking at thousands, or millions, of paintings, his guesses start to get better, then much better. He can make pretty accurate guesses about a work with just a little information about it as a starting point; he has effectively recognized hidden patterns that allow him to make these accurate guesses.
That analogy shows the process for model training. It consists of four steps, measuring, predicting, checking, and updating, that get repeated billions of times. When people talk about model creation, they often say that the process is "reading" the data or "it is sucking in all this content"; that is sort of true, but is not exactly what is going on. The training process is extracting certain statistical measurements of the data; it is calculating probabilities associated with those measurements and the data set.
Those probabilities are then used to predict things about some other training data, which has known-correct answers. Those predictions are checked against the answers. For example, a model for something like ChatGPT would use the next word in the existing text to check. For "it was a dark and stormy X", "night" would be a high-probability completion, while "elephant" would be extremely low. Based on the check, the training process updates all of its probabilities to make it a little bit more likely to produce a high-probability completion the next time. It follows this process many millions or billions of times.
That describes the training process for "almost any type of ML", he said; the differences are in what kinds of data are being trained on. The result is the model, which has a particular architecture that consists of the mechanisms used to break down the inputs, the methods used to analyze those inputs, and then a way to represent the output. The architecture is not an implementation, but is simply a logical construct that lives in the heads of the model developers; the code that gets written using, say, PyTorch is an implementation of the model architecture.
The architecture has separate layers for input, output, and some hidden layers that are critical to the inferences (guesses) the model is meant to make. They are arranged as in an artificial neural network, like the one Lindberg showed in his slides. The input layer turns the input into a number in some fashion; the input could be a pixel value, a word, or a value in a log file. "It doesn't really matter" what the input represents, just that it gets turned into a useful number. The output layer then turns the results from the hidden layers into the final prediction that is the result of the model.
The hidden layers have probabilities, generally called "weights", associated with the inputs and outputs from the nodes in the hidden layer. Unlike a financial model, say, where the model is deterministic, an ML model is "a probabilistic mapping from a set of inputs to a set of outputs". The model uses a technique that is much like Bayes's Theorem, which is used to do probabilistic calculations, he said; it is "essentially a multi-billion parameter Bayesian calculation". The weights in a disk file are just a huge matrix of floating-point values that correspond to the probabilities for each of the different parts of the model.
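To make that "pile of numbers" concrete, here is a minimal, purely illustrative C sketch of a forward pass through one hidden layer: the weights are nothing more than arrays of floating-point values that map numeric inputs to output probabilities. The sizes and names are invented for the example and bear no relation to any real model.

```c
#include <math.h>
#include <stddef.h>

/* Illustrative sizes only; real models have billions of weights. */
#define N_IN     4
#define N_HIDDEN 8
#define N_OUT    2

/* The "model" is nothing but arrays of floating-point weights. */
static float w_hidden[N_HIDDEN][N_IN];   /* input -> hidden layer */
static float w_out[N_OUT][N_HIDDEN];     /* hidden -> output layer */

/* One forward pass: turn numeric inputs into output "probabilities". */
static void forward(const float in[N_IN], float out[N_OUT])
{
        float hidden[N_HIDDEN];
        float total = 0.0f;

        for (size_t h = 0; h < N_HIDDEN; h++) {
                float sum = 0.0f;

                for (size_t i = 0; i < N_IN; i++)
                        sum += w_hidden[h][i] * in[i];
                hidden[h] = tanhf(sum);          /* non-linear activation */
        }

        for (size_t o = 0; o < N_OUT; o++) {
                float sum = 0.0f;

                for (size_t h = 0; h < N_HIDDEN; h++)
                        sum += w_out[o][h] * hidden[h];
                out[o] = expf(sum);              /* softmax, step 1 */
                total += out[o];
        }
        for (size_t o = 0; o < N_OUT; o++)
                out[o] /= total;                 /* softmax, step 2: normalize */
}
```

Training, in this picture, is nothing more than repeatedly nudging the values in those weight arrays so that the outputs better match the known-correct answers.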
The weights are "simply a pile of numbers; it is not creative, it is not expressive". They are just the result of a mechanical process, he said. That is important to recognize because it directly impacts the nature of how the law is likely to treat these models. The mental model that people apply to AI will guide their beliefs in how an ML model should be treated; those who see it as the "magic black box" will impute things to it "that simply isn't true". That can lead people to believe that things like logic, emotion, and intent are somehow inside the model; they anthropomorphize the model. The model is, instead, simply "a really complicated statistical equation—that's it".
Intellectual property
When looking at how intellectual property (IP) law applies to AI, there are several parts of the machine-learning process where it might be applied. It could be applied to the training, the model itself, the architecture and code, or the output; each of those needs to be analyzed independently to try to figure out the applicable law. "The inputs are not the outputs and neither one is actually that model in the middle", Lindberg said.
Much of the current activity around AI and IP is about the question of how much copyrighted material can be used in training a machine-learning model. Artists, in particular, but also programmers to a certain extent, are concerned that their works are getting incorporated into these models without recompense to the copyright holders. The argument that the creators make is that there would be no way to train the model if the works did not exist, so they deserve to be paid.
But copyright does not protect every use of a work, he said, and those protections are largely the same in Europe, Asia, and the US. There are a few specific verbs in the US copyright act: copy, create derivative works, and perform; those are the only acts that are protected by copyright. All other uses are either outside of copyright or they are a "fair use"; the latter means that they have been judged to fall outside of the copyright protections.
One of the classic fair uses is "doing analysis of a work"; analyses of this sort can be summaries, reviews, or criticisms of the work. So, reading a book and doing a review in, say, The New York Times, is a perfectly acceptable use of a copyrighted work. Similarly, textual analysis of copyrighted works to gather statistical or stylistic information is not something that copyright protects against. It is a fair use of the work.
The Google Books lawsuit is one that has a lot of relevance for the kinds of lawsuits that are being filed against AI efforts, he said. Google Books is an index of books that the search giant created by scanning physical books, which was the target of a copyright-infringement suit brought by the Authors Guild. It was ultimately determined that Google Books was a fair use, in part because the search index was not a replacement for the book itself; copyright is intended to protect creators from others using their work in a competitive manner in the marketplace.
Proponents of the current crop of lawsuits point out that generative AI is creating works that are competitive with the original work, at least potentially. But "copyright is about a work, a very specific work that can be infringed", he said; copyright does not protect "your perception in the marketplace, your ability to produce works in the marketplace in general".
The ongoing AI cases have not been decided yet, but what he has argued in his paper is that the models are effectively the same as what Google Books has done. The training of the models simply takes a bunch of measurements and what the models produce is not competition for the work itself. He believes that the courts will find the models to be fair uses of the works, but "nobody knows".
There is one tricky piece, however: what happens when one of these models produces text or an image that is exactly like one of its inputs? "The answer is: that's infringing." It is clearly possible to create a copyright-infringing output from an AI model. He believes the model itself will be found to be fair use, "the outputs, maybe, maybe not". It completely depends on the specific output.
It is trivial to generate a copyright-infringing output from these models, Lindberg said; the easy way is to go to an image-generating model and give it something like "Iron Man" as a prompt. In the US and other places, a character that is sufficiently detailed can be copyrighted; that means the copyright covers more than a specific book, movie, or image. So using that prompt for an image-generation AI will create a "completely new picture of Iron Man that is absolutely copyright-infringing—100%".
Code, which does not have as many degrees of freedom as a natural language like English, will seemingly be reproduced more frequently. Because of that lack of freedom, the probabilities in the model will converge to make the output look like some of the input code more frequently than is seen with text. The model has not memorized the code, per se, but it "memorized how to recreate it, which is a version of copying". Those outputs may then be copyright-infringing.
It turns out that larger amounts of training input lead to fewer infringing outputs. A study that tried to generate infringements was able to find 108 copied output images from a model that used 90,000 training images. But when they applied the same technique to a full-scale image model, the number of copied images found dropped to near zero.
Lawsuits
He put up a slide with five lawsuits that have been filed; the first four (v. GitHub, Stability AI, OpenAI, and Meta) were all filed as class actions by the same US law firm. The other, Getty Images v. Stability AI, has been filed in two places, Delaware and the UK; both are focused on Stable Diffusion, but are different from the other set.
The most prominent case, at least among OSSEU attendees, is probably Doe v. GitHub; it targets GitHub Copilot, an AI-based code-autocompletion tool. The case is unusual because it is billed as a copyright suit, but it does not assert copyright infringement. "There is no 'you copied our stuff' in that entire lawsuit." Instead, there are accusations that GitHub removed copyright information, that the output is unfair competition, and that all outputs are necessarily derivative works of the inputs.
The lawsuit plays a bit loose with the idea of a "derivative work", he said; in a legal sense that means that there is "specific expression from one work that has been copied into another". Instead, the lawsuit argues that everything is derived from the inputs, thus everything is a derivative work. GitHub has filed a motion to dismiss the case, in part because there is no copyright infringement claimed; other arguments are made, but not any direct copying. He believes that is because the plaintiffs cannot find something that is infringing.
The second case is Andersen v. Stability AI, which uses bad analogies in its reasoning, he believes. It calls Stable Diffusion, which is an AI-based image creation tool, "essentially a 21st-century collage tool"; the argument is that the model is breaking everything up into pixels, then creating a collage with those pixels, thus the outputs are derivative works. That case is also the subject of a motion to dismiss, and it sounds like pretty much all of the case will be dismissed, he said, at least preliminarily. Two of the lead plaintiffs had not registered their copyrights in the works used, while a third had their works removed from the most recent version of the model because they did not meet the quality standards.
The last two in that first set are about the GPT4 and LLaMA models, which are text-based. In both cases, one of the lead plaintiffs is author and comedian Sarah Silverman. The suits are making copyright-infringement claims, but doing so in an interesting way, he said. Instead of showing two texts, one copyrighted and one generated, then showing places where the latter was copied from the former, the suits are taking a different path.
The complaint shows that asking the AI tools for a summary of Silverman's work results in ... a summary of her work. That means, the suit argues, that "the work must be in there somewhere, we just don't know how to get it out". But, Lindberg noted, creating a summary is something that is protected as a fair use of the work. In his analysis, all four in that first set "are not good lawyering"; in fact, if you want to protect authors and artists, you should want those lawsuits to be dismissed quickly.
The Getty Images cases are more interesting, he thinks. They are also copyright-infringement cases, but these have found generated images that are "very reminiscent" of those that were part of the inputs. The Stable Diffusion model also learned that the Getty Images watermark was an important element, so it dutifully reproduces them, at least sort of. "It's creating these terrible-looking photos with a bad version of the Getty Images watermark." The strongest argument that Getty Images has, he thinks, is that Stable Diffusion is violating its trademark in using the watermark. "That argument may win, but notice that's not a copyright argument."
Copyrightable?
Another interesting question with regard to AI is whether its output can be copyrighted or not. While the UK copyright office says that those outputs can be copyrighted, the US copyright office is currently saying that they cannot be. That US ruling was made with regard to the Zarya of the Dawn AI-illustrated comic book; it was determined that the AI-created images in it were not subject to copyright.
Lindberg assisted the author, Kris Kashtanova, in creating a response for the copyright office, which had revoked the previously issued copyright once the AI nature of the work came to light. The copyright office said that the author did not have enough control over the AI-generated output to make it eligible for copyright; there needs to be substantial human control over the output in order for it to be eligible. Kashtanova decided not to appeal that judgment, but Lindberg is working on another, similar case.
That ruling also means that the output of, say, Copilot is not currently eligible for copyright protection in the US. He believes that the copyright office is in the midst of a "speed run" replaying the history of photography copyrights; originally photographs were not eligible, then they were eligible if there was sufficient work done in setting up the photograph (e.g. lighting, costumes). Eventually it was decided that all photographs are eligible for copyright protection and he believes that will happen with model output too; we are currently at the "sufficient work" stage, but the copyright office is seeking comment on the matter.
At that point, time was running out on the talk. It was clear that Lindberg had some other topics he wanted to present, but 40 minutes was simply not enough time to do so. The topics he was able to get to certainly provided some useful information, for both technologists and those in the legal field.
[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Bilbao for OSSEU.]
Moving the kernel to large block sizes
Using larger block sizes in the kernel for I/O is a recurring topic in storage and block-layer circles. The topic came up in discussions at the Linux Storage, Filesystem, Memory-Management and BPF Summit (LSFMM) back in May. One of the participants in those discussions, Hannes Reinecke, gave a talk at Open Source Summit Europe 2023 with an overview of the reasons behind using larger blocks for I/O, the current status of that work, and where it all might lead from here.
Reinecke has worked at SUSE for "like an eternity, nearly 20 years now" and was involved with Linux before that; his first kernel was 1.1.5 or 1.0.5. More recently he has been involved with storage and with NVMe in particular. That led to a pet project of his "that has finally come to life", which is to be able to use larger blocks in Linux.
Blocks and pages
Currently, Linux is restricted to using block sizes that are no larger than PAGE_SIZE, which is typically 4KB. But there are some systems and applications that would benefit from using larger blocks; for example, some databases would really like to be able to work with chunks of 16KB because that is how they are organized internally. In addition, some hardware would benefit from handling data in larger block sizes because it reduces the amount of overhead needed to internally track the blocks, thus making the drives more efficient and cheaper.
![Hannes Reinecke](https://static.lwn.net/images/2023/osseu-reinecke-sm.png)
But, does there have to be a block size, he asked, couldn't the kernel just use whatever amount of data it wants to at the time? The problem is that there is no "do I/O" instruction that atomically reads or writes some arbitrary amount of data. Each I/O operation requires multiple instructions to set it up, transfer the data, and gather up the results. That increases the latency for each operation, so the goal is to minimize the number of I/O operations that are done, but there is a balance to be struck.
There is a question of what the right size for these I/O operations should be. That was the subject of a lot of experimentation in the early days, he said. Eventually, researchers at the University of California, Berkeley ("of course, as usual") figured out that 512 bytes gave a reasonable compromise between overhead and I/O granularity. That was twenty years ago, at this point, but we still use 512 bytes—at least for now.
CPUs have hardware-assisted memory management that operates in pages, however. There is support for determining which pages are dirty (i.e. need to be written to the backing store) that operates in page-size chunks, for example. That means the size of the page is CPU-dependent; Linux cannot just arbitrarily choose a size. For x86_64, the choices are 4KB, 2MB, or 1GB; for PowerPC and some Arm systems, 16KB is used as the page size. The kernel has a compile-time PAGE_SIZE setting that governs the size of the page.
There is a need to read the pages in memory from disk, or to flush their contents to disk, at times. For buffered I/O, the page cache is what manages all of that; it uses the hardware-supplied dirty-page information to determine which pages need to be written. Since all of that is done at page granularity, it is natural to do I/O in page-size units.
But if you had a number of consecutive pages that were all dirty, you could do I/O on the whole set of pages at once. Having a data structure that handles more than one page as a single unit would facilitate that, which is what folios are all about.

Beyond buffered I/O, there is direct I/O, which user space has complete control over; the page cache is not involved and user space can do I/O in multiple blocks if it wants. Buffered I/O is provided by the filesystems via the page cache and there are a few different interfaces that can be used for that I/O. The oldest is buffer heads, its successor (of sorts) uses struct bio, and more recently there is iomap, which he said he would be getting back to. In order to do buffered I/O in larger sizes, though, the page cache needs to be converted to use folios.
Folios
Folios are an effort to treat different kinds of pages in a common way. There are normal pages, compound pages (like an array of pages), and transparent hugepages (THPs), each of which has its own quirks. All of them can be addressed using a struct page, though, so kernel developers have to know whether a given page structure is actually a page—or something more complicated. A folio is explicitly designed to handle the different types and, importantly for his talk, it can represent more than a single page, thus allowing it to be used for larger block I/O.
That requires converting the page cache—and probably the memory-management subsystem eventually—to use folios. That effort was proposed by Matthew Wilcox in 2020 and has been discussed at every LSFMM since. It has also been the subject of sometimes contentious mailing-list discussions over that span. But the work is ongoing and will be for several more years ("we will get there eventually"). He showed counts of "struct page" (8385) versus "struct folio" (1859) in the 6.4-rc2 kernel as a rough guide to where things stand.
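As a rough illustration of what folio-aware code looks like, the sketch below uses the kernel's folio helpers to ask how large the unit actually is instead of assuming PAGE_SIZE; it is a simplified example, not code from any of the patch sets discussed here.

```c
#include <linux/mm.h>

/* Sketch only: handle a folio that may span several pages. */
static void handle_folio(struct folio *folio)
{
	size_t bytes = folio_size(folio);	/* 4KB, 16KB, ... depending on order */
	long pages = folio_nr_pages(folio);

	/*
	 * Page-based code would have assumed PAGE_SIZE here; folio-based
	 * code must use the folio's own size when setting up I/O.
	 */
	pr_debug("folio of %zu bytes (%ld pages)\n", bytes, pages);
}
```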
He then turned to buffer heads, which were present in the 0.01 kernel, so they are the original structure for I/O in Linux. Each buffer head is for a single 512-byte disk sector, it is linked to a particular page structure, and is internally cached in the buffer cache (to save on I/O when accessing it). Buffer heads are still in use by most filesystems and they are also used in a pseudo-filesystem for block devices. The page cache only came later in the kernel's history because the buffer cache was sufficient for the early days.
A struct buffer_head is complicated, so in the 2.5 kernel, struct bio was introduced as a "basic I/O structure" for device drivers. It allows for vectorized I/O to or from an array of pages, routing and rerouting the bio structures to various devices, and is abstracted away from the page cache. These days, buffer heads are implemented on top of the bio infrastructure. There are a number of filesystems, such as AFS, CIFS, NFS, and FUSE, that use struct bio directly, thus do not rely on buffer heads.
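For a sense of what the bio interface looks like from a filesystem's point of view, here is a much-simplified sketch of submitting a single-page read; the function signatures follow recent kernels as best as memory serves and should be treated as approximate rather than authoritative.

```c
#include <linux/bio.h>
#include <linux/blkdev.h>

/* Sketch: read one page from a block device using struct bio. */
static void read_one_page(struct block_device *bdev, struct page *page,
			  sector_t sector, bio_end_io_t *done)
{
	struct bio *bio = bio_alloc(bdev, 1, REQ_OP_READ, GFP_NOIO);

	bio->bi_iter.bi_sector = sector;	/* position, in 512-byte units */
	bio_add_page(bio, page, PAGE_SIZE, 0);	/* attach the data page */
	bio->bi_end_io = done;			/* completion callback */
	submit_bio(bio);			/* hand off to the block layer */
}
```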
Finally, there is iomap "or Christoph Hellwig going crazy"; Hellwig got fed up with the existing I/O interfaces and created iomap as a replacement, Reinecke said. Iomap is a modern interface that already uses folios; it provides a way for a filesystem to specify how the I/O should be mapped and leaves the rest to the block layer. Several filesystems have been converted to use iomap, including XFS, Btrfs, and Zonefs, so nothing more needs to be done for those with regard to the folio conversion. One problem area for iomap, though, is documentation, which is somewhat hard to find and often out of date because iomap is under active development.
Replacing buffer heads?
The storage community has long had a consensus that "buffer heads must die", he said. He led a discussion on that topic at this year's LSFMM. The thinking is that buffer heads are a legacy interface, using an ancient structure, so users should be converting to struct bio or iomap. But, a recent conversation on the ksummit-discuss mailing list contained a disagreement from Linus Torvalds.
The vehemence of that response perhaps indicates that a different path should be chosen to get to the goal of larger block sizes, Reinecke said. Conversion to folios is useful, but only affects the page cache and the memory-management subsystem; buffer heads assume that I/O will be done in sub-page granularity (i.e. 512 bytes), so that needs to be addressed. One path might be to convert everything to iomap and then remove buffer heads, another would be to update buffer heads to work with larger I/O sizes.
In an ideal world, all filesystems would be converted to use iomap, he said; it is a "modern interface and it is actually quite a nice interface". But, as the ksummit-discuss thread has shown, there are legacy filesystems that lack an active maintainer—or any maintainer at all. There is often little or no documentation for the legacy filesystems and no real way to test changes to them. Beyond that, converting any filesystems (legacy or not) is going to require better iomap documentation for the developers working on the conversions.
Another possibility is to simply remove buffer heads; there is a patch set from Hellwig that allows compiling buffer heads out of the kernel, which was merged for the 6.5 kernel. Turning that on would mean disabling all of the filesystems that use buffer heads, which is not entirely realistic at this point, Reinecke said. In particular, the FAT filesystem, which is needed for booting UEFI systems, would not be present in such a kernel.
At LSFMM, Josef Bacik raised the idea of converting buffer heads to use folios, so that it could handle both sub-page and super-page I/O. While that is not the direction Reinecke would have chosen, he started to consider it. A conversion of that sort could either be fairly trivial, if the code was written without wholesale assumptions about sub-page I/O, or "it could be a complete nightmare" because that assumption is pervasive.
Later that day, he was sitting at the bar after looking at the buffer heads code and "complaining bitterly" to his neighbor about them. He wondered how anyone could be expected to convert them, since they are so closely tied to pages. He then realized that his neighbor was Andrew Morton, who said: "back in the day when I wrote it, it was quite good—and it still works, doesn't it?"
So, Reinecke started to reconsider the idea of converting buffer heads to folios, but there are a number of problems that need to be solved. For one thing, buffer heads and iomap are fundamentally incompatible. For example, there is a void pointer in the page structure that either points to a buffer head or an iomap structure, depending on which is being used; when looking at a page in the page cache, it is important to know which you have. The "mix and match approach" needs to be considered carefully. Reviewing the changes will be difficult, he said, because dependencies on PAGE_SIZE are hard to spot.
All of that starts to make him wonder whether the overarching goal of I/O using larger block sizes is really worth all of this effort. "I think it is ... but that's just me." But he does know that databases really want to be able to do larger I/Os and the hope is that supporting larger I/Os will be more efficient for filesystems as well. For the most part, filesystems already do I/O in larger chunks. Beyond those benefits, the drive vendors would like to use larger blocks for efficiency, capacity, and, ultimately, cheaper devices.
Progress
Reinecke had been working away on his patches and finished his patch set the previous week. As sometimes happens in the open-source world, though, another implementation surfaced around the same time. Luis Chamberlain and his colleagues at Samsung posted a different patch set that covers much of the same ground. In the talk, Reinecke said that he was presenting his own patches to solve these problems, but that he would be working with the folks from Samsung on combining the two approaches in the near future.
The overall idea is to switch buffer heads to be attached to a folio rather than to a page. That way, all of the I/O would still be smaller than the attached unit, so the assumptions in the buffer heads code would still be met. The folio would have a pointer to a single buffer head or to a list of buffer heads. There are some things that need to be kept in mind with this conversion; foremost is that the memory-management subsystem still works in units of PAGE_SIZE, while the page cache and buffer cache have moved to folios.
But, in order to do I/O, buffer heads use the bio mechanism, which operates in 512-byte blocks. That is effectively hardwired throughout the block layer and its drivers—it is not something that can be changed without enormous effort, he said. But the actual I/O is handled by the lower-layer drivers, which already merge adjacent blocks into larger units. So the folios in the page cache can be handed to the block layer, which will enumerate them in 512-byte blocks and hand the results to the drivers, which will reassemble them into larger units. It all "should just work", even though it is not really the obvious way to attack the problem.
So that is the core of what his patch set does. There was still other work to do, of course, including auditing the page cache to ensure that it is allocating folios of the size used by the underlying drive and to ensure that it is incrementing in folio-size steps, not by pages. He also needed to add an interface for the block drivers to report their block size to the page cache. It all worked well, perhaps even too well, since NFS wanted 128MB blocks—and got them—at least until the virtual machine hit an out-of-memory condition. That particular test "neatly proved that all large blocks leads to a higher memory fragmentation" if such a proof was actually needed.
Done yet?
While it is great that these patches enable the kernel to talk to drives with large block sizes, there is still a problem: there are no drives with large block sizes "because no one can talk to them". He has patches to update the block ramdisk driver (brd) to support larger blocks for testing purposes. That driver could then be used as the backing device for an NVMe target so that it could be tested with large block sizes. "That was quite cool but, of course, there is still some testing needed."
There are still some pieces needed as well. QEMU needs to be updated to support large block sizes, the drivers need to be exercised using them, and other subsystems, such as SCSI, need to be tested. Beyond that, unification with the Samsung work will be required. Once that is all in hand, there will be reviews and the fallout from those to deal with as well before this work can go upstream.
The memory-fragmentation issue is one that is still unresolved. Systems may well have devices with different block sizes in the future; 16KB should not be a major problem in this regard, but even larger block sizes are possible down the road. The memory-management layer continues to work in page-size chunks, which will lead to additional fragmentation. If systems could switch to using memory at the same granularity as the larger blocks, all would be well—but that assumes there is only one large block size, which may well not be true.
One possible solution, which may be worth doing in its own right, is to switch the SLUB allocator to use higher-order folios, rather than page granularity. Then if alloc_page() users were converted to use SLUB, it would remove the fragmentation problem for allocations. Once again, though, that relies on there being a single large block size. He would be interested in hearing other ideas for improving the fragmentation situation in the presence of larger block sizes.
He closed his talk with a suggestion "in case you are really bored": there is still the block layer and its 512-byte orientation that could be improved. Switching the block layer to use folios is not something for the faint of heart, but it should be doable, he thinks. The bio structure does not store the data directly, but uses a struct bio_vec for the data in a vectorized form. Those could perhaps be converted to use folios instead of pages, though there are some 4,000 uses of bio_vec in the block layer.
[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Bilbao for OSSEU.]
Revisiting the kernel's preemption models (part 1)
All that Ankur Arora seemingly wanted to do with this patch set was to make the process of clearing huge pages on x86 systems go a little faster. What resulted was an extensive discussion on the difficulties of managing preemption correctly in the kernel. It may be that some changes will come to the plethora of preemption models that the kernel currently offers.
Fast memory clearing
The patch set in question adds a function to use x86 string instructions to clear large amounts of memory more quickly; the change produces some nice performance improvements for situations (such as page-fault handling) where ranges of memory must be zeroed out before being given to the faulting process. But there is one little problem with this approach: being able to clear large ranges with a single instruction is nice, but that one instruction can execute for a long time — long enough to create unwanted latencies for other processes running on the same CPU.
Excess latency caused by long-running operations is not a new problem for the kernel. The usual response is to break those operations up, inserting cond_resched() calls to voluntarily give up the CPU for a bit if a higher-priority process needs to run. It is, however, not possible to insert such calls into a single, long-running instruction, so some other mitigation is needed. Arora chose to add a new task flag (TIF_ALLOW_RESCHED) marking the current task as being preemptible. If the kernel, at the end of handling an interrupt, sees that flag, it knows that it can switch to a higher-priority task if need be. This new flag could be set before clearing the pages, then reset afterward.
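The usage pattern would look roughly like the sketch below; the helper names allow_resched() and disallow_resched(), along with the clearing primitive, are placeholders for illustration rather than the patch set's actual interface.

```c
/*
 * Hypothetical sketch of the TIF_ALLOW_RESCHED pattern described above;
 * the helpers and the clearing primitive are placeholder names.
 */
static void clear_huge_page_fast(void *addr, unsigned long size)
{
	allow_resched();		/* set TIF_ALLOW_RESCHED: interrupt return may preempt us */
	clear_pages_string(addr, size);	/* one long-running string instruction */
	disallow_resched();		/* clear the flag again */
}
```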
This mechanism turned out to have some problems. The code setting the flag may be preemptible, but other functions it calls may not be. Other events, such as hardware interrupts or CPU traps (page faults, for example) could put the kernel into a non-preemptible situation as well. Having a flag set that marks the current task as being preemptible anyway is just not going to lead to good things.
It looks like making this idea work with current kernels would require moving away from a task flag, and toward marking specific ranges of code as being preemptible in this way. That could limit how widely this feature could be used, though, since finding out whether the current location is preemptible would require maintaining a data structure of the preemptible ranges and searching it, in the interrupt path, to see if preemption is possible. That made Linus Torvalds a little unhappy:
I really hate that, because I was hoping we'd be able to use this to not have so many of those annoying and random "cond_resched()" calls. [...] I was hoping that we'd have some generic way to deal with this where we could just say "this thing is reschedulable", and get rid of - or at least not increasingly add to - the cond_resched() mess.
Peter Zijlstra pointed out that Torvalds was describing full preemption, which the kernel already supports quite well. That led to a bit of a shift in the discussion.
Preemption models
The traditional Unix model does not allow for preemption of the kernel at all; once the kernel gets the CPU, it keeps executing until it voluntarily gives the CPU up. In the beginning, Linux followed this model as well; over the years, though, the kernel has gained a number of preemption modes, selectable with a configuration option:
- PREEMPT_NONE is the traditional model, with no preemption at all. The kernel must give up the CPU, via a return to user space, a blocking operation, or a cond_resched() call, before another task can run.
- PREEMPT_VOLUNTARY increases (significantly) the number of points where the kernel is said to be voluntarily giving up the CPU. Each call to might_sleep(), which is otherwise a debugging function marking functions that could block, becomes a preemption point, for example.
- PREEMPT is full preemption; the kernel can be preempted at any point where some factor (such as holding a spinlock or explicitly disabling preemption) does not explicitly prevent it.
- PREEMPT_RT is the realtime preemption mode, where even most spinlocks become preemptible and a number of other changes are made as well.
These options represent different tradeoffs between throughput and latency. Preemption is not free; it can worsen cache behavior, and tracking the state needed to know whether preemption is safe at any given time has costs of its own. But latency hurts as well, especially for interactive use. At the PREEMPT_NONE end of the scale, only throughput matters, and latencies can be long. As the level of preemption increases, latency is reduced, but throughput might suffer as well.
As an extra complication, another option, PREEMPT_DYNAMIC, was added to the 5.12 kernel by Michal Hocko in 2021. It allows the preemption choice to be deferred until boot time, where any of the modes except PREEMPT_RT can be selected by the preempt= command-line parameter. PREEMPT_DYNAMIC allows distributors to ship a single kernel while letting users pick the preemption mode that works best for their workload.
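As a quick example (assuming a recent kernel with debugfs mounted), the model can be chosen on the kernel command line and, with PREEMPT_DYNAMIC, inspected or changed at run time:

```
# Boot parameter (kernel command line):
preempt=full                  # "none" and "voluntary" are also accepted

# At run time (as root), with PREEMPT_DYNAMIC; the current mode is in parentheses:
cat /sys/kernel/debug/sched/preempt
none voluntary (full)
echo voluntary > /sys/kernel/debug/sched/preempt
```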
Torvalds, seemingly looking closely at PREEMPT_DYNAMIC for the first time, observed that it maintains all of the information about whether the current task is preemptible, even when running in the no-preemption modes. As Zijlstra responded, that suggests that the overhead of maintaining that information is not seen as being a problem; Ingo Molnar added that, while it might be nice to patch that overhead out, "it's little more than noise on most CPUs, considering the kind of horrible security-workaround overhead we have on almost all x86 CPU types". That overhead, he said, is less of a concern than preemption causing "material changes to a random subset of key benchmarks that specific enterprise customers care about", so PREEMPT_DYNAMIC works well as it is.
Zijlstra also said that, since PREEMPT_DYNAMIC seems to work for distributors, he is open to removing the other options. While the connection wasn't made in the conversation, doing so might solve the original problem as well. If the kernel is always maintaining the information needed to know when preemption is safe, that information can be used for a safe implementation of TIF_ALLOW_RESCHED. It may not come to that, though; the conversation is ongoing and some more significant changes to preemption are being considered; stay tuned for the second part of this series once the dust settles a bit.
User-space spinlocks with help from rseq()
Back in May, André Almeida presented some work toward the creation of user-space spinlocks using adaptive spinning. At that time, the work was stalled because there is, in Linux, currently no way to quickly determine whether a given thread is actually executing on a CPU. Some progress has since been made on that front; at the 2023 Open Source Summit Europe, Almeida returned to discuss how that difficulty might be overcome.
He started with an overview of locking primitives and how spinlocks, in particular, work. In short, a spinlock is so-named because, if an attempt to acquire a lock fails, the code will recheck its status in a loop (thus "spinning") until the lock becomes available. Spinlocks are relatively easy to implement in the kernel because, by the rules under which spinlocks operate, the holder of a lock is known to be running on a CPU somewhere in the system and should release it soon; that ensures that the CPU time lost to spinning will be small.
In user space, the story is more complex. One thread might be spinning on a lock while the holder has been preempted and isn't running at all. In such cases, the lock will not be released soon, and the spinning just wastes CPU time. In the worst case, the thread that is spinning may be the one that is keeping the lock holder from running, meaning that the spinning thread is actively preventing the lock it needs from being released. In such situations, the code should simply stop spinning and go to sleep until the lock is released.
Doing that, though, requires a way for the lock-acquisition code to know that the lock owner is not running. One could add a system call for that purpose, but system calls are expensive; in this case, the system-call overhead might easily overwhelm the time spent in the critical section protected by the lock. If it is necessary to call into the kernel, it is better to just block until the lock is released. What is really needed is a way to gain that information without making a system call.
In the May discussion, the idea of using the restartable sequences feature to gain that information came up. This subsystem has hooks into the scheduler to track events like task preemption; it also uses a shared-memory segment to communicate some of that information to user space. Perhaps restartable sequences could be employed to solve this problem as well?
The maintainer of the restartable sequences code, Mathieu Desnoyers, quickly responded with a patch to implement this functionality. This patch adds a new structure member to the rseq struct that is shared between the kernel and user space:
    struct rseq_sched_state {
        /*
         * Version of this structure. Populated by the kernel, read by
         * user-space.
         */
        __u32 version;
        /*
         * The state is updated by the kernel. Read by user-space with
         * single-copy atomicity semantics. This field can be read by any
         * userspace thread. Aligned on 32-bit. Contains a bitmask of enum
         * rseq_sched_state_flags. This field is provided as a hint by the
         * scheduler, and requires that the page holding this state is
         * faulted-in for the state update to be performed by the scheduler.
         */
        __u32 state;
        /*
         * Thread ID associated with the thread registering this structure.
         * Initialized by user-space before registration.
         */
        __u32 tid;
    };
The state field, which holds a set of flags describing the execution state of the process in question, is the key here. There is only one flag, RSEQ_SCHED_STATE_FLAG_ON_CPU, defined. Whenever the thread associated with this structure is placed onto a CPU for execution, this flag will be set; if the thread stops running for any reason, the flag is cleared again.
This information is enough for the implementation of adaptive spinning in user space. If an attempt to acquire a spinlock fails, the first step is to check the rseq_sched_state of the thread holding the lock (this implicitly requires that this communication is happening between threads that can access each other's restartable-sequences state). If that check shows that the thread is running, then it makes sense to spin waiting for the lock to be freed (with a check inside the loop, of course, to detect the case where the holder is subsequently preempted). Otherwise, a system call is made to simply block until the lock is freed.
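Put together, the acquisition path might look something like the sketch below, which assumes the proposed rseq_sched_state structure shown above is available to both threads; the lock-word layout and the futex fallback are invented for the example, and details such as publishing the owner's state pointer and waking waiters on unlock are omitted.

```c
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* From the proposed patch; the structure itself is shown above. */
#define RSEQ_SCHED_STATE_FLAG_ON_CPU	(1U << 0)

struct adaptive_lock {
	_Atomic int locked;			/* 0 = free, 1 = held */
	struct rseq_sched_state *owner_state;	/* holder's shared rseq state */
};

static int owner_on_cpu(const struct adaptive_lock *lock)
{
	const struct rseq_sched_state *st = lock->owner_state;

	/* Should be a single-copy-atomic read of st->state in real code. */
	return st && (st->state & RSEQ_SCHED_STATE_FLAG_ON_CPU);
}

static void adaptive_lock_acquire(struct adaptive_lock *lock)
{
	for (;;) {
		int expected = 0;

		if (atomic_compare_exchange_weak(&lock->locked, &expected, 1))
			return;			/* got the lock */

		if (owner_on_cpu(lock))
			continue;		/* holder is running: keep spinning */

		/* Holder is off-CPU: block until the lock word changes. */
		syscall(SYS_futex, &lock->locked, FUTEX_WAIT, 1, NULL, NULL, 0);
	}
}
```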
That said, Almeida concluded by saying that he is still not entirely sure if this idea lives up to its potential. There is work to be done to optimize cache behavior, integrate adaptive spinning into the POSIX threads locking primitives, and do a lot of benchmarking work. But the approach appears to have promise, and the rest is just work.
[Thanks to the Linux Foundation for supporting our travel to this event.]
The PuzzleFS container filesystem
The last year or so has seen the posting of a few new filesystem types that are aimed at supporting container workloads. PuzzleFS, presented at the 2023 Kangrejos gathering by Ariel Miculas, is another contender in this area, but it has some features of its own, including a novel compression mechanism and an implementation written in Rust.

PuzzleFS, Miculas began, is an immutable (and thus read-only) filesystem that shares design goals with the Open Container Initiative (OCI) v2 image specification. It uses content-defined chunking (discussed further shortly) and a content-addressed data store, with file data and metadata stored separately from each other. The project was started by Tycho Andersen in 2021 as an attempt to create a successor to atomfs.
The first version of the OCI image specification, he said, had a number of problems, many of which are described in this 2019 blog post by Aleksa Sarai. At the base of those problems is the dependence on tar archives to hold the layers in the filesystem. Tar, as it turns out, is not well suited to the container filesystem problem.
The format, he said, is defined poorly. It has no index; instead, there is just a header leading directly into the content. The compression mechanism used means that the filesystem image is not seekable; as a result, the entire filesystem must be decompressed even to extract one little file. There is no deduplication; even a small change means re-downloading the entire thing, though layers can be used to work around that problem to an extent. It is machine-dependent, in that directory entries can be shown in a different order on different systems. The lack of a canonical representation has led to a lot of extensions, many of which are solving the same problem.
PuzzleFS is intended to solve these problems. A filesystem image itself consists of a set of files placed on an underlying filesystem. As with the OCI image format, there is a top-level index.json file that contains a set of tags, each of which represents a versioned filesystem and points to a manifest file. The manifest file, in turn, points to an image configuration and the data stored in the actual image layers. Everything else is stored as a set of blobs in the blobs/sha256 directory.
Most data in the filesystem is broken into variable-size chunks, then stored as blobs using the SHA256 hash of the content as the file name. The chunking itself is done with the FastCDC algorithm, which finds "cut points" where a data stream can be split into blobs of varying sizes. Any given stream (the contents of a file, for example) might be split into five or 50 chunks, depending on how those cut points are determined; each chunk then lands as a separate blob under blobs/sha256, and its hash is added to the manifest.
The cut-point algorithm itself uses a sliding-window technique. The data is stepped through, byte by byte, and the hash of the last 48 bytes (for example) is calculated. If the N least-significant bits of that hash are zero, then a cut point has been found; the data up to that point is split off into a separate blob and the process starts anew.
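Here is a deliberately simplified C sketch of that sliding-window idea; it is not the real FastCDC algorithm (which uses a gear table and normalized chunking), and the window size and mask are arbitrary example values.

```c
#include <stddef.h>
#include <stdint.h>

#define WINDOW		48		/* bytes hashed at each position (example) */
#define CUT_MASK	0x1fffu		/* 13 low bits zero => roughly 8KB average chunks */

/* Toy hash of the last WINDOW bytes (recomputed here; real code rolls it). */
static uint32_t window_hash(const uint8_t *p)
{
	uint32_t h = 0;

	for (size_t i = 0; i < WINDOW; i++)
		h = h * 31 + p[i];
	return h;
}

/* Return the length of the chunk starting at data[0]. */
static size_t next_cut_point(const uint8_t *data, size_t len)
{
	for (size_t i = WINDOW; i < len; i++) {
		/* A cut point is wherever the low bits of the hash are zero. */
		if ((window_hash(data + i - WINDOW) & CUT_MASK) == 0)
			return i;
	}
	return len;	/* no cut point: the remainder is one chunk */
}
```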
This algorithm has some interesting characteristics, perhaps most notably its ability to perform deduplication and compression. Since each chunk is stored using its hash as the file name, chunks that are common to multiple files will automatically be shared. In traditional schemes, an update to a file will cause the entire new file to be stored; this is especially true if any bytes are inserted or deleted. Inserting a single byte into a traditionally compressed file will make the entire file after the insertion look different. With content-defined chunking, only the chunk containing the change will differ, while the rest of the file will contain the same chunks, perhaps at a different offset.
The results can be seen in an experiment that Miculas carried out. He downloaded ten different versions of Ubuntu 22.04 from the Docker Hub; they required 766MB to store in that form. Putting them into the OCI image format with compression reduced that size to 282MB. Placing them all into a PuzzleFS instance, instead, reduced the size to 130MB — without using compression. Adding compression cut the whole thing down to 53MB, a savings of 93% from the original size.
A goal of PuzzleFS was to always provide a canonical representation of the filesystem. Thus, for example, the traversal order of the source from which it is generated is defined, with both directories and extended attributes being sorted lexicographically. Another goal was direct mounting support. With tar-based formats, the files must first be extracted to an on-disk representation, creating a window where things could be changed before mounting the image. Thus, there is no guarantee that the files seen by the kernel are the same as those that were in the tar archive. PuzzleFS doesn't have that extraction step, so that problem does not exist.
Data integrity is an important objective in general. It is not possible to use dm-verity in this case to protect the whole volume; while the filesystem is immutable, the underlying data store is not, since adding a new version or layer requires the ability to add new data. So, instead, fs-verity is used to verify the integrity of the individual files in the data store. When mounting a specific image, the hash of the manifest of interest is given to the mount for verification.
An important objective behind this project was the avoidance of memory-safety bugs. For that reason, the filesystem implementation has been written in Rust. That choice has, he said, removed a lot of pain from the development process.
There is a working FUSE implementation, and a kernel implementation in a proof-of-concept state. The kernel side depends on a set of filesystem-interface abstractions that are being developed separately, and which should be headed toward the mainline at some point. Some other work is needed to get other dependencies, including the Cap'n Proto library used for metadata storage, into proper shape for the kernel. The work is progressing, though; interested folks can find the current code in this repository.
One topic Miculas did not address is the resemblance between PuzzleFS and composefs, which shares some similar goals. Composefs has run into difficulties getting into the mainline kernel — though other changes intended to support those goals are going in. PuzzleFS has some features that composefs lacks; whether those will be enough to make its upstream path easier is unclear.
See the slides from this talk for more information and details of the on-disk format.
[Thanks to the Linux Foundation for supporting our travel to this event.]
Brief items
Kernel development
Kernel release status
The current development kernel is 6.6-rc3, released on September 24. Linus said:
Unusually, we have a large chunk of changes in filesystems. Part of it is the vfs-level revert of some of the timestamp handling that needs to soak a bit more, and part of it is some xfs fixes. With a few other filesystem fixes too.
The multi-grain timestamp changes turned out to cause the occasional regression (timestamps that could appear to go backward) and were taken back out.
Stable updates: 5.10.196 was released with a single fix on September 21; 6.5.5, 6.1.55, 5.15.133, 5.10.197, 5.4.257, 4.19.295, and 4.14.326 followed on September 23.
Development
Firefox 118.0 released
Version 118.0 of the Firefox browser has been released. Changes include improved fingerprinting prevention and automated translation: "Automated translation of web content is now available to Firefox users! Unlike cloud-based alternatives, translation is done locally in Firefox, so that the text being translated does not leave your machine."
LibrePCB 1.0.0 Released
Version 1.0 of LibrePCB, a "free, cross-platform, easy-to-use electronic design automation suite to draw schematics and design printed circuit boards", has been released. As noted in a blog post back in May, a grant has helped spur development of the tool. The focus for the release has been on adding features that were needed so that "there should be no show stopper anymore which prevents you from using LibrePCB for more complex PCB [printed circuit board] designs". New features include a 3D viewer and export format for working with designs in a mechanical computer-aided design (CAD) tool, support for manufacturer part number (MPN) management, and lots of board editor features such as thermal relief pads in planes, blind & buried vias, keepout zones, and more. [Thanks to Alphonse Ogulla.]
Page editor: Jake Edge
Announcements
Newsletters
Distributions and system administration
Development
Meeting minutes
Calls for Presentations
CFP Deadlines: September 28, 2023 to November 27, 2023
The following listing of CFP deadlines is taken from the LWN.net CFP Calendar.
Deadline | Event Dates | Event | Location |
---|---|---|---|
September 28 | November 28 | NLUUG Fall Conference | Utrecht, The Netherlands |
October 2 | October 6 - October 8 | Qubes OS Summit | Berlin, Germany |
October 10 | October 28 | China Linux Kernel conference 2023 | Shenzhen, China |
November 1 | November 11 | Clang-built Linux Meetup | Richmond, VA, US |
November 1 | March 14 - March 17 | SCALE 21x | Pasadena, CA, US |
If the CFP deadline for your event does not appear here, please tell us about it.
Upcoming Events
Events: September 28, 2023 to November 27, 2023
The following event listing is taken from the LWN.net Calendar.
Date(s) | Event | Location |
---|---|---|
September 26 September 29 |
Alpine Linux Persistence and Storage Summit | Lizumerhuette, Austria |
October 1 | Hackday celebrating forty years of GNU at the Free Software Foundation | Boston, US |
October 3 October 5 |
PGConf NYC | New York, US |
October 5 October 6 |
PyConZA | Durban, South Africa |
October 6 October 8 |
Qubes OS Summit | Berlin, Germany |
October 6 October 8 |
LibreOffice Conf Asia x UbuCon Asia 2023 | Surakarta, Indonesia |
October 7 October 8 |
LibreOffice - Ubuntu Conference Asia 2023 | Surakarta, Indonesia |
October 15 October 17 |
All Things Open 2023 | Raleigh, NC, US |
October 17 | Icinga Camp Milan | Milan, Italy |
October 17 October 19 |
X.Org Developers Conference 2023 | A Coruña, Spain |
October 20 October 22 |
Linux Fest Northwest 2023 | Bellingham, WA, US |
October 21 October 23 |
openSUSE.Asia Summit 2023 | Chongqing, China |
October 24 October 26 |
Linux Foundation Member Summit | Monterey, CA, USA |
October 28 | China Linux Kernel conference 2023 | Shenzhen, China |
October 30 November 3 |
Netdev 0x17 | Vancouver, Canada |
November 3 November 4 |
Seattle GNU/Linux Conference 2023 | Seattle, US |
November 3 November 5 |
Ubuntu Summit | Riga, Latvia |
November 4 November 5 |
OpenFest 2023 | Sofia, Bulgaria |
November 7 November 9 |
Open Source Monitoring Conference | Berlin, Germany |
November 9 November 10 |
Fourth LibreOffice Latin America Congress | Mexico City, Mexico |
November 11 | Clang-built Linux Meetup | Richmond, VA, US |
November 13 November 15 |
Linux Plumbers Conference | Richmond, VA, US |
November 25 November 26 |
MiniDebConf Cambridge | Cambridge, UK |
If your event does not appear here, please tell us about it.
Security updates
Alert summary September 21, 2023 to September 27, 2023
Dist. | ID | Release | Package | Date |
---|---|---|---|---|
Debian | DSA-5504-1 | stable | bind9 | 2023-09-22 |
Debian | DLA-3579-1 | LTS | elfutils | 2023-09-23 |
Debian | DLA-3585-1 | LTS | exempi | 2023-09-25 |
Debian | DLA-3581-1 | LTS | flac | 2023-09-25 |
Debian | DLA-3582-1 | LTS | ghostscript | 2023-09-25 |
Debian | DLA-3583-1 | LTS | glib2.0 | 2023-09-25 |
Debian | DLA-3576-1 | LTS | gsl | 2023-09-21 |
Debian | DLA-3580-1 | LTS | libapache-mod-jk | 2023-09-24 |
Debian | DLA-3578-1 | LTS | lldpd | 2023-09-22 |
Debian | DSA-5505-1 | stable | lldpd | 2023-09-25 |
Debian | DLA-3574-1 | LTS | mutt | 2023-09-20 |
Debian | DLA-3584-1 | LTS | netatalk | 2023-09-25 |
Debian | DSA-5503-1 | stable | netatalk | 2023-09-20 |
Debian | DLA-3575-1 | LTS | python2.7 | 2023-09-20 |
Debian | DLA-3577-1 | LTS | roundcube | 2023-09-22 |
Fedora | FEDORA-2023-b427f54e68 | F37 | chromium | 2023-09-21 |
Fedora | FEDORA-2023-98dff7aae5 | F37 | curl | 2023-09-26 |
Fedora | FEDORA-2023-43ef9f5376 | F39 | curl | 2023-09-26 |
Fedora | FEDORA-2023-ae0176d606 | F37 | dotnet6.0 | 2023-09-22 |
Fedora | FEDORA-2023-18eae45792 | F38 | dotnet6.0 | 2023-09-22 |
Fedora | FEDORA-2023-92f4b53b3e | F37 | dotnet7.0 | 2023-09-22 |
Fedora | FEDORA-2023-8e848ac43f | F38 | dotnet7.0 | 2023-09-22 |
Fedora | FEDORA-2023-1b25579262 | F37 | golang-github-prometheus-exporter-toolkit | 2023-09-21 |
Fedora | FEDORA-2023-c1318fb7f8 | F38 | golang-github-prometheus-exporter-toolkit | 2023-09-21 |
Fedora | FEDORA-2023-1b25579262 | F37 | golang-github-xhit-str2duration | 2023-09-21 |
Fedora | FEDORA-2023-c1318fb7f8 | F38 | golang-github-xhit-str2duration | 2023-09-21 |
Fedora | FEDORA-2023-1b25579262 | F37 | golang-gopkg-alecthomas-kingpin-2 | 2023-09-21 |
Fedora | FEDORA-2023-c1318fb7f8 | F38 | golang-gopkg-alecthomas-kingpin-2 | 2023-09-21 |
Fedora | FEDORA-2023-00484b4120 | F38 | libppd | 2023-09-26 |
Fedora | FEDORA-2023-defb0a89ff | F37 | linux-firmware | 2023-09-26 |
Fedora | FEDORA-2023-4056a5c165 | F38 | linux-firmware | 2023-09-23 |
Fedora | FEDORA-2023-dd3ebcea25 | F39 | linux-firmware | 2023-09-26 |
Fedora | FEDORA-2023-be9d60ef35 | F37 | roundcubemail | 2023-09-24 |
Fedora | FEDORA-2023-b2e5612471 | F38 | roundcubemail | 2023-09-24 |
Fedora | FEDORA-2023-a7aba7e1b0 | F38 | thunderbird | 2023-09-24 |
Mageia | MGASA-2023-0263 | 8, 9 | curl | 2023-09-25 |
Mageia | MGASA-2023-0268 | 8 | file | 2023-09-25 |
Mageia | MGASA-2023-0266 | 8, 9 | firefox/thunderbird | 2023-09-25 |
Mageia | MGASA-2023-0267 | 8, 9 | ghostpcl | 2023-09-25 |
Mageia | MGASA-2023-0265 | 8, 9 | libtommath | 2023-09-25 |
Mageia | MGASA-2023-0264 | 8, 9 | nodejs | 2023-09-25 |
Oracle | ELSA-2023-5252 | OL8 | dmidecode | 2023-09-20 |
Oracle | ELSA-2023-5219 | OL8 | frr | 2023-09-20 |
Oracle | ELSA-2023-12825 | OL7 | kernel | 2023-09-24 |
Oracle | ELSA-2023-12826 | OL7 | kernel | 2023-09-25 |
Oracle | ELSA-2023-12825 | OL8 | kernel | 2023-09-24 |
Oracle | ELSA-2023-5244 | OL8 | kernel | 2023-09-22 |
Oracle | ELSA-2023-12824 | OL8 | kernel | 2023-09-24 |
Oracle | ELSA-2023-12836 | OL9 | kernel | 2023-09-25 |
Oracle | ELSA-2023-5353 | OL8 | libtiff | 2023-09-26 |
Oracle | ELSA-2023-5309 | OL8 | libwebp | 2023-09-21 |
Oracle | ELSA-2023-5214 | OL9 | libwebp | 2023-09-20 |
Oracle | ELSA-2023-5217 | OL7 | open-vm-tools | 2023-09-20 |
Oracle | ELSA-2023-5312 | OL8 | open-vm-tools | 2023-09-22 |
Oracle | ELSA-2023-5313 | OL9 | open-vm-tools | 2023-09-22 |
Oracle | ELSA-2023-12835 | OL7 | qemu | 2023-09-24 |
Oracle | ELSA-2023-5201 | OL8 | thunderbird | 2023-09-20 |
Oracle | ELSA-2023-5224 | OL9 | thunderbird | 2023-09-20 |
Oracle | ELSA-2023-5264 | OL8 | virt:ol and virt-devel:rhel | 2023-09-22 |
Red Hat | RHSA-2023:5353-01 | EL8 | libtiff | 2023-09-26 |
Red Hat | RHSA-2023:5309-01 | EL8 | libwebp | 2023-09-20 |
Red Hat | RHSA-2023:5360-01 | EL8 | nodejs:16 | 2023-09-26 |
Red Hat | RHSA-2023:5361-01 | EL8.6 | nodejs:16 | 2023-09-26 |
Red Hat | RHSA-2023:5362-01 | EL8 | nodejs:18 | 2023-09-26 |
Red Hat | RHSA-2023:5363-01 | EL9 | nodejs:18 | 2023-09-26 |
Red Hat | RHSA-2023:5312-01 | EL8 EL8.8 | open-vm-tools | 2023-09-20 |
Red Hat | RHSA-2023:5313-01 | EL9 | open-vm-tools | 2023-09-20 |
Slackware | SSA:2023-264-01 | | bind | 2023-09-21 |
Slackware | SSA:2023-264-02 | | cups | 2023-09-21 |
Slackware | SSA:2023-269-01 | | mozilla | 2023-09-26 |
Slackware | SSA:2023-264-03 | | seamonkey | 2023-09-21 |
SUSE | openSUSE-SU-2023:0270-1 | osB15 | Cadence | 2023-09-26 |
SUSE | SUSE-SU-2023:3739-1 | SLE12 | ImageMagick | 2023-09-22 |
SUSE | SUSE-SU-2023:3792-1 | oS15.4 | ImageMagick | 2023-09-26 |
SUSE | SUSE-SU-2023:3737-1 | MP4.3 SLE15 oS15.4 | bind | 2023-09-22 |
SUSE | SUSE-SU-2023:3796-1 | SLE12 | bind | 2023-09-26 |
SUSE | SUSE-SU-2023:3729-1 | SLE12 | busybox | 2023-09-22 |
SUSE | openSUSE-SU-2023:0275-1 | SLE12 | cacti, cacti-spine | 2023-09-26 |
SUSE | openSUSE-SU-2023:0275-1 | SLE12 osB15 | cacti, cacti-spine | 2023-09-26 |
SUSE | SUSE-SU-2023:3707-1 | MP4.2 MP4.3 SLE15 SLE-m5.2 SLE-m5.3 SLE-m5.4 SES7.1 oS15.4 oS15.5 | cups | 2023-09-20 |
SUSE | SUSE-SU-2023:3706-1 | OS9 SLE12 | cups | 2023-09-20 |
SUSE | SUSE-SU-2023:3755-1 | SLE12 | djvulibre | 2023-09-25 |
SUSE | SUSE-SU-2023:3734-1 | SLE12 | exempi | 2023-09-22 |
SUSE | SUSE-SU-2023:3762-1 | MP4.2 MP4.3 SLE15 SES7.1 oS15.4 | frr | 2023-09-25 |
SUSE | SUSE-SU-2023:3709-1 | SLE15 oS15.5 | frr | 2023-09-20 |
SUSE | SUSE-SU-2023:3683-2 | MP4.3 SLE15 SLE-m5.3 SLE-m5.4 oS15.4 | kernel | 2023-09-21 |
SUSE | SUSE-SU-2023:3785-1 | SLE15 SLE-m5.1 SLE-m5.2 | kernel | 2023-09-26 |
SUSE | SUSE-SU-2023:3600-2 | SLE15 SLE-m5.3 SLE-m5.4 oS15.4 | kernel | 2023-09-21 |
SUSE | SUSE-SU-2023:3704-2 | SLE15 SLE-m5.5 oS15.5 | kernel | 2023-09-21 |
SUSE | SUSE-SU-2023:3599-2 | SLE15 SLE-m5.5 oS15.5 | kernel | 2023-09-21 |
SUSE | SUSE-SU-2023:3728-1 | MP4.0 MP4.1 SLE15 oS15.4 | libqb | 2023-09-22 |
SUSE | SUSE-SU-2023:3727-1 | MP4.2 SLE15 | libqb | 2023-09-22 |
SUSE | SUSE-SU-2023:3738-1 | SLE12 | libssh2_org | 2023-09-22 |
SUSE | SUSE-SU-2023:3794-1 | OS8 OS9 SLE12 | libwebp | 2023-09-26 |
SUSE | SUSE-SU-2023:3712-1 | SLE15 SES7 | mariadb | 2023-09-20 |
SUSE | openSUSE-SU-2023:0257-1 | osB15 | modsecurity | 2023-09-25 |
SUSE | openSUSE-SU-2023:0269-1 | osB15 | modsecurity | 2023-09-26 |
SUSE | SUSE-SU-2023:3779-1 | SLE12 | netatalk | 2023-09-26 |
SUSE | SUSE-SU-2023:3795-1 | SLE12 | open-vm-tools | 2023-09-26 |
SUSE | SUSE-SU-2023:3710-1 | SLE15 oS15.5 | openvswitch3 | 2023-09-20 |
SUSE | openSUSE-SU-2023:0251-1 | oS15.4 | opera | 2023-09-23 |
SUSE | SUSE-SU-2023:3732-1 | SLE12 | postfix | 2023-09-22 |
SUSE | SUSE-SU-2023:3791-1 | oS15.4 | postfix | 2023-09-26 |
SUSE | openSUSE-SU-2023:0260-1 | osB15 | python-CairoSVG | 2023-09-25 |
SUSE | openSUSE-SU-2023:0272-1 | osB15 | python-CairoSVG | 2023-09-26 |
SUSE | openSUSE-SU-2023:0271-1 | osB15 | python-GitPython | 2023-09-26 |
SUSE | openSUSE-SU-2023:0259-1 | osB15 | python-GitPython | 2023-09-25 |
SUSE | SUSE-SU-2023:3730-1 | SLE12 | python | 2023-09-22 |
SUSE | SUSE-SU-2023:3731-1 | SLE12 | python36 | 2023-09-22 |
SUSE | SUSE-SU-2023:3708-1 | MP4.2 SLE15 SES7.1 oS15.4 oS15.5 | python39 | 2023-09-20 |
SUSE | SUSE-SU-2023:3721-1 | SLE15 oS15.4 | qemu | 2023-09-21 |
SUSE | SUSE-SU-2023:3793-1 | SLE12 | quagga | 2023-09-26 |
SUSE | SUSE-SU-2023:3711-1 | SLE15 oS15.5 | redis7 | 2023-09-20 |
SUSE | openSUSE-SU-2023:0253-1 | osB15 | renderdoc | 2023-09-25 |
SUSE | SUSE-SU-2023:3714-1 | MP4.0 MP4.1 MP4.2 MP4.3 SLE15 oS15.4 oS15.5 | rubygem-rails-html-sanitizer | 2023-09-20 |
SUSE | SUSE-SU-2023:3722-1 | MP4.3 SLE15 oS15.4 oS15.5 | rust, rust1.72 | 2023-09-21 |
SUSE | SUSE-SU-2023:3713-1 | SLE15 SES7 | skopeo | 2023-09-20 |
SUSE | openSUSE-SU-2023:0267-1 | osB15 | tcpreplay | 2023-09-26 |
SUSE | SUSE-SU-2023:3753-1 | MP4.3 SLE15 oS15.4 oS15.5 | webkit2gtk3 | 2023-09-25 |
SUSE | SUSE-SU-2023:3790-1 | oS15.4 oS15.5 | wire | 2023-09-26 |
SUSE | SUSE-SU-2023:3778-1 | MP4.2 MP4.3 SLE15 oS15.4 oS15.5 | wireshark | 2023-09-26 |
SUSE | SUSE-SU-2023:3735-1 | SLE12 | xrdp | 2023-09-22 |
Ubuntu | USN-6190-2 | 14.04 16.04 18.04 | accountsservice | 2023-09-25 |
Ubuntu | USN-6390-1 | 20.04 22.04 23.04 | bind9 | 2023-09-20 |
Ubuntu | USN-6391-2 | 16.04 18.04 | cups | 2023-09-21 |
Ubuntu | USN-6361-2 | 16.04 18.04 | cups | 2023-09-26 |
Ubuntu | USN-6391-1 | 20.04 22.04 23.04 | cups | 2023-09-20 |
Ubuntu | USN-6360-2 | 14.04 16.04 18.04 | flac | 2023-09-22 |
Ubuntu | USN-6395-1 | 23.04 | gnome-shell | 2023-09-21 |
Ubuntu | USN-6393-1 | 14.04 16.04 18.04 20.04 | imagemagick | 2023-09-21 |
Ubuntu | USN-6392-1 | 23.04 | libppd | 2023-09-20 |
Ubuntu | USN-6396-1 | 16.04 18.04 | linux, linux-aws, linux-aws-hwe, linux-azure, linux-azure-4.15, linux-gcp, linux-gcp-4.15, linux-hwe, linux-oracle | 2023-09-26 |
Ubuntu | USN-6387-2 | 18.04 20.04 | linux-bluefield, linux-raspi, linux-raspi-5.4 | 2023-09-26 |
Ubuntu | USN-6397-1 | 20.04 | linux-bluefield | 2023-09-26 |
Ubuntu | USN-6365-2 | 16.04 18.04 | open-vm-tools | 2023-09-25 |
Ubuntu | USN-6394-1 | 16.04 | python3.5 | 2023-09-21 |
Kernel patches of interest
Kernel releases
Architecture-specific
Build system
Core kernel
Development tools
Device drivers
Device-driver infrastructure
Filesystems and block layer
Memory management
Networking
Security-related
Virtualization and containers
Miscellaneous
Page editor: Jonathan Corbet