LWN.net Weekly Edition for February 12, 2026
Welcome to the LWN.net Weekly Edition for February 12, 2026
This edition contains the following feature content:
- Evolving Git for the next decade: major changes coming soon to everyone's favorite version-control system.
- Kernel control-flow-integrity support comes to GCC: a look at Kees Cook's work to bring a much-needed security feature to GCC.
- Modernizing swapping: the end of the swap map: making the swap subsystem simpler and faster.
- Development statistics for 6.19: contribution numbers for the 6.19 kernel and an analysis of how many first-time kernel contributors stick around.
- FOSS in times of war, scarcity, and AI: a FOSDEM keynote on the problems that geopolitics, dangerous allies, and LLMs pose for free-software communities.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Evolving Git for the next decade
Git is ubiquitous; in the last two decades, the version-control system has truly achieved world domination. Almost every developer uses it and the vast majority of open-source projects are hosted in Git repositories. That does not mean, however, that it is perfect. Patrick Steinhardt used his main-track session at FOSDEM 2026 to discuss some of its shortcomings and how they are being addressed to prepare Git for the next decade.
Steinhardt said that he began to be interested in open-source software around 2002, when he was 11 years old. He bought his first book on programming when he was 12, and made his first contribution to an open-source project in 2011. He became a Git and libgit2 contributor in 2015, has been a backend engineer at GitLab since 2020, and became the manager of the Git team there in 2024.
Git must evolve
Git turned 20 last year; there are millions of Git repositories and even more scripts depending on Git. "The success of Git is indeed quite staggering." However, the world has changed quite a bit since Git was first released in 2005; it was designed for a different era. When Git was released, SHA-1 was considered to be a secure hash function; that has changed, he said, with the SHAttered attack that was announced in 2017 by Centrum Wiskunde & Informatica (CWI) and Google. In 2005, the Linux kernel repository was considered big; now it is dwarfed by Chromium and other massive monorepos. Continuous-integration (CI) pipelines were the exception in 2005, he said; now projects have pipelines with lots of jobs that are kicked off every time there is a new commit.
Also, Steinhardt said to general laughter: "Git was very hard to use back then; but to be quite honest, Git's still hard to use nowadays." So, the world has changed and Git needs to change with it. But, he said, the unique position of Git means that it can't have a revolution; too many projects and developers rely on it. Instead, it needs to evolve, and he wanted to highlight some of the important transitions that Git is going through.
SHA-256
The most user-visible change that Git is going through today, he said, is the SHA-256 transition. SHA-1 is a central part of the project's design; every single object stored in Git, such as files (blobs), directory trees, and commits, has an identity that is computed by hashing the contents of the object. Objects are content-addressable: "given the contents, you know the name of the object". That name, of course, is computed using the no-longer-secure SHA-1.
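The content-addressing scheme is easy to demonstrate outside of Git. A blob's object ID is the hash of a short header (the object type, the content length, and a NUL byte) followed by the contents; this small sketch mirrors what "git hash-object" computes for blobs, for both the SHA-1 format and the SHA-256 format that the transition introduces:

```python
import hashlib

def git_blob_oid(content: bytes, algo: str = "sha1") -> str:
    # A blob's object ID is the hash of "blob <size>\0" + content.
    header = b"blob %d\x00" % len(content)
    return hashlib.new(algo, header + content).hexdigest()

# The well-known object ID of the empty blob in a SHA-1 repository:
print(git_blob_oid(b""))            # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
# The same content gets a 64-character ID in a SHA-256 repository:
print(git_blob_oid(b"", "sha256"))
```

Because the hash covers the contents, any change to the object, malicious or otherwise, changes its name; that is exactly the property that a collision attack on SHA-1 undermines.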
The work by CWI and Google proved that attacks on SHA-1 are viable. It requires a lot of compute, about 110 years worth of single-GPU computation, but it is possible. He noted that, with all the hype around artificial intelligence, data centers have greatly increased their GPU capacity. "It is very much in reach of a large player to compute hash collisions".
SHAttered kicked off quite a few conversations on the Git mailing lists. During these conversations, he said, it has been asserted that the use of SHA-1 is not primarily for security, and a number of arguments have been made to back that up. The SHA-1 object hash is primarily used as an integrity check to detect transmission errors or bit flips. Also, source code is transparent: "if you see a merge request where somebody enters random collision data into your code, then you might probably ask some questions". Additionally, there are other security measures, such as GPG signatures, HTTPS transport, and a web of trust among developers, that mean Git does not rely on SHA-1 alone.
"But the reality is that things are a little bit more complicated", Steinhardt said. Git may not rely on SHA-1 for security, but everyone else does. When developers sign a commit with Git, for example, it is the SHA-1 hash that is signed. It might be noticeable if source code is changed to cause a collision, but binary blobs such as firmware are not human-readable, so there is no easy way to see that a malicious file has been substituted. Tooling around Git also assumes collision resistance, so CI systems, scripts, and such all trust the SHA-1 hash.
Finally, various government and enterprise requirements have mandated removal of SHA-1 by 2030, so Git needs to move on. And it has: SHA-256 support was added in October 2020, with version 2.29. "But nobody is using it", Steinhardt said, because ecosystem support is lacking. "Unfortunately, this situation looks somewhat grim".
There is full support in Git itself, in the Dulwich Python implementation, and in the Forgejo collaboration platform. There is experimental support for SHA-256 in GitLab, go-git, and libgit2. Other popular Git tools and forges, including GitHub, have no support for SHA-256 at all. That creates a chicken-and-egg problem, he said: nobody is moving to SHA-256 because it is not supported by the large forges, and the large forges are not implementing support because there is no demand.
The problem, Steinhardt said, is that we cannot wait forever. It will become more and more feasible to break SHA-1, and the next cryptographic weakness may be just around the corner. Even if there were full support for SHA-256 today, projects would still need time to migrate. Git will make SHA-256 the default for newly created repositories in 3.0, he said; the hope is to force forges and third-party implementations to adapt. "The transition will likely not be an easy one, and it may result in a few hiccups along the road." When 3.0 will be released is still up in the air; a discussion about its release date on the Git mailing list in October 2025 did not result in a firm decision.
He said that the audience could help to move things along. "You can show your favorite code forges that you care about SHA-256 so they bump the priority." He also encouraged people to help by testing SHA-256 with new projects and adding support to third-party tools that depend on Git. "Together, we can hopefully get the ecosystem to move before the next vulnerability".
Reftables
Another significant shift for Git, which he declared his favorite topic for discussion, is the move to reftables. By default, Git stores references as "loose" references, where each is stored as a separate file such as "refs/heads/main". The format for these files is straightforward to understand, he noted, but storing every single reference as a file does not scale well. It is fine for a project with a handful of references, but if there are hundreds or thousands then it becomes really inefficient.
To deal with that inefficiency today, Git will create a packed-refs file; this can be done manually with "git pack-refs --all", but Git will also do it automatically. However, Steinhardt said, Git still needs to change the way it deals with references.
The first reason he gave is that "filesystems are simply weird". Many filesystems, for example, are case-insensitive by default; that means, as just one example, that Git cannot have two branches whose names differ only in case. It is also an inefficient design, he said: to create 20 different references, Git has to create 20 different files. That may not take long from a performance perspective, but each reference occupies 4KB of storage on typical filesystems. That begins to add up quickly.
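The case-insensitivity problem can be modeled directly: loose references are files, and a case-insensitive filesystem treats two paths differing only in case as the same file. A toy sketch of the resulting collision, assuming a dictionary keyed on casefolded names stands in for such a filesystem:

```python
# Model loose references as one file per reference name, stored on a
# filesystem that compares path names case-insensitively.
def store_loose_refs(names):
    files = {}
    for name in names:
        key = name.casefold()        # stand-in for case-insensitive lookup
        if key in files and files[key] != name:
            raise FileExistsError(f"{name!r} collides with {files[key]!r}")
        files[key] = name
    return files

store_loose_refs(["refs/heads/main", "refs/heads/fix-1"])   # fine
try:
    store_loose_refs(["refs/heads/Fix", "refs/heads/fix"])  # collides
except FileExistsError as e:
    print(e)
```

A binary reference store, like reftables, sidesteps the problem entirely because reference names never become filesystem paths.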
Packed references are computationally expensive, he said, which is not a problem if a project only has a few references. "But, Git users are not always reasonable." He said that GitLab hosts one repository with about 20-million references; each time a reference is deleted, the packed-refs file has to be completely rewritten, which means rewriting 2GB of data. "To add insult to injury, this repository typically deletes references every couple seconds."
The third problem Steinhardt described is that concurrency is an afterthought: it is impossible to get a consistent view of all references when there are multiple readers and writers in a repository at the same time. When one user writes to a repository while another is reading the references, the reader cannot know whether it is seeing a consistent result or a mixture of the old and new state.
Those problems have been known for a long time, he said, and that is where the reftable backend comes into the picture. Users can create a new repository with the reftable backend today. References are stored in a binary format rather than the text-based one, which is more efficient, though it does mean that the files are no longer human-readable. The new data structure also allows Git to perform atomic updates when writing references, and Git is no longer subject to filesystem limitations when it comes to naming references.
As with SHA-256, reftables will become the default in Git 3.0. "So if you use Git in scripts or on the server side, you should make sure you don't play weird games by accessing references directly on the filesystem". Instead, Git users should always access references with the git command.
Large files
Steinhardt said that, for most of the people in the room, the scalability problems related to references were mostly theoretical and rarely encountered in practice. When it comes to scalability bottlenecks, "the more important problem tends to be large files". Storing large binary files in Git is, unfortunately, not a use case that is well-supported today. There are third-party workarounds, such as Git LFS and git-annex, but the Git project would like to solve the problem directly.
Large files are a problem for Git because of the way that it compresses objects, he said. Compression works extremely well for text files, such as source code, because that is what Git was designed for. But it does not work well for binary files, and even small edits to such files mean creating entirely new objects.
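The asymmetry is easy to observe with the same compression family Git uses (zlib): repetitive text shrinks dramatically, while high-entropy binary data, such as already-compressed media or firmware images, does not shrink at all. This is only an illustration of the general point, not of Git's actual packfile machinery, which also applies delta compression between objects:

```python
import os
import zlib

# Repetitive source-like text vs. incompressible (random) binary data.
text = b"int add(int a, int b) { return a + b; }\n" * 1000
binary = os.urandom(len(text))      # stand-in for compressed media/firmware

for label, data in (("text", text), ("binary", binary)):
    compressed = zlib.compress(data)
    print(f"{label}: {len(data)} -> {len(compressed)} bytes")
```

The text collapses to a tiny fraction of its size, while the random data stays essentially as large as the input (zlib falls back to stored blocks plus a little framing overhead).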
Another problem is that, when cloning a repository, the user gets a full copy of all of its history by default. That is desirable, he said, for normal repositories; but for large monorepos with binary files, "you probably don't want to download hundreds of gigabytes of data". In addition, there is no support for resuming a cloning operation: if it fails, the user has to start over. "So if you have downloaded 400GB out of a 500GB repository and your network disconnects, then you will have to redownload everything."
Code forges also struggle with large files. Users can resort to partial clones to avoid downloading an entire repository, but forges do not have that luxury; the consequence is significant storage costs. He said that an analysis of GitLab's hosted repositories showed that 75% of the site's storage space is consumed by binary files larger than 1MB. Huge repository sizes also cause repository maintenance to become computationally expensive. Other types of web sites might offload large files to content-delivery networks (CDNs), but that is not an option for Git forges, he said. "All data needs to be served by the Git server, and that makes it become a significant bottleneck." Large objects are a significant cost factor for any large Git provider.
Git LFS and partial clones can help users, but those are just band-aids, Steinhardt said. Even though partial clones have been a feature in Git for quite a while, "I bet many of you have never used them before". And even when users do use partial clones, servers still cannot offload the files to a CDN.
The solution is large-object promisors, a remote that is used only to store large blobs and is separate from the main remote that stores other Git objects and the rest of the repository. The functionality is now built directly into Git, and is transparent to the client, he said.
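Conceptually, a large-object promisor acts as a second object store that is consulted on demand. The toy model below (the object names, the fallback function, and the size threshold are all invented for illustration; this is not Git's API) shows the lookup order a client might follow: ordinary objects come from the main remote, and oversized blobs are fetched lazily from the promisor remote:

```python
# Invented toy model of a large-object promisor; not Git's real API.
LARGE_THRESHOLD = 1024 * 1024       # 1MB, an arbitrary cutoff for the demo

main_remote = {"commit1": b"tree ...", "small-blob": b"source code"}
promisor_remote = {"big-blob": b"\x00" * (2 * LARGE_THRESHOLD)}

def fetch_object(oid):
    # The client tries the main remote first, then falls back to the
    # promisor remote for objects the main remote chose not to store.
    if oid in main_remote:
        return main_remote[oid]
    return promisor_remote[oid]     # could be a CDN or S3 bucket in practice

print(len(fetch_object("small-blob")))
print(len(fetch_object("big-blob")))
```

The point of building this into Git itself is that the fallback is invisible to the user; scripts and tools simply see a complete repository.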
In addition, large-object promisors could be served over protocols other than HTTPS and SSH. That would allow, for example, serving large objects via the S3 API. "This allows us to offload objects to a CDN and store large blobs in a format that is much better suited for them".
Even with promisors, though, Steinhardt said that Git still does not handle binary files efficiently on the client side. "This is where pluggable object databases come into play, which will allow us to introduce a new storage format for a large binary file specifically." Git needs a format designed for binaries, he said, where incremental changes to a binary file only lead to a small storage increase. It needs to be efficient for any file size.
In addition, a new format would need to be compatible with Git's existing storage format, so that users could mix and match: the old format for text files and the new format for large binaries. Git's storage format is "deeply baked in", he said, but alternate implementations like libgit2 and go-git already have pluggable storage backends. "So there is no fundamental reason why Git can't do this too. It requires a lot of plumbing and refactoring, but it's certainly a feasible thing."
The two efforts to handle large objects, promisors and pluggable object databases, are progressing in parallel. The promisors effort is farther along, with the initial protocol implementation shipped in Git 2.50, and additional features in Git 2.52, both released in 2025. He said that it is quite close to being usable on the client side, though when support for promisors will arrive in Git forges is still undetermined.
The pluggable-object-database work is not that far along, he said. Over the past few Git releases, the project has spent significant time refactoring how Git accesses objects. In 2.53, which was released a few days after his talk, Git shipped a unified object-database interface that will make it easier to change the format in the future. He said that he expected a proof of concept in Git 2.54, though implementing a viable format for binary files "will probably take a little bit longer".
User-interface improvements
One area of Git that tends to draw plenty of complaints is its user interface, he said. Many of Git's commands are extremely confusing, and some workflows "are significantly harder than they have any right to be". Recently, Git has had competition in the form of the Jujutsu version-control project, which has made the Git project take a hard look at what it is doing. (LWN covered Jujutsu in January 2024.)
Jujutsu is a Git-compatible, Rust-based project started by Martin von Zweigbergk. It has a growing community, and Steinhardt said that "many people seem to prefer the Jujutsu experience way more" than using Git. That is not much of a surprise, he said; Git's user interface has grown organically over two decades, and it has "inconsistencies and commands that just don't feel modern". Jujutsu, on the other hand, started from scratch and learned from Git's mistakes.
Early on, Steinhardt said, he had looked at Jujutsu and found it confusing. "It just didn't make sense to me at all, so I simply discarded it." However, after noticing that there was a steady influx of people who did like it, he opted for another look. That time, something clicked. "That moment when you realize that a tool simply fixes all the UI issues that you had and that you have been developing for the last 20 years was not exactly great." He had two options: despair or learn from the competition. He chose to learn from it.
There are a number of things that Jujutsu got right, he said. For example, history is malleable by default. "It's almost as if you were permanently in an interactive rebase mode, but without all the confusing parts." When history is rewritten in Jujutsu, all dependents update automatically: "so if you added a commit, all children are rebased automatically". Conflicts are data, not emergencies. "You can commit them and resolve them at any later point in time." These features are nice to have, he said, and fundamentally change how users think about commits. "You stop treating them as precious artifacts and rather start treating them as drafts that you can freely edit".
But, he said, Git is old: the project cannot simply revamp its UI completely and break users' workflows. There are some things that Git can steal from Jujutsu, though. He discussed the workflow for splitting a Git commit, which involves seven separate commands with Git's current UI; most users do not know how to do this, he said. The goal is to add several "opinionated subcommands" that make more modern styles of working with merge requests, such as stacked branches, much easier.
This includes two new commands planned for Git 2.54: "git history split" and "git history reword". Future releases will add more history-editing subcommands and continue to learn from Jujutsu.
Steinhardt did not have time for questions; he closed the talk by saying that it had been a "whirlwind tour" through what is cooking in Git right now, and that he hoped it had provided a clear picture of what the project was up to.
The video for the talk is now available on the FOSDEM 2026 web site. Slides have not yet been published.
[I would like to thank the Linux Foundation, LWN's travel sponsor, for funding my travel to Brussels to attend FOSDEM.]
Kernel control-flow-integrity support comes to GCC
Control-flow integrity (CFI) is a set of techniques that make it more difficult for attackers to hijack indirect jumps to exploit a system. The Linux kernel has supported forward-edge CFI (which protects indirect function calls) since 2020, with the most recent implementation of the feature introduced in 2022. That version avoids the overhead introduced by the earlier approach by using a compiler flag (-fsanitize=kcfi) that is present in Clang but not in GCC. Now, Kees Cook has a patch set adding that support to GCC that looks likely to land in GCC 17.
CFI has a tricky problem to solve: a program should only make indirect function calls that the developer intends to make. If there were no bugs in the program, this would be straightforward — the function pointers involved would always be correct, and there would be nothing to worry about. The kernel is not free of bugs, however, and there is always the possibility that an attacker will manage to overwrite a function pointer with some value they control. How can the compiler protect against incorrect function calls when the function pointers involved are potentially compromised?
The approach that the kernel's forward-edge CFI takes is to split functions up according to their type signature. If some piece of code is expecting to call a function that takes a long, but the function pointer actually takes a char *, for example, that's a sign that something has gone off the rails. By recognizing that kind of type confusion and panicking the kernel, forward-edge CFI can prevent some attacks. This isn't a perfect solution: an attacker could still redirect an indirect function call as long as the function signatures match. It's still more protection than the alternative, though.
This kind of mitigation is typical of the Linux kernel self-protection project, on behalf of which Cook did this work. The project is based on the reality that many bugs in the kernel have a long lifetime. LWN recently did an in-depth breakdown of bug lifetime for the 6.17 kernel (showing, among other things, that it usually takes several years to find the majority of bugs known to exist in a given kernel version). LWN subscribers can see the details for any kernel version in the LWN kernel source database. Given that bugs have a way of sticking around, the kernel self-protection project supports work on techniques to keep the kernel secure even in the presence of those bugs. The project tracks work-in-progress at its GitHub issue tracker.
Clang implements -fsanitize=kcfi by computing a hash of the function's type, and storing it just before the function implementation in memory. When a call site in the code makes an indirect call, it first verifies that the hash matches what it expects. The plan is for GCC to implement the feature in the same way, but unfortunately adding support isn't quite as simple as it may seem. The patch set has gone through nine versions since the initial version in August 2025.
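The mechanism can be sketched at a high level: each indirectly callable function carries a hash derived from its type, and every indirect call site compares the hash stored with the target against the hash it expects before jumping. The Python model below is purely conceptual; the hash function, signature strings, and lookup table are stand-ins (the real scheme embeds the hash in the instruction stream just before the function and traps on mismatch):

```python
import hashlib

def type_hash(signature: str) -> int:
    # Stand-in for the compiler's type hash; the real implementation
    # hashes the Itanium-mangled type, not a human-readable string.
    return int.from_bytes(hashlib.sha256(signature.encode()).digest()[:4], "little")

def read_sensor() -> int:
    return 42
def overwrite_cred(uid: int) -> int:
    return 0

# "Emit" each function with its type hash stored alongside it.
kcfi_table = {
    read_sensor: type_hash("int (void)"),
    overwrite_cred: type_hash("int (int)"),
}

def indirect_call(func, expected_sig: str, *args):
    # The call-site check: a mismatched hash means the function pointer
    # no longer points at a function of the expected type.
    if kcfi_table[func] != type_hash(expected_sig):
        raise RuntimeError("CFI failure: type hash mismatch")
    return func(*args)

print(indirect_call(read_sensor, "int (void)"))   # 42
# An attacker redirecting the pointer to a different signature is caught:
try:
    indirect_call(overwrite_cred, "int (void)")
except RuntimeError as e:
    print(e)
```

As the article notes, this only blocks redirection across type boundaries; two functions with identical signatures still share a hash.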
One complication is in the handling of type aliases: these are frequently used for readability in C code, and it would break existing kernel code to generate a different hash for two type aliases that refer to the same underlying type. Therefore, most type aliases are resolved during the hashing process and the underlying type is used instead. There is one case where type aliases are usually meaningful in C code: as names for otherwise-anonymous structures, unions, and enumeration types. If two anonymous unions happen to have the same fields, but they are named differently, it is almost always the case that the programmer would like to ensure that one is not confused for the other. Here, the compiler uses the type alias name as part of the hash.
// This is treated the same as a plain int:
typedef int port_number;

// But these two types are considered different:
typedef struct {
    void *data;
} foo;

typedef struct {
    void *data;
} bar;
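The alias rules can be thought of as a normalization step applied before hashing: a typedef of an ordinary type resolves to the underlying type, while a typedef that names an otherwise-anonymous aggregate keeps its name as part of what gets hashed. The toy representation below is invented for illustration (GCC and Clang actually hash Itanium-mangled names):

```python
import hashlib

# Invented toy type table mirroring the C example above: a plain alias
# resolves away, while typedef names of anonymous structs are kept.
typedefs = {
    "port_number": "int",         # plain alias: treated as int
    "foo": "struct{void*}@foo",   # anonymous struct: name is significant
    "bar": "struct{void*}@bar",
}

def resolve(type_name: str) -> str:
    return typedefs.get(type_name, type_name)

def toy_type_hash(type_name: str) -> str:
    return hashlib.sha256(resolve(type_name).encode()).hexdigest()[:8]

print(toy_type_hash("port_number") == toy_type_hash("int"))  # True
print(toy_type_hash("foo") == toy_type_hash("bar"))          # False
```

With this normalization, existing kernel code that freely mixes an alias and its underlying type keeps working, while deliberately distinct named types stay distinct.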
The other complication comes from a desire for compatibility with LLVM. In the past, a kernel build was theoretically performed entirely with GCC or with Clang. In practice, Cook wants to support CFI for kernels where the C code is built with GCC while the Rust code is built with rustc (which uses LLVM for code generation). Eventually, once GCC's Rust front end is working, this may not be necessary. But for now, C code that makes an indirect function call to Rust code or vice versa needs to compute the same type hash as the other language so that the values will match at run time. Clang's type hashes are based on the function-name mangling required by the Itanium C++ ABI; Cook implemented the same algorithm for GCC. Sami Tolvanen does not seem to have recorded why the Itanium ABI was chosen for LLVM's original implementation, but it may be because the Itanium ABI is the only ABI standard for POSIX systems that actually specifies name-mangling rules, rather than leaving them to the compiler.
The relevant hashes need to be calculated at every call site, but they also need to be calculated for every function that could be called indirectly. Since C files are compiled separately, there's no way to know whether a function that has its address taken is actually called or not. To be safe, GCC calculates and embeds hashes for every function that has its address taken, plus every function that is directly visible to other translation units.
The actual checks that the type hashes match are implemented late in the compiler pipeline, to ensure that they are not optimized out or otherwise broken up. Therefore, there's some architecture-specific code to implement the checks. Cook's patch set includes support for x86_64, 32- and 64-bit Arm, and RISC-V.
While there was a good deal of discussion on earlier versions, the current version has not attracted much commentary from GCC developers. Jeffrey Law believes the patch set is ready to go into GCC 17 as soon as the last GCC 16 release is made. Even though support hasn't quite landed in GCC, the kernel has already renamed the relevant configuration option from CONFIG_CFI_CLANG to CONFIG_CFI in version 6.18. The configuration option uses feature detection, so older stable kernels will support the option with GCC 17 despite the name.
There is still more work to be done, however. For example, there was a discussion in 2022 about the possibility of adding a programmer-specified per-function seed to the CFI hash. That would let programmers manually partition sets of functions with identical signatures into different groups with different CFI hashes. Bill Wendling and Aaron Ballman added support for that to LLVM in August 2025, although the kernel doesn't yet make use of it.
Advances in kernel security don't come all at once. CHERI-like hardware-enforced capabilities can be used to completely block indirect-jump-based attacks. Modern CPUs are increasingly adding hardware support for CFI operations. For older devices, software fixes like Clang's CFI support help make up the gap. Now, GCC will bring that same CFI scheme to the people who choose to use GCC for reasons of compatibility or ideology. Eventually, perhaps, this kind of mitigation will simply be the default.
Modernizing swapping: the end of the swap map
The first installment in this series introduced several data structures in the kernel's swap subsystem and described work to replace some of them with a new "swap table" structure. The work did not stop there, though; more modernization of the swap subsystem is queued for an upcoming development cycle, and even more for kernel releases after that. Once that work is done, the swap subsystem will be both simpler and faster than it is now.
The data structures introduced thus far include the swap cluster, which represents a 2MB set of swap slots within a swap file, and the new swap table, stored within the swap cluster, that tracks the state of each swap slot. The introduction of the swap table allowed the removal of entire arrays of XArray structures that were, prior to the 6.18 kernel release, used to track the status of individual swap slots within a swap file. That was not a complete list of swap-related data structures, though; the first article, as a way of minimizing the complexity of the picture, skipped over an important swap-subsystem component: the swap map.
The swap map
The time has come to fill in that gap, as the swap map is the core target of the ongoing swap-improvement effort. At first glance, the swap map, as found in current kernels, is as simple as data structures get. There is one for each swap device, stored in struct swap_info_struct, and declared as:
unsigned char *swap_map; /* vmalloc'ed array of usage counts */
This field points to an array with one byte for every slot in the swap device; the value stored in each byte is the number of references that exist to that swap slot. There will be one reference for every page-table entry pointing to that slot, regardless of whether the page assigned to that slot is resident in RAM.
Of course, this is the swap code that is being discussed, so there are complications; one is that some of the bits in the swap-map entries have special meanings. The most significant of those, for the purposes of this article, is bit six (0x40) of the reference count; it is called SWAP_HAS_CACHE, and it is used to indicate that a swap slot has a page assigned to it. There can be various windows of time where a swap slot is assigned, but no page-table references to that slot yet exist, leading to a reference count of zero. The SWAP_HAS_CACHE bit distinguishes that state from a slot being unassigned.
This flag is also used as a sort of bit lock; there are numerous race conditions that might cause the kernel to try to swap in a page (or make other changes) multiple times in parallel. In such cases, the thread that succeeds in setting the SWAP_HAS_CACHE bit in the entry is the one that proceeds to do the work. This use of SWAP_HAS_CACHE as a synchronization mechanism has led to a number of problems over the years; the swap code has a number of delay-and-retry loops (example) waiting for this bit to clear.
There are some other special values in the swap map; a value of 0x3f (SWAP_MAP_BAD) means, for example, that the underlying storage is bad and should not be used. As a result, the maximum reference count that can be stored in the swap map (SWAP_MAP_MAX — 0x3e) is 62. That presents a problem; in cases where a large number of tasks are sharing an anonymous page, the number of references could easily exceed that value. The way this situation is handled is, to put it mildly, interesting.
Every time that the reference count for a swap slot is incremented, a check must be made for overflow. Should the count already be at the maximum, the topmost bit (0x80 — COUNT_CONTINUED) will be set, the count in the swap map will be set to zero, and a new page will be allocated to provide eight more-significant bits for the reference count (and for all the others on the same original swap-map page). That page will be linked to the swap-map page using the LRU list head in the associated page structures. If an entry has a lot of references and the count in the overflow page also overflows, yet another page will be allocated and added to the list.
The overflow pages only need to be accessed when the principal swap-map entry overflows or underflows, which is good considering that these operations are supposed to be fast. While the motivation behind this somewhat baroque design isn't documented anywhere, one can assume that, while the overflow case must be handled correctly, it is also relatively rare. Massive sharing of anonymous pages is not the common case. When reference counts are lower, this structure offers quick access and minimal memory overhead.
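The overflow scheme amounts to a mixed-radix counter: a count of up to 62 in the base swap-map byte, extended by eight-bit digits in continuation pages as needed. This toy model uses the constants from the article but simplifies the carry logic relative to the kernel's actual implementation (which walks a list of continuation pages):

```python
SWAP_MAP_MAX = 0x3e   # largest count in the base swap-map byte (62)
CONT_MAX = 0xff       # each continuation page contributes eight more bits

class SwapSlotCount:
    """Toy model: a base count plus a chain of continuation digits."""
    def __init__(self):
        self.base = 0
        self.cont = []            # one digit per "continuation page"

    def incr(self):
        if self.base < SWAP_MAP_MAX:
            self.base += 1
            return
        # Overflow: reset the base count and carry into continuation pages.
        self.base = 0
        i = 0
        while True:
            if i == len(self.cont):
                self.cont.append(0)   # "allocate" a new continuation page
            if self.cont[i] < CONT_MAX:
                self.cont[i] += 1
                return
            self.cont[i] = 0
            i += 1

    def decr(self):
        if self.base > 0:
            self.base -= 1
            return
        # Borrow from the continuation pages (assumes total > 0).
        i = 0
        while self.cont[i] == 0:
            self.cont[i] = CONT_MAX
            i += 1
        self.cont[i] -= 1
        self.base = SWAP_MAP_MAX

    def total(self):
        t, unit = self.base, SWAP_MAP_MAX + 1
        for c in self.cont:
            t += c * unit
            unit *= CONT_MAX + 1
        return t
```

Only increments and decrements that cross the 62-count boundary touch the continuation digits, which mirrors why the common, low-count case stays fast.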
Swap-cache bypass and SWAP_HAS_CACHE
One of the purposes of the swap cache is to hold (and track) folios that are under I/O to or from the swap device. If, for example, a page fault occurs on a swapped-out folio, a new folio will have to be allocated and its contents read from the swap file. That read operation can take some time, though. So the folio is added to the swap cache, the read operation is initiated, and the faulting process made to wait until the read is complete. Often, the swap subsystem will also attempt to read ahead of the current fault location, making a bet that the process will soon fault in subsequent pages as well.
The situation changed a bit in the 2018 4.15 release, though. Once upon a time, swapping was mostly done to rotating storage devices, which are slow. Increasingly, though, swapping looks a lot like just copying data from one part of memory to another. The "swap device" may be a bank of slower memory, or it may be an in-memory compression scheme like zram. On such devices, swap I/O is no longer slow, and behavior like readahead may harm performance rather than helping it.
In 4.15, Minchan Kim added the "swap bypass" feature. Specifically, if a swap device has the SWP_SYNCHRONOUS_IO flag (indicating that the device is so fast that I/O should be done synchronously) set, and if a specific slot in the swap map has a reference count of one, then a request to swap in the page stored in that slot will happen synchronously, readahead will not be performed, and the newly read page will not be added to the swap cache. This optimization added a fair amount of complexity to the swap subsystem, resulting in various bugs over time, but it also resulted in significantly better performance for swap-heavy workloads. That improvement was due to two factors: avoiding the relatively expensive swap-cache maintenance and preventing the use of readahead.
Fast-forwarding now to 2026, the first part of the phase-two patch series from Kairui Song is dedicated to removing the bypass feature. The work done in the first phase — specifically the introduction of the swap table — made swap-cache operations much faster, to the point that there is no real value in bypassing the swap cache even when fast swap devices are in use. Additional work in this series separates out the control of readahead and essentially disables it entirely for fast devices. Having all swap I/O go through the swap cache simplifies the code and reduces the number of troublesome race conditions. The new code will immediately remove swapped-in folios from the swap cache for SWP_SYNCHRONOUS_IO devices as a way of freeing the memory used for the swapped data.
There is one interesting side effect of removing the swap-bypass code. In current kernels, large (multi-page) folios can only be swapped in intact if their reference count is one — only in the bypass case, in other words. Removal of the bypass feature makes it possible to swap in large folios from fast devices regardless of the reference count.
Removal of swap bypass simplifies the swap-map management and makes it easier for the rest of the series to coalesce swap-slot management into a small set of well-defined functions. Among other things, these functions are all folio-based, reducing the historical page orientation of the swap subsystem. All of those functions use a combination of the cluster lock and the folio lock to manage the swap cache. From there, it is just one more step to use those locks to control access to the swap map as well.
Once the swap cache takes on the role of managing concurrency, there is only one last need for the SWAP_HAS_CACHE bit: marking swap slots that are allocated, but which have a reference count of zero. On the swap-out side, this situation is eliminated by immediately adding a folio to the swap cache once its slot has been assigned. At the other end, when pages are removed from the swap cache, swap slots with zero references are freed immediately. At that point, SWAP_HAS_CACHE is no longer needed; this patch near the end of the series removes it.
Removing the swap map
The work described above is, as of this writing, in the mm-unstable repository (and thus linux-next) and could be merged into the mainline as soon as the 7.0 release. But there is more to come. The third phase of this work is currently under review; this relatively short series eliminates the swap map entirely.
Recall, from the previous installment, that the entries in the new swap table, which are simple unsigned long values, were the same as those stored in the XArray data structures in previous kernels. A value of zero indicates an empty slot. For a resident folio, the entry contains the folio address; for swapped folios, the entry contains the shadow information used to detect refaults (pages that are quickly faulted back in from swap). The third phase changes the format of this table to support five different types of entries:
- A value of zero still indicates an empty slot.
- If bit zero is set, then this is a shadow entry for a swapped-out folio; the upper bits of the entry hold the reference count for this slot. The specific number of bits available for this count varies depending on the architecture.
- If the bottom two bits are 10, then the entry is for a folio that is resident in memory. As with shadow entries, the uppermost bits hold the reference count. To make room for that count, the page-frame number of the underlying page is stored rather than its address.
- A "pointer" entry is marked by setting the bottom three bits to 100; pointers are not used in the current series.
- Setting the bottom four bits to 1000 marks a bad slot that should not be used.
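The tag scheme above can be expressed as a simple decoder over the low-order bits of an entry; all of the names here are made up for illustration and do not come from the actual patch series:

```c
/* Classify a swap-table entry by its low-order tag bits, following
 * the five-way scheme described in the text; names are illustrative. */
enum entry_type {
	ENTRY_EMPTY,    /* zero: empty slot */
	ENTRY_SHADOW,   /* bit 0 set: shadow info plus reference count */
	ENTRY_FOLIO,    /* low bits 10: resident folio (PFN plus count) */
	ENTRY_POINTER,  /* low bits 100: pointer (unused in this series) */
	ENTRY_BAD       /* low bits 1000: bad slot, do not use */
};

static enum entry_type classify_entry(unsigned long entry)
{
	if (entry == 0)
		return ENTRY_EMPTY;
	if (entry & 0x1)
		return ENTRY_SHADOW;
	if ((entry & 0x3) == 0x2)
		return ENTRY_FOLIO;
	if ((entry & 0x7) == 0x4)
		return ENTRY_POINTER;
	return ENTRY_BAD;	/* (entry & 0xf) == 0x8 */
}
```

Note how each test consumes one more low bit than the last; the scheme can distinguish the five cases while leaving as many upper bits as possible free for payload (shadow data, page-frame number, and reference count).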
This organization takes the final remaining purpose for the swap map — tracking the reference counts — and shoehorns it into the swap table; that allows the swap map to be removed altogether. The result is a more compact memory representation and some significant memory savings; Song estimates that about 30% of the swap subsystem's metadata overhead is gone, saving 256MB of memory for a 1TB swap file. Until now, the kernel has maintained the swap map (tracking the status of slots in a swap file) and the swap cache (which tracks the pages that have been placed into swap) separately. The unification of those two data structures, Song says, reduces the amount of record-keeping overhead significantly, speeding the swap system overall.
The new format can keep a larger reference count than the swap map can. For example, x86_64 systems will need 40 bits to hold the page-frame number, plus two for the resident-folio marker; that leaves 22 bits for the reference count. That size will be smaller on some other architectures (especially 32-bit systems) and, in any case, the possibility of overflow still exists. The complex system used to handle reference-count overflow in current kernels has been removed, though. Instead, if a reference count overflows, an array of unsigned long counts will be allocated for the entire cluster.
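As a back-of-the-envelope check of that arithmetic (the exact field widths are the series' choice; the constants here just restate the numbers from the text):

```c
/* Illustrative bit budget for a resident-folio entry on x86_64:
 * a 64-bit entry, minus the page-frame number and the two-bit
 * resident-folio tag, leaves the bits available for the count. */
enum {
	ENTRY_BITS = 64,  /* one unsigned long */
	PFN_BITS   = 40,  /* page-frame number bits on x86_64 */
	TAG_BITS   = 2,   /* the resident-folio marker */
	COUNT_BITS = ENTRY_BITS - PFN_BITS - TAG_BITS,  /* 22 */
};

/* Largest reference count that fits inline, before the overflow
 * fallback (a per-cluster array of full-width counts) is needed. */
static const unsigned long max_inline_count = (1UL << COUNT_BITS) - 1;
```

A 22-bit count allows over four million references per slot, which is why the overflow path is expected to be rare in practice; 32-bit architectures, with far fewer spare bits, will hit it sooner.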
The third phase is in its second revision. Thus far, neither version has received much in the way of review comments; that suggests that the removal of the swap map is not yet imminent. Even once this happens, though, the work is not done; Song has alluded to a later phase that will integrate the swapping limits from the memory controller into the swap table as well. So, just like the rest of the kernel, the swap subsystem is unlikely to be considered complete anytime soon.
Development statistics for 6.19
Linus Torvalds released the 6.19 kernel on February 8, as expected. This development cycle brought 14,344 non-merge changesets into the mainline, making it the busiest release since 6.16 in July 2025. As usual, we have put together a set of statistics on where these changes come from, along with a quick look at how long new kernel developers stay around. As a reminder: LWN subscribers can find much of the information below — and more — at any time in the LWN kernel source database.
The 6.19 development cycle brought in the work from 2,141 developers, which just barely beats the previous record (2,134) set for 6.18; 333 of those developers made their first contribution to the kernel in 6.19, also a relatively high number. The most active developers for 6.19 were:
Most active 6.19 developers
By changesets
  Kuninori Morimoto  459  3.2%
  Christian Brauner  271  1.9%
  Johan Hovold  158  1.1%
  Ville Syrjälä  153  1.1%
  Ian Rogers  140  1.0%
  Russell King  124  0.9%
  Josh Poimboeuf  101  0.7%
  Andy Shevchenko  100  0.7%
  Krzysztof Kozlowski  93  0.6%
  Jani Nikula  91  0.6%
  Sean Christopherson  88  0.6%
  Filipe Manana  87  0.6%
  Marco Crivellari  87  0.6%
  Christoph Hellwig  86  0.6%
  Thomas Zimmermann  85  0.6%
  Eric Dumazet  85  0.6%
  Peter Zijlstra  82  0.6%
  Marc Zyngier  82  0.6%
  Frank Li  78  0.5%
  SeongJae Park  78  0.5%
By changed lines
  Miguel Ojeda  58000  7.8%
  Cyril Chao  19755  2.6%
  Christian Brauner  16604  2.2%
  YiPeng Chai  13293  1.8%
  Dmitry Baryshkov  12244  1.6%
  Ian Rogers  10933  1.5%
  Jason Gunthorpe  10851  1.5%
  Eric Biggers  9549  1.3%
  Daniel Scally  9429  1.3%
  AngeloGioacchino Del Regno  6201  0.8%
  Josh Poimboeuf  6010  0.8%
  Ilya Bakoulin  6009  0.8%
  Rob Herring  5777  0.8%
  Johannes Berg  5707  0.8%
  Svyatoslav Ryhel  5610  0.8%
  Akhil P Oommen  5516  0.7%
  Mauro Carvalho Chehab  5196  0.7%
  Neilay Kharwadkar  5162  0.7%
  Igor Belwon  5155  0.7%
  Lorenzo Stoakes  4830  0.6%
Kuninori Morimoto, who was first seen during the 2.6.28 development cycle in 2008, was the biggest contributor of changesets by virtue of a major refactoring effort in the sound subsystem. Christian Brauner, the maintainer of the virtual filesystem layer, refactored the handling of credentials, added the listns() system call, and added many self tests, among other contributions. Johan Hovold fixed numerous bugs and did a lot of cleanups in various driver subsystems. Ville Syrjälä worked extensively in the i915 graphics-driver subsystem, and Ian Rogers contributed a long list of improvements to the perf tool.
Looking at lines changed, Miguel Ojeda topped the list with the addition of a modified version of the Rust syn crate. Cyril Chao's first-ever kernel contribution, which put him into second place in the "lines-changed" list, was a driver for MediaTek mt8189 platform devices. YiPeng Chai worked with the amdgpu graphics driver, and Dmitry Baryshkov updated devicetree files for a number of Qualcomm devices.
A full 10% of the patches merged for 6.19 had Tested-by tags, while 56% had Reviewed-by tags; both of those numbers are slightly higher than usual. The top testers and reviewers for this release were:
Test and review credits in 6.19
Tested-by
  Daniel Wheeler  89  4.8%
  Joe Lawrence  63  3.4%
  Mark Brown  55  3.0%
  Randy Dunlap  53  2.9%
  Fuad Tabba  53  2.9%
  Lad Prabhakar  37  2.0%
  Shaopeng Tan  35  1.9%
  James Clark  34  1.8%
  Carl Worth  34  1.8%
  Hanjun Guo  33  1.8%
  Gavin Shan  32  1.7%
  Zeng Heng  32  1.7%
  Ryan Walklin  30  1.6%
  Kai Huang  29  1.6%
  Fenghua Yu  28  1.5%
  Yan Zhao  28  1.5%
Reviewed-by
  Charles Keepax  310  2.8%
  Dmitry Baryshkov  191  1.7%
  Geert Uytterhoeven  164  1.5%
  Frank Li  158  1.4%
  Krzysztof Kozlowski  156  1.4%
  David Sterba  144  1.3%
  Christoph Hellwig  139  1.3%
  Konrad Dybcio  125  1.1%
  Simon Horman  125  1.1%
  Ilpo Järvinen  118  1.1%
  Jan Kara  118  1.1%
  Jeff Layton  113  1.0%
  AngeloGioacchino Del Regno  111  1.0%
  Jonathan Cameron  109  1.0%
  Andrew Lunn  109  1.0%
  Ville Syrjälä  105  1.0%
The list of top reviewers is a bit different from those of past releases; somehow Charles Keepax managed to review 310 commits — more than four for every day of this 70-day release cycle — mostly within the sound-driver subsystem. The other top reviewers were focused on system-on-chip drivers and devicetree-related changes. The list of top testers is more typical, with Daniel Wheeler on top as usual.
The development of the 6.19 kernel was supported by 227 employers that we know of. The most active employers were:
Most active 6.19 employers
By changesets
  Intel  1591  11.1%
  (Unknown)  1410  9.8%
  1099  7.7%
  Red Hat  829  5.8%
  Renesas Electronics  741  5.2%
  AMD  612  4.3%
  (None)  554  3.9%
  Qualcomm  485  3.4%
  SUSE  462  3.2%
  Microsoft  434  3.0%
  NVIDIA  407  2.8%
  (Consultant)  392  2.7%
  Meta  379  2.6%
  Oracle  371  2.6%
  NXP Semiconductors  325  2.3%
  Linaro  319  2.2%
  Huawei Technologies  260  1.8%
  IBM  236  1.6%
  Arm  200  1.4%
  Bootlin  160  1.1%
By lines changed
  101728  13.6%
  (Unknown)  70125  9.4%
  Intel  59934  8.0%
  AMD  45025  6.0%
  Qualcomm  36322  4.9%
  NVIDIA  32585  4.4%
  Red Hat  31429  4.2%
  Microsoft  30621  4.1%
  (None)  27285  3.7%
  MediaTek  23223  3.1%
  Ideas on Board  14880  2.0%
  Renesas Electronics  14350  1.9%
  Meta  14232  1.9%
  Collabora  14088  1.9%
  Huawei Technologies  13257  1.8%
  SUSE  13209  1.8%
  Oracle  12943  1.7%
  Arm  12761  1.7%
  IBM  12230  1.6%
  Linaro  9015  1.2%
These numbers are reasonably consistent with recent history; hardware vendors are still contributing a large share of the changes. When considering which companies are most influential in kernel development, though, one should also look at the Signed-off-by tags added to patches by maintainers as they apply those patches to their repositories:
Non-author signoffs in 6.19
Individual
  Jakub Kicinski  991  7.5%
  Mark Brown  946  7.2%
  Andrew Morton  584  4.4%
  Alex Deucher  478  3.6%
  Greg Kroah-Hartman  406  3.1%
  Jens Axboe  277  2.1%
  Hans Verkuil  257  2.0%
  Bjorn Andersson  245  1.9%
  Paolo Abeni  218  1.7%
  Christian Brauner  200  1.5%
  Namhyung Kim  196  1.5%
  Shawn Guo  185  1.4%
  Martin K. Petersen  184  1.4%
  Peter Zijlstra  184  1.4%
  David Sterba  179  1.4%
  Jonathan Cameron  167  1.3%
  Alexei Starovoitov  141  1.1%
  Geert Uytterhoeven  129  1.0%
  Jonathan Corbet  121  0.9%
  Ilpo Järvinen  121  0.9%
By employer
  Meta  1544  11.8%
  1296  9.9%
  Intel  1243  9.5%
  Arm  1171  8.9%
  AMD  848  6.5%
  Linaro  786  6.0%
  Red Hat  647  4.9%
  Qualcomm  564  4.3%
  Linux Foundation  441  3.4%
  Microsoft  432  3.3%
  SUSE  423  3.2%
  (Unknown)  409  3.1%
  NVIDIA  312  2.4%
  Cisco  257  2.0%
  Oracle  235  1.8%
  Huawei Technologies  233  1.8%
  (None)  209  1.6%
  LG Electronics  196  1.5%
  Renesas Electronics  174  1.3%
  IBM  133  1.0%
The top two companies here are both of the hyperscaler variety, with the top being Meta, which appears rather farther down in the list of changeset contributors. While Meta does contribute a lot of patches — and significant core patches at that — it also employs the maintainers that handle a lot more patches authored by others. Arm, too, shows a bigger influence by this metric.
Developer longevity
For as long as the kernel community has existed, people have worried about its ability to attract new developers. I have often pointed out that each release features the work of roughly 300 first-time developers; the response is often to ask how long those developers stay around. The time has come to try to give at least a partial answer to that question. The plot below was generated by accumulating a list of the 5,424 developers who made their first contribution to one of the 5.x mainline kernels, then looking at how many other releases each contributed to.
What we see is that 1,943 of those first-time contributors — 36% of the total — were never seen again after contributing to one release. Another 883 developers (16%) showed up for one other release, and so on. In the end, 32% of the first-time contributors during this period have been present for at least four kernel releases. At the long tail of the distribution, there are two first-time developers (Andrii Nakryiko and Vladimir Oltean) who have only missed one release and two (Stephan Gerhold and Stefano Garzarella) who have contributed to every release from 5.0 to 6.19.
To provide one other small data point: the first-time contributors being studied here arrived between early 2019 (the 5.0 release) and mid-2022 (5.19). Of those 5,424 developers, 1,067 (just under 20%) of them contributed to at least one of the releases made in 2025. It seems reasonable to consider that group as still being active in the kernel community. Whether these results are good or bad is probably a matter for debate, but it does seem clear that, while a lot of contributors pass through quickly, others are staying around for the long haul.
The kernel community as a whole is also clearly here for the long haul; the process shows no real signs of slowing down. As of this writing, there are just short of 11,000 changesets in the linux-next repository; most of those will move into the mainline in the upcoming merge window. The next kernel, which will be called 7.0, will continue to demonstrate the community's fast pace; stay tuned for the details.
FOSS in times of war, scarcity, and AI
Michiel Leenaars, director of strategy at the NLnet Foundation, used his keynote at FOSDEM to sound warnings for the free and open-source software (FOSS) community; in particular, he talked about the threats posed by geopolitics, dangerous allies, and large language models (LLMs). His talk was a mix of observations and suggestions that pertain to FOSS in general, and to Europe in particular, as geopolitical tensions have mounted in recent months.
Leenaars began by saying that there is a lot of good open source out there, but it is not being used for good. The irony is that in trying to empower people to take control of their own computing destiny, the FOSS community has empowered the wrong people—those who would like to use software to control others. The ideals of global cooperation and reuse have enabled abuse as well.
So how did we get here? Leenaars referred back to the birth of the World Wide Web at CERN in Switzerland. The thinking was, "we should do things for the world, we should not have boundaries; let's see if we can share". Economies were booming, technology was advancing, money was being made, and parliamentary democracies were taking over. Everybody was in a positive, constructive mood. It was the "end of history", a political philosophy put forward by Francis Fukuyama in his book The End of History and the Last Man. The thesis of the book was that, with liberal democracy, humanity had reached its final form of government.
Leenaars's talk description had been shared on Hacker News well before FOSDEM; he noted that one of the comments said that it sounded like "the official obituary for the 90s techno-optimism many of us grew up on". He said that it is, in a sense.
As FOSS evolved, the community chose "dangerous allies" in the tech companies and future public-cloud "hyperscalers". "We thought we could control that; it was not a realistic assumption." There was a darker narrative going on instead, he said; the US National Security Agency (NSA) was carrying out mass surveillance and spying on politicians in other countries, which came to light when Edward Snowden leaked documents that revealed the existence of those programs.
SCRAPS
Despite "this dark layer underneath", though, people, organizations, and governments in Europe were not upset enough to stop working with and trusting businesses in the US. Instead, Europe continued to depend on US tech companies, and to host its data in the public clouds anchored there. He said that Europeans felt like equals with the US, and that it was safe to trust "our friends and long-time allies" in building public clouds that Europe could rely on. "We can focus on our core business, and look at the total cost of ownership" instead of infrastructure.
That dependence, he said, "makes you incompetent, a victim of potential abuse". It is fine in the short term, but the pain comes afterward. If the entire European Union depends on external providers, and it does, it draws the short straw. "We don't have capacity. We are literally incompetent". CTOs were proud of "cloud-first" strategies; he proposed a different term: "strategic computer rental and anchoring to proprietary services" (SCRAPS).
Even SCRAPS are not guaranteed. Providers of cloud services can refuse to do business with an organization, or be compelled to refuse. He referred to sanctions against the International Criminal Court that caused Microsoft to block the email account of the court's chief prosecutor. "We're now at the mercy of the same people who profit off of us, and they still hold the kill switch."
European people, Leenaars said, are now in panic mode and looking to government to keep society afloat. "We shouldn't have become so dependent, but that's about three decades too late". Still, many people inside governments are running toward the fire instead of away from it. He mentioned the Netherlands Ministry of Finance, which has been working on a migration to Microsoft 365. The ministry has seen the whole situation, but it has put so much effort into the migration and has been "locked in to the same company for 50 years". A sort of Stockholm Syndrome has evolved, he said, though the ministry agrees that it has a problem with its current tools. "I filed a freedom of information request with them three months ago, and they have not been able to produce a single document". He thought it would be nice if the ministry gained some situational awareness and stopped putting people in danger.
History did not end
The government's answer is, "let's get more European startups, lots of competitors", he said, but that is the wrong approach. "We don't need to breed more predators; we need mission-driven organizations, we need companies that are public stewards." He called for a pipeline from academia to engineering, to nonprofits and service companies that do not seek to be captive platforms. Simply having a public cloud that is owned by European businesses is not the answer if those businesses follow the same models as the US ones.
The world, Leenaars said, is in the worst shape that it has been in for decades; it turned out that history did not end after all. He talked about social media, describing it as "95% FOSS and the rest is cognitive warfare". He had complaints not only about disinformation being spread online, but also about the short-form content that is popular today. Kids, he worried, were becoming dependent on short content that does not deal with complexity. "I don't fear World War III as much as I fear de-enlightenment and a subsequent second dark ages."
His next worry for FOSS was as a target for state actors in warfare. Countries are now targeting the enemy's software and devices as well as waging traditional warfare. He referenced the Lebanon electronic device attacks (dubbed "Operation Grim Beeper") carried out by Israel in September 2024; those attacks made use of pagers and two-way radios carried by Hezbollah members that had been compromised at some point in the supply chain. That had enabled Israel to eavesdrop on its targets' communications until it then detonated the devices on September 17 and 18.
He also discussed the backdooring of XZ in 2024: an attack that was conducted by "Jia Tan" after gaining trust with the original XZ maintainer over a long period of time. The average company has 25,000 software dependencies, he said, and any of them could be used to break in. There are millions of packages, and millions of people maintaining them; all of those maintainers and packages are potential weak spots. But if the new people coming in to help cannot be trusted, or if maintainers are too paranoid and chase contributors away, "we're also screwed".
Cavalry or Trojan Horse?
At this point, Leenaars said, we see horses on the horizon in the form of LLMs; is that the cavalry coming to the aid of FOSS, or an army of next-generation Trojan Horses galloping through the gates of the village? The promise of LLMs is that they can take responsibility off developers' hands and allow organizations to focus on their core business. "That's a thing we've heard before. The product framing is super-good. Sounds so legit."
He reminded the audience of the saying that there is no cloud, only other people's computers. In this context, though, he suggested: "there is no Claude, only other people's code."
Leenaars said that LLMs do a good job of some things, but claimed that it is fundamentally impossible for them to do all the things they are expected to do. It is possible, he allowed, that LLM tools could do "a lot of the janitoring that humans are really weary of doing"; there are, after all, many boring tasks in software development that humans might like to offload. He recommended that the audience be cautious about what machines are allowed to do: keep security in mind, and keep LLMs contained. Even then, he said, he was not convinced that there was a problem that needed solving by LLMs.
Instead, if FOSS has such a large attack surface in the form of so many libraries and dependencies, trying to reduce that attack surface makes more sense than adding LLMs into the mix; it also makes sense to try to reduce maintainer burnout. He called on "people in the military who are seeing huge budgets" to spend some of that money on talented programmers who could improve FOSS and reduce its attack surface. Billions and billions of euros will be invested in Europe's defenses; some of that money should be spent on FOSS. "The FOSS ecosystem should not build stuff for weapons, but should get money from people who need to defend us. We are their defense, we are their infrastructure." Europeans, he said, should be telling politicians that they need to support FOSS not just to enable digital sovereignty, but also for defense. With that, Leenaars wrapped up the talk, with no time left for questions.
Overall, the talk was a bit disjointed, and Leenaars presented few concrete suggestions for the audience. But it seemed to resonate with the packed main room, and he touched on topics that were prevalent at FOSDEM all weekend: wariness of the changing political picture in the US, distrust of AI/LLMs, and a desire to reduce dependence on US companies and services.
[Thanks to the Linux Foundation, LWN's travel sponsor, for funding my travel to Brussels to attend FOSDEM.]
Page editor: Joe Brockmeier
Inside this week's LWN.net Weekly Edition
- Briefs: Kernel ML; tag2upload; LFS sysvinit; postmarketOS FOSDEM; Ardour 9.0; Offpunk 3.0; Dave Farber RIP; Quotes; ...
- Announcements: Newsletters, conferences, security updates, patches, and more.
