Leading items
Welcome to the LWN.net Weekly Edition for May 28, 2026
This edition contains the following feature content:
- Dirk and Linus discuss AI and kernel development: an informal conversation with kernel creator Linus Torvalds.
- Ongoing coverage from the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit:
  - BPF support in GCC 16 and beyond: an update on the state of BPF support in GCC.
  - Support for private memory nodes: how to better manage special-purpose memory provided by devices.
  - Custom page-cache policies with BPF: making it possible for user space to influence when pages are evicted from the page cache.
  - Toward better handling of major page faults: despite a lot of work in this area, page-fault handling still can be subject to lock contention; how can that situation be improved?
  - Reviewing kernel patches with LLMs: a discussion on how to use LLMs for patch review as well as how and where to continue developing the prompt files being used.
  - Tier-aware memory-controller limits: adding support for tiered-memory systems to the memory controller.
  - Better automatic management of transparent huge pages: the ongoing task to make transparent huge pages truly transparent.
  - Further progress toward removing the page map count: the quest to simplify the accounting of page mappings continues.
- MOT: a tool to fight openwashing in AI: helping to define openness for LLMs and their components.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Dirk and Linus discuss AI and kernel development
Linus Torvalds does not enjoy giving talks, but he does consent to the occasional on-stage conversation with Dirk Hohndel at Linux Foundation events. The pair held the 30th of their fire-less fireside chats during a keynote session on May 20, at the 2026 Open Source Summit North America. Topics included 3D printing, guitar pedals, the recent 7.1-rc4 release of the kernel, and Torvalds's complicated relationship with AI tooling.
3D printing
Hohndel kicked off the conversation by saying that Torvalds is a "huge fan of 3D printing", and that they owned the same model of 3D printer. One of the things that is interesting about this space, he said, is that "basically everything is open source". Another thing that he and Torvalds have in common is a dislike of visual tools for creating the 3D models; they both prefer OpenSCAD, which allows creating models with a programming language instead. Hohndel wondered if Torvalds had worked with OpenSCAD's code, or was simply a user of the tool.
Torvalds said that he was just a user of the software; he enjoyed describing things in text and treating 3D printing as programming. He also liked that it produced a physical object, which was something he didn't get from his work on the kernel. When he codes, though, he prefers to be "really close to the hardware and work at a different level". He had no desire to get involved in OpenSCAD as a project, because it is so different from what he is used to coding.
Hohndel said that sometimes he would want to fix a bug in an application, "and then I look at the code and I realize I have absolutely no idea how I would fix that". It was a wonderful learning experience, he said, finding that there were so many things he knew nothing about but still enjoyed using.
Another interesting thing, he said, was that for many years open-source tools were considered to be "maybe a little clunkier", and proprietary tools were so much nicer; the open-source applications for 3D printing, though, were "really cool" and of high quality. Torvalds replied that "we're actually past the point where people think that open source is just for engineers".
Guitar hero
Torvalds was introduced by Linux Foundation CEO Jim Zemlin as the creator of two industry-shaping tools: the Linux kernel and the Git version-control system. However, Hohndel said that he should really be credited for three major innovations since Torvalds had created the Subsurface dive-log application as well. That may, in fact, need to be revised upward to four with "a guitar-effects pedal written by the one and only Linus Torvalds".
The project is available on GitHub and includes all of the software as well as necessary schematics, available under the GPLv2, to build a working effects pedal. Torvalds cautioned that interested users would have to manufacture the device before playing with it. While it would be possible to place the components by hand, he recommended sending the design files to a printed-circuit board (PCB) manufacturer instead. He had started doing it by hand, but decided that it was too fiddly and it would be better to let the professionals do it. "If you're into guitar pedals or into music and want a really bad guitar pedal, you can now make one yourself".
Hohndel said that he was an early beta tester of the guitar pedal and declared "it's not bad, and of course there's a 3D-printed housing for it and everything. It's really fun." Torvalds deadpanned that he would "change the world of music, too".
Impact of AI tools
Steering the conversation around to Torvalds's first hobby project, Hohndel noted that Linux 7.1-rc4 had recently been released. "My usual questions, 'what's going on? How are we doing?'" Torvalds responded that the kernel has had the same process now for about 20 years, since switching to Git. "I used to say that it's all working fine, and it's smooth sailing, it's steady progress. And then, about half a year ago, things changed." He said that in the past six months, the kernel has seen a lot more commits: about 20% more for each of the past two kernel releases than the project had seen for many years.
His first theory was that companies were pushing to get code into the 7.0 release, because it's a .0 release and a similar thing had happened with the 6.0 release. "And it turns out I was wrong." The real change was that AI tools had improved enough for a lot of people "that we're seeing a definite uptick in development on pretty much all fronts". The tooling had lowered the barrier of entry to writing a kernel patch, and that had an impact. That impact is not entirely positive, he said.
Hohndel replied that part of the 7.1-rc4 announcement was a change to the kernel's security policy. In the announcement, Torvalds said that the flood of AI reports had made the security list "almost entirely unmanageable". As a result, Willy Tarreau updated the security bugs documentation clarifying the definition and handling of security-related bug reports for the kernel.
Torvalds paused to ask how many people in the audience used AI for coding; after scanning the room, he said, "yeah, pretty much everybody". He said he had a love-hate relationship with AI: "I love the tools. I find it very useful and interesting. But it's definitely causing pain points."
The big pain points in Linux development have been when people are forced to change how they work, he said. People find a comfortable way of working, and then something comes along and disrupts it:
Around 2000, I had to change how I worked because I did not scale anymore for the project as Linux was growing. I still remember it being one of the more painful episodes in kernel development. And it was literally 25 years ago. And I think we're seeing some of the same effects now with AI, where it kind of forces people to get out of their comfort zone.
AI-generated bug reports, he said, are causing the same kind of pain for maintainers. People do not scale, and it is taking a while for people to figure out how to use AI tools "in a, maybe responsible isn't the right word, but in a way that actually works with the community and works with the other developers". The kernel community is definitely seeing some of the pain:
You have this list with pretty few people on it because it's all supposed to be super secret, and we spent all our time just forwarding these reports to other developers who knew that area better. And we made the policy change that basically if you find a security or any bug with AI, you should basically consider it to be public. Just because if you found it with AI, a hundred other people also found it with AI.
Though the bugs should be considered public, Torvalds said that reporters should not make the exploits public. He emphasized that he was speaking not only about the Linux kernel, but about exploits in any piece of software. "Just let people know what the problem was and don't necessarily tell them exactly how to make somebody's life really miserable on a Friday afternoon."
Security people love attention
He reiterated that he really enjoyed AI tools, and did not feel that the technology was bad; however, there were still social issues that had not yet been worked out and those were the key. Security reporters have been seeking attention for a long time, he said, and going out of their way to brand security vulnerabilities. They create a domain, logo, and "want to get all the fame for the bug and then release it before they ever talk to the victims, i.e., the maintainers who then need to fix it."
Hohndel noted that there were four recent local-privilege-escalation vulnerabilities found in the kernel, two of which were disclosed without talking to the maintainers. "My response is always, 'here is a company I never want to work with, because if you do this to the Linux kernel, you do this to anyone.'" The four bugs demonstrated the challenge that Torvalds was talking about, because there was no opportunity to fix the vulnerabilities before the world learned about them. But, Hohndel asked, if all AI bugs are treated as public, "doesn't [that] create the situation that the maintainers are always on the back foot?"
"I'm sadly of the opinion that we can't get around that", Torvalds said. The kernel has a lot of code, and that means it has bugs: that should not surprise anyone. It used to be, though, that kernel maintainers could inform Linux distributions that they really need to upgrade without describing exactly what the security vulnerability was. Since the kernel is open source, the fixes were public, "but they were quite often subtle security issues that are really hard to figure out. In the time of AI, you can just automate the figuring it out part".
Last week, he said, "we fixed a bug; within three hours there was a blog post up about the implications of that bug fix, because security people love getting attention. Don't get me wrong, we all do." Security businesses have a real incentive to run AI tools to find vulnerabilities, make splashy announcements, and be the first to report them. "And I think this is just how it's going to be. And there is now nothing you can really do." The solution is not to stop doing open source; AI can just as easily reverse engineer closed-source software to find vulnerabilities. However, in those situations, AI cannot be used to help fix the vulnerabilities.
Hohndel said that he had noticed that the companies "enjoy spending a lot of money and a lot of tokens on pointing out a bug, and strangely, none of these came with a patch", even though the kernel is open source. It was kind of sad that all the attention from journalists was for finding a bug, and that companies did not think they could get more attention for delivering fixes for the bug.
"To be fair, sometimes it is easier to find [bugs] than to fix them", Torvalds said. It may sound negative, but he thought that it was all very good. AI finding bugs meant short-term pain, but the long-term benefit is that a bug was found and fixed. The end result was better for it. "The conflict is not that AI is bad, the conflict is then that there are some social checkpoints and social pain points that come with this new tool". The kernel has 35 years of code, and AI sometimes finds issues that kernel developers had not found; it was going to take some time to work through the new issues. "I'm actually very positive about this whole thing."
Given the flood of bug reports, Hohndel asked if there were any good tools that he used to help with code review, understanding patches, or otherwise help with his workflow. Torvalds responded that kernel maintainers have a ton of tools and pointed out that "the Linux kernel is actually doing very well; every single release we have over a thousand people involved and a solid cadre of maintainers". Most of the time, the maintainers are paid well for being there, too.
Torvalds said that he had been talking about Linux development and problems with AI because that is what he works on. "But think of all the tens of thousands of random projects that people maintain that are not the Linux kernel." Those people are at risk of burnout when they get a flood of AI-driven security reports or bug reports, "and when you ask for more information, the person has done a drive-by and doesn't even answer your questions anymore". Then Torvalds admitted that he had forgotten what the original question was.
Hohndel reminded him that the question was about the tools that he used. "Oh, yeah. We do use AI tools," such as Sashiko, which produces reviews of patches sent to the kernel mailing list. Many companies are working on private internal tooling, he said, and many of the main kernel developers are using local AI. He suggested others also look into local AI tooling: "You don't want to be entirely at the mercy of the big companies that at some point decide, oh, we need to make money too."
However, most of his work consisted of collaborating with people. He does not do a lot of coding as a top-level kernel maintainer; his job is working with people, and he does not use AI to work with people. "And I should suggest that you don't do that either."
Compilers
Since Torvalds had mentioned working with people, Hohndel asked him what advice he would give someone at the beginning of their career. Where should they focus? Torvalds replied that AI is a great tool, but it's just a tool. When he sees people saying that 99% of their code was written by AI, he gets angry: "I pretty much guarantee that 100% of their code is written by compilers, but they never say that."
He had grown up writing machine code: "And when I say machine code, I don't mean assembly language. I mean the numbers." He said that working that directly with the hardware "leaves an imprint". It took him a while to adopt higher-level tools. "I figured out compilers were good, and these days I'm figuring out that AI tools are good, too." He's still writing the code, but not doing it the same way that he had done before. He was convinced that AI was changing programming, but not its fundamentals.
It's like the same way that you all use compilers to actually generate your code. You will all use, well, not maybe all of you, but a lot of people will use AI to generate the code that the compilers use to generate the code that the assemblers then use to generate the machine code. This is revolutionary in the same sense that we've seen revolutions before. And AI will increase your productivity by a factor of 10.
And I claim that compilers increase your productivity by a factor of a thousand. So AI is great, but AI is not changing programming. It may be changing other areas, don't get me wrong, but I'm a programmer, so I don't care.
That said, he added that he still wants to understand how everything works. He does not program in machine code anymore, but he still looks at the generated code. When he uses a compiler, even when he uses AI for "pet projects" like the guitar pedal, he looks at the assembly language end result because that is what he grew up with. Even when someone uses AI for coding, if it's for a project that will be maintained for a long time, "you need to understand not just your prompts, but you need to understand your end result, because that's the only way you can maintain it long term".
At that point, the time allotted for the session had run out; Hohndel said that he had a lot more questions, but he would just have to ask them next year when they did the same session.
[Thanks to the Linux Foundation, LWN's travel sponsor, for funding my travel to Minneapolis to attend the Open Source Summit.]
BPF support in GCC 16 and beyond
José Marchesi and the GCC-BPF developers opened the BPF track at the 2026 Linux Storage, Filesystem, Memory-management, and BPF Summit with a 90-minute summary of what has changed for GCC's BPF support in the past year. This kind of session has become something of a tradition. There were similar updates in 2025 and 2024. This time around, GCC seems to be closing in on feature parity with the LLVM toolchain — as the slides detail.
Usually, when the GCC-BPF developers come to conferences, they present for an hour, ask some questions, and that's the end of the discussion, Marchesi said. He wanted to do better this year, promising to remain reachable throughout the conference. He wanted to use the conference to discuss the remaining handful of fixes needed for GCC to pass the kernel's BPF self-tests, which were detailed in the latter half of the talk.
There is now BPF support across the GNU toolchain, Marchesi continued. GCC, of course, but also projects like binutils, DejaGNU, GNU poke, and even GDB support BPF. Some of that "support" has not quite been kept up to date, however. GDB's BPF simulator, for example, is not used much and so has fallen out of date. That said, the other components of the toolchain are making good progress.
GCC 16.1 was released on April 30. That was the first release to feature work from Vineet Gupta, who joined the GCC-BPF team recently and is "already making the rest of us look bad" with the quality of his contributions. There is also now a BPF-specific GCC mailing list, bpf@gcc.gnu.org, and a weekly meeting on the Software Freedom Conservancy's BBB instance on Mondays. "It's fun. We have fun. If you are bored, Monday ..."
Meanwhile, GCC is now able to pass an increasing number (601 of 5488) [Faust wrote in to explain that I was mis-reading the output of the self-tests, and that it is actually 601 tests out of 713, comprising 5488 sub-tests.] of the kernel's BPF self-tests. Lots of the remaining problems apply to a large number of tests, Marchesi said, so it's a relatively small set of things to fix to make the self-tests pass. Even before that point, though, GCC does work to compile many of the simple BPF programs used by systemd. Some distributions, such as Gentoo, [Gentoo maintainer Holger Hoffstätte says that Gentoo's GCC-BPF support is optional and not the default.] use GCC as their default BPF compiler, which is great because it means that the GCC developers receive actual bug reports.
GCC also has some work-in-progress support for the variant of BPF used by Solana, a blockchain project that uses BPF for on-chain contracts. "I don't understand a lot about those things," Marchesi admitted. He also wasn't sure why they were using a modification of BPF. But, since they are, it provides an opportunity to steal some of their ideas. For example, Solana has 64-bit product, quotient, and remainder instructions that might be worth incorporating into BPF proper.
Other convenience features of GCC have also seen progress. GCC now generates line information for BPF programs, so verifier diagnostics can reference specific lines. Gupta has been working on some ABI bugs, and there are various fixes to the code-generation logic, Marchesi said. In particular, memmove() and memset() are now inlined properly.
"Compile once – run everywhere" (CO-RE) relocations, which posed a problem for GCC last year, have continued to be troublesome. Eventually, the GCC team decided to just implement the same support for pushing and popping attribute pragmas that Clang uses to support the feature. "We've had enough. So, we're going to implement those, if only for structs."
As GCC comes closer to passing the kernel's BPF self-tests, the team has also added BPF tests to GCC's test suite, Marchesi said. The BPF support in the DejaGNU testing framework (added by Piyush Raj) has been helpful for that; now, running make check in the GCC repository will automatically download and compile an appropriate kernel, run it in a virtual machine, and use it to run a selection of BPF tests. GCC developers working on other areas of the toolchain don't need to know anything about BPF in order to test it. Hopefully, this should ensure that unrelated changes to GCC don't affect the verifiability of the BPF bytecode it generates.
In response to a question from the audience, Gupta clarified that these tests are run as part of GCC's continuous-integration (CI) testing, but that they could also be part of the kernel's CI tests. The GCC tests essentially make sure that changing the compiler with a fixed kernel version doesn't break things; the kernel tests should ensure that changing the kernel with a fixed GCC version doesn't cause regressions. The two uses could share code, however, Marchesi added.
He summarized the status of all of this work with one table:
The only thing that the assembled kernel developers thought was missing from the table was the status of support for indirect calls and indirect jumps; otherwise, the summary was accurate.
CO-RE problems
At that point, Cupertino Miranda stepped up to talk more about the details of GCC's support for CO-RE relocations. In order for BPF programs to be compatible with multiple kernel versions, they need to be able to access fields in kernel structures at the correct offset, even when those fields have been moved around. CO-RE relocations record, among other things, where the program needs to be updated to account for those changes. C headers indicate which structures need these relocations emitted using the preserve_access_index attribute.
Clang propagates structural attributes to contained structures, while GCC does not. This incompatibility caused problems for GCC's CO-RE relocations. The solution is to add support for pushing and popping a compiler pragma that instructs GCC to treat every encountered structure as having the preserve_access_index attribute.
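The effect of a CO-RE relocation can be approximated in ordinary user-space C. The sketch below is only a toy model: the two structure layouts, the toy_reloc record, and the toy_relocate() "loader" step are all invented for illustration and do not reflect the real relocation format or any libbpf interface. It shows a field offset recorded at compile time being patched to match the layout of the running kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two hypothetical layouts of the same kernel structure: the one the
 * program was compiled against, and the one in the running kernel,
 * where "pid" has moved because a field was inserted before it. */
struct task_v1 { int32_t state; int32_t pid; };
struct task_v2 { int32_t state; int64_t start_time; int32_t pid; };

/* A toy relocation record: which field an access refers to, and the
 * offset that was baked in at compile time. */
struct toy_reloc {
    const char *field;
    size_t offset;
};

/* Toy "loader" step: look the field up in the running kernel's layout
 * (hard-coded here as task_v2) and patch the recorded offset.
 * Returns 0 on success, -1 if the field is unknown. */
static int toy_relocate(struct toy_reloc *r)
{
    assert(r->field != NULL);
    if (strcmp(r->field, "state") == 0)
        r->offset = offsetof(struct task_v2, state);
    else if (strcmp(r->field, "pid") == 0)
        r->offset = offsetof(struct task_v2, pid);
    else
        return -1;
    return 0;
}

/* Read a 32-bit field through a (possibly patched) offset. */
static int32_t toy_read_i32(const void *base, const struct toy_reloc *r)
{
    int32_t v;
    memcpy(&v, (const char *)base + r->offset, sizeof(v));
    return v;
}
```

A caller would create a toy_reloc holding offsetof(struct task_v1, pid), run toy_relocate() on it, and then read the field from a task_v2 object; without the relocation step, the stale offset would point at the wrong bytes.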
There was a small discussion about how to implement and merge that in accordance with the wishes of the core GCC developers, before Miranda moved on to discussing bitfields. They are, as might be expected, an additional complication for CO-RE relocations. Andrii Nakryiko explained that the kernel's networking code sometimes has fields that switch from being defined as bitfields to being defined as integers, or vice versa. Clang does not currently handle this correctly — it will generate code to extract the bitfield, but it could be at the wrong offset — which is why the networking code uses a macro to encapsulate "bitfields" in CO-RE-relocatable structures and perform the accesses manually.
Miranda agreed that implementing proper support for relocatable bitfields was tricky, and asked the assembled developers whether it was important to actually implement, if the actual code used a macro to work around the problem already. Nakryiko opined that GCC should try to generate correct code, but that it should emit a warning when a bitfield appears in a CO-RE-relocatable structure. Miranda agreed that was fine.
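Performing a bitfield access "manually" amounts to loading the storage unit that holds the field, then shifting and masking. The following is a minimal sketch of that idea, not the macro used by the networking code; the function name and its parameters are hypothetical, and it assumes a little-endian host with bit offsets counted from the least-significant bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Extract an unsigned bitfield by hand.
 *   byte_off : offset of the storage unit that holds the field
 *   byte_sz  : size of that storage unit in bytes (1 to 8)
 *   bit_off  : position of the field's lowest bit within the unit
 *   bit_sz   : width of the field in bits
 * Assumes a little-endian host. */
static uint64_t toy_read_bitfield(const void *base, size_t byte_off,
                                  size_t byte_sz, unsigned bit_off,
                                  unsigned bit_sz)
{
    uint64_t unit = 0;

    assert(byte_sz >= 1 && byte_sz <= sizeof(unit));
    assert(bit_sz >= 1 && bit_off + bit_sz <= 8 * byte_sz);

    /* Load the storage unit into the low bytes of a 64-bit value. */
    memcpy(&unit, (const char *)base + byte_off, byte_sz);

    /* Drop the bits below the field, then mask off the bits above it. */
    unit >>= bit_off;
    if (bit_sz < 64)
        unit &= (UINT64_C(1) << bit_sz) - 1;
    return unit;
}
```

If the byte offset, bit offset, and width are all supplied as relocatable values rather than compile-time constants, the same extraction keeps working when the field moves or changes width, which is the property the discussion was concerned with.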
Packed structures present some of the same problems for code generation that bitfields do. The networking code does have existing packed structures, Nakryiko said, so those also need to work, although the networking subsystem will be moving toward more selective use of packed structures in the future. There was a bit more discussion about the implementation before Miranda and Nakryiko agreed to discuss further offline.
Types and optimization
At that point David Faust got up to speak about the BTF type and declaration tags problem that the team had discussed in 2025. GCC finally has support for the same set of tags that Clang does, but that support differs slightly from Clang's. The DWARF debugging format is famously hard to extend, and in order to receive approval from the other GCC maintainers, Faust had to use a different identifier for the added tags. The poke-a-hole utility, which needs to process this debugging information as part of a kernel build, can recognize the new identifier, so it should not be a big deal, Faust said. Other than that one difference, GCC and Clang should now generate debugging information in identical formats for BPF programs.
This new support is available in GCC 16, so "we can start using [it] in anger".
The last item that the GCC team wanted to bring up was how to handle situations where an optimized build of the kernel changed the prototype of a function exposed to BPF. For example, GCC's optimizer can see when a function is only ever called with a fixed constant value in one argument, and eliminate that argument from the function. BTF relies on function signatures to allow BPF programs to find and call kernel functions, however.
That particular case is simple enough that it should be reconstructable from the DWARF debugging info, but some transformations are more complicated. For example, structures that are passed by value may have only the accessed fields passed — a transformation that DWARF cannot represent and that the upstream DWARF project is not interested in representing. GCC obviously knows what all of the relevant transformations are, Faust said, it just has nowhere to put that information so that the kernel can access it. If BTF can be extended to handle that information, and if the kernel build process can use the BTF directly generated by GCC, that would be sufficient.
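Both kinds of prototype change can be pictured with hand-written before-and-after pairs. The functions below are invented for illustration (the names and the constant 32 are assumptions); they are written by hand rather than produced by GCC, and only show how the signature seen by a caller can differ from the one in the source.

```c
#include <assert.h>

/* As written: a helper that every caller happens to invoke with
 * batch == 32. */
static int toy_batches(int pages, int batch)
{
    assert(batch > 0);
    return (pages + batch - 1) / batch;
}

/* What a constant-propagating optimizer may effectively emit instead:
 * the constant argument is gone from the signature. */
static int toy_batches_constprop(int pages)
{
    return (pages + 31) / 32;
}

struct toy_req {
    long id;
    long len;
    char tag[48];
};

/* As written: the whole structure is passed by value. */
static long toy_cost(struct toy_req req)
{
    return req.len * 2;
}

/* After a scalar-replacement style transformation: only the field
 * that is actually read gets passed. */
static long toy_cost_scalar(long len)
{
    return len * 2;
}
```

A BPF program that attaches to toy_batches() or toy_cost() based on the source-level prototype would find its arguments in the wrong places once either transformation has been applied, which is why the transformed signature needs to be recorded somewhere the kernel can read it.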
Alexei Starovoitov mentioned that when Clang had added support for directly emitting BTF, the Clang developers copied the deduplication logic from libbpf. He was worried that if GCC did the same thing there would be three slightly different, separately maintained versions of the same logic, which would be messy. Realistically, he said, only the deduplicator in libbpf really works. Nakryiko said that there were also complications introduced by trying to deduplicate both weak and non-weak BTF map definitions.
Faust also asked whether it would make sense to add kfuncs that implement common bit-manipulation compiler builtins, which are currently inlined wherever they occur. __builtin_clz(), for example, expands to around 30 BPF instructions "which is suboptimal". Nakryiko agreed that adding such kfuncs would be acceptable; accelerating common operations in BPF had actually been the motivation behind adding fast kfunc calls in the first place. He did ask that all of the bit-manipulation functions be added at once, so that they would have matching names; Faust readily agreed.
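To give a sense of why an open-coded count-leading-zeros takes a few dozen instructions on a target without a dedicated instruction, here is a generic binary-search implementation in C. It is a textbook technique and is not claimed to be the sequence GCC actually emits for BPF; the function name is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Count the leading zero bits of a non-zero 32-bit value by binary
 * search: test the top half, shift it away if empty, and repeat with
 * progressively narrower windows. As with __builtin_clz(), an input
 * of zero is not supported. */
static int toy_clz32(uint32_t x)
{
    int n = 0;

    assert(x != 0);
    if (!(x & 0xFFFF0000u)) { n += 16; x <<= 16; }
    if (!(x & 0xFF000000u)) { n += 8;  x <<= 8;  }
    if (!(x & 0xF0000000u)) { n += 4;  x <<= 4;  }
    if (!(x & 0xC0000000u)) { n += 2;  x <<= 2;  }
    if (!(x & 0x80000000u)) { n += 1; }
    return n;
}
```

Each of the five steps costs a mask, a compare-and-branch, an add, and a shift; a kfunc would replace all of that at every call site with a single call that the kernel can implement using a native instruction where one exists.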
Gupta finished up the session by explaining some of the differences between GCC's generated code and LLVM's generated code; both kinds of code are valid, but the verifier has an easier time working with LLVM's version. He plans to address part of the problem by adding a cost model for BPF so that GCC's optimizer produces more LLVM-like code, and part of it by expanding what the verifier can understand to better accommodate GCC's output. In all, GCC support for BPF seems to be coming along nicely. It is already usable for simple real-world programs, and will only become more so if more projects start using it and filing bug reports to guide the remaining work.
Support for private memory nodes
Gregory Price started his session in the memory-management track of the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit by saying that, in current kernels, if a NUMA node has memory, the assumption is that anybody can make use of it. He is trying to implement the opposite policy — to make some memory off-limits for all processes except those designed specifically to use it. The session was used to present his goals and to discuss how they might be implemented.
We are, he said, seeing an increase in devices that bring a lot of memory with them; they include various types of accelerators, specialized network interfaces, and more. He has been working specifically with devices that provide compressed RAM; that memory should not be managed in the same way as normal system RAM. There are other people working on similar problems, he said, but there is one common feature: access to a specific range of memory should be restricted in some way. Other aspects differ, but every use case ends up reimplementing part of the memory-management subsystem.
Specifically, he said that private memory must have a few attributes. The buddy allocator cannot fall back to it. Any allocations from that memory would have to be explicitly requested; the memory cannot be allocated to users otherwise. And kernel services should not touch folios placed in that memory in unexpected ways. The problem in current kernels is that the buddy allocator will fall back to specialized memory at times, and hot-plugged device memory is not exempt from that policy.
One possible solution would be to simply not add the device memory to the fallback lists. It is an easy and complete solution, he said. Allocations from that node would have to be requested explicitly. The downside is that there is no support for multi-node allocation; if there are two devices, any given call can only request memory from one of them. That makes it hard to interleave allocations across multiple private nodes, and also prevents allocation requests from falling back to ordinary RAM if the special memory is unavailable. At the same time, incidental allocations from the private nodes can still happen; there are multiple places in the kernel that use a for_each_online_node() loop to allocate memory on each available node, and that can't really be fixed.
The conclusion, he said, is that just removing private nodes from the fallback lists is a fragile solution. David Hildenbrand asked if the for_each_online_node() pattern is correct; Price answered that any pattern like that is probably broken, but it will still be widespread regardless. John Hubbard said that the code was perfect when it was written, but the world has changed around it. Price said, though, that the pattern was broken from the beginning, since it assumes uniformity among the nodes of what is, by definition, a non-uniform memory access (NUMA) system.
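The fragility can be seen in a small user-space model. Everything in the sketch below is hypothetical (the toy_node structure, the is_private marker, and all three functions); it is not kernel code. It models a fallback-list allocator that skips private nodes, next to a per-node setup loop in the style of for_each_online_node() that bypasses the fallback list entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NODES 4

struct toy_node {
    bool online;
    bool is_private;   /* hypothetical marker for device-provided memory */
    int free_pages;
};

/* Fallback-list allocation: start at the preferred node and walk the
 * others, skipping private nodes because they were never added to the
 * list. Returns the node used, or -1 if nothing is available. */
static int toy_alloc_fallback(struct toy_node *nodes, int preferred)
{
    assert(preferred >= 0 && preferred < TOY_MAX_NODES);
    for (int i = 0; i < TOY_MAX_NODES; i++) {
        int nid = (preferred + i) % TOY_MAX_NODES;

        if (!nodes[nid].online || nodes[nid].is_private)
            continue;
        if (nodes[nid].free_pages > 0) {
            nodes[nid].free_pages--;
            return nid;
        }
    }
    return -1;
}

/* Explicit allocation from exactly one node, with no fallback. */
static int toy_alloc_on_node(struct toy_node *nodes, int nid)
{
    assert(nid >= 0 && nid < TOY_MAX_NODES);
    if (!nodes[nid].online || nodes[nid].free_pages == 0)
        return -1;
    nodes[nid].free_pages--;
    return nid;
}

/* The "allocate something on every online node" pattern. It never
 * consults the fallback list, so it lands on private nodes too.
 * Returns the number of pages taken from private nodes. */
static int toy_per_node_setup(struct toy_node *nodes)
{
    int private_hits = 0;

    for (int nid = 0; nid < TOY_MAX_NODES; nid++) {
        if (!nodes[nid].online)
            continue;
        if (toy_alloc_on_node(nodes, nid) >= 0 && nodes[nid].is_private)
            private_hits++;
    }
    return private_hits;
}
```

In this model, ordinary allocations never reach the private node, but the per-node loop still takes a page from it, and there is no way to ask for "either of two private nodes, or ordinary RAM if both are full" in a single request.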
Price's preferred alternative would be to add a new allocation flag, __GFP_PRIVATE, that enables allocation from a private node. If the allocation fails, the allocator would fall back to ordinary RAM, ensuring that allocation requests succeed even if the preferred node has no available memory. Kiryl Shutsemau said that this approach was working around broken allocation patterns, and that it would be better to fix them. Price answered that he had tried, but there is a lot of code to look at; he could spend two years at it and still not be done. If __GFP_PRIVATE is not acceptable, he said, he would simply stop trying because he does not see another way to solve the problem.
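The semantics Price described can be modeled in a few lines of user-space C. This is a sketch under stated assumptions, not the proposed patch: TOY_GFP_PRIVATE stands in for the proposed __GFP_PRIVATE flag, and the node structure and allocator are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_GFP_KERNEL  0x1u
#define TOY_GFP_PRIVATE 0x2u   /* stand-in for the proposed __GFP_PRIVATE */

struct toy_mem_node {
    bool is_private;
    int free_pages;
};

/* Allocate one page, preferring node nid. A private node is eligible
 * only when the caller passes TOY_GFP_PRIVATE; if the preferred node
 * cannot satisfy the request, fall back to ordinary nodes.
 * Returns the node used, or -1 if no memory is available. */
static int toy_alloc_page(struct toy_mem_node *nodes, int nr_nodes,
                          int nid, unsigned gfp)
{
    bool may_use_private = (gfp & TOY_GFP_PRIVATE) != 0;

    assert(nid >= 0 && nid < nr_nodes);

    if ((!nodes[nid].is_private || may_use_private) &&
        nodes[nid].free_pages > 0) {
        nodes[nid].free_pages--;
        return nid;
    }

    /* Fallback path: ordinary RAM only, never another private node. */
    for (int i = 0; i < nr_nodes; i++) {
        if (nodes[i].is_private || nodes[i].free_pages == 0)
            continue;
        nodes[i].free_pages--;
        return i;
    }
    return -1;
}
```

The point of the flag in this model is that callers that do not know about private memory can never receive it, no matter which node they name, while callers that opt in still get a successful allocation when the private node is exhausted.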
Johannes Weiner suggested adding special iterators to select only nodes with a CPU or nodes with memory. Price asked how that would help with existing iterators — a list that includes every call to alloc_pages_node(). It is, he said, an impossible situation to police. Matthew Wilcox said that private-node allocation didn't look like a proper use for a GFP flag; perhaps reusing GFP_DMA, which is not really needed on current systems, would be a better approach. Price answered that he doesn't care about the details as long as he has a way to access the private node.
Hildenbrand asked what sort of code should be able to allocate from private nodes, and whether it would be possible to restrict the use of private-node memory to folios, which would simplify the problem somewhat. Price said he needs to see more users of this functionality to be able to answer that question. Jason Gunthorpe said that some sort of driver-specific handle might be a better way to request private memory than using a GFP flag, but Price said that he still needs allocations to fall back to regular memory.
Moving on, Price said that another problem that comes up is that other parts of the kernel can touch private memory in surprising ways. NUMA balancing, for example, will set the page protection to PROT_NONE to detect accesses, with "nasty results". Memory compaction may migrate the page out of private memory. Many of these problems have already been solved for ZONE_DEVICE memory; that solution can be extended by checking for private memory in the same places. The default would be to opt out of most memory-management functionality, but Price proposed adding some node attributes that would request, for example, NUMA balancing or working reclaim.
Price concluded by repeating that he needs a GFP flag, or some equivalent, or he will have to just give up on the problem. He is working on a minimum viable patch that includes a single flag to opt into NUMA balancing. He would like to add reclaim support too, but "reclaim is actually five chipmunks in a trench coat". He would like to be able to opt into the attention of some of those rodents (compaction, for example, or tiering) without getting the whole thing. Compaction would require a special callback to tell the device that a folio has moved.
At the close of the session, Shutsemau asked what would happen if special memory becomes more common; how could it be integrated back into the core memory-management subsystem? Price called that a "big question" that he didn't know the answer to; he is just trying to make some steps in the right direction.
(See also: Price's followup post on this topic.)
Custom page-cache policies with BPF
The kernel's page cache is charged with maintaining pages (or, more correctly, folios) containing copies of data from files in the filesystem; its performance has a big effect on the performance of the system as a whole. One of the key decisions the kernel must make is when to evict folios from the page cache. At the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit, Tal Zussman ran a memory-management-track session on how the page cache could be better customized for specific workloads. It will not be much of a spoiler to say that it involves BPF.
Eviction from the page cache can be managed by either the traditional least-recently-used (LRU) algorithm or the multi-generational LRU; these subsystems were discussed in detail earlier in the Summit. Zussman started by saying that there are workloads that are not served well by either of those options. As an example, he mentioned an unnamed financial database that performs a large number of small, performance-sensitive queries; there is also a set of lower-priority analytical tasks. The database will fill the page cache while handling the high-priority work but, when an analytical scan starts up, most of the data needed to satisfy those queries is pushed out. The next such query then ends up thrashing all of that data back in. The current page cache, he said, is not flexible enough to address this problem; it is leaving performance on the table.
Vlastimil Babka asked why the kernel's access-twice heuristic does not prevent this scenario. That heuristic marks data as inactive when it is first added to the LRU lists; the second access causes it to be marked as active. Inactive data is evicted first, so this heuristic keeps single-use data from pushing out more useful pages. Zussman answered that there are often multiple scans happening at the same time, fooling that heuristic, but the real problem is that the page cache lacks awareness of how the application accesses data.
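The access-twice heuristic, and the way overlapping scans can defeat it, can be illustrated with a small user-space simulation. This is a deliberately simplified sketch: TwoListCache and hot_pages_survive() are hypothetical names, the cache holds only four pages, and the kernel's real LRU lists balance and age pages in ways that are not modeled here.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy inactive/active cache; the oldest entry of each list comes first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.inactive = OrderedDict()
        self.active = OrderedDict()

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)
        elif page in self.inactive:
            # Second access: promote from inactive to active.
            del self.inactive[page]
            self.active[page] = True
        else:
            # First access: make room if needed, then start on the inactive list.
            if len(self.inactive) + len(self.active) >= self.capacity:
                self._evict_one()
            self.inactive[page] = True

    def _evict_one(self):
        # Inactive pages are evicted first; active ones only as a last resort.
        victims = self.inactive if self.inactive else self.active
        victims.popitem(last=False)

    def cached(self, page):
        return page in self.active or page in self.inactive


def hot_pages_survive(readers_per_scan_page):
    """Warm two hot pages, stream ten scan pages through, report survival."""
    cache = TwoListCache(capacity=4)
    hot = ["h0", "h1"]
    for page in hot:
        cache.access(page)
        cache.access(page)  # the second touch promotes the page
    for i in range(10):
        for _ in range(readers_per_scan_page):
            cache.access(f"s{i}")
    return all(cache.cached(page) for page in hot)
```

With a single reader per scan page, hot_pages_survive(1) returns True: the scan pages churn through the inactive list and the hot pages stay put. With two readers touching each scan page, hot_pages_survive(2) returns False, since every scan page gets its second access, is promoted, and eventually displaces the hot data.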
There are three ways that this problem might be fixed, he said. One would be to change the LRU policy implemented by the kernel, which is hard to do and would probably regress other workloads. The existing hint interfaces (such as posix_fadvise()) are not able to affect eviction policy in useful ways for many workloads. The second way to solve the problem is application-level caching, which has downsides of its own, including duplicating the caching done by the page cache. Using direct I/O can address the duplication problem, but it requires reimplementing a lot of functionality that the kernel is already providing.
So, he said, there needs to be a way to fundamentally change the underlying policy. The sched_ext subsystem makes that possible for CPU scheduling; he is proposing a similar feature, called cache_ext, for the page cache. It would allow page-cache policies to be loaded (as a BPF program) from user space, with no kernel changes, and attached to control groups. It is implemented as a struct_ops program with callbacks to inform the program when folios are added to or removed from the page cache, and when they are accessed. Another callback requests the program to evict a number of folios from the page cache.
Shakeel Butt asked whether this interface would be general enough to manage all of memory, not just the page cache; Zussman answered that he is focused on file-backed memory for now, but the interface could probably be extended. The set of hooks needed for anonymous pages would likely be different, though.
Policies, he continued, operate on eviction lists maintained by the BPF program. When a folio is added to the page cache, the program picks a list to add that folio to; at eviction time, the program chooses from whichever list it thinks is best. It is a simplistic structure, he said, but most policies can be implemented using these lists. An audience member asked if the lists could be managed in a BPF arena; Zussman said that might be possible, but that limitations within BPF make it hard.
For the financial workload described above, user space would inform the cache_ext program which applications are performing scans by storing their process IDs in a BPF map. When a folio is added to the page cache, the program checks to see if it is being added on behalf of one of those scan applications; if so, the folio is put onto a special list that is targeted first at eviction time. For this application, he said, this policy produced a 70% increase in query throughput and a big reduction in tail latency.
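The scan-aware policy described above can be sketched as a plain Python class whose methods mirror the callbacks mentioned in the session (folio added, accessed, removed, and an eviction request). This is an illustration only: the real cache_ext policies are BPF struct_ops programs, and the class name, method names, and the use of a Python set in place of a BPF map are all assumptions made for this example.

```python
from collections import OrderedDict


class ScanAwarePolicy:
    """User-space model of a two-list eviction policy; not the cache_ext API."""

    def __init__(self, scan_pids):
        # Stands in for the BPF map that user space fills with scanner PIDs.
        self.scan_pids = set(scan_pids)
        self.scan_list = OrderedDict()     # folios brought in by scans
        self.default_list = OrderedDict()  # everything else, oldest first

    def folio_added(self, folio, pid):
        target = self.scan_list if pid in self.scan_pids else self.default_list
        target[folio] = True

    def folio_accessed(self, folio):
        # Keep recently used folios at the young end of the default list.
        if folio in self.default_list:
            self.default_list.move_to_end(folio)

    def folio_removed(self, folio):
        self.scan_list.pop(folio, None)
        self.default_list.pop(folio, None)

    def evict(self, count):
        """Pick up to count victims, draining the scan list first."""
        victims = []
        for lst in (self.scan_list, self.default_list):
            while lst and len(victims) < count:
                folio, _ = lst.popitem(last=False)
                victims.append(folio)
        return victims
```

The design choice worth noting is that the policy never needs to guess which folios are disposable; user space tells it which processes are scanning, and eviction simply drains that list before touching anything the high-priority queries brought in.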
David Hildenbrand asked whether there is an access callback that can recognize a high-priority task and move the relevant folios to a new list; Zussman said that could be implemented, but is not being done now. Kiryl Shutsemau said that turning to BPF is an overreaction to the problem, and suggested that the focus should be on creating better kernel interfaces instead. Improvements to posix_fadvise(), perhaps, could set the "don't cache" flag on folios that should be evicted quickly. Matthew Wilcox, though, expressed regret that this flag, which consumes a scarce page flag, had ever been added. Had cache_ext existed before, he said, that flag would never have been necessary.
Liam Howlett expressed some dismay that cache_ext uses policies attached to control groups; Zussman said that the LRU lists are already maintained for each control group, so that is a natural place to apply the policy. In this way, different groups can have different policies. Howlett said that, if the workload changes, the application is still locked into the old policy unless it is moved to a new control group; Zussman said that the policy can be changed at any time.
Babka asked whether the eviction hook runs when reclaim is done globally, or when it happens at the control-group level; the answer was the latter. Brendan Jackman asked why the policy was being applied at the page-cache level rather than, say, to the inode cache? Zussman said that there may be a place for policies at the inode level, but that would be a much higher-level policy.
Hildenbrand asked how a control group would be transitioned to a new policy; the answer is that switching is a destructive operation — the lists that had been constructed by the old policy would be lost. It is possible to export some knobs to tune policies, which could avoid the need to replace them entirely much of the time. Hildenbrand said that, at times, the memory-management subsystem will remove specific folios from the LRU lists for a while, and asked if the policies could handle that. Zussman answered that any metadata stored in BPF maps would persist in that situation, but the list information would be lost.
The last question came from Shutsemau, who asked whether the workload will be expected to tag page-cache requests somehow for the benefit of the policy. The answer, Zussman said, depends on what the goal is. If the intent is to create a generic policy that does not understand the behavior of specific applications, there will be no use for tagging. A more application-aware policy, though, may want information from the application about what it is doing.
Zussman has posted the slides from this session.
Toward better handling of major page faults
A major page fault occurs when a process attempts to access a page that is not currently present in RAM; satisfying such faults usually involves I/O, and can thus take some time. When many threads sharing an address space are generating page faults, the result can be significant lock contention while that I/O takes place. During the memory-management track at the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit, Barry Song led a session to try, yet again, to find an enduring solution to this problem.
Song began by saying that per-VMA locks had been introduced a few years back; that work moved much of the page-fault-handling work to a lock at the virtual-memory-area (VMA) level, in an attempt to relieve pressure on the process-level mmap_lock. But, when satisfying the fault requires initiating I/O, the kernel will release the per-VMA lock then retry the handling of the fault, with mmap_lock held, once the I/O completes. That can create significant mmap_lock contention, causing threads to stall. He wanted to find ways of reducing that contention, and had a few options to consider.
The first is to simply retry the handling of the fault using the VMA lock rather than mmap_lock after I/O completion; he has posted a patch series implementing this idea. That would introduce even more complexity into the fault-handling path, though, he said.
An alternative would be to completely remove the retry code and, instead, simply hold the VMA lock while waiting for I/O. Lorenzo Stoakes worried that this approach, too, would add complexity. Shakeel Butt asked about how bad the additional complexity would be; Matthew Wilcox answered that it would not be that much worse, but that the fault-handling code is already too complex now. He said there might be a possible third option: apply Song's change to retry under the VMA lock, but only for anonymous pages, where the change is relatively simple.
Ryan Roberts said that the retry flag (the VM_FAULT_RETRY value returned when fault handling must be retried) is covering too many cases. It is used for compatibility with code that is not able to deal with the VMA lock and, as a result, retry has to use mmap_lock. Suren Baghdasaryan said that retries are called for when an operation cannot be done under the VMA lock — at least, not at the current time. There might be a place for a separate flag to call for a retry under the VMA lock. Butt asked whether the contention stalls Song had observed were associated with anonymous or file-backed pages; the answer was that the problem is mostly seen with the latter.
Song returned to the option of removing the retry code entirely. He said it would be possible, but has the potential to create priority-inversion problems. Threads running within an Android app have different priorities; in the wrong scenario, one thread holding mmap_lock could block the high-priority user-interface thread, causing visible stalls. Wilcox said that the real problem is threads waiting for I/O with the VMA locked, but Song said that the problem comes up even if the high-priority thread is not accessing the same VMA. After some discussion on whether the priority-inversion scenario was a real problem, the consensus seemed to be that it indeed is.
Song concluded with a few other discussion points, the first of which was whether it makes sense to use different approaches for anonymous and file-backed pages. In the case of anonymous pages, the kernel can allow page-fault handling and other VMA changes (an mprotect() call, for example) to happen concurrently. The file-backed side might benefit more from removing the retry logic entirely once the priority-inversion problem has been fully understood and avoided. Then, he said, there are cases where multiple threads are faulting in the same sets of pages; rather than contend on the mmap_lock and folio locks, the handler could check the up-to-date status of the folio. If it is up to date, then somebody else has already performed the I/O to handle the fault, so the retry can be avoided.
His final question was whether the kernel should retry fault handling under the VMA lock by default and only fall back to the mmap_lock in cases where it is known to be needed. Baghdasaryan repeated the idea of adding a new retry flag to indicate that the VMA lock should be held. The session ended with a suggestion from Vlastimil Babka that Song should try the various options to see how they work.
Song has posted his slides from this session.
Reviewing kernel patches with LLMs
In a plenary session at the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit, the state of patch review using large language models (LLMs) was discussed. It is a topic that has been swirling around in the kernel community for much of the year. The plenary, which was led by Roman Gushchin, Chris Mason, Josef Bacik, and Sasha Levin, resulted in quite a bit of discussion, so much that a second filesystem-track-only (though others surely sat in) slot was used to continue it later in the day.
Gushchin began with a slide depicting Bag End, labeled with "LLMs", which was a joke, he said, because whether we like it or not, they are "coming to our pleasant code". The same slide had a graph of first-time contributors from the development statistics article for Linux 7.0, which showed a sharp, roughly 50%, increase for that kernel. That is part of why he started working on Sashiko to help provide additional code review.
There are already static-analysis tools being used on kernel code and, of course, there are human reviewers as well. Sashiko sits somewhere in between those two and shares properties, both good and bad, with both. For example, the output is probabilistic, so different results will be produced each time it is run. That is like human reviewers in some ways, since maintainers and others will often spot different problems each time they review a patch set.
Another aspect that is similar to human review is that Sashiko's review is also of lower quality for large patches and patch sets. It can also be biased by the commit log. At one point, Sashiko found a bunch of problems in a patch set, but the commit log said it fixed real bugs, which the LLM reviewer accepted at face value.
He presented a report that he had generated a few days earlier which analyzed interactions between Sashiko and human reviewers. Everyone asks about the false-positive rate, which was around 10% for the 1500 email threads analyzed, but there were lots of true positives too, roughly 85%; the rest were in the gray zone, but were of relatively low value. Sashiko definitely does better on finding high-severity problems and it probably makes sense to ignore its low-severity reports. In the threads analyzed, the critical and high-severity accuracy rate was almost 97%, he said.
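To make those proportions concrete, the arithmetic can be worked through in a few lines of Python. This is a rough illustration only: it treats the approximate percentages as exact and applies them per thread, which the report itself may not do, and the function name breakdown() is invented for the example.

```python
def breakdown(threads, false_positive_pct, true_positive_pct):
    """Split a thread count by the quoted rates; the remainder is the gray zone."""
    gray_pct = 100 - false_positive_pct - true_positive_pct
    return {
        "false_positive": threads * false_positive_pct // 100,
        "true_positive": threads * true_positive_pct // 100,
        "gray_zone": threads * gray_pct // 100,
    }


# Figures quoted in the session: about 1500 threads, ~10% false, ~85% true.
print(breakdown(1500, 10, 85))
```

Under those assumptions, roughly 150 threads would be false positives, 1275 true positives, and the remaining 75 would fall in the gray zone.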
He reported that there have been 140 mentions of Sashiko in the commit messages in the kernel tree. There is no standard for attributing problems found to Sashiko, so the real number of Sashiko-found bugs is probably higher. His slides noted that the tool was launched mid-March, so those mentions had all occurred in the seven weeks since then.
There is a set of tradeoffs, which he represented with a triangle, between bug discovery, token cost, and false positives. It is easy to move around within the triangle, but it is difficult to improve all three at once. Mason asked Bacik if there was any effort toward optimizing token use at this point; Bacik said "none whatsoever".
Hannes Reinecke asked about the relationship between tokens used and bugs found. Mason said that token cost was directly related to the amount of context provided to the model, which was then correlated with the number of bugs found. So, Reinecke asked, the more context provided, the better the model's output gets? Gushchin and Mason both agreed with that.
One improvement that can be made is to run Sashiko multiple times on the same patches, Gushchin said. It will give somewhat different results that can then be aggregated and summarized to try to find the most important problems.
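One simple way to aggregate several runs is to count how many of them report each finding and keep only the ones that recur. The sketch below shows that idea; how Sashiko actually combines runs was not described in the session, so the aggregate() function, the min_runs threshold, and the matching of findings by exact string are all assumptions made for illustration.

```python
from collections import Counter


def aggregate(runs, min_runs=2):
    """Rank findings by the number of runs that reported them.

    runs is a list of runs, each a list of finding descriptions. Findings
    seen in fewer than min_runs runs are dropped as likely noise.
    """
    counts = Counter()
    for findings in runs:
        counts.update(set(findings))  # count each finding once per run
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(finding, n) for finding, n in ranked if n >= min_runs]


runs = [
    ["leak in error path", "missing unlock"],
    ["missing unlock", "style nit"],
    ["missing unlock", "leak in error path"],
]
for finding, n in aggregate(runs):
    print(f"{n}/{len(runs)} runs: {finding}")
```

A real implementation would need fuzzier matching, since a probabilistic reviewer rarely words the same problem identically twice.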
Status
Mason took over to talk about the status of the effort. Sashiko is currently running on the linux-kernel mailing list and 47 associated mailing lists ("48", Gushchin interjected) that have opted into using Sashiko. Other maintainers can coordinate with Gushchin if they want to be added to Sashiko's processing, Mason said.
But Christoph Hellwig thought that the mailing-list-centric approach was "a big part of the problem". All of the other tools can be pointed at a Git tree and a developer or maintainer can get the results from that directly. "Having to do a round trip over the mailing list to get a review every step is stupid." There is a need for a way to submit code and get reviews directly without going through the list, he said.
Mason said he had two answers for that. They can run Sashiko on Git trees, so that is one way. But, perhaps easier still: "run it yourself, on your machine, get your own tokens". He said that Anthropic is willing to give tokens to kernel maintainers and, he believes, Google is also willing to do that.
Hellwig said that the continuous-integration (CI) bots make it easy to not have to set anything up and just get feedback on a patch set; that is especially important for small-time contributors. "Currently, all the AI stuff breaks the model that we have". Chuck Lever said that he seconded Hellwig's concern; Lever has sent an 18-patch series to the mailing list seven or eight times because Sashiko keeps finding different problems each time.
Lever has access to several models, including Google Gemini that Sashiko can use, so he wanted to know how to get that set up in his lab. He wants to be able to reproduce what would have been sent to the mailing list, but to get it locally so that he can act on it before sending multiple revisions. Gushchin said that Sashiko is easy to set up: just clone it, build it, and run it, which takes roughly five minutes.
Christian Brauner said that does not necessarily solve the problem, as the systemd developers have seen. Because the output is probabilistic, it may not find anything when it is run at home, but then find things later. Bacik noted that there is the same problem with human reviewers, however. Lever said that his goal is just to reduce the number of round trips to the mailing list, so he doesn't see the non-determinism as a major problem.
Lorenzo Stoakes would like to see some way to provide direct feedback to Sashiko about its reviews; he has rather different experiences than some with regard to the signal-to-noise ratio of the reviews. Mason said that it is best to send review responses to the mailing list. The only way to figure out problems with the prompts being used for Sashiko is to see what reviewers are pushing back on. The early use of Sashiko by the BPF developers helped determine what needed to be fixed in the prompts.
Gushchin pointed out that when the LLM sends comments and people reply, both parts could be wrong. Mason agreed, but said that the conversation is useful even when there are parts that are wrong. Hellwig complained about the verbosity of the output. It is "overly human-looking language that takes a huge amount of time to parse" and turn into a technical complaint. It defaults to prose output, Mason said, but perhaps that could be tweaked.
Ted Ts'o said there needs to be a way to handle known issues in a patch set. Those issues might be solved later in the patch set or they might be the kind of problem that has been known for a decade but never rose up anyone's priority list far enough to get fixed. There need to be ways to annotate the code to say "ToDo: we know" about a problem, but are not dealing with it now, "so, review bot, don't worry about it".
Gushchin said that the first kind should already be handled by Sashiko. It looks at the end state after the whole patch set has been applied to remove anything that it found that got fixed in further patches. Mason said that he did not want people to start changing the code to appease the LLM reviewers, either. If the suggestion is for something that is not of interest, the maintainer should just delete the email and move on.
The problem for Ts'o is that he keeps getting reports of the same things over and over which fall into the "ToDo" category. Adding a comment to that effect is not inaccurate; "I will fix it, maybe in five years after I retire and am not herding cats for an AI-infrastructure project." Mason said that "ToDo" comments would be fine for Sashiko, but it should be up to the maintainer whether they want "review spam" or the comments.
David Howells wondered about how Sashiko identified the specific patches it had reviewed and also how Sashiko should be credited in commits. Gushchin said that the reviews contain the message IDs of the patches; "for giving credit ... whatever". On the other hand, Hellwig decried the trend to "over-crediting" tools. "If you use CoPilot or whatever to design something, I don't care." Ultimately, the committer is the one responsible, not the tool.
Damien Le Moal described his experience with using LLMs. The LLM did find bugs, he said, one that was valid and one that was "pure and utter crap". The reason for the latter is that it was dealing with hardware, so the context is not just the code, but also the specifications. The LLM may be logical, but the specifications disagree. He is interested in using Sashiko, but is worried because "someone is going to have to double-check absolutely everything".
Prompt-file location?
Mason said that provided a nice segue to his next topic, which is what should happen with the kernel review prompts that he has been shepherding. Sashiko is aimed at mailing lists and maintainers, while the review prompts are more suited to use for interactive development. That is how he and Gushchin are dividing their efforts.
The review prompts have two parts, Mason said. The first is a set of prompts explaining "how to review", which is shared with Sashiko. The other is a set of subsystem-specific knowledge and guidelines that can be used as context by the LLMs. Brauner asked if adding more context degraded the output; Mason agreed that it did: "it just needs to be fixed", he said to laughter.
Bacik said that additional context does not actually degrade the model, but that giving it instructions can. He will often tell a bot encountering a new area of the code to generate context about it, which is helpful for token efficiency and improves the quality of the output. Newer models are getting better at this, he said.
Le Moal said that he liked the idea of adding more context, such as the hardware specifications, but they are contained in many large PDF files. Some of those are behind paywalls, Brauner added, though Le Moal thought there are not all that many of those. Le Moal wondered how that ties in with token cost.
Gushchin said that having a database of publicly accessible specifications would be great. Mason said that "we'll need to do a lot of indexing" of the PDFs, so that the models can use them. "And by 'we', I mean you", he said with a grin.
All of the prompts currently live in his repository, Mason said. "I really doubt that you want me to be the arbiter" of the content of those files; he thinks they should be added to the kernel itself. He does not know where they should go in the kernel, nor does he care. Brauner complained, laughingly, that Mason had waited until the end of the session to bring up the controversial part. It was agreed that another session would be allocated to continue.
Kernel documentation maintainer Jonathan Corbet said that the prompts looked like "a whole lot of very useful documentation on how to understand and review kernel patches", though it is "really sad that we couldn't write it until we were writing it for a machine". He wondered what that kind of documentation might have enabled had it been added to the kernel long before now.
He has heard concerns that systems like Sashiko remove the "bottom rung for beginning developers" who want to learn by reading patches, since that work will already have been done for them. The prompts are useful documentation that belongs in the kernel, he said; maybe some of those developers will read and use it, which could help to restore the bottom rung for them a bit. Amir Goldstein asked if Corbet would review patches to add the prompts, which he agreed to do.
Mason noted that Sashiko can review documentation patches, as well, of course. Meanwhile, there is a large backlog of bugs that have been found by LLMs that need to be triaged and fixed. Goldstein pointed out that with LLM assistance, people can now generate bug reports that look genuine, but sometimes turn out to be bogus. The tools can be used to help winnow out the good reports from the bad.
The requirements for security reports to be considered should be raised, he said, because researchers now have the tools and means to explain the problem better. Randy Jennings asked if Goldstein was asking for the LLM to build a "bug recommendation list". Goldstein said he just wanted explanations that described and reasoned about the severity of the problem, which could be used to justify spending human time on it.
The majority of the bugs are not security bugs, Mason said. Security researchers are justifiably excited by what the models can find, "but I think we need to treat them like bugs". Brauner suggested that an LLM-based triaging effort would be useful; feeding the LLM reports to another LLM for double-checking would help to reduce problems. He has seen reports that look reasonable but are "actually bullshit" multiple times, so reducing that problem is important.
Round 2
Levin began the overflow session by returning to the prompt files. The subsystem-specific prompts will have different kinds of information about the code base and the subsystem's policies, but where those should live in the kernel tree has been somewhat controversial. The policies would cover how the LLM should review the code, how it should deliver its output, and so on. So he was curious to hear what attendees thought about how the prompts should get integrated with the kernel tree, which would allow the maintainers and developers of those subsystems to better control Sashiko on their own without requiring Mason or others to make changes.
Sashiko is more than just prompts, Gushchin said; there is Rust code that controls how those prompts are used, which should not belong to the kernel. But there are various subsystem-specific rules, such as reverse-Christmas-tree declarations, that do need to be under the control of the developers, Mason said. Those kinds of things should not be embodied in the Rust code.
Howells said that he had done some experiments with Sashiko on patches for the Network Filesystem Services Library (netfslib); he created some context prompts describing some of its internals. He thinks that information will need to be in the kernel so that he and others can change it as needed. Levin said that he had been working on getting Sashiko to review backports to stable trees and also needed to provide extra prompts in order to have it focus on the feedback he was looking for.
Brauner asked if there was a way to satisfy Corbet's thoughts about getting the prompts into the documentation, while also allowing subsystems to make their own changes. Ts'o said that he thinks there are certain "high-level concepts that very clearly belong in the documentation", but that there is subsystem-specific information that belongs in the C files so that humans can see and maintain it. The problem is that the information needs to be gathered from the C files so that the bots can use it without necessarily having to read all of the code.
Gushchin said that Sashiko is already good at figuring out most of what it needs to know for reviewing patches, but it cannot necessarily pick out style requirements, such as declaration ordering or comment formatting. Mason suggested that there needed to be compact definitions for the knowledge around spinlocks, say, and when to use them. That will help address the token-efficiency concerns that Ts'o mentioned. The subsystem-specific information will also help newcomers, Levin said.
There was some unfocused (and hard to follow) discussion of what the prompts should actually contain. One attendee complained about the commands in all caps that he saw in earlier versions of the prompts. That style did not lend itself well to documentation, but it was agreed that there was no longer a requirement to do it that way.
Gushchin said that Sashiko works in stages, so different kinds of prompts and context will be appropriate for each stage. There are stages to check for locking issues, resource-management problems, and so on, so it would be nice to structure the subsystem-specific guidance with those stages in mind. Levin said that breaking up that information makes sense for both bots and human readers.
Goldstein noted that maintaining and reviewing prompts is an area that the current LLM-herding developers (e.g. the four leading the session) can hopefully participate in. "You're the experts because you've learned how to tame the agents." Sashiko's credibility comes from the developers who are behind it, which could be lost if Mason and others were to stop maintaining the prompts.
Mason said there may be some truth to that, but "you don't want me to maintain a prompt that explains how overlayfs locking works" because he is not qualified. What he wants to do is to give maintainers a way to ensure that the prompt information is correct. He suggested that he and Gushchin could then help to turn that into working prompt language. Ts'o noted that it is similar to what goes on for documentation already, where the maintainer describes the locking hierarchy, say, and someone with better documentation skills cleans up that description for inclusion in the kernel.
Mason said that the overall idea of maintaining the files with the kernel was not really the controversial part. That would come when they suggested a particular layout for the files, names, and so on. The way forward is to propose something and see how it works for the community.
Restrictions?
An attendee asked about restricting the review to only consider bugs in the patch set itself, rather than looking at the overall code and reporting on other bugs, some of which it has reported multiple times already. It can be annoying and will only get more so, he said, if there is no way to somehow turn them off. Gushchin said he had some ideas on that, but that it may be hard to do well. Developers also get review comments from humans on parts of the code that are not directly related to their patch set at times. Mason said that acknowledging that there is a bug may help other reviewers skip past it or perhaps fix it, but it may be hard to stop the LLM from reporting it.
Bacik said that currently Sashiko does its analysis multiple times and tries to use its estimation of the severity to filter some of its results. If another version of the patch set is submitted, the diffs between the patch sets are considered to try to reduce the nit-picking reports that developers tire of quickly. As had others, Levin pointed out that it is not uncommon for the same human reviewer to come up with more and different problems on subsequent patch sets even if the changes are minimal. It is, at least in part, a review problem, not just for LLMs.
There are some things that kernel developers could be doing to help both human and LLM reviewers, Ts'o said. When long patch series are posted frequently, all reviewers have to look at the full series, which is wearying for humans (and results in less accurate results for LLMs). If, instead, developers reply to review comments with a change they will make for the next version, with specificity and code, it reduces the review burden for all reviewers. Making changes like that will be far less controversial than changing development practices simply to accommodate LLMs.
Mason did not disagree with anything Ts'o said, "but I'm
going to call it out of scope to fix lkml", he said to laughter.
For big patches, a cap can be applied so that they do not overwhelm the
token budget. But, he noted, there are parts of the kernel that are so
critical that they want
to apply the maximum token budget each time code in those areas changes.
Gushchin said that he was surprised at the number of pre-existing bugs that are being found by Sashiko in the kernel, separate from the changes proposed in the patches under review. He is planning to build a kind of a database of these bugs so that interested developers can review them and hopefully fix those that are relevant. Maintainers will be able to access a per-subsystem list of open bugs in order to evaluate them.
Brauner likened it to the list of bugs maintained by the syzkaller project, though those mostly just pile up without being fixed. There was some quick mention of soliciting patches from LLMs to fix all of these problems, but it seemed clear that there was a fair amount of discomfort with that—at least for now. At that point, the session was out of time for the second time.
[I would like to apologize for any errors here. The acoustics in the room were problematic for both hearing and recording. Misunderstanding and misidentification may have resulted.]
Tier-aware memory-controller limits
Joshua Hahn began his session in the memory-management track of the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit by saying that the memory controller for control groups is intended to provide resource allocation, accounting, and protection from interference by other tasks. But it was not really designed for tiered-memory systems; he is looking for a way to improve that situation.
Tiered-memory systems have two or more classes of memory, each of which has
different performance characteristics. The system's RAM is relatively
fast, but also relatively scarce, while plentiful memory on a CXL device
may be slower. The memory controller, though, has no awareness of these
differences, with the result that two control groups running the same
workload under the same policies may perform differently. Without control
over the physical placement of memory, the memory controller is not able to
provide a consistent execution environment.
Hahn asked, rhetorically, whether anybody cares about this problem. There are, naturally, a number of situations where people do care. Latency-sensitive workloads will suffer if they are relegated to slower memory. Hosting services want to provide fairness across all of their tenants. And, in general, performance should be predictable. It is not possible, for example, to measure performance gains from other work if the execution time of the workload is inherently variable.
The proposed solution is to add a new memory-controller knob, memory_tiered_limits, that would enable tier awareness. Another set of knobs, memory.toptier_min, memory.toptier_low, memory.toptier_high and memory.toptier_max, would regulate how much memory the group is entitled to in the top memory tier, and the maximum amount that it can use. When, for example, usage reaches the memory.toptier_high value, reclaim on that tier would be triggered for that control group.
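As a rough illustration of the behavior described for the upper two knobs, the following Python sketch models the decision at a given top-tier usage. It is not kernel code: the class and function names are hypothetical, and the handling of memory.toptier_max is an assumption made by analogy with the existing memory.max knob; only the reclaim-at-toptier_high behavior comes from the talk.

```python
from dataclasses import dataclass


@dataclass
class TopTierLimits:
    """Hypothetical stand-in for a group's top-tier settings, in bytes."""
    high_bytes: int  # models memory.toptier_high
    max_bytes: int   # models memory.toptier_max


def toptier_action(usage: int, limits: TopTierLimits) -> str:
    """Return what this model does at a given top-tier usage."""
    if usage >= limits.max_bytes:
        # Assumption (by analogy with memory.max): no further top-tier growth.
        return "at-limit"
    if usage >= limits.high_bytes:
        # From the talk: reaching toptier_high triggers reclaim on that tier.
        return "reclaim"
    return "none"


GiB = 1 << 30
limits = TopTierLimits(high_bytes=4 * GiB, max_bytes=6 * GiB)
actions = [toptier_action(n * GiB, limits) for n in (2, 4, 7)]
```

With these made-up limits, a group using 2GB of top-tier memory is left alone, one at 4GB sees reclaim on that tier, and one at 7GB is past its maximum.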
This scheme, he said, yields more consistent results on tiered-memory systems for a variety of workloads. It can also improve throughput overall; distributing a workload properly across tiers can maximize the use of the available memory bandwidth for all of those tiers.
There are some questions still to be answered, he said. Should reclaim always be triggered when top-tier usage hits the watermark, or should it only happen when there is memory pressure? He suggested that, if there is still top-level RAM available in the system, it might not make sense to limit its use. On the user-interface side, there is the question of how much tuning of tier ratios the user should be allowed to do. Promotion of folios from slower to faster tiers is a longstanding problem with the tiered-memory concept, due to the difficulty of efficiently determining which folios are the right ones to promote. It is a problem here too; Hahn would like to find an efficient solution at the control-group level.
A member of the audience asked whether it was possible to expand this mechanism to more than two tiers. He expressed concern that the toptier name could not be changed, leading to problems where the top tier is some sort of scarce, high-bandwidth memory rather than regular system RAM. Hahn said that he would like to see such a system before trying to adapt the memory controller to it.
Another participant asked about fallback. Within the kernel, if an attempt to allocate top-tier memory fails, the allocator will fall back to slower memory. With this proposal, instead, the controller would trigger reclaim on the control group if the top-tier limit has been reached. Hahn agreed that this is a change in behavior, but said that is a result of introducing a new concept — tiered memory — to the memory controller.
The session wound down with an unfocused discussion on the difficulties of tracking memory used by the networking subsystem in general.
Better automatic management of transparent huge pages
Huge pages can improve performance by increasing translation lookaside buffer (TLB) utilization and reducing memory-management overhead. Transparent huge pages (THPs) are supposed to make huge-page usage, well, transparent, Nico Pache said at the beginning of his session in the memory-management track of the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit. That transparency has never worked as well as many would like; he has been working on improvements to make it easier for applications to use huge pages on Linux systems. A following session, led by David Hildenbrand, was focused on how THPs could be taken away from processes that are not using them fully.
Making THPs more transparent
The only way to have the system truly allocate huge pages transparently, he
said, is to set the appropriate option to always in sysfs (see Documentation/admin-guide/mm/transhuge.rst
for details on how these options work), but the implementation of that mode
is not optimal. If a process touches a single byte it will get a 2MB
huge page that may never be utilized to any great extent. He has, in the
past, suggested a defer
mode that would leave the entire task of creating THPs to the
khugepaged kernel thread, which assembles them after the fact when
it looks like they would improve performance. The consensus seemed to be
that this mode wasn't the right solution, though; the memory-management
developers would rather see an auto mode that makes better
decisions.
Pache's goal is to define that automatic mechanism; he suggested that it should behave like always when memory usage is below a threshold, and like defer otherwise. It would combine khugepaged (to create huge pages after allocation) and the "underutilized" shrinker (to split them apart when they are not being fully used). When the system is in the always mode (below the memory-use threshold) khugepaged would be actively trying to find candidates for promotion to a huge page. In defer mode, instead, the shrinker would look for huge pages to break apart again.
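The mode selection described above can be sketched in a few lines of Python. This is only a model of the idea as presented: the function names and the 80% threshold are hypothetical, and the talk did not say how memory usage would actually be measured.

```python
def auto_thp_mode(used: int, total: int, threshold: float = 0.8) -> str:
    """Pick the existing mode that the proposed auto mode would mimic.

    The threshold value is an assumption for illustration only.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return "always" if used / total < threshold else "defer"


# Which mechanism does the active work in each mode, per the description.
ACTIVE_WORKER = {
    "always": "khugepaged",            # looks for promotion candidates
    "defer": "underutilized shrinker", # looks for huge pages to split
}


def describe(used: int, total: int) -> tuple:
    """Return the (mode, active worker) pair for a usage level."""
    mode = auto_thp_mode(used, total)
    return mode, ACTIVE_WORKER[mode]
```

For example, describe(10, 100) yields the always mode with khugepaged doing the work, while describe(90, 100) yields the defer mode with the shrinker active.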
Johannes Weiner commented that, if the shrinker has to run, then user space is doing something wrong. The shrinker is not great, he said, but perhaps good enough for now, especially if it can carry the system forward until user-space allocators improve. Hildenbrand wondered if there could be better ways to tell the kernel about a process's memory-use granularity; perhaps there might be a case for some sort of BPF interface.
Another audience member asked how the kernel might help user-space allocators make better placement decisions. Tal Zussman answered that the first step is to define what the actual goal is. At allocation time, the kernel has no information on how the memory will be used, so an interface allowing user space to provide hints might help. The more interesting problem, he said, is khugepaged, which would benefit from information that would allow it to optimally organize memory into multi-size THPs (mTHPs).
Matthew Wilcox said that the underutilized shrinker works by looking for base pages filled with zeroes, which is an indication that the memory was never used. It will miss memory that was used once and never touched thereafter. He said there could be benefit to adding an madvise() operation to tell the kernel that a given range of memory is not being used. There would have to be a companion "I'm using that memory again" operation as well.
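The zero-page heuristic that Wilcox described can be modeled in user space as follows. This Python sketch is an illustration rather than the kernel's implementation; the function names and the split threshold are hypothetical.

```python
BASE_PAGE = 4 * 1024          # 4KB base page
HUGE_PAGE = 2 * 1024 * 1024   # 2MB PMD-size huge page on x86


def count_zero_pages(buf) -> int:
    """Count the base pages in buf that contain only zero bytes."""
    zero = bytes(BASE_PAGE)
    return sum(
        1
        for off in range(0, len(buf), BASE_PAGE)
        if buf[off:off + BASE_PAGE] == zero
    )


def looks_underutilized(buf, zero_threshold: int = 256) -> bool:
    """Hypothetical policy: too many zero-filled pages suggests a split."""
    return count_zero_pages(buf) > zero_threshold


# A process that touched a single byte of its huge page:
barely_used = bytearray(HUGE_PAGE)
barely_used[0] = 1

# A page written once in full and never touched again still looks "used",
# which is the blind spot that Wilcox pointed out.
written_once = bytes([1]) * HUGE_PAGE
```

Here barely_used has 511 of its 512 base pages still zero-filled and would be flagged, while written_once has none and would be left alone regardless of whether it is ever accessed again.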
Liam Howlett said that, while hinting interfaces are good, once memory use crosses a threshold, it would be better to just assume that THPs will be useful. Lorenzo Stoakes, though, said that the existence of various hinting interfaces shows that, at times, it is better to be a bit less transparent. Hildenbrand said that the problem with hinting interfaces is that the kernel does not always have a place to store those hints. Wilcox added that the problem with automatic thresholds is that, on Linux, the page cache is always full, so memory always seems to be in use. Hildenbrand suggested using the pressure-stall information generated by the kernel; Weiner said that the refault rate could also be used.
Ryan Roberts suggested increasing the mTHP size for each virtual memory area (VMA) over time as usage indicates, but Weiner said that approach assumes that the process will be running for a long time. Hildenbrand again suggested some sort of BPF interface, but Pache repeated that the hope is to make THP use as transparent as possible. John Hubbard said that hinting interfaces can be useful, since user space knows more about what it is doing than the kernel does; BPF might be overkill, though. Weiner said that applications often know less about themselves than one might assume; if nothing else, they often incorporate libraries that may do surprising things. It can be good to make hinting possible, he said, but there still needs to be a solution that works out of the box.
The conversation became increasingly unfocused as time ran down. Hildenbrand suggested that the best solution would be to fix the hardware to work better with small pages. Another audience member said that, with current memory-price trends, nobody will be able to afford a 2MB huge page next year anyway. A final suggestion was to just use the always mode, coupled with a smarter shrinker.
Better splitting of underutilized huge pages
Later that afternoon, Hildenbrand started a separate session by thanking the organizers (with a smile) for putting this session at the very end of the schedule, when everybody was exhausted. It was, he said, a good topic to finish with, since he didn't have a lot of answers; developers could ponder on it as they headed home. Tired or not, the participants managed to hold a lively discussion on how the kernel might do a better job of splitting apart huge pages when it turns out that they are not being fully used.
Folio splitting, he said, can be problematic. If a process unmaps some of the base pages from one of its THPs, the kernel may want to split that THP apart, but not necessarily right away. Splitting may not be possible in situations where the needed lock is not available, but splitting may also be undesirable from a performance standpoint. So the kernel defers the splitting by adding the THP to the "deferred split" list for further handling.
Meanwhile, the kernel wants to find other huge pages that might need to be split, even if they are fully mapped. To that end, when a PMD-level (2MB on x86) THP is allocated, it is automatically added to the split list. Eventually the underutilized shrinker will come along, scan the THP for zero-filled pages, and possibly split it if the page appears to not be fully used.
There are some problems with the current implementation, he said, some of which are more theoretical than others. One comes about when partially unmapping a THP that is split across VMAs; that will cause the THP to be added to the deferred-split list. Another problem comes about when partially mapped THPs are not detected as such. If a process unmaps an entire folio, frees part of it, then remaps the folio, the partial mapping will not be detected.
Currently, all PMD-level THPs are added to the deferred-split list, as described above. That is not a huge problem, he said, since there aren't many of those pages in the system. In a world where mTHPs are more heavily used, though, adding them to the list could make the list far too long. The fact that there is no priority associated with placement on that list does not help; the kernel makes no distinction between THP sizes, and no distinction between underutilized and partially mapped THPs, when considering a split. There are no LRU semantics either.
All of this, he said, drives a feeling that things should be done
differently. He had an idea toward that goal, though it was "a bad
one". All THPs would be added to a list when they are created, and
kept there. The result would be a long list, but one that is not often
updated. Occasionally, the kernel would scan this list. When it finds a
partially mapped page, it will attempt to split it. The utilization scan
(looking for zero-filled pages) would be done and, if the page looks like
it is not fully used, the kernel would again try to split it. If the page
might be partially unmapped, a reverse-mapping scan would be done to
see if, once again, it should be split. Otherwise the page would be
skipped over.
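The order of checks in that proposed scan can be written down as a small decision function. The Python sketch below is a model of the description only: the field names, the threshold, and the idea of passing in the reverse-mapping result as a precomputed flag are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class ThpState:
    """Hypothetical summary of what a scan knows about one THP."""
    known_partially_mapped: bool
    zero_filled_pages: int
    maybe_partially_unmapped: bool
    rmap_says_partial: bool  # what a reverse-mapping walk would report


def scan_decision(thp: ThpState, zero_threshold: int = 256) -> str:
    """Apply the checks in the order given in the description."""
    if thp.known_partially_mapped:
        return "split (partially mapped)"
    if thp.zero_filled_pages > zero_threshold:
        return "split (underutilized)"
    if thp.maybe_partially_unmapped and thp.rmap_says_partial:
        # The reverse-mapping walk is only done when there is a suspicion.
        return "split (found by reverse-mapping scan)"
    return "skip"
```

A fully used, fully mapped THP falls through every check and is skipped, which is the common case that makes a long but rarely updated list tolerable in this scheme.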
That scheme, he said, would result in the underutilized shrinker having to process a lot more huge pages. Rather than maintain a separate list, perhaps the existing LRU lists for anonymous pages could be used. Weiner said that using the anonymous LRU could end up creating a lot more I/O before finding a page that could be split; in the past, he has been unable to find a way to integrate the two lists without causing performance regressions.
Hildenbrand answered that the shrinker would not be handling reclaim, just the huge-page maintenance. Wilcox said that integration with the LRU would cause the shrinker to scan the oldest folios in the system — the ones that are about to be swapped out anyway. Perhaps, he said, the solution is just to split THPs when they are swapped out and get rid of the shrinker entirely. Scanning for zero-filled pages at swap-out time could help to reduce I/O rates as well. Hildenbrand said that could be problematic for systems that do not have swapping enabled.
As the session (and the conference) wound down, Hildenbrand asked if there were any better ideas to be had. The resulting silence, he said (again with a smile), should be interpreted as the group having no objections to his proposal.
Hildenbrand has posted the slides from this session.
Further progress toward removing the page map count
David Hildenbrand has been working for some time to get rid of the mapcount field of struct page. At the 2026 Linux Storage, Filesystem, Memory Management, and BPF Summit, he was clearly feeling like he was getting close to that goal; he described some plans and future challenges in a memory-management-track session.
The mapcount field was created to track the number of mappings (page-table entries) that refer to the given page. Among other things, a mapcount of zero means that the page has no references and can be reclaimed. Maintaining mapcount has become increasingly challenging and expensive as the memory-management system has grown in complexity, so Hildenbrand has been looking for ways to get rid of it. This session was, he said, maybe one of the last times he will have to bring up this topic.
Since the 6.15 release, the kernel has had the NO_PAGE_MAPCOUNT configuration option, which enables the code that is being developed to eliminate the use of the mapcount field. It is marked experimental because, along with the fact that it is indeed experimental code, it makes some of the accounting data less precise; whether that will create problems for user space is not yet clear. Since that option was added, progress has been made on a number of fronts, and no problems have been reported. But, since the option is marked experimental, he said, nobody is testing it, so the lack of problem reports is only so comforting.
Next, he plans to properly support folios larger than the PMD size (2MB on x86). The code actually handles PUD-size folios now as a sort of special case, but no sizes between the two are supported. He also wants to support mapping a large folio as an arbitrary collection of page sizes; a 1GB PUD-size huge page might be mapped as a combination of PMD-size and 4KB base pages, for example. Once that is working properly, it will be time to remove the experimental marker from the configuration option.
He sent a patch set in April with the arbitrary-size support and removal of mapcount. That work adds a new field, _total_pages_mapped, to the folio structure that counts all base pages once for each time they are mapped; a PMD-level mapping would increase this count by 512, for example. Accounting for mappings in this way makes some statistics imprecise, he said, but he doesn't know if anybody cares about it.
If a folio has even one base page mapped, that folio is counted as fully mapped in this new field; mapping a single base page out of a PMD-size folio will, once again, increase _total_pages_mapped by 512. This accounting does not change how the resident-set size is calculated, though. How the new code is able to answer questions about folios does change a bit. Some questions, such as whether a folio has any pages mapped, the total number of mappings it has, whether it has unexpected references, and whether it is mapped shared or exclusive are easily answered with the new count. On the other hand, the kernel cannot give a definitive answer to whether an anonymous folio is partially mapped.
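The arithmetic behind the figure of 512 can be checked with a short Python model. It covers only the two cases mentioned above (a PMD-level mapping, and a single base page mapped out of a PMD-size folio); how repeated page-table entries for the same folio accumulate was not described, so it is not modeled. The class and method names are hypothetical.

```python
BASE_PAGE = 4 * 1024
PMD_SIZE = 2 * 1024 * 1024
PAGES_PER_PMD = PMD_SIZE // BASE_PAGE  # 512 base pages in a 2MB folio


class FolioModel:
    """Toy model of a folio carrying a _total_pages_mapped-style counter."""

    def __init__(self, nr_pages: int):
        self.nr_pages = nr_pages
        self.total_pages_mapped = 0

    def add_mapping(self) -> None:
        # Whether the whole folio or a single base page is mapped, the
        # folio is counted as fully mapped, per the description.
        self.total_pages_mapped += self.nr_pages

    def is_mapped(self) -> bool:
        # "Does this folio have any pages mapped?" is easy to answer.
        return self.total_pages_mapped > 0


folio = FolioModel(PAGES_PER_PMD)
folio.add_mapping()  # a PMD-level mapping
after_pmd = folio.total_pages_mapped
folio.add_mapping()  # a single base page mapped elsewhere
after_partial = folio.total_pages_mapped
```

Both mappings add the same amount to the counter, which is why the counter alone cannot say whether the folio is partially mapped.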
One place where this could be a problem, he said, is in the /proc/PID/pagemap file, which has a field indicating how many processes a given page is mapped into. With Hildenbrand's changes, this field might mark an exclusively mapped page as being shared in some situations. It was, he said, a mistake to have ever exported that field. The proportional-share fields (Pss and Pss_Dirty) in /proc/PID/smaps become less precise, as do /proc/PID/numa_maps and /proc/kpagecount. He does not think that anybody will care about these cases, which only come about for partially mapped folios. An audience member asserted that the Pss value was broken in any case, since processes are able to influence it.
Hildenbrand concluded by saying that he would like to make the NO_PAGE_MAPCOUNT option the default in the near future. Kiryl Shutsemau suggested just trying it, perhaps with a longer-than-usual trial period in linux-next, to see what breaks.
MOT: a tool to fight openwashing in AI
Many large language models (LLMs) are described as open source, but if one looks a bit deeper it turns out that is not actually so; the model may be free to download, it may be "open weight", but it does not fit the Open Source Initiative (OSI) Open Source Definition (OSD). Assessing the actual openness of models is not easy, as Arnaud Le Hors explained in his talk about the Model Openness Tool (MOT) at Open Source Summit North America 2026. The tool is designed to help users of LLMs understand to what degree a model is (or is not) open, and to combat the openwashing that is prevalent with LLMs.
The problem
Le Hors began by asking the audience a rhetorical question,
"do you think that all the models that are on Hugging Face are open
source? Are they even open models?" Hugging Face, of course, is a
popular site for sharing and downloading LLMs, data sets, and
applications for working with them.
Much of what is available on Hugging Face, he said, falls short
of the basic requirements of an open-source license. Many vendors or
projects are creating their own licenses for models. Le Hors said
that this was not unlike the early days of open source; that created
"a lot of chaos", which led to the creation of OSI and its
definition of open source. "Now, many years later, we're seeing a
similar type of challenge with 'open' AI."
The models are often described as open-source, or just open, which
causes many problems. He said that, in fact, "there are a lot of restrictions
associated with the licenses under which they are made
available". For example, some licenses try to limit the number of
users or try to place restrictions on the types of use: "They can
say, well, you can use my model, but not for military use." That
kind of limitation may be well-intended, but a license with use
restrictions still falls short of being open source.
People believe that if something is on Hugging Face, they can
simply download it and do whatever they want with it. He said that
those users may be infringing on the licenses and taking a legal risk.
Worse, some users download a model, do their own fine-tuning,
and then republish the model under a different license. This would be
the equivalent of downloading software under the GPL and then
republishing it under the Apache License. "You just
can't. Legally, it's not allowed."
Model Openness Framework
Le Hors said that those were the kind of problems that the Generative AI Commons working group of the Linux Foundation's AI & Data Foundation has been trying to solve with the Model Openness Framework (MOF). One might wonder, what about the OSI's Open Source AI Definition (OSAID)? He did not address the OSAID during the talk, but it could be because the work on MOF was underway separately from OSAID and a final version was introduced in April 2024, while OSI was still working on OSAID, which was not finalized until October 2024.
The MOF provides a structure for
evaluating machine-learning models and provides a framework for
describing how open (or not) a model actually is. The specification
sets up a tiered system with three classes that represent
"ascending levels of model completeness and openness", with a
Class III ("Open Model") being the least open and a Class I ("Open
Science Model") being the most open because it not only allows
distribution and tuning, but also enables others to study how the
model was created as well as the data used to train it. If a model's
terms are too restrictive, it does not receive a classification at
all.
According to the specification, a Class III model would allow
fine tuning of a model, unrestricted usage, and creation of a product
or service based on the model. To meet the Class II definition, a
model would also need to include supporting libraries and tools,
inference code, evaluation code, as well as code for training the
model. A Class I model would have all the components included with
the previous classes, as well as a research paper that explains the
model, the components that would be needed to reproduce a similar
model, and the training data "used for any form of model
training" that users could examine.
Openness, he said, has to do with the license a model and its artifacts are provided under, while completeness refers to what is included with the distribution. The framework covers 17 components that fall into three categories: code, data, and documentation.
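The tiering described above can be approximated with a small classifier. This Python sketch is a simplification for illustration and not the MOF's or the MOT's actual logic: the component labels are informal names taken from this article rather than the specification's 17 components, a single boolean stands in for the licensing checks, and Class III's own component requirements are not modeled.

```python
from typing import Optional, Set

# Informal labels based on the class descriptions in this article.
CLASS_II_EXTRAS = {
    "supporting libraries and tools",
    "inference code",
    "evaluation code",
    "training code",
}
CLASS_I_EXTRAS = {"research paper", "training data"}


def classify(openly_licensed: bool, components: Set[str]) -> Optional[str]:
    """Return a class name, or None if the terms are too restrictive."""
    if not openly_licensed:
        return None
    if not CLASS_II_EXTRAS <= components:
        return "Class III"
    if not CLASS_I_EXTRAS <= components:
        return "Class II"
    return "Class I"
```

Under this model, an openly licensed release with no supporting code lands in Class III, adding the four code components moves it to Class II, and adding the paper and training data on top of that reaches Class I.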
For example, code might include the model's architecture and
training code; data would include the model parameters and training
data sets; documentation includes Hugging Face
model cards, technical reports, research papers about the model,
and so forth. He did not go through each of the separate components,
but said that "every component must have an open license" that
is based on the principle of open-source software according to the
OSI. See slide 5 in his presentation
for a graphic that lists all 17 components.
A lot of the licenses, he said, fall short of addressing the
specifics of the different types of artifacts. For example, there are
not many licenses that are specifically designed to cover data. There is
a license, OpenMDW, that is meant to
cover machine-learning models and all of their artifacts, "but it's
not generally used yet." There is a blog
post that goes into detail about the OpenMDW license and the
intent behind it.
Model Openness Tool
The MOT, "which is really what I want to talk about today",
is an online registry and tool for classifying models. Many of the
models listed on the registry, he said, don't even qualify as Class
III because they do not have an open license at all.
The site has a list of models that have been submitted; it displays each model's classification, as well as information about the model, such as links to Hugging Face and GitHub with model resources, the organization supplying the model, etc. The information on the site is taken from YAML files in the MOT GitHub repository.
Le Hors spent some time showing off the site, exploring the model pages, looking at the YAML syntax for the model information, and so forth. He demonstrated the model evaluation form, which takes user input about a model and then provides a classification for the model. As an example, a user might put in all of the available data about a model and receive an evaluation that indicates the model only meets the criteria for Class III. They could then submit the model to be included on the MOT site as-is or make changes to the documentation, license, and so forth to improve the score. Once they are satisfied, they can either sign into MOT with their GitHub account and send a submission directly from the site, or download the YAML file and manually create a pull request. The documentation for the process is fairly comprehensive.
He used the Aquila-VL-2B
model from the Beijing
Academy of Artificial Intelligence (BAAI) as an example. He said that BAAI had
originally submitted a model that "completely failed to
qualify", and then spent time working to have a completely open
model. "They came up with a new version that actually qualifies,
and they did a stellar job at filling out the record."
Other topics
Once he had finished with the demo, Le Hors said that he wanted to talk about some other work that the Generative AI Commons working group had been engaged in. The group has been working on a Responsible Generative AI Framework (RGAF) as part of its Responsible AI effort. He did not go into details, but invited the audience to look into it; there is a blog post about RGAF from March 2025, and version 0.9 of the document is available.
He also mentioned that the commons had started an exploration
working group within the past few weeks that is meant to "be a really
open space for people to come and discuss and explore different topics
related to generative AI or agentic AI". He invited anyone who
might be interested to visit the web site and join one of the
bi-weekly calls that the group holds.
With just a bit of time left over, he opened the floor for questions. I asked a two-part question about how the submissions to MOT were audited, and why the group was using manual submissions instead of some form of LLM to create entries for the site.
Le Hors said that the project relied on the community to audit
submissions. "And just like you do for anything else like this, if
you lie and you get caught, you'll get a black eye, right?"
As to why the group wasn't using LLMs, he said that people have tried
but so far have not had much success. "We haven't had
anybody really committed to this in a long period of time to really
make it work." Part of the problem, he said, was that there is no
standard for model data. The Hugging Face model card is unstructured
Markdown "with a little bit of metadata". But he did think it
was possible to do, "it just needs somebody who's really motivated
to work through it".
Another member of the audience asked if MOT was an independent project, or if it was an IBM project (Le Hors is an IBM employee). He reiterated that it was an LF project as the session's time ran out.
[Thanks to the Linux Foundation, LWN's travel sponsor, for funding my travel to Minneapolis to attend the Open Source Summit.]
Page editor: Joe Brockmeier
