Leading items
Welcome to the LWN.net Weekly Edition for May 16, 2024
This edition contains the following feature content:
- Portable LLMs with llamafile: A new tool that bundles large-language-model weights into portable executables for running a chatbot locally.
- Another push for sched_ext: The extensible scheduler class is a way to change the kernel's scheduler using BPF, but it has run into some headwinds.
- Some 6.9 development statistics: The usual look at developers and companies involved in creating the 6.9 kernel, plus a new subscribers-only feature.
- The start of coverage from the Linux Storage, Filesystem, Memory Management, and BPF Summit (LSFMM+BPF):
- The state of the page in 2024: An update on the process of converting the kernel to use folios.
- Lots more to come ...
- Debian dismisses AI-contributions policy: Unlike Gentoo, the Debian project does not seem to be interested in an outright ban on AI-generated contributions.
- Managing expectations with a contributions and credit policy: A new guide to developing a policy that outlines the expectations that a project has with regard to contributions and the credit those contributions will receive.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
Portable LLMs with llamafile
Large language models (LLMs) have been the subject of much discussion and scrutiny recently. Of particular interest to open-source enthusiasts are the problems with running LLMs on one's own hardware — especially when doing so requires NVIDIA's proprietary CUDA toolkit, which remains unavailable in many environments. Mozilla has developed llamafile as a potential solution to these problems. Llamafile can compile LLM weights into portable, native executables for easy integration, archival, or distribution. These executables can take advantage of supported GPUs when present, but do not require them.
Portability
Llamafile is based on the MIT-licensed llama.cpp library, a C/C++ implementation of the code necessary to evaluate LLMs of several different architectures. Llamafile maintains its own copy of llama.cpp, with some additional MIT-licensed changes. The code for llamafile itself is made available under the Apache 2.0 license. Llamafile's value comes from building llama.cpp in a way that works seamlessly across many different environments.
On April 25, Mozilla posted an update about llamafile's progress that called it "the easiest and fastest way to run a wide range of open large language models".
The lead developer of llamafile, Justine Tunney, chose to re-use her previous work on Cosmopolitan Libc. That project implements a C standard library and associated compiler infrastructure to compile programs written in C as multi-format executables that can run on Linux, macOS, Windows, several BSDs, or without an operating system.
In the blog post announcing the start of Cosmopolitan Libc, Tunney said:
I like the idea of having the freedom to write software without restrictions that transcends traditional boundaries. My goal has been helping C become a build-once run-anywhere language, suitable for greenfield development, while avoiding any assumptions that would prevent software from being shared between tech communities.
In service of that goal, Cosmopolitan Libc uses clever techniques to create executable files that can be simultaneously interpreted as several different formats. The executables it produces start with a shell script that behaves differently on different operating systems, which the program exploits to select a pre-compiled static binary suitable to the operating system and architecture at run time. These files (along with the weights of the LLM, in llamafile's case) are bundled into a zip file. Since zip files store their metadata at the end of the file, the same file can serve as shell script, executable, and zip archive.
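The format trick works because a zip reader locates the archive by scanning backward from the end of the file for the end-of-central-directory signature, ignoring whatever precedes it. Here is a minimal sketch of that search (no zip64 support or real error handling):

    /* Locate a zip end-of-central-directory record by scanning backward
     * from the end of the file, as any zip reader must.  The bytes before
     * the archive -- a shell script and executable code, in llamafile's
     * case -- are simply never examined. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define EOCD_SIG 0x06054b50u	/* "PK\005\006" */
    #define EOCD_MIN 22		/* minimum record size */

    long find_eocd(FILE *f)
    {
    	/* The EOCD record sits within the last 64KB + 22 bytes. */
    	unsigned char buf[65536 + EOCD_MIN];
    	long size, i;
    	size_t n;

    	fseek(f, 0, SEEK_END);
    	size = ftell(f);
    	if (size < EOCD_MIN)
    		return -1;
    	n = (size < (long)sizeof(buf)) ? (size_t)size : sizeof(buf);
    	fseek(f, size - (long)n, SEEK_SET);
    	if (fread(buf, 1, n, f) != n)
    		return -1;
    	for (i = (long)n - EOCD_MIN; i >= 0; i--) {
    		uint32_t sig;
    		memcpy(&sig, buf + i, 4);	/* assumes little-endian */
    		if (sig == EOCD_SIG)
    			return size - (long)n + i;
    	}
    	return -1;
    }

    int main(int argc, char **argv)
    {
    	FILE *f;

    	if (argc != 2 || !(f = fopen(argv[1], "rb")))
    		return 1;
    	printf("end-of-central-directory record at offset %ld\n",
    	       find_eocd(f));
    	fclose(f);
    	return 0;
    }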
This approach is of questionable use for most programs, but for Mozilla, it represents an important way to democratize access to LLMs. A growing number of repositories, most prominently Hugging Face, distribute raw LLM weights — but raw weights aren't actually enough to use the models. Users also rely on inference code and on software that provides an API to actually access the results of the model. To make things worse, that inference code is often specific to a particular brand of GPU or machine-learning toolchain, which makes LLMs hard to run except in specific environments.
Llamafile applies Cosmopolitan Libc's philosophy of selecting an appropriate implementation at run time to machine learning. It automatically detects whether the user has AMD or NVIDIA's GPU toolchains available, and if so, uses those. If not, it uses a new open-source linear-algebra library called tinyBLAS. The library supports using the APIs made available by existing graphics drivers to take advantage of GPU acceleration without requiring an installed toolchain. This is less performant than letting NVIDIA's CUDA or AMD's ROCm compile a native program for a specific model of graphics card, but still useful for users who don't have the GPU SDKs but do have the hardware.
TinyBLAS doesn't work with all drivers, however. If no GPU acceleration is available, llamafile falls back to CPU implementations of the core linear algebra libraries — versions that are specialized for particular microarchitectures and specific hardware. On March 31, Tunney published a detailed blog post discussing how she improved CPU inference performance across a wide variety of hardware, often by hand-writing a matrix multiplication kernel tuned for that exact hardware.
There's another trick that llamafile uses to speed up matrix multiplication, though, which is much more specific to its purpose as a platform for running LLMs. Generic linear algebra libraries like BLAS need to be able to multiply arbitrary matrices with unknown dimensions, possibly transposed or weighted in some way. LLM inference, because it proceeds one token at a time, spends a lot of time doing matrix-vector multiplications that can be written in a simpler form.
Even when LLMs do generalized matrix multiplications (during initialization), the models are architected such that the matrices are usually of a known size — often a multiple of 64. This lets a hand-unrolled implementation specific to those sizes outperform a more generic algorithm. Tunney benchmarked the multiplication of a 513x512 matrix with a 512x512 one (a size llamafile uses frequently), finding that her code outperformed Intel's proprietary Math Kernel Library (MKL) — on that specific size. The MKL is still faster on other sizes. Since llamafile controls the size of the batches used during LLM initialization, however, that's still a clear performance improvement.
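To illustrate the general idea (this is not llamafile's actual code), consider a matrix-vector kernel whose inner dimension is fixed at 512; the constant trip count lets the compiler unroll and vectorize the loop in ways a general-purpose routine cannot:

    /* Illustrative only: a matrix-vector product with the inner dimension
     * fixed at 512, the kind of specialization llamafile applies.  Known,
     * constant bounds let the compiler unroll and vectorize aggressively;
     * a generic BLAS routine must handle any size and layout. */
    #define K 512

    void matvec_512(const float *restrict a,	/* m x 512 matrix, row-major */
    		const float *restrict x,	/* 512-element vector */
    		float *restrict y, int m)	/* m-element result */
    {
    	for (int i = 0; i < m; i++) {
    		float sum = 0.0f;
    		for (int k = 0; k < K; k++)	/* trip count known at compile time */
    			sum += a[i * K + k] * x[k];
    		y[i] = sum;
    	}
    }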
Using llamafile
Using an LLM packaged by llamafile is fairly straightforward. The project's README links to several examples of different sizes. Downloading a file, marking it as executable, and running it is all that should be required in the vast majority of cases. Users who have binfmt_misc registrations for WINE might need to add a more specific rule to prevent WINE from being used as the program's interpreter. Running the program with no arguments will open llama.cpp's simple chat interface.
The built-in web server also offers an OpenAI-compatible API, so tools that expect to talk to the proprietary service can be seamlessly re-directed, as can tools that use OpenAI's API design as a de-facto standard for LLM inference. Users who are more comfortable on the command line can pass parameters and instructions as arguments instead.
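As an illustration, here is a minimal libcurl client for such a setup; the port (8080) and the /v1/chat/completions path are llama.cpp server defaults at the time of writing, so treat both as assumptions to verify against your llamafile:

    /* Minimal sketch of talking to a llamafile's local server through its
     * OpenAI-compatible endpoint using libcurl.  Port and path are
     * llama.cpp's defaults at the time of writing; adjust as needed. */
    #include <curl/curl.h>

    int main(void)
    {
    	CURL *curl = curl_easy_init();
    	struct curl_slist *hdrs = NULL;
    	const char *body =
    		"{ \"model\": \"local\","
    		"  \"messages\": [{ \"role\": \"user\","
    		"                   \"content\": \"Say hello.\" }] }";

    	if (!curl)
    		return 1;
    	hdrs = curl_slist_append(hdrs, "Content-Type: application/json");
    	curl_easy_setopt(curl, CURLOPT_URL,
    			 "http://localhost:8080/v1/chat/completions");
    	curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    	curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
    	/* With no write callback set, the JSON reply goes to stdout. */
    	CURLcode res = curl_easy_perform(curl);
    	curl_slist_free_all(hdrs);
    	curl_easy_cleanup(curl);
    	return res != CURLE_OK;
    }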
Parameters can also be baked into a llamafile executable. As mentioned above, the files are actually valid zip files; adding a file named .args to the archive causes its contents to be treated as additional command-line arguments. The procedure for turning the generic llamafile binary produced by building the project into an LLM-specific llamafile for distribution is actually the same: add the weights and any required arguments to the zip file.
For performance reasons, however, it's important to add the weights without compression, and ideally aligned to a 4K boundary. This allows llamafile to map the weights directly into memory, which is substantially faster than decompressing them into non-disk-backed memory. For this purpose, the project also provides a utility called zipalign that adds files to a zip archive in the correct way.
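The reason the alignment matters is that mmap() can only map page-aligned regions of a file. A hypothetical sketch of what becomes possible with a stored, aligned archive member (the offset would come from the zip's central directory):

    /* Hypothetical sketch: map a stored, page-aligned member of a zip
     * archive directly into memory.  Because the member is uncompressed
     * and 4K-aligned, no copy or decompression step is needed. */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    void *map_weights(const char *path, off_t offset, size_t len)
    {
    	int fd = open(path, O_RDONLY);
    	void *p;

    	if (fd < 0)
    		return NULL;
    	/* mmap() requires a page-aligned file offset -- hence zipalign. */
    	p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, offset);
    	close(fd);	/* the mapping persists after close() */
    	return p == MAP_FAILED ? NULL : p;
    }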
On my laptop, which lacks any relevant GPUs but does have a spiffy 12th-generation Intel i7 processor, the Meta-Llama-3-8B-Instruct.Q5_K_M.llamafile download provided as an example evaluates provided prompts at a rate of about 16 tokens per second. The answer itself is generated at about 3.5 tokens per second. The difference is attributable to the fact that, during prompt evaluation, the model can use matrix-matrix multiplications instead of matrix-vector multiplications. But this level of performance — while perhaps too slow to process many large documents — seems entirely adequate for local use.
With LLMs becoming increasingly integrated into other software, efforts to make them easy to run on existing, consumer hardware are an important part of making sure users can benefit from the technology without sending their data to third parties. The ultimate test of whether a format is suitable for widespread use is whether it is actually adopted; with llamafile being only a few months old, it's too soon to say for sure whether the project has achieved its goals. It does, however, seem to be well on the way.
Another push for sched_ext
The extensible scheduler class (or "sched_ext") is a comprehensive framework that enables the implementation of CPU schedulers as a set of BPF programs that can be loaded at run time. Despite having attracted a fair amount of interest from the development community, sched_ext has run into considerable opposition and seems far from acceptance into the mainline. The posting by Tejun Heo of a new version of the sched_ext series at the beginning of May has restarted this long-running discussion, but it is not clear what the end result will be.

As a quick refresher: sched_ext allows the creation of BPF programs that handle almost every aspect of the scheduling problem; these programs can be loaded (and unloaded) at run time. Sched_ext is designed to safely fall back to the completely fair scheduler should something go wrong (if a process fails to be run within a time limit, for example). It has been used to create a number of special-purpose schedulers, often with impressive performance benefits for the intended workload. See this 2023 article for a more detailed overview of this work.
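For a sense of what such a scheduler looks like, here is a heavily abridged sketch of a minimal global-FIFO scheduler, modeled on the scx_simple example that accompanies the series; the helper and constant names follow the patch set as posted and could change before any merge:

    /* Sketch of a minimal sched_ext scheduler, modeled on the scx_simple
     * example from the patch set; names follow the series as posted and
     * may change.  Every runnable task goes onto the global FIFO queue. */
    #include <scx/common.bpf.h>

    char _license[] SEC("license") = "GPL";

    s32 BPF_STRUCT_OPS(simple_select_cpu, struct task_struct *p,
    		   s32 prev_cpu, u64 wake_flags)
    {
    	bool is_idle = false;
    	s32 cpu = scx_bpf_select_cpu_dfl(p, prev_cpu, wake_flags, &is_idle);

    	/* If an idle CPU was found, dispatch directly to its local queue. */
    	if (is_idle)
    		scx_bpf_dispatch(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
    	return cpu;
    }

    void BPF_STRUCT_OPS(simple_enqueue, struct task_struct *p, u64 enq_flags)
    {
    	/* Otherwise, queue the task on the shared global FIFO. */
    	scx_bpf_dispatch(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
    }

    SEC(".struct_ops.link")
    struct sched_ext_ops simple_ops = {
    	.select_cpu	= (void *)simple_select_cpu,
    	.enqueue	= (void *)simple_enqueue,
    	.name		= "simple",
    };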
Heo lists a number of changes that have been made to sched_ext since the previous version was posted in November. For the most part, these appear to be adjustments to the BPF API to make the writing of schedulers easier. There is also a new shutdown mechanism that, among other things, disables the BPF scheduler during power-management events like system suspend. There is now support for CPU-frequency scaling, and some debugging interfaces have been added to make developing schedulers easier. The core design of sched_ext appears to have stabilized, though.
Increasing interest
Even before getting to the changes, though, Heo called attention to the increasing interest in sched_ext that is being shown across the community and beyond. Valve is planning to use sched_ext for better game scheduling on the Steam Deck. Ubuntu is considering shipping it in the 24.10 release. Meta and Google are increasing their use of it in their production fleets. There is also evidently interest in using it in ChromeOS, and Oculus is looking at it as well. Heo concludes that section with:
Given that there already is substantial adoption which continues to grow and sched_ext doesn't affect the built-in schedulers or the rest of kernel in an invasive manner, I believe it's reasonable to consider sched_ext for inclusion.
Whether that inclusion will happen remains an open question, though. The posting of version 4 of the patch set in July 2023 led to a slow-burning discussion on the merits of this development. Scheduler maintainer Peter Zijlstra rejected the patches outright, saying:
There is not a single doubt in my mind that if I were to merge this, there will be Enterprise software out there that will mandate its own BPF sched thing, or else it won't work.

They will not care, they will not contribute, they might even pull a RedHat and only share the code to customers.
He added that he saw no value in merging the code, and dropped out of the conversation. Mel Gorman also expressed his opposition to merging sched_ext, echoing Zijlstra's concern that enterprise software would start requiring the use of special-purpose schedulers. He later added that, in his opinion (one shared with Zijlstra), sched_ext would work actively against the improvement of the current scheduler:
I generally worry that certain things may not have existed in the shipped scheduler if plugging was an option including EAS, throttling control, schedutil integration, big.Little, adapting to chiplets and picking preferred SMT siblings for turbo boost. In each case, integrating support was time consuming painful and a pluggable scheduler would have been a relatively easy out that would ultimately cost us if it was never properly integrated.
Heo, naturally, disagreed with a lot of the concerns that had been raised. There are, he said, scheduling problems that cannot be addressed with tweaks to the current scheduler, especially in "hyperscaling" environments like Meta. He disagreed that sched_ext would impose a maintenance burden, arguing that the intrusion of BPF into other parts of the kernel has not had that result. Making it possible for users to do something new is beneficial, even if there will inevitably be "stupid cases" resulting from how some choose to use the new feature. In summary, he said, opponents are focused on the potential (and, in his opinion, overstated) costs of sched_ext without taking into account the benefits it would bring.
Restarting the conversation
That message, in October, was the end of the conversation at the time. Heo is clearly hoping for a better result this time around, but Zijlstra's response was not encouraging:
I fundamentally believe the approach to be detrimental to the scheduler eco-system. Witness the metric ton of toy schedulers written for it, that's all effort not put into improving the existing code.
He said that he would not accept any part of this patch series until "the cgroup situation" has been resolved. That "situation" is a performance problem that affects certain workloads when a number of control groups are in use. Rik van Riel had put together a patch series to address this problem in 2019, but it never reached the point of being merged; Zijlstra seems to be insisting that this work be completed before sched_ext can be considered, and he gave little encouragement that it would be more favorably considered even afterward.
Heo expressed a willingness (albeit reluctantly) to work on the control-group problem if it would clear the way for sched_ext. He strongly disagreed with Zijlstra's characterization of sched_ext schedulers as "toy schedulers" and the claim that working on sched_ext will take effort away from the mainline scheduler, though. There is, he said, no perfect CPU scheduler, so the mainline scheduler has to settle for being good enough for all users. That makes it almost impossible to experiment with "radical ideas", and severely limits the pool of people who can work on the scheduler. Much of the energy that goes into sched_ext schedulers, he said, is otherwise unavailable for scheduler development at all.
There is, he said, value in some of those radical ideas:
Yet, the many different ways that even simple schedulers can demonstrates sometimes significant behavior and performance benefits for specific workloads suggest that there are a lot of low hanging fruits in the area. Low hanging fruits that we can't easily reach from our current local optimum. A single implementation which has to satisfy all users all the time is unlikely to be an effective vehicle for mapping out such landscape.
Igalia developer Changwoo Min, who is working with Valve on gaming-oriented scheduling, supported Heo's argument, saying that: "The successful implementation of sched_ext enriches the scheduler community with fresh insights, ideas, and code".
That, as of this writing, is where this conversation stands.
What next?
Sched_ext is on the schedule for the BPF track of the Linux Storage, Filesystem, Memory-Management, and BPF Summit, which begins on May 13. That discussion will cover the future development of sched_ext but, most likely, will not be able to address the question of whether this work should be merged at all. That discussion could continue, on the mailing lists and elsewhere, for some time yet.
Sometimes, when a significant kernel development stalls in this way, distributors that see value in it will ship the patches anyway, as Ubuntu, Valve, and ChromeOS are considering doing. While shipping out-of-tree code is often discouraged, it can also serve to demonstrate interest in a feature and flush out any early problems that result from its inclusion. If things go well, this practice can strengthen the argument for merging the code into the mainline, albeit with the ever-present possibility of changes that create pain for the early adopters.
Whether that will be the path taken for sched_ext remains to be seen. What is certain is that this work has attracted a lot of interest and is unlikely to go away anytime soon. Sched_ext has the potential to enable a new level of creativity in scheduler development, even if it remains out of the mainline — but that potential will be stronger if it does end up being merged. Significant scheduler patches are not merged quickly even when they are uncontroversial; this one will be slower than most if it is accepted at all.
Some 6.9 development statistics
The 6.9 kernel was released on May 12 after a typical nine-week development cycle. Once again, this is a major release containing a lot of changes and new features. Our merge-window summaries (part 1, part 2) covered those changes; now that the development cycle is complete, the time has come to look at where all that work came from — and to introduce a new and experimental LWN feature for readers interested in this kind of information.

A total of 2,028 developers contributed to the 6.9 kernel; 285 of them made their first kernel contribution during this cycle. The most active contributors to 6.9 were:
Most active 6.9 developers
By changesets:

  Uwe Kleine-König       344   2.4%
  Kent Overstreet        259   1.8%
  Christoph Hellwig      206   1.4%
  Krzysztof Kozlowski    201   1.4%
  Johannes Berg          175   1.2%
  Ricardo B. Marliere    172   1.2%
  Eric Dumazet           161   1.1%
  Andy Shevchenko        127   0.9%
  Dmitry Baryshkov       123   0.8%
  Thomas Gleixner        116   0.8%
  Andrew Davis           108   0.7%
  Jiri Slaby             100   0.7%
  Jani Nikula             99   0.7%
  Sean Christopherson     97   0.7%
  Darrick J. Wong         97   0.7%
  Randy Dunlap            96   0.7%
  Ard Biesheuvel          93   0.6%
  Masahiro Yamada         88   0.6%
  Takashi Iwai            81   0.6%
  Matthew Wilcox          80   0.6%
By changed lines:

  Hamza Mahfooz        72144   9.1%
  Hawking Zhang        66997   8.5%
  Matthew Sakai        58713   7.4%
  Matthew Wilcox       31192   3.9%
  Ian Rogers           18456   2.3%
  Darrick J. Wong      12356   1.6%
  Neil Armstrong        9707   1.2%
  Dmitry Baryshkov      8300   1.0%
  Kent Overstreet       8087   1.0%
  Johannes Berg         7779   1.0%
  Ping-Ke Shih          6889   0.9%
  Mike Snitzer          6547   0.8%
  Rob Clark             5654   0.7%
  Christoph Hellwig     5589   0.7%
  Geert Uytterhoeven    5535   0.7%
  Shinas Rasheed        5310   0.7%
  Krzysztof Kozlowski   5218   0.7%
  Rajendra Nayak        5211   0.7%
  Stefan Herdler        5017   0.6%
  Yazen Ghannam         4995   0.6%
Following what has become a longstanding tradition, Uwe Kleine-König was the biggest contributor of changesets this time around. This work, which is mostly focused on low-level device-driver refactoring, has brought about 2,500 changesets into the kernel since 6.3 was released in April, 2023. Kent Overstreet continued the work of completing and stabilizing the bcachefs filesystem. Christoph Hellwig kept on with his extensive refactoring work in the block layer and XFS filesystem. Krzysztof Kozlowski worked extensively with drivers and devicetrees for mobile systems, and Johannes Berg did a lot of work within the kernel's WiFi subsystem.
In the "lines changed" column, Hamza Mahfooz and Hawking Zhang kept up another apparent tradition: adding huge files with lots of amdgpu register definitions. Matthew Sakai, instead, added the new dm-vdo device-mapper target. Matthew Wilcox removed the old NTFS filesystem implementation, and Ian Rogers added event definitions for Intel CPUs.
The top testers and reviewers this time around were:
Test and review credits in 6.9
Tested-by:

  Daniel Wheeler          119   7.5%
  Michael Kelley           78   4.9%
  Sohil Mehta              71   4.5%
  Helge Deller             47   2.9%
  Philipp Hortmann         34   2.1%
  Shan Kang                34   2.1%
  Pucha Himasekhar Reddy   32   2.0%
  Dapeng Mi                31   1.9%
  Carl Worth               24   1.5%
  Babu Moger               23   1.4%
  Dietmar Eggemann         23   1.4%
  Shaopeng Tan             23   1.4%
  Peter Newman             23   1.4%
  Geert Uytterhoeven       22   1.4%
  Randy Dunlap             22   1.4%
  Guenter Roeck            21   1.3%
  Nicolin Chen             21   1.3%
  Juergen Gross            20   1.3%
  K Prateek Nayak          20   1.3%
  Zhang Rui                19   1.2%
Reviewed-by:

  Simon Horman                200   2.2%
  Christoph Hellwig           171   1.9%
  Krzysztof Kozlowski         161   1.8%
  Konrad Dybcio               143   1.6%
  AngeloGioacchino Del Regno  129   1.4%
  Andy Shevchenko             115   1.3%
  Ilpo Järvinen               112   1.2%
  Andrew Lunn                 112   1.2%
  Darrick J. Wong              98   1.1%
  Dmitry Baryshkov             98   1.1%
  Kees Cook                    95   1.0%
  Linus Walleij                92   1.0%
  Geert Uytterhoeven           89   1.0%
  Neil Armstrong               88   1.0%
  Jiri Pirko                   88   1.0%
  Rob Herring                  87   1.0%
  Greg Kroah-Hartman           85   0.9%
  Gregory Greenman             78   0.9%
  Hawking Zhang                77   0.8%
  David Sterba                 69   0.8%
The top testers continue, by all appearances, to be people who do that work as a primary job focus. On the review side, there are 19 developers who reviewed at least one patch every day during this development cycle, and five of those reviewed more than two each day.
There are 227 companies that were identified as having supported work on the 6.9 kernel, the highest number (by a small margin) since 6.4 was released. The most active employers were:
Most active 6.9 employers
By changesets:

  Intel                 1867  12.9%
  (Unknown)             1072   7.4%
                        1031   7.1%
  (None)                 979   6.8%
  Linaro                 924   6.4%
  AMD                    820   5.7%
  Red Hat                807   5.6%
  SUSE                   468   3.2%
  Meta                   413   2.9%
  Pengutronix            372   2.6%
  Huawei Technologies    345   2.4%
  Oracle                 313   2.2%
  Qualcomm               311   2.1%
  IBM                    301   2.1%
  (Consultant)           287   2.0%
  Renesas Electronics    247   1.7%
  NVIDIA                 241   1.7%
  Texas Instruments      210   1.5%
  Arm                    176   1.2%
  Microsoft              159   1.1%
By lines changed:

  AMD                  171877  21.7%
  Red Hat               91448  11.5%
  Intel                 70800   8.9%
                        51104   6.5%
  Oracle                47906   6.0%
  (Unknown)             44300   5.6%
  Linaro                41492   5.2%
  (None)                28388   3.6%
  Qualcomm              17812   2.2%
  Meta                  17388   2.2%
  Renesas Electronics   17051   2.2%
  Realtek               13862   1.7%
  SUSE                  11953   1.5%
  NVIDIA                10162   1.3%
  Huawei Technologies    9100   1.1%
  (Consultant)           7140   0.9%
  IBM                    6777   0.9%
  Collabora              6760   0.9%
  Arm                    6712   0.8%
  Marvell                6587   0.8%
As usual, there are not a lot of surprises here; these results do not change greatly from one release to the next — or even from one year to the next.
One last note
It has been over 17 years since Who wrote 2.6.20? was published here. Back in 2007, it was still widely said that the kernel was mostly developed and maintained by volunteers; by taking the time to map commits in the kernel repository to employers, we showed that the reality was rather different, and that most kernel developers were paid for their work.
After all these years, it sometimes seems that these articles contain about as much news as a tide table. The information found there might be useful, but it is not generally surprising. There is still interest in these articles, though, as we found out when we skipped a few development cycles some years back. Given the ongoing interest and the generally mechanical nature of putting this information together, it perhaps makes sense to delegate more of the work to a machine.
Thus, we are happy to launch the LWN Kernel Source Database as an experimental, subscriber-only feature. Much of the information found in these articles is available there, along with quite a bit more. We encourage readers to play with the system and to let us know what they think. To be clear: there is no plan to stop publishing these articles anytime soon, but now there is a resource for readers who would like to dig deeper.
The state of the page in 2024
The advent of the folio structure to describe groups of pages has been one of the most fundamental transformations within the kernel in recent years. Since the folio transition affects many subsystems, it is fitting that the subject was covered at the beginning of the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit in a joint session of the storage, filesystem, and memory-management tracks. Matthew Wilcox used the session to review the work that has been done in this area and to discuss what comes next.

The first step of this transition, he began, was moving much of the information traditionally stored in the kernel's page structure into folios instead, then converting users of struct page to use the new structure. The initial goal was to provide a type-safe representation for compound pages, but the scope has expanded greatly since then. That has led to a bit of ambiguity: what, exactly, is a folio in current kernels? For now, a folio is still defined as "the first page of a compound page".
By the end of the next phase, the plan is for struct page to shrink down to a single, eight-byte memory descriptor, the bottom few bits of which describe what type of page is being described. The descriptor itself will be specific to the page type; slab pages will have different descriptors than anonymous folios or pages full of page-table entries, for example.
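Conceptually, the plan amounts to a tagged pointer; a purely illustrative sketch (the names here are invented for this article, not the kernel's) might look like:

    /* Illustrative only -- not the kernel's actual definitions.  The
     * planned struct page shrinks to one word: a pointer to a
     * type-specific descriptor, with the page type encoded in the
     * low-order bits that alignment leaves free. */
    #include <stdint.h>

    enum mdesc_type {		/* hypothetical type tags */
    	MDESC_ANON_FOLIO = 1,
    	MDESC_SLAB	 = 2,
    	MDESC_PAGE_TABLE = 3,
    };

    #define MDESC_TYPE_MASK 0x7UL	/* descriptors are at least 8-byte aligned */

    struct page {			/* the whole memory-map entry: 8 bytes */
    	uintptr_t mdesc;
    };

    static inline enum mdesc_type page_type(struct page *p)
    {
    	return p->mdesc & MDESC_TYPE_MASK;
    }

    static inline void *page_desc(struct page *p)
    {
    	return (void *)(p->mdesc & ~MDESC_TYPE_MASK);
    }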
Among other motivations, a key objective behind the move to descriptors is reducing the size of the memory map — the large array of page structures describing every physical page in the system. Currently, the memory map consumes 1.6% of the memory it describes, which is too high. On systems where virtualization is used, the memory map is duplicated in each guest, doubling that consumption. Moving to descriptors reduces the overhead to 0.2% of memory, which can save multiple gigabytes on larger systems.
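To put concrete numbers on those percentages: with 4KB pages, today's 64-byte struct page consumes 64/4096, or about 1.6%, of the memory it describes, while an eight-byte descriptor would consume 8/4096, or about 0.2%. On a hypothetical 1TB system, that is the difference between a 16GB memory map and a 2GB one.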
Getting there, though, requires moving more information into the folio structure. Along the way, concepts like the pin count for a page can be clarified, cleaning up some longstanding problems in the memory-management subsystem. This move will, naturally, increase the size of the folio structure, to a point where it will be larger than struct page. The advantage, though, is that only one folio structure is needed for all of the base pages that make up the folio. For two-page folios, the total memory use is about the same; for folios of four pages or more, the usage is reduced. If the kernel is caching the contents of a 1GB file, it currently needs 16MB of page structures. If that caching is done entirely with base pages, that overhead will increase to 23MB in the future. But, if four-page folios are used instead, it drops to 9MB total.
Some types of descriptors, including those for slab pages and page-table entries, have already been introduced. The page-table descriptors are quite a bit smaller than folios, since there are a number of fields that are not needed. For example, these pages cannot be mapped into user space, so there is no need for a mapping count.
Wilcox put up a plot showing how many times struct page and struct folio are mentioned in the kernel since 2021. On the order of 30% of the page mentions have gone away over that time. He emphasized that the end goal is not to get rid of struct page entirely; it will always have its uses. Pages are, for example, the granularity with which memory is mapped into user space.
Since last year's update, quite a lot of work has happened within the memory-management subsystem. Many kernel subsystems have been converted to folios. There is also now a reliable way to determine whether a folio is part of hugetlbfs, the absence of which turned out to be a bit of a surprising problem. The adoption of large anonymous folios has been a welcome improvement.
The virtual filesystem layer has also seen a lot of folio-related work. The sendpage() callback has been removed in favor of a better API. The fs-verity subsystem now supports large folios. The conversion of the buffer cache is proceeding, but has run into a surprise: Wilcox had proceeded with the assumption that buffer heads are always attached to folios, but it turns out that the ext4 filesystem allocates slab memory and attaches that instead. That usage isn't wrong, Wilcox said, but he is "about to make it wrong" and does not want to introduce bugs in the process.
Avoiding problems will require leaving some information in struct page that might have otherwise come out. In general, he said, he would not have taken this direction with buffer heads had he known where it would lead, but he does not want to back it out now. All is well for now, he said; the ext4 code is careful not to call any functions on non-folio-backed buffer heads that might bring the system down. But there is nothing preventing that from happening in the future, and that is a bit frightening.
The virtual filesystem layer is now allocating and using large folios through the entire write path; this has led to a large performance improvement. Wilcox has also added an internal function, folio_end_read(), that he seemed rather proud of. It sets the up-to-date bit, clears the lock bit, checks for others waiting on the folio, and serves as a memory barrier — all with a single instruction on x86 systems. Various other helpers have been added and callbacks updated. There is also a new writeback iterator that replaces the old callback-based interface; among other things, this helps to recover some of the performance that was taken away by Spectre mitigations.
With regard to individual filesystems, many have been converted to folios over the last year. Filesystems as a whole are being moved away from the writepage() API; it was seen as harmful, so no folio version was created. The bcachefs filesystem can now handle large folios — something that almost no other filesystems can do. The old NTFS filesystem was removed rather than being converted. The "netfs" layer has been created to support network filesystems. Wilcox put up a chart showing the status of many filesystems, showing that a lot of work remained to be done for most. "XFS is green", he told the assembled developers, "your filesystem could be green too".
The next step for folios is to move the mapping and index fields out of struct page. These fields could create trouble in the filesystems that do not yet support large folios, which is almost all of them. Rather than risk introducing bugs when those filesystems are converted, it is better to get those fields out of the way now. A number of page flags are also being moved; flags like PageDirty and PageReferenced refer to the folio as a whole rather than to individual pages within it, and thus should be kept there. There are plans to replace the write_begin() and write_end() address-space operations, which still use bare pages.
Beyond that, there is still the task of converting a lot of filesystems, many of which are "pseudo-maintained" at best. The hugetlbfs subsystem needs to be modernized. The shmem and tmpfs in-memory filesystems should be enhanced to use intermediate-size large folios. There is also a desire to eliminate all higher-order memory allocations that do not use compound pages, and thus cannot be immediately changed over to folios; the crypto layer has a lot of those allocations.
Then, there is the "phyr" concept. A phyr is meant to refer to a physical range of pages, and is "what needs to happen to the block layer". That will allow block I/O operations to work directly on physical pages, eliminating the need for the memory map to cover all of physical memory.
It seems that there will be a need for a khugepaged kernel thread that will collapse mid-size folios into larger ones. Other types of memory need to have special-purpose memory descriptors created for them. Then there is the KMSAN kernel-memory sanitizer, which hasn't really even been thought about. KMSAN adds its own special bits to struct page, a usage that will need to be rethought for the folio-based future.
An important task is adding large-folio support to more filesystems. In the conversions that Wilcox has done, he has avoided adding that support except in the case of XFS. It is not an easy job and needs expertise in the specific filesystem type. But, as the overhead for single-page folios grows, the need to use larger folios will grow with it. Large folios also help to reduce the size of the memory-management subsystem's LRU list, making reclaim more efficient.
Ted Ts'o asked how important this conversion is for little-used filesystems; does VFAT need to be converted? Wilcox answered that it should be done for any filesystem where somebody cares about performance. Dave Chinner added that any filesystem that works on an NVMe solid-state device will need large folios to perform well. Wilcox closed by saying that switching to large folios makes compiling the kernel 5% faster, and is also needed to support other desired features, so the developers in the room should want to do the conversion sooner rather than later.
Debian dismisses AI-contributions policy
In April, the Gentoo Linux project banned the use of generative AI/ML tools due to copyright, ethical, and quality concerns. This means contributors cannot use tools like ChatGPT or GitHub Copilot to create content for the distribution such as code, documentation, bug reports, and forum posts. A proposal for Debian to adopt a similar policy revealed a distinct lack of love for those kinds of tools, though it would also seem few contributors support banning them outright.
Tiago Bortoletto Vaz started the discussion on the Debian project mailing list on May 2, with the suggestion that the project should consider adopting a policy on the use of AI/ML tools to generate content. Vaz said that he feared that Debian was "already facing negative consequences in some areas" as a result of this type of content, or it would be in a short time. He referenced the Gentoo AI policy, and Michał Górny's arguments against AI tools on copyright, quality, and ethical grounds. He said he was in agreement with Górny, but wanted to know how other Debian contributors felt.
Ansgar Burchardt wrote that generative AI is "just another tool". He noted that Debian doesn't ban Tor, even though it can be used to violate copyright or for unethical things, and it doesn't ban human contributions due to quality concerns: "I don't see why AI as yet another tool should be different."
Others saw it differently. Charles Plessy responded that he would probably vote for a general resolution against "the use of the current commercial AI for generating Debian packaging, native, or infrastructure code". He specified "commercial AI" because "these systems are copyright laundering machines" that abuse free software, and found the idea that other Debian developers would use them discouraging. He was not against generative AI technology itself, however, as long as it was trained on content that the copyright holders gave consent to use for that purpose.
Russ Allbery was skeptical of Gentoo's approach of an outright ban, since "it is (as they admit) unenforceable". He also agreed with Burchardt: "we don't make policies against what tools people use locally for developing software". He acknowledged that there are potential problems for Debian if output from AI tools infringes copyright. Even so, banning the use of those tools would not make much difference: "we're going to be facing that problem with upstreams as well, so the scope of that problem goes far beyond" direct contributions to Debian. The project should "plan to be reactive [rather] than attempt to be proactive". If there are reports that AI-generated content is a copyright violation, he said, then the project should deal with it as it would with any Debian Free Software Guidelines (DFSG) violation. The project may need to make judgment calls about the legal issues then, but "hopefully this will have settled out a bit in broader society before we're forced to make a decision on a specific case".
Allbery said his primary concern about the impact of AI is its practical impact:
Most of the output is low-quality garbage and, because it's now automated, the volume of that low-quality garbage can be quite high. (I am repeatedly assured by AI advocates that this will improve rapidly. I suppose we will see. So far, the evidence that I've seen has just led me to question the standards and taste of AI advocates.)
Ultimately, Allbery said he saw no need for new policies. If there is a deluge of junk, "we have adequate mechanisms to complain and ask that it stop without making new policy". The only statement he wanted to convey so far is that "anyone relying on AI to summarize important project resources like Debian Policy or the Developers Guide or whatnot is taking full responsibility for any resulting failures".
A sense of urgency
In reply to Allbery, Vaz conceded that Gentoo's policy was not perfect but, despite the difficulty in enforcing it, he maintained there was a need to do something quickly.
Vaz, who is an application manager (AM) for the Debian new maintainer process, suggested that Debian was already seeing problems with AI output submitted during the new maintainer (NM) process and as DebConf submissions, but declined to provide examples. "So far we can't [prove] anything, and even if we could, of course we wouldn't bring any of the involved to the public arena". He did, however, agree that a statement was a more appropriate tool than a policy.
Jose-Luis Rivas replied that Vaz had more context than the rest of the participants in the discussion and that "others do not have this same information and can't share this sense of urgency". He inferred that an NM applicant might be using a large-language-model (LLM) tool during the NM process, but in that scenario there was "even less point" in making policy or a statement about the use of such tools. It would be hard to prove that an LLM was in use, and "ultimately [it] is in the hands of those judging" to make the decisions. "I can't see the point of 'something needs to be done' without a clear reasoning of the expectations out of that being done".
Vaz argued that having a policy or statement would be useful, even in the absence of proof that an LLM was in use. He made a comparison to Debian's code of conduct and its diversity statement: "They might seem quite obvious to some, and less so to others." Having an explicit position on the use of LLMs would be useful to educate those who are "getting to use LLMs in their daily life in a quite mindless way" and "could help us both avoid and mitigate possible problems in the future".
The NM scenario Vaz gave was not convincing to Sam Hartman, who replied that the process would not benefit from a policy. It is up to candidates to prove to their AM, advocates, and reviewers that they can be trusted and have the technical skills to be a Debian Developer:
I as an AM would find an applicant using an LLM as more than a possibly incorrect man page without telling me would violate trust. I don't need a policy to come to that conclusion.
He said he did not mind if a candidate used an LLM to refresh their memory, and saw no need for them to cite the use of the LLM. But if the candidate didn't know the material well enough to catch bad information from an LLM, then it's clear they are not to be trusted to choose good sources of information.
On May 8, after the conversation had died down, Vaz wrote that it was apparent "we are far from a consensus on an official Debian position regarding the use of generative AI as a whole in the project". He thanked those who had commented, and said that he hoped the debate would surface again "at a time when we better understand the consequences of all this".
It is not surprising to see Debian take a conservative, wait-and-see approach. If Debian is experiencing real problems from AI-generated content, they are not yet painful or widespread enough to motivate support for a ban or specific policy shift. A flood of AI gibberish, or a successful legal challenge to LLM-generated content, might turn the tide.
Managing expectations with a contributions and credit policy
Maintainers of open-source projects sometimes have disagreements with contributors over how contributions are reviewed, modified, merged, and credited. A written policy describing how contributions are handled can help maintainers set reasonable expectations for potential contributors. In turn, that can make the maintainer's job easier because it can help reduce a source of friction in the project. A guide to help create this kind of policy for a project has recently been developed.
People sometimes have rather different expectations about how open-source projects function with regard to contributions. For example, a recent discussion about how to credit a Linux kernel patch that had two authors attracted more than 600 comments, covering a wide range of opinions from "the original author should have sole credit" to "the original author should get no credit at all". Another kind of disagreement is over which types of contributions are welcome: some projects don't want external contributions, or any new features, but contributors keep sending them anyway.
In the absence of a written policy, contributors will make assumptions about the way a project operates—and may start an argument if they don't get the expected response. A written policy describing how contributions are processed and credited can prevent conflicts from even starting. To help maintainers create their own policies, I co-authored a credits and contribution policy development guide with Maria Matějka, Martin Winter, Marcos Sanz, and other members of the RIPE Open Source Working Group.
Coverage
A contributions and credit policy is most useful when it focuses on areas where maintainers and contributors frequently have different, incompatible expectations, such as: what contributions are welcome, how reviews and changes will be handled, and how credit will be assigned. A policy can also include step-by-step instructions for maintainers and contributors on how to handle complicated situations, such as when a contributor won't make requested changes.
A policy can and should cover all kinds of contributions. Melissa Mendonça, a maintainer for NumPy, SciPy, and napari, said: "Often, things like docs, community, design work are not credited as contributions in the usual sense (i.e., don't give you green squares on GitHub)." A policy can help surface these contributions and create a standard process for acknowledging them.
A policy can help resolve three common dilemmas: assigning credit for multiple direct contributors, revising contributions, and attracting the right contributions.
Deciding on who deserves credit for a work is a difficult and unsolved philosophical question. Practically speaking, people assign credit for many reasons, such as rewarding people for contributing, establishing ownership of intellectual property, or tracking down the source of bugs or backdoors.
As the authors of the policy development guide, we took a practical approach to this question with our own contributions and credit policy. We asked ourselves what kind of behaviors we wanted to encourage in our project, and then assigned credit in ways that make those outcomes more likely. Since we want to attract new contributors, we give primary author credit to the new contributor even if a maintainer had to completely rewrite the contribution. Other projects can make similarly pragmatic decisions without solving the general problem of who gets credit.
Sometimes people will try to take advantage of a formal credit policy. Mendonça says that credit for open-source work is "always a system that you can game (for example, making just enough contributions to get a certain credit/position and then moving on)". However, she says that she is "willing to err on the side of giving credit", and take corrective action afterward. Any policy can have an explicit exception for the maintainer to take whatever action they think necessary.
Multi-contributor credit
Some policies for assigning credit are easy to implement, such as the current de facto policy for many projects: "whoever merges the code decides who gets credit". However, for more complex credit policies, some difficult situations quickly arise. Matějka, team leader for the BIRD routing daemon (and a co-author of the policy guide) said:
We sometimes get contributions with good ideas and poor quality. It's always a discussion how much the specific patch is more an idea (to be thanked for in the commit message) or a patch (where we keep the author but add a note that it was updated by the committer).

Conversely, credit for a multi-author contribution can also turn into blame; if the original author cannot review the change, crediting the changes to them can end up blaming them for the editor's mistakes. Solving complicated questions of credit can use hours of a maintainer's time or result in a contribution hanging in limbo for months. A standard policy lets maintainers decide how to handle these situations once and then simply refer to the policy in the future.
Many contributions need some revision before they can be merged into the main project. This can go wrong in many ways: maybe no one has time to review, contributors don't respond to requests, or the contribution no longer merges by the time it is approved. A policy can help with these situations in two ways: it can give the maintainer a playbook for how to handle difficult situations (e.g., after two weeks of no response, make the edits and merge the contribution), and it can give contributors guidelines for when to ping the maintainers (e.g., after two weeks of no response, email the mailing list again).
Attracting the right contributions
Many people assume that "open source" means "open to outside contributions and willing to mentor new contributors". In reality, some projects don't want new contributors or features, others will accept outside contributions but only after they have been totally rewritten, and only some maintainers have the time and interest to mentor new contributors. An explicit policy about what contributions are desired helps maintainers avoid conflict from would-be contributors. It also helps both potential contributors and potential users decide whether they want to depend on a project with that particular contribution policy.
One of my early open-source contributions was guided by a contributions policy of sorts. The xjack screensaver prints "All work and no play makes Jack a dull boy" over and over, with a variety of mistakes and typos, inspired by a scene in The Shining. A comment in the source code tells would-be contributors not to bother sending in a patch to change the words it prints. However, while I was doing exactly that, by changing it to print "All work and no play makes Val a dull girl", I found a bug in the code that generated typos. I sent in a patch to fix that bug, which was merged (without credit).
Policy resources
The credits and contribution policy development guide includes several variations on each section of the policy, as well as a list of policies in use by open-source projects. We request (but do not require) giving the authors credit on any derivative work. The policy guide has its own contributions and credit policy and new contributors are welcome. Currently we are especially interested in additional examples of choices for each section of the policy, as well as links to existing policies.
Page editor: Jonathan Corbet