LWN.net Weekly Edition for December 15, 2022
Welcome to the LWN.net Weekly Edition for December 15, 2022
This edition contains the following feature content:
- The return of lazy imports for Python: the long-running discussion of a requested Python performance improvement goes on.
- Troubles with triaging syzbot reports: fuzz testing is a valuable technique, but it also risks overwhelming maintainers with bug reports.
- mimmutable() for OpenBSD: an upcoming OpenBSD system call to harden applications against attack.
- Bugs and fixes in the kernel history: is the kernel community fixing more bugs than it is introducing?
- Development statistics for the 6.1 kernel (and beyond): where the code in 6.1 came from — from two points of view.
This week's edition also includes these inner pages:
- Brief items: Brief news items from throughout the community.
- Announcements: Newsletters, conferences, security updates, patches, and more.
Please enjoy this week's edition, and, as always, thank you for supporting LWN.net.
The return of lazy imports for Python
Back in September, we looked at a Python Enhancement Proposal (PEP) to add "lazy" imports to the language; the execution of such an import would be deferred until its symbols were needed in order to save program-startup time. While the problem of startup time for short-running, often command-line-oriented, tools is widely acknowledged in the Python community, and the idea of deferring imports is generally popular, there are concerns about the effect of the feature on the ecosystem as a whole. Since our article, the PEP has been revised and discussed further, but the feature was recently rejected by the steering council (SC) because of those concerns; that has not completely ended the quest for lazy imports, however.
Updated PEP
In early October, Germán Méndez Bravo started a new discussion thread to discuss the updates that had been made to PEP 690 ("Lazy Imports"). He and co-author Carl Meyer "have (hopefully) considered and addressed each and all of the suggestions in the previous discussion thread, by either providing rejection reasons or improving the API and implementation". They updated the reference implementation of the feature, so that interested developers could try it out.
Méndez Bravo also posted some benchmark results that he got when testing three different versions of the interpreter: vanilla CPython, CPython with lazy imports added but unused, and CPython using lazy imports. The idea was to measure the impact of the feature on the operation of the interpreter, rather than the gains that might be found for a particular command-line-program use case. He summarized the impact as pretty minimal, with the disabled imports having no measurable impact versus the vanilla interpreter, while the other two combinations were only 1% slower.
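For comparison with today's explicit mechanisms: the standard library already offers an opt-in (and far less transparent) form of deferred loading. Here is a minimal sketch using the documented importlib.util.LazyLoader recipe; the choice of the fractions module is arbitrary:

```python
import importlib.util
import sys

def lazy_import(name):
    """Defer executing a module's body until an attribute is first accessed.

    This is the documented importlib.util.LazyLoader recipe; it is an
    explicit opt-in, unlike the transparent laziness PEP 690 proposed.
    """
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # registers the lazy module; the body has not run yet
    return module

fractions = lazy_import("fractions")   # module body has not executed yet
print(fractions.Fraction(1, 2))        # first attribute access triggers the real import; prints 1/2
```

Unlike PEP 690, though, the lazy module object here is observable until first access, which is part of what the PEP's dictionary changes were meant to hide.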
SC member Brett Cannon had some "personal feedback" about the updated proposal. In his opinion, there are too many ways to enable and disable the feature. In particular, he found the enable_lazy_imports_in_module() API to be "too magical". It was meant for SciPy use cases, Méndez Bravo said, so that individual modules could control their imports without impacting the rest of an application, but Cannon said that since those modules would already need to be modified, they should be changed to do something more explicit. The PEP authors seem to have agreed with that, since that call was removed from the final version of the PEP.
The conversation then languished for a month before another SC member, Petr Viktorin, picked it back up in mid-November. Once again, he was speaking for himself and not the committee; he had concerns about modifying the Python dict object to support the feature. Because the PEP specifies that lazy imports are to be transparent, dictionary lookup is changed to handle lazy objects that represent modules that have not (yet) actually been imported, as described in the Implementation section of the PEP. The Rationale section explains the intended behavior:
The aim of this feature is to make imports transparently lazy. "Lazy" means that the import of a module (execution of the module body and addition of the module object to sys.modules) should not occur until the module (or a name imported from it) is actually referenced during execution. "Transparent" means that besides the delayed import (and necessarily observable effects of that, such as delayed import side effects and changes to sys.modules), there is no other observable change in behavior: the imported object is present in the module namespace as normal and is transparently loaded whenever first used: its status as a "lazy imported object" is not directly observable from Python or from C extension code.
The lazy objects are stored in a module's symbol dictionary (i.e. module.__dict__); in order to ensure that any code that digs around in the module dictionary cannot expose the lazy objects, the underlying dictionary code must be changed. Viktorin was concerned that the behavior could be an obstacle for dictionary optimizations and features in the future. Méndez Bravo agreed that there was a bit of complexity added to the dictionary code, but thought that it was manageable—and that doing things that way was better than other alternatives that had been tried in the Cinder CPython fork where the lazy imports work began. Meta was able to achieve up to 70% reduction in startup times on Python command-line tools using Cinder's lazy imports.
PEP rejected
On December 2, Gregory P. Smith posted the steering council's decision to reject the PEP. The main reason was the effect that it would have on the Python user community:
But a problem we deem significant when adding lazy imports as a language feature is that it becomes a split in the community over how imports work. A need to test code both ways in both traditional and lazy import setups arises. It creates a divergence between projects who expect and rely upon import time code execution and those who forbid it. It also introduces the possibility of unexpected import related exceptions occurring in code at the time of first use virtually anywhere. Such exceptions could bubble up from transitive dependency first use in unanticipated places.

A world in which Python only supported imports behaving in a lazy manner would likely be great. But we cannot rewrite history and make that happen. As we do not envision the Python [language] transitioning to a world where lazy imports are the default, let alone only, import behavior. Thus introducing this concept would add complexity to our ecosystem.
The SC also had some concerns with the implementation described in the PEP, including the changes needed to the dictionary implementation, but ultimately decided that those did not matter; the SC would have said "no" even if those problems were addressed. To a certain extent, though, the SC rejection opened the floodgates to more discussion of the feature.
Both Guido van Rossum and PEP sponsor Barry Warsaw expressed disappointment with the rejection, though both could understand the council's reasoning for doing so. Both also noted that the PEP was the best proposal for the feature that they had seen. As Warsaw put it: "It was the best option so far for solving a common use case, and one that puts pressure on ecosystems to move away from Python." Meyer wondered if there was any appetite for a revised proposal that changed to explicitly specifying each use (e.g. lazy import foo) and that created a dict subclass to be used for module dictionaries if they contain lazy imports. That would address many of the areas of concern, though it would not really change the fragmentation issue.
One big question that underlies much of the debate about the feature is around who should decide whether lazy imports are enabled—or supported. PEP 690 envisions application authors enabling lazy imports for the entire application and opting out of laziness for just the few modules that are dependent on being eagerly imported. Back in August, Méndez Bravo described following that process with code at Instagram (which is where Cinder came from), where it worked well.
But others are not so sure that it is application developers who should be making the determination. Viktorin would rather see ways for library authors to take advantage of the feature:
Overall, I think we should make it easier for libraries to use lazy imports themselves, à la SciPy or Mercurial.

The current proposal is made for "applications" with tightly controlled set of dependencies. Those are relatively rare in open-source code, and closed-source ones don't have a good way to report bugs that only appear in a specific setup back to the libraries they're using. And the libraries can't test things themselves very well.
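The SciPy-style pattern Viktorin alludes to is possible today via a module-level __getattr__ (PEP 562), which lets a package defer its submodule imports. A self-contained sketch that builds a throwaway package on disk to demonstrate it (the package name and contents are invented for illustration):

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Build a throwaway package on disk to demonstrate the PEP 562 pattern.
base = tempfile.mkdtemp()
pkg = os.path.join(base, "demo_pkg")
os.makedirs(pkg)

with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write(textwrap.dedent("""
        import importlib

        _submodules = {"stats"}

        def __getattr__(name):
            # Called only when `name` is missing from the namespace, so
            # submodules load on first use, not at package import time.
            if name in _submodules:
                return importlib.import_module("." + name, __name__)
            raise AttributeError(name)
    """))

with open(os.path.join(pkg, "stats.py"), "w") as f:
    f.write("def mean(xs):\n    return sum(xs) / len(xs)\n")

sys.path.insert(0, base)
import demo_pkg

print("demo_pkg.stats" in sys.modules)   # False: the submodule is not imported yet
print(demo_pkg.stats.mean([1, 2, 3]))    # attribute access triggers the real import
print("demo_pkg.stats" in sys.modules)   # True
```

This gives a library control over its own laziness without any interpreter support, which is part of why some argue the opt-in belongs with library authors.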
Adding explicit lazy syntax to the import sites would allow libraries to slowly opt into the feature. The PEP rejected that approach, but he thought the reasons might be specific to the Meta/Instagram use case. "Porting to explicit lazy imports, library by library, would take time and effort, but might eventually give better results ecosystem-wide." Doing so would also allow the implementation to avoid some of the problem areas:
With explicit lazy imports, we could get away with rougher side effects, avoiding too much magic. Dicts could focus on being containers. Code that needs too much introspection or dynamic features simply wouldn't opt in.
There is concern that library maintainers will be pressured to support lazy imports of their library, however. Warsaw wondered if adding explicit "eager import" syntax would help library maintainers avoid that pressure, but Viktorin did not think it would change anything:
Lazy imports need to be tested, and to be generally useful (outside big apps with rigid dependency chains), they should be tested in individual library test suites. There'll be demand for testing, maintenance, mental overhead around the fact that your library can be imported in two different ways.
That is, of course, already the case, since imports can already be deferred in various ways. Since there is no direct language support for delaying imports, however, that leaves it up to the user of a library, which is part of what Warsaw liked in the PEP:
What I liked about the PEP was that it (at least attempted) to put the burden on the application developer, which is where I think the majority of the responsibility lies. For example, if I turned on implicit lazy imports in my Python CLI, and I found that one of my dependencies can't be lazily imported, I think I'd report the issue (or file a PR) to the dependency, but then I'd just eager-ify the import and my CLI would be none the worse off.
But, as Cannon noted, it is important to consider both the application and the library when looking at doing a lazy import:
The tricky bit with lazy imports as a concept is both the code doing the import and code being imported are affected. Right now there's no handshake in both directions saying both sides of this "transaction" agree that lazy imports are a good thing. You almost need the code being lazily imported to opt into the possibility, and then the person doing the importing saying they want those semantics.
Meyer did not think that having libraries opt into being lazily imported made sense, however. If lazy import foo is shallow, where only foo itself is lazily imported and not any of the imports it contains (unless specified as lazy in foo), then the feature is "effectively just syntactic (and maybe performance) sugar for manually inlining the import, which is already possible and not infrequently done". The PEP gives an example of the manual inlining that he mentions:
    # regular import
    import foo

    def func1():
        return foo.bar()

    def func2():
        return foo.baz()

    # manually inlined
    def func1():
        import foo
        return foo.bar()

    def func2():
        import foo
        return foo.baz()

In the second case, foo will not actually be imported until one of the functions is called. At that point, any imports in foo will be processed (eagerly) as well. Meyer also listed some reasons why he thinks it makes sense to add the syntactic sugar. For one, manual inlining is verbose ("Sometimes syntactic sugar tastes sweet"), but also:
Manual inlining invokes the import system every time the function is called, which has a noticeable cost. The PEP 690 approach reduces this overhead to zero, after the initial reference that triggers the import.
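That per-call cost is easy to observe. A small, self-contained sketch (module choice and iteration count are arbitrary) comparing a module-level import with an inlined one:

```python
import timeit

import json  # eager, module-level import

def module_level():
    return json.dumps([])

def inlined():
    import json  # goes through the import system (a sys.modules hit) on every call
    return json.dumps([])

# Both functions return the same result; only the per-call lookup differs.
eager_time = timeit.timeit(module_level, number=50_000)
inline_time = timeit.timeit(inlined, number=50_000)
print(f"module-level: {eager_time:.3f}s  inlined: {inline_time:.3f}s")
```

The repeated import statement is cheap (the module is found in sys.modules after the first call), but not free, which is the overhead PEP 690's approach would eliminate after the triggering reference.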
It is not entirely clear where things go from here. The discussion has largely tailed off as of this writing, but it is a feature that some find useful. The performance and memory-saving benefits that Méndez Bravo reported are certainly eye opening. Finding some way to bring those benefits to all Python users, without fracturing the ecosystem, would definitely be welcome. Perhaps the explicit approach will gain some more traction—and a PEP of its own—before too long.
Troubles with triaging syzbot reports
A report from the syzbot kernel fuzz-testing robot does not usually spawn a vitriolic mailing-list thread, but that is just what happened recently. While the invective is regrettable, the underlying issue is important. The dispute revolves around how best to report bugs to affected subsystems and, ultimately, how not to waste maintainers' time.
Al Viro was apparently fed up with syzbot reports that involved the ntfs3 filesystem but that were not copied (CCed) to the maintainers of ntfs3. The syzbot message was sent to the kernel mailing list, but Viro shouted his reply that "ANY BUG REPORTS INVOLVING NTFS3 IN REPRODUCER NEED TO BE CCED TO MAINTAINERS OF NTFS3". That complaint had been relayed several times in the past, he indicated, without the problem getting fixed, so he was planning to stop looking at the reports. In fact, they will be "getting triaged straight to /dev/null here".
After an ... impenetrable reply from Hillf Danton, Viro followed up with more details of the problems he sees. He pointed to a post from September where he made a similar request and said that others had also reported these kinds of problems to the maintainers of syzbot. The issue is that the mail sent by syzbot does not contain enough useful information for someone to quickly determine if it pertains to their area of interest:
It's really a matter of triage; as it is, syzkaller folks are expecting that any mail from the bot will be looked into by everyone on fsdevel, on the off-chance that it's relevant for them. What's more, it's not just "read the mail" - information in the mail body is next to useless in such situations. [...]What really pisses me off is that on the sending side the required check is trivial - if you are going to fuzz a filesystem, put a note into report, preferably in subject. Sure, it's your code, you get to decide what to spend your time upon (you == syzkaller maintainers). But please keep in mind that for [recipients] it's a lot of recurring work, worthless for the majority of those who end up bothering with it. Every time they receive a mail from that source.
Ignore polite suggestions enough times, earn a mix of impolite ones and .procmailrc recipes, it's that simple...
Danton misunderstood what Viro was complaining about, but Matthew Wilcox tried to explain. The complaint is not that the linux-fsdevel list is being copied on the mail, but that the ntfs3 maintainers are not. Wilcox said: "So this is just noise. And enough noise means that signal is lost."
Viro agreed and painstakingly described exactly how he (and any other interested recipient of a syzbot report) would triage it, which eventually ends up at the syzkaller dashboard entry for the bug and its syzkaller reproducer. That file, which resembles "line noise", as Viro noted, does contain enough information to see that it was an ntfs3 filesystem that was being fuzzed. But that information is not in the email (or, better still, email subject), nor is it used to direct the report to the right people to look at it. The underlying problem is that the syzkaller/syzbot maintainers are not providing the relevant data, which should be easily obtained:
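The check that Viro calls trivial on the sending side amounts to noticing which filesystem type the reproducer mounts. A hypothetical sketch (the function name, filesystem list, and sample line are invented for illustration; real syzkaller triage code would look quite different):

```python
import re

# Hypothetical triage helper: guess which filesystem a syzkaller C
# reproducer is fuzzing by looking for a quoted fstype string, as would
# be passed to mount().
FS_RE = re.compile(r'"(ntfs3|ext4|btrfs|xfs|f2fs|gfs2)"')

def guess_filesystem(reproducer_source):
    """Return the first recognized filesystem name in the source, or None."""
    match = FS_RE.search(reproducer_source)
    return match.group(1) if match else None

sample = 'syscall(__NR_mount, /*src=*/0x20000100ul, /*dst=*/0x20000000ul, "ntfs3", 0ul, 0ul);'
print(guess_filesystem(sample))  # ntfs3
```

A tag derived this way could then be placed in the report's subject line and used to CC the right maintainers, which is essentially what the syzkaller bug-tracker discussion converged on.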
From what I've seen in various discussions, the assumption of syzkaller folks seems to be that most of the relevant information is in stack trace and that's sufficient for practical purposes - anything beyond that is seen as unwarranted special-casing. [...]Face it, the underlying assumption is broken - for a large class of reports the stack trace does not contain the relevant information. It needs to be augmented by the data that should be very easy to get for the bot. Sure, your code, your priorities, but reports are only useful when they are not ignored and training people to ignore those is a bad idea...
Ted Ts'o agreed, noting that he has been asking for improvements of this sort for several years. Syzbot "is not doing things that really could be done automatically --- and cloud VM time is cheap, and upstream maintainer time is expensive". In effect, the syzbot developers are not being respectful of upstream maintainers' time, he said. Things have been improving, but not in this particular area:
Now, to be fair to the Syzbot team, the Syzbot console has gotten much better. You can now download the syzbot trace, and download the mounted file system, when before, you had to do a lot more work to extract the file system (which is stored in separate constant C array's as compressed data) from the C reproducer. So have things have gotten better.
Marco Elver reported that the problem is being worked on by the syzbot project. He pointed to a bug report comment from syzkaller (and syzbot) creator Dmitry Vyukov that was posted at the end of November. It linked to yet another message from Viro complaining about the problem. Looking further at the bug comment thread makes it clear that progress is being made on identifying what to search for and on adding tags to email subject lines to identify which filesystem is being fuzzed.
The thread eventually went completely off the rails, including a message that seems likely to draw a response from the kernel code of conduct committee. The overall tone of the thread was unfortunate, at least in spots, but both Ts'o and Viro (especially the latter) spent a fair amount of time patiently reiterating the problems that have been raised multiple times along the way, albeit at a lower volume. Those requests did not go far, so, as Ts'o put it, "maybe something a bit more.... assertive by Al [Viro] is something that will inspire them to prioritize this feature request".
Fuzz testing generates a huge number of reports; in order for the testing to be effective—useful—those reports have to be acted upon. Since that is the goal, it obviously makes sense to create reports that can be quickly routed to the right people. This is not the first time we have seen complaints about fuzzing reports, including in a filesystem context, but hopefully we are on track to see improvements soon.
mimmutable() for OpenBSD
Virtual-memory systems provide a great deal of flexibility in how memory can be mapped and protected. Unfortunately, memory-management flexibility can also be useful to attackers bent on compromising a system. In the OpenBSD world, a new system call is being added to reduce this flexibility; it is, though, a system call that almost no code is expected to use.

OpenBSD founder Theo de Raadt first proposed a new system call, called mimmutable(), at the beginning of September. After numerous revisions, the system call looks to be merged as:
int mimmutable(void *addr, size_t len);
A call to mimmutable() will render the mapping of the len bytes of memory starting at addr immutable, meaning that the kernel will not allow any changes to either the memory protections or the mapping in that range. As a result, system calls like mmap() or mprotect() that would affect that range will, instead, fail.
At first glance, mimmutable() looks similar in spirit to OpenBSD's pledge(), which restricts the system calls that the calling process may use. But, while pledge() calls appear in numerous programs in the OpenBSD repository, mimmutable() calls will be rare indeed. Most developers lack a detailed understanding of the memory layout of their programs and are not well placed to render portions of their address space immutable, but the kernel and the linker are a different story.
The details of how mimmutable() will be used are described in detail in this email from De Raadt. In simplified form, it starts when the kernel loads a new executable image; once the text, stack, and data areas have been mapped, they will be made immutable before the program even starts running. For static binaries, the C runtime will do a bit of fixup and then use mimmutable() to make most of the rest of the mapped address space immutable as well. For dynamically linked binaries, the shared-library linker (ld.so) performs a similar set of tasks, mapping each library into the address space, then making most of those mappings immutable.
All of this will happen automatically, without any awareness on the part of the program being loaded. The end result will be a process that cannot make changes to almost all of its mapped address space (though it can always create new mappings in parts of the address space that have not yet been mapped). There is one little exception:
So this static executable is completely immutable, except for the OPENBSD_MUTABLE region. This annotation is used in one place now, deep inside libc's malloc(3) code, where a piece of code flips a data structure between readonly and read-write as a security measure. That does not become immutable.
Making this whole scheme work requires changes beyond just the OpenBSD kernel; the compiler toolchain, in particular, needed enhancements to mark the sections that must remain mutable when the program is loaded. There were evidently some programs that needed tweaks to work properly in this environment; since OpenBSD manages the kernel and user space together, it is able to make the sort of changes that Linux, out of fear of causing user-space regressions, normally cannot.
Even so, implementing mimmutable() involves a fair amount of fiddly work; one would assume that the OpenBSD developers expect to see a corresponding benefit. One obvious place is with executable memory. OpenBSD has long gone out of its way to prevent memory from being simultaneously writable and executable, but the protection that comes from this restriction goes away if an attacker is somehow able to load malicious code into a writable region, then change the permissions afterward. Nailing down the protections for a process's data areas will make that kind of attack impossible.
Beyond that, though, OpenBSD uses a couple of memory protections that are not present in Linux. One of those marks executable memory that is empowered to call into the kernel; on OpenBSD systems, only the C library is given that capability. That will prevent hostile code loaded elsewhere from making direct system calls; protecting the rest of a process with mimmutable() will prevent the changing of protections to allow system calls from elsewhere (such changes would be done with msyscall() on OpenBSD).
OpenBSD also has a special marker for memory regions that are intended to hold stacks. Whenever a process enters the kernel, its stack pointer is checked to see whether it is, indeed, pointing into a stack region; if not, the process is killed. This check thwarts "stack pivot" attacks, where an attacker redirects the stack pointer into a region of memory that is more conducive to the attack being performed. Once again, mimmutable() will prevent an attacker from turning ordinary data regions into stack-capable regions.
It is possible that a system call like mimmutable() could be used to improve security on Linux systems, but it would be a harder project. Linux kernel developers lack the ability to modify user-space programs in lockstep, so it is harder to make this kind of change without breaking somebody's code somewhere. For example, adding a "direct system calls allowed" protection bit could easily break a lot of programs under Linux that, for whatever reason, are not using the C-library wrappers and are calling directly into the kernel.
Similar roadblocks apply for restrictions on stack pointers. The kernel does have a "grows down" bit that identifies stack regions — but only those that can grow. Multithreaded programs often create threads with fixed-length stacks that will lack this bit. As a result, any user-space program that creates stacks for threads would need modification to set such a bit explicitly, and kernel developers cannot make such modifications happen. So stack-pointer checks are not likely to come to Linux anytime soon.
Still, there may be value in a system call that makes memory mappings immutable. Getting such a thing into Linux would require a developer interested in implementing it, a demonstration that user-space code would make use of it, and some sort of convincing story describing attacks that would be thwarted by it. There would probably also be a need to get changes into the toolchains to support this feature. It's a high bar, as is normally the case for new system calls, but perhaps somebody might eventually be inspired to try to get a patch over it.
Bugs and fixes in the kernel history
Each new kernel release fixes a lot of bugs, but each release also introduces new bugs of its own. That leads to a fundamental question: is the kernel community fixing bugs more quickly than it is adding them? The answer is less than obvious but, if it could be found, it would give an important indication of the long-term future of the kernel code base. While digging into the kernel's revision history cannot give a definitive answer to that question, it can provide some hints as to what that answer might be.
Tagged fixes
In current kernel practice, a developer who fixes a bug is expected to include a Fixes tag in the patch description identifying the commit that introduced that bug. This is a relatively recent practice; while various forms of Fixes tags had appeared in commits for some time, the first patch using the current form with the hash of the offending commit appears to be this revert from Rafael Wysocki for 3.12 in October 2013. In that release, only two commits identified buggy commits from previous releases, but the use of this tag grew quickly in subsequent development cycles. The 6.0 kernel release included 2,784 commits with Fixes tags, 2,112 of which identified commits from previous releases as the source of the bug being fixed (the remaining commits fixed bugs that had been introduced in 6.0 and thus never appeared in a released kernel).
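The counting described here amounts to scanning commit messages for such tags. A small illustrative helper (the regular expression and sample message are invented for illustration, not taken from the actual analysis code):

```python
import re

# Match "Fixes: <hash>" trailer lines, as used in kernel commit messages;
# hashes are abbreviated to 12+ hex digits by convention.
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{12,40})\b", re.IGNORECASE | re.MULTILINE)

def fixed_commits(message):
    """Return the commit hashes named in a commit message's Fixes: tags."""
    return FIXES_RE.findall(message)

msg = """net: frob the widget queue

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: A Developer <dev@example.com>
"""
print(fixed_commits(msg))  # ['1da177e4c3f4']
```

Run over `git log` output for a release range, tallying which release each named hash first appeared in, this yields the "bugs fixed" and "bugs introduced" counts discussed below.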
Thus, in theory, one can simply count Fixes tags over a development cycle to see how many bugs from previous releases were fixed. Then, looking at subsequent releases, the Fixes tags can be used to see how many bugs were introduced in that cycle and fixed in later cycles. If the number of bugs fixed in a development cycle regularly exceeds the number of bugs introduced in that cycle, then chances are good that the kernel is getting better over time. The idea is simple, but runs into some practical difficulties that will be explored later on. We can start with a plot showing how the above analysis comes out:
(This data can also be viewed in tabular form on this page).
In the above plot, the thicker lines are counts of Fixes tags; so the brown "bugs introduced" line is the number of times that a commit in a given release was identified by a Fixes tag in subsequent releases, while the green "bugs fixed" line shows the number of Fixes tags in a given release identifying buggy commits in previous releases. The thinner lines are instead counting commits: "buggy commits introduced" is the number of commits in a given release that were later fixed, and "commits fixed" is the number of commits from previous releases that were fixed in a given release.
The two sets of numbers differ for a simple reason: some commits are sufficiently buggy that they need to be fixed more than once — a topic we'll return to shortly. There is an interesting difference here, though: in any given development cycle, the number of bugs fixed tracks closely with the number of commits fixed, but there is a big difference between the number of bugs introduced and the number of buggy commits introduced. What we can conclude from this difference is that commits that introduce a lot of bugs require multiple development cycles for all of those bugs to be fixed. It is rare to see a lot of fixes to the same commit in any one development cycle.
Can this plot answer the question posed at the beginning of this article, though? A naive reading shows that the lines cross and that, thus, the number of bugs fixed exceeds the number of bugs introduced as of the 5.1 release. But that result must clearly be taken with a fair amount of salt. As has been seen in other recent examinations of Fixes tags, bugs lurk in the kernel for a long time. Kernel developers are still finding and fixing bugs introduced early in the 2.6 era — and before. So the "bugs introduced" numbers for recent kernels are clearly too low, as can be seen by the fact that those lines head toward zero for the most recent releases.
The number of bugs introduced does appear, though, to level out in the range of 1,200 to 1,400 per release; this can be seen in the older releases, where the numbers are unlikely to change much at this point. That trend seems to continue through about 5.8 or so, after which the curve drops down and clearly does not reflect long-term reality. Should this pattern hold — something only time will tell — then the point where the curves cross may move, but it seems likely to remain in the early 5.x era. If that is truly the case then, in recent times at least, the kernel community may well be fixing more bugs than it is introducing.
What might have caused the situation to change? Your editor does not know but can wave his hands as well as anybody else. One possibility is improved development tools and, especially, the increased use of fuzz testing to turn up old bugs and prevent new ones. The slow but steady growth in the kernel's (still inadequate) testing infrastructure will have helped. Increased insistence on patch review may have helped to keep the number of bugs introduced roughly constant even as the volume of code going into the kernel has increased. Or perhaps none of the above applies.
It is also almost certainly true that developers have become more disciplined about adding Fixes tags, causing more bug fixes to actually be counted as such while not actually reflecting a change in the rate at which fixes are happening. In general, Fixes tags may be the best proxy we have for actual bug counts, but they are still an inaccurate metric; it depends on developers to carefully add them and to correctly identify the commits that introduce bugs.
The buggiest commits
One thing those tags might do reliably, though, is to identify the buggiest commits in the kernel's history. Remember that some commits require more than one fix over time; some of them require quite a few more than one. Here is a table of the most-fixed commits during the Git era:
    Commit        Fixes  Description
    1da177e4c3f4    355  Linux-2.6.12-rc2
    e126ba97dba9     70  mlx5: Add driver for Mellanox Connect-IB adapters
    8700e3e7c485     65  Soft RoCE driver
    46a3df9f9718     54  net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support
    9d71dd0c7009     42  can: add support of SAE J1939 protocol
    76ad4f0ee747     38  net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC
    604326b41a6f     38  bpf, sockmap: convert to generic sk_msg interface
    1738cd3ed342     38  net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)
    e1eaea46bb40     35  tty: n_gsm line discipline
    e7096c131e51     34  net: WireGuard secure network tunnel
    1c1008c793fa     33  net: bcmgenet: add main driver file
    d5c65159f289     29  ath11k: driver for Qualcomm IEEE 802.11ax devices
    c0c050c58d84     27  bnxt_en: New Broadcom ethernet driver.
    c09440f7dcb3     27  macsec: introduce IEEE 802.1AE driver
    7724105686e7     26  IB/hfi1: add driver files
    d2ead1f360e8     25  net/mlx5e: Add kTLS TX HW offload support
    7733f6c32e36     25  usb: cdns3: Add Cadence USB3 DRD Driver
    726b85487067     24  qla2xxx: Add framework for async fabric discovery
    1e51764a3c2a     24  UBIFS: add new flash file system
    a49d25364dfb     24  staging/atomisp: Add support for the Intel IPU v2
    96c8395e2166     24  spi: Revert modalias changes
    3c4d7559159b     23  tls: kernel TLS support
    d7157ff49a5b     23  mtd: rawnand: Use the ECC framework user input parsing bits
    6a98d71daea1     22  RDMA/rtrs: client: main functionality
    3f518509dedc     22  ethernet: Add new driver for Marvell Armada 375 network unit
    ca6fb0651883     21  tcp: attach SYNACK messages to request sockets instead of listener
    ad67b74d2469     21  printk: hash addresses printed with %p
    c29f74e0df7a     20  netfilter: nf_flow_table: hardware offload support
    d2ddc776a458     20  afs: Overhaul volume and server record caching and fileserver rotation
    1a86b377aa21     20  vdpa/mlx5: Add VDPA driver for supported mlx5 devices
One might wonder about what went wrong with Linux-2.6.12-rc2, which has been fixed (at last count) 355 times. That is, of course, the initial commit that started the Git era, so fixes identifying that commit are for bugs that were introduced prior to April 2005. Even in 2022, bugs of that vintage are still being found and fixed.
After that, the conclusion to be drawn is not that surprising: the commits that need a lot of fixes tend to be the large ones that add a significant new subsystem. A lot of new code will inevitably bring a fair number of new bugs with it, and those bugs will need to be discovered and fixed over time. One interesting exception might be ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") which inserted 47 lines in 2015 and has been fixed 21 times since, most recently in February for 5.17. Also noteworthy is 96c8395e2166 ("spi: Revert modalias changes"), which deleted six lines of code and has required 24 fixes thereafter. Beyond those, though, the commits needing a large number of fixes have been large in their own right.
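Counts like those in the table come from following Fixes tags back to the commits they name. As a rough sketch of the idea (this is not LWN's actual tooling — the function, regular expression, and sample data below are invented for illustration), the tags can be pulled out of git log output and tallied:

```python
# Count how often each commit is named in "Fixes:" tags, given text in the
# form produced by "git log --no-merges --format=%B".  The kernel's tag
# format, per the submitting-patches documentation, looks like:
#   Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
import re
from collections import Counter

FIXES_RE = re.compile(r'^Fixes:\s+([0-9a-f]{8,40})\b',
                      re.IGNORECASE | re.MULTILINE)

def most_fixed(log_text, top=5):
    """Return the most-referenced commit IDs, truncated to 12 characters."""
    counts = Counter(m.group(1)[:12] for m in FIXES_RE.finditer(log_text))
    return counts.most_common(top)

# Hypothetical log text for demonstration purposes only.
sample = """\
net: fix refcount leak

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")

tcp: repair SYNACK handling

Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")

wireguard: another fix

Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
"""
print(most_fixed(sample))
```

In practice, one would feed this the full history over the range of interest; the complications discussed above — abbreviated hashes, malformed tags, fixes that land before their targets — are all ignored in this sketch.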
Perhaps more interesting is the fact that, of the 30 most-fixed commits shown above, 22 are related to networking (including InfiniBand). The networking subsystem is a significant part of the kernel, but it is still a relatively small piece of the whole, and it is not the only subsystem that merges large patches. It's not clear why networking-related patches, in particular, would be more likely to need many fixes.
Bugs are a fact of life in software development, unfortunately, and we are unlikely to be free of them anytime soon. If an optimistic reading of the data above reflects reality, though, then it is possible that the kernel-development community may have reached a point where it is fixing more bugs than it introduces. LWN will surely revisit this topic in the future to see how the situation evolves.
Development statistics for the 6.1 kernel (and beyond)
The 6.1 kernel was released on December 11; by the time of this release, 13,942 non-merge changesets had been pulled into the mainline, growing the kernel by 412,000 lines of code. This is thus not the busiest development cycle ever, but neither is it the slowest, and those changesets contained a number of fundamental changes. This release will also be the long-term-support kernel for 2022. Read on for a look at where the work in 6.1 came from.
The work in 6.1 was contributed by 2,043 developers, of whom 303 made their first contribution to the kernel in this release. The most active 6.1 developers were:
Most active 6.1 developers
By changesets
  Krzysztof Kozlowski      221   1.6%
  Yang Yingliang           169   1.2%
  Andy Shevchenko          145   1.0%
  Johan Hovold             145   1.0%
  Zhengchao Shao           142   1.0%
  Maxime Ripard            127   0.9%
  Hans de Goede            121   0.9%
  Vladimir Oltean           92   0.7%
  Jani Nikula               90   0.6%
  Wolfram Sang              89   0.6%
  Dmitry Baryshkov          87   0.6%
  Christoph Hellwig         85   0.6%
  Matthew Wilcox            84   0.6%
  Gaosheng Cui              81   0.6%
  Michael Straube           73   0.5%
  Mark Brown                71   0.5%
  Takashi Iwai              69   0.5%
  Josef Bacik               68   0.5%
  Pavel Begunkov            67   0.5%
  Johannes Berg             67   0.5%
By changed lines
  Liam Howlett           86771  11.3%
  Frank Min              56729   7.4%
  Ping-Ke Shih           21352   2.8%
  Ian Rogers             12588   1.6%
  Stephen Hemminger      12154   1.6%
  Miguel Ojeda           12019   1.6%
  Zhengchao Shao         10771   1.4%
  Zong-Zhe Yang           8101   1.1%
  Uwe Kleine-König        6776   0.9%
  Rodrigo Siqueira        6464   0.8%
  Dmitry Baryshkov        6064   0.8%
  Thomas Zimmermann       5831   0.8%
  Alex Elder              5767   0.8%
  Vladimir Oltean         5740   0.7%
  Moudy Ho                5462   0.7%
  Hui.Liu                 5451   0.7%
  Sreekanth Reddy         5263   0.7%
  Konrad Dybcio           5038   0.7%
  Geetha sowjanya         4916   0.6%
  Ville Syrjälä           4899   0.6%
Krzysztof Kozlowski contributed more changesets to 6.1 than any other developer; this work consisted almost entirely of devicetree changes. Yang Yingliang contributed a lot of cleanup work in the device-driver subsystem. Andy Shevchenko made a wide-ranging set of improvements to various drivers, Johan Hovold contributed driver fixes and devicetree changes, and Zhengchao Shao was active in the networking subsystem.
The "changed lines" column looks rather different. Liam Howlett only contributed 59 patches to 6.1, but some of them were big: they added the maple tree data structure and the initial uses of it to the memory-management subsystem. Frank Min's three patches added the inevitable set of amdgpu register definitions. Ping-Ke Shih worked on the Realtek rtw89 wireless network adapter, Ian Rogers added a set of Intel event definitions to the perf subsystem, and Stephen Hemminger removed support for the DECnet protocol.
The top testers and reviewers this time around were:
Test and review credits in 6.1
Tested-by
  Daniel Wheeler                     200  13.0%
  Philipp Hortmann                   134   8.7%
  Yu Zhao                             70   4.6%
  Gurucharan G                        31   2.0%
  Nathan Chancellor                   29   1.9%
  Marek Szyprowski                    27   1.8%
  Victor Nogueira                     26   1.7%
  Linux Kernel Functional Testing     26   1.7%
  Naresh Kamboju                      25   1.6%
  Peter Zijlstra                      22   1.4%
  Kees Cook                           21   1.4%
  Shaopeng Tan                        21   1.4%
  Alexander Stein                     21   1.4%
  Xin Hao                             21   1.4%
  Cristian Marussi                    21   1.4%
Reviewed-by
  Andy Shevchenko                229   3.0%
  Krzysztof Kozlowski            211   2.8%
  Hans de Goede                  147   1.9%
  David Sterba                   131   1.7%
  AngeloGioacchino Del Regno     131   1.7%
  Rob Herring                    130   1.7%
  Dmitry Baryshkov               114   1.5%
  Kees Cook                       91   1.2%
  Tariq Toukan                    89   1.2%
  Hawking Zhang                   83   1.1%
  Linus Walleij                   67   0.9%
  Laurent Pinchart                67   0.9%
  Jan Kara                        67   0.9%
  Guenter Roeck                   66   0.9%
  Andrew Lunn                     66   0.9%
Daniel Wheeler's presence at the top of the test-credit column is pretty much a given at this point; he continues to test patches from his colleagues at AMD at a rate of two or three patches every day of the development cycle. Philipp Hortmann, instead, tested patches to Realtek drivers in the staging tree. Yu Zhao's test credits appear in various memory-management changes, mostly related to the maple-tree work. On the review side, Andy Shevchenko reviewed patches for work all over the kernel tree, while Krzysztof Kozlowski focused mostly on device-tree changes and Hans de Goede reviewed a lot of platform-driver patches.
The most active employers this time around were:
Most active 6.1 employers
By changesets
  Huawei Technologies          1281   9.2%
  Intel                        1254   9.0%
  (Unknown)                    1097   7.9%
                                917   6.6%
  Linaro                        837   6.0%
  AMD                           750   5.4%
  Red Hat                       672   4.8%
  (None)                        564   4.0%
  Meta                          414   3.0%
  NVIDIA                        389   2.8%
  SUSE                          333   2.4%
  Oracle                        318   2.3%
  NXP Semiconductors            275   2.0%
  IBM                           260   1.9%
  Renesas Electronics           224   1.6%
  (Consultant)                  208   1.5%
  Microchip Technology Inc.     192   1.4%
  Arm                           187   1.3%
  MediaTek                      164   1.2%
  Collabora                     144   1.0%
By lines changed
  Oracle                      91852  12.0%
  AMD                         89761  11.7%
                              56504   7.4%
  Intel                       44062   5.8%
  (Unknown)                   33765   4.4%
  Realtek                     33277   4.3%
  Linaro                      31234   4.1%
  Huawei Technologies         27856   3.6%
  NVIDIA                      25441   3.3%
  Red Hat                     24073   3.1%
  (None)                      21498   2.8%
  Meta                        18783   2.5%
  MediaTek                    17599   2.3%
  NXP Semiconductors          14342   1.9%
  SUSE                        13749   1.8%
  Brocade                     12154   1.6%
  Microchip Technology Inc.   11651   1.5%
  Pengutronix                 10200   1.3%
  Broadcom                     8054   1.1%
  Marvell                      8036   1.0%
Huawei clearly had a busy development cycle, with 117 developers contributing changes throughout the kernel. Otherwise, these results show yet another fairly typical development cycle.
Looking back
The kernel's development cycle runs for nine or ten weeks before producing the next major release. There is, however, another cycle built on top of that. The last release from each calendar year receives long-term support, for a period of up to six years. These kernels are the ones that end up in most products and distributions over time; at this point, one could maybe say that the LTS cycle is the real kernel release cycle, with the other releases just being intermediate stabilization points.
Unless something extremely surprising happens, 6.1 will be the final kernel release for 2022, and thus will become the next LTS kernel. Since 6.1 is the endpoint of a year-long LTS development cycle, a look at that full cycle is also of interest. The previous LTS kernel, 5.15, was released on October 31, 2021. Since then, the kernel community has merged 86,660 patches from 5,034 developers — 1,741 of whom were first-time contributors — with a net growth of over 3.7 million lines of code. The most active developers over this entire period were:
Most active developers, 5.16 through 6.1
By changesets
  Krzysztof Kozlowski      1134   1.3%
  Christoph Hellwig         918   1.1%
  Matthew Wilcox            716   0.8%
  Sean Christopherson       687   0.8%
  Andy Shevchenko           683   0.8%
  Ville Syrjälä             646   0.7%
  Michael Straube           631   0.7%
  Jakub Kicinski            583   0.7%
  Geert Uytterhoeven        560   0.6%
  Martin Kaiser             552   0.6%
  Hans de Goede             536   0.6%
  Dmitry Baryshkov          516   0.6%
  Jani Nikula               487   0.6%
  Mark Brown                471   0.5%
  Vladimir Oltean           466   0.5%
  Christophe JAILLET        454   0.5%
  Johannes Berg             453   0.5%
  Eric Dumazet              447   0.5%
  Pavel Begunkov            445   0.5%
  Mauro Carvalho Chehab     430   0.5%
By changed lines
  Aurabindo Pillai       341685   5.5%
  Leo Li                 227954   3.7%
  Hawking Zhang          225057   3.7%
  Qingqing Zhuo          198735   3.2%
  Huang Rui              197305   3.2%
  Roman Li               155944   2.5%
  Zhengjun Xing          152525   2.5%
  Oded Gabbay            150670   2.4%
  Ping-Ke Shih           147214   2.4%
  Ian Rogers             145313   2.4%
  Dmitry Baryshkov        92702   1.5%
  Liam Howlett            86859   1.4%
  Jakub Kicinski          81325   1.3%
  Frank Min               56729   0.9%
  Christoph Hellwig       46400   0.8%
  Martin Habets           44438   0.7%
  Zhan Liu                34647   0.6%
  David Howells           31466   0.5%
  Krzysztof Kozlowski     30380   0.5%
  Nick Terrell            28611   0.5%
The most prolific developer during this time, once again Krzysztof Kozlowski, contributed over three patches for every single day of this extended development cycle — and still accounted for only 1.3% of the total. The top seven contributors in the "lines changed" column got there as the result of adding amdgpu register definitions; they account for well over 1 million lines of added code, which is a large portion of the total growth of the kernel.
That effect can be seen in the employer numbers as well:
Most active employers, 5.16 through 6.1
By changesets
  Intel                        9295  10.7%
  (Unknown)                    6134   7.1%
                               5597   6.5%
  Red Hat                      4916   5.7%
  AMD                          4474   5.2%
  Linaro                       4373   5.0%
  (None)                       4029   4.6%
  Huawei Technologies          3649   4.2%
  Meta                         2904   3.4%
  NVIDIA                       2563   3.0%
  SUSE                         2269   2.6%
  (Consultant)                 2018   2.3%
  Oracle                       1944   2.2%
  IBM                          1703   2.0%
  Renesas Electronics          1487   1.7%
  Arm                          1391   1.6%
  NXP Semiconductors           1292   1.5%
  MediaTek                     1212   1.4%
  Alibaba                       944   1.1%
  Microchip Technology Inc.     943   1.1%
By lines changed
  AMD                        1653237  26.8%
  Intel                       744059  12.1%
                              345341   5.6%
  Linaro                      302581   4.9%
  (Unknown)                   212408   3.4%
  Meta                        209040   3.4%
  Red Hat                     202698   3.3%
  Realtek                     185932   3.0%
  NVIDIA                      179426   2.9%
  (None)                      165999   2.7%
  Oracle                      132210   2.1%
  MediaTek                    112561   1.8%
  Huawei Technologies         101436   1.6%
  (Consultant)                 86227   1.4%
  SUSE                         81624   1.3%
  NXP Semiconductors           71782   1.2%
  Xilinx                       66447   1.1%
  IBM                          65059   1.1%
  Renesas Electronics          58418   0.9%
  Microchip Technology Inc.    50529   0.8%
The employer numbers do not change much from one cycle to the next, so it is unsurprising that a year's worth of numbers looks about the same as well. AMD's showing in the "lines changed" column demonstrates what the results of regularly dumping machine-generated register definitions into the kernel can be.
Finally, another metric of interest is non-author signoffs: the application of a Signed-off-by tag to a patch written by somebody else. That normally happens when a maintainer accepts a patch and adds it to their repository to eventually send upstream. These tags can thus reveal who is doing the maintainer work in the kernel community:
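As a rough sketch of how such a metric can be derived (LWN's actual tool is gitdm; the helper function and sample commits below are illustrative only), one can compare each commit's author identity against its Signed-off-by trailers and credit any signer who is not the author:

```python
# Tally "non-author signoffs": Signed-off-by trailers whose identity differs
# from the commit author.  Input is a list of (author, commit message) pairs,
# such as might be extracted from git history.
import re
from collections import Counter

SOB_RE = re.compile(r'^Signed-off-by:\s*(.+?)\s*$',
                    re.IGNORECASE | re.MULTILINE)

def nonauthor_signoffs(commits):
    """Return a Counter mapping each non-author signer to a signoff count."""
    counts = Counter()
    for author, message in commits:
        for signer in SOB_RE.findall(message):
            if signer != author:
                counts[signer] += 1
    return counts

# Hypothetical commits: each was written by a developer and then applied
# (and signed off) by a maintainer.
commits = [
    ("Jane Dev <jane@example.com>",
     "fix: close race\n\n"
     "Signed-off-by: Jane Dev <jane@example.com>\n"
     "Signed-off-by: Pat Maintainer <pat@example.com>"),
    ("Lee Dev <lee@example.com>",
     "net: drop dead code\n\n"
     "Signed-off-by: Lee Dev <lee@example.com>\n"
     "Signed-off-by: Pat Maintainer <pat@example.com>"),
]
print(nonauthor_signoffs(commits).most_common(1))
```

A real implementation has to cope with developers who use multiple email addresses, corporate address changes, and the occasional mangled trailer — matching on the raw string, as done here, is only an approximation.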
Non-author signoffs, 5.16 through 6.1
Maintainers
  Greg Kroah-Hartman           5242   6.3%
  David S. Miller              4605   5.6%
  Mark Brown                   3708   4.5%
  Alex Deucher                 3689   4.4%
  Jakub Kicinski               2968   3.6%
  Andrew Morton                2900   3.5%
  Jens Axboe                   1925   2.3%
  Mauro Carvalho Chehab        1872   2.3%
  Bjorn Andersson              1729   2.1%
  Martin K. Petersen           1490   1.8%
  Paolo Bonzini                1456   1.8%
  Kalle Valo                   1290   1.6%
  Arnaldo Carvalho de Melo     1221   1.5%
  Michael Ellerman             1109   1.3%
  Vinod Koul                   1023   1.2%
  David Sterba                  986   1.2%
  Alexei Starovoitov            957   1.2%
  Jonathan Cameron              948   1.1%
  Hans Verkuil                  944   1.1%
  Shawn Guo                     888   1.1%
Employers
  Red Hat                 10945  13.2%
  Linaro                  10488  12.6%
  Intel                    8115   9.8%
  Meta                     6946   8.4%
                           6649   8.0%
  Linux Foundation         5738   6.9%
  AMD                      4058   4.9%
  SUSE                     3526   4.3%
  NVIDIA                   2315   2.8%
  Huawei Technologies      2070   2.5%
  IBM                      1916   2.3%
  (Consultant)             1887   2.3%
  Oracle                   1854   2.2%
  Qualcomm                 1696   2.0%
  (None)                   1533   1.8%
  Arm                      1270   1.5%
  (Unknown)                1094   1.3%
  Cisco                     950   1.1%
  Microsoft                 580   0.7%
  Renesas Electronics       552   0.7%
The most active maintainers deal with dozens of patches every day and somehow manage to keep their sanity anyway. The list of companies employing maintainers has changed a bit over time; Linaro has been moving up for some time, for example. But it remains true that relatively few companies support the maintainer role; over half of the patches being merged into the mainline kernel pass through the hands of developers working for just five companies. One of the best ways for companies to improve their support for kernel development would be to give their developers the time and encouragement to become maintainers.
All told, though, the kernel's development process continues to move forward at a rapid pace, producing releases on a regular schedule and bringing several new developers into the community every day. The overall picture shows a community that is seemingly in good health and continuing to manage the challenges posed by its fast pace of development.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Briefs: X.Org security release; Linux 6.1; Bugzilla; Firefox 108; Git 2.39; OpenShot 3; PHP 8.2; ...
- Announcements: Newsletters, conferences, security updates, patches, and more.