MAINTAINERS truth and fiction
The MAINTAINERS file doesn't exist just to give credit to maintainers; developers make use of it to know where to send patches. The get_maintainer.pl script automates this process by looking at the files modified by a patch and generating a list of email addresses to send it to. Given that misinformation in this file can send patches astray, one would expect it to be kept up-to-date. Recently, your editor received a suggestion from Jakub Kicinski that there may be insights to be gleaned from comparing MAINTAINERS entries against activity in the real world. A bit of Python bashing later, a new analysis script was born.
Digging into MAINTAINERS
There are, it turns out, 2,280 "subsystems" listed in the MAINTAINERS file. Each of those subsystems includes a list of covered files and directories. One can look at the commits applied against those files to see who has been working in any given subsystem; writing patches obviously qualifies as this sort of work, but so do other activities like handling patches (as indicated by Signed-off-by tags) or reviewing them (Reviewed-by or Acked-by). By making use of a bit of CPU time diverted from cryptocurrency mining, it is possible to come up with an approximation of when a given subsystem's listed maintainers last actually did some work in that subsystem.
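The approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used for the article: it assumes the conventional MAINTAINERS layout (a subsystem name line followed by `M:` and `F:` fields, with blank lines between entries), and the commit records are hypothetical dictionaries rather than anything read from Git.

```python
import re
from datetime import date

FIELD = re.compile(r"^([A-Z]):\s*(.*)$")   # e.g. "M:<tab>Name <addr>"
EMAIL = re.compile(r"<([^>]+)>")
ACTIVITY_TAGS = {"Signed-off-by", "Reviewed-by", "Acked-by"}

def parse_maintainers(text):
    """Map each subsystem name to its maintainer addresses and file patterns."""
    subsystems, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            current = None                  # a blank line ends the entry
            continue
        field = FIELD.match(line)
        if field and current is not None:
            tag, value = field.groups()
            if tag == "M":
                found = EMAIL.search(value)
                if found:
                    current["M"].append(found.group(1).lower())
            elif tag == "F":
                current["F"].append(value.strip())
        elif not field:                     # anything else starts a new entry
            current = subsystems.setdefault(line.strip(), {"M": [], "F": []})
    return subsystems

def last_activity(maintainers, commits):
    """Latest date on which a listed maintainer wrote, handled, or reviewed a patch."""
    wanted, latest = set(maintainers), None
    for commit in commits:                  # hypothetical records, see below
        people = {commit["author"].lower()}
        people.update(addr.lower() for tag, addr in commit["tags"]
                      if tag in ACTIVITY_TAGS)
        if people & wanted and (latest is None or commit["date"] > latest):
            latest = commit["date"]
    return latest

# Made-up entry and commits, for illustration only.
sample = "EXAMPLE DRIVER\nM:\tA. Person <a.person@example.org>\nS:\tMaintained\nF:\tdrivers/example/\n"
commits = [
    {"date": date(2019, 5, 2), "author": "a.person@example.org", "tags": []},
    {"date": date(2020, 3, 1), "author": "someone.else@example.org",
     "tags": [("Reviewed-by", "A.Person@example.org")]},
    {"date": date(2020, 9, 9), "author": "someone.else@example.org", "tags": []},
]
subsystems = parse_maintainers(sample)
latest = last_activity(subsystems["EXAMPLE DRIVER"]["M"], commits)
```

In a real run, the commit records would have to come from the Git history of the files each entry covers; that step is left out here.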
The full results of this analysis are available for those wanting to see the details.
There are, however, ways of narrowing down the data a bit to pick out some of the more interesting artifacts in this file. For example, there are 367 subsystems for which there is no maintainer or the maintainer has never been seen in the entire Git history (excluding "subsystems" with no files — see below). In many of these cases, the subsystem itself is well past the prime of its life; there simply isn't a lot of work for a 3c59x network-card maintainer to do these days. The networking developers are not buried in ATM patches, the Palm Treo hasn't seen much support work, Apple has released few M68k systems recently, there aren't many Arm floppy drives still in use, and S3 Savage video cards just aren't the must-have device they once were. Many of these entries are likely to point to code that could be removed altogether.
Similar lessons can be drawn from the list of subsystems with no listed maintainers at all. Of course, some of those are rather vague in other ways as well; one subsystem is simply called "ABI/API" and points to the linux-api mailing list. There is actually one file associated with this "subsystem"; it's kernel/sys_ni.c, which handles calls to non-implemented system calls. This entry is thus an attempt to get developers to copy the linux-api list when they add new system calls. A similar entry exists for "Arm subarchitectures".
Some maintainerless subsystems, such as the framebuffer layer, could probably benefit from somebody willing to take them over. The reiserfs filesystem lacks a maintainer but still seems to have some users. Others, like DECnet or the Matrox framebuffer, are probably best left alone (or removed) at this point.
Some "subsystems" listed in the MAINTAINERS file have no files to maintain; one interesting example is "embedded Linux", allegedly maintained by Paul Gortmaker, Matt Mackall, and David Woodhouse. Given the success of embedded Linux, one can only assume that they are doing an outstanding job. The "device number registry" claims to be maintained, but the entry contains only a pointer to a nonexistent web page. The URLs in the "disk geometry and partition handling" entry still work, but the pages do not appear to have been updated for well over a decade; not much is happening with Zip drive geometry these days, it would appear. The man pages, instead, are actively maintained, but they do not exist within the kernel tree.
Help needed
There are a couple of conclusions that can be drawn from the results so far. One is that many kernel subsystems are not really in need of maintenance at this point; some of them, instead, may be in need of removal. Another is that perhaps the MAINTAINERS file itself is in need of a bit of cleanup in spots. But it is also worth asking whether this data can be used to spot subsystems that could benefit from a new maintainer. To answer that question, some additional CPU time was expended to find all subsystems meeting these criteria:
- There is either no listed maintainer or the alleged maintainers have been inactive in that subsystem for at least six months.
- At least 50 commits have touched that subsystem since the release of the 5.5 kernel in January 2020.
The idea behind this search was to find subsystems that are still undergoing some sort of active development, but which do not have an active, listed maintainer. The results can be divided into a few different categories.
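The two criteria amount to a simple filter. A minimal sketch follows, assuming that "six months" can be approximated as 182 days and that the last-activity date and commit count for each subsystem have already been computed; the function name and the reference date are hypothetical.

```python
from datetime import date, timedelta

INACTIVE_AFTER = timedelta(days=182)   # "six months", approximated (an assumption)
MIN_COMMITS = 50                       # commits since the 5.5 release

def lacks_active_maintainer(last_active, commits_since_5_5, today):
    """True if a subsystem is still being developed but has no active listed maintainer.

    last_active is None when there is no listed maintainer, or when the
    listed maintainers have never been seen in the subsystem's history.
    """
    inactive = last_active is None or today - last_active >= INACTIVE_AFTER
    return inactive and commits_since_5_5 >= MIN_COMMITS

# Reference date chosen for illustration only.
today = date(2021, 1, 14)
busy_but_unmaintained = lacks_active_maintainer(None, 402, today)
quiet_subsystem = lacks_active_maintainer(None, 3, today)
```

A subsystem with hundreds of recent commits and no maintainer activity passes the filter; one with only a handful of commits does not, however inactive its maintainers may be.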
Some MAINTAINERS entries have broad lists of covered files that make the commit count seem larger than it really is. For example, the subsystem named "ASYNCHRONOUS TRANSFERS/TRANSFORMS (IOAT) API" includes all of drivers/dma, which is also claimed by "DMA GENERIC OFFLOAD ENGINE SUBSYSTEM". That subsystem, in turn, is actively maintained by Vinod Koul. There are two subsystems that fall into this category; in the tables below "Activity" indicates the last observed activity by the listed maintainers (if any), while "Commits" shows the number of commits affecting the subsystem since 5.5:
Subsystem | Activity | Commits
---|---|---
ASYNCHRONOUS TRANSFERS/TRANSFORMS (IOAT) API | —— | 536
HISILICON NETWORK SUBSYSTEM DRIVER | 2019-11-16 | 258
These subsystems either do not exist as a separate entity, or they should have their lists of covered files reduced to match reality.
Then, there are the subsystems where the maintainers hide behind a corporate email alias. The listed maintainer for "DIALOG SEMICONDUCTOR DRIVERS" is support.opensource@diasemi.com, which is obviously not an address that will appear in any actual commits. A look within that subsystem shows active reviews from diasemi.com addresses, though, so the subsystem cannot really be said to be unmaintained. This category contains:
Subsystem | Activity | Commits
---|---|---
DIALOG SEMICONDUCTOR DRIVERS | —— | 120
QUALCOMM ATHEROS ATH9K WIRELESS DRIVER | —— | 65
WOLFSON MICROELECTRONICS DRIVERS | —— | 146
Related to the above are subsystems where the maintainer entry is simply out of date; the listed maintainer is inactive, but somebody else, often from the same company, has picked up the slack and is acting as a de-facto maintainer. These include:
Subsystem | Activity | Commits
---|---|---
HISILICON NETWORK SUBSYSTEM 3 DRIVER (HNS3) | 2019-11-16 | 234
HISILICON SECURITY ENGINE V2 DRIVER (SEC2) | 2020-06-18 | 55
LINUX FOR POWER MACINTOSH | 2018-10-19 | 71
MELLANOX ETHERNET INNOVA DRIVERS | —— | 93
MELLANOX MLX4 IB driver | —— | 70
OMAP HWMOD DATA | 2016-06-10 | 102
QCOM AUDIO (ASoC) DRIVERS | 2018-05-21 | 125
TEGRA I2C DRIVER | 2018-05-30 | 56
Finally, there are the subsystems that truly seem to lack a maintainer; they typically show patterns of commits either merged by a variety of subsystem maintainers, or passing through one of a few maintainers of last resort. They are:
Subsystem | Activity | Commits
---|---|---
ARM/UNIPHIER ARCHITECTURE | —— | 73
DRBD DRIVER | 2018-12-20 | 51
FRAMEBUFFER LAYER | —— | 402
HMM - Heterogeneous Memory Management | 2020-05-19 | 54
I2C SUBSYSTEM HOST DRIVERS | —— | 434
MARVELL MVNETA ETHERNET DRIVER | 2018-11-23 | 65
MEDIA DRIVERS FOR RENESAS - VIN | 2019-10-10 | 56
MUSB MULTIPOINT HIGH SPEED DUAL-ROLE CONTROLLER | 2020-06-24 | 54
NFC SUBSYSTEM | —— | 72
PROC FILESYSTEM | —— | 171
PROC SYSCTL | 2020-06-08 | 51
QLOGIC QLGE 10Gb ETHERNET DRIVER | 2019-10-04 | 77
STAGING - REALTEK RTL8188EU DRIVERS | 2020-07-15 | 121
STMMAC ETHERNET DRIVER | 2020-05-01 | 174
UNIVERSAL FLASH STORAGE HOST CONTROLLER DRIVER | —— | 277
USB NETWORKING DRIVERS | —— | 119
X86 PLATFORM DRIVERS - ARCH | —— | 120
Most of the above will be unsurprising to people who have been paying attention to the areas in question. The framebuffer subsystem is a known problem area; the "soft scrollback" capability was recently removed from the framebuffer driver due to a lack of maintainership. Quite a few people depend on this code still, but it is increasingly difficult to integrate with the kernel's graphics drivers and few people have any appetite to delve into it.
The I2C host drivers do, in fact, have a de-facto maintainer; it's Wolfram Sang, who also maintains the core I2C subsystem. He has long wished for help maintaining those drivers but none seems to be forthcoming, so he takes care of them in the time that is available. /proc is an interesting example; everybody depends on it, but nobody has taken responsibility for its maintenance. HMM, too, is interesting; its creator went to a lot of effort to get the code merged, but appears to have moved on to other pursuits now.
All of the above look like places where aspiring kernel developers could lend a welcome hand.
What about subsystems that have no entry in the MAINTAINERS file at all? If one were to bash out a quick script to find all files in the kernel tree that are not covered by at least one line in MAINTAINERS, one would end up with a list of just over 2,800 files. These include the MAINTAINERS file itself, naturally. Of the rest, the vast majority are header files under include/, most of which probably do have maintainers and should be added to the appropriate entries. Discouragingly, there are 72 files under kernel/ without a listed maintainer — a situation which certainly does not reflect reality. The SYSV IPC code is unmaintained, reflecting its generally unloved nature. Most of the rest of the unmaintained files are under tools/ or samples/.
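The coverage check described above might look something like the following. This is a rough sketch rather than the script actually used. It handles only `F:` patterns, under the assumption that a trailing slash covers everything in and below a directory while other wildcards do not cross directory boundaries. Other MAINTAINERS fields are ignored, and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase

def is_covered(path, patterns):
    """True if the path matches at least one F: pattern."""
    parts = path.split("/")
    for pattern in patterns:
        want = pattern.rstrip("/").split("/")
        if pattern.endswith("/"):
            # Directory pattern: the path must lie somewhere below it.
            candidate = parts[:len(want)] if len(parts) > len(want) else None
        else:
            # File pattern: same depth, matched component by component,
            # so that "*" never swallows a "/".
            candidate = parts if len(parts) == len(want) else None
        if candidate and all(fnmatchcase(p, w) for p, w in zip(candidate, want)):
            return True
    return False

def uncovered(files, patterns):
    """Files in the tree that no MAINTAINERS pattern claims."""
    return [f for f in files if not is_covered(f, patterns)]

# Made-up pattern list and file list, for illustration only.
patterns = ["drivers/net/", "kernel/*.c"]
files = ["MAINTAINERS", "drivers/net/ethernet/foo.c",
         "kernel/sys_ni.c", "kernel/sched/core.c"]
orphans = uncovered(files, patterns)
```

The list of files would normally come from the source tree itself, and the patterns from every entry in MAINTAINERS taken together.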
A harder case to find is that of files that are covered by a MAINTAINERS entry, but which are not actually maintained by the named person; this will happen often with entries that cover entire directory trees. Your editor is listed as handling all of Documentation, but certainly cannot be said to be "maintaining" many of those files, for example; this is a situation that will arise in many places in the kernel tree.
If one were to try to draw some overall conclusions from this data, they might read something like the following. The MAINTAINERS file definitely has some dark corners that could, themselves, use some maintenance (some of which is already being done). There are some parts of the kernel lacking maintainers that could definitely use one, and other parts that have aged beyond the point of needing maintenance. For the most part, though, the subsystems in the kernel have designated maintainers, and most of them are at least trying to take care of the code they have responsibility for. The situation could be a lot worse.
[As usual, the script used to generate the above tables can be found in the gitdm repository at git://git.lwn.net/gitdm.git.]
E-mail subaddressing
Posted Jan 14, 2021 21:24 UTC (Thu) by mchehab (subscriber, #41156)
> MEDIA DRIVERS FOR RENESAS - VIN | 2019-10-10 | 56
Out of curiosity, as this one is under the media subsystem and I know that it has been actively maintained, I did a quick check on it.
The MAINTAINERS entry for this one is:
> MEDIA DRIVERS FOR RENESAS - VIN
> M: Niklas Söderlund <niklas.soderlund@ragnatech.se>
The last commit from its author (in linux-next) was on 2020-11-25:
> $ git log --author niklas drivers/media/platform/rcar-vin/
> Author: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
> Date: Wed Nov 25 17:44:49 2020 +0100
>
> media: rcar-vin: Rework CSI-2 firmware parsing
OK, there's no .mailmap entry for Niklas's "+" address syntax.
Yet, this is a de-facto standard supported by almost all e-mail servers, and (somewhat) defined in RFC 5233.
At least in media, we've seen several people using "name+sponsor" addresses (we even have a major developer using "name-sponsor" because his e-mail server doesn't seem to support "+").
I'm wondering if the results would be much different if some rule were added to the script to cover cases like that ;-)
E-mail subaddressing
Posted Jan 14, 2021 21:29 UTC (Thu) by corbet (editor, #1)
Many of the +addresses are handled explicitly in the alias list; I doubt this is the only one to have slipped through, though...
E-mail subaddressing
Posted Jan 14, 2021 21:54 UTC (Thu) by mchehab (subscriber, #41156)
Makes sense for gitdm itself, as it would make sense to give different vendor credits for name+vendor_a@bar and name+vendor_b@bar.
In the specific case of just checking whether the MAINTAINERS file has up-to-date e-mail addresses, I guess the script could simply ignore the "+vendor" part of an address if doing so produces a match.
Btw, thanks for this article! It is very nice to see some analysis of the quality of the entries in MAINTAINERS!
Contact email (MAINTAINERS) and contribution emails differing...
Posted Jan 15, 2021 15:54 UTC (Fri) by hmh (subscriber, #3838)
I will look into updating my entry in MAINTAINERS and/or adding an alias for the email. Sorry about that.
MAINTAINERS truth and fiction
Posted Jan 14, 2021 21:38 UTC (Thu) by mchehab (subscriber, #41156)
Not necessarily. It could be split into per-subsystem files. As far as I can tell, get_maintainer.pl already supports that.
In that case, each subsystem-specific MAINTAINERS file could have its own maintainer.
> In many of these cases, the subsystem itself is well past the prime of its life
I guess one of the problems with MAINTAINERS is that subsystems and drivers are listed there side by side without any split. Looking at the big picture in the full analysis data (https://lwn.net/Articles/842419/), I noticed several drivers whose last commit from the maintainer happened a long time ago, but only because they're working properly and there was no recent need to touch them, except maybe for some kAPI changes.
IMO, it would make more sense to have a main MAINTAINERS file with the subsystems that are merged directly upstream, plus a series of per-subsystem MAINTAINERS files containing mostly driver maintainers.
I suspect that this could help to keep MAINTAINERS updated.
Splitting MAINTAINERS
Posted Jan 14, 2021 21:41 UTC (Thu) by corbet (editor, #1)
Splitting the MAINTAINERS file was tried in 2017, but Linus didn't like it so it didn't happen.
Reviewers not considered?
Posted Jan 14, 2021 22:08 UTC (Thu) by ukleinek (subscriber, #56625)
I see that reviewers (i.e. people with an R: entry) are not considered, even though there are rumors that the only difference between M: and R: is whether you should add the person to To: or Cc: when sending a patch.
Also, it would be interesting to see not only the latest contribution but also the number of contributions for each person. (Maybe with some factors involved to make older contributions count less?!)
Best regards from Germany
Uwe
MAINTAINERS truth and fiction
Posted Jan 17, 2021 4:05 UTC (Sun) by unixbhaskar (guest, #44758)
Now, the problem is that I am so tempted to raise my hand to "help", but what is stopping me is knowing full well what limitations I have; I certainly don't want to become a burden to overworked maintainers.
...it is certainly not one person's work (pretty evident considering the size of the kernel)...
In support of removing this "let's remove things" thing
Posted Feb 7, 2021 17:15 UTC (Sun) by ksandstr (guest, #60862)
The cost of removing support for old hardware is that anyone wishing to run that old hardware must use an old version of the kernel. The oldest longterm kernel receiving "official" support is 4.4.256; any hardware whose support was removed before then (or worse, a removal included in the 4.4 LTS cycle) will only work with a slew of bugs and security issues reintroduced.
The benefit of removing support is, usually, maintainer comfort, such as when aes-i586 was removed during 2020q2. (The maintainers did argue, without benchmarks, that the compiler would do a better job[0].) This is understandable when a maintainer would need to track down ancient PCI network hardware and re-cap an old 68k motherboard. But in practice maintainers aren't required to test all drivers before endorsing a common-layer change; the Linux development process assumes that when non-build breakage happens, affected parties raise a flag and a shout. It appears that maintainer comfort ultimately means someone going "f##k this old s##t" at something they've assumed responsibility for but that isn't in their employer's micro-interest to spend two seconds of peripheral vision on; and certainly, between removing and not, that's the non-constructive option.
What this boils down to is that removing things usually makes Linux worse, and only very rarely better. This is a consequence of the development process laid down decades ago, which is still as good as it was then[1]. Therefore removals should be considered very, very carefully; much more so than today, where Linux appears to hover just barely above the line where J. Random Careerist can roll up, submit a feature-removal changeset, and have s/h/its name in the hallowed changelogs forever for a petty CV boost.
[0] it doesn't, and now all pre-AESNI x86 hardware spends more joules in disk encryption than before. As it turns out, programs hand-optimized for two-way superscalar are also hand-optimized for out-of-order x86. This used to be non-esoteric knowledge, if not entirely mundane.
[1] if anything it's better now, with hardware interfaces having become much more standard than they were in the early 2000s; so the amount of "legacy bulk" increases slower than ever before.
In support of removing this "let's remove things" thing
Posted Feb 8, 2021 9:04 UTC (Mon) by jem (subscriber, #24231)
The process does not seem to work the way you describe. I follow kernel changes here on LWN.net and elsewhere, and when something is removed the typical changelog comment is that the removed part has been broken for N years, and *nobody complained*.