Finding real-world kernel subsystems
This work was undertaken to develop a more formalized model of how kernel development works. With such an understanding, it is hoped, ways can be found to make the process work better and to provide new tools. The researchers have a particular interest in safety-critical deployments of Linux. Safety-critical environments are highly sensitive; working software can make a life-or-death difference there. So safety-critical developers have to ensure software quality by any means available.
One such means is to take a close look at the development process, on the
reasonable assumption that the process impacts the quality of the final
result. Assuming that the process itself makes sense, a project that
adheres more closely to its defined process should
produce higher-quality software. So if it can be proved that a project's
developers strictly comply with their development process, the level of
assurance is higher and certification — generally necessary for
safety-critical systems — is easier to achieve.
The Linux kernel presents some major challenges when it comes to certification due to its open development process. Nobody documents the process or the degree to which it is adhered to. But, she said, with a bit of data mining, much of that information can be recovered after the fact. Her focus is on patch integration in particular and whether patches are being merged by the appropriate subsystem maintainers. If patches are taking "strange paths", that is a sign that the process is not being followed.
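A "strange path" check could be sketched as follows. This is a minimal Python sketch, not PaStA's implementation. It assumes each MAINTAINERS section has already been reduced to a name, a set of maintainer identities, and a list of `F:` path prefixes. Real `F:` entries are glob patterns, which plain prefix matching only approximates. The sketch flags a patch whose committer maintains none of the sections covering the files it touches. The `Section` class, the function names, and the example addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical, simplified view of one MAINTAINERS section."""
    name: str
    maintainers: set[str]
    prefixes: list[str]  # F: entries treated as plain path prefixes (an assumption)


def covering_sections(path: str, sections: list[Section]) -> list[Section]:
    """Return every section with an F: prefix that matches `path`."""
    return [s for s in sections if any(path.startswith(p) for p in s.prefixes)]


def is_strange_path(files: list[str], committer: str,
                    sections: list[Section]) -> bool:
    """True if the committer maintains none of the sections the patch touches."""
    responsible: set[str] = set()
    for path in files:
        for section in covering_sections(path, sections):
            responsible |= section.maintainers
    return committer not in responsible


# Toy example: a media-driver patch committed by the USB maintainer is flagged.
sections = [
    Section("MEDIA INPUT INFRASTRUCTURE", {"alice@example.org"}, ["drivers/media/"]),
    Section("USB SUBSYSTEM", {"bob@example.org"}, ["drivers/usb/"]),
]
print(is_strange_path(["drivers/media/usb/uvc/uvc_driver.c"], "bob@example.org", sections))  # True
```

A real analysis would also have to decide what "appropriate" means when several sections overlap. Here, any maintainer of any covering section counts.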
Eichinger ran into a little problem on the path to that goal, though: where can one find the subsystem hierarchy that defines this process? Where are the documents describing these subsystems; more to the point, what is a subsystem, exactly? It may seem like a trivial question, she said; that is what the MAINTAINERS file is for. But it is not that easy; as was covered in an earlier LWN article (which she cited during the talk), the information in this file is neither complete nor 100% accurate.
First of all, many kernel subsystems do not appear in MAINTAINERS at all. But the picture is less than clear even for those that are present. Consider, for example, the "media subsystem"; there is no entry for it. There are, however, over 100 MAINTAINERS entries with "media" in the name somewhere. Which of those is the true media subsystem? The answer is not clear for somebody who is not closely familiar with the kernel community.
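Counting such entries is easy to script. The following Python sketch assumes a simplified MAINTAINERS layout. In it, each section is a name line followed by tagged lines such as `M:` and `F:`, with blank lines between sections. The real file also begins with a prose preamble that would have to be skipped first. The function names and the sample text are hypothetical.

```python
import re

# A tagged entry: one capital letter, a colon, then the value ("F:\tdrivers/media/").
TAG_LINE = re.compile(r"^([A-Z]):\s*(.*)$")


def parse_maintainers(text: str) -> dict[str, dict[str, list[str]]]:
    """Map each section name to its tagged entries, e.g. {"F": [...], "M": [...]}."""
    sections: dict[str, dict[str, list[str]]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None                  # a blank line ends the section
            continue
        match = TAG_LINE.match(line)
        if current is None:
            current = line.strip()          # first line of a block is its name
            sections[current] = {}
        elif match:
            tag, value = match.groups()
            sections[current].setdefault(tag, []).append(value.strip())
    return sections


def count_named(sections: dict, word: str) -> int:
    """Count sections whose name contains `word`, ignoring case."""
    return sum(word.lower() in name.lower() for name in sections)


SAMPLE = """\
MEDIA INPUT INFRASTRUCTURE (V4L/DVB)
M:\tAlice Example <alice@example.org>
F:\tdrivers/media/
F:\tinclude/media/

USB SUBSYSTEM
M:\tBob Example <bob@example.org>
F:\tdrivers/usb/
"""

parsed = parse_maintainers(SAMPLE)
print(parsed["MEDIA INPUT INFRASTRUCTURE (V4L/DVB)"]["F"])  # ['drivers/media/', 'include/media/']
print(count_named(parsed, "media"))                         # 1
```

A name match alone says nothing about which of those entries is the real subsystem. Deciding that is the hard part.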
Eichinger and company needed a definition of a "subsystem", so they made their own. Entries in the MAINTAINERS file do not clearly describe subsystems, so they were deemed instead to be "sections" that describe some part of the kernel code base. Many of these sections share files with each other; those were designated as "thematically related". By finding and grouping clusters of related sections, the kernel's true subsystems could be found.
To do so, she processed all of the section entries and plotted them on an undirected graph, where the sections themselves were the vertices and shared lines of code made up the edges. The initial graph looked like this (from Eichinger's slides [PDF]):
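The graph construction might be sketched in Python as follows. The sketch assumes that each section's `F:` patterns have already been expanded into a set of concrete file paths. It also assumes a per-file line count is available, and edges are weighted by the lines in the shared files. The article does not say which clustering method Eichinger used. Connected components stand in here as the simplest way of grouping related sections. The section names, paths, and line counts are toy values.

```python
from itertools import combinations

Edges = dict[frozenset, int]


def build_section_graph(files_of: dict[str, set[str]], loc: dict[str, int]) -> Edges:
    """Undirected graph: an edge joins two sections that share any file,
    weighted by the lines of code in the shared files."""
    edges: Edges = {}
    for a, b in combinations(sorted(files_of), 2):
        shared = files_of[a] & files_of[b]
        if shared:
            edges[frozenset((a, b))] = sum(loc.get(f, 0) for f in shared)
    return edges


def clusters(nodes, edges: Edges) -> list[set[str]]:
    """Group sections into connected components (iterative depth-first search)."""
    adjacency = {n: set() for n in nodes}
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen: set[str] = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Toy input with made-up paths and line counts.
files_of = {
    "MEDIA INPUT INFRASTRUCTURE": {"drivers/media/core.c", "drivers/media/usb/uvc.c"},
    "UVC DRIVER": {"drivers/media/usb/uvc.c"},
    "SOUND": {"sound/core.c"},
}
loc = {"drivers/media/core.c": 500, "drivers/media/usb/uvc.c": 300, "sound/core.c": 200}
edges = build_section_graph(files_of, loc)
print([sorted(c) for c in clusters(files_of, edges)])
# [['MEDIA INPUT INFRASTRUCTURE', 'UVC DRIVER'], ['SOUND']]
```

Connected components merge anything reachable through a chain of overlaps. A real tool might prefer a community-detection algorithm that can split loosely joined groups.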
That was, she allowed, a bit messy. To try to create something more useful, she cut the graph down to the largest 20% of the sections in the MAINTAINERS file. The result for the aforementioned media subsystem looked like this:
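The pruning step might look like this. The sketch assumes a section's "size" is the number of lines of code it covers, since the article does not define the measure. It also assumes edges are stored as a dict mapping a frozenset of two section names to a weight. Rounding up and always keeping at least one section are choices made for this sketch, and the section names are placeholders.

```python
import math

Edges = dict[frozenset, int]


def largest_fraction(size_of: dict[str, int], fraction: float = 0.2) -> set[str]:
    """Names of the top `fraction` of sections by size (at least one)."""
    # Sort by descending size; break ties by name so the result is deterministic.
    ranked = sorted(size_of, key=lambda name: (-size_of[name], name))
    keep = max(1, math.ceil(len(ranked) * fraction))
    return set(ranked[:keep])


def restrict(edges: Edges, keep: set[str]) -> Edges:
    """Drop every edge with an endpoint outside `keep`."""
    return {edge: weight for edge, weight in edges.items() if edge <= keep}


# Toy input: ten sections of increasing size.
sizes = {f"SECTION {i}": i * 100 for i in range(1, 11)}
kept = largest_fraction(sizes)
print(sorted(kept))  # ['SECTION 10', 'SECTION 9']
```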
Therein one sees a number of sections for specific drivers, including a sizeable sub-cluster in the staging directory and a small blob in the Android drivers. The section that ties it all together is "media input infrastructure" — the actual media subsystem.
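One hypothetical way to pick out the section that ties a cluster together is to rank sections by the total weight of their edges within the cluster. This heuristic is only an illustration. It is not necessarily how the talk identified "media input infrastructure". Edges are assumed to be a dict mapping frozenset pairs to shared-line counts, and the section names and weights are toy values.

```python
from collections import defaultdict

Edges = dict[frozenset, int]


def hub_section(cluster: set[str], edges: Edges) -> str:
    """Section whose edges inside `cluster` carry the most shared lines."""
    strength: dict[str, int] = defaultdict(int)
    for edge, weight in edges.items():
        if edge <= cluster:                 # ignore edges leaving the cluster
            for section in edge:
                strength[section] += weight
    # Iterate in sorted order so ties go to the alphabetically first name.
    return max(sorted(cluster), key=lambda s: strength[s])


# Toy cluster: one core section overlapping with two driver sections.
edges = {
    frozenset({"MEDIA INPUT INFRASTRUCTURE", "UVC DRIVER"}): 300,
    frozenset({"MEDIA INPUT INFRASTRUCTURE", "STAGING MEDIA"}): 200,
    frozenset({"UVC DRIVER", "STAGING MEDIA"}): 10,
}
cluster = {"MEDIA INPUT INFRASTRUCTURE", "UVC DRIVER", "STAGING MEDIA"}
print(hub_section(cluster, edges))  # MEDIA INPUT INFRASTRUCTURE
```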
The picture for the Direct Rendering Manager (DRM) subsystem looks a little different:
This subsystem appears as a large collection of related small clusters, with a lot of overlap between them. She described this organization as "non-conforming" with the hierarchical subsystem model; it seems likely that what is actually seen here is the distributed, group-maintainer model used by the DRM developers.
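A rough, hypothetical way to quantify the difference between the two pictures is to measure what fraction of a cluster's other sections link directly to its hub. A hub-and-spoke cluster such as the media one described above would score close to 1.0. A mesh of overlapping small clusters would score lower. The metric, the function name, and the toy graphs are assumptions for illustration and do not come from Eichinger's work.

```python
from typing import Iterable


def hub_coverage(cluster: set[str], edges: Iterable[frozenset], hub: str) -> float:
    """Fraction of the cluster's other sections that share an edge with `hub`."""
    others = cluster - {hub}
    if not others:
        return 1.0
    linked = {s for edge in edges if hub in edge and edge <= cluster for s in edge}
    return len(linked & others) / len(others)


# A star, where every section hangs off one hub, versus a chain of overlaps.
star = [frozenset({"HUB", "A"}), frozenset({"HUB", "B"}), frozenset({"HUB", "C"})]
chain = [frozenset({"A", "B"}), frozenset({"B", "C"}), frozenset({"C", "D"})]
print(hub_coverage({"HUB", "A", "B", "C"}, star, "HUB"))  # 1.0
print(hub_coverage({"A", "B", "C", "D"}, chain, "B"))     # 0.6666666666666666
```

Any threshold for calling a cluster "non-conforming" would have to be chosen empirically.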
At this point, she has some sort of definition of subsystems, twelve of which were identified at the top level. Those twelve were the Arm architecture, drivers, crypto, USB, DRM, networking, media, documentation, sound, SCSI, more Arm stuff (OMAP architecture code, for example), and InfiniBand. Along with that, she has a tool that can automate this sort of subsystem detection. It is, she said, "just scratching the surface" of the problem, but it is a start.
There are a number of ways this work could go in the future. One would be to examine historical kernel releases to build a history of how kernel subsystems have evolved over time. This model can also be used, of course, for the original purpose of determining how well the actual kernel patch flow conforms to the maintainer model. There may be scope for applying this technique to other projects as well.
For more information, readers can go to Eichinger's
bachelor thesis describing the entire project. The code for performing
this analysis (called "PaStA") can be found in this GitHub repository.
Index entries for this article | |
---|---|
Kernel | Development model/Maintainers |
Conference | linux.conf.au/2021 |
Posted Feb 1, 2021 22:40 UTC (Mon)
by blackwood (guest, #44174)
Which I guess is just another way to state that the graph does indeed not capture the group maintainership nature of how most things are done in drm.
Posted Feb 2, 2021 21:47 UTC (Tue)
by jezuch (subscriber, #52988)
Posted Feb 2, 2021 9:05 UTC (Tue)
by linusw (subscriber, #40300)
Maybe that characteristic comes with a bit of compulsive interest in process and bureaucracy.
The other factor I perceive in this is the German car industry - particularly BMW - which is strongly pushing the agenda to use Linux in mission-critical realtime systems, such as self-driving vehicles. As the ISO certification for mission-critical systems requires a formal process, this research becomes a means to an end.
These observations made it easier for me to understand this research in context. (I might be wrong.)
Posted Feb 2, 2021 19:55 UTC (Tue)
by nix (subscriber, #2304)
Sorry, does this follow at all? It doesn't seem to do so to me. This is only the case if we know that the defined process in question is a local or global maximum in terms of the process producing high-quality software. Just looking at an existing process and formalizing it does not imply in any way that sticking to that formal model will generate higher-quality software, just that it's sticking closely to the model that the software's development process was already following: if that process was bad, following a formal model of it will produce bad software. This seems likely to me to (at best) be a wash in terms of quality, on the average.
Posted Feb 3, 2021 1:13 UTC (Wed)
by interalia (subscriber, #26615)
Yes, I found this a bit puzzling as well. I think the unstated assumption is that the development process is thought to be a good one, therefore following it properly is also a good thing. Given two projects A and B which both follow development process Z, then the quality of A will be "better" than B if A follows the process strictly whereas B only follows it haphazardly. In this situation "better" is not quite true, it's more like the quality of A would be steadier and therefore more reliable than B's quality, which I think intuitively makes sense.
I can see how this might make sense in a certification context where if the dev process is certified but the project doesn't really follow it, then approving the process is useless. It sounds a bit like an audit for ISO 9001 compliance.
Posted Feb 3, 2021 11:27 UTC (Wed)
by pizza (subscriber, #46)
Posted Feb 4, 2021 11:27 UTC (Thu)
by sam.thursfield (subscriber, #94496)
Posted Feb 8, 2021 6:47 UTC (Mon)
by emorrp1 (guest, #99512)