Kernel development
Brief items
Kernel release status
The current development kernel is 4.4-rc8, released on January 3. "Normally, me doing an eighth release candidate means that there is some unresolved issue that still needs more time to get fixed. This time around, it just means that I want to make sure that everybody is back from the holidays and there isn't anything pending, and that people have time to get their merge window pull requests all lined up. No excuses about how you didn't have time to get things done by the time the merge window opened, now."
Previously, 4.4-rc7 came out on December 27.
Stable releases: none have been released since December 14.
Quotes of the week
Kernel development news
On moving on from being a maintainer
The maintainer of a Linux subsystem has a large, and largely thankless, job to do. While reviewing patches is clearly technical in nature, much of the rest of the work is almost clerical—and it takes enough time that there may be little or no time for programming or other actively technical tasks. Thus, it is not a surprise to see that maintainers burn out over time and start looking for other work (in the kernel or elsewhere) to do. In fact, it is surprising that it doesn't happen more often. However, there is no clear path for relinquishing the maintainer role—and generally no succession plan—which can make the transition kind of tricky.
That scenario is currently playing out for the md (software RAID) subsystem, where maintainer Neil Brown has announced that he intends to step down on February 1. Brown got "sucked in" to being the md maintainer in late 2001 because there was no one else doing it. Since there is no "obvious candidate for replacement maintainer - no one who has already been doing significant parts of the maintainer role", he intends to create a maintainership vacuum in the hopes that one or more folks step up to fill the role.
He laments that he has not been able to attract additional maintainers, though he noted that there are some folks in the md community who are certainly capable of doing the job. The question, in Brown's eyes, is whether or not they "care about the code and the subsystem", which is something that only individuals can determine for themselves. That means he doesn't feel in a position to appoint anyone to the role and would like to see folks volunteer. By stepping down, he hopes to create a little pressure for someone to step up.
As he noted, Linus Torvalds has expressed a preference for small maintainer teams, which might make sense for md. Another alternative might be for the device mapper (dm) maintainer team to take on md maintainer duties as well. Beyond just md, though, Brown is also relinquishing the maintainer role for the mdadm administrative tool. That could be handled by the new md maintainer or team, though he would prefer to see different people maintain md and mdadm. According to Brown (in response to an email query), there are two main reasons he favors that separation: the arrangement worked well when he handed off nfsd to Bruce Fields and nfs-utils to Steve Dickson, and "it encourages public accountability - it is too easy for me to make an API change to md, start using it in mdadm, and not have anyone review it".
Brown's announcement describes the responsibilities of a maintainer as he sees them:
- to gather and manage patches and outstanding issues,
- to review patches or get them reviewed,
- to follow up on bug reports and get them resolved,
- to feed patches upstream, maybe directly to Linus, maybe through some other maintainer, depending on what relationships already exist or can be formed,
- to guide the longer term direction (saying "no" is important sometimes),
- to care.
As can be seen, there is a great deal to do. He also noted that another job he had previously spent a lot of time on, following the linux-raid mailing list to provide support on md-related issues, has fallen by the wayside for him. But, in what might be a preview of what will happen with the maintainer role, others in the md community have stepped up. He is "absolutely thrilled that the gap has been more than filled by other very competent community members".
Though he soon won't be doing the maintainer's job, Brown is not disappearing from the md world. He has committed to continuing work on the raid5-journal and raid1-cluster projects. He would also be willing to mentor any volunteers and will still review some patches as well as comment on designs. He concluded his announcement with a call to action.
Certainly Brown is not the only maintainer to find that they have tired of doing that job. Back at the end of 2014, John Linville stepped down as the wireless network maintainer by "promoting" some of the subsystem maintainers and handing off the wireless driver patch handling to Kalle Valo. The mac80211, Bluetooth, and NFC maintainers were asked to start pushing their patches directly to network maintainer David Miller, rather than going through Linville's tree. It seems that Linville had been more successful in finding maintainers along the way—or in them finding him—which made the handoff simpler when he decided to work on other things. The wireless subsystem is rather larger than md, however, which tends to foster a bigger pool of potential maintainers.
As with other parts of the kernel development process, the maintainership role is a bit haphazard. Maintainers handle their duties as they see fit and focus their efforts in different ways. The main job is to get the right patches in a—hopefully—timely manner to Torvalds and into the mainline. Determining which patches are "right" is part of the job, too, of course, but some maintainers (including Torvalds) largely leave that job to their sub-maintainers, while others do not. Some of that can be seen in our article on how patches get to the mainline.
In most cases, the maintainer's style has likely come about organically over time—certain things seemed to work for them. But that style may impact how a transition out of the role will need to be handled. For md, there may be some folks interested in the maintainer job (or, more likely, team), who spoke up in the short thread. While it may seem a little crazy to those outside the kernel development community, creating a vacuum as an exit strategy may actually work better than other mechanisms—at least for some subsystems and maintainers.
Protecting private structure members
Most languages designed in the last few decades offer a way to restrict access to portions of a data structure, limiting their use to the code that directly implements the object that structure is meant to represent. The C language, initially designed in 1972, lacks any such feature. Most of the time, C (along with the projects using it) muddles along without this kind of protective feature. But that doesn't mean there would not be a use for it.
If one browses through the kernel code, it's easy to find comments warning of dire results should outside code touch certain structure fields. The definition of struct irq_desc takes things a bit further, with a field defined as:
unsigned int core_internal_state__do_not_mess_with_it;
Techniques like this work most of the time, but it would still be nice if the computer could catch accesses to structure members by code that should have no business touching them.
Adding that ability is the goal of this patch set from Boqun Feng. It takes advantage of the fact that the venerable sparse utility allows variables to be marked as "not to be referenced." That feature is used primarily to detect kernel code that directly dereferences user-space pointers, but it can also be used to catch code that is messing around with structure members that it has not been invited to touch. Not all developers routinely run sparse, but enough do that new warnings would eventually be noticed.
The patch set adds a new __private marker that can be used to mark private structure members. So the above declaration could become:
unsigned int __private core_internal_state__do_not_mess_with_it;
As far as the normal C compiler is concerned, __private maps to the empty string and changes nothing. But when sparse is run on the code, it notes that the annotated member is not meant to be accessed and will warn when anybody tries.
Of course, some code must be able to access that field, or there is little point in having it there. Doing so without generating a sparse warning requires first removing the __private annotation; that is done by using the ACCESS_PRIVATE() macro. So code that now looks like:
foo = s->private_field;
would have to become:
foo = ACCESS_PRIVATE(s, private_field);
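For the curious, here is a rough sketch of how the two annotations can be wired up on top of sparse's existing attribute support, following the pattern of existing markers like __user; the details of the actual patch may differ. The __CHECKER__ symbol is defined only when sparse is doing the checking:

    #ifdef __CHECKER__
    /* Under sparse: direct accesses to a __private member draw a warning. */
    # define __private    __attribute__((noderef))
    /* Allowed accesses strip the annotation with a __force cast. */
    # define ACCESS_PRIVATE(p, member) \
        (*((typeof((p)->member) __force *) &(p)->member))
    #else
    /* For the real compiler, the annotation vanishes and accesses are direct. */
    # define __private
    # define ACCESS_PRIVATE(p, member) ((p)->member)
    #endif

With definitions along these lines, a normal build is unchanged, while a "make C=1" run (which invokes sparse) will complain about any direct access to a member marked __private.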
This aspect of the patch could prove to be the sticking point: some code may require a large number of ACCESS_PRIVATE() casts. Whether they are added to the code directly or hidden behind helper functions, they could lead to a fair amount of code churn if this feature is to be widely used. Given that the honor system works most of the time and that problems from inappropriate accesses to private data are relatively rare, the development community may decide that the current system works well enough.
The return of preadv2()/pwritev2()
Back in 2014, Milosz Tanski tried to add a flags argument to the preadv() and pwritev() system calls. At the time, your editor suggested that the patch set might be ready for the 3.19 development cycle. It might have been, but things stalled and the work was never actually merged. Now this idea is back, but with a different end goal in mind.
The existing preadv() and pwritev() system calls (along with readv() and writev(), which are simpler versions of them) lack any means to pass operation-specific options into the kernel, so there is no way to change how a specific call works. Milosz's specific need was to be able to turn on non-blocking behavior for some, but not all, operations; this is a feature that appears to have a number of use cases. The idea was reasonably well received at the 2015 Linux Filesystem, Storage and Memory Management Summit (held in March). Even so, work on these patches seemed to come to a halt, with the last posted version showing up in March.
Now, however, preadv2() and pwritev2() are back, though with a different use case in mind. This patch set, posted by Christoph Hellwig, still introduces the new system calls, which still look like:
    int preadv2(unsigned long fd, struct iovec *vec, unsigned long vlen,
                unsigned long pos_l, unsigned long pos_h, int flags);

    int pwritev2(unsigned long fd, struct iovec *vec, unsigned long vlen,
                 unsigned long pos_l, unsigned long pos_h, int flags);
(Note that the system calls, as presented by the C library, would almost certainly be a little different, with pos_l and pos_h being combined into a single, 64-bit position value).
Unlike Milosz's patch set, though, Christoph's does not provide for non-blocking operations. Instead, it provides a different flag (RWF_HIPRI) allowing an application to indicate a high-priority operation. The block layer can then use that flag to decide whether it should use the new block-layer polling mechanism with that request or not. Polling can, for fast devices (non-volatile memory, for example), reduce latencies significantly. But polling has its costs as well; it probably is best used only when an application is concerned about cutting latency to the bare minimum. Without a flag like RWF_HIPRI, the kernel can't really know if the application cares about latency or not.
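As a concrete, if speculative, illustration, a high-priority read using the proposed interface might look something like the following sketch. Since nothing has been merged yet, there is no C-library wrapper; the syscall number and RWF_HIPRI value below are placeholders rather than real ABI constants, and "testfile" is just a hypothetical file to read from:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <sys/uio.h>
    #include <unistd.h>

    /* Placeholder values for illustration only; the real numbers will be
       assigned if and when the patches are merged. */
    #ifndef __NR_preadv2
    #define __NR_preadv2  -1
    #endif
    #ifndef RWF_HIPRI
    #define RWF_HIPRI     0x00000001
    #endif

    int main(void)
    {
        char buf[4096];
        struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
        int fd = open("testfile", O_RDONLY);

        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Arguments follow the prototype above: fd, vec, vlen, pos_l, pos_h,
           flags.  The file offset is zero here, so both position words are
           zero; RWF_HIPRI asks the block layer to poll for completion. */
        ssize_t n = syscall(__NR_preadv2, fd, &iov, 1, 0UL, 0UL, RWF_HIPRI);
        if (n < 0)
            perror("preadv2");
        else
            printf("read %zd bytes\n", n);

        close(fd);
        return n < 0 ? 1 : 0;
    }

An application that does not care about latency would simply pass zero for the flags argument (or keep using preadv()), leaving the block layer free to use its normal interrupt-driven completion path.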
Christoph hasn't forgotten about the non-blocking read use case; he also mentions other possibilities like per-operation synchronous behavior or use of the DIF/DIX data integrity mechanism. But his first priority at this point is to get the new system calls in place, along with the polling feature. Once that has been done, there are plenty of other flags that can be added.
How 4.4's patches got to the mainline
The kernel development community is organized as a hierarchy, with developers submitting patches to maintainers who will, in turn, commit those patches to a repository and push them upstream to higher-level maintainers. This hierarchy logically looks a lot like the directory hierarchy of the kernel source itself; most maintainers look after one or more subtrees of the kernel source tree. But does that model really describe how patches make it into the mainline? The kernel's git repository, with the aid of some scripting, holds an answer to that question.
With one exception, the process of pulling patches from one repository to another leaves a sign in the form of a merge commit. Those merge commits stay with their associated patches as they are pulled into subsequent repositories, eventually leaving clues to the pull history in the mainline repository. By working through the history and finding the merge that pulled in each patch, one can work out one plausible path by which each patch got to the mainline. The process takes a while to run and tends to make one's laptop warm up, but it produces interesting results in the end.
(A note for the curious: the one exception mentioned above is "fast-forward" merges, where the destination repository has not changed since the source repository diverged from it. Some projects fear merge commits and insist that all merges be fast-forward merges, but that policy causes the loss of some useful information. In any case, a no-merges policy would be difficult to scale to a project the size of the kernel. Fast-forward merges are rare in the kernel community, and almost never happen for merges into the mainline.)
The result of running this analysis is the plot shown to the right; click on the image to see the plot in its full, 2.1MB glory.
An aphorism occasionally heard among kernel developers is "design in layers, implement flat." It reflects the learned wisdom that layering is a useful design and abstraction tool, but excessive layering in implementations tends to lead to overhead and poor performance. This plot suggests that the kernel development community itself grew as if it were designed with this same heuristic in mind. The kernel source tree is a multi-layer hierarchy, and the maintainers are theoretically organized in the same way, but, in the end, almost every maintainer pushes patches directly to Linus and, thus, directly into the mainline repository. Most of the time, there are no intermediaries between subsystem maintainers and Linus.
Why are things organized that way? One reason is clearly to minimize the latency built into the system; once changes are committed by a maintainer, they can get to the mainline quickly if need be. This organization breaks pull requests into (mostly) manageable pieces that Linus can look over directly, allowing him to maintain some idea of what is happening in all parts of the kernel. And, importantly, it reflects the fact that Linus feels he can trust a fairly large number of maintainers to not sneak questionable changes into a pull request. He relies heavily on subsystem maintainers to properly review changes from developers, but he does not need higher-level maintainers to review the work the subsystem maintainers are doing.
Clearly, such a system will only work if that trust is maintained. Equally important is Linus's ability to manage pull requests from that many maintainers. Those who have been watching the kernel community for a long time will remember the frightening process-scalability crises that occurred regularly prior to the introduction of BitKeeper (and the subsequent switch to Git). Over five years ago, when kernel development cycles still ran under 10,000 changes and involved a maximum of 1,200 developers, we asked whether Linus was reaching a scalability limit. At the beginning of 2016, cycles run more quickly, bring in 13,000 changes, and will soon involve 1,600 developers, yet there are no real signs of strain.
It is good to know, though, that the process would easily accommodate spreading out the top-level responsibility if need be — should Linus get overwhelmed or simply step aside in favor of somebody else. He has advocated in favor of maintainer groups for subsystems; at some point, perhaps we'll have a maintainer group for the top-level repository as well.
The two trees that feed the most patches to the mainline are interesting in that they show two different maintainer styles. The most active tree in 4.4 was, as it often is, the staging tree, run by Greg Kroah-Hartman. 2,454 changes went through the staging tree in this cycle, but only 122 of them were merged from another repository; Greg applied each of the other 2,332 patches himself. That works out to roughly 33 patches applied directly each day over the course of the entire 70-day development cycle. Like many subsystem maintainers, Greg prefers to see patches posted to (and applied from) public mailing lists rather than pulled directly from other repositories.
The other entry at the top is really a pair: David Miller's networking trees ("net" and "net-next"), which together sent 2,276 patches into the mainline. The networking developers use the deepest hierarchy of any kernel subsystem, with a large percentage of the patches moving into David's tree from some other subsystem tree. The style of this group is also to use separate repositories for development ("net-next," for example) and for fixes ("net"), while other subsystems tend to put more things into the same repository, using branches to organize them. Thus, for example, the "tip" repository (with x86 and core-kernel work) and the arm-soc repository (covering many ARM-architecture topics) each generated numerous large pull requests during this development cycle, but each shows up as a single tree in this plot. One could separate these streams by looking at the names of the branches pulled from, at the risk of adding a fair amount of noise to the plot.
Attentive readers may have wondered at the use of the term "one plausible path" in the description of the algorithm at the top of this article. Consider the small piece of the plot shown to the right; it shows a single commit flowing from Mark Brown's "regmap" repository toward net-next. That flow represents this merge commit, wherein David pulled a single change from the regmap repository. When Linus pulled net-next, he got that change along with the rest. But that same commit was also part of this merge by Linus, which brought the rest of the regmap work directly into the mainline repository. At this point, the repository history shows that fix as having come via the latter merge, but the former merge remains in the history as well. More complicated patterns can be found, especially when developers perform "back merges" of a higher-level tree into their own repositories. Such merges are discouraged unless there is a good reason, partly because they tend to obscure the commit history.
Doubtless there are other interesting things to be learned by watching how changes make their way through the kernel development community and its repositories. For those who are interested in looking further, the tools used to create this plot can be found in the gitdm repository: git://git.lwn.net/gitdm.git.
[Note that the plots have been updated to fix a mysterious but egregious error; see the comments for details.]
Patches and updates
Kernel trees
Architecture-specific
Core kernel code
Development tools
Device drivers
Device driver infrastructure
Documentation
Filesystems and block I/O
Memory management
Networking
Security-related
Miscellaneous
Page editor: Jonathan Corbet
