
Kernel development

Brief items

Kernel release status

The current development kernel is 3.11-rc2, released on July 21. Linus says: "the O_TMPFILE flag that is new to 3.11 has been going through a few ABI/API cleanups (and a few fixes to the implementation too), but I think we're done now. So if you're interested in the concept of unnamed temporary files, go ahead and test it out. The lack of name not only gets rid of races/complications with filename generation, it can make the whole thing more efficient since you don't have the directory operations that can cause serializing IO etc."
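For readers who want to try out unnamed temporary files, here is a minimal sketch in Python rather than C. It assumes a Linux system where Python's os module exposes O_TMPFILE; the helper name open_unnamed_tempfile() is invented for this example, and the fallback path (create a named file, then unlink it) is simply the traditional approach, not part of the new interface.

```python
import os
import tempfile


def open_unnamed_tempfile(directory):
    """Return a read/write file descriptor for a temporary file in `directory`.

    Hypothetical helper: tries O_TMPFILE first, so that no name is ever
    created, and falls back to the classic create-then-unlink approach.
    """
    flag = getattr(os, "O_TMPFILE", None)  # absent on non-Linux platforms
    if flag is not None:
        try:
            # With O_TMPFILE, the path names the containing directory,
            # and the file must be opened for writing.
            return os.open(directory, flag | os.O_RDWR, 0o600)
        except OSError:
            pass  # the kernel or the filesystem lacks O_TMPFILE support
    # Fallback: a name exists briefly, which is the race O_TMPFILE avoids.
    fd, path = tempfile.mkstemp(dir=directory)
    os.unlink(path)
    return fd


if __name__ == "__main__":
    fd = open_unnamed_tempfile(tempfile.gettempdir())
    os.write(fd, b"scratch data")
    os.close(fd)  # the storage is released once the descriptor is closed
```

Whether the first branch succeeds depends on both the running kernel and the filesystem backing the chosen directory.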

Stable updates: 3.10.2, 3.9.11, 3.4.54, and 3.0.87 were released on July 21; 3.9.11 is the last of the 3.9 series.

As of this writing, 3.10.3 and 3.2.49 are in the review process; they can be expected sometime on or after July 25.


Quotes of the week

I'm not even going to speculate why people interested in InfiniBand switches end up buying paper towels.
Roland Dreier

I'm cantankerous, and hard to please. Send me too much and I yell, and send me too little and I yell. Because I'm the Goldilocks of kernel development, and I want my pull requests "just right".
Linus Torvalds

Though I must confess that I have shifted from being mostly worried about people yelling at me to being mostly worried about my own code yelling at me. Either way, I do find that being worried about some consequence or another does help me get a better result.
Paul McKenney


Kernel development news

Device tree troubles

By Jonathan Corbet
July 24, 2013
Kernel developers working on the x86 architecture are spoiled; they develop for hardware that, for the most part, identifies itself when asked, with the result that it is usually easy to figure out how a specific machine is put together. Other architectures — most notably ARM — are rather messier in this regard, requiring the kernel to learn about the configuration of the hardware from somewhere other than the hardware itself. Once upon a time, hard-coded "board files" were used to build ARM-system-specific kernels; more recently, the device tree mechanism has emerged as the preferred way to describe a system to the kernel. A device tree file provides the enumeration information that the hardware itself does not, allowing the kernel to understand the configuration of the system it is running on. The device tree story is one of success, but, like many such stories, success is bringing on some growing pains.

A device tree "binding" is the specification of how a specific piece of hardware can be described in the device tree data structure. Most drivers meant to run on platforms where device trees are used include a documentation file describing that driver's bindings; see Documentation/devicetree/bindings/net/can/cc770.txt as a randomly chosen example. The kernel contains nearly 800 such files, plus hundreds more ".dts" files describing complete system-on-chips and boards, and the number is growing rapidly.

Maintenance of those files is proving to be difficult for a number of reasons, but the core of the problem can be understood by realizing that a device tree binding is a sort of API that has been exposed by the kernel to the world. If a driver's bindings change in an incompatible way, newer kernels may fail to boot on systems with older device trees. Since the device tree is often buried in the system's firmware somewhere, this kind of problem can be hard to fix. But, even when the fix is easy, the kernel's normal API rules should apply; newer kernels should not break on systems where older kernels work.

The clear implication is that new device tree bindings need to be reviewed with care. Any new bindings should adhere to existing conventions, they should describe the hardware completely, and they should be supportable into the future. And this is where the difficulties show up, in a couple of different forms: (1) most subsystem maintainers are not device tree experts, and thus are not well equipped to review new bindings, and (2) the maintainers who are experts in this area are overworked and having a hard time keeping up.

The first problem was the subject of a request for a Kernel Summit discussion with the goal of educating subsystem maintainers on the best practices for device tree bindings. One might think that a well-written document would suffice for this purpose, but, unfortunately, these best practices still seem to be in the "I know it when I see it" phase of codification; as Mark Brown put it:

At the minute it's about at the level of saying that if you're not sure or don't know you should get the devicetree-discuss mailing list to review it. Ideally someone would write that document, though I wouldn't hold my breath and there is a bunch of convention involved.

Said mailing list tends to be overflowing with driver postings, though, making it less useful than one might like. Meanwhile, the best guidance, perhaps, came from David Woodhouse:

The biggest thing is that it should describe the *hardware*, in a fashion which is completely OS-agnostic. The same device-tree binding should work for Solaris, *BSD, Windows, eCos, and everything else.

Evidently, that is not always the case at present; some device tree bindings are strongly tied to specific kernel versions. Such bindings will be a maintenance problem in the long term.

Keeping poorly-designed bindings out of the mainline is the responsibility of the device tree maintainers, but, as Grant Likely (formerly one of those maintainers) put it, this maintainership "simply isn't working right now." Grant, along with Rob Herring, is unable to keep up with the stream of new bindings (over 100 of which appeared in 3.11), so a lot of substandard bindings are finding their way in. To address this problem, Grant has announced a "refactoring" of how device tree maintainership works.

The first part of that refactoring is Grant's own resignation, with lack of time given as the reason. In his place, four new maintainers (Pawel Moll, Mark Rutland, Stephen Warren and Ian Campbell) have been named as being willing to join Rob and take responsibility for device tree bindings; others with an interest in this area are encouraged to join this group.

The next step will be for this group to figure out how device tree maintenance will actually work; as Grant noted, "There is not yet any process for binding maintainership." For example, should there be a separate repository for device tree bindings (which would make review easier), or should they continue to be merged through the relevant subsystem trees (keeping the code and the bindings together)? It will take some time, and possibly a Kernel Summit discussion, to figure out a proper mechanism for the sustainable maintenance of device tree bindings.

Some other changes are in the works. The kernel currently contains hundreds of .dts files providing complete device trees for specific systems; there are also many .dtsi files describing subsystems that can be included into a complete device tree. In the short term, there are plans to design a schema that can be used to formally describe device tree bindings; the device tree compiler utility (dtc) will then be able to verify that a given device tree file adheres to the schema. In the longer term, those device tree files are likely to move out of the kernel entirely (though the binding documentation for specific devices will almost certainly remain).
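No schema format had been designed at the time of writing, so the following is only a guess at what such a check might involve. The schema layout, the check_node() function, and the "acme,widget" device are all hypothetical; the real work would be done by dtc against actual .dts files, not by Python dictionaries.

```python
# Hypothetical schema for an imaginary device; not a real kernel binding.
WIDGET_SCHEMA = {
    "compatible": {"acme,widget", "acme,widget-v2"},
    "required": ["compatible", "reg", "interrupts"],
}


def check_node(node, schema):
    """Return a list of problems found in a device tree node.

    `node` is a dict mapping property names to values; "compatible" is
    expected to hold a list of strings, most specific first.
    """
    errors = []
    for prop in schema["required"]:
        if prop not in node:
            errors.append("missing required property '%s'" % prop)
    compatible = node.get("compatible", [])
    if compatible and not any(c in schema["compatible"] for c in compatible):
        errors.append("no recognized 'compatible' string")
    return errors


if __name__ == "__main__":
    node = {"compatible": ["acme,widget"], "reg": [0x1000, 0x100]}
    for problem in check_node(node, WIDGET_SCHEMA):
        print(problem)
```

A check of this kind can catch a device tree that omits a required property, but it says nothing about whether the binding itself is well designed; that part still requires human review.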

All told, the difficulties with device trees do not appear to be anything other than normal growing pains. A facility that was once only used for a handful of PowerPC machines (in the Linux context, anyway) is rapidly expanding to cover a sprawling architecture that is in wide use. Some challenges are to be expected in a situation like that. With luck and a fair amount of work, a better set of processes and guidelines for device tree bindings will result from the discussion — eventually.


The exfiltrated exFAT driver

By Jonathan Corbet
July 24, 2013
The exFAT filesystem is a Microsoft product, designed for flash media. It lacks support in the Linux kernel; as a proprietary, heavily patented filesystem, it is not the sort of thing one would expect to see free support for. Still, when the exfat-nofuse repository showed up on GitHub, some dared to hope that Linux would gain exFAT support after all. Instead, what we appear to have gained is an ugly licensing mess and code that is best avoided.

From what can be determined by looking at the repository, the code appears to work. It was originally written by Samsung, it seems, and was shipped with one or more Android devices. The problem is that, as far as anybody can tell, Samsung never intended to distribute this code under the GPL. Instead, a GitHub user who goes by "rxrz" somehow came by a copy of the code, removed the original proprietary licensing headers, and inserted a GPL license declaration into the code. The code claimed to have a GPL license, but the copyright owner never released the code under that license.

On July 9, another GitHub user filed a bug noting that the license declaration was incorrect and suggesting a removal of the repository. The entity known as rxrz was not impressed, though, saying:

It's a leaked code of a proprietary exfat driver, written by Samsung, Inc. It works, you can use it. What else do you want, a signed paper from your parents on whether you can or can not use it? I'm a programmer, not a lawyer. You got the code, now decide what to do with it, it's up to you.

The code has since been edited to remove the GPL declaration and restore the proprietary license, but it remains available on GitHub and rxrz evidently feels that nothing wrong was done by posting it there. It also appears that GitHub has no interest in pulling down the repository in the absence of an explicit takedown notice from Samsung, so this "leaked" driver may remain available for some time.

This whole episode seems like a fairly straightforward case of somebody trying to liberate proprietary code by any means available. There are some interesting questions raised by all of this, though. The first of those is: what if somebody had tried to merge this code into the mainline kernel? The immediate answer is that they would have been chased off the list once developers actually had a look at the code, which, to put it gently, does not much resemble Linux kernel code. In the absence of this obvious barrier, one can hope that our normal review mechanisms would have kept this code from being merged until the developer was able to provide a satisfactory explanation of where it came from.

But it is not clear that all of our code is reviewed to that level, so it is hard to be sure. An exFAT implementation is likely to attract enough attention to ensure that the right questions are asked. Had the code in question been a driver for a relatively obscure piece of hardware, instead, it might not have been looked at very closely.

Then, one might ask: why is Samsung shipping this as a proprietary module in the first place? After all, Samsung appears to have figured out how Linux kernel development works and has made a solid place for itself as one of the largest contributors to the kernel. One can only guess at the answer, but it likely has to do with claims that Microsoft makes over the exFAT format. Microsoft has shown itself to be willing to assert patents on filesystem formats, so taking some care with an implementation of a new Microsoft filesystem format would seem like an exercise in basic prudence. Whether this exercise led to ignoring the GPL in an imprudent manner is the subject of another debate entirely.

Similarly, some prudence would be advisable for anybody thinking to use the code as a reverse-engineering tool for a new exFAT implementation. It is hard to reverse-engineer one's way around patent problems. exFAT may well be a format that is best left alone.

Finally, for those who have been in this community for a long time, the attitude revealed by a number of participants in the GitHub issue thread may be surprising. Licensing, GPL or otherwise, appears not to matter to many of these people. All that matters is that the code can be downloaded and that it works. This attitude can be found elsewhere on GitHub; indeed, many have complained that GitHub itself seems to be indifferent at best to the licensing of the code it distributes.

Perhaps we are heading into some sort of post-copyright era where licensing truly no longer matters. But it would not be surprising if those who are interested in copyright resist that future for a while yet. We are not just talking about the entertainment industry here; the simple fact of the matter is that anybody who values the provisions of the GPL is indeed interested in copyright. It is hard to demand respect for the GPL while refusing to respect the terms of other licenses.

Among other things, that means that the kernel community must continue to be careful not to incorporate code that has not been contributed under a suitable license. So code that shows up on the net must be looked at carefully, no matter how useful it appears to be. In this case, there was no danger that the exFAT code would ever be merged; nobody even suggested that it should be. But there will be other modules of dubious provenance in the future, some of which may seem more legitimate at first glance. Even then, though, our processes should be good enough to find the problems and avoid a merger that we will later regret. Hopefully.

(Thanks to Armijn Hemel for the heads-up).


What's missing from our changelogs

By Jonathan Corbet
July 24, 2013
Tens of thousands of changes make their way into the mainline kernel every year. For most of those changes, the original motivation for the work is quickly forgotten; all that remains is the code itself and the changelog that goes with it. For this reason, kernel maintainers tend to insist on high-quality changelogs; as Linus recently put it, "We have a policy of good commit messages in the kernel." Andrew Morton also famously pushes developers to document the reasons explaining why a patch was written, including the user-visible effects of any bugs fixed. Kernel developers do not like having to reverse engineer the intent of a patch years after the fact.

With that context in mind, and having just worked through another merge window's worth of patches, your editor started wondering if our changelogs were always as good as they should be. A bit of scripting later, a picture of sorts has emerged; as one might expect, the results were not always entirely encouraging.

Changelogs

A patch's changelog is divided into three parts: a one-line summary, a detailed change explanation, and a tags section. For the most trivial patches, the one-line summary might suffice; there is not much to add to "add missing include of foo.h", for example. For anything else, one would expect a bit more text describing what is going on. So patches with empty explanation sections should be relatively rare.
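The three-part structure can be illustrated with a short Python sketch. This is not the script used to produce the numbers in this article; the split_changelog() function is invented for the example, and its rule for recognizing tag lines (trailing lines that look like "Something-by:" or "Cc:") is a simplifying assumption.

```python
import re

# Assumption: a tag line is "<Something>-by: ..." or "Cc: ..." at the end
# of the message. Real changelogs contain other tags as well.
TAG_RE = re.compile(r"^(?:[A-Z][A-Za-z-]*-by|Cc): \S")


def split_changelog(message):
    """Split a commit message into (summary, explanation, tags)."""
    lines = message.strip().splitlines()
    summary = lines[0].strip() if lines else ""
    body = lines[1:]
    tags = []
    # Walk backward from the end, collecting tag lines and skipping blanks.
    while body:
        last = body[-1].strip()
        if not last:
            body.pop()
        elif TAG_RE.match(last):
            tags.append(last)
            body.pop()
        else:
            break
    tags.reverse()  # restore the original order
    explanation = "\n".join(body).strip()
    return summary, explanation, tags


if __name__ == "__main__":
    message = (
        "foo: add missing include of foo.h\n"
        "\n"
        "Signed-off-by: A. Developer <dev@example.org>\n"
    )
    summary, explanation, tags = split_changelog(message)
    print("empty explanation" if not explanation else "has explanation")
```

A patch counts as having an empty explanation when the second element of the returned tuple is an empty string.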

As of this writing, just under 70,000 non-merge changesets have been pulled into the mainline repository since the release of the 3.5 kernel on July 21, 2012. Of those, 6,306 had empty explanations — 9% of the total. Many of them were as trivial as one might expect, but others were rather less so.

Some developers are rather more laconic than others. In the period since 3.5, the developers most inclined to omit explanations were:

Developer                 Count
Al Viro                     570
Ben Skeggs                  224
Mark Brown                  213
Hans Verkuil                204
Andreas Gruenbacher         143
Axel Lin                    130
Philipp Reisner             126
Antti Palosaari             118
James Smart                 107
Alex Deucher                 85
Laurent Pinchart             84
Kuninori Morimoto            75
Eric W. Biederman            75
Pavel Shilovsky              72
Rafał Miłecki                72
David S. Miller              65
David Howells                61
Peter Meerwald               61
Maxime Ripard                55
YOSHIFUJI Hideaki            51

For the curious, a page listing the no-explanation patches merged by the above developers is available. A quick look shows that a lot of patches with empty explanations find their way into the core virtual filesystem layer; many of the rest affect graphics drivers, audio drivers, Video4Linux drivers, and the DRBD subsystem. But they can be found anywhere; of the 1,065 changes that touched the mm/ subdirectory, 46 lacked an explanation, for example.

If one believes that there should be fewer patches with empty explanations going into the kernel, one might be inclined to push subsystem maintainers to be a bit more demanding in this regard. But, interestingly, it has become much harder to determine which maintainers have had a hand in directing patches into the kernel.

Signoffs

The Signed-off-by line in the tags section is meant to document the provenance of patches headed into the mainline. When a developer submits a patch, the changelog should contain a signoff certifying that the patch can properly be contributed to the kernel under a GPL-compatible license. Additionally, maintainers who accept patches add their own signoffs documenting that they handled the patch and that they believe it is appropriate for submission to the mainline. In theory, by following the sequence of Signed-off-by lines, it is possible to determine the path that any change followed to get to Linus's tree.

The truth is a little bit more complicated than that. To begin with, of the changes merged since 3.5, 79 had no signoffs at all. Roughly half of those were commits by Linus changing the version number; he does not apply a signoff to such changes, even for those that contain added data beyond the version number update. The rest are all almost certainly mistakes; a handful are the result of obvious formatting errors. See the full list for details. The mistakes are innocent, but they do show a failure of a process which is supposed to disallow patches that have not been signed off by their authors.

Arguably, there is another class of patches that is more interesting: those that contain a single Signed-off-by line. Such patches have, in theory, been managed by a single developer who wrote the patch and got it into the mainline unassisted. One might think that only Linus is in a position to do any such thing; how could anybody else get a change into the mainline on their own?

In fact, of the 70,000 patches pulled into the mainline during the period under discussion, 16,651 had a single signoff line. Of those, 11,527 (16% of the total) had no other tags, like Acked-by, Reviewed-by, or Tested-by, that would indicate attention from at least one other developer. For the purposes of this discussion, only the smaller set of patches has been considered. The most frequent committers of single-signoff patches are:
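The classification used here can be sketched in Python. Again, this is not the script behind the numbers in this article; the function names and category labels are invented for illustration, and the set of tags taken as evidence of outside attention is just the three named in the text.

```python
from collections import Counter

# Tags treated as a sign that another developer looked at the patch.
REVIEW_TAGS = {"acked-by", "reviewed-by", "tested-by"}


def classify_tags(tag_lines):
    """Classify a patch by the tag lines found in its changelog."""
    names = [line.split(":", 1)[0].strip().lower() for line in tag_lines]
    signoffs = names.count("signed-off-by")
    reviewed = any(name in REVIEW_TAGS for name in names)
    if signoffs == 0:
        return "no-signoff"
    if signoffs > 1:
        return "signoff-chain"
    return "single-signoff-reviewed" if reviewed else "single-signoff-only"


def tally_single_signoffs(commits):
    """Count single-signoff, no-review patches per author.

    `commits` is an iterable of (author, tag_lines) pairs.
    """
    counts = Counter()
    for author, tag_lines in commits:
        if classify_tags(tag_lines) == "single-signoff-only":
            counts[author] += 1
    return counts


if __name__ == "__main__":
    commits = [
        ("A Dev", ["Signed-off-by: A Dev <a@example.org>"]),
        ("B Dev", ["Signed-off-by: B Dev <b@example.org>",
                   "Signed-off-by: M Maintainer <m@example.org>"]),
    ]
    for author, count in tally_single_signoffs(commits).most_common():
        print(author, count)
```

Only patches in the "single-signoff-only" category are counted, matching the smaller set of patches considered in this discussion.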

Developer                 Count
Al Viro                     891
Takashi Iwai                525
Mark Brown                  492
Johannes Berg               414
Alex Deucher                391
Mauro Carvalho Chehab       389
Ben Skeggs                  362
Greg Kroah-Hartman          292
Trond Myklebust             279
David S. Miller             264
Felipe Balbi                259
Tomi Valkeinen              258
Arnaldo Carvalho de Melo    172
Eric W. Biederman           147
Josef Bacik                 145
Shawn Guo                   142
J. Bruce Fields             141
Ralf Baechle                132
Arnd Bergmann               131
Samuel Ortiz                129

(See this page for a list of the single-signoff patches merged by the above developers.) These results follow, of course, from the use of Git combined with the no-rebasing rule. Once a patch has been committed to a public repository, it becomes immutable and can never again acquire tags like Signed-off-by. To pick one example from the list above, wireless developer Johannes Berg maintains his own tree for mac80211 changes; when he commits a patch, it will carry his signoff. Changes flow from that tree to John Linville's wireless tree, then to Dave Miller's networking tree, and finally to the mainline repository. Since each of those moves is done with a Git "pull" operation, no additional signoffs will be attached to any of those patches; they will arrive in the mainline with a single signoff.

One might contend that patches become less subject to review once they enter the Git stream; they can be pulled from one repository to the next sight-unseen. Indeed, early in the BitKeeper era, developers worried that pull requests would be used to slip unreviewed patches into the mainline kernel. Single-signoff patches might be an indication that this is happening. And, indeed, important patches like the addition of the O_TMPFILE option went to the mainline with, as far as your editor can tell, no public posting or review (and no explanation in the changelog, for that matter). It also seems plausible that single-signoff patches merged into the sound subsystem or the Radeon driver (to name a couple of examples) have not been reviewed in detail by anybody other than the author; there just aren't that many people with the interest and skills to review that code.

Without a chain of signoff lines, we lose more than a picture of which maintainers might have reviewed the patches; we also lose track of the path by which a patch finds its way into the mainline. A given changeset may pass through a number of repositories, but those passages leave no mark on the changeset itself. Sometimes that path can be worked out from the mainline repository history, but doing so can be harder than one might imagine, even in the absence of "fast-forward merges" and other actions that obscure that history. Given that the Signed-off-by line was introduced to document how patches get into the kernel, the loss of this information may be a reason for concern.

The kernel community prides itself on its solid foundation of good procedures, including complete changelogs and a reliable signoff chain. Most of the time, that pride is entirely justified. But no project that merges 70,000 changes in a year can be expected to do a perfect job with every one of them, so it is unsurprising that there is room for improvement here and there. Improving the signoff chain will be difficult as long as the tools do not allow it, but even a bit more verbiage in commit messages would be appreciated by those of us who read the patch stream.



Page editor: Jonathan Corbet

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds