Kernel development

Brief items

Kernel release status

The current 2.6 development kernel is 2.6.28-rc4, released on November 9. "Nothing hugely exciting here. Various small fixes all over. There's a delayed FAT update which includes some movement of files around, and there's two fixes for some really long-standing problems (not really regressions, but nasty bugs) in Unix domain file descriptor passing." This release also contains a new Fujitsu MB862xx framebuffer driver and the introduction of a new internal API for dealing with CPU masks. See the long-format changelog for all the details.

As of this writing, just over 200 fixes have been merged into the mainline git repository since the 2.6.28-rc4 release.

The current stable 2.6 kernel is, released on November 7. It contains a long list of fixes accompanied by a stronger-than-usual encouragement to upgrade. The update is in the review process as of this writing; it will likely be released on November 14.

The and stable kernel updates came out on November 10. They both contain a long list of fixes, and both are intended to be the last in the series. Users who are dependent on these updates will want to consider moving to 2.6.27 in the near future.

Kernel development news

Quotes of the week

Google was going to be an interesting case of a large company hiring people both from the embedded world and also the existing Linux development community and then producing an embedded device that was intended to compete with the very best existing platforms. I had high hopes that this combination of factors would result in the Linux community as a whole having a better idea what the constraints and requirements for high-quality power management in the embedded world were, rather than us ending up with another pile of vendor code sitting on an FTP site somewhere in Taiwan that implements its power management by passing tokenised dead mice through a wormhole.

To a certain extent, my hopes were fulfilled. We got a git server in California.

-- Matthew Garrett

We should stop using CPP, which is the outdated tech of the sixties. We should go with the new wave of the seventies and use this shiny new "C" language that's all the rage with features like type checking and stuff.
-- Ingo Molnar

If four heads have exploded (thus far) over one piece of code, perhaps the blame doesn't lie with those heads.
-- Andrew Morton

Tracking of testers and bug reporters - a status report

By Jonathan Corbet
November 11, 2008
A recurring topic at kernel summits is proper recognition for users who report bugs and test fixes. These people help the development process considerably, but they are far less visible than the developers who are creating those bugs in the first place. Since we would like to have more testers and reporters, it makes sense to reward them in whatever way we can. One of the strongest currencies we hold is credit for work done. So it stands to reason that crediting those who help the development process is in the interest of everybody involved.

One mechanism developed for this purpose is a set of tags applied to patches before they are merged into the mainline. When a patch fixes a bug, the user(s) who reported that bug should be credited through the addition of a Reported-by: tag. Similarly, testers are credited with the Tested-by: tag. As it happens, some developers have adopted the habit of using Reported-and-tested-by: as a way of saving valuable newlines in the common case where a user fills both roles.
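Counting such tags is a simple text-processing job. The following Python sketch is only an illustration, not any existing tool: it assumes that the changelog messages are already available as strings (from git log output, for example) and that each tag sits on its own line in "Tag: Name <email>" form. The names count_credits(), share(), TAG_ROLES, and TAG_LINE are invented for the example.

```python
import re
from collections import Counter

# Which tally each tag feeds. The combined form credits both roles.
TAG_ROLES = {
    "reported-by": ("Reported-by",),
    "tested-by": ("Tested-by",),
    "reported-and-tested-by": ("Reported-by", "Tested-by"),
}

# "Tag: Name <email>", with the email address optional (assumed layout).
TAG_LINE = re.compile(r"^\s*([A-Za-z-]+):\s*([^<]+?)\s*(?:<[^>]*>)?\s*$")


def count_credits(changelogs):
    """Tally Reported-by: and Tested-by: credits per credited name."""
    tallies = {"Reported-by": Counter(), "Tested-by": Counter()}
    for message in changelogs:
        for line in message.splitlines():
            match = TAG_LINE.match(line)
            if not match:
                continue
            # Unrelated tags (Signed-off-by: and friends) map to no role.
            for role in TAG_ROLES.get(match.group(1).lower(), ()):
                tallies[role][match.group(2)] += 1
    return tallies


def share(count, total):
    """Format a count as a percentage of the total, to one decimal place."""
    return f"{100.0 * count / total:.1f}%"
```

Calling count_credits() on a list of messages returns one Counter per role; its most_common() method then yields a ranking, and share() turns a count into a percentage of all credits of that type.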

There is a certain warm feeling that comes with having one's name stored in a changelog entry in the kernel source repository. But the amount of visibility which comes from this event is relatively small. So your editor decided to hack up his git data mining utility to track these tags. Without further ado, here are the top problem reporters and patch testers for the 2.6.27 development cycle:

Most credited 2.6.27 testers

Reported-by: credits
Name                       Count   Share
Adrian Bunk                   43   21.0%
Robert P. J. Day              12    5.9%
Eric Sesterhenn                5    2.4%
Andrew Morton                  4    2.0%
Alexey Dobriyan                4    2.0%
Denys Fedoryshchenko           4    2.0%
Yinghai Lu                     3    1.5%
David S. Miller                3    1.5%
Vegard Nossum                  3    1.5%
Stephen Rothwell               3    1.5%
Juha Leppanen                  3    1.5%
Russell King                   2    1.0%
Andi Kleen                     2    1.0%
Ingo Molnar                    2    1.0%
Benjamin Herrenschmidt         2    1.0%
Daniel J Blueman               2    1.0%
Daniel Exner                   2    1.0%
Manuel Lauss                   2    1.0%
Atsushi Nemoto                 2    1.0%
Mikael Pettersson              2    1.0%

Tested-by: credits
Name                       Count   Share
Ingo Molnar                    7    4.6%
Andrew Savchenko               6    3.9%
Rene Herman                    4    2.6%
Mariusz Kozlowski              3    2.0%
Alexey Dobriyan                3    2.0%
Tino Keitel                    3    2.0%
Robert Jarzmik                 3    2.0%
KOSAKI Motohiro                2    1.3%
Benjamin Herrenschmidt         2    1.3%
Larry Finger                   2    1.3%
Kenji Kaneshige                2    1.3%
Jack Howarth                   2    1.3%
Gerald Schaefer                2    1.3%
Dennis Jansen                  2    1.3%
Daniel J Blueman               2    1.3%
Daniel Exner                   2    1.3%
Steven Noonan                  2    1.3%
Lawrence Greenfield            2    1.3%
Mark Langsdorf                 2    1.3%

All told, there were a total of 205 Reported-by: and 153 Tested-by: credits entered during the 2.6.27 kernel cycle. This is arguably a reasonable start for a new tag, but it seems clear that a lot of problem reporters are not, yet, being credited in this manner. Your editor became curious to see just who is taking the time to credit these people; they, too, deserve some credit. A bit more script hacking yielded these tables:

Developers giving credits in 2.6.27

Reported-by: credits
Name                       Count   Share
Adrian Bunk                   44   21.5%
Linus Torvalds                12    5.9%
Ingo Molnar                    8    3.9%
Andrew Morton                  7    3.4%
Peter Zijlstra                 7    3.4%
Bartlomiej Zolnierkiewicz      6    2.9%
Yinghai Lu                     5    2.4%
Jarek Poplawski                5    2.4%
Jiri Kosina                    5    2.4%
Hugh Dickins                   4    2.0%
FUJITA Tomonori                4    2.0%
Paul Mundt                     4    2.0%
Vegard Nossum                  3    1.5%
Russell King                   3    1.5%
Jeremy Fitzhardinge            3    1.5%
Roland McGrath                 3    1.5%
Haavard Skinnemoen             3    1.5%
Dmitry Torokhov                3    1.5%
David Woodhouse                3    1.5%
Oleg Nesterov                  3    1.5%

Tested-by: credits
Name                       Count   Share
Pekka Enberg                   7    4.6%
Linus Torvalds                 7    4.6%
Takashi Iwai                   5    3.3%
Bartlomiej Zolnierkiewicz      5    3.3%
Peter Zijlstra                 4    2.6%
Rafael J. Wysocki              4    2.6%
Yinghai Lu                     4    2.6%
Hugh Dickins                   4    2.6%
Alan Stern                     4    2.6%
Eric Miao                      4    2.6%
Thomas Gleixner                3    2.0%
Lennert Buytenhek              3    2.0%
Alex Chiang                    3    2.0%
Krzysztof Helt                 3    2.0%
Stefan Richter                 3    2.0%
Andy Whitcroft                 3    2.0%
KOSAKI Motohiro                2    1.3%
Dennis Jansen                  2    1.3%
Andrew Morton                  2    1.3%
David S. Miller                2    1.3%

The end result: Adrian Bunk gave over 20% of the total bug reporting credits - to himself. Beyond that, a number of the core developers are taking at least some time to credit those who report bugs and test patches. But, in the end, the 10,628 changesets merged for 2.6.27 probably contained quite a few more patches which could have carried such tags. If the reporting and testing tags are to become truly useful and significant, they will have to be more universally used.

While your editor was at it, he also collected statistics for Reviewed-by: tags. These tags differ in that they are offered by the reviewer, who thereby states that a reasonably thorough review has been done and the code has not been found seriously wanting. Code review is perennially in short supply in just about any free software project, so, again, proper credit for reviewers seems like more than just a good idea. Here are the top credited reviewers for 2.6.27:

Developers with the most reviews (total 123)
Name                       Count   Share
Ingo Molnar                   23   18.7%
Paul Jackson                  12    9.8%
Peter Zijlstra                11    8.9%
Christoph Lameter             10    8.1%
Aneesh Kumar K.V               7    5.7%
KOSAKI Motohiro                6    4.9%
Paul E. McKenney               6    4.9%
Jeff Moyer                     5    4.1%
Robert P. J. Day               4    3.3%
Nadia Derbey                   3    2.4%
Paul E. McKenney               3    2.4%
Mingming Cao                   2    1.6%
Michael Buesch                 2    1.6%
Li Zefan                       2    1.6%
Matthew Wilcox                 2    1.6%
Ingo Oeser                     2    1.6%
Badari Pulavarty               2    1.6%

If these numbers are to be believed, only 123 reviews were performed over the 2.6.27 development cycle. Even the most cynical observer is likely to agree that a bit more reviewing than that is going on. Most reviewers do not offer the associated tag, so their contribution goes unrecorded. In particular, Andrew Morton, who seems to review almost every patch which appears, should be at the top of the above list.

Clearly, the task of ensuring proper credit for testers, bug reporters, and reviewers is still in its initial stages. But one has to start somewhere; this is more information than we had before. Hopefully, over time, the habit of crediting those who help with the development process will become more widespread. And that, with luck, will encourage more testing and bug reporting and, as a result, a better kernel.

/dev/ksm: dynamic memory sharing

By Jonathan Corbet
November 12, 2008
The kernel generally goes out of its way to share identical memory pages between processes. Program text is always shared, for example. But writable pages will also be shared between processes when the kernel knows that the contents of the memory are the same for all processes involved. When a process calls fork(), all writable pages are turned into copy-on-write (COW) pages and shared between the parent and child. As long as neither process modifies the contents of any given page, that sharing can continue, with a corresponding reduction in memory use.

Copy-on-write with fork() works because the kernel knows that each process expects to find the same contents in those pages. When the kernel lacks that knowledge, though, it will generally be unable to arrange sharing of identical pages. One might not think that this would ordinarily be a problem, but the KVM developers have come up with a couple of situations where this kind of sharing opportunity might come about. Your editor cannot resist this case proposed by Avi Kivity:

Consider the typical multiuser gnome minicomputer with all 150 users reading at the same time instead of working. You could share the firefox rendered page cache, reducing memory utilization drastically.

Beyond such typical systems, though, consider the case of a host running a number of virtualized guests. Those guests will not share a process-tree relationship which makes the sharing of pages between them easy, but they may well be using a substantial portion of their memory to hold identical contents. If that host could find a way to force the sharing of pages with identical contents, it should be able to make much better use of its memory and, as a result, run more guests. This is the kind of thing which gets the attention of virtualization developers. So the hackers at Red Hat (Izik Eidus, Andrea Arcangeli, and Chris Wright in particular) have put together a mechanism to make that kind of sharing happen. The resulting code, called KSM, was recently posted for wider review.

KSM takes the form of a device driver for a single, virtual device: /dev/ksm. A process which wants to take part in the page sharing regime can open that device and register (with an ioctl() call) a portion of its address space with the KSM driver. Once the page sharing mechanism is turned on (via another ioctl()), the kernel will start looking for pages to share.

The algorithm is relatively simple. The KSM driver, inside a kernel thread, picks one of the memory regions registered with it and starts scanning over it. For each page which is resident in memory, KSM will generate an SHA1 hash of the page's contents. That hash will then be used to look up other pages with the same hash value. If a subsequent memcmp() call shows that the contents of the pages are truly identical, all processes with a reference to the scanned page will be pointed (in COW mode) to the other one, and the redundant page will be returned to the system. As long as nobody modifies the page, the sharing can continue; once a write operation happens, the page will be copied and the sharing will end.
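The scan described above can be modeled in a few lines of user-space Python. This is a toy simulation, not the KSM code: Page, AddressSpace, merge_identical_pages(), and the 4096-byte page size are all assumptions made for illustration, and only one candidate page is remembered per hash value to keep things short.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for the illustration


class Page:
    """A physical page: its contents plus a count of mappings pointing at it."""

    def __init__(self, data):
        self.data = bytes(data)
        self.refs = 0


class AddressSpace:
    """A toy process: a list of slots, each referring to a Page."""

    def __init__(self, contents):
        self.slots = []
        for data in contents:
            page = Page(data)
            page.refs = 1
            self.slots.append(page)

    def write(self, index, data):
        """Write to a page, copying it first if it is shared (COW)."""
        page = self.slots[index]
        if page.refs > 1:
            page.refs -= 1
            page = Page(page.data)
            page.refs = 1
            self.slots[index] = page
        page.data = bytes(data)


def merge_identical_pages(spaces):
    """Scan all registered address spaces once; return the pages freed."""
    by_hash = {}  # SHA-1 digest -> first page seen with that digest
    freed = 0
    for space in spaces:
        for index, page in enumerate(space.slots):
            digest = hashlib.sha1(page.data).digest()
            candidate = by_hash.setdefault(digest, page)
            if candidate is page:
                continue  # first page with this hash, or already merged
            # A matching hash is only a hint; compare the full contents,
            # which stands in for the memcmp() step.
            if candidate.data == page.data:
                space.slots[index] = candidate
                candidate.refs += 1
                page.refs -= 1
                if page.refs == 0:
                    freed += 1
    return freed
```

Two address spaces which each hold a page full of zeroes will end up pointing at a single Page after one call to merge_identical_pages(); a later write() through either of them copies the page first, ending the sharing for that slot only.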

The kernel thread will scan up to a maximum number of pages before going to sleep for a while. Both the number of pages to scan and the sleep period are passed in as parameters to the ioctl() call which starts scanning. A user-space control process can also pause scanning via another ioctl() call.
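The throttling can be sketched in the same toy style. The function scan_in_batches() and its parameter names, pages_to_scan and sleep_seconds, are hypothetical stand-ins for the two values passed to the ioctl() call; the actual interface is not reproduced here, and the pause operation is left out.

```python
import time


def scan_in_batches(pages, pages_to_scan, sleep_seconds, visit, sleep=time.sleep):
    """Visit every page, sleeping after each batch of pages_to_scan pages.

    Returns the number of times the scanner went to sleep.
    """
    sleeps = 0
    for count, page in enumerate(pages, start=1):
        visit(page)
        if count % pages_to_scan == 0:
            # Batch complete: yield the CPU for a while before continuing.
            sleep(sleep_seconds)
            sleeps += 1
    return sleeps
```

A smaller batch or a longer sleep makes the scanner cheaper to run but slower to find pages which can be shared. The sleep argument exists only so that the delay can be replaced when experimenting.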

The initial response to the patch from Andrew Morton was not entirely enthusiastic:

The whole approach seems wrong to me. The kernel lost track of these pages and then we run around post-facto trying to fix that up again. Please explain (for the changelog) why the kernel cannot get this right via the usual sharing, refcounting and COWing approaches.

The answer from Avi Kivity was reasonably clear:

For kvm, the kernel never knew those pages were shared. They are loaded from independent (possibly compressed and encrypted) disk images. These images are different; but some pages happen to be the same because they came from the same installation media.

Izik Eidus adds that, with this patch, a host running a bunch of Windows guests is able to overcommit its memory 300% without terribly ill effects. This technique, it seems, is especially effective with Windows guests: Windows apparently zeroes all freed memory, so each guest's list of free pages can be coalesced down to a single, shared page full of zeroes.

What has not been done (or, at least, not posted) is any sort of benchmarking of the impact KSM has on a running system. The scanning, hashing, and comparing of pages will require some CPU time, and it is likely to have noticeable cache effects as well. If you are trying to run dozens of Windows guests, cache effects may well be relatively low on your list of problems. But that cost may be sufficient to prevent the more general use of KSM, even though systems which are not using virtualization at all may still have a lot of pages with identical contents.

The sad story of the em28xx driver

By Jonathan Corbet
November 11, 2008
Over the last year or two, the kernel development process has been changed in a deliberate attempt to make the addition of new drivers easier. It has become clear that out-of-tree drivers often do not get any better until they are merged; meanwhile, users want those drivers and distributors are shipping them. So it would seem that everybody's interests are served by getting those drivers into the mainline tree. Experience with drivers merged under this policy has generally been positive; once those drivers head for the mainline, they get more attention and tend to improve quickly.

Given that, one might well wonder why Markus Rechberger's recently submitted "empia" driver series is encountering so much resistance. This driver works with a number of video acquisition devices based on Empia chips; many of those are not supported by the kernel now. As an Empia Technology employee, Markus has access to the relevant data sheets and is, thus, well placed to write a fully-functional driver. There are users who will attest that the drivers work, and that Markus provides good support for them. But, as things stand now, it would appear that this driver is not headed for the mainline.

What we have here is a classic story of an impedance mismatch between a developer and the development community. In the process, this long story has helped to give the Video4Linux development community a bit of a reputation as a dysfunctional family - a perception which those developers are only now beginning to overcome. The sad truth would seem to be that, while working with the community is something that a couple thousand developers do with little trouble every year, there will always be a few who have difficulties.

A quick review of some of the history is in order here. Markus was one of the authors of the original em28xx driver, first merged for the 2.6.15 kernel. His efforts to enhance that driver quickly ran into trouble, though, when he tried to make substantial changes to the low-level tuner interface - changes which affected a number of other drivers. These changes were not popular in the Video4Linux community, and there were fears that they could break unrelated drivers. So this code was not merged.

In response to this rejection, Markus claimed ownership of the em28xx driver and asked that it be removed from the mainline kernel. He then continued development of the code, hosting it on his own server. There was even a period where the code was relicensed to the MPL, apparently as part of an attempt to prevent it from being taken into the mainline. Eventually, Markus came back with a new approach which moved much of the tuner code into user space. That solution, too, failed to pass review; nobody else could really see much advantage in moving that much driver code out of the kernel. The fact that Markus clearly intended to have some of that code appear in the form of binary-only blobs did not help his case. So the user-space approach, like its predecessor, was not merged.

While Markus was working on his own version of the code, others were putting patches into the mainline em28xx driver. At times, Markus tried to block those changes. The tone of the discussion is, perhaps, best seen from this note sent to Video4Linux maintainer Mauro Carvalho Chehab:

Best would be to replace you as a maintainer since you don't have any respect of others work either. Companies should be aware that if they try to submit any code to you they will loose the authority over _their_ work.

Of course, losing "authority" over code is inherent in releasing that code under a license like the GPL. This attempt to exercise control over freely-licensed code was slapped down by Andrew Morton and others, but it left unpleasant memories behind.

Now Markus is back with a driver that, to all appearances, duplicates the functionality of a driver which is already in the mainline kernel. It is not hard to see this submission as an attempt to retake control of that driver and, perhaps, restart the discussions from past years. So it is not entirely surprising that this driver has not been received with a great deal of enthusiasm. In short, Markus has been told to go away until he is prepared to submit his work in the form of a series of small patches to the in-tree em28xx driver.

The advantages of improving the current driver, rather than duplicating some of its functionality in a new code base, are clear. It would avoid the confusion which can come from having two drivers for the same hardware in the tree, and it would minimize the risk of losing important fixes which have been applied to the in-tree code. This is, also, the way that kernel developers are normally expected to do their work. On the other hand, video developer Hans Verkuil reviewed the new driver and concluded:

In my opinion it's pretty much hopeless trying to convert the current em28xx driver into what you have. It's a huge amount of work that no one wants to do and (in this case) with very little benefit.

This review notwithstanding, Mauro has indicated that he is not interested in accepting this patch. But rejecting Markus's new driver out of hand might just be a mistake. There seems to be little doubt that it has developed well beyond the in-tree driver; it supports a wider range of devices. Failure to merge it risks losing the work that has been done, and, perhaps, losing the future work of a developer who, for all his faults, is clearly trying to provide a better experience for Video4Linux users.

Having multiple drivers for the same hardware in the kernel is not an ideal situation, but it is also not without precedent. The IDE and parallel ATA subsystems provide redundant support for a wide range of hardware. The e1000 and e1000e drivers had overlapping coverage for some time. In such cases, the long-term goal is usually to work toward the removal of one of the drivers.

So one could make the case for merging the new driver and, eventually, removing the older one. In the process, the new driver could receive some much-needed attention from other developers. It has coding style and copyright attribution problems; a quick review has also left your editor wondering about locking issues. But such problems are common to drivers which have spent a lot of time out of tree; they are simply something to fix. Meanwhile, this driver contains the result of years of work and access to the relevant data sheets; freezing it out may not be in the best interests of kernel developers or users.

Page editor: Jonathan Corbet

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds