Brief items
The current 2.6 development kernel is 2.6.28-rc4,
released on November 9.
"Nothing hugely exciting here. Various small fixes all over. There's
a delayed FAT update which includes some movement of files around, and
there's two fixes for some really long-standing problems (not really
regressions, but nasty bugs) in Unix domain file descriptor
passing." This release also contains a new Fujitsu MB862xx
framebuffer driver and the introduction of a new internal API for dealing
with CPU masks. See the long-format changelog for all the details.
As of this writing, just over 200 fixes have been merged into the mainline
git repository since the 2.6.28-rc4 release.
The current stable 2.6 kernel is 2.6.27.5, released on November 7. It
contains a long list of fixes accompanied by a stronger-than-usual
encouragement to upgrade. The 2.6.27.6 update is in the review
process as of this writing; it will likely be released on November 14.
The 2.6.25.20 and 2.6.26.8 stable kernel updates came out on
November 10. They both contain a long list of fixes, and both are
intended to be the last in the series. Users who are dependent on these
updates will want to consider moving to 2.6.27 in the near future.
Kernel development news
Google was going to be an interesting case of a large company
hiring people both from the embedded world and also the existing
Linux development community and then producing an embedded device
that was intended to compete with the very best existing
platforms. I had high hopes that this combination of factors would
result in the Linux community as a whole having a better idea what
the constraints and requirements for high-quality power management
in the embedded world were, rather than us ending up with another
pile of vendor code sitting on an FTP site somewhere in Taiwan that
implements its power management by passing tokenised dead mice
through a wormhole.
To a certain extent, my hopes were fulfilled. We got a git server in California.
--
Matthew Garrett
We should stop using CPP, which is the outdated tech of the
sixties. We should go with the new wave of the seventies and use
this shiny new "C" language that's all the rage with features like
type checking and stuff.
--
Ingo Molnar
If four heads have exploded (thus far) over one piece of code,
perhaps the blame doesn't lie with those heads.
--
Andrew Morton
By Jonathan Corbet
November 11, 2008
A recurring topic at kernel summits is proper recognition for users who
report bugs and test fixes. These people help the development process
considerably, but they are far less visible than the developers who are
creating those bugs in the first place. Since we would like to have more
testers and reporters, it makes sense to reward them in whatever way we
can. One of the strongest currencies we hold is credit for work done. So
it stands to reason that crediting those who help the development process
is in the interest of everybody involved.
One mechanism developed for this purpose is a set of tags applied to
patches before they are merged into the mainline. When a patch fixes a
bug, the user(s) who reported that bug should be credited through the
addition of a Reported-by: tag. Similarly, testers are credited
with the Tested-by: tag. As it happens, some developers have
adopted the habit of using Reported-and-tested-by: as a way of
saving valuable newlines in the common case where a user fills both roles.
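As a rough illustration of how such tags might be tallied, here is a minimal Python sketch. It is not the data mining utility mentioned in this article; the function names, the sample commit messages, and the decision to count a Reported-and-tested-by: tag once toward each role are all assumptions made for the example.

```python
import re
from collections import Counter

# Tag names come from the article; the combined form is assumed to
# credit both roles.
TAG_KINDS = {
    "reported-by": ("reported",),
    "tested-by": ("tested",),
    "reported-and-tested-by": ("reported", "tested"),
}


def parse_credits(message):
    """Return a list of (kind, name) pairs found in one commit message."""
    credits = []
    for line in message.splitlines():
        tag, sep, value = line.strip().partition(":")
        kinds = TAG_KINDS.get(tag.lower())
        if not sep or kinds is None:
            continue
        # Drop a trailing "<email>" so each person is keyed by name only.
        name = re.sub(r"\s*<[^>]*>\s*$", "", value).strip()
        if name:
            credits.extend((kind, name) for kind in kinds)
    return credits


def tally(messages):
    """Count credits per person, separately for reporting and testing."""
    counts = {"reported": Counter(), "tested": Counter()}
    for message in messages:
        for kind, name in parse_credits(message):
            counts[kind][name] += 1
    return counts


# Hypothetical commit messages, invented for illustration only.
SAMPLE_LOG = [
    "foo: fix oops on unload\n\n"
    "Reported-by: Jane Tester <jane@example.org>\n"
    "Signed-off-by: Some Developer <dev@example.org>",
    "bar: avoid NULL dereference\n\n"
    "Reported-and-tested-by: Jane Tester <jane@example.org>",
    "baz: fix build\n\nTested-by: Sam Builder <sam@example.org>",
]

if __name__ == "__main__":
    for kind, counter in tally(SAMPLE_LOG).items():
        print(kind, counter.most_common())
```

In practice the messages would come from the repository history rather than a hard-coded list; how the real tool obtains them is not described here.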
There is a certain warm feeling that comes with having one's name stored in
a changelog entry in the kernel source repository. But the amount of
visibility which comes from this event is relatively small. So your editor
decided to hack up his git data mining utility to track these tags.
Without further ado, here are the top problem reporters and patch testers
for the 2.6.27 development cycle:
| Most credited 2.6.27 reporters and testers |
| Reported-by: credits |
| Name | Credits | Share of total |
| Adrian Bunk | 43 | 21.0% |
| Robert P. J. Day | 12 | 5.9% |
| Eric Sesterhenn | 5 | 2.4% |
| Andrew Morton | 4 | 2.0% |
| Alexey Dobriyan | 4 | 2.0% |
| Denys Fedoryshchenko | 4 | 2.0% |
| Yinghai Lu | 3 | 1.5% |
| David S. Miller | 3 | 1.5% |
| Vegard Nossum | 3 | 1.5% |
| Stephen Rothwell | 3 | 1.5% |
| Juha Leppanen | 3 | 1.5% |
| Russell King | 2 | 1.0% |
| Andi Kleen | 2 | 1.0% |
| Ingo Molnar | 2 | 1.0% |
| Benjamin Herrenschmidt | 2 | 1.0% |
| Daniel J Blueman | 2 | 1.0% |
| Daniel Exner | 2 | 1.0% |
| Manuel Lauss | 2 | 1.0% |
| Atsushi Nemoto | 2 | 1.0% |
| Mikael Pettersson | 2 | 1.0% |
| Tested-by: credits |
| Name | Credits | Share of total |
| Ingo Molnar | 7 | 4.6% |
| Andrew Savchenko | 6 | 3.9% |
| Rene Herman | 4 | 2.6% |
| Mariusz Kozlowski | 3 | 2.0% |
| Alexey Dobriyan | 3 | 2.0% |
| Tino Keitel | 3 | 2.0% |
| Robert Jarzmik | 3 | 2.0% |
| KOSAKI Motohiro | 2 | 1.3% |
| Benjamin Herrenschmidt | 2 | 1.3% |
| Larry Finger | 2 | 1.3% |
| Kenji Kaneshige | 2 | 1.3% |
| Jack Howarth | 2 | 1.3% |
| Gerald Schaefer | 2 | 1.3% |
| Dennis Jansen | 2 | 1.3% |
| Daniel J Blueman | 2 | 1.3% |
| Daniel Exner | 2 | 1.3% |
| Steven Noonan | 2 | 1.3% |
| Rus | 2 | 1.3% |
| Lawrence Greenfield | 2 | 1.3% |
| Mark Langsdorf | 2 | 1.3% |
All told, there were a total of 205 Reported-by: and 153
Tested-by: credits entered during the 2.6.27 kernel cycle. This
is arguably a reasonable start for a new tag, but it seems clear that a lot
of problem reporters are not, yet, being credited in this manner. Your
editor became curious to see just who is taking the time to credit these
people; they, too, deserve some credit. A bit more script hacking yielded
these tables:
| Developers giving credits in 2.6.27 |
| Reported-by: credits |
| Name | Credits | Share of total |
| Adrian Bunk | 44 | 21.5% |
| Linus Torvalds | 12 | 5.9% |
| Ingo Molnar | 8 | 3.9% |
| Andrew Morton | 7 | 3.4% |
| Peter Zijlstra | 7 | 3.4% |
| Bartlomiej Zolnierkiewicz | 6 | 2.9% |
| Yinghai Lu | 5 | 2.4% |
| Jarek Poplawski | 5 | 2.4% |
| Jiri Kosina | 5 | 2.4% |
| Hugh Dickins | 4 | 2.0% |
| FUJITA Tomonori | 4 | 2.0% |
| Paul Mundt | 4 | 2.0% |
| Vegard Nossum | 3 | 1.5% |
| Russell King | 3 | 1.5% |
| Jeremy Fitzhardinge | 3 | 1.5% |
| Roland McGrath | 3 | 1.5% |
| Haavard Skinnemoen | 3 | 1.5% |
| Dmitry Torokhov | 3 | 1.5% |
| David Woodhouse | 3 | 1.5% |
| Oleg Nesterov | 3 | 1.5% |
| Tested-by: credits |
| Name | Credits | Share of total |
| Pekka Enberg | 7 | 4.6% |
| Linus Torvalds | 7 | 4.6% |
| Takashi Iwai | 5 | 3.3% |
| Bartlomiej Zolnierkiewicz | 5 | 3.3% |
| Peter Zijlstra | 4 | 2.6% |
| Rafael J. Wysocki | 4 | 2.6% |
| Yinghai Lu | 4 | 2.6% |
| Hugh Dickins | 4 | 2.6% |
| Alan Stern | 4 | 2.6% |
| Eric Miao | 4 | 2.6% |
| Thomas Gleixner | 3 | 2.0% |
| Lennert Buytenhek | 3 | 2.0% |
| Alex Chiang | 3 | 2.0% |
| Krzysztof Helt | 3 | 2.0% |
| Stefan Richter | 3 | 2.0% |
| Andy Whitcroft | 3 | 2.0% |
| KOSAKI Motohiro | 2 | 1.3% |
| Dennis Jansen | 2 | 1.3% |
| Andrew Morton | 2 | 1.3% |
| David S. Miller | 2 | 1.3% |
The end result: Adrian Bunk gave over 20% of the total bug reporting
credits - to himself. Beyond that, a number of the core developers are
taking at least some time to credit those who report bugs and test
patches. But, in the end, the 10,628 changesets merged for 2.6.27 probably
contained quite a few more patches which could have carried such tags. If
the reporting and testing tags are to become truly useful and significant,
they will have to be more universally used.
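A quick back-of-the-envelope calculation puts those numbers in perspective. The sketch below uses only the figures quoted above; the helper names are invented for the example, and the coverage figure is an upper bound that assumes every tag sits in a different changeset.

```python
# Figures quoted in the article for the 2.6.27 development cycle.
REPORTED_BY_TAGS = 205
TESTED_BY_TAGS = 153
CHANGESETS = 10628


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def max_tagged_share():
    """Upper bound on the share of changesets carrying either tag.

    The bound assumes every tag sits in a different changeset; any
    changeset carrying several tags only lowers the true figure.
    """
    return share(REPORTED_BY_TAGS + TESTED_BY_TAGS, CHANGESETS)


if __name__ == "__main__":
    print("top reporter share (%):", share(43, REPORTED_BY_TAGS))
    print("tagged changesets, at most (%):", max_tagged_share())
```

Even under that generous assumption, only a few percent of the changesets in this cycle carried one of these tags.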
While your editor was at it, he also collected statistics for
Reviewed-by: tags. These tags differ in that they are offered by
the reviewer, who thereby states that a reasonably thorough review has been
done and the code has not been found seriously wanting. Code review is
perennially in short supply in just about any free software project, so,
again, proper credit for reviewers seems like more than just a good idea.
Here are the top credited reviewers for 2.6.27:
| Developers with the most reviews (total 123) |
| Name | Reviews | Share of total |
| Ingo Molnar | 23 | 18.7% |
| Paul Jackson | 12 | 9.8% |
| Peter Zijlstra | 11 | 8.9% |
| Christoph Lameter | 10 | 8.1% |
| Aneesh Kumar K.V | 7 | 5.7% |
| KOSAKI Motohiro | 6 | 4.9% |
| Paul E. McKenney | 6 | 4.9% |
| Jeff Moyer | 5 | 4.1% |
| Robert P. J. Day | 4 | 3.3% |
| Nadia Derbey | 3 | 2.4% |
| Paul E. McKenney | 3 | 2.4% |
| Mingming Cao | 2 | 1.6% |
| Michael Buesch | 2 | 1.6% |
| Li Zefan | 2 | 1.6% |
| Matthew Wilcox | 2 | 1.6% |
| Ingo Oeser | 2 | 1.6% |
| Badari Pulavarty | 2 | 1.6% |
If these numbers are to be believed, only 123 reviews were performed over
the 2.6.27 development cycle. Even the most cynical observer is likely to
agree that a bit more reviewing than that is going on. Most reviewers do
not offer the associated tag, so their contribution goes unrecorded. In
particular, Andrew Morton, who seems to review almost every patch which
appears, should be at the top of the above list.
Clearly, the task of ensuring proper credit for testers, bug reporters, and
reviewers is still in its initial stages. But one has to start somewhere;
this is more information than we had before. Hopefully, over time, the
habit of crediting those who help with the development process will become
more widespread. And that, with luck, will encourage more testing and bug
reporting and, as a result, a better kernel.
By Jonathan Corbet
November 12, 2008
The kernel generally goes out of its way to share identical memory pages between
processes. Program text is always shared, for example. But writable pages
will also be shared between processes when the kernel knows that the
contents of the memory are the same for all processes involved. When a
process calls
fork(), all writable pages are turned into
copy-on-write (COW) pages and shared between the parent and child. As long
as neither process modifies the contents of any given page, that sharing
can continue, with a corresponding reduction in memory use.
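The visible effect of copy-on-write can be seen from user space with a small Python sketch. It assumes a Unix-like system where os.fork() is available, and the function name is made up for the example. It only demonstrates the semantics (a write in the child never shows up in the parent); the page sharing itself happens inside the kernel and is not directly observable from this code.

```python
import os


def fork_and_modify():
    """Fork, let the child overwrite a buffer, and report what each side sees."""
    buf = bytearray(b"original")
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's private copy of the page.
        os.close(read_end)
        buf[:] = b"modified"
        os.write(write_end, bytes(buf))
        os._exit(0)
    # Parent: collect the child's view, then look at our own buffer.
    os.close(write_end)
    child_view = os.read(read_end, len(buf))
    os.close(read_end)
    os.waitpid(pid, 0)
    return bytes(buf), child_view


if __name__ == "__main__":
    parent_view, child_view = fork_and_modify()
    print("parent sees:", parent_view)
    print("child saw:  ", child_view)
```

The parent's buffer is untouched because the child's write triggered a private copy of the affected page; pages that neither side writes to can stay shared.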
Copy-on-write with fork() works because the kernel knows that each
process expects to find the same contents in those pages. When the kernel
lacks that knowledge, though, it will generally be unable to arrange
sharing of identical pages. One might not think that this would ordinarily
be a problem, but the KVM developers have come up with a couple of
situations where this kind of sharing opportunity might come about. Your
editor cannot resist this case proposed by
Avi Kivity:
Consider the typical multiuser gnome minicomputer with all 150
users reading lwn.net at the same time instead of working. You
could share the firefox rendered page cache, reducing memory
utilization drastically.
Beyond such typical systems, though, consider the case of a host running a
number of virtualized guests. Those guests will not share a process-tree
relationship which makes the sharing of pages between them easy, but they
may well be using a substantial portion of their memory to hold identical
contents. If that host could find a way to force the sharing of pages with
identical contents, it should be able to make much better use of its memory
and, as a result, run more guests.
This is the kind of thing which gets the attention of virtualization
developers. So the hackers at Qumranet, now part of Red Hat (Izik
Eidus, Andrea Arcangeli, and Chris Wright in particular), have put
together a mechanism to make that kind of sharing happen. The resulting
code, called KSM, was recently posted for wider review.
KSM takes the form of a device driver for a single, virtual device:
/dev/ksm. A process which wants to take part in the page sharing
regime can open that device and register (with an ioctl() call) a
portion of its address space with the KSM driver. Once the page sharing
mechanism is turned on (via another ioctl()), the kernel will
start looking for pages to share.
The algorithm is relatively simple. The KSM driver, inside a kernel
thread, picks one of the memory regions registered with it and starts
scanning over it. For each page which is resident in memory, KSM will
generate an SHA1 hash of the page's contents. That hash will then be used
to look up other pages with the same hash value. If a subsequent
memcmp() call shows that the contents of the pages are truly
identical, all processes with a reference to the scanned page will be
pointed (in COW mode) to the other one, and the redundant page will be
returned to the system. As long as nobody modifies the page, the sharing
can continue; once a write operation happens, the page will be copied and
the sharing will end.
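The scan-and-merge idea described above can be modeled in user space. The following Python sketch is a toy simulation, not the KSM code: the ToyMemory class, the guest layout, and the 4096-byte page size are assumptions made for illustration. It hashes each page with SHA1, confirms a match with a full comparison (standing in for memcmp()), points the mapping at the existing frame, and copies the frame again on the first write.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for the toy model


class ToyMemory:
    """Toy model of physical frames mapped by (guest, virtual page) pairs."""

    def __init__(self):
        self.frames = {}      # frame id -> page contents (bytes)
        self.refcount = {}    # frame id -> number of mappings pointing at it
        self.mappings = {}    # (guest, vpage) -> frame id
        self._next_frame = 0

    def map_page(self, guest, vpage, contents):
        """Give (guest, vpage) a fresh private frame holding contents."""
        frame = self._next_frame
        self._next_frame += 1
        self.frames[frame] = bytes(contents)
        self.refcount[frame] = 1
        self.mappings[(guest, vpage)] = frame

    def read(self, guest, vpage):
        return self.frames[self.mappings[(guest, vpage)]]

    def write(self, guest, vpage, contents):
        """Write a page; a shared frame is copied first (copy-on-write)."""
        frame = self.mappings[(guest, vpage)]
        if self.refcount[frame] > 1:
            self.refcount[frame] -= 1
            self.map_page(guest, vpage, contents)
        else:
            self.frames[frame] = bytes(contents)

    def scan(self):
        """One pass over all mappings; returns the number of frames freed."""
        seen = {}  # SHA1 digest -> frames already scanned with that digest
        freed = 0
        for key in sorted(self.mappings):
            frame = self.mappings[key]
            data = self.frames[frame]
            candidates = seen.setdefault(hashlib.sha1(data).digest(), [])
            for other in candidates:
                if other == frame:
                    break  # this mapping already shares a scanned frame
                if self.frames[other] == data:  # the memcmp() step
                    self.mappings[key] = other
                    self.refcount[other] += 1
                    self.refcount[frame] -= 1
                    if self.refcount[frame] == 0:
                        del self.frames[frame], self.refcount[frame]
                        freed += 1
                    break
            else:
                candidates.append(frame)
        return freed


def build_guests(count=3):
    """Hypothetical guests: two zeroed pages, one common page, one unique page."""
    memory = ToyMemory()
    for guest in range(count):
        memory.map_page(guest, 0, bytes(PAGE_SIZE))
        memory.map_page(guest, 1, bytes(PAGE_SIZE))
        memory.map_page(guest, 2, b"installer".ljust(PAGE_SIZE, b"\0"))
        memory.map_page(guest, 3, (b"guest %d" % guest).ljust(PAGE_SIZE, b"\0"))
    return memory


if __name__ == "__main__":
    memory = build_guests()
    print("frames before scan:", len(memory.frames))
    print("frames freed:", memory.scan())
    print("frames after scan:", len(memory.frames))
```

In this model all zero-filled pages across the guests collapse into a single frame, and a later write to one of them gives the writer a private copy without disturbing the others.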
The kernel thread will scan up to a maximum number of pages before going to
sleep for a while. Both the number of pages to scan and the sleep period
are passed in as parameters to the ioctl() call which starts
scanning. A user-space control process can also pause scanning via another
ioctl() call.
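The batching behavior can be sketched in a few lines of Python. This is a user-space analogy only: the BatchScanner class and its method names are invented for the example, it makes a single pass where a kernel thread would keep running, and the real parameters are passed through ioctl() calls whose exact form is not shown in this article.

```python
import threading
import time


class BatchScanner:
    """Visit pages in batches, sleeping between batches; can be paused."""

    def __init__(self, pages_to_scan, sleep_seconds, sleep=time.sleep):
        self.pages_to_scan = pages_to_scan
        self.sleep_seconds = sleep_seconds
        self._sleep = sleep  # injectable so tests need not really sleep
        self._running = threading.Event()
        self._running.set()

    def pause(self):
        """Stop visiting pages until resume() is called."""
        self._running.clear()

    def resume(self):
        self._running.set()

    def run(self, pages, visit):
        """Call visit(page) for each page, napping after every full batch."""
        in_batch = 0
        for page in pages:
            self._running.wait()  # blocks here while the scanner is paused
            visit(page)
            in_batch += 1
            if in_batch == self.pages_to_scan:
                self._sleep(self.sleep_seconds)
                in_batch = 0


if __name__ == "__main__":
    scanner = BatchScanner(pages_to_scan=3, sleep_seconds=0.1)
    scanner.run(range(10), lambda page: print("scanning page", page))
```

Tuning the batch size against the sleep period trades scanning throughput for CPU time left to the rest of the system.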
The initial response to the patch from
Andrew Morton was not entirely enthusiastic:
The whole approach seems wrong to me. The kernel lost track of
these pages and then we run around post-facto trying to fix that up
again. Please explain (for the changelog) why the kernel cannot
get this right via the usual sharing, refcounting and COWing
approaches.
The answer from Avi Kivity was reasonably
clear:
For kvm, the kernel never knew those pages were shared. They are
loaded from independent (possibly compressed and encrypted) disk
images. These images are different; but some pages happen to be
the same because they came from the same installation media.
Izik Eidus adds that, with this patch, a
host running a bunch of Windows guests is able to overcommit its memory
300% without terribly ill effects. This technique, it seems, is especially
effective with Windows guests: Windows apparently zeroes all freed memory,
so each guest's list of free pages can be coalesced down to a single,
shared page full of zeroes.
What has not been done (or, at least, not posted) is any sort of
benchmarking of the impact KSM has on a running system. The scanning,
hashing, and comparing of pages will require some CPU time, and it is
likely to have noticeable cache effects as well. If you are trying to run
dozens of Windows guests, cache effects may well be relatively low on your
list of problems. But that cost may be sufficient to prevent the more
general use of KSM, even though systems which are not using virtualization
at all may still have a lot of pages with identical contents.
By Jonathan Corbet
November 11, 2008
Over the last year or two, the kernel development process has been changed
in a deliberate attempt to make the addition of new drivers easier. It has
become clear that out-of-tree drivers often do not get any better until
they are merged; meanwhile, users want those drivers and distributors are
shipping them. So it would seem that everybody's interests are served by
getting those drivers into the mainline tree. Experience with drivers
merged under this policy has generally been positive; once those drivers
head for the mainline, they get more attention and tend to improve
quickly.
Given that, one might well wonder why Markus Rechberger's recently
submitted "empia" driver series is encountering so much resistance. This
driver works with a number of video acquisition devices based on Empia
chips; many of those are not supported by the kernel now. As an Empia
Technology employee, Markus has access to the relevant data sheets and is,
thus, well placed to write a fully-functional driver. There are users who
will attest that the drivers work, and that Markus provides good support
for them. But, as things stand now, it would appear that this driver is
not headed for the mainline.
What we have here is a classic story of an impedance mismatch between a
developer and the development community. In the process, this long story
has helped to give the Video4Linux development community a bit of a
reputation as a dysfunctional family - a perception which
those developers are only now beginning to overcome. The sad truth would seem
to be that, while working with the community is something that a couple
thousand developers do with little trouble every year, there will always be
a few who have difficulties.
A quick review of some of the history is in order here.
Markus was one of the authors of the original em28xx driver, first merged
for the 2.6.15 kernel. His efforts to enhance that driver quickly ran into
trouble, though, when he tried to make substantial changes to the low-level
tuner interface - changes which affected a number of other drivers. These
changes were not popular in the Video4Linux community, and there were fears
that they could break unrelated drivers. So this code was not merged.
In response to this rejection, Markus claimed
ownership of the em28xx driver and asked that it be removed from the
mainline kernel. He then continued development of the code, hosting it on
his own server.
There was even a period where the code was relicensed to the MPL, apparently as
part of an attempt to prevent it from being
taken into the mainline.
Eventually, Markus came back with a new approach which moved much
of the tuner code into user space. That solution, too, failed to pass
review; nobody else could really see much advantage in moving that much
driver code out of the kernel. The fact that Markus clearly intended to
have some of that code appear in the form of binary-only blobs did not help
his case. So the user-space approach, like its predecessor, was not
merged.
While Markus was working on his own version of the code, others were
putting patches into the mainline em28xx driver. At times, Markus tried to
block those changes. The tone of the discussion is, perhaps, best seen
from this note sent to Video4Linux
maintainer Mauro Carvalho Chehab:
Best would be to replace you as a maintainer since you don't have
any respect of others work either. Companies should be aware that
if they try to submit any code to you they will loose the authority
over _their_ work.
Of course, losing "authority" over code is inherent in releasing that code
under a license like the GPL. This attempt to exercise control over
freely-licensed code was slapped down by
Andrew Morton and others, but it left unpleasant memories behind.
Now Markus is back with a driver that, to all appearances, duplicates the
functionality of a driver which is already in the mainline kernel. It is
not hard to see this submission as an attempt to retake control of that
driver and, perhaps, restart the discussions from past years. So it is not
entirely surprising that this driver has not been received with a great
deal of enthusiasm. In short, Markus has been told to go away until he is
prepared to submit his work in the form of a series of small patches to the
in-tree em28xx driver.
The advantages of improving the current driver, rather than duplicating
some of its functionality
in a new code base, are clear. It would avoid the confusion which can
come from having two drivers for the same hardware in the tree, and it
would minimize the risk of losing important fixes which have been applied
to the in-tree code. This is also the way that kernel developers are
normally expected to do their work.
On the other hand, video developer Hans Verkuil reviewed the new driver and concluded:
In my opinion it's pretty much hopeless trying to convert the
current em28xx driver into what you have. It's a huge amount of
work that no one wants to do and (in this case) with very little
benefit.
This review notwithstanding, Mauro has indicated that he is not interested in
accepting this patch.
But rejecting Markus's new driver out of hand might just be a mistake. There
seems to be little doubt that it has developed well beyond the in-tree
driver; it supports a wider range of devices. Failure to merge it risks
losing the work that has been done, and, perhaps, losing the future work of
a developer who, for all his faults, is clearly trying to provide a better
experience for Video4Linux users.
Having multiple drivers for the same hardware in the kernel is not an ideal
situation, but it is also not without precedent.
The IDE and parallel ATA subsystems provide
redundant support for a wide range of hardware. The e1000 and e1000e
drivers had overlapping coverage for some time. In such cases, the
long-term goal is usually to work toward the removal of one of the
drivers.
So one could make the case for merging the new driver and, eventually,
removing the older one. In the process, the new driver could receive some
much-needed attention from other developers. It has coding style and
copyright attribution problems; a quick review has also left your editor
wondering about locking issues. But such problems are common to drivers
which have spent a lot of time out of tree; they are simply something to
fix. Meanwhile, this driver contains the result of years of work and
access to the relevant data sheets; freezing it out may not be in the best
interests of kernel developers or users.
Page editor: Jonathan Corbet