Brief items
The current development kernel is 3.2-rc6,
released on December 16. Linus was a bit
grumpy about late merge requests, but sees the series calming down soon.
"We're at -rc6 now, and while I can see myself doing an -rc7, I
probably won't do an -rc8 unless something bad pops up. There doesn't seem
to be any real reason to drag out this release any more, and we'll probably
have the real 3.2 around new years."
Stable updates: the 2.6.32.51, 3.0.14, and 3.1.6 stable kernels were released on
December 21. Each contains
another long list of important fixes; upgrading is recommended.
Hmm. This patch looks obviously correct. But it looks *so*
obviously correct that it just makes me suspicious.
-- Linus Torvalds
Nevertheless, being too afraid to stray from the beaten path
implies being too afraid to work on RCU. But there are times when
the RCU implementation needs a more sane approach. During those
times, I must find some other outlet for my insanity: To do
otherwise is to break RCU. Fortunately, this time around, an
appropriate outlet was readily available in the guise of Ubuntu's
new Unity window manager.
-- Paul McKenney
A new library, libkmod, and a set of tools (kmod-*) for handling kernel
modules have been announced. The idea is to give early boot tools,
installers, udev, and others an easy way to query and control kernel
modules via a library, rather than by running modprobe. "In a recent Linux
Desktop (and also several embedded systems) when computer is booting up,
udev is responsible for checking available hardware, creating device nodes
under /dev (or at least configuring their permissions) and loading kernel
modules for the available hardware. In a kernel from a distribution it's
pretty common to put most of the things as modules. Udev reads the /sys
filesystem to check the available hardware and tries to load the necessary
modules. This translates in hundreds of calls to the modprobe binary, and
in several of them just to know the module is already loaded, or it's
in-kernel. With libkmod it's possible for udev with a few lines of code to
do all the job, benefiting from the configurations and indexes already
opened and parsed." The project also provides libkmod-based work-alikes for
insmod, lsmod, and rmmod, plus an incomplete version of modprobe, with
plans to complete the set. (Thanks to Luis Felipe Strano Moraes.)
By Jonathan Corbet
December 21, 2011
One of the ongoing echoes from the compromise of kernel.org is an increased
interest in verifying the integrity of pull requests sent to Linus. One
way of doing that is for the developer to add a cryptographic signature to
the email containing the pull request. If the top commit ID is included in
the message, the pull request (and the code it covers) can be
authenticated, but the digital signature itself is not stored in the
mainline repository, making it hard to re-verify requests at some future
time.
An alternative is to use git to create a signed tag, which stores the
signature in the repository itself. In the future, that may become the
accepted way to get code into the mainline. Linus has described some pending changes to git that
make the capture and storage of that information simple. So simple, in
fact, that there is no longer any need to worry about branches or unique
tag names:
Everybody: you can now create a signed tag, and just point me at
it. You don't even have to have a separate branch for me to pull
any more, just the signed tag is fine.
So it would actually be nicer if you used temporary tag names the
way you use temporary branch names when you ask me to pull. The tag
*content* will be saved from now on (unless I screw up while
traveling or something and pull with a machine that has an older
git version), so there's very little advantage in then saving the
tags separately by having ugly tag-names with long lifetimes.
All of this evidently works now, with existing stable git releases; only
the process of merging such a tag requires the newer code. So, soon,
signed tags may be the standard way to identify changes to be pulled.
The 2012 Linux Storage, Filesystem, and Memory Management Summit will be
held on April 1 and 2 in San Francisco, California. The call for
proposals for discussions has gone out, with a deadline of
February 5.
Kernel development news
By Jonathan Corbet
December 21, 2011
The 3.2 kernel development cycle always had the potential to be a little
different. The prolonged kernel.org outage had left a number of subsystem
trees scrambling for new homes; that led to a delayed opening of the merge
window. The actual merging of changes happened mostly during the Kernel
Summit in Prague. And, even before the normal process got disrupted, this
looked like a more than usually active cycle. Despite these challenges,
the 3.2 kernel process seems to have worked pretty much as it usually does
once it got started.
As of this writing (just after the release of 3.2-rc6), some 11,655
non-merge changesets have been pulled into the mainline kernel; these
changesets were contributed by 1,289 developers. At that count, 3.2 is the
fourth largest development cycle ever. Chances are good that it will
surpass 2.6.29 (11,678 changes) to move up to the number-three position;
getting past 2.6.30 (11,989) seems harder - if not impossible - at this
point, while passing 2.6.25 (12,243) to become the busiest cycle ever seems
quite unlikely. If we want to set a new record for changes merged, we're
going to have to try harder.
A lot of code was removed in this cycle, so the total growth of the kernel
was 176,000 lines - a relatively modest number.
The most active developers this time around were:
Most active 3.2 developers, by changesets:

| Developer | Changesets | Percent |
| --- | --- | --- |
| Larry Finger | 302 | 2.6% |
| Paul Gortmaker | 234 | 2.0% |
| Mark Brown | 226 | 1.9% |
| Axel Lin | 220 | 1.9% |
| K. Y. Srinivasan | 165 | 1.4% |
| Jonathan Cameron | 159 | 1.4% |
| Roland Vossen | 157 | 1.3% |
| Ben Skeggs | 121 | 1.0% |
| Dmitry Eremin-Solenikov | 117 | 1.0% |
| Christoph Hellwig | 113 | 1.0% |
| Nicolas Pitre | 109 | 0.9% |
| Al Viro | 104 | 0.9% |
| Dan Carpenter | 101 | 0.9% |
| Arend van Spriel | 100 | 0.9% |
| Mark Einon | 99 | 0.8% |
| Guennadi Liakhovetski | 98 | 0.8% |
| Laurent Pinchart | 95 | 0.8% |
| Takashi Iwai | 92 | 0.8% |
| Johannes Berg | 91 | 0.8% |
| J. Bruce Fields | 88 | 0.8% |

Most active 3.2 developers, by changed lines:

| Developer | Lines changed | Percent |
| --- | --- | --- |
| Arend van Spriel | 105436 | 9.2% |
| Kalle Valo | 100542 | 8.8% |
| Larry Finger | 84036 | 7.3% |
| Roland Vossen | 34944 | 3.1% |
| Edwin Rong | 21876 | 1.9% |
| Mark Brown | 13771 | 1.2% |
| Mark Einon | 13597 | 1.2% |
| Richard Kuo | 12223 | 1.1% |
| Rasesh Mody | 11792 | 1.0% |
| Joe Thornber | 10000 | 0.9% |
| Jonathan Cameron | 9776 | 0.9% |
| Kukjin Kim | 8920 | 0.8% |
| Franky (Zhenhui) Lin | 8383 | 0.7% |
| Linus Walleij | 7317 | 0.6% |
| Emmanuel Grumbach | 6838 | 0.6% |
| Felipe Balbi | 6783 | 0.6% |
| David Kilroy | 6356 | 0.6% |
| Takashi Iwai | 6188 | 0.5% |
| Shawn Guo | 6021 | 0.5% |
| Jeff Kirsher | 6015 | 0.5% |
Larry Finger put a vast amount of work into cleaning up the rtl8192e driver
in the staging tree, making it quite a bit smaller in the process. Paul
Gortmaker split the EXPORT_SYMBOL* macros into
<linux/export.h>; after that, many files no longer needed to
include <linux/module.h>. The real advantage of that kind
of work, beyond minimizing the interactions between various parts of the
kernel, is that it makes the kernel compilation process faster. Mark
Brown, as usual, wrote or improved vast numbers of audio drivers. Axel Lin
did a lot of cleanup work, mostly in the audio driver subsystem, while
K. Y. Srinivasan continued the seemingly unending task of getting
Microsoft's "hv" drivers ready to move into the mainline.
Arend van Spriel topped the list of "lines changed" by moving the brcm80211
driver from staging into the mainline tree. One could argue that this
change should be accounted as a rename (which doesn't change any lines),
but it does not show up that way in
the source history: one patch added the drivers to mainline, while a
separate patch removed them from staging. Kalle Valo removed the ath6kl driver from
staging, since support for this hardware had been added to the mainline
"ath" driver; as a result, he topped the list of developers who removed the
most code from the kernel. Larry Finger's work has already been
mentioned. Roland Vossen worked hard on the brcm80211 cleanup, and Edwin
Rong added a driver for the Realtek RTS5139 card reader to the staging
tree.
The top five entries in the "lines changed" column are all thus related to
the staging tree. Some have argued in the past that staging should be
excluded from these statistics. There is a valid point behind those
arguments, but it should also be noted that much of the activity this time
was around movement of code from staging into the mainline. That suggests
that staging is working the way it was intended to, and that work done
there benefits the mainline in the end.
191 employers were identified as having supported work on the 3.2 kernel.
Among those, the most active were:
Most active 3.2 employers, by changesets:

| Employer | Changesets | Percent |
| --- | --- | --- |
| (None) | 1722 | 14.8% |
| Red Hat | 988 | 8.5% |
| (Unknown) | 863 | 7.4% |
| Intel | 844 | 7.2% |
| Broadcom | 493 | 4.2% |
| Texas Instruments | 482 | 4.1% |
| IBM | 412 | 3.5% |
| Novell | 347 | 3.0% |
| Wind River | 281 | 2.4% |
| Qualcomm | 251 | 2.2% |
| Wolfson Micro | 248 | 2.1% |
| Samsung | 232 | 2.0% |
| MiTAC | 220 | 1.9% |
| (Consultant) | 208 | 1.8% |
| Nokia | 202 | 1.7% |
| Linaro | 202 | 1.7% |
| Oracle | 189 | 1.6% |
| Freescale | 182 | 1.6% |
| Google | 182 | 1.6% |
| Microsoft | 177 | 1.5% |

Most active 3.2 employers, by lines changed:

| Employer | Lines changed | Percent |
| --- | --- | --- |
| Broadcom | 256549 | 22.4% |
| (None) | 202387 | 17.7% |
| Qualcomm | 133277 | 11.6% |
| Red Hat | 48673 | 4.2% |
| (Unknown) | 43254 | 3.8% |
| Intel | 43094 | 3.8% |
| Texas Instruments | 31529 | 2.8% |
| Samsung | 30233 | 2.6% |
| IBM | 22279 | 1.9% |
| Realsil Micro | 22065 | 1.9% |
| Brocade | 21734 | 1.9% |
| Freescale | 16657 | 1.5% |
| Wolfson Micro | 16217 | 1.4% |
| ST Ericsson | 14334 | 1.3% |
| Novell | 14161 | 1.2% |
| Code Aurora Forum | 13706 | 1.2% |
| Univ. of Cambridge | 12350 | 1.1% |
| Linaro | 10708 | 0.9% |
| (Consultant) | 9263 | 0.8% |
| Marvell | 8640 | 0.8% |
Red Hat remains the top corporate submitter of patches to the kernel, but
its lead looks less commanding than it once was. Meanwhile, companies like
Texas Instruments and Samsung continue to increase their contributions to
the kernel - embedded systems vendors are now a huge part of the
development community. There also seems to be an increase in the amount of
code coming from industry consortia like Linaro - again, mostly focused in
the embedded area. But, with over 190 companies participating, we clearly
still have interest from beyond just the embedded realm.
As of this writing, the 3.2 kernel looks likely to be released right around
the end of the year, after one more -rc release. If that schedule holds,
this cycle will have required fewer than 70 days, significantly shorter than
the average (which is about 80 days) despite the large volume of changes.
The process, in other words, appears to be working fairly well despite the
kernel.org difficulties and the delayed start. Sooner or later, we are
bound to run into a problem that will throw a significant wrench into the
works - life is just like that - but that certainly hasn't happened this
time around.
By Jake Edge
December 21, 2011
One of the big problem areas that has been identified in the ARM kernel
trees is the diversity of implementations for various things that could be
shared—either within the ARM tree or more widely with the rest of the
kernel. That problem has led to a large amount of duplicated code in the
ARM tree, both via cut-and-paste and code that is conceptually similar but
uses different data structures and APIs. The latter makes the creation of
a single kernel image that can boot on multiple ARM platforms impossible, so
there are efforts to consolidate these implementations. The common clock
framework is one such effort.
In a typical ARM system-on-chip (SoC), there can be dozens of different
clocks for use by various I/O and other devices in the SoC. Typically
those clocks are hooked together into elaborate tree-like structures. In
those trees,
child clocks can sometimes only change their frequency if the parent
(and any other children) are correspondingly changed; disabling certain
clocks will affect other clocks in the system and so on. Each ARM
platform/SoC has its own way of encapsulating that information and
presenting it to other parts of the system (like power and thermal
management controllers), which makes it difficult to create
platform-independent solutions.
The first problem that a common clock framework faces is the sheer number
of different struct clk definitions scattered throughout the ARM
tree. There are more than two dozen definitions in arch/arm
currently, but the proposal for a common
framework not surprisingly reduces that number to one. Implementations can
wrap the struct clk in another structure that holds
hardware-specific data, but the common structure looks like:
    struct clk {
        const char              *name;
        const struct clk_hw_ops *ops;
        struct clk              *parent;
        unsigned long           rate;
        unsigned long           flags;
        unsigned int            enable_count;
        unsigned int            prepare_count;
        struct hlist_head       children;
        struct hlist_node       child_node;
    };
The parent and children/child_node fields allow the clocks to be arranged
into trees, while the rate field tracks the current clock frequency (in
Hz). The flags field is used to describe the clock type (e.g. whether a
rate change needs to be done on the parent clock, or that the clock must be
disabled before changing the rate). The two *_count fields are for
tracking calls to the enable and prepare operations, while the bulk of the
"work" is done within the struct clk_hw_ops field (ops).
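The tree arrangement described above can be sketched in user space. This is only an illustration, not code from the patch set: struct clk_node and its helpers are hypothetical names, and a plain sibling list stands in for the kernel's hlist_head/hlist_node so that the sketch builds anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the proposed struct clk. */
struct clk_node {
    const char      *name;
    struct clk_node *parent;
    unsigned long    rate;          /* cached rate in Hz */
    struct clk_node *children;      /* first child (replaces hlist_head) */
    struct clk_node *next_sibling;  /* replaces hlist_node */
};

/* Attach a clock that has no parent yet underneath the given parent. */
void clk_node_add_child(struct clk_node *parent, struct clk_node *child)
{
    assert(child->parent == NULL);
    child->parent = parent;
    child->next_sibling = parent->children;
    parent->children = child;
}

/* Count the clocks in the subtree rooted at clk, including clk itself. */
unsigned int clk_node_subtree_size(const struct clk_node *clk)
{
    unsigned int n = 1;
    const struct clk_node *c;

    for (c = clk->children; c; c = c->next_sibling)
        n += clk_node_subtree_size(c);
    return n;
}

/* Follow the parent pointers up to the root of the tree. */
const struct clk_node *clk_node_root(const struct clk_node *clk)
{
    while (clk->parent)
        clk = clk->parent;
    return clk;
}
```

With an oscillator feeding a PLL that in turn feeds two peripheral clocks, clk_node_subtree_size() on the oscillator would report four clocks, and clk_node_root() on either peripheral would lead back to the oscillator.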
Each of the entries in the clk_hw_ops structure corresponds to a
function in the driver-facing API for the clock framework. That API does
some sanity checking before calling the corresponding operation from
clk_hw_ops:
    struct clk_hw_ops {
        int             (*prepare)(struct clk *clk);
        void            (*unprepare)(struct clk *clk);
        int             (*enable)(struct clk *clk);
        void            (*disable)(struct clk *clk);
        unsigned long   (*recalc_rate)(struct clk *clk);
        long            (*round_rate)(struct clk *clk, unsigned long,
                                      unsigned long *);
        int             (*set_parent)(struct clk *clk, struct clk *);
        struct clk *    (*get_parent)(struct clk *clk);
        int             (*set_rate)(struct clk *clk, unsigned long);
    };
clk_prepare() is used to initialize the clock to a state where it could be
enabled, and that call must be made before clk_enable(), which actually
starts the clock running. clk_disable() and clk_unprepare() do the reverse
and should be called in that order. The difference is that clk_prepare()
can sleep, while clk_enable() must not, so having two separate calls allows
the clock initialization to be split into atomic and non-atomic pieces.
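The ordering rules and the two counters can be modeled with a small user-space sketch. Everything here is an assumption made for illustration: struct clk_gate, the hw_* fields, and the choice of -EINVAL for "enable before prepare" are not taken from the patch set, and nothing in this model actually sleeps.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a gateable clock; not the kernel's struct clk. */
struct clk_gate {
    unsigned int prepare_count;
    unsigned int enable_count;
    int hw_prepared;    /* models slow setup that may sleep */
    int hw_running;     /* models the fast, atomic gate bit */
};

/* First caller does the slow setup; later callers only bump the count. */
int clk_gate_prepare(struct clk_gate *clk)
{
    if (clk->prepare_count++ == 0)
        clk->hw_prepared = 1;
    return 0;
}

/* Enabling an unprepared clock is refused (error value chosen for the sketch). */
int clk_gate_enable(struct clk_gate *clk)
{
    if (clk->prepare_count == 0)
        return -EINVAL;
    if (clk->enable_count++ == 0)
        clk->hw_running = 1;
    return 0;
}

/* The clock only stops when the last user disables it. */
void clk_gate_disable(struct clk_gate *clk)
{
    assert(clk->enable_count > 0);
    if (--clk->enable_count == 0)
        clk->hw_running = 0;
}

/* The last unprepare must come after the last disable. */
void clk_gate_unprepare(struct clk_gate *clk)
{
    assert(clk->prepare_count > 0);
    if (--clk->prepare_count == 0) {
        assert(clk->enable_count == 0);
        clk->hw_prepared = 0;
    }
}
```

Counting the calls, rather than keeping a single on/off flag, is what lets several drivers share one clock: the hardware is only touched on the first enable and the last disable.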
clk_get_parent() and clk_set_parent() do what the names
imply, simply returning or changing the parent
field, though setting the parent only succeeds if the clock is not already
in use (otherwise -EBUSY is returned). clk_recalc_rate() queries
the hardware, rather than the
cached rate field, for the current frequency of the
clock. clk_round_rate() rounds a frequency in Hz to a rate that
the clock can actually use, and can also be used to determine the correct
frequency for the parent clock when changing rates. All of those are more or less helper functions
for clk_set_rate().
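As an illustration of what a round_rate operation might compute, the sketch below handles one common case: a clock derived from its parent by an integer divider. The function name, the parameters, and the policy of rounding down to the nearest achievable rate are all assumptions for this example; the article does not say how any particular driver rounds.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: pick the integer divider (1..max_div) that yields
 * the highest rate not above the request, and return the resulting rate.
 * If even max_div cannot go that low, the lowest achievable rate is
 * returned instead.
 */
long div_round_rate(unsigned long parent_rate, unsigned long rate,
                    unsigned int max_div)
{
    unsigned long div;

    assert(max_div >= 1);
    if (rate == 0)
        return -EINVAL;

    /* Round the divider up so that the resulting rate rounds down. */
    div = (parent_rate + rate - 1) / rate;
    if (div == 0)
        div = 1;
    if (div > max_div)
        div = max_div;

    return (long)(parent_rate / div);
}
```

For a 48MHz parent, a request for 10MHz would select a divider of 5 and report 9.6MHz as the rate the clock can actually use.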
clk_set_rate() changes the frequency of a clock, but it must take
into account some other factors. If the CLK_PARENT_SET_RATE flag
value is set for the clock, clk_set_rate() needs to propagate the
change to the parent clock (which may also have that flag set,
necessitating a recursive traversal of the tree, attempting to set the rate
at each level).
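The recursive propagation can be sketched as follows. The flag name comes from the text above, but its numeric value, struct clk_rate_node, and the fixed-divider model (each child runs at its parent's rate divided by div) are assumptions made to keep the example small.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CLK_PARENT_SET_RATE 0x1UL   /* name from the article; value made up */

/* Hypothetical clock: a fixed divider hanging off its parent. */
struct clk_rate_node {
    const char           *name;
    struct clk_rate_node *parent;
    unsigned long         rate;   /* cached rate in Hz */
    unsigned long         flags;
    unsigned int          div;    /* rate = parent rate / div */
};

int clk_rate_node_set_rate(struct clk_rate_node *clk, unsigned long rate)
{
    int ret;

    assert(clk->div >= 1);

    if (clk->parent && (clk->flags & CLK_PARENT_SET_RATE)) {
        /* This clock cannot change by itself: push the request upward. */
        ret = clk_rate_node_set_rate(clk->parent, rate * clk->div);
        if (ret)
            return ret;
        clk->rate = clk->parent->rate / clk->div;
        return 0;
    }

    if (clk->parent)    /* fixed divider without the flag: rate is pinned */
        return rate == clk->rate ? 0 : -EINVAL;

    clk->rate = rate;   /* root clock: treated as freely programmable here */
    return 0;
}
```

This sketch deliberately leaves out what makes the real problem hard: sibling clocks that share the changed parent are not recalculated, and no rounding is applied on the way up.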
Drivers can also register their interest in being notified of rate changes
with the clk_notifier_register() function. Three different types
of notification can be requested: before the clock's rate
changes, after it has been changed, or if the change gets aborted after the
pre-change notifications have been called (i.e. PRE_RATE_CHANGE,
POST_RATE_CHANGE, and ABORT_RATE_CHANGE). In each case,
both the old and new values for the rate get passed as part of the
notification callback. The patch to add notifications
creates another operation in clk_hw_ops called speculate_rate(),
which notes potential rate changes and sends any needed pre-change
notifications as it walks the sub-tree.
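The three notification types can be illustrated with a user-space model. Only the event names are taken from the text; the callback signature, the fixed-size callback table, the max_rate limit used to force an abort, and the rate_log example listener are hypothetical and do not reflect the real clk_notifier_register() interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum clk_event { PRE_RATE_CHANGE, POST_RATE_CHANGE, ABORT_RATE_CHANGE };

/* Hypothetical callback type: both old and new rates are passed along. */
typedef void (*clk_notify_fn)(enum clk_event ev, unsigned long old_rate,
                              unsigned long new_rate, void *data);

#define CLK_MAX_NOTIFIERS 8

struct clk_notified {
    unsigned long rate;
    unsigned long max_rate;     /* made-up limit, used to trigger an abort */
    clk_notify_fn fn[CLK_MAX_NOTIFIERS];
    void         *data[CLK_MAX_NOTIFIERS];
    unsigned int  nr;
};

int clk_notified_register(struct clk_notified *clk, clk_notify_fn fn,
                          void *data)
{
    assert(fn != NULL);
    if (clk->nr == CLK_MAX_NOTIFIERS)
        return -ENOSPC;
    clk->fn[clk->nr] = fn;
    clk->data[clk->nr] = data;
    clk->nr++;
    return 0;
}

static void clk_notified_send(struct clk_notified *clk, enum clk_event ev,
                              unsigned long old_rate, unsigned long new_rate)
{
    unsigned int i;

    for (i = 0; i < clk->nr; i++)
        clk->fn[i](ev, old_rate, new_rate, clk->data[i]);
}

/* Pre-change first, then either post-change or abort. */
int clk_notified_set_rate(struct clk_notified *clk, unsigned long new_rate)
{
    unsigned long old_rate = clk->rate;

    clk_notified_send(clk, PRE_RATE_CHANGE, old_rate, new_rate);
    if (new_rate > clk->max_rate) {
        clk_notified_send(clk, ABORT_RATE_CHANGE, old_rate, new_rate);
        return -EINVAL;
    }
    clk->rate = new_rate;
    clk_notified_send(clk, POST_RATE_CHANGE, old_rate, new_rate);
    return 0;
}

/* Example listener that just records what it was told. */
struct rate_log {
    unsigned int  pre, post, aborted;
    unsigned long last_old, last_new;
};

void rate_log_cb(enum clk_event ev, unsigned long old_rate,
                 unsigned long new_rate, void *data)
{
    struct rate_log *log = data;

    switch (ev) {
    case PRE_RATE_CHANGE:   log->pre++;     break;
    case POST_RATE_CHANGE:  log->post++;    break;
    case ABORT_RATE_CHANGE: log->aborted++; break;
    }
    log->last_old = old_rate;
    log->last_new = new_rate;
}
```

A driver that must quiesce its hardware around a frequency change would do so on the pre-change event, then resume on either the post-change or the abort event.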
The patch set also exports the clock hierarchy into debugfs. Each
top-level clock gets a directory in ../debug/clk that contains
read-only files to report the clock's rate, flags, prepare and enable
counts, and the number of notifiers registered. Subdirectories are created
for each child clock containing the same information.
The common clock framework has been around for some time in various forms.
The current incarnation is being shepherded by Mike Turquette, but he notes
that it is based on work originally done by Jeremy Kerr and Ben
Herrenschmidt. Beyond that: "Many others contributed to those
patches and promptly had their work stolen by me".
Turquette has also posted a patch set with an example that replaces the
OMAP4 clocks using the framework. The comments on the most recent
iteration have been fairly light, but still substantive, so a mainline
version is still some way off. The framework is clearly on the radar of
ARM developers, though, and would clean up a fair amount of code
duplication within that tree, so we should see something in the mainline
before long, hopefully in one of the next few kernel releases.
By Jonathan Corbet
December 20, 2011
The agenda for the 2011 Kernel Summit did not include Android as a topic,
but Android came up anyway.
In a conclusion that surprised many, the group agreed that the bulk of the
Android kernel code should probably be merged into the mainline. The past couple of
years have made it clear that Android will not be going away; it has, in
particular, done a good job of outlasting the resistance to merging its
code. After the Summit things got quiet again on the Android front, but
that does not mean that nothing has been happening.
Tim Bird recently announced the existence
of the Android mainlining project, an effort intended to help coordinate
the various groups that have been working in this area. The project has
the obligatory wiki and mailing
list. The list is new and has not seen a whole lot of traffic - a
situation which may well change in the near future.
Toward the end of November, the core Android code was returned to the staging
tree, from which it had been removed at the end of 2009. Since the code's
return to staging,
changes have been going in and the code has caught up to its state in the
Android tree. The code has now reached a point where, as summarized
by Greg Kroah-Hartman on December 16:
[T]he next linux-next Linux kernel
release should almost boot an Android userspace, we are only
missing one piece, ashmem, and that should hopefully land in my
staging-next tree next week. The patches are still being tested and
cleaned up by others.
Between the wiki and a look at drivers/staging/android
in linux-next, one can get a fair idea of the state of the various
patches. One notable patch that is not there is wakelocks (or
"suspend blockers"), a feature which has been at the core of the controversy around
the Android code. The wakelock concept will almost certainly return at
some point, but much of the focus seems to be on the easier components at
the moment. As Greg noted, wakelocks are not actually needed to boot an
Android system - they're just necessary to keep that system from draining
the battery too quickly.
The pieces that exist in the linux-next staging directory now are:
- Binder, the
interprocess communication mechanism used within Android. Binder
could conceivably be replaced with a standard IPC mechanism or,
perhaps, with D-bus, but it has a number of unique features (zero-copy
message transmission, thread management, credential passing) that are
hard to replace in a straightforward manner. (See this article for a detailed look at
various Linux IPC mechanisms, binder included).
- Logger is the kernel piece of the Android logging
system. It implements a completely separate path for
Android-specific log
messages, which do not mix with normal kernel messages in any way.
Other than adding a "facility" concept to kernel logging, it's not
clear what this component offers, but it is also relatively
self-contained and should not be too controversial.
- The "low memory killer" implements Android's interesting approach to
application management. In the Android world, applications never
choose to exit. They hang around until memory gets tight, at which
point the kernel starts to kill them off. It's a
small piece of code that works using the "shrinker"
mechanism, a standard way to register functions to be called when the
kernel would like to free up some memory. So, even though it is
memory-management code, it is
relatively unintrusive and will not affect systems where it is not
used.
- "Pmem" is Android's answer to the age-old problem of allocating large,
physically-contiguous buffers after the system has been running for a
while. It works in the usual way: a range of memory is set aside at
boot time. One difference with pmem is that it exports a device to
user space, allowing buffers to be allocated directly by applications
and passed to drivers. That, in turn, leads to things like camera
drivers being written with the assumption that user space can give
them physically-contiguous buffers for video frames, something that
would not be possible in a mainline kernel.
Approaches like CMA seem like a better
solution to this particular problem - if and when CMA is merged into
the mainline. Meanwhile, however, applications
have been written using pmem, so that interface is unlikely to go away
in the near future.
- The "RAM console" saves log data to a special region of memory where
it can be found and recovered after a reboot. It is a debugging tool.
- "Timed GPIO" is a simple mechanism whereby the kernel can schedule a
specific setting for a GPIO line at some point in the future. An
example use would be to ensure that the vibrator gets turned off
regardless of what happens to the application that turned it on.
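To make the low memory killer item above more concrete, here is a user-space sketch of one way such a policy could pick a victim when asked to free memory. The threshold table, the field names, and the rule "highest adjustment value first, largest resident size as tie-breaker" are assumptions for illustration; the article says only that applications are killed off when memory gets tight.

```c
#include <assert.h>
#include <stddef.h>

#define LMK_NO_KILL 16  /* sketch-only sentinel: above every adj used here */

/* "When free memory drops below minfree_pages, tasks with adj >= min_adj
 * become candidates."  Levels are ordered from tightest to loosest. */
struct lmk_level {
    unsigned long minfree_pages;
    int           min_adj;
};

struct lmk_task {
    int           pid;
    int           oom_adj;    /* higher means more expendable */
    unsigned long rss_pages;
};

/* Find the lowest adj that may be killed for the current free memory. */
int lmk_min_adj(const struct lmk_level *levels, size_t n,
                unsigned long free_pages)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (free_pages < levels[i].minfree_pages)
            return levels[i].min_adj;
    return LMK_NO_KILL;
}

/* Choose the most expendable eligible task; NULL if none qualifies. */
const struct lmk_task *lmk_select(const struct lmk_task *tasks,
                                  size_t ntasks, int min_adj)
{
    const struct lmk_task *victim = NULL;
    size_t i;

    assert(tasks != NULL || ntasks == 0);
    for (i = 0; i < ntasks; i++) {
        const struct lmk_task *t = &tasks[i];

        if (t->oom_adj < min_adj)
            continue;
        if (victim && (t->oom_adj < victim->oom_adj ||
                       (t->oom_adj == victim->oom_adj &&
                        t->rss_pages <= victim->rss_pages)))
            continue;
        victim = t;
    }
    return victim;
}
```

In the kernel, logic of this kind would run from the registered shrinker callback; here it is reduced to two pure functions so that the selection rule can be read in isolation.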
The "ashmem" component was not in linux-next as of this writing, but,
as Greg noted, its arrival there is
expected in the near future. Ashmem is a shared memory mechanism that is
able to discard some or all of its contents when memory pressure gets
high. It could conceivably be replaced by the proposed POSIX_FADV_VOLATILE operation, but the latter
does not, yet, seem to be a complete solution for Android's requirements.
There are a number of Android-specific changes that do not appear on that
list, and, thus, are not likely to be merged into the mainline in the near
future. Some of them are so Android-specific that they may never get in;
the "network security" tweaks fall into that category. Others, such as the
alarm timer code, may be superseded by enhancements in the mainline.
Then, of course,
there is a long list of drivers for hardware found on Android devices.
Quite a few of those drivers have found their way into the mainline
already, and others are on their way.
In summary: if all goes well, the 3.3 kernel should see the delta between
Android kernels and the mainline go down considerably. That should make
life easier for developers and for vendors wanting to provide
Android-compatible hardware. Of course, it would be unsurprising if
Android were to grow new subsystems of its own in the future; the Android
developers have made it clear that they are unable and unwilling to wait
for the mainlining process to run its course when they have products to
ship. But, with any luck at all, the worst days of a significant fork that
has caused a fair amount of ill will and difficult discussion should soon
be behind us.
Patches and updates
Miscellaneous
- Lucas De Marchi: kmod 2 (December 21, 2011)
Page editor: Jonathan Corbet