
Kernel development

Brief items

Kernel release status

The current development kernel is 3.2-rc6, released on December 16. Linus was a bit grumpy about late merge requests, but sees the series calming down soon. "We're at -rc6 now, and while I can see myself doing an -rc7, I probably won't do an -rc8 unless something bad pops up. There doesn't seem to be any real reason to drag out this release any more, and we'll probably have the real 3.2 around new years."

Stable updates: the 2.6.32.51, 3.0.14, and 3.1.6 stable kernels were released on December 21. Each contains another long list of important fixes; upgrading is recommended.


Quotes of the week

Hmm. This patch looks obviously correct. But it looks *so* obviously correct that it just makes me suspicious.
-- Linus Torvalds

Nevertheless, being too afraid to stray from the beaten path implies being too afraid to work on RCU. But there are times when the RCU implementation needs a more sane approach. During those times, I must find some other outlet for my insanity: To do otherwise is to break RCU. Fortunately, this time around, an appropriate outlet was readily available in the guise of Ubuntu's new Unity window manager.
-- Paul McKenney


First version of kmod released

A new library, libkmod, and a set of tools (kmod-*) for handling kernel modules have been announced. The idea is to give early boot tools, installers, udev, and others an easy way to query and control kernel modules via a library, rather than by running modprobe. "In a recent Linux Desktop (and also several embedded systems) when computer is booting up, udev is responsible for checking available hardware, creating device nodes under /dev (or at least configuring their permissions) and loading kernel modules for the available hardware. In a kernel from a distribution it's pretty common to put most of the things as modules. Udev reads the /sys filesystem to check the available hardware and tries to load the necessary modules. This translates in hundreds of calls to the modprobe binary, and in several of them just to know the module is already loaded, or it's in-kernel. With libkmod it's possible for udev with a few lines of code to do all the job, benefiting from the configurations and indexes already opened and parsed." The project also provides libkmod-based work-alike programs for insmod, lsmod, and rmmod, along with an incomplete version of modprobe; there are plans to complete the set. (Thanks to Luis Felipe Strano Moraes.)


Pull requests with signed tags

By Jonathan Corbet
December 21, 2011
One of the ongoing echoes from the compromise of kernel.org is an increased interest in verifying the integrity of pull requests sent to Linus. One way of doing that is for the developer to add a cryptographic signature to the email containing the pull request. If the top commit ID is included in the message, the pull request (and the code it covers) can be authenticated, but the digital signature itself is not stored in the mainline repository, making it hard to re-verify requests at some future time.

An alternative is to use git to create a signed tag, which stores the signature in the repository itself. In the future, that may become the accepted way to get code into the mainline. Linus has described some pending changes to git that make the capture and storage of that information simple. So simple, in fact, that there is no longer any need to worry about branches or unique tag names:

Everybody: you can now create a signed tag, and just point me at it. You don't even have to have a separate branch for me to pull any more, just the signed tag is fine.

So it would actually be nicer if you used temporary tag names the way you use temporary branch names when you ask me to pull. The tag *content* will be saved from now on (unless I screw up while traveling or something and pull with a machine that has an older git version), so there's very little advantage in then saving the tags separately by having ugly tag-names with long lifetimes.

All of this evidently works now, with existing stable git releases; only the process of merging such a tag requires the newer code. So, soon, signed tags may be the standard way to identify changes to be pulled.


[CFP] Linux Storage, Filesystem & Memory Management Summit 2012

The 2012 Linux Storage, Filesystem, and Memory Management Summit will be held on April 1 and 2 in San Francisco, California. The call for proposals for discussions has gone out, with a deadline of February 5.


Kernel development news

Some numbers from the 3.2 development cycle

By Jonathan Corbet
December 21, 2011
The 3.2 kernel development cycle always had the potential to be a little different. The prolonged kernel.org outage had left a number of subsystem trees scrambling for new homes; that led to a delayed opening of the merge window. The actual merging of changes happened mostly during the Kernel Summit in Prague. And, even before the normal process got disrupted, this looked like a more than usually active cycle. Despite these challenges, the 3.2 kernel process seems to have worked pretty much as it usually does once it got started.

As of this writing (just after the release of 3.2-rc6), some 11,655 non-merge changesets have been pulled into the mainline kernel; these changesets were contributed by 1,289 developers. At that count, 3.2 is the fourth largest development cycle ever. Chances are good that it will surpass 2.6.29 (11,678 changes) to move up to the number-three position; getting past 2.6.30 (11,989) seems harder - if not impossible - at this point, while passing 2.6.25 (12,243) to become the busiest cycle ever seems quite unlikely. If we want to set a new record for changes merged, we're going to have to try harder.

A lot of code was removed in this cycle, so the total growth of the kernel was 176,000 lines - a relatively modest number.

The most active developers this time around were:

Most active 3.2 developers

By changesets
Developer                 Changesets  Percent
Larry Finger              302         2.6%
Paul Gortmaker            234         2.0%
Mark Brown                226         1.9%
Axel Lin                  220         1.9%
K. Y. Srinivasan          165         1.4%
Jonathan Cameron          159         1.4%
Roland Vossen             157         1.3%
Ben Skeggs                121         1.0%
Dmitry Eremin-Solenikov   117         1.0%
Christoph Hellwig         113         1.0%
Nicolas Pitre             109         0.9%
Al Viro                   104         0.9%
Dan Carpenter             101         0.9%
Arend van Spriel          100         0.9%
Mark Einon                99          0.8%
Guennadi Liakhovetski     98          0.8%
Laurent Pinchart          95          0.8%
Takashi Iwai              92          0.8%
Johannes Berg             91          0.8%
J. Bruce Fields           88          0.8%

By changed lines
Developer                 Lines       Percent
Arend van Spriel          105436      9.2%
Kalle Valo                100542      8.8%
Larry Finger              84036       7.3%
Roland Vossen             34944       3.1%
Edwin Rong                21876       1.9%
Mark Brown                13771       1.2%
Mark Einon                13597       1.2%
Richard Kuo               12223       1.1%
Rasesh Mody               11792       1.0%
Joe Thornber              10000       0.9%
Jonathan Cameron          9776        0.9%
Kukjin Kim                8920        0.8%
Franky (Zhenhui) Lin      8383        0.7%
Linus Walleij             7317        0.6%
Emmanuel Grumbach         6838        0.6%
Felipe Balbi              6783        0.6%
David Kilroy              6356        0.6%
Takashi Iwai              6188        0.5%
Shawn Guo                 6021        0.5%
Jeff Kirsher              6015        0.5%

Larry Finger put a vast amount of work into cleaning up the rtl8192e driver in the staging tree, making it quite a bit smaller in the process. Paul Gortmaker split the EXPORT_SYMBOL* macros into <linux/export.h>; after that, many files no longer needed to include <linux/module.h>. The real advantage of that kind of work, beyond minimizing the interactions between various parts of the kernel, is that it makes the kernel compilation process faster. Mark Brown, as usual, wrote or improved vast numbers of audio drivers. Axel Lin did a lot of cleanup work, mostly in the audio driver subsystem, while K. Y. Srinivasan continued the seemingly unending task of getting Microsoft's "hv" drivers ready to move into the mainline.

Arend van Spriel topped the list of "lines changed" by moving the brcm80211 driver from staging into the mainline tree. One could argue that this change should be accounted as a rename (which doesn't change any lines), but it does not show up that way in the source history: one patch added the drivers to mainline, while a separate patch removed them from staging. Kalle Valo removed the ath6kl driver from staging, since support for this hardware had been added to the mainline "ath" driver; as a result, he topped the list of developers who removed the most code from the kernel. Larry Finger's work has already been mentioned. Roland Vossen worked hard on the brcm80211 cleanup, and Edwin Rong added a driver for the Realtek RTS5139 card reader to the staging tree.

The top five entries in the "lines changed" column are all thus related to the staging tree. Some have argued in the past that staging should be excluded from these statistics. There is a valid point behind those arguments, but it should also be noted that much of the activity this time was around movement of code from staging into the mainline. That suggests that staging is working the way it was intended to, and that work done there benefits the mainline in the end.

191 employers were identified as having supported work on the 3.2 kernel. Among those, the most active were:

Most active 3.2 employers

By changesets
Employer                  Changesets  Percent
(None)                    1722        14.8%
Red Hat                   988         8.5%
(Unknown)                 863         7.4%
Intel                     844         7.2%
Broadcom                  493         4.2%
Texas Instruments         482         4.1%
IBM                       412         3.5%
Novell                    347         3.0%
Wind River                281         2.4%
Qualcomm                  251         2.2%
Wolfson Micro             248         2.1%
Samsung                   232         2.0%
MiTAC                     220         1.9%
(Consultant)              208         1.8%
Nokia                     202         1.7%
Linaro                    202         1.7%
Oracle                    189         1.6%
Freescale                 182         1.6%
Google                    182         1.6%
Microsoft                 177         1.5%

By lines changed
Employer                  Lines       Percent
Broadcom                  256549      22.4%
(None)                    202387      17.7%
Qualcomm                  133277      11.6%
Red Hat                   48673       4.2%
(Unknown)                 43254       3.8%
Intel                     43094       3.8%
Texas Instruments         31529       2.8%
Samsung                   30233       2.6%
IBM                       22279       1.9%
Realsil Micro             22065       1.9%
Brocade                   21734       1.9%
Freescale                 16657       1.5%
Wolfson Micro             16217       1.4%
ST Ericsson               14334       1.3%
Novell                    14161       1.2%
Code Aurora Forum         13706       1.2%
Univ. of Cambridge        12350       1.1%
Linaro                    10708       0.9%
(Consultant)              9263        0.8%
Marvell                   8640        0.8%

Red Hat remains the top corporate submitter of patches to the kernel, but its lead looks less commanding than it once was. Meanwhile, companies like Texas Instruments and Samsung continue to increase their contributions to the kernel - embedded systems vendors are now a huge part of the development community. There also seems to be an increase in the amount of code coming from industry consortia like Linaro - again, mostly focused in the embedded area. But, with over 190 companies participating, we clearly still have interest from beyond just the embedded realm.

As of this writing, the 3.2 kernel looks likely to be released right around the end of the year, after one more -rc release. If that schedule holds, this cycle will have required fewer than 70 days, significantly shorter than the average (which is about 80 days) despite the large volume of changes. The process, in other words, appears to be working fairly well despite the kernel.org difficulties and the delayed start. Sooner or later, we are bound to run into a problem that will throw a significant wrench into the works - life is just like that - but that certainly hasn't happened this time around.


A common clock framework

By Jake Edge
December 21, 2011

One of the big problem areas that has been identified in the ARM kernel trees is the diversity of implementations for various things that could be shared—either within the ARM tree or more widely with the rest of the kernel. That problem has led to a large amount of duplicated code in the ARM tree, both via cut-and-paste and code that is conceptually similar but uses different data structures and APIs. The latter makes the creation of a single kernel image that can boot on multiple ARM platforms impossible, so there are efforts to consolidate these implementations. The common clock framework is one such effort.

In a typical ARM system-on-chip (SoC), there can be dozens of different clocks for use by various I/O and other devices in the SoC. Typically those clocks are hooked together into elaborate tree-like structures. In those trees, child clocks can sometimes only change their frequency if the parent (and any other children) are correspondingly changed; disabling certain clocks will affect other clocks in the system and so on. Each ARM platform/SoC has its own way of encapsulating that information and presenting it to other parts of the system (like power and thermal management controllers), which makes it difficult to create platform-independent solutions.

The first problem that a common clock framework faces is the sheer number of different struct clk definitions scattered throughout the ARM tree. There are more than two dozen definitions in arch/arm currently, but the proposal for a common framework not surprisingly reduces that number to one. Implementations can wrap the struct clk in another structure that holds hardware-specific data, but the common structure looks like:

    struct clk {
	const char                  *name;
	const struct clk_hw_ops     *ops;
	struct clk                  *parent;
	unsigned long               rate;
	unsigned long               flags;
	unsigned int                enable_count;
	unsigned int                prepare_count;
	struct hlist_head           children;
	struct hlist_node           child_node;
    };

The parent and children/child_node fields allow the clocks to be arranged into trees, while the rate field tracks the current clock frequency (in Hz). The flags field is used to describe the clock type (e.g. whether a rate change needs to be done on the parent clock, or that the clock must be disabled before changing the rate). The two *_count fields are for tracking calls to the enable and prepare operations, while the bulk of the "work" is done within the struct clk_hw_ops field (ops).
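To make the tree arrangement concrete, here is a minimal user-space sketch of how parent and child links can tie clocks together. It is not the framework's code: the tree_clk type and its helper functions are hypothetical, and a plain sibling list stands in for the kernel's hlist_head and hlist_node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the article's struct clk. The kernel's
 * hlist fields are replaced by a simple singly linked sibling list. */
struct tree_clk {
	const char      *name;
	struct tree_clk *parent;
	unsigned long    rate;          /* current frequency in Hz */
	struct tree_clk *first_child;   /* plays the role of "children" */
	struct tree_clk *next_sibling;  /* plays the role of "child_node" */
};

/* Hook child into parent's list of children. */
static void tree_clk_add_child(struct tree_clk *parent, struct tree_clk *child)
{
	assert(parent != NULL && child != NULL);
	child->parent = parent;
	child->next_sibling = parent->first_child;
	parent->first_child = child;
}

/* Follow the parent pointers up to the root of the tree. */
static struct tree_clk *tree_clk_root(struct tree_clk *clk)
{
	assert(clk != NULL);
	while (clk->parent)
		clk = clk->parent;
	return clk;
}

/* Count the clocks in the sub-tree rooted at clk, clk included. */
static unsigned int tree_clk_count(const struct tree_clk *clk)
{
	const struct tree_clk *c;
	unsigned int n = 1;

	for (c = clk->first_child; c; c = c->next_sibling)
		n += tree_clk_count(c);
	return n;
}
```

With an oscillator feeding a PLL that in turn feeds a UART clock, tree_clk_root() on the UART clock returns the oscillator, and tree_clk_count() on the oscillator visits all three clocks.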

Each of the entries in the clk_hw_ops structure corresponds to a function in the driver-facing API for the clock framework. That API does some sanity checking before calling the corresponding operation from clk_hw_ops:

    struct clk_hw_ops {
	int             (*prepare)(struct clk *clk);
	void            (*unprepare)(struct clk *clk);
	int             (*enable)(struct clk *clk);
	void            (*disable)(struct clk *clk);
	unsigned long   (*recalc_rate)(struct clk *clk);
	long            (*round_rate)(struct clk *clk, unsigned long,
				      unsigned long *);
	int             (*set_parent)(struct clk *clk, struct clk *);
	struct clk *    (*get_parent)(struct clk *clk);
	int             (*set_rate)(struct clk *clk, unsigned long);
    };

clk_prepare() is used to initialize the clock to a state where it could be enabled, and that call must be made before clk_enable(), which actually starts the clock running. clk_disable() and clk_unprepare() do the reverse and should be called in that order. The difference is that clk_prepare() can sleep, while clk_enable() must not, so having two separate calls allows the clock initialization to be split into atomic and non-atomic pieces.
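The ordering rule can be illustrated with a small user-space model of the two counters. This is only a sketch: the gate_clk type and functions are hypothetical, no hardware is touched, and the error code returned for an unprepared clock is an assumption rather than something the patch set is known to use.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model that keeps only the two counters from struct clk. */
struct gate_clk {
	unsigned int prepare_count;
	unsigned int enable_count;
};

/* Stage one: in the real framework this step is allowed to sleep. */
static int gate_clk_prepare(struct gate_clk *clk)
{
	clk->prepare_count++;
	return 0;
}

/* Stage two: must not sleep, and refuses to run on an unprepared clock.
 * -EINVAL is an assumed error code chosen for this sketch. */
static int gate_clk_enable(struct gate_clk *clk)
{
	if (clk->prepare_count == 0)
		return -EINVAL;
	clk->enable_count++;
	return 0;
}

/* Undo one enable call. */
static void gate_clk_disable(struct gate_clk *clk)
{
	assert(clk->enable_count > 0);
	clk->enable_count--;
}

/* Undo one prepare call; the last unprepare requires a disabled clock. */
static void gate_clk_unprepare(struct gate_clk *clk)
{
	assert(clk->prepare_count > 0);
	assert(clk->prepare_count > 1 || clk->enable_count == 0);
	clk->prepare_count--;
}
```

A driver following the rules would call the four functions in the order prepare, enable, disable, unprepare, leaving both counters at zero again.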

clk_get_parent() and clk_set_parent() do what the names imply, simply returning or changing the parent field, though setting the parent only succeeds if the clock is not already in use (otherwise -EBUSY is returned). clk_recalc_rate() queries the hardware, rather than the cached rate field, for the current frequency of the clock. clk_round_rate() rounds a frequency in Hz to a rate that the clock can actually use, and can also be used to determine the correct frequency for the parent clock when changing rates. All of those are more or less helper functions for clk_set_rate().
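As an example of what rate rounding might involve, consider a clock that can only divide its parent's rate by an integer. The sketch below is an assumption-laden illustration, not the framework's implementation: div_clk and div_clk_round_rate() are hypothetical names, the policy of never exceeding the requested rate is one choice among several, and the parent-rate output parameter of the real round_rate() operation is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical integer divider: output = parent_rate / div,
 * where 1 <= div <= max_div. */
struct div_clk {
	unsigned long parent_rate;  /* Hz */
	unsigned long max_div;
};

/* Return the fastest achievable rate that does not exceed the request.
 * If the request is above the parent rate, the parent rate is returned;
 * if it is below the slowest achievable rate, that slowest rate is. */
static long div_clk_round_rate(const struct div_clk *clk, unsigned long rate)
{
	unsigned long div;

	assert(clk->max_div >= 1);
	if (rate == 0)
		return -EINVAL;

	/* Smallest divider whose output is <= rate: ceil(parent / rate). */
	div = clk->parent_rate / rate;
	if (clk->parent_rate % rate)
		div++;

	if (div < 1)
		div = 1;
	if (div > clk->max_div)
		div = clk->max_div;

	return (long)(clk->parent_rate / div);
}
```

With a 48MHz parent and dividers up to 16, a request for 10MHz rounds down to 9.6MHz (divide by five), while a request for 1MHz is clamped to 3MHz, the slowest rate the divider can produce.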

clk_set_rate() changes the frequency of a clock, but it must take into account some other factors. If the CLK_PARENT_SET_RATE flag value is set for the clock, clk_set_rate() needs to propagate the change to the parent clock (which may also have that flag set, necessitating a recursive traversal of the tree, attempting to set the rate at each level).
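The recursive propagation can be sketched in a few lines of user-space C. Everything here is a simplification made for illustration: the rate_clk type is hypothetical, the flag value is made up, a flagged clock is assumed to pass its parent's rate straight through, and an unflagged clock is assumed to be able to produce any rate on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Stands in for CLK_PARENT_SET_RATE; the value is an assumption. */
#define TOY_PARENT_SET_RATE 0x1UL

struct rate_clk {
	struct rate_clk *parent;
	unsigned long    rate;   /* cached rate in Hz */
	unsigned long    flags;
};

/* Set the rate of clk, walking up the tree for as long as the flag asks
 * for it. Returns the topmost clock that was reprogrammed. */
static struct rate_clk *rate_clk_set_rate(struct rate_clk *clk,
					  unsigned long rate)
{
	struct rate_clk *top = clk;

	assert(clk != NULL);
	if ((clk->flags & TOY_PARENT_SET_RATE) && clk->parent)
		top = rate_clk_set_rate(clk->parent, rate);

	/* Update the cached rate at this level on the way back down. */
	clk->rate = rate;
	return top;
}
```

For a chain of three clocks in which the lower two carry the flag, a request on the leaf ends up reprogramming the root, while a request on an unflagged sibling leaves the parent alone.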

Drivers can also register their interest in being notified of rate changes with the clk_notifier_register() function. Three different types of notification can be requested: before the clock's rate changes, after it has been changed, or if the change gets aborted after the pre-change notifications have been called (i.e. PRE_RATE_CHANGE, POST_RATE_CHANGE, and ABORT_RATE_CHANGE). In each case, both the old and new values for the rate get passed as part of the notification callback. The patch to add notifications creates another operation in clk_hw_ops called speculate_rate(), which notes potential rate changes and sends any needed pre-change notifications as it walks the sub-tree.
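The three notification types can be modeled roughly as follows. This is a user-space sketch, not the interface of clk_notifier_register(): all names are hypothetical, the fixed-size callback table is a shortcut, and letting a callback veto the change by returning nonzero from the pre-change event is an assumption made so that the abort path has something to do.

```c
#include <assert.h>
#include <stddef.h>

enum toy_clk_event {
	TOY_PRE_RATE_CHANGE,
	TOY_POST_RATE_CHANGE,
	TOY_ABORT_RATE_CHANGE,
};

/* Callbacks see both the old and the new rate. Returning nonzero from
 * the PRE event vetoes the change (an assumption of this sketch). */
typedef int (*toy_notifier_fn)(enum toy_clk_event ev, unsigned long old_rate,
			       unsigned long new_rate, void *data);

#define TOY_MAX_NOTIFIERS 4

struct notif_clk {
	unsigned long   rate;
	toy_notifier_fn fn[TOY_MAX_NOTIFIERS];
	void           *data[TOY_MAX_NOTIFIERS];
	size_t          count;
};

static int notif_clk_register(struct notif_clk *clk, toy_notifier_fn fn,
			      void *data)
{
	assert(fn != NULL);
	if (clk->count == TOY_MAX_NOTIFIERS)
		return -1;
	clk->fn[clk->count] = fn;
	clk->data[clk->count] = data;
	clk->count++;
	return 0;
}

static int notif_clk_set_rate(struct notif_clk *clk, unsigned long new_rate)
{
	unsigned long old_rate = clk->rate;
	size_t i, j;

	for (i = 0; i < clk->count; i++) {
		if (clk->fn[i](TOY_PRE_RATE_CHANGE, old_rate, new_rate,
			       clk->data[i])) {
			/* Tell everybody already notified that it is off. */
			for (j = 0; j <= i; j++)
				clk->fn[j](TOY_ABORT_RATE_CHANGE, old_rate,
					   new_rate, clk->data[j]);
			return -1;
		}
	}

	clk->rate = new_rate;
	for (i = 0; i < clk->count; i++)
		clk->fn[i](TOY_POST_RATE_CHANGE, old_rate, new_rate,
			   clk->data[i]);
	return 0;
}

/* Example callback: counts events and vetoes rates above max_rate. */
struct toy_log {
	unsigned long max_rate;
	int pre, post, aborted;
};

static int toy_limit_cb(enum toy_clk_event ev, unsigned long old_rate,
			unsigned long new_rate, void *data)
{
	struct toy_log *log = data;

	(void)old_rate;
	switch (ev) {
	case TOY_PRE_RATE_CHANGE:
		log->pre++;
		return new_rate > log->max_rate;
	case TOY_POST_RATE_CHANGE:
		log->post++;
		return 0;
	case TOY_ABORT_RATE_CHANGE:
		log->aborted++;
		return 0;
	}
	return 0;
}
```

A device that cannot tolerate rates above some limit could register a callback like toy_limit_cb(); an acceptable change produces one pre-change and one post-change event, while a rejected one produces a pre-change event followed by an abort.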

The patch set also exports the clock hierarchy into debugfs. Each top-level clock gets a directory in ../debug/clk that contains read-only files to report the clock's rate, flags, prepare and enable counts, and the number of notifiers registered. Subdirectories are created for each child clock containing the same information.

The common clock framework has been around for some time in various forms. The current incarnation is being shepherded by Mike Turquette, but he notes that it is based on work originally done by Jeremy Kerr and Ben Herrenschmidt. Beyond that: "Many others contributed to those patches and promptly had their work stolen by me". Turquette has also posted a patch set with an example that replaces the OMAP4 clocks using the framework.

The comments on the most recent iteration have been fairly light, but still substantive, so a version that is ready for the mainline is probably still some way off. The framework is clearly on the radar of ARM developers, though, and it would clean up a fair amount of code duplication within that tree, so we should see something in the mainline before too long, hopefully in one of the next few kernel releases.


Bringing Android closer to the mainline

By Jonathan Corbet
December 20, 2011
The agenda for the 2011 Kernel Summit did not include Android as a topic, but Android came up anyway. In a conclusion that surprised many, the group agreed that the bulk of the Android kernel code should probably be merged into the mainline. The past couple of years have made it clear that Android will not be going away; it has, in particular, done a good job of outlasting the resistance to merging its code. After the Summit things got quiet again on the Android front, but that does not mean that nothing has been happening.

Tim Bird recently announced the existence of the Android mainlining project, an effort intended to help coordinate the various groups that have been working in this area. The project has the obligatory wiki and mailing list. The list is new and has not seen a whole lot of traffic - a situation which may well change in the near future.

Toward the end of November, the core Android code was returned to the staging tree, from which it had been removed at the end of 2009. Since the code's return to staging, changes have been going in and the code has caught up to its state in the Android tree. The code has now reached a point where, as summarized by Greg Kroah-Hartman on December 16:

[T]he next linux-next Linux kernel release should almost boot an Android userspace, we are only missing one piece, ashmem, and that should hopefully land in my staging-next tree next week. The patches are still being tested and cleaned up by others.

Between the wiki and a look at drivers/staging/android in linux-next, one can get a fair idea of the state of the various patches. One notable patch that is not there is wakelocks (or "suspend blockers"), a feature which has been at the core of the controversy around the Android code. The wakelock concept will almost certainly return at some point, but much of the focus seems to be on the easier components at the moment. As Greg noted, wakelocks are not actually needed to boot an Android system - they're just necessary to keep that system from draining the battery too quickly.

The pieces that exist in the linux-next staging directory now are:

  • Binder, the interprocess communication mechanism used within Android. Binder could conceivably be replaced with a standard IPC mechanism or, perhaps, with D-Bus, but it has a number of unique features (zero-copy message transmission, thread management, credential passing) that are hard to replace in a straightforward manner. (See this article for a detailed look at various Linux IPC mechanisms, binder included.)

  • Logger is the kernel piece of the Android logging system. It implements a completely separate path for Android-specific log messages, which do not mix with normal kernel messages in any way. Other than adding a "facility" concept to kernel logging, it's not clear what this component offers, but it is also relatively self-contained and should not be too controversial.

  • The "low memory killer" implements Android's interesting approach to application management. In the Android world, applications never choose to exit. They hang around until memory gets tight, at which point the kernel starts to kill them off. It's a small piece of code that works using the "shrinker" mechanism, a standard way to register functions to be called when the kernel would like to free up some memory. So, even though it is memory-management code, it is relatively unintrusive and will not affect systems where it is not used.

  • "Pmem" is Android's answer to the age-old problem of allocating large, physically-contiguous buffers after the system has been running for a while. It works in the usual way: a range of memory is set aside at boot time. One difference with pmem is that it exports a device to user space, allowing buffers to be allocated directly by applications and passed to drivers. That, in turn, leads to things like camera drivers being written with the assumption that user space can give them physically-contiguous buffers for video frames, something that would not be possible in a mainline kernel.

    Approaches like CMA seem like a better solution to this particular problem - if and when CMA is merged into the mainline. Meanwhile, however, applications have been written using pmem, so that interface is unlikely to go away in the near future.

  • The "RAM console" saves log data to a special region of memory where it can be found and recovered after a reboot. It is a debugging tool.

  • "Timed GPIO" is a simple mechanism whereby the kernel can schedule a specific setting for a GPIO line at some point in the future. An example use would be to ensure that the vibrator gets turned off regardless of what happens to the application that turned it on.
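The way the low memory killer hangs off the shrinker mechanism can be pictured with a toy user-space model: a callback is handed to a reclaim loop, and each call gives up memory by killing one application. None of this is the kernel's shrinker API; the types and functions are hypothetical, and ranking victims by an "expendability" number is an assumption made for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: asked to free memory, it returns the
 * number of pages it actually released (zero if it has nothing left). */
typedef unsigned long (*toy_shrink_fn)(unsigned long pages_wanted, void *data);

struct toy_app {
	const char   *name;
	unsigned long pages;          /* memory held by the application */
	int           expendability;  /* higher values are killed first */
	int           alive;
};

struct toy_app_list {
	struct toy_app *apps;
	size_t          n;
};

/* Callback in the spirit of the low memory killer: kill the most
 * expendable live application and report the pages it held. */
static unsigned long toy_lowmem_shrink(unsigned long pages_wanted, void *data)
{
	struct toy_app_list *list = data;
	struct toy_app *victim = NULL;
	size_t i;

	(void)pages_wanted;
	assert(list != NULL);
	for (i = 0; i < list->n; i++) {
		struct toy_app *app = &list->apps[i];

		if (app->alive &&
		    (!victim || app->expendability > victim->expendability))
			victim = app;
	}
	if (!victim)
		return 0;
	victim->alive = 0;
	return victim->pages;
}

/* Reclaim loop: keep calling the callback until enough pages have been
 * freed or the callback reports that it cannot free any more. */
static unsigned long toy_reclaim(toy_shrink_fn shrink, void *data,
				 unsigned long pages_wanted)
{
	unsigned long freed = 0;

	while (freed < pages_wanted) {
		unsigned long got = shrink(pages_wanted - freed, data);

		if (got == 0)
			break;
		freed += got;
	}
	return freed;
}
```

Since nothing calls the callback unless memory is wanted, a system that never comes under pressure never kills anything, which matches the observation that the code does not affect systems where it is not used.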

The "ashmem" component was not in linux-next as of this writing, but, as Greg noted, its arrival there is expected in the near future. Ashmem is a shared memory mechanism that is able to discard some or all of its contents when memory pressure gets high. It could conceivably be replaced by the proposed POSIX_FADV_VOLATILE operation, but the latter does not, yet, seem to be a complete solution for Android's requirements.

There are a number of Android-specific changes that do not appear on that list, and, thus, are not likely to be merged into the mainline in the near future. Some of them are so Android-specific that they may never get in; the "network security" tweaks fall into that category. Others, such as the alarm timer code, may be superseded by enhancements in the mainline. Then, of course, there is a long list of drivers for hardware found on Android devices. Quite a few of those drivers have found their way into the mainline already, and others are on their way.

In summary: if all goes well, the 3.3 kernel should see the delta between Android kernels and the mainline go down considerably. That should make life easier for developers and for vendors wanting to provide Android-compatible hardware. Of course, it would be unsurprising if Android were to grow new subsystems of its own in the future; the Android developers have made it clear that they are unable and unwilling to wait for the mainlining process to run its course when they have products to ship. But, with any luck at all, the worst days of a significant fork that has caused a fair amount of ill will and difficult discussion should soon be behind us.


Patches and updates

Miscellaneous

  • Lucas De Marchi: kmod 2. (December 21, 2011)

Page editor: Jonathan Corbet

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds