
Kernel development

Brief items

Kernel release status

The current development kernel remains 3.1-rc6; there have been no releases - development or stable - in the last week.


Quotes of the week

Coherent vision isn't something that the kernel community really values.
-- Neil Brown

On both 1.5 and 1.75, OLPC obtained assurances from the companies that the data sheets for the processor/companion chips/SoC would be publicly available by the time the laptop reached production.

In both cases, the companies lied to get the designs started and have no intention of ever releasing critical documentation outside of an NDA.

-- John Watlington

People who remove debugability blindly have earned an one way ticket to the Oort cloud. There is utter chaos already so they wont be noticed at all.
-- Thomas Gleixner


linux-next on github

The linux-next tree has been unavailable since kernel.org went down; maintainer Stephen Rothwell had said previously that he was unable to post it at an alternative location. That problem has evidently been overcome; those wanting the current linux-next tree can find it on github.


Where's that tree?

By Jonathan Corbet
September 21, 2011
As of this writing, the kernel.org outage continues with no word as to when the site will be back up. Kernel development never stops, though, and one of the advantages of the git model is that copies of repositories exist all over the place. A number of developers have announced new locations for their trees; some of the relocated repositories are:

ACPI         https://github.com/lenb/linux.git
ALSA SOC     git://opensource.wolfsonmicro.com/linux-2.6-asoc.git
amd64 EDAC   git://amd64.org/linux/bp.git
APM          git://twin.jikos.cz/jikos/apm
arm-soc      git://git.linaro.org/people/arnd/arm-soc.git
HID          git://twin.jikos.cz/jikos/hid
infiniband   https://github.com/rolandd/infiniband
input        https://github.com/dtor/input
kbuild       http://repo.or.cz/w/linux-kbuild.git
libata       git://github.com/jgarzik/libata-dev.git
mmc          git://dev.laptop.org/users/cjb/mmc mmc-next
pm           git://github.com/rjwysocki/linux-pm.git
regmap       git://opensource.wolfsonmicro.com/regmap.git
SCSI         git://bedivere.hansenpartnership.com/git/scsi-rc-fixes-2.6.git
             git://bedivere.hansenpartnership.com/git/scsi-misc-2.6.git
slab         git://github.com/penberg/linux.git
tip          git://tesla.tglx.de/git/linux-2.6-tip
trivial      git://twin.jikos.cz/jikos/trivial
wireless     git://git.infradead.org/users/linville/wireless.git
             git://git.infradead.org/users/linville/wireless-next.git
             git://git.infradead.org/users/linville/wireless-testing.git
xen          git://oss.oracle.com/git/kwilk/xen.git

Most of these relocations have been advertised as being temporary. It will be interesting to see how many of them move back to kernel.org on its return, especially if that site comes back with stricter rules about access.


Kernel development news

LPC: Control groups

By Jonathan Corbet
September 20, 2011
Control groups remain a controversial topic in kernel circles; some developers like them, others hate them. The latter group would like to see the feature removed altogether, but that seems unlikely to happen; there are too many users for control groups already, with more to come. The 2011 Linux Plumbers Conference featured a discussion among those users that gave some insights into why control groups are useful and what could be done to make them more so.

The session started with a brief talk by Kir Kolyshkin of Parallels; for him, control groups are all about implementing containers. Containers can be seen as a sort of poor user's virtualization; they enable the running of multiple, isolated user-space systems on the same kernel. Containers tend to be more efficient than pure virtualization; they are also, he said, the only form of virtualization available for the ARM architecture at the moment. Control groups help in the implementation of containers by isolating groups of processes from each other and by allowing the imposition of resource limits on each group.

The bulk of the session, though, centered around a presentation by Tim Hockin on Google's isolation and resource limitation needs. Google's cluster runs all kinds of jobs which, internally, are divided into "tier 1" and "tier 2" tasks. The general problem Google has is that tasks normally do not use 100% of the resources they request; that means that systems in the cluster tend to be underutilized. Google would like to be able to pack more jobs onto each box, but they have to be very careful about overcommitting resources. If that is not done carefully, resource-intensive jobs can get in the way of urgent tasks like responding to search queries.

Google uses its own form of containers to be able to overcommit systems safely. Containers let Google place limits on the CPU usage, memory usage, I/O bandwidth consumption, etc. of each group of processes on the system. The goal, when all goes well, is to isolate each group from the others, provide predictable resources to each, and to lose very little time on the container implementation itself. Control groups are used when they are available and suitable to the task; in other places, a lot of user-space control code is used instead. The user-space code is complex and racy, Tim said; they would like to be rid of it.

There is a special daemon running on each system that wakes up about every 100ms to have a look at what is going on. Should it detect a load spike originating from the system's tier-1 work, it will stop or kill as many tier-2 tasks as needed to make room. This all works, but it could work better; more support from the kernel would be helpful.
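The decision the daemon makes on each wakeup can be sketched as a small selection routine. This is a minimal sketch under stated assumptions, not Google's code: the `Task` type, the single abstract "usage" number per task, and the policy of evicting the largest tier-2 consumers first are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    tier: int       # 1 = urgent work (e.g. serving queries), 2 = batch work
    usage: float    # hypothetical single resource metric for the task


def select_victims(tasks, capacity):
    """Return the tier-2 tasks to stop or kill so total usage fits capacity.

    Hypothetical policy: evict the biggest tier-2 consumers first and never
    touch tier-1 tasks. If tier-1 work alone exceeds capacity, every tier-2
    task is returned and the box is still over its limit.
    """
    total = sum(t.usage for t in tasks)
    tier2 = sorted((t for t in tasks if t.tier == 2),
                   key=lambda t: t.usage, reverse=True)
    victims = []
    for task in tier2:
        if total <= capacity:
            break
        victims.append(task)
        total -= task.usage
    return victims
```

In this model, a box with a capacity of 90 running a 60-unit tier-1 task alongside tier-2 tasks of 30, 20, and 5 units would evict only the 30-unit task. Whether a victim is stopped or killed is left to the caller.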

For example, memory use needs to be tightly controlled on these systems. At the moment, Google is using the "fake NUMA" feature to partition system memory and parcel it out as needed (an earlier article has a bit more information on how that works). Fake NUMA is a hack, though, with resource costs of its own. They are moving to the kernel's memory controller, but it is not yet suitable for their needs because it cannot work with nested control groups. They had similar problems with the disk bandwidth controller, but that problem has been resolved recently. In general, Tim said, anybody who is designing a controller for Linux should think about how it will nest from the beginning.
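What designing for nesting implies can be illustrated with hierarchical charging: a charge against a nested group must fit within the limit of every ancestor, and it is recorded at every level. The sketch below is a simplified Python model of that idea; the `Group` class and its all-or-nothing charging rule are illustrative assumptions, not the memory controller's actual code.

```python
class Group:
    """Hypothetical model of a control group with an optional limit."""

    def __init__(self, name, limit=None, parent=None):
        self.name = name
        self.limit = limit      # None means unlimited
        self.parent = parent
        self.usage = 0

    def ancestors(self):
        """Yield this group and every group above it, up to the root."""
        group = self
        while group is not None:
            yield group
            group = group.parent

    def try_charge(self, amount):
        """Charge this group and all ancestors, or none of them (assumed rule)."""
        chain = list(self.ancestors())
        for group in chain:
            if group.limit is not None and group.usage + amount > group.limit:
                return False    # some level of the hierarchy would go over
        for group in chain:
            group.usage += amount
        return True

    def uncharge(self, amount):
        """Release a previous charge at every level of the hierarchy."""
        for group in self.ancestors():
            group.usage -= amount
```

With a `batch` group limited to 100 units containing two jobs limited to 80 each, a 40-unit charge to the second job is refused once the first job holds 70, even though the second job is far below its own limit; a controller that ignored nesting would let it through.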

One other problem with the memory controller is its handling of shared memory. Currently, shared pages are billed to the control group that touches them first. That makes deterministic resource control hard, especially in situations where the limits are set tightly. Tim didn't like the idea of proportional billing (dividing the charge for each page across each group that has it mapped) any better. That, he said, takes memory billing out of the control of each group; if one control group exits, the others will suddenly find themselves over their limits as their portion of the shared pages grows. What he would like would be the ability to manually arrange for pages backed by certain files to be billed to specific groups. Then he could set up a system group to be billed for, say, the C library.
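A small worked example makes the objection to proportional billing concrete. The sketch below uses made-up page counts and hypothetical function names; it computes each group's charge for a set of shared pages under first-touch billing and under proportional billing, then shows what happens to the remaining groups when one group exits. Exact fractions are used so the shares add up precisely.

```python
from fractions import Fraction


def first_touch(pages):
    """pages maps page id -> groups in the order they touched the page.
    Each page is billed entirely to the first group that touched it."""
    bill = {}
    for groups in pages.values():
        owner = groups[0]
        bill[owner] = bill.get(owner, 0) + 1
    return bill


def proportional(pages):
    """Each page's cost is split evenly among all groups that map it."""
    bill = {}
    for groups in pages.values():
        share = Fraction(1, len(groups))
        for group in groups:
            bill[group] = bill.get(group, 0) + share
    return bill


def exit_group(pages, gone):
    """Drop a group from every mapping; pages nobody maps any more go away."""
    remaining = {}
    for page, groups in pages.items():
        left = [g for g in groups if g != gone]
        if left:
            remaining[page] = left
    return remaining
```

With 300 shared library pages mapped by groups A, B, and C (A first), first-touch billing charges all 300 pages to A, while proportional billing charges 100 to each. When C exits, proportional billing raises the charges of A and B to 150 each, even though neither of them did anything differently.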

There are some other problems as well. The memory overhead of the memory controller is painfully high, for example. Google would really like a way to query the size of the working set for each control group, but that capability is not currently there. They also really want per-control-group reclaim to focus the memory management code on the control groups that are currently exceeding their limits. And, if a container goes so far over its limits that the out-of-memory killer gets involved, it would be really nice to have a way to kill a whole control group at once instead of having to do it one process at a time. (It's worth noting that patches for many of these features exist; many of them come from Google).
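The one-process-at-a-time approach that user space is left with looks roughly like the sketch below. It assumes a cgroup v1 hierarchy, where each group's directory contains a `tasks` file listing one ID per line; the mount point is whatever the administrator chose, so the function simply takes the group's directory. The loop has to re-read the list because processes can fork while it runs, which is one reason a kernel-side "kill the whole group" operation would be attractive. The `kill` parameter exists only so the function can be exercised without real processes.

```python
import os
import signal


def read_tasks(cgroup_dir):
    """Return the IDs listed in a cgroup v1 'tasks' file (one per line)."""
    with open(os.path.join(cgroup_dir, "tasks")) as f:
        return [int(line) for line in f if line.strip()]


def kill_cgroup(cgroup_dir, kill=os.kill, max_rounds=10):
    """Kill every task in a control group, one process at a time.

    Tasks may fork while we loop, so the list is re-read until it comes
    back empty or max_rounds passes have been made. Returns True if the
    group ended up empty.
    """
    for _ in range(max_rounds):
        pids = read_tasks(cgroup_dir)
        if not pids:
            return True
        for pid in pids:
            try:
                kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # exited between reading the list and sending the signal
    return not read_tasks(cgroup_dir)
```

On a real system this requires appropriate privileges and a mounted cgroup hierarchy; it is meant only to show why the user-space version is racy, not as a recommended tool.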

Beyond that, there is a lot of interest in the I/O bandwidth controller. A lot of Google jobs, he said, are "seek locked"; controlling how much I/O bandwidth they use is important. Controllers for other types of resources (number of threads, number of open file descriptors, network ports, etc.) would be useful. And so on.

The session spent some time on other topics - primarily user-space checkpoint/restart. It was agreed that everybody in the room was interested in better isolation, and that the memory controller was the area in need of the most work at the moment. The session was dominated by users of control groups, though; there were not a lot of implementers present. Even more notable in their absence were those developers who are opposed to control groups in their current form; it would have been interesting to hear their ideas about how the needs expressed there should really be met.


Toward a unified display driver framework

By Jonathan Corbet
September 20, 2011
In a separate article, LWN looked at the discussion around how display drivers should be managed in the X server; one of the things that was noted there was that the movement of much of the driver logic into the kernel has reduced the rate of change on the user-space side. Seemingly simultaneously, the kernel community got into an extended discussion of how display drivers should be managed within the kernel. Here, the complexity of contemporary hardware is likely to drive both a consolidation of and some extensions to the kernel's interfaces.

It all started rather innocently with Tomi Valkeinen's description of the challenges posed by the display system found on OMAP processors. System-on-chip architectures like OMAP tend not to bother with the nice separation between devices found on desktop- and server-oriented architectures. So, instead of having a "video card," the OMAP has, on one side, an acceleration engine that can render pixels into main memory and, on the other, a "display subsystem" connecting that memory to the video display. That subsystem consists of a series of overlay processors, each of which can render a window from memory; the output of all the overlay processors is composited by the hardware and actually put on the display. Or, more specifically, the output from these processors is handed to the panel controller, which may be a complex bit of hardware in its own right.
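The compositing step can be pictured in a few lines of code. This is a toy software model, not how the OMAP hardware or its driver works: each hypothetical `Overlay` reads a rectangular window from its own buffer, and overlays are drawn onto the output frame in z-order, with higher overlays simply covering lower ones (no alpha blending). The resulting frame is what would be handed on to the panel controller.

```python
from dataclasses import dataclass


@dataclass
class Overlay:
    """Hypothetical overlay: a window of pixels placed on the display."""
    x: int          # window position on the display
    y: int
    zorder: int     # higher values are drawn on top
    pixels: list    # rows of pixel values read from memory


def composite(overlays, width, height, background=0):
    """Combine all overlay windows into one frame, clipping to the screen."""
    frame = [[background] * width for _ in range(height)]
    for ov in sorted(overlays, key=lambda o: o.zorder):
        for row, line in enumerate(ov.pixels):
            for col, value in enumerate(line):
                sx, sy = ov.x + col, ov.y + row
                if 0 <= sx < width and 0 <= sy < height:
                    frame[sy][sx] = value
    return frame
```

For example, a graphics overlay at the origin and a video overlay placed on top of it produce a single frame in which the video window hides part of the graphics, without either source buffer being copied into the other.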

So OMAP graphics depends on a set of interconnected components. Filling video memory can be done via the framebuffer interface, via the direct rendering (DRM) interface, or, for video captured from a camera, via the Video4Linux2 overlay interface. Video memory must be managed for those interfaces, then handed to the display processors which, in turn, must communicate with the panel controller. All of this works, but, as Tomi noted, there seems to be a lot of duplication of code between these various interfaces and no generic way for the kernel to manage things at any of these levels. Wouldn't it be nicer, he asked, to create a low-level display framework to handle these tasks?

He is not the first to ask such a question; the graphics developers have been working on this problem for some years, and much of the solution seems clear. The DRM code is where the bulk of the work has been done in the last few years; it is the only display subsystem that comes close to being able to describe and drive contemporary hardware. As the memory management issues associated with graphics become more complex, it becomes increasingly necessary to use a management framework like GEM, and that means using DRM. It also, as a result of its X-server heritage, contains a couple of decades' worth of experience on dealing with the quirks of real-world video hardware. So most developers seem to believe that, over time, DRM should become the interface for mode setting and memory management, while the older framebuffer interface should become a compatibility layer over DRM until it fades away entirely.

That said, Florian Tobias Schandinat, who recently took over the maintainership of the framebuffer code, has a different opinion. To Florian, the framebuffer layer is alive and well, it has more users than DRM does, and it will not be going away anytime soon. His biggest complaints with DRM appear to be that (1) it is significantly more complex, making the drivers more complex, and (2) exposing the acceleration capabilities of the graphics processor makes it easy for applications to crash the system. The fact that the framebuffer API does not provide any mechanism for acceleration is, in his view, an advantage.

Florian would appear to be in the minority here, though; most developers seem to feel that it will be increasingly hard to manage contemporary hardware without the capabilities that the DRM layer provides. The presence of bugs in DRM drivers that can crash the system - especially when acceleration is used - is not really denied by anybody, but it was pointed out that use of DRM does not require the use of acceleration. The hardware is also apparently getting better in that it makes it easier for the operating system to regain control of the GPU when necessary. In any case, crashes and bugs are seen as something to fix and not as a reason to avoid DRM outright.

That leaves the question of how to handle the Video4Linux2 overlay feature. Overlay has been somewhat deprecated and unloved for some years, though it remains an official part of the interface; it was designed for an earlier, simpler era. When CPUs reached a point where they could easily manage a video stream from a camera device, the motivation for overlay faded - for a while. More recently, the resolution of video streams has increased notably and power consumption has become a much more important consideration. Even if the CPU can process a video stream in real time on a mobile device, the battery will last longer if the CPU sleeps and the stream goes straight to video memory. That means that the ability to overlay video streams onto the display in a zero-copy manner has become interesting again.

Given that the old overlay interface is seen as inadequate, there is a clear need for a new one. Jesse Barnes floated a proposal for a new overlay API back in April; the DMA buffer sharing proposal posted more recently is also aimed at this requirement. The word is that this topic was discussed at the X Developers Conference and that a new proposal is forthcoming.

As an indication of where things could be heading in the longer term, it is worth looking at this message from Laurent Pinchart, the author of the V4L2 media controller subsystem. The complexity of video acquisition devices has reached a point where treating them as a single device no longer works well; thus the media controller, which allows user space to query and change the connections between a pipeline of devices. The display problem, he said, is essentially the same; perhaps, he suggested, the media controller could be useful for controlling display pipelines as well. The idea did not immediately take the world by storm, but it may give an indication of where things will eventually need to go.
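The media controller's model is a graph of entities joined by links that user space can enumerate and enable or disable. The sketch below models a display pipeline that way; it is a plain Python illustration, not the media controller API (which is driven through ioctl() calls on /dev/mediaN device nodes), and the class, method, and entity names are invented for an OMAP-like example. For simplicity it follows only the first enabled link out of each entity.

```python
class Pipeline:
    """Hypothetical graph of pipeline entities connected by switchable links."""

    def __init__(self):
        self.links = {}     # (source, sink) -> enabled flag

    def add_link(self, source, sink, enabled=False):
        self.links[(source, sink)] = enabled

    def setup_link(self, source, sink, enabled):
        """Enable or disable an existing link; unknown links are an error."""
        if (source, sink) not in self.links:
            raise KeyError(f"no link {source} -> {sink}")
        self.links[(source, sink)] = enabled

    def enum_links(self):
        return sorted(self.links.items())

    def active_path(self, start):
        """Follow enabled links from start until the pipeline ends."""
        path = [start]
        while True:
            nxt = [sink for (src, sink), on in self.links.items()
                   if src == path[-1] and on]
            if not nxt or nxt[0] in path:   # end of pipeline, or a loop
                return path
            path.append(nxt[0])
```

Routing the composited output to an HDMI connector instead of (or in addition to) a built-in panel then becomes a matter of enabling a different link, rather than a driver-specific interface for each component.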

The last few years have seen the consolidation of a lot of display-oriented code into the kernel; that code is increasingly using common infrastructure like the GEM memory manager. It is not hard to imagine that this consolidation will continue to the point where the DRM subsystem becomes the supported way for controlling displays, with the other interfaces implemented as some sort of compatibility layer. The complexity of the DRM code is, in the end, driven by the complexity of the hardware it must drive, and that hardware does not look like it will be getting any simpler anytime soon.


dm-verity

By Jonathan Corbet
September 19, 2011
It is not often that Netflix employees show up on linux-kernel to advocate for the merging of specific patches. But that is exactly what has happened after a posting of a new device mapper module called dm-verity. As one might expect, dm-verity has little to do with, say, efficient sorting of DVD mailings. It is, instead, a classic piece of security technology with the potential to work in the user's interests - or against those interests.

The purpose of dm-verity is to implement a device mapper target capable of validating the data blocks contained in a filesystem against a list of cryptographic hash values. If the hash for a specific block does not come out as expected, the module assumes that the device has been tampered with and causes the access attempt to fail. It has been put forward by Mandeep Singh Baines of Google's Chromium OS team, but there appears to be interest in this capability beyond that small group.

At the core of this new facility is a module called dm-bht, which works with a list of block numbers and their associated hash values. This list is organized into a simple tree for quick access to the hashes for arbitrary blocks. In essence, the leaves of the tree are pages containing hash values; each higher level in the tree contains hashes of the blocks below it. Verifying a block requires not only checking the hash value for that specific block but also verifying the hashes on the path up to the root of the tree. If the hash for the tree root (which is assumed to be trusted) checks out, all is well. The dm-bht code can use any hash algorithm supported by the kernel's crypto API; SHA1 is given as an example, but others can be used as well.
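The tree structure described above can be sketched in a few dozen lines. This is a simplified model, not dm-bht's on-disk format or code: it uses SHA-256 from Python's hashlib, a hypothetical fan-out of four hashes per tree node (rather than a page full of them), and keeps the whole tree in memory.

```python
import hashlib

FANOUT = 4  # hashes per tree node; an assumption made to keep the tree small


def h(data):
    return hashlib.sha256(data).digest()


def build_tree(blocks):
    """Return the tree as a list of levels, leaves first and root level last.

    Level 0 holds one hash per data block; each node on a higher level is
    the hash of up to FANOUT concatenated hashes from the level below.
    """
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append([h(b"".join(below[i:i + FANOUT]))
                       for i in range(0, len(below), FANOUT)])
    return levels


def verify_block(index, data, levels, trusted_root):
    """Check one block's hash and every stored node on its path to the root."""
    digest = h(data)
    for level in levels[:-1]:
        if level[index] != digest:
            return False
        start = (index // FANOUT) * FANOUT
        digest = h(b"".join(level[start:start + FANOUT]))
        index //= FANOUT
    return digest == trusted_root
```

Note that an attacker who rewrites a block and rebuilds the entire stored tree still fails verification, because the recomputed root no longer matches the root hash supplied from the trusted source.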

dm-verity implements a read-only target; it is assumed that there is no need to change the data protected by this scheme (being, most likely, the binaries run by the system itself) during operation. The tree of block hashes is stored with (or near) the data itself, but the root hash must be passed in externally. If that root hash comes from a trusted source, it should be possible to detect any modification of the disk, in either the data itself or in the stored hash tree. So, if all goes well, a system running with dm-verity can be assured that the underlying software has not been changed. It's worth noting that integrity checking for any specific block does not happen until the kernel tries to read that block into the page cache. There is, thus, no need for a lengthy verification process at boot or mount time.
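The read path can be modeled as follows. This is a simplified, hypothetical sketch: to stay short, the hash "tree" has only two levels (one hash per block, plus a root computed over the list of block hashes), a dictionary stands in for the page cache, and the class and method names are invented. What it does show is that nothing is hashed until a block is first read, that a mismatch turns into an I/O error, and that writes are refused.

```python
import errno
import hashlib


def sha(data):
    return hashlib.sha256(data).digest()


class VerityDevice:
    """Hypothetical read-only view of untrusted storage, checked on demand."""

    def __init__(self, blocks, stored_hashes, trusted_root):
        self.blocks = blocks              # untrusted data blocks
        self.hashes = stored_hashes       # untrusted, stored next to the data
        self.root = trusted_root          # supplied from outside at setup time
        self.hashes_ok = None             # hash list not checked yet
        self.cache = {}                   # stands in for the page cache

    def _check_hashes(self):
        # In this model the stored hash list is checked against the root once.
        if self.hashes_ok is None:
            self.hashes_ok = sha(b"".join(self.hashes)) == self.root
        return self.hashes_ok

    def read(self, index):
        if index in self.cache:           # already verified and cached
            return self.cache[index]
        data = self.blocks[index]
        if not self._check_hashes() or sha(data) != self.hashes[index]:
            raise OSError(errno.EIO, f"verification failed for block {index}")
        self.cache[index] = data
        return data

    def write(self, index, data):
        # This model refuses all writes, mirroring a read-only target.
        raise OSError(errno.EROFS, "read-only target")
```

Constructing the device costs nothing beyond storing the root hash; the hashing work is spread across first reads, which is the property that avoids a lengthy verification pass at boot or mount time.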

All of this depends on getting the right hash value into the system at startup time. If some sort of hardware-verified trusted bootloader is in use, that can probably be done securely. Device mapper setup is a complex task requiring a running user space. This means that a system using dm-verity will need some other mechanism to load a trusted kernel and initramfs, or the whole chain will break. A trusted bootloader can achieve that kind of setup; another example given by the developers is a system booting from a "known good" source like a USB stick that is never left unattended.
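The dependency described above is a chain: each stage is only as trustworthy as the stage that checked it. The sketch below models that chain with hypothetical stage names and SHA-256 digests. It is not how any particular bootloader works; in a real system the first expected digest would be anchored in hardware or read-only firmware, and the way the root hash reaches dm-verity is an assumption of this model.

```python
import hashlib


def digest(image):
    return hashlib.sha256(image).hexdigest()


def boot(stages, first_expected):
    """Walk a hypothetical boot chain, verifying each stage before 'running' it.

    stages: list of (name, image, expected_digest_of_next_stage). The last
    entry's third element is the dm-verity root hash that stage would hand
    to the device mapper setup code.
    """
    expected = first_expected             # assumed to come from trusted hardware
    for name, image, next_expected in stages:
        if digest(image) != expected:
            raise RuntimeError(f"{name}: image does not match trusted digest")
        expected = next_expected          # trust passes to the next stage
    return expected
```

If any stage is modified, including the one carrying the root hash, the check performed by the stage before it fails, so an attacker cannot simply substitute a root hash matching a tampered filesystem.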

One might wonder how dm-verity differs from existing features like the extended verification module. EVM requires and uses a trusted platform module (TPM) on the system to be verified; as long as the initial boot step can be secured, dm-verity is able to work without a TPM. It also seems likely that dm-verity will be faster since it does on-demand verification of blocks; there is no need to verify entire files before the first block can be accessed.

Wesley Miaw of Netflix made it clear that this patch is seen with favor there:

Netflix would like dm-verity to be included in the Linux kernel. Over the past year, we have been working with Google and porting dm-verity onto a number of consumer electronics devices running embedded Linux. Demand for this feature has been high and we see a lot of benefit associated with making dm-verity part of the official kernel.

The reasons for this interest should be fairly clear: dm-verity will make it easier to create locked-down Linux-based systems that will enforce whatever DRM requirements the movie studios may see fit to impose. Thanks to dm-verity, there will no longer be pirated films circulating on the Internet; or, perhaps, that's the sort of outcome that only happens in the movies. Whether or not the effort is futile, it shows that tools like dm-verity can be used to harden Linux-based systems in ways that are hostile to their users.

To an extent, Google's interests may align with those of Netflix: Chromebooks that can stream content from Netflix will be more attractive than those that cannot. But dm-verity also fits the ChromeOS concepts of minimal, trustworthy devices with data stored on Google's servers. For users who like this mode of operation, this kind of built-in integrity protection is a positive feature. Google can, one hopes, be trusted to hold the user's data; a suitably verified device can be trusted not to leak that data or the user's credentials to an attacker. Even if the running system is compromised through some sort of malware attack, a simple reboot should either put things right or make it clear that the machine can no longer be trusted.

As long as this functionality is under the user's control, it can be made to serve the user's interests. The "developer mode switch" designed into Chromebooks seems like a good compromise in this area. Some vendors will, beyond doubt, choose to incorporate tools like dm-verity without giving owners the ability to turn it off. That is not a good thing, but neither is it anything new.



Page editor: Jonathan Corbet

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds