The current development kernel remains 3.1-rc6; there have been no releases - development or stable - in the last week.
Coherent vision isn't something that the kernel community really ...
-- Neil Brown
On both 1.5 and 1.75, OLPC obtained assurances from the companies
that the data sheets for the processor/companion chips/SoC would be
publicly available by the time the laptop reached production.
In both cases, the companies lied to get the designs started and
have no intention of ever releasing critical documentation outside
of an NDA.
-- John Watlington
People who remove debuggability blindly have earned a one-way
ticket to the Oort cloud. There is utter chaos already so they won't
be noticed at all.
-- Thomas Gleixner
The linux-next tree has been unavailable since kernel.org went down;
maintainer Stephen Rothwell had said previously that he was unable to post
it at an alternative location. That problem has evidently been overcome;
those wanting the current linux-next tree can find it on github.
As of this writing, the kernel.org outage continues with no word as to when
the site will be back up. Kernel development never stops, though, and one
of the advantages of the git model is that copies of repositories exist all
over the place. A number of developers have announced new locations for
their trees.
Most of these relocations have been advertised as being temporary. It will
be interesting to see how many of them move back to kernel.org on its
return, especially if that site comes back with stricter rules about access.
Kernel development news
Control groups remain a controversial topic in kernel circles; some
developers like them, others hate them. The latter group would like to see
the feature removed altogether, but that seems unlikely to happen; there
are too many users for control groups already, with more to come. The 2011
Linux Plumbers Conference featured a discussion among those users that gave
some insights into why control groups are useful and what could be done to
make them more so.
The session started with a brief talk by Kir Kolyshkin of Parallels; for
him, control groups are all about implementing containers. Containers can
be seen as a sort of poor user's virtualization; they enable the running of
multiple, isolated user-space systems all on the same kernel. Containers
tend to be more efficient than pure virtualization; they are also, he said,
the only form of virtualization available for the ARM architecture at the
moment. Control groups help in the implementation of containers by
isolating groups of processes from each other and by allowing the
imposition of resource limits on each group.
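For readers who have not worked with them: control groups of this era are
managed through a filesystem interface. Here is a minimal sketch in Python,
assuming a conventional cgroup (v1) hierarchy mounted under /sys/fs/cgroup
with the memory controller enabled; the mount point, group name, and helper
functions are illustrative, not part of any standard tool.

    import os

    CGROUP_ROOT = "/sys/fs/cgroup"   # assumed mount point; varies by system

    def make_group(controller, name, settings):
        """Create a control group and apply resource-limit settings to it."""
        path = os.path.join(CGROUP_ROOT, controller, name)
        os.makedirs(path, exist_ok=True)
        for knob, value in settings.items():
            with open(os.path.join(path, knob), "w") as f:
                f.write(str(value))
        return path

    def add_task(group_path, pid):
        """Move a process into the group; children forked later inherit it."""
        with open(os.path.join(group_path, "tasks"), "w") as f:
            f.write(str(pid))

    # Example: confine a "container" to 512MB of memory (requires root).
    group = make_group("memory", "container0",
                       {"memory.limit_in_bytes": str(512 * 1024 * 1024)})
    add_task(group, os.getpid())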
The bulk of the session, though, centered around a presentation by
Tim Hockin on Google's isolation and resource limitation needs. Google's
cluster runs all kinds of jobs which, internally, are divided into
"tier 1" and "tier 2" tasks. The general problem Google has is
that tasks normally do not use 100% of the resources they request; that
means that systems in the cluster tend to be underutilized. Google would
like to be able to pack more jobs onto each box, but they have to be very
careful about overcommitting resources. If that is not done carefully,
resource-intensive jobs can get in the way of urgent tasks like responding
to search queries.
Google uses its own form of containers to be able to overcommit systems
safely. Containers let Google place limits on the CPU usage, memory usage,
I/O bandwidth consumption, etc. of each group of processes on the system.
The goal, when all goes well, is to isolate each group from the others,
provide predictable resources to each, and to lose very little time on the
container implementation itself. Control groups are used when they are
available and suitable to the task; in other places, a lot of user-space
control code is used instead. The user-space code is complex and racy, Tim
said; they would like to be rid of it.
There is a special daemon running on each system that wakes up about every
100ms to have a look at what is going on. Should it detect a load spike
originating from the system's tier-1 work, it will stop or kill any tier-2
tasks needed to make room. This all works, but it could work better; more
support from the kernel would be helpful.
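Google's daemon itself is not public; as a rough sketch of the pattern just
described - wake periodically, look for a tier-1 load spike, and stop tier-2
work to make room - consider the following, where the threshold, cgroup path,
and stop policy are all invented for illustration.

    import os
    import signal
    import time

    TIER2_TASKS = "/sys/fs/cgroup/cpu/tier2/tasks"   # hypothetical group

    def stop_tier2():
        # Stop every tier-2 task to free resources for tier-1 work.
        # (The cgroup freezer subsystem would do this more cleanly.)
        with open(TIER2_TASKS) as f:
            for line in f:
                os.kill(int(line), signal.SIGSTOP)

    while True:
        # Treat a one-minute load average near the CPU count as a spike.
        if os.getloadavg()[0] > 0.9 * os.cpu_count():
            stop_tier2()
        time.sleep(0.1)   # wake roughly every 100ms, as described above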
For example, memory use needs to be tightly controlled on these systems.
At the moment, Google is using the "fake NUMA" feature to partition system
memory and parcel it out as needed (see this
article for a bit more information on how that works). Fake NUMA is a
hack, though, with resource costs of its own. They are moving to the kernel's
memory controller, but it is not yet suitable for their needs because it
cannot work with nested control groups. They had similar problems with the
disk bandwidth controller, but that problem has
been resolved recently. In general, Tim said, anybody who is designing
a controller for Linux should think about how it will nest from the outset.
One other problem with the memory controller is its handling of shared
memory. Currently shared pages are billed to the control group that
touches them first. That makes deterministic resource control hard,
especially in situations where the limits are set tightly. Tim didn't like
the idea of proportional billing (dividing the charge for each page across
each group that has it mapped) any better. That, he said, takes memory
billing out of the control of each group; if one control group exits, the
others will suddenly find themselves over their limits as their portion of
the shared pages grows. What he would like would be the ability to manually
arrange for pages backed by certain files to be billed to specific groups.
Then he could set up a system group to be billed for, say, the C library.
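To make the objection to proportional billing concrete, consider a toy
calculation (not kernel code, just the arithmetic behind the argument):

    # Three groups map the same 100MB shared file; proportional billing
    # charges each group its share of the mapped pages.
    shared_mb = 100
    groups = ["A", "B", "C"]
    print(shared_mb / len(groups))   # each group is charged ~33.3MB

    # Group C exits; the surviving groups absorb its share.
    groups.remove("C")
    print(shared_mb / len(groups))   # A and B jump to 50MB each

    # A and B did nothing, yet each one's charge grew by ~16.7MB - a
    # group running close to its limit could suddenly be pushed over it.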
There are some other problems as well. The memory overhead of the memory
controller is painfully high, for example. Google would really like a way
to query the size of the working set for each control group, but that
capability is not currently there. They also really want per-control-group
reclaim to focus the memory management code on the control groups that are
currently exceeding their limits. And, if a container goes so far over its
limits that the out-of-memory killer gets involved, it would be really nice
to have a way to kill a whole control group at once instead of having to do
it one process at a time. (It's worth noting that patches for many
of these features exist; many of them come from Google.)
Beyond that, there is a lot of interest in the I/O bandwidth controller. A
lot of Google jobs, he said, are "seek locked"; controlling how much I/O
bandwidth they use is important. Controllers for other types of resources
(number of threads, number of open file descriptors, network ports, etc.)
would be useful. And so on.
The session spent some time on other topics - primarily user-space checkpoint/restart. It was agreed
that everybody in the room was interested in better isolation, and that the
memory controller was the area in need of the most work at the moment.
The session was dominated by users of control groups, though; there were
not a lot of implementers present. Even more notable in their absence were
those developers who are opposed to control groups in their current form;
it would have been interesting to hear their ideas about how the needs
expressed there should really be met.
In a separate article, LWN looked at the
discussion around how display drivers should be managed in the X server;
one of the things that was noted there was that the movement of much of the
driver logic into the kernel has reduced the rate of change on the
user-space side. Seemingly simultaneously, the kernel community got into
an extended discussion of how display drivers should be managed within the
kernel. Here, the complexity of contemporary hardware is likely to drive
both a consolidation of and some extensions to the kernel's interfaces.
It all started rather innocently with Tomi Valkeinen's description of the
challenges posed by the display system found on OMAP processors.
System-on-chip architectures like OMAP tend not to bother with the nice
separation between devices found on desktop- and server-oriented
architectures. So, instead of having a "video card," the OMAP has, on one
side, an acceleration engine that can render pixels into main memory and,
on the other, a "display subsystem" connecting that memory to the video
display. That subsystem consists of a series of overlay processors, each
of which can render a window from memory; the output of all the overlay
processors is composited by the hardware and actually put on the display.
Or, more specifically, the output from these processors is handed to the
panel controller, which may be a complex bit of hardware in its own right.
So OMAP graphics depends on a set of interconnected components. Filling
video memory can be done via the framebuffer interface, via the direct
rendering (DRM) interface, or, for video captured from a camera, via the
Video4Linux2 overlay interface. Video memory must be managed for those
interfaces, then handed to the display processors which, in turn, must
communicate with the panel controller. All of this works, but, as Tomi
noted, there seems to be a lot of duplication of code between these various
interfaces and no generic way for the kernel to manage things at any of
these levels. Wouldn't it be nicer, he asked, to create a low-level
display framework to handle these tasks?
He is not the first to ask such a question; the graphics developers have
been working on this problem for some years, and much of the solution seems
clear. The DRM code is where the bulk of the work has been done in the
last few years; it is the only display subsystem that comes close to
being able to describe and drive contemporary hardware. As the memory
management issues associated with graphics become more complex, it becomes
increasingly necessary to use a management framework like GEM, and that
means using DRM. It also, as a result of its X-server heritage, contains
a couple of decades' worth of experience in dealing with the quirks of
real-world video hardware. So most developers seem to believe that, over
time, DRM should become the interface for mode setting and memory
management, while the older framebuffer interface should become a
compatibility layer over DRM until it fades away entirely.
That said, Florian Tobias Schandinat, who recently took over the
maintainership of the framebuffer code, has a
different opinion. To Florian, the framebuffer layer is alive and
well, it has more users than DRM does, and it will not be going away
anytime soon. His biggest complaints with
DRM appear to be that (1) it is significantly more complex, making
the drivers themselves more complex, and (2) exposing the acceleration
capabilities of the graphics processor makes it easy for applications to
crash the system. The fact that the framebuffer API does not provide any
mechanism for acceleration is, in his view, an advantage.
Florian would appear to be in the minority here, though; most developers seem to
feel that it will be increasingly hard to manage contemporary hardware
without the capabilities that the DRM layer provides. The presence of bugs
in DRM drivers
that can crash the system - especially when acceleration is used - is not
really denied by anybody, but it was
pointed out that use of DRM does not require the use of acceleration. The
hardware is also apparently getting better in that it makes it easier for the
operating system to regain control of the GPU when necessary. In any case,
crashes and bugs are seen as something to fix and not as a reason to avoid DRM.
That leaves the question of how to handle the Video4Linux2 overlay
feature. Overlay has been somewhat deprecated and unloved for some years,
though it remains an official part of the interface; it was designed for an
earlier, simpler era. When CPUs reached a point where they could easily
manage a video stream from a camera device, the motivation for overlay
faded - for a while. More recently, the resolution of video streams has
increased notably and power consumption has become a much more important
consideration. Even if the CPU can process a video stream in
real time on a mobile device, the battery will last longer if the CPU
sleeps and the stream goes straight to video memory. That means that the
ability to overlay video streams onto the display in a zero-copy manner has
become interesting again.
Given that the old overlay interface is seen as inadequate,
there is a clear need for a new one. Jesse Barnes floated a proposal for a new overlay API back in
April; the DMA buffer sharing proposal
posted more recently is also aimed at this requirement. The word is that this topic was discussed at the X
Developers Conference and that a new proposal is forthcoming.
As an indication of where things could be heading in the longer term, it is
worth looking at
this message from Laurent Pinchart, the
author of the V4L2 media controller
subsystem. The complexity of video acquisition devices has reached a
point where treating them as a single device no longer works well; thus the
media controller, which allows user space to query and change the
connections within a pipeline of devices. The display problem, he said,
is essentially the same; perhaps, he suggested, the media controller could
be useful for controlling display pipelines as well. The idea did not
immediately take the world by storm, but it may give an indication of where
things will eventually need to go.
The last few years have seen the consolidation of a lot of display-oriented
code into the kernel; that code is increasingly using common
infrastructure like the GEM memory manager. It is not hard to imagine that
this consolidation will continue to the point where the DRM subsystem
becomes the supported way for controlling displays, with the other
interfaces implemented as some sort of compatibility layer. The complexity
of the DRM code is, in the end, driven by the complexity of the hardware it
must drive, and that hardware does not look like it will be getting any
simpler anytime soon.
It is not often that Netflix employees show up on linux-kernel to advocate
for the merging of specific patches. But that is exactly what has happened
after a posting of a new device mapper module called dm-verity. As one
might expect, dm-verity has little to do with, say, efficient sorting of
DVD mailings. It is, instead, a classic piece of security technology with
the potential to work in the user's interests - or against those interests.
The purpose of dm-verity is to implement a device mapper target capable of
validating the data blocks contained in a filesystem against a list of
cryptographic hash values. If the hash for a specific block does not come
out as expected, the module assumes that the device has been tampered with
and causes the access attempt to fail. It has been put forward by Mandeep
Singh Baines of Google's Chromium OS team, but there appears to be interest
in this capability beyond that small group.
At the core of this new facility is a module called dm-bht, which works
with a list of block numbers and their associated hash values. This list
is organized into a simple tree for quick access to the hashes for
arbitrary blocks. In essence, the leaves of the tree are pages containing
hash values; each higher level in the tree contains hashes of the blocks
below it. Verifying a block requires not only checking the hash value for
that specific block; it is also necessary to verify hashes up to the root
of the tree. If the hash for the tree root (which is assumed to be
trusted) checks out, all is well.
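The structure described here is the familiar hash tree (Merkle tree). Below
is a simplified illustration in Python of how verification walks from a block
up to the trusted root; it is a sketch of the general technique, not the
dm-bht code or its on-disk layout, and it assumes a complete binary tree
with two children per node.

    import hashlib

    def h(data):
        return hashlib.sha1(data).digest()   # SHA1, as in the example above

    def verify_block(block_data, index, levels, trusted_root):
        # levels[0] holds the leaf (per-block) hashes; each entry of
        # levels[n+1] is the hash of two concatenated entries of levels[n].
        if h(block_data) != levels[0][index]:
            return False                      # the data block was altered
        for n in range(len(levels) - 1):
            left = index & ~1                 # our parent's left child
            pair = levels[n][left] + levels[n][left + 1]
            index //= 2
            if h(pair) != levels[n + 1][index]:
                return False                  # an intermediate hash was altered
        return levels[-1][0] == trusted_root  # compare against the trusted root

The design's appeal is that the whole tree can live, untrusted, on the disk
next to the data; only the root hash needs to arrive through a trusted
channel.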
The dm-bht code can use any hash algorithm supported by the kernel's crypto
API; SHA1 is given as an example, but others can be used as well.
dm-verity implements a read-only target; it is assumed that there is no
need to change the data protected by this scheme (being, most likely, the
binaries run by the system itself) during operation. The tree of block
hashes is stored with (or near) the data itself, but the root hash must be
passed in externally. If that root hash comes from a trusted source, it
should be possible to detect any modification of the disk, in either the
data itself or in the stored hash tree. So, if all goes well, a system
running with dm-verity can be assured that the underlying software has not
been changed. It's worth noting that integrity checking for any specific
block does not happen until the kernel tries to read that block into the
page cache. There is, thus, no need for a lengthy verification process at
boot or mount time.
All of this depends on getting the right root hash into the system at
startup time. Device mapper setup is a complex task requiring a running user
space, so a system using dm-verity needs some other mechanism to load a
trusted kernel and initramfs or the whole chain of trust breaks down. A
hardware-verified trusted bootloader can provide that kind of setup; another
example given by the developers is a system booting from a "known good"
source like a USB stick that is never left unattended.
One might wonder how dm-verity differs from existing features like the extended verification module. EVM requires
and uses a trusted platform module (TPM) on the system to be verified; as
long as the initial boot step can be secured, dm-verity is able to work
without a TPM. It also seems likely that dm-verity will be faster since it
does on-demand verification of blocks; there is no need to verify entire
files before the first block can be accessed.
Wesley Miaw of Netflix made it clear that
this patch is seen with favor there:
Netflix would like dm-verity to be included in the Linux
kernel. Over the past year, we have been working with Google and
porting dm-verity onto a number of consumer electronics devices
running embedded Linux. Demand for this feature has been high and
we see a lot of benefit associated with making dm-verity part of
the official kernel.
The reasons for this interest should be fairly clear: dm-verity will make
it easier to create locked-down Linux-based systems that will enforce
whatever DRM requirements the movie studios may see fit to impose. Thanks
to dm-verity, there will no longer be pirated films circulating on the
Internet; or, perhaps, that's the sort of outcome that only happens in the
movies. Whether or not the effort is futile, it shows that tools like
dm-verity can be used to harden Linux-based systems in ways that are
hostile to their users.
To an extent, Google's interests may align with those of Netflix:
Chromebooks that can stream content from Netflix will be more attractive
than those that cannot. But dm-verity also fits the ChromeOS concepts of
minimal, trustworthy devices with data stored on Google's servers. For
users who like this mode of operation, this kind of built-in integrity
protection is a positive feature. Google can, one hopes, be trusted to
hold the user's data; a suitably verified device can be trusted not to
leak that data or the user's credentials to an attacker. Even if the running system is
compromised through some sort of malware attack, a simple reboot should
either put things right or make it clear that the machine can no longer be trusted.
As long as this functionality is under the user's control, it can be made
to serve the user's interests. The "developer mode switch" designed into
Chromebooks seems like a good compromise in this area. Some vendors will,
beyond doubt, choose to incorporate tools like dm-verity without giving
owners the ability to turn it off. That is not a good thing, but neither
is it anything new.
Patches and updates
- Thomas Gleixner: 3.0.4-rt14 (September 15, 2011)
Core kernel code
Filesystems and block I/O