Brief items
The current development kernel is 3.9-rc2, released on March 10. "Hey, things have been reasonably calm. Sure, Dave Jones has been messing with trinity and we've had some excitement from that, but Al is back, and is hopefully now busy virtually riding to the rescue on a white horse. But otherwise it's been good for this phase in the rc window."
Stable updates: no stable updates have been released in the last
week. As of this writing, the 3.8.3,
3.4.36, and
3.0.69 updates are in the review process;
they can be expected on or after March 14.
Quotes of the week
More importantly, does a vintage kernel sound better than a more
recent one? I've been doing some testing and the results are
pretty clear, not that they should surprise anyone who knows
anything about recording:
1) Older kernels sound much warmer than newer ones.
2) Kernels compiled by hand on the machine they run on sound less
sterile than upstream distro provided ones which also tend to
have flabby low end response and bad stereo imaging.
3) As if it needed saying, gcc4 is a disaster for sound quality. I
mean, seriously if you want decent audio and you use gcc4 you may
as well be recording with a tin can microphone.
— Ben Bell (Thanks to Johan Herland)
But this is definitely another of those "This is our most desperate
hour. Help me, Al-biwan Ke-Viro, you're my only hope" issues.
Al? Please don't make me wear that golden bikini.
— Linus Torvalds
Every Linux kernel maintainer with meaningful contributions to the
security of the Linux kernel will be fully sponsored by the Pax
Team. The LKSC organization team has hired strategically placed
bouncers with bats to improve Linux kernel security and future LKML
discussions.
— Pax Team
By Jonathan Corbet
March 13, 2013
The "overlayfs" filesystem is one implementation of the
union filesystem concept, whereby two or more
filesystems can be combined into a single, virtual tree. LWN first
reported on overlayfs in 2010; since then it
has seen continued development and has been shipped by a number of
distributors. It has not, however, managed to find its way into the
mainline kernel.
In a recent posting of the overlayfs patch
set, developer Miklos Szeredi asked if it could be considered for
inclusion in the 3.10 development cycle. He has made such requests before,
but, this time, Linus answered:
Yes, I think we should just do it. It's in use, it's pretty small,
and the other alternatives are worse. Let's just plan on getting
this thing done with.
At Linus's request, Al Viro has agreed to
review the patches again, though he noted that he has not been entirely
happy with them in the past. Unless something serious and unfixable
emerges from that
review, it looks like overlayfs is finally on track for merging
into the mainline kernel.
Kernel development news
By Michael Kerrisk
March 13, 2013
One of the features merged in the 3.9 development cycle was TCP and UDP
support for the SO_REUSEPORT socket option; that support was
implemented in a series of patches by Tom Herbert. The new socket option
allows multiple sockets on the same host to
bind to the same port, and is intended to improve the performance of
multithreaded network server applications running on top of multicore systems.
The basic concept of SO_REUSEPORT is simple enough. Multiple
servers (processes or threads) can bind to the same port if they each set
the option as follows:
    int sfd = socket(domain, socktype, 0);

    int optval = 1;
    setsockopt(sfd, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof(optval));

    bind(sfd, (struct sockaddr *) &addr, addrlen);
So long as the first server sets this option before binding its
socket, then any number of other servers can also bind to the same port if
they also set the option beforehand. The requirement that the first server
must specify this option prevents port hijacking—the possibility that
a rogue application binds to a port already used by an existing server in
order to capture (some of) its incoming connections or datagrams. To
prevent unwanted processes from hijacking a port that has already been
bound by a server using SO_REUSEPORT, all of the servers that
later bind to that port must have an effective user ID that matches the
effective user ID used to perform the first bind on the socket.
SO_REUSEPORT can be used with both TCP and UDP sockets. With
TCP sockets, it allows multiple listening sockets—normally each in a
different thread—to be bound to the same port. Each thread can then
accept incoming connections on the port by calling accept(). This
presents an alternative to the traditional approaches used by multithreaded
servers that accept incoming connections on a single socket.
The first of the
traditional approaches is to have a single listener thread that accepts all
incoming connections and then passes these off to other threads for
processing. The problem with this approach is that the listening thread can
become a bottleneck in extreme cases. In early
discussions on SO_REUSEPORT, Tom noted that he was dealing
with applications that accepted 40,000 connections per second. Given that
sort of number, it's unsurprising to learn that Tom works at Google.
The second of the traditional approaches used by multithreaded servers
operating on a single port is to have all of the threads (or processes)
perform an accept() call on a single listening socket in a simple
event loop of the form:
    while (1) {
        new_fd = accept(...);
        process_connection(new_fd);
    }
The problem with this technique, as Tom pointed out, is that when multiple threads are
waiting in the accept() call, wake-ups are not fair, so that,
under high load, incoming connections may be distributed across threads in a
very unbalanced fashion. At Google, they have seen a factor-of-three
difference between the thread accepting the most connections and the thread
accepting the fewest connections; that sort of imbalance can lead to
underutilization of CPU cores. By contrast, the SO_REUSEPORT
implementation distributes connections evenly across all of the threads (or
processes) that are blocked in accept() on the same port.
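As an illustration of that pattern, here is a minimal sketch of a multithreaded TCP server in which every thread opens, configures, and binds its own SO_REUSEPORT listening socket; the port number, thread count, and abbreviated error handling are arbitrary choices for the example, not part of the kernel interface:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <pthread.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define PORT     8080    /* arbitrary choice for the example */
    #define NTHREADS 4

    static void *listener(void *arg)
    {
        (void) arg;

        /* Every thread creates and binds its own listening socket */
        int sfd = socket(AF_INET, SOCK_STREAM, 0);
        int optval = 1;
        setsockopt(sfd, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof(optval));

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(PORT);
        bind(sfd, (struct sockaddr *) &addr, sizeof(addr));
        listen(sfd, SOMAXCONN);

        for (;;) {
            /* The kernel decides which thread's socket gets each
               incoming connection */
            int cfd = accept(sfd, NULL, NULL);
            if (cfd == -1)
                continue;
            /* process_connection(cfd) would go here */
            close(cfd);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];

        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, listener, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }

Each thread blocks in accept() on its own socket; the balancing decision is made in the kernel rather than in user space.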
As with TCP, SO_REUSEPORT allows multiple UDP sockets to be
bound to the same port. This facility could, for example, be useful in a
DNS server operating over UDP. With SO_REUSEPORT, each thread
could use recv() on its own socket to accept datagrams arriving on
the port. The traditional approach is that all threads would compete to
perform recv() calls on a single shared socket. As with the second
of the traditional TCP scenarios described above, this can lead to
unbalanced loads across the threads. By contrast, SO_REUSEPORT
distributes datagrams evenly across all of the receiving threads.
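The UDP variant of the pattern looks much the same. The following fragment reuses the address setup from the TCP sketch above; handle_query() is a hypothetical application function:

    /* Each thread owns its own SO_REUSEPORT-bound datagram socket
       and receives on it directly */
    int ufd = socket(AF_INET, SOCK_DGRAM, 0);
    int one = 1;
    setsockopt(ufd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
    bind(ufd, (struct sockaddr *) &addr, sizeof(addr));

    for (;;) {
        char buf[512];
        ssize_t n = recvfrom(ufd, buf, sizeof(buf), 0, NULL, NULL);
        if (n == -1)
            continue;
        /* handle_query(buf, n); */
    }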
Tom noted that the traditional
SO_REUSEADDR socket option already allows multiple UDP sockets to
be bound to, and accept datagrams on, the same UDP port. However, by
contrast with SO_REUSEPORT, SO_REUSEADDR does not prevent
port hijacking and does not distribute datagrams evenly across the
receiving threads.
There are two other noteworthy points about Tom's patches. The first
of these is a useful aspect of the implementation. Incoming connections and
datagrams are distributed to the server sockets using a hash based on the
4-tuple of the connection—that is, the peer IP address and port plus
the local IP address and port. This means, for example, that if a client
uses the same socket to send a series of datagrams to the server port, then
those datagrams will all be directed to the same receiving server (as long
as it continues to exist). This eases the task of conducting stateful
conversations between the client and server.
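Conceptually, the selection behaves like a deterministic hash over the 4-tuple. The following is an illustration of the idea only, not the kernel's actual code:

    #include <stdint.h>

    /* Illustration: a deterministic hash of the 4-tuple maps each
       flow to one listening socket, so traffic from the same client
       socket keeps landing on the same server thread */
    static unsigned int pick_socket(unsigned int nsockets,
                                    uint32_t peer_addr, uint16_t peer_port,
                                    uint32_t local_addr, uint16_t local_port)
    {
        uint32_t h = peer_addr ^ local_addr
                   ^ (((uint32_t) peer_port << 16) | local_port);
        h ^= h >> 16;
        h *= 0x45d9f3bu;       /* arbitrary integer mixing constant */
        h ^= h >> 16;
        return h % nsockets;   /* same 4-tuple -> same socket */
    }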
The other noteworthy point is that there is a defect in the current implementation of TCP
SO_REUSEPORT. If the number of listening sockets bound to a port
changes because new servers are started or existing servers terminate, it
is possible that incoming connections can be dropped during the three-way
handshake. The problem is that connection requests are tied to a specific
listening socket when the initial SYN packet is received during the
handshake. If the number of servers bound to the port changes, then the
SO_REUSEPORT logic might not route the final ACK of
the handshake to the correct listening socket. In this case, the client
connection will be reset, and the server is left with an orphaned
request structure. A solution to the problem is still being worked on, and
may consist of implementing a connection request table that can be shared
among multiple listening sockets.
The SO_REUSEPORT option is non-standard, but available in a
similar form on a number of other UNIX systems (notably, the BSDs, where the
idea originated). It seems to offer a useful alternative for squeezing the
maximum performance out of network applications running on multicore
systems, and thus is likely to be a welcome addition for some application
developers.
By Michael Kerrisk
March 13, 2013
A February linux-kernel mailing list discussion of a patch that extends
the use of the CAP_COMPROMISE_KERNEL capability soon evolved into
a discussion of the specific uses (or abuses) of the CAP_SYS_RAWIO
capability within the kernel. In reality, however, the discussion once
again exposed some general difficulties in the Linux capabilities
implementation—difficulties that seem to have no easy solution.
The discussion began when Kees Cook submitted a patch to guard writes to model-specific
registers (MSRs) with a check to see if the caller has the
CAP_COMPROMISE_KERNEL capability. MSRs are x86-specific control
registers that are used for tasks such as debugging, tracing, and
performance monitoring; those registers are accessible via the /dev/cpu/CPUNUM/msr
interface. CAP_COMPROMISE_KERNEL
(formerly known as CAP_SECURE_FIRMWARE)
is a new capability designed for use in conjunction with UEFI secure boot,
which is a mechanism to ensure that the kernel is booted from an on-disk
representation that has not been modified.
If a process has the CAP_COMPROMISE_KERNEL capability, it can
perform operations that are not allowed in a secure-boot environment;
without that capability, such operations are denied. The idea is that if
the kernel detects that it has been booted via the UEFI secure-boot
mechanism, then this capability is disabled for all processes. In turn,
the lack of that capability is intended to
prevent operations that can modify the running kernel.
CAP_COMPROMISE_KERNEL is not yet part of the mainline kernel, but
already exists as a
patch in the Fedora distribution and Matthew Garrett is working towards
its inclusion in the mainline kernel.
H. Peter Anvin wondered whether
CAP_SYS_RAWIO did not already suffice for Kees's purpose. In
response, Kees argued that
CAP_SYS_RAWIO is for governing reads: "writing needs a much
stronger check". Kees went on to
elaborate:
there's a reasonable distinction between systems that expect to
strictly enforce user-space/kernel-space separation
(CAP_COMPROMISE_KERNEL) and things that are fiddling with
hardware (CAP_SYS_RAWIO).
This in turn led to a short discussion about whether a capability was
the right way to achieve the goal of restricting certain operations in a
secure-boot environment. Kees was inclined
to think it probably was the right approach, but deferred to Matthew
Garrett, implementer of much of the secure-boot work on Fedora. Matthew
thought that a capability approach seemed the best fit, but noted:
I'm not wed to [a capability approach] in the slightest, and in
fact it causes problems for some userspace (anything that drops all
capabilities suddenly finds itself unable to do something that it
expects to be able to do), so if anyone has any suggestions for a
better approach…
In the current mainline kernel, the CAP_SYS_RAWIO capability
is checked in the msr_open() function: if the caller has that
capability, then it can open the MSR device and perform reads and writes on
it. The purpose of Kees's patch is to add a CAP_COMPROMISE_KERNEL
check on each write to the device, so that in a secure-boot environment the
MSR devices are readable, but not writeable. The problem that Matthew
alludes to is that this approach has the potential to break user space
because, formerly, there was no capability check on MSR writes. An
application that worked prior to the introduction of
CAP_COMPROMISE_KERNEL can now fail in the following
scenario:
1. The application has a full set of privileges.
2. The application opens an MSR device (requires CAP_SYS_RAWIO).
3. The application drops all privileges, including CAP_SYS_RAWIO and CAP_COMPROMISE_KERNEL.
4. The application performs a write on the previously opened MSR device (requires CAP_COMPROMISE_KERNEL).
The last of the above steps would formerly have succeeded, but, with
the addition of the CAP_COMPROMISE_KERNEL check, it now fails. In a
subsequent reply, Matthew noted that QEMU was
one program that was
broken by a scenario similar to the above. Josh Boyer noted that Fedora has had a few reports of
applications breaking on non-secure-boot systems because of scenarios like
this. He highlighted why such breakages are so surprising to users and why
the problem is seemingly unavoidable:
… the general problem is people think dropping all caps blindly
is making their apps safer. Then they find they can't do things they
could do before the new cap was added…
Really though, the main issue is that you cannot introduce new
caps to enforce finer grained access without breaking something.
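The failure mode in the four-step scenario above is easy to demonstrate. Here is a minimal sketch, assuming libcap (link with -lcap) and the /dev/cpu/0/msr device; the register offset is an arbitrary choice for illustration:

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/capability.h>
    #include <unistd.h>

    int main(void)
    {
        /* Step 2: opening the device requires CAP_SYS_RAWIO */
        int fd = open("/dev/cpu/0/msr", O_RDWR);
        if (fd == -1) {
            perror("open");
            return 1;
        }

        /* Step 3: drop all capabilities -- the "safe" idiom that
           many applications follow */
        cap_t caps = cap_init();    /* cap_init() returns an empty set */
        cap_set_proc(caps);
        cap_free(caps);

        /* Step 4: with a CAP_COMPROMISE_KERNEL check in place, this
           write now fails with EPERM, even though the descriptor was
           opened while the process was still privileged. (The msr
           driver uses the file offset as the register number.) */
        unsigned long long val = 0;
        if (pwrite(fd, &val, sizeof(val), 0x10) == -1)
            perror("pwrite");

        close(fd);
        return 0;
    }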
Shortly afterward, Peter stepped back to
ask a question about the bigger picture: why should
CAP_SYS_RAWIO be allowed on a secure-boot system? In other words,
rather than adding a new CAP_COMPROMISE_KERNEL capability that is
disabled in secure-boot environments, why not just disable
CAP_SYS_RAWIO in such environments, since it is the possession of
that capability that permits compromising a booted kernel?
That led Matthew to point out a major
problem with CAP_SYS_RAWIO:
CAP_SYS_RAWIO seems to have ended up being a catchall of "Maybe
someone who isn't entirely root should be able to do this", and not
everything it covers is equivalent to being able to compromise the
running kernel. I wouldn't argue with the idea that maybe we should
just reappraise most of the current uses of CAP_SYS_RAWIO, but
removing capability checks from places that currently have them seems
like an invitation for userspace breakage.
To see what Matthew is talking about, we need to look at a little
history. Back in January 1999, when capabilities first appeared with the
release of Linux 2.2, CAP_SYS_RAWIO was a single-purpose
capability. It was used in just a single C file in the kernel source, where
it governed access to two system calls: iopl() and
ioperm(). Those system calls permit access to I/O ports, allowing
uncontrolled access to devices (and providing various ways to modify the
state of the running kernel); hence the requirement for a capability in
order to employ the calls.
The problem was that CAP_SYS_RAWIO rapidly grew to cover a
range of other uses. By the time of Linux 2.4.0, there were 37 uses across
24 of the kernel's C source files, and looking
at the 3.9-rc2 kernel, there are 69 uses in 43 source files. By either
measure, CAP_SYS_RAWIO is now the third most commonly used
capability inside the kernel source (after CAP_SYS_ADMIN and
CAP_NET_ADMIN).
CAP_SYS_RAWIO seems to have encountered a fate similar to CAP_SYS_ADMIN,
albeit on a smaller scale. It has expanded well beyond its original narrow
use. In particular, Matthew noted:
Not having CAP_SYS_RAWIO blocks various SCSI commands, for
instance. These might result in the ability to write individual blocks
or destroy the device firmware, but do any of them permit modifying
the running kernel?
Peter had some choice words to describe
the abuse of CAP_SYS_RAWIO to protect operations on SCSI
devices. The problem, of course, is that in order to perform relatively
harmless SCSI operations, an application requires the same capability that
can trivially be used to damage the integrity of a secure-boot system. And
that, as Matthew went on to point out, is
the point of CAP_COMPROMISE_KERNEL: to disable the truly dangerous
operations (such as MSR writes) that CAP_SYS_RAWIO permits, while
still allowing the less dangerous operations (such as the
SCSI device operations).
All of this leads to a conundrum that was nicely summarized by Matthew. On the one
hand, CAP_COMPROMISE_KERNEL is needed to address the problem that
CAP_SYS_RAWIO has become too diffuse in its meaning. On the other
hand, the addition of CAP_COMPROMISE_KERNEL checks in places where
there were previously no capability checks in the kernel means that
applications that drop all capabilities will break. There is no easy way
out of this difficulty. As Peter noted:
"We thus have a bunch of unpalatable choices, **all of which are
wrong**".
Some possible resolutions of the conundrum were mentioned by Josh Boyer earlier in the thread:
CAP_COMPROMISE_KERNEL could be treated as a "hidden" capability
whose state could be modified only internally by the kernel. Alternatively,
CAP_COMPROMISE_KERNEL might be specially treated, so that it can
be dropped only by a capset() call that operates on that
capability alone; in other words, if a capset() call specified
dropping multiple capabilities, including CAP_COMPROMISE_KERNEL,
the state of the other capabilities would be changed, but not the state of
CAP_COMPROMISE_KERNEL. The problem with these approaches is that
they special-case the treatment of CAP_COMPROMISE_KERNEL in a
surprising way (and surprises in security-related APIs have a way of coming
back to bite in the future). Furthermore, it may well be the case that analogous
problems are encountered in the future with other capabilities; handling
each of these as a special case would further add to the complexity of the
capabilities API.
The discussion in the thread touched on a number of other difficulties
with capabilities. Part of the solution to the problem of the overly broad
effect of CAP_SYS_RAWIO (and CAP_SYS_ADMIN) might be to
split the capability into smaller pieces—replace one capability with
several new capabilities that each govern a subset of the operations
governed by the old capability. Each privileged operation in the kernel
would then check to see whether the caller had either the old or the new
privilege. This would allow old binaries to continue to work while allowing
new binaries to employ the new, tighter capability. The risk with this
approach is, as Casey Schaufler noted, the
possibility of an explosion in the number of capabilities, which would
further complicate administering capabilities for
applications. Furthermore, splitting capabilities in this manner doesn't
solve the particular problem that the CAP_COMPROMISE_KERNEL
patches attempt to solve for CAP_SYS_RAWIO.
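Inside the kernel, the compatibility idiom described above might look like the following sketch, where capable() is the kernel's existing privilege check and CAP_SCSI_COMMAND is a hypothetical narrower capability split out of CAP_SYS_RAWIO:

    /* Sketch only; CAP_SCSI_COMMAND does not exist in the kernel */
    static bool may_issue_scsi_command(void)
    {
        /* Old binaries keep working via the broad capability, while
           new binaries can carry only the narrower one */
        return capable(CAP_SYS_RAWIO) || capable(CAP_SCSI_COMMAND);
    }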
Another general problem touched on by
Casey is that capabilities still have not seen wide adoption as a
replacement for set-user-ID and set-group-ID programs. But, as Peter noted, that may well be
in large part because a bunch of the capabilities are so
close to equivalent to "superuser" that the distinction is
meaningless... so why go through the hassle?
With 502 uses in the 3.9-rc2 kernel, CAP_SYS_ADMIN is the most
egregious example of this problem. That problem itself would appear to
spring from the Linux kernel development model: the decisions about which
capabilities should govern new kernel features are typically made by individual
developers in a largely decentralized and uncoordinated
manner. Without a coordinated big picture, many developers have
adopted the seemingly safe choice, CAP_SYS_ADMIN. A related
problem is that a number of capabilities turn out to allow escalation
to full root privileges in certain circumstances. To some degree, this is probably
unavoidable, and it doesn't diminish the fact that a well-designed
capabilities scheme can be used to reduce the attack surface of applications.
One approach that might help solve the problem of overly broad capabilities
is hierarchical capabilities. The idea, mentioned by Peter, is to
split some capabilities in a fashion similar to the way that the root
privilege was split into capabilities. Thus, for instance,
CAP_SYS_RAWIO could become a hierarchical capability with
sub-capabilities called (say) CAP_DANGEROUS and
CAP_MOSTLY_HARMLESS. A process that gained or lost
CAP_SYS_RAWIO would implicitly gain or lose both
CAP_DANGEROUS and CAP_MOSTLY_HARMLESS, in the
same way that transitions to and from an effective user ID of 0
grant and drop all capabilities. In addition, sub-capabilities could be
raised and dropped independently of their "siblings" at the same
hierarchical level. However, sub-capabilities are not a concept that
currently exists in the kernel, and it's not clear whether the existing
capabilities API could be tweaked in such a way that they could be
implemented sanely. Digging deeper into that topic remains an open
challenge.
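As a purely hypothetical illustration of the idea (no such API or constants exist in the kernel today), a hierarchical capability might be modeled as a mask over its sub-capability bits:

    /* Hypothetical: neither these constants nor sub-capabilities
       exist in the kernel */
    #define CAP_MOSTLY_HARMLESS_BIT  (1u << 0)
    #define CAP_DANGEROUS_BIT        (1u << 1)
    #define CAP_SYS_RAWIO_MASK  (CAP_MOSTLY_HARMLESS_BIT | CAP_DANGEROUS_BIT)

    /* Gaining the parent capability grants all of its children... */
    static unsigned int gain_rawio(unsigned int caps)
    {
        return caps | CAP_SYS_RAWIO_MASK;
    }

    /* ...while a sub-capability can still be dropped independently
       of its siblings */
    static unsigned int drop_dangerous(unsigned int caps)
    {
        return caps & ~CAP_DANGEROUS_BIT;
    }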
The CAP_SYS_RAWIO discussion touched on a long list of
difficulties in the current Linux capabilities implementation: capabilities
whose range is too broad, the difficulties of splitting capabilities while
maintaining binary compatibility (and, conversely, the administrative
difficulties associated with defining too large a set of capabilities), the
as-yet poor adoption of binaries with file capabilities vis-a-vis
traditional set-user-ID binaries, and the (possible) need for an API for
hierarchical capabilities. It would seem that capabilities still have a way
to go before they can deliver on the promise of a manageable
mechanism for granting discrete, non-elevatable privileges to
applications.
By Jonathan Corbet
March 12, 2013
Many people have talked about the Android kernel code and its relation
to the mainline. One of the people who has done the most to help bring
Android and the mainline closer together is John Stultz; at the 2013 Linaro
Connect Asia event, he
talked about the status of the Android code. The picture that emerged
shows that a lot of progress has been made, but there is still a lot of
work yet to be done.
What's out there
John started by reviewing the existing Android kernel patches by
category, starting with the core code: the binder interprocess
communication mechanism, the ashmem shared
memory mechanism, the Android
logger, and monotonic event
timestamps. The timestamp patch is needed to
get timestamps from the monotonic clock for input events; otherwise it is
hard to be sure of the timing between events, which makes gesture
recognition difficult. The problem is that these timestamps cannot be added
without breaking the kernel's ABI,
so the patch cannot simply be merged without further consideration.
There is a set of changes that John categorized as
performance and power-consumption improvements. At the top of the list is
the infamous "wakelock" mechanism, used by Android to know when the system
as a whole can be suspended to save power. There is a special alarm
device that can generate alarms that will wake the system from a suspended
state. The Android low-memory killer gets rid of tasks when memory gets
tight; it is designed to activate more quickly than the kernel's
out-of-memory killer, which will not act until a memory shortage is
seriously affecting system performance. Also in this category is the
interactive CPU frequency governor, which immediately ramps the CPU up to
its maximum speed in response to touch events; its purpose is to help the
system provide the fastest response possible to user actions.
The "debugging features" category includes a USB gadget driver that
supports communication with the adb debugging tools; it is also
used to support file transfer using the media transfer
protocol (MTP). The FIQ debugger is a low-level kernel debugger
with some unique features — communication through the device's headphone
jack being one of them. The RAM console will save kernel messages for
later recovery in case
of a crash. There is the "key-reset" driver, a kind of
"control-alt-delete for phones." The patches to the ARM architecture's
"embedded trace macrocell" and
"embedded trace buffer" drivers offer improved logging of
messages from peripheral processors. Then there is the "goldfish"
emulator, derived
from QEMU, which allows Android to be run in an emulated mode on a desktop
system.
The list of networking features starts with the "paranoid networking
framework," the mechanism that controls which applications have access to
the network;
it restricts that access to members of a specific group. There is a set of
netfilter changes mostly aimed at providing better accounting for which
applications are using data. There are some Bluetooth improvements and the
Broadcom "bcmhd" WiFi driver.
In the graphics category is the ION memory
allocator, which handles DMA buffer management. The
"sync" driver provides a sort of mutex allowing applications to wait for
a vertical refresh cycle. There is also a miscellaneous category that
includes the battery meta-driver, which provides wakelock support and thermal
management. That category contains various touch screen drivers, the
"switch" class for
dealing with physical switches, and the timed GPIO facility as well.
Finally, the list of deprecated features includes the PMEM memory
allocator, the early suspend mechanism, the "apanic" driver, and the
yaffs2 filesystem, which has been replaced by ext4.
Upstreaming status
Having passed over the long list of Android patches, John moved on to
discuss where each stands with regard to upstreaming. The good news is
that some of these features are already upstream. Wakelocks are, arguably,
the most important of those; Rafael Wysocki's opportunistic
suspend work, combined with a user-space emulation library, has made it
possible for Android to move over to a mainline-based solution. John's
monotonic event timestamp patches are also in the mainline, controlled by a
special ioctl() command to avoid breaking the ABI; Android is
using this mechanism as of the 4.2 ("Jelly Bean") release. The RAM console
functionality
is available via the pstore mechanism. The
switch class is now supported via the kernel's "extcon" driver, but
Android is not yet using this functionality.
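The ioctl() in question is presumably EVIOCSCLOCKID, which selects the clock used to timestamp events on an open evdev device. A minimal sketch (device path arbitrary, error handling abbreviated):

    #include <fcntl.h>
    #include <linux/input.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/input/event0", O_RDONLY);
        if (fd == -1) {
            perror("open");
            return 1;
        }

        /* Ask for CLOCK_MONOTONIC timestamps instead of the default
           CLOCK_REALTIME; the event ABI itself is unchanged */
        int clkid = CLOCK_MONOTONIC;
        if (ioctl(fd, EVIOCSCLOCKID, &clkid) == -1)
            perror("EVIOCSCLOCKID");

        struct input_event ev;
        while (read(fd, &ev, sizeof(ev)) == sizeof(ev))
            printf("%ld.%06ld: type %u code %u value %d\n",
                   (long) ev.time.tv_sec, (long) ev.time.tv_usec,
                   (unsigned) ev.type, (unsigned) ev.code, (int) ev.value);

        close(fd);
        return 0;
    }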
A number of the Android patches are currently in the staging tree.
These include the binder, ashmem, the logger, the low-memory killer, the
alarm device, the gadget device, and the timed GPIO feature. The sync
driver was also just pulled into the staging tree for merging in the 3.10
development cycle. With all of
the staging code, John said, Android "just works" on a mainline kernel.
That does not mean that the job is done, though; quite a few Android
patches are still in need of more work to get upstream. One such patch is
the FIQ debugger; work is being done to integrate it with the kdb
debugger, but, among other problems, the developers are having a hard time
getting review attention for their patches. The key-reset driver was
partially merged for the 3.9 kernel, but there are a number of details to
be dealt with still. The plan for the low-memory killer is to integrate it
with the mempressure control group patch and
use the low-memory notification interface that is part of that mechanism; the
developers hope to merge that code sometime soon. Ashmem is to be
reimplemented via one of the volatile ranges patch sets, but
there is still no agreement on the right direction for this feature.
Much of the goldfish code has been merged for the 3.9 release.
The ION memory allocator has not yet been submitted for consideration
at all. Much of this code duplicates what has been done with the CMA allocator and the DMA buffer sharing mechanism; integrating
everything could be a challenge. There should be pieces that can be carved
out and submitted, John said, even if the whole thing requires more work.
The interactive CPU frequency driver has been rejected by the scheduler
developers in its current form. Supporting this feature properly could
require some significant reworking of the scheduler code.
The netfilter changes have been submitted for inclusion, but some
cleanup is required before they can be merged. The paranoid networking
code, by contrast, is not appropriate for upstream and will not be submitted.
The right solution here would appear to be for Android to use the network
namespaces feature, but that would require some big changes on the Android
side, so it is not clear when it might happen.
The alarm device code needs to be integrated with the kernel's timerfd subsystem. Much of that integration
has been
done, but it requires an Android interface change, which is slowing things
down. The embedded trace driver changes have been submitted, but the
developer who did that work has moved on, so the code is now unmaintained.
It is also undocumented and nobody else fully understands it at this
point. There is a desire to replace the Android gadget driver with the CCG
("configurable composite gadget") code that is currently in the staging
tree, but CCG does not yet do everything that Android needs, and it appears
to be unmaintained as well. There was talk in the session of Linaro
possibly taking over the development of that driver in the future.
Finally, it would be good to get the binder and logger patches out of the
staging tree. That, however, is "complicated stuff" and may take a while.
There is hope that the upcoming patches to support D-Bus-like communication
mechanisms in the kernel will be useful to provide binder-like
functionality as well.
There are a few issues needing longer-term thought. The integration of the
sync driver and the DMA buffer sharing mechanism is being thought through
now; there are a lot of details to be worked out. The upstreaming
of ION could bring its own challenges. Much of that code has superficial
similarities to the GEM and TTM memory managers that already exist in the
kernel. Figuring out how to merge the interactive CPU frequency driver is
going to be hard, even before one gets into details like how it plays with
the ongoing big.LITTLE initiative. Some fundamental
scheduler changes will be needed, but it's not clear who is going to do
this work. The
fact that Google continues to evolve its CPU frequency driver is not
helping in this regard. There will, in other words, be plenty to keep
developers busy for some time.
Concluding remarks
In total, John said, there are 361 Android patches for the kernel, with the
gadget driver being the largest single chunk. Some of these patches are quite
old; one of the patches actually predates Android itself. Google is not
standing still; there is new code joining that which has been around for a
while. Current areas of intensive development include ION, the sync
driver, the CPU frequency driver, the battery driver, and the netfilter
code. While some of the code is going into the mainline, the new
code adds to the pile of out-of-tree patches shipped by the Android project.
Why should we worry about this, John asked, when it really is just another
one of many forks of the kernel? Forking is how development gets done;
see, for example, the development of the realtime patches or how many
filesystems are written. But, he said, forks of entire communities, where
code does not get merged back, are more problematic. In this case, we
are seeing a lot of ARM systems-on-chip being developed with Android in
mind from the beginning, leading to an increase in the use of out-of-tree
drivers and kernels. Getting the Android base into the mainline makes it
easier for developers to work with, and makes it easier to integrate
Android-related code developed by others. John would like Android
developers to see the mainline kernel, rather than the Android world, as
their community.
Things are getting better; Zach Pfeffer pointed out that the work being
done to bring Android functionality into the mainline kernel is, indeed,
being used by the Android team. The relationship
between that team and the kernel development community is getting better in
general. It is a good time for people
who are interested to join the effort and help get things done.
[Your editor would like to thank Linaro for travel assistance to attend
this event.]