Brief items
The current development kernel is 3.4-rc6, released on May 6. "Another
week, another -rc - and I think we're getting close to final 3.4. So
please do test."
Stable updates: the 3.0.31 and 3.3.5 updates were released on May 7 with
the usual pile of important fixes. The 3.2.17 update, with 167 fixes, is
in the review process as of this writing; it can be expected on or after
May 11.
So [KERN_CONT] is like a defibrillator: it is good to *have* one,
but it's really bad to have to *use* one.
—
Linus Torvalds
I really love fairy tales, just not in the context of kernel code.
—
Thomas Gleixner
Quick! Everyone say something extreme for this week's LWN Quote of
the Week!
—
Jon Masters (warning: disappointing results)
Kathleen Nichols and Van Jacobson have published a paper describing a
new network queue management algorithm that, it is hoped, will play a
significant role in the solution to the bufferbloat problem.
"CoDel (Controlled Delay Management) has three major innovations that
distinguish it from prior AQMs. First, CoDel’s algorithm is not based on
queue size, queue-size averages, queue-size thresholds, rate measurements,
link utilization, drop rate or queue occupancy time. Starting from Van
Jacobson’s 2006 insight, we used the local minimum queue as a more
accurate and robust measure of standing queue. Then we observed that it is
sufficient to keep a single-state variable of how long the minimum has been
above or below the target value for standing queue delay rather than
keeping a window of values to compute the minimum. Finally, rather than
measuring queue size in bytes or packets, we used the packet-sojourn time
through the queue. Use of the actual delay experienced by each packet is
independent of link rate, gives superior performance to use of buffer size,
and is directly related to the user-visible performance."
For more information, see this
blog post from Jim Gettys. "A preliminary Linux implementation
of CoDel written by Eric Dumazet and Dave Täht is now being tested on
Ethernet over a wide range of speeds up to 10gigE, and is showing very
promising results similar to the simulation results in Kathie and Van’s
article. CoDel has been run on a CeroWrt home router as well, showing its
performance."
Kernel development news
By Jonathan Corbet
May 9, 2012
"Bufferbloat" can be thought of as the buffering of too many packets in
flight between two network end points, resulting in excessive delays and
confusion of TCP's flow control algorithms. It may seem like a simple
problem, but the simple solution—make buffers smaller—turns out not to
work. A true solution to bufferbloat requires a deeper understanding of
what is going on, combined with improved software across the net.
A new paper from
Kathleen Nichols and Van Jacobson provides some of that understanding and
an algorithm for making things better—an algorithm that has been
implemented first in Linux.
Your editor had a classic bufferbloat experience at a conference hotel last
year. An attempt to copy a photograph to the LWN server (using
scp) would consistently fail with a "response timeout" error.
There was so much buffering in the path that scp was able to
"send" the entire image before any of it had been received at the other
end. The scp utility would then wait for a response from the
remote end; that response would never come in time because most of the
image had not, contrary to what scp thought, actually been
transmitted. The solution was to use the -l option to slow down
transmission to a rate closer to what the link could actually manage. With
scp transmitting slower, it was able to come up with a more
reasonable idea for when the data should be received by the remote end.
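The size of the problem can be estimated with simple arithmetic: the time needed to drain a full buffer is its size divided by the rate of the bottleneck link. The sketch below works through that calculation; the buffer size and link rate are invented for illustration and are not measurements from the hotel network in question.

```python
def drain_time_seconds(buffered_bytes, link_bits_per_second):
    """Time for a full buffer to empty through a link of the given rate."""
    return buffered_bytes * 8 / link_bits_per_second


# Hypothetical numbers: 1 MB of queued data ahead of a 1 Mbit/s uplink.
delay = drain_time_seconds(1_000_000, 1_000_000)
print(f"{delay:.1f} seconds of queueing delay")
```

With those assumed numbers, every packet waits eight seconds behind the data already queued, which is more than enough to trip an application-level timeout.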
And that, of course, is the key to avoiding bufferbloat issues in general.
A system transmitting packets onto the net should not be sending them more
quickly than the slowest link on the path to the destination can handle
them. TCP implementations are actually designed to figure out what the
transmission rate should be and stick to it, but massive buffering defeats
the algorithms used to determine that rate. One way around this problem is
to force users to come up with a suitable rate manually, but that is not
the sort of network experience most users want to have. It would be far
better to find a solution that Just Works.
Part of that solution, according to Nichols and Jacobson, is a new
algorithm called
CoDel (for "controlled delay"). Before describing that algorithm, though,
they make it clear that just making buffers smaller is not a real solution
to the problem. Network buffers serve an important function: they absorb
traffic spikes and equalize packet rates into and out of a system. A long
packet queue is not necessarily a problem, especially during the startup
phase of a network connection, but long queues as a steady state just add
delays without improving throughput at all. The point of CoDel is to allow
queues to grow when needed, but to try to keep the steady state at a
reasonable level.
Various active queue management (AQM) algorithms have been tried over the
years; they have tended to suffer from complexity and a need for manual
configuration. Having to tweak parameters by hand was never a great
solution even in ideal situations, but it
fails completely in situations where the network load or link delay time
can vary widely over time. Such situations are the norm on the
contemporary Internet; as a result, there has been little use of active
queue management even in the face of obvious problems.
One of the key insights in the design of CoDel is that there is only one
parameter that really matters: how long it takes a packet to make its way
through the queue and be sent on toward its destination. And, in
particular, CoDel is interested in the minimum delay time over a
time interval of interest. If that minimum is too high, it indicates a
standing backlog of packets in the queue that is never being cleared, and
that, in turn, indicates that too much buffering is going on.
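To see why the minimum is the interesting statistic, consider two sets of per-packet queue delays observed over one interval. The sample values and the 5ms threshold in this sketch are assumptions chosen for illustration; both windows share the same peak delay, but only one of them never drains.

```python
ACCEPTABLE_DELAY = 0.005  # seconds; an assumed threshold for this example


def standing_delay(sojourn_times):
    """The minimum delay in a window is the delay that no packet escaped."""
    return min(sojourn_times)


# Hypothetical per-packet queue delays (seconds) over one interval.
burst = [0.040, 0.030, 0.012, 0.002, 0.001]      # a spike that drains
standing = [0.040, 0.038, 0.041, 0.039, 0.040]   # a backlog that never clears

for name, window in (("burst", burst), ("standing", standing)):
    backlog = standing_delay(window) > ACCEPTABLE_DELAY
    print(name, "standing backlog" if backlog else "queue drained")
```

A maximum or an average would flag the harmless burst as well; the minimum flags only the queue that stays full.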
So CoDel works by adding a timestamp to each packet as it is received and
queued. When the packet reaches the head of the queue, the time spent in
the queue is calculated; it is a simple calculation of a single value, with
no locking required, so it will be fast.
Less time spent in queues is always better, but that time cannot always be
zero. Built into CoDel is a maximum acceptable queue time, called
target; if a packet's time in the queue exceeds this value, then the
queue is deemed to be too long. But an overly-long queue is not, in
itself, a problem, as long as the queue empties out again. CoDel defines a
period (called interval) during which the time spent by packets in
the queue should fall below target at least once; if that does not
happen, CoDel
will start dropping packets. Dropped packets are, of course, a signal to
the sender that it needs to slow down, so, by dropping them, CoDel should
cause a reduction in the rate of incoming packets, allowing the queue to
drain. If the queue time remains above target, CoDel will drop
progressively more packets. And that should be all it takes to keep queue
lengths at reasonable values on a CoDel-managed node.
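The logic described above can be sketched in a few dozen lines. This is a simplified illustration, not the Linux patch: the class and method names are invented, and refinements found in the published pseudocode (such as how the drop count is carried between dropping episodes) are left out. The one detail borrowed from the paper rather than from the text above is the spacing of successive drops, which shrinks with the inverse square root of the number of drops so far.

```python
import math
from collections import deque


class CoDelQueue:
    """Simplified CoDel sketch; all times are in seconds."""

    def __init__(self, target=0.005, interval=0.100):
        self.target = target
        self.interval = interval
        self.packets = deque()    # entries are (enqueue_time, packet)
        self.first_above = None   # deadline for the delay to dip below target
        self.dropping = False     # True while in the dropping state
        self.drop_next = 0.0      # time of the next scheduled drop
        self.count = 0            # drops made in this dropping episode
        self.drops = []           # (time, packet) log, for inspection only

    def enqueue(self, packet, now):
        # Timestamp the packet as it is queued.
        self.packets.append((now, packet))

    def _take_head(self, now):
        """Remove the head packet; report whether dropping it is justified."""
        if not self.packets:
            self.first_above = None
            return None, False
        queued_at, packet = self.packets.popleft()
        if now - queued_at < self.target:
            # The delay fell below target, so there is no standing queue.
            self.first_above = None
            return packet, False
        if self.first_above is None:
            # First packet above target: start the interval clock.
            self.first_above = now + self.interval
            return packet, False
        return packet, now >= self.first_above

    def _drop(self, packet, now):
        self.drops.append((now, packet))
        self.count += 1

    def dequeue(self, now):
        """Return the next packet to transmit, or None if the queue is empty."""
        packet, may_drop = self._take_head(now)
        if self.dropping:
            if not may_drop:
                self.dropping = False
            while self.dropping and now >= self.drop_next:
                self._drop(packet, now)
                packet, may_drop = self._take_head(now)
                if may_drop:
                    # Schedule drops closer together as the count grows.
                    self.drop_next += self.interval / math.sqrt(self.count)
                else:
                    self.dropping = False
        elif may_drop:
            self.count = 0
            self._drop(packet, now)
            packet, _ = self._take_head(now)
            self.dropping = True
            self.drop_next = now + self.interval / math.sqrt(self.count)
        return packet
```

A caller would invoke enqueue() when a packet arrives and dequeue() when the link is ready to send, passing the current time to both. Keeping the clock outside the class makes the behavior easy to exercise in a simulation.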
The target and interval parameters may seem out of place in
an algorithm that is advertised as having no knobs in need of tweaking.
What the authors have found, though, is that a target of 5ms and an
interval of 100ms work well in just about any setting. The use of
time values (rather than packet or byte counts) makes the algorithm
function independently of the speed of the links it is managing, so there
is no real need to adjust them. Of course, as they note, these are early
results based mostly on simulations; what is needed now is experience
using a functioning implementation on the real Internet.
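The point about link-speed independence can be illustrated by converting the 5ms target into the amount of queued data it represents at different rates. The link rates below are arbitrary examples; a byte-based threshold would have to be retuned for each one, while the time-based target stays the same.

```python
TARGET_SECONDS = 0.005  # the 5ms target discussed above


def bytes_in_flight(link_bits_per_second, delay_seconds=TARGET_SECONDS):
    """Amount of queued data that takes delay_seconds to drain."""
    return link_bits_per_second * delay_seconds / 8


# Arbitrary example link rates, in bits per second.
for label, rate in (("1 Mbit/s", 1e6), ("100 Mbit/s", 100e6), ("10 Gbit/s", 10e9)):
    print(f"{label:>10}: {bytes_in_flight(rate):,.0f} bytes")
```

The same 5ms corresponds to a few hundred bytes on the slowest of these links and to several megabytes on the fastest.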
That experience may not be long in coming, at least for some kinds of
links; there is now a CoDel patch for Linux
available thanks to Dave Täht and Eric Dumazet. This code is likely to
find its way into the mainline fairly quickly; it will also be available in
the CeroWrt
router distribution. As the early CoDel implementation starts to see some
real use, some shortcomings will doubtless be encountered and it may well
lose some of its current simplicity. But it has every appearance of being
an important component in the solution to the bufferbloat problem.
Of course, it's not the only component; the problem is more complex than
that. There is still a need to look at buffer sizes throughout the stack;
in many places, there is simply too much buffering where it can
do no good. Wireless networking adds some interesting challenges of its
own, with its quickly varying link speeds and complexities added by packet
aggregation. There is also the little problem of getting updated software
distributed across the net. So a full solution is still somewhat distant,
but the understanding of the problem is clearly growing and some
interesting approaches are beginning to appear.
By Jonathan Corbet
May 8, 2012
With the release of the
3.4-rc6 prepatch,
Linus let it be known
that he thought the final 3.4 release was probably not too far away. That
can only mean one thing: it's time to look at the statistics for this
development cycle. 3.4 was an active cycle, with an interesting surprise
or two.
As of this writing, Linus has merged just over 10,700 changes for 3.4; those changes
were contributed by 1,259 developers. The total growth of the kernel
source this time around is 215,000 lines. The developers most active in
this cycle were:
Most active 3.4 developers, by changesets:
| Developer | Changesets | Percent |
| Mark Brown | 284 | 2.7% |
| Russell King | 211 | 2.0% |
| Johannes Berg | 147 | 1.4% |
| Al Viro | 136 | 1.3% |
| Axel Lin | 133 | 1.2% |
| Johan Hedberg | 122 | 1.1% |
| Guenter Roeck | 121 | 1.1% |
| Masanari Iida | 109 | 1.0% |
| Stanislav Kinsbursky | 97 | 0.9% |
| Trond Myklebust | 85 | 0.8% |
| Jiri Slaby | 82 | 0.8% |
| Ben Hutchings | 82 | 0.8% |
| Greg Kroah-Hartman | 78 | 0.7% |
| Takashi Iwai | 78 | 0.7% |
| Dan Carpenter | 78 | 0.7% |
| Stephen Warren | 76 | 0.7% |
| Stanislaw Gruszka | 76 | 0.7% |
| Alex Deucher | 73 | 0.7% |
Most active 3.4 developers, by changed lines:
| Developer | Lines changed | Percent |
| Joe Perches | 56571 | 8.1% |
| Dan Magenheimer | 24077 | 3.4% |
| Stephen Rothwell | 17354 | 2.5% |
| Greg Kroah-Hartman | 15015 | 2.1% |
| Mark Brown | 12266 | 1.8% |
| Jiri Olsa | 11842 | 1.7% |
| Mark A. Allyn | 10976 | 1.6% |
| Stephen Warren | 10386 | 1.5% |
| Arun Murthy | 9347 | 1.3% |
| Ingo Molnar | 8779 | 1.3% |
| Alex Deucher | 8770 | 1.3% |
| David Howells | 8034 | 1.2% |
| Guenter Roeck | 7634 | 1.1% |
| Chris Kelly | 7023 | 1.0% |
| Johannes Berg | 6657 | 1.0% |
| Ben Hutchings | 6650 | 1.0% |
| Al Viro | 6628 | 0.9% |
| Russell King | 6610 | 0.9% |
Mark Brown finds himself at the top of the list of changeset contributors
for the second cycle in a row; as usual, he has done a great deal of work
with sound drivers and related subsystems. Russell King is the chief ARM
maintainer; he has also taken an active role in the refactoring and cleanup
of the ARM architecture code. Johannes Berg continues to do a lot of work
with the mac80211 layer and the iwlwifi driver, Al Viro has been improving
the VFS API and fixing issues throughout the kernel, and Axel Lin has done
a lot of cleanup work in the ALSA and regulator subsystems and beyond.
Joe Perches leads the "lines changed" column with coding-style fixes, pr_*() conversions, and related work.
Dan Magenheimer added the "ramster" memory sharing mechanism to the staging
tree.
Linux-next maintainer Stephen Rothwell made it into the "lines
changed" column with the removal of a lot of old PowerPC code. Greg
Kroah-Hartman works all over the tree, but the bulk of his changed lines
were to be found in the staging tree.
Some 195 companies contributed changes during the 3.4 development cycle.
The top contributors this time around were:
Most active 3.4 employers, by changesets:
| Employer | Changesets | Percent |
| (None) | 1156 | 10.8% |
| Intel | 1138 | 10.6% |
| Red Hat | 960 | 9.0% |
| (Unknown) | 688 | 6.4% |
| Texas Instruments | 428 | 4.0% |
| IBM | 381 | 3.6% |
| Novell | 372 | 3.5% |
| (Consultant) | 298 | 2.8% |
| Wolfson Microelectronics | 286 | 2.7% |
| Samsung | 234 | 2.2% |
| Google | 222 | 2.1% |
| Oracle | 188 | 1.8% |
| Freescale | 175 | 1.6% |
| Qualcomm | 161 | 1.5% |
| Linaro | 143 | 1.3% |
| Broadcom | 140 | 1.3% |
| NetApp | 133 | 1.2% |
| MiTAC | 133 | 1.2% |
| AMD | 132 | 1.2% |
Most active 3.4 employers, by lines changed:
| Employer | Lines changed | Percent |
| (None) | 108509 | 15.5% |
| Intel | 67464 | 9.7% |
| Red Hat | 65966 | 9.4% |
| (Unknown) | 50900 | 7.3% |
| IBM | 36800 | 5.3% |
| Oracle | 26617 | 3.8% |
| Texas Instruments | 25687 | 3.7% |
| Samsung | 24966 | 3.6% |
| NVidia | 20604 | 2.9% |
| Linux Foundation | 16917 | 2.4% |
| ST Ericsson | 15792 | 2.3% |
| Novell | 15185 | 2.2% |
| Wolfson Microelectronics | 14039 | 2.0% |
| (Consultant) | 13495 | 1.9% |
| AMD | 10151 | 1.5% |
| Freescale | 10102 | 1.4% |
| Linaro | 9360 | 1.3% |
| Google | 9070 | 1.3% |
| Qualcomm | 8972 | 1.3% |
A longstanding invariant in the above table has been Red Hat as the top
corporate contributor; in 3.4, however, Red Hat has been pushed down one
position by Intel. Red Hat's contributions are down somewhat; 960
changesets in 3.4 compared to 1,290 in 3.3. But the more significant
change is the burst of activity from Intel. This work is mostly
centered around support for Intel's own hardware, as one would expect, but
also extends to things like support for the x32 ABI.
Meanwhile, Texas Instruments continues the growth in participation seen
over the last few years, as do a number of other mobile and embedded
companies. Once upon a time, it was said that Linux development was
dominated by "big iron" enterprise-oriented companies; those companies have
not gone away, but they are clearly not the only driving force behind Linux
kernel development at this point.
On the other hand, the participation by volunteers is at the
lowest level seen in many cycles, continuing a longstanding trend.
A brief focus on ARM
Recent development cycles have seen a lot of work in the ARM subtree, and
3.4 is no exception; 1,100 changesets touched code in arch/arm
this time around. Those changes were contributed by 178 developers
representing 51 companies. Among those companies, the most active were:
Most active 3.4 employers (ARM subtree), by changesets:
| Employer | Changesets | Percent |
| (Consultant) | 149 | 13.5% |
| Texas Instruments | 121 | 11.0% |
| (None) | 103 | 9.4% |
| Samsung | 91 | 8.3% |
| Linaro | 80 | 7.3% |
| NVidia | 54 | 4.9% |
| ARM | 52 | 4.7% |
| (Unknown) | 48 | 4.4% |
| Calxeda | 46 | 4.2% |
| Freescale | 40 | 3.6% |
| Atmel | 37 | 3.4% |
| Atomide | 30 | 2.7% |
| OpenSource AB | 24 | 2.2% |
| Google | 23 | 2.1% |
| ST Ericsson | 23 | 2.1% |
Most active 3.4 employers (ARM subtree), by lines changed:
| Employer | Lines changed | Percent |
| Samsung | 8162 | 16.8% |
| (None) | 5967 | 12.3% |
| NVidia | 4929 | 10.2% |
| (Consultant) | 4755 | 9.8% |
| Linaro | 3550 | 7.3% |
| Texas Instruments | 3118 | 6.4% |
| ARM | 2659 | 5.5% |
| Calxeda | 2408 | 5.0% |
| Atmel | 2080 | 4.3% |
| (Unknown) | 1862 | 3.8% |
| Vista-Silicon S.L. | 1121 | 2.3% |
| Freescale | 1117 | 2.3% |
| Atomide | 1005 | 2.1% |
| Google | 737 | 1.5% |
| PHILOSYS Software | 659 | 1.4% |
ARM is clearly an active area for consultants, who contributed over 13% of
the changes this time around. Otherwise, there are few surprises to be
seen in this area; the companies working in the mobile area are the biggest
contributors to the ARM tree, while those focused on other types of systems
have little presence here.
There is one other way to look at ARM development. Much of the work on ARM
is done through the Linaro consortium. Many developers contributing code
from a linaro.com address are "on loan" from other companies; the above
table, to the extent possible, credits those changes to the "real" employer
that paid for the work. If, instead, all changes from a Linaro address are
credited to Linaro, the results change: Linaro, with 11.9% of all the
changes in arch/arm, becomes the top employer, though it still
accounts for fewer changes than independent consultants do. Linaro clearly
has become an important part of the ARM development community.
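The difference between the two ways of counting can be shown with a small tally. Everything in this sketch is hypothetical: the addresses, the employers assigned to them, and the function name are made up to illustrate the two attribution policies, and the real statistics are of course generated from the kernel's commit history.

```python
from collections import Counter

# Hypothetical changesets: (author address, employer that paid for the work).
CHANGESETS = [
    ("alice@linaro.example", "Samsung"),
    ("bob@linaro.example", "Texas Instruments"),
    ("carol@linaro.example", "Linaro"),
    ("dave@ti.example", "Texas Instruments"),
]


def tally(changesets, credit_linaro_addresses=False):
    """Count changesets per employer under one of two attribution policies."""
    counts = Counter()
    for address, paying_employer in changesets:
        if credit_linaro_addresses and address.endswith("@linaro.example"):
            counts["Linaro"] += 1
        else:
            counts[paying_employer] += 1
    return counts


print(tally(CHANGESETS))
print(tally(CHANGESETS, credit_linaro_addresses=True))
```

Under the first policy the made-up Linaro total is one changeset; under the second it is three, with the other employers shrinking accordingly.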
In summary, it has been another busy and productive development cycle in
the kernel community. Despite the usual hiccups, things are stabilizing
and chances are good that 3.4-rc7 will be the last prepatch, meaning that
this cycle will be a relatively short one. There is little rest for kernel
developers, though; the 3.5 cycle with its frantic merge window will start
shortly thereafter. Stay tuned to LWN, as always, for ongoing coverage of
development in this large and energetic community.
By Jonathan Corbet
May 9, 2012
The diversity of the ARM architecture is one of its great strengths:
manufacturers have been able to create a wide range of interesting
system-on-chip devices around the common ARM processor core. But this
diversity, combined with a general lack of hardware discoverability, makes
ARM systems hard to support in the kernel. As things stand now, a special
kernel must be built for any specific ARM system. With most other
architectures, it is possible to support most or all systems with a single
binary kernel (or maybe two for 32-bit and 64-bit configurations). In the
ARM realm, there is no single binary kernel that can run everywhere. Work
is being done to improve that situation, but some interesting decisions
will have to be made on the way.
On an x86 system, the kernel is, for the most part, able to boot and ask
the hardware to describe itself; kernels can thus configure themselves for
the specific system on which they are run. In the ARM world, the hardware
usually has no such capability, so the kernel must be told which devices
are present and where they can be found. Traditionally, this configuration
has been done in "board files," which have a number of tasks:
- Define any system-specific functions and setup code.
- Create a description of the available peripherals, usually through
the definition of a number of platform
devices.
- Create a special machine description structure that includes a magic
number defined for that particular system. That number must be passed
to the kernel by the bootloader; the kernel uses it to find the
machine description for the specific system being booted.
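The third task amounts to a table lookup at boot time. The sketch below models it in Python purely as an illustration; the kernel's board files are C code, and the machine numbers, board names, and device lists here are invented.

```python
from dataclasses import dataclass, field


@dataclass
class MachineDesc:
    """Stand-in for a board file's machine description structure."""
    number: int                 # the "magic number" assigned to the board
    name: str
    platform_devices: list = field(default_factory=list)


# Hypothetical boards built into one kernel for a single platform type.
MACHINES = {m.number: m for m in (
    MachineDesc(1001, "Example Board A", ["uart0", "mmc0"]),
    MachineDesc(1002, "Example Board B", ["uart0", "eth0", "lcd0"]),
)}


def lookup_machine(number_from_bootloader):
    """Find the description matching the number the bootloader passed in."""
    try:
        return MACHINES[number_from_bootloader]
    except KeyError:
        raise LookupError(
            f"unrecognized machine number {number_from_bootloader}") from None


machine = lookup_machine(1002)
print(machine.name, machine.platform_devices)
```

A number that matches none of the built-in descriptions leaves the kernel with no way to know what hardware it is running on, which is why each board needs its own entry.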
There are currently hundreds of board files in the ARM architecture
subtree, and some unknown number of them shipped in devices but never
contributed upstream. Within a given platform type (a specific
system-on-chip line from a vendor), it is often possible to build
multiple board files into a single kernel, with the actual machine type
being specified at boot time. But combining board files across platform
types is not generally possible.
One of the main goals of the current flurry of work in the ARM subtree is
to make multi-platform kernels possible. An important step in that
direction is the
elimination of board files as much as possible; they are being replaced
with device trees. In the end, a board
file is largely a static data structure describing the topology of the
system; that data structure can just as easily be written as a text file,
compiled to a binary blob, and passed into the kernel by the boot loader. By moving the hardware
configuration information out of the kernel itself, the ARM developers make
the kernel more easily applicable to a wider variety of hardware. There
are a lot of other things to be done before we have true multi-platform
support—work toward properly abstracting interrupts and clocks continues,
for example—but device tree support is an important piece of the puzzle.
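The idea of separating the hardware description from the code that consumes it can be sketched as follows. A Python dictionary stands in for the device tree here, and the node names, "compatible" strings, addresses, and driver table are all hypothetical; real device trees have their own source syntax and are matched to drivers inside the kernel.

```python
# A hypothetical hardware description, kept separate from the generic code.
BOARD_DESCRIPTION = {
    "compatible": "example,board-a",
    "children": [
        {"name": "serial", "compatible": "example,uart", "reg": 0x10000000},
        {"name": "ethernet", "compatible": "example,eth", "reg": 0x10010000},
    ],
}

# Generic "drivers", selected by the compatible string of each node.
DRIVERS = {
    "example,uart": lambda node: f"uart at {node['reg']:#x}",
    "example,eth": lambda node: f"ethernet at {node['reg']:#x}",
}


def probe(description, drivers=DRIVERS):
    """Bind a driver to every described device that has one available."""
    bound = []
    for node in description["children"]:
        driver = drivers.get(node["compatible"])
        if driver is not None:
            bound.append(driver(node))
    return bound


print(probe(BOARD_DESCRIPTION))
```

Supporting a different board then means supplying a different description; the probe() code does not change.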
Arnd Bergmann recently posed a question to
the kernel development community: does it make sense to support legacy board
files in multi-platform kernels? Or would it be better to limit support to
systems that use device trees for hardware enumeration? Arnd was pretty
clear on what his own position was:
My feeling is that we should just mandate DT booting for
multiplatform kernels, because it significantly reduces the
combinatorial space at compile time, avoids a lot of legacy board
files that we cannot test anyway, reduces the total kernel size and
gives an incentive for people to move forward to DT with their
existing boards.
There was a surprising amount of opposition to this idea. Some developers
seemed to interpret Arnd's message as a call to drop support for systems
that lack device tree support, but that is not the point at all. Current
single-platform builds will continue to work as they always have; nobody is
trying to take that away. The point, instead, is to make life easier for
developers trying to make multi-platform builds work; multi-platform ARM
kernels have never worked in the past, so excluding some systems will not
deprive their users of anything they already had.
Some others saw it as an arbitrary restriction without any real technical
basis. There is nothing standing in the way of including non-device-tree
systems in a multi-platform kernel except the extra complexity and bloat
that they bring. But complexity and bloat are technical problems,
especially when the problem being solved is difficult enough as it is.
It was also pointed out that there are some older
platforms that have not seen any real maintenance in recent times, but
which are still useful for users.
In the end, it will come down to what the users of multi-platform ARM
kernels want. It was not immediately clear to everybody that there are users
for such kernels: ARM kernels are usually targeted to specific devices, so
adding support for other systems gives no benefit at all. Thus, embedded
systems manufacturers are likely to be uninterested in multi-platform
support. Distributors are another story, though; they would like to
support a wide range of systems without having to build large numbers of
kernels. As Debian developer Wookey put
it:
We are keen on multiplatform kernels because building a great pile
of different ones is a massive pain (and not just for arm because
it holds up security updates), and if we could still cover all that
lot with one kernel, or indeed any number less than 7 that would be
great.
In response, Arnd amended his proposal to
allow board files for subarchitectures that don't look likely to support
device trees anytime soon. At that point, the discussion wound down
without any sort of formal conclusion. The topic will likely be discussed
at the upcoming Linaro Connect event and, probably, afterward as well.
There are a number of other issues to be dealt with before multi-platform
ARM kernels are a reality; that gives some time for this particular
decision to be considered with all the relevant needs in mind.
Page editor: Jonathan Corbet