On the first day of this year's Linux Foundation Collaboration Summit,
several kernel developers sat down with moderator Greg Kroah-Hartman for
another edition of the kernel panel. The developers covered a wide range
of kernel subsystems, from graphics and memory management, to storage and
networking. As is usual, a lively discussion ensued, covering a number of
topical and longtime kernel concerns.
Audience questions for the panel are eagerly sought, Kroah-Hartman said,
noting that a similar panel at LinuxCon Japan had turned into an exchange
between the kernel hackers onstage and those in the audience. He then had
the panel introduce themselves.
Mel Gorman said that he works for SUSE
Labs on memory management along with fixing bugs in various SLES and openSUSE
kernels. John Linville of Red Hat is the maintainer for the wireless LAN
subsystem, which is, he said, not about writing "cool code" unfortunately,
but is more of an administrative role shepherding others' patches and
features. James Bottomley is the CTO for server virtualization at
Parallels and also maintains the
SCSI subsystem for the kernel. In addition, he "mucks about at the
edges" trying to make kernel development better, which is often as much a
social problem as anything else, he said. Keith Packard works for the
Intel Open Source Technology Center on graphics and window systems, as well
as doing kernel DRM (direct rendering manager) work.
Kroah-Hartman then queried Bottomley and Gorman about what went on at the
recently completed Linux Storage, Filesystem, and Memory Management Summit
(LWN coverage: day one and day two). Bottomley rattled off a few
different topics that came up in the storage and filesystems tracks including
new "weird and wild" SCSI commands
that are coming down the pipe. He joked that it was necessary to keep
Christoph Hellwig gagged while that talk was going on so that everyone
could actually hear about the commands. The summit is becoming one of the more
important kernel development meetings, he said, and it is one that, unlike
some kernel summits, actually has arguments.
Gorman also mentioned a number of different topics that were discussed in
the memory management track including the two NUMA migration schemes
that are currently floating around (Peter Zijlstra's sched/numa and Andrea Arcangeli's AutoNUMA), as well as containers and control
groups. He said that kernel hackers are now
concerned about how quickly the containers and control groups code runs,
rather than whether it will run, which was the concern in the past. The
discussions were "quite civil" at the summit, which contrasts with how they
sometimes go on the mailing list. The meetings were definitely a success,
he said, as even if a decision went against a developer's idea or plan,
they got a
good idea of why the others objected to it.
Wireless and graphics
Things are clearly getting much better in the wireless area, Kroah-Hartman
noted; it "used to be a nightmare", but, he asked Linville, is it a solved
problem now? Wireless in Linux has matured quite a bit over the last few
years, Linville said, and there are a number of companies that are now
participants in developing free Linux drivers, including Broadcom and
Qualcomm/Atheros. It really helps to have people available "who know how
the hardware works", he said.
But, wireless technology continues to evolve with things like 802.11ac
coming along (Linville called it "[802.]11n on steroids") that require
support and drivers for Linux. Bottomley asked about 802.11n compliance,
which Linville said is going well, though there are still "things to be
ironed out". The code is in place and drivers are using it, but there is
still some development to be done. All of that is helped by better support
from the bigger players, but some of the second-tier wireless hardware
providers are working on free kernel drivers as well.
Moving on to Packard and graphics, Kroah-Hartman asked about X and mobile
phones. In the past, phones shipped using X, but that really is no longer
the case, he said, and wondered why. Packard said that the last six years
had been spent "radically restructuring" graphics on Linux. The idea was
to have kernel drivers that could support more window systems, beyond just
X, because X is "not what people want anymore", he said. Today, most are
interested in compositing-based models.
Compositing is a totally different windowing system model, he said, which
is simpler. After all the work that was done, the existing kernel DRM layer
is capable of supporting all of the different options, including Wayland,
Android, and X. Android usually uses different drivers, but the idea is
similar. We are, Packard said, moving away from X as the fundamental
graphics layer for Linux; instead, the Linux kernel now serves that purpose.
From the audience, Bdale Garbee asked Linville about the state of Broadcom
drivers, noting that performance and stability of those drivers on
laptops recently was "not so great". Linville said that he assumes it will
get better, that the Broadcom drivers have not been in the kernel that
long, and that those drivers getting exposure in distributions should
help. That will lead to more bug reports, which will be beneficial. The
developers have been working well with Linville and are being diligent
about looking at bug reports and fixing the problems reported there.
There is always going to be a certain amount of lag, Linville said, because
some distributions are faster or slower about updating to the latest
kernel. But, it is the "same old story", he said, if you find bugs, report
them, and respond to the questions that are asked.
That led to a discussion of the stable kernel tree, with Bottomley noting
that some distributions are more attuned to stable than others.
Kroah-Hartman said that he tries to do a stable release each week, but that
it is rare for Broadcom bug fixes to be sent to the stable tree. Linville
said that he should remind developers to CC stable on the bug fix patches,
but that there is a somewhat tricky balance there, which requires judgment
calls on which fixes are appropriate.
Gorman said that it is common in memory management development to get
"slapped" if things aren't marked for stable that should be. But, he said,
each subsystem deals with things differently. Bottomley said that nobody
likes getting the email reminding them to send fixes to the stable
maintainers but that it is important to get those patches into the stable trees.
Next up was a question for all about their "pet peeves" in Linux kernel
development. We often see the same problems over and over, Kroah-Hartman
said; which ones are particularly irksome? Packard said that his biggest
pet peeve was outside the graphics area. He uses Bluetooth a lot and is
annoyed that every time an -rc1 kernel is released, Bluetooth breaks. It
is good, in some ways, he guesses, because now he can debug and bisect
Bluetooth problems in the kernel. But the basic problem seems to be that
it is common for Bluetooth kernel development to break all of user space.
Every time he has ever suggested that for graphics work, he got "flamed to
a crisp". If your patch breaks the user-space interface, he said, please
don't bother submitting the patch.
Bottomley is unhappy with changelogs that don't say why the change
is being made. Changelogs often say what is being done, but they don't say
what the user-visible effects are. Well-written changelogs should
not describe the change itself, he said, because that's what the C
code is for. "Almost all kernel developers can read C", he said, to
a hearty laugh from the audience. When Linville suggested that he didn't
really have a pet peeve, Bottomley immediately asked if he would be willing
to swap subsystems with him.
On further reflection, Linville echoed some of Bottomley's complaints,
noting that it is sometimes difficult to determine where a particular patch
should be sent. Because the changelogs don't clearly indicate whether the
patch is a fix or a new feature, he is left guessing whether it belongs in
the next tree, or needs to be applied more urgently. It is particularly
problematic during a merge window, he said, so the changelogs need to say
where the patch is bound.
Since he doesn't "have to maintain anything", Gorman is in a different
position. He joked that he gets to "rag on maintainers and make their life
miserable". More seriously, he pointed to mistakes that are made again and
again as a pet peeve of his. He mentioned writeback causing long delays
when writing to USB sticks as an example. That has been fixed "at least
eight times", he said, only to be broken again in the next release. We
need to do a better job checking to see that those kinds of bugs stay fixed, he said.
More audience participation
A member of the audience asked about the future of proprietary loadable
kernel modules, and Kroah-Hartman immediately said that he really didn't
see a future for them. The kernel developers have provided ways to operate
hardware from user space that can be used for proprietary drivers. As an
example, he pointed to "laser welding robots" that are driven from user space with a 3D
application that uses lots of floating point math.
If companies look at the business case, Bottomley said, it is rare that
closed drivers make sense. If a company produces standard hardware that
lots of people will require a driver for, there is no real business value
in a closed driver. For more specialized devices, user-space drivers may
make sense, he said.
Structured logging was the topic of another audience question. The idea
has been around since at least 2004, the audience member said, and some
solutions have started to appear. The problem is that users are now
supporting larger numbers of systems and "cannot manage a datacenter by
hand". Where is structured logging in the kernel headed?
Kroah-Hartman mentioned that a patch had just been merged that builds atop
dev_printk() and brings some structure to logging. There have
been recent proposals, including one at last year's kernel summit that got
derailed by a "spat over UUIDs", Bottomley said. Packard said that there
is a fear of top-down proposals for structured logging, especially if
driver writers have to specify their messages ahead of time. Some
proposals remind him of the VMS error message documentation that came in a large
binder. What's needed is a way for driver writers and others to get the
benefits of structured logging without all the problems, he said.
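
To make the idea concrete, here is a minimal Python sketch of what "structure" can mean for a log message: the free-form text a driver writer already produces, followed by a few machine-readable key/value fields that a datacenter tool can pick out without guessing at the wording. The field names (SUBSYSTEM, DEVICE), the record layout, and both helper functions are hypothetical illustrations; they are not the format used by the merged patch or by any proposal mentioned on the panel.

```python
def format_record(message, **fields):
    """Render a one-line message followed by indented KEY=value lines.

    Assumes the message itself contains no newline characters.
    """
    lines = [message]
    for key in sorted(fields):
        lines.append(f" {key.upper()}={fields[key]}")
    return "\n".join(lines)


def parse_record(record):
    """Split a record back into its message and a dict of fields."""
    message, *field_lines = record.split("\n")
    fields = {}
    for line in field_lines:
        # Split at the first '=' only, so values may contain '='.
        key, _, value = line.strip().partition("=")
        fields[key] = value
    return message, fields


# Hypothetical example values, for illustration only.
record = format_record("link is down", subsystem="pci", device="0000:00:1f.2")
message, fields = parse_record(record)
```

The point of the sketch is that the human-readable text stays free-form, so nothing has to be specified ahead of time, while tools only rely on the attached fields.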
SCSI and bufferbloat
The new SCSI commands that Bottomley mentioned at the start formed the
basis of the next question. Kroah-Hartman noted that there are more
high-speed storage devices these days that are avoiding SCSI because they
can't get the I/O operations/second (IOPS) rates that they need, so, he
asked, is SCSI dead for high speed? Bottomley said that it is really two
different questions. There is a need for storage that acts like memory,
and some of these efforts are intermediate steps that are being taken when
"what we really need is more memory".
As far as whether SCSI will survive, Bottomley was willing to bet that it
would be around for quite some time. There is a need for standards-based
storage devices so that users can purchase storage at Fry's or other stores
and know that they will work with their systems. Whether it will be SATA,
SAS, SCSI, or something else is not clear, but he believes that SCSI will
be in the mix.
Throughput used to be an important measure for storage, but that is moving
to IOPS. SCSI used to
sacrifice latency for throughput, but the reverse is happening because of
the focus on IOPS. Linville spoke up at that point to note the parallels
with bufferbloat in the networking world. Latency was de-emphasized in
networking devices for throughput and then people started wondering why
their interactivity had gone down.
That led to a question to Linville about bufferbloat and the patches that had
recently gone into the kernel to try to address some of those problems. It
is a "complicated topic", Linville said, and wireless is part of the
problem, though it is seen on both wired and wireless networks. Some of
the problems are inherent in wireless technologies because the available
bandwidth changes over short periods of time, which can lead to high
latency. There are also problems with wireless that aren't
bufferbloat-related but sometimes look like bufferbloat.
Unfortunately, no "magic solution" has presented itself to fix the general
bufferbloat problem. There needs to be an adaptive queue management
algorithm, but none is known that solves the problem. Something that works
well for wired networks is "random early discard" (RED), but that requires
lots of tuning. A recent change to measure queues in bytes, rather than
packets, helps, because queue length limits are set by the aggregate size
of the packets being sent. But there are still questions of what the
length of the byte queues should be, whether they should change, and, if so,
how often. The problem is not specific to Linux, and there are some
political issues surrounding it because not everyone believes it is a
problem—or that it is their problem.
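
The arithmetic behind these points can be sketched in a few lines of Python. The first function shows why a byte-based limit bounds latency where a packet-count limit does not; the second is a simplified form of the RED drop curve, which illustrates how many knobs need tuning. The link speed, the limits, and the thresholds are made-up values chosen for illustration, and the RED function omits details of the full algorithm, such as how the average queue length is computed.

```python
def queue_delay_seconds(queued_bytes, link_bits_per_second):
    """Time needed to drain a full queue at the given link rate."""
    return queued_bytes * 8 / link_bits_per_second


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Simplified RED curve: drop probability versus average queue length.

    Below min_th nothing is dropped, at or above max_th everything is,
    and in between the probability rises linearly up to max_p.
    """
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


LINK_BPS = 1_000_000   # hypothetical 1 Mbit/s uplink
PACKET_LIMIT = 1000    # queue limit expressed in packets
BYTE_LIMIT = 64_000    # queue limit expressed in bytes

# With a packet-count limit, the worst-case delay depends on packet size.
delay_small_packets = queue_delay_seconds(PACKET_LIMIT * 64, LINK_BPS)
delay_large_packets = queue_delay_seconds(PACKET_LIMIT * 1500, LINK_BPS)

# With a byte limit, the worst-case delay is the same for any packet mix.
delay_byte_limited = queue_delay_seconds(BYTE_LIMIT, LINK_BPS)
```

With these numbers, a queue full of 1500-byte packets takes about 12 seconds to drain, while the byte-limited queue never holds more than about half a second of data. The open questions from the panel remain: what the byte limit should be, and whether it should adapt.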
A grab bag of questions and answers
As the panel time slot wound down, there were a number of other audience
questions and kernel hacker discussions. An audience question about
participation by Linux developers in standards committees noted that it
takes money and some amount of insanity to participate in those.
Kroah-Hartman pointed out that the Linux Foundation has worked on standards
participation in the past, while Bottomley asked why there was a perception
that Linux developers and companies are not involved. He noted that in the
storage area there is a "tireless Dane" (Martin Petersen) who works on
standards. It isn't the money or doing what's needed to get on the
committees that is the problem, Bottomley said, but instead it is finding
the right people to do so. The T10, T13, and UEFI committees all have
Linux representatives, he said. If there are standards committees where
Linux is not represented, we want to know about that, Kroah-Hartman said.
Grant Likely asked about the progress of the Android patches into the
mainline; when is that job "done"? Once Android is using mainline kernels
was the answer Bottomley gave. Kroah-Hartman noted that the real problem
is on the user-space side. Kernel hackers can't do anything about changing
the Android user space, but companies like Linaro and Samsung are making
some progress in doing so. The 3.3 kernel can boot an Android user space,
but it will "eat your battery alive", he said. We are making progress, but
it will require teamwork to get there.
What are kernel hackers doing to measure power consumption was the next
audience question. Packard said that there is a lot of focus on that in
the graphics arena. They are using wall power meters to measure the power
consumption of various devices over time and trying to correlate those
measurements with active functional units in graphics devices and
system-on-chips (SoCs). They measure things like joules-per-movie, which
is a critical measure for users. There is an effort to balance the "race
to idle" with voltage and frequency scaling, he said, especially for
latency-sensitive applications like displaying movies. In addition to just
the graphics hardware, they are trying to measure memory power utilization
and bus power utilization, he said.
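
A metric like joules-per-movie follows from power-meter readings by integrating power over time. The Python sketch below does that with the trapezoidal rule; the sample readings are invented for illustration and the function name is an assumption, not part of any tool Packard described.

```python
def energy_joules(samples):
    """Integrate (time_in_seconds, power_in_watts) samples into joules.

    Uses the trapezoidal rule, so samples need not be evenly spaced,
    but they must be sorted by time.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2 * (t1 - t0)
    return total


# Invented wall-meter readings over a three-minute playback.
samples = [(0, 10.0), (60, 12.0), (120, 12.0), (180, 8.0)]

joules = energy_joules(samples)
average_watts = joules / (samples[-1][0] - samples[0][0])
```

Comparing this total across driver or governor changes, for the same movie, is one way to see whether a "race to idle" or a scaling policy uses less energy overall.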
The problem is bigger than graphics, Bottomley said. In the storage
world, the speed of the buses has been "jacked up", which increases the
power usage. On a netbook SATA link, a half-watt of power can be used just
to power the bus. Power saving for SATA buses is coming, he said.
The last topic covered was "hardware bypass", meaning devices
that take on some of the tasks that are normally handled by the kernel in
the interests of performance. Gorman pointed out that bypass is often done
to drive some "artificial metric" and that kernel developers need to know
what the metric is in order to make proper decisions. The question for
those proposing bypass should be "Why are you trying to do that?", he
said. The problem is not just for SCSI (or storage), but for all of the
various bypass (or offload) proposals.
The CPU isolation feature, which allows an administrator to remove a CPU
from those managed by the kernel to run a particular workload unperturbed
by the rest of the system, is one that Gorman mentioned. One of the
reasons that people give for wanting the feature is to avoid the
inter-processor interrupt (IPI) "storms" that can occur. But a better way
to approach that problem is to figure out why those storms are happening
and to address that instead, he said. That's "ultimately the right thing
to do" whenever bypass is suggested.
Linville noted that TCP offload engines provided a boost for some users,
but that CPU improvements have "largely erased the gains" that were made.
Bottomley said that the question should not be how to avoid the kernel
code, but instead should be how to take advantage of the work that the
kernel developers do. Essentially, the consensus was that bypass or offload
technologies are not only bypassing the kernel, but are also ignoring the
collective knowledge and abilities of the Linux kernel community.
Once again, the kernel panel gave a nice glimpse inside the heads of kernel
developers. It provided some insight into how they approach problems, and
where they think solutions generally lie. It was nice to have a mix of
"new blood" as well as "old hands" on the panel.