Brief items
The current development kernel is 3.5-rc7,
released on July 14. "Hey guys,
remember how things have been stabilizing and slowing down, and all the
kernel developers were off on summer vacation? Yeah, we need to talk about
that." He is still hoping this is the last -rc before the final 3.5
release.
Stable updates: the 3.0.37 and 3.4.5 updates were released on July 16;
3.2.23 was released on July 13. The
3.0.38 and 3.4.6 updates are in the review process as of
this writing; they can be expected on or after July 19.
If I have to explain this I'm going to be very sad.
[...]
-#define HV_LINUX_GUEST_ID_HI 0xB16B00B5
+#define HV_LINUX_GUEST_ID_HI 0x0DEFACED
-- Signed-off-by: Matthew Garrett
And to make it worse, "x" itself was the result of doing
"*&y". Which was probably written by the insane monkey's older
brother, Max, who has been chewing Quaaludes for a few years, and
as a result _his_ brain really isn't doing too well either. Even
for a monkey. And now you're letting *him* at your keyboard too?
-- Linus Torvalds
Linux.com interviews Dave Jones, Fedora's kernel maintainer, as part of its
Kernel Developers series. "I needed to build my own kernel, because none of the distros shipped one that supported something I needed. And the feature I needed was only available in the development tree at the time (which was 2.1.x at the time). I don't recall what it was, but I think it may have been something silly like VFAT. Things weren't always stable, so I got into a habit of updating regularly (by carrying the latest tarball on a zip disk from the university to home). I started sending patches for things wherever I saw something I thought I could improve. I'm struggling to remember my first real accomplishment. It may have been fixing AFFS during the 2.1.x series. There were a whole bunch of really minor things before then."
Kernel development news
By Jonathan Corbet
July 17, 2012
Secure communications are dependent on good cryptography, and cryptography,
in turn, is dependent on good random numbers. When cryptographic keys are
generated from insufficiently-random values, they may turn out to be easily
guessable by an attacker, leaving their user open to eavesdropping and
man-in-the-middle attacks. For this reason, quite a bit of attention has
been put into random number generation, but that does not mean that
problems do not still exist. A set of patches intended for merging into
the 3.6 kernel highlights some of the current concerns about random number
generation in Linux.
Computing systems traditionally do not have sources of true randomness
built into them, so they have instead attempted to extract randomness
from the environment in which they operate. The fine differences in timing
between a user's keystrokes are one source of randomness, for example.
The kernel can also use factors like the current time, interrupt timing,
and more. For a typical desktop system, such sources usually provide
enough randomness for the system's needs. Randomness gets harder to come
by on server systems without a user at the keyboard. But the hardest
environment of all may be embedded systems and network routers; these
systems may perform
important security-related tasks (such as the generation of host keys)
before any appreciable randomness has been received from the environment.
As Zakir Durumeric, Nadia Heninger, J. Alex Halderman, and
Eric Wustrow have documented, many of
the latter class of systems are at risk, mostly as a result of keys
generated with insufficient randomness and predictable initial conditions.
They write: "We found that 5.57% of TLS hosts and 9.60% of SSH hosts
share public keys in an apparently vulnerable manner, due to either
insufficient randomness during key generation or device default
keys." They were also able to calculate the actual keys used for a
rather smaller (but still significant) percentage of hosts. Their site
includes a key checker; concerned administrators may point it at their
hosts to learn if their keys are vulnerable.
Fixes for this problem almost certainly need to be applied at multiple
levels, but kernel-level fixes seem particularly important since the kernel
is the source for most random numbers used in cryptography. To that end,
Ted Ts'o has put together a set of patches
designed to increase the amount of randomness available to the system from
the moment it first boots. Getting there involves making a number of changes.
One of those is to fix the internal add_interrupt_randomness()
function, which is used to derive randomness from interrupt timing. Use of
this function has been declining in recent years, as a result of both its
cost and concerns about the actual randomness of many interrupt sources.
Ted's patch set tries to address the cost by batching interrupt-derived
randomness on a per-CPU basis and only occasionally mixing it into the
system-wide entropy pool. That mixing is also done with a new, lockless
algorithm; this algorithm contains some small race conditions, but those
could be seen to make the result even more random. An attempt is
made to increase the amount of randomness obtained from interrupts by
mixing in additional data, including the value of the instruction pointer
at the time of the interrupt. After this change, adding randomness from
interrupts should be fast and effective, so it is done by default for
all interrupts; the IRQF_SAMPLE_RANDOM interrupt flag no
longer has any effect.
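The batching scheme described above can be sketched as a small user-space C simulation, assuming a single CPU. Everything here is hypothetical: the fast_pool and global_pool structures, the rotate-and-XOR mixing, and the FLUSH_EVERY threshold of 64 interrupts only illustrate the idea of cheap per-CPU accumulation with occasional flushes into the shared pool; the kernel's actual code, constants, and mixing function differ.

```c
#include <stdint.h>

#define FAST_POOL_WORDS   4
#define FLUSH_EVERY       64   /* hypothetical batching threshold */
#define GLOBAL_POOL_WORDS 32

/* One of these per CPU in the real design; no lock is needed because
 * only its own CPU ever touches it. */
struct fast_pool {
    uint32_t pool[FAST_POOL_WORDS];
    unsigned int count;        /* interrupts mixed since the last flush */
    unsigned int rotate;
};

/* Stand-in for the shared, system-wide entropy pool. */
struct global_pool {
    uint32_t words[GLOBAL_POOL_WORDS];
    unsigned int pos;
    unsigned int flushes;
};

static uint32_t rol32(uint32_t w, unsigned int s)
{
    s &= 31;
    return s ? (w << s) | (w >> (32 - s)) : w;
}

/* Cheap mixing step: rotate each input word and XOR it into the pool. */
static void fast_mix(struct fast_pool *f, const uint32_t in[FAST_POOL_WORDS])
{
    for (int i = 0; i < FAST_POOL_WORDS; i++) {
        f->pool[i] ^= rol32(in[i], f->rotate);
        f->rotate = (f->rotate + 7) & 31;
    }
    f->count++;
}

/* Called on every interrupt with the cycle counter, IRQ number, and the
 * interrupted instruction pointer; the shared pool is touched only once
 * every FLUSH_EVERY interrupts. */
static void sample_interrupt(struct fast_pool *f, struct global_pool *g,
                             uint64_t cycles, uint32_t irq, uint64_t ip)
{
    uint32_t in[FAST_POOL_WORDS] = {
        (uint32_t)cycles, irq, (uint32_t)ip, (uint32_t)(ip >> 32)
    };

    fast_mix(f, in);
    if (f->count < FLUSH_EVERY)
        return;

    for (int i = 0; i < FAST_POOL_WORDS; i++) {
        g->words[g->pos] ^= f->pool[i];
        g->pos = (g->pos + 1) % GLOBAL_POOL_WORDS;
    }
    g->flushes++;
    f->count = 0;
}
```

Because each CPU owns its fast pool, the per-interrupt path needs no lock; only the occasional flush touches shared state, which is where the cost savings come from.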
Next, the patch set adds a new function:
void add_device_randomness(const void *buf, unsigned int size);
The purpose is to allow drivers to mix in device-specific data that, while
not necessarily random, is system-specific and unpredictable. Examples
include serial, product, and manufacturer information from attached USB
devices, the "write counter" from some realtime clock devices, and the MAC
address from network devices. Most of this data should be random from the
point of view of an attacker; it should help to prevent the situation where
multiple, newly-booted devices generate the same keys.
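A driver would call add_device_randomness() with whatever identifying data it has on hand. The C sketch below is a user-space mock with the same signature; the pool, the mixing step, and example_probe() are hypothetical. It assumes, in line with the "not necessarily random" caveat above, that this data is stirred into the pool without being credited as entropy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_BYTES 64

/* Stand-in for the kernel's entropy pool state. */
static uint8_t pool[POOL_BYTES];
static size_t pool_pos;
static unsigned int entropy_count;   /* bits credited as truly random */

/* User-space mock with the same shape as the kernel function: the data
 * is stirred into the pool, but entropy_count is deliberately left
 * alone, since a MAC address or serial number is unique rather than
 * unpredictable. */
void add_device_randomness(const void *buf, unsigned int size)
{
    const uint8_t *p = buf;

    for (unsigned int i = 0; i < size; i++) {
        pool[pool_pos] ^= p[i];
        pool[pool_pos] = (uint8_t)((pool[pool_pos] << 3) | (pool[pool_pos] >> 5));
        pool_pos = (pool_pos + 1) % POOL_BYTES;
    }
}

/* What a hypothetical network driver's probe routine might do. */
static void example_probe(const uint8_t mac[6], const char *serial)
{
    add_device_randomness(mac, 6);
    add_device_randomness(serial, (unsigned int)strlen(serial));
}
```

The point is uniqueness: two otherwise identical devices booting from the same image start with different pool contents, even though no entropy has been counted.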
Finally, Ted's patch set also changes the use of the hardware random number
generator built into a number of CPUs. Rather than return random numbers
directly from the hardware, the code now mixes hardware random data into
the kernel's entropy pool and generates random numbers from there. His
reasoning is that using hardware random numbers directly requires placing a
lot of trust in the manufacturer:
It's unlikely that Intel (for example) was paid off by the US
Government to do this, but it's impossible for them to prove
otherwise --- especially since Bull Mountain is documented to use
AES as a whitener. Hence, the output of an evil, trojan-horse
version of RDRAND is statistically indistinguishable from an RDRAND
implemented to the specifications claimed by Intel. Short of using
a tunneling electronic microscope to reverse engineer an Ivy
Bridge chip and disassembling and analyzing the CPU microcode,
there's no way for us to tell for sure.
Mixing hardware random data into the entropy pool helps to mitigate that
threat. The first time this patch came around, Linus rejected it, saying "It would be a total
PR disaster for Intel, so they have huge incentives to be
trustworthy." That opinion was not
universally shared, though, and the patch remains in the current set.
Chances are it will be merged in its current form.
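The difference between returning hardware output directly and mixing it into the pool first can be illustrated with a toy C model. All names are hypothetical, rigged_hw_rng() stands in for a trojaned RDRAND that returns a constant, and the FNV-style hash is not cryptographic (the kernel's pool extraction uses a cryptographic hash at this step). The sketch only shows the structural point: the result depends on the whole pool, so a malicious hardware source cannot dictate it without also knowing the pool's state.

```c
#include <stdint.h>

#define POOL_WORDS 8

struct entropy_pool {
    uint64_t w[POOL_WORDS];
    unsigned int pos;
};

/* Hypothetical hardware source; a real one would execute RDRAND. */
typedef uint64_t (*hw_rng_fn)(void);

/* Stand-in for a trojaned generator that always returns the same value. */
static uint64_t rigged_hw_rng(void)
{
    return 0x5555555555555555ull;
}

/* Fold one word into the pool and stir it into the next slot. */
static void mix_word(struct entropy_pool *p, uint64_t v)
{
    unsigned int next = (p->pos + 1) % POOL_WORDS;

    p->w[p->pos] ^= v;
    p->w[next] += p->w[p->pos] * 0x9E3779B97F4A7C15ull;
    p->pos = next;
}

/* Toy, NON-cryptographic output function (FNV-style, over whole words). */
static uint64_t hash_pool(const struct entropy_pool *p)
{
    uint64_t h = 0xcbf29ce484222325ull;

    for (int i = 0; i < POOL_WORDS; i++) {
        h ^= p->w[i];
        h *= 0x100000001b3ull;
    }
    return h;
}

/* Instead of returning hw() directly, mix it into the pool and derive
 * the result from the whole pool. */
static uint64_t toy_get_random_u64(struct entropy_pool *p, hw_rng_fn hw)
{
    mix_word(p, hw());
    uint64_t out = hash_pool(p);
    mix_word(p, out);          /* feed back so consecutive calls differ */
    return out;
}
```

Even with the constant rigged source, two pools that differ in a single word produce different outputs; with direct use, every caller would simply receive the attacker's chosen value.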
An important part of the plan, though, is to get these patches into the
stable updates despite their size. Then, with luck, device manufacturers
will pick them up relatively quickly and stop shipping systems with a known
weakness. Even better, as Ted suggested, would be to make changes at the user-space
level as well. For example, delaying key generation long enough to let
some randomness accumulate should improve the situation even more. But
making things better at the kernel level is an important start.
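On the user-space side, one simple form of "delaying key generation" is to poll the kernel's entropy estimate before generating host keys. The C sketch below reads /proc/sys/kernel/random/entropy_avail, which reports that estimate in bits; the 256-bit threshold in the usage note and the one-second polling interval are arbitrary assumptions, not part of Ted's proposal.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Parse the single integer the kernel exposes in entropy_avail. */
static long parse_entropy_avail(const char *text)
{
    char *end;
    long bits = strtol(text, &end, 10);

    return end == text ? -1 : bits;
}

/* Read the kernel's current entropy estimate in bits, or -1 on error. */
static long read_entropy_avail(void)
{
    char buf[32];
    FILE *f = fopen("/proc/sys/kernel/random/entropy_avail", "r");

    if (!f)
        return -1;
    long bits = fgets(buf, sizeof(buf), f) ? parse_entropy_avail(buf) : -1;
    fclose(f);
    return bits;
}

/* Poll once a second until the estimate reaches min_bits or max_wait_s
 * seconds have passed. Returns 1 if the threshold was reached, 0 on
 * timeout or error. */
static int wait_for_entropy(long min_bits, unsigned int max_wait_s)
{
    for (unsigned int waited = 0; ; waited++) {
        long bits = read_entropy_avail();

        if (bits < 0)
            return 0;
        if (bits >= min_bits)
            return 1;
        if (waited >= max_wait_s)
            return 0;
        sleep(1);
    }
}
```

A first-boot key-generation helper might call wait_for_entropy(256, 60) before creating host keys. Keep in mind that entropy_avail is only the kernel's estimate, not a guarantee of unpredictability.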
By Jonathan Corbet
July 17, 2012
The "bufferbloat" problem is the result of excessive buffering in the
network stack; it leads to long latencies and poor reliability in the
network as a whole. Fixing it is a matter of buffering less data in each
system between any two endpoints—a task that sounds simple, but proves to
be more challenging than one might expect. It turns out that buffering can
show up in many surprising places in the networking stack; tracking all of
these places down and fixing them is not always easy.
A number of bloat-fighting changes have gone into the kernel over the last
year. The CoDel queue management algorithm
works to prevent packets from building up in router queues over time. At a
much lower level, byte queue limits put a
cap on the amount of data that can be waiting to go out a specific network
interface. Byte queue limits work only at the device queue level, though,
while the networking stack has other places—such as the queueing discipline
level—where buffering can happen. So there would be value in an
implementation that could limit buffering at levels above the device queue.
Eric Dumazet's TCP small queues patch looks
like it should be able to fill at least part of that gap. It limits the
amount of data that can be queued for transmission by any given socket
regardless of where the data is queued, so it shouldn't be fooled by
buffers lurking in the queueing, traffic control, or netfilter code. That
limit is set by a new sysctl knob found at:
/proc/sys/net/ipv4/tcp_limit_output_bytes
The default value of this limit is 128KB; it could be set lower on systems
where latency is the primary concern.
The networking stack already tracks the amount of data waiting to be
transmitted through any given socket; that value lives in the
sk_wmem_alloc field of struct sock. So applying a limit
is relatively easy; tcp_write_xmit() need only look to see if
sk_wmem_alloc is above the limit. If that is the case, the socket
is marked as being throttled and no more packets are queued.
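The throttling check can be modeled in a few lines of C. struct toy_sock and toy_write_xmit() are hypothetical simplifications of struct sock and tcp_write_xmit(); only the 128KB default and the comparison of outstanding bytes against the limit come from the description above.

```c
#include <stdbool.h>

/* Stands in for /proc/sys/net/ipv4/tcp_limit_output_bytes (128KB). */
static unsigned int tcp_limit_output_bytes = 128 * 1024;

/* Hypothetical, much-simplified stand-in for struct sock. */
struct toy_sock {
    unsigned int wmem_alloc;   /* bytes queued below the socket, anywhere */
    bool throttled;
};

/* Before queuing another packet, compare the socket's outstanding bytes
 * with the limit. Returns true if the packet of 'len' bytes was queued. */
static bool toy_write_xmit(struct toy_sock *sk, unsigned int len)
{
    if (sk->wmem_alloc > tcp_limit_output_bytes) {
        sk->throttled = true;
        return false;
    }
    sk->wmem_alloc += len;     /* packet now sits in qdisc/driver queues */
    return true;
}
```

With 1500-byte packets, a fresh toy_sock accepts 88 packets (132000 bytes) before the next attempt finds it over the 131072-byte limit and marks it throttled. Because the check happens before queuing, the limit can be overshot by up to one packet.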
The harder part is figuring out when some space opens up and it is possible
to add more packets to the queue. Queue space becomes free when a queued
packet is freed, so Eric's patch overrides the normal
struct sk_buff destructor when an output limit is in effect; the
new destructor can check to see whether it is time to queue more data for
the relevant socket. The only problem is that this destructor can be
called from deep within the network stack with important locks already
held, so it cannot queue new data directly. Instead, Eric added a new
tasklet to do the actual job of queuing new packets.
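The destructor-plus-tasklet arrangement can be modeled the same way. In this hypothetical C sketch, toy_skb_destructor() only updates the byte count and, if the socket was throttled and is now under the limit, records it for deferred processing; toy_tasklet_run() later does the actual queuing. The single pending pointer stands in for a real tasklet and socket list, and the sketch assumes the application always has more data to send.

```c
#include <stdbool.h>
#include <stddef.h>

#define LIMIT (128u * 1024u)   /* stands in for tcp_limit_output_bytes */

struct toy_sock {
    unsigned int wmem_alloc;
    bool throttled;
    bool on_tasklet_list;      /* already waiting for deferred work? */
};

/* Hypothetical single-entry "tasklet queue". */
static struct toy_sock *pending;

/* Replacement skb destructor: runs when a queued packet is freed,
 * possibly with locks held, so it must not transmit; it only schedules
 * deferred work. */
static void toy_skb_destructor(struct toy_sock *sk, unsigned int len)
{
    sk->wmem_alloc -= len;
    if (sk->throttled && sk->wmem_alloc <= LIMIT && !sk->on_tasklet_list) {
        sk->on_tasklet_list = true;
        pending = sk;
    }
}

/* Deferred handler, run later in a safe context: unthrottle the socket
 * and queue packets until the limit is exceeded again. Returns the
 * number of packets queued. */
static unsigned int toy_tasklet_run(unsigned int pkt_len)
{
    struct toy_sock *sk = pending;
    unsigned int queued = 0;

    if (!sk)
        return 0;
    pending = NULL;
    sk->on_tasklet_list = false;
    sk->throttled = false;
    while (sk->wmem_alloc <= LIMIT) {
        sk->wmem_alloc += pkt_len;
        queued++;
    }
    sk->throttled = true;      /* over the limit again; wait for frees */
    return queued;
}
```

Splitting the work this way keeps the destructor trivial, which matters because it can run in contexts where calling back into the transmit path could deadlock.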
It seems that the patch is having the intended result:
Results on my dev machine (tg3 nic) are really impressive, using
standard pfifo_fast, and with or without TSO/GSO. Without reduction of
nominal bandwidth.
I no longer have 3MBytes backlogged in qdisc by a single netperf
session, and both side socket autotuning no longer use 4 Mbytes.
He also ran some tests over a 10Gb link and
was able to get full wire speed, even with a relatively small output
limit.
Some questions remain outstanding, though. For example, Tom Herbert asked how this mechanism interacts with
more complex queuing disciplines; that question will take more time and
experimentation to answer. Tom also suggested that the limit could be made
dynamic and tied to the lower-level byte queue limits. Still, the patch
seems like an obvious win, so it has already been pulled into the net-next
tree for the 3.6 kernel. The details can be worked out
later, and the feature can always be turned off by default if problems
emerge during the 3.6 development cycle.
By Jake Edge
July 18, 2012
Configuring a kernel was once a fairly straightforward process, requiring
only knowledge of the hardware to be supported. Over time, things have
gotten more complex in general and, in particular, distributions have
added their own sets of dependencies on specific kernel
features—dependencies that can be difficult for regular users to
figure out. That led Linus Torvalds to put out an RFC proposal to add distribution-specific
kernel configuration options.
The problem stems from distributions' user space needing certain
configuration options enabled in order to function correctly. Things like
tmpfs and devtmpfs support, control groups, security options (e.g. SELinux,
AppArmor), and even raw netfilter table support were listed by Torvalds as
"support infrastructure" options that are required by various
distributions. But, in addition to being hard to figure out, those options
tend to change over time, so a configuration that worked for Fedora N may
not work for Fedora N+1. The resulting problems can be hard to find, as
Torvalds pointed out: "There's been
several times when I started with my old minimal config, and the
resulting kernel would boot, but something wouldn't quite work right,
and it can be very subtle indeed."
So, he suggested adding distribution-specific Kconfig files:
The point I'm slowly getting to is that I would actually love to have
*distro* Kconfig-files, where the distribution would be able to say
"These are the minimums I *require* to work". So we'd have a "Distro"
submenu, where you could pick the distro(s) you use, and then pick
which release [...] it would make it much easier for a normal
user (and quite frankly, I want to put myself in that group too) to
make a kernel config that "just works".
There are other ways to get there, of course, but they leave something to
be desired, Torvalds said. Copying the distribution config file would
work, but would bring along a bunch of extra options that aren't really
necessary for the proper operation of the distribution. Using
make localmodconfig (which selects all of the options from the
running kernel) suffers from much the same problem, he said. The ultimate
goal is to have more people able to build kernels:
I really think that
"How do I generate a kernel config file" is one of those things that
keeps normal people from compiling their own kernel. And we *want*
people to compile their own kernel so that they can help with things
like bisecting etc. The more, the merrier.
In general, the idea was met with approval on linux-kernel. There were
concerns about how the distribution-specific files would be maintained, and
that sometimes they might get out of sync with the distribution's
requirements. Dave Jones noted that he
sometimes gets blindsided by Fedora kernel requirements (and he is the
Fedora kernel maintainer).
Torvalds is pretty explicitly not looking for a perfect solution, however,
just one that is better: "even a 'educated guess' config file is
better than what we have now". In that message, he outlines two requirements that he
sees for the feature. The first is that each configuration option that is
selected for a particular distribution version come with a comment
explaining why it is needed. The second is that the configuration options
be the minimum required to make the system function properly, and that the list not
"grow to contain all the options just
because somebody decided to just add random things until things
worked".
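Torvalds proposed expressing these minimums as Kconfig files; as a rough illustration of his two rules (a minimal list, with a reason for every entry), the C sketch below checks an existing .config against such a list. The list, its justifications, and the function names are illustrative assumptions, not requirements taken from any real distribution.

```c
#include <stdio.h>
#include <string.h>

/* A hypothetical "distro minimum": each entry carries its reason. */
struct required_option {
    const char *symbol;
    const char *why;
};

static const struct required_option example_distro_min[] = {
    { "CONFIG_DEVTMPFS", "udev expects the kernel to populate /dev" },
    { "CONFIG_TMPFS",    "/run and /tmp are mounted as tmpfs" },
    { "CONFIG_CGROUPS",  "the init system tracks services in control groups" },
};

/* Return 1 if the .config text enables 'symbol' as built-in or module.
 * Lines such as "# CONFIG_FOO is not set" never match. */
static int option_enabled(const char *config, const char *symbol)
{
    size_t n = strlen(symbol);

    for (const char *line = config; line && *line; ) {
        if (strncmp(line, symbol, n) == 0 && line[n] == '=' &&
            (line[n + 1] == 'y' || line[n + 1] == 'm'))
            return 1;
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return 0;
}

/* Print each missing option with its justification; return how many. */
static int check_config(const char *config)
{
    int missing = 0;
    size_t count = sizeof(example_distro_min) / sizeof(example_distro_min[0]);

    for (size_t i = 0; i < count; i++) {
        if (!option_enabled(config, example_distro_min[i].symbol)) {
            printf("missing %s: %s\n", example_distro_min[i].symbol,
                   example_distro_min[i].why);
            missing++;
        }
    }
    return missing;
}
```

Because every entry carries its reason, a user whose minimal config is missing something learns not just what to turn on but why the distribution needs it.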
Commenting the options may be difficult even for those who work directly on
distribution kernels, though. Ben Hutchings (who maintains the Debian
kernel) pointed out that he sometimes does
not know the reason that a particular option is needed, particularly at
some later point: "just because an option
was requested and enabled to support some bit of userland, doesn't mean
I know what's using or depending on it now".
Other kinds of configuration options are possible, of course. In his
original message, Torvalds mentioned configurations for "common platforms",
such as a "modern PC laptop" that would choose options typically required
for those (USB storage, FAT/VFAT, power management, etc.). He specifically
said that platform configuration should be considered an entirely separate
feature from the distribution idea.
KVM (and other virtualization) users were also interested in creating an
option that
would select all of the drivers and other options needed for those kernels.
Currently "you need to hunt through 30+ different menus in order to find
what you need to run in a basic KVM virtual machine", as Trond
Myklebust put it. There was a lot of
discussion (and much agreement) on the need for better configuration
options for virtualization, but some of that got rather far afield from
Torvalds's original proposal.
Unsurprisingly, kernel developers started thinking about how
they could use the feature. There was concern that choosing a
particular distribution and its dependencies would make it harder for
kernel developers to further customize the configuration. David Lang had
some specific complaints about the approach
suggested in the RFC, noting that it would be hard to choose a Fedora kernel
without getting SELinux, for example. He was also concerned about the
amount of churn these defconfig-like files might cause (referencing the
movement to reduce the number of defconfigs in the ARM tree). But Torvalds
made it clear that Lang and other kernel hackers are not the target of the
feature:
The thing I'm asking for is for normal people. Make it easy for people
who DO NOT CARE about the config file to just build a kernel for their
machine.
Don't complicate the issue by bringing up some totally unrelated
question. Don't derail a useful feature for the 99% because you're not
in it.
There may be ways to satisfy both camps (Lang seemed to think so, anyway), but
until someone actually posts some code, it's hard to say. While there
was general agreement that the feature would be useful, so far no one has
stepped up to do the work. Whether Torvalds plans to do that himself or was
just floating a trial balloon in the hope that someone else would run with
it is unclear, but it does seem like a feature worth having.
Page editor: Jonathan Corbet