Kernel development

Brief items

Kernel release status

The current development kernel is 3.5-rc7, released on July 14. "Hey guys, remember how things have been stabilizing and slowing down, and all the kernel developers were off on summer vacation? Yeah, we need to talk about that." Linus Torvalds is still hoping that this will be the last -rc before the final 3.5 release.

Stable updates: the 3.0.37 and 3.4.5 updates were released on July 16; 3.2.23 was released on July 13. The 3.0.38 and 3.4.6 updates are in the review process as of this writing; they can be expected on or after July 19.

Quotes of the week

If I have to explain this I'm going to be very sad. [...]
-#define HV_LINUX_GUEST_ID_HI		0xB16B00B5
+#define HV_LINUX_GUEST_ID_HI		0x0DEFACED
— Signed-off-by: Matthew Garrett

And to make it worse, "x" itself was the result of doing "*&y". Which was probably written by the insane monkey's older brother, Max, who has been chewing Quaaludes for a few years, and as a result _his_ brain really isn't doing too well either. Even for a monkey. And now you're letting *him* at your keyboard too?
Linus Torvalds

30 Linux Kernel Developers in 30 Weeks: Dave Jones (Linux.com)

Linux.com interviews Dave Jones, Fedora's kernel maintainer, as part of its Kernel Developers series. "I needed to build my own kernel, because none of the distros shipped one that supported something I needed. And the feature I needed was only available in the development tree at the time (which was 2.1.x at the time). I don't recall what it was, but I think it may have been something silly like VFAT. Things weren't always stable, so I got into a habit of updating regularly (by carrying the latest tarball on a zip disk from the university to home). I started sending patches for things wherever I saw something I thought I could improve. I'm struggling to remember my first real accomplishment. It may have been fixing AFFS during the 2.1.x series. There were a whole bunch of really minor things before then."

Kernel development news

Random numbers for embedded devices

By Jonathan Corbet
July 17, 2012
Secure communications are dependent on good cryptography, and cryptography, in turn, is dependent on good random numbers. When cryptographic keys are generated from insufficiently-random values, they may turn out to be easily guessable by an attacker, leaving their user open to eavesdropping and man-in-the-middle attacks. For this reason, quite a bit of attention has been put into random number generation, but that does not mean that problems do not still exist. A set of patches intended for merging into the 3.6 kernel highlights some of the current concerns about random number generation in Linux.

Computing systems traditionally do not have sources of true randomness built into them. So they have operated by attempting to extract randomness from the environment in which they operate. The fine differences in timing between a user's keystrokes are one source of randomness, for example. The kernel can also use factors like the current time, interrupt timing, and more. For a typical desktop system, such sources usually provide enough randomness for the system's needs. Randomness gets harder to come by on server systems without a user at the keyboard. But the hardest environment of all may be embedded systems and network routers; these systems may perform important security-related tasks (such as the generation of host keys) before any appreciable randomness has been received from the environment.

As Zakir Durumeric, Nadia Heninger, J. Alex Halderman, and Eric Wustrow have documented, many of the latter class of systems are at risk, mostly as a result of keys generated with insufficient randomness and predictable initial conditions. They write: "We found that 5.57% of TLS hosts and 9.60% of SSH hosts share public keys in an apparently vulnerable manner, due to either insufficient randomness during key generation or device default keys." They were also able to calculate the actual keys used for a rather smaller (but still significant) percentage of hosts. Their site includes a key checker; concerned administrators may point it at their hosts to learn if their keys are vulnerable.
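One way that weak randomness turns into recoverable keys can be shown with toy numbers. The sketch below assumes the failure mode where two devices generate RSA moduli that happen to share one prime factor; in that case a simple greatest-common-divisor computation factors both moduli. The function names and the tiny 64-bit values are illustrative assumptions; real keys are far larger and the researchers' actual tooling is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Euclid's algorithm: greatest common divisor of two 64-bit values. */
uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Hypothetical helper: if two toy RSA-style moduli share exactly one
 * prime factor, recover that factor and both cofactors.
 * Returns 1 on success, 0 if the moduli are coprime or identical
 * (in which case the GCD reveals nothing useful).
 */
int split_shared_prime(uint64_t n1, uint64_t n2,
                       uint64_t *p, uint64_t *q1, uint64_t *q2)
{
    uint64_t g;

    assert(p != 0 && q1 != 0 && q2 != 0);
    g = gcd_u64(n1, n2);
    if (g <= 1 || g == n1 || g == n2)
        return 0;
    *p = g;
    *q1 = n1 / g;
    *q2 = n2 / g;
    return 1;
}
```

With the toy moduli 10403 (101 × 103) and 10807 (101 × 107), the helper recovers 101 as the shared factor along with 103 and 107; two moduli with no common factor yield nothing.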

Fixes for this problem almost certainly need to be applied at multiple levels, but kernel-level fixes seem particularly important since the kernel is the source for most random numbers used in cryptography. To that end, Ted Ts'o has put together a set of patches designed to increase the amount of randomness available in the system from the time it first boots. Getting there involves making a number of changes.

One of those is to fix the internal add_interrupt_randomness() function, which is used to derive randomness from interrupt timing. Use of this function has been declining in recent years, as a result of both its cost and concerns about the actual randomness of many interrupt sources. Ted's patch set tries to address the cost by batching interrupt-derived randomness on a per-CPU basis and only occasionally mixing it into the system-wide entropy pool. That mixing is also done with a new, lockless algorithm; this algorithm contains some small race conditions, but those could be seen to make the result even more random. An attempt is made to increase the amount of randomness obtained from interrupts by mixing in additional data, including the value of the instruction pointer at the time of the interrupt. After this change, adding randomness from interrupts should be fast and effective, so it is done by default for all interrupts; the IRQF_SAMPLE_RANDOM interrupt flag no longer has any effect.
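The batching idea can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the structure names, pool sizes, the rotate-and-XOR mixing step, and the spill interval of 64 interrupts are all hypothetical and are not taken from the patch itself. It shows the shape of the approach: each interrupt cheaply folds its timing and instruction pointer into a small per-CPU pool, and only occasionally is that pool mixed into the larger shared one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAST_POOL_WORDS 4   /* hypothetical per-CPU pool size */
#define MAIN_POOL_WORDS 32  /* hypothetical system-wide pool size */
#define SPILL_INTERVAL  64  /* hypothetical: spill every 64 interrupts */

struct fast_pool {
    uint32_t words[FAST_POOL_WORDS];
    unsigned int count;          /* interrupts seen since the last spill */
};

struct main_pool {
    uint32_t words[MAIN_POOL_WORDS];
    unsigned int pos;            /* next word to be mixed into */
    unsigned long spills;        /* number of spills received so far */
};

static uint32_t rol32(uint32_t v, unsigned int s)
{
    s &= 31;
    return s ? (v << s) | (v >> (32 - s)) : v;
}

/* Cheap per-interrupt step: rotate each touched word and XOR in new data. */
void fast_mix(struct fast_pool *fp, const uint32_t *in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t *w = &fp->words[i % FAST_POOL_WORDS];
        *w = rol32(*w, 7) ^ in[i];
    }
}

/* Occasional, more expensive step: fold the per-CPU pool into the main one. */
void spill_to_main(struct fast_pool *fp, struct main_pool *mp)
{
    for (size_t i = 0; i < FAST_POOL_WORDS; i++) {
        uint32_t *w = &mp->words[mp->pos];
        *w = rol32(*w, 13) ^ fp->words[i];
        mp->pos = (mp->pos + 1) % MAIN_POOL_WORDS;
    }
    mp->spills++;
    fp->count = 0;
}

/* Returns 1 if this interrupt triggered a spill into the main pool. */
int add_interrupt_sample(struct fast_pool *fp, struct main_pool *mp,
                         uint32_t irq, uint64_t cycles, uint64_t ip)
{
    uint32_t in[4];

    assert(fp != NULL && mp != NULL);
    in[0] = (uint32_t)cycles ^ irq;
    in[1] = (uint32_t)(cycles >> 32);
    in[2] = (uint32_t)ip;            /* instruction pointer, low half */
    in[3] = (uint32_t)(ip >> 32);    /* instruction pointer, high half */
    fast_mix(fp, in, 4);
    if (++fp->count < SPILL_INTERVAL)
        return 0;
    spill_to_main(fp, mp);
    return 1;
}
```

In this model the shared pool is touched once per 64 interrupts, which is where the cost saving comes from; a real implementation would also need to deal with concurrency, which the sketch ignores.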

Next, the patch set adds a new function:

    void add_device_randomness(const void *buf, unsigned int size);

The purpose is to allow drivers to mix in device-specific data that, while not necessarily random, is system-specific and unpredictable. Examples include serial, product, and manufacturer information from attached USB devices, the "write counter" from some realtime clock devices, and the MAC address from network devices. Most of this data should be random from the point of view of an attacker; it should help to prevent the situation where multiple, newly-booted devices generate the same keys.
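The effect can be sketched in user space. The code below is not the kernel implementation: toy_add_device_randomness() is a hypothetical stand-in that folds bytes into a single 64-bit state with the non-cryptographic FNV-1a hash, and the boot-time state is an assumed constant. It is meant only to show that two otherwise identical units diverge once each mixes in its own MAC address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical starting state shared by every freshly booted unit. */
#define TOY_POOL_BOOT_STATE 0xcbf29ce484222325ULL  /* FNV-1a offset basis */

struct toy_pool {
    uint64_t state;
};

void toy_pool_init(struct toy_pool *p)
{
    assert(p != 0);
    p->state = TOY_POOL_BOOT_STATE;
}

/*
 * Stand-in for add_device_randomness(): fold device-specific bytes
 * into the pool. FNV-1a is used for brevity only; it is not a
 * cryptographic mixing function.
 */
void toy_add_device_randomness(struct toy_pool *p,
                               const void *buf, unsigned int size)
{
    const unsigned char *bytes = buf;

    assert(p != 0 && (buf != 0 || size == 0));
    for (unsigned int i = 0; i < size; i++) {
        p->state ^= bytes[i];
        p->state *= 0x100000001b3ULL;  /* FNV-1a 64-bit prime */
    }
}
```

Two pools initialized identically and then fed MAC addresses that differ in a single byte end up in different states, so anything derived from them afterward differs as well. No entropy is credited for such data in this model; it only breaks the symmetry between devices.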

Finally, Ted's patch set also changes the use of the hardware random number generator built into a number of CPUs. Rather than return random numbers directly from the hardware, the code now mixes hardware random data into the kernel's entropy pool and generates random numbers from there. His reasoning is that using hardware random numbers directly requires placing a lot of trust in the manufacturer:

It's unlikely that Intel (for example) was paid off by the US Government to do this, but it's impossible for them to prove otherwise --- especially since Bull Mountain is documented to use AES as a whitener. Hence, the output of an evil, trojan-horse version of RDRAND is statistically indistinguishable from an RDRAND implemented to the specifications claimed by Intel. Short of using a tunneling electronic microscope to reverse engineer an Ivy Bridge chip and disassembling and analyzing the CPU microcode, there's no way for us to tell for sure.

Mixing hardware random data into the entropy pool helps to mitigate that threat. The first time this patch came around, Linus rejected it, saying "It would be a total PR disaster for Intel, so they have huge incentives to be trustworthy." That opinion was not universally shared, though, and the patch remains in the current set. Chances are it will be merged in its current form.
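The difference between the two approaches can be sketched with a toy model. Everything here is an assumption made for illustration: the 64-bit pool, the scramble64() step (the finalizer from the SplitMix64 generator, standing in for a real cryptographic hash), and the function names. The "hardware" source is deliberately the worst case, one that always returns the same value.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*hw_rng_fn)(void);

struct mix_pool {
    uint64_t state;
};

/* Invertible 64-bit scrambler (SplitMix64 finalizer); not cryptographic. */
static uint64_t scramble64(uint64_t z)
{
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Direct use: the caller gets exactly what the hardware returned. */
uint64_t random_direct(hw_rng_fn hw)
{
    assert(hw != 0);
    return hw();
}

/* Mixed use: fold the hardware word into the pool, then derive output. */
uint64_t random_mixed(struct mix_pool *pool, hw_rng_fn hw)
{
    assert(pool != 0 && hw != 0);
    pool->state = scramble64(pool->state ^ hw());
    return scramble64(pool->state + 0x9e3779b97f4a7c15ULL);
}

/* Worst-case "hardware" generator: always returns the same value. */
uint64_t stuck_hw_rng(void)
{
    return 0x4141414141414141ULL;
}
```

With the stuck source, random_direct() returns the same value every time, while random_mixed() still produces different results on two pools whose prior contents differ. An attacker who controls the hardware generator would therefore also need to know the rest of the pool.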

An important part of the plan, though, is to get these patches into the stable updates despite their size. Then, with luck, device manufacturers will pick them up relatively quickly and stop shipping systems with a known weakness. Even better would be, as Ted suggested, to make changes at the user-space level as well. For example, delaying key generation long enough to let some randomness accumulate should improve the situation even more. But making things better at the kernel level is an important start.
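One possible shape for such a user-space delay is sketched below. It assumes that the kernel's entropy estimate, exposed on Linux as /proc/sys/kernel/random/entropy_avail, is an acceptable signal; the function names, the threshold, and the one-second polling interval are hypothetical choices, not something the patch set prescribes.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read the first integer from a file such as
 * /proc/sys/kernel/random/entropy_avail.
 * Returns the value in bits, or -1 if it cannot be read.
 */
long read_entropy_estimate(const char *path)
{
    FILE *f;
    long bits = -1;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &bits) != 1 || bits < 0)
        bits = -1;
    fclose(f);
    return bits;
}

/*
 * Poll until the estimate reaches `needed` bits, checking at most
 * `max_tries` times with one second between checks.
 * Returns 1 if the threshold was reached, 0 otherwise.
 */
int wait_for_entropy(const char *path, long needed, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (read_entropy_estimate(path) >= needed)
            return 1;
        if (i + 1 < max_tries)
            sleep(1);
    }
    return 0;
}
```

A first-boot script could call wait_for_entropy() with the path above before generating host keys, and fall back to some other policy if the call times out. Taking the path as a parameter keeps the helper testable against an ordinary file.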

TCP small queues

By Jonathan Corbet
July 17, 2012
The "bufferbloat" problem is the result of excessive buffering in the network stack; it leads to long latencies and poor reliability in the network as a whole. Fixing it is a matter of buffering less data in each system between any two endpoints—a task that sounds simple, but proves to be more challenging than one might expect. It turns out that buffering can show up in many surprising places in the networking stack; tracking all of these places down and fixing them is not always easy.

A number of bloat-fighting changes have gone into the kernel over the last year. The CoDel queue management algorithm works to prevent packets from building up in router queues over time. At a much lower level, byte queue limits put a cap on the amount of data that can be waiting to go out a specific network interface. Byte queue limits work only at the device queue level, though, while the networking stack has other places—such as the queueing discipline level—where buffering can happen. So there would be value in an implementation that could limit buffering at levels above the device queue.

Eric Dumazet's TCP small queues patch looks like it should be able to fill at least part of that gap. It limits the amount of data that can be queued for transmission by any given socket regardless of where the data is queued, so it shouldn't be fooled by buffers lurking in the queueing, traffic control, or netfilter code. That limit is set by a new sysctl knob found at:

    /proc/sys/net/ipv4/tcp_limit_output_bytes

The default value of this limit is 128KB; it could be set lower on systems where latency is the primary concern.
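A bit of arithmetic shows why a lower limit matters on slow links. The helper below is a back-of-the-envelope sketch with an assumed name; it computes how long a full per-socket allowance would take to drain at a given link rate, ignoring protocol headers, competing flows, and everything else a real network adds.

```c
#include <assert.h>
#include <stdint.h>

/* Default per-socket limit described above: 128KB. */
#define TCP_LIMIT_OUTPUT_DEFAULT (128u * 1024u)

/*
 * Microseconds needed to transmit `bytes` at `bits_per_sec`,
 * rounded down. Hypothetical helper for illustration only.
 */
uint64_t drain_time_us(uint64_t bytes, uint64_t bits_per_sec)
{
    assert(bits_per_sec > 0);
    return bytes * 8u * 1000000u / bits_per_sec;
}
```

By this estimate, 128KB takes about 104 microseconds to drain at 10Gb/s and about 10 milliseconds at 100Mb/s, but more than a second at 1Mb/s. Dropping the limit to 16KB on that slow link would cut the figure to roughly 131 milliseconds.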

The networking stack already tracks the amount of data waiting to be transmitted through any given socket; that value lives in the sk_wmem_alloc field of struct sock. So applying a limit is relatively easy; tcp_write_xmit() need only look to see if sk_wmem_alloc is above the limit. If that is the case, the socket is marked as being throttled and no more packets are queued.

The harder part is figuring out when some space opens up and it is possible to add more packets to the queue. The time when queue space becomes free is when a queued packet is freed. So Eric's patch overrides the normal struct sk_buff destructor when an output limit is in effect; the new destructor can check to see whether it is time to queue more data for the relevant socket. The only problem is that this destructor can be called from deep within the network stack with important locks already held, so it cannot queue new data directly. So Eric had to add a new tasklet to do the actual job of queuing new packets.
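The control flow described in the last two paragraphs can be modeled in a few lines of user-space C. This is a simplified sketch, not Eric's code: the toy_sock structure, the function names, and the boolean standing in for "tasklet scheduled" are all assumptions, and locking is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sock {
    size_t wmem_alloc;   /* bytes queued below the socket (cf. sk_wmem_alloc) */
    size_t limit;        /* cf. tcp_limit_output_bytes */
    size_t backlog;      /* bytes written by the application, not yet queued */
    bool throttled;      /* set when the limit stopped transmission */
    bool work_pending;   /* stands in for "tasklet has been scheduled" */
};

/* Transmit path: queue packets until the backlog is empty or the limit hits. */
size_t toy_write_xmit(struct toy_sock *sk, size_t pkt_size)
{
    size_t sent = 0;

    assert(sk != NULL && pkt_size > 0);
    while (sk->backlog >= pkt_size) {
        if (sk->wmem_alloc > sk->limit) {
            sk->throttled = true;   /* stop queuing for now */
            break;
        }
        sk->wmem_alloc += pkt_size;
        sk->backlog -= pkt_size;
        sent++;
    }
    return sent;
}

/*
 * Packet destructor: runs when a queued packet is freed. It may be
 * called where queuing new data is unsafe, so it only requests
 * deferred work.
 */
void toy_skb_destructor(struct toy_sock *sk, size_t pkt_size)
{
    assert(sk != NULL && sk->wmem_alloc >= pkt_size);
    sk->wmem_alloc -= pkt_size;
    if (sk->throttled)
        sk->work_pending = true;
}

/* Deferred work: runs later, in a context where queuing is allowed. */
size_t toy_run_deferred(struct toy_sock *sk, size_t pkt_size)
{
    assert(sk != NULL);
    if (!sk->work_pending)
        return 0;
    sk->work_pending = false;
    sk->throttled = false;
    return toy_write_xmit(sk, pkt_size);
}
```

With a 3000-byte limit, 1000-byte packets, and 10000 bytes waiting, the first call queues four packets and then throttles. Nothing more is sent until a packet is freed, at which point the deferred step queues exactly one more.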

It seems that the patch is having the intended result:

Results on my dev machine (tg3 nic) are really impressive, using standard pfifo_fast, and with or without TSO/GSO. Without reduction of nominal bandwidth. I no longer have 3MBytes backlogged in qdisc by a single netperf session, and both side socket autotuning no longer use 4 Mbytes.

He also ran some tests over a 10Gb link and was able to get full wire speed, even with a relatively small output limit.

There are some outstanding questions, still. For example, Tom Herbert asked about how this mechanism interacts with more complex queuing disciplines; that question will take more time and experimentation to answer. Tom also suggested that the limit could be made dynamic and tied to the lower-level byte queue limits. Still, the patch seems like an obvious win, so it has already been pulled into the net-next tree for the 3.6 kernel. The details can be worked out later, and the feature can always be turned off by default if problems emerge during the 3.6 development cycle.

Kernel configuration for distributions

By Jake Edge
July 18, 2012

Configuring a kernel was once a fairly straightforward process, only requiring knowledge of what hardware needs to be supported. Over time, things have gotten more complex in general, but distributions have added their own sets of dependencies on specific kernel features—dependencies that can be difficult for regular users to figure out. That led Linus Torvalds to put out an RFC proposal to add distribution-specific kernel configuration options.

The problem stems from distributions' user space needing certain configuration options enabled in order to function correctly. Things like tmpfs and devtmpfs support, control groups, security options (e.g. SELinux, AppArmor), and even raw netfilter table support were listed by Torvalds as "support infrastructure" options that are required by various distributions. But, in addition to being hard to figure out, those options tend to change over time, so a configuration that worked for Fedora N may not work for Fedora N+1. The resulting problems can be hard to find, as Torvalds pointed out: "There's been several times when I started with my old minimal config, and the resulting kernel would boot, but something wouldn't quite work right, and it can be very subtle indeed."

So, he suggested adding distribution-specific Kconfig files:

The point I'm slowly getting to is that I would actually love to have *distro* Kconfig-files, where the distribution would be able to say "These are the minimums I *require* to work". So we'd have a "Distro" submenu, where you could pick the distro(s) you use, and then pick which release [...] it would make it much easier for a normal user (and quite frankly, I want to put myself in that group too) to make a kernel config that "just works".

There are other ways to get there, of course, but they leave something to be desired, Torvalds said. Copying the distribution config file would work, but would bring along a bunch of extra options that aren't really necessary for the proper operation of the distribution. Using make localmodconfig (which starts from the running kernel's configuration and keeps only the modules that are currently loaded) suffers from much the same problem, he said. The ultimate goal is to have more people able to build kernels:

I really think that "How do I generate a kernel config file" is one of those things that keeps normal people from compiling their own kernel. And we *want* people to compile their own kernel so that they can help with things like bisecting etc. The more, the merrier.

In general, the idea was met with approval on linux-kernel. There were concerns about how the distribution-specific files would be maintained, and that sometimes they might get out of sync with the distribution's requirements. Dave Jones noted that he sometimes gets blindsided by Fedora kernel requirements (and he is the Fedora kernel maintainer).

Torvalds is pretty explicitly not looking for a perfect solution, however, just one that is better: "even a 'educated guess' config file is better than what we have now". In that message, he outlines two requirements that he sees for the feature. The first is that each configuration option that is selected for a particular distribution version come with a comment explaining why it is needed. The second is that the configuration options be the minimum required to make the system function properly—not that it "grow to contain all the options just because somebody decided to just add random things until things worked".

Commenting the options may be difficult even for those who work directly on distribution kernels though. Ben Hutchings (who maintains the Debian kernel) pointed out that he sometimes does not know the reason that a particular option is needed, particularly at some later point: "just because an option was requested and enabled to support some bit of userland, doesn't mean I know what's using or depending on it now".

Other kinds of configuration options are possible, of course. In his original message, Torvalds mentioned configurations for "common platforms", such as a "modern PC laptop" that would choose options typically required for those (USB storage, FAT/VFAT, power management, etc.). He specifically said that platform configuration should be considered an entirely separate feature from the distribution idea.

KVM (and other virtualization) users were also interested in creating an option that would select all of the drivers and other options needed for those kernels. Currently "you need to hunt through 30+ different menus in order to find what you need to run in a basic KVM virtual machine", as Trond Myklebust put it. There was a lot of discussion (and much agreement) on the need for better configuration options for virtualization, but some of that got rather far afield from Torvalds's original proposal.

Unsurprisingly, kernel developers started thinking about how they could use the feature. There was concern that choosing a particular distribution and its dependencies would make it harder for kernel developers to further customize the configuration. David Lang had some specific complaints about the approach suggested in the RFC, noting that it would be hard to choose a Fedora kernel without getting SELinux, for example. He was also concerned about the amount of churn these defconfig-like files might cause (referencing the movement to reduce the number of defconfigs in the ARM tree). But Torvalds made it clear that Lang and other kernel hackers are not the target of the feature:

The thing I'm asking for is for normal people. Make it easy for people who DO NOT CARE about the config file to just build a kernel for their machine.

Don't complicate the issue by bringing up some totally unrelated question. Don't derail a useful feature for the 99% because you're not in it.

There may be ways to satisfy both camps—Lang seemed to think so anyway—but until someone actually posts some code, it's hard to say. While there was general agreement that the feature would be useful, so far no one has stepped up to do the work. Whether Torvalds plans to do that or was just floating a trial balloon and hoping someone else would run with it is unclear, but it does seem like a feature worth having.

Page editor: Jonathan Corbet

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds