Kernel development

Brief items

Kernel release status

The current 2.6 kernel is still 2.6.7; it has been almost a month since that release (which happened on June 15) and no 2.6.8 prepatches have yet come out.

Linus's BitKeeper tree continues to grow, though at a slower rate. Recent additions include a new, faster scrolling mode for framebuffer consoles, a serial ATA update, various architecture updates, many fixes for a new series of locking bugs reported by the Stanford checker, a fix for a /proc permissions bug (see below), and lots of fixes.

The current tree from Andrew Morton is 2.6.7-mm6. Recent additions to -mm include packet writing support for DVD-RW and CD-RW drives, a new set of scheduler tweaks, an IDE update and various fixes.

The current 2.4 prepatch is 2.4.27-rc3, which was released by Marcelo on July 3. Very few patches were added this time around; things would appear to be stabilizing toward the 2.4.27-final release.

Kernel development news

Quote of the week

The stuff that's gone around looks minor. It's not like they're teaching sched.c to play cpu tetris for gang scheduling or Kalman filtering profiling feedback to stripe tasks using different cpu resources across SMT siblings or playing graph games to meet RT deadlines, so it doesn't look like very much at all is going on to me.

It's pretty obvious why everyone and their brother is grinding out purported scheduler rewrites: the code is self-contained, however, nothing interesting is coming of all this. Never before have so many patches been written against the same file, accomplishing so little.

-- William Lee Irwin would like to see more ambitious scheduler patches.

TCP window scaling and broken routers

Every TCP packet includes, in the header, a "window" field which specifies how much data the system which sent the packet is willing and able to receive from the other end. The window is the flow control mechanism used by TCP; it controls the maximum amount of data which can be "in flight" between two communicating systems and keeps one side from overwhelming the other with data.

In the early days of TCP, windows tended to be relatively small. The computers of that age did not have huge amounts of memory to dedicate toward buffering network data, and the available networking technology was not fast enough to make use of a larger window in any case. Modern network interfaces can handle larger packets and keep more of them in flight at any given time; they will perform better with a larger window. Some kinds of high-speed long-haul links can have very high bandwidth, but also high latency. Keeping that sort of pipe filled can require a very large window; if a sending system cannot have a large number of packets in transit at any given time, it will not be able to make use of the bandwidth available. For these reasons, good performance can often require very large windows.

The TCP window field, however, is only 16 bits wide, allowing for a maximum window size of 64KB. The TCP designers must have thought that nobody would ever need a larger window than that. But 64KB is not even close to what is needed in many situations today. The solution to this problem is called "window scaling." It is not new; window scaling was codified in RFC 1323 back in 1992. It is also not complicated: a system wanting to use window scaling sets a TCP option containing an eight-bit scale factor. All window values used by that system thereafter should be left-shifted by that scale factor; a window scale of zero, thus, implies no scaling at all, while a scale factor of five implies that window sizes should be shifted five bits, or multiplied by 32. With this scheme, a 128KB window could be expressed by setting the scale factor to five and putting 4096 in the window field.
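The arithmetic can be sketched in a few lines of Python. This only illustrates the shift described above; the function name is made up for the example, and it is not kernel code.

```python
MAX_WINDOW_FIELD = 0xFFFF  # the window field in the TCP header is 16 bits wide


def effective_window(window_field, scale_factor):
    """Return the window, in bytes, implied by a header field and a scale factor."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    return window_field << scale_factor


print(effective_window(4096, 5))  # 131072 bytes, i.e. 128KB
print(effective_window(4096, 0))  # 4096 bytes: a scale of zero means no scaling
```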

To keep from breaking TCP on systems which do not understand window scaling, the TCP option can only be provided in the initial SYN packet which initiates the connection, and scaling can only be used if the SYN+ACK packet sent in response also contains that option. The scale factor is thus set as part of the setup handshake, and cannot be changed thereafter.
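As a rough sketch of that rule (the function and its arguments are hypothetical, and `None` stands in for "the option was not present in this packet"), the outcome of the handshake might be modeled like this:

```python
def negotiated_scales(syn_scale, synack_scale):
    """Return (initiator_scale, responder_scale) in effect for a connection.

    Each argument is the scale factor carried in that packet's window
    scale option, or None if the packet did not carry the option.
    """
    if syn_scale is None or synack_scale is None:
        # One side did not send the option, so neither side scales.
        return (0, 0)
    return (syn_scale, synack_scale)


print(negotiated_scales(7, 2))     # (7, 2): both sides sent the option
print(negotiated_scales(7, None))  # (0, 0): the peer does not understand scaling
```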

The details are still being figured out, but it would appear that some routers on the net are rewriting the window scale TCP option on SYN packets as they pass through. In particular, they seem to be setting the scale factor to zero, but leaving the option in place. The receiving side sees the option, and responds with a window scale factor of its own. At this point, the initiating system believes that its scale factor has been accepted, and scales its windows accordingly. The other end, however, believes that the scale factor is zero. The result is a misunderstanding over the real size of the receive window, with the system behind the firewall believing it to be much smaller than it really is. If the expected scale factor (and thus the discrepancy) is large, the result is, at best, very slow communication. In many cases, the small window can cause no packets to be transmitted at all, breaking TCP between the two affected systems entirely.
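The size of the misunderstanding is easy to work out. The sketch below assumes a hypothetical 128KB receive window on the initiating system and a scale factor of seven; the helper names are invented for the illustration.

```python
def advertised_field(real_window, scale):
    """Value the sender places in the 16-bit window field."""
    return min(real_window >> scale, 0xFFFF)


def window_seen_by_peer(real_window, sender_scale, scale_peer_believes):
    """Window size, in bytes, that the peer computes from the header field."""
    return advertised_field(real_window, sender_scale) << scale_peer_believes


real_window = 128 * 1024  # hypothetical receive window on the initiating system
sent_scale = 7            # scale factor the initiator put in its SYN
rewritten_scale = 0       # what the broken router left in the option

print(window_seen_by_peer(real_window, sent_scale, sent_scale))       # 131072
print(window_seen_by_peer(real_window, sent_scale, rewritten_scale))  # 1024
```

With these numbers the remote end believes the window to be 128 times smaller than it really is.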

In the 2.6.7 kernel, the default scale factor is zero; in Linus's BitKeeper tree and the 2.6.7-mm kernels, instead, it has been increased to seven. This change has brought the broken router behavior to light; suddenly people running current kernels are finding that they cannot talk to a number of systems out there. One of the higher-profile affected sites is a Gentoo one; Gentoo users are, unsurprisingly, not pleased.

As a way of making things work, Stephen Hemminger has proposed a patch which adds a calculation to select the smallest scale factor which covers the largest possible window size. The result on most systems is that the scale factor gets set to two. This factor will still be corrupted by broken routers, but the resulting window size (¼ of what it should be) is still large enough to allow communication to happen.
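The idea behind that calculation can be sketched as follows. This is not the patch itself: the function name is made up, the 174760-byte maximum window is an assumed figure chosen for the example, and the cap of 14 is the largest shift that RFC 1323 allows.

```python
MAX_WINDOW_FIELD = 0xFFFF  # largest value the 16-bit window field can hold
MAX_SCALE = 14             # largest shift permitted by RFC 1323


def smallest_scale(max_window):
    """Smallest scale factor that lets max_window fit in the window field."""
    scale = 0
    while (max_window >> scale) > MAX_WINDOW_FIELD and scale < MAX_SCALE:
        scale += 1
    return scale


print(smallest_scale(174760))  # 2, for this assumed maximum window
print(smallest_scale(65535))   # 0: no scaling needed
```

If a router zeroes a scale factor of two, the other end sees the window shifted right by two bits, which is the one-quarter figure given above.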

The patch makes networking with systems behind broken routers work again, but it has been rejected anyway. The networking maintainers (and David Miller in particular) believe that the patch simply papers over a problem, and that adding hacks to the Linux network stack to accommodate broken routers is a mistake. If, instead, the situation is left as it is, pressure on the router manufacturers should get the problem fixed relatively quickly. For a few years now, Linux has had a strong enough presence in the networking world that it can get away with taking this sort of position.

In the meantime, anybody running a current kernel who is having trouble connecting to a needed site can work around the problem with a command like:

    echo 0 > /proc/sys/net/ipv4/tcp_default_win_scale 

or by adding a line like:

    net.ipv4.tcp_default_win_scale = 0

to /etc/sysctl.conf.

Cryptographic signatures on kernel modules

The latest Fedora Rawhide kernels come with an interesting feature: the ability to enforce cryptographic signatures on loadable modules. This capability has a few uses:

  • Preventing the kernel from loading modules which have somehow been corrupted.

  • Making it harder for an attacker to install a rootkit on a compromised system.

  • Enabling vendors of enterprise Linux distributions to block the loading of unapproved modules into stock kernels. (It should be noted that, at this point, no vendor has indicated any plans to restrict module loading in this way.)

The code which handles signed modules was originally written by Greg Kroah-Hartman; it has subsequently been fixed up in various ways by David Howells. Greg wrote a Linux Journal article about his work back in January.

The signature code works by looking at the most interesting ELF sections within a module file: the .text (program code) and .data (initialized data) areas. When the module is built, a script uses the objdump utility to extract those sections; the result can be fed to gpg to generate a signature. That signature is then patched into the module as yet another section, called module_sig. Overall, adding signatures is a relatively small change to the module build process.

The signatures are not much use, however, if nobody checks them; implementing that check within the kernel is a somewhat larger business. The 2.6 kernel includes a whole cryptographic subsystem, but that code is oriented toward the needs of networking and encrypted filesystems. Verifying module signatures using public keys was not one of the objectives when the crypto API was added. To support this task, several thousand lines of code must be added to the kernel; they perform arbitrary-precision integer arithmetic (this code came directly from GnuPG), DSA signature verification (also from GnuPG), simple in-kernel key management, and the code to actually verify module data against signatures.

As things stand in the patch currently, any public keys used to verify modules are built directly into the kernel itself. Being able to add a site-specific key at run time would be a convenient feature, but it would also defeat the purpose of this whole exercise. Any attacker who is in a position to load malevolent modules could just load a new key first, thus circumventing the signature verification. Even as things stand, a kernel using signature verification should be set up to not allow overwriting of in-kernel key data by way of /dev/kmem and such.

With all that infrastructure in place, a relatively small set of patches makes the module loader actually verify signatures. Once again, the interesting sections are stripped out, and a checksum is generated with the SHA1 algorithm. If the signature in the module (1) can be decrypted with a public key contained within the kernel, and (2) contains the same checksum, the module checks out and can be loaded.
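The checksum half of that test can be illustrated with Python's hashlib. This sketch leaves out the public-key step entirely; the section contents are made-up bytes, and the order in which the sections are hashed is an assumption made for the example.

```python
import hashlib


def module_digest(sections):
    """SHA1 over the contents of the interesting sections, in a fixed order."""
    sha1 = hashlib.sha1()
    for name in (".text", ".data"):
        sha1.update(sections.get(name, b""))
    return sha1.hexdigest()


def checksum_matches(sections, signed_digest):
    """True if the module's sections hash to the digest carried by the signature."""
    return module_digest(sections) == signed_digest


module = {".text": b"\x55\x89\xe5", ".data": b"\x01\x02"}  # made-up contents
signed = module_digest(module)  # stands in for the checksum recovered from the signature

tampered = {**module, ".text": b"\x90\x90\x90"}

print(checksum_matches(module, signed))    # True
print(checksum_matches(tampered, signed))  # False
```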

In the code, one can see the traces of a kernel developer encountering an interesting problem. In many systems, the SHA1 transform code is kept in a loadable module. The module loader, when it attempts to verify the signature of a different module, could well force the kernel to try loading the SHA1 module. The module code, however, takes the module_mutex semaphore very early in the process; the recursive attempt will thus simply deadlock the whole thing. To avoid this problem, the crypto API was enhanced with a crypto_alloc_tfm2() function which can be instructed to not load any modules while setting itself up. The SHA1 code will have to be linked directly into the kernel if it is used for module verification.
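The shape of the problem, and of the fix, can be modeled in user space. Everything below is a toy: a Python lock stands in for module_mutex, the function names are invented, and the `may_load_modules` flag is only an assumption about the kind of switch crypto_alloc_tfm2() provides. A non-blocking acquire is used so that the sketch reports the deadlock instead of hanging.

```python
import threading

module_mutex = threading.Lock()  # non-reentrant, like the semaphore described above


def alloc_transform(name, loaded, may_load_modules):
    """Return True if the named transform is available."""
    if name in loaded:
        return True
    if not may_load_modules:
        return False  # give up rather than recurse into the module loader
    return load_module(name, loaded, may_load_modules)


def load_module(name, loaded, may_load_modules):
    """Load a module, first making sure SHA1 is available to verify it."""
    if not module_mutex.acquire(blocking=False):
        raise RuntimeError("would deadlock: module_mutex already held")
    try:
        if name != "sha1" and not alloc_transform("sha1", loaded, may_load_modules):
            return False  # cannot verify the signature, so refuse the module
        loaded.add(name)
        return True
    finally:
        module_mutex.release()
```

Calling `load_module("ext3", set(), True)` raises the "would deadlock" error, since the recursive load of SHA1 needs the lock that is already held. With `may_load_modules=False` the call returns False instead, and it succeeds only when `"sha1"` is already in the loaded set, which corresponds to SHA1 being built into the kernel.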

Rawhide kernels come configured to verify any signatures found in modules, but they will also happily load modules with no signature at all. There is a configuration option which tightens things up, however, so that only signed modules will be accepted. One wonders how much a proprietary module vendor might pay to have their public key included in a distributor's stock kernels once that option is turned on.
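The two policies amount to a small decision table. In this sketch the function and its `require_signed` parameter are hypothetical names for the behavior just described, not the actual configuration option.

```python
def may_load(has_signature, signature_valid, require_signed=False):
    """Decide whether a module may be loaded under the described policy."""
    if has_signature:
        # A signature that is present is always checked.
        return signature_valid
    # Unsigned modules are accepted only when signing is not enforced.
    return not require_signed


print(may_load(False, False))                      # True: default Rawhide behavior
print(may_load(False, False, require_signed=True)) # False: only signed modules accepted
print(may_load(True, False))                       # False: a bad signature is rejected
```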

Fun with /proc permissions

Herbert Poetzl discovered some interesting behavior in the 2.6 kernel: it seems that any user can set arbitrary permissions on most files in /proc. A patch had been merged back in the 2.5 days which enabled changing of permissions, but an important check got left out.

For the most part, the security implications of this bug are small, but real. Local users can make files in /proc inaccessible, which can break commands (like ps) which rely on them. Making /proc/sysrq-trigger writable allows some obnoxious mayhem to be created. On the other hand, changing permissions in /proc/sys has no useful effect: the sysctl code performs its own permissions checking on top of what the filesystem does. The actual process entries under /proc do their own checking as well, and do not allow the permissions to be changed.

The fix is simple, and has been merged for 2.6.8. But some developers wondered why anybody would want to mess with permissions in /proc in the first place. It turns out that there is some information there which, in some cases, people would like to hide from other users on the system. Command lines for specific processes and TCP connection tracking information were mentioned as specific examples. So permissions tweaking in /proc will remain - but not just anybody will be able to do it.

Page editor: Jonathan Corbet

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds