Release status
Kernel release status
The current 2.6 kernel is still 2.6.7; it has been almost a month
since that release (which happened on June 15) and no 2.6.8 prepatches have
yet come out.
Linus's BitKeeper tree continues to grow, though at a slower rate. Recent
additions include a new, faster scrolling mode for framebuffer consoles, a
serial ATA update, various architecture updates, many fixes for a new
series of locking bugs reported by the
Stanford checker, a fix for a /proc permissions bug (see below),
and many other fixes.
The current tree from Andrew Morton is 2.6.7-mm6. Recent additions to -mm include
packet writing support for DVD-RW and CD-RW drives, a new set of scheduler
tweaks, an IDE update and various fixes.
The current 2.4 prepatch is 2.4.27-rc3, which was released by Marcelo on July 3. Very few
patches were added this time around; things would appear to be stabilizing
toward the 2.4.27-final release.
Kernel development news
Quote of the week
The stuff that's gone around looks minor. It's not like they're teaching
sched.c to play cpu tetris for gang scheduling or Kalman filtering
profiling feedback to stripe tasks using different cpu resources across
SMT siblings or playing graph games to meet RT deadlines, so it doesn't
look like very much at all is going on to me.
It's pretty obvious why everyone and their brother is grinding out
purported scheduler rewrites: the code is self-contained. However,
nothing interesting is coming of all this. Never before have so many
patches been written against the same file, accomplishing so little.
-- William Lee Irwin would like to see more
ambitious scheduler patches.
TCP window scaling and broken routers
Every TCP packet includes, in the header, a "window" field which specifies
how much data the system which sent the packet is willing and able to
receive from the other end. The window is the flow control mechanism used
by TCP; it controls the maximum amount of data which can be "in flight"
between two communicating systems and keeps one side from overwhelming the
other with data.
In the early days of TCP, windows tended to be relatively small. The
computers of that age did not have huge amounts of memory to dedicate
toward buffering network data, and the available networking technology was
not fast enough to make use of a larger window in any case. Modern network
interfaces can handle larger packets and keep more of them in flight at any
given time; they will perform better with a larger window. Some kinds of
high-speed long-haul links can have very high
bandwidth, but also high latency. Keeping that sort of pipe filled can
require a very large window; if a sending system cannot have a large number
of packets in transit at any given time, it will not be able to make use of
the bandwidth available. For these reasons, good performance can often
require very large windows.
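The window needed to keep a link busy is roughly its bandwidth multiplied by its round-trip time, a quantity usually called the "bandwidth-delay product." A minimal Python sketch; the link speeds and round-trip times are illustrative values, not measurements of any particular network:

```python
def bandwidth_delay_product(bits_per_second: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep a link of this speed and RTT full."""
    # bits/s * ms / 1000 gives bits in flight; divide by 8 for bytes.
    return bits_per_second * rtt_ms // 8000

lan = bandwidth_delay_product(100_000_000, 1)          # 100 Mbit/s, 1 ms RTT
long_haul = bandwidth_delay_product(1_000_000_000, 100)  # 1 Gbit/s, 100 ms RTT
print(lan, long_haul)  # 12500 12500000
```

The local link needs only about 12KB in flight, while the long-haul link needs about 12.5MB.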
The TCP window field, however, is only 16 bits wide, allowing for a maximum
window size of 64KB. The TCP designers must have thought that nobody would
ever need a larger window than that. But 64KB is not even close to what is
needed in many situations today.
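Since at most one window of data can be unacknowledged per round trip, the 16-bit field puts a hard ceiling on throughput. A small sketch of that arithmetic; the 100ms round-trip time is an assumed example value:

```python
MAX_UNSCALED_WINDOW = 2**16 - 1  # largest value a 16-bit window field can hold

def max_throughput_bps(window_bytes: int, rtt_ms: int) -> float:
    """Upper bound on throughput when at most window_bytes may be in flight."""
    return window_bytes * 8 * 1000 / rtt_ms

# With a 100 ms round trip, an unscaled window caps the connection at
# roughly 5.2 Mbit/s, however fast the underlying link may be.
print(f"{max_throughput_bps(MAX_UNSCALED_WINDOW, 100) / 1e6:.1f} Mbit/s")
```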
The solution to this problem is called "window scaling." It is not new;
window scaling was codified in RFC 1323 back in
1992. It is also not complicated: a system wanting to use window scaling
sets a TCP option containing an eight-bit scale factor. All window values
advertised by that system thereafter should be left-shifted by that scale factor;
a window scale of zero, thus, implies no scaling at all, while a scale
factor of five implies that window sizes should be shifted five bits, or
multiplied by 32. With this scheme, a 128KB window could be expressed by
setting the scale factor to five and putting 4096 in the window field.
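The encoding is plain bit shifting. The sketch below reproduces the 128KB example; the 14-bit cap on the shift comes from RFC 1323, while the function names are merely illustrative. Note that the right shift discards low-order bits, so a scaled window can only be expressed in multiples of 2**scale bytes.

```python
MAX_SHIFT = 14  # RFC 1323 limits the shift count to 14

def encode_window(window_bytes: int, scale: int) -> int:
    """Value to place in the 16-bit window field for a given scale factor."""
    if not 0 <= scale <= MAX_SHIFT:
        raise ValueError("scale factor out of range")
    field = window_bytes >> scale
    if field > 0xFFFF:
        raise ValueError("window too large for this scale factor")
    return field

def decode_window(field: int, scale: int) -> int:
    """Window size the receiver computes from the field and the scale factor."""
    return field << scale

field = encode_window(128 * 1024, 5)
print(field, decode_window(field, 5))  # 4096 131072
```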
To keep from breaking TCP on systems which do not understand window
scaling, the TCP option can only be provided in the initial SYN packet
which initiates the connection, and scaling can only be used if the SYN+ACK
packet sent in response also contains that option. The scale factor is
thus set as part of the setup handshake, and cannot be changed thereafter.
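The negotiation rule can be sketched as a single function. Under RFC 1323, each side's option announces the shift that applies to the windows that side advertises, and a shift above 14 is treated as 14; the function name and return convention are assumptions made for illustration.

```python
from typing import Optional, Tuple

MAX_SHIFT = 14

def negotiate_scaling(syn_scale: Optional[int],
                      synack_scale: Optional[int]) -> Tuple[int, int]:
    """Return (initiator_shift, responder_shift) for the rest of the connection.

    None means the window scale option was absent from that packet. Each
    shift applies to the windows advertised by the side that sent it.
    """
    if syn_scale is None or synack_scale is None:
        return (0, 0)  # option missing on either side: no scaling at all
    # RFC 1323: a shift count above 14 is treated as 14.
    return (min(syn_scale, MAX_SHIFT), min(synack_scale, MAX_SHIFT))

print(negotiate_scaling(7, 2), negotiate_scaling(7, None))  # (7, 2) (0, 0)
```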
The details are still being figured out, but it would appear that some
routers on the net are rewriting the window scale TCP option on SYN packets as
they pass through. In particular, they seem to be setting the scale
factor to zero, but leaving the option in place. The receiving side sees
the option, and responds with a window scale factor of its own. At this
point, the initiating system believes that its scale factor has been
accepted, and scales its windows accordingly. The other end, however,
believes that the scale factor is zero. The result is a misunderstanding
over the real size of the initiating system's receive window, with the system
behind the broken router believing it to be much smaller than it really is. If the
expected scale factor (and thus the discrepancy) is large, the result is,
at best, very slow communication. In many cases, the small window can
cause no packets to be transmitted at all, breaking TCP between the two
affected systems entirely.
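The misunderstanding can be reproduced with a few shifts. In this sketch the 5840-byte receive window and the 1460-byte segment size are illustrative values, not figures taken from the reports. The initiator shifts its window right by seven, but the peer, having seen a zero scale factor, uses the field unshifted. (The small loss in the intact case comes from the right shift discarding low-order bits.)

```python
def perceived_window(real_window: int, sender_shift: int, peer_shift: int) -> int:
    """Window the peer computes when it assumes peer_shift but the sender used sender_shift."""
    field = real_window >> sender_shift  # value actually placed in the 16-bit field
    return field << peer_shift           # value the peer reconstructs from it

MSS = 1460  # illustrative full-sized Ethernet segment

intact = perceived_window(5840, 7, 7)    # option passed through unmodified
mangled = perceived_window(5840, 7, 0)   # router rewrote the shift to zero
print(intact, mangled, mangled < MSS)    # 5760 45 True
```

A 45-byte window is smaller than a single full-sized segment, which is how a connection can stall completely.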
In the 2.6.7 kernel, the default scale factor is zero; in Linus's BitKeeper
tree and the 2.6.7-mm kernels, however, it has been increased to seven.
This change has brought the broken router behavior to light; suddenly
people running current kernels are finding that they cannot talk to a
number of systems out there. One of the higher-profile affected sites is
packages.gentoo.org. Gentoo
users are, unsurprisingly, not pleased.
As a way of making things work, Stephen Hemminger has proposed a patch which adds a calculation to select the
smallest scale factor which covers the largest possible window size. The
result on most systems is that the scale factor gets set to two. This
factor will still be corrupted by broken routers, but the resulting window
size (¼ of what it should be) is still large enough to allow
communication to happen.
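The calculation described above amounts to finding the smallest shift that still lets the largest window fit in 16 bits. This is a sketch of that rule only, not Hemminger's actual patch; the 174760-byte maximum window is an assumed example value.

```python
def smallest_scale(max_window: int, max_shift: int = 14) -> int:
    """Smallest shift that lets max_window fit in the 16-bit window field."""
    shift = 0
    while (max_window >> shift) > 0xFFFF and shift < max_shift:
        shift += 1
    return shift

max_window = 174760  # assumed largest receive buffer, in bytes
shift = smallest_scale(max_window)
print(shift, f"mangled window is 1/{1 << shift} of the real one")
# 2 mangled window is 1/4 of the real one
```

With a fixed shift of seven, the same router bug would shrink the window to 1/128 of its real size instead.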
The patch makes networking with systems behind broken routers work again,
but it has been rejected anyway. The
networking maintainers (and David Miller in particular) believe that the
patch simply papers over a problem, and that adding hacks to the Linux
network stack to accommodate broken routers is a mistake. If, instead, the
situation is left as it is, pressure on the router manufacturers should get
the problem fixed relatively quickly. For a few years now, Linux has had
a strong enough presence in the networking world that it can get
away with taking this sort of position.
In the meantime, anybody running a current kernel who is having trouble
connecting to a needed site can work around the problem with a command
like:
echo 0 > /proc/sys/net/ipv4/tcp_default_win_scale
or by adding a line like:
net.ipv4.tcp_default_win_scale = 0
to /etc/sysctl.conf.
Cryptographic signatures on kernel modules
The latest Fedora Rawhide kernels come with an interesting feature: the
ability to enforce cryptographic signatures on loadable modules. This
capability has a few uses:
- Preventing the kernel from loading modules which have somehow been
corrupted.
- Making it harder for an attacker to install a rootkit on
a compromised system.
- Enabling vendors of enterprise Linux distributions to block the
loading of unapproved modules into stock kernels.
(It should be noted that, at this point, no vendor has indicated any plans
to restrict module loading in this way.)
The code which handles signed modules was originally written by Greg
Kroah-Hartman; it has subsequently been fixed up in various ways by David
Howells. Greg wrote a Linux Journal
article about his work back in January.
The signature code works by looking at the most interesting ELF sections
within a module file: the .text (program code) and .data
(initialized data) areas. When the module is built, a script uses the
objdump utility to extract those sections; the result can be fed
to gpg to generate a signature. That signature is then patched
into the module as yet another section, called module_sig.
Overall, adding signatures is a relatively small change to the module build
process.
The signatures are not much use, however, if nobody checks them;
implementing that check within the kernel is a somewhat larger business.
The 2.6 kernel includes a whole cryptographic subsystem, but that code is
oriented toward the needs of networking and encrypted filesystems.
Verifying module signatures using public keys was not one of the objectives
when the crypto API was added. To support this task, several thousand
lines of code must be added to the kernel; they implement arbitrary-precision
integer arithmetic (this code came directly from GnuPG), DSA signature
verification (also from GnuPG), simple in-kernel key management, and the
actual verification of module data against signatures.
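To give a flavor of the big-number arithmetic involved, here is textbook DSA verification (as specified in FIPS 186) in Python, whose built-in integers are arbitrary-precision. This is not the GnuPG-derived kernel code, and the parameters are toy-sized for readability; real keys use primes hundreds of digits long. The signature (8, 1) was computed by hand with private key x = 3 and nonce k = 7.

```python
def dsa_verify(p: int, q: int, g: int, y: int, h: int, r: int, s: int) -> bool:
    """Check DSA signature (r, s) on hash value h against public key y.

    p, q and g are the domain parameters; h is the message hash reduced mod q.
    """
    if not (0 < r < q and 0 < s < q):
        return False
    w = pow(s, -1, q)  # modular inverse of s (Python 3.8+)
    u1 = (h * w) % q
    u2 = (r * w) % q
    v = (pow(g, u1, p) * pow(y, u2, p)) % p % q
    return v == r

# Toy parameters: q = 11 divides p - 1 = 22, and g = 4 has order 11 mod 23.
P, Q, G = 23, 11, 4
Y = pow(G, 3, P)  # public key for the toy private key x = 3
print(dsa_verify(P, Q, G, Y, 5, 8, 1))  # True: valid signature on hash 5
print(dsa_verify(P, Q, G, Y, 6, 8, 1))  # False: hash does not match
```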
As things stand in the patch currently, any public keys used to verify
modules are built directly into the kernel itself. Being able to add a
site-specific key at run time would be a convenient feature, but it would
also defeat the purpose of this whole exercise. Any attacker who is in a
position to load malevolent modules could just load a new key first, thus
circumventing the signature verification. Even as things stand, a kernel
using signature verification should be set up to not allow overwriting of
in-kernel key data by way of /dev/kmem and such.
With all that infrastructure in place, a relatively small set of patches
makes the module loader actually verify signatures. Once again, the
interesting sections are stripped out, and a checksum is generated with the
SHA1 algorithm. If the signature in the module (1) verifies against
a public key contained within the kernel, and (2) was computed over that
same checksum, the module checks out and can be loaded.
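The checksum half of that test can be sketched with Python's hashlib. The exact byte layout that the kernel hashes is not described here, so this sketch assumes the contents of .text and .data are hashed in that order, and it takes already-extracted sections as a dictionary rather than parsing ELF; the signature check itself is omitted.

```python
import hashlib

SIGNED_SECTIONS = (".text", ".data")  # assumption: fixed order, contents only

def module_checksum(sections: dict) -> bytes:
    """SHA1 over the contents of the signed sections, in a fixed order."""
    h = hashlib.sha1()
    for name in SIGNED_SECTIONS:
        h.update(sections.get(name, b""))
    return h.digest()

def checksum_matches(sections: dict, signed_checksum: bytes) -> bool:
    """Does the module still hash to the value the signature covers?"""
    return module_checksum(sections) == signed_checksum

module = {".text": b"\x55\x48\x89\xe5", ".data": b"hello", ".comment": b"gcc"}
checksum = module_checksum(module)
patched = {**module, ".text": b"\x55\x48\x89\xe6"}  # one code byte altered
print(checksum_matches(module, checksum), checksum_matches(patched, checksum))  # True False
```

Sections outside the signed set (.comment here) can change without affecting the result.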
In the code, one can see the traces of a kernel developer encountering an
interesting problem. In many systems, the SHA1 transform code is kept in a
loadable module. The module loader, when it attempts to verify the
signature of a different module, could well force the kernel to try loading
the SHA1 module. The module code, however, takes the module_mutex
semaphore very early in the process; the recursive attempt will thus simply
deadlock the whole thing. To avoid this problem, the crypto API was
enhanced with a crypto_alloc_tfm2() function which can be
instructed to not load any modules while setting itself up. The SHA1
code will have to be linked directly into the kernel if it is used for
module verification.
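The deadlock and its workaround can be modeled in a few lines. Everything below is hypothetical: the names are loosely inspired by the description above and are not the real kernel or crypto API interfaces. A non-recursive lock stands in for module_mutex, and a non-blocking acquire lets the model report the deadlock that a real kernel would simply hang on.

```python
import threading

# Hypothetical model; none of these names are real kernel interfaces.
module_mutex = threading.Lock()   # non-recursive, like the kernel's semaphore
available_algorithms = set()      # algorithms usable without loading a module

def request_module(name: str) -> None:
    """Load a module, which requires taking module_mutex."""
    # A blocking acquire would hang forever if this thread already holds the
    # lock; trying without blocking lets the sketch report that case instead.
    if not module_mutex.acquire(blocking=False):
        raise RuntimeError(f"deadlock: loading {name!r} while holding module_mutex")
    try:
        available_algorithms.add(name)  # pretend the module registered itself
    finally:
        module_mutex.release()

def alloc_transform(name: str, may_load_modules: bool):
    """Find an algorithm, optionally loading a module to provide it."""
    if name not in available_algorithms and may_load_modules:
        request_module(name)
    return name if name in available_algorithms else None

def can_verify_signature() -> bool:
    with module_mutex:  # taken very early, as in the module loader
        # Asking for SHA1 here must not trigger a nested module load.
        return alloc_transform("sha1", may_load_modules=False) is not None
```

Without SHA1 built in, can_verify_signature() returns False rather than deadlocking, which is why the SHA1 code must be linked into the kernel.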
Rawhide kernels come configured to verify any signatures found in modules,
but they will also happily load modules with no signature at all. There is
a configuration option which tightens things up, however, so that only
signed modules will be accepted. One wonders how much a proprietary module
vendor might pay to have their public key included in a distributor's stock
kernels once that option is turned on.
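The loading policy reduces to a small decision function. This is a sketch of the behavior described, with hypothetical parameter names; it assumes that a signature which is present but fails verification always causes the module to be refused.

```python
def may_load_module(has_signature: bool, signature_ok: bool,
                    require_signatures: bool) -> bool:
    """Loading policy as described for the Rawhide kernels (a sketch)."""
    if has_signature:
        # Any signature present is checked; assumed: a bad one refuses the load.
        return signature_ok
    # Unsigned modules load unless the stricter configuration option is set.
    return not require_signatures

print(may_load_module(False, False, False), may_load_module(False, False, True))  # True False
```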
Fun with /proc permissions
Herbert Poetzl
discovered some interesting
behavior in the 2.6 kernel: it seems that any user can set arbitrary
permissions on most files in
/proc. A patch had been merged back in
the 2.5 days which enabled changing of permissions, but an important check
got left out.
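The missing check is not spelled out here; this sketch assumes it is the usual Unix rule that only a file's owner, or a privileged user (simplified to uid 0), may change its mode. The Inode class and proc_chmod function are hypothetical, and the root-owned mode-0200 file merely stands in for something like /proc/sysrq-trigger.

```python
import stat
from dataclasses import dataclass

@dataclass
class Inode:
    uid: int
    mode: int

def may_chmod(inode: Inode, caller_uid: int) -> bool:
    """Assumed rule: only the owner (or root) may change permissions."""
    return caller_uid == 0 or caller_uid == inode.uid

def proc_chmod(inode: Inode, caller_uid: int, new_perms: int) -> None:
    if not may_chmod(inode, caller_uid):
        raise PermissionError("operation not permitted")
    # Keep the file-type bits and replace only the permission bits.
    inode.mode = stat.S_IFMT(inode.mode) | (new_perms & 0o7777)
```

Without the may_chmod() test, any local user could make such a file world-writable.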
For the most part, the security implications of this bug are small, but
real. Local users can make files in /proc inaccessible, which can
break commands (like ps) which rely on them. Making
/proc/sysrq-trigger world-writable would let any user create some obnoxious
mayhem. On the other hand, changing permissions in /proc/sys has
no useful effect: the sysctl code performs its own permissions checking on
top of what the filesystem does. The actual process entries under
/proc do their own checking as well, and do not allow the
permissions to be changed.
The fix is simple, and has been merged for 2.6.8. But some developers
wondered why anybody would want to mess with permissions in /proc
in the first place. It turns out that there is some information there
which, in some cases, people would like to hide from other users on the
system. Command lines for specific processes and TCP connection tracking
information were mentioned as specific examples. So permissions tweaking
in /proc will remain - but not just anybody will be able to do
it.
Page editor: Jonathan Corbet