Kernel development

Brief items

Kernel release status

The current 2.6 prepatch is 2.6.11-rc3, released by Linus on February 2. This prepatch adds an XFS update, a set of out-of-memory killer fixes, a generic transport class mechanism (which replaces the SCSI transport code), some architecture updates, the removal of bcopy(), a fix for writable module parameters in sysfs (it never actually worked before), and various fixes. See the long-format changelog for the details.

Linus's BitKeeper repository contains a small number of patches, including some IDE updates, some additional checking in read() and write() (see below), a DMA blacklist for problematic serial ATA drives, and a handful of fixes.

The current -mm tree is 2.6.11-rc3-mm1. Recent changes to -mm include a firewire update, the address space randomization patches (covered on last week's security page), the "BIO pool" mechanism, the removal of the realtime rlimit patch (see below), and more fixes.

There have still been no 2.4.30 prepatches.

Kernel development news

Quotes of the week

A large part of kernel history is currently practically locked into bk. bk isn't doing what I need, so naturally I'm looking for alternatives, but I don't have the freedom to take my data and try it with some other tool. Was that really part of the deal when bk was introduced that I'm denied of this freedom?
-- Roman Zippel

It's exactly the same as a file system. If you put some files into a file system does the file system creator owe you the knowledge of how those files are maintained in the file system? Since when is that part of the deal?
-- Larry McVoy

If you must follow this conversation, the thread can be found over here.

Audio latency goes full circle

Two weeks ago, it appeared that a solution to the problem of low-latency scheduling for audio applications had been found. Ingo Molnar's approach, which allowed unprivileged processes to use the realtime scheduling modes as long as they did not use more than an administrator-specified portion of the available CPU time, seemed like a reasonably straightforward way to go. Ingo's patch had gone into the -mm tree for further testing.

The rlimit approach keeps a rogue process from taking over the system entirely. It does not, however, prevent abuse by poorly-behaved software. If even limited access to realtime scheduling became widely available on Linux systems, it would only be a matter of time until developers figured out that they could make their programs seem faster by using the realtime mode. Proprietary applications could be particularly problematic in this regard; distributors would likely rip out unwarranted realtime scheduling calls in free software that they ship, but that cannot be done with proprietary code.

Other concerns with the rlimit approach include the need for some audio applications to get fast access to the CPU even if they require 100% of the available time, and general unease with tweaking the scheduler for this use. The end result is that the rlimit patch has come back out of -mm, and Ingo has said:

i'm not opposed to the LSM solution per se, especially given that none of the other solutions in existence are fully satisfactory (and thus acceptable for the scheduler currently). The LSM patch is clearly the least intrusive solution.

Those who have been following the discussion will remember that the whole long thing began because certain kernel developers did not feel that the realtime security module (which gives members of an administrator-specified group access to realtime scheduling) was acceptable for inclusion. So the discussion has come back to where it started, and it appears that the realtime security module will be merged (though that had not happened as of this writing). Ingo apologized for the whole thing, explaining it this way:

it is just an unfortunate situation that the issue here is _not_ clear-cut at all. It is a longstanding habit on lkml to try to solve things as cleanly and generally as possible, but there are occasional cases where this is just not possible.

One remaining problem with the realtime security module is that it gives audio users the right to monopolize the processor with any program they run, not just audio utilities. Making the audio programs run in a setgid mode might seem like a way around that issue, except for the fact that the GTK+ toolkit actively prevents things from working that way. The unfortunate result is that users must be given more privilege than they actually need. Most of the time, that should be acceptable; multi-user audio workstations are likely to be relatively rare.
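
Whichever mechanism ends up granting the privilege, an audio application asks for realtime scheduling in the same way. The following is a minimal sketch of that request, assuming the standard POSIX sched_setscheduler() interface; the function name request_realtime() is illustrative and does not come from any of the patches discussed here.

```c
#include <errno.h>
#include <sched.h>

/*
 * Ask the kernel to run the calling process under SCHED_FIFO at the
 * lowest realtime priority. Returns 0 on success, or the errno value
 * from the failed call (typically EPERM for a process that has not
 * been granted the privilege).
 */
int request_realtime(void)
{
    struct sched_param param = { 0 };
    int prio = sched_get_priority_min(SCHED_FIFO);

    if (prio < 0)
        return errno;
    param.sched_priority = prio;

    /* A pid of 0 means "the calling process". */
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)
        return errno;
    return 0;
}
```

Because the call is this short, it is easy to see why there is concern about applications adopting it simply to appear faster.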

read() and write() access checking

Long ago, when the 2.0 kernel was the state of the art, the implementation of the read() and write() system calls (and readv() and writev() too) behaved a little differently than it does now. Then, as now, the main purpose of the core implementation of those system calls was to pass the call on to the appropriate function in the filesystem code or device driver handling the file of interest, after dealing with any relevant file locking details. In many ways, sys_read() and friends in 2.6 look very much like they did in 2.0.

The 2.0 implementation differed, however, in that it checked whether the calling process had the ability to read or write the buffer it passed into the kernel. The semantics of a read() call, say, should be the same regardless of where the data is being read from. So it made sense to check, before invoking the VFS or device driver, that the buffer passed to read() was writable by the calling process. In 2.2, that check went away, possibly as part of the big changes made to how user-space access checks were implemented. Performing those checks became entirely the responsibility of the lower-level code.

Linus recently merged a patch which restores the upper-level checks for 2.6.11. The reason given with the patch is that checks performed in lower-level code only verify the range of memory which will actually be read from or written to. If that range is smaller than the application requested (because the file is not that long, say), part of the range requested by the application will not be checked. The operation of the system is entirely correct in this case, but an opportunity to flag a bug in the calling program will have been missed.

It also doesn't hurt that placing the check at the entry point to the kernel ensures that it will be done in all situations. One less opportunity for security problems resulting from forgotten checks in lower-level code can only be a good thing. It seems almost certain that at least one such vulnerability must exist somewhere in the 2.6 kernel.

One might conclude that low-level code, such as device drivers, need no longer perform the access_ok() check, since it is now being handled at a higher level. A prudent developer, however, would probably leave that check in place. It is quite cheap on most architectures (it generally just ensures that the given buffer is not located in kernel space), and the higher-level checks went away once before. Safe is better than sorry, especially when being safe is so easy.
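
The difference between the two placements of the check can be modeled in user space. The sketch below is not kernel code: USER_LIMIT, user_range_ok(), check_transferred(), and check_requested() are illustrative names, and the 0xC0000000 boundary is only an assumption (the classic 3GB/1GB split on 32-bit x86). It shows how a check limited to the bytes actually transferred can pass for a buffer that a check of the full request would reject.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed boundary between user and kernel addresses (illustrative). */
#define USER_LIMIT ((uintptr_t)0xC0000000u)

/*
 * Model of an access_ok()-style range test: true when the whole range
 * [addr, addr + size) lies below USER_LIMIT. The subtraction form
 * avoids overflow in addr + size.
 */
bool user_range_ok(uintptr_t addr, size_t size)
{
    return size <= USER_LIMIT && addr <= USER_LIMIT - size;
}

/* Lower-level style: validate only the bytes actually transferred. */
bool check_transferred(uintptr_t buf, size_t requested, size_t available)
{
    size_t n = requested < available ? requested : available;

    return user_range_ok(buf, n);
}

/* Upper-level style: validate the full range the caller asked for. */
bool check_requested(uintptr_t buf, size_t requested)
{
    return user_range_ok(buf, requested);
}
```

For a buffer at 0xBFFFF000, a request of 8192 bytes, and a file holding only 100 bytes, check_transferred() returns true while check_requested() returns false. That rejected request is the kind of bug in the calling program that an entry-point check can flag.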

(For completeness, it's worth noting that Linus merged another patch which ensures that a read or write operation does not overflow the file offset.)
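
An offset-overflow test of this kind can be sketched as follows. This is a user-space model under stated assumptions, not the merged patch: offset_range_ok() is a hypothetical name, and int64_t stands in for the kernel's signed 64-bit file offset type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True when a transfer of count bytes starting at file offset pos stays
 * within the range of a signed 64-bit offset. The comparison is written
 * so that pos + count is never computed, since signed overflow is
 * undefined behavior in C.
 */
bool offset_range_ok(int64_t pos, size_t count)
{
    if (pos < 0)
        return false;
    return (uint64_t)count <= (uint64_t)(INT64_MAX - pos);
}
```

A transfer that would end exactly at the largest representable offset is accepted; one byte more is rejected.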

More hooks for kernel events

The kernel has, for a while now, been accumulating hooks for informing user space when things happen. Some of the current mechanisms include:

  • The hotplug mechanism, which invokes a user-space program (/sbin/hotplug by default) when kobjects are registered or unregistered (generally in response to the addition or removal of hardware on the system).

  • The Linux security module (LSM) hooks, which enable a loadable module to respond to (and possibly veto) dozens of actions by user-space processes. The LSM mechanism is used by, among other things, SELinux and the realtime LSM module.

  • The lightweight audit framework, which uses a netlink socket to pass information on kernel events to user space, with the idea that these events will be logged somewhere.

  • The kernel events mechanism, which also uses netlink, is a simple scheme for notifying user space of events which might be of interest to the user(s).
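
As an illustration of the first mechanism in the list, here is a rough sketch of the core of a hotplug helper. It assumes the usual convention in which the kernel passes the subsystem name as the helper's first argument and describes the event through environment variables such as ACTION and DEVPATH; the function name describe_hotplug_event() and the output format are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Format a one-line description of a hotplug event into out. The
 * subsystem would normally come from argv[1] of the helper; ACTION and
 * DEVPATH are read from the environment. Returns the snprintf() result.
 */
int describe_hotplug_event(const char *subsystem, char *out, size_t outlen)
{
    const char *action = getenv("ACTION");
    const char *devpath = getenv("DEVPATH");

    return snprintf(out, outlen, "%s: %s %s",
                    subsystem ? subsystem : "(none)",
                    action ? action : "(no ACTION)",
                    devpath ? devpath : "(no DEVPATH)");
}
```

A real helper would dispatch on the subsystem and action rather than just formatting them.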

One might think that, at this point, the kernel is sufficiently well instrumented that more hooks would be unnecessary. But more are on the way.

One of those is the relay fork module, proposed by Guillaume Thouvenin. Its sole purpose is to inform interested user-space processes when a process forks; the intended user is the enhanced Linux system accounting project. Rather than use one of the existing mechanisms for conveying information to user space, the relay fork patch works by sending a signal to the interested process(es) whenever a fork occurs.

The patch works by adding a new sysfs directory (/sys/relayfork) with a couple of control attributes. The attribute signal controls which signal is sent; by default, signal 33 (which is in the realtime signal range on most architectures) is used. The other attribute (processes) contains a list of the processes receiving these signals. Registering a process for receipt of "relay fork" signals is simply a matter of writing its process ID to the processes attribute.
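
Registration under this proposal could look something like the sketch below. It is only an illustration: the helper name register_fork_listener() is hypothetical, the attribute path is passed in by the caller (for the proposed patch it would be /sys/relayfork/processes), and writing the PID as a decimal number followed by a newline is an assumption about the expected format.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Write a process ID, in decimal, to the given attribute file.
 * Returns 0 on success and -1 on failure.
 */
int register_fork_listener(const char *attr_path, pid_t pid)
{
    FILE *f = fopen(attr_path, "w");

    if (f == NULL)
        return -1;
    if (fprintf(f, "%ld\n", (long)pid) < 0) {
        fclose(f);
        return -1;
    }
    /* fclose() flushes the buffer, so a write error can surface here. */
    return fclose(f) == 0 ? 0 : -1;
}
```

A process registering itself would want to install a handler for the configured signal first, since the default action for an unhandled realtime signal is to terminate the process.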

This patch may eventually go in, but probably not with the signal mechanism. Guillaume was encouraged to use the kernel events mechanism instead, and he has agreed that it is a workable solution.

Meanwhile, the vSecurity project is working to put together a number of hardening technologies in a form suitable for merging into the mainline. To that end, a couple of new LSM hooks have been proposed. The first adds a hook for invocations of the chroot() call, which, interestingly, has no such hook now. The purpose is not so much to control the use of chroot() as to note that it has happened and take steps, in other security hooks, to ensure that the process does not break out of its restricted subtree.

The other patch adds a hook to chmod(). This one is unlikely to be merged, since a separate hook, which is called for inode attribute changes, already exists. The vSecurity hacker (Lorenzo Hernández García-Hierro) has indicated that he has other hooks he wishes to place, but those have not yet been posted for review.

Page editor: Jonathan Corbet

Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds