Kernel development

Brief items

Kernel release status

The current 2.6 prepatch is 2.6.22-rc4, released on June 4. It adds a few hundred fixes aimed at further stabilizing 2.6.22. See the long-format changelog for the details.

As of this writing, no patches have found their way into the mainline since -rc4.

The current -mm tree is 2.6.22-rc4-mm1. Recent changes to -mm include an operation to disable all I/O space access (for virtualized guests), a lengthy patch set aimed at fixing a page fault deadlock, some suspend/hibernate work, the O_CLOEXEC patch (see below), support code for Xen on the x86-64 architecture, ext4 support for the upcoming fallocate() system call, and the containers patch set.

For older kernels: 2.6.16.52 was released on May 31 with a handful of fixes.

Kernel development news

Quotes of the week

I'm convinced there's some contest to see who can make the worst graphical mail client for Linux. I'm not sure what the prize is, or who's winning, but the entries so far are horrific.
-- Dave Jones

Lotus Notes has no serious competition.

Andy's patch-checking script will (should) detect wordwrapping, tab-expansion and hopefully space-stuffing. When we get that sorted out, people who submit broken patches to one of the lists should get a robot reply within minutes telling them what they did wrong, so things will become largely self-correcting.

I am sooooo looking forward to that thing. <Sends note to Nobel prize committee>

-- Andrew Morton

Fun with file descriptors

Last week's article on syslets briefly mentioned a problem with using file descriptors for low-level communications with the kernel. There is a single namespace for file descriptors, combined with a strict rule for how those descriptors are allocated. As long as the application is fully in charge of that space all works well, and the "lowest available descriptor" rule can be relied upon. As soon as hidden levels (the C library in particular) start using file descriptors for their own purposes, though, the potential for conflicts and confusion at the application level arises. An application which makes a mistaken assumption about where a file descriptor will be allocated, or which indiscriminately "cleans up" open descriptors belonging to the libraries, will break. This problem is evidently real, to the point that glibc goes out of its way to avoid using internal file descriptors for anything.
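The "lowest available descriptor" rule is easy to observe from user space. What follows is a minimal sketch, not taken from any of the patches under discussion; the function name is made up for illustration, and it assumes that /dev/null exists and that no other thread is opening or closing descriptors at the same time.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Illustrative helper (hypothetical name): returns 1 if a newly opened
 * descriptor lands in the lowest free slot, 0 if it does not, and -1 if
 * the setup itself failed. Assumes a single-threaded caller.
 */
int lowest_fd_is_reused(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;

    int second = open("/dev/null", O_RDONLY);
    if (second < 0) {
        close(first);
        return -1;
    }

    /* Free the lower of the two slots; 'second' stays open above it. */
    close(first);

    /* The next open() should be handed the slot that was just freed. */
    int third = open("/dev/null", O_RDONLY);
    if (third < 0) {
        close(second);
        return -1;
    }
    int reused = (third == first);

    close(third);
    close(second);
    return reused;
}
```

A library which quietly opens a descriptor of its own between the close() and the final open() would take that slot instead, which is exactly the sort of surprise described above.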

This issue is a problem for kernel developers. They would rather not create new, file-descriptor-based services (completion events for syslet-based asynchronous I/O, for example) if glibc will not use those services. So there has been a search for alternatives, most of which involve creating a separate space for "system" file descriptors. Linus suggested one way of doing this:

Which *could* be something as simple as saying "bit 30 in the file descriptor specifies a separate fd space" along with some flags to make open and friends return those separate fd's. That makes them useless for "select()" (which assumes a flat address space, of course), but would be useful for just about anything else.

Davide Libenzi took this idea forward with a patch to create a non-sequential file descriptor area. The current kernel tracks file descriptors in a linear array - a technique which works well as long as the "lowest available descriptor" rule holds. As soon as one starts setting high-order bits in file descriptor numbers, however, the linear array becomes rather less practical. So Davide's patch creates a separate, linked-list data structure used for the non-sequential file descriptor range. The second part of the patch set then fixes up the dup2() system call to use the new file descriptor range. The normal behavior of dup2() has not changed, but if the destination file descriptor is passed as FD_UNSEQ_ALLOC, a random file descriptor will be allocated from the non-sequential area. A specific file descriptor in that area can be requested by passing a number higher than FD_UNSEQ_BASE.
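The split between a linear array and a linked list can be pictured with a small user-space model. This is only a sketch of the general idea and not Davide's code: every name here is hypothetical, the base value simply follows the "bit 30" suggestion quoted above, and the array size is arbitrary.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_UNSEQ_BASE (1 << 30) /* assumed: bit 30 marks the second space */
#define SKETCH_SEQ_MAX    64        /* assumed size of the linear array */

/* One entry in the non-sequential range. */
struct unseq_node {
    int fd;
    void *file;
    struct unseq_node *next;
};

/* Linear array for ordinary descriptors, linked list for the rest. */
struct fd_table {
    void *seq[SKETCH_SEQ_MAX];
    struct unseq_node *unseq;
};

/* Install 'file' at 'fd'; returns 0 on success, -1 if the slot is unusable. */
int fd_table_install(struct fd_table *t, int fd, void *file)
{
    if (fd < 0 || file == NULL)
        return -1;

    if (fd < SKETCH_UNSEQ_BASE) {
        if (fd >= SKETCH_SEQ_MAX || t->seq[fd] != NULL)
            return -1;
        t->seq[fd] = file;
        return 0;
    }

    /* Non-sequential range: refuse duplicates, then push onto the list. */
    for (struct unseq_node *n = t->unseq; n != NULL; n = n->next)
        if (n->fd == fd)
            return -1;

    struct unseq_node *node = malloc(sizeof(*node));
    if (node == NULL)
        return -1;
    node->fd = fd;
    node->file = file;
    node->next = t->unseq;
    t->unseq = node;
    return 0;
}

/* Look up 'fd' in whichever structure covers its range; NULL if absent. */
void *fd_table_lookup(const struct fd_table *t, int fd)
{
    if (fd < 0)
        return NULL;
    if (fd < SKETCH_UNSEQ_BASE)
        return fd < SKETCH_SEQ_MAX ? t->seq[fd] : NULL;

    for (const struct unseq_node *n = t->unseq; n != NULL; n = n->next)
        if (n->fd == fd)
            return n->file;
    return NULL;
}
```

Lookups in the low range remain a simple array index, while the high range pays for a list walk; that cost is tolerable only as long as the list stays short.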

This approach has the advantage of not requiring any new system calls or changing the default user-space binary interface at all. But according to Ulrich Drepper, that attribute is not an advantage at all. Since using this capability requires application changes in any case, Ulrich would rather just see a new system call created; he proposes:

    int nonseqfd(int fd, int flags);

This system call would duplicate the open file descriptor fd into the non-sequential space, optionally closing fd in the process. The flags parameter would allow other attributes of the new file descriptor to be controlled. Of particular interest is whether that descriptor shows up in the /proc/pid/fd directory. The optimal way of closing all open file descriptors, apparently, is to read that directory to see which descriptors are currently open. Keeping special descriptors out of that directory (perhaps shifting them to a parallel private-fd directory) will prevent well-meaning applications from closing the library's file descriptors.
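For readers who have not seen the directory-reading technique, a rough sketch follows. It assumes a Linux system with /proc mounted and uses /proc/self/fd, the calling process's own view of that directory; the helper names are invented for illustration.

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns nonzero if 'fd' currently refers to an open descriptor. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/*
 * Close every open descriptor numbered 'min_fd' or higher by walking
 * /proc/self/fd. Returns the number of descriptors closed, or -1 if the
 * directory could not be read.
 */
int close_fds_from(int min_fd)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int dir_fd = dirfd(dir); /* the directory stream's own descriptor */
    int closed = 0;
    struct dirent *ent;

    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);

        /* Skip "." and ".." and anything else that is not a number. */
        if (end == ent->d_name || *end != '\0')
            continue;
        /* Leave low descriptors and the directory stream itself alone. */
        if (fd < min_fd || fd == dir_fd)
            continue;
        if (close((int)fd) == 0)
            closed++;
    }

    closedir(dir);
    return closed;
}
```

Code like this has no way of knowing which descriptors belong to a library, which is why a descriptor that never appears in the directory would be safe from it.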

It has been suggested that the open() system call should get a flag which would cause it to select a non-sequential file descriptor from the outset, eliminating the need for a separate call to nonseqfd(). There are, however, a number of system calls which create file descriptors but which have no flags parameter and which, thus, will never be able to return non-sequential file descriptors; socket() is a classic example. So there will still be a need for a system call which can duplicate a file descriptor into the new space.

Ulrich has requested that all file descriptors in the non-sequential space be allocated randomly. He would rather not ever see a situation where application developers think they can rely on any specific allocation behavior when using that space. There have also been suggestions that the non-sequential space could be useful for high-performance applications which hold large numbers of file descriptors open - web servers, for example. Such applications usually have no use for the "lowest available descriptor" guarantee and would happily do without the overhead of implementing that guarantee. Davide's current implementation does not appear to have been written with thousands of non-sequential file descriptors in mind, though.

On another front, Ulrich has been working on a race condition which comes up with certain types of applications. It is possible to request that a file descriptor be automatically closed if the process performs an exec(); the fcntl() system call is used for this purpose. The problem is that there is some time between when the file descriptor is created (with an open() call, perhaps) and the subsequent fcntl() call. If another thread forks and runs a new program between those two calls, its copy of the new file descriptor will not have the close-on-exec flag set and will thus remain open.

Solving that problem generally will take some work, but fixing the open case is relatively easy. Ulrich is proposing a new O_CLOEXEC flag for this purpose. There does not appear to be much opposition to this idea, so the new flag might well make an appearance in 2.6.23.

The thorny case of kmalloc(0)

People running 2.6.22-rc kernels have likely noticed the occasional warning and traceback associated with zero-length allocations. It turns out that there is code in the kernel which asks kmalloc() to allocate a zero-sized object. Nobody really knew how often this happens until the warning went in as part of the SLUB allocator patch set; now that these cases are turning up, it seems that deciding what to do about them is harder than one might expect.

One possibility is to return NULL. On the face of it, this option would appear to make sense; the caller has requested that no memory be allocated, and kmalloc() has complied. The problem here is that a NULL pointer is already loaded with meaning. It says that the allocation has failed (which it didn't - there is always enough memory left to allocate another zero bytes) and is often used as an indicator that a particular structure or subsystem has not been initialized. More to the point, it seems that there is an occasional situation where a zero-length allocation is not entirely incorrect; consider the allocation of a structure which, as a result of the kernel's configuration options, has been optimized down to zero members. Coding around such cases is possible, but it is not clear that adding more twists and turns is worth the trouble when zero-length allocations can just be handled in kmalloc().

Another possibility is to return the smallest object that kmalloc() can manage - currently eight bytes. That is what kmalloc() has silently done for years. This solution appears to work, but it has the disadvantage of returning memory which can be written to. A zero-length allocation can arguably be correct, but it's hard to find anybody who would agree that storing data into a zero-length chunk of memory makes sense. Even highly compressed data cannot be expected to fit into that space in all situations. People who worry about finding bugs would much prefer that any attempt to actually write to memory allocated with kmalloc(0) caused the kernel to protest in a very noisy way.

That brings us to the third possibility: this patch from Christoph Lameter which causes kmalloc(0) to return a special ZERO_SIZE_PTR value. It is a non-NULL value which looks like a legitimate pointer, but which causes a fault on any attempt at dereferencing it. Any attempt to call kfree() with this special value will do the right thing, of course.
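The idea can be modeled in user space with a thin wrapper around malloc(). This is a sketch of the concept rather than Christoph's patch: the wrapper names are hypothetical, and the value 16 is an assumption chosen because it is not NULL and falls in a page which is normally left unmapped.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed value for illustration: non-NULL, but not a usable address. */
#define ZERO_SIZE_PTR ((void *)16)

/* True for both NULL and the special zero-size value. */
int zero_or_null_ptr(const void *p)
{
    return (uintptr_t)p <= (uintptr_t)ZERO_SIZE_PTR;
}

/* Allocation wrapper: zero-length requests succeed without using memory. */
void *sketch_alloc(size_t size)
{
    if (size == 0)
        return ZERO_SIZE_PTR; /* not NULL, so not mistaken for failure */
    return malloc(size);
}

/* Free wrapper: the special value, like NULL, is accepted and ignored. */
void sketch_free(void *p)
{
    if (zero_or_null_ptr(p))
        return;
    free(p);
}
```

A caller which checks the result against NULL sees a successful allocation, while any attempt to read or write through the returned pointer should fault rather than silently scribble on a real object.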

The final option seems like it should be the right course, allowing zero-length allocations without masking any subsequent incorrect behavior. Surprisingly, though, there is an objection here too: now every call to kmalloc(0) returns the same value. One might not think this would be a problem; subsequent zero-length allocations will all be zero bytes apart, just like the C standard says they should be. But some developers are worried that this behavior might confuse code which compares pointers to see if two objects are the same. There is also, apparently, an established coding pattern (in user space) which uses zero-length allocations as a way of generating a unique cookie value. If all zero-length allocations return the same pointer, these cookies lose their uniqueness.

That worry appears unlikely to carry the day, though; Linus says:

If people can't be bothered to create a "random ID generator" themselves, they had damn well better use "kmalloc(1)" rather than "kmalloc(0)" to get a unique cookie. Asking the allocator to do something idiotic because some idiot thinks a memory allocator is a cookie allocator is just crazy.

I can understand that things like user-level libraries have to take crazy people into account, but the kernel internal libraries definitely do not.

Add to this argument the fact that nobody seems to have discovered such a use of kmalloc() in the kernel yet, and the "unique cookie" argument runs out of steam. So some form of the ZERO_SIZE_PTR patch, with the warning removed, will probably find its way into the mainline - but probably not before 2.6.23.

Wireless regulatory compliance

Wireless networking vendors have, over time, developed a large and imaginative set of reasons for their refusal to make free drivers and hardware programming information for their products available. One of those reasons is regulatory compliance; if untrusted parties can modify a wireless device driver, they may (accidentally or not) program the device to operate outside of the rules governing frequency use and power levels in their specific area. Some vendors apparently believe that they could be held responsible for what others do with their hardware, especially in parts of the world with relatively aggressive enforcement of regulations on spectrum use. While the United States is often mentioned in such discussions, people who have studied the issue tend to worry more about Japan. That said, there are regulations worldwide - differing regulations - and a Linux system with radio transmitters in it will be expected to comply with those regulations.

To that end, Larry Finger has recently returned with a new version of his proposal for a mechanism which would enable Linux to operate wireless adapters in a legally-sanctioned way. The scheme involves the creation of a database describing the regulatory regime in various parts of the world. At system startup, a user-space daemon would determine (somehow) where the system was located, obtain the relevant parameters from the database, and feed them into the mac80211 subsystem, which would then instruct drivers on how to program their devices. In the absence of instructions from user space, the kernel would fall back to a minimal configuration known to be legal worldwide - if such a configuration can be found.

There was some interesting feedback, starting with the assertion that the mac80211 layer is the wrong place for a regulatory module. There are wireless adapters which have full MAC capability built into them, and which will not use mac80211, but these devices have the same regulatory issues. Beyond that, Linux systems can contain other sorts of transmitters, starting with Bluetooth adapters and going on from there. If this sort of regulatory compliance is to be added to the kernel (and cleaned out of various drivers where it already exists), it would be best to add it once and have it work in all situations. It turns out that some thought has gone into a kernel "frequency broker" module which would handle this task, but development has not yet gone very far.

Overly zealous regulatory enforcement is a concern for some users. There are people running Linux who have licenses allowing spectrum use which is denied to most of us. They would, understandably, like to be able to use their hardware (when it is capable of such use) in ways which take advantage of their wider permissions. If the kernel eventually adopts a regulatory mechanism which cannot be overruled, it will prevent some users from doing things which they are legally entitled to do. Until they go into the code and disable the regulatory code, at least.

Of course, if legal users can override the regulatory mechanism, others can as well. That leads to the question of whether a regulatory regime implemented in free software can ever be good enough to satisfy the authorities. Luis Rodriguez pointed out an April 2007 ruling from the U.S. Federal Communications Commission which suggests that there could be trouble there:

The Commission did not address the possibility of manufacturers using open source software to implement security measures. However, we recognize that hardware and software security measures that interact with the open source software need not be subject to an open source agreement. We are hereby stating that it is our policy, consistent with the intent of Cognitive Radio Report and Order and Cisco's request, that manufacturers should not intentionally make the distinctive elements that implement that manufacturer's particular security measures in a software defined radio public, if doing so would increase the risk that these security measures could be defeated or otherwise circumvented to allow operation of the radio in a manner that violates the Commission's rules. A system that is wholly dependent on open source elements will have a high burden to demonstrate that it is sufficiently secure to warrant authorization as a software defined radio.

(Emphasis added).

If free regulatory code will never be good enough for regulatory agencies, one might well ask whether it is worth the trouble for Linux developers to implement such a module in the first place. One could answer that operating transmitters in a way consistent with their licensing is the correct thing to do, regardless of whether governments see it as being sufficiently robust. But, if the main concern is keeping governments happy, the only real solution may be to do as Intel has done and move regulatory compliance back into the device's firmware and away from the host operating system altogether. This approach brings an additional benefit in the form of eliminating one excuse for not releasing free drivers.

Page editor: Jonathan Corbet
Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds