Kernel development
Brief items
Kernel release status
The current development kernel remains 2.5.69; there have been no development kernel releases since May 4.
Patches continue to accumulate in Linus's BitKeeper repository, however; it now contains some NFS fixes, sysfs support for network devices, an XFS update, some scheduler fixes, a change to the request_module() prototype, some framebuffer fixes, more annotations of user-space pointers and makefile support for Linus's (still unreleased) kernel source analyzer, 48-bit IDE addressing support, a (hopefully) working IDE tagged command queueing implementation, the BIO "walking" API, more devfs cleanups (devfs_register() is gone), the USB "gadget" subsystem, a wireless networking update (and quite a bit of networking work in general), dynamic block I/O request allocation, a fair amount of SCSI cleanup work, a generic x86 subarchitecture, a number of TTY layer cleanups, a USB update, an IA-64 update, and a vast number of other fixes -- some 700 changesets in all.
The current stable kernel is 2.4.20; no 2.4.21 prepatches have been released since 2.4.21-rc2 on May 8.
Kernel development news
The second "must fix" IRC session
The second IRC discussion on the 2.6 "must fix" list was held on May 21. The full transcript is available for those who are interested. Below is a quick summary of some of the high points.
- Power management. Patrick Mochel is in a debugging stage;
in any case, power management changes could go in after 2.6.0.
- Frame buffer restore after suspending: lots of pending issues, especially
on 3D systems. "It's gonna be hell and will take time." Not
necessarily a show stopper for 2.6.0.
- IDE suspend/resume: patches exist which put suspend and
resume operations on request queues so they are properly serialized
with other activity.
- I/O scheduler selection: some way of choosing between I/O
schedulers is needed before the new schedulers can be merged. The
anticipatory scheduler still has enough problems on some loads that it
cannot go in otherwise.
- qlogic drivers: several exist, none really work. Consensus
seems to be that the "feral" driver is the one to go forward with.
- Crypto loopback driver: would be nice for 2.6, but nobody
seems to be working on it.
- ext3 big kernel lock removal: Patches exist, but some "deep
surgery" is required to make it all work. There are concerns that
none of the Linux journaling filesystems perform all that well on SMP
systems.
- ext2 and ext3 block allocations: the filesystems can allocate
blocks poorly. Not necessarily a 2.6.0 issue.
- IRQ balancing: mostly a question of whether the user space
tools should be bundled with the kernel. What's really needed,
perhaps, is a better distribution mechanism for user-space kernel
tools.
- klibc: was awaiting users before it could be merged into 2.5,
but those users have not yet materialized. Alexander Viro has things
that would use it, so this work may move forward before 2.6.
- kexec (booting one kernel directly from another): is working,
but "seems intrusive and late." It's very useful for some users,
though.
- Object-based reverse mapping VM: it still has issues with
highly-shared pages and nonlinear mappings. The latter problem has
been solved. Some think that, if objrmap is merged at all, it should
be marked experimental.
- Networking: Andrew says "net/ is boring, it just works all the
time."
- Early console/printk and a general API for reporting errors to
user space. This stuff looks too late and slow to get in this time
around.
- Kbuild: a better way of building external modules, and allowing
separate source and object directories. "Both sound important."
Conclusion was that it will happen, but it could be after 2.6.0.
- Firmware loading: Greg KH pointed out the driver model firmware
interface currently in patch form (see this
LWN article). Should be merged soon.
- ACPI: still has problems, but work is proceeding.
- Asynchronous I/O: I/O to files still is not truly asynchronous. Patches exist, but are "late, a bit intrusive, a bit messy." People think they are important, however; work will be done to clean them up.
No further discussions have been scheduled at this time.
Kernel policy issues: compatibility and configuration
When the kernel is deep into a feature freeze and there are not a whole lot of new developments to worry about, it must be time for some policy debates. A couple of issues that have come up over the last week or so - both involving the FUTEX subsystem - cast an interesting light on how policy decisions are made, and how the kernel project interacts with its user community.
A "FUTEX" is, of course, a fast user-space mutual exclusion primitive. FUTEXes are similar to SYSV semaphores in terms of the functionality they provide, though no attempt has been made to be compatible with the SYSV semaphore interface. A FUTEX is also fast: if there is no contention for a particular lock (which should be the case most of the time) there is no need to go into the kernel at all. An actual system call is only made when a process must wait. FUTEXes are used by the blindingly fast 2.5 threading implementation; other applications will certainly be found for them as they become more widely available.
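The "no system call when uncontended" property can be illustrated with a small user-space sketch. This is not the glibc or kernel implementation; the sketch_lock structure, its field names, and the kernel_entries counter are all hypothetical, and the slow path is only counted rather than performed. It assumes a C11 compiler with <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock: word is 0 when free, 1 when held. */
struct sketch_lock {
    atomic_int word;
    int kernel_entries; /* how often the slow path would have been taken */
};

/* Try to take the lock with a single atomic compare-and-swap. */
static bool sketch_trylock(struct sketch_lock *l)
{
    int expected = 0;

    assert(l != NULL);
    return atomic_compare_exchange_strong(&l->word, &expected, 1);
}

/*
 * Returns true if the lock was taken on the fast path. On contention a
 * real implementation would make the futex() system call and sleep; this
 * sketch only counts the event and reports failure.
 */
static bool sketch_lock_acquire(struct sketch_lock *l)
{
    if (sketch_trylock(l))
        return true;        /* uncontended: stayed in user space */
    l->kernel_entries++;    /* contended: the kernel would be entered here */
    return false;
}

/* Release the lock; a real implementation would also wake a waiter. */
static void sketch_lock_release(struct sketch_lock *l)
{
    assert(l != NULL);
    atomic_store(&l->word, 0);
}
```

In this model, a first sketch_lock_acquire() on a free lock succeeds without touching kernel_entries; only a second, contending call increments it.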
Ingo Molnar recently sent out a series of patches to the FUTEX subsystem; one of them adds a new "requeueing" feature. This feature addresses a performance problem in glibc resulting from a double-lock implementation there; with requeueing, a process which waits on a condition variable can be automatically requeued on a different lock when the condition becomes true. Requeueing avoids the "thundering herd" problem (when many processes are awakened only to contend with each other and go back to sleep) which otherwise results in this situation.
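A toy model may help show why requeueing matters. The sketch below only does bookkeeping on two hypothetical wait queues (one for the condition variable, one for the lock protecting it) and compares how many processes get woken in each approach; the structure and function names are invented for illustration and bear no relation to the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wait queue: only the number of sleeping processes matters. */
struct sketch_waitq {
    int sleepers;
};

/* Wake up to n sleepers on q; returns how many were actually woken. */
static int sketch_wake(struct sketch_waitq *q, int n)
{
    int woken;

    assert(q != NULL);
    woken = n < q->sleepers ? n : q->sleepers;
    q->sleepers -= woken;
    return woken;
}

/*
 * Without requeueing: wake everybody. One process gets the lock; all the
 * others were scheduled only to go back to sleep on the lock's queue.
 * Returns the number of processes woken.
 */
static int sketch_broadcast_wake_all(struct sketch_waitq *cond,
                                     struct sketch_waitq *lock)
{
    int woken = sketch_wake(cond, cond->sleepers);

    if (woken > 1)
        lock->sleepers += woken - 1;    /* the thundering herd */
    return woken;
}

/*
 * With requeueing: wake a single process and move the rest straight onto
 * the lock's queue without ever running them.
 */
static int sketch_broadcast_requeue(struct sketch_waitq *cond,
                                    struct sketch_waitq *lock)
{
    int woken = sketch_wake(cond, 1);

    lock->sleepers += cond->sleepers;
    cond->sleepers = 0;
    return woken;
}
```

Both functions leave the same number of processes sleeping on the lock; the difference is that the first wakes all of them to get there, while the second wakes only one.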
The patch drew complaints about how the new feature is implemented. The FUTEX subsystem provides a single system call (futex()) with a command argument. All FUTEX operations are multiplexed through this single call. This style of system call has been deprecated within the kernel for a while now; it is difficult to get a handle on what multiplexor calls are really doing. So it was suggested that, rather than adding yet another command to futex(), Ingo should really tear out the old system call and create a set of new, single-function calls.
Ingo did, in fact, send out a patch implementing the futex_wait(), futex_wake(), and futex_requeue() system calls. But he left the old futex() call in as well. And that is the core of the real disagreement: certain developers feel that, since no stable kernel was ever released with the old system call, it should be simply removed before 2.6.0.
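The difference between the two interface styles can be sketched in plain C. Nothing here is the real futex() interface: the operation codes, the sketch_state structure, and all function names are hypothetical stand-ins. The point is only that the multiplexed entry point takes an opaque command plus an argument whose meaning changes with the command, while the single-function versions each have a prototype that says what they do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation codes for the multiplexed entry point. */
enum sketch_op {
    SKETCH_OP_WAIT = 0,
    SKETCH_OP_WAKE = 1,
    SKETCH_OP_REQUEUE = 2
};

/* Toy state: processes waiting here, and processes moved elsewhere. */
struct sketch_state {
    int waiting;
    int requeued;
};

/* Single-function entry points: each prototype documents itself. */
static int sketch_do_wait(struct sketch_state *s)
{
    assert(s != NULL);
    s->waiting++;
    return 0;
}

static int sketch_do_wake(struct sketch_state *s, int nwake)
{
    int woken;

    assert(s != NULL);
    woken = nwake < s->waiting ? nwake : s->waiting;
    s->waiting -= woken;
    return woken;
}

static int sketch_do_requeue(struct sketch_state *s, int nwake)
{
    int woken = sketch_do_wake(s, nwake);

    s->requeued += s->waiting;
    s->waiting = 0;
    return woken;
}

/* Multiplexed entry point: what 'arg' means depends entirely on 'op'. */
static int sketch_multiplexed(struct sketch_state *s, int op, int arg)
{
    switch (op) {
    case SKETCH_OP_WAIT:
        return sketch_do_wait(s);       /* arg is ignored */
    case SKETCH_OP_WAKE:
        return sketch_do_wake(s, arg);
    case SKETCH_OP_REQUEUE:
        return sketch_do_requeue(s, arg);
    default:
        return -1;                      /* unknown command */
    }
}
```

Keeping the old multiplexed call around, as Ingo's patch does, amounts to retaining a wrapper like sketch_multiplexed() next to the new single-purpose functions.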
The problem, of course, is that stable kernels have been released with that system call. In particular, Red Hat Linux 9 contains a version of the 2.4.20 kernel with Native PThread Library and FUTEX support patched in. Removing the futex() system call would break glibc on those systems. So the question becomes: should a feature which has, officially, only been present in development kernels be removed, thus breaking a widely-deployed distribution? Or does a certain amount of compatibility cruft have to remain in the 2.6.0 kernel in order to avoid that breakage?
In this case, the issue has been resolved by a decree from Linus: compatibility will be preserved.
In a separate posting, Linus states: "...the goodness of an operating system is not in how pretty it is, but in how well it supports the user." And that attitude, of course, has a lot to do with why Linux is as successful as it is.
The other FUTEX-related issue has to do with configuration options. Christopher Hoover recently submitted a patch which makes the FUTEX subsystem optional; those who don't want FUTEXes would be able to configure them out of the kernel entirely. Linus, however, doesn't like the idea.
Similar issues have come up, for example, with regard to making the epoll() system call or parts of sysfs optional. Increasingly, there is an interest in defining a minimal functionality that all Linux kernels will have. Without that, it can be hard to get developers to use some of the advanced features offered by the kernel.
On the other hand, developers creating kernels for embedded systems often want to jettison everything that is not absolutely needed. These people, of course, argue for the ability to configure every feature in the kernel. And, as Alan Cox pointed out, making features configurable forces developers to make the implementation of those features properly modular.
The likely resolution is that configuration options will be provided for "core" features, but they will be hard to find. Such options may be buried under a menu titled "remove core functions for embedded systems," or hidden from the higher-level configuration interfaces altogether (requiring the use of a text editor on the .config file to change them). Different users have very different needs, and the Linux kernel tries to address as many of those needs as it can.
A general method for firmware loading
While most computer peripherals work right "out of the box," some will not function properly until the host system has downloaded a blob of binary firmware. Often as not, this firmware is proprietary software. In the past, a number of drivers have gone into the kernel with proprietary firmware bundled in. In the eyes of many, all devices have proprietary firmware in them; there is little reason to be upset if, in some cases, that firmware arrives via the kernel. But others (notably, the Debian project) object to linking any sort of non-free software into their kernel.
The end result is that the recommended way of dealing with devices needing firmware downloads is to have a user-space process handle it. That way, no non-free software need be linked into the kernel; as a side benefit, it also gets easier to upgrade that firmware. The downloads have typically been handled by way of a device-specific ioctl() call; each driver includes its own, slightly different implementation.
In 2.5, the device model provides a framework which can be used to clean up the handling of firmware downloads. All that was missing was an actual implementation. Manuel Estrada Sainz has filled that gap, however, with a patch adding an interface for firmware loads.
In the new scheme, a device driver needing firmware for a particular device makes a call to:
int request_firmware(struct firmware **fw, const char *name,
struct device *device);
Here, name is the name of the relevant device, and device is its device model entry. This call will create a directory with the given name under /sys/class/firmware and populate it with two files called loading and data. A hotplug event is then generated which, presumably, will inspire user space to find some firmware to feed the device.
The resulting user-space process starts by setting the loading sysfs attribute to a value of one. The actual firmware can then be written to the data file; when the process is complete, the loading file should be set back to zero. At that point, request_firmware() will return to the driver with fw pointing to the actual firmware data. The user-space process can choose to abort the firmware load by writing -1 to the loading attribute.
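The user-space side of this handshake might look something like the following sketch. It is written against the description above, not against the patch itself: the helper names are hypothetical, the directory argument is assumed to be the per-device directory under /sys/class/firmware, and a real hotplug agent would more likely be a short shell script. Since the interface has not been merged, the details could well change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write len bytes to <dir>/<file>; returns 0 on success, -1 on failure. */
static int sketch_write_attr(const char *dir, const char *file,
                             const void *buf, size_t len)
{
    char path[512];
    FILE *f;
    int n;

    assert(dir != NULL && file != NULL && buf != NULL);
    n = snprintf(path, sizeof(path), "%s/%s", dir, file);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    if (fwrite(buf, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/*
 * Feed a firmware image through the loading/data files in 'dir':
 * set loading to 1, write the image to data, set loading back to 0.
 * If the image cannot be written, abort the load by writing -1.
 */
static int sketch_load_firmware(const char *dir, const void *image,
                                size_t len)
{
    if (sketch_write_attr(dir, "loading", "1", 1) != 0)
        return -1;
    if (sketch_write_attr(dir, "data", image, len) != 0) {
        sketch_write_attr(dir, "loading", "-1", 2);
        return -1;
    }
    return sketch_write_attr(dir, "loading", "0", 1);
}
```

Because the helper only needs a directory containing writable loading and data files, it can be tried out against an ordinary scratch directory before being pointed at sysfs.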
When the driver has loaded the firmware into its device, it should free up the associated memory with:
void release_firmware(struct firmware *fw);
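The request/use/release pattern a driver would follow can be sketched in user space with stand-ins for the kernel functions. Everything prefixed with mock_ below is hypothetical: the mock request function simply hands back a built-in four-byte image instead of waiting for user space, and the size and data fields of the mock firmware structure are assumptions, since the layout of struct firmware is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct firmware; the field layout is an assumption. */
struct mock_firmware {
    size_t size;
    unsigned char *data;
};

/* Stand-in for a device with a small on-board memory to load into. */
struct mock_device {
    unsigned char mem[64];
    size_t loaded;
};

/* Mock of request_firmware(): returns a fixed image, 0 on success. */
static int mock_request_firmware(struct mock_firmware **fw, const char *name,
                                 struct mock_device *dev)
{
    static const unsigned char image[] = { 0xde, 0xad, 0xbe, 0xef };
    struct mock_firmware *f;

    (void)name;
    (void)dev;
    assert(fw != NULL);
    f = malloc(sizeof(*f));
    if (f == NULL)
        return -1;
    f->data = malloc(sizeof(image));
    if (f->data == NULL) {
        free(f);
        return -1;
    }
    memcpy(f->data, image, sizeof(image));
    f->size = sizeof(image);
    *fw = f;
    return 0;
}

/* Mock of release_firmware(): frees the image and its descriptor. */
static void mock_release_firmware(struct mock_firmware *fw)
{
    if (fw != NULL) {
        free(fw->data);
        free(fw);
    }
}

/* Driver-side pattern: request, copy into the device, then release. */
static int mock_driver_probe(struct mock_device *dev)
{
    struct mock_firmware *fw = NULL;
    int err = mock_request_firmware(&fw, "mock-device", dev);

    if (err != 0)
        return err;
    if (fw->size > sizeof(dev->mem)) {
        mock_release_firmware(fw);
        return -1;
    }
    memcpy(dev->mem, fw->data, fw->size);
    dev->loaded = fw->size;
    mock_release_firmware(fw);  /* the image is no longer needed */
    return 0;
}
```

The important part is the ordering in mock_driver_probe(): the firmware memory is released on every path once the device has (or has failed to take) its copy.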
There has been talk of maintaining firmware within the kernel so that subsequent requests can be satisfied without going back to user space. No such mechanism has been implemented at this point, however. For situations where it is not possible to wait for user space to react, there is a request_firmware_nowait() function which will call back into the driver when the firmware is available.
As of this writing, the new firmware code has not yet been merged into the mainline kernel. Changes to the interface would not be surprising, but it seems likely that 2.6 will have a generic firmware support interface that is not vastly different from what is described here.
Driver porting
Driver porting series changes
As was noted last week, the driver porting series is approaching completion and new articles will be relatively rare from now on. The series is being maintained, however. Some changes this week include:
- The miscellaneous changes article has
been updated to cover the new request_module() prototype.
- The BIO structure now reflects the
addition of bvec_kmap_irq().
- Request queues I has a brief description of the new "BIO walking" functions.
Page editor: Jonathan Corbet
