Brief items
The current development kernel remains 2.5.69; there have been no
development kernel releases since May 4.
Patches continue to accumulate in Linus's BitKeeper repository, however; it
now contains some NFS fixes, sysfs support for network devices, an XFS
update, some scheduler fixes, a change to the request_module()
prototype, some framebuffer fixes, more annotations of
user-space pointers and makefile support for Linus's (still unreleased)
kernel source analyzer, 48-bit IDE addressing support, a (hopefully)
working IDE tagged command queueing implementation, the BIO "walking"
API, more devfs cleanups (devfs_register() is gone), the USB
"gadget" subsystem, a wireless networking update (and quite a bit of
networking work in general), dynamic block I/O request allocation, a fair
amount of SCSI cleanup work, a generic x86 subarchitecture, a number of TTY
layer cleanups, a USB update, an IA-64 update, and a vast number of other
fixes -- some 700 changesets in all.
The current stable kernel is 2.4.20; no 2.4.21 prepatches have been
released since 2.4.21-rc2 on May 8.
Kernel development news
The second IRC discussion on the 2.6 "must fix" list was held on May 21.
The full transcript is available for those who are interested. Below is a
quick summary of some of the high points.
- Power management: Patrick Mochel is in a debugging stage;
in any case, power management changes could go in after 2.6.0.
- Frame buffer restore after suspending: lots of pending issues, especially
on 3D systems. "It's gonna be hell and will take time." Not
necessarily a show stopper for 2.6.0.
- IDE suspend/resume: patches exist which put suspend and
resume operations on request queues so they are properly serialized
with other activity.
- I/O scheduler selection: some way of choosing between I/O
schedulers is needed before the new schedulers can be merged. The
anticipatory scheduler still has enough problems on some loads that it
cannot go in otherwise.
- qlogic drivers: several exist, none really work. Consensus
seems to be that the "feral" driver is the one to go forward with.
- Crypto loopback driver: would be nice for 2.6, but nobody
seems to be working on it.
- ext3 big kernel lock removal: Patches exist, but some "deep
surgery" is required to make it all work. There are concerns that
none of the Linux journaling filesystems perform all that well on SMP
systems.
- ext2 and ext3 block allocations: the filesystems can allocate
blocks poorly. Not necessarily a 2.6.0 issue.
- IRQ balancing: mostly a question of whether the user space
tools should be bundled with the kernel. What's really needed,
perhaps, is a better distribution mechanism for user-space kernel
tools.
- klibc: was awaiting users before it could be merged into 2.5,
but those users have not yet materialized. Alexander Viro has things
that would use it, so this work may move forward before 2.6.
- kexec (booting one kernel directly from another): is working,
but "seems intrusive and late." It's very useful for some users,
though.
- Object-based reverse mapping VM: it still has issues with
highly-shared pages and nonlinear mappings. The latter problem has
been solved. Some think that, if objrmap is merged at all, it should
be marked experimental.
- Networking: Andrew says "net/ is boring, it just works all the
time."
- Early console/printk and a general API for reporting errors to
user space: this stuff looks too late and slow to get in this time
around.
- Kbuild: a better way of building external modules, and allowing
separate source and object directories. "Both sound important."
The conclusion was that it will happen, but it could be after 2.6.0.
- Firmware loading: Greg KH pointed out the driver model firmware
interface currently in patch form (see this
LWN article). Should be merged soon.
- ACPI: still has problems, but work is proceeding.
- Asynchronous I/O: I/O to files still is not truly
asynchronous. Patches exist, but are "late, a bit intrusive, a bit
messy." People think they are important, however; work will be done
to clean them up.
No further discussions have been scheduled at this time.
When the kernel is deep into a feature freeze and there are not a whole lot
of new developments to worry about, it must be time for some policy
debates. A couple of issues that have come up over the last week or so -
both involving the FUTEX subsystem - cast an interesting light on how
policy decisions are made, and how the kernel project interacts with its user
community.
A "FUTEX" is, of course, a fast user-space mutual exclusion primitive.
FUTEXes are similar to SYSV semaphores in terms of the functionality they
provide, though no attempt has been made to be compatible with the SYSV
semaphore interface. A FUTEX is also fast: if there is no contention for a
particular lock (which should be the case most of the time) there is no
need to go into the kernel at all. An actual system call is only made when
a process must wait. FUTEXes are used by the blindingly fast 2.5
threading implementation; other applications will certainly be found for
them as they become more widely available.
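The fast path described above can be sketched in user space. This is a minimal illustration, not the kernel's or glibc's code: the sketch_lock type, its three-state lock word, and the kernel_calls counter are all assumptions made for the example. It relies on the futex() system call with the FUTEX_WAIT and FUTEX_WAKE commands as found on current Linux systems, and it assumes that an atomic_int has the same layout as a plain int.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock: 0 = free, 1 = held, 2 = held, waiters possible. */
typedef struct {
    atomic_int  word;          /* the value the kernel sleeps on */
    atomic_long kernel_calls;  /* demo counter: trips into the kernel */
} sketch_lock;

/* The only place this sketch enters the kernel. */
static long sketch_futex_call(sketch_lock *l, int op, int val)
{
    atomic_fetch_add(&l->kernel_calls, 1);
    /* Assumption: atomic_int is layout-compatible with int. */
    return syscall(SYS_futex, (int *)&l->word, op, val, NULL, NULL, 0);
}

static void sketch_lock_acquire(sketch_lock *l)
{
    int c = 0;

    /* Fast path: an uncontended lock goes 0 -> 1 with one atomic op. */
    if (atomic_compare_exchange_strong(&l->word, &c, 1))
        return;

    /* Slow path: mark the lock contended, then sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&l->word, 2);
    while (c != 0) {
        sketch_futex_call(l, FUTEX_WAIT, 2);
        c = atomic_exchange(&l->word, 2);
    }
}

static void sketch_lock_release(sketch_lock *l)
{
    int prev = atomic_exchange(&l->word, 0);

    assert(prev != 0);  /* releasing a free lock is a caller bug */

    /* Only a contended lock needs a system call to wake a waiter. */
    if (prev == 2)
        sketch_futex_call(l, FUTEX_WAKE, 1);
}
```

With a single thread calling sketch_lock_acquire() and then sketch_lock_release(), the kernel_calls counter stays at zero; the kernel is entered only once a second thread finds the lock already held.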
Ingo Molnar recently sent out a series of patches to the FUTEX subsystem;
one of them adds a new "requeueing"
feature. This feature addresses a performance problem in glibc resulting
from a double-lock implementation there; with requeueing, a process which
waits on a condition variable can be automatically requeued on a different
lock when the condition becomes true. Requeueing avoids the "thundering
herd" problem (when many processes are awakened only to contend with each
other and go back to sleep) which otherwise results in this situation.
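The cost of the thundering herd can be made concrete with a toy counting model. Nothing here comes from Ingo's patch: sketch_drain() and struct herd_result are hypothetical, and the model simply assumes that each unlock wakes exactly one thread sleeping on the lock.

```c
#include <assert.h>

struct herd_result {
    unsigned wakeups;         /* times some thread was made runnable */
    unsigned wasted_wakeups;  /* woke, lost the lock race, slept again */
};

/*
 * Count the wakeups needed for every thread waiting on a condition
 * variable to pass through the associated lock, one at a time.
 */
static struct herd_result sketch_drain(unsigned cond_waiters, int use_requeue)
{
    struct herd_result r = { 0, 0 };
    unsigned lock_waiters;    /* threads asleep on the lock */

    if (cond_waiters == 0)
        return r;

    if (use_requeue) {
        /* Wake one waiter; move the rest to the lock's queue, asleep. */
        r.wakeups = 1;
    } else {
        /* Wake everybody; one wins the lock, the rest block on it. */
        r.wakeups = cond_waiters;
        r.wasted_wakeups = cond_waiters - 1;
    }
    lock_waiters = cond_waiters - 1;

    /* Each unlock then wakes exactly one thread sleeping on the lock. */
    while (lock_waiters > 0) {
        r.wakeups++;
        lock_waiters--;
    }

    /* Invariant: at least one wakeup did useful work. */
    assert(r.wasted_wakeups < r.wakeups);
    return r;
}
```

Under these assumptions, 100 waiters cost 199 wakeups (99 of them wasted) when everybody is awakened, and 100 wakeups with none wasted when the waiters are requeued.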
The patch drew complaints about how the new feature is implemented. The
FUTEX subsystem provides a single system call (futex()) with a
command argument. All FUTEX operations are multiplexed through this single
call. This style of system call has been deprecated within the kernel for
a while now; it is difficult to get a handle on what multiplexor calls are
really doing. So it was suggested that, rather than adding yet another
command to futex(), Ingo should really tear out the old system
call and create a set of new, single-function calls.
Ingo did, in fact, send out a patch
implementing the futex_wait(), futex_wake(), and
futex_requeue() system calls. But he left the old
futex() call in as well. And that is the core of the real
disagreement: certain developers feel that,
since no stable kernel was ever released with the old system call, it
should be simply removed before 2.6.0.
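The difference between the two styles, and the way an old multiplexed entry point can survive as a thin wrapper, might look something like the sketch below. Every name, command code, and argument list here is hypothetical; the entry points are stubs that only record that they ran, and none of this is taken from the patches themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command codes for the multiplexed entry point. */
enum sketch_op { SKETCH_OP_WAIT, SKETCH_OP_WAKE, SKETCH_OP_REQUEUE };

/* Records which single-purpose entry point ran last (demo only). */
static int sketch_last_entry = -1;

/* Single-function style: each call does one thing, with its own arguments. */
static int sketch_futex_wait(int *uaddr, int val)
{
    assert(uaddr != NULL);
    (void)val;
    sketch_last_entry = SKETCH_OP_WAIT;
    return 0;
}

static int sketch_futex_wake(int *uaddr, int nr_wake)
{
    assert(uaddr != NULL);
    (void)nr_wake;
    sketch_last_entry = SKETCH_OP_WAKE;
    return 0;
}

static int sketch_futex_requeue(int *uaddr, int *uaddr2, int nr_wake)
{
    assert(uaddr != NULL && uaddr2 != NULL);
    (void)nr_wake;
    sketch_last_entry = SKETCH_OP_REQUEUE;
    return 0;
}

/*
 * Multiplexed style: one entry point, with the meaning of the other
 * arguments depending on "op". Kept here purely for old binaries.
 */
static int sketch_futex(int *uaddr, int op, int val, int *uaddr2)
{
    switch (op) {
    case SKETCH_OP_WAIT:
        return sketch_futex_wait(uaddr, val);
    case SKETCH_OP_WAKE:
        return sketch_futex_wake(uaddr, val);
    case SKETCH_OP_REQUEUE:
        return sketch_futex_requeue(uaddr, uaddr2, val);
    default:
        return -EINVAL;
    }
}
```

In this arrangement the compatibility cost is one switch statement: new code calls the single-function entry points directly, while existing binaries keep working through the wrapper.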
The problem, of course, is that stable kernels have been released
with that system call. In particular, Red Hat Linux 9 contains a
version of the 2.4.20 kernel with Native PThread Library and FUTEX support
patched in. Removing the futex() system call would break glibc on
those systems. So the question becomes: should a feature which has,
officially, only been present in development kernels be removed, thus
breaking a widely-deployed distribution? Or does a certain amount of
compatibility cruft have to remain in the 2.6.0 kernel in order to avoid
that breakage?
In this case, the issue has been resolved by a
decree from Linus: compatibility will be preserved.
    Something like "it's only been in the development kernels" is
    simply not an issue. The only thing that matters is whether it is
    used by various binaries or not.
In a separate posting, Linus states:
"...the goodness of an operating system is not in how pretty it is,
but in how well it supports the user." And that attitude, of
course, has a lot to do with why Linux is as successful as it is.
The other FUTEX-related issue has to do with configuration options.
Christopher Hoover recently submitted this
patch which makes the FUTEX subsystem optional; those who don't want
FUTEXes would be able to configure them out of the kernel entirely. Linus,
however, doesn't like the idea:
    I will strongly argue against making futexes conditional, simply
    because I _want_ people to be able to depend on them in modern
    kernels. I do not want developers to fall back on SysV semaphores
    just because it's too painful for them to use the faster
    alternatives.
Similar issues have come up, for example, with regard to making the
epoll() system call or parts of sysfs optional. Increasingly,
there is an interest in defining a minimal functionality that all Linux
kernels will have. Without that, it can be hard to get developers to use
some of the advanced features offered by the kernel.
On the other hand, developers creating kernels for embedded systems often
want to jettison everything that is not absolutely needed. These people,
of course, argue for the ability to configure every feature in the kernel.
And, as Alan Cox pointed out, making
features configurable forces developers to make the implementation of those
features properly modular.
The likely resolution is that configuration options will be provided
for "core" features, but they will be hard to find. Such options may be
buried under a menu titled "remove core functions for embedded systems," or
hidden from the higher-level configuration interfaces altogether (requiring
the use of a text editor on the .config file to change them).
Different users have very different needs, and the Linux kernel tries to
address as many of those needs as it can.
While most computer peripherals work right "out of the box," some will not
function properly until the host system has downloaded a blob of binary
firmware. Often as not, this firmware is proprietary software. In the
past, a number of drivers have gone into the kernel with proprietary
firmware bundled in. In the eyes of many, all devices have proprietary
firmware in them; there is little reason to be upset if, in some cases,
that firmware arrives via the kernel. But others (notably, the Debian
project) object to linking any sort of non-free software into their
kernel.
The end result is that the recommended way of dealing with devices needing
firmware downloads is to have a user-space process handle it. That way, no
non-free software need be linked into the kernel; as a side benefit, it
also gets easier to upgrade that firmware. The downloads have typically
been handled by way of a device-specific ioctl() call; each driver
includes its own, slightly different implementation.
In 2.5, the device model provides a framework which can be used to
clean up the handling of firmware downloads. All that was missing was an
actual implementation. Manuel Estrada Sainz has filled that gap, however,
with a patch adding an interface for
firmware loads.
In the new scheme, a device driver needing firmware for a particular device
makes a call to:
    int request_firmware(struct firmware **fw, const char *name,
                         struct device *device);
Here, name is the name of the relevant device, and device
is its device model entry. This call will create a directory with the
given name under /sys/class/firmware and populate it with
two files called loading and data. A hotplug event is
then generated which, presumably, will inspire user space to find some
firmware to feed the device.
The resulting user-space process starts by setting the loading
sysfs attribute to a value of one. The actual firmware can then be written
to the data file; when the process is complete, the
loading file should be set back to zero. At that point,
request_firmware() will return to the driver with fw
pointing to the actual firmware data. The user-space process can choose to
abort the firmware load by writing -1 to the loading
attribute.
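The user-space side of that handshake could be sketched as follows. The helper names are hypothetical, and the caller is assumed to pass in the directory that request_firmware() created (somewhere under /sys/class/firmware); finding that directory from the hotplug event is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write len bytes to the attribute file <dir>/<attr>; 0 on success. */
static int sketch_write_attr(const char *dir, const char *attr,
                             const void *buf, size_t len)
{
    char path[512];
    FILE *f;
    size_t written;
    int n;

    assert(dir != NULL && attr != NULL);
    n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    written = fwrite(buf, 1, len, f);
    if (fclose(f) != 0 || written != len)
        return -1;
    return 0;
}

/* Feed one firmware image to the device whose sysfs directory is "dir". */
static int sketch_load_firmware(const char *dir, const void *image, size_t len)
{
    /* Step 1: announce that a load is starting. */
    if (sketch_write_attr(dir, "loading", "1", 1) != 0)
        return -1;

    /* Step 2: hand over the image; on failure, abort the load with -1. */
    if (sketch_write_attr(dir, "data", image, len) != 0) {
        sketch_write_attr(dir, "loading", "-1", 2);
        return -1;
    }

    /* Step 3: signal completion so the driver's request can return. */
    return sketch_write_attr(dir, "loading", "0", 1);
}
```

The same sequence can be performed with a few shell redirections; the C version mainly shows where the abort path fits.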
When the driver has loaded the firmware into its device, it should free up
the associated memory with:
    void release_firmware(struct firmware *fw);
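On the driver side, the request/load/release pattern might look like the sketch below. So that it builds outside the kernel, the two functions are replaced by user-space stand-ins, and the layout of struct firmware (a size and a data pointer) is an assumption, as are struct device's contents, the fake image, and the sketch_probe() function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- User-space stand-ins; everything in this part is hypothetical ---- */
struct device   { const char *name; };
struct firmware { size_t size; unsigned char *data; };  /* assumed layout */

static unsigned char sketch_image[] = { 0xde, 0xad, 0xbe, 0xef };
static struct firmware sketch_fw = { sizeof(sketch_image), sketch_image };
static int sketch_outstanding;  /* images handed out and not yet released */

static int request_firmware(struct firmware **fw, const char *name,
                            struct device *device)
{
    (void)name;
    (void)device;
    *fw = &sketch_fw;           /* pretend user space supplied the image */
    sketch_outstanding++;
    return 0;
}

static void release_firmware(struct firmware *fw)
{
    assert(fw == &sketch_fw);
    sketch_outstanding--;
}

/* ---- The driver-side pattern ---- */
static unsigned char sketch_board_ram[16];  /* stands in for device memory */

static int sketch_probe(struct device *dev)
{
    struct firmware *fw;
    int err = request_firmware(&fw, dev->name, dev);

    if (err)
        return err;

    /* Refuse an image that does not fit, but still release it. */
    if (fw->size > sizeof(sketch_board_ram)) {
        release_firmware(fw);
        return -1;
    }

    memcpy(sketch_board_ram, fw->data, fw->size);  /* "download" to device */
    release_firmware(fw);                          /* done with the image */
    return 0;
}
```

The point of the pattern is that every path which obtained an image also releases it, including the error path.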
There has been talk of maintaining firmware within the kernel so that
subsequent requests can be satisfied without going back to user space. No
such mechanism has been implemented at this point, however. For situations
where it is not possible to wait for user space to react, there is a
request_firmware_nowait() function which will call back into the
driver when the firmware is available.
As of this writing, the new firmware code has not yet been merged into the
mainline kernel. Changes to the interface would not be surprising, but it
seems likely that 2.6 will have a generic firmware support interface that
is not vastly different from what is described here.
Driver porting
As was noted last week, the driver porting series is approaching completion
and new articles will be relatively rare from now on. The series is being
maintained, however. Some changes this week include:
Page editor: Jonathan Corbet