Kernel development
Brief items
Kernel release status
The current stable 2.6 kernel is 2.6.14.6, released on January 7. It contains a small number of fixes, a couple of which address potential security issues. Chances are this will be the last update for the 2.6.14 kernel.
There is no 2.6.16 prepatch yet. Well over 2000 patches have been merged into the mainline git repository, however. See the separate article (below) for a list of the most significant changes.
The current -mm tree is 2.6.15-mm3. Recent changes to -mm include a big x86-64 update, sysfs support in the parallel port driver, John Stultz's core time subsystem patches, the removal of several old USB audio drivers, the openat() system call and friends, a new direct migration patch set, and multi-block allocation for the ext3 filesystem. Despite all that new stuff, -mm has thinned considerably over the last week as patches have moved into the mainline.
Kernel development news
Quotes of the week
Looking forward to 2.6.16
As of this writing, well over 2000 patches have been merged for the upcoming 2.6.16 kernel. The following list covers some of the more important or user-visible patches; it is not exhaustive by any means. Links to LWN articles describing the patches have been provided where available.
The 2.6.16 merge window will remain open for some time yet, so expect some more big changes before it is done.
User-visible changes
- OCFS2, Oracle's clustered filesystem.
- Networking changes include per-packet access control tied into the
IPSec subsystem, an implementation of the "CUBIC" congestion control
algorithm for TCP, an initial implementation of the DCCP protocol over IPv6,
and a sysfs interface to the network bonding module, allowing runtime
reconfiguration without the need to reload the module. There is also
an obscure "intermediate functional block" network device option which can
be used for configuration flexibility and resource sharing.
- Module versioning (storing version information to help binary modules
work with more than one kernel release) is no longer considered
experimental.
- The hotplug helper /sbin/hotplug is now officially
deprecated. The control file /proc/sys/kernel/hotplug has
moved to /sys/kernel/uevent_helper, but it is expected to be
disabled on most systems in favor of udev and the netlink interface.
- Copy-on-write support and NUMA awareness for "hugetlb" pages.
- The software suspend code has seen some work. The encryption option
has been removed; it was little used and offered little protection in
the first place. A few steps have been taken toward moving parts of the
suspend process to user space.
- The swap migration code,
allowing a process's pages to follow it from
one processor to another. As of this writing, the direct migration patches
have not been merged.
- The "SLOB allocator" has been added; it is a replacement for the Linux
slab code which is suited for very small-memory systems.
- The oldest supported version of gcc for kernel building is now 3.2.
- The ext3 filesystem has a new mount option allowing the location of
the journal device to be specified.
- The module loader now explicitly checks for the ndiswrapper and
driverloader modules, and will mark the kernel tainted if they are
found.
- V9fs (the Plan 9 filesystem) is now capable of performing zero-copy
operations. Various other v9fs improvements have been added as well.
- Support for the Cell architecture has been significantly filled out.
- New drivers for ADI Eagle-based USB ADSL modems, ATI and Philips USB remote control units, the Marvell Yukon2 Ethernet chipset, the network interface in the Intel ixp2000 (ARM) CPU, the CS5535 audio device, Digigram PCXHR boards, and the SyncLink GT and AC serial adaptor families.
Internal API changes
- Ingo Molnar's mutex code
has been added. A few patches converting subsystems over to mutexes
have gone in, but most of that work remains to be done.
- The usb_driver structure has a new field
(no_dynamic_id) which lets a driver disable the addition of
dynamic device IDs. The owner field has also been removed
from this structure.
- Some significant changes to the SCSI subsystem aimed at eliminating
the use of the old scsi_request structure. The SCSI software
IRQ is no longer used; postprocessing happens via the generic block
software IRQ instead.
- Vast numbers of typedefs have been removed from the ALSA code,
bringing that subsystem more in line with kernel coding standards.
Power management support has also been added to a number of ALSA
drivers.
- A new workqueue function schedule_on_each_cpu() will cause a
function to be called on every running processor on the system.
- Much of the core device model code has been reeducated to use the term
"uevent" instead of "hotplug." Some changes which are visible outside
of the core code include:
  - kobject_hotplug() becomes kobject_uevent()
  - struct kset_hotplug_ops becomes struct kset_uevent_ops, and its hotplug() member is now uevent()
  - add_hotplug_env_var() becomes add_uevent_var()
- An atomic type the size of a long, atomic_long_t, has been added; it
is 64 bits wide on 64-bit architectures. Supported functions are:
  - long atomic_long_read(atomic_long_t *l);
  - void atomic_long_set(atomic_long_t *l, long i);
  - void atomic_long_inc(atomic_long_t *l);
  - void atomic_long_dec(atomic_long_t *l);
  - void atomic_long_add(long i, atomic_long_t *l);
  - void atomic_long_sub(long i, atomic_long_t *l);
- The block I/O barrier code has been rewritten. This
patch changes the barrier API and also adds a new parameter to
end_that_request_last().
- The block_device_operations structure has a new method
getgeo(); its job is to fill in an hd_geometry
structure with information about the drive. With this operation in
place, many block drivers will not need an ioctl() function
at all.
- The dentry structure has been changed: the d_child
and d_rcu fields are now overlaid in a union. This change
shrinks this heavily-used structure and improves its cache behavior.
- struct page has also been changed; it is now smaller on large
SMP systems.
- Linas Vepstas's PCI error
recovery patch has been merged.
- A new list function, list_for_each_entry_safe_reverse(), does
just what one would expect.
- The high-resolution kernel timer code has been merged. Much of the
core works as described in this LWN article, but there
have also been changes and most of the names are different. The new
high-resolution timer interface will be discussed in the
January 19 Kernel Page.
- Buffering for the TTY layer has been completely redone.
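The atomic_long_t calls listed above are easy to get wrong in one respect: atomic_long_add() and atomic_long_sub() take the value first and the pointer second, while atomic_long_set() takes them the other way around. What follows is only a user-space model built on C11 atomics, assuming nothing about how the kernel implements the type; the byte_counter_demo() function is hypothetical and exists only to exercise each call.

```c
#include <assert.h>
#include <stdatomic.h>

/* User-space model only; the kernel type is built on its own atomics. */
typedef struct { atomic_long counter; } atomic_long_t;

long atomic_long_read(atomic_long_t *l) { return atomic_load(&l->counter); }
void atomic_long_set(atomic_long_t *l, long i) { atomic_store(&l->counter, i); }
void atomic_long_inc(atomic_long_t *l) { atomic_fetch_add(&l->counter, 1); }
void atomic_long_dec(atomic_long_t *l) { atomic_fetch_sub(&l->counter, 1); }
/* add and sub take the value first and the pointer second. */
void atomic_long_add(long i, atomic_long_t *l) { atomic_fetch_add(&l->counter, i); }
void atomic_long_sub(long i, atomic_long_t *l) { atomic_fetch_sub(&l->counter, i); }

/* Hypothetical byte counter exercising each call once. */
long byte_counter_demo(void)
{
	atomic_long_t bytes;

	atomic_long_set(&bytes, 0);
	atomic_long_add(4096, &bytes);	/* 4096 */
	atomic_long_inc(&bytes);	/* 4097 */
	atomic_long_sub(96, &bytes);	/* 4001 */
	atomic_long_dec(&bytes);	/* 4000 */
	assert(atomic_long_read(&bytes) == 4000);
	return atomic_long_read(&bytes);
}
```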
As noted above, more changes are likely; stay tuned. Remember that API changes will eventually find their way onto the LWN 2.6 API Changes Page.
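For readers who would rather not guess at what list_for_each_entry_safe_reverse() does: it walks a list from tail to head while keeping a saved pointer to the next entry to visit, so the current entry can be removed inside the loop. The sketch below is a user-space reconstruction, assuming GCC's __typeof__ extension; the list helpers are cut-down stand-ins for the kernel's <linux/list.h>, and struct item and reverse_walk_demo() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Walk backward; n always holds the entry to visit after pos. */
#define list_for_each_entry_safe_reverse(pos, n, head, member)		\
	for (pos = list_entry((head)->prev, __typeof__(*pos), member),	\
	     n = list_entry(pos->member.prev, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = n, n = list_entry(n->member.prev, __typeof__(*n), member))

void list_init(struct list_head *h) { h->next = h->prev = h; }

void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

struct item { int id; struct list_head link; };

/* Build 1, 2, 3; walk backward deleting odd ids; return entries visited. */
int reverse_walk_demo(void)
{
	struct list_head head;
	struct item a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };
	struct item *pos, *n;
	int order[3], visited = 0;

	list_init(&head);
	list_add_tail(&a.link, &head);
	list_add_tail(&b.link, &head);
	list_add_tail(&c.link, &head);

	list_for_each_entry_safe_reverse(pos, n, &head, link) {
		order[visited++] = pos->id;
		if (pos->id % 2)
			list_del(&pos->link);	/* safe: n was saved already */
	}

	assert(order[0] == 3 && order[1] == 2 && order[2] == 1);
	assert(head.next == &b.link && head.prev == &b.link);
	return visited;
}
```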
The mutex API
The mutex code may well have set a record for the shortest time spent in -mm for such a fundamental patch. It would not have been surprising for mutexes to sit in -mm through at least one kernel cycle, which would have had them being merged in or after 2.6.17. But the mutex code appeared in exactly one -mm release (2.6.15-mm2, released on January 7) before being merged into the mainline on January 9.
The actual mutex type (minus debugging fields) is quite simple:
    struct mutex {
        atomic_t            count;
        spinlock_t          wait_lock;
        struct list_head    wait_list;
    };
Unlike semaphores, mutexes have one definition which is used on all architectures. Some of the actual locking and unlocking code can be overridden if it can be made to perform better on a specific architecture, but the core data structure remains the same. The count field contains the state of the mutex. A value of one indicates that it is available, zero means locked, and a negative value means that it is locked and processes might be waiting. Separating the two "locked" cases is worthwhile: in the (usual) case where nobody is waiting for the mutex, there is no need to go through the process of seeing if anybody needs to be waked up. wait_lock controls access to wait_list, which is a simple list of processes waiting on the mutex.
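The three states of the count field can be made concrete with a small user-space model. This is a toy, not the kernel's code: it assumes C11 atomics, leaves out wait_lock, wait_list and sleeping entirely, and every toy_ name is hypothetical. It shows only why telling "locked" apart from "locked with possible waiters" lets the unlock path skip the wakeup work in the common case.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of the count field only: no wait list, no sleeping. */
struct toy_mutex {
	atomic_int count;	/* 1: unlocked, 0: locked, < 0: locked, maybe waiters */
};

void toy_init(struct toy_mutex *m)
{
	atomic_init(&m->count, 1);
}

/*
 * Decrement count. Seeing the old value 1 means the lock was free and is
 * now ours. Anything else leaves count negative, which records that
 * somebody may be waiting; a real implementation would now go to sleep.
 */
bool toy_lock_fastpath(struct toy_mutex *m)
{
	return atomic_fetch_sub(&m->count, 1) == 1;
}

/*
 * Mark the mutex available again. Returns true only if count had gone
 * negative, the one case where the wait list needs to be examined.
 */
bool toy_unlock(struct toy_mutex *m)
{
	int old = atomic_exchange(&m->count, 1);

	assert(old != 1);	/* unlocking an unlocked mutex is a bug */
	return old < 0;
}
```

In the uncontended lock/unlock pair, toy_unlock() returns false and no wakeup work is needed; only after a failed toy_lock_fastpath() does it return true.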
The mutex API (obtained through <linux/mutex.h>) is simple. Every mutex must first be initialized either at declaration time with:
DEFINE_MUTEX(name);
Or at run time with:
mutex_init(struct mutex *lock);
Once a mutex has been initialized, it can be locked with any of:
    void mutex_lock(struct mutex *lock);
    int mutex_lock_interruptible(struct mutex *lock);
    int mutex_trylock(struct mutex *lock);
A call to mutex_lock() will lock the mutex, putting the calling process into an uninterruptible wait if need be. mutex_lock_interruptible() uses an interruptible sleep; if the lock is obtained, it will return zero. A return value of -EINTR means that the locking attempt was interrupted by a signal and the caller should act accordingly. Finally, mutex_trylock() will attempt to obtain the lock, but will not sleep; unlike mutex_lock_interruptible(), it returns zero on failure (the lock was unavailable) and one if the lock is acquired.
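The inverted return convention of mutex_trylock() is the sort of thing that leads to bugs, so a sketch of a caller may help. The code below runs in user space and assumes a stand-in struct mutex built on a C11 atomic flag; it is not the kernel implementation and has no wait queue. The try_bump() caller is hypothetical, as is its choice of -EBUSY as an error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Stand-in for the kernel type: just a flag, no wait queue. */
struct mutex { atomic_int locked; };

void mutex_init(struct mutex *lock)
{
	atomic_init(&lock->locked, 0);
}

/* Same convention as the kernel call: 1 if the lock was taken, 0 if busy. */
int mutex_trylock(struct mutex *lock)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&lock->locked, &expected, 1);
}

void mutex_unlock(struct mutex *lock)
{
	int was_locked = atomic_exchange(&lock->locked, 0);

	assert(was_locked == 1);	/* only a held mutex may be unlocked */
}

/* Hypothetical caller which cannot sleep, so it gives up on a busy lock. */
int try_bump(struct mutex *lock, int *counter)
{
	if (!mutex_trylock(lock))	/* zero means failure here */
		return -EBUSY;
	(*counter)++;
	mutex_unlock(lock);
	return 0;
}
```

Note that try_bump() itself follows the more common zero-on-success convention, as mutex_lock_interruptible() does; only the trylock call is tested with a negation.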
In all cases, the mutex must eventually be freed (by the same process which acquired it) through a call to:
void mutex_unlock(struct mutex *lock);
Note that mutex_unlock() cannot be called from interrupt context. This restriction appears to have more to do with keeping mutexes from ever being used as completions than a fundamental restriction caused by the mutex design itself. Note also that a mutex can only be locked once - locking calls do not nest.
Finally, there is a function for querying the state of a mutex:
int mutex_is_locked(struct mutex *lock);
This function will return a boolean value indicating whether the mutex is locked or not, but will not change the state of the lock.
Now that this code has been merged, the semaphore type can officially be considered to be on its way out. New code should not use semaphores, and old code which uses semaphores as mutexes should be converted over when an opportunity presents itself. The reader/writer semaphore type (rwsem) is a different beast, and is not affected by this patch. There is a debugging option which can be configured into development kernels which may help with the transition; with this option enabled, quite a few types of errors will be detected.
At this point, code which uses the counting feature of semaphores lacks a migration path. There is evidently a plan to introduce a new, architecture-independent type for these users, but that code has not yet put in an appearance. Once that step has been taken, the path will be clear for the eventual removal of semaphores from the kernel entirely.
Linux and wireless networking
Jeff Garzik's recent State of the Union: Wireless posting came right to the point:
Jeff went on to discuss a few of the challenges facing the Linux wireless implementation. This is, indeed, one area where some real progress is needed. Proprietary chipsets are just the beginning of the issues which must be dealt with - free software developers are actually beginning to catch up in that area. But before all the resulting drivers can be merged into a coherent whole, a few other things will have to be worked out.
One of those has to do with the 802.11 stack used by the kernel. As was discussed here last December, there is a fair amount of unhappiness with the in-kernel stack, which, among other things, has no "softmac" support, needed for adapters which do not perform MAC functions in hardware. A number of out-of-tree wireless stacks do provide that support, and there have been a lot of suggestions that one of those (usually the DeviceScape stack) be merged.
Those suggestions have been strongly resisted by the networking maintainers. They would rather see work go into fixing up the stack which is in the kernel now than replace it wholesale or - even worse - having two independent 802.11 stacks to maintain. Replacing the current stack would involve significant disruption in the networking subsystem, and would be hard to do without breaking the drivers which use the old stack. The two-stack solution, instead, would bloat the kernel and increase the amount of work required to maintain the networking subsystem into the future. So it is not surprising that there is a strong interest in evolving the current stack toward the desired functionality rather than bringing in a whole new implementation.
Still, the pressure to switch over to the DeviceScape stack appears to be growing. Jeff's posting seems to recognize this fact, and asks that, in the end, the developers at least pick a single stack which they can live with. And, says Jeff, regardless of which stack is chosen in the end:
Another issue has to do with the management interface for wireless adapters. Wired network adapters are relatively simple; set a few options on media access, give them an address, and they are ready to go. The wireless world is rather more complicated. To deal with the extra configuration required by wireless adapters, the "wireless extensions" interface - essentially a big set of ioctl() commands for querying and setting adapter parameters - was developed.
There seems to be a consensus that the wireless extensions have reached their expiration date, and need to be replaced with something else. Most developers would appear to favor a new (not yet specified) interface built on the netlink mechanism. User-space management code could then be rewritten to speak the new management protocol over netlink sockets.
This approach may seem strange, given the emphasis which has been placed on sysfs and the creation of scriptable, plain-text interfaces. Sysfs does seem like a poor match for wireless configuration, however. Wireless adapters have a large number of parameters, and it is often necessary to change several of them simultaneously. Sysfs, with its one-value-per-file rules, provides no means for this sort of atomic, multi-parameter update; a netlink interface could, instead, be designed with these needs in mind from the beginning.
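The difference can be illustrated with a purely hypothetical sketch; since no netlink-based interface has been specified yet, none of the names below correspond to real kernel code, and the channel range check is only an example of validation. The point is the shape of the operation: a complete request is checked as a whole and then committed in one step, so the adapter never sees a mixture of old and new settings, which is what a series of independent single-value writes cannot promise.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical adapter settings; the field names are illustrative only. */
struct wl_config {
	char essid[33];
	int  channel;
	int  key_index;
};

/*
 * Message-style update: validate the entire request first, then commit
 * every field with a single structure assignment. On error, the active
 * configuration is left completely untouched.
 */
int apply_config(struct wl_config *active, const struct wl_config *req)
{
	assert(active != NULL && req != NULL);

	if (req->essid[0] == '\0')
		return -EINVAL;
	if (req->channel < 1 || req->channel > 14)	/* example check only */
		return -EINVAL;

	*active = *req;
	return 0;
}
```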
Of the other issues mentioned, perhaps this one is the most significant: there is no wireless maintainer. The lack of a developer who is specifically interested in this area of networking and who will work to push it forward has clearly hurt. Fortunately, it appears that this era may be at an end: John Linville has stepped forward to take on this responsibility.
John has a fair amount of work ahead of him; quite a few developers have to be brought together and made to agree on the way forward. To that end, a wireless networking summit has been scheduled for early April in Portland. If the attendees at that meeting (which looks to include both kernel and user space developers) can produce a viable plan, Linux may just lose its "superiority in the area of crappy wireless support" before too long.
Page editor: Jonathan Corbet