Kernel development
Brief items
Kernel release status
The current development kernel is 2.5.69; there has not been a development kernel release since May 4.
Linus's BitKeeper tree contains some framebuffer fixes, more annotations of user-space pointers and makefile support for Linus's (still unreleased) kernel source analyzer, 48-bit IDE addressing support, a (hopefully) working IDE tagged command queueing implementation, the BIO "walking" API, more devfs cleanups (devfs_register() is gone), the USB "gadget" subsystem, a wireless networking update (and quite a bit of networking work in general), dynamic block I/O request allocation, a fair amount of SCSI cleanup work, a generic x86 subarchitecture, a number of TTY layer cleanups, a USB update, an IA-64 update, and a vast number of other fixes.
The current stable kernel is 2.4.20; Marcelo released the second 2.4.21 release candidate on May 8. This patch fixes the aic7xxx problems (though not entirely to the satisfaction of the aic7xxx maintainer) and adds a fair number of other small fixes.
Kernel development news
The 2.6 "must fix" list
Version 3 of the 2.6 "must-fix" list has been posted. The list has seen additions and removals, but is not getting a whole lot shorter.
On May 14, a number of developers met via IRC to discuss this list; the IRC log is available for those who would like to see how the discussion went. A detailed writeup will be made available; briefly, the main points discussed were:
- The TTY drivers need a lot of work; there are lots of locking and
other problems. Some of the problems are denial-of-service holes, so
fixes will have to be backported to 2.4 as well. It's on Alexander
Viro's list.
- BIO splitting (with the ability to split on non-page boundaries) is
still needed, to fix the RAID problems if nothing else.
- The input layer also still has problems, including locking and
difficult configuration options.
- Merging the ARM code, including a bunch of drivers that could,
perhaps, be useful beyond the ARM architecture. The real question
there is where they should go in the tree...hardly a 2.6 show
stopper.
- CardBus problems; this is a locking issue again.
- Lots of framebuffer work remains; it has been proceeding slowly.
- SCSI: the discussion was mostly about which drivers should be merged
and/or need fixing.
- Races involving direct I/O and the truncate() system call
which can destroy filesystems. This one looks hard to fix, but
something needs to be done. In the worst case, direct I/O could be
disabled for regular files, but nobody likes that option.
- Some scheduling problems remain; Ingo Molnar has patches, but nobody
is sure how many of the problems those patches fix.
- Networking: the big problem is one where TCP sessions occasionally
hang. More traces of hung connections will be needed to track this
one down.
- Process accounting is broken for 32-bit user IDs. This one looks like it can
be fixed using some padding in the accounting record structure. Alan Cox
(conveniently absent) was nominated to do the fix.
- The 1000HZ clock on the i386 architecture is creating some timekeeping
problems that need to be fixed. In the worst case, the clock
frequency would have to go back to 100, but there should be a better
way.
- 64-bit dev_t: Al Viro wants to do quite a bit of work, still, with device number allocation (especially for char devices) and Andries Brouwer is still looking for problems in ioctl() calls. It was asked whether this work could be decoupled from the size change; as was pointed out, going ahead and changing the size of dev_t would make many of the problems more apparent. The /proc/devices file poses some interesting compatibility problems in the new device number scheme.
The discussion did not get through the entire list before time ran out (the Europeans were getting seriously tired, since it was after midnight there, and even kernel hackers begin to slow down about then). Another discussion next week is likely.
The Carrier-Grade Linux shopping list
The OSDL Carrier-Grade Linux project is slowly working toward making Linux suitable for high-stakes telecommunications deployments. To that end, the group has been working on a set of requirements that Linux must meet before it is suitable for such use. The version 2 specification is, with this week's release of the CGL clustering requirements, now complete. The full documents are available on the project's web site. For the busier people among us, here is a quick summary of some of the kernel-oriented requirements.
- Persistent device naming; a device should have the same name regardless of how many times it has been connected and removed.
- Live software upgrades including kernel upgrades; it should be possible to put in a new kernel with less than a minute of downtime. The kexec patch should be helpful in this regard.
- Multi-node volume management that works across a cluster.
- Enhanced panics; it should be possible to configure what happens when the system panics, choosing between halting, rebooting, power-cycling components, etc.
- Fault injection, allowing the simulation of hardware and software failures.
- Page flushing, similar to that provided by the fsync() system call, but under the control of the system administrator rather than the specific application.
- POSIX timers, presumably like those currently found in 2.5.
- User-space semaphores and spinlocks; the 2.5 FUTEX capability should take care of this one.
- Low-level asynchronous events in a scalable manner - like the 2.5 epoll() capability.
- SVR4 streams, required by some applications. "Keeping it separate from the base kernel ... also would be the prudent thing to do, as providing streams in the kernel got an unfavorable reception in the past in the LKML."
- Linux security module support as found in 2.5.
- IPSec for IPv4, also as found in 2.5.
- DRM stuff, such as checking binaries for a signature before executing them.
- Atomic checkpoint support which, among other things, allows services to be quickly moved across a cluster if a node fails.
- Failing node isolation so that a confused cluster node cannot corrupt resources.
- Cluster messaging which offers "better quality of service than TCP/IP." Latency is of particular concern.
- Storage replication over the network. Multipath storage access is also required.
Altogether, it is a lengthy list which will not be fully supported by Linux for quite some time yet. Knowing where you want to go is always an important first step, however.
Security modules begin to appear
One of the (many) complaints leveled against the Linux Security Module (LSM) architecture is that it adds a whole new API, one with no users, to the kernel. That situation is now changing; a couple of new security modules have been posted over the last week or so.
The larger and less surprising of the two is the SELinux module. SELinux is the hardened version of the kernel implemented by the U.S. National Security Agency; it features a number of mandatory access control features designed to contain the damage that occurs if and when an application is compromised. SELinux has, in the past, been subjected to some patent claims, but the patent owners have been silent for some time and, one hopes, that issue has quietly gone away. A look at Secure Computing Corporation's last communication on the subject might still be prudent before using SELinux, though.
SELinux is not yet proposed for inclusion within the mainline kernel; it is still being reviewed, and it depends on a series of other patches which have not yet been merged. Patent issues aside, the inclusion of modules like this should not be controversial, even at this stage of kernel development; they sit off to the side and do not have any effect on anybody who does not actually use them.
More recently, Niki Rahimi (of IBM) posted a Trusted Path Execution module. This module divides all users into those who are "trusted" (root and anybody root has added to the list) and everybody else. Programs, too, are either trusted or not; trusted programs are those living in a directory which is owned by root and not writeable by anybody else. Trusted users can run any executable in the system (subject to the usual access checks, of course), and anybody can run trusted programs. But untrusted users are not allowed to run untrusted programs. This module, thus, provides a simple mechanism for controlling which programs may be run on a system.
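The policy described above reduces to a small decision function. The following is only a user-space sketch of that rule, not code from the posted module; the function names tpe_dir_trusted() and tpe_may_exec() are hypothetical, and "not writeable by anybody else" is assumed here to mean that neither the group nor others have write permission.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: a directory counts as trusted when root owns it
 * and neither group nor others may write to it (an assumption about
 * what "not writeable by anybody else" means).
 */
static bool tpe_dir_trusted(uid_t owner, mode_t mode)
{
    return owner == 0 && (mode & (S_IWGRP | S_IWOTH)) == 0;
}

/*
 * Hypothetical policy check: execution is refused only when an
 * untrusted user tries to run a program from an untrusted directory.
 */
static bool tpe_may_exec(bool user_trusted, bool dir_trusted)
{
    return user_trusted || dir_trusted;
}
```

The usual permission checks would still apply on top of this decision; the sketch covers only the trusted/untrusted split.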
The promise of the LSM scheme is that it will make it easy for developers and users to experiment with different security schemes. If all goes according to plan, LSM should enable the creation of a large library of security modules suited to the needs of many different sites.
Driver porting
News from the driver porting series
This week's driver porting article (below) discusses the class mechanism, which is part of the device model. At this point, this series is nearing completion. There will be an occasional new article, and the existing base of articles (30 of them, now) will be updated as the kernel hackers do their best to make them obsolete. But these articles will no longer appear every week. Creating this series has been a lot of work, but also a lot of fun; many thanks to all of you for your support and helpful comments.
Driver porting: Device classes
This article is part of the LWN Porting Drivers to 2.6 series.
To help with this sort of resource discovery issue, the driver model exports a "class" interface. Devices, once registered, can be associated with one or more classes which describe the function(s) performed by the device. Class memberships show up under the /sys/class sysfs directory, and, of course, can be decorated with all kinds of attributes. There are also mechanisms which provide notification - both within and outside of the kernel - when a device joins or leaves a class. The class interface can also be the easiest way for a driver to make arbitrary attributes available via sysfs.
For many (if not most) drivers, class membership will be handled automatically in the higher layers. Block devices, for example, are associated with the "block" class when their associated gendisk structures are registered. (This class currently appears in /sys/block, incidentally; it will likely move to /sys/class/block at some point.) Occasionally, however, it can be necessary to explicitly associate a device with a specific class. This article describes how to do that; though it remains superficial, it provides more information than is strictly needed, in the hope of conveying an understanding of how the class system works.
For those wishing for a hands-on example, the full source for a version of the "simple block driver" module that understands classes is available.
Creating a class
It is a rare device which exists in a unique class of its own; as a result, drivers will almost never create their own classes. Should the need arise, however, the process is simple. The first step is the creation of a struct class (defined in <linux/device.h>). Two fields are required: the name and a pointer to a "release" function. The SBD driver sets up its class as:
static struct class sbd_class = {
    .name = "sbd",
    .release = sbd_class_release
};
The name is, of course, how this class will show up under /sys/class. We will get to the release function shortly, after we have looked at class devices.
Beyond that, there is only one other thing that a class definition can provide: a "hotplug" function:
int (*hotplug)(struct class_device *dev, char **envp,
               int num_envp, char *buffer, int buffer_size);
The addition of a device to a class creates a hotplug event. Before /sbin/hotplug is called to respond to that event, the class's hotplug() method (if any) will be called. That method can add variables to the environment that is passed to /sbin/hotplug; they should be put into buffer (respecting the given buffer_size) with pointers set into envp (but no more than num_envp of them, and with a NULL pointer to terminate the list). The return value should be zero, or the usual negative error code.
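As a rough illustration of those rules, a hotplug() method that adds a single variable might look like the sketch below. It is written to compile in user space, so struct class_device is a stand-in with only a class_id field and BUS_ID_SIZE is given a placeholder value; the sbd_hotplug() name, the SBD_ID variable, and the choice of -ENOMEM as the error code are assumptions made for the example, not taken from the SBD source.

```c
#include <stdio.h>
#include <errno.h>

#define BUS_ID_SIZE 20          /* placeholder value for this sketch */

/* Stand-in for the kernel structure so the sketch builds in user space. */
struct class_device {
    char class_id[BUS_ID_SIZE];
};

/* Hypothetical hotplug method: exports SBD_ID=<class_id> to /sbin/hotplug. */
static int sbd_hotplug(struct class_device *cd, char **envp,
                       int num_envp, char *buffer, int buffer_size)
{
    int len;

    /* One slot is needed for the variable, one for the terminating NULL. */
    if (num_envp < 2 || buffer_size <= 0)
        return -ENOMEM;

    len = snprintf(buffer, (size_t)buffer_size, "SBD_ID=%s", cd->class_id);
    if (len < 0 || len >= buffer_size)
        return -ENOMEM;         /* the variable did not fit in buffer */

    envp[0] = buffer;           /* pointer into buffer, as described above */
    envp[1] = NULL;             /* terminate the list */
    return 0;
}
```

A method adding several variables would advance through buffer after each string (including its terminating NUL) and check both limits before every addition.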
Classes need to be registered, of course:
int class_register(struct class *cls);
The return value will be zero if all goes well. The void function class_unregister() will do exactly what one would expect.
Class devices
If your device type lacks a specific registration function of its own (such as add_disk() or register_netdev()), or if you have created your own custom class, you may find yourself adding your device(s) to a class explicitly. Membership in a class is represented by an instance of struct class_device. There are three fields that should normally be filled in:
struct class *class;
struct device *dev;
char class_id[BUS_ID_SIZE];
The class pointer, of course, should be aimed at the proper class structure. The dev pointer is optional; it is used to create the device and driver symbolic links in the device's class entry in sysfs. Since user-space processes looking to discover devices of a particular class probably want to have that pointer, you should make it easy for them. The class_id is a string which is unique within the class - it becomes, of course, the name of the device's sysfs entry.
Once the class_device structure has been set up, it can be added to the class with:
int class_device_register(struct class_device *class_dev);
class_device_unregister() can be used at module unload time.
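Filling in those three fields can be sketched as follows. This is a user-space mock-up rather than driver code: the structures are cut-down stand-ins holding only the fields discussed here, BUS_ID_SIZE has a placeholder value, and sbd_init_class_device() is a hypothetical helper name. In a real driver, the call to class_device_register() would follow once the fields are set.

```c
#include <stdio.h>
#include <string.h>

#define BUS_ID_SIZE 20          /* placeholder value for this sketch */

/* Cut-down stand-ins for the kernel structures. */
struct class { const char *name; };
struct device { char bus_id[BUS_ID_SIZE]; };
struct class_device {
    struct class *class;
    struct device *dev;
    char class_id[BUS_ID_SIZE];
};

static struct class sbd_class = { .name = "sbd" };

/* Hypothetical helper: prepare a class device named "sbd<index>". */
static int sbd_init_class_device(struct class_device *cd,
                                 struct device *dev, int index)
{
    int len;

    memset(cd, 0, sizeof(*cd));
    cd->class = &sbd_class;
    cd->dev = dev;              /* optional; enables the sysfs symlinks */

    /* class_id must be unique within the class and fit in BUS_ID_SIZE. */
    len = snprintf(cd->class_id, BUS_ID_SIZE, "sbd%d", index);
    if (len < 0 || len >= BUS_ID_SIZE)
        return -1;

    /* In the driver, class_device_register(cd) would be called here. */
    return 0;
}
```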
Once you register a class device, it becomes available to the world as a whole. If your class device is allocated dynamically, you must be very careful about when you free it. Remember that user-space processes can retain references to your device via your sysfs attributes; you must not free the class device until all of those references are gone.
That, of course, is the purpose of the release function stored in struct class. This function has a simple prototype:
void release_fn(struct class_device *cd);
This function is called when the last reference to the given device goes away; it should respond by freeing the device. That call will typically happen when you call class_device_unregister() on the device, but it could happen later if other references persist.
Please note that, if your class device structure is dynamically allocated, or is embedded within another, dynamic structure, you must use a release function to free that structure or your code is buggy.
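For the embedded case, the release function has to get from the class device back to the enclosing structure before freeing it. The sketch below shows that pattern in user space, assuming a hypothetical struct sbd_device that embeds its class device; the container_of() macro mirrors the idea of the kernel macro of the same name, and the body given for sbd_class_release() is an illustration, not the SBD driver's actual code. The sbd_live_devices counter exists only to make the effect observable.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel structure. */
struct class_device {
    char class_id[20];
};

/* Hypothetical driver structure with an embedded class device. */
struct sbd_device {
    int size;
    struct class_device cdev;
};

/* Recover a pointer to the enclosing structure from a member pointer. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static int sbd_live_devices;    /* demonstration only */

static struct sbd_device *sbd_alloc(void)
{
    struct sbd_device *dev = calloc(1, sizeof(*dev));

    if (dev)
        sbd_live_devices++;
    return dev;
}

/* Called when the last reference to the class device goes away. */
static void sbd_class_release(struct class_device *cd)
{
    struct sbd_device *dev = container_of(cd, struct sbd_device, cdev);

    free(dev);                  /* frees the class device along with it */
    sbd_live_devices--;
}
```

The important point is that nothing else frees the structure: the driver unregisters the class device and lets the release function do the freeing once the last reference is gone.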
Class device attributes
Attributes are easily added to a class device entry. If the attribute is to be readable, it will need a "show" function to respond to reads; the function used to export the driver version in SBD looks like:
static ssize_t show_version(struct class_device *cd, char *buf)
{
    /* Return the number of bytes written, not counting the trailing NUL. */
    return sprintf(buf, "%s\n", Version);
}
If the attribute is to be writable, you will need a store function too:
ssize_t (*store)(struct class_device *, const char *buf, size_t count);
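A store function typically parses the text it is handed and returns the number of bytes it consumed. The following user-space sketch assumes a hypothetical integer attribute (sbd_debug_level) and a NUL-terminated input buffer; it uses the C library's strtol() where kernel code would use its own parsing helpers, and struct class_device is a stand-in.

```c
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>

/* Stand-in for the kernel structure. */
struct class_device {
    char class_id[20];
};

static long sbd_debug_level;    /* hypothetical attribute value */

/* Hypothetical store function: parse a decimal integer from buf. */
static ssize_t store_debug(struct class_device *cd, const char *buf,
                           size_t count)
{
    char *end;
    long val;

    (void)cd;                   /* unused in this sketch */

    errno = 0;
    val = strtol(buf, &end, 10);
    if (end == buf || errno != 0)
        return -EINVAL;         /* no digits found, or out of range */

    sbd_debug_level = val;
    return (ssize_t)count;      /* report the whole write as consumed */
}
```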
These functions are then bundled into an attribute structure with:
CLASS_DEVICE_ATTR(name, mode, show, store);
The name should not be a quoted string; it is joined in the macro to create a structure called class_device_attr_name.
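The name-joining behavior can be seen with a simplified, user-space imitation of the macro. The structure layouts and the macro body below are approximations written for this example (the kernel's real definitions live in <linux/device.h> and may differ in detail), and the Version string is a placeholder.

```c
#include <stdio.h>
#include <sys/types.h>

/* Stand-ins for the kernel structures, reduced to what the example needs. */
struct class_device {
    char class_id[20];
};

struct attribute {
    const char *name;
    unsigned int mode;
};

struct class_device_attribute {
    struct attribute attr;
    ssize_t (*show)(struct class_device *, char *buf);
    ssize_t (*store)(struct class_device *, const char *buf, size_t count);
};

/* Approximation of the macro: pastes the name into the variable name
 * and stringifies it for the sysfs file name. */
#define CLASS_DEVICE_ATTR(_name, _mode, _show, _store)              \
    struct class_device_attribute class_device_attr_##_name = {     \
        .attr = { .name = #_name, .mode = _mode },                  \
        .show = _show,                                              \
        .store = _store                                             \
    }

static const char Version[] = "1.0";    /* placeholder version string */

static ssize_t show_version(struct class_device *cd, char *buf)
{
    (void)cd;                   /* unused in this sketch */
    return sprintf(buf, "%s\n", Version);
}

/* Read-only attribute: defines class_device_attr_version. */
static CLASS_DEVICE_ATTR(version, 0444, show_version, NULL);
```

Passing version without quotes is what allows the macro to build both the identifier class_device_attr_version and the file name "version" from the same token.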
The final step is to create the actual device attribute, using:
int class_device_create_file(struct class_device *,
                             struct class_device_attribute *);
You can call class_device_remove_file() to get rid of an attribute, but that is also done automatically for you when a device is removed from a class.
Interfaces
The term "interface," as used within the device model, is a bit confusing. A better way to think of interfaces is as a sort of constructor and destructor mechanism for class device entries. An interface provides add() and remove() methods which are called as devices are added to (and removed from) a class; their usual purpose is to add class-specific attributes to the class device entry. They can, however, perform any other kernel function that might be useful in response to class device events.
Briefly, the creation of an interface requires the creation of a class_interface structure, which needs to have the following fields filled in:
struct class *class;
int (*add) (struct class_device *);
void (*remove) (struct class_device *);
Once the interface is set up, it can be registered with:
int class_interface_register(struct class_interface *);
After registration, the add() and remove() functions will be called when devices are added to (or removed from) the given class. A call to class_interface_unregister() undoes the registration.
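A minimal interface might do nothing more than keep track of how many devices are in the class. The sketch below is a user-space mock-up with stand-in structures; the sbd_intf_add() and sbd_intf_remove() names and the sbd_members counter are hypothetical. In the kernel, the class core would invoke these callbacks itself after class_interface_register(); here a caller has to invoke them directly.

```c
#include <stddef.h>

/* Stand-ins for the kernel structures. */
struct class;
struct class_device {
    char class_id[20];
};

struct class_interface {
    struct class *class;
    int (*add)(struct class_device *);
    void (*remove)(struct class_device *);
};

static int sbd_members;         /* hypothetical count of class members */

/* Constructor-like hook: a device has joined the class. */
static int sbd_intf_add(struct class_device *cd)
{
    (void)cd;                   /* class-specific attributes would go here */
    sbd_members++;
    return 0;
}

/* Destructor-like hook: a device has left the class. */
static void sbd_intf_remove(struct class_device *cd)
{
    (void)cd;
    sbd_members--;
}

static struct class_interface sbd_interface = {
    .class  = NULL,             /* a driver would point this at its class */
    .add    = sbd_intf_add,
    .remove = sbd_intf_remove,
};
```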
Page editor: Jonathan Corbet
