Brief items
The current development kernel is 2.5.69; there has not been a
development kernel release since May 4.
Linus's BitKeeper tree contains some framebuffer fixes, more annotations of
user-space pointers and makefile support for Linus's (still unreleased)
kernel source analyzer, 48-bit IDE addressing support, a (hopefully)
working IDE tagged command queueing implementation, the BIO "walking"
API, more devfs cleanups (devfs_register() is gone), the USB
"gadget" subsystem, a wireless networking update (and quite a bit of
networking work in general), dynamic block I/O request allocation, a fair
amount of SCSI cleanup work, a generic x86 subarchitecture, a number of TTY
layer cleanups, a USB update, an IA-64 update, and a vast number of other
fixes.
The current stable kernel is 2.4.20; Marcelo released the second 2.4.21 release candidate on
May 8. This patch fixes the aic7xxx problems (though not entirely to
the satisfaction of the aic7xxx maintainer) and adds a fair number of other
small fixes.
Kernel development news
Version 3 of the 2.6 "must-fix" list has been posted. The list has seen
additions and removals, but is not getting a whole lot shorter.
On May 14, a number of developers met via IRC to discuss this list; the IRC log is available for those who would like
to see how the discussion went. A detailed writeup will be made available;
briefly, the main points discussed were:
- The TTY drivers need a lot of work; there are lots of locking and
other problems. Some of the problems are denial-of-service holes, so
fixes will have to be backported to 2.4 as well. It's on Alexander
Viro's list.
- BIO splitting (with the ability to split on non-page boundaries) is
still needed, to fix the RAID problems if nothing else.
- The input layer also still has problems, including locking and
difficult configuration options.
- Merging the ARM code, including a bunch of drivers that could,
perhaps, be useful beyond the ARM architecture. The real question
there is where they should go in the tree...hardly a 2.6 show
stopper.
- CardBus problems; this is a locking issue again.
- Lots of framebuffer work remains; it has been proceeding slowly.
- SCSI: the discussion was mostly about which drivers should be merged
and/or need fixing.
- Races involving direct I/O and the truncate() system call
which can destroy filesystems. This one looks hard to fix, but
something needs to be done. In the worst case, direct I/O could be
disabled for regular files, but nobody likes that option.
- Some scheduling problems remain; Ingo Molnar has patches, but nobody
is sure how many of the problems those patches fix.
- Networking: the big problem is one where TCP sessions occasionally
hang. More traces of hung connections will be needed to track this
one down.
- Process accounting is broken for 32-bit user IDs. This one looks like it can
be fixed using some padding in the accounting record structure. Alan Cox
(conveniently absent) was nominated to do the fix.
- The 1000HZ clock on the i386 architecture is creating some timekeeping
problems that need to be fixed. In the worst case, the clock
frequency would have to go back to 100, but there should be a better
way.
- 64-bit dev_t: Al Viro wants to do quite a bit of work, still,
with device number allocation (especially for char devices) and
Andries Brouwer is still looking for problems in ioctl()
calls. It was asked whether this work could be decoupled from the
size change; as was pointed out, going ahead and changing the size of
dev_t would make many of the problems more apparent. The
/proc/devices file poses some interesting compatibility
problems in the new device number scheme.
The discussion did not get through the entire list before time ran out (the
Europeans were getting seriously tired, since it was after midnight there,
and even kernel hackers begin to slow down about then). Another discussion
next week is likely.
The OSDL Carrier-Grade Linux project is slowly working toward making Linux
suitable for high-stakes telecommunications deployments. To that end, the
group has been working on a set of requirements that Linux must meet before
it is suitable for such use. The version 2 specification is, with this
week's release of the CGL clustering requirements, now complete. The full
documents are available on the project's web site. For the busier people
among us, here is a quick summary of some of the kernel-oriented
requirements.
- Persistent device naming; a device should have the same name
regardless of how many times it has been connected and removed.
- Live software upgrades including kernel upgrades; it should
be possible to put in a new kernel with less than a minute of downtime.
The kexec patch should be helpful in this regard.
- Multi-node volume management that works across a cluster.
- Enhanced panics; it should be possible to configure what
happens when the system panics, choosing between halting, rebooting,
power-cycling components, etc.
- Fault injection, allowing the simulation of hardware and
software failures.
- Page flushing, similar to that provided by the fsync()
system call, but under the control of the system administrator rather
than the specific application.
- POSIX timers, presumably like those currently found in 2.5.
- User-space semaphores and spinlocks; the 2.5 FUTEX capability
should take care of this one.
- Low-level asynchronous events in a scalable manner - like the
2.5 epoll() capability.
- SVR4 streams, required by some applications. "Keeping it
separate from the base kernel ... also would be the prudent thing to
do, as providing streams in the kernel got an unfavorable reception in
the past in the LKML."
- Linux security module support as found in 2.5.
- IPSec for IPv4, also as found in 2.5.
- DRM stuff, such as checking binaries for a signature before
executing them.
- Atomic checkpoint support which, among other things, allows
services to be quickly moved across a cluster if a node fails.
- Failing node isolation so that a confused cluster node cannot
corrupt resources.
- Cluster messaging which offers "better quality of service than
TCP/IP." Latency is of particular concern.
- Storage replication over the network. Multipath storage access
is also required.
All together, it is a lengthy list which will not be fully supported by
Linux for quite some time yet. Knowing where you want to go is always an
important first step, however.
One of the (many) complaints leveled against the Linux Security Module
(LSM) architecture is that it adds a whole new API, that has no users, to the
kernel. That situation is changing, now; a couple of new security modules
have been posted over the last week or so.
The larger and less surprising of the two is the SELinux module. SELinux is the hardened version of
the kernel implemented by the U.S. National Security Agency; it features a
number of mandatory access control features designed to contain the damage
that occurs if and when an application is compromised. SELinux has, in the
past, been subjected to some patent claims, but the patent owners have been
silent for some time and, one hopes, that issue has quietly gone away.
Even so, a look at Secure Computing Corporation's last communication on
the subject might still be prudent before using SELinux.
SELinux is not yet proposed for inclusion within the mainline kernel; it is
still being reviewed, and it depends on a series of other patches which
have not yet been merged. Patent issues aside, the inclusion of modules
like this should not be controversial, even at this stage of kernel
development; they sit off to the side and do not have any effect on anybody
who does not actually use them.
More recently, Niki Rahimi (of IBM) posted a Trusted Path Execution module. This module
divides all users into those who are "trusted" (root and anybody root has
added to the list) and everybody else. Programs, too, are either trusted
or not; trusted programs are those living in a directory which is owned by
root and not writeable by anybody else. Trusted users can run any
executable in the system (subject to the usual access checks, of course),
and anybody can run trusted programs. But untrusted users are not allowed
to run untrusted programs. This module, thus, provides a simple mechanism
for controlling which programs may be run on a system.
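The decision rule described above is small enough to sketch in a few lines. What follows is a user-space model only, not code from the posted module: the function names tpe_dir_trusted() and tpe_may_exec() are made up for illustration, "owned by root" is taken to mean UID 0, and "not writeable by anybody else" is taken to mean that the group and other write bits are clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: a program counts as "trusted" when the directory
 * holding it is owned by root (assumed to be UID 0) and neither the group
 * nor others may write to that directory.
 */
bool tpe_dir_trusted(uid_t dir_owner, mode_t dir_mode)
{
    return dir_owner == 0 && (dir_mode & (S_IWGRP | S_IWOTH)) == 0;
}

/*
 * Hypothetical policy check: trusted users may run anything, and anybody
 * may run a trusted program; only the combination of an untrusted user
 * and an untrusted program is refused.
 */
bool tpe_may_exec(bool user_trusted, uid_t dir_owner, mode_t dir_mode)
{
    return user_trusted || tpe_dir_trusted(dir_owner, dir_mode);
}
```

With this model, an untrusted user may run a binary from a root-owned directory with mode 0755, but not one from a world-writable directory or from a directory owned by an ordinary user. The usual permission checks would still apply on top of this test.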
The promise of the LSM scheme is that it will make it easy for developers
and users to experiment with different security schemes. If all goes
according to plan, LSM should enable the creation of a large library of
security modules suited to the needs of many different sites.
Driver porting
This week's driver porting article (below) discusses the class mechanism,
which is part of the device model. At this point, this series is nearing
completion. There will be an occasional new article, and the existing base
of articles (30 of them, now) will be updated as the kernel hackers do
their best to make them obsolete. But these articles will no longer appear
every week. Creating this series has been a lot of work, but also a lot of
fun; many thanks to all of you for your support and helpful comments.
Previous articles in this series have shown how the device model maintains a data
structure representing the physical structure of the host system. There is
more to know about a system than how it is plugged together, however;
indeed, most of the time, user space really does not care about physical
connections. Users (and the applications they run) are much more
interested in questions like "what disks
does this system have" or "where is the mouse?"
To help with this sort of resource discovery issue, the driver model
exports a "class" interface. Devices, once registered, can be associated
with one or more classes which describe the function(s) performed by the
device. Class memberships show up under the /sys/class sysfs
directory, and, of course, can be decorated with all kinds of attributes.
There are also mechanisms which provide notification - both within and
outside of the kernel - when a device joins or leaves a class. The class
interface can also be the easiest way for a driver to make arbitrary
attributes available via sysfs.
For many (if not most) drivers, class membership will be handled
automatically in the higher layers. Block devices, for example, are
associated with the "block" class when their associated gendisk
structures are registered. (This class currently appears in
/sys/block, incidentally; it will likely move to
/sys/class/block at some point.) Occasionally, however, it can be
necessary to explicitly associate a device with a specific class. This
article describes how to do that; though it remains superficial, it
provides more detail than most drivers will need, in the hope of
conveying how the class system works.
For those wishing for a hands-on example, the full source for a version of the "simple block
driver" module that understands classes is available.
Creating a class
It is a rare device which exists in a unique class of its own; as a result,
drivers will almost never create their own classes. Should the need arise,
however, the process is simple. The first step is the creation of a
struct class (defined in <linux/device.h>). Two fields must be filled in:
the name and a pointer to a "release" function. The SBD driver sets up its
class as:
static struct class sbd_class = {
    .name = "sbd",
    .release = sbd_class_release
};
The name is, of course, how this class will show up under
/sys/class. We will get to the release function shortly, after we
have looked at class devices.
Beyond that, there is only one other thing that a class definition can
provide: a "hotplug" function:
int (*hotplug)(struct class_device *dev, char **envp,
int num_envp, char *buffer, int buffer_size);
The addition of a device to a class creates a hotplug event. Before
/sbin/hotplug is called to respond to that event, the class's
hotplug() method (if any) will be called. That method can add
variables to the environment that is passed to /sbin/hotplug; they
should be put into buffer (respecting the given
buffer_size) with pointers set into envp (but no more
than num_envp of them, and with a NULL pointer to
terminate the list). The return value should be zero, or the
usual negative error code.
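As a rough illustration of those rules, here is a user-space model of a hotplug method that exports a single variable. The structure below is a cut-down stand-in for the kernel's struct class_device, the BUS_ID_SIZE value is chosen for the model, and the function name sbd_hotplug() and the variable name SBD_CLASS_ID are invented for this sketch; none of them come from the SBD source.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define BUS_ID_SIZE 20      /* stand-in value for this user-space model */

/* Stand-in for the kernel structure; only the field used here. */
struct class_device {
    char class_id[BUS_ID_SIZE];
};

/*
 * Hypothetical hotplug method: exports the device's class ID to
 * /sbin/hotplug as SBD_CLASS_ID=<id>.
 */
int sbd_hotplug(struct class_device *dev, char **envp,
                int num_envp, char *buffer, int buffer_size)
{
    int len;

    /* Room is needed for one variable plus the terminating NULL. */
    if (num_envp < 2 || buffer_size <= 0)
        return -ENOMEM;

    /* The string itself lives in buffer, within buffer_size bytes. */
    len = snprintf(buffer, buffer_size, "SBD_CLASS_ID=%s", dev->class_id);
    if (len < 0 || len >= buffer_size)
        return -ENOMEM;     /* the variable did not fit */

    envp[0] = buffer;       /* envp entries point into buffer */
    envp[1] = NULL;         /* terminate the list */
    return 0;
}
```

A method adding several variables would advance through buffer after each string, checking both limits every time.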
Classes need to be registered, of course:
int class_register(struct class *cls);
The return value will be zero if all goes well. The void function
class_unregister() will do exactly what one would expect.
Class devices
If your device type lacks a specific registration function of its own (such
as add_disk() or register_netdev()), or if you have created your own custom
class, you may find yourself adding your device(s) to a class explicitly.
Membership in a class is represented by an instance of
struct class_device. There are three fields that should normally be filled
in:
struct class *class;
struct device *dev;
char class_id[BUS_ID_SIZE];
The class pointer, of course, should be aimed at the proper class
structure. The dev pointer is optional; it is used to create the
device and driver symbolic links in the device's class
entry in sysfs. Since
user-space processes looking to discover devices of a particular class
probably want to have that pointer, you should make it easy for them. The
class_id is a string which is unique within the class - it
becomes, of course, the name of the device's sysfs entry.
Once the class_device structure has been set up, it can be added
to the class with:
int class_device_register(struct class_device *class_dev);
class_device_unregister() can be used at module unload time.
Once you register a class device, it becomes available to the world as a
whole. If your class device is allocated dynamically, you must be very
careful about when you free it. Remember that user-space processes can
retain references to your device via your sysfs attributes; you must not
free the class device until all of those references are gone.
That, of course, is the purpose of the release function stored in
struct class. This function has a simple prototype:
void release_fn(struct class_device *cd);
This function is called when the last reference to the given device goes
away; it should respond by freeing the device. That call will typically
happen when you call class_device_unregister() on the device, but
it could happen later if other references persist.
Please note that, if your class device structure is dynamically allocated,
or is embedded within another, dynamic structure, you must use a
release function to free that structure or your code is buggy.
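The lifetime rule can be demonstrated with a small user-space model. Everything here is a stand-in: the structures are reduced to the fields the model needs, the explicit refcount field replaces the reference counting the kernel does internally, and struct sbd_device, model_register(), model_get() and model_put() are hypothetical names rather than kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct class_device;

/* Stand-ins for the kernel structures, reduced to what the model needs. */
struct class {
    const char *name;
    void (*release)(struct class_device *cd);
};

struct class_device {
    struct class *class;
    int refcount;           /* explicit here; the kernel tracks this itself */
};

/* Hypothetical driver structure with the class device embedded in it. */
struct sbd_device {
    int size;
    struct class_device cdev;
};

int sbd_release_calls;      /* counts release calls, for demonstration only */

/* Release method: recover the containing structure and free it. */
void sbd_class_release(struct class_device *cd)
{
    struct sbd_device *dev = (struct sbd_device *)
        ((char *)cd - offsetof(struct sbd_device, cdev));

    sbd_release_calls++;
    free(dev);
}

struct class sbd_class = { .name = "sbd", .release = sbd_class_release };

/* Take a reference, as an open sysfs attribute would. */
void model_get(struct class_device *cd)
{
    cd->refcount++;
}

/* Drop a reference; dropping the last one calls the class's release method. */
void model_put(struct class_device *cd)
{
    assert(cd->refcount > 0);
    if (--cd->refcount == 0)
        cd->class->release(cd);
}

/* Allocate a device holding the one reference that registration takes. */
struct sbd_device *model_register(void)
{
    struct sbd_device *dev = calloc(1, sizeof(*dev));

    if (dev) {
        dev->cdev.class = &sbd_class;
        dev->cdev.refcount = 1;
    }
    return dev;
}
```

If a second reference is taken with model_get() before the registration reference is dropped, the first model_put() leaves the structure alive; only the final model_put() reaches sbd_class_release(). Freeing the structure directly at unregistration time would leave the remaining reference holder with a dangling pointer.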
Class device attributes
Attributes are easily added to a class device entry. If the attribute is
to be readable, it will need a "show" function to respond to reads; the
function used to export the driver version in SBD looks like:
static ssize_t show_version(struct class_device *cd, char *buf)
{
    /* Return the number of bytes written, not counting the trailing NUL. */
    return sprintf(buf, "%s\n", Version);
}
If the attribute is to be writable, you will need a store function too:
ssize_t (*store)(struct class_device *, const char *buf, size_t count);
These functions are then bundled into an attribute structure with:
CLASS_DEVICE_ATTR(name, mode, show, store);
The name should not be a quoted string; it is joined in the macro
to create a structure called class_device_attr_name.
The final step is to create the actual device attribute, using:
int class_device_create_file(struct class_device *,
struct class_device_attribute *);
You can call class_device_remove_file() to get rid of an
attribute, but that is also done automatically for you when a device is
removed from a class.
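To make the naming rule concrete, here is a user-space sketch of how such a macro can build the class_device_attr_name structure. The structure layouts and the macro body are simplified stand-ins written for this example, not the definitions from <linux/device.h>, and the Version string is a made-up value. The show function returns the byte count reported by sprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>      /* ssize_t, mode_t */

struct class_device { int unused; };    /* stand-in; contents not needed */

/* Simplified stand-ins for the attribute structures. */
struct attribute {
    const char *name;
    mode_t mode;
};

struct class_device_attribute {
    struct attribute attr;
    ssize_t (*show)(struct class_device *cd, char *buf);
    ssize_t (*store)(struct class_device *cd, const char *buf, size_t count);
};

/*
 * Stand-in macro: #_name turns the bare name into the sysfs file name,
 * while ##_name pastes it onto class_device_attr_ to form the variable
 * name. This is why the name must not be a quoted string.
 */
#define CLASS_DEVICE_ATTR(_name, _mode, _show, _store)              \
    struct class_device_attribute class_device_attr_##_name = {     \
        .attr  = { .name = #_name, .mode = _mode },                 \
        .show  = _show,                                             \
        .store = _store,                                            \
    }

static const char Version[] = "1.0";    /* hypothetical version string */

/* Read-only attribute: write the version and return the byte count. */
ssize_t show_version(struct class_device *cd, char *buf)
{
    (void)cd;               /* unused in this model */
    return sprintf(buf, "%s\n", Version);
}

/* Defines class_device_attr_version; no store method, so mode 0444. */
CLASS_DEVICE_ATTR(version, 0444, show_version, NULL);
```

In a real driver, the address of the resulting structure is what would be handed to class_device_create_file().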
Interfaces
The term "interface," as used within the device model, is a bit confusing.
A better way to think of interfaces is as a sort of constructor and
destructor mechanism for class device entries. An interface provides add()
and remove() methods which are called as devices are added to (and removed
from) a class; their usual purpose is to add class-specific attributes to
the class device entry. They can, however, perform any other kernel
function that might be useful in response to class device events.
Briefly, the creation of an interface requires the creation of a
class_interface structure, which needs to have the following
fields filled in:
struct class *class;
int (*add) (struct class_device *);
void (*remove) (struct class_device *);
The interface is then registered with:
int class_interface_register(struct class_interface *);
The add() and remove() functions will be called when
devices are added to (or removed from) the given class. A call to
class_interface_unregister() undoes the registration.
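The constructor/destructor behavior can be modeled in user space as follows. This is only a sketch of the calling pattern: the model_*() functions, the fixed-size table, the has_stats_attr flag and the sbd_intf interface are all hypothetical, and the structures are stand-ins carrying just the fields listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel structures, reduced to what the model needs. */
struct class { const char *name; };

struct class_device {
    struct class *class;
    int has_stats_attr;     /* hypothetical: 1 while the extra attribute exists */
};

struct class_interface {
    struct class *class;
    int (*add)(struct class_device *);
    void (*remove)(struct class_device *);
};

#define MODEL_MAX_INTF 4
static struct class_interface *model_intf[MODEL_MAX_INTF];
static int model_nintf;

/* Plays the role of class_interface_register() in this model. */
int model_interface_register(struct class_interface *intf)
{
    assert(intf && intf->class);
    if (model_nintf == MODEL_MAX_INTF)
        return -1;
    model_intf[model_nintf++] = intf;
    return 0;
}

/* A device joins its class: call add() on each interface for that class. */
void model_device_add(struct class_device *cd)
{
    int i;

    for (i = 0; i < model_nintf; i++)
        if (model_intf[i]->class == cd->class && model_intf[i]->add)
            model_intf[i]->add(cd);
}

/* A device leaves its class: call remove() on each matching interface. */
void model_device_remove(struct class_device *cd)
{
    int i;

    for (i = 0; i < model_nintf; i++)
        if (model_intf[i]->class == cd->class && model_intf[i]->remove)
            model_intf[i]->remove(cd);
}

/* Hypothetical interface that decorates "sbd" devices with one attribute. */
int sbd_intf_add(struct class_device *cd)
{
    cd->has_stats_attr = 1;
    return 0;
}

void sbd_intf_remove(struct class_device *cd)
{
    cd->has_stats_attr = 0;
}

struct class sbd_class = { .name = "sbd" };
struct class_interface sbd_intf = {
    .class = &sbd_class,
    .add = sbd_intf_add,
    .remove = sbd_intf_remove,
};
```

Devices belonging to a different class pass through model_device_add() untouched, since each interface is bound to exactly one class.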
Patches and updates
Device drivers
- Douglas Gilbert: sgbind.
(May 14, 2003)
Page editor: Jonathan Corbet