Brief items
The current development kernel is 2.6.0-test7, which was
released by Linus on October 8. Changes
this time include a bunch of janitorial work, some IDE driver updates, a
new filesystem mount option parsing scheme, a change to how module array
parameters are declared, some video for Linux updates, an ACPI update, an
XFS update, a reserved system call for the vserver project, and lots of
fixes. See the long-format changelog for the details.
As part of the announcement, Linus has stated that he is tightening up the
criteria for accepting patches:
The more interesting thing is that I and Andrew are trying to calm
down development, and I do _not_ want to see patches that don't fix
a real and clear bug. In other words, the "cleanup and janitorial"
stuff is on hold, and -test8 and then -test9 should be for
_stability_ fixes only.
The current stable kernel is 2.4.22; the last 2.4.23 prepatch was 2.4.23-pre6 on October 1.
Kernel development news
Much attention is paid to official kernel releases from Linus and Marcelo.
There are, however, a number of other kernel trees out there, many of which
offer different views of how the kernel should be or where its development
should go. It has been a while since we've looked at the alternative
kernel trees which are currently being maintained, so it is a good time for
an update. We'll start with the 2.6-based trees.
Andrew Morton's -mm tree (currently at 2.6.0-test6-mm4) remains the largest staging
area for code headed toward the mainline. The -mm kernels give big patches
a place where they can be examined and tested without breaking the mainline
kernel. This tree currently contains, beyond lots of fixes, a full set of
kgdb patches, the current versions of the must-fix and should-fix
lists, a bunch of VFS work from Al Viro (aimed at making hot removal of
disks work properly), the CFQ disk I/O scheduler, Intel MSI and EFI
support, the 4G/4G large memory patch, a lot of direct and asynchronous I/O
work, and a patch called "support-zillions-of-scsi-disks."
Stephen Hemminger recently released 2.6.0-test6-osdl1. This relatively small patch
has been reborn; it now concerns itself with features that will be merged
after the 2.6.0 release, if ever. Thus, it includes a patch adding file
extents to ext3, Ingo Molnar's ExecShield, the Linux kernel crash dump
facility, the kexec system call, and a few others.
Martin Bligh continues to release occasional -mjb kernels; the latest is 2.6.0-test6-mjb1. These kernels have
"mainly scalability and NUMA stuff, and anything else that stops
things from irritating me." The patch currently includes a
configuration option for the internal clock speed, a number of tunable
parameters for the scheduler, the lockmeter patch, the object-based reverse
mapping patch, and a number of NUMA-related patches.
Randy Dunlap posts an occasional -kj tree; 2.6.0-test6-kj1 was released on
September 29. This is not the tree for people seeking exciting new
features; its purpose is to serve as a collection area for janitorial
patches that might otherwise fall through the cracks.
Alan Cox's departure from the kernel scene has left a large hole where his
2.4-based -ac tree used to be. Many distributors based their stock kernels
on something close to Alan's tree. The -ac tree has technically been taken
over by Bernhard Rosenkraenzer, but his last release was 2.4.23-pre4-pac2 on September 17.
Andrea Arcangeli has, of late, started announcing more of his -aa trees to
the world. His latest (2.4.23pre6aa2)
includes a new "desktop" boot parameter which sets several options for
optimal desktop performance, run-time configurable internal clock speed,
some virtual memory work, the TUX HTTP server, kgdb, the 2.6 "futex"
feature, XFS version 13, Jens Axboe's "laptop mode" patch, and many
others. Andrea plans to include Jeff Garzik's "libata" disk drivers soon.
James Bourne maintains a "-uv" patch series which is limited to compilation
and security fixes for the current stable kernel. The latest is 2.4.22-uv2.
Tom Zanussi has posted a new version of the "relayfs" filesystem code; the
full set of patches can be found in the "patches" section, below. Relayfs
is an attempt to provide a common framework for kernel code which must
exchange large amounts of data with user space. The initial application
would appear to be for kernel event tracing and profiling, but one can
certainly imagine other ways to use such a system as well.
Relayfs is, of course, yet another virtual filesystem implemented by the
kernel; it must be explicitly mounted by user space to be available.
Kernel code can then create a relay with relay_open(); it will
show up as a file under relayfs. User space can then open the relay and
employ all of the usual file operations - including mmap() and
poll() - to exchange data with the kernel. To an application, a
relayfs file descriptor looks much like a Unix-domain socket, except that
the other end is a piece of kernel code rather than another process.
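From the application side, waiting for data on a relay could look roughly like the following. This is only a sketch: the helper name relay_wait_and_read() is invented for illustration, and it uses nothing beyond the standard poll() and read() calls, so it works on any pollable file descriptor. Where a relay file would actually live (for example under a hypothetical mount point such as /mnt/relay) depends on how relayfs is mounted and what name the kernel code gives the relay.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for data on fd, then read at most len bytes.
 * Returns the number of bytes read, 0 on timeout or end of file,
 * or -1 on error (errno is set by poll() or read()).
 */
ssize_t relay_wait_and_read(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;

    assert(buf != NULL);

    /* Retry if a signal interrupts the wait. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);

    if (ready <= 0)
        return ready;   /* 0: timed out, -1: poll() failed */

    return read(fd, buf, len);
}
```

An application would open() the relay file once and then call this helper in a loop; mmap() access to the buffer is a separate path that this sketch does not cover.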
The interface on the kernel side is a bit more complex. The expected
relay_read() and relay_write() functions exist and can be
used to move data to and from user space. But relayfs also exposes much of
the internal structure to kernel code that needs to know about it. So
special-purpose code can obtain a pointer into the relayfs buffer and copy
data there directly, for example. There is also a set of callbacks for
kernel code that wants to know about relayfs events, and a set of utilities
for manipulating the buffer size, optimizing the locking used, etc.
Relayfs is a non-intrusive patch - it does not affect parts of the kernel
that are not explicitly changed to make use of it. So it is conceivable
that this patch could yet make it into a 2.6 release. The reimplementation
of printk() which uses relayfs might have to wait a little longer,
however.
Driver porting
Last week, in the
article about kobjects, it was mentioned that a kset has a set of
hotplug operations. This week we will introduce the hotplug operations,
and detail how they work.
Remember that a kset is a group of kobjects which are all embedded in
the same type of structure. In the definition of a kset, a pointer to a
struct kset_hotplug_ops is specified. If this pointer is
set, whenever a kobject that is a member of that kset is created or
destroyed by the kernel, the userspace program /sbin/hotplug
will be called. If a kobject does not have a kset associated with it,
the kernel will traverse up the kobject hierarchy (using the parent
pointer) to try to find a kset to use for this test.
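The upward search described above can be modeled in a few lines of ordinary C. The structures below are simplified user-space stand-ins that carry only the fields this paragraph mentions, and find_hotplug_kset() is a name made up for the sketch; it is not the kernel's own code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins for the kernel structures. */
struct kset {
    const char *name;
};

struct kobject {
    const char *name;
    struct kobject *parent;   /* NULL at the top of the hierarchy */
    struct kset *kset;        /* NULL if not a member of any kset */
};

/*
 * Return the kset of kobj, or of its nearest ancestor that has one.
 * Returns NULL if no kobject on the path to the root belongs to a kset.
 */
struct kset *find_hotplug_kset(struct kobject *kobj)
{
    while (kobj != NULL && kobj->kset == NULL)
        kobj = kobj->parent;

    return kobj ? kobj->kset : NULL;
}
```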
struct kset_hotplug_ops is a structure containing three
function pointers and is defined as:
struct kset_hotplug_ops {
    int (*filter)(struct kset *kset, struct kobject *kobj);
    char *(*name)(struct kset *kset, struct kobject *kobj);
    int (*hotplug)(struct kset *kset, struct kobject *kobj,
                   char **envp, int num_envp,
                   char *buffer, int buffer_size);
};
Hotplug filters
The filter function will be called by the kernel before a
hotplug operation happens. The kobject and the kset which are being used
for the hotplug event are passed as parameters to the function. If this
function returns 1 then the hotplug event will be generated;
otherwise (if the function returns 0), the hotplug event will not be
generated. This function is used by the driver core and the block
subsystem to filter out hotplug events for kobjects that are owned by
these systems but which should not have hotplug events generated for them.
As an example, the driver core's hotplug filter is contained in the file
drivers/base/core.c and looks like:
static int dev_hotplug_filter(struct kset *kset, struct kobject *kobj)
{
    struct kobj_type *ktype = get_ktype(kobj);
    if (ktype == &ktype_device) {
        struct device *dev = to_dev(kobj);
        if (dev->bus)
            return 1;
    }
    return 0;
}
In this function, the first thing that happens is that the type of the
kobject is checked. If this really is a device type of kobject,
then we know it is safe to cast this kobject to a struct device,
which is done in the line:
    struct device *dev = to_dev(kobj);
If this device has a bus assigned to it (dev->bus), the
filter function tells the kobject core that it is acceptable to generate
a hotplug event for this object. If either of these tests fails, the
function returns 0, stating that no hotplug event should be
generated.
The filter function allows objects in the device tree to own
kobjects themselves (to create subdirectories, and for other uses) and
prevent hotplug events from being created for these child kobjects.
Hotplug event names
When /sbin/hotplug is called by the kernel, it only has one
argument passed to it, the name of the subsystem creating the event.
All other information about the hotplug event is passed in environment
variables. For detailed examples of some of the hotplug events and
environment variables, see the
Linux Hotplug project website.
For the kobject core to know what kind of name to provide to this
hotplug event, the name function callback is provided. If the
kset associated with this kobject wants to override the name of the kset
for the hotplug event, then this function needs to return a pointer to a
string that is more suitable. If this function is not provided, or it
returns NULL, then the kset's name will be used.
For example, all struct device objects in the kernel belong to
the same device kset (the device, driver, and class model sits on top of
kobjects and ksets, making it simpler for driver authors to use). This
kset is called "devices". It would not make much sense for
every USB or IEEE1394 device that was plugged into, or removed from, the
system to generate a hotplug event with the name "devices".
Because of this, the device subsystem has a name function for its
hotplug operations:
static char *dev_hotplug_name(struct kset *kset, struct kobject *kobj)
{
    struct device *dev = to_dev(kobj);
    return dev->bus->name;
}
In this function, the kobject is converted to a struct device,
and then the name of the bus associated with this device is returned.
This allows USB devices to create hotplug events with the name
"usb" and IEEE1394 devices to create hotplug events with the
name "ieee1394".
One note about this function: the only way that we know it is safe to
directly cast this kobject into a struct device is that it has
passed the filter function first. In that function, the type
of the kobject and the fact that the device had a pointer to a bus was
verified. Without that filter function, that information would have to
be checked before blindly casting and following two levels of pointer
indirection.
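The way the filter and name callbacks combine can be sketched as a small user-space model. Everything here is a simplified stand-in: the hotplug_ops field name, the bus_name member of struct kobject, and the functions has_bus(), bus_name() and hotplug_event_name() are assumptions made for illustration, and the name callback returns const char * rather than the char * of the real structure.

```c
#include <assert.h>
#include <stddef.h>

struct kset;

/* Stand-in kobject: bus_name plays the role of dev->bus->name. */
struct kobject {
    const char *bus_name;     /* NULL if the device has no bus */
};

struct kset_hotplug_ops {
    int (*filter)(struct kset *kset, struct kobject *kobj);
    const char *(*name)(struct kset *kset, struct kobject *kobj);
};

struct kset {
    const char *name;
    const struct kset_hotplug_ops *hotplug_ops;   /* may be NULL */
};

/* Example filter: only objects attached to a bus generate events. */
int has_bus(struct kset *kset, struct kobject *kobj)
{
    (void)kset;
    return kobj->bus_name != NULL;
}

/* Example name callback: safe only because has_bus() ran first. */
const char *bus_name(struct kset *kset, struct kobject *kobj)
{
    (void)kset;
    return kobj->bus_name;
}

/*
 * Decide which name, if any, an event for kobj should carry.
 * Returns NULL when no event should be generated.
 */
const char *hotplug_event_name(struct kset *kset, struct kobject *kobj)
{
    const struct kset_hotplug_ops *ops;
    const char *name = NULL;

    assert(kset != NULL);
    ops = kset->hotplug_ops;

    if (ops == NULL)
        return NULL;          /* no hotplug operations, no event */
    if (ops->filter && !ops->filter(kset, kobj))
        return NULL;          /* the filter rejected the event */
    if (ops->name)
        name = ops->name(kset, kobj);

    /* A missing callback or a NULL result falls back to the kset name. */
    return name ? name : kset->name;
}
```

Running the filter before the name callback is what lets bus_name() dereference its argument without any checks of its own.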
Hotplug environment variables
All calls to /sbin/hotplug provide the majority of information
within environment variables. The three variables that are always set
for every hotplug call are the following:
| Variable | Value | Description |
| ACTION | add or remove | Describes whether the kobject is being added to or removed from the system. |
| SEQNUM | numeric | Provides the sequence number of the hotplug event. Userspace uses it to determine whether it has received hotplug events out of order. The value starts out at 0 when the kernel boots, and increments with every /sbin/hotplug call. It is a 64-bit number, so it will not roll over for a very long time. |
| DEVPATH | string | The path to the kobject that the hotplug event is happening on, within the sysfs file system. To get the true filesystem location for this kobject, add the mount point for sysfs (usually /sys) to the beginning of this string. |
These variables are usually enough for userspace to determine what is
happening with this hotplug event, but a lot of subsystems want to
provide more information. This is especially true when a
kobject is removed from the system, as the sysfs entry for the device
will also be removed, preventing userspace from being able to look up
any attributes about the device that was just removed. Because of this,
the hotplug callback is provided for the kset to provide any
additional environment variables that it wants to.
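On the receiving end, a /sbin/hotplug replacement written in C might start by collecting the three standard variables, roughly as below. This is a sketch under assumptions: struct hotplug_event and parse_hotplug_env() are invented names, SEQNUM is assumed to be written in decimal, and the 256-byte path limit is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct hotplug_event {
    const char *action;            /* "add" or "remove" */
    unsigned long long seqnum;     /* 64-bit sequence number */
    char syspath[256];             /* sysfs mount point + DEVPATH */
};

/*
 * Fill *ev from the three variables that are always set.
 * sysfs_root is the sysfs mount point, usually "/sys".
 * Returns 0 on success, -1 if a variable is missing or malformed.
 */
int parse_hotplug_env(struct hotplug_event *ev, const char *sysfs_root)
{
    const char *action = getenv("ACTION");
    const char *seqnum = getenv("SEQNUM");
    const char *devpath = getenv("DEVPATH");
    char *end;
    int n;

    if (!action || !seqnum || !devpath)
        return -1;
    if (strcmp(action, "add") != 0 && strcmp(action, "remove") != 0)
        return -1;

    /* Assumption: the kernel prints SEQNUM as a decimal number. */
    errno = 0;
    ev->seqnum = strtoull(seqnum, &end, 10);
    if (errno != 0 || end == seqnum || *end != '\0')
        return -1;

    /* Prepend the sysfs mount point to get the real filesystem path. */
    n = snprintf(ev->syspath, sizeof(ev->syspath), "%s%s",
                 sysfs_root, devpath);
    if (n < 0 || (size_t)n >= sizeof(ev->syspath))
        return -1;                 /* the path would have been truncated */

    ev->action = action;
    return 0;
}
```

For a remove event the resulting path will usually no longer exist by the time the program runs, which is exactly why subsystems pass extra variables.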
The hotplug function callback is allowed to add any additional
environment variables that the kset might want added for this call to
/sbin/hotplug. To review the prototype for this function:
    int (*hotplug)(struct kset *kset, struct kobject *kobj,
                   char **envp, int num_envp,
                   char *buffer, int buffer_size);
Here, kset and kobj are the objects for which the event
is happening, envp is a pointer to an array of environment
variables (in the usual "NAME=value" format), num_envp is the
length of envp,
buffer is a buffer where additional variables can be put, and
buffer_size is the size of buffer.
The hotplug function should create any additional environment variables
that are called for, store pointers to them in envp, and terminate
envp with a NULL.
If the hotplug callback returns a non-zero value, the hotplug
event is aborted, and /sbin/hotplug will not be called.
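The bookkeeping a callback has to do with envp and buffer can be shown with a user-space model that adds a single variable. The function add_product_var() and its vendor, product and device parameters are hypothetical; a real callback receives the kset and kobject instead and would return a negative errno value such as -ENOMEM on failure.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Append "PRODUCT=vendor/product/device" to the environment.
 * Returns 0 on success or -1 if envp or buffer is too small.
 */
int add_product_var(char **envp, int num_envp, char *buffer, int buffer_size,
                    unsigned vendor, unsigned product, unsigned device)
{
    int i = 0;
    int n;

    assert(envp != NULL && buffer != NULL);

    /* One slot is needed for the variable, one for the terminating NULL. */
    if (num_envp < 2 || buffer_size <= 0)
        return -1;

    n = snprintf(buffer, buffer_size, "PRODUCT=%x/%x/%x",
                 vendor, product, device);

    /* snprintf() returns the untruncated length, so n >= size means
     * the string did not fit. */
    if (n < 0 || n >= buffer_size)
        return -1;

    envp[i++] = buffer;
    envp[i] = NULL;
    return 0;
}
```

A callback adding several variables would advance a pointer into buffer past each string's terminating NUL and repeat the same two checks before every addition.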
The driver and class subsystems pass hotplug calls
down to the bus and class owners of the kobject that is being
created or removed, allowing these individual subsystems to add
their own environment variables. For example, for all devices located
on the USB bus, the function usb_hotplug() in the
drivers/usb/core/usb.c file will be called. This function is
defined as (with much of the boring code removed):
static int usb_hotplug(struct device *dev, char **envp, int num_envp,
                       char *buffer, int buffer_size)
{
    struct usb_interface *intf;
    struct usb_device *usb_dev;
    char *scratch;
    int i = 0;
    int length = 0;
    /* ... */
    intf = to_usb_interface(dev);
    usb_dev = interface_to_usbdev(intf);
    /* ... */
    scratch = buffer;
    envp[i++] = scratch;
    length += snprintf(scratch, buffer_size - length, "PRODUCT=%x/%x/%x",
                       usb_dev->descriptor.idVendor,
                       usb_dev->descriptor.idProduct,
                       usb_dev->descriptor.bcdDevice);
    if ((buffer_size - length <= 0) || (i >= num_envp))
        return -ENOMEM;
    ++length;
    scratch += length;
    /* ... */
    envp[i++] = NULL;
    return 0;
}
The lines:
    scratch = buffer;
    envp[i++] = scratch;
set up the environment pointer to point to the next location in the
buffer passed to us. Then the big call to snprintf creates a variable
called PRODUCT which is assigned the value of the USB device's vendor,
product and device ids separated by a '/' character. If snprintf
succeeded in not overrunning the buffer provided to us, and we still have
enough room for one more environment variable, then the function
continues on. The last environment variable pointer is set to NULL before
returning.
All that work for a simple result
With the combined effort of the kset hotplug function callbacks, every
kset can customize the call to /sbin/hotplug in whatever way it
likes while still providing userspace a consistent interface from the
kernel. Every kobject that is registered with sysfs can generate this
call easily, so all parts of the kernel that use kobjects and ksets
automatically get the /sbin/hotplug interface for free. This gives
userspace projects such as the module loading scripts, devlabel, udev,
and D-BUS valuable information as to what the kernel is doing whenever a
change in the kobject tree occurs.
Patches and updates
Kernel trees
Core kernel code
Development tools
Device drivers
Documentation
Filesystems and block I/O
Networking
Benchmarks and bugs
Page editor: Jonathan Corbet