Kernel development [LWN.net]

Kernel release status

The current development kernel is 2.6.0-test7, which was released by Linus on October 8. Changes this time include a bunch of janitorial work, some IDE driver updates, a new filesystem mount option parsing scheme, a change to how module array parameters are declared, some video for Linux updates, an ACPI update, an XFS update, a reserved system call for the vserver project, and lots of fixes. See the long-format changelog for the details.

As part of the announcement, Linus has stated that he is tightening up the criteria for accepting patches.

The more interesting thing is that I and Andrew are trying to calm down development, and I do _not_ want to see patches that don't fix a real and clear bug. In other words, the "cleanup and janitorial" stuff is on hold, and -test8 and then -test9 should be for _stability_ fixes only.

The current stable kernel is 2.4.22; the last 2.4.23 prepatch was 2.4.23-pre6 on October 1.

Alternative kernel trees

Much attention is paid to official kernel releases from Linus and Marcelo. There are, however, a number of other kernel trees out there, many of which offer different views of how the kernel should be or where its development should go. It has been a while since we've looked at the alternative kernel trees which are currently being maintained, so it is a good time for an update. We'll start with the 2.6-based trees.

Andrew Morton's -mm tree (currently at 2.6.0-test6-mm4) remains the largest staging area for code headed toward the mainline. The -mm kernels give big patches a place where they can be examined and tested without breaking the mainline kernel. This tree currently contains, beyond lots of fixes, a full set of kgdb patches, the current versions of the must-fix and should-fix lists, a bunch of VFS work from Al Viro (aimed at making hot removal of disks work properly), the CFQ disk I/O scheduler, Intel MSI and EFI support, the 4G/4G large memory patch, a lot of direct and asynchronous I/O work, and a patch called "support-zillions-of-scsi-disks."

Stephen Hemminger recently released 2.6.0-test6-osdl1. This relatively small patch has been reborn; it now concerns itself with features that will be merged after the 2.6.0 release, if ever. Thus, it includes a patch adding file extents to ext3, Ingo Molnar's ExecShield, the Linux kernel crash dump facility, the kexec system call, and a few others.

Martin Bligh continues to release occasional -mjb kernels; the latest is 2.6.0-test6-mjb1. These kernels have "mainly scalability and NUMA stuff, and anything else that stops things from irritating me." The patch currently includes a configuration option for the internal clock speed, a number of tunable parameters for the scheduler, the lockmeter patch, the object-based reverse mapping patch, and a number of NUMA-related patches.

Randy Dunlap posts an occasional -kj tree; 2.6.0-test6-kj1 was released on September 29. This is not the tree for people seeking exciting new features; its purpose is to serve as a collection area for janitorial patches that might otherwise fall through the cracks.

Alan Cox's departure from the kernel scene has left a large hole where his 2.4-based -ac tree used to be. Many distributors based their stock kernels on something close to Alan's tree. The -ac tree has technically been taken over by Bernhard Rosenkraenzer, but his last release was 2.4.23-pre4-pac2 on September 17.

Andrea Arcangeli has, of late, started announcing more of his -aa trees to the world. His latest (2.4.23pre6aa2) includes a new "desktop" boot parameter which sets several options for optimal desktop performance, run-time configurable internal clock speed, some virtual memory work, the TUX HTTP server, kgdb, the 2.6 "futex" feature, XFS version 13, Jens Axboe's "laptop mode" patch, and many others. Andrea plans to include Jeff Garzik's "libata" disk drivers soon.

James Bourne maintains a "-uv" patch series which is limited to compilation and security fixes for the current stable kernel. The latest is 2.4.22-uv2.

Relayfs

Tom Zanussi has posted a new version of the "relayfs" filesystem code; the full set of patches can be found in the "patches" section, below. Relayfs is an attempt to provide a common framework for kernel code which must exchange large amounts of data with user space. The initial application would appear to be for kernel event tracing and profiling, but one can certainly imagine other ways to use such a system as well.

Relayfs is, of course, yet another virtual filesystem implemented by the kernel; it must be explicitly mounted by user space to be available. Kernel code can then create a relay with relay_open(); it will show up as a file under relayfs. User space can then open the relay and employ all of the usual file operations - including mmap() and poll() - to exchange data with the kernel. To an application, a relayfs file descriptor looks much like a Unix-domain socket, except that the other end is a piece of kernel code rather than another process.
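As a rough illustration of the user-space side, the sketch below waits for data on a file descriptor with poll() and then reads it. It assumes only that the relay file behaves like any other pollable descriptor, as described above; the helper name wait_and_read() is made up for this example, and the location of a relay file under the relayfs mount point depends on the kernel code that created it.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for data on fd, then read at most len bytes.
 * Returns the number of bytes read, 0 on timeout or end of file,
 * or -1 on error (errno is set by poll() or read()).
 * Hypothetical helper: not part of relayfs itself.
 */
ssize_t wait_and_read(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ready;

	/* Retry if a signal interrupts the wait. */
	do {
		ready = poll(&pfd, 1, timeout_ms);
	} while (ready < 0 && errno == EINTR);

	if (ready < 0)
		return -1;
	if (ready == 0)
		return 0;	/* timed out, nothing to read */

	return read(fd, buf, len);
}
```

An application would open() the relay file, call a helper like this in a loop, and process whatever arrives; a program that prefers mmap() would use the descriptor differently.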

The interface on the kernel side is a bit more complex. The expected relay_read() and relay_write() functions exist and can be used to move data to and from user space. But relayfs also exposes much of the internal structure to kernel code that needs to know about it. So special-purpose code can obtain a pointer into the relayfs buffer and copy data there directly, for example. There is also a set of callbacks for kernel code that wants to know about relayfs events, and a set of utilities for manipulating the buffer size, optimizing the locking used, etc.

Relayfs is a non-intrusive patch - it does not affect parts of the kernel that are not explicitly changed to make use of it. So it is conceivable that this patch could yet make it into a 2.6 release. The reimplementation of printk() which uses relayfs might have to wait a little longer, however.

kobjects and hotplug events

October 7, 2003

This article was contributed by Greg Kroah-Hartman.

Last week, in the article about kobjects, it was mentioned that a kset has a set of hotplug operations. This week we will introduce the hotplug operations, and detail how they work.

Remember that a kset is a group of kobjects which are all embedded in the same type of structure. In the definition of a kset, a pointer to a struct kset_hotplug_ops is specified. If this pointer is set, whenever a kobject that is a member of that kset is created or destroyed by the kernel, the userspace program /sbin/hotplug will be called. If a kobject does not have a kset associated with it, the kernel will traverse up the kobject hierarchy (using the parent pointer) to try to find a kset to use for this test.
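The lookup described above can be modelled in a few lines of ordinary C. This is a user-space sketch only: the structures are cut-down stand-ins that keep just the fields needed here, and find_hotplug_kset() is a name invented for the example rather than a kernel function.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields needed for the lookup are modelled). */
struct kset_hotplug_ops;	/* left opaque in this sketch */

struct kset {
	const char *name;
	struct kset_hotplug_ops *hotplug_ops;
};

struct kobject {
	const char *name;
	struct kobject *parent;
	struct kset *kset;
};

/*
 * Walk up the parent chain until a kobject that belongs to a kset is
 * found. Returns that kset, or NULL if no ancestor has one.
 */
struct kset *find_hotplug_kset(struct kobject *kobj)
{
	while (kobj && !kobj->kset)
		kobj = kobj->parent;
	return kobj ? kobj->kset : NULL;
}
```

Once a kset has been found this way, its hotplug_ops pointer decides whether a hotplug event is considered at all.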

struct kset_hotplug_ops is a structure containing three function pointers and is defined as:

    struct kset_hotplug_ops {
        int (*filter)(struct kset *kset, struct kobject *kobj);
        char *(*name)(struct kset *kset, struct kobject *kobj);
        int (*hotplug)(struct kset *kset, struct kobject *kobj,
                       char **envp, int num_envp,
                       char *buffer, int buffer_size);
    };

Hotplug filters

The filter function will be called by the kernel before a hotplug operation happens. The kobject and the kset which are being used for the hotplug event are passed as parameters to the function. If this function returns 1 then the hotplug event will be generated; otherwise (if the function returns 0), the hotplug event will not be generated. This function is used by the driver core and the block subsystem to filter out hotplug events for kobjects that are owned by these systems but which should not have hotplug events generated for them.

As an example, the driver core's hotplug filter is contained in the file drivers/base/core.c and looks like:

static int dev_hotplug_filter(struct kset *kset, struct kobject *kobj)
{
	struct kobj_type *ktype = get_ktype(kobj);

	if (ktype == &ktype_device) {
		struct device *dev = to_dev(kobj); 
		if (dev->bus)
			return 1;
	}
	return 0;
}

In this function, the type of the kobject is checked first. If this really is a device type of kobject, then we know it is safe to cast this kobject to a struct device, which is done in the line:

    struct device *dev = to_dev(kobj);

If this device has a bus assigned to it (dev->bus), the filter function tells the kobject core that it is acceptable to generate a hotplug event for this object. If any of these tests fail, the function returns 0, stating that no hotplug event should be generated.

The filter function allows objects in the device tree to own kobjects themselves (to create subdirectories, and for other uses) and prevent hotplug events from being created for these child kobjects.

Hotplug event names

When /sbin/hotplug is called by the kernel, it only has one argument passed to it, the name of the subsystem creating the event. All other information about the hotplug event is passed in environment variables. For detailed examples of some of the hotplug events and environment variables, see the Linux Hotplug project website.

For the kobject core to know what kind of name to provide to this hotplug event, the name function callback is provided. If the kset associated with this kobject wants to override the name of the kset for the hotplug event, then this function needs to return a pointer to a string that is more suitable. If this function is not provided, or it returns NULL, then the kset's name will be used.
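The fallback rule can be sketched as follows. This is a user-space model, not the kobject core's actual code: the structures are reduced to the fields used here, and hotplug_event_name(), bus_style_name() and no_override() are hypothetical names introduced for illustration.

```c
#include <stddef.h>

struct kset;
struct kobject;

/* Reduced model: only the name callback is represented. */
struct kset_hotplug_ops {
	char *(*name)(struct kset *kset, struct kobject *kobj);
};

struct kset {
	char *name;
	struct kset_hotplug_ops *hotplug_ops;
};

struct kobject {
	char *name;
};

/*
 * Pick the name for a hotplug event: use the callback's answer if there
 * is a callback and it returns non-NULL, otherwise the kset's own name.
 */
char *hotplug_event_name(struct kset *kset, struct kobject *kobj)
{
	char *name = NULL;

	if (kset->hotplug_ops && kset->hotplug_ops->name)
		name = kset->hotplug_ops->name(kset, kobj);
	return name ? name : kset->name;
}

/* Example callback that overrides the kset name. */
char *bus_style_name(struct kset *kset, struct kobject *kobj)
{
	(void)kset;
	(void)kobj;
	return "usb";
}

/* Example callback that declines to override by returning NULL. */
char *no_override(struct kset *kset, struct kobject *kobj)
{
	(void)kset;
	(void)kobj;
	return NULL;
}
```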

For example, all struct device objects in the kernel belong to the same device kset (the device, driver, and class model sits on top of kobjects and ksets, making it simpler for driver authors to use). This kset is called "devices". It would not make much sense for every USB or IEEE1394 device that was plugged into, or removed from, the system to generate a hotplug event with the name "devices". Because of this, the device subsystem has a name function for its hotplug operations:

static char *dev_hotplug_name(struct kset *kset, struct kobject *kobj)
{       
	struct device *dev = to_dev(kobj);

	return dev->bus->name; 
}

In this function, the kobject is converted to a struct device, and then the name of the bus associated with this device is returned. This allows USB devices to create hotplug events with the name "usb" and IEEE1394 devices to create hotplug events with the name "ieee1394".

One note about this function: the only way that we know it is safe to directly cast this kobject into a struct device is that it has passed the filter function first. In that function, the type of the kobject and the fact that the device had a pointer to a bus was verified. Without that filter function, that information would have to be checked before blindly casting and following two levels of pointer indirection.

Hotplug environment variables

All calls to /sbin/hotplug provide the majority of information within environment variables. The three variables that are always set for every hotplug call are the following:

Variable	Value	Description
`ACTION`	`add` or `remove`	Describes if the kobject is being added or removed from the system.
`SEQNUM`	numeric	Provides the sequence number of the hotplug event. Userspace can use it to determine whether it has received hotplug events out of order. The value starts out at `0` when the kernel boots, and increments with every `/sbin/hotplug` call. It is a 64-bit number, so it will not roll over for a very long time.
`DEVPATH`	string	The path to the kobject that the hotplug event is happening on, within the `sysfs` file system. To get the true filesystem location for this kobject, add the mount point for `sysfs` (usually `/sys`) to the beginning of this string.
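On the receiving end, a hotplug handler written in C might process these variables roughly as shown below. This is a sketch under stated assumptions: the helper names are invented for the example, and SEQNUM is assumed to be written in decimal.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Build the full filesystem path for a hotplug event by prepending the
 * sysfs mount point to DEVPATH. Returns 0 on success, -1 if either
 * string is missing or the result does not fit in out.
 */
int hotplug_sysfs_path(const char *sysfs_mount, const char *devpath,
		       char *out, size_t out_size)
{
	int n;

	if (!sysfs_mount || !devpath)
		return -1;
	n = snprintf(out, out_size, "%s%s", sysfs_mount, devpath);
	if (n < 0 || (size_t)n >= out_size)
		return -1;	/* result was truncated */
	return 0;
}

/*
 * Parse the SEQNUM value (assumed decimal) into a 64-bit counter.
 * Returns 0 on success, -1 if the value is absent or malformed.
 */
int parse_seqnum(const char *value, unsigned long long *seq)
{
	char *end;

	if (!value || !isdigit((unsigned char)*value))
		return -1;
	*seq = strtoull(value, &end, 10);
	if (*end != '\0')
		return -1;
	return 0;
}
```

A real handler would obtain its inputs with getenv("ACTION"), getenv("DEVPATH") and getenv("SEQNUM"), and could compare each parsed sequence number with the last one it handled to notice events arriving out of order.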

These variables are usually enough for userspace to determine what is happening with this hotplug event, but many subsystems want to provide more information. This is especially true when a kobject is removed from the system: the sysfs entry for the device is removed as well, so userspace can no longer look up any attributes of the device that was just removed. For this reason, the hotplug callback lets the kset add any environment variables it wants to this call to /sbin/hotplug.

To review the prototype for this function:

    int (*hotplug)(struct kset *kset, struct kobject *kobj,
                   char **envp, int num_envp,
                   char *buffer, int buffer_size);

Here, kset and kobj are the objects for which the event is happening, envp is a pointer to an array of environment variables (in the usual "NAME=value" format), num_envp is the length of envp, buffer is a buffer where additional variables can be put, and buffer_size is the size of buffer. The hotplug function should create any additional environment variables that are called for, store pointers to them in envp, and terminate envp with a NULL. If the hotplug callback returns a non-zero value, the hotplug event is aborted, and /sbin/hotplug will not be called.
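The bookkeeping this contract requires (space left in the buffer, slots left in envp, and room for the terminating NULL) can be isolated in a small helper. The following is a user-space sketch of one way to do that; struct env_writer, env_add() and env_finish() are hypothetical names for this example and are not kernel interfaces. It assumes num_envp is at least 1.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bookkeeping for filling envp from a shared buffer. */
struct env_writer {
	char **envp;	/* array of pointers to "NAME=value" strings */
	int num_envp;	/* number of slots in envp */
	int used;	/* slots filled so far */
	char *pos;	/* next free byte in the buffer */
	size_t left;	/* bytes remaining in the buffer */
};

/*
 * Format one "NAME=value" string into the buffer and record it in envp.
 * One slot is always kept free for the terminating NULL.
 * Returns 0 on success, -1 if the array or the buffer is full.
 */
int env_add(struct env_writer *w, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (w->used >= w->num_envp - 1)
		return -1;

	va_start(ap, fmt);
	n = vsnprintf(w->pos, w->left, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= w->left)
		return -1;	/* output was truncated */

	w->envp[w->used++] = w->pos;
	w->pos += n + 1;	/* step past the terminating '\0' */
	w->left -= (size_t)n + 1;
	return 0;
}

/* Terminate the array; env_add() guarantees a slot remains for this. */
void env_finish(struct env_writer *w)
{
	w->envp[w->used] = NULL;
}
```

Advancing the write position by the length of each string, rather than by a running total, keeps the strings packed back to back in the buffer.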

The driver and class subsystems pass hotplug calls down to the bus and class owners of the kobject that is being created or removed, allowing these individual subsystems to add their own environment variables. For example, for all devices located on the USB bus, the function usb_hotplug() in the drivers/usb/core/usb.c file will be called. This function is defined as (with much of the boring code removed):

static int usb_hotplug(struct device *dev, char **envp, int num_envp,
		       char *buffer, int buffer_size)
{
	struct usb_interface *intf;
	struct usb_device *usb_dev;
	char *scratch;
	int i = 0;
	int length = 0;

	/* ... */
	intf = to_usb_interface(dev);
	usb_dev = interface_to_usbdev(intf);

	/* ... */
	scratch = buffer;
	envp[i++] = scratch;
	length += snprintf(scratch, buffer_size - length, "PRODUCT=%x/%x/%x",
			   usb_dev->descriptor.idVendor,
			   usb_dev->descriptor.idProduct,
			   usb_dev->descriptor.bcdDevice);
	if ((buffer_size - length <= 0) || (i >= num_envp))
		return -ENOMEM;
	++length;
	scratch += length;

	/* ... */
	envp[i++] = NULL;
	return 0;
}

The lines:

	scratch = buffer;
	envp[i++] = scratch;

set up the environment pointer to point to the next location in the buffer passed to us. Then the big call to snprintf creates a variable called PRODUCT which is assigned the value of the USB device's vendor, product and device ids, printed in hexadecimal and separated by '/' characters. If snprintf did not overrun the buffer provided to us, and there is still room in envp for one more environment variable, then the function continues on. The last environment variable pointer is set to NULL before returning.
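A userspace program receiving this event has to reverse the formatting. The sketch below parses a PRODUCT value back into its three numbers; parse_usb_product() is a name chosen for this example, and the only assumption about the input is the "%x/%x/%x" format shown in the kernel code above.

```c
#include <stdio.h>

/*
 * Parse a PRODUCT value of the form "vendor/product/device", where each
 * field is hexadecimal as produced by the "%x/%x/%x" format.
 * Returns 0 on success, -1 if the string does not match.
 */
int parse_usb_product(const char *value, unsigned int *vendor,
		      unsigned int *product, unsigned int *device)
{
	char trailing;

	if (!value)
		return -1;
	/* The extra %c detects unexpected characters after the third field. */
	if (sscanf(value, "%x/%x/%x%c", vendor, product, device,
		   &trailing) != 3)
		return -1;
	return 0;
}
```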

All that work for a simple result

With the combined effort of the kset hotplug function callbacks, every kset can customize the call to /sbin/hotplug in whatever way it likes while still providing userspace a consistent interface from the kernel. Every kobject that is registered with sysfs can generate this call easily, so all parts of the kernel that use kobjects and ksets automatically get the /sbin/hotplug interface for free. This gives userspace projects such as the module loading scripts, devlabel, udev, and D-BUS valuable information about what the kernel is doing whenever a change in the kobject tree occurs.

Patches and updates

Linus Torvalds: Linux 2.6.0-test7 - stability freeze

Andrew Morton: 2.6.0-test6-mm3

Andrew Morton: 2.6.0-test6-mm4

Martin J. Bligh: 2.6.0-test6-mjb1

Stephen Hemminger: 2.6.0-test6-osdl1

Andrea Arcangeli: 2.4.23pre6aa1

Andrea Arcangeli: 2.4.23pre6aa2

James Bourne: 2.4.22-uv2 patch set released

Peter Aechtler: [1/2] posix message queues

Peter Aechtler: [2/2] posix message queues

Robert Williamson: Linux Test Project October Release Announcement

Long Nguyen: Updated MSI Patches

Greg KH: USB fixes for 2.6.0-test6

Jeff Garzik: libata update posted

Gerd Knorr: v4l: videobuf update

Gerd Knorr: v4l: saa7146 driver update

Gerd Knorr: v4l: bttv driver update

Gerd Knorr: saa7134 driver update

David Brownell: [patch 2.6.0-test6] usbcore and driver model (0/3)

David Brownell: [patch 2.6.0-test6] rm interface.driver (1/3)

David Brownell: [patch 2.6.0-test6] usbfs updates (2/3)

David Brownell: [patch 2.6.0-test6] usbnet (3/3)

Greg KH: gadget serial driver -- patch for 2.6

H. Peter Anvin: Horribly overdue update to unicode.txt

Roman Zippel: new HFS(+) driver

Tom Zanussi: relayfs (1/4) (Documentation)

Tom Zanussi: relayfs (2/4) (include files)

Tom Zanussi: relayfs (3/4) (VFS and scheme-specific code)

Tom Zanussi: relayfs (4/4) (public API and common code)

Jim Keniston: Net device error logging

Netfilter Core Team: Release of iptables-1.2.9rc1

Christian Kujau: crypto benchmark results with 2.6.0-test6

Mike Benoit: File System shootout...
