Kernel development

Brief items

Kernel release status

The current development kernel is 2.6.0-test5; Linus has released no kernels since September 8. It has been a relatively slow period for kernel development in general.

Patches in Linus's BitKeeper repository include a Coda filesystem update, some initramfs tweaks, improvements in random driver locking, the removal of some ext3 debugging hooks, direct I/O support for reiserfs, some CPU frequency work, an Intel SpeedStep-SMI driver, a substantial amount of janitorial work, and various fixes.

The current stable kernel is 2.4.22. Marcelo released 2.4.23-pre4 on September 12; it includes some VM improvements (including the removal of the much-maligned out-of-memory killer), an ia-64 update, some NFS work, a wireless update, and various other fixes.

Kernel development news

Solving out-of-memory situations the Linux way

The out-of-memory (OOM) killer is a longstanding source of controversy in Linux development circles. The killer comes into play if the kernel encounters a memory shortage so severe that the ongoing functioning of the system is endangered. Rather than panic or lock up, the kernel brings in the OOM killer, which goes looking for processes to kill. The killer has a complicated set of heuristics built into it in an attempt to have it target the processes that are least likely to be missed. Anybody who has seen the OOM killer in action, however, knows that it can still make unfortunate choices. Choosing the process which (1) is among the least valuable on the system, and (2) is a significant part of the memory problem is a difficult task.

As a result of discomfort with this grim reaper lurking within the kernel, and of recently merged VM improvements, the OOM killer has been removed from the 2.4.23 prepatch series.

For 2.6, Rusty Lynch has just posted a different answer that should, perhaps, have been obvious from the beginning. Rather than trying to come up with a set of OOM killer heuristics that works for everybody, Rusty's patch sets up a notifier-based mechanism that allows for pluggable OOM killer modules. With this patch, anybody who wants to set up a different response to memory shortages need only write a module implementing that technique.

The patch includes the standard OOM killer, along with an example alternative which simply panics the system. But there is already talk of creating OOM killer modules implementing different policies. One, which has been posted already, targets processes if they are seen to be forking children which fall victim to the OOM killer; it works on the assumption that the parent is the real source of the problem. A "blame Mozilla" module has been suggested. And Alan Cox has suggested involving the security module code so that a site's security policies can be part of the OOM reaction process.

It's unclear how far this process will go. But pluggable OOM killers are a clear way of ending the long discussion over what the right policy should be. Linux is, after all, about choice, even when the choices are unpleasant.

OpenBIOS releases a Forth kernel

The OpenBIOS project has announced the release of a Forth kernel, known as "BeginAgain." Most users, who are strangely uninterested in typing Forth code at something close to bare hardware, will probably not rush out to install this release. But it is a step forward for the OpenBIOS project and for everybody wanting to run their systems with free software all the way down to the bare metal. The BeginAgain platform is mostly useful for testing at this point, but when a few more pieces are added (a device interface and the client layer which will allow the system to boot operating systems) OpenBIOS should start to get interesting for a wider group of users.

ACPI gets a new maintainer

Andrew Grover has announced that he is no longer the ACPI maintainer; his duties have been passed on to Len Brown. ACPI is still not popular among all developers and users, but the simple fact is that good ACPI support is now required to get many systems to function properly. Andrew and his team have put massive amounts of work into the Linux ACPI implementation over the last few years, with the result that Linux does, indeed, have good ACPI support. Thanks, Andrew; we're looking forward to your next project, whatever it may be.

Driver porting

Driver porting: Char devices and large dev_t

This article is part of the LWN Porting Drivers to 2.6 series.

Much 2.5 kernel development work went toward increasing the size of the dev_t device number type. That work has necessarily forced some changes in how device drivers work with the rest of the kernel. This article describes the changes as seen from the point of view of char drivers. It is current as of the 2.6.0-test9 kernel. Note that the interfaces described here are still volatile and could change significantly before 2.6.0-final is released.

Major and minor numbers

With the expanded dev_t, it is no longer possible to assume that major and minor numbers fit within eight bits. To the greatest extent possible, the relevant interfaces have been changed in ways that will not break existing drivers. In particular, a driver which uses the longstanding register_chrdev() function to register a char device will never see minor device numbers greater than 255. Attempts to open a device node with a larger minor number will simply fail with a "no such device" error.

One change that is visible to all drivers, however, is the elimination of the kdev_t type. Device numbers are now a simple dev_t throughout the kernel. The place where this change is most apparent for most will be the change in the type of the inode i_rdev field. Drivers which need to get major or minor numbers from inodes should use the two new helper functions:

    unsigned iminor(struct inode *inode);
    unsigned imajor(struct inode *inode);

Use of these functions will help keep a driver working in the future, even if the representation within inodes changes again.

The new way

register_chrdev() continues to work as it always did, and drivers which use that function need not be changed. Unchanged drivers, however, will not be able to use the expanded device number range, or take advantage of the other features provided by the new code. Sooner or later, it is worthwhile to get to know the new interface.

The new way to register a char device range is with:

    int register_chrdev_region(dev_t from, unsigned count, char *name);

Here, from is the device number of the first device in the range, count is the number of device numbers to register, and name is the base name of the device (it appears in /proc/devices). The return value is zero if all goes well, and a negative error number otherwise.

Note that from is a device number, not a major number. This interface allows the registration of an arbitrary range of device numbers, starting from anywhere. So the from argument specifies both the beginning major and minor number. If the count argument exceeds the number of minor numbers available, the allocation will continue on into the next major number; this is a design feature.
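The arithmetic behind a range that spills into the next major number can be sketched in user space. This is a minimal model, not kernel code: it assumes a 12-bit major / 20-bit minor split, and every demo_ name is a local stand-in for the kernel's MKDEV(), MAJOR(), and MINOR() macros. The layout in any particular kernel may differ.

```c
#include <assert.h>

/* Assumption: 20 bits of minor number; the real layout may differ. */
#define DEMO_MINORBITS 20
#define DEMO_MINORMASK ((1U << DEMO_MINORBITS) - 1)

typedef unsigned int demo_dev_t;   /* stand-in for dev_t */

/* Pack a major/minor pair into a single device number. */
static demo_dev_t demo_mkdev(unsigned major, unsigned minor)
{
    return (major << DEMO_MINORBITS) | (minor & DEMO_MINORMASK);
}

static unsigned demo_major(demo_dev_t dev) { return dev >> DEMO_MINORBITS; }
static unsigned demo_minor(demo_dev_t dev) { return dev & DEMO_MINORMASK; }

/* Last device number in a range of count numbers starting at from. */
static demo_dev_t demo_range_last(demo_dev_t from, unsigned count)
{
    return from + count - 1;
}

/* A range of four numbers starting two below the top of major 42
 * ends at minor 1 of major 43. */
void demo_self_check(void)
{
    demo_dev_t from = demo_mkdev(42, DEMO_MINORMASK - 1);
    demo_dev_t last = demo_range_last(from, 4);

    assert(demo_major(from) == 42);
    assert(demo_major(last) == 43);
    assert(demo_minor(last) == 1);
}
```

A driver that does not want its range to cross a major boundary would need to keep the starting minor plus count within the minor space itself.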

register_chrdev_region() works if you know which major device number you wish to use. If, instead, your driver expects to work with dynamic major number allocation, it should use:

    int alloc_chrdev_region(dev_t *dev, unsigned baseminor, 
                            unsigned count, char *name);

In this case, dev is an output-only parameter which will be set to the first device number of the allocated range. The input parameters are baseminor, the first minor number to use (usually zero); count, the number of device numbers to allocate; and name, the base name of the device. Once again, the return value is zero or a negative error code.

Connecting up devices

Some readers may have noticed that the above functions, unlike register_chrdev(), do not have a file_operations argument. Registering a device number range sets those numbers aside for your use, but it does not actually make any device operations available to user space. There is now a separate object (struct cdev) which represents char devices, and which must be set up by your driver to actually make a device available.

To work with struct cdev, your code should include <linux/cdev.h>. Then, the usual way of getting one of these structures is with:

    struct cdev *cdev_alloc(void);

If all goes well, the return value will be a pointer to a newly allocated, initialized cdev structure. Check that value, though; there is a memory allocation involved, and things can always fail.

It is also possible to declare a static cdev structure, or to embed one within another structure. In this case, you should pass it to:

    void cdev_init(struct cdev *cdev, struct file_operations *fops);

before doing anything else with it.

Your driver will need to set a couple of fields in the cdev structure before adding it to the system. The owner field should be set to the owning module, usually THIS_MODULE. The device's file_operations structure should be pointed to by the ops field. And, to get a directory in sysfs, you should also set the name field in the embedded kobject, with something like:

    struct cdev *my_cdev = cdev_alloc();
    kobject_set_name(&my_cdev->kobj, "my_cdev%d", devnum);

Note that kobject_set_name() takes a printf()-like format string and associated arguments.

Once you have the structure set up, it's time to add it to the system:

    int cdev_add(struct cdev *cdev, dev_t dev, unsigned count);

cdev is, of course, a pointer to the cdev structure; dev is the first device number handled by this structure, and count is the number of devices it implements. Thus, one cdev structure can stand in for several physical devices, though you will usually not want to do things that way.

There are two important things to bear in mind when calling cdev_add(). The first is that this call can fail. If the return value is nonzero, the device has not been added and is not visible to user space. The second is that, if the call succeeds, the device becomes live immediately. You should not call cdev_add() until your driver is completely ready to handle calls to the device's methods.

Adding a device also creates a directory entry under /sys/cdev, using the name stored in the kobj.name field. As of this writing, that directory is empty, but one assumes that all sorts of good things (the associated device numbers, if nothing else) will eventually show up there.

Deleting devices

If you need to get rid of a cdev structure, the usual way of doing things is to call:

    void cdev_del(struct cdev *cdev);

This function should only be called, however, on a cdev structure which has been successfully added to the system with cdev_add(). If you need to destroy a structure which has not been added in this way (perhaps cdev_add() failed), you must, instead, manually decrement the reference count in the structure's kobject with a call like:

    kobject_put(&cdev->kobj);

Calling cdev_del() on a device which is still active (if, say, a user-space process still has an open file reference to it) will cause the device to become inaccessible, but it will not actually delete the structure at that time. The reference count in the structure will keep it around until all the references have gone away. That means that your driver's methods could be called after you have deleted your cdev object - a possibility you should be aware of.
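The deferred-release behavior can be illustrated with a toy user-space model. It is only a sketch of the general reference-counting pattern; the toy_ names are invented for illustration and do not correspond to the kernel's kobject or cdev implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: not the kernel's cdev or kobject. */
struct toy_cdev {
    int refcount;    /* outstanding references, including the owner's */
    bool visible;    /* can new opens still find the device? */
    bool released;   /* set once the last reference is dropped */
};

static void toy_init(struct toy_cdev *c)
{
    c->refcount = 1;         /* the reference held by the driver */
    c->visible = true;
    c->released = false;
}

/* Drop one reference; the object is released only when none remain. */
static void toy_put(struct toy_cdev *c)
{
    if (--c->refcount == 0)
        c->released = true;
}

/* A process opens the device: fails if it has already been deleted. */
static int toy_open(struct toy_cdev *c)
{
    if (!c->visible)
        return -1;
    c->refcount++;
    return 0;
}

/* Models deletion: hide the device and drop the driver's reference. */
static void toy_del(struct toy_cdev *c)
{
    c->visible = false;
    toy_put(c);
}

void toy_self_check(void)
{
    struct toy_cdev c;

    toy_init(&c);
    assert(toy_open(&c) == 0);   /* a user holds the device open */
    toy_del(&c);                 /* driver deletes it */
    assert(!c.released);         /* still alive: the open file pins it */
    assert(toy_open(&c) == -1);  /* but new opens fail */
    toy_put(&c);                 /* the last user closes the file */
    assert(c.released);
}
```

Between toy_del() and the final toy_put(), the structure still exists and could still be used by the holder of the open reference, which is the window in which a real driver's methods might be called.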

The reference count of a cdev structure can be manipulated with:

    struct kobject *cdev_get(struct cdev *cdev);
    void cdev_put(struct cdev *cdev);

Note that these functions change two reference counts: that of the cdev structure, and that of the module which owns it. It will be rare for drivers to call these functions, however.

Finding your device in file operations

Most of the methods provided by the driver in the file_operations structure take a struct inode (or a struct file which can be used to find the associated inode) as an argument. Traditionally, Linux drivers have looked at the device number stored in the inode's i_rdev field to determine which device is being operated upon. That technique still works, but, in many cases, there is a better way. In 2.6, struct inode contains a field called i_cdev, which contains a pointer to the associated cdev structure. If you have embedded one of those structures within your own, device-specific structure, you can use the container_of() macro (described in the kobject article) to obtain a pointer to that structure.
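As a rough illustration of the i_cdev technique, the following user-space sketch recovers a device-specific structure from an inode. The struct cdev and struct inode definitions here are cut-down stand-ins modelling only the fields needed, container_of() is redefined locally with offsetof(), and struct my_device and my_device_from_inode() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types; only what this sketch needs. */
struct cdev { int placeholder; };
struct inode { struct cdev *i_cdev; };

/* User-space equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device-specific structure with an embedded cdev. */
struct my_device {
    int unit;
    struct cdev cdev;
};

/* What an open() method might do to find its device from the inode. */
static struct my_device *my_device_from_inode(struct inode *inode)
{
    return container_of(inode->i_cdev, struct my_device, cdev);
}

void my_device_self_check(void)
{
    struct my_device dev = { .unit = 3 };
    struct inode ino = { .i_cdev = &dev.cdev };

    assert(my_device_from_inode(&ino) == &dev);
    assert(my_device_from_inode(&ino)->unit == 3);
}
```

This only works when the cdev is embedded in the driver's structure; a cdev obtained from cdev_alloc() stands alone and has no containing structure to recover.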

Why things were done this way

The new interface may seem rather more complex to many. Before, a single call to register_chrdev() was all that was necessary; now a driver has to deal with the additional hassle of managing cdev structures. This approach provides a great deal of flexibility, however, in how the device number space can be managed. Each device gets exactly the number range it needs, and its operations will never be invoked for device numbers outside that range. In the past, it has been noted that many drivers had incorrect range checks on minor numbers; with the new scheme, all those range checks can go away altogether.

The new method also makes it easy for each device to have its own file_operations structure without the need for big switch statements in the open() method. Separate cdev structures can also have separate entries in /sys/cdev. In general, char devices have become proper objects within the kernel, with all the advantages that come with that status. A little bit of extra object management is a small price to pay.

Patches and updates

Kernel trees

Andrew Morton 2.6.0-test5-mm2
Randy.Dunlap 2.6.0-test5-kj1 patchset
Randy.Dunlap 2.6.0-test5-kj2 patchset
Marcelo Tosatti Linux 2.4.23-pre4
Bernhard Rosenkraenzer 2.4.23-pre4-pac1
Bernhard Rosenkraenzer 2.4.23-pre4-pac2
Bernhard Rosenkraenzer 2.4.23-pre3-pac2
Alan Cox Linux 2.4.22-ac3
Con Kolivas 2.4.22-ck2

Core kernel code

Con Kolivas O20.2int
Con Kolivas O20.3int

Documentation

Suparna Bhattacharya Kernel AIO Web Page

Page editor: Jonathan Corbet


Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds