
The 2.5 device model

A constant feature of development kernel summaries is "device model work." Perhaps it's time to take a look at what the device model actually is, and where it's going.

The device model effort has its roots in the 2001 Kernel Summit. It had become clear, at that point, that support of advanced power management would require a more structured approach to the management of devices in the Linux kernel. There has traditionally been no centralized registry of devices in the kernel - no way to just ask the system what devices were connected to it. Power management needs not only the answer to that question, but also some idea of how all the devices are plugged together. It doesn't do to shut down a SCSI controller before stopping all of the peripherals connected to that controller, for example.

So the device model work, done mainly by Patrick Mochel, started by adapting the existing PCI device scheme to represent a full system. At the center of the scheme is struct device, which, of course, represents a single device in the system. This structure contains quite a few fields, including no less than six different list heads; some of these fields will be examined shortly.

One type of device, of course, is a bus. There is a device structure for each bus, along with a bus_type structure for each type of bus. Almost every device on a system is reached via (at least) one bus, and the device model topology reflects that. Each bus device maintains, via the children list in its device structure, a list of all devices plugged into that bus. By looking at the bus_list field of any device in the system, the kernel can find all other devices attached to the same bus.
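The bus relationship can be made concrete with a minimal user-space sketch in C. The structures below are hypothetical simplifications, not the real 2.5 definitions. For brevity the per-bus list lives in a cut-down bus_type rather than in the bus device's children list, and a single bus_next pointer stands in for the bus_list field.

```c
#include <assert.h>
#include <stddef.h>

struct device;

/* Hypothetical, heavily simplified stand-ins; not the real 2.5 structures. */
struct bus_type {
    const char *name;           /* e.g. "pci", "usb" */
    struct device *devices;     /* head of the list of devices on this bus */
};

struct device {
    const char *name;
    struct bus_type *bus;
    struct device *bus_next;    /* plays the role of the bus_list field */
};

/* Record that dev sits on bus by pushing it onto the bus's device list. */
static void bus_add_device(struct bus_type *bus, struct device *dev)
{
    assert(bus != NULL && dev != NULL);
    dev->bus = bus;
    dev->bus_next = bus->devices;
    bus->devices = dev;
}

/* Walk the bus list to count the other devices sharing dev's bus. */
static int count_bus_siblings(const struct device *dev)
{
    int n = 0;
    for (const struct device *d = dev->bus->devices; d != NULL; d = d->bus_next)
        if (d != dev)
            n++;
    return n;
}
```

With an IDE interface, a USB controller and an Ethernet controller all added to the same "pci" bus, count_bus_siblings() on any one of them reports the other two.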

Each device structure also maintains a parent pointer (to another struct device, of course), and an entry in another list (called simply node) of all its siblings under the same parent. This hierarchy may look a lot like the bus lists already mentioned, but the two are not the same. A device may be on a USB bus, but its parent may be the USB hub to which it is connected. Similarly, a SCSI tape drive may be reached through a PCI bus, but its parent is the SCSI host adaptor.
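The parent/child relationship can be sketched the same way. In this hypothetical C model (not kernel code), each child is linked into its parent's children list through its node pointer, while the bus is recorded separately as a plain name. A USB card reader thus sits on the "usb" bus but has the hub as its parent, and the hub in turn hangs off a controller that sits on the "pci" bus.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified device: the bus is kept as a name to show that
 * it is independent of the parent/child hierarchy. */
struct device {
    const char *name;
    const char *bus;            /* which bus type the device sits on */
    struct device *parent;
    struct device *children;    /* first child */
    struct device *node;        /* next sibling under the same parent */
};

/* Attach child under parent, appending so siblings keep their order. */
static void device_add_child(struct device *parent, struct device *child)
{
    assert(parent != NULL && child != NULL);
    child->parent = parent;
    child->node = NULL;
    struct device **pp = &parent->children;
    while (*pp != NULL)
        pp = &(*pp)->node;
    *pp = child;
}

/* Number of ancestors between dev and the top of the hierarchy. */
static int device_depth(const struct device *dev)
{
    int depth = 0;
    for (const struct device *p = dev->parent; p != NULL; p = p->parent)
        depth++;
    return depth;
}
```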

Thus, it is the parent pointers and node lists that model the true hierarchy of the devices in the system. One could suspend a computer by starting at the top-level devices and doing a depth-first traversal of the device hierarchy via each device's children list. In fact, the device model makes this sort of traversal easy by maintaining a separate "global device list" which contains every device on the system, in depth-first order.
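A suspend pass over this hierarchy has to respect the ordering rule from the SCSI example: peripherals go down before the controller they hang off. A post-order depth-first walk gives exactly that, since every child is visited before its parent. The sketch below uses hypothetical simplified structures; the visit_log helper exists only to make the visiting order observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified device: first child plus next-sibling link. */
struct device {
    const char *name;
    struct device *children;   /* first child */
    struct device *node;       /* next sibling under the same parent */
};

/* Post-order depth-first walk: every child (and its whole subtree) is
 * visited before its parent, the order a suspend pass needs. */
static void walk_children_first(struct device *dev,
                                void (*fn)(struct device *, void *), void *arg)
{
    assert(dev != NULL && fn != NULL);
    for (struct device *c = dev->children; c != NULL; c = c->node)
        walk_children_first(c, fn, arg);
    fn(dev, arg);
}

/* Example callback: record each visited device's name, in visit order. */
struct visit_log {
    const char *names[16];
    int count;
};

static void log_visit(struct device *dev, void *arg)
{
    struct visit_log *vl = arg;
    if (vl->count < 16)
        vl->names[vl->count++] = dev->name;
}
```

For a root with one SCSI host holding a disk and a tape, the walk visits the disk, the tape, the SCSI host and finally root.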

As an example, your editor's system is represented in the driver model with a hierarchy like the following:

root
  pci0
    PCI host bridge
    ISA bridge
    IDE interface
    USB controller
      USB bus
        Lexar SmartMedia reader
    ACPI bridge
    SCSI adaptor
      SCSI bus 0
        Target 0 (disk drive)
          Partition 1
          Partition 2
        Target 1 (DAT tape)
          st0
          nst0
          ...
        Target 4 (CDRW)
    Audio controller
    MIDI port
    Ethernet controller
    Graphics card
  sys
    Interrupt controller
    8253 Interval timer
    floppy controller

Each entry in the hierarchy above is one device structure in the model; each device's children list holds the entries indented directly beneath it. The global device list, by contrast, contains the full hierarchy shown above, in order from top to bottom. ("sys" is a virtual bus for devices not otherwise connected to a system bus.)
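How the global list is kept in depth-first order as devices are registered is not spelled out here. One plausible approach, sketched below purely as an assumption, is to insert each new device immediately after the last existing descendant of its parent; because a depth-first listing keeps every subtree contiguous, a single forward scan finds that spot. The g_next field, global_head and global_list_add() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified device with a link into the global list. */
struct device {
    const char *name;
    struct device *parent;
    struct device *g_next;   /* next entry in the global list */
};

static struct device *global_head;   /* first device in the global list */

/* Is d somewhere below anc in the parent hierarchy? */
static int is_descendant(const struct device *d, const struct device *anc)
{
    for (const struct device *p = d->parent; p != NULL; p = p->parent)
        if (p == anc)
            return 1;
    return 0;
}

/* Insert dev right after its parent's last descendant (the parent must
 * already be listed); top-level devices are appended at the end. */
static void global_list_add(struct device *dev)
{
    assert(dev != NULL);
    struct device **pos = &global_head;
    if (dev->parent != NULL) {
        struct device *p = dev->parent;
        while (p->g_next != NULL && is_descendant(p->g_next, dev->parent))
            p = p->g_next;
        pos = &p->g_next;
    } else {
        while (*pos != NULL)
            pos = &(*pos)->g_next;
    }
    dev->g_next = *pos;
    *pos = dev;
}
```

Registering root, pci0 and sys, and only then an IDE device under pci0, yields the order root, pci0, ide, sys: the late arrival lands inside the pci0 subtree rather than at the end of the list.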

The model, as described so far, shows the hierarchy of the system, but does not allow the kernel to actually do much with those devices. The next step involves a new generic structure: struct device_driver, which is registered for each driver in the system. This structure tells the system what type of bus the driver expects to work with, and provides a set of useful functions. One of those functions is probe; when a new device is discovered on the system the base code calls the probe function of every likely-looking driver for the relevant bus until a driver agrees to manage the device. The system then sets the driver pointer in the device structure, and knows how to find the right driver for the device from then on.
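The probe loop can be modeled in a few lines. In this sketch a driver is considered "likely-looking" if its bus name matches the device's, and each probe callback returns 0 to claim a device. The field names, the numeric IDs and the two example probe routines are all hypothetical, and the real kernel's matching logic is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct device;

/* Hypothetical simplified driver: bus it handles plus a probe callback
 * that returns 0 if the driver agrees to manage the device. */
struct device_driver {
    const char *name;
    const char *bus;
    int (*probe)(struct device *dev);
};

struct device {
    const char *name;
    const char *bus;
    unsigned id;                       /* hypothetical hardware ID */
    struct device_driver *driver;      /* set once a driver binds */
};

/* Offer dev to each driver for its bus until one probe succeeds. */
static int bind_device(struct device *dev, struct device_driver **drivers, size_t n)
{
    assert(dev != NULL && drivers != NULL);
    for (size_t i = 0; i < n; i++) {
        struct device_driver *drv = drivers[i];
        if (strcmp(drv->bus, dev->bus) != 0)
            continue;                  /* wrong bus: not a candidate */
        if (drv->probe(dev) == 0) {
            dev->driver = drv;         /* remembered from now on */
            return 0;
        }
    }
    return -1;                         /* no driver claimed the device */
}

/* Two hypothetical probe routines keyed on the device ID. */
static int net_probe(struct device *dev) { return dev->id == 1 ? 0 : -1; }
static int snd_probe(struct device *dev) { return dev->id == 2 ? 0 : -1; }
```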

This driver pointer is not used for normal, user-space accesses to the device - that is still handled through the device arrays (indexed by the device's major number). What that pointer can be used for, however, is power management and hotplug events. If the kernel has been told to suspend the system, for example, it now need only pass through the global device list, calling the suspend function found in the device driver structure for each device. Similarly, if the user unplugs a device, the kernel can call that device's remove function to let the driver know.
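Both uses of the driver pointer are sketched below. The sketch assumes the global list is available as an array in depth-first order and walks it from the end, so each device is suspended before the device it is attached to. The text above only says the kernel passes through the list; the reverse walk is this sketch's own way of honoring the controller-last rule. The callback structure, state flags and logging helpers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct device;

/* Hypothetical simplified driver callbacks. */
struct device_driver {
    int (*suspend)(struct device *dev);
    void (*remove)(struct device *dev);
};

struct device {
    const char *name;
    struct device_driver *driver;   /* NULL if no driver is bound */
    int suspended;
    int present;
};

/* Walk the depth-first global list backwards so children are suspended
 * before their parents. Stops at, and returns, the first error. */
static int suspend_all(struct device **global, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        struct device *dev = global[i];
        if (dev->driver != NULL && dev->driver->suspend != NULL) {
            int err = dev->driver->suspend(dev);
            if (err)
                return err;
        }
    }
    return 0;
}

/* Unplug event: tell the bound driver, then drop the binding. */
static void device_unplug(struct device *dev)
{
    assert(dev != NULL);
    if (dev->driver != NULL && dev->driver->remove != NULL)
        dev->driver->remove(dev);
    dev->driver = NULL;
}

/* Example callbacks that only record state and suspend order. */
static const char *suspend_order[8];
static int suspend_count;

static int demo_suspend(struct device *dev)
{
    dev->suspended = 1;
    if (suspend_count < 8)
        suspend_order[suspend_count++] = dev->name;
    return 0;
}

static void demo_remove(struct device *dev) { dev->present = 0; }
```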

The above is sufficient to handle the basic functions needed by power management and to support hotpluggable devices. It also unifies much of the device probing and accounting logic in the kernel, allowing the removal of a great deal of duplicated code. The device model work has not stopped there, however. One recent (2.5.32) addition is the notion of device classes and interfaces. The "class" of a device is the basic function that it performs - it could be an "input" or "storage" device, for example. Not much is done with the class information currently, but the structure is there for class-level drivers to affect how the device is managed.
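As an illustration only, the following C sketch shows how a class-level hook could veto or adjust a device as it joins a class. Nothing in it reflects the actual 2.5.32 class API; the structure layout, the add_device hook and the readahead_kb tunable are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct device;

/* Hypothetical device class: a name, a member list, and an optional
 * class-level hook run when a device joins the class. */
struct device_class {
    const char *name;                       /* e.g. "input", "storage" */
    struct device *members;
    int (*add_device)(struct device *dev);  /* may adjust or refuse dev */
};

struct device {
    const char *name;
    struct device_class *devclass;
    struct device *class_next;
    int readahead_kb;                       /* hypothetical tunable */
};

/* Run the class hook, then link dev into the class's member list. */
static int class_add_device(struct device_class *cls, struct device *dev)
{
    assert(cls != NULL && dev != NULL);
    if (cls->add_device != NULL) {
        int err = cls->add_device(dev);
        if (err)
            return err;                     /* class refused the device */
    }
    dev->devclass = cls;
    dev->class_next = cls->members;
    cls->members = dev;
    return 0;
}

/* Example hook: storage devices get a default read-ahead setting. */
static int storage_add(struct device *dev)
{
    dev->readahead_kb = 128;
    return 0;
}
```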

"Interfaces" are paths to the device from user space - normally entries in /dev. Devices which implement a given interface can be expected to respond in certain, well-defined ways. As with classes, about all that is done with interfaces, for now, is to remember them. But that could change.
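Remembering interfaces amounts to little more than bookkeeping. This hypothetical sketch keeps a small fixed table of interface records per device and answers whether a device implements a named interface; the structure layout, the interface names and the MAX_INTF limit are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INTF 4

/* Hypothetical interface record: a named user-space access path
 * registered with a device class. */
struct device_interface {
    const char *name;
    const char *class_name;
};

struct device {
    const char *name;
    const struct device_interface *intf[MAX_INTF];
    int n_intf;
};

/* Remember that dev implements intf; returns -1 if the table is full. */
static int device_add_interface(struct device *dev,
                                const struct device_interface *intf)
{
    assert(dev != NULL && intf != NULL);
    if (dev->n_intf >= MAX_INTF)
        return -1;
    dev->intf[dev->n_intf++] = intf;
    return 0;
}

/* Does dev implement an interface with this name? */
static int device_has_interface(const struct device *dev, const char *name)
{
    for (int i = 0; i < dev->n_intf; i++)
        if (strcmp(dev->intf[i]->name, name) == 0)
            return 1;
    return 0;
}
```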

This discussion, so far, has left out an important subsystem which, while technically not part of the device model, is intimately tied in with it. "driverfs" is a virtual filesystem which provides a userspace representation of the driver model's data structures. This filesystem, normally mounted at /devices, currently contains three top-level directories:

  • root contains the entire device tree in the usual hierarchical form. By digging around in /devices/root, users (or code) can get a handle on how the system is put together. Driverfs also makes it easy for devices to export tunable parameters (much like those found in /proc/sys) which can be found - and tweaked - in the device tree.

  • class contains an entry for each device class registered in the system. Further down, an entry for every device which implements that class can be found (it's a symbolic link to the entry in the /devices/root tree). There are also entries for each interface registered with a class, and, again, a symbolic link for every device implementing the interface.

  • bus lists each bus type (not each physical bus) on the system and the devices managed by each.
(See this example /devices listing, which corresponds to the system hierarchy shown above, to see how it all goes together).
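Since the root directory mirrors the parent hierarchy, a device's path under /devices/root can be rebuilt by walking its parent pointers up to the top and printing the names on the way back down. In this sketch the bus_id field (a hypothetical name here) holds each directory name, and the example reproduces the leading part of the SCSI path used in the devfs comparison that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical simplified device: directory name plus parent pointer. */
struct device {
    const char *bus_id;        /* directory name, e.g. "00:0e.0" */
    struct device *parent;
};

/* Build the /devices path for dev by collecting its ancestors, then
 * emitting their names from the top down. Returns 0, or -1 on overflow. */
static int device_path(const struct device *dev, char *buf, size_t len)
{
    const struct device *chain[32];
    int depth = 0;
    for (const struct device *d = dev; d != NULL; d = d->parent) {
        if (depth == 32)
            return -1;
        chain[depth++] = d;
    }
    int used = snprintf(buf, len, "/devices");
    if (used < 0 || (size_t)used >= len)
        return -1;
    while (depth-- > 0) {
        int n = snprintf(buf + used, len - used, "/%s", chain[depth]->bus_id);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += n;
    }
    return 0;
}
```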

Some readers may be noting a certain similarity between driverfs and devfs. They do resemble each other in that they are both kernel-generated virtual filesystems which contain entries for the devices in the system. They differ, however, in that driverfs is intended to be a physical representation of the system, while devfs is intended to provide user-space access to the devices themselves. A devfs user can mount /dev/discs/disc0; somebody perusing driverfs can, with sufficient typing pain, find the directory /devices/root/pci0/00:0e.0/scsi0/0:0:0:0/0:0:0:0:p1, but there's nothing there to mount. Instead, a bunch of information - including the device's major and minor numbers - is available.

So devfs and driverfs serve different purposes, but driverfs (with /sbin/hotplug) could conceivably supplant devfs in future kernels. While driverfs is not intended to be the way users access devices, all the information needed to create /dev nodes is (or can be) there. In the future, the /sbin/hotplug script may be used to configure all devices as they are discovered in the system; there is no reason why that script cannot use the driverfs information (including class and interface information) to create /dev nodes implementing whatever policy the system administrator likes. The result would be a flexible device naming and administration scheme which removes policy from the kernel code.
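The user-space policy described here might start with two steps: read the device numbers that driverfs exposes, then choose a /dev name. The sketch assumes a "MAJOR:MINOR" text format for the attribute and an arbitrary naming rule; both are illustrative rather than what any hotplug script actually does. A real script would then create the node, for example with mknod(2), which needs root privileges and is left out here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse a hypothetical "MAJOR:MINOR" attribute string, allowing a
 * trailing newline (anything after it is ignored). Returns 0 on
 * success, -1 if malformed. */
static int parse_dev_numbers(const char *attr, unsigned *major, unsigned *minor)
{
    char extra;
    assert(attr != NULL && major != NULL && minor != NULL);
    int n = sscanf(attr, "%u:%u%c", major, minor, &extra);
    if (n == 2 || (n == 3 && extra == '\n'))
        return 0;
    return -1;
}

/* Purely illustrative naming policy: storage devices become numbered
 * entries under /dev/discs, anything else is named after the device. */
static int policy_node_name(const char *class_name, const char *dev_name,
                            unsigned index, char *buf, size_t len)
{
    int n;
    if (strcmp(class_name, "storage") == 0)
        n = snprintf(buf, len, "/dev/discs/disc%u", index);
    else
        n = snprintf(buf, len, "/dev/%s", dev_name);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Keeping the naming rule in a small user-space function like this is the point of the design: changing the policy means editing a script, not the kernel.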

That all remains in the future, however; the device model and driverfs are still works in progress. Most driver code does not yet interface with the device model; thus far, there has been little need to change the drivers themselves, since the PCI code has done the necessary device registration. Full implementation of classes and interfaces, however, is likely to require digging into the driver code, and that could take a little while. It could yet happen for 2.6, however.



Grammar

Posted Aug 29, 2002 8:33 UTC (Thu) by pointwood (guest, #2814)

Quote from the first part: "It doesn't to to shut down a SCSI controller before stopping all of the peripherals connected to that controller, for example."

Grammar

Posted Aug 29, 2002 16:40 UTC (Thu) by jwharmanny (guest, #971)

Please stop complaining about small mistakes like this; it's annoying. Just say "thank you" for yet another great article!

Grammar

Posted Aug 29, 2002 22:03 UTC (Thu) by giraffedata (subscriber, #1954)

I didn't see any complaint in the comment.

Grammar

Posted Aug 30, 2002 7:44 UTC (Fri) by pointwood (guest, #2814)

Complaint? Please!

It was never intended to be a complaint, just a note to the editor.

And yes, LWN rocks!

initramfs status?

Posted Aug 29, 2002 16:44 UTC (Thu) by cpeterso (subscriber, #305)

I think it's cool how the kernel is taking a more structured approach to devices (and their drivers). But does anyone know the status of the initramfs work? In the Linux 2.5 status updates, Alan Cox's work to replace ALL statically linked device drivers with modules stored in the kernel file's initramfs is still listed as being in the planning stage. This sounds like another way to unify a lot of hairy, duplicated code in the kernel, but it doesn't look like it will see the light of day in time for the Linux 2.6 feature freeze on October 31. :-(

initramfs status?

Posted Aug 29, 2002 17:48 UTC (Thu) by iabervon (subscriber, #722)

I believe that nobody's managed to get all of the efficiencies of having things statically linked into the kernel with modules. Additionally, people keep reducing the duplication; there's no inherent reason a device driver can't act like a module while being linked into the kernel image (getting the code into memory and linked is the same for all modules; once it's linked, it can run the same whether it's always been there or was loaded dynamically). So it may turn out that the main useful idea of Alan's work (code initializes the same way regardless of whether it's in the kernel file or not) will be implemented without some of the other properties of the idea which people don't like.

initramfs status?

Posted Aug 29, 2002 22:21 UTC (Thu) by giraffedata (subscriber, #1954)

>I believe that nobody's managed to get all of the efficiencies of having
>things statically linked into the kernel with modules

I believe there are no such efficiencies. I maintain the loadable kernel module ("module") HOWTO and have been collecting information on that question, and all I've come up with is one possible efficiency that statically linked modules have over dynamically loaded ones in that the base kernel lives in a block of virtual=real memory, whereas a dynamically loaded one may get loaded into a region where the virtual address is not the same as the real address. And that _may_ have some measurable effect on speed in _some_ architectures. But probably nothing significant.

And even that is pretty easy to fix in the initramfs concept, since the loading of the base kernel and of all the other modules is happening at about the same time.

initramfs status?

Posted Sep 1, 2002 7:42 UTC (Sun) by qubes (subscriber, #2562)

A statically linked kernel lives in a memory space mapped with 4MB TLB entries. Loaded modules don't get loaded into the same memory space, and jumps to "far" addresses take longer than jumps within the same TLB entry. That said, the difference is usually in the noise of any kernel profile.

...

initramfs really needs a developer to fall in love with the userland side of the problem... and Alan is currently hacking on the IDE bits, which will hopefully become stable in 2.5.

Reduced Development Cycle

Posted Aug 30, 2002 1:54 UTC (Fri) by mmarq (guest, #2332)

Maybe I'm wrong, but I believe the main reason, besides cleaning up a lot of hairy and/or duplicated code, is really to get all code related to device drivers out of the kernel "core". In the end I believe we wouldn't have a HAL (hardware abstraction layer), but something more flexible and slightly resembling the I2O model. Of course this wouldn't be in 2.6, but there's an obvious advantage to having completely separate kernel development and device driver development, which could translate into a global development cycle of about one year, half of what it is today.

It would be even more wonderful if it could somehow reach compatibility with the I2O model at the device driver level, because that would mean much faster asynchronous (and synchronous) I/O, solve the proprietary device driver problem for good, and set the standard for a Unix-like device driver model that definitely breaks the MS grasp on the x86 platform (the same hardware driver could run everywhere: Linux, Solaris, Mac OS X... x86, SPARC, PPC... it only needs a recompile). Maybe this is not as stupid and up in the clouds as it sounds; a lot in Patrick Mochel's device model could be tweaked that way, with just a little more will and imagination!!

Reduced Development Cycle

Posted Aug 30, 2002 8:50 UTC (Fri) by mmarq (guest, #2332)

A crazy idea!!!

Following the description of the driver model in this article, why not, instead of a "struct device_driver, which is registered for each driver in the system", have a "pool of stackable struct device_drivers, one for each class of drivers in the system, which is registered for each driver in the system"? This pool, individually and/or as a whole, could provide not only the "probe" function and the others of this driver model but, above all, every other function needed for it to act as the middle communication layer of the I2O split driver model (http://www.intelligent-io.com/specs_resources/V2.0_spec.pdf).

It doesn't have to follow the entire V2.0 spec, and could even be more flexible and simpler in many respects. The main point here is to provide a way, for example, for an ethernet_board_driver to talk directly to a sound_board_driver or a graphics_board_driver, and to provide the interfaces for those hardware drivers to be completely independent of the rest of the OS, in such a way that the same driver on an x86 platform (for example) could be used without alteration by all the OSes that implement the driver model. The same would go for SPARC or PPC.

In my opinion, this could be the deal that the hardware industry could not refuse, better than a gun pointed at the head in one hand and a pen to sign the contract in the other!!!... it would definitely mean No. 1 hardware support for Linux.

Reduced Development Cycle

Posted Sep 4, 2002 1:59 UTC (Wed) by mmarq (guest, #2332)

More on crazy idea!!!

For those who thought I must have been on drugs, here is another dose!!!

Since the proposed transfers between an Ethernet board and a graphics or sound board (in the example) would essentially be streaming data transfers by nature, I would depart even further from the I2O V2.0 spec and use the "facilities" of SCTP (Stream Control Transmission Protocol) [an alternative could be ST-RTP, another open protocol], now introduced in the 2.5 kernel. Since the really interesting thing about a middle communication layer, apart from isolating device drivers from the rest of the kernel, is direct communication between device drivers without (or with minimal) interruption of the central CPU, streaming every possible data transfer could be a very effective way of achieving this. The use of SCTP or an alternative could be the key to using I2O-defined "hardware drivers" without special motherboards or expansion boards with IOPs (I/O processors), and with very little overhead: I2O "device drivers" on plain commodity hardware!!!

And since there was the famous kernel in an IBM wristwatch, and since SCTP can run even over the Internet (some developers think it could replace TCP), this could let a Linux box use and control not only its own devices and peripherals, but also remote devices and peripherals, whether or not they are connected to other computers, including devices like an HDTV or hi-fi set!!!... remote now has no limit... Imagine watching and programming your home cable HDTV set's recording facilities from a laptop in a hotel on another continent, connected through the Internet???!!!!

The 2.5 device model

Posted Aug 30, 2002 3:20 UTC (Fri) by otterley (guest, #3500)

If driverfs is actually implemented, devfs should simply map /dev shortcuts to driverfs leaf nodes, which in turn should be the actual device files.

Whatever driverfs does, the tree model should be bound as tightly as possible to the devices' physical layout. One example I'd like us to avoid repeating is having Fibre Channel devices named like SCSI devices, because Fibre Channel devices have unique World Wide Names (WWNs) for addressing purposes. I want to be capable of assuring myself that the disk I'm trying to access is the correct disk; I'm tired of having my disks haphazardly assigned to device nodes and ending up accessing the wrong disk because the kernel decided that disk 0x210100e08b245b51 that was exported as /dev/sda before I rebooted is now at /dev/sdb.

devfs can still serve a purpose here, to help map user-friendly names (/dev/sdX) to physical pathnames -- similar to Solaris' path_to_inst system. But it should be entirely optional to use.

Solaris has had this right since the early '90s; it's about time Linux got with the program.

The 2.5 device model

Posted Aug 30, 2002 8:08 UTC (Fri) by garloff (subscriber, #319)

Of course, you know about scsidev (http://www.garloff.de/kurt/linux/scsidev/), which allows you to create device nodes based on the WWN (and other criteria of your choice).

Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds