Toward a safer sysfs
Internally, the core building block for the device model is the kobject. Objects represented in sysfs - devices, for example - each contain a kobject which, among other things, is the focal point for sysfs access. The kobject also contains a reference count for the containing object which is used to manage its lifecycle. A given kobject and its containing data structure can be deleted when the reference count goes to zero - and not before. Reference counting works, but it can lead to surprises.
As an example, consider a USB device - a mouse, say. When this device is plugged into the system, a suitable device structure (containing a kobject) is created and registered with the kernel. When the mouse is unplugged, that structure is released. But imagine what happens if a user-space process opens a sysfs file associated with the mouse device while it is present, and keeps that file open long after the physical device goes away. The kernel must be able to handle operations on that open sysfs file, even though the driver thinks that the device it represents is long gone. The reference counting in the kobject makes this work - most of the time. The potential for confusion is high, though, especially with drivers which have not been written with this sort of lifecycle management in mind.
Back at the end of March, Tejun Heo posted a discussion of device model lifecycle issues which points out this problem and a few others. His argument is that the need to manage objects with different lifecycles makes programming with the device model hard - something developers have known for some time. Even the core device model maintainers will admit that it's easy to get things wrong.
More recently, Tejun has followed up with a patch set which attempts to simplify the situation. There is a great deal of cleanup work in these patches, and one small API change, but the core change is this: it enables a clean separation of the lifecycles of sysfs objects and the underlying data structures they represent. As a result, it is no longer necessary for code outside of sysfs to be concerned about the fact its data structures may have a shorter life than the sysfs objects representing those structures.
A sysfs directory (which represents a kobject) is represented within the kernel by struct sysfs_dirent. In current kernels, if the sysfs_dirent structure exists, its underlying kobject is expected to exist as well. It is not possible for the kobject to go away as long as the sysfs_dirent structure exists; that means that the structure containing the kobject must continue to exist as long as any references to the sysfs files exist. Tejun's patch works by eliminating that requirement.
In the modified sysfs, each sysfs_dirent contains a new counter called s_active. This counter tracks the number of active references to the object; these references are the ones which involve the associated kobject at the current moment. A user-space process which is holding a sysfs file open will not increase the s_active count until it performs an actual operation on that file, and the reference remains only for as long as it takes to complete the operation. Since most sysfs operations are quite fast, active references will not normally be held for long.
The active count, as it happens, is maintained with an rwsem - a reader/writer semaphore. Active references are tracked as readers, so there can be any number of them outstanding at a given time. The code to obtain an active reference works with a call to down_read_trylock(), meaning that it will take a "lock" (a reference) if one is available, but it will not block if the operation fails. All of the relevant sysfs operations have been changed to obtain active references before referencing the kobject - and they make sure that the reference was granted. If an attempt to obtain an active reference fails, sysfs fails the higher-level operation with -ENODEV.
The only way down_read_trylock() will fail is if another thread holds a writer lock on the semaphore - or is in the process of waiting for the readers to get out of the way so it can get that lock. Should something happen which causes the underlying kobject to go away, the cleanup code will call down_write() on the s_active rwsem in the sysfs_dirent entry, thus taking a writer lock. This call will cause any future attempts to obtain an active reference to fail; it will also block until all currently-existing active references are released.
The end result of all this is that, once the final kobject_put() call has completed for a given kobject, there will be no further attempts to access that kobject from sysfs. The kobject (and its containing data structure) can be safely deleted, and the driver need worry no more about it.
As an added bonus, there is no longer any need to increase module reference counts when sysfs attributes are being accessed. A driver which is being unloaded will release all of its devices, meaning that sysfs will no longer make any calls into the driver module anyway; the module reference count becomes superfluous. So Tejun's patch removes the owner field from attribute structures - a change which ripples through a significant amount of driver code.
There have been some comments on how the patches are implemented, but no
disagreement with the ultimate goal; these changes could go in as soon as
2.6.22. Tejun would appear to have more improvements in mind, but, even
with no further changes, the current patches go a long way toward making
sysfs safer and easier to work with.
Index entries for this article | |
---|---|
Kernel | Device model |
Kernel | Sysfs |
Posted Apr 12, 2007 2:58 UTC (Thu)
by flewellyn (subscriber, #5047)
[Link] (2 responses)
I mean, obviously this new code will prevent the system from causing weasels to erupt from your
Posted Apr 12, 2007 3:09 UTC (Thu)
by corbet (editor, #1)
[Link]
Posted Apr 19, 2007 18:55 UTC (Thu)
by jd (guest, #26381)
[Link]
Okay, so what DOES happen when you try to read from, or write to, a sysfs file for which the Toward a safer sysfs
device no longer exists?
nostrils or whatever other catastrophic event might occur, but surely there should be some
defined behavior here? Something along the lines of "device no longer exists, go away"?
Exactly that - it returns ENODEV.
Toward a safer sysfs
Does that mean a device driver to make weasels erupt from a user's Toward a safer sysfs
nostrils will be broken? Can we add a special API for such drivers, say
next April 1st?