The story of sysfs (and the device model in general) is a long and
complicated one. The creation of a single data structure to represent the
system's hardware and software configuration was long overdue; many tasks
(power management, for
example) cannot be done properly without it. Sysfs adds value to that
structure by representing it to user space. This structure is useful in
many ways, but it has also brought its share of hassles. Exposing kernel
data structures to user space makes it hard to change those structures
without breaking the user-space API; it also exposes every one of them to
user-space initiated lifecycle problems.
Internally, the core building block for the device model is the kobject.
Objects represented in sysfs - devices, for example - each contain a
kobject which, among other things, is the focal point for sysfs access.
The kobject also contains a reference count for the containing object which
is used to manage its lifecycle. A given kobject and its containing data
structure can be deleted when the reference count goes to zero - and not
before. Reference counting works, but it can lead to surprises.
As an example, consider a USB device - a mouse, say. When this device is
plugged into the system, a suitable device structure (containing a kobject)
is created and registered with the kernel. When the mouse is unplugged,
that structure is released. But imagine what happens if a user-space
process opens a sysfs file associated with the mouse device while it is
present, and keeps that file open long after the physical device goes
away. The kernel must be able to handle operations on that open sysfs
file, even though the driver thinks that the device it represents is long
gone. The reference counting in the kobject makes this work - most of the
time. The potential for confusion is high, though, especially with drivers
which have not been written with this sort of lifecycle management in
Back at the end of March, Tejun Heo posted a
discussion of device model lifecycle issues which points out this
problem and a few others. His argument is that the need to manage objects
with different lifecycles makes programming with the device model hard -
something developers have known for some time. Even the core device model
maintainers will admit that it's easy to get things wrong.
More recently, Tejun has followed up with a patch set which attempts to
simplify the situation. There is a great deal of cleanup work in these
patches, and one small API change, but the core change is this: it enables
a clean separation of the lifecycles of sysfs objects and the underlying
data structures they represent. As a result, it is no longer necessary for
code outside of sysfs to be concerned about the fact its data structures
may have a shorter life than the sysfs objects representing those
A sysfs directory (which represents a kobject) is represented within the
kernel by struct
sysfs_dirent. In current kernels, if the sysfs_dirent
structure exists, its underlying kobject is expected to exist as well. It
is not possible for the kobject to go away as long as the
sysfs_dirent structure exists; that means that the structure
containing the kobject must continue to exist as long as any references to
the sysfs files exist. Tejun's patch works by eliminating that requirement.
In the modified sysfs, each sysfs_dirent contains a new counter
called s_active. This counter tracks the number of active
references to the object; these references are the ones which involve the
associated kobject at the current moment. A user-space process which is
holding a sysfs file open will not increase the s_active count
until it performs an actual operation on that file, and the reference
remains only for as long as it takes to complete the operation. Since most
sysfs operations are quite fast, active references will not normally be
held for long.
The active count, as it happens, is maintained with an rwsem - a reader/writer
semaphore. Active references are tracked as readers, so there can be any
number of them outstanding at a given time. The code to obtain an active
reference works with a call to down_read_trylock(), meaning that
it will take a "lock" (a reference) if one is available, but it will not
block if the operation fails. All of the relevant
sysfs operations have been changed to obtain active references before
referencing the kobject - and they make sure that the reference was
granted. If an attempt to obtain an active reference fails, sysfs fails
the higher-level operation with -ENODEV.
The only way
down_read_trylock() will fail is if another thread holds a writer
lock on the semaphore - or is in the process of waiting for the readers to
get out of the way so it can get that lock.
Should something happen which causes the underlying kobject to go away, the
cleanup code will call down_write() on the s_active rwsem
in the sysfs_dirent entry, thus taking a writer lock. This call
will cause any future
attempts to obtain an active reference to fail; it will also block until
all currently-existing active references are released.
The end result of all this is that, once the final kobject_put()
call has completed for a given kobject, there will be no further attempts
to access that kobject from sysfs. The kobject (and its containing data
structure) can be safely deleted, and the driver need worry no more about
As an added bonus, there is no longer any need to increase module reference
counts when sysfs attributes are being accessed. A driver which is being
unloaded will release all of its devices, meaning that sysfs will no longer
make any calls into the driver module anyway; the module reference count
becomes superfluous. So Tejun's patch removes the owner field
from attribute structures - a change which ripples through a significant
amount of driver code.
There have been some comments on how the patches are implemented, but no
disagreement with the ultimate goal; these changes could go in as soon as
2.6.22. Tejun would appear to have more improvements in mind, but, even
with no further changes, the current patches go a long way toward making
sysfs safer and easier to work with.
to post comments)