Sysfs and a stable kernel ABI

Some things are fairly predictable. There is a long list of regressions in the 2.6.16 kernel, and some of those do not appear to be getting a whole lot of developer attention. But when one of those bugs causes a developer's iPod to stop working with Linux, it will get fixed in a timely manner. This time around, it also set off a discussion on what it really means to have a stable application interface to the kernel.

Back in the dim and distant past (last year), the "user events" mechanism was added to the kernel. One of the first events to be implemented was block device mount and unmount operations. Over time, however, it was concluded that user events were not the right way to communicate this information. So a new interface - allowing interested user-space processes to call poll() on /proc/mounts - was added to the kernel. Then, a patch was merged for 2.6.16 which removes the mount and unmount events.
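As a rough illustration of the newer interface, a user-space watcher might look something like the sketch below. It assumes a Linux system where the kernel reports a change to the mount table as an exceptional condition (POLLERR or POLLPRI) on an open /proc/mounts descriptor. The helper names parse_mounts, diff_mounts, and watch_mounts are invented for this example and are not taken from HAL or any other project.

```python
import select


def parse_mounts(text):
    """Parse /proc/mounts-style text into a set of (device, mount_point, fs_type)."""
    mounts = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.add((fields[0], fields[1], fields[2]))
    return mounts


def diff_mounts(old, new):
    """Return (added, removed) between two snapshots, each as a sorted list."""
    return sorted(new - old), sorted(old - new)


def watch_mounts(path="/proc/mounts", timeout_ms=None):
    """Yield (added, removed) whenever the mount table changes.

    Assumption: the kernel signals a change by raising POLLERR/POLLPRI
    on the open file; the generator stops if the poll times out.
    """
    with open(path) as f:
        current = parse_mounts(f.read())
        poller = select.poll()
        # Do not register POLLIN: the file is always readable, so that
        # would wake the loop continuously.
        poller.register(f.fileno(), select.POLLERR | select.POLLPRI)
        while True:
            if not poller.poll(timeout_ms):
                return  # timed out with no change
            # Reread the whole table; the event itself says only
            # "something changed", not what.
            f.seek(0)
            latest = parse_mounts(f.read())
            added, removed = diff_mounts(current, latest)
            current = latest
            if added or removed:
                yield added, removed
```

A caller would simply iterate over watch_mounts() and react to each (added, removed) pair. Because the watcher rereads the full table on every wakeup, it always works from the current state of the system rather than trying to interpret individual events after the fact.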

When Pekka Enberg (the iPod user) fingered this patch as the cause of the problem, the author of that patch (Kay Sievers) responded: "Upgrade HAL, it's too old for that kernel." This response didn't sit well with Andrew Morton:

You took a kernel interface which was present in 2.6.10, 2.6.11, 2.6.12, 2.6.13, 2.6.14 and 2.6.15 and changed it in a non-compatible way, without telling us that it was non-compatible and without even notifying people that we'd gone and broken existing userspace.

We. Don't. Do. That.

Linus, too, was unimpressed:

Guys: you now have two choices: fix it by sending me a patch and an explanation of what went wrong, or see the patch that broke things be reverted.... I'm fed up with hearing how "breaking user space is ok because it's HAL or hotplug". IT IS NOT OK. Get your damn act together, and stop blaming other people.

For now, the issue has been resolved by reverting the patch in question. The feature removal schedule has been updated to note that the mount and unmount events will disappear in February of 2007. iPod owners can rest easy for now.

But this episode drives home a point which is worth noting. Longstanding kernel policy has been that, while kernel internals can change at any time, the user-space interface must remain absolutely stable. Even when an interface turns out to have been badly designed, it must continue to work. Interfaces can be augmented or superseded, but they cannot be broken.

Not that long ago, the kernel ABI consisted entirely of the system call interface and a few files in /proc. While regressions were not unknown, the fact is that keeping a couple hundred system calls in a stable state is a relatively straightforward task. People notice when a system call interface is changed. In more recent times, the interface to the kernel has gotten much wider; it includes several netlink-based protocols and a number of kernel-based virtual filesystems like configfs and sysfs. It can be easy for kernel developers to lose track of the fact that, when they work on one of those interfaces, they risk breaking the user-space ABI. And it can be easy for changes that alter the user-space interface to slip past the review process.

This risk is especially acute with sysfs. The directory tree exported via sysfs matches, in a very close way, the data structures maintained within the kernel. Every sysfs directory corresponds to a kobject embedded within some kernel structure, and every sysfs attribute is tied, somehow, to an attribute of the associated structure within the kernel. There are some advantages to this arrangement; sysfs has become a clear window into the organization of the system as seen by the kernel. And, because sysfs is so closely tied to the kernel's data structures, most developers need not even think about it. When a new type of device, for example, is added to the kernel, the associated sysfs entries will generally just happen by themselves.

But every entry in sysfs - 3400 attributes in 1175 directories on your editor's relatively simple system - is part of the kernel ABI. That's 3400 attributes tied to 1175 kernel internal data structures which cannot be changed without the risk of breaking user-space code. Sysfs has evolved into a highly complex - and, to a great extent, undocumented - binary interface to the kernel. In the short term, that makes sysfs susceptible to inadvertent regressions as developers make changes without thinking about the possible user-space effects.
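For readers who want a comparable count on their own machines, the sketch below shows one way it might be done. The article does not say how the figures above were obtained, so the counting rules here (walk /sys without following symbolic links, count every non-symlink file as an attribute) are assumptions, and count_sysfs is a name made up for this example.

```python
import os


def count_sysfs(root="/sys"):
    """Count directories and attribute files below root.

    Symbolic links are skipped on both counts: sysfs uses them heavily
    to cross-reference the same kobject from several places, and
    following them would count entries more than once.
    """
    directories = 0
    attributes = 0
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # os.walk lists symlinks to directories in dirnames but does
        # not descend into them; leave them out of the total as well.
        directories += sum(
            1 for d in dirnames
            if not os.path.islink(os.path.join(dirpath, d))
        )
        attributes += sum(
            1 for name in filenames
            if not os.path.islink(os.path.join(dirpath, name))
        )
    return directories, attributes


if __name__ == "__main__":
    dirs, attrs = count_sysfs()
    print(f"{attrs} attributes in {dirs} directories")
```

The numbers will differ from system to system and from kernel to kernel, which is part of the point: the set of names exposed here tracks whatever the running kernel happens to contain.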

In the longer term, a different problem might arise. The kernel developers have always been willing to make incompatible changes to the internal API if the end result is a better, more capable, or safer interface. This freedom to change things is widely exploited; see the LWN 2.6 API changes page to see just how widely. As kernel data structures get tied into sysfs, however, they become part of an ABI which cannot be broken. In a few years, the kernel hackers may find themselves in the position of wanting to make significant internal structural changes, only to be thwarted by the inability to change the associated sysfs structure. At that point, the choice will be to either (1) not make the changes, or (2) interpose some sort of compatibility translation layer between sysfs and the kernel structures it represents. Neither looks like a whole lot of fun.


Sysfs and a stable kernel ABI

Posted Feb 23, 2006 3:33 UTC (Thu) by etrusco (guest, #4227) [Link]

Or bump major version and break the dang interface ;-)

Sysfs and a stable kernel ABI

Posted Feb 24, 2006 9:41 UTC (Fri) by kleptog (subscriber, #1183) [Link]

Indeed, maybe this should be the new model.

2.6.x - Let's keep adding stuff and hacking away while it works
2.7.x - Two months to delete the mountains of cruft we've accumulated
2.8.x - Here we go again

If we designate the odd numbers for series where features nobody wants anymore get removed and nothing added, maybe it wouldn't be such a long development cycle?

What's the 'B' for?

Posted Feb 23, 2006 5:41 UTC (Thu) by xoddam (subscriber, #2322) [Link]

While deprecating and removing interfaces is not (normally) done lightly,
it has been done. Scheduled removal of deprecated user-space interfaces
has taken place *within* the 2.6 series, not merely when the minor
version number has been 'bumped'. Setting a time-frame for the removal
of the mount and unmount events continues this practice -- no-one said,
"we'll remove them in 2.7".

As for sysfs -- wasn't the whole point that it accurately reflects kernel
data structures? If that is its defining motivation, the guarantee not
to break the ABI arguably never extended to sysfs. Translation into
legacy data structures sounds very burdensome.

What does 'B' mean in ABI anyway? Binary? That implies it's explicitly
about the system call interface. Obviously the guarantee extends beyond
that, but should it really cover every userspace interface? Keeping some
things in sync with the kernel just makes sense. To me.

What's the 'B' for?

Posted Feb 23, 2006 17:00 UTC (Thu) by ebiederm (subscriber, #35028) [Link]

Binary means old Binaries still work.

There is a tradition of programs that are tightly coupled with the
kernel breaking. Look at Documentation/Changes.

However it should not be something that is done lightly or casually.

Removing mount and unmount events

Posted Feb 23, 2006 18:23 UTC (Thu) by BradReed (subscriber, #5917) [Link]

What does removing mount and unmount events mean to a user? I run Slackware, which doesn't include HAL, and hotplug/udev detects my ipod when I plug it in, and creates /dev/ipod for me to mount. Is this the functionality that is being removed?

What is HAL, and why would kernel developers not want users to know if something is mounted or unmounted?

Removing mount and unmount events

Posted Feb 23, 2006 21:36 UTC (Thu) by iabervon (subscriber, #722) [Link]

HAL is a library for having desktop-level programs able to respond to system events. E.g., having it so that, when you plug in your iPod, the GNOME desktop manager knows to display an icon for you to interact with it.

They aren't removing the ability for users to know when something is being mounted or unmounted. The issue was that the old mechanism was unclear as to what had actually happened, and had race conditions (if you mounted a device and then quickly unmounted it, the thing getting the events would get these two events, but be unable to interpret them usefully once the mount was gone). There's already a (virtual) file, /proc/mounts, which lists the mounts, and this is what programs would have to read to interpret a mount event. So they added support for just watching that file, so HAL can tell when to reread it (and change the set of icons it causes to have displayed). The controversy is over removing the confusing and now unnecessary interface that some versions of HAL that people still use depend on.

Removing mount and unmount events

Posted Feb 24, 2006 13:29 UTC (Fri) by BradReed (subscriber, #5917) [Link]

Thanks for the explanation. Doesn't look like something I need worry about then. Slackware no longer ships Gnome, and I use Enlightenment WM which doesn't have icons.

Removing mount and unmount events

Posted Feb 24, 2006 13:58 UTC (Fri) by hazelsct (guest, #3659) [Link]

HAL is not a GNOME issue alone. It interacts with KDE, it's part of the Network Manager, it's in more and more user-space software. Think of it as a user-space mechanism for handling system events. Sure, you can pretend it doesn't exist, but then you will miss out on a lot of nice new functionality, not just having icons pop up when new devices appear.

Removing mount and unmount events

Posted Mar 2, 2006 13:30 UTC (Thu) by nix (subscriber, #2304) [Link]

HAL is specifically for hardware-type events.

The system DBUS queue (which is used by HAL) is intended as the `all system events' thing.

Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds