Michael Kerrisk, the Linux man page maintainer since 2004, gave a talk on
the value of documentation during the first day of LinuxConf Europe 2007.
While documents are useful for end users trying to get their job done, this
use was not Michael's focus; instead, he talked about how documentation can
help in the creation of a better kernel in the first place. The writing of
documents, he says, reveals bugs and bad interface designs before they
become part of a released kernel. And that can help to prevent a great
deal of pain for both kernel and user-space developers.
Michael presented three examples to show how the process of writing
documentation can turn up bugs:
- The inotify interface was added to the 2.6.13 kernel as an
improved way for an application to request notifications when changes
are made to directories and files. Around 2.6.16, Michael finally sat down
to write a manual page for this interface, only to find that one option
(IN_ONESHOT) had never worked. Once the problem was found it
was quickly fixed, but that did not happen until an effort was made to
document the interface.
- splice() was added in 2.6.17. Michael found that it was easy
to write programs which would go into an unkillable hang; clogging the
system with hung processes was also easy. Again, once the problem was
found, it was fixed quickly.
- The timerfd() interface, as merged in 2.6.22, did not work
properly. It also has some design issues, as were covered in
The existence of buggy interfaces in stable kernel releases is, says
Michael, a result of insufficient testing of -rc kernels during the
development process. Better documentation can help with this problem.
Better documentation can also help with the API design process in the first
place. Designing good APIs is hard, and is made harder by the fact that,
for the kernel, API design mistakes must be maintained forever. So anything
which can help in the creation of a good API can only be a good thing.
The characteristics of a good API include simplicity, ease of use,
generality, consistency with other interfaces, and integration with other
interfaces. Bad designs, by contrast, lack those characteristics. As an
example, Michael discussed the dnotify interface - the previous attempt to
provide a file-change notification service. Dnotify suffered as a result
of its use of signals, which never leads to an easy-to-use interface. It
was only able to monitor directories, not individual files. It required
keeping an open file descriptor, thus preventing the unmounting of any
filesystem where dnotify was in use. And the amount of information
provided to applications was limited.
Another example was made of the mlock() and
remap_file_pages() system calls. Both have start and
length arguments to specify the range of memory to be affected.
The mlock() interface rounds the length argument up to
the next page, while remap_file_pages() rounds it down. The two
system calls also differ in when they apply the length argument. As a
result, a call like:
    mlock(4000, 6000);
will affect bytes 0..12287, while
    remap_file_pages(4000, 6000, ...);
affects bytes 0..4095. This sort
of inconsistency makes these system calls harder for developers to use.
Many bits can be expended on how bad these interfaces are. But, asks
Michael, was it all really the developer's fault? Or did the lack of a
review process contribute to these problems?
Many of these difficulties result from the fact that the designers of
system call interfaces (kernel hackers) are not generally the users of
those interfaces. To make things better, Michael put forward a proposal to
formalize the system call interface development process. He acknowledges
that this sort of formalization is a hard sell, but the need to create
excellent interfaces from the first release makes it necessary. So he
would like to see a formal signoff requirement for APIs - though who would
be signing off on them was not specified. There would need to be a design
review, full documentation of the interface, and a test suite before this
signoff could happen. The test suite would need to be at least partially
written by people other than the developer, who will never be able to imagine
all of the crazy things users might try to do with a new interface.
The documentation requirement is an important part of the process. Writing
documentation for an interface will often reveal bugs or bad design
decisions. Beyond that, good documentation makes the nature of the
interface easier for others to understand, resulting in more review and
more testing of a proposed interface. Without testing from application
developers, problems in new APIs will often not be found until after they
have been made part of a stable kernel release, and that is too late.
In the question period, it was asserted that getting application developers
to try out system calls in -rc kernels is always going to be hard.
An alternative idea, which has been heard before, would be to mark new
system calls as "experimental" for a small number of kernel release cycles
after they are first added. Then it would be possible to try out new
system calls without having to run development kernels and still have a
chance to influence the final form of the new API. It might be easier to
get the kernel developers to agree to this kind of policy than to get them
to agree to an elaborate formal review process, but it still represents a
policy change which would have to be discussed. That discussion could
happen soon; how it goes will depend on just how many developers really
feel that there is a problem with how user-space APIs are designed and
The next day, Arnd Bergmann gave a talk on how not to design kernel
interfaces. Good interfaces, he says, are designed with "taste," but
deciding what has taste is not always easy. Taste is subjective and
changes over time. But some characteristics of a tasteful interface are
clear: simplicity, consistency, and using the right tool for the job.
These are, of course, very similar to the themes raised by Michael the day
before.
As is often the case, discussion of interface design is most easily
done by pointing out the things one should not do. Arnd started in
with system calls, which are the primary interface to the kernel. Adding
new system calls is a hard thing to do; there is a lot of review which must
be gotten through first (though, as discussed above, perhaps it's still not
hard enough). But often the alternative to adding system calls can be
worse; he raised the hypothetical idea of a /dev/exit device; a
process which has completed its work could quit by opening and writing to
that device. Such a scheme would allow the elimination of the
exit() system call, but it would not be a more tasteful interface
by any means.
The ioctl() system call has long been the target of criticism; it
is not type safe, it is hard to script, and it is an easy way to sneak in
ABI changes without anybody noticing. On the other hand, it is well
established, easy to extend, it works in modules, and it can be a good way
to prototype system calls. Again, trying to avoid ioctl() can
lead to worse things; Arnd presented an example from the InfiniBand code
which interprets data written to a special file descriptor to execute
commands. The result is essentially ioctl(), but even less clear.
Sockets are a well-established interface which, Arnd says, would never be
accepted into the kernel now. They are totally inconsistent with
everything else, operate on devices which are not part of the device tree,
have read and write calls which are not read() and
write(), and so on. Netlink, by adding complexity to the socket
interface, did not really help the user-space interface situation in
general; its use is, he says, best avoided. But, importantly, it is better
to use netlink than to reinvent it. The wireless extensions API was
brought up as another example of how not to do things; putting wireless
extensions over netlink turned out to be a way of combining the worst
features of sockets and ioctl() into a single interface.
The "fashionable" way to design new interfaces now is with virtual
filesystems. But troubles can be found there as well. /proc
became a sort of dumping ground for new interfaces until the developers
began to frown on additions there. Sysfs was meant to solve many of
the problems with /proc, but it clearly has not solved the API
stability problem. Virtual filesystems may well be the best way to create
new interfaces, but there are many traps there.
Finally, there was some talk of designing interfaces to make ABI emulation
easy. Arnd suggests that data structures should be the same in both kernel
and user space. Avoid variables of type long and, whenever possible,
avoid pointers as well. Structure padding - either explicit or caused by
badly aligned fields - can lead to trouble. And so on.
All told, it was a lively session with a great deal of audience
participation. There are many user-space interface design mistakes which
are part of Linux and must be supported forever. There is also a great
deal of interest in avoiding making more of those mistakes in the future.
The problem remains a hard one, though, even with the benefit of a great
deal of experience.