BKL-free in 2.6.37 (maybe)
Arnd currently has a vast array of patches in the linux-next tree. Many of them are the result of the tedious (but tricky) work of looking at specific subsystems, determining what kind of locking they really need to have, then substituting lock_kernel() calls with something more local. In many cases, the BKL locking can simply be removed, as the code turns out not to need it. A big focus for 2.6.37 has been the removal of the BKL from a number of filesystems - a task which has required digging into some fairly old code. The Amiga FFS, for example, cannot have received much maintenance in recent times, and seems unlikely to have a lot of users.
The most wide-ranging patch for 2.6.37 has to do with the llseek() function, found in struct file_operations. This function allows a filesystem or driver to implement the lseek() system call, changing a file descriptor's position within the file. Unlike most file_operations functions, there is a default implementation for llseek() which simply changes the kernel's idea of the descriptor's position without notifying the underlying code at all. That change, naturally, was done with the BKL held. This implicit default llseek() implementation will have made life easier for a handful of developers, but it makes BKL removal hard: an implementation change could affect any code with a file_operations structure, not just modules which actually implement the llseek() operation.
To make things harder, a great many of these implicit llseek() implementations are not really needed or useful - most device drivers do not implement any concept of a "file position" and pay no attention to whatever the kernel thinks the position might be. In such situations, it is tempting to change the code to an explicit "no seeking allowed" implementation which reflects what is really going on. The problem here is that some user-space application somewhere might be calling lseek() on the device, and they might get upset if those calls started failing with ESPIPE errors. In other words, a successful-but-ignored lseek() call might just be part of the user-space ABI for a specific device. So something more careful has to be done.
The first step was to go through the kernel and add an explicit llseek() operation to every file_operations structure which did not already have one - a patch affecting 343 files. This work was done primarily with a frightening Coccinelle semantic patch (it was included in the patch changelog) which attempts to determine whether the code in question actually uses the file position or not. If the file position is used, default_llseek(), which implements the old default behavior, becomes the explicit default; otherwise noop_llseek(), which succeeds but does nothing, is used. After that work was done, Arnd was able to verify that none of the users of default_llseek() (there are 191 of them) needs the BKL. So the removal of the BKL from llseek() can be made complete.
The patch also changes how llseek() is handled in the core kernel. Starting with 2.6.37, assuming this work is merged (a good bet), any code which fails to provide an llseek() operation will default to no_llseek(), which returns ESPIPE. Any out-of-tree code which depends on the old default will thus not work properly with 2.6.37 until it is updated.
Even after all of this work, there are still a lot of lock_kernel() calls in the mainline. Almost all of them, though, are in old, obscure code which is not relevant to a lot of users. In some cases, the remaining BKL-using code might be shifted over to the staging tree and eventually removed entirely if it is not fixed up. In other cases, an effort will be made to eradicate the BKL; it can still be found in occasionally-useful code like the Appletalk and ncpfs implementations. There are also a lot of Video4Linux2 drivers which still use the BKL; how those drivers will be fixed is the subject of an ongoing discussion in the V4L2 community.
The biggest impediment to a BKL-free 2.6.37, though, may well be the POSIX locking code. File locks are represented internally with a file_lock structure; those structures are passed around to a few places and, of course, protected with the BKL. Patches exist to protect those structures with a spinlock within the core kernel. The main sticking point appears to be the NFS lockd daemon, which uses file_lock structures and which, thus, requires the BKL; somebody is said to be working on fixing this code, but no patches have been posted yet. Until lockd has been converted, file locking as a whole requires the BKL. And, since it's a rare kernel that does not have file locking enabled, that will drag the BKL into almost all real-world kernel builds.
Even after that fix is in place, distributor kernels are likely to need the
BKL for a bit longer. As long as there is even one module they ship which
requires the BKL, the support for it needs to be there, even if most users
will not have that module loaded. People who build their own kernels,
though, should often be able to put together a configuration which does not
need the BKL. If all goes well, 2.6.37 will have a configuration option
which makes BKL-free builds possible. That's a huge step forward, even if
the BKL still exists in most stock kernels.
| Index entries for this article | |
|---|---|
| Kernel | Big kernel lock |
| Kernel | lock_kernel() |
