BKL-free in 2.6.37 (maybe)

By Jonathan Corbet
September 20, 2010
The removal of the big kernel lock has been an ongoing, multi-year effort which has been reported on here a few times. The BKL has some strange and unique properties which make its removal from various kernel subsystems trickier than one might think it should be. But, thanks to a great deal of work by Arnd Bergmann, we might just be approaching a point where the 2.6.37 kernel can be built BKL-free for many or most users. There is, however, one significant obstacle which still must be overcome.

Arnd currently has a vast array of patches in the linux-next tree. Many of them are the result of the tedious (but tricky) work of looking at specific subsystems, determining what kind of locking they really need to have, then substituting lock_kernel() calls with something more local. In many cases, the BKL locking can simply be removed, as the code turns out not to need it. A big focus for 2.6.37 has been the removal of the BKL from a number of filesystems - a task which has required digging into some fairly old code. The Amiga FFS, for example, cannot have received much maintenance in recent times, and seems unlikely to have a lot of users.
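As a rough user-space analogy for "substituting lock_kernel() calls with something more local," the sketch below contrasts one global lock with a lock embedded in the object it protects. Everything here is hypothetical: struct demo_dev, big_lock, and the demo_* functions are invented for illustration, and a pthread mutex stands in for kernel locking primitives. It also ignores the BKL's stranger properties, which are what make real conversions tricky.

```c
#include <pthread.h>

/* Stand-in for the BKL: one lock shared by every subsystem (hypothetical). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* A hypothetical device whose state needs protection. */
struct demo_dev {
    pthread_mutex_t lock;   /* the "more local" replacement lock */
    int open_count;
};

void demo_dev_init(struct demo_dev *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->open_count = 0;
}

/* Before: all devices serialize on the single global lock. */
int demo_open_global(struct demo_dev *d)
{
    int n;

    pthread_mutex_lock(&big_lock);
    n = ++d->open_count;
    pthread_mutex_unlock(&big_lock);
    return n;
}

/* After: only users of this one device contend with each other. */
int demo_open_local(struct demo_dev *d)
{
    int n;

    pthread_mutex_lock(&d->lock);
    n = ++d->open_count;
    pthread_mutex_unlock(&d->lock);
    return n;
}
```

The hard part of the real work is not the substitution itself but establishing which data the old lock was actually protecting, and whether it was protecting anything at all.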

The most wide-ranging patch for 2.6.37 has to do with the llseek() function, found in struct file_operations. This function allows a filesystem or driver to implement the lseek() system call, changing a file descriptor's position within the file. Unlike most file_operations functions, there is a default implementation for llseek() which simply changes the kernel's idea of the descriptor's position without notifying the underlying code at all. That change, naturally, was done with the BKL held. This implicit default llseek() implementation will have made life easier for a handful of developers, but it makes BKL removal hard: an implementation change could affect any code with a file_operations structure, not just modules which actually implement the llseek() operation.

To make things harder, a great many of these implicit llseek() implementations are not really needed or useful - most device drivers do not implement any concept of a "file position" and pay no attention to whatever the kernel thinks the position might be. In such situations, it is tempting to change the code to an explicit "no seeking allowed" implementation which reflects what is really going on. The problem here is that some user-space application somewhere might be calling lseek() on the device, and they might get upset if those calls started failing with ESPIPE errors. In other words, a successful-but-ignored lseek() call might just be part of the user-space ABI for a specific device. So something more careful has to be done.
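To make the ABI concern concrete, here is a minimal user-space sketch of what an application sees when seeking is refused. It uses a pipe as a stand-in for a device with a "no seeking allowed" implementation, since lseek() on a pipe is specified to fail with ESPIPE. The helper names (try_seek, seek_on_pipe, seek_on_regular_file) are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Try to seek to offset 0 on fd; return 0 on success, else the errno value. */
static int try_seek(int fd)
{
    errno = 0;
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return errno;
    return 0;
}

/* Seek on the read end of a fresh pipe; returns -1 if setup fails. */
int seek_on_pipe(void)
{
    int fds[2];
    int err;

    if (pipe(fds) != 0)
        return -1;
    err = try_seek(fds[0]);
    close(fds[0]);
    close(fds[1]);
    return err;
}

/* Seek on an anonymous temporary regular file; returns -1 if setup fails. */
int seek_on_regular_file(void)
{
    FILE *f = tmpfile();
    int err;

    if (f == NULL)
        return -1;
    err = try_seek(fileno(f));
    fclose(f);
    return err;
}
```

An application that treats any lseek() failure as fatal would keep working against a driver where the call succeeds but is ignored, and would start failing if that driver switched to returning ESPIPE as the pipe does here.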

The first step was to go through the kernel and add an explicit llseek() operation to every file_operations structure which did not already have one - a patch affecting 343 files. This work was done primarily with a frightening Coccinelle semantic patch (it was included in the patch changelog) which attempts to determine whether the code in question actually uses the file position or not. If the file position is used, default_llseek(), which implements the old default behavior, becomes the explicit default; otherwise noop_llseek(), which succeeds but does nothing, is used. After that work was done, Arnd was able to verify that none of the users of default_llseek() (there are 191 of them) needs the BKL. So the removal of the BKL from llseek() can be made complete.

The patch also changes how llseek() is handled in the core kernel. Starting with 2.6.37, assuming this work is merged (a good bet), any code which fails to provide an llseek() operation will default to no_llseek(), which makes the lseek() call fail with ESPIPE. Any out-of-tree code which depends on the old default will thus not work properly with 2.6.37 until it is updated.
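The three behaviors named above, and the new fallback for a missing llseek() operation, can be modeled in a few lines of user-space C. This is only a sketch: the model_* names and the two structures are invented stand-ins for struct file and struct file_operations, not kernel code, and the details (such as the noop variant returning the current position) are assumptions of the model.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>      /* SEEK_SET, SEEK_CUR, SEEK_END */

struct model_file;
typedef long long (*model_llseek_fn)(struct model_file *, long long, int);

/* Stand-in for struct file_operations: only the llseek slot is modeled. */
struct model_fops {
    model_llseek_fn llseek;
};

/* Stand-in for struct file. */
struct model_file {
    long long pos;
    long long size;
    const struct model_fops *fops;
};

/* Models default_llseek(): the old implicit behavior, updating the position. */
long long model_default_llseek(struct model_file *f, long long offset, int whence)
{
    long long newpos;

    switch (whence) {
    case SEEK_SET: newpos = offset; break;
    case SEEK_CUR: newpos = f->pos + offset; break;
    case SEEK_END: newpos = f->size + offset; break;
    default: return -EINVAL;
    }
    if (newpos < 0)
        return -EINVAL;
    f->pos = newpos;
    return newpos;
}

/* Models noop_llseek(): succeeds, but leaves the position untouched. */
long long model_noop_llseek(struct model_file *f, long long offset, int whence)
{
    (void)offset;
    (void)whence;
    return f->pos;
}

/* Models no_llseek(): seeking is refused outright. */
long long model_no_llseek(struct model_file *f, long long offset, int whence)
{
    (void)f;
    (void)offset;
    (void)whence;
    return -ESPIPE;
}

/* Models the core dispatch: an empty llseek slot now means "no seeking". */
long long model_vfs_llseek(struct model_file *f, long long offset, int whence)
{
    model_llseek_fn fn = f->fops->llseek ? f->fops->llseek : model_no_llseek;

    return fn(f, offset, whence);
}
```

In this model, a structure that leaves the llseek slot empty gets -ESPIPE from model_vfs_llseek(), which is the situation out-of-tree code relying on the old implicit default would find itself in.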

Even after all of this work, there are still a lot of lock_kernel() calls in the mainline. Almost all of them, though, are in old, obscure code which is not relevant to a lot of users. In some cases, the remaining BKL-using code might be shifted over to the staging tree and eventually removed entirely if it is not fixed up. In other cases, an effort will be made to eradicate the BKL; it can still be found in occasionally-useful code like the AppleTalk and ncpfs implementations. There are also a lot of Video4Linux2 drivers which still use the BKL; how those drivers will be fixed is the subject of an ongoing discussion in the V4L2 community.

The biggest impediment to a BKL-free 2.6.37, though, may well be the POSIX locking code. File locks are represented internally with a file_lock structure; those structures are passed around to a few places and, of course, protected with the BKL. Patches exist to protect those structures with a spinlock within the core kernel. The main sticking point appears to be the NFS lockd daemon, which uses file_lock structures and which, thus, requires the BKL; somebody is said to be working on fixing this code, but no patches have been posted yet. Until lockd has been converted, file locking as a whole requires the BKL. And, since it's a rare kernel that does not have file locking enabled, that will drag the BKL into almost all real-world kernel builds.

Even after that fix is in place, distributor kernels are likely to need the BKL for a bit longer. As long as there is even one module they ship which requires the BKL, the support for it needs to be there, even if most users will not have that module loaded. People who build their own kernels, though, should often be able to put together a configuration which does not need the BKL. If all goes well, 2.6.37 will have a configuration option which makes BKL-free builds possible. That's a huge step forward, even if the BKL still exists in most stock kernels.


BKL-free in 2.6.37 (maybe)

Posted Sep 23, 2010 9:16 UTC (Thu) by marcH (subscriber, #57642)

> If all goes well, 2.6.37 will have a configuration option which makes BKL-free builds possible. That's a huge step forward, even if the BKL still exists in most stock kernels.

Suppose the BKL has just become optional. As a *user*, suppose I do not use *any* module or code using the BKL. Would it make any difference to me to run a BKL-free kernel compared to a stock kernel compiled with most features, including an unused BKL?

> The Amiga FFS, for example, cannot have received much maintenance in recent times, and seems unlikely to have a lot of users.
> [...] [...] [...]
> In some cases, the remaining BKL-using code might be shifted over to the staging tree and eventually removed entirely if it is not fixed up.

In such cases isn't it better to just let this suboptimal but tested and working code depend forever on an optional BKL? The few remaining users of such old code might still enjoy it without having the resources to free it from the BKL.

BKL-free in 2.6.37 (maybe)

Posted Sep 23, 2010 12:14 UTC (Thu) by NAR (subscriber, #1313)

> In such cases isn't it better to just let this suboptimal but tested and working code depend forever on an optional BKL?

Do you mean tested 10 years ago and was possibly working 10 years ago? I remember an article here about a kernel module that wasn't even compiling for some 4 years and nobody noticed.

Kernel developers sometimes boast about having features with a user community numbering in single digits. Do you think these extremely rarely used features are actually working with every kernel release?

BKL-free in 2.6.37 (maybe)

Posted Sep 23, 2010 14:20 UTC (Thu) by marcH (subscriber, #57642)

> Do you mean tested 10 years ago and was possibly working 10 years ago?

No: I meant in small but non-zero use still today. Replace "Amiga FFS" by a more appropriate example if required.

> I remember an article here about a kernel module that wasn't even compiling for some 4 years and nobody noticed.

I wasn't considering such extreme cases. This seems rather off-topic since nobody cares whether such broken modules use the BKL or not.

BKL-free in 2.6.37 (maybe)

Posted Sep 23, 2010 14:48 UTC (Thu) by cesarb (subscriber, #6266)

> In such cases isn't it better to just let this suboptimal but tested and working code depend forever on an optional BKL?

If every use of the BKL is removed, the code which magically releases and reacquires it on a context switch can also be removed. This cannot be done if there is even one single user.

Not to mention that it is also a good canary for bit-rotted code. It is most probably not "tested and working code"; said code usually depended on some property of the environment which has now changed (this is how bit rot happens).

BKL-free in 2.6.37 (maybe)

Posted Sep 23, 2010 19:00 UTC (Thu) by arnd (subscriber, #8866)

> Suppose the BKL has just become optional. As a *user*, suppose I do not use *any* module or code using the BKL.
> Would it make any difference to me to run a BKL-free kernel compared to a stock kernel compiled with most features,
> including an unused BKL?

Not much. The task_struct can shrink by four bytes and the schedule() function loses a few bytes for calling release_kernel_lock()/reacquire_kernel_lock().

The -rt kernel tree wins a bit more because it does not have to work around the BKL being weird any more.

If we get distros to disable the BKL while it's still there, that will help annoy certain companies providing binary-only kernel modules, but if you're building your own kernels and don't use those modules, it won't make a difference.

I originally thought we'd have a lot more legacy modules that are not worth fixing and need to be disabled, but now the config option is not so important any more.

> In such cases isn't it better to just let this suboptimal but tested and working code depend forever on an optional BKL?
> The few remaining users of such old code might still enjoy it without having the resources to free it from the BKL.

The only remaining modules that nobody has volunteered to fix are now only i810/i830 (drm), and a few file systems (adfs, coda, freevxfs, hpfs, smbfs and ufs). I guess we can either declare them BROKEN_ON_SMP or remove them if nobody steps up to fix them by the end of the following (2.6.38) merge window.

BKL-free in 2.6.37 (maybe)

Posted Sep 23, 2010 19:04 UTC (Thu) by talisein (subscriber, #31829)

Have there been any metrics to view the progress of the BKL-freedom movement? E.g. time spent in the BKL, release to release?

How it works?

Posted Sep 30, 2010 14:34 UTC (Thu) by chojrak11 (guest, #52056)

The relatively new Android drivers and the Microsoft drivers got removed from the kernel for lack of maintenance, but Amiga FFS is still there? I don't get it. Either Amiga FFS is actively developed, bugfixed, or otherwise maintained (which I think is unlikely), or none of this makes sense.

How it works?

Posted Sep 30, 2010 15:03 UTC (Thu) by cladisch (✭ supporter ✭, #50193)

All code in the kernel is supposed to abide by a minimal quality standard. The Android and Microsoft drivers were part of the staging tree, i.e., they were supposed to be improved until they could be moved into the kernel proper. This did not happen.

The AmigaFFS driver is maintained. There isn't much maintenance because it hasn't changed much; it needs to change only to follow kernel API changes.

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds