LWN.net Logo

Kill BKL Vol. 2

By Jonathan Corbet
May 21, 2008
Last week's big kernel lock article discussed a BKL-related performance regression and concluded that we would likely see a new interest in its elimination. In the intervening week, that interest has indeed come to the fore. There are now a couple of different efforts afoot to get rid of this long-lasting lock.

One might well wonder why the BKL is so persistent. Over the last (approximately) fifteen years, thousands of locks have been added to the kernel, pushing the BKL into increasingly obscure corners. But there are a lot of those corners, including a great many explicit lock_kernel() calls, the open() method for every char device, most ioctl() implementations, all fasync() implementations, and more. The BKL can be found throughout the kernel, and doesn't appear ready to go without a fight.

Part of the problem is simply that locking is hard. So going in and changing the locking of some crufty, old driver is not at the top of the list for a lot of developers, who would generally rather be creating crufty new drivers. Beyond that, though, the BKL is special. It was originally created to be more than just a locking primitive; its purpose is to allow BKL-covered code to pretend that it is still running on an old, uniprocessor system. So its semantics are very different from any other lock in the Linux kernel.

For example, the BKL nests, so programmers can add lock_kernel() calls anywhere without worrying about whether the BKL might already have been acquired elsewhere. As with a mutex, code holding the BKL can sleep; however, the scheduler will magically release the BKL until the holding thread wakes up again. So there can be various threads in kernel space, all of which think they hold the BKL, but only one of them will actually be running at any given time. The end result is that it is hard to get a handle on what is happening with the BKL at any given time; code can depend on it without ever really being aware of its existence.

As Ingo Molnar put it in his kill the BKL tree announcement:

Furthermore, the BKL is not covered by lockdep, so its dependencies are largely unknown and invisible, and it is all lost in the haze of the past ~15 years of code changes. All this has built up to a kind of Fear, Uncertainty and Doubt about the BKL: nobody really knows it, nobody really dares to touch it and code can break silently and subtly if BKL locking is wrong.

That doesn't mean that people aren't willing to try; Ingo's tree - to which we will return shortly - is a major effort in that direction. But first, consider another initiative which, somewhat accidentally, turned up an example of just how subtle BKL-related issues can be. As was mentioned above, the kernel grabs the BKL whenever a process opens a char device; the BKL is held while the associated driver's open() function runs. To eliminate BKL, one must remove this particular use of it; one cannot just take it out, however, without breaking every driver which does not have proper locking internally. So, in fact, this lock_kernel() call cannot be removed until every driver's open() function has been audited and, if necessary, fixed. That's a big flag day.

An alternative, which your editor rashly jumped into doing, is to push the acquisition of the BKL down one level. Every open() function is forced to be correct through the addition of explicit lock_kernel() and unlock_kernel() calls; once all of the in-tree drivers have been fixed in this way, the higher-level call in chrdev_open() can be removed. This work may seem like a step backward, in that it replaces a single lock_kernel() call with approximately 100 others. But it's actually a big step forward, in that each driver can now be audited and fixed independently. This work has now been done, the resulting tree is in linux-next, and, if all goes well, it should be ready for 2.6.27.

While doing this work, though, your editor noticed quite a few drivers with open functions that were either completely empty (all they do is "return 0") or they do something relatively trivial. These functions, one would think, do not need to acquire the BKL; they touch no global resources and cannot possibly race with any other part of the kernel. In fact, as was suggested by others, the empty open() functions could just be removed altogether.

It was Alan Cox who pointed out that life is not quite so simple. Under the current regime, an open function which looks like this:

    static int empty_open(struct inode *inode, struct file *filp)
    {
    	return 0;
    }

is really better modeled as this:

    static int empty_open(struct inode *inode, struct file *filp)
    {
        lock_kernel();
	unlock_kernel();
    	return 0;
    }

These two may seem the same, but there is a crucial difference: in the second form, empty_open() will not return until it can acquire the BKL. In other words, after empty_open() runs, one knows that the BKL became available at least once. And this matters: a classic device driver error is to (1) register a device with the kernel, then (2) initialize all of the internal data structures needed to manage that device. Should some other process attempt to open and use the device between those two steps, unpleasant things can happen. The lock_kernel() call in the open() function, despite protecting no critical section directly, serializes the opening of the device with the driver's initialization, and thus prevents mayhem. So, says Alan,

I think it would be best to make them lock/unlock kernel in the first pass and then work through them. The BKL can be subtle and evil, but as I brought it into the world I guess I must banish it ;)

Alan will not be alone in that effort, though, and Ingo Molnar's "kill the BKL" tree is likely to help this work considerably. Ingo's approach is to get rid of most of the features which make the BKL special. So, with his patches, the BKL becomes just another mutex which, crucially, can be tracked with the lock validator. It is no longer released when a thread calls schedule(), a change which forced the addition of a few explicit "release, schedule, and reacquire" changes in code which would otherwise deadlock. There's a number of warnings added to point out calls made holding the BKL which should not be. And so on.

This patch set, in essence, removes the BKL entirely, replacing it with just another big lock which happens to do nesting. And the nesting might go too at some point. So the BKL becomes more visible and easier to understand. And, presumably, easier to eliminate.

Linus likes this approach, though he would like to see it reworked to the point that it can be merged into the mainline relatively soon. Doing that would require putting most of the changes behind a configuration option decorated with a sufficient number of scary warnings; then people who wanted to test this code could turn it on and see what explodes. The number of explosions would probably be relatively small - but probably not zero.

This set of changes, along with the other work being done, suggests that significant progress toward the elimination of the BKL can be expected over the next few kernel development cycles. Once it's gone, we'll have a kernel without legacy locking issues, and without the unpleasant performance issues that the BKL can bring. That will still take a while, though; there is simply no substitute for actually looking at all the BKL-covered code and ensuring that it will run safely in the absence of that protection. It's a painstaking job requiring moderate skills which can only be rushed so much.


(Log in to post comments)

Kill BKL Vol. 2

Posted May 22, 2008 19:17 UTC (Thu) by joey (subscriber, #328) [Link]

</title>
<groan> :-)

Kill BKL Vol. 2

Posted May 22, 2008 23:31 UTC (Thu) by pr1268 (subscriber, #24648) [Link]

Best. LWN article title. Ever.

:-)

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds