
The Big Kernel Lock lives on

It was recently noted that ioctl() system calls are still executed with the Big Kernel Lock (BKL) held. A suggestion was made that drivers which can implement ioctl() without the BKL held should be specially flagged as a way of increasing parallelism. That suggestion looks like it will not get very far. But it did pique your editor's interest in current use of the BKL. Besides, there hasn't been a whole lot else going on this week.

The BKL is an artifact from when the Linux kernel first supported multiprocessor systems. Making the kernel safe for concurrent access from multiple CPUs has been a multi-year task; it is not a job that could have been done all at once at the beginning. So Linux 2.0 supported SMP systems by way of the BKL, which only allowed one processor to be running kernel code at any given time. The BKL is essentially a spinlock, but with a couple of interesting properties:

  • The BKL can be taken recursively; the kernel remembers how many times a given thread has called lock_kernel() and does the right thing. Normal spinlocks are rather less forgiving.

  • Code holding the BKL can sleep. The lock is released while the given thread sleeps, and reacquired upon awakening.
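
The two properties above can be sketched in userspace. What follows is a minimal model built on POSIX threads, not the kernel's implementation: the names model_lock_kernel(), model_unlock_kernel(), model_sleep(), and model_lock_depth() are invented for illustration, and a pthread mutex stands in for the spinlock. It shows a per-thread nesting count making the lock recursive, and the lock being dropped entirely around a blocking wait and then retaken at the same depth.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace model of the BKL; a mutex stands in for the spinlock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread nesting depth; 0 means this thread does not hold the lock. */
static _Thread_local int lock_depth = 0;

/* Take the lock, or only bump the depth if this thread already holds it. */
void model_lock_kernel(void)
{
    if (lock_depth++ == 0)
        pthread_mutex_lock(&big_lock);
}

/* Drop one level of nesting; release the lock when the depth reaches 0. */
void model_unlock_kernel(void)
{
    assert(lock_depth > 0);
    if (--lock_depth == 0)
        pthread_mutex_unlock(&big_lock);
}

/*
 * Model of sleeping with the lock held: release it completely, run the
 * caller-supplied blocking wait, then reacquire and restore the old depth.
 */
void model_sleep(void (*blocking_wait)(void))
{
    int saved = lock_depth;

    if (saved > 0) {
        lock_depth = 0;
        pthread_mutex_unlock(&big_lock);
    }
    blocking_wait();
    if (saved > 0) {
        pthread_mutex_lock(&big_lock);
        lock_depth = saved;
    }
}

/* Report the calling thread's current nesting depth. */
int model_lock_depth(void)
{
    return lock_depth;
}
```

A thread that calls model_lock_kernel() twice must call model_unlock_kernel() twice before any other thread can get in; a plain, non-recursive lock would instead deadlock on the second acquisition.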

The BKL made SMP Linux possible, but it didn't scale very well. Its overhead could be felt even with two processors, and it made running on anything larger problematic. So the kernel developers have been breaking the BKL into finer-grained locks ever since. Thus, for example, the block I/O subsystem went from the BKL to its own lock (io_request_lock) in 2.2, and from that to individual queue locks in 2.6. The kernel now has thousands of locks, and some people had assumed that the BKL would be gone by 2.6.

As it turns out, there are still over 500 lock_kernel() calls in the 2.6.6 kernel. For the curious, here are some of the places which still rely on this old, system-wide lock:

  • The core kernel retains a few calls. The implementation of the reboot() system call is one of them; this is, of course, not one of the more performance-sensitive parts of the kernel. The boot-time early initialization process is also run with the BKL held. The sysctl() system call is run under the BKL; interestingly, while much of /proc is also implemented under the BKL, it appears that reads and writes to /proc/sys do not run with the BKL held.

  • Many older filesystems (UFS, coda, HPFS, FAT, NCP, SMB, Minix, etc.) make heavy use of the BKL for serialization. The UnixWare "Boot File System" implementation has several calls; somehow, they seem unlikely to be fixed anytime soon. There are also lock_kernel() calls in NFS, UDF, isofs, the reiserfs journaling code, autofs, and some others. The ext2 filesystem uses the BKL to protect modifications to the superblock; ext3, instead, had all of its lock_kernel() calls purged during the 2.5 development process.

  • The rpciod kernel thread spends its entire life with the BKL held.

  • Core dumps are created with the BKL held.

  • Block and character devices have their open() methods called under the BKL. Block release() methods are also called this way, but that is not true for char drivers. The default llseek() method runs under the BKL, but, if a driver or filesystem provides its own llseek() method, that method will not be called with the BKL held. The fasync() method is always called under the BKL. As noted at the beginning, ioctl() methods are called with the lock held; additionally, the ugly code which does 32-bit emulation on 64-bit systems needs the BKL.

  • The file locking code still requires the BKL.

  • Almost 10% of the lock_kernel() calls can be found in the (old, deprecated) OSS sound code. The ALSA code has no BKL calls, with one exception: the implementation of its /proc files.

  • Most of the architectures retain some calls in the arch-specific code. The ptrace() system call is one common place for these calls. i386 also uses the BKL to protect llseek() calls on the CPUID and MSR pseudo-devices. uClinux performs execve() calls under the BKL.

  • Almost all of the remaining BKL calls are to be found in device drivers. The TTY subsystem still has quite a few of them, as does USB. Many of these calls are protecting llseek() implementations. Quite a few of the rest are for the creation of special-purpose kernel threads: the daemonize() function needs to be called with the BKL held. Those calls can, presumably, go away as the driver code is (slowly) migrated over to the new kthread calls.
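
The llseek() rule described in the list above (default method under the BKL, driver-supplied method without it) can be modeled in a few lines. This is a userspace sketch only: struct model_file, struct model_file_ops, model_llseek(), and the other names are hypothetical, the seek is simplified to an absolute offset, and a pthread mutex stands in for the BKL.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for the BKL, plus a flag that is set only while it is held. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int big_lock_held = 0;

struct model_file;

/* Hypothetical method table; a NULL llseek means "use the default". */
struct model_file_ops {
    long (*llseek)(struct model_file *f, long offset);
};

struct model_file {
    long pos;
    const struct model_file_ops *ops;
    int saw_lock;   /* records whether the big lock was held during the seek */
};

/* Default method: relies on the caller holding the big lock. */
long default_llseek(struct model_file *f, long offset)
{
    f->saw_lock = big_lock_held;
    f->pos = offset;
    return offset;
}

/* A driver's own method: it would have to do its own locking, if any. */
long custom_llseek(struct model_file *f, long offset)
{
    f->saw_lock = big_lock_held;
    f->pos = offset;
    return offset;
}

/* Dispatch: only the default path is wrapped in the big lock. */
long model_llseek(struct model_file *f, long offset)
{
    long ret;

    if (f->ops && f->ops->llseek)
        return f->ops->llseek(f, offset);

    pthread_mutex_lock(&big_lock);
    big_lock_held = 1;
    ret = default_llseek(f, offset);
    big_lock_held = 0;
    pthread_mutex_unlock(&big_lock);
    return ret;
}
```

The consequence for driver authors is visible in the dispatch function: supplying an llseek() method also means taking over responsibility for serializing updates to the file position.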

Given how poorly the BKL is viewed, it may be surprising that so many places in the kernel still use it. The simple fact is that, with regard to the BKL, all of the low-hanging fruit has long since been taken. For most of the remaining calls, removing the BKL is not worth the trouble and code churn. So, while removal of the remaining calls over the 2.7 development series looks entirely possible, it would not be surprising if that does not happen.



The Big Kernel Lock lives on

Posted May 27, 2004 4:21 UTC (Thu) by ncm (subscriber, #165) [Link]

Would somebody please explain, briefly, how the BKL and its users interact with the (approximately) myriad other locks in the kernel? I.e. does the BKL only guard what is not guarded by any other lock? Might a driver need to take the BKL and another, finer-grained lock, before proceeding? Is there a natural order in which locks are taken?

The Big Kernel Lock lives on

Posted May 27, 2004 11:24 UTC (Thu) by kunitz (subscriber, #3965) [Link]

Alan Cox, I believe, emphasized: Locks protect data, not threads. As long as two threads don't access the same data, they are not required to share the same lock. Today most of the kernel data is protected by granular locks; however, there is still data protected by the big kernel lock. So finding all the users of the big kernel lock is the easy part; you must then find out which data is actually protected and introduce granular locks to protect that data.

Even in the pre-SMP times you had to lock data against interrupt handlers. Linus simply disabled and enabled interrupts in the critical sections using the infamous cli()/sti() pairs. I believe the simplicity of that solution inspired the big kernel lock.

The Big Kernel Lock lives on

Posted May 27, 2004 11:54 UTC (Thu) by corbet (editor, #1) [Link]

The BKL is a special lock; its purpose still, essentially, is to protect resources not covered by some other lock. Modern code running under the BKL may well take other locks, but it will be unaware of it - the locks will be taken further down the call chain. Once the code itself becomes lock-aware, the need for the BKL should go away.

And yes, it is actually quite important to define the order in which locks are taken. If the same two locks can be taken in either order, the system will eventually deadlock. Lock ordering rules (and, in general, figuring out which locks you need) get to be a real problem as the number of locks grows; people like Larry McVoy have been warning for years that overly fine-grained locking leads to an unmaintainable kernel.

The Big Kernel Lock lives on

Posted May 27, 2004 12:33 UTC (Thu) by brugolsky (subscriber, #28) [Link]

I'm sure that you meant this, but just to clarify: fine-grained locks, in and of themselves, are not the problem. One can lock a list, or lock the individual elements; the choice generally impacts performance. Excessive lock depth (i.e., level of nesting) results in unmaintainable code. It seems to be generally agreed that the cliff lies not far beyond four locks.

The Big Kernel Lock lives on

Posted May 27, 2004 23:43 UTC (Thu) by nix (subscriber, #2304) [Link]

`Seven, plus or minus two'... and since we don't want to restrict kernel maintainership to those who are lucky enough to have big short-term memories, less than five seems a good point to stop.

The Big Kernel Lock lives on

Posted Jun 2, 2004 22:30 UTC (Wed) by shane (subscriber, #3335) [Link]

Not to speak for Mr. Corbet, but I'm pretty sure he actually was
referring to having too many locks. The problem is deadlock: one thread
holding lock A, waiting for lock B; the other holding lock B, waiting for
lock A. This is the simplest example (well, holding A and waiting for A
is simpler, but you get the idea). Any circular chain of references is
possible, and causes the same problem.

This problem is easier to hit when you use many different locks. A
programmer's natural inclination is to lock each resource as you need it.
However, in order to prevent deadlock you should always lock in the same
order, which means that if any thread ever needs lock A and lock B, it
always locks A and then B. This is not always optimal, as lock A
may be held for a period of time when it is not needed.
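
The fixed-order rule can be sketched with POSIX threads. This is an illustrative userspace example, not kernel code: lock_pair(), unlock_pair(), the account structures, and run_transfer_demo() are invented names, and ordering by memory address is just one way of choosing a global order (a documented hierarchy of named locks is another). Two threads move money in opposite directions, so a naive "lock from, then lock to" would let each thread hold one lock while waiting for the other; with lock_pair() both threads always take the two locks in the same order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Lock two distinct mutexes in one global order: lower address first. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/* Unlock order cannot cause deadlock, so no sorting is needed here. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

/* Hypothetical resources, each protected by its own lock. */
struct account {
    pthread_mutex_t lock;
    long balance;
};

static struct account acct_a = { PTHREAD_MUTEX_INITIALIZER, 100 };
static struct account acct_b = { PTHREAD_MUTEX_INITIALIZER, 100 };

struct job {
    struct account *from;
    struct account *to;
};

/* Move one unit at a time from job->from to job->to. */
static void *transfer_loop(void *arg)
{
    struct job *j = arg;

    for (int i = 0; i < 10000; i++) {
        lock_pair(&j->from->lock, &j->to->lock);
        j->from->balance -= 1;
        j->to->balance += 1;
        unlock_pair(&j->from->lock, &j->to->lock);
    }
    return NULL;
}

/* Run two opposing transfer threads; return the combined balance. */
long run_transfer_demo(void)
{
    struct job ab = { &acct_a, &acct_b };
    struct job ba = { &acct_b, &acct_a };
    pthread_t t1, t2;

    pthread_create(&t1, NULL, transfer_loop, &ab);
    pthread_create(&t2, NULL, transfer_loop, &ba);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return acct_a.balance + acct_b.balance;
}
```

The cost mentioned above is visible here too: whichever lock sorts first is held while the thread waits for the second, whether or not it is needed yet.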

The Big Kernel Lock lives on

Posted May 27, 2004 17:54 UTC (Thu) by stuart2048 (subscriber, #6241) [Link]

OK, so the BKL is a big ugly spin lock (or small and simple, depending on your perspective ;-). What about the thousands of smaller grained locks in the kernel (thousands -- really???). I'm curious how they are implemented.

The Big Kernel Lock lives on

Posted May 28, 2004 0:13 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

They're the same kind of spin lock. But because they're small, and consequently not ugly, they are preferable. Small just means only a few things use each one.

There's no reason that one CPU shouldn't access a proc file while another CPU accesses a sound card. But today that can't happen because they both use the BKL. The proc file access uses it to serialize proc file accesses and the sound card uses it to serialize sound card accesses, and as a byproduct they also mutually exclude each other.

The only reason they both use the BKL is programmer laziness. If we find the energy, we can make one lock for proc files and another for sound cards and remove the ugliness. (Actually, I'm sure we would go much finer grained than that).
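
A small userspace sketch of that difference, using POSIX threads: the struct subsystem type and the function names are made up for illustration and do not correspond to anything in the kernel. The helper takes one subsystem's lock and then merely tries the other's; with a single shared lock the attempt fails, while with one lock per subsystem it succeeds, which is what allows two CPUs to proceed in parallel.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* One lock shared by everything, versus one lock per subsystem. */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t proc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sound_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical subsystem: just a pointer to whichever lock protects it. */
struct subsystem {
    pthread_mutex_t *lock;
};

/*
 * Returns 1 if `other` could be entered while `held` is locked,
 * 0 if the two exclude each other.
 */
int can_run_concurrently(struct subsystem *held, struct subsystem *other)
{
    int rc;

    pthread_mutex_lock(held->lock);
    rc = pthread_mutex_trylock(other->lock);   /* EBUSY if already locked */
    if (rc == 0)
        pthread_mutex_unlock(other->lock);
    pthread_mutex_unlock(held->lock);
    return rc == 0;
}

/* Both subsystems behind the one big lock: they exclude each other. */
int coarse_allows_overlap(void)
{
    struct subsystem proc = { &shared_lock };
    struct subsystem sound = { &shared_lock };

    return can_run_concurrently(&proc, &sound);
}

/* Each subsystem behind its own lock: they no longer interfere. */
int fine_allows_overlap(void)
{
    struct subsystem proc = { &proc_lock };
    struct subsystem sound = { &sound_lock };

    return can_run_concurrently(&proc, &sound);
}
```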

I guess I should admit that the BKL isn't really the same as the fine-grained locks because of the BKL's unique property that it gets automatically released across sleeps. It would be even uglier if it didn't do that.

The Big Kernel Lock lives on

Posted May 27, 2004 17:57 UTC (Thu) by iabervon (subscriber, #722) [Link]

I'd guess that most of the uses of the BKL will go away for the purpose of removing the BKL code, or for the purpose of determining and documenting what data each area touches. What exactly does rpciod use that's protected by the BKL? Is that data still protected by the BKL? When somebody redoes the locking wherever that data lives, will they know that rpciod has to be changed accordingly?

The Big Kernel Lock lives on

Posted Jun 15, 2006 12:14 UTC (Thu) by shamalwinchurkar (guest, #38390) [Link]

Would somebody please explain, briefly, whether the write() and read() methods of device drivers are also called by the kernel with the BKL held?

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds