How scalable is too much?
[Posted July 10, 2002 by corbet]
In the beginning Alan Cox created the big
kernel lock (BKL), and Linux became SMP-capable. The BKL ensured that only
one processor could be running kernel code at any given time, thus keeping
the processors from stepping on each other. It was an effective way of
bringing SMP support to a kernel which had not been designed for multiple
processors.
The problem with the BKL, of course, is that multiple processors often want
to run concurrently in kernel code. Most of the time, those processors are
working on entirely different tasks and would not interfere with each
other. The more processors you have, the worse the problem gets; the Linux
kernel with just one big lock (i.e. 2.0) really did not function all that
well with more than two processors. Any additional CPUs would just spend
their time waiting to be able to get into the kernel code.
Scalability to larger systems, thus, requires finer locking. The BKL can
be split into a memory management lock, a networking lock, a filesystem
lock, etc. In the 2.1 development series, for example, the block I/O
subsystem adopted its own lock (io_request_lock) to keep the block
code and drivers from getting into trouble. Scalability was improved,
since the block code no longer needed the BKL, and could execute
concurrently with other kernel code.
But the io_request_lock serialized all block request handling. A
process submitting requests for one drive could not run concurrently with a
different process working with a different device. Floppy operations
contended for the same lock as performance-critical disk requests. The I/O
request lock improved scalability, but, once you get enough processors and
drives, it was still a bottleneck. So, one of the first steps in the 2.5
block subsystem work was to replace io_request_lock with a
per-queue lock, one for each device. The result will be better performance
on large, disk-intensive systems.
Most other kernel subsystems have been going through a similar development
process: global locks are replaced by multiple locks which protect smaller
data structures. This increasingly fine-grained locking makes the kernel
scalable to more and more processors, but it also brings some real costs.
For example, most of us do not run Linux on huge systems, and probably
never will. Embedded SMP systems are also rare.
All that locking will have a cost, even though the compiler
optimizes it out on uniprocessor systems.
The real cost, however, is in the complexity of the kernel code. As the
kernel becomes populated with thousands of little locks, it becomes
increasingly difficult to write correct kernel code. Which lock(s) must
you have to access a given data structure, or to call a given function? In
which order should locks be taken? Consider two code paths, both of which
need locks L1 and L2. The first thread takes L1, the second takes L2; each
then tries to take the other lock. The result is a deadlocked system.
Avoiding this problem requires specifying ordering relationships for every lock
in the system - and the number of those relationships grows exponentially
with the number of locks.
One can try to document the locking requirements of each data structure and
function in the kernel, and every lock ordering constraint.
But, even if one honestly believed that such a
document would be created (and, importantly, maintained), it would be a
very thick, complicated manual. A kernel with many locks will be a kernel
that is difficult to program.
Some people (i.e. Larry McVoy) have been arguing for years that Linux
should not chase the "scalability" goal too far. Down that road lies a
kernel that is twisted beyond maintainability, and, once you realize that
this has happened, it is too late to go back. For the most part,
scalability work has continued in the face of those warnings, but there are
signs that things are beginning to change. For example, a recent patch which removed the BKL from the driverfs
code was shouted down in a fairly strong way. Alexander Viro stated, in characteristic fashion:
"Zillion little spinlocks" means that kernel is scaled into
oblivion. Literally. If you want to play with resulting body -
feel free, but I like it less kinky.
So, while there has been no definitive statement of policy, it looks like
at least some kernel developers are thinking that locking in the kernel is
complex enough. There may be no 64-processor Linux in our future...
...at least, not in the classic SMP form. Larry McVoy has been pushing "cache-coherent clusters" as an alternative
approach for some time. A CC/cluster takes a large machine and divides it
into small group of (four, say) processors; each group runs an independent
Linux kernel. The kernels have minimal interactions with each other, so
locking issues fade to the background. Nobody has, yet, implemented such a
cluster, though a lot of the pieces are there. If somebody runs with this
idea, Linux could yet be the most scalable system of them all.
(
Log in to post comments)