Documenting and enforcing locking requirements
[Posted July 17, 2002 by corbet]
As was discussed last week, one problem with
an increasingly fine-grained kernel is that it becomes difficult to know
which locks, out of thousands, must be held at any given point. Some
functions include documentation on their locking requirements (and
sometimes it's even current), but many others don't. And there is no way
for the code to actually enforce those requirements.
That may be about to change, however. Jesse Barnes, in discussion with
Daniel Phillips and others, has posted a
patch which addresses both problems. A function which expects to be
called with a given lock held simply includes a line like:
    MUST_HOLD_SPIN(&some_lock);
In kernels compiled for production use, this macro expands to nothing and
serves as documentation only - anybody looking at the code sees immediately
that some_lock must be held before calling the function. The
CONFIG_DEBUG_SPINLOCK compilation option gives the macro some
teeth, however: if the given lock is not actually held at that point, the
kernel panics immediately. As a result, erroneous calls are likely to get
fixed in a hurry.
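The idea can be sketched in user space. What follows is only an illustration under stated assumptions, not the code from the posted patch: struct toy_spinlock, toy_spin_lock(), toy_spin_unlock(), TOY_DEBUG_SPINLOCK (standing in for CONFIG_DEBUG_SPINLOCK) and the lock_violations counter are all hypothetical names, and the sketch counts violations where a real kernel would panic.

```c
#include <stdbool.h>
#include <stdio.h>

/* Set to 0 to model a production build, where the macro vanishes.
 * Hypothetical stand-in for CONFIG_DEBUG_SPINLOCK. */
#define TOY_DEBUG_SPINLOCK 1

/* Hypothetical user-space stand-in for a kernel spinlock. */
struct toy_spinlock {
    bool locked;
};

static void toy_spin_lock(struct toy_spinlock *l)   { l->locked = true; }
static void toy_spin_unlock(struct toy_spinlock *l) { l->locked = false; }

/* Counts failed checks; a real kernel would panic instead. */
static int lock_violations;

#if TOY_DEBUG_SPINLOCK
#define MUST_HOLD_SPIN(lock)                                      \
    do {                                                          \
        if (!(lock)->locked) {                                    \
            lock_violations++;                                    \
            fprintf(stderr, "%s:%d: lock %s is not held\n",       \
                    __FILE__, __LINE__, #lock);                   \
        }                                                         \
    } while (0)
#else
/* Production build: documentation only, no generated code. */
#define MUST_HOLD_SPIN(lock) do { } while (0)
#endif

static struct toy_spinlock some_lock;
static int shared_counter;

/* Caller must hold some_lock; the first line says so and checks it. */
static void bump_counter(void)
{
    MUST_HOLD_SPIN(&some_lock);
    shared_counter++;
}
```

Calling bump_counter() between toy_spin_lock(&some_lock) and toy_spin_unlock(&some_lock) passes the check silently; calling it without the lock reports the file and line of the check and increments lock_violations.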
Dave Jones jumped in with a suggestion for
tracking down a related (and common) problem: code which sleeps while
holding a spinlock. Sleeping while holding a lock is against the rules,
since it can cause other processors to spin for a very long time. But it
is easy, while programming the kernel, to call a function which, three
functions later, goes to sleep. Once again, one could try to document the
"can sleep" status of every function and expect programmers to follow that
documentation. But, says Dave, why not just add a line like:
    FUNCTION_SLEEPS();
to any function that can sleep? If the macro is reached while a spinlock
is held, a bug exists. A quick kernel panic would then let the kernel
hackers track down the offending call without delay.
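A rough user-space sketch of how such a check might work follows. It assumes the kernel keeps a count of the spinlocks currently held; the locks_held counter, toy_lock_acquire(), toy_lock_release(), sleep_violations and the three example functions are hypothetical, and the macro body is a guess at the suggestion rather than anything that was posted. As before, the sketch counts violations instead of panicking.

```c
#include <stdio.h>

/* Hypothetical count of spinlocks currently held; a real kernel
 * would need to track this per processor. */
static int locks_held;

/* Counts failed checks; a real kernel would panic instead. */
static int sleep_violations;

static void toy_lock_acquire(void) { locks_held++; }
static void toy_lock_release(void) { locks_held--; }

/* Placed at the top of any function that may sleep. */
#define FUNCTION_SLEEPS()                                          \
    do {                                                           \
        if (locks_held > 0) {                                      \
            sleep_violations++;                                    \
            fprintf(stderr, "%s() may sleep with %d lock(s) held\n", \
                    __func__, locks_held);                         \
        }                                                          \
    } while (0)

/* The function that actually sleeps, three calls down. */
static void wait_for_io(void)
{
    FUNCTION_SLEEPS();
    /* ... would block here ... */
}

static void read_block(void) { wait_for_io(); }

/* The caller has no obvious reason to suspect that this sleeps. */
static void load_inode(void) { read_block(); }
```

Calling load_inode() with no lock held is quiet. Calling it between toy_lock_acquire() and toy_lock_release() trips the check inside wait_for_io(), even though the sleep is two calls away from the code that took the lock.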
Neither of these changes has found its way into a mainline kernel yet. If
they do, though, they could well help in the early detection of a number of
programming errors.