
Documenting and enforcing locking requirements

As was discussed last week, one problem with an increasingly fine-grained kernel is that it becomes difficult to know which locks, out of thousands, must be held at any given point. Some functions include documentation on their locking requirements (and sometimes it's even current), but many others don't. And there is no way for the code to actually enforce those requirements.

That may be about to change, however. Jesse Barnes, in discussion with Daniel Phillips and others, has posted a patch which addresses both problems. A function which expects to be called with a given lock held simply includes a line like:

    MUST_HOLD_SPIN(&some_lock);

In kernels compiled for production use, this macro expands to nothing and serves as documentation only - anybody looking at the code sees immediately that some_lock must be held before calling the function. The CONFIG_DEBUG_SPINLOCK compilation option gives the macro some teeth, however: if the given lock is not actually held at that point the kernel panics immediately. The end result is that erroneous calls are likely to get fixed in a hurry.
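Below is a minimal user-space sketch of how a MUST_HOLD_SPIN-style check could behave. It is not Jesse Barnes's patch. It assumes a toy lock type that records only whether it is held. The names model_spinlock, model_spin_lock(), counter_lock and counter_inc_locked() are hypothetical. A plain DEBUG_SPINLOCK preprocessor switch stands in for CONFIG_DEBUG_SPINLOCK, and abort() stands in for a kernel panic.

```c
#include <stdio.h>
#include <stdlib.h>

/* Remove this line for the "production" behaviour. Assumed stand-in
 * for CONFIG_DEBUG_SPINLOCK. */
#define DEBUG_SPINLOCK 1

/* Hypothetical toy lock: only records whether it is currently held. */
struct model_spinlock {
    int locked;
};

static void model_spin_lock(struct model_spinlock *l)   { l->locked = 1; }
static void model_spin_unlock(struct model_spinlock *l) { l->locked = 0; }
static int model_spin_is_locked(const struct model_spinlock *l)
{
    return l->locked;
}

#ifdef DEBUG_SPINLOCK
/* Debug build: stop at once if the lock is not held. */
#define MUST_HOLD_SPIN(l)                                          \
    do {                                                           \
        if (!model_spin_is_locked(l)) {                            \
            fprintf(stderr, "%s:%d: %s() called without lock\n",   \
                    __FILE__, __LINE__, __func__);                 \
            abort(); /* stands in for panic() */                   \
        }                                                          \
    } while (0)
#else
/* Production build: documentation only, generates no code. */
#define MUST_HOLD_SPIN(l) do { } while (0)
#endif

static struct model_spinlock counter_lock;
static int counter; /* protected by counter_lock */

/* The locking rule, stated in code: caller must hold counter_lock. */
static void counter_inc_locked(void)
{
    MUST_HOLD_SPIN(&counter_lock);
    counter++;
}
```

If counter_inc_locked() is called without first calling model_spin_lock(&counter_lock), it prints the file, line and function name and then aborts. This toy check only asks whether the lock is held by anyone. A stricter version would also record the owning CPU or thread.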

Dave Jones jumped in with a suggestion for tracking down a related (and common) problem: code which sleeps while holding a spinlock. Sleeping while holding a lock is against the rules, since it can cause other processors to spin for a very long time. But it is easy, while programming the kernel, to call a function which, three functions later, goes to sleep. Once again, one could try to document the "can sleep" status of every function and expect programmers to follow that documentation. But, says Dave, why not just add a line like:

    FUNCTION_SLEEPS();

to any function that can sleep? If the macro is called while a lock is held, a bug exists. A quick kernel panic will allow the kernel hackers to track down the offending call in a hurry.
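Here is one possible sketch of a FUNCTION_SLEEPS()-style check. It assumes that every spinlock acquisition increments a per-thread counter. A C11 _Thread_local variable stands in for per-CPU or per-task state. The names tracked_lock, wait_for_device() and read_config() are hypothetical. The check fires inside the function that sleeps, so a caller several levels up is caught even though it never sleeps directly.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical count of spinlocks held by the current thread. */
static _Thread_local int spinlocks_held;

struct tracked_lock {
    int locked;
};

static void tracked_spin_lock(struct tracked_lock *l)
{
    l->locked = 1;
    spinlocks_held++;
}

static void tracked_spin_unlock(struct tracked_lock *l)
{
    l->locked = 0;
    spinlocks_held--;
}

static int sleeping_is_safe(void)
{
    return spinlocks_held == 0;
}

/* Place at the top of any function that may sleep. */
#define FUNCTION_SLEEPS()                                              \
    do {                                                               \
        if (!sleeping_is_safe()) {                                     \
            fprintf(stderr, "%s() may sleep, but %d spinlock(s) held\n", \
                    __func__, spinlocks_held);                         \
            abort(); /* stands in for panic() */                       \
        }                                                              \
    } while (0)

/* Hypothetical function that would block, e.g. waiting for I/O. */
static int wait_for_device(void)
{
    FUNCTION_SLEEPS();
    /* ... the real code would sleep here ... */
    return 0;
}

/* Never sleeps itself, but reaches a sleeping function indirectly. */
static int read_config(void)
{
    return wait_for_device();
}
```

With a tracked_lock held, calling read_config() aborts with a message that names wait_for_device(). With no locks held, it returns normally.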

Neither of these changes has found its way into a mainline kernel yet. If they do, though, they could well help in the early detection of a number of programming errors.



Debug assertions are great

Posted Jul 18, 2002 17:23 UTC (Thu) by cpeterso (subscriber, #305)

I think this is a great idea! Many projects (including FreeBSD) use debug assertion statements to help find bugs and to document and enforce assumptions in code. I've read that Linus doesn't like assertions, thinking that lazy developers just "code around" them instead of understanding the assumptions the assertions are documenting. Now that the Linux kernel is getting so complicated (with many locks and a preemptible kernel), developers need all the help they can get in testing their code.

Microsoft has a tool called Driver Verifier that helps developers stress-test their device drivers. It forces the NT kernel to do extra (and time-consuming) error checking, such as checking that driver functions run at the appropriate IRQL (interrupt request level) and simulating memory allocation failures. Since Linux's device driver APIs are constantly shifting (compared to NT's documented APIs), assertions like MUST_HOLD_SPIN() are important as "active" documentation.

Documenting and enforcing locking requirements

Posted Jul 19, 2002 18:58 UTC (Fri) by Lorenzo (guest, #260)

In the topic: Documenting and enforcing locking requirements ... the paragraph:

As was discussed last week, one problem with an increasingly fine-grained kernel is that it becomes difficult to know which locks, out of thousands, must be held at any given point. Some functions include documentation on their locking requirements (and sometimes it's even current), but many others don't. And there is no way for the code to actually enforce those requirements.

... just sends shivers down my spine, because it shows a complete misunderstanding of the fundamental principles of locking. To wit: "Some functions include documentation on their locking requirements ...".

The whole concept of "locking code" is broken as designed. One should lock data, not code. It is the data that is vulnerable to concurrent access and update, not the code. Once an engineer/programmer grasps that fundamental concept, the problem becomes more manageable.

Thanks. I feel much better now.
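The "lock data, not code" idea is often expressed by embedding the lock in the structure it protects, so that the data and its lock travel together. The following is a minimal sketch using POSIX threads (older toolchains may need -pthread when linking). struct account and its functions are hypothetical. account_deposit() still carries a locking rule in its interface: it takes a->lock itself, so a caller that already holds that lock would deadlock.

```c
#include <pthread.h>

/* Hypothetical structure: the mutex protects the balance field. */
struct account {
    pthread_mutex_t lock;
    long balance; /* protected by lock */
};

static void account_init(struct account *a)
{
    pthread_mutex_init(&a->lock, NULL);
    a->balance = 0;
}

/* Takes a->lock itself: callers must NOT already hold it. */
static void account_deposit(struct account *a, long amount)
{
    pthread_mutex_lock(&a->lock);
    a->balance += amount;
    pthread_mutex_unlock(&a->lock);
}

/* Reads the balance under the same lock that guards writes. */
static long account_balance(struct account *a)
{
    long b;

    pthread_mutex_lock(&a->lock);
    b = a->balance;
    pthread_mutex_unlock(&a->lock);
    return b;
}

static void account_destroy(struct account *a)
{
    pthread_mutex_destroy(&a->lock);
}
```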

Documenting and enforcing locking requirements

Posted Jul 20, 2002 14:53 UTC (Sat) by corbet (editor, #1)

I hate to say it, but your author is not entirely clueless.

In this case, there are many kernel functions that expect to be called with specific locks already held. Yes, the lock is protecting data (or hardware resources, etc.), but the function needs to have access to said data. If a function tries to take out a lock that the thread already holds, it will spin for, shall we say, a very long time. There could also be a problem with lock ordering requirements.

Thus, "locks held" is part of the interface to a Linux kernel function whether you like it or not.
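A simple debug check can illustrate the lock-ordering problem mentioned above. Each lock gets a rank, and the check complains when a thread acquires a lock whose rank is not higher than that of the last lock it still holds. Two paths that take the same pair of locks in opposite orders can deadlock. A rank check flags the offending path by itself, without the deadlock ever having to occur. Taking the same lock twice is also flagged. The ranks, the ranked_lock type and the function names are hypothetical. The sketch also assumes that locks are released in reverse order of acquisition.

```c
#include <stdio.h>

#define MAX_HELD 16

/* Hypothetical lock with a fixed position in a global lock order. */
struct ranked_lock {
    const char *name;
    int rank;   /* locks must be taken in increasing rank order */
    int locked;
};

/* Per-thread stack of ranks currently held (assumes LIFO release). */
static _Thread_local int held_ranks[MAX_HELD];
static _Thread_local int nheld;

/* Returns 0 on success, -1 if taking l would violate the lock order
 * (or if the tracking stack is full). */
static int ranked_lock_acquire(struct ranked_lock *l)
{
    if (nheld > 0 && l->rank <= held_ranks[nheld - 1]) {
        fprintf(stderr, "lock order violation: %s (rank %d) after rank %d\n",
                l->name, l->rank, held_ranks[nheld - 1]);
        return -1;
    }
    if (nheld == MAX_HELD)
        return -1;
    held_ranks[nheld++] = l->rank;
    l->locked = 1;
    return 0;
}

/* Release the most recently acquired lock. */
static void ranked_lock_release(struct ranked_lock *l)
{
    l->locked = 0;
    nheld--;
}
```

With "outer" at rank 1 and "inner" at rank 2, taking outer then inner succeeds. Taking inner then outer, or outer twice, is reported and refused.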

Documenting and enforcing locking requirements

Posted Jul 21, 2002 23:40 UTC (Sun) by Lorenzo (guest, #260)

Good programming practice also suggests that you should never leave a function while holding a lock. But you do what you have to do.
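When a function has several exit paths, a common way to avoid leaving it with a lock held is to route every exit through a single unlock label, a style widely used in kernel C. The following is a minimal sketch using POSIX threads (older toolchains may need -pthread). table_lock, table[] and table_insert() are hypothetical.

```c
#include <errno.h>
#include <pthread.h>

#define TABLE_SIZE 4

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table[TABLE_SIZE]; /* protected by table_lock; 0 means empty */

/* Insert a nonzero key. Every return taken after the lock is acquired
 * goes through out_unlock, so the function never returns holding it. */
static int table_insert(int key)
{
    int i, ret;

    if (key == 0)
        return -EINVAL;        /* checked before locking: nothing to undo */

    pthread_mutex_lock(&table_lock);
    for (i = 0; i < TABLE_SIZE; i++) {
        if (table[i] == key) {
            ret = -EEXIST;
            goto out_unlock;
        }
    }
    for (i = 0; i < TABLE_SIZE; i++) {
        if (table[i] == 0) {
            table[i] = key;
            ret = 0;
            goto out_unlock;
        }
    }
    ret = -ENOSPC;
out_unlock:
    pthread_mutex_unlock(&table_lock);
    return ret;
}
```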

Using assert-checks for double locks is also useful

Posted Jul 29, 2002 23:48 UTC (Mon) by cdurst (guest, #2953)

Having just been debugging some (application) thread locking code, I wanted to mention that I found it extremely useful to also add asserts to make sure that I wasn't setting the same lock twice, or releasing it more than once.

That change immediately pointed out a few places where a function that required a prior lock was inadvertently calling another function that did not assume that it was locked and tried to lock it again.

Some programs may consider that OK and would just nest the locks, but in my particular case this was VERY BAD and led to the locks being released too early.
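With POSIX threads, the checks described above can come from the mutex itself when it is created with the PTHREAD_MUTEX_ERRORCHECK type. If the owning thread tries to lock the mutex again, pthread_mutex_lock() returns EDEADLK instead of hanging. If a thread unlocks a mutex it does not own, pthread_mutex_unlock() returns EPERM. The following is a minimal sketch. The helper names are hypothetical, and a debug build might assert on these return codes rather than return them. The feature-test macro makes the mutex-type API visible on glibc, and older toolchains may need -pthread.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Create a mutex that reports misuse instead of deadlocking. */
static int checked_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int ret;

    ret = pthread_mutexattr_init(&attr);
    if (ret != 0)
        return ret;
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (ret == 0)
        ret = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Returns 1 if both a double lock and a double unlock are reported. */
static int misuse_is_detected(void)
{
    pthread_mutex_t m;
    int ok;

    if (checked_mutex_init(&m) != 0)
        return 0;
    pthread_mutex_lock(&m);
    ok = pthread_mutex_lock(&m) == EDEADLK;       /* same thread relocks */
    pthread_mutex_unlock(&m);
    ok = ok && pthread_mutex_unlock(&m) == EPERM; /* released twice */
    pthread_mutex_destroy(&m);
    return ok;
}
```

Programs that really do want nested locking can request PTHREAD_MUTEX_RECURSIVE instead. That type counts nested acquisitions, and the mutex is released only when the count returns to zero.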

Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds