LWN.net Logo

[BUGS] [CHECKER] 99 synchronization bugs and a lock summary database

From:  Yichen Xie <yxie-AT-cs.stanford.edu>
To:  linux-kernel-AT-vger.kernel.org
Subject:  [BUGS] [CHECKER] 99 synchronization bugs and a lock summary database
Date:  Thu, 1 Jul 2004 18:01:00 -0700 (PDT)

Hi all, 

We are a group of researchers at Stanford working on program analysis
algorithms.  We have been building a precision enhanced program analysis
engine at Stanford, and our first application was to derive mutex/lock
behavior in the linux kernel. In the process, we found 99 likely
synchronization errors in linux kernel version 2.6.5:

    http://glide.stanford.edu/linux-lock/err1.html (69 errors)
    http://glide.stanford.edu/linux-lock/err2.html (30 errors)

err1.html consists of potential double locks/unlocks. Acquiring a lock
twice in a row may result in a system hang, and releasing a lock more than
once with certain mutex functions (e.g. up) may cause critical section
violations.

err2.html consists of reports on inconsistent output lock states on
function exit. These errors usually correspond to missed lock operations
on error paths. (filenames in this report correspond to where a function
is declared, so CTAGS may come in handy to find the actual implementation
of the function).

In the error reports, functions are hyperlinked to their derived
summaries, and those of their callees (since the analysis spans function
calls, the error condition of a particular function usually depend on the
locking behavior of its callees).

For example, in function "radeon_pm_program_v2clk" (first error report in
err1.html), the tool flagged an error at line 323 of
"drivers/video/aty/radeon_pm.c". Line 323 invokes two macros, OUTPLL, and
INPLL. OUTPLL acquires "rinfo->reg_lock", and then evaluates "addr", which
is calculated, in this case, by calling _INPLL. By clicking on the link
"drivers/video/aty/radeon_pm.c:radeon_pm_program_v2clk", we can see that
_INPLL requires "rinfo->reg_lock" be unheld on entry (confirmed by looking
at its definition), which is not satisfied in this example. So this is a
double lock error and could potentially lead to a deadlock on MP systems.

We also have a separate web interface to the summary database at:

	http://glide.stanford.edu/linux-lock/

For example, typing "fh_put" in the input box gives

=========
 SUMMARY 
=========
FUNCTION SUMMARY: 'include/linux/nfsd/nfsfh.h:fh_put'
{
  dcache_lock(global): [unlocked -> unlocked]
  fhp(param#0)->fh_dentry->d_lock: [unlocked -> unlocked]
  fhp(param#0)->fh_dentry->d_inode->i_sem: [locked -> unlocked]
}

Each line in the function summary correspond to the requirements and
effects on one particular lock. For example, fh_put requires that the
global lock variable dcache_lock be unheld on entry, and it'll remain
unheld on exit. It also requires fhp->fh_dentry->d_inode->i_sem be held on
entry and it'll release it on exit. (note: please ignore summaries for
lock premitives like spin_lock or down_interruptible; models for these
functions are built into the checker and the derived summaries are not
used).

We have found that some modules in the kernel has functions with
complicated synchronization behavior (esp. in filesystems), and we hope
summaries generated by this tool could be useful not only for bug finding,
but also for documentation purposes as well.

The analysis is intraprocedurally "path sensitive", so it won't be fooled
by cases like

     if (flag & BLOCKING) spin_lock(&l);
     ...
     if (flag & BLOCKING) spin_unlock(&l);

or

     if (!spin_trylock(&l))
	return -EAGAIN;
     ...
     spin_unlock(&l);

The analysis algorithm models values (down to individual bits) and
pointers in the program with a boolean satisfiability solver with high
precision, and we're actively looking for other properties involving
(heavy) data dependencies where naive analysis would fail. Suggestions and
insights from the linux kernel community will be more than welcome!

As always, feedbacks and confirmations will be greatly appreciated!

Best regards,
Yichen Xie
<yxie@cs.stanford.edu>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


(Log in to post comments)

Link to thread?

Posted Jul 8, 2004 11:10 UTC (Thu) by knan (subscriber, #3940) [Link]

Note to LWN: when reporting linux-kernel discussions, a link to the thread in question would be handy for reference.

Link to thread?

Posted Jul 14, 2004 3:06 UTC (Wed) by roelofs (subscriber, #2599) [Link]

Note to LWN: when reporting linux-kernel discussions, a link to the thread in question would be handy for reference.

Seconded. I realize there are a lot of different LKML web interfaces out there, but just about any of them would suffice.

Greg

Link to thread?

Posted Jul 14, 2004 15:02 UTC (Wed) by corbet (editor, #1) [Link]

People request this occasionally, and I'm sympathetic, honest. It is not an easy thing to do, though. Writing the kernel page requires keeping up with over two dozen mailing lists, some of which are very high volume. It can't be done by reading through a web archive. So finding an archive pointer for every posting quoted would be an additional step. The only way that could realistically be done would be to write one less kernel page article each week so I could dig through somebody's archives for the pointers. So far, I have not seen that as a good tradeoff.

Solving this problem is on the list, but the list is long. I do wish I had something more encouraging to say...

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds