Finding locking bugs with Smatch

By Daroc Alden
June 11, 2025

Smatch is a GPL-licensed static-analysis tool for C that has a lot of specialized checks for the kernel. Smatch has been used in the kernel for more than 20 years; Dan Carpenter, its primary author, decided last year that some details of its plugin system were due for a rewrite. He spoke at Linaro Connect 2025 about his work on Smatch, the changes to its implementation, and how those changes enabled him to easily add additional checks for locking bugs in the kernel.

Video of the talk is available, and Carpenter's slides can be found on Linaro's website. Carpenter began by apologizing for the relative complexity of this talk, compared to some of his presentations about Smatch in prior years. "We're running out of easy checks to write," he explained. Smatch is designed to permit writing project-specific checks; over the years, a large number of kernel-specific checks have been added to the code, so the latest work has moved on to more complicated topics, such as locking.

One of the things that sets Smatch apart from other static-analysis tools, Carpenter said, is its support for control-flow analysis and cross-function analysis. He frequently uses both of these features to understand new subsystems; Smatch can "tell you where a variable is set, where callers of a function are, and what a function can return," among other things. For example, Smatch might show that a particular function has three callers, all of which hold a particular lock when they call it. From that, the programmer can infer the implicit locking requirements of the function.

That kind of report requires cross-function analysis, to trace nested calls with a lock held, but it also requires some starting knowledge of which functions acquire or release a lock. In the kernel, Smatch obtains that information from a hand-written table of every lock-related function, and then propagates that information through the different call trees. Rebuilding the complete database of calls takes multiple passes and five to six hours, which he said he does every night.

Once the database is built, however, it allows files to be easily checked for common locking problems. The most common mistake is to fail to unlock a lock in a function's error path, he said. Smatch finds places where this has occurred by looking for functions that acquire a lock, and then have some possible control-flow path along which it is not released. That is slightly more complicated than it sounds.

"It's harder than you might think to know what the start state is," he explained. If a function makes a call to spin_lock(), one might reasonably assume that the lock was not held before that point. But some functions behave differently depending on whether a lock is already held, so that is control-flow-sensitive as well. Also, sometimes a lock is referred to by multiple names, being locked by one name and unlocked via another. This complexity had resulted in Smatch's lock-tracking code slowly becoming an unreadable mess. "And I'm the one who wrote it."

So, in the past year, Carpenter has rewritten everything. The reimplementation of the locking checks provides a blueprint for how to write modular Smatch checks, he said. Checks can now call add_lock_hook() and add_unlock_hook() to be informed when Smatch finds that a function call acquires or releases a lock somewhere in its call tree. Locks are also now tracked by type, instead of by name, in order to reduce problems with one lock being referred to by more than one name.

There's a slight wrinkle with code that uses C's cleanup attribute to automatically unlock locks. On the one hand, it mostly eliminates bugs related to forgotten unlocks; on the other hand, it's a "headache for static analysis" since it makes the lock object harder to access and track. Ultimately, since they avoid many locking bugs, Smatch can "mostly just ignore them".

Carpenter has used the new structure to write checks for double lock and unlock bugs as well. Unlike other static-analysis projects, Smatch focuses less on "universal static properties" and more on "the actual bugs people are writing". Smatch will not catch every possible double lock, double unlock, or forgotten unlock. That increases the number of false negatives from the tool, but it results in a much bigger reduction in false positives, he said.

I asked how he found the classes of bugs that people actually write, in order to target them with Smatch checks. He explained that he reviews patches sent to the linux-stable mailing list in order to find bugs that could have been found earlier with static analysis. He encouraged other people to try the same thing, as he has found it educational.

In the future, Carpenter wants to extend Smatch's double-lock checks to operate across function boundaries, to take advantage of Clang's upcoming support for tracking the relationship between locks and their data, and to handle lock-ordering bugs.

As time wound down, one member of the audience wanted to know how Smatch compared to Cppcheck, Coccinelle, and other static-analysis tools. Other open-source tools do not have good control-flow analysis, and practically no cross-function analysis, Carpenter said. Smatch does those things better, but it has its own weaknesses. The main problem is that it hasn't really been tested outside the kernel, he explained, so it's not clear how well Smatch will handle other styles of code. Smatch is also relatively slow.

Coccinelle is fast, and can generate fixes for many problems. Sparse is good at finding endianness bugs and user-space pointers dereferenced in kernel space. "But in terms of flow analysis, Smatch is the only tool that we have in open source."

After Carpenter's talk, I ran the tool against my own copy of the kernel; in that process, I learned that Smatch is best run from source. The last release predates Carpenter's rewrite, and there are a number of useful scripts included in the source distribution that are not present in the distribution packages. Smatch's terse documentation covers how to build its analysis database and run existing checks, but not how to query the database in a more free-form manner. The smatch_data/db/smdb.py script included in the source distribution can be used for that purpose.

[Thanks to Linaro for sponsoring my travel to Linaro Connect.]

Index entries for this article
Kernel	Development tools/Static analysis
Conference	Linaro Connect/2025

RCU and function pointers

Posted Jun 11, 2025 22:30 UTC (Wed) by neilbrown (subscriber, #359) [Link] (5 responses)

A question for Dan, who I'm sure is following...
Does your current code track RCU read locking as well as the more normal locks? And does it follow flow across function pointers in operations structs?
So could you, for example, query the database to see what inode_operations are called without rcu_read_lock held, and then which implementations of those functions use rcu_dereference as though the lock were held?

RCU and function pointers

Posted Jun 12, 2025 7:09 UTC (Thu) by error27 (subscriber, #8346) [Link] (4 responses)

Hi Neil,

> Does your current code track RCU read locking as well as the more normal locks?

Yes.

> And does it follow flow across function pointers in operations structs?

Yep. When you call a function pointer, then it's recorded as "(struct inode_operations)->atomic_open" in the caller_info table. There is a function_ptr table which lists the functions implement that. The command `smdb.py functions inode_operations atomic_open` gives the list. So I could do:

$ echo "select distinct(ptr) from function_ptr;" | sqlite3 smatch_db.sqlite | grep inode_operations | tee ptrs
$ IFS="
" ; for i in $(cat ptrs) ; do smdb.py $i ; done | grep LOCK

I ran this and it says nothing is holding the rcu_read lock... Huh. When I run "echo "select * from caller_info where type = 9030;" | sqlite3 smatch_db.sqlite | grep rcu_read_lock" it shows that it's being recorded in other places. One thing is that I'm not currently using the caller information to print warnings. It's stored in the database, but I haven't actually used that information yet so I haven't debugged it.

> So could you, for example, query the database to see what inode_operations are called without rcu_read_lock held, and then which implementations of those functions use rcu_dereference as though the lock were held?

Yeah. That's *supposed* to be easy but since I haven't debugged that part I'm probably going to run into surprises. What I do is I start with the simplest, stupidest test and then re-write it from there. So my first draft will be to complain if we call rcu_read_lock_held() and we're not holding the RCU read lock. Then from I'll go through and silence the false positives. There might be places where we check if the read lock is held and don't immediately print an error message, right? It's a stupid check. But the first draft is never going to work anyway. Maybe I'll print a different warning if we call rcu_read_lock_held() and Smatch thinks it can't possibly be held and I wouldn't publish that, but it might be useful for debugging.

Probably the actual check is to only complain if we call rcu_read_lock_held() from a rcu_dereference*() macro.

RCU and function pointers

Posted Jun 20, 2025 16:12 UTC (Fri) by error27 (subscriber, #8346) [Link] (3 responses)

I did try this and I ran into a bunch of issues...

It turns out that rcu_read_locks are nestable and the locking code doesn't handle that correctly. I might be able to copy and paste the Smatch preempt code to handle this. It's slightly different because the preempt code warns if we ever have preempt disabled and in this case it's the opposite where we want to want if we're ever missing the lock. You want to lean towards which ever way gives fewer false positives. There are places where recursion could confuse it for example. Writing special modules to track rcu_read_lock() instead or re-using the smatch_locking.c code is a lot of typing but it's doable.

I'm a bit confused by rcu_dereference_rtnl(). If we're holding the rtnl_lock(), does that mean the rcu_read_lock() is held?

RCU and function pointers

Posted Jun 20, 2025 20:37 UTC (Fri) by johill (subscriber, #25196) [Link]

as for rcu_dereference_rtnl():

 * rcu_dereference_rtnl - rcu_dereference with debug checking
 * @p: The pointer to read, prior to dereferencing
 *
 * Do an rcu_dereference(p), but check caller either holds rcu_read_lock()
 * or RTNL. Note : Please prefer rtnl_dereference() or rcu_dereference()

You need either, not both. Both would be a bit silly. There are a a lot of calls to rcu_dereference_check() elsewhere which are basically "either RCU or the lock protecting the thing". The reason (IMHO) it says rtnl_dereference() or rcu_dereference() is preferable is that then you have selected either one of the contexts (holding the lock - RTNL in this case - or under RCU protection), but there's always going to be some code that doesn't know the context in whatever call stack it has.

RCU and function pointers

Posted Jun 20, 2025 22:34 UTC (Fri) by neilbrown (subscriber, #359) [Link] (1 responses)

> Writing special modules to track rcu_read_lock() instead or re-using the smatch_locking.c code is a lot of typing but it's doable.

Thanks for digging in to this! Don't try too hard on my account. It was little more than an idle thought because I had seen a d_compare function which uses rcu_dereference but can sometimes be called without rcu_read_locking() - but in practice almost never is (proc_sys_compare) and I wondered how hard it would be for smatch to notice. It seems "not impossible but not trivial" which is a good answer. In this case the anomaly can actually be found simply using sparse as the pointer in question is not marked __rcu.

BTW I tend to think of rcu_read_lock() like a refcount rather than like a read-lock. It is a refcount on "everything". So maybe copying smatch_locking.c wouldn't be the best approach. Does smatch track how many references the code owns to objects? Or does that sound too much like borrow-checking? Could it track references to the RCUniverse? Would that be any easier?

Thanks!

RCU and function pointers

Posted Jun 23, 2025 10:39 UTC (Mon) by error27 (subscriber, #8346) [Link]

> BTW I tend to think of rcu_read_lock() like a refcount rather than like a read-lock.

This is like preempt. Every spinlock bumps the refcount and people often hold multiple spinlocks at the same time.

With preempt the return states are tracked the same way described here. The one subtlety is that if you take multiple locks it only counts as one refcount bump. It doesn't matter the exact number of refcounts we're holding, only that it's greater than one and that the inc/dec functions pair correctly. So it's simpler to just say +1.

For the caller side, the preempt count is tracked as a number. But again it's not an exact number. If any of the callers are holding one or more spinlocks then that's +1. If we take another spinlock then we pump the count. Say we were holding two locks so we started as +1 and then we drop both locks then we'd go to negative and that's fine too. I guess we might miss a bug but that's not as bad as a false positive.

The difference between preempt count and rcu_read_lock() is that for preempt we only need to find one caller which has preempt disabled, but for rcu_read_lock() we need to find all the callers, otherwise it triggers a false positive. It's a much higher bar.

The way that recursion works in Smatch means we're always going to miss some callers. I guess one idea is that I could periodically delete all the simple recursive calls from the database where a function calls itself? It doesn't solve the halting problem but it probably silences quite a few false positives. But the main thing is that I'd have to create a list of functions which need to be manually silenced. There are about 400 warnings so it's probably a day's work to silence all the warnings.