Capability analysis for the kernel
The Clang feature is based on the concept of "capabilities" that a program can be determined — at compile time — to hold (or not) at any given point. A capability is typically the address of a data structure; for example, the address of a specific spinlock can be designated as a capability that a program can acquire with a lock operation. Functions can be annotated to indicate that they acquire or release a capability; developers can also indicate that callers of a function must hold (or not hold) a specific capability.
Adding analysis to the kernel
Bart Van Assche posted a patch series on February 6 showing how Clang's thread-safety feature could be used with the kernel's mutex type. The core of this work can be found in this patch, which annotates the various mutex-related functions; for example, the prototypes for mutex_lock() and mutex_unlock() are modified to be:
    void mutex_lock(struct mutex *lock) ACQUIRE(*lock);
    void mutex_unlock(struct mutex *lock) RELEASE(*lock);
The first line says that a call to mutex_lock() will gain a capability in the form of the pointed-to mutex, while calling mutex_unlock() will give up that capability. The ACQUIRE() and RELEASE() macros are built on top of Clang's lower-level macros; there are quite a few other macros in the set. With that infrastructure in place, any function that requires a specific mutex to be held can be annotated accordingly; for example:
    static struct devfreq_governor *try_then_request_governor(const char *name) REQUIRES(devfreq_list_lock);
The compiler will then warn on any call to that function if the possession of the indicated lock (devfreq_list_lock) cannot be determined. There is also a series of macros with names like GUARDED_BY() to document that access to specific data (a structure member, for example) requires that a certain mutex be held. Those macros are not actually used in the posted series, though.
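As a rough illustration (this is an invented example, not code from the posted series), a guarded structure member and its accessors might look something like this, with GUARDED_BY() wrapping Clang's underlying guarded_by attribute:

    struct sample {
            struct mutex lock;
            /* Must only be read or written while lock is held. */
            int counter GUARDED_BY(lock);
    };

    static void sample_inc(struct sample *s)
    {
            mutex_lock(&s->lock);   /* acquires the capability */
            s->counter++;           /* no warning: the capability is held */
            mutex_unlock(&s->lock); /* releases it again */
    }

    static void sample_inc_buggy(struct sample *s)
    {
            s->counter++;           /* would draw a -Wthread-safety warning */
    }

Since the analysis is purely static, the unlocked access in the second function would be flagged at compile time; nothing changes in the generated code.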
Van Assche's patch set is focused on the mutex type, and attempts to annotate usage throughout the entire kernel (though many of the annotations are NO_THREAD_SAFETY_ANALYSIS, which disables the analysis because the locking is too complicated for Clang to figure out — a situation that arises frequently). This work culminates in a massive patch touching over 800 files, which is a significant amount of code churn. The work has already found a number of locking bugs, the fixes for which are included in the series.
An alternative approach
On the same day, Marco Elver posted a patch set of his own with a slightly different approach to using the same feature; that series has since been updated, adopting the term "capability analysis" in place of "thread-safety analysis". While Van Assche used a breadth-first approach with mutexes, Elver has gone depth-first with a series that adds annotations for several locking primitives, but which is only active in subsystems that explicitly opt into it. In that way, warnings can be turned on for code where the maintainers and developers are interested in them (and will act on them), while being left off for the rest of the kernel.
The syntax of the annotations is a little different from Van Assche's approach, but the intent is clearly the same:
    void mutex_lock(struct mutex *lock) __acquires(lock);
    void mutex_unlock(struct mutex *lock) __releases(lock);
Elver's series, though, goes beyond mutexes to add annotations for spinlocks, reader-writer locks, seqlocks, single-bit spinlocks, read-copy-update, local locks, and wait/wound mutexes. In many cases, the annotations that already exist for the kernel's locking correctness validator (lockdep) have been reworked to add the needed capability declarations. There is a __guarded_by() annotation to document the lock that must be held to access a specific structure member; its use can be seen in this patch instrumenting the kfence subsystem. The capability_unsafe() marker disables checking for a block of code. Most of the new annotations, along with documentation, can be found in this patch.
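Again as an invented sketch (the annotation names come from Elver's series, but the exact usage shown here, particularly of capability_unsafe(), is an assumption based on the posted documentation rather than a copy of it), guarding a global counter with a spinlock might look like:

    static DEFINE_SPINLOCK(stats_lock);
    static unsigned long stats_count __guarded_by(&stats_lock);

    static void stats_inc(void)
    {
            spin_lock(&stats_lock);
            stats_count++;                  /* checked: stats_lock is held */
            spin_unlock(&stats_lock);
    }

    static unsigned long stats_peek(void)
    {
            unsigned long v;

            /* A deliberately unlocked, racy read; opt this access out of the analysis. */
            capability_unsafe(v = READ_ONCE(stats_count));
            return v;
    }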
The other difference found in Elver's approach is the explicit opt-in design, which allows each subsystem to enable or disable the feature. By default, any given subsystem will not be covered by this analysis; that can be changed by adding one or more lines to the subsystem's makefile:
    CAPABILITY_ANALYSIS := y
    CAPABILITY_ANALYSIS_foo := y
The first line will enable analysis for all code compiled by way of that makefile, while the second will enable it only for the compilation creating foo.o. The patch set enables the feature for a number of kernel subsystems, including debugfs, kernel fences, rhashtables, tty drivers, the TOMOYO security module, the crypto subsystem, and more.
What next?
It would appear that the community has an embarrassment of riches here: two competing patch sets that aim to use the same compiler feature to improve correctness checking within the kernel. Either series can increase confidence that locking is being handled correctly, and both work entirely at compile time, with no run-time overhead. The reception for this work has been quite positive, with the only open question seemingly being which series would be accepted, or whether the kernel might adopt a combination of the two.
There are no definitive answers to that question, but a clue can be found in the fact that Van Assche has been posting comments (and Reviewed-by tags) for Elver's patch set. Peter Zijlstra has also tried his hand with Elver's work in the scheduler subsystem, saying that "this is all awesome". That attempt pointed to some needed changes; it seems Zijlstra also managed to crash the Clang compiler. He later pointed out that the capability analysis works in simple cases, but for "anything remotely interesting, where you actually want the help, it falls over".
Real use in the kernel (and beyond) may well help to drive development work in the Clang community to improve this analysis feature to the point that it can be routinely used to verify locking patterns. Some of that work may need to happen before support for this kind of capability analysis can be added to the kernel. But, since it is an opt-in, compile-time feature, there may well be value in adding it relatively soon, even if it needs further work. The kernel community seems to be hungry for this kind of support.
Index entries for this article
Kernel: Development tools/Static analysis
Posted Mar 10, 2025 14:55 UTC (Mon)
by kleptog (subscriber, #1183)
In my experience it's often the simple cases that need the extra checking because they're so "obvious" that people don't pay enough attention, especially during refactors.
The complex cases tend to get more thoroughly examined.
Posted Mar 10, 2025 14:59 UTC (Mon)
by andresfreund (subscriber, #69562)
I wish the relevant analysis infrastructure in llvm/clang would improve, unfortunately I haven't seen much advancement in the last few years...
Posted Mar 10, 2025 18:22 UTC (Mon)
by atnot (subscriber, #124910)
Posted Mar 10, 2025 18:37 UTC (Mon)
by andresfreund (subscriber, #69562)
Posted Mar 11, 2025 10:46 UTC (Tue)
by khim (subscriber, #9252)
> IIRC it's enough to have
Indeed, that's the code that I would ask to refactor: put the inner part into a function (or a closure), call it between acquire_lock and release_lock unconditionally. Can be trivially handled in any language with generics or templates. Don't make the reader play puzzle games in their head while reading that code! These are complicated enough that most developers (although probably not most kernel developers) would think about ways to abstract all that machinery and put it into simple, easy-to-digest code.
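Concretely, the suggested refactoring amounts to something like the following sketch, reusing the placeholder names from the need_lock example discussed in this subthread (do_the_update() and update_maybe_locked() are invented names for illustration):

    static void do_the_update(void)
    {
            simple_code_like_an_assignment = 1;
    }

    static void update_maybe_locked(bool need_lock)
    {
            /* The acquire/release pair is now unconditional around the helper call. */
            if (need_lock) {
                    acquire_lock();
                    do_the_update();
                    release_lock();
            } else {
                    do_the_update();
            }
    }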
Posted Mar 10, 2025 18:42 UTC (Mon)
by pm215 (subscriber, #98099)
Posted Mar 10, 2025 21:58 UTC (Mon)
by proski (subscriber, #104)
And then let's see why the lock is needed. In Rust, objects are not locked just in case, they are locked to be accessed for reading or writing. If there is no apparent access to the object being locked, perhaps that access is implicit and should be made explicit.
Posted Mar 11, 2025 1:57 UTC (Tue)
by kschendel (subscriber, #20465)
Posted Mar 11, 2025 11:05 UTC (Tue)
by khim (subscriber, #9252)
> IMO this is too close to making everything a nail, since one has a hammer ...
That's what the “new generation” writes and, while the question of efficiency exists, it's OK for 90% of the code. These debates happened decades ago and the result is: 90% of the time complicated control flow is… just not worth it. And in 90% of the remaining 10% the compiler can handle it just fine. The remaining 1% of code can be reviewed to death with every line discussed (and maybe even commented!) separately. You can do conditional locking in Rust, too. What you don't do is conditional unlocking. Something like this:

    // We probably have Weak reference to parent so need to upgrade.
    let maybe_parent = dev.parent.upgrade();
    let guard;
    // If we have parent then we need to lock it.
    if let Some(parent) = maybe_parent {
        guard = parent.lock();
    }
    device_release_driver(dev);

Note that there is no unlock and there is no need for an unlock; there is literally nothing to analyze. Although the whole thing looks extremely suspicious to me: if we are not planning to touch parent then why do we lock it? If we do plan to touch it then why are we not passing it into that function? And why do we have a strange function that may or may not access parent in the first place? There would be no need for the lock if it doesn't access parent, and it may obviously not access it if it's not there, so this function by definition is already very strange.
Posted Mar 12, 2025 22:35 UTC (Wed)
by neggles (subscriber, #153254)
We *are* planning to touch parent - we're about to remove one of its children, so we lock it to prevent another task from changing it while we do that.
A number of device classes may or may not have a parent device, depending on how they're attached to the system.
Posted Mar 12, 2025 22:57 UTC (Wed)
by khim (subscriber, #9252)
> We *are* planning to touch parent
No. We are not doing that. Instead we hope that someone else would reach out to touch our parent to detach us from it. While expecting that we hold the lock. That's very much a premature optimization: we have two entirely different operations conflated and introduce a hidden dependency channel for no good reason. Sure, but why do they even care. And why do they care enough to play tricks with pointers where the exact same pointer may be both locked and unlocked with no way to distinguish these two states, except by tracing the call graph (an inherently fragile operation both for compiler and human)? I agree that sometimes that's the only sensible choice… but far too often the whole story is done in this inherently convoluted and fragile fashion not because it may affect the performance or memory or anything like that, but simply “because we can”. And that's how we end up with these endless debates about who has to take the lock and when… the only way to solve it is to stop creating such convoluted schemes except when one knows they are justified by measurable improvements somewhere.
Posted Mar 12, 2025 23:59 UTC (Wed)
by NYKevin (subscriber, #129325)
The implication (at least as far as I can understand this code) is that we're doing something that makes parent semantically invalid (or at least thread-unsafe), even though we are not writing to any of its bytes.
In Rust, you would most likely solve this problem with a typestate, which probably looks vaguely like the following:
You can then use &mut ParentHandle as an argument to anything which must exclusively lock the parent but operates on the device. If you already locked the parent elsewhere, you can also construct it without (further) locking, and that's just as safe (and avoids the need for reentrant locking).
Posted Mar 13, 2025 10:14 UTC (Thu)
by farnz (subscriber, #17727)
It's worth noting when you're analyzing this sort of code in Rust that there's a parallel between Rust's references and reader-writer locks (rwlocks) on the owned object. Semantically, a shared reference (&'lifetime Thing) acts as a compile-time read lock on the thing it refers to (modulo interior mutability - just as it's permissible for a structure protected by an rwlock to contain something with its own locking that allows mutation, even when you only have a read lock on the outer structure), and an exclusive reference (&'lifetime mut Thing) acts as a compile-time write lock on the structure it refers to.
As a result, if you have a &mut Thing, you know by the semantics of Rust that you have exclusive access to the Thing. This could be because the compiler can see at compile time that you've got exclusive access, or because unsafe Rust in a locking primitive guarantees this property at run time.
Posted Mar 13, 2025 18:03 UTC (Thu)
by NYKevin (subscriber, #129325)
* You could just use &Foo and &mut Foo (probably with lifetimes, at least if they're going into a struct). The main advantage is that these are the "lowest common denominator" - nearly everything else can be converted into one of these, except that pinning pointers cannot be converted into &mut. But none of the other generic options are compatible with Pin either (in this use case), so if you need to mutate through a Pin, you pretty much have to write it out separately (and hope the type supplies appropriate Pin<&mut Self> methods for actually doing the mutation). Unfortunately, the use of &Foo or &mut Foo means users will sometimes have to write &*foo or the like to convert some other type into a reference, which is not terrible but also not ideal.
IMHO the least problematic option (at least in a case like this) is to take Deref<T> or DerefMut<T>, because:
* Conversion into &T or &mut T is rarely all that difficult (except for the Pin case noted above) and those always implement Deref/DerefMut.
I'm actually not too upset about AsRef lacking that reflexive blanket implementation - if it had one, then it would be much easier to write generic code that does implicit conversions, and once that starts to become the standard way of writing generic code, everything starts feeling a little too much like C++ (before they invented the explicit keyword). In fact, part of me is tempted to say that they should not fix this, even if trait coherence gets more permissive in the future, because I'm not sure the resulting generality would be worth it. But OTOH that doesn't seem to have happened with From and Into, so maybe it's fine.
Posted Mar 13, 2025 19:12 UTC (Thu)
by farnz (subscriber, #17727)
To be clear, this was more meant in terms of reading the code, not writing it - if you see an &mut, you know that you already have exclusive access to the thing you're looking at, and thus that you don't need to look for locking.
Thus, if your typestate had a way to get an Option<&mut P> from a ParentHandle, I'd be reduced to confirming that the only way for that option to be None is if the weak reference to parent was already dangling, and thus there is no parent to lock.
Posted Mar 14, 2025 12:36 UTC (Fri)
by mathstuf (subscriber, #69389)
Hmm. That's odd. I see `AsRef<Path>` generic parameters fairly often. I feel like `AsRef<OsStr>` is also around, but I've seen that less often. Is &Path and &OsStr better here perhaps (though I feel like String and str usages might need caller-side help then)?
Posted Mar 14, 2025 13:51 UTC (Fri)
by farnz (subscriber, #17727)
Path, OsStr and str "cheat" by manually implementing AsRef for the reflexive case. The problem is that trait coherence prevents you having impl<T: ?Sized> AsRef<T> for T, but instead requires the crate that defines T to add the trivial implementation manually.
If that crate fails to do so, then the orphan rules mean that you can't add it yourself. This means that, unless the type has "cheated" with a trivial manual implementation, you can't use an &T where a function asks for an AsRef<T>. Fortunately, there is impl<T> AsRef<[T]> for [T], so this restriction only applies to converting &T to AsRef<T>, not &[T] to AsRef<[T]>.
Posted Mar 14, 2025 14:09 UTC (Fri)
by intelfx (subscriber, #130118)
Posted Mar 14, 2025 14:30 UTC (Fri)
by farnz (subscriber, #17727)
That's a question better asked on internals.rust-lang.org. My understanding, as an outsider, is that permitting this blanket implementation in parallel to impl<T: AsRef<U> + ?Sized, U: ?Sized> AsRef<U> for &T runs into the same set of soundness problems as other forms of specialization.
If the issues around specialization can be resolved, then that would provide a path to permitting such a blanket impl for AsRef. But that's a big if, since there's a requirement to avoid permitting specialization based on lifetime annotations (mostly because lifetime annotations are supposed to be erased after type checking is complete, but also because there's no way in Rust to represent the statement "lifetimes 'a and 'b are exactly the same"; the closest you can get is "'a is at least as long as 'b and 'b is at least as long as 'a", which is not quite the same in terms of the logic used for type and borrow checking).
Posted Mar 11, 2025 22:10 UTC (Tue)
by atnot (subscriber, #124910)
- As a program grows, it will gain more and more concurrent data structures (ad-hoc or designed), each of which will gain their own bespoke ways of determining when what needs to be locked under which conditions.
- Similarly, as a data structure itself grows in complexity and functionality, the rules for an individual data structure will grow more and more complex with additional considerations, optimisations, special exceptions, build options, etc. Nobody creates inscrutable locking rules on purpose, but they do have a habit of appearing as a codebase gets older. And while ideally these rules would be rigorously specified, in reality you're inevitably going to be playing a game of telephone where each developer gives the next their interpretation of what the locking rules are until nobody can agree on the specifics.
In the end it always ends up difficult to manage this M x N complexity of bespoke locking rules for each component long term. And the truth is, most code simply doesn't need it. A simple universal locking rule of "if you want to touch this data you need to hold its associated lock" works for the vast majority of cases. "If you want to touch this you need to call this function" works for even more. You can really design a surprising amount of complex code sensibly around these simple types of rules, which are trivial for both humans and computers to verify and are already within what clang can do.
I think there's a lot of value in having tools that can understand the complexity of these locking rules that have already been created, for sure. But I think it's a mistake to trick ourselves into believing that the habitual over-engineering of locking rules we tend to get ourselves into is a necessity when the tools are already perfectly good for most cases and designing around what they can do already delivers more understandable code today. Allowing cleverer locking rules really wouldn't fix that problem.
Posted Mar 12, 2025 22:38 UTC (Wed)
by neggles (subscriber, #153254)
case in point: `struct page`
Posted Mar 20, 2025 8:45 UTC (Thu)
by daenzer (subscriber, #7050)
Posted Apr 3, 2025 14:55 UTC (Thu)
by melver (subscriber, #134990)
We are hopeful this can happen some time this year, after which we will revisit the feature for the kernel. If there are willing Clang/LLVM contributors to help us accelerate landing this feature, please get in touch.
Posted Mar 10, 2025 19:03 UTC (Mon)
by abatters (✭ supporter ✭, #6932)
Posted Mar 14, 2025 1:17 UTC (Fri)
by irogers (subscriber, #121692)
> There's a corollary to this: If your locking scheme is regularly too complex for a computer to understand, humans probably won't understand it either.
It doesn't have to be very complicated for clang to not understand it. IIRC it's enough to have:

    if (need_lock)
            acquire_lock();
    simple_code_like_an_assignment = 1;
    if (need_lock)
            release_lock();

IIRC it also doesn't deal with conditional lock acquisition failing.
These really aren't complicated, hard-to-understand locking schemes...
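The "conditional lock acquisition" case refers to trylock-style code, roughly like the following sketch (some_lock and some_guarded_data are invented names; spin_trylock() returns nonzero only when the lock was actually taken):

    if (spin_trylock(&some_lock)) {
            /* Only reached when the lock was actually acquired. */
            some_guarded_data++;
            spin_unlock(&some_lock);
    }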
Mmm, I agree on the general principle, but is this (the example from the thread that the analyser can't handle) really an example of "so complex you can't understand it" ? All it's doing is "if we have a parent then hold its lock across this call"... It feels to me like maybe the analyser could do with being a little smarter too. (Unless of course I've misunderstood what the code is doing :-) )
    struct device *parent = dev->parent;

    if (parent)
            device_lock(parent);
    device_release_driver(dev);
    if (parent)
            device_unlock(parent);
    // callsite
    let handle = lock_make_parent_handle(dev);
    device_release_driver(handle);

    // lock_make_parent_handle() definition
    pub fn lock_make_parent_handle<'a>(dev: &'a mut Device) -> ParentHandle<'a, MutexGuard<'a, Device>> {
        ParentHandle {
            dev: dev,
            parent: dev.parent.upgrade().map(|parent| parent.lock().unwrap()), // PoisonError handling elided for brevity
        }
    }

    pub fn make_parent_handle_unlocked<'a, P>(dev: &'a mut Device, maybe_parent: Option<P>) -> Result<ParentHandle<'a, P>, WrongParentError>
        where P: 'a + DerefMut<Device>
    {
        let maybe_dev_parent = dev.parent.upgrade();
        if maybe_parent.is_some() != maybe_dev_parent.is_some() {
            return Err(WrongParentError{});
        }
        if let (Some(parent), Some(dev_parent)) = (maybe_parent, maybe_dev_parent) {
            /* We cannot directly compare parent and dev_parent with PartialEq or the like, because by assumption,
             * dev_parent is already locked (or at least, it should be if the caller is using the function correctly).
             * Test whether parent points somewhere inside of dev_parent by doing pointer comparisons:
             */
            if (&*dev_parent as *const u8) > (&*parent as *const u8) {
                return Err(WrongParentError{});
            }
            if (std::ptr::from_ref(&*dev_parent).wrapping_add(1) as *const u8) <= (&*parent as *const u8) {
                // Can't wrap, Rust's memory model guarantees that base + size does not overflow usize.
                return Err(WrongParentError{});
            }
        }
        Ok(ParentHandle {
            dev: dev,
            parent: maybe_parent,
        })
    }

    // ParentHandle definition
    struct ParentHandle<'a, P>
        where P: 'a + DerefMut<Device>
    {
        dev: &'a mut Device,
        parent: Option<P>,
    }
    // This struct and the above two constructors are in a separate crate or module so that
    // you can't manufacture your own instances from scratch.

    // device_release_driver() definition
    fn device_release_driver<'a, P>(handle: &mut ParentHandle<'a, P>)
        where P: 'a + DerefMut<Device>
    {
        // use handle.dev here... probably needs get() and get_mut() methods, and also for parent, etc.
    }
* You could instead take Deref<Foo> and DerefMut<Foo>. This covers most reasonable smart pointers, in addition to &Foo and &mut Foo, which means the user can give us ownership (or shared ownership, a lock guard, etc.) if that is more convenient for them. Technically, there is nothing prohibiting a Deref implementation from doing something expensive like taking a lock, but the semantic meaning of Deref and DerefMut is "this conversion is so cheap and so obvious that it should happen implicitly," so taking a lock in Deref is evil (on top of that, the signature of Deref::deref() is not compatible with taking a lock unless you hold it for the entire lifespan of Self, so it makes more sense to implement Deref on the already-taken lock guard instead). For example, a Box<Foo> has all the methods of a &mut Foo, because even though it's technically not the same thing, we prefer to think of it as "a Foo that happens to live on the heap" rather than "a Box containing a Foo." Similarly, Vec<T> and [T; N] both deref coerce into &[T], because they're both conceptually "a contiguous sequence of T objects somewhere in memory," and so it would make no sense to require an explicit conversion to use slice methods on them. However, this can also be a problem, if you don't actually care about that property and just want "a thing that can cheaply and infallibly give me a &Foo, regardless of what it is conceptually supposed to be."
* Then there are AsRef<Foo> and AsMut<Foo>. These are the "cheap reference-to-reference conversion" traits. Just like Deref, they should not take locks or do other expensive operations (and just like Deref::deref(), their method signatures are not really compatible with locking anyway). Unlike Deref, they don't get invoked implicitly, which means that unlike Deref, you can implement them more than once on the same type, as well as on types that are conceptually distinct from one another. For example, the various string-like types can all be converted into &[u8] via AsRef, but they do not Deref as &[u8], because it would be very confusing if we mixed bytewise methods into the same namespace as Unicode-oriented methods. Unfortunately, &T does not blanket implement AsRef<T> due to a trait coherence constraint, which means these are not a good candidate for accepting arguments generically.
* There are also Borrow<Foo> and BorrowMut<Foo>. These are extremely similar to AsRef<Foo> and AsMut<Foo>, with two differences: Firstly, they semantically require that Self implements Eq, Ord, and Hash identically to the underlying (owned) type (i.e. if lhs == rhs, then lhs.borrow() == rhs.borrow()). This makes them suitable for use in hash tables, trees, heaps, etc. Secondly, there is a blanket implementation for &T, so they are more suitable for generic use than AsRef<T>. But you don't always need the Eq/Ord/Hash property, so this may be too restrictive in some use cases.
* In principle, there's also Into<&T>, but this doesn't exist in practice because the signature doesn't make sense (it consumes self and returns &T, but what is the lifetime of the &T it returns, given that self has been consumed and no longer exists?).
* Most smart pointers are Deref<T>, so we still support the "pass an owning smart pointer or lock guard instead of a reference" use case. That's one less lifetime for the user to think about, but they don't have to do this if they don't want to (they can pass &T instead).
* If the end user wants to do some conversion that isn't "obvious" in the way that deref coercion is meant to be obvious, then probably they should do that explicitly at the callsite.
coccinelle
Problems with thread safety and RAII patterns in C++
https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#no-...
This can be pretty limiting as you may put logic in constructors/destructors that uses global locks, particularly common for RAII patterns. C cleanups for RAII are also considered in the thread.
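The C cleanup-based pattern being referred to might look roughly like the following sketch, which uses the kernel's cleanup.h guards (the names stats_mutex, stats_total, and stats_add() are invented for illustration); the unlock happens implicitly when the scope ends, which is exactly the kind of control flow the analysis has to be taught about:

    static DEFINE_MUTEX(stats_mutex);
    static unsigned long stats_total;

    static void stats_add(unsigned long n)
    {
            /* The mutex is released automatically when the function returns. */
            guard(mutex)(&stats_mutex);
            stats_total += n;
    }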