|
|
Subscribe / Log in / New account

The dependency tracker for complex deadlock detection

By Jonathan Corbet
September 4, 2025

OSS EU
Deadlocks are a constant threat in concurrent settings with shared data; it is thus not surprising that the kernel project has long since developed tools to detect potential deadlocks so they can be fixed before they affect production users. Byungchul Park thinks that he has developed a better tool that can detect more deadlock-prone situations. At the 2025 Open Source Summit Europe, he presented an introduction to his dependency tracker (or "DEPT") tool and the kinds of problems it can detect.

[Byungchul Park] Park began by presenting a simple ABBA deadlock scenario. Imagine two threads running, each of which makes use of two locks, called A and B. The first thread acquires A, then B, ending up holding both locks; meanwhile, the second thread acquires the same two locks but in the opposite order, taking B first. That can work, until the bad day when each thread succeeds in taking the first of its two locks. Then one holds A, the other holds B, and each is waiting for the other to release the second lock it needs. They will wait for a long time.

Deadlocks like this can be hard to reproduce, and thus hard to track down and fix; it is far better to detect the possibility of such a deadlock ahead of time. The kernel has a tool called "lockdep", originally written by Ingo Molnar, that can perform this detection. Whenever a thread acquires a lock, lockdep remembers that acquisition, along with the context — specifically, any other locks that are currently held. It will thus notice that, in the first case, A is acquired ahead of B. When lockdep observes the second thread acquiring the locks in the opposite order, it will raise the alarm. The problem can then be fixed, hopefully before this particular deadlock ever occurs on a production system.

Park continued with a more complex example, presented in this form:

Thread XThread YThread Z
mutex_lock(A)mutex_lock(C)
mutex_lock(A)mutex_lock(C)
wait_for_event(B)
mutex_unlock(A)mutex_unlock(C)
mutex_unlock(A)mutex_unlock(C)
event B

Thread Y, holding lock C, will be blocked waiting for A to become available. Thread Z, meanwhile, is blocked waiting for C. Thread X is waiting for some sort of event to happen; it is up to Z to produce that event, but that will never happen because Z is waiting on a chain of locks leading back to A, held by X. Park's point here is that deadlocks are not driven only by locks, other types of events can be part of deadlock scenarios as well. Lockdep, being focused on locks, is unable to detect these cases.

Park's DEPT tool, currently in its 16th revision, is meant to address this type of problem. The core idea behind DEPT is a new definition of a dependency; while lockdep sees dependencies as lock acquisitions, DEPT looks at events more generally. If one event occurs prior to another, DEPT will notice the dependency between them. In the example above, the release of A depends on the occurrence of B, which in turn depends on the release of C, which itself depends on the release of A. That is a dependency loop that can cause a deadlock; DEPT will duly report it.

Back in 2023, kernel developers were beating their heads against a complex deadlock problem, and Linus Torvalds asked whether DEPT might be able to track the problem down. Using DEPT, Park was able to pinpoint the problem, which involved a wait on a bit lock; events involving bit locks are invisible to lockdep. Earlier this year, Andrew Morton was struggling with a similar problem:

Nostalgia. A decade or three ago many of us spent much of our lives staring at ABBA deadlocks. Then came lockdep and after a few more years, it all stopped. I've long hoped that lockdep would gain a solution to custom locks such as folio_wait_bit_common(), but not yet.

He, too, asked if DEPT could find this problem; once again, DEPT located the deadlock so that it could be fixed.

DEPT, Park said, has a number of benefits, starting with the fact that it uses a more generalized concept of dependencies rather than just looking at specific locks. It can handle situations involving folio locks, completions, or DMA fences that lockdep cannot. DEPT is also able to properly deal with reader-writer locks. It is, he said, less likely than lockdep to generate false-positive reports, and it is able to continue operating after the first report (unlike lockdep, which shuts down once a problem is found).

Lockdep, he allowed, is better in that it produces far fewer false-positive reports now; he noted that kernel developers have spent 20 years achieving that result. It will, though, miss some deadlock situations. In cases where the deadlock scenario is more complex than lockdep can handle, he concluded, DEPT should be used instead.

I could not resist asking why DEPT is not upstream after 16 revisions and more than three years (the first version was posted in February 2022). Park answered that the real source of resistance is the number of false positives. Lockdep is just a lot quieter.

Another attendee said that, since DEPT is a build-time option, it is hard to use with problems that affect production systems in the field; he suggested that perhaps an approach using BPF would be useful. Park agreed, but seemed to not have considered the BPF option until now; he said that he would look into it.

Finally, another audience member said that he did not think that the false-positive problem should block the merging of this code; he asked what the real problem is. Park did not answer that question, but declared that he would keep working on the code until he manages to get it upstream.

After the talk, with beer in hand, I asked a core-kernel developer about why DEPT has run into resistance. The answer was indeed false positives; the tool generates so many reports that it is difficult to get a real signal out of the noise. The root of the problem, they said, was that Park is attempting to replace lockdep all at once with a large patch set. Something like DEPT could be useful and welcome, but it would need to go upstream as a set of smaller changes that evolve lockdep into something with DEPT's view of dependencies, making the kernel better with every step. Until that approach is taken, attempts to merge DEPT into the mainline are unlikely to be successful.

The slides from Park's talk are available for interested readers.

[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my travel to this event.]

Index entries for this article
KernelDevelopment tools
ConferenceOpen Source Summit Europe/2025


to post comments

Large changes

Posted Sep 4, 2025 17:53 UTC (Thu) by quotemstr (subscriber, #45331) [Link] (15 responses)

> Something like DEPT could be useful and welcome, but it would need to go upstream as a set of smaller changes that evolve lockdep into something with DEPT's view of dependencies

It's shockingly common for open source projects to miss innovations that help both developers and users just because the people who do the innovative work are so energetic, excited, and industrious and tend to express their works all at once in the form of large, cohesive patches.

Some days, I think Linux and other mature projects would reject a patch that at once cured cancer and made fusion power practical all at once if it came in the form of a patch more than 50 lines long.

It's not a good use of that brilliant and energetic person's time to slice and dice an opus of a work into a thousand tiny steps, especially when everyone knows that the intermediate steps will never see any use and exist only to satisfy a Process.

Once a project would rather not innovate than innovate in large-ish chunks, it's dead. It's in maintenance mode. It's no longer about improving the world, but consolidating improvements already made.

Large changes

Posted Sep 4, 2025 19:35 UTC (Thu) by willy (subscriber, #9762) [Link] (7 responses)

You're missing that large changes have a large chance of problems. I dare say the folio project is one of the most ambitious changes made to the kernel in the last ten years. If I'd waited for it to be "done" before submitting it, it (a) would not be done yet and (b) would never be merged.

Instead, I have roughly 2000 patches over the last five years slowly converting us to folios. Almost every patch results in a mild improvement. Some of those patches contain bugs! Because each one is so small, it's been relatively easy to track those bugs down and fix them.

I haven't looked at DEPT in detail. Maybe it's not possible to transform lockdep into DEPT one small patch at a time. But it'd be awfully fun for somebody to give it a try.

Large changes

Posted Sep 4, 2025 20:21 UTC (Thu) by intelfx (subscriber, #130118) [Link] (5 responses)

> You're missing that large changes have a large chance of problems. I dare say the folio project is one of the most ambitious changes made to the kernel in the last ten years. If I'd waited for it to be "done" before submitting it, it (a) would not be done yet and (b) would never be merged.

Folios are not an innovation. It is literally *refactoring* — a purely internal, housekeeping work, cleaning up some APIs by introducing some new types. It may be "one of the most ambitious changes made to the kernel," but it's purely due to how invasive it is, and not because it's some kind of a notable invention.

Of course, it makes sense for a *refactoring* to be made slowly and in pieces. That's an entirely different weight category from the subject of the present discussion (and the article).

Large changes

Posted Sep 4, 2025 21:35 UTC (Thu) by acarno (subscriber, #123476) [Link] (2 responses)

I think dismissing folios as mere refactoring does a disservice to the amount of careful design work needed to achieve the benefits that folios bring. But that's a matter of opinion, so I'll leave it at that...

From the link in the article, the DEPT diff stat shows 6500 lines of changes to 69 files; only 13 of those are new. I'm not a kernel developer, but it seems entirely reasonable to ask that the other 56 files worth of changes be merged in a few commits at a time. I can't speak for anyone else, but I certainly wouldn't want to review 6500 lines in a single PR, even if carefully split across 42 commits; that's just unmanageable.

Large changes

Posted Sep 4, 2025 23:22 UTC (Thu) by intelfx (subscriber, #130118) [Link] (1 responses)

> I think dismissing folios as mere refactoring does a disservice to the amount of careful design work needed to achieve the benefits that folios bring. But that's a matter of opinion, so I'll leave it at that...

No disregard was intended to the vast amount and high quality of the design and engineering work that went into the folios conversion or the follow-up projects.

However, all of that is unrelated to the point I was making, which is that the folios were neither an innovation (the concept of a folio, with all the accompanying engineering effort, serves no purpose outside of specific particulars of Linux kernel design) nor a self-contained work, which is why comparing them to DEPT (or any other "innovations" the OP was talking about) is arguably incorrect.

Large changes

Posted Sep 5, 2025 4:11 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> However, all of that is unrelated to the point I was making, which is that the folios were neither an innovation (the concept of a folio, with all the accompanying engineering effort

Not much stuff is truly innovative, but after the folio conversion, Linux will be the first "large" OS to gain true flexible memory units as a kernel-level primitive.

Windows NT uses a page frame database with "prototype" entries for shared pages, it does support large pages, but not automatic splitting/merging in filesystems. Darwin also uses fixed pages.

Large changes

Posted Sep 5, 2025 23:59 UTC (Fri) by SLi (subscriber, #53131) [Link] (1 responses)

Do you think there's a guarantee that all clearly merge-worthy work _can_ be done in small incremental changes?

Now, I think one would do really well to at least demand exceptionally good reasoning for why something couldn't, and that is not just "it's too much work"—getting things upstreamed into Linux is hard work, often for a good reason, that you sign up for if you try.

I guess there are some mechanisms that try to alleviate the problem somewhat: The more isolated and optional a change is, the less scrutiny even a larger PR tends to get.

But, in general, I think "it must be done in small pieces" is often taken as a gospel without asking if that's in general possible. And maybe it is, maybe it isn't... (although opening this door would also have more developers arguing why their code is so special)

Large changes

Posted Sep 6, 2025 1:56 UTC (Sat) by iabervon (subscriber, #722) [Link]

I think there are some cases where something is tightly coupled with itself and loosely coupled with the existing kernel, and no part of the new code makes sense (or compiles) without much of the rest of the new code. For something like that, you'd want a big explanation, then a big chunk of code in its own new directory, then small incremental changes to the existing kernel code.

Then a reviewer can look at patch 3 to see (e.g.) adding tracking to the first type of kernel lock, then look at patch 1 to see why the added code makes sense, then look at patch 2 to see what the added code is actually doing, and that provides a starting point to understand the purpose of the new engine and consider whether its implementation makes sense. It also allows for reviews that assume the new subsystem works as described and review the small incremental changes to the existing kernel to use it.

Large changes

Posted Sep 15, 2025 12:39 UTC (Mon) by maxbyungchulpark (subscriber, #179345) [Link]

As you know, there was an attempt to incrementally improve the existing lockdep through the cross-release (see https://lwn.net/Articles/709849/ and https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...). However, this effort was quickly reverted due to the generation of false positives. It's important to note that lockdep ceases reporting after the first instance, regardless of whether it is a false positive or a genuine issue.

From my experience, I can say it's nearly impossible to achieve meaningful progress based on lockdep, as expanding its coverage tends to result in an increase in false positives, thus diminishing its usefulness.

In summary, our available options are as follows:
1) Incrementally improve the existing lockdep, following the approach by cross-release,
2) Completely transform it into a new tool in one step, or
3) Enable the development of a separate tool that can be improved independently.

I explained why we should not go for 1). 2) is too risky - it's no way to replace the mature tool with a unverified tool for sure - thus we shouldn't take 2). The best option we must take is 3). I'm trying to achieve 3) with DEPT. Thoughts?

Large changes

Posted Sep 7, 2025 2:28 UTC (Sun) by ssmith32 (subscriber, #72404) [Link] (6 responses)

>It's not a good use of that brilliant and energetic person's time to slice and dice an opus of a work

The idea that iterative software development is to be preferred over the big bang approach is certainly not limited to open source. It's rather wide spread throughout every domain of software development, and has spread out to other areas (e.g. military, aerospace, etc).

If you thing DEPT is an opus of a brilliant mind, it pales in comparison to what's been developed iteratively in other areas.

Heck, _actual_ opuses are often developed iteratively...

https://en.m.wikipedia.org/wiki/Symphony_No._9_(Beethoven)

Large changes

Posted Sep 8, 2025 8:37 UTC (Mon) by taladar (subscriber, #68407) [Link] (1 responses)

There is a big difference between developing something iteratively and finding an iterative path from some existing thing you didn't write to some other existing thing you did write. The latter seems like much more of a waste of time.

Large changes

Posted Sep 8, 2025 9:29 UTC (Mon) by neilbrown (subscriber, #359) [Link]

The latter would give people confidence that you actually understood the former and the needs that it was meeting, and hence confidence that your replacement at least met those same needs.

Large changes

Posted Sep 8, 2025 9:46 UTC (Mon) by Wol (subscriber, #4433) [Link] (2 responses)

> The idea that iterative software development is to be preferred over the big bang approach is certainly not limited to open source. It's rather wide spread throughout every domain of software development, and has spread out to other areas (e.g. military, aerospace, etc).

My biggest criticism of iterative software (and it's a criticism of how it's usually done, not the iteration itself) is that all too often, it leads to buggy software with missing functionality - "here's a problem, let's fix it".

If you work together with a strict exhaustive spec [1] yes, it'll slow down the speed of iteration, but there'll be far fewer bugs for the next iteration.

[1] "ten minutes spent preparing saves an hour debugging". And by "in conjunction with the spec" I don't mean you need a spec for the entire program, just the bit you're working on. I've moaned about this many times, usually in conjunction with things like file save, where a 2x2 matrix presents the user with three options - "yes this time", "yes always", "no this time". Where's "no always"? This manifests itself very much with software nowadays where you are nagged with the options "now" or "later". Very conspicuous by its absence is "bugger off and don't come back !!!" (Okay, I understand that's because they think "a fool and his money are soon parted", but it's quite obvious they consider you are a fool for wanting to use their software ...)

Cheers,
Wol

Large changes

Posted Sep 8, 2025 13:25 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (1 responses)

> like file save, where a 2x2 matrix presents the user with three options - "yes this time", "yes always", "no this time". Where's "no always"?

My favorite has been on some Android permission dialogs, there was Allow/Disallow and a separate checkbox for "Don't ask again". Great! However, the wise guys decided that if the "Don't ask again" checkbox was checked…the Disallow button should be disabled. Thanks, but no thanks.

Large changes

Posted Sep 8, 2025 15:08 UTC (Mon) by Wol (subscriber, #4433) [Link]

My worst was - gentoo of all things - "automount /boot when building the kernel".

Obviously you can have or not have a /boot. By default it can be mounted or not mounted. When building a kernel I would have thought all three possible options (obviously automount a non-existent /boot doesn't make sense) were reasonable choices, but if you selected "don't automount the /boot partition" it just aborted the kernel build. ???? So the only options that worked were no separate boot, or automount. So what on earth was the point of asking ????

(I didn't want kernel build trampling over my multi-boot /boot! The consequences typically were the system crashing into the systemd recovery shell on reboot!)

I can't remember what the workarounds suggested were, but they basically boiled down to "don't update /boot in any way shape or form", which isn't a workaround, or made the build crash with "can't find path" (for an option which allegedly didn't affect the build itself in any way). So the whole thing was buggy as hell, but nobody could see the (non)sense of it all.

Cheers,
Wol

Large changes

Posted Sep 15, 2025 13:31 UTC (Mon) by maxbyungchulpark (subscriber, #179345) [Link]

DEPT should be enhanced through iterative software development. While the concept of DEPT is headed in the right direction, the implementation could be improved. I hope it's not misled.

Simple Matter of Coding??

Posted Sep 5, 2025 0:59 UTC (Fri) by neilbrown (subscriber, #359) [Link] (8 responses)

> I've long hoped that lockdep would gain a solution to custom locks such as folio_wait_bit_common(), but not yet.

Isn't this just an SMC??

spin_lock, mutex_lock, etc are also "custom locks" in that someone had to write the code. And they had to make the relevant lockdep calls so that lockdep could see the locks. What is stopping someone doing the same for folio_wait_bit_common()???

Each lock requires a 'struct lockdep_map' and adding those for bit-locks is slightly more clumsy than for locks that are represented by a struct. But it is far from impossible. rhashtable uses bitlocks for each buckets and has lockdep monitoring.

folio_wait_bit_common only ever waits for PG_locked, PG_private_2, or PG_writeback. So we just need 3 lockdep_maps.

And that is where we hit the problem. the lock bits are in 'struct folio' and making that bigger, even for debug-only kernels, isn't going to fly.

So the solution we need isn't for "custom locks" but for "space constrained" locks.

For rhashtables we use one lockdep_map for each "bucket_table" which covers multiple buckets and so multiple locks. On reflection we probably could simply use a single global lockdep_map. For most locks that don't nest I don't think the lockdep_map contains anything that isn't in the global lock_class. I wonder if the folio locks nest. I wonder how much it matters. I'm guess the nesting-related info in lockdep_map is more for performance than functionality (the word 'cache' gives it away). We might need to be able to disable LOCK_STAT for locks with a shared lockdep_map....

Of course this would be much easier to reason about if there were good documentation for lockdep (lockdep-design.rst contains lots of detail but doesn't mention lockdep_map explicitly) .... but we've got the code which should be enough.

Simple Matter of Coding??

Posted Sep 5, 2025 1:28 UTC (Fri) by willy (subscriber, #9762) [Link] (7 responses)

The reason this doesn't work for folio locks is the same reason there's no support for semaphores in lockdep: lockdep is built around the concept of ownership.

Mutexes have to be released by their owner. Semaphores and the folio lock can be (and are) unlocked by other tasks and even interrupt context.

Simple Matter of Coding??

Posted Sep 5, 2025 1:48 UTC (Fri) by neilbrown (subscriber, #359) [Link] (6 responses)

Interesting and useful - thanks.

Surely it would be possible to transfer ownership. At some point a process that locked a folio will "release" it (e.g. submit for writeback) and which point is can disavow the lock (i.e. tell lockdep it is no longer locked, without actually unlocking). At some other point some other process (or irq handler) will become responsible for the locked folio and so it can adopt the lock (i.e. tell lockdep that it just successfully locked the lock).

Maybe that wouldn't be quite such a "simple" matter of coding, but I imagine it would create valuable documentation of control flow.

Simple Matter of Coding??

Posted Sep 5, 2025 2:01 UTC (Fri) by willy (subscriber, #9762) [Link] (4 responses)

I've considered things like that before. It never seems to work out quite right. I wouldn't have a "release lockdep without releasing the lock" though. I'd want an API that says "context FOO is now responsible for unlocking". That would also support XFS using a work queue to unlock a lock. It'd also allow lockdep to track actual deadlocks like something that prevents FOO from running.

Simple Matter of Coding??

Posted Sep 6, 2025 4:30 UTC (Sat) by neilbrown (subscriber, #359) [Link] (3 responses)

> I wouldn't have a "release lockdep without releasing the lock" though. I'd want an API that says "context FOO is now responsible for unlocking".

While I can see the value of being completely explicit about a chain of ownership, I can also see that it add substantial complexity. The other contexts, which aren't processes, are not yet well defined. Some might be queues, but there are likely other possibilities. Then the mechanism which realises the handover from the queue to some other process might need to wait for a thread (as you suggested) or might need to wait for memory (for a new thread to be allocated) or maybe other things.
Getting that all well understood would be useful, but not trivial.

> It never seems to work out quite right.

could that be because you aimed too high as discussed above? As has been mentioned in other comments, an incremental approach might be more likely to succeed.

Yes, I know, I should propose a patch....

Simple Matter of Coding??

Posted Sep 6, 2025 10:48 UTC (Sat) by willy (subscriber, #9762) [Link] (2 responses)

If you remember, lockdep comes from the RT patchset. They really want locks to have owners so they can priority boost the owner when a higher priority task blocks on it.

Now, if the folio is locked because of I/I, there's obviously no owner. You just have to wait for the storage. So that's fine. But if a work queue owns the lock, that should be boostable.

Simple Matter of Coding??

Posted Sep 8, 2025 4:54 UTC (Mon) by neilbrown (subscriber, #359) [Link] (1 responses)

????
I don't think lockdep is at all related to boosting lock owners.
If CONFIG_PROVE_LOCKING isn't set, lockdep doesn't track lock owners (or do anything).
And if it is set, then lots of data structures are significantly bigger so it isn't something you'd do in production.
And lock owner boosting that doesn't work in production seems pointless.

But maybe I misunderstood....

Simple Matter of Coding??

Posted Sep 8, 2025 10:32 UTC (Mon) by farnz (subscriber, #17727) [Link]

The relationship between lockdep and boosting lock owners is indirect, not direct; they both need clarity around lock ownership to work well, but they're not tied together in code.

In effect, lockdep is "we will need lock ownership to be clear out for the RT patchset to do lock boosting - let's do something useful for mainline so that mainline cares about lock ownership, not just RT".

Simple Matter of Coding??

Posted Sep 16, 2025 7:50 UTC (Tue) by maxbyungchulpark (subscriber, #179345) [Link]

That's exactly what cross-release feature was working for, about 10 years ago. (see https://lwn.net/Articles/709849/ and https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...)


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds