ThreadSanitiser

Posted Feb 6, 2025 2:28 UTC (Thu) by roc (subscriber, #30627)
In reply to: ThreadSanitiser by milesrout
Parent article: Exposing concurrency bugs with a custom scheduler

It's very different.

This kind of approach randomizes scheduling in clever ways to try to trigger bugs. rr chaos mode is a similar kind of approach and there are many others.

TSAN and other race detectors try to find inadequate synchronization, e.g. two CPU cores racing to access the same memory location, where at least one is a write, with no synchronization preventing the race. A TSAN report is therefore not necessarily an application bug, although it usually is. But the advantage of TSAN is that you can report issues without actually having to trigger the precise schedule that exposes the bug.
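To make the distinction concrete, here is a toy model (an illustration only, not how TSAN or any real scheduler is implemented). Two simulated threads each perform a non-atomic increment, split into a load and a store. The helper names (`all_schedules`, `run`, `has_unsynchronized_conflict`) are made up for this sketch, and the conflict check is a much simplified lockset-style test, whereas real detectors track happens-before relations.

```python
from itertools import combinations

# Each simulated thread does a non-atomic increment: step 0 = load, step 1 = store.
STEPS_PER_THREAD = 2


def all_schedules():
    """Yield every interleaving of two threads, A and B, with two steps each."""
    total = 2 * STEPS_PER_THREAD
    for a_slots in combinations(range(total), STEPS_PER_THREAD):
        yield ["A" if i in a_slots else "B" for i in range(total)]


def run(schedule):
    """Execute one schedule and return the final counter value (2 is correct)."""
    counter = 0
    loaded = {}                      # per-thread copy of the value it read
    progress = {"A": 0, "B": 0}
    for tid in schedule:
        if progress[tid] == 0:
            loaded[tid] = counter    # load
        else:
            counter = loaded[tid] + 1  # store
        progress[tid] += 1
    return counter


def has_unsynchronized_conflict(schedule):
    """Detector-style check: two threads touch the counter, at least one access
    is a write, and they hold no lock in common. No locks exist in this model."""
    accesses = []
    progress = {"A": 0, "B": 0}
    for tid in schedule:
        kind = "read" if progress[tid] == 0 else "write"
        accesses.append((tid, kind, frozenset()))  # empty set of held locks
        progress[tid] += 1
    return any(
        t1 != t2 and "write" in (k1, k2) and not (l1 & l2)
        for (t1, k1, l1), (t2, k2, l2) in combinations(accesses, 2)
    )


schedules = list(all_schedules())
lost_updates = [s for s in schedules if run(s) != 2]
flagged = [s for s in schedules if has_unsynchronized_conflict(s)]
print(len(schedules), len(lost_updates), len(flagged))
```

A schedule fuzzer only sees the bug when it happens to pick one of the interleavings in `lost_updates`, while the detector-style check flags every schedule, including the ones that produce the right answer.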

ThreadSanitiser

Posted Feb 6, 2025 15:04 UTC (Thu) by parttimenerd (guest, #175795)

(co-speaker of the talk here) You're correct. The goals of the two tools are also different. My scheduler runs programs with random/erratic scheduling orders, but without interfering with the program's execution itself. The scheduler can, therefore, find only larger races or broken interdependencies between two threads. A basic example of such an interdependency is given in the Queue example (https://github.com/parttimenerd/concurrency-fuzz-schedule...) that I used for the talk.
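As a rough sketch of what such an interdependency can look like (this is not the Queue sample from the repository; the names, the logical clock, and the `MAX_AGE` limit are all assumptions for illustration): a consumer silently relies on being scheduled soon after the producer, and an erratic schedule breaks that assumption.

```python
from collections import deque

MAX_AGE = 3  # hypothetical staleness limit, measured in logical ticks


def run(schedule):
    """Run producer ("P") and consumer ("C") steps in the given order.

    Returns the ages of all consumed items that were older than MAX_AGE.
    """
    queue = deque()
    stale = []
    for now, who in enumerate(schedule):
        if who == "P":
            queue.append(now)            # stamp the item with its creation tick
        elif queue:
            age = now - queue.popleft()  # how long the item waited
            if age > MAX_AGE:
                stale.append(age)
    return stale


fair = "PC" * 6               # consumer runs right after every producer step
erratic = "P" * 6 + "C" * 6   # consumer is held back for six ticks

print(run(fair))
print(run(erratic))
```

Under the fair schedule no item is ever stale, so the hidden timing assumption goes unnoticed; under the erratic one every item exceeds the limit.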

The use case of this tool is essentially to run it during integration testing or long fuzzing sessions.

> rr chaos mode is a similar kind of approach and there are many others.

Yes, but rr chaos mode is far slower. One of the ideas is to combine the custom scheduler with the other techniques.

Another goal of the custom scheduler was to show what is possible with sched-ext, allowing us to implement a custom scheduling policy in a matter of hours.

ThreadSanitiser

Posted Feb 7, 2025 10:20 UTC (Fri) by roc (subscriber, #30627)

> Yes, but rr chaos mode is far slower. One of the ideas is to combine the custom scheduler with the other techniques.

Tradeoffs! rr chaos mode has the advantage that you get a fully reproducible recording of the execution which makes it really easy to debug. AFAICT from the description in this article, this sched_ext approach has no guarantee that you can reproduce a bug reliably after you've caught it once.
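The reproducibility point can be illustrated with a toy record-and-replay loop. This is only a sketch under simplifying assumptions: it logs nothing but the order in which two simulated threads run, whereas a real recorder such as rr has to capture every source of nondeterminism. The function names `run`, `record`, and `replay` are invented for the example.

```python
import random


def run(pick):
    """Two threads each do a non-atomic increment (load, then store).

    pick(ready) chooses which ready thread runs next. Returns the final
    counter and the trace of scheduling decisions.
    """
    counter = 0
    loaded = {}
    remaining = {"A": 2, "B": 2}   # steps left per thread
    trace = []
    while any(remaining.values()):
        ready = sorted(t for t, n in remaining.items() if n)
        tid = pick(ready)
        trace.append(tid)
        if remaining[tid] == 2:
            loaded[tid] = counter        # load
        else:
            counter = loaded[tid] + 1    # store
        remaining[tid] -= 1
    return counter, trace


def record(rng):
    """Run with random scheduling and keep the decision log."""
    return run(rng.choice)


def replay(trace):
    """Re-run with exactly the recorded scheduling decisions."""
    steps = iter(trace)
    return run(lambda ready: next(steps))


result, trace = record(random.Random())
print(result, "".join(trace), replay(trace) == (result, trace))
```

Without the recorded trace, catching the same lost update again means hoping the random scheduler stumbles onto the same interleaving.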