Re: detecting all possible deadlock conditions
Posted May 31, 2006 7:17 UTC (Wed) by mingo
In reply to: The kernel lock validator
Parent article: The kernel lock validator
Well, the validator does reduce the incredibly complex QA task of "trigger all possible locking races that can lead to deadlocks, assuming an arbitrary number of CPUs and interrupt sources" to "trigger all locking codepaths at least once".
In other words, we've reduced a highly probabilistic parallel QA (and manual code review) task to a comparatively _much_ simpler task of serial code coverage.
But yes, to increase the 'reach' of the formal proof of correctness we need to increase code coverage during testing. Projects like LTP do that systematically, they use kernel instrumentation and visualization tools to see which code wasnt executed yet and create testcases to trigger it. Plus chaos helps us too: if LTP doesnt trigger it, users will eventually - and will likely not trigger the deadlock itself, but only the codepath that makes it possible. (hence there is much better debug info and a still working system.)
Put differently: it's not bad idea to run through every possible branch of code at least once anyway ;-) If some code is never triggered in practice, why is the code there and how was it tested to begin with? There is a world of a difference between code that is never executed even once and narrow 1-instruction races that need many CPUs to even trigger.
So i'd say "total locking correctness" is not an impossible dream anymore, but is attached to another effort (code coverage) that we want to make as complete as possible anyway.
to post comments)