
They're mutexes, Jim, but not as we know them

Posted May 9, 2013 21:48 UTC (Thu) by ksandstr (guest, #60862)
Parent article: Wait/wound mutexes

There are at least two concrete reasons for using wait/wound mutexes over the more common "define a lock ordering, and (very carefully) violate it with try-lock primitives" scheme of fine-grained multiple locking.

The first is that a failure to try-lock requires some sort of fall-back procedure that either doesn't violate the locking order, or violates it with a different try-lock subject than in previous iterations. Coming up with that procedure is an enormous hassle, and it cramps many a concurrent design. Wait/wound mutexes would seem to avoid this hazard by permitting the wounded thread to resume only after the contended mutex has been released at least once.
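
For illustration, the try-lock scheme looks roughly like this (a minimal sketch in kernel style; the objects a and b are hypothetical, and the defined order is a before b):

    /* The defined order is a before b, but this path reaches b first. */
    mutex_lock(&b->lock);
    if (!mutex_trylock(&a->lock)) {
        /* The order violation failed; fall back to the defined order.
         * Anything read or computed under the old hold of b must be
         * revalidated once both locks are finally held. */
        mutex_unlock(&b->lock);
        mutex_lock(&a->lock);
        mutex_lock(&b->lock);
    }

It's that revalidation step in the fall-back branch that's so easy to get wrong.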

The second is that (depending on the interface) wait/wound mutexes could reduce the "slumbering herd" effect that occurs when a thread holds a number of mutexes but then has to wait on one more, repeating down the tree. (This effect also tends to magnify non-mutex waits through the mutex scheme, making it especially pernicious.) This reduction would happen by having the wounded thread wait for the contended mutex only after releasing its own, thereby allowing sufficiently unrelated threads to proceed unimpeded. The net gain is lower aggregate latency under contention.
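
As I read the parent article's API, the backoff then looks something like this two-lock sketch (simplified: real code would loop until every lock is held, and my_ww_class and the objects are hypothetical):

    static DEFINE_WW_CLASS(my_ww_class);

    struct ww_acquire_ctx ctx;

    ww_acquire_init(&ctx, &my_ww_class);
    ww_mutex_lock(&a->lock, &ctx);
    if (ww_mutex_lock(&b->lock, &ctx) == -EDEADLK) {
        /* Wounded: release what we hold so the winner can proceed,
         * then sleep until the contended mutex has been freed. */
        ww_mutex_unlock(&a->lock);
        ww_mutex_lock_slow(&b->lock, &ctx);
        ww_mutex_lock(&a->lock, &ctx);  /* same ctx, same sequence number */
    }
    ww_acquire_done(&ctx);
    /* ... critical section ... */
    ww_mutex_unlock(&b->lock);
    ww_mutex_unlock(&a->lock);
    ww_acquire_fini(&ctx);

Note that the wait in ww_mutex_lock_slow() happens with no other mutexes held, which is exactly what keeps unrelated threads out of the herd.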



They're mutexes, Jim, but not as we know them

Posted May 10, 2013 17:08 UTC (Fri) by heijo (guest, #88363)

The procedure can simply be to remember the locks you wanted to take in all previous attempts, then start the next attempt by sorting them in lock order and taking them.

Eventually you'll succeed, although it may take time quadratic in the total number of locks that may be taken across attempts.
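
A rough sketch of that procedure, using lock addresses as the global order (sort_by_address(), want_another_lock(), first_obj, and MAX_LOCKS are hypothetical stand-ins):

    struct mutex *needed[MAX_LOCKS];   /* every lock any attempt has wanted */
    struct mutex *extra;
    int i, n = 0;

    needed[n++] = &first_obj->lock;
retry:
    sort_by_address(needed, n);        /* any fixed total order will do */
    for (i = 0; i < n; i++)
        mutex_lock(needed[i]);
    if (want_another_lock(&extra)) {
        /* Taking it now could violate the order: back out fully,
         * remember it, and retry with the enlarged, re-sorted set. */
        for (i = n - 1; i >= 0; i--)
            mutex_unlock(needed[i]);
        needed[n++] = extra;
        goto retry;
    }
    /* ... do the work, then unlock all n locks ... */

Up to N retries, each re-taking up to N locks, is where the quadratic worst case comes from.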

Wait/wound, on the other hand, guarantees that the first-arriving thread will make progress in time linear in the number of locks, but it seems the other threads may burn unlimited CPU time attempting to take locks and retrying in pathological cases.

This could be fixed by having the "later" thread wait directly for all earlier threads to be done, saving power at the expense of latency, although this is probably not an issue in practice.

They're mutexes, Jim, but not as we know them

Posted May 10, 2013 19:17 UTC (Fri) by dlang (guest, #313)

> but it seems the other threads may burn unlimited CPU time attempting to take locks and retrying in pathological cases.

Undefined CPU time, but not unlimited.

Each thread gets a sequence number when it starts trying to get a lock, and when two threads conflict, the one with the lower sequence number wins.

As a result, every thread is guaranteed to get the lock in preference to any thread that first tried to get the lock after it did.

This puts an upper bound on the amount of CPU it will waste in the meantime (admittedly, not a bound that you can readily calculate, since you don't know how long the locks will be held by the threads that have priority over you).
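
Conceptually, the arbitration is just a ticket comparison (a sketch of the idea only, not the kernel's actual code; wound(), back_off(), and the class-wide stamp counter are placeholders):

    /* Assigned once, when the context first starts acquiring,
     * and kept across back-off and retry: */
    ctx->stamp = atomic_long_inc_return(&class->stamp);

    /* On contention between a waiter and the current holder: */
    if (waiter->stamp < holder->stamp)
        wound(holder);     /* the older context wins; the holder
                              backs off at its next locking call */
    else
        back_off(waiter);  /* the younger context releases its locks
                              and sleeps until the mutex is freed */

Because the stamp is never refreshed, a context can only move toward being the oldest one in the system, which is what rules out starvation.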

They're mutexes, Jim, but not as we know them

Posted May 10, 2013 23:44 UTC (Fri) by brong (guest, #87268)

Hang on - are you saying that the sequence number you were allocated persists even after you back out and try again?

My understanding was that once you're wounded, you have to restart from scratch. If you're restarting with the same (low) sequence number rather than being allocated a brand new one, then I see your point. Otherwise, I see starvation possibilities, the horror.

They're mutexes, Jim, but not as we know them

Posted May 11, 2013 2:21 UTC (Sat) by dlang (guest, #313)

From the article:

> But, since the sequence number increases monotonically, a once-wounded thread must eventually reach a point where it has the highest priority and will win out.

The article doesn't say explicitly that the sequence number is maintained across retries, but I don't see what this sentence would mean otherwise.

They're mutexes, Jim, but not as we know them

Posted May 15, 2013 13:45 UTC (Wed) by mlankhorst (subscriber, #52260)

Correct. The current implementation preserves the sequence number. But it's an implementation detail; the user of the API never has to work with the sequence numbers directly.

They're mutexes, Jim, but not as we know them

Posted May 15, 2013 20:49 UTC (Wed) by dlang (guest, #313)

It may be "only an implementation detail" in that the user of the API never deals with the sequence number.

But this detail is critical to avoiding the risk of permanent starvation of some thread.

They're mutexes, Jim, but not as we know them

Posted May 13, 2013 17:40 UTC (Mon) by ksandstr (guest, #60862)

The distinct ordering of locks is required to prevent the client from turning a set of N locks, out of which M are required, into a very expensive spinlock. The low-level idea is that the lock that violates the ordering (and generates the EDEADLK status) is a different one each time, and that its being locked may turn out not to be caused by the current client's other locks at all. WW mutexes prevent this by putting the backing-off client to sleep until the conflicting mutex has been released at least once, which is strictly required for its progress.

But, as you say, for algorithms where the needed set of locks cannot change between backout and retry, your solution is adequate. I've found that those algorithms generally aren't candidates for fine-grained locking, though that's neither here nor there.

Personally, if wait/wound mutexes remove these halfway esoteric concerns from mainstream kernel code (along with the entire idea of lock ordering), I'll be happy as a clam.

