A surprise with mutexes and reference counts
Posted Nov 27, 2014 10:42 UTC (Thu) by tvld (guest, #59052)
In reply to: A surprise with mutexes and reference counts by dpotapov
Parent article: A surprise with mutexes and reference counts
C++11 makes a similar requirement. IIRC, C11 isn't specific, but they usually default to semantic equality with C++11 and/or POSIX in the synchronization constructs.
The underlying issue is that wake-up has both a fast path and a slow path, and other threads can be woken either way, because a mutex release first wakes through the fast path and then through the slow path. As a result, the slow-path wake-up can run concurrently with another thread that has already acquired the lock.
This affects glibc's mutex implementation too. There's no segfault or such, but there can be pending FUTEX_WAKE calls to memory locations that might have been unmapped or reused for a different futex. That can introduce spurious wake-ups for FUTEX_WAIT calls on other futexes, which violates the promise FUTEX_WAIT makes for a return value of 0.
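To make the window concrete, here is a minimal sketch of a three-state futex mutex in C. It is not glibc's code; `sketch_mutex`, `sketch_lock`, `sketch_unlock`, and the `futex_op` wrapper are hypothetical names for illustration, and it assumes Linux with C11 atomics.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw futex system call. */
static long futex_op(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, (int *)uaddr, op, val, NULL, NULL, 0);
}

/* State: 0 = unlocked, 1 = locked, 2 = locked, waiters possible. */
typedef struct { atomic_int state; } sketch_mutex;

static void sketch_lock(sketch_mutex *m)
{
    int c = 0;
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;                              /* uncontended acquire */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);   /* announce a waiter */
    while (c != 0) {
        futex_op(&m->state, FUTEX_WAIT, 2);  /* sleeps only if state == 2 */
        c = atomic_exchange(&m->state, 2);
    }
}

static void sketch_unlock(sketch_mutex *m)
{
    /* Fast path: after this exchange the lock is free, so a spinning
       thread can acquire it without any system call. */
    if (atomic_exchange(&m->state, 0) == 2) {
        /* Slow path: by now another thread may own the lock, or may
           have released it and freed or reused the memory. The wake
           below still targets the old address. */
        futex_op(&m->state, FUTEX_WAKE, 1);
    }
}
```

The gap between the exchange and the `FUTEX_WAKE` in `sketch_unlock` is where the pending wake-up described above comes from.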
I see two meaningful solutions for this:
(1) Clarify that spurious wake-ups (i.e., wake-ups that happen but are not due to FUTEX_WAKE calls by the code that created this futex) are allowed when FUTEX_WAIT returns 0. This changes the futex contract, but it's incompletely documented anyway. This doesn't break any code in glibc or other futex-using code that I'm aware of. Generally, unless you have a one-shot synchronization mechanism, typical futex uses will have to tolerate past slow-path wake-ups anyway, so they need to be robust to spurious wake-ups. Also, all programs using glibc have been and are affected by such spurious wake-ups anyway.
(2) Introduce a new futex type or variant of FUTEX_WAIT whose contract explicitly allows spurious wake-ups; combine it with a FUTEX_WAKE that wakes only these calls or futexes. This has the benefit that nothing in the contract of the original futex changes, but it requires a kernel implementation change, and many futex users (e.g., glibc) will need to change.
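As an illustration of what "robust to spurious wake-ups" means for option (1), here is a minimal C sketch of a waiter that treats a return from FUTEX_WAIT only as a hint and rechecks the futex word itself. The names `wait_until_set`, `set_and_wake`, and `raw_futex` are hypothetical, and the sketch assumes Linux with C11 atomics.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw futex system call. */
static long raw_futex(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, (int *)uaddr, op, val, NULL, NULL, 0);
}

/* Block until *flag becomes nonzero. The loop never trusts the return
   value of FUTEX_WAIT: a wake-up meant for an earlier futex at the same
   address, a signal, or a value mismatch all lead back to the recheck. */
static void wait_until_set(atomic_int *flag)
{
    while (atomic_load(flag) == 0)
        raw_futex(flag, FUTEX_WAIT, 0);  /* sleeps only while *flag == 0 */
}

/* Publish the state change first, then wake every waiter. */
static void set_and_wake(atomic_int *flag)
{
    atomic_store(flag, 1);
    raw_futex(flag, FUTEX_WAKE, INT_MAX);
}
```

Because the condition lives in the futex word and not in the wake-up, an extra wake-up costs one more loop iteration and nothing else.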
Solving this entirely in userspace is possible, but would degrade performance. If you allow only slow-path wake-ups, lock release latency increases and thus scalability decreases. Trying to implement some of the deferred memory reclamation schemes that Paul mentioned is hard for process-shared mutexes.
For more background, see https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html