The original idea is geared towards threadpools. If a user enqueues work that blocks or might block (e.g. locking a mutex, sending data on a socket, ...),
then the threadpool's efficiency is temporarily decreased, because the blocked worker still occupies a slot in the pool without using any CPU.
One way to solve this would be to overallocate threads, but that only brings unnecessary context switching. Another would be to require the user to use async I/O, but this tends to be rather complex and pushes the burden onto the threadpool user's code instead of the library itself.
The prototype implementation, as I saw later on, actually resembles what Tejun Heo is doing with his workqueue implementation. When one thread of the workqueue blocks, another one is woken to resume work. The big difference, of course, is that workqueues are kernel-only, while userspace could benefit from a similar approach.
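
To make the hand-off concrete, here is a minimal userspace sketch in Python. It is not the prototype itself and not how workqueues are implemented; the names BlockAwarePool, blocking() and demo() are made up for illustration. It assumes the task announces a blocking section itself, since userspace gets no notification from the kernel when a thread blocks, which is exactly the part kernel support would make unnecessary.

```python
import queue
import threading
from contextlib import contextmanager


class BlockAwarePool:
    """Hypothetical pool: at most `slots` workers run tasks at once.

    More threads than slots are created, but a semaphore gates how many
    may run. A task that is about to block gives its slot away so that
    another worker can pick up work in the meantime.
    """

    def __init__(self, slots, threads):
        self._slots = threading.Semaphore(slots)
        self._tasks = queue.Queue()
        for _ in range(threads):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, fn, *args):
        self._tasks.put((fn, args))

    def join(self):
        # Returns once every submitted task has finished.
        self._tasks.join()

    @contextmanager
    def blocking(self):
        # Entering: release our slot, which lets a waiting worker run.
        self._slots.release()
        try:
            yield
        finally:
            # Leaving: wait until a slot is free again before resuming.
            self._slots.acquire()

    def _run(self):
        while True:
            fn, args = self._tasks.get()
            with self._slots:
                try:
                    fn(*args)
                finally:
                    self._tasks.task_done()


def demo():
    # One slot, two threads: without the hand-off, `waiter` would hold
    # the only slot and `setter` could never run to unblock it.
    pool = BlockAwarePool(slots=1, threads=2)
    done = threading.Event()
    log = []

    def waiter():
        with pool.blocking():
            ok = done.wait(timeout=5)
        log.append(("waiter", ok))

    def setter():
        done.set()
        log.append(("setter", True))

    pool.submit(waiter)
    pool.submit(setter)
    pool.join()
    return sorted(log)


if __name__ == "__main__":
    print(demo())
```

The drawback is visible in waiter(): every potentially blocking call has to be wrapped by hand, so the burden is again on the threadpool user rather than on the library.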
At this moment I'm leaning towards implementing Ingo's solution, as that would not only benefit my use-case but could also be used as a true perf-counter to measure how well a given workload is using the available CPU power. It's also quite flexible.
Linus' solution could still be added later on as both solutions have no impact on each other as far as I can tell.