|
|
Log in / Subscribe / Register

Wait until all futexes are available?

Wait until all futexes are available?

Posted Jun 25, 2020 11:22 UTC (Thu) by itsmycpu (guest, #139639)
In reply to: Wait until all futexes are available? by glenn
Parent article: Rethinking the futex API

> An alternative approach is to have the consumer wait on a single futex. Now each producer is responsible for evaluating whether all of the inputs of the consumer have been satisfied. The last producer to satisfy these inputs wakes the consumer. This leads to a more efficient solution, but it is not without its trade-offs. A producer needs insight into the availability of data on the other edges of its consumer.

Your situation may have specifics that I am not aware of, so I can only hope that this is going to make sense:

Each producer keeps track of if a consumer has already been informed about the availability of data, in that consumer's list entry. That list entry of course also has a reference to the consumer. The consumer itself has an atomic counter for how many inputs it is still missing. When a producer is able to provide a new input, it reduces the consumer's (atomic) counter by 1. In case it reaches zero at that point, it wakes the consumer.

Instead of a counter, for example an atomic bitset can be used. Then the consumer's list entry contains the specific bit that this producer should set. When a producer sees that it has set the last necessary bit, it wakes the consumer (or just sets that flag if the consumer isn't waiting). This is what I have used in some cases. In my case, the consumer is often doing other work while the flag is set, so that it often does not have to wait, and the whole process takes place in user space.

> This implies that the producers must run within the same process or share some state in shared memory. This makes for a cross-cutting software architecture where a composable one would feel much more natural. The lifetime of shared memory is not fun to manage. There are also security implications.

As far as I know, using futexes between processes always requires shared memory, one way or the other. That probably makes them fastest solution when blocking threads is required on the slow path.

(Of course futexes should only be used on the slow path.)


to post comments

Wait until all futexes are available?

Posted Jun 30, 2020 15:42 UTC (Tue) by ncm (guest, #165) [Link]

I endorse the above. For nodes with a large number of inputs, each bit can represent another bitmap, recursively, until there are enough bits for all the inputs. When any bitmap is filled, the producer sets the bit representing it at the next level up. When the topmost bitmap is filled, the producer that filled it wakes the thread.

Bits are set by compare-and-swap. atomically, with no system call needed.

To do it with a counter, the producer atomically decrements a counter, and wakes the thread when it sees the transition to zero.

Waking the thread might mean writing a byte to a pipe it is sleeping on.

Or, the producer's thread might just add the consumer to a scoreboard of runnable tasks, and some already-running thread will get to it. That way, the kernel is never involved at all. You keep exactly as many threads as you have cores on the job, and they spin when there is no work. Isolate those cores (isolcpus, nohz_full, rcu_nocbs) so the kernel does not steal them.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds