|
|
Log in / Subscribe / Register

Rethinking the futex API

Rethinking the futex API

Posted Jun 19, 2020 6:42 UTC (Fri) by ras (subscriber, #33059)
In reply to: Rethinking the futex API by krisman
Parent article: Rethinking the futex API

> How do you wake up multiple threads from your producer (1) without issuing multiple syscalls to wake each futex,

If they are waiting on the same futex, one call will suffice. If they are waiting on multiple calls, multiple wakes will be needed of course. But remember, futex's optimised for the fast path. The fast path is when the lock is uncontested. In that case there no one is waiting, and no one has to be woken up, and so no syscalls at all. If you do have to wake another thread up, you've already incurred the overhead on one syscall: the FUTEX_WAIT that put it to sleep, so it's not an untenable additional penalty.

> (2) avoiding spurious wakeups of unrelated threads,

Why are there wakeups on unrelated threads? This wonderful utex library has recorded what utex_t's each thread is willing to be woken up in its shared memory, and wakes up as many as needed. It's not like the kernel can do it differently: it also will have to look up what user space threads are waiting on the futex, and wake some up. The basic algorithm and code doesn't change when it moves from kernel space to user space.

> (4) not relying in your waiters spinning (which throws cpu utilization through the roof)

Why is anybody spinning - kernel or user space? Well OK - the kernel might spin because your critical section isn't long and so it's faster than mutex_lock(). But that's just a speed optimisation - you could always just use a mutex_lock(). In user space spinning always requires a syscall - perhaps many if you end up spinning for a while, so you are always better off with the user space mutex_lock() equivalent, which is futex(), and hope it is uncontested. If it is uncontested there are no syscalls, and remember, if the kernel is spinning, then someone must has done a syscall to the kernel already, which is another way of saying futex_waitv() takes the syscall hit every time.

> Can you better describe what your model would look like in memory?

Ahh, that is a long answer. And you're stretching my memory because I was 30 years ago I did it for an OS I developed and sold. It's too long to go into it in detail, but briefly it was based on two primitives: one that looked like utex_waitv(). The second primitive was based on a new trick: every utex allowed you to attach an arbitrary number of child utex's on them. Waking up such a child transparently woke up the parent instead of the child, so a utex_waitv() on a child would never succeed. utex_waitv() returned the utex that woke up, which might be a child utex of one of the utex’s in the top level utex_waitv() vector.

utex_wake() becomes more complex of course: it has to check if the child had a parent, and indeed if the parent had a parent in order to wake up the right utex. I put the woken up utex’s in a queue list attached to the parent because I wanted to preserve the order of wakeups, but simply setting a flag in the child utex then having utex_wake() go looking for it would also work.

I actually had a 3rd feature which would probably be frowned upon today: you could attach a function to a utex and instead of waking it up, utex_wake() would call that function, effectively turning the utex into an interrupt like thingy. That was how the parent / child feature was actually implemented - the child had a function attached that woke up the parent. This feature has the interesting property of letting a utex transparently defer to some other mutex / condition mechanism. For example, instead of waking up a parent, you could write a byte to a fifo, or signal a semop(). The person calling the utex_wake() is none the wiser.

In the general case you have to lock a lot of things when you are altering the resulting tree of mutex's. In my case I can't remember that happening. Instead one thread, the one that the parent mutex added and deleted children and since the children can only be attached to one parent there is no contention.

BTW, I've never found a use for waking up more than one thread at a time. It obviously has it's uses if you really want to release a thundering herd, but a mutex usually guards a shared volatile resource that only one thread can change at a time, so there is no point waking more than one. If it discovers it doesn't want it, it can always wake someone else up.

This all requires a bit of work. You aren't dolling up futex's with a thin wrapper, you are implementing a new thing that uses futex's where the kernel would use spinlocks, IRQ masking or mutex_lock(). It is duplicating the existing code and data structures in the kernel. And there are more complexities I haven't mentioned, like when a wake calls a function that does a wait. However, because of that remarkable property of mutex that most code isn't executed unless there is contention, it doesn't matter speed wise because none of it is touched in the fast path.


to post comments


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds