Posted May 15, 2010 0:54 UTC (Sat) by vomlehn (subscriber, #45588)
Parent article: Adaptive spinning futexes
I actually implemented conditional user-space spinning in 2.4 some years ago and it had really nice performance. It relied on having a "callboard", i.e. a piece of memory that indicated, for each thread in which you were interested, whether it was running or not. The memory is registered with the kernel, which is responsible for updating it when the process state changes.
So, the idea is that you store the thread ID (or an index for the thread) in the spinlock. When you want the lock, you do a conditional load and store with your tid/thread index. If you get back zero, nobody has the lock and you're done. Otherwise, you use the tid/thread index you got back to check the callboard to see whether the thread holding the lock is running. If so, you loop again. If the thread holding the lock isn't running, you make a system call that sleeps until the spinlock value is zero (you pass the address to check in the system call).
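The acquire path described above can be sketched roughly as follows, using C11 atomics. This is a user-space illustration only, not the original 2.4 code: the names (callboard, cb_lock, cb_lock_try, cb_wait_until_zero) and the table size are made up, a compare-and-swap stands in for the conditional load and store, the callboard is an ordinary array rather than kernel-updated memory, and the "sleep until zero" system call is replaced by a stand-in that just yields and polls.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 64   /* hypothetical callboard size */

/* Callboard: one flag per thread index, nonzero while that thread is running.
 * In the scheme described, the kernel updates it; here it is a plain array.
 * Index 0 is unused because 0 in the lock word means "free". */
static _Atomic uint8_t callboard[MAX_THREADS + 1];

typedef struct { _Atomic uint32_t owner; } cb_lock;   /* 0 = free, else holder's index */

typedef enum { CB_ACQUIRED, CB_SPIN, CB_SLEEP } cb_step;

/* One attempt: take the lock, or decide whether to spin or to sleep. */
static cb_step cb_lock_try(cb_lock *l, uint32_t self)
{
    uint32_t holder = 0;
    if (atomic_compare_exchange_strong_explicit(&l->owner, &holder, self,
            memory_order_acquire, memory_order_relaxed))
        return CB_ACQUIRED;
    /* The failed exchange left the current holder's index in 'holder'. */
    if (holder <= MAX_THREADS &&
        atomic_load_explicit(&callboard[holder], memory_order_relaxed))
        return CB_SPIN;      /* holder is on a CPU, so spinning is worthwhile */
    return CB_SLEEP;         /* holder is not running, so give up the CPU */
}

/* Stand-in (assumption) for the system call that sleeps until *addr is zero. */
static void cb_wait_until_zero(_Atomic uint32_t *addr)
{
    while (atomic_load_explicit(addr, memory_order_acquire) != 0)
        sched_yield();
}

static void cb_lock_acquire(cb_lock *l, uint32_t self)
{
    for (;;) {
        switch (cb_lock_try(l, self)) {
        case CB_ACQUIRED: return;
        case CB_SPIN:     break;                              /* loop again */
        case CB_SLEEP:    cb_wait_until_zero(&l->owner); break;
        }
    }
}

/* Release by storing zero. How sleepers get woken is not covered above,
 * so it is left out here; the stand-in waiter simply polls. */
static void cb_lock_release(cb_lock *l, uint32_t self)
{
    uint32_t prev = atomic_exchange_explicit(&l->owner, 0, memory_order_release);
    assert(prev == self);
    (void)prev;
}
```

A thread with index 2 would call cb_lock_acquire(&lock, 2) and later cb_lock_release(&lock, 2). Splitting out cb_lock_try is only there to make the spin-or-sleep decision easy to see and to test.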
The performance of this was good, but the nicest part was that there wasn't a significant performance drop-off as the number of contenders for the lock went up. I no longer recall whether I was using a 4- or 8-processor machine, but I *think* it was 8. From the caching standpoint, the callboard is read often/write rarely, always a good thing. If your conditional load and store only causes a cache conversion to modified if the write actually happens, you also have good cache behavior there. (When I was working on this, the processor I was using would actually record a cache modification even if the condition wasn't met. Ick.)
The work never got pushed back and nothing ever came of it that I know of, which was kinda sad. Oracle had requested that we do this.
Posted May 17, 2010 17:10 UTC (Mon) by dvhart (guest, #19636)
In your 2.4 implementation, you allocated the memory in userspace and then told the kernel where it was? Along the same lines as SET_TID_ADDRESS(2)? Did you also pin this memory?
Adaptive spinning futex implementation
Posted May 18, 2010 1:45 UTC (Tue) by vomlehn (subscriber, #45588)
Yes, you allocated the memory any way you wanted, but using shared memory reduced the amount of work the kernel had to do because it could update the processes' states in only one place. Plus you needed shared memory for the spinlock part anyway. You're right about pinning the memory, too. We updated the state in the scheduler, so you couldn't go to sleep waiting for the memory to be paged in.
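For illustration, the user-space half of that setup might look something like the sketch below: map a shared region, then try to keep it resident. The helper name callboard_alloc is made up, mlock() is only a user-space approximation of the pinning the kernel side needed, and the call that registers the region with the kernel is not shown because its interface isn't described here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map 'len' bytes of zero-filled memory that stays
 * shared across fork(), then try to lock it into RAM. Returns NULL if the
 * mapping fails; *pinned reports whether mlock() succeeded. */
static void *callboard_alloc(size_t len, int *pinned)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    /* mlock() can fail if RLIMIT_MEMLOCK is too low; report it, don't abort. */
    *pinned = (mlock(p, len) == 0);
    return p;
}
```

Both the callboard and the spinlocks could live in a region obtained this way, which matches the point that shared memory was needed for the locks anyway.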