Posted Feb 9, 2008 7:40 UTC (Sat) by Nick (guest, #15060)
In reply to: Beautiful by elanthis
Parent article: Ticket spinlocks
Hi, this is Nick,
I must say first that Jon's writeup is very good, much better than my own attempts to explain
the problem/solution! :)
Secondly, I cannot take credit for the basic algorithm. I didn't actually look at any ticket
lock implementations before implementing this one; however, the basic idea of implementing a
FIFO using a continually incrementing head and tail index is not new at all.
If anything, I might take credit for distilling it down to somewhere close to the optimal x86
implementation. This is based just on the fact that I had a famous CPU designer ask if they
could use it as their company's reference ticket lock assembly sequence! (but I don't consider
myself an x86 guru, so I wouldn't be surprised if someone else coded this up well before me!)
The main trick I use, as Jon's article points out, is to increment the "next" part of the lock
and load the "owner" part in a single instruction. I do this by using 16 bit wide memory
operations on the 8+8 bit wide ticket lock.
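A minimal sketch of that idea in C (the names and the GCC `__atomic` builtins are my
illustration here, not the actual kernel code, and the layout assumes little-endian): a single
16-bit atomic add bumps the 8-bit "next" field and returns the "owner" field in the same
operation, so no separate load is needed.

```c
#include <stdint.h>

/* 8+8 bit ticket lock packed into one 16-bit word:
 * low byte = owner (ticket now being served),
 * high byte = next (next ticket to hand out).
 * The struct overlay assumes a little-endian machine (e.g. x86). */
typedef union {
    uint16_t slock;
    struct {
        uint8_t owner;
        uint8_t next;
    } s;
} ticket_lock_t;

static void ticket_lock(ticket_lock_t *lock)
{
    /* One atomic operation (lock xadd on x86) adds 1 to "next"
     * (the high byte) and returns the previous 16-bit value,
     * whose low byte already contains the current "owner". */
    uint16_t t = __atomic_fetch_add(&lock->slock, 0x0100,
                                    __ATOMIC_ACQUIRE);
    uint8_t ticket = (uint8_t)(t >> 8);   /* our ticket number */

    /* Spin until our ticket is being served. */
    while (__atomic_load_n(&lock->s.owner, __ATOMIC_ACQUIRE) != ticket)
        ;
}

static void ticket_unlock(ticket_lock_t *lock)
{
    /* An 8-bit increment of the owner byte alone hands the lock
     * to the next ticket holder (and cannot carry into "next"). */
    __atomic_fetch_add(&lock->s.owner, 1, __ATOMIC_RELEASE);
}
```

Note the unlock touches only the 8-bit owner field, so an owner wraparound from 0xff to 0x00
cannot carry into the next field.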
The basic implementation of the algorithm would first increment the next part, and then load
the owner part. Doing it like this is probably fine, but it increases the load on the L1
cache, and on some CPUs it may have to invoke the store forwarding logic or even stall
until the store exits the store queue.
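For contrast, that two-operation variant would look something like this (again my own sketch,
using the same hypothetical 8+8 little-endian layout as above, not kernel code):

```c
#include <stdint.h>

typedef union {
    uint16_t slock;
    struct {
        uint8_t owner;  /* ticket now being served */
        uint8_t next;   /* next ticket to hand out */
    } s;                /* little-endian overlay */
} ticket_lock_t;

static void ticket_lock_naive(ticket_lock_t *lock)
{
    /* Step 1: atomically take a ticket from "next". */
    uint8_t ticket = __atomic_fetch_add(&lock->s.next, 1,
                                        __ATOMIC_ACQUIRE);

    /* Step 2: a *separate* load of "owner". This read sits right
     * behind our own store to the same cache line, so it can hit
     * the store-forwarding logic or wait for the store to drain
     * from the store queue -- the cost the single-xadd version
     * avoids. */
    while (__atomic_load_n(&lock->s.owner, __ATOMIC_ACQUIRE) != ticket)
        ;
}
```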
To answer cpeterso's question: yes, in most situations, if you have enough contention on your
lock that starvation is a problem, then you likely have a larger problem with your algorithm.
However, what can happen is that you have a common-case-uncontended lock but 0.001% of your
users will actually generate a lot of contention on that lock. Even if the critical section
itself is very tiny, the cost of transferring the lock cacheline will end up being a limiting
factor as the frequency of acquisition and the core count increases. It is, IMO, better to
survive the corner cases gracefully than to squeeze another couple of cycles out of the
algorithm in the case that you expect. (Also, don't forget that sleeping locks are often
implemented using a spinlock.)
I could easily be mistaken, but I believe it will result in a better kernel. We'll see.