LWN.net Logo

Beautiful

Beautiful

Posted Feb 7, 2008 4:01 UTC (Thu) by elanthis (subscriber, #6227)
Parent article: Ticket spinlocks

Seeing algorithms like these come into being is nothing short of beautiful.

Is this an old trick just now brought to the Linux kernel, or something new Nick came up with?


(Log in to post comments)

Beautiful

Posted Feb 7, 2008 5:06 UTC (Thu) by walken (subscriber, #7089) [Link]

I think the "lamport's bakery" algorithm is similar to this. There may be subtle differences
though, I'm not an expert in locking algorithms...

20 years too late?

Posted Feb 7, 2008 18:33 UTC (Thu) by cpeterso (guest, #305) [Link]

I believe DEC implemented a similar lock queue algorithm in the Topaz operating system for
their Firefly multiprocessor system in the 1980s. I think their system avoided polling the
spinlock, thus eliminating memory bus contention. They had some sort of cascading queue of
locks.

If your spinlocks have this much contention, maybe they should sleep (in a wait queue) instead
of spinning?

Beautiful

Posted Feb 9, 2008 7:40 UTC (Sat) by Nick (subscriber, #15060) [Link]

Hi, this is Nick,

I must say first that Jon's writeup is very good, much better than my own attempts to explain
the problem/solution! :)

Secondly, I cannot take credit for the basic algorithm. I didn't actually look at any ticket
lock implementations before implementing this one, however the basic idea of implementing a
fifo using a continually incrementing head and tail index is not new at all.

If anything, I might take credit for distilling it down to somewhere close to the optimal x86
implementation. This is based just on the fact that I had a famous CPU designer ask if they
could use it as their company's reference ticket lock assembly sequence! (but I don't consider
myself an x86 guru, so I wouldn't be surprised if someone else coded this up well before me!)

The main trick I use, as Jon's article points to, is to increment the "next" part of the lock
and load the "owner" part in a single instruction. I do this by using 16 bit wide memory
operations on the 8+8 bit wide ticket lock.

The basic implementation of the algorithm would first increment the next part, and then load
the owner part. Doing it like this is probably fine, but it will increase the load on the L1
cache or on some CPUs it might have to invoke the store forwarding logic or even have to stall
until the store exists the store queue.

To answer cpeterso's question, yes in most situations, if you have enough contention on your
lock that starvation is a problem then sure you have a larger problem with your algorithm.
However, what can happen is that you have a common-case-uncontended lock but 0.001% of your
users will actually generate a lot of contention on that lock. Even if the critical section
itself is very tiny, the cost of transferring the lock cacheline will end up being a limiting
factor as the frequency of acquisition and the core count increases. It is, IMO, better to
survive the corner cases gracefully than to squeeze another couple of cycles out of the
algorithm in the case that you expect. (Also don't forget that often sleeping locks are
implemented using a spinlock)

I could easily be mistaken, but I believe it will result in a better kernel. We'll see.

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds