|
BeautifulBeautifulPosted Feb 9, 2008 7:40 UTC (Sat) by Nick (subscriber, #15060)In reply to: Beautiful by elanthis Parent article: Ticket spinlocks
Hi, this is Nick, I must say first that Jon's writeup is very good, much better than my own attempts to explain the problem/solution! :) Secondly, I cannot take credit for the basic algorithm. I didn't actually look at any ticket lock implementations before implementing this one, however the basic idea of implementing a fifo using a continually incrementing head and tail index is not new at all. If anything, I might take credit for distilling it down to somewhere close to the optimal x86 implementation. This is based just on the fact that I had a famous CPU designer ask if they could use it as their company's reference ticket lock assembly sequence! (but I don't consider myself an x86 guru, so I wouldn't be surprised if someone else coded this up well before me!) The main trick I use, as Jon's article points to, is to increment the "next" part of the lock and load the "owner" part in a single instruction. I do this by using 16 bit wide memory operations on the 8+8 bit wide ticket lock. The basic implementation of the algorithm would first increment the next part, and then load the owner part. Doing it like this is probably fine, but it will increase the load on the L1 cache or on some CPUs it might have to invoke the store forwarding logic or even have to stall until the store exists the store queue. To answer cpeterso's question, yes in most situations, if you have enough contention on your lock that starvation is a problem then sure you have a larger problem with your algorithm. However, what can happen is that you have a common-case-uncontended lock but 0.001% of your users will actually generate a lot of contention on that lock. Even if the critical section itself is very tiny, the cost of transferring the lock cacheline will end up being a limiting factor as the frequency of acquisition and the core count increases. It is, IMO, better to survive the corner cases gracefully than to squeeze another couple of cycles out of the algorithm in the case that you expect. (Also don't forget that often sleeping locks are implemented using a spinlock) I could easily be mistaken, but I believe it will result in a better kernel. We'll see.
(Log in to post comments)
|
Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds
Powered by Rackspace Managed Hosting.