Either the lock would be handled separately or the data would have a pointer to the "real" lock.
It strikes me as quite funny that adding a level of indirection could help performance in this case.
What about separating locks and data?
Posted Jan 4, 2013 17:04 UTC (Fri) by corbet (editor, #1)
If you move the lock out of the structure it is protecting, you have to allocate it in a separate step. The allocation could maybe be hidden in spin_lock_init(), but the freeing would require explicit action (in vast numbers of call sites) or some severe magic. Probably don't want to do that.
Regardless of whether the lock is relocated or not, the only way to avoid cache problems with surrounding data is to expand the lock to fill an entire cache line. That would bloat spinlocks considerably, and there are a lot of spinlocks in a running kernel. That overhead would hurt; among other things, the bloat, alone, would slow things by causing additional cache utilization.
Maybe I'm missing something (I often do), but, from what I can tell, trying to isolate the locks isn't really a viable solution.
Posted Jan 4, 2013 17:34 UTC (Fri) by renox (subscriber, #23785)
As for the memory bloat, I think these "separated spinlocks" would be useful only for the most contended spinlocks, which would somewhat mitigate the issue.
Of course, contention isn't the same on all systems, and developers working on embedded systems or supercomputers would probably disagree about which spinlocks have to be "separated"..
Posted Jan 5, 2013 22:53 UTC (Sat) by dlang (✭ supporter ✭, #313)
how do you know what the best cache-line size is to use on the processor that someone will buy next year and run the binary you compiled today on?
Posted Jan 5, 2013 23:50 UTC (Sat) by giraffedata (subscriber, #1954)
That would make this kind of memory layout feasible until they invent a more complex form of caching.
Posted Jan 5, 2013 18:07 UTC (Sat) by PaulMcKenney (subscriber, #9624)
Of course, as noted earlier in this thread, the opposite holds true at high levels of contention. Life is like that sometimes! ;-)
Posted Jan 4, 2013 18:58 UTC (Fri) by daney (subscriber, #24551)
1) Each load from the Now-Serving counter memory location creates traffic on the internal CPU buses. This traffic decreases the amount of useful work that can be done. Since the threads blocked waiting for their Now-Serving number to come up are not doing anything useful, decreasing the amount of bus traffic they are generating makes everything else go faster.
2) Cache line contention/bouncing caused by Ticket-Counter modifications and modifications to the data protected by the lock.
The first issue is (mostly) the one addressed by the patch in question. Increasing the size/alignment of arch_spinlock_t to occupy an entire cache line might be beneficial for some use cases, but it would increase the size of many structures, thus causing increased cache pressure.
Posted Jan 5, 2013 10:38 UTC (Sat) by renox (subscriber, #23785)
For the second issue, what you're describing (having the spinlock occupying an entire cache line) isn't always necessary: in some cases you could put 'cold' data in the same cache line as the lock to get the best performance without using too much memory.
Posted Jan 10, 2013 18:30 UTC (Thu) by ksandstr (guest, #60862)
That'd protect the significant cache lines not only from write-bouncing due to ticket acquisition, but also from any spinlock-related "oops, had to flush this exclusive cache line to RAM in the meantime" cases caused by read traffic. I'm guessing that an operation to acquire locks on N objects without ordering foulups could fit on top of that as well.
Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds