From: Jeremy Fitzhardinge <email@example.com>
To: Peter Zijlstra <firstname.lastname@example.org>
Subject: [PATCH RFC 0/7] x86: convert ticketlocks to C and remove duplicate code
Date: Thu, 23 Jun 2011 18:19:12 -0700
Cc: Linus Torvalds <email@example.com>,
"H. Peter Anvin" <firstname.lastname@example.org>, Ingo Molnar <email@example.com>,
the arch/x86 maintainers <firstname.lastname@example.org>,
Linux Kernel Mailing List <email@example.com>,
Nick Piggin <firstname.lastname@example.org>,
Jeremy Fitzhardinge <email@example.com>
From: Jeremy Fitzhardinge <firstname.lastname@example.org>
I'm proposing this series for 3[.0].1.
[ Change since last post: added xadd helpers. The inc helper didn't
seem very useful. ]
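An xadd helper boils down to "atomically add and give back the previous value", which is exactly how a ticket lock hands out tickets. A minimal userspace sketch, assuming GCC or Clang `__atomic` builtins; the `xadd` macro and `take_ticket()` below are hypothetical illustrations, not the helpers from the series:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for an xadd() helper: atomically add 'inc' to
 * '*ptr' and evaluate to the value '*ptr' held before the addition.
 * On x86, GCC typically emits "lock xadd" for this when the old value
 * is used.
 */
#define xadd(ptr, inc) __atomic_fetch_add((ptr), (inc), __ATOMIC_SEQ_CST)

/* Hypothetical example: take the next ticket from an 8-bit counter. */
static inline uint8_t take_ticket(uint8_t *tail)
{
	return xadd(tail, 1);	/* the old tail is our ticket number */
}
```

Because the old value comes back from the same atomic operation, two CPUs racing for a ticket always end up with distinct numbers, and an 8-bit counter simply wraps from 255 to 0.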
This is a repost of a series to clean up the x86 ticket lock
code by converting it to a mostly C implementation and removing
lots of duplicate code relating to the ticket size.
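To give a feel for what a "mostly C" ticket lock looks like, here is a userspace sketch assuming 8-bit tickets and GCC/Clang `__atomic` builtins. The names `struct ticketlock`, `ticket_lock()`, `ticket_unlock()` and `cpu_relax_hint()` are hypothetical; this is not the code from the series:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 8-bit tickets, as used when NR_CPUS < 256. */
struct ticketlock {
	uint8_t head;	/* ticket currently being served */
	uint8_t tail;	/* next ticket to hand out */
};

static inline void cpu_relax_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__("rep; nop" ::: "memory");	/* PAUSE hint */
#endif
}

static void ticket_lock(struct ticketlock *lock)
{
	/* Atomically take a ticket: the old tail is our place in the queue. */
	uint8_t me = __atomic_fetch_add(&lock->tail, 1, __ATOMIC_ACQUIRE);

	/* Spin in plain C until the previous holder advances head to us. */
	while (__atomic_load_n(&lock->head, __ATOMIC_ACQUIRE) != me)
		cpu_relax_hint();
}

static void ticket_unlock(struct ticketlock *lock)
{
	/* Only the holder writes head, so a release store of head + 1 is enough. */
	uint8_t next = (uint8_t)(__atomic_load_n(&lock->head, __ATOMIC_RELAXED) + 1);

	__atomic_store_n(&lock->head, next, __ATOMIC_RELEASE);
}
```

Only the ticket grab needs an atomic read-modify-write; the spin loop and the unlock are ordinary loads and stores, and in this sketch changing the ticket width is mostly a matter of changing the field types.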
The last time I posted this series, the only significant comments
were from Nick Piggin, specifically relating to:
1. A wrongly placed barrier on unlock (which may have allowed the
compiler to move things out of the locked region). I went
belt-and-suspenders by having two barriers to prevent motion
into or out of the locked region.
2. With NR_CPUS < 256 the ticket size is 8 bits. The compiler doesn't
use the same trick as the hand-coded asm to directly compare the high
and low bytes in the word, but does a bit of extra shuffling around.
However, the Intel optimisation guide and several x86 experts have
opined that it's best to avoid the high-byte operations anyway, since
they will cause a partial word stall, and the gcc-generated code should
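As a rough illustration of the barrier placement described in point 1, here is a userspace sketch of an unlock with one compiler barrier on each side of the store that releases the lock. `barrier()` mirrors the usual GCC idiom (an empty asm with a "memory" clobber); `struct ticketlock` and `ticket_unlock()` are hypothetical names, not the series' code:

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier: emits no instruction, but the compiler may not
 * move memory accesses across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

struct ticketlock {
	uint8_t head;	/* ticket currently being served */
	uint8_t tail;	/* next ticket to hand out */
};

/*
 * "Belt-and-suspenders" unlock: one barrier before the releasing store and
 * one after it. Compiler barriers alone are only enough on x86, where the
 * CPU does not reorder a store with earlier loads or stores; other
 * architectures need real release ordering here.
 */
static void ticket_unlock(struct ticketlock *lock)
{
	uint8_t next;

	barrier();	/* critical-section accesses may not sink below the unlock */
	next = (uint8_t)(__atomic_load_n(&lock->head, __ATOMIC_RELAXED) + 1);
	__atomic_store_n(&lock->head, next, __ATOMIC_RELAXED);
	barrier();	/* later accesses may not be hoisted into the locked region */
}
```

The first barrier is the one that matters for correctness; the second one only stops the compiler from pulling later code into the critical section, which is harmless but lengthens the time the lock is held.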
Overall, the compiler-generated code is very similar to the hand-coded
versions, with the partial byte operations being the only significant
difference. (Curiously, gcc does generate a high-byte compare for me
in trylock, so it can if it wants to.)
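Point 2 and the trylock remark are about comparing head and tail when both are 8-bit fields packed into one 16-bit word. The sketch below assumes a little-endian layout like x86's, with head in the low byte and tail in the high byte. It writes the byte comparisons in plain C and implements trylock as a single compare-and-exchange on the whole word. The union, `TICKET_SHIFT` and the function names are hypothetical, and whether a high-byte compare appears in the generated code is left to the compiler:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this sketch assumes a little-endian (x86-like) byte order"
#endif

#define TICKET_SHIFT 8	/* tail lives in the high byte of head_tail */

typedef union {
	uint16_t head_tail;	/* both tickets, read or cmpxchg'd at once */
	struct {
		uint8_t head;	/* low byte on little-endian */
		uint8_t tail;	/* high byte on little-endian */
	} tickets;
} ticketlock_t;

static bool ticket_is_locked(ticketlock_t *lock)
{
	ticketlock_t tmp = { .head_tail = __atomic_load_n(&lock->head_tail, __ATOMIC_RELAXED) };

	return tmp.tickets.head != tmp.tickets.tail;	/* compare the two bytes */
}

static bool ticket_is_contended(ticketlock_t *lock)
{
	ticketlock_t tmp = { .head_tail = __atomic_load_n(&lock->head_tail, __ATOMIC_RELAXED) };

	/* 8-bit arithmetic keeps the waiter count right when tickets wrap. */
	return (uint8_t)(tmp.tickets.tail - tmp.tickets.head) > 1;
}

static bool ticket_trylock(ticketlock_t *lock)
{
	ticketlock_t old = { .head_tail = __atomic_load_n(&lock->head_tail, __ATOMIC_RELAXED) };
	uint16_t new_word;

	if (old.tickets.head != old.tickets.tail)
		return false;	/* already held: fail rather than queue */

	/* Bump tail (the high byte) and publish it with one cmpxchg. */
	new_word = (uint16_t)(old.head_tail + (1 << TICKET_SHIFT));
	return __atomic_compare_exchange_n(&lock->head_tail, &old.head_tail, new_word,
					   false, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}
```

Doing trylock as one compare-and-exchange on the whole word means a ticket is only taken when the lock was observed free and nobody changed it in between, so a failed trylock never leaves the caller queued.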
I've been running with this code in place for several months on 4-core
systems without any problems.
I couldn't measure a consistent performance difference between the two
implementations; there seemed to be about +/- 1% variation, which is the
level of variation I see from simply recompiling the kernel with slightly
different code alignment.
Overall, I think the large reduction in code size is a big win.
Jeremy Fitzhardinge (7):
x86/ticketlock: clean up types and accessors
x86/ticketlock: convert spin loop to C
x86/ticketlock: Use C for __ticket_spin_unlock
x86/ticketlock: make large and small ticket versions of spin_lock the same
x86/ticketlock: make __ticket_spin_lock common
x86/ticketlock: make __ticket_spin_trylock common
x86/ticketlock: prevent memory accesses from reordered out of lock
arch/x86/include/asm/spinlock.h | 147 ++++++++++++---------------------
arch/x86/include/asm/spinlock_types.h | 22 +++++-
2 files changed, 74 insertions(+), 95 deletions(-)