From: Heiko Carstens <heiko.carstens@de.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>,
    Ingo Molnar <mingo@elte.hu>,
    Linus Torvalds <torvalds@linux-foundation.org>,
    David Miller <davem@davemloft.net>,
    Benjamin Herrenschmidt <be
Subject: [patch 0/8] Allow inlined spinlocks again V5
Date: Sat, 29 Aug 2009 12:21:15 +0200
Cc: linux-arch@vger.kernel.org,
    Peter Zijlstra <a.p.zijlstra@chello.nl>,
    Arnd Bergmann <arnd@arndb.de>,
    Nick Piggin <nickpiggin@yahoo.com.au>,
    Martin Schwidefsky <schwidefsky@de.ibm.com>,
    Horst Hartmann <horsth@linux.vnet.ibm.com>,
    Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>,
    Heiko Carstens <heiko.carstens@de.ibm.com>

This patch set makes it possible to inline spinlocks again.
V2: rewritten from scratch - now also with readable code
V3: removed the macro that generated out-of-line spinlock variants, since
it would break ctags (as requested by Arnd Bergmann).
V4: allow architectures to specify for each lock/unlock variant whether
it should be kept out-of-line or inlined.
V5: simplify ifdefs as pointed out by Linus. Fix architecture compile
breakages caused by this change.
Linus, Ingo, do you still have objections?
---
The rationale behind this is that function calls are expensive, at least
on s390.
Server kernels are usually compiled with !CONFIG_PREEMPT, in which case a
simple spin_lock is just a compare-and-swap loop. Relative to that, the
extra overhead of a function call is significant.
With inlined spinlocks, overall CPU usage is reduced by 1%-5% on s390.
These numbers were measured with some network benchmarks, but I expect
any workload that calls into the kernel frequently and grabs a few
locks to perform better.
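
To illustrate what "just a compare-and-swap loop" means, here is a minimal
userspace sketch using C11 atomics. It is not taken from the patches and is
not the s390 implementation; the names demo_spinlock_t, demo_spin_lock,
demo_spin_unlock and demo_count are made up for this example. The point is
only that the lock body is a handful of instructions, so the call/return
around it is a noticeable fraction of the total cost.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type for illustration; not the kernel's spinlock_t. */
typedef struct {
	atomic_int locked;	/* 0 = free, 1 = held */
} demo_spinlock_t;

static inline void demo_spin_lock(demo_spinlock_t *lock)
{
	int expected = 0;

	/* Compare and swap 0 -> 1; retry until it succeeds. */
	while (!atomic_compare_exchange_weak_explicit(&lock->locked,
						      &expected, 1,
						      memory_order_acquire,
						      memory_order_relaxed))
		expected = 0;	/* a failed CAS stored the current value here */
}

static inline void demo_spin_unlock(demo_spinlock_t *lock)
{
	atomic_store_explicit(&lock->locked, 0, memory_order_release);
}

/* Takes and releases the lock once per iteration; returns the counter. */
static int demo_count(int iterations)
{
	demo_spinlock_t lock;
	int counter = 0;
	int i;

	atomic_init(&lock.locked, 0);
	for (i = 0; i < iterations; i++) {
		demo_spin_lock(&lock);
		counter++;
		demo_spin_unlock(&lock);
	}
	assert(atomic_load(&lock.locked) == 0);	/* lock is free again */
	return counter;
}
```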
The implementation is straightforward: move the function bodies of the
locking functions to static inline functions and place them in a header
file.
By default all locking code remains out-of-line. An architecture can
specify
	#define __spin_lock_is_small
in arch/<whatever>/include/asm/spinlock.h to force inlining of the
corresponding locking function.
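
The scheme described above could look roughly like the following sketch. It
is a simplified single-file illustration under my own assumptions, not the
code from the patches: only the __spin_lock_is_small define comes from this
series, while demo_lock_t, demo_spin_lock_body, demo_spin_lock and
demo_lock_once are hypothetical names, and the lock body is a plain C11
atomic exchange loop.

```c
#include <assert.h>
#include <stdatomic.h>

/* What an architecture would put in arch/<whatever>/include/asm/spinlock.h
 * to opt in. Remove this line to get the out-of-line variant instead. */
#define __spin_lock_is_small

/* Hypothetical lock type; stands in for the kernel's spinlock_t. */
typedef struct {
	atomic_int locked;	/* 0 = free, 1 = held */
} demo_lock_t;

/* "Header" part: the function body lives in a static inline function. */
static inline void demo_spin_lock_body(demo_lock_t *lock)
{
	while (atomic_exchange_explicit(&lock->locked, 1,
					memory_order_acquire))
		;	/* spin until the previous value was 0 */
}

#ifdef __spin_lock_is_small
/* Inlined: callers expand directly to the static inline body. */
#define demo_spin_lock(lock) demo_spin_lock_body(lock)
#else
/* Default: one out-of-line copy that callers reach via a function call. */
void demo_spin_lock(demo_lock_t *lock)
{
	demo_spin_lock_body(lock);
}
#endif

/* Takes the lock once and returns its state (1 = held). */
static int demo_lock_once(void)
{
	demo_lock_t lock;

	atomic_init(&lock.locked, 0);
	assert(atomic_load(&lock.locked) == 0);
	demo_spin_lock(&lock);
	return atomic_load(&lock.locked);
}
```

Callers use demo_spin_lock() in both configurations; only the define decides
whether that ends up as a function call or as the inlined body.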
Cross-compile tested with defconfig for alpha, arm, x86, x86_64, ia64, m68k,
m68knommu, mips, powerpc, powerpc64, sparc64, s390, s390x.