Out-of-lining spinlocks

Spinlocks, as the core kernel synchronization primitive, are highly performance-critical. They are implemented differently on each architecture, by way of some carefully crafted assembly code, so that not one extra cycle is spent there, especially when the lock is not contended. They are also implemented as inline assembly, so that no function call gets in the way of that fast path.

Recently, however, Zwane Mwaikambo has pulled a patch out of the -tiny tree which moves spinlocks into normal, out-of-line functions - at least, on the x86 and x86-64 architectures. The reason for doing this is to shrink the kernel; there are a lot of spinlock calls in the kernel, and the inline code gets replicated for every one of them. Moving the spinlock code out of line gets rid of that duplication, and shrinks the kernel text size by 50KB or so.
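The trade-off can be illustrated with a minimal sketch in portable C11, which is not the kernel's actual implementation (that is per-architecture assembly). The `toy_` names are hypothetical, and `__attribute__((noinline))` is assumed to be available, as it is in GCC and Clang. The inline variant lets the compiler copy the spin loop into every call site; the out-of-line variant exists once, and each call site shrinks to a function call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical toy lock, for illustration only. */
typedef struct {
    atomic_flag locked;
} toy_spinlock_t;

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Inline variant: the loop body is typically duplicated at each call site. */
static inline void toy_spin_lock_inline(toy_spinlock_t *lock)
{
    assert(lock != NULL);
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire))
        ; /* spin until the current holder releases the lock */
}

/* Out-of-line variant: one shared copy of the loop in the whole program. */
__attribute__((noinline))
void toy_spin_lock_outofline(toy_spinlock_t *lock)
{
    assert(lock != NULL);
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire))
        ; /* same loop, reached through a function call */
}

/* Single attempt: returns 1 if the lock was taken, 0 if it was already held. */
static inline int toy_spin_trylock(toy_spinlock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked,
                                              memory_order_acquire);
}

static inline void toy_spin_unlock(toy_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

Both variants behave identically from the caller's point of view; only the placement of the generated code differs, which is where the text-size saving comes from.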

Zwane posted some benchmarks showing that there are no performance regressions. In fact, on some hardware, the improved cache utilization brought about by pulling together the spinlock code can actually improve performance by a slight amount.

The patch comes with a configuration option allowing the spinlock code to be built in either mode. Given that moving the code out of line seems to be a win, some have wondered if things shouldn't always be done that way. Linus pointed out one advantage to the inline code: it makes the sources of lock contention very clear in kernel profiles. With out-of-line spinlocks, all a profile will show is that a lot of time was spent waiting for locks; with the code inline, the function which is actually waiting for the lock shows up instead. So out-of-line locks may be best for production kernels, but developers may want to keep them inline.
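One way an out-of-line lock could still point at the contending code is to charge the wait to its call site. The following is only a sketch under stated assumptions, not something the patch does: the `toy_` names and the `max_spins` bound are hypothetical (the bound exists so the example terminates), and `__builtin_return_address(0)` is a GCC/Clang builtin rather than standard C.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical toy lock, for illustration only. */
typedef struct {
    atomic_flag locked;
} toy_spinlock_t;

/* Hypothetical record of which call site had to wait, and for how long. */
struct toy_wait_record {
    void *caller;        /* return address of the call site that spun */
    unsigned long spins; /* number of failed acquisition attempts */
};

/*
 * Out-of-line lock that attributes contention to its caller.
 * Returns 1 once the lock is taken, or 0 after max_spins failed attempts.
 */
__attribute__((noinline))
int toy_spin_lock_profiled(toy_spinlock_t *lock, struct toy_wait_record *rec,
                           unsigned long max_spins)
{
    unsigned long spins = 0;

    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire)) {
        /* Contended: remember who called us, not where we are spinning. */
        rec->caller = __builtin_return_address(0);
        rec->spins++;
        if (++spins >= max_spins)
            return 0;
    }
    return 1;
}
```

The uncontended path touches no profiling state, so the extra cost is paid only when a caller actually has to wait.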



Out-of-lining spinlocks

Posted Aug 12, 2004 9:16 UTC (Thu) by rjw (guest, #10415)

Surely that is an issue better solved by the profiling tools (i.e., by allowing certain functions' cost to be assigned to their callers) rather than by making your code fit the tool?

I'd be really surprised if no profiling tools allowed that.

Out-of-lining spinlocks

Posted Aug 12, 2004 15:10 UTC (Thu) by jmshh (guest, #8257)

The issue is better solved by the profiling tools, but for most architectures they don't know how to identify the caller. The missing item is the return address.

There are two solutions:

  1. Compile the code using a frame pointer (FP) for each routine. For most architectures (especially register-starved ones like x86) the code will be significantly larger and slower.
  2. Analyze the code statically and, for every possible PC, calculate an offset from SP where the return address is found. This requires a disassembler in the tool.

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds