From: Zachary Amsden <zamsden-AT-redhat.com>
To: "H. Peter Anvin" <hpa-AT-zytor.com>
Subject: Re: [PATCH] x86 rwsem optimization extreme
Date: Wed, 17 Feb 2010 15:03:52 -1000
Cc: Linus Torvalds <torvalds-AT-linux-foundation.org>,
    linux-kernel-AT-vger.kernel.org, Thomas Gleixner <tglx-AT-linutronix.de>,
    Ingo Molnar <mingo-AT-redhat.com>, x86-AT-kernel.org,
    Avi Kivity <avi-AT-redhat.com>

> On 02/17/2010 02:10 PM, Linus Torvalds wrote:
>> The cost of 'adc' may happen to be identical in this case, but I suspect
>> you didn't test on UP, where the 'lock' prefix goes away. An unlocked
>> 'add' tends to be faster than an unlocked 'adc'.
>> (It's possible that some micro-architectures don't care, since it's a
>> memory op, and they can see that 'C' is set. But it's a fragile assumption
>> that it would always be ok).
> FWIW, I don't know of any microarchitecture where adc is slower than
> add, *as long as* the setup time for the CF flag is already used up.
> However, as I already commented, I don't think this is worth it. This
> inline appears to only be instantiated once, and as such, it takes a
> whopping six bytes across the entire kernel.
Without the locks,

    stc; adc %rdx, (%rax)
    add %rdx, (%rax)

show no statistically significant difference on Intel.
On AMD, the first form is about twice as expensive.
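
Note that the two sequences do not store the same value: with CF set by
stc, the adc form adds %rdx + 1, so the comparison above is about cost
only. A minimal C sketch of the arithmetic, assuming 64-bit wraparound
and ignoring the resulting flags and any atomicity; the function names
are hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Model of "add %rdx, (%rax)": *mem += val (wraps modulo 2^64). */
static void model_add(uint64_t *mem, uint64_t val)
{
    *mem += val;
}

/* Model of "stc; adc %rdx, (%rax)": stc forces CF = 1,
 * then adc computes *mem += val + CF. */
static void model_stc_adc(uint64_t *mem, uint64_t val)
{
    const uint64_t carry = 1;   /* CF as set by stc */
    *mem += val + carry;
}

/* Returns 1 if stc/adc with val matches a plain add of val + 1. */
static int models_agree(uint64_t start, uint64_t val)
{
    uint64_t a = start, b = start;

    model_stc_adc(&a, val);
    model_add(&b, val + 1);
    return a == b;
}

/* Small self-check, including the wraparound case. */
static void model_self_check(void)
{
    assert(models_agree(10, 5));
    assert(models_agree(UINT64_MAX, 0));
}
```

This only models what gets stored to memory; it says nothing about the
timing difference, which depends on the micro-architecture.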
Of course, this is all completely useless, but it would not be if the locks
were inline (which is actually an askable question now). There was just so
much awesomeness going on with the 64-bit rwsem constructs that I felt I had
to add even more awesomeness to the plate. For some definition of awesomeness.