x86/percpu: Use C for arch_raw_cpu_ptr()
From: Uros Bizjak <ubizjak-AT-gmail.com>
To: x86-AT-kernel.org, linux-kernel-AT-vger.kernel.org
Subject: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()
Date: Tue, 10 Oct 2023 18:42:29 +0200
Message-ID: <20231010164234.140750-1-ubizjak@gmail.com>
Cc: Uros Bizjak <ubizjak-AT-gmail.com>, Nadav Amit <namit-AT-vmware.com>, Andy Lutomirski <luto-AT-kernel.org>, Brian Gerst <brgerst-AT-gmail.com>, Denys Vlasenko <dvlasenk-AT-redhat.com>, "H. Peter Anvin" <hpa-AT-zytor.com>, Linus Torvalds <torvalds-AT-linux-foundation.org>, Peter Zijlstra <peterz-AT-infradead.org>, Thomas Gleixner <tglx-AT-linutronix.de>, Josh Poimboeuf <jpoimboe-AT-redhat.com>
Implementing arch_raw_cpu_ptr() in C allows the compiler to perform
better optimizations, such as setting an appropriate base to compute
the address instead of using an add instruction.

E.g.: the address calculation in amd_pmu_enable_virt() improves from:

      48 c7 c0 00 00 00 00 	mov    $0x0,%rax
			87b7: R_X86_64_32S	cpu_hw_events
      65 48 03 05 00 00 00 	add    %gs:0x0(%rip),%rax
      00
			87bf: R_X86_64_PC32	this_cpu_off-0x4
      48 c7 80 28 13 00 00 	movq   $0x0,0x1328(%rax)
      00 00 00 00

to:

      65 48 8b 05 00 00 00 	mov    %gs:0x0(%rip),%rax
      00
			8798: R_X86_64_PC32	this_cpu_off-0x4
      48 c7 80 00 00 00 00 	movq   $0x0,0x0(%rax)
      00 00 00 00
			87a6: R_X86_64_32S	cpu_hw_events+0x1328

Co-developed-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/percpu.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 60ea7755c0fe..cdc188279c5a 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -49,6 +49,19 @@
 #define __force_percpu_prefix	"%%"__stringify(__percpu_seg)":"
 #define __my_cpu_offset		this_cpu_read(this_cpu_off)
 
+#ifdef CONFIG_USE_X86_SEG_SUPPORT
+/*
+ * Efficient implementation for cases in which the compiler supports
+ * named address spaces.  Allows the compiler to perform additional
+ * optimizations that can save more instructions.
+ */
+#define arch_raw_cpu_ptr(ptr)					\
+({								\
+	unsigned long tcp_ptr__;				\
+	tcp_ptr__ = __raw_cpu_read(, this_cpu_off) + (unsigned long)(ptr); \
+	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;		\
+})
+#else /* CONFIG_USE_X86_SEG_SUPPORT */
 /*
  * Compared to the generic __my_cpu_offset version, the following
  * saves one instruction and avoids clobbering a temp register.
@@ -61,6 +74,8 @@
 		      : "m" (__my_cpu_var(this_cpu_off)), "0" (ptr));	\
 	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;			\
 })
+#endif /* CONFIG_USE_X86_SEG_SUPPORT */
+
 #else	/* CONFIG_SMP */
 #define __percpu_seg_override
 #define __percpu_prefix		""
-- 
2.41.0
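[Editor's note: for readers unfamiliar with the named-address-space feature
the patch relies on, here is a minimal standalone sketch of the idea,
assuming GCC 6 or later on x86-64, where the __seg_gs qualifier is
available. The names my_cpu_offset, counter, cpu_ptr_sketch() and
set_slot() are hypothetical and exist only for illustration; the kernel's
actual implementation is the arch_raw_cpu_ptr() macro in the diff above.

	/*
	 * Sketch only: mimics what CONFIG_USE_X86_SEG_SUPPORT turns on.
	 * The __seg_gs qualifier places an object in the %gs-based
	 * address space, so a plain C read compiles to
	 * "mov %gs:sym(%rip),%reg" with no inline assembly.
	 */
	extern unsigned long __seg_gs my_cpu_offset;	/* hypothetical; plays the role of this_cpu_off */
	extern unsigned long counter[];			/* hypothetical per-CPU array */

	static unsigned long *cpu_ptr_sketch(unsigned long *ptr)
	{
		/* base = per-CPU offset, read straight from %gs */
		return (unsigned long *)(my_cpu_offset + (unsigned long)ptr);
	}

	void set_slot(void)
	{
		/*
		 * Because the offset load is ordinary C, the optimizer
		 * can fold the symbol and the constant index into the
		 * displacement, e.g.:
		 *   mov  %gs:my_cpu_offset(%rip),%rax
		 *   movq $0x0,counter+0x28(%rax)
		 * the same shape as the improved amd_pmu_enable_virt()
		 * code in the commit message.
		 */
		*cpu_ptr_sketch(&counter[5]) = 0;
	}

With the asm-based fallback (the #else branch in the diff), the offset
load is opaque to the compiler, which is why it previously had to
materialize the symbol address with a separate mov and then add the
per-CPU offset.]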