Eliminate the no-SIMD en/decryption fallbacks on x86
From: Eric Biggers <ebiggers-AT-kernel.org>
To: x86-AT-kernel.org
Subject: [RFC PATCH 0/2] Eliminate the no-SIMD en/decryption fallbacks on x86
Date: Wed, 19 Feb 2025 21:13:23 -0800
Message-ID: <20250220051325.340691-1-ebiggers@kernel.org>
Cc: linux-crypto-AT-vger.kernel.org, linux-kernel-AT-vger.kernel.org, Ard Biesheuvel <ardb-AT-kernel.org>, Ben Greear <greearb-AT-candelatech.com>, Xiao Liang <shaw.leon-AT-gmail.com>, Thomas Gleixner <tglx-AT-linutronix.de>, Ingo Molnar <mingo-AT-redhat.com>, Borislav Petkov <bp-AT-alien8.de>, Dave Hansen <dave.hansen-AT-linux.intel.com>, Andy Lutomirski <luto-AT-kernel.org>, "Jason A . Donenfeld" <Jason-AT-zx2c4.com>
The patchset can also be retrieved from:

    git fetch https://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/... x86-softirq-fpu-fix-v1

This patchset fixes a longstanding issue where kernel-mode FPU (i.e., SIMD) was not reliably usable in softirqs on x86, which created the need for a fallback. The fallback was really bad for performance, and it even hurt performance for users who never encountered the edge case where kernel-mode FPU was not usable.

This patchset aligns x86 with other architectures such as arm, arm64, and riscv by making kernel-mode FPU work reliably in softirqs. There are a few possible ways to achieve that, and for now I just went with the simplest one; see patch 1 for details.

Patch 2 eliminates all uses of the "crypto SIMD helper" from x86, as patch 1 makes it unnecessary. For the RFC it is just one big patch; I'll probably split patch 2 up if this progresses past RFC status.

Performance results have been positive. All en/decryption is now slightly faster on x86, as it no longer takes a detour through crypto/simd.c. I get a 7% or 23% improvement for AES-XTS, for example.

I also benchmarked bidirectional IPsec, which has been claimed to often hit the edge case where kernel-mode FPU was previously not usable in softirq context. In the end, I was not able to reproduce that edge case unless I reduced the number of CPUs to 1, at which point it started being reached occasionally. Regardless, even without that case being reached, IPsec throughput still improved by 2%. In situations where that case was being reached, or where users required a synchronous algorithm, a much larger improvement should be seen.
Eric Biggers (2):
  x86/fpu: make kernel-mode FPU reliably usable in softirqs
  crypto: x86 - stop using the SIMD helper

 arch/x86/crypto/Kconfig                    |  14 --
 arch/x86/crypto/aegis128-aesni-glue.c      |  13 +-
 arch/x86/crypto/aesni-intel_glue.c         | 168 ++++++++-------------
 arch/x86/crypto/aria_aesni_avx2_glue.c     |  22 +--
 arch/x86/crypto/aria_aesni_avx_glue.c      |  20 +--
 arch/x86/crypto/aria_gfni_avx512_glue.c    |  22 +--
 arch/x86/crypto/camellia_aesni_avx2_glue.c |  21 +--
 arch/x86/crypto/camellia_aesni_avx_glue.c  |  21 +--
 arch/x86/crypto/cast5_avx_glue.c           |  21 +--
 arch/x86/crypto/cast6_avx_glue.c           |  20 +--
 arch/x86/crypto/serpent_avx2_glue.c        |  21 +--
 arch/x86/crypto/serpent_avx_glue.c         |  21 +--
 arch/x86/crypto/serpent_sse2_glue.c        |  21 +--
 arch/x86/crypto/sm4_aesni_avx2_glue.c      |  30 ++--
 arch/x86/crypto/sm4_aesni_avx_glue.c       |  30 ++--
 arch/x86/crypto/twofish_avx_glue.c         |  21 +--
 arch/x86/include/asm/fpu/api.h             |  17 +--
 arch/x86/kernel/fpu/core.c                 |  37 ++---
 18 files changed, 180 insertions(+), 360 deletions(-)

base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
prerequisite-patch-id: ec1feea7e6f4d03e4e4c64c492197b89c957611a
-- 
2.48.1