KVM: SVM: Rework ASID management
From: Yosry Ahmed <yosry.ahmed-AT-linux.dev>
To: Sean Christopherson <seanjc-AT-google.com>
Subject: [RFC PATCH 00/24] KVM: SVM: Rework ASID management
Date: Wed, 26 Mar 2025 19:35:55 +0000
Message-ID: <20250326193619.3714986-1-yosry.ahmed@linux.dev>
Cc: Paolo Bonzini <pbonzini-AT-redhat.com>, Jim Mattson <jmattson-AT-google.com>, Maxim Levitsky <mlevitsk-AT-redhat.com>, Vitaly Kuznetsov <vkuznets-AT-redhat.com>, Rik van Riel <riel-AT-surriel.com>, Tom Lendacky <thomas.lendacky-AT-amd.com>, x86-AT-kernel.org, kvm-AT-vger.kernel.org, linux-kernel-AT-vger.kernel.org, Yosry Ahmed <yosry.ahmed-AT-linux.dev>
This series reworks how SVM manages ASIDs by:

(a) Allocating a single static ASID for each L1 VM, instead of dynamically
allocating ASIDs. This simplifies the logic and allows for more unification
between SVM and SEV, as the latter already uses per-VM ASIDs as required for
other purposes. This is patches 1 to 10.

(b) Using a separate ASID for L2 VMs. Instead of using the same ASID for L1
and L2 guests and doing a TLB flush and MMU sync on every nested transition,
a separate ASID is used and TLB flushes are done conditionally as needed.
This is patches 11 through the end.

The advantages of this are:

- Simplifying the logic by dropping dynamic ASID allocation.
- Unifying some logic between SVM and SEV, as the latter already uses per-VM
  ASIDs as required for other purposes.
- Enabling INVLPGB virtualization [1].
- Improving the performance of nested guests by avoiding some TLB flushes.

The series was tested by running L2 and L3 Linux guests with some simple
workloads in them (mmap()/munmap() stress, netperf, etc.). I also ran the
KVM selftests in both L0 and L1.

I believe some of the patches are in a mergeable state, but this series is
still an RFC for a few reasons:

- I haven't done as much testing as I initially planned. Mainly, I wanted to
  test with a Windows guest running WSL to get Linux and Windows L2 VMs
  running side-by-side. I couldn't get it done due to some testing
  infrastructure hiccups.

- The SEV changes are generally untested beyond build testing, and I would
  like to get more feedback on them before moving forward. Namely, I think
  there is room for further unification: SEV should probably use the new
  kvm_tlb_tags infrastructure to allocate its ASIDs as well. The way I think
  about it is to optionally have a bitmap of "pending" ASIDs in
  kvm_tlb_tags, and make unused SEV ASIDs "pending" until we run out of
  space and do the necessary flushes to make them free.
- I want to get general feedback about the direction this is heading in, and
  on things like generalizing the ASID tracking in SEV to work for SVM,
  thoughts on using an xarray for that, etc.

- Some things can/should be cleaned up, although they can be follow-ups too.
  For example, the current logic will allocate a "normal" ASID for an SEV VM
  upon creation, then allocate an SEV-friendly ASID to it when SEV is
  initialized. The "normal" ASID remains allocated, though, and
  kvm_svm->asid and kvm_svm->sev_info.asid remain different. It seems like
  we should not allocate the "normal" ASID to begin with, or should free it
  if the VM uses SEV. However, I am not sure what the best way to do any of
  this is, because I am not clear on the life cycle of an SEV VM.

This series started as two separate series, one to optimize nested TLB
flushes by using a separate ASID for L2 VMs [2], and one to use a single
ASID per VM [3]. However, there is so much dependency and interaction
between the two series that I think it is useful to combine them, at least
for now, so that the big picture is clear. The series can later be split
again into two or more series, or merged incrementally.

I am sending this out now to get feedback, and also to "checkpoint" my work,
as I won't be picking this up again for a few months. I will remain able to
respond to discussion and reviews, although at a lower capacity. If anyone
wants to pick up this series in the meantime, partially or fully, please
feel free to do so. Just let me know so that we can coordinate.

Rik and Tom, I CC'd you due to the previous discussion you had with Sean
about INVLPGB virtualization. I can drop you from following versions if
you'd like to avoid the noise.

Here is a brief walkthrough of the series:

Part 1: Use a single ASID per VM

- Patch 1 generalizes the VPID allocation into a generic kvm_tlb_tags
  factory to be used by SVM.
- Patches 2-3 are cleanups and/or refactoring.
- Patches 4-5 get rid of the cases where we currently allocate a new ASID
  dynamically, by just flushing the existing ASID or falling back to a full
  flush if flushing a single ASID is not supported.
- Patches 6-9 generalize SEV's per-CPU ASID -> vCPU tracking to make it work
  for SVM.
- Patch 10 finally drops the dynamic ASID allocation logic and uses a single
  per-VM ASID.

Part 2: Optimize nSVM TLB flushes

- Patch 11 starts by using a separate ASID for L2 guests, although it is
  initially the same as the L1 ASID. It is essentially just laying the
  groundwork.
- Patches 12-16 are refactoring groundwork.
- Patches 17-22 add the needed handling of L2 ASID TLB flushing.
- Patch 23 starts allocating a new ASID for L2, as using the same ASID is no
  longer needed.
- Patch 24 drops the unconditional TLB flushes on nested transitions, which
  are no longer necessary now that L2 uses a separate, well-maintained ASID.

Diff from the initial versions of series [2] and [3]:

- Generalized the SEV tracking of ASID->vCPU to use it for SVM, to make sure
  the TLB is flushed when a new vCPU with the same ASID is run on the same
  physical CPU.
- Made sure kvm_hv_vcpu_purge_flush_tlb() is handled correctly by passing in
  is_guest_mode to purge the correct queue when doing L1 vs. L2 TLB flushes
  (Maxim).
- Improved the commentary in nested_svm_entry_tlb_flush() (Maxim).
- Handle INVLPGA from the guest even when nested NPT is used (Maxim).
- Improved some commit logs.

[1] https://lore.kernel.org/all/Z8HdBg3wj8M7a4ts@google.com/
[2] https://lore.kernel.org/lkml/20250205182402.2147495-1-yos...
[3] https://lore.kernel.org/lkml/20250313215540.4171762-1-yos...
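As a mental model for the kvm_tlb_tags factory from patch 1, a vendor-neutral "TLB tag" allocator need not be more than a bounded ID allocator. The freestanding sketch below is illustrative only: the struct and function names are made up (this is not the series' actual API), and the tag space is a toy single-word bitmap.

```c
/*
 * Illustrative sketch of a vendor-neutral TLB tag allocator, in the
 * spirit of a generic kvm_tlb_tags factory. All names and the bitmap
 * layout are hypothetical, not the series' actual implementation.
 */
struct tlb_tags {
	unsigned long bitmap;	/* bit i set => tag i is in use */
	unsigned int min, max;	/* inclusive range; max < 64 in this toy */
};

static void tlb_tags_init(struct tlb_tags *t, unsigned int min,
			  unsigned int max)
{
	t->bitmap = 0;
	t->min = min;
	t->max = max;
}

/* Returns a free tag in [min, max], or 0 on exhaustion (tag 0 reserved). */
static unsigned int tlb_tags_alloc(struct tlb_tags *t)
{
	for (unsigned int i = t->min; i <= t->max; i++) {
		if (!(t->bitmap & (1UL << i))) {
			t->bitmap |= 1UL << i;
			return i;
		}
	}
	return 0;
}

static void tlb_tags_free(struct tlb_tags *t, unsigned int tag)
{
	t->bitmap &= ~(1UL << tag);
}
```

With a static per-VM tag handed out at VM creation and freed at VM destruction, both VMX (VPIDs) and SVM (ASIDs) can share this shape and differ only in the tag range.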
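The "pending" ASID idea floated above for SEV could look roughly like the following sketch: freed ASIDs are not reused immediately but parked in a pending bitmap, and one flush recycles all of them when the allocator runs dry. Everything here is hypothetical (the names, the toy 16-entry space, and flush_all_asids(), which stands in for whatever flushing SEV actually requires).

```c
/*
 * Illustrative sketch only: defer reuse of freed ASIDs until a single
 * flush makes the whole pending set safe to hand out again.
 */
#define NR_ASIDS 16	/* toy size; real ASID spaces are larger */

struct asid_pool {
	unsigned long free;	/* bit i set => ASID i may be handed out */
	unsigned long pending;	/* freed, but possibly still cached in TLBs */
};

static int nr_flushes;	/* counts flushes, for illustration only */

static void flush_all_asids(void)
{
	nr_flushes++;	/* stand-in for the real TLB flushing work */
}

static void asid_pool_init(struct asid_pool *p)
{
	p->free = ((1UL << NR_ASIDS) - 1) & ~1UL;	/* ASID 0 reserved */
	p->pending = 0;
}

static int asid_alloc(struct asid_pool *p)
{
	if (!p->free && p->pending) {
		/* Out of space: flush once, then recycle every pending ASID. */
		flush_all_asids();
		p->free = p->pending;
		p->pending = 0;
	}
	for (int i = 1; i < NR_ASIDS; i++) {
		if (p->free & (1UL << i)) {
			p->free &= ~(1UL << i);
			return i;
		}
	}
	return -1;	/* truly exhausted */
}

static void asid_free(struct asid_pool *p, int asid)
{
	p->pending |= 1UL << asid;	/* defer reuse until the next flush */
}
```

The point of the two-bitmap split is to amortize the expensive flush over many frees instead of paying it on every ASID reuse.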
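The invariant behind the ASID->vCPU tracking (patches 6-9, and the generalization noted in the diff above) fits in a few lines: on vCPU load, the ASID's TLB entries on this physical CPU must be flushed unless this same vCPU was the last one to run with that ASID here. The sketch below is a simplified stand-in; the real series tracks this with xarrays and per-CPU data, and the names here are invented.

```c
/*
 * Illustrative sketch of per-CPU ASID -> vCPU tracking. One instance of
 * cpu_asid_state would exist per physical CPU.
 */
#include <stddef.h>

#define NR_ASIDS 16	/* toy size, for illustration */

struct vcpu {
	int id;
};

struct cpu_asid_state {
	struct vcpu *last_vcpu[NR_ASIDS];	/* last user of each ASID */
};

/* Returns 1 if @asid must be flushed before running @v on this CPU. */
static int vcpu_load_needs_flush(struct cpu_asid_state *cpu,
				 struct vcpu *v, int asid)
{
	int flush = cpu->last_vcpu[asid] != v;

	cpu->last_vcpu[asid] = v;
	return flush;
}
```

With this check in place, a static per-VM ASID stays safe across vCPU migration, and the unconditional flushes on nested transitions become unnecessary.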
Yosry Ahmed (24):
  KVM: VMX: Generalize VPID allocation to be vendor-neutral
  KVM: SVM: Use cached local variable in init_vmcb()
  KVM: SVM: Add helpers to set/clear ASID flush in VMCB
  KVM: SVM: Flush everything if FLUSHBYASID is not available
  KVM: SVM: Flush the ASID when running on a new CPU
  KVM: SEV: Track ASID->vCPU instead of ASID->VMCB
  KVM: SEV: Track ASID->vCPU on vCPU load
  KVM: SEV: Drop pre_sev_run()
  KVM: SEV: Generalize tracking ASID->vCPU with xarrays
  KVM: SVM: Use a single ASID per VM
  KVM: nSVM: Use a separate ASID for nested guests
  KVM: x86: hyper-v: Pass is_guest_mode to kvm_hv_vcpu_purge_flush_tlb()
  KVM: nSVM: Parameterize svm_flush_tlb_asid() by is_guest_mode
  KVM: nSVM: Split nested_svm_transition_tlb_flush() into entry/exit fns
  KVM: x86/mmu: rename __kvm_mmu_invalidate_addr()
  KVM: x86/mmu: Allow skipping the gva flush in kvm_mmu_invalidate_addr()
  KVM: nSVM: Flush both L1 and L2 ASIDs on KVM_REQ_TLB_FLUSH
  KVM: nSVM: Handle nested TLB flush requests through TLB_CONTROL
  KVM: nSVM: Flush the TLB if L1 changes L2's ASID
  KVM: nSVM: Do not reset TLB_CONTROL in VMCB02 on nested entry
  KVM: nSVM: Service local TLB flushes before nested transitions
  KVM: nSVM: Handle INVLPGA interception correctly
  KVM: nSVM: Allocate a new ASID for nested guests
  KVM: nSVM: Stop bombing the TLB on nested transitions

 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/include/asm/svm.h      |   5 -
 arch/x86/kvm/hyperv.h           |   8 +-
 arch/x86/kvm/mmu/mmu.c          |  22 ++-
 arch/x86/kvm/svm/nested.c       |  68 ++++++---
 arch/x86/kvm/svm/sev.c          |  60 +-------
 arch/x86/kvm/svm/svm.c          | 257 +++++++++++++++++++++++---------
 arch/x86/kvm/svm/svm.h          |  43 ++++--
 arch/x86/kvm/vmx/nested.c       |   4 +-
 arch/x86/kvm/vmx/vmx.c          |  38 +----
 arch/x86/kvm/vmx/vmx.h          |   4 +-
 arch/x86/kvm/x86.c              |  60 +++++++-
 arch/x86/kvm/x86.h              |  13 ++
 13 files changed, 378 insertions(+), 206 deletions(-)

-- 
2.49.0.395.g12beb8f557-goog