From: Nikita Kalyazin <kalyazin-AT-amazon.com>
To: <akpm-AT-linux-foundation.org>, <pbonzini-AT-redhat.com>, <shuah-AT-kernel.org>
Subject: [RFC PATCH 0/5] KVM: guest_memfd: support for uffd missing
Date: Mon, 03 Mar 2025 13:30:06 +0000
Message-ID: <20250303133011.44095-1-kalyazin@amazon.com>
Cc: <kvm-AT-vger.kernel.org>, <linux-kselftest-AT-vger.kernel.org>, <linux-kernel-AT-vger.kernel.org>, <linux-mm-AT-kvack.org>, <lorenzo.stoakes-AT-oracle.com>, <david-AT-redhat.com>, <ryan.roberts-AT-arm.com>, <quic_eberman-AT-quicinc.com>, <jthoughton-AT-google.com>, <peterx-AT-redhat.com>, <graf-AT-amazon.de>, <jgowans-AT-amazon.com>, <roypat-AT-amazon.co.uk>, <derekmn-AT-amazon.com>, <nsaenz-AT-amazon.es>, <xmarcalx-AT-amazon.com>, <kalyazin-AT-amazon.com>
This series is built on top of the v3 write syscall support [1].
With James's KVM userfault [2], it is possible to handle stage-2 faults
in guest_memfd in userspace. However, KVM itself also triggers faults
in guest_memfd in some cases, for example: PV interfaces like kvmclock,
PV EOI and page table walking code when fetching the MMIO instruction on
x86. It was agreed in the guest_memfd upstream call on 23 Jan 2025 [3]
that KVM would be accessing those pages via userspace page tables. In
order for such faults to be handled in userspace, guest_memfd needs to
support userfaultfd.
This series proposes limited support for userfaultfd in guest_memfd:
- userfaultfd support is conditional on `CONFIG_KVM_GMEM_SHARED_MEM`
(as is fault support in general)
- Only the `page missing` event is currently supported
- Userspace is supposed to respond to the event with the `write`
syscall followed by the `UFFDIO_CONTINUE` ioctl to unblock the faulting
process. Note that we can't use `UFFDIO_COPY` here because
userfaultfd code does not know how to prepare guest_memfd pages, e.g.
remove them from the direct map [4].
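
For illustration, the userspace side of that workflow could look roughly like the sketch below. It is not taken from the series: the helper names (`gmem_fault_offset`, `gmem_continue_arg`, `gmem_resolve_missing`) are hypothetical, and it assumes that the guest_memfd is mapped contiguously from file offset 0 at `map_base`, that `pwrite` is accepted by the write support from [1], and that `UFFDIO_CONTINUE` behaves on guest_memfd mappings as proposed here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/*
 * Convert a faulting address into a page-aligned offset in the guest_memfd.
 * Assumption of this sketch: file offset 0 is mapped at map_base.
 */
static uint64_t gmem_fault_offset(uint64_t fault_addr, uint64_t map_base,
                                  uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(fault_addr >= map_base);
    return (fault_addr & ~(page_size - 1)) - map_base;
}

/* Build the UFFDIO_CONTINUE argument covering the faulting page. */
static struct uffdio_continue gmem_continue_arg(uint64_t fault_addr,
                                                uint64_t page_size)
{
    struct uffdio_continue cont = {
        .range = {
            .start = fault_addr & ~(page_size - 1),
            .len = page_size,
        },
        .mode = 0,
    };
    return cont;
}

/*
 * Resolve one fault message read from the userfaultfd.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int gmem_resolve_missing(int uffd, int gmem_fd,
                                const struct uffd_msg *msg, uint64_t map_base,
                                const void *page_data, uint64_t page_size)
{
    if (msg->event != UFFD_EVENT_PAGEFAULT) {
        errno = EINVAL;
        return -1;
    }

    uint64_t addr = msg->arg.pagefault.address;
    off_t off = (off_t)gmem_fault_offset(addr, map_base, page_size);

    /* Step 1: populate the page through the guest_memfd write path. */
    ssize_t n = pwrite(gmem_fd, page_data, page_size, off);
    if (n != (ssize_t)page_size) {
        if (n >= 0)
            errno = EIO; /* short write: treat as failure in this sketch */
        return -1;
    }

    /* Step 2: map the now-present page and wake the faulting thread. */
    struct uffdio_continue cont = gmem_continue_arg(addr, page_size);
    return ioctl(uffd, UFFDIO_CONTINUE, &cont) == 0 ? 0 : -1;
}
```

A handler thread would typically `read()` a `struct uffd_msg` from the userfaultfd and pass it to `gmem_resolve_missing()` together with the page contents it wants the guest to see.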
Not included in this series:
- Proper interface for userfaultfd to recognise guest_memfd mappings
- Proper handling of truncation cases after locking the page
Request for comments:
- Is it a sensible workflow for guest_memfd to resolve a userfault
`page missing` event with `write` syscall + `UFFDIO_CONTINUE`? One
of the alternatives is teaching `UFFDIO_COPY` how to deal with
guest_memfd pages.
- What is a way forward to make userfaultfd code aware of guest_memfd?
I saw that Patrick hit a somewhat similar problem in [5] when trying
to use direct map manipulation functions in KVM and was pointed by
David to Elliot's guestmem library [6], which might include a shim for that.
Would the library be the right place to expose required interfaces like
`vma_is_gmem`?
Nikita
[1] https://lore.kernel.org/kvm/20250303130838.28812-1-kalyaz...
[2] https://lore.kernel.org/kvm/20250109204929.1106563-1-jtho...
[3] https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqV...
[4] https://lore.kernel.org/kvm/20250221160728.1584559-1-royp...
[5] https://lore.kernel.org/kvm/20250221160728.1584559-1-royp...
[6] https://lore.kernel.org/kvm/20241122-guestmem-library-v5-...
Nikita Kalyazin (5):
  KVM: guest_memfd: add kvm_gmem_vma_is_gmem
  KVM: guest_memfd: add support for uffd missing
  mm: userfaultfd: allow to register userfaultfd for guest_memfd
  mm: userfaultfd: support continue for guest_memfd
  KVM: selftests: add uffd missing test for guest_memfd

 include/linux/userfaultfd_k.h                 |  9 ++
 mm/userfaultfd.c                              | 23 ++++-
 .../testing/selftests/kvm/guest_memfd_test.c  | 88 +++++++++++++++++++
 virt/kvm/guest_memfd.c                        | 17 +++-
 virt/kvm/kvm_mm.h                             |  1 +
 5 files changed, 136 insertions(+), 2 deletions(-)
base-commit: 592e7531753dc4b711f96cd1daf808fd493d3223
--
2.47.1