Error recovery for vfio-pci devices on s390x
From: Farhan Ali <alifm-AT-linux.ibm.com>
To: linux-s390-AT-vger.kernel.org, kvm-AT-vger.kernel.org, linux-kernel-AT-vger.kernel.org, linux-pci-AT-vger.kernel.org
Subject: [PATCH v4 00/10] Error recovery for vfio-pci devices on s390x
Date: Wed, 24 Sep 2025 10:16:18 -0700
Message-ID: <20250924171628.826-1-alifm@linux.ibm.com>
Cc: alex.williamson-AT-redhat.com, helgaas-AT-kernel.org, clg-AT-redhat.com, alifm-AT-linux.ibm.com, schnelle-AT-linux.ibm.com, mjrosato-AT-linux.ibm.com
Hi,

This Linux kernel patch series introduces support for error recovery for
passthrough PCI devices on System Z (s390x).

Background
----------
For PCI devices on s390x, an operating system receives platform-specific
error events from firmware rather than through AER. Today, for
passthrough/userspace devices, we don't attempt any error recovery and we
ignore any error events for the devices. The passthrough/userspace devices
are managed by the vfio-pci driver. The driver does register error handling
callbacks (error_detected), and on an error it triggers an eventfd to
userspace. But we also need a mechanism to provide userspace
(QEMU/guest/userspace drivers) with information about the error event.

Proposal
--------
We can expose this error information (currently only the PCI Error Code)
via a device feature. Userspace can then obtain the error information via
the VFIO_DEVICE_FEATURE ioctl and take appropriate actions, such as driving
a device reset.

I would appreciate some feedback on this series. Part of the series touches
PCI common code, so I would especially like feedback on those patches.

Thanks
Farhan

ChangeLog
---------
v3 series https://lore.kernel.org/all/20250911183307.1910-1-alifm@l...
v3 -> v4
  - Remove warn messages for each PCI capability not restored (patch 1)
  - Check the PCI_COMMAND and PCI_STATUS registers for an error value
    instead of the device ID (patch 1)
  - Fix kernel crash in patch 3
  - Add Reviewed-by tags
  - Address comments from Niklas (patches 4, 5, 7)
  - Fix compilation error on non-s390x systems (patch 8)
  - Explicitly align struct vfio_device_feature_zpci_err (patch 8)

v2 series https://lore.kernel.org/all/20250825171226.1602-1-alifm@l...
v2 -> v3
  - Patch 1 avoids saving any config space state if the device is in error
    (suggested by Alex).
  - Patch 2 adds an additional check, only for FLR reset, to try other
    function reset methods (suggested by Alex).
  - Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
    functions. It creates a new flag, pci_slot, to allow per-function slots.
  - Patch 4 fixes a bug in s390 resource-to-bus address translation.
  - Rebase on 6.17-rc5

v1 series https://lore.kernel.org/all/20250813170821.1115-1-alifm@l...
v1 -> v2
  - Patches 1 and 2 add some additional checks for FLR/PM reset to try
    other function reset methods (suggested by Alex).
  - Patch 3 fixes a bug in s390 for resetting PCI devices with multiple
    functions.
  - Patch 7 adds a new device feature for zPCI devices for the
    VFIO_DEVICE_FEATURE ioctl. The ioctl is used by userspace to retrieve
    any PCI error information for the device (suggested by Alex).
  - Patch 8 adds a reset_done() callback for the vfio-pci driver, to
    restore the state of the device after a reset.
  - Patch 9 removes the PCIe check for triggering VFIO_PCI_ERR_IRQ_INDEX.

Farhan Ali (10):
  PCI: Avoid saving error values for config space
  PCI: Add additional checks for flr reset
  PCI: Allow per function PCI slots
  s390/pci: Add architecture specific resource/bus address translation
  s390/pci: Restore IRQ unconditionally for the zPCI device
  s390/pci: Update the logic for detecting passthrough device
  s390/pci: Store PCI error information for passthrough devices
  vfio-pci/zdev: Add a device feature for error information
  vfio: Add a reset_done callback for vfio-pci driver
  vfio: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX

 arch/s390/include/asm/pci.h        |  30 +++++++-
 arch/s390/pci/pci.c                |  75 ++++++++++++++++++++
 arch/s390/pci/pci_event.c          | 107 ++++++++++++++++------------
 arch/s390/pci/pci_irq.c            |   9 +--
 drivers/pci/host-bridge.c          |   4 +-
 drivers/pci/hotplug/s390_pci_hpc.c |  10 ++-
 drivers/pci/pci.c                  |  37 ++++++++--
 drivers/pci/pcie/aer.c             |   3 +
 drivers/pci/pcie/dpc.c             |   3 +
 drivers/pci/pcie/ptm.c             |   3 +
 drivers/pci/slot.c                 |  14 +++-
 drivers/pci/tph.c                  |   3 +
 drivers/pci/vc.c                   |   3 +
 drivers/vfio/pci/vfio_pci_core.c   |  20 ++++--
 drivers/vfio/pci/vfio_pci_intrs.c  |   3 +-
 drivers/vfio/pci/vfio_pci_priv.h   |   8 +++
 drivers/vfio/pci/vfio_pci_zdev.c   |  45 +++++++++++-
 include/linux/pci.h                |   1 +
 include/uapi/linux/vfio.h          |  15 ++++
 19 files changed, 318 insertions(+), 75 deletions(-)

-- 
2.43.0
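The Proposal above has userspace fetch the error information through the VFIO_DEVICE_FEATURE ioctl. The following is a minimal userspace sketch, not code from this series, of how a VMM might build such a GET request and decode the reply. It mirrors the layout of struct vfio_device_feature from include/uapi/linux/vfio.h (argsz, then flags with the feature index in bits 0-15 and VFIO_DEVICE_FEATURE_GET at bit 16); real code would include <linux/vfio.h> and use that struct directly. The payload struct zpci_err_payload, the helpers build_get_request() and read_zpci_err(), and whatever feature index the caller passes in are all hypothetical: the real struct vfio_device_feature_zpci_err and its feature index are defined by patch 8 and are not shown in this cover letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Local mirror of struct vfio_device_feature (include/uapi/linux/vfio.h).
 * Real code should include <linux/vfio.h> and use that struct instead.
 */
struct feature_hdr {
	uint32_t argsz;   /* size of this header plus the payload */
	uint32_t flags;   /* feature index in bits 0-15, plus op flags */
	uint8_t data[];   /* feature-specific payload */
};

#define FEATURE_MASK 0xffffu     /* mirrors VFIO_DEVICE_FEATURE_MASK */
#define FEATURE_GET  (1u << 16)  /* mirrors VFIO_DEVICE_FEATURE_GET */

/*
 * HYPOTHETICAL payload layout. The real struct
 * vfio_device_feature_zpci_err comes from patch 8 and may differ.
 */
struct zpci_err_payload {
	uint16_t pec;       /* PCI Error Code reported by firmware */
	uint16_t reserved;
};

/*
 * HYPOTHETICAL helper: allocate a zeroed GET request for feature_index
 * with room for payload_size bytes of reply data. Returns NULL if the
 * index does not fit in the 16-bit field or allocation fails.
 */
static struct feature_hdr *build_get_request(uint32_t feature_index,
					     size_t payload_size)
{
	struct feature_hdr *req;

	if (feature_index > FEATURE_MASK)
		return NULL;

	req = calloc(1, sizeof(*req) + payload_size);
	if (!req)
		return NULL;

	req->argsz = (uint32_t)(sizeof(*req) + payload_size);
	req->flags = FEATURE_GET | feature_index;
	return req;
}

/*
 * HYPOTHETICAL helper: copy the reply the kernel wrote into req->data
 * out into a typed struct. memcpy avoids unaligned access to data[].
 */
static struct zpci_err_payload read_zpci_err(const struct feature_hdr *req)
{
	struct zpci_err_payload err;

	/* The request must have been sized for this payload. */
	assert(req->argsz >= sizeof(*req) + sizeof(err));
	memcpy(&err, req->data, sizeof(err));
	return err;
}
```

A caller would pass the buffer to ioctl(device_fd, VFIO_DEVICE_FEATURE, req), possibly after first checking support with a VFIO_DEVICE_FEATURE_PROBE request, then use read_zpci_err() to inspect the PCI Error Code before deciding whether to drive a device reset. Passing the feature index as a parameter keeps the sketch from guessing the value patch 8 assigns.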