The modernization of PCIe hotplug in Linux
Posted Oct 8, 2018 23:26 UTC (Mon)
by davidstrauss (guest, #85867)
Parent article: The modernization of PCIe hotplug in Linux
For PCIe hotplug (including removal) to "just work," kernel modules will need to get much more accustomed to an inverted order of dependency removal. As the term "surprise removal" suggests, the traditional expectation has been an orderly retirement of consumers before their corresponding producers (to use the API terminology). We now have to accommodate the opposite: surprise electrical removal, followed by surprise PCIe device removal, followed by a kernel module surprising its consumers with an inability to service them, followed by graceful handling the rest of the way down the dependency chain. The ability to expose practically any PCIe device over Thunderbolt has widened the scope of hot removal support from a few types of server-style components (e.g. NVMe) to literally anything that may run over PCIe.
I would love to reach the point where I can safely and consistently unplug a Thunderbolt eGPU from my Linux machine -- without shutting it down first.
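A rough sketch of the inverted teardown order described above, in plain C rather than kernel code. Every name here (`struct producer`, `producer_surprise_remove()`, and so on) is hypothetical and stands in for whatever a real driver would use. The point is only the ordering: the producer disappears first, and consumers learn about it afterwards through a notification and through `-ENODEV` on later requests.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; this is not a kernel API. */
#define MAX_CONSUMERS 4

struct consumer {
    bool notified;                  /* set once the producer reported its loss */
};

struct producer {
    bool present;                   /* false once the hardware is gone */
    struct consumer *consumers[MAX_CONSUMERS];
    size_t n_consumers;
};

/* Consumers can only attach while the device is still there. */
static int producer_attach(struct producer *p, struct consumer *c)
{
    assert(p != NULL && c != NULL);
    if (!p->present)
        return -ENODEV;
    if (p->n_consumers == MAX_CONSUMERS)
        return -ENOSPC;
    p->consumers[p->n_consumers++] = c;
    return 0;
}

/* Every request checks for presence instead of assuming it. */
static int producer_request(const struct producer *p)
{
    return p->present ? 0 : -ENODEV;
}

/*
 * Surprise removal: the producer is already gone when this runs, so it
 * marks itself absent first and only then tells its consumers.
 */
static void producer_surprise_remove(struct producer *p)
{
    p->present = false;
    for (size_t i = 0; i < p->n_consumers; i++)
        p->consumers[i]->notified = true;
    p->n_consumers = 0;
}
```

In the traditional order, consumers would detach before the producer went away, so a `-ENODEV` path like this would rarely be exercised. With surprise removal it becomes the normal path.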
Posted Oct 9, 2018 14:56 UTC (Tue)
by MarcB (guest, #101804)
I can easily see developers responding to that request with "just don't do that".
Posted Oct 9, 2018 17:38 UTC (Tue)
by jg (guest, #17537)
Posted Oct 9, 2018 17:58 UTC (Tue)
by davidstrauss (guest, #85867)
This is not a unique deficiency of Linux; my understanding is that macOS and Windows have similar effects for many devices. I'm just saying that support for hot removal of PCIe devices requires high-level support as much as low-level. The low-level work is critical, though.
Posted Oct 10, 2018 18:59 UTC (Wed)
by jg (guest, #17537)
I don't remember if all the work has been done in the X server, but certainly the applications and window managers already "do the right thing" for the most part. It gives me great pleasure that I now usually have less trouble handling adding displays/projectors than many/most Windoze users do.
Please don't declare the problem as unsolvable in advance. If it isn't completely solved, it's mostly solved for displays. We started working on these issues with the xrandr extension almost 20 years ago (which keeps getting augmented with time: thanks keithp!). The fundamental shift architecturally happened there, with applications no longer able to presume their root window was immutable.
Posted Oct 10, 2018 19:03 UTC (Wed)
by davidstrauss (guest, #85867)
I'm not sure where you got the impression I was making this claim. I started the thread to highlight that the value of PCIe hotplug isn't realized for certain device types without work higher in the stack.
Posted Oct 9, 2018 19:43 UTC (Tue)
by gioele (subscriber, #61675)
Can userspace talk directly to PCIe memory?
And couldn't the userspace program just have a SIGBUS handler that notices the missing hardware, cleans up the state and goes back to normality?
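For what it's worth, the SIGBUS idea above can be sketched with `sigsetjmp()`/`siglongjmp()`. This is a minimal illustration under stated assumptions, not a recipe. Whether a surprise-removed PCIe mapping actually raises SIGBUS (rather than, say, returning garbage on reads) is an assumption here. The "vanished hardware" is simulated by mapping a temporary file and truncating it, and the helper names (`guarded_read()`, `map_then_lose_backing()`) are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_recovery;
static volatile sig_atomic_t bus_armed;

/* Jump back to the guarded access; if no guard is armed, fall back to
 * the default action so the re-executed fault terminates the process. */
static void on_sigbus(int sig)
{
    (void)sig;
    if (bus_armed)
        siglongjmp(bus_recovery, 1);
    signal(SIGBUS, SIG_DFL);
}

static int install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Read one byte; returns false if the access faulted with SIGBUS. */
static bool guarded_read(const volatile unsigned char *addr, unsigned char *out)
{
    assert(addr != NULL && out != NULL);
    if (sigsetjmp(bus_recovery, 1) != 0) {
        bus_armed = 0;              /* arrived here from the handler */
        return false;
    }
    bus_armed = 1;
    *out = *addr;
    bus_armed = 0;
    return true;
}

/* Stand-in for vanished hardware: map one page of a temp file, then
 * truncate the file so the mapping has no backing store left. */
static unsigned char *map_then_lose_backing(size_t *len)
{
    char path[] = "/tmp/hotunplug-XXXXXX";
    unsigned char *page = NULL;
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);
    *len = (size_t)sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, (off_t)*len) == 0) {
        void *m = mmap(NULL, *len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (m != MAP_FAILED && ftruncate(fd, 0) == 0)
            page = m;
    }
    close(fd);
    return page;
}
```

A caller would install the handler once, then route every access to the mapping through something like `guarded_read()` and treat a `false` return as "device gone". The catch is that every access has to be wrapped this way, which is why an API-level notion of a lost device tends to be easier for applications to work with.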
Posted Oct 10, 2018 0:47 UTC (Wed)
by excors (subscriber, #95769)
Vulkan has the concept of "lost" devices: most API calls start returning errors, and the application is able to clean up and try again with a new graphics device. (Of course, the application might choose to just crash instead.) The spec says "The host address space corresponding to device memory mapped using vkMapMemory is still valid, and host memory accesses to these mapped regions are still valid, but the contents are undefined", which I guess means the kernel can't unmap the memory when it detects the GPU has gone away (because that would likely crash the application), but could map the whole range onto a single dummy page or something.
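The "dummy page" guess above has a simple userspace analogue, sketched here only to show the effect on the application side. This is not how any driver is known to implement it. `replace_with_dummy_pages()` is a hypothetical helper that uses `mmap()` with `MAP_FIXED` to swap an existing range for anonymous zero-filled memory at the same address, so old pointers stay valid while the contents become meaningless.

```c
#define _DEFAULT_SOURCE             /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Replace [addr, addr + len) with fresh anonymous memory at the same
 * address. addr must be page-aligned; the old contents are discarded.
 * Returns 0 on success, -1 on failure.
 */
static int replace_with_dummy_pages(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
    void *m = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return m == addr ? 0 : -1;
}
```

After the call, reads and writes through the old pointer keep working and nothing faults, which matches the spec wording that accesses remain valid while the contents are undefined.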
- Jim
