The modernization of PCIe hotplug in Linux

Posted Oct 8, 2018 23:26 UTC (Mon) by davidstrauss (guest, #85867)
Parent article: The modernization of PCIe hotplug in Linux

I've been struggling with this in my own adventures with using an eGPU over Thunderbolt. Unfortunately, I expect that PCIe-level hotplug improvements will only be half of the battle, as kernel modules for certain device types (certainly GPUs) respond violently to having their backing devices disappear, even if it happens cleanly at the PCIe level. Manually unloading the kernel modules before removing the device sometimes works, but they often refuse to be unloaded because they're in use by other kernel modules or userland. Perhaps there's a systematic way to hunt down and remove anything blocking the module unloading to prepare for disconnection, but I'm not convinced that's the right answer for user experience.
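One way to start the hunt for whatever blocks an unload is to read /proc/modules, whose third and fourth fields give each module's reference count and the modules using it. A minimal Python sketch, assuming that field layout; the function names and the sample lines (sizes, counts, addresses) are made up for illustration:

```python
# Illustrative /proc/modules content; the sizes, counts and addresses are invented.
SAMPLE = """\
amdgpu 3522560 3 - Live 0x0000000000000000
drm_kms_helper 172032 1 amdgpu, Live 0x0000000000000000
drm 401408 5 amdgpu,drm_kms_helper, Live 0x0000000000000000
"""


def parse_proc_modules(text):
    """Map module name -> (refcount or None, [modules that use it])."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        name, _size, refcount, used_by = fields[:4]
        # "-" means "no users"; bracketed entries such as "[permanent]" are flags.
        holders = [m for m in used_by.split(",")
                   if m and m != "-" and not m.startswith("[")]
        # The count is "-" on kernels built without module unloading.
        count = int(refcount) if refcount.isdigit() else None
        modules[name] = (count, holders)
    return modules


def unload_blockers(modules, name):
    """Return (modules holding `name`, references not explained by modules)."""
    count, holders = modules[name]
    if count is None:
        return holders, None
    # Whatever is left over is held elsewhere, e.g. by open device files.
    return holders, max(count - len(holders), 0)
```

On a live system the input would be `open("/proc/modules").read()`. A nonzero second value means something other than a kernel module still holds a reference, which is the case that is harder to track down.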

For PCIe hotplug (including removal) to "just work," kernel modules are going to need to get a lot more accustomed to an inversion of dependency removal. As suggested in "surprise removal," the traditional expectation has been an orderly retirement of consumers before their corresponding producers (to use the API terminology). We now have to accommodate the opposite: surprise electrical removal, followed by surprise PCIe device removal, followed by a kernel module surprising its consumers with an inability to service them, followed by graceful handling from there on down the stack. The ability to expose practically any PCIe device over Thunderbolt has changed the scope of hot removal support from a few types of server-style components (e.g. NVMe) to literally anything that may run over PCIe.

I would love to reach the point where I can safely and consistently unplug a Thunderbolt eGPU from my Linux machine -- without shutting it down first.

Posted Oct 9, 2018 14:56 UTC (Tue) by MarcB (guest, #101804)

I think it is not even half the battle. Even if the kernel drivers are improved, userspace will still be a big issue. "Some of the memory you mapped just physically disappeared" seems not easy to handle (perhaps in some cases it is, but certainly not always). In fact, it could very well be impossible in many cases.

I can easily see developers responding to that request with "just don't do that".

Posted Oct 9, 2018 17:38 UTC (Tue) by jg (guest, #17537)

In the case of screens, the hotplug case already has to be handled in user space, as you plug external monitors/projectors in routinely. The window system and UI already have support to detect and reconfigure your applications on such hotplug events. Works fine today on my laptop with USB3 dongles driving external monitors...

Posted Oct 9, 2018 17:58 UTC (Tue) by davidstrauss (guest, #85867)

Code accessing USB devices already anticipates this sort of disconnection; code accessing PCIe devices often does not. I'm not removing a display; I'm removing a GPU. Suddenly unplugging my Thunderbolt eGPU -- even when I'm not actually running any apps or displays with it -- always freezes my system. I've isolated the cause to the amdgpu (and related) modules not tolerating the disappearance of a GPU, as unloading those modules (which rarely works) prevents the freeze on removal.

This is not a unique deficiency of Linux; my understanding is that macOS and Windows have similar effects for many devices. I'm just saying that support for hot removal of PCIe devices requires high-level support as much as low-level. The low-level work is critical, though.

Posted Oct 10, 2018 18:59 UTC (Wed) by jg (guest, #17537)

Again, this situation is already (at least partially, IIRC) dealt with in the code base. There are these weird laptops that have a second GPU to augment the much lower-performance one built into the laptop; you want to be able to switch back and forth (the high-power GPU gets powered down).

I don't remember if all the work has been done in the X server, but certainly the applications and window managers already "do the right thing" for the most part. It gives me great pleasure that I now usually have less trouble handling adding displays/projectors than many/most Windoze users do.

Please don't declare the problem as unsolvable in advance. If it isn't completely solved, it's mostly solved for displays. We started working on these issues with the xrandr extension almost 20 years ago (which keeps getting augmented with time: thanks keithp!). The fundamental shift architecturally happened there, with applications no longer able to presume their root window was immutable.
- Jim

Posted Oct 10, 2018 19:03 UTC (Wed) by davidstrauss (guest, #85867)

> Please don't declare the problem as unsolvable in advance.

I'm not sure where you got the impression I was making this claim. I started the thread to highlight that the value of PCIe hotplug isn't realized for certain device types without work higher in the stack.

Posted Oct 9, 2018 19:43 UTC (Tue) by gioele (subscriber, #61675)

> I think it is not even half the battle. Even if the kernel drivers are improved, userspace will still be a big issue. "Some of the memory you mapped just physically disappeared" seems not easy to handle

Can userspace talk directly to PCIe memory?

And couldn't the userspace program just have a SIGBUS handler that notices the missing hardware, cleans up the state and goes back to normality?

Posted Oct 10, 2018 0:47 UTC (Wed) by excors (subscriber, #95769)

Vulkan lets you explicitly allocate device-local memory (i.e. on the GPU) then map it cache-coherently into userspace.

It has the concept of "lost" devices: once a device is lost, most API calls return errors, and the application is able to clean up and try again with a new graphics device. (Of course, the application might choose to just crash instead.) The spec says "The host address space corresponding to device memory mapped using vkMapMemory is still valid, and host memory accesses to these mapped regions are still valid, but the contents are undefined", which I guess means the kernel can't unmap the memory when it detects the GPU has gone away (because that would likely crash the application), but could map the whole range onto a single dummy page or something.
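The clean-up-and-retry pattern can be sketched as a toy model in Python. This is not Vulkan code: FakeDevice, DeviceLost and render_with_recovery are hypothetical names, and the exception merely stands in for the VK_ERROR_DEVICE_LOST result code that a real application would check for.

```python
class DeviceLost(Exception):
    """Hypothetical stand-in for a VK_ERROR_DEVICE_LOST result."""


class FakeDevice:
    """Hypothetical device handle; not a real graphics API."""

    def __init__(self, name):
        self.name = name
        self.lost = False  # flipped when the hardware goes away

    def submit(self, work):
        # Once the device is lost, calls fail with an error instead of
        # crashing the caller.
        if self.lost:
            raise DeviceLost(self.name)
        return f"{work} on {self.name}"


def render_with_recovery(work, device, open_fallback):
    """Try the current device; if it is lost, retry once on a fallback.

    Returns the device that ended up doing the work and its result.
    """
    try:
        return device, device.submit(work)
    except DeviceLost:
        fallback = open_fallback()  # e.g. the integrated GPU
        return fallback, fallback.submit(work)
```

The caller keeps whichever device is returned and uses it for subsequent work, so a lost eGPU degrades to the fallback rather than taking the application down.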