
The modernization of PCIe hotplug in Linux

Posted Oct 9, 2018 14:56 UTC (Tue) by MarcB (guest, #101804)
In reply to: The modernization of PCIe hotplug in Linux by davidstrauss
Parent article: The modernization of PCIe hotplug in Linux

I think it is not even half the battle. Even if the kernel drivers are improved, userspace will still be a big issue. "Some of the memory you mapped just physically disappeared" seems not easy to handle (perhaps in some cases it is, but certainly not always). In fact, it could very well be impossible in many cases.

I can easily see developers responding to that request with "just don't do that".




Posted Oct 9, 2018 17:38 UTC (Tue) by jg (guest, #17537) [Link] (3 responses)

In the case of screens, the hotplug case already has to be handled in user space, as you plug external monitors/projectors in routinely. The window system and UI already have support to detect such hotplug events and reconfigure your applications. Works fine today on my laptop with USB3 dongles driving external monitors....


Posted Oct 9, 2018 17:58 UTC (Tue) by davidstrauss (guest, #85867) [Link] (2 responses)

Code accessing USB devices already anticipates this sort of disconnection; code accessing PCIe devices often does not. I'm not removing a display; I'm removing a GPU. Suddenly unplugging my Thunderbolt eGPU -- even when I'm not actually running any apps or displays with it -- always freezes my system. I've narrowed the cause down to the amdgpu (and related) modules not tolerating the disappearance of a GPU, as unloading those modules (which rarely works) prevents the freeze on removal.

This is not a unique deficiency of Linux; my understanding is that macOS and Windows have similar effects for many devices. I'm just saying that support for hot removal of PCIe devices requires high-level support as much as low-level. The low-level work is critical, though.


Posted Oct 10, 2018 18:59 UTC (Wed) by jg (guest, #17537) [Link] (1 responses)

Again, this situation is already (at least partially, IIRC) dealt with in the code base. There are these weird laptops that have a second GPU to augment the (much lower-performance) one built into the laptop; you want to be able to switch back and forth (the high-power GPU gets powered down).

I don't remember if all the work has been done in the X server, but certainly the applications and window managers already "do the right thing" for the most part. It gives me great pleasure that I now usually have less trouble handling adding displays/projectors than many/most Windoze users do.

Please don't declare the problem as unsolvable in advance. If it isn't completely solved, it's mostly solved for displays. We started working on these issues with the xrandr extension almost 20 years ago (which keeps getting augmented with time: thanks keithp!). The fundamental shift architecturally happened there, with applications no longer able to presume their root window was immutable.
- Jim


Posted Oct 10, 2018 19:03 UTC (Wed) by davidstrauss (guest, #85867) [Link]

> Please don't declare the problem as unsolvable in advance.

I'm not sure where you got the impression I was making this claim. I started the thread to highlight that the value of PCIe hotplug isn't realized for certain device types without work higher in the stack.


Posted Oct 9, 2018 19:43 UTC (Tue) by gioele (subscriber, #61675) [Link] (1 responses)

> I think it is not even half the battle. Even if the kernel drivers are improved, userspace will still be a big issue. "Some of the memory you mapped just physically disappeared" seems not easy to handle

Can userspace talk directly to PCIe memory?

And couldn't the userspace program just have a SIGBUS handler that notices the missing hardware, cleans up the state and goes back to normality?
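A minimal sketch of the SIGBUS idea, assuming Linux and using a truncated temporary file as a stand-in for device memory that goes away (no real PCIe device is involved, and whether a driver would actually deliver SIGBUS on hot removal is an assumption). The function name `probe_vanishing_mapping` and the temp-file path are hypothetical:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;

/* SIGBUS handler: jump back to the recovery point instead of dying. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);
}

/* Returns 1 if the access faulted and was recovered, 0 if the read
 * unexpectedly succeeded, -1 on a setup error. */
int probe_vanishing_mapping(void)
{
    char path[] = "/tmp/hotplug-demo-XXXXXX";   /* hypothetical name */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    volatile char *p = map;
    p[0] = 42;                      /* mapping works while backed */

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int result;
    if (sigsetjmp(recover_point, 1) == 0) {
        /* The backing store "physically disappears". */
        if (ftruncate(fd, 0) != 0) {
            result = -1;
        } else {
            (void)p[0];             /* touching the page raises SIGBUS */
            result = 0;
        }
    } else {
        /* Reached via siglongjmp: clean up state here. */
        result = 1;
    }

    sigaction(SIGBUS, &old, NULL);
    munmap(map, (size_t)page);
    close(fd);
    return result;
}
```

Calling `probe_vanishing_mapping()` should report a recovered fault. The hard part in a real program is that the fault can arrive at any load or store touching the mapping, so every such access would need a recovery point like this one, which is part of why this is harder than it looks.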


Posted Oct 10, 2018 0:47 UTC (Wed) by excors (subscriber, #95769) [Link]

Vulkan lets you explicitly allocate device-local memory (i.e. on the GPU) then map it cache-coherently into userspace.

It has the concept of "lost" devices: once a device is lost, most API calls return errors, and the application can clean up and try again with a new graphics device. (Of course, the application might choose to just crash instead.) The spec says "The host address space corresponding to device memory mapped using vkMapMemory is still valid, and host memory accesses to these mapped regions are still valid, but the contents are undefined", which I guess means the kernel can't unmap the memory when it detects the GPU has gone away (because that would likely crash the application), but could map the whole range onto a single dummy page or something.
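As a rough userspace analogy of that "dummy page" guess (not what any kernel driver is known to do), the sketch below replaces an existing mapping in place with fresh anonymous memory using `mmap` with `MAP_FIXED`. The pointer stays valid and dereferenceable, but the old contents are gone. The helper names `replace_with_dummy_pages` and `demo_dummy_remap` are hypothetical, and a shared anonymous mapping stands in for device memory:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: swap whatever backs [addr, addr + len) for fresh
 * anonymous zero-filled memory, keeping the address range valid. */
int replace_with_dummy_pages(void *addr, size_t len)
{
    void *r = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return r == addr ? 0 : -1;
}

/* Returns 0 if the range stayed accessible after the swap and the old
 * contents were replaced by zeros, 1 if not, -1 on a setup error. */
int demo_dummy_remap(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t len = (size_t)page * 4;

    /* Stand-in for mapped device memory. */
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    buf[0] = 0xAB;
    buf[len - 1] = 0xCD;

    /* The "device" goes away: keep the addresses, drop the contents. */
    if (replace_with_dummy_pages(buf, len) != 0) {
        munmap(buf, len);
        return -1;
    }

    /* Same pointer, still readable and writable; old data is gone. */
    int ok = (buf[0] == 0 && buf[len - 1] == 0);
    buf[0] = 1;

    munmap(buf, len);
    return ok ? 0 : 1;
}
```

This uses fresh zero-filled pages rather than literally one shared dummy page, but the effect seen by the application matches the quoted wording: accesses remain valid while the contents no longer mean anything.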


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds