Error handling for I/O memory management units

Posted Aug 25, 2014 9:22 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
In reply to: Error handling for I/O memory management units by cladisch
Parent article: Error handling for I/O memory management units

We now have GPUs that can access system RAM, and it's certainly possible for them to get IOMMU errors while running code supplied by user space.

Error handling for I/O memory management units

Posted Aug 25, 2014 10:19 UTC (Mon) by cladisch (✭ supporter ✭, #50193)

Allowing userspace to use the GPU to read/write any memory would be a security hole.
The GPU driver checks the command stream for correctness.

Error handling for I/O memory management units

Posted Aug 25, 2014 10:22 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

OpenCL supports arbitrary pointer arithmetic. It's impossible to statically check the command stream for correctness.
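To illustrate the point with a toy model (Python standing in for kernel code; `kernel`, `checked_store`, and the buffer layout are all hypothetical and not any real OpenCL or driver interface): the address a kernel writes to can depend on data that exists only at run time, so inspecting the code alone cannot decide whether the write stays in bounds. Only a check made at the time of the access can.

```python
MEMORY_SIZE = 64
ALLOWED = range(16, 32)   # the region this user's buffer occupies (assumed)


class OutOfBounds(Exception):
    pass


def checked_store(memory, addr, value):
    # Run-time check, playing the role an MMU/IOMMU plays in hardware.
    if addr not in ALLOWED:
        raise OutOfBounds(
            f"store to {addr} outside {ALLOWED.start}..{ALLOWED.stop - 1}"
        )
    memory[addr] = value


def kernel(memory, base, user_data):
    # The target address is base + a value taken from the input data;
    # nothing in this code says which addresses will be touched.
    for offset in user_data:
        checked_store(memory, base + offset, 0xFF)


memory = [0] * MEMORY_SIZE
kernel(memory, 16, [0, 3, 15])        # same code, benign input: stays in bounds
try:
    kernel(memory, 16, [0, 40])       # same code, hostile input: escapes
except OutOfBounds as e:
    print("blocked:", e)
```

The two calls run identical code; only the input differs, which is why a check on the command stream by itself would not be enough in this model.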

Error handling for I/O memory management units

Posted Aug 25, 2014 11:49 UTC (Mon) by intgr (subscriber, #39733)

Does this mean that, if 2 users are both running code on the GPU, they can access and corrupt each other's data?

And without an IOMMU they can access all physical memory?

And Linux 3.15 merged patches to allow GPGPU (OpenCL) access to any unprivileged user by default (via DRM render nodes)?

And no checking could possibly be done of the code being executed?

PLEASE tell me I am misunderstanding something.

Error handling for I/O memory management units

Posted Aug 25, 2014 14:52 UTC (Mon) by jzbiciak (guest, #5246)

My understanding of the support at least some GPUs provide (whether or not Linux natively leverages it) is that you can provide an MMU context with a particular command stream. There isn't a global mapping table so that the GPU can see the union of mappings across all requestors. Rather, command streams coming from X get checked against an MMU context associated with X, and command streams coming from Y get checked against an MMU context associated with Y.

And within that framework, my understanding is that GPUs can trigger page faults, and that that is not an error. At least, that's what AMD's Kaveri was promising some time ago, and what I've seen in some other vendors' GPU+MMU pitches.
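A toy model of that per-context lookup, in Python. Everything here (the `MMUContext` class, the `translate` function, the 4 KiB page size, the fake frame numbers) is made up for illustration and does not correspond to any real driver or hardware interface. It only shows that an access is checked against the submitter's own table, and that a miss on a valid but non-resident page can be serviced rather than treated as an error.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class AccessError(Exception):
    """The address has no mapping in the submitter's context."""


class MMUContext:
    """Hypothetical per-requestor page table: virtual page -> physical page."""

    def __init__(self, name):
        self.name = name
        self.resident = {}      # vpage -> ppage, currently mapped
        self.swapped = set()    # valid pages that are not resident right now
        self.faults = 0

    def handle_fault(self, vpage):
        # Recoverable page fault: bring a valid page in and continue.
        self.faults += 1
        self.swapped.discard(vpage)
        self.resident[vpage] = 0x1000 + vpage  # arbitrary fake frame number


def translate(ctx, vaddr):
    """Translate vaddr using only ctx's own table, never a global one."""
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage not in ctx.resident:
        if vpage not in ctx.swapped:
            raise AccessError(f"{ctx.name}: no mapping for {vaddr:#x}")
        ctx.handle_fault(vpage)
    return ctx.resident[vpage] * PAGE_SIZE + offset


x = MMUContext("X")
y = MMUContext("Y")
x.resident[0] = 7       # X has page 0 mapped to frame 7
x.swapped.add(1)        # X's page 1 is valid but not resident
y.resident[5] = 9       # Y has only page 5

print(hex(translate(x, 0x10)))      # resident page: plain translation
print(hex(translate(x, 0x1004)))    # non-resident page: fault, then success
try:
    translate(y, 0x10)              # Y has no page 0: a real error
except AccessError as e:
    print("error:", e)
```

In this sketch Y cannot reach X's page 0 even though X has it mapped, because the lookup never consults any table other than Y's own.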

So I repeat my question: Does 'error' in the article refer to page faults in general, or an actual application error?

Error handling for I/O memory management units

Posted Aug 25, 2014 19:21 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

As I understand it, GPUs currently have access only to some regions of RAM, not all of it, though that is changing with the new heterogeneous architectures.

Command buffers are also scheduled to run exclusively, which gives _some_ protection. That has plenty of downsides (you can't run for too long, or you starve other users), but it's also changing.