
Error handling for I/O memory management units

By Jonathan Corbet
August 20, 2014

Kernel Summit 2014
Traditionally the Kernel Summit focuses more on process-oriented issues than technical problems; the latter are usually deemed to be better addressed by smaller groups on the mailing lists. The 2014 Summit, however, started off with a purely technical topic: how to handle errors signaled by I/O memory management units (IOMMUs)?

An IOMMU performs translations between memory addresses as seen by devices and those seen by the CPU. It can be used to present a simplified address space to peripherals, to make a physically scattered buffer look contiguous, or to restrict device access to a limited range of memory. Not all systems have IOMMUs in them, but there is a slow trend toward including them in a wider range of systems.

David Woodhouse started off by saying that, in the IOMMU context, there is no generalized way of reporting errors, so drivers cannot easily be notified if something goes wrong with an IOMMU. What does exist is a certain amount of subsystem-specific infrastructure. The PowerPC architecture has its "extended error handling" (EEH) structure that "only Ben Herrenschmidt understands," and the PCI subsystem has an error-handling mechanism as well. But the kernel needs a coherent way to report errors from an IOMMU to drivers regardless of how they are connected to the system. There also needs to be a generalized mechanism by which an offending device can be shut down to avoid crippling the system with interrupt storms.

David presented a possible way forward, which was based on extending and generalizing the PCI error-handling infrastructure. It would need to be moved beyond PCI and given extra capabilities, such as the ability to provide information on exactly what went wrong and what the offending address was. He asked: does anybody have any strong opinions on the subject? This is the Kernel Summit, so, of course, opinions were on offer.

Ben started by saying that it is not always easy to get specific information about a fault to report. The response to errors — often wired into the hardware — is to isolate the entire group of devices behind a faulting IOMMU; at that point, there is no access to any information to pass on. Drivers do have the ability to ask for fault notification, and they can attempt to straighten out a confused device. In the absence of driver support, he said, the default response is to simulate unplug and replug events on the device.

David pointed out that with some devices, graphics adapters in particular, users do not want the device to stop even in the presence of an error. One command stream may fault and be stopped, but others running in parallel should be able to continue. So a more subtle response is often necessary.

Josh Triplett asked about what the response should be, in general, when something goes wrong. Should recovery paths do anything other than give up and reset the device fully? For most devices, a full reset seems like an entirely adequate response, though, as mentioned, graphics adapters are different. Networking devices, too, might need a gentler hand if error recovery is to be minimally disruptive. But David insisted that, almost all the time, full isolation and resetting of the device is good enough. The device, he said, "has misbehaved and is sitting on the naughty step."

Andi Kleen asked how this kind of error-handling code could be tested. In the absence of comprehensive testing, the code will almost certainly be broken. David responded that it's usually easy to make a device attempt DMA to a bad address, and faults can be injected as well. But Ben added that, even with these tools, EEH error handling breaks frequently anyway.

David asked: what about ARM? Will Deacon responded that there are no real standards outside of PCI; he has not seen anything in the ARM space that responds sanely to errors. He also pointed out that hypervisors can complicate the situation. An IOMMU can be used to provide limited DMA access to a guest, but that means exposing the guest to potential IOMMU errors. A guest may end up isolating an offending device, confusing the host, which may not be expecting that.

Arnd Bergmann asked that any error-handling solution not be specific to PCI devices since, in the ARM world, there often is not a PCI bus involved. David suggested that the existing PCI error-handling infrastructure could be a good place to start if it could be made more generic. There are some PCI-specific concepts there that would have to be preserved (for PCI devices) somehow, but much of it could be moved up into struct device and generalized. After hearing no objections to that approach, David said he would go off and implement it. He will not, he noted dryly, be interested in hearing objections later on.

Next: Stable tree maintenance.


Posted Aug 20, 2014 23:03 UTC (Wed) by neilbrown (subscriber, #359)

Pardon my ignorance, but what sort of errors are we talking about here? What can "go wrong with an IOMMU"?

The article mentions "bad addresses" but presumably you can program DMA to bad addresses even without an IOMMU ... and presumably we try not to even when we have one.

Thanks.

Posted Aug 21, 2014 5:43 UTC (Thu) by mcpherrinm (guest, #92295)

An IOMMU gives devices virtual memory, basically. This is similar to what we do with process address spaces. We're talking about a device doing something equivalent to a process segfaulting.

For example, a GPU may load a GL shader program with a bug that causes out of bounds reads to happen on the GPU. If executed by a malicious user, that could leak information the system doesn't intend them to have access to.

Posted Aug 23, 2014 13:25 UTC (Sat) by corsac (subscriber, #49696)

Or someone could remotely take control of your network card and try DMA writes from there, compromising the host.

Posted Aug 23, 2014 22:34 UTC (Sat) by neilbrown (subscriber, #359)

These are perfectly good answers for why an IOMMU is a valuable thing to have, but don't seem to answer the question: what sort of error can you get from an IOMMU.

If you have a system without an IOMMU, then it is quite possible to program a DMA engine in some device to access an illegal address - maybe some address where there isn't any memory. Presumably an error gets reported ... or maybe it doesn't. Maybe it just silently fails.

If you add an IOMMU, then that greatly increases the range of addresses that are illegal for any given device, but surely the device will just fail in exactly the same way that it did before. I don't see any new sorts of errors. I must be missing something.

So I'm still hoping someone can explain to me what sort of errors one can get from an IOMMU.

Posted Aug 24, 2014 3:51 UTC (Sun) by dlang (guest, #313)

I could be wrong, but I think the answer is that, without an IOMMU, the DMA will either succeed or fail; the only thing that knows which is the device trying to do the DMA.

However, with an IOMMU, the IOMMU itself can report that the device attempted to access memory it's not allowed to.

The question is what should be done when a device misbehaves, and how should it be reported?

Posted Aug 25, 2014 5:47 UTC (Mon) by jzbiciak (guest, #5246)

I have a related question. What is meant by "error"?

If a device (CPU or anything else) asks an MMU for a virtual to physical address translation for a given read or write, the MMU can report back a fault. However, the request itself may not have been an actual error. Rather, the fault could have arisen for many other reasons, including:

  • The page may have been unmapped by the OS but still resident. The fault informs the OS that it's still active. (A so-called minor fault.)
  • The page may have been marked read-only (because it's shared or clean) but the device requested to write, so the OS needs to mark it dirty and possibly perform a COW. (Another sort of minor fault, I believe.)
  • The page may have been paged out. (A major fault, but still not an error.)

Are these situations considered errors or not? This comment in particular made me wonder:

    David pointed out that with some devices, graphics adapters in particular, users do not want the device to stop even in the presence of an error. One command stream may fault and be stopped, but others running in parallel should be able to continue. So a more subtle response is often necessary.

Is that so that the kernel can service a page fault for a given task, or is it the more general situation that you don't want the entire display system brought down by an errant task? (It seems like the latter would be pretty key baseline functionality. The former, though, is something I believe AMD's Kaveri was promising.)

So does "error" in this article mean "faults that would result in SIGSEGV or similar if a userland CPU task did it", or "faults that correspond to minor/major page faults in an otherwise well-behaved userland CPU task, in addition to those which would cause SIGSEGVs"?

Posted Aug 25, 2014 8:43 UTC (Mon) by cladisch (✭ supporter ✭, #50193)

DMA is always done from/to buffers that have been allocated or locked by the device driver.

With DMA, there is no such thing as a minor fault; any IOMMU fault is the result of a bug in the OS/driver or in the hardware.

Posted Aug 25, 2014 9:22 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

We now have GPUs that can access system RAM, and it's certainly possible for them to get IOMMU errors while running userspace-supplied code.

Posted Aug 25, 2014 10:19 UTC (Mon) by cladisch (✭ supporter ✭, #50193)

Allowing userspace to use the GPU to read/write any memory would be a security hole.
The GPU driver checks the command stream for correctness.

Posted Aug 25, 2014 10:22 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

OpenCL supports arbitrary pointer arithmetic. It's impossible to statically check the command stream for correctness.

Posted Aug 25, 2014 11:49 UTC (Mon) by intgr (subscriber, #39733)

Does this mean that, if 2 users are both running code on the GPU, they can access and corrupt each other's data?

And without an IOMMU they can access all physical memory?

And Linux 3.15 merged patches to allow GPGPU (OpenCL) access to any unprivileged user by default (via DRM render nodes)?

And no checking could possibly be done of the code being executed?

PLEASE tell me I am misunderstanding something.

Posted Aug 25, 2014 14:52 UTC (Mon) by jzbiciak (guest, #5246)

My understanding of the support at least some GPUs provide (whether or not Linux natively leverages it) is that you can provide an MMU context with a particular command stream. There isn't a global mapping table so that the GPU can see the union of mappings across all requestors. Rather, command streams coming from X get checked against an MMU context associated with X, and command streams coming from Y get checked against an MMU context associated with Y.

And within that framework, my understanding is that GPUs can trigger page faults, and that that is not an error. At least, that's what AMD's Kaveri was promising some time ago, and what I've seen in some other vendors' GPU+MMU pitches.

So I repeat my question: Does 'error' in the article refer to page faults in general, or an actual application error?

Posted Aug 25, 2014 19:21 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

As I understand it, GPUs currently have access only to some RAM regions, not the whole of RAM, though that is changing with the newer heterogeneous architectures.

Command buffers are also scheduled to run exclusively, so that gives _some_ protection. There are lots of downsides (you can't run for too long, or you starve other users), but that is changing too.


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds