
ARM, DMA, and memory management

By Jonathan Corbet
April 27, 2011
As the effort to bring proper abstractions to the ARM architecture and remove duplicated code continues, one clear problem area that has arisen is DMA memory management. The ARM architecture brings some unique challenges to this area, but the problems are not all ARM-specific. We are also seeing an interesting view into a future where more complex hardware requires new mechanisms within the kernel to operate properly.

One development in the ARM sphere is the somewhat belated addition of I/O memory management units (IOMMUs) to the architecture. An IOMMU sits between a device and main memory, translating addresses between the two. One obvious application of an IOMMU is to make physically scattered memory look contiguous to the device, simplifying large DMA transfers. An IOMMU can also restrict DMA access to a specific range of memory, adding a layer of protection to the system. Even in the absence of security worries, a device which can scribble on random memory can cause no end of hard-to-debug problems.

As this feature has come to ARM systems, developers have, in the classic ARM fashion, created special interfaces for the management of IOMMUs. The only problem is that the kernel already has an interface for the management of IOMMUs - it's the DMA API. Drivers which use this API should work on just about any architecture; all of the related problems, including cache coherency, IOMMU programming, and bounce buffering, are nicely hidden. So it seems clear that the DMA API is the mechanism by which ARM-based drivers, too, should work with IOMMUs; ARM maintainer Russell King recently made this point in no uncertain terms.
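For reference, this is roughly what the streaming side of that API looks like from a driver's point of view; the function and buffer names below are made up for illustration, but the calls are the standard ones. The point is that the driver never sees what happens underneath:

    #include <linux/dma-mapping.h>
    #include <linux/errno.h>

    /* Hypothetical receive setup; "dev" is the driver's struct device. */
    static int my_start_rx(struct device *dev, void *buf, size_t len)
    {
            dma_addr_t handle;

            /*
             * Whether this call ends in cache maintenance, IOMMU
             * programming, or a bounce buffer is the architecture's
             * problem, not the driver's.
             */
            handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
            if (dma_mapping_error(dev, handle))
                    return -EIO;

            /* ... hand "handle" to the device and start the transfer ... */

            dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);
            return 0;
    }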

That said, there are some interesting difficulties which arise when using the DMA API on the ARM architecture. Most of these problems have their roots in the architecture's inability to deal with multiple mappings to a page if those mappings do not all share the same attributes. This is a problem which has come up before; see this article for more information. In the DMA context, it is quite easy to create mappings with conflicting attributes, and performance concerns are likely to make such conflicts more common.

Long-lasting DMA buffers are typically allocated with dma_alloc_coherent(); as might be expected from the name, these are cache-coherent mappings. One longstanding problem (not just on ARM) is that some drivers need large, physically-contiguous DMA areas which can be hard to come by after the system has been running for a while. A number of solutions to this problem have been tried; most of them, like the CMA allocator, involve setting aside memory at boot time. Using such memory on ARM can be tricky, as it may end up being mapped as if it were device memory, and may run afoul of the conflicting attributes rules.
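As a purely illustrative example (the device pointer and buffer size are hypothetical), a driver wanting such a buffer would do something like:

    #include <linux/dma-mapping.h>
    #include <linux/errno.h>
    #include <linux/gfp.h>

    #define MY_BUF_SIZE (64 * 1024)   /* hypothetical buffer size */

    static int my_setup_buffer(struct device *dev)
    {
            dma_addr_t dma_handle;
            void *cpu_addr;

            /* One call yields both a kernel virtual address and a bus
             * address for the device; the mapping is cache-coherent. */
            cpu_addr = dma_alloc_coherent(dev, MY_BUF_SIZE, &dma_handle,
                                          GFP_KERNEL);
            if (!cpu_addr)
                    return -ENOMEM;

            /* ... the CPU uses cpu_addr, the device is given dma_handle ... */

            dma_free_coherent(dev, MY_BUF_SIZE, cpu_addr, dma_handle);
            return 0;
    }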

More recently, a different problem has come up: in some cases, developers want to establish these DMA areas as uncached memory. Since main memory is already mapped into the kernel's address space as cached, there is no way to map it as uncached in another context without breaking the rules. Given this conflict, one might well wonder (as some developers did) why uncached DMA mappings are wanted. The reason, as explained by Rebecca Schultz Zavin, has to do with graphics. It's common for applications to fill memory with images and textures, then hand them over to the GPU without touching them further. In this situation, there's no advantage to having the memory represented in the CPU's cache; indeed, using cache lines for that memory can hurt performance. Going uncached (but with write combining) turns out to give a significant performance improvement.
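On ARM, drivers can already ask for this kind of mapping with dma_alloc_writecombine(); a minimal (and hypothetical) use looks like the following sketch. The call itself is simple; the harder part, as described above, is that the pages behind it are normally still covered by the kernel's cached linear mapping:

    #include <linux/dma-mapping.h>
    #include <linux/gfp.h>

    /* Hypothetical helper: allocate an uncached, write-combining buffer
     * to be filled by the CPU and scanned out by the GPU. */
    static void *my_alloc_scanout(struct device *dev, size_t size,
                                  dma_addr_t *handle)
    {
            return dma_alloc_writecombine(dev, size, handle, GFP_KERNEL);
    }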

But nobody will appreciate the higher speed if the CPU behaves strangely in response to multiple mappings with different attributes. Rebecca listed a few possible solutions to that problem she had thought of; some have been tried before, and none are seen as ideal. One is to set aside memory at boot time - as is sometimes done to provide large buffers - and never map that memory into the kernel's address space. Another approach is to use high memory for these buffers; high memory is normally not mapped into the kernel's address space. ARM-based systems have typically not needed high memory, but as more systems ship with 1GB (or more) of memory, we'll see more use of high memory. The final alternative would be to tweak the attributes in the kernel's mapping of the affected memory. That would be somewhat tricky; that memory is mapped with huge pages which would have to be split apart.
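As a rough sketch of the first option (with entirely made-up addresses and sizes), a platform's boot-time reservation hook could pull a region out of the kernel's view of memory so that it never receives a cached linear mapping; a driver can then map it later with whatever attributes it needs:

    #include <linux/init.h>
    #include <linux/memblock.h>

    #define CAMERA_BUF_BASE 0x60000000UL          /* hypothetical physical address */
    #define CAMERA_BUF_SIZE (16 * 1024 * 1024)    /* hypothetical size */

    static void __init my_board_reserve(void)
    {
            /* Remove the region from the memory map entirely so the
             * kernel never creates a cached mapping for it. */
            memblock_remove(CAMERA_BUF_BASE, CAMERA_BUF_SIZE);
    }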

These issues - and others - have been summarized in a "to do" list by Arnd Bergmann. There's clearly a lot of work to be done to straighten out this interface, even given the current set of problems. But there is another cloud on the horizon in the form of the increasing need to share these buffers between devices. One example can be found in this patch, which is an attempt to establish graphical overlays as proper objects in the kernel mode setting graphics environment. Overlays are a way of displaying (usually) high-rate graphics on top of what the window system is doing; they are traditionally used for tasks like video playback. Often, what is wanted is to take frames directly from a camera and show them on the screen, preferably without copying the data or involving user space. These new overlays, if properly tied into the Video4Linux layer's concept of overlays, should allow that to happen.

Hardware is getting more sophisticated over time, and, as a result, device drivers are becoming more complicated. A peripheral device is now often a reasonably capable computer in its own right; it can be programmed and left to work on its own for extended periods of time. It is only natural to want these peripherals to be able to deal directly with each other. Memory is the means by which these devices will communicate, so we need an allocation and management mechanism that can work in that environment. There have been suggestions that the GEM memory manager - currently used with GPUs - could be generalized to work in this mode.

So far, nobody has really described how all this could work, much less posted patches. Working all of these issues out is clearly going to take some time. It looks like a fun challenge for those who would like to help set the direction for our kernels in the future.

