By Jonathan Corbet
April 27, 2011
As the effort to bring proper abstractions to the ARM architecture and
remove duplicated code continues, one clear problem area to emerge is DMA
memory management. The ARM architecture brings some unique challenges
here, but the problems are not all ARM-specific.
We are also seeing an interesting view into a future where more complex
hardware requires new mechanisms within the kernel to operate properly.
One development in the ARM sphere is the somewhat belated addition of I/O
memory management units (IOMMUs) to the architecture. An IOMMU sits
between a device and main memory, translating addresses between the two.
One obvious application of an IOMMU is to make physically scattered memory
look contiguous to the device, simplifying large DMA transfers. An IOMMU
can also restrict DMA access to a specific range of memory, adding a layer
of protection to the system. Even in the absence of security worries,
a device which can scribble on random memory can cause no end of
hard-to-debug problems.
As this feature has come to ARM systems, developers have, in the classic
ARM fashion, created special interfaces for
the management of IOMMUs. The
only problem is that the kernel already has an interface for the management
of IOMMUs - it's the DMA API. Drivers which use this API should work on
just about any architecture; all of the related problems, including cache
coherency, IOMMU programming, and bounce buffering, are nicely hidden. So
it seems clear that the DMA API is the mechanism by which ARM-based
drivers, too, should work with IOMMUs; ARM maintainer Russell King recently
made this point in no
uncertain terms.
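In rough terms, a driver using the streaming side of the DMA API maps a
buffer around each transfer and lets the core take care of the rest. A
minimal sketch (the device, buffer, and length names below are invented
for the example) might look like this:

    #include <linux/dma-mapping.h>

    static int start_transfer(struct device *dev, void *buf, size_t len)
    {
            dma_addr_t handle;

            /* Map the buffer for a CPU-to-device transfer; cache
             * maintenance, bounce buffering, and any IOMMU programming
             * happen behind this call. */
            handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
            if (dma_mapping_error(dev, handle))
                    return -ENOMEM;

            /* ... hand "handle" to the device and run the transfer ... */

            /* Release the mapping once the device is done with it. */
            dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
            return 0;
    }

The same code should work whether the platform has an IOMMU, needs bounce
buffers, or can simply pass the physical address straight to the device.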
That said, there are some interesting difficulties which arise when using
the DMA API on the ARM architecture. Most of these problems have their
roots in the architecture's inability to deal with multiple mappings to a
page if those mappings do not all share the same attributes. This is a
problem which has come up before; see this
article for more information. In the DMA context, it is quite easy to
create mappings with conflicting attributes, and performance concerns are
likely to make such conflicts more common.
Long-lasting DMA buffers are typically allocated with
dma_alloc_coherent(); as might be expected from the name, these
are cache-coherent mappings. One longstanding problem (not just on ARM) is
that some drivers need large, physically-contiguous DMA areas which can be
hard to come by after the system has been running for a while. A number of
solutions to this problem have been tried; most of them, like the CMA allocator, involve setting aside memory at
boot time. Using such memory on ARM can be tricky, as it may end up being
mapped as if it were device memory, and may run afoul of the conflicting
attributes rules.
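For reference, the coherent allocation interface itself is simple; a
driver needing such a buffer does something along these lines (the helper
name and the 64KB size are placeholders for illustration):

    #include <linux/dma-mapping.h>

    /* Allocate a long-lived, cache-coherent DMA area; illustrative only. */
    static void *setup_dma_area(struct device *dev, dma_addr_t *handle)
    {
            /* Returns a CPU virtual address; *handle receives the bus
             * address for the device.  The region is physically
             * contiguous, which is what makes large allocations hard on
             * a long-running system. */
            return dma_alloc_coherent(dev, 64 * 1024, handle, GFP_KERNEL);
    }

The buffer is given back with dma_free_coherent() when the driver no
longer needs it.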
More recently, a different problem has come up: in some cases, developers
want to establish these DMA areas as uncached memory. Since main memory is
already mapped into the kernel's address space as cached, there is no way
to map it as uncached in another context without breaking the rules. Given
this conflict, one might well wonder (as some developers did) why uncached
DMA mappings are wanted. The reason, as explained
by Rebecca Schultz Zavin, has to do with graphics. It's common for
applications to fill memory with images and textures, then hand them over
to the GPU without touching them further. In this situation, there's no
advantage to having the memory represented in the CPU's cache; indeed,
using cache lines for that memory can hurt performance. Going uncached
(but with write combining) turns out to give a significant performance
improvement.
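ARM (along with a few other architectures) provides a write-combining
variant of the coherent allocator for just this kind of use; a brief
sketch (with an invented helper name) would be:

    #include <linux/dma-mapping.h>

    /* Illustrative: allocate an uncached, write-combined buffer for data
     * that the CPU writes once and the GPU then reads. */
    static void *alloc_texture_buffer(struct device *dev, size_t size,
                                      dma_addr_t *handle)
    {
            /* CPU stores are gathered into bursts rather than cached,
             * which suits write-once data like images and textures. */
            return dma_alloc_writecombine(dev, size, handle, GFP_KERNEL);
    }

Such a buffer is released with dma_free_writecombine().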
But nobody will appreciate the higher speed if the CPU behaves strangely in
response to multiple mappings with different attributes. Rebecca listed
a few possible solutions to that problem; some have been tried before, and
none are seen as ideal. One is to set aside
memory at boot time - as is sometimes done to provide large buffers - and
never map that memory into the kernel's address space. Another approach is
to use high memory for these buffers; high memory is normally not mapped
into the kernel's address space. ARM-based systems have typically not
needed high memory, but as systems with 1GB (or more) of memory become
more common, we'll see more use of high memory there. The final alternative would
be to tweak the attributes in the kernel's mapping of the affected memory.
That would be somewhat tricky; that memory is mapped with huge pages which
would have to be split apart.
These issues - and others - have been summarized in a "to do" list by Arnd
Bergmann. There's clearly a lot of work to be done to straighten out this
interface, even given the current set of problems. But there is another
cloud on the horizon in the form of the increasing need to share these
buffers between devices. One example can be found in this patch, which is an attempt to establish
graphical overlays as proper objects in the kernel mode setting graphics
environment. Overlays are a way of displaying (usually) high-rate graphics
on top of what the window system is doing; they are traditionally used for
tasks like video playback. Often, what is wanted is to take frames
directly from a camera and show them on the screen, preferably without
copying the data or involving user space. These new overlays, if properly
tied into the Video4Linux layer's concept of overlays, should allow that to
happen.
Hardware is getting more sophisticated over time, and, as a result, device
drivers are becoming more complicated. A peripheral device is now often a
reasonably capable computer in its own right; it can be programmed and left
to work on its own for extended periods of time. It is only natural to
want these peripherals to be able to deal directly with each other. Memory
is the means by which these devices will communicate, so we need an
allocation and management mechanism that can work in that environment.
There have been suggestions
that the GEM memory manager - currently used with GPUs - could be
generalized to work in this mode.
So far, nobody has really described how all this could work, much less
posted patches. Working all of these issues out is clearly going to take
some time. It looks like a fun challenge for those who would like to help
set the direction for our kernels in the future.