KS2012: ARM: DMA mapping
In the last discussion on day one of the 2012 ARM minisummit, Marek Szyprowski gave a status update on changes in the ARM DMA subsystem over the last year. There has been a lot of work in that time, with most of it having been merged in 3.5. The most important change is the conversion to dma_map_ops, which provides a common DMA framework that can be implemented as needed for each architecture. It allows for both coherent and non-coherent devices, and it supports bounce buffers and IOMMUs.
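The idea behind an operations table like dma_map_ops can be sketched in a few lines of user-space C. This is only an illustrative model under stated assumptions: every name below (model_dma_ops, model_device, model_dma_map(), and so on) is hypothetical and does not match the kernel's actual definitions, and "cache maintenance" is reduced to a counter. It shows how generic entry points can dispatch to a coherent or a non-coherent backend without the caller knowing which one is in use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a per-device DMA operations table.
 * These names are illustrative; they are not the kernel's definitions. */
typedef uint64_t model_dma_addr_t;

struct model_device;

struct model_dma_ops {
    const char *name;
    /* Make a CPU buffer visible to the device; return its bus address. */
    model_dma_addr_t (*map)(struct model_device *dev, void *cpu_addr, size_t len);
    /* Hand the buffer back to the CPU. */
    void (*unmap)(struct model_device *dev, model_dma_addr_t handle, size_t len);
};

struct model_device {
    const struct model_dma_ops *ops;
    unsigned cache_syncs;   /* counts simulated cache maintenance operations */
};

/* Coherent backend: no cache maintenance is needed. */
model_dma_addr_t coherent_map(struct model_device *dev, void *cpu_addr, size_t len)
{
    (void)dev; (void)len;
    return (model_dma_addr_t)(uintptr_t)cpu_addr;
}

void coherent_unmap(struct model_device *dev, model_dma_addr_t handle, size_t len)
{
    (void)dev; (void)handle; (void)len;
}

/* Non-coherent backend: simulate a cache clean on map and an
 * invalidate on unmap by bumping a counter. */
model_dma_addr_t noncoherent_map(struct model_device *dev, void *cpu_addr, size_t len)
{
    (void)len;
    dev->cache_syncs++;
    return (model_dma_addr_t)(uintptr_t)cpu_addr;
}

void noncoherent_unmap(struct model_device *dev, model_dma_addr_t handle, size_t len)
{
    (void)handle; (void)len;
    dev->cache_syncs++;
}

const struct model_dma_ops coherent_ops = { "coherent", coherent_map, coherent_unmap };
const struct model_dma_ops noncoherent_ops = { "non-coherent", noncoherent_map, noncoherent_unmap };

/* Generic entry points: callers use these and never see the backend. */
model_dma_addr_t model_dma_map(struct model_device *dev, void *buf, size_t len)
{
    return dev->ops->map(dev, buf, len);
}

void model_dma_unmap(struct model_device *dev, model_dma_addr_t handle, size_t len)
{
    dev->ops->unmap(dev, handle, len);
}

/* Run one map/unmap cycle and report how many simulated cache
 * operations the chosen backend performed. */
unsigned model_roundtrip(const struct model_dma_ops *ops)
{
    char buf[64];
    struct model_device dev = { ops, 0 };
    model_dma_addr_t handle = model_dma_map(&dev, buf, sizeof(buf));

    model_dma_unmap(&dev, handle, sizeof(buf));
    return dev.cache_syncs;
}
```

In this model, a bounce-buffer or IOMMU backend would simply be another instance of the same table, which is the property that makes such a framework reusable across architectures.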
The second most important change was the addition of the Contiguous Memory Allocator (CMA). It is in 3.5, but is still marked as experimental. It has been tested on some systems, and Szyprowski hopes that it will stabilize over the next kernel cycle or so.
Lastly, a bunch of new attributes for DMA operations have been added. These are mostly for improving performance and to "avoid some hacks", Szyprowski said. For upcoming releases, he would like to work on better support for declaring coherent areas.
For 3.5, there was work to remove some of the limits on DMA, in particular the 2MB limit on mappings. The fixed-size coherent area has been replaced with memory from vmalloc(). That can't be done in atomic context, however, so there is a small pre-allocation for use in that context. For some devices that buffer was too small, so the size has been made platform-dependent. The IOMMU implementation had no support for an atomic buffer at all, but patches have been posted recently, which he hopes to get into 3.6.
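The split between a general allocation path and a small pre-allocated buffer for atomic context can be modeled roughly as follows. This is a user-space sketch, not the kernel code: malloc() stands in for the path that may sleep, the pool size is an arbitrary illustrative value, and all names (atomic_pool_alloc(), model_dma_alloc(), ATOMIC_POOL_SIZE) are hypothetical. The pool is a simple bump allocator with no freeing and no alignment handling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space model; the size and names are illustrative. */
#define ATOMIC_POOL_SIZE 4096   /* in the article's terms, platform-dependent */

static unsigned char atomic_pool[ATOMIC_POOL_SIZE];
static size_t atomic_pool_used;

/* Bump allocator over the pre-allocated pool. It never blocks, but it
 * fails once the pool is exhausted (no free, no alignment handling). */
void *atomic_pool_alloc(size_t len)
{
    void *p;

    if (len == 0 || len > ATOMIC_POOL_SIZE - atomic_pool_used)
        return NULL;
    p = atomic_pool + atomic_pool_used;
    atomic_pool_used += len;
    return p;
}

/* Report whether a pointer lies inside the pre-allocated pool. */
bool from_atomic_pool(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t start = (uintptr_t)atomic_pool;

    return addr >= start && addr < start + ATOMIC_POOL_SIZE;
}

/* Allocation entry point: callers that must not sleep are served only
 * from the pool; everyone else takes the general path, modeled here
 * with malloc(). */
void *model_dma_alloc(size_t len, bool atomic)
{
    if (atomic)
        return atomic_pool_alloc(len);  /* may fail if the pool is too small */
    return malloc(len);
}
```

The failure mode described in the paragraph above falls out directly: an atomic request larger than what remains in the pool returns NULL, which is why a fixed pool size does not suit every device.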
The IOMMU code is not particularly ARM-specific, Szyprowski said; it could be used for other architectures. There is a bit more work to isolate the common code and make it generic, but he would need to coordinate that work with the other architectures. Arnd Bergmann suggested just moving the code to a generic place, but leaving it turned off for other architectures. That would allow others interested to turn it on and try it out.
Bergmann noted that when CMA was proposed a year and a half ago, it was envisioned that it would be unconditionally built for all v6 and v7 platforms. But that would make all recent ARM architectures depend on an experimental feature, so he suggested that it might be time to turn off the experimental designation.
There are still some issues that need to be resolved before that can happen, Szyprowski said. There are cases where the allocation can fail because of different accounting between movable and non-movable regions. But Mel Gorman strongly recommended building CMA by default since the problems just result in an allocation failure and do not cause a full system failure. He suggested making CMA the default with a fall-back to the old code if it fails. That way people will start using the feature, potentially see fall-back warnings, and help fix the problems. If it stays as an experimental feature, he fears that no one will actually use and test CMA.
Bergmann thought that any platform using a boot-time reservation of memory (i.e. a "carve-out") should be forced into using CMA. One of the problems with that idea is that some of the carve-outs are not upstream because they are for out-of-tree graphics hardware. In addition, the vendors are moving on and are no longer interested in adding features or updating their drivers to use a new feature like CMA.
Noting that there are multiple ways to do carve-outs, Gorman also suggested creating a core carve-out API for code consolidation. It could provide memory that is isolated or DMA-able, for example, so that all of the carve-outs in the kernel could use it. CMA could underlie that API, and it could implement the fall-back until CMA shakes out.
Fragmentation within CMA regions was mentioned as a concern. While Gorman didn't think it all that likely to happen in practice, some noted that there were already problems when using memory regions for OpenGL. User space actions can cause significant fragmentation in that case. Szyprowski suggested using separate CMA regions as a way to reduce the problem.
CMA still needs work to support highmem; there is no reason that it needs to be restricted to lowmem. Szyprowski hopes to get some time to work on that in the future. Wiring up CMA to x86 DMA is another thing that he plans to work on.
