
KS2012: ARM: DMA mapping

By Jake Edge
August 29, 2012

2012 Kernel Summit

In the last discussion on day one of the 2012 ARM minisummit, Marek Szyprowski gave a status update on changes in the ARM DMA subsystem over the last year. There has been a lot of work in that time, with most of it having been merged in 3.5. The most important change is the conversion to dma_map_ops, which provides a common DMA framework that can be implemented as needed for each architecture. It handles both coherent and non-coherent devices, and it supports bounce buffers and IOMMUs.
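The idea behind an operations table like dma_map_ops can be illustrated with a small user-space sketch. Everything here is hypothetical: `struct toy_dma_ops`, `toy_dma_map()`, and the two back ends are invented for illustration and are not the kernel's actual structure or function signatures. The sketch only shows how callers can use one entry point while a directly addressable device and a bounce-buffered device get different behavior.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a table of DMA operations (not the kernel struct). */
struct toy_dma_ops {
    uintptr_t (*map)(void *cpu_addr, size_t len);
    void (*unmap)(uintptr_t dma_addr, void *cpu_addr, size_t len);
};

/* Each toy device points at the table that suits its hardware. */
struct toy_device {
    const struct toy_dma_ops *ops;
};

/* Direct back end: the device is assumed to reach the buffer as-is. */
static uintptr_t direct_map(void *cpu_addr, size_t len)
{
    (void)len;
    return (uintptr_t)cpu_addr;
}

static void direct_unmap(uintptr_t dma_addr, void *cpu_addr, size_t len)
{
    (void)dma_addr; (void)cpu_addr; (void)len;   /* nothing to undo */
}

/* Bounce back end: copy through a single buffer the device can reach. */
static unsigned char bounce_buf[256];

static uintptr_t bounce_map(void *cpu_addr, size_t len)
{
    if (len > sizeof(bounce_buf))
        return 0;                                /* toy convention: 0 means failure */
    memcpy(bounce_buf, cpu_addr, len);
    return (uintptr_t)bounce_buf;
}

static void bounce_unmap(uintptr_t dma_addr, void *cpu_addr, size_t len)
{
    /* Copy back whatever the "device" wrote into the bounce buffer. */
    memcpy(cpu_addr, (void *)dma_addr, len);
}

const struct toy_dma_ops toy_direct_ops = { .map = direct_map, .unmap = direct_unmap };
const struct toy_dma_ops toy_bounce_ops = { .map = bounce_map, .unmap = bounce_unmap };

/* Callers use these wrappers and never see which back end is in use. */
uintptr_t toy_dma_map(struct toy_device *dev, void *buf, size_t len)
{
    return dev->ops->map(buf, len);
}

void toy_dma_unmap(struct toy_device *dev, uintptr_t handle, void *buf, size_t len)
{
    dev->ops->unmap(handle, buf, len);
}
```

A driver written against `toy_dma_map()` and `toy_dma_unmap()` would work unchanged with either table, which is the property that makes a common framework attractive across architectures.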

The second most important change was the addition of the Contiguous Memory Allocator (CMA). It is in 3.5, but is still marked as experimental. It has been tested on some systems, and Szyprowski hopes that it will stabilize over the next kernel cycle or so.

Lastly, a bunch of new attributes for DMA operations have been added. These are mostly for improving performance and to "avoid some hacks", Szyprowski said. For upcoming releases, he would like to work on better support for declaring coherent areas.

For 3.5, there was work to remove some of the limits on DMA, in particular the 2MB limit on mappings. The fixed-size coherent area has been replaced with memory from vmalloc(). That can't be done in atomic context, however, so there is a small pre-allocation for use in that context. For some devices that buffer was too small, so the size has been made platform-dependent. The IOMMU implementation had no support for an atomic buffer at all, but patches have been posted recently, which he hopes to get into 3.6.
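The split between a general allocation path and a small pre-allocated reserve for atomic context can be sketched in user-space C. This is only an illustration under stated assumptions: `toy_coherent_alloc()`, the `atomic` flag, and the 4096-byte pool size are all invented here, `malloc()` merely stands in for an allocation path that may sleep, and the pool is a simple bump allocator that never frees.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool size; the article says the real size is platform dependent. */
#define TOY_ATOMIC_POOL_SIZE 4096

static unsigned char toy_pool[TOY_ATOMIC_POOL_SIZE];
static size_t toy_pool_used;

/*
 * Allocate "coherent" memory. In non-atomic context use the general
 * allocator (malloc() here stands in for a path that may sleep). In
 * atomic context only the pre-allocated pool may be used, so a request
 * fails once the pool is exhausted.
 */
void *toy_coherent_alloc(size_t size, bool atomic)
{
    if (!atomic)
        return malloc(size);

    /* Round the request up to a multiple of 16 bytes. */
    size_t need = (size + 15) & ~(size_t)15;
    if (need < size || need > TOY_ATOMIC_POOL_SIZE - toy_pool_used)
        return NULL;            /* overflow, or the pool is too small */

    void *p = toy_pool + toy_pool_used;
    toy_pool_used += need;
    return p;
}
```

With these numbers, an atomic request for 4000 bytes succeeds and a following atomic request for 200 bytes fails, which mirrors the "buffer was too small" problem that led to a platform-dependent size.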

The IOMMU code is not particularly ARM-specific, Szyprowski said; it could be used for other architectures. There is a bit more work to isolate the common code and make it generic, but he would need to coordinate that work with the other architectures. Arnd Bergmann suggested just moving the code to a generic place, but leaving it turned off for other architectures. That would allow others interested to turn it on and try it out.

Bergmann noted that when CMA was proposed a year and a half ago, it was envisioned that it would be unconditionally built for all v6 and v7 platforms. But that would make all recent ARM architectures depend on an experimental feature, so he suggested that it might be time to turn off the experimental designation.

There are still some issues that need to be resolved before that can happen, Szyprowski said. There are cases where the allocation can fail because of different accounting between movable and non-movable regions. But Mel Gorman strongly recommended building CMA by default, since the problems just result in an allocation failure and do not cause a full system failure. He suggested making CMA the default with a fall-back to the old code if it fails. That way people will start using the feature, potentially see fall-back warnings, and help fix the problems. If it stays as an experimental feature, he fears that no one will actually use and test CMA.
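The "default with a fall-back and a warning" pattern Gorman described can be sketched generically. This is not kernel code: `toy_alloc_with_fallback()`, the allocator function type, and the two example allocators are hypothetical names chosen for illustration, and the warning simply goes to stderr.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical allocator signature: returns NULL on failure. */
typedef void *(*toy_alloc_fn)(size_t size);

/* Counts how often the fall-back path was taken, to make failures visible. */
unsigned long toy_fallback_count;

/*
 * Try the new (primary) allocator first. If it fails, warn so that the
 * failure gets noticed and reported, then fall back to the old allocator.
 */
void *toy_alloc_with_fallback(toy_alloc_fn primary, toy_alloc_fn fallback,
                              size_t size)
{
    void *p = primary(size);
    if (p)
        return p;

    toy_fallback_count++;
    fprintf(stderr, "toy: primary allocator failed for %zu bytes, "
                    "using fall-back\n", size);
    return fallback(size);
}

/* Example allocators: one that always fails and one backed by malloc(). */
void *toy_failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

void *toy_legacy_alloc(size_t size)
{
    return malloc(size);
}
```

The caller still gets memory when the new path fails, while the warning and the counter leave a trail that users can report, which is the testing benefit argued for in the discussion.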

Bergmann thought that any platform using a boot time reservation of memory (i.e. a "carve out") should be forced into using CMA. One of the problems with that idea is that some of the carve-outs are not upstream because they are for out-of-tree graphics hardware. In addition, the vendors are moving on and are no longer interested in adding features or updating their drivers to use a new feature like CMA.

Noting that there are multiple ways to do carve-outs, Gorman also suggested creating a core carve-out API for code consolidation. It could provide memory that is isolated or DMA-able, for example, so that all of the carve-outs in the kernel could use it. CMA could underlie that API, and it could implement the fall-back until CMA shakes out.

Fragmentation within CMA regions was mentioned as a concern. While Gorman didn't think it all that likely to happen in practice, some noted that there were already problems when using memory regions for OpenGL. User space actions can cause significant fragmentation in that case. Szyprowski suggested using separate CMA regions as a way to reduce the problem.

CMA still needs work to support highmem; there is no reason that it needs to be restricted to lowmem. Szyprowski hopes to get some time to work on that in the future. Wiring up CMA to x86 DMA is another thing that he plans to work on.




KS2012: ARM: DMA mapping

Posted Sep 6, 2012 18:18 UTC (Thu) by grundler (guest, #23450) [Link]

"The IOMMU code is not particularly ARM-specific, Szyprowski said; it could be used for other architectures."

My $0.02: Having worked on 4 different IOMMUs, I believe *in general* the IOMMU code is mostly chip specific, as well as possibly arch specific. I have not looked at the IOMMU code for ARM platforms (yet) and those might easily share most of the code within the ARM platforms.

The problem is that, for good performance, each IOMMU implements an IO TLB (Translation Look-aside Buffer). Behaviors for prefetching or replacing entries in the TLB will depend on the specific chip. This in turn will dictate the "optimal" algorithm for allocating DMA mappings to specific devices. For examples, see ccio-dma.c and sba_iommu.c. James Bottomley refactored them so they share some code, but not the allocation policy.

Maintaining coherency between the CPU (updates to the IO Page Directory) and the IOMMU is "arch dependent", depending on the CPU cache coherency protocol (e.g. VIVT vs PIPT). The IOMMU normally uses a "physical address" to reference the IO Pdir while the CPU uses a virtual address. Optimal solutions to making this work will depend on the CPU architecture.

cheers,
grant


Copyright © 2012, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds