
DMA issues, part 2

Last week's Kernel Page looked at various DMA-related issues. One of those was the ability to make use of memory located on I/O controllers for DMA operations. That work has taken a step forward with this proposal from James Bottomley, which adds a new function to the DMA API:

    int dma_declare_coherent_memory(struct device *dev, 
                                    dma_addr_t bus_addr,
                                    dma_addr_t device_addr, 
                                    size_t size, int flags);

This function tells the DMA code about a chunk of memory available on the device represented by dev. The memory is size bytes long; it is located at bus_addr from the bus's point of view, and device_addr from the device's perspective. The flags argument describes how the memory is to be used: whether it should be mapped into the kernel's address space, whether children of the device can use it, and whether it should be the only memory used by the device(s) for DMA.
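A driver for a card with on-board memory might wire this up in its probe routine roughly as follows. This is a sketch only: the mem_bus_addr, mem_device_addr, and mem_size values stand in for whatever the driver reads from the device's configuration, and the flag names (DMA_MEMORY_MAP, DMA_MEMORY_EXCLUSIVE) are assumptions based on the proposal which could change before the patch is merged:

    /*
     * Sketch only: the address/size values and flag names are
     * placeholders; the final interface may differ.
     */
    if (!dma_declare_coherent_memory(dev, mem_bus_addr, mem_device_addr,
                                     mem_size,
                                     DMA_MEMORY_MAP | DMA_MEMORY_EXCLUSIVE))
            return -ENOMEM;  /* or fall back to ordinary system memory */

    /* Subsequent coherent allocations for this device can then be
     * satisfied from the on-board memory. */
    buf = dma_alloc_coherent(dev, buf_size, &dma_handle, GFP_KERNEL);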

The actual patch implementing this API is still in the works. As of this writing, there have been no real comments on it.

Meanwhile, a different DMA issue has been raised by the folks at nVidia, who are trying to make their hardware work better on Intel's em64t (AMD64 clone) architecture. It is, it turns out, difficult to reliably use DMA on devices which cannot handle 64-bit addresses.

Memory on (non-NUMA) Linux systems has traditionally been divided into three zones. ZONE_DMA is the bottom 16MB; it is the only memory which is accessible to ancient ISA peripherals and, perhaps, a few old PCI cards which are simply a repackaging of ISA chipsets. ZONE_NORMAL is all of the memory, outside of ZONE_DMA, which is directly accessible to the kernel. On a typical 32-bit Linux system, ZONE_NORMAL extends up to just under the first 1GB of physical memory. Finally, ZONE_HIGHMEM is the "high memory" zone - the area which is not directly accessible to the kernel.

This layout works reasonably well for DMA allocations on 32-bit systems. Truly limited peripherals use memory taken from ZONE_DMA; most of the rest work with ZONE_NORMAL memory. In the 64-bit world, however, things are a little different. There is no need for high memory on such systems, so ZONE_HIGHMEM simply does not exist, and ZONE_NORMAL contains everything above ZONE_DMA. Having (almost) all of main memory contained within ZONE_NORMAL simplifies a lot of things.

Kernel memory allocations specify (implicitly or explicitly) the zone from which the memory is to be obtained. On 32-bit systems, the DMA code can simply specify a zone which matches the capabilities of the device and get the memory it needs. On 64-bit systems, however, the memory zones no longer align with the limitations of particular devices. So there is no way for the DMA layer to request memory fitting its needs. The only exception is ZONE_DMA, which is far more restrictive than necessary.
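On a 32-bit system, for example, a driver whose device can only reach the bottom 16MB can simply add GFP_DMA to its allocation flags; a minimal illustration:

    /* GFP_DMA restricts the allocation to ZONE_DMA (the bottom 16MB),
     * so the buffer is reachable by an old ISA-style device. */
    buf = kmalloc(buf_size, GFP_KERNEL | GFP_DMA);
    if (!buf)
            return -ENOMEM;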

On some architectures - notably AMD's x86_64 - an I/O memory management unit (IOMMU) is provided. This unit remaps addresses between the peripheral bus and main memory; it can make any region of physical memory appear to exist in an area accessible by the device. Systems equipped with an IOMMU thus have no problems allocating DMA memory - any memory will do. Unfortunately, when Intel created its variant of the x86_64 architecture, it decided to leave the IOMMU out. So devices running on "Intel inside" systems work directly with physical memory addresses, and, as a result, the more limited devices out there cannot access all of physical memory. And, as we have seen, the kernel has trouble allocating memory which meets their special needs.

One solution to this problem could be the creation of a new zone, ZONE_BIGDMA, say, which would represent memory reachable with 32-bit addresses. Nobody much likes this approach, however; it involves making core memory management changes to deal with the shortcomings of a few processors. Balancing memory use between zones is a perennial memory management headache, and adding more zones can only make things worse. There is one other problem as well: some devices have strange DMA limitations (a maximum of 29 bits, for example); creating a zone which would work for all of them would not be easy.
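For reference, drivers already describe such limits to the DMA layer with masks; a hypothetical device with the 29-bit limitation mentioned above would declare it roughly like this. The difficulty is that, on an IOMMU-less 64-bit system, no memory zone guarantees that allocations will actually fall under such a mask:

    /* Declare that this (hypothetical) device can only generate 29-bit
     * DMA addresses; dma_set_mask() returns nonzero if the platform
     * cannot accommodate that. */
    if (dma_set_mask(dev, 0x1fffffffULL))
            return -EIO;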

The Itanium architecture took a different approach, known as the "software I/O translation buffer" or "swiotlb." The swiotlb code simply allocates a large chunk of low memory early in the bootstrap process; this memory is then handed out in response to DMA allocation requests. In many cases, use of swiotlb memory involves the creation of "bounce buffers," where data is copied between the driver's buffer and the device-accessible swiotlb space. Memory used for the swiotlb is removed from the normal Linux memory management mechanism and is, thus, inaccessible for any use other than DMA buffers. For these reasons, the swiotlb is seen as, at best, inelegant.
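The bounce-buffer idea itself is simple; the following conceptual sketch shows the data movement involved. The bounce_alloc() and bounce_free() helpers are invented for illustration, and this is not the actual swiotlb code:

    /* Conceptual bounce buffer: the device only ever sees low memory;
     * the CPU copies data between that memory and the driver's buffer. */
    dma_addr_t bounce_map(void *driver_buf, size_t len, int to_device)
    {
            void *bounce = bounce_alloc(len);      /* low-memory pool */

            if (to_device)
                    memcpy(bounce, driver_buf, len);
            return virt_to_phys(bounce);           /* address handed to device */
    }

    void bounce_unmap(dma_addr_t handle, void *driver_buf, size_t len,
                      int from_device)
    {
            void *bounce = phys_to_virt(handle);

            if (from_device)
                    memcpy(driver_buf, bounce, len);
            bounce_free(bounce, len);
    }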

It is also, however, a solution which happens to work. The swiotlb can also accommodate devices with strange DMA masks by searching until it finds memory which fits. So the solution to the problem experienced by nVidia (and others) is likely to be a simple expansion of the swiotlb space. Carving a 128MB array out of main memory for full-time use as DMA buffers may seem like a shocking waste, but, if you have enough memory that you're having trouble with addresses requiring more than 32 bits, the cost of a larger swiotlb will be hard to notice.
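For the curious, the size of the pool can be requested on the kernel command line with the swiotlb= parameter, which counts slabs of memory rather than bytes; assuming the 2KB slab size used by the current code (an assumption worth checking against the kernel in use), a 128MB pool would be asked for with something like:

    swiotlb=65536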



DMA issues, part 2

Posted Jul 1, 2004 1:49 UTC (Thu) by simonl (subscriber, #13603)

Thanks for yet another interesting article.

I do not understand why opportunistic allocation would not work: if a device needs a DMA buffer within 2^32 (or 2^29 for that matter), just try to allocate memory, see if it falls within that range, and if not, free the memory again and return an error. Or maybe try to move some lowmem pages to highmem first.

I see the point with ZONE_DMA: the first 16 MB could be allocated for other purposes very quickly, perhaps even as locked pages. But is that really a problem with the entire 32-bit space?
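The opportunistic scheme described in this comment might look roughly like the sketch below; dev_mask stands in for the device's DMA mask, and there is no retry logic, so this is purely illustrative:

    /* Allocate pages, check whether they happen to fall under the
     * device's limit, and give them back if they do not. */
    struct page *page = alloc_pages(GFP_KERNEL, order);

    if (!page)
            return -ENOMEM;
    if (page_to_phys(page) + (PAGE_SIZE << order) > dev_mask + 1) {
            __free_pages(page, order);
            return -ENOMEM;        /* out of range; caller could retry */
    }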

DMA issues, part 2

Posted Jul 1, 2004 19:20 UTC (Thu) by bronson (subscriber, #4806)

Wow, has Intel totally lost it?

I knew that they silently copied AMD's 64-bit architecture when their own development efforts failed. But I didn't know until now that they did such a poor job of it!

Omitting the IOMMU just leads to the sort of addressing headaches that should have been left back in the early 90s. Did they really think it would increase their die size that much??

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds