By Jonathan Corbet
July 5, 2011
LWN recently looked (again) at the
contiguous
memory allocator (CMA) patch set; CMA is intended to provide large,
contiguous DMA buffers to drivers without requiring that memory be set
aside for that exclusive purpose. CMA was recently
reposted with the idea that it is nearly ready
for merging. There is a clear desire to see this code get at least into
the -mm tree, even if it is not yet quite ready for the mainline. Most
reviewers are pleased with CMA; it would seem that there are very few
roadblocks remaining. Except that, as it turns out, one big obstacle
remains.
Over the years, LWN has also looked at ARM's
special memory management
challenges. Recent ARM CPUs are, like those implementing other
architectures, becoming more complex in order to improve performance. So
ARM processors can now do speculative prefetching of memory contents in
surprising ways. This prefetching works well on cached memory, but should not
be used on memory that has been marked as uncached. An additional
complication comes from the fact that virtual memory systems
can have more than one mapping for a given range of memory, and caching is
a feature of the mapping, not the memory itself. So one might well wonder
what happens if different mappings have different caching attributes. On
recent ARM processor designs, what happens is officially undefined; in
practice, it can mean problems like corrupted memory, machine checks, or
simple hangs. As it happens, kernel developers normally go out of their
way to avoid that kind of behavior.
The current CMA mechanism is used as an allocator behind
dma_alloc_coherent(), which creates a cache-coherent DMA buffer.
In the absence of bus-snooping hardware that is able to notice when a DMA
transfer changes memory, "cache-coherent" is likely to mean simply
"uncached." So CMA must, on such systems, create an uncached range of
memory to hand back to the requesting driver. That is easily done, and all
should be well...at least, unless there happens to be another mapping to
the same memory with different caching attributes.
Unfortunately, conflicting mappings can come about easily on a Linux
system. One of the first things the kernel does as it boots is to create a
"linear mapping" which provides kernel-space virtual addresses for most or
all of the memory present in the system. The kernel cannot manipulate
memory directly without such a mapping; putting as much memory as
possible into a persistent mapping thus makes sense. On a 32-bit system,
just under 1GB of memory can be mapped this way (64-bit systems can always
map all of memory and will be able to do so for quite some time yet). This
kernel-mapped memory is called "low memory"; almost all allocations of
memory for the kernel's use come from the low memory area. Naturally, low
memory is mapped with caching enabled; to do otherwise would destroy the performance of
the system. If a region of low memory is turned into a DMA buffer with an
uncached mapping, the system will have two conflicting mappings for the
same memory and will have moved into "undefined behavior" territory.
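The conflict described above can be modeled in a few lines. What follows is a minimal sketch, not kernel code: the 768MB low-memory boundary, the assumption that physical memory starts at address zero, and the function names are all hypothetical and chosen only for illustration.

```python
# Illustrative model only. LOWMEM_END is a made-up boundary and physical
# memory is assumed to start at address 0; neither comes from a real kernel.
MB = 1024 * 1024
LOWMEM_END = 768 * MB  # hypothetical end of the cached linear mapping


def in_linear_mapping(phys_start, length):
    """True if any byte of [phys_start, phys_start + length) is low memory."""
    return length > 0 and phys_start < LOWMEM_END


def uncached_mapping_conflicts(phys_start, length):
    """An uncached DMA mapping conflicts when the same range is also
    reachable through the cached linear mapping."""
    return in_linear_mapping(phys_start, length)


low_buffer = (64 * MB, 4 * MB)    # lies inside low memory
high_buffer = (900 * MB, 4 * MB)  # lies above the low-memory boundary

print("low buffer conflicts: ", uncached_mapping_conflicts(*low_buffer))
print("high buffer conflicts:", uncached_mapping_conflicts(*high_buffer))
```

In this model a buffer carved out of low memory always ends up with two mappings that disagree about caching, while a buffer above the boundary has no linear mapping to conflict with.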
These conflicting mappings are the reason behind ARM maintainer Russell
King's strong opposition to the merging of
CMA in its current form. He believes that the code is unsafe on ARM
systems; it should not, he says, be merged until the mapping problem has
been solved.
The interesting thing is that the existing DMA API has the same problem on
ARM; dma_alloc_coherent() uses vanilla alloc_pages() to
obtain a buffer, then changes the caching attributes before giving the
buffer back to the caller. The addition of CMA does not make ARM's DMA API
any more or less safe than it was before; it just perpetuates an existing
problem.
Russell has a patch pending for the 3.1 kernel which addresses this problem
by setting aside a chunk of memory which is never mapped into the kernel's
address space. With this memory pool available, coherent DMA mappings can
be set up without endangering the operation of the system.
The whole reason CMA exists, though, is to provide large, contiguous
buffers without the need to set aside memory; Russell's approach thus
defeats the entire purpose. The pressures which have led to the creation
of CMA will not go away anytime soon, so it seems that another solution is
needed. Arnd Bergmann has outlined two
possibilities, neither of which is entirely pleasant:
- CMA could be changed to only allocate from the high memory zone. High
memory is (by definition) not in the kernel's linear mapping, so no
other mappings should exist. The problem with this approach is that
it forces the use of high memory on all systems; ARM-based systems are
reaching the point where some of them need high memory anyway, but
that need is not yet universal. Getting enough memory into the high
memory zone to be useful could require moving the boundary and
shrinking low memory; that is not desirable because low memory is
often a limiting resource already. Even if that obstacle can be
overcome, the ARM architecture poses
unique challenges which would make a high memory implementation
hard.
- Memory that has been turned into a coherent DMA buffer could simply be
removed from the kernel's linear mapping until the buffer is no longer
needed. This approach seems simple until one remembers that the
kernel uses huge pages for the linear mapping. Splitting those huge
pages into smaller pages would increase translation lookaside buffer
(TLB) contention, reducing the performance of the system as a whole.
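The cost of the second option can be estimated with some simple arithmetic. The sketch below assumes 1MB huge mappings and 4KB small pages; those sizes are an assumption made for illustration (the discussion above does not give them), and split_cost() is a hypothetical helper, not a kernel function. It counts how many huge mappings a buffer only partially covers and how many small-page entries are then needed to keep the rest of those mappings in place.

```python
# Rough cost model; SECTION_SIZE and PAGE_SIZE are assumed values.
KB = 1024
MB = 1024 * KB
SECTION_SIZE = 1 * MB  # assumed size of one huge mapping
PAGE_SIZE = 4 * KB     # assumed size of one small page


def split_cost(start, length):
    """Return (huge mappings that must be split, small-page entries needed
    to keep the non-buffer part of those mappings in the linear mapping)."""
    if length <= 0 or start % PAGE_SIZE or length % PAGE_SIZE:
        raise ValueError("buffer must be non-empty and page-aligned")
    end = start + length
    first = start // SECTION_SIZE
    last = (end - 1) // SECTION_SIZE
    split_sections = 0
    small_pages = 0
    for index in range(first, last + 1):
        sec_start = index * SECTION_SIZE
        sec_end = sec_start + SECTION_SIZE
        overlap = min(end, sec_end) - max(start, sec_start)
        if overlap < SECTION_SIZE:
            # Only part of this huge mapping goes away; the remainder has
            # to be mapped with small pages instead.
            split_sections += 1
            small_pages += (SECTION_SIZE - overlap) // PAGE_SIZE
        # A huge mapping covered entirely by the buffer is simply removed.
    return split_sections, small_pages


print(split_cost(1536 * KB, 256 * KB))  # small buffer in the middle of a mapping
print(split_cost(4 * MB, 2 * MB))       # buffer aligned to huge-mapping boundaries
print(split_cost(512 * KB, 2 * MB))     # buffer straddling two boundaries
```

Under these assumptions, a 256KB buffer in the middle of a huge mapping replaces one entry with 192 small-page entries, while a buffer that starts and ends on huge-mapping boundaries splits nothing. That suggests alignment and placement of the DMA region would have a large effect on how much TLB pressure this approach adds.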
Compared to these alternatives, simply setting aside a chunk of memory at
boot time might not look like such a bad idea after all. CMA developer
Marek Szyprowski's plan appears to be to go
with the second of those two alternatives; he thinks that it can be done
without significantly hurting performance.
In truth, the best tradeoff will almost certainly differ from one platform
to the next. In some situations, memory will be tight enough that a
significant runtime penalty to avoid making static DMA buffers seems
worthwhile; on others, setting aside a bit of memory may not be a real
problem. So what may come of all this is a set of choices to be made
when configuring a kernel. At this time, no single solution that just
works for everybody appears to be on the horizon.