By Jonathan Corbet
May 28, 2008
Getting high-performance, three-dimensional graphics working under Linux is
quite a challenge even when the fundamental hardware programming
information is available. One component of this problem is memory
management: a graphics processor (GPU) is, essentially, a computer of its
own with a distinct view of memory. Managing the GPU's memory - and its
view of system RAM - must be done carefully if the resulting system is
intended to work at all, much less with acceptable performance.
Not that long ago, it appeared that this problem had been solved with the
translation table maps (TTM)
subsystem. TTM remains outside of the mainline kernel, though, as do
all drivers which use it. A recent query
about what would be required to get TTM merged led to an interesting
discussion where it turned out that, in fact, TTM may not be the future of
graphics memory management after all.
A number of complaints about TTM have been raised. Its API is far larger
than is needed for any free Linux driver; it has, in other words, a certain
amount of code dedicated to the needs of binary-only drivers. The fencing
mechanism (which manages concurrency between the host CPUs and the GPU) is
seen as being complex, difficult to work with, and not always yielding the
best performance. Heavy use of memory-mapped buffers can create performance
problems of its own. The TTM API is an exercise in trying to provide for
everything in all situations; as a result it is, according to some
driver developers, hard to match to
any specific hardware, hard to get started with, and still insufficiently
flexible. And, importantly, there is a
distinct shortage of working free drivers which use TTM. So Dave Airlie worries:
I was hoping that by now, one of the radeon or nouveau drivers
would have adopted TTM, or at least demoed something working using
it, this hasn't happened which worries me... The real question is
whether TTM suits the driver writers for use in Linux desktop and
embedded environments, and I think so far I'm not seeing enough
positive feedback from the desktop side
All of these worries would seem to be moot, since TTM is available and
there is nothing else out there. Except, as it turns out, there is
something out there: it's called the Graphics Execution Manager, or GEM.
The Intel-sponsored GEM project is all of one month old, as of this writing.
The GEM developers had not really intended to announce
their work quite yet, but the TTM discussion brought the issue to the fore.
Keith Packard's introduction to GEM includes a
document describing the API as it exists so far. There are a number of
significant differences in how GEM does things. To begin with, GEM
allocates graphical buffer objects using normal, anonymous, user-space
memory. That means that these buffers can be forced out to swap when
memory gets tight. There are clear advantages to this approach, and not
just in memory flexibility: it also makes the implementation of suspend and
resume easier by automatically providing backing store for all buffer
objects.
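As a rough illustration of what that looks like from user space, the sketch below creates a buffer object through a GEM-style allocation ioctl. The ioctl number, structure layout, and device path are assumptions made for this example, not the settled interface.

    /*
     * Illustrative sketch only: the ioctl number, structure layout, and
     * device node shown here are assumptions for this example, not the
     * settled GEM interface.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>

    struct gem_create {          /* hypothetical argument structure */
        uint64_t size;           /* in: requested object size, in bytes */
        uint32_t handle;         /* out: handle naming the new object */
    };

    #define GEM_IOCTL_CREATE _IOWR('d', 0x09, struct gem_create)  /* assumed */

    int main(void)
    {
        int fd = open("/dev/dri/card0", O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Ask for a 1MB buffer object; its backing store is ordinary
         * anonymous memory, so the pages can be swapped out under memory
         * pressure and provide backing store across suspend and resume. */
        struct gem_create create = { .size = 1024 * 1024 };
        if (ioctl(fd, GEM_IOCTL_CREATE, &create) < 0) {
            perror("GEM create");
            return 1;
        }
        printf("buffer object handle: %u\n", create.handle);

        close(fd);
        return 0;
    }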
The GEM API tries to do away with the mapping of buffers into user space.
That mapping is expensive to do and brings all sorts of interesting issues
with cache coherency between the CPU and GPU. So, instead, buffer objects
are accessed with simple read() and write() calls. Or,
at least, that's the way it would be if the GEM developers could attach a
file descriptor to each buffer object. The kernel, however, does not make
the management of that many file descriptors easy (yet), so the real API
uses separate handles for buffer objects and a series of ioctl()
calls.
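In the same illustrative spirit, moving data into a buffer object looks like a pwrite() operation expressed as an ioctl() on the object's handle; once again, the names and numbers below are invented for the example.

    /* Builds on the sketch above; names and numbers remain assumptions. */
    #include <stdint.h>
    #include <sys/ioctl.h>

    struct gem_pwrite {
        uint32_t handle;         /* buffer object to write into */
        uint64_t offset;         /* byte offset within the object */
        uint64_t size;           /* number of bytes to copy */
        uint64_t data_ptr;       /* user-space pointer to the source data */
    };

    #define GEM_IOCTL_PWRITE _IOW('d', 0x0a, struct gem_pwrite)  /* assumed */

    /* Copy data into a buffer object without mapping it; the kernel
     * takes care of whatever cache flushing the copy requires. */
    static int upload_data(int fd, uint32_t handle,
                           const void *data, uint64_t size)
    {
        struct gem_pwrite args = {
            .handle   = handle,
            .offset   = 0,
            .size     = size,
            .data_ptr = (uint64_t)(uintptr_t)data,
        };
        return ioctl(fd, GEM_IOCTL_PWRITE, &args);
    }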
That said, it is possible to map a buffer object into user space. But then
the user-space driver must take explicit responsibility for the management
of cache coherency. To that end there is a set of ioctl() calls
for managing the "domain" of a buffer; the domain, essentially, describes
which component of the system owns the buffer and is entitled to operate on
it. Changing the domains (there are two, one for read access and one for
writes) of a buffer will perform the necessary cache flushes. In a sense,
this mechanism resembles the streaming DMA API, where the ownership of DMA
buffers can be switched between the CPU and the peripheral controller.
That is not entirely surprising, as a very similar problem is being solved.
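A sketch of how such a domain change might be requested from user space follows; the domain flags and the ioctl definition are, as before, assumptions for illustration.

    /* Hypothetical domain-management interface; flag values and the
     * ioctl number are assumed for illustration. */
    #include <stdint.h>
    #include <sys/ioctl.h>

    #define GEM_DOMAIN_CPU  (1 << 0)   /* object is coherent for CPU access */
    #define GEM_DOMAIN_GPU  (1 << 1)   /* object is coherent for GPU access */

    struct gem_set_domain {
        uint32_t handle;         /* buffer object changing hands */
        uint32_t read_domains;   /* who will be reading the object */
        uint32_t write_domain;   /* who, if anyone, will be writing it */
    };

    #define GEM_IOCTL_SET_DOMAIN _IOW('d', 0x0b, struct gem_set_domain)  /* assumed */

    /* Before the CPU touches a mapped object, pull it into the CPU
     * domain; the kernel performs the needed cache flushes and waits
     * for any GPU work still using the object to complete. */
    static int prepare_cpu_access(int fd, uint32_t handle)
    {
        struct gem_set_domain args = {
            .handle       = handle,
            .read_domains = GEM_DOMAIN_CPU,
            .write_domain = GEM_DOMAIN_CPU,
        };
        return ioctl(fd, GEM_IOCTL_SET_DOMAIN, &args);
    }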
This API also does away with the need for explicit fence operations.
Instead, a CPU operation which requires access to a buffer will simply
wait, if necessary, for the GPU to finish any outstanding operations
involving that buffer.
Finally, the GEM API does not try to solve the entire problem; a number of
important operations (such as the execution of a set of GPU commands) are
left for the hardware-specific driver to implement. GEM is, thus, quite
specific to the needs of Intel's driver at this time; it does not try for
the same sort of generality that was a goal of TTM. As described by Eric Anholt:
The problem with TTM is that it's designed to expose one general
API for all hardware, when that's not what our drivers want...
We're trying to come at it from the other direction: Implement one
driver well. When someone else implements another driver and finds
that there's code that should be common, make it into a support
library and share it.
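To make that division of labor concrete: something like command submission would live behind a driver-private ioctl in the i915 driver rather than in GEM itself. The structure and ioctl below are invented to show the general shape of such an entry point, not an actual interface.

    /* Purely illustrative driver-specific submission interface; every
     * name, number, and field here is invented for this sketch. */
    #include <stdint.h>
    #include <sys/ioctl.h>

    struct i915_execbuffer {
        uint64_t batch_ptr;      /* user pointer to the GPU command batch */
        uint32_t batch_len;      /* length of the batch, in bytes */
        uint32_t buffer_count;   /* number of GEM handles referenced */
        uint64_t buffers_ptr;    /* user pointer to the array of handles */
    };

    #define I915_IOCTL_EXECBUFFER _IOW('d', 0x0c, struct i915_execbuffer)  /* assumed */

    /* Hand a command batch, plus the buffer objects it references, to
     * the hardware-specific driver; GEM manages the objects, the driver
     * knows how to execute the commands against them. */
    static int submit_batch(int fd, const void *batch, uint32_t len,
                            const uint32_t *handles, uint32_t count)
    {
        struct i915_execbuffer args = {
            .batch_ptr    = (uint64_t)(uintptr_t)batch,
            .batch_len    = len,
            .buffer_count = count,
            .buffers_ptr  = (uint64_t)(uintptr_t)handles,
        };
        return ioctl(fd, I915_IOCTL_EXECBUFFER, &args);
    }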
The advantage to this approach is that it makes it relatively easy to
create something which works well with Intel drivers. And that may well be
a good start; one working set of drivers is better than none. On the other
hand, that means that a significant amount of work may be required to get
GEM to the point where it can support drivers for other hardware. There
seem to be two points of view on how that might be done: (1) add
capabilities to GEM when needed by other drivers, or (2) have each
driver use its own memory manager.
The first approach is, in many ways, more pleasing. But it implies that
the GEM API could change significantly over time. And that, in turn, could
delay the merging of the whole thing; the GEM API is exported to user
space, and, as a result, must remain compatible as things change. So there
may be resistance to a quick merge of an API which looks like it may yet
have to evolve for some time.
The second approach, instead, is best described by Dave Airlie:
Well the thing is I can't believe we don't know enough to do this
in some way generically, but maybe the TTM vs GEM thing proves its
not possible. So we can then punt to having one memory manager per
driver, but I suspect this will be a maintenance nightmare, so if
people decide this is the way forward, I'm happy to see it
happen. However the person submitting the memory manager n+1 must
damn well be willing to stand behind the interface until time ends,
and explain why they couldn't re-use 1..n memory managers.
One other remaining issue is performance. Keith Whitwell posted some benchmark results showing that the
i915 driver performs significantly worse with either TTM or GEM than
without. Keith Packard gets different
results, though; his tests show that the GEM-based driver is significantly
faster. Clearly there is a need for a set of consistent benchmarks;
performance of graphics drivers is important, but performance cannot be
optimized if it cannot be reliably measured.
The use of anonymous memory also raises some performance concerns: a
first-person shooter game will not provide the same experience if its
blood-and-gore textures must be continually paged in. Anonymous memory can
also be high memory, and, thus, not necessarily accessible via a 32-bit
pointer. Some GPU hardware cannot address high memory; that will likely
force the use of bounce buffers within the kernel. In the end, GEM will
have to prove that it can deliver good performance; GEM's developers are
highly motivated to make their hardware look good, so there is a reasonable
chance that things will work out on this front.
The conclusion to draw from all of this is that the GPU memory management
problem cannot yet be considered solved. GEM might eventually become that
solution, but it is a very new API which still needs a fair amount of
work; this area will be worth watching for some time yet.
(Thanks to Timo Jyrinki for suggesting this topic.)