By Jonathan Corbet
January 23, 2009
For years, linux.conf.au has been one of the best places to go to catch up
with the state of the X Window System; the 2009 event was no exception.
There was a big difference this time around, though. X talks have
typically been all about the great changes which are coming in the near
future. This time, the X developers had a different story: most of
those great changes are done and will soon be heading toward a distribution near
you.
Keith Packard's talk started with that theme. When he spoke at
LCA2008, there were a few missing features in X.org. Small things like
composited three-dimensional graphics, monitor hotplugging, shared
graphical objects, kernel-based mode setting, and kernel-based
two-dimensional drawing. One of the main things holding all of that work
back was the lack of a memory manager which could work with the graphics
processor (GPU). It was, Keith said, much like programming everything in
early Fortran; doing things with memory was painful.
That problem is history;
X now has a kernel-based memory management system. It can be used
to allocate persistent objects which are shared between the CPU and the
GPU. Since graphical objects are persistent, applications no longer need to make backup
copies of everything; these objects will not disappear. Objects have
globally-visible names, which, among other things, allows them to be shared
between applications. They can even be shared between different APIs, with
objects being transformed between various types (image, texture, etc.) as
needed. It looks, in fact, an awful lot like a filesystem; there may
eventually be a virtual filesystem interface to these objects.
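The idea of persistent, globally named objects can be pictured with a small toy model. What follows is only a sketch in plain Python: the ObjectStore class and its create() and open() methods are hypothetical names invented for illustration, not the GEM interface. It shows one "application" creating an object and a second one reaching the same contents knowing nothing but the name.

```python
# Toy model only: class and method names are hypothetical, not the GEM API.
import itertools


class ObjectStore:
    """Stands in for the kernel-side manager that owns all buffer objects."""

    def __init__(self):
        self._next_name = itertools.count(1)
        self._objects = {}  # global name -> bytearray backing store

    def create(self, size):
        """Allocate a persistent object and return its global name."""
        name = next(self._next_name)
        self._objects[name] = bytearray(size)
        return name

    def open(self, name):
        """Look up an existing object by its global name."""
        return self._objects[name]


store = ObjectStore()

# "Application A" creates a small object and fills it in.
name = store.create(4)
store.open(name)[:] = b"\x01\x02\x03\x04"

# "Application B" only needs the name to see the same contents.
shared = store.open(name)
```

Because the store, rather than either application, owns the object, nothing is lost when one of them stops using it; that is the property which makes backup copies unnecessary.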
This memory manager is, of course, the graphics execution manager,
or GEM. It is new code; the developers first started talking about the
need to start over with a new memory manager in March, 2008. The first
implementation was posted in April, and the code was merged for the 2.6.28
kernel, released in December. In the process, the GEM developers dropped a
lot of generality; they essentially abandoned the task of supporting BSD
systems, for example ("sorry about that," says Keith). They also limited
support to some Intel hardware, at least for now. After seeing attempts at large,
general solutions fail, the GEM developers decided to focus on getting one
thing working, and to generalize thereafter. There is work in progress to
get GEM working with ATI chipsets, but that project will not be done for a
little while yet.
GEM is built around the shmfs filesystem code; much of the fundamental
object allocation is done there. That part is easy; the biggest hassle
turns out to be in the area of cache management. Even on Intel hardware,
which is alleged to be fully cache-coherent, there are caching issues which
arise when dealing with the GPU. Moving data between caches is very
expensive, so caching must be managed with great care. This is a task they
had assumed would be hard. "Unfortunately," says Keith, "we were right."
One fundamental design feature of GEM is the use of global names for
graphical objects. Unlike previous APIs, GEM does not deal with physical
addresses of objects in its API. That allows the kernel to move things
around as needed; as a result, every application can work with the
assumption that it has access to the full GPU memory aperture. Graphical
objects, in turn, are referenced by "batch buffers," which contain
sequences of operations for the GPU. The batch buffer is the fundamental
scheduling unit used by GEM; by allowing multiple applications to schedule
batch buffers for execution, the GEM developers hope to be able to take
advantage of the parallelism of the GPU.
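A rough sketch may help show why naming objects, rather than addressing them, leaves the kernel free to move things around. The Aperture class below is a hypothetical toy, not GEM code, and it uses strings as object names purely for readability. Batches refer to objects by name only, and names are turned into offsets at the moment a batch runs.

```python
# Toy model only: all names here are hypothetical, not the GEM API.
from collections import deque


class Aperture:
    """Maps global object names to offsets that the 'kernel' may change."""

    def __init__(self):
        self.offsets = {}     # name -> current offset in the aperture
        self.queue = deque()  # batch buffers waiting to run, in order

    def place(self, name, offset):
        """Bind (or move) an object to an offset; clients never see this."""
        self.offsets[name] = offset

    def submit(self, batch):
        """Queue a batch: a list of (operation, object name) pairs."""
        self.queue.append(batch)

    def run_next(self):
        """Resolve names to offsets only at execution time."""
        batch = self.queue.popleft()
        return [(op, self.offsets[name]) for op, name in batch]


gpu = Aperture()
gpu.place("texture", 0x1000)
gpu.place("target", 0x8000)

# Two clients queue work that mentions objects only by name.
gpu.submit([("read", "texture"), ("write", "target")])
gpu.submit([("read", "texture")])

first = gpu.run_next()
gpu.place("texture", 0x4000)  # the object is moved between batches
second = gpu.run_next()
```

The second batch was queued before the move, yet it still runs against the object's new location; a client that had stored a physical address would have been left pointing at stale memory.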
GEM replaces the "balkanized" memory management found in earlier APIs.
Persistent objects eliminate a number of annoyances, such as the dumping of
textures at every task switch. What is also gone is the allocation of the
entire memory aperture at startup time; memory is now allocated as needed.
And lots of data copying has been taken out. All told, it is a much
cleaner and better-performing solution than its predecessors.
Getting this code into the kernel was a classic example of working well
with the community. The developers took pains to post their code early,
then they listened to the comments which came back. In the process of
responding to reviews, they were able to make some internal kernel API
changes which made life easier. In general, they found, when you actively
engage the kernel community, making changes is easy.
The next step was the new DRI2 X extension, intended to replace the (now
legacy) DRI extension. It only has three requests, enabling connection to
the hardware and buffer allocation. The DRI shared memory area (and its
associated lock) have been removed, eliminating a whole class of problems.
Buffer management is all done in the X server; that makes life a lot easier.
Then, there is the kernel mode-setting (KMS) API - the other big missing
piece. KMS gets user-space applications out of the business of programming
the adapter directly, putting the kernel in control. The KMS code (merged
for 2.6.29) also implements the fbdev interface, meaning that graphics and
the console now share the same driver. Among other things, that will let
the kernel present a traceback when the system panics, even if X is
running. Fast user switching is another nice feature which falls out of
the KMS merge.
KMS also eliminates the need for the X server to run with root privileges,
which should help security-conscious Linux users sleep better at night.
The X server is a huge body of code which, as a rule, has never been
through a serious security audit. It's a lot better if that code can be
run in an unprivileged mode.
Finally, KMS holds out the promise of someday supporting non-graphical uses
of the GPU. See the GPGPU site for
information on the kinds of things people try to do once they see the GPU
as a more general-purpose coprocessor.
All is not yet perfect, naturally. Beyond its limited hardware support,
the new code also does not yet solve the longstanding "tearing" problem.
Tearing happens when an update is not coordinated with the monitor's
vertical refresh, causing half-updated screens. It is hard to solve
without stalling the GPU to wait for vertical refresh, an operation which
kills performance. So the X developers are looking at ways to
context-switch the GPU. Then buffer copies can be queued in the kernel and
caused to happen after the vertical refresh interrupt. It's a somewhat
hard problem, but, says Keith, it will be fixed soon.
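The queueing approach described here can be illustrated with a small simulation. This is a sketch under stated assumptions, not driver code: the Display class and its queue_copy() and vblank() methods are hypothetical, and the "interrupt" is just an ordinary method call. Updates are recorded without touching the visible buffer, then applied in order once the refresh arrives.

```python
# Toy model only: names are hypothetical and no real driver interface is used.
class Display:
    def __init__(self, front):
        self.front = list(front)  # what the monitor is scanning out
        self.pending = []         # copies waiting for vertical refresh

    def queue_copy(self, new_frame):
        """Record the update without touching the visible buffer."""
        self.pending.append(list(new_frame))

    def vblank(self):
        """Called from the (simulated) vertical refresh interrupt."""
        for frame in self.pending:
            self.front = frame    # apply each queued copy in order
        self.pending.clear()


display = Display(["old", "old"])
display.queue_copy(["new", "new"])
before = list(display.front)  # still the complete old frame
display.vblank()
after = list(display.front)   # now the complete new frame
```

At no point does the visible buffer hold a mix of old and new rows, and the caller that queued the copy did not have to wait for the refresh.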
There is reason to believe this promise. The X developers have managed to
create and merge a great deal of code over the course of the last year.
Keith's talk was something of a celebration; the multi-year process of
bringing X out of stagnation and into the 21st century is coming
to a close. That is certainly an achievement worth celebrating.
Postscript: Keith's talk concerned the video output aspect of the X
Window System, but an output-only system is not particularly interesting. The other
side of the equation - input - was addressed by Peter Hutterer in a
separate session. Much of the talk was dedicated to describing the current
state of affairs on the input side of X. Suffice it to say that it is a
complex collection of software modules which have been bolted on over the
years; see the diagram in the background of the picture to the
right.
What is more interesting is where things are going from here. A lot of
work is being done in this area, though, according to Peter, only a couple
of developers are doing it. Much of the classic
configuration-file magic has been superseded by HAL-based autoconfiguration
code. The complex sequence of events which follows the attachment of a
keyboard is being simplified. Various limits - on the number of buttons on
a device, for example - are being lifted. And, of course, the
multi-pointer X work (discussed
at LCA2008) is finding its way into the mainline X server and into
distributions.
The input side of X has received less attention, but it is
still an area which has been crying out for work for some time. Now that
work, too, is heading toward completion. For users of X (and that is
almost all of us), life is indeed getting better.