Editor's note: the following is the first in a two-part article on the
status of the Nouveau project. This installment is an introductory piece
describing the problem; the second part (to appear in one week) looks at
how Nouveau development is being done and its current status.
Nouveau is an effort to
create a complete open source driver for NVidia
graphics cards for X.org. It aims to support 2D and 3D acceleration from
the early NV04 cards up to the latest G80 cards, and to work across all
supported architectures, such as x86, x86-64 and PPC.
The project originated when Stéphane Marchesin set out to de-obfuscate parts
of the NVidia-maintained nv driver. However, NVidia had corporate policies
in place regarding the nv driver and had no plans to change them at the
time, so it refused Stéphane's patches.
This left Stéphane with the greatest open source choice:
"fork it"! At FOSDEM in February 2006, Stéphane unveiled his plans for an
open source driver for NVidia hardware called Nouveau. The name was
suggested by his IRC client's French autoreplace feature which suggested
the word "nouveau" when he typed "nv". People liked it, so the name
stuck. The FOSDEM presentation got the project enough publicity to engage
the curiosity of other developers.
Ben Skeggs was one of the first developers to sign up. He had worked on
reverse engineering the shader components of ATI's R300 chip and on writing
parts of the R300 driver; as a result, he had considerable experience with
graphics drivers. He initially showed interest only in the NV40 shaders, but
he got caught in the event horizon and has since worked on every aspect of
the driver for NV40 and later cards.
The project engaged other developers with short and long term interest. It
also generated a large amount of interest due to a pledge drive that an
independent user started.
However, the project was mainly developed on IRC and it was quite difficult
for newcomers to get any insight into previous development; reading
IRC logs is impractical at best. With this in mind, KoalaBR decided to
start summarizing development in a series of articles known as the TiNDC
(The irregular Nouveau Development Companion). This series of articles
proved very useful for attracting developers and testers to the
project. TiNDC issues are published every two to four weeks.
Linux.conf.au 2007 saw the first live demo of Nouveau. Dave Airlie had signed up to
give a talk on the subject; he managed to persuade Ben Skeggs that showing a
working glxgears demo would be a great finish to the talk. Ben toiled furiously
with the other developers to get the init code into shape for his laptop
card and the presentation was a great success.
After Nouveau missed out on a Google Summer of Code place, X.org granted
the project a Vacation of Code slot instead. This brought Arthur Huillet
onto the team to implement proper Xv support in Nouveau. Arthur saw the
light and stayed with the project once the VoC ended.
In autumn 2007, Stuart Bennett and Maarten Maathuis vowed to get Nouveau's
RandR 1.2 support into better shape. Since then, a steady stream of patches
has advanced the code greatly.
The project now has nine regular contributors (Stéphane Marchesin, Ben Skeggs,
Patrice Mandin, Arthur Huillet, Pekka Paalanen, Maarten Maathuis, Peter
Winters, Jeremy Kolb, Stuart Bennett) with many more part time
contributors, testers, writers and translators.
NVidia card families
This article will use the NVidia GPU technical names as opposed to marketing names.
| GPU name   | Product name(s)                      |
|------------|--------------------------------------|
| NV04/05    | Riva TNT, TNT2                       |
| NV1x       | GeForce 256, GeForce 2, GeForce 4 MX |
| NV2x       | GeForce 3, GeForce 4 Ti              |
| NV3x       | GeForce 5                            |
| NV4x (G7x) | GeForce 6, GeForce 7                 |
| NV5x (G8x) | GeForce 8                            |
Where both "N" and "G" names exist, the "N" variant (NV4x, NV5x) will be used.
Further information can be found on the Nouveau site.
Graphics Stack Overview
Before jumping into the Nouveau driver, this section provides a short
background on the mess that is the Linux graphics stack.
This stack has a long history dating back to Unix X
servers and the XFree86 project. This history has led to a situation quite unlike
the driver situation for any other device on a Linux system. The graphics
drivers existed mainly in user space, provided by the XFree86 project, and
little or no kernel interaction was required. The user-space component known
as the DDX (Device-Dependent X) was responsible for initializing the card,
setting modes and providing acceleration for 2D operations.
The kernel also provided framebuffer drivers on certain systems to allow a
usable console before X started. The interaction between these drivers
and the X.org drivers was very complex and often caused many problems
regarding which driver "owned" the hardware.
The DRI project was started to add support for direct rendering of 3D
applications on Linux. This meant that an application could talk to the 3D
hardware directly, bypassing the X server. OpenGL was the standard 3D API, but
it is a complex interface which is definitely too large to
implement in-kernel. GPUs also provided completely different low-level
interfaces. So, due to the complexity of the higher level interface and
nonstandard nature of the hardware APIs, a kernel component (DRM) and a
userspace driver (DRI) were required to securely expose the hardware interfaces
and provide the OpenGL API.
Shortcomings of the current architecture have been noted over the past few
years; the current belief is that GPU initialization, memory management,
and mode setting need to migrate to the kernel in order to provide better
support for features such as suspend/resume, proper cohabitation of X and
framebuffer drivers, kernel error reporting, and future graphics card
features.
The GPU memory manager implemented by Tungsten Graphics is known as TTM. It was originally designed as a
general VM memory manager but initially targeted at Intel hardware.
On top of this memory manager, a new modesetting architecture for the
kernel is being implemented. This is based on the RandR 1.2 work found in
the X.org server.
Graphics cards are programmed in numerous ways, but most initialization and
mode setting is done via memory-mapped IO. This is just a set of registers
accessible to the CPU via its standard memory address space. The registers
in this address space are split up into ranges dealing with various
features of the graphics card, such as mode setup, output control, or
clock control. A longer explanation can be found on Wikipedia.
Most recent GPUs also provide some sort of command processing ability where
tasks can be offloaded from the CPU to be executed on the GPU, reducing the
amount of CPU time required to execute graphical operations. This
interface is commonly a FIFO implemented as a circular ring buffer into which
commands are pushed by the CPU for processing by the GPU. It is
located somewhere in a shared memory area (AGP memory, PCIGART, or video
RAM). The GPU will also have a set of state information that is used to
process these commands, usually known as a context.
Most modern GPUs only contain a single command processing state
machine. However NVidia hardware has always contained multiple independent
"channels" which consist of a private FIFO (push buffer), a graphics
context and a number of context objects. The push buffer contains the
commands to be processed by the card. The graphics context stores
application specific data such as matrices, texture unit configuration,
blending setup, shader information etc. Each channel has 8 subchannels to
which graphics objects are bound in order to be addressed by FIFO commands.
Each NVidia card provides between 16 and 128 channels, depending on model;
these are assigned to different rendering-related tasks. Each 3D client has
an associated channel, while some are reserved for use in the kernel and
server. Channels are context-switched by software via an interrupt (on older
cards) or automatically by the hardware on cards after the NV30.
Now, what gets stored in the FIFO? Each NVidia card offers a set of
objects, each of which provide a set of methods related to a given task,
e.g. DMA memory transfers or rendering. Those methods are the ones used by
the driver (or on a higher level, the rendering application). Whenever a
client connects, it uses an ioctl() to create the channel. After that the
client creates the objects it needs via an additional ioctl().
Currently there are two possible clients: X (via the DDX driver) and
OpenGL (via DRI/Mesa). An accelerated framebuffer driver using the new
mode setting architecture (nouveaufb) will become a third client, avoiding
the current conflicts with nvidiafb.
Let's have a look at a small number of objects:
| Object name                | Description                                | Available on                  |
|----------------------------|--------------------------------------------|-------------------------------|
| NV_IMAGE_BLIT              | 2D engine, blits from one image to another | NV03 NV04 NV10 NV20           |
| NV12_IMAGE_BLIT            | An enhanced version of the above           | NV11 NV20                     |
| NV_MEMORY_TO_MEMORY_FORMAT | DMA memory transfer                        | NV04 NV10 NV20 NV30 NV40 NV50 |
From this list, you can see that there are object types which are
available on all cards (NV_MEMORY_TO_MEMORY_FORMAT) while others are only
available on certain cards. For example, each class of card has its own
3D-engine object, such as NV10TCL on NV1x and NV20TCL on NV2x. An object
is identified by a unique number: its "class". This ID is 0x5f for
NV_IMAGE_BLIT, 0x9f for NV12_IMAGE_BLIT, and 0x39 for
NV_MEMORY_TO_MEMORY_FORMAT. If you want to use functionality provided by a
given object, you must first bind this object to a subchannel. The card
provides a certain number of subchannels which correspond to a certain
number of "active" (or "bound") objects.
A command in the FIFO is made of a command header, followed by one or more
parameters. The command header usually contains the subchannel number, the
method offset to be called, and the number of parameters (a command header
can also define a jump in the FIFO but this is outside the scope of this
document). Each method the object provides has an offset, which is what
goes into the command header. In order to limit the number of command
headers to be written, thereby improving performance, NVidia cards will
invoke several consecutive methods from a single header if several
parameters are provided.
How do we refer to an object? The data written to the FIFO doesn't hold any
information about that. Binding an object to a subchannel is done by writing
the object ID as an argument to method number 0. For example: 00044000
5c00000c binds object id 5c00000c to subchannel 2. This object ID is used
as a key in a hash table kept in the card's memory, which is filled when objects are created.
The creation of an object relies on special memory areas.
RAMIN is "instance memory", an area of memory through which the graphics
engines of the card are configured. A RAMIN area is present on all NVIDIA
chipsets in some form, but it has evolved quite a bit as newer chipsets have
been released. Basically, RAMIN is what contains the objects. An object is
usually not big (128 bytes in general, up to a few kilobytes in the case of
DMA objects).

Card-specific RAMIN areas:
- An area of dedicated internal memory, accessible through the card's MMIO.
- A 16MiB PCI resource used to access PRAMIN. This resource maps over the
last 16MiB of VRAM. The first 1MiB of PRAMIN is also accessible through the
(now "legacy") MMIO PRAMIN aperture.
- A 32MiB PCI resource, which is unusable in the default power-on state of
the card. It can be configured in a variety of different ways through the
NV5x virtual memory. The legacy MMIO aperture can be re-mapped over any
desired 1MiB of VRAM.
There are also a few specific areas in RAMIN that are worth mentioning:
- RAMFC, the FIFO Context Table. It is a global table that stores the
configuration/state of the FIFO engine for each channel. It doesn't exist
in the same way on NV5x, where the FIFO has registers that contain pointers to each
channel's PFIFO state, rather than a single global table.
- RAMHT, the FIFO hash table. A global table, used by PFIFO to locate context
objects, except on NV5x, where each channel has its own hash table.
Additional information can be found on the Nv object
types and Honza Havlicek
pages on the Nouveau site.