LWN: Comments on "GEM v. TTM"

GEM v. TTM

quotemstr — Fri, 26 Dec 2008 01:57:37 +0000

Or, you know, fix whatever file descriptor infrastructure makes the management problematic.

GEM v. TTM

dibbles_you — Fri, 26 Dec 2008 01:13:14 +0000

"The GEM API tries to do away with the mapping of buffers into user space. That mapping is expensive to do and brings all sorts of interesting issues with cache coherency between the CPU and GPU. So, instead, buffer objects are accessed with simple read() and write() calls. Or, at least, that's the way it would be if the GEM developers could attach a file descriptor to each buffer object. The kernel, however, does not make the management of that many file descriptors easy (yet), so the real API uses separate handles for buffer objects and a series of ioctl() calls."

OK, so open/read/write/seek/mmap would be great if the kernel could efficiently handle that many objects. Fine, use ioctls to emulate this behavior, but shouldn't we be adding macros so that the calls look like open, read and write: gopen, gread and gwrite? Then, when the kernel is ready, it's a simple change to a "#define gwrite write"?
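A minimal sketch of what I mean, under loud assumptions: the gbuf_* functions and the GEM_HAS_FDS switch are made up for illustration and are not the real GEM interface. An in-memory table stands in for the ioctl() path so that the example runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>          /* ssize_t */

#define GBUF_MAX   8            /* hypothetical limits for this sketch */
#define GBUF_SIZE  4096

/* Hypothetical buffer object; stands in for whatever the driver tracks. */
struct gbuf {
    int           in_use;
    size_t        pos;          /* emulated file position */
    unsigned char data[GBUF_SIZE];
};

static struct gbuf gbuf_table[GBUF_MAX];

/* Map a handle to its buffer object, or NULL if the handle is invalid. */
static struct gbuf *gbuf_lookup(int h)
{
    if (h < 0 || h >= GBUF_MAX || !gbuf_table[h].in_use)
        return NULL;
    return &gbuf_table[h];
}

/* Stand-in for a "create object" ioctl: returns a handle, or -1. */
static int gbuf_create(void)
{
    for (int h = 0; h < GBUF_MAX; h++) {
        if (!gbuf_table[h].in_use) {
            memset(&gbuf_table[h], 0, sizeof gbuf_table[h]);
            gbuf_table[h].in_use = 1;
            return h;
        }
    }
    return -1;
}

/* Stand-in for a "write to object" ioctl; short write at end of buffer. */
static ssize_t gbuf_write(int h, const void *src, size_t len)
{
    struct gbuf *b = gbuf_lookup(h);
    if (!b)
        return -1;
    assert(b->pos <= GBUF_SIZE);
    if (len > GBUF_SIZE - b->pos)
        len = GBUF_SIZE - b->pos;
    memcpy(b->data + b->pos, src, len);
    b->pos += len;
    return (ssize_t)len;
}

/* Stand-in for a "read from object" ioctl; short read at end of buffer. */
static ssize_t gbuf_read(int h, void *dst, size_t len)
{
    struct gbuf *b = gbuf_lookup(h);
    if (!b)
        return -1;
    if (len > GBUF_SIZE - b->pos)
        len = GBUF_SIZE - b->pos;
    memcpy(dst, b->data + b->pos, len);
    b->pos += len;
    return (ssize_t)len;
}

/* Set the absolute position; returns the new position, or -1. */
static long gbuf_seek(int h, size_t off)
{
    struct gbuf *b = gbuf_lookup(h);
    if (!b || off > GBUF_SIZE)
        return -1;
    b->pos = off;
    return (long)off;
}

#ifdef GEM_HAS_FDS              /* hypothetical: kernel hands out real fds */
#include <unistd.h>
#define gread(h, buf, n)   read((h), (buf), (n))
#define gwrite(h, buf, n)  write((h), (buf), (n))
#define gseek(h, off)      lseek((h), (off), SEEK_SET)
/* gopen() is left out here: what it would open is not known yet. */
#else                           /* today: emulate on top of handles */
#define gopen()            gbuf_create()
#define gread(h, buf, n)   gbuf_read((h), (buf), (n))
#define gwrite(h, buf, n)  gbuf_write((h), (buf), (n))
#define gseek(h, off)      gbuf_seek((h), (off))
#endif
```

Callers would only ever spell gopen(), gwrite(), gseek() and gread(), so flipping the switch later would not touch them.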

Clarification on benchmark results

drag — Fri, 30 May 2008 20:40:10 +0000

This may be one of those situations where you just are not going to know the right way to do
it.

Like the Linux developers stumbling over themselves to deal with wireless drivers. First they
treated them as generic ethernet devices, which meant that each driver had to do way too much
work on its own.

Then Intel introduced their open source 802.11 stack. Unfortunately, it was not generic enough
to work with all sorts of different drivers.

Now they finally got the devicescape stuff fairly down so that it makes writing Linux wireless
drivers a sane thing to do.

Who knows? It may just be that an Intel video card and an Nvidia/ATI card are so different
architecturally that they simply can't be managed with the same API, and that maybe Nvidia and
ATI cards can be managed together or something like that.

How can you tell? The only two ways I can figure are to wait years and years and end up
with half-made drivers supporting obsolete hardware, or to just go for it, knowing it's going
to be a learning experience, and get something done quickly enough that it can actually benefit
end users.

Not a userland API?

drag — Fri, 30 May 2008 20:34:44 +0000

Open Source 3D drivers in Linux do their acceleration using userspace drivers. 

The *_dri.so drivers are loaded by the X server side of things, and the in-kernel DRM
stuff is what opens up a hole for those userspace drivers to interact with the kernel. The
Linux kernel is now slowly taking on additional duties to manage display modesetting and
memory management, which should help move X out of the root account and lead to better
display performance.

Even with very expensive cards there still isn't going to be enough memory on board to manage
a very large display with many applications open on a 3D desktop. So you're going to need
some intelligent way to deal with moving memory in and out of the video
card.

Not a userland API?

jhohm — Fri, 30 May 2008 18:17:34 +0000

I think TTM and GEM are not userland APIs, but in-kernel driver APIs; Linus's demand for
compatibility might not apply.

GEM v. TTM

jzbiciak — Fri, 30 May 2008 05:42:12 +0000

"The use of anonymous memory also raises some performance concerns: a first-person shooter game will not provide the same experience if its blood-and-gore textures must be continually paged in."

Ah, you brought back some ~~nightmares~~memories...

//
// Z_Malloc
// You can pass a NULL user if the tag is < PU_PURGELEVEL.
//
#define MINFRAGMENT             64


void*
Z_Malloc
( int           size,
  int           tag,
  void*         user )
{
    int         extra;
    memblock_t* start;
    memblock_t* rover;
    memblock_t* newblock;
    memblock_t* base;

    size = (size + 3) & ~3;

    // scan through the block list,
    // looking for the first free block
    // of sufficient size,
    // throwing out any purgable blocks along the way.

    // account for size of block header
    size += sizeof(memblock_t);

    // if there is a free block behind the rover,
    //  back up over them
    base = mainzone->rover;
....

DOOM had a zone allocator setup where you could allocate purgable blocks. If you ran out of space, it'd start purging blocks until there was room for the new allocation. Objects would register callbacks to handle being purged. :-)

The reason I remember it is that I had to hack around it when I made an embedded version of DOOM that directly memory mapped the WAD file rather than Z_Malloc'ing it. Finding all the places where WAD elements were being explicitly managed was no walk in the park. :-)
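For anyone who hasn't seen the idea, here is a rough sketch of a purgeable-block allocator. It is not DOOM's code: the pool_* names, the tag values and the byte budget are all made up for illustration, the blocks come from malloc() rather than a single zone, and instead of a callback the owner's pointer is simply set to NULL when its block is thrown out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TAG_STATIC     1        /* hypothetical tag: never purged */
#define TAG_PURGEABLE  100      /* hypothetical tag: may be thrown out */
#define POOL_BUDGET    1024     /* total bytes the pool may hand out */
#define POOL_SLOTS     16

struct block {
    void  *mem;                 /* NULL means the slot is free */
    size_t size;
    int    tag;
    void **owner;               /* cleared when the block is purged */
};

static struct block pool[POOL_SLOTS];
static size_t pool_used;

/* Release one block and tell its owner by clearing the owner's pointer. */
static void pool_release(struct block *b)
{
    if (b->owner)
        *b->owner = NULL;
    free(b->mem);
    pool_used -= b->size;
    b->mem = NULL;
    b->size = 0;
    b->tag = 0;
    b->owner = NULL;
}

/* Purge purgeable blocks, in slot order, until `need` bytes fit.
 * Returns 1 if the request now fits in the budget, 0 otherwise. */
static int pool_make_room(size_t need)
{
    for (int i = 0; i < POOL_SLOTS && pool_used + need > POOL_BUDGET; i++) {
        if (pool[i].mem && pool[i].tag >= TAG_PURGEABLE)
            pool_release(&pool[i]);
    }
    return pool_used + need <= POOL_BUDGET;
}

/* Allocate `size` bytes. A purgeable block must have an owner pointer,
 * otherwise nobody would find out that it went away. */
static void *pool_alloc(size_t size, int tag, void **owner)
{
    if (size == 0 || size > POOL_BUDGET)
        return NULL;
    if (tag >= TAG_PURGEABLE && owner == NULL)
        return NULL;
    if (!pool_make_room(size))
        return NULL;

    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!pool[i].mem) {
            void *m = malloc(size);
            if (!m)
                return NULL;
            pool[i] = (struct block){ m, size, tag, owner };
            pool_used += size;
            if (owner)
                *owner = m;
            return m;
        }
    }
    return NULL;                /* budget is fine but every slot is taken */
}
```

A caller that allocates a purgeable texture has to check its pointer before each use and reload the data when it finds NULL, which is exactly the sort of explicit management that was scattered all over the WAD handling.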

Re: Just merge something!

anholt — Thu, 29 May 2008 21:49:39 +0000

We have been told by Linus that we're not allowed to break userland API once the code gets
merged to the Linux kernel.  We've got mistakes made 8 years ago, and fixed in a better API 5
years ago, that we still have to implement because we're not allowed to break API.

It means that if you're unsure of maintaining an API today, you're really scared of merging it
and having to maintain it 5 years down the line when you've added better APIs and nobody in
their right mind is using the old software stack.

Just merge something!

nim-nim — Thu, 29 May 2008 08:56:58 +0000

I still remember seeing the first Utah GLX demos and thinking 3D was on its way to being solved.
What a fool I was.

After all these years I feel the GFX Linux developers suffer from a perpetual alpha mindset.
Stuff is started, advances enough to be used on some dev systems and be demoed at a few
conferences, then is declared "not good enough" and killed before it reaches most user systems
(because if actual users were exposed to it, they may file issues and demand that the result
is minimally maintained, and it's much more comfortable to just work on new prototypes after
new prototypes).

Other systems (wireless) have gone through several API rewrites in-tree while graphics
developers were still debating whether something should be merged at all. While the wireless
rewrites have been painful, they've been a lot less painful than having them happen
out-of-tree.

So please just merge something. If it needs to be rewritten it will be rewritten, and the
rewrite will be painful, but at least users will have something to use in the meanwhile, and
they won't have to fish for new alpha code all over the internet.

GEM v. TTM

dberkholz — Thu, 29 May 2008 04:51:59 +0000

It's more that the needs of embedded hardware only supported by binary-only drivers are
different.

GEM v. TTM

anholt — Wed, 28 May 2008 22:01:58 +0000

The early benchmarking is kind of unfortunate -- we just started writing this code, and have
needed to spend more time on correctness than performance so far.  I've still got issues on
the 965 to resolve.  But keithp put in changes last week that got another 16% performance
improvement on my 945 system with GEM.  I think we've got room for improvement on 915-class
still, and I know there's serious low-hanging fruit in 965 with GEM.

Right now, though, I care most about getting a solid user API that we can feel comfortable
putting into the kernel and maintaining for the foreseeable future.  The only issue I have with
the GEM API at the moment is the cache domain setting being general as opposed to a driver-specific
API.  So far when we try to make a general API describing some bit of hardware state with an
N-bit field, it seems some other driver developer says he needs about 4N bits.

GEM v. TTM

MisterIO — Wed, 28 May 2008 19:41:31 +0000

"The first approach is, in many ways, more pleasing. But it implies that the GEM API could
change significantly over time. And that, in turn, could delay the merging of the whole thing;
the GEM API is exported to user space, and, as a result, must remain compatible as things
change. So there may be resistance to a quick merge of an API which looks like it may yet have
to evolve for some time. "

Why? If it's not standardized anywhere, you could just label it experimental and actually try
it, before starting to say that it will remain compatible as things change.

GEM v. TTM

sylware — Wed, 28 May 2008 18:27:17 +0000

There is also OpenGL 3 in the pipeline. But OpenGL 3 is supposed to be quite high level, and I
have trouble imagining a modern and fast 3D engine without fine-grained control over the
video RAM.
Carmack said that id's next engine (number 5) will stream giant textures in order to render
outdoor landscapes. Of course you can do that with OpenGL interfaces, but common sense pushes
for a low-level video RAM management interface in order to make such an engine fast: will we
see low-level memory management appear as an OpenGL extension?
To make things harder, all GPU manufacturers have announced that hardware-accelerated
raytracing will arrive soon, and have started to provide APIs for GPU "general programming". The GPU
market is in high entropy and tension is rising. And that's not helping the design of the new
Linux graphics stack. Intel wants to become serious with GPUs... of course those who "saw"
Larrabee performing were "stunned". Better to wait and see it in a real-life context. And NVIDIA
suggested that in its next GPUs much of what was done on the CPU will be offloaded to the
GPU... and that's not pleasing Intel...

Clarification on benchmark results

keithw — Wed, 28 May 2008 17:55:15 +0000

Note that the benchmark results I posted don't exactly show what is claimed in the article.

In particular, the version of the driver labeled "i915tex" is the original TTM version of the
i915 driver and has good performance, while "master/ttm" is a newer one which seems to have
suffered some degree of performance regression relative to both i915tex and the original
non-ttm version...  at least in the couple of machines I've looked at...

To make things even more confusing, it seems that Keith Packard's testing may have revealed
yet another regression in the non-ttm versions of the driver, which I haven't had a chance to
dig into at this point.  

All this testing is pretty preliminary & hampered by lack of time & travel schedules, etc.
So, nobody really has all the answers.

Anyway, the biggest win at this point would be getting some sort of a memory manager interface
that everyone agrees on & can move forward with, *providing* that it doesn't encode design
decisions which preclude a properly performant implementation -- and I'm hopeful that's the
case.

Keith

No Bounce buffers

arjan — Wed, 28 May 2008 15:43:45 +0000

Since with shmfs you can set a DMA mask (effectively) on the inode, there's no need to use
bounce buffers... you just allocate the memory in the right place from the start.

GEM v. TTM

zooko — Wed, 28 May 2008 15:08:17 +0000

"A number of complaints about TTM have been raised. Its API is far larger than is needed for
any free Linux driver; it has, in other words, a certain amount of code dedicated to the needs
of binary-only drivers."

How are the needs of binary-only drivers different than the needs of open source drivers?  Is
TTM offering API pieces that are particularly useful for DRM or something like that?