
Memory part 2: CPU caches -- factual error

Posted Oct 2, 2007 4:00 UTC (Tue) by tshow (subscriber, #6411)
Parent article: Memory part 2: CPU caches

I know the article is mainly addressing desktop-style computers, but it is not correct to say that OS-controlled fast RAM is not viable. It may not make sense in your PC, but nearly every game console I've worked on in the past decade has had dedicated fast RAM, usually closely tied to the CPU and sometimes the graphics system, that was mapped directly into the address space of the machine. In most cases it was on-die with the processor.

On the PlayStation, PS2 and PSP, they call it the "scratchpad". Nintendo DS has similar hardware, as did the Dreamcast as I recall. The GameCube system RAM was divided into 8M of 1T-SRAM and 24M DRAM, with the assumption that slow-access DMA data (like PCM sound files) would reside in slow memory, while executable pages and data that required frequent access would be stored in fast memory.

One common technique is to compile your game to use the scratchpad for the stack. The performance improvements can be dramatic with some loads. Cycling data that will need to be randomly accessed through the scratchpad for processing (DMA it in, work on it, DMA it out) is another common use, especially in games where physics, collision, AI or rendering are processor intensive.
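The "DMA it in, work on it, DMA it out" cycle can be sketched in C roughly as follows. This is a minimal illustration and not code for any particular console: the `scratchpad` array stands in for on-chip memory that real hardware would map at a fixed address, `dma_in` and `dma_out` are hypothetical wrappers that use `memcpy` where a real system would program a DMA controller, and `SCRATCH_FLOATS` is an arbitrary size chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scratchpad capacity; real sizes are hardware-specific. */
#define SCRATCH_FLOATS 1024

/* Stand-in for on-chip fast RAM. On real hardware this would be a fixed,
 * address-mapped region rather than an ordinary static array. */
static float scratchpad[SCRATCH_FLOATS];

/* Stand-ins for DMA transfers (assumption: plain memcpy for illustration). */
static void dma_in(void *fast, const void *slow, size_t bytes)
{
    memcpy(fast, slow, bytes);
}

static void dma_out(void *slow, const void *fast, size_t bytes)
{
    memcpy(slow, fast, bytes);
}

typedef void (*chunk_fn)(float *items, size_t count);

/* Cycle `data` through the scratchpad one chunk at a time:
 * copy in, run the work function on the fast copy, copy back out. */
void process_through_scratchpad(float *data, size_t count, chunk_fn work)
{
    assert(data != NULL || count == 0);
    for (size_t done = 0; done < count; done += SCRATCH_FLOATS) {
        size_t n = count - done;
        if (n > SCRATCH_FLOATS)
            n = SCRATCH_FLOATS;
        dma_in(scratchpad, data + done, n * sizeof(float));
        work(scratchpad, n);
        dma_out(data + done, scratchpad, n * sizeof(float));
    }
}

/* Example workload: double every value in the chunk. */
void double_values(float *items, size_t count)
{
    for (size_t i = 0; i < count; i++)
        items[i] *= 2.0f;
}
```

Calling `process_through_scratchpad(values, n, double_values)` processes an array of any length in scratchpad-sized pieces. With a real asynchronous DMA engine one would typically split the scratchpad in two and overlap the transfer of the next chunk with work on the current one.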

You can argue that it only makes sense for certain workloads, but there are literally hundreds of millions of consoles out there built to this design; it is obviously viable.

One direction this design leads is the Cell processor design, wherein you have a series of co-processors with tightly-bound fast memory and slow/difficult access to system memory. Another is the AGP GART and its successors, or NUMA, for that matter. There are many variations of tightly-coupled vs. loosely-coupled memory out there in the wild, and many of them are address-space mapped rather than caches.

I suppose my major issue here is that the articles are turning out to be disappointingly PC-specific and CPU-centric. While CPU/memory interaction on PCs is certainly important, at least in game development it is only a piece of the puzzle. I'm hoping as we get past the introductory chapters the articles will begin to consider the lands beyond the north bridge and alternate system designs.



Memory part 2: CPU caches -- factual error

Posted Oct 2, 2007 20:58 UTC (Tue) by nix (subscriber, #2304)

Ulrich has been quite... emphatic in the past about not caring about them.
They're too rare, you see. (Too rare in the sense of, um, hugely
numerically dominating. I don't get it either.)

Memory part 2: CPU caches -- factual error

Posted Oct 11, 2007 7:47 UTC (Thu) by arcticwolf (guest, #8341)

Don't try to make sense of Drepper - doing so will only get you verbally abused. (Hmm, looking at the front page for this - well, last, for you - week, I wonder how many women there are in glibc development. Having lurked on the libc-alpha mailing list for a while, Drepper seems, to me, to be exhibiting a perfect example of the kind of attitude that we-the-community should try to get rid of as much as possible.)

Embedded is a special case

Posted Oct 4, 2007 3:07 UTC (Thu) by filker0 (guest, #31278)

Embedded systems are a special case and all of the rules change. Set-top boxes, such as game
machines, are special-purpose platforms. The general coding techniques used in a typical
application tend to be architecture- and platform-agnostic, whereas a game is written knowing
exactly what kind of hardware environment it's going to get. Embedded apps often manage their own cache,
too. I know the ones I'm working on right now do.

Embedded is a special case

Posted Oct 4, 2007 6:00 UTC (Thu) by tshow (subscriber, #6411)

> Embedded systems are a special case and all of the rules change.

That's fair enough, but there are an awful lot of game machines and embedded systems out there; more than there are PCs, if you count game systems, cellphones, PDAs, set-top boxes, the control systems in cars...

Our game engine deals with tightly-coupled address-mapped memory on all the platforms it supports; on platforms that don't actually have such memory (PCs, mostly), we fake it with a block of normal memory. We've built our engine as an OS (and support libraries) for games; the idea being that a game will compile on any platform that the engine supports with minimal resorting to #ifdef. You *can* write fast platform-agnostic game code that crosses (very different) platforms.
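As a rough illustration of faking tightly-coupled memory with a block of normal memory, a portable scratchpad interface might look like the sketch below. It is not the engine described above: `HAVE_HW_SCRATCHPAD`, `SCRATCH_BASE_ADDR`, `SCRATCH_SIZE`, `scratch_alloc` and `scratch_reset` are all hypothetical names, and the hardware branch assumes the platform build supplies the mapped address.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SCRATCH_SIZE (16u * 1024u)   /* hypothetical capacity in bytes */

#if defined(HAVE_HW_SCRATCHPAD)
/* Real fast RAM: SCRATCH_BASE_ADDR is assumed to come from the platform
 * build configuration (hypothetical macro). */
static unsigned char *const scratch_base =
    (unsigned char *)(uintptr_t)SCRATCH_BASE_ADDR;
#else
/* No such hardware (PCs, mostly): fake it with ordinary memory. */
static alignas(64) unsigned char fake_scratch[SCRATCH_SIZE];
static unsigned char *const scratch_base = fake_scratch;
#endif

static size_t scratch_used;   /* bytes handed out so far */

/* Bump-allocate `bytes` from the scratchpad. `align` must be a power of
 * two and is applied relative to the scratchpad base. Returns NULL when
 * the request does not fit. */
void *scratch_alloc(size_t bytes, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (scratch_used + align - 1) & ~(align - 1);
    if (start > SCRATCH_SIZE || bytes > SCRATCH_SIZE - start)
        return NULL;
    scratch_used = start + bytes;
    return scratch_base + start;
}

/* Release everything at once, e.g. at the end of a frame. */
void scratch_reset(void)
{
    scratch_used = 0;
}
```

Game code written against `scratch_alloc` and `scratch_reset` stays the same on every platform; only the few lines that select `scratch_base` differ, which keeps the `#ifdef` count low.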

A whole lot of the techniques that I'm sure this series of articles is going to delve into (walking memory in address order whenever possible; aligning data structures to (ideally) machine word size, (hopefully) cache line size or (at worst) hardware page size; keeping transitions across page boundaries to a minimum; recognizing that unrolling loops is no longer a good idea; strategies for preventing icache misses...) are just as applicable to embedded systems as they are to PCs. Arguably more so; caches on embedded systems and game systems tend to be significantly smaller than on PCs, so the cost of cache misses is that much higher.
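Two of those techniques, cache-line alignment and address-order traversal, can be sketched in C11 as below. The 64-byte `CACHE_LINE` value is an assumption (common on x86 PCs, often different on embedded parts), and `struct particle`, `sum_address_order` and `sum_strided` are made-up names for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache line size; check the target CPU's documentation. */
#define CACHE_LINE 64

/* Aligning the first member to the cache line size forces the whole
 * struct to start on a line boundary and pads its size to a multiple
 * of the line, so array elements never straddle two lines. */
struct particle {
    alignas(CACHE_LINE) float pos[4];
    float vel[4];
};

/* Walk a row-major grid in address order: the inner loop touches
 * consecutive addresses, so each fetched cache line is fully used. */
long sum_address_order(const int *grid, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += grid[r * cols + c];
    return total;
}

/* Same result, but the inner loop jumps `cols` elements each step.
 * On large grids this typically misses the cache far more often. */
long sum_strided(const int *grid, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += grid[r * cols + c];
    return total;
}
```

Both functions compute the same sum; the difference only shows up in timing, and how large it is depends on the grid size relative to the cache, which is why the effect tends to bite sooner on small embedded caches.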

With relatively little effort and a little discussion of the wider realms beyond the beige (or black, or possibly silvery; your mileage may vary) desktop space heater, this could be a significantly more useful treatise.

No disagreement here

Posted Oct 5, 2007 1:10 UTC (Fri) by filker0 (guest, #31278)

Embedded systems may outnumber general purpose PCs, but I doubt that any single platform
outnumbers them on its own. Also, far fewer programmers ever have a chance to program one.
Whether all programmers have to know how to deal with systems with 4 different types of RAM,
or demand-paged high-speed static RAM that is paged from a larger SDRAM, which in turn is paged
from NOR or NAND Flash by a separate microprocessor that implements a predictive pre-fetch,
is another question. Each platform is a special case.

A game engine such as the one you describe provides a virtual machine, and makes a heck of a
lot of sense. All you have to port, as you said, is the VM. (Not all VMs use byte-codes, after all.)

My current project (I'm the low-level platform guy) involves a lot of cache performance
optimization in the application level code -- aligning data on cache line boundaries, use of burst
DMA to do memory-to-memory transfers in parallel with continued code execution, and explicit
cache loading and flushes. But in our system, everything is deterministic (it has to be by the
rules of our industry). Determinism is extremely hard on a pipelined RISC architecture, and when
you add cache to the picture, it becomes almost impossible. In our case, though we need to
squeeze every drop of performance that we can, that comes second to it always taking the same
amount of time to do a specific operation.

Most programmers don't have to know the kind of cache details that game console and some
other embedded programmers (avionics, in my case) do. Still, I think it's good that more
programmers understand the concepts and techniques for improving cache performance in a
general multi-programming environment.

Memory part 2: CPU caches -- factual error

Posted Oct 5, 2007 21:16 UTC (Fri) by giraffedata (subscriber, #1954)

What the article is biased toward isn't desktop computers or PCs (and the latter is ambiguous; sometimes it means personal computer; other times it means architectures descended from the IBM PC). The bias is toward general purpose computers. Everything seems to be fully applicable to a typical web server, for example.
