LWN: Comments on "What every programmer should know about memory, Part 1"

What every programmer should know about memory, Part 1

farnz — Sun, 29 Apr 2018 12:59:35 +0000

Nobody has updated this article because, bar a few details, not a lot has changed. FSB is diagrams 2.1 and 2.2, while QPI/UPI is diagram 2.3. All that's changed is which systems fall into which diagram.

The same applies to the discussion of DRAM access details: while the numbers have changed, the differences are minor. DDR4 is a change from DDR3 in the same way that DDR3 is a change from DDR2, and FB-DRAM is now nearly gone from the market.

However, beyond these details, the underlying technology remains the same as it was back in 2007. The same applies to the later parts (caches etc.): the numbers have changed, but the technology and its behaviour are not significantly different.

What every programmer should know about memory, Part 1

quocbao — Sat, 28 Apr 2018 04:05:04 +0000

I think there must be a reason that nobody has done that in the more than ten years since this article was published. By the way, Wikipedia already has a good article about QPI.

What every programmer should know about memory, Part 1

neilbrown — Thu, 26 Apr 2018 21:37:47 +0000

Maybe you could be that someone?

What every programmer should know about memory, Part 1

quocbao — Thu, 26 Apr 2018 20:11:35 +0000

I hope someone will update this informative article, because many things have changed since it was written; for example, the FSB has been replaced by QPI/UPI links.

What every programmer should know about memory, Part 1

cncwebworld — Sat, 18 Nov 2017 11:23:57 +0000

A very well compiled article.

Still very useful and informative

zenk — Fri, 03 Apr 2015 07:17:59 +0000

I am surprised that, almost 8 years on, this article is still very useful and informative.
Nowadays performance is almost always related to memory performance, so the information and rationale are even more useful.
Thank you Ulrich and LWN!

What every programmer should know about memory, Part 1

RohitS5 — Mon, 06 Jan 2014 10:26:43 +0000

These are the kinds of details that make the difference between a good programmer and an average programmer. I am surprised to see how much more there is to learn in this space over time: memory, threading, processing, etc. Thank you.

100MHz × 64bit × 2 = 1,600MB/s ?

sgifford — Tue, 04 Dec 2007 04:52:47 +0000

I was confused by the same thing; throughout section 2.2.4 it wasn't clear to me whether MB/s
meant megabytes/second or megabits/second.  Usually this abbreviation means megabytes, but the
text implied that it was megabits.  Clearing this up would make that section much... err...
clearer.

Even with that, a great article!  Thanks!

The tools used

Ford_Prefect — Mon, 19 Nov 2007 07:46:07 +0000

Is the script publishable? Might save a lot of people a lot of trouble.

Hooray!

zooko — Sun, 11 Nov 2007 06:28:37 +0000

Argh -- I shouldn't post to LWN while sleepy.

While lecturing people about the value of using precise terminology, I accidentally wrote
"gigs" when I meant "teras".

If it had been gigs, the people in the example would have been only 7.5% off.

Sorry about that.

Hooray!

zooko — Sun, 11 Nov 2007 06:24:02 +0000

When we dealt with numbers in the thousands (10^3), approximating a kilo as 2^10 was only 2.5%
off.  Now that we routinely deal with numbers in the billions (10^9), approximating a giga as
2^30 is 7.5% off.  Some of us already deal with numbers in the trillions (10^12), and
approximating a tera as 2^40 is a full 10% off!
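
As a quick check of these figures, here is a minimal Python sketch. The function name is made up for illustration; it only compares 1024^n with 1000^n. The exact values come out slightly below the rounded numbers quoted above.

```python
def binary_vs_decimal_error(n):
    """Percentage by which 1024**n exceeds 1000**n (n=1 kilo, n=3 giga, n=4 tera)."""
    decimal = 1000 ** n
    binary = 1024 ** n
    return (binary - decimal) / decimal * 100

# Print the discrepancy for each prefix mentioned in the comment.
for name, n in [("kilo", 1), ("giga", 3), ("tera", 4)]:
    print(f"{name}: {binary_vs_decimal_error(n):.2f}% off")
```

The gap grows with every factor of a thousand, which is the whole point: the sloppier the prefix, the bigger the error at larger scales.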

Now if you do binary arithmetic in your head, so that when you see 14,463,188,475,466, you
instantly know that it is 13.2 * 2^40, then this comment doesn't apply to you.  But you don't.
When you see "14,463,188,475,466" you approximate it in your head as "14.5 gigs".  If you tell
someone else that you are looking at 14.5 gigs, and they think that you mean 14.5 2^40's, then
they are overestimating the number you are looking at by more than 10%!

See also:

http://en.wikipedia.org/wiki/SI_prefix

A "kilo" has meant 10^3 to the scientific world since 1795.  A "tera" has meant 10^12 since
1960.  Programmers' use of units is eventually going to have to become compatible with the
larger scientific world, not least because the numbers we deal with are getting bigger.

This is great!

nix — Thu, 11 Oct 2007 21:55:08 +0000

Warning: Ulrich has no patience at all with people who don't do their
homework (by, say, typing in `ulrich drepper home page' in Google).

It's <http://people.redhat.com/drepper/>.

This is great!

vaib — Thu, 11 Oct 2007 04:04:11 +0000

Can you please tell me where to find the author's homepage?

What every programmer should know about memory, Part 1

wookey — Wed, 10 Oct 2007 22:22:00 +0000

You have missed the bit that says the '1.066GHz' bus is 'quad-pumped', so it isn't really 1GHz at all: it is a quarter of that. Hence a ratio between clock speeds of about 11:1 rather than about 3:1. I just learned this from the above article (I had been taken in by marketers before and assumed that FSB speeds were real :-)
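
To make the arithmetic explicit, here is a small Python sketch using the two frequencies quoted from the article. The variable names are my own, and the factor of 4 is simply what "quad-pumped" implies.

```python
cpu_ghz = 2.933           # Core 2 core clock, as quoted from the article
fsb_marketing_ghz = 1.066 # advertised FSB figure (transfers, not clock cycles)
pump_factor = 4           # "quad-pumped": four transfers per bus clock cycle

# The real bus clock is a quarter of the advertised figure.
fsb_clock_ghz = fsb_marketing_ghz / pump_factor

naive_ratio = cpu_ghz / fsb_marketing_ghz  # what you get if you trust the marketing number
real_ratio = cpu_ghz / fsb_clock_ghz       # ratio against the actual bus clock

print(f"naive: {naive_ratio:.2f}:1, real: {real_ratio:.2f}:1")
```

The naive figure is the roughly 3:1 from the question; dividing by the real bus clock gives the article's 11:1.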

What every programmer should know about memory, Part 1

DonDiego — Wed, 10 Oct 2007 15:34:29 +0000

"An Intel Core 2 processor running at 2.933GHz and a 1.066GHz FSB have a clock ratio of 11:1 (note: the 1.066GHz bus is quad-pumped). Each stall of one cycle on the memory bus means a stall of 11 cycles for the processor."

11:1? I thought it was ~3:1, what have I missed there? It does not look like a typo...

What every programmer should know about memory, Part 1

njs — Fri, 05 Oct 2007 19:43:45 +0000

For the diagrams, see:
http://udrepper.livejournal.com/12663.html
http://udrepper.livejournal.com/12840.html

The tools used

corbet — Fri, 05 Oct 2007 15:51:25 +0000

All done in LaTeX and metapost. Conversion to HTML was done by a script I wrote after I gave up on all the more general LaTeX->HTML tools out there.

What every programmer should know about memory, Part 1

edmcman — Fri, 05 Oct 2007 15:33:40 +0000

A wonderful and professional article!

As a side note, does anyone know what this was written in, and perhaps what the diagrams were created in?

What every programmer should know about memory, Part 1

jschrod — Fri, 05 Oct 2007 11:18:16 +0000

Well, it depends where you study. At the TU Darmstadt, Germany, this stuff was part of our undergrad Computer Science courses, back in 1981ff. (I don't know the current curricula, though.)

FB-DIMM pins

anton — Thu, 04 Oct 2007 20:16:27 +0000

Actually only the interface at the memory controller is 69 pins (allowing more channels from one memory controller chip). The FB-DIMM needs these 69 pins, plus 69 pins to talk to the next FB-DIMM, plus additional pins for power and ground; that's why they have the familiar 240-pin form factor.

Hyperthreading performance

anton — Thu, 04 Oct 2007 20:10:22 +0000

"But I think the usual reason for hyperthreading slowdown is just the overhead of switching threads."

In SMT (and that includes hyperthreading), there is no thread switching overhead. The execution core just executes instructions from different contexts at the same time (but in different resources).

I don't know why the Pentium 4 variant of SMT performs as badly as it does; cache thrashing may contribute, but I don't think that this is the main reason. The main reasons are probably some obscure microarchitectural details, maybe the replay system, maybe something else.

What every programmer should know about memory, Part 1

tjrtech — Thu, 04 Oct 2007 16:44:19 +0000

Great article and good review. I learned all this as a formally educated computer engineer. This shows why computer engineers write faster code than comp sci or informally trained coders.

100MHz × 64bit × 2 = 1,600MB/s ?

pdfan — Thu, 04 Oct 2007 06:15:02 +0000

100MHz × 64bit × 2 = 1,600MB/s

bash-3.2# bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
100 * 64 * 2
12800

100 * 64 * 2 / 8
1600
quit
bash-3.2#
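
The same calculation with the units spelled out, as a minimal Python sketch (the function name and parameter names are invented for illustration): the product of clock, bus width, and transfers per cycle is in megabits per second, and dividing by 8 gives megabytes per second.

```python
def transfer_rate_mb_per_s(bus_mhz, bus_width_bits, transfers_per_cycle):
    """Peak transfer rate in megabytes per second (decimal mega)."""
    megabits_per_s = bus_mhz * bus_width_bits * transfers_per_cycle
    return megabits_per_s / 8  # 8 bits per byte

# 100MHz bus, 64 bits wide, two transfers per cycle (double data rate).
print(transfer_rate_mb_per_s(100, 64, 2))  # 1600.0
```

So the formula's right-hand side is in megabytes, provided the division by 8 is understood.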

What every programmer should know about memory, Part 1

wyrdwright — Wed, 03 Oct 2007 14:19:44 +0000

Excellent article; clear and concise without sacrificing too much detail. Looking forward to the rest.

Reader Comments

roelofs — Mon, 01 Oct 2007 22:22:32 +0000

Two more clarification-comments:

"Recent RAM types require two separate buses (or channels as they are called for DDR2, see Figure 2.8) which doubles the available bandwidth."

Unless I'm missing something fundamental, Figure 2.8 has nothing to do with DDR2 channels. Indeed, I don't believe the comment even refers to Figures 2.12 or 2.13; I see nothing relevant. Perhaps the figure in question was dropped at some point?

"In this example the SDRAM spits out one word per cycle."

Here and in several other places, the text is ambiguous. "Cycle" in this context apparently means clock cycle, but there's an implicit (larger) cycle measured from RAS to RAS (for example) that defines the overall throughput. Figure 2.8 actually shows four words going out in that larger cycle.

Greg

Grammar correction

valankar — Mon, 01 Oct 2007 03:47:14 +0000

"Implementing this is trivial: one only has the use the same column address for two DRAM cells and access them in parallel."

should be:

"Implementing this is trivial: one only has to use the same column address for two DRAM cells and access them in parallel."

What every programmer should know about memory, Part 1

k8to — Sun, 30 Sep 2007 20:27:02 +0000

Agreed. But knowing how a DRAM cell is implemented is more than one level of abstraction below low-level programming, and that is more than one level of abstraction below what "every" programmer will ever deal with.

on-board video cards

foom — Sat, 29 Sep 2007 23:06:05 +0000

"But I guess DVI designers decided the computer wants to update the picture by a full raster scan 60 times a second anyway, so there's no need for internal refresh. Doing a little reading just now, it looks like the DVI data stream is a simple raster scan. It even apparently has "blanking intervals," though they couldn't possibly be for the same purpose as on a CRT."

DVI's timing and blanking intervals are the same as VGA's. I believe it was designed this way to make the modification to the video cards easier, and to facilitate dual-output DVI / VGA video cards. (so the VGA port is basically just the DVI port with an extra D2A converter in the path.)

on-board video cards

giraffedata — Sat, 29 Sep 2007 21:49:38 +0000

It's not the physics, but the modernness that I think makes the refresh not necessary for LCDs at the level it is for CRTs: If I were designing a monitor in the 1990s out of parts that need to be refreshed (even a CRT), I would put the required refresh function inside, rather than pass the responsibility off to the computer. In SVGA days, though, it probably made sense to keep the monitor dumb.

But I guess DVI designers decided the computer wants to update the picture by a full raster scan 60 times a second anyway, so there's no need for internal refresh. Doing a little reading just now, it looks like the DVI data stream is a simple raster scan. It even apparently has "blanking intervals," though they couldn't possibly be for the same purpose as on a CRT.

Good article (so far)

filker0 — Fri, 28 Sep 2007 19:51:51 +0000

The article here is primarily about x86 type systems, assumes (I think) 64 bit multi-core CPUs, and also assumes a general computing environment. Extending this to other architectures might make it more useful (not that it's not a good overview so far; I learned a few things, and it's only 1/7th of the way through) to the folks this will matter the most to -- the embedded Linux programmer. Embedded programmers have more control over their environment than typical user-space programmers, and often need to tweak things to get rid of every wasted cycle possible.

Knowing better how the memory works, how it's connected to the rest of the system, and how software can be written to take this into account can lead to better performance. Applying this at the kernel level when organizing kernel data structures and code, as well as in the design of service code (DMA, paging, interrupt handlers, data streams/pipes, IPC, etc.), could lead to better system performance.

Thanks for the article. I look forward to the rest of the parts.

What every programmer should know about memory, Part 1

pm101 — Fri, 28 Sep 2007 15:56:56 +0000

Personally, I think it is useful to have a reasonable knowledge one level of abstraction down, and one level of abstraction up. There are a number of reasons for this:

You can often find optimizations that cross abstraction barriers. It is difficult to predict what you'll need to know to do this, so you really need to have a deep understanding of both layers. In some cases, you can also influence the hardware design.
You can predict how the technology may evolve.
You gain intellectual depth.

This is especially important for systems programmers -- the target of this article. If I'm designing a kernel, or a virtual machine (as in JVM, or .net runtime), or a high-performance systems library, I want to design it in such a way that it can take advantage of possible future underlying technologies.

Indeed, in many cases, I may even be able to influence underlying technologies. If I am aware of the circuit requirements of memory refresh, I can design code that explicitly leaves time for the refresh, while giving good bandwidth and latency when the memory is actually accessed. If something is a good idea, and a major OS or runtime can take advantage of it, you can bet that hardware designers somewhere will add support for it.

The major reason most CPUs only have 2-4 cores today, and didn't have multiple cores a while ago, was that software could not take advantage of them. Right now, optimum performance comes from about 64 cores at 700MHz each (the Tilera processor), but it can only be used in esoteric applications because software designers a decade ago were not aware of where the hardware is headed, and did not design applications, languages, or run-times in a parallelism-friendly way (programmer-friendly parallelism is only starting to happen today with languages like Fortress).

on-board video cards

pm101 — Fri, 28 Sep 2007 14:41:51 +0000

Are you sure?

My impression was that active matrix LCDs worked a lot like DRAM. They had a transistor and a capacitor for each pixel (the transistor was added in the move from passive to active matrix), but the voltage on that capacitor decayed and needed to be periodically refreshed. I was unaware of LCD displays having any on-board memory from which to do the refresh, but that could have been added while I wasn't following the market, although I'd be surprised, since it seems like it'd be an unnecessary cost item.

Another grammar fix

rmunn — Thu, 27 Sep 2007 21:41:06 +0000

I spotted another grammar oops. In section 2 ("Commodity Hardware Today"), seventh paragraph (the one immediately after the first bullet-point list), the last sentence reads: "This problem, therefore, must to be taken into account." That should be either "needs to be taken into account" or "must be taken into account."

This is, of course, an artifact of editing the paper, where "needs to be" was changed into "must be" at some point but the leftover "to" was missed.

What every programmer should know about memory, Part 1

Unleashed — Thu, 27 Sep 2007 20:06:41 +0000

Hey, this article and the others like it that you can find here just made me finally decide to subscribe! Bravo Ulrich!

What every programmer should know about memory, Part 1

jvestby — Thu, 27 Sep 2007 19:36:32 +0000

Excellent stuff.

I believe I have found a small typo in the formula just above figure 2.12.
Something like 133MHz is needed to get 1600.

Hooray!

dwheeler — Thu, 27 Sep 2007 14:42:56 +0000

I'm delighted to see this series, thanks for running it. It's always frustrated me that people who develop software often have no clue what's going on underneath, and as a result write hideous code. E.G., yes, it does matter what order you access matrices in. I presume this series will eventually get there.
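
A rough illustration of why the access order matters, as a Python sketch under assumed parameters (64-byte cache lines, 8-byte elements, a 64x64 matrix stored row by row; all names are hypothetical). It does not measure time; it only counts how often consecutive accesses land on a different cache line.

```python
# Assumed parameters, chosen only for illustration.
LINE_BYTES = 64       # assumed cache line size
ELEM_BYTES = 8        # assumed element size (e.g. a double)
ROWS, COLS = 64, 64   # matrix stored row by row in one flat buffer

def line_of(row, col):
    """Index of the cache line holding element (row, col) in row-major layout."""
    return (row * COLS + col) * ELEM_BYTES // LINE_BYTES

def line_changes(order):
    """Count how often consecutive accesses touch a different cache line."""
    changes, previous = 0, None
    for row, col in order:
        current = line_of(row, col)
        if current != previous:
            changes += 1
        previous = current
    return changes

# Inner loop over columns (walks memory sequentially) vs. inner loop over rows.
row_major = [(r, c) for r in range(ROWS) for c in range(COLS)]
col_major = [(r, c) for c in range(COLS) for r in range(ROWS)]

row_changes = line_changes(row_major)
col_changes = line_changes(col_major)
print(row_changes, col_changes)
```

With these assumptions the row-by-row walk moves to a new line once every eight elements, while the column-by-column walk moves to a new line on every single access. A real cache may still hold some of those lines, so this is an upper bound on the damage, not a benchmark.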

Also: let me say that I like the SI binary prefixes (GiByte, etc.); when computer memories were 48K, the difference between the binary and decimal prefixes didn't matter much, but as everything is getting bigger/faster, the differences are getting bigger too. When you're being imprecise, it doesn't matter, but when you want to be precise (e.g., when describing product specs or presenting a diagnostic report), I find them REALLY helpful. In some circumstances it's also the law: claiming your product does something, but not actually meeting your claims (because you used the wrong prefix) can actually get you hauled into court. There's a much bigger world beyond computing, and they already know what "Giga" means; it's 10^9.

What every programmer should know about memory, Part 1

tyhik — Thu, 27 Sep 2007 08:09:54 +0000

"... but none of them need be aware of the circuit design. None."

There are programmers out there who write memory controller configuration code for boot loaders. I have done it, and knowing the electrical design of memory cells really helped to answer simple questions like why the hell DRAM needs a configurable controller while the on-chip SRAM is nicely ready for use right after power-up.

Reader Comments

drepper — Wed, 26 Sep 2007 23:25:01 +0000

I appreciate (most of) the comments and have already made a few changes based on them to clarify a few things (and correct typos etc).

But I'm not going to reply to anything specific here and now. This is just section 2 (with 1 only being an introduction). Some of what has been discussed in comments goes far beyond what is in these sections. Once you've read section 6 you probably have a better understanding about what is covered and what isn't (and to some extent: why certain things are covered in the first place).

So, don't regard my silence as a sign of disinterest; it just means that many questions will automatically be answered later.

What every programmer should know about memory, Part 1

k8to — Wed, 26 Sep 2007 21:22:50 +0000

I cannot help but be a sourpuss and say: almost no programmer needs to know this about memory.

Some architects on software that needs good optimization should probably be acquainted with the performance characteristics discussed, but none of them need be aware of the circuit design. None.

What every programmer should know about memory, Part 1

dankamongmen — Wed, 26 Sep 2007 20:18:45 +0000

Ulrich's amazing, and the main source of my understanding of the modern glibc / linux-userspace API. Thanks again for such excellent code and attendant documentation!