Speeding up the page allocator

Posted Feb 26, 2009 21:24 UTC (Thu) by nix (subscriber, #2304)
In reply to: Speeding up the page allocator by bluefoxicy
Parent article: Speeding up the page allocator

The one that doesn't consume twice the memory bandwidth, i.e., not the

Uncached memory is *far* slower than CPUs, and cache is precious and

Speeding up the page allocator

Posted Feb 26, 2009 21:46 UTC (Thu) by jzbiciak (subscriber, #5246) [Link]

I came here to say pretty much the same thing. Instructions in general are waaaaay faster than memory, so caring about branch predictor performance on an "easy" case (in this case, a long-running memory fill loop) is just silly. Modern CPUs issue multiple instructions per cycle, and yet we still end up measuring run time in cycles per instruction, because memory is slooooooow.

I believe AMD recommended "rep stosd" for filling memory at one time. If you want to go faster still, I imagine there are SSE equivalents that store 128 or 256 bits at a go. (I haven't kept up with the latest SSE2 and SSE3. I focus on C6000-family TI DSPs.)
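A rough sketch of what such a wide-store fill could look like in C, using SSE2 intrinsics for 128-bit stores. The function name `zero_page_wide` and the 4096-byte `FILL_PAGE_SIZE` are made up for illustration, and this is not the kernel's code; whether it actually beats "rep stos" on a given CPU would have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>          /* SSE2 intrinsics */
#endif

#define FILL_PAGE_SIZE 4096u    /* assumed page size, for illustration only */

/* Zero one page, 16 bytes per store where SSE2 is available. */
static void zero_page_wide(void *page)
{
    /* _mm_store_si128 requires a 16-byte-aligned address. */
    assert(((uintptr_t)page % 16) == 0);
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    for (size_t i = 0; i < FILL_PAGE_SIZE / sizeof(__m128i); i++)
        _mm_store_si128(p + i, zero);
#else
    memset(page, 0, FILL_PAGE_SIZE);    /* portable fallback */
#endif
}
```

These are ordinary cached stores, so every line of the page still gets allocated in the cache as it is written.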

If you throw in "prefetch for write" instructions, you optimize the cache transfers too. I believe on AMD devices at least, it brings the line in already in the "M"odified state of the MOESI protocol, rather than fetching it "S"hared or "E"xclusive and upgrading on the first write. (In a traditional MESI, it seems like it'd pull the line to the "E"xclusive state.)
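To make the "prefetch for write" idea concrete, here is a small sketch using the GCC/Clang `__builtin_prefetch` builtin with its read/write argument set to 1. The names `fill_with_write_prefetch`, `CACHE_LINE`, and `PREFETCH_AHEAD` are hypothetical, and the 64-byte line size and 4-line distance are guesses, not tuned values. Whether the compiler emits a real write-prefetch instruction (rather than a plain prefetch) depends on the target and compiler flags.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_LINE     64u   /* assumed cache line size */
#define PREFETCH_AHEAD 4u    /* assumed distance in lines; tune by measuring */

/* Fill len bytes at dst with value, hinting upcoming lines for write. */
static void fill_with_write_prefetch(unsigned char *dst, size_t len, int value)
{
    size_t off = 0;
    for (; off + CACHE_LINE <= len; off += CACHE_LINE) {
#if defined(__GNUC__)
        size_t ahead = off + PREFETCH_AHEAD * CACHE_LINE;
        if (ahead < len)
            __builtin_prefetch(dst + ahead, 1 /* for write */, 3);
#endif
        memset(dst + off, value, CACHE_LINE);
    }
    if (off < len)
        memset(dst + off, value, len - off);   /* tail shorter than a line */
}
```

The prefetch is only a hint, so the result is the same with or without it; any gain shows up in timing, not in behavior.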

Speeding up the page allocator

Posted Feb 27, 2009 1:08 UTC (Fri) by jzbiciak (subscriber, #5246) [Link]

Ah, it appears AMD K7 and beyond go one better and have a "streaming store" that doesn't even do a cache allocate. Nice.
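For illustration, a streaming-store fill might look roughly like this in C. It uses the SSE2 non-temporal store intrinsic `_mm_stream_si128` as a stand-in, which is not the K7 MMX code from the kernel, and the name `zero_page_streaming` and the 4096-byte `STREAM_PAGE_SIZE` are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>            /* SSE2 intrinsics */
#endif

#define STREAM_PAGE_SIZE 4096u    /* assumed page size, for illustration only */

/* Zero one page with non-temporal stores that bypass cache allocation. */
static void zero_page_streaming(void *page)
{
    /* _mm_stream_si128 requires a 16-byte-aligned address. */
    assert(((uintptr_t)page % 16) == 0);
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    for (size_t i = 0; i < STREAM_PAGE_SIZE / sizeof(__m128i); i++)
        _mm_stream_si128(p + i, zero);
    _mm_sfence();   /* order the weakly-ordered stores before later stores */
#else
    memset(page, 0, STREAM_PAGE_SIZE);   /* portable fallback */
#endif
}
```

The tradeoff is that the freshly zeroed page is not in cache afterwards, which only helps if the caller isn't about to touch it right away.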

Here are the MMX and AMD optimized copies and fills the kernel currently uses. I can't imagine they'd settle for a crappy loop here, and it looks like some thought was put into these.

On regular x86, they do indeed use "rep stosl". (I guess the AT&T syntax spells it "stosl" instead of "stosd"?) See around line 92.

Rampant speculation is fun and all, but I suspect Arjan actually measured these. :-) (Or, at least the ones in the MMX file.)
