User: Password:
Subscribe / Log in / New account

Speeding up the page allocator

Speeding up the page allocator

Posted Feb 28, 2009 17:53 UTC (Sat) by anton (subscriber, #25547)
In reply to: Speeding up the page allocator by bluefoxicy
Parent article: Speeding up the page allocator

You'd copy because the whole "copy 4096 bytes" instruction is ONE instruction, "rep movsd"
And filling is also just one instruction: "rep stosd".

Concerning speed, this stuff is probably bandwidth-limited in the usual case (when the page has cooled down for a while), so the time for the in-core execution probably does not really matter. The branch in the looping version should be very well predictable. Hmm, I think it's more likely that "rep stosd" avoids the write-allocation cache-line reads than the looping version, and that would have an effect with the page being cold. If you want to know for certain, just measure it.

About using the DMA engine, I remember (but could not find last I looked) a posting (by IIRC Linus Torvalds) many years ago that compared the Linux approach of clearing on-demand with some other OS (BSD?) that cleared pages in the idle process or something (where it costs nothing in theory). In the bottom line (i.e., when measuring application performance) the Linux approach was faster, because the page was warm in the cache afterwards, and accesses to the page did not incur cache misses. This should still hold, even with clearing by a DMA engine.

(Log in to post comments)

Speeding up the page allocator

Posted Mar 5, 2009 8:18 UTC (Thu) by efexis (guest, #26355) [Link]

Plus you usually don't want empty pages sitting around doing nothing; it's a waste of memory. All "free" pages can instead be used as swap/filesystem cache. If a memory request comes in, then the page is yanked from the page cache, wiped, and given straight to whoever needs it. If the original contents (before it was yanked) are needed by the pagecache, it's merely faulted back in, pulling the data from its backing store, into a new location (and doesn't need pre-wiping for this purpose). Yes there is a case for fresh memory upon system boot, but as that only occures once per page, it's really not worth bogging the kernel down with code for.

Speeding up the page allocator

Posted Mar 5, 2009 8:37 UTC (Thu) by jzbiciak (subscriber, #5246) [Link]

Don't get hung up on the fact that "rep stosd" is one instruction. The number of instructions isn't what matters, it's the number of memory operations. The CPU expands "rep stosd" and "rep movsd" into lots of "mov" instructions under the hood. Talking about the number of instructions here is just talking about code size, not how fast it actually fills memory.

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds