Speeding up the page allocator
Posted Feb 28, 2009 17:53 UTC (Sat) by anton
In reply to: Speeding up the page allocator
Parent article: Speeding up the page allocator
> You'd copy because the whole "copy 4096 bytes" instruction is ONE instruction, "rep movsd"
And filling is also just one instruction: "rep stosd".
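To make the two instructions concrete, here is a minimal sketch in C with GCC-style inline assembly (AT&T syntax, where the mnemonics are spelled "rep stosl" and "rep movsl"). The names fill_page_rep, copy_page_rep, fill_page_loop and PAGE_BYTES are made up for illustration; this is not the kernel's code. It assumes a 4096-byte page and a clear direction flag, and falls back to plain loops on non-x86 targets.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096                 /* assumed page size */
#define PAGE_WORDS (PAGE_BYTES / 4)     /* number of 32-bit words per page */

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_REP_STRING 1
#else
#define HAVE_REP_STRING 0
#endif

/* Fill one page with a 32-bit value using a single "rep stosd". */
static void fill_page_rep(void *page, uint32_t value)
{
#if HAVE_REP_STRING
    void *dst = page;
    size_t count = PAGE_WORDS;
    /* EAX = value, (R|E)DI = destination, (R|E)CX = word count */
    __asm__ volatile ("rep stosl"
                      : "+D" (dst), "+c" (count)
                      : "a" (value)
                      : "memory");
#else
    uint32_t *p = page;                 /* portable fallback, not "rep stosd" */
    for (size_t i = 0; i < PAGE_WORDS; i++)
        p[i] = value;
#endif
}

/* Copy one page using a single "rep movsd". */
static void copy_page_rep(void *to, const void *from)
{
#if HAVE_REP_STRING
    void *dst = to;
    const void *src = from;
    size_t count = PAGE_WORDS;
    /* (R|E)SI = source, (R|E)DI = destination, (R|E)CX = word count */
    __asm__ volatile ("rep movsl"
                      : "+D" (dst), "+S" (src), "+c" (count)
                      :
                      : "memory");
#else
    uint32_t *d = to;                   /* portable fallback, not "rep movsd" */
    const uint32_t *s = from;
    for (size_t i = 0; i < PAGE_WORDS; i++)
        d[i] = s[i];
#endif
}

/* The looping version, for comparison. An optimizing compiler may
   turn this into a memset call or vector stores. */
static void fill_page_loop(void *page, uint32_t value)
{
    uint32_t *p = page;
    for (size_t i = 0; i < PAGE_WORDS; i++)
        p[i] = value;
}
```

Both fill variants leave the page in the same state; any difference is in how the stores reach memory, not in the result.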
Concerning speed, this stuff is probably bandwidth-limited in the
usual case (when the page has cooled down for a while), so the time
for the in-core execution probably does not really matter. The branch
in the looping version should be highly predictable. Hmm, I think
"rep stosd" is more likely than the looping version to avoid the
write-allocate cache-line reads, and that would have an
effect when the page is cold. If you want to know for certain, just
measure it.
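A measurement along these lines could be sketched as follows. This is only a rough user-space harness, not a kernel benchmark: the names clear_page_rep, clear_page_loop and time_clear are hypothetical, the page size is assumed to be 4096 bytes, and clock() measures CPU time at coarse resolution. On non-x86 targets the "rep" variant falls back to a plain loop, so the comparison is only meaningful on x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define PAGE_BYTES 4096                 /* assumed page size */
#define PAGE_WORDS (PAGE_BYTES / 4)

typedef void (*clear_fn)(void *page);

/* Clear one page with "rep stosd" (spelled "rep stosl" in AT&T syntax). */
static void clear_page_rep(void *page)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *dst = page;
    size_t count = PAGE_WORDS;
    __asm__ volatile ("rep stosl"
                      : "+D" (dst), "+c" (count)
                      : "a" (0)
                      : "memory");
#else
    uint32_t *p = page;                 /* fallback: same as the loop below */
    for (size_t i = 0; i < PAGE_WORDS; i++)
        p[i] = 0;
#endif
}

/* Clear one page with an explicit loop. Check the generated code:
   the compiler may replace this with memset or vector stores. */
static void clear_page_loop(void *page)
{
    uint32_t *p = page;
    for (size_t i = 0; i < PAGE_WORDS; i++)
        p[i] = 0;
}

/* Clear npages consecutive pages starting at buf with fn and return
   the CPU time used, in seconds. */
static double time_clear(clear_fn fn, unsigned char *buf, size_t npages)
{
    clock_t start = clock();
    for (size_t i = 0; i < npages; i++)
        fn(buf + i * PAGE_BYTES);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To approximate the cold-page case, pass a buffer much larger than the last-level cache (say, 64 MB or more), touch it once beforehand so the pages are faulted in, and repeat each run several times. What the numbers show will depend on the CPU.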
As for using the DMA engine, I remember (but could not find when I last
looked) a posting (by Linus Torvalds, IIRC) many years ago that
compared the Linux approach of clearing on demand with some other OS
(BSD?) that cleared pages in the idle process or something (where it
costs nothing in theory). The bottom line (i.e., when measuring
application performance) was that the Linux approach was faster, because the
page was warm in the cache afterwards, and accesses to the page did
not incur cache misses. This should still hold, even with clearing by
a DMA engine.