
Speeding up the page allocator

Speeding up the page allocator

Posted Feb 26, 2009 14:35 UTC (Thu) by (guest, #38022)
In reply to: Speeding up the page allocator by nix
Parent article: Speeding up the page allocator

Perhaps the right time to zero a page is when it is freed; a DMA operation could then tell the cache subsystem to use its local memory for something more interesting than blank pages (hoping that a dirty, never-accessed cache line is preferentially evicted, so the line can be reused for another address that needs caching).
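The idea can be sketched in plain C: a deferred-zeroing queue filled at free time, with a synchronous drain function standing in for the DMA engine. All names here (`free_page_deferred`, `process_zero_queue`) are hypothetical illustrations, not kernel APIs:

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define QUEUE_MAX 16

/* Pages freed but not yet zeroed; in the proposal a DMA engine
 * (here simulated by process_zero_queue) would drain this list
 * off the CPU's critical path. */
static void *zero_queue[QUEUE_MAX];
static int zero_queue_len;

static void free_page_deferred(void *page)
{
    if (zero_queue_len < QUEUE_MAX)
        zero_queue[zero_queue_len++] = page;   /* zero it later */
    else
        free(page);                            /* queue full: give it back now */
}

/* Stand-in for the DMA completion path: zero each queued page,
 * leaving it clean and ready for reuse. Returns pages processed. */
static int process_zero_queue(void)
{
    int done = 0;
    while (zero_queue_len > 0) {
        memset(zero_queue[--zero_queue_len], 0, PAGE_SIZE);
        done++;
    }
    return done;
}
```

The point of the split is that the `memset()` (or, in hardware, the DMA transfer) happens outside the allocation fast path.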


Speeding up the page allocator

Posted Feb 26, 2009 20:17 UTC (Thu) by bluefoxicy (guest, #25366) [Link]

Zeroing a page on free is incorrect. What if you free, allocate, don't use, free, allocate, don't use, free... kmalloc(), for example, allocates in units of 4k, 8k, 16k, 32k, 64k, 128k. A 65k allocation will bring in 15 untouched pages, which then get freed. The proper time to zero a page is just-in-time on read/write (which, coincidentally, is also the time to commit an allocation to a real physical memory page).
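The arithmetic behind that example can be checked with a small sketch (the pure power-of-two size classes are a simplification of what kmalloc() really offers; `size_class` and `untouched_pages` are illustrative names):

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL

/* Round a request up to the next power-of-two size class,
 * as a power-of-two allocator would. */
static size_t size_class(size_t req)
{
    size_t c = PAGE_SIZE;
    while (c < req)
        c <<= 1;
    return c;
}

/* Pages the caller never touches when it uses only `used` bytes
 * of a `req`-byte request. */
static size_t untouched_pages(size_t req, size_t used)
{
    size_t total   = size_class(req) / PAGE_SIZE;
    size_t touched = (used + PAGE_SIZE - 1) / PAGE_SIZE;
    return total - touched;
}
```

A 65k request rounds up to the 128k class (32 pages); using all 65k touches 17 of them, leaving 15 pages allocated but never written.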

Speeding up the page allocator

Posted Feb 27, 2009 1:21 UTC (Fri) by jzbiciak (subscriber, #5246) [Link]

While it's true that zeroing all freed pages is a bad idea, keeping a pool of freed pages that's refilled during idle periods isn't so crazy. I believe the Windows NT kernel does something along those lines. You do end up putting more code in the fast path to detect whether the "prezeroed pool" is non-empty, and it only applies to GFP_ZERO pages anyway, so I suspect it ends up not being a win under Linux.

Mel's patches bring a noticeable speedup at the benchmark level, and suggest to me that GFP_ZERO pages are not the most numerous allocations in the system. This makes intuitive sense: most allocations back other higher-level allocators in the kernel and/or provide buffer space that's about to be filled. There's no reason to zero it. Complicating those allocations for a minor speedup in GFP_ZERO allocations seems misplaced.
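A minimal user-space sketch of such a pool, with malloc() standing in for page allocation and showing the extra fast-path branch the comment mentions (`zero_pool` and friends are hypothetical names, not NT or Linux internals):

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define POOL_MAX  8

/* Pre-zeroed page pool, refilled when the system is otherwise idle. */
static void *zero_pool[POOL_MAX];
static int zero_pool_count;

/* Idle-time work: zero pages while there is nothing better to do. */
static void refill_zero_pool(void)
{
    while (zero_pool_count < POOL_MAX) {
        void *p = malloc(PAGE_SIZE);   /* stand-in for a page allocation */
        if (!p)
            return;
        memset(p, 0, PAGE_SIZE);       /* the "idle" zeroing */
        zero_pool[zero_pool_count++] = p;
    }
}

/* Fast path: one extra branch to check the pool, as the comment notes. */
static void *alloc_zeroed_page(void)
{
    if (zero_pool_count > 0)           /* pool hit: no memset needed */
        return zero_pool[--zero_pool_count];
    void *p = malloc(PAGE_SIZE);       /* pool miss: zero synchronously */
    if (p)
        memset(p, 0, PAGE_SIZE);
    return p;
}
```

The trade-off is exactly as described: every zeroed allocation pays for the pool check, while only the (relatively rare) GFP_ZERO-style allocations can benefit.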

Speeding up the page allocator

Posted Feb 27, 2009 10:26 UTC (Fri) by (guest, #38022) [Link]

Please note that I was talking more about DMA zeroing, which is nearly "free" in CPU time (in some tests I did on PPC, it was more than ten times faster than CPU zeroing - excluding dcbz, which, to be precise, cannot be used on uncached memory).
The big advantage is that it should also remove those cache lines from the CPU caches (levels 1, 2, and 3 if present) at free() time, so it should still win in the "free, allocate, don't use, free, allocate, don't use" case, because the allocated but unused memory is never even fetched into the cache and never dirtied in other processors' caches.
But it is probably more complex (a multiprocessor DMA semaphore, for instance), and for this kind of thing only testing can tell the truth - and that truth is only valid for the tested environment.

Speeding up the page allocator

Posted Feb 27, 2009 23:14 UTC (Fri) by nix (subscriber, #2304) [Link]

It's nearly free, but is it worth the complexity? How many pages are zeroed and then not used soon enough for them to still be in cache?

IIRC the zero page was removed from the kernel because zeroing pages was faster than doing pagetable tricks to share a single zero page. Pagetable manipulation is particularly expensive, but even so...

we have /dev/zero, why not use the hardware implementation?

Posted Mar 4, 2009 8:03 UTC (Wed) by xoddam (subscriber, #2322) [Link]

Yep. Zeroing memory before handing it to userspace is a *very* common operation (an absolute requirement for security reasons) and every memory controller worth its salt should supply an efficient way of doing so without trashing the CPU cache. If the Linux kernel doesn't already make use of it wherever it is available, that's madness.
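On x86 the closest software analogue to such a controller assist is non-temporal stores, which write zeros past the cache hierarchy instead of filling it with them. A sketch (SSE2, x86-only; the buffer must be 16-byte aligned and a multiple of 16 bytes long):

```c
#include <stddef.h>
#include <emmintrin.h>   /* SSE2: _mm_stream_si128, _mm_sfence */

/* Zero a buffer with non-temporal stores, which bypass the CPU
 * caches rather than evicting useful data to make room for zeros. */
static void zero_nontemporal(void *buf, size_t len)
{
    __m128i zero = _mm_setzero_si128();
    __m128i *p = buf;
    for (size_t i = 0; i < len / 16; i++)
        _mm_stream_si128(p + i, zero);
    _mm_sfence();        /* make the streamed stores globally visible */
}
```

This only avoids cache pollution on the zeroing side; a true memory-controller or DMA implementation would also avoid spending CPU cycles on the stores themselves.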

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds