Speeding up the page allocator

Posted Feb 26, 2009 20:17 UTC (Thu) by bluefoxicy (guest, #25366)
In reply to: Speeding up the page allocator by etienne_lorrain@yahoo.fr
Parent article: Speeding up the page allocator

Zeroing a page on free is the wrong approach. What if you free, allocate, don't use, free, allocate, don't use, free...? kmalloc(), for example, allocates in units of 4k, 8k, 16k, 32k, 64k, 128k, so a 65k allocation rounds up to 128k and brings in roughly 15 pages that are never touched before being freed again. The proper time to zero a page is just-in-time, on the first read/write (which, coincidentally, is also the time to commit an allocation to a real physical memory page).
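For illustration only, here is a toy user-space sketch of that zero-on-demand idea: pages freed into a pool are not cleared, and the memset happens only when an allocation that actually needs zeroed memory pulls one out. All the names (page_pool, pool_alloc, pool_free) are invented for this sketch and bear no relation to the kernel's real allocator.

/* Toy illustration of zero-on-demand vs. zero-on-free.
 * Every name here is made up; real kernel code looks nothing like this. */
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE  4096
#define POOL_PAGES 32

struct page_pool {
    void *page[POOL_PAGES];
    bool  dirty[POOL_PAGES];   /* needs zeroing before a want_zero caller sees it */
    int   count;               /* no bounds checking in this toy */
};

static void pool_free(struct page_pool *p, void *pg)
{
    /* Do NOT zero here: the page may be re-allocated and freed again
     * without ever being touched, wasting the memset and the cache. */
    p->page[p->count]  = pg;
    p->dirty[p->count] = true;
    p->count++;
}

static void *pool_alloc(struct page_pool *p, bool want_zero)
{
    if (p->count == 0)
        return calloc(1, PAGE_SIZE);        /* fresh pages start out zeroed */

    int i = --p->count;
    void *pg = p->page[i];

    /* Zero just-in-time, and only for callers that asked for it. */
    if (want_zero && p->dirty[i])
        memset(pg, 0, PAGE_SIZE);
    return pg;
}

int main(void)
{
    struct page_pool pool = { .count = 0 };

    void *a = pool_alloc(&pool, false);   /* caller will overwrite it anyway */
    pool_free(&pool, a);                  /* freed untouched: no memset spent */
    void *b = pool_alloc(&pool, true);    /* only now does the zeroing happen */
    printf("first byte after zero-on-demand: %d\n", ((unsigned char *)b)[0]);
    free(b);
    return 0;
}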



Speeding up the page allocator

Posted Feb 27, 2009 1:21 UTC (Fri) by jzbiciak (subscriber, #5246) [Link]

While it's true that zeroing all freed pages is a bad idea, keeping a pool of pre-zeroed pages that is refilled during idle periods isn't so crazy. I believe the Windows NT kernel does something along those lines. You do end up putting more code in the fast path to check whether the prezeroed pool is non-empty, and it only helps GFP_ZERO allocations anyway, so I suspect it ends up not being a win under Linux.

Mel's patches bring a noticeable speedup at the benchmark level, which suggests to me that GFP_ZERO pages are not the most numerous allocations in the system. That makes intuitive sense: most allocations back other, higher-level allocators in the kernel and/or provide buffer space that is about to be filled, so there is no reason to zero them. Complicating those allocations for a minor speedup on GFP_ZERO allocations seems misplaced.
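A rough sketch of the "prezeroed pool" scheme described above, with invented names throughout (prezeroed_head, alloc_zeroed_page, idle_refill) and no resemblance to the real NT or Linux code: a separate list of already-cleared pages is refilled from an idle/background context, and the allocation fast path pays only one extra branch to check whether the pool has something for a GFP_ZERO-style request.

#include <stdlib.h>

#define PAGE_SIZE 4096

/* The list node lives in the page itself; while a page sits in the pool
 * its first pointer-sized bytes hold the link, re-cleared on hand-out. */
struct zpage {
    struct zpage *next;
};

static struct zpage *prezeroed_head;   /* pages known to be all-zero */

static void *alloc_zeroed_page(void)
{
    /* The extra fast-path cost: one branch to see if the pool is non-empty. */
    if (prezeroed_head) {
        struct zpage *z = prezeroed_head;
        prezeroed_head = z->next;
        z->next = NULL;                /* restore the bytes used for the link */
        return z;                      /* already zero, no memset needed */
    }
    /* Slow path: nothing prezeroed, clear a page inline. */
    return calloc(1, PAGE_SIZE);
}

static void idle_refill(int n)
{
    /* Meant to run from an idle/background context, not at free() time. */
    while (n--) {
        struct zpage *z = calloc(1, PAGE_SIZE);
        if (!z)
            break;
        z->next = prezeroed_head;
        prezeroed_head = z;
    }
}

int main(void)
{
    idle_refill(4);                         /* "idle" work stocks the pool */
    unsigned char *p = alloc_zeroed_page(); /* fast path, pool hit */
    unsigned char *q = alloc_zeroed_page();
    int ok = (p[100] == 0) && (q[100] == 0);
    free(p);
    free(q);
    return ok ? 0 : 1;
}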

Speeding up the page allocator

Posted Feb 27, 2009 10:26 UTC (Fri) by etienne_lorrain@yahoo.fr (guest, #38022) [Link]

Please note that I was talking more about DMA zeroing, which is nearly "free" in CPU time (in some tests I did on PPC it was more than ten times faster than zeroing with the CPU, excluding dcbz, which, to be precise, cannot be used on uncached memory).
The big advantage is that it should also evict those cache lines from the CPU caches (L1, L2 and L3 if present) at free() time, so it should still win in the "free, allocate, don't use, free, allocate, don't use" case, because the allocated-but-unused memory is never fetched into the cache and never dirtied in any processor's cache.
But it is probably more complex (a multiprocessor DMA semaphore is needed), and for this kind of thing only testing can tell the truth, and that truth is only valid for the environment tested.
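A very loose user-space stand-in for that DMA-zeroing idea, using a worker thread in place of a DMA engine: freed pages are queued and cleared asynchronously, so the freeing CPU never touches their contents or drags them into its caches. Every name here is invented, and the mutex is a crude stand-in for the "multiprocessor DMA semaphore" cost mentioned above.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#define PAGE_SIZE 4096

struct zreq {
    struct zreq *next;
    void *page;
};

static struct zreq *zero_queue;                /* pages waiting to be cleared */
static pthread_mutex_t zq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  zq_cond = PTHREAD_COND_INITIALIZER;
static bool zq_stop;

/* The free() path: hand the page to the "engine" instead of memset()ing it,
 * so the freeing CPU never touches the data or pulls it into its cache. */
static void free_page_via_engine(void *page)
{
    struct zreq *r = malloc(sizeof(*r));
    if (!r)
        return;
    r->page = page;
    pthread_mutex_lock(&zq_lock);              /* the "DMA semaphore" cost */
    r->next = zero_queue;
    zero_queue = r;
    pthread_cond_signal(&zq_cond);
    pthread_mutex_unlock(&zq_lock);
}

/* Stand-in for the DMA engine: drains the queue and clears pages
 * asynchronously, off the allocator's hot path. */
static void *zero_engine(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&zq_lock);
        while (!zero_queue && !zq_stop)
            pthread_cond_wait(&zq_cond, &zq_lock);
        if (!zero_queue) {                     /* stopping and nothing queued */
            pthread_mutex_unlock(&zq_lock);
            return NULL;
        }
        struct zreq *r = zero_queue;
        zero_queue = r->next;
        pthread_mutex_unlock(&zq_lock);

        memset(r->page, 0, PAGE_SIZE);         /* done away from the freeing CPU */
        free(r->page);                         /* toy: just give it back to libc */
        free(r);
    }
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, zero_engine, NULL);

    free_page_via_engine(malloc(PAGE_SIZE));   /* "free" a dirty page */

    pthread_mutex_lock(&zq_lock);
    zq_stop = true;
    pthread_cond_signal(&zq_cond);
    pthread_mutex_unlock(&zq_lock);
    pthread_join(tid, NULL);
    return 0;
}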

Speeding up the page allocator

Posted Feb 27, 2009 23:14 UTC (Fri) by nix (subscriber, #2304) [Link]

It's nearly free, but is it worth the complexity? How many pages are zeroed and then not reused soon enough for the zeroed data still to be in cache?

IIRC the zero page was removed from the kernel because zeroing pages was faster than doing the page-table tricks needed to share a single zero page. Page-table manipulation is particularly expensive, but even so...
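A small experiment related to that shared-zero-page trick, on Linux: map a large anonymous region, fault every page in with reads, then with writes, and watch the resident set size from /proc/self/statm. On kernels that back anonymous read faults with a single shared zero page the reads cost almost no real pages, while the writes do; kernel behaviour here has changed over the years, which is exactly the trade-off being discussed, so treat the output as an observation about whatever kernel you run it on, not a specification.

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Resident pages of this process, read from /proc/self/statm (Linux only). */
static long resident_pages(void)
{
    long size = 0, resident = -1;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f) {
        if (fscanf(f, "%ld %ld", &size, &resident) != 2)
            resident = -1;
        fclose(f);
    }
    return resident;
}

int main(void)
{
    size_t len = 64UL * 1024 * 1024;            /* 64 MiB of anonymous memory */
    volatile unsigned char sink = 0;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    printf("after mmap  : %ld resident pages\n", resident_pages());

    for (size_t i = 0; i < len; i += 4096)      /* read-fault every page */
        sink += p[i];
    printf("after reads : %ld resident pages\n", resident_pages());

    memset(p, 1, len);                          /* write-fault every page */
    printf("after writes: %ld resident pages\n", resident_pages());

    munmap(p, len);
    return 0;
}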

