|From:||Nick Piggin <firstname.lastname@example.org>|
|To:||Andrew Morton <email@example.com>, Linus Torvalds <firstname.lastname@example.org>|
|Subject:||[RFC][PATCH 0/3] beat kswapd with the proverbial clue-bat|
|Date:||Sun, 05 Sep 2004 15:44:18 +1000|
|Cc:||Linux Memory Management <email@example.com>, linux-kernel <firstname.lastname@example.org>|
Kswapd is dumb as bricks when it comes to higher order allocations. Actually that's not quite fair: it is bad at lots of things... but higher order allocations are one of its more spectacular failures. The major problem that I can see is with !wait allocations, where you aren't allowed to free anything yourself - you're relying on kswapd (aside from that, it's always nice to avoid synchronous reclaim). Apparently these (higher-order && !wait) come up mainly in networking which is the thing I had in mind. *However* as I only have half of a gigabit network (ie. 1 card), I haven't done any testing where it really counts. I'm also seeing surprisingly few reports on lkml, so perhaps it is me that needs the beating? Anyway, the big failure case is when memory is fragmented to the point that pages_free > pages_low, but you still have no higher order pages left. In that case, your !wait allocations can keep calling wakeup_kswapd but he'll just keep sleeping. min_free_kbytes is not really a solution because it just raises pages_low. In a nutshell, that whole area doesn't really have any idea about higher order allocations. So my solution? Just teach kswapd and the watermark code about higher order allocations in a fairly simple way. If pages_low is (say), 1024KB, we now also require 512KB of order-1 and above pages, 256K of order-2 and up, 128K of order 3, etc. (perhaps we should stop at about order-3?) *Also*, if we have requested an order 5 allocation, but one isn't available, we'll get kswapd to try to free at least 1, even if its order-5 "free-until" watermark is 0KB. The main cost is keeping track of the number of free pages of each order. There is also a penalty in the allocator for order > 0 allocations, but I have tried to do it so lower order allocations need to do less work. Flames? Comments? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to email@example.com More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds