[patch] scheduler fix for 1cpu/node case
Hi,

after talking to several people at OLS about the current NUMA scheduler the conclusion was:
(1) it sucks (for particular workloads),
(2) on x86_64 (embarrassingly simple NUMA) it's useless, goto (1).

The fact is that the current separation of local and global balancing, where global balancing is done only in the timer interrupt at a fixed rate, is far too inflexible. A CPU going idle inside a well-balanced node will stay idle for a while even if there is a lot of work to do. In the corner case of one CPU per node this condemns that CPU to idleness for at least 5 ms. So x86_64 platforms (but not only those!) suffer and wish to switch off the NUMA scheduler while keeping NUMA memory management on.

The attached patch is a simple solution which
- solves the 1 CPU / node problem,
- lets other systems behave (almost) as before,
- opens the way to other optimisations like multi-level node hierarchies (by tuning the retry rate),
- simplifies the NUMA scheduler and deletes more lines of code than it adds.

The timer interrupt based global rebalancing might appear to be a simple and good idea, but it costs the scheduler a lot of flexibility. In the patch the global rebalancing is done after a certain number of failed attempts to balance locally. The number of attempts is proportional to the number of CPUs in the current node. With only 1 CPU in the current node the scheduler doesn't even try to balance locally; it wouldn't make sense anyway. Of course one could instead set IDLE_NODE_REBALANCE_TICK = IDLE_REBALANCE_TICK, but this is uglier (IMHO) and only helps when all nodes have 1 CPU / node.

Please consider this for inclusion.

Thanks,
Erich

[2. text/x-diff; 1cpufix-lb-2.6.0t1.patch]...
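The retry rule described in the post can be sketched roughly as follows. This is an illustration only, not code from the attached patch: the names `balance_state`, `retry_limit`, `next_balance_scope`, `record_balance_result` and the constant `RETRIES_PER_CPU` are hypothetical, and the proportionality factor of 1 is an assumption.

```c
#include <stdbool.h>

/* Hypothetical per-CPU balancing state; not taken from the patch. */
struct balance_state {
    int cpus_in_node;  /* number of CPUs in this CPU's node */
    int failed_local;  /* consecutive failed local balance attempts */
};

enum balance_scope { BALANCE_LOCAL, BALANCE_GLOBAL };

/* Assumed tuning knob: failed local attempts tolerated per CPU in the node. */
#define RETRIES_PER_CPU 1

/* Number of failed local attempts allowed before going global.
 * Proportional to the node size; a single-CPU node gets 0, so local
 * balancing is never attempted there. */
static int retry_limit(const struct balance_state *s)
{
    if (s->cpus_in_node <= 1)
        return 0;
    return RETRIES_PER_CPU * s->cpus_in_node;
}

/* Decide where the next balancing attempt should look for work. */
enum balance_scope next_balance_scope(const struct balance_state *s)
{
    if (s->failed_local >= retry_limit(s))
        return BALANCE_GLOBAL;
    return BALANCE_LOCAL;
}

/* Update the failure counter after an attempt. A successful pull or a
 * global attempt resets the counter; a failed local attempt bumps it. */
void record_balance_result(struct balance_state *s,
                           enum balance_scope scope, bool pulled_task)
{
    if (pulled_task || scope == BALANCE_GLOBAL)
        s->failed_local = 0;
    else
        s->failed_local++;
}
```

In this sketch a CPU that is alone in its node always gets `BALANCE_GLOBAL`, while a CPU in a 4-CPU node looks across nodes only after four consecutive failed local attempts. Raising `RETRIES_PER_CPU` would correspond to the "tuning the retry rate" idea mentioned for multi-level node hierarchies.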
Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.