From: Erich Focht <efocht@hpce.nec.com>
To: "linux-kernel" <linux-kernel@vger.kernel.org>, LSE <lse-tech@lists.sourceforge.net>
Subject: [Lse-tech] [patch] scheduler fix for 1cpu/node case
Date: Mon, 28 Jul 2003 21:16:46 +0200
Cc: "Martin J. Bligh" <Martin.Bligh@us.ibm.com>, Andi Kleen <ak@muc.de>, torvalds@osdl.org

Hi,
after talking to several people at OLS about the current NUMA
scheduler, the conclusion was:
(1) it sucks (for particular workloads),
(2) on x86_64 (embarrassingly simple NUMA) it's useless, goto (1).
Fact is that the current separation of local and global balancing,
where global balancing is done only in the timer interrupt at a fixed
rate, is far too inflexible. A CPU going idle inside a well-balanced
node will stay idle for a while even if other nodes have plenty of
work to do. Especially in the corner case of one CPU per node, this
condemns that CPU to idleness for at least 5 ms. So x86_64 platforms
(but not only those!) suffer and wish to switch off the NUMA
scheduler while keeping NUMA memory management on.
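To make the timing argument concrete, here is a minimal C sketch of a fixed-rate scheme of the kind described above. It is a toy model, not kernel code: the macro names, the 5-tick interval (chosen to match the 5 ms figure, assuming a 1 ms tick) and the helper `first_useful_tick()` are all hypothetical, and it assumes some other node always has work to give, so any global balance succeeds.

```c
#include <assert.h>

/* Hypothetical tick intervals: local balancing every tick, global
 * (cross-node) balancing only every 5th tick. With a 1 ms tick this
 * matches the "at least 5 ms" figure quoted above. */
#define LOCAL_REBALANCE_TICK 1
#define NODE_REBALANCE_TICK  5

/* Return the first tick (counting from 1) at which an idle CPU can pull
 * work. Assumes another node always has work, so a global balance always
 * succeeds; local_work says whether a sibling CPU in the same node has
 * spare tasks. */
static int first_useful_tick(int cpus_in_node, int local_work)
{
    assert(cpus_in_node >= 1);

    for (int tick = 1; ; tick++) {
        /* Local balancing only helps if there is a sibling with work. */
        if (tick % LOCAL_REBALANCE_TICK == 0 && cpus_in_node > 1 && local_work)
            return tick;
        /* Global balancing runs only at the fixed node rate. */
        if (tick % NODE_REBALANCE_TICK == 0)
            return tick;
    }
}
```

With these assumptions, `first_useful_tick(1, 0)` and `first_useful_tick(4, 0)` both return 5, while `first_useful_tick(4, 1)` returns 1: only a CPU with a busy sibling in its own node finds work right away, and a lone CPU in its node always waits for the global tick.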
The attached patch is a simple solution which
- solves the 1 CPU / node problem,
- lets other systems behave (almost) as before,
- opens the way to other optimisations like multi-level node
hierarchies (by tuning the retry rate),
- simplifies the NUMA scheduler and deletes more lines of code than it
adds.
The timer-interrupt-based global rebalancing might appear to be a
simple and good idea, but it robs the scheduler of a lot of
flexibility. In the patch, global rebalancing is done after a
certain number of failed attempts to balance locally. The number of
attempts is proportional to the number of CPUs in the current
node. With only one CPU in the current node, the scheduler doesn't
even try to balance locally; it wouldn't make sense anyway. Of course
one could instead set IDLE_NODE_REBALANCE_TICK = IDLE_REBALANCE_TICK,
but this is uglier (IMHO) and only helps when every node has exactly
one CPU.
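The retry-based trigger can be sketched in C like this. It is a hedged illustration of the idea, not the code in the attached patch: the structure `runqueue_sketch`, its `nr_lb_failed` counter, the constant `NODE_BALANCE_RATE` and the helper functions are hypothetical, and the sketch assumes the threshold is simply the node's CPU count times a fixed retry rate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tuning knob: failed local attempts allowed per CPU in
 * the node before falling back to cross-node balancing. */
#define NODE_BALANCE_RATE 2

struct runqueue_sketch {
    int cpus_in_node;   /* CPUs sharing this runqueue's node */
    int nr_lb_failed;   /* consecutive fruitless local balance attempts */
};

/* Decide whether this balancing attempt should look across nodes. */
static bool should_balance_globally(const struct runqueue_sketch *rq)
{
    assert(rq->cpus_in_node >= 1);

    /* A lone CPU has no local partner, so skip local balancing entirely. */
    if (rq->cpus_in_node == 1)
        return true;

    /* Otherwise go global only after enough local attempts have failed;
     * the threshold grows with the size of the node. */
    return rq->nr_lb_failed >= NODE_BALANCE_RATE * rq->cpus_in_node;
}

/* Record the outcome of one balancing attempt. */
static void balance_done(struct runqueue_sketch *rq, bool was_global,
                         bool moved_task)
{
    if (was_global || moved_task)
        rq->nr_lb_failed = 0;   /* start counting afresh */
    else
        rq->nr_lb_failed++;     /* another fruitless local attempt */
}

/* Count attempts, starting from 1, until the first global balance when
 * every local attempt finds nothing to pull. */
static int attempts_until_global(int cpus_in_node)
{
    struct runqueue_sketch rq = { cpus_in_node, 0 };
    int attempt = 1;

    while (!should_balance_globally(&rq)) {
        balance_done(&rq, false, false);
        attempt++;
    }
    return attempt;
}
```

Under these assumptions `attempts_until_global(1)` returns 1 (a lone CPU goes straight to cross-node balancing), `attempts_until_global(2)` returns 5 and `attempts_until_global(4)` returns 9. Scaling the threshold with node size keeps most balancing inside large nodes, and the hypothetical `NODE_BALANCE_RATE` plays the role of the tunable retry rate mentioned in the list above.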
Please consider this for inclusion.
Thanks,
Erich
[2. text/x-diff; 1cpufix-lb-2.6.0t1.patch]...