[patch 2.6.0-test1] node affine NUMA scheduler extension
No real change compared to the previous version; the patch was only
adapted to fit into 2.6.0-test1. I append the description from my
previous posting.

The patch shows 5-8% gain in the numa_test benchmark on a TX7 Itanium2
machine with 8 CPUs/4 nodes. The interesting numbers are ElapsedTime
and TotalUserTime. In numa_test I changed the PROBLEMSIZE from 1000000
to 2000000 in order to get longer execution/test times. The results are
averages over 10 measurements; the standard deviation is in brackets.

2.6.0-test1 kernel: original NUMA scheduler
Tasks  AverageUserTime  ElapsedTime    TotalUserTime   TotalSysTime
  4    52.67(3.51)       61.30(8.04)    210.70(14.05)  0.16(0.02)
  8    50.29(1.85)       55.19(6.36)    402.38(14.78)  0.34(0.02)
 16    53.27(2.30)      115.30(5.40)    852.40(36.75)  0.62(0.02)
 32    51.92(1.13)      215.98(5.95)   1661.66(36.08)  1.21(0.04)

2.6.0-test1 kernel: node affine NUMA scheduler
Tasks  AverageUserTime  ElapsedTime    TotalUserTime   TotalSysTime
  4    50.13(2.09)       56.72(8.46)    200.55(8.34)   0.15(0.01)
  8    49.78(1.29)       54.43(4.90)    398.26(10.31)  0.34(0.02)
 16    50.37(0.96)      110.79(8.46)    806.01(15.33)  0.63(0.03)
 32    51.10(0.51)      210.18(3.27)   1635.40(16.16)  1.23(0.04)

In order to see the UserTime / CPU one needs an additional patch which
restores the per-CPU times in /proc/pid/cpu. That patch comes in a
separate post.

> This patch is an adaptation of the earlier work on the node affine
> NUMA scheduler to the NUMA features meanwhile integrated into
> 2.5. Compared to the patch posted for 2.5.39 this one is much simpler
> and easier to understand.
>
> The main idea is (still) that tasks are assigned a homenode to which
> they are preferentially scheduled. They are not only sticking as much
> as possible to a node (as in the current 2.5 NUMA scheduler) but will
> also be attracted back to their homenode if they had to be scheduled
> away. Therefore the tasks can be called "affine" to the homenode.
>
> The implementation is straightforward:
> - Tasks have an additional element in their task structure (node).
> - The scheduler keeps track of the homenodes of the tasks running in
>   each node and on each runqueue.
> - At cross-node load balance time nodes/runqueues which run tasks
>   originating from the stealer node are preferred. They get a weight
>   bonus for each task with the homenode of the stealer.
> - When stealing from a remote node one tries to get one's own tasks
>   (if any) or tasks from other nodes (if any). This way tasks are
>   kept on their homenode as long as possible.
>
> The selection of the homenode is currently done at initial load
> balancing, i.e. at exec(). A smarter selection method might be needed
> for improving the situation for multithreaded processes. An option is
> the dynamic_homenode patch I posted for 2.5.39 or some other scheme
> based on an RSS/node measure. But that's another story...

Regards,
Erich

[2. text/x-diff; node_affine_sched-2.6.0t1-23.diff]...
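
The weight bonus and the stealing preference described in the quoted list can be sketched in a few lines of user-space C. This is only an illustration of the idea, not code from the attached diff: the names node_stat, node_weight, find_busiest_node, pick_task and the constant HOMENODE_BONUS are all hypothetical, the bonus value of 2 is arbitrary, and the rule that a node must outweigh the stealer's own load is an assumption.

```c
#include <stddef.h>

#define NR_NODES       4   /* matches the 4-node test machine */
#define HOMENODE_BONUS 2   /* hypothetical weight added per "own" task */

/* Hypothetical per-node bookkeeping; not the structures used in the patch. */
struct node_stat {
    int nr_running;             /* runnable tasks currently on this node */
    int nr_homenode[NR_NODES];  /* of those, how many have homenode == index */
};

/* Hypothetical minimal task: only the homenode matters for this sketch. */
struct task {
    int homenode;
};

/* Weight of a remote node as seen by the stealer: its load plus a bonus
 * for every task running there whose homenode is the stealer node. */
static int node_weight(const struct node_stat *ns, int stealer)
{
    return ns->nr_running + HOMENODE_BONUS * ns->nr_homenode[stealer];
}

/* Return the remote node with the highest weight, or -1 if no remote
 * node's weight exceeds the stealer's own load (assumed threshold). */
static int find_busiest_node(const struct node_stat stats[NR_NODES], int stealer)
{
    int best = -1;
    int best_weight = stats[stealer].nr_running;

    for (int n = 0; n < NR_NODES; n++) {
        if (n == stealer)
            continue;
        int w = node_weight(&stats[n], stealer);
        if (w > best_weight) {
            best_weight = w;
            best = n;
        }
    }
    return best;
}

/* Choose a task to pull from the victim node: first a task whose homenode
 * is the stealer, otherwise the first task whose homenode is some third
 * node. Tasks at home on the victim are left alone; -1 means none found. */
static int pick_task(const struct task *tasks, size_t n, int stealer, int victim)
{
    int fallback = -1;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].homenode == stealer)
            return (int)i;
        if (fallback < 0 && tasks[i].homenode != victim)
            fallback = (int)i;
    }
    return fallback;
}
```

With this weighting, a node running three tasks of which one belongs to the stealer outweighs a node running four unrelated tasks, so the stealer is drawn toward the node that holds its displaced task.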
Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.