|From:||Ingo Molnar <mingo-AT-elte.hu>|
|To:||Frans Pop <elendil-AT-planet.nl>|
|Subject:||[quad core results] BFS vs. mainline scheduler benchmarks and measurements|
|Date:||Mon, 7 Sep 2009 14:16:13 +0200|
|Cc:||kernel-AT-kolivas.org, linux-kernel-AT-vger.kernel.org, a.p.zijlstra-AT-chello.nl, efault-AT-gmx.de|
* Frans Pop <email@example.com> wrote:

> Ingo Molnar wrote:
> > So the testbox i picked fits into the upper portion of what i
> > consider a sane range of systems to tune for - and should still fit
> > into BFS's design bracket as well according to your description:
> > it's a dual quad core system with hyperthreading.
>
> Ingo,
>
> Nice that you've looked into this.
>
> Would it be possible for you to run the same tests on e.g. a dual
> core and/or a UP system (or maybe just offline some CPUs?)? It
> would be very interesting to see whether BFS does better in the
> lower portion of the range, or if the differences you show between
> the two schedulers are consistent across the range.

Sure! Note that usually we can extrapolate ballpark-figure quad and
dual socket results from 8 core results. Trends as drastic as the
ones i reported do not get reversed as one shrinks the number of
cores.

[ This technique is not universal - for example borderline graphs
  cannot be extrapolated down reliably - but the graphs i posted
  were far from borderline. ]

Con posted single-socket quad comparisons/graphs so to make it 100%
apples to apples i re-tested with a single-socket (non-NUMA) quad as
well, and have uploaded the new graphs/results to:

  kernel build performance on quad:
     http://redhat.com/~mingo/misc/bfs-vs-tip-kbuild-quad.jpg

  pipe performance on quad:
     http://redhat.com/~mingo/misc/bfs-vs-tip-pipe-quad.jpg

  messaging performance (hackbench) on quad:
     http://redhat.com/~mingo/misc/bfs-vs-tip-messaging-quad.jpg

  OLTP performance (postgresql + sysbench) on quad:
     http://redhat.com/~mingo/misc/bfs-vs-tip-oltp-quad.jpg

It shows similar curves and behavior to the 8-core results i posted
- BFS is slower than mainline in virtually every measurement. The
ratios are different for different parts of the graphs - but the
trend is similar.
I also re-ran a few standalone kernel latency tests with a single
quad:

lat_tcp:

  BFS:          TCP latency using localhost: 16.9926 microseconds
  sched-devel:  TCP latency using localhost: 12.4141 microseconds [36.8% faster]

as a comparison, the 8 core lat_tcp result was:

  BFS:          TCP latency using localhost: 16.5608 microseconds
  sched-devel:  TCP latency using localhost: 13.5528 microseconds [22.1% faster]

lat_pipe quad result:

  BFS:          Pipe latency: 4.6978 microseconds
  sched-devel:  Pipe latency: 2.6860 microseconds [74.8% faster]

as a comparison, the 8 core lat_pipe result was:

  BFS:          Pipe latency: 4.9703 microseconds
  sched-devel:  Pipe latency: 2.6137 microseconds [90.1% faster]

On the desktop interactivity front, i also still saw that bad
starvation artifact with BFS with multiple copies of CPU-bound
pipe-test-1m.c running in parallel:

   http://redhat.com/~mingo/cfs-scheduler/tools/pipe-test-1m.c

Start up a few copies of them like this:

   for ((i=0;i<32;i++)); do ./pipe-test-1m & done

and the quad eventually came to a halt here - until the tasks
finished running.

I also tested a few key data points on dual core and it shows
similar trends as well (as expected from the 8 and 4 core results).

But ... i'd really encourage everyone to test these things yourself
as well and not take anyone's word on this as granted. The more
people provide numbers, the better. The latest BFS patch can be
found at:

   http://ck.kolivas.org/patches/bfs/

The mainline sched-devel tree can be found at:

   http://people.redhat.com/mingo/tip.git/README

Thanks,

	Ingo
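
[ The bracketed "% faster" figures in the latency results above can be
  cross-checked with a short calculation. The sketch below is not part
  of the original measurements. It assumes "N% faster" means
  (BFS latency / sched-devel latency - 1) * 100, cut off after one
  decimal place; the mail does not state the formula, but this
  assumption reproduces all four posted figures. The names RESULTS and
  percent_faster are made up for illustration. ]

```python
import math

# Latencies in microseconds, copied from the lat_tcp / lat_pipe results above.
# Each entry is (BFS latency, sched-devel latency).
RESULTS = {
    "lat_tcp  quad":   (16.9926, 12.4141),
    "lat_tcp  8 core": (16.5608, 13.5528),
    "lat_pipe quad":   (4.6978, 2.6860),
    "lat_pipe 8 core": (4.9703, 2.6137),
}


def percent_faster(bfs_us, sched_devel_us):
    """Return how much faster sched-devel is than BFS, in percent.

    Assumption: the figure is the latency ratio minus one, truncated
    (not rounded) to one decimal place.
    """
    ratio = bfs_us / sched_devel_us
    return math.floor((ratio - 1.0) * 1000.0) / 10.0


if __name__ == "__main__":
    for name, (bfs_us, sched_devel_us) in RESULTS.items():
        pct = percent_faster(bfs_us, sched_devel_us)
        print(f"{name}: sched-devel is {pct:.1f}% faster")
```

[ Rounding instead of truncating would give figures 0.1 higher in each
  case, so the posted numbers look truncated; either way the comparison
  between the two schedulers is unchanged. ]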
Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds