[quad core results] BFS vs. mainline scheduler benchmarks and measurements

From:  Ingo Molnar <>
To:  Frans Pop <>
Subject:  [quad core results] BFS vs. mainline scheduler benchmarks and measurements
Date:  Mon, 7 Sep 2009 14:16:13 +0200

* Frans Pop <> wrote:

> Ingo Molnar wrote:
> > So the testbox i picked fits into the upper portion of what i
> > consider a sane range of systems to tune for - and should still fit
> > into BFS's design bracket as well according to your description:
> > it's a dual quad core system with hyperthreading.
> Ingo,
> Nice that you've looked into this.
> Would it be possible for you to run the same tests on e.g. a dual 
> core and/or a UP system (or maybe just offline some CPUs?)? It 
> would be very interesting to see whether BFS does better in the 
> lower portion of the range, or if the differences you show between 
> the two schedulers are consistent across the range.


Note that usually we can extrapolate ballpark-figure quad and dual 
socket results from 8 core results. Trends as drastic as the ones 
i reported do not get reversed as one shrinks the number of cores. 

[ This technique is not universal - for example borderline graphs
  cannot be extrapolated down reliably - but the graphs i 
  posted were far from borderline. ]

Con posted single-socket quad comparisons/graphs so to make it 100% 
apples to apples i re-tested with a single-socket (non-NUMA) quad as 
well, and have uploaded the new graphs/results to:

  kernel build performance on quad:

  pipe performance on quad:

  messaging performance (hackbench) on quad:

  OLTP performance (postgresql + sysbench) on quad:

It shows similar curves and behavior to the 8-core results i posted 
- BFS is slower than mainline in virtually every measurement. The 
ratios are different for different parts of the graphs - but the 
trend is similar.

I also re-ran a few standalone kernel latency tests on the 
single-socket quad.

lat_tcp quad result:

  BFS:          TCP latency using localhost: 16.9926 microseconds
  sched-devel:  TCP latency using localhost: 12.4141 microseconds [36.8% faster]

  as a comparison, the 8 core lat_tcp result was:

  BFS:          TCP latency using localhost: 16.5608 microseconds
  sched-devel:  TCP latency using localhost: 13.5528 microseconds [22.1% faster]

lat_pipe quad result:

  BFS:          Pipe latency: 4.6978 microseconds
  sched-devel:  Pipe latency: 2.6860 microseconds [74.8% faster]

  as a comparison, the 8 core lat_pipe result was:

  BFS:          Pipe latency: 4.9703 microseconds
  sched-devel:  Pipe latency: 2.6137 microseconds [90.1% faster]
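
The bracketed percentages above can be re-derived from the raw latencies. What follows is a minimal sketch, not part of the original measurements: `pct_faster` is a hypothetical helper name, and it assumes "X% faster" means (BFS latency / sched-devel latency - 1) * 100, truncated rather than rounded to one decimal place, which is the convention that reproduces all four quoted figures.

```shell
# pct_faster SLOW FAST: print how much faster FAST is than SLOW, in percent.
# Assumption: the value is truncated (not rounded) to one decimal place.
pct_faster() {
  awk -v slow="$1" -v fast="$2" 'BEGIN {
    pct = (slow / fast - 1) * 100
    printf "%.1f\n", int(pct * 10) / 10
  }'
}

pct_faster 16.9926 12.4141   # lat_tcp, quad
pct_faster 16.5608 13.5528   # lat_tcp, 8 core
pct_faster 4.6978 2.6860     # lat_pipe, quad
pct_faster 4.9703 2.6137     # lat_pipe, 8 core
```

The same helper can be pointed at numbers from your own lat_tcp and lat_pipe runs to compare them against the ratios quoted here.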

On the desktop interactivity front, i also still saw that bad 
starvation artifact with BFS with multiple copies of CPU-bound 
pipe-test-1m.c running in parallel:

Start up a few copies of them like this:

  for ((i=0;i<32;i++)); do ./pipe-test-1m & done

and the quad eventually came to a halt here - until the tasks 
finished running.
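
To make it easier to report how long such a run takes, the loop can be wrapped in a small helper. This is only a sketch under stated assumptions: `run_copies` is a hypothetical name, it assumes pipe-test-1m has been built in the current directory, and it relies on `date +%s`, which GNU and BSD date both support. It measures wall-clock time only, not interactivity.

```shell
# run_copies N CMD [ARGS...]: start N background copies of CMD, wait for
# all of them, then report the elapsed wall-clock time in seconds.
# Note: plain POSIX sh, so n, i and start are global variables.
run_copies() {
  n=$1
  shift
  start=$(date +%s)
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" &
    i=$((i + 1))
  done
  wait    # with no arguments, waits for every background child
  echo "$n copies finished in $(( $(date +%s) - start )) s"
}

# Equivalent of the loop above (assumes ./pipe-test-1m exists):
#   run_copies 32 ./pipe-test-1m
```

Trying the same thing with a smaller count first (say 4 or 8) gives a baseline before the box is loaded with 32 copies.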

I also tested a few key data points on dual core and it shows 
similar trends as well (as expected from the 8 and 4 core results).

But ... i'd really encourage everyone to test these things themselves 
as well and not take anyone's word for it. The more 
people provide numbers, the better. The latest BFS patch can be 
found at:

The mainline sched-devel tree can be found at:




Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds