BFS vs. mainline scheduler benchmarks and measurements
Posted Sep 7, 2009 18:35 UTC (Mon)
by hppnq (guest, #14462)
In reply to: BFS vs. mainline scheduler benchmarks and measurements by lacostej
Parent article: BFS vs. mainline scheduler benchmarks and measurements
I mean if I was Con, that's the first thing I would do: create a measurable suite of tests.
Actually, he did that: you may find interbench interesting. It was used to produce Con's performance statistics. Also, see this 2002 interview with Con, discussing his earlier effort ConTest and scheduler benchmarking in general.
The challenge, it seems, is to get scheduler developers to agree on what constitutes a normal workload on normal systems tuned in normal ways.
Posted Sep 7, 2009 21:45 UTC (Mon)
by mingo (guest, #31122)
[Link] (3 responses)
The challenge, it seems, is to get scheduler developers to agree on what constitutes a normal workload on normal systems tuned in normal ways.
There's not much disagreement, really. Everyone agrees that interactivity problems need to be investigated and fixed - it's as simple as that. We have a lot of tools to do just that, and when problems get reported to us we try to get them fixed.
In practice, interactivity fixes rarely get in the way of server tunings - and when they do, the upstream kernel position has always been that desktop/latency tunings take precedence over server/throughput tunings.
I'm aware that the opposite is being claimed, but that does not make it a fact.
Try a simple experiment: post a patch to lkml, with Linus Cc:-ed, that blatantly changes some tunable to be more server-friendly (double the default latency target or increase some I/O batching default) at the expense of desktop latencies. My guess is that you'll see a very quick NAK.
Posted Sep 8, 2009 8:01 UTC (Tue)
by hppnq (guest, #14462)
[Link] (2 responses)
Ah, my point is that you claim to compare apples to apples while you use different tools than Con to compare the performance of the BFS and CFS schedulers. It is entirely possible that I missed the comparison of benchmarking tools, of course, and I'm not saying that you or Con should choose any particular tool: I am simply observing there is a difference.
But, looking at the interbench results, I cannot help but think that it would have been better if Con had used some other benchmarks as well: one could drive a truck through those standard deviations.
Posted Sep 8, 2009 8:48 UTC (Tue)
by mingo (guest, #31122)
[Link] (1 response)
Ah, my point is that you claim to compare apples to apples while you use different tools than Con to compare the performance of the BFS and CFS schedulers. It is entirely possible that I missed the comparison of benchmarking tools, of course, and I'm not saying that you or Con should choose any particular tool: I am simply observing there is a difference.
Well, the reason I spent 8+ hours on each round of testing is that I threw a lot of reliable and relevant benchmarks/workloads at the schedulers. Most of those were used by Con too in past scheduler work, so it's not as if he never runs them or disagrees with them on some fundamental basis - he just chose not to run them against BFS this time around. Sysbench comes from FreeBSD, for example; hackbench was written many years ago to test chat-server latencies and throughput; kbuild, lat_tcp, and lat_pipe are well known as well; etc.
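For those unfamiliar with these microbenchmarks: the pipe-latency style of test simply bounces a token between two processes over a pair of pipes, so each round trip forces two wakeups and two context switches - exactly the scheduler path being measured. A minimal sketch of the idea (illustrative only; not the actual lmbench lat_pipe source) looks roughly like this:

    /* pipe_pingpong.c - minimal pipe round-trip latency sketch
     * (illustrative; not the actual lmbench lat_pipe source) */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/wait.h>

    #define ITERATIONS 100000

    int main(void)
    {
        int p1[2], p2[2], i;
        char token = 'x';
        pid_t pid;
        struct timeval start, end;
        double usec;

        if (pipe(p1) < 0 || pipe(p2) < 0) {
            perror("pipe");
            return 1;
        }

        pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }

        if (pid == 0) {
            /* Child: echo the token straight back, once per round trip. */
            for (i = 0; i < ITERATIONS; i++) {
                if (read(p1[0], &token, 1) != 1 || write(p2[1], &token, 1) != 1)
                    _exit(1);
            }
            _exit(0);
        }

        /* Parent: send the token and wait for it to come back. Each round
         * trip costs two context switches, so the average time per round
         * trip is dominated by scheduler wakeup latency. */
        gettimeofday(&start, NULL);
        for (i = 0; i < ITERATIONS; i++) {
            if (write(p1[1], &token, 1) != 1 || read(p2[0], &token, 1) != 1) {
                perror("ping-pong");
                return 1;
            }
        }
        gettimeofday(&end, NULL);
        wait(NULL);

        usec = (end.tv_sec - start.tv_sec) * 1e6 + (end.tv_usec - start.tv_usec);
        printf("average round trip: %.2f usec\n", usec / ITERATIONS);
        return 0;
    }

Hackbench generalizes the same idea to many groups of senders and receivers talking over sockets or pipes, which is why it stresses both wakeup latency and scalability at once.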
Basically, I applied a wide spectrum of tests that _I_ find useful for building a picture of how good a scheduler is, and posted the results. (I wanted to find the strong spot of BFS - which in turn would be a weak spot of the mainline scheduler.)
So I tested what I was curious about (basic latency in four tests, throughput and scalability in two other tests) - others can test what they are curious about. Testing these schedulers is not that hard; it's not like I have a monopoly on posting scheduler comparisons ;-)
But, looking at the interbench results, I cannot help but think that it would have been better if Con had used some other benchmarks as well: one could drive a truck through those standard deviations.
The inherent noise in the interbench numbers does not look particularly good - I found that in the past too. It's still a useful test, so I'm not dissing it - it's just very noisy in general. I prefer low-noise tests, as I want to be able to stand behind them later on. When I post benchmarks they get a lot of scrutiny, for natural reasons, so I want sound results. You won't find many (any?) measurements from me in the lkml archives that were discredited later.
Also, on the theoretical angle, I don't think there's much to be gained on the interactivity front either: the mainline scheduler has a fixed deadline (/proc/sys/kernel/sched_latency_ns), which you can tune down if you wish, and it works hard to meet that latency goal for every task. If it doesn't, then that's a bug we want to fix, not some fundamental design weakness.
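To make that concrete: on kernels that expose the tunable (typically those built with CONFIG_SCHED_DEBUG), lowering the latency target is just a read and a write of /proc/sys/kernel/sched_latency_ns. A minimal sketch, to be run as root - since the default varies by kernel version and CPU count, it reads the current value rather than assuming one:

    /* tune_sched_latency.c - sketch: halve the CFS latency target.
     * Assumes a kernel that exposes /proc/sys/kernel/sched_latency_ns
     * (typically CONFIG_SCHED_DEBUG); must be run as root. */
    #include <stdio.h>

    int main(void)
    {
        const char *path = "/proc/sys/kernel/sched_latency_ns";
        unsigned long latency_ns;
        FILE *f;

        /* Read the current latency target (in nanoseconds). */
        f = fopen(path, "r");
        if (!f || fscanf(f, "%lu", &latency_ns) != 1) {
            perror(path);
            return 1;
        }
        fclose(f);
        printf("current latency target: %lu ns\n", latency_ns);

        /* Halve it: the scheduler will then try to cycle through all
         * runnable tasks within the shorter period, trading some
         * throughput for lower worst-case scheduling delay. */
        f = fopen(path, "w");
        if (!f) {
            perror(path);
            return 1;
        }
        fprintf(f, "%lu\n", latency_ns / 2);
        fclose(f);

        printf("new latency target: %lu ns\n", latency_ns / 2);
        return 0;
    }

(Doubling the value instead would be the kind of "server-friendly" change mentioned above.)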
But ... theory is one thing and practice is another, so it always makes sense to walk the walk and keep an open mind about all this.
So what we need now are bug reports and testers willing to help us. These kinds of heated discussions about the scheduler are always useful, as attention on the scheduler increases and we get to fix bugs that wouldn't be reported otherwise - so I'm not complaining ;-)
For latency characterisation and debugging we use the latency tests I posted (pipe, messaging, etc.), and to measure a live desktop we use latencytop, the latency tracer, the 'perf' tool, etc.
So there are plenty of good tools, plenty of well-known benchmarks, plenty of good and reliable data, and a decade-old kernel policy that desktop latencies take precedence over server throughput - and the scheduler developers are eager to fix all bugs that get reported.
Let me note here that, based on these 100+ comment discussions here on LWN and on Slashdot as well, we got only a single specific latency bug report against the upstream scheduler in the past 24 hours. So there's a lot of smoke, a lot of wild claims and complaints - but little actionable feedback from real Linux users right now.
So if you see some weirdness that you suspect is caused by the scheduler, please post it to lkml. (Please Cc: Peter Zijlstra and me on any such email.) I'm sure the scheduler is not bug-free, and I'm sure there are interactivity bugs to fix as well, so don't hesitate to help out.
Posted Sep 8, 2009 11:45 UTC (Tue)
by hppnq (guest, #14462)
[Link]
Thanks for clarifying! Not only do I appreciate all those hours of developing and testing wonderful software, I also like it a lot that you take the time to comment about it here at LWN. :-)