
BFS vs. mainline scheduler benchmarks and measurements

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:22 UTC (Mon) by ikm (subscriber, #493)
In reply to: BFS vs. mainline scheduler benchmarks and measurements by bvdm
Parent article: BFS vs. mainline scheduler benchmarks and measurements

> i don't think you should expect to convince the lwn.net audience with arguments suggesting Ingo Molnar's technical incompetence. Really.

I expect everyone can draw their own conclusions. I've made mine. Ingo's a nice guy, but I don't think he's measuring the right things here. But how are you going to measure things like:

  • mplayer using OpenGL renderer doesn't drop frames anymore when dragging and dropping the video window around in an OpenGL composited desktop
  • Composite desktop effects like zoom and fade out don't stall for sub-second periods of time while there's CPU load in the background
  • LMMS (a tool utilizing real-time sound synthesis) does not produce "pops", "crackles" and drops in the sound during real-time playback due to buffer under-runs
  • Games like Doom 3 and such don't "freeze" periodically for small amounts of time (again for sub-second amounts) when something in the background grabs CPU time
Those are things a person has reported as a follow-up on the thread in question. Do you think he was lying?
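Problems like these can in principle be quantified: wake a periodic "audio-like" thread at a fixed interval while CPU load runs in the background, and record how late each wakeup actually was. This is roughly what interbench does. A minimal sketch of the idea (the interval, iteration count, and load level are arbitrary; note that Python's GIL inflates the measured jitter, so this is illustrative only, not a real benchmark harness):

```python
import threading
import time

def measure_wakeup_jitter(interval_s=0.01, iterations=100, load_threads=2):
    """Wake up every `interval_s` seconds under CPU load and record
    how late each wakeup actually was (scheduling jitter)."""
    stop = threading.Event()

    def burn_cpu():
        # Background load competing for the CPU, like a compile job.
        while not stop.is_set():
            sum(i * i for i in range(1000))

    workers = [threading.Thread(target=burn_cpu) for _ in range(load_threads)]
    for w in workers:
        w.start()

    delays = []
    deadline = time.monotonic()
    try:
        for _ in range(iterations):
            deadline += interval_s
            time.sleep(max(0.0, deadline - time.monotonic()))
            # Positive value = we were woken up later than requested.
            delays.append(time.monotonic() - deadline)
    finally:
        stop.set()
        for w in workers:
            w.join()
    return delays

if __name__ == "__main__":
    delays = measure_wakeup_jitter()
    print("max wakeup delay: %.3f ms" % (max(delays) * 1e3))
```

A real harness would do the timing loop in C with a real-time or normal-priority thread, but the shape of the measurement is the same: the maximum delay is what the user perceives as a "pop" or a dropped frame.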



BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:41 UTC (Mon) by bvdm (guest, #42755) [Link] (1 responses)

Do you have a point other than that the current scheduler is not perfect? We all knew that. And Ingo invited Con to help improve it. So you don't really have a point at all, do you?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 12:59 UTC (Mon) by ikm (subscriber, #493) [Link]

Go troll elsewhere. Thank you.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 13:11 UTC (Mon) by mingo (subscriber, #31122) [Link] (2 responses)

But how are you going to measure things like:

* mplayer using OpenGL renderer doesn't drop frames anymore when dragging and dropping the video window around in an OpenGL composited desktop

* Composite desktop effects like zoom and fade out don't stall for sub-second periods of time while there's CPU load in the background

* LMMS (a tool utilizing real-time sound synthesis) does not produce "pops", "crackles" and drops in the sound during real-time playback due to buffer under-runs

* Games like Doom 3 and such don't "freeze" periodically for small amounts of time (again for sub-second amounts) when something in the background grabs CPU time

This is a list of routine interactivity problems that we track down and address. In the past few years we've got extensive infrastructure built up in the mainline kernel that allows their measurement and allows us to eliminate them.

A good place to start would be to try the latency tracing suggestions from Frederic Weisbecker on lkml:

Such properties of the desktop are measured routinely (sometimes easily - sometimes it needs quite a bit of work) - so please report them and help out tracking them down.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 14:02 UTC (Mon) by ikm (subscriber, #493) [Link] (1 responses)

Yay, that's a start. I hope this can go somewhere eventually. Clearly it's the interactivity issues Con has always been after, not the bulk workloads. With a way to measure and quantify those issues and scenarios, something might get going somewhere.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 21:53 UTC (Mon) by mingo (subscriber, #31122) [Link]

You might want to try latencytop. We added the instrumentation for that after the CFS merge - to make it easier to prove/report scheduler (and other) latencies.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 16:21 UTC (Mon) by lacostej (guest, #2760) [Link] (5 responses)

> But how are you going to measure things like:

Can't these tools detect when they hang or stall?

Can't we modify them to report the issues in a known format (or to a third-party daemon) and use those tools as tests?

I mean if I was Con, that's the first thing I would do: create a measurable suite of tests.

Instead of talking about feelings, we would talk about measurable things. It's not like we're talking about usability. Even usability can be tested to some degree.

So, can't we elevate the debate ?

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 18:35 UTC (Mon) by hppnq (guest, #14462) [Link] (4 responses)

I mean if I was Con, that's the first thing I would do: create a measurable suite of tests.

Actually, he did that: you may find interbench interesting. It was used to produce Con's performance statistics. Also, see this 2002 interview with Con, discussing his earlier effort ConTest and scheduler benchmarking in general.

The challenge, it seems, is to get scheduler developers to agree on what constitutes a normal workload on normal systems tuned in normal ways.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 7, 2009 21:45 UTC (Mon) by mingo (subscriber, #31122) [Link] (3 responses)

The challenge, it seems, is to get scheduler developers to agree on what constitutes a normal workload on normal systems tuned in normal ways.

There's not much disagreement really. Everyone agrees that interactivity problems need to be investigated and fixed - it's as simple as that. We have a lot of tools to do just that, and things that get reported to us we try to get fixed.

In practice, interactivity fixes rarely get in the way of server tunings - and if they do, the upstream kernel perspective was always for desktop/latency tunings to have precedence over server/throughput tunings.

I'm aware that the opposite is being claimed, but that does not make it a fact.

Try a simple experiment: post a patch to lkml with Linus Cc:-ed that blatantly changes some tunable to be more server friendly (double the default latency target or increase some IO batching default) at the expense of desktop latencies. My guess is that you'll see a very quick NAK.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 8:01 UTC (Tue) by hppnq (guest, #14462) [Link] (2 responses)

We have a lot of tools to do just that, and things that get reported to us we try to get fixed.

Ah, my point is that you claim to compare apples to apples while you use different tools than Con to compare the performance of the BFS and CFS schedulers. It is entirely possible that I missed the comparison of benchmarking tools, of course, and I'm not saying that you or Con should choose any particular tool: I am simply observing there is a difference.

But, looking at the interbench results, I cannot help but think that it would have been better if Con had used some other benchmarks as well: one could drive a truck through those standard deviations.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 8:48 UTC (Tue) by mingo (subscriber, #31122) [Link] (1 responses)

Ah, my point is that you claim to compare apples to apples while you use different tools than Con to compare the performance of the BFS and CFS schedulers. It is entirely possible that I missed the comparison of benchmarking tools, of course, and I'm not saying that you or Con should choose any particular tool: I am simply observing there is a difference.

Well, the reason i spent 8+ hours for each round of testing is because i threw a lot of reliable and relevant benchmarks/workloads at the schedulers. Most of those were used by Con too in the past for scheduler work he did, so it's not like he never runs them or disagrees with them on some fundamental basis - he just chose not to test them on BFS this time around. Sysbench comes from FreeBSD for example, hackbench was written many years ago to test chat server latencies/throughput, and kbuild, lat_tcp and lat_pipe are well known as well, etc.

Basically i applied a wide spectrum of tests that _I_ find useful to build a picture about how good a scheduler is, and posted the results. (I wanted to find the strong spot of BFS - which in turn would be a weak spot of the mainline scheduler.)

So i tested what i was curious about (basic latency in four tests, throughput and scalability in two other tests) - others can test what they are curious about - testing these schedulers is not that hard, it's not like i have a monopoly on posting scheduler comparisons ;-)
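For context, lat_pipe-style tests measure the cost of a scheduler round trip: two processes bounce a token over a pair of pipes, so each iteration forces two context switches. A rough sketch of the idea (not lmbench's actual implementation; Unix-only since it uses fork):

```python
import os
import time

def pipe_pingpong(iterations=1000):
    """Bounce one byte between parent and child over two pipes.
    Each round trip forces two task switches, so the per-iteration
    time approximates scheduler wakeup + pipe latency."""
    p2c_r, p2c_w = os.pipe()   # parent -> child
    c2p_r, c2p_w = os.pipe()   # child -> parent

    pid = os.fork()
    if pid == 0:
        # Child: echo every byte straight back.
        for _ in range(iterations):
            os.write(c2p_w, os.read(p2c_r, 1))
        os._exit(0)

    start = time.monotonic()
    for _ in range(iterations):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)
    elapsed = time.monotonic() - start
    os.waitpid(pid, 0)
    return elapsed / iterations  # seconds per round trip

if __name__ == "__main__":
    print("%.1f usecs per round trip" % (pipe_pingpong() * 1e6))
```

The Python interpreter adds overhead that the C original doesn't have, but the benchmark's structure - and why it stresses the scheduler's wakeup path rather than throughput - is the same.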

But, looking at the interbench results, I cannot help but think that it would have been better if Con had used some other benchmarks as well: one could drive a truck through those standard deviations.

The inherent noise in the interbench numbers does not look particularly good - and i found that too in the past. But it's still a useful test, so i'm not dissing it - it's just very noisy in general. I prefer low noise tests as i want to be able to stand behind them later on. When i post benchmarks they get a lot of scrutiny, for natural reasons, so i want sound results. You won't find many (any?) measurements from me in the lkml archives that were discredited later.

Also, on the theoretical angle, i don't think there's much to be won on the interactivity front either: the mainline scheduler has a fixed deadline (/proc/sys/kernel/sched_latency_ns) which you can tune down if you wish to, and it works hard to meet that latency goal for every task. If it doesn't then that's a bug we want to fix, not some fundamental design weakness.
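On kernels of that era the latency target could be inspected and lowered via /proc (the exact path and default vary by kernel version, and later kernels moved this knob to debugfs; the value below is illustrative only):

```shell
# Show the current scheduler latency target, in nanoseconds.
cat /proc/sys/kernel/sched_latency_ns

# Halve it for tighter desktop latencies (as root).
echo 10000000 > /proc/sys/kernel/sched_latency_ns
```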

But ... theory is one thing and practice is another, so it always makes sense to walk the walk and keep an open mind about all this.

So what we need now are bugreports and testers willing to help us. These kinds of heated discussions about the scheduler are always useful as the attention on the scheduler increases and we are able to fix bugs that don't get reported otherwise - so i'm not complaining ;-)

For latency characterisation and debugging we use the latency tests i did post (pipe, messaging, etc.), plus to measure a live desktop we use latencytop, latency tracer, the 'perf' tool, etc.

So there's plenty of good tools, plenty of well-known benchmarks, plenty of good and reliable data, and a decade old kernel policy that desktop latencies have a precedence over server throughput - and the scheduler developers are eager to fix all bugs that get reported.

Let me note here that based on these 100+ comment discussions here on LWN and on Slashdot as well, we only got a single specific latency bugreport against the upstream scheduler in the past 24 hours. So there's a lot of smoke, a lot of wild claims and complaints - but little actionable feedback from real Linux users right now.

So please, if you see some weirdness that is suspected to be caused by the scheduler then please post it to lkml. (Please Cc: Peter Zijlstra and me as well on any such email.) I'm sure the scheduler is not bug-free and i'm sure there's interactivity bugs to fix as well, so don't hesitate to help out.

BFS vs. mainline scheduler benchmarks and measurements

Posted Sep 8, 2009 11:45 UTC (Tue) by hppnq (guest, #14462) [Link]

Thanks for clarifying! Not only do I appreciate all those hours of developing and testing wonderful software, I also like it a lot that you take the time to comment about it here at LWN. :-)


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds