Some notes from the BFS discussion
Since then, CFS creator Ingo Molnar has responded with a series of
benchmark results comparing the two schedulers. Tests included kernel
build times, pipe performance, messaging performance, and an online
transaction processing test; graphs were posted showing how each scheduler
performed on each test. Ingo's conclusion: "Alas, as it can be seen
in the graphs, i can not see any BFS performance improvements, on this
box." In fact, the opposite was true: BFS generally performed
worse than the mainline scheduler.
Con's answer was best described as "dismissive":
[snip lots of bullshit meaningless benchmarks showing how great cfs is and/or how bad bfs is, along with telling people they should use these artificial benchmarks to determine how good it is, demonstrating yet again why benchmarks fail the desktop]
As far as your editor can tell, Con's objections to the results mirror those heard elsewhere: Ingo chose an atypical machine for his tests, and those tests, in any case, do not really measure the performance of a scheduler in a desktop situation. The more cynical observers seem to believe that Ingo is more interested in defending the current scheduler than improving the desktop experience for "normal" users.
The machine chosen was certainly at the high end of the "desktop" scale:
A number of people thought that this box is not a typical desktop Linux system. That may indeed be true - today. But, as Ingo (among others) has pointed out, it's important to be a little ahead of the curve when designing kernel subsystems:
Btw., that's why the Linux scheduler performs so well on quad core systems today - the groundwork for that was laid two years ago when scheduler developers were testing on a quads. If we discovered fundamental problems on quads _today_ it would be way too late to help Linux users.
Partly in response to the criticisms, though, Ingo reran his tests on a single quad-core system, the same type of system as Con's box. The end results were just about the same.
The hardware used is irrelevant, though, if the benchmarks are not testing performance characteristics that desktop users care about. The concern here is latency: how long it takes before a runnable process can get its work done. If latencies are too high, audio or video streams will skip, the pointer will lag the mouse, scrolling will be jerky, and Maelstrom players will lose their ships. A number of Ingo's original tests were latency-related, and he added a couple more in the second round. So it looks like the benchmarks at least tried to measure the relevant quantity.
Benchmark results are not the same as a better desktop experience, though, and a number of users are reporting a "smoother" desktop when running with BFS. On the other hand, making significant scheduler changes in response to reports of subjective "feel" is a sure recipe for trouble: if one cannot measure improvement, one not only risks failing to fix any problems, one is also at significant risk of introducing performance regressions for other users. There has to be some sort of relatively objective way to judge scheduler improvements.
The way preferred by the current scheduler maintainers is to identify causes of latencies and fix them. The kernel's infrastructure for the identification of latency problems has improved considerably over the last year or two. One useful tool is latencytop, which collects data on what is delaying applications and presents the results to the user. The ftrace tracing framework is also able to create data on the delay between when a process is awakened and when it actually gets into the CPU; see this post from Frederic Weisbecker for an overview of how these measurements can be taken.
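The wakeup-to-run delay mentioned above can also be sampled without patching anything. Here is a minimal sketch, assuming root access, a kernel built with CONFIG_SCHED_TRACER, and the standard debugfs mount point; if those assumptions do not hold, the script just says so:

```shell
# Sketch: sample the worst-case wakeup latency recorded by ftrace's
# "wakeup" tracer (the delay between a task being woken and actually
# getting onto the CPU). Guarded in case the tracer is unavailable
# or we lack root.
T=/sys/kernel/debug/tracing
if [ -w "$T/current_tracer" ]; then
    echo wakeup > "$T/current_tracer"    # track highest-priority wakeups
    echo 0 > "$T/tracing_max_latency"    # reset the recorded maximum
    sleep 5                              # let some scheduling happen
    msg="max wakeup latency (us): $(cat "$T/tracing_max_latency")"
    echo nop > "$T/current_tracer"       # restore the default tracer
else
    msg="ftrace wakeup tracer not available (need root and CONFIG_SCHED_TRACER)"
fi
echo "$msg"
```

This only records the single worst case; for per-wakeup data, the ftrace approach described in Frederic Weisbecker's post, or latencytop itself, gives a fuller picture.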
If there are real latency problems remaining in the Linux scheduler - and there are enough "BFS is better" reports to suggest that there are - then using the available tools to describe them seems like the right direction to take. Once the problem is better understood, it will be possible to consider possible remedies. It may well be that the mainline scheduler can be adjusted to make those problems go away. Or, possibly, a more radical sort of approach is necessary. But, without some understanding of the problem - and associated ability to measure it - attempted fixes seem a bit like a risky shot in the dark.
Ingo welcomed Con back to the development community and invited him to help improve the Linux scheduler. This seems unlikely to happen, though. Con's way of working has never meshed well with the kernel development community, and he is showing little sign of wanting to change that situation. That is unfortunate; he is a talented developer who could do a lot to improve Linux for an important user community. The adoption of the current CFS scheduler is a direct result of his earlier work, even if he did not write the code which was actually merged. In general, though, improving Linux requires working with the Linux development community; in the absence of a desire to do that effectively, there will be severe limits on what a developer will be able to accomplish.
(See also: Frans Pop's benchmark tests,
which show decidedly mixed results.)
| Index entries for this article | |
|---|---|
| Kernel | Latency |
| Kernel | Scheduler |
Posted Sep 10, 2009 2:34 UTC (Thu)
by ncm (guest, #165)
It sounds as if Con could have much greater effect by posting benchmarks that mimic what he and those who agree with him consider typical use cases, and that do poorly under the current scheduler. The kernel people seem to be pretty good at hitting numeric targets they can reproduce.
Posted Sep 10, 2009 16:10 UTC (Thu)
by mingo (guest, #31122)
Note that Con did write a tool that measures various latency aspects of the kernel scheduler: InterBench.
Interestingly, the BFS vs. mainline numbers Con posted show a mainline desktop latency advantage (also here).
(Caveat emptor: I have not done those measurements myself, so I don't know how reliable they are - the standard deviation seems very high.)
Note that you don't 'have to' come up with a numeric result - a deterministic result that is described well and can be reproduced by a kernel developer is useful too.
Obviously, numeric results have the considerable advantage of removing subjective bias from tests: they turn a subjective impression into a hard number that cannot be ignored by either side of an argument. On the flip side, they are harder to obtain; latencytop, for example, should help out there.
Posted Sep 10, 2009 6:12 UTC (Thu)
by fredrik (subscriber, #232)
http://thread.gmane.org/gmane.linux.kernel/886319/focus=8...
And if I interpret the thread correctly, it seems that Ingo Molnar and Jens Axboe actually managed to pinpoint and fix a latency-related issue in CFS - an issue that might have gone undetected had it not been for BFS.
Yay for "trolls" that spur kernel improvement. ;)
Posted Sep 10, 2009 6:26 UTC (Thu)
by drag (guest, #31333)
Posted Sep 10, 2009 11:21 UTC (Thu)
by Tracey (guest, #30515)
I wasn't sure when I'd have the time to try them, but later into the night I was tuning up a Fedora 11 system for audio work. After I had set it up and was testing audio latency via the JACK audio system, I decided to start tuning (err, poking things into) some of the scheduler stuff in /proc/sys/kernel.
This was on an older dual-core with 4 GB of RAM running Fedora 11. I tried the scheduler tweaks on kernels 2.6.30.5-43.fc11.x86_64 (stock Fedora) and kernel-rt-2.6.29.6-1.rt23.4.fc11.ccrma.x86_64 (Fernando at CCRMA's real-time patched kernel).
What I was looking for was how low I could get the audio latency without getting xruns in the audio system. I noticed that by tweaking sched_latency_ns, sched_wakeup_granularity_ns, and sched_min_granularity_ns I could get better latency on both the Fedora and CCRMA kernels.
The testing mostly consisted of starting JACK from qjackctl, starting the Hydrogen drum machine and sometimes another soft-synth, then starting glxgears and dragging it or something else quickly around the screen. I also opened Firefox and other things, just to try to harass the audio session.
I could get the Fedora kernel down to about 5 ms latency and the CCRMA-rt kernel to just above 1 ms latency using the scheduler tweaks. That was an improvement of 30-50% over the kernel defaults. So I did prove, to myself at least, that the CFS scheduler can be tweaked. Of course, the system load took something of a hit (just as I was told it would).
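For reference, the tunables mentioned above live under /proc/sys/kernel on kernels of that era. A read-only sketch that just prints their current values - the 10 ms write shown in the trailing comment is purely illustrative, not a recommendation:

```shell
# Sketch: inspect the CFS tunables discussed above. Read-only; the
# files only exist on CFS kernels of this vintage, so each read is
# guarded. Writing new values (root required) is shown only as a comment.
K=/proc/sys/kernel
for t in sched_latency_ns sched_min_granularity_ns sched_wakeup_granularity_ns; do
    if [ -r "$K/$t" ]; then
        echo "$t = $(cat "$K/$t") ns"
    else
        echo "$t: not present on this kernel"
    fi
done
# To lower the latency target to an illustrative 10 ms (as root):
#   echo 10000000 > /proc/sys/kernel/sched_latency_ns
```

Values are in nanoseconds, so typos are easy to make; printing before writing is a cheap sanity check.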
Anyway, here's the really funny part: after I set the scheduler parameters lower, I "noticed" that the screen was smoother and more responsive. Totally subjective on my part. Of course, it was very late and I needed sleep.
This whole BFS versus CFS thing seems to be a black hole that likes to tear up the folks who get too close to it.
Posted Sep 10, 2009 15:54 UTC (Thu)
by mingo (guest, #31122)
Anyway, here's the really funny part: after I set the scheduler parameters lower, I "noticed" that the screen was smoother and more responsive. Totally subjective on my part. Of course, it was very late and I needed sleep.
That's very much possible. The upstream scheduler is a deadline scheduler in essence, and /proc/sys/kernel/sched_latency_ns sets the latency target. The scheduler tries to schedule tasks so that no task ever gets a longer delay than this latency target (i.e., no task misses its deadline).
The defaults on 2.6.31 are 20 msecs for 1-CPU systems, 40 msecs for 2-CPU systems and 60 msecs for 4-CPU systems (etc. - growing logarithmically by CPU count).
A smaller value there means more scheduling - but also faster reaction and 'smoother' mixing of workloads. So if you lower your 40 msecs down to 20 msecs, you could get a "two times smoother" visual experience for certain GUI workloads.
You can think of it as if your 50 Hz flickering screen went to 100 Hz by halving its latency target. Such changes can affect the subjective end result rather spectacularly.
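Those defaults follow a simple rule: the 20 ms base target scaled by (1 + log2(ncpus)), which matches the logarithmic growth described above. A quick shell check of the quoted numbers (the formula here is inferred from the 20/40/60 ms figures, not taken from the kernel source):

```shell
# Compute latency_target = 20 ms * (1 + log2(ncpus)) for a few CPU
# counts; the 1-, 2- and 4-CPU results match the 20/40/60 ms defaults
# quoted above.
for ncpus in 1 2 4 8; do
    n=$ncpus; log2=0
    while [ "$n" -gt 1 ]; do n=$((n / 2)); log2=$((log2 + 1)); done
    target=$((20 * (1 + log2)))
    echo "ncpus=$ncpus -> latency target ${target} ms"
done
```

Note how slowly the target grows: doubling the CPU count adds a constant 20 ms rather than doubling the target.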
It would be nice if you documented your latency parameter changes so that we could consider them for the mainline scheduler. Those parameters were always meant to be tweaked (and regularly were), with the effects re-measured.
The latest scheduler tree (the 2.6.32 scheduler bits) also has them lowered - you can test it by booting the -tip kernel.
Does the -tip tree feel more interactive to you, or do you still need to lower the latency targets there too?
(Feel free to report it in email or here on LWN.net.)
Posted Sep 17, 2009 6:34 UTC (Thu)
by eduperez (guest, #11232)
The defaults on 2.6.31 are 20 msecs for 1-CPU systems, 40 msecs for 2-CPU systems and 60 msecs for 4-CPU systems (etc. - growing logarithmically by CPU count).
From my complete ignorance of how it works, may I ask why? This seems counter-intuitive to me: as the number of CPUs increases, users expect to feel lower latency; and having more CPUs means the scheduler has an easier time finding an empty CPU where the delayed task can execute. Thanks.
Posted Sep 17, 2009 16:44 UTC (Thu)
by dlang (guest, #313)
Posted Sep 21, 2009 12:56 UTC (Mon)
by eduperez (guest, #11232)
Posted Sep 10, 2009 10:45 UTC (Thu)
by ctg (guest, #3459)
Posted Sep 10, 2009 22:21 UTC (Thu)
by Velmont (guest, #46433)
Yes, that email should really be read. It gives me a warm fuzzy feeling all over. Just a quote from the mail Con Kolivas sent [snip]
Posted Sep 10, 2009 11:58 UTC (Thu)
by liw (subscriber, #6379)
Posted Sep 10, 2009 13:08 UTC (Thu)
by mjthayer (guest, #39183)
(That is, the blips should be irrelevant for throughput once you average things out over a period of time, and as far as the responsiveness is concerned, they have already happened and you can no longer take them back.)
Posted Sep 10, 2009 15:22 UTC (Thu)
by anton (subscriber, #25547)
Posted Sep 10, 2009 16:18 UTC (Thu)
by mingo (guest, #31122)
Beyond IO bound tasks, there's also a general quality argument behind rewarding sleepers:
Lighter, leaner tasks get an advantage. They run less and consequently sleep more.
Tasks that do intelligent multi-threading with a nice, parallel set of tasks get an advantage too.
CPU hogs that slow down the desktop and eat battery like the end of the world is nigh should take a back seat compared to lighter, friendlier, 'more interactive' tasks.
So the Linux scheduler always tried to reward tasks that are more judicious with CPU resources. An app can get 10% snappier by using 5% less CPU time.
Posted Sep 10, 2009 19:08 UTC (Thu)
by mjthayer (guest, #39183)
And regarding the rewarding of processes, it sounds a bit like the scheduler wants to know better than the user what the user wants. It would be much less heavy-handed to just let the user know that a thread was not behaving nicely, and to let the user deal with it; they might have a good reason for running it, after all.
Just my thoughts, not to be given more weight than they deserve.
Posted Sep 10, 2009 19:20 UTC (Thu)
by mingo (guest, #31122)
I agree with your observations - these are the basic tradeoffs to consider.
Note that the reward for tasks is limited. (unlimited would open up a starvation hole)
But you are right to suggest that the scheduler should not be guessing about the purpose of tasks.
So this capability was always kept optional, and was turned on/off during the fair scheduler's evolution, mainly driven by user feedback and by benchmarks. We might turn it off again - there are indications that it's causing problems.
Posted Sep 10, 2009 21:11 UTC (Thu)
by anton (subscriber, #25547)
Posted Sep 11, 2009 5:18 UTC (Fri)
by mjthayer (guest, #39183)
Posted Sep 11, 2009 5:20 UTC (Fri)
by mjthayer (guest, #39183)
Posted Sep 10, 2009 13:10 UTC (Thu)
by busterb (subscriber, #560)
http://www.cyanogenmod.com/home/4-1-6-is-here-with-100-mo...
Posted Sep 10, 2009 14:58 UTC (Thu)
by kirkengaard (guest, #15022)
Posted Sep 11, 2009 18:03 UTC (Fri)
by iabervon (subscriber, #722)
BFS may give better interactivity by not giving X clients as low latency in their attempts to generate work for the server. Also, disabling the "new fair sleepers" feature helps some people, which also suggests that this is actually effectively a priority inversion problem between the task that seems to be slow and tasks that are doing work on behalf of that task and are actually slow.
Posted Sep 12, 2009 18:04 UTC (Sat)
by Thalience (subscriber, #4217)
I think the best response we can give to this assertion is the same one used in the audiophile community: Blind A/B tests. Have someone else switch between the two systems while doing the subjective evaluation. Don't tell the user which one is which. If they consistently prefer one over the other, perhaps there is a real effect. Otherwise....
Posted Sep 17, 2009 16:55 UTC (Thu)
by realnc (guest, #60393)
As much as we find it frustrating that there will always be some people who insist that they perceive a subjective improvement that is not measured by any benchmark you care to name, it is human nature. I think the best response we can give to this assertion is the same one used in the audiophile community: Blind A/B tests.
I don't need an ABX test to tell that sound stops with CFS if I start alt+tabbing, while the sound continues playing when doing the same with BFS. Unless you think that some psychosomatic effect exists within BFS that makes me hear stuff that isn't there. :)
benchmarks
Some notes from the BFS discussion - and Con Kolivas responded...
I did not notice that those latencies were _per-CPU_, and (wrongly) assumed they were _global_...; it makes a lot more sense now; thanks.
What does please me now, though, is that this message thread is finally
concentrating on what BFS was all about. The fact that it doesn't scale is no
mystery whatsoever. The fact that throughput and lack of scaling was
what was given attention was missing the point entirely. To point that out I
used the bluntest response possible, because I know that works on lkml (does
it not?). Unfortunately I was so blunt that I ended up writing it in another
language; Troll. So for that, I apologise.
It pleases me immensely to see that it has already spurred on a flood of
changes to the interactivity side of mainline development in its few days of
existence, including some ideas that BFS uses itself. That in itself, to me,
means it has already started to accomplish its goal.
Benchmarking the scheduler on a desktop machine
what benefit is there to trying to adjust a process's
scheduling based on how much time it got in the past?
If a process is I/O-bound (e.g., waiting for the user most of the time
rather than computing), it can be useful to prefer it, because it will
give a faster response to the user (or, for disk-bound processes, it
will come up faster with the next request for the disk, increasing disk
utilization and, hopefully, reducing total run time), whereas a
CPU-bound process usually does not benefit from getting its timeslice
now rather than later.
Does a process rendering animation, or mixing music which
is played back as it is rendered/mixed, fare well enough here?
Such processes normally won't use all of the CPU (unless the CPU is
too slow for them), because they are limited by the speed at which
they want to play back the content, so a scheduler preferring sleepers
over CPU hogs will prefer them over, say, oggenc. Of course, a
browser might get even more preferential treatment, which you may not
want; and clock scaling will tend to make anything that consumes a
significant, mostly-constant amount of CPU look almost CPU-bound if
it is alone on the CPU (but then it does not really matter).
It would be much less heavy-handed to just let the user
know that a thread was not behaving nicely, and to let the user deal
with it.
Traditionally, Unix had nice for that. I'm not sure
this still works properly with current Linux schedulers; the
last time I tried, it did not work well.
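The traditional mechanism is easy to demonstrate: with GNU coreutils, nice run with no command prints the current niceness, so a relative adjustment is directly visible (a minimal sketch; it assumes a default shell session, typically at niceness 0):

```shell
# Run `nice` (which, with no command, prints the current niceness)
# under a +10 adjustment. In a default session base is 0 and adjusted
# is 10; if the shell already runs at a nonzero niceness, the
# adjustment is relative to that (and clamps at 19).
base=$(nice)                 # current niceness, typically 0
adjusted=$(nice -n 10 nice)  # niceness seen by a child started with +10
echo "base=$base adjusted=$adjusted"
```

Whether the scheduler then actually gives the niced CPU hog proportionally less time under load is exactly the behavior the commenter is questioning.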
Reactions appear to be a noticeable increase in responsiveness and a decrease in stability.
