LWN: Comments on "Some notes from the BFS discussion" https://lwn.net/Articles/351499/ This is a special feed containing comments posted to the individual LWN article titled "Some notes from the BFS discussion". en-us Mon, 03 Nov 2025 19:32:31 +0000 Mon, 03 Nov 2025 19:32:31 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Some notes from the BFS discussion - and Con Kolivas responded... https://lwn.net/Articles/353575/ https://lwn.net/Articles/353575/ eduperez <div class="FormattedComment"> per-cpu!!!<br> I did not notice that those latencies where _per-cpu_, and (wrongly) assumed they where _global_...; it makes a lot more sense, now; thanks.<br> </div> Mon, 21 Sep 2009 12:56:42 +0000 Some notes from the BFS discussion https://lwn.net/Articles/353251/ https://lwn.net/Articles/353251/ realnc <p><i>As much as we find it frustrating that there will always be some people who insist that they perceive a subjective improvement that is not measured by any benchmark you care to name, it is human nature.</i></p> <p><i>I think the best response we can give to this assertion is the same one used in the audiophile community: Blind A/B tests.</i></p> <p>I don't need an ABX test to tell that sound stops with CFS if I start alt+tabbing while the sound continues playing when doing the same with BFS. Unless you think that some psychosomatic effect exists within BFS that makes me hear stuff that isn't there :)</p> Thu, 17 Sep 2009 16:55:01 +0000 Some notes from the BFS discussion - and Con Kolivas responded... https://lwn.net/Articles/353248/ https://lwn.net/Articles/353248/ dlang <div class="FormattedComment"> shorter time slices are inefficient (remember cache is many times faster than ram) so with more CPUs you can let the per-cpu latency creep higher and get equivalent or better overall responsiveness due to the additional CPUs being available to do the work.<br> </div> Thu, 17 Sep 2009 16:44:08 +0000 Some notes from the BFS discussion - and Con Kolivas responded... https://lwn.net/Articles/353161/ https://lwn.net/Articles/353161/ eduperez <p><i>The defaults on 2.6.31 are 20 msecs for 1-CPU systems, 40 msecs for 2-CPU systems and 60 msecs for 4-CPU systems (etc. - growing logarithmically by CPU count).</i></p> <p>From my complete ignorance of how it works, may I ask why? This seems counter-intuitive to me: as the number of CPU's increase, users expect to feel a lower latency; and having more CPU's means the scheduler has it easier to find and empty CPU where the delayed task can execute. Thanks.</p> Thu, 17 Sep 2009 06:34:53 +0000 Some notes from the BFS discussion https://lwn.net/Articles/352358/ https://lwn.net/Articles/352358/ Thalience <div class="FormattedComment"> As much as we find it frustrating that there will always be some people who insist that they perceive a subjective improvement that is not measured by any benchmark you care to name, it is human nature.<br> <p> I think the best response we can give to this assertion is the same one used in the audiophile community: Blind A/B tests. Have someone else switch between the two systems while doing the subjective evaluation. Don't tell the user which one is which. If they consistently prefer one over the other, perhaps there is a real effect. 
Some notes from the BFS discussion https://lwn.net/Articles/352358/ https://lwn.net/Articles/352358/ Thalience <div class="FormattedComment"> As much as we find it frustrating that there will always be some people who insist that they perceive a subjective improvement that is not measured by any benchmark you care to name, it is human nature.<br> <p> I think the best response we can give to this assertion is the same one used in the audiophile community: Blind A/B tests. Have someone else switch between the two systems while doing the subjective evaluation. Don't tell the user which one is which. If they consistently prefer one over the other, perhaps there is a real effect. Otherwise....<br> </div> Sat, 12 Sep 2009 18:04:47 +0000 Some notes from the BFS discussion https://lwn.net/Articles/352184/ https://lwn.net/Articles/352184/ iabervon <div class="FormattedComment"> I think one issue is that a lot of interactivity issues probably come down to the X server not using its share of the CPU time to process the thing that the user is actually watching. I have the sneaking suspicion that Maelstrom is getting the CPU time to generate plenty of frames each second, and sending them off to X, which is not quite keeping up. Once the X server is sufficiently behind, there's back pressure on Maelstrom's requests, at which point Maelstrom seems to be I/O-bound, and therefore gets all the CPU it can use to generate more frames, ensuring plenty of lag between the time that Maelstrom generates a frame and the time that the user sees it. And, of course, the benchmarks all look really good, because the game never has to wait for the processor before generating a frame and the X server is drawing lots of frames. And, of course, the game is effectively trying to benchmark the system, in order to determine how closely spaced frames should be, and our efficient system has hidden the work that it is trying to measure.<br> <p> BFS may give better interactivity by not giving X clients such low latency in their attempts to generate work for the server. Also, disabling the "new fair sleepers" feature helps some people, which suggests that this is effectively a priority-inversion problem between the task that seems to be slow and the tasks that are doing work on its behalf and are actually slow.<br> <p> </div> Fri, 11 Sep 2009 18:03:58 +0000 Some notes from the BFS discussion https://lwn.net/Articles/352104/ https://lwn.net/Articles/352104/ mjthayer <div class="FormattedComment"> s/algorithm/heuristic/. And of course, since CFS considers the behaviour of the process over a long period of time, this effect should be somewhat limited.<br> </div> Fri, 11 Sep 2009 05:20:43 +0000 Some notes from the BFS discussion https://lwn.net/Articles/352103/ https://lwn.net/Articles/352103/ mjthayer <div class="FormattedComment"> A process like gcc, which alternates between being I/O-bound and CPU-bound, may also get more than its share under this algorithm - perhaps that is why people always give build processes as examples of what negatively affects their interactivity?<br> </div> Fri, 11 Sep 2009 05:18:16 +0000 Some notes from the BFS discussion - and Con Kolivas responded... https://lwn.net/Articles/352072/ https://lwn.net/Articles/352072/ Velmont <p>Yes, that email should really be read. It gives me a warm fuzzy feeling all over. Just a quote from <a href="http://thread.gmane.org/gmane.linux.kernel/886319/focus=887636">the mail Con Kolivas sent</a>:</p> <blockquote>What does please me now, though, is that this message thread is finally concentrating on what BFS was all about. The fact that it doesn't scale is no mystery whatsoever. The fact that throughput and lack of scaling was what was given attention was missing the point entirely. To point that out I used the bluntest response possible, because I know that works on lkml (does it not?). Unfortunately I was so blunt that I ended up writing it in another language; Troll.
So for that, I apologise.</blockquote> <p>[snip]</p> <blockquote>It pleases me immensely to see that it has already spurred on a flood of changes to the interactivity side of mainline development in its few days of existence, including some ideas that BFS uses itself. That in itself, to me, means it has already started to accomplish its goal.</blockquote> Thu, 10 Sep 2009 22:21:25 +0000 Some notes from the BFS discussion https://lwn.net/Articles/352049/ https://lwn.net/Articles/352049/ anton <blockquote> Does a process rendering animation, or mixing music that is played back as it is rendered/mixed, fare well enough here? </blockquote> Such processes normally won't use all of the CPU (unless the CPU is too slow for them), because they are limited by the rate at which they want to play back the content, so a scheduler preferring sleepers over CPU hogs will prefer them over, say, oggenc. Of course, a browser might get even more preferred treatment, which you may not want; and clock scaling will tend to make a task that consumes a significant, mostly-constant amount of CPU look almost CPU-bound if it is alone on the CPU (but then it does not really matter). <blockquote>It would be much less heavy-handed to just let the user know that a thread was not behaving nicely, and to let the user deal with it.</blockquote> Traditionally Unix had <code>nice</code> for that. I'm not sure that this still works properly with current Linux schedulers. <a href="http://groups.google.com/group/comp.arch/browse_thread/thread/3bdd9b476a00c0ab/f3f33cdd7c00bd2c#f3f33cdd7c00bd2c">The last time I tried</a> it did not work well. Thu, 10 Sep 2009 21:11:50 +0000 Some notes from the BFS discussion https://lwn.net/Articles/352026/ https://lwn.net/Articles/352026/ mingo <p> I agree with your observations - these are the basic tradeoffs to consider. <p> Note that the reward for tasks is limited (an unlimited reward would open up a starvation hole). <p> But you are right to suggest that the scheduler should not be guessing about the purpose of tasks. <p> So this capability was always kept optional, and was turned on and off during the fair scheduler's evolution, mainly driven by user feedback and by benchmarks. We might turn it off again - there are indications that it's causing problems. <p> Thu, 10 Sep 2009 19:20:16 +0000
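<p>As anton notes above, <code>nice</code> is the traditional Unix mechanism for the "let the user deal with it" approach mjthayer suggests below: deprioritize a CPU hog explicitly rather than have the scheduler guess. Here is a minimal sketch of that approach using the standard <code>setpriority(2)</code> call; the program and the +10 value are illustrative assumptions, not anything proposed in the thread.</p>
<pre><code>/* Minimal sketch: renice an already-running process from C, the way
 * the nice/renice utilities do. Usage: ./denice <pid> */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/resource.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    pid_t pid = (pid_t)atoi(argv[1]);

    /* A positive nice value makes the task yield to default-priority
     * (nice 0) tasks; only root may lower the value again afterwards. */
    if (setpriority(PRIO_PROCESS, pid, 10) != 0) {
        perror("setpriority");
        return 1;
    }
    printf("pid %d reniced to +10\n", (int)pid);
    return 0;
}</code></pre>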
Some notes from the BFS discussion https://lwn.net/Articles/352019/ https://lwn.net/Articles/352019/ mjthayer <div class="FormattedComment"> Hm, I'm not sure that either of those arguments quite convinces me :) As far as the "CPU-bound processes don't need good latencies" argument is concerned, well, it is a heuristic, and heuristics are only as good as the set of usage cases that the author thought of. Does a process rendering animation, or mixing music that is played back as it is rendered/mixed, fare well enough here? You are of course much better qualified than me to think of the possible edge cases of the heuristic...<br> <p> And regarding the rewarding of processes, it sounds a bit like the scheduler wanting to know better than the user what the user wants. It would be much less heavy-handed to just let the user know that a thread was not behaving nicely, and to let the user deal with it. They might have a good reason for running it after all.<br> <p> Just my thoughts, not to be given more weight than they deserve.<br> </div> Thu, 10 Sep 2009 19:08:17 +0000 Some notes from the BFS discussion https://lwn.net/Articles/351978/ https://lwn.net/Articles/351978/ mingo <p> Beyond I/O-bound tasks, there's also a general quality argument behind rewarding sleepers: <p> <i>Lighter</i>, leaner tasks get an advantage. They run less and subsequently sleep more. <p> Tasks that do <i>intelligent multi-threading</i> with a nice, parallel set of tasks get an advantage too. <p> CPU hogs that slow down the desktop and eat battery like the end of the world is nigh should take a back seat compared to lighter, friendlier, 'more interactive' tasks. <p> So the Linux scheduler has always tried to reward tasks that are more judicious with CPU resources. An app can get 10% snappier by using 5% less CPU time. <p> Thu, 10 Sep 2009 16:18:31 +0000 benchmarks https://lwn.net/Articles/351967/ https://lwn.net/Articles/351967/ mingo <p> <i> It sounds as if Con could have much greater effect by posting benchmarks that mimic what he and those who agree with him consider typical use cases, and that do poorly under the current scheduler. The kernel people seem to be pretty good at hitting numeric targets they can reproduce. </i> <p> Note that Con did write a tool that measures various latency aspects of the kernel scheduler: InterBench. <p> Interestingly, the BFS vs. mainline numbers Con posted show a <a href="http://ck.kolivas.org/patches/bfs/interbench-bfs-cfs.txt">mainline desktop latency advantage</a> (also <a href="http://marc.info/?l=linux-kernel&m=125240476527775&w=2">here</a>). <p> (Caveat emptor: I have not done those measurements, so I don't know how reliable they are - the standard deviation seems very high.) <p> Note that you don't 'have to' come up with a numeric result - a deterministic result that is described well and can be reproduced by a kernel developer is useful too. <p> Obviously, numeric results have the not-to-be-underestimated advantage of removing subjective bias from tests: they turn a subjective impression into a hard number that cannot be ignored by either side of an argument. On the flip side, they are harder to obtain; latencytop should help out there, for example. <p> Thu, 10 Sep 2009 16:10:26 +0000 Some notes from the BFS discussion - and Con Kolivas responded... https://lwn.net/Articles/351965/ https://lwn.net/Articles/351965/ mingo <p> <i> Anyway, here's the real funny part: after I would set the scheduler parameters lower, I "noticed" that the screen was smoother and more responsive. Totally subjective on my part. Of course, it was very late and I needed sleep. </i> <p> That's very much possible. The upstream scheduler is a deadline scheduler in essence, and /proc/sys/kernel/sched_latency_ns sets the latency target. The scheduler tries to schedule tasks so that no task ever gets a longer delay than this latency target (i.e. no task misses its deadline). <p> The defaults on 2.6.31 are 20 msecs for 1-CPU systems, 40 msecs for 2-CPU systems and 60 msecs for 4-CPU systems (etc. - growing logarithmically with CPU count). <p> A smaller value there means more scheduling - but also faster reaction and 'smoother' mixing of workloads. So if you lower your 40 msecs down to 20 msecs, you could get a "two times smoother" visual experience for certain GUI workloads. <p> You can think of it as if your 50 Hz flickering screen went to 100 Hz by halving its latency target.
Such changes can affect the subjective end result rather spectacularly. <p> It would be nice if you documented your latency parameter changes so that we could consider them for the mainline scheduler. Those parameters were always meant to be (and regularly were) tweaked, and their effects re-measured. <p> The latest scheduler tree (the 2.6.32 scheduler bits) also has them lowered - you can test it by booting the <a href="http://people.redhat.com/mingo/tip.git/README">-tip kernel</a>. <p> Does the -tip tree feel more interactive to you, or do you still need to lower the latency targets there too? <p> (Feel free to report it in email or here on LWN.net.) <p> Thu, 10 Sep 2009 15:54:30 +0000 Some notes from the BFS discussion https://lwn.net/Articles/351958/ https://lwn.net/Articles/351958/ anton <blockquote>what benefit is there in trying to adjust a process's scheduling based on how much time it got in the past?</blockquote> If a process is I/O-bound (e.g., waiting for the user most of the time rather than computing), it can be useful to prefer it, because it will give a faster response to the user (or, for disk-bound processes, it will come up faster with the next request for the disk, increasing disk utilization and hopefully reducing total run-time), whereas a CPU-bound process usually does not benefit from getting its timeslice now rather than later. Thu, 10 Sep 2009 15:22:40 +0000 Some notes from the BFS discussion https://lwn.net/Articles/351941/ https://lwn.net/Articles/351941/ kirkengaard <div class="FormattedComment"> Cum grano salis -- while this has multiple anecdotes behind it, I'm not sure you can safely chalk instability in a development Android firmware image up to a BFS problem without isolating that change and doing real testing.<br> </div> Thu, 10 Sep 2009 14:58:41 +0000 Some notes from the BFS discussion https://lwn.net/Articles/351917/ https://lwn.net/Articles/351917/ busterb <div class="FormattedComment"> One of the hacked Android firmwares has switched to BFS in its experimental branch. Early reactions appear to be a noticeable increase in responsiveness and a decrease in stability.<br> <p> <a href="http://www.cyanogenmod.com/home/4-1-6-is-here-with-100-more-jet-fuel">http://www.cyanogenmod.com/home/4-1-6-is-here-with-100-mo...</a><br> </div> Thu, 10 Sep 2009 13:10:16 +0000
Some notes from the BFS discussion https://lwn.net/Articles/351911/ https://lwn.net/Articles/351911/ mjthayer <div class="FormattedComment"> This has actually made me wonder - what benefit is there in trying to adjust a process's scheduling based on how much time it got in the past? It seems to me that unless you are giving very hard QoS guarantees, or something is wrong with your scheduler algorithm, any deviations from the time the process should have had will be blips that can be ignored in the longer run, and trying to compensate for them is likely to introduce unnecessary complexity.<br> <p> (That is, the blips should be irrelevant for throughput once you average things out over a period of time, and as far as responsiveness is concerned, they have already happened and you can no longer take them back.)<br> </div> Thu, 10 Sep 2009 13:08:18 +0000 Benchmarking the scheduler on desktop machine https://lwn.net/Articles/351903/ https://lwn.net/Articles/351903/ liw <div class="FormattedComment"> Until someone comes up with good, generally accepted, objective benchmarks for schedulers with good coverage, perhaps it would make sense to benchmark things using double-blind tests and real people.<br> </div> Thu, 10 Sep 2009 11:58:20 +0000 Some notes from the BFS discussion - and Con Kolivas responded... https://lwn.net/Articles/351889/ https://lwn.net/Articles/351889/ Tracey <div class="FormattedComment"> After reading (or trying to keep up with) all of the messages on LWN, I went through the discussion and noted a few of Ingo's scheduler tuning parameters.<br> <p> I wasn't sure when I'd have the time to try them, but later into the night I was tuning up a Fedora 11 system for audio work. After I had set it up and was testing audio latency via the JACK audio system, I decided to start tuning (err, poking things into) some of the scheduler stuff in /proc/sys/kernel.<br> <p> This was on an older dual core with 4 GB of RAM running Fedora 11. I tried the scheduler tweaks on the kernels 2.6.30.5-43.fc11.x86_64 (stock Fedora) and kernel-rt-2.6.29.6-1.rt23.4.fc11.ccrma.x86_64 (Fernando at CCRMA's real-time patched kernel).<br> <p> What I was looking for was how low I could get the audio latency without getting xruns in the audio system. I noticed that when tweaking sched_latency_ns, sched_wakeup_granularity_ns, and sched_min_granularity_ns I could get better latency on both the Fedora and CCRMA kernels.<br> <p> The testing mostly consisted of starting JACK from qjackctl, starting the Hydrogen drum machine and sometimes another soft-synth, then starting glxgears and dragging it or something else quickly around the screen. I also opened Firefox and other things, just to try to harass the audio session.<br> <p> I could get the Fedora kernel down to about 5 msec latency and the CCRMA real-time kernel to just above 1 msec latency while using the scheduler tweaks. That was an improvement of 30-50% over the kernel defaults. So I did prove, to myself at least, that the CFS scheduler can be tweaked. Of course, the system load took somewhat of a hit (just as I was told it would).<br> <p> Anyway, here's the real funny part: after I would set the scheduler parameters lower, I "noticed" that the screen was smoother and more responsive. Totally subjective on my part. Of course, it was very late and I needed sleep.<br> <p> This whole BFS versus CFS thing seems to be a black hole that likes to tear up the folks who get too close to it.<br> </div> Thu, 10 Sep 2009 11:21:11 +0000 Some notes from the BFS discussion - and Con Kolivas responded... https://lwn.net/Articles/351892/ https://lwn.net/Articles/351892/ ctg <div class="FormattedComment"> Thanks for pointing that out.
It is a critical piece of the story - enough to warrant our editor updating the article, IMHO!<br> <p> </div> Thu, 10 Sep 2009 10:45:43 +0000 Some notes from the BFS discussion - and Con Kolivas responded... https://lwn.net/Articles/351842/ https://lwn.net/Articles/351842/ drag <div class="FormattedComment"> It was critical that Con actually had code to show for his ideas and was eating his own dogfood.<br> </div> Thu, 10 Sep 2009 06:26:44 +0000 Some notes from the BFS discussion - and Con Kolivas responded... https://lwn.net/Articles/351832/ https://lwn.net/Articles/351832/ fredrik <div class="FormattedComment"> In Con Kolivas's defense, he actually came back with what I think is a very reasonable response later in the thread.<br> <p> <a href="http://thread.gmane.org/gmane.linux.kernel/886319/focus=887636">http://thread.gmane.org/gmane.linux.kernel/886319/focus=8...</a><br> <p> And if I interpret the thread correctly, it seems like Ingo Molnar and Jens Axboe actually managed to pinpoint and fix a latency-related issue in CFS - an issue that might have gone undetected if it hadn't been for BFS.<br> <p> Yay for "trolls" that spur kernel improvement. ;)<br> <p> </div> Thu, 10 Sep 2009 06:12:57 +0000 benchmarks https://lwn.net/Articles/351806/ https://lwn.net/Articles/351806/ ncm <div class="FormattedComment"> It sounds as if Con could have much greater effect by posting benchmarks that mimic what he and those who agree with him consider typical use cases, and that do poorly under the current scheduler. The kernel people seem to be pretty good at hitting numeric targets they can reproduce.<br> </div> Thu, 10 Sep 2009 02:34:50 +0000
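<p>A closing note for anyone wanting to reproduce Tracey's experiment above: the three tunables mentioned (sched_latency_ns, sched_wakeup_granularity_ns, sched_min_granularity_ns) live under /proc/sys/kernel and take nanosecond values, written as root. The sketch below shows the mechanics only; the 10 msec and 2 msec values are made-up placeholders, since Tracey's exact numbers are not given in the comment.</p>
<pre><code>/* Minimal sketch: apply CFS latency tweaks by writing the /proc/sys/kernel
 * tunables named in the thread. Values are illustrative, not recommended. */
#include <stdio.h>

/* write one nanosecond-valued scheduler tunable; returns 0 on success */
static int set_tunable(const char *name, long long ns)
{
    char path[128];
    FILE *f;

    snprintf(path, sizeof(path), "/proc/sys/kernel/%s", name);
    f = fopen(path, "w");            /* needs root */
    if (!f) {
        perror(path);
        return -1;
    }
    fprintf(f, "%lld\n", ns);
    return fclose(f);
}

int main(void)
{
    set_tunable("sched_latency_ns",            10000000LL); /* 10 msec */
    set_tunable("sched_wakeup_granularity_ns",  2000000LL); /*  2 msec */
    set_tunable("sched_min_granularity_ns",     2000000LL); /*  2 msec */
    return 0;
}</code></pre>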