Some notes from the BFS discussion - and Con Kolivas responded...

Posted Sep 10, 2009 6:26 UTC (Thu) by drag (guest, #31333)
In reply to: Some notes from the BFS discussion - and Con Kolivas responded... by fredrik
Parent article: Some notes from the BFS discussion

It was critical that Con actually has code to show for his ideas and eats his own dogfood.

Some notes from the BFS discussion - and Con Kolivas responded...

Posted Sep 10, 2009 11:21 UTC (Thu) by Tracey (guest, #30515)

After reading (or trying to keep up with) all of the messages on LWN, I went through them and noted a few of Ingo's scheduler tuning parameters.

I wasn't sure when I'd have the time to try them, but later into the night I was tuning up a Fedora 11 system for audio work. After I had set it up and was testing audio latency via the jack-audio system, I decided to start tuning (err, poking things into) some of the scheduler stuff in /proc/sys/kernel.

This was on an older dual core with 4 GB of RAM running Fedora 11. I tried the scheduler tweaks on the kernels 2.6.30.5-43.fc11.x86_64 (stock Fedora) and kernel-rt-2.6.29.6-1.rt23.4.fc11.ccrma.x86_64 (Fernando at CCRMA's real-time patched kernel).

What I was looking for was how low I could get the audio latency without getting xruns in the audio system. I noticed that by tweaking sched_latency_ns, sched_wakeup_granularity_ns, and sched_min_granularity_ns I could get better latency on both the Fedora and CCRMA kernels.
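For anyone wanting to repeat this kind of poking, here is a minimal sketch of reading and writing those three files. It assumes they live under /proc/sys/kernel as described above; kernels other than the ones named in this post may not expose them there, in which case the reader simply reports None. The helper names read_sched_tunables and write_sched_tunable are made up for illustration, and writing requires root.

```python
from pathlib import Path

# The three tunables named in the post (values are in nanoseconds).
TUNABLES = (
    "sched_latency_ns",
    "sched_min_granularity_ns",
    "sched_wakeup_granularity_ns",
)


def read_sched_tunables(base="/proc/sys/kernel"):
    """Return {name: value_in_ns}, or None for a file that is missing or unreadable."""
    values = {}
    for name in TUNABLES:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Assumption: a missing file just means this kernel does not
            # expose the tunable at this path.
            values[name] = None
    return values


def write_sched_tunable(name, value_ns, base="/proc/sys/kernel"):
    """Write one tunable. Needs root when base is the real /proc/sys/kernel."""
    if name not in TUNABLES:
        raise ValueError(f"unknown tunable: {name}")
    (Path(base) / name).write_text(f"{int(value_ns)}\n")
```

Calling read_sched_tunables() before and after each change makes it easy to note down which values were in effect for a given test run, and a change made this way does not survive a reboot.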

The testing mostly consisted of starting jack from qjackctl, starting the hydrogen drum machine and sometimes another soft-synth, then starting glxgears and dragging it or something else quickly around the screen. I also opened Firefox and other things, just to try to harass the audio session.

I could get the Fedora kernel down to about 5 msec latency and the ccrma-rt just above 1 msec latency while using the scheduler tweaks. That was an improvement of 30-50% over the kernel defaults. So I did prove, to myself at least, that the CFS scheduler can be tweaked. Of course, the system load took somewhat of a hit (just as I was told it would).

Anyway, here's the real funny part: After I would set the scheduler parameters lower I "noticed" that the screen was smoother and more responsive. Totally subjective on my part. Of course, it was very late and I needed sleep.

This whole BFS versus CFS thing seems to be a black hole that likes to tear up the folks who get too close to it.

Some notes from the BFS discussion - and Con Kolivas responded...

Posted Sep 10, 2009 15:54 UTC (Thu) by mingo (guest, #31122)

That's very much possible. The upstream scheduler is a deadline scheduler in essence, and /proc/sys/kernel/sched_latency_ns sets the latency target. The scheduler tries to schedule tasks so that no task ever gets a longer delay than this latency target (i.e. no task misses its deadline).

The defaults on 2.6.31 are 20 msecs for 1-CPU systems, 40 msecs for 2-CPU systems and 60 msecs for 4-CPU systems (etc. - growing logarithmically by CPU count).
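As a rough illustration, the three figures above fit a simple rule: 20 msecs times (1 + log2 of the CPU count). The sketch below encodes that rule. The rounding down for CPU counts that are not powers of two is an assumption made here to keep the example simple, and default_latency_ms is a made-up name, not something taken from the kernel source.

```python
BASE_LATENCY_MS = 20  # the 1-CPU figure quoted above


def default_latency_ms(ncpus):
    """Latency target in msecs implied by the quoted 2.6.31 defaults.

    Assumption: the scale factor is 1 + floor(log2(ncpus)), which
    reproduces the 1-, 2- and 4-CPU figures quoted above.
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    # int.bit_length() - 1 is floor(log2(n)) for any positive integer n.
    factor = 1 + (ncpus.bit_length() - 1)
    return BASE_LATENCY_MS * factor


for n in (1, 2, 4):
    print(n, "CPUs:", default_latency_ms(n), "msecs")
```

Under this rule each doubling of the CPU count adds another 20 msecs to the target rather than doubling it.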

A smaller value there means more scheduling - but also faster reaction and 'smoother' mixing of workloads. So if you lower your 40 msecs down to 20 msecs, you could get a "two times smoother" visual experience for certain GUI workloads.

You can think of it as if your 50 Hz flickering screen went to 100 Hz by halving its latency target. Such changes can affect the subjective end result rather spectacularly.

It would be nice if you documented your latency parameter changes so that we could consider them for the mainline scheduler. Those parameters were always meant to be (and regularly were) tweaked, with their effects re-measured.

The latest scheduler tree (the 2.6.32 scheduler bits) also has them lowered - you can test it by booting the -tip kernel.

Does the -tip tree feel more interactive to you, or do you still need to lower the latency targets there too?

(Feel free to report it in email or here on LWN.net.)

Some notes from the BFS discussion - and Con Kolivas responded...

Posted Sep 17, 2009 6:34 UTC (Thu) by eduperez (guest, #11232)

"The defaults on 2.6.31 are 20 msecs for 1-CPU systems, 40 msecs for 2-CPU systems and 60 msecs for 4-CPU systems (etc. - growing logarithmically by CPU count)."

From my complete ignorance of how it works, may I ask why? This seems counter-intuitive to me: as the number of CPUs increases, users expect to feel a lower latency, and having more CPUs makes it easier for the scheduler to find an empty CPU where the delayed task can execute. Thanks.

Some notes from the BFS discussion - and Con Kolivas responded...

Posted Sep 17, 2009 16:44 UTC (Thu) by dlang (guest, #313)

Shorter time slices are inefficient (remember that cache is many times faster than RAM, and a task that has just been switched in has to refill the cache), so with more CPUs you can let the per-CPU latency creep higher and still get equivalent or better overall responsiveness, because the additional CPUs are available to do the work.

Some notes from the BFS discussion - and Con Kolivas responded...

Posted Sep 21, 2009 12:56 UTC (Mon) by eduperez (guest, #11232)

per-cpu!!!
I did not notice that those latencies were _per-cpu_, and (wrongly) assumed they were _global_. It makes a lot more sense now; thanks.