LWN: Comments on "Improving scheduler latency"
https://lwn.net/Articles/404993/
This is a special feed containing comments posted to the individual LWN article titled "Improving scheduler latency".

Improving scheduler latency (Corkscrew, Wed, 22 Sep 2010 20:50:15 +0000)
https://lwn.net/Articles/406595/

Regarding subjective measures like "nicer interactive feel": this should be easy to test. At worst it would take about an hour plus a willing volunteer.

1) Set up both kernel versions so that they can be swapped with minimal effort (e.g. via the bootloader).

2) Give the volunteer a few minutes on the old version, as a baseline.

3) Send the volunteer out of the room. Toss a coin to decide which kernel version to run. Secretly record which version you booted.

4) Leave the room and send the volunteer in. They should play around with the computer and record whether it feels more or less responsive than the baseline.

(Ideally there should be no contact between the developer and the volunteer after the choice in step #3 is made.)

5) Repeat steps #3-#4 (or #2-#4 if you're not worried about the volunteer getting bored). Each second repetition, use the kernel version you didn't use the first time round (this doesn't compromise randomisation and makes the statistics easier).

6) Perform a basic statistical analysis on the results. [Turns out this isn't as basic as I thought, and my battery's about to give out; will look it up tomorrow.]

This is a bit primitive - there are still a few opportunities for bias. A better tool would be a randomised bootloader of some kind that would log the choices of kernel and only reveal them after the tester had stated their preferences. This would have the additional advantage that the developer could do the testing without compromising blinding.

Anyone fancy developing one of those?
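A minimal sketch of what the blinded protocol above, and the sign-test analysis it calls for, could look like. The kernel entry names, the pair counts and the example outcome are all hypothetical, and only the Python standard library (math.comb needs Python 3.8+) is used; the actual reboot and logging plumbing would have to live in the bootloader itself.

```python
import random
from math import comb

# Hypothetical bootloader entry names; the real ones depend on the local setup.
KERNELS = ["2.6.35-vanilla", "2.6.36-rc4-min-slice"]

def plan_trials(n_pairs, rng=random):
    """Blocked randomisation as suggested above: every pair of sessions
    uses both kernels, in an order decided by a coin toss."""
    plan = []
    for _ in range(n_pairs):
        pair = KERNELS[:]
        rng.shuffle(pair)
        plan.extend(pair)
    return plan

def sign_test_p(preferred_new, decisive_pairs):
    """Exact two-sided sign test against a 50/50 null: how likely is a
    preference split at least this lopsided if the tester were guessing?"""
    k = min(preferred_new, decisive_pairs - preferred_new)
    tail = sum(comb(decisive_pairs, i) for i in range(k + 1)) / 2 ** decisive_pairs
    return min(1.0, 2 * tail)

if __name__ == "__main__":
    secret_plan = plan_trials(n_pairs=8)   # sixteen blinded sessions
    print("boot order (keep away from the tester):", secret_plan)
    # Made-up outcome: the tester preferred the patched kernel in 7 of the 8
    # pairs in which they expressed a preference.
    print("p-value:", round(sign_test_p(7, 8), 4))
```

The blocked randomisation mirrors step #5 (each pair contains both kernels, in coin-toss order), and the sign test is one simple choice for step #6: it only uses which kernel the tester preferred in each pair, not how strongly.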
Improving scheduler latency (oak, Sat, 18 Sep 2010 19:03:49 +0000)
https://lwn.net/Articles/406072/

I remember that on x86 the scheduler tick has been increased to 1 kHz. I wonder what happens on (embedded & slower) architectures where the scheduler tick is still e.g. 128 Hz...?

Improving scheduler latency (Julie, Fri, 17 Sep 2010 22:06:00 +0000)
https://lwn.net/Articles/405983/

> Shouldn't the timeslice minimum be based on the speed of the CPU?

I wondered this before too; I assumed the answer was so obvious to everyone else that I would just be exposing my naivety in bringing it up. (So I'm glad you did, because that makes me feel better :-)) After all, CFS scraps the 'arbitrary', un-CPU-centric HZ.

So, um, why _is_ it that we don't scale process scheduling resolution according to the capability of the CPU? Assuming this isn't a silly question.

Improving scheduler latency (clugstj, Fri, 17 Sep 2010 14:52:48 +0000)
https://lwn.net/Articles/405901/

750us is a HUGE amount of time compared to cache misses.

Improving scheduler latency (martinfick, Thu, 16 Sep 2010 21:35:05 +0000)
https://lwn.net/Articles/405798/

This assumes that interactivity is either better or not. It ignores the possibility that with some use cases interactivity might be better with one kernel, and that with other use cases it might be better with the other kernel. So, while your suggestion might work for an exact use case, which use cases are important is itself likely very subjective.

Improving scheduler latency (bfields, Thu, 16 Sep 2010 20:05:39 +0000)
https://lwn.net/Articles/405780/

Isn't the real expense of a context switch in the various kinds of cache misses that it causes? In any case it's more complicated than "speed of the CPU". Maybe there's some way it could be measured dynamically.

Improving scheduler latency (marcH, Thu, 16 Sep 2010 17:17:06 +0000)
https://lwn.net/Articles/405724/

> The result is better latency measurements and, it seems, a nicer interactive feel.

There is a really simple and cheap way to make subjective feelings objective: blind trials. Randomly switch between the two kernels to compare and write down your impressions. After a few runs, look in the logs to see which kernel was being tested when.

Improving scheduler latency (clugstj, Thu, 16 Sep 2010 14:17:28 +0000)
https://lwn.net/Articles/405669/

Whether it's 2ms or 750us, it seems arbitrary. Shouldn't the timeslice minimum be based on the speed of the CPU? That is to say, faster processors can tolerate more context switching before it becomes significant.

"minimum" time slice (rvfh, Thu, 16 Sep 2010 11:39:01 +0000)
https://lwn.net/Articles/405617/

Indeed. Sorry for missing this obvious point :-(

"minimum" time slice (knewt, Thu, 16 Sep 2010 10:20:45 +0000)
https://lwn.net/Articles/405595/

Note the "minimum". Assuming I've read the article correctly, the slice size is 6ms/process_count, until this reaches 750µs, at which point it doesn't reduce any further (all with the default 6ms maximum latency setting). So with only two processes you would have a 3ms slice, as shown in the first example.

Improving scheduler latency (rvfh, Thu, 16 Sep 2010 08:22:15 +0000)
https://lwn.net/Articles/405573/

> A patch making just the minimum time slice change was fast-tracked into the mainline and will be present in 2.6.36-rc5.

I am not an expert, but shouldn't this be a bit more dynamic? I mean, it could impact CPU-constrained systems which usually run few tasks. Maybe the slice could start at 2 ms and reduce down to 750 us as more processes turn up?
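As a rough illustration of the arithmetic knewt describes above, and that rvfh's question about dynamic sizing turns on, here is a minimal sketch of the simplified slice model from these comments: the 6 ms latency target divided by the number of runnable tasks, floored at the new 750 µs minimum. It deliberately ignores nice-level weighting and everything else the real CFS code does; the constants and function are illustrative only.

```python
SCHED_LATENCY_US = 6000    # default targeted latency: 6 ms
MIN_GRANULARITY_US = 750   # minimum slice after the patch: 750 us

def approx_slice_us(nr_running):
    """Rough per-task slice, in microseconds, for nr_running busy tasks:
    the latency target split evenly, never below the minimum granularity."""
    if nr_running <= 0:
        return SCHED_LATENCY_US
    return max(SCHED_LATENCY_US / nr_running, MIN_GRANULARITY_US)

for n in (1, 2, 4, 8, 16, 50):
    print(f"{n:3d} runnable tasks -> ~{approx_slice_us(n):6.0f} us per slice")
```

With the default 6 ms latency target the floor only takes effect once more than eight tasks are runnable; below that, the slice is simply 6 ms divided by the task count, which is why two busy tasks get 3 ms each in the article's first example.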