RDSL and ignoring feedback

Posted Jul 26, 2007 13:35 UTC (Thu) by corbet (editor, #1)
In reply to: RDSL and ignoring feedback by jospoortvliet
Parent article: Still waiting for swap prefetch

Indeed. IMHO Corbet should've mentioned this properly - Con did get negative comments, but those were entirely silly. Complaining about the fairness of a fair scheduler???

That's just the sort of approach which created trouble for SD/RSDL. If people see regressions with their workloads, stamping a "100% certified fair!" label on it will not make them feel better about it. You have to address these problems; if you are unwilling to do so, your code will not make it into the kernel.

CFS is also a "fair" scheduler, but it has not drawn the same sort of complaints - though it will be interesting to see what happens as the testing community gets larger. As I understand it, the CFS brand of "fairness" takes a longer-term view, allowing tasks to get their "fair" share even if they sleep from time to time. That helps to prevent the sort of regressions seen with SD.

The real key, though, is what happens when things go wrong. There will certainly be people reporting scheduler issues over the 2.6.23 cycle. Ingo and the other CFS hackers could certainly dismiss them as "entirely silly," seeing as the scheduler is "completely fair," after all. But they won't do that. Instead, they will do their best to understand and solve the problems. That is why CFS is in the kernel, and SD is not.


RDSL and ignoring feedback

Posted Jul 26, 2007 14:40 UTC (Thu) by nevyn (subscriber, #33129)

That is a very bad analogy. For a "fair" scheduler, change requests along the lines of "app X does Y and gets an unfair amount of CPU" should be dealt with promptly. Change requests like "app X does Y and I want it to get an unfair amount of CPU" should be solved another way. We didn't drop SELinux because random uid=0 processes could no longer rewrite my /etc/shadow file; that was a desired result.

Looking at it another way: if you create holes in the algorithm so that Xorg can walk through them, then any other piece of code can walk through them too. The correct fix was pointed out: if Xorg needs more CPU time than "normal", tell the kernel so using the interface designed for that purpose (i.e. nice).
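For illustration, a minimal C sketch of "telling the kernel" through the POSIX getpriority()/setpriority() calls, which renice(1) uses. The helper names get_nice() and set_nice() are hypothetical. Lowering a nice value (as one would for X) normally needs CAP_SYS_NICE or a suitable RLIMIT_NICE, while raising it is always allowed.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Hypothetical helper: read the nice value of `pid` (0 = calling process).
 * Returns 0 on success, -1 on error. */
static int get_nice(pid_t pid, int *nice_out)
{
    errno = 0;                          /* -1 is a legal nice value, so */
    int n = getpriority(PRIO_PROCESS, pid); /* errors show up in errno */
    if (n == -1 && errno != 0)
        return -1;
    *nice_out = n;
    return 0;
}

/* Hypothetical helper: set the nice value of `pid`, as renice(1) would.
 * Lowering it (e.g. to -10 for an X server) usually requires privilege;
 * raising it does not. Returns 0 on success, -1 on error. */
static int set_nice(pid_t pid, int nice_value)
{
    return setpriority(PRIO_PROCESS, pid, nice_value);
}
```

The test raises the caller's own nice value by one, which any unprivileged process may do.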

IMNSHO CFS got into the kernel because Ingo wrote it, and basically no other reason. That's not entirely a bad thing, but it's not entirely a good thing either and doing "unbiased" reporting pretending otherwise isn't helping anyone.

RDSL and ignoring feedback

Posted Jul 26, 2007 17:38 UTC (Thu) by erwbgy (subscriber, #4104)

"unbiased" reporting pretending otherwise isn't helping anyone.

Of course it is not unbiased reporting - we pay good money for Jon's opinion. That's what an editorial is. Jon explained quite clearly, and even-handedly in my opinion, why he came to the conclusion he did. Agree with him or not (and I certainly don't always), but criticising his reporting because you don't agree with his conclusions is not helping anyone.

RDSL and ignoring feedback

Posted Jul 26, 2007 18:26 UTC (Thu) by nevyn (subscriber, #33129)

Maybe that sounded worse than I wanted it to. I didn't mean that Jon was biased against my viewpoint, quite the opposite; it seemed more like he'd tried to "present both sides of an argument" instead of just saying what he thought was right.

But maybe he really did/does believe that the "reported problem" was a regression and needed fixing, at which point I guess I'll just have to disagree.

RDSL and ignoring feedback

Posted Jul 27, 2007 21:41 UTC (Fri) by jospoortvliet (subscriber, #33164)

Indeed. To be honest, I don't get why he restated that. I mean, it's lovely to try and fix things if they come up, but this was and is impossible to fix. And I'm sure Ingo won't try to fix it either. If someone complains that X doesn't get enough CPU (because it gets 1/10th if 10 heavy processes are running), he will tell the person to renice X. Just like Con did. After all, it's what a fair scheduler does.
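To put rough numbers on "renice X", here is a small C sketch of weighted fair sharing. It assumes, as an approximation rather than the kernel's actual table, that a nice-0 task has weight 1024 and that each nice level changes the weight by a factor of about 1.25. Under that assumption an X server at nice 0 competing with 10 nice-0 CPU hogs gets 1/11 of the CPU, while at nice -10 it gets a bit under half.

```c
/* Approximate load weight for a nice value: 1024 at nice 0, scaled by
 * about 1.25 per nice level (an assumption, not the kernel's table). */
static double nice_to_weight(int nice)
{
    double w = 1024.0;
    for (int i = 0; i < nice; i++)   /* higher nice: lighter task */
        w /= 1.25;
    for (int i = 0; i > nice; i--)   /* lower nice: heavier task */
        w *= 1.25;
    return w;
}

/* Fraction of one CPU that a task at `nice` receives when it competes
 * with `n_hogs` always-runnable nice-0 tasks under weighted fairness. */
static double cpu_share(int nice, int n_hogs)
{
    double w = nice_to_weight(nice);
    return w / (w + n_hogs * nice_to_weight(0));
}
```

The share is proportional to weight, so renicing shifts the split without the scheduler itself becoming "unfair".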

Someone complaining about it simply doesn't understand it - a fair scheduler WILL lead to regressions. No way around it, period. It's unfair to attack Con on this one, IMHO, and again - I really don't see how Corbet can say these things.

BTW, I'm not saying Corbet is stupid or anything negative; I just think he's wrong here. Or there is something I utterly do NOT understand (and if that's the case, I hope he can explain).

RDSL and ignoring feedback

Posted Jul 27, 2007 21:51 UTC (Fri) by corbet (editor, #1)

If somebody's workload works on 2.6.N, but fails on 2.6.N+1, it's a regression. It doesn't matter if life is better for a lot of other folks, or whether you call it "fair," it's still a regression. Regressions are bad news. And yes, CFS does do a better job with that sort of workload.

RDSL and ignoring feedback

Posted Jul 27, 2007 22:21 UTC (Fri) by zlynx (subscriber, #2285)

It may technically be a regression for that user, but if a change improves things for more other users than it hurts, I call that progress.

It may be a case of two steps forward, one step back, but still progress and a good thing, not bad news.

Regressions and progress

Posted Jul 27, 2007 22:41 UTC (Fri) by corbet (editor, #1)

Here's a message from Linus from a couple of weeks ago; I had considered it for the quote of the week:
So we don't fix bugs by introducing new problems. That way lies madness, and nobody ever knows if you actually make any real progress at all. Is it two steps forwards, one step back, or one step forward and two steps back? Different people will give different answers.

That's why regressions are _so_ much more important than new bugfixes. Because it's much more important to make slow but _steady_ progress, and have people know things improve (or at least not "deprove"). We don't want any kind of "brownian motion development".

Regressions and progress

Posted Jul 27, 2007 23:37 UTC (Fri) by zlynx (subscriber, #2285)

Linus' statement says regressions are much more important. But it doesn't specify how much more important. And when the benefits of a change add up, they can outweigh a regression.

Ingo's scheduler does not work as well as Con's on many 3D applications, like games. That's a regression from Con's work. Ingo's doesn't always schedule games as well as mainline either (Transgaming went to some work to make Cedega share work between processes and sleep enough to fake out the 2.6 scheduler).

If I decide to complain about the regression (which I won't) should Linus hold up merging CFS until Ingo can meet my demand that the new thing work just like the old thing? (Renicing Cedega is just *too* hard! Poor me!)

Another example: I've complained that 4K stacks aren't ready to be the default (stack enough device-mapper layers and the kernel crashes), but I strongly suspect that the kernel is going to change the default to 4K no matter what I think.

Regressions and progress

Posted Oct 9, 2007 21:57 UTC (Tue) by Blaisorblade (guest, #25465)

This Linus quote is also from a couple of years ago, about drivers (when somebody fixed ACPI and broke suspend for most stuff). And he's a maintainer, and he must have this opinion (or the community should replace him).

What matters is how deep you really need to stack DM (is it a real problem?), and that 4K is only a default: the other choice is kept, so *you* can make the 8K choice, which is mostly worse.

That has been a known problem for years, and Device Mapper can be changed to be non-recursive; see the LWN articles about the change in link resolution from recursive to iterative to understand what I mean - it's the same idea. Technically, this is a tail-call optimization to reduce stack depth.
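A toy C illustration of the recursive-to-iterative point, assuming a hypothetical stack of block-device layers where each layer just shifts a sector number and forwards the request to the layer below. The struct and function names are invented for this sketch and are not the device-mapper API. Because the recursive call is in tail position, it can be rewritten as a loop whose stack usage does not grow with the number of layers.

```c
#include <stddef.h>

/* Hypothetical stacked layer: adds a fixed offset to the sector and
 * forwards the request to the layer beneath it. */
struct layer {
    long offset;               /* sector shift applied by this layer */
    const struct layer *below; /* NULL for the physical device */
};

/* Recursive version: one stack frame per layer, so a deep stack of
 * layers can overflow a small (e.g. 4K) kernel stack. */
static long map_recursive(const struct layer *l, long sector)
{
    if (l == NULL)
        return sector;
    return map_recursive(l->below, sector + l->offset); /* tail call */
}

/* Iterative version: the tail call becomes a loop, so stack usage is
 * constant regardless of how many layers are stacked. */
static long map_iterative(const struct layer *l, long sector)
{
    while (l != NULL) {
        sector += l->offset;
        l = l->below;
    }
    return sector;
}
```

Both functions compute the same mapping; only the second is safe for arbitrarily deep stacks without relying on the compiler to eliminate the tail call.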

After reading the discussion over Con's community management, and thinking about Reiser4, I think that Linux is not about politics but about communities, or rather social networks of developers and the influence they have on the filtering the community does (that's a lot of academic buzzwords).

That said, it's known that many problems (including VM and, I think, scheduling) are computationally hard, so whichever solution you choose has weak points (in some cases it's a theorem - you can prove that no lossless compressor can shrink every file, so for each compressor there is a file it does not compress). The point is how bad the regression is.
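The compression claim is a counting (pigeonhole) argument. Sketched for inputs of exactly $n$ bits:

```latex
\#\{\text{bit strings shorter than } n\}
  = \sum_{k=0}^{n-1} 2^k
  = 2^n - 1
  < 2^n
  = \#\{\text{bit strings of length } n\}
```

A lossless compressor must be injective, so it cannot map all $2^n$ inputs of length $n$ to the $2^n - 1$ strictly shorter outputs; at least one $n$-bit input comes out no shorter than it went in.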

Distributions had to be fixed not to renice X to -10 (as was usual for 2.4) when 2.6 came out. A stable kernel cannot require such a big change. I can fix my X startup script (well, working my way through X startup is not fun, even for an experienced Linux developer like me), but the average Ubuntu user cannot (I know dozens of such users who switched from Windows because Vista sucks and Linux had Beryl; they are good physics students).

RDSL and ignoring feedback

Posted Jul 28, 2007 12:08 UTC (Sat) by jospoortvliet (subscriber, #33164)

I find that hard to believe if we're talking about the same person complaining. His problem can never be fixed by CFS, unless CFS automatically reniced his X or introduced unfairness some other way.

CFS will cause regressions, because it doesn't do unfair scheduling - which is what users have come to expect. There is no way around it.

Besides, CFS does worse on 3D gaming than SD and mainline, and people will complain about that as well.

Note that I'm happy CFS got into mainline; as far as I can tell, it has a superior design. It's just that the stated reasoning for the choice doesn't work for me...

Maybe this is worth reading, if you haven't already:
http://osnews.com/story.php/18350/Linus-On-CFS-vs.-SD
(don't forget the OTHER SIDE of the story ;-) )

CFS and SD internals, design

Posted Jul 26, 2007 15:08 UTC (Thu) by mingo (subscriber, #31122)

As I understand it, the CFS brand of "fairness" takes a longer-term view, allowing tasks to get their "fair" share even if they sleep from time to time.

Correct, and I call this concept "sleeper fairness".

The simplest way to describe it is via a specific example: on my box, if I run glxgears, it uses exactly 50% of CPU time. If I boot into the SD scheduler and start a CPU hog in parallel to the glxgears task, the two tasks share the CPU: the CPU hog will get ~60% of CPU time, and glxgears will get ~40%. If I boot CFS, both tasks will get exactly 50% of CPU time.

I've described this mechanism and other internal details in another thread already, but I think it makes sense to paste that reply here too:

wait_runtime is a scheduler-internal metric that shows how much out-of-balance this task's execution history is compared to what execution time it could get on a "perfect, ideal multi-tasking CPU". So if wait_runtime gets negative that means it has spent more time on the CPU than it should have. If wait_runtime gets positive that means it has spent less time than it "should have". CFS sorts tasks in an rbtree with this value as a key and uses this value to choose the next task to run. (with lots of additional details - but this is the raw scheme.) It will pick the task with the largest wait_runtime value. (i.e. the task that is most in need of CPU time.)
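A toy C model of the raw scheme just described, under simplifying assumptions: time advances in whole ticks, every task is always runnable, and a linear scan stands in for the rbtree. Values are scaled by the number of tasks so the arithmetic stays integral. This is not the CFS code, only the "charge the runner, credit the waiters, pick the largest wait_runtime" idea; run_ticks() is a hypothetical name.

```c
/* Simulate `ticks` ticks for `n` always-runnable tasks. wait[i] is a
 * scaled wait_runtime (positive: task i is owed CPU time) and ran[i]
 * counts the ticks task i actually received. */
static void run_ticks(int n, int ticks, long wait[], long ran[])
{
    for (int tick = 0; tick < ticks; tick++) {
        /* Pick the task most in need of CPU time (ties: lowest index).
         * CFS would take the leftmost node of its rbtree instead. */
        int pick = 0;
        for (int i = 1; i < n; i++)
            if (wait[i] > wait[pick])
                pick = i;

        /* An ideal CPU would give each task 1/n of this tick. Scaled
         * by n: the runner is charged n - 1, each waiter credited 1. */
        for (int i = 0; i < n; i++)
            wait[i] += (i == pick) ? -(long)(n - 1) : 1;
        ran[pick]++;
    }
}
```

With identical CPU hogs the scan simply rotates through them, so each receives an equal number of ticks.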

This mechanism and implementation is basically not comparable to SD in any way, the two schedulers are so different. Basically the only common thing between them is that both aim to schedule tasks "fairly" - but even the definition of "fairness" is different: SD strictly considers time spent on the CPU and on the runqueue, CFS takes time spent sleeping into account as well. (and hence the approach of "sleep average" and the act of "rewarding" sleepy tasks, which was the main interactivity mechanism of the old scheduler, survives in CFS. Con was fundamentally against sleep-average methods. CFS tried to be a no-tradeoffs replacement for the existing scheduler and the sleeper-fairness method was key to that.)
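For contrast, a rough C sketch of the "sleep average" bonus of the old O(1) scheduler mentioned above. It is modelled loosely on 2.6-era effective_prio() and simplified: the constants (priorities 100..139 with nice 0 at 120, a bonus range of -5..+5) are stated as assumptions, and the sleep average is reduced to a hypothetical 0..100 percentage.

```c
/* Assumed constants, modelled on the 2.6 O(1) scheduler. */
#define MAX_RT_PRIO 100  /* first non-realtime priority */
#define MAX_PRIO    140  /* one past the lowest priority */
#define MAX_BONUS   10   /* bonus spans -5..+5 */

/* Simplified dynamic priority. sleep_pct (0..100) stands in for the
 * task's sleep average; lower numbers mean higher priority. */
static int effective_prio(int nice, int sleep_pct)
{
    int static_prio = 120 + nice;
    int bonus = sleep_pct * MAX_BONUS / 100 - MAX_BONUS / 2;
    int prio = static_prio - bonus;  /* sleepy tasks get boosted */

    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;
    return prio;
}
```

A task that sleeps most of the time gains up to five priority levels, which is the kind of heuristic a workload can "fake out" by sleeping deliberately.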

This (and other) design differences and approaches - not surprisingly - produced two completely different scheduler implementations. Anyone who has tried both schedulers will attest to the fact that they "feel" differently and behave differently as well.

Due to these fundamental design differences the data structures and algorithms are necessarily very different, so there was basically no opportunity to share code (besides the scheduler glue code that was already in sched.c), and there's only 1 line of code in common between CFS and SD (out of thousands of lines of code):

  * This idea comes from the SD scheduler of Con Kolivas:
  */
 static inline void sched_init_granularity(void)
 {
         unsigned int factor = 1 + ilog2(num_online_cpus());

This boot-time ilog2() tuning based on the number of available CPUs is a tuning approach I saw in SD, and I asked Con whether I could use it in CFS (to which Con kindly agreed).
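A self-contained sketch of that tuning: factor = 1 + ilog2(number of online CPUs), used to scale a base granularity. Only the 1 + ilog2() formula comes from the snippet above; the base value and the function names are hypothetical. A user-space program could obtain the CPU count with sysconf(_SC_NPROCESSORS_ONLN).

```c
/* Integer floor(log2(n)) for n >= 1, like the kernel's ilog2(). */
static unsigned int ilog2_u(unsigned int n)
{
    unsigned int r = 0;
    while (n >>= 1)
        r++;
    return r;
}

/* Hypothetical base granularity in nanoseconds (an assumption). */
#define BASE_GRANULARITY_NS 2000000ULL

/* Scale the base granularity by 1 + ilog2(ncpus): bigger machines get
 * a coarser granularity, but the growth is only logarithmic. */
static unsigned long long scaled_granularity(unsigned int ncpus)
{
    unsigned int factor = 1 + ilog2_u(ncpus);
    return BASE_GRANULARITY_NS * factor;
}
```

Using a logarithm keeps the factor small: 1 CPU gives 1, 4 CPUs give 3, and 16 CPUs only 5.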

CFS and SD internals, design

Posted Jul 28, 2007 12:30 UTC (Sat) by jospoortvliet (subscriber, #33164)

Thanks, Ingo, an excellent explanation. I wonder if you could elaborate on the following, which you wrote:

This mechanism and implementation is basically not comparable to SD in any way, the two schedulers are so different. Basically the only common thing between them is that both aim to schedule tasks "fairly" - but even the definition of "fairness" is different: SD strictly considers time spent on the CPU and on the runqueue, CFS takes time spent sleeping into account as well. (and hence the approach of "sleep average" and the act of "rewarding" sleepy tasks, which was the main interactivity mechanism of the old scheduler, survives in CFS. Con was fundamentally against sleep-average methods. CFS tried to be a no-tradeoffs replacement for the existing scheduler and the sleeper-fairness method was key to that.)

How does this work, and how does it affect fairness? I mean, can you tell a bit more about the difference between SD and CFS in this area? (I'm pretty interested, that's all)

CFS and SD internals, design

Posted Sep 18, 2007 9:25 UTC (Tue) by mingo (subscriber, #31122)

How does this work, and how does it affect fairness? I mean, can you tell a bit more about the difference between SD and CFS in this area? (I'm pretty interested, that's all)

The practical difference is noticeable for something like the X server - Xorg is often a "sleepy" process but it's important that when it runs it gets its own maximum share of CPU time. With a "runners only" fairness model it will receive less CPU time than with a "sleepers considered too" (CFS) fairness model.
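A toy C comparison of the two fairness models, under loudly stated assumptions: time advances in whole ticks; a "sleepy" task runs one tick and then sleeps one tick (so alone it uses 50% of the CPU, like the glxgears example); it competes with one CPU hog; and a linear scan stands in for the rbtree. In the "sleepers considered" mode a sleeping task keeps earning credit as if it were still on the runqueue, clamped to a hypothetical cap. This is neither the CFS nor the SD implementation, and it does not reproduce the measured 60/40 split; it only shows the direction of the effect.

```c
#define MAX_TASKS 8

/* Toy task. sleep_ticks == 0 means an always-runnable CPU hog; otherwise
 * the task runs for one tick and then sleeps for sleep_ticks ticks. */
struct toy_task {
    int sleep_ticks;    /* sleep length after each tick on the CPU */
    int sleep_left;     /* ticks left in the current sleep */
    long wait_runtime;  /* scaled credit; positive means owed CPU time */
    long ran;           /* ticks actually spent on the CPU */
};

/* count_sleepers == 0: "runners only", credit accrues only while a task
 * is runnable. count_sleepers != 0: sleeping tasks keep accruing credit
 * as if still competing (a crude stand-in for sleeper fairness).
 * Credit is clamped to [-cap, cap]; n must be <= MAX_TASKS. */
static void simulate(struct toy_task *t, int n, int ticks,
                     int count_sleepers, long cap)
{
    for (int tick = 0; tick < ticks; tick++) {
        int runnable[MAX_TASKS];
        int nr = 0, pick = -1;

        for (int i = 0; i < n; i++) {
            runnable[i] = (t[i].sleep_left == 0);
            if (runnable[i] || count_sleepers)
                nr++;
            /* Largest wait_runtime among runnable tasks; ties: lowest index. */
            if (runnable[i] &&
                (pick < 0 || t[i].wait_runtime > t[pick].wait_runtime))
                pick = i;
        }

        if (pick >= 0) {
            /* Scaled by nr: the runner is charged nr - 1 and every other
             * counted task is credited 1 (an ideal CPU gives each 1/nr). */
            for (int i = 0; i < n; i++) {
                if (i == pick)
                    t[i].wait_runtime -= nr - 1;
                else if (runnable[i] || count_sleepers)
                    t[i].wait_runtime += 1;
                if (t[i].wait_runtime > cap)
                    t[i].wait_runtime = cap;
                if (t[i].wait_runtime < -cap)
                    t[i].wait_runtime = -cap;
            }
            t[pick].ran++;
        }

        /* Sleepers move one tick closer to waking; the task that just
         * ran starts its sleep (a no-op for CPU hogs). */
        for (int i = 0; i < n; i++) {
            if (!runnable[i])
                t[i].sleep_left--;
            else if (i == pick)
                t[i].sleep_left = t[i].sleep_ticks;
        }
    }
}
```

In this model, over 1000 ticks the sleepy task gets 334 ticks with runners-only accounting but 500 when sleep time counts, the same 50% it uses when running alone.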

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds