RDSL and ignoring feedback

Posted Jul 26, 2007 2:38 UTC (Thu) by zlynx (subscriber, #2285)
Parent article: Still waiting for swap prefetch

> But some users were reporting real regressions with RSDL and were being told that those regressions were to be expected and would not be fixed. This behavior soured Linus on RSDL and set the stage for Ingo Molnar's CFS scheduler. Some (not all) people are convinced that Con's scheduler was the better design, but refusal to engage with negative feedback doomed the whole exercise.

I was following most of that on LKML as it happened. As I saw it, a guy testing RSDL was reporting as a regression the fact that his X server now got 25% of the CPU instead of 75%.

Con did respond. He said the scheduler was fair, that was the design, and that he (the tester) could renice X to -10 or -15 if he wished.

I don't see how else Con could respond to that. RSDL was supposed to be fair. Giving X 75% isn't fair. There's just no way to resolve those two things.

RDSL and ignoring feedback

Posted Jul 26, 2007 8:29 UTC (Thu) by jospoortvliet (subscriber, #33164)

Indeed. Imho Corbet should've mentioned this properly - Con did get
negative comments, but those were entirely silly. Complaining about the
fairness of a fair scheduler???

He might also have mentioned that, by design, swap-prefetch is extremely
unlikely to have negative consequences (which explains why there are no
real bug reports), and that the whole updatedb issue might be solved by
other means, though that would not cover everything swap-prefetch helps
with.

For example, start OO.o on a low-mem machine, work in it, close it. Now
you've got 60 MB of free RAM. Swap prefetch will start filling that with
your swapped-out pages pretty quickly, and if those pages belong to
firefox or some other app (which is likely, as swap-prefetch starts with
the most important data), you're pretty happy. There is NO other way of
doing this than swap-prefetch.

I think the main reason SwPr didn't make it in is that those who have to
decide on it have very, very bulky hardware, and their employers are
very afraid it MIGHT in some weird way influence the 1024-cpu linux
deployments, and they don't care about the desktop at all.

RDSL and ignoring feedback

Posted Jul 26, 2007 13:35 UTC (Thu) by corbet (editor, #1)

> Indeed. Imho Corbet should've mentioned this properly - Con did get negative comments, but those were entirely silly. Complaining about the fairness of a fair scheduler???

That's just the sort of approach which created trouble for SD/RSDL. If people see regressions with their workloads, stamping a "100% certified fair!" label on it will not make them feel better about it. You have to address these problems; if you are unwilling to do so, your code will not make it into the kernel.

CFS is also a "fair" scheduler, but it has not drawn the same sort of complaints - though it will be interesting to see what happens as the testing community gets larger. As I understand it, the CFS brand of "fairness" takes a longer-term view, allowing tasks to get their "fair" share even if they sleep from time to time. That helps to prevent the sort of regressions seen with SD.

The real key, though, is what happens when things go wrong. There will certainly be people reporting scheduler issues over the 2.6.23 cycle. Ingo and the other CFS hackers could certainly dismiss them as "entirely silly," seeing as the scheduler is "completely fair," after all. But they won't do that. Instead, they will do their best to understand and solve the problems. That is why CFS is in the kernel, and SD is not.

RDSL and ignoring feedback

Posted Jul 26, 2007 14:40 UTC (Thu) by nevyn (subscriber, #33129)

That is a very bad analogy. For a "fair" scheduler, change requests along the lines of "app. X does Y, and gets an unfair amount of CPU" are legitimate and should be dealt with promptly. Change requests like "app. X does Y, and I want it to get an unfair amount of CPU" should be solved another way. We didn't drop SELinux because random uid=0 processes could no longer rewrite my /etc/shadow file, because that was a desired result.

Looking at it another way: if you create holes in the algorithm so that Xorg can walk through them, then so can any other piece of code. The correct fix was pointed out: if Xorg needs more CPU time than "normal", then just tell the kernel so, using the interfaces that exist specifically for that purpose (i.e. nice).
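
For readers who have not used that interface from code, here is a minimal sketch of reading and changing a process's nice value with the POSIX getpriority() and setpriority() calls. The helper names read_nice() and write_nice() are made up for this illustration; they are not from RSDL, CFS, or Xorg.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/resource.h>

/*
 * Read the nice value of process `pid` (0 means the calling process).
 * getpriority() can legitimately return -1 as a nice value, so errno
 * has to be cleared first and checked afterwards.
 * Returns 0 on success and stores the value in *out, -1 on error.
 */
int read_nice(pid_t pid, int *out)
{
    errno = 0;
    int value = getpriority(PRIO_PROCESS, (id_t)pid);
    if (value == -1 && errno != 0)
        return -1;
    *out = value;
    return 0;
}

/*
 * Set the nice value of process `pid` (0 means the calling process).
 * Lower values mean more CPU; making a process "less nice" (for
 * example -10) normally needs root or CAP_SYS_NICE on Linux.
 * Returns 0 on success, -1 on error (errno is set by setpriority()).
 */
int write_nice(pid_t pid, int nice_value)
{
    return setpriority(PRIO_PROCESS, (id_t)pid, nice_value);
}
```

Given the X server's PID, a privileged caller would use write_nice(x_pid, -10), which is what the renice suggestion in this thread amounts to; x_pid here is a placeholder, not something the sketch looks up.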

IMNSHO CFS got into the kernel because Ingo wrote it, and basically no other reason. That's not entirely a bad thing, but it's not entirely a good thing either and doing "unbiased" reporting pretending otherwise isn't helping anyone.

RDSL and ignoring feedback

Posted Jul 26, 2007 17:38 UTC (Thu) by erwbgy (subscriber, #4104)

> "unbiased" reporting pretending otherwise isn't helping anyone.

Of course it is not unbiased reporting - we pay good money for Jon's opinion. That's what an editorial is. Jon explained quite clearly, and even-handedly in my opinion, why he came to the conclusion he did. Agree with him or not (and I certainly don't always), but criticising his reporting because you don't agree with his conclusions is not helping anyone.

RDSL and ignoring feedback

Posted Jul 26, 2007 18:26 UTC (Thu) by nevyn (subscriber, #33129)

Maybe that sounded worse than I wanted it to. I didn't mean that Jon was biased against my viewpoint; quite the opposite, it seemed more like he'd tried to "present both sides of an argument" instead of just saying what he thought was right.

But maybe he really did/does believe that the "reported problem" was a regression and needed fixing, at which point I guess I'll just have to disagree.

RDSL and ignoring feedback

Posted Jul 27, 2007 21:41 UTC (Fri) by jospoortvliet (subscriber, #33164)

Indeed. To be honest, I don't get why he re-stated that. I mean, it's
lovely to try and fix things if they come up, but this was and is
impossible to fix. And I'm sure Ingo won't try to fix it either. If
someone complains X doesn't get enough CPU (because it gets 1/10th if 10
heavy processes are running), he will tell the person to renice X. Just
like Con did. After all, it's what a fair scheduler does.

Someone complaining about it simply doesn't understand it - a fair
scheduler WILL lead to regressions. No way around it, period. It's unfair
to attack Con on this one, imho, and again - I really don't see how
Corbet can say these things.

BTW, this is not to say Corbet is stupid or anything negative; I just
think he's wrong here. Or there is something I utterly do NOT understand
(and if that's the case, I hope he can explain).

RDSL and ignoring feedback

Posted Jul 27, 2007 21:51 UTC (Fri) by corbet (editor, #1)

If somebody's workload works on 2.6.N, but fails on 2.6.N+1, it's a regression. It doesn't matter if life is better for a lot of other folks, or whether you call it "fair," it's still a regression. Regressions are bad news. And yes, CFS does do a better job with that sort of workload.

RDSL and ignoring feedback

Posted Jul 27, 2007 22:21 UTC (Fri) by zlynx (subscriber, #2285)

It may technically be a regression for that user, but if a change improves things for more other users than it hurts, I call that progress.

It may be a case of two steps forward, one step back, but still progress and a good thing, not bad news.

Regressions and progress

Posted Jul 27, 2007 22:41 UTC (Fri) by corbet (editor, #1)

Here's a message from Linus from a couple of weeks ago; I had considered it for the quote of the week:
> So we don't fix bugs by introducing new problems. That way lies madness, and nobody ever knows if you actually make any real progress at all. Is it two steps forwards, one step back, or one step forward and two steps back? Different people will give different answers.

> That's why regressions are _so_ much more important than new bugfixes. Because it's much more important to make slow but _steady_ progress, and have people know things improve (or at least not "deprove"). We don't want any kind of "brownian motion development".

Regressions and progress

Posted Jul 27, 2007 23:37 UTC (Fri) by zlynx (subscriber, #2285)

Linus' statement says regressions are much more important. But that doesn't specify how much more important. And when the benefits of the change build up, they override how important a regression is.

Ingo's scheduler does not work as well as Con's on many 3D applications, like games. That's a regression from Con's work. Ingo's doesn't always schedule games as well as mainline either (Transgaming went to some work to make Cedega share work between processes and sleep enough to fake out the 2.6 scheduler).

If I decide to complain about the regression (which I won't) should Linus hold up merging CFS until Ingo can meet my demand that the new thing work just like the old thing? (Renicing Cedega is just *too* hard! Poor me!)

Another example: I've complained that 4K stacks aren't ready to be the default (a deep enough stack of device mappers crashes), but I strongly suspect that the kernel is going to change the default to 4K no matter what I think.

Regressions and progress

Posted Oct 9, 2007 21:57 UTC (Tue) by Blaisorblade (guest, #25465)

This Linus quote is also from a couple of years ago, about drivers (when somebody fixed ACPI and broke suspend for most stuff). And he's a maintainer, and he must have this opinion (or the community should replace him).

What matters is "how deep do you need to stack DM" (is it a real problem) and "4k is a default (other choices are kept, so *you* will make the 8k choice, which is mostly worse)".

That has been a known problem for years, and Device Mapper can be changed to be non-recursive; see the LWN articles about the change in link resolution from recursive to iterative to understand what I mean - that's the same stuff. Technically, this is a tail-call optimization to reduce stack depth.
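
To make the recursive-versus-iterative point concrete, here is a toy sketch in C. It is not Device Mapper code: struct toy_layer, its fields, and both functions are invented for illustration, and the only "mapping" each layer does is add an offset. The recursive version uses one stack frame per stacked layer; the iterative version uses constant stack no matter how deep the stack of devices is.

```c
#include <stddef.h>

/* Hypothetical stacked block device: each layer remaps a sector and
 * hands it to the layer below. Not the real device-mapper structures. */
struct toy_layer {
    const struct toy_layer *lower; /* NULL for the bottom device */
    unsigned long offset;          /* remapping applied by this layer */
};

/* Recursive form: stack usage grows with the number of layers. */
unsigned long toy_map_recursive(const struct toy_layer *dev,
                                unsigned long sector)
{
    sector += dev->offset;
    if (dev->lower == NULL)
        return sector;
    /* Tail call: nothing is left to do here after it returns. */
    return toy_map_recursive(dev->lower, sector);
}

/* Iterative form: the tail call is rewritten as a loop by hand, so
 * stack usage is constant regardless of stacking depth. */
unsigned long toy_map_iterative(const struct toy_layer *dev,
                                unsigned long sector)
{
    for (; dev != NULL; dev = dev->lower)
        sector += dev->offset;
    return sector;
}
```

A compiler may or may not turn the recursive form into a loop by itself, which is why code with a small fixed stack tends to do the rewrite by hand.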

After reading the discussion of Con's community management, and thinking of Reiser4, I think that Linux is not about politics but about communities, or rather social networks of developers and the influence they have on the filtering the community does (that's a lot of academic buzzwords).

That said, it's known that many problems (including VM and, I think, scheduling) are computationally hard, so whichever solution you choose has weak points (it's a theorem in some cases - you can prove that for each compressor there is a file the compressor expands). The point is how severe the regression is.
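
The compressor claim is the standard counting (pigeonhole) argument. A short sketch, assuming the compressor is lossless, files are bit strings, and the compressor shortens at least one input (the symbols below are introduced only for this sketch):

```latex
Let $S_{\le n}$ be the set of bit strings of length at most $n$,
so $|S_{\le n}| = 2^{n+1} - 1$.
Suppose a lossless (injective) compressor $C$ never expands anything:
$|C(x)| \le |x|$ for all $x$.
Then $C$ maps $S_{\le n}$ injectively into itself, and since $S_{\le n}$
is finite, $C(S_{\le n}) = S_{\le n}$ for every $n \ge 0$.
If some $x$ with $|x| = n \ge 1$ had $|C(x)| < n$, then
$C(x) \in S_{\le n-1} = C(S_{\le n-1})$, so $C(x) = C(y)$ for some
$y \in S_{\le n-1}$ with $y \ne x$, contradicting injectivity.
Hence such a $C$ shortens nothing; equivalently, any lossless $C$ that
shortens some input must lengthen some other input.
```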

Distributions had to be fixed not to renice X to -10 (as usual for 2.4) when 2.6 came out. A stable kernel cannot require such a big change. I can fix my X startup script (well, working my way through X startup is not fun, even for an experienced Linux developer like me), but the Ubuntu average user cannot (I know tens of such users, switching from Windows because Vista sucks and Linux had Beryl - they are good Physics students).

RDSL and ignoring feedback

Posted Jul 28, 2007 12:08 UTC (Sat) by jospoortvliet (subscriber, #33164)

I find that hard to believe if we're talking about the same person
complaining. His problem can never be fixed by CFS, unless CFS
automatically would renice his X, or would introduce unfairness some
other way.

CFS will cause regressions, because it doesn't do unfair scheduling -
which is what users have come to expect. There is no way around it.

Besides, CFS does worse on 3D gaming compared to SD and mainline, and ppl
will complain about that as well.

Note that I'm happy CFS got in mainline, as far as I can tell, it has a
superior design. It's just that the mentioned reasoning for the choice
doesn't work for me...

Maybe this is worth reading, if you didn't already.
http://osnews.com/story.php/18350/Linus-On-CFS-vs.-SD
(don't forget the OTHER SIDE of the story ;-) )

CFS and SD internals, design

Posted Jul 26, 2007 15:08 UTC (Thu) by mingo (subscriber, #31122)

> As I understand it, the CFS brand of "fairness" takes a longer-term view, allowing tasks to get their "fair" share even if they sleep from time to time.

Correct, and i call this concept "sleeper fairness".

The simplest way to describe it is via a specific example: on my box if i run glxgears, it uses exactly 50% of CPU time. If i boot into the SD scheduler, and start a CPU hog in parallel to the glxgears task, the two tasks share the CPU: the CPU hog will get ~60% of CPU time, glxgears will get ~40% of CPU time. If i boot CFS, both tasks will get exactly 50% of CPU time.

I've described this mechanism and other internal details in another thread already, but i think it makes sense to paste that reply here too:

wait_runtime is a scheduler-internal metric that shows how much out-of-balance this task's execution history is compared to what execution time it could get on a "perfect, ideal multi-tasking CPU". So if wait_runtime gets negative that means it has spent more time on the CPU than it should have. If wait_runtime gets positive that means it has spent less time than it "should have". CFS sorts tasks in an rbtree with this value as a key and uses this value to choose the next task to run. (with lots of additional details - but this is the raw scheme.) It will pick the task with the largest wait_runtime value. (i.e. the task that is most in need of CPU time.)
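
The raw scheme described above can be sketched as a toy model in C. This is not CFS code, and several things are assumptions made only for the sketch: the toy_* names, a plain array with a linear scan instead of the rbtree, tasks that never sleep, no nice levels, and a simplified accounting rule in which every runnable task is credited an equal share of each slice while the task that ran is charged the whole slice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a task; only the fields this sketch needs. */
struct toy_task {
    int64_t wait_runtime; /* ns; > 0: owed CPU time, < 0: ran too much */
    int64_t cpu_time;     /* ns of CPU actually received */
};

/* Pick the task with the largest wait_runtime, i.e. the one most in
 * need of CPU time. Ties go to the lowest index. Requires n >= 1. */
size_t toy_pick_next(const struct toy_task *tasks, size_t n)
{
    assert(n >= 1);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (tasks[i].wait_runtime > tasks[best].wait_runtime)
            best = i;
    return best;
}

/* Account for task `ran` having used the CPU for `delta` ns. On an
 * ideal multi-tasking CPU each of the n tasks would have received
 * delta / n, so every task is credited that share and the task that
 * really ran is charged the full delta. */
void toy_account(struct toy_task *tasks, size_t n, size_t ran, int64_t delta)
{
    assert(n >= 1 && ran < n);
    int64_t fair_share = delta / (int64_t)n;
    for (size_t i = 0; i < n; i++)
        tasks[i].wait_runtime += fair_share;
    tasks[ran].wait_runtime -= delta;
    tasks[ran].cpu_time += delta;
}

/* Run `slices` scheduling decisions of `slice_ns` each. */
void toy_run(struct toy_task *tasks, size_t n, int slices, int64_t slice_ns)
{
    for (int s = 0; s < slices; s++)
        toy_account(tasks, n, toy_pick_next(tasks, n), slice_ns);
}
```

With two always-runnable tasks the model simply alternates between them, so each ends up with half of the CPU time. Sleeper fairness, which is the interesting part of the description above, is deliberately left out of this sketch.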

This mechanism and implementation is basically not comparable to SD in any way, the two schedulers are so different. Basically the only common thing between them is that both aim to schedule tasks "fairly" - but even the definition of "fairness" is different: SD strictly considers time spent on the CPU and on the runqueue, CFS takes time spent sleeping into account as well. (and hence the approach of "sleep average" and the act of "rewarding" sleepy tasks, which was the main interactivity mechanism of the old scheduler, survives in CFS. Con was fundamentally against sleep-average methods. CFS tried to be a no-tradeoffs replacement for the existing scheduler and the sleeper-fairness method was key to that.)

This (and other) design differences and approaches - not surprisingly - produced two completely different scheduler implementations. Anyone who has tried both schedulers will attest to the fact that they "feel" differently and behave differently as well.

Due to these fundamental design differences the data structures and algorithms are necessarily very different, so there was basically no opportunity to share code (besides the scheduler glue code that was already in sched.c), and there's only 1 line of code in common between CFS and SD (out of thousands of lines of code):

  * This idea comes from the SD scheduler of Con Kolivas:
  */
 static inline void sched_init_granularity(void)
 {
         unsigned int factor = 1 + ilog2(num_online_cpus());

This boot-time "ilog2()" tuning based on the number of CPUs available is a tuning approach i saw in SD and i asked Con whether i could use it in CFS. (to which Con kindly agreed.)
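
As a small illustration of how that factor grows with the CPU count, here is a user-space sketch. The excerpt above is cut off after the first line of the function, so this deliberately models only that one expression; toy_ilog2() and toy_granularity_factor() are names invented here, and toy_ilog2() is a plain loop standing in for the kernel's ilog2().

```c
#include <assert.h>

/* floor(log2(n)) for n >= 1; a simple stand-in for the kernel's ilog2(). */
unsigned int toy_ilog2(unsigned int n)
{
    assert(n >= 1);
    unsigned int log = 0;
    while (n >>= 1)
        log++;
    return log;
}

/* The factor from the excerpt: 1 + ilog2(number of online CPUs). */
unsigned int toy_granularity_factor(unsigned int online_cpus)
{
    return 1 + toy_ilog2(online_cpus);
}
```

The factor is 1 on a single CPU, 2 on two CPUs, 3 on four, and 11 on a 1024-CPU machine, so it grows logarithmically rather than linearly with the number of CPUs.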

CFS and SD internals, design

Posted Jul 28, 2007 12:30 UTC (Sat) by jospoortvliet (subscriber, #33164)

Thanx, Ingo, an excellent explanation. I wonder if you could elaborate on
the following you wrote:

> This mechanism and implementation is basically not comparable to SD in
> any way, the two schedulers are so different. Basically the only common
> thing between them is that both aim to schedule tasks "fairly" - but even
> the definition of "fairness" is different: SD strictly considers time
> spent on the CPU and on the runqueue, CFS takes time spent sleeping into
> account as well. (and hence the approach of "sleep average" and the act
> of "rewarding" sleepy tasks, which was the main interactivity mechanism
> of the old scheduler, survives in CFS. Con was fundamentally against
> sleep-average methods. CFS tried to be a no-tradeoffs replacement for the
> existing scheduler and the sleeper-fairness method was key to that.)

How does this work, and affect fairness? I mean, can you tell a bit more
about the difference between SD and CFS in this area? (I'm pretty
interested, that's all)

CFS and SD internals, design

Posted Sep 18, 2007 9:25 UTC (Tue) by mingo (subscriber, #31122)

> How does this work, and affect fairness? I mean, can you tell a bit more about the difference between SD and CFS in this area? (I'm pretty interested, that's all)

The practical difference is noticeable for something like the X server - Xorg is often a "sleepy" process but it's important that when it runs it gets its own maximum share of CPU time. With a "runners only" fairness model it will receive less CPU time than with a "sleepers considered too" (CFS) fairness model.

RDSL and ignoring feedback

Posted Jul 26, 2007 22:44 UTC (Thu) by bronson (subscriber, #4806)

SwPr didn't get in because some very rich employers don't want to destabilize their NUMA monsters? Er, doesn't this sound really unlikely?

The article mentions that Linus and some others feel that SwPr is just papering over a more fundamental problem. So, why not spend time trying to fix the fundamental problem before hacking around it?

That's a rhetorical question... there could be a number of reasons: the root cause is too complex to be understood, or the proper fix is worse than SwPr, etc. I just think that Linus & crew would like to see someone attempt to fix the real problem before resorting to a SwPr hack. If a proper fix is attempted and proves unwieldy, then I bet SwPr will jump a lot higher on a number of kernel devs' priority queues.

> ...entirely silly. Complaining about the fairness of a fair scheduler?

They weren't complaining about the fairness, they were complaining about the quality. Is a 100% fair scheduler actually the best scheduler? Probably not.

RDSL and ignoring feedback

Posted Jul 28, 2007 12:35 UTC (Sat) by jospoortvliet (subscriber, #33164)

No, but a 100% fair scheduler is the only way to ensure you won't have
stalls and other problems brought by unfairness. You can't have your cake
and eat it too.

RDSL and ignoring feedback

Posted Jul 28, 2007 20:41 UTC (Sat) by bronson (subscriber, #4806)

Sorry, I don't quite understand... are you saying that a 100% fair scheduler actually *is* the best scheduler? If so, then would you have any evidence/research to back this up? I'm genuinely interested.

My uneducated view: in schedulers, as with government and parenting, 100% fairness is unattainable and probably paradoxical. The best policy may or may not be the most fair policy. They're simply disconnected.

RDSL and ignoring feedback

Posted Aug 2, 2007 8:00 UTC (Thu) by renox (subscriber, #23785)

What this complaint shows is that a 'fair scheduler' in itself is not good enough for the end-user desktop.
If you have an application APP that is important to you, you renice it so it gets lots of CPU. Fine, but then say that this application sends a lot of work to the X server (it could be any server, really): there is a kind of 'priority inversion' in which APP is slowed down because the X server doesn't have a big enough CPU share.

It's quite difficult to solve. The only way would be to have some way to transfer the 'CPU token' that APP has to the server it is asking to do work on its behalf. If it is a multi-threaded server that uses a different thread for each client, then maybe the kernel could understand what's happening and boost the corresponding server thread's priority accordingly. But for a non-threaded server I don't see how it could be solved: even if APP says "give my 'CPU token' to server X", how would server X be able to report or understand that it is currently supposed to be working for client APP and not for another client?

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds