Posted Sep 3, 2009 12:54 UTC (Thu) by xoddam (subscriber, #2322)
Parent article: The offline scheduler
> Had Ben-Yehuda been working in the open, and looking for comments
> from the kernel community, he might have realized that his
> approach would not be acceptable at least for the mainline
> much sooner.
He's been posting on this subject on LKML since October of last year.
He got very little in the way of comments (kudos to the few who engaged) but ploughed on with the technical work regardless. Only now has the discussion reached the point where 'prominent' scheduler hackers are offering much more comment than "why would you want to do that?" and realising that there is a genuine need which this hack is an attempt to address.
Ben-Yehuda is like a CPU-bound Con Kolivas with an extra language barrier.
Posted Sep 3, 2009 13:29 UTC (Thu) by mingo (subscriber, #31122)
> Only now has the discussion reached the point where 'prominent' scheduler hackers are offering much more comment than "why would you want to do that?" and realising that there is a genuine need which this hack is an attempt to address.
As the article mentioned, the crux of the issue is a dynamic (not HZ driven) scheduler tick.
If you followed scheduler development you might have noticed that this (dynamic scheduler tick) was implemented 1.5 years ago by Peter Zijlstra (who is the other scheduler maintainer in addition to myself).
For details, see this upstream commit:
commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri Jan 25 21:08:29 2008 +0100
sched: high-res preemption tick
It was released in the v2.6.26 kernel iirc.
Nobody was really interested in it though and it had stability problems so it's disabled currently. It's a nice feature and completing that would speed up _all_ applications which are currently interrupted HZ times a second.
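As a rough back-of-the-envelope sketch of what "interrupted HZ times a second" means in CPU time, the snippet below computes the fraction of one CPU spent servicing a periodic tick. The per-tick cost is a hypothetical parameter, not a measurement, and `tick_overhead` is a name made up for this illustration.

```python
def tick_overhead(hz: int, tick_cost_us: float) -> float:
    """Fraction of each second spent servicing the periodic tick.

    hz           -- ticks per second (the kernel's HZ setting)
    tick_cost_us -- assumed cost of one tick in microseconds (hypothetical)
    """
    return hz * tick_cost_us / 1_000_000


if __name__ == "__main__":
    for hz in (100, 250, 1000):
        # 5 us per tick is an illustrative assumption, not a measured value.
        print(f"HZ={hz}: {tick_overhead(hz, 5.0):.2%} of one CPU")
```

This only counts the direct cost; it says nothing about the latency and cache disturbance each interruption causes, which is likely the larger concern for a workload that wants an undisturbed core.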
So not only have the scheduler maintainers realized this problem years ago, they have also implemented a rough prototype solution as well and tried to productize it. Given enough interest in this topic, it could be finished - most of the code is still there.
So i'm with Thomas on this one: the 'offline scheduler' is on the wrong track in its current form and we can do better than that. The scheduler maintainers (have to) insist on things to be implemented correctly and cleanly so that as many Linux applications can benefit from the end result as possible - not just the proprietary code Ben-Yehuda claimed the 'offline scheduler' was designed for.
Dynamic scheduler tick
Posted Sep 4, 2009 14:33 UTC (Fri) by razb (guest, #43424)
Hello Mingo
1. The offline scheduler is about treating a processor as a device. This is why I am offloading it. In my essay I compared several partitioning systems: CPU sets, INtime and IBM partitions. I did not compare it to dynticks because dynticks is simply a different matter.
2. The offline scheduler has other features that monitor (RTOP) and protect the kernel (offline firewall) when that is not otherwise possible.
Posted Sep 4, 2009 15:30 UTC (Fri) by mingo (subscriber, #31122)
> Hello Mingo
> 1. The offline scheduler is about treating a processor as a device. This is why I am offloading it. In my essay I compared several partitioning systems: CPU sets, INtime and IBM partitions. I did not compare it to dynticks because dynticks is simply a different matter.
The "offline scheduler" is, as you say, a CPU partitioning scheme.
Our (oft repeated) point is that Linux already has a CPU partitioning scheme: cpusets. It can be configured dynamically and will isolate one (or more) CPUs just fine.
This cpusets scheduler feature was added to the Linux kernel 4.5 years ago, in 2005, and was released as part of the v2.6.12 Linux kernel. It has been part of Linux ever since - continuously fixed/updated/enhanced.
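To make the cpusets mechanism concrete, here is a minimal sketch of the control-file writes that would dedicate a CPU to one task. It assumes the legacy cpuset filesystem is mounted at `/dev/cpuset`; the set name, CPU number and PID are hypothetical, and `plan_isolated_cpuset` is a helper invented for this example. It only builds the plan and writes nothing.

```python
from pathlib import PurePosixPath


def plan_isolated_cpuset(name, cpus, mem_nodes, pid, root="/dev/cpuset"):
    """Return the (file, value) writes that would set up an exclusive cpuset.

    Nothing is written here. The caller (as root, with the cpuset
    filesystem mounted at `root`, which is an assumption) would first
    create the directory `root/name`, then write each value in order.
    """
    base = PurePosixPath(root) / name
    cpu_list = ",".join(str(c) for c in sorted(cpus))
    mem_list = ",".join(str(m) for m in sorted(mem_nodes))
    return [
        (str(base / "cpus"), cpu_list),      # CPUs owned by this set
        (str(base / "mems"), mem_list),      # memory nodes it may use
        (str(base / "cpu_exclusive"), "1"),  # sibling sets may not share them
        (str(base / "tasks"), str(pid)),     # move one task into the set
    ]


if __name__ == "__main__":
    # Hypothetical values: set "rt", CPU 3, memory node 0, PID 1234.
    for path, value in plan_isolated_cpuset("rt", {3}, {0}, 1234):
        print(f"echo {value} > {path}")
```

Note that this covers task placement only: tasks left in the parent set can still be scheduled on the same CPU, and interrupt routing is configured separately.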
If cpusets as implemented today does not fit your needs then the (upstream acceptable) solution is not to add a completely different facility with its extra layering, but to fix the currently existing one.
That will benefit all current cpusets users as well beyond enabling the usecases you are interested in.
A new facility is only added if the old one is unfixable. That has not been outlined here - it has not even been argued to be unfixable. [If that is proven then the new facility will simply replace the old (broken) one.]
This is really how the Linux kernel is developed - and always was. We try to avoid reinventing the wheel and we try to avoid duplicate functionality in the core kernel as much as possible. This is what is happening here too.
It sure does mean extra work and requires willingness to work with existing upstream facilities.
Duplicate/overlapping functionality quickly becomes a mess to users and is unmaintainable as well in the long run due to the increased complexity. We try to avoid such overlap and duplication as much as possible.
The lkml discussions with you stalled because you basically only repeated your arguments why you'd want to have the offline scheduler (which in itself is fine) - without showing much interest in improving existing kernel facilities or showing that they are unfixable (which is not fine if you want to enhance the upstream kernel).
Anyway, there are lots of possibilities for how to continue this on the technical level. Everyone agrees that undisturbed CPU cores are desirable, so if you (or someone else) implement it correctly it will be accepted upstream - and gladly so. The job of a maintainer (like me) is to say 'no' to patches that are not (yet) good enough technically.
Thanks,
Ingo
Dynamic scheduler tick
Posted Sep 4, 2009 20:46 UTC (Fri) by razb (guest, #43424)
Hello again Ingo
Well, I understand your arguments and agree with the "upstream" consideration. The offline scheduler approach is aggressive. When I offlined NAPI, I had to do some rewriting in dev.c.
> The lkml discussions with you stalled because you basically only
> repeated your arguments why you'd want to have the offline scheduler
> (which in itself is fine) - without showing much interest in improving
> existing kernel facilities or showing that they are unfixable (which is
> not fine if you want to enhance the upstream kernel).
In the case of cpu sets, I argue that cpu sets do not provide complete partitioning. That is, I cannot ask for one packet from a 10 Gbps interface to be moved to processor X and another packet from the same 10 Gbps interface to be moved to processor Y. Why should a flash video packet be moved to processor 7 if processor 7 is heavily busy with incoming FTP traffic?
To the best of my knowledge, a NAPI context is triggered by the first packet, and it can run on any processor "in the affinity".
But this is possible by offlining NAPI: simply route packets by their service type, not by IRQ masking. And who cares about cache misses if I have an entire processor to do that work?
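The routing idea can be illustrated with a small user-space sketch. This is not code from the offline scheduler patch or from NAPI; it only shows the classification step, picking a CPU from the service type instead of from the interrupt's affinity mask. The port-to-CPU table, `cpu_for_packet` and `dispatch` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical service-to-CPU table; ports and CPU numbers are illustrative.
SERVICE_CPU = {
    21: 7,    # FTP control traffic
    1935: 6,  # flash video (RTMP)
}
DEFAULT_CPU = 0  # everything else stays on a general-purpose CPU


def cpu_for_packet(dst_port: int) -> int:
    """Choose the target CPU from the packet's service type (its port)."""
    return SERVICE_CPU.get(dst_port, DEFAULT_CPU)


def dispatch(packets):
    """Group packets into per-CPU queues according to their service type."""
    queues = defaultdict(list)
    for pkt in packets:
        queues[cpu_for_packet(pkt["dst_port"])].append(pkt)
    return dict(queues)


if __name__ == "__main__":
    traffic = [{"dst_port": 21}, {"dst_port": 1935}, {"dst_port": 80}]
    for cpu, queue in sorted(dispatch(traffic).items()):
        print(cpu, queue)
```

With this table, two packets arriving on the same interface end up on different processors, so video traffic is not queued behind FTP traffic.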
But you are correct that I haven't replied with technical details. I just posted the link to the essay.
What is the correct way to isolate a processor? What are the restrictions? What are the requirements?
Raz
Dynamic scheduler tick
Posted Sep 4, 2009 21:07 UTC (Fri) by mingo (subscriber, #31122)
> [...] In the case of cpu sets, i argue that cpu sets do not provide complete partitioning. [...]
Obviously they do not, as otherwise you would not have implemented your patch.
My point, which i outlined in more detail in my reply above, is that there are two approaches possible that are acceptable for upstreaming:
- either extend and fix cpusets with the features you desire
- or prove/show that that's impossible or undesirable. (in which case your solution will have to replace cpusets, cover all its usecases, migrate all its APIs and users smoothly, etc., etc.)
You took a third approach: "I added it as a new, separate, special-purpose feature, not integrated with existing cpusets facilities because it was the easiest for me that way".
That is the ... short-term easy but long-term expensive answer which people on lkml objected to for good reasons. We've been there, we've done that, we are still suffering the consequences ;-)
Linux is an 18+ year old kernel; there are not that many easy projects left in it anymore :-/ Core kernel features that look basic and which are not in Linux yet often turn out to be not that simple.
I hope this explains our point of view. We can continue this discussion on lkml - i'm very interested in extensions to cpusets, and Peter Zijlstra outlined models for integrating IRQ space partitioning into the cpusets model (he called them system-sets). He sent a few prototype patches to lkml as well - early 2008 IIRC. Those could be picked up and finished, if you are interested.
Thanks,
Ingo
Dynamic scheduler tick
Posted Sep 15, 2009 8:21 UTC (Tue) by linuxrocks123 (guest, #34648)
I've been using dynamic tick for over a year. I just checked and I have the kernel option for it enabled in my 2.6.29.6 kernel. When was it disabled? Will you bring it back?
The offline scheduler
Posted Sep 3, 2009 22:06 UTC (Thu) by razb (guest, #43424)
:) I simply decided to stay low. I did not know it would irritate so many people.
raz
The offline scheduler
Posted Sep 4, 2009 15:36 UTC (Fri) by mingo (subscriber, #31122)
As far as i'm concerned the patches do not irritate me - why should they? (I don't find them upstream acceptable in their current form but hey, most of my own feature patches are not acceptable in their initial form either ;-)
"Staying low" is the worst possible strategy if you want to improve the upstream kernel. Engaging in the process and listening to upstream feedback and acting on suggestions is important.