LWN: Comments on "Schedulers: the plot thickens"
https://lwn.net/Articles/230574/
This is a special feed containing comments posted to the individual LWN article titled "Schedulers: the plot thickens".

*Ouch*. (https://lwn.net/Articles/232067/)
By slamb, Thu, 26 Apr 2007 21:58:57 +0000

> But, on the other hand, runqueue handling is an issue that, so far, every scheduler has shown problems with, one time or another. The radical idea of doing away with them altogether certainly deserves a closer look.

I'm surprised by the combination of no one doing this before and someone doing it now. Before reading the recent article about the scheduler, I'd only seen priority queues implemented as an array of plain queues in cases where there were only a handful of priorities. When there's one per nice level (140?) or many more (priority is a function of nice level and timeslice left), it seems like trees or heaps would be an obvious choice. Having a sorted structure seems much simpler than doing these minor and major rotations to the array, with this secondary "expired" array.

So given that they originally did this a different way, the logical question is why. Was it so the time complexity could be O(p) [*] rather than O(log n)? Well, now Ingo's apparently thinking that's not important. How many processes was the O(1) scheduler designed to support? How long does Ingo's scheduler take to run in that situation?

If O(log n) does turn out to be a problem, I wonder if a mostly-sorted soft heap (http://en.wikipedia.org/wiki/Soft_heap) would be better at amortized O(1). Faster as a whole, and "corrupting" a fixed percentage of priorities might not be a bad way to avoid total starvation of low-priority processes, but amortized time might mean it'd be too jerky to be used.

[*] p = number of priorities. From skimming the description, it doesn't look like it was ever O(1) like the cool kids said; they just considered p to be a constant 140.
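To make the point about priority arrays concrete, here is a minimal user-space sketch of the kind of structure being discussed: an array of plain FIFO queues, one per priority level, plus a bitmap recording which levels are non-empty. Picking the next task means scanning the bitmap, so the cost grows with the number of priority levels p (a constant 140 here), not with the number of runnable tasks. All names and the exact layout are invented for illustration; this is not the kernel's code, only a sketch of the technique.

    /*
     * Toy illustration of an O(1)-scheduler-style priority array:
     * one FIFO per priority level plus a bitmap.  Picking the next
     * task scans the bitmap, so it costs O(p) with p = NPRIO levels.
     */
    #include <stdio.h>

    #define NPRIO 140                    /* priority levels, 0 = highest */
    #define BITS  (8 * sizeof(unsigned long))

    struct task {
        int          prio;
        const char  *name;
        struct task *next;               /* FIFO chaining within a level */
    };

    struct prio_array {
        unsigned long bitmap[(NPRIO + BITS - 1) / BITS];
        struct task  *queue[NPRIO];      /* head of each per-priority FIFO */
        struct task  *tail[NPRIO];
    };

    static void enqueue(struct prio_array *a, struct task *t)
    {
        t->next = NULL;
        if (a->queue[t->prio])
            a->tail[t->prio]->next = t;
        else
            a->queue[t->prio] = t;
        a->tail[t->prio] = t;
        a->bitmap[t->prio / BITS] |= 1UL << (t->prio % BITS);
    }

    static struct task *pick_next(struct prio_array *a)
    {
        /* Find the first non-empty priority level: O(p), not O(n). */
        for (int p = 0; p < NPRIO; p++) {
            if (!(a->bitmap[p / BITS] & (1UL << (p % BITS))))
                continue;
            struct task *t = a->queue[p];
            a->queue[p] = t->next;
            if (!a->queue[p])
                a->bitmap[p / BITS] &= ~(1UL << (p % BITS));
            return t;
        }
        return NULL;
    }

    int main(void)
    {
        struct prio_array a = { { 0 } };
        struct task t1 = { 120, "editor" }, t2 = { 100, "audio" };

        enqueue(&a, &t1);
        enqueue(&a, &t2);
        printf("next: %s\n", pick_next(&a)->name);   /* prints "audio" */
        return 0;
    }

The alternative raised in the comment above is a single sorted structure (a heap or balanced tree), where insertion costs O(log n) but the next task to run is simply the smallest element, with no minor/major rotations and no secondary "expired" array.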
Schedulers: the plot thickens (https://lwn.net/Articles/231619/)
By jospoortvliet, Tue, 24 Apr 2007 12:25:38 +0000

There isn't really a standard test suite, but many of the problem cases (and the little code snippets to show 'em) are sticking around, and are used for testing new scheduler improvements and changes. Quite a few float around on the LKML and also on Dr Con's mailing list.

Scheduler architecture and modularity (https://lwn.net/Articles/231075/)
By brugolsky, Thu, 19 Apr 2007 13:12:39 +0000

The "modularity" in Ingo's queue interface would seem to lend itself toward implementing something similar to the traffic control packet scheduler framework. IIRC, OpenVZ uses a TBF-based hierarchical fair scheduler; it would be interesting to see it ported to CFS.

Schedulers: the plot thickens (https://lwn.net/Articles/231073/)
By i3839, Thu, 19 Apr 2007 12:34:13 +0000

Another important property of Ingo's scheduler is that time is measured in nanoseconds. According to a later email, Ingo said that the earlier version without the high precision and using queues didn't work well at all. These are the two things that seem to have been limiting RSDL and other schedulers, as strange artefacts cropped up because of the queue-based, low-granularity design.

Peter Williams (of plugsched) also wrote a scheduler, and has experience with trying out different things.

William Lee Irwin III (now I understand why people call him wli ;-) is hammering on the importance of a standard test suite for schedulers, so if there are people with free time who want to help him with setting one up...

rbtree is per-CPU (https://lwn.net/Articles/231028/)
By smurf, Thu, 19 Apr 2007 06:47:28 +0000

Gah. Forgive my sloppy use of language. By "single RB tree" I meant "a single tree to replace the former run queue structure", not "a single tree on the whole system". Process migration between CPUs is, after all, not going to go away.

On another front, despite Con being rather miffed by Ingo's original patch, the subsequent dialog between them is a model of mutual respect that lots of people can learn from. Myself included.

rbtree is per-CPU (https://lwn.net/Articles/231029/)
By kwchen, Thu, 19 Apr 2007 06:28:47 +0000

Has anyone experimented with one scheduling entity per NUMA node, or one per physical CPU package, etc., instead of the current one per CPU?

rbtree is per-CPU (https://lwn.net/Articles/231027/)
By axboe, Thu, 19 Apr 2007 06:18:40 +0000

Hi,

The rbtree task timeline is per CPU, it's not a global entity. The latter would naturally never fly.

*Ouch*. (https://lwn.net/Articles/231024/)
By smurf, Thu, 19 Apr 2007 06:09:36 +0000

I can certainly understand Con.

But, on the other hand, runqueue handling is an issue that, so far, every scheduler has shown problems with, one time or another. The radical idea of doing away with them altogether certainly deserves a closer look.

I'm interested in how the single-RB-tree idea will perform on a machine with a lot of CPUs. Off to check gmane.lists.linux.kernel ...

Schedulers: the plot thickens (https://lwn.net/Articles/231015/)
By dlang, Thu, 19 Apr 2007 02:33:53 +0000

If Ingo or Linus get fired up on a subject, the process for testers is:

1. find the latest version of the patch
2. download it
3. if on a slow link, go back to #1
4. compile it
5. if on a slow CPU, go back to #1
6. start testing
7. find a bug
8. go back to #1
9. report the bug

With normal developers you can count on the code being stable for a day or two after release; you don't have to keep checking for new releases.

Yes, this is slightly exaggerating things, but not by much (sometimes it seems like the time it takes for you to read the e-mail announcing a release is enough time for an update).
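To tie together two points raised in the comments above, the nanosecond-resolution accounting i3839 mentions and the per-CPU rbtree timeline axboe describes, here is a rough sketch of the idea. A plain unbalanced binary search tree stands in for the kernel's red-black tree to keep the example short, and the field and function names are invented for illustration; treat it as a sketch of the concept under those assumptions, not as Ingo's implementation.

    /*
     * Sketch: each CPU keeps its runnable tasks in a tree ordered by a
     * nanosecond-resolution key; the next task to run is simply the
     * leftmost node.  An unbalanced BST stands in for the rbtree.
     */
    #include <stdio.h>
    #include <stdint.h>

    struct task {
        uint64_t     key_ns;     /* e.g. accumulated runtime in ns      */
        const char  *name;
        struct task *left, *right;
    };

    struct cpu_runqueue {
        struct task *timeline;   /* one tree per CPU, not a global one  */
    };

    static void timeline_insert(struct task **node, struct task *t)
    {
        if (!*node) {
            *node = t;
            return;
        }
        if (t->key_ns < (*node)->key_ns)
            timeline_insert(&(*node)->left, t);
        else
            timeline_insert(&(*node)->right, t);
    }

    static struct task *timeline_pick_next(struct task *node)
    {
        if (!node)
            return NULL;
        while (node->left)       /* leftmost = smallest key = runs next */
            node = node->left;
        return node;
    }

    int main(void)
    {
        struct cpu_runqueue cpu0 = { NULL };
        struct task a = { 4000000, "kworker", NULL, NULL };
        struct task b = { 1500000, "firefox", NULL, NULL };

        timeline_insert(&cpu0.timeline, &a);
        timeline_insert(&cpu0.timeline, &b);
        printf("cpu0 runs: %s\n", timeline_pick_next(cpu0.timeline)->name);
        return 0;
    }

Because the tree is ordered by a nanosecond key, "who runs next" falls out of the ordering itself: the leftmost node is always the most entitled task, and there are no arrays to rotate or expire, which is the contrast with the priority-array sketch earlier in the thread.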