
KS2011: Scheduler testing

By Jonathan Corbet
October 24, 2011
2011 Kernel Summit coverage
Google's Paul Turner started his session by saying that scheduler testing is a hard problem. Results are hard to reproduce, especially when users cannot share their workloads, and remote debugging is impossible. There are a lot of possible machine topologies, expanding the problem space. The lack of testability makes developers reluctant to make significant changes to the scheduler; it is too easy to break somebody else's workload and not find out until it is too late. But there are a lot of changes that need to be made, especially (for Paul, at least) in the areas of load balancing and power management. What is to be done?

Load balancing is a concern at Google; having idle CPUs when there are processes waiting for a chance to run on other CPUs is not an ideal situation. It is not too hard to come up with some metrics to describe the "goodness" of the scheduler's load balancing decisions; it is just a matter of looking at the state of the system and determining whether a job on one CPU should have been placed on a different one. The problem is that this computation is sufficiently hard that it can't be done in real time; otherwise there would be no real load balancing problem.
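
As a rough illustration, here is a minimal sketch, in C, of one way such a score could be computed offline. The metric and the sample run-queue data are invented for this example - this is not Google's actual formula. It simply counts runnable tasks that were queued behind others while some CPUs sat idle:

    #include <stdio.h>

    #define NR_CPUS 4

    /* Hypothetical sample: runnable tasks on each CPU at one instant. */
    static const int nr_running[NR_CPUS] = { 3, 0, 2, 0 };

    /*
     * Count tasks that were waiting behind another task on a busy CPU
     * while an idle CPU was available to run them.  A perfect balancer
     * would score zero here.
     */
    static int imbalance_score(const int *rq, int ncpus)
    {
        int idle = 0, waiting = 0;

        for (int i = 0; i < ncpus; i++) {
            if (rq[i] == 0)
                idle++;
            else
                waiting += rq[i] - 1;  /* tasks beyond the one running */
        }
        return waiting < idle ? waiting : idle;
    }

    int main(void)
    {
        printf("imbalance score: %d\n",
               imbalance_score(nr_running, NR_CPUS));
        return 0;
    }

A real metric would have to aggregate scores like this over an entire trace of scheduling decisions - exactly the kind of computation that is too expensive for the scheduler's hot paths, but trivial after the fact.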

Enter LinSched, a scheduler simulator for Linux. This tool was released in 2010, but, unfortunately, "it was awful." The idea was good, but the implementation was not. So the folks at Google have reworked the whole thing, ending up with something that looks a lot like user-mode Linux, but which is aimed at scheduler testing. There are no hooks placed in the scheduler itself, so it is easy to change the scheduler or apply patches before testing. It is fast, and has support for most scheduler features. All of the instrumentation and support has been pushed off into a new "linsched" virtual architecture.

The next step is to come up with useful workloads. Google engineers observed a number of their workloads and distilled them into patterns covering 500 different situations. With the new linsched, they can quickly test all of these workloads with a variety of system topologies; the whole thing can be run under a debugger, so it is easy to stop and examine decisions that go wrong. Finding the "wrong" decisions is a matter of calculating the load-balancing score for a reference scheduler, then looking for places where a test scheduler generates worse scores. This tool has already been used to find (and fix) workload-specific regressions resulting from some mainline scheduler patches.
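
The comparison step can be pictured with a small, hypothetical harness; the workload names and scores below are invented, but the logic - flag any workload whose score under the test scheduler is worse than under the reference - is the one described above:

    #include <stdio.h>

    struct result {
        const char *workload;
        double ref_score;   /* reference scheduler (lower is better) */
        double test_score;  /* scheduler with patches under test */
    };

    /* Invented example data: one score per simulated workload. */
    static const struct result results[] = {
        { "workload-a", 1.8, 1.7 },
        { "workload-b", 2.4, 3.1 },
        { "workload-c", 0.9, 0.9 },
    };

    int main(void)
    {
        for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++) {
            const struct result *r = &results[i];

            if (r->test_score > r->ref_score)
                printf("regression: %s (%.1f -> %.1f)\n",
                       r->workload, r->ref_score, r->test_score);
        }
        return 0;
    }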

They would like to release this code and push it into the kernel's tools/ directory. That could have happened as early as the 3.2 cycle, but the kernel.org outage slowed things down somewhat. Interest in the tool has been high among developers and in academic communities alike.

It is safe to say that this news was warmly received in the room at the kernel summit. Having a test tool like this gives developers a much higher degree of confidence when they are making changes to the scheduler. So Ingo Molnar welcomed the tool, asking only that the perf infrastructure be used as much as possible. He would like to see the workload descriptions merged as well so that everything needed to run scheduler tests will be present in the mainline kernel. Scheduler problems, he said, traditionally take a long time to find; having a tool like this would allow the scheduler to be improved much more aggressively. He could see a day when no scheduler patches would be accepted before they had passed this set of tests.

A number of other things could be added to the tool in the future - simulating preemption latency, for example. It didn't take developers long to say that they would like to have a similar tool for other parts of the kernel - the memory management subsystem came to mind. That is a rather harder problem; the memory management subsystem is considerably more complex than the scheduler. But even being able to exhaustively test just the scheduler is a big step in the right direction.




KS2011: Scheduler testing

Posted Oct 26, 2011 13:12 UTC (Wed) by mingo (subscriber, #31122) [Link]

So Ingo Molnar welcomed the tool, asking only that the perf infrastructure be used as much as possible.

Just a quick clarification: Paul's scheduler analysis and simulation tool already uses 'perf sched record' to collect data, and its analysis and simulation capabilities overlap with (and eclipse) what 'perf sched' is about, so what I asked for was, in essence, for Paul's tool to become 'perf sched'.

What we don't want is for the old 'perf sched' to stay around dangling.

BFS

Posted Oct 26, 2011 16:36 UTC (Wed) by abacus (guest, #49001) [Link]

Not a single word about BFS? Does that mean that the upstream scheduler is now superior to BFS in all regards?

BFS

Posted Oct 26, 2011 19:41 UTC (Wed) by corbet (editor, #1) [Link]

The BFS author is not interested in creating a general-purpose scheduler, and he is not interested in working with the development community. It is not surprising that his work is not on the agenda at a meeting like this.

BFS

Posted Oct 27, 2011 10:47 UTC (Thu) by intgr (subscriber, #39733) [Link]

BFS was announced in 2009; that's ages ago on kernel-development time scales. Mainline scheduler developers analyzed and discussed it back in those days, made some improvements to CFS, and moved on. It's simply not relevant anymore.

BFS wasn't "superior in all regards", and you shouldn't expect that from the mainline scheduler either. Every heuristic algorithm will be better in some cases and worse in others.

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds