TurboSched: the return of small-task packing
Those of us who have been in this field for far too long will, when seeing "turbo mode", think back to the "turbo button" that appeared on personal computers in the 1980s. Pushing it would clock the processor beyond its original breathtaking 4.77MHz rate to something even faster — a rate that certain applications were unprepared for, which is why the "go slower" mode was provided at all. Modern turbo mode is a different thing, though, and it's not just a matter of a missing front-panel button. In short, it allows a processor to be overclocked above its rated maximum frequency for a period of time when the load on the rest of system overall allows it.
Turbo mode can thus increase the CPU cycles available to a given process, but there is a reason why the CPU's rated maximum frequency is lower than what turbo mode provides. The high-speed mode can only be sustained as long as the CPU temperature does not get too high and, crucially (for the scheduler), the overall power load on the system must not be too high. That, in turn, implies that some CPUs must be powered down; if all CPUs are running, there will not be enough power available for any of those CPUs to go into the turbo mode. This mode, thus, is only usable for certain types of workloads and will not be usable (or beneficial) for many others.
A workload that would work well in turbo mode is one where the system as a whole is not fully utilized (so that some CPUs can be shut down), and where a relatively small number of processes can benefit from the higher CPU speeds. But that benefit will only be realized if the turbo mode can actually be used. The CPU scheduler in current kernels balances a great many requirements, but "make sure that some CPUs can go into turbo mode" has not been expressed as a need to be balanced in the past. It's thus unsurprising that the scheduler's operation is not optimal for systems with turbo mode and workloads that want to take advantage of that mode.
One problem in particular is that the scheduler is designed to keep the system as responsive as possible and to make the fullest use of the available CPUs. That goal is reflected in how processes are placed on CPUs throughout the system. If a sleeping process wakes up and needs to execute, the scheduler will try to place that process on an idle CPU, thus allowing it to execute immediately rather than waiting in a run queue. That is the right thing to do much of the time, but it is not ideal if your objective is to keep some CPUs powered down so that the others can run in turbo mode. In such cases, it might be better to put a newly awakened process onto a CPU that is already busy and let sleeping CPUs lie.
Getting the scheduler to pack more processes into running CPUs is the objective of the TurboSched patch set. But such packing needs to be done carefully; otherwise, scheduling latency could increase significantly and system utilization could be reduced. To avoid such problem, TurboSched limits this packing behavior to "jitter" processes — those that run sporadically for limited periods of time and which do not have significant response-time requirements. These processes are often doing some sort of housekeeping work and do not suffer from having to share a CPU with other work.
A question that immediately comes to mind is: how does the scheduler decide which processes fit into this "jitter" category? The answer is that it doesn't; such processes need to be specifically marked by user space. Specifically, TurboSched is built on top of the (still unmerged) scheduler utilization clamping patch set, which allows an administrator to impose limits on how much load any given process appears to put on the system. By putting an upper limit on the apparent load, the administrator can keep a given process from forcing a CPU's frequency to increase, even if that process will happily run 100% of the time. Processes marked this way already have a reduced claim to system CPU resources; TurboSched extends this interpretation and concludes that a sleeping CPU should not be powered up for processes whose maximum utilization is clamped.
The logic as implemented in the patch set actually goes a little beyond that, in that jitter processes will not be placed onto a CPU that is running at less that 12.5% of its capacity. The reasoning is that an underutilized CPU might well go idle soon; putting a new process there could prevent that from happening, which would be an undesirable result. Of course, it would also not be good to overload the running CPUs with jitter tasks, so there is an upper limit to how much jitter load can be placed on any given CPU.
This approach may seem familiar; it is reminiscent of the small-task packing algorithms that have been discussed since (at least) 2012. Small-task packing has never made its way into the mainline, so one might wonder why this variation would be different. The biggest difference this time is in the explicit marking of jitter tasks, which will effectively make TurboSched be a no-op on the bulk of the systems out there. In the absence of clamped tasks, the scheduler will run as it does now, so there should be no performance regressions for any existing workloads.
Meanwhile, the benefit for some workloads can be up to a 13% performance
increase, according to some impressive ASCII graphics in the patch
posting. This increase won't happen with all workloads, but on dedicated
systems with well-understood and tuned workloads with the right mix of
processes, TurboSched should make things run better. That, along with the
relatively noninvasive nature of the patch set, suggests that it might just
clear the high bar for scheduler changes.
Index entries for this article | |
---|---|
Kernel | Scheduler |
Posted Jul 2, 2019 10:32 UTC (Tue)
by nix (subscriber, #2304)
[Link]
Every desktop and server machine I've owned for the last decade has had turbo mode, so it's lovely to see a patchset that makes the scheduler properly aware of it --- but if this patchset requires explicit marking of processes, it seems unlikely that it'll be widely used :(
Speculating, perhaps the explicit marking could be automated away if there was a PAM module so you could group entire uids as interactive or not, and/or just mark everything that's a direct child of the X session as interactive -- or have the Wayland and X libraries mark clients of X or Wayland as interactive, and maybe arrange that things that talk to a TTY or PTY directly get an interactive marking... and then treat everything else with a low enough recent utilization as amenable to packing. (I'd imagine that new processes inherit the initial state of this mark from their parents, and probably you'd need a bit of code change to have make(1) and ninja etc mark themselves, since they are seriously noninteractive but only operate briefly).
Posted Jul 2, 2019 11:57 UTC (Tue)
by james (subscriber, #1325)
[Link]
Android has a very good idea which tasks are interactive at any one point. Systemd doesn't quite have that overview of the system, but it shouldn't be difficult for a distro or admin to identify services that "run sporadically for limited periods of time and ... do not have significant response-time requirements" and mark them as such in the unit file.
I note that the most recent version "removed cgroup based task
classification": I hope the systemd maintainers are consulted on the API.
Posted Jul 2, 2019 15:16 UTC (Tue)
by NightMonkey (guest, #23051)
[Link] (1 responses)
"1963 Timesharing: A Solution to Computer Bottlenecks": https://www.youtube.com/watch?v=Q07PhW5sCEk
Cheers!
Posted Jul 2, 2019 15:35 UTC (Tue)
by fotoba (subscriber, #61150)
[Link]
3.9 Merge window part 1
As of Inian run
Posted Jul 2, 2019 19:04 UTC (Tue)
by jhoblitt (subscriber, #77733)
[Link] (2 responses)
Posted Jul 3, 2019 10:56 UTC (Wed)
by nilsmeyer (guest, #122604)
[Link] (1 responses)
Posted Jul 3, 2019 13:50 UTC (Wed)
by Clozure (guest, #125255)
[Link]
Posted Jul 6, 2019 19:40 UTC (Sat)
by robert_s (subscriber, #42402)
[Link]
Wouldn't it be possible for this feature to piggyback on the PR_SET_TIMERSLACK setting in prctl? Surely a task which is marked as "ok being a little bit late" is exactly what this feature is looking for...
TurboSched: the return of small-task packing
TurboSched: the return of small-task packing
The biggest difference this time is in the explicit marking of jitter tasks, which will effectively make TurboSched be a no-op on the bulk of the systems out there.
Once upon a time this would have been true, but I'd be surprised if systemd didn't pick this up, and Android too (to the extent it isn't doing its own thing).
TurboSched: the return of small-task packing
TurboSched: the return of small-task packing
By Jonathan Corbet
February 20, 2013
A relatively simple scheduler patch fixes the "bouncing cow problem," wherein, on a system with more processors than running processes, those processes can wander across the processors, yielding poor cache behavior. For a "worst-case" tbench benchmark run, the result is a 15x improvement in performance.
https://lwn.net/Articles/538101/
https://www.youtube.com/watch?v=oFS8dxZ2SdU
TurboSched: the return of small-task packing
TurboSched: the return of small-task packing
TurboSched: the return of small-task packing
TurboSched: the return of small-task packing