Rethinking power-aware scheduling

Posted Jan 13, 2012 17:52 UTC (Fri) by daglwn (guest, #65432)
In reply to: Rethinking power-aware scheduling by jhhaller
Parent article: Rethinking power-aware scheduling

That's exactly right. Even in the HPC world where I work, power is already at the top of the list of concerns. Performance always matters, but we don't have the unlimited resources we could imagine having just a few years ago.

One thing that troubles me about the conversation is the idea that one can determine power needs based on whether the machine is running on battery or not. I know that it can be customized. It's the thought process and assumptions made that concern me. We're at the point where EVERYONE needs low power, just various degrees of it.

Rethinking power-aware scheduling

Posted Jan 13, 2012 22:20 UTC (Fri) by dlang (subscriber, #313) [Link]

everyone needs low power, but not everyone is willing to sacrifice performance to get low power.

that's the key issue here.

the optimal performance thing is to distribute the work as widely as possible to reduce the performance impact of shared resource contention (even if that shared resource is just the cache attached to a particular core)

But that leads to many cores running at a small fraction of their capacity.

the optimal power saving mode is to get as many cores as possible to be completely idle so that they can be powered down, even if this reduces performance.

which one is the right choice depends on what you are trying to do, but if I purchase a machine with 8 cores, I don't want the system slowing my response time by 10% because it thinks that approximately the same performance can be achieved by only using 4 cores. If I was willing to accept that, I would have saved money (and even more power) by only buying 4 cores in the first place.
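To make the two placement strategies described above concrete, here is a minimal Python sketch. It is not how the kernel scheduler works; the function names (`spread`, `pack`, `idle_cores`), the per-task load fractions, and the per-core capacity of 1.0 are all assumptions made for illustration.

```python
def spread(loads, n_cores):
    """Performance-oriented placement: each task goes to the least-loaded core."""
    cores = [0.0] * n_cores
    for load in sorted(loads, reverse=True):
        target = min(range(n_cores), key=lambda c: cores[c])
        cores[target] += load
    return cores


def pack(loads, n_cores, capacity=1.0):
    """Power-oriented placement: fill one core before touching the next (first fit)."""
    cores = [0.0] * n_cores
    for load in sorted(loads, reverse=True):
        for i in range(n_cores):
            if cores[i] + load <= capacity + 1e-9:
                cores[i] += load
                break
        else:
            # Nothing has room left: fall back to the least-loaded core.
            target = min(range(n_cores), key=lambda c: cores[c])
            cores[target] += load
    return cores


def idle_cores(cores):
    """Cores with no work at all, i.e. candidates for being powered down."""
    return sum(1 for c in cores if c == 0.0)


# Hypothetical workload: four tasks, each needing a quarter of a core, on 8 cores.
tasks = [0.25, 0.25, 0.25, 0.25]
spread_result = spread(tasks, 8)   # four cores at 25%, four idle
pack_result = pack(tasks, 8)       # one core at 100%, seven idle
```

With these made-up numbers, spreading leaves four cores idle and no contention on any core, while packing leaves seven cores idle at the cost of four tasks competing for one core and its cache.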

Rethinking power-aware scheduling

Posted Jan 13, 2012 22:32 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

You gain performance by reducing cache contention, but you also potentially gain performance by having multiple threads on the same core and running directly out of cache. Bursty workloads may also benefit from being concentrated on one package in order to reduce the likelihood of that package entering deep package C states, while still giving an overall power win because the other packages can enter them.

It's not an either/or scenario. If you care deeply about performance then you need to tune your scheduler for your specific workload, just like you end up having to tune the VM or io scheduler.

Rethinking power-aware scheduling

Posted Jan 13, 2012 23:08 UTC (Fri) by dlang (subscriber, #313) [Link]

if you have processes/threads sharing something, then you should take that into account when scheduling them to reduce the cost of the sharing.

but if you are talking about processes (which is the more common case), then you don't gain anything by having them share a cache, and in fact you are less likely to allow them to run out of cache if they share it because they will be contending for the space.

if you have a problem of entering C states reducing your performance, then the answer should be to change the controller that is causing you to enter the C states to hurt you less.

this doesn't require you to tune the scheduler for every workload, it simply takes accepting the fact that what is best for power is not going to be best for performance, and therefore not insisting that 'everyone cares about power', which implies that the power saving mode is the only one that should matter.

going back up the thread a few posts: the heuristic (on battery power you are probably willing to sacrifice a bit of performance for significant reductions in power use; on line power you probably are not) does represent the real world. It's not perfect, which is why it is a default, not a hard-coded mode, but it's a pretty accurate heuristic.
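The "default, not a hard-coded mode" idea can be sketched in a few lines of Python. This is purely illustrative: the `Policy` enum and `choose_policy` function are hypothetical names, not an existing kernel or userspace interface.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    PERFORMANCE = "performance"
    POWERSAVE = "powersave"


def choose_policy(on_battery: bool, override: Optional[Policy] = None) -> Policy:
    """Pick a scheduling policy; an explicit setting always beats the heuristic."""
    if override is not None:
        return override
    # The heuristic under debate: battery implies power saving, mains implies performance.
    return Policy.POWERSAVE if on_battery else Policy.PERFORMANCE


laptop_unplugged = choose_policy(on_battery=True)
# An always-plugged-in machine whose administrator asked for power saving.
tuned_server = choose_policy(on_battery=False, override=Policy.POWERSAVE)
```

The disagreement in this thread is about what the final `return` line should say for a machine on mains power, not about whether the override should exist.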

Rethinking power-aware scheduling

Posted Jan 13, 2012 23:15 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

It's not the heuristic our customers ask for, so I'd be interested to know how you're defining it as accurate.

Rethinking power-aware scheduling

Posted Jan 13, 2012 23:23 UTC (Fri) by dlang (subscriber, #313) [Link]

It's exactly what people are used to.

on their laptops, when they unplug, the screen dims slightly and the system switches to a more aggressive power saving mode.

your customers are not asking for it explicitly because they are used to getting it by default.

most of them won't realise what the problem is if they don't get it, they will just consider their device sluggish (or at least not as fast as the competition) if they don't get peak performance when plugged in, and they will consider the device/OS to be a power hog if it doesn't last as long when on battery power.

Rethinking power-aware scheduling

Posted Jan 13, 2012 23:28 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

Since we mostly sell into the enterprise server market, I'm pretty sure that that's not what they're talking about.

Rethinking power-aware scheduling

Posted Jan 13, 2012 23:39 UTC (Fri) by dlang (subscriber, #313) [Link]

Ok, I don't know who you are or what you market, but I also don't know very many enterprise servers that have battery powered modes.

Rethinking power-aware scheduling

Posted Jan 13, 2012 23:44 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

None. That's the point. They want aggressive power management despite these devices always being plugged in. The assumption that just because you're not running off battery you're not interested in power management is one that's untrue for a huge proportion of Linux users. It's in no way an accurate heuristic.

Rethinking power-aware scheduling

Posted Jan 14, 2012 0:02 UTC (Sat) by dlang (subscriber, #313) [Link]

so if you want aggressive power management, you set it. nobody is preventing it.

But setting aggressive power management for all cases as the default for everyone is wrong.

Rethinking power-aware scheduling

Posted Jan 14, 2012 22:43 UTC (Sat) by raven667 (subscriber, #5198) [Link]

But powering more hardware than required to run the workload is wasteful of power for no benefit. Isn't it a good idea to work on the scheduler so that it can run the computer just as hard with just as much power use as necessary and no more? You may want a tunable to make the power saving so aggressive that it affects performance, although the wisdom IIRC is that making workloads run slow makes them use more power by running longer. Making the default no power saving at all is probably not reasonable.

Rethinking power-aware scheduling

Posted Jan 14, 2012 22:48 UTC (Sat) by dlang (subscriber, #313) [Link]

there is no way to have power savings with no performance penalty under any conditions.

it takes time to bring CPUs out of sleep states, and during that time the work that is waiting for them may not be able to get done.

it is not always less power to run at full speed and then sleep. that is frequently the case, but it depends on how quickly you can move in and out of sleep, along with the amount of power saved while sleeping.

In this case, we are talking about the options when you have multiple cores, some sharing components, and have less work than it takes to max out all the cores.

putting all the work on one core and powering off the other cores may save power, but it could make the work take longer (but not enough longer to use more power than the other cores would consume if they were not powered down). for some people having the work take slightly longer won't matter, for others it will.
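The trade-off between running fast and then sleeping versus running slower can be worked through with a simple energy model. The sketch below is an assumption-laden illustration: the wattages, durations, and transition cost are invented numbers, and `energy_joules`, `race_to_idle`, and `run_slow` are hypothetical helper names.

```python
def energy_joules(active_w, active_s, idle_w, window_s,
                  transitions=0, transition_j=0.0):
    """Energy over a fixed window: busy time, idle time, and sleep transitions."""
    if active_s > window_s:
        raise ValueError("work does not fit in the window")
    busy = active_w * active_s
    idle = idle_w * (window_s - active_s)
    return busy + idle + transitions * transition_j


WINDOW_S = 10.0  # hypothetical observation window in seconds


def race_to_idle(idle_w, transition_j):
    # Assumed: full speed draws 20 W and finishes the work in 2 s.
    return energy_joules(20.0, 2.0, idle_w, WINDOW_S, 1, transition_j)


def run_slow(idle_w, transition_j):
    # Assumed: reduced speed draws 12 W and finishes the same work in 4 s.
    return energy_joules(12.0, 4.0, idle_w, WINDOW_S, 1, transition_j)


# Deep sleep available (1 W idle, 0.5 J per transition): racing wins.
deep = (race_to_idle(1.0, 0.5), run_slow(1.0, 0.5))       # (48.5, 54.5)
# Only a shallow idle state reachable (6 W idle): running slowly wins.
shallow = (race_to_idle(6.0, 0.0), run_slow(6.0, 0.0))    # (88.0, 84.0)
```

Under these assumptions the answer flips depending on how much power the idle state actually saves, which is the dependency described above.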

Rethinking power-aware scheduling

Posted Jan 15, 2012 3:53 UTC (Sun) by raven667 (subscriber, #5198) [Link]

There is no reason to think that a power aware scheduler can't be good enough to be the default, is there? Even for latency sensitive operations the scheduler could keep some amount of idle capacity available for bursts of work without running the whole machine at full bore. It seems to me that power saving should be the default even for machines on mains power.

Rethinking power-aware scheduling

Posted Jan 15, 2012 13:48 UTC (Sun) by mjg59 (subscriber, #23239) [Link]

Nonsense. Turbo mode is an example of aggressive power management resulting in significantly enhanced performance under certain workloads.

Rethinking power-aware scheduling

Posted Jan 15, 2012 11:09 UTC (Sun) by liljencrantz (guest, #28458) [Link]

What makes you so sure that your intuition on what is an accurate heuristic for power management is so much better than Matthew Garrett's? As a kernel developer who seems to work almost full time on power issues for Red Hat, one would hope that he has a more than passing familiarity with the needs of the enterprise market. If your answer is along the lines of intuition/personal experience, then perhaps you should consider the possibility that your needs are the atypical ones? If your answer is something entirely different, then please enlighten us, because right now it might seem like you're stating opinions as facts.

My somewhat limited personal experience is that most data centers I've worked with have a total power limit per rack that is painfully low, and that reducing power usage by a few watts per system would allow us to stuff in one more server per rack, leading to a significant amount of savings. This resonates well with what Garrett is saying.
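The rack arithmetic is easy to check with a short Python sketch. The 4000 W budget and the per-server draws are invented figures chosen only to show the effect, and `servers_per_rack` is a hypothetical helper.

```python
def servers_per_rack(rack_budget_w: int, server_w: int) -> int:
    """How many whole servers fit under a fixed per-rack power limit."""
    if server_w <= 0:
        raise ValueError("server power draw must be positive")
    return rack_budget_w // server_w


RACK_BUDGET_W = 4000  # assumed per-rack limit

before = servers_per_rack(RACK_BUDGET_W, 100)  # 40 servers at 100 W each
after = servers_per_rack(RACK_BUDGET_W, 97)    # 41 servers at 97 W each
```

With these assumed numbers, shaving 3 W off each system is enough to fit one additional server into the same rack.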

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds