Matthew Garrett on the race to idle

Matthew Garrett talks about power saving strategy in his unique manner. "Some people write software that lets you choose different power profiles depending on whether you're on AC or battery. Typically, one of the choices lets you reduce the speed of your processor when you're on battery. This is bad. It is wrong. The people who implement these programs are dangerous. Do not listen to them. Do not endorse their product and/or newsletter. Do not allow your eldest child to engage in conjugal acts with them. Doing this will reduce your battery life. It will heat up your home. It will kill baby seals. The sea will rise and your car will float away. If you are already running it, make sure that it always sets your cpufreq governor to ondemand and does not limit the frequencies in use. Failure to do so will result in me setting you on fire."
Matthew Garrett on the race to idle

Posted May 9, 2008 12:13 UTC (Fri) by JoeBuck (subscriber, #2330) [Link]

I think Matthew might be oversimplifying. The real issue is whether, on a given processor, halving the clock frequency saves more than half the power, or less than half the power.

The dynamic power consumed by a CMOS circuit is proportional to the square of the voltage, while the clock speed is proportional to the voltage itself. Leakage power is constant. So, at least for embedded processors, like ARMs, where leakage power is low, voltage scaling can be a win over sleeping: you halve the speed and power drops by nearly a factor of four.

For 65 nm or 45 nm silicon, leakage power is considerable, and the only way to get rid of it is to sleep. Also, noise margins might not allow all voltages to be scaled very much. So it might well be that, for a given processor, voltage scaling always loses as compared to sleeping. But it certainly isn't true in most cases for embedded systems under a predictable load (doing DSP applications like video encoding/decoding or CDMA/GSM modem operation).
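
The trade-off described above can be sketched numerically. This is a toy model, not a measurement: it assumes dynamic power follows C·V²·f with voltage scaled in proportion to frequency (so dynamic power goes as the cube of the clock), leakage power that is constant while the core is awake, and a sleep state that draws `idle_power`. The function name and the relative units are illustrative only.

```python
def task_energy(freq_scale, leak_fraction, window, idle_power=0.0):
    """Relative energy to finish one fixed task inside a time window.

    At full speed (freq_scale = 1.0) the task takes 1 time unit and draws
    1 unit of power, of which leak_fraction is leakage. Assumptions of this
    sketch: voltage scales with frequency, leakage is constant while awake,
    and the core sleeps at idle_power for the rest of the window.
    """
    busy_time = 1.0 / freq_scale
    if busy_time > window:
        raise ValueError("task does not fit in the window at this speed")
    dynamic_power = (1.0 - leak_fraction) * freq_scale ** 3
    active_power = dynamic_power + leak_fraction
    return active_power * busy_time + idle_power * (window - busy_time)


# Low-leakage part: running at half speed wins (0.25 versus 1.0).
low_leak_scaled = task_energy(0.5, 0.0, window=2.0)
low_leak_raced = task_energy(1.0, 0.0, window=2.0)

# Leaky part (half of full-speed power is leakage): racing to sleep wins.
leaky_scaled = task_energy(0.5, 0.5, window=2.0)
leaky_raced = task_energy(1.0, 0.5, window=2.0)
```

Under these assumptions the instantaneous dynamic power at half speed falls to one eighth, while the energy per task, which is what the battery sees, falls to one quarter. Once leakage dominates, the longer run time at the lower clock costs more than it saves.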

Matthew Garrett on the race to idle

Posted May 9, 2008 12:17 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

Yeah, I should probably have made it clear that I'm primarily talking about x86 here. Things
are likely to be different in the embedded world, especially given the different constraints
that system on chip designs tend to impose on you.

Matthew Garrett on the race to idle

Posted May 9, 2008 23:56 UTC (Fri) by felixfix (subscriber, #242) [Link]

You still have to allow for the fact that the rest of the system (disk drive, display) is
still drawing power.  It's like speeding up a small section of code by a factor of ten -- if
that code only runs 10% of the time, you have only speeded up the entire program by 9% (isn't
this Amdahl's law?).  Factoring in just the CPU power doesn't answer the question properly.
You may end up using more power and drawing down your battery even further.

In fact, TFA said this.
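
The Amdahl's law arithmetic above can be checked with a short sketch. The function names are made up for illustration; the formula is the standard one, where `fraction` is the share of run time affected and `local_speedup` is how much faster that share becomes.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when only part of it gets faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def time_saved(fraction, local_speedup):
    """Share of the original run time that is eliminated."""
    return 1.0 - 1.0 / overall_speedup(fraction, local_speedup)


# Code that runs 10% of the time, made ten times faster.
speedup = overall_speedup(0.10, 10.0)
saved = time_saved(0.10, 10.0)
```

With these inputs the run time shrinks by about 9% (a speedup of roughly 1.1x). The same reasoning applies to power: if the CPU is only a modest share of total system draw, even a large CPU-only saving moves the battery life figure by a small amount.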

Matthew Garrett on the race to idle

Posted May 11, 2008 4:25 UTC (Sun) by IkeTo (subscriber, #2122) [Link]

> Leakage power is constant.

Should it be proportional to voltage?

Matthew Garrett on the race to idle

Posted May 11, 2008 18:10 UTC (Sun) by bronson (subscriber, #4806) [Link]

It's been many years since I've done VLSI but, yes, we normally modeled leakage as a constant.
It's mostly affected by temperature and process characteristics, not voltage.

If applied voltage is near the thermal voltage (25.9mV) then I suppose you might have to
include voltage in the equation.  But we're nowhere near that today and I'm not sure anyone
would ever want to run that close to the noise floor (I know, I know, 640K should be enough
for anybody...  we'll see!)

Matthew Garrett on the race to idle

Posted May 12, 2008 2:31 UTC (Mon) by IkeTo (subscriber, #2122) [Link]

> we normally modeled leakage as a constant.

My understanding is that leakage *current* is nearly constant (unless the voltage difference
is very small), so leakage power is proportional to voltage, and leakage energy is
proportional to voltage multiplied by time, i.e., voltage divided by frequency.  Since voltage
divided by frequency does not increase very much when you decrease frequency, this is nearly
constant energy.  If leakage *power* is constant instead, the leakage energy will be
proportional to the inverse of frequency.  Then voltage scaling would be doing something very
bad to energy consumption!

Matthew Garrett on the race to idle

Posted May 13, 2008 2:20 UTC (Tue) by forthy (guest, #1525) [Link]

> Should it be proportional to voltage?

It actually goes exponentially with voltage and temperature. This means scaling up the voltage for higher frequency and in turn heating up the CPU increases leakage by factors, which makes the original power saving equation somehow dubious.

Matthew Garrett on the race to idle

Posted May 13, 2008 3:08 UTC (Tue) by IkeTo (subscriber, #2122) [Link]

Would you mind clarifying what you mean by "original power saving equation" and "dubious"?

Matthew Garrett on the race to idle

Posted May 11, 2008 8:39 UTC (Sun) by daniels (subscriber, #16193) [Link]

Um, except that chips like ARMs are hyper-optimised for sleep, and most (I'm excluding
insanity like StrongARM here) consume an absolute crapteenth of what they do running, while
asleep? It's not even really so much a race to idle, as a race to sleep, or even retention.

Matthew Garrett on the race to idle

Posted May 9, 2008 13:16 UTC (Fri) by mbizon (subscriber, #37138) [Link]


What about applications that eat all CPU cycles no matter what?

Think of a game with no FPS limit, sure the gaming experience won't be as good at lowest
frequency, but at least you get to play the game longer.

Matthew Garrett on the race to idle

Posted May 9, 2008 13:44 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

But running at the same FPS without limiting the frequency would let you play for even longer.

Matthew Garrett on the race to idle

Posted May 9, 2008 14:41 UTC (Fri) by mbizon (subscriber, #37138) [Link]


That's not always practically possible: some games don't make this a tunable, and of course if
they are not open source, good luck getting this changed.

And how about users? Do they have to remember they're running on battery and fix settings
accordingly? Or do we have to make all applications of this kind 'PM aware'?


And by the way, fixing the FPS has drawbacks.

Let's imagine you have a variable frame rate of 20 to 50 at lowest frequency depending on your
position/viewpoint in the game. Capping the frame rate at 50 and running at higher speed does
not guarantee that you'll play longer. Capping at 20 does not give you the same visual result
at all.

Matthew Garrett on the race to idle

Posted May 9, 2008 18:09 UTC (Fri) by dlang (subscriber, #313) [Link]

racing to idle works if you can predict the future workload well enough, and if the wakeup time
to ramp back up to full speed is fast enough to not annoy the user.

if either of these is not the case it may very well be that you are better off running at a
lower clock rate even if you are slightly less efficient in cycles/watt.

like all benchmarks, it depends on your actual load.

if the sleep/wake time can get to be fast enough (I think <100ms is frequently fast enough)
the user won't notice the difference, and then you can start to shift to the 'race to idle'
mode, but if it takes longer than that to respond to a keystroke the user will start noticing.

Matthew Garrett on the race to idle

Posted May 9, 2008 18:12 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

Shifting from C4 to C0 takes around 17 microseconds on my hardware, which is pretty typical. I
think that's somewhat less than 100 milliseconds :)

Matthew Garrett on the race to idle

Posted May 9, 2008 18:17 UTC (Fri) by dlang (subscriber, #313) [Link]

try it again with a USB device plugged in and the numbers can change drastically.

some hardware combinations work well at shifting from one mode to another, others don't work
nearly as well.

David Lang

Matthew Garrett on the race to idle

Posted May 9, 2008 18:52 UTC (Fri) by mjg59 (subscriber, #23239) [Link]

USB's tendency to trigger DMA means that you're going to spend more time in C2 than would be
ideal, but on anything made in the past 5 years that's still going to result in you saving
more power than staying in C0 at a low voltage. Recent hardware will even automatically
promote itself from C3 to C2 without OS intervention.

Matthew Garrett on the race to idle

Posted May 9, 2008 23:06 UTC (Fri) by dlang (subscriber, #313) [Link]

in theory you are right, in practice it doesn't work as well.

take the OLPC laptop, designed for very good power management at the hardware level (with
serious talk of going to sleep between keystrokes). due in large part to the need to have long
enough delays to properly talk to things like external USB and SD devices this machine takes
>200ms to wake up.

yes there is definitely room for improvement in the software, but the reliable interfacing to
external (but effectively permanently attached) devices is bad enough that doing race-to-idle
ends up being a horrible thing in practice (they tried doing it in a few builds, it was bad
enough that people started disabling sleep entirely)

Matthew Garrett on the race to idle

Posted May 10, 2008 4:25 UTC (Sat) by mjg59 (subscriber, #23239) [Link]

That's not race to idle, it's race to suspend. Linux is currently deficient in its requirement
that the entire device tree be resumed before userspace can restart, which results in suckage
like you describe. But that's an orthogonal issue - if you're talking about the processor
rather than the platform, then the power saving states add only small levels of latency.

Matthew Garrett on the race to idle

Posted May 10, 2008 4:37 UTC (Sat) by dlang (subscriber, #313) [Link]

if you aren't switching to a suspend mode when you hit the idle state, what is the benefit of
becoming idle?

you can blame it on whatever layers you want, but for the user the result is the same, in
practice the race-to-idle approach does not currently give a reasonable user experience (in
many cases), and as a result, switching to a lower clock speed instead of race-to-idle is
actually better.

Matthew Garrett on the race to idle

Posted May 10, 2008 4:53 UTC (Sat) by mjg59 (subscriber, #23239) [Link]

The benefit is that your CPU power draw drops to approximately nothing. On modern CPUs,
halving the speed of your processor doesn't halve its power draw. Letting it idle takes it
down to 0-1 watts.

Matthew Garrett on the race to idle

Posted May 11, 2008 5:23 UTC (Sun) by IkeTo (subscriber, #2122) [Link]

> On modern CPUs, halving the speed of your processor doesn't halve its power draw.

Hm... I read otherwise somewhere else, if you count only the power of the CPU.  (Actually it
should save more than half of its power, otherwise why slow down?)  Would you mind sharing
with us where you got this idea?

Matthew Garrett on the race to idle

Posted May 10, 2008 12:31 UTC (Sat) by dilinger (subscriber, #2867) [Link]

Please do not use the OLPC laptop as an example.  The power management software is not nearly
close to being finished, and OLPC has suffered greatly from lack of manpower (I'll spare you
the details of _why_ we haven't had enough people).  In reality, there should be no reason why
we can't do <200ms  resume; however, no one within the organization has even _started_
optimizing away the extra 800ms that we deal with.

The automatic suspend stuff that we had been toying with was merely a hack.

Matthew Garrett on the race to idle

Posted May 11, 2008 8:44 UTC (Sun) by daniels (subscriber, #16193) [Link]

I wouldn't call Geode/OLPC 'designed for good power management': it's lower power than most
x86, sure, but isn't even in the same league as consumer device hardware like ARM and MIPS.
If you can actually measure the wake-from-sleep time at all, then you've lost.

Consumer hardware like the Nokia 770/N800/N810, OpenMoko, and similar, all race to sleep, and
sleeps/resumes much, much more often than you think (probably by a factor of thousands).  It
really can be about as transparent as the x86 execution/idle switch if your hardware is
decent, and you do it right.  Current Linux on ARM definitely does it right.

USB does make this more difficult by its very nature, but this isn't any more a problem with
the concept of race-to-sleep than FireWire's remote DMA security fiasco is a problem with the
concept of externally pluggable mass storage.

Matthew Garrett on the race to idle

Posted May 10, 2008 2:58 UTC (Sat) by farnz (subscriber, #17727) [Link]

Unfortunately, this conflicts hard with real-time constraints; I work with a soft real-time system based on Linux, and I've had to disable dropping into deeper C-states. Our system includes a smooth text scroller, which works by updating the screen every frame (16 milliseconds); thanks to the high performance of X11 on Intel Q35, the update takes less than a millisecond, and then that thread goes to sleep until the next frame starts.

With the latest Intel processors, we found that the screen jerked, as there was nothing but this update on one core, and it was going into a low C-state as soon as the update completed, then not coming out of idle until too late.

I note that there's an in-kernel mechanism to let drivers tell the scheduler about their latency needs (allowing them to always be scheduled on the core that's not being put into a low C-state); it's a shame that there's no way for a process to do the same, yet.

Matthew Garrett on the race to idle

Posted May 10, 2008 4:27 UTC (Sat) by mjg59 (subscriber, #23239) [Link]

I agree with that. Applications need to be able to indicate that they're unable to tolerate
latency, and the scheduler and CPU governor need to cope with the extra restriction. I've been
talking to some of the embedded people about this, in terms of what sort of userspace-visible
knobs we need for effective power management without screwing up userspace.

Matthew Garrett on the race to idle

Posted May 12, 2008 11:03 UTC (Mon) by mgross (subscriber, #38112) [Link]

kernel/pm_qos_params.c provides an interface for communicating acceptable latencies.  (new in
2.6.25)

--mgross
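
For a userspace process, a minimal sketch of using this interface might look like the following. It assumes the PM QoS code exposes a `/dev/cpu_dma_latency` device node that accepts a native 32-bit integer giving the maximum tolerable latency in microseconds, and that the request stays in force only while the file descriptor is held open. The function name is invented for the example, and writing to the device normally requires suitable privileges.

```python
import os
import struct


def request_max_latency(max_latency_us, path="/dev/cpu_dma_latency"):
    """Ask the kernel to keep wakeup latency at or below max_latency_us.

    Returns the open file descriptor. The request is assumed to remain
    active until the caller closes it, so keep it open for as long as
    the latency bound is needed.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # The kernel side is assumed to read one signed 32-bit value
        # in host byte order.
        os.write(fd, struct.pack("i", max_latency_us))
    except OSError:
        os.close(fd)
        raise
    return fd
```

A frame-driven application like the text scroller discussed above would call `request_max_latency(...)` once at startup with a bound well under its frame period, hold the descriptor while scrolling, and `os.close()` it afterwards so that deeper C-states become available again.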

Matthew Garrett on the race to idle

Posted May 12, 2008 18:49 UTC (Mon) by nix (subscriber, #2304) [Link]

... and it's built in unconditionally, even if you have no power 
management configured, which seems rather strange.

Matthew Garrett on the race to idle

Posted May 10, 2008 12:39 UTC (Sat) by renox (guest, #23785) [Link]

Weird, if your application needs to update the screen every 16ms, this means that you're using
some kind of timers to wake up every 16ms, so the kernel ought to be aware of this deadline
and change the C-state of the CPU accordingly.

Either I'm misunderstanding something or there is a bug somewhere, have you discussed this on
the LKML?

Matthew Garrett on the race to idle

Posted May 10, 2008 13:23 UTC (Sat) by farnz (subscriber, #17727) [Link]

Not yet discussed this on the LKML - disabling C states is a good enough workaround for now, and I'm currently knee-deep in VIA hardware issues to debug.

We're not using timers at all - we use the DRM to wait for VBlank. The kernel can't easily know (without kernel modesetting, which is a whole different can of worms to fix) that the frame rate of our screens is 60Hz. Once kernel modesetting lands, I'll certainly be looking into ensuring that the kernel is aware of the latency limits that wait for VBlank implies.

Matthew Garrett on the race to idle

Posted May 13, 2008 7:41 UTC (Tue) by daenzer (subscriber, #7050) [Link]

> We're not using timers at all - we use the DRM to wait for VBlank.

But then use X11 for rendering? If so, part of the reason for the latency being too high could
be the several context switches between the interrupt and the rendering operation taking
place. By using OpenGL with current Mesa and a current drm Git snapshot, it would be possible
to have the DRM emit the buffer swap operation from a tasklet triggered by the interrupt, and
the application could generate frames ahead of time and transparently sleep when it's too far
ahead.

Matthew Garrett on the race to idle

Posted May 13, 2008 8:46 UTC (Tue) by farnz (subscriber, #17727) [Link]

We do use X11 for rendering, but it's definitely the move from a low C-state that kills us. We can easily measure the latency of the rendering operation; in the worst case we've seen thus far (not on an Intel platform), we see around 0.03 milliseconds between the wait for VBlank syscall returning, and us being about to enter the wait again. Note that we sync the X stream at this point, so we don't re-enter the wait until the server has finished drawing.

Note that we don't draw by blitting a new frame from scratch; we delta the existing on-screen frame instead, which minimises the work involved.

Matthew Garrett on the race to idle

Posted May 11, 2008 1:19 UTC (Sun) by jd (guest, #26381) [Link]

It also depends on the nature of the hardware. Anything that can be offloaded from the CPU
onto a lower-power device would allow you to do work without CPU intervention and therefore
not require the CPU to be active. We might see more offloading and more intelligent
peripherals as PCIe 2.x gets marketshare, but certainly more if power savings start proving
substantial.

What if I want to save power on a desktop

Posted May 11, 2008 2:56 UTC (Sun) by rganesan (subscriber, #1182) [Link]

Do desktop CPUs support C states? powertop says that "Detailed C-state information is only
available on Mobile CPUs (laptops)". My desktop is pretty much on most of the time and I run
cpufreq with ondemand governor. With multicore being the way forward, does turning off/on
cores on demand make sense on the desktop? 

Remember Cyrix and HLT?

Posted May 11, 2008 4:56 UTC (Sun) by kleptog (subscriber, #1183) [Link]

Way back when I had a Cyrix chip and one nice thing about it was that when the OS did a HLT in
the idle loop (which linux did at the time) the power usage dropped significantly. Which was
good since these chips had a nasty habit of overheating when you tried to do anything complex.

Is this something they did and no-one ever followed up on? Because it seemed to me like a
rather obvious idea that everyone would do, but it seems I'm wrong... Seems rather a lot
easier than switching sleep states several times per second.

Remember Cyrix and HLT?

Posted May 11, 2008 8:36 UTC (Sun) by mjg59 (subscriber, #23239) [Link]

C states are effectively a more advanced implementation of the hlt instruction. C1 is
equivalent to hlt, with deeper states disabling more of the CPU and introducing more latency.
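
To see which C-states a given machine offers and how much exit latency each one adds, something like the sketch below can be used. It assumes the kernel's cpuidle sysfs layout, with one `stateN` directory per idle state containing `name` and `latency` (in microseconds) files; the helper name is illustrative, and not every kernel or platform exposes this directory.

```python
from pathlib import Path


def read_idle_states(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return [(state_name, exit_latency_us), ...] ordered from shallow to deep."""
    state_dirs = sorted(
        Path(cpuidle_dir).glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),
    )
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        latency_us = int((state_dir / "latency").read_text())
        states.append((name, latency_us))
    return states
```

Calling `read_idle_states()` on a laptop and printing the pairs gives a quick view of the latency cost of each deeper state, which is the figure that matters for the latency-sensitive cases discussed elsewhere in this thread.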

Overheating

Posted May 13, 2008 1:24 UTC (Tue) by ajtucker (subscriber, #11974) [Link]

After reading the article and re-enabling cpufreq-ondemand, I now remember why I use the
userspace cpuspeed daemon instead.  After 10 minutes or so my laptop switches itself off when
the CPU has reached something over 90 degrees C.

So while the benefit of switching states quickly is nice (it certainly feels more responsive),
the kernel CPU governors definitely need to be able to keep an eye on, and be constrained by,
the CPU temperature, as Matthew mentioned briefly in the article.

Overheating

Posted May 13, 2008 10:17 UTC (Tue) by zlynx (subscriber, #2285) [Link]

It sounds to me that your laptop needs to be taken apart and have the dust blown out.  You may
also have one of those problems where the heatsink uses a fluid coolant that has leaked out.

Overheating

Posted May 15, 2008 9:11 UTC (Thu) by ajtucker (subscriber, #11974) [Link]

Thanks, it'd been a while since I'd given it a good blast, and it certainly helped -- I've
reverted back to using the kernel on-demand governor and the temperature is more stable.

I still stand by the comment that it should be possible to tell the governor to take into
account the CPU temperature and trip points and throttle back if necessary.

Overheating

Posted May 22, 2008 10:24 UTC (Thu) by anton (guest, #25547) [Link]

> I still stand by the comment that it should be possible to tell the governor to take into account the CPU temperature and trip points and throttle back if necessary.

Since you have been happy with a userspace solution for that, you can also use a userspace solution in combination with ondemand:

Just lower the cpufreq/scaling_max_freq value when the CPU gets too hot. This will make ondemand lower the maximum frequency it chooses.
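
A rough sketch of such a userspace helper is shown below. Everything beyond `scaling_max_freq` is an assumption of the example: the thresholds, the function names, the presence of a `scaling_available_frequencies` file (some cpufreq drivers do not provide it), and a temperature file that reports millidegrees Celsius, as `/sys/class/thermal/thermal_zone0/temp` does on many systems.

```python
from bisect import bisect_right
from pathlib import Path


def pick_max_freq(temp_c, available_khz, current_max_khz, hot_c=85.0, cool_c=70.0):
    """Step the frequency cap down one notch when hot, up one notch when cool."""
    freqs = sorted(available_khz)
    # Index of the highest available frequency not above the current cap.
    idx = max(bisect_right(freqs, current_max_khz) - 1, 0)
    if temp_c >= hot_c and idx > 0:
        return freqs[idx - 1]
    if temp_c <= cool_c and idx < len(freqs) - 1:
        return freqs[idx + 1]
    return freqs[idx]


def apply_once(cpufreq_dir, temp_file, hot_c=85.0, cool_c=70.0):
    """Read temperature and current cap, then write a new cap if it changed."""
    cpufreq = Path(cpufreq_dir)
    available = [
        int(f) for f in (cpufreq / "scaling_available_frequencies").read_text().split()
    ]
    current = int((cpufreq / "scaling_max_freq").read_text())
    temp_c = int(Path(temp_file).read_text()) / 1000.0  # millidegrees assumed
    new_max = pick_max_freq(temp_c, available, current, hot_c, cool_c)
    if new_max != current:
        (cpufreq / "scaling_max_freq").write_text(str(new_max))
    return new_max
```

Run as root every few seconds, for example with `apply_once("/sys/devices/system/cpu/cpu0/cpufreq", "/sys/class/thermal/thermal_zone0/temp")`, this leaves ondemand in charge of moment-to-moment scaling and only moves the ceiling. The gap between the hot and cool thresholds is there to stop the cap from flapping.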

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds