Notes from the Montreal Linux Power Management Mini-Summit
From: Len Brown <lenb-AT-kernel.org>
To: Linux Power Management List <linux-pm-AT-lists.osdl.org>, linux-acpi-AT-vger.kernel.org
Subject: Montreal Linux Power Management Mini-Summit, July 13, 2009 - Meeting Notes
Date: Thu, 30 Jul 2009 18:04:40 -0400 (EDT)
Cc: Linux Kernel Mailing List <linux-kernel-AT-vger.kernel.org>
A Linux Power Management "mini-summit" was held on July 13th, 2009, the first day of the Montreal Linux Symposium. The Linux Symposium generously provided the facilities. We repeated the process used in 2008:
http://lwn.net/Articles/292447/

This year the meeting room was more accessible to the general attendees of the Linux Symposium, so we had a fair number of "drop-ins". 25 people signed in (listed below), plus a few more who came and went. While this exceeded our cap of 20, the extra people did not hinder our goal of focusing on a single discussion.

Attendees
---------
Len Brown - Intel - ACPI, SFI, Suspend co-Maintainer
Howard Alyne - Wind River
Pierre Phaneuf
Rafael J. Wysocki - SUSE Labs/Novell, U. Warsaw; Hibernate and Suspend Maintainer
Per-Inge Tallberg - Ericsson
Rickard Andersson - Ericsson
Paul Mundt - Renesas - SH Maintainer
Magnus Damm - Renesas
Richard Woodruff - Texas Instruments, OMAP
Stephen Hui - Zarlink
John Linville - Red Hat - Wireless LAN maintainer
Mark Brown - Marvell
Samuel Thibault - labri.fr
Lucas Nussbaun - inria.fr
Srinivas Sripathi - Motorola
Jason Baron - Red Hat
Aristu Rozanaski - Red Hat - RHEL6 kernel maintainer
Christopher Curtis - RipTide Software
Klaus Pedersen - Nokia
H. Peter Anvin - Intel - x86 maintainer
Ernest Szedeman - Nortel
Rick Leir - Leirtech
David Ahern - Cisco
Wending Wen - Rheinmetall
Jason Chagas - Marvell

Some of the attendees are in photos here:
http://picasaweb.google.com/lenb417/2009LinuxSymposium#

Agenda
------
1. Review changes over the last year
2. Survey tools, techniques, workloads
3. Discuss upcoming work

Summary of Power Management kernel changes since last year
----------------------------------------------------------
ACPI
  Platform BIOS compatibility fixes
    ACPI_SCI_EN work-around
    resume memory corruption workarounds
  hibernation: NVS memory handling
  handle overlapping memory zones
suspend/resume framework re-work (Rafael Wysocki)
  shipped suspend/resume RTC test feature
  ordering update/workaround
  simplified driver interface now available
    r8169 etc. drivers now using it
PCI PM framework re-worked to simplify drivers
graphics drivers better support suspend/resume
  i915 video restore, though it has bugs
  ATI making progress, especially on older cards
  NVIDIA - continues to trail
    no open source support for devices after 7200
power aware scheduling
  sched_mc_power_savings
per-CPU timers fixed
  clock_events_broadcast() bugs fixed
  (no longer needed on Westmere, which has an always-running LAPIC timer)
range timers shipped upstream
  e.g. range timers used by Android to group wakeups around wireless activity
Intel shipped Nehalem (Core i7), which has an always-running TSC
Run-time power management is receiving some attention now.

OMAP (Richard Woodruff)
  2008 had TI releasing aggressive full-off reference code on public portals
  Customers snapshotted this code at different points
    Heavy support burden ramping variants into production
  The Linux-OMAP community has been creating a cleaner version of the aggressive PM code, suitable for the mainline kernel, in the Linux-OMAP PM branch.
    Hope of reduced burden for future kernels with mainlined code

ACPI sub-system (Len Brown)
  Quality has been the focus for the last year. We continue to process about 300 bugs/year, with 50-60 unresolved at any given time.

Wireless (John Linville)
  mac80211 is now suspend/resume aware
  IEEE 802.11 has run-time power saving features
    e.g. negotiate w/ access point
    starting to deploy in drivers
  beacon filtering (reduces CPU wake-ups)
  TX power upcoming in cfg80211 API
  Nokia tablets pushing power savings

SH (Paul Mundt)
  cpuidle integration
  using clocksources & clockevents from upstream
    can switch between timers depending on sleep states
  Hibernate & STR enabled, can test w/ RTC & kexec-jump-and-return

s390: added suspend/resume support

5-second boot on Atom netbook for Moblin
  async API is upstream
  Fedora 11 boots in 20 seconds on a notebook
    down from 60 seconds in Fedora 10

PM-QOS shipped
  Documentation/power/pm_qos_interface.txt

Survey of Tools, Techniques, workloads for optimizing power management
----------------------------------------------------------------------
powertop
bootchart
bootgraph
CONFIG_POWER_TRACER=y
LTT-lite
performance counters for energy coming
OMAP uses on-board instrumentation
suspend/resume debug I/F

Power meters:
  O(100) Watts Up Pro
  O(600) Extech
  O(1000) Yokogawa
  O(600) HP/Agilent 34401A
OMAP: measure per-power-plane w/ lab instruments
  500mA vs uA range difficult to measure w/ precision
  multi-channel DAC - each channel calibrated to range

Workloads for measuring power:
  handheld: no standard workloads
    however device vendors have internal benchmarks
      #1 idle
      #2 specific workloads
      #3 combination use-case
  SPECpower benchmark for servers (only)
  Energy Star for client computers
    idle only
    requires STR to be enabled by default
  Energy Star Server spec coming
  Future Energy Star wants to use an energy benchmark
  BAPCO MobileMark 2007 for Windows
    Apple joined, so expect something new to work also on Apple
    No Linux Distro representation
  EEMBC released something or other...
  BLTK (Battery Life Toolkit) for Linux
    http://www.lesswatts.org/projects/bltk/
    could use refresh
    could use handheld
    new workloads

Future plans for the PM development, kernel side
-------------------------------------------------
cpuidle
  C-states generalized to be platform idle states...
  platform driver can hide platform hooks into CPU power states
Runtime PM for Platform Devices
  2.6.32 framework plan simmering
  SH running on top of prototype now
  context save/restore for power off power domain
  platform devices: SH specific - Magnus
  IO devices, e.g. PCI, USB - Alan Stern
clock framework (started in ARM, now common on embedded)
  includes ref-counts/clock
  architecture specific implementation
  x86/ACPI systems don't expose clock dependencies, so unclear benefit to that arch
Run-time PM of I/O devices, mostly from the PCI POV
  ability to put device into D1/D2 (~200us) / D3 (10ms)
  wakeup: PCIe #PME
  plug-event via root port (PCI #PME is less well specified)
  ACPI 4.0 adds D3hot
    Q: has an effect on _S3D?

Hibernate/suspend:
  Axiom: we need more people fixing suspend/resume bugs
Suspend2 aka "Tux on Ice"
  The Spring 2009 patch set to replace hibernate w/ TOI was deemed impractical by the upstream community, which prefers an incremental approach.
  Since then, Nigel has sent specific patches to Rafael along the lines of the gradual cherry-picking that upstream needs. The first example is a patch to compress the hibernation image, which Rafael thinks can be integrated.
  TOI is able to save larger hibernate images due to how it manages memory. This is a nice benefit and we'd like to see if we can do it upstream.
  patch review bandwidth limited
  1. image compression
  2. image saving performance, currently very slow
  3. ability to use multiple devices to save images, including multiple swaps and regular files
  4. break the half-of-memory image limitation
  5. image encryption (solution for keys is an issue)
  It would be great to have Nigel supporting upstream hibernate.
  TOI supports snapshot boot via "kiosk mode"
Hibernate & kexec
  kexec-jump is upstream (i386, SH, no x86_64)
    simplifies memory management of the "jumped to" code
    unclear if any other advantages.
  kexec-crash-dump is useful
    can make an oops "look less scary" and be automatic
STR performance
  eliminate console switch
  async device resume
Android submitted "auto-suspend" patches
  compromise between low-level and high-level suspend invocation policy.
  cpuidle vs auto-suspend
    suspend is more "draconian", it stops timers etc. for you.
    platform drivers in cpuidle can get to the same place.
Android
  OHA (Open Handset Alliance) controls the Android license(s)
  Android = access to app-store
  Moblin shall support Android applications
OMAP & SH specifics
  UIO - user space codec etc. have no concept of PM
    could use clock framework extension
    (clock framework is accessible via debugfs if necessary)
  interrupt coalescing
  deferred I/O to LCD
    delay until regular (infrequent) update interval
    use X Damage extension to track changes to the visible screen
  SH running cpufreq on top of clock framework
    cpufreq has notifiers, clock framework does not
lightweight CPU hotplug
  IBM proposed "idle throttling" approach using the scheduler
  Intel is proposing a simple "forced idle" RT thread
  PeterZ likes neither implementation, but favors the IBM approach in the long term.
  SH SMP wants to run Itron on some cores...
    low latency transition is important
Memory Power Management
  Nokia project w/ U. in Brazil
    more pain than gain in memory offline
    prototype "partial RAM self refresh"
  page tables for kernel memory would allow moving kernel physical memory
  memory off-line incompatible with high-performance interleaving
  using NUMA node to segment memory allows tracking unused memory
  anti-fragmentation went upstream last year
  consensus: online/offline node granularity only
ACPI 4.0 was published
  Error Reporting extensions
  processor aggregator device (forced idle to save power)
  D3hot
  generalized fan support
  thermal extensions
  IPMI op-region
  Len will do a Linux ACPI 4.0 presentation this Fall
virtualization power management
  PM is still an afterthought in the VMM space
    they have bigger problems
  KVM gets everything in Linux for free
    but could benefit from more info from the guests
  Xen gets to re-invent/port/re-implement everything in Linux
  VMMs have an easier time moving physical pages and thus doing memory power management
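The notes mention that the clock framework "includes ref-counts/clock". As a rough illustration only, the Python sketch below models that idea under stated assumptions: a clock is switched on by its first user, its parent is enabled along with it, and it is switched off only when its last user releases it. The `Clock` class and its methods are hypothetical; this is not the kernel's clock API or any architecture's actual implementation.

```python
class Clock:
    """Hypothetical model of a reference-counted clock in a clock tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.refcount = 0  # number of outstanding enable() calls

    @property
    def running(self):
        return self.refcount > 0

    def enable(self):
        # First user: a parent must run before its child, so enable it first.
        if self.refcount == 0 and self.parent is not None:
            self.parent.enable()
        self.refcount += 1

    def disable(self):
        if self.refcount == 0:
            raise RuntimeError(f"unbalanced disable of clock {self.name}")
        self.refcount -= 1
        # Last user: gate this clock and drop its reference on the parent.
        if self.refcount == 0 and self.parent is not None:
            self.parent.disable()


if __name__ == "__main__":
    osc = Clock("osc")
    bus = Clock("bus", parent=osc)
    uart = Clock("uart", parent=bus)
    mmc = Clock("mmc", parent=bus)

    uart.enable()
    mmc.enable()
    print(bus.refcount, osc.refcount)  # 2 1
    uart.disable()
    print(bus.running, osc.running)    # True True
    mmc.disable()
    print(bus.running, osc.running)    # False False
```

The parent holds one reference per running child, not one per user of each child, which is why `osc.refcount` stays at 1 while two devices share the bus clock.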
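The PM-QOS item points to Documentation/power/pm_qos_interface.txt. That document describes a user-space interface: a process opens a device node such as `/dev/cpu_dma_latency`, writes a signed 32-bit target value, and the request stays registered only while the file stays open. The following Python sketch follows that description. The helper names are hypothetical, and opening the node usually requires root.

```python
import os
import struct


def encode_qos_value(value):
    """Pack a PM QoS target as a native-endian signed 32-bit integer."""
    return struct.pack("=i", value)


def hold_cpu_dma_latency(max_latency_us, path="/dev/cpu_dma_latency"):
    """Register a CPU DMA latency request (in microseconds).

    The request remains in effect only while the returned file descriptor
    is open. Closing it withdraws the request.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, encode_qos_value(max_latency_us))
    except OSError:
        os.close(fd)
        raise
    return fd
```

For example, a latency-sensitive process could call `fd = hold_cpu_dma_latency(20)` before its critical phase and `os.close(fd)` afterwards. Tying the request to an open file means the kernel drops it automatically if the process exits.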
Posted Aug 3, 2009 16:06 UTC (Mon)
by ssam (guest, #46587)
The idea was that you start with a timeout of, say, 30 seconds. If the user wiggles the mouse within a few seconds of the screen blanking, you increase the timeout; otherwise, you reduce it.
That way, on days when I am staring at big chunks of code and not scrolling, if the screen goes blank I just wiggle the mouse and automatically get a longer timeout.
On days when I am reading a magazine but occasionally wake the screen to check something, it can blank faster.
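The adaptive timeout idea above can be sketched as a small state machine. This is a minimal sketch, and every constant is an assumption not given in the comment: the 5-second "too early" window, the 1.5x growth and 0.9x shrink factors, and the 10 to 600 second bounds. The class name is hypothetical.

```python
class AdaptiveBlankTimer:
    """Sketch of an adaptive screen-blank timeout (all constants are assumptions)."""

    def __init__(self, initial=30.0, minimum=10.0, maximum=600.0,
                 grace=5.0, grow=1.5, shrink=0.9):
        self.timeout = initial  # seconds of inactivity before blanking
        self.minimum = minimum
        self.maximum = maximum
        self.grace = grace      # a wake this soon after blanking means "too early"
        self.grow = grow
        self.shrink = shrink

    def on_wake(self, seconds_since_blank):
        """Call when user input un-blanks the screen; returns the new timeout."""
        if seconds_since_blank <= self.grace:
            # The user was still reading: blanking came too soon.
            self.timeout = min(self.timeout * self.grow, self.maximum)
        else:
            # The screen stayed dark for a while: blanking was welcome.
            self.timeout = max(self.timeout * self.shrink, self.minimum)
        return self.timeout


timer = AdaptiveBlankTimer()
print(timer.on_wake(2))  # 45.0: wiggled right after blanking, so grow
```

Growing faster than it shrinks means a single premature blank is corrected immediately, while the timeout creeps back down only after repeated blanks that the user did not interrupt.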
Posted Aug 4, 2009 3:54 UTC (Tue)
by jabby (guest, #2648)
Maybe it will be my first kernel hacking project if no one else jumps on it... Please, someone jump on it! You don't want my first foray into kernel code to be controlling your screen... ;o)
Posted Aug 4, 2009 6:06 UTC (Tue)
by dkite (guest, #4577)
Derek
Posted Aug 4, 2009 8:40 UTC (Tue)
by muwlgr (guest, #35359)
Posted Aug 4, 2009 14:07 UTC (Tue)
by zdzichu (subscriber, #17118)
Posted Aug 4, 2009 23:34 UTC (Tue)
by pimlottc (guest, #44833)
Perhaps there are more use cases for people who need to see what's on the screen without typing/mousing for long periods, but I would expect that to be niche.
Posted Aug 5, 2009 0:15 UTC (Wed)
by foom (subscriber, #14868)
That said, I prefer the behavior GNOME has these days: when you leave it alone, it starts slowly fading to black over 10 seconds or so. That seems much better than having a drawn-out dimmed screen period.
Posted Aug 3, 2009 21:09 UTC (Mon)
by shapr (subscriber, #9077)
Wait, what? They make s390 notebooks?
Posted Aug 3, 2009 21:32 UTC (Mon)
by elanthis (guest, #6227)
My desktop does suspend/hibernate, for example, so I can leave my machine powered off over night while still having my full session restored when I start it up in the morning.
For big power-hungry workstations, power saving is even more critical. The power consumed by workstations and servers is one of the biggest (if not the biggest) expenses a large ISP/IT department has to deal with.
Posted Aug 3, 2009 22:13 UTC (Mon)
by drag (guest, #31333)
Remember that with mainframe applications, vendors typically charge based on MIPS, so the more processor you have, the more everything costs. So in an efficiently running mainframe environment with proper setup and accounting, you should be running at about 100% CPU 24/7 to get the best value.
They are not like PCs, where the user or I/O is the bottleneck and the CPU spends most of its time idle... Mainframes tend to have massive amounts of I/O and relatively little CPU.
I would still like to have suspend-to-disk capabilities in a mainframe environment, however. Because of various hardware issues and whatnot, you do need to plan for downtime occasionally. Being able to suspend the Linux systems to disk would reduce that downtime. Instead of needlessly wasting CPU time booting up and initializing the system, you just load the memory snapshot, which should almost always be much faster on a system like that.
Posted Aug 3, 2009 22:43 UTC (Mon)
by ewan (guest, #5533)
Posted Aug 4, 2009 16:09 UTC (Tue)
by ewan (guest, #5533)
Posted Aug 4, 2009 14:21 UTC (Tue)
by man_ls (guest, #15091)
Posted Aug 4, 2009 20:41 UTC (Tue)
by dlang (guest, #313)
people
Even allowing for 2x power consumption (to cover cooling, etc.), servers on a 3-or-so-year replacement cycle would still cost more than the power they consume over that time (assuming maximum power draw the entire time).
Power is a significant cost, and since it shows up as a single line item it jumps out at people, but it's still not as bad as people are making it out to be.
Posted Aug 4, 2009 22:09 UTC (Tue)
by man_ls (guest, #15091)
Similarly, if each server costs 3k$, the break-even point is a lifecycle of just ~3 years. I would say that machines either cost more or use less juice, so servers should come out above power too.
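The lifecycle argument in the last two comments is simple arithmetic. The sketch below uses man_ls's figures: a 1 kW server drawing full power around the clock, electricity at 0.10$/kWh, and a 3k$ purchase price. The `overhead` factor models dlang's 2x allowance for cooling. The function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def annual_power_cost(kw, price_per_kwh, overhead=1.0):
    """Yearly electricity cost of a machine drawing `kw` continuously.

    `overhead` scales the draw to account for cooling etc. (1.0 = none).
    """
    return kw * overhead * HOURS_PER_YEAR * price_per_kwh


def breakeven_years(server_price, kw, price_per_kwh, overhead=1.0):
    """Years after which cumulative power cost equals the purchase price."""
    return server_price / annual_power_cost(kw, price_per_kwh, overhead)


print(round(breakeven_years(3000, 1.0, 0.10), 2))                # 3.42
print(round(breakeven_years(3000, 1.0, 0.10, overhead=2.0), 2))  # 1.71
```

Without overhead, the numbers match the "~3 years" estimate. With the 2x cooling factor, however, a full-draw 1 kW server overtakes a 3k$ purchase price in under two years. dlang's conclusion therefore holds only for machines that cost more or draw less, which is man_ls's point.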
Posted Aug 4, 2009 22:30 UTC (Tue)
by dlang (guest, #313)
If you have any serious use, you have at least two people (probably three) so that someone is available all the time (with vacations, sick time, etc.). A _lot_ of places which meet this criterion have fewer than the 220-330 servers that would be needed to maintain that ratio.
This ratio is also very dependent on how many different variations of server configurations you have. Google gets such phenomenal numbers of servers per admin because they have _lots_ of any one configuration. If they only had a couple thousand servers per configuration, they would need far more admins than they do ;-) They also don't have their admins deal with failures; they just shut down the failed systems.
In many ways I would rather have another 50 servers to manage that fit in one of my existing baselines than to add 1 special exception box that is completely different.
Posted Aug 13, 2009 1:41 UTC (Thu)
by deleteme (guest, #49633)
One baseline is good but not achievable.
"adaptive backlight dimming" ... brilliant!
backlight after a certain time of inactivity.
The only reason I like backlight dimming is that it is less abrupt and disruptive than simply turning
off the screen. So if I am actually reading the screen but not interacting, the dim is a more subtle
signal to twiddle the mouse than the screen suddenly going blank.
What? I would have sworn that making the payroll was a bigger expense than power costs. Especially in organizations with lots of development going around. Thing is, I never paid attention to energy costs so it may well be. Just curious: is it an impression of yours, or do you have numbers?
OT: Biggest expense
servers
power
It makes a lot of sense. Maybe at Google things are different, due to a couple of factors:
For the rest of us, things are different. At 0.10$/kWh, one server using 1 kW (a high-powered beast) at all times costs ~900$/yr. For a 100k$/yr admin (fully loaded), the break-even point is ~110 high-powered servers per admin -- roughly the industry average according to the first link. You have to manage more than that per admin to spend more on power than on people, so in an IT department with any development, payroll should win hands down.
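The staffing break-even in this comment can be checked the same way. The sketch below uses man_ls's assumptions (1 kW continuous draw, 0.10$/kWh, a 100k$/yr fully loaded admin). Scaling by two or three admins gives dlang's 220-330 server range. The function name is illustrative only.

```python
def servers_per_admin_breakeven(admin_cost_per_year, kw, price_per_kwh):
    """Servers one admin must run before their power bill exceeds the admin's cost."""
    yearly_power_cost = kw * 24 * 365 * price_per_kwh  # continuous draw
    return admin_cost_per_year / yearly_power_cost


per_admin = servers_per_admin_breakeven(100_000, 1.0, 0.10)
print(round(per_admin))                          # 114
print(round(2 * per_admin), round(3 * per_admin))  # 228 342
```

The exact figure is about 114 servers per admin, which the comment rounds to ~110. Servers drawing less than 1 kW push the break-even proportionally higher.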