From wakelocks to a real solution

By Jonathan Corbet
February 18, 2009

Last week's article on wakelocks described a suspend-inhibiting interface which derives from the Android project and the hostile reaction that interface received. Since then, the discussion has continued in two separate threads. Kernel developers, like engineers everywhere, are problem solvers, so the discussion has shifted away from criticism of wakelocks and toward the search for an acceptable solution. As of this writing, that solution does not exist, but we have learned some interesting things about the problem space.

Getting Linux power management to work well has been a long, drawn-out process, much of which involves fixing device drivers and applications, one at a time. There is also a lot of work which has gone into ensuring that the CPU remains in an idle state as much as possible. One of the reasons that some developers found the wakelock interface jarring was that the Android developers chose a different approach to power management. Rather than minimize power consumption at any given time, the Android code simply tries to suspend the entire device whenever possible. There are a couple of reasons for this approach, one of which we will get to below.

But we'll start with a very simple reason why Android goes for the "suspend the entire world" solution: because they can. The hardware that Android runs on, like many embedded systems (but unlike most x86-based systems), has been designed to suspend and resume quickly. So the Android developers see no reason to do things any other way. But that leads to comments like this one from Matthew Garrett:

> Part of the reason you're getting pushback is that your solution to the problem of shutting down unused hardware is tied to embedded-style systems with very low resume latencies. You can afford to handle the problem by entering an explicit suspend state. In the x86 mobile world, we don't have that option. It's simply too slow and disruptive to the user experience. As a consequence we're far more interested in hardware power management that doesn't require an explicit system-wide suspend.
>
> A solution that's focused on powering down as much unused hardware as possible regardless of the system state benefits the x86 world as well as the embedded world, so I think there's a fairly strong argument that it's a better solution than one requiring an explicit system state change.

Matthew also notes that it's possible to solve the power management problem without fully suspending the system; he gives the Nokia tablets as an example of a successful implementation which uses finer-grained power management.

That said, it seems clear that the full-suspend approach to power management is not going to go away. Some hardware is designed to work best that way, so Linux needs to support that mode of operation. So there has been some talk about how to design wakelocks in a way which fits better into the kernel as a whole. On the kernel side, there is some dispute as to whether the wakelock mechanism is needed at all; drivers can already inhibit an attempt by the kernel to suspend the system. But there is some justice to the claim that it's better if the kernel knows it can't suspend the system without having to poll every driver.

One simple solution, proposed by Matthew, would be a pair of functions: inhibit_suspend() and uninhibit_suspend(). On production systems, they would manipulate an atomic counter; when the counter is zero, the system can be suspended. These functions could take a device structure as an argument; debugging versions could then track which devices are blocking a suspend at any given time. The user-space equivalent could be a file like /dev/inhibit_suspend; as long as at least one process holds that file open, the system will continue to run. All told, it looks like a simple API without many of the problems seen in the wakelock code.
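
The counter-based scheme can be sketched in a few lines of user-space C11. This is only an illustration, not code from the discussion: the names inhibit_suspend() and uninhibit_suspend() come from the proposal, while the struct device stand-in, the counter variable, and the suspend_allowed() helper are assumptions made for this example. Real kernel code would use the kernel's own atomic primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's device structure. A debugging
 * build could use it to record which device is blocking suspend. */
struct device {
    const char *name;
};

/* Number of outstanding inhibit requests; zero means suspend is allowed. */
static atomic_int suspend_inhibit_count = 0;

/* Ask the system to stay awake on behalf of dev. */
void inhibit_suspend(struct device *dev)
{
    (void)dev; /* unused in this non-debugging sketch */
    atomic_fetch_add(&suspend_inhibit_count, 1);
}

/* Drop a request previously made with inhibit_suspend(). */
void uninhibit_suspend(struct device *dev)
{
    (void)dev;
    int previous = atomic_fetch_sub(&suspend_inhibit_count, 1);
    assert(previous > 0); /* catches an unbalanced uninhibit call */
}

/* Assumed helper: what the suspend path would check before proceeding. */
bool suspend_allowed(void)
{
    return atomic_load(&suspend_inhibit_count) == 0;
}
```

A driver would call inhibit_suspend() when it starts work that must not be interrupted and uninhibit_suspend() when that work is done; the suspend path only has to read one counter instead of polling every driver.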

There were a few complaints from the Android side, but the biggest sticking point appears to be over timeouts. The wakelock API implements an automatic timeout which causes the "lock" to go away after a given time. There appear to be a few reasons for the existence of the timeouts:

1) Since not all drivers use the wakelock API, timeouts are required to prevent suspending the system while those drivers are running. The proposed solution to this one is to instrument all of the drivers which need to keep the system running. Once an acceptable API is merged into the kernel, drivers can be modified as needed.
2) If a process holding a wakelock dies unexpectedly, the timeout will keep the system running while the watchdog code restarts the faulting process. The problem here is that timeouts encode a recovery policy in the kernel and do little to ensure that operation is actually correct. What has been proposed instead is that the user-space "inhibit suspend" policy be encapsulated into a separate daemon which would make the decisions on when to keep the system awake.
3) User-space applications may simply screw up and forget to allow the system to suspend.
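
As a rough illustration of how a separate user-space daemon could keep timeout policy out of the kernel, the C sketch below tracks per-client requests that expire on their own. Nothing here comes from the discussion itself: the table size, the struct hold layout, and the functions hold_request(), hold_release(), and must_stay_awake() are all hypothetical names chosen for this example, and the time is passed in explicitly so the logic can be exercised without a real clock.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define MAX_HOLDS 16 /* arbitrary table size for this sketch */

/* One client's request to keep the system awake until a deadline. */
struct hold {
    bool   active;
    int    client_id;
    time_t expires;
};

static struct hold holds[MAX_HOLDS];

/* Record or refresh a request from client_id that lapses timeout_secs
 * after now. Returns false if the table is full. */
bool hold_request(int client_id, time_t now, time_t timeout_secs)
{
    struct hold *free_slot = NULL;

    for (size_t i = 0; i < MAX_HOLDS; i++) {
        if (holds[i].active && holds[i].client_id == client_id) {
            holds[i].expires = now + timeout_secs; /* refresh */
            return true;
        }
        if (!holds[i].active && free_slot == NULL)
            free_slot = &holds[i];
    }
    if (free_slot == NULL)
        return false;

    free_slot->active = true;
    free_slot->client_id = client_id;
    free_slot->expires = now + timeout_secs;
    return true;
}

/* A well-behaved client releases its request explicitly. */
void hold_release(int client_id)
{
    for (size_t i = 0; i < MAX_HOLDS; i++)
        if (holds[i].active && holds[i].client_id == client_id)
            holds[i].active = false;
}

/* Expire stale requests, then report whether any client still needs
 * the system to stay awake. */
bool must_stay_awake(time_t now)
{
    bool any = false;

    for (size_t i = 0; i < MAX_HOLDS; i++) {
        if (!holds[i].active)
            continue;
        if (holds[i].expires <= now)
            holds[i].active = false; /* client forgot to release, or died */
        else
            any = true;
    }
    return any;
}
```

Under this arrangement the daemon would hold the kernel's single "inhibit suspend" handle only while must_stay_awake() returns true, so a client that crashes or forgets to release its request stops blocking suspend once its deadline passes.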

The final case above is also used as an argument for the full-suspend approach to power management. Even if an ill-behaved application goes into a loop and refuses to quit, the system will eventually suspend and save its battery anyway. This is an argument which does not fly particularly well with a lot of kernel developers, who respond that, rather than coding the kernel to protect against poor applications, one should simply fix those applications. Arjan van de Ven points out that, since the advent of PowerTop, the bulk of the problems with open-source applications have been fixed.

In this space, though, it is harder to get a handle on all of these problems. Brian Swetland describes the situation this way:

> carrier deploys a device
> carrier agrees to allow installation of arbitrary third party apps without some horrible certification program requiring app authors to jump through hoops, wait ages for approval, etc
> users rejoice and install all kinds of apps
> some apps are poorly written and impact battery life
> users complain to carrier about battery life

Matthew also acknowledges the problem:

> Remember that Android has an open marketplace designed to appeal to Java programmers - users are going to end up downloading code from there and then blaming the platform if their battery life heads towards zero. I think "We can't trust our userland not to be dumb" is a valid concern.

It is a real problem, but it still is not at all clear that attempts to fix such problems in the kernel are advisable - or that they will be successful in the end. Ben Herrenschmidt offers a different solution: a daemon which monitors application behavior and warns the user when a given application is seen to be behaving badly. That would at least let users know where the real problem is. But it is, of course, no substitute for the real solution: run open-source applications on the phone so that poor behavior can be fixed by users if need be.

The Android platform is explicitly designed to enable proprietary applications, though. It may prove to be able to attract those applications in a way which standard desktop Linux has never quite managed to do. So some sort of solution to the problem of power management in the face of badly-written applications will need to be found. The Android developers like wakelocks as that solution for now, but they also appear to be interested in working with the community to find a more globally-acceptable solution. What that solution will look like, though, is unlikely to become clear without a lot more discussion.


From wakelocks to a real solution

Posted Feb 19, 2009 7:36 UTC (Thu) by aleXXX (subscriber, #2742) [Link] (10 responses)

> This is an argument which does not fly particularly well with a lot of
> kernel developers, who respond that, rather than coding the kernel to
> protect against poor applications, one should simply fix those
> applications

Hmm, I don't agree with that. Isn't it after all similar to memory
protection? If we trusted all userspace applications to be bug-free
and not to access memory which is not theirs, there would be no need for
protected memory.
In the same way that this protects the system against programs behaving badly
memory-wise, some protection against applications behaving badly
power-consumption-wise seems like a good thing to me.

Alex

From wakelocks to a real solution

Posted Feb 19, 2009 9:23 UTC (Thu) by dgm (subscriber, #49227) [Link]

Exactly. If you think that applications are responsible for being well-behaved, you'd better use MS-DOS.

From wakelocks to a real solution

Posted Feb 19, 2009 9:32 UTC (Thu) by mjg59 (subscriber, #23239) [Link] (6 responses)

Well, partly because it's impossible to absolutely protect against those applications - they'll still increase your power consumption when you're using your machine.

I don't buy this argument either

Posted Feb 19, 2009 10:40 UTC (Thu) by khim (subscriber, #9252) [Link] (5 responses)

If you run some wild application you can make your system slow down so much that sshing to it and killing the offending process is impossible. Somehow the answer "fix your userspace" was never considered "good enough" and years were spent developing many systems (quotas, containers, VMs) to make it safe to run any application and still have control over the system.

Sure: any application will consume resources. But with a phone you need a guarantee that consumed resources (all resources, including power) are limited by some arbitrary value. If that is enough for the program, it'll work great; if not, I can decide whether a fancy screen-saver is worth giving half of the battery to.

It's the same story as with preemptive vs cooperative multitasking: cooperative multitasking works great if you have control over all programs (see Novell NetWare 3.x), but if not, it's a disaster (see Windows 3.x and/or Mac OS before Mac OS X).

I don't buy this argument either

Posted Feb 19, 2009 10:50 UTC (Thu) by mjg59 (subscriber, #23239) [Link] (4 responses)

Stopping every single userspace process from running is an awfully blunt tool to prevent poorly written apps from spending battery power, especially when there are more flexible approaches that allow userspace defined policy.

Why?

Posted Feb 19, 2009 12:10 UTC (Thu) by khim (subscriber, #9252) [Link] (3 responses)

> Stopping every single userspace process from running is an awfully blunt tool to prevent poorly written apps from spending battery power

Somehow I doubt you can save as much power by using any other approach. XO tried to do this, G1 is doing this - I'm pretty sure it'll be the standard approach in some niches for years to come. And why should a single poorly-written application be able to suck your battery dry if the system is designed to mostly live in suspended mode?

The kernel is already doing things like that. The only difference is that there the kernel guarantees a small amount of time to a "normal" process, while here it guarantees only a small amount of work time for any process. Different systems, different requirements...

Why?

Posted Feb 19, 2009 12:20 UTC (Thu) by mjg59 (subscriber, #23239) [Link] (2 responses)

XO is a different case - the runtime idle states on x86 are significantly higher power draw than on modern SoCs. OMAP and the MSM chips used in the G1 are effectively equivalent in runtime idle and suspended states. The concept of a "suspended mode" is dying out in many markets, so optimising for it is foolish. Nokia have succeeded in demonstrating that it's unnecessary when you have the appropriate hardware support.

Why?

Posted Feb 24, 2009 18:30 UTC (Tue) by tbird20d (subscriber, #1901) [Link] (1 responses)

Nokia (and TI really) have demonstrated that with near-infinite hardware knobs and Herculean software effort, you can pull this off. But the methods are not generalizable to other platforms, scalable, or IMHO sustainable in the long-term.

Why?

Posted Feb 24, 2009 18:49 UTC (Tue) by mjg59 (subscriber, #23239) [Link]

For functional deep runtime power management, you need three things:

1) The hardware to support it. That's increasingly the case - multiple vendors provide this kind of functionality.
2) The OS to support high quality driver power management. That requires paying attention to application requirements and aggressively reducing the power consumption of hardware when those requirements are relaxed.
3) The userspace applications not to use resources unnecessarily, or some way to actively prevent them from being given them.

(1) is entirely out of our control. For hardware that supports low-latency full-system suspend/resume and doesn't support ultra low-power runtime idle modes, we don't have any option - the only solution is some sort of automatic system suspend.

However, I'm going to argue that that's not especially interesting. Hardware that falls into this category is a decreasing proportion of the market. ARM is mostly heading towards supporting sufficiently deep runtime idle. x86 doesn't have sufficiently low-latency suspend/resume for automatic suspend to be practical. Optimising for this scenario is optimising for a dying market segment.

(2) and (3) are interesting because they benefit the entire Linux market, not merely a segment of the embedded market. Enhancing our driver framework allows us to save power in everything from the phone to the server. Ensuring that our software stack doesn't engage in pathological behaviour provides the same benefits.

Concentrating on wakelocks simply ignores the reality that they provide no benefit in most usecases. In the Android case, they're a bandaid to hide inadequacies in other software layers.

From wakelocks to a real solution

Posted Feb 27, 2009 7:51 UTC (Fri) by efexis (guest, #26355) [Link]

There's actually a lot to be said for that. Protected memory mechanisms back on the early 286 CPUs were documented as debugging tools, as they would trap illegal memory accesses and point to where they were occurring, so the software could then be fixed. Assuming all-correct, trusted, and playing-nicely-together software, being able to remove protected and virtual memory mechanisms could actually make a lot of things run a lot faster, although of course with downsides too, such as losing the automatic copy-on-write memory pages that make other things run much quicker (like fork()ing). I seem to recall that much of the stuff that's been launched into space will often do away with memory protection mechanisms as it makes the silicon much simpler.

If we had the man hours to put into the software that would be great, but it's cheaper to protect against human error (and malice) instead by using software and circuitry. This is often resisted as many feel the desire to Do It Right, but then you get things like probes on Mars deadlocking, and kernel guys going "let's just implement priority inheritance to get it working". I seem to recall Linus being resistant to priority inheritance in the Linux kernel, but eventually an implementation did get in (http://lwn.net/Articles/178253/). Whilst this may not be Doing It Right, it is definitely Doing the Right Thing.

Alex

From wakelocks to a real solution

Posted Apr 29, 2010 3:52 UTC (Thu) by cventers (guest, #31465) [Link]

Memory protection is just as necessary for security on a multi-user operating system as it is for crash protection. Without it, any application that crashed or decided to be malicious could corrupt just about anything on the system.

Memory protection is also largely implemented in hardware, and is a fundamental component of how multiple processes can coexist on one computer and still appear to run simultaneously.

That's wayyyyyy different from adding hacks to the kernel to fix broken applications. That reduces kernel quality and encourages app developers to be lazy. It's something Microsoft would do -- add kernel hacks to make Office or Borland work right.

From wakelocks to a real solution

Posted Feb 19, 2009 8:01 UTC (Thu) by i3839 (guest, #31386) [Link] (1 responses)

> What has been proposed instead is that the user-space "inhibit suspend"
> policy be encapsulated into a separate daemon which would make the
> decisions on when to keep the system awake.

A daemon like that could also implement the timeout behaviour, solving the problem described here:

> User-space applications may simply screw up and forget to allow the
> system to suspend.

Simple kernel interface + user space daemon seems like a working solution.

From wakelocks to a real solution

Posted Feb 19, 2009 8:24 UTC (Thu) by alonz (subscriber, #815) [Link]

Such a daemon could also allow some applications to hold the phone active for a longer time—for example, media players or games.

It could even use the Android permissions model to decide which applications get this preferential treatment.

From wakelocks to a real solution

Posted Feb 19, 2009 21:28 UTC (Thu) by xyzzy (guest, #1984) [Link] (1 responses)

Android apps all run under an "I can't believe it's not Java(tm)" VM, right? So surely apps-being-dumb issues could be solved in userspace at the VM level.

From wakelocks to a real solution

Posted Feb 21, 2009 18:14 UTC (Sat) by man_ls (guest, #15091) [Link]

They could, but I would guess the VM is not very smart either -- it does whatever userspace tells it to do. Solving the problem in the not-quite-Java VM would add complexity in a layer which (at least theoretically) is not involved in power management, just in secure execution.

Besides, there will be other native applications running on the G1, which (even if they are under direct control of Goog... ahem, Android) must be written with power management in mind. It seems easier to just push power management to the kernel and let it do its thing.

From wakelocks to a real solution

Posted Mar 4, 2009 10:09 UTC (Wed) by amit (subscriber, #1274) [Link]

Tight SELinux policies that prohibit just about any app from opening the suggested file can work.

Of course, in an open mobile world where any app can be installed, access should be restricted, and SELinux can help really well here.