What comes after suspend blockers

By Jonathan Corbet
June 1, 2010

It looked like it was almost a done deal: after more than a year of discussions, it seemed that most of the objections to the Android "suspend blockers" concept had been addressed. The code had gone into a tree which feeds into linux-next, and a pull request was sent to Linus. All that remained was to see whether Linus actually pulled it. That did not happen; by the end of the merge window, the newly reinvigorated discussion had made that outcome unsurprising. But the discussion which destroyed any chance of getting that code in has, in the end, yielded the beginnings of an approach which may be acceptable to all participants. This article will take a technical look at the latest round of objections and the potential solution.

As a reminder, suspend blockers (formerly "wakelocks") came about as part of the power management system used on Android phones. Whenever possible, the Android developers want to put the phone into a fully suspended state, where power consumption is minimized. The Android model calls for automatic ("opportunistic") suspend to happen even if there are processes which are running. In this way, badly-written programs are prevented from draining the battery too quickly.

But a phone which is suspended all the time, while it does indeed run a long time on a single charge, is also painful to use. So there are times when the phone must be kept running; these include any time that the display is on. It's also important not to suspend the phone when interesting things are happening; that's where suspend blockers come in. The arrival of a key event, for example, will cause a suspend blocker to be obtained within the kernel; that blocker will be released after the event has been read by user space. The user-space application, meanwhile, takes a suspend blocker of its own before reading events; that will keep the system running after the kernel releases the first blocker. The user-space blocker is only released once the event has been fully processed; at that point, the phone can suspend.
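The key-event handoff described above can be sketched as a toy model. This is not the Android implementation; the class, the method names, and the blocker names ("kernel:input", "user:app") are all hypothetical, and the only rule modeled is that the system may suspend when no blocker is held.

```python
class SuspendBlockers:
    """Toy model: the system may suspend only while no blocker is held."""

    def __init__(self):
        self._held = set()

    def acquire(self, name):
        self._held.add(name)

    def release(self, name):
        self._held.discard(name)

    def can_suspend(self):
        return not self._held


def key_event_trace():
    """Replay the key-event handoff, recording can_suspend() after each step."""
    blockers = SuspendBlockers()
    trace = []

    # A key event arrives: the kernel takes a blocker (hypothetical name).
    blockers.acquire("kernel:input")
    trace.append(("event queued", blockers.can_suspend()))

    # The application takes its own blocker before reading the event;
    # once the event has been read, the kernel drops its blocker.
    blockers.acquire("user:app")
    blockers.release("kernel:input")
    trace.append(("event read", blockers.can_suspend()))

    # The event has been fully processed: the last blocker goes away.
    blockers.release("user:app")
    trace.append(("event handled", blockers.can_suspend()))
    return trace
```

The point of the overlap is that there is no instant between the arrival of the event and the end of its processing at which the set of held blockers is empty.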

The latest round of objections included some themes which had been heard before: in particular, the suspend blocker ABI, once added to the kernel, must be maintained for a very long time. Since there was a lot of unhappiness with that ABI, it's not surprising that many kernel developers did not want to be burdened with it indefinitely. There are also familiar concerns about the in-kernel suspend blocker calls spreading to "infect" increasing numbers of drivers. And the idea that the kernel should act to protect the system against badly-written applications remains controversial; some simply see that approach as making a more robust system, while others see it as a recipe for the proliferation of bad code.

Quality of service

The other criticisms, though, came from a different direction: suspend blockers were seen as a brute-force solution to a resource management problem which can (and should) be solved in a way which is more flexible, meets the needs of a wider range of users, and which is not tied to current Android hardware. In this view, "suspend" is not a special and unique state of the system; it is, instead, just an especially deep idle state which can be managed with the usual cpuidle logic. The kernel currently uses quality-of-service (QOS) information provided through the pm_qos API to choose between idle states; with an expanded view of QOS, that same logic could incorporate full suspend as well.

In other words, using cpuidle, current kernels already implement the "opportunistic suspend" idea - for the set of sleep states known to the cpuidle code now. On x86 hardware, a true "suspend" is a different hardware state than the sleep states used by cpuidle, but (1) the kernel could hide those differences, and (2) architectures which are more oriented toward embedded applications tend to treat suspend as just another idle state already. There are signs that x86 is moving in the same direction, where there will be nothing all that special about the suspended state.

That said, there are some differences at the software level. Current idle states are only entered when the system is truly idle, while opportunistic suspend can happen while processes are running. Idle states do not stop timers within the kernel, while suspend does. Suspend, in other words, is a convenient way to bring everything to a stop - whether or not it would stop of its own accord - until some sort of sufficiently interesting event arrives. The differences appear to be enough - for now - to make a "pure" QOS-based implementation impossible; things can head in that direction, though, so it's worth looking at that vision.

To repeat: current CPU idle states are chosen based on the QOS requirements indicated by the kernel. If some kernel subsystem claims that it needs to run with latencies measured in microseconds, the kernel knows that it cannot use a deep sleep state. Bringing suspend into this model will probably involve the creation of a new QOS level, often called "QOS_NONE", which specifies that any amount of latency is acceptable. If nothing in the system is asking for a QOS greater than QOS_NONE, the kernel knows that it can choose "suspend" as an idle state if that seems to make sense. Of course, the kernel would also have to know that any scheduled timers can be delayed indefinitely; the timer slack mechanism already exists to make that information available, but this mechanism is new and almost unused.
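A minimal sketch of that selection logic, assuming that each idle state is described only by its exit latency and that QOS_NONE is represented as an infinite acceptable latency. The state names and latency figures are illustrative, not taken from any real hardware, and the real cpuidle code weighs more than latency (expected idle duration, for example); pending timers are ignored here entirely.

```python
import math

# Hypothetical representation: QOS_NONE means any wakeup latency is acceptable.
QOS_NONE = math.inf

# (name, exit latency in microseconds), shallowest first; values are made up.
IDLE_STATES = [
    ("C1", 1),
    ("C3", 100),
    ("C6", 1000),
    ("suspend", math.inf),  # full suspend modeled as unbounded latency
]


def pick_idle_state(latency_requests_us, states=IDLE_STATES):
    """Return the deepest state whose exit latency fits the tightest request.

    With no requests at all, the limit defaults to QOS_NONE. Returns None
    if even the shallowest state is too slow.
    """
    limit = min(latency_requests_us, default=QOS_NONE)
    allowed = [name for name, exit_latency in states if exit_latency <= limit]
    return allowed[-1] if allowed else None
```

In this model, "suspend" becomes eligible only when every outstanding request is QOS_NONE; a single request with a finite latency bound rules it out.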

In a system like this, untrusted applications could be run in some sort of jail (a control group, say) where they can be restricted to QOS_NONE. In some versions, the QOS level of that cgroup is changed dynamically between "normal" and QOS_NONE depending on whether the system as a whole thinks it would like to suspend. Once untrusted applications are marked in this way, they can no longer prevent the system from suspending - almost.
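The dynamic variant can be sketched roughly as follows, assuming that each task carries a latency request and that untrusted tasks are identified by membership in a set standing in for the control group. The function names and the representation of QOS_NONE as infinity are assumptions made for illustration.

```python
import math

# Hypothetical representation: QOS_NONE means any latency is acceptable.
QOS_NONE = math.inf


def effective_requests(requests, untrusted, want_suspend):
    """Clamp untrusted tasks to QOS_NONE while the system wants to suspend.

    requests:  {task name: requested latency bound in microseconds}
    untrusted: set of task names confined to the restricted group
    """
    return {
        task: QOS_NONE if (want_suspend and task in untrusted) else bound
        for task, bound in requests.items()
    }


def suspend_allowed(requests, untrusted, want_suspend):
    """Suspend is eligible only if nothing asks for more than QOS_NONE."""
    effective = effective_requests(requests, untrusted, want_suspend)
    return all(bound == QOS_NONE for bound in effective.values())
```

A trusted task with a finite latency bound still blocks suspend in this sketch; only the requests of the jailed tasks are overridden.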

One minor difficulty is that, if suspend is an idle state, the system must go idle before suspending becomes an option. If an untrusted application just sits on the CPU, it can still keep the system as a whole from suspending. Android's opportunistic suspend is designed to deal with this problem; it will suspend the system regardless of what those applications are trying to do. In the absence of this kind of forced suspend, there must be some way to keep those applications from keeping the system awake.

One intriguing idea was to state that QOS_NONE means that a process might be forced to wait indefinitely for the CPU, even if it is in a runnable state; the scheduler could then decree the system to be idle if only QOS_NONE processes are runnable. Peter Zijlstra worries that not running runnable tasks will inevitably lead to all kinds of priority and lock inversion problems; he does not want to go there. So this approach did not get very far.

An alternative is to defer any I/O operations requested by QOS_NONE processes when the system is trying to suspend. A process which is waiting for I/O goes idle naturally; if one assumes that even the most CPU-hungry application will do I/O eventually, it should be possible to block all processes this way. Another is to have a user-space daemon which informs processes that it's time to stop what they are doing and go idle. Any process which fails to comply can be reminded with a series of increasingly urgent signals, culminating in SIGKILL if need be.
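The daemon's escalation loop might look something like the sketch below. The particular sequence of signals is an assumption (the discussion only says that it ends in SIGKILL), and the three callbacks are supplied by the caller so that the logic can be exercised without touching real processes; a real daemon on Linux could pass os.kill as send_signal and a time.sleep-based function as wait.

```python
import signal

# Assumed escalation order; only the final SIGKILL comes from the discussion.
ESCALATION = [signal.SIGUSR1, signal.SIGTERM, signal.SIGKILL]


def quiesce(pids, is_idle, send_signal, wait):
    """Ask processes to go idle, escalating for those that do not comply.

    is_idle(pid) -> bool     reports whether a process has stopped running
    send_signal(pid, sig)    delivers a signal (os.kill in a real daemon)
    wait()                   grace period between escalation steps

    Returns {pid: last signal sent, or None if no signal was needed}.
    """
    last_sent = {pid: None for pid in pids}
    for sig in ESCALATION:
        busy = [pid for pid in pids if not is_idle(pid)]
        if not busy:
            break
        for pid in busy:
            send_signal(pid, sig)
            last_sent[pid] = sig
        wait()  # let the processes react before trying something stronger
    return last_sent
```

Well-behaved processes never see anything beyond the first, polite request; only those which ignore every earlier step are killed.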

Meanwhile, in the real world

Approaches like this can be implemented, and they may well be the long-term solution. But they are not an immediate solution. Among other things, a purely QOS-based solution will require that drivers change the system's overall QOS level in response to events. When something interesting happens, the system should not be allowed to suspend until user space has had a chance to respond. So important drivers will need to be augmented with internal QOS calls - kernel-space suspend blockers in all but name, essentially. Timers will need to be changed so that those which can be delayed indefinitely do not prevent the system from suspending. It might also be necessary to temporarily pass a higher level of QOS to applications when waking them up to deal with events. All of this can probably be done in a way that can be merged, but it won't solve Android's problem now.

So what we may see in the relatively near future is a solution based on an approach described by Alan Stern. Alan's idea retains the use of forced suspend, though not quite in the opportunistic mode. Instead, there would be a "QOS suspend" mode attainable by explicitly writing "qos" to /sys/power/state. If there are no QOS constraints active when "QOS suspend" is requested, the system will suspend immediately; otherwise, the process writing to /sys/power/state will block until those constraints are released. Additionally, there would be a new QOS constraint called QOS_EVENTUALLY which is compatible with any idle state except full suspend. These constraints - held only within the kernel - would block suspend when things are happening.

In other words, Android's kernel-space suspend blockers turn into QOS_EVENTUALLY constraints. The difference is that QOS terms are being used, and the kernel can make its best choice on how those constraints will be met.
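Since no code has been posted for this proposal, the following is only a user-space toy model of the blocking behavior: a request for "QOS suspend" waits until the count of active QOS_EVENTUALLY constraints drops to zero. The class and method names are hypothetical, and the timeout parameter exists only to make the sketch easy to experiment with.

```python
import threading


class QosSuspend:
    """Toy model of a "QOS suspend" request gated by kernel constraints."""

    def __init__(self):
        self._cond = threading.Condition()
        self._eventually = 0      # active QOS_EVENTUALLY constraints
        self.suspend_count = 0    # how many times the system "suspended"

    def add_constraint(self):
        """A driver has seen an event that user space has yet to handle."""
        with self._cond:
            self._eventually += 1

    def remove_constraint(self):
        with self._cond:
            self._eventually -= 1
            if self._eventually == 0:
                self._cond.notify_all()

    def request_suspend(self, timeout=None):
        """Stand-in for writing "qos" to /sys/power/state.

        Blocks until no constraints remain, then "suspends". Returns False
        if the optional timeout expires first.
        """
        with self._cond:
            ok = self._cond.wait_for(lambda: self._eventually == 0, timeout)
            if ok:
                self.suspend_count += 1
            return ok
```

With no constraints held, request_suspend() returns immediately; otherwise the caller sleeps until the last constraint is removed, which mirrors the blocking write described above.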

There are no user-space suspend blockers in Alan's approach; instead, there is a daemon process which tries to put the system into the "QOS suspend" state whenever it thinks that nothing interesting is happening. Applications could communicate with that daemon to request that the system not be suspended; the daemon could then honor those requests (or not) depending on whatever policy it implements. Thus, the system suspends when both the kernel and user space agree that it's the right thing to do, and it doesn't require that all processes go idle first. This mechanism also makes it easy to track which processes are blocking suspend - an important requirement for the Android folks.
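A rough sketch of such a daemon's bookkeeping follows. The policy shown (honor requests only from a fixed set of trusted applications) is just one possibility chosen for illustration, and every name is hypothetical; the try_qos_suspend callback stands in for whatever actually writes "qos" to /sys/power/state.

```python
class SuspendDaemon:
    """Toy user-space policy daemon; all names here are hypothetical."""

    def __init__(self, trusted, try_qos_suspend):
        self._trusted = set(trusted)            # policy: whose requests count
        self._try_qos_suspend = try_qos_suspend
        self._holders = {}                      # application -> stated reason

    def request_awake(self, app, reason):
        """Return True if the request is honored under the daemon's policy."""
        if app not in self._trusted:
            return False
        self._holders[app] = reason
        return True

    def release(self, app):
        self._holders.pop(app, None)

    def blockers(self):
        """Report which applications are currently preventing suspend."""
        return dict(self._holders)

    def maybe_suspend(self):
        """Attempt a QOS suspend unless some honored request is outstanding."""
        if self._holders:
            return False
        self._try_qos_suspend()
        return True
```

Because every request passes through the daemon, answering the question "who is keeping the phone awake?" is a matter of reading one dictionary.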

In summary, as Alan put it:

"The advantages of this scheme are that this does everything the Android people need, and it does it in a way that's entirely compatible with pure QoS/cpuidle-based power management. It even starts along the path of making suspend-to-RAM just another kind of dynamic power state."

Android developer Brian Swetland agreed, saying "...from what I can see it certainly seems like this model provides us with the functionality we're looking for." So we might just have the form of a real solution.

There are a number of loose ends to tie up, of course. Additionally, various alternatives are still being discussed; one approach would replace user-space wakelocks with a special device which can be used to express QOS constraints, for example. There is also the little nagging issue that nobody has actually posted any code. That problem notwithstanding, it seems like there could be a way forward which would finally break the roadblock that has kept so much Android code out of the kernel for so long.

Posted Jun 3, 2010 10:24 UTC (Thu) by nix (subscriber, #2304)

>Another is to have a user-space daemon which informs processes that it's time to stop what they are doing and go idle. Any process which fails to comply can be reminded with a series of increasingly urgent signals, culminating in SIGKILL if need be.

SIGSTOP, surely? Just as uncatchable, and we're trying to stop them chewing CPU, not kill them stone dead.

Posted Jun 3, 2010 18:54 UTC (Thu) by dmk (guest, #50141)

>SIGSTOP, surely? Just as uncatchable, and we're trying to stop them chewing CPU, not kill them stone dead.

Well no. Peter actually meant to kill them.

Posted Jun 14, 2010 13:28 UTC (Mon) by oak (guest, #2786)

Before KILL one could try SIGXCPU...

Posted Jun 14, 2010 20:24 UTC (Mon) by aigarius (subscriber, #7329)

Yep, as one of "... series of increasingly urgent signals ..." before SIGKILL, just as mentioned in that text.