By Jonathan Corbet
June 1, 2010
It looked like it was almost a done deal: after more than a year of
discussions, it seemed that most of the objections to the Android "suspend
blockers" concept had been addressed. The code had gone into a tree which
feeds into linux-next, and
a pull
request was sent to Linus. All that remained was to see whether Linus
actually pulled it. That did not happen; by the end of the merge window,
the newly reinvigorated discussion had made that outcome unsurprising. But
the discussion which destroyed any chance of getting that code in has, in
the end, yielded the beginnings of an approach which may be acceptable to
all participants. This article will take a technical look at the latest
round of objections and the potential
solution.
As a reminder, suspend blockers (formerly "wakelocks") came about as part
of the power management system used on Android phones. Whenever possible,
the Android developers want to put the phone into a fully suspended state,
where power consumption is minimized. The Android model calls for
automatic ("opportunistic") suspend to happen even if there are processes
which are running. In this way, badly-written programs are prevented from
draining the battery too quickly.
But a phone which is suspended all the time, while it does indeed run a
long time on a single charge, is also painful to use. So there are times
when the phone must be kept running; these times include any time that the
display is on. It's also important to not suspend the phone when
interesting things are happening; that's where suspend blockers come in.
The arrival of a key event, for example, will cause a suspend blocker to be
obtained within the kernel; that blocker will be released after the event
has been read by user space. The user-space application, meanwhile, takes
a suspend blocker of its own before reading events; that will keep the
system running after the kernel releases the first blocker. The user-space
blocker is only released once the event has been fully processed; at that
point, the phone can suspend.
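The key-event handoff just described can be modeled in a few lines. The sketch below is a user-space simulation, not the in-kernel suspend blocker API: `struct blocker`, `blocker_acquire()`, `blocker_release()` and `can_suspend()` are hypothetical names. The point it shows is the overlap: the application takes its own blocker before the kernel's blocker is dropped, so at no point between the key press and the end of processing is the count zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a suspend blocker: a named flag that, while set,
 * keeps opportunistic suspend from happening. Not the kernel's API. */
struct blocker {
    const char *name;
    bool active;
};

static int active_blockers; /* number of blockers currently held */

static void blocker_acquire(struct blocker *b)
{
    if (!b->active) {
        b->active = true;
        active_blockers++;
    }
}

static void blocker_release(struct blocker *b)
{
    if (b->active) {
        b->active = false;
        active_blockers--;
    }
    assert(active_blockers >= 0);
}

/* Opportunistic suspend may proceed only when nothing holds a blocker. */
static bool can_suspend(void)
{
    return active_blockers == 0;
}

/* The key-event sequence from the text, one step per function. */
static struct blocker input_blocker = { "input-event", false };
static struct blocker app_blocker = { "app-reader", false };

static void key_event_arrives(void)    { blocker_acquire(&input_blocker); }
static void app_prepares_to_read(void) { blocker_acquire(&app_blocker); }
static void app_reads_event(void)      { blocker_release(&input_blocker); }
static void app_finishes_work(void)    { blocker_release(&app_blocker); }
```

If the application released its blocker before reading, or took it only afterward, there would be a window in which `can_suspend()` returns true while an event is still pending; that race is what the overlapping handoff avoids.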
The latest round of objections included some themes which had been heard
before: in particular, the suspend blocker ABI, once added to the kernel,
must be maintained for a very long time. Since there was a lot of
unhappiness with that ABI, it's not surprising that many kernel developers
did not want to be burdened with it indefinitely. There are also familiar
concerns about the in-kernel suspend blocker calls spreading to "infect"
increasing numbers of drivers. And the idea that the kernel should act to
protect the system against badly-written applications remains
controversial; some simply see that approach as making a more robust
system, while others see it as a recipe for the proliferation of bad code.
Quality of service
The other criticisms, though, came from a different direction: suspend
blockers were seen as a brute-force solution to a resource management
problem which can (and should) be solved in a way which is more flexible,
meets the needs of a wider range of users, and is not tied to current
Android hardware. In this view, "suspend" is not a special and
unique state of the system; it is, instead, just an especially deep idle
state which can be managed with the usual
cpuidle logic. The kernel
currently uses quality-of-service (QOS) information provided through the
pm_qos API to choose between
idle states; with an expanded view of QOS, that same logic could
incorporate full suspend as well.
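A rough sketch of that selection logic follows. It is a simplified simulation, not the kernel's pm_qos or cpuidle governor code; the state names and exit latencies are invented for illustration. Latency requests are aggregated by taking the tightest (smallest) one, and the deepest idle state whose exit latency still fits is chosen. In the expanded view, "suspend" would simply be one more, very deep, entry at the end of the table.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical idle-state table, shallowest first. Latencies are made up. */
struct idle_state {
    const char *name;
    int exit_latency_us; /* time needed to get back to running */
};

static const struct idle_state states[] = {
    { "C1", 1 },
    { "C3", 100 },
    { "C6", 1000 },
};
#define NSTATES (sizeof(states) / sizeof(states[0]))

/* pm_qos-style aggregation: the tightest (smallest) latency request wins.
 * With no requests at all, any latency is acceptable. */
static int aggregate_latency(const int *requests, size_t n)
{
    int min = INT_MAX;
    for (size_t i = 0; i < n; i++)
        if (requests[i] < min)
            min = requests[i];
    return min;
}

/* Choose the deepest state whose exit latency fits the constraint; fall
 * back to the shallowest state if even that one is too slow. */
static const struct idle_state *choose_state(int max_latency_us)
{
    const struct idle_state *best = &states[0];
    for (size_t i = 0; i < NSTATES; i++)
        if (states[i].exit_latency_us <= max_latency_us)
            best = &states[i];
    return best;
}
```

With requests of 500µs and 150µs outstanding, the aggregate is 150µs and "C3" is chosen; with no requests at all, the deepest state is fair game.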
In other words, using cpuidle, current kernels already implement the
"opportunistic suspend" idea - for the set of sleep states known to the
cpuidle code now. On x86 hardware, a true "suspend" is a different hardware state
than the sleep states used by cpuidle, but (1) the kernel could hide
those differences, and (2) architectures which are more oriented
toward embedded applications tend to treat suspend as just another idle
state already. There are signs that x86 is moving in the same direction,
where there will be nothing all that special about the suspended state.
That said, there are some differences at the software level. Current idle
states are only entered when the system is truly idle, while opportunistic
suspend can happen while processes are running. Idle states do not stop
timers within the kernel, while suspend does. Suspend, in other words, is
a convenient way to bring everything to a stop - whether or not it would
stop of its own accord - until some sort of
sufficiently interesting event arrives. The differences appear to be
enough - for now - to make a "pure" QOS-based implementation impossible;
things can head in that direction, though, so it's worth looking at that
vision.
To repeat: current CPU idle states are chosen based on the QOS
requirements indicated by the kernel. If some kernel subsystem claims that
it needs to run with latencies measured in microseconds, the kernel knows
that it cannot use a deep sleep state. Bringing suspend into this model
will probably involve the creation of a new QOS level, often called "QOS_NONE",
which specifies that any amount of latency is acceptable. If
nothing in the system is asking for a QOS greater than QOS_NONE, the kernel
knows that it can choose "suspend" as an idle state if that seems to make
sense. Of course, the kernel would also have to know that any scheduled
timers can be delayed indefinitely; the timer slack mechanism already
exists to make that information available, but this mechanism is new and
almost unused.
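The two conditions in the paragraph above can be expressed as a single check. This is a hedged sketch: `QOS_NONE` is modeled as the sentinel `INT_MAX`, and the per-timer `deferrable` flag is an assumed stand-in for whatever information the timer slack mechanism would eventually supply. Suspend is eligible as an idle state only if every QOS request is QOS_NONE and every pending timer can wait indefinitely.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel: "any amount of latency is acceptable". */
#define QOS_NONE INT_MAX

struct timer {
    long expires_ms;
    bool deferrable; /* may be delayed indefinitely (assumed flag) */
};

/* Suspend may be picked as the idle state only when no one asks for more
 * than QOS_NONE and no pending timer needs to fire on time. */
static bool suspend_allowed(const int *qos_requests, size_t nreq,
                            const struct timer *timers, size_t ntimers)
{
    for (size_t i = 0; i < nreq; i++)
        if (qos_requests[i] != QOS_NONE)
            return false;
    for (size_t i = 0; i < ntimers; i++)
        if (!timers[i].deferrable)
            return false;
    return true;
}
```

A single 200µs request, or a single timer that must fire on schedule, is enough to rule suspend out.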
In a system like this, untrusted applications could be run in some sort of
jail (a control group, say) where they can be restricted to QOS_NONE. In
some versions, the QOS level of that cgroup is changed dynamically between
"normal" and QOS_NONE depending on whether the system as a whole thinks it
would like to suspend. Once untrusted applications are marked in this way,
they can no longer prevent the system from suspending - almost.
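The dynamic variant, where the jailed group's QOS level flips to QOS_NONE when the system would like to suspend, can be sketched as a filter on the aggregation step. Everything here is hypothetical: `struct qos_request`, the `untrusted` flag standing in for membership in the jail cgroup, and `effective_latency()` are illustrative names, not an existing cgroup or pm_qos interface.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define QOS_NONE INT_MAX /* hypothetical: any latency acceptable */

struct qos_request {
    int latency_us;
    bool untrusted; /* lives in the "jail" control group */
};

/* Effective constraint: when the system would like to suspend, requests
 * from the untrusted group are treated as QOS_NONE and so cannot hold the
 * system awake. Otherwise every request counts normally. */
static int effective_latency(const struct qos_request *r, size_t n,
                             bool want_suspend)
{
    int min = QOS_NONE;
    for (size_t i = 0; i < n; i++) {
        int v = (want_suspend && r[i].untrusted) ? QOS_NONE : r[i].latency_us;
        if (v < min)
            min = v;
    }
    return min;
}
```

While the system is in normal use, an untrusted application's 100µs request is honored; once the system wants to suspend, the same request is ignored. As the next paragraph explains, this only stops such applications from holding QOS constraints; it does not stop them from keeping the CPU busy.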
One minor difficulty is that, if suspend is an idle state,
the system must go idle before suspending becomes an option. If an
untrusted application simply spins on the CPU, it can still keep the system as a whole
from suspending. Android's opportunistic suspend is designed to deal with
this problem; it will suspend the system regardless of what those
applications are trying to do. In the absence of this kind of forced
suspend, there must be some way to keep those applications from keeping the
system awake.
One intriguing idea was to state that QOS_NONE means that a process might
be forced to wait indefinitely for the CPU, even if it is in a runnable
state; the scheduler could then decree the system to be idle if only
QOS_NONE processes are runnable. Peter Zijlstra worries that not running runnable tasks will
inevitably lead to all kinds of priority and lock inversion problems; he
does not want to go there. So this approach did not get very far.
An alternative is to defer any I/O operations requested by QOS_NONE
processes when the system is trying to suspend. A process which is waiting
for I/O goes idle naturally; if one assumes that even the most CPU-hungry
application will do I/O eventually, it should be possible to block all
processes this way. Another is to have a user-space daemon which informs
processes that it's time to stop what they are doing and go idle. Any
process which fails to comply can be reminded with a series of increasingly
urgent signals, culminating in SIGKILL if need be.
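The daemon's escalation ladder could look something like the following. Only `kill()` and the signal constants are real POSIX; the choice of SIGUSR1 as the polite "please go idle" notice, the three-step ladder, and the helper names are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>

/* Hypothetical escalation ladder for a suspend daemon: a polite notice
 * first (SIGUSR1 is an assumed convention), then SIGTERM, then SIGKILL. */
static int reminder_signal(int attempt)
{
    static const int ladder[] = { SIGUSR1, SIGTERM, SIGKILL };
    const int n = (int)(sizeof(ladder) / sizeof(ladder[0]));
    if (attempt < 0)
        attempt = 0;
    /* Attempts past the end of the ladder keep getting SIGKILL. */
    return ladder[attempt < n ? attempt : n - 1];
}

/* Send the reminder appropriate for this attempt; returns kill()'s result. */
static int remind(pid_t pid, int attempt)
{
    return kill(pid, reminder_signal(attempt));
}
```

A well-behaved application would install a SIGUSR1 handler that finishes its current work and goes idle; only applications that ignore the notice would ever see the later signals.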
Meanwhile, in the real world
Approaches like this can be implemented, and they may well be the long-term
solution. But it's not an immediate solution. Among other things, a
purely QOS-based solution will require that drivers change the system's
overall QOS level in response to events. When something interesting
happens, the system should not be allowed to suspend until user space has
had a chance to respond. So important drivers will need to be augmented
with internal QOS calls - kernel-space suspend blockers in all but name,
essentially. Timers will need to be changed so that those which can be
delayed indefinitely do not prevent the system from suspending.
It might also be necessary to temporarily pass a higher level
of QOS to applications when waking them up to deal with events. All of
this can probably be done in a way that can be merged, but it won't solve
Android's problem now.
So what we may see in the relatively near future is a solution based on an approach described by Alan Stern. Alan's
idea retains the use of forced suspend, though not quite in the
opportunistic mode. Instead, there would be a "QOS suspend" mode
attainable by explicitly writing "qos" to /sys/power/state. If
there are no QOS constraints active when "QOS suspend" is requested, the
system will suspend immediately; otherwise,
the process writing to /sys/power/state will block until those
constraints are released. Additionally, there would be a new QOS
constraint called QOS_EVENTUALLY which is compatible with any idle state
except full suspend. These constraints - held only within the
kernel - would block suspend when things are happening.
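The blocking behavior of "QOS suspend" maps naturally onto a counter plus a condition variable. The sketch below is a user-space model using POSIX threads, not kernel code: `qos_eventually_add()`, `qos_eventually_remove()` and `qos_suspend()` are hypothetical names. `qos_suspend()` plays the role of the process writing "qos" to /sys/power/state: it returns at once if no QOS_EVENTUALLY constraints are held, and otherwise sleeps until the last one is dropped.

```c
#include <pthread.h>

/* Hypothetical model of the proposed "QOS suspend" request. In-kernel
 * QOS_EVENTUALLY constraints are represented by a counter. */
static pthread_mutex_t qos_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t qos_drained = PTHREAD_COND_INITIALIZER;
static int eventually_count; /* active QOS_EVENTUALLY constraints */
static int suspend_count;    /* how many times we "suspended" */

/* A driver notices something interesting: block full suspend. */
static void qos_eventually_add(void)
{
    pthread_mutex_lock(&qos_lock);
    eventually_count++;
    pthread_mutex_unlock(&qos_lock);
}

/* The event has been handed off: drop the constraint, and wake any
 * waiting suspend request if this was the last one. */
static void qos_eventually_remove(void)
{
    pthread_mutex_lock(&qos_lock);
    if (eventually_count > 0 && --eventually_count == 0)
        pthread_cond_broadcast(&qos_drained);
    pthread_mutex_unlock(&qos_lock);
}

/* Block until no QOS_EVENTUALLY constraint is active, then "suspend". */
static void qos_suspend(void)
{
    pthread_mutex_lock(&qos_lock);
    while (eventually_count > 0)
        pthread_cond_wait(&qos_drained, &qos_lock);
    suspend_count++; /* stand-in for actually entering suspend */
    pthread_mutex_unlock(&qos_lock);
}
```

The `while` loop around `pthread_cond_wait()` is the usual guard against spurious wakeups. A real implementation would also have to abort a suspend that is already in progress if a new constraint arrives, which this model does not attempt.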
In other words, Android's kernel-space suspend blockers turn into
QOS_EVENTUALLY constraints. The difference is that QOS terms are being
used, and the kernel can make its best choice on how those constraints will
be met.
There are no user-space suspend blockers in Alan's approach; instead, there
is a daemon process which tries to put the system into the "QOS suspend"
state whenever it thinks that nothing interesting is happening.
Applications could communicate with that daemon to request that the system
not be suspended; the daemon could then honor those requests (or not)
depending on whatever policy it implements. Thus, the system suspends when
both the kernel and user space agree that it's the right thing to do, and
it doesn't require that all processes go idle first. This mechanism also
makes it easy to track which processes are blocking suspend - an important
requirement for the Android folks.
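The daemon's side of the bargain is mostly bookkeeping. In this sketch, which is hypothetical throughout (the request table, `policy_allows()`, and the other helpers are illustrative and say nothing about how applications would actually talk to the daemon), each "stay awake" request is recorded along with the requesting PID. That record is what makes it easy to report which processes are blocking suspend.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_REQUESTS 16

/* Hypothetical bookkeeping for the user-space suspend daemon. */
struct stay_awake_request {
    pid_t pid;
    bool in_use;
};

static struct stay_awake_request requests[MAX_REQUESTS];

/* Policy hook: here every request is honored; a real daemon could refuse
 * requests from untrusted applications instead. */
static bool policy_allows(pid_t pid)
{
    (void)pid;
    return true;
}

static bool request_stay_awake(pid_t pid)
{
    if (!policy_allows(pid))
        return false;
    for (size_t i = 0; i < MAX_REQUESTS; i++) {
        if (!requests[i].in_use) {
            requests[i].pid = pid;
            requests[i].in_use = true;
            return true;
        }
    }
    return false; /* table full */
}

/* Drop every outstanding request from this process. */
static void release_stay_awake(pid_t pid)
{
    for (size_t i = 0; i < MAX_REQUESTS; i++)
        if (requests[i].in_use && requests[i].pid == pid)
            requests[i].in_use = false;
}

/* Count blockers, copying up to max of their PIDs into out. */
static size_t list_blockers(pid_t *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_REQUESTS; i++) {
        if (requests[i].in_use) {
            if (n < max)
                out[n] = requests[i].pid;
            n++;
        }
    }
    return n;
}

/* The daemon writes "qos" to /sys/power/state only when this is true. */
static bool daemon_may_suspend(void)
{
    return list_blockers(NULL, 0) == 0;
}
```

Note that `daemon_may_suspend()` returning true only means user space agrees; the write to /sys/power/state can still block until the kernel's own QOS_EVENTUALLY constraints are gone.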
In summary, as Alan put it:
The advantages of this scheme are that this does everything the
Android people need, and it does it in a way that's entirely
compatible with pure QoS/cpuidle-based power management. It even
starts along the path of making suspend-to-RAM just another kind of
dynamic power state.
Android developer Brian Swetland agreed,
saying "...from what I can see it certainly seems like this model
provides us with the functionality we're looking for." So we might
just have the form of a real solution.
There are a number of loose ends to tie down, of course. Additionally, various
alternatives are still being discussed; one
approach would replace user-space wakelocks with a special device which
can be used to express QOS constraints, for example. There is also the
little nagging issue that nobody has actually posted any code. That
problem notwithstanding, it seems like there could be a way forward which
would finally clear the roadblock that has kept so much Android code out of
the kernel for so long.