Wakelocks and the embedded problem

By Jonathan Corbet
February 10, 2009

The relationship between embedded system developers and the kernel community is known for being rough, at best. Kernel developers complain about low-quality work and a lack of contributions from the embedded side; the embedded developers, when they say anything at all, express frustrations that the kernel development process does not really keep their needs in mind. A current discussion involving developers from the Android project gives some insight into where this disconnect comes from.

Android, of course, is Google's platform for mobile telephones. The initial Android stack was developed behind closed doors; the code only made it out into the world when the first deployments were already in the works. The Android developers have done a lot of kernel work, but very little code has made the journey into the mainline. The code which has been merged all went into the staging tree without a whole lot of initiative from the Android side. Now, though, Android developer Arve Hjønnevåg is making an effort to merge a piece of that project's infrastructure through the normal process. It is not proving to be an easy ride.

The most controversial bit of code is a feature known as "wakelocks." In Android-speak, a "wakelock" is a mechanism which can prevent the system from going into a low-power state. In brief, kernel code can set up a wakelock with something like this:

    #include <linux/wakelock.h>

    void wake_lock_init(struct wake_lock *lock, int type, const char *name);

The type value describes what kind of wakelock this is; name gives it a name which can be seen in /proc/wakelocks. There are two possibilities for the type: WAKE_LOCK_SUSPEND prevents the system from suspending, while WAKE_LOCK_IDLE prevents it from going into a low-power idle state, which may increase response times. The API for acquiring and releasing these locks is:

    void wake_lock(struct wake_lock *lock);
    void wake_lock_timeout(struct wake_lock *lock, long timeout);
    void wake_unlock(struct wake_lock *lock);
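To make those semantics concrete, here is a user-space model of what the calls do. It is a sketch under stated assumptions, not the Android kernel code: the `model_` names are hypothetical, time is passed in explicitly as an abstract tick count (the real wake_lock_timeout() has no "now" argument), and the model assumes a suspend lock blocks only suspend while an idle lock blocks only the low-power idle state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the wakelock semantics; NOT the
 * Android kernel implementation. */
enum model_lock_type { MODEL_LOCK_SUSPEND, MODEL_LOCK_IDLE };

struct model_wake_lock {
    const char *name;            /* what /proc/wakelocks would show */
    enum model_lock_type type;
    bool active;
    bool has_timeout;
    long expires;                /* meaningful only if has_timeout */
};

void model_wake_lock_init(struct model_wake_lock *lock,
                          enum model_lock_type type, const char *name)
{
    lock->name = name;
    lock->type = type;
    lock->active = false;
    lock->has_timeout = false;
    lock->expires = 0;
}

/* Hold the lock until model_wake_unlock() is called. */
void model_wake_lock(struct model_wake_lock *lock)
{
    lock->active = true;
    lock->has_timeout = false;
}

/* Hold the lock, but let it lapse by itself 'timeout' ticks after 'now'. */
void model_wake_lock_timeout(struct model_wake_lock *lock, long now,
                             long timeout)
{
    lock->active = true;
    lock->has_timeout = true;
    lock->expires = now + timeout;
}

void model_wake_unlock(struct model_wake_lock *lock)
{
    lock->active = false;
    lock->has_timeout = false;
}

static bool model_lock_held(const struct model_wake_lock *lock, long now)
{
    if (!lock->active)
        return false;
    return !lock->has_timeout || now < lock->expires;
}

/* The state guarded by 'type' may be entered only if no lock of that
 * type is currently held. */
bool model_may_enter(const struct model_wake_lock *locks, size_t n,
                     enum model_lock_type type, long now)
{
    for (size_t i = 0; i < n; i++)
        if (locks[i].type == type && model_lock_held(&locks[i], now))
            return false;
    return true;
}
```

In this model, a lock taken with a timeout simply stops counting once its deadline passes, so a forgotten wake_unlock() on that lock cannot keep the system awake indefinitely.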

There is also a user-space interface. Writing a name to /sys/power/wake_lock establishes a lock with that name, which can then be written to /sys/power/wake_unlock to release the lock. The current patch set only allows suspend locks to be taken from user space.
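From a user-space program, taking and dropping such a lock amounts to writing the lock's name into one of those two files. The helper below is a minimal sketch; the function names are hypothetical, and the path is a parameter so the code can be tried against an ordinary file on a system without the Android patches.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a lock name into a wakelock control file. Returns 0 on success,
 * -1 if the file cannot be opened or the write fails or is short. */
int wakelock_write(const char *path, const char *name)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    size_t len = strlen(name);
    ssize_t n = write(fd, name, len);
    close(fd);
    return (n >= 0 && (size_t)n == len) ? 0 : -1;
}

/* Wrappers for the paths described above; they only work on kernels
 * carrying the Android wakelock patches. */
int user_wake_lock(const char *name)
{
    return wakelock_write("/sys/power/wake_lock", name);
}

int user_wake_unlock(const char *name)
{
    return wakelock_write("/sys/power/wake_unlock", name);
}
```

Note that nothing ties the lock to the process that wrote it: if the program exits between the two writes, the lock stays held.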

This submission has not been received particularly well. It has, instead, drawn comments like this from Ben Herrenschmidt:

looks to me like some people hacked up some ad-hoc trick for their own local need without instead trying to figure out how to fit things with the existing infrastructure (or possibly propose changes to the existing infrastructure to fit their needs).

or this one from Pavel Machek:

Ok, I think that this wakelock stuff is in "can't be used properly" area on Rusty's scale of nasty interfaces.

There's no end of reasons to dislike this interface. Much of it duplicates the existing pm_qos (quality of service) API; it seems that pm_qos does not meet Android's needs, but it also seems that no effort was made to fix the problems. The scheme seems over-engineered when all that is really needed is a "do not suspend" flag - or, at most, a counter. The patches disable the existing /sys/power/state interface, which does not play well with wakelocks. There is no way to recover if a user-space process exits while holding a wakelock. The default behavior for the system is to suspend, even if a process is running; keeping a system awake may involve a chain of wakelocks obtained by various software components. And so on.
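For comparison, the "counter" alternative could look something like the following single-threaded sketch (all names hypothetical). A real in-kernel version would need atomic operations or a lock, and it gives up the per-lock names and timeouts that wakelocks provide.

```c
#include <stdbool.h>

/* Hypothetical "do not suspend" counter: any nonzero value vetoes suspend. */
struct suspend_inhibit {
    unsigned int count;
};

void inhibit_suspend(struct suspend_inhibit *s)
{
    s->count++;
}

/* Drop one inhibition. Returns false on an unbalanced release. */
bool allow_suspend(struct suspend_inhibit *s)
{
    if (s->count == 0)
        return false;
    s->count--;
    return true;
}

bool suspend_permitted(const struct suspend_inhibit *s)
{
    return s->count == 0;
}
```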

The end result is that this code will not make it into the mainline kernel. But it has been shipped on large numbers of G1 phones, with many more yet to go. So users of all those phones will be using out-of-tree code which will not be merged, at least not in anything like its current form. Any applications which depend on the wakelock sysfs interface will break if that interface is brought up to proper standards. It's a bit of a mess, but it is a very typical mess for the embedded systems community. Embedded developers operate under a set of constraints which makes proper kernel development hard. For example:

One of the core rules of kernel development is "post early and often." Code which is developed behind closed doors gets no feedback from the development community, so it can easily follow a blind path for a long time. But embedded system vendors rarely want to let the world know about what they are doing before the product is ready to ship; they hope, instead, to keep their competitors in the dark for as long as possible. So posting early is rarely seen as an option.

Another fundamental rule is "upstream first": code goes into the mainline before being shipped to customers. Once again, even if an embedded vendor wants to send code into the mainline, they rarely want to begin that process before the product ships. So embedded kernels are shipped containing out-of-tree code which almost certainly has a number of problems, unsupportable APIs, and more.

Kernel developers are expected to work with the goal of improving the kernel for everybody. Embedded developers, instead, are generally solving a highly-specific problem under tight time constraints. So they do not think about, for example, extending the existing quality-of-service API to meet their needs; instead, they bash out something which is quick, dirty, and not subject to upstream review.

One could argue that Google has the time, resources, and in-house kernel development knowledge to avoid all of these problems and do things right. Instead, we have been treated to a fairly classic example of how things can go wrong.

The good news is that Google developers are now engaging with the community and trying to get their code into the mainline. This process could well be long, and require a fair amount of adjustment on the Android side. Even if the idea of wakelocks as a way to prevent the system from suspending is accepted - which is far from certain - the interface will require significant changes. The associated "early suspend" API - essentially a notification mechanism for system state changes - will need to be generalized beyond the specific needs of the G1 phone. It could well be a lot of work.
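As a rough illustration of what a generalized notification mechanism might look like, here is a hypothetical notifier table: interested code registers suspend and resume callbacks, which are called in registration order on the way down and in reverse order on the way up. This is not Android's early-suspend API; the names, the fixed table size, and the ordering policy are all assumptions made for the sketch.

```c
#include <stddef.h>

/* Hypothetical notification chain in the spirit of "early suspend". */
#define MODEL_MAX_NOTIFIERS 8

struct model_notifier {
    void (*suspend)(void *data);  /* e.g. the screen is about to turn off */
    void (*resume)(void *data);   /* e.g. the screen has turned back on */
    void *data;
};

static struct model_notifier *notifiers[MODEL_MAX_NOTIFIERS];
static size_t nr_notifiers;

/* Returns 0 on success, -1 if the fixed-size table is full. */
int model_register_notifier(struct model_notifier *n)
{
    if (nr_notifiers == MODEL_MAX_NOTIFIERS)
        return -1;
    notifiers[nr_notifiers++] = n;
    return 0;
}

/* Entering the low-power state: notify in registration order. */
void model_notify_suspend(void)
{
    for (size_t i = 0; i < nr_notifiers; i++)
        if (notifiers[i]->suspend)
            notifiers[i]->suspend(notifiers[i]->data);
}

/* Leaving it: notify in reverse order, undoing the steps last-first. */
void model_notify_resume(void)
{
    for (size_t i = nr_notifiers; i-- > 0;)
        if (notifiers[i]->resume)
            notifiers[i]->resume(notifiers[i]->data);
}

/* Example handlers for illustration: rather than touching hardware, they
 * record data[0] on suspend and data[1] on resume in a trace buffer. */
static char trace[32];
static size_t trace_len;

static void trace_append(char c)
{
    if (trace_len < sizeof trace - 1)
        trace[trace_len++] = c;
}

void trace_suspend(void *data) { trace_append(((const char *)data)[0]); }
void trace_resume(void *data) { trace_append(((const char *)data)[1]); }
```

With two notifiers whose data strings are "aA" and "bB", a suspend followed by a resume leaves the trace buffer holding "abBA".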

But if that work gets done, the kernel will be much better placed to handle the power-management needs of handheld devices. That, in turn, can only benefit anybody else working on embedded Linux deployments. And, crucially, it will help the Android developers as they port their code to other devices with differing needs. As the number of Android-based phones grows, the cost of carrying out-of-tree code to support each of them will also grow. It would be far better to generalize that support and get it into the mainline, where it can be maintained and improved by the community.

Most embedded systems vendors, it seems, would be unwilling to do that work; they are too busy trying to put together their next product. So this sort of code tends to languish out of the mainline, and the quality of embedded Linux suffers accordingly. Perhaps this case will be different, though; maybe Google will put the resources into getting its specialized code into shape and merged into the mainline. That effort could help to establish Android as a solid, well-supported platform for mobile use, and that should be good for business. Your editor, ever the optimist, hopes that things will work out this way; it would be a good demonstration of how the embedded community can work better with the kernel community, getting a better kernel in return.


This does not count for all embedded developers

Posted Feb 11, 2009 10:40 UTC (Wed) by wsa (guest, #52415)

I would have liked it if you had written "a lot of embedded developers" instead of "embedded developers" in places. We at Pengutronix, for example, are working hard on getting our patches upstream and advertising this to our customers, and I know a few others who do, too. "Embedded" is not just the usual suspects and their mobile phones; there are numerous devices solving industrial tasks which want to be supported. Quality is definitely needed here.

In my book, the time-constraint problem is the biggest one. Customers do want results whilst the mainline review process needs time, so you often end up working with a customer version and a mainline version, porting fixes back and forth. Also, the hardware developers face the same time constraints (be they processor manufacturers or board designers), which makes producing kernel-quality code an even bigger challenge because of sloppy hardware.

I am just now working on a generic SPI-driver for the i.MX-platforms for mainline. Everyone who wants to get an idea what difficulties an embedded kernel developer may face is invited to join me. This one is a prime example.

This does not count for all embedded developers

Posted Feb 11, 2009 13:59 UTC (Wed) by corbet (editor, #1)

You are right, I should not have used quite such a broad brush. There are quite a few embedded developers who make a point of working with the upstream kernel, and the number seems to be growing. My apologies.

Wakelocks and the embedded problem

Posted Feb 11, 2009 11:47 UTC (Wed) by russell (guest, #10458)

A little off topic, but instead of a wakelock, I'd like to see a poweroff timer that powers down regardless of what user space is doing. After cooking my laptop on several occasions, doing who knows what damage, I no longer trust user space to get it right and power off.

Wakelocks and the embedded problem

Posted Feb 11, 2009 13:01 UTC (Wed) by Kluge (subscriber, #2881)

'Another fundamental rule is "upstream first": code goes into the mainline before being shipped to customers.'

I thought that the kernel hackers disliked adding new features unless they were already in use (and hence proven useful) by someone.

Wakelocks and the embedded problem

Posted Feb 11, 2009 15:53 UTC (Wed) by knan (subscriber, #3940)

"In use by some other piece of code" is the usual criterion, i.e. a driver using your added shared infrastructure, a userspace program talking to the interface added, etc.

The actual hardware being more than dreams in a simulator also helps, of course.

Wakelocks and the embedded problem

Posted Feb 14, 2009 0:34 UTC (Sat) by giraffedata (guest, #1954)

I can remember various LWN articles about some proposed feature where kernel developers argued that it needed to be used out of tree and shipped with distributions for a while to prove its worthiness before joining the kernel.org major league. I believe these were major functions, though.

Wakelocks and the embedded problem

Posted Feb 11, 2009 16:20 UTC (Wed) by mjthayer (guest, #39183)

Perhaps unsurprisingly, LWN articles on this subject tend to come to the point of view that embedded developers should adjust to fit the kernel developers' model. Perhaps the kernel developers would be able to move towards the embedded developers to some extent without compromising their own positions though?

Just going from the example above (and I realise that this may already be happening without my knowing), the main problem seems to be the interfaces, not the code. So if the embedded developers were able to discuss the interfaces with the relevant kernel developers on private mailing lists, to get an idea of what was likely to wash and what not, then everyone would be much further on, even without the embedded people releasing their code.

Of course, once they did get to the stage of releasing code, there would still be the long integration process, but a lot of the heat would be taken off by the fact that the interfaces were likely to get through without too much discussion. The embedded people would be able to ship without the integration being complete, but safe in the knowledge that at some point their stuff would run on a generic kernel, with all the resulting benefits, as long as they showed a reasonable amount of good will.

Wakelocks and the embedded problem

Posted Feb 11, 2009 16:38 UTC (Wed) by droundy (subscriber, #4559)

I imagine the problem with this idea is that interfaces are usually trickier than implementations, and it's very hard to know if an interface is "right" without also having a decent implementation. E.g., presumably the problem with pm_qos that made it inadequate for Android's needs wasn't obvious when that code was reviewed (and is still not clear to me).

Wakelocks and the embedded problem

Posted Feb 11, 2009 18:19 UTC (Wed) by mjthayer (guest, #39183)

They could still explain though, why the existing interfaces did not suit them and what they proposed to/were in the process of creating instead. That would at least give some valuable feedback as to how likely the changes are to get in. The embedded people do create implementations. Even that limited feedback as they went along might make everyone's life easier.

Wakelocks and the embedded problem

Posted Feb 11, 2009 21:13 UTC (Wed) by gouyou (guest, #30290)

> if the embedded developers were able to discuss the interfaces with the
> relevant kernel developers on private mailing lists

Yeah, like most of them would be interested to have discussion like that under NDA, helping for-profit companies produce better products ...

Wakelocks and the embedded problem

Posted Feb 11, 2009 21:45 UTC (Wed) by mjthayer (guest, #39183)

I am supposing of course that they think the embedded people will contribute interesting code in the long run. If they don't think that then this is moot anyway :)

Wakelocks and the embedded problem

Posted Feb 11, 2009 22:00 UTC (Wed) by gouyou (guest, #30290)

But even if they contribute interesting code, most kernel developers do not work on Linux only for glory; they get paid to do it. I'm not sure companies like Red Hat, Novell, IBM or Oracle would be terribly happy to have their people spend time reviewing embedded APIs.

(For the top employers you can take a look here: http://lwn.net/Articles/312074/)

Wakelocks and the embedded problem

Posted Feb 12, 2009 0:10 UTC (Thu) by dlang (guest, #313)

most kernel developers will respond to private e-mails about new developments.

there isn't a list for this, in part because there are so many kernel developers that such a list would hardly be limited.

the kernel folks have included drivers for hardware that's not shipping yet.

so, the kernel developers have shown that they are willing to work with embedded developers, but they can't be proactive about it because they don't have any way of knowing that they need to contact someone. the embedded developers know they are working on something, and can easily find out who to contact for advice. for the most part they don't choose to do so.

Wakelocks and the embedded problem

Posted Feb 12, 2009 3:03 UTC (Thu) by jamesh (guest, #1159)

The private mailing list thing seems like it would be problematic. Are you thinking of a single private mailing list, or one for each embedded developer?

If it is just a single mailing list, then the developer's competitors will likely also be on the list, which they might consider just as bad as a public list.

If it is separate lists, that is a lot of effort for the kernel developers. Also, what should they do if two embedded developers propose interfaces that achieve similar or identical aims? Do they break confidentiality and try to get the two to cooperate, or do they have to pretend that they don't know about the other use case?

Wakelocks and the embedded problem

Posted Feb 12, 2009 9:04 UTC (Thu) by mjthayer (guest, #39183)

Actually I was thinking that the embedded developers would not be on the list at all, but CCed when appropriate. And if handled delicately, they might even welcome a limited co-ordination with competitors on kernel interfaces - those are likely not to be the most valuable "IP" which they wish to keep to their breast for all times. If the kernel developers thought that the resulting contributions were likely to be of sufficient value (to themselves or their employers :) ) they could even play intermediaries without actually dropping names. This "if" is of course the hinging point for everything I have posted up until now.

Wakelocks and the embedded problem

Posted Feb 12, 2009 9:53 UTC (Thu) by johill (subscriber, #25196)

You're also assuming that no kernel developers (for lack of a more specific term) are competitors, something which cannot possibly be true. If you think this through, the list might as well be public, and then you might as well use linux-kernel or a more appropriate subsystem list.

Wakelocks and the embedded problem

Posted Feb 12, 2009 12:09 UTC (Thu) by mjg59 (subscriber, #23239)

Google had all of this code in a public git repository long before they shipped anything running it, so absence of discussion before now isn't down to wanting to keep it secret.

Wakelocks and the embedded PM

Posted Feb 12, 2009 18:00 UTC (Thu) by mgross (guest, #38112)

As I look more and more closely at the wakelock structure I'm struck by how similar it is to some ideas we tossed around on the CELF PM working group a few years back. Ideas that fizzled a little at that time.

The high level notion of having a "fall-line" to low power states, subject to constraints keeping components from "falling" to a lower power state, is still quite interesting. FWIW, at the time we worked on this concept in CELF, things got complex around the dependency and notification networks that needed to be managed to make things work.

Wakelock implements a type of constraint method. I think the API has problems, but the general idea of constraint-based steepest-descent PM still has appeal. To me, anyway.

Wakelocks and the embedded PM

Posted Feb 13, 2009 0:53 UTC (Fri) by mjg59 (subscriber, #23239)

I think the real question is over how constraints should be exposed. I'm very much on the side of inferring constraints from the behaviour of userland - if they have a device open then we should assume that they want to use it, so should avoid shutting it down. We're nowhere near providing that level of functionality in the kernel yet, but doing so helps the embedded, desktop and server worlds.

I'm not sold on the idea of providing explicit constraints in most cases. If you're going to provide that constraint explicitly, why not allow the kernel to infer it? The code to say "Nothing needs access to input devices now" is not significantly more complicated than the code that closes the input device when it doesn't need it. But that's the kind of case that the Android code deals with now.
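The kind of inference described here can be sketched as a per-device open count: the device stays powered while any handle is open and may be shut down when the last one is closed. This is a hypothetical single-threaded model, not kernel code; a real implementation would likely delay the power-down a little rather than act immediately on the last close.

```c
#include <stdbool.h>

/* Hypothetical model: power state is inferred from userland's open/close
 * behaviour, with no explicit constraint API. */
struct model_device {
    const char *name;
    unsigned int open_count;
    bool powered;
};

void model_open(struct model_device *dev)
{
    if (dev->open_count++ == 0)
        dev->powered = true;     /* first user: power the device up */
}

void model_close(struct model_device *dev)
{
    if (dev->open_count > 0 && --dev->open_count == 0)
        dev->powered = false;    /* last user gone: nothing needs it */
}
```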

Stuff like the pm_qos framework deals with a different case, where you're supplying additional functional constraints to the kernel above and beyond those that can be inferred. I think we should be focusing on what those constraints might be rather than thinking about the wakelock and early suspend code from Android.

Wakelocks and the embedded problem

Posted Feb 13, 2009 0:44 UTC (Fri) by jd (guest, #26381)

The level of interaction by embedded developers can be roughly modeled by Brownian motion. Sometimes it is there, sometimes it isn't. For example, when working on the FOLK kernel patch set of obscure drivers, I encountered drivers for embedded hardware that would be there one week and vanish the next.

(I had a devil of a time trying to find VME or Fieldbus drivers that would sit still. The drivers would appear without warning - the companies rarely advertised them - and then vanished without warning.)

Sometimes, I would get all kinds of odd reactions to questions. The COMEDI developers were dead set against merging their code with the baseline, although I could never get them to give me a reason that made sense. I could never get much of a coherent answer from RTAI, either. I'm sure both groups had excellent reasons, and mean no offense to either, but I would have preferred to know what that reason was.

The Transputer drivers never made it into the mainstream, either, and I only discovered them on a series of barely-recognized FTP sites that didn't appear on most search engines. True, not many people used Transputers by the time the patch came out, but then not many people used the CBM64 when drivers for Commodore peripherals started circulating. There was zero documentation for the Transputer drivers, including any indication of who wrote them, and they'd clearly been abandoned for a long time by the time I found them.

One of the reasons I developed FOLK was to stop this kind of nonsense from happening - people would have a better idea of what was out there, whether the developers liked it or not. (I got into a few tangles with GRSecurity over that. I can understand their reasoning of wanting to make sure security code hasn't been tampered with, but they have no control over what someone installing it does and they're now near-death from lack of exposure. They wouldn't depend as much on a single revenue stream if their work was better-known, better-circulated and better-understood. I can understand their position, but I can still resent the fact that Linux will be a poorer place when GRSecurity goes the way of the Dodo.)

I found many, many other embedded projects out there, and expect to find many many more such projects should I ever go looking again. These projects don't suffer from a lack of releases, a lack of open-sourceness or a lack of highly imaginative solutions. What they lack is an existence within the visible spectrum. What you don't see, you can't use. Sure, there are some "secret" projects out there, but if the published projects were getting some eyeballs, there'd be less need for "secret" APIs (as the problems with the mainstream APIs would be fixed or replacements would already be incorporated).

Sure, if more of these projects got discussed and more got included into the mainstream, it wouldn't fix all the problems in the world, or even in the embedded world. What it would do is reduce the number of opportunities for problems and misunderstandings to develop. Isn't that in the interests of both embedded and non-embedded developers?

I can even understand embedded (and non-embedded) developers being wary of the extra overheads involved in collecting together the various bits of work, documenting meaningfully the APIs and other such non-fundamental work that needs to get done. I'd even be willing to do some of this, if there was some chance it would make a difference.

Wakelocks and the embedded problem

Posted Dec 31, 2011 11:55 UTC (Sat) by ErikEngerd (guest, #82043)

It is now almost 2012 and I still regularly find myself tracing wake lock issues on a recent Android phone. I think that whatever the solution to this problem is, the kernel should be in control over wake locks. In particular, when a process exits, all its wake locks should be released.

In addition, I am wondering why wake locks are even needed. Shouldn't the cpu go to a low power state whenever possible automatically? What exactly is the problem that a wake lock is intended to solve? Is it just scheduling of background tasks?

Perhaps it would also be good to have something like policies in linux defining how often/when certain processes may run (perhaps an extension to selinux?).

Anyway, I think the current implementation with putting the responsibility for wake locks in user space is really fragile. Many users are experiencing poor battery life because of this and are not technical enough to find the cause. Having a solution where the kernel is in control would be a big step in the right direction, I think.

Wakelocks and the embedded problem

Posted Dec 31, 2011 19:30 UTC (Sat) by dlang (guest, #313)

this is the heart of the disagreement over wakelocks being _the_ solution to the power problem.

there are problems with switching the system into low power mode

1. some things don't work in some low power modes

2. it takes time to switch out of low power modes

linux systems have been switching to low power modes automatically for quite a few years, but they only switch to modes that are going to be transparent to the user (unless they are watching for it).

In addition to these power saving modes, there are the 'suspend' and 'hibernate' modes where the system stops all processing. Traditional systems try to determine that the system has been idle for a long enough time period before going into suspend.

the idea behind the userspace wakelocks can be paraphrased into having an extremely short (approaching zero) timer for going into suspend, but only if nothing is holding a wakelock to keep the system awake.

In my opinion, this idea is mostly defeated by the fact that they don't trust regular programs to take the wakelock, and instead have a central power management daemon that does things like hold the wakelock the entire time the screen is lit.

now, something similar to the wakelock was needed in the kernel to keep the system from going to sleep at the same time that a new event was happening that would cause the system to wake up (to prevent a race condition), and a mechanism to do this was added to the kernel a year or so ago (but is not yet used by Android)
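One way to close that race is to count wakeup events and make suspend conditional on the count being unchanged since it was sampled. The sketch below is loosely in the spirit of the kernel's wakeup-event accounting (exposed to user space as /sys/power/wakeup_count), but the names and structure here are hypothetical, and it omits the locking a real implementation needs.

```c
#include <stdbool.h>

/* Hypothetical wakeup-event accounting used to veto a racing suspend. */
struct model_wakeup_state {
    unsigned long events;        /* completed wakeup events */
    unsigned int in_progress;    /* events still being processed */
};

void model_wakeup_begin(struct model_wakeup_state *w)
{
    w->in_progress++;
}

void model_wakeup_end(struct model_wakeup_state *w)
{
    if (w->in_progress > 0)
        w->in_progress--;
    w->events++;
}

/* Step 1: the suspend initiator samples the counter; this fails while
 * an event is still being handled. */
bool model_read_wakeup_count(const struct model_wakeup_state *w,
                             unsigned long *count)
{
    if (w->in_progress)
        return false;
    *count = w->events;
    return true;
}

/* Step 2: right before suspending, abort if anything happened since the
 * sample was taken. */
bool model_may_suspend(const struct model_wakeup_state *w,
                       unsigned long snapshot)
{
    return w->in_progress == 0 && w->events == snapshot;
}
```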

Wakelocks and the embedded problem

Posted Mar 26, 2012 17:07 UTC (Mon) by bgat (guest, #20709)

I think that runtime-pm is very, very close to being a complete replacement for wakelocks. At least for platforms with drivers that fully support it.

The nice thing about runtime-pm is that it is fully aware of the kernel's Device Model, and can therefore make better decisions about system state than wakelocks can. The downside is that it looks nothing like existing wakelocks, so it requires movement from Android.
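One way to picture that Device Model awareness is usage counts that propagate up the device tree, so that a parent stays active while any child is active. The following is a hypothetical sketch of that idea only; it is not the kernel's pm_runtime API, whose real entry points have different names and considerably richer semantics.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical usage counting over a device tree. */
struct rt_device {
    struct rt_device *parent;
    unsigned int usage;            /* direct users of this device */
    unsigned int active_children;
    bool active;
};

static void rt_update(struct rt_device *dev)
{
    bool want = dev->usage > 0 || dev->active_children > 0;
    if (want == dev->active)
        return;
    if (want) {
        /* Bring the parent up before the child. */
        if (dev->parent) {
            dev->parent->active_children++;
            rt_update(dev->parent);
        }
        dev->active = true;
    } else {
        /* Take the child down before the parent. */
        dev->active = false;
        if (dev->parent) {
            dev->parent->active_children--;
            rt_update(dev->parent);
        }
    }
}

void rt_get(struct rt_device *dev)
{
    dev->usage++;
    rt_update(dev);
}

void rt_put(struct rt_device *dev)
{
    if (dev->usage > 0)
        dev->usage--;
    rt_update(dev);
}
```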