Wakelocks and the embedded problem
Android, of course, is Google's platform for mobile telephones. The initial Android stack was developed behind closed doors; the code only made it out into the world when the first deployments were already in the works. The Android developers have done a lot of kernel work, but very little code has made the journey into the mainline. The code which has been merged all went into the staging tree without a whole lot of initiative from the Android side. Now, though, Android developer Arve Hjønnevåg is making an effort to merge a piece of that project's infrastructure through the normal process. It is not proving to be an easy ride.
The most controversial bit of code is a feature known as "wakelocks." In Android-speak, a "wakelock" is a mechanism which can prevent the system from going into a low-power state. In brief, kernel code can set up a wakelock with something like this:
    #include <linux/wakelock.h>

    void wake_lock_init(struct wake_lock *lock, int type, const char *name);
The type value describes what kind of wakelock this is; name gives it a name which can be seen in /proc/wakelocks. There are two possibilities for the type: WAKE_LOCK_SUSPEND prevents the system from suspending, while WAKE_LOCK_IDLE prevents going into a low-power idle state which may increase response times. The API for acquiring and releasing these locks is:
    void wake_lock(struct wake_lock *lock);
    void wake_lock_timeout(struct wake_lock *lock, long timeout);
    void wake_unlock(struct wake_lock *lock);
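To make the semantics concrete, here is a minimal user-space model of the calls above. It is a sketch, not the Android implementation: every `model_` name, the fixed-size registry, and the explicit `now` argument (standing in for the kernel's clock, in arbitrary ticks) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the wakelock calls; not kernel code. */
enum model_lock_type { MODEL_LOCK_SUSPEND, MODEL_LOCK_IDLE };

struct model_wake_lock {
    enum model_lock_type type;
    const char *name;
    bool held;
    bool has_expiry;   /* true when the lock was taken with a timeout */
    long expires;      /* tick at which a timed lock stops counting */
};

#define MODEL_MAX_LOCKS 16
static struct model_wake_lock *model_locks[MODEL_MAX_LOCKS];
static size_t model_lock_count;

/* Register a lock; returns 0 on success, -1 if the registry is full. */
int model_wake_lock_init(struct model_wake_lock *lock,
                         enum model_lock_type type, const char *name)
{
    assert(lock != NULL);
    if (model_lock_count == MODEL_MAX_LOCKS)
        return -1;
    lock->type = type;
    lock->name = name;
    lock->held = false;
    lock->has_expiry = false;
    lock->expires = 0;
    model_locks[model_lock_count++] = lock;
    return 0;
}

/* Hold the lock until it is explicitly released. */
void model_wake_lock(struct model_wake_lock *lock)
{
    lock->held = true;
    lock->has_expiry = false;
}

/* Hold the lock, but let it lapse by itself `timeout` ticks after `now`. */
void model_wake_lock_timeout(struct model_wake_lock *lock, long now, long timeout)
{
    lock->held = true;
    lock->has_expiry = true;
    lock->expires = now + timeout;
}

void model_wake_unlock(struct model_wake_lock *lock)
{
    lock->held = false;
    lock->has_expiry = false;
}

static bool model_lock_active(const struct model_wake_lock *lock, long now)
{
    if (!lock->held)
        return false;
    return !lock->has_expiry || now < lock->expires;
}

/* The low-power state guarded by `type` may be entered only when no
 * lock of that type is active at tick `now`. */
bool model_state_allowed(enum model_lock_type type, long now)
{
    for (size_t i = 0; i < model_lock_count; i++)
        if (model_locks[i]->type == type &&
            model_lock_active(model_locks[i], now))
            return false;
    return true;
}
```

Under this model, a driver that takes a five-tick timed suspend lock at tick 10 blocks suspend through tick 14 and stops blocking at tick 15, without ever calling the unlock function. An idle lock held at the same time has no effect on whether suspend is allowed.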
There is also a user-space interface. Writing a name to /sys/power/wake_lock establishes a lock with that name, which can then be written to /sys/power/wake_unlock to release the lock. The current patch set only allows suspend locks to be taken from user space.
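From user space, then, taking and releasing a suspend lock comes down to two writes. The sketch below assumes only the two sysfs paths named above; the helper names, and the choice to pass the path as a parameter so the code can be exercised against an ordinary file, are illustrative and not part of the patch set.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WAKE_LOCK_PATH   "/sys/power/wake_lock"
#define WAKE_UNLOCK_PATH "/sys/power/wake_unlock"

/* Write `name` to the file at `path`.
 * Returns 0 on success, -1 on any failure. */
int write_lock_name(const char *path, const char *name)
{
    assert(path != NULL && name != NULL);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    size_t len = strlen(name);
    int failed = (fwrite(name, 1, len, f) != len);

    /* stdio buffers the data, so a rejected write may only show up
     * when the stream is flushed by fclose(). */
    if (fclose(f) != 0)
        failed = 1;

    return failed ? -1 : 0;
}

/* Hypothetical convenience wrappers around the two sysfs files. */
int user_wake_lock(const char *name)
{
    return write_lock_name(WAKE_LOCK_PATH, name);
}

int user_wake_unlock(const char *name)
{
    return write_lock_name(WAKE_UNLOCK_PATH, name);
}
```

A program would call `user_wake_lock("my_app")` before starting work that must not be interrupted by suspend and `user_wake_unlock("my_app")` afterward. Nothing here releases the lock if the program dies in between.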
This submission has not been received particularly well. It has, instead, drawn comments like this from Ben Herrenschmidt:
or this one from Pavel Machek:
There's no end of reasons to dislike this interface. Much of it duplicates the existing pm_qos (quality of service) API; it seems that pm_qos does not meet Android's needs, but it also seems that no effort was made to fix the problems. The scheme seems over-engineered when all that is really needed is a "do not suspend" flag - or, at most, a counter. The patches disable the existing /sys/power/state interface, which does not play well with wakelocks. There is no way to recover if a user-space process exits while holding a wakelock. The default behavior for the system is to suspend, even if a process is running; keeping a system awake may involve a chain of wakelocks obtained by various software components. And so on.
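As a rough illustration of the simpler alternative, a "do not suspend" counter could look something like the following. This is a hypothetical user-space sketch, not code from any posted patch: the function names, the small integer owner IDs, and the per-owner bookkeeping (added so that holds can be dropped when an owner exits) are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OWNERS 8   /* arbitrary limit for the sketch */

static int holds[MAX_OWNERS];   /* per-owner hold counts */
static int total_holds;         /* sum of all entries in holds[] */

static bool owner_valid(int owner)
{
    return owner >= 0 && owner < MAX_OWNERS;
}

/* Take one "do not suspend" reference on behalf of `owner`. */
bool nosuspend_get(int owner)
{
    if (!owner_valid(owner))
        return false;
    holds[owner]++;
    total_holds++;
    return true;
}

/* Drop one reference; fails if `owner` holds none. */
bool nosuspend_put(int owner)
{
    if (!owner_valid(owner) || holds[owner] == 0)
        return false;
    holds[owner]--;
    total_holds--;
    assert(total_holds >= 0);
    return true;
}

/* Called when an owner goes away: release whatever it still holds,
 * so a crashed process cannot keep the system awake forever. */
void nosuspend_owner_exit(int owner)
{
    if (!owner_valid(owner))
        return;
    total_holds -= holds[owner];
    holds[owner] = 0;
}

/* Suspend is permitted only when nobody holds a reference. */
bool suspend_allowed(void)
{
    return total_holds == 0;
}
```

Tracking holds per owner is what makes recovery possible here; a bare global counter would be even simpler, but would share the problem of a holder that exits without releasing.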
The end result is that this code will not make it into the mainline kernel. But it has been shipped on large numbers of G1 phones, with many more yet to go. So users of all those phones will be using out-of-tree code which will not be merged, at least not in anything like its current form. Any applications which depend on the wakelock sysfs interface will break if that interface is brought up to proper standards. It's a bit of a mess, but it is a very typical mess for the embedded systems community. Embedded developers operate under a set of constraints which makes proper kernel development hard. For example:
- One of the core rules of kernel development is "post early and often."
Code which is developed behind closed doors gets no feedback from the
development community, so it can easily follow a blind path for a long
time. But embedded system vendors rarely want to let the world know
about what they are doing before the product is ready to ship; they
hope, instead, to keep their competitors in the dark for as long as
possible. So posting early is rarely seen as an option.
- Another fundamental rule is "upstream first": code goes into the
mainline before being shipped to customers. Once again, even if an
embedded vendor wants to send code into the mainline, they rarely want
to begin that process before the product ships. So embedded kernels
are shipped containing out-of-tree code which almost certainly has a number of
problems, unsupportable APIs, and more.
- Kernel developers are expected to work with the goal of improving the kernel for everybody. Embedded developers, instead, are generally solving a highly-specific problem under tight time constraints. So they do not think about, for example, extending the existing quality-of-service API to meet their needs; instead, they bash out something which is quick, dirty, and not subject to upstream review.
One could argue that Google has the time, resources, and in-house kernel development knowledge to avoid all of these problems and do things right. Instead, we have been treated to a fairly classic example of how things can go wrong.
The good news is that Google developers are now engaging with the community and trying to get their code into the mainline. This process could well be long, and require a fair amount of adjustment on the Android side. Even if the idea of wakelocks as a way to prevent the system from suspending is accepted - which is far from certain - the interface will require significant changes. The associated "early suspend" API - essentially a notification mechanism for system state changes - will need to be generalized beyond the specific needs of the G1 phone. It could well be a lot of work.
But if that work gets done, the kernel will be much better placed to handle the power-management needs of handheld devices. That, in turn, can only benefit anybody else working on embedded Linux deployments. And, crucially, it will help the Android developers as they port their code to other devices with differing needs. As the number of Android-based phones grows, the cost of carrying out-of-tree code to support each of them will also grow. It would be far better to generalize that support and get it into the mainline, where it can be maintained and improved by the community.
Most embedded systems vendors, it seems, would be unwilling to do that
work; they are too busy trying to put together their next product. So this
sort of code tends to languish out of the mainline, and the quality of
embedded Linux suffers accordingly. Perhaps this case will be different,
though; maybe Google will put the resources into getting its specialized
code into shape and merged into the mainline. That effort could help to
establish Android as a solid, well-supported platform for mobile use, and
that should be good for business. Your editor, ever the optimist, hopes
that things will work out this way; it would be a good demonstration of how
the embedded community can work better with the kernel community, getting a
better kernel in return.
Posted Feb 11, 2009 10:40 UTC (Wed)
by wsa (guest, #52415)
I would have liked it if you had written "a lot of embedded developers" instead of "embedded developers" at times. We at Pengutronix, for example, are working hard on getting our patches upstream and advertising this to our customers, and I know a few others who do, too. "Embedded" is not just the usual suspects and their mobile phones; there are numerous devices solving industrial tasks which want to be supported. Quality is definitely needed here.
In my book, the time constraint problem is the biggest one. Customers do want results whilst the mainline review process needs time, so you often end up working with a customer-version and a mainline-version, porting fixes back and forth. Also, one can see that the hardware developers face the same time constraints (be it processor manufacturers or board designers), which makes producing kernel quality code an even bigger challenge because of sloppy hardware.
I am just now working on a generic SPI-driver for the i.MX-platforms for mainline. Everyone who wants to get an idea what difficulties an embedded kernel developer may face is invited to join me. This one is a prime example.
Posted Feb 11, 2009 13:59 UTC (Wed)
by corbet (editor, #1)
Posted Feb 11, 2009 11:47 UTC (Wed)
by russell (guest, #10458)
Posted Feb 11, 2009 13:01 UTC (Wed)
by Kluge (subscriber, #2881)
I thought that the kernel hackers disliked adding new features unless they were already in use (and hence proven useful) by someone.
Posted Feb 11, 2009 15:53 UTC (Wed)
by knan (subscriber, #3940)
The actual hardware being more than dreams in a simulator also helps, of course.
Posted Feb 14, 2009 0:34 UTC (Sat)
by giraffedata (guest, #1954)
Posted Feb 11, 2009 16:20 UTC (Wed)
by mjthayer (guest, #39183)
Just going from the example above (and I realise that this may already be happening without my knowing), the main problem seems to be the interfaces, not the code. So if the embedded developers were able to discuss the interfaces with the relevant kernel developers on private mailing lists, to get an idea of what was likely to wash and what not, then everyone would be much further on, even without the embedded people releasing their code.
Of course, once they did get to the stage of releasing code, there would still be the long integration process, but a lot of the heat would be taken off by the fact that the interfaces were likely to get through without too much discussion. The embedded people would be able to ship without the integration being complete, but safe in the knowledge that at some point their stuff would run on a generic kernel, with all the resulting benefits, as long as they showed a reasonable amount of good will.
Posted Feb 11, 2009 16:38 UTC (Wed)
by droundy (subscriber, #4559)
Posted Feb 11, 2009 18:19 UTC (Wed)
by mjthayer (guest, #39183)
Posted Feb 11, 2009 21:13 UTC (Wed)
by gouyou (guest, #30290)
Yeah, like most of them would be interested to have discussion like that under NDA, helping for-profit companies produce better products ...
Posted Feb 11, 2009 21:45 UTC (Wed)
by mjthayer (guest, #39183)
Posted Feb 11, 2009 22:00 UTC (Wed)
by gouyou (guest, #30290)
(For the top employers you can take a look here: http://lwn.net/Articles/312074/)
Posted Feb 12, 2009 0:10 UTC (Thu)
by dlang (guest, #313)
there isn't a list for this, in part because there are so many kernel developers that such a list would hardly be limited.
the kernel folks have included drivers for hardware that's not shipping yet.
so, the kernel developers have shown that they are willing to work with embedded developers, but they can't be proactive about it because they don't have any way of knowing that they need to contact someone. the embedded developers know they are working on something, and can easily find out who to contact for advice. for the most part they don't choose to do so.
Posted Feb 12, 2009 3:03 UTC (Thu)
by jamesh (guest, #1159)
If it is just a single mailing list, then the developer's competitors will likely also be on the list, which they might consider just as bad as a public list.
If it is separate lists, that is a lot of effort for the kernel developers. Also, what should they do if two embedded developers propose interfaces that achieve similar or identical aims? Do they break confidentiality and try to get the two to cooperate, or do they have to pretend that they don't know about the other use case?
Posted Feb 12, 2009 9:04 UTC (Thu)
by mjthayer (guest, #39183)
Posted Feb 12, 2009 9:53 UTC (Thu)
by johill (subscriber, #25196)
Posted Feb 12, 2009 12:09 UTC (Thu)
by mjg59 (subscriber, #23239)
Posted Feb 12, 2009 18:00 UTC (Thu)
by mgross (guest, #38112)
The high-level notion of having a "fall-line" to low power states, subject to constraints keeping components from "falling" to a lower power state, is still quite interesting. FWIW, at the time we worked on this concept in CELF, things got complex around the dependency and notification networks that needed to be managed to make things work.
Wakelock implements a type of constraint method. I think the API has problems, but the general idea of constraint-based steepest-descent PM still has appeal. To me, anyway.
Posted Feb 13, 2009 0:53 UTC (Fri)
by mjg59 (subscriber, #23239)
I'm not sold on the idea of providing explicit constraints in most cases. If you're going to provide that constraint explicitly, why not allow the kernel to infer it? The code to say "Nothing needs access to input devices now" is not significantly more complicated than the code that closes the input device when it doesn't need it. But that's the kind of case that the Android code deals with now.
Stuff like the pm_qos framework deals with a different case, where you're supplying additional functional constraints to the kernel above and beyond those that can be inferred. I think we should be focusing on what those constraints might be rather than thinking about the wakelock and early suspend code from Android.
Posted Feb 13, 2009 0:44 UTC (Fri)
by jd (guest, #26381)
(I had a devil of a time trying to find VME or Fieldbus drivers that would sit still. The drivers would appear without warning - the companies rarely advertised them - and then vanish without warning.)
Sometimes, I would get all kinds of odd reactions to questions. The COMEDI developers were dead set against merging their code with the baseline, although I could never get them to give me a reason that made sense. I could never get much of a coherent answer from RTAI, either. I'm sure both groups had excellent reasons, and I mean no offense to either, but I would have preferred to know what those reasons were.
The Transputer drivers never made it into the mainstream, either, and I only discovered them on a series of barely-recognized FTP sites that didn't appear on most search engines. True, not many people used Transputers by the time the patch came out, but then not many people used the CBM64 when drivers for Commodore peripherals started circulating. There was zero documentation for the Transputer drivers, including any indication of who wrote them, and they'd clearly been abandoned a long time by the time I found them.
One of the reasons I developed FOLK was to stop this kind of nonsense from happening - people would have a better idea of what was out there, whether the developers liked it or not. (I got into a few tangles with GRSecurity over that. I can understand their reasoning of wanting to make sure security code hasn't been tampered with, but they have no control over what someone installing it does and they're now near-death from lack of exposure. They wouldn't depend as much on a single revenue stream if their work was better-known, better-circulated and better-understood. I can understand their position, but I can still resent the fact that Linux will be a poorer place when GRSecurity goes the way of the Dodo.)
I found many, many other embedded projects out there, and expect to find many many more such projects should I ever go looking again. These projects don't suffer from a lack of releases, a lack of open-sourceness or a lack of highly imaginative solutions. What they lack is an existence within the visible spectrum. What you don't see, you can't use. Sure, there are some "secret" projects out there, but if the published projects were getting some eyeballs, there'd be less need for "secret" APIs (as the problems with the mainstream APIs would be fixed or replacements would already be incorporated).
Sure, if more of these projects got discussed and more got included into the mainstream, it wouldn't fix all the problems in the world, or even in the embedded world. What it would do is reduce the number of opportunities for problems and misunderstandings to develop. Isn't that in the interests of both embedded and non-embedded developers?
I can even understand embedded (and non-embedded) developers being wary of the extra overheads involved in collecting together the various bits of work, documenting meaningfully the APIs and other such non-fundamental work that needs to get done. I'd even be willing to do some of this, if there was some chance it would make a difference.
Posted Dec 31, 2011 11:55 UTC (Sat)
by ErikEngerd (guest, #82043)
In addition, I am wondering why wake locks are even needed. Shouldn't the cpu go to a low power state whenever possible automatically? What exactly is the problem that a wake lock is intended to solve? Is it just scheduling of background tasks?
Perhaps it would also be good to have something like policies in linux defining how often/when certain processes may run (perhaps an extension to selinux?).
Anyway, I think the current implementation with putting the responsibility for wake locks in user space is really fragile. Many users are experiencing poor battery life because of this and are not technical enough to find the cause. Having a solution where the kernel is in control would be a big step in the right direction, I think.
Posted Dec 31, 2011 19:30 UTC (Sat)
by dlang (guest, #313)
there are problems with switching the system into low power mode
1. some things don't work in some low power modes
2. it takes time to switch out of low power modes
linux systems have been switching to low power modes automatically for quite a few years, but they only switch to modes that are going to be transparent to the user (unless they are watching for it).
In addition to these power saving modes, there are the 'suspend' and 'hibernate' modes where the system stops all processing. Traditional systems try to determine that the system is idle for a long enough time period before going into suspend.
the idea behind the userspace wakelocks can be paraphrased into having an extremely short (approaching zero) timer for going into suspend, but only if nothing is holding a wakelock to keep the system awake.
In my opinion, this idea is mostly defeated by the fact that they don't trust regular programs to take the wakelock, and instead have a central power management daemon that does things like hold the wakelock the entire time the screen is lit.
now, something similar to the wakelock was needed in the kernel to keep the system from going to sleep at the same time that a new event was happening that would cause the system to wake up (to prevent a race condition), and a mechanism to do this was added to the kernel a year or so ago (but is not yet used by Android)
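One general way to close a race of this kind is a check-and-commit on an event counter: snapshot the number of wakeup events, hand the snapshot back when asking to suspend, and abort if anything arrived in between. The sketch below is a hypothetical user-space model of that idea, loosely following the protocol of the mainline `/sys/power/wakeup_count` file. Whether that is the mechanism this comment refers to is an assumption, and all function names here are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned long wakeup_events;   /* total wakeup events seen so far */
static unsigned long saved_count;     /* snapshot accepted by arm_suspend() */
static bool count_armed;              /* true while a snapshot is pending */

/* Called by a "driver" whenever something that should wake the system occurs. */
void report_wakeup_event(void)
{
    wakeup_events++;
}

/* Step 1: the power manager snapshots the current event count. */
unsigned long read_wakeup_count(void)
{
    return wakeup_events;
}

/* Step 2: hand the snapshot back. Fails if events arrived after it was
 * taken, in which case the caller should handle them and start over. */
bool arm_suspend(unsigned long snapshot)
{
    if (snapshot != wakeup_events) {
        count_armed = false;
        return false;
    }
    saved_count = snapshot;
    count_armed = true;
    return true;
}

/* Step 3: suspend only if no event has arrived since the snapshot was
 * accepted. An armed snapshot is good for a single attempt. */
bool try_suspend(void)
{
    bool ok = count_armed && saved_count == wakeup_events;
    count_armed = false;
    return ok;
}
```

The point of the two-step handshake is that an event arriving at any moment between the snapshot and the suspend attempt causes one of the two checks to fail, so the event is never silently slept through.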
Posted Mar 26, 2012 17:07 UTC (Mon)
by bgat (guest, #20709)
The nice thing about runtime-pm is that it is fully aware of the kernel's Device Model, and can therefore make better decisions about system state than wakelocks can. The downside is that it looks nothing like existing wakelocks, so it requires movement from Android.
This does not count for all embedded developers
You are right, I should not have used quite such a broad brush. There are quite a few embedded developers who make a point of working with the upstream kernel, and the number seems to be growing. My apologies.
customers.'
hence proven useful) by someone.
I can remember various LWN articles about some proposed feature where kernel developers argued that it needed to be used out of tree and shipped with distributions for a while to prove its worthiness before joining the kernel.org major league. I believe these were major functions, though.
> relevant kernel developers on private mailing lists
Wakelocks and the embedded PM