User: Password:
Subscribe / Log in / New account

A new approach to opportunistic suspend

A new approach to opportunistic suspend

Posted Sep 29, 2011 2:49 UTC (Thu) by neilbrown (subscriber, #359)
In reply to: A new approach to opportunistic suspend by jstultz
Parent article: A new approach to opportunistic suspend

I think the wakeup_count model would require the pm-daemon, between reading wakeup_count and writing back to it, to say to the alarm timer "you aren't doing anything right now, are you" and for the alarm timer to check and say "no, I'm right - nothing happening here". Only then does pm-daemon go ahead and suspend. In your case the alarm timer would say "Oh, I just got a wake-up, better not suspend just yet".

The only place I can see room for a race is inside the kernel as the event is handed from one subsystem to another. If e.g. an interrupt handler registered a wakeup event, then queued it for a kernel thread to handle and make available to user-space. I cannot immediately see how we are guaranteed not to sleep before the event becomes visible to user-space. However I suspect a very short usleep in pm-daemon would cover the race, and there may be some rules for writing wakeup-aware drivers that I don't know about.

(Log in to post comments)

A new approach to opportunistic suspend

Posted Sep 29, 2011 7:01 UTC (Thu) by jstultz (subscriber, #212) [Link]

I'm not sure I follow the pm-daemon talking with the alarmtimer. But I'd be interested in hearing more.

The trouble is that you have to keep suspend from happening from the wakeup irq to the point that userland runs and consumes the event.

You're right about protecting the path from the wakeup irq to the task wakeup, and that it might go through numerous kernel threads (threaded irqs, softirqs, work queues, or even timers). However, that can be protected using pm_stay_awake/pm_relax chaining, as I did for the RTC/alarmtimer paths.

But we also have to wait for the woken task to run and do whatever communication with the pm-daemon to stop it from suspending. I don't really see how that can be solved without some kernel assistance.

With wakelocks, there is the select/userland-wakelock/read method for overlapping the critical sections. With my proposal, the kernel deboosts before calling schedule() and reboosts the task on wakeup.

Trying to get userland to do something like registering interest with the pm-daemon, and then having to ack the pm-daemon the event was consumed might be plausable, but with only a global wakeup_count I'm not sure how we'd associate which wakeup events we're waiting for an ack on and which ones to ignore (much less the general ugliness of having to map userland's sense of wakeup sources to the actual kernel devices causing them).

I'm very interested in proposals on how to properly solve this on the userland side. However, so far folks seem to wave at it, and no one has shown a working proof of concept.

A new approach to opportunistic suspend

Posted Sep 29, 2011 7:58 UTC (Thu) by neilbrown (subscriber, #359) [Link]

I have to confess that I have only half-implemented this. The kernel running on my phone predates wakeup_count (2.6.29!) but when I get a GTA04....

Anyway this is how it would work - with lots of implementation details that are to some extent irrelevant, but make it more concrete.

There is a file /var/lock/suspend which is empty. Any process which cares about suspend takes a shared lock on this file and arranges to be notified when it changes (dnotify or inotify - choose your poison).
There is another file: /var/run/suspend_disabled. Any process that doesn't want a suspend to happen takes a shared lock on this file.

Some process that is responsible for initiating suspend either periodically tries to take a non-blocking exclusive lock on /var/run/suspend_disabled, or just sits waiting on a blocking exclusive lock.

When it gets a lock, it then drops it and starts the suspend cycle handshake. This involves:
1 - read /sys/power/wakeup_count
2 - create a new file /var/lock/suspend-next
3 - write anything to /var/lock/suspend
4 - get a blocking exclusive lock on /var/lock/suspend

Any process that is interested in suspend will now get notified that a suspend in imminent ([di]notify tells it, and it can check the size of the suspend file to confirm). It does any preparation that might be needed for suspend (tell the gsm chip to stop reporting cell changes, record the time of the next cron job in the RTC alarm), takes a shared lock on /var/lock/suspend-next, asks for change notification on that file, and closes /var/lock/suspend (thus dropping the lock).
If the preparation determined that something actually needed to be done *now* so a suspend should not happen it should take a shared lock on
/var/run/suspend_disabled before it closes /var/lock/suspend.

When all interested parties have actioned their part of the handshake the suspending thread will get the exclusive lock it wanted and:

5 - get a non-blocking exclusive lock on /var/run/suspend_disabled.
6 - if that succeeded, write the wakeup_count back to /sys/power/wakeup_count
7 - if that succeeded, write sleep to /sys/power/state
8 - unlock /var/run/suspend_disabled if it was locked
9 - rename /var/lock/suspend-next to /var/lock/suspend

This completes the cycle. Any client which asked for notification on /var/lock/suspend_disabled will find out that it has been renamed (so it now has a shared lock on /var/lock/suspend) and can do any wake-from-suspend processing such a telling the GSM chip it is allowed to be noisy again.

This way any process that cares can get race-free notification of suspend and resume so it can do any necessary processing, and it has the opportunity to abort the suspend before it commits. If any wakeup event happens after a thread does a check, that event will prevent the system from actually suspending, and the relevant thread will get another notification and another chance to consume the event before there is any chance of another suspend.

Your alarmtimer would need to hook into this somehow. Presumably there would be a cron-like task which gets notification of suspends, arranges an RTC wakeup before allowing suspend to complete, and holds a suspend_disabled lock from a couple of seconds before something needs to run (depending on how long a typical suspend/resume cycle is) until the task has completed.

So the "wait for woken task to run and ... stop it from suspending" is achieved by holding a lock which the pm-daemon must wait for before it is allowed to complete a suspend.

Does that make it clear and concrete enough? Is it convincing?

Current best guess at Android suspend-blocker requirements

Posted Oct 1, 2011 17:41 UTC (Sat) by PaulMcKenney (subscriber, #9624) [Link]

It is not clear to me how your approach handles wakeup events (for example, keyboard/button input). Your thought is to have the kernel drivers acquire locks on the files, or did you have something else in mind?

Also, are you assuming FIFO processing of file lock requests? In the absence of FIFO processing, I don't see what causes all outstanding requests to be processed between your step 4 and 5. (I don't know of any formal standard requiring that file locking process requests in FIFO order, but I might well be missing some fine print somewhere.)

Current best guess at Android suspend-blocker requirements

Posted Oct 1, 2011 21:46 UTC (Sat) by neilbrown (subscriber, #359) [Link]

Hi Paul,

I trust the kernel to handle wakeup events properly and to get them to a point where user-space can see them before allowing the wakeup event to be considered complete. Up thread John says "that can be protected using pm_stay_awake/pm_relax chaining, as I did for the RTC/alarmtimer paths" and I believe that he is talking about exactly that issue and believes it can be addressed - and I trust him in this.... and I just had a look at his patches and the relevant code and find my trust in John was not misplaced (though my trust in the kernel might be ... the kernel currently doesn't do the right thing with RTC wakeups ... but that is a bug that John is fixing, it is not a problem with the design).

The locking protocol relies on the difference between shared and exclusive locks. The clients (who want suspend notification) take shared locks. The service (which will initiate suspend) takes an exclusive lock which will wait for all shared locks to release. This provides the required ordering.

New client locks are directed to a different file: suspend-next (if 'suspend' is not empty, clients must not try to lock it, but should lock suspend-next instead).

I believe this foundation can be used to address all the concerns in your fine document. I'm not completely certain on "reasonable statistics in order to enable debugging and to enable reporting to endusers" as I don't know what the "reasonable statistics are". Presumably that is just statistics about who locks "suspend_disabled" and for how long. That isn't available directly in my scheme, but the locking of suspend_disabled is a trivial part of the API - quite separate from the race-free-suspend-notification - and could easily be replaced by something that carries more detailed information.

(Of course the whole API could be replaced with something using "binder", or sockets, or SUN-RPC or maybe even smoke signals. I just like filesystem APIs so that is what I tend to use. It is the sequencing and interactions that are important).

The interactions are:
- service multicasts "I'm going to suspend now - prepare yourselves"
- clients perform any preparation, which may include saying "please don't suspend now" on a separate channel.
- clients confirm they are prepared
- services waits for all confirmations. Then checks again if anyone is blocking suspend (via wakeup_count or checking user-space requests).
- If no-one is blocking suspend - enter suspend, and resume
- service tells all client "Ok, we are awake again - go about your business"
- clients act accordingly

Current best guess at Android suspend-blocker requirements

Posted Oct 2, 2011 8:09 UTC (Sun) by neilbrown (subscriber, #359) [Link]

It occurred to me that I didn't really answer your first question properly.

The auto-suspend daemon performs 2 checks to ensure that it is OK to suspend.
1/ The file locking handshake that I described in detail - this makes sure that user-space processes are happy with the suspend.
2/ the wakeup_count handshake. This ensures that the kernel is happy with the suspend.

The kernel has "wakeup_sources" that are similar to suspend blockers. When a wakeup_source is active, it means that suspend should not succeed.

When a wakeup_source is activated (normally in an interrupt handler), two global counters are incremented, and when it is deactivated (which should happen once the event is visible to user-space) one of them is decremented.

When the auto-suspend daemon reads from /sys/power/wakeup_count, the read blocks until the decrementing count is zero - so there are no active wakeup_sources. The other count is returned.
Subsequently - after checking with all of userspace - the daemon writes the number back to wakeup_count. If the corresponding counter has changed, the write fails. This means there has been another event and the daemon needs to check with user-space again.
If it succeeds then the kernel stores this number and the next attempt to suspend (echo mem > /sys/power/state) will only complete if there are no more wakeup events. i.e. if, right up to the moment of suspend, the event counter has the same value as the stored number that was written to /sys/power/state.

So the file-locking is only for user-space, and the wakeup event counting completely handles the kernel-event side.

Current best guess at Android suspend-blocker requirements

Posted Oct 2, 2011 22:24 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]

Ah, understood. For whatever reason, I thought you were trying to re-implement John's patches as opposed to building a user-space API for them. Agreed, it would be quite hard to properly handle wakeup events in user space, given that the kernel needs to be involved in the wakeup/resume process!

Current best guess at Android suspend-blocker requirements

Posted Oct 2, 2011 23:43 UTC (Sun) by neilbrown (subscriber, #359) [Link]

Yes and no.

Some of John's patches were just bug-fixes. The wakeup event from the CMOS timer wasn't being counted as a wakeup event properly. Those patches I wasn't trying to re-implement - they are good.

The other patches from John did things to the scheduler so that distinguished processes could prevent suspend just by being busy. I was trying to show that these patches were not necessary because a completely race-free protocol is available in userspace.

(and finally there was a patch which tried to make /sys/power/state resistant to processes that didn't follow the wakeup_count protocol. That issue is best addressed by fixing the processes, or bind-mounting a file over /sys/power/state).

Current best guess at Android suspend-blocker requirements

Posted Oct 3, 2011 16:18 UTC (Mon) by PaulMcKenney (subscriber, #9624) [Link]

Cool! But I must defer to John and the Android guys at this point, as they would have a better idea of what works and does not in the Android environment.

My guess is that if they needed something lighter weight, this same general approach could be implemented within a device driver, though an implementation that just used existing user-space interfaces would be preferable, all else being equal. (Which it never is, but that is life!)

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds