
Timer slack for slacker developers

By Jonathan Corbet
October 17, 2011
The "timer slack controller" is a proposed mechanism that would allow a session management program to adjust the timer tolerances of a group of processes with a single knob. It seems like a relatively obscure and harmless feature, but it has been the focus of an intense debate on the kernel mailing lists. The core question has been seen before: what measures should the kernel take, if any, to keep poorly-written applications from hurting performance?

Timers allow a process to request a wakeup at some future time; timer slack gives the kernel some leeway in its implementation of those timers. If the kernel can delay specific timers by a bounded amount, it can often expire multiple timers at once, minimizing the number of wakeups and, thus, reducing the system's power consumption. Some processes need more precise timing than others; for this reason, the kernel allows a process to specify its maximum timer slack with the prctl() system call. There is, currently, no mechanism to allow one process to adjust another process's timer slack value; it is generally assumed that any given process knows best when it comes to its own timing requirements.
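For reference, the knob in question is the PR_SET_TIMERSLACK option to prctl(). What follows is a minimal sketch of calling it from Python through ctypes on Linux, not a definitive implementation. The helper names are invented for this illustration, the option numbers are those found in <linux/prctl.h>, and the value is given in nanoseconds. The setting applies to the calling thread.

```python
import ctypes
import os

# Option numbers from <linux/prctl.h>.
PR_SET_TIMERSLACK = 29
PR_GET_TIMERSLACK = 30

# CDLL(None) exposes the symbols already loaded into this process, libc included.
_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option, arg2=0):
    """Call prctl(2) and raise OSError on failure (hypothetical helper)."""
    # prctl() is variadic, so every argument is passed as an explicit unsigned long.
    zero = ctypes.c_ulong(0)
    result = _libc.prctl(ctypes.c_int(option), ctypes.c_ulong(arg2), zero, zero, zero)
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def get_timer_slack_ns():
    """Return the calling thread's current timer slack in nanoseconds."""
    return _prctl(PR_GET_TIMERSLACK)


def set_timer_slack_ns(slack_ns):
    """Set the calling thread's current timer slack in nanoseconds."""
    _prctl(PR_SET_TIMERSLACK, slack_ns)


if __name__ == "__main__":
    print("slack before:", get_timer_slack_ns(), "ns")
    set_timer_slack_ns(1_000_000)  # tolerate up to 1ms of delay
    print("slack after: ", get_timer_slack_ns(), "ns")
```

Nothing here lets one process change another's slack, which is the gap the proposed controller is meant to fill.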

The timer slack controller allows a suitably privileged process to set the timer slack value for every process contained within a control group. The patch has been circulating for some time without generating a great deal of interest; it recently resurfaced in response to the "plumber's wish list for Linux" which requested such a feature. The reasoning behind the request was explained by Lennart Poettering:

Consider you have one or more desktop user sessions logged in, each one in a timer slack cgroup. Now, userspace already tracks when sessions become idle (i.e. currently desktop userspace then starts a screensaver, or turns off the screen, or similar), and we'd like to increase the timer slack for the session cgroups individually as the individual session becomes idle, and decrease it again if the session stops being idle.

It is, in other words, a power-saving mechanism. When the session manager determines that nothing special is going on, it can massively increase the slack on any timers operated by desktop applications, effectively decreasing the number of wakeups. Applications need not be aware of whether the user is currently at the keyboard or not; they will simply slow down during the boring times.
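The effect of slack on the wakeup count can be sketched with a toy model. This is deliberately simplified and is not the kernel's actual timer code: it assumes every timer has the same slack, that a timer may fire anywhere between its deadline and its deadline plus the slack, and that the system waits as long as the earliest pending timer allows. The function name and the millisecond units are choices made for this illustration.

```python
def coalesced_wakeups(deadlines_ms, slack_ms):
    """Return the wakeup times needed if every timer may fire up to slack_ms late."""
    wakeups = []
    for deadline in sorted(deadlines_ms):
        # A timer whose deadline has passed by the latest wakeup fires with it.
        if wakeups and deadline <= wakeups[-1]:
            continue
        # Otherwise wait as long as this timer allows, so later timers can join it.
        wakeups.append(deadline + slack_ms)
    return wakeups


if __name__ == "__main__":
    deadlines = [0, 300, 600, 1000, 1200]
    for slack in (0, 500):
        print(slack, "ms slack ->", coalesced_wakeups(deadlines, slack))
```

In this model, five timers with no slack cost five wakeups, while 500ms of slack lets the same timers share three.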

There is some stiff opposition to merging this controller. Naturally, the fact that the timer slack controller uses control groups is part of the problem; some kernel developers have still not made their peace with control groups. Until that situation resolves itself - if it ever does - features based on control groups are going to have a bumpy ride on their way into the mainline.

Beyond the general control group issue, though, two complaints have been heard about this approach to power management. One is that applications running on the desktop may have timing requirements that are not dependent on whether the user is actually there or not. One could imagine a data acquisition application that does not have stringent response requirements, but which will still lose data if its timers suddenly gain multiple seconds of slack. Lennart's response is that such applications should be using the realtime scheduler classes, but that answer is unlikely to please anybody. There is likely to be no shortage of applications that have never needed to bother with realtime scheduling but which still will not work well with arbitrary delays. Imposing such delays could lead to any number of strange bugs.

The big complaint, though, as expressed by Peter Zijlstra and others, is that this feature makes it easier for developers to get away with writing low-quality applications. If the pressure to remove badly-written code is removed, it is said, that code will never get fixed. Peter suggests that, rather than papering over poor behavior in the kernel, it would be better to simply kill applications that waste power. He was especially strident about applications that continue to draw when their windows are not visible; such problems should be fixed, he said, before adding workarounds to the kernel.

The massive improvements in power behavior that resulted from the release and use of PowerTop are often pointed to as an example of how things should be done. This situation is a little different, though. The wakeup reductions inspired by PowerTop were low-hanging fruit - processes waking up multiple times per second for no useful purpose. The timer slack controller is aimed at a different problem: wakeups which are useful when somebody is paying attention, but which are not useful otherwise. That is a trickier problem.

Determining when the user is paying attention is not always straightforward, though there are some obvious signs. If the screen has been turned off because the input devices are idle, the user probably does not care. Other cases - non-visible tabs in web browsers, for example - have been cited as well, but the situation is not so obvious there. As Matthew Garrett put it: buried tabs still need timer events "because people expect gmail to provide them with status updates even if it's not the foreground tab." Fixing the problem in applications would require figuring out when nothing is going on, finding a way to communicate it to applications, then fixing large numbers of them (some of which are proprietary) to respond to those events.

It is not surprising that developers facing that kind of challenge might choose to improve the situation with a simple kernel patch instead. It is, certainly, a relatively easy path toward better battery life. But the patch does raise a fundamental policy question that has never been answered in any definitive way. Does mitigating the effects of (what is seen as) application developer sloppiness encourage the distribution of low-quality code and worsen the system in the long run? Or, instead, does the "tough love" approach deter developers and impoverish our application environment without actually fixing the underlying problems?

An answer to that question is unlikely to come in the near future. What that probably means is that the current fuss will be enough to keep the timer slack controller from getting in through the 3.2 merge window. It also seems unlikely to go away, though; we are likely to see this topic return to the mailing lists in the future.

Index entries for this article
Kernel: Control groups
Kernel: Timers



Timer slack for slacker developers

Posted Oct 17, 2011 22:20 UTC (Mon) by josh (subscriber, #17465) [Link] (3 responses)

I agree entirely with the point of view that this supports applications which abuse timers to poll for events rather than blocking until those events occur. We should fix those applications.

Why should the gmail client-side code require a timer event? It should simply receive notifications from the gmail server when mail arrives, via one of the many mechanisms browsers now offer for server push.

Timer slack for slacker developers

Posted Oct 17, 2011 22:38 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (2 responses)

gmail doesn't - what I was replying to there was a query about why javascript is run at all in background tabs.

Timer slack for slacker developers

Posted Oct 17, 2011 22:54 UTC (Mon) by brother_rat (subscriber, #1895) [Link] (1 responses)

The N900 browser suspends Javascript if you switch away from the browser view (there's an option for how quickly to do it I think). It does seem to break a not insignificant number of websites, such as 'live updating' news/sport stories on the BBC.

Timer slack for slacker developers

Posted Oct 18, 2011 9:38 UTC (Tue) by ab (subscriber, #788) [Link]

N900 (and N9, N950 as well) do have a user-space framework to coordinate wakeups from applications themselves. This is used by almost all networking applications to coordinate their wakeups. https://meego.gitorious.org/meego-middleware/libiphb/blob... is client side API, while https://meego.gitorious.org/meego-middleware/dsme/trees/m... is server-side part of overall device state management.

Of course, it requires changes from the application side and also a common coordination mechanism. Adding such changes to applications isn't that hard, the API is rather simple. What is more important is that apps will anyway need changes to determine their own periods of inactivity.

Perhaps it will be helpful to have a kernel-side mechanism to get overall configuration easier but the situation as described in the article wakes memories of Android kernel vs upstream kernel debates for the similar use on the kernel side.

Timer slack for slacker developers

Posted Oct 17, 2011 23:14 UTC (Mon) by idupree (guest, #71169) [Link] (5 responses)

"There is likely to be no shortage of applications that have never needed to bother with realtime scheduling but which still will not work well with arbitrary delays. Imposing such delays could lead to any number of strange bugs."

I can attest to something similar. When my system pauses multiple seconds due to heavy swapping pressure, now and then something weird happens (like Amarok forgetting my metadata or crashing or something). Though recently, my system's been pretty stable in that regard.

Is there a testing tool, akin to Valgrind, that can insert random multi-second delays in applications in order to help test/debug them under such conditions?

Timer slack for slacker developers

Posted Oct 17, 2011 23:29 UTC (Mon) by yoe (guest, #25743) [Link] (2 responses)

trace it in a debugger, and stop/restart it at random?

note that gdb is scriptable, if I'm not mistaken...

Timer slack for slacker developers

Posted Oct 18, 2011 0:21 UTC (Tue) by idupree (guest, #71169) [Link]

(Not that I would have much idea how to debug timing issues, possibly race conditions, in a program I haven't already worked with the code of! But I really should look into this for my own programs sometime... provided I can concoct some useful tests within an hour or two, which may or may not work out.)

Timer slack for slacker developers

Posted Oct 18, 2011 15:58 UTC (Tue) by nix (subscriber, #2304) [Link]

You don't need a debugger for that: just hit it with SIGSTOP and SIGCONT at random.

(You'd probably need to hit whole process groups at once -- using negative PIDs -- to attempt to simulate the way the kernel fires slacked timers in groups.)
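A rough sketch of that idea, assuming a Unix system and a target that runs in its own process group: the function below stops and resumes the whole group at random intervals. The function name, parameters, and default durations are invented for this illustration, and random stalls only approximate what slacked timers would do.

```python
import os
import random
import signal
import time


def stall_process_group(pgid, rounds=5, max_run=2.0, max_pause=3.0, rng=None):
    """Repeatedly freeze and thaw every process in the group `pgid`.

    Returns the list of pause durations (in seconds) that were applied.
    """
    rng = rng or random.Random()
    pauses = []
    for _ in range(rounds):
        # Let the group run for a random while before the next stall.
        time.sleep(rng.uniform(0, max_run))
        pause = rng.uniform(0, max_pause)
        # killpg() signals the whole group, like kill() with a negative PID.
        os.killpg(pgid, signal.SIGSTOP)
        try:
            time.sleep(pause)
        finally:
            # Always resume the group, even if this script is interrupted.
            os.killpg(pgid, signal.SIGCONT)
        pauses.append(pause)
    return pauses
```

One way to try it is to start the program under test with subprocess.Popen(..., start_new_session=True), which gives the child its own process group with an ID equal to its PID, and pass that PID in as `pgid`.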

Timer slack for slacker developers

Posted Nov 20, 2011 19:56 UTC (Sun) by oak (guest, #2786) [Link]

Valgrind can at least slow the apps a lot, so it can be used for flushing out & detecting some timing related race conditions. Additionally you could add some dummy higher priority program to delay tested program's wakeups.

Most of the time it's some races...

Posted Nov 20, 2011 20:07 UTC (Sun) by khim (subscriber, #9252) [Link]

In most cases it's better to run program in question under TSAN and fix data races found.

Of course if people are using sleep(3) as a synchronization primitive it'll not help, but then little can be done with such programs anyway.

Timer slack for slacker developers

Posted Oct 17, 2011 23:16 UTC (Mon) by dlang (guest, #313) [Link] (1 responses)

I think that this would be a very useful thing to have available for mobile devices.

there are a lot of things that you really do want to have the system do when it's otherwise asleep

check for alarms

check for new mail

check for ...

currently all of the apps that do these things schedule their own wakeup, and so if you have several apps that do these sorts of things, the system is waking up quite a few times.

If it was possible to set the timer slack for a group of apps to 60 seconds, then all the apps that would wake up in that minute could be processed in one wakeup call

even the things that want to happen far more frequently could probably be forced to live with a 1 second slack (one wakeup per second)

yes, this can be misused, but most features can, that shouldn't eliminate the cases where it can be used properly.

the counter position of "force every programmer to implement linux-specific calls to properly take advantage of possible power savings on every different type of device out there, for every user's priorities" is just not realistic.

first off, different devices are going to have different sensitivities to wakeups, so what is a significant power saving on one device will have little effect other than making things less responsive on another device.

Secondly, the amount of slack that is appropriate for the app will vary on the running conditions. Is the device plugged into mains power and connected to a external screen and keyboard (aka desktop replacement), or is it on battery power, and is that battery power running low so the user is willing to accept being less up-to-date in exchange for the battery lasting longer? It would be horribly inefficient for each application to be trying to run these calculations (not to mention the horrible problem of trying to define the policy for every application)

this is completely ignoring the cynical statement that many of these programmers are not that good and are not going to 'fix' their applications.

Timer slack for slacker developers

Posted Oct 18, 2011 14:48 UTC (Tue) by mjthayer (guest, #39183) [Link]

> the counter position of "force every programmer to implement linux-specific calls to properly take advantage of possible power savings on every different type of device out there, for every user's priorities" is just not realistic.

As Lennart suggested, one could go the other way and expect applications which need realtime performance to use the (standard as far as I know) realtime APIs. This would be the finger-pointing approach to getting people to fix their applications, but directed at those that can't take the slack rather than those that can but don't.

Educating users about which applications do things right and which do not always did wonders for Apple and so far the experience has been that it can work for the Linux/FLOSS world as well.

Timer slack for slacker developers

Posted Oct 17, 2011 23:40 UTC (Mon) by yoe (guest, #25743) [Link] (17 responses)

I think the proper way to implement this is not to force a particular timing upon applications, but to have applications register to use a "slacky" timing when available.

For instance, in response to issues that were found with powertop, an API call was added to glib that would align timeouts on one-second intervals, thereby grouping them together so the processor would only wake up once per second, in the worst case.

Analogously, glib (or similar libraries for other environments) could provide a way for an application to say "I need to check a condition once in a while. It's probably a good idea to do this fairly often if there's an active user using the system, but it's perfectly fine to slow down a bit if not". The library could then, say, vary the time to a point between two extremes given by the app's programmer, based on the amount of time it's been since the user last did "something". Perhaps the shortest time could even be chosen only if the application in question is actually active.

This way, mail apps can provide mail "instantly" if the user is actively reading mail, but slow down a bit if not. Or so.
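A sketch of what such a library helper might compute. No such glib call is being described here: the function name, the linear ramp, and the `background_floor` parameter are all assumptions made for illustration.

```python
def poll_interval(idle_seconds, fastest, slowest, ramp_seconds=600.0,
                  app_is_active=True, background_floor=None):
    """Pick a polling interval between `fastest` and `slowest` (in seconds).

    The interval grows linearly with user idle time and reaches `slowest`
    once the user has been idle for `ramp_seconds`.
    """
    if fastest > slowest:
        raise ValueError("fastest must not exceed slowest")
    # Fraction of the ramp covered so far, clamped to the range [0, 1].
    fraction = min(max(idle_seconds / ramp_seconds, 0.0), 1.0)
    interval = fastest + fraction * (slowest - fastest)
    # Assumed policy: an app that is not in the foreground never polls
    # faster than `background_floor`, however recently the user was active.
    if not app_is_active and background_floor is not None:
        interval = max(interval, background_floor)
    return interval


if __name__ == "__main__":
    for idle in (0, 150, 300, 600, 3600):
        print(idle, "s idle ->", poll_interval(idle, fastest=5, slowest=300), "s")
```

The application supplies only the two extremes; where the idle time comes from is left to the library or the session manager.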

Oh hell, just thinking out loud here.

Timer slack for slacker developers

Posted Oct 17, 2011 23:44 UTC (Mon) by dlang (guest, #313) [Link] (16 responses)

the question is how much of this policy belongs in each application?

it would be good to let the application do things to indicate what it needs (the once a second wakeup is a perfect example), but trying to encode all the policy into each application seems like a configuration disaster waiting to happen.

we don't try to have each application configure screen dimming, sleep, etc. Instead we have one power management application on the system that looks at what the system is doing, including what the applications request and hint that they want, but that then makes the decisions on what to do.

This sort of thing seems like it fits in very nicely with the other things the power management app is doing.

Timer slack for slacker developers

Posted Oct 18, 2011 7:00 UTC (Tue) by cmccabe (guest, #60281) [Link] (15 responses)

I'm afraid I'm going to have to disagree. This kind of policy belongs in the applications-- they all have different requirements.

For example, a movie player should keep the system awake for as long as it is playing the movie. Anything less will just result in a lot of frustrated users. This problem is more than theoretical-- I have had this problem under Linux before where the screen has been blanked after a few "idle" minutes playing a movie. Thankfully my current build of mplayer now seems to be able to tell Xorg not to blank the screen, but "clever" solutions like this could reintroduce the same kinds of problems.

Android got this right. An application that needs to keep the phone from going to sleep can take a wakelock. There is zero chance that the phone will, for example, go to sleep in the middle of a phone call because a "clever" power manager thought that not many buttons had been pushed recently.

There are some people who argue that applications should not have the ability to influence power management because application developers cannot be trusted. But this is the worst kind of tribal mentality ("kernel devs good, user space devs bad"). The best thing that we could do as kernel and systems software developers is not to hide functionality from the upper layers, but to make it clear what exactly is going on. We need more tools like powertop that can point a finger at bad code and get it fixed.

Timer slack for slacker developers

Posted Oct 18, 2011 7:05 UTC (Tue) by dlang (guest, #313) [Link] (13 responses)

note that we are talking about a user space power manager here, so this isn't a kernel vs user-space debate

it's perfectly fine for a movie player to tell the power management daemon that it doesn't want the system to go to sleep, but the power management daemon may still decide to save power by shutting down the other 7 cores of an 8-core machine (or slow the clock down, but not so much that the app maxes out the remaining processing time)

Android 'gets it right' only for the simple case of a single core system. for multiple cores the information of an app claiming "I don't want to sleep now" isn't enough

Timer slack for slacker developers

Posted Oct 18, 2011 7:57 UTC (Tue) by cmccabe (guest, #60281) [Link] (12 responses)

> it's perfectly fine for a movie player to tell the power management
> daemon that it doesn't want the system to go to sleep, but the
> power management daemon may still decide to save power by shutting
> down the other 7 cores of an 8-core machine (or slow the clock down,
> but not so much that the app maxes out the remaining processing time)

Well, if you're using an Intel system, the unused cores will be clocked slower by the firmware running on them. cpufreq also has a role, and a bigger role on ARM. I don't think userspace daemons come into play at all here.

Your talk about "slowing the clock down" definitely seems like something that would screw up my movie playback. Even ignoring that issue, what if your externally imposed policy slows down other daemons that the movie player needs to function efficiently? A lot of applications use D-BUS these days. It's pretty clear that messing with timings on the pulseaudio daemon will probably cause audio glitches and dropouts in my movie playing, even though you oh so kindly allowed the movie player to continue running. Or if the movie player is embedded in Firefox, and you decide to mess with that process, there could also be problems.

If application developers want to opt in to a lazy clock policy, that is fine. We should set up a system call that allows them to do that, and start making use of that. But it shouldn't be forced on developers who don't want it.

Incidentally, kernel developers get really mad when other folks pull the same kind of "for your own good" BS on them-- for example, when hard drives "optimize" by not actually flushing the writes out to disk when they're told to do so.

At the end of the day, if you take away the ability of stupid people to do stupid things, you also take away the ability of smart people to do smart things. We have enough application developers out there that we can afford to install the programs written by the smart ones and ignore or fix the rest.

> Android 'gets it right' only for the simple case of a single core
> system. for multiple cores the information of an app claiming "I
> don't want to sleep now" isn't enough

Wakelocks and cpufreq are two distinct systems with different roles.

In general, Android's approach is to allow the application to use as much CPU as it wants, but to have good monitoring tools that allow users to spot and de-install CPU hogs. However, you need certain capabilities to do things like take a wakelock.

Timer slack for slacker developers

Posted Oct 18, 2011 8:18 UTC (Tue) by dlang (guest, #313) [Link] (11 responses)

and as a result of the possible misuse of this tool you (and others) appear utterly opposed to giving me, the administrator of a machine, any ability to override what the application programmers choose (or don't choose) to do.

for every scenario that someone paints where this could be useful, there is going to be another scenario where it could be misused.

but the same thing can apply to 'renice' or 'ionice' as well, why aren't you also opposed to the ability for the evil (or clueless) system administrator to mess with the application by changing its priority?

or for that matter ulimit can cause programs to fail, it should be ripped out of the system, instead every application should be audited to make sure it only allocates the resources that it really needs, and the applications should then be changed to cooperatively give up resources if something else needs it.

for that matter, what about preemptive time slicing, the system is more efficient if the applications cooperate instead, so we should just change every application to do cooperative time slicing, including setting the priority between applications (after all, doesn't the application writer know what's best for the applications)

why are all of these ways for an administrator to control what's happening with their machine acceptable, but the idea that the administrator could override the application's decision (or lack thereof) on timer slack be so horrible?

Timer slack for slacker developers

Posted Oct 18, 2011 9:28 UTC (Tue) by dgm (subscriber, #49227) [Link] (10 responses)

You're right. The fact that you _can_ do something doesn't mean you should, or will. But you may. And that means application writers and users _will_ pay a price for your flexibility: users complaining about broken applications because someone thought all applications should or should not do something.

Thus, I would say: if this gets finally in, advise to distributions and admins to NOT use it, unless you understand in gory detail ALL the implications. With great power... yada, yada, yada.

Frankly, I would add that using cgroups with this is calling for abuse. Make it a per process option and it will become much less "dangerous".

Timer slack for slacker developers

Posted Oct 18, 2011 14:18 UTC (Tue) by bronson (subscriber, #4806) [Link] (2 responses)

Why is it much less dangerous? What's the difference between the master app changing a cgroup and the master app looping through a list of children and changing them one by one?

Timer slack for slacker developers

Posted Oct 18, 2011 16:35 UTC (Tue) by dgm (subscriber, #49227) [Link]

The difference is that group membership is transitive. The children processes of your children (or of any other process added to the group) do belong to it by default. Knowing that all those processes will behave correctly is substantially more difficult than doing direct manipulation of a list of known processes.

Of course, if the group hierarchy is standardized, processes known to behave badly can be moved out of the group, but it would mean modification of the process, thus (at least partially) negating the benefits of being able to do it without changing code.

Timer slack for slacker developers

Posted Oct 18, 2011 16:56 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Race conditions. You need to be very sure that you haven't missed a child that has been created just after you've started looping.

cgroups allow to do it atomically.

Timer slack for slacker developers

Posted Oct 18, 2011 17:49 UTC (Tue) by dlang (guest, #313) [Link] (6 responses)

why is this option for cgroups any more dangerous than cpu, memory, or disk I/O throttling?

in all cases you can cause an application being limited to behave in ways different from what the application programmer expected.

Timer slack for slacker developers

Posted Oct 18, 2011 21:40 UTC (Tue) by cmccabe (guest, #60281) [Link] (5 responses)

Operating systems certainly need to have configuration knobs. nice, ionice, ulimit, and friends definitely fall into this category. But the more configuration knobs you have, the more complex the system gets to administer, to understand, and to program for. Complexity tends to breed bugs, frustrated users, and miscommunication between different subsystems. That is why adding a new knob just to do something that you could have done with the old knobs is something that we should resist.

nice and ulimit are also things specified by POSIX and implemented by many other operating systems. Like Linus said, Linux-specific interfaces tend not to get very much use, even when they're much better than the standard interfaces they're replacing. In this case, you're talking about adding a platform specific interface that is not better than what it's replacing, just different. It's also an interface that application developers have no easy way to opt-out of. IMHO, not a win at all.

Timer slack for slacker developers

Posted Oct 18, 2011 22:23 UTC (Tue) by dgm (subscriber, #49227) [Link] (2 responses)

Not only that. This particular knob is for making timers misbehave in order to save power. An application can be expected to work with less CPU or IO, but faulting timers?

Timer slack for slacker developers

Posted Oct 18, 2011 22:37 UTC (Tue) by dlang (guest, #313) [Link] (1 responses)

remember that timers are not precise. they will delay you until _at_least_ the timer wakeup, they may delay you further.

normally this further amount is relatively short, but if the system is under high load it could be a significant amount of time (if the system is swapping badly, it could be seconds after the timer is scheduled to fire before the application executes)

this is just changing things so that some other suitably privileged task in the system can increase the maximum lag that the application sees.

it's not something that can't happen today.
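The "at least" behaviour is easy to observe from userspace. A small sketch follows; the function name is made up, and the numbers it reports depend on the machine, its load, and the thread's timer slack.

```python
import time


def measure_overshoot(requested_s, samples=5):
    """Sleep `requested_s` seconds several times and return how late each wakeup was."""
    overshoots = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(requested_s)
        elapsed = time.monotonic() - start
        # A positive value means the wakeup came later than requested.
        overshoots.append(elapsed - requested_s)
    return overshoots


if __name__ == "__main__":
    for late in measure_overshoot(0.01):
        print(f"woke {late * 1e6:.0f} microseconds late")
```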

Timer slack for slacker developers

Posted Oct 19, 2011 13:39 UTC (Wed) by dgm (subscriber, #49227) [Link]

You're completely right, of course. I'm sold.

Timer slack for slacker developers

Posted Oct 18, 2011 22:33 UTC (Tue) by dlang (guest, #313) [Link] (1 responses)

in this case, short of modifying the source code (not something for an administrator to do), what current knob do you have to be able to change the timer slack for another process (or group of processes)?

as I see it there is no knob that you (as an administrator) can twist for this today.

there's also no knob that you as a application programmer can twist that will change the slack for you and your running children, instead you would have to have each child invoke the change independently.

Timer slack for slacker developers

Posted Oct 19, 2011 2:08 UTC (Wed) by cmccabe (guest, #60281) [Link]

Application programmers can do things like increase timeouts for background threads or (often) avoid polling entirely. Lennart's email talks about "stuff like closed source crap, and all kinds of other things you cannot fix." One very common novice programmer mistake is using polling where you don't need to.

The one use case that is intriguing is synchronizing application timers so that a lot of them fire at once, in order to save on wakeups. I honestly can't think of any good way to do this with the existing timeout APIs-- maybe someone else can.

All these comments about "slack" are making me think of this guy:
http://en.wikipedia.org/wiki/File:Bobdobbs.png

Timer slack for slacker developers

Posted Oct 19, 2011 7:16 UTC (Wed) by Rudd-O (guest, #61155) [Link]

It is not the movie player that keeps the system awake, but actually the power management application that does the job on behalf of the movie player. Otherwise you end up with horrible hacks such as Xine (xine-lib) "pressing" the Shift key every 30 seconds.

This is why the policy belongs in the power management application, not in the user applications (which should, of course, have a mechanism available to commandeer the power management application or otherwise override policy decisions).

Timer slack for slacker developers

Posted Oct 18, 2011 0:02 UTC (Tue) by Simetrical (guest, #53439) [Link] (1 responses)

The use-case of background tabs in browsers is an excellent one -- we do not want browsers firing events too frequently for tabs the user isn't looking at. However, no special kernel support is required, because browsers already handle it. In both of the major open-source browsers (Firefox >= 5/Chrome >= 11), timers set by background tabs will fire at most once per second:

https://bugzilla.mozilla.org/show_bug.cgi?id=633421
http://code.google.com/p/chromium/issues/detail?id=66078

This tends to support the idea that well-written applications don't need to have the kernel fix their mistakes for them. (But most applications are not nearly as well-written as major browsers.)

Timer slack for slacker developers

Posted Oct 18, 2011 8:04 UTC (Tue) by misiu_mp (guest, #41936) [Link]

To solve it in userspace you would have to modify applications to use one common timer synchronization entity. Otherwise even perfectly good applications will cause frequent wakeups. If you have 10 processes running and each is neatly set up not to wake up more often than once per second, you will wake up 10 times per second.
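The arithmetic can be checked with a short simulation. This is only a model: the ten start offsets, the one-second period, and the function name are assumptions chosen to mirror the example above, and "aligning" here simply means rounding each expiry up to a shared boundary.

```python
def wakeup_instants(start_offsets_ms, period_ms, horizon_ms, align_ms=None):
    """Return the distinct times (ms) at which the system must wake up.

    Each process arms a periodic timer starting at its own offset. With
    `align_ms` set, every expiry is rounded up to a multiple of `align_ms`.
    """
    instants = set()
    for offset in start_offsets_ms:
        t = offset
        while t < horizon_ms:
            if align_ms is None:
                fire = t
            else:
                # Integer ceiling division: round up to the next shared boundary.
                fire = -(-t // align_ms) * align_ms
            instants.add(fire)
            t += period_ms
    return sorted(instants)


if __name__ == "__main__":
    offsets = [i * 100 for i in range(10)]  # ten processes, 100ms apart
    print(len(wakeup_instants(offsets, 1000, 3000)), "wakeups unaligned")
    print(len(wakeup_instants(offsets, 1000, 3000, align_ms=1000)), "wakeups aligned")
```

Each process is equally well behaved in both runs; only the shared boundary changes how many separate wakeups the system sees.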

Timer slack for slacker developers

Posted Oct 18, 2011 3:12 UTC (Tue) by jcm (subscriber, #18262) [Link] (16 responses)

By the way, I've been reading the X11 protocol specification recently (not so fond of the newer desktops and want to learn enough to feed myself). In X11, applications receive Expose events every time a window configuration event happens or a window is partially or completely covered or uncovered. Most decent toolkits should react appropriately by not redrawing non-visible parts of the window, so I'm curious to know what Peter is referring to here.

Timer slack for slacker developers

Posted Oct 18, 2011 3:52 UTC (Tue) by neilbrown (subscriber, #359) [Link] (3 responses)

I'm not Peter so I can only guess.

But applications tend to update a widget, and then let the tool kit render the widget to the screen.

A clock, for example, might have a 'label' widget and wake up every second to 'strftime' the time to a buffer and then label.set_text(buffer). That will update some internal state and tell the server to "invalidate" the relevant region of the window. If the window is visible, or subsequently when it becomes visible, the X server sends an 'expose' event and then the tool kit renders the widget to the window.

The application is oblivious of the fact that the label never actually makes it to the display. It just keeps on wasting power generating labels that are never seen.

It could find out though. It could connect to the "expose" event and only continue updating while the expose events are coming in.

Getting this smooth and jitter free might be tricky. For a simple widget it should be sufficient to wait for an expose, then calculate and set the image, then let the tool-kit draw it. Then when the next event (e.g. timer tick) says that the image should change, you just invalidate the window and stop the timer. Don't bother drawing or listening for further notifications until the expose happens.
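A toolkit-free sketch of that scheme follows. The class `ExposeDrivenLabel` and the `simulate` helper are hypothetical names, and the window system is reduced to a list saying whether the window is visible in each second; a real toolkit would deliver the expose and timer callbacks itself.

```python
class ExposeDrivenLabel:
    """Renders only on expose; the timer is re-armed only after a render."""

    def __init__(self, make_text):
        self.make_text = make_text
        self.timer_running = False
        self.renders = 0
        self.text = None

    def on_expose(self):
        # The window is visible and needs drawing: do the work, arm the timer.
        self.text = self.make_text()
        self.renders += 1
        self.timer_running = True

    def on_tick(self):
        # Content is stale: stop the timer and ask for an invalidate.
        if not self.timer_running:
            return False
        self.timer_running = False
        return True


def simulate(label, visible_by_second):
    """Assumed model: an invalidated region is exposed only while visible."""
    invalidated = True  # a new window starts out needing a draw
    for visible in visible_by_second:
        if label.on_tick():
            invalidated = True
        if invalidated and visible:
            label.on_expose()
            invalidated = False
    return label.renders


clock = ExposeDrivenLabel(lambda: "12:00:00")
# Visible for 3 s, hidden for 5 s, visible again for 2 s.
print(simulate(clock, [True] * 3 + [False] * 5 + [True] * 2))
```

While the window is hidden the timer stays stopped, so the label is rendered five times over the ten simulated seconds instead of ten.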

However for rendering that is more complex doing it that way might be too slow. You might have to put up a "waiting" image when you get an expose after a long idle period, spend 500ms generating your image, then render it and have the next one ready before invalidating the old one.

E.g. a GPS tracker might not even bother downloading maps when the window is not exposed. But when it gets an expose event in a new city, it cannot create a correct render straight away, and should not present an out-of-date render. So you have to wait for a while. This presents the classic trade-off. We save power by not updating regularly, but the cost is latency when we want to re-activate.

So it is possible, but it could be a bit messy. Extra tool-kit support would probably make it smoother. And there is probably some tool kit that already makes this really easy and I'm looking forward to someone telling me which one.

Timer slack for slacker developers

Posted Oct 18, 2011 14:40 UTC (Tue) by mjthayer (guest, #39183) [Link] (2 responses)

> The application is oblivious of the fact that the label never actually makes it to the display. It just keeps on wasting power generating labels that are never seen.
>
> It could find out though. It could connect to the "expose" event and only continue updating while the expose events are coming in.
>
> Getting this smooth and jitter free might be tricky.

Actually this isn't as bad as it sounds. The application can query the visibility state of the widget's window (and get notified when it changes), so it just has to keep a binary "on/off" state for updates. (I can't remember without checking whether there is a simple way to get information about which parts of a partially visible window are shown, but that would be much less important.)

Timer slack for slacker developers

Posted Oct 18, 2011 21:25 UTC (Tue) by neilbrown (subscriber, #359) [Link] (1 responses)

It probably isn't as bad as I made it sound.

I don't think the 'visibility state' (notified by "mapped" and "unmapped" events) is really enough because it doesn't tell you when a window is fully obscured by another window.

However it wouldn't be too hard to synthesize a simple binary state for the whole window by watching expose events.

And you don't really need to put up a "waiting" image like I suggested - just do what firefox does and leave the contents of the screen like they were before - firefox window decorations wrapped around 2 xterms and an xclock which isn't ticking any more :-)

To play with mapped events, try:

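#!/usr/bin/env python
# PyGTK 2 (Python 2): print a line whenever the window is mapped or unmapped.

import gtk

w = gtk.Window()
w.show()

def mapped(w, ev):
    print "mapped"

def unmapped(w, ev):
    print "unmapped"

w.connect('map-event', mapped)
w.connect('unmap-event', unmapped)
# gtk.main_quit() and gtk.main() replace the deprecated mainquit()/mainloop().
w.connect('destroy', lambda x: gtk.main_quit())

gtk.main()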

Timer slack for slacker developers

Posted Oct 19, 2011 8:03 UTC (Wed) by mjthayer (guest, #39183) [Link]

> I don't think the 'visibility state' (notified by "mapped" and "unmapped" events) is really enough because it doesn't tell you when a window is fully obscured by another window.

Actually I thought it did:

"When the window changes state [...] to viewable and fully obscured, the X server generates [a VisibilityNotify event] with the state member of the XVisibilityEvent structure set to VisibilityFullyObscured." [1]

(Hope I'm not getting too boring here!)

[1] http://tronche.com/gui/x/xlib/events/window-state-change/...
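The binary "on/off" state discussed in this sub-thread could be tracked roughly as follows. This is a sketch only: the three constants carry the values defined in X11's X.h, while the `UpdateGate` class, its method names, and the choice to start out as "fully obscured" are assumptions made for illustration.

```python
# Visibility states as defined in X11's X.h.
VisibilityUnobscured = 0
VisibilityPartiallyObscured = 1
VisibilityFullyObscured = 2


class UpdateGate:
    """Collapses map/unmap and visibility notifications into one on/off flag."""

    def __init__(self):
        self.mapped = False
        # Assumption: treat the window as hidden until the server says otherwise.
        self.visibility = VisibilityFullyObscured

    def on_map(self):
        self.mapped = True

    def on_unmap(self):
        self.mapped = False

    def on_visibility(self, state):
        self.visibility = state

    @property
    def updates_enabled(self):
        # Update only when mapped and at least partly visible.
        return self.mapped and self.visibility != VisibilityFullyObscured


gate = UpdateGate()
gate.on_map()
gate.on_visibility(VisibilityPartiallyObscured)
print(gate.updates_enabled)
gate.on_visibility(VisibilityFullyObscured)
print(gate.updates_enabled)
```

An application would consult `updates_enabled` before re-arming its periodic timer, so a fully obscured window stops generating wakeups.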

Timer slack for slacker developers

Posted Oct 18, 2011 5:09 UTC (Tue) by geuder (guest, #62854) [Link] (11 responses)

> In X11, applications receive Expose events every time a window configuration event happens or a window is partially or completely covered or uncovered.

Traditionally yes. However, I seem to remember that at least one existing compositing window manager does not offer this nice functionality. So the application will update its window even when not visible.

Disclaimer: I don't remember the details. And if I did, an NDA would probably prevent me from going into more detail.

Timer slack for slacker developers

Posted Oct 18, 2011 5:20 UTC (Tue) by jcm (subscriber, #18262) [Link] (8 responses)

Fortunately I explicitly don't care about compositing window managers :)

Timer slack for slacker developers

Posted Oct 18, 2011 9:31 UTC (Tue) by dgm (subscriber, #49227) [Link] (7 responses)

Unfortunately, you live in a world where most people do.

Timer slack for slacker developers

Posted Oct 18, 2011 15:01 UTC (Tue) by jcm (subscriber, #18262) [Link] (6 responses)

http://www.amazon.com/gp/product/B002TYEPWO/ref=dm_dp_trk...

I've decided there are plenty of things I enjoy about living in the present, but X11 is an area where I'm going to force a return to the 80s. I want a network transparent non-3D desktop that just works. I used to care about pretty window managers and all that nonsense, now I just want to get stuff done with my computer and have it otherwise left alone.

Timer slack for slacker developers

Posted Oct 18, 2011 16:08 UTC (Tue) by nix (subscriber, #2304) [Link] (3 responses)

Quite so. The pretty stuff is very pretty but every new bit of it seems to break something old that Just Worked until now, because nobody uses network transparency / nobody cares about power consumption / nobody cares about video playback / nobody cares about focus-follows-mouse / nobody cares about any older X extensions / ...

(OK, that *was* excessively cynical.)

Timer slack for slacker developers

Posted Oct 18, 2011 17:08 UTC (Tue) by jcm (subscriber, #18262) [Link] (2 responses)

Can be summarized as "I want a UNIX workstation and not a pretty desktop". I don't mean to sound offensive but I actively do not want any of that other stuff. I just want a 1980s style X11 workstation that works exactly as it always did, with X forwarding just working, and none of this 3D stuff. Sure, having extensions to play games is nice and all, but I don't need to rotate my desktop on a cube or have wobbly windows. I used to think I do, but that's before I realized I'd rather have a super boring UI that is set in stone. I want to do something and it always works just like it always did.

Timer slack for slacker developers

Posted Oct 18, 2011 23:16 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

I have similar goals, but I'd like Wayland so that I could (do what is essentially) moving *windows* between X servers instead of having windows rooted in the $DISPLAY they started with. The xpra tool can sort of do it, but it's still rooted to some other display and AFAICT doesn't really give the full power of, say, XMonad (happy to be wrong though) over the xpra windows. Combine this with per-application freeze and thaw that came up a few weeks ago that works between different machines and I can migrate a running system to another machine :) .

Timer slack for slacker developers

Posted Oct 20, 2011 22:30 UTC (Thu) by nix (subscriber, #2304) [Link]

I'd describe that as "I want a Unix workstation and not *just* a pretty desktop". The pretty is all right, but only if it doesn't break other stuff on its way. "Pretty" is less important than "works".

(But I know we are weird this way. Emacs versus vi is one thing, but most people would look at both of those and run screaming back to their pretty Eclipse. Actually, no, most people would run screaming back to Word, which is neither pretty nor functional...)

80's X - was: Timer slack for slacker developers

Posted Oct 18, 2011 21:08 UTC (Tue) by neilbrown (subscriber, #359) [Link] (1 responses)

I think the modern spelling of "network transparent" is "HTML5". However I find that xterm<->tmux provides a very usable 80's style windowing environment over a simple ssh connection.
Oh wait ... I don't think we had ssh in the 80's :-)

80's X - was: Timer slack for slacker developers

Posted Oct 18, 2011 22:12 UTC (Tue) by jcm (subscriber, #18262) [Link]

Yea but...HTML5 is the example folks give when told that Linux as a consumer OS doesn't have a stable platform for third party applications. They say "oh, but in the future...HTML5...hand wavy!" and all that. Conveniently forgetting that this was the future for iPhone right before Apple had to about-face and offer real apps. HTML5 and friends might be the future, but that future is further away than most think it is.

It's true we didn't have ssh in the 80s. Nor did we have Xinerama and lots of other things I do like, all of which are iterative improvements on what went before :)

Timer slack for slacker developers

Posted Oct 18, 2011 9:32 UTC (Tue) by smurf (subscriber, #17840) [Link] (1 responses)

Well, it's not that easy. Your window may be covered, except that the user presses Alt-Tab and expects to see current content (clocks, download managers, …). Or it may be covered by something that's almost, but not quite, opaque.

That's not the problem, though. The problem is that you can't depend on every program to get this right, but, on the other hand, you do need some way for a program to tell the kernel that when it says 0.2sec it MEANS 0.2sec and not something random that may be roughly of the same order of magnitude. Or not.

How exactly to do that is the kernel people's problem. New scheduler class, new kind of timer, whatever.

Timer slack for slacker developers

Posted Oct 29, 2011 18:50 UTC (Sat) by JanC_ (guest, #34940) [Link]

If the Window-switcher knows how to tell the application that it's "exposed" again, Alt-Tab shouldn't be a problem, I suppose?

Timer slack for slacker developers

Posted Oct 18, 2011 4:05 UTC (Tue) by neilbrown (subscriber, #359) [Link] (3 responses)

> Does mitigating the effects of (what is seen as) application developer sloppiness encourage the distribution of low-quality code and worsen the system in the long run?

<sarcasm>
You bet-cha it does. But we have seen this all before.

Backups just encourage people to be careless with files.

RAID just encourages people to buy cheap hardware.

The OOM killer just encourages people to use Java^W^W^W to get careless about freeing memory.

And the result?
- you cannot actually buy good reliable hard drives any more
- The '-i' flag is still not the default for "rm"
Fortunately memory keeps getting cheaper so bloat isn't too much of a problem.
</sarcasm>

The only problem I see with the proposal is that it could cause correctly working programs to fail, and regressions are generally frowned upon.
But if it really brings value - which someone can quantify and measure for us - then I suspect that it is a hurdle that can be overcome.

Timer slack for slacker developers

Posted Oct 19, 2011 18:50 UTC (Wed) by jzbiciak (guest, #5246) [Link] (2 responses)

If I give an otherwise correct program too little RAM, too little disk, way too slow a CPU or too little of some other resource it needs to function properly, it'll fail for any of those reasons. I've got a ton of knobs I can turn with ulimit and other mechanisms to put the thumbscrews on applications.

How is that different from giving it too little timing precision in its wakeups?

Timer slack for slacker developers

Posted Oct 20, 2011 1:04 UTC (Thu) by neilbrown (subscriber, #359) [Link] (1 responses)

I think the ulimit argument and the "unpredictable load" arguments are actually very good. Not 100% sure they are bullet proof though.
I would want it to be possible for users to disable a feature like this for their whole session, and for individual programs (preferably without recompiling - maybe an LD_PRELOAD or something). But it certainly seems worth trying.

Distros should do it and see what the response is and if it works well, send it upstream.... Only distros have an upstream-first policy these days (which I support) so it has to be the other way around - but then if it turns out to be a bad idea we are stuck with it forever.... I'm so confused!

Timer slack for slacker developers

Posted Oct 20, 2011 1:33 UTC (Thu) by dlang (guest, #313) [Link]

remember that what is being proposed is not any particular use of the timer slack feature, but merely adding the ability to twist this knob on another program rather than the program having to know how to twist the knob on itself
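For comparison, this is roughly what "twisting the knob on itself" looks like today. It is a Linux-only sketch that calls prctl() through ctypes; the option numbers are the PR_SET_TIMERSLACK and PR_GET_TIMERSLACK values from <linux/prctl.h>, the helper names are made up for illustration, and the setting applies to the calling thread.

```python
import ctypes
import os

PR_SET_TIMERSLACK = 29  # from <linux/prctl.h>
PR_GET_TIMERSLACK = 30

_libc = ctypes.CDLL(None, use_errno=True)
_ZERO = ctypes.c_ulong(0)


def set_own_timer_slack(nanoseconds):
    """Set the calling thread's timer slack (0 resets it to the default)."""
    result = _libc.prctl(PR_SET_TIMERSLACK, ctypes.c_ulong(nanoseconds),
                         _ZERO, _ZERO, _ZERO)
    if result != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def get_own_timer_slack():
    """Return the calling thread's current timer slack in nanoseconds."""
    return _libc.prctl(PR_GET_TIMERSLACK, _ZERO, _ZERO, _ZERO, _ZERO)


set_own_timer_slack(1_000_000)  # tolerate up to 1 ms of delay
print(get_own_timer_slack())
```

The proposed controller would let a session manager apply the equivalent setting to every process in a control group, without each program containing code like this.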

depending on how you (or your distro) define and use cgroups, you can then do this for an individual program, for a user session, or any other grouping that you can think of.

there are many, many ways to use this, some will be good, some will be horrible. It's in these ways to use this feature that the distros should be experimenting.

and part of the experimentation will be that they find that particular programs work especially well with particular settings, and they can then push patches to have the apps do the changes themselves upstream.

this is not the first time that the kernel has supported things that change userspace performance, it was just a few releases ago that the kernel changed how it allocates CPU between different cgroups so that you could do things like put a compile in its own group and have that group compete with all the other apps on the system as if it was one process, not hundreds of processes, as far as its CPU share goes.
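A small worked example of that difference, as a sketch under simplifying assumptions: every process is always runnable, all weights are equal, and the group names and process counts are invented.

```python
from fractions import Fraction


def share_per_process(groups):
    """CPU share of each group when every process competes individually."""
    total = sum(groups.values())
    return {name: Fraction(count, total) for name, count in groups.items()}


def share_per_group(groups):
    """CPU share of each group when each group competes as one entity."""
    return {name: Fraction(1, len(groups)) for name in groups}


# One interactive application versus a compile running 100 processes.
groups = {"desktop_app": 1, "compile": 100}
print(share_per_process(groups)["desktop_app"])  # 1/101
print(share_per_group(groups)["desktop_app"])    # 1/2
```

Under these assumptions the interactive application goes from roughly 1% of the CPU to half of it once the compile is confined to its own group.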

this change is less intrusive than that one was as this doesn't change the default behavior at all, it just makes it possible to do things.

Timer slack for slacker developers

Posted Oct 18, 2011 10:39 UTC (Tue) by lmb (subscriber, #39048) [Link]

Trying to reject the kernel feature of fiddling with another process's or cgroup's slack value for fear of the sloppy user-code it supposedly encourages just doesn't hold. We have setpriority() too, after all.

The ability to implement a common user-space controller for this makes sense; it also makes sense that applications somehow register with it to inform it of their special requirements (or that the controller maintains a list for applications that are important enough but haven't yet been enhanced accordingly). Merging and deciding between possibly conflicting policies is not the kernel's task.

Timer slack for slacker developers

Posted Oct 18, 2011 16:19 UTC (Tue) by k8to (guest, #15413) [Link] (1 responses)

It's hard to get this stuff right.

For example, if my laptop is plugged into the wall, and I start a 4-hour network transfer, and come back an hour later and find it has put itself to sleep -- I'm not amused.

Timer slack for slacker developers

Posted Oct 18, 2011 19:05 UTC (Tue) by yokem_55 (subscriber, #10498) [Link]

I run into this a lot. The power savings I see when I put my desktop system to sleep regularly are really quite substantial ($10-$15/mo off my electric bill), but any time I have a long-running background job that doesn't use the f.d.o power management sleep inhibit API, I see precisely this misbehavior. I've come across a simple python script (see here) that places an inhibit on suspend, and I've used it in wrapper scripts for several command-line programs that should keep the system awake while they are running. It would be much nicer though if a lower level, more system-wide setup could be contrived.

Timer slack for slacker developers

Posted Oct 20, 2011 19:28 UTC (Thu) by iq-0 (subscriber, #36655) [Link] (1 responses)

A lot of the problems around power-management and timer-slack have to do with programs being unaware of some part of the state of the system. And part of it has to do with the state being distributed and nobody having a good idea where to look.

So effectively you want some global state register where programs can hook themselves to state change events (this could be kernel provided or fully userspace (e.g. dbus)).

Now we can have power management related states like "active","passive","might_suspend","light_wake_cycle" (just thinking aloud, not sure which would be sensible). And another state register could be for memory states: "no_pressure","slight_pressure","high_pressure","swapping_frenzy". Or a state register for the monitor state "highlight","lowlight","screensaver","suspended","off","headless"

Now programs are able to act on certain transitions (database servers can ramp up their caching when under "no_pressure" and flush caches when under "high_pressure"). And web browsers can start slacking their timers more when the system is "passive", while download processes can start notifying the system not to suspend when "might_suspend" is entered (otherwise that would just be unnecessary overhead).
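An in-process sketch of such a state register is below. The `StateRegister` class and its methods are hypothetical, the state names are the ones floated above, and a real implementation would sit in the kernel or on a bus such as D-Bus rather than in a single Python process. Subscribers name the state they care about, so other transitions do not wake them.

```python
class StateRegister:
    """Holds one named piece of system state and notifies interested parties."""

    def __init__(self, name, initial):
        self.name = name
        self.state = initial
        self._subscribers = {}  # state -> callbacks to run on entering it

    def subscribe(self, state, callback):
        self._subscribers.setdefault(state, []).append(callback)

    def set_state(self, new_state):
        if new_state == self.state:
            return  # no transition, nobody is woken
        old_state, self.state = self.state, new_state
        for callback in self._subscribers.get(new_state, []):
            callback(self.name, old_state, new_state)


log = []
memory = StateRegister("memory", "no_pressure")
power = StateRegister("power", "active")

memory.subscribe("high_pressure",
                 lambda reg, old, new: log.append("database: flush caches"))
power.subscribe("passive",
                lambda reg, old, new: log.append("browser: increase timer slack"))
power.subscribe("might_suspend",
                lambda reg, old, new: log.append("downloader: inhibit suspend"))

memory.set_state("slight_pressure")  # no subscriber, nothing runs
memory.set_state("high_pressure")
power.set_state("passive")
power.set_state("might_suspend")
print(log)
```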

The biggest problem at the moment is that nobody has any idea how to gather any of the above information in a maintainable way. So that alone would be a good argument for a kernel supported alternative.

Timer slack for slacker developers

Posted Nov 20, 2011 19:54 UTC (Sun) by oak (guest, #2786) [Link]

If you have a lot of programs running that subscribe to these events, then you have the problem that half the world wakes up to handle them and your performance goes south. So there needs to be also some mechanism that delivers these events only to apps where it's still relevant. When such state events are relevant to an app, varies between states.

As an example from phone UIs: non-visible applications shouldn't be interested in screen blanking events. Apps themselves could unsubscribe from events when they become non-visible, and subscribe back when they again become visible, but with composited windows there's AFAIK no standard message/property to tell about that (N900 Hildon widgets & N9 MeegoTouch have their own window properties for this).


Copyright © 2011, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds