Timer slack for slacker developers

Posted Oct 17, 2011 23:40 UTC (Mon) by yoe (guest, #25743)
Parent article: Timer slack for slacker developers

I think the proper way to implement this is not to force a particular timing upon applications, but to have applications register to use a "slacky" timing when available.

For instance, in response to issues that were found with powertop, an API call was added to glib that would align timeouts on one-second intervals, thereby grouping them together so the processor would only wake up once per second, in the worst case.
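As a rough illustration of that grouping idea (the GLib call in question is most likely g_timeout_add_seconds(), but the sketch below is not GLib's code), a timeout can be rounded up to the next whole-second boundary so that timers registered at different sub-second offsets land on the same wakeup. The function name `next_aligned_wakeup` is made up for this example.

```python
import math


def next_aligned_wakeup(now: float, interval_s: int) -> float:
    """Return the first whole-second instant that is at least interval_s after now.

    Rounding up means a timer may fire up to one second late, never early.
    """
    if interval_s < 1:
        raise ValueError("interval_s must be at least 1 second")
    return float(math.ceil(now + interval_s))


# Two timers registered at different sub-second offsets share one wakeup.
first = next_aligned_wakeup(100.25, 1)
second = next_aligned_wakeup(100.9, 1)
```

Both calls return the same instant, so the processor wakes once instead of twice.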

Analogously, glib (or similar libraries for other environments) could provide a way for an application to say "I need to check a condition once in a while. It's probably a good idea to do this fairly often if there's an active user using the system, but it's perfectly fine to slow down a bit if not". The library could then, say, vary the time to a point between two extremes given by the app's programmer, based on the amount of time it's been since the user last did "something". Perhaps the shortest time could even be chosen only if the application in question is actually active.

This way, mail apps can provide mail "instantly" if the user is actively reading mail, but slow down a bit if not. Or so.
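A minimal sketch of what such a library helper might compute, assuming a simple linear ramp between the two extremes. Everything here is hypothetical: the name `polling_interval`, the `ramp_s` parameter (how long the user must be idle before the longest interval applies), and the choice to give inactive applications the longest interval outright.

```python
def polling_interval(idle_s: float,
                     min_interval_s: float,
                     max_interval_s: float,
                     ramp_s: float = 600.0,
                     app_active: bool = True) -> float:
    """Pick a polling interval between the two extremes given by the app.

    idle_s is the time since the user last did "something". The interval
    grows linearly from min_interval_s to max_interval_s over ramp_s seconds.
    """
    if not 0 < min_interval_s <= max_interval_s:
        raise ValueError("need 0 < min_interval_s <= max_interval_s")
    if ramp_s <= 0:
        raise ValueError("ramp_s must be positive")
    if not app_active:
        # Assumption: an application that is not active never gets short intervals.
        return float(max_interval_s)
    fraction = min(max(idle_s, 0.0) / ramp_s, 1.0)
    return min_interval_s + fraction * (max_interval_s - min_interval_s)
```

A mail client could call this with, say, 5 and 300 seconds as its extremes and re-arm its timer with the result after every check.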

Oh hell, just thinking out loud here.

Timer slack for slacker developers

Posted Oct 17, 2011 23:44 UTC (Mon) by dlang (guest, #313)

the question is how much of this policy belongs in each application?

it would be good to let the application do things to indicate what it needs (the once a second wakeup is a perfect example), but trying to encode all the policy into each application seems like a configuration disaster waiting to happen.

we don't try to have each application configure screen dimming, sleep, etc. Instead we have one power management application on the system that looks at what the system is doing, including what the applications request and hint that they want, but that then makes the decisions on what to do.

This sort of thing seems like it fits in very nicely with the other things the power management app is doing.

Timer slack for slacker developers

Posted Oct 18, 2011 7:00 UTC (Tue) by cmccabe (guest, #60281)

I'm afraid I'm going to have to disagree. This kind of policy belongs in the applications-- they all have different requirements.

For example, a movie player should keep the system awake for as long as it is playing the movie. Anything less will just result in a lot of frustrated users. This problem is more than theoretical-- I have had this problem under Linux before where the screen has been blanked after a few "idle" minutes playing a movie. Thankfully my current build of mplayer now seems to be able to tell Xorg not to blank the screen, but "clever" solutions like this could reintroduce the same kinds of problems.

Android got this right. An application that needs to keep the phone from going to sleep can take a wakelock. There is zero chance that the phone will, for example, go to sleep in the middle of a phone call because a "clever" power manager thought that not many buttons had been pushed recently.

There are some people who argue that applications should not have the ability to influence power management because application developers cannot be trusted. But this is the worst kind of tribal mentality ("kernel devs good, user space devs bad"). The best thing that we could do as kernel and systems software developers is not to hide functionality from the upper layers, but to make it clear what exactly is going on. We need more tools like powertop that can point a finger at bad code and get it fixed.

Timer slack for slacker developers

Posted Oct 18, 2011 7:05 UTC (Tue) by dlang (guest, #313)

note that we are talking about a user space power manager here, so this isn't a kernel vs user-space debate

it's perfectly fine for a movie player to tell the power management daemon that it doesn't want the system to go to sleep, but the power management daemon may still decide to save power by shutting down the other 7 cores of an 8-core machine (or slow the clock down, but not so much that the app maxes out the remaining processing time)

Android 'gets it right' only for the simple case of a single core system. for multiple cores the information of an app claiming "I don't want to sleep now" isn't enough

Timer slack for slacker developers

Posted Oct 18, 2011 7:57 UTC (Tue) by cmccabe (guest, #60281)

> it's perfectly fine for a movie player to tell the power management
> daemon that it doesn't want the system to go to sleep, but the
> power management daemon may still decide to save power by shutting
> down the other 7 cores of an 8-core machine (or slow the clock down,
> but not so much that the app maxes out the remaining processing time)

Well, if you're using an Intel system, the unused cores will be clocked slower by the firmware running on them. cpufreq also has a role, and a bigger role on ARM. I don't think userspace daemons come into play at all here.

Your talk about "slowing the clock down" definitely seems like something that would screw up my movie playback. Even ignoring that issue, what if your externally imposed policy slows down other daemons that the movie player needs to function efficiently? A lot of applications use D-BUS these days. It's pretty clear that messing with timings on the pulseaudio daemon will probably cause audio glitches and dropouts in my movie playing, even though you oh so kindly allowed the movie player to continue running. Or if the movie player is embedded in Firefox, and you decide to mess with that process, there could also be problems.

If application developers want to opt in to a lazy clock policy, that is fine. We should set up a system call that allows them to do that, and start making use of that. But it shouldn't be forced on developers who don't want it.

Incidentally, kernel developers get really mad when other folks pull the same kind of "for your own good" BS on them-- for example, when hard drives "optimize" by not actually flushing the writes out to disk when they're told to do so.

At the end of the day, if you take away the ability of stupid people to do stupid things, you also take away the ability of smart people to do smart things. We have enough application developers out there that we can afford to install the programs written by the smart ones and ignore or fix the rest.

> Android 'gets it right' only for the simple case of a single core
> system. for multiple cores the information of an app claiming "I
> don't want to sleep now" isn't enough

Wakelocks and cpufreq are two distinct systems with different roles.

In general, Android's approach is to allow the application to use as much CPU as it wants, but to have good monitoring tools that allow users to spot and de-install CPU hogs. However, you need certain capabilities to do things like take a wakelock.

Timer slack for slacker developers

Posted Oct 18, 2011 8:18 UTC (Tue) by dlang (guest, #313)

and as a result of the possible misuse of this tool you (and others) appear utterly opposed to giving me, the administrator of a machine, any ability to override what the application programmers choose (or don't choose) to do.

for every scenario that someone paints where this could be useful, there is going to be another scenario where it could be misused.

but the same thing can apply to 'renice' or 'ionice' as well, why aren't you also opposed to the ability for the evil (or clueless) system administrator to mess with the application by changing its priority?

or for that matter ulimit can cause programs to fail, it should be ripped out of the system, instead every application should be audited to make sure it only allocates the resources that it really needs, and the applications should then be changed to cooperatively give up resources if something else needs them.

for that matter, what about preemptive time slicing, the system is more efficient if the applications cooperate instead, so we should just change every application to do cooperative time slicing, including setting the priority between applications (after all, doesn't the application writer know what's best for the applications)

why are all of these ways for an administrator to control what's happening with their machine acceptable, but the idea that the administrator could override the application's decision (or lack thereof) on timer slack is so horrible?

Timer slack for slacker developers

Posted Oct 18, 2011 9:28 UTC (Tue) by dgm (subscriber, #49227)

You're right. The fact that you _can_ do something doesn't mean you should, or will. But you may. And that means application writers and users _will_ pay a price for your flexibility: users complaining about broken applications because someone thought all applications should or should not do something.

Thus, I would say: if this finally gets in, my advice to distributions and admins is NOT to use it unless you understand in gory detail ALL the implications. With great power... yada, yada, yada.

Frankly, I would add that using cgroups with this is calling for abuse. Make it a per process option and it will become much less "dangerous".

Timer slack for slacker developers

Posted Oct 18, 2011 14:18 UTC (Tue) by bronson (subscriber, #4806)

Why is it much less dangerous? What's the difference between the master app changing a cgroup and the master app looping through a list of children and changing them one by one?

Timer slack for slacker developers

Posted Oct 18, 2011 16:35 UTC (Tue) by dgm (subscriber, #49227)

The difference is that group membership is transitive. The child processes of your children (or of any other process added to the group) belong to it by default. Knowing that all those processes will behave correctly is substantially more difficult than directly manipulating a list of known processes.

Of course, if the group hierarchy is standardized, processes known to behave badly can be moved out of the group, but it would mean modification of the process, thus (at least partially) negating the benefits of being able to do it without changing code.

Timer slack for slacker developers

Posted Oct 18, 2011 16:56 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

Race conditions. You need to be very sure that you haven't missed a child that has been created just after you've started looping.

cgroups allow you to do it atomically.

Timer slack for slacker developers

Posted Oct 18, 2011 17:49 UTC (Tue) by dlang (guest, #313)

why is this option for cgroups any more dangerous than cpu, memory, or disk I/O throttling?

in all cases you can cause an application being limited to behave in ways different from what the application programmer expected.

Timer slack for slacker developers

Posted Oct 18, 2011 21:40 UTC (Tue) by cmccabe (guest, #60281)

Operating systems certainly need to have configuration knobs. nice, ionice, ulimit, and friends definitely fall into this category. But the more configuration knobs you have, the more complex the system gets to administer, to understand, and to program for. Complexity tends to breed bugs, frustrated users, and miscommunication between different subsystems. That is why adding a new knob just to do something that you could have done with the old knobs is something that we should resist.

nice and ulimit are also things specified by POSIX and implemented by many other operating systems. Like Linus said, Linux-specific interfaces tend not to get very much use, even when they're much better than the standard interfaces they're replacing. In this case, you're talking about adding a platform specific interface that is not better than what it's replacing, just different. It's also an interface that application developers have no easy way to opt-out of. IMHO, not a win at all.

Timer slack for slacker developers

Posted Oct 18, 2011 22:23 UTC (Tue) by dgm (subscriber, #49227)

Not only that. This particular knob is for making timers misbehave in order to save power. An application can be expected to work with less CPU or IO, but faulting timers?

Timer slack for slacker developers

Posted Oct 18, 2011 22:37 UTC (Tue) by dlang (guest, #313)

remember that timers are not precise. they will delay you until _at_least_ the timer wakeup, they may delay you further.

normally this further amount is relatively short, but if the system is under high load it could be a significant amount of time (if the system is swapping badly, it could be seconds after the timer is scheduled to fire before the application executes)

this is just changing things so that some other suitably privileged task in the system can increase the maximum lag that the application sees.

it's not something that can't happen today.
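The "at least" behaviour is easy to observe from user space. The sketch below is only an illustration (the helper name `measure_overshoot` is made up): it requests a sleep and reports how much longer than requested the process actually stayed asleep. The number it prints depends on the machine, its load, and the timer slack in effect.

```python
import time


def measure_overshoot(requested_s: float) -> float:
    """Sleep for requested_s seconds and return how much extra time passed."""
    start = time.monotonic()
    time.sleep(requested_s)
    return (time.monotonic() - start) - requested_s


# On an idle machine the overshoot is typically tiny; under heavy load or
# with a larger timer slack it grows.
overshoot = measure_overshoot(0.05)
print(f"woke up {overshoot * 1e6:.0f} microseconds late")
```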

Timer slack for slacker developers

Posted Oct 19, 2011 13:39 UTC (Wed) by dgm (subscriber, #49227)

You're completely right, of course. I'm sold.

Timer slack for slacker developers

Posted Oct 18, 2011 22:33 UTC (Tue) by dlang (guest, #313)

in this case, short of modifying the source code (not something for an administrator to do), what current knob do you have to be able to change the timer slack for another process (or group of processes)?

as I see it there is no knob that you (as an administrator) can twist for this today.

there's also no knob that you as an application programmer can twist that will change the slack for you and your running children, instead you would have to have each child invoke the change independently.
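For reference, the per-process knob that does exist is prctl(2) with PR_SET_TIMERSLACK, which only affects the caller. The sketch below reaches it through ctypes; it assumes Linux and a libc that exports prctl, and the helper names are invented for this example. The option numbers 29 and 30 are the values from <linux/prctl.h>.

```python
import ctypes
import os
import sys

# Option numbers from <linux/prctl.h>.
PR_SET_TIMERSLACK = 29
PR_GET_TIMERSLACK = 30


def _prctl(option: int, arg: int = 0) -> int:
    """Call prctl(2) through ctypes; Linux only."""
    if not sys.platform.startswith("linux"):
        raise NotImplementedError("prctl(2) is Linux-specific")
    libc = ctypes.CDLL(None, use_errno=True)
    result = libc.prctl(option, ctypes.c_ulong(arg), 0, 0, 0)
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def set_timer_slack_ns(slack_ns: int) -> None:
    """Set the calling thread's timer slack, in nanoseconds."""
    _prctl(PR_SET_TIMERSLACK, slack_ns)


def get_timer_slack_ns() -> int:
    """Return the calling thread's current timer slack, in nanoseconds."""
    return _prctl(PR_GET_TIMERSLACK)
```

A process that wants laxer timers could call `set_timer_slack_ns(1_000_000)` for one millisecond of slack. Nothing in this call reaches processes that are already running, which is the gap being described.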

Timer slack for slacker developers

Posted Oct 19, 2011 2:08 UTC (Wed) by cmccabe (guest, #60281)

Application programmers can do things like increase timeouts for background threads or (often) avoid polling entirely. Lennart's email talks about "stuff like closed source crap, and all kinds of other things you cannot fix." One very common novice programmer mistake is using polling where you don't need to.

The one use case that is intriguing is synchronizing application timers so that a lot of them fire at once, in order to save on wakeups. I honestly can't think of any good way to do this with the existing timeout APIs-- maybe someone else can.
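To make the goal concrete: if every timer is allowed to fire anywhere between its deadline and its deadline plus some slack, the fewest wakeups can be found with a standard greedy pass over the timers sorted by their latest allowed firing time. This is only a sketch of the idea, not how the kernel implements timer slack, and `coalesce_wakeups` is a made-up name. Times are plain integers (say, milliseconds).

```python
def coalesce_wakeups(timers):
    """Return the fewest wakeup times that serve every timer.

    timers is a list of (deadline, slack) pairs; a timer may fire at any
    time t with deadline <= t <= deadline + slack.
    """
    wakeups = []
    # Handle timers in order of their latest allowed firing time.
    for deadline, slack in sorted(timers, key=lambda t: t[0] + t[1]):
        if slack < 0:
            raise ValueError("slack must not be negative")
        if wakeups and wakeups[-1] >= deadline:
            continue  # the most recent wakeup already falls in this window
        # Wake as late as this timer allows, to catch as many later ones as possible.
        wakeups.append(deadline + slack)
    return wakeups


# Three timers within 500 ms of each other share one wakeup; the fourth needs its own.
plan = coalesce_wakeups([(10000, 500), (10200, 500), (10400, 500), (12000, 100)])
```

Here `plan` is `[10500, 12100]`: two wakeups instead of four. With zero slack every timer keeps its own wakeup.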

All these comments about "slack" are making me think of this guy:
http://en.wikipedia.org/wiki/File:Bobdobbs.png

Timer slack for slacker developers

Posted Oct 19, 2011 7:16 UTC (Wed) by Rudd-O (guest, #61155)

It is not the movie player that keeps the system awake, but actually the power management application that does the job on behalf of the movie player. Otherwise you end up with horrible hacks such as Xine (xine-lib) "pressing" the Shift key every 30 seconds.

This is why the policy belongs in the power management application, not in the user applications (which should, of course, have a mechanism available to commandeer the power management application or otherwise override policy decisions).