Short sleeps suffering from slack
The C-library sleep() function is defined to put the calling process to sleep for at least the number of seconds specified. One might think that calling sleep() with an argument of zero seconds would make relatively little sense; why put a process to sleep for no time? It turns out, though, that some developers put such calls in as a way to relinquish the CPU for a short period of time. The idea is to be nice and allow other processes to run briefly before continuing execution. Applications that perform polling or are otherwise prone to consuming too much CPU are often "fixed" with a zero-second sleep.
Once upon a time in Linux, sleep(0) would always put the calling process to sleep for at least one clock tick. When high-resolution timers were added to the kernel, the behavior changed: if a process asked to sleep on an already-expired timer (which is the case for a zero-second sleep), the call simply returned directly back to the calling process. Then came the addition of timer slack, which can extend sleep periods to force multiple processes to wake at the same time. This behavior will cause timers to run a little longer than requested, but the result is fewer processor wakeups and, thus, a savings of power. In the case of a zero-second sleep, the addition of timer slack turns an expired timer into one that is not expired, so the calling process will, once again, be put to sleep.
The default timer slack, at 50µs, is unlikely to cause visible changes to the behavior of most applications. But it seems that, on some systems, the timer slack value is set quite high - on the order of seconds - to get the best power behavior possible. That can extend the length of a zero-second sleep accordingly, leading to misbehaving applications.
Matthew Garrett, working under the notion that breaking applications is bad, submitted a patch making a special case for zero-second sleeps. The idea is simple: if the requested sleep time is zero, timer slack will not be added and the process will not be delayed. The problem with this approach is that the process will still not get the desired result: rather than yielding the processor, it will simply have performed a useless system call and gone right back to whatever it was doing before. Without timer slack, a request to sleep on an expired timer returns directly to user space without going through the scheduler at all.
An alternative would be to transform sleep(0) into a call to sched_yield(). But that idea is not hugely popular with the scheduler developers, who think that calls to sched_yield() are almost always a bad idea. It is better, they say, to fix the applications to stop polling or doing whatever else it is that they do that causes developers to think that explicitly yielding the CPU is the right thing to do.
According to Matthew, the number of affected applications is not tiny.
Normal practice in kernel development would be to try to avoid breaking those applications if possible. Even in cases where applications are relying on undefined and undocumented behavior - certainly the case here - it is better if a kernel upgrade doesn't turn working code into broken code. Some participants have suggested that the same approach should be taken in this case.
The situation with sleep(0) is a little different from others, though. Application developers cannot claim a long history of working behavior in this case, since the kernel's response to a zero-second sleep has already changed a few times over the course of the last decade. And, according to Thomas Gleixner, it is hard to know when the special case applies or what should be done.
Thomas worries that there may be calls for special cases for similar calls - single-nanosecond calls to nanosleep(), for example - and that the result will be an accumulation of cruft in the core timer code. So, rather than try to define these cases and maintain the result indefinitely, he thinks it is better just to let the affected code break in cases where the timer slack has been set to a large value. And that is where the discussion faded away, suggesting that nothing will be done in the kernel to reduce the effect of timer slack on zero-second sleeps.
| Index entries for this article | |
|---|---|
| Kernel | Development model/User-space ABI |
| Kernel | hrtimer |
| Kernel | Timers |
Posted Feb 23, 2012 5:24 UTC (Thu)
by xorbe (guest, #3165)
[Link] (1 responses)

What types of applications break when sleep(0) just returns? ...
Posted Feb 23, 2012 13:43 UTC (Thu)
by vonbrand (subscriber, #4458)
[Link]
Or better, use something like a slack of 10% of the incoming value + 1ns.
Posted Feb 23, 2012 13:55 UTC (Thu)
by Ben_P (guest, #74247)
[Link] (1 responses)
I've seen so many poorly written programs "fix" concurrency problems with yields that I'm quite cynical whenever I see them in code. Outside of some very primitive concurrency or I/O code, yields only seem to delay incorrect code from breaking.
Posted Feb 24, 2012 23:02 UTC (Fri)
by giraffedata (guest, #1954)
[Link]
It seems like you've answered your own question.
I have no trouble believing that these programs you've seen worked better after sleep(0) was added than before. Maybe it's just within a narrow field of application, but that may be the only field that matters. You might say these programs don't deserve to keep working, even in that narrow application, but you can't deny that making sleep() a no-op would do damage there.
Posted Feb 24, 2012 0:29 UTC (Fri)
by cmccabe (guest, #60281)
[Link] (4 responses)
Posted Feb 25, 2012 21:08 UTC (Sat)
by nevets (subscriber, #11875)
[Link] (2 responses)
If you have a set of threads all at the same priority, running FIFO and pinned to the same CPU, you can use sched_yield() to put yourself behind the other threads with the same priority and let them work. I've been on one project that did this.
The kernel's stop_machine mechanism used to do this: it used yield() to let its other threads (all running at the highest FIFO priority) get the CPU. It worked this way up until v2.6.26; after that, the algorithm was changed.
Posted Feb 26, 2012 22:48 UTC (Sun)
by cmccabe (guest, #60281)
[Link] (1 responses)
Incidentally, I was around for the cooperative multitasking days on Mac OS 6. It was not good. I'm sure there's some rationale for simulating that kind of thing in userspace, but a lot of times it smells like doing something in userspace that you ought to be doing in the kernel.
Posted Feb 27, 2012 12:21 UTC (Mon)
by jengelh (guest, #33263)
[Link]

Ideal glibc implementation of sched_yield / sleep(0):

    void sched_yield(void) {
        fprintf(stderr, "You are a bad developer. Go away.\n");
    }
Posted Mar 1, 2012 13:12 UTC (Thu)
by farnz (subscriber, #17727)
[Link]
There are two problem cases where that's a bad implementation:
Note that the second case is specific to SCHED_FIFO - other scheduling algorithms will preempt a CPU-bound task if something else of the same priority needs the CPU. SCHED_FIFO specifically does not allow that to happen, so you need some sensible mechanism for a task to say "I'm still CPU-bound, but this is an appropriate point to preempt me if another task needs to run".
Posted Feb 24, 2012 8:16 UTC (Fri)
by rvfh (guest, #31018)
[Link] (15 responses)
So the problem is not just sleep(0) then! sleep(1) might sleep several seconds too... Isn't this the first issue to fix? Who decided my sleep(1) could wait several seconds and not just the 1 I coded?
To me the problem is when the sleep requested period is less than the slack value, and that's what I would fix.
Posted Feb 24, 2012 8:45 UTC (Fri)
by dlang (guest, #313)
[Link] (2 responses)

* app dev says it should sleep 1 second
* sys owner says if you sleep, then you may sleep for 5 seconds
Posted Feb 24, 2012 9:29 UTC (Fri)
by rvfh (guest, #31018)
[Link] (1 responses)
What do we do? Either

* sleep for 1 second, as requested, or
* sleep for up to 5 seconds and break the application

I think this calls for a new user-space API, such as:

    unsigned int sleep_slack(unsigned int seconds, unsigned int slack);

But sleep's behaviour should not be changed.
Posted Feb 24, 2012 9:55 UTC (Fri)
by tglx (subscriber, #31301)
[Link]
The kernel does not change sleep() behaviour. It's the sysadmin's choice to set the slack to something large. The kernel provides the mechanism, but not the policy.
Posted Feb 24, 2012 10:20 UTC (Fri)
by anselm (subscriber, #2796)
[Link] (9 responses)
> Who decided my sleep(1) could wait several seconds and not just the 1 I coded?

The person who wrote the spec for sleep(), which says, among other things:

> The suspension time may be longer than requested due to the scheduling of other activity by the system.

So if you believe that »sleep(1)« will sleep for exactly one second, you are mistaken about how sleep() works.
Posted Feb 24, 2012 22:57 UTC (Fri)
by giraffedata (guest, #1954)
[Link]

I think it's more basic than the documented function of sleep(). In a non-realtime timeshared OS, the OS can take several seconds from you any time it wants, whether you did a sleep() or not. If you get to run at all, you should be grateful.
Posted Feb 26, 2012 10:44 UTC (Sun)
by IkeTo (subscriber, #2122)
[Link] (7 responses)
Nobody has any doubt about "sleep(1)" sleeping 1.01 seconds, or sleeping two whole days if the user suspends the computer. But that is a different proposition from expecting that "sleep(1)" will regularly sleep 10 seconds on a reasonably loaded system. As a developer, if I know that my program will not behave as it should when it sleeps 10 seconds instead, what other options do I have?
Posted Feb 27, 2012 10:50 UTC (Mon)
by mpr22 (subscriber, #60784)
[Link] (2 responses)

That would depend on whether your program breaking when the delay is 10 seconds instead of 1 second is justifiable. If it is, you'll just have to document that the user needs to turn down the timer slack setting on their system. If it isn't, fix your buggy program.
Posted Feb 27, 2012 15:51 UTC (Mon)
by fuhchee (guest, #40059)
[Link] (1 responses)
So a single systemwide knob has to be fixed by the user's sysadmin? That doesn't seem appropriate, just to retain previous capability.
Posted Feb 27, 2012 19:05 UTC (Mon)
by dlang (guest, #313)
[Link]
Posted Mar 1, 2012 5:24 UTC (Thu)
by kevinm (guest, #69913)
[Link] (2 responses)
A program that calls sleep(n) must already expect to sleep for at least n seconds. The timer-slack is just making these bugs more visible.
Posted Mar 3, 2012 2:05 UTC (Sat)
by IkeTo (subscriber, #2122)
[Link] (1 responses)
With the timer slack, all of a sudden users will see the timer updated only once every fifteen seconds, and the final alert will be similarly late. No user will miss such a "bug".
Now what options do I have?
1. I can ask the user to setuid-root the program so that it can use real-time scheduling, hoping that they have root privileges, and making every security-sensitive user raise an eyebrow.
2. I can ask the user to change the cgroup-wide timer slack value, hoping that they have root privileges, and making the whole system waste energy until the user or admin remembers to reset the value, because every process is now sleeping longer than is optimal.
3. I can stop sleeping altogether and instead run a busy loop at a very high nice level. That seems drastic: it wastes a processor, wastes power, and pushes the system load to 1. But in a sense it is the best solution, because it affects the system only while the stopwatch runs and needs no root privileges.
How does that sound?
Posted Mar 7, 2012 17:22 UTC (Wed)
by mpr22 (subscriber, #60784)
[Link]
4. Write your program with a client/daemon architecture. The daemon can be activated as root by the system's daemon-managing services, then drop its privileges once it has given itself a real-time scheduling class. The client connects to the daemon via a socket, then sits in a blocking read() waiting for the once-a-second heartbeat packets from the daemon. If the daemon doesn't currently have any clients, it can just sit in a blocking accept() call until one shows up. Admittedly this stops people on machines they don't administer from installing and using your application. However, if the user isn't trusted to have administrative access to the system, they probably shouldn't be self-installing applications that require policy violations to work as expected anyway.
Posted Mar 9, 2012 8:41 UTC (Fri)
by Thomas (subscriber, #39963)
[Link]
Posted Feb 24, 2012 10:26 UTC (Fri)
by mpr22 (subscriber, #60784)
[Link]

> Who decided my sleep(1) could wait several seconds and not just the 1 I coded?

Linus, by virtue of deciding in 1991 that his new kernel would be an ordinary preemptively multitasking kernel, rather than something more exotic. sleep() has always had the property on Unix-like OSes that your process might sleep longer than you expect.
Posted Mar 1, 2012 13:29 UTC (Thu)
by slashdot (guest, #22014)
[Link]
