The right way to yield

[Posted May 7, 2003 by corbet]

The kernel list still sees occasional complaints about the interactive response of recent development kernels. Many of these complaints, it turns out, relate to OpenOffice. The specific problem in this case has been found: a combination of a change in sched_yield() semantics and, one might say, suboptimal programming in OpenOffice.

The purpose of sched_yield() is to temporarily give up the processor to other processes. The process calling sched_yield() remains in the runnable state, and can normally expect to run again in short order. The 2.5 development series has made a subtle change in the way sched_yield() works, however. This call used to simply move the process to the end of the run queue; now it moves the process to the "expired" queue, effectively cancelling the rest of the process's time slice. So a process calling sched_yield() now must wait until all other runnable processes in the system have used up their time slices before it will get the processor again.

The new semantics arguably make more sense; a process calling sched_yield() should truly give up the processor. Some threaded applications, however, implement busy-wait loops with sched_yield(). OpenOffice is one such application; LinuxThreads also, apparently, uses this technique. This kind of application performs poorly with the new yield semantics; being moved to the "expired" queue makes the loop far less responsive.
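For illustration, a busy-wait loop of the kind described above might look like the following C sketch. The helper name wait_by_yielding() and the flag it polls are hypothetical; this is not code taken from OpenOffice or LinuxThreads. Under the 2.5 semantics, each pass through the loop can cost the caller the rest of its time slice.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper, not taken from any real application.
 * Polls `flag` and yields the processor between polls.
 * Returns the number of times sched_yield() was called. */
unsigned long wait_by_yielding(atomic_bool *flag)
{
    unsigned long yields = 0;

    while (!atomic_load(flag)) {
        sched_yield();   /* give up the CPU, then poll again */
        yields++;
    }
    return yields;
}
```

If the flag is already set, the function returns without yielding at all; otherwise its latency depends entirely on how soon the scheduler runs the caller again.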

There has been talk of ways of changing sched_yield() so that OpenOffice and other applications are not so badly penalized. One approach, for example, preserves the application's time slice, but drops its priority slightly. The consensus, however, seems to be that applications that loop on sched_yield() are simply broken and should be fixed. In the case of OpenOffice, this fix has apparently already been made.

The right way to yield

Posted May 8, 2003 4:08 UTC (Thu) by ncm (guest, #165) [Link] (3 responses)

Would somebody please post what the proper fix to code that uses sched_yield looks like? Can we LD_PRELOAD a library that maps sched_yield() calls to that fix, so that existing binaries aren't broken?

While every description I've read of this change describes it as "subtle", it strikes me not as subtle, but heavy-handed. Calling sched_yield() has always been the polite thing to do; now I don't know when I would use it at all. Is there now some other polite way to express what sched_yield() used to mean? When would I use sched_yield?

The right way to yield

Posted May 8, 2003 6:58 UTC (Thu) by Xman (guest, #10620) [Link] (1 responses)

As far as I can see, sched_yield() still means the same thing it always did: give my time to someone else. The problem comes if you do a busy wait using sched_yield(). I would think the right way to do this would be not to busy-wait at all, but rather to use a condition variable. I'd love to see what the specific fix/suggestion for OpenOffice was, though.
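A minimal sketch of the condition-variable approach, using POSIX threads, might look like this. The struct and function names are hypothetical, and this is not the actual OpenOffice fix, which is not shown in this thread.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state: a flag guarded by a mutex and a
 * condition variable. */
struct ready_flag {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            ready;
};

#define READY_FLAG_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }

/* Waiter: blocks in pthread_cond_wait() until the flag is set,
 * instead of polling and calling sched_yield(). */
void ready_flag_wait(struct ready_flag *f)
{
    pthread_mutex_lock(&f->lock);
    while (!f->ready)   /* re-check: wakeups can be spurious */
        pthread_cond_wait(&f->cond, &f->lock);
    pthread_mutex_unlock(&f->lock);
}

/* Signaller: sets the flag and wakes every waiter. */
void ready_flag_set(struct ready_flag *f)
{
    pthread_mutex_lock(&f->lock);
    f->ready = true;
    pthread_cond_broadcast(&f->cond);
    pthread_mutex_unlock(&f->lock);
}
```

One thread calls ready_flag_wait() while another calls ready_flag_set() once the work is done. The waiter is not runnable while it waits, so the scheduler's handling of sched_yield() no longer matters.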

The right way to yield

Posted May 8, 2003 18:32 UTC (Thu) by spitzak (guest, #4593) [Link]

I don't think it is just busy-waiting; it is actually doing background
calculations. I would also like somebody to post what the "replacement
code" is. I did not know about sched_yield(), but if I had, I certainly
would have used it the same way OpenOffice did, and would have had trouble
with this change. Currently I use select() with a zero timeout, or do
nothing at all, when I have a big calculation and want to indicate that it
is not really important that it be done as fast as possible.

The right way to yield

Posted May 16, 2003 0:43 UTC (Fri) by r6144 (guest, #3443) [Link]

I really don't think there are many places where sched_yield() is truly useful. If you want to wait for something, first try the correct primitive, and yield only when there's no such primitive (what do you do when fork() fails and you want to wait?). As for OpenOffice, I LD_PRELOAD'd a library that reduces sched_yield() (and nanosleep()) to NOPs, and all the problems disappeared; there doesn't seem to be excessive CPU usage or anything like that.
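A preload library of that kind could be as small as the sketch below. This is a reconstruction under assumptions, not the library actually used; the file and library names are hypothetical, and only sched_yield() is overridden here (nanosleep() could be handled the same way).

```c
/* yield_nop.c (hypothetical file name)
 *
 * Assumed build and use, with GCC on Linux:
 *   gcc -shared -fPIC -o libyieldnop.so yield_nop.c
 *   LD_PRELOAD=./libyieldnop.so some_program
 */
#include <sched.h>

/* Replaces the C library's sched_yield(): reports success
 * without asking the kernel to reschedule the caller. */
int sched_yield(void)
{
    return 0;
}
```

Because the dynamic linker resolves symbols from preloaded libraries first, existing binaries pick up this definition without being rebuilt. A loop that relied on yielding then becomes a pure spin, so this is a workaround rather than a fix.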

The right way to yield

Posted May 8, 2003 19:17 UTC (Thu) by dsime (guest, #5764) [Link]

I never thought of yield as giving up time; rather, it was giving up the CPU. Sort of a polite way to not hold someone else up while waiting for a short delay.
An implementation of LWT, if you will. The current implementation seems little different from a timeslice expiration. If so, it would be better for the application to just "busy wait"; then at least you may still have time left when the event you are waiting for happens.

2024, Nginx, still calling sched_yield in a loop

Posted Sep 20, 2024 22:48 UTC (Fri) by gabriel_clima (subscriber, #166516) [Link]

Nginx still does this: it calls sched_yield() in an infinite for loop. It does try to avoid it, making a couple thousand attempts at atomic_cmp_set first, but it ultimately gives in and calls sched_yield().
That's their implementation of `ngx_rwlock_wlock(ngx_atomic_t *lock)`.
Woe, woe upon the hapless dilettante programmer electing to use this write lock function.
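The general shape of such a spin-then-yield lock can be sketched with C11 atomics as follows. This is not nginx's code: the function names, the SPIN_ATTEMPTS value, and the lock encoding (0 for free, 1 for write-held) are all assumptions made for illustration.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_ATTEMPTS 2048   /* hypothetical spin count */
#define WLOCK_HELD    1      /* hypothetical "write-locked" value */

/* One attempt to move the lock from free (0) to write-held. */
static bool try_wlock(atomic_int *lock)
{
    int expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, WLOCK_HELD);
}

/* Spin for a bounded number of attempts, then yield and start over.
 * The outer loop only ends when the lock has been acquired. */
void spin_yield_wlock(atomic_int *lock)
{
    for (;;) {
        for (int i = 0; i < SPIN_ATTEMPTS; i++) {
            if (atomic_load(lock) == 0 && try_wlock(lock))
                return;
        }
        sched_yield();   /* the call the article warns about looping on */
    }
}

/* Release the lock so another spinner can take it. */
void spin_yield_wunlock(atomic_int *lock)
{
    atomic_store(lock, 0);
}
```

With the yield semantics described in the article, a contended caller of a lock like this may wait until every other runnable process has used up its time slice before it gets to try again.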