More changes for 2.6.16

Posted Jan 21, 2006 9:45 UTC (Sat) by mingo (subscriber, #31122)
In reply to: More changes for 2.6.16 by nix
Parent article: More changes for 2.6.16

The "infinitely lower priority" scheduling policy would be SCHED_IDLE. The problem with SCHED_IDLE is that i've yet to see a correct implementation.

There are implementations that seem to work for users, but they don't handle one crucial thing: they don't prevent a SCHED_IDLE task from DoS-ing the kernel by holding some critical resource. [*]

E.g. you start a SCHED_IDLE task, it does some processing, calls into the kernel, acquires /tmp's inode->i_mutex and gets scheduled away while holding that mutex. If at that point some other, non-SCHED_IDLE CPU-intensive task is started, it will delay the SCHED_IDLE task indefinitely - and that task is holding a crucial mutex! No other task will be able to access /tmp until the mutex is released, so the system will be essentially DoS-ed.

"Infinitely lower priority" scheduling inevitably leads to "infinitely long scheduling latencies", which inevitably leads to totally new scheduling phenomena that you won't see with the stock scheduler. That's why i implemented the simpler but obviously correct SCHED_BATCH variant. If you want some really non-intrusive processing, set the task to SCHED_BATCH and set its nice level to +19.
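
For illustration, here is a minimal sketch of that last suggestion from userspace, assuming Linux and Python 3.3 or later. The helper name make_non_intrusive is made up for this example; the os calls are the standard library wrappers around sched_setscheduler(2) and setpriority(2).

```python
import os


def make_non_intrusive(pid=0):
    """Put `pid` (0 means the calling process) under SCHED_BATCH at nice +19.

    Hypothetical helper for illustration. Returns (policy, nice) as read
    back from the kernel after the change.
    """
    # SCHED_BATCH is a non-realtime policy, so its static priority must be 0.
    os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    # +19 is the weakest nice level; lowering your own priority needs no privileges.
    os.setpriority(os.PRIO_PROCESS, pid, 19)
    return os.sched_getscheduler(pid), os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    policy, nice = make_non_intrusive()
    print("SCHED_BATCH" if policy == os.SCHED_BATCH else policy, nice)
```

Note that an unprivileged process generally cannot raise its priority again afterwards, so this is best done in the job itself (or in a wrapper that then execs the job) rather than in a long-lived shell.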

[*] years ago i implemented a SCHED_IDLE variant that correctly handles the DoS issue, but it was too intrusive: it needed to hook into the syscall entry and exit path, to set/restore the priority of the SCHED_IDLE task to/from a non-starving priority while it's in kernel mode, to guarantee that kernel mode does not starve.
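
The starvation scenario and the boost-in-kernel-mode fix can be sketched with a toy model. This is not kernel code: it is a one-CPU, tick-based simulation with made-up names (ticks_until_mutex_released, "hog", "idle_task"), assuming a CPU hog that is always runnable at normal priority and an idle-class task that needs a few ticks of CPU inside the kernel before it can release the mutex.

```python
def ticks_until_mutex_released(boost_in_kernel, horizon=1000, critical_section=3):
    """Toy one-CPU model of an idle-class task holding a mutex.

    Returns the tick on which the idle-class task finishes its critical
    section and releases the mutex, or None if that never happens within
    `horizon` ticks.
    """
    remaining = critical_section  # CPU ticks the mutex holder still needs
    last_ran = None
    for tick in range(1, horizon + 1):
        # The hog is always runnable at normal priority. Any task waiting
        # for the mutex is blocked, so it never competes for the CPU here.
        runnable = ["hog"]
        if boost_in_kernel:
            # The fix: while in kernel mode, the idle-class task is treated
            # as a normal-priority task and competes with the hog.
            runnable.append("idle_task")
        # Without the boost, the idle-class task only runs when no normal
        # task is runnable, which never happens while the hog is spinning.

        # Round-robin among equal-priority tasks: skip whoever ran last tick.
        choices = [t for t in runnable if t != last_ran] or runnable
        current = choices[0]
        last_ran = current

        if current == "idle_task":
            remaining -= 1
            if remaining == 0:
                return tick
    return None


if __name__ == "__main__":
    print("strict idle class:", ticks_until_mutex_released(False))
    print("boosted in kernel:", ticks_until_mutex_released(True))
```

With strict "infinitely lower priority" the holder never gets a tick, so the function returns None however large the horizon is; with the boost, the holder shares the CPU with the hog and the mutex is released after a bounded number of ticks.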



More changes for 2.6.16

Posted Jan 22, 2006 20:01 UTC (Sun) by nix (subscriber, #2304)

Oh, blast, I thought I saw a patch (from you!) fly past many moons ago back around 2.6.0 that fixed the priority inversion problem, but if you say that it never existed, then my memory's obviously acting up :) It makes sense that it'd need to hook the syscall entry/exit path, at least to add one conditional, to implement the simple fix that _IDLE tasks are only _IDLE when not in the kernel...

If SCHED_BATCH solves the `my most backgrounded tasks eat ~5% of CPU time no matter what' problem, then good: that's what's really wanted by people running background compute-hogging jobs, after all.

More changes for 2.6.16

Posted Jan 31, 2006 21:46 UTC (Tue) by efexis (guest, #26355)

Isn't the usual thing to give the waiting process's CPU cycles to the idle task that's holding the lock? I thought this happened at a level within the kernel anyway (from what I've read only, I've been nowhere near the code so apologise if I'm talkin out my arse).

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds