A safe SCHED_IDLE implementation
In principle, SCHED_IDLE is not that hard to implement. The problem, of course, is the classic priority inversion trap. If a SCHED_IDLE process acquires an important shared resource, such as an internal filesystem semaphore, there is no way to know how long it may have to wait before it can run long enough to release that resource. A SCHED_IDLE process can be preempted at any time by a higher-priority process, and it could then keep needed resources unavailable indefinitely; any other process that needs those resources, whatever its own priority, must wait as well. Priority inversion problems can arise on their own, but this situation could also be created intentionally as a denial-of-service attack.
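The scenario is easy to reproduce in a toy model. The following C sketch is a hypothetical single-CPU simulation, not kernel code; the task names, the two-level priority scheme, and the function ticks_until_release() are all invented for illustration. A SCHED_IDLE-like task holds a lock that a normal task is waiting for; as long as some other normal task stays runnable, the holder never gets the CPU and the lock is never released.

```c
#include <stdbool.h>

/* Hypothetical toy model, not kernel code: one CPU, three tasks. */
enum task_id { HOG, WAITER, IDLE_HOLDER, NTASKS };

struct toy_cpu_task {
    int prio;        /* larger value = more important */
    bool runnable;
};

/* Pick the highest-priority runnable task, or -1 if none is runnable. */
static int pick_highest(const struct toy_cpu_task *t, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (t[i].runnable && (best < 0 || t[i].prio > t[best].prio))
            best = i;
    return best;
}

/*
 * IDLE_HOLDER holds a lock that WAITER needs; HOG is a CPU-bound
 * normal task that is always runnable when present. The holder needs
 * `work_left` ticks of CPU time before it releases the lock.
 * Returns the tick count at which the lock was released, or -1 if it
 * was still held after `max_ticks` ticks.
 */
int ticks_until_release(int max_ticks, int work_left, bool hog_present)
{
    struct toy_cpu_task t[NTASKS] = {
        [HOG]         = { .prio = 1, .runnable = hog_present },
        [WAITER]      = { .prio = 1, .runnable = false }, /* blocked on the lock */
        [IDLE_HOLDER] = { .prio = 0, .runnable = true },  /* SCHED_IDLE-like */
    };
    for (int tick = 0; tick < max_ticks; tick++) {
        /* The holder only makes progress on ticks where it is chosen. */
        if (pick_highest(t, NTASKS) == IDLE_HOLDER && --work_left == 0)
            return tick + 1;   /* lock released; WAITER could now run */
    }
    return -1;
}
```

With a competing CPU-bound task present, ticks_until_release(1000, 3, true) returns -1 however large max_ticks is made; without it, the same call returns 3, the holder's remaining work.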
So far, no solution to this problem has been implemented, so no SCHED_IDLE patch has ever been merged into the kernel. It has been easier simply to ensure that every process occasionally makes a little progress, so that priority inversion problems eventually resolve themselves.
Now Ingo Molnar has posted a patch which, he claims, implements SCHED_IDLE (which he calls SCHED_BATCH) in a safe way. Those who are curious are encouraged to read his posting, which describes the work in far more detail than you will find here.
The fundamental observation behind Ingo's approach is that processes only hold important kernel resources, such as semaphores, when they are running in kernel mode. If a SCHED_BATCH process is preempted while running in user mode, it is safe to set that process aside indefinitely. If, instead, it is running in kernel mode, it must be allowed to finish its work within a reasonable period of time.
So Ingo's patch splits the schedule() call into two variants. schedule_userspace() is called when the preempted process is running in user mode; it implements the full SCHED_BATCH semantics. schedule(), instead, is invoked when the process is in kernel mode; it handles a SCHED_BATCH process like any other normal process. Thus SCHED_BATCH processes effectively have their priorities raised while running in kernel mode.
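The decision described above can be modeled in a few lines. The sketch below is a hypothetical illustration, not code from the patch: rather than two scheduler entry points, it records in a per-task in_kernel flag whether the task was preempted in kernel mode, and toy_pick_next() defers SCHED_BATCH-like tasks only when that flag is clear. The structure, enum, and function names are assumptions made for this example, and the selection order among eligible tasks is simply array order rather than any real fairness policy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policies for this sketch only. */
enum toy_policy { TOY_NORMAL, TOY_BATCH };

struct toy_task {
    enum toy_policy policy;
    bool runnable;
    bool in_kernel;   /* was the task preempted in kernel mode? */
};

/* A batch task in user mode holds no kernel resources, so it may wait. */
static bool may_be_deferred(const struct toy_task *t)
{
    return t->policy == TOY_BATCH && !t->in_kernel;
}

/*
 * Return the index of the next task to run, or -1 if none is runnable.
 * Tasks that may not be deferred (normal tasks, and batch tasks caught
 * in kernel mode) are preferred; deferrable batch tasks run only when
 * nothing else wants the CPU.
 */
int toy_pick_next(const struct toy_task *tasks, size_t n)
{
    int fallback = -1;
    for (size_t i = 0; i < n; i++) {
        if (!tasks[i].runnable)
            continue;
        if (!may_be_deferred(&tasks[i]))
            return (int)i;
        if (fallback < 0)
            fallback = (int)i;   /* remember the first deferrable task */
    }
    return fallback;
}
```

In this model, a batch task preempted in user mode loses to any runnable normal task, but the same task preempted inside a system call competes as an ordinary task, which gives it the chance to release any kernel resource it holds.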
Raising the priority of processes that hold critical resources is a classic response to priority inversion problems. Ingo's patch takes a slightly simpler approach by treating the entire kernel as such a resource. As a result, it will sometimes raise the priority of SCHED_BATCH processes when doing so is not strictly necessary; the approach should be robust, however, and the difference in scheduling behavior would be difficult to measure.
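The cost of this simplification can be seen by comparing two boosting rules side by side. In the hypothetical sketch below, prio_per_resource() models the classic approach of raising priority only while a lock is held, and prio_whole_kernel() models treating all of kernel mode as the resource. The two-level priority constants and the locks_held counter are illustrative assumptions; they are not how the kernel actually represents priorities or lock ownership.

```c
#include <stdbool.h>

/* Hypothetical two-level priorities for this sketch only. */
#define TOY_PRIO_BATCH  0   /* runs only when nothing else is runnable */
#define TOY_PRIO_NORMAL 1

struct toy_batch_task {
    bool in_kernel;   /* currently executing in kernel mode */
    int locks_held;   /* kernel locks currently held (0 in user mode) */
};

/* Classic targeted rule: boost only while a resource is actually held. */
int prio_per_resource(const struct toy_batch_task *t)
{
    return t->locks_held > 0 ? TOY_PRIO_NORMAL : TOY_PRIO_BATCH;
}

/*
 * Whole-kernel rule: any time spent in kernel mode counts as holding
 * "the kernel", so every locked section is covered without tracking
 * individual locks.
 */
int prio_whole_kernel(const struct toy_batch_task *t)
{
    return t->in_kernel ? TOY_PRIO_NORMAL : TOY_PRIO_BATCH;
}
```

The two rules disagree only for a task that is in kernel mode but holds no lock: prio_per_resource() returns TOY_PRIO_BATCH, while prio_whole_kernel() returns TOY_PRIO_NORMAL. That case is the "more than strictly necessary" boost, traded for not having to track every resource individually.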
