|
|
Log in / Subscribe / Register

A safe SCHED_IDLE implementation

A longstanding kernel feature request is a SCHED_IDLE scheduler class. Tasks running as SCHED_IDLE would only run when the processor would otherwise be idle. The "niceness" scheme in the current scheduler does not provide this behavior: even the lowest-priority processes will run sometimes. Users who want to search out encryption keys, model proteins, or search for extraterrestrial life on their systems generally want that work to not take any time from other tasks running on the system. Thus the request for SCHED_IDLE.

In principle, SCHED_IDLE is not that hard to implement. The problem, of course, is the classic priority inversion trap. If a SCHED_IDLE process acquires an important shared resource, such as an internal filesystem semaphore, there is no way to know how long the process may have to wait before it can run long enough to release that resource. A SCHED_IDLE process can be preempted at any time by a higher-priority process; it could then keep needed resources unavailable indefinitely. Priority inversion problems can come up by themselves; this situation could also be brought about intentionally as a denial of service attack.

So far, no solution to this problem has been implemented, so no SCHED_IDLE patch has ever been merged into the kernel. It is easier to simply ensure that every process makes a little progress occasionally so that priority inversion problems resolve themselves.

Now Ingo Molnar has posted a patch which, he claims, implements SCHED_IDLE (which he calls SCHED_BATCH) in a safe way. Those who are curious are encouraged to read his posting, which describes the work in far more detail than you will find here.

The fundamental observation behind Ingo's approach is that processes only hold important kernel resources, such as semaphores, when they are running in kernel mode. If a SCHED_BATCH process is preempted when running in user mode, it is safe to set that process aside indefinitely. If, instead, it is running in kernel mode, it must be allowed to finish it work within a reasonable period of time.

So Ingo's patch splits the schedule() call into two variants. schedule_userspace() is called when the preempted process is running in user mode; it implements the full SCHED_BATCH semantics. schedule(), instead, is invoked when the process is in kernel mode; it will handle a SCHED_BATCH process like any other, normal process. Thus SCHED_BATCH processes essentially have their priorities raised while running in kernel mode.

Raising the priority of processes that hold critical resources is a classic response to priority inversion problems. Ingo's patch takes a slightly simpler approach by treating the entire kernel as such a resource. This patch will raise the priority of SCHED_BATCH processes a bit more than is strictly necessary; the approach should be robust, however, and the difference in scheduling behavior would be difficult to measure.


to post comments

A safe SCHED_IDLE implementation

Posted Jul 5, 2002 19:04 UTC (Fri) by xoddam (subscriber, #2322) [Link] (1 responses)

(You seem to have ommitted <pre> on the post,
which is only really readable if you 'view source').

As discussed, surely it is still possible for a priority
inversion to occur if a SCHED_BATCH process holds a
shared resource in userspace? Processes holding kernel
resources may not be suspended indefinitely, but
deadlocks on locks which can be held while running in
userspace (such as pthreads mutexes or lock files)
will still be possible. I suppose the idea is that
that is a problem to be solved in userspace :-)

A safe SCHED_IDLE implementation

Posted Jul 7, 2002 21:38 UTC (Sun) by Peter (guest, #1127) [Link]

As discussed, surely it is still possible for a priority inversion to occur if a SCHED_BATCH process holds a shared resource in userspace?

Priority inversion can be simulated by merely taking a resource and never releasing it. So if you can't trust your users not to DOS you with a priority inversion, you shouldn't give them permission to lock those shared resources in the first place.

As for unintentional priority inversion - yes, it's a userspace problem. (:

A safe SCHED_IDLE implementation

Posted Jul 12, 2002 22:12 UTC (Fri) by DeletedUser2561 ((unknown), #2561) [Link]

Digital Equipment Corporations (DEC) VAXELN Real Time OS scheduling incorporated JOB and PROCESS Execution thread creation. A master PROCESS could create a single JOB (created by CREATE_JOB) and multiple PROCESS (with CREATE_PROCESS). The JOB is the background thread and runs only when there were no ready/running PROCESS threads. The earlier RSX-IAS OS from the late '70s implmented a similar scheduling algorithm and was "Batch" based.


Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds