A new way to sleep?
[Posted September 17, 2002 by corbet]
A quick look through the kernel source will turn up no end of examples of
code like:
    while (some_condition)
        interruptible_sleep_on(some_queue);
The idea, of course, is to put the process to sleep until something of
interest has happened. The problem with this kind of code is that if the
condition changes (and the wakeup happens) between the two lines of
code above, the process will miss the wakeup and could sleep for far longer
than intended. Because of this inherent race condition, the elimination of
sleep_on() and its variants has been on the kernel hackers' todo
list for some time.
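The race can be made concrete with a small user-space model. This is only a sketch, not kernel code: the names wake_up_model(), sleep_on_model() and lost_wakeup() are invented for illustration, the queue holds at most one waiter, and "sleeping" is represented simply by the task state. The wake_in_window flag chooses whether the event arrives between the test and the sleep, or after the sleep has begun.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's task and wait queue. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task { enum task_state state; };

struct wait_queue { struct task *waiter; };   /* at most one waiter here */

/* Model of a wakeup: only a task already on the queue is affected. */
static void wake_up_model(struct wait_queue *q)
{
    if (q->waiter) {
        q->waiter->state = TASK_RUNNING;
        q->waiter = NULL;
    }
}

/* Model of sleep_on(): the task joins the queue and goes to sleep,
 * but only after the caller has already tested its condition. */
static void sleep_on_model(struct wait_queue *q, struct task *t)
{
    q->waiter = t;
    t->state = TASK_INTERRUPTIBLE;
}

/* Returns true if the task ends up asleep even though the condition it
 * was waiting for has already changed, i.e. the wakeup was lost. */
static bool lost_wakeup(bool wake_in_window)
{
    struct task t = { TASK_RUNNING };
    struct wait_queue q = { NULL };
    bool must_wait = true;              /* plays the role of some_condition */

    if (must_wait) {                    /* first line: test the condition */
        if (wake_in_window) {           /* event arrives between the lines */
            must_wait = false;
            wake_up_model(&q);          /* queue is empty: nobody is woken */
        }
        sleep_on_model(&q, &t);         /* second line: go to sleep */
        if (!wake_in_window) {          /* event arrives after the sleep */
            must_wait = false;
            wake_up_model(&q);
        }
    }
    return t.state == TASK_INTERRUPTIBLE && !must_wait;
}
```

In this model lost_wakeup(true) leaves the task asleep with nobody left to wake it, while lost_wakeup(false) behaves as the author of the loop intended.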
There is a macro (wait_event()) which can be used to sleep safely,
but most code which includes race-free sleeps does so manually with the
following approximate steps:
- Create a wait queue entry (usually with DECLARE_WAITQUEUE).
- Change the process to a state (usually TASK_INTERRUPTIBLE)
which indicates that it is asleep - even though the process is still
running in kernel code.
- Add the current process to a wait queue which will be awakened when
the condition is met.
- Test the condition of interest; if no sleep is necessary, reset the
process state to TASK_RUNNING, remove the wait queue entry,
and get on with the job at hand.
- Otherwise call the scheduler to let some other process run until
somebody wakes the current process up.
- On wakeup, go back to the top and do it all again.
This sequence works because a wakeup will reset the task state to
TASK_RUNNING; this "shorts out" the sleep should the process test
its condition at the wrong time and call the scheduler after the wakeup has
happened.
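The same kind of user-space model can illustrate why the ordering matters. Again, this is a sketch under simplifying assumptions rather than kernel code: wake_up_model(), schedule_would_block() and safe_sleep_blocks() are invented names, the queue holds a single waiter, and the scheduler is reduced to the one rule that matters here, namely that a task marked TASK_RUNNING is not put to sleep.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's task and wait queue. */
enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task { enum task_state state; };

struct wait_queue { struct task *waiter; };   /* at most one waiter here */

/* Model of a wakeup: a queued task is put back into TASK_RUNNING. */
static void wake_up_model(struct wait_queue *q)
{
    if (q->waiter)
        q->waiter->state = TASK_RUNNING;
}

/* Model of the scheduler: the task really sleeps only if it is not
 * marked as running at the moment it calls in. */
static bool schedule_would_block(const struct task *t)
{
    return t->state != TASK_RUNNING;
}

/* Walks through the manual sequence once and returns true if the task
 * would actually have gone to sleep in the scheduler. */
static bool safe_sleep_blocks(bool wake_in_window)
{
    struct task t = { TASK_RUNNING };
    struct wait_queue q = { NULL };
    bool must_wait = true;
    bool blocked = false;

    t.state = TASK_INTERRUPTIBLE;       /* mark asleep while still running */
    q.waiter = &t;                      /* join the wait queue */

    if (must_wait) {                    /* test the condition of interest */
        if (wake_in_window) {           /* event arrives after the test */
            must_wait = false;
            wake_up_model(&q);          /* task is queued, so state is reset */
        }
        blocked = schedule_would_block(&t);   /* call the scheduler */
    }

    t.state = TASK_RUNNING;             /* clean up: reset state, */
    q.waiter = NULL;                    /* leave the wait queue */
    return blocked;
}
```

Here safe_sleep_blocks(true) reports that the task does not sleep, because the wakeup found it on the queue and reset its state before the scheduler was called; safe_sleep_blocks(false) sleeps as expected, waiting for a wakeup that has not yet arrived.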
In many places, the above steps are complicated by the need to release
locks or other resources before invoking the scheduler. The result is a
lot of duplicated (and error-prone) code throughout the kernel - and this
is the "safe" way of doing things.
As part of his 2.5.35-mm1 patch, Andrew
Morton has included a new interface designed to simplify the coding of safe
sleeps. Code using the new API looks like:
    DECLARE_WAIT(queue_entry);
    prepare_to_wait(&wait_queue, &queue_entry, TASK_INTERRUPTIBLE);
    if (condition_not_met)
        schedule();
    finish_wait(&wait_queue, &queue_entry);
The actual series of events that occurs has not really changed; things have
just been packaged inside the new prepare_to_wait() and
finish_wait() functions. The result, though, is code which is
cleaner and more likely to be correct. Now it's just a matter of those
hundreds of sleep_on() calls still in the 2.5 kernel source...