A new way to sleep?

A quick look through the kernel source will turn up no end of examples of code like:

    while (some_condition)
        interruptible_sleep_on(some_queue);

The idea, of course, is to put the process to sleep until something of interest has happened. The problem with this kind of code is that if the condition changes (and the wakeup happens) after the test of some_condition but before the call to interruptible_sleep_on(), the process will miss the wakeup and could sleep for far longer than intended. Because of this inherent race condition, the elimination of sleep_on() and its variants has been on the kernel hackers' todo list for some time.
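The race can be made concrete with a small user-space model. The sketch below is not kernel code: struct model, model_wake_up() and sleep_on_style() are names invented for illustration, and the "wakeup" is injected by hand in the window between the test and the sleep rather than arriving from an interrupt or another processor.

```c
#include <assert.h>
#include <stdbool.h>

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

/* Hypothetical single-task model; none of this is a real kernel structure. */
struct model {
    bool must_wait;         /* stands in for "some_condition" */
    bool on_queue;          /* is the task on the wait queue? */
    enum task_state state;
};

/* The waker satisfies the condition, then wakes only tasks already queued. */
static void model_wake_up(struct model *m)
{
    assert(m);
    m->must_wait = false;
    if (m->on_queue)
        m->state = TASK_RUNNING;
}

/*
 * Models "while (some_condition) interruptible_sleep_on(some_queue);".
 * Returns true if the task ends up asleep even though the condition it
 * was waiting for has already been satisfied (a lost wakeup).
 */
bool sleep_on_style(bool wake_in_window)
{
    struct model m = { .must_wait = true, .on_queue = false,
                       .state = TASK_RUNNING };

    if (m.must_wait) {                  /* the condition is tested ...      */
        if (wake_in_window)
            model_wake_up(&m);          /* ... the wakeup fires here ...    */
        m.on_queue = true;              /* ... and only now does the task   */
        m.state = TASK_INTERRUPTIBLE;   /* queue itself and go to sleep     */
    }
    return m.state == TASK_INTERRUPTIBLE && !m.must_wait;
}
```

In this model, sleep_on_style(true) reports a lost wakeup: the waker found nobody on the queue, so nothing is left to wake the task. sleep_on_style(false) leaves the task asleep too, but legitimately, since its condition is still unmet.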

There is a macro (wait_event) which can be used to sleep safely, but most code which includes race-free sleeps does so manually with the following approximate steps:

  • Create a wait queue entry (usually with DECLARE_WAITQUEUE).

  • Change the process to a state (usually TASK_INTERRUPTIBLE) which indicates that it is asleep - even though the process is still running in kernel code.

  • Add the current process to a wait queue which will be awakened when the condition is met.

  • Test the condition of interest; if no sleep is necessary, reset the process state to TASK_RUNNING, remove the wait queue entry, and get on with the job at hand.

  • Otherwise call the scheduler to let some other process run until somebody wakes the current process up.

  • On wakeup, go back to the top and do it all again.

This sequence works because a wakeup will reset the task state to TASK_RUNNING; this "shorts out" the sleep should the process test its condition at the wrong time and call the scheduler after the wakeup has happened. In many places, the above steps are complicated by the need to release locks or other resources before invoking the scheduler. The result is a lot of duplicated (and error-prone) code throughout the kernel - and this is the "safe" way of doing things.
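Why the ordering matters can be checked with a toy user-space model of the steps above. This is a sketch under stated assumptions, not kernel code: struct model, model_wake_up(), safe_sleep_is_stuck() and the wake_point values are invented for illustration, and the final test stands in for schedule(), which is assumed to block only a task whose state is not TASK_RUNNING.

```c
#include <assert.h>
#include <stdbool.h>

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

/* Hypothetical injection points for the wakeup. */
enum wake_point {
    WAKE_NEVER,
    WAKE_BEFORE_QUEUEING,   /* before the task marks itself asleep      */
    WAKE_BEFORE_TEST,       /* after queueing, before the condition test */
    WAKE_AFTER_TEST         /* after the test, before calling schedule() */
};

struct model {
    bool must_wait;         /* the condition of interest */
    bool on_queue;
    enum task_state state;
};

/* The waker satisfies the condition and resets queued tasks to TASK_RUNNING. */
static void model_wake_up(struct model *m)
{
    assert(m);
    m->must_wait = false;
    if (m->on_queue)
        m->state = TASK_RUNNING;
}

/* Returns true if the task is left asleep after its condition was satisfied. */
bool safe_sleep_is_stuck(enum wake_point when)
{
    struct model m = { true, false, TASK_RUNNING };

    if (when == WAKE_BEFORE_QUEUEING)
        model_wake_up(&m);
    m.state = TASK_INTERRUPTIBLE;       /* mark the task as asleep      */
    m.on_queue = true;                  /* add it to the wait queue     */
    if (when == WAKE_BEFORE_TEST)
        model_wake_up(&m);
    if (!m.must_wait) {                 /* test the condition           */
        m.state = TASK_RUNNING;         /* no sleep needed: clean up    */
        m.on_queue = false;
        return false;
    }
    if (when == WAKE_AFTER_TEST)
        model_wake_up(&m);              /* the "wrong time" case        */
    /* schedule() stand-in: a task back in TASK_RUNNING does not block. */
    return m.state != TASK_RUNNING && !m.must_wait;
}
```

Whichever injection point is chosen, the function returns false. A wakeup that arrives before the test is seen by the test; one that arrives after it has already put the task back into TASK_RUNNING, which is the "shorting out" described above.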

As part of his 2.5.35-mm1 patch, Andrew Morton has included a new interface designed to simplify the coding of safe sleeps. Code using the new API looks like:

    DECLARE_WAIT(queue_entry);
    prepare_to_wait(&wait_queue, &queue_entry, TASK_INTERRUPTIBLE);
    if (condition_not_met)
        schedule();
    finish_wait(&wait_queue, &queue_entry);

The actual series of events that occurs has not really changed; things have just been packaged inside the new prepare_to_wait() and finish_wait() functions. The result, though, is code which is cleaner and more likely to be correct. Now it's just a matter of those hundreds of sleep_on() calls still in the 2.5 kernel source...
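Code that must cope with wakeups arriving while its condition is still unmet would presumably wrap the new calls in a loop. The user-space sketch below models that shape; it is an illustration only. The model_* functions and struct wait_model are hypothetical stand-ins, not the kernel's prepare_to_wait() and finish_wait(), and the scripted "spurious" wakeups are an assumption made to exercise the loop.

```c
#include <assert.h>
#include <stdbool.h>

enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct wait_model {
    bool must_wait;         /* the condition of interest                 */
    bool queued;            /* is the task on the wait queue?            */
    enum task_state state;
    int spurious_wakeups;   /* wakeups that arrive with the condition unmet */
};

/* Stand-in for prepare_to_wait(): queue the task if needed, set its state. */
static void model_prepare_to_wait(struct wait_model *m, enum task_state s)
{
    m->queued = true;
    m->state = s;
}

/* Stand-in for finish_wait(): back to TASK_RUNNING and off the queue. */
static void model_finish_wait(struct wait_model *m)
{
    m->state = TASK_RUNNING;
    m->queued = false;
}

/* Stand-in for schedule(): "block" until the scripted waker runs. */
static void model_schedule(struct wait_model *m)
{
    if (m->state == TASK_RUNNING)
        return;                     /* a wakeup already arrived */
    assert(m->queued);              /* only queued tasks can be woken */
    if (m->spurious_wakeups > 0)
        m->spurious_wakeups--;      /* woken, but condition still unmet */
    else
        m->must_wait = false;       /* the real wakeup */
    m->state = TASK_RUNNING;
}

/* Returns how many times the condition had to be tested. */
int model_wait(int spurious_wakeups)
{
    struct wait_model m = { true, false, TASK_RUNNING, spurious_wakeups };
    int passes = 0;

    for (;;) {
        model_prepare_to_wait(&m, TASK_INTERRUPTIBLE);
        passes++;
        if (!m.must_wait)
            break;
        model_schedule(&m);
    }
    model_finish_wait(&m);
    assert(m.state == TASK_RUNNING && !m.queued);
    return passes;
}
```

With no spurious wakeups the condition is tested twice (once before sleeping, once after the real wakeup); each spurious wakeup adds one more trip around the loop.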


Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds