Kernel code often needs to set aside a task to be performed "a little
later." The classic example is that of an interrupt handler, which must
perform its task quickly, without blocking. Typically interrupt handlers
simply acknowledge the interrupt, then arrange for the real work to be done
outside of interrupt context. That work, which can include starting new
I/O operations, delivering data to user space, or cleanup actions, gets
done when the kernel gets around to it - and, usually, when it's safe to
do so.
In the good old days, the "bottom half" mechanism was used to set aside
tasks in this manner. Linux bottom halves were quite inflexible, being
identified by globally-unique, compile-time numbers. There could be no
more than 32 of them - the number that could be tracked in a single-word
bitmask. And bottom halves were not safe places for extended processing or
tasks that needed to sleep.
More recent kernels moved much of the bottom half work to "task queues." A
task queue is a simple linked list of functions to call (and data to pass
to them). Certain predefined task queues were run at well-defined times;
one was executed whenever the scheduler was called, and another was run out
of the timer interrupt handler. Task queues cleaned things up
significantly, but they were not particularly transparent and,
fundamentally, they were still bottom halves. Their removal has been on
numerous people's "todo" lists for some time.
One replacement for task queues is the "tasklet" interface, which was
introduced in the 2.3 development series. Tasklets provide a
high-performance interface for quick tasks that do not sleep; they are thus
suitable for certain sorts of operations, but they do not replace task
queues in all situations.
More recently, an attempt was made to address other deferred processing
needs by wrapping a new interface (schedule_task()) around (what
was) the scheduler task queue, and creating a special kernel thread
(keventd) to run that queue. keventd provided a
well-defined process context for tasks that need it (in particular, those
which can sleep). But keventd still suffered the limitations of
task queues, plus one other: all tasks were executed by a single thread.
One very slow task could thus hold up everything else in the queue,
creating unpredictable latencies.
A couple of patches recently posted by Ingo Molnar address these problems
and clean up deferred processing substantially. The first patch removes the task
queue interface and converts its remaining users over to
schedule_task(); this patch was included in 2.5.40. The more
interesting work is contained in the workqueue
patch (since updated),
which has not yet (as of this writing) been merged by Linus.
This patch replaces the task queue mechanism (and schedule_task())
entirely with a mechanism which is simpler to use and which yields more
predictable latencies.
With the workqueue patch, task queues are replaced with the new "workqueue"
concept. The basic idea is the same: a workqueue is a linked list of
structures containing functions to call and data to pass to them. But the
internals of workqueues are better hidden so that users need not worry
about what is really going on. Workqueues are executed in process context,
so tasks executed from those queues may sleep. Each workqueue, however,
has its own worker threads (one per CPU), so one subsystem's workqueue will
not block others from running. There is a default workqueue (analogous to
the old schedule_task() functionality) for relatively simple tasks
that do not justify their own queue.
For those who are interested, we have written up a separate article with reasonably complete
documentation of the workqueue interface.
There has been a bit of discussion over the details of this
interface. It has been through one set of modifications already, and will
likely evolve more in the near future.
The basic idea, however, appears to have been well
received; some version of this patch will probably go in before too long.