Concurrency-managed workqueues and thread priorities

By Jonathan Corbet
June 22, 2010
The original workqueue code found its way into the mainline without a great deal of discussion or debate; it was a clear improvement over what came before. Tejun Heo's concurrency-managed workqueues (CMWQ) rework has the potential to be a significant improvement as well, but its path toward merging has not been so smooth. The fifth iteration of the patch set is currently under discussion. While a number of concerns have been addressed, others have come out of the woodwork to replace them.

The CMWQ work is intended to address a number of problems with current kernel workqueues. At the top of the list is the proliferation of kernel threads; current workqueues can, on a large system, run the kernel out of process IDs before user space ever gets a chance to run. Despite all these threads, current workqueues are not particularly good at keeping the system busy; workqueues may contain a backlog of work while the CPU sits idle. Workqueues can also be subject to deadlocks if locking is not handled very carefully. As a result, the kernel has grown a number of workarounds and some competing deferred-work mechanisms.

To resolve these problems, the CMWQ code maintains a set of worker threads on each processor; these threads are shared between workqueues, so the system is not overrun with workqueue-specific threads. The special scheduler class once used by CMWQ is long gone, but the code still has hooks into the scheduler which it can use to track which worker threads are actually executing at any given time. If all workqueue threads on a CPU have blocked waiting on some resource, and if there is queued work to do, the CMWQ code will kick off a new thread to work on it. The CMWQ code can run multiple jobs from the same workqueue concurrently - something the current workqueue code will not do. In this way, the CPU is always kept busy as long as there is work to be done.
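
The rule described above can be modeled in a few lines of user-space C. What follows is a minimal sketch and not code from the patch set: the cpu_pool structure and the pool_*() functions are hypothetical names invented for illustration, and the scheduler hooks are reduced to plain function calls. It shows only the decision itself: a worker is started when work is pending and no worker on that CPU is currently running.

```c
#include <assert.h>

/* Hypothetical model of one CPU's shared worker pool; not kernel code. */
struct cpu_pool {
    int nr_workers;  /* worker threads created so far */
    int nr_running;  /* workers currently executing (not blocked, not idle) */
    int nr_idle;     /* workers with nothing to do */
    int nr_pending;  /* queued work items nobody has started yet */
};

/* Hand one pending item to a worker, but only if no worker is running. */
void pool_kick(struct cpu_pool *p)
{
    if (p->nr_pending == 0 || p->nr_running > 0)
        return;
    if (p->nr_idle > 0)
        p->nr_idle--;       /* reuse an idle worker */
    else
        p->nr_workers++;    /* otherwise create a new one */
    p->nr_pending--;
    p->nr_running++;
}

/* A work item was queued on this CPU. */
void pool_queue_work(struct cpu_pool *p)
{
    p->nr_pending++;
    pool_kick(p);
}

/* Stand-in for the scheduler hook: a running worker went to sleep. */
void pool_worker_blocked(struct cpu_pool *p)
{
    assert(p->nr_running > 0);
    p->nr_running--;
    pool_kick(p);
}

/* Stand-in for the scheduler hook: a blocked worker is runnable again. */
void pool_worker_woke(struct cpu_pool *p)
{
    p->nr_running++;
}

/* A running worker finished its item; it takes the next one or goes idle. */
void pool_work_done(struct cpu_pool *p)
{
    if (p->nr_pending > 0) {
        p->nr_pending--;    /* stays running with the next item */
    } else {
        p->nr_running--;
        p->nr_idle++;
    }
}
```

In this model, queueing three items on an empty pool starts a single worker. If that worker then blocks, a second worker is created to pick up the next item rather than leaving the CPU idle with work outstanding.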

The first complaint that came back this time was that many developers had long since forgotten what CMWQ was all about, and Tejun had not put that information into the patch series introduction. He made up for that with an overview document explaining the basics of the code. That led quickly to a new complaint: the lack of dedicated worker threads means that it is no longer possible to change the scheduling behavior of specific workqueues.

There were two variants of this complaint. Daniel Walker lamented the loss of the ability to change the priority of workqueue threads from user space. Tejun has firmly denied that this is a useful thing to be able to do, and Daniel has not, yet, shown an example of where it would be desirable. Andrew Morton, instead, worries about being able to change scheduling behavior from within the kernel; that is something that at least one driver does now. He might be willing to let this capability go, but he's not happy about it:

"Oh well. Kernel threads should not be running with RT policy anyway. RT is a userspace feature, and whenever a kernel thread uses RT it degrades userspace RT qos. But I expect that using RT in kernel threads is sometimes the best tradeoff, so let's not pretend that we're getting something for nothing here!"

Tejun's reply to this concern takes a couple of forms. One is that workqueues are intended for general-purpose asynchronous work, and that is how almost all callers use them. It would be better, he says, to make special mechanisms for situations where they are really needed. To that end, he has posted a simple kthread_worker API which can be used for the creation of special-purpose worker threads. Essentially, one starts by setting up a kthread_worker structure:

    DEFINE_KTHREAD_WORKER(worker);
    /* ... or ... */
    struct kthread_worker worker;
    init_kthread_worker(&worker);

Then, a kernel thread should be set up using the (existing) kthread_create() or kthread_run() utilities, but passing a pointer to kthread_worker_fn() as the actual function to run:

    struct task_struct *thread;

    thread = kthread_run(kthread_worker_fn, &worker, "name" ...);

Thereafter, it's just a matter of filling in kthread_work structures with actual work to be done and queueing them with:

    bool queue_kthread_work(struct kthread_worker *worker,
                            struct kthread_work *work);

So far, there has been no real commentary on this patch.
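
For readers who want a feel for the pattern, here is a user-space analogue built on POSIX threads. It is a sketch under stated assumptions and not the kernel implementation: every model_* name, along with counter_job and count_run(), is hypothetical, and the meaning given to the boolean return value (true only when the item was not already pending) is an assumption about what queue_kthread_work() reports. The shape is the same as above: one dedicated thread runs a generic loop function, and callers hand it work structures to execute in order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct model_work;
typedef void (*model_work_fn)(struct model_work *work);

/* Hypothetical counterpart of struct kthread_work. */
struct model_work {
    model_work_fn fn;
    struct model_work *next;
    bool queued;
};

/* Hypothetical counterpart of struct kthread_worker. */
struct model_worker {
    pthread_mutex_t lock;
    pthread_cond_t more;
    struct model_work *head, *tail;
    bool stop;
};

void model_worker_init(struct model_worker *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->more, NULL);
    w->head = w->tail = NULL;
    w->stop = false;
}

/* Thread body, playing the role of kthread_worker_fn(). */
void *model_worker_fn(void *arg)
{
    struct model_worker *w = arg;
    struct model_work *work;

    pthread_mutex_lock(&w->lock);
    for (;;) {
        while (!w->head && !w->stop)
            pthread_cond_wait(&w->more, &w->lock);
        if (!w->head)
            break;                      /* stop requested, queue drained */
        work = w->head;
        w->head = work->next;
        if (!w->head)
            w->tail = NULL;
        work->queued = false;
        pthread_mutex_unlock(&w->lock);
        work->fn(work);                 /* run the item without the lock */
        pthread_mutex_lock(&w->lock);
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Append work to the worker's list; false if it was already pending. */
bool model_queue_work(struct model_worker *w, struct model_work *work)
{
    bool newly_queued = false;

    assert(work->fn != NULL);
    pthread_mutex_lock(&w->lock);
    if (!work->queued) {
        work->queued = true;
        work->next = NULL;
        if (w->tail)
            w->tail->next = work;
        else
            w->head = work;
        w->tail = work;
        newly_queued = true;
        pthread_cond_signal(&w->more);
    }
    pthread_mutex_unlock(&w->lock);
    return newly_queued;
}

/* Let the thread finish whatever is queued, then wait for it to exit. */
void model_worker_stop(struct model_worker *w, pthread_t thread)
{
    pthread_mutex_lock(&w->lock);
    w->stop = true;
    pthread_cond_signal(&w->more);
    pthread_mutex_unlock(&w->lock);
    pthread_join(thread, NULL);
}

/* Example payload: the work structure is embedded as the first member. */
struct counter_job {
    struct model_work work;
    int runs;
};

void count_run(struct model_work *work)
{
    struct counter_job *job = (struct counter_job *)work;

    job->runs++;
}
```

A caller would initialize the worker, start a thread on model_worker_fn() with pthread_create(), and queue counter_job items; each item runs once per successful queueing. Because the thread belongs to its creator alone, its scheduling policy or priority can be adjusted without affecting anybody else's work, which is the property the shared CMWQ workers give up.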

The other thing which could be done is to associate attributes like priority and CPU affinity with the work to be done instead of with the thread doing the work. That would require expanding the workqueue API to allow this information to be specified; the CMWQ code would then tweak worker threads accordingly when passing jobs to them. At this point, though, it's not clear that there is enough need for this feature to justify the added complexity that it would require.

The CMWQ code certainly adds a bit of complexity already, though it makes up for some of that by replacing the slow work and asynchronous function call mechanisms. Tejun is hoping to drop it into linux-next shortly, and, presumably, to get it merged for 2.6.36. Whether that will happen remains to be seen; core kernel changes can be hard, and this one may not, yet, have cleared its last hurdle.


Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds