Deadlocking the system with asynchronous functions
In early January, Alex Riesen reported some difficulties with USB devices on recent kernels; among other things, it was easy to simply lock up the system altogether. A fair amount of discussion followed before Ming Lei identified the problem. It comes down to the block layer's use of the asynchronous function call infrastructure used to increase parallelism in the kernel.
The asynchronous code is relatively simple in concept: a function that is to be run asynchronously can be called via async_schedule(); it will then run in its own thread at some future time. There are various ways of waiting until asynchronously called functions have completed; the most thorough is async_synchronize_full(), which waits until all outstanding asynchronous function calls anywhere in the kernel have completed. There are ways of waiting for specific functions to complete, but, if the caller does not know how many asynchronous function calls may be outstanding, async_synchronize_full() is the only way to be sure that they are all done.
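The semantics described above can be modeled in user space. What follows is a minimal sketch in Python, not kernel code: the `AsyncPool` class and `demo()` are hypothetical names invented for illustration, and only the method names `async_schedule()` and `async_synchronize_full()` are borrowed from the kernel interface.

```python
import threading


class AsyncPool:
    """Toy user-space model of asynchronous function calls (not kernel code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._outstanding = 0  # calls scheduled but not yet finished

    def async_schedule(self, func, *args):
        """Start func(*args) in its own thread and return immediately."""
        with self._cond:
            self._outstanding += 1

        def runner():
            try:
                func(*args)
            finally:
                # Mark this call as finished and wake any waiters.
                with self._cond:
                    self._outstanding -= 1
                    self._cond.notify_all()

        threading.Thread(target=runner, daemon=True).start()

    def async_synchronize_full(self):
        """Block until every outstanding call, from any caller, is done."""
        with self._cond:
            self._cond.wait_for(lambda: self._outstanding == 0)


def demo():
    pool = AsyncPool()
    results = []
    for n in range(3):
        pool.async_schedule(results.append, n)
    pool.async_synchronize_full()  # all three calls have completed here
    return sorted(results)
```

The point to notice is that the wait is global: it counts every outstanding call, regardless of who scheduled it.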
The block layer in the kernel makes use of I/O schedulers to organize and optimize I/O operations. There are several I/O schedulers available; they can be switched at run time and can be loaded as modules. When the block layer finds that it needs an I/O scheduler that is not currently present in the system, it will call request_module() to ask user space to load it. The module loader, in turn, will call async_synchronize_full() at the end of the loading process; it needs to ensure that any asynchronous functions called by the newly loaded module have completed so that the module will be fully ready by the time control returns to user space.
So far so good, but there is a catch. When a new block device is discovered, the block layer will do its initial work (partition probing and such) in an asynchronous function of its own. That work requires performing I/O to the device; that, in turn, requires an I/O scheduler. So the block layer may well call request_module() from code that is already running as an asynchronous function. And that is where things turn bad.
The problem is that the (asynchronous) block code must wait for request_module() to complete before it can continue with its work. As described above, the module loading process involves a call to async_synchronize_full(). That call will wait for all asynchronous functions, including the one that called request_module() in the first place, which is still waiting for request_module() to complete. Expressed more concisely, the sequence looks like this:
- sd_probe() calls async_schedule() to scan a device asynchronously.
- The scanning process tries to read data from the device.
- The block layer realizes it needs an I/O scheduler, so, in elevator_get(), it calls request_module() to load the relevant kernel module.
- The module is loaded and initializes itself.
- do_init_module() calls async_synchronize_full() to wait for any asynchronous functions called by the just-loaded module.
- async_synchronize_full() waits for all asynchronous functions, including the one called back in the first step, which is itself waiting for the async_synchronize_full() call to complete.
That, of course, is a classic deadlock.
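The cycle can be reproduced in a small user-space model. The sketch below is Python, not kernel code, and rests on several assumptions: `AsyncPool`, `run_demo()`, and the `timeout` parameter are invented for illustration (the timeout exists only so that the demonstration terminates instead of hanging), and the module load is collapsed into a direct function call rather than a trip through user space.

```python
import threading


class AsyncPool:
    """Toy model of async calls; the timeout is an addition for the demo."""

    def __init__(self):
        self._cond = threading.Condition()
        self._outstanding = 0  # calls scheduled but not yet finished

    def async_schedule(self, func, *args):
        with self._cond:
            self._outstanding += 1

        def runner():
            try:
                func(*args)
            finally:
                with self._cond:
                    self._outstanding -= 1
                    self._cond.notify_all()

        threading.Thread(target=runner, daemon=True).start()

    def async_synchronize_full(self, timeout=None):
        """Wait for ALL outstanding calls; return False if the wait timed out."""
        with self._cond:
            return self._cond.wait_for(lambda: self._outstanding == 0, timeout)


def run_demo():
    pool = AsyncPool()
    outcome = {}

    def request_module(name):
        # Stand-in for module loading, which ends with a global wait.
        outcome[name] = pool.async_synchronize_full(timeout=0.5)

    def scan_device():
        # Stand-in for the asynchronous scan: it needs an I/O scheduler.
        request_module("io-scheduler")

    pool.async_schedule(scan_device)
    pool.async_synchronize_full()  # returns once the scan has given up
    return outcome["io-scheduler"]
```

In this model, `run_demo()` returns `False`: the wait inside `request_module()` can never succeed, because the call doing the waiting is itself one of the outstanding calls being waited for. Without the timeout, it would block forever.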
Fixing that deadlock turns out not to be as easy as one would like. Ming suggested that the call to async_synchronize_full() in the module loader should just be removed, and that user space should be taught that devices might not be ready immediately when the modprobe binary completes. Linus was not impressed with this approach, however, and it was quickly discarded.
The optimal solution would be for the module loader to wait only for asynchronous functions that were called by the loaded module itself. But the kernel does not currently have the infrastructure to allow that to happen; adding it as an urgent bug fix is not really an option. So something else needed to be worked out. To that end, Tejun Heo was brought into the discussion and asked to help come up with a solution. Tejun originally thought that the problem could be solved by detecting deadlock situations and proceeding without waiting in that case, but the problem of figuring out when it would be safe to proceed turned out not to be tractable.
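To make the "wait only for the module's own calls" idea concrete, here is a purely hypothetical user-space sketch in Python. Nothing in it reflects an existing kernel interface: `ScopedAsync`, the `owner` argument, and `synchronize_owner()` are all invented names, and the sketch sidesteps the hard part, which is working out who owns each call inside the kernel.

```python
import threading


class ScopedAsync:
    """Hypothetical model: outstanding calls are counted per owner."""

    def __init__(self):
        self._cond = threading.Condition()
        self._outstanding = {}  # owner -> number of unfinished calls

    def async_schedule(self, owner, func, *args):
        with self._cond:
            self._outstanding[owner] = self._outstanding.get(owner, 0) + 1

        def runner():
            try:
                func(*args)
            finally:
                with self._cond:
                    self._outstanding[owner] -= 1
                    self._cond.notify_all()

        threading.Thread(target=runner, daemon=True).start()

    def synchronize_owner(self, owner, timeout=None):
        """Wait only for calls scheduled on behalf of one owner."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._outstanding.get(owner, 0) == 0, timeout
            )


def run_demo():
    pool = ScopedAsync()
    outcome = []

    def module_init():
        # The module schedules some asynchronous work of its own.
        pool.async_schedule("iosched", lambda: None)

    def scan_device():
        module_init()
        # The loader waits for the module's calls, not for the scan itself.
        outcome.append(pool.synchronize_owner("iosched", timeout=5.0))

    pool.async_schedule("sd", scan_device)
    pool.synchronize_owner("sd")
    return outcome[0]
```

Because the scan is counted under a different owner than the module, the wait completes even though it is issued from inside an asynchronous call.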
The solution that emerged instead is regarded as a bit of a hack by just about everybody involved. Tejun added a new process flag (PF_USED_ASYNC) to mark when a process has called asynchronous functions. The module loader then tests this flag; if no asynchronous functions are called as the module is loaded, the call to async_synchronize_full() is skipped. Since the I/O scheduler modules make no such calls, that check avoids the deadlock in this particular case. Obviously, the problem remains in any case where an asynchronously-loaded module calls asynchronous functions of its own, but no other such cases have come to light at the moment. So it seems like a workable solution.
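The effect of the flag can be illustrated with a user-space model. This is a Python sketch based only on the description above, not on the patch itself: `AsyncPool`, `run_scan()`, and the helper methods are hypothetical, a thread-local boolean stands in for PF_USED_ASYNC, the module load is modeled as a direct call in the scanning thread, and the timeout exists only so that the remaining deadlock case terminates.

```python
import threading


class AsyncPool:
    """Toy model with a per-thread flag standing in for PF_USED_ASYNC."""

    def __init__(self):
        self._cond = threading.Condition()
        self._outstanding = 0
        self._task = threading.local()  # per-thread state, like task flags

    def async_schedule(self, func, *args):
        self._task.used_async = True  # remember that this thread used async
        with self._cond:
            self._outstanding += 1

        def runner():
            try:
                func(*args)
            finally:
                with self._cond:
                    self._outstanding -= 1
                    self._cond.notify_all()

        threading.Thread(target=runner, daemon=True).start()

    def used_async(self):
        return getattr(self._task, "used_async", False)

    def clear_used_async(self):
        self._task.used_async = False

    def async_synchronize_full(self, timeout=None):
        with self._cond:
            return self._cond.wait_for(lambda: self._outstanding == 0, timeout)


def run_scan(module_init):
    """Load a module from inside an async call; report whether loading completed."""
    pool = AsyncPool()
    outcome = []

    def load_module():
        pool.clear_used_async()
        module_init(pool)
        if pool.used_async():
            # The module scheduled async work: fall back to the global wait.
            outcome.append(pool.async_synchronize_full(timeout=0.5))
        else:
            outcome.append(True)  # nothing scheduled, so the wait is skipped

    pool.async_schedule(load_module)
    pool.async_synchronize_full()
    return outcome[0]


def quiet_init(pool):
    pass  # like an I/O scheduler module: no asynchronous calls


def busy_init(pool):
    pool.async_schedule(lambda: None)  # a module that does use async calls
```

In this model, `run_scan(quiet_init)` returns `True` because the wait is skipped, while `run_scan(busy_init)` returns `False`: the flag is set, the global wait is attempted, and the original cycle reappears.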
Even so, Tejun remarked "It makes me feel dirty but makes the problem go away and I can't think of anything better". The patch has found its way into the mainline and will be present in the 3.8 final release. By then, though, it would not be entirely surprising if somebody else were to take up the task of finding a more elegant solution for a future development cycle.
