Slow probing + udev + SIGKILL = trouble

By Jonathan Corbet
September 9, 2014
A few years back, kernel developers briefly tried to make all device probing into an asynchronous activity. The resulting complications proved to be hard enough to resolve that the effort was deemed to not be worthwhile and the change was backed out. Now asynchronous probing is back on the table. The idea is not being received entirely warmly, but the problem it is trying to solve is real — and slightly strange.

Things started in November 2013, when Tetsuo Handa wrote a patch to make kthread_create(), a kernel function which creates and runs a kernel thread, killable (meaning that it will exit on a SIGKILL signal). Prior to this change, any process that was running in kthread_create() would be temporarily immune to SIGKILL; in particular, it would not respond if the out-of-memory (OOM) killer decided that it was in need of termination. While there is room for disagreement with the heuristics by which the OOM killer chooses victims, few developers believe that those victims, once chosen, should remain in the system for an arbitrary amount of time. With Tetsuo's change, a process that had caused the creation of a kernel thread would no longer be immune to the OOM killer's wrath.

Interestingly, this patch caused some systems to fail to boot. A number of storage-system drivers require kernel threads to complete the process of probing for storage arrays. This probing process can involve a fair amount of work, to the point that it can take a minute or so to run. But it seems that systemd-udev does not have unlimited patience; it starts a 30-second timer (reduced from three minutes last year) when loading a device module, and kills the loading process (with SIGKILL) should that timer expire. So the process trying to probe the storage array is killed, the array assembly fails, and the system does not boot. Prior to Tetsuo's change, the signal would have been ignored during the probing process; afterward, it became fatal. Other types of drivers, such as those that must go through a lengthy firmware-downloading exercise, can also be affected by this problem.
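The failure mode can be reproduced in miniature from user space. The sketch below is not systemd-udev's code; it assumes a child Python process standing in for the module-loading process, and the function name `load_with_timeout` and the short timeouts are purely illustrative. A "probe" that outlasts the timer is killed with SIGKILL, while a fast one exits normally.

```python
import signal
import subprocess
import sys


def load_with_timeout(probe_seconds, timeout_seconds):
    """Run a stand-in 'module loader' and SIGKILL it if it outlasts the timer.

    Returns the child's return code; on POSIX a child killed by a signal
    reports the negated signal number.
    """
    # The child only sleeps; it stands in for a slow device probe.
    child = subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({probe_seconds})"]
    )
    try:
        child.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        child.kill()   # sends SIGKILL on POSIX systems
        child.wait()   # reap the killed child
    return child.returncode


if __name__ == "__main__":
    # A 5-second "probe" against a 0.2-second timer is killed.
    print(load_with_timeout(5, 0.2) == -signal.SIGKILL)
```

Before Tetsuo's change, the kernel-side equivalent of the sleeping child would not have noticed the signal until probing had finished.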

The resulting discussions were spread out across multiple lists and bug trackers and thus were somewhat hard to follow. Kernel developers seemed to be generally of the opinion that a hard-coded 30-second timeout in systemd-udev is not a good idea, and that the problem should be fixed there. The systemd developers believe that any module taking more than 30 seconds to load is simply buggy and should be fixed. Tetsuo suggested that kthread_create() could delay its exit for ten seconds on SIGKILL if that signal originates anywhere other than the OOM killer. None of these ideas have found a consensus or led to a solution to the problem.

Of course, there is the option of simply increasing the timeout in the configuration files, something that was done by the device mapper developers in response to a similar problem. But that approach strikes nobody as being particularly elegant.

There is one other way around this difficulty: device drivers could, at module load time, simply register themselves and do only the work that can be completed quickly. Any time-consuming work would be pushed off into a separate thread to run asynchronously, after module loading is done and systemd-udev has left the vicinity. This type of asynchronous initialization might also have the effect of improving boot times by allowing other work to happen while the slow probing work is being done.

To this end, Luis Rodriguez has posted a patch set adding asynchronous probing to the driver core. The patches add a new field (async_probe) to struct device_driver; if that field has a true value, probing for devices will happen asynchronously by way of a workqueue. Three drivers (pata_marvell, cxgb4, and mptsas) were modified to request the new asynchronous behavior. There is also a variant of Tetsuo's 10-second-delay patch that is primarily intended to put a warning into the system log should a non-OOM-killer SIGKILL show up in kthread_create(); it is there to help identify other drivers that need to be converted to the asynchronous mode.
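As a rough user-space model of that dispatch (not the patch itself), the sketch below assumes a toy "driver core" whose workqueue is a Python thread pool. The class names `ToyDriver` and `ToyDriverCore` are invented for illustration; only the `async_probe` flag name comes from the patch set described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyDriver:
    """Hypothetical stand-in for a driver with an async_probe flag."""

    def __init__(self, name, probe_seconds, async_probe=False):
        self.name = name
        self.probe_seconds = probe_seconds
        self.async_probe = async_probe
        self.bound = False

    def probe(self):
        time.sleep(self.probe_seconds)  # stand-in for slow hardware probing
        self.bound = True


class ToyDriverCore:
    """Hypothetical driver core; the thread pool plays the role of a workqueue."""

    def __init__(self):
        self._workqueue = ThreadPoolExecutor(max_workers=2)

    def register(self, driver):
        """Probe inline, or defer to the workqueue if the driver asks for it.

        Returns a future when the probe was deferred, otherwise None.
        """
        if driver.async_probe:
            return self._workqueue.submit(driver.probe)
        driver.probe()
        return None


if __name__ == "__main__":
    core = ToyDriverCore()
    slow = ToyDriver("slow_array", probe_seconds=0.5, async_probe=True)
    pending = core.register(slow)   # returns at once; probing continues
    print("bound right after register:", slow.bound)
    pending.result()                # wait for the deferred probe
    print("bound after probe finished:", slow.bound)
```

In this model, registration of an asynchronous driver returns while its devices are still unbound, which is exactly the behavior change the objections below are concerned with.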

Tejun Heo, who, among other things, maintains the serial ATA layer, was not fond of this approach. His opposition, in the end, comes down to two issues:

  • Any driver can potentially exhibit this problem. Taking a whack-a-mole approach by tweaking drivers with reported issues is thus the wrong way to solve the problem — there will always be more drivers that still need fixing.

  • Linux systems have always managed to have locally attached storage devices available when module loading completes. Moving to an asynchronous mode is thus a user-visible behavior change; it could well break administrative scripts that expect storage arrays to be ready for mounting immediately after the driver has been loaded.

The second issue is the more controversial of the two. Modern distributions tend not to make assumptions about when devices will show up in the system, so some developers argue that there should no longer be any problems in this area. But old administrative scripts can hang around for a long time. So the risk of breaking real-world systems with this kind of change is real, even if it is not clear how many systems might be affected.

Of course, other real-world systems are broken now, so something needs to be done. The most likely outcome would appear to be some sort of asynchronous probing that is done under user-space control; unless user space has explicitly requested it, the behavior would not change. The probable implementation of this approach is a global flag that turns on asynchronous device probing, with one exception. There is a good chance that any kernel code that calls request_module() expects that the requested module's devices will be available when the call returns. So modules loaded via this path will, for now at least, have to be loaded synchronously.
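The decision rule described here is small enough to write down. The following is only a sketch of the policy as the article describes it; the function and parameter names are hypothetical, and no such interface had been merged at the time of writing.

```python
def probe_asynchronously(global_async_enabled, loaded_via_request_module):
    """Decide whether a newly loaded module's devices may be probed asynchronously.

    Both parameter names are hypothetical. Modules pulled in by in-kernel
    request_module() callers stay synchronous, because the caller may expect
    the devices to exist when the call returns.
    """
    if loaded_via_request_module:
        return False
    return global_async_enabled


if __name__ == "__main__":
    for flag in (False, True):
        for via_request in (False, True):
            print(flag, via_request, probe_asynchronously(flag, via_request))
```

With the global flag left off, every combination yields synchronous probing, so existing systems would see no change.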

On distributions where the management of storage arrays is done with distribution-supplied scripts, the "use asynchronous probing" switch could be turned on by default. Others would require some sort of manual intervention. It is not the best resolution that one might imagine, but it might be the best that is on offer in the near future.




Slow probing + udev + SIGKILL = trouble

Posted Sep 11, 2014 8:09 UTC (Thu) by tomegun (guest, #56697) [Link]

The reason for reducing the udev timeout from 180 to 30 seconds no longer applies (it had to do with firmware loading, which is now done entirely by the kernel), so as of yesterday it is back to 180 seconds in systemd git.

If three minutes is still not enough time, we could probably increase it even more. However, there needs to be some sort of timeout in place to avoid (udev) workers hanging around forever.

In the specific case of module loading, having the kernel block for several minutes on some module sounds like a problem worth fixing regardless of whether or when udev gives up waiting. If the only problem with asynchronous probing is that some userspace cannot deal with it (it is still not entirely clear to me whether there are other issues), making this behaviour opt-in at configure time seems like a reasonable solution to me.

Slow probing + udev + SIGKILL = trouble

Posted Sep 11, 2014 8:45 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

I confess that I wrote several scripts in the past that depended on modprobe being synchronous (to load a module for telephony cards). They are probably still in use.

So is it possible to set the 'use async probing' flag on a per-process basis? I'd prefer to leave the global setting as it is and let systemd load its modules asynchronously.

Slow probing + udev + SIGKILL = trouble

Posted Sep 11, 2014 9:27 UTC (Thu) by tomegun (guest, #56697) [Link] (2 responses)

I suppose we could introduce a flag to finit_module() that is off by default; when set, it would indicate that we do not wish to wait for probing.

Slow probing + udev + SIGKILL = trouble

Posted Sep 11, 2014 9:30 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

That's an even better idea, since there'll be no additional global state.

Slow probing + udev + SIGKILL = trouble

Posted Sep 11, 2014 16:46 UTC (Thu) by josh (subscriber, #17465) [Link]

This seems like a great idea, especially since we could very easily make the kernel's automatic module loading (in response to hardware discovery) asynchronous, while preserving the synchronous behavior of manual module loading for scripting purposes.

Slow probing + udev + SIGKILL = trouble

Posted Sep 11, 2014 17:57 UTC (Thu) by sorokin (guest, #88478) [Link]

Speaking from a userspace point of view, both synchronous (for scripting) and asynchronous (for daemons) module loading are useful. In the latter case, a notification over epoll about module-loading completion would be very helpful.

I don't understand why old administrative scripts should be broken. If the current interface is synchronous (I assume it is insmod, am I right?), then leave it as is and create a new async_insmod. Surely, inside the kernel, insmod could be implemented using async_insmod.
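A sketch of what a daemon-side wait on such a completion notification might look like follows. No kernel interface of this kind exists in the discussion above, so everything here is an assumption: the "module load" is a thread that sleeps, the notification is a pipe that becomes readable, and the function names are hypothetical. Python's `selectors.DefaultSelector` is used for the wait; on Linux it is backed by epoll.

```python
import os
import selectors
import threading
import time


def start_async_load(probe_seconds):
    """Start a pretend module load; return an fd that turns readable on completion."""
    read_fd, write_fd = os.pipe()

    def worker():
        time.sleep(probe_seconds)      # stand-in for slow probing
        os.write(write_fd, b"done")    # completion notification
        os.close(write_fd)

    threading.Thread(target=worker, daemon=True).start()
    return read_fd


def wait_for_completion(read_fd, timeout):
    """Block until the load completes or the timeout expires.

    Returns the notification bytes, or None on timeout. The caller keeps
    ownership of read_fd and is responsible for closing it.
    """
    sel = selectors.DefaultSelector()
    sel.register(read_fd, selectors.EVENT_READ)
    try:
        if not sel.select(timeout):
            return None
        return os.read(read_fd, 16)
    finally:
        sel.close()


if __name__ == "__main__":
    fd = start_async_load(0.2)
    print(wait_for_completion(fd, timeout=5))
    os.close(fd)
```

A real daemon would register this descriptor in its existing event loop alongside its other sources rather than blocking on it alone.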

Slow probing + udev + SIGKILL = trouble

Posted Sep 11, 2014 19:22 UTC (Thu) by Alan.Stern (subscriber, #12437) [Link]

Tejun Heo said: "Linux systems have always managed to have locally attached storage devices available when module loading completes."

This isn't quite true. The sd_mod module uses async probing for the time-consuming parts of registering a disk drive, such as reading the drive capacity and the partition table. There is no way to disable this behavior.

Of course, most systems have the sd driver built into the kernel rather than built as a loadable module. But on systems that do load it as a module, probing of attached drives is not complete when the module finishes loading.

Slow probing + udev + SIGKILL = trouble

Posted Sep 15, 2014 18:55 UTC (Mon) by Baylink (guest, #755) [Link] (1 responses)

"systemd-udev is not a good idea"

There. FTFY.

Slow probing + udev + SIGKILL = trouble

Posted Sep 16, 2014 12:29 UTC (Tue) by cortana (subscriber, #24596) [Link]

Please take your pointless trolling elsewhere.

Driver pleads for longer init time

Posted Sep 18, 2014 21:03 UTC (Thu) by dfsmith (guest, #20302) [Link]

If a driver knows it's going to take a long time to init, it should be able to plead the case with whichever process is loading it. Since systemd (apparently) runs a logging system, if the driver logs "modulename_init will take at least X more seconds" then systemd should honor it. Just need to find a standard log format ("modulename_init will complete before abstime=X", etc.).
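To make the proposal concrete, here is a minimal sketch of how a loader might honor such a plea. It assumes the first log format quoted in the comment above is adopted verbatim; the function name `extended_deadline` and the use of plain numeric timestamps are illustrative choices, and nothing like this exists in systemd.

```python
import re

# Matches the hypothetical plea format suggested in the comment above.
PLEA = re.compile(
    r"^(?P<module>\w+)_init will take at least (?P<seconds>\d+) more seconds$"
)


def extended_deadline(now, current_deadline, log_line):
    """Return the loader's deadline after reading one log line.

    A matching plea pushes the deadline out to now + seconds, but never
    pulls an existing deadline earlier. Other lines leave it unchanged.
    """
    match = PLEA.match(log_line.strip())
    if match is None:
        return current_deadline
    return max(current_deadline, now + int(match.group("seconds")))


if __name__ == "__main__":
    line = "mptsas_init will take at least 60 more seconds"
    print(extended_deadline(now=100, current_deadline=130, log_line=line))
```

A real design would also need an upper bound on how much time a driver may request, since otherwise the timeout could be postponed indefinitely.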


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds