
Firmware loading and suspend/resume

By Jonathan Corbet
August 15, 2012
Many devices are unable to function until the host system has loaded them with their operating firmware. Runtime-loadable firmware has some real advantages: the hardware can be a little cheaper to make, and the firmware is easily upgraded after the hardware has been sold. But it also poses some problems, especially when combined with other features. Properly handling firmware loading over suspend/resume cycles has been a challenge for the kernel for some time, but a new set of patches may be poised to make things work better with little or no need for changes to drivers.

The obvious issue with suspend/resume is that any given device may lose its firmware while the system is suspended. The whole point of suspending the system is to reduce its power consumption to a minimum, so that operation may well power down peripheral devices entirely. Loss of firmware during suspend doesn't seem like it should be a big problem; the driver can just load the firmware again at resume time. But firmware tends to live on disk, and the actual firmware loading operation involves the running of a helper process in user space. Neither the disk nor user space is guaranteed to be available at the point in the resume process when a given device wants its firmware back; drivers that attempt to obtain firmware at such times may fail badly. The result is resume failures; they may be of the intermittent, developer-never-sees-it variety that can be so frustrating to track down. So the search for a more robust solution has been on for some time.

In July, Ming Lei tried to address this problem with a patch integrating firmware loading with the deferred driver probing mechanism. In short, if a firmware load fails, the whole driver initialization process would be put on the deferred queue to be retried later on. So, a driver that is unable to load its firmware at resume time will be put on hold and retried at a later point when, hopefully, the resources required to complete the firmware load will be available. That, Ming hoped, would resolve a lot of resume-time failures without requiring changes to lots of drivers.
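The retry behavior described above can be illustrated with a toy user-space model. This is only a sketch of the idea, not Ming's patch or the kernel's deferred-probe code: the device names, the `probe()` function, and the notion of numbered "passes" are all assumptions made up for the example.

```python
from collections import deque


class FirmwareUnavailable(Exception):
    """Raised when firmware cannot be fetched yet (disk or user space not ready)."""


def probe(device, storage_ready):
    """Hypothetical driver probe: it succeeds only if firmware can be read."""
    if not storage_ready:
        raise FirmwareUnavailable(device)
    return f"{device}: firmware loaded"


def resume_with_deferral(devices, storage_ready_after):
    """Probe every device; a failed probe is put back on the queue and retried.

    storage_ready_after is the (made-up) pass number at which the disk and
    user space become usable again.
    """
    pending = deque(devices)
    loaded = []
    passes = 0
    while pending:
        storage_ready = passes >= storage_ready_after
        # Walk the queue once per pass; failures go to the back for later.
        for _ in range(len(pending)):
            device = pending.popleft()
            try:
                loaded.append(probe(device, storage_ready))
            except FirmwareUnavailable:
                pending.append(device)
        passes += 1
    return loaded, passes


if __name__ == "__main__":
    print(resume_with_deferral(["wifi0", "bt0"], storage_ready_after=2))
```

The model assumes that storage eventually comes back by itself. It says nothing about a device that the resume sequence itself depends on, where simply waiting does not help.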

Linus, however, disagreed:

Sure, for a lot of devices it's fine to load the firmware later. But some devices may be part of the resume sequence in very critical ways, and deferring the firmware loading will just mean that the resume will fail.

Deferring firmware loading in this manner, he thought, would just serve to hide problems from developers but leave them to burn users later on. It is much better, he thought, to force driver writers to deal with the problem explicitly.

The classic way for a driver writer to handle this problem is to just keep the firmware around after it is loaded at system boot time. Permanently cached firmware will always be available when it is needed, so firmware loading at resume time should be robust. The problem with that approach is that the firmware blobs loaded into some devices can be quite large; keeping them around forever can waste a fair amount of kernel-space memory. To make things worse, these blobs are loaded into vmalloc() memory (so that they appear to be contiguous in memory); that memory can be in short supply on 32-bit systems. Permanently caching the firmware is, thus, not an ideal solution, but that is what a number of drivers do now.

After the discussion with Linus, Ming thought for a while and came back with a new proposal: cache firmware blobs, but only during the actual suspend/resume cycle. Drivers can, of course, do that now; they can request a copy of the firmware while suspending their devices, and release that copy once it's no longer needed at resume time. But that is a chunk of boilerplate code that would need to be added to each driver. Ming's patch, instead, makes this process automatic and transparent.

In particular, request_firmware() is changed to make a note of the name of every firmware blob it is asked to load. This information is reference-counted and tied to the devices that needed the firmware; it can thus be discarded if all such devices disappear. The result is a simple data structure tracking all of the firmware blobs that may be needed by the hardware currently present in the system.
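As a rough illustration of that bookkeeping, here is a small user-space model. It is a sketch under stated assumptions rather than the data structure used in the patch: the class name, method names, and firmware file names are hypothetical, and a set of device identifiers per firmware name stands in for the reference count.

```python
class FirmwareNameRegistry:
    """Toy model: remember which firmware names the present devices have requested."""

    def __init__(self):
        # firmware name -> set of devices that asked for it;
        # the size of each set plays the role of a reference count.
        self._users = {}

    def note_request(self, device, name):
        """Record that `device` requested the firmware called `name`."""
        self._users.setdefault(name, set()).add(device)

    def device_removed(self, device):
        """Drop the device's references; forget names nobody needs any more."""
        for name in list(self._users):
            self._users[name].discard(device)
            if not self._users[name]:
                del self._users[name]

    def names(self):
        """Firmware names that may be needed by hardware currently present."""
        return sorted(self._users)


if __name__ == "__main__":
    registry = FirmwareNameRegistry()
    registry.note_request("wifi0", "wifi.bin")   # hypothetical names
    registry.note_request("wifi1", "wifi.bin")
    registry.note_request("bt0", "bt.bin")
    registry.device_removed("bt0")
    print(registry.names())
```

Two devices sharing one image keep a single entry alive until both are gone, which is the property that lets the tracking information be discarded safely.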

At system suspend time, the code simply goes and loads every piece of firmware that it thinks may be needed. That data then sits in memory while the system is suspended. At resume time, those cached blobs are available to any driver, with no need for filesystem access or user-space involvement, via the usual request_firmware() interface. Once the resume process is complete, the firmware loader will, after a small delay, release all of those cached firmware images, freeing the associated memory and address space for other uses.
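The life cycle of the cache can be sketched with another toy model. Everything here is an assumption made for illustration: the class, the dictionary standing in for the disk, and the explicit `resume_complete()` call that replaces the delayed release. Only the name `request_firmware` is borrowed from the kernel interface, and the method below is not that function.

```python
class FirmwareLoader:
    """Toy model of caching firmware only across a suspend/resume cycle."""

    def __init__(self, storage):
        self.storage = storage        # name -> bytes; stands in for the disk
        self.storage_online = True    # False while the system is resuming
        self.needed = set()           # names requested so far
        self.cache = {}               # populated only around suspend

    def request_firmware(self, name):
        """Serve from the cache if possible, otherwise from storage."""
        if name in self.cache:
            return self.cache[name]
        if not self.storage_online:
            raise OSError(f"{name}: storage not available")
        self.needed.add(name)
        return self.storage[name]

    def suspend(self):
        """Load every image that may be needed, then lose access to storage."""
        for name in self.needed:
            self.cache[name] = self.storage[name]
        self.storage_online = False

    def resume_complete(self):
        """Storage is back; release the cached images."""
        self.storage_online = True
        self.cache.clear()


if __name__ == "__main__":
    loader = FirmwareLoader({"wifi.bin": b"\x01", "usb.bin": b"\x02"})
    loader.request_firmware("wifi.bin")        # loaded once at boot
    loader.suspend()
    print(loader.request_firmware("wifi.bin")) # served from the cache
    loader.resume_complete()
    print(loader.cache)
```

In this model, an image that was never requested before suspend (`usb.bin` here) is not in the cache, so asking for it before `resume_complete()` raises an error.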

The patch seems close to an ideal solution. Firmware loading at resume time becomes more robust, there is no need for drivers to be concerned with how it works, and wasted memory is minimized. Even Linus said "Nothing in this patchset made me go 'Eww'", which, from him, can be seen as reasonably high praise. It doesn't solve every problem; there are, for example, some strange devices that retain firmware over a reboot but not over suspend, so the system may not know that a specific firmware image is needed until resume time, when it's too late. But such hardware is probably best handled as a special case. For the rest, we may be close to a solution that simply works—and that brings an end to the recurring "firmware at resume time" discussions on the mailing lists.

Index entries for this article
Kernel: Device drivers/Firmware loading
Kernel: Firmware



Firmware loading and suspend/resume

Posted Aug 16, 2012 5:18 UTC (Thu) by kugel (subscriber, #70540)

I'm surprised to hear that firmware loading depends on user space, on a helper process in particular. Does that refer to FUSE or something else?

Firmware loading and suspend/resume

Posted Aug 16, 2012 6:28 UTC (Thu) by JohnLenz (guest, #42089)

Firmware loading happens through udev. The kernel raises a hotplug event which udev sees. See this README. On Ubuntu, /lib/udev/rules.d/50-firmware contains the udev rule, which runs the /lib/udev/firmware binary whenever the kernel requests a firmware file.

The reason is that the kernel can run in strange environments: a read-only root filesystem, containers with locked-down root filesystems, etc. With filesystem namespaces, there is no single filesystem, so the kernel has no idea where to look for files. Even calling into custom filesystem code from somewhere else in the kernel is somewhat icky. Instead, the kernel just asks user space to take care of it.

Firmware loading and suspend/resume

Posted Aug 17, 2012 4:19 UTC (Fri) by felixfix (subscriber, #242)

I'm going to clutter this up with an off-topic comment. It tickles me pink to see more and more non-US/Euro names show up as contributors. I realize Ming Lei may well be a US or EU citizen or resident, but the more "non-traditional" names show up, the more likely they come from all over the world, and that just makes me smile.

I also realize this comment could easily come off sounding rather strange and perhaps be taken wrongly, so maybe I can forestall some of that by saying again that I like this perceived change, and if I am slow to perceive it, if it has been obvious to everyone else for years and years, well, pardon me, but it still strikes me as a positive sign in so many ways that I can't help but smile and think the world is becoming a better place in many ways: Linux, free source software, expanding horizons, less poverty from the spread of technology, world peace, and more varieties of beer spreading around the world as tokens of code appreciation.

Firmware loading and suspend/resume

Posted Aug 17, 2012 12:49 UTC (Fri) by mpr22 (subscriber, #60784)

more varieties of beer

Well, more brands of pale lager, anyway :)

Firmware loading and suspend/resume

Posted Sep 20, 2012 23:19 UTC (Thu) by iive (guest, #59638)

There might be a(nother) case where this new mechanism won't work.

If you attach a USB device while the system is suspended, on resume the usb-core would find it and probe a driver for it. That driver would try to load its firmware, but because that firmware has never been loaded before, it won't be found in the "cache".

Honestly, why is userland even involved in firmware loading? The whole userland shenanigan should be scrapped and reverted to the old system where the kernel loads the firmware directly from the filesystem. If the kernel modules are accessible, then the firmware would be accessible too. (Use tmpfs or initrd as workarounds for the other cases.)

KiSS.

Firmware loading and suspend/resume

Posted Sep 21, 2012 22:49 UTC (Fri) by nix (subscriber, #2304)

I pretty much concur. The userspace loader causes problems even when it isn't being used at all. e.g. I just got a new machine with a Barts GPU in it (ATI Radeon 6870), and since I have a largely non-modular kernel with KMS built in I put all the firmware I thought it would need into CONFIG_EXTRA_FIRMWARE. I forgot one file... so the system hung for something like sixty seconds(!) trying to call out to userspace to load the firmware -- even though PID 1 had not yet been forked off! Of course because this was brand-new hardware and it was right at modesetting time I thought I'd messed something up in the .config or there was something major missing in the kernel's KMS support or something like that.

The whole thing is a complete trainwreck, from the need to dig through source code to find the right names to jam in CONFIG_EXTRA_FIRMWARE through the userspace loading that you can use except in unusual situations such as if you have even a single firmware-using module built into the kernel or if you need even a single firmware-using module to resume from hibernation. (Oh, and how much assistance does the kernel give you in detecting that you have either of those situations? None, that's how much. It just dies without a message at the appropriate time. IIRC there's been talk about fixing the hibernation side of this, but I don't think anything ever came of it.)

This whole thing was designed entirely to let distro vendors produce something without violating the GPL, and it does that -- but unfortunately it makes it bloody hard for the rest of us to produce working systems without digging through the source code if we have anything needing firmware at all, even if we're not using modules for anything.

(Sorry, Matthew, I really don't like to criticise your work -- but this banjaxed-up firmware-loading mess just wasted several hours of my time hunting for a 'lockup' that wasn't, due to a hugely overlong timeout that even the stupidest kernel should not have incurred, on a kernel with no loadable firmware of any sort, before userspace was even running. This is not something that has been tested in the non-modular case with an eye to not being intolerably awful.)

Firmware loading and suspend/resume

Posted Sep 22, 2012 13:33 UTC (Sat) by jackb (guest, #41909)

Another cool feature is that if you are trying to build a non-modular driver that requires firmware that's included in the kernel sources, and have KBUILD_OUTPUT set, compilation will fail due to a "not found" error until you manually copy the firmware directory to $KBUILD_OUTPUT.


Copyright © 2012, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds