Firmware loading and suspend/resume
The obvious issue with suspend/resume is that any given device may lose its firmware while the system is suspended. The whole point of suspending the system is to reduce its power consumption to a minimum, so that operation may well power down peripheral devices entirely. Loss of firmware during suspend doesn't seem like it should be a big problem; the driver can just load the firmware again at resume time. But firmware tends to live on disk, and the actual firmware loading operation involves the running of a helper process in user space. Neither the disk nor user space is guaranteed to be available at the point in the resume process when a given device wants its firmware back; drivers that attempt to obtain firmware at such times may fail badly. The result is resume failures, often of the intermittent, developer-never-sees-it variety that can be so frustrating to track down. So the search has been on for a more robust solution for some time.
In July, Ming Lei tried to address this problem with a patch integrating firmware loading with the deferred driver probing mechanism. In short, if a firmware load fails, the whole driver initialization process would be put on the deferred queue to be retried later on. So, a driver that is unable to load its firmware at resume time will be put on hold and retried at a later point when, hopefully, the resources required to complete the firmware load will be available. That, Ming hoped, would resolve a lot of resume-time failures without requiring changes to lots of drivers.
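The retry idea can be illustrated with a toy user-space model. This is only a sketch of the general deferred-probe pattern, not Ming's patch or any kernel code: `ToyDriver`, `ProbeDefer`, `run_probes()`, and `retry_deferred()` are hypothetical names, and a plain dictionary stands in for "firmware that is currently reachable."

```python
from collections import deque


class ProbeDefer(Exception):
    """Hypothetical stand-in for a probe routine asking to be retried later."""


class ToyDriver:
    def __init__(self, name, firmware_name):
        self.name = name
        self.firmware_name = firmware_name
        self.bound = False

    def probe(self, firmware_store):
        # If the firmware cannot be reached yet, ask for a retry
        # instead of failing the probe outright.
        if self.firmware_name not in firmware_store:
            raise ProbeDefer(self.name)
        self.bound = True


def run_probes(drivers, firmware_store, deferred):
    """Probe every driver once; park the ones that cannot proceed."""
    for drv in drivers:
        try:
            drv.probe(firmware_store)
        except ProbeDefer:
            deferred.append(drv)


def retry_deferred(deferred, firmware_store):
    """Make one pass over the deferred queue; failures are queued again."""
    for _ in range(len(deferred)):
        drv = deferred.popleft()
        try:
            drv.probe(firmware_store)
        except ProbeDefer:
            deferred.append(drv)
```

In this model, a driver probed while its firmware is missing simply sits on the queue until a later `retry_deferred()` pass finds the firmware present. Nothing in the driver itself has to change, which was the appeal of the approach; it also shows why the underlying ordering problem stays invisible to the driver author.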
Linus, however, disagreed.
Deferring firmware loading in this manner, he thought, would just serve to hide problems from developers while leaving them to burn users later on. It is much better, in his view, to force driver writers to deal with the problem explicitly.
The classic way for a driver writer to handle this problem is to just keep the firmware around after it is loaded at system boot time. Permanently cached firmware will always be available when it is needed, so firmware loading at resume time should be robust. The problem with that approach is that the firmware blobs loaded into some devices can be quite large; keeping them around forever can waste a fair amount of kernel-space memory. To make things worse, these blobs are loaded into vmalloc() memory (so that they appear to be contiguous in memory); that memory can be in short supply on 32-bit systems. Permanently caching the firmware is, thus, not an ideal solution, but that is what a number of drivers do now.
After the discussion with Linus, Ming thought for a while and came back with a new proposal: cache firmware blobs, but only during the actual suspend/resume cycle. Drivers can, of course, do that now; they can request a copy of the firmware while suspending their devices, and release that copy once it's no longer needed at resume time. But that is a chunk of boilerplate code that would need to be added to each driver. Ming's patch, instead, makes this process automatic and transparent.
In particular, request_firmware() is changed to make a note of the name of every firmware blob it is asked to load. This information is reference-counted and tied to the devices that needed the firmware; it can thus be discarded if all such devices disappear. The result is a simple data structure tracking all of the firmware blobs that may be needed by the hardware currently present in the system.
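A rough user-space sketch of that bookkeeping might look like the following. It assumes nothing about the real data structure in the patch; `FirmwareNameTracker` and its methods are hypothetical, and a set of device identifiers per firmware name plays the role of the reference count.

```python
class FirmwareNameTracker:
    """Toy model: remember which devices have asked for which firmware names."""

    def __init__(self):
        # firmware name -> set of devices that requested it
        self._users = {}

    def note_request(self, device, name):
        # Called whenever a device asks for a firmware blob by name.
        self._users.setdefault(name, set()).add(device)

    def device_removed(self, device):
        # Drop the device's references; forget names nobody needs any more.
        for name in list(self._users):
            self._users[name].discard(device)
            if not self._users[name]:
                del self._users[name]

    def names_needed(self):
        # Everything the currently present hardware might ask for again.
        return sorted(self._users)
```

With two devices sharing one blob, removing the first leaves the name tracked; removing the second discards it, so the list never grows beyond what the present hardware may need.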
At system suspend time, the code simply goes and loads every piece of firmware that it thinks may be needed. That data then sits in memory while the system is suspended. At resume time, those cached blobs are available to any driver, with no need for filesystem access or user-space involvement, via the usual request_firmware() interface. Once the resume process is complete, the firmware loader will, after a small delay, release all of those cached firmware images, freeing the associated memory and address space for other uses.
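The cycle described above can be modeled in a few lines. This is a simplified illustration under stated assumptions, not the kernel implementation: `ToyFirmwareLoader`, `FirmwareUnavailable`, `resume_done()`, and `release_cache()` are hypothetical names, a dictionary stands in for the filesystem, and the "small delay" before the cache is dropped is reduced to an explicit call.

```python
class FirmwareUnavailable(Exception):
    """Raised when a blob is neither cached nor reachable on 'disk'."""


class ToyFirmwareLoader:
    def __init__(self, disk):
        self.disk = disk          # name -> bytes; stands in for the filesystem
        self.disk_ready = True    # False while suspended or early in resume
        self.requested = set()    # names seen by request_firmware() so far
        self.cache = {}           # blobs held only across suspend/resume

    def request_firmware(self, name):
        # Cached blobs are served without touching the disk at all.
        if name in self.cache:
            return self.cache[name]
        if not self.disk_ready:
            raise FirmwareUnavailable(name)
        blob = self.disk[name]
        self.requested.add(name)  # remember the name for the next suspend
        return blob

    def suspend(self):
        # Load everything that might be needed while the disk still works.
        for name in self.requested:
            self.cache[name] = self.disk[name]
        self.disk_ready = False

    def resume_done(self):
        # Disk and user space are back; normal loading works again.
        self.disk_ready = True

    def release_cache(self):
        # In the description above this happens after a small delay.
        self.cache.clear()
```

A driver that requested its firmware before suspend gets the cached copy during resume through the same `request_firmware()` call. A name that was never requested before suspend is not in the cache, so a request for it during resume still fails in this model.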
The patch seems close to an ideal solution. Firmware loading at resume time becomes more robust, there is no need for drivers to be concerned with how it works, and wasted memory is minimized. Even Linus said "Nothing in this patchset made me go 'Eww'", which, from him, can be seen as reasonably high praise. It doesn't solve every problem; there are, for example, some strange devices that retain firmware over a reboot but not over suspend, so the system may not know that a specific firmware image is needed until resume time, when it's too late. But such hardware is probably best handled as a special case. For the rest, we may be close to a solution that simply works, and that brings an end to the recurring "firmware at resume time" discussions on the mailing lists.
Posted Aug 16, 2012 5:18 UTC (Thu) by kugel (subscriber, #70540)
Posted Aug 16, 2012 6:28 UTC (Thu) by JohnLenz (guest, #42089)
Firmware loading happens through udev. The kernel raises a hotplug event which udev sees. See this README. On Ubuntu, /lib/udev/rules.d/50-firmware contains the udev rule, which runs the /lib/udev/firmware binary whenever the kernel requests a firmware file. The reason is that the kernel can run in strange environments: a read-only root filesystem, containers with locked-down root filesystems, etc. With filesystem namespaces, there is no single filesystem, so the kernel has no idea where to look for files. Even calling into custom filesystem code from somewhere else in the kernel is somewhat icky. Instead, the kernel just asks userspace to take care of it.
Posted Aug 17, 2012 4:19 UTC (Fri) by felixfix (subscriber, #242)
I also realize this comment could easily come off sounding rather strange and perhaps be taken wrongly, so maybe I can forestall some of that by saying again that I like this perceived change, and if I am slow to perceive it, if it has been obvious to everyone else for years and years, well, pardon me, but it still strikes me as a positive sign in so many ways that I can't help but smile and think the world is becoming a better place in many ways: Linux, free source software, expanding horizons, less poverty from the spread of technology, world peace, and more varieties of beer spreading around the world as tokens of code appreciation.
Posted Aug 17, 2012 12:49 UTC (Fri) by mpr22 (subscriber, #60784)
Well, more brands of pale lager, anyway :)
Posted Sep 20, 2012 23:19 UTC (Thu) by iive (guest, #59638)
If you attach a USB device while the system is suspended, then on resume the USB core will find it and probe a driver for it. That driver will try to load its firmware but, because that firmware has never been loaded before, it won't be found in the "cache".
Honestly, why is userland even involved in firmware loading? The whole userland shenanigan should be scrapped, and we should revert to the old system where the kernel loads the firmware directly from the filesystem. If the kernel modules are accessible, then the firmware will be accessible too. (Use tmpfs or initrd as workarounds for the other cases.)
KiSS.
Posted Sep 21, 2012 22:49 UTC (Fri) by nix (subscriber, #2304)
The whole thing is a complete trainwreck, from the need to dig through source code to find the right names to jam in CONFIG_EXTRA_FIRMWARE through the userspace loading that you can use except in unusual situations such as if you have even a single firmware-using module built into the kernel or if you need even a single firmware-using module to resume from hibernation. (Oh, and how much assistance does the kernel give you in detecting that you have either of those situations? None, that's how much. It just dies without a message at the appropriate time. IIRC there's been talk about fixing the hibernation side of this, but I don't think anything ever came of it.)
This whole thing was designed entirely to let distro vendors produce something without violating the GPL, and it does that -- but unfortunately it makes it bloody hard for the rest of us to produce working systems without digging through the source code if we have anything needing firmware at all, even if we're not using modules for anything.
(Sorry, Matthew, I really don't like to criticise your work -- but this banjaxed-up firmware-loading mess just wasted several hours of my time hunting for a 'lockup' that wasn't, due to a hugely overlong timeout that even the stupidest kernel should not have incurred, on a kernel with no loadable firmware of any sort, before userspace was even running. This is not something that has been tested in the non-modular case with an eye to not being intolerably awful.)
Posted Sep 22, 2012 13:33 UTC (Sat) by jackb (guest, #41909)
