Module unloading in a reference counted world

Increasingly, the kernel uses reference counts to know when data structures are no longer needed and can be reclaimed. This reference counting tends to be managed by the kobject type, though other mechanisms are used as well. When properly used, this mechanism works well. Interesting issues can come up, however, when reference-counted objects are maintained by code in loadable modules. In many situations, the module cannot be unloaded until every object it has created has seen its reference count go to zero and has been returned to the system. Otherwise, the system can be left with objects containing references (a release function pointer, for example) to module code which no longer exists. Bad things usually result when one of those references is followed.
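The basic pattern can be sketched in a few lines of ordinary user-space C. This is only a model, not the kobject API: the names ref_obj, ref_obj_get(), and ref_obj_put() are made up for illustration, and the plain int counter stands in for the atomic operations real kernel code would need.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object with an embedded reference count. */
struct ref_obj {
    int refcount;                          /* number of live references */
    void (*release)(struct ref_obj *obj);  /* runs when the count hits zero */
};

static int released_count;  /* bookkeeping for the demonstration only */

/* The release function; in a module, this pointer would refer to module code. */
static void ref_obj_free(struct ref_obj *obj)
{
    released_count++;
    free(obj);
}

static struct ref_obj *ref_obj_create(void)
{
    struct ref_obj *obj = malloc(sizeof(*obj));

    if (!obj)
        return NULL;
    obj->refcount = 1;            /* the creator holds the first reference */
    obj->release = ref_obj_free;
    return obj;
}

static struct ref_obj *ref_obj_get(struct ref_obj *obj)
{
    obj->refcount++;
    return obj;
}

/* Drop one reference; the last put calls through the release pointer. */
static void ref_obj_put(struct ref_obj *obj)
{
    assert(obj->refcount > 0);
    if (--obj->refcount == 0)
        obj->release(obj);
}
```

The call through obj->release in the final put is the dangerous step: if that function lived in a module which has since been unloaded, the pointer leads nowhere.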

Alan Stern recently ran into this sort of situation; his module registers various structures with the device model, and must be sure not to allow itself to be unloaded until those structures have been released. To that end, he wrote a patch adding two functions (class_device_unregister_wait() and platform_device_unregister_wait()) which unregister those structures and explicitly wait until they have been released. This patch did not get very far, however; it was quickly pointed out that, with this code, it is relatively easy to deadlock the kernel. If the process trying to remove the module also has an open file descriptor to one of that module's sysfs entries, everything comes to a halt: the open file holds a reference which will not be dropped while the process sits waiting for that very reference to go away. The suggested solution, instead, is to simply not allow the module to be unloaded if it still has unreclaimed objects outstanding.

That approach is taken in some other contexts. The cdev structure used to represent char devices uses a kobject for its reference count. The cdev_get() function does more than just increment the count in the kobject, however; it also increments the reference count for the module which drives that device. If any cdev structure owned by a module has references, the module, too, will have a non-zero reference count and will not be unloadable.
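A minimal user-space sketch of that coupling might look like the following. The toy_module and toy_dev types and their functions are invented for the example; they are not the cdev interface, and the model ignores locking, atomicity, and races with an unload already in progress.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a loadable module. */
struct toy_module {
    int refcount;   /* non-zero means "in use": unload must be refused */
    bool loaded;
};

/* Hypothetical model of a device object owned by that module. */
struct toy_dev {
    int refcount;
    struct toy_module *owner;
};

/* Take a reference on the device and pin its owning module with it. */
static void toy_dev_get(struct toy_dev *dev)
{
    dev->owner->refcount++;
    dev->refcount++;
}

/* Drop the device reference and the matching module reference. */
static void toy_dev_put(struct toy_dev *dev)
{
    assert(dev->refcount > 0);
    dev->refcount--;
    dev->owner->refcount--;
}

/* Returns 0 on success or -EBUSY if references remain; it never waits. */
static int toy_module_unload(struct toy_module *mod)
{
    if (mod->refcount > 0)
        return -EBUSY;
    mod->loaded = false;
    return 0;
}
```

Because the unload attempt fails immediately rather than sleeping, a process holding its own reference simply gets an error back instead of deadlocking against itself.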

Another approach has been taken in the network subsystem. The net_device structure represents a network device; its rules say that it must be allocated dynamically, with alloc_netdev(). When the network driver is done with the structure, it calls free_netdev() to get rid of it. The net_device structure has its own reference count, but it is not tied to the underlying module's reference count. Instead, the networking system guarantees that, once free_netdev() has been called, it will not call into the module again for that device. The release function for the net_device structure, which returns its memory to the system, lives in the networking code, rather than in any loadable module. As a result, the module can be removed even while some of its net_device structures continue to exist, and all will be well. Those structures have been detached from the module which created them, and will be freed by core kernel code.
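The detachment idea can also be modeled in user-space C. Everything here is an assumption made for illustration: the toy_ names are invented, the real net_device has no single driver_xmit field, and the real networking code is far more involved. The point is only that the free path lives on the "core" side and that the link into driver code is severed before the driver goes away.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical core-owned device structure. */
struct toy_netdev {
    int refcount;
    int (*driver_xmit)(int len);  /* points into "module" code; NULL once detached */
};

static int core_frees;  /* bookkeeping for the demonstration only */

/* Stand-in for a function living in the loadable driver. */
static int toy_driver_xmit(int len)
{
    return len;
}

static struct toy_netdev *toy_alloc_netdev(int (*xmit)(int len))
{
    struct toy_netdev *dev = malloc(sizeof(*dev));

    if (!dev)
        return NULL;
    dev->refcount = 1;        /* the driver's own reference */
    dev->driver_xmit = xmit;
    return dev;
}

static void toy_netdev_hold(struct toy_netdev *dev)
{
    dev->refcount++;
}

/* Core-owned release path: no module code is needed to free the memory. */
static void toy_netdev_put(struct toy_netdev *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0) {
        core_frees++;
        free(dev);
    }
}

/* Driver is finished: sever the link to module code, drop its reference. */
static void toy_free_netdev(struct toy_netdev *dev)
{
    dev->driver_xmit = NULL;
    toy_netdev_put(dev);
}

/* Core entry point: calls into the driver only while still attached. */
static int toy_core_xmit(struct toy_netdev *dev, int len)
{
    if (!dev->driver_xmit)
        return -ENODEV;
    return dev->driver_xmit(len);
}
```

After toy_free_netdev() returns, nothing in the structure refers to driver code any longer, so the lingering references held elsewhere are harmless and the final put frees the memory from the core side.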

The real lesson from all this, perhaps, is that the kernel developers are still figuring out the implications of the lifetime rules of the objects they create. The addition of sysfs in 2.5 has tended to force this issue; sysfs exposes a great many internal kernel objects to user space, which can keep references to those objects for an indeterminate period of time. Making everything work safely in this environment has proved to be a challenge at times.

And module unloading, of course, will always be a challenge. There will likely always be issues involved with removing code from a live kernel. As Linus put it:

The proper thing to do (and what we _have_ done) is to say "unloading of modules is not supported". It's a debugging feature, and you literally shouldn't do it unless you are actively developing that module.

Experience shows that many users are not happy with a kernel which cannot unload modules, however. So the kernel developers are likely to be wrestling with these issues for some time yet.



Module unloading in a reference counted world

Posted Jan 29, 2004 2:23 UTC (Thu) by flewellyn (subscriber, #5047) [Link]

Refcounting? Come on. Why not a kernel-level garbage collector? Besides being better from a performance standpoint (refcounts are expensive to update all the time), this would also help resolve a lot of these circularity problems. And it'd make the Lispers happy, too. :-)

Module unloading in a reference counted world

Posted Jan 29, 2004 7:40 UTC (Thu) by nix (subscriber, #2304) [Link]

Of course, there *is* a kernel-level garbage collector, but it's for AF_UNIX sockets. :)

Garbage collector

Posted Jan 29, 2004 15:50 UTC (Thu) by rwmj (subscriber, #5474) [Link]

Yes, just what I was going to post :-)

All of these problems that the kernel developers keep reporting were solved back in the 60s and 70s by garbage collectors which are *more efficient* than hand allocation.

Rich.

Module unloading in a reference counted world

Posted Jan 29, 2004 21:52 UTC (Thu) by chad.netzer (✭ supporter ✭, #4257) [Link]

Any garbage collecting scheme has to be implementable on all architectures (like ref counting is), not too punishing on the cache (like ref counting is), and fairly small and simple (like ref counting).

Despite its disadvantages, ref counting still has (and will perhaps always have) its place. In particular, the number of counts for modules is probably quite low, and fairly static, so that inc/decref performance issues aren't much concern (but cache issues are). Circular dependencies may be the bigger problem, but even then, in the limited domain of kernel modules, one that might be addressed in a straightforward way.

However, I welcome comments about what specific advantages other garbage collecting schemes might offer.

Module unloading in a reference counted world

Posted Jan 29, 2004 11:08 UTC (Thu) by xanni (subscriber, #361) [Link]

As I have already told Linus in person on at least two occasions, there are unfortunately several other important reasons to unload modules, for example:

  • To reinitialise hardware without rebooting (rescanning SCSI busses!)
  • Hot-pluggable hardware (including PCMCIA and USB), especially on laptops that are suspended rather than being rebooted and that sometimes use a vast number of different hot-pluggable devices between reboots
  • Rarely used filesystems on removable media (e.g. accessing an HFS CD-ROM)
Cheers,
       *** Xanni ***

Why o why?

Posted Jan 29, 2004 12:55 UTC (Thu) by hummassa (subscriber, #307) [Link]

I did not understand your arguments. Modules can be shut down/reinitialized without unloading. So, let's see:

  • To reinitialise hardware without rebooting... shut down module, reinitialize module, no need to unload.
  • Hot-pluggable hardware (including PCMCIA and USB), especially on laptops that are suspended rather than being rebooted and that sometimes use a vast number of different hot-pluggable devices between reboots... vast number? Let's see... I have two different 802.11b USB adapters + MP3 player + camera + webcam + mouse + keyboard + palmtop + flash disk + SmartMedia reader. All of them take, like, 100KiB of kernel memory? The hotplug routine is (at boot): check whether each hot-pluggable device is still there; if not, send a hotplug-unplug event to the driver, which sits there until you want to plug the thing in again.
  • Rarely used filesystems on removable media (e.g. accessing an HFS CD-ROM)... why not just leave it there?

OK, before you start hating me, I will give the only really good argument for module unloading: so you can upgrade a buggy or insecure module without rebooting the machine (which can be expensive in terms of time).

Why o why?

Posted Jan 29, 2004 19:04 UTC (Thu) by brouhaha (subscriber, #1698) [Link]

The same issues with module unloading also occur with simply shutting down a module, unless you don't mind letting the module leak memory.

Why o why?

Posted Jan 31, 2004 0:23 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

I guess it depends upon what you mean by shutdown.

However, the examples given are just cases where traditional modules do something only at load time, so the only way to do it over is to unload and load again. But there's no reason those modules couldn't do the same thing (tearing down, rebuilding) while loaded and running.

disallow unloading? Puh-lease!

Posted Jan 31, 2004 0:29 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

People are never going to accept a world where module loading is one way, and it has nothing to do with practical arguments. It's simply unclean.

Linus is usually pretty sensitive to the difference between clean designs and unclean ones; I'm surprised he favors one-way module loading.

disallow unloading? Puh-lease!

Posted Jan 31, 2004 0:56 UTC (Sat) by wolfrider (guest, #3105) [Link]

--I'll give you an AMEN, and a real-world example: Knoppix live-cd's.

--Live-cds can load a bunch of things by default (like NTFS module) that I simply **do not have** on my system. The module is sitting there taking up memory, and with 2.4 and ALL PREVIOUS kernels -- all the way back to 2.0 -- I can unload the module if it's not needed.

--To do away with that functionality leads to madness and despair, and (l)user revolt. ;-)

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds