A different approach to module races

The topic of module unload races - where the kernel can end up calling into a module which has been removed - comes back occasionally. Much work has been done in 2.5 to reduce and eliminate these races. Part of that effort was moving module reference counting outside of the modules themselves. The result was a safer scheme, but one which imposes new requirements on kernel code which calls into modules. In some kernel subsystems, such as networking, the maintainers have decided that there is no need to worry about reference counting for modules; they simply ignore it.

Enter Rusty Russell. Since the reference counts are seen to be a pain, and some code isn't using them at all, why not simply get rid of them? He has submitted a patch which does exactly that.

Of course, the issue of how to safely remove modules remains. Without reference counts, how does the kernel know when it can actually get rid of a particular module? With Rusty's patch, a different approach is taken: modules are never actually removed. If an administrator invokes rmmod, the module's cleanup function will be called and all kernel knowledge of the module will go away - but the module code itself will remain in the kernel. The patch thus sacrifices some system memory on every unload as a way of avoiding unload races.
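The trade-off can be pictured with a small userspace model. This is only a sketch: the `toy_*` names are hypothetical and have nothing to do with the code in Rusty's patch. "Unloading" runs the cleanup function and hides the module, but the memory holding its image is never freed, so a stale caller would still land on valid code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct toy_module {
    void  *text;            /* stands in for the module's code and data */
    size_t text_size;
    int    visible;         /* 1 while the "kernel" still knows the module */
    void (*cleanup)(void);  /* the module's cleanup function */
};

static size_t leaked_bytes; /* memory deliberately left behind by unloads */

static void toy_noop_cleanup(void) { }

/* "Load": allocate the image and make the module visible. */
int toy_load(struct toy_module *m, size_t size)
{
    m->text = malloc(size);
    if (m->text == NULL)
        return -1;
    m->text_size = size;
    m->visible = 1;
    m->cleanup = toy_noop_cleanup;
    return 0;
}

/* "Unload": run cleanup and forget the module, but keep its image. */
void toy_unload(struct toy_module *m)
{
    if (!m->visible)
        return;                 /* already unloaded; nothing more to do */
    m->cleanup();
    m->visible = 0;
    leaked_bytes += m->text_size;
    /* Intentionally no free(m->text): that is the memory being sacrificed. */
}

size_t toy_leaked_bytes(void)
{
    return leaked_bytes;
}
```

In this model every load/unload cycle adds the full image size to the leaked total, which is the cost the rest of the discussion turns on.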

Some developers liked this patch, others didn't. For a kernel hacker who is debugging a module, a little lost memory for each load/unload cycle is probably not a big problem; the system will likely be rebooted soon anyway. The patch does present a bigger problem for Linux installers, however; many of these do hardware detection by loading almost every module available and seeing which ones actually find something. On a "small" system (that is, say, 64MB), it is possible that some distribution installers would simply run out of memory and die.

Rusty proposed adding a special rmmod option which would clean up memory left behind by deleted modules (while also marking the kernel tainted). For now, however, all of this has been made irrelevant by Linus, who decreed: "First off - we're not changing fundamental module stuff any more." This statement drew an amused response from Rusty ("OK. Who are you and what have you done with the real Linus?"), but the general sigh of relief from most kernel hackers could be heard worldwide. It seems that Linus is truly holding the line and keeping out potentially disruptive changes this time around.


A different approach to module races

Posted Jul 31, 2003 17:19 UTC (Thu) by cpeterso (guest, #305)

I don't understand why module ref-counting is so difficult. Since device driver modules are written by so many people (outside the Linux core team), maybe modules are more likely to have ref-count bugs. Instead of relying on every module developer to get it right, maybe the problem can be abstracted and solved correctly once in the core kernel code. If lazy module developers cannot be trusted to write their ref-counting code correctly, maybe the kernel needs to add module garbage collection! :-)

I'm only half-joking! Module GC would probably be much easier than heap GC. The kernel probably keeps a list of loaded modules and references to the modules from other drivers or buffers. This list could be traversed, and leaked modules could then be safely unloaded.

A different approach to module races

Posted Jul 31, 2003 21:30 UTC (Thu) by rankincj (subscriber, #4865)

There has been a lot of core-kernel help for module references. Most of the important kernel structures have an "owner" field that is (or should be) initialised to THIS_MODULE. In theory, the kernel can then increment and decrement the modules' usage counters when it knows that it is safe to do so.
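As a rough illustration of the owner-field idea, here is a single-threaded userspace model. It is loosely inspired by the kernel's try_module_get()/module_put() pair, but every `toy_*` name is made up for this sketch, and the real kernel needs atomic operations that are left out here. The point is only that the core, not the driver, takes and drops the reference around the call.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; all toy_* names are hypothetical. */
struct toy_owner {
    int refcount;   /* callers currently inside the module */
    int going;      /* set once an unload has started */
};

struct toy_ops {
    struct toy_owner *owner;   /* plays the role of .owner = THIS_MODULE */
    int (*open)(void);
};

/* Take a reference unless the module is on its way out. */
static int toy_try_get(struct toy_owner *m)
{
    if (m == NULL)
        return 1;              /* no owner: treated as built-in code */
    if (m->going)
        return 0;
    m->refcount++;
    return 1;
}

static void toy_put(struct toy_owner *m)
{
    if (m == NULL)
        return;
    assert(m->refcount > 0);   /* a put without a matching get is a bug */
    m->refcount--;
}

/* The core pins the owner before calling in and unpins it afterwards. */
int toy_core_open(const struct toy_ops *ops)
{
    int ret;

    if (!toy_try_get(ops->owner))
        return -1;             /* module is going away; do not call it */
    ret = ops->open();
    toy_put(ops->owner);
    return ret;
}

/* A sample "driver": its open() reports the count it observes. */
static struct toy_owner toy_this_module;

static int toy_sample_open(void)
{
    return toy_this_module.refcount;
}

struct toy_ops toy_sample_ops = { &toy_this_module, toy_sample_open };
```

The sample driver never touches its own count, yet it sees a nonzero count while it runs. A structure without an owner field gives the core nothing to pin, which is the gap described next.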

Part of the problem has been that a lot of structures that needed owner fields didn't have them. The USB subsystem only gained an owner field in 2.4.20, and this created a maintenance problem for out-of-kernel drivers supporting earlier versions. The 2.4 serial layer doesn't have one at all, although I think the 2.6 layer might. And the locking requirements for stacks of related modules aren't always obvious, either. I suppose the simplest way of looking at the problem is that only the core-kernel or a locked module should modify another module's usage counter, although I'm sure that people can think of various exceptions to this, or cases when it cannot be achieved for some reason...

As for module garbage collection, I have that already in the form of a cron job that runs "rmmod -a" once an hour. However, the problem is to know which modules can be unloaded in the first place.
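A sweep of this kind is easy to write once the counts can be trusted. The sketch below is a userspace model with hypothetical names (`toy_entry`, `toy_sweep_unused`), not what rmmod does internally: it walks a table and marks every loaded entry with a zero use count as unloaded.

```c
#include <stddef.h>

/* Hypothetical module table entry for this sketch. */
struct toy_entry {
    const char *name;
    int use_count;   /* references the "kernel" believes are held */
    int loaded;      /* 1 while the module is present */
};

/* Unload every loaded entry whose use count is zero; return how many. */
size_t toy_sweep_unused(struct toy_entry *tab, size_t n)
{
    size_t removed = 0;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].loaded && tab[i].use_count == 0) {
            tab[i].loaded = 0;
            removed++;
        }
    }
    return removed;
}
```

The loop is trivial; what it cannot do is tell whether a zero in `use_count` really means that nobody is about to call into the module.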

Why it's harder than it looks

Posted Aug 1, 2003 1:24 UTC (Fri) by Peter (guest, #1127)

> I don't understand why module ref-counting is so difficult.

Try it some time. Write your own patch for module ref-counting. But, so as not to repeat the mistakes of the past, keep in mind a few points:

  • A request for module removal can come at any time. Not just at "convenient" times. The module may be running an IRQ handler on a different CPU, for example, and in such a case the module reference count had better not be zero. You are not allowed to decree that "module removal will only be attempted when the user knows the system isn't using that device". Likewise, you have to assume an SMP system; if you ignore SMP concurrency, you aren't solving the problem.
  • A module cannot, in general, manipulate its own reference count. If a module decrements its own count and it goes to zero, the module could disappear - *poof* - right in the middle of the module's execution thread. Likewise, it rarely makes sense for a module to increment its own reference count: if the count was zero before the increment, the module could disappear - *poof* - before the count is incremented. Thus, incrementing one's own ref count is in general no protection at all.
  • Any time you put yourself on a wait queue or similar, you need a reference. But keep in mind that, for reasons discussed above, it doesn't make much sense to take the reference yourself. Someone outside the module generally must do it for you.
  • Any time a module's data structures are registered in an external list (like a list of filesystems, etc), it needs a reference. That reference should be deleted when the structure is unregistered - keeping in mind, again, that the module itself must not decrement its own count from 1 to 0.
  • Try to minimize your overhead. Taking and releasing a module reference on every interrupt is a Bad Idea.
  • Whatever solution you come up with, make sure it is easy to update and verify all device drivers for correct operation. That's actually the hardest part. There are hundreds of drivers in the kernel tree, and a lot of them are not module-removal-safe. Your scheme must not be overly complex, such that it's impossible to fix all these drivers. Ideally, you change things in the kernel core such that drivers become safe automatically.
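One way to picture the "someone outside the module must take the reference" rule is a counter in which the liveness check and the increment are a single atomic step, so there is no window between "the count is zero" and "the count has been incremented". The sketch below uses C11 atomics in userspace. It is an illustration under that assumption, with made-up `toy_ref_*` names, and is not a description of how the kernel implements its module counts.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_GOING (-1)

/* state >= 0: live, holding that many references.
 * state == TOY_GOING: an unload has claimed the module. */
struct toy_modref {
    atomic_int state;
};

/* Called by code OUTSIDE the module before it calls in. */
bool toy_ref_try_get(struct toy_modref *r)
{
    int old = atomic_load(&r->state);

    while (old != TOY_GOING) {
        /* Increment only if the state is still exactly what we saw. */
        if (atomic_compare_exchange_weak(&r->state, &old, old + 1))
            return true;
        /* On failure, old holds the fresh value; re-check and retry. */
    }
    return false;
}

/* Drop a reference obtained with toy_ref_try_get(). */
void toy_ref_put(struct toy_modref *r)
{
    atomic_fetch_sub(&r->state, 1);
}

/* The unloader succeeds only if it can swap 0 -> TOY_GOING atomically. */
bool toy_ref_try_unload(struct toy_modref *r)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&r->state, &expected, TOY_GOING);
}
```

A caller that loses the race with the unloader gets `false` back and never enters the module. The sketch says nothing about overhead or about retrofitting hundreds of drivers, which are the harder requirements on the list.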

If you think you can solve this in a race-free manner, without imposing any interesting burdens on 99% of the modules out there (both in kernel and out of it), please post your patch to linux-kernel. I'm sure Rusty and Dave Miller will be happy to look at it.

(Note that Rusty, Dave and others have already solved all of the above problems, but they are left with a compliance issue in that most modules do not follow the rules they have set and thus are not "safe".)

Why it's harder than it looks

Posted Aug 1, 2003 16:48 UTC (Fri) by giraffedata (subscriber, #1954)

Here's one more, that rarely gets attention:

- Any time you create a new thread executing the module code, you need a reference. The reference must exist from before the thread gets created to after the thread is dead.

By the way, there are two kinds of references involved. One type prevents the module unloader from attempting an unload. The other tells the module itself whether it's safe to be unloaded. The difference is that the module is capable of undoing some references or waiting for them to go away, whereas the module unloader isn't. E.g. while the module has a major number registered, it has a reference and cannot unload. But you wouldn't want rmmod to fail just because that reference exists. The module cleanup function can undo that reference by unregistering the major number.
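The distinction can be sketched as two separate counters in a single-threaded userspace model. The names (`toy_counts`, `toy_rmmod`, and so on) are hypothetical: one counter makes the unloader refuse outright, while the other is cleared by the module's own cleanup function during the unload.

```c
#include <assert.h>

/* Hypothetical model of the two kinds of references. */
struct toy_counts {
    int users;          /* references only their holders can drop */
    int registrations;  /* e.g. a registered major number: cleanup can undo */
    int loaded;
};

void toy_register_major(struct toy_counts *m)
{
    m->registrations++;
}

void toy_unregister_major(struct toy_counts *m)
{
    assert(m->registrations > 0);
    m->registrations--;
}

/* The module's cleanup function undoes its own registrations. */
static void toy_cleanup(struct toy_counts *m)
{
    while (m->registrations > 0)
        toy_unregister_major(m);
}

/* Returns 0 on success, -1 if the module is busy. */
int toy_rmmod(struct toy_counts *m)
{
    if (m->users > 0)
        return -1;      /* first kind: do not even attempt the unload */
    toy_cleanup(m);     /* second kind: the module releases these itself */
    m->loaded = 0;
    return 0;
}
```

In this model a registered major number never causes `toy_rmmod()` to fail; only an outstanding user does.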

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds