Why it's harder than it looks
Posted Aug 1, 2003 1:24 UTC (Fri) by
Peter (guest, #1127)
In reply to:
A different approach to module races by cpeterso
Parent article:
A different approach to module races
I don't understand why module ref-counting is so difficult.
Try it some time. Write your own patch for module ref-counting. But, so as not to repeat the mistakes of the past, keep in mind a few points:
- A request for module removal can come at any time, not just at "convenient" times. The module may be running an IRQ handler on a different CPU, for example, and in such a case the module reference count had better not be zero. You are not allowed to decree that "module removal will only be attempted when the user knows the system isn't using that device". Likewise, you have to assume an SMP system; if you ignore SMP concurrency, you aren't solving the problem.
- A module cannot, in general, manipulate its own reference count. If a module decrements its own count and the count goes to zero, the module could disappear - *poof* - right in the middle of the module's execution thread. Likewise, it rarely makes sense for a module to increment its own reference count: if the count was zero before the increment, the module could disappear - *poof* - before the increment happens. Thus, incrementing one's own ref count is in general no protection at all.
- Any time you put yourself on a wait queue or similar, you need a reference. But keep in mind that, for reasons discussed above, it doesn't make much sense to take the reference yourself. Someone outside the module generally must do it for you.
- Any time a module's data structures are registered in an external list (a list of filesystems, for example), the module needs a reference. That reference should be dropped when the structure is unregistered - keeping in mind, again, that the module itself must not decrement its own count from 1 to 0.
- Try to minimize your overhead. Taking and releasing a module reference on every interrupt is a Bad Idea.
- Whatever solution you come up with, make sure it is easy to update all device drivers and verify that they operate correctly. That's actually the hardest part. There are hundreds of drivers in the kernel tree, and a lot of them are not module-removal-safe. Your scheme must not be so complex that fixing all these drivers becomes impossible. Ideally, you change things in the kernel core such that drivers become safe automatically.
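To make a couple of these constraints concrete, here is a minimal user-space sketch in C11, not kernel code. Every name in it (fake_module, fake_try_get, fake_put, fake_try_unload, core_call, sample_handler) is made up for illustration. It shows one way to satisfy the rule that the reference is taken by someone outside the module: the "core" takes it before calling in, and the attempt fails once removal has begun. It assumes the descriptor is owned by the core and outlives the module's code, and it says nothing about how the core finds the ops structure safely in the first place.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MOD_DEAD (-1)

/* Hypothetical descriptor, owned by the core rather than by the module. */
struct fake_module {
    atomic_int refs;   /* >= 0: live, with that many users; MOD_DEAD: going away */
};

/* Hypothetical table of entry points that a module registers with the core. */
struct fake_ops {
    struct fake_module *owner;
    int (*handler)(void);
};

/* Take a reference unless removal has already begun. */
static bool fake_try_get(struct fake_module *m)
{
    int old = atomic_load(&m->refs);

    while (old != MOD_DEAD) {
        /* On failure the CAS reloads 'old', so the dead check is repeated. */
        if (atomic_compare_exchange_weak(&m->refs, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference previously obtained with fake_try_get(). */
static void fake_put(struct fake_module *m)
{
    int old = atomic_fetch_sub(&m->refs, 1);

    assert(old > 0);   /* a put without a matching get is a bug */
}

/* Removal succeeds only if the count moves atomically from 0 to "dead". */
static bool fake_try_unload(struct fake_module *m)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&m->refs, &expected, MOD_DEAD);
}

/* The core, not the module, brackets every call into module code. */
static int core_call(const struct fake_ops *ops)
{
    int ret;

    if (!fake_try_get(ops->owner))
        return -1;     /* module is being removed: do not enter it */
    ret = ops->handler();
    fake_put(ops->owner);
    return ret;
}

/* Stand-in for code that lives inside the module. */
static int sample_handler(void)
{
    return 42;
}
```

Because the get and the "is it going away?" check are a single atomic step, there is no window in which the count is observed as zero, removal starts, and a caller still walks into the module. Note that this sketch pays for an atomic operation on every call, which is exactly the overhead the list above warns against for hot paths such as interrupts.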
If you think you can solve this in a race-free manner, without imposing any interesting burdens on 99% of the modules out there (both in kernel and out of it), please post your patch to linux-kernel. I'm sure Rusty and Dave Miller will be happy to look at it.
(Note that Rusty, Dave and others have already solved all of the above problems, but they are left with a compliance issue in that most modules do not follow the rules they have set and thus are not "safe".)