Weekly Edition Return to the Kernel pageSponsored link Serve your customers, not your servers, with VERIO Linux VPS. Full-access test-drive here. |
Generic semaphores
Most kernel patches delete some code, replacing it with newer and
(presumably) better code. Much of the time, it seems, the new code is more
voluminous than what came before.
Occasionally, though, a patch comes along which
deletes over 7600 lines of code - replacing it with a mere 314 lines -
while claiming to maintain the same functionality. Matthew Wilcox's generic semaphore patch is one
of those changes.
In essence, a semaphore is a counter with a wait queue attached to it. When kernel code wants to access the resource protected by the semaphore, it makes a call to:
void down(struct semaphore *sem);
This call will check the counter associated with sem; if it is greater than zero, the counter will be decremented and control returns to the caller. Otherwise the caller will be put to sleep until sometime in the future when the counter has been increased again. Increasing the counter - when the the protected resource is no longer needed - is done with a call to up(). Semaphores can be used in any situation where there is a need to put an upper limit on the number of processes which can be within a given critical section at any time. In practice, that upper limit is almost always set to one, resulting in semaphores which are used as a straightforward mutual exclusion primitive. In current kernels, semaphores are implemented with highly-optimized, architecture-specific code. There are, in fact, more than twenty independent semaphore implementations in the kernel code base. Matthew's patch rips all of that out and replaces it with a single, generic implementation which works on all architectures. After the patch is applied, a semaphore looks like this:
struct semaphore {
spinlock_t lock;
int count;
struct list_head wait_list;
};
The implementation follows from this definition in a straightforward way: the spinlock is used to protect manipulations of count, while wait_list is used to put processes to sleep when they must wait for count to increase. The actual code, of course, is somewhat complicated by performance and interrupt-safety considerations, but it remains relatively short and simple. One might ask: why weren't semaphores done this way in the first place? The answer is that, once upon a time (prior to 2.6.16), semaphores were one of the primary mutual exclusion mechanisms in the kernel. The 2.6.16 cycle brought in mutexes from the realtime tree, and most semaphore users were converted over. So semaphores, which were once a performance-critical primitive, are now much less so. As a result, any need there may have been for carefully hand-tuned, architecture-specific code is gone. So the code might as well go too. The other question which comes up is: why are semaphores still being used at all? The number of semaphore users has dropped considerably since 2.6.16, but there are still a number of them in the kernel. Some of those could certainly be converted to mutexes, but doing so requires a careful audit of the code to be sure that the semaphore's counting feature is not being used. Once that work is done, it may turn out that, in some places, a semaphore is truly the right data structure. So semaphores are likely to remain - but they'll require rather less code than before. (Log in to post comments)
Generic semaphores Posted Mar 20, 2008 4:27 UTC (Thu) by daniel (subscriber, #3181) [Link] Hi Jon, "The other question which comes up is: why are semaphores still being used at all?" Stay away from my counting semaphores, please :-) Thankyou. Daniel
Mutex is not a semaphore Posted Mar 20, 2008 5:22 UTC (Thu) by ikm (subscriber, #493) [Link] Somehow I feel that the "mutex vs semaphore" distinction would have been nice to have from the start — this way the question "why are semaphores still being used at all?" would never arise at all. I'd like to notice that one use of a semaphore as of a "fifo for token-like objects", that is, the one where the initial value is usually 0 and where one side initiating pushes causes another side to initiate the same amount of pops is not mentioned in this article at all, while e.g. for me it was always the primary use for semaphores (when I had dedicated mutex primitives for handling mutual exclusions, that is).
That is, semaphore is not a mutex Posted Mar 20, 2008 5:49 UTC (Thu) by ikm (subscriber, #493) [Link] Subject should've been the other way around, sorry :) Point is, stop using them interchangeably!
Mutex is not a semaphore Posted Mar 22, 2008 3:22 UTC (Sat) by willy (subscriber, #9762) [Link] > I'd like to notice that one use of a semaphore is of a "fifo for > token-like objects", that is, the one where the initial value is > usually 0 The semaphore implementations (both current and new) do support this, but we have few if any users of it currently in the kernel. We have the 'completion' API which sort-of does what you want (it's specialised for 'I am exiting now', but doesn't have to be used that way). My motivation was really not to correct mutex vs semaphore usage but to add new features to semaphores -- down_killable() and down_timeout() are the two I've added so far, each taking very few additional lines of code. I now suspect, having read over the completion code, that completions could be rewritten in terms of my new semaphore implementation. I'll look into it later. Matthew Wilcox
Generic semaphores Posted Mar 21, 2008 20:02 UTC (Fri) by Alan.Stern (subscriber, #12437) [Link] One reason for keeping semaphores around is that they aren't subject to lockdep checking, whereas mutexes are. While this checking is a very good thing and I'm all in favor of it, the fact remains that there are some usage patterns lockdep cannot handle. A typical example is a tree of data structures (like the device tree), where locks must be acquired from the top down, or from the bottom up.
Generic semaphores Posted Mar 22, 2008 3:31 UTC (Sat) by willy (subscriber, #9762) [Link] > A typical example is a tree of data structures (like the device tree), > where locks must be acquired from the top down, or from the bottom up. lockdep would be right to moan about that -- if you have two tasks trying to do that, they can deadlock against each other. The real problem with lockdep is that mutexes have an owner and semaphores simply don't. If they're counting (>1) then they can have multiple simultaneous owners. And, in some circumstances, they can be acquired from interrupt context. Horrible, but there you go.
Generic semaphores Posted Mar 22, 2008 13:47 UTC (Sat) by Alan.Stern (subscriber, #12437) [Link] > lockdep would be right to moan about that -- if you have two tasks > trying to do that, they can deadlock against each other. There you go -- you are suffering the same weakness as lockdep. It's not possible for two tasks to deadlock when acquiring locks in a tree structure provided that the locks are always acquired going down a branch. (Or if the locks are always acquired going up a branch.) This sort of constraint could potentially be represented within lockdep, but right now there's no way to do it. > The real problem with lockdep is that mutexes have an owner and > semaphores simply don't. That isn't a problem with lockdep; it's a problem with trying to use lockdep to track general semaphore usage. Lack of explicit parents for semaphores hasn't prevented many semaphores from being converted to mutexes, complete with lockdep monitoring. My point was that even though a semaphore may be used for simple mutual exclusion in a way that should be compatible with replacement by a mutex, the replacement can't be carried out if the semaphore is being used to lock members of a tree.
Generic semaphores Posted Mar 23, 2008 13:37 UTC (Sun) by willy (subscriber, #9762) [Link] > It's not possible for two tasks to deadlock when acquiring locks in a > tree structure provided that the locks are always acquired going down > a branch. (Or if the locks are always acquired going up a branch.) That's not what you said before! It sounded like you were describing a situation where task A acquires locks walking down the tree and simultaneously task B acquires locks walking up the tree. That can't work. lockdep does manage to handle situations like the VFS where we have i_mutex protecting each inode and some rather byzantine rules regarding which have to be locked first (it's particularly hairy when doing cross-directory renames -- you have to lock the renamed object, its parent and the destination directory. While not deadlocking against any other operation happening at the same time). So your assertion is demonstratedly untrue, but you still can't give semaphores an owner.
Generic semaphores Posted Mar 23, 2008 14:15 UTC (Sun) by Alan.Stern (subscriber, #12437) [Link] > That's not what you said before! It sounded like you were describing a > situation where task A acquires locks walking down the tree and > simultaneously task B acquires locks walking up the > tree. That can't work. Okay, what I wrote before wasn't sufficiently unambiguous and you misinterpreted it. > So your assertion is demonstratedly untrue, but you still can't give > semaphores an owner. Lockdep works in the special-case example of VFS. It still can't be made to handle general tree-usage patterns. It's true that many semaphores can't be given an owner. However there are some which can, but which nevertheless can't be converted to a mutex.
Generic semaphores Posted Mar 23, 2008 16:45 UTC (Sun) by willy (subscriber, #9762) [Link] > Lockdep works in the special-case example of VFS. It still can't be > made to handle general tree-usage patterns. I don't think that's true. Take a look at Documentation/lockdep-design.txt. The mutex_lock_nested() interface needs to be specialised for each tree, but that doesn't look hard -- the one in linux/fs.h (inode_i_mutex_lock_class) is more complex than most. It looks like a parent/child locking rule would be sufficient for most trees.
Generic semaphores Posted Mar 23, 2008 23:33 UTC (Sun) by Alan.Stern (subscriber, #12437) [Link]
> It looks like a parent/child locking rule would be sufficient for most
> trees.
Only if you never need to hold more than two locks at a time. In general that isn't good
enough. (It certainly isn't good enough for the device tree!) Furthermore lockdep has a
limit of 8 subclasses; thus you won't be able to acquire the locks for all the nodes along a
branch if the branch is too long.
In addition, trees can have other, more complicated, access patterns which lockdep can't even
come close to handling. For instance, the rule about always acquiring locks going down a
branch can be generalized as follows:
Whenever you hold a lock on a node A, you must not acquire
a lock on node B unless you already hold the lock for the
closest common ancestor of A and B.
This rule allows you to acquire locks down a branch, but it allows other patterns as well. If
all threads follow the rule then deadlock can never occur (exercize!). Clearly this is far
beyond lockdep's ability to express.
|
Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds
Powered by Rackspace Managed Hosting.