Reworking the semaphore interface

The Linux kernel contains a full counting semaphore implementation. Given a semaphore, a call to down() will sleep until the semaphore contains a positive value, decrement that value, and return. Calling up() increments the semaphore's value and wakes up a process waiting for the semaphore, if one exists. If the initial value of the semaphore is ten, then ten different threads can call down() without blocking.
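
To make these semantics concrete, here is a minimal userspace sketch built on POSIX threads. It is an illustration only, not the kernel's implementation: the csem name and its functions are made up for this example, and the trylock return convention (0 on success, 1 on failure) is chosen to mirror the kernel's down_trylock().

```c
#include <assert.h>
#include <pthread.h>

/* Userspace illustration only; "csem" is a hypothetical name. */
struct csem {
    pthread_mutex_t lock;     /* protects count */
    pthread_cond_t  nonzero;  /* signalled when count becomes positive */
    int             count;
};

static void csem_init(struct csem *s, int initial)
{
    assert(initial >= 0);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
}

/* Like down(): sleep until the count is positive, then decrement it. */
static void csem_down(struct csem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Non-blocking variant: returns 0 on success, 1 if it would have slept. */
static int csem_down_trylock(struct csem *s)
{
    int failed = 1;

    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        failed = 0;
    }
    pthread_mutex_unlock(&s->lock);
    return failed;
}

/* Like up(): increment the count and wake one waiter, if there is one. */
static void csem_up(struct csem *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}
```

With csem_init(&s, 10), ten calls to csem_down() return immediately and the eleventh sleeps until some thread calls csem_up(). An initial count of one gives mutex-style behavior: only one holder at a time.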

Most users of semaphores do not use the counting feature, however. Instead, they initialize the semaphore to a value of one, allowing a single thread to hold the semaphore at any given time. This mode of use turns a semaphore into a "mutex," a mutual exclusion primitive which can be used to implement critical sections. Using a semaphore in this way is entirely valid.

There is one little issue, however: a simple binary mutex can often be implemented more cheaply than a full counting semaphore. If a semaphore is used in the mutex mode, the extra cost of the counting capability is simply wasted. Linux semaphores also suffer from highly architecture-dependent implementations, to the point that any changes to the semaphore API are very difficult to make. So cleaning up semaphores has been one of those items on the "to do" list for some time.

David Howells went ahead and did it. His patch adds a new, binary mutex type to the kernel. Since almost all of the semaphores currently in use are, in reality, mutexes, David changed the prototypes of most of the semaphore functions (down() and variants, up(), init_MUTEX(), DECLARE_MUTEX()) to take a mutex rather than a semaphore. To make things work again, most semaphore declarations have been changed to struct mutex, but, beyond the declaration change, code using mutexes need not be modified.

For code which truly needs a semaphore, a new set of functions has been provided:

    void down_sem(struct semaphore *sem);
    void up_sem(struct semaphore *sem);
    int down_sem_trylock(struct semaphore *sem);
    ...

Kernel code which was actually using the counting capability of semaphores has been changed to use the new functions.
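
The effect of splitting the types can be sketched in userspace with C11 atomics. Everything below is a stand-in rather than code from David's patch: the structure layouts are assumptions, only the non-blocking variants are modeled, and the function names are borrowed from the article purely to show how the type system keeps the two families apart.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative userspace stand-ins; the layouts are assumptions. */
struct mutex     { atomic_flag locked; };  /* init with { ATOMIC_FLAG_INIT } */
struct semaphore { atomic_int  count;  };

static void sema_init(struct semaphore *sem, int val)
{
    assert(val >= 0);
    atomic_init(&sem->count, val);
}

/* Mutex flavour: returns 0 if acquired, 1 if already held. */
static int down_trylock(struct mutex *m)
{
    return atomic_flag_test_and_set(&m->locked) ? 1 : 0;
}

static void up(struct mutex *m)
{
    atomic_flag_clear(&m->locked);
}

/* Counting flavour: returns 0 if a unit was taken, 1 if the count was zero. */
static int down_sem_trylock(struct semaphore *sem)
{
    int old = atomic_load(&sem->count);

    while (old > 0) {
        /* On failure, old is refreshed with the current count. */
        if (atomic_compare_exchange_weak(&sem->count, &old, old - 1))
            return 0;
    }
    return 1;
}

static void up_sem(struct semaphore *sem)
{
    atomic_fetch_add(&sem->count, 1);
}
```

Because up() takes a struct mutex pointer here, handing it a struct semaphore pointer draws an incompatible-pointer-type diagnostic from the compiler, so a counting user that was not converted would surface at build time. The sketch also hints at why a binary mutex can be cheaper: its trylock is a single test-and-set, while the counting version needs a compare-and-swap loop.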

This patch makes fundamental changes to the kernel's mutual exclusion mechanisms, creates a flag day which breaks all out-of-tree code, and is generally quite large. Even so, it has met with surprisingly little resistance. Some developers are concerned that some counting semaphores may have been converted to mutexes erroneously - it is hard to audit that much code and be absolutely sure of how every semaphore is used. It has also been noted that the posted mutex implementation may actually be slower than the semaphores it replaces, but that is something which, it is assumed, can be fixed. Overall, however, almost nobody objects to making this sort of change.

There are some disagreements over just how the change should be done, however. Some developers do not want to see the old down() and up() functions switched to a different type which has no counter to bump "down" or "up." The alternative would be to create a completely new API for mutexes; Alan Cox has suggested names like sleep_lock() and sleep_unlock(). A completely new API would make it clear what is really going on; it would also make it possible to change over users gradually as they are audited.

Some developers would rather see a big flag day than a year-long series of patches slowly converting semaphore users over to mutexes. For them, the mutex changeover is a chance to get the API right, and they would rather see everything changed over at once. Gradual changeovers, it is argued, never seem to come to a conclusion; examples include the continued existence of the big kernel lock and the long-deprecated sleep_on() functions. Rather than live with a deprecated API for years, it may be better to just take the pain all at once and be done with it.

It has also been pointed out that there is another mutex patch in circulation: the real-time preemption tree has had mutexes for the last year. So far, there has been no real debate on whether the -rt implementation is better; Ingo Molnar does not seem to be pushing it, even though this might be a good opportunity to merge a significant chunk of the -rt tree into the mainline.

In the end, it looks like some sort of mutex patch is likely to be merged into a future mainline kernel - though it almost certainly will not be ready when the 2.6.16 window opens. The form of that patch could change significantly, however; stay tuned.


Reworking the semaphore interface

Posted Dec 15, 2005 12:02 UTC (Thu) by nurhussein (guest, #16226) [Link]

We could use the original Dutch P and V for semaphore naming :)

Reworking the semaphore interface

Posted Dec 18, 2005 23:42 UTC (Sun) by farnz (guest, #17727) [Link]

For those of us who haven't researched semaphores thoroughly, and who are unfortunately monolingual, would you mind explaining the original Dutch P and V?

I'm assuming that they're both abbreviations for Dutch words, but as I speak English adequately, and can just about speak enough French to be understood (usually discovering in the process that the French speak better English than I do), I'd appreciate an explanation.

Dutch origins

Posted Dec 19, 2005 4:29 UTC (Mon) by xoddam (subscriber, #2322) [Link]

V is for 'verhoog', or 'increment', while P is for the neologism 'prolaag', formed from 'probeer te verlagen', 'try to decrement'.

See http://www.cs.utexas.edu/users/EWD/transcriptions/EWD00xx/EWD51.html (Link found on Wikipedia)

Reworking the semaphore interface

Posted Dec 15, 2005 15:08 UTC (Thu) by jreiser (subscriber, #11027) [Link]

The improved clarity of identifying mutual exclusion [only] is welcome. However, a flag day (when the old usage no longer works at all) is painful. Could there be a compile-time option CONFIG_KMUTEX_USES_KSEMAPHORE which preserves source compatibility, perhaps even binary compatibility?

Reworking the semaphore interface

Posted Dec 15, 2005 22:02 UTC (Thu) by khim (subscriber, #9252) [Link]

"Could there be a compile-time option CONFIG_KMUTEX_USES_KSEMAPHORE which preserves source compatibility, perhaps even binary compatibility?"

Binary compatibility is not even being considered (as usual); source compatibility is under discussion.

Reworking the semaphore interface

Posted Dec 18, 2005 9:14 UTC (Sun) by hamjudo (subscriber, #363) [Link]

I am a very occasional kernel hacker who mostly codes by looking at other device drivers. If I model my code after obsolete code or documentation, I don't want source code compatibility. I want a compile-time error message that is easy to google.

If there is a flag day, could someone do people like me a favor? Attempt to compile a function written in the old style, with headers for the new style, then paste the error message into a comment attached to this article. Or, if the details change, attach it to the article describing the updated change.

If I forget about this change when it matters and I make the mistake, google will lead me to an article that tells me the right way to do it.

Why not both?

Posted Dec 15, 2005 18:23 UTC (Thu) by Ross (subscriber, #4065) [Link]

People can have their flag day and also have clearer function names.

Specifically, get rid of up() and down() (those names are too short anyway).

Force all users to either move to sem_up() and sem_down() or sleep_lock() and sleep_unlock(). The advantage (in addition to clarity) is that you will be able to tell old uses of up() and down() from new ones. As code is converted it will be clear which set of functions is intended to be used and code which has not been converted will also be easy to identify.

Why not both?

Posted Dec 20, 2005 11:36 UTC (Tue) by arafel (subscriber, #18557) [Link]

Too short? Short can be good. :-)

Must admit I prefer something like mutex_take and mutex_release rather than the sleep_lock names, but it's only a name...

Debugging misconverted semaphores

Posted Dec 15, 2005 18:45 UTC (Thu) by jzbiciak (✭ supporter ✭, #5246) [Link]

The debug side of this seems simple enough, at least in theory. To catch semaphores that were incorrectly converted to mutexes, have a compile option that compiles mutexes as semaphores and adds a BUG_ON() call to fault whenever the semaphore count is something other than 0 or 1.

Make the debug mode the default in the first major kernel release, sit back, and watch the popcorn pop. :-)
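
A rough single-threaded sketch of that idea in userspace C follows. The debug_mutex type and its functions are hypothetical, and BUG_ON() is replaced here by a stand-in that counts violations instead of halting, so the check can be exercised without crashing.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the kernel's BUG_ON(): records the fault instead of oopsing. */
static int bug_hits;
#define BUG_ON(cond)                                \
    do {                                            \
        if (cond) {                                 \
            bug_hits++;                             \
            fprintf(stderr, "BUG: %s\n", #cond);    \
        }                                           \
    } while (0)

/* Debug build: the "mutex" keeps a full counter so misuse is observable. */
struct debug_mutex { int count; };  /* hypothetical name */

static void debug_mutex_init(struct debug_mutex *m)
{
    assert(m != NULL);
    m->count = 1;  /* unlocked */
}

/* Returns 0 if acquired, 1 if the mutex is already held. */
static int debug_mutex_trylock(struct debug_mutex *m)
{
    if (m->count <= 0)
        return 1;
    m->count--;
    BUG_ON(m->count != 0 && m->count != 1);
    return 0;
}

static void debug_mutex_unlock(struct debug_mutex *m)
{
    m->count++;
    /* A count of 2 or more means somebody is using this as a counter. */
    BUG_ON(m->count != 0 && m->count != 1);
}
```

An unlock without a matching lock pushes the count to 2 and trips the check, which is the signature of a counting semaphore that was converted by mistake.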

Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds