
The trouble with volatile

Your editor's copy of The C Programming Language, Second Edition (copyright 1988, still known as "the new C book") has the following to say about the volatile keyword:

The purpose of volatile is to force an implementation to suppress optimization that could otherwise occur. For example, for a machine with memory-mapped input/output, a pointer to a device register might be declared as a pointer to volatile, in order to prevent the compiler from removing apparently redundant references through the pointer.

C programmers have often taken volatile to mean that the variable could be changed outside of the current thread of execution; as a result, they are sometimes tempted to use it in kernel code when shared data structures are being used. Andrew Morton recently called out use of volatile in a submitted patch, saying:

The volatiles are a worry - volatile is said to be basically-always-wrong in-kernel, although we've never managed to document why, and i386 cheerfully uses it in readb() and friends.

In response, Randy Dunlap pulled together some email from Linus on the topic and suggested to your editor that he could maybe help "document why." Here is the result.

The point that Linus often makes with regard to volatile is that its purpose is to suppress optimization, which is almost never what one really wants to do. In the kernel, one must protect accesses to data against race conditions, which is very much a different task.

Like volatile, the kernel primitives which make concurrent access to data safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent unwanted optimization. If they are being used properly, there will be no need to use volatile as well. If volatile is still necessary, there is almost certainly a bug in the code somewhere. In properly-written kernel code, volatile can only serve to slow things down.

Consider a typical block of kernel code:

    spin_lock(&the_lock);
    do_something_on(&shared_data);
    do_something_else_with(&shared_data);
    spin_unlock(&the_lock);

If all the code follows the locking rules, the value of shared_data cannot change unexpectedly while the_lock is held. Any other code which might want to play with that data will be waiting on the lock. The spinlock primitives act as memory barriers - they are explicitly written to do so - meaning that data accesses will not be optimized across them. So the compiler might think it knows what will be in shared_data, but the spin_lock() call will force it to forget anything it knows. There will be no optimization problems with accesses to that data.

If shared_data were declared volatile, the locking would still be necessary. But the compiler would also be prevented from optimizing access to shared_data within the critical section, when we know that nobody else can be working with it. While the lock is held, shared_data is not volatile. This is why Linus says:

Also, more importantly, "volatile" is on the wrong _part_ of the whole system. In C, it's "data" that is volatile, but that is insane. Data isn't volatile - _accesses_ are volatile. So it may make sense to say "make this particular _access_ be careful", but not "make all accesses to this data use some random strategy".

When dealing with shared data, proper locking makes volatile unnecessary - and potentially harmful.

The volatile type qualifier was originally meant for memory-mapped I/O registers. Within the kernel, register accesses, too, should be protected by locks, but one also does not want the compiler "optimizing" register accesses within a critical section. But, within the kernel, I/O memory accesses are always done through accessor functions; accessing I/O memory directly through pointers is frowned upon and does not work on all architectures. Those accessors are written to prevent unwanted optimization, so, once again, volatile is unnecessary.
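As a rough illustration, such an accessor boils down to a volatile-qualified dereference wrapped in a function. The my_readb() name below is hypothetical, and the real readb() implementations vary by architecture; this is only a sketch of the idea:

    /*
     * A minimal sketch, not the kernel's actual implementation: the
     * volatile-qualified cast is what keeps the compiler from caching
     * or eliminating the access to the device register.
     */
    static inline unsigned char my_readb(const volatile void *addr)
    {
            return *(const volatile unsigned char *)addr;
    }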

Another situation where one might be tempted to use volatile is when the processor is busy-waiting on the value of a variable. The right way to perform a busy wait is:

    while (my_variable != what_i_want)
        cpu_relax();

The cpu_relax() call can lower CPU power consumption or yield to a hyperthreaded twin processor; it also happens to serve as a memory barrier, so, once again, volatile is unnecessary. Of course, busy-waiting is generally an anti-social act to begin with.

There are still a few rare situations where volatile makes sense in the kernel:

  • The above-mentioned accessor functions might use volatile on architectures where direct I/O memory access does work. Essentially, each accessor call becomes a little critical section on its own and ensures that the access happens as expected by the programmer.

  • Inline assembly code which changes memory, but which has no other visible side effects, risks being deleted by GCC. Adding the volatile keyword to asm statements will prevent this removal (see the example after this list).

  • The jiffies variable is special in that it can have a different value every time it is referenced, but it can be read without any special locking. So jiffies can be volatile, but the addition of other variables of this type is frowned upon. Jiffies is considered to be a "stupid legacy" issue in this regard.
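To illustrate the inline assembly case: the kernel's barrier() primitive is, in essence, an empty asm statement kept alive by volatile, with a "memory" clobber added so that the compiler does not carry cached memory values across it:

    /*
     * Essentially how the kernel defines barrier(): the volatile keeps
     * GCC from deleting the (empty) asm, and the "memory" clobber keeps
     * it from reusing memory values cached before the statement.
     */
    #define barrier() __asm__ __volatile__("" : : : "memory")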

For most code, none of the above justifications for volatile apply. As a result, the use of volatile is likely to be seen as a bug and will bring additional scrutiny to the code. Developers who are tempted to use volatile should take a step back and think about what they are truly trying to accomplish.

(Thanks to Randy Dunlap for getting things started and researching the issue, and to Satyam Sharma and Johannes Stezenbach for comments on the first draft of this article.)

Index entries for this article
Kernel: Coding style
Kernel: volatile



The trouble with volatile

Posted May 10, 2007 17:38 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link]

One other use for volatile is when communicating with interrupt/NMI handlers running on the same CPU. In this case, all the accesses are on the same CPU, so CPU reordering is irrelevant -- CPUs see their own accesses in order. The only requirement is that the compiler avoid optimizations that re-order the code.

Note that this applies only to hardware interrupts or NMIs -- this technique does not necessarily work for "interrupt handlers" that are passed off to threads.

So this is a specialized technique (especially in -rt), but an entirely valid one.
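A minimal sketch of this pattern, in kernel context (the names irq_seen, my_isr and wait_for_irq are hypothetical, and cpu_relax() is assumed to be available through the usual headers):

    #include <linux/interrupt.h>

    /*
     * A flag shared between mainline code and an interrupt handler
     * running on the same CPU.
     */
    static volatile int irq_seen;

    static irqreturn_t my_isr(int irq, void *dev_id)
    {
            irq_seen = 1;           /* a CPU sees its own stores in order */
            return IRQ_HANDLED;
    }

    static void wait_for_irq(void)
    {
            while (!irq_seen)       /* volatile forces a fresh load each pass */
                    cpu_relax();
    }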

The trouble with volatile

Posted May 10, 2007 22:04 UTC (Thu) by ncm (guest, #165) [Link] (4 responses)

People who think they want something volatile should know that assigning to a volatile variable, or through a pointer to volatile, generates a store instruction followed by a load instruction. The load is normally optimized away, but the language definition says it's there (because an assignment has a value), and you said you didn't want optimization. Sometimes the extra load just slows things down, but if the address is really a register, it may be actively wrong.

The trouble with volatile

Posted May 11, 2007 5:54 UTC (Fri) by jzbiciak (guest, #5246) [Link] (3 responses)

Huh? That makes no sense. The *assignment statement* has a value, sure, and that's the value referenced in any outer context. For example:

volatile int   a;
volatile short b;
volatile int   c;

a = b = c;

This code will read 'c', cast it to the type of 'b', and then write it to 'b'. That whole expression, 'b = c', takes on the same value as was written to 'b'. That value gets written to 'a'. You aren't reading 'b' and assigning it to 'a'; you're assigning the value of the expression 'b = c', plain and simple. That makes sense. Let's suppose 'b' gets modified by an interrupt handler, and the interrupt happens right after the write to 'b'. The code will always write the same value to 'a' regardless of whether 'b' even takes on the value written to it.

Now, on a compiler that doesn't do register allocation, or for which that's disabled, the value of the expression 'b = c' might get written to a compiler temporary in memory, and the compiler issues writes and reads for that location. I assert it's actually incorrect to read 'b' and write it to 'a' when writing 'a = b = c'.

If all assignments had a hidden read behind them, then Duff's Device probably would never have worked. The motivation for Duff's Device was fast writes through a volatile pointer to a hardware FIFO.
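For reference, a version of Duff's Device with the destination qualified volatile to match that scenario (Duff's 1983 original actually predates the volatile keyword); it assumes count is greater than zero, and 'to' is never incremented because it addresses a single output register:

    static void send(volatile short *to, short *from, int count)
    {
            int n = (count + 7) / 8;        /* assumes count > 0 */

            switch (count % 8) {
            case 0: do {    *to = *from++;
            case 7:         *to = *from++;
            case 6:         *to = *from++;
            case 5:         *to = *from++;
            case 4:         *to = *from++;
            case 3:         *to = *from++;
            case 2:         *to = *from++;
            case 1:         *to = *from++;
                    } while (--n > 0);
            }
    }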

Now, C++ on the other hand... they allow assignments to be lvalues, which causes all sorts of wackiness. And so C++ apparently does behave somewhat as you describe, at least in the 'a = b = c' case. (An assignment in isolation still doesn't have a phantom read.) It bolsters my argument above that they consider this a deviation from how C behaves. Take a peek here.

At any rate, since we're talking about the Linux kernel, we're talking C, not C++.

The trouble with volatile

Posted May 11, 2007 6:45 UTC (Fri) by jzbiciak (guest, #5246) [Link]

Hmmm...

I did some playing around, and it appears different compilers treat the a = b = c case rather differently than I expected. Some *do* in fact read 'b' after writing it (which just seems crazy to me).

I've asked our compiler optimizer lead developer at work his take on this topic.

In the meantime, allow me to throw a mea culpa out there and then I'll shut up. :-)

The trouble with volatile

Posted May 15, 2007 9:15 UTC (Tue) by IkeTo (subscriber, #2122) [Link] (1 responses)

I think the original response says the following simpler code will cause a store plus load:

    volatile int a;
    void f() {
            a = 2;
    }

The argument is that "a = 2" has a value which requires a load to find, and the compiler is forbidden from optimizing out that load. I don't think the code generated by my 4.1.1 gcc reflects this, however:

        movl    $2, a
        ret

So perhaps the OP is speaking from rather old experience or a different compiler.

The trouble with volatile

Posted May 15, 2007 12:20 UTC (Tue) by jzbiciak (guest, #5246) [Link]

I spoke to the compiler optimizer lead at work. He assures me that assignment ONLY implies a write, even if the value of the assignment expression is used in a subsequent expression.

At least as far as he's concerned, "a = 2" should only compile to a write, and "a = b = 2" should compile to two writes. "a = b = c" should compile to one read (reading 'c'), and two writes (to 'a' and 'b').

That said, our compiler actually compiled "a = b = c" more like "b = c; a = c" (reading 'c' twice), which he indicated was a bug. GCC seems to compile "a = b = c" more like "b = c; a = b", which is actually pretty close to what the OP was suggesting would happen. (Read 'c', Write 'b'. Read 'b', write 'a'.) Fun stuff.
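Spelled out as plain C, a hypothetical illustration of the two readings, using the same volatile declarations as earlier in the thread:

    volatile int   a;
    volatile short b;
    volatile int   c;

    void assign_without_reread(void)    /* what the compiler lead argues is correct */
    {
            short tmp = (short)c;       /* one read of c */
            b = tmp;                    /* write b */
            a = tmp;                    /* write a; b is never read back */
    }

    void assign_with_reread(void)       /* what gcc emitted at the time */
    {
            b = (short)c;               /* read c, write b */
            a = b;                      /* read b back, write a */
    }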

So... a secondary lesson is, "However YOU might have interpreted the C specification, chances are your compiler treats it differently for anything other than the simplest of expressions."

The trouble with volatile

Posted May 11, 2007 6:16 UTC (Fri) by jzbiciak (guest, #5246) [Link]

Hiding 'volatile' in the various accessor macros, etc. is entirely different than bare 'volatile' in the code. The statement that "volatile is said to be basically-always-wrong in-kernel" applies to the second case in my opinion.

The platform and compiler specific trickery is safely hidden away, and the rest of the kernel "does the right thing" no matter where it runs.

This document will keep you busy for a while. Heavy stuff.

The trouble with volatile

Posted May 11, 2007 21:50 UTC (Fri) by giraffedata (guest, #1954) [Link] (4 responses)

Linus' conclusion may be right, but his reasoning is all wrong.

First, while the purpose (reason for existence) of "volatile" is to prevent optimization, that is not its definition. The keyword "volatile" does not say not to optimize; it says the value may change all by itself. So the fact that disabling optimization is a poor goal is not an argument against using "volatile."

Then Linus misinterprets the English word "volatile" and claims that data isn't volatile; rather accesses are volatile. The opposite is true.

I think he should have instead pointed out that some data is known to be involatile some of the time, so it is better if you can declare specifically when the data is volatile rather than just say it is volatile in general.

But the ways provided are just as over-general in the other direction. They say that, within a certain interval, every memory location is volatile. There may be variables that due to circumstances cannot change while cpu_relax() runs, yet the program unnecessarily rereads them from memory.

I don't doubt that analysis shows that this over-generalization is better than the other over-generalization, but at least we should understand what comparison we're making.

The trouble with volatile

Posted May 14, 2007 7:16 UTC (Mon) by xoddam (subscriber, #2322) [Link]

Thanks giraffedata -- you have summed up all my niggling doubts about Linus' argument. His sweeping generalisations apply as well to explicit barriers as they do to volatile; they are undoubtedly the best solution to the problems they are used in the Linux kernel to solve, but they are not so much less general that his argument sweeps all before it.

What you don't mention (and Linus does) is that volatile can actually produce incorrect (as well as inefficient) code, because the compiler is not necessarily able to produce the right kind of barrier operation. Compilers know about instruction architectures and, maybe, something about which optimisations work best for which processor families, but they aren't oracles on the most appropriate bus synchronisation operations for every system architecture.

The trouble with volatile

Posted May 15, 2007 11:12 UTC (Tue) by IkeTo (subscriber, #2122) [Link] (2 responses)

> it says the value may change all by itself.

I think the inaccuracy and uselessness of this statement is what Linus is really objecting to. With this being inaccurate and useless, the compiler is forced to do useless things as a result. But those are side effects; the real culprit is still the inaccuracy and uselessness.

I believe the inaccuracy of "volatile" is that it says the memory is always changing, which is not really the case. The memory is changed by some other threads, or by some reordering done by some smart CPUs. It is definitely not "changing all the time", with the exception of I/O mapped memory which might.

I think it is mostly useless, because in most cases a simple "volatile" will not do what is intended, and to do what is intended you need something that will make volatile redundant.

E.g., if you are afraid of some other threads changing your variables, you need a lock to prevent that, rather than just advice to the compiler not to optimize things out; and once you get that, the CPU reordering is dealt with, and the variable is no longer changing at all within the critical region, so you don't want volatile.

Another reply supplies another example: if you do inter-processor communications, "volatile" is not sufficient, because the instruction reordering done by the CPU causes the result to be erratic. You need something like barrier(), rmb(), wmb(), etc. Once you have that, "volatile" is redundant. Yes, those operations are over-general, but at least they work.
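A minimal sketch of that flag-plus-data pattern, in kernel context (the names shared_buf, data_ready, publish() and consume_when_ready() are hypothetical; headers are omitted, and wmb(), rmb() and cpu_relax() are the usual kernel primitives):

    static int shared_buf;
    static int data_ready;

    void publish(int value)             /* producer, on one CPU */
    {
            shared_buf = value;
            wmb();                      /* order the data store before the flag store */
            data_ready = 1;
    }

    int consume_when_ready(void)        /* consumer, on another CPU */
    {
            while (!data_ready)
                    cpu_relax();        /* also a compiler barrier, so the flag is re-read */
            rmb();                      /* order the flag load before the data load */
            return shared_buf;
    }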

What if you really want to do I/O memory mapped read/write? Again, you need memory barriers for the code to work correctly, and once you have them, you don't need volatile.

> Then Linus misinterprets the English word "volatile" and claims that data
> isn't volatile; rather accesses are volatile. The opposite is true.

I agree that it is difficult to characterize whether an "access" is "volatile" or not, and I admit that I don't quite understand the statement. On the other hand, while "data is volatile" might be true for some time during the execution of the kernel, the "volatile" in the C language does not provide the abstraction that you need most of the time.

The trouble with volatile

Posted May 16, 2007 11:43 UTC (Wed) by massimiliano (subscriber, #3048) [Link] (1 responses)

I agree that it is difficult to characterize whether an "access" is "volatile" or not, and I admit that I don't quite understand the statement. On the other hand, while "data is volatile" might be true for some time during the execution of the kernel, the "volatile" in the C language does not provide the abstraction that you need most of the time.

A funny thing, that made me ring a bell...

In the .NET bytecode specification (called "CIL", Common Intermediate Language), there is the possibility to define something as "volatile", and... guess what? it is accesses that can be marked so, not variables!

This reinforces my idea that the .NET standard is very well thought out: they really learned from the mistakes of the past, at least on technical issues.

The trouble with volatile

Posted May 16, 2007 16:07 UTC (Wed) by giraffedata (guest, #1954) [Link]

In the .NET bytecode specification (called "CIL", Common Intermediate Language), there is the possibility to define something as "volatile", and... guess what? it is accesses that can be marked so, not variables!

This reinforces my idea that the .NET standard is very well thought out: they really learned from the mistakes of the past, at least on technical issues.

You're implying that "volatile" in C was a mistake. It wasn't. While it isn't useful for coordinating threads on separate processors in a 2007 Linux kernel, it's just fine for what it was designed for: coordinating a CPU with a memory-mapped I/O device in the early '80s.

I don't know anything about .NET, but I think the most you can conclude from this volatile thing is that .NET is designed for more modern computers than ANSI C. And that the spec is written poorly -- "volatile" is the wrong word for this.

The trouble with volatile

Posted May 12, 2007 3:46 UTC (Sat) by mikov (guest, #33179) [Link] (4 responses)

There have been some very fun discussions about volatile on comp.std.c in the past. Basically my opinion, although I have never been able to convince anybody, is that volatile is _always_ wrong even in user mode. Period. It serves no purpose whatsoever that couldn't be achieved much better and more reliably in a different way. To put it another way, volatile is useless in a 100% standard-compliant C program, and if the program is not 100% compliant, then it is even more useless.

Also, not all compilers implement it. Notably Borland C in some of its versions didn't.

People might argue that it is useful with signals or longjmp(), but the need for volatile in such scenarios is always an indication of a serious problem in the code.

The way I find it easy to think about it or explain it is that volatile does nothing to prevent the CPU from reordering memory accesses, not to mention the visibility of memory accesses across different CPUs. So it just muddles the issues.

I want to open a big bracket here and mention that volatile in Java is an entirely different beast and is much, much more useful. According to the Java memory model, volatile actually has very useful guarantees for memory visibility. (Perhaps Linux should consider Java for the kernel :-)

Still, I agree with what has already been said earlier: it is conceptually wrong to mark data as volatile. Accesses should be marked, not the data itself. So, Java doesn't get it 100% right either. (But of course it wasn't possible to add lightweight volatile accesses to Java without modifying the language.)

The trouble with volatile

Posted May 14, 2007 15:07 UTC (Mon) by BenHutchings (subscriber, #37955) [Link] (1 responses)

"People might argue that it is useful with signals or longjmp(), but the need for volatile in such scenarios is always an indication of a serious problem in the code."

These are the two cases defined by the standard where volatile *must* be used for certain variables. Memory-mapped I/O is the third case it was intended for, but as that's inherently unportable the standard doesn't explicitly mention it. (I think the rationale does.)
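For the signal case, the standard's guarantee is limited to objects declared volatile sig_atomic_t; a minimal userspace illustration of that idiom (names are arbitrary):

    #include <signal.h>

    /* The only static object a standard C signal handler may safely
     * assign to is one declared volatile sig_atomic_t. */
    static volatile sig_atomic_t got_signal = 0;

    static void handler(int sig)
    {
            got_signal = 1;
    }

    int main(void)
    {
            signal(SIGINT, handler);
            while (!got_signal)
                    ;           /* volatile forces a fresh read each iteration */
            return 0;
    }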

The trouble with volatile

Posted May 14, 2007 17:30 UTC (Mon) by mikov (guest, #33179) [Link]

You are right. IIRC, it is required for local variables with setjmp(). I forgot the other case. What was it? (Hmm, I have my C99 standard somewhere here...)

In any case, it doesn't mean that it actually is a good idea. Requiring volatile for setjmp() is a terrible, fragile, ugly hack. It's been some time since I thought about this, but off the top of my head:

This requirement presumes that the compiler doesn't know anything about setjmp()/longjmp(). What if the function invoking setjmp() gets auto-inlined ? Obviously the programmer can't be required to use volatile for all variables in all functions up the call tree. What is to prevent the auto-inlining ?

I don't view the standard as a mantra. After all, it did standardize "atoi"... There is no excuse for that. There are things in C99 that strike me as absolutely horrible.

The trouble with volatile

Posted May 16, 2007 8:38 UTC (Wed) by roelofs (guest, #2599) [Link]

The way I find it easy to think about it or explain it is that volatile does nothing to prevent the CPU reordering memory accesses, not to mention the visibility of memory accesses across different CPUs.

Well, more generally, the C and C++ languages themselves do nothing to prevent such problems (unlike Java)--they are blissfully unaware of all concurrency-related issues, and as a consequence, there is nothing you can do about any of it using only language-provided constructs. Some form of assembly-level support is always required.

Note that, fundamentally, it's the same problem afflicting the Double-Checked Locking Pattern (C/C++ version). Scott Meyers' (co-authored) 2004 DDJ article describes the issues quite nicely.

Greg

The trouble with volatile

Posted May 16, 2007 11:48 UTC (Wed) by massimiliano (subscriber, #3048) [Link]

I want to open a big bracket here and mention that volatile in Java is an entirely different beast and is much, much more useful. According to the Java memory model, volatile actually has very useful guarantees for memory visibility. (Perhaps Linux should consider Java for the kernel :-)

Or C#... it seems that the .NET platform is even more specific than Java in this sense :-)

A fourth case

Posted Mar 29, 2011 21:43 UTC (Tue) by willy (subscriber, #9762) [Link]

I have a device which will DMA entries to a block of memory. To determine if an entry is new or old, there is a phase bit which is flipped each time around the ring. So I rely on the compiler actually loading from memory, and not using a cached copy of that bit.

See uses of volatile in http://git.kernel.org/?p=linux/kernel/git/willy/nvme.git;...
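A hedged sketch of that pattern (the structure and names are hypothetical, not taken from the driver linked above):

    struct ring_entry {
            unsigned short status;          /* bit 0 is the phase bit */
            /* ... rest of the DMAed entry ... */
    };

    static int entry_is_new(volatile struct ring_entry *e, int phase)
    {
            /* The volatile-qualified pointer forces a real load of the
             * status word on every poll, rather than a cached copy. */
            return (e->status & 1) == phase;
    }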

