
C11 atomic variables and the kernel

By Jonathan Corbet
February 18, 2014
The C11 standard added a number of new features for the C and C++ languages. One of those features — built-in atomic types — seems like it would naturally be of interest to the kernel development community; for the first time, the language standard tries to address concurrent access to data on contemporary hardware. But, as recent discussions show, it may be a while before C11 atomics are ready for use with the kernel — if they ever are — and the kernel community may not feel any great need to switch.

The kernel provides a small set of atomic types now, along with a set of operations to manipulate those types. Kernel atomics are a useful way of dealing with simple quantities in an atomic manner without the need for explicit locking in the code. C11 atomics should be useful for the implementation of the kernel's atomic types, but their scope goes beyond that application.

In particular, each access to a C11 atomic variable has an explicit "memory model" associated with it. Memory models describe how accesses to memory can be optimized by the processor or the compiler; the more relaxed models can allow operations to be reordered or combined for improved performance. The default model ("sequentially consistent") is the strictest; it does not allow any combining or reordering of operations in any way that would be visible anywhere else in the program. The problem with this model is that it is quite expensive, and, most of the time, that expense does not need to be incurred for correct operation. The more relaxed models exist to allow for optimizations to be performed in a controlled manner while ensuring correct ordering when needed.
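As a small, invented illustration (not taken from the article or the kernel), the same counter update can be requested under either model through C11's <stdatomic.h>:

    #include <stdatomic.h>

    atomic_int counter;

    void count_event(void)
    {
        /* No memory order given: C11 defaults to memory_order_seq_cst,
           the strictest (and most expensive) model. */
        atomic_fetch_add(&counter, 1);
    }

    void count_event_relaxed(void)
    {
        /* The relaxed model imposes no ordering at all, which is fine
           when only the final count matters. */
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    }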

Thus, C11 atomic variables include features that, in the kernel, are usually implemented with memory barriers. So, for example, in current kernel code, one could see something like:

    smp_store_release(&x, new_value);

The smp_store_release() barrier (described in more detail in this article) tells the processor to ensure that any reads or writes executed before this assignment are visible on all processors before the assignment to x becomes visible. Reordering of operations that all occur before this barrier is still possible, as is the reordering of operations that all occur afterward. In most code, quite a bit of reordering can take place without affecting the correctness of the result. The use of explicit barriers in the places where ordering does matter allows most accesses to be performed without barriers, enabling optimization and improving performance significantly.
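To show how the barrier is meant to be paired, here is a sketch with placeholder names (data, compute_data(), and use() are all illustrative); the reader side uses the kernel's matching smp_load_acquire() primitive:

    /* Writer: publish the data, then set x with release semantics. */
    data = compute_data();
    smp_store_release(&x, new_value);

    /* Reader: an acquire load pairs with the release store; if the new
       value of x is seen, the earlier write to data is visible too. */
    if (smp_load_acquire(&x) == new_value)
        use(data);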

If, instead, x were a C11 atomic type, one might write:

    atomic_store(&x, new_value, memory_order_release);

Where memory_order_release specifies the same ordering requirements as smp_store_release(). (See this page for a description of the C11 memory models).

If the memory_order_relaxed model (which imposes no ordering requirements on the access) is used for surrounding accesses to other atomic variables where ordering is not important, the end result should be similar to that achieved with smp_store_release(). But the former version is implemented with tricky, architecture-specific code within the kernel; the latter version, instead, causes the desired code to be emitted directly by the compiler.
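For comparison, a minimal C11 sketch of the same publication pattern (using the standard's *_explicit spellings and invented names; this is an illustration, not kernel code) shows that only the flag accesses carry any ordering requirement:

    #include <stdatomic.h>

    static int data;           /* ordinary, non-atomic payload */
    static atomic_int ready;   /* publication flag */

    void publish(void)
    {
        data = 42;             /* plain store; no barrier needed here */
        atomic_store_explicit(&ready, 1, memory_order_release);
    }

    int consume(void)
    {
        if (atomic_load_explicit(&ready, memory_order_acquire))
            return data;       /* guaranteed to see the value stored above */
        return -1;             /* not yet published */
    }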

When the kernel first gained support for multiprocessor systems, the C language had no concept of atomic types or memory barriers, so the kernel developers naturally had to create their own. Now that the language standard has caught up, one might think that changing the kernel to make use of the standard atomic types would make sense. And, someday, it might, but that transition is likely to be slow and fitful at best.

Optimization worries

The problem is that compilers tend to be judged on the speed of the code they generate, so compiler developers have a strong incentive to optimize code to the greatest extent possible. Sometimes those optimizations can break code that is not written with an attentive eye toward the standard; the kernel developers' perspective is that compiler developers will often rely on a legalistic reading of standards to justify "optimizations" that (from the kernel developer's viewpoint) make no sense and break code needlessly. Highly concurrent code, as is found in the kernel, tends to be more susceptible to optimization-caused problems than just about anything else. So kernel developers have learned to be careful.

One of the scariest potential problems is "speculative stores," where an incorrect value becomes visible on a temporary basis. A classic example would be code like this:

    if (x)
	y = 1;
    else
	y = 2;

It would not be uncommon for a compiler to optimize this code by turning it into something like this:

    y = 2;
    if (x)
	y = 1;

For sequential code operating in its own address space, the end result is the same, and the latter version avoids one jump. But if y is visible elsewhere, the value stored speculatively before the test may be seen by code that will proceed to do the wrong thing, causing things to go off the rails. Clearly, optimizations that cause incorrect values to become visible to any running thread must be avoided if the system is to run correctly.

When David Howells recently suggested that C11 atomic variables could be used in the kernel, speculative stores were one of the first concerns to be raised. The behavior of atomic variables as described by the standard is complex, to put it lightly, and there were real worries that the standard could allow compilers to generate speculative writes. An extensive and sometimes colorful discussion put most of those concerns to rest, but Paul McKenney, who has been representing the kernel's interests within the standard committee, is still not completely sure:

From what I can see at the moment, the standard -generally- avoids speculative stores, but there are a few corner cases where it might allow them. I will be working with the committee to see exactly what the situation is.

Another area of concern is control dependencies: situations where atomic variables and control flow interact. Consider a simple bit of code:

    x = atomic_load(&a, memory_order_relaxed);
    if (x)
  	atomic_store(&y, 42, memory_order_relaxed);

The setting of y has a control dependency on the value of x. But the C11 standard does not currently address control dependencies at all, meaning that the compiler or processor could play with the order of the two atomic operations, or even try to optimize the branch out altogether; see this explanation from GCC developer Torvald Riegel for details. Again, the results of this kind of optimization in the kernel context could be disastrous.
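One concrete way the dependency can evaporate, sketched here in the same shorthand as the snippet above rather than taken from the thread, is when both arms of the branch perform the same store; the compiler can hoist that store out of the conditional, after which nothing ties the two relaxed operations together:

    /* If both paths end up storing the same value ... */
    x = atomic_load(&a, memory_order_relaxed);
    if (x)
        atomic_store(&y, 42, memory_order_relaxed);
    else
        atomic_store(&y, 42, memory_order_relaxed);

    /* ... the compiler may, as far as the standard is concerned, emit: */
    x = atomic_load(&a, memory_order_relaxed);
    atomic_store(&y, 42, memory_order_relaxed);

    /* With no control dependency left, the relaxed store can be reordered
       ahead of the relaxed load by the compiler or the processor. */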

For cases like this, Paul suggested that some additional source-code markup and a new memory_order_control memory model could be used in the kernel to make the control dependency explicit:

    x = atomic_load(&a, memory_order_control);
    if (control_dependency(x))
  	atomic_store(&b, 42, memory_order_relaxed);

But this approach is unlikely to be taken, given just how unhappy Linus was with the idea. From his point of view, the control dependency should be obvious — the code is testing the value of x, after all. Any compiler that would move the atomic_store() operation in an externally visible way, he said, is simply broken.

There has also been some concern about "value speculation," wherein the compiler guesses that a variable will have a specific value and inserts a branch to fix things up if the guess is wrong. The processor's branch prediction hardware will then, hopefully, speed things up in cases where the guess is correct. See this note from Paul for an example of how value speculation might work — and how it might get things wrong. The good news on this front is that it seems that this kind of speculation will not be allowed. But it is not 100% clear that the current standard forbids it in all cases.
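As a hedged sketch of what value speculation could look like (an invented fragment, not Paul's example): the compiler guesses which node the pointer refers to, and the guessed load no longer carries a data dependency on the pointer, so the hardware is free to satisfy it early with stale data:

    #include <stdatomic.h>

    struct node { int data; };
    extern _Atomic(struct node *) head;
    extern struct node likely_node;    /* the value the compiler guesses */

    /* What the source says: the load of p->data depends on p. */
    int read_head(void)
    {
        struct node *p = atomic_load_explicit(&head, memory_order_consume);
        return p->data;
    }

    /* What a value-speculating compiler might emit, in effect: */
    int read_head_speculated(void)
    {
        struct node *p = atomic_load_explicit(&head, memory_order_consume);
        if (p == &likely_node)
            return likely_node.data;   /* no longer depends on p, so this
                                          load may be satisfied before the
                                          load of head, returning stale data */
        return p->data;
    }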

Non-local optimizations considered harmful

Yet another concern is global optimization. Compiler developers are increasingly trying to optimize programs at the level of entire source files, or even larger groups of files. This kind of optimization can work well as long as the compiler truly understands how variables are used. But the compiler is not required to understand the real hardware that the program is running on; it is, instead, required to prove its decisions against a virtual machine defined by the standard. If the real computer behaves in ways that differ from the virtual machine, things can go wrong.

Consider this example raised by Linus: the compiler might look at how the kernel accesses page table entries and notice that no code ever sets the "page dirty" bit. It might then conclude that any tests against that bit could simply be optimized out. But that bit can change; it's just that the hardware makes the change, not the kernel code. So any optimizations made based on the notion that the compiler can "prove" that bit will never be set will lead to bad things. Linus concluded: "Any optimization that tries to prove anything from more than local state is by definition broken, because it assumes that everything is described by the program."
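In rough terms, the worry is an optimization along the lines of the fragment below. It is an invented illustration, not actual kernel code, and the writeback helper is made up, but pte_val() and _PAGE_DIRTY are the sort of interfaces involved:

    /* No kernel code ever stores _PAGE_DIRTY into the entry; the MMU sets
       the bit behind the compiler's back when the page is written. */
    if (pte_val(*ptep) & _PAGE_DIRTY)     /* a global analysis might "prove"
                                             this can never be true ... */
        queue_page_for_writeback(page);   /* ... and quietly drop the call */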

Paul sent out a list of other situations where the compiler's virtual machine model might not match what is really happening. His examples included assembly code, kernel modules (which can access exported symbols, but which might not even exist when the compiler is making its decisions), kernel-space memory mapped into user space, JIT-compiled BPF code, and "probably other stuff as well". In short, there is a lot going on inside a kernel that the compiler cannot be expected to know about.

One solution to many of these non-local problems is to use volatile with the affected variables. Simply identifying such variables would be an error-prone exercise, of course, but there is a worse problem: using volatile turns off all optimization for the affected variable, defeating the purpose of using atomic variables in the first place. If volatile must be used, the kernel is better off staying with its current memory barrier scheme, which is designed to allow as much compiler- and processor-level optimization as possible, but no more than that.

Will it come to that? Despite his worries, Linus has actually expressed some confidence that real-world compilers will not break things badly:

In *practice*, I seriously doubt any reasonable compiler can actually make a mess of it. The kinds of optimizations that would actually defeat the dependency chain are simply not realistic. And I suspect that will end up being what we rely on - there being no actual sane sequence that a compiler would ever do, even if we wouldn't have guarantees for some of it.

But he has also been clear that his trust of compiler developers only goes so far and that, if necessary, the kernel community is more than prepared to stick with its current approach, which, he said, is "generally *fine*".

Does the kernel need C11 atomics?

Linus went on to make it clear that he is serious about this; if atomic variables as found in the C11 standard and its implementations do not provide what the kernel wants, the kernel will simply not use that feature. The kernel project, he said, is in a fairly strong bargaining position when it comes to atomic variables:

And the thing is, I suspect that the Linux kernel is the most complete - and most serious - user of true atomics that the C11 people can sell their solution to.

If we don't buy it, they have no serious user. Sure, they'll have lots of random other one-off users for their atomics, where each user wants one particular thing, but I suspect that we'll have the only really unified portable code base that handles pretty much *all* the serious odd cases that the C11 atomics can actually talk about to each other.

On the other hand, he said, the solutions found in the kernel now work just fine; there is no real need to move away from them if the kernel community does not want to.

In truth, there may well be other serious users; the GNU C library is using C11 atomics now for a few architectures, for example. And, while Torvald agreed that the kernel could continue to use its own solution, he also pointed out that there would be some advantages to using the standard mechanism. The widespread testing that this mechanism will receive was at the top of his list. One could also note that the kernel's tricky, architecture-specific barrier code could conceivably go away, replaced by more widely used code maintained by the compiler developers. That code would also, hopefully, be less likely to break when new releases of the compiler come out.

Beyond that, Torvald pointed out, C11 atomics can benefit from a fair amount of academic work that has been done. Some researchers at the University of Cambridge have come up with a formal description [PDF] of how C11 concurrency should work. Associated with that description is an interactive memory model simulator that can test code snippets for race conditions. And, in the end, if a large number of programs make use of C11 atomics, that should result in the quality of compiler implementations improving quickly.

Finally, if C11 atomic variables can be made to work in real-world programs, they could go a long way toward the establishment of reliable patterns for how C (and C++) can be used in concurrent environments. At the moment, there is no way for developers to know what is safe to do — now, and in the future. As Peter Sewell (one of the above-mentioned Cambridge researchers) put it:

There are too many compiler optimisations for people to reason directly in terms of the set of all transformations that they do, so we need some more concise and comprehensible envelope identifying what is allowed, as an interface between compiler writers and users.

The C11 standard is meant to be that "envelope," though, as Peter admitted, it is "not yet fully up to that task". But if the remaining uncertainties and problems can be addressed, C11 atomics could become a common language with which developers can reason about concurrency and allowable optimizations. Developers might come to understand the issues better, and kernel code might become a bit more widely accessible to developers who understand the standard.

So it might well benefit the kernel to make use of this relatively new language feature. Nobody has closed the door on that possibility, but any transition in that direction will require a lot of time, testing, and confidence building. Bugs resulting from low-level concurrency management problems can be among the hardest to find, reproduce, or diagnose; nobody will be in a hurry to replace the kernel's atomics and memory barriers without a high level of assurance that the change will not result in the introduction of that kind of issue.


C11 atomic variables and the kernel

Posted Feb 18, 2014 23:25 UTC (Tue) by josh (subscriber, #17465) [Link]

The article quotes Linus saying:
> In *practice*, I seriously doubt any reasonable compiler can actually make a mess of it. The kinds of optimizations that would actually defeat the dependency chain are simply not realistic. And I suspect that will end up being what we rely on - there being no actual sane sequence that a compiler would ever do, even if we wouldn't have guarantees for some of it.

However, elsewhere in the thread (and in other threads), there have been discussions of real-world scenarios that compilers do *today*, most notably noticing common operations in both branches of a conditional and floating them before the actual conditional.

C11 atomic variables and the kernel

Posted Feb 19, 2014 0:05 UTC (Wed) by jwakely (subscriber, #60262) [Link]

> The C11 standard added a number of new features for the C and C++ languages.

Not really. The C++ core language is independent of C, and although the C++ standard library does include C's standard library by reference, it is still based on C99.

The C11 atomics were almost copy'n'pasted from C++11. All the work was done for C++, and C (sensibly) incorporated it wholesale.

C11 atomic variables and the kernel

Posted Feb 19, 2014 4:04 UTC (Wed) by jzbiciak (guest, #5246) [Link]

I remember hearing from somebody (either a Herb Sutter talk, or our OpenCL/OpenMP/etc. compiler guru at work) that C11 and C++11 differ in some subtle ways. For most things, users wouldn't be able to tell the difference, but there were some important corner conditions.

Unfortunately, my GoogleFu on the topic is weak tonight.

You wouldn't happen to know off-hand what differs between C11 and C++11, would you?

C11 atomic variables and the kernel

Posted Feb 19, 2014 5:57 UTC (Wed) by wahern (subscriber, #37304) [Link]

C++11 basically refused to adopt most of the changes in C99, including named initializers and compound literals. C11 additions like _Generic and (IIRC) anonymous struct members also didn't get added to C++11.

The relevant question isn't what C++11 didn't add, it's what it actually adopted from C99 and C11, which is very little. The languages have moved further apart with the latest standard.

The atomics work was one area where everybody intentionally worked together to make sure there'd be compatibility. C11 didn't simply pull it in from C++11; the cooperation was very intentional and explicit.

In any event, at this point I think it's safe to say that the C and C++ languages have pretty much parted ways. All the C++ committee cares about is that vendors can maintain ABI compatibility between C and C++ compiled units. C++ isn't interested in keeping up-to-date with C syntax. Some C syntax, like compound literals, I don't think even can be supported, given changes in C++11 regarding temporaries. (But I'm not a C++ programmer so I could be wrong on that last point.)

C11 atomic variables and the kernel

Posted Feb 19, 2014 6:11 UTC (Wed) by wahern (subscriber, #37304) [Link]

One adoption I forgot about is that C++11 added <stdint.h> and <cstdint>. Again, I think all C++ cares about is making ABI compatibility practical for vendors. So the memory-model (atomics) and primitive types (stdint.h) matter. Syntax changes, not so much.

I just had to tweak a bunch of C headers at work to parse as C++. Very annoying, even though g++ and clang++ try hard to support C99 and C11 extensions in C++ code. But there remain considerable differences. The problem I ran into is that C99 added static inline routines, which means it's more common to encounter actual code blocks in C headers where you quickly run into, e.g., type casting issues like casting from const void * to const char *, which requires static_cast<> or reinterpret_cast<> in C++.

C11 atomic variables and the kernel

Posted Feb 19, 2014 7:11 UTC (Wed) by epa (subscriber, #39769) [Link]

The compiler should have a single flag you can specify which gives an error if any code construct has different semantics between C and C++. (In other words, so you can write code in the intersection of both languages.) Or, if that's too ambitious, at least check that the syntax is valid in both languages.

C11 atomic variables and the kernel

Posted Feb 19, 2014 8:12 UTC (Wed) by jzbiciak (guest, #5246) [Link]

It's a bit late for me to find chapter and verse, but I seem to recall there are subtly different aliasing rules between the two.

Other areas where I was surprised—or at least dismayed and/or annoyed—by differences (at least in implementation): C99 _Bool vs. C++ bool, complex numbers (beyond just syntax), enums...

C11 atomic variables and the kernel

Posted Feb 19, 2014 9:59 UTC (Wed) by epa (subscriber, #39769) [Link]

Yes, many of these differences are quite pointless. Stroustrup a few years ago wrote an article, "C and C++ - sibling rivalry", about some of them.

While I wouldn't necessarily support merging C and C++ into a single language, or even requiring one to be a subset of the other, I do think the two standards committees should be merged into one group which looks after both languages. That would piss them off a great deal, but better to piss off the committee than the programmers who have to cope with the needless differences.

C11 atomic variables and the kernel

Posted Feb 19, 2014 14:21 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

I don't think they should be merged. You're basically firing any C standard member who doesn't know C++ already and with how complicated the language is, that'd be a lot of time to require. Personally, I'd prefer if they just say that C support in C++ was a dumb idea and make a formal split. Alas, backwards compatibility :/ .

C11 atomic variables and the kernel

Posted Feb 19, 2014 23:40 UTC (Wed) by oshepherd (guest, #90163) [Link]

Perhaps due to higher general interest, perhaps because there is more going on there - my impression and experience is that the C++ standards committee is a lot more active and quite a bit more rigorous.

While the C++ standard has more bugs than the C standard - that's not surprising, it's significantly larger - it also has fewer bugs per page, and the bugs it does have are more subtle (and thus harder to find).

The C11 standard has some careless bugs which kind of give away the fact that the C committee is really quite small - perhaps too small.

C11 atomic variables and the kernel

Posted Feb 19, 2014 15:48 UTC (Wed) by jwakely (subscriber, #60262) [Link]

While I wouldn't necessarily support merging Linux and FreeBSD into a single kernel, or even requiring one to be a subset of the other, I do think the two development communities should be merged into one group which looks after both kernels. That would piss them off a great deal, but better to piss off the kernel developers than the programmers who have to cope with the needless differences.

(Because telling volunteers they should be donating their time to projects they don't care about is a great way to get good results, right?)

C11 atomic variables and the kernel

Posted Feb 19, 2014 16:22 UTC (Wed) by epa (subscriber, #39769) [Link]

I imagined that the standards committees were staffed with paid employees of compiler vendors, e.g. Herb Sutter at Microsoft, but with the dominance of gcc and LLVM this view of the world must be out of date.

C11 atomic variables and the kernel

Posted Feb 28, 2014 14:55 UTC (Fri) by ianmcc (subscriber, #88379) [Link]

Microsoft only started participating in the C++ standards effort relatively recently (ie, when Microsoft employed Herb to take over as head of the C++ compiler group). Prior to then, Microsoft actively worked against standardization efforts.

I think most of the C++ committee members are allowed to work on standardization as part of their employment, but somewhat reluctantly; it isn't their core job.

C11 atomic variables and the kernel

Posted Feb 20, 2014 11:57 UTC (Thu) by tvld (guest, #59052) [Link]

If you want to influence standardization, there's a straightforward way to do that: Get involved. http://isocpp.org has useful information about the whole process, mailing lists, etc.

C11 atomic variables and the kernel

Posted Feb 19, 2014 9:09 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

But the _effect_ of such a flag is that C++ programmers will insist that C programmers "ought" to use the flag so as to not interfere with their pretty notion that C is just a limited subset of C++ written by Neanderthals. Which is bogus. When something provides C, it means C, not "I'm sure you can just paste this into a C++ program and that'll be fine".

If you add a Java component to a C program nobody expects that they'll just be able to import some.c.code; into the Java and have that work. If you need to access Perl from Python, or Ruby from PHP, again nobody insists that there should be some "easy" way to just mix them together as if they were merely dialects of the same language. In each case you're responsible for handling the adaptor layer. But somehow C++ programmers seem to think they shouldn't need to do this with C; let's not make this myth any easier to believe.

C11 atomic variables and the kernel

Posted Feb 19, 2014 9:57 UTC (Wed) by epa (subscriber, #39769) [Link]

Yes, they are different languages and anyone who says 'C/C++' usually doesn't know what they are talking about. Nonetheless there are plenty of C programmers who choose to make their code also valid C++, not because they are bullied into it by heartless C++ programmers, but as a way of getting some additional compile-time checking and warnings for odd corners of the language.

An 'intersection' compiler mode might also be useful when writing C wrappers for libraries implemented in C++.

C11 atomic variables and the kernel

Posted Feb 19, 2014 14:18 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

As a C++ developer for the most part, it has been my belief that C++'s greatest mistake has been C source compatibility (to the extent that it is even true). I firmly believe that having an FFI to C would have been much cleaner in the long run. Unfortunately, that ship sailed long ago. This further divide makes me even more sure of it though.

> compound literals

Is this why I can't declare literals like 'char const* const* strarray[] = { { "a", "b", NULL }, NULL };' in C++?

C11 atomic variables and the kernel

Posted Feb 19, 2014 20:57 UTC (Wed) by wahern (subscriber, #37304) [Link]

Pretty much, I guess. Although the valid syntax would be

char const* const* strarray[] = { (char const* const[]){ "a", "b", NULL }, NULL };

which the compiler effectively translates to

char const* const tmp[] = { "a", "b", NULL };
char const* const* strarray[] = { tmp, NULL };

Correct me if I'm wrong, but I believe that the problem is that the construct (type){ initializer list } in C++ creates a temporary array which is destroyed when the expression goes out of scope (I dunno if this is C++11 syntax or a common extension to be adopted in the next version). Compound literals in C have block scope lifetime.

Compound literals are a nice addition to C. It's clearly possible to live without them, but it's often convenient to be able to create small objects or buffers inline. For example, you can wrap something like strerror_r or strerror_l to behave like strerror, although you have to be careful about scoping. Very crude example: #define xstrerror(error) strerror_r((error), (char[256]){0}, 256).

I presume C++ has more wholesome ways of doing stuff like this, which is why the committee wasn't interested in the extension.

C11 atomic variables and the kernel

Posted Feb 20, 2014 1:15 UTC (Thu) by jwakely (subscriber, #60262) [Link]

> I dunno if this is C++11 syntax or a common extension to be adopted in the next version

Neither. It's a common extension but very unlikely to be standardised.

C11 atomic variables and the kernel

Posted Feb 20, 2014 6:12 UTC (Thu) by wahern (subscriber, #37304) [Link]

Ah, the syntax I was thinking of is called something like "list-initialized temporaries", which was added to C++11 to allow classes to accept initialization lists in the manner of plain aggregate types. The syntax is similar, just without the parentheses: type{ initialization-list }, or just { initialization-list } if the type can be inferred.

GCC decided to give compound literals in C++ the same treatment as it does initializer lists. The temporary objects in the initializer lists are scoped to the expression. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53220

C11 atomic variables and the kernel

Posted Feb 20, 2014 8:06 UTC (Thu) by mm7323 (subscriber, #87386) [Link]

Not sure if c++ has alloca(), but this looks neater to me:

#define xstrerror(error) strerror_r((error), alloca(256), 256)

C11 atomic variables and the kernel

Posted Feb 20, 2014 13:29 UTC (Thu) by jzbiciak (guest, #5246) [Link]

Well, technically, neither C nor C++ has alloca(), although it's a common extension. Or did C11 add this?

C11 atomic variables and the kernel

Posted Feb 27, 2014 8:02 UTC (Thu) by kevinm (guest, #69913) [Link]

Calling alloca() from a function argument is historically a very iffy thing to do - there were implementations where this crashed and burned very badly (alloca ended up fudging the stack pointer in the middle of the code pushing arguments onto the stack).

C11 atomic variables and the kernel

Posted Feb 27, 2014 8:52 UTC (Thu) by jzbiciak (guest, #5246) [Link]

Well, at least among the compiler folk I work with, they'd argue that alloca() is an ugly thing to do, full stop. ;-)

Given that our own compiler doesn't support alloca(), and tries to figure out the total stack frame for the entire function (so that no matter what happens, the SP moves once on entry, once on exit), I can't say I entirely blame them for that opinion. (Or, at least, that's what our compiler did the last time I dug into it at that level.)

Is it just me, or does alloca() mostly just feel like a hack to get around the lack of destructors or other unrolling mechanisms tied to leaving scopes?

C11 atomic variables and the kernel

Posted Feb 27, 2014 8:58 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

It also offers very fast allocation and deallocation of arbitrarily-sized arrays within a cache-local area. It's hard to beat that.

C11 atomic variables and the kernel

Posted Feb 27, 2014 17:15 UTC (Thu) by jzbiciak (guest, #5246) [Link]

True, but if you already have to limit yourself to a certain maximum size allocation to effectively use alloca(), then declaring a local array gets you that same locality benefit.

I also saw elsewhere a horror story where a function call w/alloca() got inlined and turned into a stack overflow, because apparently the compiler deferred the implicit freeing of the buffer to the end of the parent function. ie:

    void inlined_func(...)
    {
         ... 
         alloca( ... ); 
         ...
         /* implicit free() here */
    }

    void parent( ... )
    {
         for (i = 0; i < 10000000; i++)
             inlined_func();
    }

turned into:

    void parent( ... )
    {
         for (i = 0; i < 10000000; i++)
         {
             ...
             alloca( ... );
             ...
         }
         /* implicit free() here */
    }

Oops.

I've never had that happen with local arrays, though. When the compiler inlines a function with a local array, the result looks more like an array local to the parent that it got inlined within, so it statically becomes part of the stack frame while you're in the parent. That is, if you replace alloca() in the example above with char buf[MAXBUF];:

    void parent( ... )
    {
         for (i = 0; i < 10000000; i++)
         {
             char buf[MAXBUF]; /* becomes a static part of the stack frame */
             ...
             ...
         }
    }

Sure, statically sized buffers have their own issues—buffer overflow attacks—but alloca() is no panacea if it can be used to overflow the stack and crash the app. I guess what I'm saying is that the set of places where alloca() provides an advantage over a statically sized buffer is limited: the places where it's safe to use either overlap heavily, while the places where only one or the other is appropriate are fairly small.

C11 atomic variables and the kernel

Posted Feb 27, 2014 18:00 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Statically sized buffers are way too wasteful. For example, storing a path would require MAXPATH bytes of stack space for every path.

C11 atomic variables and the kernel

Posted Feb 28, 2014 5:58 UTC (Fri) by jzbiciak (guest, #5246) [Link]

And yet, if you do encounter a series of paths that are at or near MAXPATH bytes, you'd still end up wasting that space anyway with alloca(). If you're saying you can't handle so many MAXPATH pathnames, then you're broken if you use alloca(), since it doesn't offer a way to fail gracefully.

If you're manipulating many structures that have potential not to fit on the stack, another approach I've seen is to allocate a static buffer large enough to catch most common cases, and fall back to a malloc()'d buffer if you'd exceed that threshold. It does require a conditional call to free(), but it's more robust than alloca() and avoids the unconditional bloat of overlarge local arrays.

(And, with the explicit malloc() and free(), it won't run afoul of the inlining gotcha I highlighted above.)

C11 atomic variables and the kernel

Posted Feb 28, 2014 11:37 UTC (Fri) by khim (subscriber, #9252) [Link]

Yes, bugs in compilers exist and should be fixed, just like any other type of bug. We don't design our programs around old bugs in the kernel or libc, so why should we design them to support broken compilers?

The rest of the discussion sounds so bizarre I cannot even believe I hear it on LWN. ALL the remaining arguments only make sense for an address-space-constrained system without overcommit.

On systems with overcommit enabled and with no shortage of address space (and that's 99.99% of systems out there), alloca is as safe as malloc+free and much, much faster. End of discussion.

C11 atomic variables and the kernel

Posted Feb 28, 2014 15:56 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Don't you argue that mobile devices are eating other form factor's lunches? They're all 32-bit last I checked where address space is at a premium.

C11 atomic variables and the kernel

Posted Feb 28, 2014 17:01 UTC (Fri) by khim (subscriber, #9252) [Link]

Don't you argue that mobile devices are eating other form factor's lunches?

Well, sure.

They're all 32-bit last I checked where address space is at a premium.

It's a good idea to check things more often than once per decade. Have you heard about this phone? How about this CPU or that one?

And I don't see where you get the notion that 32-bit implies “address space is at a premium”: a typical mobile OS does not use swap and keeps many applications in memory at the same time. Any given application can only use 128MB or so. You can easily allocate 16MB or 32MB of address space for heavy worker threads - more than enough for alloca.

C11 atomic variables and the kernel

Posted Feb 28, 2014 17:34 UTC (Fri) by jzbiciak (guest, #5246) [Link]

On platforms that support it. Still, it's not portable, and doesn't seem to offer truly material advantages in my eyes.

One place I've run into alloca() was in some initialization code of g_doom, the "generic frame buffer" port of Doom. That's pretty much the last place to make a "hot cache efficiency" or any other time-based argument for alloca() over malloc(). It was there pretty much for the purpose of getting an automatic deallocation if it took an early exit from the function.

The system I was porting to, however, didn't (and still doesn't) support alloca(), so I had to convert it to malloc()/free(). (And when I asked our compiler guys about it, they said "Just use malloc()" with the sort of tone that implied they thought alloca() was an abomination.)

Yes, the code was marginally cleaner looking with alloca(), but my point was that that benefit arises from the fact that C doesn't offer any other easy way to say "clean all this up when we leave this scope" other than to put things on the stack and rely on the stack frame unwinding.

Aside from non-portability, alloca() also pretty much requires you to use a frame pointer, since your stack frame size is now variable. Again, not usually a big deal, although it can hurt on register-starved architectures like 32-bit x86. And then there's all these fun comments in alloca()'s own manual page:

BUGS
        The alloca() function is machine and compiler dependent.  On many
        systems its implementation is buggy.  Its use is discouraged.

        On many systems alloca() cannot be used inside the list of arguments
        of a function call, because the stack space reserved by alloca()
        would appear on the stack in the middle of the space for the
        function arguments.

I get it, you like alloca(). I see enough things potentially wrong with it that its meager advantages don't seem worth it to me.

C11 atomic variables and the kernel

Posted Feb 28, 2014 18:06 UTC (Fri) by khim (subscriber, #9252) [Link]

I get it, you like alloca(). I see enough things potentially wrong with it that its meager advantages don't seem worth it to me.

I'm not a big alloca lover, but I just don't see what the big hoopla is all about. I mean: alloca is just minor syntactic sugar on top of facilities you have anyway. Yes, it's not portable, but it's an interface people are familiar with, so why not?

Before you raise a racket about frame pointers and stuff, please recall that your compiler must compile, e.g., the following program (from the "6.5.3.4 The sizeof operator" part of the C standard):

#include <stddef.h>

size_t fsize(int n)
{
  char b[n+3];
  return sizeof b;
}

It's a 100% standard program. It's included in the standard. It's a non-optional part of it. It must be supported.

Now, on any system where that's supported, an alloca implementation is trivial, so why not just implement and use it where appropriate?

I have yet to observe a system which supports the aforementioned program (from the official 15-year-old standard - and included again in the new C11 one, too!) but does not support a working and usable alloca, and if you insist on using broken tools you deserve to receive broken programs.

P.S. Note that the function in the standard is called fsize - this gives you a pretty strong hint about where and how such facilities are supposed to be used, right?

C11 atomic variables and the kernel

Posted Feb 19, 2014 1:42 UTC (Wed) by lutchann (subscriber, #8872) [Link]

In skimming the article, I thought for a moment our editor had an appalling lapse of accuracy in spelling Linus's last name...but it turns out he was referring to Torvald Riegel instead. Hopefully my pointing this out will avoid some mistaken "correction" emails!

C11 atomic variables and the kernel

Posted Feb 19, 2014 7:29 UTC (Wed) by billygout (guest, #70918) [Link]

Thanks, I experienced the same confusion.

C11 atomic variables and the kernel

Posted Feb 19, 2014 15:18 UTC (Wed) by jimparis (guest, #38647) [Link]

Some LWN authors refer to people by their first name, and some refer to people by their last name, which makes it even more confusing. I suppose switching to email addresses is out of the question? :)

Thanks

Posted Feb 19, 2014 4:04 UTC (Wed) by bjacob (subscriber, #58566) [Link]

Just when my subscription had expired and I was wondering whether to renew. My wallet doesn't thank you.

C11 atomic variables and the kernel

Posted Feb 19, 2014 8:18 UTC (Wed) by jezuch (subscriber, #52988) [Link]

> "Any optimization that tries to prove anything from more than local state is by definition broken, because it assumes that everything is described by the program."

I think Linus oversteps a bit here. For the kernel it might be true because the kernel has a very intimate relationship with the hardware. But for user-space code I think it's a different thing. The whole point of the kernelspace-userspace split is so that the latter can assume that nothing "magical" happens, and if it does, it's within the system calls. I may be wrong, of course, but I don't want my compiler to give up useful [global] optimizations like removing a variable that is never written to (and preferably warning me about it) just because it was told that some unknown entity from outer space may modify the memory behind my back ;)

C11 atomic variables and the kernel

Posted Feb 19, 2014 9:34 UTC (Wed) by alexl (subscriber, #19068) [Link]

There is a lot of "magic" that can happen even in userspace, via e.g. shared memory (writes from other processes), dlopen/JIT (linker/compiler doesn't have full knowledge of code at runtime), etc.

C11 atomic variables and the kernel

Posted Feb 19, 2014 10:35 UTC (Wed) by tvld (guest, #59052) [Link]

I don't see how memory shared with other processes is a problem. You would need to synchronize properly for those (e.g., maintain data-race freedom), but the compiler is aware that this is shared with other threads because you did mmap() or similar on it; the compiler doesn't know about mmap() semantics, so it has to assume that mmap() might make the data visible to other stuff. One thing to note is that your synchronization needs to be ready for cross-process; regarding atomics, the C++ standard guarantees that any lock-free operations will also be "address-free", meaning that it still works if the memory region is visible at another virtual address, like when mapped by another process.

For dlopen, the compiler is aware of data escaping to unknown code because there will be function calls for functions the compiler hasn't analyzed (and it will thus *not* be able to make whole-program analysis and optimizations relying on that). JIT is the same if JIT'ed functions are accessible through function interfaces, global variables accessible to other compilation units, etc.

Thus, I think there's less "magic" than people might assume. Of course, if you ask your JIT to modify code produced by the compiler, the JIT should better know what it's doing :)

C11 atomic variables and the kernel

Posted Feb 19, 2014 12:03 UTC (Wed) by khim (subscriber, #9252) [Link]

the compiler doesn't know about mmap() semantics, so it has to assume that mmap() might make the data visible to other stuff

Citation needed. From what I'm seeing in the standard, mmap() can do whatever it wants while it's running, but when it returns the compiler may assume that it knows everything about the program till you call mmap() again.

You don't even need another program to confuse the compiler. You can just mmap the same region twice in your program, then write to some atomic variable using one address while reading it using a different address. A naive compiler may decide that it has "proof" that the variable is never modified.

C11 atomic variables and the kernel

Posted Feb 19, 2014 13:29 UTC (Wed) by tvld (guest, #59052) [Link]

> Citation needed. From what I'm seeing in the standard, mmap() can do whatever it wants while it's running, but when it returns the compiler may assume that it knows everything about the program till you call mmap() again.

Sorry, but *your* assertion needs a citation. C11 is multi-threaded. Where is mmap or any other function with unknown semantics forbidden to spawn a new thread and let the new thread access the data?

> You don't even need another program to confuse the compiler. You can just mmap the same region twice in your program, then write to some atomic variable using one address while reading it using a different address. A naive compiler may decide that it has "proof" that the variable is never modified.

Likewise. You don't need to map it to other addresses. A correct compiler will handle this just right.

Note though that if the caller of mmap() uses the data as if it were not shared with other threads, then the compiler can assume it isn't shared. This follows from the data-race-freedom requirement in the C11/C++11 memory model. For example, if you access the data using non-atomic stores (also note that the atomics are a type, not plain data), then doing so while another thread may read the data is a data race, and is specified to result in undefined behavior. IOW, any compiler transformations you didn't expect were caused by your program having a data race and thus undefined behavior.

C11 atomic variables and the kernel

Posted Feb 19, 2014 9:47 UTC (Wed) by NAR (guest, #1313) [Link]

In multithreaded programs it is very much possible that another entity (another thread) modifies the memory "behind your back".

C11 atomic variables and the kernel

Posted Feb 19, 2014 10:09 UTC (Wed) by jwakely (subscriber, #60262) [Link]

But another thread is still part of the same program. An optimizer that can prove a variable is never written to obviously has to see the code for the whole program, including code running in other threads.

C11 atomic variables and the kernel

Posted Feb 19, 2014 12:05 UTC (Wed) by khim (subscriber, #9252) [Link]

The compiler cannot see the code for the whole program because any program which never calls any syscalls is pretty much useless. And even if you include the kernel sources in the mix, there are some things which the CPU can do behind your back (as Linus pointed out).

C11 atomic variables and the kernel

Posted Feb 19, 2014 13:49 UTC (Wed) by tvld (guest, #59052) [Link]

> The compiler cannot see the code for the whole program

That's what jwakely is pointing out: If the compiler cannot see the whole program, it will be aware of that, will not do a whole-program analysis, and thus will not attempt optimizations that need whole-program analysis.

Also note that if you have a valid C program, then it follows the C semantics including its object model (ie, lifetime and accessibility of the objects that form program state). If you access C objects with other means than defined by C, you better know what you are doing. There's "volatile" for demarcating things where you talk to the outside world, which have much stricter semantics than normal code.

C11 atomic variables and the kernel

Posted Feb 19, 2014 12:42 UTC (Wed) by alankila (guest, #47141) [Link]

Well, it might be shared with a whole *different* program, of course.

No, the reality is that the mere use of synchronization primitives must tell the compiler to disable closed-world optimizations.

C11 atomic variables and the kernel

Posted Feb 19, 2014 13:41 UTC (Wed) by tvld (guest, #59052) [Link]

If you share it with a different program (IOW, the external world not covered by the standard), this is what "volatile" exists for. This is orthogonal to synchronization.

If you share with other programs that use the C11 memory model and ABI, you can just use the atomics and it will work fine because the compiler will see when you share stuff with other programs -- simply because it sees that state will be accessible to other entities that it cannot analyze. (This does assume that you share via C facilities such as mmap(); it won't work with other things that magically hook into your C program, such as debuggers, of course.)

If you share with other processes that use the C11 memory model and ABI, use lock-free atomics for that because they are "address-free" (see C++ 29.4p3).

IOW, what you call closed-world optimizations are not a problem because the compiler will see for which variables/state the world is closed, and for which it isn't. As I stated above, the only corner cases where this isn't true are when program state objects are accessible through means other than those defined by the C standard, and they are not explicitly shared. For example, if you were to allocate an object, and let another process mmap your heap (eg, after attaching) and walk your malloc data structures to find the object. But that's outside of C; normal C programs without those custom quirks will just work fine.

C11 atomic variables and the kernel

Posted Feb 19, 2014 21:04 UTC (Wed) by sionescu (subscriber, #59410) [Link]

An external process can modify your program's memory via ptrace(2).

C11 atomic variables and the kernel

Posted Feb 20, 2014 0:32 UTC (Thu) by PaulMcKenney (subscriber, #9624) [Link]

Excellent point! I had forgotten about ptrace, and, by extension, debuggers.

C11 atomic variables and the kernel

Posted Feb 22, 2014 20:16 UTC (Sat) by nix (subscriber, #2304) [Link]

I think it safe to say that no optimized program is compiled in the expectation of debuggers doing *anything* to it. Sure, debuggers can work as well as possible, but in general making that happen is up to the debugger and the DWARF generation code in the compiler, and the compiler can sensibly assume that anyone ptracing is looking at the DWARF: the compiler certainly isn't going to pessimize its code to make the job of people doing ptrace() without looking at the generated DWARF any easier!

(sez a man who's written code in the last year that digs around in the guts of running programs without looking at the DWARF in any way shape or form. But I'm not modifying anything, so that's all right. Right? ... :} )

C11 atomic variables and the kernel

Posted Feb 23, 2014 1:04 UTC (Sun) by PaulMcKenney (subscriber, #9624) [Link]

;-) ;-) ;-)

C11 atomic variables and the kernel

Posted Feb 19, 2014 10:27 UTC (Wed) by tvld (guest, #59052) [Link]

I don't think this is necessarily a userspace/kernel difference, although the corner cases are less likely for typical userspace programs. The language itself has several notions of visibility/accessibility of variables built into it. For example, a variable on the stack is just visible in its owning function unless its address is taken and the address value escapes the function. "volatile" can be used to designate state that is part of the visible output/input from the external world.

The corner cases that we discuss further in the thread are when visibility/accessibility of variables is established through mechanisms outside of the language (e.g., accesses through magically known fixed addresses, linker script magic, ...).

C11 atomic variables and the kernel

Posted Feb 19, 2014 15:18 UTC (Wed) by PaulMcKenney (subscriber, #9624) [Link]

One key point on the userspace/kernel boundary is that we cannot let the kernel crash just because the userspace code had a data race.

C11 atomic variables and the kernel

Posted Feb 22, 2014 19:58 UTC (Sat) by nix (subscriber, #2304) [Link]

btw, C compilers have been optimizing based on entire source files for many years (GCC started doing it in 3.something, I think, and the kernel adjusted: remember -funit-at-a-time?). Whole-program optimization, which is what was being discussed, is optimizing whole programs at once, so the compiler has a global view. This is, of course, often not possible, so there is also link-time optimization, which can optimize lots of translation units as a group without necessarily assuming that it knows about the whole program all at once.

C11 atomic variables and the kernel

Posted Sep 2, 2014 0:41 UTC (Tue) by dalias (guest, #95815) [Link]

The speculative store optimization cited in this article is at least blown out of proportion, in the sense that it's only visible in code with undefined behavior, if at all. Both the POSIX memory model (as lacking in detailed specification as it is, it's still a memory model) and the C11 memory model forbid any transformations that could cause an incorrect value to be read by another thread which is performing a legal read. If you're reading a non-atomic object in a situation where another thread may be writing to it (this could apply to another thread reading y in the article's example, with the example code as the writer), then the behavior is undefined, and so reading an incorrect value (or crashing, or summoning nasal demons, or whatever) is acceptable behavior for the compiler to produce. But if there's proper synchronization (POSIX or C11) or memory ordering (C11) protecting the read, a compiler cannot make such a transformation.

C11 atomic variables and the kernel

Posted Sep 2, 2014 1:19 UTC (Tue) by dlang (guest, #313) [Link]

It's nice to be able to declare the real world "out of spec" and blame everything on using code with undefined behaviour

but it doesn't actually do anyone any good.

In the real world, processor memory models don't match the C11 or POSIX memory models, and such issues need to be addressed.

One way of addressing them is to use atomic operations and/or memory barriers around all object accesses; that will make the standard happy, but the performance will plummet.


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds