LWN: Comments on "GCC and pointer overflows" https://lwn.net/Articles/278137/ This is a special feed containing comments posted to the individual LWN article titled "GCC and pointer overflows". en-us Mon, 06 Oct 2025 17:26:33 +0000 Mon, 06 Oct 2025 17:26:33 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net GCC and pointer overflows https://lwn.net/Articles/279631/ https://lwn.net/Articles/279631/ nix <div class="FormattedComment"><pre> You can't (yet) attach attributes to arbitrary expressions, though. </pre></div> Thu, 24 Apr 2008 20:39:48 +0000 GCC and pointer overflows https://lwn.net/Articles/279492/ https://lwn.net/Articles/279492/ eliezert <div class="FormattedComment"><pre> Of course it's best if we all stick to the standard, but people do make mistakes. I think the simplest way to minimize this is to have an attribute marking the check as a safety/sanity check, so the compiler knows that: 1. It should not optimize this check out. 2. If it knows that this check will always fail, it can emit a warning. 3. If this check assumes something improper, it is better to fail the compilation and let the coder look into the error than to silently drop the test. </pre></div> Thu, 24 Apr 2008 07:15:34 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278886/ https://lwn.net/Articles/278886/ mikov <div class="FormattedComment"><pre> I was thinking mostly of integers, not pointers. However I am hard pressed to think of an example where a pointer overflow would not safely wrap around _in practice_. Even with x86 FAR pointers, the offset will safely wrap around. There will be no trap and the comparison will still work. Are there many architectures, in use today, where pointers are not integers and do not wrap around on overflow ? Secondly, do these architectures run Linux and are they supported by GCC ? Regardless of the answer to these questions, why would people writing code with zero chance of running on those architectures be maniacs ? The C Standard is not a measure of good programming. It simply has a set of restrictions to ensure portability between all kinds of weird architectures (I should say it fails at this miserably, but that is beside the point). However, portability is not the only, and by far not the most important, measure of code. </pre></div> Mon, 21 Apr 2008 17:32:41 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278863/ https://lwn.net/Articles/278863/ mikov <blockquote>Instructions that trap are not going away, if only because they're useful in virtual machines--or the like--to which C can be targeted.</blockquote> <p>I disagree. A predictable conditional jump is better, simpler and more efficient than a completely unpredictable trap. Even if the trap doesn't have to go through the OS signal handler (which it probably does on most OS-es), it has to save context, etc. <p>One could argue for adding more convenient instructions for detecting an overflow and making a conditional jump on it. I strongly suspect that instructions that trap are currently a dead end. <p>Anyway, we know x86 doesn't have instructions that trap on overflow. (Well, except "into", but no C compiler will generate that.) Do PPC and ARM have them, and are they used often? <blockquote>That Java and C# stipulate a fixed size is useless in practice; it doesn't help in the slightest the task of constraining range, which is almost always defined by the data, and similar external context. 
Any code which silently relies on a Java primitive type wrapping is poor code.</blockquote> <p>That is not my experience at all. C99 defines "int_least32_t" and the like exactly to address that problem. Many C programmers like to believe that their code doesn't rely on the size of "int", but they are usually wrong. Most pre-C99 code would break horribly if compiled in a 16-bit environment, or on a system where integer widths are not powers of 2, or where integers are not two's complement. <p>Honestly, I find the notion that one can write code without knowing how wide the integer types are, in a language which doesn't implicitly handle overflow (unlike Python, Lisp, etc), to be absurd. <p>I am 100% with you on the unsigned types, though. <p>I also agree that in practice Java is as susceptible to arithmetic bugs as C. However it is for a different reason than the one you are implying. It is simply because in practice Java and C have _exactly_ the same integer semantics. <p>Java simply specifies things which 90% of C programmers mistakenly take for granted: wrap-around on overflow, truncating integer division, etc. Mon, 21 Apr 2008 16:00:09 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278843/ https://lwn.net/Articles/278843/ wahern <div class="FormattedComment"><pre> Instructions that trap are not going away, if only because they're useful in virtual machines--or the like--to which C can be targeted. Relying too heavily on pointer arithmetic in algorithms is not the smartest thing to do. The largest integral type supported on the computer I'm typing on is 64-bit (GCC, long long), but pointers are only 32 bits. Parsing a 5GB ISO Base Media file (aka QuickTime Mov), I can keep track of various information using the unsigned 64-bit integral type; if I had written all my algorithms to rely on pointer arithmetic to store or compute offsets, I'd be screwed. C precisely defines the behavior of overflow for unsigned types. Java's primitives suck because they're all signed; the fact that they wrap (because Java effectively stipulates a two's-complement implementation) is useless. In fact, I can't even remember the last time I used (or at least wanted) a signed type, in any language. Having to deal with that extra dimension is a gigantic headache, and it's worth noting that Java is just as susceptible to arithmetic bugs as C. I'd argue more so, because unwarranted reliance on such behavior invites error, and such reliance is easier to justify/excuse in Java because it so narrowly stipulates such things. C's integral types are in some ways superior to those of many other languages specifically because they're so loosely defined by the spec. Short of transparently supporting big integers, it forces you to focus more on values than representations. That Java and C# stipulate a fixed size is useless in practice; it doesn't help in the slightest the task of constraining range, which is almost always defined by the data, and similar external context. Any code which silently relies on a Java primitive type wrapping is poor code. Using comments is always second to using masks, and other techniques, where the code speaks for itself more clearly than a comment ever could. A better system, of course, would utilize type ranges a la Ada. Anyhow, I know the parent's point had more to do with pointers, but this all goes to show that good code doesn't rely on underlying representations, but only on the explicit logic of the programmer and the semantics of the language.
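To make the unsigned case concrete, a minimal sketch (the helper name is mine, purely illustrative): because unsigned arithmetic is defined to wrap modulo 2^N, this test is meaningful and no conforming compiler may discard it.

#include &lt;stdint.h&gt;

/* True iff a + b would wrap around. Well-defined for unsigned
   operands, so it survives optimization; the equivalent signed
   test would be undefined behaviour. */
static int u32_add_wraps(uint32_t a, uint32_t b)
{
    return b &gt; UINT32_MAX - a;  /* same truth value as (a + b &lt; a) */
}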
</pre></div> Mon, 21 Apr 2008 11:09:53 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278834/ https://lwn.net/Articles/278834/ nix <div class="FormattedComment"><pre> Ah, right. Well, QoI is in the eye of the beholder: I happen to think that the semantics of pointer overflow are semantics that only a maniac would rely upon intentionally, so the *most* we need is an extension of the existing compiler warning that `comparison is always true' or something along those lines. </pre></div> Mon, 21 Apr 2008 06:50:27 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278822/ https://lwn.net/Articles/278822/ mikov <div class="FormattedComment"><pre> I understand that. (In fact I used to be intimately familiar with the C99 Standard several years back, before I got disillusioned with it - how can you have any respect for a standard containing "atoi()" ? :-) ). I am not discussing the "meaning" of the Standard. The meaning is more than clear, at least in this case, and if you look at my first post in this thread, you will see me explicitly noting that this is a completely valid transformation according to the Standard. However it is also completely valid *not* to perform this transformation. So, this is clearly a QoI issue and I am discussing it as such. I am not convinced that optimizing away operations which are otherwise completely valid and useful, only because the standard says that it is allowed (but not required!), is a good idea. The Standard is just a document. Fortunately it is not an absolute truth and it does not prevent us from thinking on our own. Plus, it is not particularly inspiring or useful. As you probably know very well, it is impossible to write a useful C application using only the C standard. *Any* C application is non-portable and non-standard-compliant. The C Standard is so weak that it is practically useless, except as mental self-pleasuring, much as we are doing now. (BTW, people often delude themselves that they are writing compliant C, when their applications happen to run on a couple of conventional 32-bit architectures, using the same compiler, in a POSIX environment. Ha! Then they get ported to Windows (or the other way around) and it is taken as absolute proof of compliance. In fact the applications are a mess of special cases, ifdef-s and non-standard assumptions.) </pre></div> Mon, 21 Apr 2008 01:30:46 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278821/ https://lwn.net/Articles/278821/ nix <div class="FormattedComment"><pre> Pointer overflow is undefined because the Standard does not define it. That's what undefined *means*. The semantics of some construct are undefined both when the Standard says they are, and when it simply doesn't say anything one way or the other. The semantics of a construct are implementation-defined *only* when the Standard says they are. You can't say `oh, it doesn't say anything about $FOO, therefore it's implementation-defined'. </pre></div> Mon, 21 Apr 2008 00:32:52 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278819/ https://lwn.net/Articles/278819/ mikov <div class="FormattedComment"><pre> As you noted yourself, there is a subtle difference between undefined and implementation-defined behavior. This particular case is clearly an example of *implementation-defined*. It behaves in the same way on 99% of all CPUs today and there is nothing undefined about it. Not only that, but it is not an uncommon idiom and can be useful in a number of situations. 
How can you call it undefined ??? You speak about architectures with different pointer representations, etc. I have heard that argument before and I do not buy it. Such implementations mostly do not exist, or if they do, they are not an important target for Standard C today, let alone GCC. (Emphasis on the word "standard".) Considering the direction hardware is going, such architectures are even less likely to appear in the future. Instructions that trap for any reason are terribly inconvenient for superscalar or OoO execution. Anyway, I think I may not have been completely clear about my point. I am *not* advocating that the standard should require some particular behavior on integer and pointer overflows. I am however not convinced that it is a good idea to explicitly *prohibit* the behavior which is natural to most computers in existence. As it is now, Standard C is (ahem) somewhat less than useful for writing portable programs (to put it extremely mildly). It is somewhat surprising that the Standard and (apparently) most compiler implementors are also making it less useful for *non-portable* applications :-) (I have to admit that years ago I was fascinated with the aesthetic "beauty" of portable C code, or maybe I liked the challenge. Then I came to realize what a horrible waste of time it was. Take Java for example - for all its horrible suckiness, it does a few things right - it defines *precisely* the size of the integer types, the behavior on overflow, and the integer remainder of negative numbers. There is a lot of good in x86 being the universal CPU winner - we can finally forget the idiosyncrasies of the seventies ...) </pre></div> Sun, 20 Apr 2008 22:39:25 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278801/ https://lwn.net/Articles/278801/ viro <div class="FormattedComment"><pre> a) *unsigned* integers wrap around; that was never promised for signed ones, and as a matter of fact it was not true for a bunch of implementations, all the way back to the late 70s. b) pointer + integer wraparounds are even worse - all sorts of fun things happened in a lot of implementations; e.g. there's nothing to stop the hardware from using upper bits to store extra information of any kind (e.g. ASN) and using an instruction for pointer + int that mangles them instead of wrapping around, with a comparison instruction that traps on ASN mismatch. And that doesn't take anything too exotic - e.g. 56-bit address space, 8-bit ASN in upper bits, 32-bit int. Get a carry into bit 56, have both &gt; and &lt; trigger a trap. The point is, you are not relying on low-level details - you are assuming that all the world is a VAX (OK, x86 these days). And insisting that all implementations reproduce the exact details of (undefined) behaviour, no matter how much of a PITA it would be. BTW, that addition itself can trap on wraparound - and does on real-world architectures. So you can do that validation in a sane way (compare the untrusted index with the maximal valid value) or, if you want to rely on details of the pointer implementation *and* be perverse, play with casting to uintptr_t and use the fact that operations on _that_ are well-defined. It will still break on architectures that do not match your idea of how pointers are represented, but at least you won't be feeding the data to be validated into operations that might do all sorts of interesting things and trying to guess from the outcome whether those things had happened. There's implementation-defined and there's undefined. That kind of wraparound is firmly in the latter class, regardless of any optimizations. C doesn't try to codify _how_ any particular construct with undefined behaviour will screw you - it doesn't even promise to try to have it act consistently on a given target.
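To spell out both options as code - a sketch, with illustrative names; note that the uintptr_t variant still depends on the implementation-defined pointer-to-integer mapping and a flat address space:

#include &lt;stdint.h&gt;
#include &lt;stddef.h&gt;

/* Sane: validate the untrusted index against the maximal valid
   value before doing any arithmetic that could overflow or trap. */
static int index_valid(size_t untrusted, size_t nelems)
{
    return untrusted &lt; nelems;
}

/* Perverse but defined: uintptr_t arithmetic is ordinary unsigned
   arithmetic, so the wrap test itself cannot be optimized away -
   though what the result *means* is still implementation-specific. */
static int offset_wraps(const char *base, size_t len)
{
    return (uintptr_t)base + len &lt; (uintptr_t)base;
}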
</pre></div> Sun, 20 Apr 2008 19:44:55 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278748/ https://lwn.net/Articles/278748/ giraffedata <blockquote> I think you are missing the point that this code is often both valid and intentional. </blockquote> <p> No, I did get that point, and it's orthogonal to my point, which was that the compiler is not optimizing away my code, so it need not warn me that it has optimized away my code. <p> The first point I made in the same post is that I prefer no warning unless there is a way for me to disable it, which is directly derived from your point that the code is often both valid and intentional. <p> <strong>However:</strong> I can see your real point is that the code is not only <em>intentional</em>, but is intended to mean something different from what the standard says it means. That, I suppose, means the warning says, "you wrote something that is often intended to say something other than what it really says." I agree that's a valid warning. It's like the warning for "<tt>if (a = b)</tt>". It's still not a case of the user's code being "optimized away." <blockquote> After all this is C, not Lisp, so a programmer is supposed to be aware that integer types wrap around, etc. </blockquote> <p> While quite low-level, C is not so low-level that integer types wrap around. You're just talking about particular implementations of C. The warning we're talking about warns the programmer that something <em>didn't</em> wrap around as expected. Sat, 19 Apr 2008 19:19:29 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278743/ https://lwn.net/Articles/278743/ mikov <blockquote>I really don't think of what the compiler is doing as "optimizing away" my code. It's not throwing away code I wrote, it's simply implementing it with zero instructions, because that's an efficient way to implement what I wrote. The warning isn't, "I didn't generate any instructions for this." It's, "you wrote something that a programmer normally wouldn't write intentionally, so maybe you meant to write something else."</blockquote> <p>I think you are missing the point that this code is often both valid and intentional. After all this is C, not Lisp, so a programmer is supposed to be aware that integer types wrap around, etc. <p>I think that it happens to be one of the most efficient ways to check for overflow in some cases. So, it is a trade-off - the standard has chosen to declare <i>perfectly valid code</i> as invalid in order to make other optimizations easier. <p>For example, a similar trade-off would be to declare that all aliasing is invalid. Imagine the wild optimizations that would be made possible by that ! Sat, 19 Apr 2008 18:16:03 +0000 GCC and pointer overflows https://lwn.net/Articles/278698/ https://lwn.net/Articles/278698/ wahern <div class="FormattedComment"><pre> The question was how to check if arithmetic overflowed/wrapped, not whether an index or length is valid. </pre></div> Sat, 19 Apr 2008 05:51:53 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278677/ https://lwn.net/Articles/278677/ nix <div class="FormattedComment"><pre> That's equally useful for pointer arithmetic. 
Think arrays, strings, pointer walks, even things like Floyd's algorithm (smart optimizers can even spot the relationship between the tortoise and the hare: allowing for the hare to wrap around would ruin that). </pre></div> Fri, 18 Apr 2008 22:29:47 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278674/ https://lwn.net/Articles/278674/ giraffedata <p> I would prefer a warning only if there's an easy way to disable the warning. And GCC almost never provides a way to say, "don't warn me about this particular piece of code; I've already verified it's supposed to be that way." The things I have to do to stop warning messages often make the code harder to read or slower to execute. But because the warnings are so often valid, I won't accept any code that generates any warnings. <p> I've used other compilers that at least have the ability to turn off warnings for a certain stretch of source code. And others have fine-grained selection of types of warnings. <blockquote> I prefer the compiler to warn about it, not to silently optimize the code. </blockquote> <p> I really don't think of what the compiler is doing as "optimizing away" my code. It's not throwing away code I wrote, it's simply implementing it with zero instructions, because that's an efficient way to implement what I wrote. The warning isn't, "I didn't generate any instructions for this." It's, "you wrote something that a programmer normally wouldn't write intentionally, so maybe you meant to write something else." Fri, 18 Apr 2008 22:03:30 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278672/ https://lwn.net/Articles/278672/ mikov <div class="FormattedComment"><pre> Exactly my point. It is not easy to imagine a case where this is a _desirable_ optimization. It is either a bug, which should be warned about, or it is deliberate. Either way the compiler should not remove the code! I imagine however that the general assumption "int + positive_value is always greater than int" may be useful for complex loop optimizations and the like. It probably allows the compiler to completely ignore integer overflow in cases like "for (int i = 0; i &lt; x; ++i ) {...}". </pre></div> Fri, 18 Apr 2008 21:22:52 +0000 Since when does GCC *assume* the program to be correct? https://lwn.net/Articles/278669/ https://lwn.net/Articles/278669/ nix <div class="FormattedComment"><pre> Well said. Also, while in some cases it is a QoI issue for which high-quality implementations will prescribe useful semantics, this isn't such a case. I can't think of any particularly useful semantics for pointer wraparound, especially given that distinct objects have no defined or stable relationships with each other anyway. Operating under the rules of modular arithmetic might have been nice, and perhaps a more recent language would define that... </pre></div> Fri, 18 Apr 2008 21:00:23 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278667/ https://lwn.net/Articles/278667/ ibukanov <div class="FormattedComment"><pre> <font class="QuotedText">&gt; * Are these optimizations really beneficial ? Do situations like the above occur in _correct_ code ?</font> Stuff like x + CONSTANT &lt; x can trivially happen as a result of a complex macro expansion, especially after changes done by different people. But when this happens, I prefer the compiler to warn about it, not to silently optimize the code.
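When the check *is* intentional, it can be rewritten so that no signed overflow is ever executed and the compiler has nothing to discard. A minimal sketch (helper name is illustrative; assumes the addend is non-negative):

#include &lt;limits.h&gt;

/* True iff x + addend would overflow an int. The comparison never
   overflows itself, so the Standard gives it exactly one meaning
   and no compiler may optimize it out. */
static int add_would_overflow(int x, int addend)
{
    return x &gt; INT_MAX - addend;  /* requires addend &gt;= 0 */
}

/* usage: if (add_would_overflow(x, CONSTANT)) overflow(); */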
</pre></div> Fri, 18 Apr 2008 20:59:19 +0000 Since when does GCC *assume* the program to be correct? https://lwn.net/Articles/278654/ https://lwn.net/Articles/278654/ brouhaha There's a big difference between the (a + foo < a) and (sizeof (a) < 1) cases, which is that the former is something that a programmer is likely to code specifically because he or she knows that the program might (unintentionally) be buggy, in an attempt to catch a bug, while the latter is unlikely to occur at all, and certainly not as something a programmer is likely to deliberately test for. <blockquote> If it decided that, oh, that could be true after all, it's choosing an interpretation which the Standard forbids. </blockquote> Yet it can actually quite easily happen in real programs. NOWHERE does the standard say that a compiler has to optimize away tests that always have a fixed value for valid programs, but might easily have differing values for buggy programs. <blockquote> Perhaps we should spell 'while' differently when the locale is de_DE? </blockquote> You've lost me here. I don't see any way that a purely semantic error in a program could result in "while" being misspelled, even if the locale is de_DE. Fri, 18 Apr 2008 19:15:54 +0000 Is this a useful optimization ?? https://lwn.net/Articles/278626/ https://lwn.net/Articles/278626/ mikov <div class="FormattedComment"><pre> So, this is similar to this issue: int x = ...; if (x + CONSTANT &lt; x) overflow(); GCC will optimize out the check completely because the standard says so. There is no arguing that GCC is within its rights to do the optimization. However there are plenty of circumstances where this code is in fact valid (when the values of x and CONSTANT are controlled and the code is running on a two's complement machine, etc). So, I am concerned about two things: * Are these optimizations really beneficial ? Do situations like the above occur in _correct_ code ? * Don't they make C a less useful language ? C is often regarded as a high-level assembler. When I write something, it should not be up to the compiler to second-guess me. </pre></div> Fri, 18 Apr 2008 16:42:10 +0000 Since when does GCC *assume* the program to be correct? https://lwn.net/Articles/278625/ https://lwn.net/Articles/278625/ viro <div class="FormattedComment"><pre> The Standard is a contract between the authors of a program and the authors of a C implementation; if the program invokes undefined behaviour, all bets are off. It is allowed to compile the check in question into "check if the addition triggered an overflow; if it did, before bothering with any comparisons do unto luser what Simon would have done on a particularly bad day". It can also turn that into "if the addition overflows, take the value of the first argument". And optimize according to that. It's not a matter of optimizing your comparisons away; it's a matter of addition having no prescribed semantics in case of overflow, regardless of optimizations. </pre></div> Fri, 18 Apr 2008 16:19:16 +0000 Since when does GCC *assume* the program to be correct? https://lwn.net/Articles/278597/ https://lwn.net/Articles/278597/ nix <div class="FormattedComment"><pre> The problem is that the boundary between 'a C program, but not necessarily a valid one' and 'not a C program' is questionable. if (a + foo &lt; a) is testing something which, in any C program conforming to the Standard without really weird extensions, must be false. This is every bit as true as if (sizeof (a) &lt; 1) would be. If the compiler decided that, oh, that could be true after all, it would be choosing an interpretation which the Standard forbids. ... 
and if the compiler starts accepting that, what other non-C programs should it start accepting? Perhaps we should spell 'while' differently when the locale is de_DE? Perhaps `mary had a little lamb' is now defined as a valid C program? (Sure, compilers can *do* all this, and GCC does have some extensions chosen explicitly because their syntax is invalid Standard C --- the statement-expression extension springs to mind --- but the barrier to new extensions is very high these days, and in any case that doesn't mean that *anything* people do wrong should be defined as a language extension, especially not when it's as weird and devoid of practical utility as this. Portability to other compilers is important.) </pre></div> Fri, 18 Apr 2008 11:18:13 +0000 GCC and pointer overflows https://lwn.net/Articles/278569/ https://lwn.net/Articles/278569/ jd <div class="FormattedComment"><pre> In many cases with GCC, optimizations that are implied by something like -O2 or -O3 can be specifically disabled, or additional optimizations can be specifically enabled. If either case is true of the pointer arithmetic optimization in the newer GCCs, then this would seem to be a better case to put. The "insecurity" is optional and only there because the user has stipulated (implicitly or explicitly) that it should be. Alternatively, it might be argued that this shouldn't be in GCC proper (it's not a compiler function, it's a validation function) but in a distinct module which scans for likely range violations and other candidates for problems - something akin to splint or the Stanford checker, but restricted to cases that can be reliably detected. You could then run it independently or as part of the toolchain. A third option would be to modify a source-to-source compiler like OpenIMPACT to alter tests that would be optimized this way into a more correct form that doesn't need optimizing by GCC. A fourth option is to have bounds-checking at runtime be implicit and require that it be explicitly excluded at compile time. Alternatively, since runtime bounds-checking is already available, state that its exclusion is a voluntary statement on the part of a developer that the validation of pointers is not wanted. In other words, this seems a non-argument on any meaningful timescale. There are way too many ways for the GCC developers, other members of the open source community or the developers of potentially suspect code to eliminate the problem one way or another. If there is a potential problem, it is a potential problem by mutual agreement. </pre></div> Fri, 18 Apr 2008 05:48:40 +0000 Since when does GCC *assume* the program to be correct? https://lwn.net/Articles/278526/ https://lwn.net/Articles/278526/ brouhaha There are three possibilities: <ol> <li>GCC does not make assumptions about the validity of the input <li>GCC assumes that the input is a C program, but not necessarily a valid one (according to the C standard) <li>GCC assumes that the input is a valid C program </ol> The GCC maintainers claim that the optimization is legitimate because GCC assumes that the input is a valid C program (choice 3). If they are willing to make that assumption, then they shouldn't need any error checking whatsoever. <p> A compiler assuming that the input is a valid program is counter to my expectations as a user of the compiler. 
Unless I explicitly turn on unsafe optimizations, I don't expect it to optimize away any comparisons that I've written, unless it can prove that the condition will always have the same result based on my actual program code, not on what the C standard says a valid program may or may not do. <p> I have no problem at all with having such optimizations enabled as part of the unsafe optimizations at higher -O option levels. Thu, 17 Apr 2008 22:58:47 +0000 GCC and pointer overflows https://lwn.net/Articles/278515/ https://lwn.net/Articles/278515/ jzbiciak <P> Of course, it fails for dynamically allocated and grown buffers, since<TT> sizeof() </TT>can't tell you the length. </P> <P>Also, you failed to account for element size. The following should work, though, for arrays of static size:</P> <PRE> if (len > (sizeof(buf) / sizeof(buf[0]))) die_in_a_fire(); </PRE> <P>I don't understand why you have the bitwise negation operator in there. Also,<TT> len </TT>is a length, not a pointer type, so pointer format doesn't matter.</P> Thu, 17 Apr 2008 22:03:22 +0000 Since when does GCC *assume* the program to be correct? https://lwn.net/Articles/278513/ https://lwn.net/Articles/278513/ nix <div class="FormattedComment"><pre> Something which invokes undefined behaviour according to the standard is, in a sense, not written in C. Of course any useful program will do things *beyond* the standard (e.g. relying on POSIX); but relying on things like pointer wraparound is a rather different matter. (Who the hell would ever intentionally rely on such things anyway? It pretty much *has* to be a bug to have such code.) </pre></div> Thu, 17 Apr 2008 21:45:12 +0000 Since when does GCC *assume* the program to be correct? https://lwn.net/Articles/278501/ https://lwn.net/Articles/278501/ brouhaha My point exactly. If the assumption is only that the program is written in C, but NOT that it is a correct C program, then it isn't reasonable to assume that the program's use of pointers meets the constraint defined by the standard, in which a pointer will never be manipulated to point outside its object. <p> That would be a fine optimization to use when you choose some high level of optimization that includes unsafe optimizations, but it shouldn't happen at -O1. Thu, 17 Apr 2008 20:39:40 +0000 Since when does GCC *assume* the program to be correct? https://lwn.net/Articles/278471/ https://lwn.net/Articles/278471/ nix <div class="FormattedComment"><pre> GCC assumes that a program you feed its C compiler is written in C. If what you feed it cannot be a valid program (e.g. it contains a syntax error), it errors. If it's likely that it's invalid, it often warns. But if what you feed it has one valid interpretation and one which must be invalid, of course it must pick the valid interpretation. Anything else would lead to its being unable to compile valid programs, which would be a standards violation (and bloody annoying). </pre></div> Thu, 17 Apr 2008 20:08:32 +0000 Since when does GCC *assume* the program to be correct? https://lwn.net/Articles/278464/ https://lwn.net/Articles/278464/ brouhaha <blockquote> This behavior is allowed by the C standard, which states that, in a correct program, pointer addition will not yield a pointer value outside of the same object. So the compiler can assume that the test for overflow is always false and may thus be eliminated from the expression. </blockquote> GCC can only assume that the test for overflow is always false if it first assumes that the program is correct. 
Since when does it assume that? If GCC assumes that the program is correct, why does it sometimes generate errors and/or warnings at compile time? Correct programs won't require any error or warning messages at compile time, so there's no point in GCC having any code to check for the conditions that can cause those errors and warnings. Clearly it would be more efficient for GCC to avoid those checks, so they should be eliminated. Thu, 17 Apr 2008 19:38:23 +0000 would using casts fix the problem? https://lwn.net/Articles/278457/ https://lwn.net/Articles/278457/ hummassa <div class="FormattedComment"><pre> yes, aend - a (without cast) _is_ 10 on any platform, but (int)aend - (int)a is _undefined_ behaviour, because the bit pattern when converting from pointer to int is implementation-defined. </pre></div> Thu, 17 Apr 2008 18:55:52 +0000 Solution https://lwn.net/Articles/278440/ https://lwn.net/Articles/278440/ clugstj <div class="FormattedComment"><pre> OK, how about this: if you don't trust the code, don't use the latest whiz-bang optimizations! The documentation for GCC should specifically warn about optimizations that can do bad things with questionable code. As an example, here is a quote from the "Sun Workshop 6 update 2" man page: -xO3 In addition to optimizations performed at the -xO2 level, also optimizes references and definitions for external variables. This level does not trace the effects of pointer assignments. When compiling either device drivers that are not properly protected by volatile, or programs that modify external variables from within signal handlers, use -xO2. Let's not penalize GCC because of how some people will use/abuse it. </pre></div> Thu, 17 Apr 2008 17:24:37 +0000 GCC and pointer overflows https://lwn.net/Articles/278414/ https://lwn.net/Articles/278414/ jzbiciak <P>No, I meant a 32-bit system. If you do the pointer arithmetic first on an array of structs whose size is 65536, then any length &gt; 32767 will give a byte offset that's &gt;= 2<SUP>31</SUP> away. A length of 65535 will give you a pointer that's one element before the base.</P> <P>I guess it's an oversimplification to say 32767 specifically. Clearly, any len &gt; 65535 will wrap back around and start to look "legal" again if the element size is 65536. Any len &lt;= 65535 but greater than some number determined by how far the base pointer is from the end of the 32-bit memory map will wrap to give you a pointer that is less than the base. So, the threshold isn't really 32767, but something larger than the length of the array.</P> <P>That was my main point: when you factor in the size of the indexed type, the threshold at which an index causes the pointer to wrap around the end of the address space drops, and the number of potential wrapping scenarios grows; that is what gets you in trouble. Thus, the only safe way is to compare indices, not pointers.</P> <P> For example, suppose the compiler didn't do this particular optimization, and you had written the following code on a machine for which <TT>sizeof(int)==4</TT>: </P> <PRE> extern void foo(int*); void b0rken(unsigned int len) { int mybuf[BUFLEN]; int *mybuf_end = mybuf + BUFLEN; int i; if (mybuf + len < mybuf_end &amp;&amp; mybuf + len >= mybuf) { for (i = 0; i < len; i++) mybuf[i] = 42; } foo(mybuf); /* prevent dead code elimination */ } </PRE> <P>What happens when you pass<TT> 0x40000001 </TT>in for<TT> len</TT>? 
The computed pointer for<TT> mybuf + len </TT> will be equal to<TT> &amp;mybuf[1]</TT>, because under the hood the compiler multiplies<TT> len </TT>by<TT> sizeof(int)</TT>, giving<TT> 0x00000004 </TT>after truncating to 32 bits. How many times does the loop iterate, though?</P> <P>This happens <I>regardless</I> of whether the compiler optimizes away the second test.</P> <P>Now, there <I>is</I> a chance a <I>different</I> optimization saves your butt here. If the compiler does common subexpression elimination and some amount of forward substitution, it may rewrite that<TT> if </TT>statement's first test as follows. (Oversimplified, but hopefully it gives you an idea of what the compiler can do to/for you.)</P> <P>Step 0: Represent pointer arithmetic as bare arithmetic (internally)</P> <PRE> mybuf + 4*len >= mybuf_end </PRE> <P>Step 1: Forward substitution.</P> <PRE> mybuf + 4*len >= mybuf + 4*BUFLEN </PRE> <P>Step 2: Remove common terms from both sides. (Assumes no integer overflow with pointer arithmetic--what started this conversation to begin with.)</P> <PRE> 4*len >= 4*BUFLEN </PRE> <P>Step 3: Strength reduction. (Again, assumes no integer overflow with pointer arithmetic.)</P> <PRE> len >= BUFLEN </PRE> <P>That gets us back to what the test should have been to begin with. There's no guarantee that the compiler will do all of this, though. For example, GCC 4.2.0 doesn't seem to optimize the test in this way. I just tried the Code Composer Studio compiler for C64x (v6.0.18), and it seems to do up through step 2 if the array is on the local stack, thereby changing, but not eliminating, the bug.</P> <P>It turns out GCC does yet a <I>different</I> optimization that changes the way in which the bug might manifest. It eliminates the original loop counter entirely, and instead transforms the loop effectively into:</P> <PRE> int *ptr; for (ptr = mybuf; ptr != mybuf + len; ptr++) *ptr = 42; </PRE> <P>This version has yet <I>different</I> safety characteristics.</P> <P>Moral of the story? Compare indices, not computed pointers, to avoid buffer overflows.</P> Thu, 17 Apr 2008 17:14:08 +0000 GCC and pointer overflows https://lwn.net/Articles/278402/ https://lwn.net/Articles/278402/ mb <div class="FormattedComment"><pre> <font class="QuotedText">&gt; Any length &gt; 32767 causes problems on a 32-bit system.</font> Care to explain why? Did you mean "16-bit system"? </pre></div> Thu, 17 Apr 2008 16:23:57 +0000 GCC and pointer overflows https://lwn.net/Articles/278386/ https://lwn.net/Articles/278386/ wahern <div class="FormattedComment"><pre> if (~sizeof buf &lt; len) { die(); } This only works with unsigned values, and there are probably some caveats with width and promotion rules (portable, nonetheless). Also, it assumes your environment uses linear addressing, and that there's no other funny stuff going on with pointer bits (like the effectively 16 free bits on AMD64--using 48-bit addressing). if (~(uintptr_t)buf &lt; len) { die(); } I believe this should work on Windows and all Unix systems (guaranteed by additional SUSv3 constraints), but I'm not positive.
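For what it's worth, the identity behind the trick, spelled out as a sketch (same flat-addressing assumption; the helper name is illustrative): for an unsigned value x, ~x == UINTPTR_MAX - x, so the test asks whether the addition would exceed UINTPTR_MAX, i.e. wrap.

#include &lt;stdint.h&gt;

/* True iff (uintptr_t)buf + len would wrap around the top of the
   address space. Only meaningful where pointers map linearly onto
   uintptr_t values. */
static int ptr_add_wraps(const void *buf, uintptr_t len)
{
    return ~(uintptr_t)buf &lt; len;
}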
</pre></div> Thu, 17 Apr 2008 16:01:02 +0000 Solution https://lwn.net/Articles/278375/ https://lwn.net/Articles/278375/ proski Bugs exist not only in old crusty code. Just because code was written recently doesn't make it better or safer. Even if the code was developed with the new compiler, if it was not specifically tested for overflows it can still have this problem. Thu, 17 Apr 2008 15:27:14 +0000 GCC and pointer overflows https://lwn.net/Articles/278354/ https://lwn.net/Articles/278354/ iabervon <div class="FormattedComment"><pre> How so? I was only suggesting that the expression "pointer1 + offset &gt;= pointer2" be treated differently (that is, the magic goes away if you store the LHS in a variable and use the variable). (And, of course, the change would only apply to cases where the LHS overflows the address space, which is clearly undefined, so having odd things happen isn't a problem.) If you do something with "pointer + offset" other than comparing it with something else, I wouldn't have anything change (what would it do, anyway, without a test and branch in the code?); and in the case where you're doing the comparison, processors generally have a status bit that gets updated by the multiply/shift and add, and it is pretty much free to take an unlikely jump based on it. Except on architectures where the instruction decode significantly limits the pipeline, it should be at most one cycle to test, so long as you do each test right after the ALU operation. C, in general, doesn't make good use of the fact that processors make it trivial to test for overflow, but that doesn't mean that C compilers can't do extra-clever things with this ability in cases where there's undefined behavior associated with it. </pre></div> Thu, 17 Apr 2008 14:06:29 +0000 GCC and pointer overflows https://lwn.net/Articles/278322/ https://lwn.net/Articles/278322/ kleptog <div class="FormattedComment"><pre> There's a terminology problem here: the program is not incorrect, the programmer has made an incorrect assumption. The assumption is that (buf + len &lt; buf) will be true if len is very large. Besides the fact that the assumption is false if sizeof(*buf) != 1, the GCC team (and other compilers) point out that this assumption is not warranted by the C spec. Stronger still, the C spec allows you to *assume* the test is false, no matter the value of len (assuming len is unsigned, btw). That said, I'd love a way to say: if( __wraps( buf + len ) ) die();
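(Something close to this wish did eventually appear, though long after this thread: GCC 5 and Clang grew checked-arithmetic builtins. A sketch of how they could express it - not available in the compilers being discussed here:)

#include &lt;stdint.h&gt;

/* __builtin_add_overflow performs the addition, stores the wrapped
   result, and returns true iff the mathematical result didn't fit. */
static int wraps(uintptr_t base, uintptr_t len)
{
    uintptr_t sum;
    return __builtin_add_overflow(base, len, &amp;sum);
}

/* usage: if (wraps((uintptr_t)buf, len)) die(); */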
</pre></div> Thu, 17 Apr 2008 09:39:15 +0000 CERT responds on the GCC mailing list https://lwn.net/Articles/278278/ https://lwn.net/Articles/278278/ stuart_hc Here is one <a href="http://gcc.gnu.org/ml/gcc/2008-04/msg00131.html">example</a> of CERT's rationale for singling out GCC, citing GCC's popularity.<br> In this <a href="http://gcc.gnu.org/ml/gcc/2008-04/msg00131.html">email</a> CERT concedes they should change the advisory to mention other compilers, but it doesn't appear that CERT has made that change yet. Thu, 17 Apr 2008 03:55:20 +0000 Solution https://lwn.net/Articles/278253/ https://lwn.net/Articles/278253/ clugstj <div class="FormattedComment"><pre> How about this obvious solution: when compiling old crusty code, don't use the latest-and-greatest compiler optimizations? (I'm assuming this one can be turned off.) The older, and less recently maintained, the code, the more conservative the optimization should be. </pre></div> Thu, 17 Apr 2008 00:44:37 +0000 GCC and pointer overflows https://lwn.net/Articles/278249/ https://lwn.net/Articles/278249/ jzbiciak <BLOCKQUOTE><I>pointer addition will not yield a pointer value outside of the same object</I></BLOCKQUOTE> <P> Just some pedantry, but I believe you're allowed to point to the first element after the end of an array, as long as you don't dereference that element. Pointer comparisons with such a pointer are supposed to work, too. </P><P> That said, the problem is very similar to time comparison, where you have a running time counter that could wrap around, but all measured intervals still fit within the total range of the time-counter type. There, the safe way to do things is to keep the time as an unsigned value, and subtract off the base. </P><P> You can't really do that here, because <TT>ptrdiff_t</TT> really ought to be a signed type. In this case, if you keep "len" in a signed variable, negative lengths are always an error, and so you can check for them directly <I>before</I> adding to a pointer base. Positive lengths can always be compared against the buffer size. In either case, you're comparing indices against each other, not pointers.</P> <P>Comparing against a generated pointer is a premature optimization, and begs for the compiler to slap you. This is especially true when the pointer is to a type for which sizeof(type) &gt; 1. For a large structure, the address arithmetic could wrap around several times, even for relatively small indices. Imagine a structure for which sizeof(type) == 65536. Any length &gt; 32767 causes problems on a 32-bit system. Granted, arrays of such large structures tend to be very rare.</P>
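<P>To close the loop on the<TT> b0rken() </TT>example upthread, a sketch of the index-comparison version (the<TT> BUFLEN </TT>definition is added here for completeness; this is an illustration, not the only correct form):</P> <PRE>
extern void foo(int*);

#define BUFLEN 1024

void fixed(unsigned int len)
{
    int mybuf[BUFLEN];
    unsigned int i;

    /* Compare the untrusted index against the buffer size; no
       pointer is ever computed from it, so nothing can wrap. */
    if (len < BUFLEN) {
        for (i = 0; i < len; i++)
            mybuf[i] = 42;
    }

    foo(mybuf); /* prevent dead code elimination */
}
</PRE> Thu, 17 Apr 2008 00:01:47 +0000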