Is this a useful optimization??

Posted Apr 18, 2008 16:42 UTC (Fri) by mikov (subscriber, #33179)
Parent article: GCC and pointer overflows

So, this is similar to the following issue:

  int x = ...;
  if (x + CONSTANT < x)
     overflow();

GCC will optimize out the check completely, because the standard says it may. There is no
arguing that GCC is within its rights to do the optimization.

However, there are plenty of circumstances where this code is in fact valid (when the values of
x and CONSTANT are controlled and the code is running on a two's complement machine, etc.).
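
A well-defined way to express the same intent, for comparison (a sketch, not code from the article; it assumes CONSTANT is positive):

  #include <limits.h>

  extern void overflow(void);

  #define CONSTANT 100   /* assumed positive for this sketch */

  void check(int x)
  {
      /* Tested before the addition, so no signed overflow ever occurs
         and the compiler cannot discard the check: x + CONSTANT would
         overflow exactly when x > INT_MAX - CONSTANT. */
      if (x > INT_MAX - CONSTANT)
          overflow();
  }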

So, I am concerned about two things:

* Are these optimizations really beneficial? Do situations like the above occur in _correct_
code?

* Don't they make C a less useful language? C is often regarded as a high-level assembler. When
I write something, it should not be up to the compiler to second-guess me.


Is this a useful optimization??

Posted Apr 18, 2008 20:59 UTC (Fri) by ibukanov (subscriber, #3942) [Link]

> * Are these optimizations really beneficial? Do situations like the above occur in _correct_ code?

Stuff like x + CONSTANT < x can trivially happen as a result of a complex macro expansion,
especially after changes made by different people.

But when this happens, I prefer the compiler to warn about it, not to silently optimize the
code.

Is this a useful optimization??

Posted Apr 18, 2008 21:22 UTC (Fri) by mikov (subscriber, #33179) [Link]

Exactly my point. It is not easy to imagine a case where this is a _desirable_ optimization.
It is either a bug, which should trigger a warning, or it is deliberate. Either way, the
compiler should not remove the code!

I imagine, however, that the general assumption "int + positive_value is always greater than
int" may be useful for complex loop optimizations and the like. It probably allows the compiler
to completely ignore integer overflow in cases like "for (int i = 0; i < x; ++i) {...}".
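
As a hypothetical illustration (the function below is mine, not the poster's): with a signed index, the compiler may assume the increment never wraps, compute an exact trip count, and vectorize; a wrapping unsigned index would oblige it to consider the wraparound case.

  /* With signed i, the compiler may assume i += 2 never overflows, so
     the loop runs exactly (x + 1) / 2 times for x >= 0.  With unsigned
     i, i += 2 may legally wrap to 0 when x is near UINT_MAX, and the
     loop's semantics would have to account for that. */
  void scale(float *a, int x)
  {
      for (int i = 0; i < x; i += 2)
          a[i] *= 2.0f;
  }
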
Is this a useful optimization??

Posted Apr 18, 2008 22:29 UTC (Fri) by nix (subscriber, #2304) [Link]

That's equally useful for pointer arithmetic. Think arrays, strings, 
pointer walks, even things like Floyd's algorithm (smart optimizers can 
even spot the relationship between the tortoise and the hare: allowing for 
the hare to wrap around would ruin that).
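
For reference, a minimal sketch of the tortoise-and-hare walk being alluded to (the list type and names are mine):

  struct node { struct node *next; };

  /* Floyd's cycle detection: the hare advances two links per step, the
     tortoise one; they meet iff the list contains a cycle.  Reasoning
     about how the two walks relate assumes neither pointer wraps. */
  int has_cycle(const struct node *head)
  {
      const struct node *tortoise = head, *hare = head;
      while (hare && hare->next) {
          tortoise = tortoise->next;
          hare = hare->next->next;
          if (tortoise == hare)
              return 1;
      }
      return 0;
  }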

Is this a useful optimization??

Posted Apr 18, 2008 22:03 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

I would prefer a warning only if there's an easy way to disable the warning. And Gcc almost never provides a way to say, "don't warn me about this particular piece of code; I've already verified it's supposed to be that way." The things I have to do to stop warning messages often make the code harder to read or slower to execute. But because the warnings are so often valid, I won't accept any code that generates any warnings.

I've used other compilers that at least have the ability to turn off warnings for a certain stretch of source code. And others have fine-grained selection of types of warnings.

I prefer the compiler to warn about it, not to silently optimize the code.

I really don't think of what the compiler is doing as "optimizing away" my code. It's not throwing away code I wrote, it's simply implementing it with zero instructions, because that's an efficient way to implement what I wrote. The warning isn't, "I didn't generate any instructions for this." It's, "you wrote something that a programmer normally wouldn't write intentionally, so maybe you meant to write something else."

Is this a useful optimization??

Posted Apr 19, 2008 18:16 UTC (Sat) by mikov (subscriber, #33179) [Link]

I really don't think of what the compiler is doing as "optimizing away" my code. It's not throwing away code I wrote, it's simply implementing it with zero instructions, because that's an efficient way to implement what I wrote. The warning isn't, "I didn't generate any instructions for this." It's, "you wrote something that a programmer normally wouldn't write intentionally, so maybe you meant to write something else."

I think you are missing the point that this code is often both valid and intentional. After all, this is C, not Lisp, so a programmer is supposed to be aware that integer types wrap around, etc.

I think it happens to be one of the most efficient ways to check for overflow in some cases. So, it is a trade-off: the standard has chosen to declare perfectly valid code invalid in order to make other optimizations easier.

For example, a similar trade-off would be to declare that all aliasing is invalid. Imagine the wild optimizations that would be made possible by that!

Is this a useful optimization??

Posted Apr 19, 2008 19:19 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

I think you are missing the point that this code is often both valid and intentional.

No, I did get that point, and it's orthogonal to my point, which was that the compiler is not optimizing away my code, so need not warn me that it has optimized away my code.

The first point I made in the same post is that I prefer no warning unless there is a way for me to disable it, which is directly derived from your point that the code is often both valid and intentional.

However: I can see your real point is that the code is not only intentional, but is intended to mean something different from what the standard says it means. That, I suppose, means the warning says, "you wrote something that often is intended to say something other than what it really says." I agree that's a valid warning. It's like the warning for "if (a = b)". It's still not a case of the user's code being "optimized away."

After all this is C, not Lisp, so a programmer is supposed to be aware that integer types wrap around, etc.

While quite low level, C is not so low level that integer types wrap around. You're just talking about particular implementations of C. The warning we're talking about warns the programmer that something didn't wrap around as expected.

Is this a useful optimization??

Posted Apr 20, 2008 19:44 UTC (Sun) by viro (subscriber, #7872) [Link]

a) *unsigned* integers wrap around; that has never been promised for signed
ones, and as a matter of fact it hasn't been true for a bunch of
implementations, all the way back to the late 70s.

b) pointer + integer wraparounds are even worse - all sorts of fun
things have happened in a lot of implementations; e.g. there's nothing to
stop the hardware from using the upper bits to store extra information of
any kind (e.g. an ASN) and using an instruction for pointer + int that
mangles them instead of wrapping around, with an instruction for comparison
that traps on an ASN mismatch.  And that doesn't take anything too
exotic - e.g. a 56-bit address space, an 8-bit ASN in the upper bits, a 32-bit int.
Get a carry into bit 56, and have both > and < trigger a trap.

The point is, you are not relying on low-level details - you are assuming
that all the world is a VAX (OK, x86 these days), and insisting that all
implementations reproduce the exact details of (undefined) behaviour, no
matter how much of a PITA that would be.  BTW, the addition itself can trap
on wraparound - and does on real-world architectures.

So you can do that validation in a sane way (compare the untrusted index
with the maximal valid value) or, if you want to rely on the details of the
pointer implementation *and* be perverse, play with casting to uintptr_t and
use the fact that operations on _that_ are well-defined.  That will still
break on architectures that do not match your idea of how pointers
are represented, but at least you won't be feeding the data to be
validated into operations that might do all sorts of interesting things
and then trying to guess from the outcome whether those things have happened.
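
A minimal sketch of the two approaches (the names and buffer layout are mine, not viro's):

  #include <stddef.h>
  #include <stdint.h>

  /* The sane way: validate the untrusted index against the known bound
     before any pointer arithmetic happens. */
  int fetch(const char *buf, size_t buflen, size_t untrusted_idx)
  {
      if (untrusted_idx >= buflen)
          return -1;                /* reject before touching buf */
      return buf[untrusted_idx];
  }

  /* The perverse way: do the wraparound check on uintptr_t, where
     unsigned arithmetic is well-defined.  Still assumes that pointers
     convert to uintptr_t in an order-preserving way. */
  int fetch_perverse(const char *buf, const char *buf_end, size_t idx)
  {
      uintptr_t p = (uintptr_t)buf;
      if (p + idx < p || p + idx >= (uintptr_t)buf_end)
          return -1;
      return buf[idx];
  }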

There's implementation-defined and there's undefined.  That kind of
wraparound is firmly in the latter class, regardless of any optimizations.
C doesn't try to codify _how_ any particular construct with undefined
behaviour will screw you - it doesn't even promise to try to have it
act consistently on a given target.

Is this a useful optimization??

Posted Apr 20, 2008 22:39 UTC (Sun) by mikov (subscriber, #33179) [Link]

As you noted yourself, there is a subtle difference between undefined and implementation-defined
behavior. This particular case is clearly an example of the *implementation-defined* kind: it
behaves the same way on 99% of all CPUs today, and there is nothing undefined about it. Not
only that, but it is not an uncommon idiom and can be useful in a number of situations. How
can you call it undefined???

You speak about architectures with different pointer representations, etc. I have heard that
argument before and I do not buy it. Such implementations mostly do not exist, or, if they do,
they are not an important target for Standard C today, let alone for GCC. (Emphasis on the
word "standard".)

Considering the direction hardware is going, such architectures are even less likely to
appear in the future. Instructions that trap for any reason are terribly inconvenient for
superscalar or out-of-order execution.

Anyway, I think I may not have been completely clear about my point. I am *not* advocating
that the standard should require some particular behavior on integer and pointer overflow. I
am, however, not convinced that it is a good idea to explicitly *prohibit* the behavior which
is natural to most computers in existence.

As it is now, Standard C is (ahem) somewhat less than useful for writing portable programs (to
put it extremely mildly). It is somewhat surprising that the Standard and (apparently) most
compiler implementors are also making it less useful for *non-portable* applications :-)


(I have to admit that years ago I was fascinated with the aesthetic "beauty" of portable C
code, or maybe I liked the challenge. Then I came to realize what a horrible waste of time it
was. Take Java, for example - for all its horrible suckiness, it does a few things right: it
defines *precisely* the size of the integer types, the behavior on overflow, and the integer
remainder of negative numbers.

There is a lot of good in x86 being the universal CPU winner - we can finally forget the
idiosyncrasies of the seventies...)

Is this a useful optimization??

Posted Apr 21, 2008 0:32 UTC (Mon) by nix (subscriber, #2304) [Link]

Pointer overflow is undefined because the Standard does not define it. 
That's what undefined *means*. The semantics of some construct are 
undefined both when the Standard says they are, and when it simply doesn't 
say anything one way or the other. The semantics of a construct are 
implementation-defined *only* when the Standard says they are. You can't 
say `oh, it doesn't say anything about $FOO, therefore it's 
implementation-defined'.

Is this a useful optimization??

Posted Apr 21, 2008 1:30 UTC (Mon) by mikov (subscriber, #33179) [Link]

I understand that. (In fact, I used to be intimately familiar with the C99 Standard several
years back, before I got disillusioned with it - how can you have any respect for a standard
containing "atoi()"? :-) )

I am not discussing the "meaning" of the Standard. The meaning is more than clear, at least in
this case, and if you look at my first post in this thread, you will see me explicitly noting
that this is a completely valid transformation according to the Standard. 

However, it is also completely valid *not* to perform this transformation. So, this is
clearly a QoI issue, and I am discussing it as such. I am not convinced that optimizing away
operations which are otherwise completely valid and useful, only because the standard says
that it is allowed (but not required!), is a good idea.

The Standard is just a document. Fortunately it is not an absolute truth and it does not
prevent us from thinking on our own. Plus, it is not particularly inspiring or useful.

As you probably know very well, it is impossible to write a useful C application using only
the C standard. *Any* C application is non-portable and non-standards-compliant. The C
Standard is so weak that it is practically useless, except as mental self-pleasuring, much as
we are doing now.

(BTW, people often delude themselves that they are writing compliant C, when their
applications happen to run on a couple of conventional 32-bit architectures, using the same
compiler, in a POSIX environment. Ha! Then they get ported to Windows (or the other way
around), and that is taken as absolute proof of compliance. In fact, the applications are a
mess of special cases, ifdefs, and non-standard assumptions.)

Is this a useful optimization??

Posted Apr 21, 2008 6:50 UTC (Mon) by nix (subscriber, #2304) [Link]

Ah, right. Well, QoI is in the eye of the beholder: I happen to think that 
the semantics of pointer overflow are semantics that only a maniac would 
rely upon intentionally, so the *most* we need is an extension of the 
existing compiler warning that `comparison is always true' or something 
along those lines.

Is this a useful optimization??

Posted Apr 21, 2008 17:32 UTC (Mon) by mikov (subscriber, #33179) [Link]

I was thinking mostly of integers, not pointers. However, I am hard-pressed to think of an
example where a pointer overflow would not safely wrap around _in practice_. Even with x86 FAR
pointers, the offset will safely wrap around. There will be no trap, and the comparison will
still work.

Are there many architectures in use today where pointers are not integers and do not wrap
around on overflow? Secondly, do these architectures run Linux, and are they supported by GCC?

Regardless of the answer to these questions, why would people writing code with zero chance
of running on those architectures be maniacs?

The C Standard is not a measure of good programming. It simply imposes a set of restrictions
to ensure portability between all kinds of weird architectures (I should say it fails at this
miserably, but that is beside the point). However, portability is not the only, and by far not
the most important, measure of code.

Is this a useful optimization??

Posted Apr 21, 2008 11:09 UTC (Mon) by wahern (subscriber, #37304) [Link]

Instructions that trap are not going away, if only because they're useful in virtual
machines--or the like--to which C can be targeted.

Relying too heavily on pointer arithmetic in algorithms is not the smartest thing to do. The
largest integral type supported on the computer I'm typing on is 64-bit (GCC, long long), but
pointers are only 32 bits. Parsing a 5GB ISO Base Media file (aka QuickTime MOV), I can keep
track of various information using the unsigned 64-bit integral type; if I had written all my
algorithms to rely on pointer arithmetic to store or compute offsets, I'd be screwed.
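
A tiny sketch of that offset-based style (mine; it assumes a POSIX fseeko for 64-bit seeking, and it elides the 64-bit "largesize" box header):

  #include <stdint.h>
  #include <stdio.h>

  /* Walk the top-level boxes of an ISO Base Media file by 64-bit file
     offset rather than by pointer, so a 5GB file poses no problem for
     a 32-bit address space. */
  void walk_boxes(FILE *f, uint64_t file_size)
  {
      uint64_t off = 0;
      while (off + 8 <= file_size) {
          unsigned char hdr[8];     /* 4-byte size, then 4-byte box type */
          if (fseeko(f, (off_t)off, SEEK_SET) != 0) break;
          if (fread(hdr, 1, 8, f) != 8) break;
          uint64_t size = ((uint64_t)hdr[0] << 24) | (hdr[1] << 16)
                        | (hdr[2] << 8) | hdr[3];
          if (size < 8) break;      /* malformed (or "largesize") box */
          off += size;              /* advance by offset, not pointer */
      }
  }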

C precisely defines the behavior of overflow for unsigned types. Java's primitives suck
because they're all signed; the fact that they wrap (because Java effectively stipulates a
two's-complement implementation) is useless. In fact, I can't even remember the last time I
used (or at least wanted) a signed type, in any language. Having to deal with that extra
dimension is a gigantic headache, and it's worth noting that Java is just as susceptible to
arithmetic bugs as C. I'd argue more so, because unwarranted reliance on such behavior invites
error, and such reliance is easier to justify or excuse in Java because it stipulates such
things so narrowly.
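
For instance, the wraparound test that is undefined for signed int is a perfectly defined idiom for unsigned types (a standard idiom, not code from this thread):

  /* Unsigned arithmetic wraps modulo 2^N, so the sum is smaller than
     either operand exactly when the addition wrapped. */
  unsigned add_checked(unsigned a, unsigned b, int *overflowed)
  {
      unsigned sum = a + b;
      *overflowed = sum < a;
      return sum;
  }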

C's integral types are in some ways superior to those of many other languages specifically
because they're so loosely defined by the spec. Short of transparently supporting big integers,
it forces you to focus more on values than on representations. That Java and C# stipulate a
fixed size is useless in practice; it doesn't help in the slightest with the task of
constraining range, which is almost always defined by the data and similar external context.
Any code which silently relies on a Java primitive type wrapping is poor code. Using comments
is always second to using masks and other techniques, where the code speaks for itself more
clearly than a comment ever could.
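
A small illustration of that masking style (my example): the intended range lives in the code itself rather than in a comment or in the width of the type:

  /* A 16-bit sequence counter kept in a wider type: the mask makes the
     wrap explicit instead of relying on the representation. */
  unsigned next_seq(unsigned seq)
  {
      return (seq + 1) & 0xFFFFu;   /* 0xFFFF wraps to 0 by construction */
  }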

A better system, of course, would utilize type ranges a la Ada.

Anyhow, I know the parent's point had more to do with pointers, but this all just goes to show
that good code doesn't rely on underlying representations, but only on the explicit logic of
the programmer and the semantics of the language.

Is this a useful optimization??

Posted Apr 21, 2008 16:00 UTC (Mon) by mikov (subscriber, #33179) [Link]

Instructions that trap are not going away, if only because they're useful in virtual machines--or the like--to which C can be targeted.

I disagree. A predictable conditional jump is better, simpler, and more efficient than a completely unpredictable trap. Even if the trap doesn't have to go through the OS signal handler (which it probably does on most OSes), it has to save context, etc.

One could argue for adding more convenient instructions for detecting an overflow and making a conditional jump on it. I strongly suspect that instructions that trap are currently a dead end.
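
GCC did eventually grow a source-level convenience along these lines; a sketch using the checked-arithmetic builtin added in GCC 5 (so long after this thread), which on x86 compiles to an add followed by a conditional jump on the overflow flag:

  #include <stdio.h>

  int main(void)
  {
      int r;
      /* Returns nonzero if the mathematical result did not fit in r;
         no undefined behavior is involved. */
      if (__builtin_add_overflow(2147483647, 1, &r))
          puts("overflow detected");
      return 0;
  }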

Anyway, we know x86 doesn't have instructions that trap on overflow. (Well, except "into", but no C compiler will generate that.) Do PPC and ARM have them, and are they used often?

That Java and C# stipulate a fixed size is useless in practice; it doesn't help in the slightest the task of constraining range, which is almost always defined by the data, and similar external context. Any code which silently relies on a Java primitive type wrapping is poor code.

That is not my experience at all. C99 defines "int_least32_t" and the like exactly to address that problem. Many C programmers like to believe that their code doesn't rely on the size of "int", but they are usually wrong. Most pre-C99 code would break horribly if compiled in a 16-bit environment, or where integer widths are not powers of 2, or where the representation is not two's complement.
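
A small illustration of those C99 types (my example; <inttypes.h> supplies the matching printf macros):

  #include <inttypes.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      int_least32_t total = 0;    /* at least 32 bits on any C99 target */
      uint8_t byte = 0xFF;        /* exactly 8 bits, where supported */

      for (int_fast16_t i = 0; i < 1000; ++i)  /* fastest type >= 16 bits */
          total += byte;
      printf("%" PRIdLEAST32 "\n", total);     /* prints 255000 */
      return 0;
  }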

Honestly, I find the notion that one can write code without knowing how wide the integer types are, in a language which doesn't implicitly handle overflow (unlike Python, Lisp, etc.), to be absurd.

I am 100% with you on the unsigned types, though.

I also agree that in practice Java is as susceptible to arithmetic bugs as C. However, it is for a different reason than the one you are implying: it is simply because, in practice, Java and C have _exactly_ the same integer semantics.

Java simply specifies things which 90% of C programmers mistakenly take for granted: wrap-around on overflow, truncating integer division, etc.
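
A worked example of that last point (mine): C99 pinned down the same truncating semantics that Java specifies, so both languages print the same thing here:

  #include <stdio.h>

  int main(void)
  {
      /* Division truncates toward zero, so the remainder takes the
         sign of the dividend in both C99 and Java. */
      printf("%d %d\n", -7 / 2, -7 % 2);   /* prints: -3 -1 */
      return 0;
  }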


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds