Is this a useful optimization?

Posted Apr 20, 2008 22:39 UTC (Sun) by mikov (subscriber, #33179)
In reply to: Is this a useful optimization? by viro
Parent article: GCC and pointer overflows

As you noted yourself, there is a subtle difference between undefined and
implementation-defined behavior. This particular case is clearly an example of
*implementation-defined* behavior: it behaves the same way on 99% of all CPUs today, and
there is nothing undefined about it. Not only that, it is not an uncommon idiom, and it can
be useful in a number of situations. How can you call it undefined?

You speak about architectures with different pointer representations, etc. I have heard that
argument before and I do not buy it. Such implementations mostly do not exist, or if they do,
they are not an important target for Standard C today, let alone GCC. (Emphasis on the word
"standard".)

Considering the direction hardware is going, such architectures are even less likely to
appear in the future. Instructions that trap for any reason are terribly inconvenient for
superscalar or out-of-order execution.

Anyway, I think I may not have been completely clear about my point. I am *not* advocating
that the standard should require some particular behavior on integer and pointer overflows. I
am however not convinced that it is a good idea to explicitly *prohibit* the behavior which is
natural to most computers in existence.

As it is now, Standard C is (ahem) somewhat less than useful for writing portable programs (to
put it extremely mildly). It is somewhat surprising that the Standard and (apparently) most
compiler implementors are also making it less useful for *non-portable* applications :-)


(I have to admit that years ago I was fascinated with the aesthetic "beauty" of portable C
code, or maybe I liked the challenge. Then I came to realize what a horrible waste of time it
was. Take Java for example - for all its horrible suckiness, it does a few things right - it
defines *precisely* the size of the integer types, the behavior on overflow, and the integer
remainder of negative numbers.

There is a lot of good in x86 being the universal CPU winner - we can finally forget the
idiosyncrasies of the seventies ...)



Is this a useful optimization?

Posted Apr 21, 2008 0:32 UTC (Mon) by nix (subscriber, #2304) [Link]

Pointer overflow is undefined because the Standard does not define it. 
That's what undefined *means*. The semantics of some construct are 
undefined both when the Standard says they are, and when it simply doesn't 
say anything one way or the other. The semantics of a construct are 
implementation-defined *only* when the Standard says they are. You can't 
say `oh, it doesn't say anything about $FOO, therefore it's 
implementation-defined'.
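
For concreteness, this is the sort of check at stake in the parent article (a minimal
sketch of mine; the names are hypothetical):

    #include <stddef.h>

    /* A classic buffer-bounds idiom: does buf + len wrap past the end
     * of the address space?  Forming a pointer more than one element
     * past the end of the object is already undefined in C99 (6.5.6),
     * so a compiler is entitled to assume the comparison below is
     * always false and delete the test entirely -- which is exactly
     * the GCC behavior the article describes. */
    int would_wrap(const char *buf, size_t len)
    {
        return buf + len < buf;
    }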

Is this a useful optimization?

Posted Apr 21, 2008 1:30 UTC (Mon) by mikov (subscriber, #33179) [Link]

I understand that. (In fact, I used to be intimately familiar with the C99 Standard several
years back, before I got disillusioned with it - how can you have any respect for a standard
containing "atoi()"? :-) )

I am not discussing the "meaning" of the Standard. The meaning is more than clear, at least in
this case, and if you look at my first post in this thread, you will see me explicitly noting
that this is a completely valid transformation according to the Standard. 

However, it is also completely valid *not* to perform this transformation. So this is
clearly a QoI (quality-of-implementation) issue, and I am discussing it as such. I am not
convinced that optimizing away operations which are otherwise completely valid and useful,
only because the standard says that it is allowed (but not required!), is a good idea.

The Standard is just a document. Fortunately it is not an absolute truth and it does not
prevent us from thinking on our own. Plus, it is not particularly inspiring or useful.

As you probably know very well, it is impossible to write a useful C application using only
the C standard. *Any* C application is non-portable and not standard-compliant. The C
Standard is so weak that it is practically useless, except as mental self-pleasuring, much as
we are doing now.

(BTW, people often delude themselves that they are writing compliant C, when their
applications happen to run on a couple of conventional 32-bit architectures, using the same
compiler, in a POSIX environment. Ha! Then they get ported to Windows (or the other way
around) and it is taken as absolute proof of compliance. In fact the applications are a
mess of special cases, #ifdefs, and non-standard assumptions.)

Is this a useful optimization?

Posted Apr 21, 2008 6:50 UTC (Mon) by nix (subscriber, #2304) [Link]

Ah, right. Well, QoI is in the eye of the beholder: I happen to think that 
the semantics of pointer overflow are something that only a maniac would 
rely upon intentionally, so the *most* we need is an extension of the 
existing compiler warning that `comparison is always true', or something 
along those lines.

Is this a useful optimization?

Posted Apr 21, 2008 17:32 UTC (Mon) by mikov (subscriber, #33179) [Link]

I was thinking mostly of integers, not pointers. However, I am hard-pressed to think of an
example where a pointer overflow would not safely wrap around _in practice_. Even with x86 FAR
pointers, the offset will safely wrap around; there will be no trap, and the comparison will
still work.

Are there many architectures in use today where pointers are not integers and do not wrap
around on overflow? Secondly, do these architectures run Linux, and are they supported by GCC?

Regardless of the answer to these questions, why would people writing code with zero chance
of running on those architectures be maniacs?

The C Standard is not a measure of good programming. It simply imposes a set of restrictions
to ensure portability between all kinds of weird architectures (I should say it fails at this
miserably, but that is beside the point). However, portability is not the only, and by far not
the most important, measure of code.

Is this a useful optimization?

Posted Apr 21, 2008 11:09 UTC (Mon) by wahern (subscriber, #37304) [Link]

Instructions that trap are not going away, if only because they're useful in virtual
machines--or the like--to which C can be targeted.

Relying too heavily on pointer arithmetic in algorithms is not the smartest thing to do. The
largest integral type supported on the computer I'm typing on is 64 bits (GCC, long long), but
pointers are only 32 bits. Parsing a 5GB ISO Base Media file (aka QuickTime MOV), I can keep
track of various information using an unsigned 64-bit integer; if I had written all my
algorithms to rely on pointer arithmetic to store or compute offsets, I'd be screwed.
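
A sketch of that approach (hypothetical structure and function names; assumes an ILP32
target where long long is 64 bits):

    #include <stdint.h>

    /* Describe a chunk ("box") of a large ISO Base Media file by its
     * 64-bit offset and size rather than by pointers.  On a 32-bit
     * machine a char * cannot span a 5GB file, but a uint64_t offset
     * can, and fseeko()/pread() can seek to it. */
    struct box {
        uint64_t offset;   /* where the box starts within the file  */
        uint64_t size;     /* box payload size; may well exceed 4GB */
    };

    uint64_t next_box_offset(const struct box *b)
    {
        return b->offset + b->size;   /* no pointer arithmetic needed */
    }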

C precisely defines the behavior of overflow for unsigned types. Java's primitives suck
because they're all signed; the fact that they wrap (because Java effectively stipulates a
two's-complement implementation) is useless. In fact, I can't even remember the last time I
used (or at least wanted) a signed type, in any language. Having to deal with that extra
dimension is a gigantic headache, and it's worth noting that Java is just as susceptible to
arithmetic bugs as C. I'd argue more so, because unwarranted reliance on such behavior invites
error, and such reliance is easier to justify or excuse in Java because it so narrowly
stipulates such things.

C's integral types are in some ways superior to many other languages' specifically because
they're so loosely defined by the spec. Short of transparently supporting big integers, it
forces you to focus more on values than representations. That Java and C# stipulate a fixed
size is useless in practice; it doesn't help in the slightest with the task of constraining
range, which is almost always defined by the data and similar external context. Any code which
silently relies on a Java primitive type wrapping is poor code. Using comments is always
second to using masks and other techniques, where the code speaks for itself more clearly
than a comment ever could.
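
To make both points concrete (a sketch of mine, not from the post): unsigned arithmetic in
C wraps by definition, and an explicit mask states the intended range in the code itself:

    #include <stdint.h>

    /* Unsigned overflow is fully defined: C99 6.2.5p9 says unsigned
     * arithmetic is reduced modulo 2^N, so this returns 0 for
     * x == UINT32_MAX, with no undefined behavior. */
    uint32_t wrap_increment(uint32_t x)
    {
        return x + 1u;
    }

    /* Where wrapping is intentional, an explicit mask documents the
     * range in the code itself, instead of a comment noting that the
     * type happens to be 32 bits wide. */
    uint32_t ring_next(uint32_t head)
    {
        return (head + 1u) & 0xFFu;   /* 256-entry ring buffer */
    }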

A better system, of course, would utilize type ranges a la Ada.

Anyhow, I know the parent's point had more to do with pointers, but this all just goes to show
that good code doesn't rely on underlying representations, but only on the explicit logic of
the programmer and the semantics of the language.

Is this a useful optimization?

Posted Apr 21, 2008 16:00 UTC (Mon) by mikov (subscriber, #33179) [Link]

> Instructions that trap are not going away, if only because they're useful in virtual machines--or the like--to which C can be targeted.

I disagree. A predictable conditional jump is better, simpler, and more efficient than a completely unpredictable trap. Even if the trap doesn't have to go through the OS signal handler (which it probably does on most OSes), it has to save context, etc.

One could argue for adding more convenient instructions for detecting an overflow and making a conditional jump on it. I strongly suspect that instructions that trap are currently a dead end.

Anyway, we know x86 doesn't have instructions that trap on overflow. (Well, except "into", but no C compiler will generate that.) Do PPC and ARM have them, and are they used often?
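
For what it's worth, overflow can be detected with plain comparisons and no trapping
instruction at all (a portable sketch; compilers lower this to compare-and-branch):

    #include <stdint.h>

    /* Decide whether a + b would overflow int32_t *before* doing the
     * addition, using only comparisons -- no trap, no undefined
     * behavior, just a couple of predictable conditional branches. */
    int add_would_overflow(int32_t a, int32_t b)
    {
        if (b > 0 && a > INT32_MAX - b) return 1;   /* overflow upward   */
        if (b < 0 && a < INT32_MIN - b) return 1;   /* overflow downward */
        return 0;
    }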

> That Java and C# stipulate a fixed size is useless in practice; it doesn't help in the slightest with the task of constraining range, which is almost always defined by the data and similar external context. Any code which silently relies on a Java primitive type wrapping is poor code.

That is not my experience at all. C99 defines "int_least32_t" and the like exactly to address that problem. Many C programmers like to believe that their code doesn't rely on the size of "int", but they are usually wrong. Most pre-C99 code would break horribly if compiled in a 16-bit environment, or one where integer widths are not powers of 2, or are not two's complement.
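
A sketch of what C99's <stdint.h> offers here:

    #include <stdint.h>

    /* C99 lets the code state its width requirements instead of
     * assuming what "int" happens to be on the current target: */
    int_least32_t counter;   /* at least 32 bits, on every conforming C99 target */
    uint_fast16_t index;     /* at least 16 bits, whatever width is fastest      */
    int32_t       sample;    /* exactly 32 bits, two's complement, if available  */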

Honestly, I find the notion that one can write code without knowing how wide the integer types are, in a language which doesn't implicitly handle overflow (unlike Python, Lisp, etc.), to be absurd.

I am 100% with you on the unsigned types, though.

I also agree that in practice Java is as susceptible to arithmetic bugs as C. However, it is for a different reason than the one you are implying: it is simply because, in practice, Java and C have _exactly_ the same integer semantics.

Java simply specifies things which 90% of C programmers mistakenly take for granted: wrap-around on overflow, truncating integer division, etc.
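
For example (a sketch; C99 and Java agree on both results):

    #include <stdio.h>

    int main(void)
    {
        /* C99 (like Java) defines integer division to truncate toward
         * zero, so the remainder takes the sign of the dividend. */
        printf("%d %d\n", -7 / 2, -7 % 2);   /* prints: -3 -1 */
        return 0;
    }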

