LWN: Comments on "Signed overflow optimization hazards in the kernel" https://lwn.net/Articles/511259/ This is a special feed containing comments posted to the individual LWN article titled "Signed overflow optimization hazards in the kernel". en-us Fri, 10 Oct 2025 19:32:52 +0000 Fri, 10 Oct 2025 19:32:52 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net About undefined behaviour... https://lwn.net/Articles/513139/ https://lwn.net/Articles/513139/ pwood John Regehr's blog posts on undefined behaviour are always worth reading (his research group developed the IOC patch to clang that a couple of people have linked to above.) In the context of integer behaviour in C he has a quiz <a rel="nofollow" href="http://blog.regehr.org/archives/721">here</a> which is worth a look. I'm not sure that you ever want to rely on undefined behaviour as suggested in the article as it means that the execution of your program is undefined and can change at the compiler's will. That is very different from relying on implementation defined behaviour. Thu, 23 Aug 2012 16:03:43 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/513072/ https://lwn.net/Articles/513072/ georgm <div class="FormattedComment"> The question concerns signed values, where wraps are not common. Unsigned benefit is clear.<br> </div> Thu, 23 Aug 2012 10:28:10 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/513059/ https://lwn.net/Articles/513059/ georgm <div class="FormattedComment"> A question:<br> <p> What is the reason to write "if((a-b) &lt; 0) ..." 
instead of just "if(a&lt;b)"?<br> </div> Thu, 23 Aug 2012 09:54:33 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/513056/ https://lwn.net/Articles/513056/ reddit <div class="FormattedComment"> The sensible way to do this is (int)(a - b) &lt; 0 where a and b are unsigned.<br> <p> </div> Thu, 23 Aug 2012 09:47:27 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512878/ https://lwn.net/Articles/512878/ etienne <div class="FormattedComment"> Seems like -ftrapv can work, but it is far from just adding an "into" instruction after each signed "add" on ia32 - note that "into" has disappeared in amd64 instruction set.<br> <p> $ cat test.c <br> int a, b, c;<br> void main (void) {<br> c = a + b;<br> }<br> $ gcc -m32 -O2 -ftrapv test.c <br> $ objdump -d a.out <br> <p> a.out: file format elf32-i386<br> ....<br> 080483f0 &lt;main&gt;:<br> push %ebp<br> mov %esp,%ebp<br> and $0xfffffff0,%esp<br> sub $0x10,%esp<br> mov 0x804a01c,%eax<br> mov %eax,0x4(%esp)<br> mov 0x804a020,%eax<br> mov %eax,(%esp)<br> call 8048420 &lt;__addvsi3&gt;<br> mov %eax,0x804a024<br> leave <br> ret <br> ....<br> 08048420 &lt;__addvsi3&gt;:<br> push %ebp<br> mov %esp,%ebp<br> push %ebx<br> sub $0x4,%esp<br> mov 0xc(%ebp),%ecx<br> mov 0x8(%ebp),%edx<br> call 804845c &lt;__i686.get_pc_thunk.bx&gt;<br> add $0x1bc2,%ebx<br> test %ecx,%ecx<br> lea (%ecx,%edx,1),%eax<br> js 8048450 &lt;__addvsi3+0x30&gt;<br> cmp %edx,%eax<br> setl %dl<br> test %dl,%dl<br> jne 8048457 &lt;__addvsi3+0x37&gt;<br> add $0x4,%esp<br> pop %ebx<br> pop %ebp<br> ret <br> xchg %ax,%ax<br> cmp %edx,%eax<br> setg %dl<br> jmp 8048444 &lt;__addvsi3+0x24&gt;<br> call 80482fc &lt;abort@plt&gt;<br> <p> <p> Without ftrapv:<br> 080483c0 &lt;main&gt;:<br> mov 0x804a01c,%eax<br> add 0x804a018,%eax<br> push %ebp<br> mov %esp,%ebp<br> mov %eax,0x804a020<br> pop %ebp<br> ret <br> <p> </div> Wed, 22 Aug 2012 13:21:45 +0000 Signed overflow optimization hazards in the kernel 
https://lwn.net/Articles/512864/ https://lwn.net/Articles/512864/ jezuch <div class="FormattedComment"> <font class="QuotedText">&gt; A code using lots of non-restrict pointers will certainly be difficult to vectorize. Such code can sometimes be auto-parallelized but I don't believe gcc has that capability.</font><br> <p> It sorta-kinda does (-ftree-parallelize-loops) but I haven't tested it much. But I expect it to be even weaker than vectorization.<br> <p> <font class="QuotedText">&gt; gcc's vectorization is also pretty weak, though it is getting better. That may be part of what you're seeing.</font><br> <p> It may be that. Or it may be that most of the code in non-HPC world does not lend itself to vectorization easily. I actually don't know, I'm just an amateur and hobbyist :)<br> </div> Wed, 22 Aug 2012 10:39:16 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512857/ https://lwn.net/Articles/512857/ mpr22 <tt>-ftrapv</tt> is an architecture-independent compiler option to GCC. I don't know whether it works. Wed, 22 Aug 2012 08:51:38 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512702/ https://lwn.net/Articles/512702/ daglwn <div class="FormattedComment"> <font class="QuotedText">&gt; but then, HPC code relies more on optimization by hand and leaves little </font><br> <font class="QuotedText">&gt; to the compiler, as I understand it)</font><br> <p> Not true. The Intel compiler, for example, will vectorize its brains out automatically.<br> <p> A code using lots of non-restrict pointers will certainly be difficult to vectorize. Such code can sometimes be auto-parallelized but I don't believe gcc has that capability.<br> <p> gcc's vectorization is also pretty weak, though it is getting better. 
That may be part of what you're seeing.<br> <p> </div> Tue, 21 Aug 2012 14:30:00 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512634/ https://lwn.net/Articles/512634/ jezuch <div class="FormattedComment"> As someone who insists on rebuilding some of Debian's packages for my own machine (for fun and not for profit), I can tell that vectorization very rarely has any significant impact. Yes, it can provide a great boost in some synthetic benchmarks or maybe in HPC code (but then, HPC code relies more on optimization by hand and leaves little to the compiler, as I understand it), but very few loops in real-world code are well-formed enough to be suitable for auto-vectorization. It won't make your browser visibly faster :)<br> </div> Tue, 21 Aug 2012 07:58:41 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512494/ https://lwn.net/Articles/512494/ daglwn <div class="FormattedComment"> You don't believe vectorization can significantly speed up code?<br> </div> Mon, 20 Aug 2012 16:45:15 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512400/ https://lwn.net/Articles/512400/ etienne <div class="FormattedComment"> <font class="QuotedText">&gt; So it is not too early to start future-proofing the Linux kernel by removing its reliance on signed integer overflow!</font><br> <p> It would be nice to have a GCC option for ia32 so that any "add" used for signed arithmetic is followed by "into" (exception if overflow).<br> It would not slow execution down too much ("into" will not be predicted as a taken jump), and could help locate potential problems...<br> <p> </div> Mon, 20 Aug 2012 08:59:56 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512394/ https://lwn.net/Articles/512394/ jezuch <div class="FormattedComment"> <font class="QuotedText">&gt; The compiler wants to know it's safe to assume this loop goes around 16 times.
That isn't true if "x + 16" could overflow.</font><br> <p> If you want to make it explicit, there's a flag for this in GCC: -funsafe-loop-optimizations (along with -Wunsafe-loop-optimizations if you want to know when it happens; it's a warning, though, so beware of build environments which insist on -Werror). AFAICT there are two cases handled by this flag: assuming that the loop counter does not overflow, and assuming that the loop is not infinite. I don't know the exact implementation details, though.<br> </div> Mon, 20 Aug 2012 07:22:52 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512381/ https://lwn.net/Articles/512381/ PaulMcKenney <div class="FormattedComment"> No problem!<br> <p> I must admit that it would be nice if "(signed typeof(a))" and "(unsigned typeof(b))" flipped the signedness of "a" and "b", but my version of gcc really doesn't like either variant. ;-)<br> </div> Mon, 20 Aug 2012 00:44:56 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512342/ https://lwn.net/Articles/512342/ baldrick <div class="FormattedComment"> By "signed" I meant "signed integer type of the same size" and by "unsigned" I meant "unsigned integer type of the same size". The "same size" means: the same number of bits as the original integer type, so in the case of example 2 this means "signed long long" and "unsigned long long". Sorry for not being clear.<br> </div> Sun, 19 Aug 2012 17:00:06 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512291/ https://lwn.net/Articles/512291/ giraffedata <blockquote> The result of ... converting an integer to a signed integer type when the value cannot be represented in an object of that type ... <p> For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; </blockquote> <p> I must be reading that wrongly, because that's not at all what GCC does. With -m32, int is a signed integer type of width 32.
UINT_MAX reduced modulo 2^32 is UINT_MAX, which is not within the range of int. So this does not describe what (int)UINT_MAX does. <p> Rather, what GCC does appears to be the opposite of what the standard requires for conversion from negative number to unsigned integer (add UINT_MAX+1 until it fits): it subtracts UINT_MAX+1 until the value is within the range of int (in this case -1). Sat, 18 Aug 2012 23:58:03 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512287/ https://lwn.net/Articles/512287/ jzbiciak <div class="FormattedComment"> I see I missed ppisa's impressive macros upthread. Those certainly seem to take into account a range of integer types as well! <br> </div> Sat, 18 Aug 2012 23:04:44 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512285/ https://lwn.net/Articles/512285/ jzbiciak <P>Where I've heard it coming up is when you have code that effectively looks like this:</P> <PRE> for (i = x; i < x + 16; i++) { /* code that does not modify either x or i */ } </PRE> <P> The compiler wants to know it's safe to assume this loop goes around 16 times. That isn't true if "x + 16" could overflow. </P> Sat, 18 Aug 2012 22:56:51 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512283/ https://lwn.net/Articles/512283/ jzbiciak <P>This seems like it argues for a macro to hide the ugliness. Name it something obvious such as "before()". It makes the intent obvious and ensures the ugly math is done correctly. 
Something like:</P> <PRE> /* return true if timestamp 'a' is earlier than timestamp 'b' */ #define before(a,b) (((unsigned long)(a) - (unsigned long)(b)) &gt; ULONG_MAX/2) </PRE> <P>Or is this just an invitation to more problems?</P> Sat, 18 Aug 2012 22:51:33 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512281/ https://lwn.net/Articles/512281/ ppisa <div class="FormattedComment"> Hmm, the cast to unsigned used to work even for signed types and in practice still works. a+=20 is translated into a single add instruction on all targets I know. The other reason for casting is that sometimes you can store only a shorter part of a time stamp or generation counter in an object, if you know that the live period is small enough or if you only check for change. Casting both to the smaller of the two possibly makes the subtraction cheaper; the result has to be cast to the smaller one anyway.<br> <p> But the main reason for casting is to ensure things work on existing code<br> with signed types.<br> <p> Best wishes,<br> <p> Pavel<br> <p> </div> Sat, 18 Aug 2012 22:34:04 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512264/ https://lwn.net/Articles/512264/ PaulMcKenney <div class="FormattedComment"> True enough!<br> <p> However, the Linux kernel's code can be legitimately used in any GPLv2 project, including those that might run on systems with non-twos-complement signed integer arithmetic.
This sharing of code among compatibly licensed projects is a very good thing, in my view.<br> <p> Which means in this case, where there is a solution that meets the C standard, and which loses nothing by doing so (at least on x86 and Power), it only makes sense to use that C-standard solution.<br> <p> </div> Sat, 18 Aug 2012 19:48:57 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512260/ https://lwn.net/Articles/512260/ PaulMcKenney <div class="FormattedComment"> Cool, that was the sort of thing I was thinking of with my "sizeof()" earlier, though I still do feel more comfortable with using the constants than relying on casting.<br> <p> But why the casts to unsigned integral types? I would instead have expected a requirement that the caller's cyclic arithmetic be carried out in unsigned integers, so that the casts were unnecessary.<br> </div> Sat, 18 Aug 2012 19:02:34 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512258/ https://lwn.net/Articles/512258/ PaulMcKenney <div class="FormattedComment"> One objection was that there really are still non-twos-complement machines in common use. As was noted by the comment to this article discussing saturating adders, where 32767+1==32767. But this would be addressed by "implementation defined" rather than "undefined".<br> <p> Another objection was that there are systems still in common use that trap on signed integer overflow. If the C standard required wrapping, compilers for such systems would require special edge-case checks on pretty much any signed integer operation.<br> <p> And there was of course also the objection that signed integer overflow always has been undefined. 
;-)<br> </div> Sat, 18 Aug 2012 18:19:43 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512256/ https://lwn.net/Articles/512256/ ppisa <div class="FormattedComment"> I hope that the behavior of unsigned to signed conversion stays defined (at least for GCC and GCC replacement compilers - LLVM, Intel etc.). GCC defines the behavior in the current manual version:<br> <p> <a href="http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html">http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation...</a><br> <p> I use ((signed type)((unsigned type)a - (unsigned type)b) &gt; 0) often in our embedded code, and in fact I am probably the author/coauthor of that MX1 example - if that line was not rewritten by somebody.<br> <p> Paul's example behavior is correct to my knowledge: (unsigned) is equivalent to (unsigned int), i.e. on a 32-bit target a 32-bit subtraction is evaluated, and the sign is then extended to 64 bits when the (signed int) result is converted to the required 64-bit type.<br> <p> The common trap is that plain<br> <p> (unsigned type) a - (unsigned type) b <br> <p> is considered neither (type) nor (signed type). When compared with zero, it behaves as a signed type big enough to hold an additional bit. So an additional cast to a signed type the same size as or smaller than both inputs (a and b) has to be used.<br> <p> I use the following mechanism to allow cyclic comparison between different<br> hardware, position, time and state generation counters etc. in our code.<br> <p> <a href="http://ulan.git.sourceforge.net/git/gitweb.cgi?p=ulan/ulut;a=blob;f=ulut/ul_utdefs.h;#l42">http://ulan.git.sourceforge.net/git/gitweb.cgi?p=ulan/ulut;...</a><br> <p> The library is licensed GPL, LGPL, MPL, but the code fragment can be taken as public domain, if it helps somebody.<br> <p> #ifndef ul_cyclic_gt<br> #define ul_cyclic_gt(x,y) \<br> ((sizeof(x)&gt;=sizeof(long long))&amp;&amp;(sizeof(y)&gt;=sizeof(long long))?
\<br> (long long)((unsigned long long)(x)-(unsigned long long)(y))&gt;0: \<br> (sizeof(x)&gt;=sizeof(long))&amp;&amp;(sizeof(y)&gt;=sizeof(long))? \<br> (long)((unsigned long)(x)-(unsigned long)(y))&gt;0: \<br> (sizeof(x)&gt;=sizeof(int))&amp;&amp;(sizeof(y)&gt;=sizeof(int))? \<br> (int)((unsigned int)(x)-(unsigned int)(y))&gt;0: \<br> (sizeof(x)&gt;=sizeof(short))&amp;&amp;(sizeof(y)&gt;=sizeof(short))? \<br> (short)((unsigned short)(x)-(unsigned short)(y))&gt;0: \<br> (signed char)((unsigned char)(x)-(unsigned char)(y))&gt;0 \<br> )<br> #endif /*ul_cyclic_gt*/<br> <p> #ifndef ul_cyclic_ge<br> #define ul_cyclic_ge(x,y) \<br> ((sizeof(x)&gt;=sizeof(long long))&amp;&amp;(sizeof(y)&gt;=sizeof(long long))? \<br> (long long)((unsigned long long)(x)-(unsigned long long)(y))&gt;=0: \<br> (sizeof(x)&gt;=sizeof(long))&amp;&amp;(sizeof(y)&gt;=sizeof(long))? \<br> (long)((unsigned long)(x)-(unsigned long)(y))&gt;=0: \<br> (sizeof(x)&gt;=sizeof(int))&amp;&amp;(sizeof(y)&gt;=sizeof(int))? \<br> (int)((unsigned int)(x)-(unsigned int)(y))&gt;=0: \<br> (sizeof(x)&gt;=sizeof(short))&amp;&amp;(sizeof(y)&gt;=sizeof(short))? \<br> (short)((unsigned short)(x)-(unsigned short)(y))&gt;=0: \<br> (signed char)((unsigned char)(x)-(unsigned char)(y))&gt;=0 \<br> )<br> #endif /*ul_cyclic_ge*/<br> <p> Please let me know if you know about some target, compiler or intention to break the assumption (at least hopefully guaranteed by current GCC versions) that unsigned to signed conversion reinterprets the MSB as a sign. As for correctness of the code, there could be a problem if a target specifies some basic arithmetic type with some bits unused, e.g. a 16-bit short stored in a 32-bit entity.
But none of our targets has that problem.<br> <p> The code is used on many targets, some of them safety-grade applications, so notice of possible (even future) breakage is critical for me and the users.<br> </div> Sat, 18 Aug 2012 18:07:14 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512248/ https://lwn.net/Articles/512248/ ppisa <div class="FormattedComment"> It would be worth compiling and running the kernel for MIPS with the GCC option -ftrapv, if its implementation is not broken in the current GCC version. MIPS has wrapping (addu, addiu) and signed-overflow-trapping (add, addi) variants of the instructions. But the wrapping variants are used even for signed types to keep compatibility with the way C code is usually written.<br> </div> Sat, 18 Aug 2012 17:20:01 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512178/ https://lwn.net/Articles/512178/ pdewacht But given that Linux is only intended to be compiled by gcc, we can rely on its implementation-defined behavior: <blockquote> The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 6.3.1.3). <blockquote> For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised. </blockquote> </blockquote> <p>(and I don't see how any compiler for a two's complement computer could define different behavior.)
Fri, 17 Aug 2012 21:57:31 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512147/ https://lwn.net/Articles/512147/ gmaxwell <div class="FormattedComment"> JM, the clang integer overflow checker we've used on our projects isn't part of clang proper, it's a (very useful) patch: <a href="http://embed.cs.utah.edu/ioc/">http://embed.cs.utah.edu/ioc/</a><br> <p> Beyond the optimization possibilities, the existence of tools like this is also a reason for keeping the undefined behavior, e.g. continue using signed values for counters that don't need the extra unsigned range: Most of the time overflow that you didn't expect (and thus couldn't wrap in a casting macro) is a sign of a logic error. By keeping it invalid you gain the possibility of dynamic instrumentation to catch those errors.<br> <p> (Though I don't know if anyone has managed to get tools like this working with the kernel yet!)<br> </div> Fri, 17 Aug 2012 19:16:59 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512144/ https://lwn.net/Articles/512144/ iabervon <pre> int find(char *s, char c, int lim) { for (int i = 0; i != lim &amp;&amp; s[i]; i++) if (s[i] == c) return i; return -1; } int main() { find("foo", 'o', 1); find("foo", 'o', -1); } </pre> <p>If the compiler inlines the second call, it can drop the "i != lim" test by assuming that overflow is impossible.</p> Fri, 17 Aug 2012 19:09:44 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/512111/ https://lwn.net/Articles/512111/ josh <div class="FormattedComment"> What did the arguments against it say, other than "that would remove compiler optimization possibilities"?<br> </div> Fri, 17 Aug 2012 16:07:24 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/511985/ https://lwn.net/Articles/511985/ wahern <p> You have that in reverse. Conversion to unsigned is always well defined. 
Conversion to signed where the value cannot be represented is implementation-defined: </p> <blockquote> C99 6.3.1.3 Signed and unsigned integers <ol> <li> When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.</li> <li> Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.49) </li> <li> Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.</li> </ol> </blockquote> Fri, 17 Aug 2012 06:45:04 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/511976/ https://lwn.net/Articles/511976/ jimparis <blockquote> Hmmm... If I compile the following with gcc 4.6.1 with -O2: <pre> unsigned long long signed_cast(unsigned long long a, unsigned long long b) { return (signed)((unsigned)a - (unsigned)b); } </pre> ...<br> This is a 32-bit subtraction on 64-bit quantities. </blockquote> Well, yeah, that's because "(signed)" is a 32-bit cast when compiled with -m32. I might be misunderstanding your issue here but it seems you wanted: <pre> unsigned long long signed_cast(unsigned long long a, unsigned long long b) { return (signed long long)((unsigned long long)a - (unsigned long long)b); } </pre> Fri, 17 Aug 2012 05:05:29 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/511961/ https://lwn.net/Articles/511961/ PaulMcKenney <div class="FormattedComment"> Though I could make the macros type-generic by (ab)using sizeof().
Not sure it is worth it, though.<br> </div> Fri, 17 Aug 2012 02:20:46 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/511931/ https://lwn.net/Articles/511931/ mmorrow Here's my favorite undefined-signed-overflow example: <pre> int f(int x){while(x &lt; x + 1) x += 1; return x;} </pre> gives: <pre> f: .L2: jmp .L2 .ident "GCC: (GNU) 4.8.0 20120408 (experimental)" </pre> Thu, 16 Aug 2012 23:34:15 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/511924/ https://lwn.net/Articles/511924/ klossner <div class="FormattedComment"> » I suspect that the dominance of twos complement was not due to ease of use, but rather due to the fact that it allows a single hardware adder to perform both signed and unsigned computations. «<br> <p> My recollection of those days is that the greater motivation was to get away from systems with two different representations of zero. There were ancient Fortran codes run on CDC mainframes that had to test results for both positive and negative zero.<br> <p> </div> Thu, 16 Aug 2012 23:19:28 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/511919/ https://lwn.net/Articles/511919/ klossner <div class="FormattedComment"> Sure, a good programmer should have done that. But the C compiler is often presented with automatically-generated code that has never been seen by human eyes, for example after several levels of macro expansion have done their work. Folding out the resulting dead code is often a substantial win.<br> <p> </div> Thu, 16 Aug 2012 23:10:23 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/511910/ https://lwn.net/Articles/511910/ cmccabe <div class="FormattedComment"> I still feel a little confused by this.
If you have a loop like this:<br> <p> <font class="QuotedText">&gt; int i;</font><br> <font class="QuotedText">&gt; for (i = 0; i &lt; 5; i++) {</font><br> <font class="QuotedText">&gt; [expression not involving i]</font><br> <font class="QuotedText">&gt; }</font><br> <p> I don't see how -fnowrapv is needed to unroll the loop. You know that starting at 0 and adding 1 will get you to 5 before it will get you to overflow.<br> <p> I guess you could come up with a scenario where you don't know the initial value of the counter, but you do have a constant positive increment and a set upper limit. So something like this:<br> <p> <font class="QuotedText">&gt; for (i = function_from_another_translation_unit(); i &lt; 5; i++) {</font><br> <font class="QuotedText">&gt; [expression not involving i]</font><br> <font class="QuotedText">&gt; }</font><br> <p> Even in this case, though, you still know that either the loop will run zero times, or there will be no overflow. You have to get to something like this to start seeing a problem:<br> <p> <font class="QuotedText">&gt; for (i = function_from_another_translation_unit(); i &lt; 0x7ffffffe; i += 2) {</font><br> <font class="QuotedText">&gt; [expression not involving i]</font><br> <font class="QuotedText">&gt; }</font><br> <p> This is a pretty exotic scenario, though. If i starts off as 0x7fffffff and -fwrapv is enabled, the loop will never terminate, whereas with -fnowrapv it's undefined. To me, this feels like buggy code in the first place, so I don't really care if it's not optimized.<br> <p> Am I missing something here? Is there a good example of a real-world scenario where -fnowrapv helps well-written code?<br> </div> Thu, 16 Aug 2012 23:10:05 +0000 About undefined behaviour... https://lwn.net/Articles/511907/ https://lwn.net/Articles/511907/ PaulMcKenney <div class="FormattedComment"> Thank you, highly recommended!<br> </div> Thu, 16 Aug 2012 22:52:19 +0000 About undefined behaviour... 
https://lwn.net/Articles/511905/ https://lwn.net/Articles/511905/ hummassa <div class="FormattedComment"> Great links! Thank you very much!!!<br> </div> Thu, 16 Aug 2012 22:50:11 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/511904/ https://lwn.net/Articles/511904/ PaulMcKenney <div class="FormattedComment"> Whew! It goes back to being non-urgent. ;-)<br> </div> Thu, 16 Aug 2012 22:49:32 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/511903/ https://lwn.net/Articles/511903/ PaulMcKenney <div class="FormattedComment"> Understood. And when it was suggested within the C11 Standards committee that signed-integer overflow be given twos-complement semantics, the discussion was both emphatic and short. ;-)<br> </div> Thu, 16 Aug 2012 22:48:42 +0000 Signed overflow optimization hazards in the kernel https://lwn.net/Articles/511887/ https://lwn.net/Articles/511887/ cmccabe <div class="FormattedComment"> I really doubt that using -fnowrapv or the equivalent provides much of a performance boost in practice.<br> <p> Most of the examples I've seen have been of the form "aha! I can (incorrectly) assume that this if statement that the programmer put here is a no-op!" In all of these cases I've seen, the "optimization" is something that a good programmer could have and probably would have done manually anyway.<br> <p> I'd really like to see some real-world performance numbers about the effects of this optimization. Based on everything I've seen so far, this is a case where compiler writers got so carried away considering what they _could_ do, that they didn't stop to think if they _should_.<br> </div> Thu, 16 Aug 2012 22:47:53 +0000