
Signed overflow optimization hazards in the kernel

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 3:40 UTC (Thu) by gmaxwell (guest, #30048)
Parent article: Signed overflow optimization hazards in the kernel

It's worth pointing out that the optimizations resulting from "signed counters can't overflow" can be pretty useful, since they allow the compiler to figure out loop trip counts, e.g. "this loop either runs exactly 8 times or it overflows; so it runs 8 times, and can be unrolled and vectorized".
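A minimal sketch of the kind of loop where this helps (a hypothetical example, not from the comment):

    /* With signed i and overflow treated as undefined, the compiler may
     * assume n + 8 does not overflow, so the loop provably runs exactly
     * 8 times and can be unrolled or vectorized. With -fwrapv (or with
     * unsigned arithmetic) the wrapped case must be considered and the
     * trip count is no longer a constant. */
    void scale8(int *a, int n)
    {
        for (int i = n; i < n + 8; i++)
            a[i - n] *= 2;
    }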



Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 21:53 UTC (Thu) by daglwn (guest, #65432) [Link] (3 responses)

Bingo. This is also why it's particularly bad to use unsigned loop counters.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 22:19 UTC (Thu) by PaulMcKenney (✭ supporter ✭, #9624) [Link] (2 responses)

So the kernel's current use of -fno-strict-overflow is presumably disabling a number of optimizations. How much performance would you two guess that the kernel is giving up?

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 22:45 UTC (Thu) by gmaxwell (guest, #30048) [Link] (1 responses)

It's hard to say—

First, for code in the kernel, I assume that vectorizing is off the table because IIRC the kernel normally avoids the XMM registers so that it doesn't have to save them... so that's a chunk of the benefit that wouldn't exist in the kernel.

Branch prediction is often effective enough to make the looping cost small. But not always. I've seen easily measurable gains in numerical code performance just from reorganizing things so the compiler could find limits and unroll, but it doesn't always matter.

The most obvious thing to do would be to compile the kernel without the switches and hope that it runs long enough to run a few benchmarks. :)

Another issue is that overflow is _usually_ a bug where it exists (yes, there are plenty of times where a programmer uses it intentionally, but those are far less common than the cases where an overflow is a bug). By preferring signed values and preventing overflow where you won't handle it, and using unsigned only where you need the range or actually intend the wraparound, you make tools that catch bugs via dynamic instrumentation of overflow far more powerful (e.g. http://embed.cs.utah.edu/ioc/). Though since no one is applying these sorts of tools to the kernel, I guess it doesn't matter currently. :)
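For instance, a minimal sketch of the kind of bug that such instrumentation catches (a hypothetical function, not from the comment):

    /* Classic midpoint bug: for large lo and hi the signed addition
     * lo + hi overflows, and a checker such as IOC (or a signed-overflow
     * sanitizer) flags it at run time. Written with unsigned arithmetic,
     * the wraparound would be silent and perfectly "legal". */
    static int midpoint(int lo, int hi)
    {
        return (lo + hi) / 2;    /* safer: lo + (hi - lo) / 2 */
    }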

Signed overflow optimization hazards in the kernel

Posted Aug 18, 2012 17:20 UTC (Sat) by ppisa (subscriber, #67307) [Link]

It would be worth compiling and running the kernel for MIPS with the GCC option -ftrapv, assuming its implementation is not broken in the current GCC version. MIPS has both wrapping (addu, addiu) and trapping-on-signed-overflow (add, addi) variants of its addition instructions, but the wrapping variants are used even for signed types, to keep compatibility with the way C code is usually written.
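A minimal sketch of what -ftrapv changes (a hypothetical example):

    /* Under -ftrapv, signed addition, subtraction and multiplication are
     * checked, and an overflow aborts the program instead of silently
     * wrapping; on MIPS the check could in principle use the trapping
     * "add" form instead of the wrapping "addu". */
    int next_id(int x)
    {
        return x + 1;    /* aborts under -ftrapv when x == INT_MAX */
    }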

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 22:47 UTC (Thu) by cmccabe (guest, #60281) [Link] (9 responses)

I really doubt that using -fno-wrapv or the equivalent provides much of a performance boost in practice.

Most of the examples I've seen have been of the form "aha! I can (incorrectly) assume that this if statement that the programmer put here is a no-op!" In all the cases I've seen, the "optimization" is something that a good programmer could have, and probably would have, done manually anyway.

I'd really like to see some real-world performance numbers about the effects of this optimization. Based on everything I've seen so far, this is a case where compiler writers got so carried away considering what they _could_ do, that they didn't stop to think if they _should_.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 23:10 UTC (Thu) by cmccabe (guest, #60281) [Link] (2 responses)

I still feel a little confused by this. If you have a loop like this:

> int i;
> for (i = 0; i < 5; i++) {
> [expression not involving i]
> }

I don't see why you need -fno-wrapv to unroll the loop. You know that starting at 0 and adding 1 will get you to 5 before it will get you to overflow.

I guess you could come up with a scenario where you don't know the initial value of the counter, but you do have a constant positive increment and a set upper limit. So something like this:

> for (i = function_from_another_translation_unit(); i < 5; i++) {
> [expression not involving i]
> }

Even in this case, though, you still know that either the loop runs zero times or there will be no overflow. You have to get to something like this to start seeing a problem:

> for (i = function_from_another_translation_unit(); i != 0x7ffffffe; i += 2) {
> [expression not involving i]
> }

This is a pretty exotic scenario, though. If i starts off as 0x7fffffff and -fwrapv is enabled, the loop will never terminate, whereas with -fno-wrapv it's undefined. To me, this feels like buggy code in the first place, so I don't really care if it's not optimized.

Am I missing something here? Is there a good example of a real-world scenario where -fno-wrapv helps well-written code?

Signed overflow optimization hazards in the kernel

Posted Aug 18, 2012 22:56 UTC (Sat) by jzbiciak (guest, #5246) [Link] (1 responses)

Where I've heard it coming up is when you have code that effectively looks like this:

    for (i = x; i < x + 16; i++)
    {
        /* code that does not modify either x or i */
    }

The compiler wants to know it's safe to assume this loop goes around 16 times. That isn't true if "x + 16" could overflow.
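For contrast, a sketch of the unsigned version (hypothetical), where the wraparound is well defined and that assumption is no longer available:

    /* With unsigned x, x + 16 may wrap to a small value; in that case the
     * condition is false on entry and the loop runs zero times, so the
     * compiler has to allow for a trip count of 0 or 16 rather than
     * assuming exactly 16. */
    for (unsigned i = x; i < x + 16; i++)
    {
        /* code that does not modify either x or i */
    }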

Signed overflow optimization hazards in the kernel

Posted Aug 20, 2012 7:22 UTC (Mon) by jezuch (subscriber, #52988) [Link]

> The compiler wants to know it's safe to assume this loop goes around 16 times. That isn't true if "x + 16" could overflow.

If you want to make it explicit, there's a flag for this in GCC: -funsafe-loop-optimizations (along with -Wunsafe-loop-optimizations if you want to know when it happens; it's a warning, though, so beware of build environments which insist on -Werror). AFAICT there are two cases handled by this flag: assuming that the loop counter does not overflow, and assuming that the loop is not infinite. I don't know the exact implementation details, though.

Signed overflow optimization hazards in the kernel

Posted Aug 16, 2012 23:10 UTC (Thu) by klossner (subscriber, #30046) [Link]

Sure, a good programmer should have done that. But the C compiler is often presented with automatically-generated code that has never been seen by human eyes, for example after several levels of macro expansion have done their work. Folding out the resulting dead code is often a substantial win.
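A minimal sketch of the kind of generated check that folds away (a hypothetical macro; handle_overflow() stands in for whatever the generated code would do):

    /* With a signed index and a positive constant step, the test
     * "(idx) + (STEP) < (idx)" can only be true if the addition overflows;
     * with overflow undefined, the compiler may treat the branch as dead
     * code and drop it entirely. */
    #define ADVANCE(idx, STEP)              \
        do {                                \
            if ((idx) + (STEP) < (idx))     \
                handle_overflow();          \
            (idx) += (STEP);                \
        } while (0)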

Signed overflow optimization hazards in the kernel

Posted Aug 17, 2012 19:09 UTC (Fri) by iabervon (subscriber, #722) [Link]

int find(char *s, char c, int lim)
{
  for (int i = 0; i != lim && s[i]; i++)
    if (s[i] == c)
      return i;
  return -1;
}

int main()
{
  find("foo", 'o', 1);
  find("foo", 'o', -1);
}

If the compiler inlines the second call, it can drop the "i != lim" test by assuming that overflow is impossible.

Signed overflow optimization hazards in the kernel

Posted Aug 20, 2012 16:45 UTC (Mon) by daglwn (guest, #65432) [Link] (3 responses)

You don't believe vectorization can significantly speed up code?

Signed overflow optimization hazards in the kernel

Posted Aug 21, 2012 7:58 UTC (Tue) by jezuch (subscriber, #52988) [Link] (2 responses)

As someone who insists on rebuilding some of Debian's packages for my own machine (for fun and not for profit), I can tell you that vectorization very rarely has any significant impact. Yes, it can provide a great boost in some synthetic benchmarks or maybe in HPC code (but then, HPC code relies more on optimization by hand and leaves little to the compiler, as I understand it), but very few loops in real-world code are well-formed enough to be suitable for auto-vectorization. It won't make your browser visibly faster :)

Signed overflow optimization hazards in the kernel

Posted Aug 21, 2012 14:30 UTC (Tue) by daglwn (guest, #65432) [Link] (1 responses)

> but then, HPC code relies more on optimization by hand and leaves little
> to the compiler, as I understand it)

Not true. The Intel compiler, for example, will vectorize its brains out automatically.

Code using lots of non-restrict pointers will certainly be difficult to vectorize. Such code can sometimes be auto-parallelized, but I don't believe gcc has that capability.
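A minimal sketch of the difference (a hypothetical example):

    /* Without restrict, the compiler must assume dst and src may overlap,
     * so every store can change a later load and vectorization is usually
     * blocked; the restrict qualifiers promise the arrays do not alias. */
    void add_arrays(float *restrict dst, const float *restrict src, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] += src[i];
    }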

gcc's vectorization is also pretty weak, though it is getting better. That may be part of what you're seeing.

Signed overflow optimization hazards in the kernel

Posted Aug 22, 2012 10:39 UTC (Wed) by jezuch (subscriber, #52988) [Link]

> Code using lots of non-restrict pointers will certainly be difficult to vectorize. Such code can sometimes be auto-parallelized, but I don't believe gcc has that capability.

It sorta-kinda does (-ftree-parallelize-loops) but I haven't tested it much. But I expect it to be even weaker than vectorization.

> gcc's vectorization is also pretty weak, though it is getting better. That may be part of what you're seeing.

It may be that. Or it may be that most code outside the HPC world does not lend itself easily to vectorization. I actually don't know, I'm just an amateur and hobbyist :)

