Infinite loop language lawyering

Posted Jul 5, 2024 14:27 UTC (Fri) by epa (subscriber, #39769)
Parent article: New features in C++26

> Infinite loops had originally been made undefined behavior in order to allow optimizing compilers to assume that programs made forward progress;
But surely there are other ways for the standard to allow that optimization, rather than the very blunt hammer of “undefined behaviour”? After all, as we are often reminded, undefined means any semantics is fair game. A weak interpretation would mean “the compiler can assume the loop always terminates”, which is fine. But a strong interpretation would be “if the compiler can prove the loop is infinite, it can give any semantics it likes”. That sounds far-fetched, but we have seen many cases where code that theoretically has undefined behaviour got a far more legalistic interpretation than the programmer expected.



Infinite loop language lawyering

Posted Jul 5, 2024 15:59 UTC (Fri) by flussence (guest, #85566) [Link] (73 responses)

The thing with undefined behaviour is that it has only ever meant the standard doesn't force a course of action, and the compiler is free to do in that situation what it thinks best.

A shockingly large number of people have this bizarre misinterpretation that UB means some words printed on toilet paper are the only thing preventing the computer from committing overt acts of sadism against the programmer and user, and it's just waiting for any and every opportunity to do so — which says more about them than it does about the computer.

And while I have to admit some software projects *do* behave exactly like that, a C/++ compiler isn't going to, especially in this day and age where attempted jackassery can be defused by setting $CC differently. It might optimize away part of a function body after an infinite loop, it might not, but in either case it'll probably emit a warning to let the programmer know they should state their intent more clearly. Those are the only reasonable expectations to have here.

Infinite loop language lawyering

Posted Jul 5, 2024 16:15 UTC (Fri) by intelfx (subscriber, #130118) [Link] (58 responses)

> A shockingly large number of people have this bizarre misinterpretation that UB means some words printed on toilet paper are the only thing preventing the computer from committing overt acts of sadism against the programmer and user, and it's just waiting for any and every opportunity to do so — which says more about them than it does about the computer.

This is only because it has been demonstrated, time and again, that committing certain sorts of UB while using a modern optimizing compiler can and will cause exactly this sort of spooky action at a distance, reinterpreting the program in ways that appear completely unreasonable without a painstaking session of language lawyering.

Infinite loop language lawyering

Posted Jul 5, 2024 16:35 UTC (Fri) by malmedal (subscriber, #56172) [Link] (11 responses)

> ... reinterpreting the program in ways that appear completely unreasonable ...

Yes, I keep wishing that whenever gcc was forced to add a flag to disable some optimisation because an existing program would otherwise be mis-compiled (e.g. -fwrapv and friends), that would be taken as incontrovertible proof that the flag should be enabled by default and the C standard updated to match...
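
To make the -fwrapv case concrete, here is a hedged sketch (the function name is mine, not from any particular project) of the sort of pre-existing overflow check that flag was added to keep working. With -fwrapv the comparisons do what they say; without it, a + b overflowing is undefined behaviour, so a compiler may assume both comparisons are always false and drop them.

/* Overflow check written in terms of wrapping signed arithmetic.
 * Well-defined with -fwrapv; relies on UB without it. */
int add_would_wrap(int a, int b) {
    return (b > 0 && a + b < a) ||
           (b < 0 && a + b > a);
}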

Infinite loop language lawyering

Posted Jul 5, 2024 18:33 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

It was attempted and it doesn't work

Such an approach would only have worked if there had been a way to convince developers to treat UBs seriously.

Then a contract could be established: compiler users promise not to abuse UBs, while compiler developers and standard writers try to define a sane set of UBs.

Unfortunately, lots of C/C++ developers demand the O_PONIES two-stage treatment:

  1. All good optimizations that I like should be performed even if they break someone else's programs.
  2. All bad optimizations that I dislike shouldn't ever be performed if they break my program.

And usually that is framed as “the only reasonable expectations to have”.

Because $CC can always be set differently, you know. Well, you may set $CC to any value you want, but that doesn't change the fact that while 30-40 years ago there were dozens, if not hundreds, of C compilers, today, if we are talking about C++23, there are about half a dozen of them, and they all treat UB in a very similar (and unforgiving) fashion.

That's just an unfortunate consequence of the basic approach used by all optimizing compilers.

But as long as compiler users don't want to accept that, and reserve the right to write programs with UB and claim that they are “correct”, the other sides have zero incentive to change anything. Why should they? What would it give them?

Infinite loop language lawyering

Posted Jul 5, 2024 22:28 UTC (Fri) by malmedal (subscriber, #56172) [Link]

> It was attempted and it doesn't work

I'm aware of that effort; it always devolves into endless bikeshedding. Which is why we need an unambiguous rule, like the one I only semi-seriously proposed.

Infinite loop language lawyering

Posted Jul 5, 2024 19:45 UTC (Fri) by dvdeug (guest, #10998) [Link] (8 responses)

What about -ftrapv? That exists and can't be turned on with -fwrapv. The world of CPU architectures has shrunk since the C standard was first written, but a lot of things don't necessarily work the same way on all systems that run C; that's why they're undefined.

And what are people going to do when they find out the C compiler now produces programs that are 20% slower, that benchmarks are showing Ada or Rust to be regularly faster than C? They're going to switch on a bunch of switches, probably a bunch that some website recommended for making fast code with GCC, without any idea what those switches actually do, and then complain when their code breaks.

It's a hard problem. If you want a language with things defined much more tightly, there are many of them out there. Java, for example, defines an int to be a signed 32-bit two's-complement value with wrapping arithmetic, whether you're running a 32-bit system, a 64-bit system, or a 24-bit system with saturating arithmetic. But C programmers want maximal speed and are theoretically fine if everything blows up due to their error. They're going to be unhappy if their code slows down due to optimizations being disabled.

Infinite loop language lawyering

Posted Jul 5, 2024 22:39 UTC (Fri) by malmedal (subscriber, #56172) [Link]

> What about -ftrapv? That exists and can't be turned on with -fwrapv.

Tie-breaking should be done by what gcc implemented first.

> They're going to switch on a bunch of switches,

A minority would; the noise level would go way down, but not to zero.

Infinite loop language lawyering

Posted Jul 10, 2024 4:07 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (5 responses)

> And what are people going to do when they find out the C compiler now produces programs that are 20% slower, that benchmarks are showing Ada or Rust to be regularly faster than C?

That would be pretty surprising, considering that Rust has neither the signed overflow UB nor the no-forward-progress UB, and seems to optimize just fine.

Unless, of course, you're talking about an unfair comparison in which Rust is permitted to do full LTO, and C is forced to optimize one translation unit at a time. Then, yeah, Rust wins, because you decided to make C leave a bunch of performance on the table. Unfortunately, this unfair comparison is meaningful, because LTO for C is widely perceived as problematic or at least a hassle, and LTO for Rust is widely perceived as the default or standard way to compile.

Infinite loop language lawyering

Posted Jul 10, 2024 11:29 UTC (Wed) by Wol (subscriber, #4433) [Link]

> > And what are people going to do when they find out the C compiler now produces programs that are 20% slower, that benchmarks are showing Ada or Rust to be regularly faster than C?

> That would be pretty surprising, considering that Rust has neither the signed overflow UB nor the no-forward-progress UB, and seems to optimize just fine.

The problem with that is that so many benchmarks measure the wrong thing. I'll take a Rust program that produces the right answer 20% slower, over a C program that is far faster at producing the wrong answer. At least bogomips is quite explicit in its name that the result is completely bogus.

Cheers,
Wol

Infinite loop language lawyering

Posted Jul 10, 2024 12:33 UTC (Wed) by atnot (subscriber, #124910) [Link] (3 responses)

> That would be pretty surprising, considering that Rust has neither the signed overflow UB nor the no-forward-progress UB, and seems to optimize just fine

It may not be called "forward progress rule UB", but Rust does absolutely have it, as I discovered a few times writing embedded code :). It does appear to have logic that preserves the very literal `loop { }` statements though, i.e. an unconditional loop with no break. I don't know if that's actually guaranteed. But as soon as you create e.g. an infinite for loop, even one that would be very obviously interminable to a compiler, it just vanishes into the æther:

https://godbolt.org/z/EoGE9f6G8

Infinite loop language lawyering

Posted Jul 10, 2024 13:20 UTC (Wed) by farnz (subscriber, #17727) [Link] (1 responses)

Your example isn't an infinite for loop, since overflow in Rust is defined as wrapping around. So 2.. eventually wraps around to 1, and thus x == 1 becomes true.

If I change it to use an explicit u32 loop index, cast to u64, and compare to 233, it becomes an infinite loop as you'd expect.
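
(As a side note for the C-minded, a hedged analogue of why the original loop was finite, with a function name of my own invention: the index type's arithmetic is defined to wrap, so the loop eventually reaches its exit value, and a compiler may fold the whole thing away.)

/* Finite despite appearances: i wraps past UINT_MAX to 0 and then hits 1,
 * so the loop runs exactly UINT_MAX times and a compiler may fold it away. */
unsigned long spin_until_wrap(void) {
    unsigned long n = 0;
    for (unsigned int i = 2; i != 1; i++)
        n++;
    return n;   /* UINT_MAX */
}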

Infinite loop language lawyering

Posted Jul 10, 2024 13:47 UTC (Wed) by excors (subscriber, #95769) [Link]

And opt-level > 0 disables overflow checks by default. If you add -Coverflow-checks=on to atnot's example, then it still optimises away the loop and immediately panics with "attempt to add with overflow".

Infinite loop language lawyering

Posted Jul 11, 2024 4:18 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

Rule of thumb: If your code contains no use of the unsafe keyword, and you do not use any libraries (or you're reasonably confident that all the library code you use is correct), then you should not experience UB at all. Any UB you do encounter is either a bug in Rust or a bug in one of the libraries you're using.

Since your linked code has no unsafe blocks and no libraries, it is either well-defined (in a way that you did not anticipate), or it is a soundness bug in Rust. In this case, it appears to be the result of wrapping signed arithmetic in release mode. You can prevent this by calling methods such as i64::checked_add() instead of using the addition operator, or just build and test in debug mode (and it will panic instead of wrapping). See also the documentation in https://doc.rust-lang.org/std/ops/struct.RangeFrom.html

Infinite loop language lawyering

Posted Jul 19, 2024 10:11 UTC (Fri) by anton (subscriber, #25547) [Link]

Most pro-UB advocacy comes without numbers for the supposed performance benefits of assuming that UB does not happen. You give a 20% number, but of course without any support and in a completely unspecified hypothetical setting. And the rest of your performance claims are also purely hypothetical. Everybody should learn to laugh about such crude marketing tactics and remember the person who uses them as someone who is trying to fool us.

Infinite loop language lawyering

Posted Jul 5, 2024 18:15 UTC (Fri) by khim (subscriber, #9252) [Link] (45 responses)

> A shockingly large number of people have this bizarre misinterpretation that UB means some words printed on toilet paper are the only thing preventing the computer from committing overt acts of sadism against the programmer and user,

Indeed, a shockingly large number of people understand how UB works. Sadly, that understanding is not universal.

> It might optimize away part of a function body after an infinite loop, it might not, but in either case it'll probably emit a warning to let the programmer know they should state their intent more clearly.

Have you actually tried to verify that? Take the well-known example of a C compiler “disproving” Fermat's Last Theorem. The latest version of ICC still does that. No warning, just an “improved” program that does things that should be impossible.

And yes, it's specifically about an infinite loop being undefined behaviour, not about integer overflow as usual.
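
For readers who haven't seen it, here is a hedged sketch of that kind of program (my own reconstruction, not the exact code from the example; the function name and the 1000 bound are mine). The loop performs no I/O and touches no volatile or atomic state, so a compiler entitled to assume forward progress may conclude it terminates, i.e. that a counterexample to Fermat's Last Theorem exists, and cheerfully print the “disproof”.

#include <stdio.h>

/* Search for a counterexample to a^3 == b^3 + c^3 with 0 < a,b,c <= 1000.
 * No such counterexample exists, so as written this loops forever
 * (assuming 32-bit int, the cubes below stay within range). */
static int fermat(void) {
    const int MAX = 1000;
    int a = 1, b = 1, c = 1;
    while (a > 0) {                 /* always true, but not a constant expression */
        if (a * a * a == b * b * b + c * c * c)
            return 1;               /* "counterexample found" */
        a++;
        if (a > MAX) { a = 1; b++; }
        if (b > MAX) { b = 1; c++; }
        if (c > MAX) { c = 1; }     /* wrap around and keep searching forever */
    }
    return 0;
}

int main(void) {
    if (fermat())
        printf("Fermat's Last Theorem has been disproved!\n");
    return 0;
}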

> Those are the only reasonable expectations to have here.

And here starts another thread where someone explains how something that compilers were always doing, and still do, is “unreasonable”, and how compilers should be rewritten by someone (who?) to conform to the expectations of people who don't understand how compilers work and don't want to understand how compilers work.

This show was interesting the first time I saw it, decades ago; by now it has become pretty dull.

> The thing with undefined behaviour is that it has only ever meant the standard doesn't force a course of action, and the compiler is free to do in that situation what it thinks best.

But that's precisely how compilers behave! If some code can only be executed in a program that triggers UB then you can assume that code which triggers UB is never executed, thus can be eliminated as “dead code”.

That's a pretty basic optimization which all compilers are doing today! And yet it leads to outcomes that you call “unreasonable”.
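
A hedged sketch of that reasoning in action (the function is mine, invented for illustration): the dereference makes a NULL pointer UB, so the later check can only execute in a program that has already triggered UB, and is therefore fair game for dead-code elimination.

int read_field(int *p) {
    int v = *p;        /* UB if p == NULL, so the compiler may assume p != NULL */
    if (p == NULL)     /* ...which makes this branch provably dead...           */
        return -1;     /* ...and eligible for removal                           */
    return v;
}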

How long do people plan to pretend that everyone around them is wrong and demand “reasonable” compilers if it's pretty clear, by now, that they are not getting them ever?

Infinite loop language lawyering

Posted Jul 5, 2024 19:17 UTC (Fri) by Wol (subscriber, #4433) [Link] (41 responses)

> But that's precisely how compilers behave! If some code can only be executed in a program that triggers UB then you can assume that code which triggers UB is never executed, thus can be eliminated as “dead code”.

But the problem is the compiler assumes that reality does not exist! Why else would the compiler optimise away all those programmer checks for things that - IN REALITY - screw up?

It's all very well the compiler living in its own alternate universe, but we live in what we like to think is the real one - where maths is supposed to describe what actually happens, not claim that reality is impossible and has no right to exist.

We're not demanding o_ponies - we just want our compilers to produce the same result as the real world does!

Cheers,
Wol

Infinite loop language lawyering

Posted Jul 5, 2024 19:33 UTC (Fri) by mb (subscriber, #50428) [Link] (29 responses)

> But the problem is the compiler assumes that reality does not exist!
> Why else would the compiler optimise away all those programmer checks for things that - IN REALITY - screw up?

Just encode the reality into your source code instead of assuming certain behaviors for UB and everything will be fine.

The compiler has no sense of "reality". It has the standard and the source code.

Infinite loop language lawyering

Posted Jul 13, 2024 6:33 UTC (Sat) by cpitrat (subscriber, #116459) [Link] (28 responses)

> instead of assuming certain behaviors for UB and everything will be fine.

You forgot the "never inadvertently write code that is UB" part, which is about as helpful as "do not write bugs".

The big problem with UB is not people willingly using it but people inadvertently being bitten by it.

Infinite loop language lawyering

Posted Jul 13, 2024 9:03 UTC (Sat) by Wol (subscriber, #4433) [Link]

> The big problem with UB is not people willingly using it but people inadvertently being bitten by it.

Or worse, people who tried to get it right and failed!

Cheers,
Wol

Infinite loop language lawyering

Posted Jul 13, 2024 17:28 UTC (Sat) by mgb (guest, #3226) [Link] (26 responses)

Serious question, because I've been writing C and later C++ for more than forty years and I still don't know the answer:

How bad would it be if C/C++ compilers issued warnings for all UB and all optimizations relying on "UB can't happen" except where for a performance critical piece of code the authors wrapped it in a "Do not flag this I know what I'm doing and I'll stake my career on this" macro or pragma or whatever?

Infinite loop language lawyering

Posted Jul 13, 2024 18:07 UTC (Sat) by mpr22 (subscriber, #60784) [Link] (19 responses)

By "all UB", do you mean "UB that a static verifier can prove will definitely be triggered by code within the translation unit" or "UB that a static verifier can prove could be triggered if code outside the translation unit passed (in)appropriate arguments to a function inside the translation unit, or returned (in)appropriate values when called from inside the translation unit"?

Infinite loop language lawyering

Posted Jul 13, 2024 18:54 UTC (Sat) by mgb (guest, #3226) [Link] (18 responses)

Good question. Let's for now consider just one easy case:

How bad would it be if, every time a compiler actually optimized some C/C++ code based on detected UB, it issued a warning, unless the warning for that code had been specifically suppressed?

Infinite loop language lawyering

Posted Jul 13, 2024 19:58 UTC (Sat) by atnot (subscriber, #124910) [Link]

As said, compilers don't detect UB. In fact they can't, or this wouldn't even be a discussion.

Compilers just apply rules that make assumptions and the assumptions they're allowed to make as per the standard are that signed integers won't overflow, dereferenced pointers are valid, empty loops don't count as a side effect, etc.

You *could* warn every time such an assumption is made. But this would mean a[i]++ emits at least five warnings (pointer validity (and its many subtypes), alignment validity (and its subtypes), index validity, overflow of each part in base+(index*stride), overflow of the increment).
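
To make that concrete, a hedged sketch (the wrapper function is mine) spelling out, as comments, the assumptions being made silently for that one expression:

void bump(int *a, int i) {
    /* Assumed rather than checked:
     *  - a is a valid pointer, sufficiently aligned for int
     *  - a + i stays within the bounds of the same object
     *  - a[i] + 1 does not overflow a signed int */
    a[i]++;
}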

For a practical example, consider alignment. Many architectures require special or multiple, slower instructions to load unaligned values. By being able to assume all dereferenced pointers are sufficiently aligned, compilers can unconditionally emit the faster instructions instead. Since it is almost impossible to guarantee what the alignment will be (at minimum once you start calling out to foreign code like libc), this would mean warning on every pointer dereference in the entire program.

There is pretty much only one way to solve this, which is to make it impossible to get invalid pointers from valid ones and require some special operation or blessing to turn unknown pointers into valid ones. Which is why most languages seeking to avoid this situation have ended up with that.

Optimizers exploiting UB

Posted Jul 13, 2024 20:00 UTC (Sat) by farnz (subscriber, #17727) [Link] (15 responses)

The problem is that the compiler does not optimize based on detected UB; rather, you have transforms (like "a * 5" turning into "a + (a << 2)", only often more complicated) which are only legal in combination on the assumption that the codebase does not contain UB. Because the combination of transforms is a performance improvement, the optimizer makes them all, and the resulting code no longer does what you expect.

And that mental model of "the compiler optimizes based on detected UB" is part of the communication gap - it's not that the compiler optimizes based on detected UB, but that the compiler optimizes on the assumption that there is no UB. The only way for the compiler to avoid depending on there being no UB is for the compiler to not optimize at all.

Optimizers exploiting UB

Posted Jul 14, 2024 12:10 UTC (Sun) by kleptog (subscriber, #1183) [Link] (3 responses)

I suppose an analogue would be how physicists and statisticians switch the order of integrals in ways that makes pure mathematicians cringe. It works when all the functions are "smooth enough", but sometimes it doesn't work and you get weird results.

Within their field the switching of the integrals works and is a valid optimisation and saves a lot of time. But if you tried to apply the same optimisation in other places you would get Undefined Behaviour.

This is usually not a problem, but if you see the amount of work required for computer proofs to actually validate all the assumptions, you can understand why compilers don't go to that effort.
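
(For the mathematically inclined, the condition being alluded to is, roughly, Fubini's theorem; a hedged statement of it, in LaTeX notation:

\[ \int\!\!\int |f(x,y)|\,dx\,dy < \infty \;\Rightarrow\; \int\Big(\int f(x,y)\,dx\Big)dy = \int\Big(\int f(x,y)\,dy\Big)dx \]

i.e. absolute integrability is the "smooth enough" condition that justifies swapping the order of integration.)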

Optimizers exploiting UB

Posted Jul 14, 2024 14:15 UTC (Sun) by Wol (subscriber, #4433) [Link] (2 responses)

> I suppose an analogue would be how physicists and statisticians switch the order of integrals in ways that makes pure mathematicians cringe. It works when all the functions are "smooth enough", but sometimes it doesn't work and you get weird results.

AOL!

I was helping analyse truck arrival times recently, and my colleague was deferring to my ?superior? statistical knowledge. I waffled on about standard deviation and how it would give us an answer that was "good enough", but she picked up on a couple of anomalies and said "how can I present this to senior management?"

My answer, which went down okay as far as I could tell, was "we're using bell-curve analysis on a skewed distribution. It's the wrong maths, but it's good enough for what we're doing".

If you understand what you're doing is dodgy, and evaluate the risks, then it's okay. If you blindly apply formulae without understanding what those formulae do, you're going to screw up.

Cheers,
Wol

Optimizers exploiting UB

Posted Jul 15, 2024 1:20 UTC (Mon) by himi (subscriber, #340) [Link] (1 responses)

This is one of the common things I've seen about numerical modelling, particularly in the area of CFD - you can set up your models such that they run and give you coherent-seeming answers, but unless you understand both the domain being modelled *and* the numerical modelling methods being used (and their constraints, particularly with rounding and precision) you have no way of knowing if those coherent-seeming answers have any relation to reality. It's the classic garbage-in, garbage-out problem, turbocharged by the combination of tooling that makes things "easy", and extraordinary complexity underlying both the real world *and* the computational methods.

Compilers are at least operating in a relatively constrained domain, with well defined rules . . . except that strolling through the copious arguments that always crop up in any discussion that touches on language specification and compiler behaviour makes it pretty clear that "well defined rules" are something you need to be focusing on right from the start of your language design, rather than retrofitting to an already established language.

Optimizers exploiting UB

Posted Jul 15, 2024 8:29 UTC (Mon) by farnz (subscriber, #17727) [Link]

> Compilers are at least operating in a relatively constrained domain, with well defined rules . . . except that strolling through the copious arguments that always crop up in any discussion that touches on language specification and compiler behaviour makes it pretty clear that "well defined rules" are something you need to be focusing on right from the start of your language design, rather than retrofitting to an already established language.

Even if you do focus on well-defined rules from day 1, the whole field of programming language semantics is relatively young, and it's not completely clear which formal semantic model gets you the best outcomes; we know that axiomatic semantics are hard for non-specialists to interpret correctly, but that's about it for things we can use to choose between formalisms. At the moment, it looks like operational semantics are the front-runner, since they lend themselves nicely to optimizations on the basis of the "as-if" rule, but it's not clear whether operational or denotational semantics are easier to transform to a form that's clear to non-specialists. And there's no point having "correct" formal semantics if they're only referenced by specialists in formal semantics (since that's a subset of compiler developers); we need the formal semantics to at least be transformable into something that's clear to a skilled programmer.

Optimizers exploiting UB

Posted Jul 15, 2024 14:20 UTC (Mon) by Wol (subscriber, #4433) [Link] (5 responses)

> The problem is that the compiler does not optimize based on detected UB; rather, you have transforms (like "a * 5" turning into "a + (a << 2)", only often more complicated) which are only legal in combination on the assumption that the codebase does not contain UB. Because the combination of transforms is a performance improvement, the optimizer makes them all, and the resulting code no longer does what you expect.

To my mind, this instantly splits optimisations into two sorts (and I'm sure you'll tell me things don't work that way :-) but there is (1) your a*5 example, where the calculation still takes place, just that the result is not what you expect.

Then there's (2), where constant-folding changes the program flow - for example the test for undefined behaviour that folds to false because the compiler assumes undefined behaviour can't happen.

How easy it would be to do, I don't know (as in, does the compiler even have the context to do so), but if we could have "Warning: the expression inside a conditional has been folded to true/false", that would pick up a lot of the problematic optimisations.

So long as we can avoid warning on stuff that's actually been written as "while(true)" or "if(false)", (and I've no doubt programmers write that stuff - I do) then it shouldn't be a problem, because if that's what programmers meant, they should have written it.

Cheers,
Wol

Optimizers exploiting UB

Posted Jul 15, 2024 14:48 UTC (Mon) by farnz (subscriber, #17727) [Link] (4 responses)

That runs into the problem of macro expansion, constexpr functions, inlining and more. while(1) is clearly intended to be an infinite loop, but from a C99 or C11 compiler's point of view, while(true) is the same as while(HOW_BIG_IS_THAT_FISH), since true is not a value, but a macro that expands to the value 1.

And the problem with your description is that the compiler doesn't know that the "test folded to false"; it knows that there's a conditional that's always false - but it can't tell whether it's intentionally always false, or whether that's an accident. Compilers can't read developer minds, yet.

Remember that, for a lot of the key optimizations, it's not working at the level of the source language - it's working with something like LLVM IR, and what it's seeing is a conditional jump where the condition is a constant, but it has no clue that the condition was "meant" to be variable, since the optimizations aren't working at that level. And it could be hundreds of optimization passes in before the condition becomes constant, at which point it's really hard to work out whether it's constant because the compiler has assumed no UB, or whether it's constant because the fully-defined behaviour of the language makes it constant.

Optimizers exploiting UB

Posted Jul 15, 2024 15:55 UTC (Mon) by Wol (subscriber, #4433) [Link] (1 responses)

> And the problem with your description is that the compiler doesn't know that the "test folded to false"; it knows that there's a conditional that's always false - but it can't tell whether it's intentionally always false, or whether that's an accident.

And the compiler doesn't track whether a constant expression was constant when it was first encountered, or whether a previous optimisation pass has folded it. And as you say, macros screw things up.

I wasn't asking it to read minds - it was as simple as "did the programmer write "while(true)", or "while(a==b)"", but I'm not surprised you're saying even something that simple is hard to track. Because if the latter evaluates to a constant there's a good chance the programmer didn't mean it (but if that code was in a macro, then it might not always evaluate to a constant for the programmer who wrote the macro, but does for the programmer using the macro :-(

Cheers,
Wol

Detecting UB involves reading minds :-(

Posted Jul 15, 2024 17:27 UTC (Mon) by farnz (subscriber, #17727) [Link]

Unfortunately, it becomes impossible for the compiler to even determine what the programmer wrote at the point where it's interesting to know if a condition is constant, especially since defensive programming means that a condition might be known to be constant, even though it's variable at source level. Consider the following C function:


static int fetch_update_package_epoch(epochs *epoch_area) {
    if (epoch_area == NULL) {
         return 0;
    }
    epoch_area->package += 1;
    return epoch_area->package;
}

At source level, this is clearly not attempting to invoke UB, and there's no constant comparisons involved. As a result, by the time the optimizer is relying on assuming no UB, what it will be looking at (after optimizing the function's IR over many passes) is something like:


define dso_local i32 @fetch_update_package_epoch(ptr noundef %epoch_area) local_unnamed_addr {
entry:
  %cmp = icmp eq ptr %epoch_area, null
  br i1 %cmp, label %return, label %if.end

if.end:
  %0 = load i32, ptr %epoch_area, align 4
  %add = add nsw i32 %0, 1
  store i32 %add, ptr %epoch_area, align 4
  br label %return

return:
  %retval.0 = phi i32 [ %add, %if.end ], [ 0, %entry ]
  ret i32 %retval.0
}

This is short enough to inline; when it's inlined, the optimizer will replace a %call = call noundef i32 @fetch_update_package_epoch(epochs*)(ptr noundef nonnull %epochs) with the function body, making the right substitution for epoch_area. Now, because the call tells us that epoch_area is nonnull, the comparison becomes a constant, the branch can be made unconditional, and then dead code elimination removes the NULL check.

But the problem is that what's triggered the comparison becoming a constant is the compiler identifying that epoch_area in the call at IR level is nonnull. The optimizer can't warn you that it's eliding the NULL check based on this, because it cannot distinguish "this is nonnull because otherwise there is UB in the program" from "this is nonnull because the source lines are epoch_area epochs = { package = 1 }; pkg = fetch_update_package_epoch(&epochs);. Both of those are the same IR at this point - a call with a non-NULL annotation on the pointer - how should the optimizer know that one of those annotations is fine to ignore, but the other matters? And note that you also don't want to warn just because a pointer is marked as non-NULL because it'd be UB to be NULL, since that would cause you to warn on:


static int fetch_update_package_epoch(epochs *epoch_area) {
    epoch_area->package += 1;
    return epoch_area->package;
}

But it might be perfectly clear from the rest of the source code context that, in this case, epoch_area will never be NULL, because no function calls this with a possibly-NULL pointer.

Optimizers exploiting UB

Posted Jul 15, 2024 19:35 UTC (Mon) by unilynx (guest, #114305) [Link] (1 responses)

> That runs into the problem of macro expansion, constexpr functions, inlining and more. while(1) is clearly intended to be an infinite loop, but from a C99 or C11 compiler's point of view, while(true) is the same as while(HOW_BIG_IS_THAT_FISH), since true is not a value, but a macro that expands to the value 1.

eslint has a nice fix for this exact situation in JavaScript: you can configure it to require for(;;) to build an endless loop. No constant folding happens, so no confusion.

https://eslint.org/docs/latest/rules/no-constant-condition

JavaScript not a good source of analogies or fixes, unfortunately.

Posted Jul 15, 2024 19:54 UTC (Mon) by farnz (subscriber, #17727) [Link]

That only works because the output language is still JavaScript, with JavaScript semantics; the issue in this case is that the compiler has read in C (or Rust, or whatever) source code, translated it to an IR like GIMPLE or LLVM IR (with different semantics to the source language) on the basis that the translation only has UB in the IR if the source code had UB, optimized the IR on the assumption of no UB in the IR output, and then ended up with IR that it translates to machine code that has different semantics to the one the programmer intended, because the programmer did not intend to write code with UB.

Note, for example, that C++'s "forward progress" assumption also allows for endless loops, as long as they're "trivial", change state visible to another thread, or they contain I/O. It's only pure computational loops that can be elided if they do not terminate, and only if they're non-trivial in the original source - so for(;;) {} is trivial, and therefore cannot be elided, while for(;;) do_the_thing() is non-trivial, and can be elided if do_the_thing() has no effects visible outside the current thread. This is useful for compilers, since once it's able to show that do_the_thing() is a no-op, it can elide the whole loop - even if do_the_thing is actually a macro containing a break; statement.
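
A hedged sketch of that distinction (the function names are mine), following the C++ rules as described above:

void do_the_thing(void) { /* assume: no I/O, no volatile, no atomics */ }

void spin_trivial(void)    { for (;;) { } }              /* trivial: must be preserved */
void spin_nontrivial(void) { for (;;) do_the_thing(); }  /* non-trivial: may be elided once
                                                             the body is proven to have no
                                                             effects visible outside the thread */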

There is no real equivalent of this stack of languages in the JavaScript ecosystem, because the "machine code" you execute is also JavaScript.

Optimizers exploiting UB

Posted Jul 19, 2024 10:25 UTC (Fri) by anton (subscriber, #25547) [Link] (4 responses)

> transforms (like "a * 5" turning into "a + (a << 2)", only often more complicated) which are only legal in combination on the assumption that the codebase does not contain UB.

This is certainly a valid optimization with -fwrapv semantics. With -ftrapv semantics it's also valid AFAICS. So it's certainly not a transformation that is "only legal in combination on the assumption that the codebase does not contain UB."

Pro-UB advocates like to give the impression that basically every optimization requires the assumption that the program does not exercise undefined behaviour (e.g., they love to recommend -O0 for those people who do not want optimizations that assume the absence of UB), but in reality most optimizations do not require that assumption. Here we have an example where a pro-UB advocate wanted to give an example of an optimization that requires this assumption but failed.

Optimizers exploiting UB

Posted Jul 19, 2024 11:16 UTC (Fri) by farnz (subscriber, #17727) [Link] (3 responses)

It was not intended to be a transformation that is "only legal on the assumption that the codebase does not contain UB"; it was an example of a transform that is hopefully uncontroversial, in order to indicate just how much you're leaving on the table if you say "the code must behave exactly as written, and no optimizations are permitted in case a combination of optimizations result in UB doing something unexpected". In other words, that particular transform is an example of something that's barred if you say "use -O0" and yet that everyone reasonable agrees is one that should be permitted.

You even quoted the parts where I said that this wasn't an optimization that, on its own, depends on UB existing - I said "like … only often more complicated" to indicate that this transform is not itself controversial. I also said "transforms which are only legal in combination", because the problem is rarely transformations that rely on UB in their own right, but rather combinations of transforms where each transform is a valid optimization on its own, but the combination of 2 or more transforms (bearing in mind that even bad compilers have hundreds of optimizations they know how to perform) is only OK in the absence of UB.

And the problem is that the resulting combinatorial explosion of possible combinations of transforms, each valid in its own right, and where the combination is itself valid in the absence of UB, is a big space to search through. By defining more behaviours, you limit the allowed transformations and the allowed combinations of transforms, so that these combinations become compiler bugs, instead of bugs in your code.

Finally, I consider your name-calling defamatory; please desist from making false statements about me with the intention of making other people think worse of me.

Optimizers exploiting UB

Posted Jul 19, 2024 14:34 UTC (Fri) by Wol (subscriber, #4433) [Link] (2 responses)

I got the point.

It's a bit like a sequence of integer multiplies and divides. *I* may know that if I do them in the correct order, I will get the correct result without either over- or under-flow, but if some clever idiot loses my parentheses in the bowels of the compiler, then the computer will get it wrong. Equally, if I'm used to the compiler doing it left-to-right, and then a slight change makes the compiler do it right-to-left. It's the law of unintended consequences, writ large, and the continual flow of new optimisers writs it ever larger :-)

Hence the stream of people moaning about it as they get bitten out of the blue ...

Cheers,
Wol

Optimizers exploiting UB

Posted Jul 19, 2024 15:02 UTC (Fri) by pizza (subscriber, #46) [Link] (1 responses)

> It's a bit like a sequence of integer multiplies and divides. *I* may know that if I do them in the correct order

... but do you, really? Have you proven that your specified ordering does the right thing on all possible inputs?

(On a near-daily basis I have to deal with the consequences of arithmetic overflow and issues relating to integer precision)

> will get the correct result without either over- or under-flow, but if some clever idiot loses my parentheses in the bowels of the compiler, then the computer will get it wrong. Equally, if I'm used to the compiler doing it left-to-right, and then a slight change makes the compiler do it right-to-left. It's the law of unintended consequences,

This is a poor example; operator evaluation order (including explicit evaluation ordering signified by use of parenthesis) is part of the core language specification, and compilers can't (and don't) muck with that.

Optimizers exploiting UB

Posted Jul 19, 2024 15:25 UTC (Fri) by adobriyan (subscriber, #30858) [Link]

> (including explicit evaluation ordering signified by use of parenthesis)

What are you talking about? The evaluation order of "x = a + (b + c);" is "anything".

Infinite loop language lawyering

Posted Jun 20, 2025 23:27 UTC (Fri) by vonbrand (subscriber, #4458) [Link]

If a compiler is able to detect a potential problem, and it is very unlikely to be a false positive (you don't want a warning every two lines), compilers certainly do warn. For instance, possibly using a variable before assigning it a value is pretty routinely warned about. It is just very hard to detect all UB reliably enough not to drown you in warnings.

Infinite loop language lawyering

Posted Jul 13, 2024 18:46 UTC (Sat) by mb (subscriber, #50428) [Link]

> How bad would it be if C/C++ compilers issued warnings for all UB

Compilers don't try to and fundamentally can't find most actual UB in programs.
They *assume* that there is no UB and then apply optimizations with that assumption as a base.

Compilers do *not* find UB and then exploit it to optimize the program.
They assume that there is no UB.

> "Do not flag this I know what I'm doing and I'll stake my career on this"

Use volatile and/or -O0.

Infinite loop language lawyering

Posted Jul 19, 2024 9:52 UTC (Fri) by anton (subscriber, #25547) [Link] (1 responses)

You are getting all kinds of excuses about the supposed impossibility or impracticality of such warnings. But if the C++ compiler writers actually wanted to provide such warnings, there are ways to do so. E.g., for the classic example where the compiler assumes that overflow does not happen, generate code both without that assumption and with it; if the code is different, output a warning. With tracking of where in the source code the assumptions about, e.g., value ranges are coming from, the warnings can also pinpoint that code. But of course the culprits among the compiler maintainers do not want to do that, because it would distract them from producing better benchmark results.

One interesting case is the idiom "if (x+1 <= x)". Someone has built recognition of this idiom into gcc, because gcc warns about it (or at least something similar). However, gcc does not optimize it into the shortest code "inc x; jno ..." when -fwrapv is enabled, nor does gcc optimize (x==LONG_MAX) into this code (at least when I last looked). So these are idioms that some gcc maintainer working on user friendliness recognizes and warns about, but the gcc maintainers working on benchmark results do not optimize them.
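
For reference, a hedged sketch of the idiom in question (the wrapper name is mine):

/* With -fwrapv this is a well-defined "x is INT_MAX" test;
 * without it, x + 1 overflowing is UB, so the compiler may warn about
 * the comparison and/or fold it to false. */
int increment_would_overflow(int x) {
    return x + 1 <= x;
}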

Infinite loop language lawyering

Posted Jul 19, 2024 15:06 UTC (Fri) by atnot (subscriber, #124910) [Link]

> But if the C++ compiler writers actually wanted to provide such warnings, there are ways to do so.

I mean, they are. Look at GCC -fanalyzer, or even much of the existing warning infrastructure, all of which uses much of the same machinery as the compiler and in the case of -fanalyzer also extra stuff like symbolic evaluation that the compiler normally won't do.

However, as you'll also discover reading the fanalyzer blog posts, the hard part is not finding possible UB; that's easy, too easy. It's making a tool that has sufficiently low false positives that anyone is actually willing to use it. Something that can reliably tell the difference between code that humans would consider "obviously correct" and code that is subtly wrong, across various contexts and in every stage of compilation from the source down to the final assembly. That's what people who ask for this actually want, and that is the thing that is nigh impossible in a language where right and wrong code look almost identical, but which people demand must do everything exactly as they ask, make it fast, output exactly the assembly instructions they expect (see your comment), and never break or change anything.

Infinite loop language lawyering

Posted Jul 19, 2024 14:24 UTC (Fri) by Wol (subscriber, #4433) [Link] (2 responses)

> How bad would it be if C/C++ compilers issued warnings for all UB and all optimizations relying on "UB can't happen"

VERY bad, I think. Look at my discussions with farnz and pizza.

The problem basically can be summed up in two words - "macros" and "templates". In other words, a massive chunk of the code that actually gets to the compiler, wasn't written by the programmer. And a lot of that code probably relies on "UB won't happen".

So, to put it simply, you'll drown in false positives, most of them down to code that's nothing whatsoever to do with what you actually wrote. I couldn't agree more with what you want/wrote, I just don't think with modern compilers it's remotely practical. Half the problem is they keep adding more and more "cleverer" optimisers that bite mere mortals in the butt. You can turn off all the stupid optimisers, but can you even find them all? Especially as your definition of "stupid" probably disagrees with mine.

Cheers,
Wol

Infinite loop language lawyering

Posted Jul 19, 2024 14:53 UTC (Fri) by pizza (subscriber, #46) [Link]

> You can turn off all the stupid optimisers, but can you even find them all?

Uh, yeah.. "-O0" turns off _everything_. And everything rolled into -O[n>0] can be individually selected.

...O0 also disables a lot of warnings because those rely on the optimization passes to detect questionable stuff.

> Especially as your definition of "stupid" probably disagrees with mine.

The definition of "stupid" also varies depending on the hardware and libc/operating system.

Stupid optimizations that depend on UB can't happen

Posted Jul 19, 2024 17:54 UTC (Fri) by farnz (subscriber, #17727) [Link]

There's a second layer of trouble, too. A single optimization looks like "if these obligations are fulfilled by the current IR, then this IR transform is applicable and will improve the output"; the compiler will have some form of sequencing and limits on which individual optimizations it's willing to run, but it will otherwise repeatedly loop running the optimizations whose obligations are fulfilled until it gets to a point where no optimization is available or the compiler has hit its limits.

And the thing that hurts is not normally a single optimization; rather, it's that a sequence of optimizations have their obligations met, and the resulting output is not what was intended, because the compiler front-end gave you IR that fulfilled the obligations on the basis that the source code contained no UB, but you wanted a different behaviour.

It's worth noting, too, that a lot of the "exploiting UB" that compilers do comes from a position of wanting code to have "portable performance". For example, there's a lot of C code out there that uses int as the index into an array (instead of size_t, which is technically what you're "meant" to use); on systems where int and size_t are the same size, this is a complete non-issue, since if signed overflow is defined as two's-complement wraparound, you'd still generate the same code for arr[start + offset] as you would if it was undefined behaviour (assuming a non-adversarial compiler). On a two's-complement 32-bit machine, there's no non-adversarial code generation for arr[start + offset] that's better than the obvious "add start to offset, use indexed addressing with a shift to get to the right bit of memory".

But on an LP64 machine, it gets more interesting. If start and offset are both ints, then the addition is done in int precision, then it's supposed to be sign-extended to 64-bit (to match size_t), then the shift-and-add happens in 64-bit. If you define signed overflow as "what x86-64 instructions do", then this requires a sign extension; if you don't define it, then the compiler is allowed to elide the sign extension, and generate the equivalent instructions to the 32-bit version.
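
In source terms, a hedged sketch of the pattern being discussed (the function name is mine):

double get(double *arr, int start, int offset) {
    /* "start + offset" is computed in int; because signed overflow is UB,
     * the compiler may assume the sum never wraps, and so can widen to
     * 64-bit address arithmetic without first sign-extending a possibly
     * wrapped 32-bit result. */
    return arr[start + offset];
}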

Now, it's very reasonable to argue about whether or not the code should just be fixed to use size_t instead, or to say that the performance hit of sign extension is a price worth paying to remove UB on signed overflow. But the current situation is that compilers have to depend on no signed overflow in order to elide the sign extension, and they want to do that because it saves a few percentage points on a benchmark that compilers care about.

And once you've written an optimization that fixes this in LLVM IR terms, it starts to fire in places where, instead of being an unmitigated benefit (as in the indexing case), it's a surprise.

For the curious, the sort of optimization that I'd be expecting to see fire in this particular case would be looking for LLVM IR like:

  %add = add nsw i32 %7, %8
  %idxprom = sext i32 %add to i64
with the prerequisite for applicability being that the use of the output of the sext instruction is undefined if the output is negative. If that's the case, you can remove the sext instruction and replace %idxprom with %add in the rest of the scope. For indexing, that'll be fine, since you'll use getelementptr inbounds to do the array[offset] operation, and inbounds guarantees that array + offset is within the bounds of the allocation tied to array, so you know that if %idxprom is used as the offset in a getelementptr inbounds, it'll be UB if %idxprom < 0.

With that in place as an optimization, clang just needs to know that addition of signed integers should be add nsw, rather than plain add, since add is defined even in the case of overflow, and nsw modifies arithmetic instructions to say that signed overflow is undefined. But now you can get surprise cases that aren't the indexing case we were trying to optimize, where the compiler happens to know that the output use case is undefined (possibly because another optimization pass was able to directly use the output with a getelementptr inbounds, or similar) if the output is negative, and therefore this optimization fires, resulting in you being surprised.

And that's the sort of thing that leads to "UB is evil" situations; the optimizations individually all make sense, are well-motivated, and make the output better; but they combine in unexpected ways, and it's the combinations that hurt, and hurt badly.

Infinite loop language lawyering

Posted Jul 5, 2024 20:12 UTC (Fri) by dvdeug (guest, #10998) [Link]

Really? Because in the reality I live in, 2,147,483,647 + 1 never equals a negative number.

> we just want our compilers to produce the same result as the real world does!

Python will accept that x + 1 > x for all integer values of x. People arguing about C don't want unbounded integers, because they're slow, but that means you don't want the real world, you want a fast approximation of it. They want 1/10 to equal zero, instead of one-tenth (like in Scheme), even though that constantly confuses new programmers. At which point, what compromises are acceptable aren't something that can be sweepingly defined; you've got to carefully negotiate what costs too much and what makes programming too frustrating to be worth its cost.

(For a non-math example, functions that take a float pointer and an int pointer can assume they don't alias. That rule was added in C99 because Fortran compilers always followed it and could at times produce faster code than C compilers because of it. Is it worth it? I can't say, but someone specifically thought that rule was worth adding because of the optimization it permitted. If you didn't care at all about the optimization, why were you using C instead of any number of slower, more "real-world" languages?)
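
A hedged sketch of the kind of code that aliasing rule affects (the function name is mine):

/* Because int and float are different types, the compiler may assume
 * *ip and *fp never refer to the same object, so it can keep *fp cached
 * in a register across the store to *ip. */
float scale_and_mark(float *fp, int *ip) {
    *fp *= 2.0f;
    *ip = 1;        /* assumed not to modify *fp */
    return *fp;     /* the reload may be elided  */
}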

Infinite loop language lawyering

Posted Jul 5, 2024 23:09 UTC (Fri) by khim (subscriber, #9252) [Link] (9 responses)

> But the problem is the compiler assumes that reality does not exist!

Sure. But that's the only kind of compiler that we can currently produce. To assume that reality exists, a compiler would need consciousness and an understanding of said reality.

Whether that's a good idea or not is completely irrelevant when we can't give it those things!

> Why else would the compiler optimise away all those programmer checks for things that - IN REALITY - screw up?

Because the compiler “doesn't know any better”. It doesn't know anything at all, in reality.

It just blindly applies very simple rules many times in a row and hopes to produce better results from them.

It can get some estimate about whether the result is better or worse, but it's intrinsically incapable of deciding whether it is removing programmer checks or not!

> We're not demanding o_ponies - we just want our compilers to produce the same result as the real world does!

Nope. You are precisely demanding O_PONIES. You are asking compiler to produce something that you perceive as reality.

But people are different. And what they perceive as reality is different, too.

None of compilers ever acted like you ask them to behave.

They were just too primitive to produce surprising results.

Infinite loop language lawyering

Posted Jul 6, 2024 16:16 UTC (Sat) by Wol (subscriber, #4433) [Link] (8 responses)

> Nope. You are precisely demanding O_PONIES. You are asking compiler to produce something that you perceive as reality.

> But people are different. And what they perceive as reality is different, too.

> None of compilers ever acted like you ask them to behave.

When the world was fresh, and Unix was young, EVERY C compiler acted like I'm asking them to.

If I asked them for

"x = a + b;"

that's what they gave me. They didn't say "ooh - the real mathematically correct answer isn't the answer the CPU is going to give me so I won't bother answering the question - oh and based on the fact I can't be bothered to answer the question I won't bother doing a load of other work too".

All I want is if I ask the computer a question, it should please god deign to give me an answer, not just ignore it because it thinks it knows better.

And if the C specification says "Ascii silly question, get a silly ansi", then fine. That's my problem. But the modern C specifications (as opposed to the old C standards - that's lower case s - what compilers actually did back then) say "Ascii silly question, watch C do a Cheshire Cat smile".

(Reality is what the computer does. In my reality, I would prefer it if the computer actually tried to do what I told it to. If I tell it wrong, that's my problem!)

Cheers,
Wol

Infinite loop language lawyering

Posted Jul 6, 2024 17:19 UTC (Sat) by khim (subscriber, #9252) [Link] (6 responses)

> If I asked them for "x = a + b;" that's what they gave me.

But if you asked for

x = 2 + 2;

you were getting code that would just store 4 into x. You couldn't just take the address of a function with such code and then monkey-patch the 2 to a 3. Because that 2 was never there.

I have tried to find some compiler which wouldn't do that and couldn't find any. Even old Turbo C 1.0 does that. Unconditionally.

> They didn't say "ooh - the real mathematically correct answer isn't the answer the CPU is going to give me so I won't bother answering the question - oh and based on the fact I can't be bothered to answer the question I won't bother doing a load of other work too".

Sure. But that wasn't because they could understand your intent, but because they were very primitive and couldn't reason globally. The approach to optimizations was always the same, though: a function that tries to look into its own code to find the 2 and monkey-patch it to a 3 has undefined behavior and is not supposed to be supported.

And you couldn't draw a line in the sand between “then” and “now”: compilers always relied on the absence of UB, but since they were primitive and could only “reason” locally, a human could always predict whether they would “take advantage of UB” or not.

As the number of optimizations grew, that has become harder and harder, but there is no “line in the sand” that separates “good compilers” from “bad compilers”.

> All I want is if I ask the computer a question, it should please god deign to give me an answer, not just ignore it because it thinks it knows better.

Well… maybe if compilers were to gain consciousness they would be able to do that, but none of the compilers in use can. Not even -O0 may save you. Just try to find a compiler which retains the 2 in the generated code for a return 2 + 2; line.

> In my reality, I would prefer it if the computer actually tried to do what I told it to.

Well, none of the compilers do that these days (have any ever existed? I couldn't find one), and even if you were to create a compiler that actually retained that 2 in the machine code, I doubt anyone would use it.

The problem with O_PONIES lovers is not that they don't want the compiler to do optimizations for them; rather, they expect that the compiler will magically know which optimizations they would like and which they wouldn't.

That is why it's O_PONIES: I could have understood people who wanted, and made for themselves, a compiler which altered absolutely nothing in the code and would have even left that 2 + 2 as three separate instructions… that would have taken less time than complaining about the evil of compiler writers on various forums… but no, that's not what is happening: they still expect the compiler to do some “sane” optimizations… they just could never agree on which optimizations they consider “sane”.

Infinite loop language lawyering

Posted Jul 6, 2024 19:02 UTC (Sat) by Wol (subscriber, #4433) [Link] (5 responses)

> And you couldn't draw a line in the sand between “then” and “now”: compilers always relied on the absence of UB, but since they were primitive and could only “reason” locally, a human could always predict whether they would “take advantage of UB” or not.

Except UB is a *new* concept that postdates C by a *lot* of years ... so compilers couldn't take advantage of it if it didn't exist. THAT is the problem. The standards postdate the compilers, and today's compiler writers have taken advantage of that to change the behaviour of language.

It took the appearance of the official standards to create UB. Before that, compilers actually behaved reasonably sanely.

Cheers,
Wol

Infinite loop language lawyering

Posted Jul 7, 2024 10:53 UTC (Sun) by khim (subscriber, #9252) [Link] (4 responses)

> THAT is the problem.

Nope, that's not a problem at all.

> Except UB is a *new* concept that postdates C by a *lot* of years ... so compilers couldn't take advantage of it if it didn't exist.

They had no need for that. Before the standard existed, compilers were already taking advantage of the idea that the developer would be a “good boy” (or girl) and wouldn't do “crazy things”.

They just had different lists of “crazy things” that developers shouldn't do.

The standard gave a name to that concept, but it didn't invent it.

That's why it had to add literally hundreds of UBs: compiler developers fought tooth and nail to get the right not to support certain “crazy things”, while C users were blissfully ignorant.

If you look at Rust you'll find that its list of UBs is much shorter, but that's because this time compiler developers and users were actually talking and discussing things.

Because undefined behaviors like “A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment” don't just invent themselves: most likely there was a compiler vendor which wanted to certify its compiler, but said compiler read the program line-by-line and ignored a last line without a new-line character.

> The standards postdate the compilers, and today's compiler writers have taken advantage of that to change the behaviour of language.

Nope. All these UBs (at least the majority of them) were already exploited by compilers when the standard was written (or, at least, compiler developers had plans to exploit them).

But there were dozens of compilers and they exploited different UBs, each in a very “local” fashion. Today the list of UBs is almost the same (a few were added over the years, but mostly this was about making “UBs by omission” explicit), but compilers have enough memory and computing power to do much more global optimizations.

That's the difference. If what you said (that it's a “new” concept, that “old” compilers were benign, and other such nonsense) were true, then it would be possible to separate compilers into two classes: “good” and “bad”. But that's very much impossible: Turbo C 1.0 was released in 1987, two years before the release of the standard, and it was already replacing return 2 + 2; with return 4; unconditionally.

Infinite loop language lawyering

Posted Jul 7, 2024 12:24 UTC (Sun) by Wol (subscriber, #4433) [Link] (3 responses)

> in 1987, two years before the release of the standard, and it was already replacing return 2 + 2; with return 4; unconditionally.

And if instead it was replacing "return 200 + 200" with "return 400", when the function's return type was a signed byte?

If I've got my maths right, the wrong thing to do is return -112. The right thing to do is a compiler error. The WORST POSSIBLE thing to do is silently delete the function, without warning. If I'm lucky, it's in a .o which then results in the linker crashing. If I'm unlucky, it's in the same .c as main(), and has awful knock-on effects there.

And this is why C is a rubbish language. It runs on real hardware, but refuses to accept reality, pretending to run on a perfect Turing Machine. And in the process it actively hinders the programmer's attempts to ensure that theory and reality coincide.

The first Pick machine I programmed had BCD microcode. I think it actually used that as default, with infinite precision being possible. Pick itself actively encourages fixed point maths, which makes the programmer handle the exponent themselves. Both these features are totally unnecessary in the normal course of events, and programmers mostly ignored them, but the fact they are front and centre in the language and implementation means that at every step the programmer is confronted with the question "do I need to worry?"

And this is the problem with C. It *hides* from the programmer that this is a serious issue. And when you're dealing with hostile adversaries and malicious input, the fact that you cannot trust the language to at least try and protect you, is very dangerous. Worse, the language design actively helps attackers.

Stuff - anything - should be designed to "make the simple things easy, and the hard things possible". If C makes even understanding the language hard, then *everything* will be difficult. SQL explicitly takes this to extremes! As I understand it, a design aim was to make all things equally easy. As soon as any bit of your problem is hard, that makes *everything* hard!

Cheers,
Wol

Infinite loop language lawyering

Posted Jul 7, 2024 13:34 UTC (Sun) by khim (subscriber, #9252) [Link]

You are barking up the wrong tree, as usual.

> It runs on real hardware, but refuses to accept reality, pretending to run on a perfect Turing Machine.

That's the only way high-level languages ever work. It's more-or-less the definition of a high-level language. Complaining about that is like complaining that water is wet or fire is hot.

> And this is the problem with C. It *hides* from the programmer that this is a serious issue.

All languages do that; that's not an issue at all. Or, better said: it's an issue, sure, but it's an inevitable one, thus, again, complaining about it is like complaining that water is wet or fire is hot.

> And when you're dealing with hostile adversaries and malicious input, the fact that you cannot trust the language to at least try and protect you, is very dangerous. Worse, the language design actively helps attackers.

Do you see me objecting? Of course C is a horrible language. But the fact is that any other language which fills the same niche would be horrible, by your definition, too!

That's something you completely ignore and just scream O_PONIES, O_PONIES, gimme O_PONIES.

> The WORST POSSIBLE thing to do is silently delete the function, without warning. If I'm lucky, it's in a .o which then results in the linker crashing. If I'm unlucky, it's in the same .c as main{}, and has awful knock-on effects there.

Yet that's a necessary property of a system-level language that “permits everything”. Optimizations transform a program for a Turing machine into a different program for a Turing machine (that's essentially what your computer is, whether you want to admit it or not). To perform them we need to answer the question: are these two programs equivalent or not? And this question, quite literally, cannot be answered in general. Not “it's hard to do”, not “we don't know how to do that yet”, but simply “it's mathematically impossible”.

Thus you only have two choices:

  1. Your language has a straitjacket of some sort (be it managed code or some kind of theorem prover or something), or
  2. Your language has certain constructs that the compiler accepts and then miscompiles.

These are the only two choices; the option of accepting only “good programs” and rejecting, at compile time, all “bad programs” is just not possible.

Most high-level languages pick the #1 choice, but that makes them unsuitable for low-level system programming work.

C makes choice #2, and any other low-level language that we have does the same. Even optimizing assemblers do that; that's why [it's impossible to compile old assembler sources with modern assemblers](https://www.os2museum.com/wp/gw-basic-source-notes/)!

Only very primitive assemblers are not affected, because there are no non-trivial transformations of the code: the input and output programs are the same, thus the problem doesn't exist.

Even Rust, for all the [pretty justified] hype around it, makes the same choice. It just reduces the scope of the part where “bad programs” are accepted and then miscompiled. But it doesn't eliminate that property. Because it couldn't.

C was written in simpler times, when people believed that you could just tell developers which “bad things” not to do and that would be enough. Even if said list of “bad things not to do” is, literally, hundreds of entries long.

It doesn't work, sure, but that's a property of the language, not a property of compilers for that language. Compilers are just trying to do the best job they can with the awful language they got.

It would have been possible to change the language and make it less awful (Ada did that, after all), if not for the firm, misguided belief of many C and some C++ developers that their adversary is not the intrinsic complexity of the task they tackle, but the developers of the compilers, who just refuse to correctly compile “perfectly correct” code with bazillions of UBs.

Infinite loop language lawyering

Posted Jul 7, 2024 13:47 UTC (Sun) by pizza (subscriber, #46) [Link] (1 responses)

> And if instead it was replacing "return 200 + 200" with "return 400", when the function type was signed byte?

The result would be the same either way, because unsuffixed numeric literals are treated as signed ints. If the compiler adds 200+200 and then assigns/returns the resulting 400, or performs compile-time constant evaluation and replaces 200+200 with 400 in the binary, it's still an integer that's too large to fit in its destination.

What happens next depends on how the compiler+hardware defines "int" and "signed byte" but assuming int is 32-bit twos complement, and 'signed byte' is the same but only 8 bits, 400 (0x190) gets truncated to -112 (ie 0x90).

Infinite loop language lawyering

Posted Jul 8, 2024 12:54 UTC (Mon) by khim (subscriber, #9252) [Link]

The result would be the same either way simply because such code doesn't contain any undefined behavior.

The addition happens on ints even if the source type is char, and 400 fits in an int on all known compilers.

Conversion from int into an 8-bit quantity overflows, sure, but that's not an issue; that particular overflow is very well defined: the result is the unique value of the destination type that is congruent to the source integer modulo 2ᴺ, where N is the width of the destination type.

Since there is no undefined behavior and the result is well-defined… compilers don't have a choice: the language says the result is -112; there is no ambiguity.
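
A minimal sketch of that case (the function name is illustrative):

    signed char f(void) {
        return 200 + 200;   // both literals are ints, so the addition yields the int 400
    }
    // On return, 400 (0x190) is converted to signed char. Per the rule quoted
    // above, the result is the value congruent to 400 modulo 256, i.e. -112 (0x90).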

Infinite loop language lawyering

Posted Jul 6, 2024 21:37 UTC (Sat) by mb (subscriber, #50428) [Link]

> If I asked them for
> "x = a + b;"
> that's what they gave me.

-O0

Problem solved.

Infinite loop language lawyering

Posted Jul 5, 2024 19:35 UTC (Fri) by atnot (subscriber, #124910) [Link] (2 responses)

> And here starts another thread where someone explains how something that compilers were always doing and still do is “unreasonable”

Yes, agreed, but can we at least not have it between the same two people for a week again this time (looking at you khim and wol)

Infinite loop language lawyering

Posted Jul 5, 2024 20:50 UTC (Fri) by Wol (subscriber, #4433) [Link]

:-)

I sometimes think khim is arguing for the sake of it, and I know I get dragged in too easily ...

Cheers,
Wol

Infinite loop language lawyering

Posted Jul 5, 2024 23:04 UTC (Fri) by willy (subscriber, #9762) [Link]

I prefer it when Wol gets involved as I was recently shown how to filter people out. It really improves the S/N in the comment section.

You're right, I should probably add khim too.

Infinite loop language lawyering

Posted Jul 6, 2024 5:25 UTC (Sat) by epa (subscriber, #39769) [Link] (13 responses)

But optimizing away part of a function body after an infinite loop does not need any assumptions about undefined behaviour. If the compiler can prove the loop does not terminate then clearly the later code is not reachable. That would be the case under any formalization of the language.

That’s why I questioned whether the standard really says “if a loop doesn’t terminate, this is just as undefined as dereferencing an invalid pointer”. Surely there are different ways the standard could permit optimizations.

Infinite loop language lawyering

Posted Jul 6, 2024 16:26 UTC (Sat) by khim (subscriber, #9252) [Link] (12 responses)

> But optimizing away part of a function body after an infinite loop does not need any assumptions about undefined behaviour.

Sure, but that's a different optimization entirely.

> Surely there are different ways the standard could permit optimizations.

Maybe, but that would imply some kind of entirely different approach to defining high-level languages and optimizations, too.

Currently the framework is based on the as-if rule: if two code sequences produce the same output for all programs that don't trigger a UB situation, then they are equivalent, and after a series of such transformations the final, optimized result is what's written to disk.

Consider the case of a loop that searches for a counterexample to Fermat's Last Theorem. The sequence of events:

  1. The compiler notices that the loop only changes the state of local variables and nothing else.
  2. The compiler notices that if the loop ever terminates, it does so via the return 1 statement.
  3. The compiler replaces the whole loop with the return 1 statement.
  4. Programmers start screaming bloody murder.

Notice how at no point does the compiler ever try to decide whether the loop terminates or not! It simply assumes that the program is valid, ergo it doesn't exhibit UB, ergo return 1 is executed, and all code between the start of the function and that return 1 statement is not really needed.
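
A sketch of that oft-cited example (bounds and names are illustrative): the loop searches for a counterexample to Fermat's Last Theorem for exponent 3, touches only local variables, and can only exit via return 1.

    int fermat(void) {
        const int MAX = 1000;
        int a = 1, b = 1, c = 1;
        while (1) {
            if (a*a*a + b*b*b == c*c*c)
                return 1;                  // the only way out of the loop
            a++;
            if (a > MAX) { a = 1; b++; }
            if (b > MAX) { b = 1; c++; }
            if (c > MAX) { c = 1; }        // wrap around and keep searching
        }
    }
    // Following the steps above, an optimizer that assumes forward progress may
    // compile the whole function down to "return 1" without ever deciding
    // whether the loop terminates.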

Infinite loop language lawyering

Posted Jul 7, 2024 8:27 UTC (Sun) by epa (subscriber, #39769) [Link] (11 responses)

Thanks for explaining. But what are the useful, sound optimizations enabled by this rule, which effectively says “the compiler may assume that all loops terminate”?

(I wonder whether rewriting the loop using goto, or tail recursion, would avoid this problem. I had thought a for-loop was syntactic sugar for if and goto, but that’s not the case if an infinite goto-loop is defined while infinite for-loop is not.)

Infinite loop language lawyering

Posted Jul 7, 2024 10:33 UTC (Sun) by khim (subscriber, #9252) [Link]

> I wonder whether rewriting the loop using goto, or tail recursion, would avoid this problem.

Of course not. Very much not.

> I had thought a for-loop was syntactic sugar for if and goto

And you were correct.

> but that’s not the case if an infinite goto-loop is defined while infinite for-loop is not.

Have you tried clicking the links that the article helpfully provides and looking at the actual rule that infinite loops violate? Then you would know why tail recursion doesn't help and why there is no difference between for and while.

Of course a goto loop works (or doesn't work, depending on who is asking) in the exact same way; there is absolutely zero difference.

Here is the actual quote: the implementation may assume that any thread will eventually do one of the following: <list of operations that have “side effects”> (terminate, call std::this_thread::yield, etc… here's the full list).

> But what are the useful, sound optimizations enabled by this rule, which effectively says “the compiler may assume that all loops terminate”?

If you would stop arguing about the taste of oysters with those who have eaten them and would look at the quote above… the answer is obvious, isn't it? The standard stops just half a step away from describing the desired optimization in detail! Dead code elimination: start with “side effects”, look at what code produces things that lead to “side effects”, remove everything else.

That's how clang removes useless code in the simple example from excors. What ICC does is a bit more advanced, but the idea is the same: if code doesn't produce “side effects”, then we can eliminate it. And if said code would have halted the program at that point? Oh well, that program “was never valid” anyway.

And if you add “side effects” to the loop, even a simple asm volatile(""), which is literally zero bytes of generated code, just a demand not to remove these zero bytes, then ICC complies, of course.
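
For instance, a minimal sketch (assuming GCC/Clang/ICC-style inline assembly; the function name is illustrative):

    void spin_forever(void) {
        while (1)
            __asm__ __volatile__("");   // emits no instructions, but counts as a
                                        // side effect the optimizer must preserve
    }
    // With the empty asm statement in the body, the loop can no longer be
    // removed as side-effect-free.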

The Rust compiler even does something like that automatically when it detects loops which don't have “side effects” but may or may not terminate. Such loops are usually “halt here” loops that developers add deliberately, so most optimizations are preserved… but this requires extra work, and nobody likes extra work, compiler developers included.

The original version of C89 described the same thing in a very roundabout way: it postulated that a program should produce “side effects” in a certain way while omitting the answer to the question of what happens to code without any. That was still undefined behavior, even back then; it was just not explicit. The current version is much more straightforward and understandable, don't you think?

As you can see there is no magic and no malicious intent either, but to understand what is happening and why, you have to think; screaming bloody murder and demanding O_PONIES is so much more satisfying, isn't it?

Infinite loop language lawyering

Posted Jul 8, 2024 10:39 UTC (Mon) by matthias (subscriber, #94967) [Link] (9 responses)

> But what are the useful, sound optimizations enabled by this rule, which effectively says “the compiler may assume that all loops terminate”?

As khim already explained, this is dead code elimination. If there is a loop without any side-effects, the loop can be eliminated. Without the assumption, the compiler would need to solve the halting problem in order to decide whether the loop can be removed.

The question that was not answered is why you would want to have loops without side-effects in the first place. And the answer is that you normally would not want this. But the behavior of a loop can depend on many things like template parameters, #ifdefs, etc. Assume a loop that is producing debug outputs for every element in a linked list. If you disable debug outputs, the debug function called inside the loop is basically a no-op and the loop has no side-effects. But the loop is still there. If the compiler did not remove it, it would still walk through the entire linked list, just doing nothing. Without the assumption, the compiler needs to prove that the loop terminates, so essentially it has to prove that it is impossible for the linked list to be circular. An impossible task. With the assumption, the compiler can simply remove the loop without caring whether it terminates or not.
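
A minimal sketch of that situation (names and the NDEBUG convention are illustrative):

    #include <cstdio>

    struct Node { int value; Node *next; };

    inline void debug_print(int v) {
    #ifndef NDEBUG
        std::printf("%d\n", v);        // real output in debug builds
    #endif                             // nothing at all in release builds
    }

    void dump(const Node *list) {
        for (const Node *n = list; n != nullptr; n = n->next)
            debug_print(n->value);     // a no-op once debug output is disabled
    }
    // In a release build the loop has no side effects, so the compiler may drop
    // it entirely instead of proving that the list can never be circular.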

Infinite loop language lawyering

Posted Jul 8, 2024 12:08 UTC (Mon) by daroc (editor, #160859) [Link]

Also, optimizations such as loop-invariant code motion can often lift some part of a calculation out of a loop, so even loops that look as though they should always contain something can sometimes become empty during optimization.
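
A minimal sketch of that effect (names are illustrative):

    int scale(int a, int b, long n) {
        int x = 0;
        for (long i = 0; i < n; ++i)
            x = a * b;     // loop-invariant: can be hoisted out (guarded by n > 0),
        return x;          // leaving an empty loop body behind
    }
    // After loop-invariant code motion the body is empty; emptied loops like
    // this are exactly the ones the forward-progress rule then allows the
    // compiler to drop.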

Infinite loop language lawyering

Posted Jul 8, 2024 19:15 UTC (Mon) by epa (subscriber, #39769) [Link] (7 responses)

> If there is a loop without any side-effects, the loop can be eliminated.

Or in other words, if the only possible side effect is non-termination, that can be ignored, and the loop treated as having truly zero side effects—which means you may as well not run it.

And your example of a compile-time debug flag explains why there could be practical benefits from eliminating this dead code.

Infinite loop language lawyering

Posted Jul 8, 2024 19:53 UTC (Mon) by khim (subscriber, #9252) [Link] (6 responses)

Dead code elimination is a pretty basic optimization, without which not much else can be done.

And yes, the “guaranteed forward progress” rule makes it possible to do it more often than you could without it.

But the big questions are: “Is the complexity that it adds to the language worth it? Are the savings from it, in cases where the compiler couldn't prove that the loop actually stops, worth it? Are the complications needed to detect and remove only loops that can be proven to stop worth it?”

But for these questions to be asked and answered there needs to be a dialogue. Which just never happens in the C/C++ world. Even when O_PONIES lovers concede the fact that a low-level system programming language needs some undefined behaviors that are treated the way compilers treat all undefined behaviors, they frame their proposals in the form of an ultimatum: instead of saying that we should define and permit some of these behaviors (as was stipulated explicitly in the Rationale for the International Standard: Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose; it also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior), all their proposals include demands to stop treating undefined behaviors like undefined behaviors and to [try to] support programs that do some absolutely crazy things (e.g. programs with code that accesses neighbouring parameters of a function using address arithmetic).

Of course none of them ever try to implement their ideas, and they also never use compilers like CompCert C that do have many of the undefined behaviors defined (but not all, of course; as even sane proponents of O_PONIES admit, we can use out-of-bounds memory accesses to read the executable machine code, and any change to the executable code can, in theory, change the behaviour).

All these talks about how we need O_PONIES, that only O_PONIES can be a solution and nothing but O_PONIES may ever work, are amusing to participate in on forums, but they, of course, don't move the needle one jot: neither the standards committee nor compiler writers feel inclined to discuss radical “solutions” which would require them to rewrite everything with zero guarantee of acceptance.

Infinite loop language lawyering

Posted Jul 10, 2024 3:20 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (5 responses)

What I find baffling is that Rust has a very similar (but narrower and not quite as formal) set of UB rules, and there is never any discussion of whether Rust might've gotten a little closer to developer expectations than C and C++ did.

Salient features of the Rust rules[1]:

* Signed overflow is defined as wrapping, but panics on debug builds (and implementations may panic on release builds, if they so choose). C++26 has already introduced the concept of "erroneous behavior" with similar semantics (i.e. it can crash if the implementation feels like it), but has not applied it to signed integer overflow.
* There is no forward progress rule. It is also considered safe to deadlock, leak memory, exit without running destructors, or pass an inconsistent Hash implementation to e.g. HashMap (the affected HashMap instance may misbehave in a variety of ways, but it may not corrupt the heap or otherwise cause broad whole-program UB).
* Several things are UB in Rust that are well defined in C++, mostly under the "producing an invalid value" rule (the equivalent C/C++ rule is called "trap representations," but it's far narrower). Broadly speaking, C wants to say that every byte sequence of the appropriate size is a valid instance of a given type, and C++ would like to say this but only about trivially copyable types - but some hardware doesn't actually work that way, so they reluctantly acknowledge that, for numeric types other than char and (un)signed char, there might be byte sequences that produce UB when you try to evaluate them as an instance of that type. So, for example, if you have an enum with 3 values in C++, numbered from 0 to 2, and then try to cast 37 into that enum, C++ considers this an entirely legitimate and valid thing to do, because 37 is a valid integer and the enum's backing type must be large enough to store it (it is positive and less than SCHAR_MAX). Rust does not allow the equivalent maneuver, because you have three options and 37 is not bitwise equivalent to any of them (see the sketch after this list).
* Rust's aliasing rules are different to those of C and C++. Rust's shared references may alias each other, but the object may not be mutated while a shared reference exists. Rust's mutable references may not alias anything (they're the equivalent of restrict). Rust's raw pointers have no aliasing restrictions (but they must not violate the restrictions applied to references). Type-based alias restrictions do not exist, and type-punning may be used freely as long as the object representation is a valid instance of the type in question. However, because validity cannot be statically checked in the general case, mem::transmute() (equivalent to std::bit_cast()) is protected by the unsafe keyword.

[1]: https://doc.rust-lang.org/reference/behavior-considered-u...
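
A sketch of the enum point from the list above, assuming an enum with a fixed underlying type (names are illustrative):

    enum class Tri : signed char { A = 0, B = 1, C = 2 };

    Tri t = static_cast<Tri>(37);   // well-formed C++: 37 is representable in the
                                    // underlying type, even though it names no enumerator
    // The equivalent move in Rust (e.g. transmuting the value 37 into a
    // three-variant enum) is immediate undefined behavior under the
    // "producing an invalid value" rule.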

Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 10:34 UTC (Wed) by farnz (subscriber, #17727) [Link] (4 responses)

Rust has one big advantage over C and C++ when it comes to UB rules: outside of unsafe scopes (which need the literal token "unsafe" to mark them out), you cannot execute UB. That combines with a culture which says that users of unsafe ought to make use of the visibility and privacy rules to ensure that code which does not itself use unsafe cannot cause your code to execute UB.

As a result, Rust's UB rules are not relevant to a majority of Rust programmers; only programmers who write unsafe code need to care about them. And because a majority of Rust programmers avoid unsafe because it's tricky to get right, you don't have the same issues around "accidental UB".

After all, if you look at the cases where people actively complain about C++ or C UB, they're cases where the intent behind the construct is fairly obvious to a human reader, and where many compilers do interpret it the way the original author intended. The complaint tends to be not so much "UB rules are horrible", but "this is reasonable code and should not be UB", and in Rust, you'd not have written unsafe, and thus wouldn't have ever been able to compile that code.

And, as an aside, I'd like to be picky about signed overflow; C++ could get the same general effect with unspecified behaviour, where the only permissible behaviours are wraparound or an exception being thrown. It didn't need the new concept of "erroneous behaviour" to get to the same place as Rust, and it would be possible to define acceptable wraparound behaviours such that most machines would do wraparound correctly with unsigned arithmetic operations (there's excess/biased, two forms of sign-magnitude depending on where the sign bit lives, base -2, one's complement and two's complement to consider, for at most 6 different definitions).

Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 11:26 UTC (Wed) by Wol (subscriber, #4433) [Link] (3 responses)

> And, as an aside, I'd like to be picky about signed overflow; C++ could get the same general effect with unspecified behaviour, where the only permissible behaviours are wraparound or an exception being thrown.

I think this is the point I'm getting at, which is why I get so frustrated at the O_PONIES argument. You are executing code which assumes infinite precision, on hardware which can only do fixed precision. And then you place the onus on the PROGRAMMER to ensure that there are no bugs!!!

Rust may make the same "you want infinite precision" assumptions, but it also says "if you get it wrong, either the program will fail to compile, or it will crash at runtime". C/C++ just says "all bets are off" including deleting loads of code that will only ever execute if the programmer screws up - and was probably put there for the explicit purpose of detecting said screw up!!!

At the end of the day, all I want (thinking as an engineer) is that when hardware reality and programming mathematical purity collide, I get alarms going off. And if the language purists say I'm demanding O_PONIES, then that is a massive alarm going off telling anyone writing real-world systems they need to get the hell out of there.

Which is why, imnsho, I think C/C++ is rapidly heading into the Pascal world - a great language for teaching, but totally useless in real life without loads of hacks that sully its mathematical purity. The tragedy is that it didn't start out that way - as originally designed it matched reality pretty exactly - "x = a + b" just fed a and b into the cpu ADD instruction and gave you the result. If it wasn't what you expected because it had overflowed, so be it!

Rust is a real world language. Safe code is basically defined as "it does what the man on the Clapham Omnibus would expect". Code that can misbehave MUST be tucked inside the "unsafe" keyword, and then the programmer is placed on notice that the compiler will be unable to help them if they screw up. And equally importantly, the programmer is placed on notice that if code outside of "unsafe" screws up, that is a serious language bug that will get fixed, even if it breaks other code.

Cheers,
Wol

Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 13:13 UTC (Wed) by farnz (subscriber, #17727) [Link] (1 responses)

The reason for "O_PONIES" is that what people ask for is not a well-defined semantic, but "do what the hardware does". When the language is meant to be portable to arbitrary hardware, "do what the hardware" does is meaningless, since it depends on what the hardware is.

And Rust is not defined as "it does what the man on the Clapham Omnibus would expect" - there are explicit semantics defined for all of the operations, and it's expected that if the Rust operation doesn't quite match the hardware operation, the compiler will fix up the difference.

C has never worked the way you describe it working - even early C compilers could do things like strength reduction, converting "x = a * 5" to "x = (a << 2) + a". The problem is that some of these changes are liked (such as reducing multiply to bitshift and add), and nobody complains that the compiler chose to output a bitshift and add instead of a multiply, and some are disliked - but nobody can get even POSIX to define the meaning of UBs that people hate, like signed overflow.

Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 14:23 UTC (Wed) by khim (subscriber, #9252) [Link]

> When the language is meant to be portable to arbitrary hardware, "do what the hardware" does is meaningless, since it depends on what the hardware is.

It doesn't even work for a single architecture. Because to even say “what the hardware is doing” you need a precise and unambiguous mapping from the source code to the machine code.

This essentially turns your language into an assembler and, worse, into a non-optimizing assembler, since even minor changes in the generated code may blow your code to pieces. I have even written such code myself, when working with a microcontroller small enough to have 256-byte pages, trying to squeeze all my code into one such page.

> And Rust is not defined as "it does what the man on the Clapham Omnibus would expect" - there are explicit semantics defined for all of the operations, and it's expected that if the Rust operation doesn't quite match the hardware operation, the compiler will fix up the difference.

Precisely. Rust program behavior is still described in terms of an abstract Rust machine, and Rust fully embraces the fact that the answer to the question of whether two such programs are equivalent or not cannot be given in 100% of cases.

Rust works by embracing the fact that undecidable problems exist, while “we code for the hardware” guys just ignore that fact entirely.

> but nobody can get even POSIX to define the meaning of UBs that people hate, like signed overflow

Why do you say that? Both GCC and clang have supported the -fwrapv option for years. Signed overflow can be defined if you want it to be defined — that's not what the “we code for the hardware” guys complain about. They complain about the need to know about these things in addition to knowing what the hardware is doing — and that part can't be solved by tweaks to the language definition: of course if you want to use the language you need to know how it works! What other alternatives are there?

Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 14:10 UTC (Wed) by khim (subscriber, #9252) [Link]

> Rust may make the same "you want infinite precision" assumptions, but it also says "if you get it wrong, either the program will fail to compile, or it will crash at runtime". C/C++ just says "all bets are off" including deleting loads of code that will only ever execute if the programmer screws up - and was probably put there for the explicit purpose of detecting said screw up!!!

You are misinterpreting things. Again. If you want to get Rust to behave like C/C++… just write x.unchecked_add(y) and you are back in the world of undefined overflow.

And if you want you may write __builtin_add_overflow(x, y, &res); in C/C++ and voila: no UB when something overflows! It's doable, ma!
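
A minimal sketch of the latter (a GCC/Clang builtin; the wrapper name is illustrative):

    // Returns true and stores x + y in *out if the sum fits in an int;
    // returns false (with *out holding the wrapped result) on overflow.
    bool checked_add(int x, int y, int *out) {
        return !__builtin_add_overflow(x, y, out);
    }
    // Either way, the overflow never goes through the undefined
    // signed-overflow path.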

> At the end of the day, all I want (thinking as an engineer) is that when hardware reality and programming mathematical purity collide, I get alarms going off.

Yup. You are asking for the only thing that compilers never delivered and could, in fact, never deliver.

> And if the language purists say I'm demanding O_PONIES, then that is a massive alarm going off telling anyone writing real-world systems they need to get the hell out of there.

Why? You are, quite literally, asking for the impossible. At some point this becomes a source of amusement, but it's obvious who would leave, in the end: the guys who use what exists, without crying about something they could never have, will learn how to deal with the fact that the compiler is always an adversary (but sometimes a friend, too) and that dreams of “benign” compilers will never materialize, while people who want to “program for the hardware” will just eventually die off.

When Rust developers realized that providing what you are asking for is fundamentally impossible, they divided the language in two: in the normal, safe realm all “bad” programs are detected and rejected, and in the unsafe realm the programmer is left one-on-one with the formidable adversary that is a modern optimizing compiler.

They haven't solved the impossible problem, but they have sidestepped it. And they are doing the same thing in other places, too. They accept that the existing state of affairs is not ideal, but they effectively compromise to be able to write real-world programs using real-world compilers anyway.

Meanwhile, in the C/C++ realm, enough developers (both C/C++ developers who use the compilers and C/C++ compiler developers who write them) don't even think about compromise.

In particular, your rants never admit that you are asking for the impossible and never stop to consider whether some way forward other than O_PONIES may exist.

> The tragedy is that it didn't start out that way

Seriously? Because from my POV it's the exact opposite.

> "x = a + b" just fed a and b into the cpu ADD instruction and gave you the result.

Sure. And that's where the problem lies. If you define your language in terms of the instructions executed, then you have to guarantee the precise sequence of instructions generated! And that's only feasible when your compiler is primitive enough that a human can predict exactly what output a given sequence of input characters will produce.

Not even assemblers always work that way; self-modifying code often needs a very specific version of the assembler to be compiled correctly.

Thinking that you may create a high-level language on that basis is sheer lunacy.

> Rust is a real world language.

Sure. But it's important to understand how the decisions that went into Rust came to be. Rust is very much the result of compromise: if a problem couldn't be solved perfectly, then solve it to the best of our abilities and, maybe, revisit the decision later. There's an interesting list of things that Graydon Hoare had to give up on to ensure that Rust would succeed.

That's more-or-less the polar opposite of what the C/C++ world usually does: compromises there very rarely happen; instead, things where people couldn't agree are just thrown away (take the drama discussed there with a grain of salt, because JeanHeyd is a bit of a drama queen, but facts are facts).

> Safe code is basically defined as "it does what the man on the Clapham Omnibus would expect". Code that can misbehave MUST be tucked inside the "unsafe" keyword, and then the programmer is placed on notice that the compiler will be unable to help them if they screw up. And equally importantly, the programmer is placed on notice if the code outside of "unsafe" screws up, that is a serious language bug that will get fixed, even if it breaks other code.

Sure. And it works pretty well. But this very structure is an explicit admission that O_PONIES are impossible, that a perfect separation of the wheat from the chaff will never happen, and that the demand “when hardware reality and programming mathematical purity collide, I get alarms going off” could never be fulfilled.

Rust's design is built on top of that acceptance: if we can't precisely separate “good” programs from “bad” programs, then let's separate them imprecisely and give you two answers instead of one, with false positives in one part of your program and false negatives in the other part.

And if you are willing to compromise, then the line where the compromise is drawn can be debated; but if you are not willing to compromise and just scream “gimme O_PONIES, only O_PONIES are acceptable”, then obviously a compromise will never be reached, for you are not seeking one.

Infinite loop language lawyering

Posted Jul 5, 2024 19:23 UTC (Fri) by Wol (subscriber, #4433) [Link] (6 responses)

> Infinite loops had originally been made undefined behavior in order to allow optimizing compilers to assume that programs made forward progress;

Doesn't that now mean you can't write a GUI? How much code contains the construct

Do
....
Loop

WITH NO EXIT CONDITION?

Read literally, that now seems undefined behaviour, which allows the compiler to delete your event handler, which basically means whole classes of programs are now impossible to write in C++ !!!

Cheers,
Wol

Infinite loop language lawyering

Posted Jul 5, 2024 19:30 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (3 responses)

Any kind of I/O (including reading a `volatile` variable) counts as "something can happen" and is therefore not considered a UB-infinite-loop.

Infinite loop language lawyering

Posted Jul 5, 2024 21:14 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (2 responses)

The problem with using volatile for this purpose is that it accomplishes too much. The compiler is required to emit all memory accesses to volatile variables exactly as written. Even if you write something like this:

    volatile int v;
    while (true) {
        v;
        // more code here.
    }

The compiler is required to emit an explicit read* for each iteration, and that read must not be optimized out. This despite the fact that, in any mildly reasonable hosted environment (i.e. not some weird bare metal environment like an Arduino), it is fantastically unlikely that v actually has any sort of I/O-like semantics where those reads would make a difference. Heck, even in a bare metal environment, it's still pretty unlikely that v would have I/O semantics, because you would normally access such memory through a pointer to a magic address and not through a stack variable.

But the standard (and thus the C compiler) doesn't care. All it sees is that you asked for explicit reads and no optimization, so that's what you get.

* The standard does not specify exactly what counts as a "read." That is up to the implementation. In principle, an implementation could probably put v in a register, and then copy it to a second register on each iteration, and call that a "read." But I don't think the average implementation will actually do that, because many optimizations are disabled for volatile variables, and moving a variable to a register is not a trivial optimization (it requires proving that the variable's address is never taken, as well as figuring out which registers are available and so on). Regardless, it's pretty clear that the implementation is required to emit *something* that it considers equivalent to a "read."

Infinite loop language lawyering

Posted Jul 6, 2024 3:59 UTC (Sat) by mathstuf (subscriber, #69389) [Link] (1 responses)

Well, the abstract machine really only has this one tool for out-of-band communication. If someone wants to put in things like `_Open_read`, `_Cpu_magic`, or `_Mmio` modifiers to help the compiler do…whatever it would do differently for these cases, that's probably a (long and arduous) path available. For something like C that aims to run on things from 1¢ micro controllers to bog standard desktops to whackadoodle specialized offload cards in HPC, trying to figure out what the different modifiers mean in each case sounds a lot harder than `volatile`'s "something outside of C's model" valve.

And I think the standard seems to do well with that problem exactly by not defining what a "read" is here (though yes, using a `volatile` stack variable is…an interesting case). If you are using sdcc and it knows about pin layout and whatnot, maybe it can know that the pin only updates at 20 kHz and can "optimize" and only actually read it at twice that rate. Or your mpicc can know that there's an MPI pipe attached somewhere that twiddles the variable and make it more event-driven. Of course, at that point you're not writing portable C anymore either and can talk with your compiler vendor about what you're actually getting.

Infinite loop language lawyering

Posted Jul 8, 2024 13:12 UTC (Mon) by khim (subscriber, #9252) [Link]

> For something like C that aims to run on things from 1¢ micro controllers to bog standard desktops to whackadoodle specialized offload cards in HPC, trying to figure out what the different modifiers mean in each case sounds a lot harder than `volatile`'s "something outside of C's model" valve.

As usual, the tool that is both convenient and guaranteed is a slightly different kind of volatile, specifically asm volatile. A simple __asm__ __volatile__(""); statement works on all architectures (because all of them accept an empty string as assembler input), and since it's volatile it has “side effects” and thus can't be removed.

Given that most developers who need that tool already have it… there is little desire for standardization, although adding something like std::this_thread::no_yield alongside the already existing std::this_thread::yield would be logical.

Infinite loop language lawyering

Posted Jul 5, 2024 19:36 UTC (Fri) by mb (subscriber, #50428) [Link]

No. A GUI has externally visible side effects. It has to write to volatile memory at some point. Otherwise it would not be usable.
It makes forward progress.

If you programmed a "GUI" that would not write to volatile memory, the compiler could optimize it completely away without a change in behavior.

Infinite loop language lawyering

Posted Jul 6, 2024 11:40 UTC (Sat) by rschroev (subscriber, #4164) [Link]

Any well-written GUI program will have something like

    if (user_requested_quit())
        break;
in its event loop (or message loop or game loop or whatever), and probably a whole lot of other exit conditions. Otherwise you just created a program that can only be stopped by forcefully terminating it, and your users will not be happy.

Infinite loop language lawyering

Posted Jul 6, 2024 9:41 UTC (Sat) by excors (subscriber, #95769) [Link]

> a strong interpretation would be “if the compiler can prove the loop is infinite, it can give any semantics it likes”. Which sounds far-fetched, but we have seen many cases where programmers were surprised by a more legalistic interpretation than they had expected, where the code theoretically has undefined behaviour.

This is indeed one of those cases: see https://godbolt.org/z/xhnMnsEG6 (where the function gets compiled into effectively "return n;"). clang -O1 will assume that a branch containing just "while (1) { }" cannot be executed, so it doesn't bother even checking the condition and will execute the other branch instead. That's dangerous if you're using it as an assert, expecting it to prevent execution of subsequent code when a precondition is violated.
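
A sketch of the kind of function involved (the exact code behind the godbolt link may differ):

    int lookup(int n) {
        if (n < 0 || n >= 100)
            while (1) { }    // intended as "hang here if the precondition is violated"
        return n;
    }
    // Because the empty loop has no side effects, clang -O1 may treat that
    // branch as unreachable and compile the function to just "return n;",
    // so the guard no longer stops out-of-range values.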

Limits of static analysis

Posted Aug 16, 2025 5:27 UTC (Sat) by DemiMarie (subscriber, #164188) [Link] (2 responses)

The fundamental problem is that C and C++ are Turing-complete, so Rice’s theorem says that any nontrivial semantic property of their programs is undecidable. This includes whether or not undefined behavior is possible. This means that any static analysis tool will either have false positives, false negatives, or both.

There are static analysis tools that can prove that undefined behavior never occurs. Such a tool is called sound. Both Frama-C + EVA and Astrée are sound static analyzers, and at least the latter is used in avionics where a software crash could cause the airplane to crash. The reason such tools are not used more widely is that they will also reject an enormous amount of correct code. For instance, I believe both tools do not allow dynamic memory allocation. That’s not a problem for safety-critical avionics software, but it is a dealbreaker for pretty much anything outside of embedded systems.

There are ways to eliminate undefined behavior while imposing fewer restrictions, but my understanding is that they require significant manual work. That’s why Rust requires lifetime annotations, and my understanding is that it’s why all attempts to create a memory safe subset of C++ have failed. There’s no way to do it that is both flexible enough for real programs and compatible with existing APIs. Of course, one could add enough annotations that one is essentially writing Rust in C++ syntax, but by that point one has done the hardest part of actually porting the code to Rust, and in fact a transpiler might actually be able to generate safe Rust from such code.

Limits of static analysis

Posted Aug 20, 2025 2:34 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (1 responses)

Even if C++ gets lifetimes, Rust also has the `Send` and `Sync` traits, which help programs that use threads preserve lifetime soundness. These only really make sense with auto traits and typed synchronization containers (e.g., `std::mutex` is of little use here as it doesn't indicate what data it guards; something like a `std::mutex_data<T>` would be necessary instead). I have hope for the former. The latter is a lot further away (but maybe I'm just being pessimistic).
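
A minimal sketch of what such a typed container might look like (the name mutex_data and its interface are purely illustrative, not an existing or proposed std API):

    #include <mutex>
    #include <utility>

    template <typename T>
    class mutex_data {
    public:
        explicit mutex_data(T value) : value_(std::move(value)) {}

        // The guarded data is only reachable while the lock is held.
        template <typename F>
        auto with_lock(F &&f) {
            std::scoped_lock lock(m_);
            return f(value_);
        }

    private:
        std::mutex m_;
        T value_;
    };

    // Usage sketch:
    //   mutex_data<int> counter(0);
    //   counter.with_lock([](int &c) { ++c; });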

Pervasive changes needed to make static analysis tractable for useful programs

Posted Aug 20, 2025 11:07 UTC (Wed) by farnz (subscriber, #17727) [Link]

Note, too, that lifetimes, marker traits and similar have a hidden problem: they have to be pervasive before you get the benefits of them. This makes retrofitting them to C++ problematic - either they're worthless for most users, and only the subset of C++ users who don't have legacy code or use legacy C++ libraries (at least, not without a carefully considered wrapper that adds a new API with sound lifetimes, markers, or whatever) benefit, or they require a wholesale rework of all used C++ code.

In turn, if you have to rework all used C++ code to have a sound API with lifetimes, marker traits or whatever you've taken from Rust, you've got to have a good answer for "why rewrite in C++, when I can do the same work in Rust with cxx.rs?".

