Infinite loop language lawyering

Posted Jul 7, 2024 8:27 UTC (Sun) by epa (subscriber, #39769)
In reply to: Infinite loop language lawyering by khim
Parent article: New features in C++26

Thanks for explaining. But what are the useful, sound optimizations enabled by this rule, which effectively says “the compiler may assume that all loops terminate”?

(I wonder whether rewriting the loop using goto, or tail recursion, would avoid this problem. I had thought a for-loop was syntactic sugar for if and goto, but that’s not the case if an infinite goto-loop is defined while infinite for-loop is not.)

Infinite loop language lawyering

Posted Jul 7, 2024 10:33 UTC (Sun) by khim (subscriber, #9252) [Link]

> I wonder whether rewriting the loop using goto, or tail recursion, would avoid this problem.

Of course not. Very much not.

> I had thought a for-loop was syntactic sugar for if and goto

And you were correct.

> but that’s not the case if an infinite goto-loop is defined while infinite for-loop is not.

Have you tried to click on links that article helpfully provides and look on the actual rule that infinite loops are violating? Then you would know why tail recursion doesn't work and why there are not difference between for and while.

Of course goto loop works or (doesn't work, depending on who is asking) in the exact same way, there are absolutely zero difference.

Here is the actual quote: the implementation may assume that any thread will eventually do one of the following: <list of operations that have “side effects”> (terminate, call std::this_thread::yield, etc… here's the full list).

> But what are the useful, sound optimizations enabled by this rule, which effectively says “the compiler may assume that all loops terminate”?

If you would stop arguing about the taste of oysters with those who ate them and would look on the quote above… the answer is obvious, isn't it? Standard stop just half-step away from description of the optimization desired in details! Dead code elimination: start with “side effects”, look what code produces things that lead to “side effects”, remove everything else.

That's how clang removes useless code in simple example from excors. What ICC does is a bit more advanced, but idea is the same: if code doesn't produce “side effects”, then we can eliminate it. And if said code would have halted the program in that point? Oh, well, that program “was never valid”, anyway.

And if you add “side effects” to the loop, even a simple asm volatile(""), which is literally zero bytes of generated code, just a demand not to remove these zero bytes, then ICC complies, of course.

Rust compiler even does something like that automatically when it detects loops which don't have “side effects” but may or may not terminate. Such loops are usually “halting loops” that developers add, thus most optimizations are preserved… but this requires extra work and nobody likes extra work, compiler developers including.

Original version of C89 described the same thing in a very roundabout way: it postulated that program should produce “side effects” in a certain way while omitting the answer to the question about what happens to the code without any. That was still undefined behavior, even back then, it was just not explicit. Current version is much more straightforward and understandable, don't you think?

As you can see there are no magic and there are no malicious intent, either, but to understand what is happening and why you have to think, but ~~screaming bloody murder~~ demanding O_PONIES is so much more satisfying, isn't it?

Infinite loop language lawyering

Posted Jul 8, 2024 10:39 UTC (Mon) by matthias (subscriber, #94967) [Link] (9 responses)

> But what are the useful, sound optimizations enabled by this rule, which effectively says “the compiler may assume that all loops terminate”?

As khim already explained, this is dead code elimination. If there is a loop without any side-effects, the loop can be eliminated. Without the assumption, the compiler would need to solve the halting problem in order to decide whether the loop can be removed.

The question that was not answered is why would you want to have loops without side-effects in the first place. And the answer is you normally would not want this. But the behavior of a loop can depend on many things like template parameters, #ifdefs, etc. Assume a loop that is producing debug outputs for every element in a linked list. If you disable debug outputs, the debug-function called inside the loop is basically a no-op and the loop has no side-effects. But the loop is still there. If the compiler would not remove it, it would still walk through the entire linked list, just doing nothing. Without the assumption, the compiler needs to prove that the loop terminates, so essentially it has to prove that it is impossible that the linked list is circular. An impossible task. With the assumption, the compiler can simply remove the loop without caring whether it terminates or not.

Infinite loop language lawyering

Posted Jul 8, 2024 12:08 UTC (Mon) by daroc (editor, #160859) [Link]

Also, optimizations such as loop-invariant code motion can often lift some part of a calculation out of a loop, so even loops that look as though they should always contain something can sometimes become empty during optimization.

Infinite loop language lawyering

Posted Jul 8, 2024 19:15 UTC (Mon) by epa (subscriber, #39769) [Link] (7 responses)

If there is a loop without any side-effects, the loop can be eliminated.

Or in other words if the only possible side effect is non-termination, that can be ignored, and the loop treated as having truly zero side effects—which means you may as well not run it.

And your example of a compile-time debug flag explains why there could be practical benefits from eliminating this dead code.

Infinite loop language lawyering

Posted Jul 8, 2024 19:53 UTC (Mon) by khim (subscriber, #9252) [Link] (6 responses)

Dead code elimination is pretty basic optimization, without which nothing much else can be done.

And yes, the “guaranteed forward progresss rule” makes it possible to do it more often then you may do without it.

But the big questions are: “Is the complexity that it adds to the language worth it? Are savings from it in cases where compiler couldn't prove that loop actually stops worth it? Are compications needed to detetct and allow removal of only loops that can be proven to stop are worth it?”.

But for these questions to be asked and answered there needs to be a dialogue. Which just never happens in C/C++ word. Even when O_PONIES lovers concede the fact that low-level system programming language needs some undefined behaviors to be treated like compilers treat all undefined behaviors then frame their proposals in a form of ultimatum: instead of saying that we should define and permit some of these behaviors (like was stipulated explicitly in the Rationale for International Standard: Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose; it also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior) all their proposals include demands to stop treating undefined behaviors like an undefined behaviors and [try to] support programs that are doing some absolutely crazy things (e.g. programs that have code that accesses neighbour parameters of a function using address arithmetic).

Of course none of them ever try to implement their ideas and they also never use compilers like CompCert C that do have many of undefined behaviors defined (but not all, of course, as even sane proponents of O_PONIES admit we can use out-of-bounds memory accesses to read the executable machine code, and any change to the executable code can, in theory, change the behaviour).

All these talks about how we need O_PONIES, that only O_PONIES can be a solution and nothing but O_PONIES may ever work are amusing to participate on forums, but they, of course, don't move the needle one jot: neither standards committee nor compiler writers feel inclined to discuss radical “solutions” which would require them to rewrite everything with zero guarantess of acceptance.

Infinite loop language lawyering

Posted Jul 10, 2024 3:20 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (5 responses)

What I find baffling is that Rust has a very similar (but narrower and not quite as formal) set of UB rules, and there is never any discussion of whether Rust might've gotten a little closer to developer expectations than C and C++ did.

Salient features of the Rust rules[1]:

* Signed overflow is defined as wrapping, but panics on debug builds (and implementations may panic on release builds, if they so choose). C++26 has already introduced the concept of "erroneous behavior" with similar semantics (i.e. it can crash if the implementation feels like it), but has not applied it to signed integer overflow.
* There is no forward progress rule. It is also considered safe to deadlock, leak memory, exit without running destructors, or pass an inconsistent Hash implementation to e.g. HashMap (the affected HashMap instance may misbehave in a variety of ways, but it may not corrupt the heap or otherwise cause broad whole-program UB).
* Several things are UB in Rust that are well defined in C++, mostly under the "producing an invalid value" rule (the equivalent C/C++ rule is called "trap representations," but it's far narrower). Broadly speaking, C wants to say that every byte sequence of the appropriate size is a valid instance of a given type, and C++ would like to say this but only about trivially copyable types - but some hardware doesn't actually work that way, so they reluctantly acknowledge that, for numeric types other than char and (un)signed char, there might be byte sequences that produce UB when you try to evaluate them as an instance of that type. So, for example, if you have an enum with 3 values in C++, numbered from 0 to 2, and then try to cast 37 into that enum, C++ considers this an entirely legitimate and valid thing to do, because 37 is a valid integer and the enum's backing type must be large enough to store it (it is positive and less than SCHAR_MAX). Rust does not allow the equivalent maneuver, because you have three options and 37 is not bitwise equivalent to any of them.
* Rust's aliasing rules are different to those of C and C++. Rust's shared references may alias each other, but the object may not be mutated while a shared reference exists. Rust's mutable references may not alias anything (they're the equivalent of restrict). Rust's raw pointers have no aliasing restrictions (but they must not violate the restrictions applied to references). Type-based alias restrictions do not exist, and type-punning may be used freely as long as the object representation is a valid instance of the type in question. However, because validity cannot be statically checked in the general case, mem::transmute() (equivalent to std::bit_cast()) is protected by the unsafe keyword.

[1]: https://doc.rust-lang.org/reference/behavior-considered-u...

Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 10:34 UTC (Wed) by farnz (subscriber, #17727) [Link] (4 responses)

Rust has one big advantage over C and C++ when it comes to UB rules: outside of unsafe scopes (which need the literal token "unsafe" to mark them out), you cannot execute UB; that has combined with a culture that says that users of unsafe ought to make use of the visibility and privacy rules to ensure that code that does not itself use unsafe cannot cause your code to execute UB.

As a result, Rust's UB rules are not relevant to a majority of Rust programmers; only programmers who write unsafe code need to care about them. And because a majority of Rust programmers avoid unsafe because it's tricky to get right, you don't have the same issues around "accidental UB".

After all, if you look at the cases where people actively complain about C++ or C UB, they're cases where the intent behind the construct is fairly obvious to a human reader, and where many compilers do interpret it the way the original author intended. The complaint tends to be not so much "UB rules are horrible", but "this is reasonable code and should not be UB", and in Rust, you'd not have written unsafe, and thus wouldn't have ever been able to compile that code.

And, as an aside, I'd like to be picky about signed overflow; C++ could get the same general effect with unspecified behaviour, where the only permissible behaviours are wraparound or an exception being thrown. It didn't need the new concept of "erroneous behaviour" to get to the same place as Rust, and it would be possible to define acceptable wraparound behaviours such that most machines would do wraparound correctly with unsigned arithmetic operations (there's excess/biased, two forms of sign-magnitude depending on where the sign bit lives, base -2, one's complement and two's complement to consider, for at most 6 different definitions).

Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 11:26 UTC (Wed) by Wol (subscriber, #4433) [Link] (3 responses)

> And, as an aside, I'd like to be picky about signed overflow; C++ could get the same general effect with unspecified behaviour, where the only permissible behaviours are wraparound or an exception being thrown.

I think this is the point I'm getting at, which is why I get so frustrated at the O_PONIES argument. You are executing code which assumes infinite precision, on hardware which can only do fixed precision. And then you place the onus on the PROGRAMMER to ensure that there are no bugs!!!

Rust may make the same "you want infinite precision" assumptions, but it also says "if you get it wrong, either the program will fail to compile, or it will crash at runtime". C/C++ just says "all bets are off" including deleting loads of code that will only ever execute if the programmer screws up - and was probably put there for the explicit purpose of detecting said screw up!!!

At the end of the day, all I want (thinking as an engineer) is that when hardware reality and programming mathematical purity collide, I get alarms going off. And if the language purists say I'm demanding O_PONIES, then that is a massive alarm going off telling anyone writing real-world systems they need to get the hell out of there.

Which is why, imnsho, I think C/C++ is rapidly heading into the Pascal world - a great language for teaching, but totally useless in real life without loads of hacks that sully its mathematical purity. The tragedy is that it didn't start out that way - as originally designed it matched reality pretty exactly - "x = a + b" just fed a and b into the cpu ADD instruction and gave you the result. If it wasn't what you expected because it had overflowed, so be it!

Rust is a real world language. Safe code is basically defined as "it does what the man on the Clapham Omnibus would expect". Code that can misbehave MUST be tucked inside the "unsafe" keyword, and then the programmer is placed on notice that the compiler will be unable to help them if they screw up. And equally importantly, the programmer is placed on notice if the code outside of "unsafe" screws up, that is a serious language bug that will get fixed, even if it breaks other code.

Cheers,
Wol

Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 13:13 UTC (Wed) by farnz (subscriber, #17727) [Link] (1 responses)

The reason for "O_PONIES" is that what people ask for is not a well-defined semantic, but "do what the hardware does". When the language is meant to be portable to arbitrary hardware, "do what the hardware" does is meaningless, since it depends on what the hardware is.

And Rust is not defined as "it does what the man on the Clapham Omnibus would expect" - there are explicit semantics defined for all of the operations, and it's expected that if the Rust operation doesn't quite match the hardware operation, the compiler will fix up the difference.

C has never worked the way you describe it working - even early C compilers could do things like strength reduction, converting "x = a * 5" to "x = a << 2 + a" . The problem is that some of these changes are liked (such as reducing multiply to bitshift and add), and nobody complains that the compiler chose to output a bitshift and add instead of a multiply, and some are disliked - but nobody can get even POSIX to define the meaning of UBs that people hate, like signed overflow.

Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 14:23 UTC (Wed) by khim (subscriber, #9252) [Link]

> When the language is meant to be portable to arbitrary hardware, "do what the hardware" does is meaningless, since it depends on what the hardware is.

It doesn't even work for a single architecture. Because to even say “what the hardware is doing” you need precise and unambigious mapping from the source code to the machine code.

This essentially turns your language into an assembler and, worse, into non-optimizing assembler for even minor changes in the code generated may blow your code to pieces. I have even wrote such code, myself, when worked with a small enough microcontroller that had 256byte pages and I tried to squeeze all my code into one such page.

> And Rust is not defined as "it does what the man on the Clapham Omnibus would expect" - there are explicit semantics defined for all of the operations, and it's expected that if the Rust operation doesn't quite match the hardware operation, the compiler will fix up the difference.

Precisely. Rust program behavior is still described in terms of abstract Rust machine and Rust is fully embracing the fact that answer to the question about whether two such programs are equivalent or not couldn't be given in 100% of cases.

Rust works by embracing the fact that undecidable problems exist, while “we code for the hardware” guys just ignore that fact entirely.

> but nobody can get even POSIX to define the meaning of UBs that people hate, like signed overflow

Why do you say that? Both GCC and clang supported -fwrapv option for years. Signed overflow can be defined if you want it to be defined — that's not what “we code for the hardware” guys complain about. They complain about the need to know about these things in addition to knowing about what the hardware is doing — and that part couldn't be solved by tweaks to the language definition: of course if you want to use the language you need to know how it works! What other alternatives are there?

Rust UB versus C UB - and why there's less discussion of Rust UB

Posted Jul 10, 2024 14:10 UTC (Wed) by khim (subscriber, #9252) [Link]

> Rust may make the same "you want infinite precision" assumptions, but it also says "if you get it wrong, either the program will fail to compile, or it will crash at runtime". C/C++ just says "all bets are off" including deleting loads of code that will only ever execute if the programmer screws up - and was probably put there for the explicit purpose of detecting said screw up!!!

You are misinterpreting things. Again. If you want to get Rust to behave like C/C++… just write x.unchecked_add(y) and you are back in the world of undefined overflow.

And if you want you may write __builtin_add_overflow(x, y, &res); in C/C++ and voila: no UB when something overflows! It's doable, ma!

> At the end of the day, all I want (thinking as an engineer) is that when hardware reality and programming mathematical purity collide, I get alarms going off.

Yup. You are asking for the only thing that compilers never delivered and could, in fact, never deliver.

> And if the language purists say I'm demanding O_PONIES, then that is a massive alarm going off telling anyone writing real-world systems they need to get the hell out of there.

Why? You are, quite literally, asking for the impossible. At some point this becomes a source of amusement, but it's obvious who would leave, in the end: the guys who are using things without crying about something they could never have would learn how to deal with the fact that compiler is an always an adversary (but sometimes a friend, too) and dreams “benign” compilers would never materialize, while people who want to “program for the hardware” would just eventually die off.

When Rust developers realized that providing what you are asking for is fundamentally impossible they divided language in two: in normal, safe, realm all “bad” programs are detected and rejected and in the unsafe real programmer is left one-on-one with a formidable adversary that is modern optimizing compiler.

They haven't solved the impossible issue, but they sidestepped it. And they are doing the same things in other places, too. They accept that existing state of affairs is not ideal but they effectively compromise to be able to work real-world programs using real-world compilers anyway.

While in C/C++ realm enough of developers (both C/C++ developers who use the compilers and also C/C++ compiler developers who write them) don't even think about compromise.

In particular your rants never ever admit that you are asking for the impossible and don't ever stop to think about whether some other way forward except for O_PONIES may exist.

> The tragedy is that it didn't start out that way

Seriously? Because from my POV it's the exact opposite.

> "x = a + b" just fed a and b into the cpu ADD instruction and gave you the result.

Sure. And that's where the problem lies. If you define your language in terms of instructions executed then you have to ensure precise sequence of instructions generated! And that's only possible and feasible when your compiler is primitive enough to predict exactly what a given sequence of characters gives as input would produce on the output from the compiler.

Not even assembler always works that way, self-modifying code often needs a very specific version of assembler to be compiled correctly.

Thinking that you may create a high-level language on that basis is sheer lunacy.

> Rust is a real world language.

Sure. But it's important to understand how the decision that were put in Rust have come to be. Rust is very much a result of a compromise. If we couldn't solve some problem perfectly then let's solve it to the best of our abilities and, maybe, redo the decision later. There's an intersting list of things that Gaydon Hoare had to give up on to ensure that Rust would succeed.

That's more-or-less the polar opposite from what C/C++ world is usually doing: compromises there are very rarely happen, instead things where people couldn't agree are just thrown away (take the drama discussed there with a grain of salt because JeanHeyd is a bit of a drama queen, but facts are facts).

> Safe code is basically defined as "it does what the man on the Clapham Omnibus would expect". Code that can misbehave MUST be tucked inside the "unsafe" keyword, and then the programmer is placed on notice that the compiler will be unable to help them if they screw up. And equally importantly, the programmer is placed on notice if the code outside of "unsafe" screws up, that is a serious language bug that will get fixed, even if it breaks other code.

Sure. And it works pretty well. But this mere structure is an explicit admission that O_PONIES are impossible, that to separation of the wheat from the chaff would never happen and when hardware reality and programming mathematical purity collide, I get alarms going off demand could never be fulfilled.

Rust design is based on top of that acceptance. If we couldn't precisely separate “good” programs from “bad” programs then let's separate them imprecisely and let's give you two answers instead on one: with false positives in one part of your program and false negative in the other part of your program.

And if you are willing to compromise then the line where the compromise is reached can be debated, but if you are not willing to compromise and only just scream “gimme O_PONIES, only O_PONIES are acceptable”, then obviously compromise would never be reached for you are not seeking it.