Infinite loop language lawyering
Posted Jul 7, 2024 12:24 UTC (Sun) by Wol (subscriber, #4433)
In reply to: Infinite loop language lawyering by khim
Parent article: New features in C++26
And if instead it was replacing "return 200 + 200" with "return 400", when the function type was signed byte?
If I've got my maths right, the wrong thing to do is to return -112. The right thing to do is a compiler error. The WORST POSSIBLE thing to do is to silently delete the function, without warning. If I'm lucky, it's in a .o, which then results in the linker crashing. If I'm unlucky, it's in the same .c as main(), with awful knock-on effects there.
And this is why C is a rubbish language. It runs on real hardware, but refuses to accept reality, pretending to run on a perfect Turing Machine. And in the process it actively hinders the programmer's attempts to ensure that theory and reality coincide.
The first Pick machine I programmed had BCD microcode. I think it actually used that as default, with infinite precision being possible. Pick itself actively encourages fixed point maths, which makes the programmer handle the exponent themselves. Both these features are totally unnecessary in the normal course of events, and programmers mostly ignored them, but the fact they are front and centre in the language and implementation means that at every step the programmer is confronted with the question "do I need to worry?"
And this is the problem with C. It *hides* from the programmer that this is a serious issue. And when you're dealing with hostile adversaries and malicious input, the fact that you cannot trust the language to at least try and protect you, is very dangerous. Worse, the language design actively helps attackers.
Stuff - anything - should be designed to "make the simple things easy, and the hard things possible". If C makes even understanding the language hard, then *everything* will be difficult. SQL takes this to the opposite extreme: as I understand it, a design aim was to make all things equally easy, so as soon as any bit of your problem is hard, *everything* becomes hard!
Cheers,
Wol
Posted Jul 7, 2024 13:34 UTC (Sun)
by khim (subscriber, #9252)
[Link]
You are barking up the wrong tree, as usual. That's the only way high-level languages ever work; it's more or less the definition of a high-level language. Complaining about it is like complaining that water is wet or fire is hot. All languages do that, so it's not an issue at all. Or, better: it is an issue, sure, but an inevitable one, so complaining about it is, again, like complaining that water is wet or fire is hot.

Do you see me objecting? Of course C is a horrible language. But the actual fact is that any other language that fills the same niche would be horrible, by your definition, too! That's something you completely ignore; you just scream O_PONIES, O_PONIES, gimme O_PONIES. Yet that's a necessary property of a system-level language that “permits everything”.

Optimizations transform a program for a Turing machine into a different program for a Turing machine (that's essentially what your computer is, whether you want to admit it or not). To perform them we need to answer the question: are these two programs equivalent or not? And that question, quite literally, cannot be answered. Not “it's hard to do”, not “we don't know how to do that yet”, but simply “it's mathematically impossible”.

Thus you only have two choices: restrict the language so that “bad programs” cannot be expressed, or accept “bad programs” and miscompile them. These are the only two choices; a language that accepts every “good program” and rejects, at compile time, every “bad program” is just not possible. Most high-level languages pick choice #1, but that makes them unsuitable for low-level systems programming. C makes choice #2, and every other low-level language we have does the same. Even optimizing assemblers do it; that's why [it's impossible to compile old assembler sources with modern assemblers](https://www.os2museum.com/wp/gw-basic-source-notes/)! Only very primitive assemblers are unaffected, because they perform no non-trivial transformations of the code: the input and output Turing machines are the same, so the problem doesn't exist. Even Rust, for all the [pretty justified] hype around it, makes the same choice.
It just reduces the scope of the part where “bad programs” are accepted and then miscompiled. But it doesn't eliminate that property, because it couldn't. C was written in simpler times, when people believed you could just tell developers what “bad programs” shouldn't do and that would be enough, even if the list of “bad things not to do” is, literally, hundreds of entries long. It doesn't work, sure, but that's a property of the language, not a property of compilers for that language. Compilers are just trying to do the best job they can with the awful language they got. It would have been possible to change the language and make it less awful (Ada did that, after all), if not for the firm, misguided belief of many C and some C++ developers that their adversary is not the intrinsic complexity of the task they tackle, but the developers of the compilers, who just refuse to correctly compile "perfectly correct" code with bazillions of UBs.
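A concrete instance of the kind of transformation under discussion (the function is illustrative): C11 6.8.5p6 lets an implementation assume that a loop whose controlling expression is not a constant expression, and whose body performs no side effects, terminates. So an optimizer is entitled to delete a loop like this even when it would actually spin forever.

```c
/* Illustrative: for most nonzero x this loop never reaches 0
   (x == 1 cycles through 2, 4, 1, 2, 4, ...). Because the body has
   no side effects and the condition is not a constant expression,
   a conforming C11 optimizer may nevertheless assume it terminates
   and remove it entirely. */
int spin(int x) {
    while (x != 0)
        x = (x * 2) % 7;
    return x;
}
```

(Under `-O2`, gcc and clang will typically compile `spin` down to code that returns immediately.)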
Posted Jul 7, 2024 13:47 UTC (Sun)
by pizza (subscriber, #46)
[Link] (1 responses)
The result would be the same either way, because unsuffixed numeric literals are treated as signed ints. Whether the compiler adds 200+200 at run time and then assigns/returns the resulting 400, or constant-folds 200+200 into 400 at compile time, it's still an integer that's too large to fit in its destination.
What happens next depends on how the compiler+hardware define "int" and "signed byte", but assuming int is 32-bit two's complement, and 'signed byte' is the same but only 8 bits wide, 400 (0x190) gets truncated to -112 (i.e. 0x90).
Posted Jul 8, 2024 12:54 UTC (Mon)
by khim (subscriber, #9252)
[Link]
Result would be the same either way simply because such code doesn't contain any undefined behaviors. Addition happens on `int`s even if the source is `char`, and `400` fits in `int` on all known compilers. Conversion from `int` into an 8-bit quantity overflows, sure, but that's not an issue: that particular overflow is very well defined, the result is the unique value of the destination type that is congruent to the source integer modulo 2ᴺ, where N is the width of the destination type. Since there are no undefined behaviors and the results are well-defined… compilers don't have a choice: the language says the result will be `-112`, there is no ambiguity.