Development quote of the week

Posted Dec 1, 2022 12:08 UTC (Thu) by Wol (subscriber, #4433)
In reply to: Development quote of the week by anton
Parent article: Development quote of the week

> They believe that their programs will not be hit because they avoid undefined behaviour, and they use sanitizers (tools that make you aware of a few kinds of undefined behaviour if they occur in a particular run) to find cases where they failed to do so,

Which bites far too many sensible programmers, because they TEST for undefined behaviour, and THE COMPILER DELETES THE TESTS BECAUSE IT'S UNDEFINED.

That's the lunacy of this approach ...

Cheers,
Wol



Development quote of the week

Posted Dec 1, 2022 13:55 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (9 responses)

The tests for the behavior seem to expect some specific behavior in order to detect the situation happening. Sure, your platform(s) might have machine-defined behaviors for overflowing integers, but you're coding for the C machine, which doesn't have anything specific that can (or should) happen, so the compiler is within its rights to remove the check.

If C were not "so portable" to allow it to work without forced pessimization on machines which don't matter to *you*, but mattered in the past, then maybe "well, it's two's complement, so there is an expected behavior that can be tested for" would allow that condition to stick around. FWIW, C23 does consider signed integer representations other than two's complement "obsolete" (for now at least; it is not yet finalized). So if you're using C23, then the condition can stick around (of course, you lose performant addition on one's complement machines, but by writing such a check you already didn't care, presumably).

So I see these three solutions for UB that continues to cause problems because developers think "on the hardware" instead of "on the C abstract machine":

- accept the loss of portability by moving behaviors from UB to implementation-defined
- add intrinsics to detect bad cases to be used in these checks
- better ways to say "I know I am using assumption X about the hardware here" and let the compiler say "that doesn't work here" when compiling for ObscureArch (instead of just miscompiling); a sketch of this follows below

The first is probably fine with most things, but once you get into the differences between currently-popular architectures (e.g., supporting unaligned reads and such), expect to hear about how C is "leaving performance on the floor" for certain use cases.
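For concreteness, here is a minimal sketch of what that third option can already look like in standard C11 (the assertion text and function are illustrative, not from any particular project): state the hardware assumption up front, and let compilation fail loudly on a target where it doesn't hold instead of silently miscompiling.

```
#include <limits.h>
#include <stdint.h>

/* Declare the assumption: if ObscureArch doesn't satisfy it, the build
   stops here with a clear message rather than producing wrong code. */
_Static_assert(sizeof(int) == 4 && CHAR_BIT == 8,
               "this code assumes a 32-bit int made of 8-bit bytes");

/* Code below this point may rely on int being exactly 32 bits wide. */
uint32_t as_bits(int x) {
    return (uint32_t)x;
}
```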

Development quote of the week

Posted Dec 1, 2022 17:03 UTC (Thu) by stevie-oh (subscriber, #130795) [Link] (1 responses)

> add intrinsics to detect bad cases to be used in these checks

I've been following the whole Undefined Behavior compilers-vs-developers madness for a few years now, and as I understand it, a lot of these checks are superfluous and _can_ be elided -- because they were part of boilerplate code that was inlined, and _in the inlined context_, the compiler can prove that the check is unnecessary.

Sure, the example given in the linked article doesn't have any inlined code. But the code responsible for eliding the check has no way to know that, because it's not looking at your .c file; it's looking at an abstract model of the code that's already been run through a few optimization passes.
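For illustration, a minimal sketch of the kind of inlined boilerplate check that becomes provably redundant in context (the names here are hypothetical):

```
#include <stddef.h>

/* A defensive helper: callers may pass any index, so it checks bounds. */
static inline int get_or_zero(const int *arr, size_t len, size_t i) {
    if (i >= len)          /* boilerplate check from the helper */
        return 0;
    return arr[i];
}

int first_of_three(const int *arr) {
    /* After inlining, the compiler sees "0 >= 3", proves the check can
       never fire at this call site, and elides it; here that is exactly
       the right thing to do. */
    return get_or_zero(arr, 3, 0);
}
```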

> better ways to say "I know I am using assumption X about the hardware here"

For the case of signed integer overflow -- and, really, the biggest source of astonishment for people seems to be signed integer overflow -- you can use *unsigned* integers. Overflow those all you want; those are specified to wrap exactly the way most people expect.

Though, personally, I'd just use

```
/* bail out if a subsequent multiplication by 0x1ff could exceed INT_MAX */
if (x > (INT_MAX / 0x1ff))
    return 0;
```

as my overflow check.

https://godbolt.org/z/KTYq9TqoM
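For the unsigned-integer route mentioned above, a minimal sketch (the helper name is illustrative): do the arithmetic in unsigned, where wrap-around is well defined, and derive the overflow verdict from the wrapped result before anything undefined can happen.

```
#include <limits.h>
#include <stdbool.h>

/* Report whether a + b would overflow int, without ever performing a
   signed addition that could overflow. */
bool add_would_overflow(int a, int b) {
    unsigned int sum = (unsigned int)a + (unsigned int)b; /* defined wrap-around */
    bool msb_set = sum > (unsigned int)INT_MAX;  /* "negative" if reinterpreted */
    if (a >= 0 && b >= 0)
        return msb_set;      /* two non-negatives overflowed iff result looks negative */
    if (a < 0 && b < 0)
        return !msb_set;     /* two negatives overflowed iff result looks non-negative */
    return false;            /* mixed signs can never overflow */
}
```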

Development quote of the week

Posted Dec 2, 2022 14:15 UTC (Fri) by renox (guest, #23785) [Link]

> For the case of signed integer overflow -- and, really, the biggest source of astonishment for people seems to be signed integer overflow -- you can use *unsigned* integers. Overflow those all you want; those are specified to wrap exactly the way most people expect.

In C, yes; in Zig, unsigned integers behave exactly like signed ones: overflow is detected in debug mode but is undefined behaviour in release mode.

Development quote of the week

Posted Dec 1, 2022 17:38 UTC (Thu) by anton (subscriber, #25547) [Link] (6 responses)

Reality check: Name an architecture from after 1970 that does not support two's-complement arithmetic. Even if there is one, then on that architecture the C unsigned types are at least inefficient: Many two's-complement operations are the same as unsigned operations, so if an architecture does not have the two's-complement operations, it does not have the unsigned operations, either. I have certainly never programmed on a binary machine that did not support two's-complement, and I have been programming since the early 1980s.

And GCC targets only two's-complement machines. While portability to dinosaurs probably was the original reason for undefining signed overflow, I have read from fans of undefined behaviour who deny this and claim that this undefined behaviour was introduced for performance.

> supporting unaligned reads
Actually that has won (at least in general-purpose computers). If I want to test my programs on a machine that requires data alignment, I have to turn on our 2006-vintage Sparc T1000 or even older machines (most likely I would turn on a 1998-vintage Alpha). If I want to test on big-endian machines, I also have to turn on some old, normally turned-off machine; little-endian has won, too.

Development quote of the week

Posted Dec 1, 2022 18:57 UTC (Thu) by khim (subscriber, #9252) [Link] (4 responses)

> If I want to test my programs on a machine that requires data alignment, I have to turn on our 2006-vintage Sparc T1000 or even older machines (most likely I would turn on a 1998-vintage Alpha).

You have a 1998-vintage Alpha but don't have a modern x86 device? That's crazy hard to believe. The AC flag was added to it more than 30 years ago and it's still supported. And vmovaps still fails even without it in brand-spanking new AVX512 (which AMD have only just added to their CPUs this year).

> If I want to test on big-endian machines, I also have to turn on some old, normally turned-off machine; little-endian has won, too.

IBM still makes new mainframes, Linux still supports them.

Development quote of the week

Posted Dec 1, 2022 22:41 UTC (Thu) by anton (subscriber, #25547) [Link]

I have tried several times to use the AC flag for testing that everything is aligned, but failed: On IA-32 Intel's own ABI results in code that produces an exception if AC is set. On AMD64 the ABI is ok, but gcc produces code with unaligned accesses from code where all pointers are correctly aligned for their data types.

Concerning movaps and friends, yes, they are useful to let fans of undefined behaviour miscompile previously working programs, but they are not useful for testing whether all memory accesses in a program are aligned. That's because they don't work for general-purpose registers, they don't work for scalars, they check for the wrong alignment (not the type alignment, but the container alignment; total brain damage), and there is no guaranteed way to get gcc to use them.

Concerning byte order: Unfortunately IBM has failed to make a mainframe available to me, so I still have to turn on a vintage machine for big-endian testing (I typically use a 2004-vintage iBook G4 for that). In the long run, few people will test software on big-endian machines, bit rot will lead to many programs not working on them, and that will be a problem for IBM's mainframe business; or maybe not: IBM's salesforce will undoubtedly convince the customers that it's a sign of their elite status that not every old program works on their mainframe.

Meanwhile, the OpenPower part of IBM has seen the signs of the time and has switched the architecture to little-endian.

Development quote of the week

Posted Dec 1, 2022 23:11 UTC (Thu) by Wol (subscriber, #4433) [Link] (1 responses)

> You have a 1998-vintage Alpha but don't have a modern x86 device?

That's not what he said. Or are you saying that x86 REQUIRES data alignment? (I don't know x86 architecture - I really don't know.)

Cheers,
Wol

Development quote of the week

Posted Dec 2, 2022 0:09 UTC (Fri) by khim (subscriber, #9252) [Link]

> Or are you saying that x86 REQUIRES data alignment?

Yes. Starting from the 80486 you have a bit in the EFLAGS register which you can use to switch between the two modes.

ARM got such a switch a bit later (only it approached it from the opposite direction: old versions required data alignment, while it's optional on the latest CPUs).
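A minimal sketch of how that switch can be flipped from user space on x86-64 with gcc or clang inline assembly (this assumes Linux, which sets the CR0.AM bit so that the flag is honoured in user mode; unaligned accesses then raise SIGBUS instead of being performed silently):

```
/* Set the AC (alignment check) bit, bit 18 of RFLAGS. */
static void enable_alignment_checks(void) {
    __asm__ volatile (
        "pushfq\n\t"
        "orq $0x40000, (%%rsp)\n\t"   /* 0x40000 == 1 << 18 == AC */
        "popfq"
        : : : "cc", "memory");
}
```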

Development quote of the week

Posted Dec 2, 2022 18:59 UTC (Fri) by ejr (subscriber, #51652) [Link]

But packages like Redis (in redis-index) explicitly do not support big-endian.

It's fun, really.

Development quote of the week

Posted Dec 1, 2022 19:36 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> Name an architecture from after 1970 that does not support two's-complement arithmetic.

I can't. However, C still thought it important enough not to specify it until the standard due next year.

> I have read from fans of undefined behaviour who deny this and claim that this undefined behaviour was introduced for performance.

I'm not a *fan* of UB. I dislike it as much as anyone here. But I'm also not saying that *my* thoughts on what any given UB should mean should be enshrined either. Note that some optimizations allowed by UB can improve performance. That doesn't justify UB's existence, but should serve as a reminder for some of the reasons UB continues to persist.

> If I want to test on big-endian machines, I also have to turn on some old, normally turned-off machine; little-endian has won, too.

We deploy to (new) machines that are big-endian, so that is certainly not universal.

Development quote of the week

Posted Dec 1, 2022 16:36 UTC (Thu) by Karellen (subscriber, #67644) [Link] (80 responses)

How is that lunacy?

Compiler: "Your program is not allowed to perform UB, e.g. by exhibiting overflow."

Developer: "Oh, OK then. I'll try an operation and check if it overflowed - but sneakily - to make sure I don't overflow."

Compiler: "Wait...what? What part of "not allowed to perform" are you not getting?"

You can write tests to see if an operation might perform UB - but you have to do that in a way that doesn't itself perform UB. For example, by checking (a > 0 && b > 0 && a < INT_MAX / b) instead of (a > 0 && b > 0 && a * b > 0).
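As a minimal sketch of that idea (names are illustrative), the whole check can be wrapped in a helper whose operations cannot themselves overflow:

```
#include <limits.h>
#include <stdbool.h>

/* True if a * b is representable in int; only division is used, which
   cannot overflow here. Only the positive/positive case is shown, as in
   the comment above; other sign combinations are omitted. */
bool mul_ok(int a, int b) {
    return a > 0 && b > 0 && a <= INT_MAX / b;  /* <= is the exact bound */
}

int safe_mul(int a, int b) {
    return mul_ok(a, b) ? a * b : 0;
}
```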

Development quote of the week

Posted Dec 1, 2022 17:14 UTC (Thu) by Wol (subscriber, #4433) [Link] (58 responses)

But where does it say you are *not allowed* to perform undefined behaviour? It simply says that the result is undefined. And deleting checks that are specifically there to detect said undefined behaviour is just, well, insane ...

The problem is the compiler writers are mixing up pure mathematics and applied mathematics. And it screws over people who actually do try and check that they are getting a pure result from applied hardware.

If I multiply 100 by 100, that is pure maths, and does not overflow. If I'm using 8-bit hardware then the result is not going to be pure. The point of a computer is to compute, and ruthlessly applying applied maths constraints to a program trying to do a pure calculation (and corrupting the results in consequence) is simply a willy-waving exercise in producing meaningless AND DANGEROUS results.

As others have pointed out, this insanity will simply lead to C becoming - rather rapidly - obsolete. Rust, with its emphasis on "there is no such thing as undefined behaviour in safe code" and "unsafe blocks can carry out undefined behaviour" will supplant it in no time flat if people lose confidence in the ability of C compilers to produce code that can be trusted. It's all very well to expect old code to keep working, but if you can't recompile and trust the result, you will RUN away, not walk...

I doubt it's a co-incidence that Rust code has far more invariants than C, and therefore far fewer opportunities for mad computer scientists to produce unexpected results from apparently correct programs ...

Cheers,
Wol

Development quote of the week

Posted Dec 1, 2022 18:41 UTC (Thu) by khim (subscriber, #9252) [Link] (51 responses)

> But where does it say you are *not allowed* to perform undefined behaviour?

In the rationale for C99:

Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose.

Simply by definition, undefined behavior is a program error (albeit one which may not be diagnosed by the compiler).

> And deleting checks that are specifically there to detect said undefined behaviour is just, well, insane ...

Computers don't deal with “sane” or “insane”. They don't have organs capable of dealing with these notions.

The most you can hope for is logical (and then, only if there are no bugs in the compiler).

> I doubt it's a co-incidence that Rust code has far more invariants than C, and therefore far fewer opportunities for mad computer scientists to produce unexpected results from apparently correct programs ...

Of course no. Rust is solving things in an extremely classic way:

An important scientific innovation rarely makes its way by gradually winning over and converting its opponents: it rarely happens that Saul becomes Paul. What does happen is that its opponents gradually die out, and that the growing generation is familiarized with the ideas from the beginning.

> As others have pointed out, this insanity will simply lead to C becoming - rather rapidly - obsolete. Rust, with its emphasis on "there is no such thing as undefined behaviour in safe code" and "unsafe blocks can carry out undefined behaviour" will supplant it in no time flat if people lose confidence in the ability of C compilers to produce code that can be trusted. It's all very well to expect old code to keep working, but if you can't recompile and trust the result, you will RUN away, not walk...

The biggest difference between C/C++ land and Rust land is the attitude.

In Rust land, no matter how talented and capable you are… if you don't follow the rules you will be ostracized and expelled.

That's very harsh, but as the complete meltdown of the C and C++ communities has shown, that's the only way you can cooperate. Think about it: why the heck do all these discussions about UBs in C/C++ invariably, 100% of the time, come down to that stupid “integer overflow” case? Both gcc and clang have options which make the compiler respect integer overflow rules; shouldn't that close the discussion for good?

But no, C and C++ developers, mostly old ones, who grew up in times when Undocumented DOS and Undocumented Windows were the go-to books… they just couldn't accept that. For them it's not about a concrete UB defined or not defined in the standard, for them it's symbolic: Am I a trembling creature or have I the right?

It's not about a particular language rule, it's about that ability to ignore the rules (when justified, of course, yada, yada).

But how are compilers supposed to optimize anything if their users are expected to break the rules? What's the difference (from the compiler's POV) between this atrocity and your overflow checks?

#include <stdio.h>

/* Deliberately invalid code: set() writes x into a local that is about
   to go out of scope, and never returns a value. */
int set(int x) {
    int a;
    a = x;
}

/* add() reads its own uninitialized local, hoping it lands in the very
   stack slot that set() just wrote to. */
int add(int y) {
    int a;
    return a + y;
}

int main() {
    int sum;
    set(2);
    sum = add(3);
    printf("%d\n", sum);   /* "works" (prints 5) on many compilers with optimizations off */
}

They both rely on knowledge of the world that the compiler doesn't have, they both work on most major compilers if you turn off the optimizations… why should one of them be treated differently from the other?

Rust's answer is, basically, “rules are rules; if a program breaks them then anything is possible… but if you broke them by mistake we will help you follow them”. Look at the story with Ipv4Addr, Ipv6Addr, SocketAddrV4 and SocketAddrV6: yes, rustc developers spent significant time talking to different crate makers to ensure the update wouldn't be too disruptive, but that was only possible because said developers are not starting with that Am I a trembling creature or have I the right? question. If their code is wrong then it should be fixed, thanks for the bug report.

Development quote of the week

Posted Dec 1, 2022 22:40 UTC (Thu) by Wol (subscriber, #4433) [Link] (16 responses)

> In the rationale for C99:
>
> Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose.
>
> Simply by definition, undefined behavior is a program error (albeit one which may not be diagnosed by the compiler).

Except that doesn't say what you say it says. It says it is not the responsibility of the compiler to catch undefined behaviour. That should NOT give the compiler licence to trash the programmer's attempt to catch UB.

The compiler should emit a warning, an error, or ignore it. The compiler shouldn't ASSUME UB (or the lack of it) and use it to alter the program logic.

Cheers,
Wol

Development quote of the week

Posted Dec 1, 2022 23:58 UTC (Thu) by khim (subscriber, #9252) [Link] (14 responses)

> It says it is not the responsibility of the compiler to catch undefined behaviour.

And that's precisely what happens. The compiler does hundreds of tiny steps, and each one is supposed to be valid in the absence of UB.

There is no code which does this: oooh, I see there's UB ⇒ let's punish the developer in some elaborate way.

Instead it has hundreds of small, simple steps where each and every one assumes there is no UB.

because there is no UB ⇒ we can do this

because there is no UB ⇒ we can do that

After hundreds of such transformations you get the result which you are observing.

> The compiler shouldn't ASSUME UB (or the lack of it) and use it to alter the program logic.

That's precisely the point of having UB in the language definition. The list of UBs is, in reality, a list of things that a normal program should never do. Trying to access memory which was freed. Dereferencing NULL. Attempting to use a variable which belongs to a function that has already finished its work. And so on.

Tracking UBs is the responsibility of the developer, not the compiler. That's why they exist in the first place.

Consider a classic Rust UB: core::hint::unreachable_unchecked. Its gcc counterpart is called __builtin_unreachable.

The whole point of these functions is to provide a hint to the compiler: this condition can never be true. It would be pretty silly to “catch and report” something each time you use such a function, wouldn't it?
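A minimal sketch of how such a hint is typically used with gcc or clang (the function and values are illustrative): the programmer promises that a case cannot happen, the compiler is free to optimize on that promise, and breaking the promise is UB by construction, so it would make no sense for the compiler to "catch and report" it.

```
/* The caller promises x is always 0, 1 or 2. */
int describe(int x) {
    switch (x) {
    case 0: return 10;
    case 1: return 20;
    case 2: return 30;
    default:
        /* Hint to the compiler: this branch is unreachable.
           Reaching it anyway is undefined behaviour by definition. */
        __builtin_unreachable();
    }
}
```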

Development quote of the week

Posted Dec 2, 2022 14:50 UTC (Fri) by farnz (subscriber, #17727) [Link] (13 responses)

And further, a lot of the consequences of UB are not "compiler detected UB and did something silly". It's instead "compiler assumed no UB, and did something that's sensible for all defined behaviour, but not in the face of UB".

For example, a compiler can transform:

int func(int input) {
    if (input < 100) {
        return -1;
    }

    input += 50;

    if (input > 500) {
         return 1;
    }
    return 0;
}

into an equivalent of the following:

int func(int input) {
    if (input < 100) {
        return -1;
    } else if (input > 550) {
         return 1;
    } else {
        return 0;
    }
}

This is useful - we've removed an operation from the code completely - but to do this optimization, we've had to assume that input += 50 does not overflow. The compiler can make this assumption in C because signed integer overflow is undefined; if we were working with unsigned integers, then this is not a valid assumption to make, and the optimized form would look like:

unsigned func(unsigned input) {
    if (input < 100) {
        return -1;
    } else if (input > (UINT_MAX - 500)) {
        return 0;
    } else if (input > 550) {
         return 1;
    } else {
        return 0;
    }
}

This is a relatively obvious case - but once the compiler is doing range analysis or other complex optimizations, the chain of reasoning that the compiler's going through can become incredibly opaque to humans, because the compiler is "just" looking through a large number of rules that might apply, choosing the ones that make the code potentially better, and repeating until it reaches a point where it's "good enough".

Hence my comparison to "proofs" that 1 = 0; these proofs tend to have lots of unnecessary steps, all of which are reasonable, and all of which exist to hide the step that goes wrong. That particular set of proof steps is a lot of rubbish meant to hide that the core of the proof is the undefined operation "divide by zero", and that if I continue reasoning validly after doing a divide by zero, I get a nonsense output. But, if you look at that proof and ignore the step that includes "we divide both sides by (x-y)", every other step is a reasonable and sane operation to perform. It's just that the outcome is complete nonsense.
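For readers without the linked proof handy, a sketch of the classic fallacy of that shape (reconstructed here from the description above; the exact steps in the linked version may differ), where every step except the division by x - y = 0 is perfectly reasonable:

```
\begin{aligned}
\text{let } x &= y \\
x^2 &= xy \\
x^2 - y^2 &= xy - y^2 \\
(x+y)(x-y) &= y(x-y) \\
x + y &= y \qquad \text{(dividing both sides by } x - y = 0\text{: the undefined step)} \\
2y &= y \\
2 &= 1 \quad\Rightarrow\quad 1 = 0
\end{aligned}
```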

Development quote of the week

Posted Dec 2, 2022 15:35 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

Yes. And it's important to understand that compilers have always acted in that way.

You couldn't have a low-level language without UBs, but as long as compilers were only capable of doing one, two or three simple passes (you can download an early release of Microsoft Pascal and see that these passes were, literally, separate binaries because there was not enough memory to keep the whole program in memory at once) the illusion that you are writing code for the hardware, not for the abstract spec, looked sensible: compilers just weren't powerful enough to break that illusion for most programs and for most programmers (they were always able to break special, hand-crafted-to-be-broken programs, but since what they were able to do was so simple it was easy to reason about whether they would break a certain invalid-by-spec-but-valid-on-the-hardware program).

As compilers became more and more powerful they became capable of inventing longer and longer proofs, and the chances of upsetting developers who misunderstood what they were actually doing grew higher and higher.

The tragedy happened around 15-20 years ago, when computers passed the 1GHz barrier, got a gigabyte of memory (or more) and compilers acquired the ability to do really long and complicated analysis.

At this point the rules, the collections of UB drawn up back at the end of the 20th century, stopped being adequate, and it became obvious that the C/C++ model describes cloud-cuckoo land… but no one wanted to change anything: C users continued in their delusion and C compiler developers insisted that it doesn't matter that the C standard describes crazy cloud-cuckoo land, it's a standard, you have to follow it!

This inevitably ended in today's meltdown: when people talk past each other with ultimatums and both sides are simply not capable of doing what the other side demands… the crazy fiasco we have today was quite inevitable and unavoidable, I'm afraid.

Compilers can only ever work “to the spec”; the idea that you may “program for the hardware” is just an illusion, not possible in any language except assembler (and it's not even possible in assembler: just look at how retro-computing enthusiasts hunt for the specific version of the assembler which will “correctly” build old versions of MS DOS or MS BASIC).

And C users could not follow the existing C language spec because it definitely describes cloud-cuckoo land: there are just too many UBs, some of them are really crazy, and because, in C, they can surface in almost every line of code… it's just beyond human capabilities, practically speaking.

A compromise was needed, but, unfortunately, it's obvious that it was impossible to achieve one without a change to the community… and that change could only ever happen with a switch to another language.

A sad story, but, ultimately, one which was almost inevitable.

Development quote of the week

Posted Dec 2, 2022 16:08 UTC (Fri) by farnz (subscriber, #17727) [Link] (1 responses)

While I mostly agree, I don't think it's the 1 GHz barrier that made the difference; rather, I think it was the invention of SSA form in the late 1980s.

Before SSA form was invented, the amount of information you had to track to do anything much more complex than peephole optimizations exploded very quickly as the program grew, making it space-prohibitive to do many of the complex optimizations modern compilers do. SSA form tamed that explosion of state - and thus enabled compilers to do much larger-scale reasoning about the program's meaning.

Development quote of the week

Posted Dec 2, 2022 16:37 UTC (Fri) by khim (subscriber, #9252) [Link]

Both were important. Without SSA you couldn't do that many optimizations, but without extremely fast CPUs and loads of memory you couldn't do them either!

Note how SSA was invented in the 1980s, yet most compilers didn't employ it until many years later. GCC got it in 2005.

GCC started to use the fact that signed integers never overflow in GCC 2.95, years before it adopted SSA.

Development quote of the week

Posted Dec 3, 2022 0:32 UTC (Sat) by Wol (subscriber, #4433) [Link] (7 responses)

> And further, a lot of the consequences of UB are not "compiler detected UB and did something silly". It's instead "compiler assumed no UB, and did something that's sensible for all defined behaviour, but not in the face of UB".

But this is where it DOES get silly.

In Maths, "int * int = int". Valid input gives valid output.

In Computerese, "int32 * int32 = ...?". Maths tells us that given valid input, we can NOT guarantee valid output. And this is where Khim's statement below, that we cannot have a low-level compiler without UB, is wrong. invalid != UB.

If the maths tells us that valid input cannot guarantee valid output, then the spec should define the consequences, even if it dodges the issue by saying "whatever the hardware does". We don't need a spec to define what mathematics is. We need a spec to define the consequences of using an applied finite range rather than a pure infinite range.

That then means we get a *sensible* model that says "where pure maths works, use the mathematical model. Where finite-range applied maths works, use the finite model. And when all else fails, whatever the hardware does." Yes it's going to break a lot of those optimisations where the compiler assumes the input has been sanitised, but the compiler should not be assuming! If the code says "uint64 = uint32 * uint32", then the compiler can quite happily assume that, provided the operands are promoted BEFORE the multiplication, the result will be valid. That's where those optimisations would be valuable!
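A minimal sketch of the kind of code being described (using the <stdint.h> type names): widen the operands first, so the multiplication itself happens in the 64-bit domain and cannot overflow for any pair of 32-bit inputs.

```
#include <stdint.h>

/* Widen before multiplying: the product of two 32-bit values always
   fits in 64 bits, so this multiplication can never overflow. */
uint64_t wide_mul(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}
```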

And that's where it looks like Rust scores. The compiler DOESN'T assume the input has been sanitised - it checks. Where it can't check, the programmer has to use "unsafe" to confirm he's checked. And while I haven't programmed any Rust, it doesn't look like it's that hard to bypass Rust's safety checks - but the language does force you to *explicitly* bypass them. So you can assume, by default, that all Rust code is safe.

The problem with all these fancy C optimisations is that they take advantage of code that the programmer didn't even realise was unsafe.

Cheers,
Wol

Development quote of the week

Posted Dec 3, 2022 16:00 UTC (Sat) by khim (subscriber, #9252) [Link] (6 responses)

> In Maths, "int * int = int". Valid input gives valid output.

Can we stop beating that dead horse? Because it becomes crazy silly at this point. Existing compilers give you a way to remove that UB from the specification (with -fno-strict-overflow), but that just changes the list of UBs in the abstract machine specification.

It doesn't change the fact that you are still programming for the abstract machine not “for the hardware”.

> And that's where it looks like Rust scores. The compiler DOESN'T assume the input has been sanitised - it checks.

Only as long as you don't write the magical unsafe keyword. Once you do, it assumes that you have sanitized your pointers (just like C does). It makes some things which C puts in the UB bucket well-defined (e.g. type punning is legal in Rust) but, in turn, it places additional restrictions on you, too.

> So you can assume, by default, that all Rust code is safe.

Yes. But some developers don't buy that argument and program in unsafe Rust like you claim it's Ok to program in C. Some of them are even quite capable and knowledgeable. That's how we end up in that situation: if nothing else works they are just expelled from the ecosystem.

C couldn't afford that solution.

> The problem with all these fancy C optimisations is that they take advantage of code that the programmer didn't even realise was unsafe.

Sure, but the only way to solve that problem is to change the abstract machine specification! And attempts to do that have failed spectacularly.

Maybe at one point it was a surprise to developers that signed overflow is undefined in C and C++, but in the last 10 years so many articles and lectures have talked about this that I have no idea where you have to live not to know it. And if you know and still want to use that mode (like the Linux kernel does) there is -fno-strict-overflow.

Yet C users assert that it's not enough and that they have a right to that magical O_PONIES switch, and C compiler developers, when they see that C users are not really serious about working together, do nothing beyond -fno-strict-overflow because it suits them just fine: all sensible developers either use checks which don't overflow (you don't need UB; a more sensible option would be the functions provided for just that purpose) or use the switch, and the ones who are non-sensible… you can't reason with them, anyway.

Development quote of the week

Posted Dec 3, 2022 19:23 UTC (Sat) by Wol (subscriber, #4433) [Link] (5 responses)

> Maybe at one point it was a surprise to developers that signed overflow is undefined in C and C++, but in the last 10 years so many articles and lectures have talked about this that I have no idea where you have to live not to know it. And if you know and still want to use that mode (like the Linux kernel does) there is -fno-strict-overflow.

You are making an ABSOLUTELY CLASSIC blunder here. You are assuming everyone else is the same as you. You are making an ASS of U and ME.

Just because you may be a C guru doesn't mean I am. Just because I'm a damn good programmer, and I do sometimes program in C, doesn't mean I'm a particularly good C programmer.

Where would I have gone to hear these lectures? How would I have found the time to go to them? And who would have pointed these articles out to me? Bear in mind I am probably older than the C language itself - I was probably in primary school when it was born. I probably did know that signed overflow was undefined in C, but I certainly didn't know that I knew. In a couple of months' time I will have forgotten.

But the BIG problem is "the principle of least surprise". If I try to multiply two numbers such that the result is too big to fit, I expect to get the wrong answer. That's my fault, fair enough. What I do NOT expect (and what you are telling me is a perfectly legitimate thing to happen) is for the compiler to delete the calculation because it saw that it could go wrong!

(Or in this particular case, the test for it going wrong, because the compiler assumed (despite maths to the contrary) that the calculation would never go wrong.)

Imho, if turning UB into defined behaviour breaks certain optimisations, tough. If the C model contradicts standard pure maths, then the C model should define the consequences of doing so - valid INPUT should not result in totally unexpected output.

At the end of the day, I really don't want to have to prefix every int32 multiplication with "if ln2(int1) + ln2(int2) > 32 then don't do the calculation". Is that really an optimisation?

Cheers,
Wol

Development quote of the week

Posted Dec 3, 2022 20:54 UTC (Sat) by khim (subscriber, #9252) [Link] (4 responses)

> How would I have found the time to go to them?

Well, you found time to argue about that topic on LWN, somehow; why couldn't you find time to read or watch some news about the language which you are using in your work?

> What I do NOT expect (and what you are telling me is a perfectly legitimate thing to happen) is for the compiler to delete the calculation because it saw that it could go wrong!

Sure. It was a surprise for me the first time I hit that issue, too. But then I found explanations, options (which gave me choices) and other things… which made it clear to me that complaining about it would just be stupid… you, on the other hand, want to change the whole world to make sure it follows your ideas. Why? Do you really hope it'll work?

> Bear in mind I am probably older than the C language itself - I was probably in primary school when it was born.

What does that change? In fact it makes the whole thing much more bizarre: usually it's young people who want to change the whole world when they discover that it behaves quite unlike their ideals.

Why is the situation with C different, I wonder? New grads quickly accept the fact that they shouldn't rely on the behavior in case of overflow, while “old hats” continue with their rants, which lead us literally nowhere.

> Imho, if turning UB into defined behaviour breaks certain optimisations, tough.

Where can I download your compiler which follows that ideology? Would be interesting to compare it to others.

And it's still not clear why you would turn UB into defined behavior (and how you would define it for cases like double-free or data races).

> At the end of the day, I really don't want to have to prefix every int32 multiplication with "if ln2(int1) + ln2(int2) > 32 then don't do the calculation". Is that really an optimisation?

No, but it's your choice to write such strange and unusual code. There are many other ways to do calculations which don't overflow (e.g. you can cast int to unsigned int, multiply the two values and then cast the result back… or you can just use the functions given to you specifically to check for overflow).

But your argument can easily be turned around: if you don't even read about what happens in C compiler land, if you don't watch videos, if you don't ever talk to C compiler developers… then why do you expect that they would follow the rules you invented and not the rules that have been clearly documented?

You don't even try to discuss a change to the rules with anyone, you just ASSUME that everyone would find your rules so obvious and correct that they would be implemented in place of what's actually documented and implemented!

Forgive me, but while I, too, don't like that rule of C, the assumption that someone would, you know, read the documentation is a bit more natural than the assumption that someone would read the documentation and then ignore it!

> But the BIG problem is "the principle of least surprise".

This principle belongs to the “discussion about rules” stage. And I would say that C users are as much to blame as C compiler developers. Instead of discussing the rules they invariably start discussing this magical O_PONIES option which would give them carte blanche to break the rules!

Not gonna happen, sorry. Not even Rust changes that. It only splits code into two parts, one where you have to play by the rules and one where the compiler ensures that you play by the rules; but if you violate the rules, all bets are still off.

Development quote of the week

Posted Dec 4, 2022 0:59 UTC (Sun) by Wol (subscriber, #4433) [Link] (3 responses)

> > How would I have found the time to go to them?

> Well, you found time to argue about that topic on LWN, somehow; why couldn't you find time to read or watch some news about the language which you are using in your work?

Huh? Unfortunately, the main language I use at work is Visual Basic. And I'm hoping that I will soon be able to use DataBASIC again. (I did say I don't use C ...)

I'm trying to get used to C again, for a personal (at the moment) project, but the less of that I do the better ... for exactly these reasons.

> And it's still not clear why you would turn UB into defined behavior (and how you would define it for cases like double-free or data races).

Well, if the compiler detects a double free, it should really halt with an error. A double free is wrong, period. And if it doesn't detect it, there's not a lot it can do about it. Likewise data races.

Thing is, those two problems are (in my mind at least) clearly different. A double free is a nonsensical operation. A data race will produce nonsense results. The basic operation is ALWAYS wrong.

But multiplication? You can't say "don't do signed multiplication" (or maybe you can), but to say "sometimes it will work, sometimes it won't" is asking for trouble ...

> > At the end of the day, I really don't want to have to prefix every int32 multiplication with "if ln2(int1) + ln2(int2) > 32 then don't do the calculation". Is that really an optimisation?

> No, but it's your choice to write such strange and unusual code. There are many other ways to do calculations which don't overflow (e.g. you can cast int to unsigned int, multiply the two values and then cast the result back… or you can just use the functions given to you specifically to check for overflow).

Are you saying that "int x; x = (uint) y * (uint) z" will result in the same answer you would expect from "x = y * z"? I guess 2s complement says it does, but the resulting code is unreadable to a programmer not very familiar with C. (I do things like "6 append 0 = 60", or "a = a + (b==1)", but there's no way I'd expect someone unfamiliar with DataBASIC to have a clue what they do ...)

Or calling out to functions? Written, I guess, in assembly to avoid using UB themselves, but not very efficient as a result ...

I get where you're coming from, the C spec is incomplete and inconsistent, and this is the inevitable consequence, but I really don't see why C needs UB. "If you do something not covered by the C model, then you get the hardware model". If you multiply two int32s, OF COURSE the result will sometimes not fit back in an int32 - that's basic maths. The compiler shouldn't assume the laws of basic mathematics don't apply, and use that as an excuse to do something totally unexpected.

The trouble with all this is that the cost of getting round all the optimisation cock-ups is - for many people - a lot higher than the gains such optimisations produce. This is why I hate SQL - the cognitive load of coping with its failings is far higher than the benefits it gives ... it's just not worth it! (I did say I work with VBA - Excel may be a crap database, but it's easier to use than Oracle/SQL ... :-)

Cheers,
Wol

Development quote of the week

Posted Dec 4, 2022 4:36 UTC (Sun) by khim (subscriber, #9252) [Link] (2 responses)

> Well, if the compiler detects a double free, it should really halt with an error.

Ooh. More O_PONIES. How is the compiler supposed to do that? Most memory management systems just assume that the user will ensure that it never happens.

There are some tools which may help you detect such cases (like MSAN) but they make your code significantly slower.

Still, detection is not guaranteed and nobody knows how to make them faster (many have tried, with no success). It's just a hard problem.

> Thing is, those two problems are (in my mind at least) clearly different.

But in the C specification they are described by almost exactly the same words and in exactly the same place. Literally; it's not a figure of speech.

“The pointer argument to the free or realloc function does not match a pointer earlier returned by a memory management function, or the space has been deallocated by a call to free or realloc” is here.

“If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined” is here.

Note that there is no mention of unsigned numbers in that text, but that's because the standard defines the result of operations on unsigned numbers as if they were members of the appropriate ring, ℤ₂⁸ for 8-bit integers or ℤ₂³² for 32-bit ones… overflow is impossible by definition.

> A double free is a nonsensical operation. A data race will produce nonsense results. The basic operation is ALWAYS wrong.

Well… if the result is not mathematically defined or not in the range of representable values for its type then such a thing is classified exactly and precisely the same way.

Does that mean that the C standard doesn't know the notion of “do what the machine is doing”? Of course not! Such things are also carefully collected and enumerated. That's just a different list, the list of “implementation-defined behaviors”. You can find the following item there:

“The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type” is here.

That means that you can perfectly safely convert 4294967295 from uint32_t to int32_t. The result would be predictable, although, in theory, it may be different on different architectures (but if you verify and find out that such a conversion produces -1 once, that means that it will produce -1 forever).

> But multiplication? You can't say "don't do signed multiplication" (or maybe you can), but to say "sometimes it will work, sometimes it won't" is asking for trouble ...

Well… that's what the standard says and that's what compilers expect by default. Their writers know that some people would like to write code with overflow, though, and they even specifically include the -fno-strict-overflow option for folks who don't like this particular aspect of the C standard. The Linux kernel uses it. You, too, may use it. If you want.

What more would you expect and why?

> Are you saying that "int x; x = (uint) y * (uint) z" will result in the same answer you would expect from "x = y * z"?

Yes.

> I guess 2s complement says it does, but the resulting code is unreadable to a programmer not very familiar with C.

Maybe, but it's guaranteed to work. Modular arithmetic never overflows, and conversion from unsigned to signed is defined, more or less, as “do what the machine is doing” (more precisely: the implementation has to pick one way to do it and stick to it, and on most modern architectures and compilers that just means a pure reinterpretation of the bits).

> Or calling out to functions? Written, I guess, in assembly to avoid using UB themselves, but not very efficient as a result ...

No. These functions are by definition the most efficient way to do what you want. Their names don't include the word “built-in” for nothing. The compiler provides them, not even the standard library. And of course, it would be silly to provide them in the compiler and then implement them inefficiently. Usually they are turned into one machine instruction which produces the result and also a flag which is then tested to jump to the error handler.
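A minimal sketch of what is being described, using the __builtin_mul_overflow builtin that both gcc and clang provide (the surrounding helper and values are illustrative):

```
#include <stdbool.h>
#include <stdio.h>

/* Multiply two ints, reporting overflow instead of invoking UB.
   __builtin_mul_overflow stores the (wrapped) result in *out and
   returns true when the mathematical product did not fit. */
bool mul_checked(int a, int b, int *out) {
    return !__builtin_mul_overflow(a, b, out);
}

int main(void) {
    int r;
    if (mul_checked(100000, 100000, &r))
        printf("product: %d\n", r);
    else
        printf("overflow detected\n");   /* taken here when int is 32 bits */
}
```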

> I get where you're coming from, the C spec is incomplete and inconsistent

On the contrary: its specification is very detailed and mostly consistent (there are some issues with pointer provenance, but these are irrelevant here). The standard clearly marks the cases which have to “do what the hardware is doing” and the cases “which should never happen in a well-written program”. It gives them different names and even, helpfully, includes long lists of the behaviors in class #1 and class #2.

The only tiny problem for people like you (who want to program without actually reading the documentation): signed overflow is grouped with “double free”, “dangling pointers” and other hard-to-detect-yet-need-to-avoid cases. And, importantly, it is not grouped with unsigned overflow (according to the C standard, unsigned overflow doesn't exist and couldn't exist).

> If you do something not covered by the C model, then you get the hardware model

Because the “hardware model” is not enough to describe how a program is supposed to behave when faced with hard-to-detect errors like race conditions or other such crazy things. Race conditions are really interesting because they include undefined behavior at the hardware level! Here's an interesting article on the subject (and you can find tons of articles on LWN which explain how Linux deals with these… it's not easy, to say the least).

And when the hardware itself behaves unpredictably (usually in a sane way, but occasionally in a way that a layman wouldn't expect at all) the only reasonable approach is what C, C++ and Rust are doing: describe rules which the developer has to follow to create a working program. Anything else wouldn't, really, work.

Rust has the advantage of limiting these issues to a small percentage of program code, but inside an unsafe block the rules are the same: if your code triggers UB then all bets are off (the list of UBs for Rust is smaller but includes some items that C doesn't consider UB).

> The compiler shouldn't assume the laws of basic mathematics don't apply, and use that as an excuse to do something totally unexpected.

The compiler does what the standard explains with such precision. It even provides you with a switch to do math the way you want and special functions for efficient overflow detection. Again: what more do you expect from it, and why?

Development quote of the week

Posted Dec 4, 2022 20:26 UTC (Sun) by Wol (subscriber, #4433) [Link] (1 responses)

> The compiler does what the standard explains with such precision. It even provides you with a switch to do math the way you want and special functions for efficient overflow detection. Again: what more do you expect from it, and why?

Not to treat two completely different categories of programming cockup as if they were identical?

> Well… if the result is not mathematically defined or not in the range of representable values for its type then such a thing is classified exactly and precisely the same way.

If the result is not mathematically defined, then I'd quite happily put data races and double frees in that category. Don't do them!

"Not in the range of representable values for its type" is the direct consequence of something that *is* mathematically defined - eg multiplication. If the input is valid, the operation IS mathematically defined, and it's just the output doesn't fit, then how can that be the same as something that MUST have random and potentially disastrous consequences if you're unlucky/stupid enough to do it?

Cheers,
Wol

Development quote of the week

Posted Dec 4, 2022 21:06 UTC (Sun) by khim (subscriber, #9252) [Link]

> Not to treat two completely different categories of programming cockup as if they were identical?

They are treated differently. One has a command-line option to turn it into implementation-specific behavior, one doesn't. One can be caught with UBSAN easily, one requires MSAN and a tiny bit of luck.

But yes, by default they are treated identically because that's what the standard says.

> If the input is valid, the operation IS mathematically defined, and it's just that the output doesn't fit, then how can that be the same as something that MUST have random and potentially disastrous consequences if you're unlucky/stupid enough to do it?

The compiler will ensure there are random and potentially disastrous consequences, don't worry 🤣.

Development quote of the week

Posted Dec 5, 2022 10:48 UTC (Mon) by geert (subscriber, #98403) [Link] (1 responses)

> } else if (input > 550) {

450?

Development quote of the week

Posted Dec 5, 2022 10:53 UTC (Mon) by farnz (subscriber, #17727) [Link]

Possibly, yes.

Shows the value of automated optimization, in that my hand-optimization may well be buggy.

Development quote of the week

Posted Dec 2, 2022 0:13 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

I don't know how you expect to "catch UB" by invoking it. It's like trying to see if a firecracker works by lighting it and wondering why it blew up in your face when you were just testing to see if the thing worked or not.

Development quote of the week

Posted Dec 1, 2022 23:36 UTC (Thu) by anton (subscriber, #25547) [Link] (28 responses)

> But how are compilers supposed to optimize anything if their users are expected to break the rules?

By performing proper optimizations: strength reduction, loop-invariant code motion, inlining, loop unrolling, etc. You don't need to rely on programmers following some rules for proper optimizations.
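For concreteness, a minimal sketch of one of the optimizations named above, loop-invariant code motion (the functions are illustrative; a real compiler performs this on its intermediate representation rather than on source):

```
/* Before: x * y is recomputed on every iteration even though neither
   operand changes inside the loop. */
void add_product(long *a, int n, long x, long y) {
    for (int i = 0; i < n; i++)
        a[i] += x * y;
}

/* What loop-invariant code motion effectively turns it into: the
   invariant product is computed once, before the loop. */
void add_product_hoisted(long *a, int n, long x, long y) {
    long p = x * y;
    for (int i = 0; i < n; i++)
        a[i] += p;
}
```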

Concerning your code that relies on the value of uninitialized variables, this code is probably pretty brittle during maintenance, so programmers avoid this kind of usage. By contrast, gcc does the same thing without -fwrapv as with -fwrapv in nearly all cases, and only rarely deletes a condition or sign extension, so programmers learn that signed integers in GCC perform modulo arithmetic; and some make use of this knowledge. You may think that they are wrong, but that does not make them avoid undefined behaviour.

One other difference is that gcc-10 -Wall produces several warnings for your code, while it silently miscompiles the example by Clément Bœsch.

Development quote of the week

Posted Dec 2, 2022 0:27 UTC (Fri) by khim (subscriber, #9252) [Link] (24 responses)

> Concerning your code that relies on the value of uninitialized variables, this code is probably pretty brittle during maintenance, so programmers avoid this kind of usage.

Yes, but to say that some code is “too ugly to survive” you have to precisely define what kind of code you consider “too ugly”. And if you do that, you get a list of UBs for your language.

It's just not possible to create a low-level language without UB. One theoretical alternative is to use Coq to provide proofs of correctness, but I'm 100% sure that people who complain that C is now, suddenly, unusable, would be unable to use Coq.

> One other difference is that gcc-10 -Wall produces several warnings for your code, while it silently miscompiles the example by Clément Bœsch.

Do you really think it's problematic to fix? Here we go. If you don't introduce UB and don't make certain programs “too ugly to survive” then no changes in the compiler can be made. Ever.

The demo program is obvious: just turn a pointer to a function into a pointer to its code bytes and start peeking at the generated code.

Literally any change in the compiler would be forbidden if you tried to create a “low-level language without UB”. It just doesn't work.

Development quote of the week

Posted Dec 2, 2022 18:45 UTC (Fri) by anton (subscriber, #25547) [Link] (23 responses)

> Yes, but to say that some code is “too ugly to survive” you have to precisely define what kind of code you consider “too ugly”. And if you do that, you get a list of UBs for your language.

Where do you get "ugly" from? I wrote "brittle". Anyway, yes, a compiler maintainer for a language like C has to determine what changes to the compiler would break real, existing, tested programs, and then refrain from these changes. That's just the same as Linus Torvalds' "We don't break user space". And sure, there can be cases where a compiler or kernel maintainer misjudges an issue; then they will just have to revert the change. I have written about this at greater length.

> It's just not possible to create a low-level language without UB.

It may not be possible to create a low-level language that is fully defined, but it is easy to create one without undefined behaviour: just define a set of allowed behaviours for each of the operations in the programming language. For signed integer overflow (the issue at hand) it is trivial: common practice suggests the -fwrapv behaviour (so that would be fully defined).

Development quote of the week

Posted Dec 2, 2022 19:32 UTC (Fri) by khim (subscriber, #9252) [Link] (22 responses)

> Anyway, yes, a compiler maintainer for a language like C has to determine what changes to the compiler would break real, existing, tested programs, and then refrain from these changes.

Yes, they are doing that, too. The selected list of programs they don't break includes glibc, all the programs included in SPEC CPU and some others.

Unfortunately, doing what Linux is doing is not really feasible because there are just too many programs which are not even supposed to work with a random C compiler (and their developers don't plan to fix them).

Rust was born in a different era, thus it can use crater runs. But even then they haven't adopted Linus's rule; it's not really feasible. They contact developers and help them fix bugs in their code instead. Here you can see compatibility notes for crates broken by the Rust 1.64.0 release.

> I have written about this at greater length.

I saw that before. Like all O_PONIES proposals they end up in a trash can (like similar proposals for Linux which “doesn't break userspace”, lol) because they lack consensus.

It's completely unclear why your proposals should be accepted and not a bazillion other proposals, thus compiler developers do the sensible thing and just wait for the ISO C and ISO C++ committees to sort all that mess out.

The biggest issue, as I have already said, is the C community: as long as C developers don't plan to follow the rules, all attempts to change said rules won't help anyone.

It's as with an ages old adage: If users are made to understand that the system administrator's job is to make computers run, and not to make them happy, they can, in fact, be made happy most of the time. If users are allowed to believe that the system administrator's job is to make them happy, they can, in fact, never be made happy.

Rust users understand that the Rust compiler's job is to compile valid programs, and that makes it possible to keep them happy (most of the time). Because they can agree on the notion of a “valid program” (most of the time).

Some C users believe the compiler's job is to compile “programs written to the hardware”, which makes them perpetually unhappy because each and every developer has their own idea of what that means!

Yes, C/C++ compiler developers have also acted irresponsibly when they started using the lack of certain, quite controversial, UBs for optimizations (I, myself, have written about the craziness that happens with realloc), but, ultimately, when people don't even plan to read the rules and follow them… nothing would have worked, anyway.

Development quote of the week

Posted Dec 2, 2022 23:12 UTC (Fri) by Wol (subscriber, #4433) [Link] (2 responses)

> Unfortunately, doing what Linux is doing is not really feasible because there are just too many programs which are not even supposed to work with a random C compiler (and their developers don't plan to fix them).

> Rust was born in a different era, thus it can use crater runs. But even then they haven't adopted Linus's rule; it's not really feasible. They contact developers and help them fix bugs in their code instead. Here you can see compatibility notes for crates broken by the Rust 1.64.0 release.

And what do you do when USERS don't (or can't) upgrade to said fixed versions? That's why Linus' rule is so important. Although actually I'd prefer the mainframe mechanism - an IBM360 emulator running on an IBM370 emulator running on a ... and so on. So you can run programs from the early 90s in an early 90s userspace, etc etc.

> It's as with an ages old adage: If users are made to understand that the system administrator's job is to make computers run, and not to make them happy, they can, in fact, be made happy most of the time. If users are allowed to believe that the system administrator's job is to make them happy, they can, in fact, never be made happy.

Provided said sysadmin understands that the *computer's* job is to run the users' programs. Not necessarily the sysadmin, but you sometimes feel the IT department has lost sight of the reasons for having said computers ... :-)

Cheers,
Wol

Development quote of the week

Posted Dec 3, 2022 15:06 UTC (Sat) by khim (subscriber, #9252) [Link]

> And what do you do when USERS don't (or can't) upgrade to said fixed versions?

Say that you are very sorry? They decided to play alone, so they can resolve that problem themselves.

> That's why Linus' rule is so important.

It was important at the end of the last century, when computers weren't powerful enough to run a few VMs.

It's still important on some platforms where you can't run a VM.

But compilers never had such a problem: if users claim they cannot upgrade the compiler, 10 times out of 10 it's a social problem, not a technical one.

Social problems need social solutions and Rust's solution is very simple: if you don't want to upgrade and something breaks… you get to keep both pieces.

> Not necessarily the sysadmin, but you sometimes feel the IT department has lost sight of the reasons for having said computers ... :-)

Why should they know or even care about these reasons? They are doing their job; if they are doing it according to the contract, then it's pointless and useless to discuss these reasons.

When we had trouble with the air conditioning system in our office we had to buy CO₂ measuring equipment and create a monitoring system, and only after we had the numbers showing that the air conditioning system was unable to provide law-compliant service did they have to go back and rebuild it.

> Provided said sysadmin understand's that the *computer's* job is to run the users' programs.

Oh, sure, absolutely. But if it couldn't do that while the signed contract is fully fulfilled, then it's not the sysadmin's pain and not the sysadmin's job to do anything.

I can assure you: these air conditioning guys understood perfectly why we bought their system, yet as long as there was no proof that they were violating the law (and could, theoretically, be sued) they did nothing.

C compiler developers follow the same approach: we have a contract (called the C standard in the case of C, the Rust reference in the case of Rust); if we are not violating it then it's not our fault, and if you feel the contract is somehow bad or wrong, it must be renegotiated.

Why this approach, which is used literally everywhere else, doesn't work with C users, and why they feel entitled to have programs which violate the contract compiled “correctly” (in quotes because, if a program triggers UB, then by definition it's not clear what the “correct” result should be), is beyond me.

Development quote of the week

Posted Dec 3, 2022 15:19 UTC (Sat) by khim (subscriber, #9252) [Link]

> And what do you do when USERS don't (or can't) upgrade to said fixed versions?

One illustration of my point: the statistics for supported rustc versions show a sharp drop after version 1.55. Why? Was that version somehow special? No. It was just the version released one year ago. Most “serious” Rust developers don't want to create too much pain for their downstream users and support ancient, year-old versions of Rust (although many “less serious” developers only support a three-month-old Rust, which is why there is a secondary drop after version 1.63).

Want to use the Rust from the stable version of Debian (1.48 today) or the Rust from Debian sid (1.62 today)? Sorry, guys, these are too old; you had your time to upgrade, now it's your problem.

If you really need a certified Rust (which, by necessity, would be older) you may contact the ferrocene guys and they may provide you with support for these older versions.

Because you have created that problem, you have to pay for its resolution.

Development quote of the week

Posted Dec 3, 2022 17:38 UTC (Sat) by anton (subscriber, #25547) [Link] (18 responses)

> Unfortunately doing what Linux is doing is not, really, feasible because there are just too many programs which are not even supposed to work with random C compiler (and their developers don't plan to fix them).
It is just as feasible as for Linux. Programs that are not supposed to work with a particular C compiler are as irrelevant for the question of whether a newer version of that compiler breaks a program as programs that are not supposed to work on Linux are irrelevant for the question of whether a newer version of Linux breaks a Linux user space program (and there are probably more programs that work in Linux user space than are compiled by, say, gcc).

On my paper about backwards compatibility for C compilers:

> I saw that before. Like all O_PONIES proposals they end up in a trash can (like similar proposals for Linux which “doesn't break userspace”, lol) because they lack consensus.

> It's completely unclear why your proposals should be accepted and not a bazillion other proposals, thus compiler developers do the sensible thing and just wait for the ISO C and ISO C++ committees to sort all that mess out.

You may have seen it, but you failed to understand it. My paper does not propose a tightening of the C standard. Instead, it tells C compiler maintainers how they can change their compilers without breaking existing, working, tested programs. Such programs may be compiler-specific and architecture-specific (so beyond anything that a standard tries to address), but that's no reason to break them on the next version of the same compiler on the same architecture.

My paper does not tell anyone how to make a compiler-specific program portable between compilers, nor how to make an architecture-specific program portable between architectures. The mistake of efforts like Regehr's Friendly C is that they try to solve these problems, but that is not necessary for a friendly C compiler.

Development quote of the week

Posted Dec 3, 2022 19:14 UTC (Sat) by khim (subscriber, #9252) [Link] (4 responses)

> My paper does not propose a tightening of the C standard. Instead, it tells C compiler maintainers how they can change their compilers without breaking existing, working, tested programs.

What's the difference? If your proposal doesn't make it possible to explain what a “well-behaving” program is, then it's useless for compiler developers. If it does make it possible to establish that, then it may as well be a proposal for a change to the C standard.

> Such programs may be compiler-specific and architecture-specific (so beyond anything that a standard tries to address), but that's no reason to break them on the next version of the same compiler on the same architecture.

But nobody is trying to break any programs. They just follow the rules.

> The mistake of efforts like Regehr's Friendly C is that they try to solve these problems, but that is not necessary for a friendly C compiler.

You haven't proven that in your article. You haven't presented any “friendly C compiler”, you haven't proven it can do auto-vectorization, and, most importantly, you haven't proven that if you created such a compiler you would be able to make anyone happy.

I'm pretty sure that “writers to the hardware” would find a way to become angry even at your compiler, but it's hard to check because there is no compiler to look at.

At least Regehr tried to do something constructive. The only thing you do is say “O_PONIES are possible, trust me” in different words.

It's typical for proponents of O_PONIES: they never bother to explain how exactly their “friendly C compiler” would work, they never explain what they propose to put inside, they just repeatedly assert that creating a black box of some shape is possible.

That's not very constructive.

Development quote of the week

Posted Dec 4, 2022 18:31 UTC (Sun) by anton (subscriber, #25547) [Link] (3 responses)

> If your proposal doesn't make it possible to explain what a “well-behaving” program is, then it's useless for compiler developers.
My paper (not proposal) tells compiler maintainers not what a "well-behaving" program is (that's not at all its point, and it's not relevant for a friendly C compiler), but how a friendly C compiler behaves. That may be useless for the developer of an adversarial C compiler, true.
> [...] you haven't proven that if you created such a compiler you would be able to make anyone happy.
I leave it up to the reader to decide whether a backwards-compatible compiler would make them happier than an adversarial compiler. But the idea of a proof that a certain kind of compiler makes anybody happy is interesting. What methodology would you accept for the proof?

Development quote of the week

Posted Dec 4, 2022 18:57 UTC (Sun) by khim (subscriber, #9252) [Link] (2 responses)

> My paper (not proposal) tells compilers not what a "well behaving" program is (that's not at all its point, and it's not relevant for a friendly C compiler), but how a friendly C compiler behaves.

Yup. O_PONIES, O_PONIES, and more O_PONIES.

> That may be useless for the developer of an adversarial C compiler, true.

Inventing derogatory names for the very people you are trying to convince to do something for you is not a very good strategy.

> I leave it up to the reader to decide whether a backwards-compatible compiler would make them happier than an adversarial compiler.

I'm not asking whether someone would think such a compiler would make them happy, but whether it would actually make them happy. These are different things, you know.

It's as with an ages old adage: If users are made to understand that the system administrator's job is to make computers run, and not to make them happy, they can, in fact, be made happy most of the time. If users are allowed to believe that the system administrator's job is to make them happy, they can, in fact, never be made happy.

CompCert made a decent shot at what you are demanding, apparently, so why haven't you become happy with it, and why do you still try to convince the makers of “adversarial” compilers to do something (and do that by calling them childish names and, in general, trying to make sure they won't hear you)?

It's not as if it's just a problem of making people aware; CompCert is not a new thing.

> What methodology would you accept for the proof?

Easy: people tend to use what they like and don't use what they dislike. In 10 years no significant user of “adversarial” compilers has made that switch. They prefer to complain about unfair treatment yet continue to use gcc and clang.

Even Linus, who famously refuses to entertain the notion of using clang (which is funny if you consider the fact that certain facilities in the kernel are not compatible with GCC), hasn't made the switch. Why do you think that is?

Development quote of the week

Posted Dec 6, 2022 18:51 UTC (Tue) by anton (subscriber, #25547) [Link] (1 responses)

John Regehr wrote: "A sufficiently advanced compiler is indistinguishable from an adversary." I don't agree that this is an "advance", but if compiler maintainers take the attitude that you are advocating here, the compilers are certainly going to become more adversarial the more sophisticated they get.

I find it funny that someone who writes "O_PONIES" as frequently as you do is complaining about supposedly derogatory names.

Your CompCert link does not mention anything that sounds like what I describe. Instead, the headline feature is formal verification of the compiler. CompCert's description of the supported C dialect also makes no mention of any such ambitions.

As for why people have not made "the switch": the switch to what? CompCert, a research project that has few targets, does not fully support setjmp() and longjmp(), does not even talk about anything related to the issue we have been discussing here, and has deviations from the standard ABI of the platforms it supports?

GCC and Clang are apparently not adversarial enough for that; the approach seems to be that they try to be backwards-compatible by testing with a lot of real-world code out there (which is good), and mainly unleash the adversarial attitude when reacting to bug reports (not good). Also, the C language flags available (like -fwrapv) cover the most common issues; the remaining cases have not been painful enough to make people switch to a different C compiler (which one?).

Switching to a language with a more friendly compiler-maintainer attitude is a big job, and is not done easily. However, starting a new project is a good time to switch programming languages; now we just need a way to count how many new projects use C as their primary language now, compared to, say, 10 years ago.

Development quote of the week

Posted Dec 6, 2022 19:26 UTC (Tue) by khim (subscriber, #9252) [Link]

> I find it funny that someone who writes "O_PONIES" as frequently as you do is complaining about supposedly derogatory names.

These are fine if you don't plan to ask someone to do something for you. And it wasn't invented by me. It was, basically, invented on the LKML precisely when people started discussing the situation where applications expected specific semantics which were never guaranteed or promised and which new versions of the Linux kernel stopped providing. So much for 100% backward compatibility being a panacea for everything.

As you can guess, the end result was precisely and exactly like with C compilers: there was much anguish and lots of discussion, but in the end it was declared that, since these guarantees were never there and the code just happened to work by accident, app developers would have to rewrite their code if they want these guarantees.

> Compcert, a research project that has few targets and does not fully support setjmp() and longjmp(), and does not even talk about anything related to the issue we have been discussing here, and has deviations from the standard ABI of the platforms it supports?

So now you want full compliance with everything, too? Even more O_PONIES.

> now we just need a way to count how many new projects use C as its primary language now, compared to, say, 10 years ago.

Obviously that number would go down. C, basically, refused to advance when other languages did. C18 is very similar to C90 and almost indistinguishable from C99. I don't think it would be an interesting thing to look at; C was slowly turning into COBOL even without any tales of adversarial compilers.

More interesting would be the fate of C++. Use of C++ has been growing, not shrinking, recently. It would be interesting to see what happens to it.

Development quote of the week

Posted Dec 4, 2022 0:30 UTC (Sun) by mathstuf (subscriber, #69389) [Link] (12 responses)

> My paper does not tell anyone how to make a compiler-specific program portable between compilers, nor how to make an architecture-specific program portable between architectures. The mistake of efforts like Regehr's Friendly C is that they try to solve these problems, but that is not necessary for a friendly C compiler.

This basically leads down to an "IBSan" tool that detects implementation-defined behavior and signals on used-to-be-UB-but-is-now-arch-dependent constructs. Portability is a benefit of code, and if I know that my x86-compiled code is UB-free, it'll have the same behavior (but certainly not performance profile) on ShinyNewArch when it gets released a decade from now. I really don't want to have to go to every project and make sure that they CI-test my pet arch so I don't have live grenades lobbed my way on every update. I expect that Debian and NetBSD porters to obscure architectures appreciate that breaking these rules is just as "bad" on the "native" development platform(s) as it is on their target(s).

Now, if there were an in-language way (no, the preprocessor doesn't count) to say "this is targeting x86 because we're talking to an IME, give me native behavior", *then* I could see there being some new "undefined-if-portable behavior" bucket for these kinds of things to go into.

Development quote of the week

Posted Dec 4, 2022 17:46 UTC (Sun) by anton (subscriber, #25547) [Link] (11 responses)

Portability is valuable in many settings, and I think I am quite experienced in the area, with Gforth (which "breaks the rules" (in your terminology) a lot) usually working out of the box on new architectures and operating systems.

But my point is that a friendly compiler must also work for non-portable code: If it works with one version of the compiler on a particular platform, it must also work with a later version of that compiler on that platform.

Portability is an orthogonal requirement. Your hypothetical "IBSan" tool may be helpful, although I have my doubts, see below. In practice I test for portability by making test runs on as many different platforms as I can get my hands on. That's not 100% reliable, but it tends to work quite well.

I have my doubts about "IBSan" because it assumes one binary that should cover all portability variants. Real-world portable C programs often have lots of conditional compilation and stuff coming from configure to help with portability. If you write "the preprocessor doesn't count", it's obvious that you are not interested in C as it is used in the real world.

Development quote of the week

Posted Dec 4, 2022 18:27 UTC (Sun) by khim (subscriber, #9252) [Link] (3 responses)

> But my point is that a friendly compiler must also work for non-portable code: If it works with one version of the compiler on a particular platform, it must also work with a later version of that compiler on that platform.

Well, neither C, C++, nor Rust is even trying to be “friendly” by that definition (here's a recent example where Rust 1.65 doesn't accept source which Rust 1.64 accepted). That's fine with Rust users, yet apparently not fine with a small (but very vocal) group of C users.

That's basically why C and C++ are doomed: in their world, compiler users and compiler developers each talk in ultimatums which the other side is not willing to accept, which means the conflict can never be resolved.

I have seen so much talk about “friendly C” (O_PONIES, really) from C users, but I don't know even a single optimizing-compiler developer who subscribes to that idea.

Development quote of the week

Posted Dec 4, 2022 20:45 UTC (Sun) by pizza (subscriber, #46) [Link] (2 responses)

> Well, neither C, C++ or Rust are even trying to be “friendly” by that definition (here's recent example where Rust 1.65 doesn't accept source which Rust 1.64 accepted). That's fine with Rust users, yet, apparently not fine with small (but very vocal) group of C users.

That should read -- "that's apparently fine with the current Rust users".

C and C++ have several orders of magnitude more users than Rust. And those users (and compiler writers, and language stewards) are all trying to pull in their own, often incompatible, directions, collectively with literally billions of lines of code/baggage.

Rust, by virtue of being rather youthful, doesn't yet have a significant mass of users or use cases. There is only one implementation, produced by the same folks who define the language, and most of the users are still of the True Believer sort. All of this will inevitably change, and when it does, the needs of these various sub-groups will inevitably begin to diverge, and then the current "our way or the highway" language+implementation stewardship model will start failing.

If Rust does eventually succeed (ie ends up as a "legacy" language with many hundreds of millions of lines of code in wide deployment across tens of thousands (if not more) of organizations with divergent needs spanning a couple decades or so) then continuing to evolve it will run into many of the same sorts of problems that C and C++ face today -- ie problems of politics and governance.

I don't have any skin in this particular game, but I've been around long enough to see certain patterns, including the "we're smarter than those other guys so we'll be immune to their problems" hubris that *always* comes back to bite.

Development quote of the week

Posted Dec 4, 2022 21:53 UTC (Sun) by khim (subscriber, #9252) [Link] (1 responses)

> C and C++ have several orders of magnitude of users than Rust.

I don't think so. There is certainly a lot more existing code in C and C++, because they had several decades of headstart. As for the number of actual users it's hard to say for sure, but recent counts put it at around half of Go or a third of Kotlin (and about ten times less popular than JavaScript, which, you must admit, is definitely more popular than C, C++, or Rust).

> Rust, by virtue of being rather youthful, doesn't yet have a significant mass of users or use cases.

No. The important thing is not the fact that Rust is youthful, but the fact that Rust users are youthful. The fiasco that happened with C and C++ is mostly caused by old people who still remember times when it was possible to pretend that C is a “portable assembler”, to “program to the hardware”, and to expect that the compiler wouldn't screw you.

I have dealt with quite a few new grads and they accept the strange and bizarre rules of standard C/C++ without much complaint. For them it's just how this weird language works. Strange rules, but hey, rules are rules. And the same happens with Rust.

But in C, very often, they have to deal with these old “relax, I know what I'm doing, I'm older than C, I know how it works” guys. In Rust these guys, as I have said, are expelled from the community instead.

I don't think this would change. Even if the number of Rust developers is not ⅓ but closer to ⅒ of the number of C/C++ developers, it's pretty obvious that a C/C++-style disaster won't happen to Rust.

Planck's principle in action: An important scientific innovation rarely makes its way by gradually winning over and converting its opponents: it rarely happens that Saul becomes Paul. What does happen is that its opponents gradually die out, and that the growing generation is familiarized with the ideas from the beginning.

I met genuinely old software guys when I was in college, and what I observe today reminds me of their tales about how structured programming arrived. The exact same refusal to accept the new idea, the insistence that “proper” design is done with flowcharts on A1 (or A0 for complex cases) paper, and that all these newfangled things like stacks or loops just make development difficult, and so on and so forth.

The only big question is whether this time Rust (and Rust-like languages) will actually win, or whether history will repeat itself and, after the initial success of properly structured languages like Algol or Pascal, some half-baked newcomer will come along and take over (like C and C++ did).

Time will tell.

> I don't have any skin in this particular game, but I've been around long enough to see certain patterns, including the "we're smarter than those other guys so we'll be immune to their problems" hubris that *always* comes back to bite.

Nah. I don't think there's any chance of Rust making the same mistakes as C and C++ did, but it certainly can make entirely new ones.

E.g. its approach to async programming… I'm still not convinced it's the right one and won't lead to a dead end.

It's a tale as old as time.

Posted Dec 5, 2022 16:09 UTC (Mon) by smoogen (subscriber, #97) [Link]

Most of these threads seem to mirror conversations I remember from the late 1980s, when obfuscated C programs were big and many of the people who are now old argued the same things about what the compiler should or should not have allowed. It also mirrors the arguments between the 1978 and 1988 versions of K&R C. The fact that many compilers would allow some middle road between '78 and '88 until the early '00s just allowed the 'what does C mean?' arguments to go on even longer.

Development quote of the week

Posted Dec 5, 2022 12:02 UTC (Mon) by farnz (subscriber, #17727) [Link] (1 responses)

Given "If it works with one version of the compiler on a particular platform, it must also work with a later version of that compiler on that platform", what stops a compiler team deciding that supporting the behaviours of the previous version is too hard, and thus they're going to release a new compiler with the same CLI, that's not a "new version of the existing compiler"?

This is not a pure hypothetical - GCC 3 is not merely a "new version of GCC", it's a new compiler (the egcs fork of GCC) that was adopted by GCC as the clearly better outcome. If you set a rule like the one you propose, what stops GCC 21 from being a new compiler, not version 21 of GCC?

Development quote of the week

Posted Dec 6, 2022 22:46 UTC (Tue) by anton (subscriber, #25547) [Link]

> what stops a compiler team deciding that supporting the behaviours of the previous version is too hard, and thus they're going to release a new compiler with the same CLI, that's not a "new version of the existing compiler"?
Self-respect.

But actually that's somewhat the situation we have with gcc (and probably clang) now, only the maintainers don't say explicitly that their compilers are not backwards-compatible (they certainly have declared as invalid bug reports that clearly state that the code worked with earlier gcc versions), so some people think of switching to a newer version of gcc as an upgrade. It's not.

Even when starting with the same code base a compiler can be backwards-incompatible (as demonstrated by some gcc versions newer than 3), and with a different code base it can be compatible (but that's hard). And actually egcs was forked from the pre-gcc-2.8 code base.

Development quote of the week

Posted Dec 6, 2022 3:26 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (4 responses)

> If you write "the preprocessor doesn't count", it's obvious that you are not interested in C as it is used in the real world.

I'm interested in *improving* things so that the compiler can *see* "this code is x86-bound, feel free to optimize appropriately" via proper attributes rather than code-masking performed by the preprocessor. Flowing "this code was selected based on a check of `defined(__x86_64__)`" down to the optimizer is unlikely to be tenable given how complicated some preprocessor checks are (and *their abstractions* used in various libraries).

Development quote of the week

Posted Dec 6, 2022 22:55 UTC (Tue) by anton (subscriber, #25547) [Link] (3 responses)

There are some programs that don't need configure or the like to be portable, but many use a lot of stuff coming out of configure, and I am very pessimistic that we can get rid of conditional compilation.

When you write "this code is x86-bound, feel free to optimize appropriately", what optimization do you have in mind?

Development quote of the week

Posted Dec 6, 2022 23:03 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (2 responses)

> When you write "this code is x86-bound, feel free to optimize appropriately", what optimization do you have in mind?

I'm thinking that the optimizers could assume specific behavior for these things instead of considering them UB. For example, a left shift by too much can keep the same value (IIRC, ARM makes it 0). The programmer's *intent* that this is target-specific is what is important here. Bare C code doing such a shift is still in "this doesn't mean what you think it means, so I will assume that such Bad Things™ don't happen" territory.
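
(A minimal sketch of the target-dependent shift being described; the values are illustrative, and the point is only that the C standard makes the out-of-range shift UB even though every CPU does something specific:)

#include <stdio.h>

int main(void) {
    volatile unsigned shift = 32;   /* volatile so the compiler cannot fold the shift away */
    unsigned x = 1;
    /* UB in C because the shift count equals the type width; x86 typically
       prints 1 (the count is masked mod 32), 32-bit ARM typically prints 0. */
    printf("%u\n", x << shift);
    return 0;
}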

> There are some programs that don't need configure or the like to be portable, but many use a lot of stuff coming out of configure, and I am very pessimistic that we can get rid of conditional compilation.

I also think that conditional compilation is here to stay. However, it doesn't have to remain a code-blind copy/paste mechanism. With `constexpr` instead of preprocessor symbols, it is possible to have something like D's `static if` or Rust's `cfg!()` mechanisms to hide code during compilation. This allows the code to still be syntax-checked and formatted appropriately instead of being a wild west of sadness when some long-dormant branch with unbalanced curly braces finally gets activated.
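
(A minimal sketch of the "checked conditional compilation" idea in plain C, using an ordinary compile-time constant instead of the preprocessor; TARGET_IS_X86 is a hypothetical configuration constant. Both branches must parse and type-check, unlike an #if, while the dead branch is trivially folded away. D's `static if` and Rust's `cfg!()` go further and can hide whole items:)

#include <stdio.h>

enum { TARGET_IS_X86 = 1 };   /* hypothetical configuration constant */

int main(void) {
    if (TARGET_IS_X86) {
        puts("x86-specific path");   /* still syntax-checked even when the constant is 0 */
    } else {
        puts("generic path");
    }
    return 0;
}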

Development quote of the week

Posted Dec 7, 2022 0:29 UTC (Wed) by khim (subscriber, #9252) [Link]

> For example, left shift by too much can keep the same value (IIRC, ARM makes it 0).

It's a bit worse than that. ARM uses the low byte of the shift amount, which means that a shift by 128 is, indeed, zero, but a shift by 256 doesn't change anything (and doesn't touch the flags).

Development quote of the week

Posted Dec 7, 2022 0:54 UTC (Wed) by khim (subscriber, #9252) [Link]

> For example, left shift by too much can keep the same value (IIRC, ARM makes it 0).

Note, BTW, that the very first x86 CPU, the 8086 (and 8088), behaves like ARM here, not like all subsequent x86 CPUs.

That means Intel took advantage of this UB back when it was developing the 80186 forty years ago.

ARM also has a similar case: it has push and pop instructions which may push or pop from 1 to 16 registers as the result of one instruction. If you specify 0 registers then some implementations treat it as a NOP, some treat it as UNDEFINED, but it's also permitted to load a random set of registers from the stack, including the program counter!

So much for predictable hardware, huh? In fact, a document called ARMv8 AArch32 UNPREDICTABLE behaviours lists more than 50 of these.

Development quote of the week

Posted Dec 2, 2022 1:36 UTC (Fri) by khim (subscriber, #9252) [Link] (2 responses)

> You don't need to rely on programmers following some rules for proper optimizations.

Wrong. Let me state one thing which you need to understand before you can meaningfully continue this discussion:

100% OF C AND C++ COMPILERS 100% OF TIME 100% DEPEND ON THE ABSENCE OF UB IN THE COMPILED PROGRAM.

NO exception. Not even a single one.

Constructive proof.

  1. Take a program. Any program. Then change it in the following way:
  2. Add a SHA512 sum to main and calculate the SHA512 over the full body of the program (just take the address of main and add some offsets).
  3. Refuse to start if the SHA512 sum is different.

That's it. Now you have a program which doesn't just misbehave if the compiler tries to apply some optimization; it misbehaves if you change even one bit of the output.

For old compilers which don't produce reproducible builds you may calculate the SHA512 sum of only some parts of the program and still arrive at the same outcome.
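
(A minimal sketch of this construction, with FNV-1a standing in for SHA512 and a placeholder EXPECTED value; reading the bytes of main through a data pointer is exactly the kind of "program to the hardware" trick being discussed, and is itself UB in the C abstract machine:)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CODE_WINDOW 64          /* hypothetical number of bytes of main to hash */
#define EXPECTED 0xdeadbeefu    /* placeholder: record the real value from a known-good build */

static uint32_t fnv1a(const unsigned char *p, size_t n) {
    uint32_t h = 2166136261u;
    while (n--) { h ^= *p++; h *= 16777619u; }
    return h;
}

int main(void) {
    unsigned char code[CODE_WINDOW];
    /* well-defined "on the hardware" on common platforms, UB per the standard */
    memcpy(code, (const void *)(uintptr_t)&main, sizeof code);
    if (fnv1a(code, sizeof code) != EXPECTED) {
        fputs("binary changed, refusing to run\n", stderr);
        return 1;
    }
    puts("checksum matches");
    return 0;
}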

Now. You may say that's a stupid way to write a program. And I will agree. You may say it's brittle. And I will agree with that, too. You may say that nobody writes such programs. And you would be correct, again.

Yet the fact remains: if you demand that changes in the compiler must not break programs that used to work, while refusing to treat programs as invalid simply because they invoke UB, then no changes in the compiler would be possible. None at all.

Because if you declare such specially crafted programs, handcrafted just for your compiler, to be “valid”, then you couldn't change anything about the compiler's output for any program. If you declare them “invalid”, then congrats, you have just added one (or more) items to the list of UBs that a programmer is not allowed to rely on in valid programs.

You can create a language which makes such a trick impossible, with Coq, but that wouldn't be a language where you can “program for the hardware”. On the contrary, it would be a language so far removed from what C is that none of the C developers who complain that C/C++ compilers break their code would be able to use it.

It's fundamental: low-level languages without UB cannot exist. The only question is what exactly we call UB and what exactly we allow.

Development quote of the week

Posted Dec 3, 2022 0:04 UTC (Sat) by riking (subscriber, #95706) [Link] (1 responses)

> Add SHA512 sum to the main and calculate SHA512 from full body of the program (just take address of main and add some offsets).

found the UB

Development quote of the week

Posted Dec 3, 2022 15:31 UTC (Sat) by khim (subscriber, #9252) [Link]

Yes, but that's UB according to the specification of the C abstract machine! There is no UB “in the hardware” in that place!

My point was that if you “program to the hardware”, doing only things that are well-defined on the hardware level, and then demand “since that program worked once it must work after a compiler upgrade”, this demand makes it impossible to optimize some programs! Or even to change the compiler output in any way!

And if you add rules which say “yes, on the hardware level the result of that operation is defined, but you are not allowed to do that anyway”… you are creating a specification of an abstract machine!

Yes, it may be quite similar to the “real hardware” (essentially: what real hardware does, except for this small list of exceptions) but it's not “real hardware” anymore!

Development quote of the week

Posted Dec 1, 2022 23:52 UTC (Thu) by isilmendil (subscriber, #80522) [Link] (4 responses)

> But how compilers are supposed to optimize anything if their users are expected to break the rules?

The thing is, the user did not break the rules. The optimizer did when it optimized from the inside out instead of the outside in.

> What's the difference (from the compiler's POV) between this atrocity and your overflow checks?

For starters, your example only gives a semblance of a correct result when the optimizer is turned on, rather than off. The optimizer correctly recognises that the set() function has no side effects and can be eliminated. It does not eliminate add() because it has side effects.

If the same reasoning as above were applied, the optimizer presumably should remove the main method because any use of an indeterminate value is undefined behaviour (§6.2.4) and since UB will certainly not be invoked main() is obviously never called ;-)

Development quote of the week

Posted Dec 2, 2022 1:11 UTC (Fri) by khim (subscriber, #9252) [Link] (3 responses)

> The thing is, the user did not break the rules.

The user did. The rule is very obvious: for the program to behave correctly it shouldn't contain errors. Simple, no?

I think you could even explain that to kids in kindergarten, but for some reason C developers just can't accept it.

Granted, the version from the standard is pretty long-winded, but it's still the exact same rule:

> A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

> For starters, your example only gives a semblance of a correct result when the optimizer is turned on, rather than off.

WTF? Do you want to imply that variables don't live on the stack? Why does the register specifier exist if that's not so? Or do you want to imply that they are, somehow, initialized when you enter the function? That wouldn't be optimal now, would it? C never initialized variables, why should it start doing that now?

If we are coding for the hardware then the correct result for that program is well-defined, and it's 5. It's the exact same logic you were using to justify the use of signed overflow: the standard doesn't explain how that program works, but if you know how real hardware works then it's not hard to predict the outcome. And, indeed, most compilers (with optimizations disabled) produce precisely that outcome.
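
(The code under discussion is upthread and not reproduced here; the following is only a hedged reconstruction of the construct being described, with the function names set()/add() and the value 5 taken from this exchange and everything else illustrative:)

#include <stdio.h>

void set(int x) {
    int a;
    a = x;               /* a dead store as far as the abstract machine is concerned */
}

int add(int y) {
    int a;               /* never initialized: reading it is UB */
    return a + y;        /* "works" only if a happens to land in set()'s old stack slot */
}

int main(void) {
    set(2);
    printf("%d\n", add(3));   /* often prints 5 at -O0, purely by accident; anything at -O2 */
    return 0;
}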

> The optimizer correctly recognises that the set() function has no side effects and can be eliminated.

No. That's not how hardware works. Sorry. The variable a lives on the stack in two functions. One function sets it while the other one reads it.

Yes, it's how programs were written 60 years ago in Fortran IV, but so what? Hardware hasn't changed enough since then. Even code compiled with most compilers, with optimizations disabled, works.

> If the same reasoning as above were applied, the optimizer presumably should remove the main method because any use of an indeterminate value is undefined behaviour (§6.2.4) and since UB will certainly not be invoked main() is obviously never called ;-)

Sure. That would be a valid compilation of that program according to the spec. But notably not a valid result if we are programming for the hardware and not for the abstract machine.

If you are programming to the spec you may say that this function has no side effects (because the spec doesn't even mention the word “stack”), but on the hardware level there is a stack, there are registers, a calling convention, and lots of other perfectly documented things.

Any result other than 5 would be a miscompilation if we are allowed to write code “for the hardware”. From the hardware's point of view any store is a “side effect”; you cannot just go and remove them! Heck, early compilers even allowed you to mix assembler and C code and, of course, the assembler was supposed to access variables on the stack (not possible if you declare them register, of course). How can you say all that has disappeared?

The hardware on our desks is almost 100% compatible with what we had 40 years ago, when the IBM 5150 was introduced; why should programs suddenly start behaving differently?

Development quote of the week

Posted Dec 2, 2022 23:10 UTC (Fri) by isilmendil (subscriber, #80522) [Link] (2 responses)

As you quoted, if the *execution* contains an undefined operation, the program is ill-formed.

I think we can agree that the following little for-loop is not ill-formed:

int main() {
    int i;
    int array[8];
    for (i = 7; i > 0; i--) {
        array[i] = 0;
    }
    return array[0];
}

Using the same reasoning as with the original example, you can elide the loop condition. Violating the array bounds would be undefined behaviour, which means that the check against i>0 is always true...

>> For starters, your example only gives a semblance of a correct result when the optimizer is turned on, rather than off.

>WTF? Do you want to imply that variables don't live on stack? Why does register specifier exist if that's not so? Or do you want to imply that they are, somehow, initialized when you enter the function? That wouldn't be optimal now, would it? C never initialized variables, why should it start doing that now?

Of course not. You have two functions with two different local variables (both called "a"). The compiler using the same portion of the stack for both variables is sheer coincidence. C never initialized memory. You get initialized memory by sheer luck (i.e. because of the OS, not because the hardware or the C standard says so).

>If we are coding for the hardware then correct result for that program is well-defined and it's 5.

But we are not coding for the hardware, we are coding for an abstract machine model. That's the whole point of writing C, not assembler.

> It's the exact same logic you were using for justification of the use of signed overflow: standard doesn't explain how that program works, but if you know how real hardware works then it's not hard to predict the outcome. And, indeed, most compilers (with optimizations disabled) produce precisely that outcome.

As you say yourself, "it's not hard to predict the outcome". So you're not writing for the hardware, but for what you predict would be sensible choices for the compiler to make.

Development quote of the week

Posted Dec 3, 2022 13:32 UTC (Sat) by khim (subscriber, #9252) [Link]

I think you mixed up the sides at some point.

> But we are not coding for the hardware, we are coding for an abstract machine model.

Who are these “we”? Proponents of semi-portable C (like Wol; look at his complaints about how the compiler screws over people who actually do try to check that they are getting a valid result from the actual hardware). Yes? They most definitely don't program for “an abstract machine model”.

> That's the whole point of writing C, not assembler.

Unfortunately way too many C programmers from the “semi-portable” camp forcefully assert that C is just an easier way to write assembler. That's the main issue with C. The endless “verbal wars” of C users against C compiler developers are a consequence.

Just read what they wrote, please! It cannot be stated more clearly than in that dialogue, can it:

> If you want your program to be predictable in any way, you cannot allow any instances of UB to occur in it - including in any tests for UB.

In other words, C IS NO LONGER FIT FOR SYSTEMS PROGRAMMING.

I don't remember if it was you, but when someone said "you need to program to the C model, not the hardware model", that is the death knell of a systems programming language. If you're not allowed to program to the hardware, why the hell are you using it to control said hardware?

My example shows that it's not really possible to “program for the hardware” in C and thus use C as a [somewhat] “portable assembler”. If you don't subscribe to the idea that we program for the hardware and that C is a “portable assembler”, then the situation with that program is obvious.

There is no need to explain to me how compilers work; I know that, I even did some patches for GCC some years ago.

But I have yet to see any explanation from the guys in that “we program to the hardware” camp about my program. They either ignore me or say that I'm an idiot because I wrote that code (as if I were proposing to use it in production) and never explain what exactly is wrong with it. Because if you “program for the hardware”, that is impossible to do!

> So you're not writing for the hardware, but for what you predict would be sensible choices for the compiler to make.

True. But that's exactly the step that the “we program for the hardware” guys are explicitly refusing to take. Because if they accept it then, suddenly, all these cases where the compiler destroys their “perfectly sensible programs with UB” are no longer the compiler's fault, but their fault: they never had any promise that the compiler would produce working code, but some compilers did so by accident (note: on Turbo C 1.0 this program works just fine with all optimizations enabled); if a new compiler breaks that program then the onus is on them to fix it!

They couldn't accept that and won't accept it. Which makes dialogue impossible.

C compiler developers are not innocent either: when they find out that C users rely on certain things which they use in their programs 99% of the time, their answer is “the standard doesn't allow that” instead of sensible dialogue.

Basically we have two camps whose members talk not to each other but “past” each other in ultimatums. Agreement or compromise is impossible in such a dialogue.

That is what I wanted to show. The fact that I have yet to see any answer from the “we program for the hardware” guys about what we can do with this example (the most I have seen are various discussions about me and about how I'm an idiot for writing such code) is telling enough.

Development quote of the week

Posted Dec 5, 2022 12:01 UTC (Mon) by farnz (subscriber, #17727) [Link]

You're using different reasoning to the original example: the original example says that if a program execution path includes UB, then the program's behaviour is undefined (the meaning of UB), and thus any outcome is permissible, since all possible outcomes are within the defined behaviour of the program.

Your reasoning is slightly different; you're saying that if you change the program to potentially include UB when it did not before, then the program's behaviour is undefined. Which is also true, but is significantly different - the compiler is not permitted to add outcomes to the set of all possible outcomes of compiling and running a piece of code, only to remove outcomes and create a binary that chooses between allowed outcomes at runtime.

This is why UB is tricksy. By writing UB, you've written code that says "anything is an allowed outcome of compiling and executing this code", and then the optimizer, working on the as-if rule, is able to choose anything as the outcome. The code you present doesn't have UB, thus the allowed executions are constrained (to returning 0 from main), and the compiler is only allowed to generate code which returns 0, after possibly looping over 8 ints on the stack and setting them all to 0 (although, as this is not an "observable side-effect", it can elide that).

Development quote of the week

Posted Dec 1, 2022 18:52 UTC (Thu) by Karellen (subscriber, #67644) [Link] (4 responses)

OK, it doesn't say exactly that. What the C standard, §3.4.3, does say is:

3.4.3

  1. undefined behavior
    behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
  2. NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
  3. EXAMPLE An example of undefined behavior is the behavior on integer overflow

There are no requirements for what the compiled code can do in response to undefined behaviour. The compiler is allowed to output code that does absolutely anything if such a thing occurs, including (infamously) making demons fly out of your nose. It is also therefore allowed to assume that UB never happens, because if it did happen, it could have acted as if it didn't - because there are no requirements.

If you want your program to be predictable in any way, you cannot allow any instances of UB to occur in it - including in any tests for UB.

Development quote of the week

Posted Dec 1, 2022 22:47 UTC (Thu) by Wol (subscriber, #4433) [Link] (3 responses)

> If you want your program to be predictable in any way, you cannot allow any instances of UB to occur in it - including in any tests for UB.

In other words, C IS NO LONGER FIT FOR SYSTEMS PROGRAMMING.

I don't remember if it was you, but when someone said "you need to program to the C model, not the hardware model", that is the death knell of a systems programming language. If you're not allowed to program to the hardware, why the hell are you using it to control said hardware?

> undefined behavior
> behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

Note this explicitly includes "A NONPORTABLE CONSTRUCT", ie valid code that is not guaranteed to work across divergent systems. In practice, what the compiler writers are doing is saying "there is no such thing as nonportable constructs", despite the standard explicitly allowing for it. Unfortunately, a systems programming language absolutely requires non-portable constructs that can be trusted to work ...

Cheers,
Wol

Development quote of the week

Posted Dec 1, 2022 22:55 UTC (Thu) by pizza (subscriber, #46) [Link] (1 responses)

> I don't remember if it was you, but when someone said "you need to program to the C model, not the hardware model", that is the death knell of a systems programming language. If you're not allowed to program to the hardware, why the hell are you using it to control said hardware?

By that definition, the only acceptable programming language for systems programming is... bare assembly.

Development quote of the week

Posted Dec 2, 2022 0:41 UTC (Fri) by khim (subscriber, #9252) [Link]

> By that definition, the only acceptable programming language for systems programming is... bare assembly.

Indeed. The only theoretical way to create a low-level language without UB would be to employ Coq and ask developers to provide proofs of correctness for their programs.

Then there wouldn't be UB, because you would be, quite literally, forbidden from doing things which cause UB (in particular, no program could produce an integer overflow, because it would be a compile-time error to write such a program).

Surprisingly enough, this world is not as far from us as we may imagine; there are already attempts to apply such techniques to practical tasks.

But it's not even remotely close to “programming for the hardware” model.

Rust's solution to this dilemma is just to go and kick people who don't understand that out of the community.

But I don't think C and/or C++ can really do that.

Old hands feel they are entitled to be able to “program for the hardware”, and even people who understand why UBs are unavoidable are getting tired.

It's one thing to write a small amount of UB-capable code in Rust's unsafe blocks; it's an entirely different thing if you need to think about hundreds of possible UBs all the time, with each and every line of code you write.

Development quote of the week

Posted Dec 1, 2022 23:06 UTC (Thu) by Karellen (subscriber, #67644) [Link]

> C IS NO LONGER FIT FOR SYSTEMS PROGRAMMING.

Yeah, if by "no longer" you mean "since it was first standardised in 1989". Because that's when UB was defined and its semantics decided upon.

> In practice, what the compiler writers are doing is saying "there is no such thing as nonportable constructs", despite the standard explicitly allowing for it.

Not quite. In practice, (modern) compiler writers allow for non-portable constructs, but you have to explicitly opt into them. (If they make sense on the platform you are compiling for. Which they might not, because they're non-portable). Hence GCC's -fwrapv and -fno-delete-null-pointer-checks.
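
(A minimal sketch of the kind of opt-in Karellen mentions, here with -fwrapv: the wrap-around check below is UB by the standard, so an optimizer may fold it to "false", while under -fwrapv signed overflow is defined to wrap and the check survives. The function name is illustrative:)

#include <limits.h>
#include <stdio.h>

int will_overflow(int a) {
    return a + 1 < a;    /* UB without -fwrapv; well-defined wrapping with it */
}

int main(void) {
    printf("%d\n", will_overflow(INT_MAX));   /* typically 0 at -O2, 1 with -fwrapv */
    return 0;
}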

Development quote of the week

Posted Dec 1, 2022 19:18 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> "unsafe blocks can carry out undefined behaviour"

No, they cannot. There are precisely 5 things `unsafe` allows:

- dereference a raw pointer
- call `unsafe` functions
- implement an `unsafe` trait
- mutate a `static`
- access `union` fields

That's it. UB is *still not allowed*. You must still uphold *all* of the rules that Rust expects. The only difference is that the compiler doesn't check everything. `unsafe { let a = &mut t; let b = &mut t; }` is UB (including if you go through a raw pointer) and not a power that `unsafe` provides.

Development quote of the week

Posted Dec 1, 2022 17:47 UTC (Thu) by anton (subscriber, #25547) [Link] (20 responses)

> Compiler: "Wait...what? What part of "not allowed to perform" are you not getting?"
No, the compiler does not say that. If compilers reported undefined behaviour at compile time, programmers would just work around it (and for new code, they would not even grumble). But compilers don't do that; they just miscompile, usually without warning.

Development quote of the week

Posted Dec 1, 2022 19:01 UTC (Thu) by khim (subscriber, #9252) [Link] (17 responses)

> But compilers don't do that; they just miscompile, usually without warning.

But that's the whole point of UBs: the implementor's license not to catch certain program errors that are difficult to diagnose is a pretty clean explanation of what UBs are and who is responsible for avoiding them.

Development quote of the week

Posted Dec 1, 2022 23:05 UTC (Thu) by Wol (subscriber, #4433) [Link] (16 responses)

But the compiler clearly HAS caught undefined behaviour, and chosen to SILENTLY ALTER the program behaviour as a consequence. THAT'S A PSYCHOPATH.

If you're going to do that, AT LEAST TELL THE PROGRAMMER!!!

This is the behaviour that will destroy the C ecosystem - it can just no longer be trusted ...

Error trapping is enough of a nightmare without having to wrap the functional code in loads of traps and measures to prevent failure instead of just saying "if this fails then". (I'm spoilt - I'm used to languages where I don't get forced into artificial and un-natural constructs like "int32".)

Cheers,
Wol

Development quote of the week

Posted Dec 1, 2022 23:43 UTC (Thu) by khim (subscriber, #9252) [Link] (15 responses)

> But the compiler clearly HAS caught undefined behaviour

Nope. It assumed that UB doesn't happen and optimized the code to the best of its abilities. That's the compiler's job.

> THAT'S A PSYCHOPATH.

Nope. Just logic. The compiler doesn't have common sense; it neither hates you nor loves you. It's not capable of being psychopathic. It's too simple for that.

> If you're going to do that, AT LEAST TELL THE PROGRAMMER!!!

Why? Remember the rationale? Certain program errors that are difficult to diagnose.

It's the programmer's job to keep an eye out for UB, not the compiler's.

There are certain tools that may help (C/C++ have UBSan, Rust has Miri), but they are not part of the compiler, for obvious reasons.

> Error trapping is enough of a nightmare without having to wrap the functional code in loads of traps and measures to prevent failure instead of just saying "if this fails then"

Well… that's the whole point of the C standard: to let developers write portable code in a language with non-portable constructs, certain restrictions are placed on them. The compiler relies on these restrictions being followed.

Be it a C compiler or a Rust compiler.

> I'm spoilt - I'm used to languages where I don't get forced into artificial and un-natural constructs like "int32".

Well… Rust has a sublanguage which acts the way you want. But its unsafe sublanguage works on the same principles as C.

The list of undefined behaviors in Rust is shorter and simpler and, notably, integer overflow and type punning are not UB; but then, both clang and gcc have switches that turn off these UBs, so the difference is pretty superficial.
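
(For the type-punning case, a minimal sketch of the usual UB-free alternative: a pointer cast between unrelated types violates strict aliasing by default, while -fno-strict-aliasing, a union, or memcpy as below make the intent explicit. The function name is illustrative:)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t bits_of_float(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* the portable, UB-free way to pun types in C */
    return u;
}

int main(void) {
    printf("0x%08x\n", (unsigned)bits_of_float(1.0f));   /* 0x3f800000 on IEEE-754 targets */
    return 0;
}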

Pointer provenance is still very much a thing, and there are extra UBs that C doesn't have, too.

Development quote of the week

Posted Dec 5, 2022 10:54 UTC (Mon) by hummassa (subscriber, #307) [Link] (14 responses)

> Why? Remember the rationale? Certain program errors that are difficult to diagnose.

THIS. This has been the point that's missing from this whole thread, IMHO. If it *were* difficult to diagnose, the compiler could not see the UB and optimize based on it. So, Wol is right:

Once UB is detected, things are optimized out because of it, or other inferences are drawn, it's warning time. At that point, the compiler has a detailed internal rationale for what it will do. Dump that. Preferably with syntax highlighting and graphical lines overlaid on the code, DrRacket-style.

Again: this is not impossible. The compiler KNOWS it's going to optimize something out because of UB. MAYBE you had to enable some level of optimization for it to happen, but, OK, the warning/error message can depend on the `x` in `-Ox`. No problem there.

But, again, if the compiler is doing/omitting something because of UB, it SHOULD TELL THE PROGRAMMER. In painstaking richness of detail.

Development quote of the week

Posted Dec 5, 2022 12:05 UTC (Mon) by khim (subscriber, #9252) [Link] (12 responses)

THIS. This has been the point that's missing from this whole thread IMHO.

The whole discussion is centered around this whole point.

If it *was* difficult to diagnose, the compiler could not see the UB and optimize it out.

That's “common sense”. Computers don't do common sense. Not even so-called AI-technologies can do these (for now?). Compilers defintely can't. Actually machine-learning technologies are even worse then what current compilers are doing: at least with current technologies compiler couldn't understand what it does, but human can, with machine learning not even human can explain why that or this change was done.

Computers do logic. And compilers do logic. And logic says: A ⟹ B and B ⟹ A are very different things.

The fact that compiler assumed that program doesn't do certain “hard to diagnose” things doesn't mean that compiler understood what the program is doing and consciously thwarted the programmers intent. Compiler can not anything “consciously” because it doesn't have “conscience”.

Compilers couldn't understand anything. They are too simple for that. But they are now more then capable to do repeated applications of various simple tests and computers are fast enough that such applications happens quickly.

Once UB was detected, things were optimized out because of it, or other inferences were done, it's warning time.

Do you want to see 100 lines of warnings for every line of source code? Any source code, not just code with manifested UB? Because that's the only thing that compiler can easily offer. It can only list assumptions that very uses and most of them are trivial: it assumed that variables were initialized, that memory was allocated, that pointers are not dangning and that 200 other UBs haven't happened, too.

Preferentially, with color-syntax-highlighting and graphical lines overlayed on the code, DrRacket-style.

O_PONIES, O_PONIES, and more O_PONIES. Compiler couldn't help with what it doesn't have. These graphical lines which you observe in gcc output or rusrc is done by entirely different module specifically created to show them.

If you want to observe how compiler emploited lack of UB to do certain optimization - most compilers supports doing dumps of code after each pass.

The only trouble: O_PONIES lovers don't know how to read these dumps, they don't care about them and would never understand them (or, rather, if they would do some effort of trying to understand them they would stop being ignoramuses wishing for O_PONIES).

Again: this is not impossible.

Prove it. Patches are welcome. I'm sure if you would create something like that both clang and gcc maintainers would be grad to accept such patch.

The compiler KNOWS it's going to optimize something out because of UB.

Oh, sure. Compiler knows which UBs it assumed not to happen in each line of code. Between 10 and 50 possible imaginable UB per line of source. Do you want that dump? What do you plan to do with it?

But, again, if the compiler is doing/omitting something because of UB, it SHOULD TELL THE PROGRAMMER. In painstaikingly richness of details.

That is easily achievable. Just dump the AST or RTL, and markup about which UBs were assumed not to be there would be included. Thousands of them.

But programmers don't want that. They want ponies. Instead of a list of thousands of UBs which the compiler assumed are not in the code, they want the one UB which was assumed not to be there but which was actually put there by mistake or (even worse) on purpose.

That information the compiler doesn't have, and thus cannot dump.

Development quote of the week

Posted Dec 5, 2022 15:31 UTC (Mon) by Wol (subscriber, #4433) [Link] (11 responses)

> Do you want to see 100 lines of warnings for every line of source code? Any source code, not just code with manifested UB? Because that's the only thing that the compiler can easily offer. It can only list the assumptions that it uses, and most of them are trivial: it assumed that variables were initialized, that memory was allocated, that pointers are not dangling, and that 200 other UBs haven't happened, too.

THIS THIS THIS.

ALL the UBs you have listed here are BUGS. They are all "Don't Do It".

But things like signed integer arithmetic? That's a SchrodinUB - you don't know whether it's UB or not until you actually do it. And if you look at the laws of Pure Mathematics, it's not UB *at*all*.

There's a reason us grey beards get all pissed off at this. We understand logic. We can do Maths. You said your typical programmer of today loves rules. "Monkey See Monkey Do" rules. Like "Don't do signed integer arithmetic!". WHAT!?!? Why on earth does C even HAVE signed integer arithmetic if you're not supposed to use it!? Why doesn't the compiler just optimise it ALL away if it's guaranteed to screw up at the most inopportune moments?

When I design software, I do NOT set out to solve the specific problem in front of me. At work at the moment I am trying to unravel a horrendous mess where everybody has solved their own problem, and we have loads of programs all interacting in weird and unexpected ways. (Compounded by people not using common sense and blaming me for "the output looks wrong", so when I investigate my reaction is "what is this garbage you are putting in?")

When I program I always - even if not consciously - do a truth table. What are the possible inputs? What are the possible outputs? Even if my problem is a small subset of the output, I DON'T close off the input space, I just mark it "here be dragons" (or, usually, more work for later), and trap or block it so I can deal with it later.

The majority of your examples have a simple, single value truth table. It's called "Garbage In, Garbage Out". I have absolutely no problem with that being UB.

But signed integer arithmetic? It appears to collapse to a simple two-way table - "Valid In, Valid Out", or "Valid In, Garbage Out". To treat that the same as a simple "Garbage In, Garbage Out" is crazy. (I've ignored the other two states, because "Garbage In" is indistinguishable from "Valid In".) A scenario where all possible inputs are valid, is not the same at all as a scenario where all possible inputs are invalid.

Cheers,
Wol

Development quote of the week

Posted Dec 5, 2022 17:03 UTC (Mon) by khim (subscriber, #9252) [Link] (10 responses)

> And if you look at the laws of Pure Mathematics, it's not UB *at*all*.

But the laws of Pure Mathematics don't apply to computers unless you make them apply. The C language makes the pretty natural assumption that you would do so and would avoid computations which may overflow.

> Like "Don't do signed integer arithmetic!"

No, that's not the rule. The rule is “do the bounds checking, damn it”. Yes, this includes signed integer arithmetic, too.
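
To make that concrete, here is a hedged sketch of what “do the bounds checking” can look like for signed addition (the function names are made up for illustration): either test the operands before the operation, since a post-hoc `a + b < a` test is itself the UB you are trying to avoid, or use the checked-arithmetic builtins that gcc and clang already provide:

#include <limits.h>
#include <stdbool.h>

/* Check the operands *before* adding, so no UB operation is ever executed. */
bool add_checked(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;               /* would overflow: report it instead */
    *out = a + b;
    return true;
}

/* Same idea using the gcc/clang builtin, which reports overflow directly. */
bool add_checked_builtin(int a, int b, int *out) {
    return !__builtin_add_overflow(a, b, out);
}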

> Why doesn't the compiler just optimise it ALL away if it's guaranteed to screw up at the most inopportune moments?

Because that would be against the rules, obviously.

> Even if my problem is a small subset of the output, I DON'T close off the input space, I just mark it "here be dragons" (or, usually, more work for later), and trap or block it so I can deal with it later.

For that to work you need some kind of isolation layer between the hardware and the abstract machine that the language uses.

When you try to make that layer thin enough (like C does) you invariably end up with “bad code which may work for some time and then blow up in your face” (like using an already-freed object, which may work for some time until, one unlucky day, an interrupt comes at a different time and that object gets overwritten). And then you have to describe that subset and avoid it.

It seems that you understand that. But somehow you refuse to accept that the bounds of that subset are, to some degree, arbitrary and that there needs to be a common description of where they are!

That common description is called the C standard. If you think that some things don't belong to that subset, use the switch and make them defined for you, personally (compilers already support that; clang and gcc provide plenty of such switches… there are many such controversial UBs and thus many such switches), or change the standard (yes, it's harder than swearing on various forums, but it has at least a theoretical chance of working).

> It appears to collapse to a simple two-way table - "Valid In, Valid Out", or "Valid In, Garbage Out".

C doesn't support such things. It's either “Valid In, Valid Out” (like with unsigned numbers) or “Garbage In, Garbage Out” (like with signed ones). “Valid In, Garbage Out” is not in the cards, sorry.

> A scenario where all possible inputs are valid, is not the same at all as a scenario where all possible inputs are invalid.

Yes. And it's one of the reasons why the C standard classifies all inputs either as “Valid In, Valid Out” (fully-specified things, unspecified things and implementation-defined things are all in that class) or “Garbage In, Garbage Out”. The “Valid In, Garbage Out” case shouldn't happen in a valid C program.

Even with unsigned numbers, division by zero is not allowed (it's UB), because that would be a “Valid In, Garbage Out” case. You have to ensure that the divisor is non-zero. Only, people rarely object to that, for some reason. Maybe because it's defined as “Garbage In, Garbage Out” in math, not just in the C standard.
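
For the division case that means the familiar test-first pattern, sketched below with an illustrative (made-up) helper name:

/* Guard the divisor first: b == 0 would be UB even for unsigned types. */
unsigned div_or(unsigned a, unsigned b, unsigned fallback) {
    return (b != 0) ? a / b : fallback;
}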

The fact that you never need to support that “Valid In, Garbage Out” case is something they really wanted to have, because it simplifies reasoning for compilers and wasn't supposed to be a big deal for humans.

That was, probably, a miscalculation the ANSI committee made 30 years ago. But I don't think it's easy to change: neither the flags for clang/gcc nor Rust do that. Instead of declaring integer overflow UB, Rust makes it IB and carefully describes all possible results in its reference, and the same goes for the clang/gcc switches.

The rule that there is no such thing as “Valid In, Garbage Out” (there are only “Valid In, Valid Out” and “Garbage In, Garbage Out”) stays unchallenged.

Development quote of the week

Posted Dec 6, 2022 10:10 UTC (Tue) by farnz (subscriber, #17727) [Link] (9 responses)

The laws of Pure Mathematics do apply to computers, very much so. C is, for example, doing its operations on a finite set with abelian groups for multiplication and addition, and not on a field (since some operations such as multiplication and addition are not defined for all pairs of inputs from the set).

The fun comes in when people assume (falsely) that + and * in C are the field operators for school arithmetic. They're not - they're still set operations, and they still form an abelian group, but they're not field operators, and they don't behave like field operators in all cases.

Development quote of the week

Posted Dec 6, 2022 14:03 UTC (Tue) by khim (subscriber, #9252) [Link] (8 responses)

Only unsigned types are defined to work like you describe. Signed types work differently.

Development quote of the week

Posted Dec 6, 2022 14:11 UTC (Tue) by farnz (subscriber, #17727) [Link] (7 responses)

I'm sorry, you have me completely confused. I described a system where addition and multiplication are only defined for a subset of pairs of inputs, which is pretty much signed types in C - if I have a 32 bit integer type, then 2**17 * 2**17 is UB in C and in Pure Mathematics for a type that's not a field, but is an abelian group, whereas 4 * 3 is defined, and is the same as 3 * 4 (the property that makes this an abelian group).

Unsigned types are fields, I thought? The gap between what I described and a field is that in a field, all input values for addition and multiplication result in an element of the set, whereas in what I described, some input values for addition and multiplication result in undefined behaviour.

Development quote of the week

Posted Dec 6, 2022 14:49 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

> I described a system where addition and multiplication are only defined for a subset of pairs of inputs

No. I haven't read that in your description and, worse, a group is defined like this:

A group is a set 𝐆 together with a binary operation on 𝐆, here denoted "·", that combines any two elements 𝓪 and 𝓫 to form an element of 𝐆, denoted 𝓪·𝓫, such that the following three requirements, known as group axioms, are satisfied: associativity, identity element, inverse element.

You have started talking about group axioms without checking whether the result is defined for all elements of the supposed group. Unsigned numbers are a group. Signed numbers are not a group.

> Unsigned types are fields, I thought?

No, they don't form a field. uint32_t(65536) doesn't have an inverse element (that is: you can not find any 𝔁 to make uint32_t(𝔁)*uint32_t(65536)=uint32_t(1)), it's a ring. Also a well defined mathematical object, but not a field.

> The gap between what I described and a field is that in a field, all input values for addition and multiplication result in an element of the set, whereas in what I described, some input values for addition and multiplication result in undefined behaviour.

Semigroup, group, abelian group, ring… they all call for operations to be defined for all elements.

But the field definition includes UB: the reciprocal is not defined for zero, and only for zero (which is why unsigned numbers in C are not a field: there, plenty of non-zero elements have no reciprocal either).

That, somehow, never changes the opinions of O_PONIES lovers: they usually continue to support the notion that it doesn't mean anything (or, worse, start claiming that the fact that it's UB to divide by zero even for unsigned numbers is also a problem… even if it rarely bites anyone, because most people test for zero, but not for overflows).

> whereas in what I described, some input values for addition and multiplication result in undefined behaviour

If something doesn't follow the definition then you can't say that it's “𝓧, but not 𝓧”. Well… technically you can say that, but it's a tiny bit pointless: math doesn't work like that; you can't use theorems proven for 𝓧 with “𝓧, but not 𝓧”.

You may say that unsigned numbers properly model the ring ℤ₂ᴺ, while signed numbers correctly model ℤ if (and only if) you ensure that operations on them don't overflow; but they don't form a semigroup, group, abelian group, or ring.

I guess the idea was that people may need ℤ₂ᴺ, but since ℤ is not feasible to provide in a low-level language like C (even Python has trouble with ℤ), asking developers to use signed numbers carefully and to avoid them when the result is not guaranteed wouldn't be too problematic. After all, they have to remember not to divide by zero in math; why couldn't they be taught not to overflow in C?

As you can see, some people don't like that idea (to put it mildly).

Development quote of the week

Posted Dec 6, 2022 14:56 UTC (Tue) by farnz (subscriber, #17727) [Link]

I made mistakes in the terminology (I'd forgotten the details of groups) - but I stand by my claim that it's mathematically reasonable to have a finite set, F, with binary operations + and * defined only for a subset of pairs of elements of F, and that maths does not always claim that because I have a finite set F, with a binary operation +, it must follow all the axioms of arithmetic.

Rather, C's signed integers are a finite set that behaves like the integers in some respects but not others, and results that hold for mathematical integers do not necessarily hold for C signed integers; but this doesn't mean that C signed integers are not a mathematically acceptable set - so reaching for mathematics and saying "but maths says!" just highlights that you're not particularly good at mathematics, and expect C's signed integers to behave like a set you're familiar with from school, rather than like the sort of sets that you might start discussing in a bachelor's degree.

Development quote of the week

Posted Dec 6, 2022 15:54 UTC (Tue) by anselm (subscriber, #2796) [Link] (1 responses)

> The gap between what I described and a field is that in a field, all input values for addition and multiplication result in an element of the set, whereas in what I described, some input values for addition and multiplication result in undefined behaviour.

ISTR that even in a group, the result of the operation on two members of the group must always be a member of the group. “Undefined behaviour” where applying the operation on two members of the group doesn't yield a result within the group isn't allowed. Nor does it work to exclude, e.g., 2**17 from the set on the grounds that 2**17 * 2**17 isn't in the set; you need it in the set so that (non-overflowing) operations like 2 * 2 ** 16 can have a result. (Fields get that property from the fact that they're basically built on groups.)

Development quote of the week

Posted Dec 6, 2022 16:37 UTC (Tue) by farnz (subscriber, #17727) [Link]

Yes - that's my error. I stand by the claim that C signed integers are a perfectly reasonable mathematical object, and that their behaviour is happily described by pure mathematics.

But I have to retract my claim that they're a group - they're still a finite set, with several operations that give them superficial similarity to the set of integers, but they're not a group.

Development quote of the week

Posted Dec 6, 2022 21:09 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> Unsigned types are fields, I thought?

They are rings, not fields (no multiplicative inverse, since we're not looking at floats).

Signed integers are not even rings, because at least one element doesn't have an inverse (e.g. there's -32768 but not 32768).

Development quote of the week

Posted Dec 6, 2022 22:33 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

That's not how rings behave. -32768 is its own inverse. In fact it's the exact same ring which you get with unsigned numbers (most CPUs these days don't even bother to provide separate operations for signed and unsigned numbers; they only distinguish signed from unsigned when you compare them, but a ring only deals with addition, multiplication and, like any ring, equality… ordering is not part of it).
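
A tiny sketch of that point, using the unsigned bit pattern so the wraparound is well defined:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* 0x8000 is the 16-bit pattern of -32768; in the ring of 16-bit values
     * it is its own additive inverse: the sum wraps around to 0. */
    uint16_t x = 0x8000;
    printf("%u\n", (uint16_t)(x + x));   /* prints 0 */
    return 0;
}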

That may be one reason why the C standard defines signed operations the way it does: if you want a ring, you already have one (unsigned numbers); why would you need another, identical one?

Development quote of the week

Posted Dec 7, 2022 17:31 UTC (Wed) by kleptog (subscriber, #1183) [Link]

Unsigned integers in C form a well-defined ring: ℤ₂¹⁶ (or whatever your bit size is), the integers modulo a power of two. This gives you the commutative, associative and distributive properties. The operations don't preserve ordering, though.

Signed integers on computers don't form a ring, not in a useful sense anyway. For example 10*19=-66 on an 8-bit machine. You can argue that if you look at the bit patterns it's the same as unsigned integers, but that's just relabelling. What I think the C designers wanted was to treat the signed integers as a subset of the mathematical integers, because that gives you a meaningful way to deal with ordered comparisons.

For example, the implication a<b => 2a<2b is *not true* in the unsigned integers. This is however super annoying for a compiler optimiser. The standard compiler transformation in C where you change loop counters to count by the width of the object you're looping over is strictly not allowed unless you can prove the loop bound won't overflow. The compiler often can't prove it, so that would leave a simple optimisation on the floor.

Unless you assume overflow is UB.
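
A minimal sketch of the loop transformation described above, assuming a 64-bit target where `int` is narrower than a pointer (the `sum` function is only an illustration):

/* Because i is signed and overflow is UB, the compiler may promote i to a
 * 64-bit induction variable (or stride a pointer by sizeof(double) directly)
 * without first proving that n stays small enough; with an unsigned counter
 * it would have to preserve the possibility of wraparound. */
double sum(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}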

"a+b > a if b is positive" is false in the unsigned integers. But a very useful thing to be able to assume when optimising code. The classic example of code that changes with -fwrapv:

int f(int i) {
    return i + 1 > i;
}

Is optimised to true without it.

Pointer arithmetic is another case: you don't want to treat it as a full-on ring either, because pointer comparison is so useful (can we assume &a[b] is always later in memory than &a[c] if b>c?). So the compiler simply assumes overflow cannot happen.

I don't think the compiler writers are out to get you. But if you want signed integers to act 100% like CPU registers, you do need to set the compiler flag.

Development quote of the week

Posted Dec 5, 2022 12:17 UTC (Mon) by farnz (subscriber, #17727) [Link]

But the compiler does not "see the UB". It applies a set of rules that are valid assuming no UB, and comes to a result.

I bring you back round to the "proof" that 1 = 0. The outcome is clearly absurd - 1 = 0 is false - and yet I have a "proof" here that 1 = 0.

The reason it all goes to pieces is that I've applied a set of rules, all of which are valid. Every step in that "proof" is a valid application of the symbolic manipulation rules of algebra - none of the applied rules are wrong, nor was the manipulation incorrect. The reason I come up with an absurd result is that I didn't detect that, for the specific rule "divide both sides by (x-y)", it's invalid to apply that rule if x = y.

UB problems in compilation are similar. I've got a set of symbolic manipulations (often a very large set) that I can apply to your program. Some of those manipulations are only valid if the program does not contain UB (just as in my "proof", I used a manipulation rule that is only valid if x != y). At no point does the compiler "detect" UB - it's manipulating the program on the assumption that it doesn't contain UB, and the consequence of a chain of manipulations (all of them valid in the absence of UB) is a bad outcome if UB is present.

Development quote of the week

Posted Dec 1, 2022 19:29 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (1 responses)

Note that some make sense in context (others less so, I agree). However, some come about because of an *intersection* of optimizations. For example, an `inline` function that checks for `NULL` before dereferencing can be removed if it is inlined in a place where the pointer is used before the inlined code begins. Same with macros (which are just bad inline functions with sadder "scoping").
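
A hedged sketch of that intersection (the struct and names are hypothetical): after inlining, the dereference on the first line of `use` lets the compiler conclude the pointer is non-NULL, so the helper's defensive check can be dropped.

struct obj { int refcount; int flags; };

/* Defensive helper: checks for NULL before dereferencing. */
static inline int get_flags(const struct obj *o) {
    if (o == NULL)
        return 0;
    return o->flags;
}

int use(const struct obj *o) {
    int n = o->refcount;       /* o dereferenced first: compiler infers o != NULL... */
    return n + get_flags(o);   /* ...so the inlined NULL check can be elided */
}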

Development quote of the week

Posted Dec 1, 2022 22:58 UTC (Thu) by anton (subscriber, #25547) [Link]

Actually, if you compile for an environment where dereferencing a NULL pointer traps, it's a proper optimization to compile the same code for
x=*p; if (p!=NULL) { X; }
and for
x=*p; X;
In that environment you don't need to assume that undefined behaviour does not happen; instead, you know that p!=NULL after the first statement.

However, the Linux kernel is not such an environment, so you cannot apply this optimization there (unless you want to assume that undefined behaviour does not happen, an assumption the Linux kernel does not make).

