Development quote of the week
Development quote of the week
Posted Dec 1, 2022 12:08 UTC (Thu) by Wol (subscriber, #4433)In reply to: Development quote of the week by anton
Parent article: Development quote of the week
Which bites far too many sensible programmers, because they TEST for undefined behaviour, and THE COMPILER DELETES THE TESTS BECAUSE IT'S UNDEFINED.
That's the lunacy of this approach ...
Cheers,
Wol
Posted Dec 1, 2022 13:55 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (9 responses)
If C were not "so portable" to allow it to work without forced pessimization on machines which don't matter to *you*, but mattered in the past, then maybe "well, it's two's complement, so there is an expected behavior that can be tested for" would allow that condition to stick around. FWIW, C23 does consider signed integer representations other than two's complement "obsolete" (for now at least; it is not yet finalized). So if you're using C23, then the condition can stick around (of course, you lose performant addition on one's complement machines, but by writing such a check you already didn't care, presumably).
So I see these three solutions for UB that continue to cause problems because developers think "on the hardware" instead of "on the C abstract machine":
- accept the loss of portability by moving behaviors from UB to implementation-defined
The former is probably fine with most things, but once you get into the differences between currently-popular architectures (e.g., supporting unaligned reads and such), expect to hear about how C is "leaving performance on the floor" for certain use cases.
Posted Dec 1, 2022 17:03 UTC (Thu)
by stevie-oh (subscriber, #130795)
[Link] (1 responses)
I've been following the whole Undefined Behavior compilers-vs-developers madness for a few years now, and as I understand it, a lot of these checks are superfluous and _can_ be elided -- because they were part of boilerplate code that was inlined, and _in the inlined context_, the compiler can prove that the check is unnecessary.
Sure, the example given in the linked article doesn't have any inlined code. But the code responsible for eliding the check has no way to know that, because it's not looking at your .c file; it's looking at an abstract model of the code that's already been run through a few optimization passes.
> better ways to say "I know I am using assumption X about the hardware here"
For the case of signed integer overflow -- and, really, the biggest source of astonishment for people seems to be signed integer overflow -- you can use *unsigned* integers. Overflow those all you want; those are specified to wrap exactly the way most people expect.
Though, personally, I'd just use
```
as my overflow check.
Posted Dec 2, 2022 14:15 UTC (Fri)
by renox (guest, #23785)
[Link]
In C yes, in Zig unsigned are exactly like signed integers: overflows are detected in debug mode but are undefined behaviors in release mode.
Posted Dec 1, 2022 17:38 UTC (Thu)
by anton (subscriber, #25547)
[Link] (6 responses)
And GCC targets only two's-complement machines. While portability to dinosaurs probably was the original reason for undefining signed overflow, I have read from fans of undefined behaviour who deny this and claim that this undefined behaviour was introduced for performance.
Posted Dec 1, 2022 18:57 UTC (Thu)
by khim (subscriber, #9252)
[Link] (4 responses)
You have a 1998-vintage Alpha but don't have a modern x86 device? That's crazy hard to believe. AC flag was added to it more than 30 years ago yet it's still supported. And IBM still makes new mainframes, Linux still supports them.
Posted Dec 1, 2022 22:41 UTC (Thu)
by anton (subscriber, #25547)
[Link]
Concerning movaps and friends, yes, they are useful to let fans of undefined behaviour miscompile previously working programs, but they are not useful for testing whether all memory accesses in a program are aligned. That's because they don't work for general-purpose registers, they don't work for scalars, they check for the wrong alignment (not the type alignment, but the container alignment; total brain damage), and there is no guaranteed way to get gcc to use it.
Concerning byte order: Unfortunately IBM has failed to make a mainframe available to me, so I still have to turn on a vintage machine for big-endian testing (I typically use a 2004-vintage iBook G4 for that). In the long run, few people will test software on big-endian machines, bit rot will lead to many programs not working on them, and that will be a problem for IBM's mainframe business; or maybe not: IBM's salesforce will undoubtedly convince the customers that it's a sign of their elite status that not every old program works on their mainframe.
Meanwhile, the OpenPower part of IBM has seen the signs of the time and has switched the architecture to little-endian.
Posted Dec 1, 2022 23:11 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (1 responses)
That's not what he said. Or are you saying that x86 REQUIRES data alignment? (I don't know x86 architecture - I really don't know.)
Cheers,
Posted Dec 2, 2022 0:09 UTC (Fri)
by khim (subscriber, #9252)
[Link]
Yes. Starting from 80486 you have a bit in EFLAGS register which you can use to switch between two modes. ARM got such switch a bit later (only it was getting to it from the opposite direction: old versions required data alignment while it's optional on latest CPUs).
Posted Dec 2, 2022 18:59 UTC (Fri)
by ejr (subscriber, #51652)
[Link]
It's fun, really.
Posted Dec 1, 2022 19:36 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
I can't. However, C still thought it important enough to not specify until the standard due next year.
> I have read from fans of undefined behaviour who deny this and claim that this undefined behaviour was introduced for performance.
I'm not a *fan* of UB. I dislike it as much as anyone here. But I'm also not saying that *my* thoughts on what any given UB should mean should be enshrined either. Note that some optimizations allowed by UB can improve performance. That doesn't justify UB's existence, but should serve as a reminder for some of the reasons UB continues to persist.
> If I want to test on big-endian machines, I also have to turn on some old, normally turned-off machine; little-endian has won, too.
We deploy to (new) machines that are big-endian, so that is certainly not universal.
Posted Dec 1, 2022 16:36 UTC (Thu)
by Karellen (subscriber, #67644)
[Link] (80 responses)
How is that lunacy? Compiler: "Your program is not allowed to perform UB, e.g. by exhibiting overflow." Developer: "Oh, OK then. I'll try an operation and check if it overflowed - but sneakily - to make sure I don't overflow." Compiler: "Wait...what? What part of "not allowed to perform" are you not getting?" You can write tests to see if an operation might perform UB - but you have to do that in a way that doesn't itself perform UB. For example, by checking (a > 0 && b > 0 && a < MAX_INT / b) instead of (a > 0 && b > 0 && a * b > 0).
Posted Dec 1, 2022 17:14 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (58 responses)
The problem is the compiler writers are mixing up pure mathematics and applied mathematics. And it screws over people who actually do try and check that they are getting a pure result from applied hardware.
If I multiply 100 by 100, that is pure maths, and does not overflow. If I'm using 8-bit hardware then the result is not going to be pure. The point of a computer is to compute, and ruthlessly applying applied maths constraints to a program trying to do a pure calculation (and corrupting the results in consequence) is simply a willy-waving exercise in producing meaningless AND DANGEROUS results.
As others have pointed out, this insanity will simply lead to C becoming - rather rapidly - obsolete. Rust, with its emphasis on "there is no such thing as undefined behaviour in safe code" and "unsafe blocks can carry out undefined behaviour" will supplant it in no time flat if people lose confidence in the ability of C compilers to produce code that can be trusted. It's all very well to expect old code to keep working, but if you can't recompile and trust the result, you will RUN away, not walk...
I doubt it's a co-incidence that Rust code has far more invariants than C, and therefore far fewer opportunities for mad computer scientists to produce unexpected results from apparently correct programs ...
Cheers,
Posted Dec 1, 2022 18:41 UTC (Thu)
by khim (subscriber, #9252)
[Link] (51 responses)
In the rationale for C99: Simply by definition undefined behavior is a program error (albeit the one which may not be diagnosed by a compiler). Computers don't deal with “sane” or “insane”. They don't have organs capable of dealing with these notions. The most you can hope for is logical (and then, only if there are no bugs in the compiler). Of course no. Rust is solving things in an extremely classic way:
The biggest difference between C/C++ lang and Rust land are the attitude. In Rust land, no matter how talented and capable you are… if you don't follow the rules you would be ostracized and expelled. That's very harsh, but as complete meltdown of C and C++ communities have shown that's the only way you can cooperate. Think about it: why the heck all these discussions about UBs in C/C++ invariably, 100% of time, come to that stupid “integer overflow” case? Both gcc and clang have options which make compilers respect integer overflow rules, shouldn't that close that discussion for good? But no, C and C++ developers, mostly old ones, who have grown in times where Undocumented DOS and Undocumented Windows were a goto books… they just couldn't accept that. For them it's not about concrete UB defined or not defined in the standard, for them it's symbolic: Am I a trembling creature or have I the right?? It's not about particular language rule, it's about that ability to ignore the rules (when justified, of course, yada, yada). But how compilers are supposed to optimize anything if their users are expected to break the rules? What's the difference (from the compiler's POV) between this atrocity and your overflow checks? They both rely on the knowledge of the world that compiler doesn't have, they both work on most major compilers if you turn off the optimizations… why one of them should be treated differently from the other? Rust answer is, basically “rules are rules, if program breaks them then anything is possible… but if you broke then by mistake we would help you to follow them”. Look on the story with Ipv4Addr, Ipv6Addr, SocketAddrV4 and SocketAddrV6: yes,
Posted Dec 1, 2022 22:40 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (16 responses)
Except that doesn't say what you say it says. It says it is not the responsibility of the compiler to catch undefined behaviour. That should NOT give the compiler licence to trash the programmer's attempt to catch UB.
The compiler should emit a warning, an error, or ignore it. The compiler shouldn't ASSUME UB (or the lack of it) and use it to alter the program logic.
Cheers,
Posted Dec 1, 2022 23:58 UTC (Thu)
by khim (subscriber, #9252)
[Link] (14 responses)
And that's precisely what happens. Compiler does hundreds of tiny steps and each one is supposed to be valid in the absence of UB. There are no code which does this: oooh, I seee there's UB ⇒ let's punish developer in a certain elaborate way Instead It has hundreds of small, simple steps where each and every one assumes there are no UB. because there are no UB ⇒ we can do this After dozen of hundred such transformations you get the result which you are observing. That's precisely the point of having an UB in the language definition. List of UBs are, in reality, list of things that normal program should never do. Trying to access memory which was freed. Dereferencing UBs tracking are responsibility of the developer, not the compiler. That's why they exist in the first place. Consider classic Rust's UB: core::hint::unreachable_unchecked. It's gcc counterpart is called __builtin_unreachable. The whole point of these functions is to provide hint to the compiler: this condition can never be true. It would be pretty silly to “catch and report” something each time when you use such function, isn't it?
Posted Dec 2, 2022 14:50 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (13 responses)
And further, a lot of the consequences of UB are not "compiler detected UB and did something silly". It's instead "compiler assumed no UB, and did something that's sensible for all defined behaviour, but not in the face of UB".
For example, a compiler can transform:
into an equivalent of the following:
This is useful - we've removed an operation from the code completely - but to do this optimization, we've had to assume that input += 50 does not overflow. The compiler can make this assumption in C because signed integer overflow is undefined; if we were working with unsigned integers, then this is not a valid assumption to make, and the optimized form would look like:
This is a relatively obvious case - but once the compiler is doing range analysis or other complex optimizations, the chain of reasoning that the compiler's going through can become incredibly opaque to humans, because the compiler is "just" looking through a large number of rules that might apply, choosing the ones that make the code potentially better, and repeating until it reaches a point where it's "good enough".
Hence my comparison to "proofs" that 1 = 0; these proofs tend to have lots of unnecessary steps, all of which are reasonable, and all of which exist to hide the step that goes wrong. That particular set of proof steps is a lot of rubbish meant to hide that the core of the proof is the undefined operation "divide by zero", and that if I continue reasoning validly after doing a divide by zero, I get a nonsense output. But, if you look at that proof and ignore the step that includes "we divide both sides by (x-y)", every other step is a reasonable and sane operation to perform. It's just that the outcome is complete nonsense.
Posted Dec 2, 2022 15:35 UTC (Fri)
by khim (subscriber, #9252)
[Link] (2 responses)
Yes. And it's important thing to understand that compilers always acted in that way. You couldn't have low-level language without UBs but as long as compilers were only capable of doing one or two or three simple passes (you can download early release of Microsoft Pascal and see that these passes were, literally, separate binaries because there was not enough memory to even keep all the program in memory at once) an illusion that you are writing code for the hardware, not for the abstract spec looked sensible: compliers just weren't too powerful to break that illusion for most programs and for most programmers (they were always able to break special, hand-crafted to be broken, programs, but since what they were able to do was so simple it was easy to reason about whether they would break certain invalid-by-spec-valid-on-the-hardware program). As compilers become more-and-more powerful they became capable of inventing longer and longer proofs and chances of making developers who misunderstood what they are actually doing upset have grown higher and higher. Tragedy happened around 15-20 years ago when computers have passed 1GHz barrier, have gotten gigabyte of memory (or more) and compilers have acquired the ability to do really long and complicated analysis. At this point rules, collections of UB, made back in the end of XX century stopped being adequate and it became obvious that C/C++ model is describing that cloud-cuckoo land… but no one wanted to change anything: C users continued in their delusion and C compiler developers insisted that it doesn't matter that C standard describes crazy cloud-cuckoo land, it's a standard, you have to follow it! This inevitably ended in today's meltdown because when people talk past each other with ultimatums and where both sides are just simple not capable of doing what the other side demands… the crazy fiasco what we have today was quite inevitable and unavoidable, I'm afraid. Compliers can only ever work “to the spec”, the illusion that you may “program for the hardware” is just an illusion, it's not possible in any language except for assembler (and it's not even possible to do that in assembler, just look on how retro-computing enthusiasts are looking on finding a specific version of assembler which may “correctly” compile old versions of MS DOS or MS BASIC). And C users could not follow the existing C language spec because it definitely describes cloud-cuckoo land, there are just too many UBs, some of them are really crazy, and because, in C, they can surface in almost every line of code… it's just beyond human capabilties, practically speaking. Compromise was needed, but, unfortunately, it's obvious that it was also impossible to achieve that without change to the community… and that change could only ever happen with a switch to another language. Sad story, but, ultimately, the one which was almost inevitable.
Posted Dec 2, 2022 16:08 UTC (Fri)
by farnz (subscriber, #17727)
[Link] (1 responses)
While I mostly agree, I don't think it's the 1 GHz barrier that made the difference; rather, I think it was the invention of SSA form in the last 1980s.
Before SSA form was invented, the amount of information you had to track to do anything much more complex than peephole optimizations exploded very quickly as the program grew, making it space-prohibitive to do many of the complex optimizations modern compilers do. SSA form tamed that explosion of state - and thus enabled compilers to do much larger-scale reasoning about the program's meaning.
Posted Dec 2, 2022 16:37 UTC (Fri)
by khim (subscriber, #9252)
[Link]
Both were important. Without SSA you couldn't do that many optimizations, but without extremely fast CPUs and loads of memory you couldn't do them either! Note how SSA was invented in the 1980th, yet most compliers haven't employed it till many years later. GCC got it in 2005. GCC started to use the fact signed integers never overflow in GCC 2.95, years before it adopted SSA.
Posted Dec 3, 2022 0:32 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (7 responses)
But this is where it DOES get silly.
In Maths, "int * int = int". Valid input gives valid output.
In Computerese, "int32 * int32 = ...?". Maths tells us that given valid input, we can NOT guarantee valid output. And this is where Khim's statement below, that we cannot have a low-level compiler without UB, is wrong. invalid != UB.
If the maths tells us that valid input cannot guarantee valid output, then the spec should define the consequences, even if it dodges the issue by saying "whatever the hardware does". We don't need a spec to define what mathematics is. We need a spec to define the consequences of using an applied finite range rather than a pure infinite range.
That then means we get a *sensible* model that says "where pure maths works, use the mathematical model. Where finite-range applied maths works, use the finite model. And when all else fails, whatever the hardware does." Yes it's going to break a lot of those optimisations where the compiler assumes the input has been sanitised, but the compiler should not be assuming! If the code says "uint64 = uint32 * uint32", then the compiler can quite happily assume that, provided the operands are promoted BEFORE the multiplication, the result will be valid. That's where those optimisations would be valuable!
And that's where it looks like Rust scores. The compiler DOESN'T assume the input has been sanitised - it checks. Where it can't check, the programmer has to use "unsafe" to confirm he's checked. And while I haven't programmed any Rust, it doesn't look like it's that hard to bypass Rust's safety checks - but the language does force you to *explicitly* bypass them. So you can assume, by default, that all Rust code is safe.
The problem with all these fancy C optimisations is that they take advantage of code that the programmer didn't even realise was unsafe.
Cheers,
Posted Dec 3, 2022 16:00 UTC (Sat)
by khim (subscriber, #9252)
[Link] (6 responses)
Can we stop beating that dead horse? Because it become crazy silly at this point. Existing compilers give you a way to remove that UB from the specification (with -fno-strict-overflow), but that just changes the list of UBs in the abstract machine specification. It doesn't change the fact that you are still programming for the abstract machine not “for the hardware”. Only if you wouldn't write the magical Yes. But some developers don't buy that argument and program in unsafe Rust like you claim it's Ok to program in C. Some of them are even quite capable and knowledgeable. That's how we end up in that situation: if nothing else works they are just expelled from the ecosystem. C couldn't afford that solution. Sure, but the only way to solve that problem is to change the abstract machine specification! And attempts to do that have failed spectacularly. Maybe at one point is was a surprise to the developers that signed overflow is undefined in C and C++, but in last 10 years so many articles and lectures talked so much about these that I have no idea where you have to live not to know that. And if you know and still want to use that mode (like Linux kernel does) there are -fno-strict-overflow. Yet C users assert that it's not enough and they have right for that magical
Posted Dec 3, 2022 19:23 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (5 responses)
You are making an ABSOLUTELY CLASSIC blunder here. You are assuming everyone else is the same as you. You are making an ASS of U and ME.
Just because you may be a C guru doesn't mean I am. Just because I'm a damn good programmer, and I do sometimes program in C, doesn't mean I'm a particularly good C programmer.
Where would I have gone to to hear these lectures? How would I have found the time to go to them? And who would have pointed these articles out to me? Bear in mind I am probably older than the C language itself - I was probably in primary school when it was born. I probably did know that signed overflow was undefined in C, but I certainly didn't know that I knew. In a couple of months time I will have forgotten.
But the BIG problem is "the principle of least surprise". If I try to multiply two numbers such that the result is too big to fit, I expect to get the wrong answer. That's my fault, fair enough. What I do NOT expect (and what you are telling me is a perfectly legitimate thing to happen) is for the compiler to delete the calculation because it saw that it could go wrong!
(Or in this particular case, the test for it going wrong, because the compiler assumed (despite maths to the contrary) that the calculation would never go wrong.)
Imho, if turning UB into defined behaviour breaks certain optimisations, tough. If the C model contradicts standard pure maths, then the C model should define the consequences of doing so - valid INPUT should not result in totally unexpected output.
At the end of the day, I really don't want to have to prefix every int32 multiplication with "if ln2(int1) + ln2(int2) > 32 then don't do the calculation". Is that really an optimisation?
Cheers,
Posted Dec 3, 2022 20:54 UTC (Sat)
by khim (subscriber, #9252)
[Link] (4 responses)
Well, you found time to argue about that topic on LWN, somehow, why couldn't you find time to read or watch some news about language which you are using in your work? Sure. Was a surprise for me the first time I hit that issue, too. But then I have found explanations, options (which gave me choices) and other things… which made it clear for me that complaining about that would just be stupid… you, on the other hand, want to change the whole world to make sure it would follow your ideas. Why? Do you really hope it'll work? What does it change? In fact it makes the whole thing much more bizzare: usually it's young people who want to change the whole world when they discover that it behaves quite unlikely their ideals. Why situation with C is different, I wonder? Newgrads quickly accept the fact that shouldn't rely on behavior in case of overflow while “old hats” continue with their rants which leads us literally nowhere. Where can I download your compiler which follows that ideology? Would be interesting to compare it to others. And it's still not clear why would you turn UB into defined behavior (and how would you define it for cases like double-free or data races). No, but it's your choice to write such strange and unusual code. There are many other ways to do caculations which don't overflow (e.g. you can cast But your argument can be easily turned around: if you don't even read about what happens in C compilers land, if you don't watch videos, if you don't ever talk to C compiler developers… then why do you expect that they would follow the rules you invented and not the rules that have been clearly documented? You don't even try to discuss with anyone whether change to the rule, you just ASSUME that everyone would find your rules so obvious and correct that they would be implemented in place of what's actually documented and implemented! Forgive me, but while I, too, don't like that rule of C, but assumption that someone would, you know, read the documentation is a bit more natural that someone would read the documentation and then ignore it! This principle belongs to the “discussion about rules” stage. And I would say that C users are as much to blame as C compiler developers. Instead of discussing the rules they invariably start discussing this magical Not gonna happen, sorry. Not even Rust changes that. It only splits code into two parts, one where you have to play by rules and one where compiler ensures that you play by rules, but if you violate the rules all bets are still off.
Posted Dec 4, 2022 0:59 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (3 responses)
> Well, you found time to argue about that topic on LWN, somehow, why couldn't you find time to read or watch some news about language which you are using in your work?
Huh? Unfortunately, the main language I use at work is Visual Basic. And I'm hoping that I will soon be able to use DataBASIC again. (I did say I don't use C ...)
I'm trying to get used to C again, for a personal (at the moment) project, but the less of that I do the better ... for exactly these reasons.
> And it's still not clear why would you turn UB into defined behavior (and how would you define it for cases like double-free or data races).
Well, if the compiler detects a double free, it should really halt with an error. A double free is wrong, period. And if it doesn't detect it, there's not a lot it can do about it. Likewise data races.
Thing is, those two problems are (in my mind at least) clearly different. A double free is a nonsensical operation. A data race will produce nonsense results. The basic operation is ALWAYS wrong.
But multiplication? You can't say "don't do signed multiplication" (or maybe you can), but to say "sometimes it will work, sometimes it won't" is asking for trouble ...
> > At the end of the day, I really don't want to have to prefix every int32 multiplication with "if ln2(int1) + ln2(int2) > 32 then don't do the calculation". Is that really an optimisation?
> No, but it's your choice to write such strange and unusual code. There are many other ways to do caculations which don't overflow (e.g. you can cast int to unsigned int, multiply two values then cast them back… or can just use functions given to you specifically to check for overflow).
Are you saying that "int x; x = (uint) y * (uint) z" will result in the same answer you would expect from "x = y * z"? I guess 2s complement says it does, but the resulting code is unreadable to a programmer not very familiar with C. (I do things like "6 append 0 = 60", or "a = a + (b==1)", but there's no way I'd expect someone unfamiliar with DataBASIC to have a clue what they do ...)
Or calling out to functions? Written, I guess, in assembly to avoid using UB themselves, but not very efficient as a result ...
I get where you're coming from, the C spec is incomplete and inconsistent, and this is the inevitable consequence, but I really don't see why C needs UB. "If you do something not covered by the C model, then you get the hardware model". If you multiply two int32s, OF COURSE the result will sometimes not fit back in an int32 - that's basic maths. The compiler shouldn't assume the laws of basic mathematics don't apply, and use that as an excuse to do something totally unexpected.
The trouble with all this is that the cost of getting round all the optimisation cock-ups is - for many people - a lot higher than the gains such optimisations produce. This is why I hate SQL - the cognitive load of coping with its failings is far higher than the benefits it gives ... it's just not worth it! (I did say I work with VBA - Excel may be a crap database, but it's easier to use than Oracle/SQL ... :-)
Cheers,
Posted Dec 4, 2022 4:36 UTC (Sun)
by khim (subscriber, #9252)
[Link] (2 responses)
Ooh. More
There are some tools which may help you detect such cases (like MSAN) but they make your code significantly slower. Still detection is not guaranteed and nobody knows how to make them faster (many tried, there are no success). It's just hard problem. But in C specification they are described by by almost exactly same words and in exact same places. Literally, it's not a figure if speech. The pointer argument to the free or realloc function does not match a pointer earlier returned by a memory management function, or the space has been deallocated by a call to free or realloc is here. If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined is here. Note that there are no mention of unsigned numbers in that text, but that's because standard defines result of operations on Well… if the result is not mathematically defined or not in the range of representable values for its type then such thing is classified exactly and precisely the same way. Does it mean that C standard doesn't know the notion of “do what the machine is doing”? Of course not! Such things are also carefully collected and enumerated. That's just different list, list of “implementations specified behaviors”. You can find the follows item there: The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type — here. That means that you can perfectly safely convert from 4294967295 from Well… that's what the standard says and that's what compilers expect by default. Their writers know that some people would like to write code with overflow, though, and they even specifically include -fno-strict-overflow option for folks who don't like this particular aspect of a C standard. Linux kernel uses it. You, too, may use it. If you want. What more would you expect and why? Yes. Maybe, but it's guaranteed to work. Modular arithmetic never overflows and conversion from unsigned to signed is defined, more-or-less as do what the machine is doing (more precisely: it's declared that it's implementation have to pick one way to do that and stick to it and on most modern architectures and compilers that just means pure reinterpretations of bits). No. These functions are by definition the most efficient way to do what you want. Their names doesn't include word Built-in for nothing. Compiler provides them, nit even the standard library. And of course, it would be silly to provide them in compiler and then implement them inefficiently. Usually they are turned into one machine instruction which produces the result and also flag which is the tested to jump to error handler. On the contrary: it's specification is very detailed and mostly consistent (there are some issues with pointer provenance, but these are irrelevant here). Standard clearly marks cases which have to “do what the hardware is doing” and cases “which should never happen in the well-written program”. It gives them different names and even, helpfully, includes long lists which include behaviors of class #1 and class #2. The only tiny problems for people like you (who want to program without actually reading the documentation): signed overflow is grouped with “double free”, “dangling pointers” and other hard-to-detect-yet-need-to-avoid cases. And, importantly, not grouped with unsigned overflow (according to C standard unsigned overflow doesn't exist and couldn't exist). Because “hardware model” is not enough to describe how program is supposed to behave when faced with hard-to-detect errors like race conditions or other such crazy things. Race conditions are really interesting because they include undefined behavior on hardware level! Here's an interesting article on subject (and you can find tons of articled on LWN which explains how Linux deal with these… it's not easy to say the least). And when hardware itself behaves unpredictably (usually in sane way, but occasionally in way that layman wouldn't expect at all) the only reasonable approach would be what C, C++ and Rust are doing: describe rules which developer have to follow to create a working program. Anything else wouldn't, really, work. Rust have an advantage of limiting these issues to small percent of program code, but while in Compiler does what standard explains in such precision. It even provides you with a switch to do math as you want and special functions for efficient overflow detection. Again: what more do you expect from it and why?
Posted Dec 4, 2022 20:26 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (1 responses)
Not to treat two completely different categories of programming cockup as if they were identical?
> Well… if the result is not mathematically defined or not in the range of representable values for its type then such thing is classified exactly and precisely the same way.
If the result is not mathematically defined, then I'd quite happily put data races and double frees in that category. Don't do them!
"Not in the range of representable values for its type" is the direct consequence of something that *is* mathematically defined - eg multiplication. If the input is valid, the operation IS mathematically defined, and it's just the output doesn't fit, then how can that be the same as something that MUST have random and potentially disastrous consequences if you're unlucky/stupid enough to do it?
Cheers,
Posted Dec 4, 2022 21:06 UTC (Sun)
by khim (subscriber, #9252)
[Link]
They are treated differently. One has a command-line option to turn it into implementation-specific behavior, one doesn't. One can be caught with UBSAN easily, one requires MSAN and a tiny bit of luck. But yes, by default they are treated identically because that's what standard says. Compiler would ensure there are random and potentially disastrous consequences, don't worry 🤣.
Posted Dec 5, 2022 10:48 UTC (Mon)
by geert (subscriber, #98403)
[Link] (1 responses)
450?
Posted Dec 5, 2022 10:53 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
Possibly, yes.
Shows the value of automated optimization, in that my hand-optimization may well be buggy.
Posted Dec 2, 2022 0:13 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link]
Posted Dec 1, 2022 23:36 UTC (Thu)
by anton (subscriber, #25547)
[Link] (28 responses)
Concerning your code that relies on the value of uninitialized variables, this code is probably pretty brittle during maintenance, so programmers avoid this kind of usage. By contrast, gcc does the same thing without -fwrapv as with -fwrapv in nearly all cases, and only rarely deletes a condition or sign extension, so programmers learn that signed integers in GCC perform modulo arithmetic; and some make use of this knowledge. You may think that they are wrong, but that does not make them avoid undefined behaviour.
One other difference is that gcc-10 -Wall produces several warnings for your code, while it silently miscompiles the example by Clément Bœsch.
Posted Dec 2, 2022 0:27 UTC (Fri)
by khim (subscriber, #9252)
[Link] (24 responses)
Yes, but to say that some code is “too ugly to survive” you have to precisely define what kind of code you consider “too ugly”. And if you would do that you would get list of UBs for your language. It's just not possible to create a low-level language without UB. One theoretical alternative is to use Coq to provide proofs of correctness, but I'm 100% sure that people who complain that C is now, suddenly, unusable, would be unable to use Coq. Do you really think it's problematic to fix? Here we go. If you don't introduce UB and don't make certain programs “too ugly to survive” then no changes in the compiler can be done. Ever. Demo program is obvious: just turn pointer to function into Literally any change in the compiler would be forbidden if you would try to create “low-level language without UB”. It just doesn't work.
Posted Dec 2, 2022 18:45 UTC (Fri)
by anton (subscriber, #25547)
[Link] (23 responses)
Posted Dec 2, 2022 19:32 UTC (Fri)
by khim (subscriber, #9252)
[Link] (22 responses)
Yes, they are doing that, too. The selected list of programs they don't break includes glibc, all the programs included in SPEC CPU and some others. Unfortunately doing what Linux is doing is not, really, feasible because there are just too many programs which are not even supposed to work with random C compiler (and their developers don't plan to fix them). Rust was born in a different era thus it can use crater run. But even then they haven't adopted Linus rule, it's not really feasible. They contact developers and help them to fix bugs in their code, instead Here you can see compatibility notes for crates broken by Rust 1.64.0 release. I saw that before. Like all O_PONIES proposals they end up in a trash can (like similar proposals for Linux which “doesn't break userspace”, lol) because they lack consensus. It's completely not clear why your proposals should be accepted and not bazillion other proposals thus compiler developers do a sensible thing and just wait for the ISO C and ISO C++ committee to sort all that mess out. The biggest issue, as I have already said, is C community: as long as C developers don't plan to follow the rules all attempts to change said rules wouldn't help anyone. It's as with an ages old adage: If users are made to understand that the system administrator's job is to make computers run, and not to make them happy, they can, in fact, be made happy most of the time. If users are allowed to believe that the system administrator's job is to make them happy, they can, in fact, never be made happy. Rust users understand that Rust compilers's job is to compile valid programs and that makes it possible to keep them happy (most of the time). Because they can agree on the notion of “valid program” (most of the time). Some C users believe compiler's job is to compile “programs written to the hardware” which makes them perpetually unhappy because each and every developer have it's own idea of what that means! Yes, C/C++ compiler developer have also acted irresponsibly when they started using lack of certain, quite controversial, UBs for optimizations (I, myself, have wrote about crazyness that happens to
Posted Dec 2, 2022 23:12 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (2 responses)
> Rust was born in a different era thus it can use crater run. But even then they haven't adopted Linus rule, it's not really feasible. They contact developers and help them to fix bugs in their code, instead Here you can see compatibility notes for crates broken by Rust 1.64.0 release.
And what do you do when USERS don't (or can't) upgrade to said fixed versions? That's why Linus' rule is so important. Although actually I'd prefer the mainframe mechanism - an IBM360 emulator running on an IBM370 emulator running on a ... and so on. So you can run programs from the early 90s in an early 90s userspace, etc etc.
> It's as with an ages old adage: If users are made to understand that the system administrator's job is to make computers run, and not to make them happy, they can, in fact, be made happy most of the time. If users are allowed to believe that the system administrator's job is to make them happy, they can, in fact, never be made happy.
Provided said sysadmin understand's that the *computer's* job is to run the users' programs. Not necessarily the sysadmin, but you sometimes feel the IT department has lost sight of the reasons for having said computers ... :-)
Cheers,
Posted Dec 3, 2022 15:06 UTC (Sat)
by khim (subscriber, #9252)
[Link]
Say that you are very sorry? They have decided to play alone, then can resolve that problem. It was important in the end of last century when computers weren't powerful enough to run few VMs. It's still important on some platforms where you can't run VM. But compilers never had such a problem: if users claim they can not upgrade the compiler 10 times out of 10 it's social problem, not technical. Social problems need social solutions and Rust's solution is very simple: if you don't want to upgrade and something breaks… you get to keep both pieces. Why should they know or even care about these reasons? They are doing their job, if they are doing their job according to the contract, then it's pointless and useless to discuss these reasons. When we had trouble with our air conditioning system in office we had to buy CO₂ measuring equipment, create a monitoring system, and only after we had the numbers which showed that air conditioning system was unable to provide law-compliant service they had to go back and rebuild this system. Oh, sure, absolutely. But if it couldn't do that while signed contract is fully fullfilled then it's not sysadmin's pain and not sysadmin's job to do anything. I can assure you: these air conditioning guys understood why we bought their system perfectly, yet as long as there was no proof that they are violating the law (and can be sued, theoretically) they did nothing. C compiler developers follow the same approach: we have a contract (called C standard in case of C, Rust reference in case of Rust), if we are not violating it then it's not our fault, if you feel that contract is, somehow, bad or wrong, it must be renegotiated. Why this approach, which is used literally everywhere else, doesn't work with C users, and why they feel entitled to have programs which are violating the contract have to be compiled “correctly” (in quotes because of if program triggers UB then, by definition, it's not clear what the “correct” result should be) is beyond me.
Posted Dec 3, 2022 15:19 UTC (Sat)
by khim (subscriber, #9252)
[Link]
One illustration for my point: statistic for supported If you want to use Rust from stable version of Debian (1.48 today) or Rust from Debian sid (1.62 today)? Sorry, guys, these are too old, you had your time for upgrade, now it's your problem. If you really need a certified Rust (which, by necessity, would be older) you may contact ferrocene guys and they may provide your with support for these older versions. Because you have created that problem, thus you have to pay for it's resolution.
Posted Dec 3, 2022 17:38 UTC (Sat)
by anton (subscriber, #25547)
[Link] (18 responses)
On my paper about backwards compatibility for C compilers:
It's completely not clear why your proposals should be accepted and not bazillion other proposals thus compiler developers do a sensible thing and just wait for the ISO C and ISO C++ committee to sort all that mess out.
My paper does not tell anyone how to make a compiler-specific program portable between compilers, nor how to make an architecture-specific program portable between architectures. The mistake of efforts like Regehr's Friendly C is that they try to solve these problems, but that is not necessary for a friendly C compiler.
Posted Dec 3, 2022 19:14 UTC (Sat)
by khim (subscriber, #9252)
[Link] (4 responses)
What's the difference? If your proposal doesn't make it possible to explain what the “well behaving” program is then it's useless for the compiler developers. If it does make it possible to establish that then it may as well be a proposal for a change in a C standard. But nobody tries to break any programs. They just follow the rules. You haven't proven that in your article. You haven't presented any “friendly C compiler”, you haven't proven it can do auto-vectorization, but, most importantly, you haven't proven that if you create such a compiler you would be able to make anyone happy. I'm pretty sure that “writers to the hardware” would find a way to become angry even on your compiler, but it's hard to check because there are no compiler to look on. At least Regehr tried to do something constructive. The only thing you do is tell “ It's typical for proponents of That's not very constructive.
Posted Dec 4, 2022 18:31 UTC (Sun)
by anton (subscriber, #25547)
[Link] (3 responses)
Posted Dec 4, 2022 18:57 UTC (Sun)
by khim (subscriber, #9252)
[Link] (2 responses)
Yup. Inventing different derogatory names for the people when you are trying to convince them to do something for you is not very good strategy. I'm not asking about where someone would think if such compiler would make them happy but whether it would actually make them happy. These are different things, you know. It's as with an ages old adage: If users are made to understand that the system administrator's job is to make computers run, and not to make them happy, they can, in fact, be made happy most of the time. If users are allowed to believe that the system administrator's job is to make them happy, they can, in fact, never be made happy. CompCert made a decent shot at what you are demanding, apparently, why haven't you became happy with it and still try to convince makers of “adversarial” compilers to do something (and do that by calling them childish names and, in general, trying to make sure they wouldn't hear you)? It's not as if it's just a problem of getting people aware, CompCert is not a new thing. Easy: people tend to use what they like and don't use what they dislike. In 10 years no significant users of “adversarial” compilers have made that switch. They prefer to complain about unfair treatment yet continue to use Even Linus, who, famously, refuses to entertain the notion of using
Posted Dec 6, 2022 18:51 UTC (Tue)
by anton (subscriber, #25547)
[Link] (1 responses)
I find it funny that someone who writes "O_PONIES" as frequently as you do is complaining about supposedly derogatory names.
Your CompCert link does not mention anything that sounds like what I describe. Instead, the headline feature is formal verification of the compiler. CompCert's description of the supported C dialect also makes no mention of any such ambitions.
As for why people have not made "the switch". The switch to what? Compcert, a research project that has few targets and does not fully support setjmp() and longjmp(), and does not even talk about anything related to the issue we have been discussing here, and has deviations from the standard ABI of the platforms it supports?
GCC and Clang are apparently not adversarial enough for that; the approach seems to be that they try to be backwards-compatible by testing with a lot of real-world code out there (which is good), and mainly unleash the adversarial attitude when reacting to bug reports (not good). Also, the C language flags (like -fwrapv) available cover the most common issues, the remaining cases have not been painful enough to make people switch to a different C compiler (which one?).
Switching to a language with a more friendly compiler maintainer attitude is a big job, and is not done easily. However, when starting a new project, that's a good time to switch programming languages; now we just need a way to count how many new projects use C as its primary language now, compared to, say, 10 years ago.
Posted Dec 6, 2022 19:26 UTC (Tue)
by khim (subscriber, #9252)
[Link]
These are fine if you don't plan to ask someone to do something for you. And it wasn't invented by me. It was, basically, invented on the LKML precisely when people started discussing situation about applications expected specific semantic which was never guaranteed or promised and which new versions of Linux kernel stopped providing. So much for 100% backward compatibility being a panacea for everything. As you can guess the end result was precisely and exactly like with C compilers: there was much anguish, lots of discussions but in the end it was declared that since these guarantees were never there and code just happened to work because of accident app developers would have to rewrite their code if they want these guarantees. So now you want full compliance with everything, too? Even more Obviously that number would go down. C, basically, refused to advance when other languages did. C18 is very similar to C90 and almost undistinguishable from C99. I don't think it would be interesting idea to look on that, C was slowly turning into COBOL without any tales of adversarial compilers. More interesting would be fate of C++. Use of C++ was growing, not shrinking, recently. Would be interesting to see what will happen to it.
Posted Dec 4, 2022 0:30 UTC (Sun)
by mathstuf (subscriber, #69389)
[Link] (12 responses)
This basically leads down to an "IBSan" tool that detects implementation-defined behavior and signals on used-to-be-UB-but-is-now-arch-dependent. Portability is a benefit of code and if I know that my x86-compiled code is UB-free, it'll have the same behavior (but certainly not performance profile) on ShinyNewArch that gets released a decade from now. I really don't want to have to go to every project and make sure that they CI test my pet arch to make sure I don't have live grenades being lobbed my way on every update. I expect that Debian and NetBSD porters to obscure architectures appreciate that breaking these rules is just as "bad" on the "native" development platform(s) as they are on their target(s).
Now, if there were an in-language (no, the preprocessor doesn't count) to say "this is targeting x86 because we're talking to an IME, give me native behavior", *then* I could see there being some new "undefined-if-portable behavior" bucket for these kinds of things to go into.
Posted Dec 4, 2022 17:46 UTC (Sun)
by anton (subscriber, #25547)
[Link] (11 responses)
But my point is that a friendly compiler must also work for non-portable code: If it works with one version of the compiler on a particular platform, it must also work with a later version of that compiler on that platform.
Portability is an orthogonal requirement. Your hypothetical "IBSan" tool may be helpful, although I have my doubts, see below. In practice I test for portability by making test runs on as many different platforms as I can get my hands on. That's not 100% reliable, but it tends to work quite well.
I have my doubts about "IBSan" because it assumes one binary that should cover all portability variants. Real-world portable C programs often have lots of conditional compilation and stuff coming from configure to help with portability. If you write "the preprocessor doesn't count", it's obvious that you are not interested in C as it is used in the real world.
Posted Dec 4, 2022 18:27 UTC (Sun)
by khim (subscriber, #9252)
[Link] (3 responses)
Well, neither C, C++ or Rust are even trying to be “friendly” by that definition (here's recent example where Rust 1.65 doesn't accept source which Rust 1.64 accepted). That's fine with Rust users, yet, apparently not fine with small (but very vocal) group of C users. That's basically why C and C++ are doomed: in their world compiler users and compiler developers each talk in ultimatums which the other side is not willing to accept, which means conflict could never be resolved. I have seen so much talks about “friendly C” (
Posted Dec 4, 2022 20:45 UTC (Sun)
by pizza (subscriber, #46)
[Link] (2 responses)
That should read -- "that's apparently fine with the current Rust users".
C and C++ have several orders of magnitude of users than Rust. And those users (and compiler writers, and language stewards) are all trying to pull in their own, often incompatible, directions, collectively with literally billions of lines of code/baggage.
Rust, by virtue of being rather youthful, doesn't yet have a significant mass of users or use cases. There is only one implementation, produced by the same folks who define the language, and most of the users are still of the True Believer sort. All of this will inevitably change, and when it does, the needs of these various sub-groups will inevitably begin to diverge, and then the current "our way or the highway" language+implementation stewardship model will start failing.
If Rust does eventually succeed (ie ends up as a "legacy" language with many hundreds of millions of lines of code in wide deployment across tens of thousands (if not more) of organizations with divergent needs spanning a couple decades or so) then continuing to evolve it will run into many of the same sorts of problems that C and C++ face today -- ie problems of politics and governance.
I don't have any skin in this particular game, but I've been around long enough to see certain patterns, including the "we're smarter than those other guys so we'll be immune to their problems" hubris that *always* comes back to bite.
Posted Dec 4, 2022 21:53 UTC (Sun)
by khim (subscriber, #9252)
[Link] (1 responses)
I don't think so. There are certainly a lot more existing code in C and C++, because they had several decades of headstart. As for number of actual users it's hard to say for sure, but recent countings put at around half of Go or third of Kotlin (and about ten times less popular than JavaScript which, you must admit, it definitely more popular than C, C++ or Rust). No. The important thing is not fact that Rust is youthful, but the fact that Rust users are youthful. The fiasco that happened with C and C++ is mostly caused by old people who still remember times where it was possible to pretend that C is “portable assembler”, “program to the hardware” and expect that compiler wouldn't screw you. I dealt with quite a few newgrads and they accept strange and bizzare rules of standard C/C++ without much complaints. For them it's just how this weird language works. Strange rules, but hey, rules are rules. And the same happens with Rust. But in C, very often, they have to deal with these old “relax, I know what I'm doing, I'm older than C, I know how it works” guys. While in Rust these guys, as I have said, are expelled from the community, instead. I don't think this would change. Even if number of Rust developers is not ⅓ of number of C/C++ developers but closer to ⅒ of number of C/C++ developers it's pretty obvious that C/C++ style disaster wouldn't happen to Rust. Plank's principle in action: An important scientific innovation rarely makes its way by gradually winning over and converting its opponents: it rarely happens that Saul becomes Paul. What does happen is that its opponents gradually die out, and that the growing generation is familiarized with the ideas from the beginning I have meet real old software-related guys when I was in college and what I observe today reminds me of their tales about how structural programming arrived. The exact same refusal to accept new idea, the insistence that “proper” design is with the use of flowcharts on A1 (or A0 for complex cases) papers and that all these newfanged things like stacks or loops are just making development difficult and so on so forth. The only big question is whether this time Rust (and Rust-like) language would actually win or if history would repeat itself and after initial success of languages properly structural like Algol or Pascal some half-backed newcomer would come and take over (like C and C++ did). Time will tell. Nah. I don't think there's any chance of Rust making the same mistake as C and C++ did, but it certainly can do an entirely new ones. E.g. its approach to async programming… I'm still not convinced it's the right one and wouldn't lead to dead end.
Posted Dec 5, 2022 16:09 UTC (Mon)
by smoogen (subscriber, #97)
[Link]
Posted Dec 5, 2022 12:02 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (1 responses)
Given "If it works with one version of the compiler on a particular platform, it must also work with a later version of that compiler on that platform", what stops a compiler team deciding that supporting the behaviours of the previous version is too hard, and thus they're going to release a new compiler with the same CLI, that's not a "new version of the existing compiler"?
This is not a pure hypothetical - GCC 3 is not merely a "new version of GCC", it's a new compiler (the egcs GCC fork) that was adopted by GCC as the clear better outcome. If you set a rule like your proposed rule, what stops GCC21 being a new compiler, not version 21 of GCC?
Posted Dec 6, 2022 22:46 UTC (Tue)
by anton (subscriber, #25547)
[Link]
But actually that's somewhat the situation we have with gcc (and probably clang) now, only the maintainers don't say explicitly that their compilers are not backwards-compatible (they certainly have declared bug reports as invalid that clearly state that the code has worked with earlier gcc versions), so some people think of switching to a newer version of gcc as being an upgrade. It's not.
Even when starting with the same code base a compiler can be backwards-incompatible (as demonstrated by some gcc versions newer than 3), and with a different code base it can be compatible (but that's hard).
and actually ecgs was forked from the pre-gcc-2.8 code base.
Posted Dec 6, 2022 3:26 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (4 responses)
I'm interested in *improving* things so that the compiler can *see* "this code is x86-bound, feel free to optimize appropriately" with proper attributes rather than code-masking performed by the preprocessor. Flowing "this code was selected based on a check of `defined(__x86_64__)`" is unlikely to be tenable with how complicated some preprocessor checks are (and *their abstractions* used in various libraries).
Posted Dec 6, 2022 22:55 UTC (Tue)
by anton (subscriber, #25547)
[Link] (3 responses)
When you write "this code is x86-bound, feel free to optimize appropriately", what optimization do you have in mind?
Posted Dec 6, 2022 23:03 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
I'm thinking that the optimizers can assume specific behavior for things instead of considering it UB. For example, left shift by too much can keep the same value (IIRC, ARM makes it 0). The programmer *intent* that this is target-specific is what is important here. Bare C code doing such a shift is still in the "this doesn't mean what you think it means, so I will assume that such Bad Things™ don't happen".
> There are some programs that don't need configure or the like to be portable, but many use a lot of stuff coming out of configure, and I am very pessimistic that we can get rid of conditional compilation.
I also think that conditional compilation is here to stay. However, it being a code-blind copy/paste mechanism doesn't have to be true. With `constexpr` instead of preprocessor symbols, it is possible to have something like D's `static if` or Rusts `cfg!()` mechanisms to hide code during compilation. This allows it to still be syntax checked and formatted appropriately instead of being a wild west of sadness when some long-dormant branch with unbalanced curly braces finally gets activated.
Posted Dec 7, 2022 0:29 UTC (Wed)
by khim (subscriber, #9252)
[Link]
It's a bit worse than that. ARM uses low byte to do shift which means that shift by 128 is, indeed, zero, but shift by 256 doesn't change anything (and doesn't touch flags).
Posted Dec 7, 2022 0:54 UTC (Wed)
by khim (subscriber, #9252)
[Link]
Note BTW, that the very first CPU, 8086 (and 8088) performs like ARM, not like all subsequent CPUs. Means Intel took advantage of this UB back when it was developing Intel 80186 forty years ago. ARM also have similar case, e.g., it has So much for predictable hardware, huh? In fact document called ARMv8 AArch32 UNPREDICTABLE behaviours lists more than 50 of these.
Posted Dec 2, 2022 1:36 UTC (Fri)
by khim (subscriber, #9252)
[Link] (2 responses)
You don't need to rely on programmers following some rules for proper optimizations. Wrong. Let me state one thing which you need to understand before you can meaningfully continue discussion: 100% OF C AND C++ COMPILERS 100% OF TIME 100% DEPEND ON THE ABSENCE OF UB IN THE COMPILED PROGRAM. NO exception. Not even a single one. Constructive proof. That it. Now you have a program which doesn't even misbehave if compiler tries to apply any optimizations, it would misbehave even if you change one bit of output. For the old compilers which don't produce reproducible builds you may calculate SHA512 sum of some parts of programs and still arrive at the same outcome. Now. You may say that's stupid way to write such programs. And I will agree. You may say that's brittle. And I will agree with that, too. You may say that nobody writes such programs. And you would be correct, again. Yet the fact remains: if you would demand that changes in the compiler shouldn't break valid programs simply because they are causing UB then no changes in the compilers would be possible. None at all Because if you declare that such special-crafted programs which are handcrafted just for your compiler as “valid” then you couldn't change anything in any program. If you declared them “invalid” — then congrats, you have just added one (or more) items to the list of UBs that programmer is not allowed to use in valid programs. You can create a language which makes such trick impossible with Coq but that wouldn't a language where you can “program for the hardware”. On the contrary, it would be a language so far removed from what C is that none of C developers who complain that C/C++ compilers break their code would be able to use. It fundamental: low-level languages without UB can not exist. The only question is what exactly would we call UB and what exactly would we allow.
Posted Dec 3, 2022 0:04 UTC (Sat)
by riking (subscriber, #95706)
[Link] (1 responses)
found the UB
Posted Dec 3, 2022 15:31 UTC (Sat)
by khim (subscriber, #9252)
[Link]
Yes, but that's UB according to the specification of C abstract machine! There are no UB “in the hardware” in that place! My point was that if you “program to the hardware” and do even well-defined (on the hardware level) things and then demand “since that program worked once it must work after compiler upgrade” this demand makes it impossible to optimize some programs! Or even change compiler output in any way! And if you add rules which say “yes, on hardware level result of that operation is defined, but you are not allowed to do that anyway”… you are creating a specification of abstract machine! Yes, it maybe quite similar to “real hardware” (essentially: what real hardware does except for this small list of exceptions) but it's not “real hardware” anymore!
Posted Dec 1, 2022 23:52 UTC (Thu)
by isilmendil (subscriber, #80522)
[Link] (4 responses)
The thing is, the user did not break the rules. The optimizer did when it optimized from the inside out instead of the outside in.
> What's the difference (from the compiler's POV) between this atrocity and your overflow checks?
For starters, your example only gives a semblance of a correct result when the optimizer is turned on, rather than off. The optimizer correctly recognises that the set() function has no side effects and can be eliminated. It does not eliminate add() because it has side effects.
If the same reasoning as above were applied, the optimizer presumably should remove the main method because any use of an indeterminate value is undefined behaviour (§6.2.4) and since UB will certainly not be invoked main() is obviously never called ;-)
Posted Dec 2, 2022 1:11 UTC (Fri)
by khim (subscriber, #9252)
[Link] (3 responses)
User did. The rule is very obvious: for the program to behave correctly it shouldn't contain errors. Simple, no? I think you can even explain that to guys in the kindergarten, but for some reason C developers just couldn't accept it. Granted, version from the standard is pretty long-winded, but it's still the exact same rule: WTF? Do you want to imply that variables don't live on stack? Why does If we are coding for the hardware then correct result for that program is well-defined and it's No. That not how hardware works. Sorry. Variable Yes, it's how programs were written 60 years ago in Fortrain IV, but so what? Hardware haven't changed enough since then. Even code compiled with most compilers and disabled optimizations work. Sure. That would be valid compilation of that program according to the spec. But notably not a valid result if we are programming for the hardware and not to the abstract machine. If you are programming to the spec you may say that this function have no side effects (because spec doesn't even mention word “stack”), but on the hardware level there are stack, registers, calling convention and lots of other perfectly documented things. Any other result but 5 would be a miscompilation if we are allowed to write code “for the hardware”. From hardware point of view any store is a “side effect”, you can not just go and remove them! Heck, early compiles even allowed you to mix assembler and C code and, of course, assembler was supposed to access variables on stack (not possible if you declare them as Hardware on our desks is almost 100% compatible with what we had 40 years ago, when IBM 5150 was presented, why programs should suddenly start behaving differently?
Posted Dec 2, 2022 23:10 UTC (Fri)
by isilmendil (subscriber, #80522)
[Link] (2 responses)
I think we can agree that the following little for-loop is not ill-formed:
int main() {
Using the same reasoning as with the original example, you can elide the loop condition. Violating the array bounds would be undefined behaviour, which means that the check against i>0 is always true...
>> For starters, your example only gives a semblance of a correct result when the optimizer is turned on, rather than off.
>WTF? Do you want to imply that variables don't live on stack? Why does register specifier exist if that's not so? Or do you want to imply that they are, somehow, initialized when you enter the function? That wouldn't be optimal now, would it? C never initialized variables, why should it start doing that now?
Of course not. You have two function with two different local variables (both called "a"). The compiler using the same portion of the stack for both variables is sheer coincidence. C never initialized memory. You get initialized memory by sheer luck (i.e. because of the OS, not the hardware or C standard says so).
>If we are coding for the hardware then correct result for that program is well-defined and it's 5.
But we are not coding for the hardware, we are coding for an abstract machine model. That's the whole point of writing C, not assembler.
> It's the exact same logic you were using for justification of the use of signed overflow: standard doesn't explain how that program works, but if you know how real hardware works then it's not hard to predict the outcome. And, indeed, most compilers (with optimizations disabled) produce precisely that outcome.
As you say yourself, "it's not hard to predict the outcome". So you're not writing for the hardware, but for what you predict would be sensible choices for the compiler to make.
Posted Dec 3, 2022 13:32 UTC (Sat)
by khim (subscriber, #9252)
[Link]
I think you mixed up the sides at some point. Who are these “we”? Proponents of semi-portable C (like Wol, look on his complains about how compiler screws over people who actually do try and check that they are getting a pure result from applied hardware). Yes? They most definitely don't program for “an abstract machine model”. Unfortunately way too many C programmers from “semi-portable” camp forcibly assert that C is just a way to write assembler easier. That's the main issue with C. The endless “verbal wars” of C users against C compiler developers is a consequence. Just read what they wrote, please! It can not be stated more clearly than that dialogue, can it:
My example shows that it's not, really, possible to “program for the hardware” in C and thus use C as [somewhat] “portable assembler”. If you don't subscribe to the idea that we program for the hardware and that C is “portable assembler” then situation with that program is obvious. There are no need to explain to me how compilers work, I know that, I even done some patches for GCC some years ago. But I'm yet to see any explanation from guys in that “we program to the hardware” camp about my program. They either ignore me or say that I'm an idiot because I wrote that code (as if I propose to use it in production) and never explain what exactly is wrong with it. Because if you “program for the hardware” it's impossible to do! True. But that's exactly the step that “we program for the hardware” guys are explicitly refusing to do. Because if they accept it then, suddenly, all these problems about cases where compiler destroys their “perfectly sensible programs with UB” are no longer compiler fault, but their fault: they never had any promise that compiler would produce working code, but some compilers did so by accident (note: on Turbo C 1.0 this program work just fine with all optimizations enabled), if new compiler break that program then onus is on them to fix it! They couldn't accept that and wouldn't accept them. Which makes dialogue impossible. C compiler developers are not innocent either: when they find out that C users rely on certain things which C users are using in their programs 99% of time their answer is “standard doesn't allow that” instead of sensible dialogue. Basically we have two camps where members of each camp talk not to each other but “past” each other with ultimatums. Agreement or compromise is impossible in such dialogue. That is what I wanted to show. The fact that I'm yet to see any answer from “we program for the hardware” guys about what we can do with this example (the most I have seen are various discussions about me and about how I'm an idiot for writing such code) is telling enough.
Posted Dec 5, 2022 12:01 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
You're using different reasoning to the original example: the original example says that if a program execution path includes UB, then the program's behaviour is undefined (the meaning of UB), and thus any outcome is permissible, since all possible outcomes are within the defined behaviour of the program.
Your reasoning is slightly different; you're saying that if you change the program to potentially include UB when it did not before, then the program's behaviour is undefined. Which is also true, but is significantly different - the compiler is not permitted to add outcomes to the set of all possible outcomes of compiling and running a piece of code, only to remove outcomes and create a binary that chooses between allowed outcomes at runtime.
This is why UB is tricksy. By writing UB, you've written code that says "anything is an allowed outcome of compiling and executing this code", and then the optimizer, working on the as-if rule, is able to choose anything as the outcome. The code you present doesn't have UB, thus the allowed executions are constrained (to returning 0 from main), and the compiler is only allowed to generate code which returns 0, after possibly looping over 8 ints on the stack and setting them all to 0 (although, as this is not an "observable side-effect", it can elide that).
Posted Dec 1, 2022 18:52 UTC (Thu)
by Karellen (subscriber, #67644)
[Link] (4 responses)
OK, it doesn't say exactly that. What the C standard, §3.4.3, does say is: 3.4.3 There are no requirements for what the compiled code can do in response to undefined behaviour. The compiler is allowed to output code that does absolutely anything if such a thing occurs, including (infamously) make demons fly out of your nose. It is also therefore allowed to assume that UB never happens, because if it did happen, then it could have acted as if it didn't - because there are no requirements. If you want your program to be predictable in any way, you cannot allow any instances of UB to occur in it - including in any tests for UB.
Posted Dec 1, 2022 22:47 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (3 responses)
In other words, C IS NO LONGER FIT FOR SYSTEMS PROGRAMMING.
I don't remember if it was you, but when someone said "you need to program to the C model, not the hardware model", that is the death knell of a systems programming language. If you're not allowed to program to the hardware, why the hell are you using it to control said hardware?
> undefined behavior
Note this explicitly includes "A NONPORTABLE CONSTRUCT", ie valid code that is not guaranteed to work across divergent systems. In practice, what the compiler writers are doing is saying "there is no such thing as nonportable constructs", despite the standard explicitly allowing for it. Unfortunately, a systems programming language absolutely requires non-portable constructs that can be trusted to work ...
Cheers,
Posted Dec 1, 2022 22:55 UTC (Thu)
by pizza (subscriber, #46)
[Link] (1 responses)
By that definition, the only acceptable programming language for systems programming is... bare assembly.
Posted Dec 2, 2022 0:41 UTC (Fri)
by khim (subscriber, #9252)
[Link]
Indeed. The only theoretical way to create low-level language without UB would be to employ Coq to ask developers to provide proofs of correctness for their programs. Then there wouldn't be UB because you would be, quite literally, forbidden from doing things which cause UB (in particular any program which produces integer overflow wouldn't be able to do that because it would be compile-time error to write such program). Surprisingly enough this world as not as far from us as we may imagine, there are already attempts to employ such techniques to practical tasks. But it's not even remotely close to “programming for the hardware” model. Rust's solution to this dilemma is just to go and kick people who don't understand that out of the community. But I don't think C and/or C++ can really do that. Old hats feel they are entitled to be able “to program for the hardware” model and even people who understand why UBs are unavoidable are getting tired. It's one thing to write small amount of UB-capable code in the Rust's
Posted Dec 1, 2022 23:06 UTC (Thu)
by Karellen (subscriber, #67644)
[Link]
Yeah, if by "no longer" you mean "since it was first standardised in 1989". Because that's when UB was defined and its semantics decided upon. Not quite. In practice, (modern) compiler writers allow for non-portable constructs, but you have to explicitly opt into them. (If they make sense on the platform you are compiling for. Which they might not, because they're non-portable). Hence GCC's -fwrapv and -fno-delete-null-pointer-checks.
Posted Dec 1, 2022 19:18 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
No, they cannot. There are precisely 5 things `unsafe` allows:
- dereference a raw pointer
That's it. UB is *still not allowed*. You must still uphold *all* of the rules that Rust expects. The only difference is that the compiler doesn't check everything. `unsafe { let a = &mut t; let b = &mut t; }` is UB (including if you go through a raw pointer) and not a power that `unsafe` provides.
Posted Dec 1, 2022 17:47 UTC (Thu)
by anton (subscriber, #25547)
[Link] (20 responses)
Posted Dec 1, 2022 19:01 UTC (Thu)
by khim (subscriber, #9252)
[Link] (17 responses)
But that's the whole point of UBs: the implementor license not to catch certain program errors that are difficult to diagnose is pretty clean explanation of what UBs are and who is responsible for avoiding them.
Posted Dec 1, 2022 23:05 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (16 responses)
If you're going to do that, AT LEAST TELL THE PROGRAMMER!!!
This is the behaviour that will destroy the C ecosystem - it can just no longer be trusted ...
Error trapping is enough of a nightmare without having to wrap the functional code in loads of traps and measures to prevent failure instead of just saying "if this fails then". (I'm spoilt - I'm used to languages where I don't get forced into artificial and un-natural constructs like "int32".)
Cheers,
Posted Dec 1, 2022 23:43 UTC (Thu)
by khim (subscriber, #9252)
[Link] (15 responses)
Nope. It assumed that UB doesn't happen and optimized code to the best of it's abilities. It's compiler's work. Nope. Just logic. Compiler doesn't have common sense, it's neither hates you nor loves you. It's not capable of being psychopathic. It's too simple for that. Why? Remember the rationale? Certain program errors that are difficult to diagnose. It's programmers job to keep an eye for UB, not compiler's job. There are certain tools that may help (C/C++ have UBSAN, Rust have Miri), but they are not part of the compiler for obvious reasons. Well… that's the whole point of C standard: to write portable code with non-portable language constructs developers have certain restrictions placed on them. Compiler relies on these restrictions to be followed. Be it C compiler or Rust compiler. Well… Rust have sublanguage which acts like you want. But it's List of undefined behaviors in Rust is shorter and simpler and, notably, integer overflow and type punning are not UBs, but then, both Pointer provenenace is still very much a thing and there are extra UBs that C doesn't have, too.
Posted Dec 5, 2022 10:54 UTC (Mon)
by hummassa (subscriber, #307)
[Link] (14 responses)
THIS. This has been the point that's missing from this whole thread IMHO. If it *was* difficult to diagnose, the compiler could not see the UB and optimize it out. So, Wol is right:
Once UB was detected, things were optimized out because of it, or other inferences were done, it's warning time. At that point, the compiler has a detailed internal rationale for what it will do. Dump that. Preferentially, with color-syntax-highlighting and graphical lines overlayed on the code, DrRacket-style.
Again: this is not impossible. The compiler KNOWS it's going to optimize something out because of UB. MAYBE you had to enable some level of optimization for it to happen, but, ok, the warning/error message can be dependent of `x` on `-Ox`. No problem there.
But, again, if the compiler is doing/omitting something because of UB, it SHOULD TELL THE PROGRAMMER. In painstaikingly richness of details.
Posted Dec 5, 2022 12:05 UTC (Mon)
by khim (subscriber, #9252)
[Link] (12 responses)
The whole discussion is centered around this whole point. That's “common sense”. Computers don't do common sense. Not even so-called AI-technologies can do these (for now?). Compilers defintely can't. Actually machine-learning technologies are even worse then what current compilers are doing: at least with current technologies compiler couldn't understand what it does, but human can, with machine learning not even human can explain why that or this change was done. Computers do logic. And compilers do logic. And logic says: The fact that compiler assumed that program doesn't do certain “hard to diagnose” things doesn't mean that compiler understood what the program is doing and consciously thwarted the programmers intent. Compiler can not anything “consciously” because it doesn't have “conscience”. Compilers couldn't understand anything. They are too simple for that. But they are now more then capable to do repeated applications of various simple tests and computers are fast enough that such applications happens quickly. Do you want to see 100 lines of warnings for every line of source code? Any source code, not just code with manifested UB? Because that's the only thing that compiler can easily offer. It can only list assumptions that very uses and most of them are trivial: it assumed that variables were initialized, that memory was allocated, that pointers are not dangning and that 200 other UBs haven't happened, too. If you want to observe how compiler emploited lack of UB to do certain optimization - most compilers supports doing dumps of code after each pass. The only trouble: Prove it. Patches are welcome. I'm sure if you would create something like that both Oh, sure. Compiler knows which UBs it assumed not to happen in each line of code. Between 10 and 50 possible imaginable UB per line of source. Do you want that dump? What do you plan to do with it? That is easily achievable. Just dump ASS or RTL and markups about which UBs were assumed not to be there would be included. Thousands of them. But programmers don't want that. They want ponies. Instead of list of thousands of UBs which compiler assumed are which are not in the code they want one UB which was assumed not to be there but which was actually put there by mistake or (even worse) on purpose. That information compiler doesn't have and thus can not dump.
Posted Dec 5, 2022 15:31 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (11 responses)
THIS THIS THIS.
ALL the UBs you have listed here are BUGS. They are all "Don't Do It".
But things like signed integer arithmetic? That's a SchrodinUB - you don't know if it's UB or not until you do do it. And if you look at the laws of Pure Mathematics, it's not UB *at*all*.
There's a reason us grey beards get all pissed off at this. We understand logic. We can do Maths. You said your typical programmer of today loves rules. "Monkey See Monkey Do" rules. Like "Don't do signed integer arithmetic!". WHAT!?!? Why on earth does C even HAVE signed integer arithmetic if you're not supposed to use it!? Why doesn't the compiler just optimise it ALL away if it's guaranteed to screw up at the most inopportune moments?
When I design software, I do NOT set out to solve the specific problem in front of me. At work at the moment I am trying to unravel a horrendous mess where everybody has solved their own problem, and we have loads of programs all interacting in weird and unexpected ways. (Compounded by people not using common sense and blaming me for "the output looks wrong", so when I investigate my reaction is "what is this garbage you are putting in?")
When I program I always - even if not consciously - do a truth table. What are the possible inputs? What are the possible outputs? Even if my problem is a small subset of the output, I DON'T close off the input space, I just mark it "here be dragons" (or, usually, more work for later), and trap or block it so I can deal with it later.
The majority of your examples have a simple, single value truth table. It's called "Garbage In, Garbage Out". I have absolutely no problem with that being UB.
But signed integer arithmetic? It appears to collapse to a simple two-way table - "Valid In, Valid Out", or "Valid In, Garbage Out". To treat that the same as a simple "Garbage In, Garbage Out" is crazy. (I've ignored the other two states, because "Garbage In" is indistinguishable from "Valid In".) A scenario where all possible inputs are valid, is not the same at all as a scenario where all possible inputs are invalid.
Cheers,
Posted Dec 5, 2022 17:03 UTC (Mon)
by khim (subscriber, #9252)
[Link] (10 responses)
But laws of Pure Mathematics don't apply to computers unless you make them. C language does pretty natural assumption is that you would do so and would avoid computations which may overflow. No, that's not the rule. The rule is “do the bounds checking, damn it”. Yes, this includes signed integer arithmetic, too. Because that would be against the rules, obviously. For that to work you need some kind of isolation between hardware and abstract machine that language uses. When you try to do that layer thin enough (like C does) you invariably end up with “bad code which may work for some time and then blow up in your place” (like using already freed object which may work for some time till, one unlucky day, interrupt would come at different time and that object would be overwritten). And then you have to describe that subset and avoid it. It seems that you understand that. But somehow you refuse to accept that bounds of that subset are, to some degree, arbitrary and there needs to be common description of where they are! That common description is called C standard. If you think that some things don't belong to that subset — use the switch and make them defined to you, personally (compilers already support that, C doesn't support such things. It's either “Valid In, Valid Out” (like with unsigned numbers) or “Garbage In, Garbage Out” (like with signed ones). “Valid In, Garbage Out” is not in the cards, sorry. Yes. And it's one of the reason why C standard classifies all inputs either as “Valid In, Valid Out” (fully-specified things, unspecified things and implementation-specified things are all in that class) or “Garbage In, Garbage Out”. Case of “Valid In, Garbage Out” shouldn't happen in a valid C program. Even with unsigned numbers division by zero is not allowed (it's UB) because that would be “Valid In, Garbage Out” case. You have to ensure that divisor is non-zero. Only people rarely object against that, for some reason. Maybe because it's defined as “Garbage In, Garbage Out” in math, not just in C standard. The fact that you don't ever need to support that “Valid In, Garbage Out” is something they really wanted to have because it simplifies reasoning for the compilers and wasn't supposed to be a big deal for humans. That was, probably, miscalculation ANSI committee did 30 years ago. But I don't think it's easy to change: neither flags for The rule there are no such thing as “Valid In, Garbage Out”, there are only “Valid In, Valid Out” and “Garbage In, Garbage Out” stays unchallenged.
Posted Dec 6, 2022 10:10 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (9 responses)
The laws of Pure Mathematics do apply to computers, very much so. C is, for example, doing its operations on a finite set with abelian groups for multiplication and addition, and not on a field (since some operations such as multiplication and addition are not defined for all pairs of inputs from the set).
The fun comes in when people assume (falsely) that + and * in C are the field operators for school arithmetic. They're not - they're still set operations, and they still form an abelian group, but they're not field operators, and they don't behave like field operators in all cases.
Posted Dec 6, 2022 14:03 UTC (Tue)
by khim (subscriber, #9252)
[Link] (8 responses)
Only unsigned types are defined to work like you describe. Signed types work differently.
Posted Dec 6, 2022 14:11 UTC (Tue)
by farnz (subscriber, #17727)
[Link] (7 responses)
I'm sorry, you have me completely confused. I described a system where addition and multiplication are only defined for a subset of pairs of inputs, which is pretty much signed types in C - if I have a 32 bit integer type, then 2**17 * 2 ** 17 is UB in C and in Pure Mathematics for a type that's not a field, but is an abelian group, whereas 4 * 3 is defined, and is the same as 3 * 4 (the property that makes this an abelian group).
Unsigned types are fields, I thought? The gap between what I described and a field is that in a field, all input values for addition and multiplication result in an element of the set, whereas in what I described, some input values for addition and multiplication result in undefined behaviour.
Posted Dec 6, 2022 14:49 UTC (Tue)
by khim (subscriber, #9252)
[Link] (1 responses)
No. I haven't read that in your description and, worse, group is defined like this: You have started talking about group axioms without checking whether result is defined for all elements of supposed group. Unsigned numbers are a group. Signed numbers are not group. No, they don't form a field. Semigroup, group, abelian group, ring… they all call for operations to be defined for all elements. But field definition includes UB: reciprocal not defined for zero (and only for zero, that's why unsigned numbers in C are not a field). That, somehow, never changes the opitions of If something doesn't follow the definition then you can't say that it's “𝓧, but not 𝓧”. Well… technically you can say that, but it's a tiny bit pointless: math doesn't work like that, you couldn't use theorems proven for 𝓧 with “𝓧, but not 𝓧”. You may say that while I guess the idea was that people may need ℤ₂ᴺ, but since ℤ is not feasible to provide in low-level lamguage like C (even python have trouble with ℤ) thus asking developer to use them carefully and avoid them when result is not guaranteed wouldn't too problematic. After all they have to remember not to divide by zero in math, why couldn't be they taught not to overflow in C? As you can see some people don't like that idea (to put it midly).
Posted Dec 6, 2022 14:56 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
I made mistakes in the terminology (I'd forgotten the details of groups) - but I stand by my claim that it's mathematically reasonable to have a finite set, F, with binary operations + and * defined only for a subset of pairs of elements of F, and that maths does not always claim that because I have a finite set F, with a binary operation +, it must follow all the axioms of arithmetic.
Rather, C's signed integers are a finite set that behaves like the integers in some respects but not others, and results that hold for mathematical integers do not necessarily hold for C signed integers, but that this doesn't mean that C signed integers are not a mathematically acceptable set - so reaching for mathematics and saying "but maths says!" just highlights that you're not particularly good at mathematics, and expect C's signed integers to behave like a set you're familiar with from school, rather than like the sort of sets that you might start discussing in a bachelor's degree.
Posted Dec 6, 2022 15:54 UTC (Tue)
by anselm (subscriber, #2796)
[Link] (1 responses)
ISTR that even in a group, the result of the operation on two members of the group must always be a member of the group. “Undefined behaviour” where applying the operation on two members of the group doesn't yield a result within the group isn't allowed. Nor does it work to exclude, e.g., 2**17 from the set on the grounds that 2**17 * 2**17 isn't in the set; you need it in the set so that (non-overflowing) operations like 2 * 2 ** 16 can have a result. (Fields get that property from the fact that they're basically built on groups.)
Posted Dec 6, 2022 16:37 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
Yes - that's my error. I stand by the claim that C signed integers are a perfectly reasonable mathematical object, and that their behaviour is happily described by pure mathematics.
But I have to retract my claim that they're a group - they're still a finite set, with several operations that give them superficial similarity to the set of integers, but they're not a group.
Posted Dec 6, 2022 21:09 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
They are rings, not fields (no multiplicative inverse, since we're not looking at floats).
Signed integers are not even rings, because at least one element doesn't have an inverse (e.g. there's -32768 but not 32768).
Posted Dec 6, 2022 22:33 UTC (Tue)
by khim (subscriber, #9252)
[Link] (1 responses)
That's not how rings behave. That maybe one reason for why C standard defines signed operations like it does: if you want ring — you already have one, unsigned numbers, why would you need another, identical one?
Posted Dec 7, 2022 17:31 UTC (Wed)
by kleptog (subscriber, #1183)
[Link]
Signed integers on computers don't form a ring, not in a useful sense anyway. For example 10*19=-66 on an 8-bit machine. You can argue that if you look at the bit patterns it's the same as unsigned integers but that's just relabelling. What I think the C designers wanted was to consider the signed integers a subset of the natural numbers. Because that gives you a meaningful way to deal with ordered comparisons.
For example, the implication a<b => 2a<2b is *not true* in the unsigned integers. This is however super annoying for a compiler optimiser. The standard compiler transformation in C where you change loop counters to count by the width of the object you're looping over is strictly not allowed unless you can prove the loop bound won't overflow. The compiler often can't prove it, so that would leave a simple optimisation on the floor.
Unless you assume overflow is UB.
"a+b > a if b is positive" is false in the unsigned integers. But a very useful thing to be able to assume when optimising code. The classic example of code that changes with -fwrapv:
int f(int i) {
Is optimised to true without it.
Pointer arithmetic is another case, you don't want to treat this as a full on ring either, because pointer comparison is so useful (can we assume a[b] is always later in memory than a[c] if b>c?). So the compiler simply assumes overflow cannot happen.
I don't think the compiler writers are out to get you. But if you want signed integers to act 100% like CPU registers, you do need to set the compiler flag.
Posted Dec 5, 2022 12:17 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
But the compiler does not "see the UB". It applies a set of rules that are valid assuming no UB, and comes to a result.
I bring you back round to the "proof" that 1 = 0. The outcome is clearly absurd - 1 = 0 is false - and yet I have a "proof" here that 1 = 0.
The reason it all goes to pieces is that I've applied a set of rules, all of which are valid. Every step in that "proof" is a valid application of the symbolic manipulation rules of algebra - none of the applied rules are wrong, nor was the manipulation incorrect. The reason I come up with an absurd result is that I didn't detect that, for the specific rule "divide both sides by (x-y)", it's invalid to apply that rule if x = y.
UB problems in compilation are similar. I've got a set of symbolic manipulations (often a very large set) that I can apply to your program. Some of those manipulations are only valid if the program does not contain UB (just as in my "proof", I used a manipulation rule that is only valid if x != y). At no point does the compiler "detect" UB - it's manipulating the program on the assumption that it doesn't 'contain UB, and the consequences of a chain of manipulations (that are all valid in the absence of UB) is a bad outcome if there is UB present.
Posted Dec 1, 2022 19:29 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Posted Dec 1, 2022 22:58 UTC (Thu)
by anton (subscriber, #25547)
[Link]
However, the Linux kernel is not such an environment, so you cannot apply this optimization there (unless you want to make the assumption that undefined behaviour does not happen, which the Linux kernel does not assume).
Development quote of the week
- add intrinsics to detect bad cases to be used in these checks
- better ways to say "I know I am using assumption X about the hardware here" and let the compiler say "that doesn't work here" when compiling for ObscureArch (instead of just miscompiling)
Development quote of the week
if (x > (INT_MAX / 0x1ff))
return 0;
```
Development quote of the week
Reality check: Name an architecture from after 1970 that does not support two's-complement arithmetic. Even if there is one, then on that architecture the C unsigned types are at least inefficient: Many two's-complement operations are the same as unsigned operations, so if an architecture does not have the two's-complement operations, it does not have the unsigned operations, either. I have certainly never programmed on a binary machine that did not support two's-complement, and I have been programming since the early 1980s.
Development quote of the week
supporting unaligned reads
Actually that has won (at least in general-purpose computers). If I want to test my programs on a machine that requires data alignment, I have to turn on our 2006-vintage Sparc T1000 or even older machines (most likely I would turn on a 1998-vintage Alpha). If I want to test on big-endian machines, I also have to turn on some old, normally turned-off machine; little-endian has won, too.
> If I want to test my programs on a machine that requires data alignment, I have to turn on our 2006-vintage Sparc T1000 or even older machines (most likely I would turn on a 1998-vintage Alpha).
Development quote of the week
vmovaps
still fails even without it in a brand-spanking new AVX512 (which AMD have only just added to their CPUs this year).
I have tried several times to use the AC flag for testing that everything is aligned, but failed: On IA-32 Intel's own ABI results in code that produces an exception if AC is set. On AMD64 the ABI is ok, but gcc produces code with unaligned accesses from code where all pointers are correctly aligned for their data types.
Development quote of the week
Development quote of the week
Wol
> Or are you saying that x86 REQUIRES data alignment?
Development quote of the week
Development quote of the week
Development quote of the week
Development quote of the week
Development quote of the week
Wol
> But where does it say you are *not allowed* to perform undefined behaviour?
Development quote of the week
Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. I
An important scientific innovation rarely makes its way by gradually winning over and converting its opponents: it rarely happens that Saul becomes Paul. What does happen is that its opponents gradually die out, and that the growing generation is familiarized with the ideas from the beginning
> As others have pointed out, this insanity will simply lead to C becoming - rather rapidly - obsolete. Rust, with its emphasis on "there is no such thing as undefined behaviour in safe code" and "unsafe blocks can carry out undefined behaviour" will supplant it in no time flat if people lose confidence in the ability of C compilers to produce code that can be trusted. It's all very well to expect old code to keep working, but if you can't recompile and trust the result, you will RUN away, not walk...
int set(int x) {
int a;
a = x;
}
int add(int y) {
int a;
return a + y;
}
int main() {
int sum;
set(2);
sum = add(3);
printf("%d\n", sum);
}
rustc
developers spent significant time talking to different crate makers to ensure update wouldn't be too disruptive, but that was only possible because said developers are not starting with that Am I a trembling creature or have I the right? question. If their code is wrong then it should be fixed, thanks for the bug report.
Development quote of the week
>
> Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. I
>
> Simply by definition undefined behavior is a program error (albeit the one which may not be diagnosed by a compiler).
Wol
> It says it is not the responsibility of the compiler to catch undefined behaviour.
Development quote of the week
because there are no UB ⇒ we can do that
…
NULL
. Attempting to use variable which belongs to the function which have done its work and was finished. And so on.Development quote of the week
int func(int input) {
if (input < 100) {
return -1;
}
input += 50;
if (input > 500) {
return 1;
}
return 0;
}
int func(int input) {
if (input < 100) {
return -1;
} else if (input > 550) {
return 1;
} else {
return 0;
}
}
unsigned func(unsigned input) {
if (input < 100) {
return -1;
} else if (input > (UINT_MAX - 500) {
return 0;
} else if (input > 550) {
return 1;
} else {
return 0;
}
}
Development quote of the week
Development quote of the week
Development quote of the week
Development quote of the week
Wol
> In Maths, "int * int = int". Valid input gives valid output.
Development quote of the week
unsafe
keyword. Then it assumes that you have sanitized your pointers (just like C does). It makes some things which C puts in the UB bucket well-defined (e.g. type punning is legal in Rust) but, in turn, it places additional restrictions, too.O_PONIES
switch and C compiler developers, when they see that C users are not really serious about working together, do nothing beyond -fno-strict-overflow because it suits them just fine: all sensible developers are either using checks which don't overflow (you don't need UB, more sensible option would be use of functions provided just for that purpose) or use the switch and the ones who are non-sensible… you can't reason with them, anyway.Development quote of the week
Wol
> How would I have found the time to go to them?
Development quote of the week
int
to unsigned int
, multiply two values then cast them back… or can just use functions given to you specifically to check for overflow).O_PONIES
option which would give them carte blanche to break the rules!Development quote of the week
Wol
> Well, if the compiler detects a double free, it should really halt with an error.
Development quote of the week
O_PONIES
. How compiler is supposed to do that? Most memory management systems just assume that user would ensure that it never happens.unsigned
numbers as if they were members of appropriate ring, ℤ₂⁸ for 8-bit integers or ℤ₂³² for 32-bit ones… overflow is impossible by definition.uint32_t
to int32_t
. That result would be predictable, although, in theory, may be different on different architectures (but if you verify and find out that such conversion would be producing -1
once that means that it would be producing -1
forever).unsafe
block rules are the same: if your code triggers UB then all bets are off (the list of UBs for Rust is smaller but includes some items that C doesn't consider UB).Development quote of the week
Wol
> Not to treat two completely different categories of programming cockup as if they were identical?
Development quote of the week
Development quote of the week
Development quote of the week
Development quote of the week
Development quote of the week
But how compilers are supposed to optimize anything if their users are expected to break the rules?
By performing proper optimizations: strength reduction, loop-invariant code motion, inlining, loop unrolling, etc. You don't need to rely on programmers following some rules for proper optimizations.
> Concerning your code that relies on the value of uninitialized variables, this code is probably pretty brittle during maintenance, so programmers avoid this kind of usage.
Development quote of the week
code*
and start peeking on the generated code.Development quote of the week
Yes, but to say that some code is “too ugly to survive” you have to precisely define what kind of code you consider “too ugly”. And if you would do that you would get list of UBs for your language.
Where do you get "ugly" from? I wrote "brittle". Anyway, yes, a compiler maintainer for a language like C has to determine what changes to the compiler would break real, existing, tested programs, and then refrain from these changes. That's just the same as Linus Torvalds' "We don't break user space". And sure, there can be cases where a compiler or kernel maintainer misjudge an issue; then they will just have to revert the change. I have written about this at greater length.
It's just not possible to create a low-level language without UB.
It may not be possible to create a low-level language that is fully defined, but it is easy to create one without undefined behaviour: Just define a set of allowed behaviours for
each of the operations in the programming language. For signed integer overflow (the issue at hand) it is trivial: Common practice suggests the -fwrapv behaviour (so that would be fully defined).
> Anyway, yes, a compiler maintainer for a language like C has to determine what changes to the compiler would break real, existing, tested programs, and then refrain from these changes.
Development quote of the week
realloc
), but, ultimately, when people don't even plan to read rules and follow them… nothing would have worked, anyway.Development quote of the week
Wol
> And what do you do when USERS don't (or can't) upgrade to said fixed versions?
Development quote of the week
> And what do you do when USERS don't (or can't) upgrade to said fixed versions?
Development quote of the week
rustc
version have sharp drop after version 1.55. Why? Was that version, somehow, special? No. It was just version released one year ago. Most “serious” Rust developers don't want to create to much pain for their downstream users and support ancient, year-old versions of Rust (although many “less serious” developers only support three-months old Rust, that's why there are secondary drop after version 1.63).Development quote of the week
Unfortunately doing what Linux is doing is not, really, feasible because there are just too many programs which are not even supposed to work with random C compiler (and their developers don't plan to fix them).
It is just as feasible as for Linux. Programs that are not supposed to work with a particular C compiler are as irrelevant for the question of whether a newer version of that compiler breaks a program as programs that are not supposed to work on Linux are irrelevant for the question of whether a newer version of Linux breaks a Linux user space program (and there are probably more programs that work in Linux user space than are compiled by, say, gcc).
I saw that before. Like all O_PONIES proposals they end up in a trash can (like similar proposals for Linux which “doesn't break userspace”, lol) because they lack consensus.
You may have seen it, but you failed to understand it. My paper does not propose a tightening of the C standard. Instead, it tells C compiler maintainers how they can change their compilers without breaking existing, working, tested programs. Such programs may be compiler-specific and architecture-specific (so beyond anything that a standard tries to address), but that's no reason to break them on the next version of the same compiler on the same architecture.
> My paper does not propose a tightening of the C standard. Instead, it tells C compiler maintainers how they can change their compilers without breaking existing, working, tested programs.
Development quote of the week
O_PONIES
are possible, trust me” in a different words.O_PONIES
: they never bother to explain how exactly their “friendly C compiler” would work, they never explain what they propose to put inside, they just repeatedly assert that creation of black box of some shape is possible.Development quote of the week
If your proposal doesn't make it possible to explain what the “well behaving” program is then it's useless for the compiler developers.
My paper (not proposal) tells compilers not what a "well behaving" program is (that's not at all its point, and it's not relevant for a friendly C compiler), but how a friendly C compiler behaves. That may be useless for the developer of an adversarial C compiler, true.
[...] you haven't proven that if you create such a compiler you would be able to make anyone happy.
I leave it up to the reader to decide whether a backwards-compatible compiler would make them happier than an adversarial compiler. But the idea of a proof that a certain kind of compiler makes anybody happy is interesting. What methodology would you accept for the proof?
> My paper (not proposal) tells compilers not what a "well behaving" program is (that's not at all its point, and it's not relevant for a friendly C compiler), but how a friendly C compiler behaves.
Development quote of the week
O_PONIES
, O_PONIES
, and more O_PONIES
.gcc
and clang
.clang
(which is funny if you consider the fact that certain facilities in kernel are no compatible with GCC) haven't made the switch. Why do you think it happens?
John Regehr wrote: "A sufficiently advanced compiler is indistinguishable from an adversary." I don't agree that this is an "advance", but if compiler maintainers take the attitude that you are advocating here, the compilers are certainly going to become more adversarial the more sophisticated they get.
Development quote of the week
> I find it funny that someone who writes "O_PONIES" as frequently as you do is complaining about supposedly derogatory names.
Development quote of the week
O_PONIES
.Development quote of the week
Portability is valuable in many settings, and I think I am quite experienced in the area, with Gforth (which "breaks the rules" (in your terminology) a lot) usually working out of the box on new architectures and operating systems.
Development quote of the week
> But my point is that a friendly compiler must also work for non-portable code: If it works with one version of the compiler on a particular platform, it must also work with a later version of that compiler on that platform.
Development quote of the week
O_PONIES
s, really) from C users, but don't even know a single optimizing compiler developer who subscribes under that idea.Development quote of the week
> C and C++ have several orders of magnitude of users than Rust.
Development quote of the week
It's a tale as old as time.
Development quote of the week
Development quote of the week
what stops a compiler team deciding that supporting the behaviours of the previous version is too hard, and thus they're going to release a new compiler with the same CLI, that's not a "new version of the existing compiler"?
Self-respect.
Development quote of the week
There are some programs that don't need configure or the like to be portable, but many use a lot of stuff coming out of configure, and I am very pessimistic that we can get rid of conditional compilation.
Development quote of the week
Development quote of the week
> For example, left shift by too much can keep the same value (IIRC, ARM makes it 0).
Development quote of the week
> For example, left shift by too much can keep the same value (IIRC, ARM makes it 0).
Development quote of the week
push
and pop
instructions which may push or pop from 1 to 16 registers as result of one instructions. If you specify 0 registers then some manufacturers treat it as NOP
, some treat as UD
, but it's also permitted to load random set registers from stack including PC counter!Development quote of the week
Development quote of the week
Development quote of the week
Development quote of the week
> The thing is, the user did not break the rules.
Development quote of the week
A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this document places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
> For starters, your example only gives a semblance of a correct result when the optimizer is turned on, rather than off.
register
specifier exist if that's not so? Or do you want to imply that they are, somehow, initialized when you enter the function? That wouldn't be optimal now, would it? C never initialized variables, why should it start doing that now?5
. It's the exact same logic you were using for justification of the use of signed overflow: standard doesn't explain how that program works, but if you know how real hardware works then it's not hard to predict the outcome. And, indeed, most compilers (with optimizations disabled) produce precisely that outcome.a
lives on the stack in two functions. One function sets it while the other one reads it.register
, of course). How can you say all that have disappeared?Development quote of the week
int i;
int array[8];
for (i=7; i >0 ; i--) {
array[i] = 0;
}
return array[0];
}
Development quote of the week
> If you want your program to be predictable in any way, you cannot allow any instances of UB to occur in it - including in any tests for UB.
In other words, C IS NO LONGER FIT FOR SYSTEMS PROGRAMMING.
I don't remember if it was you, but when someone said "you need to program to the C model, not the hardware model", that is the death knell of a systems programming language. If you're not allowed to program to the hardware, why the hell are you using it to control said hardware?Development quote of the week
Development quote of the week
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirementsDevelopment quote of the week
> behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements
Wol
Development quote of the week
> By that definition, the only acceptable programming language for systems programming is... bare assembly.
Development quote of the week
unsafe
blocks, it's entirely different things if you need to think about these hundreds of possible UBs all the time when you write each and every line of code.Development quote of the week
C IS NO LONGER FIT FOR SYSTEMS PROGRAMMING.
In practice, what the compiler writers are doing is saying "there is no such thing as nonportable constructs", despite the standard explicitly allowing for it.
Development quote of the week
- call `unsafe` functions
- implement an `unsafe` trait
- mutate a `static`
- access `union` fields
Development quote of the week
Compiler: "Wait...what? What part of "not allowed to perform" are you not getting?"
No, the compiler does not say that. If compilers reported undefined behaviour at compile time, programmers would just work around it (and for new code, they would not even grumble). But compilers don't do that; they just miscompile, usually without warning.
> But compilers don't do that; they just miscompile, usually without warning.
Development quote of the week
Development quote of the week
Wol
> But the compiler clearly HAS caught undefined behaviour
Development quote of the week
unsafe
sublanguage works on the same principles as C.clang
and gcc
have switches that turn off these UBs thus difference is pretty superfluous.Development quote of the week
THIS. This has been the point that's missing from this whole thread IMHO.
Development quote of the week
A ⟹ B
and B ⟹ A
are very different things.O_PONIES
, O_PONIES
, and more O_PONIES
. Compiler couldn't help with what it doesn't have. These graphical lines which you observe in gcc
output or rusrc
is done by entirely different module specifically created to show them.O_PONIES
lovers don't know how to read these dumps, they don't care about them and would never understand them (or, rather, if they would do some effort of trying to understand them they would stop being ignoramuses wishing for O_PONIES
).clang
and gcc
maintainers would be grad to accept such patch.Development quote of the week
Wol
> And if you look at the laws of Pure Mathematics, it's not UB *at*all*.
Development quote of the week
clang
and gcc
provide plenty of such switches… there are many such controversial UBs and thus many such switches) or change the standard (yes, it's harder than swearing on various forums but have at least theoretical chance to work).clang
/gcc
nor Rust do that. Instead of declaring integer overflow UB Rust makes it IB and carefully describes all possible results in it's reference and the same with clang
/gcc
switches.Development quote of the week
Development quote of the week
Development quote of the week
> I described a system where addition and multiplication are only defined for a subset of pairs of inputs
Development quote of the week
A group is a set 𝐆 together with a binary operation on 𝐆, here denoted "·", that combines any two elements 𝓪 and 𝓫 to form an element of 𝐆, denoted 𝓪·𝓫, such that the following three requirements, known as group axioms, are satisfied: associativity, identity element, inverse element.
uint32_t(65536)
doesn't have an inverse element (that is: you can not find any 𝔁 to make uint32_t(𝔁)*uint32_t(65536)=uint32_t(1)
), it's a ring. Also a well defined mathematical object, but not a field.O_PONIES
lovers: they, usually, continue to support the notion that it doesn't mean anything (or, worse, start claiming that the fact that it's UB to divide by zero even for unsigned numbers is also a problem… even if rarely bites anyone because most people test for zero, but not for overflows).unsigned
numbers properly model ℤ₂ᴺ ring, while signed
numbers correctly models ℤ if (and only if) you ensure that operations on them don't overflow, but they don't form semigroup, group, abelian group, or ring.Development quote of the week
Development quote of the week
The gap between what I described and a field is that in a field, all input values for addition and multiplication result in an element of the set, whereas in what I described, some input values for addition and multiplication result in undefined behaviour.
Development quote of the week
Development quote of the week
Development quote of the week
-32768
is inverse of itself. In fact it's exact same ring which you have with unsigned numbers (most CPUs these days don't even bother to give you separate operations for signed or unsigned numbers, they only distinguish signed and unsigned numbers when you compare them, but ring only deals with addition, multiplication, and, like any ring, equality… ordering is not part of it).Development quote of the week
return i+1 > i;
}
Development quote of the week
Development quote of the week
Actually, if you compile for an environment where dereferencing a NULL pointer traps, it's a proper optimization to compile the same code for
Development quote of the week
x=*p; if (p!=NULL) { X; }
and for
x=*p; X;
In that environment you don't need to assume that undefined behaviour does not happen; instead, you know that p!=NULL after the first statement.