
IFNDR


Posted Aug 13, 2024 12:25 UTC (Tue) by tialaramex (subscriber, #21167)
In reply to: Rust vs C by khim
Parent article: Rust Project goals for 2024

Rice's Theorem leads in C++ to something technically much worse than Undefined Behaviour: programs which are Ill-Formed, No Diagnostic Required (IFNDR).

Because UB actually happens - even the nasty "time travel" UB, where something undefined definitely _would_ have happened in the future and therefore something perhaps unthinkably bad happens now instead - we can avert it at runtime.

For example, suppose we've got a classic time-travel-UB null check deletion in the Print Student feature; we can prevent the problem by never pressing that "Print" button. "Dear Credentials Teams, the Print Student button is defective, it is crucial that you do not use this button. If you require a printed student record, call internal ext 1000 and explain your need. We anticipate this bug will be fixed on Thursday". The null dereference never becomes inevitable, so the UB never happens; we're safe.

But suppose instead there's a categorical semantic mistake of the kind Rice's Theorem concerns: oops, we try to sort students by their grade score, but some students have a NaN score; grade scores are not a total order, so we can't sort them. This means the entire program never had any meaning. It's allowed to compile anyway (No Diagnostic Required), but it might do anything. You can warn users to stay away from the dangerous parts, but just running the software is potentially dangerous: even if the sorting by grade score only happens in tab Z, staying away from tab Z makes no difference to the fact that the entire program never had any particular meaning. All bets are off.

Most of those GC languages you mention do have UB - it's just rare in practice. Java doesn't, because Java expended a *lot* of effort to avoid UB, and I suspect today they would say it wasn't worth it - most languages just shrug it off. Rust is more special than I think most people understand for deliberately choosing not to have UB (in its safe subset).

On the other hand IFNDR is a complete travesty, we should not pretend you can write serious software with a compiler which may not even have checked you wrote a valid program. "False positives for the question, is this a valid C++ program?" is how it has been described. That's just not good enough.



IFNDR

Posted Aug 13, 2024 15:24 UTC (Tue) by kid_meier (subscriber, #93987)

I really don't understand the UB fatalism that seems to assume that if you have UB then anything can happen, there are no guarantees whatsoever, and the compiler is free to aggressively remove code because it can assume UB can't happen - so if it looks like UB, it must be dead code. This is absolutely a choice, and I've come over to the side that it's a pretty stupid choice. It's certainly a bad default.

I can't remember who posted it, but some time ago the paper "What every compiler writer should know about programmers" [1] appeared here in a comment thread about UB. It is fairly old now, and I am not expert enough to know whether compiler optimization has advanced so far that the gains at this point are significantly more than the state of the art at the time it was written, but I certainly have my doubts.

I just wanted to re-raise this side of the UB debate and push back against this fatalism that keeps being brought up here that there are no other options for handling UB other than the way clang/LLVM and gcc have chosen to handle it, i.e. by exploiting it as a logical impossibility and optimizing from there. Again, that is in no way inevitable. The C/C++ standards (AFAIK) make no such claim; it's entirely a result of the (compiler) industry's interpretation of the relevant standards, and it seems likely we'd be better off with a softer stance.

I get it - I'm not suggesting security problems would go away if this maximalist interpretation of UB were not the default, but certainly there are a number of well-publicised vulnerabilities that wouldn't have occurred if this weren't the industry norm. The horse has left the barn, so indeed this is somewhat pointless - the mitigations for this maximalist interpretation are well known, etc.

But it could have been another way, and other languages/compilers certainly could strike a different balance.

1. https://c9x.me/compile/bib/ubc.pdf

IFNDR

Posted Aug 13, 2024 16:23 UTC (Tue) by farnz (subscriber, #17727)

That paper is badly written - it makes two false assumptions (that optimizations rely on UB, and that they work on something closely resembling C source code), and then stacks up a bunch of strawmen on top of that.

In practice, any modern compiler works by translating from C to some IR such that the semantics of the IR are the same, for all defined behaviour in the C source, as the semantics of the original C source. Optimizations then transform the IR in a way that maintains the IR's semantics, but which is either more optimal itself, or sets up the preconditions for a later optimization to transform the IR in a useful way; in the latter case, there's usually also a much later optimization pass that undoes the optimization if it didn't do something useful (e.g. LLVM has a pass that transforms loops into LCSSA form, various passes that can only fire if the loop is in LCSSA form, and an instcombine pass that removes the "surplus" IR involved in LCSSA form).

As a consequence of this, no optimization actually depends on UB; rather, the optimizations depend on the IR correctly capturing the meaning of the input code, so that they don't optimize in the wrong direction; you can use godbolt.org to examine the optimization passes that LLVM applies to your C or C++ code, and it'll show you the diff to the LLVM IR each optimization produces.

In that godbolt link, I've set the filters to "hide inconsequential passes" (those which don't change the IR for whatever reason), and changed the options to show the full module at a time, so that you can see how LLVM goes from the unoptimized input to the optimized machine code. Every one of those passes in green is doing something to the LLVM IR, and ignoring the C input; if the C to LLVM IR translation is wrong (e.g. because it gives the "wrong" defined behaviour to something that's UB), then the output will be wrong, too.

The fix is for compilers to define more UB in interesting and useful fashion; to choose everyone's favourite whipping-boy, C says that arithmetic on signed numbers cannot overflow, which means that in LLVM IR, C signed arithmetic puts the "nsw" modifier on the arithmetic operation. If Clang redefined overflow as "either calls abort() or wraps in two's complement" (which ISO is perfectly happy for Clang to do), then Clang would have to remove the "nsw" modifier from the LLVM arithmetic operations that it emits.

That then defines the arithmetic as two's complement wrapping, and would ensure that LLVM optimizations that depend on the arithmetic not wrapping do not fire unless the compiler already knows via some other path that the arithmetic cannot wrap (e.g. because you're doing a / 10 + b / 10 + c / 10, which cannot wrap).

IFNDR

Posted Aug 13, 2024 17:06 UTC (Tue) by mussell (subscriber, #170320)

To add onto "not all optimizations rely on UB": both GCC and Clang will kill writes to non-volatile memory locations if there are no subsequent reads, as this is allowed by the as-if rule (ISO C section 5.1.2.3). Depending on how the compiler orders its passes, this can lead to interesting behaviour. Just as an example, here is a C program that compiles to an infinite loop with GCC but terminates immediately with Clang, as GCC propagates constants before killing empty loops whereas Clang does the opposite. Not only is this not UB, it is a completely acceptable optimization by ISO C, as section 6.8.5.2 says:
An iteration statement may be assumed by the implementation to terminate if its controlling expression is not a constant expression, and none of the following operations are performed in its body, controlling expression or (in the case of a for statement) its expression-3:
— input/output operations
— accessing a volatile object
— synchronization or atomic operations.

IFNDR

Posted Aug 13, 2024 17:03 UTC (Tue) by mb (subscriber, #50428)

>that keeps being brought up here that there are no other options of handling UB other than
>the way clang/LLVM and gcc have chosen to handle

That's a wrong assumption already.
Compilers do not "handle" UB. They do not "see UB" and then cause havoc just because they want to.

Compilers do the *opposite* thing of that. They assume that UB is not present in the program.
They do *not* see UB in the source code and react to that. They assume it's not there.

There are plenty of languages available with sane UB or without UB. Just use them.
C/C++ are obsolete.

IFNDR

Posted Aug 13, 2024 17:14 UTC (Tue) by Wol (subscriber, #4433)

> I just wanted to re-raise this side of the UB debate and push back against this fatalism that keeps being brought up here that there are no other options of handling UB other than the way clang/LLVM and gcc have chosen to handle it, ie. by exploiting it as a logical impossibility and optimizing from there. Again, that is in no way inevitable. The C/C++ standards (AFAIK) make no such claim; it's entirely a result of the (compiler) industry's interpretation of the relevant standards and it seems likely we'd be better off with a softer stance.

Unfortunately, at least in part, this seems driven by an obsession for EXECUTABLE speed, not development speed.

What's the saying, premature optimisation is the root of all evil? It probably makes sense from their point of view as compiler writers, they want fast runtimes. But it's a nightmare for the developer who makes a trivial change and then waits all day for the program to be rebuilt ...

It's like how relational databases target GUARANTEED response times at the cost of making ALL response times much slower - "you're guaranteed your data within 10 cache misses" - as opposed to MV, where you're 95% guaranteed your data with one cache miss (and 99% with two). Relational says "you have to suffer a slow database"; MV says "you have to risk an unlikely pathological case".

But the poor end user is rarely allowed to pick the most appropriate option - "tech knows best" and they suffer what they're given.

"Premature optimisation" as I know all too well at work ... or rather, no attempt at optimisation at all because "the database won't let us do it" or "we can't change current practice", or excuse excuse excuse ... nobody can look at the big picture and say "hey you're ruining it for everyone else!".

Cheers,
Wol

ISO standards as a baseline, not a target

Posted Aug 13, 2024 15:51 UTC (Tue) by farnz (subscriber, #17727)

Part of the problem is that the ISO committee clearly see themselves as defining the minimum baseline that an implementation must meet to call itself a C compiler; things get deemed IFNDR because a compiler on a restricted platform can't reasonably diagnose this, and things get deemed UB because there's existing conflicting interpretations such that portable code cannot use this construct (since it means radically different things on different platforms).

The expectation is that if you're on a "big" platform, an implementation will issue diagnostics for many things that are IFNDR, because while a diagnostic is not required, it's nice to issue one anyway; similarly, the expectation is that compilers will define things that are UB because either this platform has had a consistent definition all along (so why not continue to use it), or because there's only one sane definition for this on this platform, so why not?

We get problems, however, because the only document that you can lean on when you do something unpopular with either implementations or users is the ISO standard, and as a result, arguments about whether an implementation's choices are reasonable devolve into "this meets the minimum requirements to be an ISO C compiler, so you can't complain", rather than asking whether the implementation should do better than the minimum.

ISO standards as a baseline, not a target

Posted Aug 13, 2024 22:17 UTC (Tue) by khim (subscriber, #9252)

Sure, but the biggest issue is the lack of sane discussion between developers of compilers and users of compilers.

People are still arguing about even most basic and completely obvious facts.

Take the basis of the whole discussion: 100% of optimizations in 100% of compilers 100% of the time depend on the absence of UB in the translated program.

What could be simpler and more obvious? I mean questions like: why did the contents of the EBX register no longer contain a copy of the executable's instance handle when the executable entry point was called after the upgrade to Windows 8? People just have to understand that at least some UBs are sufficiently nasty that the only way to treat them is to declare programs that do such things “completely broken”, without any hope of treating them in any way other than “just don't do that or all bets are off”, before any kind of dialogue can happen.

The only sane answer to the question about EBX could be “there was never any promise that EBX would contain something useful”. And, similarly, one may write many such “crazy”, “totally insane” seemingly portable programs! Programs that may read seemingly uninitialised variables or poke into function internals (one is allowed to convert function pointers to integers and back in C), etc. Some of these programs are even pretty robust: my set/add example works on many compilers and a surprising number of platforms, even if I don't know of a single optimizing compiler that promises not to break it.

But technically, as far as the standard is concerned, that program belongs to the exact same class as a program whose “nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment”!

Both the programs that are so patently insane that you may only find one version of one compiler which wouldn't break them (e.g. if a program looks into its own compiled binary code to get a seed for a random number generator and then decodes assets using that number), and the programs that can be compiled by pretty much any sane compiler and only break on an obscure and rare compiler which couldn't handle a file that doesn't have a new-line character at the very end, are put in one large category: that's a program with UB.

As a result compiler users look at the hundreds of UBs - where the majority of the list is totally silly UBs that are only there to keep this or that compiler standards-compliant - and say that compilers could do better, that they are not required to treat UB as full carte blanche to do anything to a program that has such UB!

And then compiler developers hear that obviously incorrect assertion (because some UBs can only be treated as full carte blanche to do anything… e.g., as I have already said, I don't know of any compiler that tries to keep working programs that store random values on the stack in one function and then try to access them in another function) and interpret it as total ignorance and misunderstanding on the side of compiler users… and if you are talking to someone who doesn't understand logic and couldn't reason about anything, what choice do you have? Ignore them… that's just natural, after all.

ISO standards as a baseline, not a target

Posted Aug 14, 2024 8:21 UTC (Wed) by farnz (subscriber, #17727)

But note that the system could promise to set the EBX register to the instance handle if it so desired. And a compiler could promise that your set/add example would work, even in the face of optimizations.

These are discussions that can be had with compiler authors, based on what you want the behaviour to be defined as; they should not be seen as "make the compiler do something sensible in the face of UB", but rather "have the compiler commit to a specific definition of the behaviour of this construct" - effectively a "GNU extension" or "LLVM extension" to the ISO standard.

Where the discussion goes off the rails, and badly so, is when people either say that "in this situation" it should be given a specific meaning (that's not how this works - the optimizers are sufficiently far removed from the source that "in this situation" is meaningless), or try to argue that the ISO standard is the only thing that defines the behaviour of a given compiler (and not the compiler's own documentation in addition to ISO).

ISO standards as a baseline, not a target

Posted Aug 14, 2024 9:16 UTC (Wed) by khim (subscriber, #9252)

> But note that the system could promise to set the EBX register to the instance handle if it so desired.

How? With the help of a time machine? And sending someone into the past? Brad Chen (and, most likely, the whole Windows team) found out about that interesting property when they broke it in Windows 8 and one of the customers complained about it!

> And a compiler could promise that your set/add example would work, even in the face of optimizations.

The same story: you have to compile the program in a certain, clearly not optimal way and promise to do it the exact same way in the future. If it's not UB, but something documented, then sure, it can be supported. But if it's “I have noticed that your compiler treats it in that way and I know how the CPU works, so now you have to support that accidental property indefinitely”… that's impossible.

Even Linux struggles to support such accidental properties, and it only does that for the extremely narrow syscall interface, not for random code that the kernel may or may not have. There it's the opposite: for anything in the bowels of the Linux kernel it refuses, very loudly and explicitly, to support any kind of compatibility.

The only way to do that is to stop describing the language in terms of semantics and precisely describe what exact sequence of machine instructions is generated by each language construct.

Then the curse of Rice's theorem is lifted and yes, UB disappears. But all optimizations disappear, too. That's how early Forth implementations worked.

> These are discussions that can be had with compiler authors, based on what you want the behaviour to be defined as; they should not be seen as "make the compiler do something sensible in the face of UB", but rather "have the compiler commit to a specific definition of the behaviour of this construct" - effectively a "GNU extension" or "LLVM extension" to the ISO standard.

That would have been a sensible approach, but none of the “we code for the hardware” people ever propose to do that. Instead they try to promote their right to use knowledge of the hardware and demand that the compiler should not interfere. They write ridiculous articles. And when someone tries to collect their proposals… oh, boy.

> or try to argue that the ISO standard is the only thing that defines the behaviour of a given compiler (and not the compiler's own documentation in addition to ISO).

Do you have any examples? I have never seen anyone who was saying that compiler's own documentation is irrelevant. What I have seen is assertion that some other, third-party, standards (like POSIX or CPU specification or other such documents) are irrelevant.

Compiler writers argue not that only the ISO standard matters, or that only the compiler documentation matters, but rather that nothing else matters! If there is an explicit promise in the compiler documentation then it would be upheld - I have never heard anything else - but all these “I know the documentation doesn't say anything about X, but I know the CPU does Y, thus I'm entitled to do Z”… nope, things don't work like that.

And while that decision is unfortunate from the POV of many developers (I can understand why “we code for the hardware” people want to [ab]use the full freedom that hardware gives them), it's the only sensible one. The Intel SDK manual was 5057 pages last time I checked, while was 14777 pages last time I checked… and that's before we even discuss their applicability to what the compiler generates!

If compiler users had asked to add some more rules to the compiler documentation, then there would have been a chance of a sensible discussion.

But their demand is different! Essentially: stop breaking my programs, I don't care what you have to do for that.

That's not a dialogue, that's an ultimatum and, worse, an ultimatum of a kind that compiler developers couldn't accept even if they wanted to!

ISO standards as a baseline, not a target

Posted Aug 14, 2024 9:30 UTC (Wed) by farnz (subscriber, #17727)

How? With the help of a time machine? And sending someone into the past? Brad Chen (and, most likely, the whole Windows team) found out about that interesting property when they broke it in Windows 8 and one of the customers complained about it!

By documenting that EBX will be set to the instance handle, and fixing the (retroactively declared) bug in Windows 8 via an update. This is not rocket science, and does not need a time machine - you can declare that the system as released was buggy, and provide an update to fix it, just as you do with any other bug.

The same story: you have to compile the program in a certain, clearly not optimal way and promise to do it the exact same way in the future. If it's not UB, but something documented, then sure, it can be supported. But if it's “I have noticed that your compiler treats it in that way and I know how the CPU works, so now you have to support that accidental property indefinitely”… that's impossible.

Sure, but the solution is not to keep going down the "your compiler treats it this way in version x.y.z, so you have to support this property forever" argument, which is a non-starter, but instead to go down the "this should be defined by the implementation as meaning this" argument. Unlike the "accidental property" argument, that's actually productive, in that if your definition is unreasonable, the compiler developers can explain why it's unreasonable.

I have no online examples of people arguing that the ISO standard is the only thing that matters for defining C - these are arguments I've had in real life, where people have tried to tell me that it's unreasonable for a program documented as only compiling with GCC to use GNU extensions to C, since then I'm not using standard C. But I know I'm not using standard C - I've chosen to depend on GNU extensions!

ISO standards as a baseline, not a target

Posted Aug 14, 2024 9:58 UTC (Wed) by mb (subscriber, #50428)

>By documenting that EBX will be set to the instance handle, and fixing the (retroactively declared)
>bug in Windows 8 via an update. This is not rocket science,

It pretty much is rocket science, because there is no way to ensure EBX keeps the assigned value until the user code actually uses it.
The compiler is free to reuse the register at any point after the assignment, on both sides of the call.

There is no way to prevent that.

There is no EBX in C.
There is no "programming for the hardware" in C.

ISO standards as a baseline, not a target

Posted Aug 14, 2024 10:14 UTC (Wed) by farnz (subscriber, #17727)

You can't do anything about it in user assembly; but you can define it as a bug in your toolchain if your toolchain uses EBX for anything other than the instance handle.

And your compiler absolutely can be defined as not using the register EBX for anything other than the instance handle; compilers have all sorts of special-purpose definitions of registers over and above those the hardware enforces, and this is perfectly normal. It's not necessarily a good idea, especially if you're register-constrained (like x86-32), but it's completely doable.

Finally, your "There is no EBX in C" is a problematic statement, because you're defining "C" very narrowly to justify it; there is no EBX in ISO Standard C (since there's no guarantee that you're compiling for a machine with an EBX), but there absolutely is an EBX in an implementation of C that targets x86-32 and permits inline assembly. And once you have an EBX, the compiler can choose to define what's in it - this is a discussion that can be had with the implementation, and chances are good that they'll have very strong reasons to use EBX as "just another register" rather than "always contains an instance handle", but that's not something that C mandates; rather, it's an implementation detail that the implementation has chosen not to document (along with most of the other implementation details - the only ones that need to be documented are those that the implementation wishes to promise you can rely upon when using this implementation of C).

And you absolutely can "program for the hardware" in C, just as you can in Rust, or Haskell, or JavaScript; you rely on documented behaviours of your implementation of the language to provide you with the semantics you desire. The only time this becomes problematic is when you assume that the implementation does things a certain way because that's how you'd implement it - unless the implementation documents that it does things a certain way, you can't assume anything about what it actually does.

ISO standards as a baseline, not a target

Posted Aug 14, 2024 11:30 UTC (Wed) by khim (subscriber, #9252)

> You can't do anything about it in user assembly; but you can define it as a bug in your toolchain if your toolchain uses EBX for anything other than the instance handle.

But that's not how EBX was treated by previous versions of Windows at all.

Normal arguments were passed on the stack, as usual. And EBX was a temporary register that the compiler used to work with the instance handle in the loader. Then, when the loader started that instance, EBX still kept that value.

That was never documented or promised in any way; developers had no idea what EBX contained, it was just sheer accident that it always contained an instance handle when the instance initialization routine was called.

> And your compiler absolutely can be defined as not using the register EBX for anything other than the instance handle; compilers have all sorts of special-purpose definitions of registers over and above those the hardware enforces, and this is perfectly normal. It's not necessarily a good idea, especially if you're register-constrained (like x86-32), but it's completely doable.

It's not scalable: what would you do if someone else noticed that in some other place Windows XP keeps some other “interesting” info in EBX? And what about programs that poke into certain functions and change some bytes there (I know a guy who did such things to make it possible to use some internal flag in CreateFile)? Or programs that scan your kernel to find and remove CLI/STI instructions? Should these be supported, too? And no, that's not an April Fools' joke: these guys were actually crazy enough to send that thing into space!

> And you absolutely can "program for the hardware" in C, just as you can in Rust, or Haskell, or JavaScript; you rely on documented behaviours of your implementation of the language to provide you with the semantics you desire.

That's not the “we program for the hardware” approach. The whole premise of the “we program for the hardware” guys is that sometimes the compiler is “not clever enough” to do things in an optimal way, so we do it “behind the compiler's back” with the use of things that “we know about our hardware, but the compiler doesn't know”.

It's possible to do that, sometimes, with a fixed compiler and a fixed binary of everything else (think about those realtime drivers implemented in Windows 7 userspace via the removal of CLI/STI from the Windows kernel), but that's absolutely not a sustainable approach long-term, and demanding a compiler that supports it is just crazy.

And that's exactly what they demand: they don't want to postulate that the compiler has to keep the instance handle in EBX - no, that's not what they want. They want the ability to write clever code (that notices that EBX contains an instance handle in one place, the size of an allocated block in another place, and something else “interesting” in a third place), and then all such crazy programs should continue to work after an upgrade. Somehow.

> unless the implementation documents that it does things a certain way, you can't assume anything about what it actually does.

Isn't that “just don't write programs with UB” approach that “we program for the hardware” crowd explicitly rejects?

They don't want to redefine the list of UBs (which the compiler, then, may assume never happen) - rather, they demand the right to write predictable programs with UB! That's the exact opposite of what you are proposing!

That's the core issue: the list of UBs is malleable and negotiable, but “predictable treatment of programs with UB” is not possible. And the “we program for the hardware” crowd demands changes in how programs with UB are treated; they are not interested in changing the list of UBs!

ISO standards as a baseline, not a target

Posted Aug 14, 2024 12:06 UTC (Wed) by farnz (subscriber, #17727)

Normal arguments were passed on the stack, as usual. And EBX was a temporary register that the compiler used to work with the instance handle in the loader. Then, when the loader started that instance, EBX still kept that value.

That was never documented or promised in any way; developers had no idea what EBX contained, it was just sheer accident that it always contained an instance handle when the instance initialization routine was called.

Right, but it doesn't need a time machine, or rocket science, or magic, to say that while this used to be a sheer accident, it's now the implementation's defined behaviour, and implementations that don't act this way are buggy as per Microsoft KB article XYZ, which links to updates that fix this for versions that were released with a bug.

There are lots of good reasons to not do this, but if this is something that Microsoft actually want, they can define the behaviour this way, and declare versions that don't put an instance handle in EBX at the documented moments as buggy and in need of a fix. This is no different to any other bug that's caught after release; you had behaviour that doesn't meet the documented requirements, so you change it in an update.

And you absolutely can "program for the hardware" in C, just as you can in Rust, or Haskell, or JavaScript; you rely on documented behaviours of your implementation of the language to provide you with the semantics you desire.

That's not the “we program for the hardware” approach. The whole premise of the “we program for the hardware” guys is that sometimes the compiler is “not clever enough” to do things in an optimal way, so we do it “behind the compiler's back” with the use of things that “we know about our hardware, but the compiler doesn't know”.

I'm lost; mb accused me of "programming for the hardware" when I said that if you want a given behaviour from the system, you need to document it and get the implementations to agree to follow your documentation (which can take implementation details like the EBX register into account, because it's a document about how to implement a construct). You're now saying that I can't document desired behaviour and ask the implementation to comply with my document, because that's not allowed either, since I'm asking the implementation to lock down a behaviour that's not locked down in the standard, but left as UB (or other loose definition).

How do I get an implementation to define a construct in a way that I'm happy with, given that I am not allowed to document the behaviour I want and ask the implementation to agree to comply with my documentation?

ISO standards as a baseline, not a target

Posted Aug 14, 2024 13:09 UTC (Wed) by khim (subscriber, #9252) [Link] (3 responses)

> I'm lost; mb accused me of "programming for the hardware" when I said that if you want a given behaviour from the system, you need to document it and get the implementations to agree to follow your documentation (which can take implementation details like the EBX register into account, because it's a document about how to implement a construct).

Small misunderstanding on your side. Please read what you actually wrote (not what you wanted to write):

> By documenting that EBX will be set to the instance handle, and fixing the (retroactively declared) bug in Windows 8 via an update.

The critical part (the change to the actual toolchain) was omitted. Thus your proposal sounded as if you were offering to change the documentation and then fix Windows 8, somehow, without making a new toolchain that would treat EBX differently.

I, too, interpreted it like that and wanted to object, because it's entirely unclear to me how you can “document and fix” that “bug” without changes to the toolchain. But then I saw your clarification and understood what you meant. mb wrote his answer before seeing it, though, so he, like me, assumed you were saying that it's a bug in Windows (but not in the toolchain) and can be fixed, somehow, without changing the toolchain (and the definition of the language that toolchain handles).

> You're now saying that I can't document desired behaviour and ask the implementation to comply with my document, because that's not allowed either, since I'm asking the implementation to lock down a behaviour that's not locked down in the standard, but left as UB (or other loose definition).

No. What I'm saying is that what you are proposing is not what the “we code for the hardware” crowd is proposing. They don't plan to document anything. Just read that damn document that was already referenced here. It's an ode to the resourcefulness of programmers, and a proclamation that if these pesky compilers would stop breaking “perfectly valid programs” that are “only just exploiting things that hardware does, but that the compiler doesn't know about”, we would get much better results than with current compilers that are “abusing UB”.

But, notably, what that “perfect plan” doesn't include is any proposal to change the list of UBs, or any plan to define and document anything. It says the documentation should remain the same, with some small addition like “compilers should magically stop breaking our programs”. Similarly, Yodaiken claims that the treatment of UB in compilers is “wrong” and demands that compilers treat UBs differently but, notably, doesn't propose to document anything.

> How do I get an implementation to define a construct in a way that I'm happy with, given that I am not allowed to document the behaviour I want and ask the implementation to agree to comply with my documentation?

> This is no different to any other bug that's caught after release; you had behaviour that doesn't meet the documented requirements, so you change it in an update.

But we are not talking about something documented; rather, about the spectrum between “you want to say that our compiler always did that in these situations… wow, I had no idea” and “yeah, we never promised to always do this and always wanted to do that, but had no resources”.

Cases where there were explicitly no promises, but where investigation of the compiler and/or the rest of the system revealed that it always behaves in a certain way (even if that was never documented).

How do you propose to handle that without adding anything to the documentation? Remember: all these “we code for the hardware” guys don't talk about changes or expansions to the language specification, or anything like that; they all “want to change the way the compiler interprets UB” while still keeping the exact behaviour undocumented!

Because changing and documenting things is ongoing work, and they don't want that. They want some magical solution which would ensure that the compiler stops breaking their programs that rely on something that's officially defined as UB! Not that it stops breaking any particular thing, but that it will “stop doing nasty things” (without even trying to describe the things that shouldn't be broken).

ISO standards as a baseline, not a target

Posted Aug 14, 2024 13:17 UTC (Wed) by farnz (subscriber, #17727) [Link] (2 responses)

Here's the problem; every time I talk about defining a behaviour in documentation outside the standard, you and mb tell me that I can't do that, because the resulting language is not purely standard C, and thus there can exist implementations that are compliant with the standard but not with my extension. But I'm saying that the only way to get what people want is to document a standard-compatible extension to C, and to say that what you're writing is not standard C, but instead standard C plus this documented set of extensions - and thus that an implementation that doesn't supply my documented set of extensions is a buggy implementation for the purposes of my program.

How, exactly, do I get a given construct to behave the way I want it to if I'm barred from documenting the behaviour I want because not all C compilers will follow my documentation?

ISO standards as a baseline, not a target

Posted Aug 14, 2024 13:43 UTC (Wed) by khim (subscriber, #9252) [Link] (1 responses)

> How, exactly, do I get a given construct to behave the way I want it to if I'm barred from documenting the behaviour I want because not all C compilers will follow my documentation?

You don't get. It's as simple as that.

You may get a promise for a given construct to behave the way you need it to behave, but compiler developers put a pretty high bar there: they would ask you to see if the standard can be changed first and then, if and when your proposal is rejected, would still ask you to explain why you couldn't live without it.

There are plenty of UBs that are documented in both clang and gcc, and there are plenty of extensions, but they all had to pass that [pretty high] bar: you need to explain why you need something, not merely want it.

> But I'm saying that the only way to get what people want is to document a standard-compatible extension to C, and to say that what you're writing is not standard C, but instead standard C plus this documented set of extensions - and thus that an implementation that doesn't supply my documented set of extensions is a buggy implementation for the purposes of my program.

But what would you do if someone else wants another, different set of extensions?

Compiler developers don't want to develop a compiler for a bazillion different, incompatible languages (and it's pretty obvious why), and the “we code for the hardware” people couldn't come up with any coherent proposal to change the language; they are only concerned about their own programs, and don't want to think about anything more abstract than what they are actually writing and what the compiler, according to them, breaks.

ISO standards as a baseline, not a target

Posted Aug 14, 2024 14:24 UTC (Wed) by farnz (subscriber, #17727) [Link]

> How, exactly, do I get a given construct to behave the way I want it to if I'm barred from documenting the behaviour I want because not all C compilers will follow my documentation?

> You don't get. It's as simple as that.

So how, exactly, do I get a language that works for me? You've said that I'm not allowed to define and implement a language that works for me using ISO standard C as a base, because then my source code is not compatible with a compiler that complies with ISO standard C but not Farnz-C. I'm also not allowed to ask an implementation to use the freedom in ISO standard C the way I want it to. So, what am I allowed to do if I want a language that works for me?

> But what would you do if someone else wants another, different set of extensions?

> Compiler developers don't want to develop a compiler for a bazillion different, incompatible languages (and it's pretty obvious why), and the “we code for the hardware” people couldn't come up with any coherent proposal to change the language; they are only concerned about their own programs, and don't want to think about anything more abstract than what they are actually writing and what the compiler, according to them, breaks.

I either have to come up with an extension set that both I and the other people are happy with, and convince the compiler authors that they should follow that extension set, or I have to declare compilers buggy if they don't follow my documentation (and risk the set of non-buggy compilers for my specification being empty). The risk of an empty set of compilers that work for me is the stick that forces me to co-operate with everyone else, and come up with an extension set atop the C standard that works for (e.g.) everyone on desktop Windows, or Android phones, or whatever.

However, you and mb seem to be telling me that I'm not allowed to do this - that because I can't get everyone in the world to agree on more than just the ISO standard, I must not attempt to come up with an extension to the ISO standard and convince compiler implementations to follow my extension in addition to the ISO standard. And that seems badly wrong to me; why can't someone attempt to convince the compiler writers to "fill in the gaps" of the ISO standard in a consistent way?

I'm not, by the way, claiming that it would be easy to do so; I note John Regehr's problems with "Friendly C". I'm merely claiming that such an extension to the ISO standard is the only possible approach that could work, because instead of saying "you must behave in a way I find intuitive", you're saying "this ill-formed program according to ISO is interpreted as having this meaning", or "this undefined behaviour according to ISO is defined this way if you support this extension", and pushing people to agree on a single extension to the ISO standard.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds