ISO standards as a baseline, not a target
Posted Aug 14, 2024 9:16 UTC (Wed) by khim (subscriber, #9252)
In reply to: ISO standards as a baseline, not a target by farnz
Parent article: Rust Project goals for 2024
> But note that the system could promise to set the EBX register to the instance handle if it so desired.
How? With the help of a time machine, sending someone into the past? Brad Chen (and, most likely, the whole Windows team) only found out about that interesting property when they broke it in Windows 8 and a customer complained about it!
> And a compiler could promise that your set/add example would work, even in the face of optimizations.
The same story: you would have to compile the program in a certain, clearly suboptimal way and promise to keep doing it exactly the same way in the future. If it's not UB but something documented, then sure, it can be supported. But if it's “I have noticed that your compiler treats it that way, and I know how the CPU works, so now you have to support that accidental property indefinitely”… that's impossible.
Even Linux struggles to support such accidental properties, and it only does so for the extremely narrow syscall interface, not for arbitrary code inside the kernel. There it's the opposite: for anything in the bowels of the Linux kernel it very loudly and explicitly refuses to provide any kind of compatibility.
The only way to achieve that is to stop describing the language in terms of semantics and instead describe precisely which sequence of machine instructions each language construct generates.
Then the curse of Rice's theorem is lifted and yes, UB disappears. But all optimizations disappear, too. That's how early Forth implementations worked.
> These are discussions that can be had with compiler authors, based on what you want the behaviour to be defined as; they should not be seen as "make the compiler do something sensible in the face of UB", but rather "have the compiler commit to a specific definition of the behaviour of this construct" - effectively a "GNU extension" or "LLVM extension" to the ISO standard.
That would have been a sensible approach, but none of the “we code for the hardware” people ever propose doing that. Instead they assert their right to use their knowledge of the hardware and demand that the compiler not interfere. They write ridiculous articles. And when someone tries to collect their proposals… oh, boy.
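To make "have the compiler commit to a specific definition" concrete, here is a minimal sketch (not from either poster) built on one such documented extension. GCC and Clang both document __builtin_add_overflow, which gives signed addition a defined result plus an overflow flag, whereas plain a + b is UB in ISO C when it overflows. The helper name add_overflows is hypothetical, and the code assumes a GCC- or Clang-compatible compiler.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper. Plain `a + b` is undefined behaviour in ISO C when
   it overflows. __builtin_add_overflow is a documented GCC/Clang extension
   that is defined for all inputs: it computes the exact sum, stores it
   converted (wrapped) to int in *sum, and returns true if the exact sum
   did not fit. */
static bool add_overflows(int a, int b, int *sum)
{
    return __builtin_add_overflow(a, b, sum);
}
```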
> or try to argue that the ISO standard is the only thing that defines the behaviour of a given compiler (and not the compiler's own documentation in addition to ISO).
Do you have any examples? I have never seen anyone claim that the compiler's own documentation is irrelevant. What I have seen is the assertion that other, third-party standards (like POSIX, a CPU specification, or other such documents) are irrelevant.
Compiler writers argue not that only the ISO standard matters, or that only the compiler documentation matters, but rather that nothing else matters! If there is an explicit promise in the compiler documentation, it will be upheld; I have never heard anything else. But all these “I know the documentation doesn't say anything about X, but I know the CPU does Y, thus I'm entitled to do Z”… nope, things don't work like that.
And while that decision is unfortunate from the POV of many developers (I can understand why the “we code for the hardware” people want to [ab]use the full freedom the hardware gives them), it's the only sensible one. The Intel SDM was 5057 pages last time I checked, while was 14777 pages last time I checked… and that's before we even discuss their applicability to what the compiler generates!
If compiler users had asked to add more rules to the compiler documentation, there would have been a chance of a sensible discussion.
But their demand is different! Essentially: stop breaking my programs; I don't care what you have to do to achieve that.
That's not a dialogue, that's an ultimatum and, worse, an ultimatum of a kind that compiler developers couldn't accept even if they wanted to!
Posted Aug 14, 2024 9:30 UTC (Wed)
by farnz (subscriber, #17727)
> How? With the help of a time machine, sending someone into the past?
By documenting that EBX will be set to the instance handle, and fixing the (retroactively declared) bug in Windows 8 via an update. This is not rocket science, and does not need a time machine - you can declare that the system as released was buggy, and provide an update to fix it, just as you do with any other bug.
> But if it's “I have noticed that your compiler treats it that way, and I know how the CPU works, so now you have to support that accidental property indefinitely”… that's impossible.
Sure, but the solution is not to keep going down the "your compiler treats it this way in version x.y.z, so you have to support this property forever" argument, which is a non-starter, but instead to go down the "this should be defined by the implementation as meaning this" argument. Unlike the "accidental property" argument, that one is actually productive: if your definition is unreasonable, the compiler developers can explain why it's unreasonable.
I have no online examples of people arguing that the ISO standard is the only thing that matters for defining C - these are arguments I've had in real life, where people have tried to tell me that it's unreasonable for a program documented as only compiling with GCC to use GNU extensions to C, since then I'm not using standard C. But I know I'm not using standard C - I've chosen to depend on GNU extensions!
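For illustration only, here is a sketch of what "documented as only compiling with GCC" can look like in the source itself. The file refuses to build unless the compiler claims GNU C compatibility (via __GNUC__, which Clang also defines), then deliberately uses GNU statement expressions and __typeof__. The macro name MAX_GNU is hypothetical.

```c
/* This file is documented as requiring GNU C (GCC, or a compiler such as
   Clang that implements the GNU extensions and defines __GNUC__).
   Refuse to compile anywhere else instead of silently misbehaving. */
#if !defined(__GNUC__)
#error "This program is written in GNU C, not ISO C: use GCC or a GNU-compatible compiler"
#endif

/* Deliberate GNU extensions: a statement expression ({ ... }) and
   __typeof__, so each macro argument is evaluated exactly once. */
#define MAX_GNU(a, b) ({         \
    __typeof__(a) a_ = (a);      \
    __typeof__(b) b_ = (b);      \
    a_ > b_ ? a_ : b_;           \
})
```

The check only proves the compiler claims GNU compatibility. Whether Clang's emulation of a particular extension matches GCC's documentation is still something you would have to verify.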
Posted Aug 14, 2024 9:58 UTC (Wed)
by mb (subscriber, #50428)
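> bug in Windows 8 via an update. This is not rocket science,
It pretty much is rocket science, because there is no way to ensure that EBX keeps the assigned value until the user code reads it.
The compiler is free to reuse the register at any point in time after the assignment, on both sides of the call.
There is no way to prevent that.
There is no EBX in C.
There is no "programming for the hardware" in C.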
Posted Aug 14, 2024 10:14 UTC (Wed)
by farnz (subscriber, #17727)
You can't do anything about it in user assembly; but you can define it as a bug in your toolchain if your toolchain uses EBX for anything other than the instance handle.
And your compiler absolutely can be defined as not using the register EBX for anything other than the instance handle; compilers have all sorts of special-purpose definitions of registers over and above those the hardware enforces, and this is perfectly normal. It's not necessarily a good idea, especially if you're register-constrained (like x86-32), but it's completely doable.
Finally, your "There is no EBX in C" is a problematic statement, because you're defining "C" very narrowly to justify it; there is no EBX in ISO Standard C (since there's no guarantee that you're compiling for a machine with an EBX), but there absolutely is an EBX in an implementation of C that targets x86-32 and permits inline assembly. And once you have an EBX, the compiler can choose to define what's in it - this is a discussion that can be had with the implementation, and chances are good that they'll have very strong reasons to use EBX as "just another register" rather than "always contains an instance handle", but that's not something that C mandates; rather, it's an implementation detail that the implementation has chosen not to document (along with most of the other implementation details - the only ones that need to be documented are those that the implementation wishes to promise you can rely upon when using this implementation of C).
And you absolutely can "program for the hardware" in C, just as you can in Rust, or Haskell, or JavaScript; you rely on documented behaviours of your implementation of the language to provide you with the semantics you desire. The only time this becomes problematic is when you assume that the implementation does things a certain way because that's how you'd implement it - unless the implementation documents that it does things a certain way, you can't assume anything about what it actually does.
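Here is a small sketch of relying on a documented behaviour of the implementation rather than on the hardware. ISO C makes the result of converting an out-of-range value to a signed integer type implementation-defined (C11 6.3.1.3p3), and GCC's manual documents its choice: reduction modulo 2^N. The function name is hypothetical, and the code is only correct for implementations that document this behaviour.

```c
#include <stdint.h>

/* Hypothetical helper. ISO C says converting a value that does not fit
   into a signed integer type gives an implementation-defined result (or
   raises an implementation-defined signal), C11 6.3.1.3p3. GCC documents
   its choice: the value is reduced modulo 2^N and no signal is raised.
   So this is correct for GCC as documented (and for compilers that make
   the same promise), not for every conforming implementation. */
static int32_t u32_to_i32_modulo(uint32_t u)
{
    return (int32_t)u;
}
```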
Posted Aug 14, 2024 11:30 UTC (Wed)
by khim (subscriber, #9252)
> You can't do anything about it in user assembly; but you can define it as a bug in your toolchain if your toolchain uses EBX for anything other than the instance handle.
But that's not how EBX was treated by previous versions of Windows at all. Normal arguments were passed on the stack, as usual. EBX was just a temporary register that the compiler used to work with the instance handle inside the loader; when the loader then started that instance, EBX still held that value. That was never documented or promised in any way; developers had no idea what EBX contained, and it was just sheer accident that it always contained the instance handle when the instance initialization routine was called.
It's not scalable: what would you do if someone else noticed that in some other place Windows XP keeps some other “interesting” info in EBX? And what about programs that poke into certain functions and change some bytes there (I know a guy who did such things to make it possible to use some internal flag in CreateFile)? Or programs that scan your kernel to find and remove CLI/STI instructions? Should these be supported, too? And no, that's not an April Fools' joke: these guys were actually crazy enough to send that thing into space!
> And you absolutely can "program for the hardware" in C, just as you can in Rust, or Haskell, or JavaScript; you rely on documented behaviours of your implementation of the language to provide you with the semantics you desire.
That's not the “we program for the hardware” approach. The whole premise of the “we program for the hardware” guys is that sometimes the compiler is “not clever enough” to do things in an optimal way, so they do it “behind the compiler's back” using things that “we know about our hardware, but the compiler doesn't know”. It's possible to do that, sometimes, with a fixed compiler and a fixed binary of everything else (think about those realtime drivers implemented in Windows 7 userspace by removing CLI/STI from the Windows kernel), but it's absolutely not a sustainable approach long-term, and demanding that a compiler support it is just crazy. And that's exactly what they demand: they don't want to postulate that the compiler has to keep the instance handle in EBX, no, that's not what they want.
They want the ability to write clever code (code that notices that EBX contains an instance handle in one place, the size of an allocated block in another, and something else “interesting” in a third) and then have all such crazy programs continue to work after an upgrade. Somehow. Isn't what you propose simply the “just don't write programs with UB” approach that the “we program for the hardware” crowd explicitly rejects? They don't want to redefine the list of UBs (which the compiler may then assume never happen); rather, they demand the right to write predictable programs with UB! That's the exact opposite of what you are proposing! That's the core issue: the list of UBs is malleable and negotiable, but “predictable treatment of programs with UB” is not possible. And the “we program for the hardware” crowd demands changes in how programs with UB are treated; they are not interested in changing the list of UBs!
Posted Aug 14, 2024 12:06 UTC (Wed)
by farnz (subscriber, #17727)
> That was never documented or promised in any way; developers had no idea what EBX contained, and it was just sheer accident that it always contained the instance handle when the instance initialization routine was called.
Right, but it doesn't need a time machine, or rocket science, or magic, to say that while this used to be a sheer accident, it's now the implementation's defined behaviour, and implementations that don't act this way are buggy as per Microsoft KB article XYZ, which links to updates that fix this for versions that were released with a bug.
There are lots of good reasons to not do this, but if this is something that Microsoft actually want, they can define the behaviour this way, and declare versions that don't put an instance handle in EBX at the documented moments as buggy and in need of a fix. This is no different to any other bug that's caught after release; you had behaviour that doesn't meet the documented requirements, so you change it in an update.
> That's not the “we program for the hardware” approach. The whole premise of the “we program for the hardware” guys is that sometimes the compiler is “not clever enough” to do things in an optimal way, so they do it “behind the compiler's back” using things that “we know about our hardware, but the compiler doesn't know”.
I'm lost; mb accused me of "programming for the hardware" when I said that if you want a given behaviour from the system, you need to document it and get the implementations to agree to follow your documentation (which can take implementation details like the EBX register into account, because it's a document about how to implement a construct). You're now saying that I can't document desired behaviour and ask the implementation to comply with my document, because that's not allowed either, since I'm asking the implementation to lock down a behaviour that's not locked down in the standard, but left as UB (or other loose definition).
How do I get an implementation to define a construct in a way that I'm happy with, given that I am not allowed to document the behaviour I want and ask the implementation to agree to comply with my documentation?
Posted Aug 14, 2024 13:09 UTC (Wed)
by khim (subscriber, #9252)
> I'm lost; mb accused me of "programming for the hardware" when I said that if you want a given behaviour from the system, you need to document it and get the implementations to agree to follow your documentation (which can take implementation details like the EBX register into account, because it's a document about how to implement a construct).
Small misunderstanding on your side. Please reread what you actually wrote (not what you meant to write): the critical part (a change to the actual toolchain) was omitted. Thus your proposal sounded as if you were offering to change the documentation and then fix Windows 8, somehow, without making a new toolchain that would treat EBX differently. I, too, interpreted it like that and wanted to object, because it's entirely unclear to me how you can “document and fix” that “bug” without changes to the toolchain. But then I saw your clarification and understood what you meant. mb, however, wrote his answer before seeing it, so he, like me, assumed you were saying that it's a bug in Windows (but not in the toolchain) that can be fixed, somehow, without changing the toolchain (and the definition of the language that the toolchain handles).
No. What I'm saying is that what you are proposing is not what the “we code for the hardware” crowd is proposing. They don't plan to document anything. Just read that damn document that was already referenced here. It's an ode to the resourcefulness of programmers and a proclamation that if these pesky compilers would stop breaking “perfectly valid programs” that are “only just exploiting things that the hardware does, but that the compiler doesn't know about”, then we would get much better results than with current compilers that are “abusing UB”. But, notably, what that “perfect plan” doesn't include is any proposal to change the list of UBs, or any plan to define and document anything. It says that the documentation should remain the same, with some small addition like “compilers should magically stop breaking our programs”. Similarly, Yodaiken claims that UB treatment in compilers is “wrong” and demands that compilers treat UB differently, but, notably, doesn't propose to document anything.
But we are not talking about something documented, but rather about the spectrum between “you want to say that our compiler always did that in these situations… wow, had no idea” and “yeah, we never promised to always do this, and always wanted to do that, but had no resources”: cases where there were explicitly no promises, but where investigation of the compiler and/or other parts of the system revealed that they always behave in a certain way (even if it was never documented). How do you propose to handle that without adding anything to the documentation? Remember: all these “we code for the hardware” guys don't talk about changes to the language specification, or extensions to it, or anything like that; they all “want to change the way the compiler interprets UB” while still keeping the exact behavior undocumented! Because changing and documenting things is ongoing work, and they don't want that. They want some magical solution that would ensure the compiler stops breaking their programs that rely on something officially defined as UB! Not that it would stop breaking any particular thing, but that it would “stop doing nasty things” (without even trying to describe the things that shouldn't be broken).
Posted Aug 14, 2024 13:17 UTC (Wed)
by farnz (subscriber, #17727)
Here's the problem: every time I talk about defining a behaviour in documentation outside the standard, you and mb tell me that I can't do that, because the resulting language is not purely standard C, and thus there can exist implementations that are compliant with the standard but not with my extension. But I'm saying that the only way to get what people want is to document a standard-compatible extension to C, and to say that what you're writing is not standard C, but instead standard C plus this documented set of extensions - and thus that an implementation that doesn't supply my documented set of extensions is a buggy implementation for the purposes of my program.
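One way to make "an implementation that doesn't supply my documented set of extensions is buggy for my program" enforceable is to turn the documented requirements into compile-time checks, so a non-conforming compiler rejects the program instead of miscompiling it. This is a minimal sketch, and the specific requirements below are hypothetical examples of such a document.

```c
#include <limits.h>

/* Hypothetical project requirements: "ISO C plus the following documented
   guarantees". A compiler that cannot meet them is treated as buggy for
   this program, so the build stops here instead of producing a binary. */
_Static_assert(CHAR_BIT == 8, "requires 8-bit bytes");
_Static_assert(sizeof(int) == 4, "requires 32-bit int");

/* Two's complement int (only guaranteed by ISO C since C23). */
_Static_assert(INT_MIN == -INT_MAX - 1, "requires two's complement int");

/* Right-shifting a negative value is implementation-defined in ISO C;
   require an arithmetic (sign-propagating) shift. A constant expression
   only shows what the compiler does at translation time, so this check
   also assumes the implementation documents the same behaviour at
   run time. */
_Static_assert((-1 >> 1) == -1, "requires arithmetic right shift");
```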
How, exactly, do I get a given construct to behave the way I want it to if I'm barred from documenting the behaviour I want because not all C compilers will follow my documentation?
Posted Aug 14, 2024 13:43 UTC (Wed)
by khim (subscriber, #9252)
> How, exactly, do I get a given construct to behave the way I want it to if I'm barred from documenting the behaviour I want because not all C compilers will follow my documentation?
You don't. It's as simple as that. You may get a promise that a given construct will behave the way you need it to, but compiler developers set a pretty high bar there: they would ask you to see whether the standard can be changed first and then, if and when your proposal is rejected, would still ask you to explain why you can't live without it. There are plenty of UBs that are documented in both clang and gcc, and there are plenty of extensions, but they all had to pass that [pretty high] bar: you need to explain why you need something, not merely that you want it. But what would you do if someone else wants another, different set of extensions? Compiler developers don't want to develop a compiler for a bazillion different, incompatible languages (and it's pretty obvious why), and the “we code for the hardware” people couldn't come up with any coherent proposals to change the language; they are only concerned about their programs, and they don't want to think about anything more abstract than what they are actually writing and what the compiler, according to them, breaks.
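Union type-punning is one example of behaviour that a language standard leaves undefined or unclear but a compiler documents. ISO C++ does not allow reading a union member other than the one last written, and in ISO C its status is spelled out only in a footnote. GCC's manual, by contrast, explicitly states that such type-punning is supported, even with -fstrict-aliasing, as long as the access goes through the union type. Here is a minimal C sketch, assuming 32-bit IEEE 754 float. The function names are hypothetical, and the memcpy version is the portable alternative that needs no such promise.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "requires 32-bit float");

/* Relies on GCC's documented promise: reading a union member other than
   the one last written reinterprets the bytes, provided the access goes
   through the union type. */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* Portable alternative: copy the object representation explicitly. */
static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```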
Posted Aug 14, 2024 14:24 UTC (Wed)
by farnz (subscriber, #17727)
> You don't. It's as simple as that.
So how, exactly, do I get a language that works for me? You've said that I'm not allowed to define and implement a language that works for me using ISO standard C as a base, because then my source code is not compatible with a compiler that complies with ISO standard C but not Farnz-C. I'm also not allowed to ask an implementation to use the freedom in ISO standard C the way I want it to. So, what am I allowed to do if I want a language that works for me?
> But what would you do if someone else wants another, different set of extensions? Compiler developers don't want to develop a compiler for a bazillion different, incompatible languages (and it's pretty obvious why), and the “we code for the hardware” people couldn't come up with any coherent proposals to change the language; they are only concerned about their programs, and they don't want to think about anything more abstract than what they are actually writing and what the compiler, according to them, breaks.
I either have to come up with an extension set that both I and the other people are happy with, and convince the compiler authors that they should follow that extension set, or I have to declare compilers buggy if they don't follow my documentation (and risk the set of non-buggy compilers for my specification being empty). The risk of an empty set of compilers that work for me is the stick that forces me to co-operate with everyone else and come up with an extension set atop the C standard that works for (e.g.) everyone on desktop Windows, or Android phones, or whatever.
However, you and mb seem to be telling me that I'm not allowed to do this - that because I can't get everyone in the world to agree on more than just the ISO standard, I must not attempt to come up with an extension to the ISO standard and convince compiler implementations to follow my extension in addition to the ISO standard. And that seems badly wrong to me; why can't someone attempt to convince the compiler writers to "fill in the gaps" of the ISO standard in a consistent way?
I'm not, by the way, claiming that it would be easy to do so (note John Regehr's problems with "Friendly C"); I'm merely claiming that such an extension to ISO is the only possible approach that could work, because instead of saying "you must behave in a way I find intuitive", you're saying "this ill-formed program according to ISO is interpreted as having this meaning", or "this undefined behaviour according to ISO is defined this way if you support this extension", and pushing people to agree on a single extension to the ISO standard.
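Here is what "this undefined behaviour according to ISO is defined this way if you support this extension" could mean for one construct: a hypothetical extension clause for oversized shifts, written out as a helper. The chosen semantics (a shift by the full width or more yields 0) are an assumption for illustration, not a claim about what Friendly C specified. The hardware does not settle the question either. x86 masks a 32-bit shift count to 5 bits, so a raw shift by 32 leaves the value unchanged, while 32-bit ARM's register-specified shifts produce 0. That is why the extension has to pick one answer and document it.

```c
#include <stdint.h>

/* Hypothetical extension clause, pinned down as code:
   "shifting a uint32_t left by 32 or more yields 0".
   In ISO C, x << n with n greater than or equal to the width of the
   promoted type is undefined behaviour (C11 6.5.7p3), so the helper
   never performs such a shift. */
static uint32_t shl_u32_ext(uint32_t x, unsigned n)
{
    return n >= 32 ? 0u : x << n;
}
```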