ISO standards as a baseline, not a target

Posted Aug 14, 2024 10:14 UTC (Wed) by farnz (subscriber, #17727)
In reply to: ISO standards as a baseline, not a target by mb
Parent article: Rust Project goals for 2024

You can't do anything about it in user assembly; but you can define it as a bug in your toolchain if your toolchain uses EBX for anything other than the instance handle.

And your compiler absolutely can be defined as not using the register EBX for anything other than the instance handle; compilers have all sorts of special-purpose definitions of registers over and above those the hardware enforces, and this is perfectly normal. It's not necessarily a good idea, especially if you're register-constrained (like x86-32), but it's completely doable.

Finally, your "There is no EBX in C" is a problematic statement, because you're defining "C" very narrowly to justify it; there is no EBX in ISO Standard C (since there's no guarantee that you're compiling for a machine with an EBX), but there absolutely is an EBX in an implementation of C that targets x86-32 and permits inline assembly. And once you have an EBX, the compiler can choose to define what's in it - this is a discussion that can be had with the implementation, and chances are good that they'll have very strong reasons to use EBX as "just another register" rather than "always contains an instance handle", but that's not something that C mandates; rather, it's an implementation detail that the implementation has chosen not to document (along with most of the other implementation details - the only ones that need to be documented are those that the implementation wishes to promise you can rely upon when using this implementation of C).

And you absolutely can "program for the hardware" in C, just as you can in Rust, or Haskell, or JavaScript; you rely on documented behaviours of your implementation of the language to provide you with the semantics you desire. The only time this becomes problematic is when you assume that the implementation does things a certain way because that's how you'd implement it - unless the implementation documents that it does things a certain way, you can't assume anything about what it actually does.



ISO standards as a baseline, not a target

Posted Aug 14, 2024 11:30 UTC (Wed) by khim (subscriber, #9252) [Link] (5 responses)

> You can't do anything about it in user assembly; but you can define it as a bug in your toolchain if your toolchain uses EBX for anything other than the instance handle.

But that's not how EBX was treated by previous versions of Windows at all.

Normal arguments were passed on the stack, as usual. And EBX was a temporary register that the compiler used to work with an instance handle in the loader. Then, when the loader started that instance, EBX kept that value.

That was never documented or promised in any way; developers had no idea what EBX contained. It was sheer accident that it always contained an instance handle when the instance initialization routine was called.

> And your compiler absolutely can be defined as not using the register EBX for anything other than the instance handle; compilers have all sorts of special-purpose definitions of registers over and above those the hardware enforces, and this is perfectly normal. It's not necessarily a good idea, especially if you're register-constrained (like x86-32), but it's completely doable.

It's not scalable: what would you do if someone else noticed that in some other place Windows XP keeps some other “interesting” info in EBX? And what about programs that poke into certain functions and change some bytes there (I know a guy who did such things to make it possible to use some internal flag in CreateFile)? Or programs that scan your kernel to find and remove CLI/STI instructions? Should these be supported, too? And no, that's not an April Fools' joke: these guys were actually crazy enough to send that thing into space!

> And you absolutely can "program for the hardware" in C, just as you can in Rust, or Haskell, or JavaScript; you rely on documented behaviours of your implementation of the language to provide you with the semantics you desire.

That's not the “we program for the hardware” approach. The whole premise of the “we program for the hardware” guys is that sometimes the compiler is “not clever enough” to do things in an optimal way, so we do it “behind the compiler's back” using things that “we know about our hardware, but compiler doesn't know”.

It's possible to do that, sometimes, with a fixed compiler and fixed binaries of everything else (think about those realtime drivers implemented in Windows 7 userspace via the removal of CLI/STI from the Windows kernel), but that's absolutely not a sustainable approach long-term, and demanding a compiler that supports it is just crazy.

And that's exactly what they demand: they don't want to postulate that the compiler has to keep an instance handle in EBX; no, that's not what they want. They want the ability to write clever code (that notices that EBX contains an instance handle in one place, the size of an allocated block in another place, and something else “interesting” in a third place), and then all such crazy programs should continue to work after an upgrade. Somehow.

> unless the implementation documents that it does things a certain way, you can't assume anything about what it actually does.

Isn't that the “just don't write programs with UB” approach that the “we program for the hardware” crowd explicitly rejects?

They don't want to redefine the list of UBs (which the compiler may then assume never happen); rather, they demand the right to write predictable programs with UB! That's the exact opposite of what you are proposing!

That's the core issue: the list of UBs is malleable and negotiable, but “predictable treatment of programs with UB” is not possible. And the “we program for the hardware” crowd demands changes in how programs with UB are treated; they are not interested in changing the list of UBs!
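A minimal sketch of that distinction, using a construct chosen here for illustration rather than one named in the thread. Reading the bits of a float through a cast pointer is the kind of thing that "works on the hardware" but is undefined in ISO C; copying the bytes with memcpy gets the same result without UB. The function name float_bits is hypothetical, and the sketch assumes a 32-bit float (the expected bit patterns in any check would further assume IEEE 754 binary32).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float is a 32-bit type on the target. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/*
 * "Hardware" reasoning says a float and a uint32_t are both four bytes, so
 *     uint32_t bits = *(uint32_t *)&f;
 * "obviously" loads the bit pattern. ISO C leaves that access undefined
 * (it violates the effective-type rules), so an optimizer may assume it
 * never happens. It appears only in this comment and is never executed.
 */

/* Defined alternative: copy the object representation with memcpy. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Calling float_bits(1.0f) on an IEEE 754 target yields 0x3F800000. The memcpy form asks for nothing beyond what the standard already promises, which is why it survives compiler upgrades where the cast form may not.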

ISO standards as a baseline, not a target

Posted Aug 14, 2024 12:06 UTC (Wed) by farnz (subscriber, #17727) [Link] (4 responses)

> Normal arguments were passed on the stack, as usual. And EBX was a temporary register that the compiler used to work with an instance handle in the loader. Then, when the loader started that instance, EBX kept that value.

> That was never documented or promised in any way; developers had no idea what EBX contained. It was sheer accident that it always contained an instance handle when the instance initialization routine was called.

Right, but it doesn't need a time machine, or rocket science, or magic, to say that while this used to be a sheer accident, it's now the implementation's defined behaviour, and implementations that don't act this way are buggy as per Microsoft KB article XYZ, which links to updates that fix this for versions that were released with a bug.

There are lots of good reasons to not do this, but if this is something that Microsoft actually want, they can define the behaviour this way, and declare versions that don't put an instance handle in EBX at the documented moments as buggy and in need of a fix. This is no different to any other bug that's caught after release; you had behaviour that doesn't meet the documented requirements, so you change it in an update.

> > And you absolutely can "program for the hardware" in C, just as you can in Rust, or Haskell, or JavaScript; you rely on documented behaviours of your implementation of the language to provide you with the semantics you desire.

> That's not the “we program for the hardware” approach. The whole premise of the “we program for the hardware” guys is that sometimes the compiler is “not clever enough” to do things in an optimal way, so we do it “behind the compiler's back” using things that “we know about our hardware, but compiler doesn't know”.

I'm lost; mb accused me of "programming for the hardware" when I said that if you want a given behaviour from the system, you need to document it and get the implementations to agree to follow your documentation (which can take implementation details like the EBX register into account, because it's a document about how to implement a construct). You're now saying that I can't document desired behaviour and ask the implementation to comply with my document, because that's not allowed either, since I'm asking the implementation to lock down a behaviour that's not locked down in the standard, but left as UB (or other loose definition).

How do I get an implementation to define a construct in a way that I'm happy with, given that I am not allowed to document the behaviour I want and ask the implementation to agree to comply with my documentation?

ISO standards as a baseline, not a target

Posted Aug 14, 2024 13:09 UTC (Wed) by khim (subscriber, #9252) [Link] (3 responses)

> I'm lost; mb accused me of "programming for the hardware" when I said that if you want a given behaviour from the system, you need to document it and get the implementations to agree to follow your documentation (which can take implementation details like the EBX register into account, because it's a document about how to implement a construct).

Small misunderstanding on your side. Please read what you actually wrote (not what you wanted to write):

> By documenting that EBX will be set to the instance handle, and fixing the (retroactively declared) bug in Windows 8 via an update.

The critical part (a change in the actual toolchain) was omitted. Thus your proposal sounded like you offered to change the documentation and then fix Windows 8, somehow, without making a new toolchain which would treat EBX differently.

I, too, interpreted it like that and wanted to object, because it's not at all clear to me how you can “document and fix” that “bug” without changes to the toolchain. But then I saw your clarification and understood what you meant. But mb wrote his answer before seeing it, so he, like me, assumed you were saying that it's a bug in Windows (but not in the toolchain) and can be fixed, somehow, without changing the toolchain (and the definition of the language that the toolchain handles).

> You're now saying that I can't document desired behaviour and ask the implementation to comply with my document, because that's not allowed either, since I'm asking the implementation to lock down a behaviour that's not locked down in the standard, but left as UB (or other loose definition).

No. What I'm saying is that what you are proposing is not what the “we code for the hardware” crowd is proposing. They don't plan to document anything. Just read that damn document that was already referenced here. It's an ode to the resourcefulness of programmers and a proclamation that if these pesky compilers would stop breaking “perfectly valid programs” that are “only just exploiting things that hardware does, but that compiler doesn't know about”, then we would get much better results than with current compilers that are “abusing UB”.

But, notably, what that “perfect plan” doesn't include is any proposal to change the list of UBs, or any plan to define and document anything. It says that the documentation should remain the same, with some small addition like “compilers should magically stop breaking our programs”. Similarly, Yodaiken claims that UB treatment in compilers is “wrong” and demands that compilers treat UBs differently but, notably, doesn't propose to document anything.

> How do I get an implementation to define a construct in a way that I'm happy with, given that I am not allowed to document the behaviour I want and ask the implementation to agree to comply with my documentation?

> This is no different to any other bug that's caught after release; you had behaviour that doesn't meet the documented requirements, so you change it in an update.

But we are not talking about something documented; we are talking about the spectrum between the “you want to say that our compiler always did that in these situations… wow, had no idea” and the “yeah, we never promised to always do this and always wanted to do that, but had no resources” cases.

These are cases where there were explicitly no promises, but where investigation of the compiler and/or the rest of the system revealed that they always behave in a certain way (even if it was never documented).

How do you propose to handle that without adding anything to the documentation? Remember: all these “we code for the hardware” guys don't talk about changes to the language specification, or extensions to it, or anything like that; they all “want to change the way compiler interprets UB” while still keeping the exact behavior undocumented!

Because changing and documenting things is ongoing work, and they don't want that. They want some magical solution which would ensure that the compiler stops breaking their programs that rely on something that's officially defined as UB! Not that it would stop breaking any particular thing, but that it will “stop doing nasty things” (without even trying to describe the things that shouldn't be broken).

ISO standards as a baseline, not a target

Posted Aug 14, 2024 13:17 UTC (Wed) by farnz (subscriber, #17727) [Link] (2 responses)

Here's the problem; every time I talk about defining a behaviour in documentation outside the standard, you and mb tell me that I can't do that, because the resulting language is not purely standard C, and thus there can exist implementations that are compliant with the standard but not with my extension. But I'm saying that the only way to get what people want is to document a standard-compatible extension to C, and to say that what you're writing is not standard C, but instead standard C plus this documented set of extensions - and thus that an implementation that doesn't supply my documented set of extensions is a buggy implementation for the purposes of my program.
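One way to write down "standard C plus this documented extension" is as a compile-time check in the program itself, so that an implementation which does not supply the behaviour is rejected instead of silently miscompiling. The sketch below is only an illustration: the chosen behaviour (sign-extending right shift of negative values, which ISO C leaves implementation-defined and which GCC documents) and the function name floor_div_pow2 are assumptions made for the example, not something proposed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical "extension document" for this program, expressed as a
 * compile-time requirement: right-shifting a negative signed integer must
 * sign-extend. ISO C leaves the result implementation-defined, so an
 * implementation that does something else is simply unsupported here.
 */
_Static_assert((-1 >> 1) == -1,
               "this program requires arithmetic right shift of negative values");

/* Divides by 2^shift, rounding toward negative infinity.
   Relies on the requirement checked above. */
int32_t floor_div_pow2(int32_t value, unsigned shift)
{
    assert(shift < 32); /* larger shift counts are undefined in ISO C */
    return value >> shift;
}
```

With the requirement met, floor_div_pow2(-7, 1) gives -4, whereas -7 / 2 gives -3 in ISO C. The design point is that the dependency is stated where the compiler can check it, not assumed because "that's what the hardware does".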

How, exactly, do I get a given construct to behave the way I want it to if I'm barred from documenting the behaviour I want because not all C compilers will follow my documentation?

ISO standards as a baseline, not a target

Posted Aug 14, 2024 13:43 UTC (Wed) by khim (subscriber, #9252) [Link] (1 responses)

> How, exactly, do I get a given construct to behave the way I want it to if I'm barred from documenting the behaviour I want because not all C compilers will follow my documentation?

You don't. It's as simple as that.

You may get a promise that a given construct will behave the way you need it to behave, but compiler developers put a pretty high bar there: they would ask you to see whether the standard can be changed first and then, if and when your proposal is rejected, would still ask you to explain why you couldn't live without it.

There are plenty of UBs that are documented in both clang and gcc, and there are plenty of extensions, but they all had to pass that [pretty high] bar: you need to explain why you need something, not merely want it.
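A small sketch of what relying on such a documented extension can look like. Signed integer overflow is undefined in ISO C, but GCC and Clang both document __builtin_add_overflow, which reports overflow instead of invoking it. The wrapper name checked_add and the portable fallback branch are assumptions added for the illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Returns true and stores a + b in *sum when the result fits in an int.
 * Returns false on overflow; in that case the contents of *sum should
 * not be relied upon (the two branches below treat it differently).
 */
bool checked_add(int a, int b, int *sum)
{
#if (defined(__GNUC__) && __GNUC__ >= 5) || defined(__clang__)
    /* Documented GCC/Clang extension: returns true if the add overflowed. */
    return !__builtin_add_overflow(a, b, sum);
#else
    /* Plain ISO C fallback: test the bounds before adding. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *sum = a + b;
    return true;
#endif
}
```

Either branch gives the caller a defined answer for checked_add(INT_MAX, 1, &s), which is the difference between a documented promise and an observation about what one compiler happened to emit.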

> But I'm saying that the only way to get what people want is to document a standard-compatible extension to C, and to say that what you're writing is not standard C, but instead standard C plus this documented set of extensions - and thus that an implementation that doesn't supply my documented set of extensions is a buggy implementation for the purposes of my program.

But what would you do if someone else wants another, different set of extensions?

Compiler developers don't want to develop a compiler for a bazillion different, incompatible languages (and it's pretty obvious why), and the “we code for the hardware” people couldn't make any coherent proposals to change the language: they are only concerned about their programs, and they don't want to think about anything more abstract than what they are actually writing and what the compiler, according to them, breaks.

ISO standards as a baseline, not a target

Posted Aug 14, 2024 14:24 UTC (Wed) by farnz (subscriber, #17727) [Link]

> > How, exactly, do I get a given construct to behave the way I want it to if I'm barred from documenting the behaviour I want because not all C compilers will follow my documentation?

> You don't. It's as simple as that.

So how, exactly, do I get a language that works for me? You've said that I'm not allowed to define and implement a language that works for me using ISO standard C as a base, because then my source code is not compatible with a compiler that complies with ISO standard C but not Farnz-C. I'm also not allowed to ask an implementation to use the freedom in ISO standard C the way I want it to. So, what am I allowed to do if I want a language that works for me?

> But what would you do if someone else wants another, different set of extensions?

> Compiler developers don't want to develop a compiler for a bazillion different, incompatible languages (and it's pretty obvious why), and the “we code for the hardware” people couldn't make any coherent proposals to change the language: they are only concerned about their programs, and they don't want to think about anything more abstract than what they are actually writing and what the compiler, according to them, breaks.

I either have to come up with an extension set that both me and the other people are happy with, and convince the compiler authors that they should follow that extension set, or I have to declare compilers as buggy if they don't follow my documentation (and risk the set of non-buggy compilers for my specification being empty). The risk of an empty set of compilers that work for me is the stick to force me to co-operate with everyone else, and come up with an extension set atop the C standard that works for (e.g.) everyone on desktop Windows, or Android phones, or whatever.

However, you and mb seem to be telling me that I'm not allowed to do this - that because I can't get everyone in the world to agree on more than just the ISO standard, I must not attempt to come up with an extension to the ISO standard and convince compiler implementations to follow my extension in addition to the ISO standard. And that seems badly wrong to me; why can't someone attempt to convince the compiler writers to "fill in the gaps" of the ISO standard in a consistent way?

I'm not, by the way, claiming that it would be easy to do so (I note John Regehr's problems with "Friendly C"). I'm merely claiming that such an extension to ISO is the only possible approach that could work, because instead of saying "you must behave in a way I find intuitive", you're saying "this ill-formed program according to ISO is interpreted as having this meaning", or "this undefined behaviour according to ISO is defined this way if you support this extension", and pushing people to agree on a single extension to the ISO standard.

