Rust Project goals for 2024 [LWN.net]

Rust vs C

Posted Aug 12, 2024 16:01 UTC (Mon) by eharris (guest, #144549) [Link] (31 responses)

The "C Programming Langage", Kernigham and Richie, second edition is 272 pages long.
"Programmimg Rust", Blandy, Orendorff and Tindall, second edition is 692 pages long.

Q1: 272 vs 692 --- maybe there is a clue here?

Q2: Perhaps someone could write a RustToC program......and then we could get back to using gcc or llvm?

Just a suggestion.......

Rust vs C

Posted Aug 12, 2024 16:19 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> The "C Programming Langage", Kernigham and Richie, second edition is 272 pages long.

MISRA C 2023 Standard: 300 pages. "Safer C: Developing Software for High-integrity and Safety-critical Systems" - 250 pages. "Secure Coding in C and C++" - 600 pages.

Should I go on?

Rust vs C

Posted Aug 12, 2024 17:02 UTC (Mon) by develop7 (guest, #75021) [Link]

Go nuts, make it a lesson.

Concern trolling

Posted Aug 12, 2024 17:15 UTC (Mon) by intelfx (subscriber, #130118) [Link]

This kind of concern trolling helps nobody.

Rust vs C

Posted Aug 12, 2024 20:25 UTC (Mon) by dvdeug (guest, #10998) [Link] (21 responses)

"Greenspun's Tenth Rule of Programming: any sufficiently complicated C or Fortran program contains an ad hoc informally-specified bug-ridden slow implementation of half of Common Lisp." - Philip Greenspun

That was said in 1993. Part of the goals of virtually every serious language made since then has been to fix that problem, and that's why they're bigger.

Rust vs C

Posted Aug 12, 2024 22:06 UTC (Mon) by Wol (subscriber, #4433) [Link] (20 responses)

"Starting Forth" is 384 pages.

And presumably like Lisp, and unlike C and C++, both these languages don't bite you (to the best of my knowledge) with undefined behaviour.

Cheers,
Wol

Rust vs C

Posted Aug 13, 2024 8:33 UTC (Tue) by khim (subscriber, #9252) [Link] (19 responses)

> And presumably like Lisp, and unlike C and C++, both these languages don't bite you (to the best of my knowledge) with undefined behaviour.

Situation with Forth is funny, actually. Today there are two Forth camps, so to speak: primitive and free ones like GForth with very limited number of optimizations (and they are often used by real applications because what they lose in speed they win in code size) — and commercial ones with advanced optimizing compilers (not sure if any of them are attempting to do things like authomatic vectorization, but they definitely do inlining, etc).

These do have undefined behavior (and it's handled like it's handled in C, C++ and Rust, of course, for there are no choice), but for now it's mostly small things like when you try to modify the code while it runs (something that was Ok in old versions of Forth because it had no optimizations at all, but is no longer Ok with current definitions of Forth).

The story goes back to the fact any optimization is some kind of transformation of the code and you want to gurantee that optimized and unoptimized code behave identically. Like most “interesting” question that one, too, couldn't be answered if you don't limit the set of valid programs and the only question is whether to declare some useful programs “invalid” and rejected by compiler (this gives us “save” languages, usually with tracing GC and without low-level bit manipulations allowed) or whether to declared some “invalid” programs “wrong” and rejected by developer (this leads us to undefined behavior as can be seen in C/C++).

P.S. Of course there are no need to have hundreds of UBs, I think C/C++ is really an unique language where “this is UB” label is attached not after careful consideration, but more-or-less as knee jerk reaction.

IFNDR

Posted Aug 13, 2024 12:25 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (18 responses)

Rice's Theorem leads in C++ to something technically much worse than Undefined Behaviour, it gets you programs which are Ill-Formed No Diagnostic Required, IFNDR.

Because UB actually happens - even the nasty "time travel UB" where something undefined definitely _would_ have happened in the future and so therefore something perhaps unthinkably bad happens now instead - we can avert it at runtime.

For example suppose we've got a classic time travel UB null check deletion in the Print Student feature, we can prevent this problem by never pressing that "Print" button. "Dear Credentials Teams, the Print Student button is defective, it is crucial that you do not use this button. If you require a printed student record, call internal ext 1000 and explain your need. We anticipate this bug will be fixed on Thursday". The null dereference never become inevitable, so the UB never happens, we're safe.

But suppose instead there's a categorical semantic mistake of the kind Rice's Theorem concerns, oops, we try here to sort students by their grade score, but some students have a NaN score, grade scores are not a total order and so we can't sort them. This means the entire program never had any meaning, it's allowed to compile anyway (No Diagnostics Required) but it might do anything. You can warn users to stay away from the dangerous parts, but just running the software is potentially dangerous, even if the sorting by grade score only happens in tab Z, staying away from tab Z makes no difference to the fact the entire program never had any particular meaning. All bets are off.

Most of those GC languages you mention do have UB it's just rare in practice, Java doesn't because Java expended a *lot* of effort to avoid UB, and I suspect they would say it's too late now but it wasn't worth it - most languages just shrug it off. Rust is more special than I think most people understand for deliberately choosing not to have UB (in its safe subset).

On the other hand IFNDR is a complete travesty, we should not pretend you can write serious software with a compiler which may not even have checked you wrote a valid program. "False positives for the question, is this a valid C++ program?" is how it has been described. That's just not good enough.

IFNDR

Posted Aug 13, 2024 15:24 UTC (Tue) by kid_meier (subscriber, #93987) [Link] (4 responses)

I really don't understand the UB fatalism that seems to assume that you either have UB and thus anything can happen and there are no guarantees whatsoever and the compiler is free to aggressively remove code because it can assume UB can't happen so if it looks like UB it must be dead code. This is absolutely a choice and I've come over to the side that it's a pretty stupid choice. It's certainly a bad default.

I can't remember who, but some time ago this paper "What every compiler writer should know about programmers" [1] was posted here in a comment thread about UB. This is fairly old now, so I am not expert enough to know if compiler optimization has advanced so far that the gains at this point are significantly more than the state-of-the-art at the time it was written but I certainly have my doubts about it.

I just wanted to re-raise this side of the UB debate and push back against this fatalism that keeps being brought up here that there are no other options of handling UB other than the way clang/LLVM and gcc have chosen to handle it, ie. by exploiting it as a logical impossibility and optimizing from there. Again, that is in no way inevitable. The C/C++ standards (AFAIK) make no such claim; it's entirely a result of the (compiler) industry's interpretation of the relevant standards and it seems likely we'd be better off with a softer stance.

I get it, I'm not suggesting security problems would go away if this maximalist interpretation of UB was not the default, but certainly there are a number of well publicised vulnerabilities that wouldn't have occured if this wasn't the industry norm. The horse has left the barn, so indeed this is somewhat pointless -- the mitigations of this maximalist interpretation are well known, etc.

But it could have been another way, and other languages/compilers certainly could strike a different balance.

1. https://c9x.me/compile/bib/ubc.pdf

IFNDR

Posted Aug 13, 2024 16:23 UTC (Tue) by farnz (subscriber, #17727) [Link] (1 responses)

That paper is badly written - it makes two false assumptions (that optimizations rely on UB and are working with something that closely resembles C source code), and then stacks up a bunch of strawmen on top of that.

In practice, any modern compiler works by translating from C to some IR such that the semantics of the IR are the same, for all defined behaviour in the C source, as the semantics of the original C source. Optimizations then transform the IR in a way that maintains the IR's semantics, but which is either more optimal itself, or sets up the preconditions for a later optimization to transform the IR in a useful way; in the latter case, there's usually also a much later optimization pass that undoes the optimization if it didn't do something useful (e.g. LLVM has a pass that transforms loops into LCSSA form, various passes that can only fire if the loop is in LCSSA form, and an instcombine pass that removes the "surplus" IR involved in LCSSA form).

As a consequence of this, no optimization actually depends on UB; rather, the optimizations depend on the IR correctly capturing the meaning of the input code, so that they don't optimize in the wrong direction; you can use godbolt.org to examine the optimization passes that LLVM applies to your C or C++ code, and it'll show you the diff to the LLVM IR each optimization produces.

In that godbolt link, I've set the filters to "hide inconsequential passes" (those which don't change the IR for whatever reason), and changed the options to show the full module at a time, so that you can see how LLVM goes from the unoptimized input to the optimized machine code. Every one of those passes in green is doing something to the LLVM IR, and ignoring the C input; if the C to LLVM IR translation is wrong (e.g. because it gives the "wrong" defined behaviour to something that's UB), then the output will be wrong, too.

The fix is for compilers to define more UB in interesting and useful fashion; to choose everyone's favourite whipping-boy, C says that arithmetic on signed numbers cannot overflow, which means that in LLVM IR, C signed arithmetic puts the "nsw" modifier on the arithmetic operation. If Clang redefined overflow as "either calls abort() or wraps in twos' complement" (which ISO is perfectly happy for Clang to do), then Clang would have to remove the "nsw" modifier from LLVM arithmetic operations that it emits.

That then defines the arithmetic as twos' complement wrapping, and would ensure that LLVM optimizations that depend on the arithmetic not wrapping do not fire unless the compiler already knows via some other path that the arithmetic cannot wrap (e.g. because you're doing a / 10 + b / 10 + c / 10, which cannot wrap).

IFNDR

Posted Aug 13, 2024 17:06 UTC (Tue) by mussell (subscriber, #170320) [Link]

To add onto "not all optimizations rely on UB", both GCC and Clang will kill writes to non-volatile memory locations if there is no subsequent reads as this is allowed by the as-if rule (ISO C section 5.1.2.3). Depending on how the compiler orders its passes, this can lead to interesting behaviour. Just as an example, here is a C program that compiles to an infinite loop in GCC, but terminates immediately in Clang as GCC propagates constants before killing empty loops whereas Clang does the opposite. Not only is this not UB, but is a completely acceptable optimization by ISO C as section 6.8.5.2 says

An iteration statement may be assumed by the implementation to terminate if its controlling expression is not a constant expression, and none of the following operations are performed in its body, controlling expression or (in the case of a for statement) its expression-3:
— input/output operations
— accessing a volatile object
— synchronization or atomic operations.

IFNDR

Posted Aug 13, 2024 17:03 UTC (Tue) by mb (subscriber, #50428) [Link]

>that keeps being brought up here that there are no other options of handling UB other than
>the way clang/LLVM and gcc have chosen to handle

That's a wrong assumption already.
Compilers do not "handle" UB. They do not "see UB" and then cause havoc just because they want to.

Compilers do the *opposite* thing of that. They assume that UB is not present in the program.
They do *not* see UB in the source code and react to that. They assume it's not there.

There are plenty languages available with a sane UB or without UB. Just use them.
C/C++ are obsolete.

IFNDR

Posted Aug 13, 2024 17:14 UTC (Tue) by Wol (subscriber, #4433) [Link]

> I just wanted to re-raise this side of the UB debate and push back against this fatalism that keeps being brought up here that there are no other options of handling UB other than the way clang/LLVM and gcc have chosen to handle it, ie. by exploiting it as a logical impossibility and optimizing from there. Again, that is in no way inevitable. The C/C++ standards (AFAIK) make no such claim; it's entirely a result of the (compiler) industry's interpretation of the relevant standards and it seems likely we'd be better off with a softer stance.

Unfortunately, at least in part, this seems driven by an obsession for EXECUTABLE speed, not development speed.

What's the saying, premature optimisation is the root of all evil? It probably makes sense from their point of view as compiler writers, they want fast runtimes. But it's a nightmare for the developer who makes a trivial change and then waits all day for the program to be rebuilt ...

Like relational databases target GUARANTEED response times, at the cost of making ALL response times much slower - "you're guaranteed your data within 10 cache misses", as opposed to MV - you're 95% guaranteed your data with one cache miss (and 99% with two). Relational says "you have to suffer a slow database", MV says "you have to risk an unlikely pathological case".

But the poor end user is rarely allowed to pick the most appropriate option - "tech knows best" and they suffer what they're given.

"Premature optimisation" as I know all too well at work ... or rather, no attempt at optimisation at all because "the database won't let us do it" or "we can't change current practice", or excuse excuse excuse ... nobody can look at the big picture and say "hey you're ruining it for everyone else!".

Cheers,
Wol

ISO standards as a baseline, not a target

Posted Aug 13, 2024 15:51 UTC (Tue) by farnz (subscriber, #17727) [Link] (12 responses)

Part of the problem is that the ISO committee clearly see themselves as defining the minimum baseline that an implementation must meet to call itself a C compiler; things get deemed IFNDR because a compiler on a restricted platform can't reasonably diagnose this, and things get deemed UB because there's existing conflicting interpretations such that portable code cannot use this construct (since it means radically different things on different platforms).

The expectation is that if you're on a "big" platform, an implementation will issue diagnostics for many things that are IFNDR, because while a diagnostic is not required, it's nice to issue one anyway; similarly, the expectation is that compilers will define things that are UB because either this platform has had a consistent definition all along (so why not continue to use it), or because there's only one sane definition for this on this platform, so why not?

We get problems, however, because the only document that you can lean on when you do something unpopular with either implementations or users is the ISO standard, and as a result, arguments about whether an implementation's choices are reasonable devolve into "this meets the minimum requirements to be an ISO C compiler, so you can't complain", rather than asking whether the implementation should do better than the minimum.

ISO standards as a baseline, not a target

Posted Aug 13, 2024 22:17 UTC (Tue) by khim (subscriber, #9252) [Link] (11 responses)

Sure, but the biggest issue is the lack of sane discussion between developers of compilers and user of compilers.

People are still arguing about even most basic and completely obvious facts.

Take the basis of the whole discussion: 100% of optimizations in 100% of compilers 100% of time depend on absence of UB in a translated program.

What could be simpler and more obvious? I mean: questions like why the contents of the EBX register no longer contained a copy of the executable’s instance handle when the executable entry point was called after upgrde to Window 8?. People just have to understand that at least some UBs are sufficiently nasty that the only way to treat these is to declare programs that do such things “completely broken” without any hope to treat them in the way other than “just don't do that or all bets are off” before any kind of dialogue can happen.

The only sane answer to the question about EBX could be “there was never any promises that EBX would contain something useful”. And, similarly, one may write many such “crazy”, “totally insane” seemingly portable programs! Programs that may read seemingly uninitialised variables or poke in the function internals (one is allowed to convert function pointers to integer and back in C), etc. Some of these programs are even pretty robust: my set/add example works on many compilers and surprising number of platforms even if I don't know of even a single optimizing compiler that doesn't break it.

But technically, as far as standard is concerned, that program belongs to the exact same class of programs as program with a nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment!

Both the programs that are so patently insane that you may only find one version of one compiler which wouldn't break it (e.g. if program looks into it's own compiled binary code to get seed for a random number generator and then decodes assets using that number) and programs that can be compiled by pretty much any sane compiler and only break on obscure and rare compiler which couldn't handle file which doesn't have a new-line character at the very end are put in one large category: that's a program with UB.

As a result compiler users look on hundreds of UBs where majority of the list are totally silly UBs that are only there to make this or that compiler is standards compliant — and say that compilers may do better, they are not required to treat UBs as full carte blanche that allows one to do anything to the program that has such UB!

And then compiler developers hear that, obviously incorrect, assertion (because some UBs can only be treated as full carte blanche to do anything… e.g., as I have already said I don't know of any compiler that tries to keep programs that store random values on stack on one function and then try to access them in the other function working) and interpret it as total ignorance and misunderstanding on the side of compiler users… and if you are talking to someone who doesn't understand logic and couldn't reason about anything… what choice do you have? Ignore them…that's just natural, after all.

ISO standards as a baseline, not a target

Posted Aug 14, 2024 8:21 UTC (Wed) by farnz (subscriber, #17727) [Link] (10 responses)

But note that the system could promise to set the EBX register to the instance handle if it so desired. And a compiler could promise that your set/add example would work, even in the face of optimizations.

These are discussions that can be had with compiler authors, based on what you want the behaviour to be defined as; they should not be seen as "make the compiler do something sensible in the face of UB", but rather "have the compiler commit to a specific definition of the behaviour of this construct" - effectively a "GNU extension" or "LLVM extension" to the ISO standard.

Where the discussion goes off the rails, and badly so, is when people either say that "in this situation" it should be given a specific meaning (that's not how this works - the optimizers are sufficiently far removed from the source that "in this situation" is meaningless), or try to argue that the ISO standard is the only thing that defines the behaviour of a given compiler (and not the compiler's own documentation in addition to ISO).

ISO standards as a baseline, not a target

Posted Aug 14, 2024 9:16 UTC (Wed) by khim (subscriber, #9252) [Link] (9 responses)

> But note that the system could promise to set the EBX register to the instance handle if it so desired.

How? With the help of time machine? And sending someone into the past? Brad Chen (and, most likely, the whole Windows team) have found out about that interesting property when they broke it in Windows 8 and one of customers complained about it!

> And a compiler could promise that your set/add example would work, even in the face of optimizations.

The same story: you have to compile program in a certain, clearly not optimal way and promise to do it the exact same way in the future. If it's not an UB, but something documented then sure, it can be supported. But if it's “I have noticed that your compiler treats it in that way and I know how CPU work so now you have to support that accidental property indefinitely”… that's impossible.

Even Linux struggles to support such accidental properties and it only does that for extremely narrow syscall interface, not for random code that kernel may or may not have. There it's the opposite: for anything in the bowels of linux kernel it refuses to support any kinds of compatibility very loudly and explicitly .

The only way to do that is to stop describing language in terms of semantic and precisely describe what exact machine sequence of instructions is generated by each language construct.

Then the curse of Rice theorem is lifted and yes, UB disappears. But all optimizations disappear, too. That's how early Forth implementation worked.

> These are discussions that can be had with compiler authors, based on what you want the behaviour to be defined as; they should not be seen as "make the compiler do something sensible in the face of UB", but rather "have the compiler commit to a specific definition of the behaviour of this construct" - effectively a "GNU extension" or "LLVM extension" to the ISO standard.

That would have been sensible approach, but none of “we code for the hardware” people ever propose to do that. Instead they try to promote their right to use knowledge of the hardware and demand that compiler should not interfere. Write ridiculous articles. And when someone tries to collect their proposals… ho, boy.

> or try to argue that the ISO standard is the only thing that defines the behaviour of a given compiler (and not the compiler's own documentation in addition to ISO).

Do you have any examples? I have never seen anyone who was saying that compiler's own documentation is irrelevant. What I have seen is assertion that some other, third-party, standards (like POSIX or CPU specification or other such documents) are irrelevant.

Compiler writers argue not that only ISO standard that matters, or that only compiler documentation matters but rather that nothing else matters! If there are no explicit promise in the compiler documentation then it would be uphold, I never heard anything else, but all these “I know that documentation doesn't say anything about X but I know CPU does Y thus I'm entitled to do Z”… nope, things don't work like that.

And while that decision is unfortunate from POV of many developers (I can understand why “we code for the hardware” people want to [ab]use full freedom that hardware gives them), but it's the only sensible one. Intel SDK manual was 5057 pages last time I checked, while was 14777 pages last time I checked… and that's before we even discuss their applicability to what compiler generates!

If compiler users would have asked to add some more rules to the compiler documentation then there would have been a chance to have some sensible discussion.

But their demand is different! Essentially: stop breaking my programs, I don't care what you have to do for that.

That's not a dialogue, that's ultimatum and, worse, ultimatum of a kind that compiler developers couldn't accept even if they wanted to!

ISO standards as a baseline, not a target

Posted Aug 14, 2024 9:30 UTC (Wed) by farnz (subscriber, #17727) [Link] (8 responses)

How? With the help of time machine? And sending someone into the past? Brad Chen (and, most likely, the whole Windows team) have found out about that interesting property when they broke it in Windows 8 and one of customers complained about it!

By documenting that EBX will be set to the instance handle, and fixing the (retroactively declared) bug in Windows 8 via an update. This is not rocket science, and does not need a time machine - you can declare that the system as released was buggy, and provide an update to fix it, just as you do with any other bug.

The same story: you have to compile program in a certain, clearly not optimal way and promise to do it the exact same way in the future. If it's not an UB, but something documented then sure, it can be supported. But if it's “I have noticed that your compiler treats it in that way and I know how CPU work so now you have to support that accidental property indefinitely”… that's impossible.

Sure, but the solution is not to keep going down the "your compiler treats it this way in version x.y.z, so you have to support this property forever" argument, which is a non-starter, but instead go down the "this should be defined by the implementation as meaning this" argument. Unlike the "accidental property" argument, that's actually productive, in that if your definition is unreasonable, the compiler developers can explain why it's unreasonable.

I have no online examples of people arguing that the ISO standard is the only thing that matters for defining C - these are arguments I've had in real life, where people have tried to tell me that it's unreasonable for a program documented as only compiling with GCC to use GNU extensions to C, since then I'm not using standard C. But I know I'm not using standard C - I've chosen to depend on GNU extensions!

ISO standards as a baseline, not a target

Posted Aug 14, 2024 9:58 UTC (Wed) by mb (subscriber, #50428) [Link] (7 responses)

>By documenting that EBX will be set to the instance handle, and fixing the (retroactively declared)
>bug in Windows 8 via an update. This is not rocket science,

It pretty much is rocket science, because there is no way to ensure EBX keeps the assigned value until the user code uses it in their code.
The compiler is free to reuse the register at any point in time after the assignment. On both sides of the call.

There is no way to prevent that.

There is no EBX in C.
There is no "programming for the hardware" in C.

ISO standards as a baseline, not a target

Posted Aug 14, 2024 10:14 UTC (Wed) by farnz (subscriber, #17727) [Link] (6 responses)

You can't do anything about it in user assembly; but you can define it as a bug in your toolchain if your toolchain uses EBX for anything other than the instance handle.

And your compiler absolutely can be defined as not using the register EBX for anything other than the instance handle; compilers have all sorts of special-purpose definitions of registers over and above those the hardware enforces, and this is perfectly normal. It's not necessarily a good idea, especially if you're register-constrained (like x86-32), but it's completely doable.

Finally, your "There is no EBX in C" is a problematic statement, because you're defining "C" very narrowly to justify it; there is no EBX in ISO Standard C (since there's no guarantee that you're compiling for a machine with an EBX), but there absolutely is an EBX in an implementation of C that targets x86-32 and permits inline assembly. And once you have an EBX, the compiler can choose to define what's in it - this is a discussion that can be had with the implementation, and chances are good that they'll have very strong reasons to use EBX as "just another register" rather than "always contains an instance handle", but that's not something that C mandates; rather, it's an implementation detail that the implementation has chosen not to document (along with most of the other implementation details - the only ones that need to be documented are those that the implementation wishes to promise you can rely upon when using this implementation of C).

And you absolutely can "program for the hardware" in C, just as you can in Rust, or Haskell, or JavaScript; you rely on documented behaviours of your implementation of the language to provide you with the semantics you desire. The only time this becomes problematic is when you assume that the implementation does things a certain way because that's how you'd implement it - unless the implementation documents that it does things a certain way, you can't assume anything about what it actually does.

ISO standards as a baseline, not a target

Posted Aug 14, 2024 11:30 UTC (Wed) by khim (subscriber, #9252) [Link] (5 responses)

> You can't do anything about it in user assembly; but you can define it as a bug in your toolchain if your toolchain uses EBX for anything other than the instance handle.

But that's not how EBX was treated by previous versions of Windows at all.

Normal arguments were passed on stack, as usual. And EBX was a temporary register that compiler use to work with an instance handle in loader. Then, when loaded started that instance EBX kept that value.

That was never documented or promised in any way, developers had no idea what EBX contains, it was just a sheer accident that it always contained an instance handle when instance initialization routine was called.

> And your compiler absolutely can be defined as not using the register EBX for anything other than the instance handle; compilers have all sorts of special-purpose definitions of registers over and above those the hardware enforces, and this is perfectly normal. It's not necessarily a good idea, especially if you're register-constrained (like x86-32), but it's completely doable.

It's not scalable: what would you do if someone else would notice that in some other place Windows XP keeps some other “interesting” info in EBX? And what about programs that poke in certain functions and change some bytes there (I know guy who did such things to make it possible to use some internal flag in a CreateFile)? Or programs that scan your kernel to find and remove CLI/STI instructions? Should these be supported, too? And no, that's not a first April joke: these guys were actually crazy enough to send that thing to a space!

> And you absolutely can "program for the hardware" in C, just as you can in Rust, or Haskell, or JavaScript; you rely on documented behaviours of your implementation of the language to provide you with the semantics you desire.

That's not “we program for the hardware” approach. The whole premise of “we program for the hardware” guys is that sometimes compiler is “not clever enough” to do things in an optimal way so we do it “behind the compiler back” with the use of things that “we know about our hardware, but compiler doesn't know”.

It's possible to do that, sometimes, with a fixed compiler and fixed binary of everything else (think about these realtime drivers implemented in Windows 7 userspace via the removal of CLI/STI from Windows kernel), but that's absolutely not a sustainable approach long-term and demanding compiler that support it is just crazy.

And that's exactly what they demand: they don't want to postulate that compiler have to keep instance handle in EBX, no, that's not what they want. They want the ability to write clever code (that notices that EBX contains an instance handle in one place, but size of allocated block in the other space and something else “interesting” in third place) and then all such crazy programs should continue to work after upgrade. Somehow.

> unless the implementation documents that it does things a certain way, you can't assume anything about what it actually does.

Isn't that “just don't write programs with UB” approach that “we program for the hardware” crowd explicitly rejects?

They don't want to redefine list of UBs (that compiler, then, may assume not to ever happen), rather they demand the right to write predictable programs with UB! That's the exact opposite from what you are proposing!

That's the core issue: list of UBs is malleable and negotiable, but “predictable treatment of programs with UB” is not possible. And “we program for the hardware” crowd demands changes in how programs with UB are treated, they are not interested in changing the list of UBs!

ISO standards as a baseline, not a target

Posted Aug 14, 2024 12:06 UTC (Wed) by farnz (subscriber, #17727) [Link] (4 responses)

Normal arguments were passed on stack, as usual. And EBX was a temporary register that compiler use to work with an instance handle in loader. Then, when loaded started that instance EBX kept that value.
That was never documented or promised in any way, developers had no idea what EBX contains, it was just a sheer accident that it always contained an instance handle when instance initialization routine was called.

Right, but it doesn't need a time machine, or rocket science, or magic, to say that while this used to be a sheer accident, it's now the implementation's defined behaviour, and implementations that don't act this way are buggy as per Microsoft KB article XYZ, which links to updates that fix this for versions that were released with a bug.

There are lots of good reasons to not do this, but if this is something that Microsoft actually want, they can define the behaviour this way, and declare versions that don't put an instance handle in EBX at the documented moments as buggy and in need of a fix. This is no different to any other bug that's caught after release; you had behaviour that doesn't meet the documented requirements, so you change it in an update.

And you absolutely can "program for the hardware" in C, just as you can in Rust, or Haskell, or JavaScript; you rely on documented behaviours of your implementation of the language to provide you with the semantics you desire.

That's not “we program for the hardware” approach. The whole premise of “we program for the hardware” guys is that sometimes compiler is “not clever enough” to do things in an optimal way so we do it “behind the compiler back” with the use of things that “we know about our hardware, but compiler doesn't know”.

I'm lost; mb accused me of "programming for the hardware" when I said that if you want a given behaviour from the system, you need to document it and get the implementations to agree to follow your documentation (which can take implementation details like the EBX register into account, because it's a document about how to implement a construct). You're now saying that I can't document desired behaviour and ask the implementation to comply with my document, because that's not allowed either, since I'm asking the implementation to lock down a behaviour that's not locked down in the standard, but left as UB (or other loose definition).

How do I get an implementation to define a construct in a way that I'm happy with, given that I am not allowed to document the behaviour I want and ask the implementation to agree to comply with my documentation?

ISO standards as a baseline, not a target

Posted Aug 14, 2024 13:09 UTC (Wed) by khim (subscriber, #9252) [Link] (3 responses)

> I'm lost; mb accused me of "programming for the hardware" when I said that if you want a given behaviour from the system, you need to document it and get the implementations to agree to follow your documentation (which can take implementation details like the EBX register into account, because it's a document about how to implement a construct).

Small misunderstanding on your side. Please read what you actually wrote (not what you wanted to write):

> By documenting that EBX will be set to the instance handle, and fixing the (retroactively declared) bug in Windows 8 via an update.

The critical part (change in actual toolchain) was omitted. Thus you proposal sounded like you offered to change the documentation and then fix Windows 8, somehow, without making new toolchain which would treat EBX differently.

I, too, interpreted it like that and wanted to object. Because it's entirely not clear to me how can you “document and fix” that “bug” without changes to the toolchain. But then saw your clarification and understood what you meant. But mb wrote his answer before seeing it this he, like me, assumed you are saying that it's bug in Windows (but not in toolchain) and can be fixed, somehow, without changing toolchain (and definition of language that toolchain handles).

> You're now saying that I can't document desired behaviour and ask the implementation to comply with my document, because that's not allowed either, since I'm asking the implementation to lock down a behaviour that's not locked down in the standard, but left as UB (or other loose definition).

No. What I'm saying that what you are proposing is not what “we code for the hardware” crowd is proposing. They don't plant to document anything. Just read that damn document that was already referenced here. It's an ode to the resourcefulness of programmers and proclamation that if these pesky compilers would stop breaking “perfectly valid programs” and are “only just exploiting things that hardware does, but that compiler doesn't know about” then would get much better result than with current compilers that are “abusing UB”.

But, notably, what that “perfect plan” doesn't include are any proposals to change the list of UBs, any plans to define and document anything. It says that documentation should remain the same with some small addition like “compilers should magically stop breaking our programs”. Similarly to how Yodaiken claims that UB treatment in the compilers is “wrong” and demands that compilers should treat UBs differently, but, notably, doesn't propose to document anything.

How do I get an implementation to define a construct in a way that I'm happy with, given that I am not allowed to document the behaviour I want and ask the implementation to agree to comply with my documentation? > This is no different to any other bug that's caught after release; you had behaviour that doesn't meet the documented requirements, so you change it in an update.

But we are not talking about something documented, but more of the spectre between “you want to say that our compiler always did that in these situations… wow, had no idea” and “yeah, we never promised to always do this and always wanted to do that, but had no resources” cases.

Cases where there were explicitly no promises, but where investigation of compiler and/or other system revealed that they always behave in a certain way (even if it was never documented).

How do you propose to handle that without adding anything to documentation? Remember: all these “we code for the hardware” guys don't talk about changes to the language specification or expansion to it, or anything like that, they all “want to change the way compiler interprets UB” while still keeping the exact behavior undocumented!

Because changing and documenting things is ongoing work and they don't want that. They want some magical solution which would ensure that compiler would stop breaking their programs that rely on something that's officially defined as UB! Not would stop breaking any particular thing, but will “stop doing nasty things” (without even trying to describe these things that shouldn't be broken).

ISO standards as a baseline, not a target

Posted Aug 14, 2024 13:17 UTC (Wed) by farnz (subscriber, #17727) [Link] (2 responses)

Here's the problem; every time I talk about defining a behaviour in documentation outside the standard, you and mb tell me that I can't do that, because the resulting language is not purely standard C, and thus there can exist implementations that are compliant with the standard but not with my extension. But I'm saying that the only way to get what people want is to document a standard-compatible extension to C, and to say that what you're writing is not standard C, but instead standard C plus this documented set of extensions - and thus that an implementation that doesn't supply my documented set of extensions is a buggy implementation for the purposes of my program.

How, exactly, do I get a given construct to behave the way I want it to if I'm barred from documenting the behaviour I want because not all C compilers will follow my documentation?

ISO standards as a baseline, not a target

Posted Aug 14, 2024 13:43 UTC (Wed) by khim (subscriber, #9252) [Link] (1 responses)

> How, exactly, do I get a given construct to behave the way I want it to if I'm barred from documenting the behaviour I want because not all C compilers will follow my documentation?

You don't get. It's as simple as that.

You may get a promise for a given construct to behave the way you need it to behave, but compiler developers put a pretty high bar there: they would ask you to see if standard can be changed first and then, if and when your proposal would be rejected, would still ask you to explain why you couldn't live without it.

There are plenty of UBs that are documented in both clang and gcc and there are plenty of extensions, but they all had to pass that [pretty high] bar: you need to explain why you need something, not want something.

> But I'm saying that the only way to get what people want is to document a standard-compatible extension to C, and to say that what you're writing is not standard C, but instead standard C plus this documented set of extensions - and thus that an implementation that doesn't supply my documented set of extensions is a buggy implementation for the purposes of my program.

But what would you do if someone else wants another, different set of extensions?

Compiler developers don't want to develop compiler for a bazillion different, incompatible, languages (and it's pretty obvious why) and “we code for the hardware” people couldn't propose any coherent offers to change the language, they are only concerned about their programs, they don't want to think about anything more abstract then what they are actually writing and what compiler, according to them, breaks.

ISO standards as a baseline, not a target

Posted Aug 14, 2024 14:24 UTC (Wed) by farnz (subscriber, #17727) [Link]

How, exactly, do I get a given construct to behave the way I want it to if I'm barred from documenting the behaviour I want because not all C compilers will follow my documentation?
You don't get. It's as simple as that.

So how, exactly, do I get a language that works for me? You've said that I'm not allowed to define and implement a language that works for me using ISO standard C as a base, because then my source code is not compatible with a compiler that complies with ISO standard C but not Farnz-C. I'm also not allowed to ask an implementation to use the freedom in ISO standard C the way I want it to. So, what am I allowed to do if I want a language that works for me?

But what would you do if someone else wants another, different set of extensions?
Compiler developers don't want to develop compiler for a bazillion different, incompatible, languages (and it's pretty obvious why) and “we code for the hardware” people couldn't propose any coherent offers to change the language, they are only concerned about their programs, they don't want to think about anything more abstract then what they are actually writing and what compiler, according to them, breaks.

I either have to come up with an extension set that both me and the other people are happy with, and convince the compiler authors that they should follow that extension set, or I have to declare compilers as buggy if they don't follow my documentation (and risk the set of non-buggy compilers for my specification being empty). The risk of an empty set of compilers that work for me is the stick to force me to co-operate with everyone else, and come up with an extension set atop the C standard that works for (e.g.) everyone on desktop Windows, or Android phones, or whatever.

However, you and mb seem to be telling me that I'm not allowed to do this - that because I can't get everyone in the world to agree on more than just the ISO standard, I must not attempt to come up with an extension to the ISO standard and convince compiler implementations to follow my extension in addition to the ISO standard. And that seems badly wrong to me; why can't someone attempt to convince the compiler writers to "fill in the gaps" of the ISO standard in a consistent way?

I'm not, by the way, claiming that it would be easy to do so; I note John Regehr's problems with "Friendly C"; merely claiming that such an extension to ISO is the only possible approach that could work, because instead of saying "you must behave in a way I find intuitive", you're saying "this ill-formed program according to ISO is interpreted as having this meaning", or "this undefined behaviour according to ISO is defined this way if you support this extension", and pushing people to agree on a single extension to the ISO standard.

Rust vs C

Posted Aug 12, 2024 22:32 UTC (Mon) by linuxrocks123 (subscriber, #34648) [Link]

Rust has a builtin FFI for C, and there are projects like cxx.rs for example which try to make it easy to link with C++. So, I wouldn't worry too much about there not being a "Rust2C" tool. It's already pretty easy to use Rust code without having to use Rust.

Rust vs C

Posted Aug 13, 2024 9:17 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

> Q2: Perhaps someone could write a RustToC program......and then we could get back to using gcc or llvm?

mrustc already exists, but it's goal is not to provide something that anyone may use for random rust projects, but solelely to make it possible to bootstrap regular rustc on a platform that doesn't have it.

> Q1: 272 vs 692 --- maybe there is a clue here?

Oh, absolutely. K&R is not even a language, it doesn't define so many things that you could write neither compiler nor working program using just K&R book, alone. Not only it leaves many important things not explains (like, e.g., how can printf or open functions exist if each function have to have fixed number of arguments), but, maybe even more importantly, today, C doesn't include even most basic building blocks like hashsets or dynamic strings (which means that every project have to grow it's own, incompatible, set).

I'm not sure if that's possible to write real program while only reading Programming Rust and nothing else (probably not), but it's definitely a step in a right direction.

Complexity have to live somewhere and if you try to make your seminal book about the programming language extra-simple then you make everything else around said book needlessly complex.

Of course it's all about trade-offs and it wouldn't be a good idea to collect all knowledge that one may need ever into one huge 1000000 pages language tutorial, but 272 pages of K&R is not an advantage but a liability.

Not all problems that C/C++ (and systems built on top of it) faces today can be trace to that attempt overuse worse is better, but just the majority of them. Most of the issues that we bitterly fight today can be traced to that fateful decision to abandon things like safety or correctness to make sure C compiler would fit into that 16KB PDP-11.

Today typical size of PC memory is not 16KB, and not even 16MB, but more like 16GB — and yet we still don't have “nice things” in C because of that fixation of throwing away all “complications” from the good language.

P.S. The saddest thing is really C++, not C. At least C had a reason to cripple the concept of high-level language. That reason may not exist anymore but it was actually valid when C was invented. C++… ugh. C++ adds insane amount of complexity yet it took decades for it to get some very basic features. Things like the ability to check if element is in a set (added in C++20) or the ability to return either error or value (added in C+23), etc. That's kinda-sorta-understandable, too: C++ was chasing the OOP dream (which turned out to be a mirage) and that's why basic necessities that C removed to fit into tiny systems are only returning now and in a very twisted fashion, but that's fundamentally different from C path: C pretty much had to remove basic necessities, or else it would have been not viable for the task that it was supposed to solve, but why have C++ added so much cruft before adding back things that C removed?

Rust vs C

Posted Aug 14, 2024 15:59 UTC (Wed) by Karellen (subscriber, #67644) [Link]

> Q1: 272 vs 692 --- maybe there is a clue here?

Not only it leaves many important things not explains (like, e.g., how can printf or open functions exist if each function have to have fixed number of arguments),

FWIW, my copy of The C Programming Language is 2nd ed, but is 272 pages long, so is (I presume) the edition under discussion.

In that edition, the part that describes printf() in detail is Section 7.2 on pp. 153, which is immediately followed on pp. 155 by Section 7.3 Variable-length Argument Lists which describes the ... declaration syntax, the va_list type and the va_start, va_arg and va_end macros, and then gives an example of how to write a cut-down minprintf() using those elements.

Rust vs C

Posted Aug 13, 2024 13:55 UTC (Tue) by Wol (subscriber, #4433) [Link] (1 responses)

> Q2: Perhaps someone could write a RustToC program......and then we could get back to using gcc or llvm?

I thought Rust *relied* on llvm ...

Cheers,
Wol

Rust vs C

Posted Aug 13, 2024 17:30 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

Probably meant clang. But yes, rustc primarily uses LLVM. It has Cranelift (meant for quicker debug builds on x86_64) and a gcc backend (not gccrs) under development. Rust is not beholden to LLVM as a language though.

Rust vs C

Posted Aug 14, 2024 6:10 UTC (Wed) by jezuch (subscriber, #52988) [Link]

Well, The Brainfuck specification, when printed from its github Readme, is 10 pages long, and it's a really verbose description of the language. By your criteria, we should all program in Brainfuck, right?

Rust design axioms for Linux

Posted Aug 12, 2024 17:59 UTC (Mon) by tux3 (subscriber, #101245) [Link]

>Design Axioms
> First, do no harm. If we want to make a good first impression on kernel developers, the minimum we can do is fit comfortably within their existing workflows so that people not using Rust don't have to do extra work to support it. So long as Linux relies on unstable features, users will have to ensure they have the correct version of Rust installed, which means imposing labor on all Kernel developers.
> Don't let perfect be the enemy of good. The primary goal is to offer stable support for the particular use cases that the Linux kernel requires. Wherever possible we aim to stabilize features completely, but if necessary, we can try to stabilize a subset of functionality that meets the kernel developers' needs while leaving other aspects unstable.

That is very nice to see. Not just enthusiasm from the Rust project, but a good bit of thoughtfulness too.

Distributed compiling?

Posted Aug 12, 2024 19:36 UTC (Mon) by mss (subscriber, #138799) [Link] (13 responses)

It looks like distributed compiling didn't make that list of Rust goals.

For C/C++ we have distcc, which works pretty reliably for distributed building of large C++ packages like Chromium, Firefox or Thunderbird.

On the other hand, AFAIK the only equivalent for Rust is sccache which seems a bit broken: 1 2.

If amount of Rust code in packages were to increase significantly that's a compile time gap that will need to be fixed somehow.

Distributed compiling?

Posted Aug 12, 2024 20:00 UTC (Mon) by epa (subscriber, #39769) [Link] (10 responses)

Doesn’t the way Rust works make it harder to distribute compiling? In C you can compile each source file to an object file, with a fairly low-level interface between objects, tied together by the linker. In Rust you need greater whole-program analysis so you can’t simply compile a hundred source files with a hundred independent compilers and link them afterwards. Isn’t that so?

Distributed compiling?

Posted Aug 12, 2024 20:07 UTC (Mon) by mb (subscriber, #50428) [Link] (5 responses)

If you use the exact same version of the compiler and cargo on all machines, distribution should be possible on crate level. Which is probably good enough.

Distributed compiling?

Posted Aug 13, 2024 19:12 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (4 responses)

Maybe I'm just a fool, but I don't see how that can possibly work in the general case. How does it handle generics and monomorphization? For example, consider the following code directly from the stdlib (specifically in core/iter/traits/iterator.rs):

fn fold<B, F>(mut self, init: B, mut f: F) -> B
where
Self: Sized,
F: FnMut(B, Self::Item) -> B,
{
let mut accum = init;
while let Some(x) = self.next() {
accum = f(accum, x);
}
accum
}

Since std is a crate, we should be able to compile the above function separately from code that calls into it. But I don't see how you compile this function into something with a fixed ABI that can be called by arbitrary code. Because we take F: FnMut(...) -> B instead of just F: fn(...) -> B, we're "supposed" to compile the line accum = f(accum, x) into a direct call, if the callee can be deduced at compile time (in that case, the f argument is supposed to be entirely elided post-monomorphization). In fact, Rust can even inline that direct call if the compiler so chooses. Losing that direct call is a significant hindrance to further optimization opportunities.

And then there's the fact that self is passed by value, which means it's not hidden behind a fat pointer, so the compiler has to know the size of Self (as reflected in the where clause with Self: Sized). But if you compile separately, I don't see how that happens (this is a trait function, so Self can be almost anything). Not to mention we would like the compiler to elide the move of self, or even dematerialize both self and fold() altogether and just emit a loop (Rust is quite good at doing that in practice), which seems wildly incompatible with a fixed ABI.

You can probably pre-compile it into some kind of AST-like-thing, so that it can be more easily monomorphized, but I'm not sure I would characterize that as "separate compilation" anymore. You're effectively re-compiling it separately at every callsite. And it's not "distributed" at all, because pre-compilation would have to be sequenced before actual compilation at each callsite.

This is not atypical code, or even very complicated code. It's exactly the sort of boring, simple code that developers will write all the time. If you can't distribute something as simple as this, then distributed compilation doesn't sound like it would work for Rust.

Distributed compiling?

Posted Aug 13, 2024 19:19 UTC (Tue) by mb (subscriber, #50428) [Link]

Well, cargo already compiles crates in parallel, doesn't it?
So why should that be bound to one machine?
Of course, the machines would have to communicate much more than just "pass the preprocessed code" like C does.
Note that compilation doesn't necessarily mean compilation to machine code. Which of course isn't possible for generics.

Distributed compiling?

Posted Aug 13, 2024 22:11 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

It's maybe more obvious in C++ where the "templates" really look like they're copy-paste code and we're #including the text of them - but it would clearly be implemented the same with Rust's generics even though it's not as obvious it's just a copy-paste job.

fold<B, F> isn't a single thing, it's parametrised, so the compiler is going to make fold<u8, my_custom_fold_function> and fold<MyCustomType, |acc, x| acc += x> and dozens, maybe hundreds more, for each distinct call to this function with either a different iterator type or different type parameters to the fold function or both.

A late optimisation is to merge identical implementations, in practice fold<u16, do_stuff> and fold<i16, unrelated_fn> may compile to the same machine code, so, no need to keep both. It may be that for a distributed compile it's acceptable to produce a larger binary in a shorter time by using multiple machines and not performing all possible merges.

Distributed compiling?

Posted Aug 14, 2024 7:41 UTC (Wed) by farnz (subscriber, #17727) [Link]

Assuming a fixed version of the toolchain on all machines, each crate outputs an "rmeta" file early in compilation, which contains everything you need to know to handle generics and monomorphization, but not linking. Parallel cargo works by waiting for the compiler to tell it that the rmeta file exists and is ready to use, then starting on the next crate.

The only thing you need to compile a downstream dependency of a crate is the rmeta file, which is output as early as possible in compilation, and contains MIR bytecode for generics, sizes of types, layouts of externally visible types etc. It's similar in most respects to a C++17 header file, just generated from the source instead of hand-written.

Distributed compiling?

Posted Aug 14, 2024 15:46 UTC (Wed) by notriddle (subscriber, #130608) [Link]

> You can probably pre-compile it into some kind of AST-like-thing, so that it can be more easily monomorphized, but I'm not sure I would characterize that as "separate compilation" anymore.

Rust has been doing that since about 2016, except that they can actually lower the code all the way down to a CFG before monomorphizing it. https://rustc-dev-guide.rust-lang.org/mir/optimizations.html

Distributed compiling?

Posted Aug 12, 2024 20:13 UTC (Mon) by mss (subscriber, #138799) [Link] (3 responses)

I think it's easier in C/C++ because you can preprocess whole translation units on the machine holding the source code and then just distribute (the most compute intensive) tasks of turning these preprocessed sources into object files to build machines.

This way build machines don't need to have matching header files or a copy of the build tree.

I'm not a Rust expert but looking at this comment in the relevant rust-lang issue that this will likely amount to an alternate implementation of Cargo it seems like developing a distributed compilation system is more complicated for Rust than C/C++.

Distributed compiling?

Posted Aug 13, 2024 14:53 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (2 responses)

FWIW, only C has that anymore. Distributed C++20 modules compilation will need similar work to figure out how to ship things beyond preprocessed source between machines and some awareness from tools like `ccache` and `distcc` (and alternative implementations thereof).

Distributed compiling?

Posted Aug 13, 2024 14:59 UTC (Tue) by mss (subscriber, #138799) [Link] (1 responses)

AFAIK very few C++ codebases currently use C++ modules.

Distributed compiling?

Posted Aug 13, 2024 17:31 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

Yes, but tools should be considering what to do *before* codebases end up using them and builders come asking for their tooling to support it.

Distributed compiling?

Posted Aug 13, 2024 7:49 UTC (Tue) by taladar (subscriber, #68407) [Link] (1 responses)

On the other hand every 2000 line Rust source file isn't 100k+ lines due to textual includes like it is in C or C++ so the compile effort shouldn't explode quite so badly on large projects in Rust, especially with a culture of splitting larger projects into more separate crates.

Distributed compiling?

Posted Aug 14, 2024 23:58 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

Although the entire text of some "header file" isn't copy-pasted, it is usual in Rust to write a lot of #[derive(...)] lists for your user defined types.

#[derive(Hash)] is just a few keystrokes, but it will emit a new implementation of this trait, with a hash function, in which all the elements of your user defined type are separately hashed as appropriate.

For the standard library derive macros (and to a lesser extent popular 3rd party crates) this will definitely work, which is a change from #include files that may cause complete chaos in the resulting code as they perform arbitrary text re-writes of your code and each other, but in respect of more work for the compiler than you can see by eye that's still there.