|
|
Log in / Subscribe / Register

DeVault: Announcing the Hare programming language

DeVault: Announcing the Hare programming language

Posted May 10, 2022 16:10 UTC (Tue) by khim (subscriber, #9252)
In reply to: DeVault: Announcing the Hare programming language by tialaramex
Parent article: DeVault: Announcing the Hare programming language

> Your resulting C compiler is not the GCC I grew up with (well, OK, the one that teenage me knew), or that with some minor optimisation passes disabled, it's an altogether different animal, perhaps closer to Python.

But that's is the language which Kernighan and Ritchie designed and used to write Unix. Their goal was not to create some crazy portable dream, they just wanted to keep supporting both 18-bit PDP-7 and 16-bit PDP-11 from the same codebase by rewriting some parts of code written in PDP-7 assembler in the higher-level language. They have been using B which had no types at all and improved it. By adding character types, then structs, arrays, pointers (yes, B conflated pointers and integers, it only had one type).

Yet malloc was not special, free was not special and not even all Unix programs used them (just look obn the source of original Bourne Shell some days).

> I don't believe there is or was an audience for this compiler.

How about “all the C users for the first decade of it's existence”? Initially C was used exclusively in Unix, but in 1980th it became used more widely. Yet I don't think any compilers of that era support anything even remotely resembling “pointer provenance”.

That craziness started after a failed attempt of C standard committee to redo the language. They then went back and replaced that with a simpler aliasing rules which prevented type puning, but even these weren't abused by compilers till XXI century.

> People weren't writing C because of the expressive syntax, the unsurpassed quality of the tooling or the comprehensive "batteries included" standard library, it didn't have any of those things - they were doing it because C compilers produce reasonable machine code, and this alternative interpretation of C89 doesn't do that any more.

Can you, please, stop rewriting history? C was quite popular way before ANSI C arrived and tried (but failed!) to introduce crazy aliasing rules. Yes, C compilers were restricted and couldn't do all the amazing optimizations… but C developers can do these, instead! When John Carmack was adopting his infamous 0x5f3759df-based trick he certainly haven't cared to think about the fact that there are some aliasing rules which may render code invalid and that was true for the majority of users who grew in an era before GCC started breaking good K&R programs.

> This is because under your preferred C89 "no provenance" model there isn't any provenance, CHERI isn't a fairytale spell it's just engineering.

It's engineering, yes, but you can add provenance to C89. It just has to be consistent. You can even model it with “poor man's CHERI” aka MS-DOS Large model by playing tricks with segment and offset. E.g. realloc could turn 0x0:0x1234 pointer into 0x1:0x1224 pointer if it decided not to move an object.

This way all the fast-path code would be negated and you would never have the situation where bitwise-identical pointers point to different objects. This may not be super-efficient but it is compatible with C89. Remember? Bitwise-different pointers can point to the same object, but the opposite is forbidden! Easy, no?

All these games where certain pointers can be compared but not others and its programmer's responsibility to remember all these unwritten rules… I don't know how that language can be used to development, sorry.

The advice I have gotten from our toolchain-support team is to ask clang developer about low-level constructs which I may wish to create!

So much for “standard is a treaty” talks…


to post comments

DeVault: Announcing the Hare programming language

Posted May 10, 2022 18:03 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

Early C did not have a formal specification - what the one and only implementation did was what the language did.

And the problem is that formally specified C - including K&R C, and C89 - left a huge amount unspecified; users of C assumed that the behaviour of their implementation of the language was C behaviour, and not merely the way their implementation behaved.

Up until the early 1990s, this wasn't a big deal. The tools needed for compilers to do much more than peephole optimizations simply didn't exist in usable form; remember that SSA, the first of the tools needed to start reasoning about blocks or even entire programs doesn't appear in the literature until 1988. As a result, most implementations happened, more by luck than judgement, to behave in similar ways where the specification was silent.

But then we got SSA, the polytope model, and other tools that allowed compilers to do significant optimizations beyond peephole optimizations on the source code, their IR, and the object code. And now we have a mess - the provenance model, for example, is compiler authors trying to find a rigorous way to model what users "expect" from pointers, not just what C89 permits users to assume, while C11's concurrent memory model is an effort to come up with a rigorous model for what users can expect when multiple threads of execution alter the state of the abstract machine.

Remember that all you are guaranteed about your code in C89 is that the code behaves as-if it made certain changes to the abstract machine for each specified operation (standard library as well as language), and that all volatile accesses are visible in the real machine in program order. Nothing else is promised to you - there's no such thing as a "compiler barrier" in C89, for example.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 19:57 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

> And the problem is that formally specified C - including K&R C, and C89 - left a huge amount unspecified; users of C assumed that the behaviour of their implementation of the language was C behaviour, and not merely the way their implementation behaved.

True, but irrelevant. The most important part that we discussing here was specified in both: pointers are addresses, if two pointers are equal they can be used interchangeably.

> But then we got SSA, the polytope model, and other tools that allowed compilers to do significant optimizations beyond peephole optimizations on the source code, their IR, and the object code.

Yes. And there was an attempt to inject ideas that make these useful into C89. Failed one. The committee has created an unreal language that no one can or will actually use. It was ripped out (and replaced with crazy aliasing rules, but that's another story).

> And now we have a mess - the provenance model, for example, is compiler authors trying to find a rigorous way to model what users "expect" from pointers, not just what C89 permits users to assume

Can you, please stop lying? Provenance models are trying to justify deliberate sabotage where fully-standard compliant programs are broken. It's not my words, the provenance proposal itself laments:

These GCC and ICC outcomes would not be correct with respect to a concrete semantics, and so to make the existing compiler behaviour sound it is necessary for this program to be deemed to have undefined behaviour.

To make the existing compiler behavior sound, my ass. The whole story of provenance started with sabotage: after failed attempt to bring provanance-like properties to C89 saboteurs returned in C99 and, finally, succeeded in adding some (and thus rendered some C89 programs invalid in the process), but that wasn't enough: they got the infamous DR260 resolution which was phrased like that: After much discussion, the UK C Panel came to a number of conclusions as to what it would be desirable for the Standard to mean.

Note: the resolution hasn't changed the standard. It hasn't allowed saboteurs to break more programs. No. It was merely a suggestion to develop adjustments to the standards — and listed three cases where such adjustments were supposed to cause certain outcomes.

Nothing like that happened. For more than two decades compilers invented more and more clever ways to screw the developers and used that resolution as a fig leaf.

And then… Rust happened. Since Rust guys are pretty concerned about program correctness (and LLVM sometimes miscomplied IR-programs they perceived correct) they went to C++ guys and asked “hey, what are the actual rules we have to follow”? And the answer was… “here is the defect report, we use it to screw the developers and miscompile their valid programs”. Rust developers weren't amused.

And that is when the lie was, finally, exposed.

So, please, don't liken problems with pointer provenance to problems with C11 memory model.

Indeed, C89 or C99 doesn't allow one to write valid multi-threaded programs. Everything is defined strictly for single-threaded program. To support programs where two threads of execution can touch objects “simultaneously” you need to extend the language somehow.

Provenance excuse is used to break completely valid C and C++ programs. It's not about extending the language, it's about narrowing it. Certain formerly valid programs have to be deemed to have undefined behaviour..And after more than two decades we don't even have the rules which we are supposed to follow finalized!

And they express it in a form of if you believe pointers are mere addresses, you are not writing C++; you are writing the C of K&R, which is a dead language. IOW: they know they sabotaged C developers —and they are proud of it.

> Nothing else is promised to you - there's no such thing as a "compiler barrier" in C89, for example.

Yes. And to express many things which would be needed to, e.g., write an OS kernel in C89, you need to extend the language in some way. This is deliberate: the idea was to make sure strictly-conforming C89 programs run everywhere, but conforming programs may require certain language extensions. Not ideal, but works.

Saboteurs managed to screw it all completely and categorically refuse to fix what they have done.

This looks like a good time to stop

Posted May 10, 2022 20:05 UTC (Tue) by corbet (editor, #1) [Link]

When we start seeing this type of language, and people accusing each other of lying, it's a strong signal that a thread may have outlived its useful lifetime. This article now has nearly 350 comments on it, so I may not be the only one who feels like that outliving happened a little while ago.

I would like to humbly suggest that we let this topic rest at this point.

Thank you.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds