
DeVault: Announcing the Hare programming language

Drew DeVault has announced the existence of a new programming language called "Hare".

Hare is a systems programming language designed to be simple, stable, and robust. Hare uses a static type system, manual memory management, and a minimal runtime. It is well-suited to writing operating systems, system tools, compilers, networking software, and other low-level, high performance tasks.



DeVault: Announcing the Hare programming language

Posted May 2, 2022 2:59 UTC (Mon) by atai (subscriber, #10977) [Link] (27 responses)

Why another one? Rust not enough?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 3:29 UTC (Mon) by rfunk (subscriber, #4054) [Link] (15 responses)

Manual memory management, combined with a motto of "trust the programmer", tells me this is supposed to be an anti-Rust.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 4:46 UTC (Mon) by felixfix (subscriber, #242) [Link] (14 responses)

I haven't programmed in C for many years. I still have fond memories of it, but I have even fonder memories of 8 bit micro assembler, and no thank you, I prefer productivity and not spending 90% of my time taking care of stuff that a modern language takes care of.

What the dickens is the point of Hare if it leaves memory management and all that other housekeeping to the programmer? What does it have that C doesn't? I didn't look past the example, and wondered why they didn't begin with the classic "Hello, world". That example and the short blurb don't show enough potential to be a better C than Rust.

Maybe they need better marketing. Or maybe it isn't any better than my 30 second glimpse. I'm not interested enough to learn more.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 11:20 UTC (Mon) by khim (subscriber, #9252) [Link] (2 responses)

> What does it have that C doesn't?

Predictability, most likely. I'm 99% sure it's a reaction to the “disaster of UB” of the modern world.

Except they don't seem to realize why that disaster happened, so it will either stay on the fringe of IT, mostly ignored, or history will repeat itself.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:55 UTC (Mon) by HelloWorld (guest, #56129) [Link] (1 response)

> Predictability, most likely.
No. It has manual memory management and no mechanism to prevent use-after-free, so there's going to be UB.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 13:26 UTC (Wed) by khim (subscriber, #9252) [Link]

Of course there would be UB! And there would be compilers which would “exploit it”. Then history would repeat itself.

That's, of course, in the very unlikely case of it becoming popular enough for that to happen. More likely outcome: this whole thing would die off after a couple more articles published on LWN over the next couple of years.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 11:24 UTC (Mon) by hkario (guest, #94864) [Link]

It allows the programmers to do the fun stuff, like reimplementing all of the libraries that do crypto, yaml parsing, networking, etc. (and not have to care about backwards compatibility!) instead of doing Yet Another Business Rules update.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:08 UTC (Mon) by ncm (guest, #165) [Link] (7 responses)

It really has no legitimate reason for our attention. It addresses no actual problem not better addressed elsewhere. Even Zig is better justified, but isn't.

Hare is meant to replace C in the creator's affections. But any language meant to be adopted by C coders is doomed: remaining C coders are defined by having seen a thousand languages go by, and passed on all of them.

Both C++ and Rust address C shortcomings with powerful advances in productivity. Weak-sauce C updates are at best a distraction, at worst compound its problems by siphoning off coders from alternatives. The people they are for don't want them, and new programmers are much better off with actually-better languages.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 14:21 UTC (Mon) by pbonzini (subscriber, #60935) [Link] (1 responses)

> remaining C coders are defined by having seen a thousand languages go by, and passed on all of them.

I think that's an overstatement that kinda dilutes your general point.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 13:35 UTC (Wed) by khim (subscriber, #9252) [Link]

> I think that's an overstatement that kinda dilutes your general point.

Maybe the “thousand” part is an overstatement. But there have definitely been tens and, depending on how you count them, hundreds of languages which were expressly designed to be a “better system language than C”. C++, Objective-C, D, and lots and lots of other languages were designed as “C replacements”. Yet they all failed at replacing C.

Which means that remaining C programmers need something extremely compelling to switch. Rust offers that (although it's not clear if that theoretical offer is nice enough in practice), Swift does that, too (although it's an Apple language thus highly unlikely to ever be widely used outside of Apple's ecosystem). Hare? Nope.

A bunch of cosmetic improvements without a clear explanation of why it's better than C or a hypothetical BoringC/FriendlyC.

DeVault: Announcing the Hare programming language

Posted May 7, 2022 9:01 UTC (Sat) by iustin (subscriber, #102433) [Link] (4 responses)

> But any language meant to be adopted by C coders is doomed: remaining C coders are defined by having seen a thousand languages go by, and passed on all of them.

This might be a slight exaggeration, but I think it is the right statement. Thanks, I'll remember this, it's well said.

DeVault: Announcing the Hare programming language

Posted May 8, 2022 11:49 UTC (Sun) by Vipketsh (guest, #134480) [Link] (3 responses)

I think many people who still use C do so for the same very simple reason I do: it works, it has a reasonable community (many libraries), and, most importantly, code written today has a high chance of continuing to work unmodified far into the future. C is the only language which delivers all three, even though that last part is going down the drain these days because of continued exploitation of undefined behaviour. I find exactly no enjoyment in being on a permanent treadmill, having to fiddle with my code so that it works again with whatever "awesomeness" the language designers decided to change in the last few months, or, even worse, finding that the language has been abandoned and my code now needs a full rewrite in something else. Is this really so bad?

No, C programmers are not degenerate, psychologically challenged, moronic narcissists who work day and night just to deliver security issues and bugs to you. Could we be civil and stop looking down on people who code in C, please?

DeVault: Announcing the Hare programming language

Posted May 8, 2022 11:55 UTC (Sun) by iustin (subscriber, #102433) [Link] (1 responses)

> No, C programmers are not degenerate psychologically challenged moronic narcissists who work day and night just to deliver security issues and bugs to you. Could we be civil and stop looking down on people who code in C, please ?

I think you misunderstood my reply entirely. I think the post I quoted was valid in the sense of - C programmers have seen enough fads coming by and passing, and yet - as you say - C still works. Not in the sense "they're dumb and cannot adjust".

DeVault: Announcing the Hare programming language

Posted May 8, 2022 12:04 UTC (Sun) by Vipketsh (guest, #134480) [Link]

Thanks for clarifying and sorry about my misunderstanding.

DeVault: Announcing the Hare programming language

Posted May 8, 2022 16:02 UTC (Sun) by farnz (subscriber, #17727) [Link]

I think you're taking this the wrong way - the point is that C programmers value the fact that C code someone wrote 30+ years ago is still useful C code far more than they value any new language feature that's not in C2x (or often, C99 or C90).

As a consequence of this, the bar for "new language that will attract people away from C" is very, very high. The natural instinct of the remaining people who prefer C when faced with a new language is not "oh, that feature is awesome and I will switch language to get it", but "will that language still be usable in N decades time? Can I implement the feature usably in a C library?".

If you do value that long-term stability over language features, then a new language aiming to attract you has a big hill to climb - being new, it can't point to 30+ years of history like C can, so how can it convince you that code written for the language as it exists today will still be useful code in 30 years' time?

After all, we've established that that's a big chunk of why you use C - because you've seen so many attractive languages come and go in C's lifetime, and you don't want to start a project today that'll be impossible to compile (let alone use) in a decade. And that's a perfectly good reason to prefer C - it's the same reason that a lot of physicists prefer Fortran, because they want their code to still work for cross-checking results decades after a 20 year experiment comes to an end - but it does make it hard for a new language to attract you away from C.

DeVault: Announcing the Hare programming language

Posted Feb 23, 2023 14:33 UTC (Thu) by tinydev.art (guest, #163828) [Link] (1 response)

This programming language doesn't seem to be aimed at you.

Hare seems to be C without all the footguns, and with more consistent syntax. It solves the gripes modern C programmers have with C, not the gripes Python programmers have with C.

DeVault: Announcing the Hare programming language

Posted Feb 23, 2023 17:56 UTC (Thu) by mpr22 (subscriber, #60784) [Link]

Manual memory management is C's most infamous footgun.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 4:07 UTC (Mon) by areilly (guest, #87829) [Link] (10 responses)

Or Zig, if you want to do without Rust's memory safety efforts. At least Zig allows about the right amount of compile-time metaprogramming; in Hare I couldn't see any mechanism for macros or compile-time expression evaluation.

The only really astonishing thing that I saw in the tutorial and standard library documents, or rather didn't see, was any mention of threads, parallelism or asynchronous operation. Aren't we about ten years too late for another strictly single-threaded language?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 7:20 UTC (Mon) by pbonzini (subscriber, #60935) [Link] (8 responses)

Yes, Zig is really interesting as a better C. The only downside is that it basically needs whole-program compilation due to how error sets work, but the feature set is impressive and I think C could learn a lot from it.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:10 UTC (Mon) by ncm (guest, #165) [Link] (7 responses)

Yet, Zig also has no legitimate claim on our attention. Both are weak-sauce alternatives that do none of the heavy lifting a modern language provides.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 15:28 UTC (Mon) by pbonzini (subscriber, #60935) [Link] (2 responses)

I don't think Zig or Vala have any shot at being a major contender. However there is a lot of legacy C code that won't reasonably be converted to C++ or Rust, and in my opinion Zig is a pretty good experiment as to how C could grow to incorporate new features (compile-time metaprogramming, coroutines, etc.) that those legacy projects could use.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:17 UTC (Mon) by bartoc (guest, #124262) [Link] (1 response)

Zig suggests that there is an interesting and fruitful design space enabled by not doing implicit instantiation of generics (I think C++ got implicit instantiation _almost_ by accident because of the way cfront worked, but that was before my time).

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:40 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

Having to explicitly instantiate templates may certainly have put the kibosh on some of the ridiculous template metaprogramming I've seen in C++. However, I suspect it would have been used anyways because when a language only gives you a limited set of tools and the best thing for turning that screw is a scissor blade, by gosh, someone is going to use that scissor blade. Whether oodles of compiler output when you hold it wrong or gigantic TUs of template instantiations is worse is up for debate.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 15:59 UTC (Mon) by atnot (guest, #124910) [Link] (3 responses)

Come on, that's a bit far. Zig has plenty of things to offer, most importantly excellent no-bindings interoperability with C and C++, interesting approaches to async and more comprehensive compile-time evaluation and metaprogramming than any other native non-research language I'm aware of. Rust's C interop and macros look primitive and poorly integrated in comparison.

It's not transformative, sure, and I'm not sure I would start a greenfield project in it today. It definitely sacrifices more robustness for its great C compatibility than I'm personally comfortable with. But it's a compelling option for people with existing C codebases or complex C libraries, and it contributes some genuinely new and interesting approaches to writing expressive low-level code. It's very far from being just an 80s nostalgia roleplay language.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 23:37 UTC (Mon) by roc (subscriber, #30627) [Link] (2 responses)

Zig has excellent interop with *C++*, really? I don't see that.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 0:07 UTC (Tue) by atnot (guest, #124910) [Link] (1 response)

Perhaps I shouldn't have mentioned C++ considering the advantages are less apparent there (or arguably it is even behind compared to bindgen and cxx's capabilities).

But either way, there's a lot more to compatibility than just understanding the language constructs. A lot of existing C and C++ code, unfortunately, has very unclear data ownership and mutability. Trying to wrap that in a language with much different and stricter memory semantics is usually a pretty bad time. I have given up on it multiple times before. So while Zig may not understand C++ headers (yet?), I do think it's already much closer to working well with it by default.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 11:00 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

In modern C++ you're going to have a lot more cases where ownership is represented via a smart pointer (std::unique_ptr or std::shared_ptr), which is something Rust can get behind easily but Zig can't really help you with. So it's probably going to shade over: C++ code that's 15-20 years old may be happier to work with from Zig, while C++ code written after, say, C++14 might be much easier to work with from Rust via one of the C++ FFI libraries like cxx, which understands the standard library smart pointers.

DeVault: Announcing the Hare programming language

Posted May 31, 2022 0:05 UTC (Tue) by federico3 (guest, #101963) [Link]

Why Zig when you can use Nim?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 3:10 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (53 responses)

Too bad the example doesn't make use of strings, as string handling is usually an important indicator of how easy (or difficult) a language will be to adopt. I've seen that there's a strings lib with a number of primitives, though, but it seems you have to prefix all functions with "strings::", which can easily turn something trivial into something unreadable.

By the way, that's something I really don't like: it reuses that ugly "colon colon" symbol from C++, and it appears more than once per line in the tiny example (and the same in the few libs I've looked at), which significantly complicates reading and writing. The fact that the tiny example supposed to sell you the language doesn't look pleasant to read is a bit concerning. It even features an example of how to introduce a long-term bug: using "sha256::SIZE" in the array definition instead of "hash.SIZE" like one could have expected, which would have allowed the hash type to change in the future and the array to adapt automatically (or the declarations should have been placed together).

It seems every enum value is prefixed with its type, which forces you to choose between short and possibly conflicting names, or names that are painfully long to write and read. I don't know if it also forces you to change every single line of code when using an alternate implementation of a lib, which is often one of the problems of languages mandating namespaces everywhere.

It seems there's been a lot of work done on it already, which could be a good indicator of maturity. That said it's strange that cross compiling is still in the roadmap, given that by now it should be well known that cross-compiling should be the default approach and native compiling just a special case of it, otherwise there are significant risks of making things much more difficult later.

Let's see how that evolves, it's probably a good start.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 6:55 UTC (Mon) by ddevault (subscriber, #99589) [Link] (52 responses)

Author here, happy to clarify a few things. Going to focus on the material problems and less on questions of taste such as "::".

>Too bad the example doesn't make use of strings

Here's some string-heavy code for your consideration:

https://git.sr.ht/~sircmpwn/hare-irc

String manipulation and manual memory management go together like oil and water. I think we've done a pretty good job nevertheless, and you can do most string operations relatively comfortably.

https://docs.harelang.org/strings
https://docs.harelang.org/regex
https://docs.harelang.org/fmt

>I don't know if it also forces to change every single line of code when using an alternate implementation of a lib

It does not; vendoring is very straightforward and does not require any rewriting of the code. You can also trivially vendor modules from the standard library.

>That said it's strange that cross compiling is still in the roadmap, given that by now it should be well known that cross-compiling should be the default approach and native compiling just a special case of it.

This is in fact how it works in Hare, we just have to glue a few pieces together to make it useful. You can compile Hare code to any supported arch like so:

$ harec -X^ -T+riscv64 example.ha | qbe -t rv64 > example.s
$ as -o example example.s

But generally Hare programs are not built like this - they're built by invoking the build driver, hare(1), which is currently just lacking the convenience flags to glue all of the bits together. Also missing are the non-trivial considerations for handling sysroots when linking to native C libraries, though thankfully most Hare programs don't link to C.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 8:19 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (46 responses)

> Author here, happy to clarify a few things.

https://git.sr.ht/~sircmpwn/hare/tree/master/item/crypto/... - as an app developer targeting the standard library, how do I know whether or not my keys are going to be stored securely?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 8:22 UTC (Mon) by ddevault (subscriber, #99589) [Link] (45 responses)

Well, if the kernel does not provide a feature like Linux's keyctl, then there is no secure means of storing keys available. So it's a best-effort interface to utilize the kernel feature if provided, and if it's not, then as a developer you don't have much by way of better options anyway.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:28 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (44 responses)

There are certainly other ways to store keys securely! You could use a TPM, you could implement PKCS#11, you could support Yubikeys - doing this well is not purely a kernel issue. But even in the absence of any support, I have the option of "Don't do this thing if it's not going to be secure", and if the stdlib doesn't let me figure that out then I have questions about the assumptions made in the rest of the stdlib. This feels like a security feature that hasn't been developed by people who think about security.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:35 UTC (Mon) by ddevault (subscriber, #99589) [Link] (43 responses)

I don't think you understand the value-add this module aims to provide: it gets the keys out of your memory space. A TPM or Yubikeys may address this somewhat, but in an opinionated and complex way with out of band requirements. The purpose of this module is simply to provide an abstraction over Linux keyctl and future OS interfaces like it, nothing more.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:42 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (42 responses)

Except it doesn't: if you run it on anything other than Linux, it just puts your keys in the heap. If the value proposition of the module is that it gets keys out of your memory space, then it doesn't work as advertised. If I'm developing an app that wants to guarantee that keys aren't in userspace memory, then right now doing this in Hare means I need to know implementation details of the stdlib and refuse to run if I'm on anything other than Linux. If FreeBSD adds equivalent functionality and Hare adopts it, I need to update my app. This seems extremely obviously broken, and the moment you find one broken thing in a crypto library it's legitimate to start asking whether anything else is done properly.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:49 UTC (Mon) by ddevault (subscriber, #99589) [Link] (41 responses)

Again, it's opportunistic. It's not broken. It just does not do what you want it to do.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:52 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (19 responses)

Me: "Please store this key securely"
Your library: puts it on the heap if I'm on any kernel other than Linux

Look, if you can't understand that this is a thing that will happen in the real world, and that people will potentially suffer as a result, you shouldn't be writing a crypto library. The absolute best thing you could do right now is move the entire crypto directory out of your stdlib until it's fully reviewed by people who understand not only cryptography but also threat modelling.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:54 UTC (Mon) by ddevault (subscriber, #99589) [Link] (16 responses)

I feel no further need to engage with you here.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:59 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (3 responses)

Look, it feels pretty obvious that shipping a crypto library that falls back to being insecure, without any indication to the app in question, is not a good design, and I am extremely confused why you're being defensive about this rather than talking about ways you could add assertions that apps could opt out of, or something along those lines.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:12 UTC (Mon) by bluss (guest, #47454) [Link] (2 responses)

It's literally the guy's hobby language; why go at him as if this were some Twitter thread? I'm here for the hacking spirit and for enjoying good home-grown work, faults and all.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:00 UTC (Mon) by Natanael_L (guest, #158286) [Link]

The problem with doing that in the security field is that people who know less about security than the person who wrote the code will absolutely use it insecurely and never realize it, because most security failures are silent failures.

If you're doing it as a hobby, it should be your obligation to advertise it as NOT being secure enough for deployment for anything handling sensitive information.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 15:46 UTC (Wed) by Ashton (guest, #158330) [Link]

I’ve always disliked this interpretation of how language creators and other programmers interact. Yes, Drew made this language on his own as a “hobby” (although I think this drastically understates how hard it is to make a language work well), but he’s also trying to convince us to use it. Unless Drew is happy with only his projects using Hare, which I doubt is the case, how regular programmers will interact with the language and the standard lib matters a lot.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:14 UTC (Mon) by ncm (guest, #165) [Link] (11 responses)

Someone who does not feel intensely motivated to learn from mjg59's freely offered expertise has no legitimate claim on anyone's attention.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:18 UTC (Mon) by ddevault (subscriber, #99589) [Link] (10 responses)

Oh good, hero worship.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 17:02 UTC (Mon) by Lionel_Debroux (subscriber, #30014) [Link] (4 responses)

Hey SirCmpwn... You've done a number of interesting, thought-provoking, and even useful things since your beginnings in the TI graphing calculator community a decade or so ago; however, you can and should do better than digging holes: mjg59's not wrong, you know it, and you have little to gain by opposing him that way ;)

DeVault: Announcing the Hare programming language

Posted May 2, 2022 17:04 UTC (Mon) by ddevault (subscriber, #99589) [Link] (3 responses)

Do me the favor of taking my comments at face value. I earnestly disagree with mjg59's position, and what's more, with the way they presented it. I don't particularly enjoy arguing with people who are calling for me to be criminally prosecuted for designing a programming language that does not align with their sensibilities.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 10:31 UTC (Tue) by nix (subscriber, #2304) [Link] (1 response)

> I earnestly disagree with mjg59's position

You think "silently fails insecure if conditions not advertised outside the source tree happen to be true and no way to pick an alternative, but advertised as being extra-secure" is a good thing, really?

Meanwhile, supporting key storage in YubiKeys would fix this problem by being portable to arbitrary operating systems; it has relatively low cost for keys capable of such things, is *literally trivial* to implement because Yubico provides not only libraries in multiple languages but an actual written spec, and should be pretty easy to make work on any device capable of USB communication -- but you arbitrarily declare it out of scope. Or if not YubiKeys, how about one of the countless other devices, most of them free hardware, with the same capabilities? Or how about at least not claiming the library is secure when it's not? There are *so many* ways to get out of this hole ever so easily, but instead you're literally refusing to engage with or fix this obvious problem in any of the dozen-plus ways available to you, or even to acknowledge that it is a problem... because you don't like Matthew's tone. This really does not fill me with enthusiasm for your new language at all.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 10:34 UTC (Tue) by ddevault (subscriber, #99589) [Link]

It is not automatically insecure on other systems. As I've explained in other comments, this is one part of a system which provides defense in depth, and the lack of a kernel-provided key store does not create any vulnerabilities in your application on its own. What's more, it was never advertised as "extra-secure"; in fact, it's advertised as quite the opposite, with clear documentation explaining its limitations, a disclaimer that it has not been audited, and emphasis on the importance of good cryptography as it pertains to the life and security of your users.

Again, the YubiKey suggestion lacks an understanding of the scope of this module and of the standard library in general.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 15:46 UTC (Wed) by Ashton (guest, #158330) [Link]

You don’t have to keep engaging with mjg59 if you don’t want to, but belittling people who agree with them as mere hero worshipers is beyond the pale. Remember that in asking us to use your language, you’re asking us to also trust you in your stewardship of that language and how you’ll respond to our concerns and needs as the maintainer. Seeing you attack people so aggressively out of the gate is not a confidence boosting start.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:40 UTC (Mon) by Wol (subscriber, #4433) [Link] (1 response)

If you want to be a devil that's your problem.

Telling a well-known expert he's clueless is going to get pretty much everyone here writing you off as not worth paying any attention to. (I'd pretty much done that already, but this really does hammer the point home!)

Cheers,
Wol

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:42 UTC (Mon) by corbet (editor, #1) [Link]

...and this kind of comment really doesn't help the situation either. Please can we try to calm things down a bit?

DeVault: Announcing the Hare programming language

Posted May 3, 2022 4:43 UTC (Tue) by rsidd (subscriber, #2582) [Link] (2 responses)

I don't think such comments, or your defensive reaction to mjg59's security concerns in general, will persuade people to try out Hare. You are very smart, but so are other people, and they may have expertise you don't (mjg59's crypto contributions speak for themselves).

DeVault: Announcing the Hare programming language

Posted May 3, 2022 5:04 UTC (Tue) by mjg59 (subscriber, #23239) [Link] (1 response)

Oh, to be clear, I am not qualified to do cryptography - I have opinions in the broader security space and how cryptography applies to that, but on the crypto side I am not someone you should pay attention to.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 7:00 UTC (Tue) by rsidd (subscriber, #2582) [Link]

I guess I meant applied crypto -- ie, your work on secureboot in particular -- not crypto algorithms!

DeVault: Announcing the Hare programming language

Posted May 6, 2022 15:17 UTC (Fri) by daniel.glasser (guest, #97146) [Link] (1 response)

If there is no underlying secure key storage mechanism on a system, then no amount of abstraction in the library will be able to provide one. Even for a given hardware architecture and OS, systems may be provisioned differently. If an application requires hard security beyond the best effort offered by the standard interfaces of Hare, or any other language, that application should not use the built-in tools and should instead use an alternative that enforces the dependency on an underlying facility provided by the OS or hardware.

Secure key management can be difficult and not at all portable, in my experience. Hare and its libraries are fairly new. No doubt, given enough exposure, there will be improvements as those libraries evolve.

DeVault: Announcing the Hare programming language

Posted May 6, 2022 16:55 UTC (Fri) by farnz (subscriber, #17727) [Link]

If there's no underlying secure key storage mechanism, why provide a "secure key storage" library on that platform? If you're going to provide one that's best effort, why not provide a mechanism for the programmer to confirm that it's not using the heap, but instead using a secure storage location?

And note that the core problem is not so much that the library as it exists now is problematic (after all, Hare has not yet been ported to a non-Linux platform), as the attitude underlying it that the programmer can't be trusted to do the right thing if the library tells the programmer what the true state is. That's not a good look for a language whose claimed USP is that it "trusts the programmer" - if the programmer can be trusted, a simple "bool is_secure_storage()" would be enough.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:14 UTC (Mon) by HelloWorld (guest, #56129) [Link] (10 responses)

> Again, it's opportunistic. It's not broken.
In a security context, being opportunistic is broken.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:19 UTC (Mon) by ddevault (subscriber, #99589) [Link] (9 responses)

That's not how this works. Opportunistic improvements are part of defense in depth. It improves security to use this module, but it is not *necessary* for security to use this module.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 14:54 UTC (Mon) by farnz (subscriber, #17727) [Link]

So how do I conditionally compile Hare such that if a secure keystore isn't available, my code refuses to compile? This module isn't that, because it works cross-platform, even if there is no secure keystore available.

In effect, it's opportunistically downgrading my code if I move from a platform like Linux with a secure keystore to one without, and long experience shows that opportunistic downgrades are a really bad thing for security. An opportunistic upgrade would be if you (e.g.) detected use of the bytes::zero secure zeroing operation, and upgraded from "normal" storage to a secure keystore.

What I want, as a developer doing my best not to add too many more security bugs to the world, is for my code to actively alert the next developer (which might even be me a few years later) if they are making decisions that contradict things my code assumes are true. This particular module does exactly the opposite - if I make decisions that are true assuming the keystore is secure (which is testably true on Linux), and then someone uses my code on FreeBSD (where it's not true in this version of the code), then the user who switches to FreeBSD has introduced flaws I wasn't expecting to have to handle.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 10:36 UTC (Tue) by nix (subscriber, #2304) [Link] (7 responses)

The problem with opportunistic improvements in security, as we've seen with SSL over the years, is that as soon as an attacker gets involved it becomes clear that these are really *reductions* in security because an attacker can either force the selection of the least secure option or focus only on those for whom the opportunistic improvement did not kick in (thus reducing the overall security of the library to its least secure set of features).

Now maybe you're saying most people will never be attacked so there's no point worrying about this -- but in that case why encrypt anything at all? The whole point of encrypting things is to stop attackers from reading them! Maybe you think that most attackers won't be sufficiently dedicated... but it only takes one attacker to write a tool to do whatever thing you think is difficult and suddenly it's easy and every script kiddie is doing it.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 10:38 UTC (Tue) by ddevault (subscriber, #99589) [Link] (6 responses)

I am familiar with the downgrade attacks that you are referring to, and they do not apply here. I would be very impressed if a man in the middle was able to force a victim into installing BSD instead of Linux in order to exploit some vulnerability - a separate vulnerability, to be clear, given that storing keys in the heap is not automatically a vulnerability.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 12:59 UTC (Tue) by farnz (subscriber, #17727) [Link] (4 responses)

The specific concern I have is related to a user trusting the programmer. The problem here is that the language sets me up for a fall if I do trust the programmer.

Assume I'm administrating a FreeBSD fleet; I discover a need, and find a program written in Hare that fills that need. The documentation written by the programmer asserts several security claims that are true on the assumption that a secure keystore is used, and false otherwise. Being a conscientious sysadmin, before relying on the documentation, I first compile the program and run its test suite.

Here's where the pain lies - because there's no way for a Hare programmer to fail the tests if the keystore is not actually secure (but instead heap memory), I see a full set of passes. And then I get burnt because I trusted the programmer, but the language provided no way for the programmer to tell me that, because of my system choices, my trust is misplaced.

And this applies even if the programmer is conscientious - in autotools C, for example, I'd note that keyctl is not available, and not compile in the "in memory secure" keystore, because it's not present on this platform. It'd then be obvious to the conscientious sysadmin that in memory key storage is not available on FreeBSD, and that you need another implementation (be that PKCS#11, TPMv2, Yubikey, or whatever else the programmer designed in), or that I can't trust the security claims that depend on keyctl.

If Hare wanted me to be able to trust the programmer, there would at a minimum be a way for the programmer to prevent the code relying on the key store if it's not secure. That mechanism is not present here.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 13:16 UTC (Tue) by ddevault (subscriber, #99589) [Link] (3 responses)

>The documentation written by the programmer asserts several security claims that are true on the assumption that a secure keystore is used, and false otherwise.

This is an error in the downstream software, not in Hare. We cannot prevent programmers from making untruthful claims about their software. And, *again*, storing secrets in the heap in the face of the lack of a kernel-managed keystore *does not directly make your program insecure*.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 13:40 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

It's also a bug in Hare, because it claims to be a secure keystore, but it's not on some systems. And while storing secrets in the heap does not directly make your program insecure, I'm not talking about that - I'm referring to the case where the program does things to ensure that it's secure as long as the keys are not on the heap or stack outside of the times they're used.

And again, it's trivial to fix in Hare - just an is_secure_store function that returns true on Linux, false on other platforms for now.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 13:43 UTC (Tue) by ddevault (subscriber, #99589) [Link] (1 responses)

Again, the "claims" it makes are, verbatim, the following:

> On platforms without a suitable feature, a fallback implementation stores the secrets in the process heap, providing no security. This is an opportunistic API which allows your program to take advantage of these features if available.

In my opinion, this is very clear.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 13:45 UTC (Tue) by farnz (subscriber, #17727) [Link]

And in my opinion, it's not clear at all - which platforms have secure storage? How do I tell my users that they've chosen a platform that doesn't work the way I want it to? How do I prevent use of the keystore when it's not actually secure?

DeVault: Announcing the Hare programming language

Posted May 3, 2022 13:45 UTC (Tue) by nix (subscriber, #2304) [Link]

> I would be very impressed if a man in the middle was able to force a victim into installing BSD instead of Linux in order to exploit some vulnerability

They wouldn't. They would search the set of people using whatever-it-is and filter out those running Linux, knowing that *all the rest* were storing the keys on the heap and they could use heap spraying or whatever. (For which they'd probably need another vulnerability. They're very good at chaining these things together...)

DeVault: Announcing the Hare programming language

Posted May 2, 2022 16:23 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (9 responses)

Such desired behaviors usually depend on the exact use cases. Someone developing a certs-unlocking mechanism for a webserver that needs to run under any circumstances will value the "do this, preferably in a secure way" approach. A user storing GPG keys on a shared system at school will rather think "do this but only if it can be done in a secure way". There's no technical one-size-fits-all solution here, it's often a matter of choice (e.g. by configuration or API). That's the same problem as the INSECURE flag for getting randoms, actually: you want to play Tetris regardless of the random strength, but you don't want to produce your server's ssh key from bad randoms.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 16:25 UTC (Mon) by ddevault (subscriber, #99589) [Link] (8 responses)

I agree. I will also note that reading the documentation is mandatory, especially for security-sensitive use-cases, where the following is clearly stated:

> On platforms without a suitable feature, a fallback implementation stores the secrets in the process heap, providing no security. This is an opportunistic API which allows your program to take advantage of these features if available.

If that's not suitable for your needs, then you need to use something else.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 16:56 UTC (Mon) by dkg (subscriber, #55359) [Link] (7 responses)

Drew, the feedback you're getting here is good feedback. You'd do the hare project a favor by listening instead of explaining why you're right and they're wrong.

I agree that opportunistic security can be useful. But there are circumstances where the developer needs a guarantee, not opportunism. If Hare had an explicit method for requiring secure storage of keys during compilation, which would fail if the underlying OS doesn't offer support, then a developer who wants a strong security guarantee can have it, and a developer who is happy with the opportunistic approach could have it too.

And, some future static analyzer could look for instances of opportunistic use in systems that really need to offer stronger guarantees by seeing which interface was selected.

You're introducing a programming language to an interested potential user base here. Why sabotage the introduction by defensively rejecting useful feedback?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 16:59 UTC (Mon) by ddevault (subscriber, #99589) [Link] (6 responses)

I am actually somewhat open to the idea of extending the crypto::keystore interface to support mandatory security, though I think that some commenters are failing to understand its design scope - things like YubiKey integrations fall well outside of that scope. I am not, however, particularly receptive to feedback from Rust cultists who, in this very thread, have suggested that I should be criminally prosecuted for use-after-free bugs found in downstream Hare programs.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:22 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (5 responses)

The point to keep in mind is that the vast majority of code written in modern languages is copy-pasted from stackoverflow or even probably github copilot, because these languages are so painful and stubborn that when you don't know how to express something that sounds simple and the compiler refuses, you have to attack it from another side, and at some point you run out of ideas. And *that* is exactly the problem: users getting used to blindly copy-pasting code before thinking rarely read the doc. In certain programs it's so obvious that the function doesn't even do what its name suggests, or it fails to produce correct outputs for some special values. If your crypto lib relies only on the doc, there will be failures in the field, in either direction (too strict or too loose). Instead, pass an argument so that the user explicitly expresses their intent, e.g. opportunistic or mandatory. This way if they fail after copy-pasting, well, there's hardly anything more that can be done to save them from seeking a totally different job that doesn't involve a keyboard (nor a mouse).

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:28 UTC (Mon) by ddevault (subscriber, #99589) [Link] (3 responses)

Accommodating negligent programmers is a use-case we have deliberately decided to eschew. A Rust programmer could equally come up with severely flawed code by copy-pasting from StackOverflow; compilers cannot completely solve human stupidity.

Even so, we do not rely entirely on the docs. Again, crypto::keystore is just a small part of a larger secure system, and with defense in depth its failure is unlikely to be an issue. Using crypto::keystore on a system without secure kernel key management does not actually introduce a security bug - it just gives an opportunity for exploitation if you find *another* bug which allows you to read arbitrary memory from the process (the likelihood of which is prevented, again, defense in depth, by things like bounds-checked slices). Our other cryptographic APIs are also designed to make errors as unlikely as possible, such as via mandatory error handling and automatic zeroing of caller-provided private data.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 10:26 UTC (Tue) by peter-b (guest, #66996) [Link] (2 responses)

> Accommodating negligent programmers are a use-case we have deliberately decided to eschew.

Aww, shucks. As a programmer who sometimes writes less than perfect code, it sounds like Hare is not for me.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 10:27 UTC (Tue) by ddevault (subscriber, #99589) [Link]

There's a difference between negligence and simply writing poor code. I don't think this is worth writing Hare off over.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 4:01 UTC (Wed) by timrichardson (subscriber, #72836) [Link]

That is so out of context it's not funny.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:48 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

> The point to keep in mind is that the vast majority of code written in modern languages is copy-pasted from stackoverflow or even probably github copilot

Interesting take. Any evidence for "vast majority of code"?

> because these languages are so painful and stubborn that when you don't know how to express something that sounds simple and the compiler refuses, you have to attack it another side and at some point you run out of ideas.

Ah, yes, because I enjoy having to figure out what data is tracked by which mutex manually (and somehow communicating this to everyone else that works in the vicinity) instead of the compiler saying "uh, hey, did you forget to consider threads here?". Yes, much better when I have to pick up someone else's debugging state to figure out what went wrong. Repeat for uninitialized data, memory management miscommunications, single-threaded code being improperly used from multiple threads[1], etc. </s>

If you're working in tricky areas, expect tricky code. I'd rather someone convince the compiler that they got it right instead of expecting anyone else coming along and fixing warnings or whatever tripping over those 3 lines that took an hour to get right because threads are hard and the comment was ignored by `clang-tidy -fix` and a rubber stamp review.

[1] I agree with you: docs are not enough, you need something more. Rust's `Sync` trait seems to do well enough.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 16:16 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (4 responses)

> Here's some string-heavy code for your consideration:
>
> https://git.sr.ht/~sircmpwn/hare-irc
> String manipulation and manual memory management goes together like oil and water. I think we've done a pretty good job nevertheless,
> and you can do most string operations relatively comfortably.
>
> https://docs.harelang.org/strings
> https://docs.harelang.org/regex
> https://docs.harelang.org/fmt

Thanks Drew for sharing. However I had already looked at these ones. For example what's not obvious to me is how strings and memory mix together. I.e. if I want to write a variant of base64 output whose max output size I already know based on the input, am I supposed to use concat() between strings for every character, or may I just create a string of suitable size and fill the chars a la "*(p++) = c" since I know what I'm doing? Don't get me wrong, I love the approach of "trust the programmer", especially since nowadays a non-negligible number of bugs are induced from trying to work around absurd warnings or non-sensical UB, resulting in unnatural code that easily breaks. I'm just curious to know how that mixes, as I do imagine it's not a trivial thing.

> >I don't know if it also forces to change every single line of code when using an alternate implementation of a lib
> It does not; vendoring is very straightforward and does not require any rewriting of the code. You can also trivially vendor modules
> from the standard library.

OK, fine!

> >That said it's strange that cross compiling is still in the roadmap, given that by now it should be well known that cross-compiling
> > should be the default approach and native compiling just a special case of it.
> This is in fact how it works in Hare, we just have to glue a few pieces together to make it useful. You can compile Hare code to any
> supported arch like so:
>
> $ harec -X^ -T+riscv64 example.ha | qbe -t rv64 > example.s
> $ as -o example example.s

That's fine then!

> But generally Hare programs are not built like this - they're built by invoking the build driver, hare(1), which is just currently
> lacking the convenience flags to glue all of the bits together. Also missing is the non-trivial considerations for handling sysroots
> when linking to native C libraries, though thankfully most Hare programs don't link to C.

OK so in practice it does support cross-compiling, it's just not trivial to use right now. That's much less of a problem, because usually when cross-compiling is not supported, it stems from reasons having their roots buried very deep. Here it's mostly a UI issue, in fact.

Thanks for the clarifications.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 16:22 UTC (Mon) by ddevault (subscriber, #99589) [Link] (3 responses)

>I.e. if I want to write a variant of base64 output whose max output size I already know based on the input, am I supposed to use concat() between strings for every character or may I just create a string of suitable size and fill the chars a-la "*(p++) = c" since I know what I'm doing?

Here's some sample code that might answer your question:

https://paste.sr.ht/~sircmpwn/ba2f1f110c6d3e0a6e16c496af0...

Try changing the buffer size to 16 and run it again - the runtime will detect the overflow and abort.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 16:26 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (2 responses)

But this one seems to be using a pre-existing base64() function; my question was rather from the function implementer's point of view. In fact, do I still have the ability to fill a buffer one byte at a time and cast it in return, saying "this is my string" (possibly with a required size prefix or any such thing to accommodate internal needs)?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 16:31 UTC (Mon) by ddevault (subscriber, #99589) [Link] (1 responses)

I'm still not sure how to answer your question generally, but:

>In fact, do I still have the ability to fill a buffer one byte at a time and cast it in return saying "this is my string"

https://paste.sr.ht/~sircmpwn/7ce10cda33c51304974b324e3b6...

Does that tell you what you want to know?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 16:33 UTC (Mon) by wtarreau (subscriber, #51152) [Link]

> https://paste.sr.ht/~sircmpwn/7ce10cda33c51304974b324e3b6...
>
> Does that tell you what you want to know?

Looks so, thank you :-)

DeVault: Announcing the Hare programming language

Posted May 2, 2022 3:20 UTC (Mon) by flussence (guest, #85566) [Link] (4 responses)

I wouldn't dismiss it offhand, but I've been burned before by languages releasing to much self-generated fanfare years before they're actually ready. The announcement seems a lot more polished than the language itself; the blurb's making some tall boasts for something with a spec document full of "section missing"s (and no spectests at all?)

Maybe in a year or two...

DeVault: Announcing the Hare programming language

Posted May 2, 2022 6:59 UTC (Mon) by ddevault (subscriber, #99589) [Link] (2 responses)

It's not 1.0 yet, and that comes with some obvious caveats. However, it is useful for many kinds of systems programs, has a pretty broad and mostly mature standard library, only a small number of non-disruptive future language changes planned, and lots of docs. There's no test suite to explicitly measure conformance against the (draft!) specification, but the compiler and standard library both have plenty of tests, some of which do directly verify their behavior against the spec.

So yes, it's still a work in progress, and that comes with its caveats, but it's also a lot more fleshed out than you might expect. We're planning on putting a lot of work into 1.0, more than most languages do, because we plan to feature freeze it there. We might have released 1.0 much sooner if not for that.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 17:05 UTC (Tue) by flussence (guest, #85566) [Link] (1 responses)

Yeah, I noticed a complete absence on the site of any kind of version number; it'd be unfair to put it on blast this early on. Those ones I got burned by *did* make a big deal of being "done"...

You've (all) built something interesting here, it works (seems like a few people disagree with how it _should_ work, but that's language design for you), and I appreciate it being its own thing and not just another LLVM frontend. I think it has a decent chance of going somewhere.

Why not LLVM

Posted May 4, 2022 5:44 UTC (Wed) by cbushey (guest, #142134) [Link]

What would be wrong with having it be a LLVM front end?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 15:07 UTC (Mon) by beagnach (guest, #32987) [Link]

> I've been burned before by languages releasing to much self-generated fanfare years before they're actually ready

It's pretty much a necessity to get a language into the wider world before it's actually ready.

I think any reasonably experienced developer will be well aware of the implications of a pre-1.0 version number.

And is Hare any worse than, say, Rust pre 1.0?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 7:25 UTC (Mon) by roc (subscriber, #30627) [Link] (151 responses)

I don't want to use systems built in a "trust the programmer" language. I don't trust any programmers, including myself. (OK, I trust DJB, but him only.)

So I'm curious what the "robust" claim in the language blurb actually means. Programs written in this language will only be robust if the programmer is perfect, which makes it a stone-soup kind of claim. Does it mean that the language specification and the compiler are robust?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 7:34 UTC (Mon) by ddevault (subscriber, #99589) [Link] (148 responses)

The key is in the point that follows "trust the programmer":

> Provide tools the programmer may use when they don’t trust themselves.

If you don't trust yourself, make use of the tools. It does the right thing by default and trusts you if you tell it you know better, which is required to handle certain systems programming use-cases which are well supported by C but tend to get marginalized by other languages.

A lot of people seem to latch onto the first point and fire straight into criticism without considering the second. Believe me, we know programmers are untrustworthy, which is what these tools are for - but you need to trust them to get certain things done.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 7:44 UTC (Mon) by roc (subscriber, #30627) [Link] (143 responses)

I didn't see any tools for preventing use-after-free bugs.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 7:46 UTC (Mon) by ddevault (subscriber, #99589) [Link] (128 responses)

No, there are no tools to prevent use-after-free. Hare uses manual memory management. There are other tools, like bounds checked slices and arrays, mandatory error handling, nullable pointer types, and so on, but it's not as comprehensive in this respect as, say, Rust, by design.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 8:32 UTC (Mon) by atnot (guest, #124910) [Link] (127 responses)

That seems at least mildly concerning, considering the fact that e.g. the majority of 0days analyzed by google were use-after-frees: https://googleprojectzero.blogspot.com/2022/04/the-more-y...

Of course, the tooling for detecting uaf is complex. But it's complex because especially in concurrent programs, uaf is hard to get right, both for humans and computers. I don't find the idea of just glossing over that in 2022 particularly compelling.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 8:34 UTC (Mon) by ddevault (subscriber, #99589) [Link] (122 responses)

Security is just one of many traits we're balancing in Hare, and it's weighed against other trade-offs. We came away with a different answer than Rust. Let's see how it performs in practice before judging it too harshly based on speculation over what its vulnerabilities might look like.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:29 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (121 responses)

Do you have a theory for why use-after-free is less likely in Hare than in C? If not, what is the metric you're going to use for determining whether the trade-off is preferable to Rust's?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:36 UTC (Mon) by ddevault (subscriber, #99589) [Link] (120 responses)

Use-after-free may or may not be less likely in Hare than C (I really couldn't say off-hand). The trade-offs to which I refer are broader than simply use-after-free, however: it's the entire domain of safety-oriented language design. Use-after-free is one issue we've chosen not to address, though we have addressed many others, and unlike many Rust advocates, I don't think that writes off the entire language.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:49 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (119 responses)

I think we're at the point in history where anyone who writes a compiler that permits use-after-free should be held liable for anyone who manages to fuck up as a result of that. Security issues aren't a matter of inconvenience now - we've reached a level of pervasive tech that results in people literally dying as a result of memory unsafety. If you're fine with that then hey go with it, but you should be explicit about making design choices that increase the risk of actually awful outcomes instead of punting that to the people who use your language.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:52 UTC (Mon) by ddevault (subscriber, #99589) [Link] (117 responses)

There are significantly more non-life-critical use-cases than there are life-critical use-cases. At no point has anyone suggested that Hare be used to write pacemaker firmware. The introduction to the crypto module is also quite serious about security obligations:

> Cryptography is a difficult, high-risk domain of programming. The life and well-being of your users may depend on your ability to implement cryptographic applications with due care. Please carefully read all of the documentation, double-check your work, and seek second opinions and independent review of your code. Our documentation and API design aims to prevent easy mistakes from being made, but it is no substitute for a good background in applied cryptography.

Not everyone is working on airplane guidance systems.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:55 UTC (Mon) by mjg59 (subscriber, #23239) [Link]

> Not everyone is working on airplane guidance systems.

No, some of them are writing chat apps or media decoders or other things that simultaneously contain private data and attempt to parse untrusted data. The set of useful apps you can write these days that are at zero risk of memory unsafety is tiny, and it doesn't need to be a traditionally safety-critical use case to risk deaths as a result.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 10:35 UTC (Mon) by roc (subscriber, #30627) [Link] (19 responses)

The problem is that any code exposed to potentially malicious input is a security attack surface. And even if you don't care about your device being compromised, it's still a hazard for others; e.g. any compromised network-attached device can be part of a botnet for DDoS or a relay for ransomware attacks.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 10:38 UTC (Mon) by ddevault (subscriber, #99589) [Link] (18 responses)

Aye, this is why Hare *does* offer some security features. We just don't offer all of the same features as Rust does, and that's fine. It's a different set of trade-offs.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 11:33 UTC (Mon) by khim (subscriber, #9252) [Link] (6 responses)

The majority of problems in real-world C/C++ programs come from issues with memory management. Unless you can explain what you have done to avoid this issue other security features are not all that interesting.

It's like making a car and then deciding that adding wheels to it would be too hard. Such car would still be useful for some things, but it wouldn't be useful for most people.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 11:36 UTC (Mon) by ddevault (subscriber, #99589) [Link] (4 responses)

I have already explained Hare's security features many times, some of which relate directly to memory management, such as nullable pointer types or bounds-checked slices. Memory issues are much more rare in Hare programs than in C programs.

Part of Hare's goal is to question the orthodoxy pushed by Rust and its proponents and to look for a different solution. No, it's not Rust. No, it's not going to be Rust. That does not rule it out as an interesting language.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:56 UTC (Mon) by ncm (guest, #165) [Link] (2 responses)

It does, when you demonstrate you have utterly failed to comprehend Rust's, let alone C++'s, value proposition.

It is 2022, not 1972. Bring something new and valuable to the table. Don't waste our time on 1970s retreads.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:58 UTC (Mon) by ddevault (subscriber, #99589) [Link] (1 responses)

Rest assured that I fully understand the value Rust supposes it offers, and C++ also. You have not invented the God language, and you are not His servant. If you aren't interested in Hare, then fine, more power to you, but bugger off while we build the language we want - not the language you want.

*Sigh*

Posted May 2, 2022 14:02 UTC (Mon) by corbet (editor, #1) [Link]

Someday we'll be able to post an item on programming languages without things degrading this way. Evidently not today. Let's all please back off a bit now.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 23:48 UTC (Mon) by roc (subscriber, #30627) [Link]

> Memory issues are much more rare in Hare programs than in C programs.

Do you have evidence this is true in practice? I guess you've fuzzed Hare programs and equivalent C programs and measured the bug rates?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 16:40 UTC (Mon) by wtarreau (subscriber, #51152) [Link]

> The majority of problems in real-world C/C++ programs come from issues with memory management

I disagree with that claim. At least it's not what I'm suffering the most from. They're among the longest ones to debug though. Most problems I'm seeing actually result from undefined behaviors (especially recently when compilers started to abuse them to win the compiler wars), and problems with integer type promotion, which is a real mess in C, and that started to turn something benign into real bugs with the arrival of size_t everywhere and 64-bit machines while dealing with plenty of APIs written for ints. These ones are often considered to be memory management issues because they result in buffer overflows and so on, but they actually are not at all.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:14 UTC (Mon) by HelloWorld (guest, #56129) [Link] (10 responses)

So what trade-off is that? What does it buy you not to prevent resource leaks and use-after-free bugs statically? Really the only answer I can think of is that it makes the language easier to learn. So you save perhaps a couple of weeks of fighting with the borrow checker in order to later spend months debugging leaks and use-after-free issues – assuming you even find them before they cause a production outage and cost you millions of dollars. That doesn't seem like a good trade-off to me.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:20 UTC (Mon) by ddevault (subscriber, #99589) [Link] (9 responses)

The trade-offs come in the form of the complexity of the language and its implementation, which in Rust's case is utterly out of control. I'm not convinced that borrow checking cannot be done more simply - we intend to research this for Hare - but it's not the holy grail of programming which creates a moral lower class among other languages, as the Rust adherents in this thread seem to believe.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:17 UTC (Mon) by linuxrocks123 (subscriber, #34648) [Link] (5 responses)

How about this: keep track of all the pointers in the runtime, and, when a memory region is freed, replace all the pointers that refer to it with NULL. Then, if anyone tries to do a use after free, they'll get an immediate crash. This wouldn't be a feature you'd want to have turned on for production code, but it could help a lot during development and testing, and it would avoid burdening the programmer with additional borrow-checking rules. I hope, also, that you'll allow bounds checking and similar features to be turned off after debugging and testing are done: I'd rather not pay the cost of those debugging aids in production.
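A minimal sketch of that idea in C, with hypothetical names (`track`, `tracked_free`); a real implementation would hook the allocator and compiler rather than require manual registration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Debug-build sketch: the runtime records the address of every live
 * pointer variable; freeing a block nulls every registered pointer
 * that points into it, so a later use-after-free faults immediately. */
#define MAX_TRACKED 1024
static void **registry[MAX_TRACKED];
static size_t nregistered;

void track(void **slot)
{
    assert(nregistered < MAX_TRACKED);
    registry[nregistered++] = slot;
}

void tracked_free(void *block, size_t size)
{
    uintptr_t lo = (uintptr_t)block, hi = lo + size;
    for (size_t i = 0; i < nregistered; i++) {
        uintptr_t p = (uintptr_t)*registry[i];
        if (p >= lo && p < hi)
            *registry[i] = NULL; /* poison the dangling pointer */
    }
    free(block);
}
```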

The reception you've gotten from others in this comment section is very unfortunate, although, given the people involved, I'm also not terribly surprised they're acting this way. A good philosophy for handling negativity from ignorant, opinionated blowhards is to just ignore them and keep doing what you love. I'm sure you'll make something great with Hare.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:19 UTC (Mon) by ddevault (subscriber, #99589) [Link] (4 responses)

>How about this: keep track of all the pointers in the runtime

This is a non-starter, it's too expensive. Interesting idea, though. We are planning on writing an optional debug allocator which will address many memory-related bugs, such as use-after-free, in a similar manner to valgrind.

> The reception you've gotten from others in this comment section is very unfortunate, although, given the people involved, I'm also not terribly surprised they're acting this way. A good philosophy for handling negativity from ignorant, opinionated blowhards is to just ignore them and keep doing what you love. I'm sure you'll make something great with Hare.

Thanks :)

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:53 UTC (Mon) by linuxrocks123 (subscriber, #34648) [Link] (3 responses)

> This is a non-starter, it's too expensive.

What if you used a pool allocator to separate the pointers from the non-pointers? That would add a level of indirection to structs that had pointers, since they'd have to be converted to pointers to pointers, and also to pointers on the stack, since ditto. But then you'd have a nice contiguous array of all your pointers that you'd just have to scan upon calls to free, and you might be able to use SIMD for that.

If you've thought of that and it's too expensive, I'll stop now, but I just thought I'd mention it in case you hadn't thought of it :)

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:55 UTC (Mon) by ddevault (subscriber, #99589) [Link]

I don't have time to give this idea the thought it deserves right now, but thanks for sharing. Added it to my notes for improvements to allocator insights.

Use-after-free checking at low runtime cost

Posted May 4, 2022 1:12 UTC (Wed) by akkartik (guest, #158307) [Link] (1 responses)

Since you seem interested in this space, I'll throw out one idea I particularly like and have used in a project [1]: manage heap allocations using a fat pointer that includes an allocation id. The pointer contains the allocation id and so does the payload. Every dereference of the fat pointer compares the allocation id in the pointer and payload. Freeing an allocation resets its allocation id. Future allocations that reuse the allocation will never generate the same allocation id. A use-after-free dereference then leads to an immediate abort, which is easier to debug and more secure.
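A rough C rendering of that scheme (illustrative names only; Mu's actual implementation differs) could look like:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fat pointer: the allocation id is stored both in the pointer and in
 * a header in front of the payload; every dereference compares them. */
typedef struct { uint64_t id; } alloc_header;
typedef struct { alloc_header *h; uint64_t id; } fatptr;

static uint64_t next_id = 1; /* ids are never reused */

fatptr fat_alloc(size_t n)
{
    alloc_header *h = malloc(sizeof *h + n);
    h->id = next_id++;
    return (fatptr){ h, h->id };
}

void *fat_deref(fatptr p)
{
    /* A mismatch means the block was freed (or recycled): abort. */
    assert(p.h->id == p.id && "use after free");
    return (char *)p.h + sizeof(alloc_header);
}

void fat_free(fatptr p)
{
    assert(p.h->id == p.id && "double free");
    p.h->id = 0; /* header kept alive in this sketch so that stale
                    dereferences reliably hit the id check */
}
```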

The overhead of this scheme is too great for most C/Rust programmers, but I think it's much lower than tracking all pointers or indirections in structs containing pointers.

[1] https://github.com/akkartik/mu

Use-after-free checking at low runtime cost

Posted May 4, 2022 13:22 UTC (Wed) by HelloWorld (guest, #56129) [Link]

The best that a run-time check for this sort of thing can do is turn one bug into a different kind of bug, at a considerable performance cost. While that can be useful for legacy programming languages like C (primarily as a debugging tool), it's simply the wrong approach for new languages. Modern programming language design should be focused on statically preventing bugs, and messing around with run-time checks is simply a waste of time.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:54 UTC (Mon) by HelloWorld (guest, #56129) [Link] (1 responses)

> I'm not convinced that borrow checking cannot be done more simply - we intend to research this for Hare
I don't think something this fundamental can be retrofitted. It specifically says on the Hare website that you intend to place a strong emphasis on backward compatibility, which means that once programs with lifetime issues (leaks/use-after-free) are out there, the compiler needs to be able to compile them, and thus the same bugs can occur in new code as well.

I wish you well with your efforts regarding a simpler borrow checker.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:57 UTC (Mon) by ddevault (subscriber, #99589) [Link]

To be clear, we're only committed to backwards compatibility following 1.0. Borrow checker research will be done prior to 1.0.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 23:57 UTC (Mon) by roc (subscriber, #30627) [Link]

Optimizing for a simpler language and implementation makes sense when your audience is mostly the people working on the language and implementation. It makes less sense when your audience is mainly people developing in the language, and even less sense when you also consider users of software developed in the language.

Language simplicity is good for developers, of course, but absence of memory corruption and data races provides another kind of simplicity. (And I think Rust is relatively free of the kind of "accidental" complexity that C and C++ are full of.)

DeVault: Announcing the Hare programming language

Posted May 2, 2022 16:49 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (95 responses)

> Not everyone is working on airplane guidance systems.

And bugs in such environments are usually much nastier, like NaN propagation due to a 0/0 or inf-inf somewhere in a calculation, int-to-float-to-int loss of precision, etc. There's actually a very wide class of bugs that result from people deliberately ignoring the target system and standards, and constantly relying on the compiler to hide problems is not going to make these classes of issues disappear, quite the opposite. I've seen developers ask me "what's so special with this value 4294967295?". You definitely never want to climb on a plane that uses that person's code. Even with the help of a compiler...

DeVault: Announcing the Hare programming language

Posted May 2, 2022 17:14 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (94 responses)

Combining this:

> There's actually a very wide class of bugs that result from people deliberately ignoring the target system and standards

with a sibling reply of yours[1]:

> Most problems I'm seeing actually result from undefined behaviors (especially recently when compilers started to abuse them to win the compilers war)

If one doesn't like the compiler (ab)using undefined behaviors in the standard, isn't that a consequence of ignoring the standard? Sure, one could argue that the *standard* is the silly thing here, but what compiler is going to give "what I meant" to you? And at that point, are you really coding in C anymore?

I feel like if you were to apply the sentiment expressed here to the complaints in the other one, you'd be cursing the developer for writing code that didn't adhere to the standard rather than the compiler for (arguably rightly) not understanding what the developer actually wanted because it wasn't communicated properly.

[1] https://lwn.net/Articles/893500/

DeVault: Announcing the Hare programming language

Posted May 2, 2022 17:38 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (93 responses)

> Sure, one could argue that the *standard* is the silly thing here, but what compiler is going to give "what I meant" to you? And at that point, are you really coding in C anymore?

C code that only works with -fwrapv is still C code.
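For instance, a check like this only works as intended under `-fwrapv`; without the flag, signed overflow is UB and the compiler may fold the comparison away:

```c
#include <limits.h>

/* Under -fwrapv, x + 1 wraps to INT_MIN when x == INT_MAX, so the
 * comparison detects the overflow. Without the flag, the compiler may
 * assume x + 1 > x always holds and compile this to "return 0". */
int add_would_overflow(int x)
{
    return x + 1 < x;
}
```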

DeVault: Announcing the Hare programming language

Posted May 2, 2022 17:51 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (92 responses)

Sure, if it only ever gets built in an environment where such a flag is provided. But vendoring involves the moral equivalent of `$(CC) subdir/*.c` often enough that claiming it as C in such a situation is quite dangerous.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:36 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (91 responses)

Let me be less flippant and more explicit: C was originally conceived as a more portable assembler. UB did not mean "the compiler twists your code into a pretzel," it meant "we don't know what will happen, because there's an obscure implementation from 1972 that traps, and then the OS gets involved, and then who the hell knows what happens after that, so therefore if you do this thing, weird stuff may happen on that one implementation." If you knew, for a fact, that you were not targeting this obscure implementation, then you could write UB just fine, and treat it as if it were implementation-defined instead.

Many of the UB rules were written to support architectures that, by modern standards, are just not things people use any more. Nobody uses ones' complement or sign-magnitude for integers. Hardly anybody uses sNANs. Segmented architectures are very uncommon these days, as is the infamous NaT bit from Itanium. In a sane world, I'd be able to just say "well, I don't care about targeting any of those weird platforms; if you want to support them, you're on your own."

But I can't say that, because compiler writers have decided that UB means "the compiler twists your code into a pretzel." I understand that some of these optimizations do improve performance in various ways, but they also result in things like optimizing out NULL checks if you can prove that the pointer was previously dereferenced (a problem which has struck the kernel more than once, as I recall). I just wish we could have more of a happy medium, where you don't get UB unless you actually corrupt the heap, overflow the stack, or some similar catastrophe. All other forms of UB, IMHO, should have been specified as "implementation-defined, but the implementation may specify UB if it is unable to provide any guarantees." Then at least we could characterize this as a quality-of-implementation issue.
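The NULL-check case reduces to a few lines; this is a sketch of the pattern that has bitten the kernel, not code taken from it:

```c
/* The early dereference tells the optimizer p cannot be NULL, so at
 * -O2 the subsequent check is dead code and the -1 path can be
 * silently deleted, even though the programmer wrote it on purpose. */
int first_byte(const char *p)
{
    char c = *p;          /* UB if p == NULL: compiler assumes it isn't */
    if (p == 0)
        return -1;        /* may be optimized away entirely */
    return c;
}
```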

DeVault: Announcing the Hare programming language

Posted May 2, 2022 20:14 UTC (Mon) by mathstuf (subscriber, #69389) [Link]

> Let me be less flippant and more explicit

Thanks :) .

> In a sane world, I'd be able to just say "well, I don't care about targeting any of those weird platforms; if you want to support them, you're on your own."

I agree that some kind of:

std::static_assert(std::target::is_twos_complement()); // definitely required
std::warn_on(!std::target::os::any_of(std::target::os::macos, std::target::os::linux)); // Windows is untested, but allow tinkering

support would be wonderful. But that's missing and is currently only metadata.

> But I can't say that, because compiler writers have decided that UB means "the compiler twists your code into a pretzel."

The twos complement example gets used all the time, but I heard another tale of why signed overflow is undefined: type promotion. If I iterate over a `short` and overflow is important, promoting to an int to access faster instructions or whatever and just letting it overflow to `USHORT_MAX+1` is no longer a valid optimization. C type promotion is weird and the differences between signed and unsigned promotions are likely reasons why one has defined extrema behaviors, but I don't know that much about that side of it. I admit that my grokking of the relevant rules is fuzzy at best and compiler warning-guided at worst.
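The promotion trap described above fits in two lines; on a platform with 32-bit int, both `uint16_t` operands promote to signed int, so the "obviously unsigned" multiply can overflow:

```c
#include <stdint.h>

/* 0xffff * 0xffff == 4294836225 does not fit in a 32-bit signed int,
 * so the uncast multiply is UB after integer promotion. Casting one
 * operand first keeps the arithmetic unsigned and well-defined. */
uint32_t square_u16(uint16_t a)
{
    return (uint32_t)a * a;
}
```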

Rust's `Wrapping` and `Saturating` types make this far better by making such assumptions explicit. The debug check vs. release free-for-all is an acknowledgement that such behavior is fine in most cases, but should be considered. Those that want specific assumptions recognized should really tell the compiler about them. But Rust has ways of doing so and C and C++ still lack it.

> I understand that some of these optimizations do improve performance in various ways, but they also result in things like optimizing out NULL checks if you can prove that the pointer was previously dereferenced (a problem which has struck the kernel more than once, as I recall).

Yes. I'd love the compiler to signal "hey, I made UB-assumptions", but that is apparently very non-trivial in practice. At least with current compiler architectures. Maybe someone will implement better origin tracking when working on LLVM bytecode or GCC's GIMPLE to "see" what was actually written, but I don't see new C++ compiler infrastructure getting anywhere in the next 5 years seeing how many have "given up" and migrated to being LLVM/Clang reskins (though many were EDG skins, this consolidation is not promising IMO). I suspect the prevalence of macro-stamped code and optimizing inline code makes this difficult because you *want* such optimizations in those cases. Template instantiation almost certainly has similar problems.

Additionally, Ralf Jung's blogs[1] on[2] provenance[3] show that there are ways that different passes, while fine on their own, may *combine* to abuse UB into something unintended (though this is more about showing that provenance has to be a thing than UB in general, I would be surprised if there were no cases of such behaviors between various dead code and value analysis optimization passes). I have no idea how that is expected to be tracked and handled internally.

[1] https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html
[2] https://www.ralfj.de/blog/2020/12/14/provenance.html
[3] https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html

DeVault: Announcing the Hare programming language

Posted May 3, 2022 10:30 UTC (Tue) by nye (guest, #51576) [Link] (4 responses)

> things like optimizing out NULL checks if you can prove that the pointer was previously dereferenced

This sort of thing is exactly why we shouldn't be asking for languages which "trust the programmer". By dereferencing a pointer, according to the rules of C, the programmer is instructing the compiler that it can assume the pointer is not null. It seems churlish to then turn around and complain that the compiler took you at your word.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 14:01 UTC (Tue) by wtarreau (subscriber, #51152) [Link] (3 responses)

Quite the opposite. In practice I *know* that dereferencing a NULL on any modern platform causes a SEGV and I'm using it exactly for that purpose. In C that's UB so compilers decided that since the program doesn't exist afterwards it's not their problem and they can eliminate the code. Except that my code was there precisely to provoke a panic and crash the program before it degenerates, while still preserving registers and frame pointer intact. Using abort() is not an option for this (it ruins everything and sometimes you can't even unwind the stack when you're mailed the core relying on libs that are not exactly yours). Result: I had to cheat and dereference (int*)1 because the compiler didn't know it was NULL as well. It's constantly a cat-and-mouse game between C developers and compiler developers, with the former saying "let me use my processor and OS for the purpose they were built" and the latter saying "we don't want you to do that because that's stupid". It may be stupid from a compiler developer's point of view, but if all C developers were compiler developers we wouldn't need gcc nor clang and would each one develop our own compiler. So please let us dictate to the compiler what we want to do so we can use our hardware in peace.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 14:44 UTC (Tue) by excors (subscriber, #95769) [Link]

> In practice I *know* that dereferencing a NULL on any modern platform causes a SEGV and I'm using it exactly for that purpose.

That depends on what you consider a modern platform - there are ARMv8-M microcontrollers where address 0 is the start of flash, and reading NULL will return some non-zero data (because that's where the vector table is stored). (Writing to NULL might raise an exception or actually modify flash, depending on the current configuration of the flash controller.)

> Except that my code was there precisely to provoke a panic and crash the program before it degenerates

If you're explicitly writing to NULL, Clang will optimise it away but helpfully tells you:

> warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
> *(int *)0 = 0;
> ^~~~~~~~~
> note: consider using __builtin_trap() or qualifying pointer with 'volatile'

volatile prevents the optimisation, so it will emit the write instruction. __builtin_trap typically emits an undefined instruction (ud2 on x86), which will crash even on systems where 0 is a valid address.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 15:18 UTC (Tue) by nye (guest, #51576) [Link]

Oh come on - you're talking about deliberately writing something that you know is invalid in the chosen language and then complaining that the compiler implementer didn't do what you mean. And *then* getting angry about them not trusting you! I don't see how anyone could consider this a defensible position.

Do What I Mean

Posted May 3, 2022 16:27 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

Rather than "let me use my processor and OS for the purpose they were built" what you actually seem to want, though you don't seem aware of it, is "Do What I Mean" which is not deliverable, hence why you're unsatisfied.

First of all you will need to write what you meant, if you can do that, compilers can produce programs that do what you wrote and you'll be satisfied. But most often unhappy C programmers find that they really struggle to write what they meant, and that's where the problem lies, because if you can't write what you meant, then a program which does what you wrote won't do what you meant and you'll be unhappy.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 13:55 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

Exactly what I meant, thanks Kevin ;-)

DeVault: Announcing the Hare programming language

Posted May 3, 2022 14:00 UTC (Tue) by felix.s (guest, #104710) [Link] (83 responses)

Many of the UB rules were written to support architectures that, by modern standards, are just not things people use any more. Nobody uses ones' complement or sign-magnitude for integers. Hardly anybody uses sNANs. Segmented architectures are very uncommon these days, as is the infamous NaT bit from Itanium.

The fact that the popularity of those architectures has faded recently doesn’t make them any less legitimate, and doesn’t preclude reusing the ideas they are based on in the future.

Do you want to condemn software to being either re-written from scratch whenever a new architecture appears or subjected to costly, painful emulation of behaviour that nobody really wants, because it’s just too hopeless to attempt to scrutinise the entire codebase for any unportable assumptions it may contain?

This is (in a case of poetic spite, I must admit) more or less the situation Rust finds itself in when it comes to porting to CHERI architectures. Rust made the blithe assumption that size_t is the same as uintptr_t because ‘come on, nobody uses segmented architectures any more’, and named the common type usize. And then Morello appeared and suddenly there is an architecture where C is easier to port than Rust, for this and a couple of other reasons.

Last I checked, the direction Rust seems to be going is to swallow the bitter pill and define usize to be uintptr_t, accepting the resulting memory bloat in situations where size_t happens to be smaller.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 14:07 UTC (Tue) by wtarreau (subscriber, #51152) [Link] (79 responses)

> Do you want to condemn software to being either re-written from scratch whenever a new architecture appears

That's not the point. Right now with compilers abusing UB, there's no way for you as a user to have that portable code because it will work on neither architecture. If the compiler would translate your code into machine code, just thinking "this developer does stupid things, but that's their problem", then you could deal with special cases when you face them, as has always been done when porting code to other platforms or operating systems.

The problem got worse with UB abuse because the tricks you have to use to work around the compiler's stubbornness are even less portable than the original code itself. Is it normal that in 2022 I'm using more and more asm() statements to prevent the compiler from lurking into what I'm doing ? I don't think so. It feels like one day my whole C code will only be a bunch of macroes based on asm() statements. That's not my goal when I'm using a C compiler, really.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 15:07 UTC (Tue) by Wol (subscriber, #4433) [Link] (3 responses)

As somebody pointed out quite some while ago, compiler writers seem to be redefining defined behaviour as undefined.

What they SHOULD be doing is turning undefined behaviour into implementation defined. "On an x86_64 system, we don't check for addition overflow. You get what the hardware gives you". NOT "if you're stupid enough to add two integers both large enough for the high order bit to be set, we'll multiply them together instead then give you the middle bytes of the result". Okay, that example is facetious, but as people keep pointing out, when the programmer knows enough to put an "if ptr is null" guard in place, they do NOT want the compiler deleting it as undefined behaviour! FFS, the programmer clearly *knows* something could be wrong, and has put a test in there for it!

Cheers,
Wol

DeVault: Announcing the Hare programming language

Posted May 3, 2022 18:24 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (1 responses)

> when the programmer knows enough to put an "if ptr is null" guard in place, they do NOT want the compiler deleting it as undefined behaviour! FFS, the programmer clearly *knows* something could be wrong

And when this is stamped out code from a macro or template instantiation, should it also not be removed? What a silly optimization to leave on the cutting room floor. Compilers could probably better track this stuff to know the difference between macros and template code, but it doesn't seem to be high on the priority list right now.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 15:01 UTC (Wed) by khim (subscriber, #9252) [Link]

> Compilers could probably better track this stuff to know the difference between macros and template code, but it doesn't seem to be high on the priority list right now.

It's not high on the priority list because people just couldn't agree on which things should be retained and which shouldn't be.

Without clear, consistent rules there is nothing to discuss. No matter what the compiler does or doesn't do, there would always be someone who would claim it's the wrong behavior.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 14:56 UTC (Wed) by khim (subscriber, #9252) [Link]

> FFS, the programmer clearly *knows* something could be wrong, and has put a test in there for it!

What makes that test any different from many other tests which the programmer knows that the compiler knows how to remove?

There have been attempts to define a friendly C dialect. All failed because people just couldn't agree which checks are superfluous and should be removed from the program and which are important and should be retained.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 16:40 UTC (Tue) by felix.s (guest, #104710) [Link] (70 responses)

Is it normal that in 2022 I'm using more and more asm() statements to prevent the compiler from lurking into what I'm doing ?

I don’t think so. I happen to do just the opposite, write the dumbest possible pure ISO C code and watch in satisfaction as the optimizer turns it into a compact and performant opcode sequence.

That's not the point. Right now with compilers abusing UB, there's no way for you as a user to have that portable code because it will work on neither architecture.

There is a way: you avoid the cases that trigger UB, and rely only on what the abstract machine guarantees. I can agree this is not always easy, but there are tools to help you with that. As long as the abstract machine is implemented correctly and its invariants are upheld, the program will work on any target it is compiled for.

And yes, this is very much the point. Either you insist that the C abstract machine map exactly to the primitives of the platform it’s implemented on, even in cases that are undefined on the abstract machine itself, or you don’t.

If you don’t, you forfeit any right to complain that compilers ‘abuse’ UB: if it’s undefined, it’s undefined, and it doesn’t even have to act deterministically. Undefined behaviour can change when the hardware changes, when the OS changes, when the compiler changes, when the placement of your program within the address space changes, when the day of the week changes, when the precise location of all electrons within the atmosphere changes. You are expected to prevent the situation triggering UB from happening in the first place. If you don’t, it’s your fault.

If you do so require, you give up on optimizations and portability, including portability to future versions of the same architecture. You accept that people are going to say ‘I have learned to write null pointer checks, that’s why there isn’t one present.’ and there is nothing you can say to convince them otherwise. You agree that software is going to do crazy shit like forging pointers to memory it has no right to assume is there, assuming that 640K is enough for anybody, and relying on open bus behaviour, and all that has to be preserved in perfect detail as long as you want to keep it running, even cases that were erroneous to begin with, until it’s rewritten to rely on another platform’s implementation details. A throwback to DOS days, if you ask me.

This is (a somewhat exaggerated version of) the dilemma you face. There is no third way. Based on your response, it seems you prefer the latter.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 18:22 UTC (Tue) by atnot (guest, #124910) [Link] (24 responses)

> you give up on optimizations and portability

I feel like people often mention "giving up on some optimizations" without expressing the full implications of what that actually means. It sort of makes it sound like if those annoying compiler authors just simply removed the bad passes, one could sacrifice a few percent here or there to get more predictable behavior. Now, as others have pointed out, the first problem with this idea is that these confusing non-obvious results often come from combining multiple obvious passes with no clear culprit.

But for this specific case, as far as I can tell, making it well-defined to turn arbitrary addresses into pointers and dereference them, as would be required to make dereferencing null pointers valid, is a complete non-starter. It makes it basically impossible to perform any optimizations at all. You can no longer rely on anything in memory still being the way it was across function calls, so no more storing things in registers. You have to spill everything to the stack. Removing an unused variable? Can't do it, something might be getting the address of it from somewhere. Just assigned something to a struct member? Well, you can't be sure what it is now, because there was a function call in between, and the implementation of free() might have held onto the address of that allocation and fiddled with it in between.

You can definitely argue about overflow UB and such, sure. But without some level of understanding of allocations and what a pointer can and can't point to, it is basically impossible to do anything at all. I'm sure some would prefer it that way, making C an actual "portable assembler" with no abstract machine of its own. But that has huge implications.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 14:02 UTC (Wed) by Vipketsh (guest, #134480) [Link] (23 responses)

> making it well-defined to turn arbitrary addresses into pointers and dereference them [...]. You can no longer rely on anything in memory still being the way it was across function calls

Compilers cannot rely on that today, at least not in general. Without the seldom used 'pure' and 'const' attributes, the compiler has to assume that an (extern) function call has modified any and all memory accessible through some pointer. Furthermore, there are rules in the C standard for when the compiler has to assume things may have been indirectly modified through random pointers: the aliasing rules. These rules are generally so loose that many people make them much more strict with -fno-strict-aliasing, yet somehow we haven't seen a huge fallout from lack of optimisations as you would suggest. Being able to manufacture pointers out of random data does not have any effect on those rules!
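For reference, the canonical illustration of what those aliasing rules let the compiler assume (a sketch, not tied to any particular codebase):

```c
/* Under strict aliasing (the default at -O2), the store through i
 * cannot legally modify *f, so the compiler may return 1.0f without
 * re-reading memory. -fno-strict-aliasing forces the reload, so
 * type-punned callers behave "naively". */
float store_and_reload(float *f, int *i)
{
    *f = 1.0f;
    *i = 12345;
    return *f;
}
```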

It's interesting that in exactly *no* discussion of undefined behaviour have I ever seen any sort of numbers passed around along the lines of "if we would define that thing this way, we would lose an estimated X% of performance on some code bases"; instead it's all along the lines of your comment, saying "Oh, the hysteria, quiver in fear because you could do exactly no optimisations". People arguing to remove some undefined behaviour tend to give examples of what that undefined behaviour makes a big pain or impossible, but there are few concrete arguments from the other side about what removing the undefined behaviour in question would lose. That makes discussions, awareness of the problem, and finding some sort of middle ground exceedingly difficult.

> you can't be sure what it is now, because there was a function call in between, and the implementation of free() might have held onto the address of that allocation and fiddled with it

Guess what ? Every implementation of free() "holds onto the address" given to it (puts it on some free list) and "fiddles with it" (marks the area as unallocated).

Why is everything always painted as though, if we can't fix each and every case of a certain undefined behaviour without even a minimum of compromise, we may as well throw the baby out with the bath water? We don't have to make everything perfect and foolproof to make things better.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 14:36 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (1 responses)

> These rules are generally so loose that many people make them much more strict with -fno-strict-aliasing, yet somehow we haven't seen a huge fallout from lack of optimisations as you would suggest.

Surely you mean `-fstrict-aliasing` here. Or are you saying that people *loosen* the rules with `-fno-strict-aliasing` and still see no fallout?

> Every implementation of free() "holds onto the address" given to it (puts it on some free list) and "fiddles with it" (marks the area as unallocated).

That's an argument that `free` cannot be implemented in (ISO) C and ends up doing more platform-specific things with the pointer than C would normally allow. Just like `std::memmove` isn't technically possible (AFAIK) in ISO C++ (because of the rules around comparing pointer from separate allocations). See also `std::bless` in C++ to have a way to inform the compiler "I did some memory shenanigans, the object there is now C++-okay". I suspect that compilers "know" when they're compiling these functions and act accordingly (probably through some compiler flag or pragma whatnots). Or very careful coding around the rules that C has to make sure the intent is preserved across the abstract machine.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 14:57 UTC (Wed) by Vipketsh (guest, #134480) [Link]

> Surely you mean `-fstrict-aliasing` here.

Heh. I had a suspicion this would come up. The 'strict' in the compiler option refers to how strictly the compiler's alias analysis adheres to the standard. My use of 'strict' was referring to how many transformations are allowed by the standard. Wish I could have explained better.

> That's an argument that `free` cannot be implemented in (ISO) C

Indeed, but even so we don't have to disable all possible optimisations, as the post I'm replying to implies, while free() is routinely implemented in C.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 15:21 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (4 responses)

> It's interesting that in exactly *no* discussion of undefined behaviour have I ever seen any sort of numbers passed around along the lines of "if we would define that thing this way, we would loose an estimated X% of performance on some code bases",

I totally agree. Gcc 4.7 used to abuse UB way less than 6 and above, and I've yet to see a program run faster with gcc 11 than it did with gcc 4; usually it's even the opposite!

I've said a few times (probably in this thread, I don't remember) that if I knew how to do it and had enough time, I would be happy to create a new "standard" for gcc, such as "safe11" or something like it, that, next to gnu99 and friends, would be C11 with most (ideally all) UB defined to the most commonly expected behaviour (it wouldn't be that far from the "linux kernel C").

And I'm quite sure it would quickly be adopted by many of us suffering from such jokes. Plus it would remove a ton of nonsensical warnings, such as the ones that force you to scratch your head for a moment when trying to implement a binary integer rotate operation without any warning (a plain 32-bit shift doesn't work; you need to mask the opposite shift count with 31, and the compiler doesn't always recognize the pattern to optimize it into a single rol/ror operation).
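
For reference, the usual UB-free spelling of the rotate alluded to above (a sketch of the standard masking idiom, not wtarreau's exact code):

```c
#include <stdint.h>

/* 32-bit rotate-left without undefined behaviour: a plain
 * (x >> (32 - n)) is UB when n == 0 (shift by 32), so the opposite
 * shift count is masked with & 31.  gcc and clang generally recognize
 * this pattern and emit a single rol instruction. */
static inline uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << (n & 31)) | (x >> ((32 - n) & 31));
}
```

The masks make every shift count land in 0..31, so no operand of either shift can trigger UB.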

DeVault: Announcing the Hare programming language

Posted May 4, 2022 16:30 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (3 responses)

> I would be happy to create a new "standard" for gcc such as "safe11" or something like this

There have been attempts[1]. I've not heard news about meaningful progress (though I've also not sought it out). I'd expect any announcements of such a thing to show up on LWN in some manner :) .

[1] https://blog.regehr.org/archives/1287

DeVault: Announcing the Hare programming language

Posted May 6, 2022 2:59 UTC (Fri) by wtarreau (subscriber, #51152) [Link] (2 responses)

Ah interesting, thanks for the link!

Probably the mistake this person made was trying to reach a consensus. If the proposal worked for some old code base, surely it wasn't that bad, and it ought to have been proposed as-is as a patch to gcc.

DeVault: Announcing the Hare programming language

Posted May 6, 2022 13:05 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

> If the proposal worked for some old code base, surely it wasn't that bad, and ought to have been proposed as-is as a patch to gcc.

And it would be promptly rejected, because the question asked would be simple: why do you think you are so special that you deserve separate treatment?

As was noted in the blog post: there are many programs, typically compiled for ARM, that would fail if this produced something besides 0, and there are also many programs, typically compiled for x86, that would fail when this evaluates to something other than the original value… and both types can be rewritten to work within the limitations of standard C… so why should the compiler developers care?

More-or-less the only one who gets special treatment is Linus: not only does he lead a huge and important project but, more importantly, it's obvious that said project needs to go beyond the boundaries of Standard C sometimes.

Even then the leeway is extremely limited: Linus has to argue about things a lot for them to be accepted as GCC C extensions.

DeVault: Announcing the Hare programming language

Posted May 6, 2022 18:40 UTC (Fri) by wtarreau (subscriber, #51152) [Link]

> As was noted in the blog post: there are many programs typically compiled for ARM that would fail if this produced something besides 0, and there are also many programs typically compiled for x86 that would fail when this evaluates to something other than the original value… and both types can be rewritten to work within limitations of standard C… so why should the compiler developers care?

I do have a response to this: look at what it would cost each of them to adapt to the other one's behaviour, figure out which choice has the least impact, and purposely break the other one, given that it currently is broken or about to break anyway during a future compiler upgrade. But at least this would be clearly documented. And when the cost is the same I'd choose x86 by default, since 1) it has accumulated far more old code (arm code tends to be more modern and less arch-specific), and 2) it's where users go when they want the highest performance nowadays.

> More-or-less the only guy who they give special treatment is Linus: not only he leads a huge and important project, but, more importantly, it's obvious that said project need to go beyond boundaries of Standard C, sometimes. Even then leeway is extremely limited, Linus have to argue about things a lot for these to be accepted as GCC C extension.

Yes, I know, and that's sad.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 15:32 UTC (Wed) by excors (subscriber, #95769) [Link] (2 responses)

> These rules are generally so loose that many people make them much more strict with -fno-strict-aliasing, yet somehow we haven't seen a huge fallout from lack of optimisations as you would suggest. [...] It's interesting that in exactly *no* discussion of undefined behaviour have I ever seen any sort of numbers passed around along the lines of "if we would define that thing this way, we would lose an estimated X% of performance on some code bases", instead it's all along the lines of your comment saying "Oh, the hysteria, quiver in fear because you could do exactly no optimisations".

It's trivial to construct plausible examples where aliasing has a huge effect on performance, especially in C++ where you don't want everything to alias with 'this'. E.g. https://godbolt.org/z/zddTYK378 executes over 4x faster with -fstrict-aliasing on my CPU (because the compiler can autovectorize the loop when it realises the input and output don't alias). You can probably do similar with most other undefined-behaviour optimisations, but I'm not sure that would really prove much.

I think one major problem with trying to translate that into "X% of performance on some code bases" is that there's a massive range of code bases, and no benchmark suite is representative of them all, so it's impossible to get representative numbers. But even if it was: If an optimisation has no effect on 99% of programs, but it makes 1% of programs 4x faster, is that worth it? It seems the most common positions are "it's always worth it, regardless of the exact numbers" (modern compiler developers), and "it's never worth it, regardless of the exact numbers" (people who want C to be nicer syntax for assembly code), and the exact numbers probably won't change anyone's mind.

And in any code base where performance is important, the developer should have already profiled and optimised it around their current compiler's capabilities - e.g. if they had code like my example with -fno-strict-aliasing then they'd probably extract 'sum' into a local variable to help the compiler. Then a benchmark would show no benefit from -fstrict-aliasing, because the programmer has already paid the cost of working around aliasing problems. Optimisation isn't just a compiler algorithm, it's a feedback loop between compiler and programmer, so you can't evaluate it properly by running compilers on a static set of benchmarks.
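
The "extract 'sum' into a local variable" workaround might look like this (a hypothetical sketch with made-up names, not the godbolt code):

```c
#include <stddef.h>

/* Without strict aliasing, the compiler must assume each out[i] store
 * may clobber *sum, forcing a reload and re-store of *sum on every
 * iteration.  Accumulating into a local, which provably cannot alias
 * out[], sidesteps the question and stores the result once. */
void copy_and_sum(int *out, const int *in, size_t n, int *sum) {
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i];
        acc += in[i];
    }
    *sum = acc;   /* single store at the end */
}
```

The result is identical either way; the local accumulator just removes the aliasing question the optimizer would otherwise have to answer.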

And it's a feedback loop that spans decades: e.g. compilers get really good at inlining and constant-folding and eliminating dead code, so people invent techniques like expression templates (where a C++ expression doesn't compute a value, it essentially computes a type that represents the AST of the expression, which can be manipulated at compile-time before eventually turning into hundreds of function calls that produce a single line of code), then they build a linear algebra library like Eigen using that technique, then applications start using the library, then compiler developers are motivated to improve autovectorization because there's all these applications doing linear algebra, etc.

At the end of that process, you can't just turn off one of the old compiler optimisations and expect to get meaningful results; too much code implicitly depends on it. And at the start of the process, you couldn't have predicted exactly what that optimisation would lead to; all you could predict is that if you had waited for quantifiable evidence of a major benefit then you'd never have made any progress.

(This argument mostly applies to C++, not C, but I think nobody cares enough about C to develop a serious compiler for it - you'll just get a C++ compiler with a cut-down parser, so you'll get the costs of these fancy optimisations without much of the benefit. That's the downside of sticking with a niche language like C.)

DeVault: Announcing the Hare programming language

Posted May 4, 2022 17:44 UTC (Wed) by Vipketsh (guest, #134480) [Link] (1 responses)

> that there's a massive range of code bases, and no benchmark suite is representative of them all,

That's exactly the kind of argument I was talking about in my last sentence that does not help these discussions. Somewhere along the way, someone put in a ton of work to write an optimisation pass to, I hope, produce more optimal output. Since everything is about optimising output, again, I would hope that there were at least *some* benchmarks published along with the new optimisation to show that maintaining the optimisation pass in the future is a good idea. Therefore when these discussions come up it should be pretty simple: "look, when this new NULL-check-deleting pass was added, it brought X% to the table on this benchmark". At that point we would have a basis for discussion: maybe the code base in question isn't so important any more, maybe some newer passes make the gains less relevant, or maybe we just decide that the gains are not a good trade-off. With random hand-waving and fear-mongering there is no way a meaningful discussion can be had.

> At the end of that process, you can't just turn off one of the old compiler optimisations and expect to get meaningful results;

On the flip side, optimisation passes can turn out to be meaningless because newer passes no longer create the sequences they match. It's also not as though performance regressions are unheard of in compiler land. If we could have a discussion with numbers, we could very well come to some tentative conclusion and disable the pass by default to see what falls out (let your users do the testing on a "massive range of code bases").

DeVault: Announcing the Hare programming language

Posted May 6, 2022 0:28 UTC (Fri) by khim (subscriber, #9252) [Link]

> Since everything is about optimising output, again, I would hope that there were at least *some* benchmarks published along with the new optimisation to show that maintaining the optimisation pass for the future is a good idea.

True, and you can find such benchmarks in the bugzilla (or github for clang). But nobody bothers to measure the impact of optimizations based on different UBs, because the assumption is that code doesn't have any.

In the end you have hundreds of passes and absolutely zero knowledge about which of them are applicable in which cases (except for a few niche UBs which are simple enough to deserve a dedicated flag).

> On the flip side optimisation passes can turn out to be meaningless because some other new passes don't create the sequences any more for it to be meaningful.

Sure. Compiler writers keep track of these things. What they don't keep is a mapping between UBs and optimizations (again: with the exception of explicitly created flags like -fstrict-aliasing or -fwrapv).

You can measure the effect of different optimization passes, but you have absolutely no idea which of them are safe or unsafe to use when you want to turn some UB into defined behavior.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 16:09 UTC (Wed) by atnot (guest, #124910) [Link] (12 responses)

> there are rules in the C standard for when the compiler has to assume things may have been indirectly modified through random pointers: the aliasing rules

Indeed. But those rely on the fact that just creating pointers to arbitrary memory is invalid. If dereferencing arbitrary addresses is valid, all bets are off.

> These rules are generally so loose that many people make them much more strict with -fno-strict-alias

It does the opposite, it makes them weaker, but only a bit. But that's kind of beside the point, which is that it is basically impossible to interface with memory at all without some kind of aliasing rules.

> People arguing to remove some undefined behaviour tend to give examples of what that undefined behaviour makes a big pain or impossible, but there are few concrete arguments from the other side about what removing the undefined behaviour in question would lose.

Well, it's kind of impossible to know. There's not a single flag or pass you could turn off to e.g. reliably leave in null checks. The compiler might or might not have a specific code path for eliminating null pointers, but removing that doesn't mean those null dereferences won't be removed by other passes operating on similar assumptions. Or that something else critical won't be removed next time.

The thing is, even if it is phrased that way, the complaint is rarely actually "I would like this specific thing to be defined", it is "I would like the C abstract machine to behave exactly as simply as I think it does". But in a language as unconstrained as C, that's not really possible, nor would it really be a desirable slope to ride.

At the end of the day, that's what this is really about to me. I'm not personally elated when the compiler optimizes out my checks either. I'm not sitting here refreshing the gcc homepage, eagerly anticipating new optimization passes to break my code. But I recognize that these are the consequence of a language that desires to be both fast and accept programs that do arbitrary memory manipulations. And to me it's very clear that if we want to write programs that behave as we think they do, ones where our mental model and the compiler's model are one and the same, we have no choice but to give up one or the other. Just defining a few things won't be enough to make the problems go away.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 16:40 UTC (Wed) by excors (subscriber, #95769) [Link] (11 responses)

> There's not a single flag or pass you could turn off to e.g. reliably leave in null checks. The compiler might or might not have a specific code path for eliminating null pointers, but removing that doesn't mean those null dereferences won't be removed by other passes operating on similar assumptions.

There is -fno-delete-null-pointer-checks, which may not be reliable enough for security purposes but can easily pessimize code: https://godbolt.org/z/PGje44zna is autovectorized unless you enable that flag or remove the "*sum = 0;" line (which tells the compiler it can ignore the subsequent NULL checks).

(And for completeness a similar example with -fwrapv: https://godbolt.org/z/exzs7ocaj is only autovectorized when it can assume the loop does not overflow to negative values.)
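
The shape of the first example is roughly this (my paraphrase of the pattern under discussion, not the godbolt code): the initial store lets the compiler infer that the pointer is non-null, so the in-loop check is provably dead, unless -fno-delete-null-pointer-checks tells it an object might live at address zero.

```c
/* Because "*sum = 0;" would be UB if sum were NULL, the compiler may
 * assume sum != NULL afterwards and delete the in-loop check, which
 * in turn makes the loop body simple enough to vectorize. */
int checked_total(int *sum, const int *in, int n) {
    *sum = 0;            /* dereference implies sum is non-null */
    for (int i = 0; i < n; i++) {
        if (!sum)        /* dead code under that assumption */
            return -1;
        *sum += in[i];
    }
    return 0;
}
```

Deleting the check is a pure win here only because the abstract machine says the NULL case cannot occur; defining it would force the branch (and the reload of sum) to stay.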

DeVault: Announcing the Hare programming language

Posted May 4, 2022 17:45 UTC (Wed) by Vipketsh (guest, #134480) [Link]

> remove the "*sum = 0;" line (which tells the compiler it can ignore the subsequent NULL checks).

I don't think that example demonstrates a case for "dereferencing NULL is undefined behaviour". The compiled code has the "if (!sum)" hoisted out of the loop, and once you do that optimisation the loop is no different than if the check were completely removed. Seems to me the reason for the failed vectorisation is more an internal compiler issue, possibly because of the ordering of passes, and not because "dereferencing NULL being undefined behaviour" is vital to the vectorisation.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 2:54 UTC (Thu) by foom (subscriber, #14868) [Link] (9 responses)

> There is -fno-delete-null-pointer-checks

Yes, this flag has a remarkably poor name. In fact, the flag doesn't "turn off deleting null pointer checks" (whatever that might mean). Rather, the underlying behavior (at least as implemented in Clang -- I believe the same is true for GCC) is entirely principled: it informs the compiler that the null pointer might actually refer to valid memory that a program can successfully (potentially even intentionally!) access as an object.

A _consequence_ is that "*foo = 0;" doesn't imply "foo != nullptr", as it otherwise does (so it does have the effect of "not deleting" THAT null pointer check).

DeVault: Announcing the Hare programming language

Posted May 5, 2022 17:28 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (8 responses)

> A _consequence_ is that "*foo = 0;" doesn't imply "foo != nullptr", as it otherwise does (so it does have the effect of "not deleting" THAT null pointer check).

A further consequence of enabling this flag is that you are no longer programming in ISO C:

> An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

… or C++:

> A null pointer constant is an integer literal (5.13.2) with value zero or a prvalue of type std::nullptr_t. A null pointer constant can be converted to a pointer type; the result is the null pointer value of that type (6.8.2) and is distinguishable from every other value of object pointer or function pointer type.

… since a null pointer can no longer be distinguished from a pointer to an object or function.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 18:10 UTC (Thu) by farnz (subscriber, #17727) [Link] (5 responses)

I'm not sure I follow your reasoning, and I'd appreciate you expanding on it.

Take the following C++ code:


bool bad_code(bool deref_null) {
    int *foo;
    int real_val;
    int *bar = deref_null ? nullptr : ℜ_val;
    *bar = 0;
    return bar == nullptr;
}

I don't see how the snippets you've quoted make it impossible for this function's return value to differ from its deref_null parameter. The null pointer remains a unique value; the behaviour of *bar = 0 is undefined, but importantly, if I remove that line, the function behaves the same in both ISO C++ and C++ with -fno-delete-null-pointer-checks - the distinction is that in ISO C++, this function can be optimized to the equivalent of:


bool bad_code(bool) { return false; }

while with -fno-delete-null-pointer-checks, it can only be optimized to:

bool bad_code(bool deref_null) { return deref_null; }

Although, in both cases, it's perfectly reasonable to elide or not elide the write to pointer value 0, since that write is UB.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 19:12 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (4 responses)

> The null pointer remains a unique value; …

Unique, yes—there is only one null pointer value—but not distinct from any pointer to an object or function. With the -fno-delete-null-pointer-checks flag enabled you can have a pointer to a valid object which compares equal to a null pointer.

> I don't see how the snippets you've quoted make it impossible for this function's return value to differ from its deref_null parameter.

(I am assuming that "ℜ_val" in your example was supposed to be "&real_val". I'm not sure of the purpose of the unused pointer variable "foo".)

According to ISO C++, with the "*bar = 0" line deleted, the return value must be equal to "deref_null". The "bar" pointer can only be "nullptr" when deref_null is true or "&real_val" when deref_null is false, and "&real_val", as a pointer to an object, can never compare equal to "nullptr". With the "*bar = 0" line it's UB when deref_null is true and so could be optimized to just "return false", as you said.

However, with -fno-delete-null-pointer-checks enabled, we do not have the guarantees of ISO C++ and "nullptr" could in theory compare equal to a pointer to an object, e.g. if pointers are represented as byte addresses, "nullptr" is represented as byte address zero, and the object (in this case "real_val") happens to be placed at byte address zero. If this happened then "bar == nullptr" would be true even if deref_null is false, so the function cannot be optimized to just "return deref_null".

DeVault: Announcing the Hare programming language

Posted May 6, 2022 13:38 UTC (Fri) by farnz (subscriber, #17727) [Link] (3 responses)

Sorry about the bad code formatting - I have no idea how copying and pasting from Emacs did that.

I don't see how you get "not distinct from any pointer to an object or function" from the description of the -fno-delete-null-pointer-checks flag. As I read the documentation, -fno-delete-null-pointer-checks does not permit you to have a pointer to a valid object that compares equal to a null pointer; instead it says that the act of dereferencing a pointer implies nothing about its value. Without the flag, dereferencing a pointer implies the pointer value must not be a null pointer, since if it was a null pointer, the dereference would result in UB (since a null pointer cannot point to a valid object). With the flag, however, while the dereference itself is still UB (since a null pointer cannot point to a valid object), the compiler acts as-if each dereference of a nullptr was immediately followed by an assignment of an unknown value to the pointer.

Because the value is unknown, it could still be a null pointer, but it could also be a new pointer to a valid object - the compiler's analysis passes simply don't know at this point, and thus it cannot rely on the dereference to permit it to remove a nullptr check, since it does not know what the pointer's value is.

DeVault: Announcing the Hare programming language

Posted May 6, 2022 14:55 UTC (Fri) by nybble41 (subscriber, #55106) [Link] (1 responses)

> I don't see how you get "not distinct from any pointer to an object or function" from the description of the -fno-delete-null-pointer-checks flag.

The default is -fdelete-null-pointer-checks, which has the description: "Assume that programs cannot safely dereference null pointers, and that no code or data element resides at address zero."[0] The -fno-delete-null-pointer-checks flag affects *both* of these assumptions, meaning that the compiler cannot assume "that no code or data element resides at address zero" (i.e. that no pointer to an object has the same representation as a null pointer).

As stated in the documentation the intended use of the -fno-delete-null-pointer-checks flag is platforms such as AVR where objects *can* be placed at address zero, which implies that &variable can be indistinguishable from a null pointer. Though this is more likely to be true for a global or static object than for a stack variable.

[0] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#...

DeVault: Announcing the Hare programming language

Posted May 6, 2022 16:52 UTC (Fri) by farnz (subscriber, #17727) [Link]

Thanks for clearing up my misunderstanding - for some reason I was mentally skipping the second assumption (since on my platforms of choice, there cannot be a code or data element at address 0) and focusing only on the first assumption (that programs cannot safely dereference null pointers), which is the one that allows a compiler to deduce that if you dereference a pointer, it cannot be a null pointer.

DeVault: Announcing the Hare programming language

Posted May 6, 2022 17:26 UTC (Fri) by mpr22 (subscriber, #60784) [Link]

> Sorry about the bad code formatting - I have no idea how copying and pasting from Emacs did that.

Ampersand is the HTML/XML metacharacter for starting an entity, and although the standard says that entity references should include a final semicolon, HTML-handling software is more tolerant of a missing semicolon than is entirely ideal.

So it appears that the sequence &real got overgenerously interpreted as an HTML entity for Unicode codepoint U+211C BLACK-LETTER CAPITAL R, from the Letterlike Symbols block, which has the alias name "Real part".

DeVault: Announcing the Hare programming language

Posted May 7, 2022 6:03 UTC (Sat) by Vipketsh (guest, #134480) [Link] (1 responses)

I have to ask, what were you trying to add to the discussion ?

Sure, you are absolutely correct, but what relevance does it have? If the *compiler* has to assume that NULL points to a valid object, what programs would break? What other fallout would there be? The only thing I can think of is that, when lawyering about "if (my_pointer == NULL)", you would have to say "Does my_pointer point to the object at address NULL?" instead of "Is my_pointer pointing to an invalid object?".

I think most interpretations of the standard, in the context of undefined behaviour, are simply done in bad faith. My opinion is that the reason that language is in there, and has to be there, is so that malloc(), or anything else that works with a pointer, can return or check for an error. And the reason dereferencing a NULL pointer is undefined is that there is no telling how a platform behaves when you do so. See how none of this has anything to do with the compiler?

It would do so much good for these discussions if the standard and what it says were put aside. Talk about how one or another change would affect real existing programs and/or platforms. Talk about possible fallout. Talk about potential issues. Talk about benefits. Because "oh, how terrible, now some standard does not match up if I squint at it this way" is completely meaningless and adds nothing. Standards, in general, should be looked upon as nothing more than an aid to achieving interoperability (they can and do contain falsehoods). We all know that standards are violated all the time, and to make things work one needs domain-specific experience. Lastly, if you are writing a standard, your goal should be to document the status quo, most definitely not to attempt to change the world.

DeVault: Announcing the Hare programming language

Posted May 8, 2022 12:48 UTC (Sun) by tialaramex (subscriber, #21167) [Link]

> My opinion is that the reason that language is in there, and has to be there, is so that malloc(), or anything else that works with a pointer, can return or check for an error. And the reason dereferencing a NULL pointer is undefined is because there is no telling how a platform behaves when you do so. See how none of this has anything to do with the compiler ?

You have muddled the NULL pointer (an abstract idea) with the all-zeroes address on a typical CPU; these are intentionally not the same thing.

While it's obviously a bad idea, C has no trouble using actual values from a type as sentinels: atoi("junk") and atoi("0") are both zero. So it wouldn't have been a problem to define that malloc() returning zero can be either an error or an actual zero address. And because C runs on the abstract machine, not some actual platform with whatever weird behaviour, the question of what happens if we try to do platform-illegal operations never comes up.
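
The atoi comparison is easy to check (a trivial sketch with a made-up helper name):

```c
#include <stdlib.h>

/* atoi() reuses an in-band value (0) as its error indicator, so a
 * parse failure is indistinguishable from a genuine zero -- the same
 * kind of compromise that defining malloc()'s zero return as "error
 * or a real zero address" would have been. */
int looks_like_zero(const char *s) {
    return atoi(s) == 0;
}
```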

Most platforms are likely either not to be fazed at all by the all-zeroes address, or to be equally concerned with some other address values, including values beyond some logical "end of memory", ROMs, and memory-mapped peripherals. We can observe that the C language does not define special behaviour for any of these, only NULL, which means something in the abstract machine, and *that* is why it's used as a sentinel value.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 8:50 UTC (Wed) by kleptog (subscriber, #1183) [Link] (1 responses)

> And yes, this is very much the point. Either you insist that the C abstract machine map exactly to the primitives of the platform it’s implemented on, even in cases that are undefined on the abstract machine itself, or you don’t.

Isn't that the conflict, though? On the one hand you have claims that C is a better assembler and good for writing low-level software (like the Linux kernel). On the other hand, C works with an abstract machine, and if you go outside it you get undefined behaviour.

When writing something like the Linux kernel you have to do things that go outside the C abstract machine and so you end up fighting the C compiler the whole way. It assumes you have a functional abstract machine, yet that is what the kernel is trying to create.

The conclusion would seem to be: C is good for writing low-level software, except for the low-level parts of kernels.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 9:15 UTC (Wed) by Wol (subscriber, #4433) [Link]

Which is why languages like Rust, and in earlier times Modula-2, provided ways to step outside the language invariants, with the caveat "here be dragons". Languages which try to force good programming practice are impractical in practice.

And more and more I get the impression that modern C is trying to enforce good programming practice on an ancient code base (and failing miserably ...)

Cheers,
Wol

DeVault: Announcing the Hare programming language

Posted May 4, 2022 9:22 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (1 responses)

> There is a way: you avoid the cases that trigger UB, and rely only on what the abstract machine guarantees. I can agree this is not always easy, but there are tools to help you with that. As long as the abstract machine is implemented correctly and its invariants are upheld, the program will work on any target it is compiled for.

That doesn't work at all in practice due to portability. Look at syscalls: some used to take an int a long time ago, which was replaced with a socklen_t or a size_t or ssize_t over time. Integer promotion in C is a disaster. You basically cannot use any integer in a portable way without writing one or two consecutive casts, and without fearing that it might be incorrectly mapped. And it gets worse when the input data you have was also defined as one of these types.
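A small illustration of the promotion trap (the function names below are mine, purely for illustration): the "usual arithmetic conversions" silently turn a signed operand into an unsigned one, so the obvious comparison does the wrong thing.

```c
#include <stdio.h>

/* The usual arithmetic conversions turn i into an unsigned int,
 * so -1 becomes UINT_MAX and this "obvious" comparison is false. */
int less_naive(int i, unsigned int u) {
    return i < u;
}

/* Widening both sides to a signed type wide enough for either value
 * gives the comparison the author probably meant. */
int less_careful(int i, unsigned int u) {
    return (long long)i < (long long)u;
}
```

Calling less_naive(-1, 1) returns 0, while less_careful(-1, 1) returns 1 -- which is rarely what whoever wrote a plain `i < u` expected.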

Casts are a big cause of bugs and they're made more and more common by all the crappy abstraction types everywhere. Try to pass a time_t over the network. Hmmm, does it need to be signed or unsigned? 32 or 64 bits? In doubt you might want to pass it as signed 64 bits. But then how do you reliably decode it on the other side? What if you picked the wrong type on the encoding side: don't you risk getting it decoded wrong for special values like -1, which could mean "forever" or "event not happened" for some syscalls?
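One way to sidestep those questions is to fix the wire format explicitly and convert with care. A sketch (the helper names are mine; it assumes time_t fits in 64 bits and the usual two's-complement conversion behaviour, which holds on every mainstream platform):

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: send a time_t as a signed 64-bit big-endian value. */
void encode_time(time_t t, unsigned char out[8]) {
    /* Shift on an unsigned type: right-shifting a negative signed value
     * is implementation-defined, which is exactly the trap discussed. */
    uint64_t v = (uint64_t)(int64_t)t;
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

/* Hypothetical helper: reassemble the value on the receiving side. */
time_t decode_time(const unsigned char in[8]) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[i];
    /* Converting a uint64_t above INT64_MAX back to int64_t is
     * implementation-defined in C; it does what you expect on
     * two's-complement machines, i.e. everywhere that matters. */
    return (time_t)(int64_t)v;
}
```

Round-tripping -1 is the interesting case: it comes back as (time_t)-1 rather than some huge positive value, so the "forever" / "event not happened" sentinels survive the trip.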

> If you don’t, you forfeit any right to complain that compilers ‘abuse’ UB: if it’s undefined, it’s undefined, and it doesn’t even have to act deterministically

As someone said above, these used to be undefined in the sense that they were merely hardware-dependent. Now it's a free pass for the compiler to say "awesome, this developer fell into my trap, so I can over-optimize that code and show my rival how much faster my code is without all these useless checks". In addition, let me remind you that the C spec isn't open, you have to pay for it, and you discover the undefined behaviors very late in your life as a developer. Sure, some drafts are now accessible, which you may consider almost identical to the official spec. But this alone is a big problem.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 11:39 UTC (Wed) by farnz (subscriber, #17727) [Link]

C has, since at least ANSI C89, had both undefined behaviour and implementation defined behaviour. Implementation defined behaviour is the stuff that's hardware-defined - for implementation defined behaviour, the compiler must tell you how it implements it (but can say things like "arithmetic overflow for 32 bit integers is defined by the behaviour of ADD EAX, EBX on your CPU, while for doubles, it's defined by the behaviour of FADD ST0, STi with truncation to 64 bit only happening if the compiler chooses to store the value to memory"), while for undefined behaviour, the compiler can do anything it likes.
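To make the distinction concrete (the function names are mine, for illustration): implementation-defined behaviour has a documented, predictable result on each platform, while undefined behaviour is something the compiler may assume never happens.

```c
#include <limits.h>

/* Implementation-defined: the compiler must document the result of
 * right-shifting a negative value. On every mainstream target it is
 * an arithmetic shift, so this returns -4. */
int shift_negative(void) {
    int a = -8;
    return a >> 1;
}

/* Undefined (when x == INT_MAX): the compiler may assume the signed
 * overflow never happens, so it need not behave like the CPU's ADD
 * instruction at all in that case. */
int add_one(int x) {
    return x + 1;
}
```

For in-range arguments add_one behaves exactly as expected; the difference only shows at the boundary, which is precisely why it surprises people.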

The underlying "gotcha" with C for us older programmers is that optimizing compilers weren't very good until the late 1980s; before then, it was reasonable to model compilers as translating what I wrote 1:1 to a lower level language, then peephole optimizing that language, then repeating the process until the lower level language is machine code.

That's not how modern compilers work, however. They do much more sophisticated analyses to drive optimization, and can thus easily detect many more opportunities to optimize, but those analyses come up with results that are surprising if you're thinking in terms of peephole optimization.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 13:56 UTC (Wed) by Vipketsh (guest, #134480) [Link] (40 responses)

Why are we all talking about undefined behaviour as if it were some law of nature set in stone? It is *not* set in stone and there is no (technical) reason it can't be changed. There is also no reason why compiler authors couldn't say "this makes no sense, we'll define it like this" (e.g. -fwrapv) and thereby force the standards committee's hand. In other words, I cannot accept arguments which end with "that's undefined behaviour, end of story" -- explain why it has to be undefined and/or what the "undefined" nature of the behaviour gets us (e.g. X% faster code, portability to X architecture, etc.) and then we can have a discussion about what is the better trade-off: the benefits of the undefined behaviour or the benefits of not having it.
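The -fwrapv case can be made concrete (function names mine): the classic wrap-around overflow check is exactly the kind of code whose meaning changes with that flag.

```c
#include <limits.h>

/* Reliable only when signed overflow is defined to wrap (-fwrapv):
 * without that flag the compiler may assume x + 1 cannot overflow
 * and fold this function to a constant 0. */
int about_to_overflow_naive(int x) {
    return x + 1 < x;
}

/* UB-free alternative that works under either regime: compare
 * against the limit instead of performing the overflowing addition. */
int about_to_overflow(int x) {
    return x == INT_MAX;
}
```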

It would also be great if there were words to differentiate between undefined behaviour that can not be avoided (e.g. use-after-free) and those which we talk about only because compiler authors decided to explicitly add transformations based on said allowances (e.g. deleting NULL pointer checks due to NULL dereference being undefined). Lastly, I think we would be in a much better situation if 'undefined behaviour' would be read as 'allowed by the underlying machine, forbidden for the compiler'.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 14:25 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (1 responses)

> (e.g. deleting NULL pointer checks due to NULL dereference being undefined)

Everyone keeps repeating this, but consider a set of inline-able utility functions that all safeguard against `NULL` internally (because that's the sensible thing to do). Do you not want to let the compiler optimize out those checks when inlining their bodies into a caller that itself has a NULL-then-bail check on that pointer (and a dereference is implicitly such a check as well)? If you want to remove this optimization, what kinds of optimizations are you willing to leave on the floor?

FWIW, once you get to LTO, the language it was written in may be long gone; would you really see something like (forgive my lack of asm familiarity):

mov eax, [ecx] ; int x = *ptr;

test ecx, ecx ; if (!ptr) return;
jnz pc+2
ret

and not remove that `test/jnz/ret` sequence?

DeVault: Announcing the Hare programming language

Posted May 4, 2022 14:48 UTC (Wed) by Vipketsh (guest, #134480) [Link]

> inline-able utility functions that all safeguard against `NULL` internally [...] Do you not want to let the compiler optimize out those checks when inlining the bodies into a caller that, itself, has a NULL-then-bail check on that pointer

It is a very reasonable optimisation, but you don't need "dereferencing NULL is undefined behaviour" to make it! Proving that optimisation valid can be done with range analysis alone: since you have the first check for NULL, you know that the pointer is not NULL afterwards, and thus any following checks against NULL cannot evaluate to true.
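A C sketch of that situation (function names mine): after inlining, the inner guard is provably dead from the caller's own check, with no appeal to UB needed.

```c
#include <stddef.h>

/* Hypothetical utility that defensively guards its argument. */
static int safe_load(const int *p) {
    if (p == NULL)          /* internal guard */
        return 0;
    return *p;
}

int caller(const int *p) {
    if (p == NULL)          /* the caller already bails on NULL */
        return -1;
    /* After inlining, range analysis alone proves p != NULL here,
     * so the duplicated guard can be dropped without ever appealing
     * to "dereferencing NULL is undefined behaviour". */
    return safe_load(p);
}
```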

> what kinds of optimizations are you willing to leave on the floor?

If someone actually quantified how much one or another optimisation gets us, we could have a discussion. I would guess that in many cases it is something I could live with, but if people who do compiler benchmarks told me "well, that can lose you up to 10%" (or similar) I may very well concede that the undefined behaviour is worth keeping.

> FWIW, once you get to LTO, the language it was written in may be long gone

Yes, and no. LTO operates on the compiler's middle-end representation -- the same one on which almost all transformations occur. So, while the original language is indeed lost, all the information from it needed to decide whether one or another transformation is valid is still present.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 15:55 UTC (Wed) by khim (subscriber, #9252) [Link] (37 responses)

> They are *not* set in stone and there is no (technical) reason they can't be changed.

It can be done and it was done. Here is an example with undefined behavior, here is another with unspecified behavior.

> Why are we all talking about undefined behaviour as if this stuff would be some laws-of-nature that is set in stone ?

They are not set in stone, yet they are the rules of the game. But most discussions about undefined behavior go like “why do you say I couldn't hold the basketball and run — I tried that and it works!”.

Yes, it may work if the referee looks the other way. But it's against the rules. If you want to make it rules-compliant then you have to change the rules first!

> I can not accept arguments which end with "that's undefined behaviour, end of story"

It is the end of story for a compiler writer or a C developer. Just like “it's against the rules” is “the end of story” for the basketball player.

> explain why it has to be undefined and/or what does the "undefined" nature of the behaviour get us (e.g. X% faster code, portability to X architecture, etc.) and then we can have a discussion about what is a better trade off: the benefits of the undefined behaviour or the benefits of not having it.

Sure. Raise that issue with the ISO/IEC JTC1/SC22/WG21 committee. If they agree, the rules will be changed. Or you can try to see what it takes to change the compiler and report back.

But rules are rules, they are “the status quo”. Adherence to rules doesn't need any justifications. But changes to the rules need a justification, sure.

> Lastly, I think we would be in a much better situation if 'undefined behaviour' would be read as 'allowed by the underlying machine, forbidden for the compiler'.

Such thing already exist in the standard. It's called “implementation-defined behavior”. Undefined behavior is called undefined because it's, well… undefined.

P.S. There are cases where modern compilers break perfectly valid programs which don't, actually, trigger UB. That can only be called an act of sabotage, but those are different stories from what we are discussing here.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 18:03 UTC (Wed) by Vipketsh (guest, #134480) [Link] (36 responses)

You know full well that no individual can engage a committee and have any meaningful hope of ever bringing about change, and you also know full well that committees are more about preserving the entrenchment of their members than real technical progress. Thus any reference to them comes across as a simple means of excluding individuals without discussion.

It's also not like compiler writers have never ignored standards when it suited them. I think it was yourself who mentioned in some thread from a while ago that LLVM is doing all sorts of optimisations based on pointer provenance when there is exactly no mention of them in any standard. So, no, I can't just simply accept "it's against the rules" to end a discussion.

> “why do you say I couldn't hold basketball and run — I tried that and it works!”

To take your analogy further, maybe basketball would be better that way. If I think so, I can gather up a bunch of people, go to some court, and try it for a while; if it works, maybe the basketball rules committee will take an interest and change the rules, or maybe it's the birth of a new sport. So, yes, turning certain undefined behaviours into defined ones in a compiler is exactly the place to have this discussion, and the rule changes can very well happen later.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 18:28 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (4 responses)

There are certainly individuals at C++'s standards committee. I don't know that much about C since I haven't been there though. I don't think that "preserving the entrenchment" is at all how I'd describe it. There are certainly different pressures on things, but there is *not* a unified one anywhere other than "improve the language overall" (and not everyone agrees about every step taken). I've seen long-standing members get told "no, that idea doesn't work" about as much as I've seen new members get "oh, yeah, that's neat" (in rough proportion to their prevalence at least). Implementors have their hobby horses, library designers have theirs, and users are there too to express their views on things. "Unified" is not how I would describe it in relation to any single feature-in-progress.

FWIW, I go because I work on CMake and would like to ensure that modules are buildable (as would my employer). But I go to other presentations that I am personally interested in (when not a schedule conflict with what I'm there to do) and participate.

With the pandemic making face-to-face less tenable, I expect it to be easier than ever for folks to attend committee meetings.

> I think it was yourself who mentioned in some thread from a while ago that LLVM is doing all sorts of optimisations based on pointer provenance when there is exactly no mention of them in any standard.

That was me I believe. My understanding is that without provenance, pointers are extremely hard to actually reason about and make any kind of sensible optimizations around (this includes aliasing, thread safety, etc.). It was something that was encountered when implementing the language that turned out to be underlying a lot of things but was never spelled out. There is work to figure out what these rules actually are and how to put them into standard-ese ("pointer zap" is the search term to use for the paper titles).

DeVault: Announcing the Hare programming language

Posted May 4, 2022 18:51 UTC (Wed) by Vipketsh (guest, #134480) [Link]

If the C/C++ committees are indeed that nice, I have at least some hope of sensible changes -- thanks for sharing your experience.

(For reference, long ago I engaged with the Unicode people as part of a research group, and that experience was one of dealing with megalomaniac cesspools, lobby groups, legal threats and the like -- truly awful)

DeVault: Announcing the Hare programming language

Posted May 4, 2022 19:06 UTC (Wed) by khim (subscriber, #9252) [Link] (2 responses)

> That was me I believe.

That was different discussion.

> There is work to figure out what these rules actually are and how to put them into standard-ese ("pointer zap" is the search term to use for the paper titles).

This would have been great if, while these rules were not finalized, compilers produced working programs by default.

Instead clang not only breaks programs by default, it even refuses to provide a -fno-provenance switch which could be used to stop miscompiling them!

And it's not as if the rules were actually designed, presented to the C/C++ committee (as happened with DR 236), then rejected because of typical committee politics. At least then you could say “yes, changes to the standard are not accepted yet, but you can read about our rules here”.

Instead, for two decades compilers silently miscompiled valid programs, and their developers offered no rules which software developers could follow to write programs that wouldn't be miscompiled!

That's an act of deliberate sabotage. And, worst of all, it has only gained prominence because of Rust: since Rust developers actually care about program correctness (and because LLVM miscompiled programs which they believed to be correct), they tried to find the actual list of rules that govern provenance in C/C++… and found nothing.

Yes, this provenance fiasco is an awful blow against C/C++ compiler developers. You cannot say “the standard is a treaty, you have to follow it” and then turn around and add “oh, but it's too hard for me to follow it, thus I will violate it whenever it suits me”. That's not a treaty anymore, it's a joke.

But even then: the way forward is not to ignore the rules, but an attempt to patch them. Ideally new compilers should be written which would actually obey some rules, but that's too big of an endeavor, I don't think it'll happen.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 19:36 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (1 responses)

> Ideally new compilers should be written which would actually obey some rules, but that's too big of an endeavor, I don't think it'll happen.

There may be hope. Some docs: https://lightningcreations.github.io/lccc/xlang/pointer (though no mention of provenance yet(?)).

https://github.com/LightningCreations/lccc

DeVault: Announcing the Hare programming language

Posted May 5, 2022 9:59 UTC (Thu) by khim (subscriber, #9252) [Link]

I mean something like what was done for safe Rust: a formal mathematical model which can prove that the transformations done during optimizations are sound.

Things like this should be impossible: https://www.ralfj.de/blog/2020/12/14/provenance.html

Today optimizations in compilers are a dark art: not only do they sometimes produce suboptimal code (that's probably something we will never be able to fix), we discover, from time to time, that they are simply invalid (as in: they turn perfectly valid programs into invalid ones).

Ideally we should ensure that all optimizations leave valid programs valid (even if, perhaps, not optimal).

DeVault: Announcing the Hare programming language

Posted May 4, 2022 18:33 UTC (Wed) by khim (subscriber, #9252) [Link] (30 responses)

> I think it was yourself who mentioned in some thread from a while ago that LLVM is doing all sorts of optimisations based on pointer provenance when there is exactly no mention of them in any standard.

Indeed. That was a great violation of the rules, and that's what finally pushed me to start using Rust. I had always been sitting on the fence about it, but when it turned out that with C/C++ I have to watch not just for the UBs actually listed in the standard, but also for other, vague and uncertain requirements which were supposed to have been added to it more than a decade ago… at this point it became apparent that C/C++ are not suitable for any purpose.

The number of UBs is just too vast, they are not enumerated (C tried but, as the provenance issue shows, failed to list them all; C++ hasn't tried at all) and, more importantly, there is no automated way to separate code where UB may happen (and which I'm supposed to rewrite when a new version of the compiler is released) from code where UB is guaranteed to be absent (and which I can explicitly trust).

But that was just the final straw. Even without this gross violation of the rules it was becoming more and more apparent that UBs have to go: it's not clear whether a low-level language without UB is possible in principle, but even if you eliminate it from 99% of code using automated tools, that would still be a great improvement.

> It's also not like compiler writers have never ignored standards when it suited them.

Very rarely. Except for that crazy provenance business (which is justified by DR260, and the main issue lies with the fact that compilers started miscompiling programs without appropriate prior changes to the standard), I can only recall DR236 (where the committee acted as a committee and refused to offer any sane way to deal with the issue, merely noting that the requirement of global knowledge is problematic).

And in cases where it was shown that UB requirements are just too onerous to bear, they changed the standard in favor of C++ developers, so it was definitely not a one-way street.

> If I think so, I can gather up a bunch of people go to some court to try it for a while and, if it works, maybe the basketball rules committee will take an interest and change the rules or maybe its the birth of a new sport.

You are perfectly free to do that with the compilers, too. Most are gcc/clang based nowadays, thus you can just start by implementing your proposed changes, then measure things and promote your changes.

> So, yes, turning certain undefined behaviours into defined ones in a compiler is exactly the place to have this discussion and the rule changes can very well happen later.

Indeed, but it's your responsibility to change the compiler and show that it brings real benefits. Just like most basketball players: they wouldn't even consider trying to play basketball with some changed rules unless you can show that other prominent players are trying that, changed, variant.

Some changes may even become the new -fwrapv, who knows? But it's changes to the language that need a justification; the rules are the rules by default.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 19:14 UTC (Wed) by Wol (subscriber, #4433) [Link] (1 responses)

> But that was just the final straw. Even without this gross violation of rules it was becoming more and more apparent that UBs have to go: it's not clear if low-level language without UB is possible in principle, but even if you eliminate them from 99% of code using an automated tools that would still be a great improvement.

It's easy to eliminate UB from any language. A computer language is Maths, and as such you can build a complete and pretty much perfect MODEL.

When you run your program, it's Science (or technology), and you can't guarantee that your model and reality coincide, but any AND ALL UB should be "we have no control over the consequences of these actions, because we are relying on an outside agent". Be it a hardware random number generator, a network card receiving stuff over the network, a buggy CPU chip, etc etc. The language should be quite clear - "here we have control therefore the following is *defined* as happening, there we have no control therefore what happens is whatever that other system defines". If that other system doesn't bother to define it, it's not UB in your language.

And that's why Rust and Modula-2 and that all have unsafe blocks - it says "here be dragons, we cannot guarantee language invariants".

Cheers,
Wol

DeVault: Announcing the Hare programming language

Posted May 4, 2022 19:32 UTC (Wed) by khim (subscriber, #9252) [Link]

> And that's why Rust and Modula-2 and that all have unsafe blocks - it says "here be dragons, we cannot guarantee language invariants".

You just have to remember that not all unsafe blocks are marked with nice keywords. E.g. a presumably “safe” Rust program may open /proc/self/mem.

But even then: it's time to stop making languages which allow UB to happen just in any random piece of code which does no such crazy things!

> It's easy to eliminate UB from any language. A computer language is Maths, and as such you can build a complete and pretty much perfect MODEL.

Sometimes the model is just too restrictive. E.g. in Rust you have to go into the unsafe realm just to create a queue or a linked list.

But even that rigid and limited model allows one to remove UB from a surprising percentage of your code. And it's the only way to write complex code. The C/C++ approach doesn't scale. We don't have enough DJBs.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 0:33 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (27 responses)

As I understand it, WG21 (ie C++) does have a sub-committee attempting to enumerate the Undefined Behaviour.

Provenance is difficult, I would agree with you that C++ didn't do a great job here by trying to kick this ball into the long grass rather than wrestle with the difficult problem, but I think we should be serious for a moment and consider that if C++ 11 had tried to do what Aria is merely proposing (and of course hasn't anywhere close to consensus to actually do for real in stable yet) for Rust, it would have been stillborn.

The same exact people who are here talking about some hypothetical C dialect or new language where it does what they meant, (whatever the hell that is) would have said Provenance is an abomination, just let us do things the old way. Even though "the old way" doesn't have any coherent meaning which is why this came up as a Defect Report not as a future feature proposal.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 10:30 UTC (Thu) by khim (subscriber, #9252) [Link] (26 responses)

> Even though "the old way" doesn't have any coherent meaning which is why this came up as a Defect Report not as a future feature proposal.

It was abuse of the system, plain and simple.

One part of the standard was saying, back then, that only visible values matter and that if two pointers are identical they should behave identically.

That's the understanding of the vast majority of practical programmers, and this is what should have been kept as the default (even if it affected optimizations).

The other part of the standard was talking about pointer validity and contradicted the first: e.g. realloc in C99 (but, notably, not in C89) is permitted to return a “different” pointer which can be bitwise identical to the original one.
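To make that concrete, a minimal sketch (assuming a hosted C environment; the function name is mine): even if realloc hands back an address bitwise identical to the old one, C99 says the old pointer value is indeterminate, and only the returned pointer may be used.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

int realloc_demo(void) {
    char *p = malloc(16);
    if (!p)
        return 1;
    memcpy(p, "hello", 6);

    /* After this call p is indeterminate, even in the case where q
     * happens to be bitwise identical to the old value of p; from
     * here on, only q may be used. */
    char *q = realloc(p, 16);
    if (!q) {            /* on failure the old block is still valid... */
        free(p);         /* ...so it is p that must be freed here */
        return 1;
    }

    assert(strcmp(q, "hello") == 0);
    free(q);
    return 0;
}
```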

That's what saboteurs wanted to hear and that's what they used to sabotage C/C++ community.

> Provenance is difficult, I would agree with you that C++ didn't do a great job here by trying to kick this ball into the long grass rather than wrestle with the difficult problem

It's not a “difficult problem” at all. If the sabotage had failed, then the rules for when identical pointers are not considered identical would have become not a “hidden”, “unwritten” part of the standard, but a non-standard mode, an extension.

If it had been proven that they help produce much better code, then they would have been enabled explicitly in many projects. And people would have become aware.

Instead the language was silently, and without evidence, changed behind developers' backs. That's a really serious issue IMNSHO. A layman is not supposed to know all the fine points of the law, and especially not all the unwritten rules.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 12:47 UTC (Thu) by foom (subscriber, #14868) [Link] (25 responses)

> saboteurs
> sabotage

This is inflammatory and unhelpful language. That certainly doesn't actually describe anyone involved, as I suspect you're well aware.

I believe what you actually mean is that you disagree strongly with some of the decisions made, and consider them to have had harmful effects. Say that, not "saboteurs".

DeVault: Announcing the Hare programming language

Posted May 5, 2022 13:02 UTC (Thu) by wtarreau (subscriber, #51152) [Link] (2 responses)

I think it accurately reflects the sentiment of the vast majority of C programmers who nowadays are terrified to upgrade their toolchain on every system upgrade, and to discover new silent breakage that brings absolutely no value at all except frustration, waste of time and motivation, and costs.

The first rule should be not to break what has reliably worked for ages. *even if that was incorrect in the first place*. As Linus often explains, a bug may become a feature once everyone uses it; usage prevails over initial intent.

I'm pretty certain that most of the recent changes were driven exclusively by pride, to say "look how smart the compiler became after my changes", forgetting that their users would like it to be trustable rather than smart.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 13:52 UTC (Thu) by khim (subscriber, #9252) [Link] (1 responses)

> The first rule should be not to break what has reliably worked for ages. *even if that was incorrect in the first place*. As Linus often explains, a bug may become a feature once everyone uses it; usage prevails over initial intent.

That's one possibility, yes. But there's another possibility: follow the standard. Dūra lēx, sed lēx.

Under that assumption you just follow the law. What law says… goes. Even if what the law says is nonsense.

That's what C/C++ compiler developers promoted for years.

But if you pick that approach you cannot then turn around and say “oh, sorry, law is too harsh, I don't want to follow it”.

Either the law is the law, everyone has to follow it and it's enough to follow it, or it's not the law.

> I'm pretty certain that most of the recent changes were driven exclusively by pride, so say "look how smart the compiler became after my changes", forgetting that their users would like it to be trustable instead of being smart.

Provenance rules are not like that. They allow clang and gcc to eliminate calls to malloc and free in some cases. This can bring amazing speedups. And if these things were an opt-in option I would have applauded these efforts; it would have been a great way to promote them and, eventually, to add them to the standard.
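For a concrete case (the function name is mine): with provenance-based escape reasoning, clang (and often gcc) at -O2 can prove the allocation below is never observed elsewhere and compile the whole function straight down to `return 42`, with no allocator calls left.

```c
#include <stdlib.h>

/* The allocation never escapes this function, so the compiler may
 * elide the malloc/free pair entirely; the optimized code need not
 * even keep the allocation-failure path. */
int boxed_answer(void) {
    int *p = malloc(sizeof *p);
    if (!p)
        return 0;
    *p = 42;
    int v = *p;
    free(p);
    return v;
}
```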

Instead they were introduced in a submarine patent way, without any options, not even opt-out options. And they break standards-compliant programs.

That's an act of sabotage, sorry.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 15:26 UTC (Thu) by wtarreau (subscriber, #51152) [Link]

> That's one possibility, yes. But there's another possibility: follow the standard. Dūra lēx, sed lēx.
> Under that assumption you just follow the law. What law says… goes. Even if what the law says is nonsense.
> That what C/C++ compiler developers promoted for years.

I wouldn't deny that, but:
- the law is behind a paywall
- lots of modern abstractions in interfaces sadly make it almost impossible to follow. Using a lot of foo_t everywhere, without even telling you whether they're signed/unsigned or 32/64 bits, causes lots of trouble when you have to perform operations on them, forcing you to cast them and enter the nasty area of type promotion. That's even worse when you try hard to avoid an overflow on a type you don't know, and the compiler, knowing better than you, manages to get rid of your check.

We're really fighting *against* the compiler to keep our code safe these days. This tool was supposed to help us instead. And it failed.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 13:38 UTC (Thu) by khim (subscriber, #9252) [Link] (21 responses)

> That certainly doesn't actually describe anyone involved, as I suspect you're well aware.

This describes the majority of the people I have talked with. I was unable to find anyone who honestly claimed that carefully reading the standard is enough to write code which wouldn't violate the provenance rules.

On the contrary: people expressed regret or, in many cases, anger about the fact that standard doesn't enable usage of provenance rules by the compilers, yet none claimed that they follow from the existing standard.

If the standard is the treaty between compiler and programmer then it was deliberate breakage of the treaty. Worse: it placed the users of the compiler at a serious disadvantage. How are they supposed to follow the rules which don't exist? Especially if even people who are yet to present these rules agree that they are really complex and convoluted?

And when people, horrified, asked for a -fno-provenance option? They got the answer: although provenance is not defined by either standard, it is a real and valid emergent property that every compiler vendor ever agrees on.

Sorry, but if that is not an act of sabotage, then I don't know what is. And people who carry out sabotage are called saboteurs.

> I believe what you actually mean is that you disagree strongly with some of the decisions made, and consider them to have had harmful effects.

No. It's not about me agreeing with something or disagreeing with something.

> Say that, not "saboteurs".

Let me summarize what happened:

  1. Compiler developers have found out that certain parts of the standard allow language users to write certain questionable programs which are hard to optimize.
  2. After that was acknowledged, they didn't change the standard, yet decided to break these standard-compliant programs.
  3. They haven't mentioned that fact in the release notes and spent no effort to deliver that information to users in any way.
  4. They also refused to offer any options which would allow one to use the questionable constructs.
  5. Naturally they also refuse to enable such options when you compile programs with -std=c89 (a variant of the C standard which does not include any provisions for provenance whatsoever).

Sorry, but when you knowingly break standards-compliant programs and refuse to support them in any shape or form — that's an act of sabotage.

DeVault: Announcing the Hare programming language

Posted May 9, 2022 18:58 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (20 responses)

> Naturally they also refuse to enable such options when you compile programs with -std=c89 (variant of the C standard which does not include any provisions for provenance whatsoever).

Suppose it is the year 1889, the twentieth century seems bright ahead, and you have learned everything there is to know (or so you think) about Set Theory. (This is called Naive Set Theory). It seems to you that this works just fine.

Fast forward a few years, and this chap named Ernst Zermelo says your theory is incomplete and that's why it suffers from some famous paradoxes. His new axiomatised Set Theory requires several exciting new axioms, including the infamous Axiom Of Choice and with these axioms Zermelo vanquishes the paradoxes.

Now, is Ernst wrong? Was your naive theory actually fine? Shouldn't you be allowed to go back to using that simpler, "better" set theory and ignore Ernst's stupid paradoxes and his obviously nonsensical Axiom of Choice? No. Ernst was right, your theory was unsound, _and it was already unsound in 1889_, you just didn't know it yet. Your naive theory _assumed_ things which Zermelo made into axioms.

Likewise, the C89 compilers you have nostalgia for were actually unsound, and it would have been possible (or if you resurrect them, is possible today on those compilers) to produce nonsensical results because in fact pointer provenance may not have been mentioned in the C89 standard but it was relied upon by compilers anyway. It was silently assumed, and had been for many years.

The excellent index for K&R's Second Edition of "The C Programming Language", covering C89, doesn't even have an entry for the words "alias" or "provenance". That is because there are _assumptions_ about these things baked into the language, but they haven't been surfaced.

The higher level programming languages get to have lots of optimisations here because assumptions like "pointer provenance" are necessarily true in a language that only has references anyway. To keep those same optimisations (as otherwise they'd be slower!) C and C++ must make the assumptions too, and yet to deliver on their "low level" promise they cannot. Squaring this circle is difficult, which is why the committees have punted rather than doing it for all these years.

I happen to think Rust (more by luck than judgement so far as I can see) got this right. If you define most of the program in a higher level ("safe" in Rust terms) language, you definitely can have all those optimisations and then compartmentalize the scary assumption-violating low level stuff. This is what Aria's Tower of Weakenings is about too. Aria proposes that even most of unsafe Rust can safely keep these assumptions, something like Haphazard (the Hazard Pointer implementation) doesn't do anything that risks provenance confusion and so it's safe to optimize with those assumptions and more stuff like that can be safely labelled safe, until only the type of code that really mints "pointers" from integers out of nowhere cannot justify the assumptions and accordingly cannot be optimised successfully.

It's OK if five machine instructions can't be optimised, probably even if they're in your tight inner loop, certainly it's better than accidentally optimising them to four *wrong* instructions. What's a problem for C and C++ is that the provenance problem is infectious and might spread from that inner loop to the entire program and then you're back to slower than Python.

DeVault: Announcing the Hare programming language

Posted May 9, 2022 21:55 UTC (Mon) by Wol (subscriber, #4433) [Link] (19 responses)

> Your naive theory _assumed_ things which Zermelo made into axioms.

Whoops. I know a lot of people think my view of maths is naive, but that's not what I understand an axiom to be. An axiom is something which is *assumed* to be true, because it can't be proven. Zermelo would have provided a proof, which would have changed your naive theory from axiom to proven false.

This is Euclid's axiom that parallel lines never meet. That axiom *defines* the special case of 3D geometry, but because in the general case it's false, it's not an axiom of geometry.

Cheers,
Wol

DeVault: Announcing the Hare programming language

Posted May 9, 2022 22:39 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (18 responses)

Maybe I didn't express myself very well.

As I understand it, the problem with assumptions in naive set theories and C89 (and various other things) is that because it's an assumption rather than an axiom you don't spot where the problems are. You never write it down at all and so have no opportunity to notice that it's too vague, whereas when you're writing an axiom you can see what you're doing.

The naive theories let Russell's paradox sneak in by creating this poorly defined set, but the axioms in Zermelo's theory oblige you to define a set more carefully to have a set at all, and in that process the paradox gets defused. In particular, ZFC has a "specification" axiom which says, in essence: OK, so, tell me how to make this "set" using another set and first-order logic. The sets naive set theories were created for can all be constructed this way no problem, but weird sets with paradoxical properties cannot.
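For reference (standard ZFC notation, not from the comment itself), the specification/separation schema can be written as:

```latex
\forall A \,\exists B \,\forall x \,\bigl( x \in B \iff ( x \in A \wedge \varphi(x) ) \bigr)
```

Every set B must be carved out of an already-existing set A, so instantiating the schema with \varphi(x) = x \notin x only yields the harmless \{x \in A : x \notin x\}, never Russell's paradoxical set.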

C89 assumes that pointers to things must be different, which sounds reasonable but does not formally explain how this works. I believe that it's necessary (in order for a language like C89 to avoid its equivalent of paradoxes, the programs which abuse language semantics to do something hostile to optimisation) to define such rules, and they're going to look like provenance.

I do not believe that C89 is fine, and thus that we should or could just implement C89 as if provenance isn't a thing and be happy. That's my point here. C89 wasn't written in defiance of a known reality, but in ignorance of an unknown one, like the Ptolemaic system. Geocentrists today are different from Ptolemy, but not because Ptolemy was right.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 2:18 UTC (Tue) by khim (subscriber, #9252) [Link] (17 responses)

> C89 assumes that pointers to things must be different, which sounds reasonable but does not formally explain how this works.

No. It's the opposite. C89 doesn't assume that pointers to things must be different. But yes, it asserts that if pointers are equal then they are completely equivalent — you can compare any two pointers and go from there.

Note that it doesn't work in the other direction: it's perfectly legal to have pointers which are different yet point to the same object. That's easily observable in MS-DOS's large memory model.

> I do not believe that C89 is fine, and thus that we should or could just implement C89 as if provenance isn't a thing and be happy.

Show me a Russell's paradox, please. Not in the form “optimizer could not do X or Y, which is a nice thing to be able to do and thus we must punish the software developer who assumes pointers are just memory addresses”, but “this valid C89 program can produce one of two outputs depending on how we read the rules and thus it's impossible to define how it should work”.

Then and only then you would have a point.

> That's my point here.

I think you are skipping one important step there. Yes, there was an attempt to add rules about how certain pointers have to point to different objects. But it failed spectacularly. It was never adopted and the final text of the C89 standard doesn't even mention it. In C89 pointers are just addresses; no crazy talk about pointer validity, object creation and disappearance, and so on.

There are some inaccuracies from that failed attempt: C89 defines only two lifetimes: static and automatic… yet somehow the value of a pointer that refers to freed space is indeterminate. Yet if you just declare that pointers are addresses then it should be possible to fix these inaccuracies without much loss.

Where non-trivial lifetimes first appeared is C99, not C89. There, yes, it has become impossible to read from a "union member other than the last one stored into", limitations on whether you can compare two pointers or not were added (previously it was possible to compare two arbitrary pointers, and if they happened to be valid and equal, they would be equivalent), etc.

But I don't see why the C89 memory model would be, somehow, unsound. Hard to optimize? Probably. Wasteful? Maybe so. But I don't see where it's inconsistent. Show me. Please.

Yes, it absolutely rejects pointer provenance in any shape or form (except something like CHERI where provenance actually exists at runtime and is 100% consistent). Yes, it may limit optimization opportunities. But where's the Russell's paradox, hmm?

DeVault: Announcing the Hare programming language

Posted May 10, 2022 15:04 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (6 responses)

I don't have a Russell's Paradox equivalent under your constraints, but I think it's worth highlighting just how severe those constraints are.

Your resulting C compiler is not the GCC I grew up with (well, OK, the one that teenage me knew), or that with some minor optimisation passes disabled; it's an altogether different animal, perhaps closer to Python. In this language, pointers are all just indexes into an array containing all of memory - including the text segment and the stack - and so you can do some amazing acrobatics as a programmer, but your optimiser is badly hamstrung. C's already poor type checking is further reduced in power in the process, which again makes the comparison to Python seem appropriate.

I don't believe there is or was an audience for this compiler. People weren't writing C because of the expressive syntax, the unsurpassed quality of the tooling or the comprehensive "batteries included" standard library, it didn't have any of those things - they were doing it because C compilers produce reasonable machine code, and this alternative interpretation of C89 doesn't do that any more.

> (except something like CHERI where provenance actually exist at runtime and is 100% consistent)

You can only do this at all under CHERI via one of two equally useless routes:

1. The "Python" approach I describe where you declare that all "pointers" inherit a provenance with 100% system visibility, this obviously doesn't catch any bugs, and you might as well switch off CHERI, bringing us to...

2. The compatibility mode. As I understand it Morello provides a switch so you can say that now we don't enforce CHERI rules, the hidden "valid" bit is ignored and it behaves like a conventional CPU. Again you don't catch any bugs.

This is because under your preferred C89 "no provenance" model there isn't any provenance; CHERI isn't a fairytale spell, it's just engineering.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 16:10 UTC (Tue) by khim (subscriber, #9252) [Link] (3 responses)

> Your resulting C compiler is not the GCC I grew up with (well, OK, the one that teenage me knew), or that with some minor optimisation passes disabled, it's an altogether different animal, perhaps closer to Python.

But that is the language which Kernighan and Ritchie designed and used to write Unix. Their goal was not to create some crazy portable dream; they just wanted to keep supporting both the 18-bit PDP-7 and the 16-bit PDP-11 from the same codebase by rewriting some parts of the code written in PDP-7 assembler in a higher-level language. They had been using B, which had no types at all, and improved it by adding character types, then structs, arrays, and pointers (yes, B conflated pointers and integers; it only had one type).

Yet malloc was not special, free was not special, and not even all Unix programs used them (just look at the source of the original Bourne Shell some day).

> I don't believe there is or was an audience for this compiler.

How about “all the C users for the first decade of its existence”? Initially C was used exclusively in Unix, but in the 1980s it came to be used more widely. Yet I don't think any compilers of that era supported anything even remotely resembling “pointer provenance”.

That craziness started after a failed attempt of the C standard committee to redo the language. They then went back and replaced that with simpler aliasing rules which prevented type punning, but even these weren't abused by compilers until the 21st century.

> People weren't writing C because of the expressive syntax, the unsurpassed quality of the tooling or the comprehensive "batteries included" standard library, it didn't have any of those things - they were doing it because C compilers produce reasonable machine code, and this alternative interpretation of C89 doesn't do that any more.

Can you, please, stop rewriting history? C was quite popular way before ANSI C arrived and tried (but failed!) to introduce crazy aliasing rules. Yes, C compilers were restricted and couldn't do all the amazing optimizations… but C developers could do them instead! When John Carmack was adopting his infamous 0x5f3759df-based trick he certainly hadn't cared to think about the fact that there are some aliasing rules which may render the code invalid, and that was true for the majority of users who grew up in an era before GCC started breaking good K&R programs.

> This is because under your preferred C89 "no provenance" model there isn't any provenance, CHERI isn't a fairytale spell it's just engineering.

It's engineering, yes, but you can add provenance to C89. It just has to be consistent. You can even model it with a “poor man's CHERI”, aka the MS-DOS large memory model, by playing tricks with segment and offset. E.g. realloc could turn the 0x0:0x1234 pointer into the 0x1:0x1224 pointer if it decided not to move an object.

This way all the stale-pointer fast paths would be defeated and you would never have a situation where bitwise-identical pointers point to different objects. This may not be super-efficient but it is compatible with C89. Remember? Bitwise-different pointers can point to the same object, but the opposite is forbidden! Easy, no?

All these games where certain pointers can be compared but not others, and it's the programmer's responsibility to remember all these unwritten rules… I don't know how that language can be used for development, sorry.

The advice I have gotten from our toolchain-support team is to ask a clang developer about any low-level constructs which I may wish to create!

So much for “standard is a treaty” talks…

DeVault: Announcing the Hare programming language

Posted May 10, 2022 18:03 UTC (Tue) by farnz (subscriber, #17727) [Link] (2 responses)

Early C did not have a formal specification - what the one and only implementation did was what the language did.

And the problem is that formally specified C - including K&R C and C89 - left a huge amount unspecified; users of C assumed that the behaviour of their implementation of the language was C behaviour, and not merely the way their implementation behaved.

Up until the early 1990s, this wasn't a big deal. The tools needed for compilers to do much more than peephole optimizations simply didn't exist in usable form; remember that SSA, the first of the tools needed to start reasoning about blocks or even entire programs, doesn't appear in the literature until 1988. As a result, most implementations happened, more by luck than judgement, to behave in similar ways where the specification was silent.

But then we got SSA, the polytope model, and other tools that allowed compilers to do significant optimizations beyond peephole optimizations on the source code, their IR, and the object code. And now we have a mess - the provenance model, for example, is compiler authors trying to find a rigorous way to model what users "expect" from pointers, not just what C89 permits users to assume, while C11's concurrent memory model is an effort to come up with a rigorous model for what users can expect when multiple threads of execution alter the state of the abstract machine.

Remember that all you are guaranteed about your code in C89 is that the code behaves as-if it made certain changes to the abstract machine for each specified operation (standard library as well as language), and that all volatile accesses are visible in the real machine in program order. Nothing else is promised to you - there's no such thing as a "compiler barrier" in C89, for example.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 19:57 UTC (Tue) by khim (subscriber, #9252) [Link] (1 responses)

> And the problem is that formally specified C - including K&R C, and C89 - left a huge amount unspecified; users of C assumed that the behaviour of their implementation of the language was C behaviour, and not merely the way their implementation behaved.

True, but irrelevant. The most important part that we are discussing here was specified in both: pointers are addresses; if two pointers are equal they can be used interchangeably.

> But then we got SSA, the polytope model, and other tools that allowed compilers to do significant optimizations beyond peephole optimizations on the source code, their IR, and the object code.

Yes. And there was an attempt to inject the ideas that make these useful into C89. A failed one. The committee created an unreal language that no one could or would actually use. It was ripped out (and replaced with crazy aliasing rules, but that's another story).

> And now we have a mess - the provenance model, for example, is compiler authors trying to find a rigorous way to model what users "expect" from pointers, not just what C89 permits users to assume

Can you, please, stop lying? Provenance models are trying to justify deliberate sabotage where fully standard-compliant programs are broken. It's not my words; the provenance proposal itself laments:

These GCC and ICC outcomes would not be correct with respect to a concrete semantics, and so to make the existing compiler behaviour sound it is necessary for this program to be deemed to have undefined behaviour.

To make the existing compiler behavior sound, my ass. The whole story of provenance started with sabotage: after a failed attempt to bring provenance-like properties to C89, saboteurs returned in C99 and, finally, succeeded in adding some (and thus rendered some C89 programs invalid in the process), but that wasn't enough: they got the infamous DR260 resolution which was phrased like that: After much discussion, the UK C Panel came to a number of conclusions as to what it would be desirable for the Standard to mean.

Note: the resolution hasn't changed the standard. It hasn't allowed saboteurs to break more programs. No. It was merely a suggestion to develop adjustments to the standards — and listed three cases where such adjustments were supposed to cause certain outcomes.

Nothing like that happened. For more than two decades compilers invented more and more clever ways to screw the developers and used that resolution as a fig leaf.

And then… Rust happened. Since Rust guys are pretty concerned about program correctness (and LLVM sometimes miscompiled IR programs they perceived as correct) they went to the C++ guys and asked “hey, what are the actual rules we have to follow?” And the answer was… “here is the defect report, we use it to screw the developers and miscompile their valid programs”. Rust developers weren't amused.

And that is when the lie was, finally, exposed.

So, please, don't liken problems with pointer provenance to problems with C11 memory model.

Indeed, C89 and C99 don't allow one to write valid multi-threaded programs. Everything is defined strictly for a single-threaded program. To support programs where two threads of execution can touch objects “simultaneously” you need to extend the language somehow.

The provenance excuse is used to break completely valid C and C++ programs. It's not about extending the language, it's about narrowing it. Certain formerly valid programs have to be deemed to have undefined behaviour. And after more than two decades we don't even have the rules which we are supposed to follow finalized!

And they express it in the form of if you believe pointers are mere addresses, you are not writing C++; you are writing the C of K&R, which is a dead language. IOW: they know they sabotaged C developers, and they are proud of it.

> Nothing else is promised to you - there's no such thing as a "compiler barrier" in C89, for example.

Yes. And to express many things which would be needed to, e.g., write an OS kernel in C89, you need to extend the language in some way. This is deliberate: the idea was to make sure strictly-conforming C89 programs run everywhere, while conforming programs may require certain language extensions. Not ideal, but it works.

Saboteurs managed to screw it all completely and categorically refuse to fix what they have done.

This looks like a good time to stop

Posted May 10, 2022 20:05 UTC (Tue) by corbet (editor, #1) [Link]

When we start seeing this type of language, and people accusing each other of lying, it's a strong signal that a thread may have outlived its useful lifetime. This article now has nearly 350 comments on it, so I may not be the only one who feels like that outliving happened a little while ago.

I would like to humbly suggest that we let this topic rest at this point.

Thank you.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 17:16 UTC (Tue) by Vipketsh (guest, #134480) [Link] (1 responses)

Where provenance makes sense is if the language you have has some higher-level concept of an object, which has some properties described in the program and known to the compiler. Most importantly, object lifetime is known. This ends up bringing with it rules such as: you can not just arbitrarily turn one object into another and back again (i.e. no pointer casts), you can not arbitrarily split one object into two (i.e. no pointer arithmetic) and you can not arbitrarily manufacture pointers out of random data. Unfortunately C is not such a language, and by forcing provenance rules on it one is in essence trying to retrofit some kind of object model to it without any of the expressiveness and enforced rules that are needed for the programmer to not make programmes that are obviously wrong under the assumptions (i.e. provenance). Worse yet, the rules, or better said heuristics*, that standards and compilers chose to signal lifetime are counter to what existing programmes expect and exploit.

Re: your mathematics analogy. I think you have taken the wrong viewpoint there: pretty much all of mainstream mathematics is concerned with either extending existing and useful theory (e.g. how rational numbers were extended to create irrational numbers) or putting existing theory on a more sound footing (e.g. Hilbert's axioms), possibly closing off various paradoxes. Realise how in pretty much all of the evolution of mathematics a very strong emphasis was placed on any new theory being backwards compatible -- no-one, taken seriously, wanted to ever end up with 1+1=3 but instead worked to solidify the intuition that 1+1=2. I think that if one wanted to paint mathematical evolution onto C, the definitions underpinning C would need to be changed in a way that (i) they are backwards compatible with existing programs and (ii) loopholes exploited by compiler writers are closed instead of officially sanctioned. Right now, it's the opposite: people are trying to convince C programmers that the intuition they had all along was always false and reality is actually completely different.

*: Possibly the one with the most problems is the idea that realloc() will, in the absence of failure, always (i) destroy the object passed to it, and (ii) allocate a completely new one. This is counter to the intuition of many a programmer, and there is no enforced rule in C that prevents programmers from carrying pointers over the realloc() call, which would be needed to make the idea actually work. The reason people are annoyed is that such code exists, is used, has worked for a long time, and there is no evidence that this idea has much, if any, benefit on the compiled program.

DeVault: Announcing the Hare programming language

Posted May 11, 2022 16:09 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

> Where provenance makes sense is if the language you have has some higher level concept of an object which has some properties described in the program and known to the compiler.

I shall quote K&R themselves on this subject in their second edition:

"An object, sometimes called a variable, is a location in storage, and its interpretation depends on two main attributes ..."

> This ends up bringing with it rules such as you can not just arbitrarily turn one object into another and back again (i.e. no pointer casts), you can not arbitrarily split one object into two (i.e. no pointer arithmetic)

Nope, a language can (and some do, notably Rust of course, but this is also possible with care in C and C++) provide pointer casts and pointer arithmetic. Provenance works just fine for these operations.

Rust's Vec::split_at_spare_mut() isn't even unsafe. Most practical uses for this feature are unsafe, but the call itself is fine: it merely gives you back your Vec<T> (which will now need to re-allocate if grown, because any spare space at the end of it is gone) and that space, which was not used yet, as a mutable slice of MaybeUninit<T> to do with as you see fit.

> and you can not arbitrarily manufacture pointers out of random data.

But here's where your problem arises. Here provenance is conjured from nowhere. It's impossible magic.

> Unfortunately C is not such a language and by forcing provenance rules on it, one is in essence trying to retrofit some kind of object model to it without any of the expressiveness and enforced rules that are needed for the programmer to not make programmes that are obviously wrong under the assumptions (i.e. provenance)

As we saw, C is in fact such a language after all. The fact that many of its staunchest proponents don't seem to understand it very well is a problem for them and for C.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 16:39 UTC (Tue) by farnz (subscriber, #17727) [Link] (9 responses)

When you say "if pointers are equal, then they are completely equivalent", are you talking at a single point in time, or over the course of execution of the program?

Given, for example, the following program, it is a fault to assume that ptr1 and ptr2 are equivalent throughout the runtime, because ptr1 is invalidated by the call to release_page:


handle page_handle = get_zeroed_page();
int test;
int *ptr2;
int *ptr1 = get_ptr_to_handle(page_handle);
*ptr1 = -1; // legitimate - ptr1 points to a page, which is larger than an int in this case and correctly aligned etc.
test = *ptr1; // makes test -1
release_page(page_handle);
page_handle = get_zeroed_page();
ptr2 = get_ptr_to_handle(page_handle); // ptr2 could have the same numeric value as ptr1.
if (ptr2 == ptr1 && *ptr1 == test) {
    puts("-1 == 0");
} else {
    puts("-1 != 0");
}
release_page(page_handle);

This is the sort of code that you need to be clear about; C89's language leaves it unclear whether it's legitimate to assume that *ptr1 == test, even though the only assignments in the program are to *ptr1 (setting it to -1) and test. The thing that hurts here is that even if, in bitwise terms including hidden CHERI bits etc, ptr1 == ptr2, it's possible for the underlying machine to change state over time, and any definition of "completely equivalent" has to take that into account.

One way to handle that is to say that even though the volatile keyword does not appear anywhere in that code snippet, you give dereferencing a pointer volatile-like semantics (basically asserting that locations pointed to can change outside the changes done by the C program), and say that each time it's dereferenced, it could be referring to a new location in physical memory. In that case, this program cannot print "-1 == 0", because it has to dereference ptr1 to determine that.

Another is to follow the letter of the C89 spec, which says that the only things that can change in the abstract machine's view of the world other than via a C statement are things marked volatile. In that case, this program is allowed to print "-1 == 0" or "-1 != 0" depending on whether ptr1 == ptr2, because the implementation "knows" that it is the only thing that can assign a value to *ptr1, and thus it "knows" that because no-one has assigned through *ptr1 since it read the value to get test it is tautologically true that *ptr1 == test.

Both are valid readings of this source under the rules set by C89, because C89 states explicitly that the only thing expected to change outside of explicit changes done by C code are things marked as volatile. But in this case, the get_zeroed_page and release_page pair change the machine state in a fairly dramatic way, but in a way that's not visible to C code - changing PTEs, for example.

And that's the fundamental issue with rewinding to C89 rules - C89 implies very strongly that the only interface between things running on the "abstract C89 machine" and the real hardware are things that are marked as volatile in the C abstract machine. In practice, nobody has bothered being that neat, and we accept that there's a whole world of murky, underdefined behaviour where the real hardware changes things that affect the behaviour of the C abstract machine, but it happens that C compilers have not yet exploited that.

Note, too, that I wasn't talking about optimization in either case - I'm simply looking at the semantics of the C abstract machine as defined in C89, and noting that they're not powerful enough to model a change that affects the abstract machine but happens outside it. I find it very tough, within the C89 language, to find anything that motivates the position that *ptr1 != test given that ptr2 == ptr1 and *ptr2 != test - it's instead explicitly undefined behaviour.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 17:00 UTC (Tue) by khim (subscriber, #9252) [Link] (2 responses)

> Both are valid readings of this source under the rules set by C89, because C89 states explicitly that the only thing expected to change outside of explicit changes done by C code are things marked as volatile.

No. Calling fread and fwrite can certainly change things, too.

> But in this case, the get_zeroed_page and release_page pair change the machine state in a fairly dramatic way, but in a way that's not visible to C code - changing PTEs, for example.

Yes and no. Change is dramatic, sure. But it's most definitely visible to C code.

By necessity such things have to either be implemented with volatile or by calling a system routine (which must be added to the list of functions like fread and fwrite as a system extension, or else you couldn't use them).

The place where you pass your pointer to invlpg would be the place where the compiler would know that the object may suddenly change value.

> In practice, nobody has bothered being that neat, and we accept that there's a whole world of murky, underdefined behaviour where the real hardware changes things that affect the behaviour of the C abstract machine, but it happens that C compilers have not yet exploited that.

In practice people who are doing these things have to use volatile at some point in the kernel, or else it just wouldn't work. Thus I don't see what you are trying to prove.

The fact that real OSes have to expand the list of “special” functions which may do crazy things? It's obvious. In practice your functions are called mmap and munmap and they should be treated by the compiler similarly to read and write: the compiler either has to know what they are doing or it should assume they may touch and change any object they can legitimately reach given their arguments.

> I find it very tough, within the C89 language, to find anything that motivates the position that *ptr1 != test given that ptr2 == ptr1 and *ptr2 != test - it's instead explicitly undefined behaviour.

No. You couldn't do things like changes to PTEs in a fully portable C89 program. It's pointless to talk about such programs since they don't exist.

The only way to do it is via asm and/or a call to a system routine which, by necessity, needs extensions to the C89 standard to be usable. In both cases everything is fully defined.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 18:10 UTC (Tue) by farnz (subscriber, #17727) [Link] (1 responses)

fread and fwrite are poor examples, because they are C code defined in terms of the state change they make to the abstract machine, and with a QoI requirement that the same state change happens to the real machine. Indeed, everything that's defined in C89 has its impact on the abstract machine fully defined by the spec; the only get-out is that volatile marks something where all reads and writes through it must be visible in the real machine in program order.

But note that this is a very minimal promise; the only thing happening in the real machine that I can reason about in C89 is the program order of accesses to volatiles. Nothing else that happens in the abstract machine is guaranteed to be visible outside it - everything else is left to the implementation's discretion.

And no, the state change is not visible inside the C89 abstract machine; if I write through a volatile pointer to a PTE, the implementation must ensure that my write happens in the real machine as well as the abstract machine, but it does not have to assume that anything else has changed in the abstract machine. That, in turn, means that it may not know that *ptr1 has now changed in the "real" machine, because ptr1 is not volatile and thus changes in the real machine are irrelevant.

And I absolutely can change a PTE without assembly or a system routine, using plain C code; all I need is something that gives me the address of the PTE I want to change. Now, depending on the architecture, that almost certainly is not enough to guarantee an instant change - e.g. on x86, the processor can use old or new value of the PTE until the TLB is flushed, and I can force a TLB flush with invlpg to get deterministic behaviour - but I can bring the program into a non-deterministic state without calling into an assembly block or running a system routine, as long as I have the address of a PTE.

And there's no "list of system routines" in C89; the behaviour of fread, fwrite and other such functions is fully defined in the abstract machine by the spec, with a QoI requirement to have their behaviour be reflected in the "real" machine. By invoking the idea of a "list of system routines", you're extending the language beyond C89.

You're making the same mistake a lot of people make, of assuming that the behaviour of compilers in the early 1990s and earlier reflected the specification at the time, and wasn't just a consequence of limited compiler technology. If compilers really did implement C89 to the letter of the specification, then much of what makes C useful wouldn't be possible; provenance is not something that's new, but rather an attempt to let people do all the tricks like bit-stuffing into aligned pointers (which is technically UB in C89) while still allowing the compiler to reason about the meaning of your code in a way compatible with the C89 specification.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 19:01 UTC (Tue) by khim (subscriber, #9252) [Link]

> By invoking the idea of a "list of system routines", you're extending the language beyond C89.

Which is the only way to have PTEs in C code.

> And I absolutely can change a PTE without assembly or a system routine, using plain C code; all I need is something that gives me the address of the PTE I want to change.

If you have such an address then you have to extend the language somehow. Or, alternatively, don't touch it.

> By invoking the idea of a "list of system routines", you're extending the language beyond C89.

Of course. Because it's impossible to write a C89 program which changes the PTEs, such a concept just couldn't exist in it. You have to extend the language to cover that use case.

> If compilers really did implement C89 to the letter of the specification

…then such compilers would have been as popular as ISO 7185. Means: no one would have cared about these and no one would have mentioned their existence.

> If compilers really did implement C89 to the letter of the specification, then much of what makes C useful wouldn't be possible

Yes. But some programs would still be possible. Programs which do tricks with pointers would work just fine, programs which touch PTEs wouldn't.

> provenance is not something that's new, but rather an attempt to let people do all the tricks like bit-stuffing into aligned pointers (which is technically UB in C89)

Citation needed. Badly. Because, I would repeat once more, in C89 (not in C99 and newer) the notion of “pointers which have the same bit pattern yet are different” doesn't exist. If you add a few bits to a pointer converted to an integer and then clear these same bits you would get the exact same pointer — guaranteed. The fact that these bits are the lowest bits of the converted integer is an implementation-specific thing; you can imagine a case where they would live as the top bits, for example. So yes, that requires some implementation-specific extension. But a pretty mild and limited one.

Yes, provenance is an attempt to deal with the idea of C99+ that some pointers may be equal to others yet, somehow, still different — but that's not allowed in C89. If two pointers are equal then they are either both null pointers, or both point to the same object, end of story.

Sure, it makes some optimizations hard and/or impossible. So what? This just means that you cannot do such optimizations in C89 mode. Offer an -fno-provenance option, enable it for -std=c89, done.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 17:26 UTC (Tue) by Vipketsh (guest, #134480) [Link] (5 responses)

You are playing a nice sleight-of-hand here. Your code clearly shows function calls, and without knowing what's in those called functions, one has no choice but to assume that *ptr1 has changed (within the C abstract machine), and thus you can never get to the output of "-1 == 0". If, on the other hand, you do see the internals of those functions, you will see the modification of some memory that may alias with ptr1, and because of that you cannot get the "-1 == 0" output.

The only way I can see your reasoning working is if you are somehow allowed to assume that function calls are an elaborate way of saying "nop".

DeVault: Announcing the Hare programming language

Posted May 10, 2022 18:36 UTC (Tue) by farnz (subscriber, #17727) [Link] (4 responses)

If I promise that the function calls are just naming what the code does, but its real behaviour is poking global volatile pointers, and those functions are implemented in pure C89, there's no difference in behaviour. Given the following C definitions of get_zeroed_page, get_ptr_to_handle and release_page, you still have the non-determinism, albeit I've introduced a platform dependency:


extern const size_t PAGE_SIZE;
struct pt_entry {
    volatile char *v_addr;
    volatile char *p_addr;
};
volatile struct pt_entry *pte; // Initialized by outside code, with suitable guarantees on v_addr for the compiler implementation and on p_addr
int *page_location = pte->v_addr;

void *get_zeroed_page() {
   pte->p_addr += PAGE_SIZE;
   memset(pte->v_addr, 0, PAGE_SIZE);
   return pte->v_addr;
}

void release_page(void *handle) {
  assert(handle == pte->v_addr);
  pte->p_addr -= PAGE_SIZE;
}

void *get_ptr_to_handle(void* handle) {
  assert(handle == pte->v_addr);
  return page_location;
}

This has semantics on the real machine, because of the use of volatile - the writes to *pte are guaranteed to occur in program order. But the compiler does not have any way to know that volatile char *v_addr ends up with the same value between two separate calls to get_ptr_to_handle but points to different memory.

Also, I'd note that C89 does not have language asserting what you claim - it actually says quite the opposite, that the compiler does not have to assume that *ptr1 has changed within the C abstract machine, since ptr1 is not volatile. It's just that early implementations made that assumption because to do otherwise would require them to analyse not just the function at hand, but also other functions, to determine if *ptr1 could change. Like khim, you're picking up on a limitation of 1980s and early 1990s compilers, and assuming it's part of the language as defined, and not merely an implementation detail.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 19:22 UTC (Tue) by Vipketsh (guest, #134480) [Link] (2 responses)

I still don't see it. You have that memset here:

> void *get_zeroed_page()
> [...]
> memset(pte->v_addr, 0, PAGE_SIZE);

If you don't have to assume that this writes over data pointed to by some other pointer it means that your aliasing rules say that no two pointers alias. Or put another way, for all practical purposes, having two pointers pointing to the same thing is unworkable. By some reading of C89 that may be the conclusion, but quite clearly that was never the intent and exactly no one expects things to work that way (including compiler writers, oddly enough).

> compiler does not have to assume that *ptr1 has changed within the C abstract machine

You mean across a function call? That quite simply means that exactly no data could ever be shared by any two functions (in different compilation units). Again, this would make the language completely unworkable and be counter to what anyone expects.

> [...] you're picking up on a limitation of 1980s and early 1990s compilers, and assuming it's part of the language as defined, and not merely an implementation detail.

No. The language is defined, first and foremost, by what existing programs expect. If the standard allows interpretations and compilers to do things counter to what a majority of these programs expect, it is the standard that is broken and not the majority of all programs. I firmly believe that the job of a standard is to document existing behaviour and not to be a tool to change all programmes out there.

p.s.: I find it fascinating that instead of arguing about actual behaviour the C standard keeps coming up as if it were a bible handed down by some higher power and everything in it is completely infallible. Then the conclusion is that "See? It all sucks, so use Rust" because Rust is so excruciatingly well specified that, last I checked, it has no specification at all.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 20:10 UTC (Tue) by khim (subscriber, #9252) [Link]

> Then the conclusion is that "See? It all sucks, so use Rust" because Rust is so excruciatingly well specified that, last I checked, it has no specification at all.

Rust hasn't needed any specs because until very recently there was just one compiler. Today we have 1.5: LLVM-based rustc and GCC-based rustc. One more is in development, thus I assume formal specs will be higher on the list of priorities now.

This being said IMNSHO it's better to not have specs rather than have ones which are silently violated by actual compilers. At least when there are no specs you know that discussions between compiler developers and language users have to happen, when you have one which is ignored…

DeVault: Announcing the Hare programming language

Posted May 10, 2022 23:48 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

Rust doesn't have anything similar to ISO/IEC 14882:2020, a large written document which is the product of a lot of work but is of limited practical value since it doesn't describe anything that actually exists today.

However, Rust does extensively document what is promised (and what is not) about the Rust language and its standard library, and especially the safe subset which Rust programmers should (and most do) spend the majority of their time working with.

For example, all that ISO document has to say about what happens if I've got two byte-sized signed integers which may happen to have the value 100 in them and I add them together is that this is "Undefined Behaviour" and offers no suggestions as to what to do about that besides try to ensure it never happens. In Rust the "no specification" tells us that this will panic in debug mode, but, if it doesn't panic (because I'm not in debug mode and I didn't enable this behaviour in release builds) it will wrap, to -56. I don't know about you, but I feel like "Absolutely anything might happen" is less specific than "The answer is exactly -56".

Rust also provides plenty of alternatives, including checked_add(), unchecked_add(), wrapping_add(), saturating_add() and overflowing_add(), depending on what you actually mean to happen on overflow, as well as the type wrappers Saturating and Wrapping, which are useful here (e.g. Saturating<i16> is probably the correct type for a 16-bit signed integer used to represent CD-style PCM audio samples).
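The behaviours described in these two comments can be checked directly; this is a minimal sketch using the standard-library methods named above, with the 100 + 100 example from the parent comment:

```rust
fn main() {
    let (a, b): (i8, i8) = (100, 100);

    // 100 + 100 = 200 does not fit in an i8, so the wrapped result is -56.
    assert_eq!(a.wrapping_add(b), -56);

    // checked_add reports the overflow instead of producing a value.
    assert_eq!(a.checked_add(b), None);

    // saturating_add clamps at the type's maximum, 127.
    assert_eq!(a.saturating_add(b), i8::MAX);

    // overflowing_add returns the wrapped value plus an overflow flag.
    assert_eq!(a.overflowing_add(b), (-56, true));

    println!("all overflow variants behaved as documented");
}
```

In a debug build, plain `a + b` would panic at runtime instead of silently wrapping, which is why the explicit methods are preferred whenever overflow is a possibility you actually care about.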

DeVault: Announcing the Hare programming language

Posted May 10, 2022 23:48 UTC (Tue) by nybble41 (subscriber, #55106) [Link]

The access to volatile memory at pte->v_addr through the non-volatile pointer ptr1 is UB because according to [C89]:

> If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined.[57]

> [57] This applies to those objects that behave as if they were defined with qualified types, even if they are never actually defined as objects in the program (such as an object at a memory-mapped input/output address).

Objects in the page at pte->v_addr behave as if they were defined as volatile objects because the content changes in ways not described by the C abstract machine when pte->p_addr is updated. The same applies to passing a pointer to volatile object(s) to memset(), which takes a non-volatile pointer.

The initializer for page_location (pte->v_addr) is also not a constant, but I assume this is just pseudo-code for the value being set by some initialization function not shown here.

[C89] http://port70.net/~nsz/c/c89/c89-draft.html

DeVault: Announcing the Hare programming language

Posted May 4, 2022 14:24 UTC (Wed) by khim (subscriber, #9252) [Link] (3 responses)

> I'm using more and more asm() statements to prevent the compiler from lurking into what I'm doing ?

Yes, absolutely.

> I don't think so.

Why not? You said you want to “use my processor and OS for the purpose they were built” and in C all such code has to live in an asm block.

The fact that it took so long for you to realize that is kinda unfortunate, but why would you perceive it as a bad thing?

> It feels like one day my whole C code will only be a bunch of macroes based on asm() statements.

If you insist on using a non-portable construct in every line of code then sure, that's the proper outcome.

> That's not my goal when I'm using a C compiler, really.

That's the only proper way to write non-portable code in C. It makes non-portable code look like portable code, which is obviously a good thing.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 13:08 UTC (Thu) by wtarreau (subscriber, #51152) [Link] (2 responses)

> You said you want to “use my processor and OS for the purpose they were built” and in C all such code have to live in asm block.

Ah no, sorry for not being clear. I have to use asm statements to prevent the compiler from being smart!

Typically stuff like this, which current compilers are not yet able to optimize away to produce stupid code:

#define GUESSWHAT(v) ({ typeof(v) _v; asm volatile("" : "=rm"(_v) : "0"(v)); _v; })

It usually only costs a move or two due to register allocation, so it's cheap. And using that in high-level code is ugly. But at least the compiler doesn't know my pointer's value and doesn't play games behind my back with it.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 14:08 UTC (Thu) by khim (subscriber, #9252) [Link] (1 responses)

> I have to use asm statements to prevent the compiler from being smart!

That's the wrong way of doing things and you know that. Code outside of an asm block has to follow the rules.

> But at least it doesn't know my pointer's value and doesn't play games in my back with it.

GCC doesn't know anything. It just emits asm blocks blindly. Clang certainly does know what happens in your asm block, it has a built-in assembler specifically for such cases.

I think what you actually want is std::launder (in C you can just call __builtin_launder directly).

DeVault: Announcing the Hare programming language

Posted May 5, 2022 15:22 UTC (Thu) by wtarreau (subscriber, #51152) [Link]

Thanks for the tip, but there isn't any trace of it in gcc's docs... Thus I'll keep that hack for a few more decades it seems.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 17:16 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (2 responses)

> Last I checked, the direction Rust seems to be going is to swallow the bitter pill and define usize to be uintptr_t, accepting the resulting memory bloat in situations where size_t happens to be smaller.

I would say that Aria's "experiment" in nightly suggests exactly the opposite. Rust may choose to promise only that usize is the same size as the address, not the pointer - deliver APIs that reflect this more sophisticated understanding of how provenance works, and tell everybody whose black magic now doesn't work on CHERI that's too bad, try again with new knowledge.

https://doc.rust-lang.org/nightly/core/primitive.pointer....
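A small illustration of the address/pointer split those APIs make. The methods addr() and with_addr() were nightly-only strict-provenance experiments at the time of this comment; they have since been stabilized, so this sketch assumes a recent toolchain:

```rust
fn main() {
    let x = 42u32;
    let p: *const u32 = &x;

    // addr() yields only the numeric address: a bare usize with no
    // provenance, which by itself cannot become a usable pointer.
    let a: usize = p.addr();

    // with_addr() builds a new pointer that reuses p's provenance but
    // carries the given address, keeping the dereference well-defined.
    let q: *const u32 = p.with_addr(a);
    assert_eq!(unsafe { *q }, 42);

    println!("round-tripped through usize while keeping provenance");
}
```

On a CHERI-style target only this kind of reconstruction can work, since a usize the size of the address is not wide enough to hold a full capability pointer.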

DeVault: Announcing the Hare programming language

Posted May 3, 2022 18:25 UTC (Tue) by felix.s (guest, #104710) [Link] (1 responses)

Let’s hope that it will be fruitful. Still, the conflation of those two is not the only such ‘all the world’s x86 and ARM’ assumption that I am saddened to see in Rust. I think it’s the ‘weird’ (segmented, non-two’s-complement, narrow address space, maybe even non-octet-based, etc.) architectures that could benefit the most from a Rust port, because they are the ones starved the most for any good tooling. I would love to see, some day, a Rust port to Win16 or DOS, which is to say, to x86-16 with a ‘large’ or ‘huge’ memory model. And some may disagree with me, but I think a commitment to backwards compatibility is one of the few things that C ought to be applauded for.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 18:42 UTC (Tue) by joib (subscriber, #8541) [Link]

There is some ongoing work to make Rust view of pointers more generic. That is, to break the assumptions that usize (roughly equivalent to size_t for you C-heads) is the same size as a pointer, and that a pointer->usize->pointer roundtrip doesn't lose information. AFAICS the motivation is not to work with all those weird old and obsolete architectures, but more to work with things like CHERI (including ARM Morello which is an implementation of CHERI), but I guess some of that work might help with stuff like segmented architectures as well.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:34 UTC (Mon) by ballombe (subscriber, #9523) [Link]

Ironically, the two most serious security bugs this year are both in Java programs. Memory safety only brings you so far.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:06 UTC (Mon) by roc (subscriber, #30627) [Link] (3 responses)

To be fair, that blog post doesn't show "the majority of 0days analyzed by google were use-after-frees". Out of 58 zero-days, only 17 were use-after-free, i.e. about 30%.

Of course, I agree with you that encouraging people to write systems software in a language that doesn't prevent UaF --- when Rust is an option --- is a mistake.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 10:17 UTC (Mon) by atnot (guest, #124910) [Link] (2 responses)

> Out of 58 zero-days, only 17 were use-after-free, i.e. about 30%

I was looking at the percentage of memory corruption vulnerabilities, which is 54%. But both are very big numbers.

> Of course, I agree with you that encouraging people to write systems software in a language that doesn't prevent UaF --- when Rust is an option --- is a mistake.

I don't think Rust's answer has to be the only option here. There's a lot of options, between Rust's lifetimes, go style escape analysis, zig's allocator shenanigans, Google's MiraclePtr, etc. What I think is unfortunate however is saying "it's hard so we shouldn't try" and releasing a language in 2022 where UAF, the #1 memory corruption issue in modern systems, is not even a consideration. That's just very disappointing.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 10:27 UTC (Mon) by roc (subscriber, #30627) [Link] (1 responses)

I agree.

I'm not saying Rust is the only answer to UaF. E.g. Go or Java GC is another fine answer for many applications. I mention Rust only because it is the most likely to be compatible with whatever scenarios people dream up.

(And FWIW I don't think Zig's allocator shenanigans are a realistic answer to preventing UaF in production.)

DeVault: Announcing the Hare programming language

Posted May 2, 2022 11:41 UTC (Mon) by khim (subscriber, #9252) [Link]

> (And FWIW I don't think Zig's allocator shenanigans are a realistic answer to preventing UaF in production.)

I also don't think so. But at least they have honestly tried to do something (even if that something is not very good).

To say that we haven't even tried? In 2022? Gosh.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 16:32 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (13 responses)

> I didn't see any tools for preventing use-after-free bugs.

Something that *prevents* use-after-free bugs will force you to do horrible things when what you're trying to do looks like a use-after-free. Never forget that a free() is just a matter of saying "I am no longer interested in that memory area in this scope". Nothing more. When you start to manage your own memory pools, for example, you realize that even UAF is a totally gray area, because what's considered "free" at one level still has to be tampered with at a lower level. UAF prevention is nothing but a lie, or a way to force you to do extremely complicated things at a level that already suffers from extreme sensitivity, which easily guarantees you'll mess something up when doing low-level programming.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 17:53 UTC (Mon) by atnot (guest, #124910) [Link]

> When you start to manage your own memory pools, for example, you realize that even UAF is a totally gray area, because what's considered "free" at one level still has to be tampered with at a lower level

I think that's only really true in languages like C where there's no real mechanism for handling arbitrary memory safely, so the various static analyzers are forced to guess in ways that will invariably turn out to be incorrect.

Not to bring up Rust again in this thread, but it does offer a good example here: You would implement freeing for your pool by converting your value into a MaybeUninit<MyType> and dropping (freeing) it in place. At this point, the original value no longer exists as far as the language is concerned, but you still have a write-only handle to its memory, which you can safely use as you please. Then, when the time comes to use that memory again, you can write to it and use an unsafe call to assume_init() to promise the memory is now valid again. This consumes your MaybeUninit<MyType> and gives you a shiny new value of MyType in return.

By using the type system cleverly in this way, you can uphold the guarantee that all values must always be valid and that UAFs are hence impossible, without losing the ability to tamper with freed memory at a lower level. I wonder if C static analyzers could be taught a similar thing.
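A rough sketch of the pattern described above. The Slot type here is hypothetical, invented purely for illustration; MaybeUninit and its assume_init_drop/write/assume_init_ref methods are the real standard-library pieces:

```rust
use std::mem::MaybeUninit;

// Hypothetical single-object "pool slot": owns raw storage for a T
// whose initialized-ness is tracked by the caller, not the type system.
struct Slot<T> {
    mem: MaybeUninit<T>,
}

impl<T> Slot<T> {
    fn new(value: T) -> Self {
        Slot { mem: MaybeUninit::new(value) }
    }

    // "Free" the value: run its destructor in place, but keep the raw
    // storage. Afterwards safe code can no longer read a T out of it.
    fn release(&mut self) {
        unsafe { self.mem.assume_init_drop() };
    }

    // Reuse the storage for a fresh value, re-establishing validity.
    fn refill(&mut self, value: T) {
        self.mem.write(value);
    }

    // Only sound if the slot currently holds an initialized T;
    // that is the promise the caller makes by using `unsafe`.
    unsafe fn get(&self) -> &T {
        unsafe { self.mem.assume_init_ref() }
    }
}

fn main() {
    let mut slot = Slot::new(String::from("first"));
    slot.release();                      // value destroyed, storage kept
    slot.refill(String::from("second")); // storage reused for a new value
    assert_eq!(unsafe { slot.get() }, "second");
    slot.release();                      // clean up before slot goes away
}
```

The point of the sketch is that between release() and refill() the memory still exists and can be tampered with at the pool level, yet the language never lets safe code observe a dead T, which is exactly the "high level vs low level free" distinction being discussed.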

DeVault: Announcing the Hare programming language

Posted May 3, 2022 0:17 UTC (Tue) by roc (subscriber, #30627) [Link] (11 responses)

> Never forget that a free() is just a matter of saying "I am no longer interested in that memory area in this scope".

"And I promise to never access that memory again through this pointer". Making that promise and failing to honour it is where the problems arise.

> When you start to manage your own memory pools for example, you realize so much as UAF is a totaly gray area

When you recycle objects, e.g. using a pool, you may encounter "high level" UaF bugs, but language-level UaF prevention continues to prevent "low level" UaF bugs, where that memory could be reused for an entirely different purpose. That is important because it means that whatever type system guarantees the language provides continue to hold. It rules out stuff like UaF leading to "wild write"/"wild read" primitives that are almost inevitably exploitable.

Figuring out those "high level" UaF bugs is also a lot easier than general UaF bugs, because only code with access to that pool can be involved in those bugs. With arbitrary memory corruption any code in that address space can be at fault.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 11:04 UTC (Tue) by nix (subscriber, #2304) [Link]

> Figuring out those "high level" UaF bugs is also a lot easier than general UaF bugs, because only code with access to that pool can be involved in those bugs. With arbitrary memory corruption any code in that address space can be at fault.

Even then a language could in theory improve on things, by, perhaps, annotating pointers with the memory allocator or pool they point to (and requiring that to be part of the pointer's type). Now a high-level pointer can go free while an identically-valued but differently-typed pointer to the same memory that happens to be part of the implementation of the allocator knows that, from its POV, it is not free, prohibiting access through the first pointer but not the second.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 14:14 UTC (Tue) by wtarreau (subscriber, #51152) [Link] (9 responses)

> "And I promise to never access that memory again through this pointer". Making that promise and failing to honour it is where the problems arise.

No, that's not how it works. That's only the view of the end user who relies on libs but does not write them.

You only promise not to access that memory again between the *end* of the free() and the beginning of the next malloc()/free()/realloc(). Because free() itself and malloc() will use it a lot. And even calls to free() on other objects, or malloc() returning another object, might touch that area to cut it into pieces, merge it with another one, or return it to the system. That's why it is important to understand how memory allocation works and not consider free() to be that strict, because it is not (otherwise it would be impossible to write a memory allocator, and you would have to stop and restart your program once you'd used all your system memory, since you wouldn't be allowed to reuse a previously used pointer).

DeVault: Announcing the Hare programming language

Posted May 3, 2022 16:49 UTC (Tue) by Tobu (subscriber, #24111) [Link] (6 responses)

You may get an address reused with a malloc() / free() / malloc() sequence, but the pointer won't have the same provenance and won't be the same from the point of view of the abstract machine that defines the operational semantics of the language. The compiler will either know about malloc or offer a building block below it.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 8:34 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (5 responses)

But how will the compiler distinguish:

free(foo);
printf("Just released %p\n", foo);

which doesn't dereference the pointer, from:

free(foo);
printf("Just released %s\n", foo);

especially if the example is made a bit more opaque by just calling a debug(void *p) function that takes the pointer as an argument, without telling whether it just uses its value or dereferences it?

DeVault: Announcing the Hare programming language

Posted May 4, 2022 14:05 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (4 responses)

The former is fine. The latter is definitely UB (as `MALLOC_PERTURB_` would show since it does a `memset` on `free`'d memory). FWIW, I run with `MALLOC_PERTURB_` at all times.

Provenance is what makes this undefined:

free(foo);
char *new_foo = malloc(1);
if (foo == new_foo) {
    // by golly, we got lucky.
    *foo = 1; // UB
}

That comparison is misleading due to provenance. It can be assumed to be false because `foo` is not allowed to *access* anything after that `free` even if its integer representation happens to be the same as `new_foo`. See the C and C++ papers by Paul McKenney about "pointer zap" about how to finally put provenance into the standard (instead of being something that implementers have had to craft to make sense of things as the languages have evolved).

Additionally, CHERI would show the folly of this code. C allows CHERI to exist. If you want to say "I don't care about CHERI", it'd be real nice if C would allow the code to have some marker that says "this code abuses pointer equality because we assume the target platform allows us to do this" so that any CHERI-like target can just say "this is broken" up front instead of waiting for whatever the optimizer does to finally trip up something in production.

As I said elsewhere: if you want to abuse C to be assembler for your target, it'd be real nice if that could be explicit instead of the doing "I'm using C for my list of targets, damn C's portability goals" and leaving "fun" landmines for others to run over later.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 15:28 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (3 responses)

I remember seeing this example somewhere but to be honest it really shocks me, that's purely forgetting that there is a real machine performing the operations underneath and using registers to set addresses. If the types are the same and the pointers are the same, it is foolish to permit the compiler to decide that they can hold different values. That's definitely a way to create massive security issues.

My feeling over the last 10 years is that the people working on C are actually trying to kill it by making it purposely unsuitable for all the cases that made its success. Harder to use, harder to code in without creating bugs, harder to read. How many programs break when upgrading gcc on a distro? A lot. The worst ones being those that break at runtime. This just shows that even if past expectations were wrong, they were based on something that made sense in a certain context and that was based on how a computer works. Now instead you have to imagine what a compiler developer thinks about the legitimacy of your use case and how much pain he wants to inflict on you to punish you for trying to do it. Not surprising to see that high wave of new languages emerging in such conditions!

DeVault: Announcing the Hare programming language

Posted May 4, 2022 15:53 UTC (Wed) by nybble41 (subscriber, #55106) [Link] (2 responses)

> … that's purely forgetting that there is a real machine performing the operations underneath and using registers to set addresses.

The error here lies in thinking that C makes any claims about pointers holding *addresses*. The representation of pointer values is not defined by the standard. In this area the ease with which most compilers permit integer/pointer conversions beyond what the standard defines is an attractive nuisance; pointers are not just "integers which can be dereferenced". Pointers are abstract references to memory locations (objects or storage for objects) which have associated lifetimes and can behave differently based on whether or not they've been initialized. There may or may not be a "real machine" underneath, as C can be compiled for an abstract virtual machine (like WASM) or even interpreted. Even when targeting real hardware (e.g. CHERI) there is no guarantee that you can freely convert between pointers and integers, or treat the representation of a pointer as a plain memory address. Pointer provenance may be something only the compiler is aware of, or it may have some representation at runtime (via tagged pointers).

Really the rules for using pointers in C without triggering undefined behavior are not that different from the rules for references in Rust. C just doesn't offer any help in tracking whether the requirements have been *met*, where Rust requires the compiler to take on most of that task.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 17:23 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (1 responses)

Let's say we disagree. Even the aliasing rules contradict the example above: both types are the same, so modifying data through one pointer means the data at the other *might* have been modified and must be reloaded before being used, unless the compiler knows the pointers are equal, in which case it can directly use the just-written data, which is the case here.

This type of crap is *exactly* what makes people abandon C. The compiler betrays the programmer based on rules which are stretched to their maximum extent, just to say that there was a tiny bit of possibility to interpret a standard for the sole purpose of "optimizing code" further, at the expense of reliability, trustability, reviewability and security.

I'm sorry but a compiler that validates an equality test of both types and values between two pointers and fails to see a change when one is overwritten is a moronic compiler, regardless of the language. You cannot simply trust anything produced by that crap at this point.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 18:01 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

> Even the aliasing rules are contrary to the example above: both types are the same

I never gave a type for `foo`. I don't think it really matters what type `foo` is here (beyond allowing `1` to be assigned to its pointee type). Dereferencing is verboten after passing it to `free` regardless of its bitwise representation.

> just to say that there was a tiny bit of possibility to interpret a standard for the sole purpose of "optimizing code" further, at the expense of reliability, trustability, reviewability and security.

I don't know that "optimizing code" is the end goal per se. I think it's actually about making things more consistent across C's target platforms.

If you write `int i = other_int << 32;` and expect 0, one target may be happy. Another might give you `other_int` (let's say it encodes the operation's right-hand value in 5 bits with the register index in the other 3 because why not). Mandating either requires one of them to avoid using its built-in shifting operation. What do you want C to do here? Just say "implementation-defined" and leave porters to trip over this later? You *still* need to maintain the version that has actual meaning for both targets if you care.

Now, if you want to say "I'm coding for x86 and screw anything else", that's great! Let's tell the *compiler* that so it can understand your intent. But putting your thoughts into code and then telling a compiler "I've got C here" when you actually mean "I've got x86-assuming C here" is just bad communication.

I'd like to see some way to express this in the C language itself. Not a pragma. Not a compiler flag. Not a preprocessor guard using `__x86_64__`, but some actual honest-to-K&R C syntax that communicates your intent to the compiler. FWIW, I don't know that maintaining such a preamble will be worth the work outside of the kernel's `arch/` directory, but hey, some people live there. I say that because you'll either have:

- divergent code anyways to tell the compiler about each assumption set; or
- have to write the C standard code that requires you to check how much you're shifting by before doing it anyways.

So, as said elsewhere in this thread, improving C is fine. But complaining that you're coding in C, breaking its rules, then complaining that the compiler isn't playing fair is just not reconciling beliefs with reality.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 18:04 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (1 responses)

As I understand it, that's actually not the same pointer!

The pointer I got from malloc() for my 400 byte allocation, and the pointer malloc itself relies on for managing the heap after I free() it are not actually the same pointer even though on a typical modern CPU they're the same magic number in a CPU register.

The pointer I had comes with provenance given to it by malloc(), it's a pointer to my 400 byte allocation. If I add 390 to it, that's a pointer to the last 10 bytes of the allocation, and we're all fine. But if I add -16 to it, that's not a pointer to anything.

The pointer used by the allocator uses different provenance, it's a pointer into the heap, and you can totally add -16 to it, because that's just a different pointer into the heap and as such fine.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 20:02 UTC (Tue) by ssokolow (guest, #94568) [Link]

Exactly. The abstract machine that the compiler's optimizers (and tools like LLVM sanitizers and miri) operate on tags each pointer with its parent allocation and, when you free(), that allocation is revoked, making dereferencing any pointers derived from it an invalid operation.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 10:20 UTC (Mon) by mpr22 (subscriber, #60784) [Link] (3 responses)

The programmer I am most concerned about trusting is not Present Me; ultimately, there is no way to avoid the need to trust Present Me when I am programming, since even if the language stops me committing memory safety errors, it generally can't stop me committing all the other kinds of error that can make my application accidentally give the User's security details to the minions of Mob, God, and Cop without the User's permission (though some languages and runtime libraries are better at inconveniencing me when I try to make those errors than others).

The programmers I am concerned about trusting are the different people called Future Me (will I be able to extend this code safely?), Past Me (did I write my old code safely in the first place?), and Not Me (did they write their code safely, and will they be able to extend my code safely?)

DeVault: Announcing the Hare programming language

Posted May 2, 2022 10:32 UTC (Mon) by roc (subscriber, #30627) [Link] (1 responses)

Those are all good points, but it's not about *completely avoiding* the need to trust programmers, that's a straw-man. It's about *minimizing* the need to trust programmers where we have proven techniques to reduce such trust that aren't unduly burdensome. And Rust and Swift and other languages with good type systems have lots of ways to do that beyond just memory safety!

DeVault: Announcing the Hare programming language

Posted May 2, 2022 11:20 UTC (Mon) by mpr22 (subscriber, #60784) [Link]

Oh, absolutely; I think I've phrased myself poorly.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 15:08 UTC (Mon) by farnz (subscriber, #17727) [Link]

And specifically what I want, as a programmer with similar views to you, is for Present Me to be able to write my code such that when Future Me or Not Me (lacking the context Present Me is immersed in right now) does something foolish with it, they will get protests from the language implementation (compiler or interpreter) that tell them what context they need to acquire in order to be able to do their job without the protesting.

This is just because programming is hard, and there's a lot of context behind every decision you make while programming. If you lack that context, you'll make mistakes, and one of the things I now value in programming languages is telling you that you're missing context.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 8:23 UTC (Mon) by mjg59 (subscriber, #23239) [Link] (1 responses)

> OK, I trust DJB, but him only.

You really shouldn't - https://www.qualys.com/2020/05/19/cve-2005-1513/remote-co...

DeVault: Announcing the Hare programming language

Posted May 2, 2022 8:54 UTC (Mon) by roc (subscriber, #30627) [Link]

Oh dear, thanks for clearing that up.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 8:45 UTC (Mon) by FSMaxB (subscriber, #106415) [Link] (37 responses)

What immediately disqualifies hare as a serious contender in the space of systems programming languages in my opinion is that it proclaims to never support proprietary operating systems in the upstream implementation. Yes, others can provide an implementation for these, but this will both create an unnecessary divide and also facilitate the accidental introduction of unportable features or behavior in the standard library, adding to the divide.

I also think that having manual memory management is a big mistake that will lead to some (although definitely not all) of the mistakes from the C world that so regularly provide us with exploitable vulnerabilities. There are things with which a programmer just shouldn't be trusted, given their track record of screwing it up.

I applaud the effort in significantly cutting down on the amount of undefined behavior and in the enormous ergonomics improvements over C though, not least the error handling improvements.

I guess I have to out myself as one of the "rust fanboys", but all that means is that I follow a set of principles that rust happens to cater to best at the moment (in the realm of systems programming). Some of those are:
* Don't trust the programmer, humans make mistakes, so let's help prevent them. (note that this will always need an escape hatch like "unsafe" because no system can be perfect in preventing all mistakes but also allowing all "good" behavior [without going into a definition of "good" here])
* Provide the programmer with powerful tools to constrain themselves even further from doing incorrect things (that would usually mean a quite advanced type system)
* Be ergonomic to use, not just the language, but the tooling around it as well (build system, dependency management, library documentation etc.)
* Work on the most commonly used platforms with the possibility of supporting even more.

Rust is definitely far from perfect some of my issues that I bump into regularly:
* Compile times are horrendously slow when compared to Go for example (haven't tried that out with hare yet, but probably similarly fast?). That definitely is a hindrance and reduces the speed in which you can iterate when developing. Slower iterations mean less productivity (although some of the other features make more than up for the loss in productivity IMO, but that doesn't mean faster wouldn't be even better).
* Rust is eating resources like crazy. Notably CPU time and disk space (some of those target directories on my machine have grown to >100GiB at times if not regularly cleaned)
* What's up with these abbreviations, like e.g. "fmt" and "std". Typing a standard library's main scope into a search engine shouldn't get you results about "sexually transmitted diseases", and what's up with "Vec"? C++ already admitted that "vector" was the wrong name for a dynamically sized array; rust not only copies that but then abbreviates it on top. Also "fn" and "str" and "recv" and "ctx" etc.

Btw. I really like hare's bootstrapping capabilities, but I still think that the importance of bootstrapping is a bit overstated. In this day and age, we usually use cross-compilation to get code running on new targets. That's even true for C. Ever bootstrapped C on a new platform by writing an assembler in machine code, then using assembly to write a more expressive language, using that language to write an even more expressive language, and so forth until you have implemented something for which a C compiler exists? That's just not the way things are done in this day and age, except maybe for fun and education purposes. If it's ok for C to target new platforms not by being bootstrapped but by being cross-compiled, why should the expectation for a systems programming language competing with it be different? And that comes from someone having bootstrapped OpenJDK 8 from GCJ on PowerPC 32 in a 2-week effort a few years back on Gentoo, so I know at least somewhat what I'm talking about. I REALLY wouldn't want to bootstrap rust from C on PowerPC 32; that won't be done in 2 weeks, of that I'm sure. Instead I would install a binary rust toolchain that has been cross-compiled and then go from there.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:01 UTC (Mon) by pabs (subscriber, #43278) [Link] (10 responses)

I think Bootstrappable Builds is one of the most important projects in FLOSS these days, alongside Reproducible Builds. You shouldn't have to trust binaries to be able to use a language. The Bootstrappable Builds folks are working on a bootstrap path from ~512 bytes of machine code plus tons of source code all the way up to a full distro.

https://bootstrappable.org/
https://reproducible-builds.org/

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:26 UTC (Mon) by FSMaxB (subscriber, #106415) [Link] (1 responses)

> I think Bootstrappable Builds is one of the most important projects that exists in FLOSS days, alongside Reproducible Builds. You shouldn't have to trust binaries to be able to use a language.

To me it is a question of priorities. I would rather have a language be easy to bootstrap than hard, and rust for example would definitely benefit from easier bootstrapping. But then again, it would benefit from a lot of things, and all of these require effort that cannot then be invested in other things.

Reproducible builds are much more important than easy bootstrapping in my opinion, because they let you infer trust in one step of the bootstrap chain from the step before without having to redo the entire chain from scratch. And with reproducible builds, it only takes one person going through the full bootstrap chain to notice if something fishy is going on in a binary release.

Also note that nobody can ever review all the code that they're using, so a reliance on trusting other people is always required. Running a bootstrap yourself can help you only marginally with that.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 2:56 UTC (Tue) by pabs (subscriber, #43278) [Link]

A quote from the Bootstrappable Builds IRC channel:

<vagrantc> reproducible builds doesn't mean much without bootstrappability ...
<vagrantc> and bootstrapping is a lot stronger with reproducible builds...

The CREV folks are working on socially scalable cryptographically verifiable code review:

https://github.com/crev-dev/

DeVault: Announcing the Hare programming language

Posted May 2, 2022 9:34 UTC (Mon) by FSMaxB (subscriber, #106415) [Link]

Also note that reproducible cross compilation also allows you to trust binaries for a new platform without having to go through the bootstrap chain on that new platform.

So an easy bootstrap on one platform (e.g. x86_64) that produces a trusted binary is only one reproducible cross compilation away from a trusted binary on a different platform.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 10:42 UTC (Mon) by taladar (subscriber, #68407) [Link] (3 responses)

What is important about it though? Bootstrapping via so much source code that you can't review it all doesn't give you anything useful over having to trust binaries.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 2:55 UTC (Tue) by pabs (subscriber, #43278) [Link] (2 responses)

Binaries are unreviewable by most people, so it is better to start from smaller binaries to reduce the chance of them containing backdoors. The CREV folks are working on socially scalable cryptographically verifiable code review to solve the review problem:

https://github.com/crev-dev/

DeVault: Announcing the Hare programming language

Posted May 4, 2022 16:20 UTC (Wed) by khim (subscriber, #9252) [Link] (1 responses)

> Binaries are unreviewable by most people, so it is better to start from smaller binaries to reduce the chance of them containing backdoors.

Great! Tell me how you plan to exclude huge binaries written in Verilog and VHDL (and then baked into the ASICs you actually use) and then you would have a point.

The truth is: from a practical POV bootstrappability is just a gimmick which buys you nothing, since free hardware doesn't exist (please don't bring up the FSF's joke certification of hardware which includes megabytes of binary code handling the storage, that is: precisely the part which you, somehow, need to trust to run anything at all).

Today so much proprietary code is executed before and after the first bytes of code you actually control that the whole exercise is mostly pointless.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 7:09 UTC (Thu) by pabs (subscriber, #43278) [Link]

I do not know how they plan to approach it, but the Bootstrappable Builds folks definitely have thought about those things, I have definitely seen discussions on their channel about bootstrapping hardware from scratch. They are nothing if not ambitious.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:14 UTC (Mon) by Wol (subscriber, #4433) [Link] (2 responses)

> The Bootstrappable Builds folks are working on a bootstrap path from ~512 bytes of machine code plus tons of source code all the way up to a full distro.

Write your C compiler in Lisp or Forth. Forth certainly, and Lisp I think, both have a core that can be expressed in precious little assembly, and then just build everything from source on top.

Cheers,
Wol

DeVault: Announcing the Hare programming language

Posted May 3, 2022 2:28 UTC (Tue) by pabs (subscriber, #43278) [Link] (1 responses)

I asked about this on their IRC channel and got this response from oriansj:

Well we did bootstrap a FORTH from hex: https://github.com/oriansj/stage0/blob/master/stage2/forth.s

and we did bootstrap a garbage collecting Lisp from hex: https://github.com/oriansj/stage0/blob/master/stage2/lisp.s

but if you notice: https://github.com/oriansj/stage0/blob/master/stage2/cc_x...

writing a C compiler in assembly that supports structs, unions, arrays, inline assembly and a bunch more was done in less than 24 hours by an inexperienced C programmer. Who then after started doing bootstrapping speed runs to demonstrate how trivial of a problem it is to implement that level of functionality in a C compiler.

In the decades for which Lisp and FORTH existed, why didn't they solve such a trivial problem?

Or better yet, now that you can see how it is done. Could anyone actually produce a C compiler with the same level of functionality in Lisp or FORTH in the same amount of time (or less?).

It is easy to talk a big game and words are cheap; we have the entire cc_* family of C compilers written in assembly for multiple architectures and even in cross-platform arrangements. If your language was any good at bootstrapping you'd be able to beat that. Then show your language written in fewer lines of assembly than cc_x86 to prove the point.

Please prove me wrong with working code.

Assembly and C have working code good enough to bootstrap GCC+Guile+Linux for more than a year now. https://github.com/fosslinux/live-bootstrap https://github.com/oriansj/stage0-posix

It is time for Lisp and FORTH to either deliver or learn to stop talking about something they never were good at in the first place and learn to admit Assembly and C won not because "worse is better" but because objectively they are better languages for bootstrapping new and better tools.

DeVault: Announcing the Hare programming language

Posted May 7, 2022 20:35 UTC (Sat) by anton (subscriber, #25547) [Link]

In the decades for which Lisp and FORTH existed, why didn't they solve such a trivial problem?
What makes you think they didn't? There is CC64, although you may be unhappy with the language features.

The other answer is: What itch would a Lisp or Forth (rather than assembler) programmer scratch by writing a C compiler?

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:25 UTC (Mon) by excors (subscriber, #95769) [Link] (24 responses)

> What immediately disqualifies hare as a serious contender in the space of systems programming languages in my opinion is that it proclaims to never support proprietary operating systems in the upstream implementation.

That seemed surprising, but it is indeed proclaimed by https://harelang.org/platforms/ :

> Hare (will) support a variety of platforms. Adding new (Unix-like) platforms and architectures is relatively straightforward.
>
> Hare does not, and will not, support any proprietary operating systems.

So it's not just a "this is an early release and we're focused on Unix-like platforms", it sounds like active hostility towards Windows and macOS. And that sounds like a huge barrier to adoption - why would I bother learning a language that wants to stop me writing applications that most users could run?

Even if a third-party compiler provided Windows support, I assume some Unixisms would creep into the standard library and make it hard to port applications. One significant portability issue for a systems language is filenames (which are arbitrary 8-bit sequences on Linux and arbitrary 16-bit sequences on Windows)... except it looks like Hare's path handling is already inadequate for Linux, because all the fs:: functions use 'str' which is specified as a UTF-8-encoded Unicode string (and violating that is undefined behaviour, according to strings::fromutf8_unsafe), so it can't handle all valid Linux filenames, and the inability to handle all Windows filenames is the least of its problems. (Rust has the OsString type to handle both platforms robustly.)

DeVault: Announcing the Hare programming language

Posted May 2, 2022 17:12 UTC (Mon) by wtarreau (subscriber, #51152) [Link]

> it sounds like active hostility towards Windows and macOS. And that sounds like a huge barrier to adoption - why would I bother learning a language that wants to stop me writing applications that most users could run?

I agree. In haproxy we initially used to laugh when users asked whether it worked on windows. Until one day someone sent us patches to support cygwin and said "OK the performance is terrible and the limitations abysmal, but it allows me to check my configs and to progressively familiarize my coworkers with the product without having to start with the big jump". It made sense. We merged the patches (which were not very invasive) and nowadays the cygwin build is part of the CI alongside the others and even helps spot portability issues. Is it really used for anything serious on windows? Probably not; at best they use it as an HTTP sniffer/debugger. Do some users value this availability for their own training and ease of testing? Certainly.

I'm glad we did nothing to make this impossible. That would only have brought some misplaced pride on our side and annoyed users with no valid reason.

Lesson learned: never say "we will never support X or Y", rather "we have no intent of doing it so far".

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:09 UTC (Mon) by excors (subscriber, #95769) [Link] (22 responses)

> it looks like Hare's path handling is already inadequate for Linux

To be a bit more concrete here: If I create a file like "\xffdummy.txt", and try to read the filename with glob::glob or os::diropen, or pass it as a command-line argument, then Hare dies with:

> Abort: Assertion failed: /usr/src/hare/stdlib/strings/cstrings.ha:33:1

because of an "assert(utf8::valid(s));"

If I create a file like "\xf8dummy.txt", then utf8::valid (wrongly) thinks it is valid, so I can read the filename into a str. strings::iter says the first rune (aka Unicode codepoint) is U+935B6D. strings::riter says:

> Abort: /usr/src/hare/stdlib/strings/iter.ha:68:22: Invalid UTF-8 string (this should not happen)

At least it's not a memory-safety error in this case, though it wouldn't be surprising if some other function did non-bounds-checked accesses under the assumption that strings are UTF-8 (as promised by the specification).

Even when it just aborts, that seems unfortunate for a systems programming language - maybe you clone a Git repository with some non-UTF-8 filenames, and your 'ls' and 'rm' were written in Hare so now you can't see or delete the files.

I think the fundamental problem here is that if you want to build a language with Unicode strings, and use it to interact with external systems, you need a good way to handle strings that are not quite Unicode. C/C++/Go/etc just don't bother guaranteeing Unicode. Python 3 got really into Unicode before discovering it didn't work for filenames or lots of other real-world data, and bodged it with surrogateescape (so now Python's Unicode strings aren't Unicode) and with many duplicated APIs between str/bytes/bytearray. Rust uses its type system to provide Path/OsString/etc which handle non-Unicode strings safely, with trait-based conversions that mean the easy cases are still easy to write (File::open("foo.txt") etc) and explicit fallible/lossy conversions when you really need a Unicode string.

Hare wants Unicode strings (which I think is a good goal), but the standard library needs to provide an interface to the not-quite-Unicode real world, and I'm not sure if the language has enough features to ever implement it well.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:24 UTC (Mon) by ddevault (subscriber, #99589) [Link] (10 responses)

This is a case that we were aware of when designing these interfaces. For a time, most filesystem-related interfaces accepted a tagged union of either (str | []u8). However, we ultimately decided that the additional complexity was not justified because the use-case is not justified: filenames *should* be UTF-8.

However, it is still possible to bend the language to your will if you know your use-case demands this to be otherwise. You can force non-UTF-8 data into a str type (knowing that you're breaking the language invariants and that the stdlib and third-party code relies on your broken assumptions) via strings::fromutf8_unsafe. You can then pass this into os::remove or something to get rid of your bad file. A tool like rm could be written specifically with this in mind, to clean up bad files. Additionally, in your git example, if ls and rm are implemented in Hare, then so too is probably your git implementation - or your kernel, which would enforce UTF-8 filenames when opening files.

I will note that this particular decision was a bit agonizing, and that we may revisit it before 1.0.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:21 UTC (Mon) by mpr22 (subscriber, #60784) [Link] (4 responses)

I 100% agree with you that, on filesystems that (a) store filenames as byte sequences and (b) do not embed metadata in the filename part of directory entries, all filenames should be well-formed UTF-8.

Sadly, they aren't, and despite entirely agreeing with you that all filenames on Linux systems should be well-formed UTF-8, I find the behaviour excors reports unacceptable in a programming language's standard runtime library.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:27 UTC (Mon) by ddevault (subscriber, #99589) [Link] (3 responses)

I understand where you're coming from in this respect. Again, this was a difficult choice, and we may revisit it, but it simplifies the situation considerably for the 99.999% of use-cases where non-UTF-8 filenames are not present. In the remainder, it's very unlikely that anything worse than the program aborting will occur (e.g. overwriting such files). I would encourage you to present your case for this at the standard library design acceptance review committee (or the filesystem committee, a likely spin-off), which will be formed prior to 1.0.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:39 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (2 responses)

Didn't Python 3 already solve this problem with surrogateescape?

I'm not saying it's an elegant solution, or even necessarily a good solution, but it's less painful for the 99% case (as compared to using a tagged union), and it works for the 1% case, usually without losing any data (unless you're trying to manipulate filenames as strings, in which case any solution will probably be terrible anyway because there's just no good way to do that).

DeVault: Announcing the Hare programming language

Posted May 2, 2022 19:42 UTC (Mon) by ddevault (subscriber, #99589) [Link]

I mean, it's a complex set of trade-offs. To consider the Python approach, we'd have to untangle a pretty large can of worms having to do with string handling. We can't just lift surrogateescape wholesale - Python and Hare have very different goals and we have to think carefully through every implication of such a change so that the language remains consistent and reliable in its design throughout. And yes, it's an inelegant solution - and we prefer the elegant ones.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 20:40 UTC (Mon) by excors (subscriber, #95769) [Link]

surrogateescape means that code like print(*os.listdir()) can throw a UnicodeEncodeError. The programmer has to manually keep track of which values of the 'str' type are really Unicode and will work with all the standard string functions, and which are nearly-Unicode but will occasionally throw if you try to print or encode them like normal strings. It might be the best hack that's possible in Python given its compatibility constraints, but I think it creates as many problems as it solves. A statically typed language should be able to do better - tracking this kind of information about the range of values is what type systems are for.

At least if you get it wrong in Python, you'll probably just get an exception. In a non-memory-safe language, some code might rely on the promise that strings are Unicode and violating that could cause undefined behaviour; you need to either strictly enforce that promise, or not make the promise at all.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 20:04 UTC (Mon) by excors (subscriber, #95769) [Link] (4 responses)

Deliberately violating str's UTF-8 invariant sounds scary, and an application developer should probably never pass such strings into any standard library function that expects a str (because they can't know if it'll e.g. try to iterate over codepoints and crash), so I expect they'd have to implement their own 'path' type which wraps a []u8, and write their own functions to concatenate and split and compare path-strings and convert to/from str, and use raw syscalls instead of the os module. And that would be needed by every application that wants to work reliably on real-world Unix systems (where filenames occasionally come from ancient backups and from FAT32 USB sticks and from zip files and from malicious users etc, which won't respect anyone's desire for a perfect Unicode world). That sounds like exactly the sort of widely-used low-level functionality that should be the responsibility of the standard library. And it's the language's responsibility to provide features so the library can implement an API that's both correct and convenient.

Otherwise nearly everyone will write applications with the standard library, and it'll be fine for 99.999% of users, then a few years later they'll get a bug report saying it crashes for one user with a mysterious error message and they'll spend hours debugging it and then spend days replacing the standard library with a new library that actually works, and repeat for every application that has a large number of users. That's a lot of effort that would have been saved by doing it correctly from the start.

(But even if application developers do try to avoid the standard library's path handling, os/+linux/environ.ha's init_environ runs before main and asserts when a non-UTF-8 string is passed on the command line.)

DeVault: Announcing the Hare programming language

Posted May 2, 2022 20:11 UTC (Mon) by ddevault (subscriber, #99589) [Link] (2 responses)

>Deliberately violating str's UTF-8 invariant sounds scary

Well, Hare is standardized, and open source, and runs in a standardized environment (x86, though as someone who has read the Intel and AMD CPU manuals, I can attest that it's not very fun). If you need to break the invariant, it's a serious move to consider, must be very well justified, and should raise eyebrows during code review - but you can objectively evaluate the consequences of that decision by examining where your tainted string will end up and planning for its behavior. We even make it easy for you to vendor standard library modules so you can ensure their behavior is consistent with an earlier evaluation. This is an example of "trust the programmer" - it's pretty ill-advised to do this, but if you really need to, you can. Breaking the str invariant is probably a case where you should really reconsider, though. There are less severe examples - forcing a bad value into a global (e.g. null into a non-nullable pointer) and fixing it up during @init is one I've encountered from time to time.

> (But even if application developers do try to avoid the standard library's path handling, os/+linux/environ.ha's init_environ runs before main and asserts when a non-UTF-8 string is passed on the command line.)

Good catch. You can still technically get around this (vendor os and patch it, don't import os and use rt to make the syscalls directly, etc), but I admit that it's going to be very contrived to get around this problem.

Like I said, we were well aware of all of these issues and this is why it was a very difficult decision to go UTF-8-only for paths.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 20:20 UTC (Mon) by mathstuf (subscriber, #69389) [Link] (1 responses)

> If you need to break the invariant, it's a serious move to consider, must be very well justified, and should raise eyebrows during code review

*My* concern is less about the code review that adds the `_unsafe` call. I worry more about the code review that later edits the function with the `_unsafe` outside of the default context view doing something "convenient" like printing it. Maybe the variable would handily be named `path_for_os_calls_only`, but my experience is that no one is that nice to their time-separated co-developers.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 20:23 UTC (Mon) by ddevault (subscriber, #99589) [Link]

At the very least, I would expect any use of _unsafe to include a comment explaining why it was done in spite of the risks. Would not look forward to being that future colleague regardless.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 14:30 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

> And that would be needed by every application that wants to work reliably on real-world Unix systems (where filenames occasionally come from ancient backups and from FAT32 USB sticks and from zip files and from malicious users etc, which won't respect anyone's desire for a perfect Unicode world)

Well, I can say for certain that there are many places where it's not just ancient backups nor FAT32 USB sticks, but just regular file names used every day. As soon as you have shared file servers for lots of employees, there's never a single moment where you can declare that the encoding will change because you'll break a lot of shortcuts and file names for plenty of employees. Thus you keep in place the perfectly working system you used to have, and do that for decades if needed because in the end the one without encoding is still the one that works best (most users access only their own files with their machine's encoding, and shared files rarely use fancy chars). I personally never put non-ASCII chars in my file names so I'm fine but I've seen quite a bunch of filesystems with mixes of CP1252 from Windows users via a Samba share and ISO8859-1 from UNIX/Linux users via an NFS share.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:43 UTC (Mon) by bartoc (guest, #124262) [Link] (9 responses)

C++ has char8_t, which is utf-8 or UB and also pointers to it don't alias char*s (or anything else). This means constructing them from a char* that you already know is valid utf-8 is… a memcpy. They are a complete and utter mess. C has such a type too, and it's just as big a mess there too.

Python's PEP 383 is actually a pretty cool scheme, but is a bit "nuts".

DeVault: Announcing the Hare programming language

Posted May 3, 2022 10:41 UTC (Tue) by peter-b (guest, #66996) [Link] (2 responses)

> C++ has char8_t, which is utf-8 or UB and also pointers to it don't alias char*s (or anything else).

To my enormous displeasure, unfortunately it isn't UB for a char8_t* to point to a string buffer that doesn't contain UTF-8.

> They are a complete and utter mess.

I can absolutely confirm this.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 11:07 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (1 responses)

> it isn't UB for a char8_t* to point to a string buffer that doesn't contain UTF-8.

One reason that comes to my mind is that it would need to eschew `u8ptr++` because if it currently points at a code point encoded with multiple bytes, incrementing one byte would make it UB, no? Or would `++` inspect the byte encoded length of the current code point and jump an appropriate amount? That certainly seems novel too.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 0:50 UTC (Thu) by tialaramex (subscriber, #21167) [Link]

Because both of the increment operators in C++ can be overloaded for a type, this isn't difficult to do, if C++ wanted to do it.

But I don't expect C++ to actually do that lifting for the same reason it still has both these silly operators in the first place. Backward compatibility trumps all other considerations.

Rust's str actually provides both iterations, if you want the underlying *bytes* you can have those, and of course individual bytes are just UTF-8 code units and on their own don't necessarily mean anything specifically, but if you want "characters" (Rust's char) you can iterate over those and under the hood it is indeed moving forward the appropriate number of bytes each time.
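The two views described here are easy to picture with a Python analogue (Python rather than Rust, purely for illustration): iterating a str walks code points, while its UTF-8 encoding exposes the underlying bytes.

```python
s = "héllo"

# Code-point view: five "characters"
assert len(list(s)) == 5

# Byte view: the UTF-8 encoding of 'é' takes two code units
b = s.encode("utf-8")
assert len(b) == 6
assert b == b"h\xc3\xa9llo"
```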

DeVault: Announcing the Hare programming language

Posted May 3, 2022 15:02 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (5 responses)

> C++ has char8_t, which is utf-8 or UB and also pointers to it don't alias char*s (or anything else). This means constructing them from a char* that you already know is valid utf-8 is….. a memcpy. They are a complete and utter mess. C has such a type too, and its just a big of a mess there too.

char* can alias anything, so I would be very surprised if they made a special exception for char8_t... are you absolutely sure that it really is UB to alias them?

DeVault: Announcing the Hare programming language

Posted May 3, 2022 19:39 UTC (Tue) by foom (subscriber, #14868) [Link] (4 responses)

You are allowed to access any object as bytes via a `char*`, but the opposite -- accessing a char-array as some other type -- is (surprisingly!!), not obviously allowed. See e.g. this discussion, https://stackoverflow.com/questions/12612488/aliasing-t-w...

DeVault: Announcing the Hare programming language

Posted May 4, 2022 1:37 UTC (Wed) by NYKevin (subscriber, #129325) [Link] (3 responses)

Of course that's not allowed. Then there would effectively be no aliasing rule at all, because "X can be aliased as Y" is generally understood to be a transitive relationship.[1] If you can alias anything to char* and then alias char* back to anything else, then you can alias anything to anything else and all bets are off.

Anyway, any halfway-decent compiler will identify the memcpy as redundant and optimize it out, so you should just bite the bullet and call memcpy.

[1]: It has to be transitive. The alternative would be "I have a Y*, but I don't know whether it can be aliased as a Z*, because someone else gave it to me, and I don't know whether it was originally a Y* or originally an X* that they aliased to a Y*."

DeVault: Announcing the Hare programming language

Posted May 4, 2022 9:09 UTC (Wed) by excors (subscriber, #95769) [Link] (2 responses)

I don't think it's meaningful to talk about transitivity of aliasing, because aliasing is not specified as a relationship between two things of the same kind. It's a relationship between the type of an object, and the type of the pointer being used to access it. (In particular it's not a relationship between two pointers, so it can't be transitively extended to a third pointer.)

If I understand correctly, C++20 says the type of an object is determined by its definition, or by (placement) new, or by assignment to a union, etc, or objects can be "implicitly created" by malloc()/memcpy()/etc and the compiler will act as if it magically determined the type of those objects in order to avoid undefined behaviour when they are subsequently used.

You can access an object of any type T through a pointer to char. You can't access an object of type char through a pointer to T, unless T is char (or similar). That's the easy part.

I think the tricky part is: When does an object have type char? C++20 says "An operation that begins the lifetime of an array of char, unsigned char, or std::byte implicitly creates objects within the region of storage occupied by the array". I think that means "char buf[256];" and "char *buf = new char[256];" are implicitly creating objects of an as-yet-unknown type and size. Regardless of the type, you can access it through char*. If you subsequently access it through char8_t*, that means the type of the implicitly created objects must have been char8_t (to avoid undefined behaviour here), so accessing through char8_t* is also fine.

That means there should be no need to memcpy into an array that was explicitly declared as char8_t, you can just cast the char pointer, because it's really a pointer to char8_t objects even though you declared it and allocated it and used it as char.

(But as is often the case with C++, I'm only about 60% confident in that interpretation.)

DeVault: Announcing the Hare programming language

Posted May 5, 2022 0:32 UTC (Thu) by foom (subscriber, #14868) [Link] (1 responses)

"Technically", I think it might only be defined in C++20 and later, via the change in <https://wg21.link/p0593> -- though, to the best of my knowledge, no compiler made any changes to become compliant with these changes.

DeVault: Announcing the Hare programming language

Posted May 6, 2022 0:38 UTC (Fri) by khim (subscriber, #9252) [Link]

Without that change it's not really possible to use mmap (or the Windows equivalent), thus it was just a matter of fixing the standard. None of the compilers ever broke any code which does these things.

The only thing which may trigger UB there is unaligned access (and yes, it may even happen with x86 since compiler can do autovectorization and use SSE instructions).

DeVault: Announcing the Hare programming language

Posted May 2, 2022 23:54 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

utf8::valid() is not a great function as it stands. If the invariants actually hold, it's just a long-winded way to say "true". When in fact they don't hold, it's hard to say with confidence anything about the state of the system without an intimate knowledge of how all the parts fit together, a view that Hare's users would not have.

Attempts to inhabit states where you lack confidence in your own invariants are a bad idea. Hare should either say (like for example Go) that str is just a bunch of bytes and might not be UTF-8, and thus utf8::valid is a useful function, or, it should admit that yeah, there are a lot of APIs where it's just bytes and we need to manage that at the interface to have str work.
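In a language that takes the Go-style option, where strings are just bytes, a validity check earns its keep. A rough Python sketch of such a function (the name utf8_valid mirrors Hare's, but this implementation is illustrative only), operating on raw bytes rather than on an already-trusted string type:

```python
def utf8_valid(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

assert utf8_valid("héllo".encode("utf-8"))
assert not utf8_valid(b"\xff\xfe")   # not a legal UTF-8 sequence
assert not utf8_valid(b"\xc3")       # truncated two-byte sequence
```

On an already-guaranteed-UTF-8 type, by contrast, this function can only ever return True, which is the point being made above.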

Rust frequently needs some of the fairly big guns of parametric polymorphism to do what seems (as the user) like basic stuff. To make file open on a string work, for example, Rust needs AsRef<Path>. That's two types you probably never think about, and a bunch of implementation boilerplate. No actual machine code is emitted; we're only satisfying the type system that, in fact, we know what we're doing. For this reason, I can see Hare won't want to go that route even though I personally prefer it.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 20:25 UTC (Tue) by ssokolow (guest, #94568) [Link]

> What's up with these abbreviations, like e.g. "fmt" and "std". Typing a standard library's main scope into a search engine shouldn't get you results about "sexually transmitted diseases" and what's up with "Vec"? C++ already admitted that "vector" was the wrong name for a dynamically sized array, rust not only copies that but then abbreviates it on top. Also "fn" and "str" and "recv" and "ctx" etc.

The abbreviations are part of its OCaml ancestry. That's also where the syntax for named lifetimes comes from (it's OCaml's syntax for generic type parameters). ...and, more generally, abbreviations like that are common in functional programming languages. See, for example, LISP's famous "defun".

DeVault: Announcing the Hare programming language

Posted May 2, 2022 10:20 UTC (Mon) by roc (subscriber, #30627) [Link] (20 responses)

One thing that makes Hare stand out from other new languages: excluding any kind of generics from the language, and also not providing a built-in generic hashtable type --- instead telling developers to reimplement hashtables wherever they need one: https://harelang.org/blog/2021-03-26-high-level-data-stru...
Though at least they don't expect you to reimplement sorting as well:
https://docs.harelang.org/sort
though the API is very old-school C.

Personally I kinda doubt that this extreme focus on the simplicity of the language (at the expense of security, and making basic stuff like hashtables more work to use, and taking a performance hit for not using a modern compiler backend) is going to appeal to many people who just want to *use* a programming language. We'll see I guess.

The illusion of apparent simplicity

Posted May 2, 2022 10:50 UTC (Mon) by atnot (guest, #124910) [Link] (18 responses)

I feel like there's an unfortunate thing that C-derived languages and projects often do where they confuse the simplicity of something with the simplicity of its implementation. As the waterbed theory points out, a lot of the time, making something simple creates a lot of complexity elsewhere. Simplicity, to the programmer, is about being able to build good mental models, but the real world is rarely simple to begin with and making things smooth often takes significant effort to fill in the cracks.

Confusing the simplicity of the program with the simplicity of its implementation lets you instead frame your refusal to address complex problems as a virtue and push all of the burden onto your users instead.

The illusion of apparent simplicity

Posted May 2, 2022 11:41 UTC (Mon) by ddevault (subscriber, #99589) [Link] (16 responses)

Simplicity of the implementation is part of your total complexity cost, plain and simple. You cannot offload complexity onto a vendor. All of the code in your system is your responsibility, and less code == fewer bugs.

The illusion of apparent simplicity

Posted May 2, 2022 13:34 UTC (Mon) by ncm (guest, #165) [Link] (11 responses)

Less code in the language system means more code elsewhere. Complexity is irreducible, but addressing it in a place subject to concentrated attention mitigates risk.

Hare, like Zig and C, offers nothing to help concentrate attention on solutions that programs may rely on. C++ and Rust both do, by supporting encapsulating complexity in powerful libraries that amortize focused attention -- debugging, optimization, comprehensive testing -- across all uses. C++ offers more native language power for the programmer to express intentions clearly and concisely, while Rust builds in compiler support for protecting against errors common in languages like C and Hare. Both further provide well-tested standard libraries that already encapsulate a huge amount of the unavoidable complexity programs must have to deliver essential service.

The illusion of apparent simplicity

Posted May 2, 2022 13:39 UTC (Mon) by ddevault (subscriber, #99589) [Link] (10 responses)

> Less code in the language system means more code elsewhere. Complexity is irreducible, but addressing it in a place subject to concentrated attention mitigates risk.

This is grossly false. Programming is an art form in exceptionally wasteful complexity. It is *not* irreducible, rather we are living in an era of gross negligence on the part of engineers who don't bother to reduce it. Yes, a C++ or Rust x86_ArrayVectorMap<Optional<int>> will outperform []int. But it will have 10x the surface area for bugs - including security bugs - and in 99 cases out of 100 the extra performance and supposed ease of use (leaving aside ease of debugging, which suffers immensely) *just isn't needed*.

The illusion of apparent simplicity

Posted May 2, 2022 14:00 UTC (Mon) by ncm (guest, #165) [Link] (7 responses)

People who fail to understand failures past are doomed to repeat them. Hare repeats them, and sets up users to repeat them, indefinitely.

The illusion of apparent simplicity

Posted May 2, 2022 14:02 UTC (Mon) by ddevault (subscriber, #99589) [Link] (6 responses)

>People who fail to understand failures past are doomed to repeat them.

Precisely my thoughts on Rust.

The illusion of apparent simplicity

Posted May 2, 2022 15:07 UTC (Mon) by ncm (guest, #165) [Link] (5 responses)

Hare addresses none of the problems solved by Rust. It addresses none of the problems solved by C++. It is not evidently meant for Go, Java, or Swift coders. C coders will not want it, because it is not C.

So, its natural competition is Zig. Which of what Zig attempts does Hare not? Does Hare bring anything to the table that Zig does not?

Your indictment of x86_ArrayVectorMap<Optional<int>> might bite if in fact you could identify so much as a single bug, never mind security hole, caused by reliance on it in preference to a less featureful alternative.

There is nothing wrong with putting forward a new thing to address old problems. Go and Java addressed old problems by presenting a weaker language explicitly meant for weaker programmers. C++ and Rust address them by presenting a powerful language meant for serious professionals. There is, manifestly, room for all of them.

In a language promoted in a top-level LWN article, I just want to see some indication that its author has enough understanding of old problems and current solutions not to un-solve what is already solved elsewhere, and maybe try to solve others not solved elsewhere. Thus far I am not seeing that.

The illusion of apparent simplicity

Posted May 2, 2022 15:11 UTC (Mon) by ddevault (subscriber, #99589) [Link] (4 responses)

Your sensibilities do not define LWN's scope, at least not so far as I'm aware. A "development quote", which is a footnote in the weekly editions, is also far from a "top-level LWN article".

If Hare does not appeal to you, then do not use it. You have a different set of values than Hare, so I cannot pose its benefits in a way that you will understand, since you view many of them as demerits. Hare does not aim to make any other language obsolete, keep using whatever you like and Hare will be enjoyed by those to whom it appeals.

The illusion of apparent simplicity

Posted May 4, 2022 10:18 UTC (Wed) by pbonzini (subscriber, #60935) [Link] (3 responses)

You haven't answered his question, though.

Who is Hare designed to appeal to, and what does Hare bring to the table for them that Zig does not?

The illusion of apparent simplicity

Posted May 4, 2022 10:22 UTC (Wed) by ddevault (subscriber, #99589) [Link] (2 responses)

Hare is much simpler than Zig, and can do similar tasks. It has 1/10th the lines of code, and both the language and standard library have a more narrowly defined, fixed scope which will not grow in complexity with time. Hare is a more conservative project than Zig and aims to provide a more robust and reliable basis for software built on it for the long-term. A Hare program written on the day of the 1.0 release will still compile and run correctly in 50 years. A Hare *compiler* written on the day of the 1.0 release will still compile new code written 50 years from now.

There are many differences between them, but the core philosophical differences boil down to this.

The illusion of apparent simplicity

Posted May 4, 2022 11:36 UTC (Wed) by pbonzini (subscriber, #60935) [Link] (1 responses)

> Hare is much simpler than Zig, and can do similar tasks

No, it cannot. For example it cannot do async/await.

> A Hare *compiler* written on the day of the 1.0 release will still compile new code written 50 years from now.

Thanks for proving that you aren't learning from past mistakes, I guess.

The illusion of apparent simplicity

Posted May 6, 2022 7:23 UTC (Fri) by ddevault (subscriber, #99589) [Link]

I was not referring to a similar set of language features, but a similar set of supported use-cases.

The illusion of apparent simplicity

Posted May 2, 2022 16:36 UTC (Mon) by kleptog (subscriber, #1183) [Link] (1 responses)

I think the point was slightly different: when solving a problem, that problem comes with a certain amount of complexity, and that complexity is irreducible if you still want to solve the problem.

If you're writing an HTML parser, a YAML parser, a TLS library, an X.509 parser, the program will contain a certain amount of complexity which cannot be removed while still actually solving the problem. So the question is really: does a programming language allow you to express this complexity without requiring lots of additional overhead complexity.

Writing a YAML parser in Assembler is clearly insane, for example. Processing text files with Awk works really well because that's what it was designed for.

In this day and age, *no-one* should be out there writing their own X.509 parser or TLS library unless you really have a brand new use case that is core to your program. Just import a library and get on with your life. Now, a leftpad library is obviously overboard, but importing libraries to handle complexity that you don't want to deal with yourself is a good thing and shouldn't be discouraged.

The illusion of apparent simplicity

Posted May 2, 2022 16:48 UTC (Mon) by ddevault (subscriber, #99589) [Link]

> If you're writing an HTML parser, a YAML parser, a TLS library, an X.509 parser, the program will contain a certain amount of complexity which cannot be removed while still actually solving the problem. So the question is really: does a programming language allow you to express this complexity without requiring lots of additional overhead complexity.

I actually agree with this take. And we do intend to implement all of these things, actually, except perhaps YAML, which is a den of snakes (so is X.509, but one we unfortunately cannot avoid). However, I don't think these use-cases call for much re-usable logic beyond what Hare already offers, unless someone seeks to implement a parse-them-all abstraction, which I would consider highly misguided. The level of complexity of such implementations in Hare will, I think, map relatively closely onto the minimum required level of complexity. We aim to provide exactly the right number of primitives to support robust implementations, and no more.

Hare does provide several features to help with this sort of thing, by the way. For example, quoting the author of Monocypher on Hare's use of slices for cryptography:

> I like that (apparently) slices are used for the API. Having written a cryptographic library in C, I saw how we are reading from and writing to buffers all the time, and having to specify their length explicitly means my functions have many more arguments than I would have liked.

Other primitives in the stdlib also encourage good/robust design, such as bufio, and things like mandatory error handling ensure you never accidentally forget to verify that some authenc data was actually authenticated.

The illusion of apparent simplicity

Posted May 2, 2022 13:47 UTC (Mon) by ncm (guest, #165) [Link] (2 responses)

False.

Code that is relied on by more programs gets more attention to its reliability, safety, and performance, as the wider use allows amortizing that attention across all uses. This applies to commonly used libraries, moreso to language-standard libraries, and even more to compilers themselves.

A language like Hare, Zig, or C that is inadequate to express powerful libraries necessarily dissipates attention across all the re-implementations of semantics that could have been coded once, in one place, and got right once. Modern languages deliver their value by enabling that expression. A new language that fails to deliver what we have already learned to do in this direction is, at best, an attractive nuisance.

C has a ready excuse: its roots are in the 1960s. We should have higher expectations for a language coming more than five decades later. Hare utterly fails to deliver on any such expectations. It has no legitimate claim on our attention.

The illusion of apparent simplicity

Posted May 2, 2022 13:48 UTC (Mon) by ddevault (subscriber, #99589) [Link]

Then don't give it your attention. I'm not convinced that we would substantially benefit from it in any case.

Let's calm this down a bit please

Posted May 2, 2022 13:50 UTC (Mon) by corbet (editor, #1) [Link]

Discussion of Hare language features, strengths, and weaknesses is clearly appropriate here. But please let's try not to get into language-advocacy flamewars, that really doesn't help.

Thank you.

The illusion of apparent simplicity

Posted May 3, 2022 0:48 UTC (Tue) by roc (subscriber, #30627) [Link]

Code that is used and tested by thousands of people is much less of a problem than code that is used and tested by only a few people. Just adding up all the code complexity over the entire system and trying to minimize that objective function is not the right way to go. If it was, you wouldn't use Linux.

> less code == fewer bugs.

Requiring everyone to reimplement hash tables at every use will not be "less code" or "fewer bugs".

The illusion of apparent simplicity

Posted May 3, 2022 20:22 UTC (Tue) by ssokolow (guest, #94568) [Link]

I'm reminded of this Bryan Cantrill quote:

When developing for embedded systems — and especially for the flotilla of microcontrollers that surround a host CPU on the kinds of servers we’re building at Oxide — memory use is critical. Historically, C has been the best fit for these applications just because it is so lean: by providing essentially nothing other than the portable assembler that is the language itself, it avoids the implicit assumptions (and girth) of a complicated runtime. But the nothing that C provides reflects history more than minimalism; it is not an elegant nothing, but rather an ill-considered nothing that leaves those who build embedded systems building effectively everything themselves — and in a language that does little to help them write correct software.

-- http://dtrace.org/blogs/bmc/2020/10/11/rust-after-the-honeymoon/

DeVault: Announcing the Hare programming language

Posted May 3, 2022 13:24 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

Sort is interesting because it says it is (or at least, that it might be, and since this isn't 1.0 let us assume it is) an allocating sort, but it doesn't say whether it's stable. I presume it's a stable sort.

I feel like the kind of people who might actually care about the micro-managing needed to be effective in a language like Hare probably want at least the option of an in-place (ie not allocating) sort, even if it's unstable. After all, if you're actually sorting, say, integers, you certainly don't care about stability but you might care about the allocation cost.
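The stability question matters whenever equal keys carry distinct payloads. Python's sorted() happens to document both properties under discussion, which makes for a quick illustration:

```python
records = [("b", 2), ("a", 1), ("b", 1)]

# sorted() is stable: entries with equal keys keep their original order,
# so the two "b" records stay in insertion order after sorting
out = sorted(records, key=lambda r: r[0])
assert out == [("a", 1), ("b", 2), ("b", 1)]

# list.sort() reorders in place without allocating a new list object,
# the kind of option an allocation-conscious caller might want
records.sort(key=lambda r: r[0])
assert records == out
```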

Hashtables are hard, and there is a lot of opportunity to either malfunction or mistakenly destroy your performance, which is a reason that Rust - which (contrary to what is sometimes suggested) actually mercilessly weeded out less useful or necessary data structures for the standard library in 1.0 - still has HashMap and HashSet long after special use structures like LruCache were put out to pasture. You're going to implement HashMap<Key,Value> and once you've done that why not HashSet = HashMap<Key,()> and do the extra lifting for the set-like methods?

The "modcache" built in the Hare example is an old-school bucketed hash, but with a fixed number of dynamically growing buckets. I assume that in practice Hare would work fine with just a simple vector here anyway, but if not "modcache" is leaving a lot of performance on the floor. In the process of doing that, it also seems to have a nasty bug. If we have two modules with identical hashes (not difficult to arrange for the hashes provided with Hare, there is no SipHash) then it seems to me that ...

if (bucket[i].hash == hash) {return bucket[i].task; }

... never checks this is actually the same module, only that it has the same hash.
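The collision bug is easy to reproduce in miniature. Below is a toy Python model of a bucketed cache (all names here, weak_hash, lookup_buggy, lookup_fixed, are hypothetical and not Hare's actual code), showing why the lookup must compare the key as well as its hash:

```python
def weak_hash(name: str) -> int:
    # Deliberately collision-prone, standing in for a non-keyed hash
    return sum(map(ord, name)) % 64

buckets = [[] for _ in range(64)]

def insert(name: str, task: str) -> None:
    buckets[weak_hash(name)].append((weak_hash(name), name, task))

def lookup_buggy(name: str):
    h = weak_hash(name)
    for entry_hash, _key, task in buckets[h]:
        if entry_hash == h:  # only the hash is compared -- the bug
            return task
    return None

def lookup_fixed(name: str):
    h = weak_hash(name)
    for entry_hash, key, task in buckets[h]:
        if entry_hash == h and key == name:  # also compare the key itself
            return task
    return None

insert("ab", "task-ab")
insert("ba", "task-ba")  # "ab" and "ba" hash identically here

assert lookup_buggy("ba") == "task-ab"  # wrong module's task returned
assert lookup_fixed("ba") == "task-ba"
```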

DeVault: Announcing the Hare programming language

Posted May 2, 2022 12:01 UTC (Mon) by stephanlachnit (subscriber, #151361) [Link] (16 responses)

Too bad that it's yet another programming language that statically links all its dependencies instead of using shared libraries. I doubt that these languages will ever make it to "operating system languages", because OS maintainers haven't wanted static linking since, like, forever.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:22 UTC (Mon) by camhusmj38 (subscriber, #99234) [Link]

I both agree and disagree - dynamic linking is great for lots of purposes, but it's also fragile unless you're really careful about compatibility. Version 1.0.1 and 1.0.2 of a shared library should be compatible, but your mileage can vary.

You also lose the advantages of whole program analysis and potentially some performance gains.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 17:25 UTC (Mon) by wtarreau (subscriber, #51152) [Link] (13 responses)

Static linking is sufficient when you don't care about the security of your dependencies, which, sadly, tends to become a tradition nowadays, for the profit of attackers :-(

Sure it's convenient to only scp an executable and see it working. But when it ships with a massively outdated version of openssl, libpng, zlib and what not, this is a serious concern.

As much as I hate the breakage caused by poorly managed shared libraries, and for having hated them for years, I now do prefer to use them in programs that I distribute so that users are autonomous when it comes to applying important updates.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 0:40 UTC (Wed) by ilammy (subscriber, #145312) [Link] (12 responses)

Arguably, only OpenSSL is a “serious” concern for static linking here. Things like libpng and zlib are security liabilities only due to C legacy. Most of the application dependencies can be statically linked, provided they are written in memory-safe languages.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 9:05 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (8 responses)

> Most of the application dependencies can be statically linked, provided they are written in memory-safe languages.

It's impressive to see how many people actually believe that memory-safe==secure and non-memory-safe==insecure, in particular when you consider that optimizing compilers can *remove* some of your security checks that are "proven" to be useless. It's fortunate that the world of secure microcontrollers is not run by people thinking like this or we'd be doomed with secure devices giving off their secrets at the first voltage hiccup!

There must have been some strongly directed brainwashing somewhere, because quite frankly, a huge part of security issues that are met every day are not just memory-safety issues, and spreading the belief that you can ignore library updates for components that are written using "memory-safe" languages is dangerous.

For example, I remember discussing with people who told me they were storing credit card numbers as "numbers" in their language, without knowing what the language's limits were. Just do that in JS and you only have 53 bits of mantissa. It turns out that 2^53 is too small to hold all possible card numbers: it will only accurately represent integers up to 9007 1992 5474 0991, and above this, odd and even numbers will be represented as the same even one. So in such applications, if your credit card number is below that value it will be accurately represented; but above it, if it's odd, the next number will be used, and if it's even, you may be charged for the previous number's orders, or be accused in a court of having been present in a shop or having left a parking garage because that number was seen there. Such bugs can have huge impacts (and are hard to fix later) and never involve anything related to memory safety.
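The 53-bit limit is easy to demonstrate; a minimal sketch in Python, whose float is the same IEEE-754 double as a JS number (the 16-digit "card number" below is invented for illustration):

```python
# IEEE-754 doubles -- the representation behind a JS "number" and a
# Python float -- carry a 53-bit significand, so integers above 2**53
# silently lose precision.
MAX_SAFE = 2**53 - 1                       # 9007199254740991

assert float(MAX_SAFE) == MAX_SAFE         # still exact
assert float(2**53 + 1) == float(2**53)    # the odd neighbour collapses onto the even one

# A 16-digit "card number" above the limit silently becomes a different number:
card = 9111222233334445
assert float(card) != card
```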

This is just a simple example showing how a programmer's deliberate ignorance about computing can have important security impacts that the compiler may not always solve. Most of the time the trouble is much more limited, such as being trivially sensitive to DoS attacks by not having any idea how much of a resource a given operation will require (such as the fun sites that let you test some regex, and that are often found down after some jerk sent complex ones that can take hours or days to evaluate).

DeVault: Announcing the Hare programming language

Posted May 4, 2022 10:23 UTC (Wed) by ilammy (subscriber, #145312) [Link] (5 responses)

I never said one should ignore updates in the dependencies, or that there can never be security vulnerabilities in libraries written in memory-safe languages. Please stop dealing in absolutes.

Of course, vulnerabilities are not limited to RCE caused by undefined behavior due to memory safety issues. Just about any library parsing anything might have an implementation bug that causes a DoS, which is a vulnerability too. Just about any “big” library might have some obscure feature that indirectly calls into a shell. But that’s hardly on the same scale of impact as, say, Heartbleed – not of the “drop everything and patch your software yesterday” scale. Nor does it occur at anywhere near the rate of memory-safety bugs in non-memory-safe libraries.

Look at zlib, for example. Here are some stats for zlib-related CVEs published since 2019:

in zlib:
  3 missing bounds check (CVE-2018-25032, CVE-2021-26025, CVE-2018-20819)
  1 double free (CVE-2019-12874)
  1 unbounded memory allocation (CVE-2020-11612)

in zlib usage:
  2 DLL hijack (CVE-2021-26807, CVE-2020-11081)

(Speaking of which, shared libraries themselves open up possibilities for attacks, like DLL hijacking.)

Using shared libraries does not magically free you from keeping track of your dependencies that might wreak havoc, or from maintaining your systems. The purported argument for shared libraries is that “distro maintainers” (= people who are not us) are going to take care of the updates, so that we don’t have to rebuild and update our applications. But who’s going to install these updates to shared libraries? Who is going to restart affected services? Who is going to figure out which systems need an update? When all these details are considered, the static/shared choice is so minuscule that it’s hardly important.

I believe the main benefit of shared libraries currently is that you don’t have to deal with whatever obscure build system is used by the dependency. Because chances are, if it’s distributed as a shared library – not in source-only form, already integrated into your application – then the library does need a “build system” in the first place.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 12:16 UTC (Wed) by LtWorf (subscriber, #124958) [Link] (2 responses)

> who’s going to install these updates to shared libraries? Who is going to restart affected services? Who is going to figure out which systems need an update?

There is an amazing tool called "apt-get" that does all of those things.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 12:41 UTC (Wed) by ilammy (subscriber, #145312) [Link] (1 responses)

That’s the point. Replacing binaries is only a part of the problem.

APT is not going to configure up-to-date repositories that I trust on my system all by itself; someone has to administer that configuration.

APT is not going to package my software all by itself, someone has to write all that Debian packaging, telling dpkg that ”my-app.service” is from “my-app” package which depends on “libssl” and needs to be restarted when its dependencies are updated.

APT is not going to decide for me which instances need an update, which instances are safe to update, and in what sequence. Someone has to define that, then orchestrate the actual update, juggle the load, etc.

APT is not going to test the update for me before it’s deployed. Someone has to try it out first in a safe environment.

Sure, there are automated tools for switching over one binary for another, sending signals, waiting on pipes, etc. But that specific part is just a small fraction of what “update” entails. And those tools do not really care whether they have to replace “libvulnerable.so” or statically linked “vulnerable-executable”.

DeVault: Announcing the Hare programming language

Posted May 5, 2022 9:33 UTC (Thu) by pabs (subscriber, #43278) [Link]

There is needrestart (and needrestart-session) for automatically restarting services after shared library upgrades (and Perl/Python/etc modules). It works fairly well, the main issue right now is cgroupsv2 breaks things a little bit.

https://github.com/liske/needrestart
https://github.com/liske/needrestart-session
https://github.com/liske/needrestart/issues/235

DeVault: Announcing the Hare programming language

Posted May 4, 2022 13:31 UTC (Wed) by wtarreau (subscriber, #51152) [Link] (1 responses)

> The purported argument for shared libraries is that “distro maintainers” (= people who are not us) are going to take care of the updates, so that we don’t have to rebuild and update our applications. But who’s going to install these updates to shared libraries?

No, that's not the main point. The main point is that you can limit the amount of stuff you have to replace. When you upgrade your libc to get rid of the GHOST vulnerability, all your executables are fixed at once. When they're all static, you have to replace all your executables with the ones the distro vendor has nicely rebuilt for you. And that can take quite some time when there are lots of packages, up to several days for mainstream distros, which will significantly delay deployment of the fix in the field. In addition it means that when there are multiple vendors (local builds counting as a "vendor" as well), it becomes extremely difficult to make sure the system is fixed.

But deployment of fixes is an entire class of problems on its own; there's no single or excellent solution, and there are pros and cons everywhere. Shared libs come with a number of cons, but none of them is dramatic, and a number of pros that can keep you out of the mud. Static libs tend to swing between much worse and much better, and will occasionally leave you with a bad feeling.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 15:11 UTC (Wed) by farnz (subscriber, #17727) [Link]

It's a trade-off. Shared linking forces you to consider backwards and forwards compatibility in your ABI, where static linking permits you to get away with only caring about API.

Static linking also enables library consumers to simply test for the behaviour they expect; with shared linking, you need to be confident that the set of library behaviour you depend upon is by design, and not a happy accident. If you don't take the ABI contracts seriously (on either side of the ABI), then you get broken by mistake as an implementation detail you happened to care about changes.

Tooling to help get this right sort-of exists (abidiff for example), but it's more than just the things that can be checked by tooling that matter - for example, if I change a library so that you now need to call mylib_free on a pointer I returned, where previously free worked, you're going to be surprised.

DeVault: Announcing the Hare programming language

Posted May 4, 2022 10:46 UTC (Wed) by excors (subscriber, #95769) [Link] (1 responses)

> It's fortunate that the world of secure microcontrollers is not run by people thinking like this or we'd be doomed with secure devices giving off their secrets at the first voltage hiccup!

Obviously code that's claiming to provide security features needs to be very careful about all types of bug, because any bug might undermine its security guarantees. But the previous comment mentioned libpng and zlib, which (like the vast majority of libraries) don't claim to provide any security guarantees beyond the API's implicit promise that you can provide them with untrusted input and they'll return either an error code or some valid image/buffer. Memory safety isn't the whole of that, e.g. if libpng supported a "read from local filesystem" chunk type then that'd be a security vulnerability when decoding untrusted images - but unsafe features like that are usually pretty obvious from the documentation or from superficial code review, whereas memory safety errors are ubiquitous in C and very hard to find, so they deserve a lot of attention.

(And even competent people writing critically-important secure firmware would benefit from tools that help with memory safety, to avoid e.g. the use-after-free bug in Apple's SecureROM which broke the secure boot process (https://habr.com/en/company/dsec/blog/472762/).)

> It turns out that 2^53 is too short to store all those possible numbers, it will only accurately represent numbers up to 9007 1992 5475 0991 and above this odd and even numbers will be represented as the same even one. So in such applications if your credit card number is below that value it will be accurately represented, but above this, if it's odd, the next number will be used, and if it's even, you may be charged for the previous number's orders, or be accusated in a court of having been present in a shop or leaving a parking because that number was seen there.

Credit card numbers have a check digit (e.g. https://en.wikipedia.org/wiki/Luhn_algorithm), so an off-by-one error will just result in an invalid card number - it won't be mistaken for someone else's valid card. Still a serious bug that should be fixed, of course, but not really a security bug.
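For the curious, the Luhn check is tiny to implement; a sketch in Python (the first test value is the standard worked example from the Wikipedia article linked above):

```python
def luhn_valid(number: str) -> bool:
    """Luhn check: from the right, double every second digit,
    subtract 9 from any result above 9, and require sum % 10 == 0."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

assert luhn_valid("79927398713")      # the standard worked example
assert not luhn_valid("79927398714")  # an off-by-one fails the check digit
```

So a float rounding error of the kind described above would indeed produce a number that the check digit rejects.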

DeVault: Announcing the Hare programming language

Posted May 6, 2022 0:49 UTC (Fri) by khim (subscriber, #9252) [Link]

> Still a serious bug that should be fixed, of course, but not really a security bug.

I think you are mistaken. Lots of cards use more than 16 digits (https://en.wikipedia.org/wiki/Payment_card_number#Issuer_...(IIN)). What JS would do to those numbers is quite a question.

DeVault: Announcing the Hare programming language

Posted May 6, 2022 6:39 UTC (Fri) by flussence (guest, #85566) [Link] (2 responses)

The safest way to handle OpenSSL with static linking is to keep all TLS stuff in a separate process, but that goes for dynamic linking too.

DeVault: Announcing the Hare programming language

Posted May 6, 2022 18:46 UTC (Fri) by wtarreau (subscriber, #51152) [Link] (1 responses)

> The safest way to handle OpenSSL with static linking is to keep all TLS stuff in a separate process, but that goes for dynamic linking too.

That only moves the problem one point away, since it's that program that has to be rebuilt and upgraded all the time instead. Plus for plenty of situations, you're doing extra work due to this. Double-copy of the data between the processes, and extra latency if the process is used as a side-car instead of a proxy. This can only work when TLS is not at all the business of your program and you'd rather defer watching the library updates to another specialized process. That's what plenty of application servers do by deferring that work to a reverse-proxy. It just turns out that my main activity is to develop that reverse-proxy ;-)

DeVault: Announcing the Hare programming language

Posted May 13, 2022 23:56 UTC (Fri) by flussence (guest, #85566) [Link]

Alright, I'll give you the rest of that, but I don't think copying data between processes has been a hard problem for something like a decade now? (or else Linux would be absolutely awful for s/openssl/opengl/)

DeVault: Announcing the Hare programming language

Posted May 3, 2022 20:14 UTC (Tue) by ssokolow (guest, #94568) [Link]

Not surprising though.

https://drewdevault.com/dynlib.html

DeVault: Announcing the Hare programming language

Posted May 2, 2022 12:52 UTC (Mon) by Tobu (subscriber, #24111) [Link] (1 responses)

Going by just the design principles, especially the last one:

  1. Trust the programmer.
  2. Provide tools the programmer may use when they don’t trust themselves.
  3. Prefer explicit behavior over implicit behavior.
  4. A good program must be both correct and simple.

The design principles for a language should tell you about the language itself, rather than about "good programs" in general while saying nothing about where such programs come from. If the language is to encourage correct programs, that implies reducing the space of programs it is willing to compile. It also implies tools good enough that the programmer doesn't in fact need to trust themselves, or needs to only as the exception.

That point about language design space and the space of legal programs is better made in the intro paragraphs of Some mistakes Rust doesn't catch, up until the first code snippet. If making "bad programs" with Hare is too easy, if the type system isn't up to building safe abstractions, I doubt Hare will have a shot at being stable and robust.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:25 UTC (Mon) by ddevault (subscriber, #99589) [Link]

Reading through this post on mistakes that you've linked to: save for where it gets into the weeds with operator overloading, generics, and threading (all of which are absent in Hare), all of the same limitations enforced by Rust are also enforced by Hare. For example, Hare detects unreachable code and considers it a compile-time error, and enforces initializers for all globals/variable bindings.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:28 UTC (Mon) by IanKelling (subscriber, #89418) [Link] (2 responses)

I think the language is very interesting, and it is great that the compiler is GPLv3. I wonder what the designer's thoughts are on a language package manager. If someone creates one, it would be nice to see one that only accepts source code, and only accepts freely licensed code.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 13:45 UTC (Mon) by ddevault (subscriber, #99589) [Link] (1 responses)

Language designer here. My thoughts on package management is that it is best left to distributions, who already know how to do it well. Dependencies should be selected thoughtfully and conservatively - central package management with easy-to-publish, easy-to-depend-on packages leads to the npm disease. Hare dependencies are installed to /usr/src/hare/third-party when you install "hare-irc" or similar from your system package manager.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 5:57 UTC (Tue) by lkundrak (subscriber, #43452) [Link]

> My thoughts on package management is that it is best left to distributions

I applaud you for this.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 14:21 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (7 responses)

So, some feedback.

Firstly, surface annoyances. I go to https://docs.harelang.org/ascii and I see that many of these functions are defined over a type named "rune" and, since I am learning about Hare I wonder what "rune" is exactly... it's written in blue text, so I try to click it and... nothing, it isn't a link.

Don't do that. Links are blue, maybe you wish they weren't blue, too bad they are and you're stuck with that. Not in a "But I can just write a style sheet" sense, in a cultural sense. English orthography is terrible, links are blue, some people don't like Jazz - these are things which might change someday but not in our lifetimes. Best option: Make them actually links, which means writing documentation for your types and then integrating that.

OK, so "rune" is documented in the Hare Specification. It says "The rune type represents a Unicode codepoint, encoded as a u32". You should make very sure this is what you meant, and if you aren't sure, abort or hand this entire part of the language to somebody who knows what they're doing.

Unicode's Code Points are just integers from zero to 0x10FFFF, so this is very convenient to implement, but it has serious implications. In particular _not all code points can be represented as UTF-8_. To make UTF-16 work, some code points were permanently reserved for this encoding, and you can't (mustn't) encode them as UTF-8. Since Hare claims to be able to convert a rune into UTF-8 this is definitely not what you've implied in the language and its standard library.

You should also consider that not all the Code Points can mean anything, for example 0xFFFE is a valid Unicode Code Point, but, there deliberately can't ever be a character assigned, 0xFFFE usually means "Oops I forgot to byteswap this UTF-16 or UCS-2 data" rather than "For some reason here's U+FFFE which is explicitly reserved and doesn't exist".

That phrase "encoded as a u32" tells me a lot about your implementation but very little about what I can expect as a programmer. What happens if I set a rune to 0x123456 ? Does that not compile somehow? Does it fail at runtime? Does it have Undefined Behaviour? Are libraries required to cope with this nonsense "rune"?
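For comparison, Python's strings have exactly this code-point-not-scalar-value shape, which makes the failure modes easy to poke at; a quick sketch:

```python
# Python strings, like a u32-based "rune", hold code points rather than
# Unicode scalar values, so the gap is easy to demonstrate.

# U+D800 is a valid code point but a surrogate, reserved for UTF-16:
lone = chr(0xD800)
try:
    lone.encode("utf-8")
    raise AssertionError("unreachable: surrogates are not encodable")
except UnicodeEncodeError:
    pass  # well-formed UTF-8 cannot contain surrogates

# 0x123456 is beyond U+10FFFF and is not a code point at all:
try:
    chr(0x123456)
    raise AssertionError("unreachable: out of Unicode range")
except ValueError:
    pass
```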

While we're still looking at the documentation: Something I really like in Rust is that its inline documentation tends to give examples. This serves two functions, as the user of the documentation the example reveals why we want this, while the description only gets to tell us what it does, which means if you lack imagination you might skim past it. But also, Rust's unit test runner assumes your examples work unless you specify they're examples of mistakes (if they don't work they're not very good examples are they?) and so it will test them. Now you're getting a two-for-one bargain.
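Python's doctest module works the same way, for what it's worth; a small sketch (the clamp function is invented for illustration):

```python
def clamp(x: int, lo: int, hi: int) -> int:
    """Clamp x into the inclusive range [lo, hi].

    The examples below are executable documentation: the test runner
    checks them, so they cannot silently rot.

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(99, 0, 10)
    10
    """
    return max(lo, min(x, hi))

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # fails loudly if any example's output is wrong
```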

Now, as a higher level critique, I was interested to see that Hare's "error flag" on result types tries to be a middle ground between Sum Types (in something like Rust or Haskell) and the inadequate wasteland of C where you have this separate magic errno value which you will forget to check. I think it's actually doing its job in the crypto code for example, which is somewhere that I particularly value Sum Types because the result of decrypting the data can be "No, this data isn't authentic" and not some decrypted data at all and that distinction is crucial yet it's silenced in C. However, even though this is a valiant attempt, I can't love it when Rust showed you could do actual Sum Types and go real fast.

Another high level critique, the choice to "freeze" Hare 1.0 ensures that if Hare is successful (I have after all been wrong many times) it will be displaced within perhaps a decade by something more agile which was able to adapt to changes we can't foresee, something a frozen Hare 1.0 cannot do. The bogeyman of Python 3 has been used too often I think to persuade people that you can't improve things as they're necessarily always too fragile, and that's just not true or at least not the whole truth. Likewise for Hyrum's Law. Hyrum Wright doesn't think you can't change stuff, it's his job to change stuff, he's warning you how difficult it will be so that you don't underestimate the task and necessity of preparation for it.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:08 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (4 responses)

I strongly suspect that rune was stolen directly from Go. In Go's formal specification, rune is nothing more than an alias for int32 (which is signed!). There's no "represents" language, or even any mention of Unicode at all. It's just an integer, and if you want to put nonsense U+FFFE in it, then Go will let you.

Given the brevity of Hare's documentation of rune, I would tend to assume it's either a direct alias for u32, or a different type with exactly the same semantics (but the compiler will scream at you if you try to mix and match them).

> Likewise for Hyrum's Law. Hyrum Wright doesn't think you can't change stuff, it's his job to change stuff, he's warning you how difficult it will be so that you don't underestimate the task and necessity of preparation for it.

Speaking as an SRE, who focuses more on practice than on theory, my interpretation of Hyrum's Law is, essentially, "You should change/break things on a regular basis unless you want to be stuck supporting them forever." For example, let's say that service X is unsupported but has been chugging along, best-effort, and somehow manages to provide three nines of uptime on average. Then Hyrum's Law suggests that you should have a short planned outage once a year so that everyone knows they can't really rely on it (because it's not staffed, and when it inevitably breaks for real, we don't want to have a crisis). Similarly, if your service is supposed to provide responses within (say) 100 ms, but it actually responds in 5 ms, then you should occasionally try performing to your documented latency, and see if anyone screams. If you don't do those things, then your actual performance becomes the new baseline.

Obviously, from an API-design perspective, you're generally going to be on the other side of the fence, the "I really do want to support it forever" side (an ever-shifting API would be incredibly hostile to developers). The problem is, forever is a very long time, so you need to think carefully before you make that kind of declaration. It's not really something you can change your mind about.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 11:41 UTC (Tue) by nix (subscriber, #2304) [Link] (3 responses)

> Then Hyrum's Law suggests that you should have a short planned outage once a year so that everyone knows they can't really rely on it

... which means nothing you support can be relied on, so why would anyone want to use it? Nobody likes their stuff suddenly breaking when they need it. Carry this to its extremes, with everyone doing this sort of thing in an uncoordinated fashion, and you get systems that never work because *some* component is always broken by its dependencies silently breaking. This doesn't exactly seem like a good future to me.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 12:24 UTC (Tue) by mathstuf (subscriber, #69389) [Link]

The words just prior to that conditionalize it on:

> let's say that service X is unsupported but has been chugging along, best-effort, and somehow manages to provide three nines of uptime on average.

The outages are letting people know that they're on thin ice. This is far better than doing a 3-day notice of "oh, sorry, we were cleaning up and realized no one was working on some project; no one cares internally, so you shouldn't either. good luck, have fun, so long, and thanks for all the fish^W^W^Wyour faithful patronage" and then being immortalized on something like the Google Graveyard[1], its sibling sites, or Our Incredible Journey[2]. Think of it as deprecation warnings for network services.

Oh, you did remember to check your error codes on external service communications, right?

[1] https://killedbygoogle.nl/
[2] https://ourincrediblejourney.tumblr.com/

Hyrum's Law for Downtime

Posted May 3, 2022 13:48 UTC (Tue) by atnot (guest, #124910) [Link]

The point of the planned outages and similar measures is to ensure that "never goes down" does not accidentally become a property of the system that code develops such a hard reliance on that it causes additional downtime.

Let's say for example (inspired by real events) that you have some super reliable database cluster. It's so simple and reliable that it doesn't fail a single request for two years. Over that time, a lot of applications get written that consume that service. Because it is so reliable, the developers never notice that they have introduced bugs in their timeout, retry, backoff or failover logic.

Then one day, there's a hiccup on one instance of the cluster and it hangs on some requests for a few seconds. A small fraction of the application processes hang, or crash and get restarted. Because the database is incredibly reliable, the application has started to depend on the database being available at startup, and starts crashing in a loop. The database instance gets overwhelmed and goes unresponsive for a few seconds. This repeats the process, causing more and more application services to crash, until eventually none are left in a running state. All of them are constantly hammering the database trying to start up, taking it down completely. This outage cascades through all of the downstream dependents, taking days to fully resolve.

When people accidentally rely too heavily on things being available, even the smallest, transient failures start having serious consequences. Those consequences often cause far more damage and user-visible downtime than simply causing a few seconds of deliberate downtime a month would have.
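The retry logic such applications were missing might look like this minimal sketch (all names and parameters hypothetical), using capped exponential backoff with jitter so that restarts don't re-stampede the database:

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base=0.5, cap=30.0):
    """Retry fn() on connection errors with capped exponential backoff
    and full jitter, so a transient hiccup does not turn every client
    into a synchronized thundering herd."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            # sleep somewhere in [0, min(cap, base * 2^attempt))
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

# A flaky dependency that fails twice before recovering:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

assert call_with_backoff(flaky, base=0.001) == "ok"
assert calls["n"] == 3
```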

DeVault: Announcing the Hare programming language

Posted May 3, 2022 13:50 UTC (Tue) by farnz (subscriber, #17727) [Link]

Getting you to stop depending on unsupported services is the goal. The service is unsupported, and its uptime is a fluke - by taking it down frequently, you cause people who need it to be up to switch to something that's supported in order to retain their uptime, or they do whatever's needed to get the service they depend upon back into support.

Basically, it's better to have the service become deliberately unreliable and trigger people into worrying about it, than to let it float along seemingly working just fine and then go offline permanently when it breaks and no-one knows how to fix it. A planned outage means that anyone who doesn't know that they depend on your unsupported service learns about their dependency at a point where fixing it is trivial, rather than finding out that they depend on it when the service breaks in a way that cannot be fixed.

I've had experience of working somewhere that refused to have planned outages for unsupported services - it was not pretty when the hardware failed, and it turned out to be impossible to get replacement parts, and non-trivial to port the software onto a machine we could get. And it turned out that something critical depended on the service on hardware that was now dead and not replaceable. Oops.

DeVault: Announcing the Hare programming language

Posted May 2, 2022 18:39 UTC (Mon) by bluss (guest, #47454) [Link]

The `rune` is in a code block and it is blue because of syntax highlighting. It's a nice feature, and with context awareness (human language processing is pretty good at this!) I think the ambiguity with blue link expectations resolves itself pretty well.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 20:34 UTC (Tue) by hsivonen (subscriber, #91034) [Link]

Allowing all Unicode code points is what Python 3 does. The result is very bizarre: you can have both non-BMP characters as single units and surrogates—even as pairs.

The logical value space of UTF-8 strings is sequences of Unicode scalar values. Rust's char being restricted to a Unicode scalar value is coherent with this.

(“Unicode code points” is almost never the right answer between “Unicode scalar values” and “UTF-16 code units”.)

A programming language shouldn’t prohibit special scalar values like U+FFFE. CLDR collation uses U+FFFE as merge separator: The concatenation of str1, U+FFFE, and str2 is guaranteed to collate equivalently to first collating on str1 and, if it is equal, then collating on str2—even for Canadian French.
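Python's behavior here is easy to reproduce; a short sketch (surrogatepass is the stdlib's deliberate escape hatch for round-tripping such strings):

```python
# Python 3 strings are sequences of code points, not scalar values,
# so lone surrogates are allowed -- even as a pair that "spells" an emoji:
pair = "\ud83d\ude00"                  # high + low surrogate
assert len(pair) == 2
assert pair != "\U0001F600"            # not the same string as the real U+1F600

# Strict UTF-8 refuses the surrogates:
try:
    pair.encode("utf-8")
except UnicodeEncodeError:
    pass

# Only a deliberate escape hatch reunites the pair into one scalar value:
emoji = pair.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
assert emoji == "\U0001F600"
```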

DeVault: Announcing the Hare programming language

Posted May 2, 2022 22:04 UTC (Mon) by JMB (guest, #74439) [Link] (16 responses)

The topic is a new system programming language in development ... and the comments are all off topic.

I understand that topics like PulseAudio, systemd, Wayland, and GNOME may be very emotional,
as those who don't have the time to make big changes to their known distro may have things forced
on them which no longer fit their needs - swapping out such deep and big parts is not an easy task -
so I accept and understand this. And all of these did and still do cause big problems ...

But here - there is a new free systems programming language one may play with if so inclined,
and the reaction is a big wave of hate. No 'thanks for a nice toy' or real reasoning.
Is this LWN, or a mere forum of people who have never really used a computer and just want to show
their superiority by spreading nonsense?

Rust may be an interesting language, but it has not been proven by big projects yet - maybe it will be.
And no, Mozilla is a company making a product from which users went away in big numbers ...
but it would not be fair to make Rust responsible for this. But if it were that good ... well ...

C, C++, Fortran and even Cobol have such big projects - over a large timespan.
These are still among the most useful languages ... and the new kids are supposedly so much better,
smarter, safer ... not quite so.
So what is the big deal when the programmer has to know what he does - and if he does not think
deeply enough you get bugs and security problems (and I would not like to distinguish between them)?
Whining that C and C++ are too complicated ...

Java creates trash code - everyone in the IT business has experienced that (e.g. IBM tried to push Java
for system tasks on the Power series - and had to step back - too slow, buggy ... much worse than any
solution before ... so well suited for smartphones ;).
I am no expert on judging code quality - but one thing is for sure - we have really good optimizing
compilers for C and C++ ... and for languages that generate C/C++ code.

Rust is not yet supported by GCC - is it? Out of tree ... so no real support.
And this is the holy thing one wants to preserve ... really???
And it is endangered by a fresh systems programming language which starts with 'trusting
the programmer' so he can really decide what memory management is suitable ... which is OK!
Or by whining about a lot of possible bugs? So what about C?

So I don't get it - there is no discussion about future improvements (before 1.0 this would be
the one interesting thing), only statements that it is not appropriate for 2022 (maybe, as one has
to think and should be experienced, right?) and that it is a waste of time.
Or that good programmers may be lured away from better programming languages.
I don't think that C or C++ will suffer that much ... :)

These are not arguments - this is mere hating.

It is really strange what happened here, but it seems the quality of LWN comments degraded
as the quality of some GNU/Linux flavours ...

DeVault: Announcing the Hare programming language

Posted May 3, 2022 8:31 UTC (Tue) by ldearquer (guest, #137451) [Link] (1 responses)

I wish I could up-vote comments.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 14:40 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

+1 (plus one to pass lwn's text filter :-))

DeVault: Announcing the Hare programming language

Posted May 3, 2022 12:05 UTC (Tue) by nix (subscriber, #2304) [Link] (13 responses)

> the reaction is a big wave of hate

I read a lot of people commenting on why various design decisions seemed poor, followed by, variously, defensiveness, attacking the critic rather than the point they made or declaring they will simply not interact with them at all, declaring the point to be "out of scope" (even when it was *explicitly required* for the design goals of the language: UAF protection or decent data structures are obviously needed in order to be able to actually trust your tools, but I guess it's not obvious after all)... or simply ignoring the entire comment thread (e.g. the rather devastating points about its character/unicode handling, which would seem germane to the string handling stuff Drew *was* responding to).

My immediate impression from this thread is that this is a new language where design or implementation criticisms or even suggestions for tiny changes to the documentation are ignored unless they're really easy to fix *and* you're someone in the designer's good books, which might well mean never having mentioned a competing language in his hearing. You'll pardon me if I think that this sounds a bit excessive. "Tidal wave of hate" or no, walking on eggshells in order to not offend the language designer or having suggested improvements disregarded because of who I am seems like a bad fork to be stuck in if I want to use a nascent language.

FYI: I saw no hate, although I did gasp in shock at several of Drew's comments. I saw one comment that designing new languages which intentionally avoid trying to protect against UAF should lead to legal liability for the language designer. I'm not so sure about that, but it is clearly seriously *unwise* to choose to use this language, however pretty it might be, for anything remotely important or which processes hostile input or which processes not-perfectly-controlled input which might not all be perfectly formed UTF-8 or which requires complex datastructures or which might need nontrivial memory management... which rules out just about everything other than hobbies you're sure nobody else is going to use. It probably should expose people who choose to use it for safety-critical or security-exposed stuff to legal liability, but this should be true whatever language they choose to use. If people do use it for critical stuff and its manifold obvious faults cause problems I can easily see the language designers getting exposed to significant opprobrium for encouraging people to use it without prominent warnings about this stuff, all while having an announcement that said "Provide tools the programmer may use when they don’t trust themselves" while leaving huge mantraps like manual memory management and no decent datastructures in place which obviously force the programmer to be perfect a huge proportion of the time, just like C does.

I like C. I use it all the time: even though it is obviously a stone-age tool, it is a good one as such things go, and you can prepare food perfectly well and also cut yourself really badly with an Acheulean hand axe. I still think this language is obviously aiming at the wrong goals, and that this comment thread is an example of self-destructiveness on Drew's part the like of which I haven't seen on LWN since the Apache OpenOffice fiasco. Drew hasn't *actually* insulted or driven off most of the C++ committee or Rust language designers yet, but that's seemingly only because most of them don't post here. It is obvious from this thread alone that he considers that disagreeing with him is a cardinal sin and that there is no need to justify even the most bizarre and retro language design decisions beyond saying "out of scope" or using plural pronouns like a monarch to make it seem as if lots of really important unnamed people already thought this through, though you don't get to see their reasoning and no precis nor even a link to a rationale is provided.

I used to more or less trust what Drew wrote. He seemed to think things through. That's gone.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 12:45 UTC (Tue) by mads (subscriber, #55377) [Link] (2 responses)

I think your comment is a perfect example of the "tidal wave of hate" the parent was referring to.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 17:43 UTC (Tue) by nix (subscriber, #2304) [Link] (1 responses)

I don't hate it. I don't know enough about it to hate it. Some parts of it seem very nice, but to me those parts are outnumbered by the parts I think are not. It seems to go to great lengths to not improve the areas where I think C-like languages need most improvement, and at least one of its designers lauds this as apparently being beneficial (though I have no idea why).

It seems like an unwise direction for programming language development to go in, and anybody pointing this out seems to be treated unpleasantly at best, judging by this thread. Is that hatred? If so, hatred is a whole lot weaker than I thought it was. Mild disapproval and a recommendation (again, based on very little more than this thread and a cursory read of the docs) that something is best avoided unless apparently baked-in faults are fixed first is not "hatred".

DeVault: Announcing the Hare programming language

Posted May 4, 2022 16:56 UTC (Wed) by khim (subscriber, #9252) [Link]

This whole thing reminds me of Hans Reiser (except, hopefully, without the murder).

As in: both Reiser and SirCmpwn understand many things much deeper than many of us (including me, obviously), yet they are wrong about certain things, too… and they categorically refuse to even consider the possibility that they can be in the wrong.

This means that they can do really impressive things, yet they can't be trusted. A pity, really.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 14:57 UTC (Tue) by rsidd (subscriber, #2582) [Link] (1 responses)

I use Sway, which Drew originated. I probably use lots of other software he wrote. I think the questions in this thread are fair and I'm really staggered at his responses. Maybe it's adrenalin or something.

Example of overreaction: he interpreted "liable" as "criminally responsible" (for bugs downstream from his design decisions). It doesn't mean that; it technically means "legally answerable", but the OP probably didn't mean even that - it was figurative. He includes that claim in his blog post on Hare's scope (promising another on the main question here, memory safety).

DeVault: Announcing the Hare programming language

Posted May 3, 2022 15:55 UTC (Tue) by mads (subscriber, #55377) [Link]

What's your point here?

The tone is disappointing

Posted May 3, 2022 18:17 UTC (Tue) by tbird20d (subscriber, #1901) [Link] (7 responses)

I have a hobby project I've worked on for a long time. It has security problems I'm aware of, so I've never published it as open source, though I think some of its features and their implementation (unrelated to its security) might be of interest to others. The negative tone of some of the people in this crowd makes this a place I wouldn't want to introduce it, if I ever changed my mind.

The tone is disappointing

Posted May 3, 2022 20:56 UTC (Tue) by roc (subscriber, #30627) [Link] (5 responses)

Speaking for myself, but I suspect also a lot of other people in this comments section, what bothers me most about Hare is this claim:

> It is well-suited to writing operating systems, system tools, compilers, networking software, and other low-level, high performance tasks.

I strongly believe that at this point in time, it is irresponsible to trade away security to the extent Hare does, when adopting a new language in any operating system, system tools, or networking software that you expect other people might use, and therefore it is irresponsible to make the above claim.

So if you avoid claims like that, and just publish your work with a disclaimer that it has security problems and is not suitable for production use, I think you'd be fine.

The tone is disappointing

Posted May 4, 2022 9:21 UTC (Wed) by Wol (subscriber, #4433) [Link]

> So if you avoid claims like that, and just publish your work with a disclaimer that it has security problems and is not suitable for production use, I think you'd be fine.

Eggsackerly. Do a Unix Man Page impression and prominently point out the faults - who knows - you might even get a bunch of fixes :-)

Cheers,
Wol

The tone is disappointing

Posted May 4, 2022 14:21 UTC (Wed) by nix (subscriber, #2304) [Link] (3 responses)

Quite. The specific choice of examples rubbed me up the wrong way too. Operating systems? Assuming this means kernels, they run at maximum privilege level and absolutely need both tools to help you write complex data structures with fine control (since they usually have to reimplement most of them) and tools to avoid memory problems (since they are extra-disastrous). Meanwhile, performance is usually not an issue for 95% of the code in a kernel, and the remaining 5% attracts a lot of thought (and what matters there is usually better algorithms rather than the language it's written in).

System tools? Performance almost never an issue at all, need data structures already there because they are not going to want to reimplement them. Needs to avoid overruns and memory problems because often run as root.

Compilers? They *eat* data structures and very complex allocation patterns for breakfast. I don't know of any large compiler that doesn't have at the very least its own allocator, often several: they usually end up with full-blown garbage collection because the data structures just get so knotty that doing it by hand is utterly impractical. Rust-style ownership tracking usually ends up crudely bodged in :P so it would be helpful to have some of that already there.

Networking software? Exposed to, well, the network. Absolutely must be robust against attackers. I'd prefer it if the things were formally proven correct, but in the absence of actual living flying unicorns at least using heavily-tested datastructures and a memory model that makes common classes of fault next to impossible (like, say, Rust's, or almost any managed language's) is better than nothing. Speed is of marginal interest except for the sort of thing that ends up on network backbones and the occasional thing like the heart of a file transfer system that might find itself at the wrong end of a domestic or datacentre multi-GbE cable, where suddenly every single CPU cycle counts and you'll probably be either bottlenecked in the kernel or using direct-to-userspace stuff and writing core loops in asm to wring *everything* out, with a relatively unimportant and untested fallback in some other language for slower systems. (Almost no applications fall into this category.)

I think I figured it out!

Posted May 10, 2022 20:29 UTC (Tue) by cbushey (guest, #142134) [Link] (2 responses)

>with a relatively unimportant and untested fallback in some other language for slower systems. (Almost no applications fall into this category.)

That sounds like a perfect problem for a programming language to solve. Not some newfangled fresh language where you need to solve a lot of problems and people need to learn new semantics, tooling, and libraries, of course! No, it should be lua or javascript (or is that typescript or ecmascript), hmmm. Oh, I've got it! Use GraalVM. Then you can AOT-compile all the code that can run on a JVM. Oh, plus it's designed to be a polyglot compiler, so you should be able to mix and match the languages you're using with its help. Maybe even one language per module. Oh, and you've got that Truffle Language Implementation Framework with some more polyglot programming. Now you have a much better solution than making your own language. You can use everybody else's well tested and supported programming languages. Well, you end up learning half a dozen or so unfamiliar syntaxes, but that is clearly better than figuring out something simpler in a single language that can easily solve the problem for a relatively unimportant and untested fallback for slower systems. Sorry. I just wanted to ramble some on lwn and this page has so many comments that my comment is guaranteed to get lost in the noise. I hope you have a nice day.

And almost a week late!

Posted May 10, 2022 20:32 UTC (Tue) by cbushey (guest, #142134) [Link] (1 responses)

This is so not going to be read by anyone. Hey, I'm like that backup language!

Last comment, I promise.

Posted May 10, 2022 20:41 UTC (Tue) by cbushey (guest, #142134) [Link]

oh, and I have absolutely no clue how polyglot programming works using GraalVM. At least the last two comments are pretty short.

The tone is disappointing

Posted May 4, 2022 14:00 UTC (Wed) by nix (subscriber, #2304) [Link]

I think Hare makes a fine hobby or experimental language. It's got some very interesting decisions in its design -- but the days are gone when that makes something a good replacement for a major systems language. The state of the art has, slowly and painfully, improved enough for that by now, I think.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 0:08 UTC (Tue) by 3541 (subscriber, #135498) [Link] (1 responses)

How's the support for linking to native C libraries?

Is this possible? If so, do I need to reproduce prototypes in Hare by manual transcription, or is there (or are there plans for) anything like Rust's bindgen? Of course, reading C headers is nontrivial, but _some_ degree of automation would be useful.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 6:40 UTC (Tue) by ddevault (subscriber, #99589) [Link]

Hare uses a superset of the C ABI, so linking with C programs is fairly straightforward. However, there is not currently any tooling around generating Hare modules from C headers - which is a non-trivial problem, especially with libraries that make generous use of macros. Making a C library feel intuitive/idiomatic to use from Hare is another problem. At least some help with generating forward declarations is likely to be written at some point.

There is a project to do the other way around (interacting from Hare with C), at least, which is the easier problem of the two:

https://git.sr.ht/~sebsite/hareconv

I do want to improve support for linking with C.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 5:22 UTC (Tue) by PengZheng (subscriber, #108006) [Link] (3 responses)

Does Hare share the same code bloat problems as C++ (STL and exceptions) and Rust (large stdlib and no support for shared libraries)?

A more batteries-included language without such problems would be nice to have for embedded Linux development.

DeVault: Announcing the Hare programming language

Posted May 3, 2022 5:32 UTC (Tue) by PengZheng (subscriber, #108006) [Link]

I noticed that a language-specific package manager is not in the author's plan.
If Hare does target embedded development, I strongly suggest a built-in package manager.

Just try out Conan for C/C++; it solves all the problems listed here:
https://lwn.net/Articles/893244/

DeVault: Announcing the Hare programming language

Posted May 3, 2022 6:42 UTC (Tue) by ddevault (subscriber, #99589) [Link]

I do not think that Hare shares the same problems as C++. As for Rust, there is a generous but scope-constrained standard library:

https://docs.harelang.org

There is no support for shared libraries, though I think that at some point in the future it can be hacked together by a dedicated hacker.

DeVault: Announcing the Hare programming language

Posted May 6, 2022 23:32 UTC (Fri) by ncm (guest, #165) [Link]

There are no "code bloat problems" in C++, so it cannot "share" any.

DeVault: Announcing the Hare programming language

Posted May 7, 2022 12:06 UTC (Sat) by scientes (guest, #83068) [Link] (1 responses)

While no longer interesting to me, it is refreshing to see the goals of the Zig project (which I felt were abandoned shortly after I joined the project) be accomplished in the Hare language. Andrew Kelley is a very skilled programmer, but he lets his ambition lead him astray and prevent him from finishing what he starts. He was at OKCupid (in C++) when he started Zig, and he is trying to breed computers like people.

DeVault: Announcing the Hare programming language

Posted May 10, 2022 13:44 UTC (Tue) by renox (guest, #23785) [Link]

Could you explain why you think Zig doesn't follow its initial goals?

DeVault: Announcing the Hare programming language

Posted May 7, 2022 19:28 UTC (Sat) by littoral (guest, #140523) [Link] (5 responses)

Why, oh why, do we need yet another programming language?

We don't.

Over a programmer's lifetime, probably more than 90% of his/her time is spent maintaining/enhancing/fixing existing code. The most effective way to increase programmer productivity would be to reduce the number of programming languages in existence. Inventing yet another one might possibly [giving the inventors of Hare every possible benefit of the doubt] increase programmer productivity in writing new code by between 1% and 2% - after, that is, it gains enough users to get more or less debugged. But - if it ever gets to be popular - it then reduces programmer productivity in maintaining old code by more than 20% because they have to spend time learning a new language - and new libraries.

So, wow, somebody has given us the opportunity to reduce the overall lifetime productivity of programmers by adopting Hare. Let's do the smart thing. Ignore Hare, it'll then go away.

You would like to increase programmer productivity? That's a lot harder. The best way would be to write a program that converted, with 100% coverage, language X into language Y. For example, Perl to Python, or COBOL to anything. AFAIK that's never been done for any high-level language pair (X, Y) when neither strictly includes the other (e.g. C++ and C).

DeVault: Announcing the Hare programming language

Posted May 8, 2022 11:53 UTC (Sun) by Vipketsh (guest, #134480) [Link]

I look at new programming languages as experiments: the important point isn't to create something wonderful, but to either prove or disprove the practicality of things. Personally, I have little interest in experimenting but I am interested in reading about the results when people do. Thus I find it valuable when people try new things. Actually, I definitely applaud people trying things by creating something new instead of making stuff up and then ramming it into an existing popular language where if it doesn't work out we are all stuck with garbage for evermore. If things don't work out for a new language, it dies, no one remembers and there is little lasting harm, but if it does something awesome and it shows it is awesome, those pieces can be cherry-picked into the popular languages.

DeVault: Announcing the Hare programming language

Posted May 9, 2022 5:29 UTC (Mon) by rsidd (subscriber, #2582) [Link] (1 responses)

If people in 2002 had decided that we don't need new languages, we wouldn't have had Go, or Rust, or Scala, or Kotlin, or (my favourite for scientific programming) Julia. What has changed in 2022 that we don't need new languages any more? Or are you saying we didn't need the above languages either?

DeVault: Announcing the Hare programming language

Posted May 9, 2022 8:35 UTC (Mon) by farnz (subscriber, #17727) [Link]

Or, going back further; by 1960 we had FORTRAN, LISP, ALGOL and COBOL. Why did we bother with C, FORTH, Smalltalk, BASIC, Prolog, SQL, ML, Pascal, Logo, Common Lisp, Ada, Objective-C, Haskell, Python, R, Ruby, Java, Delphi, PHP, JavaScript, C#, the ones you named, and more, when we had "enough" programming languages?

DeVault: Announcing the Hare programming language

Posted May 10, 2022 13:55 UTC (Tue) by renox (guest, #23785) [Link] (1 responses)

> Why, oh why, do we need yet another programming language?
> We don't.

We do, because
1) unfortunately most languages have very serious design issues (C integer promotion rules, for example)
2) languages are rarely "changed": new features are added but design issues are left unchanged...

Unfortunately there are many would-be C replacers: Odin, Zig, V, and now Hare (plus quite a few others in the Pascal family); this will spread contributors thin and delay the adoption of a "popular" replacement language for C.

DeVault: Announcing the Hare programming language

Posted May 15, 2022 5:37 UTC (Sun) by wtarreau (subscriber, #51152) [Link]

Never heard about V, thanks for the tip! There is some very interesting stuff there. Just tried it; there are some issues (invalid memory accesses and crashes when using the wrong argument type to some functions without even a warning at build time, memory leaks like crazy triggering the OOM killer in a few seconds, and being 20 times slower than C on the Vweb server test), but it looks well balanced, very likely suitable for scripting, and I like the fact that it translates to/from C. It looks quite young and we can hope that some of the problems above (invalid memory accesses and OOM) will get solved soon. It definitely deserves being watched!


Copyright © 2022, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds