DeVault: Announcing the Hare programming language

Posted May 3, 2022 14:14 UTC (Tue) by wtarreau (subscriber, #51152)
In reply to: DeVault: Announcing the Hare programming language by roc
Parent article: DeVault: Announcing the Hare programming language

> "And I promise to never access that memory again through this pointer". Making that promise and failing to honour it is where the problems arise.

No, that's not how it works. That's only the view of the end user who relies on libs but does not write them.

You only promise not to access that memory again between the *end* of the free() and the beginning of the next malloc()/free()/realloc(), because free() itself and malloc() will use it a lot. Even calls to free() on other objects, or malloc() returning another object, might touch that area to cut it into pieces, merge it with another one, or return it to the system. That's why it is important to understand how memory allocation works and not to consider free() to be that strict, because it is not. Otherwise it would be impossible to write a memory allocator, and you would have to stop and restart your program once you had used all your system memory, since you wouldn't be allowed to reuse a previously used pointer.
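To make the point about the allocator touching freed memory concrete, here is a minimal sketch of a fixed-size free-list allocator over a static pool. Everything in it (`toy_alloc`, `toy_free`, `BLOCK_SIZE`, `BLOCK_COUNT`) is a made-up illustration, not how any real malloc() is implemented; it only shows the allocator storing its own bookkeeping inside blocks the caller has released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy allocator: fixed-size blocks carved from a static pool. */
#define BLOCK_SIZE  64
#define BLOCK_COUNT 8

union block {
    union block *next;                  /* meaningful only while the block is free */
    unsigned char payload[BLOCK_SIZE];  /* meaningful only while it is allocated */
};

static union block pool[BLOCK_COUNT];
static union block *free_list;
static int pool_ready;

static void pool_init(void)
{
    /* Thread every block onto the free list. */
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
    pool_ready = 1;
}

void *toy_alloc(void)
{
    if (!pool_ready)
        pool_init();
    if (free_list == NULL)
        return NULL;                    /* pool exhausted */
    union block *b = free_list;
    free_list = b->next;                /* the allocator reads memory inside a freed block */
    return b->payload;
}

void toy_free(void *p)
{
    assert(p != NULL);
    union block *b = p;                 /* payload sits at offset 0 of the union */
    b->next = free_list;                /* the allocator overwrites the caller's old data */
    free_list = b;
}
```

With this sketch, calling `toy_free(a)` and then `toy_alloc()` hands the same block back, and in between the first bytes of that block hold the allocator's `next` link rather than whatever the caller stored there.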

Posted May 3, 2022 16:49 UTC (Tue) by Tobu (subscriber, #24111)

You may get an address reused with a malloc() / free() / malloc() sequence, but the pointer won't have the same provenance and won't be the same from the point of view of the abstract machine that defines the operational semantics of the language. The compiler will either know about malloc or offer a building block below it.

Posted May 4, 2022 8:34 UTC (Wed) by wtarreau (subscriber, #51152)

But how will the compiler distinguish:

free(foo);
printf("Just released %p\n", foo);

which doesn't dereference the pointer, from:

free(foo);
printf("Just released %s\n", foo);

especially if the example is a bit more contrived and just calls a debug(void *p) function that takes the pointer as an argument without telling whether it only uses its value or dereferences it?

Posted May 4, 2022 14:05 UTC (Wed) by mathstuf (subscriber, #69389)

The former is fine. The latter is definitely UB (as `MALLOC_PERTURB_` would show since it does a `memset` on `free`'d memory). FWIW, I run with `MALLOC_PERTURB_` at all times.

Provenance is what makes this UB:

free(foo);
char* new_foo = malloc(1);
if (foo == new_foo) {
    // by golly, we got lucky.
    *foo = 1; // UB
}

That comparison is misleading due to provenance. It can be assumed to be false because `foo` is not allowed to *access* anything after that `free`, even if its integer representation happens to be the same as `new_foo`. See the C and C++ papers by Paul McKenney on "pointer zap" for how to finally put provenance into the standard (instead of being something that implementers have had to craft to make sense of things as the languages have evolved).
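For contrast, here is a minimal sketch of a variant that stays within the rules: the address is recorded as an integer while `foo` is still valid, and every later access goes through `new_foo`. The function name `reuse_demo` is hypothetical, and `uintptr_t` is an optional type in the C standard, so this assumes a target that provides it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if the allocator happened to hand back the same address,
   0 if it did not, and -1 on allocation failure. */
int reuse_demo(void)
{
    char *foo = malloc(1);
    if (foo == NULL)
        return -1;
    /* Record the address while foo is still a valid pointer. */
    uintptr_t old_addr = (uintptr_t)(void *)foo;
    free(foo);
    /* From here on, foo is never used again. */

    char *new_foo = malloc(1);
    if (new_foo == NULL)
        return -1;
    int same_address = ((uintptr_t)(void *)new_foo == old_addr);

    *new_foo = 1;               /* fine: new_foo belongs to the live allocation */
    assert(*new_foo == 1);

    free(new_foo);
    return same_address;
}
```

Whether the two addresses match depends entirely on the allocator; the point is only that the write is valid either way because it never goes through the stale pointer.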

Additionally, CHERI would show the folly of this code. C allows CHERI to exist. If you want to say "I don't care about CHERI", it'd be real nice if C would allow the code to have some marker that says "this code abuses pointer equality because we assume the target platform allows us to do this" so that any CHERI-like target can just say "this is broken" up front instead of waiting for whatever the optimizer does to finally trip up something in production.

As I said elsewhere: if you want to abuse C to be assembler for your target, it'd be real nice if that could be explicit instead of doing "I'm using C for my list of targets, damn C's portability goals" and leaving "fun" landmines for others to run over later.

Posted May 4, 2022 15:28 UTC (Wed) by wtarreau (subscriber, #51152)

I remember seeing this example somewhere but, to be honest, it really shocks me: it simply forgets that there is a real machine performing the operations underneath and using registers to set addresses. If the types are the same and the pointers are the same, it is foolish to permit the compiler to decide that they can hold different values. That's definitely a way to create massive security issues.

My feeling over the last 10 years is that the people working on C are actually trying to kill it by making it purposely unsuitable for all the cases that made its success. Harder to use, harder to code without creating bugs, harder to read. How many programs break when upgrading gcc on a distro? A lot. The worst ones being those that break at runtime. This just shows that even if past expectations were wrong, they were based on something that made sense in a certain context and that was based on how a computer works. Now instead you have to imagine what a compiler developer thinks about the legitimacy of your use case and how much pain he wants to inflict on you to punish you for trying to do it. Not surprising to see that high wave of new languages emerging in such conditions!

Posted May 4, 2022 15:53 UTC (Wed) by nybble41 (subscriber, #55106)

> … that's purely forgetting that there is a real machine performing the operations underneath and using registers to set addresses.

The error here lies in thinking that C makes any claims about pointers holding *addresses*. The representation of pointer values is not defined by the standard. In this area the ease with which most compilers permit integer/pointer conversions beyond what the standard defines is an attractive nuisance; pointers are not just "integers which can be dereferenced". Pointers are abstract references to memory locations (objects or storage for objects) which have associated lifetimes and can behave differently based on whether or not they've been initialized. There may or may not be a "real machine" underneath, as C can be compiled for an abstract virtual machine (like WASM) or even interpreted. Even when targeting real hardware (e.g. CHERI) there is no guarantee that you can freely convert between pointers and integers, or treat the representation of a pointer as a plain memory address. Pointer provenance may be something only the compiler is aware of, or it may have some representation at runtime (via tagged pointers).
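As a small illustration of how narrow the standard's pointer/integer guarantee is, the sketch below uses the one round trip it does describe: a valid `void *` converted to `uintptr_t` and back compares equal to the original. The function name `round_trip_ok` is made up, and `uintptr_t` is an optional type, so a target is allowed not to provide it at all.

```c
#include <assert.h>
#include <stdint.h>

/* Round-trips a pointer to a live object through uintptr_t and reads it back. */
int round_trip_ok(void)
{
    int value = 42;
    void *p = &value;

    uintptr_t bits = (uintptr_t)p;   /* implementation-defined integer value */
    void *q = (void *)bits;          /* converted back while `value` is still alive */

    assert(q == p);                  /* the equality the standard promises for uintptr_t */
    return *(int *)q;                /* reads 42 through the recovered pointer */
}
```

Nothing here says what `bits` looks like, that arithmetic on it corresponds to pointer arithmetic, or that the conversion back remains meaningful once the object's lifetime has ended.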

Really the rules for using pointers in C without triggering undefined behavior are not that different from the rules for references in Rust. C just doesn't offer any help in tracking whether the requirements have been *met*, where Rust requires the compiler to take on most of that task.

Posted May 4, 2022 17:23 UTC (Wed) by wtarreau (subscriber, #51152)

Let's say we disagree. Even the aliasing rules are contrary to the example above: both types are the same, modifying one pointer means the data at the other one *might* have been modified and the data from that other pointer must be reloaded before being used, unless the compiler knows that pointers are the same in which case it can directly use the just written data, which is the case here.

This type of crap is *exactly* what makes people abandon C. The compiler betrays the programmer based on rules which are stretched to their maximum extent, just to say that there was a tiny bit of possibility to interpret a standard for the sole purpose of "optimizing code" further, at the expense of reliability, trustability, reviewability and security.

I'm sorry but a compiler that validates an equality test of both types and values between two pointers and fails to see a change when one is overwritten is a moronic compiler, regardless of the language. You cannot simply trust anything produced by that crap at this point.

Posted May 4, 2022 18:01 UTC (Wed) by mathstuf (subscriber, #69389)

> Even the aliasing rules are contrary to the example above: both types are the same

I never gave a type for `foo`. I don't think it really matters what type `foo` is here (beyond allowing `1` to be assigned to its pointee type). Dereferencing is verboten after passing it to `free` regardless of its bitwise representation.

> just to say that there was a tiny bit of possibility to interpret a standard for the sole purpose of "optimizing code" further, at the expense of reliability, trustability, reviewability and security.

I don't know that "optimizing code" is the end goal per se. I think it's actually about making things more consistent across C's target platforms.

If you write `int i = other_int << 32;` and expect 0, one target may be happy. Another might give you `other_int` (let's say it encodes the operation's right-hand value in 5 bits with the register index in the other 3 because why not). Mandating either requires one of them to avoid using its built-in shifting operation. What do you want C to do here? Just say "implementation-defined" and leave porters to trip over this later? You *still* need to maintain the version that has actual meaning for both targets if you care.

Now, if you want to say "I'm coding for x86 and screw anything else", that's great! Let's tell the *compiler* that so it can understand your intent. But putting your thoughts into code and then telling a compiler "I've got C here" when you actually mean "I've got x86-assuming C here" is just bad communication.

I'd like to see some way in the C language itself. Not a pragma. Not a compiler flag. Not a preprocessor guard using `__x86_64__`, but some actual honest-to-K&R C syntax that communicates your intent to the compiler. FWIW, I don't know that maintaining that preamble will be worth the work outside of the kernel's `arch/` directory, but hey, some people live there. I say that because you'll either have:

- divergent code anyways to tell the compiler about each assumption set; or
- to write the standard C code that checks how much you're shifting by before doing it anyways.
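The second option could look something like the sketch below: a helper that checks the shift count first, so the result is the same on every target. The name `shl32` is hypothetical, and an unsigned type is used on purpose to stay clear of signed-overflow questions; the sketch also assumes `uint32_t` does not promote to a signed type, which holds on mainstream targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a left shift that is defined for every count. */
uint32_t shl32(uint32_t value, unsigned count)
{
    /* Shifting by the operand's width or more is undefined in C,
       so that case is handled explicitly instead of left to the hardware. */
    if (count >= 32)
        return 0;
    return value << count;
}

/* Usage: both calls give the same answer on any conforming target. */
void shl32_examples(void)
{
    assert(shl32(1u, 31) == 0x80000000u);
    assert(shl32(1u, 32) == 0u);
}
```

The price is a compare and branch (or a conditional move) that a target with the "right" native shift would not have needed, which is the trade-off being described above.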

So, as said elsewhere in this thread, improving C is fine. But claiming that you're coding in C, breaking its rules, then complaining that the compiler isn't playing fair is just not reconciling beliefs with reality.

Posted May 3, 2022 18:04 UTC (Tue) by tialaramex (subscriber, #21167)

As I understand it, that's actually not the same pointer!

The pointer I got from malloc() for my 400 byte allocation, and the pointer malloc itself relies on for managing the heap after I free() it are not actually the same pointer even though on a typical modern CPU they're the same magic number in a CPU register.

The pointer I had comes with provenance given to it by malloc(), it's a pointer to my 400 byte allocation. If I add 390 to it, that's a pointer to the last 10 bytes of the allocation, and we're all fine. But if I add -16 to it, that's not a pointer to anything.

The pointer used by the allocator uses different provenance, it's a pointer into the heap, and you can totally add -16 to it, because that's just a different pointer into the heap and as such fine.
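From the caller's side, the arithmetic described above can be sketched as follows. The function name `tail_offset_demo` and the constants are made up for illustration; the out-of-bounds case is left as a comment because even forming that pointer is not something the caller is allowed to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns the offset of the tail pointer within the allocation, or -1 on failure. */
ptrdiff_t tail_offset_demo(void)
{
    enum { ALLOC_SIZE = 400, TAIL_LEN = 10 };

    unsigned char *p = malloc(ALLOC_SIZE);
    if (p == NULL)
        return -1;

    unsigned char *tail = p + (ALLOC_SIZE - TAIL_LEN);  /* p + 390: inside the allocation */
    memset(tail, 0xFF, TAIL_LEN);                       /* writes bytes 390..399 */

    unsigned char *end = p + ALLOC_SIZE;                /* one past the end: may be formed, not dereferenced */
    assert(end - tail == TAIL_LEN);

    /* unsigned char *before = p - 16;   not valid for the caller: outside the allocation */

    ptrdiff_t offset = tail - p;
    free(p);
    return offset;
}
```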

Posted May 3, 2022 20:02 UTC (Tue) by ssokolow (guest, #94568)

Exactly. The abstract machine that the compiler's optimizers (and tools like LLVM sanitizers and miri) operate on tags each pointer with its parent allocation and, when you free(), that allocation is revoked, making dereferencing any pointers derived from it an invalid operation.
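A toy model of that tagging, written out as ordinary C, might look like the sketch below. It is only an illustration of the idea: every name in it (`struct allocation`, `struct tagged_ptr`, `can_access`, `model_offset`, `model_free`) is invented here and does not correspond to the internals or API of LLVM's sanitizers or miri.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: the abstract machine's record of one allocation. */
struct allocation {
    unsigned char *base;
    size_t size;
    bool live;                   /* cleared when the allocation is freed */
};

/* Toy model: a pointer is an (origin, offset) pair, not a bare address. */
struct tagged_ptr {
    struct allocation *origin;   /* provenance: the allocation it was derived from */
    size_t offset;
};

/* Pointer arithmetic keeps the provenance and only moves the offset. */
struct tagged_ptr model_offset(struct tagged_ptr p, size_t delta)
{
    p.offset += delta;
    return p;
}

/* An access of `len` bytes is valid only if the origin is live and in bounds. */
bool can_access(struct tagged_ptr p, size_t len)
{
    if (p.origin == NULL || !p.origin->live)
        return false;
    if (p.offset > p.origin->size)
        return false;
    return len <= p.origin->size - p.offset;
}

/* Freeing revokes the allocation; every pointer derived from it becomes invalid. */
void model_free(struct allocation *a)
{
    assert(a->live);             /* a double free is an error in the model too */
    a->live = false;
}
```

In this model a later allocation that reuses the same `base` would be a different `struct allocation`, so a stale `tagged_ptr` still fails `can_access` even though the numeric address matches.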