DeVault: Announcing the Hare programming language
Posted May 3, 2022 14:14 UTC (Tue) by wtarreau (subscriber, #51152)In reply to: DeVault: Announcing the Hare programming language by roc
Parent article: DeVault: Announcing the Hare programming language
No, that's not how it works. That's only the view of the end user who relies on libs but does not write them.
You only promise not to access that memory again between the *end* of the free() and the beginning of the next malloc()/free()/realloc(), because free() itself and malloc() will use it a lot. Even calls to free() on other objects, or a malloc() returning another object, might touch that area to cut it into pieces, merge it with another one, or return it to the system. That's why it is important to understand how memory allocation works and not to consider free() to be something that strict, because it is not. Otherwise it would be impossible to write a memory allocator, and you would have to stop and restart your program once you had used all your system memory, since you wouldn't be allowed to reuse a previously used pointer.
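As a rough illustration of the allocator's side of that contract, here is a minimal sketch of a fixed-size pool whose free list lives inside the freed blocks themselves. Everything in it (`toy_alloc`, `toy_free`, the `block` union, the pool sizes) is hypothetical and invented for this example; it is not how any particular malloc() implementation works.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy allocator: fixed-size blocks carved from a static pool.
 * A freed block's own bytes hold the link to the next free block. */
#define BLOCK_SIZE  32
#define BLOCK_COUNT 4

typedef union block {
    union block *next;                  /* meaningful only while the block is free */
    unsigned char payload[BLOCK_SIZE];  /* what the caller sees while it is in use */
} block;

static block pool[BLOCK_COUNT];
static block *free_list;
static int pool_ready;

/* Thread every block of the pool onto the free list, lowest address first. */
static void pool_init(void)
{
    free_list = NULL;
    for (size_t i = BLOCK_COUNT; i-- > 0; ) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
    pool_ready = 1;
}

void *toy_alloc(void)
{
    if (!pool_ready)
        pool_init();
    if (free_list == NULL)
        return NULL;                    /* pool exhausted */
    block *b = free_list;
    free_list = b->next;                /* reads bytes a caller may have owned earlier */
    return b->payload;
}

void toy_free(void *p)
{
    if (p == NULL)
        return;
    block *b = (block *)p;              /* payload sits at offset 0 of the union */
    b->next = free_list;                /* writes into the block being released */
    free_list = b;
}
```

After `toy_free(a)`, the first bytes of the block behind `a` hold allocator bookkeeping, and the next `toy_alloc()` hands the same block out again.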
Posted May 3, 2022 16:49 UTC (Tue)
by Tobu (subscriber, #24111)
[Link] (6 responses)
Posted May 4, 2022 8:34 UTC (Wed)
by wtarreau (subscriber, #51152)
[Link] (5 responses)
free(foo);
printf("Just released %p\n", foo);
which doesn't dereference the pointer, from:
free(foo);
printf("Just released %s\n", foo);
especially if the example is made a bit more contrived by just calling a debug(void *p) function that takes the pointer as an argument without telling whether it just uses its value or dereferences it?
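One way to sidestep the question entirely is to format the address while the object is still alive and only then release it. The sketch below assumes a hypothetical helper named `release_and_describe`; the C standard makes a pointer's value indeterminate once the object it pointed to has reached the end of its lifetime, so the formatting is done before the free() call.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write "Just released <address>" into buf, then free p.
 * Returns what snprintf() returned (the length the full message needs). */
int release_and_describe(void *p, char *buf, size_t buflen)
{
    /* The pointer value is read here, while the allocation is still live. */
    int n = snprintf(buf, buflen, "Just released %p", p);
    free(p);                            /* p must not be used after this point */
    return n;
}
```

The caller ends up with a printable string and never touches the pointer again after the release.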
Posted May 4, 2022 14:05 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (4 responses)
Provenance is what makes:
free(foo);
char* new_foo = malloc(1);
if (foo == new_foo) {
    // by golly, we got lucky.
    *foo = 1; // UB
}
That comparison is misleading due to provenance. It can be assumed to be false because `foo` is not allowed to *access* anything after that `free` even if its integer representation happens to be the same as `new_foo`. See the C and C++ papers by Paul McKenney on "pointer zap" for how to finally put provenance into the standard (instead of being something that implementers have had to craft to make sense of things as the languages have evolved).
Additionally, CHERI would show the folly of this code. C allows CHERI to exist. If you want to say "I don't care about CHERI", it'd be real nice if C would allow the code to have some marker that says "this code abuses pointer equality because we assume the target platform allows us to do this" so that any CHERI-like target can just say "this is broken" up front instead of waiting for whatever the optimizer does to finally trip up something in production.
As I said elsewhere: if you want to abuse C to be assembler for your target, it'd be real nice if that could be explicit instead of doing "I'm using C for my list of targets, damn C's portability goals" and leaving "fun" landmines for others to run over later.
Posted May 4, 2022 15:28 UTC (Wed)
by wtarreau (subscriber, #51152)
[Link] (3 responses)
My feeling over the last 10 years is that the people working on C are actually trying to kill it by making it purposely unsuitable to all the cases that made its success. Harder to use, harder to code without creating bugs, harder to read. How many programs break when upgrading gcc on a distro? A lot. The worst ones being those that break at runtime. This just shows that even if past expectations were wrong, they were based on something that made sense in a certain context and that was based on how a computer works. Now instead you have to imagine what a compiler developer thinks about the legitimacy of your use case and how much pain he wants to inflict on you to punish you for trying to do it. Not surprising to see that high wave of new languages emerging in such conditions!
Posted May 4, 2022 15:53 UTC (Wed)
by nybble41 (subscriber, #55106)
[Link] (2 responses)
The error here lies in thinking that C makes any claims about pointers holding *addresses*. The representation of pointer values is not defined by the standard. In this area the ease with which most compilers permit integer/pointer conversions beyond what the standard defines is an attractive nuisance; pointers are not just "integers which can be dereferenced". Pointers are abstract references to memory locations (objects or storage for objects) which have associated lifetimes and can behave differently based on whether or not they've been initialized. There may or may not be a "real machine" underneath, as C can be compiled for an abstract virtual machine (like WASM) or even interpreted. Even when targeting real hardware (e.g. CHERI) there is no guarantee that you can freely convert between pointers and integers, or treat the representation of a pointer as a plain memory address. Pointer provenance may be something only the compiler is aware of, or it may have some representation at runtime (via tagged pointers).
Really the rules for using pointers in C without triggering undefined behavior are not that different from the rules for references in Rust. C just doesn't offer any help in tracking whether the requirements have been *met*, where Rust requires the compiler to take on most of that task.
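The point about integer/pointer conversions can be made concrete with a small sketch. It assumes the implementation provides `uintptr_t`, which the C standard lists as optional; where it exists, the one thing guaranteed is that a valid `void *` converted to `uintptr_t` and back compares equal to the original. The function name `roundtrip_ok` is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: round-trip a pointer to a live object through uintptr_t.
 * Returns 1 if the restored pointer equals the original and still reads 42. */
int roundtrip_ok(void)
{
    int object = 42;
    void *original = &object;

    uintptr_t bits = (uintptr_t)original;   /* the numeric value is implementation-defined */
    void *restored = (void *)bits;          /* only the round trip itself is guaranteed */

    return restored == original && *(int *)restored == 42;
}
```

Nothing here says what `bits` looks like, nor that arithmetic on it yields a usable pointer; the object is still alive throughout, which is what keeps the final dereference valid.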
Posted May 4, 2022 17:23 UTC (Wed)
by wtarreau (subscriber, #51152)
[Link] (1 responses)
This type of crap is *exactly* what makes people abandon C. The compiler betrays the programmer based on rules which are stretched to their maximum extent, just to say that there was a tiny bit of possibility to interpret a standard for the sole purpose of "optimizing code" further, at the expense of reliability, trustability, reviewability and security.
I'm sorry but a compiler that validates an equality test of both types and values between two pointers and fails to see a change when one is overwritten is a moronic compiler, regardless of the language. You cannot simply trust anything produced by that crap at this point.
Posted May 4, 2022 18:01 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link]
I never gave a type for `foo`. I don't think it really matters what type `foo` is here (beyond allowing `1` to be assigned to its pointee type). Dereferencing is verboten after passing it to `free` regardless of its bitwise representation.
> just to say that there was a tiny bit of possibility to interpret a standard for the sole purpose of "optimizing code" further, at the expense of reliability, trustability, reviewability and security.
I don't know that "optimizing code" is the end goal per se. I think it's actually about making things more consistent across C's target platforms.
If you write `int i = other_int << 32;` and expect 0, one target may be happy. Another might give you `other_int` (let's say it encodes the operation's right-hand value in 5 bits with the register index in the other 3 because why not). Mandating either requires one of them to avoid using its built-in shifting operation. What do you want C to do here? Just say "implementation-defined" and leave porters to trip over this later? You *still* need to maintain the version that has actual meaning for both targets if you care.
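For reference, the portable version that checks the shift count first might look like the sketch below. The helper name `shl32` is hypothetical, and it uses an unsigned 32-bit operand rather than the `int` in the sentence above, because left-shifting signed values has extra pitfalls of its own. In standard C, shifting by a count greater than or equal to the width of the promoted left operand is undefined behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a 32-bit left shift that is defined for every count.
 * Counts of 32 or more yield 0 instead of undefined behavior. */
uint32_t shl32(uint32_t value, unsigned count)
{
    if (count >= 32)
        return 0;
    /* Shift in 64 bits so the operation stays defined whatever the width
     * of int on the target, then truncate back to 32 bits. */
    return (uint32_t)((uint64_t)value << count);
}
```

With this, `shl32(x, 32)` is 0 on every target, at the cost of a branch that a compiler may or may not be able to remove.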
Now, if you want to say "I'm coding for x86 and screw anything else", that's great! Let's tell the *compiler* that so it can understand your intent. But putting your thoughts into code and then telling a compiler "I've got C here" when you actually mean "I've got x86-assuming C here" is just bad communication.
I'd like to see some way to do that in the C language itself. Not a pragma. Not a compiler flag. Not a preprocessor guard using `__x86_64__`, but some actual honest-to-K&R C syntax that communicates your intent to the compiler. FWIW, I don't know that maintaining that preamble will be worth the work outside of the kernel's `arch/` directory, but hey, some people live there. I say that because you'll either have:
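- divergent code anyways to tell the compiler about each assumption set; or
- have to write the C standard code that requires you to check how much you're shifting by before doing it anyways.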
So, as said elsewhere in this thread, improving C is fine. But coding in C, breaking its rules, and then complaining that the compiler isn't playing fair is just not reconciling beliefs with reality.
Posted May 3, 2022 18:04 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
The pointer I got from malloc() for my 400 byte allocation, and the pointer malloc itself relies on for managing the heap after I free() it, are not actually the same pointer even though on a typical modern CPU they're the same magic number in a CPU register.
The pointer I had comes with provenance given to it by malloc(), it's a pointer to my 400 byte allocation. If I add 390 to it, that's a pointer to the last 10 bytes of the allocation, and we're all fine. But if I add -16 to it, that's not a pointer to anything.
The pointer used by the allocator uses different provenance, it's a pointer into the heap, and you can totally add -16 to it, because that's just a different pointer into the heap and as such fine.
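The user's side of that arithmetic can be sketched as follows. The function name `demo_tail` and the fill value are made up for illustration; the 400 and 390 figures come from the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo: fill the last 10 bytes of a 400-byte allocation.
 * Returns 1 on success, 0 on an unexpected result, -1 if malloc() fails. */
int demo_tail(void)
{
    unsigned char *p = malloc(400);
    if (p == NULL)
        return -1;

    unsigned char *tail = p + 390;  /* in bounds: points at the last 10 bytes */
    unsigned char *end  = p + 400;  /* one past the end: may be formed, not dereferenced */

    memset(tail, 0xAB, (size_t)(end - tail));
    int ok = (end - tail == 10) && tail[0] == 0xAB && tail[9] == 0xAB;

    /* Forming p - 16 here would be undefined behavior: it lies outside
     * the allocation this pointer was derived from. */

    free(p);
    return ok;
}
```

Only pointers from the start of the allocation up to one past its end may be formed from `p`; the allocator's own pointer into the heap follows different bounds.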
Posted May 3, 2022 20:02 UTC (Tue)
by ssokolow (guest, #94568)
[Link]
You may get an address reused with a malloc() / free() / malloc() sequence, but the pointer won't have the same provenance and won't be the same from the point of view of the abstract machine that defines the operational semantics of the language. The compiler will either know about malloc or offer a building block below it.
printf("Just released %p\n", foo);
printf("Just released %s\n", foo);
char* new_foo = malloc(1);
if (foo == new_foo) {
// by golly, we got lucky.
*foo = 1; // UB
}
- have to write the C standard code that requires you to check how much you're shifting by before doing it anyways.