
Mixing safe and unsafe

Posted Oct 29, 2025 13:36 UTC (Wed) by tialaramex (subscriber, #21167)
In reply to: Mixing safe and unsafe by matthias
Parent article: Fil-C: A memory-safe C implementation

The pointer example, while perhaps intended to be clarifying for C programmers, has two disadvantages that I think mean we should avoid it.

First, all pointer dereferences in Rust are unsafe. If you have a pointer named ptr, then *ptr, dereferencing the pointer, isn't allowed in safe Rust, full stop. So caring about whether the pointer is valid is always on you. Which leads us to...

Second, unlike C and C++, Rust doesn't care about the existence of invalid pointers. Safe Rust can make null pointers, dangling pointers, even just arbitrarily mint a nonsense pointer which claims it is a pointer to a Goose but is actually the word "HONK" in ASCII used as an address and marked up as a pointer-to-Goose. This is fine in safe Rust and guaranteed not to cause UB, so long as nobody dereferences the pointer, which they cannot do in safe Rust.
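A minimal sketch of that point, using only safe code. The Goose type and the mint_pointers function are made up for illustration, and ptr::without_provenance assumes Rust 1.84 or later:

```rust
use std::ptr;

// Hypothetical type, named after the example in the comment.
pub struct Goose {
    pub honks: u32,
}

/// Builds three pointers that must never be dereferenced, with no `unsafe`.
pub fn mint_pointers() -> (*const Goose, *const Goose, *const Goose) {
    // A null pointer-to-Goose.
    let null: *const Goose = ptr::null();

    // A dangling pointer: the Box is freed at the end of the inner block,
    // but the raw pointer value lives on.
    let dangling: *const Goose = {
        let boxed = Box::new(Goose { honks: 1 });
        &*boxed as *const Goose
    };

    // The bytes of "HONK" used as an address. `without_provenance` makes a
    // pointer with that address and no permission to access any memory.
    let honk_addr = u32::from_be_bytes(*b"HONK") as usize;
    let nonsense: *const Goose = ptr::without_provenance(honk_addr);

    (null, dangling, nonsense)
}
```

Creating, copying, and comparing these values is all safe; only dereferencing any of them would need unsafe, and that is where the UB would be.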

For C programmers this doesn't make sense, because in C there are three categories - pointers to things, which you can dereference; pointers one past things, which are allowed to exist but must never be dereferenced, and all other pointers which are invalid and no guarantees about them are provided by the language at all. So the intuitions are very different.



Mixing safe and unsafe

Posted Oct 29, 2025 14:13 UTC (Wed) by matthias (subscriber, #94967) [Link] (7 responses)

I do not see the difference between C and (unsafe) rust. In unsafe rust, there are the same categories(*) of pointers that there are in C.
The pointer can point to things, it can point one past the end of an array (used in the slice iterator), or it can just contain garbage and must never be dereferenced.

The main difference wrt. raw pointers between the languages is that in rust you have to use unsafe if you want to dereference a pointer. Rust has adopted the C++ memory model, i.e., the rules regarding atomic accesses and how they order wrt. raw pointer accesses. They actually refer to the C++ semantics for this. Rust does not yet have pointer provenance, but this is under discussion and might also end up being quite similar to C. All in all, raw pointers work very much the same.

Of course, this is a totally different game when it comes to references, where the compiler enforces strict invariants regarding validity.

(*) Probably more than three categories, e.g. pointers to uninitialized memory, where you are only allowed to write but not to read. Null pointers are also somewhat special, as you are allowed to compare them.

Mixing safe and unsafe

Posted Oct 29, 2025 21:07 UTC (Wed) by tialaramex (subscriber, #21167) [Link] (6 responses)

So, firstly, unlike C and C++ Rust does have pointer provenance, you're behind the news. In January Rust 1.84 shipped its provenance APIs and all the associated documentation went from speculative to de facto how Rust works. You can have the "strict" provenance APIs in which we agree that pointers have provenance, we get excellent performance but some tricks are impossible - or you can ask for "exposed" provenance which is also what WG14 is proposing might be C's future with its PNVI-ae-udi (Provenance Not Via Integers, Address Exposed, User DIsambiguates) model.
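For the "exposed" flavour, a minimal sketch of a pointer-integer-pointer round trip (assuming Rust 1.84 or later; the function name roundtrip_exposed is made up for illustration):

```rust
use std::ptr;

/// Round-trips a pointer through an integer using the exposed-provenance APIs.
pub fn roundtrip_exposed(value: &mut u32) -> u32 {
    let p: *mut u32 = value;
    // Turn the pointer into an integer and mark its provenance as exposed.
    let addr: usize = p.expose_provenance();
    // Rebuild a pointer from the bare integer; it may pick up any
    // previously exposed provenance that matches the address.
    let q: *mut u32 = ptr::with_exposed_provenance_mut(addr);
    // SAFETY: `q` has the address of `*value`, whose provenance was exposed
    // above, and nothing else touches `*value` while we write and read.
    unsafe {
        q.write(7);
        q.read()
    }
}
```

Under the strict APIs the same code would keep the pointer as a pointer throughout and use addr(), with_addr() or map_addr() rather than going through a bare usize.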

That's why I was so concrete about things. In Rust they are now nailed down. The compilers are still crap, so if you bang on this hard it'll miscompile, but that's true in C and C++ anyway, it's just harder to prove you were miscompiled because often you'll write UB in those languages and the compiler people will use that as an excuse. But that's not a language issue, that's a compiler QOI issue and I expect over the next 3-5 years it'll improve, the way the LLVM's handling of aliasing improved when Rust began banging on it and filing bugs.

You're wrong about the special-ness of one-past-the-end in Rust. It's not special in Rust, it's just one past the end. Sixteen past the end, or eight before the start, or any other value is fine too. Like I said, in Rust pointers actually are the imaginary tuple (addr, addr_space, provenance) and we're always allowed to get addr out by definition. In C and C++ whether that can work is hotly contested, see the recent "Pointer zap" LWN article for a taste of how insane C might be here and what its committee members want to do about that.

Rust distinguishes validity for read versus write, and provides explicit methods on pointer types, so the correct way to initialize that uninitialized memory pointed to by ptr that's the right shape for a Goose, is the (unsafe obviously) ptr.write(some_goose); and yes, that pointer was valid for writing only up until that moment, though having performed a write it's now valid for reads too.

If you're thinking "I would use a dereference" Bzzt, that's going to be a problem. unsafely *ptr = some_goose; will try to destroy the previous goose, but there is no goose, just uninitialized memory so that's UB.
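A minimal sketch of the difference. Goose here is a hypothetical type that owns a String, so it has drop glue; hatch is an illustrative name:

```rust
use std::mem::MaybeUninit;

// Hypothetical type with drop glue (it owns a String).
pub struct Goose {
    pub name: String,
}

pub fn hatch(name: &str) -> Goose {
    // Uninitialized storage with the right size and alignment for a Goose.
    let mut slot = MaybeUninit::<Goose>::uninit();
    let ptr: *mut Goose = slot.as_mut_ptr();

    // SAFETY: `ptr` is aligned and valid for writes. `write` neither reads
    // nor drops whatever bytes were there before.
    unsafe { ptr.write(Goose { name: name.to_owned() }) };

    // Writing `*ptr = Goose { .. }` here instead would first drop the "old"
    // Goose, i.e. run String's destructor on uninitialized bytes: UB.

    // SAFETY: the slot was fully initialized by the write above.
    unsafe { slot.assume_init() }
}
```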

Mixing safe and unsafe

Posted Oct 29, 2025 23:24 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

> You're wrong about the special-ness of one-past-the-end in Rust. It's not special in Rust, it's just one past the end. Sixteen past the end, or eight before the start, or any other value is fine too.

Pointers one past the end are indeed special in Rust, but you are correct that they're not that special. The unsafe method <*T>::offset() (and similar methods) may return a pointer into the allocation, or a pointer one past the end, but it is instant UB to ask it for anything else (or to call it on a dangling or otherwise invalid pointer, except a pointer that was already one past the end). This is materially identical to the validity requirements of pointer arithmetic in C.

Rust also provides <*T>::wrapping_offset() and similar safe methods, which have no validity requirements at the callsite,* but the documentation notes that they may be optimized more poorly than their unsafe counterparts. This is presumably a result of LLVM, hardware, or both preferentially optimizing for C semantics.
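A small sketch contrasting the two families, using add/wrapping_add (the unsigned siblings of offset/wrapping_offset). The function name is made up, and addr() assumes Rust 1.84 or later:

```rust
pub fn one_past_end_and_back(data: &[u8; 4]) -> (usize, u8) {
    let start = data.as_ptr();

    // SAFETY: offset 4 is one past the end of the 4-element array, which
    // `add` permits. Asking `add` for anything further out would be UB.
    let end = unsafe { start.add(4) };

    // Safe: `wrapping_add` has no in-bounds requirement, so a pointer
    // 16 elements past the end is fine to compute (but not to dereference).
    let far = start.wrapping_add(20);

    // Stepping back in bounds gives a pointer still derived from `data`.
    let back = far.wrapping_sub(18);

    // SAFETY: `back` points at data[2] and carries the provenance of `start`.
    let third = unsafe { back.read() };

    (end.addr() - start.addr(), third)
}
```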

Since pointers don't implement Add or other arithmetic traits, there is no strong basis for claiming that either one of these APIs is the "primary" means of performing pointer arithmetic in unsafe Rust (safe Rust can only use the wrapping_foo() methods, but then safe Rust cannot go on to dereference the pointers, so it's a bit of a moot point).

* Obviously, the pointer will eventually need to be valid when you dereference it, and that includes strict provenance.

Mixing safe and unsafe

Posted Oct 30, 2025 9:20 UTC (Thu) by matthias (subscriber, #94967) [Link] (4 responses)

> In January Rust 1.84 shipped its provenance APIs and all the associated documentation went from speculative to de facto how Rust works.

Thanks. I somehow missed that.

>You're wrong about the special-ness of one-past-the-end in Rust. It's not special in Rust, it's just one past the end.

The provenance documentation says it is different from 16 past the end, as it is still inside provenance. Just the same as in C.

> If you're thinking "I would use a dereference" Bzzt, that's going to be a problem. unsafely *ptr = some_goose; will try to destroy the previous goose, but there is no goose, just uninitialized memory so that's UB.

Only if goose implements drop. But then you are (implicitly) creating a &mut to the uninitialized memory, which is indeed UB. And usually the drop handler will read the memory, which is also UB. This is not really a difference in pointer semantics but more a difference on how the assignment operator works. I do agree that ptr.write() should be used, if there is no valid object that you want to drop, even if the type does not implement drop. It is much more obvious that the programmer wants to do a write and not an assignment this way.

Of course there are differences in pointer handling between the languages. I think of pointer comparisons, which can be UB in C(++), while they are part of safe rust and thus must not cause any UB. These are details that of course need to be accounted for when writing actual code. However, when thinking of raw pointers in rust, they are much more similar to C pointers than to anything else in the rust language. They always feel somehow alien in the rust language. So having an expressive API with methods like offset and write is a good thing. Of course, it is still unsafe, but less error prone.

Mixing safe and unsafe

Posted Oct 30, 2025 13:15 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (3 responses)

Where are you seeing the claim that one-past-the-end is privileged in this way? I can't see it in anything I reviewed, but I only briefly flipped past because I'm on a lunch break.

Likewise I didn't find a claim that *ptr = some_goose; is sound when Goose is not Drop. I can see in principle how this could be arranged, but I couldn't think of any reason I would want it, and since we're unsafe if it isn't legal the compiler isn't going to necessarily point out the problem, so if it is legal I want a URL to drop in an adjacent safety comment so people know why I thought it was OK to write this.

Mixing safe and unsafe

Posted Oct 30, 2025 13:38 UTC (Thu) by matthias (subscriber, #94967) [Link] (2 responses)

From the provenance section of https://doc.rust-lang.org/std/ptr/index.html :
> It is undefined behavior to access memory through a pointer that does not have provenance over that memory. Note that a pointer “at the end” of its provenance is not actually outside its provenance, it just has 0 bytes it can load/store.

"at the end" refers to what in C would be called one past the end. If it pointed to the last element, the size would not be zero, so they really mean one past the end.

I am not actually sure whether there is a real difference to being outside of the provenance, as you cannot load or store anyway. In C, there is a difference as comparison operators take provenance into account. In rust, comparison operators are only comparing the address. So there might not be a real difference.

From the documentation of pointer https://doc.rust-lang.org/std/primitive.pointer.html :
> Storing through a raw pointer using *ptr = data calls drop on the old value, so write must be used if the type has drop glue and memory is not already initialized - otherwise drop would be called on the uninitialized memory.

It is not explicitly stated that you are allowed to store non-drop values in this way. However, if you were not allowed to, this would be phrased differently. I would still use write for uninitialized memory, as it looks cleaner.
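Read that way, a store of a type without drop glue would look like the sketch below. This rests on the interpretation of the quoted documentation given above, not on an explicit guarantee, and the function name is made up:

```rust
use std::mem::{needs_drop, MaybeUninit};

pub fn store_plain_value() -> (bool, u32) {
    let mut slot = MaybeUninit::<u32>::uninit();
    let ptr: *mut u32 = slot.as_mut_ptr();

    // SAFETY (assumption, per the quoted docs): u32 has no drop glue, so
    // this assignment is a plain store and never touches the old bytes.
    unsafe { *ptr = 5 };

    // SAFETY: the slot was initialized by the store above.
    let value = unsafe { slot.assume_init() };

    // `needs_drop` reports whether dropping the type could run any code.
    (needs_drop::<u32>(), value)
}
```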

Mixing safe and unsafe

Posted Oct 30, 2025 14:07 UTC (Thu) by notriddle (subscriber, #130608) [Link]

> I am not actually sure whether there is a real difference to being outside of the provenance, as you cannot load or store anyway.

You can subtract from it to increase the size. The backwards-iterator works that way, subtracting from the pointer and then reading it.
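A minimal sketch of that pattern (not the standard library's actual iterator code; the function name is made up):

```rust
pub fn collect_backwards(items: &[u32]) -> Vec<u32> {
    let start = items.as_ptr();
    // SAFETY: one past the end of the slice is an allowed result for `add`.
    let mut cursor = unsafe { start.add(items.len()) };
    let mut out = Vec::with_capacity(items.len());

    while cursor != start {
        // SAFETY: `cursor` is strictly after `start`, so stepping back one
        // element lands inside the slice, where reading is valid.
        unsafe {
            cursor = cursor.sub(1);
            out.push(cursor.read());
        }
    }
    out
}
```

The one-past-the-end pointer can access zero bytes where it stands, but because it kept its provenance over the slice, each step backwards gives a pointer that can be read.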

Mixing safe and unsafe

Posted Oct 30, 2025 16:24 UTC (Thu) by tialaramex (subscriber, #21167) [Link]

Thanks for those excerpts.

I agree the terminology is confusing to a C programmer, who is used to thinking of pointers as always pointing to at least one whole byte of RAM, because Rust's pointers (even valid ones) don't necessarily do that (we can ask for a pointer to a single empty tuple, or indeed to an array of 126 empty tuples, both of these pointers aren't pointing to even a single byte of RAM because of course those tuples are zero size, but it is legal to point at them for whatever that's worth...).

I am quite sure that zero length writes are legal for arbitrary pointers, for example Rust considers that trying to store the empty tuple () to the null pointer is a reasonable thing to (unsafely) insist on doing, because it'll just evaporate - the compiler is guaranteed to realise that () is zero bytes wide, and writing zero bytes is not actually a write at all. So I think if there even is a distinction it's a distinction which doesn't make a difference - that "one past the end" pointer can correctly write zero bytes, but so could a "two past the end" pointer.
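A cautious sketch of a zero-sized write. It uses a dangling but non-null, aligned pointer rather than null, since that case does not depend on which version of the std::ptr documentation you read; the function name is made up:

```rust
use std::mem::size_of;
use std::ptr::NonNull;

pub fn zero_sized_demo() -> (usize, usize) {
    // A well-aligned, non-null pointer that points at no real memory.
    let p: *mut [(); 126] = NonNull::<[(); 126]>::dangling().as_ptr();

    // SAFETY (assumption, from the std::ptr safety section): an access of
    // size zero is valid through an aligned non-null pointer, so writing
    // 126 empty tuples stores nothing at all.
    unsafe { p.write([(); 126]) };

    (size_of::<()>(), size_of::<[(); 126]>())
}
```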

I am confident the provenance doesn't evaporate when we do this because that's what the (strict provenance API) map_addr trick relies on - we can take a valid pointer, change the address bits in some reversible way and we get an invalid pointer we mustn't dereference, but then later we can reverse the operation on that pointer, and now once again we've got a valid pointer. Flag bits hidden in pointers and some other fun tricks are thus legal in Rust's strict provenance while in C or C++ they're only potentially legalised via a fairly fraught pointer-integer-pointer roundtrip that Rust wanted to avoid. There should be no difference to the resulting machine code after optimisation, but good luck to any tools trying to verify that it's correct in C or C++...
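A minimal sketch of the map_addr trick (assuming Rust 1.84 or later; the function name and the choice of the lowest bit as the flag are illustrative):

```rust
pub fn tag_roundtrip(value: &mut u64) -> (bool, u64) {
    let p: *mut u64 = value;
    // Assumption: u64 is at least 2-byte aligned, so the low bit is free.
    debug_assert_eq!(p.addr() & 1, 0);

    // Hide a flag in the low bit. The result keeps `p`'s provenance but is
    // misaligned, so it must not be dereferenced in this state.
    let tagged = p.map_addr(|a| a | 1);
    let has_tag = tagged.addr() & 1 == 1;

    // Reverse the change: same address as `p`, same provenance as `p`.
    let untagged = tagged.map_addr(|a| a & !1);

    // SAFETY: `untagged` is equivalent to `p`, which is valid for reads.
    let v = unsafe { untagged.read() };
    (has_tag, v)
}
```

No integer-to-pointer cast appears anywhere, which is what lets tools track the provenance through the whole sequence.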

That drop glue statement does seem pretty clear - not sure how I missed that and I agree both that: In practice I'd write a write call to signify my intent and that going by that statement it is legal to use the storing operation instead if you could show that Goose doesn't impl Drop.

Mixing safe and unsafe

Posted Oct 29, 2025 15:03 UTC (Wed) by chris_se (subscriber, #99706) [Link]

> For C programmers this doesn't make sense, because in C there are three categories - pointers to things, which you can dereference; pointers one past things, which are allowed to exist but must never be dereferenced, and all other pointers which are invalid and no guarantees about them are provided by the language at all. So the intuitions are very different.

While you are technically correct with regards to the standard, in practice most C programmers have used pointers more like how unsafe Rust treats pointers. There is a LOT of code out there that steals some bits from pointers to store some additional information (especially in "lock-free" code), and technically that's UB in C if this is stored in pointer variables directly (AFAIK it would be OK if it were stored in uintptr_t, but next to nobody does that).

Also there's a lot of code out there where a void * can be used as a context, and some people just use it to store integers (because no pointer to actual data is needed) - again, technically UB, but there's a TON of C code out there that does this.

So I see what Rust does more like already codifying the current state of affairs in C, while the official C standard still says that all that code out there is technically UB. And the main reason in C for this is that C can in principle run on all sorts of exotic systems where this might in fact break. But a lot of C code out there still makes a lot of implicit assumptions about the environment (e.g. that a pointer is nothing more than an integer in the end) that Rust has just gone on and codified.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds