
Mixing safe and unsafe

Posted Oct 29, 2025 11:42 UTC (Wed) by matthias (subscriber, #94967)
In reply to: Mixing safe and unsafe by epa
Parent article: Fil-C: A memory-safe C implementation

> Of course, the burden is on me to ensure those blocks are safe and don’t have any undefined behaviour.

This is not the only problem. Once you have unsafe blocks, you have a contract between safe and unsafe code, as an unsafe block will for sure rely on invariants that are hard or impossible to check at the boundary. Say you have an unsafe block that traverses a linked list. If you turn off the runtime checks inside the block, you rely on the promise that all pointers in the linked list are valid. Otherwise you immediately have undefined behaviour.

Even in rust this is an issue, and if you have unsafe blocks you have to be very careful that the unsafety is contained, usually at the module boundary. Changing an integer is usually considered safe, but if the integer encodes the length of a Vec, then this is very unsafe, as the unsafe code that implements indexing into the Vec relies on this integer being correct. This is solved by not providing any (safe) functions that can change this integer directly. The result is that safe code that interacts with Vec cannot cause undefined behaviour. However, safe code within the Vec module most definitely can. You do not need an unsafe block to change the integer encoding the length. This is described quite nicely in the first chapter of the nomicon[1] (the guide to unsafe rust). You can read this introductory chapter even if you do not know rust.
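A minimal sketch of that containment, using a hypothetical `TinyVec` type invented for illustration (this is not how the standard library's Vec is implemented). The `len` field is private, so only code inside the module can break the invariant that the unsafe read relies on.

```rust
mod tiny_vec {
    /// Hypothetical fixed-capacity vector, for illustration only.
    pub struct TinyVec {
        buf: [u32; 4],
        // Invariant: len <= buf.len(). The field is private, so only code
        // inside this module can change it.
        len: usize,
    }

    impl TinyVec {
        pub fn new() -> Self {
            TinyVec { buf: [0; 4], len: 0 }
        }

        /// Safe: keeps the invariant by refusing to grow past capacity.
        pub fn push(&mut self, value: u32) -> bool {
            if self.len == self.buf.len() {
                return false;
            }
            self.buf[self.len] = value;
            self.len += 1;
            true
        }

        /// Safe API whose soundness rests entirely on the invariant.
        pub fn last(&self) -> Option<u32> {
            if self.len == 0 {
                return None;
            }
            // SAFETY: 0 < len <= buf.len(), so len - 1 is in bounds.
            Some(unsafe { *self.buf.get_unchecked(self.len - 1) })
        }

        // A function like this contains no `unsafe` at all, yet adding it
        // to the module would make `last` unsound:
        // pub fn set_len(&mut self, n: usize) { self.len = n; }
    }
}

fn main() {
    let mut v = tiny_vec::TinyVec::new();
    v.push(7);
    println!("{:?}", v.last());
    // v.len = 100; // rejected outside the module: field `len` is private
}
```

The commented-out `set_len` is the point: the compiler would accept it without complaint, so the audit boundary is the module, not the unsafe block.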

With Fil-C this containment of unsafe is just impossible. In C you can always change the contents of a variable by casting it to an array of bytes. So you cannot rely on any invariants and have to check when you use a pointer. Or you have to verify that the unsafe block is never called with violated invariants, which basically forces you to verify all the code, not only the unsafe block.

[1] https://doc.rust-lang.org/nomicon/meet-safe-and-unsafe.html



Mixing safe and unsafe

Posted Oct 29, 2025 13:36 UTC (Wed) by tialaramex (subscriber, #21167) [Link] (9 responses)

The pointer example, while perhaps intended to be clarifying for C programmers, has two disadvantages that I think mean we should avoid it.

First, all pointer dereferences in Rust are unsafe. If you have a pointer named ptr, then *ptr, dereferencing the pointer, isn't allowed in safe Rust, full stop. So caring about whether the pointer is valid is always on you. Which leads us to...

Second, unlike C and C++ Rust doesn't care about the existence of invalid pointers. Safe Rust can make null pointers, dangling pointers, even just arbitrarily mint a nonsense pointer which claims it is a pointer to a Goose but is actually the word "HONK" in ASCII as an address just marked up as a pointer-to-Goose. This is fine in safe Rust and guaranteed not to cause UB, so long as nobody dereferences the pointer which they cannot do in safe Rust.

For C programmers this doesn't make sense, because in C there are three categories - pointers to things, which you can dereference; pointers one past things, which are allowed to exist but must never be dereferenced, and all other pointers which are invalid and no guarantees about them are provided by the language at all. So the intuitions are very different.
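The Rust side of that contrast can be sketched as follows. The `Goose` type and the helper `honk_pointer` are made up for illustration; the point is only that nothing here needs `unsafe` until someone tries to dereference.

```rust
struct Goose {
    honks: u32,
}

/// Entirely safe: mints a pointer whose address is the ASCII bytes "HONK".
fn honk_pointer() -> *const Goose {
    let addr = u32::from_be_bytes(*b"HONK") as usize;
    addr as *const Goose
}

fn main() {
    let nonsense = honk_pointer();
    let null: *const Goose = std::ptr::null();
    let dangling: *const Goose = {
        let g = Goose { honks: 3 };
        println!("a real goose honks {} times", g.honks);
        &g as *const Goose
    }; // `g` is gone here, so the pointer dangles

    // Still safe: printing, comparing and null-testing only look at the address.
    println!("{:p} {:p} {}", nonsense, dangling, null.is_null());

    // This is where safe Rust stops:
    // let n = (*nonsense).honks; // error[E0133]: dereference of raw pointer is unsafe
}
```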

Mixing safe and unsafe

Posted Oct 29, 2025 14:13 UTC (Wed) by matthias (subscriber, #94967) [Link] (7 responses)

I do not see the difference between C and (unsafe) rust. In unsafe rust, there are the same categories(*) of pointers that there are in C.
The pointer can point to things, it can point one past the end of an array (used in the slice iterator), or it can just contain garbage and must never be dereferenced.

The main difference wrt. raw pointers between the languages is that in rust you have to use unsafe if you want to dereference a pointer. Rust has adopted the C++ memory model, i.e., the rules regarding atomic accesses and how they order wrt. raw pointer accesses. They actually refer to the C++ semantics for this. Rust does not yet have pointer provenance, but this is under discussion and might end up being also quite similar to C. All in all, raw pointers work very much the same.

Of course, this is a totally different game when it comes to references, where the compiler enforces strict invariants regarding validity.

(*) Probably more than three categories, e.g. pointers to uninitialized memory, where you are only allowed to write but not to read. Null pointers are also somewhat special, as you are allowed to compare them.

Mixing safe and unsafe

Posted Oct 29, 2025 21:07 UTC (Wed) by tialaramex (subscriber, #21167) [Link] (6 responses)

So, firstly, unlike C and C++ Rust does have pointer provenance, you're behind the news. In January Rust 1.84 shipped its provenance APIs and all the associated documentation went from speculative to de facto how Rust works. You can have the "strict" provenance APIs in which we agree that pointers have provenance, we get excellent performance but some tricks are impossible - or you can ask for "exposed" provenance which is also what WG14 is proposing might be C's future with its PNVI-ae-udi (Provenance Not Via Integers, Address Exposed, User DIsambiguates) model.

That's why I was so concrete about things. In Rust they are now nailed down. The compilers are still crap, so if you bang on this hard it'll miscompile, but that's true in C and C++ anyway, it's just harder to prove you were miscompiled because often you'll write UB in those languages and the compiler people will use that as an excuse. But that's not a language issue, that's a compiler QOI issue and I expect over the next 3-5 years it'll improve, the way LLVM's handling of aliasing improved when Rust began banging on it and filing bugs.

You're wrong about the special-ness of one-past-the-end in Rust. It's not special in Rust, it's just one past the end. Sixteen past the end, or eight before the start, or any other value is fine too. Like I said, in Rust pointers actually are the imaginary tuple (addr, addr_space, provenance) and we're always allowed to get addr out by definition. In C and C++ whether that can work is hotly contested, see the recent "Pointer zap" LWN article for a taste of how insane C might be here and what its committee members want to do about that.

Rust distinguishes validity for read versus write, and provides explicit methods on pointer types, so the correct way to initialize that uninitialized memory pointed to by ptr that's the right shape for a Goose, is the (unsafe obviously) ptr.write(some_goose); and yes, that pointer was valid for writing only up until that moment, though having performed a write it's now valid for reads too.

If you're thinking "I would use a dereference" Bzzt, that's going to be a problem. unsafely *ptr = some_goose; will try to destroy the previous goose, but there is no goose, just uninitialized memory so that's UB.
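As a sketch of the difference between the two stores, assuming a hypothetical `Goose` that owns a `String` (so it has drop glue) and a made-up helper `hatch`:

```rust
use std::mem::MaybeUninit;

struct Goose {
    name: String, // String has drop glue, so Goose does too
}

fn hatch(name: &str) -> Goose {
    let mut slot = MaybeUninit::<Goose>::uninit();
    let ptr: *mut Goose = slot.as_mut_ptr();
    unsafe {
        // SAFETY: ptr is non-null, aligned and valid for writes. write()
        // neither reads nor drops whatever bytes were there before.
        ptr.write(Goose { name: name.to_owned() });
        // Using `*ptr = Goose { .. }` for that first store would instead run
        // the destructor of a Goose that was never there: UB.

        // The memory is initialized now, so plain assignment is fine: it
        // drops the first goose and stores the second.
        *ptr = Goose { name: format!("{name} II") };

        // SAFETY: the slot was initialized above.
        slot.assume_init()
    }
}

fn main() {
    println!("{}", hatch("Greta").name);
}
```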

Mixing safe and unsafe

Posted Oct 29, 2025 23:24 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

> You're wrong about the special-ness of one-past-the-end in Rust. It's not special in Rust, it's just one past the end. Sixteen past the end, or eight before the start, or any other value is fine too.

Pointers one past the end are indeed special in Rust, but you are correct that they're not that special. The unsafe method <*T>::offset() (and similar methods) may return a pointer into the allocation, or a pointer one past the end, but it is instant UB to ask it for anything else (or to call it on a dangling or otherwise invalid pointer, except a pointer that was already one past the end). This is materially identical to the validity requirements of pointer arithmetic in C.

Rust also provides <*T>::wrapping_offset() and similar safe methods, which have no validity requirements at the callsite,* but the documentation notes that they may be optimized more poorly than their unsafe counterparts. This is presumably a result of LLVM, hardware, or both preferentially optimizing for C semantics.

Since pointers don't implement Add or other arithmetic traits, there is no strong basis for claiming that either one of these APIs is the "primary" means of performing pointer arithmetic in unsafe Rust (safe Rust can only use the wrapping_foo() methods, but then safe Rust cannot go on to dereference the pointers, so it's a bit of a moot point).

* Obviously, the pointer will eventually need to be valid when you dereference it, and that includes strict provenance.
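A small sketch of the two flavours of pointer arithmetic, using `add`, `wrapping_add` and `wrapping_sub` (the unsigned siblings of `offset` and `wrapping_offset`). The function names `sum` and `last_via_detour` are invented for this example.

```rust
/// Sums a slice with the unsafe `add`, never going beyond one past the end.
fn sum(data: &[u8]) -> u32 {
    let mut p = data.as_ptr();
    // SAFETY: an offset of exactly data.len() is one past the end, which is allowed.
    let end = unsafe { p.add(data.len()) };
    let mut total = 0u32;
    while p != end {
        // SAFETY: p != end, so p is in bounds and p + 1 is at most `end`.
        unsafe {
            total += u32::from(*p);
            p = p.add(1);
        }
    }
    total
}

/// Walks far out of bounds with the safe wrapping methods, then comes back.
fn last_via_detour(data: &[u8]) -> Option<u8> {
    if data.is_empty() {
        return None;
    }
    let start = data.as_ptr();
    // Safe to compute, but this pointer must never be dereferenced.
    // With the unsafe `add`, computing it would already be UB.
    let far = start.wrapping_add(data.len() + 16);
    // Undo the detour: this is start + len - 1, the last element.
    let back = far.wrapping_sub(16 + 1);
    // SAFETY: `back` is in bounds again and still derived from `data`.
    Some(unsafe { *back })
}

fn main() {
    let data = [10u8, 20, 30, 40];
    println!("{} {:?}", sum(&data), last_via_detour(&data));
}
```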

Mixing safe and unsafe

Posted Oct 30, 2025 9:20 UTC (Thu) by matthias (subscriber, #94967) [Link] (4 responses)

> In January Rust 1.84 shipped its provenance APIs and all the associated documentation went from speculative to de facto how Rust works.

Thanks. I somehow missed that.

> You're wrong about the special-ness of one-past-the-end in Rust. It's not special in Rust, it's just one past the end.

The provenance documentation says it is different from 16 past the end, as it is still inside provenance. Just the same as in C.

> If you're thinking "I would use a dereference" Bzzt, that's going to be a problem. unsafely *ptr = some_goose; will try to destroy the previous goose, but there is no goose, just uninitialized memory so that's UB.

Only if goose implements drop. But then you are (implicitly) creating a &mut to the uninitialized memory, which is indeed UB. And usually the drop handler will read the memory, which is also UB. This is not really a difference in pointer semantics but more a difference on how the assignment operator works. I do agree that ptr.write() should be used, if there is no valid object that you want to drop, even if the type does not implement drop. It is much more obvious that the programmer wants to do a write and not an assignment this way.

Of course there are differences in pointer handling between the languages. I think of pointer comparisons, which can be UB in C(++), while they are part of safe rust and thus must not cause any UB. These are details that of course need to be accounted for when writing actual code. However, when thinking of raw pointers in rust, they are much more similar to C pointers than to anything else in the rust language. They always feel somehow alien in the rust language. So having an expressive API with methods like offset and write is a good thing. Of course, it is still unsafe, but less error prone.
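To illustrate the comparison point: in C, a relational comparison such as `p < end` between pointers into unrelated objects is undefined, whereas in Rust the same test is an ordinary safe operation. A minimal sketch, with a made-up helper `in_range`:

```rust
/// Safe Rust: comparing raw pointers never dereferences them, so it
/// needs no `unsafe`, even for pointers into unrelated objects.
fn in_range(p: *const i32, start: *const i32, end: *const i32) -> bool {
    start <= p && p < end
}

fn main() {
    let a: [i32; 3] = [1, 2, 3];
    let b: i32 = 7;
    // `as_ptr_range` yields the start pointer and the one-past-the-end pointer.
    let range = a.as_ptr_range();
    println!("{}", in_range(&a[1], range.start, range.end));
    println!("{}", in_range(&b, range.start, range.end));
}
```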

Mixing safe and unsafe

Posted Oct 30, 2025 13:15 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (3 responses)

Where are you seeing the claim that one-past-the-end is privileged in this way? I can't see it in anything I reviewed, but I only briefly flipped past because I'm on a lunch break.

Likewise I didn't find a claim that *ptr = some_goose; is sound when Goose is not Drop. I can see in principle how this could be arranged, but I couldn't think of any reason I would want it, and since we're unsafe if it isn't legal the compiler isn't going to necessarily point out the problem, so if it is legal I want a URL to drop in an adjacent safety comment so people know why I thought it was OK to write this.

Mixing safe and unsafe

Posted Oct 30, 2025 13:38 UTC (Thu) by matthias (subscriber, #94967) [Link] (2 responses)

From the provenance section of https://doc.rust-lang.org/std/ptr/index.html :
> It is undefined behavior to access memory through a pointer that does not have provenance over that memory. Note that a pointer “at the end” of its provenance is not actually outside its provenance, it just has 0 bytes it can load/store.

"at the end " refers to what in C would be called one past the end. If it would point to the last element, the size would not be zero, so they really mean one past the end.

I am not actually sure whether there is a real difference to being outside of the provenance, as you cannot load or store anyway. In C, there is a difference as comparison operators take provenance into account. In rust, comparison operators are only comparing the address. So there might not be a real difference.

From the documentation of pointer https://doc.rust-lang.org/std/primitive.pointer.html :
> Storing through a raw pointer using *ptr = data calls drop on the old value, so write must be used if the type has drop glue and memory is not already initialized - otherwise drop would be called on the uninitialized memory.

It is not explicitly stated that you are allowed to store non-drop values in this way. However, if you were not allowed to, this would be phrased differently. I still would use write for uninitialized memory, as it looks cleaner.

Mixing safe and unsafe

Posted Oct 30, 2025 14:07 UTC (Thu) by notriddle (subscriber, #130608) [Link]

> I am not actually sure whether there is a real difference to being outside of the provenance, as you cannot load or store anyway.

You can subtract from it to increase the size. The backwards-iterator works that way, subtracting from the pointer and then reading it.

Mixing safe and unsafe

Posted Oct 30, 2025 16:24 UTC (Thu) by tialaramex (subscriber, #21167) [Link]

Thanks for those excerpts.

I agree the terminology is confusing to a C programmer, who is used to thinking of pointers as always pointing to at least one whole byte of RAM, because Rust's pointers (even valid ones) don't necessarily do that (we can ask for a pointer to a single empty tuple, or indeed to an array of 126 empty tuples, both of these pointers aren't pointing to even a single byte of RAM because of course those tuples are zero size, but it is legal to point at them for whatever that's worth...).

I am quite sure that zero length writes are legal for arbitrary pointers, for example Rust considers that trying to store the empty tuple () to the null pointer is a reasonable thing to (unsafely) insist on doing, because it'll just evaporate - the compiler is guaranteed to realise that () is zero bytes wide, and writing zero bytes is not actually a write at all. So I think if there even is a distinction it's a distinction which doesn't make a difference - that "one past the end" pointer can correctly write zero bytes, but so could a "two past the end" pointer.
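A cautious sketch of a zero-sized access. It deliberately uses an aligned dangling pointer from `NonNull::dangling()` rather than null or an arbitrary address, so it does not test the stronger claim above; the function name `unit_roundtrip` is invented here.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Writes and reads 126 empty tuples through a pointer that points at no memory.
fn unit_roundtrip() -> usize {
    let p: *mut [(); 126] = NonNull::<[(); 126]>::dangling().as_ptr();
    unsafe {
        // SAFETY: the access is zero bytes wide and `p` is non-null and
        // aligned, so no memory is touched.
        p.write([(); 126]);
        let _units: [(); 126] = p.read();
    }
    // The whole array occupies zero bytes.
    size_of::<[(); 126]>()
}

fn main() {
    println!("{}", unit_roundtrip());
}
```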

I am confident the provenance doesn't evaporate when we do this because that's what the (strict provenance API) map_addr trick relies on - we can take a valid pointer, change the address bits in some reversible way and we get an invalid pointer we mustn't dereference, but then later we can reverse the operation on that pointer, and now once again we've got a valid pointer. Flag bits hidden in pointers and some other fun tricks are thus legal in Rust's strict provenance while in C or C++ they're only potentially legalised via a fairly fraught pointer-integer-pointer roundtrip that Rust wanted to avoid. There should be no difference to the resulting machine code after optimisation, but good luck to any tools trying to verify that it's correct in C or C++...
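A minimal sketch of the flag-bit trick with the strict provenance methods `map_addr` and `addr` (stable since Rust 1.84). The helpers `tag`, `untag` and `roundtrip` are hypothetical names, and the sketch assumes the low bit of a `u32` pointer is free because `u32` is 4-byte aligned.

```rust
/// Hides a one-bit flag in the low bit of an aligned pointer.
/// The result keeps its provenance but must not be dereferenced while tagged.
fn tag(ptr: *mut u32, flag: bool) -> *mut u32 {
    ptr.map_addr(|a| a | usize::from(flag))
}

/// Recovers the original pointer and the flag.
fn untag(ptr: *mut u32) -> (*mut u32, bool) {
    let flag = (ptr.addr() & 1) == 1;
    (ptr.map_addr(|a| a & !1), flag)
}

fn roundtrip(start: u32) -> (u32, bool) {
    let mut value = start;
    let tagged = tag(&mut value, true);
    let (clean, flag) = untag(tagged);
    // SAFETY: `clean` has the original address and provenance of `value`,
    // and `value` is not accessed by name until this write is done.
    unsafe { *clean += 1 };
    (value, flag)
}

fn main() {
    println!("{:?}", roundtrip(41));
}
```

No integer-to-pointer cast appears anywhere, which is what keeps the provenance chain intact for the compiler and for checking tools.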

That drop glue statement does seem pretty clear - not sure how I missed that. I agree both that in practice I'd write a write call to signify my intent, and that going by that statement it is legal to use the storing operation instead if you could show that Goose doesn't impl Drop.

Mixing safe and unsafe

Posted Oct 29, 2025 15:03 UTC (Wed) by chris_se (subscriber, #99706) [Link]

> For C programmers this doesn't make sense, because in C there are three categories - pointers to things, which you can dereference; pointers one past things, which are allowed to exist but must never be dereferenced, and all other pointers which are invalid and no guarantees about them are provided by the language at all. So the intuitions are very different.

While you are technically correct with regards to the standard, in practice most C programmers have used pointers more like how unsafe Rust treats pointers. There is a LOT of code out there that steals some bits from pointers to store some additional information (especially in "lock-free" code), and technically that's UB in C if this is stored in pointer variables directly (AFAIK it would be OK if it were stored in uintptr_t, but next to nobody does that).

Also there's a lot of code out there where a void * can be used as a context, and some people just use it to store integers (because no pointer to actual data is needed) - again, technically UB, but there's a TON of C code out there that does this.

So I see what Rust does more like already codifying the current state of affairs in C, while the official C standard still says that all that code out there is technically UB. And the main reason in C for this is that C can in principle run on all sorts of exotic systems where this might in fact break. But a lot of C code out there still makes a lot of implicit assumptions about the environment (e.g. that a pointer is nothing more than an integer in the end) that Rust has just gone on and codified.

Mixing safe and unsafe

Posted Oct 31, 2025 17:24 UTC (Fri) by epa (subscriber, #39769) [Link] (2 responses)

You make a good point, but the same applies to ordinary C code. In unchecked C you immediately have undefined behaviour if there are invalid pointers in a linked list. And so on. It's surely easier for the programmer to worry about all these nasty problems in just 5% of the code than in all of it.

> You have to verify that the unsafe block is never called with violated invariants,

I didn't quite understand this point. You do have to verify that -- but surely to do so it's enough to prove that all unsafe blocks in your program are behaving nicely? If the unsafe blocks are correct, then the other 95% of the code (the "safe" part) will not violate any invariants -- or at least if it does so, the program will blow up at run time as soon as it happens. (Fil-C does not claim to give you the same thorough compile-time checking as Rust.)

Mixing safe and unsafe

Posted Oct 31, 2025 18:27 UTC (Fri) by matthias (subscriber, #94967) [Link] (1 responses)

> If the unsafe blocks are correct, then the other 95% of the code (the "safe" part) will not violate any invariants -- or at least if it does so, the program will blow up at run time as soon as it happens.

This is the way rust works, but not how Fil-C works. Fil-C checks the pointers when they are used. It does not check at compile time (that is what rust does with references) and it cannot check when they are constructed. A pointer can alias with an integer type. So you can write any value into the pointer and Fil-C will not complain. It will only complain when you try to use the pointer and the metadata is incorrect. If the first use of an invalid pointer is in the unsafe code and you have turned off the runtime checks in this part of the code then you have UB.

Once you turn off the runtime checks in any part of the code, you have to verify all code for correctness that touches the same memory as the unsafe code.
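Fil-C is C, so the following is only a loose Rust analogy, not Fil-C code: a hypothetical `Handle` plays the role of a pointer that ordinary code may overwrite with any integer, `load_checked` plays the role of a use with runtime checks, and `load_unchecked` the role of a block with the checks turned off.

```rust
/// Stand-in for a pointer that any code may overwrite with an arbitrary integer.
struct Handle(usize);

/// Checked use: the value is validated at the moment it is used,
/// not when it was constructed.
fn load_checked(heap: &[u32], h: &Handle) -> Option<u32> {
    heap.get(h.0).copied()
}

/// Checks turned off: sound only if every piece of code that ever
/// wrote to the handle kept it in bounds.
///
/// # Safety
/// The caller must guarantee `h.0 < heap.len()`.
unsafe fn load_unchecked(heap: &[u32], h: &Handle) -> u32 {
    unsafe { *heap.get_unchecked(h.0) }
}

fn main() {
    let heap = [1u32, 2, 3];
    let mut h = Handle(1);

    // SAFETY: 1 < heap.len().
    println!("{}", unsafe { load_unchecked(&heap, &h) });

    // Ordinary code corrupts the handle; nothing complains at this point.
    h.0 = 99;

    // The checked use catches it when the handle is used...
    println!("{:?}", load_checked(&heap, &h));

    // ...while the unchecked use would be undefined behaviour:
    // unsafe { load_unchecked(&heap, &h) };
}
```

Proving the unchecked call correct means reasoning about every writer of `h`, which is the whole-program burden described above.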

Mixing safe and unsafe

Posted Nov 1, 2025 15:02 UTC (Sat) by epa (subscriber, #39769) [Link]

Thanks, I understand now. So you’d have to prove that any pointers used within the unsafe block were valid pointers — a property not enforced in advance of using them, even by safe code.

Despite that drawback, I still feel that a mixed model with mostly safe code and a few unsafe hotspots would be more productive than doing everything in unsafe C, and might be fast enough when 100% safe Fil-C is too slow.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds