Fil-C: A memory-safe C implementation

[LWN subscriber-only content]

By Daroc Alden
October 28, 2025

Fil-C is a memory-safe implementation of C and C++ that aims to let C code — complete with pointer arithmetic, unions, and other features that are often cited as a problem for memory-safe languages — run safely, unmodified. Its dedication to being "fanatically compatible" makes it an attractive choice for retrofitting memory-safety into existing applications. Despite the project's relative youth and single active contributor, Fil-C is capable of compiling an entire memory-safe Linux user space (based on Linux From Scratch), albeit with some modifications to the more complex programs. It also features memory-safe signal handling and a concurrent garbage collector.

Fil-C is a fork of Clang; it's available under an Apache v2.0 license with LLVM exceptions for the runtime. Changes from the upstream compiler are occasionally merged in, with Fil-C currently being based on version 20.1.8 from July 2025. The project is a personal passion of Filip Pizlo, who has previously worked on the runtimes of a number of managed languages, including Java and JavaScript. When he first began the project, he was not sure that it was even possible. The initial implementation was prohibitively slow to run, since it needed to insert a lot of different safety checks. This has given Fil-C a reputation for slowness. Since the initial implementation proved viable, however, Pizlo has managed to optimize a number of common cases, making Fil-C-generated code only a few times slower than Clang-generated code, although the exact slowdown depends heavily on the structure of the benchmarked program.

Reliable benchmarking is notoriously finicky, but in order to get some rough feel for whether that level of performance impact would be problematic, I compiled Bash version 5.2.32 with Fil-C and tried using it as my shell. Bash is nearly a best case for Fil-C, because it spends more time running external programs than running its own code, but I still expected the performance difference to be noticeable. It wasn't. So, at least for some programs, the performance overhead of Fil-C does not seem to be a problem in practice.

In order to support its various run-time safety checks, Fil-C does use a different internal ABI than Clang does. As a result, objects compiled with Fil-C won't link correctly against objects generated by other compilers. Since Fil-C is a full implementation of C and C++ at the source-code level, however, in practice this just requires everything to be recompiled with Fil-C. Inter-language linking, such as with Rust, is not currently supported by the project.

Capabilities

The major challenge of rendering C memory-safe is, of course, pointer handling. This is especially complicated by the fact that, as the long road to CHERI-compatibility has shown, many programs expect a pointer to be 32 or 64 bits, depending on the architecture. Fil-C has tried several different ways to represent pointers since the project's beginning in 2023. Fil-C's first pointers were 256 bits, not thread-safe, and didn't protect against use-after-free bugs. The current implementation, called "InvisiCaps", allows for pointers that appear to match the natural pointer size of the architecture (although this requires storing some auxiliary information elsewhere), with full support for concurrency and catching use-after-free bugs, at the expense of some run-time overhead.

Fil-C's documentation compares InvisiCaps to a software implementation of CHERI: pointers are separated into a trusted "capability" piece and an untrusted "address" piece. Since Fil-C controls how the program is compiled, it can ensure that the program doesn't have direct access to the capabilities of any pointers, and therefore the runtime can rely on them being uncorrupted. The tricky part of the implementation comes from how these two pieces of information are stored in what looks to the program like 64 bits.
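As a rough illustration of that split, the following C sketch models a pointer as an untrusted address paired with a trusted capability. The names (`capability`, `checked_ptr`, `ptr_add`, `access_ok`) are invented for this example and are not Fil-C's actual internals; the only point is that arithmetic touches the address alone, while every access is validated against the capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the trusted half of a pointer. */
typedef struct {
    uintptr_t lower; /* address of the first valid byte */
    uintptr_t upper; /* address one past the last valid byte */
} capability;

/* Hypothetical runtime view of a pointer. */
typedef struct {
    uintptr_t addr;        /* untrusted: the program may change it freely */
    const capability *cap; /* trusted: only the runtime may touch it */
} checked_ptr;

/* Pointer arithmetic moves the address and leaves the capability alone.
 * No check happens here; out-of-bounds addresses may exist. */
static checked_ptr ptr_add(checked_ptr p, ptrdiff_t delta) {
    p.addr += (uintptr_t)delta; /* unsigned arithmetic, wraps safely */
    return p;
}

/* An access of `size` bytes is allowed only if it lies entirely inside
 * the range described by the capability. */
static bool access_ok(checked_ptr p, size_t size) {
    if (p.cap == NULL)
        return false;
    assert(p.cap->lower <= p.cap->upper);
    if (p.addr < p.cap->lower || p.addr > p.cap->upper)
        return false;
    return size <= p.cap->upper - p.addr;
}
```

In this model, a pointer that has wandered out of bounds is harmless until something tries to load or store through it, which matches the way C code commonly computes temporary out-of-range addresses.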

When Fil-C allocates an object on the heap, it adds two metadata words before the start of the allocated object: an upper bound, used to check accesses to the object based on its size, and an "aux word" that is used to store additional pointer metadata. When the program first writes a pointer value into an object, the runtime allocates a new auxiliary allocation of the same size as the object being written into, and puts an actual hardware-level pointer (i.e., one without an attached capability) to the new allocation into the aux word of the object. This auxiliary allocation, which is invisible to the program being compiled, is used to store the associated capability information for the pointer being stored (and is also reused for any additional pointers stored into the object later). The address value is stored into the object as normal, so any C bit-twiddling techniques that require looking at the stored value of the pointer work as expected.
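The layout described above can be modeled in a few lines of C. This is a simplified sketch, not Fil-C's runtime: `obj_header`, `obj_alloc`, `store_ptr`, and `load_cap` are hypothetical names, and the capability is represented as an opaque `void *` because its real contents are not spelled out here. What the sketch does show is the lazily created auxiliary allocation that shadows the object offset-for-offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header placed directly before each heap object. */
typedef struct {
    char *upper; /* one past the last byte of the object */
    char *aux;   /* aux word: NULL until a pointer is first stored */
} obj_header;

static obj_header *header_of(void *obj) {
    assert(obj != NULL);
    return (obj_header *)obj - 1;
}

/* Allocate a zeroed object preceded by its two metadata words. */
static void *obj_alloc(size_t size) {
    obj_header *h = calloc(1, sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->upper = (char *)(h + 1) + size;
    h->aux = NULL;
    return h + 1;
}

/* True if a pointer-sized slot at byte offset `off` fits in the object. */
static int slot_in_bounds(void *obj, size_t off) {
    size_t size = (size_t)(header_of(obj)->upper - (char *)obj);
    return off <= size && size - off >= sizeof(void *);
}

/* Store a pointer: the address goes into the object itself, and the
 * capability goes to the same offset in the auxiliary allocation. */
static int store_ptr(void *obj, size_t off, void *addr, void *cap) {
    obj_header *h = header_of(obj);
    if (!slot_in_bounds(obj, off))
        return -1;
    if (h->aux == NULL) {
        /* First pointer store: create an aux allocation of equal size. */
        h->aux = calloc(1, (size_t)(h->upper - (char *)obj));
        if (h->aux == NULL)
            return -1;
    }
    memcpy((char *)obj + off, &addr, sizeof addr); /* visible to program */
    memcpy(h->aux + off, &cap, sizeof cap);        /* hidden from program */
    return 0;
}

/* Fetch the capability for the pointer stored at `off`, or NULL. */
static void *load_cap(void *obj, size_t off) {
    obj_header *h = header_of(obj);
    void *cap = NULL;
    if (h->aux != NULL && slot_in_bounds(obj, off))
        memcpy(&cap, h->aux + off, sizeof cap);
    return cap;
}
```

Because the address is stored in the object unchanged, code that inspects or masks the raw bits of a stored pointer sees exactly what it would see under an ordinary compiler.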

This approach does mean that structures that contain pointers end up using twice as much memory, and every load of a pointer involves a pointer indirection through the aux word. The documentation claims that, in practice, most programs run about four times more slowly under this scheme, although that number depends on how heavily the program makes use of pointers. Still, Pizlo has ideas for several optimizations that he hopes can bring the performance overhead down over time.

One wrinkle with this approach is atomic access to pointers — i.e. using _Atomic or volatile. Luckily, there is no problem that cannot be solved with more pointer indirection: when the program loads or stores a pointer value atomically, instead of having the auxiliary allocation contain the capability information directly, it points to a third 128-bit allocation that stores the capability and pointer value together. That allocation can be updated with 128-bit atomic instructions, if the platform supports them, or by creating new allocations and atomically swapping the pointers to them.
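The fallback path mentioned above (creating a new allocation and atomically swapping the pointer to it) can be sketched with C11 atomics. The names `ptr_box`, `atomic_ptr_slot`, `slot_store`, and `slot_load` are hypothetical, and the sketch hands the retired box back to the caller where a real runtime would leave it to the garbage collector; it does not attempt the 128-bit atomic-instruction path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Capability and address stored side by side in one small allocation. */
typedef struct {
    void *cap;
    void *addr;
} ptr_box;

/* The slot holds only a pointer to the current box, so a single
 * pointer-sized atomic operation replaces both halves at once. */
typedef struct {
    _Atomic(ptr_box *) box;
} atomic_ptr_slot;

static void slot_init(atomic_ptr_slot *s) {
    assert(s != NULL);
    atomic_init(&s->box, NULL);
}

/* Publish a new (capability, address) pair. The previous box is returned
 * through `retired`; concurrent readers may still be looking at it, so a
 * real runtime would let the collector reclaim it. */
static int slot_store(atomic_ptr_slot *s, void *addr, void *cap,
                      ptr_box **retired) {
    ptr_box *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return -1;
    fresh->cap = cap;
    fresh->addr = addr;
    *retired = atomic_exchange(&s->box, fresh);
    return 0;
}

/* Read a consistent pair: boxes are never modified after publication. */
static ptr_box slot_load(atomic_ptr_slot *s) {
    ptr_box empty = { NULL, NULL };
    ptr_box *current = atomic_load(&s->box);
    return current != NULL ? *current : empty;
}
```

Since a published box is immutable, a reader can never observe the capability of one pointer paired with the address of another, which is the property the extra indirection exists to provide.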

Since the aux word is used to store a pointer value, Fil-C can use pointer tagging to store some additional information there as well; that is used to indicate special types of objects that need to be handled differently, such as functions, threads, and mmap()-backed allocations. It's also used to mark freed objects, so that any access results in an error message and a crash.
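Pointer tagging of this kind usually relies on alignment: if every auxiliary allocation is at least 8-byte aligned, the low three bits of its address are always zero and can carry a small tag. The sketch below shows the general technique; the tag names and values are made up for illustration and do not reflect Fil-C's real encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag values; Fil-C's actual encoding may differ. */
enum aux_tag {
    TAG_NORMAL = 0,
    TAG_FUNCTION = 1,
    TAG_THREAD = 2,
    TAG_MMAP = 3,
    TAG_FREED = 4
};

#define TAG_MASK ((uintptr_t)7) /* low three bits of an aligned pointer */

/* Combine an 8-byte-aligned pointer and a tag into one word. */
static uintptr_t aux_pack(void *aux, unsigned tag) {
    uintptr_t bits = (uintptr_t)aux;
    assert((bits & TAG_MASK) == 0); /* pointer must be 8-byte aligned */
    assert(tag <= TAG_MASK);
    return bits | tag;
}

/* Recover the pointer by clearing the tag bits. */
static void *aux_pointer(uintptr_t word) {
    return (void *)(word & ~TAG_MASK);
}

/* Recover the tag by keeping only the tag bits. */
static unsigned aux_tag(uintptr_t word) {
    return (unsigned)(word & TAG_MASK);
}
```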

Memory management

When an object is freed, its aux word marks it as a free object, which lets the auxiliary allocation be reclaimed immediately. The original object can't be freed immediately, however. Otherwise, a program could free an object, allocate a new object in the same location, and thereby cover up use-after-free bugs. Instead, Fil-C uses a garbage collector to free an object's backing memory only once all of the pointers to it go away. Unlike other garbage collectors for C — such as the Boehm-Demers-Weiser garbage collector — Fil-C can use the auxiliary capability information to track live objects precisely.
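A toy model makes the two-stage lifetime concrete. In this sketch (all names hypothetical), `obj_free()` releases the auxiliary allocation and marks the object as freed, but the object's own memory stays in place until `obj_reclaim()`, which stands in for the collector deciding that no pointers remain. Tracing reachability is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t size;          /* payload size in bytes */
    bool freed;           /* set by obj_free(), checked on every access */
    void *aux;            /* stand-in for the auxiliary allocation */
    unsigned char data[]; /* the object's own storage */
} object;

static object *obj_new(size_t size) {
    object *o = calloc(1, sizeof *o + size);
    if (o == NULL)
        return NULL;
    o->size = size;
    /* Allocated eagerly to keep the sketch short. */
    o->aux = malloc(size != 0 ? size : 1);
    if (o->aux == NULL) {
        free(o);
        return NULL;
    }
    return o;
}

/* What the program's free() does in this model: the aux allocation goes
 * away at once, while the object itself is only marked. */
static void obj_free(object *o) {
    assert(o != NULL);
    free(o->aux);
    o->aux = NULL;
    o->freed = true;
}

/* Checked read: fails on out-of-bounds offsets and on freed objects. */
static bool obj_read(const object *o, size_t off, unsigned char *out) {
    assert(o != NULL);
    if (o->freed || off >= o->size)
        return false;
    *out = o->data[off];
    return true;
}

/* What the collector does once nothing points at the object any more. */
static void obj_reclaim(object *o) {
    assert(o != NULL && o->freed);
    free(o);
}
```

Because the freed object's memory is never handed to a new allocation while stale pointers can still reach it, a use-after-free always lands on a header that says "freed" rather than on some unrelated live object.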

Fil-C's garbage collector is both parallel (collection happens faster the more cores are available) and concurrent (collection happens without pausing the program). Technically, the garbage collector does require threads to occasionally pause just long enough to tell it where pointers are located on the stack, but that only occurs at special "safe points" — otherwise, the program can load and manipulate pointers without notifying the garbage collector. Safe points are used as a synchronization barrier: the collector can't know that an object is really garbage until every thread has passed at least one safe point since it finished marking. This synchronization is done with atomic instructions, however, so in practice threads never need to pause for longer than a few instructions.

The exception is the implementation of fork(), which uses the safe points needed by the garbage collector to temporarily pause all of the threads in the program in order to prevent race conditions while forking. Fil-C inserts a safe point at every backward control-flow edge, i.e., whenever code could execute in a loop. In the common case, the inserted code just needs to load a flag register and confirm that the garbage collector has not requested anything be done. If the garbage collector does have a request for the thread, the thread runs a callback to perform the needed synchronization.
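The code inserted at a loop back edge can be pictured roughly as follows. This is a hand-written approximation with hypothetical names (`gc_requested`, `safepoint_poll`, `safepoint_slow_path`); in Fil-C the compiler emits the check itself, and the real slow path does far more than bump a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Set by the collector when it wants this thread's attention. */
static atomic_bool gc_requested;

/* Counts slow-path entries so the behavior can be observed. */
static unsigned long handshakes;

/* Slow path: in a real runtime this would report stack pointers to the
 * collector. Here it only acknowledges the request. */
static void safepoint_slow_path(void) {
    handshakes++;
    atomic_store(&gc_requested, false);
}

/* Fast path: one relaxed load and a branch that is almost never taken. */
static inline void safepoint_poll(void) {
    if (atomic_load_explicit(&gc_requested, memory_order_relaxed))
        safepoint_slow_path();
}

/* An ordinary loop with the poll placed on its backward edge. */
static long sum(const int *v, size_t n) {
    assert(v != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += v[i];
        safepoint_poll(); /* about to jump back to the loop header */
    }
    return total;
}
```

Placing the poll on back edges bounds how long a thread can run without noticing a request, since any unbounded stretch of execution has to pass through a loop.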

Fil-C uses the same safe-point mechanism to implement signal handling. Signal handlers are only run when the interrupted thread reaches a safe point. That, in turn, allows signal handlers to allocate and free memory without interfering with the garbage collector's operation; Fil-C's malloc() is signal-safe.
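The deferral pattern is a familiar one and can be imitated in portable C: the handler registered with the operating system only records that a signal arrived, and the program's real handler runs later from a safe point, in ordinary (non-asynchronous) context. The names below (`raw_handler`, `user_handler`, `safepoint`) are hypothetical and the sketch handles a single signal; only `signal()`, `raise()`, `malloc()`, and `free()` are real library calls.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

static volatile sig_atomic_t pending;           /* a signal is waiting */
static volatile sig_atomic_t in_signal_context; /* raw handler is active */
static int handled_count;

/* Installed with signal(): does the bare minimum and returns. */
static void raw_handler(int signo) {
    (void)signo;
    in_signal_context = 1;
    pending = 1;
    in_signal_context = 0;
}

/* The program's "real" handler. It may allocate, because it never runs
 * in asynchronous signal context. */
static void user_handler(void) {
    assert(!in_signal_context);
    void *scratch = malloc(64);
    free(scratch);
    handled_count++;
}

/* Called at safe points: run the deferred handler if one is pending. */
static void safepoint(void) {
    if (pending) {
        pending = 0;
        user_handler();
    }
}

/* Register the raw handler for SIGINT; returns 0 on success. */
static int install_handler(void) {
    return signal(SIGINT, raw_handler) == SIG_ERR ? -1 : 0;
}
```

With this structure, the allocator never has to cope with being re-entered halfway through an operation, which is what makes a signal-safe `malloc()` feasible.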

Memory-safe Linux

Linux From Scratch (LFS) is a tutorial on compiling one's own complete Linux user space. It walks through the steps of compiling and installing all of the core software needed for a typical Linux user space in a chroot() environment. Pizlo has successfully run through LFS with Fil-C to produce a memory-safe version, although a non-Fil-C compiler is still needed to build some fundamental components, such as Fil-C's own runtime, the GNU C library, and the kernel. (While Fil-C's runtime relies on a normal copy of the GNU C library to make system calls, the programs that Fil-C compiles use a Fil-C-compiled version of the library.)

The process is mostly identical to LFS up through the end of chapter 7, because everything prior to that point consists of using cross-build tools to obtain a working compiler in the chroot() environment. The one difference is that the cross-build tools are built with a different configured prefix, so that they won't conflict with Fil-C. At that point, one can build a copy of Fil-C and use it to mostly replace the existing compiler. The remaining steps of LFS are unchanged.

Scripts to automate the process are included in the Fil-C Git repository, including some steps from Beyond Linux From Scratch that result in a working graphical user interface and a handful of more complicated applications such as Emacs.

Overall, Fil-C offers a remarkably complete solution for making existing C programs memory-safe. While it does nothing for undefined behavior that is not related to memory safety, the most pernicious and difficult-to-prevent security vulnerabilities in C programs tend to rely on exploiting memory-unsafe behavior. Readers who have already considered and rejected Fil-C for their use case due to its early performance problems may wish to take a second look — although anyone hoping for stability might want to wait for others to take the plunge, given the project's relative immaturity. That said, for existing applications where a sizeable performance hit is preferable to an exploitable vulnerability, Fil-C is an excellent choice.

Fil-C for programmers

Posted Oct 28, 2025 17:54 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (7 responses)

I'm interested in adoption of Fil-C by C and C++ programmers because for some years my assumption has been that most remaining C programmers in particular want DWIM, and so they won't be any happier in Fil-C than in, say, (safe) Rust, since these languages both lack the unconstrained Undefined Behaviour where DWIM thrives. If you write nonsense in Fil-C, your program doesn't work and it's your fault, which presumably was not what you meant. So widespread enthusiasm for Fil-C from C programmers would indicate that I was completely wrong, which would be a pleasant surprise.

For users, in a bunch of cases this is a no-brainer: Daroc gave their shell as an example, but I'm sure most of us run many programs every day where raw perf just isn't a big deal. I am interested in programmers rather than users because I think that influences whether Fil-C is just an interesting project for our moment or it becomes a "successor" to C in a way that Zig, Odin etc. never could.

Fil-C for programmers

Posted Oct 28, 2025 18:15 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link] (1 responses)

> I am interested in programmers rather than users because I think that influences whether Fil-C is just an interesting project for our moment or it becomes a "successor" to C in a way that Zig, Odin etc. never could.

My expectation is that there isn't going to be a single successor to C. For some group of people, that was C++ a long time back. For others it is going to be Rust or Zig or something else. The final group is going to keep coding in C forever, and it will be more of a generational change eventually.

Fil-C for programmers

Posted Oct 28, 2025 18:32 UTC (Tue) by daroc (editor, #160859) [Link]

I think the point of Fil-C, as a C implementation, is precisely to support the people and projects that are just going to keep using C — in a way that still lets users avoid memory safety issues if they so choose.

Fil-C for programmers

Posted Oct 28, 2025 18:34 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link] (4 responses)

If this is just another C implementation, it's distribution packagers who would need to adopt it and possibly other people who routinely compile but not necessarily write code.

Fil-C for programmers

Posted Oct 29, 2025 1:35 UTC (Wed) by rahulsundaram (subscriber, #21946) [Link] (3 responses)

Fair point. Reducing friction to adoption could work if the performance impact is universally close enough.

Fil-C for programmers

Posted Oct 29, 2025 11:45 UTC (Wed) by k3ninho (subscriber, #50375) [Link] (1 responses)

Adding a runtime to Debian, say like how kFreeBSD was added, would be an interesting avenue for running some things inside a memory-safe environment while finding and fixing memory-type UB for packages people want to run in Fil-C.

K3n.

Fil-C for programmers

Posted Oct 30, 2025 22:23 UTC (Thu) by k3ninho (subscriber, #50375) [Link]

Via the discussion of this lwn.net piece at lobste.rs [1], it turns out [2] that djb had already got in with this.

1: https://lobste.rs/s/mg0aur/fil_c_memory_safe_c_implementa...
2: https://cr.yp.to/2025/fil-c.html

K3n.

Fil-C for programmers

Posted Oct 29, 2025 12:18 UTC (Wed) by Baughn (subscriber, #124425) [Link]

It also might eliminate the reasoning some programmers use, that they “have to stick to C because it’s the fastest option”.

Well, now it’s not the fastest option. And no, you don’t get to tell me not to compile everything with Fil-C. We just need to make that the standard.

How is this different from tools like Valgrind and Address Sanitizer?

Posted Oct 28, 2025 19:55 UTC (Tue) by oldnpastit (subscriber, #95303) [Link] (6 responses)

This seems to be run-time (as opposed to compile-time checking) - i.e. what Valgrind and ASAN do. Or have I misunderstood it?

How is this different from tools like Valgrind and Address Sanitizer?

Posted Oct 28, 2025 20:28 UTC (Tue) by bertschingert (subscriber, #160729) [Link]

Fil-C seems to be more similar to ASAN than Valgrind in that the compiler outputs code with the instrumentation / checking present, rather than running already compiled code in a virtual machine as Valgrind does.

But it would seem to be more robust than ASAN; from reading about how ASAN works, it seems that it puts "poisoned" bytes around an allocation, so that memory accesses shortly after the end of a buffer hit those poisoned bytes and are caught. However, ASAN wouldn't catch an invalid access to a non-poisoned address of memory via a particular pointer, if that address was allocated in a separate allocation. [1]

I assume Fil-C's pointer capability model is able to catch "provenance" violations like that.

[1] https://blog.gistre.epita.fr/posts/benjamin.peter-2022-10...
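The difference between the two kinds of check can be shown with a toy address space. This sketch is a model only: it does not use ASAN or Fil-C, the layout and the names (`shadow_check`, `capability_check`, `bounds`) are invented, and addresses are plain integers. It assumes two 16-byte objects, each followed by a 16-byte poisoned redzone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ARENA_SIZE = 64 };

/* Assumed layout of the toy arena:
 *   [ 0, 16)  object A
 *   [16, 32)  redzone (poisoned)
 *   [32, 48)  object B
 *   [48, 64)  redzone (poisoned) */
static bool poisoned(size_t addr) {
    return (addr >= 16 && addr < 32) || addr >= 48;
}

/* Redzone-style check: asks only whether the target address is valid
 * memory, not whether this pointer was ever allowed to reach it. */
static bool shadow_check(size_t addr) {
    return addr < ARENA_SIZE && !poisoned(addr);
}

/* Bounds that travel with a pointer in the capability-style check. */
typedef struct {
    size_t lower; /* first valid address */
    size_t upper; /* one past the last valid address */
} bounds;

/* Capability-style check: the access must fall inside the object the
 * pointer was derived from. */
static bool capability_check(bounds b, size_t addr) {
    assert(b.lower <= b.upper);
    return addr >= b.lower && addr < b.upper;
}
```

An access through a pointer to A at offset 20 hits the redzone and both checks reject it. At offset 40 the access skips the redzone and lands inside B, so the redzone check accepts it while the capability check still refuses, because address 40 is outside A's bounds.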

How is this different from tools like Valgrind and Address Sanitizer?

Posted Oct 28, 2025 20:33 UTC (Tue) by excors (subscriber, #95769) [Link] (4 responses)

From the readme:

> Fil-C is engineered to prevent memory safety bugs from being used for exploitation rather than just simply flagging them often enough to find bugs. This makes Fil-C different from AddressSanitizer, HWAsan, or MTE, which can all be bypassed by attackers. The key difference that makes this possible is that Fil-C is capability based (so each pointer knows what range of memory it may access, and how it may access it) rather than tag based (where pointer accesses are allowed if they hit valid memory).

Clang says "AddressSanitizer's runtime was not developed with security-sensitive constraints in mind and may compromise the security of the resulting executable", so it should not be used in production.

Valgrind has much worse performance (the manual claims 10-50x slowdown, plus it's effectively single-threaded), which is probably bad enough to make it unusable in production, and similarly will miss many memory safety bugs.

How is this different from tools like Valgrind and Address Sanitizer?

Posted Oct 28, 2025 21:26 UTC (Tue) by cyperpunks (subscriber, #39406) [Link] (3 responses)

So it's kind of a small virtual machine with garbage collection that happens to be compatible with C/C++ based source code?

How is this different from tools like Valgrind and Address Sanitizer?

Posted Oct 29, 2025 4:57 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

Yes, it can be thought about like this. Fil-C limits the primitives that are accessible to the C code so that no combination of them can (in theory) lead to memory safety issues.

It's somewhat analogous to compiling C into WebAssembly and then JIT-compiling WebAssembly.

The amazing thing is that it preserves most of C/C++ semantics.

How is this different from tools like Valgrind and Address Sanitizer?

Posted Oct 29, 2025 5:53 UTC (Wed) by willmo (subscriber, #82093) [Link] (1 responses)

> The amazing thing is that it preserves most of C/C++ semantics.

But (at least WRT memory safety) only the semantics of the abstract machine described by the language standards, and not the additional semantics (aka undefined behavior) of the straightforward mappings to typical hardware that we’re all accustomed to.

Very cool idea. :-)

How is this different from tools like Valgrind and Address Sanitizer?

Posted Oct 30, 2025 15:00 UTC (Thu) by atanasi (subscriber, #136067) [Link]

Fil-C strives to support undefined behavior:

'In addition to memory safety, Fil-C's other goal is fanatical compatibility. This means that Fil-C's capability model also has to allow most (or ideally all) "safe" uses of C pointers; that is, idioms that are in widespread use and don't lead to exploitation unless they violate the above rules. This means even supporting uses of pointers that the C spec deems to be undefined behavior.'

Mixing safe and unsafe

Posted Oct 29, 2025 6:18 UTC (Wed) by epa (subscriber, #39769) [Link] (16 responses)

This sounds more practical than previous safe C implementations (such as the c-semantics interpreter, which must be hundreds of times slower than compiled code). Can you mix it with compiled, optimized object code in the same program? I might want to run 95% of my code with full safety checks but in a few hot spots declare an “unsafe” block which will run close to native speed by eliminating most checks. Of course, the burden is on me to ensure those blocks are safe and don’t have any undefined behaviour.

The unsafe-compiled code wouldn’t be exactly the same as you get from plain clang, as the memory layout is different, and it might be a bit slower because of that, but it could do without pointer capability checking and, perhaps, other checks like overflow and array bounds. The rest of the program must assume that the unsafe code is correct.

Mixing safe and unsafe

Posted Oct 29, 2025 7:01 UTC (Wed) by magfr (subscriber, #16052) [Link] (1 responses)

It said in the article that you can't.
The compiler defines a new ABI and you can't link system ABI libs with it.

Mixing safe and unsafe

Posted Oct 31, 2025 17:17 UTC (Fri) by epa (subscriber, #39769) [Link]

But that's not what I meant. I know the ABI is different. But within one compiled program, all using the new ABI, can you declare a block as "unsafe" so that the compiler won't emit instructions for bounds checking (for example) within that block?

Mixing safe and unsafe

Posted Oct 29, 2025 11:42 UTC (Wed) by matthias (subscriber, #94967) [Link] (13 responses)

> Of course, the burden is on me to ensure those blocks are safe and don’t have any undefined behaviour.

This is not the only problem. Once you have unsafe blocks, you have a contract between safe and unsafe code, as an unsafe block will for sure rely on invariants that are hard to impossible to check at the boundary. Say you have an unsafe block that traverses a linked list. If you turn off the runtime checks inside the block, you rely on the promise that all pointers in the linked list are valid. Otherwise you immediately have undefined behaviour.

Even in rust this is an issue and if you have unsafe blocks you have to be very careful that the unsafety is contained, usually at the module boundary. Changing an integer is usually considered safe, but if the integer encodes the length of a Vec, then this is very unsafe, as the unsafe code that implements indexing into the Vec relies on this integer to be correct. This is solved by not providing any (safe) functions that can change this integer directly. The situation is that safe code that interacts with Vec cannot cause undefined behaviour. However, safe code within the Vec module most definitely can. You do not need an unsafe block to change the integer encoding the length. This is described quite nicely in the first chapter of the nomicon[1] (the guide to unsafe rust). You can read this introductory chapter even if you do not know rust.

With Fil-C this containment of unsafe is just impossible. In C you can always change the contents of a variable by casting it to an array of bytes. So you cannot rely on any invariants and have to check when you use a pointer. Or you have to verify that the unsafe block is never called with violated invariants, which basically forces you to verify all the code, not only the unsafe block.
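A short C example shows why such invariants are hard to protect. The struct and function names here (`small_vec`, `vec_in_bounds`, `scribble`) are hypothetical; the technique itself, writing to an object through an `unsigned char *`, is ordinary, well-defined C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A tiny vector whose correctness depends on `len` being accurate. */
typedef struct {
    size_t len; /* invariant: len <= 4 */
    int items[4];
} small_vec;

/* The check that hypothetical unchecked code would rely on. */
static bool vec_in_bounds(const small_vec *v, size_t i) {
    assert(v != NULL);
    return i < v->len;
}

/* Code that never mentions `len`, yet can overwrite it: any object may
 * be accessed as an array of unsigned char. */
static void scribble(void *obj, unsigned char value, size_t n) {
    assert(obj != NULL);
    unsigned char *bytes = obj;
    for (size_t i = 0; i < n; i++)
        bytes[i] = value;
}
```

After `scribble(&v, 0xFF, sizeof v.len)`, the `len` field holds the largest `size_t` value, so `vec_in_bounds(&v, 100)` reports true even though `items` has only four elements. Nothing in the language stops unrelated code from doing this, which is the containment problem described above.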

[1] https://doc.rust-lang.org/nomicon/meet-safe-and-unsafe.html

Mixing safe and unsafe

Posted Oct 29, 2025 13:36 UTC (Wed) by tialaramex (subscriber, #21167) [Link] (9 responses)

The pointer example, while perhaps intended to be clarifying for C programmers, has two disadvantages that I think mean we should avoid it.

First, all pointer dereferences in Rust are unsafe. If you have a pointer named ptr, then *ptr, dereferencing the pointer, isn't allowed in safe Rust, full stop. So caring about whether the pointer is valid is always on you. Which leads us to...

Second, unlike C and C++ Rust doesn't care about the existence of invalid pointers. Safe Rust can make null pointers, dangling pointers, even just arbitrarily mint a nonsense pointer which claims it is a pointer to a Goose but is actually the word "HONK" in ASCII as an address just marked up as a pointer-to-Goose. This is fine in safe Rust and guaranteed not to cause UB, so long as nobody dereferences the pointer which they cannot do in safe Rust.

For C programmers this doesn't make sense, because in C there are three categories - pointers to things, which you can dereference; pointers one past things, which are allowed to exist but must never be dereferenced, and all other pointers which are invalid and no guarantees about them are provided by the language at all. So the intuitions are very different.
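The middle category is the one C programmers use constantly without naming it. A minimal example, with a hypothetical helper `sum_range`, uses a one-past-the-end pointer as a loop bound: it is formed and compared, but never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the half-open range [begin, end). `end` may be a one-past-the-end
 * pointer; the loop compares against it but never reads through it. */
static int sum_range(const int *begin, const int *end) {
    assert(begin != NULL && end != NULL);
    int total = 0;
    for (const int *p = begin; p != end; ++p)
        total += *p;
    return total;
}
```

For `int a[3]`, the call `sum_range(a, a + 3)` is valid: `a + 3` is one past the end of the array. Forming `a + 4`, by contrast, is already undefined behaviour in C, even if the result is never used.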

Mixing safe and unsafe

Posted Oct 29, 2025 14:13 UTC (Wed) by matthias (subscriber, #94967) [Link] (7 responses)

I do not see the difference between C and (unsafe) rust. In unsafe rust, there are the same categories(*) of pointers that there are in C.
The pointer can point to things, it can point one past the end of an array (used in the slice iterator), or it can just contain garbage and must never be dereferenced.

The main difference wrt. raw pointers between the languages is that in rust you have to use unsafe if you want to dereference a pointer. Rust has adopted the C++ memory model, i.e., the rules regarding atomic accesses and how they order wrt. raw pointer accesses. They actually refer to the C++ semantics for this. Rust does not yet have pointer provenance, but this is in the discussion and might end up being also quite similar to C. All in all, raw pointers work very much the same.

Of course, this is a totally different game when it comes to references where the compiler enforces strict invariants regarding validity.

(*) Probably more than three categories, e.g. pointers to uninitialized memory, where you are only allowed to write but not to read. Null pointers are also somewhat special, as you are allowed to compare them.

Mixing safe and unsafe

Posted Oct 29, 2025 21:07 UTC (Wed) by tialaramex (subscriber, #21167) [Link] (6 responses)

So, firstly, unlike C and C++ Rust does have pointer provenance, you're behind the news. In January Rust 1.84 shipped its provenance APIs and all the associated documentation went from speculative to de facto how Rust works. You can have the "strict" provenance APIs in which we agree that pointers have provenance, we get excellent performance but some tricks are impossible - or you can ask for "exposed" provenance which is also what WG14 is proposing might be C's future with its PNVI-ae-udi (Provenance Not Via Integers, Address Exposed, User DIsambiguates) model.

That's why I was so concrete about things. In Rust they are now nailed down. The compilers are still crap, so if you bang on this hard it'll miscompile, but that's true in C and C++ anyway, it's just harder to prove you were miscompiled because often you'll write UB in those languages and the compiler people will use that as an excuse. But that's not a language issue, that's a compiler QOI issue and I expect over the next 3-5 years it'll improve, the way the LLVM's handling of aliasing improved when Rust began banging on it and filing bugs.

You're wrong about the special-ness of one-past-the-end in Rust. It's not special in Rust, it's just one past the end. Sixteen past the end, or eight before the start, or any other value is fine too. Like I said, in Rust pointers actually are the imaginary tuple (addr, addr_space, provenance) and we're always allowed to get addr out by definition. In C and C++ whether that can work is hotly contested, see the recent "Pointer zap" LWN article for a taste of how insane C might be here and what its committee members want to do about that.

Rust distinguishes validity for read versus write, and provides explicit methods on pointer types, so the correct way to initialize that uninitialized memory pointed to by ptr that's the right shape for a Goose, is the (unsafe obviously) ptr.write(some_goose); and yes, that pointer was valid for writing only up until that moment, though having performed a write it's now valid for reads too.

If you're thinking "I would use a dereference" Bzzt, that's going to be a problem. unsafely *ptr = some_goose; will try to destroy the previous goose, but there is no goose, just uninitialized memory so that's UB.

Mixing safe and unsafe

Posted Oct 29, 2025 23:24 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

> You're wrong about the special-ness of one-past-the-end in Rust. It's not special in Rust, it's just one past the end. Sixteen past the end, or eight before the start, or any other value is fine too.

Pointers one past the end are indeed special in Rust, but you are correct that they're not that special. The unsafe method <*T>::offset() (and similar methods) may return a pointer into the allocation, or a pointer one past the end, but it is instant UB to ask it for anything else (or to call it on a dangling or otherwise invalid pointer, except a pointer that was already one past the end). This is materially identical to the validity requirements of pointer arithmetic in C.

Rust also provides <*T>::wrapping_offset() and similar safe methods, which have no validity requirements at the callsite,* but the documentation notes that they may be optimized more poorly than their unsafe counterparts. This is presumably a result of LLVM, hardware, or both preferentially optimizing for C semantics.

Since pointers don't implement Add or other arithmetic traits, there is no strong basis for claiming that either one of these APIs is the "primary" means of performing pointer arithmetic in unsafe Rust (safe Rust can only use the wrapping_foo() methods, but then safe Rust cannot go on to dereference the pointers, so it's a bit of a moot point).

* Obviously, the pointer will eventually need to be valid when you dereference it, and that includes strict provenance.

Mixing safe and unsafe

Posted Oct 30, 2025 9:20 UTC (Thu) by matthias (subscriber, #94967) [Link] (4 responses)

> In January Rust 1.84 shipped its provenance APIs and all the associated documentation went from speculative to de facto how Rust works.

Thanks. I somehow missed that.

>You're wrong about the special-ness of one-past-the-end in Rust. It's not special in Rust, it's just one past the end.

The provenance documentation says it is different from 16 past the end, as it is still inside provenance. Just the same as in C.

> If you're thinking "I would use a dereference" Bzzt, that's going to be a problem. unsafely *ptr = some_goose; will try to destroy the previous goose, but there is no goose, just uninitialized memory so that's UB.

Only if goose implements drop. But then you are (implicitly) creating a &mut to the uninitialized memory, which is indeed UB. And usually the drop handler will read the memory, which is also UB. This is not really a difference in pointer semantics but more a difference on how the assignment operator works. I do agree that ptr.write() should be used, if there is no valid object that you want to drop, even if the type does not implement drop. It is much more obvious that the programmer wants to do a write and not an assignment this way.

Of course there are differences in pointer handling between the languages. I think of pointer comparisons which can be UB in C(++), while they are part of safe rust and thus must not cause any UB. These are details that of course need to be accounted for when writing actual code. However, when thinking of raw pointers in rust, they are much more similar to C pointers than to anything else in the rust language. They always feel somehow alien in the rust language. So having an expressive API with methods like offset and write is a good thing. Of course, it is still unsafe, but less error prone.

Mixing safe and unsafe

Posted Oct 30, 2025 13:15 UTC (Thu) by tialaramex (subscriber, #21167) [Link] (3 responses)

Where are you seeing the claim that one-past-the-end is privileged in this way? I can't see it in anything I reviewed, but I only briefly flipped past because I'm on a lunch break.

Likewise I didn't find a claim that *ptr = some_goose; is sound when Goose is not Drop. I can see in principle how this could be arranged, but I couldn't think of any reason I would want it, and since we're unsafe if it isn't legal the compiler isn't going to necessarily point out the problem, so if it is legal I want a URL to drop in an adjacent safety comment so people know why I thought it was OK to write this.

Mixing safe and unsafe

Posted Oct 30, 2025 13:38 UTC (Thu) by matthias (subscriber, #94967) [Link] (2 responses)

From the provenance section of https://doc.rust-lang.org/std/ptr/index.html :
> It is undefined behavior to access memory through a pointer that does not have provenance over that memory. Note that a pointer “at the end” of its provenance is not actually outside its provenance, it just has 0 bytes it can load/store.

"At the end" refers to what in C would be called one past the end. If it pointed to the last element, the size would not be zero, so they really mean one past the end.

I am not actually sure whether there is a real difference to being outside of the provenance, as you cannot load or store anyway. In C, there is a difference as comparison operators take provenance into account. In rust, comparison operators are only comparing the address. So there might not be a real difference.

From the documentation of pointer https://doc.rust-lang.org/std/primitive.pointer.html :
> Storing through a raw pointer using *ptr = data calls drop on the old value, so write must be used if the type has drop glue and memory is not already initialized - otherwise drop would be called on the uninitialized memory.

It is not explicitly stated that you are allowed to store non-drop values in this way. However, if you were not allowed to, this would be phrased differently. I still would use write for uninitialized memory, as it looks cleaner.
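
The difference between the two ways of storing can be sketched as follows. This is a minimal illustration, not taken from the Rust documentation: `Goose`, `make_goose` and `replace_goose` are hypothetical names, and `Goose` is given a `String` field only so that it has drop glue.

```rust
use std::mem::MaybeUninit;

// Hypothetical type for illustration; it has drop glue because String does.
struct Goose {
    name: String,
}

// Initialise uninitialised storage through a raw pointer.
fn make_goose(name: &str) -> Goose {
    let mut slot = MaybeUninit::<Goose>::uninit();
    let ptr: *mut Goose = slot.as_mut_ptr();
    unsafe {
        // `*ptr = Goose { .. }` here would first drop the old, uninitialised
        // value, which is UB. `write` stores without reading or dropping it.
        ptr.write(Goose { name: name.to_owned() });
        slot.assume_init()
    }
}

// Overwrite a value that is already initialised: plain assignment through
// the raw pointer is fine here, and it drops the old Goose first.
fn replace_goose(target: &mut Goose, name: &str) {
    let ptr: *mut Goose = target;
    unsafe {
        *ptr = Goose { name: name.to_owned() };
    }
}
```

Calling `make_goose("a")` and then `replace_goose(&mut g, "b")` on the result leaves `g.name` equal to `"b"`, with the old `String` freed by the assignment.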

Mixing safe and unsafe

Posted Oct 30, 2025 14:07 UTC (Thu) by notriddle (subscriber, #130608) [Link]

> I am not actually sure whether there is a real difference to being outside of the provenance, as you cannot load or store anyway.

You can subtract from it to increase the size. The backwards-iterator works that way, subtracting from the pointer and then reading it.
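
A minimal sketch of that pattern, assuming a plain `u32` slice (`sum_backwards` is a made-up name, and the real standard-library iterators are implemented differently in detail):

```rust
// Sum a slice by walking backwards from the one-past-the-end pointer.
fn sum_backwards(data: &[u32]) -> u32 {
    let start: *const u32 = data.as_ptr();
    // One past the end: valid to form, but it has 0 bytes it may load.
    let mut cur: *const u32 = unsafe { start.add(data.len()) };
    let mut total = 0u32;
    while cur != start {
        unsafe {
            // Step back first; the pointer is now in bounds and may be read.
            cur = cur.sub(1);
            total = total.wrapping_add(*cur);
        }
    }
    total
}
```

The end pointer is never dereferenced; it only becomes readable after `sub(1)` moves it back inside the slice it was derived from.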

Mixing safe and unsafe

Posted Oct 30, 2025 16:24 UTC (Thu) by tialaramex (subscriber, #21167) [Link]

Thanks for those excerpts.

I agree the terminology is confusing to a C programmer, who is used to thinking of pointers as always pointing to at least one whole byte of RAM, because Rust's pointers (even valid ones) don't necessarily do that (we can ask for a pointer to a single empty tuple, or indeed to an array of 126 empty tuples; neither of these pointers points to even a single byte of RAM because of course those tuples are zero size, but it is legal to point at them for whatever that's worth...).

I am quite sure that zero length writes are legal for arbitrary pointers, for example Rust considers that trying to store the empty tuple () to the null pointer is a reasonable thing to (unsafely) insist on doing, because it'll just evaporate - the compiler is guaranteed to realise that () is zero bytes wide, and writing zero bytes is not actually a write at all. So I think if there even is a distinction it's a distinction which doesn't make a difference - that "one past the end" pointer can correctly write zero bytes, but so could a "two past the end" pointer.
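
A small sketch of the zero-size case. The function names are hypothetical, and the sketch deliberately uses a dangling (non-null, aligned) pointer rather than the null pointer, on the assumption that this is the conservative variant: the documented rules for zero-sized accesses through null have changed between Rust releases.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Both the empty tuple and an array of 126 empty tuples occupy zero bytes.
fn zero_sized_sizes() -> (usize, usize) {
    (size_of::<()>(), size_of::<[(); 126]>())
}

// Store () through a pointer that refers to no allocation at all.
fn write_unit_to_dangling() {
    // Non-null and well aligned, but not backed by any memory.
    let p: *mut () = NonNull::<()>::dangling().as_ptr();
    unsafe {
        // A zero-byte write: nothing is actually stored anywhere.
        p.write(());
    }
}
```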

I am confident the provenance doesn't evaporate when we do this because that's what the (strict provenance API) map_addr trick relies on - we can take a valid pointer, change the address bits in some reversible way and we get an invalid pointer we mustn't dereference, but then later we can reverse the operation on that pointer, and now once again we've got a valid pointer. Flag bits hidden in pointers and some other fun tricks are thus legal in Rust's strict provenance while in C or C++ they're only potentially legalised via a fairly fraught pointer-integer-pointer roundtrip that Rust wanted to avoid. There should be no difference to the resulting machine code after optimisation, but good luck to any tools trying to verify that it's correct in C or C++...
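
A minimal sketch of the flag-bit trick, assuming a toolchain where the strict provenance methods `addr` and `map_addr` are stable (Rust 1.84 or later). The helper names `tag`, `untag` and `roundtrip` are made up for illustration; `u32` is used because its 4-byte alignment leaves the low bit free.

```rust
// Hide a one-bit flag in the low bit of an aligned pointer.
fn tag(ptr: *mut u32) -> *mut u32 {
    // Only the address changes; the provenance is carried along.
    ptr.map_addr(|addr| addr | 1)
}

// Recover the original pointer and the flag.
fn untag(ptr: *mut u32) -> (*mut u32, bool) {
    let flag = (ptr.addr() & 1) == 1;
    (ptr.map_addr(|addr| addr & !1), flag)
}

fn roundtrip(value: u32) -> (u32, bool) {
    let mut slot = value;
    let tagged = tag(&mut slot);
    // `tagged` is misaligned and must not be dereferenced as it stands.
    let (clean, flag) = untag(tagged);
    // The reversed pointer has the original address and provenance.
    (unsafe { *clean }, flag)
}
```

No integer-to-pointer cast appears anywhere in the round trip, which is the point of the strict provenance API.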

That drop glue statement does seem pretty clear - not sure how I missed that and I agree both that: In practice I'd write a write call to signify my intent and that going by that statement it is legal to use the storing operation instead if you could show that Goose doesn't impl Drop.

Mixing safe and unsafe

Posted Oct 29, 2025 15:03 UTC (Wed) by chris_se (subscriber, #99706) [Link]

> For C programmers this doesn't make sense, because in C there are three categories - pointers to things, which you can dereference; pointers one past things, which are allowed to exist but must never be dereferenced, and all other pointers which are invalid and no guarantees about them are provided by the language at all. So the intuitions are very different.

While you are technically correct with regards to the standard, in practice most C programmers have used pointers more like how unsafe Rust treats pointers. There is a LOT of code out there that steals some bits from pointers to store some additional information (especially in "lock-free" code), and technically that's UB in C if this is stored in pointer variables directly (AFAIK it would be OK if it were stored in uintptr_t, but next to nobody does that).

Also there's a lot of code out there where a void * can be used as a context, and some people just use it to store integers (because no pointer to actual data is needed) - again, technically UB, but there's a TON of C code out there that does this.

So I see what Rust does more like already codifying the current state of affairs in C, while the official C standard still says that all that code out there is technically UB. And the main reason in C for this is that C can in principle run on all sorts of exotic systems where this might in fact break. But a lot of C code out there still makes a lot of implicit assumptions about the environment (e.g. that a pointer is nothing more than an integer in the end) that Rust has just gone on and codified.

Mixing safe and unsafe

Posted Oct 31, 2025 17:24 UTC (Fri) by epa (subscriber, #39769) [Link] (2 responses)

You make a good point, but the same applies to ordinary C code. In unchecked C you immediately have undefined behaviour if there are invalid pointers in a linked list. And so on. It's surely easier for the programmer to worry about all these nasty problems in just 5% of the code than in all of it.

> You have to verify that the unsafe block is never called with violated invariants,

I didn't quite understand this point. You do have to verify that -- but surely to do so it's enough to prove that all unsafe blocks in your program are behaving nicely? If the unsafe blocks are correct, then the other 95% of the code (the "safe" part) will not violate any invariants -- or at least if it does so, the program will blow up at run time as soon as it happens. (Fil-C does not claim to give you the same thorough compile-time checking as Rust.)

Mixing safe and unsafe

Posted Oct 31, 2025 18:27 UTC (Fri) by matthias (subscriber, #94967) [Link] (1 responses)

> If the unsafe blocks are correct, then the other 95% of the code (the "safe" part) will not violate any invariants -- or at least if it does so, the program will blow up at run time as soon as it happens.

This is the way rust works, but not how Fil-C works. Fil-C checks the pointers when they are used. It does not check at compile time (that is what rust does with references) and it cannot check when they are constructed. A pointer can alias with an integer type. So you can write any value into the pointer and Fil-C will not complain. It will only complain when you try to use the pointer and the metadata is incorrect. If the first use of an invalid pointer is in the unsafe code and you have turned off the runtime checks in this part of the code then you have UB.

Once you turn off the runtime checks in any part of the code, you have to verify all code for correctness that touches the same memory as the unsafe code.
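
The check-at-use behaviour can be illustrated with a toy model. This is emphatically not Fil-C's real pointer representation or runtime; `CheckedPtr`, `Heap`, `view` and `load` are hypothetical names, and the sketch only shows that nothing is validated when the address is overwritten, while every load is checked against the pointer's metadata.

```rust
// Toy model only: NOT how Fil-C actually represents pointers.
#[derive(Clone, Copy)]
struct CheckedPtr {
    addr: usize,  // freely overwritable, like a pointer aliasing an integer
    lower: usize, // metadata: first index this pointer may access
    upper: usize, // metadata: one past the last index it may access
}

struct Heap {
    cells: Vec<u8>,
}

impl Heap {
    // Hand out a pointer to the sub-range [lower, upper) of the heap.
    fn view(&self, lower: usize, upper: usize) -> CheckedPtr {
        assert!(lower <= upper && upper <= self.cells.len());
        CheckedPtr { addr: lower, lower, upper }
    }

    // The only check happens here, at the point of use.
    fn load(&self, p: CheckedPtr) -> Result<u8, String> {
        if p.addr < p.lower || p.addr >= p.upper {
            return Err(format!("address {} outside [{}, {})", p.addr, p.lower, p.upper));
        }
        Ok(self.cells[p.addr])
    }
}
```

In this model, setting `p.addr = 99` succeeds silently and only a later `load` reports the problem. If that `load` were replaced by an unchecked access, the invalid address would go unnoticed, which is the situation described above.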

Mixing safe and unsafe

Posted Nov 1, 2025 15:02 UTC (Sat) by epa (subscriber, #39769) [Link]

Thanks, I understand now. So you’d have to prove that any pointers used within the unsafe block were valid pointers — a property not enforced in advance of using them, even by safe code.

Despite that drawback, I still feel that a mixed model with mostly safe code and a few unsafe hotspots would be more productive than doing everything in unsafe C, and might be fast enough when 100% safe Fil-C is too slow.

Did this find bugs?

Posted Oct 29, 2025 8:11 UTC (Wed) by marcH (subscriber, #57642) [Link] (3 responses)

Did this find memory bugs in existing code much? I mean the type of memory corruption that you often get away with, like use after free for a little while.

Did this find bugs?

Posted Oct 29, 2025 16:35 UTC (Wed) by mb (subscriber, #50428) [Link] (2 responses)

This doesn't look like it is about finding bugs, but rather about mitigating the effects of exploiting zero day bugs.

Did this find bugs?

Posted Oct 29, 2025 17:10 UTC (Wed) by marcH (subscriber, #57642) [Link] (1 responses)

Whatever the primary goal is, it should help find bugs, no?

Did this find bugs?

Posted Nov 2, 2025 11:47 UTC (Sun) by cpitrat (subscriber, #116459) [Link]

Disclaimer: I know nothing about Fil-C, I discovered it with this article. This is just the result of searching the web for 10 minutes.

The Fil-C website and git repo, surprisingly, don't seem to give an example of what Fil-C catching a bug looks like. However I found this YouTube video in which the author does a demo: https://youtu.be/Gij9UQy_JEQ

You can see that, although Fil-C detects the issues and interrupts the program, the output is not extremely useful when it comes to identifying the source of the bug. Once you hit an issue, you're probably better off using valgrind or asan to find out what went wrong and fix the bug.

I didn't find anything about compiling options (e.g. debug build, including symbols, ...) which would improve the output, for example by outputting a stack of where the error occurred. This is likely feasible, but this doesn't seem to be the main focus of the project, at least at the moment.

I think that's interesting

Posted Oct 29, 2025 15:20 UTC (Wed) by muase (subscriber, #178466) [Link]

Tbh I think that's pretty interesting; and I'm saying that as a fanatic crab.

If that works as a drop-in replacement for a significant amount of normal C applications, that would be a low-effort, instant fix for tons of security vulnerabilities. There are a lot of legacy projects that work perfectly fine, but where we have no real-world resources to recreate them in a safe language – and even if applications need some patches to compile with it, that is much more feasible to sell to the maintainers than "rewrite it in rust".

What would be really goat is if there'd be an ABI translation layer for Fil-C-ABI to C-ABI (not sure if that's feasible though). That way I could compile a non-performance-critical legacy library in Fil-C and tie it to Rust/Golang/Swift or other legacy-C. Of course that would invalidate safety guarantees at external ABI boundaries; but those are easier to audit/prove than library internals...

Overall, that sounds like a really tasty project; it'll be very interesting to see how it cooks, and how it works out IRL...