|
|
Subscribe / Log in / New account

Fil-C: A memory-safe C implementation

[LWN subscriber-only content]

Welcome to LWN.net

The following subscription-only content has been made available to you by an LWN subscriber. Thousands of subscribers depend on LWN for the best news from the Linux and free software communities. If you enjoy this article, please consider subscribing to LWN. Thank you for visiting LWN.net!

By Daroc Alden
October 28, 2025

Fil-C is a memory-safe implementation of C and C++ that aims to let C code — complete with pointer arithmetic, unions, and other features that are often cited as a problem for memory-safe languages — run safely, unmodified. Its dedication to being "fanatically compatible" makes it an attractive choice for retrofitting memory-safety into existing applications. Despite the project's relative youth and single active contributor, Fil-C is capable of compiling an entire memory-safe Linux user space (based on Linux From Scratch), albeit with some modifications to the more complex programs. It also features memory-safe signal handling and a concurrent garbage collector.

Fil-C is a fork of Clang; it's available under an Apache v2.0 license with LLVM exceptions for the runtime. Changes from the upstream compiler are occasionally merged in, with Fil-C currently being based on version 20.1.8 from July 2025. The project is a personal passion of Filip Pizlo, who has previously worked on the runtimes of a number of managed languages, including Java and JavaScript. When he first began the project, he was not sure that it was even possible. The initial implementation was prohibitively slow to run, since it needed to insert a lot of different safety checks. This has given Fil-C reputation for slowness. Since the initial implementation proved viable, however, Pizlo has managed to optimize a number of common cases, making Fil-C-generated code only a few times slower than Clang-generated code, although the exact slowdown depends heavily on the structure of the benchmarked program.

Reliable benchmarking is notoriously finicky, but in order to get some rough feel for whether that level of performance impact would be problematic, I compiled Bash version 5.2.32 with Fil-C and tried using it as my shell. Bash is nearly a best case for Fil-C, because it spends more time running external programs than running its own code, but I still expected the performance difference to be noticeable. It wasn't. So, at least for some programs, the performance overhead of Fil-C does not seem to be a problem in practice.

In order to support its various run-time safety checks, Fil-C does use a different internal ABI than Clang does. As a result, objects compiled with Fil-C won't link correctly against objects generated by other compilers. Since Fil-C is a full implementation of C and C++ at the source-code level, however, in practice this just requires everything to be recompiled with Fil-C. Inter-language linking, such as with Rust, is not currently supported by the project.

Capabilities

The major challenge of rendering C memory-safe is, of course, pointer handling. This is especially complicated by the fact that, as the long road to CHERI-compatibility has shown, many programs expect a pointer to be 32 or 64 bits, depending on the architecture. Fil-C has tried several different ways to represent pointers since the project's beginning in 2023. Fil-C's first pointers were 256 bits, not thread-safe, and didn't protect against use-after-free bugs. The current implementation, called "InvisiCaps", allows for pointers that appear to match the natural pointer size of the architecture (although this requires storing some auxiliary information elsewhere), with full support for concurrency and catching use-after-free bugs, at the expense of some run-time overhead.

Fil-C's documentation compares InvisiCaps to a software implementation of CHERI: pointers are separated into a trusted "capability" piece and an untrusted "address" piece. Since Fil-C controls how the program is compiled, it can ensure that the program doesn't have direct access to the capabilities of any pointers, and therefore the runtime can rely on them being uncorrupted. The tricky part of the implementation comes from how these two pieces of information are stored in what looks to the program like 64 bits.

When Fil-C allocates an object on the heap, it adds two metadata words before the start of the allocated object: an upper bound, used to check accesses to the object based on its size, and an "aux word" that is used to store additional pointer metadata. When the program first writes a pointer value into an object, the runtime allocates a new auxiliary allocation of the same size as the object being written into, and puts an actual hardware-level pointer (i.e., one without an attached capability) to the new allocation into the aux word of the object. This auxiliary allocation, which is invisible to the program being compiled, is used to store the associated capability information for the pointer being stored (and is also reused for any additional pointers stored into the object later). The address value is stored into the object as normal, so any C bit-twiddling techniques that require looking at the stored value of the pointer work as expected.

This approach does mean that structures that contain pointers end up using twice as much memory, and every load of a pointer involves a pointer indirection through the aux word. In practice, the documentation claims that the performance overhead of this approach for most programs makes them run about four times more slowly, although that number depends on how heavily the program makes use of pointers. Still, he has ideas for several optimizations that he hopes can bring the performance overhead down over time.

One wrinkle with this approach is atomic access to pointers — i.e. using _Atomic or volatile. Luckily, there is no problem that cannot be solved with more pointer indirection: when the program loads or stores a pointer value atomically, instead of having the auxiliary allocation contain the capability information directly, it points to a third 128-bit allocation that stores the capability and pointer value together. That allocation can be updated with 128-bit atomic instructions, if the platform supports them, or by creating new allocations and atomically swapping the pointers to them.

Since the aux word is used to store a pointer value, Fil-C can use pointer tagging to store some additional information there as well; that is used to indicate special types of objects that need to be handled differently, such as functions, threads, and mmap()-backed allocations. It's also used to mark freed objects, so that any access results in an error message and a crash.

Memory management

When an object is freed, its aux word marks it as a free object, which lets the auxiliary allocation be reclaimed immediately. The original object can't be freed immediately, however. Otherwise, a program could free an object, allocate a new object in the same location, and thereby cover up use-after-free bugs. Instead, Fil-C uses a garbage collector to free an object's backing memory only once all of the pointers to it go away. Unlike other garbage collectors for C — such as the Boehm-Demers-Weiser garbage collector — Fil-C can use the auxiliary capability information to track live objects precisely.

Fil-C's garbage collector is both parallel (collection happens faster the more cores are available) and concurrent (collection happens without pausing the program). Technically, the garbage collector does require threads to occasionally pause just long enough to tell it where pointers are located on the stack, but that only occurs at special "safe points" — otherwise, the program can load and manipulate pointers without notifying the garbage collector. Safe points are used as a synchronization barrier: the collector can't know that an object is really garbage until every thread has passed at least one safe point since it finished marking. This synchronization is done with atomic instructions, however, so in practice threads never need to pause for longer than a few instructions.

The exception is the implementation of fork(), which uses the safe points needed by the garbage collector to temporarily pause all of the threads in the program in order to prevent race conditions while forking. Fil-C inserts a safe point at every backward control-flow edge, i.e., whenever code could execute in a loop. In the common case, the inserted code just needs to load a flag register and confirm that the garbage collector has not requested anything be done. If the garbage collector does have a request for the thread, the thread runs a callback to perform the needed synchronization.

Fil-C uses the same safe-point mechanism to implement signal handling. Signal handlers are only run when the interrupted thread reaches a safe point. That, in turn, allows signal handlers to allocate and free memory without interfering with the garbage collector's operation; Fil-C's malloc() is signal-safe.

Memory-safe Linux

Linux From Scratch (LFS) is a tutorial on compiling one's own complete Linux user space. It walks through the steps of compiling and installing all of the core software needed for a typical Linux user space in a chroot() environment. Pizlo has successfully run through LFS with Fil-C to produce a memory-safe version, although a non-Fil-C compiler is still needed to build some fundamental components, such as Fil-C's own runtime, the GNU C library, and the kernel. (While Fil-C's runtime relies on a normal copy of the GNU C library to make system calls, the programs that Fil-C compiles use a Fil-C-compiled version of the library.)

The process is mostly identical to LFS up through the end of chapter 7, because everything prior to that point consists of using cross-build tools to obtain a working compiler in the chroot() environment. The one difference is that the cross-build tools are built with a different configured prefix, so that they won't conflict with Fil-C. At that point, one can build a copy of Fil-C and use it to mostly replace the existing compiler. The remaining steps of LFS are unchanged.

Scripts to automate the process are included in the Fil-C Git repository, including some steps from Beyond Linux From Scratch that result in a working graphical user interface and a handful of more complicated applications such as Emacs.

Overall, Fil-C offers a remarkably complete solution for making existing C programs memory-safe. While it does nothing for undefined behavior that is not related to memory safety, the most pernicious and difficult-to-prevent security vulnerabilities in C programs tend to rely on exploiting memory-unsafe behavior. Readers who have already considered and rejected Fil-C for their use case due to its early performance problems may wish to take a second look — although anyone hoping for stability might want to wait for others to take the plunge, given the project's relative immaturity. That said, for existing applications where a sizeable performance hit is preferable to an exploitable vulnerability, Fil-C is an excellent choice.




to post comments

Fil-C for programmers

Posted Oct 28, 2025 17:54 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (3 responses)

I'm interested in adoption of Fil-C by C and C++ programmers because for some years my assumption has been that most remaining C programmers in particular want DWIM and so they won't be any happier in Fil-C than in say (safe) Rust since these languages both lack the unconstrained Undefined Behaviour where DWIM thrives, if you write nonsense in Fil-C your program doesn't work and it's your fault which presumably was not what you meant. So widespread enthusiasm for Fil-C from C programmers would indicate that I was completely wrong, which would be a pleasant surprise.

For users in a bunch of cases this is a no brainer, Daroc gave their shell as an example but I'm sure most of us run many programs every day where raw perf just isn't a big deal. I am interested in programmers rather than users because I think that influences whether Fil-C is just an interesting project for our moment or it becomes a "successor" to C in a way that Zig, Odin etc. never could.

Fil-C for programmers

Posted Oct 28, 2025 18:15 UTC (Tue) by rahulsundaram (subscriber, #21946) [Link] (1 responses)

> I am interested in programmers rather than users because I think that influences whether Fil-C is just an interesting project for our moment or it becomes a "successor" to C in a way that Zig, Odin etc. never could.

My expectation is that there isn't going to be a single successor to C. For some group of people, that was C++ a long time back. For others it is going to Rust or Zig or something else. For the final group they are going to keep coding in C forever and it will more of a generational change eventually.

Fil-C for programmers

Posted Oct 28, 2025 18:32 UTC (Tue) by daroc (editor, #160859) [Link]

I think the point of Fil-C, as a C implementation, is precisely to support the people and projects that are just going to keep using C — in a way that still lets users avoid memory safety issues if they so choose.

Fil-C for programmers

Posted Oct 28, 2025 18:34 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link]

If this is just another C implementation, it's distribution packagers who would need to adopt it and possibly other people who routinely compile but not necessarily write code.

How is this different from tools like Valgrind and Address Sanitizer?

Posted Oct 28, 2025 19:55 UTC (Tue) by oldnpastit (subscriber, #95303) [Link] (3 responses)

This seems to be run-time (as opposed to compile-time checking) - i.e. what Valgrind and ASAN do. Or have I misunderstood it?

How is this different from tools like Valgrind and Address Sanitizer?

Posted Oct 28, 2025 20:28 UTC (Tue) by bertschingert (subscriber, #160729) [Link]

Fil-C seems to be more similar to ASAN than Valgrind in that the compiler outputs code with the instrumentation / checking present, rather than running already compiled code in a virtual machine as Valgrind does.

But it would seem to be more robust than ASAN; from reading about how ASAN works, it seems that it puts "poisoned" bytes around an allocation, so that memory accesses shortly after the end of a buffer hit those poisoned bytes and are caught. However, ASAN wouldn't catch an invalid access to a non-poisoned address of memory via a particular a pointer, if that address was allocated in a separate allocation. [1]

I assume Fil-C's pointer capability model is able to catch "provenance" violations like that.

[1] https://blog.gistre.epita.fr/posts/benjamin.peter-2022-10...

How is this different from tools like Valgrind and Address Sanitizer?

Posted Oct 28, 2025 20:33 UTC (Tue) by excors (subscriber, #95769) [Link] (1 responses)

From the readme:

> Fil-C is engineered to prevent memory safety bugs from being used for exploitation rather than just simply flagging them often enough to find bugs. This makes Fil-C different from AddressSanitizer, HWAsan, or MTE, which can all be bypassed by attackers. The key difference that makes this possible is that Fil-C is capability based (so each pointer knows what range of memory it may access, and how it may access it) rather than tag based (where pointer accesses are allowed if they hit valid memory).

Clang says "AddressSanitizer's runtime was not developed with security-sensitive constraints in mind and may compromise the security of the resulting executable", so it should not be used in production.

Valgrind has much worse performance (the manual claims 10-50x slowdown, plus it's effectively single-threaded), which is probably bad enough to make it unusable in production, and similarly will miss many memory safety bugs.

How is this different from tools like Valgrind and Address Sanitizer?

Posted Oct 28, 2025 21:26 UTC (Tue) by cyperpunks (subscriber, #39406) [Link]

So it's kind of small virtual machine with garbage collection that happens to be compatible with C/C++ based source code?


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds