
Famfs, FUSE, and BPF

By Jonathan Corbet
April 23, 2026
The famfs filesystem first showed up on the mailing lists in early 2024; since then, it has been the topic of regular discussions at the Linux Storage, Filesystem, Memory Management and BPF (LSFMM+BPF) Summit. It has also, as a result of those discussions, been through some significant changes since that initial posting. So it is not surprising that a suggestion that it needed to be rewritten yet again was not entirely well received. How much more rewriting will actually be needed is unclear, but more discussion appears certain.

Famfs is designed to support large, read-mostly filesystems stored in shared memory. In practice, this means huge data sets kept in CXL-attached memory that is made available to multiple systems simultaneously. In normal usage, software running on those systems will access this data by mapping it directly into its address space with mmap(), so that the data is all immediately accessible without system calls, and without going through the system's page cache. It is possible to perform normal filesystem reads and writes, though write access is only minimally supported.
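To make the access model concrete, here is a minimal sketch using Python's standard mmap module to read part of a file through a mapping instead of read(). It uses an ordinary temporary file, so in this sketch the mapping is backed by the page cache; on famfs, it is the filesystem's use of DAX, not mmap() itself, that lets the mapping point straight at the shared CXL memory. The helper name read_via_mmap() is hypothetical.

```python
import mmap
import os
import tempfile

def read_via_mmap(path, offset, length):
    """Map a file read-only and return `length` bytes starting at `offset`.

    Once the mapping exists, the slice below is an ordinary memory access;
    no read() system call is issued for it.
    """
    with open(path, "rb") as f:
        # A length of 0 maps the whole file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return bytes(m[offset:offset + length])

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello, shared memory")
        os.close(fd)
        print(read_via_mmap(path, 7, 6))  # b'shared'
    finally:
        os.unlink(path)
```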

In its initial form, famfs was implemented like any other standalone filesystem, but it included a user-space component that drew a fair amount of attention from the filesystem developers at the 2024 LSFMM+BPF session. Given that some of famfs is already implemented in user space, they suggested, it might be better to just use FUSE, which was designed for just that kind of filesystem. At the time, famfs creator John Groves said that he was unsure whether FUSE would work, but would be willing to give it a try. His concern was that famfs must operate at memory speeds, and could thus not afford to call into user space to resolve page faults.

At the 2025 LSFMM+BPF gathering, Groves returned with a shiny new FUSE-based implementation that appeared to have solved that problem. It introduces two new FUSE operations toward that goal. GET_FMAP is invoked when a file is opened; the user-space server responds by providing the kernel with a list of memory locations and lengths, providing a map of how the file is laid out in shared memory. Thereafter, the kernel is able to resolve page faults without having to go back to user space for more information. The other operation, GET_DAXDEV, provides information about the CXL devices on which the shared memory is hosted.
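The key property of GET_FMAP is that everything a page fault needs is already in the kernel before the first fault happens. The Python sketch below models that lookup under stated assumptions: the Extent and FileMap names, the (device index, device offset, length) layout, and the binary search are hypothetical illustrations, not the famfs wire format or kernel code. The dev_index field stands in for a device described through GET_DAXDEV.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class Extent:
    # Hypothetical fmap entry: which DAX device holds the extent,
    # where on that device it starts, and how many bytes it covers.
    dev_index: int
    dev_offset: int
    length: int

class FileMap:
    """Per-file extent map, built once when the file is opened."""

    def __init__(self, extents):
        self.extents = list(extents)
        # starts[i] is the file offset at which extents[i] begins.
        self.starts = []
        pos = 0
        for ext in self.extents:
            self.starts.append(pos)
            pos += ext.length
        self.size = pos

    def resolve(self, file_offset):
        """Translate a file offset into (dev_index, dev_offset).

        This is the work a fault handler would do; it uses only the
        cached map, with no round trip to user space.
        """
        if not 0 <= file_offset < self.size:
            raise ValueError("offset beyond end of file")
        i = bisect_right(self.starts, file_offset) - 1
        ext = self.extents[i]
        return ext.dev_index, ext.dev_offset + (file_offset - self.starts[i])
```

For example, a file made of an 8KiB extent on device 0 followed by a 4KiB extent on device 1 resolves offset 9000 to the second extent, 808 bytes in.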

There did not appear to be fundamental objections to the FUSE implementation. Over the following year, Groves worked on refining the code; when he posted version 10 at the end of March, he had every reason to believe that the work, which he had described in February as having been "kinda hard", was close to ready for merging into the mainline. But, as he discovered, merging is never certain until it actually happens.

Better done in BPF?

Joanne Koong, who has done a fair amount of work with FUSE over the years, asked a seemingly simple question:

I'm curious to hear your thoughts on whether you think it makes sense for the famfs-specific logic in this series to be moved to a bpf program that goes through a generic fuse iomap dax layer.

Changing the code in that way, Koong suggested, would make the FUSE logic more extensible and applicable to other types of filesystems. It would also bring more flexibility to famfs, making it easier, for example, to adjust to changes to files after they have been opened, and allowing famfs updates to be made available more quickly to users, since they would not have to wait for the usual kernel release cycle. She also said that she had posted a prototype implementation of a BPF-based famfs back in November, and suggested that switching over to this approach would not involve a huge rewrite of the famfs code.

The FUSE maintainer is Miklos Szeredi; he entered the conversation saying that he would prefer not to add a famfs-specific FUSE interface if it could be avoided. The BPF idea thus appealed to him; he suggested that it should be given a try before considering a merge of the existing famfs patches.

It is fair to say (and understandable) that Groves was not entirely pleased by this turn in the conversation. He would, he said, "object vehemently" to being required to undertake this rewrite before the code could be merged. The current implementation, he said, matches what had been asked of him two years ago. He later added that there would be some real risks involved with the BPF approach, starting with the fact that he would have to learn how to work with BPF, and the performance impacts of such a change would be unknown. The current version is already shipping to users, he said; it is too late to demand such changes now.

Possible solutions

The purpose of the famfs user-space component is to determine where the extents of a given file have been placed in memory and to inform the kernel of those placements. The BPF alternative would work similarly, with user space providing that information in a filesystem-independent way; the BPF program would then provide the filesystem-specific interpretation for the rest of the kernel. One possibility, for example, could be for user space to store the extent information in either a BPF map or an arena.
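A rough Python model of that split: a dictionary stands in for a BPF map that user space fills in, famfs_style_begin() plays the role of the filesystem-specific BPF program, and generic_fault() is the filesystem-independent layer that only knows how to call the hook. All of these names, and the IoMap record, are hypothetical simplifications; none of this is the real BPF, FUSE, or iomap API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class IoMap:
    # Hypothetical, simplified mapping result: file bytes
    # [offset, offset + length) live at dev_offset on device dev.
    offset: int
    length: int
    dev: int
    dev_offset: int

# Stand-in for a BPF map shared by user space and the program:
# inode number -> list of (dev, dev_offset, length) extents.
ExtentMap = Dict[int, List[Tuple[int, int, int]]]

def famfs_style_begin(extent_map: ExtentMap, ino: int, pos: int) -> IoMap:
    """Filesystem-specific hook: the part a BPF program would supply."""
    if pos < 0:
        raise ValueError("negative offset")
    file_off = 0
    for dev, dev_off, length in extent_map[ino]:
        if pos < file_off + length:
            delta = pos - file_off
            return IoMap(pos, length - delta, dev, dev_off + delta)
        file_off += length
    raise ValueError("no mapping for offset")

def generic_fault(hook: Callable[[ExtentMap, int, int], IoMap],
                  extent_map: ExtentMap, ino: int, pos: int) -> Tuple[int, int]:
    """Generic layer: knows nothing about famfs, only calls the hook."""
    m = hook(extent_map, ino, pos)
    return m.dev, m.dev_offset
```

Because the lookup consults the shared map every time, user space can replace an inode's extent list after the file has been opened and the next lookup sees the new layout; that is the kind of flexibility Koong described.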

Various attempts at a solution along these lines exist already. As Darrick Wong pointed out in the discussion, he had posted one such implementation in February, based loosely on Koong's work. In short, it provides access to the kernel's iomap layer, with BPF hooks to help FUSE filesystems complete the mappings. Wong complained that he had not gotten any review responses, and expressed disappointment with the pace of reviews in the FUSE subsystem in general. He thought that a BPF-based implementation could be upstreamed within two development cycles, if the relevant maintainers would accept it. Whether that would happen, he suggested, is far from clear:

The issues I was alluding to are BPF being used as a means to get around slow/unresponsive maintainers; and the kernel community's collective refusal to explore any other path to building new user APIs besides designing everything generically perfectly up front in the kernel UABI along with all the stress that involves.

Rather than trying to develop the perfect API from the beginning, he later said, the best approach might be to merge famfs in its current form, then experiment with alternative approaches afterward. If the interface is carefully designed, it should be possible to move to a better one in the future, should one be found. Others in the conversation also suggested that this might be the best way. Christoph Hellwig, though, was strongly against that idea, saying that the multitude of approaches under consideration showed that more thought needed to be put into designing a single interface.

Gregory Price, meanwhile, complained that working software was being held back in favor of an unproven approach that might well offer worse performance; "John is right to push back here". But he also suggested that the existing interface might, in fact, be more generic than it appears, and could be the basis for a longer-term solution as well:

That said - I'm looking at fs/fuse/famfs.c and I'm asking myself what in here is actually famfs-specific. If you just s/FAMFS/DAX/g - the file just reads like a simple DAX-iomap backend with optional striping.

Would it be reasonable to refactor the dax layer (and users) to create an ops structure that becomes the basis for the BPF solution?
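Price's suggestion can be sketched as an ops structure with a single mapping callback, implemented once for a plain contiguous range and once for a striped layout. The round-robin striping scheme (chunk_size pieces rotated across devices) and every name here (DaxMapOps, LinearBackend, StripedBackend, dax_lookup()) are assumptions for illustration; they are not taken from fs/fuse/famfs.c.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple

class DaxMapOps(Protocol):
    # Hypothetical ops table: the one callback a generic DAX/iomap
    # layer would need from a backend.
    def map_offset(self, pos: int) -> Tuple[int, int]: ...

@dataclass
class LinearBackend:
    """A file stored as one contiguous range on one device."""
    dev: int
    base: int
    length: int

    def map_offset(self, pos: int) -> Tuple[int, int]:
        if not 0 <= pos < self.length:
            raise ValueError("offset out of range")
        return self.dev, self.base + pos

@dataclass
class StripedBackend:
    """A file interleaved in chunk_size pieces across several devices."""
    devs: List[int]
    bases: List[int]  # where this file's data starts on each device
    chunk_size: int

    def map_offset(self, pos: int) -> Tuple[int, int]:
        if pos < 0:
            raise ValueError("negative offset")
        chunk, within = divmod(pos, self.chunk_size)
        row, stripe = divmod(chunk, len(self.devs))
        return (self.devs[stripe],
                self.bases[stripe] + row * self.chunk_size + within)

def dax_lookup(ops: DaxMapOps, pos: int) -> Tuple[int, int]:
    """Generic code: works with any backend that provides map_offset()."""
    return ops.map_offset(pos)
```

The generic dax_lookup() never needs to know which layout it is dealing with, which is the property that would let such an ops structure serve famfs directly or become the hook point for a BPF-supplied backend.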

That led memory-management developer David Hildenbrand to ask whether the BPF solution would be acceptable to the memory-management developers, a question that Groves was also acutely interested in. If the answer is "no", Groves said, much of the discussion described here would be moot. Meanwhile, he added, he had just received a prototype implementation from Price that could be interesting; Price then described that solution, which involves a BPF callback at file-open time to do the equivalent of the GET_FMAP call.

As of this writing, Groves is evaluating how well Price's prototype implementation will work. It seems clear, though, that no conclusion will be reached in the email discussion. The next LSFMM+BPF meeting, as it happens, is the first week in May. That will be the perfect opportunity to lock the filesystem, memory-management, and BPF developers into the same room and deprive them of beer until they come up with a solution that all can live with.

Index entries for this article
Kernel: BPF/Filesystems
Kernel: Filesystems/famfs



The other fam

Posted Apr 23, 2026 14:17 UTC (Thu) by phlogistonjohn (subscriber, #81085)

I had to look up the background on the name - this fam has nothing to do with ye olde File Alteration Monitor (FAM) but rather is a Fabric Attached Memory FS. Gotta love TLAs :-)

No Beer?!?

Posted Apr 23, 2026 14:29 UTC (Thu) by archaic (subscriber, #111970)

It is sometimes amazing what beer deprivation can accomplish. :)

Feels soul destroying

Posted Apr 23, 2026 23:31 UTC (Thu) by azumanga (subscriber, #90158)

Years ago I had a similar experience with gcc: a bunch of devs and I discussed an idea, I went away, polished, and tested it, then when it came to merge time someone else popped up and said the whole thing was a bad idea and I should have done it all a different way.

People get to disagree of course, but it’s very depressing, you feel like you are wasting time you aren’t even getting paid for. I had previously made dozens of gcc patches and I never made another significant one after that again.

Without beer?

Posted Apr 24, 2026 4:21 UTC (Fri) by felixfix (subscriber, #242)

Surely that would be a crime against humanity. I suggest an alternative: lock them up with nothing but beer, and no bathroom breaks. If you're going to commit a crime against humanity, may as well go whole hog.

mmap and page cache

Posted Apr 25, 2026 17:21 UTC (Sat) by dankamongmen (subscriber, #35141)

"access this data by mapping it directly into its address space with mmap(), so that the data is all immediately accessible without system calls, and without going through the system's page cache"

mmap(2) does not bypass the page cache as far as i'm aware.

mmap and page cache

Posted Apr 25, 2026 19:08 UTC (Sat) by gmprice (subscriber, #167884)

O_DIRECT will bypass the page cache, but it's irrelevant - FAMFS is read-only. So mmap will map the page cache page if not mapped O_DIRECT

mmap and page cache

Posted Apr 25, 2026 22:59 UTC (Sat) by iabervon (subscriber, #722)

Bypassing the page cache is because it uses DAX to map CXL memory directly into user space. That is, all of the storage is addressable on these machines as memory already, so there's no need to copy it into other memory, but you want a filesystem to organize it and provide permissions for getting these pages into user space page tables. It wouldn't bypass the page cache if it was backed by a block device.

mmap and page cache

Posted Apr 26, 2026 4:07 UTC (Sun) by corbet (editor, #1)

As noted by others, it's not mmap() itself that bypasses the page cache, it's the use of DAX within famfs that brings that about. Sorry, I sort of left out a step there.

mmap and page cache

Posted Apr 26, 2026 13:49 UTC (Sun) by Paf (subscriber, #91811)

It depends what you mean by bypass. Once the memory has been faulted in, it isn’t required to make the trip through the page cache code to access it. It is stored in the page cache and must be faulted into the userspace memory (and read from storage). But after that.

mmap and page cache

Posted Apr 26, 2026 18:28 UTC (Sun) by corbet (editor, #1)

No, the whole point of famfs is that the data is already in memory, so copying it into the page cache would only be a waste. It bypasses the page cache entirely.

Typo

Posted Apr 26, 2026 9:20 UTC (Sun) by j0057 (subscriber, #143939)

I think the index entry should be s/BFP/BPF/.

Typo

Posted Apr 26, 2026 11:45 UTC (Sun) by jzb (editor, #7867)

Fixed. However, as it says just above the comment box, we ask that people email us about typos rather than leaving it as a comment. Thanks!

My two cents...

Posted Apr 29, 2026 19:36 UTC (Wed) by Heretic_Blacksheep (subscriber, #169992)

They should also be considering that many shops that otherwise might want to use famfs won't because they have security concerns about the eBPF subsystem in Linux.

I'm also having a problem with the gatekeeping and moving goal posts going on with the kernel maintainers. There are Arm architecture maintainers on record *refusing* to allow merging of ARM64 CPU features that enable more performant AMD64 emulation as if the kernel is their pet project and not the efforts of thousands and at the service of many millions. The shifting goal posts for famfs is just evidence of more of the same. It's one thing when projects refuse to play by the rules. It's quite another when outsiders engage in good faith then repeatedly get shot down because multiple people have too much veto power in personal fiefdoms. You end up like the Polish government back in the day, where nothing could get done that didn't cater to the most powerful of the barons, who could marshal the others or browbeat them out of using their veto, or who just did what they wanted because no one else could marshal an objection to a fait accompli.

Has anyone done an analysis of how many lines of patches that go into the various distro maintained kernels that aren't/won't get upstreamed, and how many were refused by 'official' subsystem maintainers? Sometimes they do have legit reasons to block some patch sets, but the real question is how many were blocked for specious reasons like the Arm compatibility patches were.

Famfs is not read only, but some use cases are

Posted May 3, 2026 15:02 UTC (Sun) by jagalactic (subscriber, #74260)

John Groves here, the author of famfs.

I'm not sure why, but it's a common misconception that famfs is read only. That is not true.

The reality is more complicated. The relationship of a file to its memory is fixed for the life of a file system (i.e. until you wipe it with a new mkfs.famfs). But the data in files can be writable from any node that can see a file. But if you do that, you are responsible for maintaining coherency for whatever it is that you're doing. (Famfs maintains coherency for its own metadata, but users are responsible for coherency of data in files.)

If you use mmap(...MAP_SHARED) on conventional memory, you have the same problem and need to use barriers/atomics/etc. to get things right. But conventional shared memory is "hardware cache coherent". Disaggregated shared memory makes those problems worse, but not impossible.

Because the coherency stuff is complicated, I have been encouraging use cases that are primarily "publish and share" because those don't have any cache-coherency gotchas. Dump a bunch of humongous data sets into famfs files, mount the shared memory file system on multiple nodes, and run analytics on it.

It's not the only thing that's possible, but there are actually good use cases that do this - and those can move onto famfs and use shared memory without heavy lifting. I expect some of those use cases to migrate to shared memory clusters with famfs from very expensive purpose-built architectures.

If the data is huge, the common techniques we have are sharding or demand-paging - both of which suck, but there are cases where each works pretty well. If you have 100TB of data that needs un-sharded in-memory performance, you can order an appliance with that much memory and dump your data into it in famfs. If you're not mutating in place, apps can use the files in famfs without even knowing it's special.

But you can't use famfs as a general purpose file system, for reasons I'll avoid ratholing on here (but they are very much covered in my many talks on YouTube, our recent IEEE paper "No Atomics, No Problem...", etc.).

(PS: I will be keeping my beer with me in my backpack during LSFMM :D)

Famfs is not read only, but some use cases are

Posted May 3, 2026 15:19 UTC (Sun) by corbet (editor, #1)

The article pretty carefully did not say that famfs is read-only...did a mistake slip in there somewhere?

Famfs is not read only, but some use cases are

Posted May 3, 2026 15:39 UTC (Sun) by jagalactic (subscriber, #74260)

Sorry, no Jon! What you wrote is great.

Gregory Price said famfs is read-only in the comments, and there have been other publications that got that wrong!

I appreciate your coverage of this!

Famfs is not read only, but some use cases are

Posted May 3, 2026 23:23 UTC (Sun) by gmprice (subscriber, #167884)

Sure, apologies, I should have been a bit more nuanced in the sense that in its current form it's append only (no delete without heroics, no mutations in place).

From the perspective of a workload mounting the system (as not the master), the files which appear in the log are read-only.

Famfs is not read only, but some use cases are

Posted May 4, 2026 7:09 UTC (Mon) by gmprice (subscriber, #167884)

Self-correction: my info here was predicated on an earlier version of famfs. After talking with John, it's become apparent that the filesystem has evolved and in fact does not dictate this read-only nature anymore.

Sorry for the confusion.


Copyright © 2026, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds