
Justifying FS-Cache

By Jake Edge
December 24, 2008

In what must seem like a never-ending effort, David Howells is once again trying to get a generic mechanism for local caching of network filesystems into the kernel. The latest version, number 41, of his FS-Cache patches was posted back in November, and he is now asking for it to be added to linux-next. That would put the feature on track for the mainline in 2.6.29, but it would appear that 2.6.30, if it is merged at all, is more likely.

The idea behind FS-Cache is to create a way for "slow" filesystems to cache their data on a local disk, so that repeated accesses do not require going back to the underlying slow storage. Howells has been working on getting it into the kernel for a number of years; our first article about it appeared in 2004. The canonical example of where it might be useful is a network filesystem on a heavily-used or low-bandwidth link, where the cost of re-reading data over the network may be much higher than retrieving it from a local disk. In addition, the cache can be persistent across reboots, allowing some files to live locally for a very long time.
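
In practical terms, the moving parts look roughly like this, based on the RHEL/Fedora packaging of the patches (paths, option names, and defaults may vary between versions): the kernel supplies the FS-Cache core and its CacheFiles backend, the cachefilesd daemon manages a cache directory on a local filesystem, and an NFS mount opts in with the fsc option:

    # /etc/cachefilesd.conf: where the cache lives and when to cull it
    dir /var/fscache
    tag mycache
    brun 10%      # stop culling when at least this much space is free
    bcull 7%      # start culling when free space falls below this
    bstop 3%      # stop allocating cache space entirely below this

    # then start the daemon and mount with caching enabled
    service cachefilesd start
    mount -t nfs -o fsc server:/export /mnt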

But Howells already has a fairly large, intrusive patch headed for 2.6.29: credentials. That patch touches a lot of code in the kernel, in particular the VFS layer. Christoph Hellwig is concerned about both credentials and FS-Cache going in at the same time:

I don't think we want fscache for .29 yet. I'd rather let the credential code settle for one release, and have more time for actually reviewing it properly and have it 100% ready for .30.

While that would delay the addition of FS-Cache, Andrew Morton has a larger concern:

I don't believe that it has yet been convincingly demonstrated that we want to merge it at all.

It's a huuuuuuuuge lump of new code, so it really needs to provide decent value. Can we revisit this? Yet again? What do we get from all this?

Morton is worried about adding additional maintenance headaches for no, or only limited, benefit. Using a local disk to cache data from a remote disk is only useful in some scenarios; it can certainly make things worse in others. As Howells puts it: "It's a compromise: a trade-off between the loading and latencies of your network vs the loading and latencies of your disk; you sacrifice disk space to make up for the deficiencies of your network." What Morton is looking for is a push from users, whether end users or distributions that are shipping the feature. He would also like to see some benchmarks that show what is gained by using FS-Cache.
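
To put rough, purely illustrative numbers on that trade-off (these are not from Howells's benchmarks):

    600MB re-read over a congested link at ~2MB/s   = ~300 seconds
    600MB re-read from a local disk at ~60MB/s      =  ~10 seconds

The first read still pays the full network price, plus the cost of writing the cache copy; data that is read only once makes the cache pure overhead.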

Howells has patiently answered these concerns, pointing to some benchmarks he had posted in November that showed significant savings. The benchmarks used NFS over a deliberately slowed link (to simulate a heavily-used network) and showed a huge decrease in the time required to read a large file, but were essentially break-even when operating on a kernel tree. In the kernel-tree benchmark, though, the reduction in network traffic was significant.

Perhaps more important is the fact that Red Hat has shipped FS-Cache in RHEL 5 and that there are customers using it, as well as customers interested in using it, as Howells pointed out:

We (Red Hat) have shipped it in RHEL-5 and some Fedora releases. Doing so is quite an effort, though, precisely because the code is not yet upstream. We have customers using it and are gaining more customers who want it. There even appear to be CentOS users using it (or at least complaining when it breaks).

While shipping out-of-tree code is no guarantee that the feature will get merged—AppArmor is an excellent counterexample—actual users whose needs are being met by a particular feature are a fairly persuasive argument. Howells outlines some customer use cases for FS-Cache, for example:

We have a number of customers in the entertainment industry who use or would like to use this caching infrastructure in their render farms. They use NFS to distribute textures (say a million and a quarter files) to the individual rendering units. FS-Cache allows them to reduce the network load by satisfying subsequent NFS READ requests from each rendering unit's local cache rather than having to go to the network again.

In all, it would seem that Morton's concerns have been addressed. Whether that means the path is clear for 2.6.30, or whether these or other concerns will come to the fore, is a question that will likely have to wait another three months or so.



Justifying FS-Cache

Posted Dec 25, 2008 15:35 UTC (Thu) by dw (subscriber, #12017) [Link]

Why can't identical behaviour be provided in userspace? The added CPU+latency of extra context switches and memory copies involved in using some evil combination of unionfs and FUSE would be negligibly higher than the delay already faced with pulling something over a network (and especially via a contended pipe).

I have in my head a 100ish line Python script that would accomplish this without resorting to all that C code.
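
Such a script might look something like the following minimal sketch, which uses the third-party fusepy bindings (an assumption; the comment names no particular library) to pull whole files into a local cache directory on first read. It cheerfully ignores cache invalidation, coherence, and writes, which is much of what FS-Cache actually has to get right:

    #!/usr/bin/env python
    # Minimal read-through cache over FUSE -- a sketch, not FS-Cache.
    # Assumes the third-party fusepy package ("pip install fusepy").
    import os, shutil, sys
    from fuse import FUSE, FuseOSError, Operations

    class ReadThroughCache(Operations):
        def __init__(self, backing, cache):
            self.backing, self.cache = backing, cache

        def _paths(self, path):
            rel = path.lstrip('/')
            return (os.path.join(self.backing, rel),
                    os.path.join(self.cache, rel))

        def getattr(self, path, fh=None):
            slow, _ = self._paths(path)
            try:
                st = os.lstat(slow)          # metadata always from the backing tree
            except OSError as e:
                raise FuseOSError(e.errno)
            return dict((key, getattr(st, key)) for key in
                        ('st_mode', 'st_nlink', 'st_size', 'st_uid',
                         'st_gid', 'st_atime', 'st_mtime', 'st_ctime'))

        def readdir(self, path, fh):
            slow, _ = self._paths(path)
            return ['.', '..'] + os.listdir(slow)

        def readlink(self, path):
            slow, _ = self._paths(path)
            return os.readlink(slow)

        def read(self, path, size, offset, fh):
            slow, fast = self._paths(path)
            if not os.path.exists(fast):     # cache miss: copy the whole file over
                os.makedirs(os.path.dirname(fast), exist_ok=True)
                shutil.copyfile(slow, fast)
            with open(fast, 'rb') as f:      # cache hit: local disk from here on
                f.seek(offset)
                return f.read(size)

    if __name__ == '__main__':
        # usage: cachefs.py <slow-backing-dir> <local-cache-dir> <mountpoint>
        FUSE(ReadThroughCache(sys.argv[1], sys.argv[2]), sys.argv[3],
             foreground=True, ro=True)

Even a toy like this shows why the kernel-side approach is attractive: every read of an already-cached file still crosses the FUSE boundary, and nothing here notices a conflicting write on the server.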

Justifying FS-Cache

Posted Dec 25, 2008 15:37 UTC (Thu) by dw (subscriber, #12017) [Link]

Ignoring that for a second, why can't they just LART their "entertainment industry customers" and have them add 10 lines of code to their app to add caching functionality? This whole thing smells of suck.

Justifying FS-Cache

Posted Dec 25, 2008 18:56 UTC (Thu) by pkern (subscriber, #32883) [Link]

Well, the obvious point that they could implement caching themselves aside: is there some sane way to speed up things like /home on NFS with a local disk?

Of course one could switch to AFS, where a local cache is mandatory, but that might produce weird bugs resulting from non-POSIX behaviour (e.g. problems with locking). But I somehow fail to see production-ready alternatives for shared home directories. What you usually can get are cluster filesystems on some shared storage device, which are not really suitable for the one-server, many-clients case, where each client node could be turned off at any time.

Something like FS-Cache could really reduce network traffic and the burden on the server here, especially because the clients in my case are equipped with an otherwise mostly unused hard disk. (The root filesystem is currently rsync'ed onto them, which still leaves something like 100G per client unused.) I would be glad for any other suggestion, though.

Justifying FS-Cache

Posted Dec 26, 2008 13:32 UTC (Fri) by vonbrand (subscriber, #4458) [Link]

The problem here is precisely the non-POSIX behaviour. The non-POSIXness of NFS is barely bearable; that of a "perhaps you get the offline version, perhaps the online one" scheme is madness.

Justifying FS-Cache

Posted Dec 27, 2008 5:54 UTC (Sat) by csamuel (✭ supporter ✭, #2624) [Link]

Well, the obvious point that they could implement caching themselves aside: is there some sane way to speed up things like /home on NFS with a local disk?

NFSv4 already has the concept of file delegations (in fact, 4.1 includes read-only directory delegations too), where a client that opens a file which isn't being accessed by other systems can be granted a delegation to operate on that file locally, either committing the final changes to the server when it's done, or having the server recall the delegation if another client requests access.

So if you are running an HPC cluster, for instance, and a user accidentally runs a code in their home directory (yes, it does happen, sadly) that uses a lot of temporary files, then ideally the server will be able to delegate access to the client so that none of that I/O needs to go over NFS, as the sketch below illustrates.
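
The grant-and-recall dance is easier to see in a toy model than in prose. The following is purely illustrative pseudologic, not how the Linux NFS client or server is actually implemented, and it models only read delegations:

    # Toy model of NFSv4 read-delegation logic; illustration only.
    class Client:
        def __init__(self, name):
            self.name = name

        def recall(self, path):
            # CB_RECALL: flush locally cached state and return the delegation
            print("%s: flushing and returning delegation on %s" % (self.name, path))

    class Server:
        def __init__(self):
            self.delegations = {}        # path -> client holding a read delegation

        def open(self, client, path, write=False):
            holder = self.delegations.get(path)
            if holder and holder is not client and write:
                holder.recall(path)      # conflicting open forces a recall
                del self.delegations[path]
            if not write and path not in self.delegations:
                self.delegations[path] = client
                return "delegated"       # client may now do this file's I/O locally
            return "normal"              # every operation goes over the wire

    server = Server()
    a, b = Client("node1"), Client("node2")
    print(server.open(a, "/home/u/tmp.dat"))              # -> delegated
    print(server.open(b, "/home/u/tmp.dat", write=True))  # recall, then -> normal

A real server can also grant write delegations and must cope with lease expiry and unreachable clients, but the conflicting-open recall is the core of it.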

Justifying FS-Cache

Posted Dec 26, 2008 22:56 UTC (Fri) by nix (subscriber, #2304) [Link]

I'm using it to allow my desktop box to boot from NFS, with (almost) no local storage other than the FS-Cache cache: everything lives on a remote system, because that happens to be a five-disk RAID-6 (md) array that I trust my data to more than a single-disk desktop; but I want that desktop to be local-disk fast nearly all the time.

Think of it as a poor man's hierarchical storage.

It rocks. (And it saved my bacon when I had a disk failure half a year ago.) :)

Not only NFS

Posted Dec 26, 2008 21:16 UTC (Fri) by zdzichu (subscriber, #17118) [Link]

The beautiful example of this second-level caching has been available in another operating system for over a year. Imagine caching content from a local HDD on an insanely fast (hundreds of MB/s) Intel SSD. Using FS-Cache, all Linux filesystems could gain the feature known as L2ARC in ZFS.
(The second interesting ZFS feature, the separate intent log, we already have in the form of the external journal device in ext3/4.)
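
For reference, that external journal is set up at mkfs time, roughly like this (device names here are placeholders, and the journal device must use the same block size as the filesystem):

    mke2fs -O journal_dev /dev/ssd1            # turn the fast device into a journal device
    mke2fs -j -J device=/dev/ssd1 /dev/sda1    # create ext3 with its journal on the SSD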

Not only NFS

Posted Dec 28, 2008 17:47 UTC (Sun) by oak (guest, #2786) [Link]

> Imagine caching content from local HDD in insanely fast (hundreds of MB/s) Intel SSD.

Although Flash is fast to read, it's slow to write. What if the user is mostly streaming some huge media files she listens to or views once (a month or a year)?

Not only NFS

Posted Dec 29, 2008 2:16 UTC (Mon) by giraffedata (subscriber, #1954) [Link]

Although Flash is fast to read, it's slow to write. What if the user is mostly streaming some huge media files she listens to or views once (a month or a year)?

She doesn't use FS-Cache.

That the FS-Cache configuration doesn't help this user doesn't mean it doesn't help anyone.

Not only NFS

Posted Dec 29, 2008 13:51 UTC (Mon) by dag- (subscriber, #30207) [Link]

Well, it's not really that slow, even if writing is slower than reading. Here are the stats for my Intel 80GB SSD (on a RHEL5 2.6.18 kernel) with 2 threads:

Sequential writes: 57MB/sec
Sequential reads: 170MB/sec
Random reads: 199MB/sec
Mixed workload (70%R, 30%W): 69MB/sec
Random writes: 72MB/sec

And with 16 threads:

Sequential writes: 48MB/sec
Sequential reads: 208MB/sec
Random reads: 174MB/sec
Mixed workload (70%R, 30%W): 144MB/sec
Random writes: 40MB/sec

Granted, with 16 threads the results are less flattering. But for a single disk, it is not slow compared to other laptop disks.

Not only NFS

Posted Dec 30, 2008 23:00 UTC (Tue) by zdzichu (subscriber, #17118) [Link]

SSD slow at write? A single X25-E can write at 170MB/s, and Micron's RealSSD tops out at 250MB/s. Show me a traditional disk capable of that. And those are general-purpose SSDs! I don't know how fast the write-optimized flash used in hybrid-storage NASes, like Sun's Storage 7000 line, might be.

There are some awesome benchmark numbers coming from systems running ZFS with an external log and a second-level cache. They come mostly from the extreme number of IOPS delivered by flash, but they are interesting anyway. Just search for L2ARC and slog (separate intent log) results.

Not only NFS

Posted Jan 31, 2009 14:12 UTC (Sat) by zdzichu (subscriber, #17118) [Link]

For the record, this shows how external caching on fast SSD helps performance: http://blogs.sun.com/brendan/entry/l2arc_screenshots
Be aware that warming up the cache takes hours.

Justifying FS-Cache

Posted Jan 8, 2009 11:45 UTC (Thu) by rythie (guest, #56003) [Link]

If the version of FS-Cache in RHEL5 is anything to go by, it seems far from ready for prime time: it panicked quite regularly when I tried it.

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds