This article is timely for me as I just brought up a new Scientific Linux 6.2 (RHEL clone) server last week that uses XFS for most of the storage. This is my first new attempt to use XFS in years.
At one point, I tried hard to make a go of XFS, but I always ran into compatibility issues that made me regret it. Thankfully, I never had any of the reliability problems that XFS used to suffer from when running on commodity (unreliable) hardware. It feels, though, as if XFS on Linux has finally grown up.
SGI Altix customers, like NASA, run XFS systems in the hundreds of terabytes (although I am sure some of those are CXFS). Also, XFS is a fully supported filesystem in RHEL6 (including xfsprogs). My understanding is that Red Hat now employs the majority of the XFS developers.
Filesystem of the future? Btrfs and ZFS are more feature-rich, although adding LVM2 and mdraid to XFS closes the gap. Of course, even that setup lacks deduplication. That said, given the performance and current stability of XFS, perhaps it is the right filesystem for today.
Posted Jan 21, 2012 10:45 UTC (Sat) by drag (subscriber, #31333)
dedupe is overrated.
From what I understand of Solaris, in order to have an efficient dedupe you need to be able to maintain a table of the 'deduped' items in RAM. That way, when you need to access a file, the filesystem knows where the actual bits are located without having to look them up on disk. Something like that.
That does not seem to be a significant advantage, as in many situations 5GB of RAM is considerably more expensive than 1TB of disk space.
The amount of RAM required to keep ZFS happy can be staggering sometimes.
However, the kick-ass things that more modern filesystems bring are things like online compression, checksumming, RAID-like features, and easy subvolumes. That sort of thing is very nice to have from an administrative, integrity, and availability viewpoint...
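To make the "table in RAM" point concrete, here is a minimal sketch of block-level dedupe. It is a toy, not how ZFS does it: the 4096-byte block size, the SHA-256 digest, and the names DedupStore and table_entries are all assumptions made for illustration. What it does show is that the table gets one entry per unique block, so it grows with the amount of unique data stored.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


class DedupStore:
    """Toy block store: identical blocks are stored once and shared."""

    def __init__(self):
        self.table = {}   # block digest -> index into self.blocks (the "dedup table")
        self.blocks = []  # physical blocks, in the order they were first written
        self.files = {}   # file name -> list of physical block indexes

    def write(self, name, data):
        refs = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            idx = self.table.get(digest)
            if idx is None:
                # First time this content is seen: store it and remember where.
                idx = len(self.blocks)
                self.blocks.append(block)
                self.table[digest] = idx
            refs.append(idx)
        self.files[name] = refs

    def read(self, name):
        return b"".join(self.blocks[i] for i in self.files[name])


def table_entries(stored_bytes, block_size=BLOCK_SIZE):
    """Number of table entries needed if every stored block is unique."""
    return -(-stored_bytes // block_size)  # ceiling division
```

With these assumed numbers, table_entries(1 << 40) says that 1TB of unique 4096-byte blocks needs 268435456 entries, whatever each entry happens to cost in a real implementation.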
XFS: the filesystem of the future?
Posted Jan 21, 2012 18:13 UTC (Sat) by jmalcolm (guest, #8876)
Indeed. I have also read that deduplication on ZFS is 'broken' but I did not have a link to back that up. Still, technology has a way of improving such that 'resource usage' and maturity concerns of today become unimportant in 'the future'.
I agree that snapshotting/cloning are exciting features of systems like ZFS. The 'time-slider' that Sun added to Nautilus in OpenSolaris evoked quite a lot of jealousy in me. I also thought that Nexenta integrating ZFS into 'apt-get' with 'apt-clone' was simply brilliant. (Note: on Ubuntu, 'apt-clone' is something else.)
As a developer, I sometimes do silly things like build or install a bleeding-edge version of an important library, which I later regret. I would love to have a simple and seamless way to roll back the clock or easily hit the save button just before I do something stupid. Version control is great for code repositories, but it does not really help me when I mess up my filesystem or install a broken version of an IDE. Not that I do those kinds of things, of course...
XFS: the filesystem of the future?
Posted Jan 26, 2012 11:53 UTC (Thu) by jospoortvliet (subscriber, #33164)
Actually, openSUSE and SLE do this using btrfs. It's built into the zypper package manager and additionally has command-line (snapper) and GUI (in YaST) interfaces.
Based on btrfs, a time-slider in a GUI file manager would be possible too, I'm sure, either using btrfs directly or as a GUI to snapper (but that'd be (open)SUSE-specific unless other distros pick up on snapper).
XFS: the filesystem of the future?
Posted Feb 7, 2012 1:38 UTC (Tue) by jmalcolm (guest, #8876)
I did not know that about SUSE. Thanks.
XFS: the filesystem of the future?
Posted Jan 21, 2012 23:10 UTC (Sat) by cmccabe (guest, #60281)
Yeah, I agree that dedupe is overrated, for most applications.
Also keep in mind that the more compressed and de-duped your data is, the more likely it is that you'll lose data when there's a hardware problem. Some filesystems, like HDFS, actually write out the data three times or more, which is a kind of anti-deduplication.
XFS: the filesystem of the future?
Posted Jan 23, 2012 14:29 UTC (Mon) by jezuch (subscriber, #52988)
> Yeah, I agree that dedupe is overrated, for most applications.
On the other hand, cp --reflink is quite awesome.
> Also keep in mind that the more compressed and de-duped your data is, the more likely it is that you'll lose data when there's a hardware problem. Some filesystems, like HDFS, actually write out the data three times or more, which is a kind of anti-deduplication.
I guess that native RAID-ing in the filesystem is expected to offset this risk, in any "normal" situation at least.
XFS: the filesystem of the future?
Posted Jan 23, 2012 18:48 UTC (Mon) by martinfick (subscriber, #4455)
It seems unfair to say that dedupe is overrated if you are only basing this on a single implementation (ZFS). There are many ways to dedupe which do not suffer from the same RAM problem (COW comes to mind), and I suspect that many more will be implemented in the future.
Also, I suspect that you may not have considered that, while RAM is indeed expensive compared to disks, deduping files will actually save RAM if implemented properly, because a single file can be cached instead of many copies. Vserver unification, while not a full-featured dedup, does allow for this RAM saving, which can be huge in virtualised environments (and more).
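A rough sketch of that cache argument, under loud assumptions: the function cached_bytes and the paths in the usage note are made up for illustration, and a real page cache works on pages and inodes rather than whole files. The only point is the difference between keying a cache by path and keying it by content.

```python
import hashlib


def cached_bytes(files, dedup):
    """Bytes a toy cache would hold after reading every file once.

    files: dict mapping path -> bytes content.
    dedup: if True the cache is keyed by content digest, so identical
           files share one entry; if False it is keyed by path.
    """
    cache = {}
    for path, data in files.items():
        key = hashlib.sha256(data).digest() if dedup else path
        cache[key] = data
    return sum(len(v) for v in cache.values())
```

For example, ten hypothetical guests that each carry the same 4000-byte library under their own path cost 40000 bytes of cache when keyed by path and 4000 bytes when keyed by content.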
XFS: the filesystem of the future?
Posted Jan 24, 2012 21:42 UTC (Tue) by wazoox (subscriber, #69624)
Generally speaking, dedupe is trading CPU and RAM for storage space. It also has the serious drawback of turning many sequential IOs into random ones. It probably makes sense when your storage stack is horribly expensive, or when you really need to squeeze out some more bandwidth on a replicated system, etc. However, given current hard drive prices (even with the current 50% price hike) and subsystem performance (any 500-buck RAID card can do 1 GB/s), it's almost always a gain only for the vendor.
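The sequential-to-random effect can be illustrated with a few lines. This is only a sketch: count_seeks is a made-up helper, and the block numbers in the usage note are invented, not measurements. It counts how often a read of consecutive logical blocks has to jump to a non-adjacent physical block.

```python
def count_seeks(physical_blocks):
    """Count jumps in a read that visits physical_blocks in order.

    A jump is any step where the next physical block is not the one
    immediately following the previous block.
    """
    return sum(
        1
        for prev, cur in zip(physical_blocks, physical_blocks[1:])
        if cur != prev + 1
    )
```

A file laid out contiguously, say [100, 101, 102, 103, 104, 105], reads with no jumps. If dedupe has pointed some of its blocks at copies written earlier, the same logical read might visit [100, 7, 101, 42, 43, 102], which is four jumps.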
XFS: the filesystem of the future?
Posted Jan 24, 2012 22:36 UTC (Tue) by khim (subscriber, #9252)
On the other hand, dedupe is a pretty good fit for SSDs. SSDs are expensive (albeit less expensive than RAM) and seeks are not as important.
XFS: the filesystem of the future?
Posted Jan 25, 2012 7:48 UTC (Wed) by wazoox (subscriber, #69624)
That's true, but so far dedupe is mostly touted for secondary-level storage, so SSDs are a bit of a stretch.
XFS: the filesystem of the future?
Posted Jan 28, 2012 20:36 UTC (Sat) by robbe (guest, #16131)
> in many situations 5GB is considerably more expensive then 1TB of disk
> space.
How do you figure? That amount of memory sets me back less than double the cost of the disk space. But *only* if looking at ECC RAM versus cheap & big SATA storage. Go to SAS, as used in many servers, and the ratio is more like 1:1.
And that's not even considering RAID, where net capacity is not 100%.
Of course, in many environments, you need to get RAM and disks from your server vendor, and they mark up prices arbitrarily ... so your numbers can come out completely different.
XFS: the filesystem of the future?
Posted Jan 22, 2012 13:47 UTC (Sun) by dgc (subscriber, #6611)
> Filesystem of the future? Btrfs and ZFS are more feature rich although
> adding LVM2 and mdraid to XFS closes the gap. Of course, even that setup
> lacks deduplication.
I address this point in the presentation - XFS is not trying to replace BTRFS as XFS has a fundamentally different view of data to BTRFS and ZFS. That is, XFS does not "transform" user data (e.g. CRC, encrypt, compress or dedupe) as it passes through the filesystem. All XFS does is provide an extremely large pipe to move the data to/from the storage hardware to/from the application. There is no way we can scale to tens of GB/s data throughput if we have to run CPU based calculations on every piece of data that passes through it.
This is a fundamental limitation of filesystems like BTRFS and ZFS - they assume that there is CPU and memory available to burn for the transformations and that they scale arbitrarily well. If you are limited on your CPU or memory (e.g. your application is using it!) then hardware offload is the only way you can scale such data transforms. At that point, you may as well be using XFS.
i.e. BTRFS will only scale with all its features enabled up to a certain point, but there are already many people out there with much higher performance requirements than that cross-over point. It's above that cross-over point that I see XFS as "the filesystem of the future". Indeed, I expect the combination of BTRFS system/binary/home filesystems and XFS production data filesystems to become quite a common server configuration in the not too distant future....
Dave.
XFS: the filesystem of the future?
Posted Jan 23, 2012 14:24 UTC (Mon) by masoncl (subscriber, #47138)
Dave gave us (the btrfs list) a chance to optimize things for his runs a few weeks ago. We've got patches in hand that do make it much faster, but the biggest improvement is just using a larger btree block size. That lets us dramatically reduce the metadata required to track the extents.
XFS is putting out awesome numbers in these workloads, well done.
XFS: the filesystem of the future?
Posted Feb 2, 2012 12:21 UTC (Thu) by ArbitraryConstant (guest, #42725)
> This is a fundamental limitation of filesystems like BTRFS and ZFS - they assume that there is CPU and memory available to burn for the
> transformations and that they scale arbitrarily well. If you are limited on your CPU or memory (e.g. your application is using it!) then hardware offload
> is the only way you can scale such data transforms. At that point, you may as well be using XFS.
Doesn't that more or less depend on the storage/volume management though?
Doing thin provisioning on a SAN, many of the potential gains from btrfs have already been covered and there's no sense paying for them twice. Reasonable data integrity protection is available if appropriately configured. In that case, yes, xfs has a lot going for it.
But on local disk none of the options look that great. LVM can't do non-crap thin provisioning. In that case, if you selectively set nodatacow on btrfs and succeed in having it act like other filesystems, that's really a huge win. Nodatacow is useful for things like databases, which frequently (e.g. mysql/innodb) implement their own CRC, while filesystem CRC remains available for other applications on the same storage pool.
The btrfs guys seem pretty focused on fsck for now, but RAID5/6 and subvol/file-level RAID are in the works. There's no non-crap way to do this on local disk, certainly not in any way that's easy to resize or thin provision, but mixing RAID levels is no problem for a SAN.
You suggest xfs shouldn't be regarded as targeted towards big iron because its performance is relevant to current and future inexpensive hosts, but it still seems pretty specialized in that direction if inexpensive hosts need high end storage to get important features. Btrfs brings SAN functionality within the ambit of cheap local storage.
CPU to burn won't always be available, but we're getting a lot of cores these days, we're getting them cheaper than most storage solutions, and they're getting CRC acceleration instructions. I wouldn't be surprised if CRC performance on one of these 16+ thread CPUs was indistinguishable from memory bandwidth.
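For anyone who wants a rough number for their own box, here is a minimal single-threaded sketch using zlib.crc32 from the Python standard library. Caveats: crc32_throughput is a made-up helper, zlib's CRC-32 is not necessarily the polynomial or the hardware-accelerated path a filesystem would use, and a userspace loop over a zero-filled buffer that fits in cache says little about real IO. Treat the result as a ballpark only.

```python
import time
import zlib


def crc32_throughput(total_bytes=64 * 1024 * 1024, chunk_size=1024 * 1024):
    """Return measured zlib.crc32 throughput in bytes per second on one thread."""
    chunk = bytes(chunk_size)  # zero-filled buffer, reused for every call
    crc = 0
    done = 0
    start = time.perf_counter()
    while done < total_bytes:
        crc = zlib.crc32(chunk, crc)  # running CRC over the whole stream
        done += chunk_size
    elapsed = time.perf_counter() - start
    return done / elapsed
```

Dividing the result by 1e9 gives GB/s, which can then be compared against whatever memory bandwidth figure you trust for the same machine.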