I've been using XFS for 15 years and currently manage 250 systems in the 10-200 TB range. I've lost files once, to a known XFS bug, and though I hear here and there about "XFS zeroing files", I have never encountered the problem myself.
I'm afraid it's in the same ballpark as people who have 3 hard drives and say "OMG this brand sucks because one of my drives failed". I have several thousand spinning hard drives under my care, and I think I have considerably better statistics on what works and what doesn't. The same goes for filesystems: how many filesystems do you manage? If it's fewer than a couple of hundred, the fact that you encountered a particular problem with a particular filesystem once doesn't carry much significance.
Posted Jan 23, 2012 16:25 UTC (Mon) by rfunk (subscriber, #4054)
I said I was talking about non-server contexts. XFS was designed for server rooms, where the power never goes out. It sounds like that's where you're using it.
XFS: the filesystem of the future?
Posted Jan 23, 2012 18:40 UTC (Mon) by wazoox (subscriber, #69624)
Well, I use it on all of my desktop systems too, and my work desktop has an uptime of 176 days. I don't reboot often :)
XFS: the filesystem of the future?
Posted Jan 29, 2012 3:47 UTC (Sun) by sbergman27 (subscriber, #10767)
"Well, I use it on all of my desktop systems too, and my work desktop has an uptime of 176 days. I don't reboot often :)"
I hope you don't connect to the Internet with that security-hole-ridden kernel you're running. You should reboot after kernel updates.
XFS: the filesystem of the future?
Posted Jan 30, 2012 8:03 UTC (Mon) by youareretarded (guest, #82640)
are you retarded? nobody gets exploited by missing a kernel update? it sounds like you don't know shit about the kernel if you think someone is going to be remotely exploited through a kernel bug introduced in the last 180 days. it's always the software, never the kernel.
XFS: the filesystem of the future?
Posted Jan 23, 2012 21:46 UTC (Mon) by dgc (subscriber, #6611)
Actually, XFS was designed for storage subsystems that didn't lie to it, not "server storage". You can say exactly the same for ext3, ext4, btrfs, etc.
"Consumer storage" has volatile write caches, and so violates the write ordering guarantees that these filesystems require for journal recovery to work. That's why we have write barriers and use them by default on these filesystems these days. XFS was the first filesystem to enable them by default, which is another reason it was always slower on metadata-intensive workloads than ext3/4.
"Server storage" doesn't violate write ordering in an effort to improve performance, so XFS has always worked fine and performed well on that class of storage.
Dave.
XFS: the filesystem of the future?
Posted Jan 31, 2012 17:36 UTC (Tue) by Cato (subscriber, #7643)
Do you have evidence that all "consumer storage" devices violate write ordering guarantees, or simply don't flush pending writes on request?
As far as I can tell, some consumer drives do lie about when writes have been flushed, and write-back caching is the default anyway.
Posted Jan 31, 2012 19:30 UTC (Tue) by dlang (✭ supporter ✭, #313)
I don't think anyone means to say that all consumer storage devices are broken, but I also don't think that there's much dispute that some are.
XFS: the filesystem of the future?
Posted Feb 2, 2012 21:19 UTC (Thu) by dgc (subscriber, #6611)
> Do you have evidence that all "consumer storage" devices violate write
> ordering guarantees,
Any device with a volatile write cache tells the OS that the write IO has been completed before it is actually written to stable storage. IO completion is supposed to mean "the IO is complete", so any device with a volatile write cache is "lying" to the OS about the completion status of the IO: the write is not yet on stable storage. Pretty much all consumer devices ship with a volatile write cache enabled by default for performance reasons.
Barriers and cache flushes were introduced to provide a mechanism that allows filesystems to force such drives to order writes correctly, the way the filesystem wants. The original barrier mechanism was "cache flush, write, cache flush" and could make the drive slower than not caching in the first place, depending on the workload. More recently we just use the FUA mechanism if the drive supports that, and that has negligible performance overhead.
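To make the two mechanisms concrete, here is a minimal sketch using a toy drive model. Everything in it (`ToyDrive`, `commit_with_flushes`, `commit_with_fua`) is a hypothetical name made up for illustration, not kernel code. It only models acknowledged writes being lost from a volatile cache, not reordering inside the drive, and the single flush issued before the FUA write is an assumption about how the log is kept ahead of the commit record.

```python
class ToyDrive:
    """Hypothetical model of a disk with a volatile write-back cache."""

    def __init__(self):
        self.cache = {}    # acknowledged writes that are still only in RAM
        self.stable = {}   # what is actually on stable storage

    def write(self, block, data, fua=False):
        """Acknowledge a write; only a FUA write reaches stable storage first."""
        if fua:
            self.cache.pop(block, None)   # drop any stale cached copy
            self.stable[block] = data
        else:
            self.cache[block] = data

    def flush(self):
        """Cache flush: push every cached write to stable storage."""
        self.stable.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        """Drop the volatile cache and return what survived."""
        self.cache.clear()
        return dict(self.stable)


def commit_with_flushes(drive, log_blocks, commit_record):
    """Original barrier style: cache flush, write, cache flush."""
    for block, data in log_blocks:
        drive.write(block, data)
    drive.flush()                  # log is stable before the commit record
    drive.write(*commit_record)
    drive.flush()                  # commit record is stable before returning


def commit_with_fua(drive, log_blocks, commit_record):
    """FUA style (assumed): one flush for the log, then a forced write."""
    for block, data in log_blocks:
        drive.write(block, data)
    drive.flush()
    block, data = commit_record
    drive.write(block, data, fua=True)
```

In this model a plain `write()` followed by `power_loss()` leaves nothing on stable storage even though the write was acknowledged, while either commit function leaves both the log blocks and the commit record intact.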
> As far as I can tell, some consumer drives do lie about when writes
> have been flushed, and write back caching is the default anyway
If drives lie about cache flush or FUA completion on volatile writeback caches, then that's a bug in the disk firmware.
FWIW, the difference with server storage (SAS drives) is that most ship with the volatile write cache turned off by default. They don't need it for performance because the SCSI/SAS protocol is much more efficient than SATA and so in most cases a write cache isn't necessary. You can turn it on, but you don't need to to reach full disk performance....
Indeed, it's not just filesystems that don't like volatile write caches. If you turn on volatile write caching on disks behind a RAID controller, the disk will now violate the write ordering guarantees that the RAID controller relies on to maintain data safety (exactly the same as for filesystems). You still lose data or corrupt filesystems on power loss in this case, even though the OS and RAID controller are behaving correctly.
Dave.
XFS: the filesystem of the future?
Posted Feb 3, 2012 4:34 UTC (Fri) by raven667 (subscriber, #5198)
> They don't need it for performance because the SCSI/SAS protocol is much more efficient than SATA and so in most cases a write cache isn't necessary.
I think you are correct on every other point, but I don't think this is right. SATA is pretty much the SCSI protocol, as is SAS; they are only slightly incompatible, for marketing rather than technical reasons. The big performance difference historically between consumer (IDE) and enterprise (SCSI) drives was tagged command queuing, which is now very common in SATA drives as well, although it wasn't so common in early SATA implementations. A tagged command queue allows the drive to implement an elevator, which is a big win over a naive implementation without one.
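A rough sketch of why a queue helps, assuming a toy model where cost is just head travel between block numbers (real drives also account for rotational position, which is ignored here). The function names are made up for illustration.

```python
def head_travel(start, order):
    """Total distance the head moves when servicing requests in this order."""
    position, total = start, 0
    for block in order:
        total += abs(block - position)
        position = block
    return total


def elevator_order(start, requests):
    """One upward sweep from the current position, then one downward sweep."""
    up = sorted(b for b in requests if b >= start)
    down = sorted((b for b in requests if b < start), reverse=True)
    return up + down


queue = [95, 10, 60, 5, 80]                            # arrival order
fifo = head_travel(50, queue)                          # 310: one at a time
swept = head_travel(50, elevator_order(50, queue))     # 135: queued and sorted
```

Without a queue the drive only ever sees one command, so it has nothing to sort; with one, the same five requests cost less than half the head movement in this example.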
XFS: the filesystem of the future?
Posted Feb 3, 2012 12:35 UTC (Fri) by Jonno (subscriber, #49613)
Actually, SATA (Serial ATA) uses a slightly extended ATA command set over a serial bus, while SAS (Serial Attached SCSI) uses the SCSI command set over the same serial bus.
The SCSI command set is generally considered "better" than the ATA command set, though the difference isn't quite as large as the grandparent suggests. Write caches are still beneficial for SCSI (including SAS) performance, but the difference is not quite as large with SCSI as with ATA. That, together with the fact that the average enterprise customer is more concerned about reliability than the average home user, is the reason that most SAS drives have write cache disabled by default, while most SATA drives have write cache enabled by default.
XFS: the filesystem of the future?
Posted Feb 3, 2012 17:30 UTC (Fri) by raven667 (subscriber, #5198)
The "extended ATA" command set that's used on SATA devices not operating in Legacy IDE mode is the SCSI command set. This goes all the way back to ATAPI, which is the SCSI command set encapsulated in the IDE bus protocol. Drives and controllers are capable of speaking either SATA-II or SAS protocols without any cost difference AFAICT, but don't, for marketing reasons rather than engineering ones. As I was saying before, having a command queue on the drive allows the drive to have an IO elevator, which is _the_ big performance win; details about how commands are named and whatnot are not really an important factor.
scsi misinformation
Posted Feb 3, 2012 14:08 UTC (Fri) by quanstro (guest, #77996)
the argument here seems to be circular. sas is faster
because it's sas.
sata and sas send the same data in the same size fises/frames
to the drive. neither is wire-speed limited. they're spin/seek
limited; physics limited.
could you please explain the mechanism whereby sas is going to
be faster than sata?
scsi misinformation
Posted Feb 3, 2012 19:00 UTC (Fri) by raven667 (subscriber, #5198)
> the argument here seems to be circular. sas is faster because it's sas
That's the power of branding, replacing rational thought with mental shortcuts which put things in "good" or "bad" boxes.
XFS: the filesystem of the future?
Posted Feb 4, 2012 13:00 UTC (Sat) by zomonto (guest, #82108)
> The original barrier mechanism was "cache flush, write, cache flush" and could make the drive slower than not caching in the first place depending on the workload. More recently we just use the FUA mechanism if the drive supports that, and that has neglible performance overhead.
No, libata always disables FUA by default. You can enable it with a
kernel parameter though.
XFS: the filesystem of the future?
Posted Feb 8, 2012 13:09 UTC (Wed) by yungchin (guest, #72949)
> Pretty much all consumer devices ship with a volatile write cache enabled by default for performance reasons.
Dave, I was wondering: given the optimisations you discussed in the talk, where there's now lots of merging and reordering going on before things are sent to the i/o scheduler, do you still expect much performance improvement from these hardware caches? (Or, and that's of course the hidden question here, should we from now on happily disable them, at least for most use cases?) Thanks.
XFS: the filesystem of the future?
Posted Feb 1, 2012 11:33 UTC (Wed) by Cato (subscriber, #7643)
Maybe I didn't quite get your point - would be good to understand exactly what write ordering guarantees are provided by server storage but not consumer storage. Is this just that a consumer hard drive's write cache will reorder writes without respecting the kernel's write barriers?
XFS: the filesystem of the future?
Posted Feb 1, 2012 19:28 UTC (Wed) by dlang (✭ supporter ✭, #313)
some consumer drives lie about when the data has actually been written to disk (making write barriers ineffective); in those cases the OS will send more writes to the drive and the drive will go ahead and re-order them with the other writes that are in its buffer.
XFS: the filesystem of the future?
Posted Feb 1, 2012 20:13 UTC (Wed) by raven667 (subscriber, #5198)
And as someone else pointed out drives these days seem to have up to 64MB write buffers so that could be a lot of corruption if that data goes missing in-flight when the OS was told that it was permanently committed.
XFS: the filesystem of the future?
Posted Feb 1, 2012 22:20 UTC (Wed) by magila (subscriber, #49627)
While there's been a lot of talk about consumer devices "lying" in this thread, they really don't behave any differently from server drives with the same cache settings. Both consumer and server drives provide an option to turn on write caching. Both will lie about writes completing when write caching is enabled. Both will only signal command completion when data has gone to disk if write caching is disabled[1]. The only real difference is the default: most (but not all) enterprise drives ship with write cache disabled, while all consumer drives ship with it enabled. Both give the option of changing the setting to whatever the user pleases.
[1] I vaguely remember hearing a story several years ago that a handful of ATA drive models were not respecting write cache settings. This was an isolated incident. Newer drives can reasonably be assumed to handle write caching correctly.
XFS: the filesystem of the future?
Posted Feb 2, 2012 1:39 UTC (Thu) by dlang (✭ supporter ✭, #313)
when I talk about a drive lying, I'm not talking about normal write caching, I'm talking about it either not respecting write cache settings, or lying about data integrity commands that are supposed to work even in the face of write caching (cache flush commands for example)
most consumer drives don't have these problems, but a few have been found to have them.
unfortunately you cannot just assume that newer drives will not have the problem. On the database mailing lists you see a couple of drive models every year where someone runs across the problem yet again.
XFS: the filesystem of the future?
Posted Feb 2, 2012 2:40 UTC (Thu) by magila (subscriber, #49627)
"On the database mailing lists you see a couple drive models every year where someone runs across the problem yet again."
I'd be rather surprised if that were the case. The code that handles cache flushing isn't something which usually changes between models. If a manufacturer's firmware had a bug in that area I'd expect to see it across the board, not just randomly popping up periodically on different SKUs.
XFS: the filesystem of the future?
Posted Feb 2, 2012 13:11 UTC (Thu) by cladisch (✭ supporter ✭, #50193)
> The code that handles cache flushing isn't something which usually changes between models. If a manufacturer's firmware had a bug …
You won't get any manufacturer to admit it, but this is not a bug, it's a feature (to get higher benchmark numbers).
XFS: the filesystem of the future?
Posted Feb 2, 2012 17:38 UTC (Thu) by magila (subscriber, #49627)
You might not believe it, but I can say based on first-hand experience that hard drive manufacturers take data integrity very seriously. None of them would risk losing customer data just to gain extra performance. The potential backlash from data loss would be far worse than scoring lower on a benchmark.
Plus the people running benchmarks, especially for tier 1 OEMs, aren't stupid. Lying about cache flushes is pretty easy to detect, so the likelihood of getting away with it is pretty low right from the start. Pissing off OEMs is another thing hard drive manufacturers would never, ever take risks with.
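One common way to check for this, sketched below under some loud assumptions: that `fsync()` on the filesystem in question really does issue a cache flush to the drive, that the device is a spinning disk, and that each synchronous overwrite of the same block costs roughly one platter rotation. The function names and the `rpm / 60` threshold are illustrative, not an established tool.

```python
import os
import time


def fsync_rate(path, count=50):
    """Overwrite the same few bytes `count` times, fsync each, return ops/sec."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for i in range(count):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, b"%08d" % i)
            os.fsync(fd)               # should not return until data is stable
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / max(elapsed, 1e-9)


def max_plausible_rate(rpm):
    """Assumed ceiling: about one synchronous overwrite per platter rotation."""
    return rpm / 60.0
```

If `fsync_rate()` on a 7200 RPM disk comes back far above `max_plausible_rate(7200)`, which is 120 per second, something between the application and the platter is acknowledging flushes early. SSDs and controllers with battery-backed caches will legitimately exceed that figure, so the number only means something when you know what hardware is underneath.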