
XFS: the filesystem of the future?

Posted Jan 31, 2012 17:36 UTC (Tue) by Cato (subscriber, #7643)
In reply to: XFS: the filesystem of the future? by dgc
Parent article: XFS: the filesystem of the future?

Do you have evidence that all "consumer storage" devices violate write ordering guarantees, or simply don't flush pending writes on request?

As far as I can tell, some consumer drives do lie about when writes have been flushed, and write-back caching is the default anyway.

Some relevant links:

http://serverfault.com/questions/15404/sata-disks-that-ha...

http://brad.livejournal.com/2116715.html - disk testing tool

https://lwn.net/Articles/351521/



XFS: the filesystem of the future?

Posted Jan 31, 2012 19:30 UTC (Tue) by dlang (✭ supporter ✭, #313)

I don't think anyone is meaning to say that all consumer storage devices are broken, but I also don't think that there's much dispute that some are.

XFS: the filesystem of the future?

Posted Feb 2, 2012 21:19 UTC (Thu) by dgc (subscriber, #6611)

> Do you have evidence that all "consumer storage" devices violate write
> ordering guarantees,

Any device with a volatile write cache tells the OS that a write IO has completed before the data is actually on stable storage. IO completion is supposed to mean "the IO is complete", so a device with a volatile write cache is effectively lying to the OS about the completion status of the IO: the write is not yet on stable storage. Pretty much all consumer devices ship with a volatile write cache enabled by default for performance reasons.

Barriers and cache flushes were introduced to give filesystems a mechanism to force such drives to order writes correctly, the way the filesystem needs them ordered. The original barrier mechanism was "cache flush, write, cache flush", and depending on the workload it could make the drive slower than not caching in the first place. More recently we just use the FUA mechanism if the drive supports it, and that has negligible performance overhead.
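
The ordering problem can be illustrated with a toy model. The sketch below is a hypothetical simulation (the `VolatileCacheDisk` class and the journal/commit scenario are illustrative assumptions, not a kernel or drive API): a plain write "completes" once it is in the volatile cache, a cache flush makes everything previously completed stable, and a FUA write is stable by the time it completes. It assumes the newer scheme replaces the trailing flush of the old "cache flush, write, cache flush" sequence with a FUA write of the commit record.

```python
from dataclasses import dataclass, field


@dataclass
class VolatileCacheDisk:
    """Hypothetical toy model of a drive with a volatile write cache."""
    stable: dict = field(default_factory=dict)  # block -> data on the media
    cache: dict = field(default_factory=dict)   # block -> data only in RAM

    def write(self, block, data):
        # Completion is reported as soon as the data is in the volatile cache.
        self.cache[block] = data

    def write_fua(self, block, data):
        # Forced Unit Access: this write is on stable media when it completes.
        # (Toy model: other cached blocks are left untouched.)
        self.cache.pop(block, None)
        self.stable[block] = data

    def flush(self):
        # Cache flush: every previously completed write becomes stable.
        self.stable.update(self.cache)
        self.cache.clear()

    def destage(self, block):
        # The drive may write back any cached block, in any order it likes.
        self.stable[block] = self.cache.pop(block)

    def power_loss(self):
        # Anything still only in the cache is lost; return what survived.
        self.cache.clear()
        return dict(self.stable)


def commit_without_barrier(disk):
    disk.write("journal", "txn-1 data")
    disk.write("commit", "txn-1 commit")
    disk.destage("commit")  # drive happens to write the commit record first
    return disk.power_loss()


def commit_with_old_barrier(disk):
    disk.write("journal", "txn-1 data")
    disk.flush()                          # cache flush
    disk.write("commit", "txn-1 commit")  # write
    disk.flush()                          # cache flush
    return disk.power_loss()


def commit_with_flush_and_fua(disk):
    disk.write("journal", "txn-1 data")
    disk.flush()                              # journal data now stable
    disk.write_fua("commit", "txn-1 commit")  # stable on completion
    return disk.power_loss()


def consistent(media):
    # A commit record on the media without its journal data is corruption.
    return "commit" not in media or "journal" in media
```

Without any flush the drive is free to destage the commit record before the journal data, so a power cut leaves a commit record describing data that never reached the media. Both barrier variants rule that out, provided the drive really honours flush and FUA completion.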

> As far as I can tell, some consumer drives do lie about when writes
> have been flushed, and write back caching is the default anyway

If drives lie about cache flush or FUA completion on volatile writeback caches, then that's a bug in the disk firmware.

FWIW, the difference with server storage (SAS drives) is that most ship with the volatile write cache turned off by default. They don't need it for performance because the SCSI/SAS protocol is much more efficient than SATA and so in most cases a write cache isn't necessary. You can turn it on, but you don't need to to reach full disk performance....

Indeed, it's not just filesystems that don't like volatile write caches. If you turn on volatile write caching on disks behind a RAID controller, the disks will violate the write ordering guarantees that the RAID controller relies on to maintain data safety (exactly as they do for filesystems). You still lose data or corrupt filesystems on power loss in this case, even though the OS and RAID controller are behaving correctly.

Dave.

XFS: the filesystem of the future?

Posted Feb 3, 2012 4:34 UTC (Fri) by raven667 (subscriber, #5198)

> They don't need it for performance because the SCSI/SAS protocol is much more efficient than SATA and so in most cases a write cache isn't necessary.

I think you are correct on every other point, but I don't think this one is right. SATA is pretty much the SCSI protocol, as is SAS; they are only slightly incompatible, for marketing rather than technical reasons. The big performance difference historically between consumer (IDE) and enterprise (SCSI) drives was tagged command queuing, which is now very common in SATA drives as well, although it wasn't so common in early SATA implementations. A tagged command queue allows the drive to implement an elevator, which is a big win over a naive implementation without one.
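
The elevator benefit can be sketched with a toy seek model. This is a minimal illustration, assuming seek cost is simply the distance in abstract track units and ignoring rotational latency; the function names and the example queue are hypothetical. Without a command queue the drive has to service requests in arrival order; with one it can reorder them, here using a simple SCAN-style sweep.

```python
def total_travel(start, order):
    """Sum of head movement (abstract track units) to service `order`."""
    travel, pos = 0, start
    for track in order:
        travel += abs(track - pos)
        pos = track
    return travel


def fifo(queue):
    # No command queue: requests are serviced strictly in arrival order.
    return list(queue)


def elevator(start, queue):
    # SCAN-style elevator: sweep upward from the head position,
    # then service the remaining requests on the way back down.
    up = sorted(t for t in queue if t >= start)
    down = sorted((t for t in queue if t < start), reverse=True)
    return up + down
```

For a head at track 50 and a queue of [90, 10, 80, 20, 70], arrival order costs 300 units of head travel, while the sweep (70, 80, 90, then 20, 10) costs 120.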

XFS: the filesystem of the future?

Posted Feb 3, 2012 12:35 UTC (Fri) by Jonno (subscriber, #49613)

Actually, SATA (Serial ATA) uses a slightly extended ATA command set over a serial bus, while SAS (Serial Attached SCSI) uses the SCSI command set over the same serial bus.

The SCSI command set is generally considered "better" than the ATA command set, though the difference isn't quite as large as the grandparent suggests. Write caches still benefit SCSI (including SAS) performance, but the difference is not as large with SCSI as with ATA. That, together with the fact that the average enterprise customer is more concerned about reliability than the average home user, is why most SAS drives have the write cache disabled by default, while most SATA drives have it enabled by default.

XFS: the filesystem of the future?

Posted Feb 3, 2012 17:30 UTC (Fri) by raven667 (subscriber, #5198)

The "extended ATA" command set that's used on SATA devices not operating in Legacy IDE mode is the SCSI command set. This goes all the way back to ATAPI, which is the SCSI command set encapsulated in the IDE bus protocol. Drives and controllers are capable of speaking either the SATA-II or the SAS protocol without any cost difference AFAICT, but don't, for largely marketing reasons rather than engineering ones. As I was saying before, having a command queue on the drive allows the drive to have an IO elevator, which is _the_ big performance win; details such as how commands are named are not really an important factor.

scsi misinformation

Posted Feb 3, 2012 14:08 UTC (Fri) by quanstro (guest, #77996)

the argument here seems to be circular. sas is faster
because it's sas.

sata and sas send the same data in the same size fises/frames
to the drive. neither is wire-speed limited. they're spin/seek
limited; physics limited.

could you please explain the mechanism whereby sas is going to
be faster than sata?

scsi misinformation

Posted Feb 3, 2012 19:00 UTC (Fri) by raven667 (subscriber, #5198)

> the argument here seems to be circular. sas is faster because it's sas

That's the power of branding, replacing rational thought with mental shortcuts which put things in "good" or "bad" boxes.

XFS: the filesystem of the future?

Posted Feb 4, 2012 13:00 UTC (Sat) by zomonto (guest, #82108)

> The original barrier mechanism was "cache flush, write, cache flush" and could make the drive slower than not caching in the first place depending on the workload. More recently we just use the FUA mechanism if the drive supports that, and that has neglible performance overhead.

No, libata disables FUA by default. You can enable it with the
libata.fua=1 kernel parameter, though.

XFS: the filesystem of the future?

Posted Feb 8, 2012 13:09 UTC (Wed) by yungchin (guest, #72949)

> Pretty much all consumer devices ship with a volatile write cache enabled by default for performance reasons.

Dave, I was wondering: given the optimisations you discussed in the talk, where there's now lots of merging and reordering going on before requests are sent to the i/o scheduler, do you still expect much performance improvement from these hardware caches? (Or, and that's of course the hidden question here, should we from now on happily disable them, at least for most use cases?) Thanks.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds