That massive filesystem thread

Posted Apr 1, 2009 8:39 UTC (Wed) by nix (subscriber, #2304)
In reply to: That massive filesystem thread by bojan
Parent article: That massive filesystem thread

fsync() sucks not because it is a 'commit now' operation (that would be fbarrier()) but because it is a 'commit and force to disk now' operation.

Actually on many OSes it's a 'start a background force to disk now and return before it's done' operation; on Linux it's a 'lob it at the disk controller so it can cache it instead' operation. Still not necessarily useful (although that is changing to optionally emit a barrier to the disk controller too.)

(Speaking as the owner of an Areca RAID card with a quarter-gig of battery-backed cache, using non-RAIDed filesystems purely as an fs-cache storage area, I *like* the ability to turn off barriers: all they do is slow my system down with no reliability gain at all.)

That massive filesystem thread

Posted Apr 1, 2009 8:55 UTC (Wed) by bojan (subscriber, #14302)

By commit now, I meant force to disk. I think that was clear from the "disk being spun up unnecessarily" bit.

fbarrier vs. fsync

Posted Apr 1, 2009 12:02 UTC (Wed) by butlerm (guest, #13312)

"fbarrier(fd)" is not a "commit now" operation - that would make it
indistinguishable from fsync. It is a "commit data before metadata"
operation.

The real technical problem here is that from the application perspective,
the metadata update must take place immediately, i.e. before the system
call returns. However, from a recovery perspective, it is highly desirable
that the persistent metadata state not be committed until after the data
has been committed. Unless a filesystem maintains two versions of its
metadata (a la soft updates), that is an unusually difficult requirement to
meet without serious performance problems.
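
To make the ordering concrete: this is roughly the rename replacement
pattern as applications write it today, sketched in Python on the
assumption that fsync is the only portable way to get data before metadata.
The helper name replace_file is made up for illustration, and fbarrier is
hypothetical; it appears only in a comment marking where such a call would
sit.

```python
import os
import tempfile


def replace_file(path, data):
    """Replace `path` with `data` (bytes), forcing the data out first.

    Illustrative helper, not from any library.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # A hypothetical fbarrier(fd) would go here. Today the only
            # tool is fsync, which also forces the data to disk now.
            os.fsync(tmp.fileno())
        # Metadata update: the name switches to the new file.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The sketch does not fsync the containing directory, so it orders the data
before the rename but does not promise that the rename itself is durable.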

The alternative that I would really like to see is undo records for a few
critical operations like rename replacement, such that the physical data /
metadata ordering requirements are removed, and on recovery the filesystem
undoes rename replacements where the replacement data has not been
committed to disk. That replaces the ideal of point-in-time recovery with
the more practical ideal of consistent version recovery.
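
A toy in-memory model of the undo-record idea, in Python. This is only a
sketch of the recovery rule described above, not how any real filesystem
journals; ToyFS and all of its fields and methods are invented for the
example, and "inodes" are just integers.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFS:
    """Hypothetical model: names point at inode ids; data may lag behind."""
    names: dict = field(default_factory=dict)         # name -> inode id
    data_on_disk: set = field(default_factory=set)    # inodes whose data is committed
    undo_log: list = field(default_factory=list)      # (name, old_inode, new_inode)

    def rename_replace(self, name, new_inode):
        # The metadata change is visible at once; log how to undo it.
        old_inode = self.names.get(name)
        self.undo_log.append((name, old_inode, new_inode))
        self.names[name] = new_inode

    def commit_data(self, inode):
        # The inode's data has reached disk.
        self.data_on_disk.add(inode)

    def recover(self):
        # Newest first: revert each replacement whose data never made it.
        for name, old_inode, new_inode in reversed(self.undo_log):
            if new_inode in self.data_on_disk:
                continue
            if self.names.get(name) != new_inode:
                continue  # a later, committed replacement already won
            if old_inode is None:
                del self.names[name]
            else:
                self.names[name] = old_inode
        self.undo_log.clear()
```

In this model, a crash after rename_replace but before commit_data leaves
the name pointing at the previous version once recover has run. That is
consistent version recovery, as opposed to point-in-time recovery.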

That massive filesystem thread

Posted Apr 1, 2009 12:17 UTC (Wed) by butlerm (guest, #13312)

Or in other words, fbarrier is a completely different kind of barrier than
the one that the "barrier=1" mount option requests. The latter is a low
level block I/O write barrier usually implemented with a full write cache
flush (barring some sort of battery backup); the former is a
data-before-metadata barrier.
