JLS2009: A Btrfs update

Posted Nov 4, 2009 8:40 UTC (Wed) by njs (guest, #40338)
In reply to: JLS2009: A Btrfs update by anton
Parent article: JLS2009: A Btrfs update

Never overwriting data in place is a pretty huge constraint, though. There are some interesting data storage applications that can be efficiently implemented using append-only files, but they're a tiny minority...


JLS2009: A Btrfs update

Posted Nov 5, 2009 14:09 UTC (Thu) by nye (guest, #51576)

>Never overwriting data in place is a pretty huge constraint, though

Nevertheless, it's generally a requirement for consistency in the face of application crashes (never mind system crashes or power cuts), unless you want to be dealing with full-blown transactional operations at the application level - which could be very little work if performed using facilities provided by the filesystem, but then wouldn't be portable.

JLS2009: A Btrfs update

Posted Nov 5, 2009 14:14 UTC (Thu) by anton (subscriber, #25547)

Most applications don't even append, they just write a new file in one go (and some then rename it, unlinking the old one). I think that ext3 data=ordered is a good file system for these applications.

Of course, for applications that overwrite data in place (typically databases) it's not a good file system, and these applications need fsync() with it.
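The two access patterns contrasted above can be sketched roughly as follows. This is only an illustration: the function names `replace_by_rename` and `overwrite_in_place` are made up for this example, and the first function deliberately omits fsync(), so it relies on the file system ordering data before the rename (the ext3 data=ordered behaviour described above) rather than on anything POSIX promises.

```python
import os
import tempfile


def replace_by_rename(path: str, data: bytes) -> None:
    """Write a whole new file in one go, then rename it over the old one.

    Hypothetical helper: no fsync() is issued, so crash safety depends on
    the file system writing the data before the rename becomes durable.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the target directory so that the rename
    # stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Atomically replaces the old name; the old file is unlinked.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def overwrite_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite part of an existing file, database style.

    Hypothetical helper: an in-place update has no old copy to fall back
    on, so it asks for durability explicitly with fsync().
    """
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
        f.flush()              # push Python's buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to push it to storage
```

In the first pattern a reader sees either the complete old file or the complete new one under the final name; in the second, a crash in the middle of the write can leave a mixture unless the application adds its own logging on top.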

JLS2009: A Btrfs update

Posted Nov 8, 2009 2:36 UTC (Sun) by butlerm (guest, #13312)

Ext3 is *great* for these applications, other than the fact that it is rather
slow for a number of important use cases.

Most importantly, a high-performance filesystem needs to be able to sync the
data of one file independently of all the pending data for every other open
file. That is the whole problem with ext3: it doesn't do that, so an fsync
under competing write load is very slow.

Ext4 fixes these problems, but either requires an fsync or inserts one to
make a rename replacement an atomic operation. That delay could be avoided
with some reasonable internal modifications (keeping the old inode around
until the new inode's data commits, and then undoing the rename if necessary
on journal recovery), but I am not aware of any filesystem that actually does
that. You have to call fsync to make your code portable anyway, but there
are a number of applications where that is too expensive.
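For reference, the fsync-before-rename sequence referred to above looks roughly like this. It is a minimal sketch, not code from any particular application; the name `durable_replace` is invented for the example, and the cost being discussed is the `os.fsync()` call, which blocks until the new file's data has been handed to storage.

```python
import os
import tempfile


def durable_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data`, syncing the new file before the rename.

    Hypothetical helper illustrating the write / fsync / rename sequence.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # The expensive step: wait until the new contents are on disk
            # so the rename can never expose an empty or partial file.
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything went wrong.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

An application that rewrites many small files this way pays for one synchronous flush per file, which is where the "too expensive" complaint comes from.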

JLS2009: A Btrfs update

Posted Nov 8, 2009 22:04 UTC (Sun) by anton (subscriber, #25547)

I don't see that fsync() makes my code (or anyone else's) portable. POSIX gives no useful guarantees on fsync(); different file systems have different requirements for what you have to fsync() in order to really commit a file. So use of fsync() is inherently non-portable.
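One commonly cited instance of "what you have to fsync()" is the containing directory: the Linux fsync(2) manual page notes that syncing a file does not necessarily ensure that its directory entry has reached disk, and that an explicit fsync() on the directory is needed for that. A rough sketch, assuming a Unix-like system where a directory can be opened read-only and passed to fsync(); the helper names `fsync_directory` and `commit_rename` are invented for this example, and whether a given file system actually needs the extra call is not something the thread establishes.

```python
import os


def fsync_directory(dir_path: str) -> None:
    """Ask the kernel to flush a directory's entries to storage.

    Hypothetical helper; assumes the platform allows fsync() on a
    directory file descriptor (Linux does).
    """
    fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def commit_rename(tmp_path: str, final_path: str) -> None:
    """Rename an already-written file into place, then sync the directory.

    Hypothetical helper: the caller is assumed to have written (and, if
    desired, fsync()ed) `tmp_path` beforehand.
    """
    os.replace(tmp_path, final_path)
    fsync_directory(os.path.dirname(os.path.abspath(final_path)))
```

The fact that the extra directory sync is needed on some systems and redundant on others is a small example of the portability problem described above.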
