JLS2009: A Btrfs update

Posted Nov 3, 2009 23:11 UTC (Tue) by anton (subscriber, #25547)
In reply to: JLS2009: A Btrfs update by njs
Parent article: JLS2009: A Btrfs update

I think that ext3 with data=journal or data=ordered is pretty close to a good file system for applications that don't overwrite files in place (e.g., editors). But I would be more confident if some file system developer actually made data consistency a design goal and gave some explicit guarantees.


JLS2009: A Btrfs update

Posted Nov 4, 2009 0:01 UTC (Wed) by nix (subscriber, #2304)

Unfortunately, both of those are only good filesystems if you really don't
care at all about either read or write speed. The latency figures Linus
posted (from one process dd(1)ing and another writing tiny files and
fsync()ing them) are appalling. We're not talking a mere few seconds,
we're talking over a minute at times.
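A rough sketch of the small-file half of that kind of measurement, assuming Python's `os` module (a thin wrapper over the POSIX calls). It writes tiny files and times each write-plus-`fsync()`. This is not the test Linus ran: the file count, file size, function name and the suggested `dd` command are illustrative assumptions.

```python
import os
import tempfile
import time


def fsync_latencies(directory, count=20, size=64):
    """Write `count` tiny files into `directory`, fsync()ing each one,
    and return how long each create+write+fsync took, in seconds."""
    payload = b"x" * size
    latencies = []
    for i in range(count):
        path = os.path.join(directory, f"tiny-{i}.dat")
        start = time.monotonic()
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, payload)
            os.fsync(fd)  # blocks until the kernel reports the data is on stable storage
        finally:
            os.close(fd)
        latencies.append(time.monotonic() - start)
    return latencies


if __name__ == "__main__":
    # Hypothetical setup: run this while something like
    # `dd if=/dev/zero of=big bs=1M count=4096` writes to the same filesystem.
    with tempfile.TemporaryDirectory() as d:
        lat = fsync_latencies(d)
        print(f"max {max(lat):.3f}s  mean {sum(lat) / len(lat):.3f}s")
```

Running it on an idle filesystem and then again under a concurrent bulk write is what exposes the stalls being described.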

JLS2009: A Btrfs update

Posted Nov 5, 2009 14:04 UTC (Thu) by anton (subscriber, #25547)

ext3 with data=ordered is fast enough in my experience (which includes several multi-user servers).

What you write about these figures [citation needed] reminds me of my experiences with copying stuff to flash devices. However, no writing to an ext3 file system was involved there, and I suspect that the problem is sitting at a lower level than the msdos or vfat file system.

JLS2009: A Btrfs update

Posted Nov 5, 2009 18:08 UTC (Thu) by nix (subscriber, #2304)

Yeah, that's (as you know from the comment you linked to) a problem that
the per-bdi writeback fix should solve. I saw it back in the days before
cheap USB hard drives, when I ran backups onto pcdrw...

JLS2009: A Btrfs update

Posted Nov 4, 2009 8:40 UTC (Wed) by njs (guest, #40338)

Never overwriting data in place is a pretty huge constraint, though. There are some interesting data storage applications that can be efficiently implemented using append-only files, but they're a tiny minority...
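A minimal sketch of the append-only style, assuming newline-framed records (the framing and the function names are hypothetical). New data only ever goes at the end via `O_APPEND`, so a crash can at worst leave a torn last record, which the reader discards. It does not call `fsync()`, so it makes no claim about whether the most recent records survive a power cut.

```python
import os


def append_record(path, record: bytes):
    """Append one newline-terminated record; existing bytes are never rewritten."""
    if b"\n" in record:
        raise ValueError("records must not contain newlines")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        buf = record + b"\n"
        while buf:
            # With O_APPEND, each write lands at the current end of the file.
            n = os.write(fd, buf)
            buf = buf[n:]
    finally:
        os.close(fd)


def read_records(path):
    """Return the complete records; a torn final line (no trailing newline) is ignored."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return []
    # The last element is b"" if the file ends in a newline, or a partial
    # record left behind by an interrupted write; either way, drop it.
    return data.split(b"\n")[:-1]
```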

JLS2009: A Btrfs update

Posted Nov 5, 2009 14:09 UTC (Thu) by nye (guest, #51576)

>Never overwriting data in place is a pretty huge constraint, though

Nevertheless, it's generally a requirement for consistency in the face of application crashes (never mind system crashes or power cuts), unless you want to deal with full-blown transactional operations at the application level. That could be very little work if it used facilities provided by the filesystem, but then it wouldn't be portable.

JLS2009: A Btrfs update

Posted Nov 5, 2009 14:14 UTC (Thu) by anton (subscriber, #25547)

Most applications don't even append; they just write a new file in one go (and some then rename it over the old one, unlinking the old file). I think that ext3 data=ordered is a good file system for these applications.
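The write-a-new-file-then-rename pattern looks roughly like this, sketched in Python, whose `os.replace()` performs a `rename(2)` on POSIX systems. The temporary-file naming scheme, the function name and the `durable` flag are assumptions for illustration. With `durable=False` the sketch relies on the filesystem (e.g. ext3 `data=ordered`, as described above) to commit the data before the rename; with `durable=True` it calls `fsync()` first.

```python
import os


def replace_file(path, data: bytes, durable=True):
    """Replace `path` with `data` by writing a temporary file and renaming it
    over the old one, so readers see either the old or the new contents in full."""
    directory = os.path.dirname(os.path.abspath(path))
    # Hypothetical naming scheme; the temp file must be on the same filesystem.
    tmp = os.path.join(directory, f".{os.path.basename(path)}.tmp{os.getpid()}")
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        try:
            view = memoryview(data)
            while view:
                n = os.write(fd, view)
                view = view[n:]
            if durable:
                os.fsync(fd)  # commit the new data before it becomes visible under `path`
        finally:
            os.close(fd)
        os.replace(tmp, path)  # atomic rename; the old file is unlinked
    except BaseException:
        # Don't leave a stray temporary file behind if anything failed.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```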

Of course, for applications that overwrite data in place (e.g., most databases), it's not a good file system, and these applications need fsync() with it.
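For contrast, an in-place update, sketched in Python with `os.pwrite()` and `os.fdatasync()` (the function name is hypothetical). A crash between the write and the flush can leave old and new bytes mixed in the file, which is why such applications need `fsync()` or `fdatasync()` to know when an overwrite has reached the disk.

```python
import os


def overwrite_at(path, offset, data: bytes):
    """Overwrite bytes at `offset` in an existing file, then wait until they are durable."""
    fd = os.open(path, os.O_WRONLY)
    try:
        view = memoryview(data)
        pos = offset
        while view:
            n = os.pwrite(fd, view, pos)  # write at an explicit offset, in place
            view = view[n:]
            pos += n
        # fdatasync() flushes the file data (plus only the metadata needed to
        # read it back); fall back to fsync() where fdatasync() is unavailable.
        getattr(os, "fdatasync", os.fsync)(fd)
    finally:
        os.close(fd)
```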

JLS2009: A Btrfs update

Posted Nov 8, 2009 2:36 UTC (Sun) by butlerm (guest, #13312)

Ext3 is *great* for these applications, except that it is rather slow for a number of important use cases.

Most importantly, a high-performance filesystem needs to be able to sync the data of one file independently of all the pending data for every other open file. That is the whole problem with ext3: it doesn't do that, so an fsync under a competing write load is very slow.

Ext4 fixes these problems, but it either requires an fsync or inserts one to make a rename replacement an atomic operation. That delay could be avoided with some reasonable internal modifications (keeping the old inode around until the new inode's data commits, and then undoing the rename if necessary on journal recovery), but I am not aware of any filesystem that actually does that. You have to call fsync to make your code portable anyway, but there are a number of applications where that is too expensive.

JLS2009: A Btrfs update

Posted Nov 8, 2009 22:04 UTC (Sun) by anton (subscriber, #25547)

I don't see that fsync() makes my code (or anyone else's) portable. POSIX gives no useful guarantees on fsync(); different file systems have different requirements for what you have to fsync() in order to really commit a file. So use of fsync() is inherently non-portable.
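One concrete case of those differing requirements: the Linux fsync(2) man page says that fsync() on a file does not necessarily commit the file's entry in its directory, and that an explicit fsync() on the directory is needed for that, while POSIX does not say whether fsync() on a directory is meaningful at all. A best-effort sketch in Python (the helper name is hypothetical) that reports whether the directory flush could be performed on the current platform:

```python
import os


def fsync_directory(path):
    """Try to fsync() the directory containing `path`, so a newly created or
    renamed directory entry is committed and not just the file's data.
    Returns True if the flush was performed, False if the platform refused."""
    directory = os.path.dirname(os.path.abspath(path))
    flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0)
    try:
        fd = os.open(directory, flags)
    except OSError:
        return False  # this platform won't open a directory as a file descriptor
    try:
        os.fsync(fd)
        return True
    except OSError:
        return False  # this platform rejects fsync() on a directory descriptor
    finally:
        os.close(fd)
```

Whether a False result matters, or whether the step is needed at all, depends on the filesystem, which is exactly the portability problem described above.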

Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds