
POSIX v. reality: A position on O_PONIES

Posted Sep 17, 2009 16:01 UTC (Thu) by forthy (guest, #1525)
In reply to: POSIX v. reality: A position on O_PONIES by alexl
Parent article: POSIX v. reality: A position on O_PONIES

I really wonder why all this "data=ordered" stuff is said to cost performance. If implemented right, it must improve performance. All you want to do is the following: push data into the write buffer, push metadata into the metadata write buffer, and push freed blocks into the freed-blocks buffer (but don't actually free them). If your buffers are full, there are no free blocks left, or a timer expires, do the following:

  1. Write out data.
  2. Write out metadata (first to journal, then to the actual file system).
  3. Actually free the blocks on the freed-blocks list.
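
The flush round above can be sketched as a toy model. This is only an illustration under loud assumptions: the "disk" is a set of Python dicts, and the class name OrderedFlusher and all of its fields are hypothetical, not taken from any real file system.

```python
class OrderedFlusher:
    """Toy model of one flush round; the 'disk' is just dicts and a log."""

    def __init__(self, free_blocks):
        self.allocatable = set(free_blocks)  # blocks the allocator may hand out
        self.data_buf = {}      # block number -> bytes, not yet written
        self.meta_buf = {}      # file name -> list of block numbers
        self.freed_buf = set()  # released blocks, NOT yet reusable
        self.disk_data = {}     # simulated on-disk data blocks
        self.journal = []       # simulated metadata journal
        self.disk_meta = {}     # simulated in-place metadata
        self.log = []           # order in which the phases ran

    def write_file(self, name, chunks):
        # New contents always go to newly allocated blocks.
        old_blocks = self.meta_buf.get(name, self.disk_meta.get(name, []))
        new_blocks = []
        for chunk in chunks:
            block = self.allocatable.pop()
            self.data_buf[block] = chunk
            new_blocks.append(block)
        self.meta_buf[name] = new_blocks
        # Deferred free: the old blocks stay untouched until the next flush.
        self.freed_buf.update(old_blocks)

    def flush(self):
        # 1. Write out data.
        self.disk_data.update(self.data_buf)
        self.log.append("data")
        # 2. Write out metadata: first to the journal, then in place.
        self.journal.append(dict(self.meta_buf))
        self.log.append("journal")
        self.disk_meta.update(self.meta_buf)
        self.log.append("metadata")
        # 3. Only now may the freed blocks be reused.
        self.allocatable |= self.freed_buf
        self.log.append("free")
        self.data_buf, self.meta_buf, self.freed_buf = {}, {}, set()
```

In this model a block that an old version of a file still references can never be handed out before the metadata pointing at the new version has been written, which is the property the deferred free is meant to give.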

You only have to write data once: new files go to newly allocated blocks, which don't appear in the metadata at the time you write them (they are still marked as free in the on-disk data). For files with in-place writes, we usually don't care (there are many race conditions when writing in place, so the general usage pattern is not to do that if you care about your data). For crash-resilient systems, you want to write your metadata twice (once into a journal, once into the file system), order it (ordered metadata updates), or use a COW/log-structured file system, where you write a new file system root (snapshot) on every update round.

While you are writing out data from your buffers, open up fresh buffers for the OS to fill for the next round (a double-buffering strategy). This double buffering should be a common part of the FS layer, because it will be used in all major file systems.
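
The double-buffering idea might look roughly like this. It is a minimal sketch, not how any kernel implements it; DoubleBuffer, flush_round and write_out are hypothetical names made up for the illustration.

```python
import threading


class DoubleBuffer:
    """Writers fill the active buffer; a flush swaps in a fresh one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = []

    def add(self, item):
        # Writers only ever touch the currently active buffer.
        with self._lock:
            self._active.append(item)

    def swap(self):
        # Hand the filled buffer to the flusher and install an empty one,
        # so new writes are not blocked while the old batch goes to disk.
        with self._lock:
            full, self._active = self._active, []
        return full


def flush_round(buf, write_out):
    """Write out one batch; write_out is a caller-supplied callback."""
    batch = buf.swap()
    for item in batch:
        write_out(item)
    return len(batch)
```

Anything added while flush_round is running lands in the fresh buffer and is picked up by the next round, so the lock is only held for the swap itself, never for the I/O.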


POSIX v. reality: A position on O_PONIES

Posted Sep 17, 2009 16:41 UTC (Thu) by dlang (subscriber, #313)

the problem with your approach is that various pieces (including the hard drive itself) will reorder anything in their buffers to shorten the total time it takes to get everything in the buffer to disk.

that is why barriers are needed to tell the device not to reorder writes across the barrier.
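
from user space you cannot issue a block-layer barrier directly; the closest tool is fsync(), which asks the kernel to push a file's data out before the call returns. a rough sketch of forcing "data before metadata" that way, assuming a POSIX system (the function name replace_durably and the ".tmp" suffix are made up for the example, and whether the data really reaches the platters still depends on the kernel and drive honoring cache flushes):

```python
import os


def replace_durably(path, data):
    """Write data to a temp file, flush it, then rename it over path."""
    tmp = path + ".tmp"  # hypothetical temp-name convention
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                 # os.write may write only part of the data
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                # ordering point: data before the rename
    finally:
        os.close(fd)
    os.replace(tmp, path)           # metadata update, only after the flush
    # Flush the directory too, so the rename itself is pushed out.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

without the first fsync(), nothing in this code stops the rename from reaching the disk before the file contents do.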

Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds