POSIX v. reality: A position on O_PONIES
Posted Sep 17, 2009 16:01 UTC (Thu) by forthy (guest, #1525)
In reply to: POSIX v. reality: A position on O_PONIES by alexl
Parent article: POSIX v. reality: A position on O_PONIES
I really wonder why all this "data=ordered" stuff is said to cost
performance. If implemented right, it should improve performance. All you
want to do is the following: push data into the write buffer, push
metadata into the metadata write buffer, and push freed blocks into the
freed-blocks buffer (but don't actually free them). When your buffers are
full, no free blocks remain, or a timer expires, do the
following:
- Write out data.
- Write out metadata (first to the journal, then to the actual file
system).
- Actually free the blocks in the freed-blocks buffer.
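To make the ordering concrete, here is a minimal toy sketch of one such flush round. It is only a simulation under my own assumptions: the class `OrderedFlusher`, its methods, and the `disk_log` list (which stands in for real disk writes) are all hypothetical names, and nothing here is how any real file system implements it.

```python
class OrderedFlusher:
    """Toy model of an ordered write-back round (all names are hypothetical)."""

    def __init__(self, total_blocks=8, capacity=4):
        self.capacity = capacity                  # max buffered data blocks
        self.free_blocks = set(range(total_blocks))
        self.on_disk_meta = {}                    # file name -> block numbers
        self.data = {}                            # buffered data: block -> payload
        self.metadata = {}                        # buffered metadata changes
        self.pending_free = []                    # released, but not yet reusable
        self.disk_log = []                        # order of simulated disk writes

    def write_new_file(self, name, payloads):
        need = len(payloads)
        if len(self.data) + need > self.capacity or need > len(self.free_blocks):
            self.flush()                          # buffers full or no free block left
        if need > len(self.free_blocks):
            raise OSError("no space left on toy device")
        blocks = [self.free_blocks.pop() for _ in range(need)]
        for block, payload in zip(blocks, payloads):
            self.data[block] = payload
        self.metadata[name] = blocks

    def delete_file(self, name):
        if name in self.metadata:
            self.flush()                          # keep the toy simple: settle first
        self.pending_free.extend(self.on_disk_meta[name])
        self.metadata[name] = None                # buffered "remove this file"

    def flush(self):
        for block in sorted(self.data):           # 1. data first
            self.disk_log.append(("data", block))
        for name in self.metadata:                # 2a. metadata to the journal
            self.disk_log.append(("journal", name))
        for name, blocks in self.metadata.items():  # 2b. metadata in place
            self.disk_log.append(("metadata", name))
            if blocks is None:
                self.on_disk_meta.pop(name, None)
            else:
                self.on_disk_meta[name] = blocks
        self.disk_log.append(("free", tuple(self.pending_free)))
        self.free_blocks.update(self.pending_free)  # 3. only now reuse the blocks
        self.data.clear()
        self.metadata.clear()
        self.pending_free.clear()


fs = OrderedFlusher()
fs.write_new_file("a", ["x", "y"])
fs.flush()
fs.delete_file("a")
held = set(fs.pending_free)
assert held.isdisjoint(fs.free_blocks)            # not reusable before the flush
fs.flush()
phases = [entry[0] for entry in fs.disk_log]
```

The point of the sketch is that the blocks of a deleted file stay out of `free_blocks` until the metadata recording the deletion has been written, so a crash in between never leaves on-disk metadata pointing at reused blocks.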
You only have to write data once: new files go to newly allocated
blocks, which no on-disk metadata references at the time you write them
(they are still marked as free in the on-disk data). For files with
in-place writes, we usually don't care (writing in place is subject to
many race conditions, so the general usage pattern is to avoid it if you
care about your data). For crash-resilient systems, you want to either
write your metadata twice (once into a journal, once into the file
system), order it (ordered metadata updates), or use a COW/log-structured
file system, where you write a new file system root (snapshot) on every
update round. While you are writing out data from the full buffers, open
up a fresh set of buffers for the OS to fill for the next round (a
double-buffering strategy). This double buffering should be a common part
of the FS layer, because all major file systems would use it.
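The double-buffering idea can be sketched roughly like this. Again this is just an illustration under my own assumptions: `DoubleBuffer`, `add`, `flush`, and the `write_out` callback are hypothetical names, and a real FS layer would be dealing with pages and block I/O rather than Python lists.

```python
import threading


class DoubleBuffer:
    """Swap in a fresh buffer while the full one is being written out."""

    def __init__(self, write_out):
        self._write_out = write_out           # hypothetical callback: persists one batch
        self._active = []                     # buffer currently accepting writes
        self._lock = threading.Lock()         # protects the swap and appends
        self._flush_lock = threading.Lock()   # one write-out round at a time

    def add(self, item):
        with self._lock:
            self._active.append(item)

    def flush(self):
        with self._flush_lock:
            with self._lock:
                # Swap: writers immediately get an empty buffer for the next round.
                batch, self._active = self._active, []
            # The slow write-out happens without holding the buffer lock.
            self._write_out(batch)
            return len(batch)


written = []


def slow_write(batch):
    # A write arriving in the middle of write-out; it lands in the fresh buffer.
    buf.add("arrived-during-flush")
    written.append(list(batch))


buf = DoubleBuffer(slow_write)
buf.add("a")
buf.add("b")
buf.flush()
buf.flush()
```

In this toy, `add` never waits for the write-out: the only lock it takes is held just long enough to swap or append, which is the property you would want from a shared double-buffering layer.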