POSIX v. reality: A position on O_PONIES
Posted Sep 10, 2009 9:04 UTC (Thu) by alexl (subscriber, #19068)
In reply to: POSIX v. reality: A position on O_PONIES by dlang
Parent article: POSIX v. reality: A position on O_PONIES
NO NO NO NO. We do not need/want the file to be fsynced.
Why do people keep repeating this fallacy? We all know that fsync is expensive, and we don't want to use it, or anything with similar semantics.
What we want is something that extends the natural behavior of a rename() replace (you atomically get either the old or the new file) to cover a system crash. This does not imply an fsync, but rather that the data for the new file reaches the disk before the metadata does. That is much cheaper than an fsync, because it does not require the data to be written immediately; it only requires that the write of the metadata be delayed until the data has been written. Hence "little cost in performance", at least relative to fsync.
And then you write "ext3 never provided the guarantees that people think it did", when my whole point has been that everyone gives this reason for why people use rename when it's not actually the reason! I am well aware that rename() does not give me system-crash safety; I use it for other reasons. However, I *would* like it if this common operation, which had been in use for decades before ext3 was written, were also recognized by ext3 and made even more useful (even though this is in no way guaranteed by POSIX).
Posted Sep 17, 2009 16:01 UTC (Thu)
by forthy (guest, #1525)
[Link] (1 responses)
I really wonder why all this "data=ordered" stuff is said to cost performance. If implemented right, it should improve performance. All you want to do is the following: push data into the write buffer, push metadata into the metadata write buffer, and push freed blocks into the freed-blocks buffer (but don't actually free them). When your buffers are full, there are no free blocks left, or a timer expires, flush.

You only have to write data once: new files go to newly allocated blocks which don't appear in the metadata when you write them (they are still marked as free in the on-disk data). For files with in-place writes, we usually don't care (there are many race conditions for writing in-place, so the general usage pattern is to avoid it if you care about your data). For crash-resilient systems, you either write your metadata twice (once into a journal, once into the file system proper), order it (ordered metadata updates), or use a COW/log-structured file system, where you write a new file system root (snapshot) on every update round.

While you are writing data from your buffers, open up new buffers for the OS to use for the next round (a double-buffering strategy). This double buffering should be a common part of the FS layer, because it would be used in all major file systems.
Posted Sep 17, 2009 16:41 UTC (Thu)
by dlang (guest, #313)
[Link]
That is why barriers are needed: to tell the device not to reorder writes across the barrier.