An alternative to the application barrier() call
Posted Sep 13, 2009 17:46 UTC (Sun) by anton (subscriber, #25547)
In reply to: An alternative to the application barrier() call by dlang
Parent article: POSIX v. reality: A position on O_PONIES
Code that writes a few characters here and a few characters there usually uses the FILE * based interface, which performs user-space buffering and then typically performs write() (or somesuch) calls of 4k or 8k at a time; just strace one of these programs. That's done to reduce the system call overhead. But even if such programs perform a write() for each of the application writes, having barriers between each of them does not kill performance, because a sequence of such writes can be merged.
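A minimal illustration (assuming a Linux userland with strace available): the loop below makes 100,000 one-byte writes at the stdio level, but strace shows only a handful of large write() calls.

/* Many small stdio writes end up as a few large write() syscalls;
 * run under "strace -e trace=write ./a.out" to watch the coalescing. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("out.txt", "w");
    if (!f)
        return 1;
    for (int i = 0; i < 100000; i++)
        fputc('x', f);  /* buffered in user space, no syscall here */
    fclose(f);          /* flush; strace shows write()s of ~4KiB-8KiB */
    return 0;
}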
Concerning the block device below: if it does not heed the block device barriers or other ordering mechanisms that the file system requests, then you get no consistency guarantee at all on a crash or power failure. It's not just that merged writes won't work; your style of merge-preventing barriers won't work either, and neither will the guarantees that fsync()/fdatasync() are supposed to provide. All of them require that the block device ordering mechanism(s) the file system uses actually work, and all of them will produce inconsistent states if the writes happen in an order that violates the ordering requests. So if you want any consistency guarantees at all, you need an appropriate block device, and then you can implement mergeable writes just as well as anything else.
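To make the dependence concrete, here is a sketch of the standard write-then-rename idiom; its crash guarantee holds only if fsync() actually reaches stable storage, i.e., only if the block device honors the flush/ordering requests the file system sends on its behalf.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Replace a file's contents so that after a crash either the old or
 * the new version is seen, never a torn mix. */
int save_atomically(const char *path, const char *tmp,
                    const void *buf, size_t len)
{
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return rename(tmp, path); /* meaningless if the device reorders freely */
}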
As for an array where a write spans drives, implementing a barrier or other ordering mechanism at the array level certainly requires more than just issuing barriers on the individual block devices, but the device has to provide these facilities, or you can forget about crash consistency on that device (i.e., just don't use it).
An alternative to the application barrier() call
Posted Sep 13, 2009 20:23 UTC (Sun) by dlang (guest, #313)
this isn't always needed, so don't try to do it for every write (and I've straced a lot of code that does lots of write() calls)
do it when the programmer says that it's important. 99+% of the time it won't be (either the result is not significantly more usable after a crash if only part of the file is there, or the write really is performance-sensitive enough to risk it)
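something like this (fbarrier() is hypothetical, it stands for the kind of call being proposed here, not an existing API):

#include <stddef.h>
#include <unistd.h>

int fbarrier(int fd);  /* hypothetical: writes before it must hit disk
                          before writes after it; no durability implied */

void append_record(int fd, const void *rec, size_t rec_len,
                   const void *hdr, size_t hdr_len)
{
    write(fd, rec, rec_len);      /* the record data                    */
    fbarrier(fd);                 /* only here does ordering matter     */
    pwrite(fd, hdr, hdr_len, 0);  /* header that makes it reachable     */
}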
you would be amazed at the amount of risk that people are willing to take to get performance. talk to the database gurus at MySQL or postgres about the number of people they see disabling f*sync on production databases in the name of speed.
An alternative to the application barrier() call
Posted Sep 14, 2009 22:16 UTC (Mon) by anton (subscriber, #25547)
And since it is possible to implement these implicit barriers between each write efficiently (by merging writes), why burden programmers with inserting explicit file system barriers? Look at how long the Linux kernel hackers needed to use block device barriers in the file system code. Do you really expect application developers to do it at all? And if they did, how would they test it? This has the same untestability properties as asking application programmers to use fsync().
Fortunately, writes at the file system level can be merged across file system barriers, resulting in few barriers that have to be passed to the block device level. So there is no need to pass a block device barrier down for every file system barrier.
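As a toy sketch of why this is cheap (assuming a simple block cache; device_write()/device_barrier() are stand-ins for the real block layer): writes between barriers simply overwrite each other in memory, and only a commit emits one block device barrier.

#include <string.h>

#define NBLOCKS 1024
#define BLKSIZE 4096

static unsigned char cache[NBLOCKS][BLKSIZE];
static int dirty[NBLOCKS];

void fs_write(unsigned blk, const void *data)
{
    memcpy(cache[blk], data, BLKSIZE);  /* replaces any pending write */
    dirty[blk] = 1;
}

void fs_barrier(void)
{
    /* nothing to do: the in-memory state already reflects the order,
       and the eventual commit is atomic, so merging across this
       barrier cannot expose a state that violates it */
}

void fs_commit(void)
{
    for (unsigned b = 0; b < NBLOCKS; b++)
        if (dirty[b]) {
            /* device_write(b, cache[b]); */
            dirty[b] = 0;
        }
    /* device_barrier();  one barrier per commit, not per fs_write() */
}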
Concerning the risk-loving performance freaks: they will use the latest and greatest file system by Ted Ts'o instead of one that offers either implicit or explicit barriers, but of course they will not use fsync() on that file system :-).
BTW, if you also implement block device writes by avoiding overwriting live sectors and by using commit sectors, then you can implement mergeable writes at the block device level, too (e.g., for making them cheaper in an array). However, the file system will not request a block device barrier often, so there is no need to go to such complexity (unless you need it for other purposes, such as when your block device is a flash device).
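A sketch of what such a no-overwrite block layer might record (all names illustrative): new data always goes to free sectors, and a checksummed commit sector publishes a whole batch of remappings at once, which is what makes the batch mergeable.

/* Illustrative only: a commit record for a block layer that never
 * overwrites live sectors. */
struct commit_record {
    unsigned count;         /* number of remapped sectors             */
    unsigned logical[15];   /* logical sector numbers...              */
    unsigned physical[15];  /* ...and the fresh homes of their data   */
    unsigned checksum;      /* recovery trusts the record only if OK  */
};
/* Crash before the record is on disk: the old mapping survives.
 * Crash after: the checksum validates and the new mapping wins.
 * Either way the device presents a consistent state, and several
 * batches can be merged into one record before committing. */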
An alternative to the application barrier() call
Posted Sep 20, 2009 5:22 UTC (Sun) by runekock (subscriber, #50229)
But what about eliminating repeated writes to the same place? Take this contrived example:
repeat 1000 times:
    write first byte of file A
    write first byte of file B
A COW file system may well be able to merge the writes, but it would require a lot of intelligence for it to see that most of the writes could actually be skipped. And a traditional file system would be even worse off.
An alternative to the application barrier() call
Posted Sep 20, 2009 18:38 UTC (Sun) by anton (subscriber, #25547)
An update-in-place file system (without a journal) would indeed have to perform all the writes in order to have the on-disk state reflect one of the logical POSIX states at all times (assuming that there are no repeating patterns in the two values that are written; if there are, it is theoretically possible to skip the writes between two equal states).
For a copy-on-write file system that example would be easy: do all the writes in memory (in proper order), and when the system decides that it's time to commit the stuff to disk, just commit the new logical state to disk (e.g., by writing the first block of each of file A and file B and the respective metadata to new locations, and finally a commit sector that makes the new on-disk state visible).
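A toy version of that commit (the helpers here are just stand-ins for real allocation and I/O): the 2000 logical writes collapse to two data blocks, a metadata block, and one commit sector.

#include <stdio.h>

static unsigned next_free = 100;   /* toy free-space allocator */

static unsigned write_new_block(const char *what)
{
    unsigned sector = next_free++;
    printf("write %-24s -> sector %u\n", what, sector);
    return sector;
}

int main(void)
{
    /* 1000 updates each to A[0] and B[0] happened in memory; only the
       final state goes to disk: */
    unsigned a = write_new_block("first block of file A");
    unsigned b = write_new_block("first block of file B");
    unsigned m = write_new_block("metadata for A and B");
    printf("write commit sector -> points at sector %u (covers %u, %u)\n",
           m, a, b);
    return 0;
}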