
Where did the correctness go?

Posted Mar 17, 2009 7:18 UTC (Tue) by dlang (✭ supporter ✭, #313)
In reply to: Where did the correctness go? by butlerm
Parent article: Ts'o: Delayed allocation and the zero-length file problem

how do you think the databases make sure their data is on disk?

they use fsync() or fdatasync() calls to the filesystem.

so your assertion that databases can make atomic changes to their data faster than the filesystem can do an fsync means that either you don't know what you are saying, or you don't really have the data safety that you think you have.
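
As a rough illustration of the point about sync calls, here is a minimal sketch of a durable log append of the kind a database might perform. The function name `durable_append` and the log layout are hypothetical, not taken from any particular database; the sketch assumes a POSIX-like system and falls back to `os.fsync` where `os.fdatasync` is unavailable.

```python
import os


def durable_append(path: str, record: bytes) -> None:
    """Append one record and return only after the sync call succeeds.

    Hypothetical helper for illustration; real databases batch records
    and manage their log files very differently.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(record)
        # os.write may write fewer bytes than requested, so loop.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Prefer fdatasync (data only); fall back to fsync where it
        # does not exist, e.g. on some non-Linux platforms.
        sync = getattr(os, "fdatasync", os.fsync)
        sync(fd)
    finally:
        os.close(fd)
```

Whether the data is truly on stable storage when the call returns still depends on the drive honoring the flush, which this sketch cannot verify.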



Where did the correctness go?

Posted Mar 17, 2009 8:31 UTC (Tue) by butlerm (subscriber, #13312)

ACID has four letters for a reason. Atomicity is logically independent of
durability. A decent database will let you turn (synchronous) durability
off while fully guaranteeing atomicity and consistency.

The reason is that with a typical rotating disk, any durable commit is
going to take at least one disk revolution time, i.e. about 10 ms. Single
threaded atomic (but not necessarily durable) commits can be issued a
hundred times faster than that, because no synchronous disk I/O is required
at all.
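
The arithmetic behind the "about 10 ms" figure can be sketched as follows. This is a simplification that assumes each durable commit waits for exactly one full platter revolution and ignores seek time, drive write caches, and group commit; the function names are made up for this example.

```python
def revolution_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def max_serial_durable_commits_per_sec(rpm: float) -> float:
    """Upper bound on single-threaded durable commits per second,
    assuming one revolution per commit (a simplifying assumption)."""
    return 1000.0 / revolution_ms(rpm)


if __name__ == "__main__":
    for rpm in (5400, 6000, 7200, 15000):
        print(f"{rpm:>6} rpm: {revolution_ms(rpm):6.2f} ms/rev, "
              f"~{max_serial_durable_commits_per_sec(rpm):5.0f} commits/s")
```

Under these assumptions a 6000 rpm disk gives exactly 10 ms per revolution, so roughly 100 serial durable commits per second; a 7200 rpm disk gives about 8.3 ms.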

Where did the correctness go?

Posted Mar 17, 2009 9:48 UTC (Tue) by dlang (✭ supporter ✭, #313)

and all the filesystems (including ext4 prior to the patches) provide the atomicity you are looking for.

it's just the durability in the face of a crash that isn't there. but it wasn't there on ext3 either (there was just a smaller window of vulnerability). even if you mount your filesystem with the sync option, many commodity hard drives will not let you disable their internal disk caches, so you would still have the vulnerability (with an even smaller window).

Where did the correctness go?

Posted Mar 17, 2009 17:30 UTC (Tue) by butlerm (subscriber, #13312)

"and all the filesystems (including ext4 prior to the patches) provide the
atomicity you are looking for."

I am afraid not. Atomic means that the pertinent operation always appears
either to have completed OR to have never started in the first place. If
the system recovers in a state where some of the effects of the operation
have been preserved and other parts have disappeared, that is not atomic.

The operation here is replacing a file with a new version. Atomic
operation means when the system recovers there is either the old version or
the new version, not any other possibility. You can do this now of course,
you simply have to pay the price for durability in addition to
atomicity.
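
For reference, the "pay for durability to get atomicity" approach is usually written along these lines: write a temporary file, sync it, then rename it over the original. This is a minimal sketch assuming a POSIX system; the `atomic_replace` name and the `.tmp` suffix are hypothetical choices for this example.

```python
import os


def atomic_replace(path: str, data: bytes) -> None:
    """Replace `path` so that after a crash a reader should see either
    the old contents or the new contents (sketch, POSIX assumed)."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical temporary name

    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # This is the synchronous (durable) step that costs latency.
        os.fsync(fd)
    finally:
        os.close(fd)

    # rename() atomically swaps the directory entry on POSIX.
    os.replace(tmp_path, path)

    # Sync the directory so the rename itself reaches the disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The `os.fsync` on the temporary file is the durability cost being discussed here: there is no portable way to ask only for "old or new, never a mix" without also waiting for the data to reach the disk.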

Per accident of design, filesystems require a much higher price (in terms
of latency) to be paid for durability than databases do. That
factor is multiplied by a hundred or more if atomicity is required, but
durability is not.

Where did the correctness go?

Posted Mar 17, 2009 17:38 UTC (Tue) by butlerm (subscriber, #13312)

I refer to filesystem *meta-data* operations of course.
