Where the the correctness go?

Posted Mar 15, 2009 2:42 UTC (Sun) by njs (guest, #40338)
In reply to: Where the the correctness go? by nix
Parent article: Ts'o: Delayed allocation and the zero-length file problem

> There's a window there where it can leave you with baz~ and baz.new, but no baz, on crash.

Yeah, 3.f is supposed to say "link", not "rename". (Programming against POSIX correctly makes those Raymond Smullyan books seem like light reading. If only everything else weren't worse...)

> Why doesn't someone add real DB-style transactions to at least one filesystem, again? They'd be really useful...

The problem is that a filesystem has a bazillion mostly independent "transactions" going on all the time, and no way to tell which ones are actually dependent on each other. (Besides, app developers would just screw up their rollback-and-retry logic anyway...)

Completely off the wall solution: move to a plan9/capability-ish system where apps all live in little micro-worlds and can only see the stuff that is important to them (this is a good idea anyway), and then use these to infer safe but efficient transaction domain boundaries. (Anyone looking for a PhD project...?)

Transactions, ordering, rollback...

Posted Mar 15, 2009 8:09 UTC (Sun) by Pc5Y9sbv (guest, #41328)

The entire point of transactions is to say "these operations are related to one another" by opening a transaction and performing multiple read/write actions which populate a data dependency map. Then the commit says that either the dependency map is satisfied and all writes are made, or no writes are made. Thus it would not be difficult to obtain the map from the application, but it is a huge expansion of scope for the filesystem abstraction.
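To make the "dependency map" idea concrete, here is a minimal sketch of an optimistic transaction over a toy in-memory store. Everything in it (`Store`, `Transaction`, `Conflict`, the per-key version counters) is hypothetical and invented for illustration; it is not an existing filesystem API, and it says nothing about how a real filesystem would implement this.

```python
class Conflict(Exception):
    """Raised when something the transaction read has changed before commit."""


class Store:
    """Hypothetical in-memory key/value store with per-key version counters."""

    def __init__(self):
        self.data = {}      # key -> committed value
        self.versions = {}  # key -> int, bumped on every committed write

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # the dependency map: key -> version we saw
        self.writes = {}         # buffered writes, invisible until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        # Record the version this transaction now depends on.
        self.read_versions.setdefault(key, self.store.versions.get(key, 0))
        return self.store.data.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Either every recorded dependency still holds and all writes are
        # applied, or a Conflict is raised and no write is applied.
        for key, seen in self.read_versions.items():
            if self.store.versions.get(key, 0) != seen:
                raise Conflict(key)
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1
```

The application supplies the map simply by doing its reads and writes inside the transaction. The scope expansion is in everything this toy leaves out: durability, concurrency, and metadata shared between transactions.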

However, as we were discussing further up the page, a write-barrier is really all that is needed for the intuitive crash-proof behavior desired by everything that uses the "create a temp file; relink to real name" pattern. An awful lot of the discussion seems to conflate request ordering with synchronous disk operations, when all we really desire is ordering constraints to flow through the entire filesystem and block layer to the physical medium.

All people want is for the POSIX ordering semantics of "file content writes" preceding "file name linkage" to be preserved across crashes. It is OK if the crash forgets just the link, or both the data and the link, but not if it forgets the data while preserving the link.
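For reference, the pattern in question looks roughly like the sketch below. The helper name `replace_file` is made up for illustration. Since there is no portable "barrier" call available to applications, the sketch uses `fsync()` as a stand-in, which is stronger than needed: it forces the data out synchronously rather than merely ordering it before the rename.

```python
import os
import tempfile


def replace_file(path, data):
    """Replace the file at `path` with `data` (bytes) via a temp file.

    Hypothetical helper: fsync() stands in for the write barrier that
    would be enough if ordering were preserved down to the disk.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # content reaches disk before the name changes
        os.replace(tmp, path)     # atomically point the real name at the new data
    except BaseException:
        # On failure, leave the old file untouched and clean up the temp file.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

With a real ordering guarantee, the `os.fsync()` call could be dropped: a crash would then leave either the old contents or the new contents under the real name, never an empty file.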

Where the the correctness go?

Posted Mar 15, 2009 12:26 UTC (Sun) by nix (subscriber, #2304)

Even the off-the-wall solution won't work, because transactions get
entangled with each other through dependencies *within the fs metadata*.
That is, what you'd actually need to do is put off *all* operations on fs
metadata areas that may be shared with other transactions until such time
as the entire transaction is ready to commit. And that's a huge change.
