Rename undo
Posted Apr 2, 2009 14:01 UTC (Thu) by xoddam (guest, #2322)
In reply to: That massive filesystem thread by butlerm
Parent article: That massive filesystem thread
I'm intrigued, but not satisfied. Telling the journal that a metadata change is 'committed' means that the post-crash-recovery state will reflect the change (journal replay).
Surely the only satisfactory way to commit data before committing the metadata change is to delay *all* journal commits in-order until after the relevant file data is written in place, or to journal the data itself.
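To make the first option concrete, here is a toy model of a journal whose metadata transactions commit strictly in order and are held back until the file data they depend on has been written in place. It is only a sketch of the ordering constraint, not how any real journal is implemented; `ToyJournal`, `log`, `data_written` and the block names are all hypothetical.

```python
class ToyJournal:
    """Toy model: metadata transactions commit strictly in order, and none
    may commit before the file data it depends on is on disk."""

    def __init__(self):
        self.pending = []     # queued (description, required data blocks)
        self.committed = []   # descriptions, in commit order
        self.on_disk = set()  # data blocks already written in place

    def log(self, description, depends_on=()):
        """Queue a metadata change that needs `depends_on` written first."""
        self.pending.append((description, set(depends_on)))
        self._commit_ready()

    def data_written(self, block):
        """Record that a data block has reached the disk."""
        self.on_disk.add(block)
        self._commit_ready()

    def _commit_ready(self):
        # Only the head of the queue may commit, so a blocked transaction
        # delays every later one: this is the "delay *all* commits" cost.
        while self.pending and self.pending[0][1] <= self.on_disk:
            description, _ = self.pending.pop(0)
            self.committed.append(description)


journal = ToyJournal()
journal.log("rename config.tmp -> config", depends_on={"block-1"})
journal.log("chmod unrelated-file")    # no data dependency of its own
blocked = list(journal.committed)      # nothing has committed yet
journal.data_written("block-1")        # both commit now, in order
```

Note how the unrelated chmod is stuck behind the rename until the data arrives, which is where the performance worry comes from.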
For performance reasons it's probably much saner not to journal most data, especially for random access within large files. But I'm thinking that if it makes sense to allocate-on-commit to preserve the in-order semantics of atomic rename, it might also make good sense to special-case data journalling for newly-written (created or truncated) files when they are renamed (perhaps only for small files, with allocate-on-commit for larger ones, where users will likely expect a delay).
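For reference, the application-side sequence being discussed (write a new file, then rename it over the old one) can be sketched roughly as below. The helper name `replace_file` and its `sync_first` flag are made up for illustration. With `sync_first=True` the application forces the data out itself before the rename; with `sync_first=False` it relies on the filesystem to order the data ahead of the rename, which is the case the special-casing above is meant to cover.

```python
import os
import tempfile


def replace_file(path, data, sync_first=True):
    """Write `data` to a temporary file, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            if sync_first:
                # Ask the kernel to push the file data to disk before
                # the rename is issued.
                os.fsync(tmp.fileno())
        # Replace the old name with the new file in a single rename.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```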
Having the ability to unwind a specific kind of metadata change seems very confusing. I fear that winding back a rename could well result in a violation of expected in-order semantics w.r.t. metadata after crash recovery. Or might it be possible to wind back an entire 'transaction', all other metadata changes since the rename included?
Rename undo
Posted Apr 2, 2009 18:13 UTC (Thu) by butlerm (subscriber, #13312)
[...] there are no alternatives other than journalling the data or delaying all journal commits until the corresponding data has been written. Both options are available (e.g. data=journal and data=ordered), and both have serious performance problems. Of course, if that is really what you need, then the price is worth paying.
"data=writeback" is the current alternative, which doesn't make any pretense of preserving in-order semantics of data and meta-data after a crash. You get a snapshot of your meta-data at a certain point in time, but the data may be trashed.
Rename undo is a much less severe compromise to in-order semantics after a crash. It is not point-in-time recovery; it is consistent version recovery. That can have some unexpected side effects, but none remotely as severe as losing the data completely.
In the case you mention, if you write a new version, rename it over the old one, change the security permissions on the replacement, and then the system crashes, you are not going to get the new (unwritten) data, the new inode, and the new permissions. You are going to get the old inode, the old data, and the old permissions. The permissions go with the inode (and the data), not the directory entry. That is what you want. The old inode (and the old data) has to be kept around until the data for the new inode is completely on disk. Otherwise you cannot undo the rename replacement after the fact.
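The scenario above can be played through in a toy model. This is only a sketch of the idea as described here, not an existing filesystem implementation: `ToyFS`, its `undo` table and the `data_on_disk` flag are hypothetical names, and the model never actually frees the old inode.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    data: str
    mode: int
    data_on_disk: bool


class ToyFS:
    """Toy model of rename undo: a replaced inode is kept until the data of
    its replacement is on disk, so recovery can put the old entry back."""

    def __init__(self):
        self.inodes = {}   # inode number -> Inode
        self.dir = {}      # name -> inode number (the directory entries)
        self.undo = {}     # name -> inode number that the name used to have
        self._next_ino = 1

    def create(self, name, data, mode, data_on_disk=False):
        ino = self._next_ino
        self._next_ino += 1
        self.inodes[ino] = Inode(data, mode, data_on_disk)
        self.dir[name] = ino
        return ino

    def rename(self, src, dst):
        new = self.dir.pop(src)
        old = self.dir.get(dst)
        if old is not None and not self.inodes[new].data_on_disk:
            self.undo[dst] = old   # keep the old inode reachable for undo
        self.dir[dst] = new

    def chmod(self, name, mode):
        # Permissions live on the inode, not on the directory entry.
        self.inodes[self.dir[name]].mode = mode

    def flush(self, name):
        # The new data reached the disk; the undo record is no longer needed.
        self.inodes[self.dir[name]].data_on_disk = True
        self.undo.pop(name, None)

    def recover(self):
        # After a crash, wind back renames whose new data never made it.
        for name, old in self.undo.items():
            if not self.inodes[self.dir[name]].data_on_disk:
                self.dir[name] = old
        self.undo.clear()


fs = ToyFS()
fs.create("config", "old data", 0o644, data_on_disk=True)
fs.create("config.tmp", "new data", 0o644)   # data still unwritten
fs.rename("config.tmp", "config")
fs.chmod("config", 0o600)                    # lands on the new inode
fs.recover()                                 # crash before the data was written
survivor = fs.inodes[fs.dir["config"]]
```

After recovery the name points at the old inode again, so the old data and the old permissions come back together. Had `fs.flush("config")` run before the crash, the new inode, new data and new permissions would have survived instead.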