Rename undo

Posted Apr 2, 2009 14:01 UTC (Thu) by xoddam (subscriber, #2322)
In reply to: That massive filesystem thread by butlerm
Parent article: That massive filesystem thread

> If one wants high performance rename replacements, rename undo is much more practical.

I'm intrigued, but not satisfied. Telling the journal that a metadata change is 'committed' means that the post-crash-recovery state will reflect the change (journal replay).

Surely the only satisfactory way to commit data before committing the metadata change is to delay *all* journal commits in-order until after the relevant file data is written in place, or to journal the data itself.

For performance reasons it's probably much saner not to journal most data, especially for random access within large files. But if it makes sense to allocate-on-commit to preserve the in-order semantics of atomic rename, it might also make good sense to special-case data journalling for newly-written (created or truncated) files when they are renamed. Perhaps that would apply only to small files, with allocate-on-commit for larger ones, as users will likely expect a delay there.

Having the ability to unwind a specific kind of metadata change seems very confusing. I fear that winding back a rename could well result in a violation of expected in-order semantics w.r.t. metadata after crash recovery. Or might it be possible to wind back an entire 'transaction', all other metadata changes since the rename included?

Rename undo

Posted Apr 2, 2009 18:13 UTC (Thu) by butlerm (guest, #13312)

You are right: if you want guaranteed preservation of in-order semantics,
there are no alternatives other than journalling the data or delaying all
journal commits until the corresponding data has been written. Both
options are available (e.g. data=journal and data=ordered), and both have
serious performance problems. Of course, if that is really what you need,
then the price is worth paying.

"data=writeback" is the current alternative, which makes no pretense of
preserving the in-order semantics of data and meta-data after a crash.
You get a snapshot of your meta-data at a certain point of time, but the
data may be trashed.
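
To make the three modes concrete, here is a toy model of what a
write-new-then-rename sequence looks like after a crash. It is only a
sketch of the behaviour as described in this comment, not of any real
journal code; the names Recovered and recover are made up for
illustration.

```python
from dataclasses import dataclass


@dataclass
class Recovered:
    metadata: str  # "old" or "new": which directory entry/inode survives
    data: str      # "old", "new", or "garbage"


def recover(mode: str, data_on_disk: bool) -> Recovered:
    """Hypothetical post-crash state after 'write new file, rename over old'.

    data_on_disk says whether the new file's data blocks reached their
    final location before the crash.
    """
    if mode == "journal":
        # The data rides in the journal with the metadata, so a committed
        # rename always comes with its data.
        return Recovered("new", "new")
    if mode == "ordered":
        # The commit is held back until the data is written in place, so
        # either both are new or the rename never became visible.
        if data_on_disk:
            return Recovered("new", "new")
        return Recovered("old", "old")
    if mode == "writeback":
        # Metadata commits without waiting; the data may not be there.
        return Recovered("new", "new" if data_on_disk else "garbage")
    raise ValueError(f"unknown mode: {mode}")
```

The interesting row is recover("writeback", False): the new name is in
place but points at data that was never written.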

Rename undo is a much less severe compromise to in-order semantics after a
crash. It is not point-in-time recovery; it is consistent-version recovery.
That can have some unexpected side effects, but none remotely as severe as
losing the data completely.

In the case you mention, if you write a new version, rename it over the old
one, change the security permissions on the replacement, and then the
system crashes, you are not going to get the new (unwritten) data, the new
inode, and the new permissions, you are going to get the old inode, the old
data, and the old permissions. The permissions go with the inode (and the
data), not the directory entry. That is what you want. The old inode (and
the old data) has to be kept around until the data for the new inode is
completely on disk. Otherwise you cannot undo the rename replacement after
the fact.
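
The scenario above can be sketched as a small in-memory model. This is
purely illustrative and assumes a great deal: ToyFS, Inode, undo_log and
the method names are all hypothetical, and a real filesystem would track
this in its journal rather than in Python dictionaries. The point is
only that permissions live on the inode, and that the old inode must be
retained until the new one's data is safely on disk.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    number: int
    mode: int           # permissions belong to the inode
    data: str
    data_on_disk: bool  # has the data reached its final location?


@dataclass
class ToyFS:
    inodes: dict = field(default_factory=dict)     # inode number -> Inode
    directory: dict = field(default_factory=dict)  # name -> inode number
    undo_log: list = field(default_factory=list)   # (name, old_ino, new_ino)

    def create(self, name, ino, mode, data, on_disk):
        self.inodes[ino] = Inode(ino, mode, data, on_disk)
        self.directory[name] = ino

    def rename_replace(self, src, dst):
        old = self.directory[dst]
        new = self.directory.pop(src)
        self.directory[dst] = new
        # Keep the old inode around instead of freeing it right away.
        self.undo_log.append((dst, old, new))

    def chmod(self, name, mode):
        # Applies to whatever inode the name currently points at.
        self.inodes[self.directory[name]].mode = mode

    def recover(self):
        """Crash recovery: undo replacements whose new data never landed."""
        for name, old, new in reversed(self.undo_log):
            is_current = self.directory.get(name) == new
            if is_current and not self.inodes[new].data_on_disk:
                self.directory[name] = old   # undo the rename
                self.inodes.pop(new)
            else:
                self.inodes.pop(old, None)   # replacement stands
        self.undo_log.clear()


fs = ToyFS()
fs.create("config", 1, 0o644, "old contents", on_disk=True)
fs.create("config.tmp", 2, 0o644, "new contents", on_disk=False)
fs.rename_replace("config.tmp", "config")
fs.chmod("config", 0o600)   # lands on inode 2, the replacement
fs.recover()                # crash before inode 2's data was written
survivor = fs.inodes[fs.directory["config"]]
```

After recovery, survivor is inode 1 with the old contents and the old
0o644 permissions; the chmod went away together with the inode it was
applied to.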
