I'm intrigued, but not satisfied. Telling the journal that a metadata change is 'committed' means that the post-crash-recovery state will reflect the change (journal replay).
Surely the only satisfactory way to commit data before committing the metadata change is to delay *all* journal commits in-order until after the relevant file data is written in place, or to journal the data itself.
For performance reasons it's probably much saner not to journal most data, especially for random access within large files. But I'm thinking that if it makes sense to allocate-on-commit to preserve the in-order semantics of atomic rename, it might also make good sense to special-case data journalling for newly written (created or truncated) files when they are renamed. Perhaps that should apply only to small files, with larger ones allocated on commit, since users will likely expect a delay for those.
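For reference, the application-side pattern that these in-order rename semantics are meant to protect looks roughly like the sketch below. It is only an illustration under stated assumptions: `atomic_replace` is a hypothetical helper name, a POSIX system is assumed (the directory fsync will not work on Windows), and the explicit `fsync` calls are what an application has to do today if it cannot rely on the filesystem ordering data before the rename.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace `path` with `data` (bytes) via write-temp, fsync, rename.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the kernel to push the file data to stable storage
            # *before* the rename is issued.
            os.fsync(tmp.fileno())
        # Atomically point `path` at the new file.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Flush the directory entry as well, so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the first `fsync`, whether a crash can leave the new name pointing at an empty or partial file depends on exactly the commit-ordering question raised above.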
Having the ability to unwind one specific kind of metadata change seems very confusing. I fear that winding back a rename could well violate the expected in-order semantics w.r.t. metadata after crash recovery. Or might it be possible to wind back an entire 'transaction', including all other metadata changes made since the rename?