> "Instead of writing out changes to parents of altered blocks, Tux3 only changes the parents in cache, and writes a description of each change to a log on media. This prevents recursive copy-on-write. Tux3 will eventually write out such retained dirty metadata blocks in a process we call 'rollup', which retires log blocks and writes out dirty metadata blocks in full"
I am glad to hear that this technique has been taken up by filesystem developers, and that it is performing well so far. Block modification journaling is relatively common (if not par for the course) in database implementation, for exactly the same reason. The redo log can be written to and committed in a hurry, and the modified blocks forced to disk at any convenient time.
Most databases I am familiar with force current versions of all blocks to disk only in a checkpoint process, once every thirty minutes or so, because the blocks can be reconstructed from a clean backup by applying the redo log entries. Those reconstituted blocks do not need to be forced to disk then and there either.
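For anyone who has not seen the pattern, here is a minimal sketch of redo logging with a deferred checkpoint. It is a toy in Python, not any particular database's or Tux3's implementation; the class name `RedoLoggedStore` and all of its methods are made up for illustration, and plain Python lists stand in for durable storage.

```python
class RedoLoggedStore:
    """Toy block store: changes go to a redo log first, blocks are flushed later.

    All names here are hypothetical; lists stand in for durable media.
    """

    def __init__(self, num_blocks, block_size=8):
        self.disk = [bytes(block_size) for _ in range(num_blocks)]  # durable block images
        self.cache = {}  # dirty blocks held only in memory: block number -> bytearray
        self.log = []    # durable redo log: (block number, offset, new bytes)

    def _apply(self, blockno, offset, data):
        # Modify the cached copy, loading it from the on-disk image if needed.
        block = self.cache.setdefault(blockno, bytearray(self.disk[blockno]))
        block[offset:offset + len(data)] = data

    def write(self, blockno, offset, data):
        # Commit the small log record first, then change only the cached block.
        self.log.append((blockno, offset, bytes(data)))
        self._apply(blockno, offset, data)

    def checkpoint(self):
        # Force dirty blocks out in full; the log records can then be retired.
        for blockno, block in self.cache.items():
            self.disk[blockno] = bytes(block)
        self.cache.clear()
        self.log.clear()

    def recover(self):
        # After a crash the cache is gone; replay the log over the on-disk images.
        self.cache = {}
        for blockno, offset, data in self.log:
            self._apply(blockno, offset, data)

    def read(self, blockno):
        return bytes(self.cache.get(blockno, self.disk[blockno]))
```

Note that `recover` is only correct if the on-disk image it replays over is intact, which is exactly the assumption a clean backup provides in the database case.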
Of course the trick with a filesystem is that one is typically journaling only metadata, not data updates, and in a conventional non-copy-on-write filesystem that causes interesting consistency issues that have to be dealt with on recovery.
I am curious, however, how Tux3 deals with recovery of a trashed metadata block without a clean prior image to apply journaled block changes to. Some filesystems journal entire block images for that reason, with the obvious downside of substantially increased log traffic. Is Tux3 using some sort of copy-on-write scheme for the metadata blocks themselves so that clean prior versions can be obtained for the deltas to be applied to?
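To make the concern concrete, here is a small Python sketch of the two options. It says nothing about what Tux3 actually does; `apply_deltas`, `recover_block`, and the per-block CRC are assumptions introduced purely for illustration.

```python
import zlib


def apply_deltas(base, deltas):
    """Replay (offset, data) change records over a base block image."""
    block = bytearray(base)
    for offset, data in deltas:
        block[offset:offset + len(data)] = data
    return bytes(block)


def recover_block(on_disk, expected_crc, deltas, full_image=None):
    """Return the recovered block, or None if it cannot be rebuilt.

    Hypothetical helper: assumes a checksum of the last good on-disk image
    is available so that a trashed base can be detected.
    """
    if full_image is not None:
        # Full-image journaling: the log alone is enough, at the cost of
        # logging a whole block per change.
        return full_image
    if zlib.crc32(on_disk) != expected_crc:
        # Delta journaling with a damaged base: nothing trustworthy to replay onto.
        return None
    return apply_deltas(on_disk, deltas)
```

With delta records only, a damaged base image leaves recovery stuck unless a clean prior version exists somewhere, which is what the copy-on-write question is getting at.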