Tux3: the other next-generation filesystem
Posted Dec 11, 2008 8:01 UTC (Thu) by daniel
In reply to: Tux3: the other next-generation filesystem
Parent article: Tux3: the other next-generation filesystem
there is another more subtle problem. When mounting the filesystem with very little DRAM available, it may not be possible to cache all "promised" metadata blocks. So one must start writing them back at mount time.
You mean, first run with lots of ram, get tons of metadata blocks pinned, then remount with too little ram to hold all the pinned metadata blocks. A rare situation, you would have to work at that. All of ram is available for pinned metadata on remount, and Tux3 is pretty stingy about metadata size.
In your example, when B is rewritten (a btree split or merge) the promise made by C' to update B is released because B' is on disk. So the situation is not as complex as you feared.
I expect we can just ignore the problem of running out of dirtyable cache on replay and nobody will ever hit it. But for completeness, note that writing out the dirty metadata is not the only option. By definition, one can reconstruct each dirty metadata block from the log. So choose a dirty metadata block with no dirty children, reconstruct it and write it out, complete with promises (a mini-rollup). Keep doing that until all the dirty metadata fits in cache, then go live. This may not be fast, but it clearly terminates. Unwinding these promises is surely much easier than unwinding credit default swaps :-)
to post comments)