Runtime filesystem consistency checking

Posted Apr 3, 2012 17:13 UTC (Tue) by nybble41 (subscriber, #55106)
Parent article: Runtime filesystem consistency checking

> Many of the inconsistencies that Recon found and fsck didn't were changes to unallocated data, which are not important from a consistency standpoint, but still should not be changed in a correctly operating filesystem.

That may be true of ext3, but in general there are a few reasons why one might want to write to space which has not been allocated *on disk*. Atomic updates come to mind: reserve space in memory, write the new data, and finally--after the data has been committed--reserve it on disk and update the metadata. Online defragmentation or resizing could be implemented this way as well. Unallocated data is "don't care"; it shouldn't be a problem to change it even if the reason for the change is not yet apparent.
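A minimal sketch of that atomic-update pattern, using a toy in-memory "disk". The names here (`ToyDisk`, `ToyFS`, `atomic_update`, `read_file`) are hypothetical and do not come from any real filesystem. The commit step is shown as plain assignments, whereas a real filesystem would need a journal or a single atomic pointer update to make it crash-safe.

```python
class ToyDisk:
    """Hypothetical stand-in for a block device (illustration only)."""
    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks
        self.allocated = set()   # on-disk allocation bitmap
        self.file_block = None   # on-disk metadata: block holding the file


class ToyFS:
    def __init__(self, disk):
        self.disk = disk
        self.reserved = set()    # in-memory reservations, lost on a crash

    def atomic_update(self, data, crash_before_commit=False):
        # 1. Reserve a free block in memory only.
        free = next(b for b in range(len(self.disk.blocks))
                    if b not in self.disk.allocated and b not in self.reserved)
        self.reserved.add(free)
        # 2. Write the new data to a block that is still unallocated on disk.
        self.disk.blocks[free] = data
        if crash_before_commit:
            return  # simulated crash: on-disk metadata was never touched
        # 3. Commit: record the allocation on disk and switch the metadata.
        old = self.disk.file_block
        self.disk.allocated.add(free)
        self.disk.file_block = free
        if old is not None:
            self.disk.allocated.discard(old)
        self.reserved.discard(free)


def read_file(disk):
    """Return the file contents as seen through the on-disk metadata."""
    return None if disk.file_block is None else disk.blocks[disk.file_block]
```

Between steps 2 and 3 the disk holds a write to a block its own bitmap calls unallocated, which is the situation a checker would see as a change to unallocated data.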

This also seems to introduce a second point of failure; the system can cease to function due to a filesystem bug, as before, or due to a bug in Recon which unnecessarily blocks access to the filesystem. That risk would have to be weighed against the cost of filesystem corruption in the absence of Recon, of course.

Posted Apr 3, 2012 19:58 UTC (Tue) by NAR (subscriber, #1313)

> This also seems to introduce a second point of failure

This was my first thought too. They are using code to check code, which is kind of like automated tests. In my experience tests are wrong about as often as the code itself (though this could be due to our fragile test environment), so it's one more thing to get right. On the other hand, if Recon changes far less often than the filesystem code itself, then Recon can reach sufficient maturity to be actually useful.

Posted Apr 3, 2012 23:28 UTC (Tue) by neilbrown (subscriber, #359)

It reminds me a lot of lockdep.

lockdep is brilliant for developers as it warns you early of your bugs, just as this 'recon' would warn ext3 developers of their bugs.
But lockdep used to report lots of false positives - this has got a lot better over the years though.

I'm not sure I'd enable lockdep or recon in production though. There is a real cost, and it is not at all likely to help more than it hurts.

Posted Apr 4, 2012 13:10 UTC (Wed) by ashvin (guest, #83894)

We expect that Recon will initially be used mainly for development. As it matures, it could be deployed in production use. The need to change it over time should be comparable to the need to change fsck.

Posted Apr 4, 2012 13:07 UTC (Wed) by ashvin (guest, #83894)

We do handle "unallocated" writes due to shadow paging (e.g., copy-on-write in btrfs), where metadata is written to unallocated regions and then linked into the file system on commit. We find this linkage at the commit point and do not report an error. We haven't worked with online resizing, but I suspect the handling would be similar on commit. Writing to unallocated data to which there is no linkage at commit seems suspect: how would the file system know that data is useful in any way after a crash?
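The commit-time rule described above could be sketched roughly as follows. This is an illustration under assumed inputs, not Recon's actual implementation: the function name `suspicious_writes` and the idea of passing plain sets of block numbers are hypothetical.

```python
def suspicious_writes(written, allocated_before, linked_at_commit):
    """Return blocks written during a transaction that were unallocated
    beforehand and are still not linked into the filesystem at commit.

    All three arguments are sets of block numbers (an assumption made
    for this sketch).
    """
    return sorted(b for b in written
                  if b not in allocated_before and b not in linked_at_commit)
```

For example, with `written={3, 7, 9}`, `allocated_before={3}` and `linked_at_commit={3, 7}`, block 7 is accepted as a shadow-paging write because it is linked at commit, while block 9 is flagged.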

Posted Apr 4, 2012 15:30 UTC (Wed) by nybble41 (subscriber, #55106)

> Writing to unallocated data to which there is no linkage at commit seems suspect: how would the file system know that data is useful in any way after a crash?

The data would not be useful after a crash; up to the point where the allocation is recorded on disk, there are no references to it, and it can simply revert to unallocated space, canceling the incomplete "transaction".