Posted Oct 11, 2011 21:03 UTC (Tue) by josh (subscriber, #17465)
Parent article: Whither btrfsck?
I really don't understand why people consider fsck a part of any production environment. I'd expect a filesystem to detect corruption, avoid crashing, and attempt to continue to allow access to uncorrupted data. But for the corrupted data itself, I don't see anything wrong with saying "restore from backup if that happens", rather than "hope that fsck fixes it".
Posted Oct 11, 2011 21:23 UTC (Tue) by drdabbles (subscriber, #48755)
I think there are varying levels of corruption and varying levels of acceptability of recovering corrupted data. For instance, recovering partial documents is better than recovering nothing. But recovering data whose corruption isn't obvious to the human eye (say, scientific data at the LHC) could be disastrous.
But to your point, I agree completely. Having backups and recovering from them is almost always best.
Posted Oct 11, 2011 22:08 UTC (Tue) by terryburton (subscriber, #26261)
Restoring from a backup may save you from the dole queue. Recovering the data to a point immediately prior to the failure... priceless.
Posted Oct 12, 2011 6:27 UTC (Wed) by njs (guest, #40338)
In most situations I find I actually prefer "here's an accurate snapshot, it's 12 hours old" to "here's an up-to-the-minute copy of the data recovered as best we could, maybe your files are all correct and maybe some are corrupted, who knows?".
Which isn't to say that fsck doesn't have its uses -- most people don't have proper backups in the first place, and after things go pear-shaped, fsck is your last hope. And it can be faster than restoring from backup. And perhaps btrfsck will be better than other fsck's, in that it could potentially use btrfs's hashes to give you a list of which files might be corrupted, so you can check them (or recover just them from a backup).
I'm just saying, fsck is nice to have, but it's not *that* critical.
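To make the idea of using hashes to list possibly corrupted files concrete, here is a minimal userspace sketch in Python. It is not how btrfs or btrfsck work: btrfs keeps per-block checksums (crc32c by default) inside the filesystem, whereas this sketch assumes a separate SHA-256 manifest built while the data was known to be good. The helper names build_manifest and suspect_files are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Hypothetical helper: map each regular file under root to its digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def suspect_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Hypothetical helper: list files that are missing or no longer match.

    These are the candidates to check by hand or restore from backup.
    """
    suspects = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p) != expected:
            suspects.append(rel)
    return suspects
```

Unlike a filesystem, which updates its checksums on every legitimate write, this manifest also flags files you changed on purpose, so it is only meaningful for data that should not have changed since the manifest was built.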
Posted Oct 12, 2011 8:52 UTC (Wed) by rvfh (subscriber, #31018)
> In most situations I find I actually prefer "here's an accurate snapshot, it's 12 hours old" to "here's an up-to-the-minute copy of the data recovered as best we could, maybe your files are all correct and maybe some are corrupted, who knows?".
That's the whole point, isn't it? If fsck is not able to restore the data and gives you crap instead, then it's useless, and that's why Chris does not want it in the wild!
We need fsck, and we need it to work correctly.
Posted Oct 12, 2011 18:43 UTC (Wed) by njs (guest, #40338)
> If the fsck is not able to restore the data and gives you crap instead, then it's useless
But, IIUC, this is exactly what existing fsck's do. If they can recover data, great, but their main priority is to make the filesystem data structures internally consistent again. If that means randomly making up data that's missing, throwing away some files whose metadata got confused, etc., then oh well, too bad.
Hopefully btrfsck will have a mode where it uses the checksumming information to give you a guaranteed-accurate list of which files were left in an inconsistent state with potentially screwed up contents, but if so then that will be a fancy unique feature that has never been seen in a mainstream Linux fsck before. (And does btrfs even keep data checksums by default?)
I can understand why Chris doesn't want to release a known-buggy fsck, but let's be realistic about what a bug-free fsck actually does...
Posted Oct 12, 2011 15:20 UTC (Wed) by edt (subscriber, #842)
One point: with btrfs set up to use a mirror, it should be possible to tell exactly which files, if any, are corrupt after fsck. All those checksums can be really handy...
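The mirror-plus-checksum idea can be sketched roughly as follows: every copy of a block is checked against a stored checksum, and a copy that fails is rewritten from one that passes. This is a hypothetical illustration, not btrfs code; zlib.crc32 stands in for the crc32c that btrfs actually uses, and read_block and scrub are invented names. It only helps when a copy was damaged after being written correctly: if a bug wrote bad data together with a matching checksum, every copy passes and nothing gets repaired.

```python
import zlib


def read_block(copies: list[bytes], stored_crc: int) -> tuple[bytes, list[int]]:
    """Hypothetical: return the first copy matching stored_crc, plus bad indices.

    Raises ValueError if no copy matches, i.e. the mirror cannot help.
    """
    good = None
    bad = []
    for i, data in enumerate(copies):
        if zlib.crc32(data) == stored_crc:
            if good is None:
                good = data
        else:
            bad.append(i)
    if good is None:
        raise ValueError("all mirror copies fail checksum")
    return good, bad


def scrub(copies: list[bytes], stored_crc: int) -> tuple[list[bytes], list[int]]:
    """Hypothetical: replace every failing copy with a verified good one."""
    good, bad = read_block(copies, stored_crc)
    repaired = list(copies)
    for i in bad:
        repaired[i] = good
    return repaired, bad
```

The list of bad indices is what would let a tool report exactly which blocks (and so which files) had been damaged, even when it could repair them.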
Posted Oct 14, 2011 3:01 UTC (Fri) by zlynx (subscriber, #2285)
That would fix a problem caused by a hardware error, but it wouldn't fix a disk structure problem caused by a logic bug in the btrfs driver itself.
Posted Oct 11, 2011 21:35 UTC (Tue) by jimparis (subscriber, #38647)
For me, it's more the attitude than anything else. The strong focus on a good fsck for ext2/3/4 tells me that the developers are highly concerned with my data. They've spent a lot of time and effort writing a high-quality tool to try to recover from code mistakes or hardware errors, and it's been a part of the design from the start, not an afterthought. I suspect that the experience has also taught them the types of corruption that hard drives and filesystems are likely to experience, and made the fs code more robust as a result.
Posted Oct 18, 2011 21:13 UTC (Tue) by ableal (guest, #57174)
Well, just today I had a disk with a couple of bad sectors, according to the SMART info, which gave me read errors but which I could not tickle with writes to force remapping. Not with the GUI 'disk utility', anyway.
So, I got the data off, reformatted, and let loose a CLI 'fsck -c -c' on its ext2/3/4 ass. This, obscurely enough, invokes 'badblocks' with a non-destructive read-write test (you have to read the e2fsck man page carefully, not just 'man fsck').
After that, the SMART data claims the disk is now fine, with no pending remapping of bad sectors. But, somehow, I get the feeling there's something uncanny about this rigmarole, not totally unlike burning the feathers of a black chicken at midnight of new moon ...
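The non-destructive read-write test that 'fsck -c -c' asks badblocks to run can be sketched like this: for each block, save its contents, write test patterns, read them back, then restore the original bytes. This is a simplified illustration, not badblocks' implementation; the function name and the two patterns are hypothetical, a trailing partial block is ignored, and because it goes through the OS page cache it demonstrates the logic rather than actually exercising the disk surface. Only run something like it on a scratch image file.

```python
import os


def nondestructive_rw_test(path: str, block_size: int = 4096,
                           patterns: tuple[int, ...] = (0xAA, 0x55)) -> list[int]:
    """Hypothetical sketch: return indices of blocks failing write/read-back.

    Original contents of every block are restored afterwards.
    """
    bad = []
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        for index in range(size // block_size):  # skips a trailing partial block
            offset = index * block_size
            f.seek(offset)
            original = f.read(block_size)
            failed = False
            for pat in patterns:
                test = bytes([pat]) * block_size
                f.seek(offset)
                f.write(test)
                f.flush()
                os.fsync(f.fileno())
                f.seek(offset)
                if f.read(block_size) != test:
                    failed = True
            # Put the original data back exactly as it was.
            f.seek(offset)
            f.write(original)
            f.flush()
            os.fsync(f.fileno())
            if failed:
                bad.append(index)
    return bad
```

Writing to a sector is also what gives a drive the chance to remap it, which is presumably why the SMART "pending" count cleared afterwards.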
Posted Oct 20, 2011 6:03 UTC (Thu) by eduperez (guest, #11232)
Next time, try to zero-write those bad blocks with "hdparm"; it worked for me.