Using fsck to defend against disk failures?
Posted Jan 27, 2008 15:45 UTC (Sun) by anton
In reply to: ext3 metaclustering
Parent article: ext3 metaclustering
That, the "spreading inconsistency" theory, and some other things I
have read from people writing about fsck describe failure types that I
have never seen nor read a first-hand report of, so I guess they are
just myths or a perverted form of wishful thinking.
The kinds of disk failures I have seen have always been different.
In particular, even if a drive developed a bad block, it recognized
that itself (very slowly) and returned an error rather than wrong
data. I'm not sure if fsck programs are up to dealing with a bad
block of this kind in the metadata, but if a drive has a bad block,
that's certainly a good time to replace the drive and restore the data
from backup. Or, if you run RAID 1 or RAID 5, you just need to replace
the drive (and make the replacement known to the RAID driver).
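For illustration, with Linux software RAID the sequence I have in mind
looks roughly like this (a sketch only; the array /dev/md0 and the
partition names are hypothetical, and hardware RAID works differently):

  mdadm /dev/md0 --fail /dev/sdb1    # mark the failing member as failed
  mdadm /dev/md0 --remove /dev/sdb1  # remove it from the array
  # physically swap the drive, partition it like the old one, then:
  mdadm /dev/md0 --add /dev/sdc1     # add the replacement; the array resyncs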
Moreover, even if a disk drive deteriorates over time, that is more
likely to hit the data first rather than the meta-data. But fsck
checks only for some kinds of errors in the meta-data, so if fsck is
your defense against bad blocks, you don't value your data at all.
Making a backup is more likely to unveil bad blocks than fsck (in the
data, too, not just the meta-data), and has obvious additional
benefits.
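To illustrate the point (the device and paths are just examples): any
backup run that actually reads the files will surface a bad block as an
I/O error, and reading the raw device covers unallocated blocks as well:

  tar -czf /backup/home.tar.gz /home   # reading every file unveils bad blocks in the data
  dd if=/dev/sda of=/dev/null bs=1M    # reads the whole device; stops with an I/O error at a bad block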
Finally, a good way (much better than fsck) to test the drive for
bad blocks is "smartctl -t long", even though I am sceptical about the
predictive capabilities of SMART.
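For reference, the kind of invocation I mean (the device name is just an
example; the long self-test scans the whole surface in the background):

  smartctl -t long /dev/sda      # start the long self-test
  # wait for the duration smartctl prints, then:
  smartctl -l selftest /dev/sda  # self-test log, including the LBA of the first error
  smartctl -H /dev/sda           # overall SMART health assessment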
Overall, I am very sceptical about the value of fsck for dealing
with hardware failures, and a little bit less sceptical about its
value when dealing with software failures (but I think I have not been
bitten by a file system bug yet); in many cases (especially the
hardware ones) we have to restore from backup anyway.