Tux3: the other next-generation filesystem
Posted Dec 4, 2008 8:05 UTC (Thu) by
zmi (subscriber, #4829)
Parent article:
Tux3: the other next-generation filesystem
Having been checksumming filesystem data during continuous
replication for two years now on multiple machines, and having caught
exactly zero blocks of bad data passed as good in that time, I consider the
spectre of disks passing bad data as good to be largely vendor
FUD.
I must strongly object here. Over the last few years, I have had 3 different
customers, using 2 different RAID-controller vendors with 2 different disk
types (SCSI, SATA), whose RAID contents were destroyed because of a broken
disk that did not report (or detect) its errors.
The problem is that even RAID controllers do not "read-after-write" and
thus never verify the contents of a disk. So if the disk says "OK" after a
write that in reality failed, your RAID and filesystem contents still get
destroyed (because the drive reads back different data than was written).
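To make "read-after-write" concrete, here is a minimal sketch at the file level in Python. The function name `write_verified` is made up for illustration, and this is not how any particular RAID controller works. Note the big caveat: after `fsync`, the read-back may well be served from the OS page cache or the drive's own cache rather than from the platter, so a real implementation would have to bypass those caches.

```python
import os

def write_verified(path, offset, data):
    """Write data at offset, flush it, then read it back and compare.

    Returns True if the bytes read back match what was written.
    Hypothetical helper for illustration only; the read-back here may
    come from a cache instead of the physical medium.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        written = os.pwrite(fd, data, offset)
        os.fsync(fd)  # ask the OS to push the data to the device
        readback = os.pread(fd, len(data), offset)
    finally:
        os.close(fd)
    return written == len(data) and readback == data
```

The cost is obvious: every write turns into a write plus a read, which is the performance hit that keeps controllers from doing it by default.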
Another check could be "on every read also calculate the RAID checksum to
verify", but for performance reasons nobody does that.
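As a rough illustration of what such a check on read would involve, the sketch below assumes a RAID-5-style stripe whose parity block is the XOR of its data blocks. The names `xor_blocks` and `read_stripe_verified` are invented for this example and do not correspond to any controller firmware.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def read_stripe_verified(data_blocks, parity_block):
    """Recompute the parity on every read and compare with the stored one.

    Raises IOError on mismatch. Assumption: simple XOR parity as in RAID 5.
    """
    if xor_blocks(data_blocks) != parity_block:
        raise IOError("parity mismatch: some block in this stripe is bad")
    return b"".join(data_blocks)
```

Two things are worth noting. Every read has to touch all disks in the stripe, which is where the performance goes. And a mismatch only says that something in the stripe is wrong, not which block, so the controller still cannot tell which disk to distrust.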
There REALLY should be filesystem-level checksumming, and a generic
interface between filesystem and disk controller, where the filesystem can
tell the RAID controller to switch to
"paranoid mode", doing read-after-write of disk data. It's gonna be slow
then, but at least the controller will find a broken disk and disable it -
after that, it can switch to performance mode again.
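The filesystem-level checksumming part could look roughly like the toy sketch below: a checksum is stored separately from each block and compared on every read. The `ChecksummedStore` class is hypothetical, the "disk" is just a dict, and CRC32 is chosen only because it is in the Python standard library; a real filesystem would make its own choices here.

```python
import zlib

class ChecksummedStore:
    """Toy block store that keeps a CRC32 per block and checks it on read."""

    def __init__(self):
        self.blocks = {}  # block number -> data (stands in for the disk)
        self.sums = {}    # block number -> checksum (kept apart from the data)

    def write(self, number, data):
        self.blocks[number] = data
        self.sums[number] = zlib.crc32(data)

    def read(self, number):
        data = self.blocks[number]
        if zlib.crc32(data) != self.sums[number]:
            # This is the point where a filesystem could ask the
            # controller to switch to "paranoid mode".
            raise IOError("checksum mismatch in block %d" % number)
        return data
```

Unlike the controller, the filesystem now knows exactly which block came back wrong, even though the disk reported the read as fine.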
Yes, our customers were quite dissatisfied that even with RAID controllers
their data got broken. But the worst part is that it takes a long time for
customers to notice and identify that there is a problem - you can only hope
for a good backup strategy! Or for a filesystem doing checksumming.
Regards, zmi