Posted Jun 15, 2007 21:26 UTC (Fri) by giraffedata
In reply to: Strong correlation?
Parent article: KHB: Real-world disk failure rates: surprises, surprises, and more surprises
In an operation of the size these papers talk about, gut feelings about "strong" and "weak" correlation and the pain of data loss aren't even significant. It's pure numbers. Somebody somewhere has decided how much a data loss costs and probability, repair costs, and interest rates fill out the equation.
Sometimes the cost of data loss is really simple. I had a telephone company customer years ago who said an unreadable tape cost him exactly $16,000. The tapes contained billing records of calls; without the record, the company simply couldn't bill for the call. Another, arguing against backing up his product source code, showed the cost of hiring engineers to rewrite a megabyte of code from scratch.
In the Google situation, I believe single drive data loss is virtually cost-free. That's because of all that replication and backup. In that situation, the cost of the failure is just the cost of service interruption (or degradation) and drive replacement. And since such interruptions and replacements happen regularly, the only question is whether it's cheaper to replace a drive earlier and thereby suffer the interruption later.
Anyway, my point is that with all the different ways disk drives are used, I'm sure there are plenty where replacing the drive when its expected failure rate jumps to 30% is wise and plenty where doing so at 90% is unwise.
to post comments)