Strong correlation?

Posted Jun 15, 2007 21:26 UTC (Fri) by giraffedata (subscriber, #1954)
In reply to: Strong correlation? by joern
Parent article: KHB: Real-world disk failure rates: surprises, surprises, and more surprises

In an operation of the size these papers talk about, gut feelings about "strong" and "weak" correlation and the pain of data loss aren't even significant. It's pure numbers. Somebody somewhere has decided how much a data loss costs, and probability, repair costs, and interest rates fill out the rest of the equation.

Sometimes the cost of data loss is really simple. I had a telephone company customer years ago who said an unreadable tape cost him exactly $16,000. The tapes contained billing records of calls; without the record, the company simply couldn't bill for the call. Another, arguing against backing up his product source code, showed the cost of hiring engineers to rewrite a megabyte of code from scratch.

In the Google situation, I believe single-drive data loss is virtually cost-free. That's because of all that replication and backup. In that situation, the cost of the failure is just the cost of service interruption (or degradation) and drive replacement. And since such interruptions and replacements happen regularly, the only question is whether it's cheaper to replace a drive earlier and thereby suffer the interruption later.

Anyway, my point is that with all the different ways disk drives are used, I'm sure there are plenty where replacing the drive when its expected failure rate jumps to 30% is wise and plenty where doing so at 90% is unwise.
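
To make the "pure numbers" argument concrete, here is a minimal sketch of the break-even calculation, under simplifying assumptions that are not in the comment above: a single decision period, no interest rates, and a drive that is either replaced now or left in service until it fails. The function names and all dollar figures are hypothetical, except the $16,000 loss, which is borrowed from the tape anecdote purely as an illustration.

```python
def break_even_failure_probability(replacement_cost, planned_interruption_cost,
                                   unplanned_interruption_cost, data_loss_cost):
    """Failure probability above which replacing early is cheaper.

    Assumed model (hypothetical, one period, no discounting):
      cost of replacing now = replacement + planned interruption
      expected cost of waiting = p * (replacement + unplanned interruption
                                      + data loss)
    Setting the two equal and solving for p gives the threshold.
    """
    cost_replace_now = replacement_cost + planned_interruption_cost
    cost_if_it_fails = (replacement_cost + unplanned_interruption_cost
                        + data_loss_cost)
    if cost_if_it_fails <= 0:
        raise ValueError("a failure must cost something for the model to apply")
    # A threshold of 1.0 means early replacement never pays off.
    return min(1.0, cost_replace_now / cost_if_it_fails)


def should_replace_early(p_fail, **costs):
    """True if the predicted failure probability exceeds the break-even point."""
    return p_fail > break_even_failure_probability(**costs)


# Heavily replicated site: data loss costs nothing (all figures hypothetical).
replicated = dict(replacement_cost=100, planned_interruption_cost=20,
                  unplanned_interruption_cost=20, data_loss_cost=0)

# Unreplicated billing records: a loss costs $16,000 (figure from the anecdote).
billing = dict(replacement_cost=100, planned_interruption_cost=20,
               unplanned_interruption_cost=20, data_loss_cost=16_000)

print(should_replace_early(0.90, **replicated))  # waiting is no more expensive
print(should_replace_early(0.30, **billing))     # early replacement pays off
```

With these made-up inputs the replicated site never benefits from early replacement, even at a 90% predicted failure rate, while the billing system should replace at well under 30%. A more faithful model would also discount future costs, which is where the interest rates come in.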


Strong correlation?

Posted Jun 16, 2007 2:16 UTC (Sat) by vaurora (subscriber, #38407)

This is an excellent point: the utility of failure probability data depends on the use case. Google in general has all data replicated a minimum of three times (see the GoogleFS paper), and as a result, in most situations it is not cost-effective in practice to replace a drive before it actually fails. For any sort of professional operation with regular backups and/or replication, this data is not particularly useful except as input into how many thousands of new hard drives to order next month. But for an individual user without automated backup systems, it can provide a valuable hint on the utility of conducting that long-delayed manual backup within the next few hours.

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds