LWN: Comments on "Failure Trends in a Large Disk Drive Population"
http://lwn.net/Articles/224484/
This is a special feed containing comments posted to the individual LWN article titled "Failure Trends in a Large Disk Drive Population".

Failure Trends in a Large Disk Drive Population
http://lwn.net/Articles/225419/rss
2007-03-09T14:10:39+00:00
NRArnot
This is extremely well worth reading, especially if you are running hardware RAID (such as 3Ware) and monitoring with SMART.<br>
<p>
Clearly much folklore is wrong. Some of the conclusions:<br>
<p>
The best pre-failure indicator is a drive reallocating its first block.<br>
<p>
Nearly half the drives failed without SMART giving any hint that failure was coming.<br>
<p>
Over-cooled disks (20C) are not obviously more reliable than ones that run warm (up to 40C); in fact, the converse. (I presume they are designed to run at the temperature that you typically get inside a desktop PC, namely 30 to 35C. It's what any real engineer would do!)<br>
<p>
There is little or no evidence that [S]ATA disks running continuously are less reliable or shorter-lived than ones that are powered on and off daily.<br>
<p>
I'm not the only one who has trouble with drives that are seriously degraded (so slooooow!) while SMART still reports them as perfect. (And they are hard to track down in a hardware RAID set: how do you benchmark a particular drive when you suspect one of slowing a whole array?)<br>
<p>
I wish they'd said what make(s) they studied, and whether they were desktop-grade or enterprise-grade. I guess their lawyers wouldn't let them. Though was enterprise-grade ATA available at all 4 years ago?
<br>

Failure Trends in a Large Disk Drive Population
http://lwn.net/Articles/224500/rss
2007-03-02T22:37:20+00:00
dlang
Actually, weren't both of these papers here a couple of weeks ago?<br>

Failure Trends in a Large Disk Drive Population
http://lwn.net/Articles/224494/rss
2007-03-02T21:54:12+00:00
maney
Well, this is a rarity - a solidly technical item that showed up on Slashdot before LWN. (Caveat: items that appear close together I'll usually see here first, as Slashdot is near the bottom of the list of sites I stop by more or less daily.) Even more unexpectedly, there was a followup mention there of another paper about disk drives that's in some ways even more interesting: <a href="http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html">Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?</a>. The usual failure model seems not to fit observed failure rates very well at all, at all...
<p>
Both articles will be of interest to anyone who cares about disk drives' lifespans.