KHB: Real-world disk failure rates: surprises, surprises, and more surprises

Posted Jun 15, 2007 4:13 UTC (Fri) by njs (subscriber, #40338)
In reply to: KHB: Real-world disk failure rates: surprises, surprises, and more surprises by pr1268
Parent article: KHB: Real-world disk failure rates: surprises, surprises, and more surprises

It's a funny thing about us humans: if you put us in a data vacuum, we find some way to fill it, whether it makes sense or not. Most of us have no useful statistics at all on what hard drive brands or models will be reliable; heck, as the article points out, the people *making* the drives don't even know this.

But this ignorance makes us so *uncomfortable* that everyone finds some random fact to base their decisions on, like an anecdote about that time they bought a *** brand drive and it died after a week and so they never buy *** anymore, or how they heard the new *** brand drives use a fancier production process, or the warranty labels on the side of the box.

Warranties aren't a measure of how proud some engineers somewhere are, they're a measure of some sales/accounting decision about how the cost of providing that warranty will compare to the extra sales they get by putting it on the box. (5 years ago we were using what, 40 GB drives? If that died today, assuming you even still had it online, would you figure out how to ship it back for a new 40 GB under the warranty, or just pick up 400 GB at the local mall? Whenever it's the latter, the warranty costs Seagate nothing, and Seagate knows how many people fall into each camp.)

Hard drives are a commodity. Any given model has some greater-than-zero failure rate. People who care about their data make backups, and the failure rate doesn't matter; people who don't care about their data worry and fret over exactly what the best lottery ticket to buy is. Me, I figure hard drives are all close enough in speed I'm never going to notice, but I have the thing sitting right next to me all day long, so I buy drives by checking Silent PC Review's recommended list, and picking the top-rated drive I can find on sale.



Hard Disk Drive Warranties

Posted Jun 15, 2007 5:06 UTC (Fri) by pr1268 (guest, #24648)

Even assuming that hard drive warranties are written by the sales/accounting department, don't you suppose that they looked at return rates of their products in order to set that warranty period?

WDC used to pledge a 3-year warranty. Now it's 1-year (again, assuming their consumer drives--IIRC their "Raptor" series of true-SCSI drives gets a longer warranty). Whether it was the sales/marketing folks at WDC, or it was the engineers, either way, around 3-4 years ago they decided that the warranty claim rate wasn't good enough to justify maintaining the 3-year warranty, so they reduced it to 1-year.

Certainly the folks over at Seagate were wise enough to perform the same cost vs. benefit analysis of pledging such a long warranty, regardless of whether it was the engineering team or the sales/marketing folks. But, with Seagate's substantially longer warranty, I can only assume that their cost vs. benefit analysis demonstrated one of two things: (1) their drives were of high enough quality that the return rates were low and they could warranty their drives for 5 years whilst remaining profitable, or (2) they could absorb the cost of replacing defective drives under warranty at will for the indicated warranty period given a failure rate no better or worse than the commodity average.

I just don't see (2) above happening without Seagate making drives of such sorry quality and cheap manufacturing costs that they can justify the long warranty (analogy: I sell you a television for $150 which cost me $20 to build, and it has a 20% annual failure rate, so I can justify warrantying it for 5 years and still make a profit of $50), and I don't see them making drives of such unusually high quality that their manufacturing costs (and retail prices) spiral upwards. Their drives are competitively priced with WDC, Maxtor, and Fujitsu.
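
The television analogy can be put into rough numbers. This is only a sketch under stated assumptions: failures arrive at a constant annual rate, each claim costs one more unit at build cost, and shipping and handling are ignored. The function name and the `claim_fraction` parameter (the share of failures that actually get claimed, the point raised earlier in the thread about old 40 GB drives) are illustrative and not taken from any manufacturer's model.

```python
def expected_warranty_profit(price, build_cost, annual_failure_rate,
                             warranty_years, claim_fraction=1.0):
    """Expected profit per unit under a simple replacement-only warranty model.

    Assumptions (hypothetical): constant annual failure rate, replacements
    fail at the same rate, and each claim costs exactly one unit at build cost.
    """
    expected_failures = annual_failure_rate * warranty_years
    warranty_cost = expected_failures * claim_fraction * build_cost
    return price - build_cost - warranty_cost


# Figures from the analogy: $150 price, $20 build cost,
# 20% annual failure rate, 5-year warranty.
profit_all_claimed = expected_warranty_profit(150, 20, 0.20, 5)
profit_half_claimed = expected_warranty_profit(150, 20, 0.20, 5, claim_fraction=0.5)
print(f"every failure claimed: ${profit_all_claimed:.2f} per unit")
print(f"half of failures claimed: ${profit_half_claimed:.2f} per unit")
```

Under these simplified assumptions the margin comes out above the $50 in the analogy, which presumably allows for overheads the model leaves out. Either way, the long warranty is affordable here because the unit is cheap to build, not because it is reliable.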

I don't mean to argue; but rather, I wanted to share my experiences and perhaps invoke a mildly-stimulating discussion. I totally agree that doing some basic consumer research on hard drive quality and features (you mentioned Silent PC) is a good idea for anyone wanting to invest in spinning platter data storage. :-)

Hard Disk Drive Warranties

Posted Mar 28, 2014 1:46 UTC (Fri) by know_it_all (guest, #96208)

Google's study was recently referenced on a forum, which led me to search for references to it and come across this discussion. Even though this is an ancient article in terms of disk drive design cycles, I can offer some insight for others who come across the discussion.

The transducer used to write and read data flies over the disk; in the time frame of the article, the fly height was somewhere around 5 to 10 nm. The tracks written to the disk, and the read sensor, were around 1 µm wide. The read sensor was a magnetoresistive transducer element (MR or GMR) constructed at the trailing edge of the read/write transducer (commonly called the R/W head). It had a resistance of around 35-100 ohms and sensed the magnetization of the disk as a change of < 1-2 ohms of resistance, measured by passing a current through the sensor. The read-back signal would typically be < 5 mV and be amplified by wide-band, low-noise amplifiers up to a signal level of 100-300 mVpp at the actuator's amplifier before being passed back differentially to the disk drive's card electronics over a Kapton flexible tape wired circuit (called a flex tape).
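
To get a feel for the signal levels quoted above, here is a back-of-the-envelope calculation. It is a rough sketch only: it treats the 5 mV figure as the full read-back amplitude, ignores the difference between peak and peak-to-peak values, and assumes the signal is simply the sense current times the resistance change. The function names are illustrative.

```python
import math


def voltage_gain_db(v_in_mv, v_out_mv):
    """Voltage gain in decibels for an amplifier taking v_in_mv up to v_out_mv."""
    return 20 * math.log10(v_out_mv / v_in_mv)


def sense_current_ma(signal_mv, delta_r_ohm):
    """Sense current implied by Ohm's law if signal = current * resistance change."""
    return signal_mv / delta_r_ohm  # mV / ohm = mA


head_signal_mv = 5.0  # upper bound on the raw read-back signal quoted above

# Gain needed to reach the 100-300 mV range quoted for the preamplifier output.
for out_mv in (100.0, 300.0):
    ratio = out_mv / head_signal_mv
    gain_db = voltage_gain_db(head_signal_mv, out_mv)
    print(f"{out_mv:.0f} mV output: gain of about {ratio:.0f}x ({gain_db:.1f} dB)")

# Sense current implied by a 1-2 ohm resistance change producing that signal.
for delta_r in (1.0, 2.0):
    current = sense_current_ma(head_signal_mv, delta_r)
    print(f"{delta_r:.0f} ohm change: about {current:.1f} mA")
```

Because the raw signal is quoted as an upper bound, the gains computed here are lower bounds; a weaker signal needs proportionally more amplification, which is why the amplifier sits on the actuator, close to the head.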

SMART tracking systems use the amplitude and waveform properties of the read-back signal, as measured by the read channel and firmware, to predict degradation and wear-out mechanisms. They also monitor and track the positioning error (commonly called TMR, or track misregistration) of the servo system as it follows the magnetically written positioning information on the disk.

One example of an error mechanism that SMART can report as a trend, so that the host system can identify it, is a failure mechanism in which the head slowly accumulates debris on its air-bearing surface. The debris degrades the signal amplitude, causing a loss of SNR, an increase in bit errors, and eventually sector read errors as the SNR margin is lost.

Drives are assembled in an extremely clean environment similar to that used for silicon processing, and the plant sites must maintain critical cleaning processes and assembly conditions to ensure that the manufacturing processes for the drive's components do not leave particulate contamination in the disk enclosure.

When hard particles, and especially conductive hard particles, come into contact with the sensor during operation, the result can be an instantaneous scratch to the disk or to the sensor. Such a scratch may partially damage the sensor and/or cause partial demagnetization of the microscopic hard magnets used to bias the magnetic layers making up its MR or GMR structure. The sudden degradation shows up as low amplitude or instability in the read signal, or it simply kills the sensor. This is the sudden failure described in the article: it happens too abruptly for a SMART trend algorithm to give early warning.
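
The difference between the gradual and the sudden case can be illustrated with a toy trend predictor. This is a minimal sketch, not how any drive firmware actually works: it assumes periodic read-amplitude samples in arbitrary units, fits a least-squares line to them, and extrapolates to a made-up failure threshold. All names and numbers are hypothetical.

```python
def fit_line(samples):
    """Least-squares slope and intercept of samples against their index."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def intervals_until_threshold(samples, threshold):
    """Extrapolate the fitted trend to the threshold.

    Returns the number of sampling intervals left before the fitted line
    crosses the threshold, or None if the trend is flat or rising.
    """
    slope, intercept = fit_line(samples)
    if slope >= 0:
        return None
    last_fit = intercept + slope * (len(samples) - 1)
    if last_fit <= threshold:
        return 0.0
    return (last_fit - threshold) / -slope


FAIL_THRESHOLD = 60  # hypothetical minimum usable amplitude

# Debris slowly building up on the head: amplitude drifts down steadily.
gradual = [100, 98, 96, 94, 92, 90]
print("gradual:", intervals_until_threshold(gradual, FAIL_THRESHOLD))

# Hard-particle strike: the history is perfectly healthy right up to the
# moment the sensor is damaged, so there is no trend to extrapolate.
before_strike = [100, 100, 100, 100, 100, 100]
print("before strike:", intervals_until_threshold(before_strike, FAIL_THRESHOLD))
```

The gradual series yields a finite estimate of the time remaining, while the healthy-looking history before a particle strike yields no warning at all, which is the limitation described above.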

These impact-type failures do not follow the principles of temperature-induced silicon failures, to which some of the commentary attempts to attribute them. They can also be component dependent, varying with the cleaning systems in use at the time of component manufacture and assembly.

Other factors that are affected by temperature do exist. These involve the sensitivity of fly height to the viscosity of air, and changes in the magnetic coercivity of the disk (how difficult it is to write, that is, to switch the direction of the disk's magnetization with the head's magnetic field). At lower temperatures the air is thicker, so the head may fly higher. The coercivity of the disk's magnetic film may also increase, making the disk harder to write. As a result, any mechanism that increases head-to-disk spacing, such as smears, particulate pickup on the head surface, or damage to the writer by hard particles, can cause poor overwrite of the disk and loss of written data during the write process. The SMART algorithms monitor the overwrite property but, just as described for the read transducer, they will not predict sudden failure from hard-particle damage to the writer.

All drive manufacturers have invested heavily in state-of-the-art cleaning equipment and clean rooms for assembly cleanliness, and they continuously improve designs for disk flatness and for the detection and avoidance of any hard particles that might be embedded in the disk surfaces and affect reliability. Current drives fly the read/write transducer at around 1 nm off the disk and employ means to protrude and retract it, pulling the critical read/write transducer back so that the head can fly several nm off the surface and limit its exposure to damage.

The observation that failure rates may also depend on how long the drive sits at rest comes from the fact that particles will not attach to a spinning disk. If a disk is shut off, any particulate in the air of the disk enclosure will settle on a surface and may bond to it by molecular attraction. When the drive is started up later, the head may sweep the particle away during an access, or may pound it into the disk surface, creating a hard particle and an opportunity for failure. Disk drives have filters that direct air flow off the disk pack to trap and remove particulates from the air within the disk enclosure. These filters may also contain sacrificial elements to prevent corrosion of the surfaces of the enclosure, disks, and heads, avoiding early failure.

The reader should conclude from the above that every drive manufacturer's engineering team is working responsibly to ensure that its drives incorporate SMART algorithms to detect the kinds of events that monitoring can detect. Not all failure mechanisms, however, manifest themselves in a way that lets the algorithm detect them and report to the system ahead of the failure, desirable as that would be for both manufacturers and system integrators.


Hard Disk Drive Warranties

Posted Apr 12, 2014 23:41 UTC (Sat) by nix (subscriber, #2304)

This rates as one of the best comments ever on LWN, I think. (Despite slight grammar garbling, perhaps due to a non-native speaker, causing tricky parsing here and there.)

It was worth waiting seven years for. Bravo!


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds