Journaling no protection against power drop
Posted Sep 1, 2009 7:58 UTC (Tue) by ncm (guest, #165)
In reply to: Journaling no protection against power drop by IkeTo
Parent article: Ext3 and RAID: silent data killers?
Some drives only report blocks written to the platter after they really have been, but that's bad for benchmarks, so most drives fake it, particularly when they detect benchmark-like behavior. Everyone serious about reliability uses battery backup, so whoever's left isn't serious and (the manufacturers reason) deserves what they get, because they're not paying. Building in better reliability manifestly doesn't improve sales or margins.
If you pay twice as much for a drive, you might get better behavior. Or you might only pay more.
If you provide a few seconds' battery backup for the drive but not the host, then the blocks in the buffer that the drive said were on the disk get a chance to actually get there.
Posted Sep 1, 2009 17:09 UTC (Tue) by Baylink (guest, #755)
{{citation-needed}}
> If you pay twice as much for a drive, you might get better behavior. Or you might only pay more.
I generally find the difference per GB to be 6:1 going from even enterprise SATA drives to Enterprise SCSI (U-160 or faster, 10K or faster). My experience is that I get what I pay for, YMMV.
Posted Sep 1, 2009 17:20 UTC (Tue) by markusle (guest, #55459)
> Some drives only report blocks written to the platter after they really have been, but that's bad for benchmarks, so most drives fake it, particularly when they detect benchmark-like behavior.

I'd be very interested in some additional references or a list of drives that do or don't do this.
Posted Sep 1, 2009 17:44 UTC (Tue) by ncm (guest, #165)
The storage industry is as mature as any part of the computer business. It is arranged so as to allow you to spend as much money as you like, and can happily absorb as much as you throw at it. If you know what you're doing, you can get full value for your money. If you don't know what you're doing, you can spend just as much and get little more value than the raw disks in the box. There is no substitute for competence.

Start by looking at very, very expensive, slow drives. Then forget about them. Instead, rely on redundancy and battery backup. There are lots of companies that aggregate cheap disks, batteries, cache, and power in a nice box, and each charges what they can get for it. Some work well, others less so. Disk arrays work like insurance: spread the risk, and cover for failures. Where they inadvertently concentrate risk, you get it all.
Posted Sep 1, 2009 23:28 UTC (Tue) by dododge (guest, #2870)
The old DeskStar drive manual (circa 2002) explicitly stated that power loss in the middle of a write could lead to partially-written sectors, which would trigger a hard error if you tried to read them later on. According to an LKML discussion back then, the sectors would stay in this condition indefinitely and would not be remapped; so the drive would continue to throw hard errors until you manually ran a repair tool to find and fix them.
Posted Sep 5, 2009 0:10 UTC (Sat) by giraffedata (guest, #1954)
But then you also get the garbage that the host writes in its death throes (e.g. update of a random sector) while the drive is still up.
To really solve the problem, you need much more sophisticated shutdown sequencing.
Posted Sep 8, 2009 20:54 UTC (Tue) by anton (subscriber, #25547)
Posted Sep 10, 2009 20:58 UTC (Thu) by Cato (guest, #7643)
UPSs are useful at least to regulate the voltage and cover against momentary power cuts, which are very frequent where I live, and far more frequent than UPS failures in my experience.
Posted Sep 10, 2009 21:34 UTC (Thu) by anton (subscriber, #25547)
Posted Sep 10, 2009 9:00 UTC (Thu) by hensema (guest, #980)
Which is no problem. The CRC for the partially-written sector will be incorrect, and the read failure will be reported to the host adapter. The host adapter can then reconstruct the data from the redundant copy and write the correct sector back.

Of course you do need RAID for this.
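With Linux md software RAID, for instance, that reconstruct-and-rewrite pass can be triggered explicitly. A sketch, assuming a RAID-1 array at /dev/md0 (the sysfs paths are the standard md interface; the array name is a placeholder):

```shell
# Show the array and its member disks
cat /proc/mdstat

# Ask md to scan the whole array; a sector that is unreadable or
# inconsistent on one member is rewritten from the redundant copy
echo repair > /sys/block/md0/md/sync_action

# After the pass completes, this counter reports how many sectors
# needed correction
cat /sys/block/md0/md/mismatch_cnt
```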
Posted Sep 10, 2009 20:52 UTC (Thu) by Cato (guest, #7643)
Without RAID, the operating system will have no idea the sector is corrupt - this is why I like ZFS's block checksumming, as you can get a list of files with corrupt blocks in order to restore from backup.
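With ZFS that list of damaged files comes straight out of the pool status. A sketch of the workflow (the pool name `tank` is a placeholder):

```shell
# Read every allocated block and verify it against its checksum,
# repairing from redundancy where possible
zpool scrub tank

# -v lists, by path name, any files whose blocks failed their
# checksums and could not be repaired, so you know what to restore
zpool status -v tank
```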
> [Engineers at drive manufacturers] say they happily stop writing halfway in the middle of a sector, and respond to power drop only by parking the head.
The results from my experiments on cutting power to disk drives are consistent with the theory that the drives I tested complete the sector they are writing when the power goes away. However, I have seen drives that corrupt sectors under unusual power conditions; the manufacturers of those drives (IBM, Maxtor) and their successors (Hitachi) went onto my don't-buy list and are still there.
> Some drives only report blocks written to the platter after they really have been, but that's bad for benchmarks, so most drives fake it, particularly when they detect benchmark-like behavior.
Write-back caching (reporting completion before the data hits the platter) is normally enabled in PATA and also SATA drives (running benchmarks or not), because without tagged commands (mostly absent in PATA, and not universally supported for SATA) performance is very bad otherwise. You can disable that with hdparm -W0. Or you can ask for barriers (e.g., as an ext3 mount option), which should give the same consistency guarantees at lower cost if the file system is implemented properly; however, my trust in the proper implementation in Linux is severely undermined by the statements that some prominent kernel developers have made in recent months on file systems.
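A sketch of the two options described above (the device and mount point are placeholders; barrier=1 was the ext3 mount option of that era):

```shell
# Option 1: turn off the drive's write-back cache entirely
hdparm -W0 /dev/sda

# Verify the current write-cache setting
hdparm -W /dev/sda

# Option 2: leave the cache on, but have ext3 issue write barriers
# (cache-flush commands) at journal commit points
mount -o remount,barrier=1 /dev/sda1 /mnt
```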
> Everyone serious about reliability uses battery backup
Do you mean a UPS? So how does that help when the UPS fails? Yes, we have had that happen (while grid power was alive), and we concluded that our power grid is just as reliable as a UPS. One could protect against a failing UPS with dual (redundant) power supplies and dual UPSs, but that would probably double the cost of our servers. A better option would be an OS that sets up the hardware for good reliability (i.e., disables write caching if necessary) and works hard to ensure data and metadata consistency. Unfortunately, it seems that that OS is not Linux.
It depends on where you live. Here power outages are quite
infrequent, but mostly take so long that the UPS will run out of
power. So then the UPS only gives the opportunity for a clean
shutdown (and that opportunity was never realized by our sysadmin when
we had UPSs), and that is unnecessary if you have all of the
following: decent drives that complete the last sector on power
failure; a good file system; and a setup that gives the file system
what it needs to stay consistent (e.g., barriers or hdparm -W0). And
of course we have backups around if the worst comes to worst. And
while we don't have the ultimate trust in ext3 and the hard drives we
use, we have not yet needed the backups for that.