Journaling no protection against power drop
Posted Sep 1, 2009 7:58 UTC (Tue) by ncm (guest, #165)
In reply to: Journaling no protection against power drop by IkeTo
Parent article: Ext3 and RAID: silent data killers?
Some drives only report blocks written to the platter after they really have been, but that's bad for benchmarks, so most drives fake it, particularly when they detect benchmark-like behavior. Everyone serious about reliability uses battery backup, so whoever's left isn't serious and (the manufacturers reason) deserves what they get, because they're not paying. Building in better reliability manifestly doesn't improve sales or margins.
If you pay twice as much for a drive, you might get better behavior. Or you might only pay more.
If you provide a few seconds' battery backup for the drive but not the host, then the blocks in the buffer that the drive said were on the disk get a chance to actually get there.
Posted Sep 1, 2009 17:09 UTC (Tue) by Baylink (guest, #755)
{{citation-needed}}
> If you pay twice as much for a drive, you might get better behavior. Or you might only pay more.
I generally find the difference per GB to be 6:1 going from even enterprise SATA drives to Enterprise SCSI (U-160 or faster, 10K or faster). My experience is that I get what I pay for, YMMV.
Posted Sep 1, 2009 17:20 UTC (Tue) by markusle (guest, #55459)
> Some drives only report blocks written to the platter after they really have been, but that's bad for benchmarks, so most drives fake it, particularly when they detect benchmark-like behavior.

I'd be very interested in some additional references or a list of drives that do or don't do this.
Posted Sep 1, 2009 17:44 UTC (Tue) by ncm (guest, #165)
The storage industry is as mature as any part of the computer business. It is arranged so as to allow you to spend as much money as you like, and can happily absorb as much as you throw at it. If you know what you're doing, you can get full value for your money. If you don't know what you're doing, you can spend just as much and get little more value than the raw disks in the box. There is no substitute for competence.

Start by looking at very, very expensive, slow drives. Then forget about them. Instead, rely on redundancy and battery backup. There are lots of companies that aggregate cheap disks, batteries, cache, and power in a nice box, and each charges what they can get for it. Some work well, others less so. Disk arrays work like insurance: spread the risk, and cover for failures. Where they inadvertently concentrate risk, you get it all.
Posted Sep 1, 2009 23:28 UTC (Tue) by dododge (guest, #2870)
The old DeskStar drive manual (circa 2002) explicitly stated that power loss in the middle of a write could lead to partially-written sectors, which would trigger a hard error if you tried to read them later on. According to an LKML discussion back then, the sectors would stay in this condition indefinitely and would not be remapped; so the drive would continue to throw hard errors until you manually ran a repair tool to find and fix them.
Posted Sep 5, 2009 0:10 UTC (Sat) by giraffedata (guest, #1954)
But then you also get the garbage that the host writes in its death throes (e.g. update of a random sector) while the drive is still up.
To really solve the problem, you need much more sophisticated shutdown sequencing.
Posted Sep 8, 2009 20:54 UTC (Tue) by anton (subscriber, #25547)
Posted Sep 10, 2009 20:58 UTC (Thu) by Cato (guest, #7643)
UPSs are useful at least to regulate the voltage and cover against momentary power cuts, which are very frequent where I live, and far more frequent than UPS failures in my experience.
Posted Sep 10, 2009 21:34 UTC (Thu) by anton (subscriber, #25547)
Posted Sep 10, 2009 9:00 UTC (Thu) by hensema (guest, #980)
Which is no problem. The CRC for the partially-written sector will be incorrect, and the read failure will be reported to the host adapter. The host adapter can then reconstruct the data from the redundant copy and write the correct sector back.

Of course you do need RAID for this.
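With Linux md software RAID, for instance, that reconstruct-and-rewrite pass can be triggered explicitly. A sketch, assuming a RAID-1 array at /dev/md0 (the sysfs paths are the standard md interface; the array name is a placeholder):

```shell
# Show the array and its member disks
cat /proc/mdstat

# Ask md to scan the whole array; a sector that is unreadable or
# inconsistent on one member is rewritten from the redundant copy
echo repair > /sys/block/md0/md/sync_action

# After the pass completes, this counter reports how many sectors
# needed correction
cat /sys/block/md0/md/mismatch_cnt
```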
Posted Sep 10, 2009 20:52 UTC (Thu) by Cato (guest, #7643)
Without RAID, the operating system will have no idea the sector is corrupt - this is why I like ZFS's block checksumming, as you can get a list of files with corrupt blocks in order to restore from backup.
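With ZFS that list of damaged files comes straight out of the pool status. A sketch of the workflow (the pool name `tank` is a placeholder):

```shell
# Read every allocated block and verify it against its checksum,
# repairing from redundancy where possible
zpool scrub tank

# -v lists, by path name, any files whose blocks failed their
# checksums and could not be repaired, so you know what to restore
zpool status -v tank
```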
> [Engineers at drive manufacturers] say they happily stop writing halfway in the middle of a sector, and respond to power drop only by parking the head.
The results from my experiments on cutting power to disk drives are consistent with the theory that the drives I tested complete the sector they are writing when the power goes away. However, I have seen drives that corrupt sectors under unusual power conditions; the manufacturers of those drives (IBM, Maxtor) and their successors (Hitachi) went onto my don't-buy list and are still there.
> Some drives only report blocks written to the platter after they really have been, but that's bad for benchmarks, so most drives fake it, particularly when they detect benchmark-like behavior.
Write-back caching (reporting completion before the data hits the platter) is normally enabled in PATA and also SATA drives (running benchmarks or not), because without tagged commands (mostly absent in PATA, and not universally supported for SATA) performance is very bad otherwise. You can disable that with hdparm -W0. Or you can ask for barriers (e.g., as an ext3 mount option), which should give the same consistency guarantees at lower cost if the file system is implemented properly; however, my trust in the proper implementation in Linux is severely undermined by the statements that some prominent kernel developers have made in recent months on file systems.
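A sketch of the two options described above (the device and mount point are placeholders; barrier=1 was the ext3 mount option of that era):

```shell
# Option 1: turn off the drive's write-back cache entirely
hdparm -W0 /dev/sda

# Verify the current write-cache setting
hdparm -W /dev/sda

# Option 2: leave the cache on, but have ext3 issue write barriers
# (cache-flush commands) at journal commit points
mount -o remount,barrier=1 /dev/sda1 /mnt
```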
> Everyone serious about reliability uses battery backup
Do you mean a UPS? So how does that help when the UPS fails? Yes, we have had that happen (while grid power was alive), and we concluded that our power grid is just as reliable as a UPS. One could protect against a failing UPS with dual (redundant) power supplies and dual UPSs, but that would probably double the cost of our servers. A better option would be an OS that sets up the hardware for good reliability (i.e., disables write caching if necessary) and works hard to ensure data and metadata consistency. Unfortunately, it seems that that OS is not Linux.
It depends on where you live. Here power outages are quite
infrequent, but mostly take so long that the UPS will run out of
power. So then the UPS only gives the opportunity for a clean
shutdown (and that opportunity was never realized by our sysadmin when
we had UPSs), and that is unnecessary if you have all of the
following: decent drives that complete the last sector on power
failure; a good file system; and a setup that gives the file system
what it needs to stay consistent (e.g., barriers or hdparm -W0). And
of course we have backups around if the worst comes to worst. And
while we don't have the ultimate trust in ext3 and the hard drives we
use, we have not yet needed the backups for that.