
crash vs. power drop


Posted Sep 11, 2009 6:27 UTC (Fri) by zmi (guest, #4829)
In reply to: crash vs. power drop by ncm
Parent article: POSIX v. reality: A position on O_PONIES

> Second, more subtle but probably more important, drives lie about what is
physically on disk.

That's why in a RAID you really *must* turn off the drive's write cache.
I've tried to explain that in the XFS FAQ:
http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_w...
and also in the questions below that one.

In short: I've got a new 2TB WD drive with 64MB of cache, and we intend to
use these in a RAID. Take 16 of these drives and you have 1024MB (1GB) of
write cache. So in the worst case,
1) you've got a UPS, but your power supply fails and the PC/server loses
power
2) the drives have their caches full, so up to 1GB of data is lost that the
filesystem believed was on disk. There's a *very* high chance that
lots of metadata is included in the cached writes.
3) each of the 16 drives could write "half sectors", effectively destroying
both the previous and the new content.
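
To put a number on that worst case, here is a minimal Python sketch of the arithmetic. The function name is made up for illustration, and it pessimistically assumes every drive's whole cache holds writes that have not yet reached the platters.

```python
def total_write_cache_mb(drive_count: int, cache_mb_per_drive: int) -> int:
    """Upper bound on data sitting in volatile drive caches, in MB.

    Assumes (pessimistically) that each drive's entire cache is filled
    with writes that have not been committed to the platters yet.
    """
    return drive_count * cache_mb_per_drive


if __name__ == "__main__":
    # The array described above: 16 drives with 64MB of cache each.
    total = total_write_cache_mb(16, 64)
    print(f"up to {total} MB of acknowledged writes could be lost")
```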

In all this discussion, it would have been worth noting that if you really
*care* about your data, you *must* turn off the drive write cache. Yes,
power failures are rare in countries with a good power supply. Still,
I use a UPS, and in the last half year I had
a) my daughter playing around and switching off the server
b) a dead power supply in my workstation
so even with a UPS, "drive write cache off" is a must. Simply put a
hdparm -W0 /dev/sda
in your boot scripts.
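
With more than one drive you need one such line per device. A small sketch that only generates the command lines (it does not run them; hdparm needs root, and the device list here is a hypothetical example you would replace with your own drives):

```python
import shlex


def write_cache_off_commands(devices):
    """Return one 'hdparm -W0 <device>' command line per device."""
    return [shlex.join(["hdparm", "-W0", dev]) for dev in devices]


if __name__ == "__main__":
    # Hypothetical device names; adjust to the members of your array.
    for cmd in write_cache_off_commands(["/dev/sda", "/dev/sdb"]):
        print(cmd)
```

The printed lines can be pasted into a boot script. Printing instead of executing keeps the sketch harmless to try on any machine.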

Note that this only helps with 1) and 2); for problem 3) there's
nothing anybody but the disk manufacturers can do. I must say that I have
no evidence of ever having had that problem anywhere. It might be what
happened when there were "strange filesystem problems" after a crash, but
you can't tell for sure.

As for the rename: really, you should only ever end up with
either the old file or the new one, and the filesystem should take care of
this even in crash situations.
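
Until every filesystem guarantees that, applications that want "old or new, never garbage" usually write a temporary file, fsync it, and rename it over the target. A minimal Python sketch of that pattern, assuming a POSIX system (the name atomic_write is made up for illustration, and fsync only helps if the drive really flushes its cache when asked):

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so that it holds either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the kernel to push the data out
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also sync the directory so the rename itself is persisted.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```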

Note: In Linux you can tune writeback behaviour in /etc/sysctl.conf:
# start background writeback at around 16MB, block writers at around 250MB
vm.dirty_background_bytes = 16123456
vm.dirty_bytes = 250123456
# older kernels only had these:
#vm.dirty_background_ratio = 5
#vm.dirty_ratio = 10
# dirty data becomes eligible for writeback after 10 seconds
# (values are in centiseconds; default: 3000 = 30 seconds)
vm.dirty_expire_centisecs = 1000
# wake the flusher threads every second (default: 500 = 5 seconds)
vm.dirty_writeback_centisecs = 100
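
The *_centisecs values are in hundredths of a second, which is easy to misread. A small Python sketch to show what a machine is currently using, in seconds; the helper names are made up, and it assumes the usual Linux /proc/sys/vm layout:

```python
from pathlib import Path


def centisecs_to_seconds(value: int) -> float:
    """Convert a *_centisecs sysctl value to seconds."""
    return value / 100


def read_vm_setting(name: str, root: str = "/proc/sys/vm"):
    """Return the integer value of a vm sysctl, or None if unavailable."""
    try:
        return int((Path(root) / name).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for name in ("dirty_expire_centisecs", "dirty_writeback_centisecs"):
        value = read_vm_setting(name)
        if value is None:
            print(f"{name}: not available on this system")
        else:
            print(f"{name} = {value} ({centisecs_to_seconds(value):g} s)")
```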

Note that dirty_bytes/dirty_ratio is the threshold at which new writes
block: once the cache holds that much dirty data, writers have to wait. On
systems with 8GB RAM or more, a ratio-based limit can let gigabytes of
dirty data pile up in the cache.
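
A rough sketch of how a ratio turns into bytes. The function name is made up, and the kernel actually applies the ratio to available (free plus reclaimable) memory rather than total RAM, so treat the result as an upper-end estimate:

```python
GIB = 1024 ** 3


def dirty_limit_bytes(dirtyable_bytes: int, ratio_percent: int) -> int:
    """Approximate dirty-data limit for a given vm.dirty_ratio.

    `dirtyable_bytes` stands in for the memory the kernel counts as
    available; using total RAM here overestimates the real limit a bit.
    """
    return dirtyable_bytes * ratio_percent // 100


if __name__ == "__main__":
    # An 8GB machine with a few example ratios.
    for ratio in (10, 20, 40):
        limit = dirty_limit_bytes(8 * GIB, ratio)
        print(f"dirty_ratio={ratio}: about {limit / GIB:.1f} GiB of dirty data")
```

Setting vm.dirty_bytes to a fixed value sidesteps this scaling with RAM size.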

Sorry for putting it all in one post, but I hope it gives people who care
about their data some tunings to start with.

mfg zmi



crash vs. power drop

Posted Sep 11, 2009 11:40 UTC (Fri) by Cato (subscriber, #7643)

Thanks for explaining all that.

On the topic of writing 'half sectors' due to a power drop: the author of http://lwn.net/Articles/351521/ has done quite a lot of tests on various hard drives, and found that they generally don't do this, though some have. He has a useful program that can test any drive for this behaviour, though it's mostly intended to test for out-of-order writes due to caching. I believe only some drives lie about this.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds