Only with a UPS

Posted Sep 11, 2009 14:41 UTC (Fri) by anton (subscriber, #25547)
In reply to: Only with a UPS by ncm
Parent article: POSIX v. reality: A position on O_PONIES

> First, if power drops during a physical write operation, that sector is scragged. If it was writing metadata, you have serious problems with whatever files that metadata describes, if anything points to that sector.
In my experiments with cutting power to disk drives while they were writing, the drives did not corrupt sectors. I have seen IBM and Maxtor drives corrupt sectors under more unusual power-fluctuation circumstances; maybe that is one reason why you can no longer buy drives from IBM or Maxtor. Hitachi (IBM's successor) and Seagate-Maxtor (not Seagate proper) are certainly on my don't-buy list.

And a modern file system can protect against the corruption of a single sector:

E.g., in a journaling file system, that sector is either in the log or in the permanent storage. If it's in the log, just stop the replay when you encounter the sector. If it's in permanent storage, then you will notice that the replay write fails, and the file system can remap the sector/block to a working one (or the drive might remap it transparently on the replay write, or might just perform the write on the original sector; in these cases the file system has nothing to do). Of course, if the file system performs only meta-data journaling, then it will likely not notice corrupt data (because it is not accessed during replay), but apparently neither the file system maintainer nor the user (or whoever decided to use a meta-data journaling file system) cares about data anyway, so that's ok.
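
As a rough illustration of the "stop the replay at the damaged log sector" case, here is a minimal Python sketch. It is not how any particular file system implements its journal: the `LogRecord` layout, the per-record CRC32, and the dictionary standing in for permanent storage are all assumptions made for the example, and each record is assumed to represent one complete, committed transaction.

```python
import zlib
from dataclasses import dataclass


@dataclass
class LogRecord:
    """Hypothetical journal record; one record = one committed transaction."""
    block_no: int   # destination block in permanent storage
    payload: bytes  # new contents for that block
    checksum: int   # CRC32 of payload, stored alongside the record


def make_record(block_no: int, payload: bytes) -> LogRecord:
    return LogRecord(block_no, payload, zlib.crc32(payload))


def replay(log: list, storage: dict) -> int:
    """Apply records in log order; stop at the first record that fails its checksum.

    Returns the number of records applied. Everything from the damaged
    record onward is treated as never having been committed.
    """
    applied = 0
    for rec in log:
        if zlib.crc32(rec.payload) != rec.checksum:
            break  # sector damaged by the power cut: end of usable log
        storage[rec.block_no] = rec.payload
        applied += 1
    return applied
```

For example, if the second of three records is overwritten with garbage to simulate a torn write, `replay` applies only the first record and leaves the blocks named by the later records at their pre-crash contents.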

In a copy-on-write file system, the sector either contains the root of the file system, or it contains something written after the last root. In the latter case these blocks are unreachable anyway after recovery (unless there is also an intent log, in which case the discussion above applies). If the root is affected, then on recovery the youngest alternative root is read, giving us the latest consistent state of the file system.
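
The root-recovery step can be sketched in the same spirit. This assumes (hypothetically) that the file system keeps several root slots, each carrying a generation counter and a checksum; the names `RootSlot`, `root_checksum`, and `pick_root` are invented for the example and do not correspond to any specific file system's on-disk format.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class RootSlot:
    """Hypothetical on-disk root record."""
    generation: int    # increases with every committed root
    tree_pointer: int  # block address of the tree this root describes
    checksum: int      # CRC32 over the two fields above


def root_checksum(generation: int, tree_pointer: int) -> int:
    return zlib.crc32(f"{generation}:{tree_pointer}".encode())


def make_root(generation: int, tree_pointer: int) -> RootSlot:
    return RootSlot(generation, tree_pointer, root_checksum(generation, tree_pointer))


def pick_root(slots: list) -> Optional[RootSlot]:
    """Return the youngest root whose checksum verifies, or None if none does."""
    valid = [
        s for s in slots
        if s.checksum == root_checksum(s.generation, s.tree_pointer)
    ]
    return max(valid, key=lambda s: s.generation, default=None)
```

If the newest slot was being written when power dropped and fails its checksum, `pick_root` falls back to the next-youngest slot, which describes the latest consistent state; blocks written after that root are simply unreachable.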

> Second, more subtle but probably more important, drives lie about what is physically on disk.
In the experiments mentioned above, when the drive had write caching enabled (the default on PATA and SATA drives), the drives not only reported completion right away but, worse, also reordered the writes (so using barriers or turning off write caching is essential for every kind of consistency).

With write caching disabled, the results of my experiments (both in performance and in what was on disk after powering off) are consistent with the theory that the drive reports completion of a write only after the sector has hit the platter and, consequently (with the program I used), wrote the sectors only in order.
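
From the application side, the usual way to ask for in-order writes is to force each write to stable storage before issuing the next one. The sketch below is not the program used in the experiments; `write_in_order` is a hypothetical helper built on the standard `os.write` and `os.fsync` calls. Whether `os.fsync` really means "on the platter" depends on exactly the things discussed here: the kernel has to issue a cache flush or barrier, and the drive has to honor it (or have write caching turned off).

```python
import os


def write_in_order(path: str, chunks: list) -> None:
    """Append each chunk and flush it before writing the next one.

    Ordering is only as strong as the fsync implementation underneath:
    with a volatile drive write cache and no barriers, the drive may
    still reorder or lose these writes on power failure.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while len(view) > 0:
                # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)  # do not start the next chunk until this one is flushed
    finally:
        os.close(fd)
```

Calling `write_in_order("test.log", [b"first\n", b"second\n"])` appends the two chunks with a flush between them, so that, on a stack that honors flushes, "second" should never be on disk without "first".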

BTW, it's not just the drive manufacturers that default to fast rather than safe; the Linux kernel developers do a similar thing (with a much smaller performance incentive) when they disable barriers by default, and when they switched ext3 from data=journal to data=ordered (letting data=journal rot) and recently to data=writeback (although that may be just to make ext3 as bad as ext4 so people will not switch back). Hmm, are Solaris or BSD developers less cavalier about their users' data?

On the subject: UPSs and computer PSUs can fail, too. Better to recommend dual power supplies with dual UPSs; double failures should be relatively rare.

