McKenney: Stupid RCU Tricks: rcutorture Catches an RCU Bug

Posted Nov 22, 2014 2:15 UTC (Sat) by ncm (guest, #165)
In reply to: McKenney: Stupid RCU Tricks: rcutorture Catches an RCU Bug by reedstrm
Parent article: McKenney: Stupid RCU Tricks: rcutorture Catches an RCU Bug

... For some definition of "pull the plug" that doesn't include actually pulling the plug. No software system can be proof against corruption in a power drop, which can wipe out random sectors. The best one can do is handle system crashes, and then only if the file system bit doesn't go wild for a while before actually failing. Fortunately they usually don't. And, fortunately, most power drops happen when the disk isn't writing, and when it is, the sectors scribbled are usually not very important.

But when it matters, there's no substitute for battery backup and a powered-up shutdown. Five minutes of backup is plenty if after four minutes you start a shutdown. Thirty seconds is plenty if you panic the kernel at the first hiccup and let the drive drain its buffer in peace.

McKenney: Stupid RCU Tricks: rcutorture Catches an RCU Bug

Posted Nov 22, 2014 5:00 UTC (Sat) by dlang (guest, #313) [Link] (7 responses)

Actually, for Postgres, they really do pull the plug.

As long as the disk only corrupts the sector it's in the middle of writing to, Postgres will not lose any data that it has reported as safe.

Now, if the drive goes off and scribbles on other parts of the disk as it loses power, all bets are off. But such drives really do not exist: they detect the power failure fast enough to stop writing before any mechanical movement is affected.

McKenney: Stupid RCU Tricks: rcutorture Catches an RCU Bug

Posted Nov 25, 2014 14:15 UTC (Tue) by ncm (guest, #165) [Link] (6 responses)

There is no promise by manufacturers that the drive won't corrupt any number of sectors on the tracks (plural) that the heads (plural) are on, and no promise that the file system won't be corrupted. There used to be an urban legend that drives would use the remaining rotational energy to power an orderly shutdown. I have not found any evidence for that.

If PG are doing power-off tests, you can bet they are stacking the deck with chosen hardware carefully configured to minimize problems, because if they did find corruption, there is nothing they could do to prevent it next time. The odds are that any given power drop won't corrupt much. Testing fewer than 10,000 power drops is just theater.
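The sample-size point can be made concrete with a little arithmetic. This is only a sketch: it assumes each power drop independently causes detectable corruption with some small probability p, and the p values below are purely illustrative, not measurements of any drive. Under that assumption, the chance of seeing at least one corruption in n drops is 1 - (1 - p)^n.

```python
import math

def chance_of_seeing_corruption(p: float, n: int) -> float:
    """Probability that at least one of n independent power drops corrupts data."""
    return 1.0 - (1.0 - p) ** n

def drops_needed(p: float, confidence: float) -> int:
    """Smallest n for which corruption shows up with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

# Hypothetical per-drop corruption rates, chosen only to show the scaling.
for p in (1e-2, 1e-3, 1e-4):
    print(f"p={p}: about {drops_needed(p, 0.95)} drops for 95% confidence")
```

The required number of trials grows roughly as 3/p, so a failure mode that strikes once in 10,000 drops needs tens of thousands of trials before a clean run says much.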

McKenney: Stupid RCU Tricks: rcutorture Catches an RCU Bug

Posted Nov 25, 2014 15:38 UTC (Tue) by magila (guest, #49627) [Link] (2 responses)

Modern hard drives have hardware to detect a power failure and stop the write head at the next sector boundary. There is enough energy stored in capacitors to keep this hardware alive long enough to do its job. So in practice you will not see more than two corrupted sectors after a power failure.

pulling the plug on a disk drive

Posted Dec 5, 2014 1:22 UTC (Fri) by giraffedata (guest, #1954) [Link] (1 responses)

> Modern hard drives have hardware to detect a power failure and stop the write head at the next sector boundary... So in practice you will not see more than two corrupted sectors after a power failure.

How could it be more than one sector?

I haven't heard of this technology, but I'm very familiar with quite old technology in which the drive detects supply voltage about to sink below a safe level and cuts the write current to the head immediately, so the damage is limited to one sector. This technology doesn't even involve instructions executing, so there is no reserve energy to speak of required.

pulling the plug on a disk drive

Posted Dec 11, 2014 20:57 UTC (Thu) by jimparis (guest, #38647) [Link]

> > Modern hard drives have hardware to detect a power failure and stop the write head at the next sector boundary... So in practice you will not see more than two corrupted sectors after a power failure.
>
> How could it be more than one sector?

Shingled Magnetic Recording (SMR) overlaps tracks, so you're always corrupting the next track whenever you write one. The next track needs to get rewritten too (until you hit a gap between tracks).
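As a rough illustration of that cascade, here is a toy model. The band layout (`band_start`, `band_size`) and the function name are made up for the example; real drives do not expose their band geometry this way.

```python
def tracks_to_rewrite(track: int, band_start: int, band_size: int) -> list:
    """Tracks that must be rewritten when `track` is written in a shingled band.

    Writing one track clobbers part of the next, so every track from the
    written one up to the gap at the end of the band has to be rewritten.
    """
    band_end = band_start + band_size  # exclusive; the gap follows this track
    if not band_start <= track < band_end:
        raise ValueError("track is outside the band")
    return list(range(track, band_end))

# Hypothetical 5-track band starting at track 0: writing track 2 touches 2, 3, 4.
print(tracks_to_rewrite(2, band_start=0, band_size=5))
```

In this model, a power drop partway through the cascade can leave more than one sector in an inconsistent state, which is one way to get past the one-sector limit.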

McKenney: Stupid RCU Tricks: rcutorture Catches an RCU Bug

Posted Nov 25, 2014 17:11 UTC (Tue) by Wol (subscriber, #4433) [Link]

Actually, there's a lot they could do, and it's called "transaction logging".

The whole point of which is to save the same data twice, and thus enable the system to detect that something bad has happened. Of course, what it does after detecting trouble depends on how bad the trouble is, but it is mandatory that it either replays the log to properly update the data, or reverts the log to properly undo the transaction.

It doesn't matter WHAT happens to the disk: so long as there is not a disk failure that randomly scribbles over the disk, Postgres (and pretty much any other database) will provide guarantees that enable you to get back to a clean state, either pre- or post- attempted write.
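A minimal sketch of the idea, not how Postgres actually implements its log: each record is written with a checksum and synced before it is reported safe, and recovery replays records until it hits one that fails the check. The record format and the names `append_record` and `replay` are invented for this example.

```python
import json
import os
import zlib

def append_record(log_path: str, record: dict) -> None:
    """Append one checksummed record and force it to stable storage."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    line = b"%08x %s\n" % (zlib.crc32(payload), payload)
    with open(log_path, "ab") as log:
        log.write(line)
        log.flush()
        os.fsync(log.fileno())  # only report "safe" after this returns

def replay(log_path: str) -> dict:
    """Rebuild key/value state from the log, stopping at the first bad record."""
    state = {}
    if not os.path.exists(log_path):
        return state
    with open(log_path, "rb") as log:
        for line in log:
            checksum, sep, payload = line.rstrip(b"\n").partition(b" ")
            try:
                ok = sep == b" " and int(checksum, 16) == zlib.crc32(payload)
            except ValueError:
                ok = False
            if not ok:
                break  # torn or corrupted tail: ignore it and anything after
            record = json.loads(payload)
            state[record["key"]] = record["value"]
    return state
```

A record torn by a power drop fails its checksum and is dropped, so the recovered state is the one before the interrupted write. This only holds if the damage is confined to the record being written, which is exactly the assumption under debate in this thread.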

For example, in the database I want to write I'm planning to use COW (copy-on-write). Provided (big if :-( ) the OS doesn't muck up my user-space write order, that's guaranteed not to corrupt data.
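One common user-space way to get copy-on-write behaviour for a whole file is write-new-then-rename. The sketch below assumes a POSIX system where a rename within one directory is atomic and where a directory can be opened and synced; `atomic_replace` is a made-up name, and this is not a description of any particular database's design.

```python
import os
import tempfile

def atomic_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data` without ever modifying the old copy in place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # new copy is on disk before it becomes visible
        os.replace(tmp_path, path)  # switch from old copy to new copy
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself survives a crash (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The explicit `fsync` calls are there to enforce the write order: the new data must be durable before the rename points at it. Without them the OS is free to reorder the two, which is the "big if" above.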

Cheers,
Wol

McKenney: Stupid RCU Tricks: rcutorture Catches an RCU Bug

Posted Dec 4, 2014 2:50 UTC (Thu) by dw (guest, #12017) [Link] (1 responses)

Manufacturers have been promising such things, and for a very long time. Here is the manual for a 26-year-old drive, where section 2.1.4 clearly indicates “No damage or loss of data will occur if power is applied or removed during drive operation, except that data may be lost in the sector being written at the time of power loss”. (With thanks to Howard Chu for excavating that little nugget.)

McKenney: Stupid RCU Tricks: rcutorture Catches an RCU Bug

Posted Dec 8, 2014 9:57 UTC (Mon) by arnd (subscriber, #8866) [Link]

There is at least evidence for CF cards generally not doing this, and embedded designs that store data on these sometimes use a large capacitor to power the card for long enough to write all cached data (not a lot) and perform a garbage collection (this may take a second or two).

I wouldn't be surprised if consumer-grade SSDs have similar problems without such workarounds.