ext4 experience

Posted Nov 30, 2011 2:11 UTC (Wed) by dskoll (subscriber, #1630)
In reply to: Improving ext4: bigalloc, inline data, and metadata checksums by yokem_55
Parent article: Improving ext4: bigalloc, inline data, and metadata checksums

I also had a very nasty experience with ext4. A server I built using ext4 suffered a power failure and the file system was completely toast after it powered back up. fsck threw hundreds of errors and I ended up rebuilding from scratch.

I have no idea if ext4 was the cause of the problem, but I've never seen that on an ext3 system. I am very nervous... possibly irrationally so, but I think I'll stick to ext3 for now.


ext4 experience

Posted Nov 30, 2011 4:52 UTC (Wed) by ringerc (subscriber, #3071)

The usual culprit in those sorts of severe corruption or loss cases is aggressive write-back caching without battery backup. Some cheap RAID controllers will let you enable write-back caching without a BBU, and some HDDs support it too.

Write-back caching on volatile storage without careful use of write barriers and forced flushes *will* cause severe data corruption if the cache contents are lost, e.g. through unexpected power loss.

ext4 experience

Posted Nov 30, 2011 9:00 UTC (Wed) by Cato (subscriber, #7643)

You are right about battery backup. Every modern hard disk uses writeback caching, and some of them make it hard to guarantee that the cache is actually flushed when the kernel issues a write barrier. The size of hard disk caches (32 MB typically) and the use of journalling filesystems (which concentrate key metadata writes in journal blocks) can mean that a power loss or hard crash loses a large amount of filesystem metadata.
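
To put the 32 MB figure in perspective, here is a rough back-of-the-envelope sketch. The 4 KiB block size is an assumption (a common filesystem default, not something stated above), and the helper name is purely illustrative.

```python
# Rough estimate of how many filesystem blocks can sit in a drive's
# volatile write cache at the moment power is lost.
CACHE_BYTES = 32 * 1024 * 1024  # 32 MB cache, the typical size quoted above
BLOCK_BYTES = 4096              # ASSUMPTION: 4 KiB filesystem blocks


def blocks_in_cache(cache_bytes=CACHE_BYTES, block_bytes=BLOCK_BYTES):
    """Return how many whole blocks fit in the cache."""
    return cache_bytes // block_bytes


print(blocks_in_cache())  # 8192
```

Under those assumptions, up to a few thousand unwritten blocks could be pending in the drive, and with a journalling filesystem a good share of them may be journal blocks.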

ext4 experience

Posted Nov 30, 2011 12:40 UTC (Wed) by dskoll (subscriber, #1630)

My system was using Linux Software RAID, so there wasn't a cheap RAID controller in the mix. You could be correct about the hard drives doing caching, but it seems odd that I've never seen this with ext3 but did with ext4. I am still hoping it was simply bad luck, bad timing, and writeback caching... but I'm also still pretty nervous.

ext4 experience

Posted Nov 30, 2011 12:50 UTC (Wed) by dskoll (subscriber, #1630)

Ah... reading http://serverfault.com/questions/279571/lvm-dangers-and-caveats makes me think I was a victim of LVM and no write barriers. I've followed the suggestions in that article. So maybe I'll give ext4 another try.
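
For anyone wanting a quick first check, the sketch below scans mount-table text (in the format of /proc/mounts) for ext3/ext4 filesystems that have barriers explicitly switched off via the `nobarrier` or `barrier=0` mount options. The function name and the sample data are made up for illustration, and this is only a partial check, not a guarantee of safety.

```python
def find_barrierless_mounts(mounts_text):
    """Return (device, mountpoint, fstype) tuples for ext3/ext4 mounts
    whose options explicitly disable write barriers."""
    risky = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if fstype not in ("ext3", "ext4"):
            continue
        opts = options.split(",")
        if "nobarrier" in opts or "barrier=0" in opts:
            risky.append((device, mountpoint, fstype))
    return risky


# Hypothetical sample in /proc/mounts format (not taken from a real system).
SAMPLE = """\
/dev/md0 / ext4 rw,relatime,nobarrier 0 0
/dev/sda1 /boot ext3 rw,barrier=1 0 0
proc /proc proc rw 0 0
"""

for entry in find_barrierless_mounts(SAMPLE):
    print("barriers disabled:", *entry)
```

On a real machine you would pass in the contents of /proc/mounts instead of the sample. An empty result only says that barriers were not explicitly turned off; it cannot show whether the layers underneath (LVM, software RAID, the drive itself) actually honor them.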

ext4 experience

Posted Nov 30, 2011 20:20 UTC (Wed) by walex (subscriber, #69836)

You have been wishing for O_PONIES!

It is a very well-known issue, usually involving unaware sysadmins and cheating developers.
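
On the developer side, the usual way to avoid relying on ponies is to flush explicitly before renaming. Below is a minimal sketch of that write, fsync, rename pattern for a POSIX system; `atomic_write` is a hypothetical helper name, not part of any library, and it only helps if the storage stack underneath really honors flushes.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) via a temporary file, so that
    readers see either the old contents or the new ones."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on
    # one filesystem. Note that mkstemp creates it with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to push it to the device
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Also flush the directory entry, so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `atomic_write("settings.conf", b"key=value\n")` would replace the file in one step. Skipping the `os.fsync` calls and hoping the filesystem orders things sensibly is exactly the wish being mocked here.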

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds