Posted Nov 30, 2011 12:49 UTC (Wed) by tialaramex (subscriber, #21167)
In reply to: Improving ext4: bigalloc, inline data, and metadata checksums by nix
Parent article: Improving ext4: bigalloc, inline data, and metadata checksums

"occasional instances of bitflips in the page cache"

To someone who isn't looking for RAM/cache issues as the root cause, those often look just like filesystem corruption of one kind or another. They try to open a file and get an error saying it's corrupted, or they run a program and it mysteriously crashes.

If you _already know_ you have bad RAM, then you say "Ha, bitflip in page cache" and maybe you flush a cache and try again. But if you've already begun to harbour doubts about Seagate disks, or Dell RAID controllers, or XFS, then of course that's what you will tend to blame for the problem.
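
For what it's worth, the "flush a cache and try again" step could be sketched roughly as below. This is only an illustration, assuming Linux and Python 3.3 or later; the helper names (sha256_of, evict_from_page_cache, reread_matches) are made up for this example. It hashes a file, asks the kernel to drop that file's cached pages with posix_fadvise(POSIX_FADV_DONTNEED), then hashes it again. The eviction request is advisory, so the kernel is free to ignore it.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evict_from_page_cache(path):
    """Ask the kernel to drop cached pages for one file (advisory only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Dirty pages cannot be dropped, so write them back first.
        # (Assumption: fsync on a read-only descriptor is allowed, as on Linux.)
        os.fsync(fd)
        # offset=0, length=0 means "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


def reread_matches(path):
    """Hash the file, evict it from the page cache, and hash it again.

    Returns (matches, digest_before, digest_after).
    """
    before = sha256_of(path)
    evict_from_page_cache(path)
    after = sha256_of(path)
    return before == after, before, after
```

If the two digests differ, one of the two reads saw bad data, which points at memory or the I/O path rather than at what is stored on disk. If they match, that proves very little: the pages may never have been evicted, or the flipped bit may already have been written back.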

Posted Dec 1, 2011 19:23 UTC (Thu) by nix (subscriber, #2304)

This does depend on how bad the RAM was. The RAM on this machine was so bad that the fs was not the only thing misbehaving by any means.

Rare bitflips are normally going to be harmless or fixed up by e2fsck, one would hope. There may be places where a single bitflip, written back, toasts the fs, but I'd hope not. (The various fs fuzzing tools would probably have helped comb those out.)

