Posted Nov 19, 2011 18:12 UTC (Sat) by jzbiciak (✭ supporter ✭, #5246)
I started going ECC despite the added cost once I got up to 512MB in my system. I had a non-ECC system flip some bits in a cached disk block (well, I *think* that's what happened, at least), and lost a bunch of files that way.
I've been running ECC RAM ever since.
Funny: a similar experience taught me never to waste money on ECC
Posted Nov 20, 2011 8:22 UTC (Sun) by khim (subscriber, #9252)
Funny. I had a similar experience about 10 years ago. In my case, though, the random bits were flipped by an uber-expensive RAID server on our brand-new central server (well, the system was supposed to be our central server, but we could not use it that way because it corrupted files).
After a few months the culprit was found: the riser card was at fault. It had a bad contact with the RAID controller. Once it was replaced, everything started working fine.
This incident showed me that ECC is often just a waste of money: it does nothing about corruption that happens outside RAM, so you still need to protect important data with SHA1 or something similar.
I stopped using ECC and switched to SHA1. This saved our bacon when someone got the bright idea to use XFS on a production server and files were randomly broken on power-off. We had a few similar incidents after that, and I cannot recall a single one where cheap non-ECC memory was the culprit.
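For anyone wondering what "protect important data with SHA1" can look like in practice, here is a minimal sketch in Python. It is only an illustration, not the setup described above: the function names (`sha1_of_file`, `build_manifest`, `find_corrupted`) are made up for this example. It assumes you record one digest per file while the data is known to be good, and later recompute and compare.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-1 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha1_of_file(full)
    return manifest


def find_corrupted(root, manifest):
    """Return sorted relative paths that are missing or whose digest changed."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha1_of_file(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

Call `build_manifest(root)` once while the files are known to be good, keep the result somewhere other than the storage being checked, and run `find_corrupted(root, manifest)` after any suspicious event such as an unclean power-off. This only detects corruption; recovering the damaged files still needs a backup or another copy.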