I based these changes mostly on the well-known lack of journal checksumming in ext3 (hence going to data=journal and avoiding write caching); see http://en.wikipedia.org/wiki/Ext3#No_checksumming_in_journal. Dropping LVM is harder to justify. It's really just a hunch, based on a number of reports of LVM being involved in data corruption and on my own data point: the LVM volumes on one disk were completely inaccessible (i.e. the LVM metadata was corrupted). So ext3 was not the only thing involved here, though write caching might have played a part in that as well.
I'm interested to hear responses that show these steps are unnecessary, of course.
I really doubt the hardware is broken: there are no disk I/O errors in any of the logs, two disks were corrupted (1 SATA, 1 PATA), and there are no symptoms of memory errors (such as random application or system crashes).
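
For anyone wanting to check the data=journal step on their own machine, here is a rough sketch in Python, not something I ran as part of the changes above. It assumes the text it is given is in the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options) and that the journalling mode shows up there as a data= option, which may not hold on every kernel. The function name and the sample lines are made up for illustration.

```python
def ext3_mounts_without_data_journal(mounts_text):
    """Return (device, mountpoint) pairs for ext3 mounts lacking data=journal.

    Assumes /proc/mounts-style lines: device mountpoint fstype options ...
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if fstype != "ext3":
            continue  # only ext3 is of interest here
        if "data=journal" not in options.split(","):
            flagged.append((device, mountpoint))
    return flagged


if __name__ == "__main__":
    # Hypothetical sample; on a real system, read /proc/mounts instead.
    sample = (
        "/dev/sda1 / ext3 rw,errors=remount-ro,data=ordered 0 0\n"
        "/dev/sdb1 /home ext3 rw,data=journal 0 0\n"
        "proc /proc proc rw 0 0\n"
    )
    for device, mountpoint in ext3_mounts_without_data_journal(sample):
        print(f"{device} on {mountpoint} is not mounted with data=journal")
```

This only reports the mount options; it says nothing about whether the drive's write cache is enabled, which has to be checked separately.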