Jul 23 19:06:57 sysresccd attempt to access beyond end of device
Jul 23 19:06:57 sysresccd sda: rw=0, want=198724030, limit=66055248
Jul 23 19:07:20 sysresccd attempt to access beyond end of device
Jul 23 19:07:20 sysresccd sda: rw=0, want=198723798, limit=66055248
One weird thing is that the LVM on the main disk that hosted the root FS didn't show any LVM related errors, but that was the one with the major corruption. Of course the backup LVM had major corruption but I wasn't focusing on that. In fact a generally odd thing is that the Ubuntu logs didn't show any errors on either disk (i.e. ext3 or LVM type errors), apart from the 'FS remounted' one, yet SystemRescueCD showed them right away.
Another weird thing is that despite the root FS being remounted read-only, the logs in /var were still being written to for 10 days after the first root corruption - surely this is a bug as it can only increase FS corruption.
I haven't yet run a memory test but the system doesn't show any other signs of bad RAM such as randomly crashing applications. The logs also don't show any disk hardware errors.
Anyway, the lesson is simple: never, ever use LVM again. Gparted is pretty good these days for resizing/moving partitions, and the time I have saved on LVM is far less than the hassle of this recovery exercise.
Sorry for going so far off topic, but perhaps LWN would like to write a piece on data loss bugs and how best the community should address them - maybe starting with LVM...
Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds