I've now had a look at the machine. There are two disks, both with mostly LVMed partitions, and one disk is showing classic signs of LVM errors resulting in ext3 corruption (not the one with the root FS, but the one carrying several LVs used for local backup). I booted SystemRescueCD initially, and the LVM commands showed the LVM state was quite messed up (the sort of checks I ran are sketched after the log excerpt), plus log messages like this:
Jul 23 19:06:57 sysresccd attempt to access beyond end of device
Jul 23 19:06:57 sysresccd sda: rw=0, want=198724030, limit=66055248
Jul 23 19:07:20 sysresccd attempt to access beyond end of device
Jul 23 19:07:20 sysresccd sda: rw=0, want=198723798, limit=66055248
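For reference, this is roughly what I ran from SystemRescueCD to inspect the LVM state - just the standard LVM2 reporting and check commands; the volume group name below is an assumption, not the real one on this machine:

# Show physical volumes, volume groups and logical volumes, with state
pvs -v
vgs -v
lvs -a -o +devices

# Check the VG metadata for consistency (VG name is assumed)
vgck vg_backup
vgdisplay -v vg_backup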
One weird thing is that the LVM on the main disk, the one hosting the root FS, didn't show any LVM-related errors, yet that was the disk with the major ext3 corruption. Of course the backup LVM had major corruption too, but I wasn't focusing on that. A generally odd thing is that the Ubuntu logs didn't show any errors on either disk (i.e. ext3 or LVM type errors), apart from the 'FS remounted' one, yet SystemRescueCD showed them right away.
Another weird thing is that despite the root FS being remounted read-only, the logs in /var were still being written to for 10 days after the first root corruption. Unless /var is actually a separate filesystem (something I still need to check), surely this is a bug, as continuing to write to a known-damaged FS can only increase the corruption.
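For what it's worth, a quick way to check whether /var really is part of the root FS, and what ext3 is set to do on errors (the device name below is an assumption):

# Is /var a separate mount, or on the root FS?
grep -E ' / | /var ' /proc/mounts

# ext3 error behaviour recorded in the superblock (device name assumed)
tune2fs -l /dev/mapper/vg_main-root | grep -i 'errors'
# 'Errors behavior: remount-ro' means writes to that FS should stop once corruption is detected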
I haven't yet run a memory test, but the system doesn't show any other signs of bad RAM, such as randomly crashing applications, and the logs don't show any disk hardware errors either.
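When I do get to it, the obvious checks are a memtest86+ pass and a look at the SMART data with smartmontools; the device name below is an assumption:

# SMART health summary and the attributes that usually flag a dying disk
smartctl -H /dev/sda
smartctl -a /dev/sda | grep -iE 'reallocated|pending|uncorrect|error'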
Anyway, the lesson is simple: never, ever use LVM again. Gparted is pretty good these days for resizing/moving partitions, and the time LVM has saved me is far outweighed by the hassle of this recovery exercise.
Sorry for going so far off topic, but perhaps LWN would like to write a piece on data loss bugs and how best the community should address them - maybe starting with LVM...