Well, «crazy things»? Over the past 8 days, I have had 5 servers (out of ~200) with data corruption on ext4 partitions (over LVM, over Xen blockfront/blockback, over DRBD, over LVM), especially on partitions mounted with defaults,noatime,nodev,nosuid,noexec,data=ordered (MySQL InnoDB data).
I thought the problem was with DRBD, then I saw this news... so now I don't know. Is it really the only way to trigger that problem?
Posted Oct 25, 2012 0:31 UTC (Thu) by nix (subscriber, #2304)
It's the only way I've been able to find. /sbin/reboot -f on a system with mounted filesystems does not trigger this problem. Reboot after unmounting does not trigger this problem. Reboot *during* a umount, and *boom* goodbye fs.
I have speculated on ways to fix this for good, though they require a new syscall, a new userspace utility, changes to shutdown scripts, that others on l-k agree my idea is not utterly insane, and for me to bother to implement all of this. The latter is questionable, given the number of things I mean to do that I never get around to. :)
Ext4 data corruption trouble
Posted Oct 25, 2012 1:14 UTC (Thu) by luto (subscriber, #39314)
Fix what for good?
If you want to cleanly unmount everything, presumably you want (a) revoke and (b) unmount-the-$!%@-fs-even-if-it's-in-use. (I'd like both of these.)
If you want to know when filesystems are gone, maybe you want to separate the processes of mounting things into the FS hierarchy from loading a driver for an FS. Then you could force-remove-from-hierarchy (roughly equivalent to umount -l) and separately wait until the FS is no longer loaded (which has nothing to do with the hierarchy).
If you want your system to be reliable, the bug needs fixing.
Ext4 data corruption trouble
Posted Oct 25, 2012 1:25 UTC (Thu) by dlang (✭ supporter ✭, #313)
As I understand his post, there are two big issues.
1. You can't even try to unmount a filesystem if it's mounted under another filesystem that you can't reach. For example:
    mount /dev/sda /
    mount remote:/something /something
    mount /dev/sdb /something/else
Now if remote goes down, you have no way of cleanly unmounting /dev/sdb.
2. Even with #1 solved, namespaces cause problems: with namespaces, it is now impossible for any one script to unmount everything, or even to find which PIDs need to be killed in all the PID namespaces to make a filesystem idle so that it can be unmounted.
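To make the first issue concrete, here is a minimal Python sketch (not from the original post) that parses text in the /proc/self/mountinfo format and lists the mounts stacked beneath a given mount point, deepest first. The function names and the sample table, including its mount IDs and device numbers, are invented for illustration.

```python
def mount_points(mountinfo_text):
    """Return the mount point (5th field) of every mountinfo line."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points.append(fields[4])
    return points


def mounts_below(mountinfo_text, parent):
    """Mount points strictly beneath `parent`, deepest first."""
    prefix = parent.rstrip("/") + "/"
    below = [
        p for p in mount_points(mountinfo_text)
        if p.startswith(prefix) and p != parent
    ]
    # Deeper paths have more components, so they come first in unmount order.
    return sorted(below, key=lambda p: p.count("/"), reverse=True)


# Hypothetical table mirroring the three mounts in the example above.
SAMPLE = """\
20 1 8:0 / / rw - ext4 /dev/sda rw
30 20 0:40 / /something rw - nfs remote:/something rw
31 30 8:16 / /something/else rw - ext4 /dev/sdb rw
"""

if __name__ == "__main__":
    print(mounts_below(SAMPLE, "/something"))
```

This only shows which mounts sit under the unreachable one; it does nothing to make them reachable.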
Ext4 data corruption trouble
Posted Oct 25, 2012 1:30 UTC (Thu) by ewen (subscriber, #4772)
Wouldn't "mount -o remount /dev/sdb" solve the first problem? In theory it should close off the journal and get the file system into a stable state, but not require the non-responsive NFS server to reply. And in theory it should be safe to force unmount a read-only file system, once it's reached that read-only/stable state.
However, finding all the file systems in the face of many PID/filesystem namespaces is still non-trivial.
Ewen
Ext4 data corruption trouble
Posted Oct 25, 2012 1:56 UTC (Thu) by nix (subscriber, #2304)
dlang has it right: that's the problem I was trying to solve with this lazy umount kludge. And for many, many years, it worked!
I had no idea you could use remounting (plus, presumably, readonly remounting) on raw devices like that. That might work rather well in my case: all my devices are in one LVM VG, so I can just do a readonly remount on /dev/$vgname/*.
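As a rough sketch of that idea (an illustration, not a tested shutdown script), the Python below builds one read-only remount command per mounted device of a volume group from text in /proc/mounts format. It assumes the mount table lists devices under /dev/$vgname/; many systems show /dev/mapper names instead. The function name and sample table are hypothetical, and the commands are only built, never run.

```python
def readonly_remount_commands(mounts_text, vgname):
    """Build one `mount -o remount,ro DEVICE` argv per mounted device of the VG."""
    prefix = "/dev/" + vgname + "/"
    commands = []
    seen = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        device = fields[0]
        # Emit each device once, even if it is mounted in several places.
        if device.startswith(prefix) and device not in seen:
            seen.add(device)
            commands.append(["mount", "-o", "remount,ro", device])
    return commands


# Hypothetical mount table; "vg0" is an invented volume group name.
SAMPLE_MOUNTS = """\
/dev/vg0/root / ext4 rw 0 0
/dev/vg0/var /var ext4 rw,noatime 0 0
remote:/something /something nfs rw 0 0
/dev/vg0/var /srv/var ext4 rw,noatime 0 0
"""

if __name__ == "__main__":
    for argv in readonly_remount_commands(SAMPLE_MOUNTS, "vg0"):
        print(" ".join(argv))
```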
But in the general case, including PID and fs namespaces, that's really not going to work, indeed.
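For a sense of what the general case involves, this Python sketch groups visible processes by mount namespace using the /proc/PID/ns/mnt symlinks. The function names are hypothetical. It only sees processes visible in the /proc it reads, so run inside a nested PID namespace it would miss the rest, which is the limitation being discussed.

```python
import os
from collections import defaultdict


def read_mount_namespaces(proc_root="/proc"):
    """Yield (pid, namespace_id) for each process whose ns/mnt link is readable."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        link = os.path.join(proc_root, entry, "ns", "mnt")
        try:
            yield int(entry), os.readlink(link)
        except OSError:
            # The process exited, or we lack permission to inspect it.
            continue


def group_by_namespace(pairs):
    """Map each namespace id to the sorted list of PIDs living in it."""
    groups = defaultdict(list)
    for pid, ns in pairs:
        groups[ns].append(pid)
    return {ns: sorted(pids) for ns, pids in groups.items()}


if __name__ == "__main__":
    for ns, pids in group_by_namespace(read_mount_namespaces()).items():
        # Each namespace has its own mount table, readable via any member PID.
        print(ns, len(pids), "processes; table at /proc/%d/mountinfo" % pids[0])
```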
Ext4 data corruption trouble
Posted Oct 25, 2012 3:50 UTC (Thu) by ewen (subscriber, #4772)
Yes, I did intend to say "mount -o remount,ro /dev/sdb". For years it's been my usual "try to minimise the harm" approach when dealing with a server that is stuck because some mounts are not responding. I'm not sure what happens with a modern server where the same volume is mounted in more than one location (hopefully all the mounts end up read-only). But it definitely works with /dev/mapper/$vgname-$lvname, for instance, if it's only mounted once.
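One detail worth sketching when scripting this: device-mapper joins the VG and LV names with a single hyphen, so a hyphen inside either name appears doubled in the /dev/mapper path. The helper names below are hypothetical, and the command is only built, not executed.

```python
def mapper_path(vgname, lvname):
    """Return the /dev/mapper path for a logical volume.

    Hyphens inside the VG or LV name are doubled, because a single
    hyphen is used as the separator between the two names.
    """
    return "/dev/mapper/%s-%s" % (
        vgname.replace("-", "--"),
        lvname.replace("-", "--"),
    )


def remount_ro_argv(vgname, lvname):
    """Build the argv for a read-only remount of one logical volume."""
    return ["mount", "-o", "remount,ro", mapper_path(vgname, lvname)]


if __name__ == "__main__":
    # "vg0" and "mysql" are invented names for illustration.
    print(" ".join(remount_ro_argv("vg0", "mysql")))
```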
Ewen
Ext4 data corruption trouble
Posted Oct 25, 2012 11:16 UTC (Thu) by nix (subscriber, #2304)
Bind mounts will be fine with this: they all share the same read-only state unless explicitly otherwise requested.
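One way to check this on a given system is to read /proc/self/mountinfo, which records per-mount options and superblock options in separate fields. The Python sketch below parses that format. The function name and the sample lines are invented for illustration and are not a claim about what the kernel reports after a remount.

```python
def readonly_state(mountinfo_text):
    """Map each mount point to (device, per-mount ro?, superblock ro?)."""
    state = {}
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue
        # A lone "-" separates the optional fields from the fs type,
        # mount source, and superblock options.
        sep = fields.index("-")
        if sep < 6 or len(fields) < sep + 4:
            continue
        mount_ro = "ro" in fields[5].split(",")
        super_ro = "ro" in fields[sep + 3].split(",")
        state[fields[4]] = (fields[2], mount_ro, super_ro)
    return state


# Hypothetical table: one filesystem (8:16) visible at two mount points.
SAMPLE_INFO = """\
31 20 8:16 / /something/else rw,noatime - ext4 /dev/sdb ro,data=ordered
45 20 8:16 / /srv/copy ro,noatime shared:1 - ext4 /dev/sdb ro,data=ordered
"""

if __name__ == "__main__":
    for point, (dev, mount_ro, super_ro) in readonly_state(SAMPLE_INFO).items():
        print(point, dev, "mount ro:", mount_ro, "superblock ro:", super_ro)
```

Mounts that share a major:minor pair share a superblock, so their superblock flag should agree; the per-mount flag is the part that can be set separately.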