Ext4 data corruption trouble [Updated]

By Jonathan Corbet
October 24, 2012

Stable kernel updates are supposed to be just that — stable. But they are not immune to bugs, as a recent ext4 filesystem problem has shown. In short: ext4 users would be well advised to avoid versions 3.4.14, 3.4.15, 3.5.7, 3.6.2, and 3.6.3; they all contain a patch which can, in some situations, cause filesystem corruption.

The problem, as explained in this note from Ted Ts'o, has to do with how the ext4 journal is managed. In some situations, unmounting the filesystem fails to truncate the journal, leaving stale (but seemingly valid) data there. After a single unmount/remount (or reboot) cycle little harm is done; some old transactions just get replayed unnecessarily. If the filesystem is quickly unmounted again, though, the journal can be left in a corrupted state; that corruption will be helpfully replayed onto the filesystem at the next mount.

Fixes are in the works. The ext4 developers are taking some time, though, to be sure that the problem has been fully understood and completely fixed; there are signs that the bug may have roots far older than the patch that actually caused it to bite people. Once that process is complete, there should be a new round of stable updates (possibly even for 3.5, which is otherwise at end-of-life) and the world will be safe for ext4 users again.

(Thanks are due to LWN reader "nix" who alerted readers in the comments and reported the bug to the ext4 developers).

Update: Ted now thinks that his initial diagnosis was incomplete at best; the problem is not as well understood as it seemed. Stay tuned.

Ext4 data corruption trouble

Posted Oct 24, 2012 19:27 UTC (Wed) by spender (guest, #23067) [Link] (4 responses)

Personally, I'm tired of the "masturbating monkeys" like nix telling me that a bug in Linux causes data corruption. It seems like whenever there's one of these data corruption bugs present, someone's spouting off at the mouth about it! When will people like him learn that these kinds of articles are offensive and unnecessary, as (Linus wisely tells us) all bugs are of equal importance? Let's not glorify these data corruption bugs, and may our systems be destroyed as they so richly deserve.

</sarcasm>
-Brad

Ext4 data corruption trouble

Posted Oct 24, 2012 20:14 UTC (Wed) by nix (subscriber, #2304) [Link] (2 responses)

Sorry, I didn't realise Ted Ts'o and Linus Torvalds were the same person, or obliged to respond in the same way to all requests. (I also didn't think Linus had much to do with stable kernel management. But perhaps Greg KH is the same person as well.)

Ext4 data corruption trouble

Posted Oct 25, 2012 0:31 UTC (Thu) by dirtyepic (guest, #30178) [Link]

It's Torvalds all the way down.

Ext4 data corruption trouble

Posted Oct 25, 2012 4:51 UTC (Thu) by cyanit (guest, #86671) [Link]

They are just different manifestations of the overmind.

Chastity for monkeys

Posted Oct 25, 2012 9:43 UTC (Thu) by man_ls (guest, #15091) [Link]

I don't agree. In fact all patches to ext4 should carry a warning "Danger! May eat your data alive!" lest someone be misled by the kernel's "integrity through obscurity" evil policy.

</backSarcasm>

Ext4 data corruption trouble

Posted Oct 24, 2012 20:11 UTC (Wed) by job (guest, #670) [Link] (26 responses)

So, to avoid corruption, should I make sure to write a couple of hundred megabytes of extraneous data to make sure the journal wraps before upgrading to a fixed kernel?

Ext4 data corruption trouble

Posted Oct 24, 2012 20:19 UTC (Wed) by nix (subscriber, #2304) [Link] (25 responses)

That... may not work. I'm seeing corruption on journal replay of any ext4 filesystem mounted read-write in 3.6.3 and then uncleanly unmounted (for the full deeply sordid story of why basically random fses on my system get unmounted uncleanly on a clean shutdown, see my latest LKML post to the relevant thread: mea culpa for never revisiting the ugly shutdown script hack in question in many years, but I'd honestly forgotten it was there).

I suspect /usr/src survived simply because, in the 3.6.3 session when I wrote to it, it happened to get cleanly unmounted rather than uncleanly. (Why /var gets hit so much more often than other dirty filesystems, I still am not quite clear on. I suspect something using /var is not dying when it should when I shut my system down, holding /var open so it never gets unmounted properly and the bug always hits it.)

Ext4 data corruption trouble

Posted Oct 24, 2012 20:50 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (7 responses)

>mea culpa for never revisiting the ugly shutdown script hack in question in many years, but I'd honestly forgotten it was there
But we certainly don't need no systemd!

Ext4 data corruption trouble

Posted Oct 24, 2012 21:02 UTC (Wed) by nix (subscriber, #2304) [Link] (6 responses)

Can systemd cope with the case this hack was handling, to wit a filesystem hierarchy like so:

/local-foo/loopback-nfs-mounted-foo/local-foo

In this situation, you cannot unmount local filesystems until you've killed everything that may have them as current directories -- but you can't kill local processes either, because that'll render your loopback-NFS mount, and everything underneath it including the local mount, inaccessible, and you don't learn this until your umount on reboot stalls indefinitely, which is tough if you're hundreds of miles away and working remotely and this is your main fileserver.

So I resorted to lazy-unmounting everything. But this unfortunately means that umount returns before the *successful* umounts are complete. So I sleep for a bit after umounting... but not necessarily long enough. This is all really gross and unclean and disgusting, but it's been working for many years so I forgot it existed and never tried to make it less gross.

(Aside: this may well go wrong even if I sleep for ages, in which case all that is needed to trigger this is a non-unmounted fs, not an fs halfway through unmounting. I'll be testing that next.)

Ext4 data corruption trouble

Posted Oct 24, 2012 21:26 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

As I understand, systemd tries killing/unmounting in the topoligical order. So if there's an order in which your processes can be killed and filesystems unmounted - it can do this.

Ext4 data corruption trouble

Posted Oct 24, 2012 21:36 UTC (Wed) by nix (subscriber, #2304) [Link] (4 responses)

Topological order won't help, alas, not unless it identifies processes which have files open on filesystems or current directories on such filesystems and toposorts *them* (note further that an unambiguous toposort in this case many not be possible, e.g. if you had a weird userspace fileserver serving /foo and /foo/bar, and that fileserver had a current directory set to /foo/bar...)

Raw umount(8) does a toposort unmount as well. It is not enough.

Ext4 data corruption trouble

Posted Oct 24, 2012 22:00 UTC (Wed) by tomegun (guest, #56697) [Link] (3 responses)

If you use "the right" initramfs (e.g. dracut or Arch's mkinitramfs) with systemd, it might work better.

In that case systemd will jump back to the initramfs on shutdown, and the initramfs will then try to kill/unmount whatever processe/mounts remains in the rootfs.

Ext4 data corruption trouble

Posted Oct 25, 2012 11:16 UTC (Thu) by nix (subscriber, #2304) [Link] (2 responses)

Sure! But... what if you have a mount point causing stalls (perhaps relating to an inaccessible NFS server), with mounted local filesystems buried beyond it? If you do a umount, rather than a umount -l, your shutdown will lock up forever as soon as it hits that mount point.

Worse yet, what if you have processes in other PID namespaces, holding open filesystems in other filesystem namespaces? The initramfs can't even see them! *No* umount loop can fix that. I hate adding new syscalls, but I really do think we need a new 'unmount the world' syscall which can cross such boundaries :(

Ext4 data corruption trouble

Posted Oct 25, 2012 12:50 UTC (Thu) by rleigh (guest, #14622) [Link] (1 responses)

It would be nice for such a system call to also work for selected mount namespaces, so that you can be sure everything is consistent after the last process in the namespace exits. (I assume once the namespace no longer has any users, the mounts are automatically umounted?)

Ext4 data corruption trouble

Posted Oct 25, 2012 13:31 UTC (Thu) by nix (subscriber, #2304) [Link]

I think that's so, yes: that's how MNT_DETACH and MNT_EXPIRE are implemented (and, thus, umount -l).

Ext4 data corruption trouble

Posted Oct 24, 2012 23:34 UTC (Wed) by nix (subscriber, #2304) [Link] (14 responses)

[synopsis of my recent email to tytso]

OK, it turns out that you need to do rather crazy things to make this go wrong -- and if you hit it at the wrong moment, 3.6.1 is vulnerable too, and quite possibly every Linux version ever. To wit, you need to disconnect the block device or reboot *during* the umount. This may well be an illegitimate thing to do, but it is unfortunately also quite *easy* to do if you pull out a USB key.

Worse yet, if you umount -l a filesystem, it becomes dangerous to *ever* reboot, because there is as far as I can tell no way to tell when lazy umount switches from 'not yet umounted, mount point still in use, safe to reboot' to 'umount in progress, rebooting is disastrous'.

I still haven't found a way to safely unmount all filesystems if you have local filesystems nested underneath NFS filesystems (where the NFS filesystems may require userspace daemons to be running in order to unmount, and the local filesystems generally require userspace daemons to be dead in order to unmount).

It may work to kill everything whose cwd is not / or which has a terminal, then unmount NFS and local filesystems in succession until you can make no more progress -- but it seems appallingly complicated and grotty, and will break as soon as some daemon holds a file open on a non-root filesystem. What's worse, it leads to shutdown locking up if a remote NFS server is unresponsive, which is the whole reason why I started using lazy umount at shutdown in the first place!

Ext4 data corruption trouble

Posted Oct 25, 2012 0:06 UTC (Thu) by Kioob (guest, #56482) [Link] (7 responses)

Well «crazy things» ? Since 8 days, I had 5 servers (over ~200) with data corruption on ext4 partitions (over LVM, over Xen blockfront/blockback, over DRBD, over LVM). Specialy with partitions mounted with defaults,noatime,nodev,nosuid,noexec,data=ordered (MySQL InnoDB data).

I was thinking about a problem with DRBD, then I saw this news... so... I don't know. Is it really the only way to trigger that problem ?

Ext4 data corruption trouble

Posted Oct 25, 2012 0:31 UTC (Thu) by nix (subscriber, #2304) [Link] (6 responses)

It's the only way I've been able to find. /sbin/reboot -f on a system with mounted filesystems does not trigger this problem. Reboot after unmounting does not trigger this problem. Reboot *during* a umount, and *boom* goodbye fs.

I have speculated on ways to fix this for good, though they require a new syscall, a new userspace utility, changes to shutdown scripts, that others on l-k agree my idea is not utterly insane, and for me to bother to implement all of this. The latter is questionable, given the number of things I mean to do that I never get around to. :)

Ext4 data corruption trouble

Posted Oct 25, 2012 1:14 UTC (Thu) by luto (guest, #39314) [Link] (5 responses)

Fix what for good?

If you want to cleanly unmount everything, presumably you want (a) revoke and (b) unmount-the-$!%@-fs-even-if-it's-in-use. (I'd like both of these.)

If you want to know when filesystems are gone, maybe you want to separate the processes of mounting things into the FS hierarchy from loading a driver for an FS. Then you could force-remove-from-hierarchy (roughly equivalent to umount -l) and separately wait until the FS is no longer loaded (which has nothing to do with the hierarchy).

If you want your system to be reliable, the bug needs fixing.

Ext4 data corruption trouble

Posted Oct 25, 2012 1:25 UTC (Thu) by dlang (guest, #313) [Link] (4 responses)

As I understand his post, there are two big issues.

1. you can't even try to unmount a filesystem if it's mounted under another filesystem that you can't reach

example

mount /dev/sda /
mount remote:/something on /something
mount /dev/sdb /something/else

now if remote goes down, you have no way of cleanly unmounting /dev/sdb

2. even solving for #1, namespaces cause problems because with namespaces, it is now impossible for any one script to unmount everything, or even to find what pids need to be killed in all the pid namespaces to be able to make a filesystem idle so that is can be unmounted.

Ext4 data corruption trouble

Posted Oct 25, 2012 1:30 UTC (Thu) by ewen (subscriber, #4772) [Link] (3 responses)

Wouldn't "mount -o remount /dev/sdb" solve the first problem? In theory it should close off the journal and get the file system into a stable state, but not require the non-responsive NFS server to reply. And in theory it should be safe to force unmount a read-only file system, once it's reached that read-only/stable state.

However finding all the file systems in the face of many PID/filesystem name spaces is still non-trivial.

Ewen

Ext4 data corruption trouble

Posted Oct 25, 2012 1:56 UTC (Thu) by nix (subscriber, #2304) [Link] (2 responses)

dlang has it right, that's the problem I was trying to solve with this lazy umount kludge. And for many, many years, it worked!

I had no idea you could use remounting (plus, presumably, readonly remounting) on raw devices like that. That might work rather well in my case: all my devices are in one LVM VG, so I can just do a readonly remount on /dev/$vgname/*.

But in the general case, including PID and fs namespaces, that's really not going to work, indeed.

Ext4 data corruption trouble

Posted Oct 25, 2012 3:50 UTC (Thu) by ewen (subscriber, #4772) [Link] (1 responses)

Yes, I did intend to say "mount -o remount,ro /dev/sdb". For years it's been my usual "try to minimise the harm" approach, when dealing with a stuck server due to some mounts not responding. I'm not sure what happens with a modern server where the same volume is mounted in more than one location (hopefully all the mounts end up read-only). But it definitely works with /dev/mapper/$vgname-$lvname for instance if it's only mounted once.

Ewen

Ext4 data corruption trouble

Posted Oct 25, 2012 11:16 UTC (Thu) by nix (subscriber, #2304) [Link]

Bind mounts will be fine with this: they all share the same read-only state unless explicitly otherwise requested.

Ext4 data corruption trouble

Posted Oct 25, 2012 1:26 UTC (Thu) by ewen (subscriber, #4772) [Link] (5 responses)

For the benefit of those following along at home, this appears to be one of the more detailed posts on LMKL:

https://lkml.org/lkml/2012/10/24/620

(there are others earlier/later, but they mostly only make sense in context.)

ObTopic: possibly there may be an ordering of write operations which ensures that the journal close off/journal replay is idempotent (ie, okay to do twice), but it would appear that EXT4 in some kernel versions either doesn't currently have that for some actions or doesn't have sufficient barriers to ensure the writes hit stable storage in that order. So there seems to be a (small) window of block writing vulnerability during the EXT4 unmounting. (Compare with, eg, the FreeBSD Soft Updates file system operation ordering -- http://en.wikipedia.org/wiki/Soft_updates.)

Ewen

Ext4 data corruption trouble

Posted Oct 25, 2012 2:00 UTC (Thu) by nix (subscriber, #2304) [Link] (4 responses)

Those following along at home is probably half the human race, now we have posts on Phoronix, Slashdot *and* Heise. Who the hell submits things like this to random-terrified-user media outlets before we've even characterized the bloody problem? Every one of those posts is inaccurate, of course, through no fault of their own but merely because we didn't yet know what the problem was ourselves, merely that I and one other person were seeing corruption: we obviously started by assuming that it was something obvious and thus fairly serious, but that didn't mean we *expected* that to be true: I certainly expected the final problem to be more subtle, if still capable of causing serious disk corruption (my warning here was just in case it was not).

But now there's a wave of self-sustaining accidental lies spreading across the net, damaging the reputation of ext4 unwarrantedly, and I started it without wanting to.

It's times like this when I start to understand why some companies have closed bug trackers.

Ext4 data corruption trouble

Posted Oct 25, 2012 9:31 UTC (Thu) by man_ls (guest, #15091) [Link] (2 responses)

That which doesn't kill ext4, makes ext4 stronger. Once the general media realize that only a fraction of a percent of users are affected, they will probably post some kind of correction and everything will go back to normal -- and ext4 will be stronger by it.

Remember the stupid "neutrinos faster than light" news where all media outlets were reporting that Einstein had been rebutted, and that we were close to time travel? In the end it was all a faulty hardware connection, the original results were corrected and the speed of light paradigm came out stronger than ever. In that case it was a few hundreds of scientists signing the original paper that started the wildfire, instead of checking and rechecking everything for a few months before publishing such a fundamental result. I hope they are widely discredited now, all 170 of them (I am not joking now, either in the figure or in the malignity).

So in a few days the bug will be pinned to a very specific and uninteresting condition, and ext4 will come out stronger than ever. One data point: I have seen no corruption with 3.6.3, but then I am never rebooting while unmounting. Now I will be unmounting with extra care :)

Ext4 data corruption trouble

Posted Oct 25, 2012 13:33 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

That FTL neutrino case is actually more similar than you thought -- scientific paper publication (and, these days, arxiv) is directly analogous to development lists like lkml -- it is where the practitioners in the field communicate. So having hundreds of scientists sign that paper is quite expected -- they worked on the collaboration, after all. What is unjustified is for the general media to pick up something like that, always and necessarily a work-in-progress, and consider it a finished deal, certain, unchanging.

LWN's coverage of this was much much better, emphasising the unclear and under-investigation nature of the thing.

A few things

Posted Oct 25, 2012 14:03 UTC (Thu) by man_ls (guest, #15091) [Link]

Actually, the "neutrino anomaly" team gave several press conferences and a webcast. Without that attention-seeking part the story would probably not have blown so big. Imagine if Tso had given a press conference explaining the ext4 bug, instead of just dealing with it?

Also, hundreds of names on a paper may be standard practice, but it is ridiculous. Somebody should compute something like the Einstein index but dividing each result by the number of collaborators.

Finally, it appears from the wikipedia article that the Gran Sasso scientists had sat on their results for six months before publishing them. Even though I called for the same embargo in my post, that they did somehow only makes it worse -- but then life is unfair.

Ext4 data corruption trouble

Posted Oct 25, 2012 10:17 UTC (Thu) by cesarb (subscriber, #6266) [Link]

> Who the hell submits things like this to random-terrified-user media outlets before we've even characterized the bloody problem?

Data corruption/loss is scary. Even more than most security problems (a really bad security problem will be used by some joker to erase your data, so a really bad security problem is equivalent to data corruption/loss).

If the data corruption/loss affects the most used and stable filesystem in the Linux world, the steps to reproduce sound reasonably easy to hit by chance (just reboot twice quickly), and the data loss is believed to be prevented by just not upgrading/downgrading a minor point release, it is natural human behavior to want EVERYONE to know RIGHT NOW, so people will not upgrade/will downgrade until it is safe. Thus the posts on every widely read Linux-related news media people could find.

Even now with the problem being shown to happen in less common situations, and with it being suspected of being older than 3.6.1, I would say 3.6.3 is burned, and people will not touch it with a 3-meter pole until 3.6.4 is out. Even if 3.6.4 has no ext4-related patches at all.

Ext4 data corruption trouble

Posted Oct 25, 2012 16:30 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

I should also take a moment to thank my boss at Oracle for not uttering one word of complaint while I did all this repeated reboot/corrupt/reboot-again stuff, even though it was completely unrelated to the job I was supposed to be doing (except insofar as it is hard to work on anything if your filesystem is mangled!)

I did try to work on it only outside working hours, but it's sometimes hard to concentrate on anything else when your filesystems are at risk, so I fear it did compromise my productivity at other times. So, thank you, Elena. :)

Ext4 data corruption trouble

Posted Oct 25, 2012 17:25 UTC (Thu) by cesarb (subscriber, #6266) [Link]

> my boss at Oracle

Did you boss at Oracle tell you to try btrfs instead? ;-)

Ext4 data corruption trouble [Updated]

Posted Oct 25, 2012 1:52 UTC (Thu) by tytso (subscriber, #9993) [Link]

There is a G+ post which folks who are interested might want to follow:

https://plus.google.com/117091380454742934025/posts/Wcc5t...

I also want to assure people that before I send any pull request to Linus, I have run a very extensive set of file system regression tests, using the standard xfstests suite of tests (originally developed by SGI to test xfs, and now used by most of the developers of the major, actively-maintained file systems). So for example, my development laptop, which I am currently using to post this note, is currently running v3.6.3 with the ext4 patches which I have pushed to Linus for the 3.7 kernel. Why am I willing to do this? Specifically because I am constantly running a very large set of automated regression tests on a very regular basis, and certainly before sending the latest set of patches to Linus.

Ext4 data corruption trouble [Updated]

Posted Oct 25, 2012 4:33 UTC (Thu) by butlerm (subscriber, #13312) [Link] (11 responses)

Shouldn't quiescing all mounted filesystems and putting them into a read only state be something that the filesystem itself have responsibility to do? In the kernel, when instructed that the system is about to reboot/sleep/power off?

Expecting user code to track down all mounted filesystems and unmount them in reverse topological order doesn't sound like the sort of thing one would want filesystem integrity to depend on. It sounds like an ugly hack of the first magnitude.

Ext4 data corruption trouble [Updated]

Posted Oct 25, 2012 11:19 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

We are of one mind, I think: <http://lkml.indiana.edu/hypermail/linux/kernel/1210.3/007...>

Ext4 data corruption trouble [Updated]

Posted Oct 25, 2012 22:16 UTC (Thu) by butlerm (subscriber, #13312) [Link]

A 'umountall' system call looks like it could do the job nicely, with one exception.

It might be helpful to have a way to put all filesystems in a read only state, for the benefit of shutdown code that only requires (file level) read access, such as code to shutdown RAID devices.

A more general problem is that you might have loopback mounts and nested block devices, so what you really need is a combined operation that does the topological sort and quiesces filesystems and block devices in reverse stacking order.

Ext4 data corruption trouble [Updated]

Posted Oct 25, 2012 16:02 UTC (Thu) by wahern (subscriber, #37304) [Link] (2 responses)

I still have an ingrained habit of typing `sync' at idle moments in my shell, picked up in the early days of ext2. Re-downloading the Slackware floppy set (because invariably one of the disks of a previous downloaded set would be go bad) over a 2400 baud modem was not fun times. Because things were generally less stable back then, and you never knew when the system might crash and leave the disk corrupt and unbootable, vigorous and frequent syncing was the only alternative.

Ext4 data corruption trouble [Updated]

Posted Oct 26, 2012 0:44 UTC (Fri) by jhardin (guest, #3297) [Link] (1 responses)

I still have an ingrained habit of typing `sync' at idle moments in my shell, picked up in the early days of ext2.

+1, except I picked up the habit on SCO Xenix.

sync;sync;sync

Ext4 data corruption trouble [Updated]

Posted Oct 26, 2012 1:22 UTC (Fri) by pr1268 (guest, #24648) [Link]

I like this solution:

/* 3sync.c */
#include <unistd.h> /* for sync(2) */
#include <time.h>   /* for struct timespec and nanosleep(2) */

int main()
{
    struct timespec ts = { 0, 1000000L };
    sync();
    (void) nanosleep(&ts, 0);
    sync();
    (void) nanosleep(&ts, 0);
    sync();
    return 0;
}

Compiled and placed in $HOME/bin (which is in $PATH), and now it gets used quite frequently in other scripts I run. Which either (1) is horribly inefficient, and/or (2) shows how paranoid I am with data corruption. Sigh.

Ext4 data corruption trouble [Updated]

Posted Oct 26, 2012 0:21 UTC (Fri) by ncm (guest, #165) [Link] (5 responses)

Maybe I'm missing something. Why not start by remounting all the file systems to a synchronous-write mode, first? Then, sync. After that, you can execute a HCF(*) instruction, wait long enough for the lying drives to drain their internal queues, power down, and everything should be fine. Right?

(*) "Halt and Catch Fire"

Ext4 data corruption trouble [Updated]

Posted Oct 26, 2012 0:26 UTC (Fri) by nix (subscriber, #2304) [Link] (4 responses)

That requires that you can *find* all the file systems. In the presence of PID and fs namespaces, no single process can necessarily do that (nor can any single process necessarily talk to any group of processes that can do that, even indirectly).

This is somewhat unlikely, it is true.

Ext4 data corruption trouble [Updated]

Posted Oct 26, 2012 0:45 UTC (Fri) by neilbrown (subscriber, #359) [Link] (3 responses)

echo S > /proc/sysrq-trigger
echo U > /proc/sysrq-trigger

Ext4 data corruption trouble [Updated]

Posted Oct 26, 2012 14:04 UTC (Fri) by nix (subscriber, #2304) [Link]

Oh. Yeah. That would work. I completely forgot that sysrq-trigger even existed. :)

New syscall, who needs it, though relying on sysrq-trigger for something as fundamental as shutting down seems a little icky.

Ext4 data corruption trouble [Updated]

Posted Oct 26, 2012 14:37 UTC (Fri) by butlerm (subscriber, #13312) [Link] (1 responses)

Does that work with loopback mounted filesystems, i.e. is the sysrq handler smart enough to unmount a loopback mounted filesystem before the filesystem that holds its backing store?

Ext4 data corruption trouble [Updated]

Posted Oct 27, 2012 20:17 UTC (Sat) by nix (subscriber, #2304) [Link]

Sysrq-U remounts read-only, it doesn't unmount.

(But, still, you do want to remount the loopback-mounted filesystem first, or that umount won't be able to do e.g. journal flushes...)

And the answer is no: it ends up calling do_emergency_remount(), which does a straight iteration over all super_blocks: there is no dependency analysis of any kind: I'd expect (given the way super_blocks is built) to unmount the backing store fs *before* the loopback-mounted fs.

(Perhaps do_emergency_remount() should iterate over super_blocks in reverse order?)

Ext4 data corruption trouble [Updated]

Posted Oct 27, 2012 20:19 UTC (Sat) by nix (subscriber, #2304) [Link]

A bit more info: this is observed only if journal_checksum is on (I wasn't using it because I knew it was dangerous, but hadn't noticed that journal_async_commit implied journal_checksum). journal_async_commit combined with nobarrier is even worse: on remount after umounting with those options (and rebooting right after), you don't get a journal abort and readonly remount, you get a remount with no indication of corrupted journal.

(nobarrier on its own, without journal_checksum or anything that implies it, seems to be fine, as long as you have suitable battery-backed hardware of course.)