ext4 and data loss

Posted Mar 12, 2009 6:55 UTC (Thu) by jamesh (guest, #1159)
In reply to: ext4 and data loss by jimparis
Parent article: ext4 and data loss

The problem only exists when you're journalling metadata but not the actual file data.

Due to the behaviour of ext3, writing the metadata changes to disk (creation of "file.new" and rename of "file.new" to "file") first required the file's blocks to be written out to disk, so that the blocks' previous contents would never be exposed. This is almost but not quite the same as journalling data too (it won't protect against partial writes if you cut power at the wrong time).

With ext4's delayed allocation, the metadata changes can be journalled without writing out the blocks. So in case of a crash, the metadata changes (that were journalled) get replayed, but the data changes don't.

If you journal data changes, presumably this won't happen on either ext3 or ext4. That is likely to give a performance hit though.
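
A minimal sketch of how an application can avoid depending on either behaviour, assuming a POSIX system where a directory can be opened and fsync'd (as on Linux). The helper name `replace_file_durably` and the ".new" suffix are hypothetical, chosen to mirror the "file.new" example above. It forces the data blocks to disk before the rename, so a replayed rename can never point at unwritten data:

```python
import os

def replace_file_durably(path, data):
    """Replace path with data: write a temp file, fsync it, then rename."""
    tmp_path = path + ".new"  # hypothetical name, mirrors "file.new" above
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # os.write may write fewer bytes
            view = view[written:]
        os.fsync(fd)  # data blocks reach the disk before the rename
    finally:
        os.close(fd)
    os.rename(tmp_path, path)  # atomically replaces the old file
    # Assumption: fsync on a directory descriptor works (true on Linux);
    # this makes the rename itself durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The cost is the one discussed in this thread: each fsync forces a synchronous write, which is exactly what delayed allocation tries to avoid.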



ext4 and data loss

Posted Mar 12, 2009 8:49 UTC (Thu) by job (guest, #670)

Most people (by which I mean me) would probably want metadata changes whose corresponding blocks haven't been written out yet to be dropped instead of replayed. That is, to see the old file rather than an empty new one.

When we see these patches instead of the behaviour we expect, we're confused. Is the behaviour hard to implement for some reason, or are we wrong in expecting it?

Delayed allocation is fine but I think most people expect metadata to be delayed accordingly.

ext4 and data loss

Posted Mar 12, 2009 16:14 UTC (Thu) by nye (guest, #51576)

This seems both very clear, and so obviously wrong that I must have misunderstood.

Are we really saying that ext4 commits metadata changes to disk (potentially a long time) before committing the corresponding data change?

That surely can't be right. Why on earth would you write metadata describing something which you know doesn't exist yet - and may never exist? Especially when the existing metadata describes something that does.

Perhaps what we're really saying is that ext4 does them in the correct order, but doesn't use barriers by default and hence they sometimes get written by the device in the wrong order? That would make more sense at least.

This is really confusing me.

ext4 and data loss

Posted Mar 13, 2009 0:31 UTC (Fri) by nix (subscriber, #2304)

That's exactly what it does. Patches recently committed to the mainline
cause the blocks to be aggressively flushed if the file is closed and was
originally opened via O_TRUNC, or if the file is renamed on top of another
one.

(I'd prefer it to delay the metadata operation as well, but apparently
that's really hard. Knowing what a nightmare it is to get rename() right,
I can understand that doing it lazily might not be anyone's cup of tea.)
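
A small sketch of the truncate-and-rewrite pattern those patches target, assuming a POSIX system; the function name `rewrite_in_place` is hypothetical. Rather than relying on the filesystem to flush on close, the application calls fsync itself:

```python
import os

def rewrite_in_place(path, data):
    """Truncate path and rewrite it, flushing the data before close."""
    # O_TRUNC discards the old contents as soon as the file is opened.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # os.write may write fewer bytes
            view = view[written:]
        os.fsync(fd)  # explicit flush, independent of filesystem heuristics
    finally:
        os.close(fd)
```

Even with the fsync, a crash between the truncation and the flush can leave the file empty or partial, which is why replacing a file by rename is usually the safer pattern.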

ext4 and data loss

Posted Mar 22, 2009 22:01 UTC (Sun) by muwlgr (guest, #35359)

I would not say "aggressively flushed". As I understand it, the blocks of these files (created or truncated and then closed, or renamed to replace a previous file) are just explicitly allocated, which triggers all the necessary metadata updates to then be written in the correct order.

ext4 and data loss

Posted Mar 12, 2009 17:58 UTC (Thu) by cpeterso (guest, #305)

> With ext4's delayed allocation, the metadata changes can be journalled without writing out the blocks. So in case of a crash, the metadata changes (that were journalled) get replayed, but the data changes don't.

This is so broken. How can anyone think this is a good idea? Or an "upgrade" from ext3?

ext4 and data loss

Posted Mar 13, 2009 0:24 UTC (Fri) by giraffedata (subscriber, #1954)

> > With ext4's delayed allocation, the metadata changes can be journalled without writing out the blocks. So in case of a crash, the metadata changes (that were journalled) get replayed, but the data changes don't.
>
> This is so broken. How can anyone think this is a good idea? Or an "upgrade" from ext3?

Because of the speedup. Since the beginning of Unix, people have sacrificed crash survivability for speed. An Ext2 filesystem after a crash can be in much worse state than this (because it doesn't journal even the metadata).

Even given user-level options to make the choice, the vast majority choose speed. So if delayed allocation makes access even faster, I can understand someone accepting a higher probability of corrupted files in exchange.

As has been noted, applications that are affected are the ones that already accept a fair amount of corruption risk, so this is just a quantitative increase in risk, not qualitative.

The ext3 behavior that some people prefer is just an accident, by the way. The reason data=ordered is the default with ext3 is security, not crash resistance. The crash resistance is a by-product. Had ext3 originally done what ext4 does, people wouldn't have called it wrong.

ext4 and data loss

Posted Mar 13, 2009 1:07 UTC (Fri) by dododge (subscriber, #2870)

And as noted in Ted's message linked in the article, this potential disconnect between data and metadata has been used by other high-performance filesystems on Linux for years. ext3 is the odd man out, due to an unintentional quirk in its design.

For example if you shut down an XFS filesystem improperly, when it comes back up it may claim that recent files exist and even have the expected size -- but when you try to read them you might get zero blocks instead of real data. I believe JFS does the same thing.
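
As a rough illustration of that symptom, here is a sketch of a check one might run after an unclean shutdown to spot files that have a size but contain only zero bytes. The helper name `is_all_zero` and the chunk size are hypothetical, and a file that legitimately contains only zeros would be flagged too:

```python
def is_all_zero(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte is 0x00."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of file
            seen_data = True
            if chunk.count(0) != len(chunk):
                return False  # found at least one non-zero byte
    return seen_data
```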

ext4 and data loss

Posted Mar 13, 2009 1:26 UTC (Fri) by quotemstr (subscriber, #45331)

> If you shut down an XFS filesystem improperly, when it comes back up it may claim that recent files exist and even have the expected size -- but when you try to read them you might get zero blocks instead of real data. I believe JFS does the same thing.

Is it any wonder, then, that XFS and JFS are seldom used despite their otherwise-wonderful characteristics?

ext4 and data loss

Posted Mar 13, 2009 10:44 UTC (Fri) by rahulsundaram (subscriber, #21946)

The real reason why users don't use those filesystems much is because pretty much all the distributions have been sticking to the Ext* codebase as default. Distributions do that because of lack of cross vendor support for the other filesystems. If you are a vendor, you will want to have in-house filesystem experts before declaring support for a filesystem.

ext4 and data loss

Posted Mar 13, 2009 13:26 UTC (Fri) by man_ls (guest, #15091)

That sounds like a circular argument: distros don't have XFS or JFS experts because nobody cares about them anymore, and nobody cares about them because distros don't have experts. But the code to all these filesystems is open and has been there for a long while; why do distros have ext3 experts to begin with?

The real reason ext3 is popular is (or so I contend) that it is stable and crash-resistant by default. Crash resistance may have been a design accident in the beginning, but it is what made ext3 the most popular filesystem for Linux. It would seem that people are not so willing to trade robustness for speed. After all, the mission of a filesystem is to keep your data until you ask for it; is it any wonder that people like it when it does just that, no matter what?

ext4 and data loss

Posted Mar 15, 2009 19:32 UTC (Sun) by rahulsundaram (subscriber, #21946)

It isn't a circular argument. Ext2 already existed before these other filesystems, and Ext3, being the only filesystem backward compatible with it and using a very similar codebase, had a leg up: distributions and users trusted it more and adopted it far more quickly. Also, while Ext* has been a cross-vendor effort, other filesystems like JFS and XFS were each developed by a single company (IBM and SGI, respectively) and never grew beyond it. Btrfs made a deliberate effort to avoid this problem and has succeeded in doing so.

ext4 and data loss

Posted Mar 17, 2009 22:01 UTC (Tue) by pphaneuf (subscriber, #23480)

My favourite characteristic of the extX family of filesystems is the ability to fsck a filesystem while it is mounted. Often overlooked, but wow, do you ever miss that when you have to work with another filesystem for a period of time...

ext4 and data loss

Posted Mar 17, 2009 22:37 UTC (Tue) by nix (subscriber, #2304)

*Why* do you miss the bizarre and dangerous ability to fsck a mounted
filesystem, often with umount-or-reboot-pleeze following it? Because your
early userspace is too deficient to fsck / before mounting it?

ext4 and data loss

Posted Mar 17, 2009 22:59 UTC (Tue) by quotemstr (subscriber, #45331)

I assume he's talking about a read-only fsck. Any decent fsck should refuse to modify a mounted filesystem.

I agree, though, that even a read-only fsck of a filesystem mounted read-write doesn't seem that useful --- the on-disk state of a mounted filesystem is going to be slightly inconsistent anyway: it's likely that not everything has been flushed to disk yet.

Now a full (read and fix) fsck of a filesystem mounted read-only may be useful, and tolerably dangerous if followed immediately by a reboot.

ext4 and data loss

Posted Mar 17, 2009 23:45 UTC (Tue) by nix (subscriber, #2304)

Indeed fsck.ext[234] is perfectly happy to modify a read-only-mounted /.
It even has special behaviour (messages and exit codes) to tell you when
you have to reboot because it just modified a mounted filesystem.

I still think it's a disgusting cheap hack sanctified only because that's
the only way Unix systems have traditionally been able to fsck /. Now
Linux has early userspace, there is no excuse for it at all other than
back-compatibility with people who don't have an initramfs or initrd (how
many of them are there? Not many, I'd wager).
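
For reference, a sketch of decoding the exit status mentioned above. The bit values follow the fsck(8) and e2fsck(8) man pages as I recall them, so check them against the local man page; the names `FSCK_EXIT_BITS`, `describe_fsck_status` and `reboot_required` are hypothetical:

```python
# Exit status bits as listed in fsck(8)/e2fsck(8); the status is a bitmask.
FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared library error",
}

def describe_fsck_status(status):
    """Return the list of conditions encoded in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_EXIT_BITS.items() if status & bit]

def reboot_required(status):
    """True if fsck reported that the system should be rebooted."""
    return bool(status & 2)
```

A boot script that runs fsck on a read-only-mounted / would test the reboot bit before remounting the filesystem read-write.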

ext4 and data loss

Posted Mar 14, 2009 15:13 UTC (Sat) by jschrod (subscriber, #1646)

Well, I don't use XFS exactly for that reason, having been bitten by the "my whole file consists of 0x00" behaviour one time too often. And my argument is as anecdotical [sp?] as yours.

Joachim

ext4 and data loss

Posted Mar 19, 2009 1:26 UTC (Thu) by xoddam (subscriber, #2322)

anecdotal.

ext4 and data loss

Posted Mar 19, 2009 1:51 UTC (Thu) by jschrod (subscriber, #1646)

Thanks for the info. (I'm not a native English speaker, btw.)

non-native speakers

Posted Mar 19, 2009 2:23 UTC (Thu) by xoddam (subscriber, #2322)

Schon verstanden [Understood already] :-)


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds