
Ext3 and write caching by drives are the data killers...

Posted Sep 1, 2009 17:48 UTC (Tue) by dlang (guest, #313)
In reply to: Ext3 and write caching by drives are the data killers... by Cato
Parent article: Ext3 and RAID: silent data killers?

of the three changes you are making:

disabling write caches

this is mandatory unless you have battery-backed cache to recover from failed writes. period, end of statement. if you don't do this you _will_ lose data when you lose power.
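For reference, a minimal sketch of how the write cache is usually switched off on an ATA/SATA drive, assuming the hdparm tool is installed. The device name /dev/sdX is a placeholder and the helper name write_cache_command is made up for this example. The sketch only builds and prints the command; running it needs root, and whether the setting survives a power cycle depends on the drive.

```python
import shlex


def write_cache_command(device, enable):
    """Build an hdparm command line that sets the drive's write cache.

    hdparm's -W flag takes 0 (cache off) or 1 (cache on).
    """
    flag = "-W1" if enable else "-W0"
    return ["hdparm", flag, device]


# /dev/sdX is a placeholder, not a real device name.
cmd = write_cache_command("/dev/sdX", enable=False)
print(shlex.join(cmd))  # hdparm -W0 /dev/sdX
```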

avoiding LVM

I have also run into 'interesting' things with LVM, and so I also avoid it. I see it as a solution in search of a problem for just about all users (just about all users would be just as happy, and have things work faster with less code involved, if they just used a single partition covering their entire drive.)

I suspect that some of the problem here is that ordering of things gets lost in the LVM layer, but that's just a guess.

data=journal

this is not needed if the application is making proper use of fsync. if the application is not making proper use of fsync it's still not enough to make the data safe.
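As an illustration of what "proper use of fsync" can look like, here is a rough sketch of the common write-to-temp-file, fsync, rename pattern. The function name durable_write is invented for this example, and it assumes a POSIX system where a directory can be opened and fsynced. None of this helps if the drive acknowledges writes that are still sitting in a volatile cache.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` so that a crash leaves old or new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory (same filesystem).
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to push it to the device
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is on stable storage.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The temporary file lives in the target directory because a rename is only atomic within one filesystem.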

by the way, ext3 does do checksums on journal entries. the details of this were posted as part of the thread on linux-kernel.



Ext3 and write caching by drives are the data killers...

Posted Sep 1, 2009 18:05 UTC (Tue) by Cato (guest, #7643) [Link] (3 responses)

Possibly data=journal is overkill, I was going by the Wikipedia page on ext3, link above. However a conservative setup is attractive at present as performance is far less important than reliability, for this PC anyway.

Do you know roughly when ext3 checksums were added, or by whom, as this contradicts the Wikipedia page? Must be since 2007, based on http://archives.free.net.ph/message/20070519.014256.ac3a2.... I thought journal checksumming was only added to ext4 (see first para of http://lwn.net/Articles/284037/) not ext3.

This sort of corruption issue is one reason to have multiple partitions; parallel fscks are another. In fact, it would be good if Linux distros automatically scheduled a monthly fsck for every filesystem, even if journal-based.

Ext3 and write caching by drives are the data killers...

Posted Sep 1, 2009 18:15 UTC (Tue) by dlang (guest, #313) [Link] (2 responses)

Ted Ts'o detailed the protection of the journal in this thread (I've deleted the particular message or I'd quote it for you)

I'm not sure I believe that parallel fscks on partitions on the same drive do you much good. the limiting factor for speed is the throughput of the drive. do you really gain much from having it bounce around interleaving the different fsck processes?

as for protecting against this sort of corruption, I don't think it really matters.

for flash, the device doesn't know about your partitions, so it will happily map blocks from different partitions to the same eraseblock, which will then get trashed on a power failure, so partitions don't do you any good.

for a raid array it may limit corruption, but that depends on how your partition boundaries end up matching the stripe boundaries.
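A small worked example of the boundary arithmetic, under stated assumptions: 512-byte sectors, a hypothetical full-stripe size of 64 KiB, and a made-up helper name is_stripe_aligned. A partition boundary that does not fall on a stripe boundary means two partitions share a stripe, so damage to that stripe can touch both.

```python
SECTOR_BYTES = 512  # assumed sector size


def is_stripe_aligned(boundary_sector, stripe_bytes):
    """True if a partition boundary falls exactly on a stripe boundary."""
    return (boundary_sector * SECTOR_BYTES) % stripe_bytes == 0


STRIPE = 64 * 1024  # hypothetical full-stripe size in bytes

# Sector 63, the traditional DOS partition start: 63 * 512 = 32256 bytes.
print(is_stripe_aligned(63, STRIPE))    # False
# Sector 2048 (1 MiB): 2048 * 512 = 1048576 bytes, a multiple of 65536.
print(is_stripe_aligned(2048, STRIPE))  # True
```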

Ext3 and write caching by drives are the data killers...

Posted Sep 1, 2009 18:44 UTC (Tue) by Cato (guest, #7643) [Link]

I still can't find that email, but this outlines that journal checksumming was added to JBD2 to support ext4: http://ext4.wiki.kernel.org/index.php/Frequently_Asked_Qu...

This Usenix paper mentions that JBD2 will ultimately be usable by other filesystems, so perhaps that's how ext3 does (or will) support this: http://www.usenix.org/publications/login/2007-06/openpdfs... - however, I don't think ext3 has journal checksums in (say) 2.6.24 kernels.

Ext3 and write caching by drives are the data killers...

Posted Sep 2, 2009 6:36 UTC (Wed) by Cato (guest, #7643) [Link]

I grepped the 2.6.24 sources, fs/ext3/*.c and fs/jbd/*.c, for any mention of checksum, and couldn't find it. However the email lists do have some reference to a journal checksum patch for ext3 that didn't make it into 2.6.25.
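For anyone who wants to repeat the check, something like the following sketch does the same search. The function name files_mentioning is made up, and the path to the unpacked kernel tree is whatever you have locally.

```python
from pathlib import Path


def files_mentioning(root, patterns, needle):
    """Return files under `root` matching any glob pattern that contain `needle`.

    The match is case-insensitive; paths are returned relative to `root`.
    """
    needle = needle.lower()
    hits = []
    for pattern in patterns:
        for path in sorted(Path(root).glob(pattern)):
            text = path.read_text(errors="replace")
            if needle in text.lower():
                hits.append(str(path.relative_to(root)))
    return hits


# Example call, with the kernel tree location left as a placeholder:
# files_mentioning("/path/to/linux-2.6.24",
#                  ["fs/ext3/*.c", "fs/jbd/*.c"], "checksum")
```

An empty list corresponds to the result described above: no mention of the word in those directories.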

One other thought: perhaps LVM is bad for data integrity with ext3 because, as well as stopping barriers from working, LVM generates more fragmentation in the ext3 journal - that's one of the conditions mentioned by Ted Ts'o as potentially causing write reordering and hence FS corruption here: http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-05/m...

Ext3 and write caching by drives are the data killers...

Posted Sep 1, 2009 22:37 UTC (Tue) by cortana (subscriber, #24596) [Link] (2 responses)

> disabling write caches
>
> this is mandatory unless you have battery backed cache to recover from
> failed writes. period, end of statement. if you don't do this you _will_
> lose data when you lose power.

If this is true (and I don't doubt that it is), why on earth is it not the default? Shipping software with such an unsafe default setting is stupid. Most users have no idea about these settings... surely they shouldn't be handed a delicious pizza smeared with nitroglycerin topping, and then be blamed when they bite into it and it explodes...

Ext3 and write caching by drives are the data killers...

Posted Sep 1, 2009 22:41 UTC (Tue) by dlang (guest, #313) [Link] (1 responses)

simple, enabling the write cache gives you a 10x (or better) performance boost for all the times when your system doesn't lose power.

the market has shown that people are willing to take this risk by driving all vendors that didn't make the change out of the marketplace

Ext3 and write caching by drives are the data killers...

Posted Sep 3, 2009 7:58 UTC (Thu) by Cato (guest, #7643) [Link]

True, but it would be good if major distros offered something simple and well publicised, like "apt-get install data-integrity", which would help the user configure the system for high integrity. This could include things like: disabling write cache, periodic fscks, ext3 data=journal, etc.

It would still be better if distros made this the default but I don't see much prospect of this.

One other example of disregard for data integrity that I've noticed is that Ubuntu (and probably Debian) won't fsck a filesystem (including root!) if the system is on batteries. This is very dubious - the fsck might exhaust the battery, but the user might well prefer losing the use of their laptop for a while because of a flat battery to losing the use of their valuable data for much longer when the filesystem gets corrupted later on...

Fortunately on my desktop with a UPS, on_ac_power returns 255 which counts as 'not on battery' for the /etc/init.d/check*.sh scripts.
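A rough sketch of that decision follows. It is not the actual script logic, just my reading of it. It assumes on_ac_power exits with 0 on mains power, 1 on battery and 255 when the status can't be determined, and that the check scripts skip the fsck only when the exit status is exactly 1. The function name skips_fsck is invented.

```python
def skips_fsck(on_ac_power_status):
    """Return True if the boot-time fsck would be skipped.

    Assumed exit codes: 0 = on mains, 1 = on battery, 255 = unknown.
    Only an explicit "on battery" answer (1) causes the check to be skipped.
    """
    return on_ac_power_status == 1


for status in (0, 1, 255):
    print(status, skips_fsck(status))
# 0 False
# 1 True
# 255 False
```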


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds