
Removing ext2 and/or ext3

Posted Feb 10, 2011 15:46 UTC (Thu) by rfunk (subscriber, #4054)
Parent article: Removing ext2 and/or ext3

"Beyond that, mounting an ext2/3 filesystem under ext4 allows the system to use a number of performance enhancing techniques - like delayed allocation - which do not exist in the older implementations. In other words, ext4 can replace ext2 and ext3, maintain compatibility, and make things faster at the same time."

Isn't delayed allocation still a rather controversial aspect of ext4? Or has it been decided that all the applications will be rewritten to fsync all the time?

I realize that my information is likely out of date, but I'm sure I'm not the only one sticking with ext3 due to this issue.


Removing ext2 and/or ext3

Posted Feb 10, 2011 23:00 UTC (Thu) by rahulsundaram (subscriber, #21946)

I don't think the idea is controversial. The implementation required a few workarounds; those are all in place now, and since then all the mainstream distributions, including RHEL 6, have moved to Ext4 by default.

Removing ext2 and/or ext3

Posted Feb 10, 2011 23:19 UTC (Thu) by rfunk (subscriber, #4054)

Er, it was *extremely* controversial two years ago. Granted, two years is a long time in technology, but I'm not entirely clear on the long-term resolution (other than some workaround patches in 2.6.30 that not everyone was satisfied with).

As for defaults.... I just did a Kubuntu 10.10 install not too long ago, and was asked to choose from a number of different filesystems (including all the ext[234] variants). I don't remember a specific default, though it's possible that I'm not remembering ext4 being pre-checked.

Removing ext2 and/or ext3

Posted Feb 11, 2011 13:17 UTC (Fri) by rahulsundaram (subscriber, #21946)

If you actually believe the idea itself was controversial, you are probably not aware that XFS and other filesystems implemented delayed allocation ages before Ext4 did, without any fuss whatsoever. And yes, Ext4 is the mainstream default now, for good reasons. If you have a specific problem to discuss, point to it. I doubt "everyone" would be satisfied with any one filesystem.

Removing ext2 and/or ext3

Posted Feb 11, 2011 13:54 UTC (Fri) by rfunk (subscriber, #4054)

If you deny that delayed allocation was controversial, you are probably not aware of the discussions that happened in March of 2009.

I already linked to the LWN article discussing the 2009 controversy. You can follow that link and the links within that article, or Google "ext4 delayed allocation". (The third link there is "Linus Torvalds upset over ext3 and ext4"! The Wikipedia article currently has a whole section about "Delayed allocation and potential data loss".)

I'm aware that XFS implemented delayed allocation before ext4. I'm also aware that XFS became notorious for badly messing up files and filesystems when there are power failures or similar; I'm especially aware of that since it happened to me multiple times, but any discussion of Linux filesystem reliability inevitably includes mention of XFS's problems. I investigated and learned that XFS had been explicitly designed for server room situations where the power *never* fails. Before that I was a big fan of XFS (having actually used it on SGIs in the 90s), and after that I went back to ext3 and have not had any problems with it.

I'd really like to hear from someone who acknowledges the 2009 controversy (and that those wary of delayed allocation at the time had a point), and who can explain how it's improved since then to the point that it's considered safe for ext3 users.

Removing ext2 and/or ext3

Posted Feb 11, 2011 18:47 UTC (Fri) by zlynx (subscriber, #2285)

The workaround / hack for ext4 reliability was to force file contents to disk before any rename operations on that file.

Since almost all user-space file operations write files "atomically" by writing a new copy and renaming over the old copy, this works well.

This rename hack is also much faster than doing fsync on each file because the flush/rename combination may be delayed indefinitely. As long as the rename is always done after the file contents are on disk, the filesystem view will be consistent on reboot.
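A minimal Python sketch of the "write a new copy, rename over the old one" idiom described above, assuming a POSIX system. The function name `replace_file_no_fsync` is hypothetical. It deliberately issues no fsync, so whether the new contents survive a crash is left entirely to the filesystem, which is exactly the point the replies dispute.

```python
import os
import tempfile


def replace_file_no_fsync(path: str, data: bytes) -> None:
    """Update `path` by writing a temporary copy and renaming it over the old file.

    No fsync is issued. On a running system rename(2) is atomic, so readers
    see either the old or the new contents. Crash behaviour depends on the
    filesystem (e.g. ext3 data=ordered, or ext4's rename heuristic).
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename stays
    # within one filesystem. Note: mkstemp creates it with mode 0600; real
    # code might copy the old file's permissions first.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # os.replace() uses rename(2) on POSIX and overwrites the target.
        os.replace(tmp_path, path)
    except BaseException:
        # Only reached before the rename succeeded, so tmp_path still exists.
        os.unlink(tmp_path)
        raise
```

The temporary file is never visible under the final name, so a reader racing with the update cannot observe a half-written file while the system stays up.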

Removing ext2 and/or ext3

Posted Feb 11, 2011 18:50 UTC (Fri) by dlang (subscriber, #313)

unfortunately your recommendation does not protect the file in the case of a power failure.

if you want your rename to be safe across a crash/power failure you need to do an fsync.

there have been some hacks added to some filesystems to try and detect this to make it safer, but safer != safe

yes, ext3 let you get away with things like this (at least in the most common case), but no other filesystem on any *nix OS does.

the Unix spec says that renames are atomic, but that's only talking about a running system, not across a crash
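For comparison, a sketch of the fsync-based variant this reply calls for, assuming Linux/POSIX semantics (in particular that fsync on a directory file descriptor is supported). The name `replace_file_durably` is hypothetical. The new file's data is forced to disk before the rename, and the containing directory is fsynced afterwards so that the rename itself is also on stable storage.

```python
import os
import tempfile


def replace_file_durably(path: str, data: bytes) -> None:
    """Atomically replace `path`, flushing data and the rename to disk."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # move Python's userspace buffer into the kernel
            os.fsync(f.fileno())  # force the new contents to stable storage
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        os.unlink(tmp_path)  # rename did not happen, so the temp file still exists
        raise
    # fsync the directory so the new directory entry (the rename) is durable;
    # without this step a crash may still leave the old name in place.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The cost is two synchronous flushes per update, which is why the thread's other posters prefer a filesystem-level ordering guarantee for updates that are not critical.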

Removing ext2 and/or ext3

Posted Feb 11, 2011 21:09 UTC (Fri) by zlynx (subscriber, #2285)

This hack does make the file safe across rename during power failure for ext4. That's all I meant. It's always been safe in ext3 in ordered mode.

Either the file named X will contain new contents or old contents. It will never be blank or half written.

Again this only applies to ext3 and ext4, and only in ext4 after kernel 2.6.30.

Removing ext2 and/or ext3

Posted Feb 11, 2011 21:20 UTC (Fri) by jrn (subscriber, #64214)

> This hack does make the file safe across rename during power failure for ext4.

That's simply not true, sadly. If I understand correctly, the patch[1] makes the race window shorter but does not eliminate it[2].

[1] v2.6.30-rc1~416^2~15 (ext4: Automatically allocate delay allocated blocks on rename, 2009-02-23)
[2] https://bugzilla.kernel.org/show_bug.cgi?id=15910

Removing ext2 and/or ext3

Posted Feb 11, 2011 21:28 UTC (Fri) by zlynx (subscriber, #2285)

From that, it looks as if the ext4 maintainers didn't follow through on what they claimed the patch would do.

Allocate on rename is different from write on rename. All the discussions I followed claimed it would write the data before writing the rename.

I wonder why they thought allocate would be sufficient? Seems like they didn't listen to the users after all.

Delayed allocation safety

Posted Feb 11, 2011 23:28 UTC (Fri) by jrn (subscriber, #64214)

> Seems like they didn't listen to the users after all.

I think they did. There is nodelalloc for those who expect frequent crashes or do not want delayed allocation for some other reason. There is that hack to make 0-length files rare. And updating files using the common rename idiom does not force a painfully slow journal commit like it did in ext3 with data=ordered.

Meanwhile there is more awareness among application developers about the need to use fsync or fdatasync for data updates that need to persist and not to use those functions for updates that are not so crucial. So apps are finally doing the right thing on ubifs and hfs+.

So at least this ext4 user wouldn't have it any other way.
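A rough sketch of the "flush only what needs to persist" practice mentioned above, assuming a POSIX system. The function `append_record` and its `durable` flag are hypothetical. `os.fdatasync` is used where Python provides it (Linux) and `os.fsync` elsewhere (Python does not offer `os.fdatasync` on macOS).

```python
import os

# fdatasync where available, otherwise fall back to fsync.
_datasync = getattr(os, "fdatasync", os.fsync)


def append_record(path: str, record: bytes, durable: bool) -> None:
    """Append one record to a log; only pay for a disk flush when `durable`."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(record)
        while view:  # os.write may perform a short write
            written = os.write(fd, view)
            view = view[written:]
        if durable:
            # fdatasync flushes the data plus the metadata needed to read it
            # back (such as the new file size) but may skip e.g. mtime updates.
            # If the file was just created, its directory entry would also
            # need a directory fsync to be durable.
            _datasync(fd)
    finally:
        os.close(fd)
```

Routine, non-critical records skip the flush entirely and let the filesystem write them back whenever it likes.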

Delayed allocation safety

Posted Feb 11, 2011 23:43 UTC (Fri) by zlynx (subscriber, #2285)

What I gathered from the bug report is that the allocation will take place but the data is still in limbo when the rename is written to disk.

So you end up with:
1. Space allocated for the new file.
2. Directory written to disk with new filename.
---- CRASH HAPPENS HERE
3. New file contents written to disk.

The sequence of events above is hardly better than it was before the fix.
Did I miss something in the sequence?

Just don't allow step 2 to happen before step 3 and everyone would have been happy.
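The ordering argument in the sequence above can be checked with a toy model. This is a hypothetical simulation of the reasoning only, not of ext4's journaling code; the step names and outcome labels are made up. It crashes after every possible prefix of the steps and records what a reader would find under name X.

```python
from typing import List, Set

OLD, NEW, EMPTY = "old contents", "new contents", "empty or partial"


def visible_after_crash(order: List[str], steps_completed: int) -> str:
    """What a reader sees at name X if power fails after `steps_completed` steps."""
    done = set(order[:steps_completed])
    if "rename" not in done:
        return OLD  # directory entry still points at the old file
    if "data" in done:
        return NEW
    return EMPTY  # name points at a file whose data never reached the disk


def possible_outcomes(order: List[str]) -> Set[str]:
    """Every outcome reachable by crashing at any point in `order`."""
    return {visible_after_crash(order, k) for k in range(len(order) + 1)}


# Sequence as described above: allocate, then rename hits disk, then data.
ALLOCATE_ON_RENAME = ["allocate", "rename", "data"]
# The ordering being asked for: data is on disk before the rename.
DATA_BEFORE_RENAME = ["allocate", "data", "rename"]

print(sorted(possible_outcomes(ALLOCATE_ON_RENAME)))
# ['empty or partial', 'new contents', 'old contents']
print(sorted(possible_outcomes(DATA_BEFORE_RENAME)))
# ['new contents', 'old contents']
```

Only the second ordering rules out the "empty or partial" outcome, which is the old-or-new guarantee claimed earlier in the thread.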

Removing ext2 and/or ext3

Posted Feb 11, 2011 19:35 UTC (Fri) by rfunk (subscriber, #4054)

Yeah, that workaround seems to have addressed most of the problems. I guess I was hoping that there had been more reliability work since then, or people saying "yes this was a problem but it's completely solved now".

Removing ext2 and/or ext3

Posted Feb 14, 2011 11:44 UTC (Mon) by rahulsundaram (subscriber, #21946)

The implementation was discussed in detail and I am well aware of that, but the claim that the idea itself was controversial has no real basis IMO. XFS is a very robust and scalable filesystem with an extensive test suite, which Ext4 was also adopting last I looked.

"I investigated and learned that XFS had been explicitly designed for server room situations where the power *never* fails."

I heard of this myth several times before but have never actually seen a citation. Since you have claimed that you did some research and investigation, pointers would be helpful.

Removing ext2 and/or ext3

Posted Feb 14, 2011 14:56 UTC (Mon) by rfunk (subscriber, #4054)

If, after the links I've posted (assuming you followed any of them), you still believe that delayed allocation in ext4 was not controversial, we have very different definitions of "controversial".

Meanwhile, I don't care how "robust and scalable" and tested XFS is or claims to be; my experience shows that it's not reliable enough for my purposes, and others have similar experiences. (Again, I was once a big fan of XFS; then I discovered some of its failure modes, and found them unacceptable.)

I'd love to give you the citations about XFS's history, but since the last time I looked into that aspect in depth was around five years ago (and the first time was more than eleven years ago), I no longer have them anywhere near handy.

Removing ext2 and/or ext3

Posted Feb 14, 2011 16:34 UTC (Mon) by rahulsundaram (subscriber, #21946)

"If, after the links I've posted (assuming you followed any of them), you still believe that delayed allocation in ext4 was not controversial"

None of the links are new to me, but the way you phrase it suggests that you are equating the idea with its implementation. The idea is too old and too widely implemented in other filesystems to be controversial, and it is pretty much a required feature for good performance. The implementation in Ext4 had some rough edges initially, but that isn't a current problem.

As far as the robustness of XFS is concerned, personal anecdotes are just not interesting, since they are not independently verifiable. I could claim that I have used XFS in a number of places and found it very robust, but that doesn't really prove anything. What is interesting is where and how it is being used, and so far the deployments don't suggest that it is untrustworthy. Unless you can find a reference for the story that XFS was designed only for environments where power never fails, I just don't buy it.

Removing ext2 and/or ext3

Posted Feb 14, 2011 16:42 UTC (Mon) by dlang (subscriber, #313)

for what it's worth, XFS has had a lot of attention in the last 3 years or so.

when it was initially merged it had a _lot_ of SGI baggage (shim layers between the XFS code and the rest of the kernel). it has had a lot of cleanup and maintenance, including a lot of testing (and the development of a filesystem test suite that other filesystems are starting to adopt, since they don't have anything as comprehensive)

so while I have been using XFS for about 7 years, I would not be surprised to hear that people had problems about 5 years ago. I would be surprised if those problems persisted to today.

personally, I don't trust Ext4 yet, it's just too new, and it's still finding corner cases that have problems. It also is not being tested against multi-disk arrays very much (the developers don't have that sort of hardware, so they test against what they have)

Removing ext2 and/or ext3

Posted Feb 14, 2011 17:02 UTC (Mon) by rahulsundaram (subscriber, #21946)

Yes I am assuming that people are talking about current Ext4 and XFS code rather than historical issues.

"It also is not being tested against multi-disk arrays very much (the developers don't have that sort of hardware, so they test against what they have)"

IIRC, this was tested by Red Hat before making it default in RHEL 6. That however is not the very latest Ext4 code.

Removing ext2 and/or ext3

Posted Feb 14, 2011 17:12 UTC (Mon) by dlang (subscriber, #313)

I made the statement about testing based on posts to the kernel mailing list by the developers.

yes, Red Hat did testing, but I'll bet that their testing was of the 'does it blow up' type rather than performance testing.

In any case, the fact that the developers are not testing against that type of disk subsystem means that they are not looking for, or achieving the best performance when used with those subsystems (this was also confirmed by the Ext4 devs on the kernel mailing list)

I'm not saying that the Ext4 devs are incompetent or not doing the best that they can with what they have, just that the fact that they are not working with such large systems means that they are not running into the same stresses in their testing and profiling that people will run into in the real world with large systems.

the current XFS devs may or may not have access to such large arrays nowadays, but historically SGI was dealing with such arrays and spent a lot of time researching how to make the filesystem as fast as possible on them, and that knowledge is part of the design of XFS. the current maintainers could destroy this as they update it, but this is not very likely.

Removing ext2 and/or ext3

Posted Feb 14, 2011 17:51 UTC (Mon) by rahulsundaram (subscriber, #21946)

"yes, redhat did testing, but I'll bet that their testing was of the 'does it blow up' type of thing rather than performance testing"

I wouldn't bet on that. Red Hat has fairly large filesystem and performance teams that run performance tests routinely, for public benchmarks (useful to convince customers) and otherwise. All the major Ext4 and XFS developers work for large vendors (Google, IBM, Red Hat, etc.), and I would expect them to have access to enterprise hardware. XFS is known to scale better on big hardware, at least historically, because of its legacy, but the gap has narrowed considerably in recent kernel versions.

Removing ext2 and/or ext3

Posted Feb 14, 2011 17:53 UTC (Mon) by dlang (subscriber, #313)

the last discussion I saw on this topic was within the last two kernel versions: a report of bad behavior on large systems like this, where an Ext4 dev (I think it was Ted, but I'm not sure) stated that the ext4 devs did not have access to large systems for their testing at that time.

so this is still pretty recent info.

Removing ext2 and/or ext3

Posted Feb 11, 2011 19:31 UTC (Fri) by rfunk (subscriber, #4054)

More information from 2009:

Avoiding ext4 for safety

Posted Feb 11, 2011 18:19 UTC (Fri) by jrn (subscriber, #64214)

> I realize that my information is likely out of date, but I'm sure I'm not the only one sticking with ext3 due to this issue.

Wouldn't ext4 with nodelalloc be just as safe?

Avoiding ext4 for safety

Posted Feb 11, 2011 18:32 UTC (Fri) by rfunk (subscriber, #4054)

Quite possibly; I'm not familiar enough with all the differences between ext3 and ext4 to know.

Removing ext2 and/or ext3

Posted Feb 11, 2011 21:10 UTC (Fri) by rfunk (subscriber, #4054)

OK, in researching this I'm reminded that after the ext4 workarounds in 2.6.30, 2.6.31 included changes to *both* ext3 and ext4 that basically made ext3 slightly less reliable (by default) than it had been, and ext4 equivalently reliable.
http://lwn.net/Articles/328363/

So if I understand things correctly, anyone using ext3's (upstream) defaults since 2.6.31 shouldn't see any less reliability by switching to ext4.....

On the other hand, distributions have not necessarily followed the upstream defaults. A quick check of my Ubuntu 10.10 kernel config shows that Ubuntu chose to stick with data=ordered in ext3, rather than moving to the new upstream default of data=writeback. I'm sure they're not the only ones.
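Since the effective data mode depends on both the kernel config and the mount options, a small helper can report what each ext3/ext4 mount actually lists. This is a sketch under assumptions: it parses text in the Linux `/proc/mounts` format, the name `ext_mount_modes` is hypothetical, a missing `data=` option is reported as `None` (meaning the compiled-in default applies), and escaped characters in mount points (such as `\040` for a space) are not decoded. It also assumes ext4 disables delayed allocation under `data=journal`, as the kernel warns at mount time.

```python
from typing import Dict, Optional, Tuple


def ext_mount_modes(mounts_text: str) -> Dict[str, Tuple[str, Optional[str], bool]]:
    """Map each ext3/ext4 mount point to (fstype, data mode, delalloc enabled)."""
    result: Dict[str, Tuple[str, Optional[str], bool]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype not in ("ext3", "ext4"):
            continue
        opts = options.split(",")
        data_mode: Optional[str] = None  # None: kernel/distro default applies
        for opt in opts:
            if opt.startswith("data="):
                data_mode = opt[len("data="):]
        # ext3 has no delayed allocation; ext4 uses it unless nodelalloc is
        # given or data=journal is in effect.
        delalloc = (
            fstype == "ext4"
            and "nodelalloc" not in opts
            and data_mode != "journal"
        )
        result[mountpoint] = (fstype, data_mode, delalloc)
    return result
```

On a Linux machine it could be called as `ext_mount_modes(open("/proc/mounts").read())`.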

ext3/ext4 reliability

Posted Feb 11, 2011 21:22 UTC (Fri) by rfunk (subscriber, #4054)

And a relevant LWN discussion thread from 2010:
http://lwn.net/Articles/387544/

default data ordering mode

Posted Feb 18, 2011 14:45 UTC (Fri) by dpotapov (guest, #46495)

The default data ordering mode was changed to writeback in 2.6.30 by Linus Torvalds, with some very strong words against setting EXT3_DEFAULTS_TO_ORDERED. The main argument against 'data=ordered' is latency: while 'data=ordered' is only slightly slower when it comes to FS throughput, its latency under heavy load can be more than ten times worse. OTOH, 'data=writeback' can potentially expose stale data after a crash. So, for servers, 'data=ordered' seems to be the better choice, while on a desktop (where you care about latency), 'data=writeback' usually makes more sense.

However, most distributions decided to stay with the ordered mode. Therefore, in 2.6.31, Ted Ts'o replaced the words against EXT3_DEFAULTS_TO_ORDERED=y with more neutral language describing the trade-offs. In 2.6.36, the default mode was changed back to 'data=ordered' by Dave Chinner from Red Hat: "because we should be caring far more about avoiding stale data exposure than performance."

AFAIK, there is no significant difference between ext3 and ext4 when it comes to exposing stale data. So it is not exactly clear to me why this reason applies in one case but not in the other.

As to Ubuntu, Karmic was released with data=writeback, and remained so until the end of April 2010, but then it was suddenly changed to data=ordered. Both changes happened without any warning to users.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds