Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)

Posted Apr 25, 2006 23:16 UTC (Tue) by hch (guest, #5625)
In reply to: Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration) by dwheeler
Parent article: Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)

This has never been true. People have been trying to talk Ted out of that for at least five years, but unfortunately he keeps repeating it without
backing it up with facts.



Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 25, 2006 23:56 UTC (Tue) by khim (subscriber, #9252) [Link]

I was bitten by this bug myself quite a few times: the system looked "just fine" at first glance, but the SHA1sums of files were different. Unfortunately I had SHA1sums for some files but not for all of them, and this was a HUGE mess. In the end I converted all servers to ReiserFS and never looked back.

Sorry, but "XFS is flaky when powered off under load" is NOT Ted's delusion. I was never able to reproduce such a problem without terabytes of data and high load, so I know debugging is hard, but if you don't have high load and terabytes of data then you can use any filesystem and it won't make any difference, right? Conclusion: do not use XFS... Ever...
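
The situation described above (checksums on hand for some files but not all) is easier to handle if a full manifest is recorded before trouble strikes. What follows is only a minimal sketch of that idea; the function names `build_manifest` and `verify_manifest` are made up for illustration and are not from any tool mentioned in this thread.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file under root (by relative path) to its SHA-1."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha1_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root, manifest):
    """Return the sorted relative paths that are missing or whose content changed."""
    root = Path(root)
    damaged = []
    for relative, expected in manifest.items():
        path = root / relative
        if not path.is_file() or sha1_of(path) != expected:
            damaged.append(relative)
    return sorted(damaged)
```

Running `verify_manifest` after an unclean shutdown would list the files whose content no longer matches, including files whose name, size and timestamps still look fine.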

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 0:54 UTC (Wed) by hawk (subscriber, #3195) [Link]

On the other hand, with machines with terabytes of data and high load, one might consider having UPS... :-)

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 1:10 UTC (Wed) by sbergman27 (subscriber, #10767) [Link]

So why have journalling filesystems at all? So much development time.... just wasted. And all we needed to do was buy a UPS! ;-)

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 15:15 UTC (Wed) by NightMonkey (subscriber, #23051) [Link]

Certainly consider it, but a UPS doesn't do much in the presence of a short on the MB, or all power supplies failing, or even the power connector on the hard drive coming loose, etc....

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 7:14 UTC (Wed) by nix (subscriber, #2304) [Link]

You converted filesystems to *reiserfs* for *stability*?

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 13:01 UTC (Wed) by arafel (subscriber, #18557) [Link]

Some people like to live dangerously...

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 21:18 UTC (Wed) by nix (subscriber, #2304) [Link]

God knows why. It's not even especially fast, even compared to something like ext3 with dir_index, IME.

(A shame: I like tree-structured filesystems a lot. Just not reiserfs --- or NTFS, for that matter, a filesystem which reiserfs definitely *is* a hell of a lot faster than.)

(And who can forget those reiserfs design docs, with the somewhat grandiose claims (most of which I agree with, but they were oddly put), and the blue men... a work of immortal genius, or perhaps hallucinogenic drugs ;) )

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 13:46 UTC (Wed) by dmantione (guest, #4640) [Link]

Stop trolling please. For terabytes of data, xfs and reiserfs are the
realistic choices. Reiserfs has excellent reliability, I'd say better
than xfs, and excellent resize and fsck tools.

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 16:56 UTC (Wed) by nevyn (subscriber, #33129) [Link]

Feel free to read about how bad fsck.reiserfs is.

If you want to use reiserfs for the speed, feel free to do so ... just don't pretend you don't need a really good backup strategy.

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 21:21 UTC (Wed) by nix (subscriber, #2304) [Link]

Oh, yes, I spotted that when it went by for the first time. The idea of `stitch reiserfsish blocks together' makes a lot of sense until you consider loop...

(One possible fix would be to put an fs-specific uuid in every block, but I hope anyone actually trying this realises how crazy it is and shoots themselves for the sake of our disk space.)

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 27, 2006 6:06 UTC (Thu) by dmantione (guest, #4640) [Link]

That link you post is a big troll. Reiserfsck can indeed rebuild the
filesystem (--rebuild-tree) from scratch by searching the disk. You only
use it when no other recovery is possible. However, if no other recovery
is possible, the recovery chance is still near 100%, unless you indeed
had reiserfs images on your disk, but I doubt that is the case for many
people.

No, I don't use reiserfs for speed. It just happens that it contains far
fewer bugs for very large filesystems than ext3 does.

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 27, 2006 13:08 UTC (Thu) by erich (subscriber, #7127) [Link]

--rebuild-tree didn't work for me, and reiserfsck made things actually _worse_. It's crap. Or it was back then, when I stopped using reiserfs.

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 27, 2006 13:54 UTC (Thu) by dmantione (guest, #4640) [Link]

Yes, --rebuild-tree rebuilds the entire filesystem, so if it fails the
filesystem is not accessible because the old tree is no longer available.
This situation remains until a successful rebuild has been done, so
investigate what the cause of the failure is and try again.

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 29, 2006 10:56 UTC (Sat) by nix (subscriber, #2304) [Link]

This is, of course, an extremely... peculiar design. The robust approach would be to build a new tree in parallel with the old (as long as space was available and the old tree was undamaged enough to determine which blocks were free), then switch over to the new one atomically.

This is harder, but it does make the thing fail-safe.

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 27, 2006 13:06 UTC (Thu) by erich (subscriber, #7127) [Link]

reiserfs may have excellent reliability...
... unless it crashes and trashes your whole filesystem, that is.

Happened to me, and if you do a poll on a major linux channel you'll find tons of people burnt by reiserfs.

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 28, 2006 4:43 UTC (Fri) by zooko (subscriber, #2589) [Link]

The trouble is that "doing a poll", or more realistically chatting back and forth and swapping war stories, is a terrible way to figure out the truth about things. Humankind has recently developed a set of alternate techniques to figure out the truth about things, which collectively go under the rubric of "science". The older technique would best be titled "folklore".

It is part of Linux culture to sit around and swap stories and form sort of a group consensus on things which are otherwise not measured or analyzed. Does the group consensus usually settle on the truth? Who knows. Folklore is often right. Sometimes not. I'll withhold judgment until I see something better.

(The article linked to in this thread that did fault injection and source analysis of myriad possible failures is an example of something better.)

Regards,

Zooko

ReiserFS's stability is actually quite good

Posted Apr 26, 2006 14:07 UTC (Wed) by gmaxwell (subscriber, #30048) [Link]

Please read http://www.cs.wisc.edu/adsl/Publications/iron-sosp05.pdf.

ReiserFS's handling of exceptional conditions is actually head and shoulders above the competition. The conservative policy of panicking rather than running with corrupted data when something unexpected happens, combined with the dislike of Hans by some of the more notable linux personalities, has resulted in a perception which is diametrically opposed to reality.

ReiserFS's stability is not actually quite good

Posted Apr 26, 2006 21:16 UTC (Wed) by nix (subscriber, #2304) [Link]

I'm thinking more of the three completely chewed-up news filesystems reiser3 gifted me with (each accompanied by a nice panic-with-no-reboot-even-though-I-asked-for-it, killing a colo box which otherwise went to great lengths to avoid anything that might take it down; UPS, RAID, the lot), and the half-a-dozen panics for no obvious reason that I've also been hit with (on completely functional hardware). I don't dare use reiserfs on anything but news fsen, and even there have stopped using it on any machines which I care about them not panicking under any load (like, uh, an expiry run). It's caused me more trouble in the few places I've used it than every other filesystem I've ever used, on *any* Unix system.

reiserv3 is nearly unmaintained, and shows it. My experience has been that it's an unreliable and decidedly dangerous fs which does more to reduce system stability than any other major filesystem you could use (no, umsdos is not major in that sense).

And as for that vaunted failure case: panicking on any tiny error is *utter* stupidity if the only flaw is on a relatively unimportant filesystem! The only advantage of panicking on failure is that it spares the fs developers from having to do any analysis of failure modes to determine if the system could potentially stay up under a given mode. Other filesystems manage it. Why can't reiserfs?

ReiserFS's stability is not actually quite good

Posted Apr 26, 2006 21:24 UTC (Wed) by nix (subscriber, #2304) [Link]

Lest I seem excessively harsh, I'm only talking, oh, ten panics in four years here. But still that's ten panics more than ext2, ext3, or xfs have ever hit me with. (ext3 even coped on a machine with RAM so bad you couldn't md5sum a 10Mb file twice in a row and get the same answer!)

(I had major disk corruption on ext3 once, but when your disk drive decides to put blocks somewhere other than where the filesystem asked for them to go, you have little option but to get a new disk and restore from backup, really...)

Even with reiserfs, Linux is damned stable. But reiserfs doesn't help.

ReiserFS's stability is not actually quite good

Posted Apr 27, 2006 13:36 UTC (Thu) by dmantione (guest, #4640) [Link]

> I had major disk corruption on ext3 once, but when your disk drive
> decides to put blocks somewhere other than where the filesystem asked for
> them to go, you have little option but to get a new disk and restore from
> backup, really...

Yes, you would prefer to restore from backup.

However, if you did not backup the data, either due to cost/benefit
calculation (i.e. I wouldn't backup my mp3 collection but would try to
restore it) or due to stupidity, you have exactly a situation where
Reiserfs would have saved your data.

In such a situation the reiserfsck --rebuild-tree option, which you
bashed in an earlier post of yours, is invaluable. The procedure for
recovery is as follows:

* Connect a new disk that is the same size as the failing disk, or larger.
* Boot from a rescue CD.
* Do not enable any LVM arrays.
* dd the old disk over the new one. If it fails due to a bad sector, skip
some sectors and continue from that place using the right command line
options.
* Remove the old disk.
* If the disk was in an LVM array, enable it. LVM will detect the new
disk as if it was the old one.
* Reiserfsck the filesystem with --rebuild-tree.

Your actual data loss depends on the severity of the corruption. I've had
this experience once, with about 12000 bad sectors, in which I lost 700
MB out of 1.2 GB of data. Some files did appear in /lost+found.

It is easy to bash the --rebuild-tree option because it is not 100% safe
due to possible reiserfs images on the disk. You don't need to use it, a
lot of corruptions can be repaired without it. If you really need it, and
your actual data is intact, the option almost guarantees the data can be
retrieved from the disk intact.

ReiserFS's stability is not actually quite good

Posted Apr 27, 2006 23:43 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

so you are saying that reiserfs knows better than the admin. personally I won't use software that decides it knows what I should do better than I do (good defaults are one thing, engineering the product to not allow for choice is another)

you are overlooking the fatal flaw with rebuild tree, namely it grabs ALL blocks that look like they are reiserfs tree blocks. unfortunately, with loop mounted filesystems you can have many filesystems on one partition (the main one, and then the ones that are images in files), and rebuild tree will try to combine them all.
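
a toy model may make the loop-image problem concrete. everything below is hypothetical: the `TREE` signature, the 4-byte filesystem id and the block layout are invented for illustration and have nothing to do with the real reiserfs on-disk format or with how reiserfsck is actually implemented. the second scan shows the "fs-specific id in every block" idea floated earlier in the thread.

```python
MAGIC = b"TREE"  # hypothetical signature, NOT the real reiserfs format


def make_tree_block(fs_id, payload):
    """Build a fake tree block: signature + 4-byte filesystem id + payload."""
    assert len(fs_id) == 4
    return MAGIC + fs_id + payload


def scan_by_magic(disk):
    """Indices of all blocks that merely look like tree blocks."""
    return [i for i, block in enumerate(disk) if block.startswith(MAGIC)]


def scan_by_magic_and_id(disk, fs_id):
    """Indices of tree-looking blocks that also carry the expected fs id."""
    return [i for i, block in enumerate(disk) if block.startswith(MAGIC + fs_id)]


HOST, GUEST = b"HOST", b"GUST"

# Blocks 0 and 3 belong to the host filesystem's tree. Block 2 is ordinary
# file data on the host, but the file is a filesystem image, so its content
# happens to be a tree block of the guest filesystem.
disk = [
    make_tree_block(HOST, b"root node"),
    b"plain file data",
    make_tree_block(GUEST, b"guest root node"),
    make_tree_block(HOST, b"leaf node"),
]
```

in this model `scan_by_magic(disk)` picks up block 2 as well, which is the mix-up described above, while `scan_by_magic_and_id(disk, HOST)` leaves it alone at the cost of spending a few bytes of every block on the id.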

ReiserFS's stability is not actually quite good

Posted Apr 28, 2006 5:32 UTC (Fri) by dmantione (guest, #4640) [Link]

Since it is an optional feature, and you are not required to use it, I do
not see the problem.

ReiserFS's stability is not actually quite good

Posted Apr 29, 2006 11:00 UTC (Sat) by nix (subscriber, #2304) [Link]

I can't imagine *any* filesystem saving my data if the disk has been placing blocks in the wrong place for five minutes during intense writes (which was what happened here).

reiserfsck is not going to be any luckier here than any other repair tool because the data is *overwritten*. --rebuild-tree is not magical (any tool which can be misled by the *contents of files* deserves a rather different word to describe it).

ReiserFS's stability is not actually quite good

Posted Apr 28, 2006 4:46 UTC (Fri) by zooko (subscriber, #2589) [Link]

You would prefer that the filesystem optimistically ignore errors when it can and the admin can restore from backups if that goes bad. This is a reasonable strategy for some workloads, where availability is more important than correctness in the short term, and the admin can manually restore correctness in the long term. If you have such a workload, you should probably not use ReiserFS. If on the other hand you have a workload where correctness is more important than availability, then ReiserFS would be a good tool for that use.

Regards,

Zooko

ReiserFS's stability is not actually quite good

Posted Apr 29, 2006 11:02 UTC (Sat) by nix (subscriber, #2304) [Link]

No, of course not. If an error is detected, the FS should go read-only as far as is possible so that data recovery can continue and the system can at least stay mostly running.

This is trivially possible: ext2fs and ext3fs both do it. That reiserfs does not is not to its credit.

ReiserFS's stability is not actually quite good

Posted May 8, 2008 15:57 UTC (Thu) by zooko (subscriber, #2589) [Link]

Hello.  Two years ago was the last comment in this thread, and here I am to add another one.
:-)

My comment is that this fault-injection analysis:

https://www.cs.wisc.edu/wind/Publications/iron-sosp05.pdf

says that what ext2fs and ext3fs do for many of the errors that they measured is nothing, i.e.
carry on as if nothing happened.  What reiserfs does for those same errors is stop.  You could
argue that switching to read-only mode would be better than stopping.  I don't know about
that.  Perhaps the "propagate" option in iron-sosp05.pdf would be better than the "stop"
option because then outer layer code (i.e. the kernel or even userland) can detect the error
and remount read-only.

But, as far as comparing the safety of ext2fs and ext3fs vs. reiserfs, the iron-sosp05.pdf
document seems to make it clear that ext2fs and ext3fs err on the side of increased
availability at the cost of higher risk of corruption, where reiserfs errs on the side of
increased correctness at the cost of higher risk of unavailability.

ReiserFS's stability is not actually quite good

Posted May 8, 2008 22:18 UTC (Thu) by nix (subscriber, #2304) [Link]

I'm still here :)

That increased availability is important.

As I see it, there are two types of file storage one might be interested 
in. There's files for which availability is more important than 
correctness-of-content-under-errors, and there are files where integrity 
is all.

The former case should be handled by detecting errors and going read-only 
(i.e. what ext2+ does, only perhaps with added integrity hashes so you can 
spot more failure cases). The latter case should be handled by making *the 
filesystem objects that are corrupted* unreadable (not the whole disk 
unless there's so much corruption that you can't be sure of anything).
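
The second policy above (make only the corrupted objects unreadable, keep everything else available) can be sketched with a per-object hash. This is a minimal in-memory illustration, not how any filesystem discussed here works; `CheckedStore`, `CorruptObject` and `flip_bit` are invented names, and SHA-256 is just one possible choice of hash.

```python
import hashlib


class CorruptObject(Exception):
    """Raised when an object's content no longer matches its stored hash."""


class CheckedStore:
    """Toy object store that verifies a per-object hash on every read."""

    def __init__(self):
        self._objects = {}

    def put(self, name, data):
        # Store the content together with the hash computed at write time.
        self._objects[name] = (bytearray(data), hashlib.sha256(data).digest())

    def get(self, name):
        data, digest = self._objects[name]
        if hashlib.sha256(data).digest() != digest:
            # Only this object becomes unreadable; the store stays usable.
            raise CorruptObject(name)
        return bytes(data)

    def flip_bit(self, name, index):
        """Simulate silent corruption by flipping one bit of stored content."""
        self._objects[name][0][index] ^= 0x01
```

A damaged object then fails loudly on read while every other object stays available, instead of the whole store stopping or the damage going unnoticed.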

Files that satisfy the former constraint are far more common than those 
that satisfy the latter, because you almost always want the fs to be 
mountable so you can recover as much as possible before hitting the 
backups. Files that satisfy the latter constraint... well, I'm trying to 
think of any and I'm coming up with cryptographic keys. (Definitely not
financial data: I work in that industry, and what matters there is 
availability above all. If one bit is flipped you cope and carry on, you 
don't go unavailable: after all, your competitors haven't stopped just 
because you're having system problems...)


Of course neither fs satisfies these constraints: reiserfs stops too hard
(and panics the whole machine!), ext* doesn't spot enough failures before 
they cascade into something horrid.


Recently I had a failure mode where the heads didn't bother to move after 
a journal write, leading to rubbish dumped into the journal. The drive 
went more demented shortly after that, mashed another fs, and the machine 
got rebooted... and ext3 thought ooh, we can just roll the journal 
forward! Oops, wrong, that scattered more corruption around, and because 
no fsck had been done the first we knew of it was when a dozen NFS mounts 
from that filesystem suddenly went read-only. e2fsck did a sterling job 
and got essentially everything back, even though we had part of an inode 
table claimed simultaneously by the journal inode and by a bunch of 
logfiles, with a bunch of logfile output written into it in place of 
inodes. (e2fsck wasn't stupid and realised that this was in an inode table 
so the other two files must be liars. Obviously we lost the metadata for 
most of those files, but all the important stuff came back OK. Amazing.)

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 10:32 UTC (Wed) by ikm (subscriber, #493) [Link]

> I was bitten by this bug myself quite a few times: system looked "just fine"
> at first glance but SHA1sums of files were different.

So what? This is a documented behaviour. Only the metadata is preserved, the actual data may be lost. This is also true for most of the other journalled filesystems. And XFS here actually has an advantage over many of them: it guarantees that the data that was lost will be filled with zeroes, instead of any garbage, which can leak sensitive information and so on.

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 11:53 UTC (Wed) by forthy (guest, #1525) [Link]

> So what? This is a documented behaviour. Only the metadata is
> preserved, the actual data may be lost.

From a user's point of view, this is not acceptable. And this is neither
true for ext3 nor for ReiserFS, which can keep both data and metadata
consistent.

Just take off your filesystem developer's hat, and take the user's hat
on: What's a file worth that sits there, has the correct name, the
correct length, the correct date, no warning sign, and just garbage in?
Nothing! It's worse than having that file deleted, because if it's
deleted, you know it's gone. If it's just tampered, and you don't know it
is, you are toast.

So stay away from filesystems that promise you just "metadata integrity",
because it buys you nothing. Either go full way (data+metadata journaling
or at least ordering), or no way (completely fscking random rubbish in
case of a crash, because then you at least know that it's completely
fscking random rubbish, and you have to dig out your backup).

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 13:02 UTC (Wed) by ikm (subscriber, #493) [Link]

> From a user's point of view, this is not acceptable.

Sure, I actually agree. Just that it's not a bug, but a documented behaviour. OTOH, it should say it more clearly, and the fs itself should not be PRed as a powerfail-safe solution.

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 27, 2006 13:16 UTC (Thu) by erich (subscriber, #7127) [Link]

Metadata integrity buys you fsck time. That's what it is about: not having to fsck your terabytes (or whatever) after a crash.

Ext3 and I guess reiserfs are usually not using full data journalling either, because that is much slower. ext3 can, but I think it comes with a serious speed penalty.

Just get a log of your fsck on reboot, and you should have information on which files were trashed, and restore them from backup if they aren't okay.

If an application needs to ensure data integrity, it should be handled in the application, not the filesystem. The application can usually do this much more efficiently. Or at all.

A database file will be changing all the time, probably being in a completely "consistent" state only when the database server is actually shut down properly. What does data journalling buy you, if you need to be able to handle application crashes and such anyway, too?

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted May 6, 2006 13:30 UTC (Sat) by anton (subscriber, #25547) [Link]

> Metadata integrity buys you fsck time. Thats what it is about, not having to fsck your terabytes (whatever) after a crash.

I don't think that fsck could do anything for data integrity, so that sounds like nonsense. Also, with journaling, fsck time is much less important.

With conventional and journaling file systems, metadata-only integrity is a way to achieve better performance and/or lower complexity.

> Ext3 and I guess reiserfs are usually not using full data journalling either, because that is much slower. ext3 can, but I think it comes with a serious speed penalty.

And last time I looked, full data journaling was discouraged for ext3 (apparently that part was no longer maintained).

> If an application needs to ensure data integrity, it should be handled in the application, not the filesystem. The application can usually do this much more efficiently. Or at all.
> What does data journalling buy you, if you need to be able to handle application crashes and such anyway, too?

What happens on an application crash is completely different from what happens on a system crash (power outage, OS crash, etc.). You can write an application such that it will not lose data on an application crash, but it can still lose data in case of a system crash (e.g., if the programmer forgot some fsync()s). And resilience against application crashes can be tested much better than resilience against system crashes.

So what I would like to see is a file system with in-order semantics, i.e., where there is no difference between what happens on an application crash and what happens on a system crash. And with the right file system, providing this can be more efficient than littering the applications with fsync()s.
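
For comparison, this is roughly what an application has to do today to survive a system crash without in-order semantics from the file system. It is a minimal sketch assuming a POSIX system; `durable_replace` and the `.tmp` suffix are names chosen for illustration only.

```python
import os


def durable_replace(path, data):
    """Replace the file at `path` with `data` so that after a crash the file
    holds either the complete old content or the complete new content."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical naming convention

    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # First fsync: the new content must be on disk before the rename.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Atomic switch from the old file to the new one.
    os.replace(tmp_path, path)

    # Second fsync: make the rename itself durable by syncing the directory.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Forgetting either fsync() is exactly the kind of mistake that stays invisible on an application crash and only shows up after a power failure.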

Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian

Posted Apr 26, 2006 15:59 UTC (Wed) by khim (subscriber, #9252) [Link]

> So what? This is a documented behaviour. Only the metadata is preserved, the actual data may be lost.

Then why have a logging filesystem at all? Note: I'm not talking about "fresh" files - they are not expected to survive. But when files created days and weeks ago perish in a crash... sorry - this is not what I expect from a logging filesystem.

In a few cases I even had corrupted files which survived the first crash but not the second one - gosh. Worse than EXT2 without any logging: at least there you can be sure files are there - or not...

judging from what?

Posted Apr 27, 2006 6:35 UTC (Thu) by gvy (guest, #11981) [Link]

If from MLs, maybe "never been". In my experience, even though I like and use XFS, it's a very reasonable explanation of some data losses experienced when UPSes went out (ISP facility, we weren't -- and still aren't -- able to get notified that the batteries run low).

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds