Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
There are a lot of Linux filesystem comparisons available, but most of them are anecdotal, based on artificial tasks, or completed under older kernels. This benchmark essay is based on 11 real-world tasks appropriate for a file server with older-generation hardware (Pentium II/III, EIDE hard drive).
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 25, 2006 22:50 UTC (Tue) by dwheeler (guest, #1216) [Link]
Nice filesystem survey article! But it omits some other trade-offs. In particular, according to Theodore Ts'o (at FISL 2005), there's a serious "data loss on power failure" condition in XFS. SGI's XFS was originally designed for SGI hardware, and SGI machines were specifically designed so hard drives would stop writing when power began to fail... to prevent catastrophe. XFS was designed with this presumption (that the hard drives would stop first). Unfortunately, x86 systems don't do that... the memory usually goes first, so you end up writing garbage to the disk on an x86. That can be a BIG BIG problem. Ts'o claimed that the other filesystems carried much less risk of data loss when power failed on a commodity x86 design.
If your system is on a UPS, and you trust it, I understand that XFS works very nicely in many circumstances. But it may not be a good solution without one.
Perhaps this is obsolete information, I'd love to know if this is no longer true.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 25, 2006 23:16 UTC (Tue) by hch (guest, #5625) [Link]
This has never been true. People have been trying to talk Ted out of that for at least five years, but he keeps on repeating it without backing up the facts, unfortunately.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 25, 2006 23:56 UTC (Tue) by khim (subscriber, #9252) [Link]
I was bitten by this bug myself quite a few times: the system looked "just fine" at first glance, but the SHA1 sums of files were different. Unfortunately I had SHA1 sums for some files but not for all of them, and this was a HUGE mess. In the end I converted all servers to ReiserFS and never looked back.
Sorry, but "XFS is flaky when powered off under load" is NOT Ted's delusion. I was never able to reproduce the problem without terabytes of data and high load, so I know debugging is hard. But if you don't have high load and terabytes of data, then you can use any filesystem and it won't make any difference, right? Conclusion: do not use XFS... Ever.
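[Ed: the kind of silent-corruption check described above can be sketched with standard tools; the file names below are made up for illustration.]

```shell
# Keep a checksum manifest while the data is known-good, then verify it
# after a crash. Paths and contents here are hypothetical.
cd "$(mktemp -d)"
echo "important data" > file-a
echo "more data" > file-b
sha1sum file-a file-b > checksums.sha1   # record the baseline
sha1sum -c checksums.sha1                # later: prints OK/FAILED per file
```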
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 0:54 UTC (Wed) by hawk (subscriber, #3195) [Link]
On the other hand, with machines with terabytes of data and high load, one might consider having a UPS... :-)
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 1:10 UTC (Wed) by sbergman27 (guest, #10767) [Link]
So why have journalling filesystems at all? So much development time.... just wasted. And all we needed to do was buy a UPS! ;-)
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 15:15 UTC (Wed) by NightMonkey (subscriber, #23051) [Link]
Certainly consider it, but a UPS doesn't do much in the presence of a short on the motherboard, or all power supplies failing, or even the power connector on the hard drive coming loose, etc....
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 7:14 UTC (Wed) by nix (subscriber, #2304) [Link]
You converted filesystems to *reiserfs* for *stability*?
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 13:01 UTC (Wed) by arafel (guest, #18557) [Link]
Some people like to live dangerously...
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 21:18 UTC (Wed) by nix (subscriber, #2304) [Link]
God knows why. It's not even especially fast, even compared to something like ext3 with dir_index, IME.
(A shame: I like tree-structured filesystems a lot. Just not reiserfs --- or NTFS, for that matter, a filesystem which reiserfs definitely *is* a hell of a lot faster than.)
(And who can forget those reiserfs design docs, with the somewhat grandiose claims (most of which I agree with, but they were oddly put), and the blue men... a work of immortal genius, or perhaps hallucinogenic drugs ;) )
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 13:46 UTC (Wed) by dmantione (guest, #4640) [Link]
Stop trolling, please. For terabytes of data, XFS and ReiserFS are the realistic choices. ReiserFS has excellent reliability, I'd say better than XFS, and excellent resize and fsck tools.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 16:56 UTC (Wed) by nevyn (subscriber, #33129) [Link]
Feel free to read about how bad fsck.reiserfs is.
If you want to use reiserfs for the speed, feel free to do so ... just don't pretend you don't need a really good backup strategy.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 21:21 UTC (Wed) by nix (subscriber, #2304) [Link]
Oh, yes, I spotted that when it went by for the first time. The idea of `stitch reiserfs-ish blocks together' makes a lot of sense until you consider loopback filesystems...
(One possible fix would be to put an fs-specific uuid in every block, but I hope anyone actually trying this realises how crazy it is and shoots themselves for the sake of our disk space.)
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 27, 2006 6:06 UTC (Thu) by dmantione (guest, #4640) [Link]
That link you post is a big troll. Reiserfsck can indeed rebuild the filesystem (--rebuild-tree) from scratch by searching the disk. You only use it when no other recovery is possible. However, if no other recovery is possible, the recovery chance is still near 100%, unless you indeed had reiserfs images on your disk, but I doubt that is the case for many people.
No, I don't use reiserfs for speed. It just happens to contain far fewer bugs for very large filesystems than ext3 does.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 27, 2006 13:08 UTC (Thu) by erich (guest, #7127) [Link]
--rebuild-tree didn't work for me, and reiserfsck actually made things _worse_. It's crap. Or it was back then, when I stopped using reiserfs.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 27, 2006 13:54 UTC (Thu) by dmantione (guest, #4640) [Link]
Yes, --rebuild-tree rebuilds the entire filesystem, so if it fails the filesystem is not accessible, because the old tree is no longer available. This situation remains until a successful rebuild has been done, so investigate what the cause of the failure is and try again.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 29, 2006 10:56 UTC (Sat) by nix (subscriber, #2304) [Link]
This is, of course, an extremely... peculiar design. The robust approach would be to build a new tree in parallel with the old (as long as space was available and the old tree was undamaged enough to determine which blocks were free), then switch over to the new tree atomically.
This is harder, but it does make the thing fail-safe.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 27, 2006 13:06 UTC (Thu) by erich (guest, #7127) [Link]
reiserfs may have excellent reliability... unless it crashes and trashes your whole filesystem, that is.
Happened to me, and if you do a poll on a major Linux channel you'll find tons of people burnt by reiserfs.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 28, 2006 4:43 UTC (Fri) by zooko (guest, #2589) [Link]
The trouble is that "doing a poll", or more realistically chatting back and forth and swapping war stories, is a terrible way to figure out the truth about things. Humankind has recently developed a set of alternate techniques to figure out the truth about things, which collectively go under the rubric of "science". The older technique would best be titled "folklore".
It is part of Linux culture to sit around and swap stories and form sort of a group consensus on things which are otherwise not measured or analyzed. Does the group consensus usually settle on the truth? Who knows. Folklore is often right. Sometimes not. I'll withhold judgment until I see something better.
(The article linked to in this thread that did fault injection and source analysis of myriad possible failures is an example of something better.)
Regards,
Zooko
ReiserFS's stability is actually quite good
Posted Apr 26, 2006 14:07 UTC (Wed) by gmaxwell (guest, #30048) [Link]
Please read http://www.cs.wisc.edu/adsl/Publications/iron-sosp05.pdf.
ReiserFS's handling of exceptional conditions is actually head and shoulders above the competition. The conservative policy of panicking rather than running with corrupted data when something unexpected happens, combined with the dislike of Hans by some of the more notable Linux personalities, has resulted in a perception which is diametrically opposed to reality.
ReiserFS's stability is not actually quite good
Posted Apr 26, 2006 21:16 UTC (Wed) by nix (subscriber, #2304) [Link]
I'm thinking more of the three completely chewed-up news filesystems reiser3 gifted me with (each accompanied by a nice panic-with-no-reboot-even-though-I-asked-for-it, killing a colo box which otherwise went to great lengths to avoid anything that might take it down: UPS, RAID, the lot), and the half-a-dozen panics for no obvious reason that I've also been hit with (on completely functional hardware). I don't dare use reiserfs on anything but news filesystems, and even there I have stopped using it on any machine that I need not to panic under load (like, uh, an expiry run). It's caused me more trouble in the few places I've used it than every other filesystem I've ever used, on *any* Unix system.
reiser3 is nearly unmaintained, and it shows. My experience has been that it's an unreliable and decidedly dangerous fs which does more to reduce system stability than any other major filesystem you could use (no, umsdos is not major in that sense).
And as for that vaunted failure case: panicking on any tiny error is *utter* stupidity if the only flaw is on a relatively unimportant filesystem! The only advantage of panicking on failure is that it spares the fs developers from having to do any analysis of failure modes to determine if the system could potentially stay up under a given mode. Other filesystems manage it. Why can't reiserfs?
ReiserFS's stability is not actually quite good
Posted Apr 26, 2006 21:24 UTC (Wed) by nix (subscriber, #2304) [Link]
Lest I seem excessively harsh, I'm only talking, oh, ten panics in four years here. But still that's ten panics more than ext2, ext3, or xfs have ever hit me with. (ext3 even coped on a machine with RAM so bad you couldn't md5sum a 10Mb file twice in a row and get the same answer!)
(I had major disk corruption on ext3 once, but when your disk drive decides to put blocks somewhere other than where the filesystem asked for them to go, you have little option but to get a new disk and restore from backup, really...)
Even with reiserfs, Linux is damned stable. But reiserfs doesn't help.
ReiserFS's stability is not actually quite good
Posted Apr 27, 2006 13:36 UTC (Thu) by dmantione (guest, #4640) [Link]
> I had major disk corruption on ext3 once, but when your disk drive decides
> to put blocks somewhere other than where the filesystem asked for them to
> go, you have little option but to get a new disk and restore from backup,
> really...
Yes, you would prefer to restore from backup.
However, if you did not back up the data, either due to a cost/benefit calculation (i.e. I wouldn't back up my mp3 collection but would try to restore it) or due to stupidity, you have exactly the situation where Reiserfs would have saved your data.
In such a situation the reiserfsck --rebuild-tree option, which you bashed in an earlier post of yours, is invaluable. The procedure for recovery is as follows:
* Connect a new disk the same size as, or larger than, the failing disk.
* Boot from a rescue CD.
* Do not enable any LVM arrays.
* dd the old disk over the new one. If it fails due to a bad sector, skip
some sectors and continue from that place using the right command line
options.
* Remove the old disk.
* If the disk was in an LVM array, enable it. LVM will detect the new
disk as if it was the old one.
* Reiserfsck the filesystem with --rebuild-tree.
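[Ed: as a rough sketch, the dd-and-rebuild steps above might look like the following; the device names are made up, and on a real system you must double-check which disk is which before copying.]

```shell
# Copy the failing disk (/dev/sda, hypothetical) onto the new one (/dev/sdb).
# conv=noerror,sync keeps going past bad sectors, padding them with zeroes
# so the copy stays the same size as the original.
dd if=/dev/sda of=/dev/sdb bs=64k conv=noerror,sync

# After removing the old disk (and re-enabling LVM if applicable),
# rebuild the tree on the copied partition:
reiserfsck --rebuild-tree /dev/sdb1
```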
Your actual data loss depends on the severity of the corruption. I've had this experience once, with about 12000 bad sectors, in which I lost 700 MB out of 1.2 GB of data. Some files did appear in /lost+found.
It is easy to bash the --rebuild-tree option because it is not 100% safe due to possible reiserfs images on the disk. You don't need to use it; a lot of corruption can be repaired without it. If you really need it, and your actual data is intact, the option almost guarantees the data can be retrieved from the disk intact.
ReiserFS's stability is not actually quite good
Posted Apr 27, 2006 23:43 UTC (Thu) by dlang (guest, #313) [Link]
So you are saying that reiserfs knows better than the admin. Personally, I won't use software that decides it knows what I should do better than I do (good defaults are one thing; engineering the product to not allow for choice is another).
You are overlooking the fatal flaw with --rebuild-tree, namely that it grabs ALL blocks that look like reiserfs tree blocks. Unfortunately, with loop-mounted filesystems you can have many filesystems on one partition (the main one, and then the ones that are images in files), and --rebuild-tree will try to combine them all.
ReiserFS's stability is not actually quite good
Posted Apr 28, 2006 5:32 UTC (Fri) by dmantione (guest, #4640) [Link]
Since it is an optional feature, and you are not required to use it, I do not see the problem.
ReiserFS's stability is not actually quite good
Posted Apr 29, 2006 11:00 UTC (Sat) by nix (subscriber, #2304) [Link]
I can't imagine *any* filesystem saving my data if the disk has been placing blocks in the wrong place for five minutes during intense writes (which was what happened here).
reiserfsck is not going to be any luckier here than any other repair tool because the data is *overwritten*. --rebuild-tree is not magical (any tool which can be misled by the *contents of files* deserves a rather different word to describe it).
ReiserFS's stability is not actually quite good
Posted Apr 28, 2006 4:46 UTC (Fri) by zooko (guest, #2589) [Link]
You would prefer that the filesystem optimistically ignore errors when it can, and the admin can restore from backups if that goes bad. This is a reasonable strategy for some workloads, where availability is more important than correctness in the short term, and the admin can manually restore correctness in the long term. If you have such a workload, you should probably not use ReiserFS. If, on the other hand, you have a workload where correctness is more important than availability, then ReiserFS would be a good tool for that use.
Regards,
Zooko
ReiserFS's stability is not actually quite good
Posted Apr 29, 2006 11:02 UTC (Sat) by nix (subscriber, #2304) [Link]
No, of course not. If an error is detected, the FS should go read-only as far as is possible so that data recovery can continue and the system can at least stay mostly running.
This is trivially possible: ext2fs and ext3fs both do it. That reiserfs does not is not to its credit.
ReiserFS's stability is not actually quite good
Posted May 8, 2008 15:57 UTC (Thu) by zooko (guest, #2589) [Link]
Hello. Two years ago was the last comment in this thread, and here I am to add another one. :-) My comment is that this fault-injection analysis: https://www.cs.wisc.edu/wind/Publications/iron-sosp05.pdf says that what ext2fs and ext3fs do for many of the errors that they measured is nothing, i.e. carry on as if nothing happened. What reiserfs does for those same errors is stop. You could argue that switching to read-only mode would be better than stopping. I don't know about that. Perhaps the "propagate" option in iron-sosp05.pdf would be better than the "stop" option, because then outer-layer code (i.e. the kernel or even userland) could detect the error and remount read-only. But as far as comparing the safety of ext2fs and ext3fs vs. reiserfs goes, the iron-sosp05.pdf document seems to make it clear that ext2fs and ext3fs err on the side of increased availability at the cost of a higher risk of corruption, whereas reiserfs errs on the side of increased correctness at the cost of a higher risk of unavailability.
ReiserFS's stability is not actually quite good
Posted May 8, 2008 22:18 UTC (Thu) by nix (subscriber, #2304) [Link]
I'm still here :) That increased availability is important. As I see it, there are two types of file storage one might be interested in. There are files for which availability is more important than correctness-of-content-under-errors, and there are files where integrity is all. The former case should be handled by detecting errors and going read-only (i.e. what ext2+ does, only perhaps with added integrity hashes so you can spot more failure cases). The latter case should be handled by making *the filesystem objects that are corrupted* unreadable (not the whole disk, unless there's so much corruption that you can't be sure of anything).
Files that satisfy the former constraint are far more common than those that satisfy the latter, because you almost always want the fs to be mountable so you can recover as much as possible before hitting the backups. Files that satisfy the latter constraint... well, I'm trying to think of any and I'm coming up with cryptographic keys. (Definitely not financial data: I work in that industry, and what matters there is availability above all. If one bit is flipped you cope and carry on, you don't go unavailable: after all, your competitors haven't stopped just because you're having system problems...)
Of course, neither fs satisfies these constraints: reiserfs stops too hard (and panics the whole machine!), while ext* doesn't spot enough failures before they cascade into something horrid. Recently I had a failure mode where the heads didn't bother to move after a journal write, leading to rubbish dumped into the journal. The drive went more demented shortly after that, mashed another fs, and the machine got rebooted... and ext3 thought ooh, we can just roll the journal forward! Oops, wrong: that scattered more corruption around, and because no fsck had been done, the first we knew of it was when a dozen NFS mounts from that filesystem suddenly went read-only.
e2fsck did a sterling job and got essentially everything back, even though we had part of an inode table claimed simultaneously by the journal inode and by a bunch of logfiles, with a bunch of logfile output written into it in place of inodes. (e2fsck wasn't stupid and realised that this was in an inode table so the other two files must be liars. Obviously we lost the metadata for most of those files, but all the important stuff came back OK. Amazing.)
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 10:32 UTC (Wed) by ikm (subscriber, #493) [Link]
> I was bitten by this bug myself quite a few times: the system looked "just fine"
> at first glance, but the SHA1 sums of files were different.
So what? This is documented behaviour. Only the metadata is preserved; the actual data may be lost. This is also true for most of the other journalled filesystems. And XFS here actually has an advantage over many of them: it guarantees that the data that was lost will be filled with zeroes instead of arbitrary garbage, which could leak sensitive information and so on.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 11:53 UTC (Wed) by forthy (guest, #1525) [Link]
> So what? This is a documented behaviour. Only the metadata is
> preserved, the actual data may be lost.
From a user's point of view, this is not acceptable. And this is neither
true for ext3 nor for ReiserFS, which can keep both data and metadata
consistent.
Just take off your filesystem developer's hat and put the user's hat on: what's a file worth that sits there, has the correct name, the correct length, the correct date, no warning sign, and just garbage inside? Nothing! It's worse than having the file deleted, because if it's deleted, you know it's gone. If it's just corrupted, and you don't know it is, you are toast.
So stay away from filesystems that promise you just "metadata integrity",
because it buys you nothing. Either go full way (data+metadata journaling
or at least ordering), or no way (completely fscking random rubbish in
case of a crash, because then you at least know that it's completely
fscking random rubbish, and you have to dig out your backup).
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 13:02 UTC (Wed) by ikm (subscriber, #493) [Link]
> From a user's point of view, this is not acceptable.
Sure, I actually agree. It's just that it's not a bug, but documented behaviour. OTOH, the documentation should say it more clearly, and the fs itself should not be PRed as a powerfail-safe solution.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 27, 2006 13:16 UTC (Thu) by erich (guest, #7127) [Link]
Metadata integrity buys you fsck time. That's what it is about: not having to fsck your terabytes (whatever) after a crash.
Ext3 and, I guess, reiserfs are usually not using full data journalling either, because that is much slower. ext3 can, but I think it comes with a serious speed penalty.
Just get a log of your fsck on reboot, and you should have information on which files were trashed, and restore them from backup if they aren't okay.
If an application needs to ensure data integrity, it should be handled in the application, not the filesystem. The application can usually do this much more efficiently. Or at all.
A database file will be changing all the time, probably only being in a completely "consistent" state only when the database server is actually shut down properly. What does data journalling buy you, if you need to be able to handle application crashes and such anyway, too?
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted May 6, 2006 13:30 UTC (Sat) by anton (subscriber, #25547) [Link]
> Metadata integrity buys you fsck time. Thats what it is about, not having to fsck your terabytes (whatever) after a crash.
I don't think that fsck could do anything for data integrity, so that sounds like nonsense. Also, with journaling, fsck time is much less important.
With conventional and journaling file systems, metadata-only integrity is a way to achieve better performance and/or lower complexity.
> Ext3 and I guess reiserfs are usually not using full data journalling either, because that is much slower. ext3 can, but I think it comes with a serious speed penalty.
And last time I looked, full data journaling was discouraged for ext3 (apparently that part was no longer maintained).
> If an application needs to ensure data integrity, it should be handled in the application, not the filesystem. The application can usually do this much more efficiently. Or at all.
> What does data journalling buy you, if you need to be able to handle application crashes and such anyway, too?
What happens on an application crash is completely different from what happens on a system crash (power outage, OS crash, etc.). You can write an application such that it will not lose data on an application crash, but it can still lose data in case of a system crash (e.g., if the programmer forgot some fsync()s). And resilience against application crashes can be tested much better than resilience against system crashes.
So what I would like to see is a file system with in-order semantics, i.e., where there is no difference between what happens on an application crash and what happens on a system crash. And with the right file system, providing this can be more efficient than littering the applications with fsync()s.
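[Ed: the "littering with fsync()s" that applications must do today usually takes the form of the write-then-rename pattern; this is a minimal sketch with made-up file names.]

```shell
# Crash-safe update of a file: write a temporary copy, force it to stable
# storage, then atomically rename it over the old version.
cd "$(mktemp -d)"
echo "old contents" > data.txt

echo "new contents" > data.txt.tmp
sync data.txt.tmp          # fsync the new data before making it visible
mv data.txt.tmp data.txt   # rename() atomically replaces the old file
sync .                     # fsync the directory so the rename itself persists
cat data.txt               # now shows the new contents
```

With a filesystem offering in-order semantics, the two explicit sync steps would be unnecessary: a crash could only ever expose the old or the new version, never a mix.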
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 15:59 UTC (Wed) by khim (subscriber, #9252) [Link]
> So what? This is a documented behaviour. Only the metadata is preserved, the actual data may be lost.
Then why have a logging filesystem at all? Note: I'm not talking about "fresh" files; they are not expected to survive. But when files created days and weeks ago perish in a crash... sorry, this is not what I expect from a logging filesystem.
In a few cases I even had corrupted files which survived the first crash but not the second one... gosh. Worse than ext2 without any logging: at least there you can be sure files are there, or not...
judging from what?
Posted Apr 27, 2006 6:35 UTC (Thu) by gvy (guest, #11981) [Link]
If from mailing lists, maybe "never been". In my experience, even though I like and use XFS, it's a very reasonable explanation of some data losses experienced when UPSes went out (an ISP facility; we weren't, and still aren't, able to get notified that the batteries run low).
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 0:31 UTC (Wed) by sbergman27 (guest, #10767) [Link]
What this article leaves out, what most *every* fs benchmark article leaves out, is the fact that ext3 gives you a much greater level of journalling protection than the others by default. By default, ext3 gives you metadata journalling and ordered writes of metadata and data. The others only give you metadata journalling by default. And, of the others, only reiserfs gives you the option of anything higher.
XFS, JFS, ext3, and Reiserfs all offer plain metadata journalling. This level basically gives you no more protection than a non-journalling filesystem, beyond the fact that you don't have to run fsck after a power failure. After an unclean shutdown, you are guaranteed that the filesystem structure will be intact, but some of your file data may be garbage. This is the default for XFS, JFS, and Reiserfs.
Reiserfs and ext3 offer full data journalling. In general, this is the slowest of the journalling levels, because you are writing the data twice: once to the journal (which is fast, because the journal is always streamed out sequentially) and once to its ultimate location. It gives you the greatest level of protection. In fact, it gives the same level of protection as mounting the filesystem synchronously. If the application is told that the data was written, it's as good as written. After a power failure, the filesystem structure is guaranteed to be intact, and the file data is guaranteed to be intact. None of the tested filesystems default to this level.
ext3 offers an intermediate option called "ordered" mode. It is basically metadata journalling, but it adds a constraint that writes to disk will be ordered in such a way as to give an extra guarantee: not only is the filesystem structure guaranteed to be intact, but the data is also guaranteed to be intact. It may not be the *latest* data that the app was told was written to disk, but it is guaranteed not to be garbage; i.e. it may be the data that was there before the write request. There is a significant performance penalty, but not as great as for full data journalling. This is (wisely, in my opinion) ext3's default.
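[Ed: the three levels map onto ext3's data= mount option; a minimal sketch, with a made-up device and mount point.]

```shell
# ext3's three journalling modes, selected at mount time:
mount -t ext3 -o data=writeback /dev/hda2 /mnt/data  # metadata-only journalling
mount -t ext3 -o data=ordered   /dev/hda2 /mnt/data  # ext3's default
mount -t ext3 -o data=journal   /dev/hda2 /mnt/data  # full data journalling
```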
Another thing the article misses is the fact that ext3 reserves 5% of space for its fragmentation-avoidance algorithms, so that 92.77% partition capacity really should be 97.77%. You can set the reserved space to 0% if you really want. (But keep in mind that the 5% reserve, like the performance penalty for ordered journalling, buys you something.)
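[Ed: adjusting that reserve is a one-liner with tune2fs; the device name here is hypothetical.]

```shell
tune2fs -m 0 /dev/hda2   # drop the reserved-blocks percentage to 0%
tune2fs -m 5 /dev/hda2   # restore the default 5% reserve
```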
I firmly believe that if ext3 had been designed and maintained by people with a little more marketing savvy, and a little less concern with doing what makes sense from a technical standpoint (like reis... *cough*... er), ext3 would have a much better performance reputation. And I'm glad it wasn't.
(David, I know you know all this, but some budding filesystem benchmarker might read this and get it right.)
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 1:47 UTC (Wed) by dlang (guest, #313) [Link]
Keep in mind that the benefits you are listing only count if you are using a hard drive that allows you to disable the write cache on it (i.e. no IDE drives qualify). If the drive can tell you that the write is completed while the data is still in the drive's memory, it's still vulnerable to being lost.
Since there is a huge performance penalty for doing this, it's seldom done even on the drives that support it.
David Lang
P.S. And no, the drives don't use their platter energy to drive the electronics long enough to write out their data. As an exercise, consider how many seeks could be required to write all the data in a 16MB buffer, and how long that could take. Then add in the problems of writing to a platter that's slowing down as you write, and you start to realize why nobody does that anymore.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 2:49 UTC (Wed) by sbergman27 (guest, #10767) [Link]
There is a difference between the benefits "not counting" and the guarantees not being absolute.
But you are correct. Hardware write caching does figure in.
You *can* turn write caching off on most IDE drives, but, as you say, there is a considerable performance penalty due to the insane limitation on write-request size imposed by the ATA standard. So far as I know, SCSI drives are not so hampered.
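[Ed: toggling the cache on an IDE drive is typically done with hdparm; the device name is made up, and the commands require root.]

```shell
hdparm -W 0 /dev/hda   # disable the drive's write-back cache
hdparm -W 1 /dev/hda   # re-enable it
hdparm -I /dev/hda | grep -i 'write cache'   # inspect the current setting
```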
P.S. Did I say something about the drive using its platter energy for something?
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 7:28 UTC (Wed) by dlang (guest, #313) [Link]
No, you didn't say anything about platter energy, but it's a common misconception that people have about drives and the caches on them.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 15:14 UTC (Wed) by sbergman27 (guest, #10767) [Link]
I saw references to that misconception twice yesterday. Oddly, the last time I encountered it (that I remember) was in 1979, when my first computer science instructor made reference to the "fact" that the drive did that. I believed it at the time, and later decided it was urban legend.
I notice in your previous post you indicate that they don't do that "anymore". Did they really used to do that?
Also of note (and sorry, I don't have a ready link): I was once struck by a thread on lkml in which someone was crash-testing different filesystems under Linux. He had a filesystem with lots of writes going on, knew what all the md5sums should be, or something like that, and would pull the plug at random times and observe what happened. His question to the list was for someone to please point out what he was "doing wrong": you see, all the other filesystems in the test corrupted files with more or less the same frequency. All except ext3, that is. It seemed never (or very rarely) to corrupt files, and he felt that there must be a problem with his methodology. It was explained that ext3 defaults to data=ordered mode and that such behavior was really expected.
My summary of the thread is probably not completely accurate because it's been a while since I read it, but I was struck by the fact that someone observed the difference even though they were not expecting it.
If I can dig up a link, I'll post it.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 16:15 UTC (Wed) by dlang (guest, #313) [Link]
>I notice in your previous post you indicate that they don't do that "anymore". Did they really used to do that?
I don't know for sure, but every urban legend needs to get started somewhere right? :-)
thinking about it from a practical standpoint, at one point drives had very small buffers (ram was too expensive, and too large to put much on a drive) along with a lot of rotating mass, so it would have been possible to flush that buffer immediately on power loss (and the tolerances of the data on disk were loose enough to accept the slight distortion that would result).
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 6:46 UTC (Wed) by joib (subscriber, #8541) [Link]
> keep in mind that these benefits you are listing only count if you are
> using a hard drive that allows you to disable the write cache on it
> (i.e. no IDE drives qualify).
As was already mentioned, most IDE drives allow you to disable write-back caching. However, many manufacturers consider this operation a warranty-voiding one, since disabling write-back caching causes many more physical writes, which significantly reduces the life of the drive.
> if the drive can tell you that the write is completed while it's in the
> drive's memory it's still vulnerable to being lost. since there is a huge
> performance penalty for doing this it's seldom done even on the drives
> that support it.
Fortunately, you can have your cake and eat it too. The trick is to implement IO barriers using the CACHE FLUSH and/or FUA commands. That way you can have the performance and MTBF benefits of write-back caching while still having a safe fsync() (safe as in doesn't return before data is on the platters).
Also note that the IO barrier rewrite referenced above was only included starting with 2.6.16; I don't know how earlier kernels did it.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 7:29 UTC (Wed) by dlang (guest, #313) [Link]
I hadn't caught the fact that the IO barriers had made it into the kernel; I knew they were being worked on. Prior to that going in, the only option the kernel had was to stop all IO to the drive while issuing a full flush to it.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted May 6, 2006 13:56 UTC (Sat) by anton (subscriber, #25547) [Link]
> hard drive that allows you to disable the write cache on it (i.e. no
> IDE drives qualify).
hdparm -W0 works on any IDE drive I have tried it on (and we run all
our ext3 FSs without disk write caches).
>since there is a huge performance penalty for doing this
Well, I recently tried it:
Disk: SAMSUNG SV1204H, ATA DISK drive
FS: ext3
Task: writing a 4.3GB file
Time: 12 min without write caching
6 min with write caching
So the penalty was a factor of 2 in this case.
>as an excercise consider how many seeks could be required to write all
>the data in a 16M buffer, and how long that could take.
The drive could have a 16MB spare area for just this case, and dump
the buffer contents there; it could read that area on powerup, and
then proceed as if there had been no power outage (i.e. write the
blocks to their home location). On a modern drive, this would take
0.25s. So it's not impossible, but I don't believe it is being done.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 16:05 UTC (Wed) by Duncan (guest, #6647) [Link]
> What this article leaves out... what most *every*
> fs benchmark article leaves out is the fact that
> EXT3 gives you a much greater level of journalling
> protection than the others by default. By default,
> EXT3 gives you metadata journalling and ordered
> writes of metadata and data. The others only give
> you metadata journalling by default. And, of the
> others, only reiserfs gives you the option of
> anything higher.
> Reiserfs and EXT3 offer full data journalling.
> EXT3 offers an intermediate option called "ordered"
> mode. [...] This is (wisely in my opinion) EXT3's
> default.
Actually, reiserfs offers ordered mode as well. It didn't originally, but
Chris Mason's patch adding the functionality was merged into the
mainline kernel before full data journalling for reiserfs was added. (A
google turns up this changelog for 2.6.6-rc1:
http://lwn.net/Articles/80719/ ) It became the default either at that
time or soon thereafter.
That said, if you didn't know it was there (I knew in part due to LWN
coverage), it would have been, and remains, very hard to notice that it's
now using ordered mode. The output at filesystem mount doesn't mention the
fact, and of
course being the default, there's no indication in fstab. One has to look
quite carefully at the dmesg output for the mount to notice it. It should
have a line something like (from my boot log, md_d1p1 of course indicates
partitioned RAID):
ReiserFS: md_d1p1: using ordered data mode
I remember looking to see that I was using ordered shortly after
installing and booting that kernel, and wondering why it didn't
mention "ordered mode" in the mount output. I certainly would have missed
it too, had I not known it was there. To your credit, you knew about the
later journalled mode reiserfs patches, but you apparently missed the
ordered mode patches, and the fact that, just as with ext3, ordered is now
the default for reiserfs.
Oh, for anyone interested, those patches /did/ make reiserfs far more
stable. I had a bout with some bad memory during which I was crashing
quite frequently -- and usually under high load and disk activity at
that -- and reiserfs came thru with flying colors! =8^) That's far better
than it did when I first started using it, back in the bad old early 2.4
days.
Duncan
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 21:41 UTC (Wed) by nix (subscriber, #2304) [Link]
Yeah, I haven't lost any reiserfs filesystems since then anyway.
(It's just panicked instead. Three times.)
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 27, 2006 9:23 UTC (Thu) by Wol (subscriber, #4433) [Link]
I believe there's also an experimental filesystem called TuxFS. It guarantees integrity without journaling :-)
Basically, it never overwrites a live file. Any changes, it writes the modified block out in full somewhere else. Then it rewrites the updated file header out in full somewhere else. Then the directory header ...
Until finally it rewrites the root block. The only time there's any danger is if it goes down while writing the root block. At all other times, the root block is pointing at a completely valid filesystem. If the system crashes, all updates since the last root block update are lost because the modified blocks are orphaned. But all previous data is safe, because data is never modified "in situ".
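The never-overwrite-in-place scheme described above can be illustrated with a toy copy-on-write tree. This is only a sketch of the idea, not TuxFS code; all names are invented:

```python
# Toy copy-on-write update: nothing live is modified in place. Changed
# blocks are written as new nodes, root-to-leaf copies share unchanged
# subtrees, and the root pointer is switched last.

def cow_set(node, path, value):
    """Return a NEW tree with value set at path, sharing unchanged subtrees."""
    if not path:
        return value
    new = dict(node) if node else {}          # shallow copy of this "block"
    new[path[0]] = cow_set(new.get(path[0]), path[1:], value)
    return new

old_root = {"etc": {"fstab": "v1"}, "home": {"readme": "hi"}}
new_root = cow_set(old_root, ["etc", "fstab"], "v2")

# The old root still points at a completely valid tree:
assert old_root["etc"]["fstab"] == "v1"
assert new_root["etc"]["fstab"] == "v2"
# The unchanged subtree is shared, not copied:
assert new_root["home"] is old_root["home"]
```

Crashing before the final root-pointer switch simply leaves the old, fully consistent tree in place, which is exactly the integrity guarantee described above.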
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 27, 2006 12:01 UTC (Thu) by nix (subscriber, #2304) [Link]
Downside: a minor problem with fragmentation. :)))
(The extra disk overhead can be disregarded as long as you have decent amounts of cache, because disk writes are generally localized in the directory tree anyway. At least they are if atime updates are disabled. Using a filesystem like this with atime updates enabled strikes me as... perhaps unwise.)
XFS is not for the masses
Posted Apr 26, 2006 13:11 UTC (Wed) by hmh (subscriber, #3838) [Link]
XFS' data loss on powerloss scenarios is still true. It is caused by (as others said) the fact that data writes are not protected unless you call fsync(), and I am not really sure they are fully protected even then.
What people often forget is that XFS has delayed write allocation, which increases the window for such data loss even more as a trade-off for performance. Given that making sure that XFS isn't corrupted requires xfs_repair and is not easy to do on the / partition on most default installs, I would never recommend anyone who doesn't really know what he is doing to use it (and yes, XFS does get corrupted although it is a very rare event, and usually due to bad memory on the system board).
Ext3 will also have corruption/data loss issues on powerloss, depending on your disks and their caches, but the window is much smaller. At least it is very fsck-on-boot-friendly...
Maybe the new SATA drives (which seem to be better at honouring cache flush requests) and IO barriers will put an end to most ext3 corruption on the typical desktop, but I wouldn't hold my breath.
I won't comment on reiserfs, I have little experience with it.
XFS is not for the masses
Posted Apr 26, 2006 21:43 UTC (Wed) by nix (subscriber, #2304) [Link]
Ah well, with early userspace it's easy to fsck before mounting /, and when early userspace lands for everyone, hopefully there'll be a fsck in there, or klibc will be capable enough for e2fsprogs's fsck...
XFS is not for the masses
Posted Apr 26, 2006 21:54 UTC (Wed) by hmh (subscriber, #3838) [Link]
You can fsck / with ext2/ext3 just fine. It is XFS' utterly useless fsck semantics (do nothing) which are the trouble. You need xfs_repair in the early/repair userland to deal with that.
It'd be much better if XFS (and anyone else doing this) would just stop with the impress-the-management-with-lies politics and implement fsck as it is meant to be implemented, instead of as a NOP.
XFS is not for the masses
Posted Apr 27, 2006 6:23 UTC (Thu) by nix (subscriber, #2304) [Link]
Why not wrap up xfs_repair with a fsck wrapper? (I thought someone had done that, actually.)
XFS is not for the masses
Posted Apr 27, 2006 12:02 UTC (Thu) by nix (subscriber, #2304) [Link]
There is something unpleasant about fscking *any* live filesystem to me: the 'recommend reboot' return code from e2fsck is a symptom of this unpleasant special case. Well, now we can zap it :)
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 0:10 UTC (Wed) by quartz (guest, #37351) [Link]
Still, they've repeated the tests and used a machine you can't even buy anymore. My machine is over 3 years old and it's got an Athlon XP2000.
I know there's value in benchmarking with old hardware (in some countries it is the standard), but the trend is that the speed gap between CPU/memory and hard disk seek/transfer is increasing -- so using a faster CPU would have generated results that better indicate which FS will lead the race in a couple of years, or when you buy new hardware...
BTW, where's Reiser4???
[]s Gus
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 2:57 UTC (Wed) by gdt (subscriber, #6284) [Link]
What struck me is just how much below the theoretical performance of the drive the transfer rates are. That might be overly optimistic claims by drive manufacturers, or it might hint that there is still a considerable way to go in improving filesystem performance.
It would be very illuminating to also have the result of a "dd" of 700MB from the drive to /dev/null as well as times for reading the 700MB with a filesystem.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 3:33 UTC (Wed) by hathawsh (guest, #11289) [Link]
In my own informal tests, XFS and JFS can reach over 90% of the theoretical limit. Specifically:
dd if=/dev/zero bs=1048576 of=/dev/sda1 count=650 # 60 MB/s (theoretical limit)
dd if=/dev/zero bs=1048576 of=/media/sda1/bigfile count=650 # 55 MB/s (practical limit)
These are the rates I see regularly with 7200 RPM, 320 GB drives using an Athlon 64 3200+. ReiserFS and Ext3 turned out slower for this purpose, although I don't remember how much slower.
BTW, one crazy idea I've been toying with is using LVM2 as a filesystem for large files. Use each partition as a file. You could control the placement of every file, giving you 99% of the theoretical transfer rate with virtually no CPU usage. It might be great for digital video and PVRs like MythTV.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 14:59 UTC (Wed) by thompsot (guest, #12368) [Link]
"BTW, one crazy idea I've been toying with is using LVM2 as a
filesystem for large files. Use each partition as a file. You could
control the placement of every file, giving you 99% of the theoretical
transfer rate with virtually no CPU usage. It might be great for
digital video and PVRs like MythTV."
Have you tested this idea? Just curious...
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 2:07 UTC (Wed) by malefic (subscriber, #37306) [Link]
I, personally, never believe such articles. It's not a comparison or a test, it's merely a personal opinion hinted at by some 'not really scientific' measurements.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 11:36 UTC (Wed) by drag (guest, #31333) [Link]
Exactly..
And this particular article is worse than most. It's quite an arrogant suggestion that a person can, in a short article, provide more accurate results than years of benchmarking and study.
I mean that it's fine to make something like this pointing out your personal results... but I don't see a single aspect of this benchmark that provides any sort of real world results.
Even a big file copy isn't so hot... in a desktop environment you'd have a lot of little file accesses going on, and scheduling issues and such, all the time. How do various file systems respond to those situations, for example?
It's not that realistic.
It seems that benchmarks can only provide useful data on one specific situation. You aren't going to be able to determine what is the 'best' overall file system without messing around with many different types of hardware and system loads, mixing different benchmark tests. It would take weeks and cost much more money than I'd be willing to spend. And even then a kernel revision may change all that, if somebody figures out a fix for a bottleneck or a drive controller driver or something like that.
Personally I aim for most reliable. All the systems perform close enough that it doesn't matter much, performance-wise, what you choose. At least that's what it seems to me. Right now it seems that, at least on commodity hardware, ext3 wins in my book. Although XFS has some very nice aspects to it.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 12:39 UTC (Wed) by nix (subscriber, #2304) [Link]
Indeed. And then there are nasty worst-cases like BitTorrent clients (allocate vast nearly empty sparse file, then fill up blocks of it in random order all the while reading the ones you previously wrote)...
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 13:49 UTC (Wed) by jzbiciak (guest, #5246) [Link]
I am pretty sure current versions of the official BitTorrent client put down a large, empty file now instead of filling a sparse file, specifically to avoid the nasty fragmentation effects you mention. In fact, I think that's been true for a couple of years now.
Obviously, other clients may not do this. If they do, I'd say file a bug report. Sparse files are good for files that will tend to remain sparse. They make no sense for the kinds of files you'd tend to download w/ BT. After all, you need the disk space to store the file you're downloading, so you may as well allocate it up front.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 21:45 UTC (Wed) by nix (subscriber, #2304) [Link]
The official bittorrent client and Azureus both work sparsely. After all, unless you make an effort to fill a file with content, sparseness is pretty much the default...
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 23:36 UTC (Wed) by jzbiciak (guest, #5246) [Link]
Hmmm.... I must be mistaken. I remember seeing some change notes awhile back (around 3.4) saying that they somehow "fixed" the fragmentation issue. And it seemed (and maybe I was wrong) like it was preallocating the file.
Honestly, that's the sanest thing to do. It makes no sense to start downloading a huge torrent if you don't have the disk space. Maybe I'll submit a patch.... if I can be bothered to learn Python.
(I tried looking through the code for BitTorrent, and quite frankly, it could use some documentation.)
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 27, 2006 6:32 UTC (Thu) by nix (subscriber, #2304) [Link]
Hm, you're right, it has changed, I was out of date.
It looks like the official client preallocates in chunks, so that when it receives a block at offset N where N is greater than any previously received block offset in the file, it preallocates up to N but not after it.
Azureus (a much more popular client by all accounts, and where a lot of de facto protocol development seems to be going on) doesn't preallocate.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 28, 2006 23:38 UTC (Fri) by dirtyepic (guest, #30178) [Link]
No, Azureus has offered all three methods - sparse, preallocate, and incremental - as a user-configurable option for years now (since before 2003).
Non-journalled alternatives?
Posted Apr 26, 2006 13:12 UTC (Wed) by ikm (subscriber, #493) [Link]
Seems like the majority of today's filesystems are all journalled. Does anyone know of a non-journalled alternative to ext2? I would like to have a fast filesystem which stores high-precision ctimes/mtimes. Sadly, ext2 does not do the latter.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 13:56 UTC (Wed) by jzbiciak (guest, #5246) [Link]
Yet Another Worthless Benchmark Article. *yawn*
It seems that all the filesystems have performance that I'd consider acceptable. None of them demonstrated truly pathological behavior despite targeted microbenchmarks. Even the premise of "home/small business file server" seems shaky. For such a file server, isn't the Ethernet going to be a bottleneck before the disk is in many cases?
In the end, it seems like the filesystem you choose is going to be determined by some other factor than raw performance, unless your workload looks specifically like one of these microbenchmarks. (e.g. If you spend all day moving gigabyte files around, maybe XFS makes more sense than ext3 in data-journalled mode. But not all of us edit video all day.) In most cases, I'd imagine whatever your chosen distro supports best seems like the best choice by default.
About the only other reason I can see myself picking a different FS than the default for my chosen distro is to pick up some extra feature that's specific to that FS. But for day to day use? Eh.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 14:19 UTC (Wed) by apolinsky (subscriber, #19556) [Link]
The one concern I've always had about journalled file systems is the backup. I remember reading years ago the Unix Backup and Recovery book by Preston. In it, dump was claimed to be the most reliable tool for backups. Dump will work on ext2 and ext3 file systems. I don't believe it is available for the rest of them. There are all sorts of utilities available for backups, but their basis is generally tar or cpio. When you need to restore a file or file system, you want to be absolutely sure that the backup is reliable. Generally tar and cpio are fine, but over time I have come to agree with Preston. Dump seems to be best. That's why I've tended to stick with ext3.
As an aside, can anyone suggest a way of converting a reiser partition to ext3, short of a copy from one file system to another?
Alan
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 15:21 UTC (Wed) by Duncan (guest, #6647) [Link]
> [C]an anyone suggest a way of converting a reiser
> partition to ext3, short of a copy from one file
> system to another?
While it's theoretically possible, it's also theoretically extremely
difficult to do in a "safe" manner. Reiserfs' dynamic inode allocation
and tail writing makes a working implementation extremely difficult, as
an "in place" conversion implies there's known unallocated free space in
which to start writing the converted data, and reiserfs breaks enough
traditional rules in that regard that one would virtually have to be a
Namesys employee, familiar not only with the reiserfs code but with the
years of practical experience and knowledge of what /doesn't/ work that
they've gathered.
It's not something that just anyone could do and get it right, IOW.
Basically, the most practical way to get such a thing would be to contract
with Namesys to create such a converter. I'm sure they'd be very happy to
develop such a converter application, given a commercial contract to do
so.
Of course, that would cost real money, likely a non-trivial amount of it.
Practically speaking, it's easier, cheaper, faster, and more reliable, to
just buy another hard drive if necessary, and go the copy route.
Duncan
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 17:22 UTC (Wed) by remijnj (guest, #5838) [Link]
> Basically, the most practical way to get such a thing would be to
> contract with Namesys to create such a converter. I'm sure they'd be
> very happy to develop such a converter application, given a commercial
> contract to do so.
I'm not so sure they would be happy about this, because they would be creating a tool so people can move away from reiserfs to ext3. Wouldn't that freak them out? Then again, money can buy lots of things.
But I agree that buying a disk is easier/better.
But maybe there is a way to do this. If both filesystems have a resize tool which can make partitions bigger and smaller, you could do the following:
1. resize the reiser partition to just fit the data
2. create a new ext3 partition after it
3. copy enough files to fill the ext3 partition
4. resize the reiser partition (smaller)
5. resize the ext3 partition (bigger)
6. goto 3
This is probably not possible with the existing tools, because you would also have to move the ext3 partition forward on the disk to make this work. But I don't know the state of these tools nowadays.
filesystem backups
Posted Apr 26, 2006 17:46 UTC (Wed) by rfunk (subscriber, #4054) [Link]
XFS has xfsdump, unfortunately with different syntax than dump.
I've come to the conclusion that rsync snapshots to an offsite disk are the best form of backup these days, at least for my purposes.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 18:01 UTC (Wed) by jeffs (subscriber, #4024) [Link]
> As an aside, can anyone suggest a way of converting a reiser partition
> to ext3, short of a copy from one file system to another?
There is a convertfs utility that will do an in-place conversion between
file systems. I have used it in the past with no problems. YMMV
http://tzukanov.narod.ru/convertfs/
jeff
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 26, 2006 19:19 UTC (Wed) by apolinsky (subscriber, #19556) [Link]
Rsync is a fantastic utility. I use it for most of my normal copies from machine to machine. Nonetheless, when archiving to tape, which I still do, I have never had any problems with a tape produced with dump. I have had problems with data backed up with tar and cpio, though I will admit, not often.
Alan
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 27, 2006 13:22 UTC (Thu) by Seegras (guest, #20463) [Link]
My experiences with three of those:
- XFS: Nice, would be my favourite for speed, and for handling loads of medium to big files. HOWEVER, it's a no-no on ATA/SATA disks. You'll easily end up with a corrupted and irreparable filesystem if your hardware (including but not limited to power supply, disks, controller or RAM) fails. I had a very fun time sorting out the 200GB of usable files from the 300GB of files in lost+found. And yes, it corrupted even files it wasn't writing at the time of the crash.
- Reiserfs: It's been a long time since I used it. I liked it for its handling of a lot of small files. I didn't like it for padding every file to 4KB boundaries with NULs after it crashed. However, it didn't otherwise garble any files for me except those just written. I'd guess it's not too bad a choice.
- Ext3: I really think it's old-fashioned, slow, and not fun to work with. But it has supreme qualities in not losing data, and in disaster recovery, even if your hardware fails. Since my data is very dear to me, and speed does not matter so much, I use it now. Furthermore, the FS lies on a dmcrypt volume, so I really don't want any other layer which might possibly corrupt data; one is enough ;)
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted Apr 27, 2006 17:16 UTC (Thu) by hch (guest, #5625) [Link]
The problem is not ATA disks but an enabled disk write cache [1].
When the disk write cache is enabled you lose the cache content when power fails. That's a problem with any journaling filesystem because it can't guarantee integrity.
From reports this seems worse for XFS than for other filesystems. This might be somewhat of a false positive though, as XFS does the strictest filesystem consistency checks at runtime and barfs out at the smallest inconsistency, while the other filesystems do weaker checks and rely on fsck.
These days it shouldn't be a problem anymore, as the journaling filesystems support write barriers that make disk write caches safe to use.
[1] which kinda makes this an ATA problem again. SCSI disks, being for enterprise/serious use, disable the write cache by default; ATA disks OTOH traditionally enable it by default to get much better benchmark results.
Filesystems (ext3, reiser, xfs, jfs) comparison on Debian Etch (Debian Administration)
Posted May 1, 2006 17:50 UTC (Mon) by ddaa (guest, #5338) [Link]
The comments to this article are very interesting. In particular, the explanation why ext3 appears to be so much more reliable by default (ordered data mode). Here is my own personal experience with some filesystems, I used ext3, ReiserFS 3, XFS and back to ext3, always with the default settings.
I used to work with some large hardlink farms, when I was using the Arch version control system on a large tree (3000 files) with a large history (5000 revisions). Those who have used Arch know that to get decent performance when the tree size exceeds the disk cache, you need a working tree that is well hardlinked to the revision library. The bottom line is that you end up with hundreds of thousands of names for a few thousand files.
In such a setup, ext3 performed abysmally in terms of hardlink creation performance, stat performance, and disk space usage. So I eventually switched to ReiserFS. The space savings in the hardlink farm were impressive (something like 10x), day-to-day usage was noticeably faster, and "pruning the revlib" (i.e. removing some dozen thousand names from the farm) took seconds or minutes instead of hours.
I was working on an iBook; the critical thing with laptops is that the occasionally shaky hardware support makes it sometimes necessary to do hard reboots. And filesystems do not like hard reboots.
I worked happily with ReiserFS until the day the filesystem went south after a hard reboot and refused to mount, at a most inappropriate time (during a coding sprint at my boss' in August 2004). So I booted off the rescue partition and ran the reiserfs repair tool, first in the "try gently" mode, which failed relatively quickly, then in the "try really hard" mode, which ran for hours (something like 8 hours) and eventually failed, leaving me with a completely wasted filesystem. Restoring from backup was eventually required, and would have taken much less time if I had done it from the start.
After that episode, I tried XFS again and noticed that the compatibility problems with PPC I had experienced were apparently solved. Overall, performance with XFS was slightly slower than with ReiserFS, but I really appreciated the way it was paranoid: when something went wrong, it refused to use the filesystem anymore and required xfs_repair, which worked fast and well. The availability of xfsdump also made backing up a pleasure.
I also use my laptop (now a ThinkPad T42) as a workstation for extended periods of time with permanent AC power. In this situation I usually remove the battery so the warmth from the laptop does not age the battery prematurely (where I live, AC power is pretty much uninterrupted year in and year out). Unfortunately, the design of the power cord and connector makes it really easy to accidentally unplug the laptop when pulling it from its remote corner of the desk, producing occasional brutal power failures.
I experienced the hard way the pain of not having ordered data writes. Several times I needed to recover files from backup because a power failure caused XFS to overwrite files with zeros. I especially noticed the damage on gaim, evolution and firefox configuration files. Not critical, but annoying as hell. Once, I needed to reinstall a system library because the power outage occurred shortly after an apt upgrade.
After I was bitten a couple of times in two weeks, I'd had enough. Since my VCS is now Bazaar-NG, which does not place unreasonable requirements on filesystem performance, I switched my filesystems back to ext3. It clearly gets less out of the hardware on my day-to-day loads than either ReiserFS or XFS (I love monitoring the disk throughput with gkrellm), but since I do not have an insane hardlink farm anymore, the worst behaviours are avoided, and the performance loss is in the 10-20% range: acceptable.
And I can still enjoy fast backups with dump, without having files filled with zeros on power loss, thanks to ordered data mode.
To repeat: I used all the filesystems in their default configuration. My personal experience boils down to this: ReiserFS is the best performer, but when it blows, it sucks, and the lack of a dump utility makes backups a pain. XFS is slightly slower but significantly more reliable, though its performance is said to plummet when it lacks free space: that would be my filesystem of choice on a system that never needs hard reboots and has a reliable power supply. Ext3 is a no-go if you have FS performance requirements that strongly diverge from the common case (lots of huge files, crazy hardlink farms, more than 32k subdirectories, etc.), but it's otherwise an acceptable all-around performer, and ordered data mode makes it heaps more reliable for use on a laptop and with USB drives. I have not tried JFS.
The usual caveat applies: it's just anecdotal evidence.
