LWN.net

Benchmarking Filesystems (LinuxGazette)

LinuxGazette compares journaling filesystems. "I recently purchased a Western Digital 250GB/8M/7200RPM drive and wondered which journaling file system I should use. I currently use ext2 on my other, smaller hard drives. Upon reboot or unclean shutdown, e2fsck takes a while on drives only 40 and 60 gigabytes. Therefore I knew using a journaling file system would be my best bet. The question is: which is the best? In order to determine this I used common operations that Linux users may perform on a regular basis instead of using benchmark tools such as Bonnie or Iozone. I wanted a "real life" benchmark analysis."

Benchmarking Filesystems (LinuxGazette)

Posted May 11, 2004 14:33 UTC (Tue) by huffd (guest, #10382)

For my own purposes I use the file system that is secure, has always brought back my data after a crash and is also sponsored by the Defense Advanced Research Projects Agency (DARPA).

Benchmarking Filesystems (LinuxGazette)

Posted May 11, 2004 17:10 UTC (Tue) by trutkin (guest, #3919)

And which one would that be?

Benchmarking Filesystems (LinuxGazette)

Posted May 12, 2004 5:17 UTC (Wed) by crankysysadmin (guest, #19449)

Apparently DARPA is a sponsor of Reiser4.

Benchmarking Filesystems (LinuxGazette)

Posted May 11, 2004 14:48 UTC (Tue) by marduk (subscriber, #3831)

It would have been nice if they could make the charts a little easier to read.

Benchmarking Filesystems (LinuxGazette)

Posted May 11, 2004 14:56 UTC (Tue) by parimi (subscriber, #5773)

Looks like they used MS Excel to prepare the charts.

Charts

Posted May 11, 2004 15:15 UTC (Tue) by rfunk (subscriber, #4054)

It would've been nice if the author had:
- Used PNG or GIF files instead of JPEG, since JPEG fuzzes up text; JPEG
is for photos, not charts
- Been consistent about the style (2D vs 3D) and orientation (vertical vs
horizontal) of the graphs

Graph styles

Posted May 11, 2004 15:21 UTC (Tue) by Ross (subscriber, #4065)

Using a line graph where the points on the X axis have no relation is
bad style because it is confusing. They should have been point or bar
graphs. The JPEG artifacts, combined with the tiny text, make them very
hard to read as well.

Benchmarking Filesystems (LinuxGazette)

Posted May 11, 2004 14:58 UTC (Tue) by maney (subscriber, #12630)

Here's the article with readable graphs, as mentioned in the original Slashdot posting. Personally, I wish he'd thrown in some of those traditional benchmarks, but having done some much more limited filesystem benchmarking myself I cannot but admire the determination it must have taken to complete these tests.

Benchmarking Filesystems (LinuxGazette)

Posted May 13, 2004 9:36 UTC (Thu) by amazingblair (subscriber, #2789)

Rats! I suffered thru the entire article before I got to your comment, with its link to beautifully clear versions of the graphs. :-) Thanks. I guess.
-Amazing Blair

What about backups?

Posted May 11, 2004 15:07 UTC (Tue) by bluecobra (guest, #195)

I tried ReiserFS when Red Hat 9 came out. I was very excited since it was so easily configured.

I rely heavily on dump for my backups, so I was disappointed to discover that there was no compatible dump utility available, only the standard tar, cpio, dd, etc.

Please don't use dump, unless you unmount first

Posted May 11, 2004 18:48 UTC (Tue) by jvotaw (subscriber, #3678)

Unless you unmount your filesystem before using dump, you may get a corrupt backup. Or has this been fixed with recent versions of the kernel?

Personally, I use rsync to create an offline snapshot* of data in another directory (or on another machine), and then tar that to tape. Tar has its disadvantages, but at least its archives aren't FS-specific. So far this seems to work well, with about 500 GB of other people's data, ~6 million files, monthly full backups and nightly differentials. It's not pretty, but it works and is reliable.

-Joel


*Yes, this isn't a true snapshot, but it seems to be good enough for our needs. Yes, I need to revisit LVM -- last time I used it, if you took more snapshots than you had free space in the VG, the kernel panicked. Admittedly this was a while back.
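The snapshot-then-tar scheme with nightly differentials can be sketched in Python as below. This is a hypothetical illustration, not Joel's actual setup: `files_changed_since` and `write_differential` are invented names, the snapshot directory is assumed to have already been refreshed by rsync, and "changed" is approximated as "modification time newer than the last full backup". Deleted files are not recorded, which a real scheme would have to handle separately.

```python
import os
import tarfile


def files_changed_since(root, since):
    """Yield paths of files under root whose mtime is newer than `since`.

    `since` is a Unix timestamp, e.g. the time of the last full backup.
    lstat() is used so symlinks are judged by their own mtime, not their target's.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_mtime > since:
                yield path


def write_differential(snapshot_dir, last_full_time, archive_path):
    """Write a tar archive of files changed since the last full backup.

    Member names are stored relative to snapshot_dir, so the archive does not
    depend on where the snapshot lives. archive_path must be outside
    snapshot_dir. Returns the number of files archived.
    """
    count = 0
    with tarfile.open(archive_path, "w") as tar:
        for path in files_changed_since(snapshot_dir, last_full_time):
            tar.add(path, arcname=os.path.relpath(path, snapshot_dir))
            count += 1
    return count
```

GNU tar can make a similar mtime-based selection on its own with `--newer-mtime`, so in practice a shell one-liner may be all that is needed; the sketch just makes the selection rule explicit.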

Performance vs reliability

Posted May 11, 2004 15:19 UTC (Tue) by rfunk (subscriber, #4054)

I find it strange that this article doesn't seem to consider reliability
at all; performance is considered the ultimate goal. While all these
filesystems are quite reliable, some are more proven than others, and
different ones have different failure modes.

And how about the time it takes to check the filesystem? Just because a
filesystem is journaled doesn't mean the magnetic media never gets
corrupted.

Performance vs reliability

Posted May 11, 2004 16:25 UTC (Tue) by j-harris (subscriber, #18472)

From experience I find ReiserFS damn quick to recover from a dirty
shutdown, a little faster than Ext3 from what I've seen on my modest size
file systems.

I've never suffered any corruption with either ReiserFS or Ext3 (or even
Ext2 for that matter), but I might just be lucky.

cheers

Jamie

Performance vs reliability

Posted May 11, 2004 16:36 UTC (Tue) by rjamestaylor (guest, #339)

EXT3, IIRC, journals the data, whereas the others, ISTR, journal only the inodes or metadata, something less than the full data. If true, that would go some way towards explaining the speed differences. Comments?

Performance vs reliability

Posted May 11, 2004 16:43 UTC (Tue) by j-harris (subscriber, #18472)

The journals make recovery far quicker, but they add an overhead to the
file system as the journal needs to be updated whenever the FS changes.
This should mean that journaling file systems are slow; however, their
more efficient data structures seem to more than compensate for this.

The benchmarks show that you really do need to pick your FS to match what
you are going to be doing. People storing big files (divx collections
and the like) would be wise to take a look at XFS/JFS by the looks of it.

cheers

Jamie

data=journal

Posted May 11, 2004 18:59 UTC (Tue) by jvotaw (subscriber, #3678)


Ext3 has three ways of handling data, which you can choose at mount time. If you want data to be fully journaled, mount with data=journal.

From the mount(8) manpage:

data=journal / data=ordered / data=writeback

Specifies the journalling mode for file data. Metadata is always journaled.

journal
All data is committed into the journal prior to being written into the main file system.

ordered
This is the default mode. All data is forced directly out to the main file system prior to its metadata being committed to the journal.

writeback
Data ordering is not preserved - data may be written into the main file system after its metadata has been committed to the journal. This is rumoured to be the highest-throughput option. It guarantees internal file system integrity, however it can allow old data to appear in files after a crash and journal recovery.
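The difference between the three modes is mostly about ordering. The toy Python model below illustrates that; it is not ext3 code. Each mode is reduced to one simplified sequence of I/O steps for a single file write, and the hypothetical helper `exposes_stale_data` asks whether a crash after a given number of steps could leave committed metadata pointing at blocks whose new contents never reached the disk. For writeback, the sequence shown is only one permitted ordering: the worst case, which produces the "old data after a crash" situation the manpage warns about.

```python
from enum import Enum


class DataMode(Enum):
    JOURNAL = "journal"
    ORDERED = "ordered"
    WRITEBACK = "writeback"


def write_sequence(mode):
    """Return a simplified ordering of I/O steps for one file write (toy model)."""
    if mode is DataMode.JOURNAL:
        # Data and metadata both go into the journal; after the commit they are
        # checkpointed to their home locations in the main file system.
        return ["data->journal", "metadata->journal", "commit",
                "data->main", "metadata->main"]
    if mode is DataMode.ORDERED:
        # Data is forced to the main file system before the metadata commit.
        return ["data->main", "metadata->journal", "commit", "metadata->main"]
    # WRITEBACK: data and metadata are unordered; this is the worst case,
    # with the data written only after the metadata has been committed.
    return ["metadata->journal", "commit", "data->main", "metadata->main"]


def exposes_stale_data(mode, steps_completed):
    """True if a crash after `steps_completed` steps leaves committed metadata
    referring to blocks whose new data is not on disk (old contents show up)."""
    done = write_sequence(mode)[:steps_completed]
    committed = "commit" in done
    # In journal mode, committed data is replayed from the journal on recovery.
    data_durable = "data->main" in done or (mode is DataMode.JOURNAL and committed)
    return committed and not data_durable
```

In this model `exposes_stale_data(DataMode.WRITEBACK, 2)` is True, while no crash point in the ordered or journal sequences gives True, which matches the trade-off the manpage describes.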

----------------------------------------------

As far as I know, the ext3 data structures aren't more efficient than ext2 data structures, so ext3 is (or at least was, last I checked) a bit slower than ext2 because of the journalling. This can be helped by putting the journal on a separate, fast device, like a fast disk or a battery-backed RAM card.

Also, mounting with noatime will help on most filesystems.
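To check which of these options a mounted filesystem is actually using, one can read /proc/mounts, whose lines use the same six fields as /etc/fstab (device, mount point, type, options, dump, pass). The helpers below are a hypothetical sketch; whether the kernel lists an explicit data= entry for the default mode may vary, so a missing entry is reported as None rather than assumed to be ordered.

```python
def parse_mount_line(line):
    """Split one /proc/mounts line into (mount_point, fstype, set of options).

    Note: the kernel escapes spaces in paths as \\040; this sketch does not
    unescape them.
    """
    _device, mount_point, fstype, options, _dump, _pass = line.split()
    return mount_point, fstype, set(options.split(","))


def data_mode(options):
    """Return the ext3 data= mode listed in the options, or None if absent."""
    for opt in options:
        if opt.startswith("data="):
            return opt.split("=", 1)[1]
    return None


def check_mounts(path="/proc/mounts"):
    """Yield (mount_point, fstype, uses_noatime, data_mode) for each mount."""
    with open(path) as f:
        for line in f:
            mount_point, fstype, options = parse_mount_line(line)
            yield mount_point, fstype, "noatime" in options, data_mode(options)
```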

Summary, a sort of.

Posted May 13, 2004 11:50 UTC (Thu) by gvy (guest, #11981)

No experience with JFS, sorry. Otherwise, here is a rough roundup of real-world Linux filesystem reliability, based primarily on my own experience and checked against reports from more seasoned people (i.e. I'm not silencing any "other news").

ext2: bad behaviour with large directories (linear searches); highest overall data reliability, taking into account the number of people and tools available to dig into a broken fs. Unacceptable fsck times on anything more than a few gigs, though.

Has patches for compression (I doubt that's still relevant given today's media-size-to-reliability ratios) and ACLs (sad they're not mainline).

ext3: the same poor large-directory behaviour (I was surprised by the test; an ext3 developer I know says a patch improving on the linear directory structure helps but isn't mainline). The default data=ordered mode never corrupted our data without good cause. Recovery time is quite OK. It behaves very badly under mixed load (say, several reader streams and a steady writer); the same developer admits this is due to the lack of delayed block allocation.

reiser3: used it for a long time, and still do in places (/var/cache/squid and the like, since it blazes at directory operations and is good at linear reading). It tends to get irrelevant junk in files that were open before a crash; I don't remember whether merely having a file open for reading is enough to put it at risk. That's cured by a SuSE patch, which is strangely not endorsed by Hans (as "fixing an old version" or something like that). Nevertheless...

reiser4: ...is strongly NOT recommended for production use by a friend of mine employed at Namesys; at least that was the case half a year ago. I doubt such things mature in so short a time, so the DARPA argument above is worth nothing to me.

xfs: is what I'm primarily running these days. It has a crash-time file corruption problem similar to reiser3's, which manifests as zeros appearing in [part of] the content; there were some reports of it happening to files that were only opened for reading.

Nevertheless, for me xfs is the most balanced of these filesystems: the ACLs I occasionally need don't require patching the kernel or otherwise tweaking production systems; large-directory reads are good enough for me, and large-file reads are even better; and, most importantly, xfs handles multi-threaded reading and writing with ease: the system just does it, and slows down only somewhat rather than to a crawl.

In fact, that was the main reason our free software mirror here in Ukraine (sorry, reachable only via the local IX for now) migrated from a mix of ext3 and xfs to pure xfs. Downloads are diverse, from source files to ISO images and over anything from dialup to 100Mbps links, and "up"loads (really pulls) can consume 5 or so megabytes per second of a given device's bandwidth. Once, with ext3, I saw total I/O on a 120GB Seagate drop below 1Mb/s with the load average heading to 6 (so ~6 processes were waiting for their I/O); reproducing the situation on the same drive and the same controller (different ribbon cable, naturally) but with xfs, it went on at full speed as if nothing had happened.

Another big reason was a recommendation from a friend with extensive experience in the NAS area; NAS systems are indeed very sensitive to the filesystem, for both performance and reliability.

Please note it's, errr... not an opinion, but still only a personal PoV. You don't have to believe me or try to change my mind; it's just fine that we have diverse tools for diverse tasks.

No journaling FS will spare you the need for a UPS, and no amount of other people's experience will give you a backup.

Good luck using all of these, and may your data never be lost.

Re: Summary, a sort of.

Posted May 13, 2004 12:35 UTC (Thu) by Ross (subscriber, #4065)

I thought ext2 and ext3 were updated with on-disk hash trees for
directories? Or was that patch never integrated?

Re: Summary, a sort of.

Posted May 14, 2004 9:25 UTC (Fri) by gvy (guest, #11981)

The conversation I mentioned was a few months ago; I'll try asking the developer again.

Re: Summary, a sort of.

Posted May 21, 2004 15:00 UTC (Fri) by nobrowser (guest, #21196)

That's the htree patch. It is in 2.6, but the 2.4 chief maintainer recently
decided that it will never be official there.
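Why a directory index matters for large directories can be shown with a toy model in Python. This is not the on-disk htree format (htree is a shallow tree keyed on a hash of the file name, stored inside ordinary directory blocks); a flat Python hash table stands in for it here, and the function names are hypothetical. The only point is the number of entries compared: a linear scan may touch every entry, while a hashed lookup touches only one bucket.

```python
def linear_lookup(entries, name):
    """Scan entries in order, as an unindexed ext2/ext3 directory does.

    Returns (found, number_of_entries_compared).
    """
    for compared, entry in enumerate(entries, start=1):
        if entry == name:
            return True, compared
    return False, len(entries)


def build_hashed_index(entries, buckets=1024):
    """Group entries into buckets by a hash of the name (a stand-in for htree)."""
    index = [[] for _ in range(buckets)]
    for entry in entries:
        index[hash(entry) % buckets].append(entry)
    return index


def hashed_lookup(index, name):
    """Hash the name to pick one bucket, then scan only that bucket."""
    bucket = index[hash(name) % len(index)]
    return linear_lookup(bucket, name)
```

With 100,000 names, finding the last one by linear scan takes 100,000 comparisons, while the hashed lookup compares on the order of a hundred entries (one bucket's worth, on average).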

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds