Reiser4 is coming

The final part of the 2.3 development series featured a strong campaign to get the ReiserFS filesystem merged. That campaign was successful; ReiserFS was added in 2.4.1. Now it appears that history may repeat itself with the 2.6 kernel. Hans Reiser has posted a note asking that the soon-to-be-posted Reiser4 patch be merged into 2.6.0-test.

Reiser4 is not an updated version of ReiserFS; it is an entirely new filesystem. According to the posted benchmarks, Reiser4 outperforms ReiserFS and ext3 on several fronts. According to Hans, the performance of Reiser4 is now good enough to justify including it in 2.6-test.

The truly interesting part of Reiser4 is not limited to performance, however. Reiser4 is presented as a fully atomic filesystem - every operation either executes fully or not at all. It thus offers the same sort of crash resistance found in journaling filesystems, but with a couple of differences. One is that, it is claimed, the "wandering log" technique used in Reiser4 offers greater speed, since, unlike with other journaling schemes, it is not necessary to write data twice. The other is that the "fully atomic" nature of the filesystem can extend beyond individual operations. Reiser4, in other words, can provide actual transactions.

A typical journaling filesystem works by writing all of the blocks to be changed in a given operation to a special journal file, followed by a "commit record." Once the operation is committed, the blocks can be copied from the journal to their real destination on the disk. If the system dies before the commit record is written, the operation is simply discarded and the filesystem is unchanged. If, instead, a fully committed operation is found in the journal, it can be replayed. With a scheme like this, an operation may be lost in a crash, but the filesystem itself will not be corrupted.
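
The journaling scheme described above can be sketched as a toy in-memory model. This is only an illustration of the commit-record idea: the class `JournaledDisk`, its methods, and the record layout are made up for this example and do not correspond to the on-disk format of ext3, ReiserFS, or any other real filesystem.

```python
# Toy model of write-ahead journaling. All names here are hypothetical.

class JournaledDisk:
    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks   # the "real" block locations
        self.journal = []               # append-only journal records

    def log_operation(self, changes, crash_before_commit=False):
        """Write every changed block to the journal, then a commit record."""
        for blockno, data in changes.items():
            self.journal.append(("block", blockno, data))
        if crash_before_commit:
            return                      # simulated crash: no commit record
        self.journal.append(("commit",))

    def replay(self):
        """Recovery: apply committed operations, drop an uncommitted tail."""
        pending = []
        for record in self.journal:
            if record[0] == "block":
                pending.append(record[1:])
            else:                       # commit record found: replay the operation
                for blockno, data in pending:
                    self.blocks[blockno] = data
                pending = []
        self.journal = []               # whatever is left in `pending` is discarded


disk = JournaledDisk(4)
disk.log_operation({0: b"inode", 1: b"data"})                       # fully committed
disk.log_operation({2: b"half-written"}, crash_before_commit=True)  # lost in the crash
disk.replay()
print(disk.blocks)
```

After the replay, the committed operation has reached its real blocks and the uncommitted one has left no trace. Note that each committed block is written twice, once to the journal and once to its final location, which is the cost the wandering log is meant to avoid.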

The Reiser4 wandering log technique works a little differently. It does not overwrite blocks in the filesystem; instead, blocks to be changed are relocated and the data is written in the new spot. The block pointers in the filesystem are changed in an (also relocated) directory block. This process continues up the filesystem tree until, with a single write pointing to the new root block, the whole operation is committed. The elimination of the need to write data separately to a journal file can increase performance, but this technique also has the potential to fragment files across the disk, hurting read performance. For that reason, Reiser4 allows for plugin modules which can look at operations and opt for a more normal journaling scheme when it makes sense. There will also be a "repacker" program which will go through occasionally and rearrange disks for better read performance.
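
The relocate-and-repoint process can likewise be sketched in a few lines. This is a minimal model that assumes a tree of dictionary "directory blocks" and a single root pointer; `WanderingStore` and its methods are hypothetical names, and nothing here reflects Reiser4's actual data structures or its plugin and repacker mechanisms.

```python
# Toy model of relocating blocks up to the root. All names are hypothetical.

class WanderingStore:
    def __init__(self):
        self.blocks = {}      # block number -> contents; existing blocks are never overwritten
        self.next_free = 0
        self.root = None      # the one pointer whose update commits an operation

    def alloc(self, contents):
        blockno = self.next_free
        self.next_free += 1
        self.blocks[blockno] = contents
        return blockno

    def lookup(self, path):
        blockno = self.root
        for name in path:
            blockno = self.blocks[blockno][name]
        return self.blocks[blockno]

    def prepare_write(self, path, data):
        """Relocate the data block and every directory block above it.
        Returns the new root block number; nothing is committed yet."""
        dir_blocks = [self.root]
        for name in path[:-1]:
            dir_blocks.append(self.blocks[dir_blocks[-1]][name])
        child = self.alloc(data)                    # data goes to a new spot
        for dir_block, name in zip(reversed(dir_blocks), reversed(path)):
            updated = dict(self.blocks[dir_block])  # copy the directory block
            updated[name] = child                   # point it at the relocated child
            child = self.alloc(updated)             # the copy is relocated too
        return child

    def commit(self, new_root):
        self.root = new_root                        # the single committing write


store = WanderingStore()
motd = store.alloc(b"old")
etc = store.alloc({"motd": motd})
store.root = store.alloc({"etc": etc})

new_root = store.prepare_write(["etc", "motd"], b"new")
before = store.lookup(["etc", "motd"])   # still the old contents
store.commit(new_root)
after = store.lookup(["etc", "motd"])    # new contents, visible all at once
print(before, after)
```

A crash before `commit()` leaves the old root, and therefore the old tree, fully intact. The sketch also shows where the fragmentation concern comes from: every update lands wherever the next free block happens to be.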

The ability to perform multi-operation, multi-file transactions is what will make Reiser4 truly unique, however. A transactional capability will allow applications to perform complicated operations without the need to resort to tricks with fsync() and file renaming, and without the need to use a separate database manager. Of course, there are a few residual issues, like the fact that the standard Unix system calls make no provision for starting, committing, and rolling back transactions. So a new system call interface will be required. The Reiser4 developers are working on this interface, but have not yet posted it for wide review.
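
For reference, the "tricks with fsync() and file renaming" usually look something like the following sketch, written against Python's standard `os` and `tempfile` modules on a POSIX system. The helper name `atomic_replace` is an assumption made for this example.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the contents of `path` so that a reader, or a crash,
    sees either the old file or the new one, never a mixture."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # temp file on the same filesystem
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())                  # push the data to disk first
        os.replace(tmp_path, path)                  # rename over the target
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    dir_fd = os.open(directory, os.O_RDONLY)        # make the rename itself durable
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "config")
    atomic_replace(target, b"version 1\n")
    atomic_replace(target, b"version 2\n")
    with open(target, "rb") as f:
        result = f.read()
print(result)
```

This pattern protects one file at a time. It offers no way to make changes to several files appear together or not at all, which is the gap a transactional interface would fill.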

Linus has not committed himself with regard to merging Reiser4 into 2.6. It's worth noting that, when ReiserFS was merged, it had been stable and widely used for some time. That is not the case for Reiser4, which is still in an early stage. Chances are that Reiser4 will have a harder time getting into the kernel than ReiserFS did. (For more information on Reiser4, see this document on transactions, and this one on wandering logs, dancing trees, and other journaling topics).


Reiser4 is coming

Posted Jul 31, 2003 5:05 UTC (Thu) by faramir (subscriber, #2327)

I see no obvious reason why Reiser4 shouldn't be included in
the 2.6 kernel. Assuming (a big assumption) that it doesn't
require changes in the core code to the kernel, how is adding
Reiser4 any different than a device driver for a new network
card?

Reiser4 is coming

Posted Jul 31, 2003 15:11 UTC (Thu) by remijnj (subscriber, #5838)

As the article says, it hasn't had much (public) testing yet, so it's not proven stable.

It _is_ fundamentally different from just including a driver. People could use this and lose their files. That is not acceptable in a stable kernel.

Just patch the kernel and test (with non-critical files). If it proves stable over time it will get included anyway. Especially if some distros include it first.

Reiser4 is coming

Posted Aug 7, 2003 19:16 UTC (Thu) by huaz (guest, #10168)

Why? NTFS write support has been in the stable kernel for a long time, marked as DANGEROUS.

Reiser4 is coming

Posted Jul 31, 2003 17:02 UTC (Thu) by rwmj (subscriber, #5474)

Mr. Reiser needs to concentrate a little bit more on making the existing filesystem stable under load, and giving it a decent fsck tool.

Last time I compared ext3 vs reiserfs on a large RAID array which was about to go into production, reiserfs corrupted itself and proved unrecoverable. Needless to say ext3 worked fine and has been deployed on a very heavily used mail system (1M user webmail) for 18 months now without any error.

Rich.

Reiser4 is coming

Posted Jul 31, 2003 17:23 UTC (Thu) by cpeterso (guest, #305)

Reiser4's walking of the directory block tree sounds a lot like FreeBSD's Soft Updates: FS changes are written to disk in an order that prevents crashes from corrupting user data or FS meta-data on disk.

When Reiser4 is creating new copies of directory blocks, I wonder how it handles multiple, simultaneous changes to the same directory blocks?

Reiser4 is coming

Posted Jul 31, 2003 23:04 UTC (Thu) by acorliss (guest, #3710)

SGI didn't whine like Hans did when XFS missed the feature-freeze -- and that's with a *much* more mature and tested filesystem than Reiser. Make him wait. If he can't operate within the same cooperative framework that other developers do, that's his problem.

I also find it interesting that the latest Reiser hasn't been benchmarked against XFS. Looks like they only want to deal with favourable comparisons.

Reiser4 is coming

Posted Aug 1, 2003 13:07 UTC (Fri) by Peter (guest, #1127)

"I also find it interesting that the latest Reiser hasn't been benchmarked against XFS. Looks like they only want to deal with favourable comparisons."

I get the distinct feeling Hans Reiser has a personal vendetta against ext3 and Red Hat Software. I remember his outrage when some release of Red Hat Linux (7.0?) shipped with ext3 but not reiserfs support. He seemed quite sure that this was purely a political move to promote a technology developed by Red Hat employees, and he seemed to take it quite personally. (According to Alan Cox of Red Hat, the decision was actually based on the simple fact that ext3 survived the stress tests the RH kernel team threw at it, and reiserfs didn't.)

Reiser4 is coming

Posted Aug 1, 2003 16:23 UTC (Fri) by cpeterso (guest, #305)

SGI doesn't make money if Linux ships with XFS or not. ReiserFS is Hans Reiser's personal business. He doesn't directly make money shipping ReiserFS with Linux, but it would help increase his user base (and possible future paying customers).

Tux2 anyone?

Posted Aug 1, 2003 20:37 UTC (Fri) by mp (subscriber, #5615)

From this short description the wandering log seems quite similar to what
Daniel Phillips described as a tree-structured filesystem.
Is this similarity just superficial or something deeper?

reiserfs

Posted Aug 3, 2003 22:43 UTC (Sun) by yem (guest, #1138)

I keep hearing about a problem with reiserfs that results in data being overwritten (corruption) completely silently. Perhaps something to do with a hash collision. This seems to come up in every reiserfs discussion, but I haven't seen an explanation or response from Hans.

Can anyone confirm this problem exists or point to Namesys's official position on the matter?

I haven't hit the problem (that I know of - it is a silent failure) but it keeps coming up so some official info would be much appreciated.

reiserfs

Posted Aug 7, 2003 11:37 UTC (Thu) by joib (guest, #8541)

I read somewhere that reiserfs does not, in fact, handle hash collisions at all. Thus the silent overwriting. Supposedly this is because of speed; hash collision detection and handling certainly reduce performance. I guess they figured that their hash algorithm is so good that these collisions occur extremely seldom, and that avoiding them is not worth the performance loss. And perhaps they were right: I have never heard of a problem that was actually caused by a hash collision, so perhaps it is only a theoretical problem?

reiserfs

Posted Sep 11, 2005 20:54 UTC (Sun) by gst (guest, #21487)

theoretical yes... as long as you can't intentionally create hash collisions on their r5 hashing algorithm. since md5 and sha1 don't seem to be so secure anymore i wonder how hard it would be to create collisions on r5.

Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds