Reiser4 is coming
[Posted July 30, 2003 by corbet]
The final part of the 2.3 development series featured a strong campaign to
get the ReiserFS filesystem merged. That campaign was successful; ReiserFS
was added in 2.4.1. Now it appears that history may repeat itself with the
2.6 kernel. Hans Reiser has
posted a note
asking that the soon-to-be-posted Reiser4 patch be merged into 2.6.0-test.
Reiser4 is not an updated version of ReiserFS; it is an entirely new
filesystem. According to the posted
benchmarks, Reiser4 outperforms ReiserFS and ext3 on several fronts.
According to Hans, the performance of Reiser4 is now good enough to justify
including it in 2.6-test.
The truly interesting part of Reiser4 is not limited to performance,
however. Reiser4
is presented as a fully atomic filesystem - every operation either executes
fully or not at all. It thus offers the same sort of crash resistance
found in journaling filesystems, but with a couple of differences. One is
that, it is claimed, the "wandering log" technique used in Reiser4 offers
greater speed, since, unlike with other journaling schemes, it is not
necessary to write data twice. And the other is that the "fully atomic"
nature of the filesystem can extend beyond individual operations. Reiser4,
in other words, can provide actual transactions.
A typical journaling filesystem works by writing all of the blocks to be
changed in a given operation to a special journal file, followed by a
"commit record." Once the operation is committed, the blocks can be copied
from the journal to their real destination on the disk. If the system dies
before the commit record is written, the operation is simply discarded and
the filesystem is unchanged. If, instead, a fully committed operation is
found in the journal, it can be replayed. With a scheme like this, an
operation may be lost in a crash, but the filesystem itself will not be
corrupted.
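The journaling scheme described above can be illustrated with a small in-memory model. This is only a sketch of the general idea, not code from any real filesystem: the `ToyJournaledDisk` class, its method names, and the `crash_before_commit` flag are all invented for illustration, and the "disk" is just a Python list.

```python
class ToyJournaledDisk:
    """In-memory model of a block device with a write-ahead journal."""

    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks   # the "real" block locations
        self.journal = []               # journal records, in write order

    def log_operation(self, changes, crash_before_commit=False):
        # Step 1: write every changed block into the journal first.
        for blockno, data in changes.items():
            self.journal.append(("block", blockno, data))
        if crash_before_commit:
            return  # simulate a crash before the commit record is written
        # Step 2: the commit record marks the operation as complete.
        self.journal.append(("commit",))

    def recover(self):
        """Replay committed operations; discard an uncommitted tail."""
        pending = []
        for record in self.journal:
            if record[0] == "block":
                pending.append(record)
            else:
                # Commit record: copy the logged blocks to their real home.
                for _, blockno, data in pending:
                    self.blocks[blockno] = data
                pending = []
        # Whatever is left in `pending` never got a commit record: drop it.
        self.journal = []


disk = ToyJournaledDisk(4)
disk.log_operation({0: b"A", 1: b"B"})                   # fully committed
disk.log_operation({2: b"C"}, crash_before_commit=True)  # lost in the crash
disk.recover()
```

After `recover()`, blocks 0 and 1 hold the committed data and block 2 is untouched. Note that each committed block was written twice in this model, once to the journal and once to its final location.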
The Reiser4 wandering log technique works a little differently. It does
not overwrite blocks in the filesystem; instead, blocks to be changed are
relocated and the data is written in the new spot. The block pointers in
the filesystem are changed in an (also relocated) directory block. This
process continues up the filesystem tree until, with a single write
pointing to the new root block, the whole operation is committed. The
elimination of the need to write data separately to a journal file can
increase performance, but this technique also has the potential to fragment
files across the disk, hurting read performance. For that reason, Reiser4
allows for plugin modules which can look at operations and opt for a more
normal journaling scheme when it makes sense. There will also be a
"repacker" program which will run occasionally and rearrange blocks on
disk for better read performance.
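The relocate-and-repoint idea can be sketched in the same toy style. The model below is an assumption-laden simplification, not Reiser4's actual on-disk format: `ToyWanderingDisk` and its methods are hypothetical names, the tree has only two levels (one directory block pointing at data blocks), and old blocks are never reclaimed.

```python
class ToyWanderingDisk:
    """In-memory model of copy-on-write updates committed by a root swap."""

    def __init__(self):
        self.blocks = {}              # address -> block contents
        self.next_free = 0
        self.root = self._alloc({})   # directory block: name -> data address

    def _alloc(self, contents):
        # Always write to a fresh address; existing blocks are never touched.
        addr = self.next_free
        self.next_free += 1
        self.blocks[addr] = contents
        return addr

    def read(self, name):
        directory = self.blocks[self.root]
        return self.blocks[directory[name]]

    def update(self, name, data, crash_before_commit=False):
        # Write the new data into a relocated block.
        data_addr = self._alloc(data)
        # Write a relocated copy of the directory block pointing at it.
        new_dir = dict(self.blocks[self.root])
        new_dir[name] = data_addr
        dir_addr = self._alloc(new_dir)
        if crash_before_commit:
            return  # the old root still describes a consistent tree
        # A single pointer write commits the whole operation.
        self.root = dir_addr


disk = ToyWanderingDisk()
disk.update("a", b"v1")
disk.update("a", b"v2", crash_before_commit=True)
```

Here `disk.read("a")` still returns the first version after the simulated crash, because the root pointer was never switched. The data was written only once, but successive versions land at ever-changing addresses, which is the fragmentation risk mentioned above.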
The ability to perform multi-operation, multi-file transactions is what
will make Reiser4 truly unique, however. A transactional capability will
allow applications to perform complicated operations without the need to
resort to tricks with fsync() and file renaming, and without the
need to use a separate database manager. Of course, there are a few
residual issues, like the fact that the standard Unix system calls make no
provision for starting, committing, and rolling back transactions. So a
new system call interface will be required. The Reiser4 developers are
working on this interface, but have not yet posted it for wide review.
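For reference, the fsync() and rename trick that applications use today looks roughly like the following. This is a minimal sketch assuming a POSIX system; the `atomic_replace` helper is a made-up name, and the approach only protects a single file. It gives no way to update several files as one unit, which is the gap a transaction interface would fill.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the contents of `path` so a crash leaves old or new, not a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the rename
    # below stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())    # force the new data to disk first
        os.replace(tmp_path, path)    # atomic rename over the old file
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself survives a crash (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would use it as `atomic_replace("settings.conf", b"key=value\n")`, where the file name is just an example.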
Linus has not committed himself with regard to merging Reiser4 into 2.6.
It's worth noting that, when ReiserFS was merged, it had been stable and
widely used for some time. That is not the case for Reiser4, which is still in
an early stage. Chances are that Reiser4 will have a harder time
getting into the kernel than ReiserFS did. (For more information on
Reiser4, see this document on
transactions, and this one on
wandering logs, dancing trees, and other journaling topics).