JLS2009: A Btrfs update
The Btrfs filesystem was merged for the 2.6.29 kernel, mostly as a way to encourage wider testing and development. It is certainly not meant for production use at this time. That said, there are people doing serious work on top of Btrfs; it is becoming stable enough for daring users. Btrfs currently carries an all-caps warning in its Kconfig entry stating that the disk format has not yet been stabilized; Chris Mason, the filesystem's lead developer, is planning to remove that warning, perhaps for the 2.6.33 release. Btrfs, in other words, is progressing quickly.
One relatively recent addition is full use of zlib compression. Online resizing and defragmentation are coming along nicely. There has also been some work aimed at making synchronous I/O operations work well.
Defragmentation in Btrfs is easy: any specific file can be defragmented by
simply reading it and writing it back. Since Btrfs is a copy-on-write
filesystem, this rewrite will create a new copy of the file's data which
will be as contiguous as the filesystem is able to make it. This approach
can also be used to control the layout of files on the filesystem. As an
experiment, Chris took a bunch of boot-tracing data from a Moblin system
and analyzed it to figure out which files were accessed, and in which
order. He then rewrote the files in question to put them all in the same
part of the disk. The result was a halving of the I/O time during boot,
resulting in a faster system initialization and smiles all around.
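The read-and-write-back idea can be sketched from user space. The following Python is a minimal illustration, not the Btrfs defragmentation code itself; the function name `rewrite_in_place` and the 1 MiB chunk size are assumptions made for the example. It reads each chunk of a file and writes the same bytes back at the same offset, which on a copy-on-write filesystem causes new blocks to be allocated for the data.

```python
import os

def rewrite_in_place(path, chunk_size=1 << 20):
    """Read each chunk of `path` and write it back at the same offset.

    Returns the number of bytes rewritten. The file contents do not change.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        offset = 0
        while True:
            data = os.pread(fd, chunk_size, offset)
            if not data:
                break  # end of file
            written = 0
            while written < len(data):
                # pwrite() may write less than requested, so loop until done.
                written += os.pwrite(fd, data[written:], offset + written)
            offset += len(data)
        os.fsync(fd)
        return offset
    finally:
        os.close(fd)
```

Calling this helper over a list of files, in the order they are read at boot, is roughly the shape of the layout experiment described above. Whether the new copies end up contiguous, or next to each other, is up to the filesystem's allocator.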
Performance of synchronous operations has been an important issue over the last year. On filesystems like ext3, an fsync() call will flush out a lot of data which is not related to the actual file involved; that adds a significant performance penalty for fsync() use and discourages careful programming. Btrfs has improved the situation by creating an entirely separate B-tree on each filesystem which is used for synchronous I/O operations. That tree is managed identically to, but separately from, the regular filesystem tree. When an fsync() call comes along, Btrfs can use this tree to force out only the operations for the specific file involved. That gives a major performance win over ext3 and ext4.
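From the application side, the careful programming in question is simply a write followed by fsync() on the same file descriptor. Here is a minimal sketch; the helper name `durable_write` is invented for the example.

```python
import os

def durable_write(path, data):
    """Write `data` to `path` and return only once fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # write() may accept fewer bytes than offered; keep the remainder.
            view = view[os.write(fd, view):]
        # Ask the kernel to push this file's data to stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
```

The code is the same on any filesystem. What differs, per the discussion above, is how much unrelated data the filesystem must flush before fsync() can return.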
A further improvement would be the ability to write a set of files, then flush them all out in a single operation. Btrfs could do that, but there's no way in POSIX to tell the kernel to flush multiple files at once. Fixing that is likely to involve a new system call.
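Until such a call exists, an application that wants a set of files on disk has to flush them one at a time. The sketch below shows that workaround: it writes every file first, then calls fsync() once per file. The function name `write_files_then_sync` and its dictionary argument are assumptions made for illustration.

```python
import os

def write_files_then_sync(files):
    """Write every file in `files` (path -> bytes), then fsync each one.

    Returns the number of fsync() calls made: one per file, since there
    is no single call that flushes a chosen group of files together.
    """
    fds = []
    try:
        for path, data in files.items():
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            fds.append(fd)
            view = memoryview(data)
            while view:
                view = view[os.write(fd, view):]
        sync_calls = 0
        for fd in fds:
            os.fsync(fd)  # each call waits for its own file
            sync_calls += 1
        return sync_calls
    finally:
        for fd in fds:
            os.close(fd)
```

Each fsync() here waits separately, so the filesystem gets no hint that the files belong together.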
Btrfs provides a number of features which are also available via the device mapper and MD subsystems; some people have wondered if this duplication of features makes sense. But there are some good reasons for it; Chris gave a couple of examples:
- Doing snapshots at the device mapper/LVM layer involves making a lot
more copies of the relevant data. Chris ran an experiment where he
created a 400MB file, created a bunch of snapshots, then overwrote the
file. Btrfs is able to just write the new version, while allowing all
of the snapshots to share the old copy. LVM, instead, copies the data
once for each snapshot. So this test, which ran in less than two
seconds on Btrfs, took about ten minutes with LVM.
- Anybody who has had to replace a drive in a RAID array knows that the rebuild process can be long and painful. While all of that data is being copied, the array runs slowly and does not provide the usual protections. The advantage of running RAID within Btrfs is that the filesystem knows which blocks contain useful data and which do not. So, while an MD-based RAID array must copy an entire drive's worth of data, Btrfs can get by without copying unused blocks.
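Both examples come down to how much data has to be copied. The back-of-the-envelope model below makes that arithmetic explicit. It is a deliberate simplification that ignores metadata, seeks, and everything else affecting real run times, and its function names and inputs are hypothetical. In particular, the talk did not say how many snapshots were created, so the count is left as a parameter.

```python
def snapshot_overwrite_cost(file_mb, snapshots):
    """Return (shared_mb, per_snapshot_mb) written when a file is overwritten.

    shared_mb:       snapshots share the old copy; only the new version is written.
    per_snapshot_mb: the new version, plus one copy of the old data per snapshot.
    """
    shared = file_mb
    per_snapshot = file_mb + snapshots * file_mb
    return shared, per_snapshot

def rebuild_copy_cost(drive_gb, used_gb):
    """Return (whole_drive_gb, used_only_gb) copied while rebuilding a drive.

    A rebuild that cannot tell used blocks from free ones copies the whole
    drive; one that can tell only needs to copy the blocks in use.
    """
    return drive_gb, min(used_gb, drive_gb)
```

For a 400MB file and a made-up count of ten snapshots, `snapshot_overwrite_cost(400, 10)` gives 400MB written in the shared case against 4400MB when every snapshot gets its own copy.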
So what does the future hold? Chris says that the 2.6.32 kernel will
include a version of Btrfs which is stable enough for early adopters to
play with. In 2.6.33, with any luck, the filesystem will have RAID5 and
RAID6 support. Things will then stabilize further for 2.6.34. Chris was
typically cagey when talking about production use, though, pointing out
that it always takes a number of years to develop complete confidence in a
new filesystem. So, while those of us with curiosity, courage, and good
backups might be making regular use of Btrfs within a year,
widespread adoption is likely to be rather further away than that.
