The truth may not be quite so grim. Development on Btrfs continues, with a strong emphasis on stability and performance. Problems are getting fixed, and users are beginning to take another look at this promising filesystem. More users are beginning to play with it, and openSUSE considered the idea of using it by default back in September. Your editor's sense is that the situation may be bottoming out, and that we may, slowly, be heading into a new phase where Btrfs takes its place — still slowly — as one of the key Linux filesystems.
This article is intended to be the first in a series for users interested in experimenting with and evaluating the Btrfs filesystem. We'll start with the basics of the design of the filesystem and how it is being developed; that will be followed by a detailed look at specific Btrfs features. One thing that will not appear in this series, though, is benchmark results; experience says that filesystem benchmarking is hard to do right, and the results are highly workload- and hardware-dependent. Poor-quality results would not be helpful to anybody, so your editor will simply not try.
Not that long ago, Linux users were still working with filesystems that had evolved little since the Unix days. The ext3 filesystem, for example, was still using block pointers: each file's inode (the central data structure holding all the information about the file) contained a list of pointers to each individual block holding the file's data. That design worked well enough when files were small, but it scales poorly: a 1GB file (with 4KB blocks) would require 256K individual block pointers. More recent filesystems (including ext4) use pointers to "extents" instead; each extent is a group of contiguous blocks. Since filesystems work to store data contiguously anyway, extent-based storage greatly reduces the overhead of managing a file's space.
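The arithmetic behind that 256K figure is easy to check; here is a quick sketch (the 4KB block size and the eight-extent figure are illustrative assumptions, and real filesystems add indirect-block overhead that is ignored here):

```python
# Bookkeeping cost of per-block pointers vs. extents for a 1GB file.
FILE_SIZE = 1 << 30      # 1GB
BLOCK_SIZE = 4 << 10     # 4KB blocks (a common default)

block_pointers = FILE_SIZE // BLOCK_SIZE
print(block_pointers)    # 262144 pointers, i.e. 256K of them

# If the filesystem manages to store the file as, say, eight
# contiguous runs, extent-based storage needs only eight
# (start, length) records instead of 256K pointers.
extents = 8
print(block_pointers // extents)   # blocks covered by each extent
```

The exact numbers depend on the block size and on how fragmented the file is, but the orders of magnitude are the point.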
Naturally, Btrfs uses extents as well. But it differs from most other Linux filesystems in a significant way: it is a "copy-on-write" (or "COW") filesystem. When data is overwritten in an ext4 filesystem, the new data is written on top of the existing data on the storage device, destroying the old copy. Btrfs, instead, will move overwritten blocks elsewhere in the filesystem and write the new data there, leaving the older copy of the data in place.
The COW mode of operation brings some significant advantages. Since old data is not overwritten, recovery from crashes and power failures should be more straightforward; if a transaction has not completed, the previous state of the data (and metadata) will be where it always was. So, among other things, a COW filesystem does not need to implement a separate journal to provide crash resistance.
Copy-on-write also enables some interesting new features, the most notable of which is snapshots. A snapshot is a virtual copy of the filesystem's contents; it can be created without copying any of the data at all. If, at some later point, a block of data is changed (in either the snapshot or the original), that one block is copied while all of the unchanged data remains shared. Snapshots can be used to provide a sort of "time machine" functionality, or to simply roll back the system after a failed update.
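The sharing semantics behind snapshots can be illustrated with a toy model; this is a conceptual sketch (the class and method names are invented for illustration), not how Btrfs actually lays out its trees on disk:

```python
# Toy copy-on-write volume: a mapping from block numbers to immutable
# data. A snapshot is just another reference to the same mapping, and a
# write binds a block number to new data instead of mutating the old
# copy, so the snapshot keeps seeing the original contents.

class CowVolume:
    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})

    def snapshot(self):
        # O(number of blocks) here; Btrfs shares whole metadata
        # subtrees and is effectively O(1), but the visible sharing
        # behavior is the same.
        return CowVolume(self.blocks)

    def write(self, blockno, data):
        # Copy-on-write: build a new mapping with the changed block,
        # leaving any snapshot's view of the old data untouched.
        self.blocks = {**self.blocks, blockno: data}

    def read(self, blockno):
        return self.blocks[blockno]

vol = CowVolume({0: b"old data"})
snap = vol.snapshot()      # "copies" nothing
vol.write(0, b"new data")  # only now does block 0 diverge
print(vol.read(0))         # b'new data'
print(snap.read(0))        # b'old data'
```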
Another important Btrfs feature is its built-in volume manager. A Btrfs filesystem can span multiple physical devices in a number of RAID configurations. Any given volume (collection of one or more physical drives) can also be split into "subvolumes," which can be thought of as independent filesystems sharing a single physical volume set. So Btrfs makes it possible to group part or all of a system's storage into a big pool, then share that pool among a set of filesystems, each with its own usage limits.
Btrfs offers a wide range of other features not supported by other Linux filesystems. It can perform full checksumming of both data and metadata, making it robust in the face of data corruption by the hardware. Full checksumming is expensive, though, so it remains likely to be used in only a minority of installations. Data can be stored on-disk in compressed form. The send/receive feature can be used as part of an incremental backup scheme, among other things. The online defragmentation mechanism can fix up fragmented files in a running filesystem. The 3.12 kernel saw the addition of an offline de-duplication feature; it scans for blocks containing duplicated data and collapses them down to a single, shared copy. And so on.
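The idea behind offline de-duplication can be sketched in a few lines: hash each block's contents and point duplicate blocks at a single stored copy. The real 3.12-era feature operates on the filesystem's extent trees; the function below is an invented illustration of the concept only:

```python
# Collapse duplicate blocks down to one shared copy, keyed by a
# content hash. Returns a per-block reference table and the unique
# storage it points into.
import hashlib

def deduplicate(blocks):
    storage = {}   # content hash -> the single stored copy
    table = []     # per-block reference into storage
    for data in blocks:
        digest = hashlib.sha256(data).hexdigest()
        storage.setdefault(digest, data)  # keep only the first copy
        table.append(digest)
    return table, storage

blocks = [b"aaaa", b"bbbb", b"aaaa", b"aaaa"]
table, storage = deduplicate(blocks)
print(len(blocks), len(storage))   # 4 blocks, but only 2 stored copies
```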
It is worth noting that the copy-on-write approach is not without its costs. Obviously, some sort of garbage collection is required or all those block copies will quickly eat up all of the available space on the filesystem. Copying blocks can take more time than simply overwriting them, and it can significantly increase the filesystem's memory requirements. COW operations will also have a tendency to fragment files, wrecking the nice, contiguous layout that the filesystem code put so much effort into creating. Fragmentation hurts less with solid-state devices than on rotational storage, but, even in the former case, fragmented files will not be as quick to access.
So all this shiny new Btrfs functionality does not come for free. In many settings, administrators may well decide that the costs associated with Btrfs outweigh the benefits; those sites will stick with filesystems like ext4 or XFS. For others, though, the flexibility and feature set provided with Btrfs are likely to be quite appealing. Once it is generally accepted that Btrfs is ready for real-world use, chances are it will start popping up on a lot of systems.
One concern your editor has heard in conference hallways is that the pace of Btrfs development has slowed. For the curious, here's the changeset count history for the Btrfs code in the kernel, grouped into approximately one-year periods:
    Year               Changesets  Developers
    2008 (2.6.25-29)      913          42
    2009 (2.6.30-33)      279          45
    2010 (2.6.34-37)      193          33
    2011 (2.6.38-3.2)     610          67
    2012 (3.3-8)          773          63
    2013 (3.9-13)         671          68
These numbers, on their own, do not demonstrate a slowing of development; there was an apparent slow period in 2010, but the number of changesets and the number of developers contributing them have held steady thereafter. That said, there are a couple of things to bear in mind when looking at those numbers. One is that the early work involved the addition of features to a brand-new filesystem, while work in 2013 is almost entirely fixes. So the size of the changes has shrunk considerably, but one could easily argue that things should be just that way.
The other relevant point is that contributions by Btrfs creator Chris Mason have clearly fallen in recent years. Partly that is because he has been working on the user-space btrfs-progs code — work which is not reflected in the above, kernel-side-only numbers — but it also seems clear that he has been busy with other work-related issues. It will be interesting to see how things change now that Chris and prolific Btrfs contributor Josef Bacik have found a new home at Facebook.
In summary, the amount of new code going into Btrfs has clearly fallen in recent years, but that will be seen as good news by anybody hoping for a stable filesystem anytime soon. There is still some significant effort going into this filesystem, and chances are good that developer attention will increase as distributors look more closely at using Btrfs by default.
All told, Btrfs still looks interesting, and it seems like the right time to take a closer look at what is still the next generation Linux filesystem. Now that the introductory material is out of the way, the next article in this series will start to actually play with Btrfs and explore its feature set. Those articles (appearing here as they are published) are:
By the end of the series, we plan to have a reasonably comprehensive introduction to Btrfs in place; stay tuned.
The Btrfs filesystem: An introduction
Posted Dec 12, 2013 4:39 UTC (Thu) by geuder (subscriber, #62854) [Link]
And wasn't there an issue with fsck?
Myself I have only 2 experiences:
- it was the default in the Meego systems I used. But for obvious reasons that usage did not last long.
- when I once wanted a "portable" Ubuntu installed on a USB stick, I used btrfs because of the possibility to compress everything. Besides the fact that it probably was only a 4GB stick, I thought the bigger the speed gap between CPU and "disk", the more beneficial compression should be for overall performance, because there are plenty of spare cycles. The overall result was catastrophic: every bigger apt-get upgrade really took several hours. The reason was that apt seems to be really cautious about not ending up in an inconsistent state if aborted in the middle of an operation, so it does plenty of fsync (IIRC) calls. At least back then that was a known performance problem in btrfs. I briefly experimented with hooking eatmydata underneath apt, but for some reason the project then faded out...
(I still wonder how Meego could live with that problem. Is it just the number of packages, which probably was only a fraction in a Meego tablet compared to a full Ubuntu desktop installation? Or is rpm less paranoid than apt about ending up with inconsistent results if something goes wrong?)
The Btrfs filesystem: An introduction
Posted Dec 13, 2013 10:47 UTC (Fri) by jezuch (subscriber, #52988) [Link]
I've been running btrfs on / for about 4 years, I think, and I can remember just one, non-data-eating incident. So no problems for me. But others[1] have very different experiences, so...
[1] http://changelog.complete.org/archives/9123-results-with-...
> The reason was that apt seems to be really cautious about not ending up in an inconsistent state if aborted in the middle of an operation, so it does plenty of fsync (IIRC) calls. At least back then that was a known performance problem in btrfs.
Yes, this was rather bad, but there was a workaround (tell APT to not be so damn paranoid). And this problem is long gone now.
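[For later readers: the workaround usually cited for this (a detail not spelled out in the thread itself) is dpkg's force-unsafe-io option, which skips the extra sync calls; it can be enabled with a one-line configuration file:]

```
# /etc/dpkg/dpkg.cfg.d/force-unsafe-io
force-unsafe-io
```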
The Btrfs filesystem: An introduction
Posted Dec 13, 2013 15:48 UTC (Fri) by jezuch (subscriber, #52988) [Link]
But I really like snapshotting. It revolutionized the way I do custom builds of Debian packages: create a snapshot, make a mess (installing build-dependencies etc.), build Chromium (lots and lots of thrashing, up to 20 GB of build artifacts), move away what's important, delete snapshot. And after all of this the main filesystem doesn't have a clue that anything happened. And I don't have to clean up anything at all :)
The Btrfs filesystem: An introduction
Posted Apr 1, 2014 16:12 UTC (Tue) by mcortese (guest, #52099) [Link]
(tell APT to not be so damn paranoid)
How? Especially when apt is called by the installer, not directly by the user?
The Btrfs filesystem: An introduction
Posted Apr 1, 2014 16:43 UTC (Tue) by hummassa (subscriber, #307) [Link]
The Btrfs filesystem: An introduction
Posted Dec 12, 2013 9:23 UTC (Thu) by ebirdie (subscriber, #512) [Link]
<http://www.anandtech.com/show/7500/netgear-readynas-312-2...>
I'm pretty confident there are other vendors too, but to me this was a wake-up for Btrfs.
Great that the editor is publishing the set of articles. Reading the article raised an interest to further information, what vendors/employers might have been active and how they have changed during the presented time period in development of Btrfs.
The Btrfs filesystem: An introduction
Posted Dec 18, 2013 23:18 UTC (Wed) by Lennie (subscriber, #49641) [Link]
The Btrfs filesystem: An introduction
Posted Dec 19, 2013 23:41 UTC (Thu) by Pc5Y9sbv (guest, #41328) [Link]
We are formatting about 20-60 TB of raw disk space (different test scenarios), and copying a wide range of different data trees which include large files and huge numbers of small files generated by programs. There might be about 40-70 TB of uncompressed data in around 10M files (using compress-force=zlib, it shrinks to 10-15 TB).
We wanted to store near-line backups with daily/weekly/monthly snapshot history and it failed miserably. It seems we can use the transparent compression and good old-fashioned rsync --link-dest tricks to store our backup history, but if we instead try to take sub-volume snapshots and just keep modifying the "head" via rsync, it blows up and takes the filesystem with it. So, it can handle the huge number of inodes involved in representing trees of millions of files for each day, but it cannot handle the equivalent sub-volume snapshot workload.
The Btrfs filesystem: An introduction
Posted Dec 19, 2013 23:46 UTC (Thu) by Lennie (subscriber, #49641) [Link]
It's a start. Slowly but surely it will also become (more) stable for other workloads.
btrfs on raw flash
Posted Dec 12, 2013 14:14 UTC (Thu) by seanyoung (subscriber, #28711) [Link]
In fact, if it can be shown that btrfs performance is significantly faster without an FTL, would that motivate the manufacturers to produce flash kit where you can bypass the FTL? That would give them an edge over their competition.
btrfs on raw flash
Posted Dec 12, 2013 19:16 UTC (Thu) by drag (subscriber, #31333) [Link]
But right now you can't use btrfs directly on flash.
btrfs on raw flash
Posted Dec 13, 2013 8:49 UTC (Fri) by iq-0 (subscriber, #36655) [Link]
The Btrfs filesystem: An introduction
Posted Dec 12, 2013 16:30 UTC (Thu) by masoncl (subscriber, #47138) [Link]
The Btrfs filesystem: An introduction
Posted Dec 12, 2013 23:33 UTC (Thu) by dowdle (subscriber, #659) [Link]
The Btrfs filesystem: An introduction
Posted Dec 12, 2013 23:43 UTC (Thu) by dowdle (subscriber, #659) [Link]
http://www.oracle.com/us/technologies/linux/product/featu...
Btrfs is listed as a feature in SLES:
https://www.suse.com/products/server/features/
Lastly, Btrfs is also shown as available in the RHEL 7 beta that came out recently. They don't have it listed as "preview only" anymore.
The Btrfs filesystem: An introduction
Posted Dec 13, 2013 0:04 UTC (Fri) by rahulsundaram (subscriber, #21946) [Link]
http://www.redhat.com/about/news/archive/2013/12/red-hat-...
"Btrfs, an emerging file system, will be available as a technology preview within Red Hat Enterprise Linux 7"
http://rhelblog.redhat.com/2013/12/11/testers-wanted-red-...
"btrfs file system .. now available to test"
The Btrfs filesystem: An introduction
Posted Dec 13, 2013 13:11 UTC (Fri) by dowdle (subscriber, #659) [Link]
The Btrfs filesystem: An introduction
Posted Dec 13, 2013 0:33 UTC (Fri) by anselm (subscriber, #2796) [Link]
In its most recent incarnation, SUSE Linux Enterprise Server prods you with considerable verve towards using Btrfs for your root file system.
On the other hand, perhaps interestingly, SUSE Linux Enterprise Server doesn't even support ext4 except as a read-only filesystem to get stuff off ext4-formatted disks.
The Btrfs filesystem: An introduction
Posted Dec 18, 2013 8:25 UTC (Wed) by salimma (subscriber, #34460) [Link]
https://www.suse.com/releasenotes/x86_64/SUSE-SLES/11-SP3/
Still not supported though, as you said. Bizarre decision.
The Btrfs filesystem: An introduction
Posted Dec 18, 2013 9:37 UTC (Wed) by anselm (subscriber, #2796) [Link]
Still not supported though, as you said.
And that's exactly the thing.
Even if »ext4 r/w support is a kernel option away«, on SLES you're not supposed to run your own kernels if you want your installation to be supported. And who would ever want to run SLES in the first place if it wasn't for the support?
ZFS
Posted Dec 13, 2013 3:42 UTC (Fri) by grahame (subscriber, #5823) [Link]
Anyone know why full data checksums are considered too expensive for BTRFS? I'm running ZFS and seeing great performance (you do need a fair whack of memory) -- the checksumming doesn't seem to be a big problem on a modern system. It's very nice to know your data is actually there, too -- once you get out to storing petabytes of data, you will start to see data corruption occasionally.
Scrub vs. fsck is a huge win on systems with large filesystems. My experience of ext4 is that it will develop problems over time, and if you've got a huge partition fsck can easily take a day -- and you're offline for that time. From what I know btrfs scrub isn't quite so solid as ZFS?
ZFS
Posted Dec 13, 2013 10:11 UTC (Fri) by cwillu (subscriber, #67268) [Link]
Checksumming is the default; it's typically only turned off for vm images and the like, as a side-effect of disabling copy-on-write on those files (and even this is being addressed).
ZFS
Posted Dec 13, 2013 10:21 UTC (Fri) by iq-0 (subscriber, #36655) [Link]
Data checksumming is a good feature but there are enough cases where people might not want to bother with it but where they're still interested in e.g. snapshotting support, transparent compression, deduplication or incremental send/receive.
The reason why btrfs isn't being picked up as much as zfs: maturity. In the beginning zfs had much the same issues and uptake was rather slow. But it has aged pretty well and is now a pretty much proven filesystem. Btrfs still has some rough edges which make it a less than ideal filesystem for the layman, but it does have the features and they really work. As more people use it, the tooling will improve and the filesystem will come to be considered the default choice for most common uses.
SailfishOS on Jolla phones
Posted Dec 13, 2013 6:16 UTC (Fri) by zdzichu (subscriber, #17118) [Link]
SailfishOS on Jolla phones
Posted Dec 13, 2013 11:36 UTC (Fri) by ttonino (subscriber, #4073) [Link]
It could solve quite some problems...
The Btrfs filesystem: An introduction
Posted Dec 13, 2013 14:45 UTC (Fri) by ibukanov (subscriber, #3942) [Link]
I could accept a bug in a new file system code, but a bug with double-free in fsck in read-only mode just told me about bad test coverage for a very important recovery tool.
The Btrfs filesystem: An introduction
Posted Dec 13, 2013 15:30 UTC (Fri) by leoc (subscriber, #39773) [Link]
I used it exclusively on fedora 19 and found it extremely stable, but it really suffered from too much thrashing on my T61 with a 64GB SSD. When I upgraded to Fedora 20 I went back to ext4 but I find I miss many of the more useful features like inline compression and volume management.
Btrfs/fsck bugs
Posted Dec 14, 2013 0:54 UTC (Sat) by giraffedata (subscriber, #1954) [Link]
I could accept a bug in a new file system code, but a bug with double-free in fsck in read-only mode just told me about bad test coverage for a very important recovery tool.
I don't follow the comparison. Why is a bug in new file system code more acceptable than one in fsck? Or harder to catch with testing?
I can make a case for the opposite: If I were allocating resources for finding (or preventing) bugs between file system code and fsck, I would give lower priority to fsck. You have all the time in the world to fix the double-free in fsck and recover your data, but if broken file system code failed to store the data, you're screwed regardless of how well fsck works.
Btrfs/fsck bugs
Posted Dec 14, 2013 8:58 UTC (Sat) by ibukanov (subscriber, #3942) [Link]
> I would give lower priority to fsck.
For me a working fsck gives extra confidence that the data can be recovered, as there are at least 2 types of code (the filesystem itself mounted read-only and the checker) that one can use after bugs. That was the reason that I tried it only with Fedora 17, when the long-promised fsck for Btrfs finally appeared after a delay of at least 2 years. I suppose that in turn contributed to the delays in wider Btrfs adoption.
The Btrfs filesystem: An introduction
Posted Dec 19, 2013 2:55 UTC (Thu) by heijo (guest, #88363) [Link]
How much time has passed since the last bug causing unrecoverable data corruption was fixed?
Is there any study of the probability that btrfs is free from such bugs? (since, obviously, a filesystem is only usable if that is considered to be near certainty)
The Btrfs filesystem: An introduction
Posted Dec 19, 2013 2:57 UTC (Thu) by heijo (guest, #88363) [Link]
Is the btrfs design optimal for SSDs?
With 1TB SSDs now going for $500, everyone is soon going to use them for all their main data storage needs, which is what needs to be fast on desktops, so it's essential that the default filesystem is optimal for them.
The Btrfs filesystem: An introduction
Posted Dec 20, 2013 20:01 UTC (Fri) by JanC_ (subscriber, #34940) [Link]
Copyright © 2013, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds