Btrfs: broken by design?
Edward Shishkin, perhaps better known for his efforts to keep reiser4 development alive, first posted some concerns on June 3. It seems he ran a simple test: create a new Btrfs filesystem, then create 2048-byte files until space runs out. Others have talked about suboptimal space efficiency in Btrfs before, but Edward was still surprised that he was able to use only 17% of the nominal space in the filesystem before it was reported as being full. Such poor efficiency was, according to Edward, evidence that Btrfs was "broken by design" and should not be used.
Part of the problem comes down to the use of "inline extents" in Btrfs. The core data structure on a Btrfs filesystem is a B-tree which provides access to all of the objects stored in the filesystem. For larger files, the actual file data is stored in extents, which are pointed to from within the tree. Small extents, though, can be stored in the tree itself, hopefully yielding both better space efficiency and better performance. If these extents are sized inconveniently, however, they can waste a lot of space. There is only room for one 2048-byte inline extent in a B-tree node, leaving 1800 bytes or so unused. That much internal fragmentation adds up to a lot of wasted space.
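The arithmetic behind that figure can be sketched in a few lines of Python. This is a rough model, not Btrfs code: the 4096-byte node size and the per-node and per-item overheads below are assumptions chosen for illustration, and the model ignores any other items that may share the node.

```python
# A back-of-the-envelope model of inline-extent packing in one B-tree node.
# All sizes below are ASSUMPTIONS for illustration; they are not taken from
# the Btrfs on-disk format definitions.
NODE_SIZE = 4096       # assumed node size in bytes
NODE_HEADER = 101      # assumed fixed per-node header overhead
ITEM_OVERHEAD = 46     # assumed per-item overhead (item descriptor + extent header)


def inline_extents_per_node(extent_size: int) -> int:
    """Number of whole inline extents that fit in one node, with no splitting."""
    usable = NODE_SIZE - NODE_HEADER
    return usable // (extent_size + ITEM_OVERHEAD)


def unused_bytes(extent_size: int) -> int:
    """Bytes left over in a node after packing as many whole extents as fit."""
    usable = NODE_SIZE - NODE_HEADER
    used = inline_extents_per_node(extent_size) * (extent_size + ITEM_OVERHEAD)
    return usable - used


for size in (512, 1024, 2048):
    n = inline_extents_per_node(size)
    print(f"{size:5d}-byte extents: {n} per node, {unused_bytes(size)} bytes unused")
```

Under these assumed numbers, a 2048-byte extent leaves about 1,900 bytes of the node unused, in the same range as the figure cited above; the exact amount depends on overheads this model does not capture.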
As noted in Chris Mason's response, there are two approaches which can be taken to mitigate this kind of problem. One is to turn off inline extents altogether; Btrfs has a max_inline= mount option which can be used for just that purpose (max_inline=0 disables them). The other approach would be to allow inline extents to be split between tree nodes so that the pieces could be sized to fill those nodes exactly. Btrfs cannot do that, and probably will not be able to anytime soon.
Chris also noted that most of the other variable-size items stored in B-tree nodes (extended attributes, for example) can be split between nodes if need be. So these items should not cause fragmentation problems; it is mainly the inline extents that are at fault.
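The appeal of splitting shows up in a rough count of how many nodes a batch of small files would need with and without it. This is again a hypothetical model with assumed sizes; the function names are illustrative, and the split case ignores the extra per-piece overhead that splitting would add, so it represents a best case.

```python
import math

# Hypothetical model comparing whole-item placement with split placement.
# All sizes are ASSUMPTIONS for illustration, not Btrfs on-disk values.
NODE_SIZE = 4096      # assumed node size in bytes
NODE_HEADER = 101     # assumed fixed per-node header overhead
ITEM_OVERHEAD = 46    # assumed per-item overhead

USABLE = NODE_SIZE - NODE_HEADER


def nodes_without_split(n_files: int, file_size: int) -> int:
    """Each inline extent must fit whole inside a single node."""
    per_node = USABLE // (file_size + ITEM_OVERHEAD)
    if per_node == 0:
        raise ValueError("extent too large to store inline without splitting")
    return math.ceil(n_files / per_node)


def nodes_with_split(n_files: int, file_size: int) -> int:
    """Best case: pieces are cut so that every node is filled exactly.

    The extra overhead for the additional pieces is ignored here.
    """
    total = n_files * (file_size + ITEM_OVERHEAD)
    return math.ceil(total / USABLE)


n = 1000
print("without splitting:", nodes_without_split(n, 2048), "nodes")
print("with splitting:   ", nodes_with_split(n, 2048), "nodes")
```

With these assumptions, 1,000 files of 2048 bytes need 1,000 nodes when each extent must stay whole, against roughly half that if the data could be split. The other mitigation, max_inline=0, sidesteps the question by keeping file data out of the tree.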
But, as Edward pointed out, there's more to the problem than inline extents. In his investigations, he's found numerous places where groups of nearly-empty nodes exist; some were less than 1% utilized. That, in all likelihood, is the real source of the worst space utilization problems. To Edward, this behavior is another sign that the algorithms used in Btrfs are all wrong and in need of a redesign.
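Why a group of nearly-empty nodes matters so much is easiest to see with a quick utilization calculation. The fill levels below are invented purely for illustration (they are not Edward's measurements), the 4096-byte node size and 1% threshold are assumptions, and the helper names are hypothetical.

```python
# Hypothetical fill levels (bytes used per node), invented for illustration;
# these are NOT measurements from any real Btrfs filesystem.
NODE_SIZE = 4096  # assumed node size in bytes


def utilization(used_bytes: list[int], node_size: int = NODE_SIZE) -> float:
    """Fraction of the total node space that actually holds data."""
    return sum(used_bytes) / (len(used_bytes) * node_size)


def nearly_empty(used_bytes: list[int], node_size: int = NODE_SIZE,
                 threshold: float = 0.01) -> list[int]:
    """Indices of nodes filled below the given fraction of their capacity."""
    return [i for i, used in enumerate(used_bytes) if used / node_size < threshold]


# Two well-filled nodes followed by a group of six almost-empty ones.
sample = [4000, 4000, 30, 30, 30, 30, 30, 30]
print(f"overall utilization: {utilization(sample):.1%}")
print("nodes under 1% full:", nearly_empty(sample))
```

In this made-up sample, the six nearly-empty nodes pull overall utilization down to about 25%, even though the other two nodes are almost full.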
Chris sees it a little differently, though: in his view, the nearly-empty nodes are the result of a bug, not a design flaw.
He has promised to track down the bug and post a fix. Between the bug fix and turning off inline extents (or, at least, reducing their maximum size), it is hoped that the worst space utilization problems in Btrfs will be no more.
That fix has not been posted as of this writing, so its effectiveness cannot yet be judged. But, chances are, this is not a case of a filesystem needing a fundamental redesign. What it needs instead is more extensive testing, some performance tuning, and, inevitably, some bug fixes. The good news is that the process seems to be working as it should: these problems have been found before any sort of wide-scale deployment of this very new filesystem.
