
Btrfs send/receive

By Jonathan Corbet
July 11, 2012
The btrfs snapshot capability allows a system administrator to quickly capture the state of a filesystem at any given time. Thanks to the copy-on-write mechanism used by btrfs, snapshots share data with other snapshots or the "live" system; blocks are only duplicated when they are changed. While btrfs makes the creation and management of snapshots easy, it currently lacks the ability to efficiently determine what the differences are between two snapshots and save that information for future use. Given that some other advanced filesystems (ZFS, for example) offer that capability, btrfs can arguably be seen as falling a little short in this particular area.

Happily, that situation appears to be about to change, as Alexander Block's btrfs send/receive patch set has been well received by the development community. In short, with this patch set (and the associated user space tools), btrfs can be instructed to calculate the set of changes made between two snapshots and serialize them to a file. That file can then be replayed elsewhere, possibly at some future time, to regenerate one snapshot from the other.

This functionality is implemented with the new BTRFS_IOC_SEND ioctl() command. In its simplest form, this operation accepts a file descriptor representing a mounted volume and the subvolume ID corresponding to the snapshot of interest; it will then find the changes between the snapshot and the "parent" snapshot it was generated from. There are more options, though:

  • The operation can actually take a list of snapshot/subvolume IDs and generate a combined file for all of them.

  • The parent snapshot can be specified explicitly. That may be required for older btrfs volumes that lack the needed identifying information. It may also be useful to generate differences that skip over a set of snapshots — differences from a grandparent, say, instead of the direct parent.

  • The command also accepts an optional list of "clone sources." Those are subvolumes that can be expected to exist on the receiving side; when possible, data blocks will be "cloned" from those snapshots rather than being written into the differences file. That reduces the size of the differences, and enables better data sharing on the receive side.
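The effect of clone sources on the size of the differences can be illustrated with a toy model. The sketch below is not the btrfs implementation: snapshots are modelled as plain dictionaries mapping paths to file contents, whole files are matched rather than extents, and the names `build_stream` and `payload_bytes` as well as the command tuples are invented for illustration.

```python
def build_stream(child, clone_sources):
    """Build a toy send stream for `child`.

    child:         dict mapping path -> bytes (the snapshot being sent)
    clone_sources: dict mapping subvolume name -> dict of path -> bytes
                   (subvolumes assumed to exist on the receiving side)
    """
    # Index every piece of data the receiver is assumed to have already.
    known = {}
    for name in sorted(clone_sources):
        for path, data in sorted(clone_sources[name].items()):
            known.setdefault(data, (name, path))

    stream = []
    for path, data in sorted(child.items()):
        stream.append(("mkfile", path))
        if data in known:
            # Refer to existing data instead of shipping it again.
            stream.append(("clone", path, *known[data]))
        else:
            stream.append(("write", path, data))
    return stream


def payload_bytes(stream):
    """Count the data bytes carried by 'write' commands."""
    return sum(len(cmd[2]) for cmd in stream if cmd[0] == "write")


child = {"big.iso": b"A" * 1000, "notes.txt": b"hi"}
sources = {"snap1": {"old.iso": b"A" * 1000}}

without_clones = payload_bytes(build_stream(child, {}))
with_clones = payload_bytes(build_stream(child, sources))
```

In this model the stream carries 1002 bytes of file data without clone sources and only 2 bytes with them, since the large file is replaced by a reference to data the receiver already holds.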

The generated file is essentially a set of instructions for converting the parent snapshot into the one being "sent." The list of commands is surprisingly long, including operations like create a file (or directory, device node, FIFO, symbolic link, ...), rename or link a file, unlink a file, set and remove extended attributes, write data, clone data blocks, truncate a file, change ownership and permissions, set file times, and so on. The code that generates this file is also surprisingly long, being several thousand lines of complex, nearly uncommented functions (some of the comments that do exist, saying things like "magic happens here," are not entirely helpful).
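To make the "set of instructions" idea concrete, here is a minimal sketch of how such a list could be derived from two snapshots. It is a deliberately simplified model and not the kernel's algorithm: snapshots are dictionaries of path to contents, changed files are rewritten in full, and the function name `diff_snapshots` and the command names are assumptions made for this example.

```python
def diff_snapshots(parent, child):
    """Return a list of commands that turns `parent` into `child`.

    Both snapshots are modelled as dicts mapping a path to bytes.
    """
    stream = []

    # Files present in the parent but not in the child are unlinked.
    for path in sorted(parent.keys() - child.keys()):
        stream.append(("unlink", path))

    for path in sorted(child):
        data = child[path]
        if path not in parent:
            # New file: create it, then fill in its data.
            stream.append(("mkfile", path))
            if data:
                stream.append(("write", path, 0, data))
        elif parent[path] != data:
            # Changed file: set the new size, then rewrite the contents.
            stream.append(("truncate", path, len(data)))
            stream.append(("write", path, 0, data))
        # Unchanged files generate no commands at all.
    return stream


parent = {"a.txt": b"hello", "b.txt": b"old", "d.txt": b"gone"}
child = {"a.txt": b"hello", "b.txt": b"new!", "c.txt": b"x"}
stream = diff_snapshots(parent, child)
```

Note that the unchanged `a.txt` does not appear in the result; only the differences are recorded, which is what keeps an incremental send small.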

Interestingly, according to the patch introduction, the custom file format was not in the original plan. Instead, the output was meant to be in something close to the tar file format, close enough that the tar command could be used to extract data from it. Tar turned out not to have the needed capabilities, though, so a new format was created. That format should be considered to be still in flux; clearly, it will need to stabilize before this feature can be considered ready for production use. As it happens, the playback of this file can be done almost entirely in user space, so there is no need for a BTRFS_IOC_RECEIVE operation.
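Why playback needs so little kernel help becomes clearer when one considers that each instruction corresponds to an ordinary file operation. The sketch below replays a hand-written instruction list against an in-memory model of a snapshot. The command names, tuple layout, and the `apply_stream` function are hypothetical and do not reflect the actual stream format.

```python
def apply_stream(snapshot, stream):
    """Replay `stream` on a copy of `snapshot` and return the result.

    snapshot: dict mapping path -> bytes
    stream:   list of tuples of the form (command, path, *arguments)
    """
    files = {path: bytearray(data) for path, data in snapshot.items()}

    for cmd, path, *args in stream:
        if cmd == "mkfile":
            files[path] = bytearray()
        elif cmd == "unlink":
            del files[path]
        elif cmd == "rename":
            (new_path,) = args
            files[new_path] = files.pop(path)
        elif cmd == "truncate":
            (size,) = args
            buf = files[path]
            del buf[size:]                          # shrink if too long
            buf.extend(b"\0" * (size - len(buf)))   # zero-fill if too short
        elif cmd == "write":
            offset, data = args
            buf = files[path]
            if len(buf) < offset:
                buf.extend(b"\0" * (offset - len(buf)))
            buf[offset:offset + len(data)] = data
        else:
            raise ValueError(f"unknown command: {cmd!r}")

    return {path: bytes(data) for path, data in files.items()}


parent = {"a.txt": b"hello world", "tmp": b"x"}
stream = [
    ("unlink", "tmp"),
    ("write", "a.txt", 6, b"btrfs"),
    ("mkfile", "b.txt"),
    ("write", "b.txt", 0, b"new"),
    ("rename", "b.txt", "c.txt"),
]
result = apply_stream(parent, stream)
```

A real receiver would issue the equivalent system calls against a freshly created snapshot of the parent rather than updating a dictionary, but the structure, a loop dispatching on the command type, would plausibly be similar.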

At the command level, using this feature can be as simple as:

    btrfs send snapshot

This will send the given snapshot (in its entirety) to the standard output stream. Writing the command as:

    btrfs send -i oldsnap snapshot

will cause the creation of an incremental send containing just the differences from oldsnap. The receive command can be used to apply a file created by btrfs send to an existing filesystem.

The primary use case for this feature (which is clearly patterned after the ZFS send/receive functionality) is backups in various forms. A cron job could easily send a snapshot to a remote server on a regular basis, maintaining a mirror of a filesystem there. The send files can simply be stored as backups; an entire volume can be sent as a full backup, while snapshots are easily sent as incrementals. With some additional tooling, the send/receive feature could develop into an advanced backup capability with low-level support from the underlying filesystem.

That is for some time in the future, though; the feature is currently experimental, and Alexander warns potential users:

If you use it for backups, you're taking big risks and may end up with unusable backups. Please do not only count on btrfs send/receive backups!

That said, there seems to be a fair amount of interest in this feature (btrfs creator Chris Mason described it as "just awesome"), so chances are it will be worked into reasonable shape relatively quickly. Then btrfs will have one more useful feature and one less reason to be concerned about comparisons with that other filesystem.


Btrfs send/receive

Posted Jul 12, 2012 2:50 UTC (Thu) by pranith (subscriber, #53092) [Link] (7 responses)

I wonder what other features are missing from btrfs when compared to ZFS. A performance comparison between the two would also be interesting.

ZFS features not currently in btrfs

Posted Jul 12, 2012 11:19 UTC (Thu) by Tobu (subscriber, #24111) [Link] (6 responses)

Mostly an SSD cache (bcache is aiming for that and works at the block layer; an extensible superblock format or filesystem integration would be needed for ease of configuration) and deduplication (btrfs patches are floating around at the moment). See this page for a ZFS comparison (some features, like overprovisioning and automounting, can be provided outside btrfs).

ZFS features not currently in btrfs

Posted Jul 12, 2012 18:13 UTC (Thu) by clump (subscriber, #27801) [Link]

I think the article's tone is a bit snarky, but it's a mostly accurate view of how current Btrfs stacks up to ZFS.

ZFS features not currently in btrfs

Posted Jul 14, 2012 6:57 UTC (Sat) by Rudd-O (guest, #61155) [Link] (3 responses)

btrfs also lacks a ZIL log device feature that allows it to commit large numbers of small transactions and then transform them into streaming writes for rotating or slower devices.

btrfs also lacks the ability to organize file systems in trees, with the properties of children file systems being inherited from the parent.

There's also no RAIDZ (which means btrfs is vulnerable to the RAID5 write hole).

I wrote about all the other things that btrfs is missing on my blog (Rudd-O.com).

btrfs is at least 4 years away from achieving feature parity with ZFS.

ZFS features not currently in btrfs

Posted Jul 14, 2012 11:32 UTC (Sat) by Tobu (subscriber, #24111) [Link]

ZIL/SLOG is part of the bcache featureset. bcache can be a write-through cache, which only improves read performance, or a writeback cache, which makes post-SSD write io sequential. Another advantage of bcache is that it is persistent, unlike the ZFS read cache (L2ARC). That allows bcache to speed up booting.

The raid5 write hole exists when btrfs is layered over md raid. When btrfs' parity raid implementation lands, it will probably use the same technique as raid-z (overwriting entire stripes instead of patching them).

I don't think you held back on listing zfs advantages over btrfs; this was the second newest zfs article on your blog.

btrfs features not currently in ZFS

Posted Jul 17, 2012 17:16 UTC (Tue) by Lennie (subscriber, #49641) [Link] (1 responses)

So what about the other way around?

I know btrfs can use a read-only device (like a CD-ROM) as a base filesystem and store only the changes on another device (like a USB stick). Can ZFS do that too?

What about memory usage? I heard btrfs actually runs well on smartphones.

How does ZFS fare in such an environment? I've heard stories from the FreeBSD camp that ZFS uses a lot of memory.

Obviously that could have been with deduplication turned on in ZFS which would be understandable if that uses a lot of memory. But still the numbers were pretty scary if I remember correctly (haven't tried it myself).

I tried ZFS on Linux one time and compared it to btrfs. And I believe I do know one thing which is missing for ZFS on Linux, which is: proper integration with Linux itself.

For example I had 2 VMs for testing both filesystems, they both had 3 virtual disks. One with the system and 2 others with a ZFS/zpool or btrfs filesystem. It was a RAID1 like setup (store 2 copies of each block on different virtual disks).

When btrfs had a missing virtual disk I could tell it to mount it in a degraded mode.

With ZFS, there were problems getting it to mount the filesystem at all, even in a degraded mode (it could be because of my limited knowledge of ZFS, of course).

I would reboot the VM and remove a virtual disk as a test case.

When disk1 was missing and disk2 would be called disk1 I couldn't get ZFS to recognise that it could mount that fs/disk.

If only disk2 was missing, then disk1 was still disk1 and it could mount the filesystem on disk1 in a degraded mode just fine.

ZFS on Linux also seemed to be slightly slower and use more CPU, but as it was a VM it wasn't a perfect test environment so I can't be sure about that.

btrfs features not currently in ZFS

Posted Aug 9, 2012 8:38 UTC (Thu) by etbe (subscriber, #17516) [Link]

One of my ZFS servers used to have 4G of RAM. It had problems with kernel memory allocation so I upgraded it to 12G. It's the first time I've ever had such problems on a system with 4G and I really didn't expect it from such a light SMB and NFS load after I had made the recommended changes to limit the size of the ARC.

Yes, I have deduplication turned off.

I agree that integration with Linux is an issue. With BTRFS you have all filesystems listed in /etc/fstab while with ZFS they are all managed by ZFS software without a mention in /etc/fstab.

BTRFS is more like just another filesystem to use, while ZFS is something that totally owns your server.

ZFS features not currently in btrfs

Posted Jul 14, 2012 6:59 UTC (Sat) by Rudd-O (guest, #61155) [Link]

Thanks for linking to my article! :D


Copyright © 2012, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds