Btrfs: Subvolumes and snapshots

Posted Jan 7, 2014 5:43 UTC (Tue) by iabervon (subscriber, #722)
Parent article: Btrfs: Subvolumes and snapshots

It would be interesting if you could combine multiple devices and subvolumes such that you have a single filesystem spread across your regular hard drive and a backup disk, using RAID0, and the root subvolume were only allocated from the regular hard drive, while snapshots were only allocated from the backup disk. The actual data transfer to the backup disk would then happen as a consequence of both devices needing to contain the same blocks, while the filesystem as a whole understands that the backup disk doesn't need to contain multiple copies of the same block just because multiple snapshots were made. And, of course, at least one subvolume would always be completely recoverable in the event of either disk failing.

I think that kind of policy would be a more interesting use of multiple devices than RAID policies which just have a numeric block allocation policy, and would be more powerful than what you could do with either device-level RAID or rsync or both.

Btrfs: Subvolumes and snapshots

Posted Jan 7, 2014 5:49 UTC (Tue) by dlang (subscriber, #313) [Link]

That's not how Btrfs snapshots work.

Every write on a Btrfs filesystem creates a new block; eventually the old blocks are garbage collected. All that a snapshot does is prevent the blocks that are current as of the time of the snapshot from being garbage collected. They don't get copied anywhere, they just don't get deleted.
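
To make the point above concrete, here is a toy model of the behaviour described: writes always allocate a new block, and a snapshot is only a saved copy of the file-to-block mapping that keeps old blocks from being reclaimed. The class and method names are invented for illustration; this is a sketch of the idea, not of how Btrfs is implemented.

```python
class ToyCowStore:
    """Toy model: every write allocates a new block; snapshots pin blocks."""

    def __init__(self):
        self.blocks = {}      # block id -> data
        self.live = {}        # file name -> block id (the current tree)
        self.snapshots = {}   # snapshot name -> frozen copy of a file map
        self._next_id = 0

    def write(self, name, data):
        # Never overwrite in place: allocate a fresh block and repoint the file.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.live[name] = block_id

    def snapshot(self, snap_name):
        # Only the mapping (metadata) is copied; no data block is duplicated.
        self.snapshots[snap_name] = dict(self.live)

    def collect_garbage(self):
        # A block survives if the live tree or any snapshot still references it.
        referenced = set(self.live.values())
        for file_map in self.snapshots.values():
            referenced.update(file_map.values())
        freed = sorted(set(self.blocks) - referenced)
        for block_id in freed:
            del self.blocks[block_id]
        return freed


store = ToyCowStore()
store.write("a.txt", "v1")      # allocates block 0
store.snapshot("monday")        # pins block 0
store.write("a.txt", "v2")      # allocates block 1; block 0 is held only by the snapshot
print(store.collect_garbage())  # [] : nothing can be freed while the snapshot exists
del store.snapshots["monday"]
print(store.collect_garbage())  # [0] : the old block is now reclaimable
```

Taking the snapshot never touches `blocks`; it only changes what the garbage collector is allowed to free.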

Btrfs: Subvolumes and snapshots

Posted Jan 7, 2014 7:34 UTC (Tue) by iabervon (subscriber, #722) [Link]

Right, the copy wouldn't be due to the snapshot, it would be due to the resulting policy as to where the block has to be stored not being fulfilled, just like if you converted a RAID0 filesystem to a RAID1 filesystem, but on a per-block level, based on what subvolumes include the block. That is, the snapshot doesn't create new blocks, but it does cause the blocks to need to be mirrored, when new files on the root subvolume would not be required to be mirrored (but would have to be on the first device). When the file is deleted from the main subvolume, it could be garbage collected off the first device, but would continue to be required on the backup device until it was no longer in a snapshot.

Of course, I don't think multiple device support interacts like this with subvolumes, either, but it seems like all of the hard work is done: subvolumes and multiple devices work, as does online conversion between RAID configurations. Subvolume membership can already affect blocks (e.g., via quotas), including when the blocks are shared between subvolumes. It seems to me that all that's missing is the ability to choose RAID policy on a per-block basis, some RAID policies that wouldn't make sense otherwise (e.g., "raid0, but only use a particular device"), and the ability to trigger balancing based on the creation of a subvolume with a different RAID policy.
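
A rough sketch of the per-block placement rule being proposed, assuming a two-device setup with hypothetical device names ("main", "backup") and hypothetical subvolume names. None of this corresponds to an existing Btrfs interface; it only shows how a block's required locations could follow from which subvolumes reference it.

```python
# Hypothetical policy table: which devices each subvolume wants its blocks on.
POLICY = {
    "root": {"main"},               # live subvolume: main disk only
    "snap-2014-01-06": {"backup"},  # snapshots: backup disk only
    "snap-2014-01-07": {"backup"},
}


def required_devices(block_owners, policy=POLICY):
    """Union of the devices demanded by every subvolume referencing the block."""
    devices = set()
    for subvol in block_owners:
        devices |= policy[subvol]
    return devices


def rebalance_actions(block_owners, current_devices, policy=POLICY):
    """Return (devices to copy the block to, devices it may be dropped from)."""
    wanted = required_devices(block_owners, policy)
    return wanted - current_devices, current_devices - wanted


# New file, referenced only by the root subvolume: nothing to do.
print(rebalance_actions({"root"}, {"main"}))
# A snapshot now shares the block: it must also be copied to the backup disk.
print(rebalance_actions({"root", "snap-2014-01-07"}, {"main"}))
# File deleted from root: the copy on the main disk can be garbage collected.
print(rebalance_actions({"snap-2014-01-07"}, {"main", "backup"}))
```

The snapshot itself creates no blocks here; it only changes the set of owners, and the copy falls out of the placement rule no longer being satisfied.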

Btrfs: Subvolumes and snapshots

Posted Feb 6, 2015 2:45 UTC (Fri) by JimAvera (guest, #100933) [Link]

To somehow get the benefits of RAID0 and RAID1 simultaneously would be awesome.

A variation of iabervon's idea would be an explicit "copying unbalance" operation, which would copy necessary blocks of a specified subvolume to reside only on a specified device(s). To be useful for the purpose I'll explain momentarily, blocks which were not already on the destination device(s) would be *duplicated* not moved, leaving the originals where they were; blocks already on the destination(s) would stay as-is, possibly shared.

This could be used to get "delayed redundancy", where everything normally runs RAID0 (striped) for speed, including most snapshots. But periodically a snapshot would be converted to be stored only on a single drive (or subset of striped drives). These snaps would provide insurance against disk crashes, but not in real-time; after a crash, you could recover only to the time of the latest snap isolated to other drive(s).

For example, you could take snapshots every hour or more frequently, keeping 24 hours worth; and once a day "copy & unbalance" the latest snapshot, rotating the excluded drive.
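
The rotation in that example could be sketched as follows. The function names and the simple "day number modulo drive count" rule are assumptions made for illustration; the point is only that, after a drive fails, the newest usable backup is the most recent one that excluded that drive.

```python
def excluded_drive(day, num_drives):
    """Drive left out of the nightly "copy & unbalance" snapshot for this day."""
    return day % num_drives


def latest_usable_backup(today, failed_drive, num_drives):
    """Most recent day whose backup snapshot did not touch the failed drive."""
    for day in range(today, today - num_drives, -1):
        if day >= 0 and excluded_drive(day, num_drives) == failed_drive:
            return day
    return None  # no backup excluding that drive has been made yet


# Three drives, today is day 10 (which excluded drive 1).
print(latest_usable_backup(10, failed_drive=1, num_drives=3))  # 10
print(latest_usable_backup(10, failed_drive=0, num_drives=3))  # 9
print(latest_usable_backup(10, failed_drive=2, num_drives=3))  # 8
```

With N drives, the recoverable state can therefore be up to N-1 days older than the latest backup, depending on which drive fails.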

You would get the full speed benefit of RAID0 striping for real-time operations during the day, and still get periodic backups against disk failures. But TANSTAAFL, the time-cost of that redundancy would be more than with RAID1 (where redundant blocks are written concurrently in the first place); however that cost could be paid at controlled times, e.g., in the middle of the night.

A big fly in this ointment is that a succession of such "backup" snapshots would end up with multiple copies of the same data on each drive, because blocks which were "de-balanced" became disconnected copies (necessary so that every block would in fact have copies on multiple drives). Out-of-band deduplication could be run over the latest N "backup" snapshots (N=number of drives) to eliminate unnecessary copies, but that would add a lot of disk I/O to the nightly "backup" operations.

A more difficult solution to duplicate blocks among the backup snapshots would be to make the "copy & unbalance" operation examine other specified subvolumes to find existing copies of blocks on the destination drive(s), perhaps comparing exactly-parallel files for the same ObjectID in the trees, and then comparing corresponding extents which already reside entirely on the destination drive(s).

Btrfs: Subvolumes and snapshots

Posted Feb 6, 2015 5:42 UTC (Fri) by nybble41 (subscriber, #55106) [Link]

It sounds like what you're describing could be handled fairly well by per-subvolume RAID levels, which are already on the agenda according to the btrfs FAQ: "However, we have plans to allow per-subvolume and per-file RAID levels." Just set your backup snapshot as RAID-1 and everything else as RAID-0, then perform a "soft" rebalance to redistribute the data which just changed levels. (But what happens when an extent is part of multiple subvolumes with different allocation schemes? Highest level wins? Last assigned level? The worst possibility would be breaking the COW link....)
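
The open question about shared extents can be made concrete with a small sketch of two of the candidate rules. The ranking of levels and the function names are assumptions for illustration only; the FAQ quote does not say which rule, if any, Btrfs would adopt.

```python
# Assumed ordering for illustration: more redundancy ranks higher.
RANK = {"raid0": 0, "raid1": 1}


def highest_level_wins(assignments):
    """assignments: list of (subvolume, level) in the order they were set."""
    return max((level for _, level in assignments), key=RANK.__getitem__)


def last_assigned_wins(assignments):
    """The most recently assigned level overrides earlier ones."""
    return assignments[-1][1]


# One extent shared by a RAID-1 backup snapshot and the RAID-0 live subvolume.
shared_extent = [("backup-snap", "raid1"), ("root", "raid0")]
print(highest_level_wins(shared_extent))  # raid1
print(last_assigned_wins(shared_extent))  # raid0
```

Under "last assigned wins", the backup snapshot would silently lose its redundancy for every extent it still shares with the live subvolume, which is why the choice of rule matters.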

Btrfs: Subvolumes and snapshots

Posted Feb 6, 2015 6:41 UTC (Fri) by dlang (subscriber, #313) [Link]

I think you are slightly misunderstanding the way snapshots work.

When you do a snapshot, you aren't copying all the data into the snapshot. What you are doing is copying a current set of metadata and flagging all the disk blocks as Copy on Write, so that as you continue to use the filesystem, the blocks that make up the snapshot never get changed. If the OS wants to write to that file the filesystem allocates a new block, copies the existing data over to it and then does the modification that the OS asked for.

So if you have a filesystem in a RAID0 stripe set of drives, when you make a snapshot, the snapshot will continue to require both drives.

You would then have to make a complete copy of the files on the filesystem to have it all reside on one drive.
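
A simplified illustration of why a snapshot on a striped filesystem depends on every drive. It assumes the most basic form of striping, with consecutive blocks alternating between devices; real striping works on larger units, but the consequence is the same.

```python
def device_for(block_index, num_devices=2):
    # Simplified RAID0: consecutive blocks alternate between devices.
    return block_index % num_devices


def surviving_blocks(file_blocks, failed_device, num_devices=2):
    """Blocks of a file (or snapshot) still readable after one device fails."""
    return [b for b in file_blocks if device_for(b, num_devices) != failed_device]


snapshot_blocks = list(range(8))  # blocks referenced by the snapshot
print(surviving_blocks(snapshot_blocks, failed_device=1))  # [0, 2, 4, 6]
```

Since the snapshot references the same striped blocks as the live filesystem, losing either drive leaves it with only every other block.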

Btrfs: Subvolumes and snapshots

Posted Feb 6, 2015 9:40 UTC (Fri) by JimAvera (guest, #100933) [Link]

Well, yes. Creating a snapshot doesn't replicate file data, but that's beside the point. When blocks are striped across all drives, failure of a single drive causes a total loss.

The idea is to somehow convert selected "backup" snapshots to no longer store anything on a particular drive. It would be like removing one disk from a RAID-0 set, but only for a single specified subvolume. The difference is that blocks on the "removed" drive would be replicated, so other snapshots referencing the same data would still point to the original copy, achieving the desired replication effect.
But such a scheme would only be interesting with two drives; eventually it would create as many copies as drives, which for more than two is excessive.

Btrfs: Subvolumes and snapshots

Posted Feb 6, 2015 9:47 UTC (Fri) by dlang (subscriber, #313) [Link]

Actually, since the blocks include references to other blocks, you would end up with three (or more) copies of the blocks:

1. the original set that's split across the two drives

2. the set that has all pointers changed to live on drive1

3. the set that has all pointers changed to live on drive2

and since the pointer changes need to take place each time the data is copied from one drive to another (since the existing blocks will already be in use), each single-drive snapshot would be a full copy of everything.

Also, while snapshots are pretty reliable (since what they do is stop making changes to the blocks currently in use), this code would be doing some major rewriting of blocks, so it would also be far more fragile.

You would be better off just using a conventional backup program to do a backup of the snapshot; there is far less risk involved.

Copyright © 2018, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds