Block-device snapshots with blksnap

By Jonathan Corbet
November 14, 2022
As a general rule, one need not have worked in the technology industry for long before the value of good data backups becomes clear. Creating a backup that is truly good, though, can be a challenge if the filesystem in question is actively being changed while the backup process runs. Over the years, various ways of addressing this problem have been developed, ranging from simply shutting down the system while backups run to a variety of snapshotting mechanisms. The kernel may be about to get another approach to snapshots should the blksnap patch set from Sergei Shtepa find its way into the mainline.

The blksnap patches are rigorously undocumented, so much of what follows comes from reverse-engineering the code. Blksnap performs snapshotting at the block-device level, meaning that it is entirely transparent to any filesystems that may be stored on the devices in question. It is able to create snapshots of a set of multiple block devices, so it should be suitable for RAID arrays and such. The targeted use case appears to be automated backup systems; the snapshots that blksnap creates are described as "non-persistent" and are meant to be discarded once a real backup has been made.

Since blksnap works at the block level, it must be given space to store snapshots that is separate from the devices being snapshotted. Specifically, there are ioctl() operations to assign ranges of sectors on a separate device for the storage of "difference blocks" and to change those assignments over time. There is a notification mechanism whereby a user-space process can be told when a given difference area is running low on space so that it can assign more blocks to that area.
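
The bookkeeping described above can be pictured with a small user-space model. This is only a sketch under stated assumptions: the structure and function names (`diff_area`, `diff_area_add_range()`, `diff_area_reserve()`) and the low-water threshold are hypothetical, not taken from the blksnap patches, and a plain flag stands in for the notification sent to user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; none of these names come from the blksnap patches. */
#define MAX_RANGES 16

struct sector_range {
    uint64_t start;   /* first sector of the range on the storage device */
    uint64_t count;   /* number of sectors in the range */
};

struct diff_area {
    struct sector_range ranges[MAX_RANGES]; /* ranges assigned by user space */
    size_t nr_ranges;
    uint64_t capacity;   /* total sectors assigned so far */
    uint64_t used;       /* sectors already holding saved data */
    uint64_t low_water;  /* raise the flag when free space drops below this */
    bool low_space;      /* stands in for the user-space notification */
};

/* User space assigns another range of sectors to the difference area. */
bool diff_area_add_range(struct diff_area *da, uint64_t start, uint64_t count)
{
    if (da->nr_ranges == MAX_RANGES || count == 0)
        return false;
    da->ranges[da->nr_ranges].start = start;
    da->ranges[da->nr_ranges].count = count;
    da->nr_ranges++;
    da->capacity += count;
    /* More room may clear the low-space condition. */
    da->low_space = (da->capacity - da->used) < da->low_water;
    return true;
}

/* Reserve room for saved blocks; returns false when the area is full. */
bool diff_area_reserve(struct diff_area *da, uint64_t count)
{
    assert(da->used <= da->capacity);
    if (da->capacity - da->used < count)
        return false;
    da->used += count;
    if (da->capacity - da->used < da->low_water)
        da->low_space = true;
    return true;
}
```

In this model a backup tool would watch `low_space` and call `diff_area_add_range()` before `diff_area_reserve()` starts failing; how the real module maps reservations onto individual ranges is not modeled here.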

The algorithm used by blksnap is simple enough: once a snapshot has been created for a set of block devices (using another ioctl() operation), blksnap will intercept every block-write operation to those devices. If a given block is being written to for the first time after the snapshot was taken, the previous contents of that block will be copied to the difference area, and a note will be made that the block has been changed since the snapshot was created. Once that is done, the write operation can continue normally. The block devices thus always reflect the most recent writes, while the difference area contains the older data needed to recreate the state of those devices at the time the snapshot was created.
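
A toy version of that copy-on-first-write rule, written as ordinary user-space C, may make the sequence easier to follow. It is a sketch only: the in-memory arrays stand in for the block devices, and the names (`snapshot_model`, `snap_take()`, `snap_write()`) and the tiny block size are assumptions made for illustration, not the module's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_BLOCKS  8
#define BLOCK_SIZE 4

/* Hypothetical in-memory model of one snapshotted device. */
struct snapshot_model {
    char origin[NR_BLOCKS][BLOCK_SIZE]; /* the live block device */
    char diff[NR_BLOCKS][BLOCK_SIZE];   /* difference area (a separate device) */
    bool changed[NR_BLOCKS];            /* written since the snapshot? */
};

/* Taking a snapshot just resets the record of changed blocks. */
void snap_take(struct snapshot_model *s)
{
    memset(s->changed, 0, sizeof(s->changed));
}

/* Every write to the origin goes through here once a snapshot exists. */
void snap_write(struct snapshot_model *s, unsigned int blk, const char *data)
{
    assert(blk < NR_BLOCKS);
    if (!s->changed[blk]) {
        /* First write since the snapshot: save the old contents. */
        memcpy(s->diff[blk], s->origin[blk], BLOCK_SIZE);
        s->changed[blk] = true;
    }
    /* Only now does the write reach the live device. */
    memcpy(s->origin[blk], data, BLOCK_SIZE);
}
```

Note that a second write to the same block skips the copy, so the difference area keeps the contents from snapshot time rather than from some intermediate state.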

In order to be able to intercept writes to the block devices, Shtepa has had to add a new "device filter" mechanism to the block layer. A filter can be attached to a specific device that will be called prior to the execution of each operation on that device, with the BIO structure representing that operation as a parameter. If the filter function returns false, the operation will not be executed. An earlier version of the patch set provided the ability to attach multiple filters to a block device at different "altitudes", but that was removed since there are no other uses for filters currently.
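
The shape of such a hook can be modeled in a few lines of user-space C. Everything here is hypothetical: `io_request` is a stand-in for the kernel's BIO structure, and `toy_bdev`, `toy_submit()`, and `count_writes()` are invented names that do not reflect the interface in the patch set. The sketch shows only the rule stated above, namely that the filter runs first and a false return stops the operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct bio. */
struct io_request {
    unsigned long sector;
    bool is_write;
};

/* A filter sees each request first; returning false vetoes it. */
typedef bool (*filter_fn)(struct io_request *req, void *ctx);

struct toy_bdev {
    filter_fn filter;        /* at most one filter per device */
    void *filter_ctx;        /* private data handed back to the filter */
    unsigned long executed;  /* requests that reached the "hardware" */
};

/* Model of the submission path: consult the filter, then execute. */
bool toy_submit(struct toy_bdev *dev, struct io_request *req)
{
    assert(dev != NULL && req != NULL);
    if (dev->filter && !dev->filter(req, dev->filter_ctx))
        return false;        /* filter said no: operation is not executed */
    dev->executed++;
    return true;
}

/* Example filter: count writes, and refuse writes to sector 0. */
bool count_writes(struct io_request *req, void *ctx)
{
    unsigned long *writes = ctx;

    if (req->is_write)
        (*writes)++;
    return !(req->is_write && req->sector == 0);
}
```

A device with no filter attached (`filter == NULL`) behaves exactly as before, which is the property that lets such a hook be added without disturbing existing users.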

Blksnap uses the filter function to catch writes to the snapshotted device(s). When a write is found, the operation is put on hold while the original contents of the blocks to be written are copied to the difference area; once that is complete, the write is submitted normally.

Interestingly, nothing in the patch set describes how one might gain access to a snapshot once it has been created. A look at the ioctl() interface shows a couple of possibilities, though. One is an operation to obtain the list of changed blocks associated with a snapshot, which might be useful for certain types of incremental backups. But blksnap also creates a new, read-only device for each snapshot taken. Reading a block from that device causes blksnap to consult its map of changed blocks; if the block in question has been changed, it is read from the difference area. Otherwise, it can be read from the original block device. The major and minor numbers of the snapshot devices can be obtained with another ioctl() operation; there is also an undocumented sysfs file that apparently can be consulted.
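
Both access paths come down to consulting the same map of changed blocks, as this user-space sketch suggests. The names (`change_map`, `snapshot_read_source()`, `changed_blocks()`) are assumptions made for illustration and do not correspond to the ioctl() interface in the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_BLOCKS 8

/* Where a read of the snapshot device must be satisfied from. */
enum read_source { READ_FROM_ORIGIN, READ_FROM_DIFF_AREA };

/* Hypothetical per-snapshot record of blocks written since the snapshot. */
struct change_map {
    bool changed[NR_BLOCKS];
};

/* Read path of the snapshot device: changed blocks come from the
 * difference area, untouched ones from the original device. */
enum read_source snapshot_read_source(const struct change_map *m, size_t blk)
{
    assert(blk < NR_BLOCKS);
    return m->changed[blk] ? READ_FROM_DIFF_AREA : READ_FROM_ORIGIN;
}

/* List the changed blocks, as an incremental backup might want.
 * Fills out[] with up to max indices and returns how many were stored. */
size_t changed_blocks(const struct change_map *m, size_t *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < NR_BLOCKS && n < max; i++)
        if (m->changed[i])
            out[n++] = i;
    return n;
}
```

An incremental backup built on this model would copy only the blocks returned by `changed_blocks()`, while a full backup would read every block through `snapshot_read_source()`.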

The kernel does not lack for the ability to make snapshots now, so one might logically wonder why blksnap is needed. It clearly differs from the snapshot feature offered by filesystems like Btrfs, since blksnap operates at the block-device level. Among other things, blksnap can be used with filesystems that do not, themselves, have a snapshot feature. Btrfs snapshots are stored on the same block device as the filesystem itself, meaning that the two can compete for space, and the space used by snapshots could prevent the writing of data to the live filesystem. Since blksnap stores its snapshot data on a separate device, that data won't get in the way of ongoing operations. If the difference area runs out of space, the snapshot will be corrupted, but the device being snapshotted will be unaffected.

An existing alternative at the block level is the device mapper snapshot target. The functionality provided by blksnap is, in many ways, similar to the device mapper; both work by intercepting writes and copying the old data to a separate device. Blksnap can be used without needing to set up the device mapper for the devices to be snapshotted, though. It also claims to have more flexible management of its difference area, especially when multiple devices are being snapshotted together.

These differences appear to be interesting enough that nobody has, so far, questioned whether blksnap is a useful addition to the kernel. The patch set (despite being marked "v1") is on its second revision, having seen a number of fixes from its first posting in July. With luck, the next revision will incorporate some documentation; then perhaps it will be nearing readiness for inclusion into the mainline.

Index entries for this article
Kernel: Block layer



Block-device snapshots with blksnap

Posted Nov 14, 2022 17:46 UTC (Mon) by smurf (subscriber, #17840) [Link] (6 responses)

> With luck, the next revision will incorporate some documentation

It's way past time to throw "with luck" into the bin of not-so-ancient kernel history, and replace it with "write the freakin' docs or your patches won't go into the kernel. Period end of discussion".

Block-device snapshots with blksnap

Posted Nov 14, 2022 18:30 UTC (Mon) by pbonzini (subscriber, #60935) [Link] (1 responses)

Fortunately this is already the case for several subsystems.

Block-device snapshots with blksnap

Posted Nov 14, 2022 22:41 UTC (Mon) by Fantu (guest, #162182) [Link]

Hi, I don't see the initial message; I found these messages through a Google search result.
If the blksnap patch series is lacking something (from the messages I see, it seems so), why not reply and make it known to the developers? I don't see any replies at the moment that mention a "big" lack of documentation:
https://lore.kernel.org/lkml/20221102155101.4550-1-sergei...
That would be the v2 revision (it seems to have been marked v1 by mistake).
In v1 as well, I don't see messages saying that a major lack of documentation prevented acceptance, only specific points that were addressed by changes in v2.
Or maybe I'm wrong?
https://lore.kernel.org/linux-block/1655135593-1900-1-git...

thanks for any reply and sorry for my bad english

Block-device snapshots with blksnap

Posted Nov 15, 2022 22:39 UTC (Tue) by sshtepa (guest, #158959) [Link] (3 responses)

Hi everyone.
Unfortunately, this is the first time I hear that the disadvantage of the patch is a lack of documentation. I would be glad to see any feedback e-mail.
Until the I/O init filtering mechanism itself is approved, it makes no sense to take time away from people who check the documentation.
I have already written documentation on blk-interposer. I'm sorry that the efforts of the reviewers were scrapped.

I have prepared the documentation on the filtering mechanism here https://github.com/veeam/blksnap/blob/master/doc/bdev_fil... .
I have described my thoughts about the blksnap module and its support from the user-space here https://github.com/veeam/blksnap/blob/master/doc/blksnap.md .

And I have a draft article for lwn. Maybe it's time to publish it.

Block-device snapshots with blksnap

Posted Nov 16, 2022 16:26 UTC (Wed) by daneturner (guest, #61385) [Link] (1 responses)

Thanks for this. Super helpful docs

Block-device snapshots with blksnap

Posted Nov 17, 2022 18:07 UTC (Thu) by sshtepa (guest, #158959) [Link]

Thanks. I'm afraid the description of the filter is somewhat outdated and corresponds to the previous version. It will need to be updated.

Block-device snapshots with blksnap

Posted Nov 17, 2022 6:18 UTC (Thu) by davidbarton (guest, #67985) [Link]

I'd love to read the article @sshtepa

Backup of block devices is a tough problem. I'm interested in a use case for blksnap where the filter layer tracks changed blocks for a period, and then the snapshotting is done for a short window where the changes are backed up.

snapshots / COW systems tend to have an overhead. For my case we really only need to track which blocks have changed and then snapshot while synchronising to a backup (which might be COW on zfs or lvm).

Block-device snapshots with blksnap

Posted Nov 15, 2022 6:03 UTC (Tue) by donald.buczek (subscriber, #112892) [Link] (11 responses)

It doesn't feel right to have a second complex system to redirect block i/o into.

Why not build on the dm system? Instead of the new system, we'd only need the ability to attach a dm device to an existing block device. And we'd need to allow a mounted block device as a dm target, which, I think, is currently prevented without the option to override.

Block-device snapshots with blksnap

Posted Nov 15, 2022 13:03 UTC (Tue) by Conan_Kudo (subscriber, #103240) [Link] (7 responses)

This is something that I've explored before when talking to the DM folks before and after dattobd was written. The plain and simple truth is that they don't want to. The DM folks believe that the users should just redo their storage from scratch if they want the capabilities of DM and any other way is foolhardy at best. After a bunch of circular discussions, I gave up.

Sometimes I wonder if some of the Linux subsystem maintainers have actually worked in real-world situations before, because I get some pretty asinine suggestions sometimes...

Block-device snapshots with blksnap

Posted Nov 15, 2022 15:41 UTC (Tue) by Wol (subscriber, #4433) [Link]

If they've never worked outside of a computer department, if they've never lived in the end-user world, then they're like politicians - they know how things SHOULD be done, they've just never tried to do it, and they're surprised when the "should" is totally impractical or counter-productive.

The mainframe culture writ large in the IT department ...

Cheers,
Wol

Block-device snapshots with blksnap

Posted Nov 15, 2022 16:04 UTC (Tue) by donald.buczek (subscriber, #112892) [Link] (4 responses)

> dattobd

Oh, wow, impressive work.

I can see you'd rather have a more stable interface for your hooks than ftrace.

> blk_qc_t (*dattobd_submit_bio_noacct_passthrough)(struct bio *) =
> (blk_qc_t(*)(struct bio *))((unsigned long)(submit_bio_noacct) +
> FENTRY_CALL_INSTR_BYTES);

Another idea to avoid ftrace patch recursion, thanks :-)

Block-device snapshots with blksnap

Posted Nov 15, 2022 16:22 UTC (Tue) by Conan_Kudo (subscriber, #103240) [Link]

There was a hot minute where I was aggressively trying to push for upstreaming dattobd into Linux (we even had the generic name for it picked out: "volsnap"), but the amount of negative feedback we got about it in 2017 killed that stone cold. It was pretty depressing...

Block-device snapshots with blksnap

Posted Nov 15, 2022 23:51 UTC (Tue) by sshtepa (guest, #158959) [Link] (2 responses)

One problem is that it only works on Intel/AMD architecture.
I haven't been able to do something similar for ARM or POWER yet.
If you have any ideas how to fix it, let me know :).

Block-device snapshots with blksnap

Posted Nov 16, 2022 7:35 UTC (Wed) by donald.buczek (subscriber, #112892) [Link] (1 responses)

> If you have any ideas how to fix it, let me know :).

This is not my idea, but one method is to accept and detect the recursion [1]. But the additional stack frame and overhead is something you possibly don't want to have in the submit_bio path.

[1]: https://github.molgen.mpg.de/mariux64/fix-lpp/blob/e4c41a...

Block-device snapshots with blksnap

Posted Nov 17, 2022 16:30 UTC (Thu) by sshtepa (guest, #158959) [Link]

Unfortunately, in order to be able to work on existing Linux kernels, I have to use such disgusting crutches.
Offering a module to the upstream is a solution to this problem. And many other problems.

Block-device snapshots with blksnap

Posted Jan 17, 2023 18:39 UTC (Tue) by msnitzer (subscriber, #57232) [Link]

As the DM subsystem maintainer I don't recall you ever directly or indirectly contacting me (famous last words). But I've _never_ been opposed to devising a way to make all block devices capable of being remapped without first needing to have created the DM device at the dawn of time.

The need for such a capability is quite niche... but I don't dispute that those who would like it exist (and might jump to false accusations about others :/ ).

The coding of such an advance just hasn't been a priority _for me or my broader team within Red Hat_. That doesn't mean "DM folks" reject the idea. It means others need to do the work. Implementing it in terms of yet another way to remap IO (blkfilter) speaks to inherent lack of understanding on how to make a general purpose advance that doesn't further splinter the block core's capabilities.

Block-device snapshots with blksnap

Posted Nov 16, 2022 8:04 UTC (Wed) by donald.buczek (subscriber, #112892) [Link] (2 responses)

My criticism was unjustified. The patch series contains a new general mechanism to hook into the i/o path of an existing block device and such a mechanism doesn't currently exist. blksnap would just be one potential user of it. And the article says so.

But then I wonder if it would be possible to add a filter module which redirects into a dm stack.

Block-device snapshots with blksnap

Posted Nov 17, 2022 17:33 UTC (Thu) by sshtepa (guest, #158959) [Link] (1 responses)

Technically, this is certainly possible. Moreover, I tried to make just such a blk-interposer for DM. I even had a working code (verified by the developer ;-).
But interaction with Mike on this work turned out to be ... mmmmm, not effective. There can be many reasons for this. Perhaps the DM team is busy with higher priority tasks. I think that in order to add a filter to the existing DM architecture we will have to redo quite a lot of code. This requires the will of the DM developers.

Block-device snapshots with blksnap

Posted Jan 17, 2023 19:23 UTC (Tue) by msnitzer (subscriber, #57232) [Link]

As I just stated elsewhere in this LWN thread. I haven't had a need for this capability. The blkinterposer work seemed on its way. I guess you felt my lack of direct ownership personal. But iterating on things is how Linux is developed.

Working around the removal of a key export that the veeam team was (ab)using to hijack IO for their commercial backup product by introducing yet another mechanism to remap IO is certainly one way to proceed -- but it sure isn't the most productive way to advance Linux's existing capabilities without splintering our collective efforts.

I suppose I should've taken on working on something for some other company's benefit simply because I happen to work on and maintain DM? Pretty specious logic there.

Block-device snapshots with blksnap

Posted Nov 15, 2022 23:43 UTC (Tue) by sshtepa (guest, #158959) [Link] (5 responses)

Thanks Jonathan for the article.
I am glad that someone is interested in my work.

The blksnap module is an improved version of the veeamsnap module, which takes care of backups of thousands of servers around the world, and includes seven years of experience in the field of backup.

Seven years ago, when I was analyzing alternative solutions, I saw that almost everyone who offers backup services at the enterprise level has their own implementation of a module for creating snapshots. And there are reasons for this. Obviously, the tools available in the kernel have their drawbacks. DM snapshots and Btrfs snapshots are a pain. I think they were not created for backup. I know this because we support their use.

So, I had to create my "own bike". I won't bother you with listing the whole heap of problems associated with using the out-of-tree module. I want to stop this absurdity.
I am sure that the presence of the blksnap module in the upstream will please all vendors of backup tools and will improve the quality of service for Linux users.
Backup tools will become easier and more reliable, and Linux system administrators will be happier.
-
Any feedback is welcome.

Block-device snapshots with blksnap

Posted Nov 16, 2022 1:32 UTC (Wed) by pabs (subscriber, #43278) [Link] (3 responses)

Have the vendors of all of the out-of-tree modules taken a look at the new module?

Block-device snapshots with blksnap

Posted Nov 17, 2022 9:50 UTC (Thu) by sshtepa (guest, #158959) [Link] (2 responses)

That's a good question.

The blksnap module has been developed openly for more than a year. See https://github.com/veeam/blksnap/graphs/contributors . Prior to that, the veeamsnap module (the prototype of the blksnap module) has been publicly available since 2016 (https://github.com/veeam/veeamsnap/graphs/contributors). Of course, the developers of backup tools should have paid attention to what competitors are doing.

I would really like the backup tool developers to be able to unite and at least express their opinion. But don't expect that.
In my opinion, the problem is in the corporate culture of these organizations. Some try to present their module as a unique technology, use a proprietary license, and try to hide the source code. They are not ready for dialogue with the community. With the advent of the module as part of the kernel, games with proprietary modules will become of no interest to anyone, and everyone will use what is in the kernel.

Functionally, the modules of other developers differ from each other slightly. The principles are about the same. Personally, I think that the blksnap module surpasses them in functionality. Of course, I will be glad to read the criticism of other developers about this.

I tried very hard to create as simple code as possible that would be easy to read. Comparing the code of the veeamsnap and blksnap module, I see that I have achieved some success in this. And in order to facilitate the development of new code by other developers, tools, libraries and tests were created in C++ and bash. Anyone can collect packages (https://github.com/veeam/blksnap/tree/master/pkg/deb), run the tests and see how it works. In the future, I plan to offer the blksnap tool to Linux distributors. This will allow to create snapshots from the console. In this case, it will be possible to make consistent backups using simple scripts on "dd", "tar", "gzip" and "rsync", abandoning the backup tools.

Block-device snapshots with blksnap

Posted Nov 17, 2022 18:38 UTC (Thu) by mathstuf (subscriber, #69389) [Link] (1 responses)

Is this mostly relevant to VM backup tools or can something like `restic` also benefit from this?

Block-device snapshots with blksnap

Posted Nov 17, 2022 21:26 UTC (Thu) by sshtepa (guest, #158959) [Link]

Backups of virtual machines on ESX and Hyper-V are better performed by other solutions that use "external" snapshots of virtual machines.
This solution allows to create snapshots for physical machines and virtual machines in the cloud. Backup of virtual machines is of course also possible. Containers cannot be backed up.

Block-device snapshots with blksnap

Posted Nov 23, 2022 18:42 UTC (Wed) by anton (subscriber, #25547) [Link]

We use btrfs and ZFS, and the main reason we use them is their snapshotting abilities that we use for backups. And I also use nilfs2 for the same reason (and because it is the only Linux file system that gives a sensible consistency guarantee, at least when I last looked).

I don't know why you consider btrfs snapshots a pain. What's a bit painful is that these three file systems have different interfaces for creating, destroying, and accessing snapshots, so the backup scripts have to be different for the different file system types. But that's a one-time expense.

What I am wondering about your backups of thousands of servers is how you achieve file system consistency. If you snapshot the block device of a file system without cooperation from the file system, the snapshot may have an inconsistent state of the file system (as we all know from the O_PONIES discussion, some Linux file system developers don't want to make sure that the on-disk state is always consistent, and the only file system that gave better guarantees last I looked was nilfs2; and we don't need blksnap for nilfs2). Even if you make the block device snapshot right after a sync of the file system, a write between the sync and making the snapshot could have destroyed the consistency. Do you have a way to sync, and to prevent writes until after you have created the snapshot?

Block-device snapshots with blksnap

Posted Dec 6, 2022 16:43 UTC (Tue) by Fantu (guest, #162182) [Link] (2 responses)

Hi, Sergei Shtepa prepared documentation files that will add to next patch for upstream:
https://github.com/veeam/blksnap/blob/master/doc/blkfilte...
https://github.com/veeam/blksnap/blob/master/doc/blksnap.rst
What do you think about them?
I was sad to see numerous sites quoting this article and highlighting "The blksnap patches are rigorously undocumented...", although there was a documentation on github also before and it was written in the patch cover https://lore.kernel.org/lkml/20221102155101.4550-1-sergei...
I hope this will be good enough and if it's not good enough there will be suggestions and contributions to improve it and I hope that there will no longer "missing documentation" as an impediment to the review of blksnap for upstream

Documentation

Posted Dec 6, 2022 17:09 UTC (Tue) by corbet (editor, #1) [Link] (1 responses)

The available documentation does appear to be an improvement, in that it provides at least superficial coverage of the interface to blksnap — something that was completely absent before.

Documentation

Posted Dec 8, 2022 22:36 UTC (Thu) by Fantu (guest, #162182) [Link]

thanks for reply, other additions, improvements and corrections have been made:
https://github.com/SergeiShtepa/linux/blob/blksnap_lk6.1-...
https://github.com/SergeiShtepa/linux/blob/blksnap_lk6.1-...
plus to many kernel-doc comments in code (visible from generated output)
what do you think now?

Block-device snapshots with blksnap

Posted Dec 12, 2022 10:46 UTC (Mon) by sshtepa (guest, #158959) [Link]

Hi!

Many thanks to everyone for the feedback, in the comments to the article.
I tried to fix the shortcomings of my patch that you indicated and made a v2 version of the patch.
See: https://lore.kernel.org/linux-block/20221209142331.26395-...

Thanks to Bagas Sanjaya, I already know about a number of shortcomings in the documentation.
I still have a lot of work to do in order for my documentation writing skill to reach the required perfection.

In the documentation, I tried to describe why the Linux kernel needs new mechanisms for creating snapshots.
Believe me, the developers of backup tools absolutely do not want to create and maintain out-of-tree modules for Linux for snapshots.
This is a significant cost, but it is the only way now to provide a sufficiently high quality of services for users.
The module for snapshots of block devices in the kernel will allow to raise the level of service quality even higher for users of backup tools on Linux.

In the meantime, I'm feeling sad because of the problems associated with out-of-tree modules.
See: https://access.redhat.com/solutions/6985596

Block-device snapshots with blksnap

Posted Apr 5, 2023 16:53 UTC (Wed) by sshtepa (guest, #158959) [Link]

Block-device snapshots with blksnap

Posted Nov 25, 2023 9:07 UTC (Sat) by Fantu (guest, #162182) [Link]

Block-device snapshots with blksnap

Posted Feb 14, 2024 14:13 UTC (Wed) by Fantu (guest, #162182) [Link]


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds