The casync filesystem image distribution tool

Lennart Poettering announces casync, a tool for distributing system images. "casync takes inspiration from the popular rsync file synchronization tool as well as the probably even more popular git revision control system. It combines the idea of the rsync algorithm with the idea of git-style content-addressable file systems, and creates a new system for efficiently storing and delivering file system images, optimized for high-frequency update cycles over the Internet. Its current focus is on delivering IoT, container, VM, application, portable service or OS images, but I hope to extend it later in a generic fashion to become useful for backups and home directory synchronization as well".
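
The combination described in the announcement can be sketched roughly as follows. This is a minimal illustration, not casync's actual implementation: the polynomial rolling hash, the window size, the cut rule, the use of SHA-256, and the names `split_chunks`, `store_image`, and `restore_image` are all assumptions chosen for the example. The rsync-like part is the rolling hash that picks chunk boundaries from the content itself; the git-like part is the store that names each chunk by its digest.

```python
import hashlib
from typing import Dict, List

WINDOW = 16           # bytes covered by the rolling hash (illustrative value)
DIVISOR = 64          # a position is a cut point when hash % DIVISOR == 0
BASE = 257            # multiplier of the polynomial hash
MOD = (1 << 31) - 1   # prime modulus for the hash arithmetic


def split_chunks(data: bytes) -> List[bytes]:
    """Split data at boundaries chosen by a rolling hash of the last WINDOW bytes."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Drop the oldest byte so the hash covers only the last WINDOW bytes.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        # Enforce a minimum chunk length of WINDOW bytes, then cut on a hash match.
        if i + 1 - start >= WINDOW and h % DIVISOR == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes that never reached a cut point
    return chunks


def store_image(data: bytes, store: Dict[str, bytes]) -> List[str]:
    """Add the chunks of data to the store and return the image's index (digest list)."""
    index = []
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # an identical chunk is stored only once
        index.append(digest)
    return index


def restore_image(index: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild an image by concatenating the chunks named in its index."""
    return b"".join(store[digest] for digest in index)
```

Because cut points depend on nearby bytes rather than on absolute offsets, a second image that differs from the first by a small insertion should mostly reuse chunks already in the store, so only the chunks around the change need to be stored or transferred.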

Posted Jun 20, 2017 19:36 UTC (Tue) by nix (subscriber, #2304) [Link] (11 responses)

Very nice. It appears to be almost a network-synchronization version of bup, at least with regard to the method used to deduplicate. Its use of a better version of tar feels nicer than the bup approach of representing metadata by stuffing it into a "file" in each directory's representation on the backup. That hack does work well enough, given that all real on-disk file chunks are represented as SHA-1 hashes so can't collide, and it has the major advantage that bup can just use git for its underlying storage. casync has no need for that, since it wants its "blobs" to be independent operating-system files.

I'm not entirely sure what the point of turning it into a backup tool is, since as a backup tool it feels like everything it does, bup would do better... but as a distribution tool, particularly over CDNs, it seems likely to be without equal. Ah well, the more options the merrier, for backup!

Posted Jun 20, 2017 20:12 UTC (Tue) by compenguy (guest, #25359) [Link] (5 responses)

> I'm not entirely sure what the point of turning it into a backup tool is [...]

Perhaps the point isn't about backup, but rather about restore. It seems like it might enable fast, network-efficient snapshot-type restore operations.

Not really sure, though... I'm still trying to wrap my head around it.

Posted Jun 20, 2017 20:23 UTC (Tue) by jhoblitt (subscriber, #77733) [Link] (4 responses)

Doesn't duplicity already nicely handle chunked backups, with encryption, and support for remote push?

Posted Jun 21, 2017 7:42 UTC (Wed) by Sesse (subscriber, #53779) [Link]

duplicity is painfully slow, though.

Posted Jun 21, 2017 11:29 UTC (Wed) by nix (subscriber, #2304) [Link] (2 responses)

So does bup if you ignore encryption (which you should IMHO: the backup is the wrong place for it, we have perfectly good encryption at lower levels, be that cryptfs or LUKS). (It's gaining remote restore, too, as we speak, though this is fairly redundant because you can use *any* other method to ship the backup contents over for a restore run anyway: I just use NFS. So this is only a convenience, really.)

Posted Jun 23, 2017 10:20 UTC (Fri) by Sesse (subscriber, #53779) [Link] (1 responses)

Why is backup the wrong place for encryption? I'm very happy that my backup host can't compromise my servers—all it can do is get encrypted backups out of it. Less trust is good.

(Likewise, my servers don't have SSH access to my backup host.)

Posted Jun 23, 2017 13:49 UTC (Fri) by nix (subscriber, #2304) [Link]

I don't think it's wrong that backups are encrypted (mine are). I think that the *backup program* is usually the wrong place for encryption, because frankly encryption is a horribly difficult problem best worked on by people who specialize in it, not people who are doing it on the side of writing a backup program.

So I'd go with an unencrypted backup atop an encrypted filesystem or encrypted block device layer. This has the advantage that you can use whatever ridiculously contrived means you like to acquire the passphrase, which is relatively rarely possible with backup-level encryption (e.g. mine currently comes from a shell script that does a challenge-response on a yubikey -- and getting that requires an ssh to the machine with the key plugged in...)

Posted Jun 21, 2017 6:43 UTC (Wed) by walex (guest, #69836) [Link]

Not really sure what the point of this is, considering the existence of Jigdo or of the content-based Arvados Keep distributed filesystem, which match two different application areas quite well.

Posted Jun 21, 2017 17:26 UTC (Wed) by drag (guest, #31333) [Link]

> I'm not entirely sure what the point of turning it into a backup tool is

Simplify the user experience with these sorts of tools, I expect.

If the operations involved in creating, pushing, distributing, tracking, sharing, pulling, backing up, and restoring file system images have a lot of overlap, then it makes sense to have the same basic tools and protocols for all those things. That way it's easier to build high-level solutions on top of this tool.

So right now, if I want the ability in my environment to use containers and virtual machines, sync my workstation home directory, and back up my workstation and stateful applications/databases, all of these require lots of different tools and services to configure. I have to, maybe, set up cinder for virtual machine images, a docker registry for container images, rsync for backing up, and so on.

Wouldn't it be nice to have a single thing that could provide all that?

Provided, of course, the actual operations for these different needs have a lot of overlap and similarities. Don't want to end up with some massive mess of conflicting and mismatched functionality in a single service...

Posted Jun 23, 2017 11:37 UTC (Fri) by mgedmin (subscriber, #34497) [Link] (1 responses)

> represented as SHA-1 hashes so can't collide

*cough* https://shattered.io

Posted Jun 23, 2017 13:51 UTC (Fri) by nix (subscriber, #2304) [Link]

Are relatively unlikely to collide, then. :) (and git already checks for that collision mechanism. One wonders how a future SHA-256 migration would work with existing bup repos, though... something to look at.)

Posted Jun 26, 2017 10:02 UTC (Mon) by abo (subscriber, #77288) [Link]

As far as I can tell, SHA-1 is not used.

Posted Jun 20, 2017 22:10 UTC (Tue) by aggelos (subscriber, #41752) [Link]

Seems the main contribution is the definition of the chunk store. Then you'd just need a tracker for the set of mirrors that could have any related data set (oh wait :-)).

What I didn't find in the blog post is a strategy for pruning chunks - specifically one that would work well for CDNs and mirror operators. Reasonable strategies could be devised without any extra effort of course (e.g. ctime/atime based), but perhaps the format could accommodate more elaborate algorithms?
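
One possible alternative to time-based pruning is reference-based pruning, sketched below. This is a hypothetical illustration rather than anything described in the blog post: it assumes the mirror operator knows which image indexes are still being served, that each index is a list of chunk digests, and that the store is a simple mapping from digest to chunk data. The function name `prune_chunks` is made up for the example.

```python
from typing import Dict, Iterable, List, Set


def prune_chunks(store: Dict[str, bytes],
                 live_indexes: Iterable[List[str]]) -> Set[str]:
    """Delete every chunk that no live index references; return the removed digests."""
    # Mark: collect every digest mentioned by an index that is still served.
    live: Set[str] = set()
    for index in live_indexes:
        live.update(index)
    # Sweep: anything in the store that was not marked can be dropped.
    dead = set(store) - live
    for digest in dead:
        del store[digest]
    return dead
```

A chunk shared by several images survives until the last index that references it is retired, which a purely ctime/atime-based policy cannot guarantee.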

Posted Jun 21, 2017 14:33 UTC (Wed) by rkeene (guest, #88031) [Link]

For a slightly different take on a similar problem, take a look at AppFS ( http://appfs.rkeene.org/ ).


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds