
Desmond: Out-Tridging Tridge

Barney Desmond talks about an rsync performance problem and its solution; the result is an interesting overview of how rsync works. "Most of the activity in this MySQL data file occurs at the end, where more zeroes had been written on the sender’s side. rsync hits this section of the file and is calculating the rolling checksums as normal. For each checksum, it’s referring to the hash table, hitting the all-zeroes chain, then furiously traversing the chain skipping over unusable chunks. Things are now possibly hundreds of times slower than normal, and the backup job has been running for over a week with no sign of finishing any time soon; not sustainable."
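As a rough illustration of the failure mode (this is not rsync's actual code; the block size, names, and table layout are simplified assumptions), consider how identical all-zero blocks all collapse into a single hash-table chain that then has to be traversed for every rolling checksum:

    BLOCK = 512

    def weak_checksum(data):
        # Adler-style weak checksum over one block, in the spirit of the
        # rsync tech report; the real code differs in detail.
        a = sum(data) & 0xFFFF
        b = sum((len(data) - i) * x for i, x in enumerate(data)) & 0xFFFF
        return (b << 16) | a

    def build_table(blocks):
        # Map weak checksum -> list of block indexes (the "chain").
        table = {}
        for idx, block in enumerate(blocks):
            table.setdefault(weak_checksum(block), []).append(idx)
        return table

    # A mostly-empty database file: thousands of identical all-zero blocks.
    blocks = [bytes(BLOCK)] * 2000
    table = build_table(blocks)

    # Every all-zero block hashes to the same key, so its chain holds all
    # 2000 entries, and the sender walks it again at every rolling offset.
    print(len(table[weak_checksum(bytes(BLOCK))]))   # -> 2000

With distinct block contents the chains stay short and each lookup is cheap; a long run of identical blocks defeats that.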

Desmond: Out-Tridging Tridge

Posted Aug 28, 2013 20:39 UTC (Wed) by ikm (subscriber, #493) [Link]

Since this article talks about backups and rsync, I invite people to check out a globally deduplicating tool I wrote specifically for backups: http://zbackup.org/. I back up all my data with it nowadays and feel cozy.

Desmond: Out-Tridging Tridge

Posted Aug 28, 2013 21:44 UTC (Wed) by khc (subscriber, #45209) [Link]

Nice! What's the performance like for zbackup? There's also http://code.google.com/p/s3ql/, which seems more mature.

Having previously worked on a commercial deduplication product, I always find it interesting to see open source catching up.

Desmond: Out-Tridging Tridge

Posted Aug 28, 2013 22:42 UTC (Wed) by ikm (subscriber, #493) [Link]

They are quite different. I believe s3ql does not do rolling hashes, so it only features block-level deduplication, comparable to what's found in some filesystems (e.g. ZFS). As far as performance goes, it's easier to just test it. LZMA takes most of the time, but that really only applies to the first backup, where most of the data is new. As long as incrementality is high, subsequent backups should be fast, and there's also room for micro-optimizations.

Desmond: Out-Tridging Tridge

Posted Aug 28, 2013 23:03 UTC (Wed) by khc (subscriber, #45209) [Link]

Indeed, I misread; their FAQ seems to imply that they use 10MB as the block size: http://code.google.com/p/s3ql/wiki/FAQ#Suppose_I_want_to_...

I haven't looked at your code, but from the description it sounds like you are checking the hash for each rolling fingerprint, and the max chunk size is 64K. I take it that means that for each byte you check whether the rolling hash is already defined and, if not, keep going until you hit 64K and create a new chunk? Wouldn't that only ever create 64K chunks? Or are you doing other things to limit the chunk size?

Desmond: Out-Tridging Tridge

Posted Aug 29, 2013 7:04 UTC (Thu) by ikm (subscriber, #493) [Link]

Yes, deduplication normally works on full chunks. There are cases where smaller chunks are used: to fill gaps between the full ones, or at the end of the input.

Desmond: Out-Tridging Tridge

Posted Aug 29, 2013 8:05 UTC (Thu) by khc (subscriber, #45209) [Link]

What do you mean by "fill gaps between the full ones"?

Desmond: Out-Tridging Tridge

Posted Aug 29, 2013 22:54 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

I imagine that it's likely something like this:

Before:
> AAAAABBBBB

After:
> AAAAACBBBBB

where C is a smaller block to "fill gaps between the full ones".

Desmond: Out-Tridging Tridge

Posted Aug 30, 2013 1:04 UTC (Fri) by khc (subscriber, #45209) [Link]

Right. It's a little strange though. Consider how the rolling hash would work (disclaimer: I actually haven't looked at zbackup's code):

Let's simplify things by saying the max chunk size is 5 bytes instead. With your example, it's easy to see how AAAAA would be a match. Next we get to C, which is not a match. Then we get to CB, CBB, CBBB, and CBBBB; none of these would match, and we hit the max chunk size. That would normally cause us to define a new chunk, 5 bytes in size.

I guess it may work if the lookup window is twice the max chunk size though.

Desmond: Out-Tridging Tridge

Posted Aug 30, 2013 9:03 UTC (Fri) by ikm (subscriber, #493) [Link]

Once you get to CBBBB, the hash starts to roll. The first roll would get it to BBBBB, leaving C between the deduplicated chunks AAAAA and BBBBB (this whole example assumes that both AAAAA and BBBBB were encountered before, so they get deduplicated here).

In a nutshell: after each deduplication match, the window grows again from a single byte to its full width. Once it reaches full width, the window rolls forward, looking for a match. If a match is found, the data left behind the window is output as a new short chunk (or even stored in place if there is too little of it). If no match is found by the time the data left behind is enough for a new full-sized chunk, that data is output as a new full-sized chunk. The window then continues to roll.
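Roughly, in Python-like pseudocode (this is not zbackup's actual code; the tiny window width, the toy hash, and the chunk store are stand-ins for illustration):

    WINDOW = 5   # stands in for zbackup's much larger maximum chunk size

    def toy_hash(window):
        # Stand-in for a real rolling hash: just the window contents.
        return bytes(window)

    def chunk_stream(data, known):
        # Return (chunk, is_new) pairs; `known` holds the hashes of
        # chunks already in the store.
        out = []
        window = bytearray()   # the rolling window
        pending = bytearray()  # data "left behind" the window
        for byte in data:
            window.append(byte)
            if len(window) < WINDOW:
                continue                    # window still growing
            if toy_hash(window) in known:
                if pending:                 # emit the short gap chunk
                    out.append((bytes(pending), True))
                    pending.clear()
                out.append((bytes(window), False))  # deduplicated match
                window.clear()              # grow again from one byte
            else:
                pending.append(window.pop(0))       # roll forward
                if len(pending) == WINDOW:  # enough for a full new chunk
                    out.append((bytes(pending), True))
                    pending.clear()
        tail = bytes(pending + window)
        if tail:
            out.append((tail, True))        # leftover at the end of input
        return out

    known = {b"AAAAA", b"BBBBB"}            # seen in an earlier backup
    for chunk, is_new in chunk_stream(b"AAAAACBBBBB", known):
        print(chunk, "new" if is_new else "dedup")

Run on the AAAAACBBBBB example above, it emits AAAAA as a dedup hit, C as a new short gap chunk, and BBBBB as another dedup hit.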

Desmond: Out-Tridging Tridge

Posted Sep 2, 2013 19:53 UTC (Mon) by ttonino (subscriber, #4073) [Link]

When I heard about deduplicating in a way that is not block-oriented, I immediately thought about cutting the source into variable-length blocks at a specific marker or set of markers. If the file is cut at a 2-byte sequence such as 0x3456, the average chunk size will be 64 KB.

This is obviously not random enough (binary and text behave differently). So I would propose having perhaps 512 different 3-byte markers, for an average block size of 32 KB. These blocks can then be deduplicated with little processing by computing the hash and perhaps the length.
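A quick sketch of what I mean (not taken from any existing tool; the marker value, input size, and store layout are arbitrary assumptions):

    import hashlib
    import os

    MARKER = b"\x34\x56"   # fixed 2-byte marker: ~1 hit per 64 KiB of random data

    def split_at_markers(data, marker=MARKER):
        # Cut the stream right after each occurrence of the marker.
        blocks, start = [], 0
        while True:
            pos = data.find(marker, start)
            if pos == -1:
                break
            blocks.append(data[start:pos + len(marker)])
            start = pos + len(marker)
        if data[start:]:
            blocks.append(data[start:])
        return blocks

    def dedup(blocks, store):
        # `store` maps hash -> block; returns the list of hashes needed
        # to reassemble the input (the "recipe").
        recipe = []
        for block in blocks:
            h = hashlib.sha1(block).digest()
            store.setdefault(h, block)   # only new blocks are stored
            recipe.append(h)
        return recipe

    data = os.urandom(1 << 20)           # 1 MiB of random test input
    blocks = split_at_markers(data)
    print(len(blocks), "blocks, average", len(data) // len(blocks), "bytes")
    store = {}
    recipe = dedup(blocks, store)

On random input a fixed 2-byte marker shows up about once every 64 KB, which is where the average block size comes from; using more or longer markers tunes that average.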

Desmond: Out-Tridging Tridge

Posted Sep 3, 2013 4:28 UTC (Tue) by khc (subscriber, #45209) [Link]

In practice it works well and is fast (although you usually want to compare a full word and not 2 bytes).

Desmond: Out-Tridging Tridge

Posted Aug 29, 2013 14:06 UTC (Thu) by LenZ (subscriber, #1051) [Link]

FWIW, I just recently added zbackup support to mylvmbackup (the patch was contributed by Ivan Korjavin): https://code.launchpad.net/mylvmbackup. The code is in trunk, but has not been released yet. Feedback is welcome!

Desmond: Out-Tridging Tridge

Posted Aug 28, 2013 21:28 UTC (Wed) by Pc5Y9sbv (guest, #41328) [Link]

If one is patching rsync to handle transfers into btrfs, wouldn't it be better to provide a different transfer mode like --cow and --cow-dest?

With just --cow, it could make a cloned temporary file from the original (using copy-on-write cloning syscalls), write only the changed bits to the clone, then relink it in place of the original. This would be a hybrid of normal and --inplace transfers: it would have an old file for differential material and a pre-populated copy into which to write only the changes.

With --cow and --cow-dest=/older/snapshot/path, it might assume the caller has already cloned the entire tree (whether in btrfs or LVM snapshots) and provided a reference to the parent of the clone, so that it could find each file's immutable source material while doing the same sort of in-place writes to the new clone.
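For what it's worth, here's a hedged local sketch of what such a --cow mode could do (neither --cow nor --cow-dest exists in rsync today; the file names and block size are made up, and new_version stands in for data that would really arrive over the wire): reflink-clone the existing file, overwrite only the blocks that differ, then rename the clone into place.

    import os
    import subprocess

    BLOCK = 1 << 16

    def cow_update(dest, new_version):
        tmp = dest + ".cow-tmp"
        # Cheap copy-on-write clone of the current destination file;
        # needs a reflink-capable filesystem such as btrfs.
        subprocess.run(["cp", "--reflink=always", dest, tmp], check=True)
        with open(new_version, "rb") as src, open(tmp, "r+b") as out:
            offset = 0
            while True:
                new = src.read(BLOCK)
                if not new:
                    break
                out.seek(offset)
                old = out.read(len(new))
                if old != new:          # dirty only the blocks that changed,
                    out.seek(offset)    # leaving unchanged extents shared
                    out.write(new)
                offset += len(new)
            out.truncate(offset)        # handle a shrinking file
        os.replace(tmp, dest)           # atomically relink over the original

Unchanged blocks keep sharing extents with the original (and with any snapshots of it), so only the genuinely modified data takes up new space.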

Desmond: Out-Tridging Tridge

Posted Aug 28, 2013 21:51 UTC (Wed) by marcH (subscriber, #57642) [Link]

This looks like a very simple and efficient solution, and simple is always good.

In fact, I thought rsync already did by default exactly what you describe as --cow, and I was wondering how come rsync was "defeating btrfs’ copy-on-write benefits"!

Note this good laugh from the original article: "The thing is, modern networks aren’t like that at all, they’re high bandwidth and low latency."

More seriously rsync could also feature a "--datacenter-network" recursive copy option that would do a plain, complete transfer of the files which have changed.

Desmond: Out-Tridging Tridge

Posted Aug 28, 2013 22:16 UTC (Wed) by raven667 (subscriber, #5198) [Link]

> More seriously rsync could also feature a "--datacenter-network" recursive copy option that would do a plain, complete transfer of the files which have changed.

I thought the --whole-file option already did that.

One thing about the rsync protocol is that it is disk-IO expensive on _both_ ends in order to save network bandwidth, so when using rsync for backups, issues on the backup server are more noticeable than issues distributed out among the client population, especially when running multiple backups in parallel. The btrfs COW problems are a corollary to that: on the clients being read, or between individual servers doing one-off file transfers, it isn't a problem, but when collecting rsyncs from many servers, all of a sudden the issue becomes noticeable.

Desmond: Out-Tridging Tridge

Posted Aug 29, 2013 3:49 UTC (Thu) by mchapman (subscriber, #66589) [Link]

As a colleague of Barney's, and the person who developed the patch and sent it upstream, I think I've got a few comments to add to this story.

First, yes, the title probably is a bit unfair on Tridge. For all I know, he didn't write that bit of rsync; in-place updates certainly aren't considered in the original rsync tech report. What we have discovered, though, is that if you're writing a blog post, attention-grabbing headlines really do work (cats are useful too).

As for the patch, it's actually not particularly complicated. In-place updates aren't the default, though, so perhaps it's not too surprising that this performance problem went unnoticed for a long time.

What I think is most interesting is that it clearly highlights just how much computers and networks have improved in the last two decades. Rsync was originally written when memory was scarce, CPUs were slow and bandwidth was small. Nowadays, it doesn't necessarily make sense to write software with these assumptions in mind.

We are contemplating changing how we do our backups. It might be better for us if we just sent the whole lot over the wire and did the block comparison at the destination end, just before writing the data out. Even though that would generate a lot more network traffic, it would probably give us yet another speed improvement.

Desmond: Out-Tridging Tridge

Posted Aug 29, 2013 10:43 UTC (Thu) by marcH (subscriber, #57642) [Link]

> Rsync was originally written when memory was scarce, CPUs were slow and bandwidth was small. Nowadays, it doesn't necessarily make sense to write software with these assumptions in mind.

Again, there is no reason not to use rsync on mobile and embedded systems. In fact, that is where it's most useful, since that is basically the problem rsync was designed to solve.

The reason rsync (and git, and others...) rock is that they are among the very few network tools that actually understand all of this:

http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Com...

Most of the other networking tools suck because they don't.

Desmond: Out-Tridging Tridge

Posted Sep 4, 2013 6:33 UTC (Wed) by nicku (subscriber, #777) [Link]

Well done, Michael!
