
Backing up in trees with Obnam 1.0


Posted Jun 7, 2012 13:31 UTC (Thu) by rbrito (subscriber, #66188)
In reply to: Backing up in trees with Obnam 1.0 by oever
Parent article: Backing up in trees with Obnam 1.0

It seems to me that obnam does use something like rolling checksums (or, at least, something close), as is stated in the manpage:

"When you run a backup, obnam uploads data into the backup repository.
The data is divided into chunks, and if a chunk already exists in the
backup repository, it is not uploaded again."

Regarding obnam and bup, I tried both this past week. Some quick observations:

* obnam can delete previous backups that you no longer want, while bup can't, and this is even mentioned in the documentation. This is useful for those who (like me) back up directories containing large files (e.g., videos downloaded from YouTube or ISOs of distributions) that were never meant to be there in the first place.

* obnam doesn't have a way to easily browse the contents of the backup repository, but bup does have (at least) three ways: a FUSE implementation (bup fuse), a web implementation (bup web) and an FTP-like implementation (bup ftp).

* bup stores its backup repository under ~/.bup unless told otherwise. If you skim its manpage quickly, you can easily miss the fact that you should specify the -d option to get it to back up somewhere else. The -f option of "bup index" *only* affects the index file, not the whole backup.

I decided, for the first reason, to stick with obnam, as I am badly in need of a backup strategy and I hope that a FUSE implementation will soon appear (so that one can, e.g., drag and drop the needed files from, say, nautilus or via samba).

The only thing that I found bad about obnam (besides the lack of navigation cited above) is that it is slow. On a 2nd generation Core i5 notebook, backing up to an external USB HD attained speeds of up to 10MB/s, which I think could be better. Only one core seemed to be used.

By the way, regarding bup, is it safe to run the command "git gc" in the backup repository?


(Log in to post comments)

Backing up in trees with Obnam 1.0

Posted Jun 7, 2012 22:29 UTC (Thu) by oever (subscriber, #987) [Link]

First off: I have not tested bup myself on a significant amount of data; so far I'm content with reading parts of the code and the documentation and thinking about scenarios for using it.

Ocman seems to do de-duplication on fixed blocks, not variable blocks as one would get with a rolling checksum. You can configure the block size, but I think the boundary positions are simply multiples of the block size.
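
A minimal sketch of what de-duplication on fixed blocks implies, assuming chunks are keyed by their SHA-256 digest. The function names, the in-memory dict used as a repository and the 1 KiB block size are hypothetical and chosen only for illustration; this is not Obnam's actual code:

```python
import hashlib
import random


def fixed_chunks(data, size):
    """Cut data at multiples of `size`; the last chunk may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store(repo, data, size):
    """Add chunks to `repo`, keyed by SHA-256. Return how many were new."""
    new = 0
    for chunk in fixed_chunks(data, size):
        key = hashlib.sha256(chunk).hexdigest()
        if key not in repo:  # chunk already present: nothing to upload
            repo[key] = chunk
            new += 1
    return new


rng = random.Random(0)  # fixed seed so the sample data is reproducible
original = bytes(rng.getrandbits(8) for _ in range(16 * 1024))

repo = {}
first = store(repo, original, 1024)            # first backup of the data
again = store(repo, original, 1024)            # same data backed up again
shifted = store(repo, b"!" + original, 1024)   # one byte inserted at the front

print(first, again, shifted)
```

Unchanged data costs nothing on the second run, but a single byte inserted at the front moves every boundary, so every block of the shifted data looks new.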

When using a rolling checksum, one moves a window over the data and when the checksum value falls in a particular range, the block ends. This means that the blocks have different sizes. The size depends on the content. By choosing the range for the checksum values that trigger a split, one can influence the average block size in the backup.
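
The splitting rule can be sketched as follows. This is an illustration under stated assumptions, not the algorithm used by bup or any other tool: the checksum is a simple polynomial rolling hash, the "particular range" is "the low 10 bits are all ones" (roughly one split per KiB), and WINDOW, BASE, MOD and MASK are arbitrary values picked for the example:

```python
import random

WINDOW = 48                        # bytes covered by the rolling checksum
BASE = 257
MOD = (1 << 31) - 1
MASK = (1 << 10) - 1               # split when the low 10 bits are all ones
POW = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window


def cdc_chunks(data):
    """Split data where the checksum of the last WINDOW bytes hits MASK."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Window is full: drop the oldest byte before adding the new one.
            h = (h - data[i - WINDOW] * POW) % MOD
        h = (h * BASE + byte) % MOD
        if i - start + 1 >= WINDOW and (h & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing data without a split point
    return chunks


rng = random.Random(0)  # fixed seed so the sample data is reproducible
data = bytes(rng.getrandbits(8) for _ in range(64 * 1024))

before = cdc_chunks(data)
after = cdc_chunks(b"!" + data)    # one byte inserted at the front
shared = len(set(before) & set(after))
print(len(before), "chunks,", shared, "also present after the insertion")
```

Because a split depends only on the bytes inside the window, the boundaries move along with the content, so an insertion disturbs the chunks near it rather than every chunk after it. A wider MASK gives larger average chunks, a narrower one smaller chunks.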

Backing up in trees with Obnam 1.0

Posted Jun 8, 2012 10:27 UTC (Fri) by rbrito (subscriber, #66188) [Link]

Please, excuse my ignorance here, but you have consistently used the name ocman.

Is ocman a typo for obnam?

I don't find any backup-related hits when searching with ocman as a keyword (e.g. https://duckduckgo.com/?q=ocman+backup).

Backing up in trees with Obnam 1.0

Posted Jun 8, 2012 12:36 UTC (Fri) by oever (subscriber, #987) [Link]

It was an error, I meant obnam, not ocman.

Backing up in trees with Obnam 1.0

Posted Jun 8, 2012 9:00 UTC (Fri) by juliank (subscriber, #45896) [Link]

> Only one core seemed to be used.

It's written in Python, so I would not assume it to use more than one core due to the GIL anyway.

Backing up in trees with Obnam 1.0

Posted Jun 8, 2012 10:19 UTC (Fri) by rbrito (subscriber, #66188) [Link]

I was under the impression that even programs written in Python can use multiple cores/cpus/whatever when calling C extensions (appropriately marked with Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS), but I am a real beginner with respect to Python and I would appreciate any correction.
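
A small sketch of that idea using only the standard library. CPython's hashlib is implemented in C and releases the GIL while hashing buffers larger than 2047 bytes, so hashing calls issued from several threads can overlap. The chunk sizes and the sha256_hex helper are made up for the example, and nothing here is taken from Obnam:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(chunk):
    # The C implementation releases the GIL for large buffers, so other
    # threads can run while this digest is being computed.
    return hashlib.sha256(chunk).hexdigest()


# Eight 4 MiB chunks of sample data.
chunks = [bytes([i]) * (4 * 1024 * 1024) for i in range(8)]

sequential = [sha256_hex(c) for c in chunks]

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map returns results in the same order as the input chunks.
    threaded = list(pool.map(sha256_hex, chunks))

print(threaded == sequential)
```

Whether the threaded version is actually faster depends on the interpreter, the number of cores and the chunk sizes. Pure-Python work between the hashing calls still runs under the GIL.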

Backing up in trees with Obnam 1.0

Posted Jun 8, 2012 10:32 UTC (Fri) by juliank (subscriber, #45896) [Link]

Yes, but then, this code does not really have many C parts, from what I remember.

Backing up in trees with Obnam 1.0

Posted Jun 14, 2012 15:23 UTC (Thu) by JanC_ (guest, #34940) [Link]

Well, all file I/O and a bunch of other things are in C, but all (or most of) that code probably isn't very CPU-intensive...

It should be possible to move the CPU-intensive parts (all the hashing & encryption parts) to C or Cython code. Alternatively, PyPy is working on removal of the GIL, but that might take years to finish.

But I'm not sure to what extent Obnam currently uses non-sequential code anyway?

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds