
How The Backup Process Has Changed

Posted Nov 29, 2007 12:49 UTC (Thu) by hjb (subscriber, #25523)
Parent article: How The Backup Process Has Changed

Hi Forrest,

Why do you use bzip2? It's much too slow to be practical. In my admittedly limited
experience, I know nobody who uses bzip2; gzip is the better choice.

If someone wants a very efficient, secure backup to a hard disk (possibly of several computers),
they should look at BoxBackup (http://www.boxbackup.org/). It stores a complete revision
history of every file, so in theory it should be possible to restore to an arbitrary point in
time, although the client is currently missing this feature.

I just published an updated version of my article about it (in German) at
http://www.pro-linux.de/berichte/boxbackup.html

Regards,
hjb



How The Backup Process Has Changed

Posted Nov 29, 2007 17:43 UTC (Thu) by Los__D (subscriber, #15263) [Link]

If you need speed, go with gzip (or maybe even just plain tar); if you need space, go with
bzip2...

How The Backup Process Has Changed

Posted Nov 30, 2007 3:59 UTC (Fri) by sitaram (subscriber, #5959) [Link]

I have this bad habit of doing all my research in one shot, as exhaustively as I can,
summarising it for quick reference, and then discarding the raw data :-)

With that caveat, here is a dump of the entry titled "Choosing between LZMA, BZIP2, and GZIP"
in my personal quickref wiki:

---------------------------

LZMA is the new kid on the block: less space and faster decompression (than
BZIP2) at the cost of much, *much* slower compression.

(Default compression levels are GZIP: 6, BZIP2: 9, and LZMA: 7)

Summary
-------

    Use none when
      - almost all files in the dataset are already compressed (DUH!)

    Use GZIP when
      - time is more important than space, or
      - system memory is very limited, or
      - a lot of files in the dataset are already compressed but nowhere near
        all of them

    Use LZMA when
      - space is more important than time, or
      - space is important AND the file will be decompressed many times

    Benchmark BZIP2, LZMA at level 1 and perhaps LZMA at level 2 when
      - both space AND (compression) time are important, and
      - you're going to be compressing this same dataset frequently (like a
        daily backup script for your email folders)

    Otherwise just use BZIP2
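
The "benchmark it on your own dataset" advice above can be tried with something like the following. This is only a minimal sketch using Python's standard gzip, bz2 and lzma modules, not the tools or levels behind the original notes. Two assumptions to keep in mind: Python's lzma module writes the newer .xz container by default, so its presets 1 and 2 will not behave exactly like the levels of the 2007-era lzma tool, and the function name `benchmark` and the sample data are made up for illustration.

```python
import bz2
import gzip
import lzma
import time


def benchmark(data: bytes) -> dict:
    """Return {name: (compressed_size, compress_secs, decompress_secs)}."""
    # Levels follow the summary above: gzip and bzip2 at their defaults,
    # LZMA at the two fast presets suggested for benchmarking.
    codecs = {
        "gzip-6": (lambda d: gzip.compress(d, compresslevel=6), gzip.decompress),
        "bzip2-9": (lambda d: bz2.compress(d, compresslevel=9), bz2.decompress),
        "lzma-1": (lambda d: lzma.compress(d, preset=1), lzma.decompress),
        "lzma-2": (lambda d: lzma.compress(d, preset=2), lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        if restored != data:
            raise ValueError(f"{name}: round trip did not restore the input")
        results[name] = (len(packed), t1 - t0, t2 - t1)
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a real dataset; substitute your own bytes.
    sample = b"From: someone@example.org\nSubject: test\n\nhello world\n" * 20000
    for name, (size, c_secs, d_secs) in benchmark(sample).items():
        print(f"{name:8s} {size:10d} bytes  "
              f"compress {c_secs:7.3f} s  decompress {d_secs:7.3f} s")
```

Run it on a representative slice of the real backup set rather than on synthetic data; timings and ratios vary a lot with the input, and the decompression column matters most for read-many archives.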

How The Backup Process Has Changed

Posted Dec 8, 2007 1:52 UTC (Sat) by roelofs (subscriber, #2599) [Link]

> Use none when
> - almost all files in the dataset are already compressed (DUH!)

Major omission, both here and in the main article: also use none when your backup medium (and/or the path to it, including RAM) may have errors. Both compression and encryption largely destroy any ability to recover data past the error location. (I discovered two bad bits in 1 GB of memory while verifying a backup to DVD+R.)
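
To make the point about errors concrete, here is a rough sketch that flips one bit in an uncompressed buffer and one bit in a zlib-compressed copy of the same data, then checks how much survives. It is an illustration under assumptions, not a recovery tool: it uses Python's zlib module (the same deflate algorithm gzip uses), and the helper names `flip_bit`, `salvage` and `intact_prefix` as well as the sample data are invented for this example.

```python
import zlib


def flip_bit(blob: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of blob with a single bit inverted."""
    damaged = bytearray(blob)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def salvage(stream: bytes, chunk: int = 64) -> bytes:
    """Decompress a zlib stream piecewise, keeping whatever comes out
    before the decompressor reports an error. The tail of the result
    may still be garbage if the damage was not detected immediately."""
    d = zlib.decompressobj()
    out = bytearray()
    try:
        for i in range(0, len(stream), chunk):
            out += d.decompress(stream[i:i + chunk])
    except zlib.error:
        pass  # nothing after this point can be decoded
    return bytes(out)


def intact_prefix(original: bytes, recovered: bytes) -> int:
    """Number of leading bytes of recovered that match original."""
    n = 0
    for a, b in zip(original, recovered):
        if a != b:
            break
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical sample data: text-like and compressible.
    original = "\n".join(
        f"record {i}: value {i * i % 977}" for i in range(5000)
    ).encode()
    packed = zlib.compress(original, 6)

    raw_damaged = flip_bit(original, len(original) // 2)
    packed_damaged = flip_bit(packed, len(packed) // 2)

    differing = sum(a != b for a, b in zip(original, raw_damaged))
    print("uncompressed: bytes changed by one flipped bit:", differing)
    print("compressed: bytes still intact from the start:",
          intact_prefix(original, salvage(packed_damaged)),
          "of", len(original))
```

In the uncompressed buffer the damage stays confined to the byte that was hit. In the compressed stream, typically only the data before the flipped bit comes back intact.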

> Otherwise just use BZIP2

bzip2 is much, much slower than gzip on decompression, too. If it's read-once (or read-none), then that may not matter. But for read-many it's pretty bad. (I have no data on LZMA or other alternatives. Capacity is cheaper than CPU, however.)

Greg

Copyright © 2008, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds