
Compression formats for kernel.org

Posted Feb 18, 2010 10:07 UTC (Thu) by intgr (subscriber, #39733)
Parent article: Compression formats for kernel.org

In fact, gzip may not remain the speed leader for long. The XZ
format was designed specifically to support parallelization, so on modern
quad-core processors it has the potential to be even faster than gzip.
Unfortunately, even though the file format can handle it, the current
xz-utils does not support parallelization yet.

Also, maybe one shouldn't measure decompression time in isolation, but
add in download time as well? If the user spends 5 fewer seconds downloading
the tarball, does it matter if it takes 5 seconds more to decompress
it?
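
To put rough numbers on that trade-off, here is a minimal Python sketch. It assumes the download and the decompression run one after the other (no streaming), and the file sizes and bandwidths in the usage note are made up for illustration, not measurements. The function names are mine.

```python
def total_seconds(size_bytes, bandwidth_bytes_per_s, decompress_s):
    """Time to download a file and then decompress it, back to back."""
    return size_bytes / bandwidth_bytes_per_s + decompress_s


def break_even_bandwidth(size_a, decompress_a, size_b, decompress_b):
    """Bandwidth (bytes/s) at which formats A and B cost the same total time.

    Assumes A is the larger file that decompresses faster and B is the
    smaller file that decompresses slower (so decompress_b > decompress_a).
    Below the returned bandwidth the smaller file wins overall.
    """
    return (size_a - size_b) / (decompress_b - decompress_a)
```

For example, with a hypothetical 80 MB tarball that unpacks in 2.3 seconds and a hypothetical 60 MB tarball that unpacks in 4.7 seconds, `break_even_bandwidth(80e6, 2.3, 60e6, 4.7)` comes out at roughly 8.3 MB/s. On any slower link the smaller file is the faster choice end to end.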



Posted Feb 18, 2010 11:04 UTC (Thu) by dlang (✭ supporter ✭, #313)

Don't limit your thinking to download speed.

I frequently compress my logfiles with gzip -9 even though I know that I will read them a few hours later. I do this because I have measured it and found that reading the compressed data from disk and uncompressing it is faster than reading the uncompressed data from disk (even on some fairly beefy disk systems).

With bzip2 this is very much not the case.

I have not had a chance to measure xz under similar conditions yet, but from the sound of things there's a good chance that it will be a similar win (and if the decompression can be multithreaded it may be even better).
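
For anyone who wants to repeat this kind of measurement, a minimal Python sketch follows. It is not the method used above, just one way to time a full sequential read of a file, either raw or through gzip. The helper name `time_read` is my own. Note that the numbers only say something about the disk if the file is not already in the page cache, so each run needs a cold cache.

```python
import gzip
import time


def time_read(path, opener=open, chunk_size=1 << 20):
    """Stream the whole file through `opener` and time it.

    Returns (elapsed_seconds, bytes_read). Pass opener=gzip.open to read
    and decompress a .gz file; bytes_read then counts decompressed bytes.
    """
    start = time.perf_counter()
    total = 0
    with opener(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return time.perf_counter() - start, total
```

Usage would look like `time_read("app.log")` against `time_read("app.log.gz", gzip.open)`, where both paths are hypothetical. Python's gzip module is slower than the zcat binary, so treat the result as a rough indication only.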

Posted Feb 18, 2010 14:43 UTC (Thu) by pointwood (guest, #2814)

PBzip2 (Parallel Bzip2) exists: http://compression.ca/pbzip2/

Posted Feb 18, 2010 17:42 UTC (Thu) by intgr (subscriber, #39733)

This is a good point, but do note that the topic was decompression speed. bzip2 is pretty good in terms of compression speed and ratio, but performs very badly at decompression.

Just for some rough figures, I'm decompressing the Linux kernel 2.6.32 source tarball on my quad-core Phenom II system:
pbzcat, four threads, takes 4.1 seconds of wall-clock time (15.6s CPU time).
xzcat, single thread, takes just 4.7 seconds.
zcat, single thread, takes 2.3 seconds.

So, parallel bzip2 decompression will probably beat gzip at 8 cores, whereas XZ would be on par with gzip at just 2 cores. While XZ is slow at compression, it will definitely beat gzip and bzip2 in parallel decompression.
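
The arithmetic behind that estimate can be sketched in a few lines of Python. It assumes perfectly linear scaling, meaning wall-clock time equals total CPU time divided by the number of cores. That is optimistic, and for XZ it is purely hypothetical since xz-utils has no parallel decompression yet. The function names are mine.

```python
import math


def ideal_wall_seconds(cpu_seconds, cores):
    """Wall-clock time if the work splits evenly across all cores."""
    return cpu_seconds / cores


def cores_to_beat(cpu_seconds, target_seconds):
    """Smallest core count whose ideal wall-clock time is at or below the target."""
    return math.ceil(cpu_seconds / target_seconds)


# Figures from the measurements above, in seconds.
BZIP2_CPU = 15.6   # pbzcat total CPU time
XZ_CPU = 4.7       # xzcat, single thread
GZIP_WALL = 2.3    # zcat, single thread
```

With these numbers, `ideal_wall_seconds(BZIP2_CPU, 4)` gives about 3.9 seconds, close to the 4.1 seconds actually measured, and at 8 cores it drops to about 1.95 seconds, under gzip's 2.3. `ideal_wall_seconds(XZ_CPU, 2)` gives about 2.35 seconds, roughly on par with gzip. `cores_to_beat` says bzip2 would need 7 ideal cores and XZ 3 to get under gzip's time.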

Posted Feb 23, 2010 6:23 UTC (Tue) by SEJeff (subscriber, #51588)

And from someone who uses pbzip2 on gobs and gobs of large files every day...
it uses a ton of RAM and is still slow.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds