Compression formats for kernel.org

Posted Feb 18, 2010 11:04 UTC (Thu) by dlang (✭ supporter ✭, #313)
In reply to: Compression formats for kernel.org by intgr
Parent article: Compression formats for kernel.org

Don't limit your thinking to download speed.

I frequently compress my logfiles with gzip -9 even though I know that I will read them a few hours later. I do this because I have measured it and found that reading the compressed data from disk and uncompressing it is faster than reading the uncompressed data from disk (even on some fairly beefy disk systems).

With bzip2 this is very much not the case.

I have not had a chance to measure xz under similar conditions yet, but from the sound of things there's a good possibility that it will be a similar win (and if the decompression can be multithreaded, it may be even better).
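A measurement like the one described above could be sketched roughly as follows. This is not the poster's actual method, only a minimal illustration; the names `time_read` and `compare` and the sample file name are made up for the example. Note that a freshly written file will usually still be in the page cache, so for a meaningful disk comparison you would point `time_read` at existing files larger than RAM, or drop caches first.

```python
import gzip
import os
import tempfile
import time


def time_read(path, opener):
    """Return (seconds, bytes_read) for reading the whole file via opener."""
    start = time.perf_counter()
    total = 0
    with opener(path, "rb") as f:
        while True:
            chunk = f.read(1 << 20)  # read in 1 MiB chunks
            if not chunk:
                break
            total += len(chunk)
    return time.perf_counter() - start, total


def compare(log_bytes):
    """Write log_bytes raw and gzip -9 compressed, then time reading each."""
    with tempfile.TemporaryDirectory() as tmp:
        raw_path = os.path.join(tmp, "sample.log")  # hypothetical file name
        gz_path = raw_path + ".gz"
        with open(raw_path, "wb") as f:
            f.write(log_bytes)
        with gzip.open(gz_path, "wb", compresslevel=9) as f:
            f.write(log_bytes)
        raw_time, raw_len = time_read(raw_path, open)
        gz_time, gz_len = time_read(gz_path, gzip.open)
        return {
            "raw_seconds": raw_time,
            "gz_seconds": gz_time,
            "raw_bytes": raw_len,
            "gz_bytes_decompressed": gz_len,
            "gz_size_on_disk": os.path.getsize(gz_path),
        }
```

Whether the compressed read wins depends on the ratio between disk throughput and decompression throughput, so the result on one machine says little about another.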



Compression formats for kernel.org

Posted Feb 18, 2010 14:43 UTC (Thu) by pointwood (guest, #2814)

PBzip2 (Parallel Bzip2) exists: http://compression.ca/pbzip2/

Compression formats for kernel.org

Posted Feb 18, 2010 17:42 UTC (Thu) by intgr (subscriber, #39733)

This is a good point, but do note that the topic was decompression speed. bzip2 is pretty good in terms of compression speed and ratio, but performs very badly at decompression.

Just for some rough figures, I'm decompressing the Linux kernel 2.6.32 source tarball, on my quad-core Phenom II system:
pbzcat, four threads, takes 4.1 seconds of wall-clock time (15.6s CPU time).
xzcat, single thread, takes just 4.7 seconds.
zcat, single thread, takes 2.3 seconds.
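For anyone wanting comparable rough figures without the command-line tools, a single-threaded comparison can be sketched with Python's standard `gzip`, `bz2` and `lzma` modules. This is an assumption-laden stand-in, not how the numbers above were obtained: it works in memory, uses each module's default compression settings, and the helper name `time_decompression` is made up for the example.

```python
import bz2
import gzip
import lzma
import time

# Each entry maps a label to (compress, decompress) from the standard library.
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def time_decompression(payload):
    """Return {codec: (compressed_size, seconds_to_decompress)}."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        blob = compress(payload)
        start = time.perf_counter()
        restored = decompress(blob)
        elapsed = time.perf_counter() - start
        if restored != payload:
            raise ValueError(f"{name} round trip failed")
        results[name] = (len(blob), elapsed)
    return results
```

Feeding it the bytes of an uncompressed kernel tarball would be the closest analogue to the test above; absolute times will differ from those of zcat, xzcat and pbzcat.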

So, parallel bzip2 decompression will probably beat gzip at 8 cores (15.6s of CPU time against zcat's 2.3s, assuming the work spreads evenly across cores), whereas XZ would be on par with just 2 cores. While XZ is slow at compression, it will definitely beat gzip and bzip2 in parallel decompression.
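The core-count estimate can be reproduced with a little arithmetic. The sketch below assumes that decompression work divides evenly across cores with no overhead, and that xzcat's single-threaded wall-clock time equals its CPU time; both are simplifications, and `cores_to_match` is a name invented for the example.

```python
GZIP_WALL = 2.3    # zcat, single thread, wall-clock seconds
XZ_CPU = 4.7       # xzcat, single thread (wall time taken as CPU time)
PBZIP2_CPU = 15.6  # pbzcat, CPU seconds summed over four threads


def cores_to_match(cpu_seconds, target_wall_seconds):
    """Cores needed to hit the target wall time under ideal linear scaling."""
    return cpu_seconds / target_wall_seconds


xz_cores = cores_to_match(XZ_CPU, GZIP_WALL)
bz_cores = cores_to_match(PBZIP2_CPU, GZIP_WALL)
print(f"xz: {xz_cores:.1f} cores, bzip2: {bz_cores:.1f} cores")
```

This gives about 2.0 cores for xz and about 6.8 for bzip2; rounding the latter up to the next power of two gives the 8 cores mentioned above.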

Compression formats for kernel.org

Posted Feb 23, 2010 6:23 UTC (Tue) by SEJeff (subscriber, #51588)

And from someone who uses pbzip2 on gobs and gobs of large files every day: it uses a ton of RAM and is still slow.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds