
Google releases a better compression algorithm

The Google Open Source Blog has announced the release of the "Zopfli" open source compression algorithm. Though compression cost is high, it could be a win for certain applications:
The output generated by Zopfli is typically 3–8% smaller compared to zlib at maximum compression, and we believe that Zopfli represents the state of the art in Deflate-compatible compression. Zopfli is written in C for portability. It is a compression-only library; existing software can decompress the data. Zopfli is bit-stream compatible with compression used in gzip, Zip, PNG, HTTP requests, and others.

Due to the amount of CPU time required, 2–3 orders of magnitude more than zlib at maximum quality, Zopfli is best suited for applications where data is compressed once and sent over a network many times — for example, static content for the web.
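
For readers who want to see what "compression-only library" means in practice, here is a minimal C sketch of producing a gzip-compatible buffer with Zopfli. It is based on the zopfli.h header shipped in the announced release; the exact option fields and defaults should be checked against the source.

    #include <stdio.h>
    #include <stdlib.h>
    #include "zopfli/zopfli.h"   /* header from the Zopfli source tree */

    int main(void)
    {
        const unsigned char in[] = "hello, hello, hello, hello\n";
        unsigned char *out = NULL;
        size_t outsize = 0;
        ZopfliOptions options;

        ZopfliInitOptions(&options);
        options.numiterations = 15;  /* more iterations: smaller output, more CPU */

        /* ZOPFLI_FORMAT_GZIP wraps the deflate stream in a gzip container;
           ZOPFLI_FORMAT_ZLIB and ZOPFLI_FORMAT_DEFLATE are also available. */
        ZopfliCompress(&options, ZOPFLI_FORMAT_GZIP,
                       in, sizeof(in) - 1, &out, &outsize);

        fwrite(out, 1, outsize, stdout);  /* pipe through gunzip to check */
        free(out);
        return 0;
    }

The point of the design is that nothing special is needed on the other end: the output can be piped straight through a stock gunzip or fed to any zlib-based decoder.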



Google releases a better compression algorithm

Posted Feb 28, 2013 23:46 UTC (Thu) by flammon (guest, #807) [Link]

How does it compare to 7z?

Google releases a better compression algorithm

Posted Feb 28, 2013 23:56 UTC (Thu) by boklm (subscriber, #34568) [Link]

Google releases a better compression algorithm

Posted Mar 1, 2013 0:08 UTC (Fri) by Zizzle (guest, #67739) [Link]

Not really.

"7­Zip can operate with the deflate format, but it can read and write several other archive formats, and achieve higher compression ratios. In this study we only measured deflate­compatible compression."

I suspect the non-deflate 7-zip or xz would do better than zopfli.

Google releases a better compression algorithm

Posted Mar 1, 2013 1:49 UTC (Fri) by rgmoore (✭ supporter ✭, #75) [Link]

This seems like the core of the article. What Google has done is to create a slightly higher compression ratio version of deflate compression. It's like coming out with gzip that goes up to -12 or -15 instead of just -9. It has the advantage of retaining compatibility with existing decompression programs, but it doesn't offer the larger improvements in space efficiency possible by using a more modern compression scheme like xz.
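
To make the compatibility point concrete, here is a minimal zlib-only sketch. The compressed buffer below happens to come from zlib itself, but a buffer written by Zopfli in its zlib format (or by any other conforming deflate implementation) would go through exactly the same uncompress() call, with no changes on the decoding side:

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>   /* plain zlib; nothing Zopfli-specific is needed to decode */

    int main(void)
    {
        const char *msg = "the decoder does not care how hard the compressor worked";
        unsigned char packed[256], unpacked[256];
        uLongf packed_len = sizeof(packed), unpacked_len = sizeof(unpacked);

        /* Stand-in for "some zlib-format data from somewhere". */
        compress2(packed, &packed_len, (const Bytef *)msg, strlen(msg) + 1, 9);

        /* The decode path is identical no matter which deflate encoder made it. */
        if (uncompress(unpacked, &unpacked_len, packed, packed_len) != Z_OK) {
            fprintf(stderr, "decode failed\n");
            return 1;
        }
        puts((const char *)unpacked);
        return 0;
    }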

Google releases a better compression algorithm

Posted Mar 1, 2013 2:17 UTC (Fri) by rahulsundaram (subscriber, #21946) [Link]

Look at the purpose for which they created it. It doesn't make sense for them to use a newer compression algorithm since they want to retain compatibility.

Google releases a better compression algorithm

Posted Mar 1, 2013 9:27 UTC (Fri) by nim-nim (subscriber, #34454) [Link]

Actually, what compression method to use is one of the hot topics of the IETF working group currently working on HTTP/2, and Google is a major contributor to those talks. There is no reason for HTTP/2 to use the same algorithms as HTTP/1, and the use cases presented in the zopfli paper are definitely the same as HTTP/2's.

Google releases a better compression algorithm

Posted Mar 1, 2013 15:19 UTC (Fri) by jengelh (subscriber, #33263) [Link]

> There is no reason for HTTP/2 to use the same algorithms as HTTP/1

Indeed, and therefore HTTP/2 could just as well use xz. (Or any other established LZMA-ish compressor, but picking xz would follow the precedent of gzip, which is also used for files and tarballs.)

That, and LZO for those who prefer to trade compression ratio for speed.

Google releases a better compression algorithm

Posted Mar 2, 2013 7:55 UTC (Sat) by epa (subscriber, #39769) [Link]

Google has a replacement for LZO: http://code.google.com/p/snappy/

Google releases a better compression algorithm

Posted Mar 3, 2013 2:08 UTC (Sun) by scientes (guest, #83068) [Link]

xz is officially LZMA2, and uses the same liblzma as the original, legacy LZMA1 format. Basically, people used lzma before it was ready, so the final version was renamed to avoid confusion over the incompatibility; lzma is now a symlink to xz.

Both lzo and xz are very useful, but for different things: lzo is most useful when the compressed data will only be used once, and xz when it is write-once, read-many, so you can afford the high compression time and memory requirements.

Google releases a better compression algorithm

Posted Mar 2, 2013 2:31 UTC (Sat) by csigler (subscriber, #1224) [Link]

Google releases a better compression algorithm

Posted Mar 1, 2013 2:36 UTC (Fri) by xz (subscriber, #86176) [Link]

You can't imagine how many tools out there are competing to optimize deflate for minimum png file size.

http://css-ig.net/png-tools-overview

Google releases a better compression algorithm

Posted Mar 1, 2013 0:40 UTC (Fri) by jke (guest, #88998) [Link]

I'd like to see a compression/decompression library that uses more than a single thread in a meaningful way.

But then again, most new compression libraries seem to suffer from not being a "drop-in replacement" for the well-rooted and entrenched ones.

I'd be happy even if the file format were incompatible, so long as the decompressor seamlessly handled legacy formats as well.

Google releases a better compression algorithm

Posted Mar 1, 2013 1:15 UTC (Fri) by wmf (guest, #33791) [Link]

pigz and pxz exist; I don't know if they've been librarified but it would seem the hard work has been done.

Google releases a better compression algorithm

Posted Mar 1, 2013 1:45 UTC (Fri) by rahulsundaram (subscriber, #21946) [Link]

Mock, the tool used by the Fedora build system (as well as by packagers) for building packages in a chroot, uses pigz based on my suggestion, and it seems to work fine for our purposes.

Google releases a better compression algorithm

Posted Mar 1, 2013 1:57 UTC (Fri) by keeperofdakeys (subscriber, #82635) [Link]

Pxz has its own disadvantages though. Since (from what I have read) LZMA can't easily be multithreaded, pxz compresses different parts of the file at the same time (dumping them in /tmp) and finally puts them all together at the end. For solid archives this reduces the compression ratio in exchange for a (quite significant) speedup. In most cases the ratio wasted is quite small, so the tradeoff is worth it.
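
A single-threaded sketch of that split-and-concatenate idea, using plain zlib rather than LZMA for brevity: each block is compressed as an independent gzip member, and gunzip decodes back-to-back members as one stream (pxz does the analogous thing with xz blocks). Each block starting from an empty dictionary is exactly where the ratio loss described above comes from; a real tool would hand each block to its own worker thread.

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    /* Compress one block as a standalone gzip member (windowBits 15+16 = gzip). */
    static void gzip_member(const unsigned char *in, size_t len, FILE *out)
    {
        unsigned char buf[1 << 16];
        z_stream s;

        memset(&s, 0, sizeof(s));
        deflateInit2(&s, Z_BEST_COMPRESSION, Z_DEFLATED, 15 + 16, 8,
                     Z_DEFAULT_STRATEGY);
        s.next_in = (Bytef *)in;
        s.avail_in = len;
        do {                                  /* standard zlib flush loop */
            s.next_out = buf;
            s.avail_out = sizeof(buf);
            deflate(&s, Z_FINISH);
            fwrite(buf, 1, sizeof(buf) - s.avail_out, out);
        } while (s.avail_out == 0);
        deflateEnd(&s);
    }

    int main(void)
    {
        static unsigned char block[1 << 20];  /* 1 MiB blocks, an arbitrary choice */
        size_t n;

        /* Blocks are independent, so each iteration could run in its own thread;
           they are done sequentially here to keep the sketch short. */
        while ((n = fread(block, 1, sizeof(block), stdin)) > 0)
            gzip_member(block, n, stdout);
        return 0;
    }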

Google releases a better compression algorithm

Posted Mar 1, 2013 9:16 UTC (Fri) by nmav (subscriber, #34036) [Link]

Indeed. However, I'd suggest looking at plzip instead of the xz tools. I was surprised by the difference in compression ratio between them, even though they are based on the same compression algorithm.

http://www.nongnu.org/lzip/lzip.html

Google releases a better compression algorithm

Posted Mar 1, 2013 8:02 UTC (Fri) by jezuch (subscriber, #52988) [Link]

There's also pbzip2. Bzip2, being a BWT-type algorithm, is quite easily parallelizable with little cost to compression ratio.

Google releases a better compression algorithm

Posted Mar 2, 2013 5:06 UTC (Sat) by SEJeff (subscriber, #51588) [Link]

pbzip2 is incredible for bz2 files, but it will use up 100% of your available memory when you run it on big binary files. Even with ionice -c3, it can be very intrusive.

Google releases a better compression algorithm

Posted Mar 1, 2013 4:27 UTC (Fri) by rsidd (subscriber, #2582) [Link]

2–3 orders of magnitude more than zlib, for 3-8% improvement in compression quality -- it seems to me that the use cases would be very limited (of the cases listed, basically only PNG? For HTTP, unless it's a static page, won't the server need to re-compress each time before sending data? For gzip or Zip, just use another, deflate-incompatible method for much better results...)

Google releases a better compression algorithm

Posted Mar 1, 2013 5:47 UTC (Fri) by rgmoore (✭ supporter ✭, #75) [Link]

Even within a dynamic page, there's plenty of content beyond PNGs that is relatively static. Look at the page source for a typical web page, and you'll see perhaps a dozen referenced script sources and stylesheets. Those things are relatively static, so they'd definitely benefit from more efficient compression.

Google releases a better compression algorithm

Posted Mar 1, 2013 10:15 UTC (Fri) by Thue (subscriber, #14277) [Link]

How about recompressing the Debian archive? Compress once, download a million times. Or Windows updates.

Google releases a better compression algorithm

Posted Mar 1, 2013 10:25 UTC (Fri) by engla (guest, #47454) [Link]

It's decided by the maintainer of the package. For the upcoming release a lot of packages switched to xz .debs so that the desktop install CDs fit.

Google releases a better compression algorithm

Posted Mar 1, 2013 10:19 UTC (Fri) by Darkmere (subscriber, #53695) [Link]

I can imagine an offload thread in a caching reverse proxy that would recompress long-lived objects. Might be doable with a Varnish vmod.

Google releases a better compression algorithm

Posted Mar 1, 2013 13:49 UTC (Fri) by huayra (subscriber, #58915) [Link]

It should be possible, but it would be more interesting doing it in Varnish itself.

CSS and JS minification behind a do_minify switch has been done before, using the same code logic and strategy as the do_gzip compression support (even keeping the compressed ESI goodness). So replacing gzip with this new library, or adding it alongside while keeping gunzip for decompression, could be an idea.

PNG

Posted Mar 1, 2013 10:33 UTC (Fri) by epa (subscriber, #39769) [Link]

So... how does one use it to make PNG files? Just rebuild libpng against zopfli, or what?

PNG + xz ?

Posted Mar 1, 2013 18:11 UTC (Fri) by rzm (guest, #116) [Link]

I wonder if anybody tried to replace gzip in PNG with something else, e.g. xz.

PNG + xz ?

Posted Mar 1, 2013 18:57 UTC (Fri) by daniel (subscriber, #3181) [Link]

Keep in mind that PNGs, being images, require a signal-processing approach for optimal compression. In other words, first apply some fancy filter to partition most of the coherent signal content into a relatively low population of distinct symbols, then rzip that (or use whichever lossless compressor suits you).
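
A toy illustration of that filter-then-compress idea, using nothing fancier than PNG's own "Sub" predictor (filter type 1) on a synthetic 24-bit gradient scanline before handing the bytes to zlib; the gradient, the scanline width, and the use of compress2() are all just illustrative choices:

    #include <stdio.h>
    #include <zlib.h>

    #define WIDTH 256
    #define BPP   3                 /* 24-bit RGB, one scanline for illustration */

    int main(void)
    {
        unsigned char raw[WIDTH * BPP], filtered[WIDTH * BPP], out[2048];
        uLongf rawlen = sizeof(out), filtlen = sizeof(out);
        int i;

        /* A smooth horizontal gradient: plenty of signal, almost no repetition. */
        for (i = 0; i < WIDTH * BPP; i++)
            raw[i] = (unsigned char)(i / BPP);

        /* PNG "Sub" filter: each byte minus the corresponding byte of the
           previous pixel.  A smooth gradient collapses into long runs of
           small, identical deltas, which deflate handles far better. */
        for (i = 0; i < WIDTH * BPP; i++)
            filtered[i] = (unsigned char)(raw[i] - (i >= BPP ? raw[i - BPP] : 0));

        compress2(out, &rawlen, raw, sizeof(raw), 9);
        compress2(out, &filtlen, filtered, sizeof(filtered), 9);

        printf("unfiltered: %lu bytes, Sub-filtered: %lu bytes\n",
               (unsigned long)rawlen, (unsigned long)filtlen);
        return 0;
    }

Whether filtering actually helps depends entirely on the data, as the map-tile experiment later in this thread shows.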

PNG + xz ?

Posted Mar 2, 2013 14:50 UTC (Sat) by endecotp (guest, #36428) [Link]

> Keep in mind that as images, pngs require a signal processing
> approach for optimal compression.

That's what I thought, until I did some experiments. My examples were map tiles, much like you'd get from OSM, Google Maps etc. It turned out that just gzipping the raw image data (24bpp) got slightly better results than with PNG's filtering. (And "no filtering" is not something that pngcrush tries, IIRC.) I then tried bzip2ing the raw image data and got even better results.

Less surprisingly, JPEG with a high quality setting gets visually-indistinguishable results and significantly smaller file sizes.

If you are motivated to improve compression by e.g. bandwidth bills, and especially if your data might be non-typical in some way, it seems that it is well worth experimenting and trying everything.

PNG + xz ?

Posted Mar 3, 2013 22:13 UTC (Sun) by tialaramex (subscriber, #21167) [Link]

Filter type 0, method 0 is the unfiltered data. Yes, pngcrush tries this; it tries each of the methods in type 0 filters (the only type so far defined) one at a time.

There's almost certainly an error in either your methodology or your line of reasoning from your results. Since you said you worked with OSM-like map data, I'd guess you've fed in a bunch of more or less totally blank map tiles (oceans, unmapped regions), and what you're left measuring is solely the overhead of PNG's metadata.

PNG + xz ?

Posted Mar 5, 2013 12:21 UTC (Tue) by endecotp (guest, #36428) [Link]

> Filter type 0, method 0 is the unfiltered data. Yes,
> pngcrush tries this

You are right; I was confused. I was misled by the fact that the pngcrush output lists a whole series of "methods" that start with method 1:

IDAT length with method 1 (fm 0 zl 4 zs 0) = 38507
IDAT length with method 2 (fm 1 zl 4 zs 0) = 51462
IDAT length with method 3 (fm 5 zl 4 zs 1) = 71769
IDAT length with method 4 (fm 0 zl 9 zs 1) = 34941
IDAT length with method 7 (fm 0 zl 9 zs 0) = 32868

but this "method" is not the same as the "filter method", which is the "fm" number inside the ().

It remains true that for these images the PNG pre-processing does not help. If you're curious, I have put an example tile here: http://chezphil.org/tmp/maptile.png .

> I'd guess you've fed a bunch of more or less totally blank map tiles

No.

PNG + xz ?

Posted Mar 7, 2013 18:34 UTC (Thu) by huftis (subscriber, #58900) [Link]

Pngcrush just isn’t a very good PNG optimizer. Using OptiPNG or TruePNG, I managed to reduce the file size of your example PNG file by about 26% (from 33.5 KiB to 24.6 KiB).

PNG + xz ?

Posted Mar 7, 2013 18:50 UTC (Thu) by huftis (subscriber, #58900) [Link]

And running PNGZopfli on the result of the TruePNG output gives a file size of only 23.4 KiB. Note that this is with the various max settings of both TruePNG and PNGZopfli, and running this takes a long time, but even with the (very fast) default settings we get a comparable file size – 23.5 KiB.

PNG + xz ?

Posted Mar 1, 2013 21:24 UTC (Fri) by rillian (subscriber, #11344) [Link]

It would help, and PNG even has an enumeration field for the compression algorithm used.

The problem is that it would be incompatible with every other PNG decoder out there. Considering it took 10 years to get the current format widely adopted, that's a very expensive proposal.

Which, of course, is why better deflate compressors like zopfli are interesting.

Google releases a better compression algorithm

Posted Mar 2, 2013 8:05 UTC (Sat) by rbrito (subscriber, #66188) [Link]

One thing that the discussions here seem to be missing is that a gzip-compatible algorithm that compresses more is highly suitable for creating images for embedded devices, where the NVRAM size is a hard limit that, in general, cannot be upgraded.

When people create images for those devices, it is common to cut features from busybox or other utilities to keep the size of, say, the initrd low.

With stronger compression that the early boot steps can still understand, those images don't have to be cut down as much.

So, given the choice between a slower method that is still compatible with what my devices understand and fewer features, I gladly take the 20x compress-once penalty.

Google releases a better compression algorithm

Posted Mar 4, 2013 4:09 UTC (Mon) by PaulWay (✭ supporter ✭, #45600) [Link]

With the further advantage that one can still use standard gzip during testing on a developer board with more memory to keep updating the image without slowing down the test cycle.

Have fun,

Paul

Google releases a better compression algorithm

Posted Mar 4, 2013 17:59 UTC (Mon) by nbd (subscriber, #14393) [Link]

Why bother sticking to gzip compatibility when you can already use LZMA/xz for things like the initrd/initramfs, or even the full kernel (assuming you have a compatible loader)? The compression ratio of xz is much, much better than Zopfli's.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds