A backdoor in xz
Posted Mar 30, 2024 11:52 UTC (Sat) by chris_se (subscriber, #99706)
In reply to: A backdoor in xz by MarcB
Parent article: A backdoor in xz
I don't remember all the details, but about 8 months ago I experimented with replacing xz with zstd for filesystem images at $dayjob - and while zstd was a LOT faster, even at the maximum zstd level the files were still ~ 20% larger than a plain xz -7 (xz -9 was too slow to be practical). Take the exact numbers with a grain of salt because they're from memory, but the difference was significant.
In our case the significantly higher compression ratio was worth it.
Can't speak about systemd, but I would definitely not make any absolute statements that there are NO advantages to xz/lzma.
Posted Mar 30, 2024 19:25 UTC (Sat) by MarcB (subscriber, #101804)
Posted Mar 30, 2024 21:19 UTC (Sat) by mbunkus (subscriber, #87248)
Really? I always think of zstd as worse but orders of magnitude faster than xz, and worse but in-different-universes-kind-of-faster than bzip2 (all at default settings).

After reading another comment in this thread about using zstd for systemd's journal, I did a short test with a 1.6 GB journal export file (journalctl -o export …). The results were roughly:

| Type | Size | Time |
|--------------|------|-------|
| uncompressed | 1.6G | — |
| gzip | 78M | 6.9s |
| bzip2 | 58M | 1m34s |
| zstd | 62M | 0.9s |
| zstd -9 | 51M | 3.9s |
| xz | 43M | 4.5s |

With the exception of zstd -9, all compressors used their default settings. (As stated in journalctl's man page, the "export" format is mostly text with a small amount of binary data for structure.)

I'd be interested in situations where zstd compresses better than xz. Do you have some concrete numbers?
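(For reference, a test along these lines can be reproduced roughly as follows; "journal.export" is just a placeholder name, not necessarily what was used above.)

```sh
# Rough sketch of the benchmark above.
journalctl -o export > journal.export

# Compress to stdout so every tool leaves the input file alone;
# all runs use default levels except the explicit "zstd -9".
time gzip    -c journal.export > journal.export.gz
time bzip2   -c journal.export > journal.export.bz2
time zstd    -c journal.export > journal.export.zst
time zstd -9 -c journal.export > journal.export.9.zst
time xz      -c journal.export > journal.export.xz

ls -lh journal.export*
```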
Posted Mar 31, 2024 0:29 UTC (Sun) by MarcB (subscriber, #101804)
For linux-6.8.2.tar (1.4G), I get 137 MiB for xz -9 and 133 MiB for zstd --ultra -22.
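(A comparison like this boils down to something like the following; only the tarball name comes from the comment above, the rest is a sketch.)

```sh
# Same input, strongest practical settings for each tool.
# zstd levels above 19 require the --ultra flag.
xz -9 -c linux-6.8.2.tar            > linux-6.8.2.tar.xz
zstd --ultra -22 -c linux-6.8.2.tar > linux-6.8.2.tar.zst

ls -lh linux-6.8.2.tar.xz linux-6.8.2.tar.zst
```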
Posted Mar 31, 2024 12:15 UTC (Sun) by mbunkus (subscriber, #87248)
Ooooh, I hadn't been aware that zstd has compression levels higher than 9. Good to know. I did a couple more tests with this knowledge; here's the updated table:

| Type | Size | Time |
|------------------|-----:|---------:|
| uncompressed | 1.6G | — |
| gzip | 78M | 6.9s |
| bzip2 | 58M | 1m34.1s |
| zstd | 62M | 0.9s |
| zstd -9 | 51M | 3.9s |
| zstd -19 | 44M | 3m38.6s |
| zstd --ultra -22 | 41M | 21m43.1s |
| xz | 43M | 4.5s |
| xz -9 | 43M | 15.6s |

So yes, you can get zstd down to below xz, at least with content that is mostly text, but now the duration completely flips upside down, with xz looking good at around 5s & zstd being so far out of this world that it isn't funny anymore.

Note, though, that xz is multi-threaded while zstd and all the others don't seem to be: zstd only used a single core even at --ultra -22, whereas xz -9 used eight of my 32 cores. That being said, "zstd -19" takes 14.6 times as long as "xz -9", and "zstd --ultra -22" is at an unbelievable 83.5 times, making it still slower per core than "xz -9".

Does multi-core processing matter? Let's take build pipelines, such as a build server for a distribution like Debian, as an example. If they want to achieve high utilization of their resources, they have to run stuff in parallel. This means they can either assign a single core to each build VM & run a lot of build VMs in parallel, or assign multiple cores to each build VM & run fewer of them. In the latter case, a compression step that can only make use of a single core & that takes many times as long as another compressor with a similar compression ratio yields rather low utilization.

Don't get me wrong; I really like zstd & the tradeoffs it makes. I use it as my default compressor in most day-to-day use cases for its impressive speed, especially interactively. But when file size is a concern (e.g. in a lot of countries internet traffic is mostly mobile & therefore both slow & expensive at the same time), xz pretty much always wins, no matter how you look at it. It's really no big surprise to me that it has gained such widespread usage in the OSS world.
Posted Mar 31, 2024 17:46 UTC (Sun) by andresfreund (subscriber, #69562)
zstd -T0 will do the same.
Posted Mar 31, 2024 18:12 UTC (Sun) by mbunkus (subscriber, #87248)
Interestingly, it does worse than "xz -T0" with regard to how many cores it can effectively use. On my 32-core system with the same 1.6 GB input file, "zstd --ultra -22 -T0" starts out using four cores but drops down to & stays at three cores after a handful of seconds. Therefore processing still takes 7m38s. Using a file or STDIN as input makes no difference. I guess zstd simply cannot segment the source as much as xz does.
Now "xz -T0" (which is the default in recent xz versions) also only uses eight cores on the same machine. Then again, even with "-9" it is worlds faster both per core & in total than "zstd --ultra -22".
Then again, I'm really not trying to argue that xz is better than zstd, even though I probably sound like it. I just tried to answer the question why the OSS community has adopted xz as widely as it has, simply to satisfy my own curiosity. Also it's good to know the strengths & weaknesses of the various tools at our disposal.
Posted Mar 31, 2024 6:32 UTC (Sun) by chris_se (subscriber, #99706)
Did you use the "ultra" settings for zstd or any of the advanced options? I don't remember. I just redid the same checks again for a single file I had laying around (I did the previous checks against multiple variants), and I got this: (All done one a single CPU core, Intel Core i7-8700K. Debian 12 stable.) My payload is basically a tar file of a minimized Debian 12 rootfs, plus some additional internal software -- nothing special. (Orig size: 554 MiB) To summarize my test: even at --ultra -22 zstd is worse in all aspects compared to xz -9.
Posted Mar 31, 2024 16:06 UTC (Sun) by stefanor (subscriber, #32895)
The main promise of zstd over the other options is faster decompression, so I think it would only be fair to include that in the comparison.
Posted Apr 1, 2024 19:33 UTC (Mon) by chris_se (subscriber, #99706)
Sure, one could do that, and zstd is probably going to be faster when it comes to decompression. But my original point was not to bash zstd - I was replying to the statement that zstd is always better than xz and that there's no reason to use xz at all. My second response where I posted my measurements was a little hyperbolic to underline that point.
Personally, I do quite like zstd - and if you look at my table, at its default settings you can reduce a filesystem image of 554 MiB down to 182 MiB (~32% of the original size) within just 2 seconds, which is a lot faster than what many other tools can do (~60 times faster than xz at its default settings). I do think zstd is an excellent algorithm to use as a default when no further constraints have been applied, because the tradeoffs it has chosen are very sensible for many applications.
The only point I'm trying to drive home is that if you have certain constraints - such as needing the compressed size to be as small as reasonably possible - then zstd might not be the algorithm you want to use in all cases (probably depending on what kind of data you want to compress), and that rhetoric such as "always use zstd, xz is obsolete" is not helpful. And while the broader public now knows a lot more about the past hardships of xz maintenance, hindsight is always 20/20, and I don't think the problems there were immediately obvious to most people just using xz themselves. I think that after-the-fact statements such as "people should not have used xz anymore anyway" are extremely unhelpful - not only because it's easy to say so after the fact, but also because I do think xz has advantages in some situations and will remain a good choice when constraints require it.