
A backdoor in xz

A backdoor in xz

Posted Mar 30, 2024 11:07 UTC (Sat) by MarcB (guest, #101804)
In reply to: A backdoor in xz by bluca
Parent article: A backdoor in xz

The question is why OpenSSH links against libsystemd rather than just implementing the needed subset of the notify API itself. The sd_notify interface is simple and stable.

But for systemd in general, what is xz actually needed for? If it is just for the journal, you might want to consider dropping it. xz was maintained only part-time by a single person before the malicious co-maintainer joined. And LZMA does not add much (anything?) over ZSTD.



A backdoor in xz

Posted Mar 30, 2024 11:52 UTC (Sat) by chris_se (subscriber, #99706) [Link] (9 responses)

> And LZMA does not add much (anything?) over ZSTD.

I don't remember all the details, but ~ 8 months or so ago I experimented with replacing xz with zstd for fs images at $dayjob - and while zstd was a LOT faster, even at max zstd level the files were still ~ 20% larger than a plain xz -7. (xz -9 was too slow to be practical) Take the exact number with a grain of salt because it's from memory, but the difference was significant.

In our case the significantly higher compression ratio was worth it.

Can't speak about systemd, but I would definitely not make any absolute statements that there are NO advantages to xz/lzma.

A backdoor in xz

Posted Mar 30, 2024 19:25 UTC (Sat) by MarcB (guest, #101804) [Link] (8 responses)

Did you use the "ultra" settings for zstd or any of the advanced options? Maybe there are some binary data scenarios where LZMA still wins, but in all the tests I did, zstd achieved better compression. That was always mostly textual data, though.

A backdoor in xz

Posted Mar 30, 2024 21:19 UTC (Sat) by mbunkus (subscriber, #87248) [Link] (4 responses)

Really? I always think of zstd as worse but orders of magnitude faster than xz, and worse but in-different-universes-kind-of-faster than bzip2 (all at default settings).

After reading another comment in this thread about using zstd for systemd's journal, I did a short test with a 1.6 GB journal export file (journalctl -o export …). The results were roughly:

| Type         | Size | Time  |
|--------------|------|-------|
| uncompressed | 1.6G | —     |
| gzip         | 78M  | 6.9s  |
| bzip2        | 58M  | 1m34s |
| zstd         | 62M  | 0.9s  |
| zstd -9      | 51M  | 3.9s  |
| xz           | 43M  | 4.5s  |

With the exception of zstd -9 all other compressors used their default settings.

(As stated in journalctl's man page, the "export" format is mostly text with a small amount of binary data for structure)
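For anyone wanting to repeat this kind of comparison without the command-line tools, here is a minimal sketch using only Python's standard library. It covers zlib (the DEFLATE algorithm gzip uses), bz2 and lzma (xz's format), all at their default levels; zstd is left out because older Python versions ship no module for it. The function name `compare_compressors` and the synthetic sample are assumptions for illustration, and in-process numbers will not match the CLI timings in the table.

```python
import bz2
import lzma
import time
import zlib


def compare_compressors(data: bytes) -> dict:
    """Compress `data` with three stdlib codecs; return size and time for each."""
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # Sanity check: the round trip must reproduce the input exactly.
        if decompress(packed) != data:
            raise RuntimeError(f"{name} round trip failed")
        results[name] = {"size": len(packed), "seconds": elapsed}
    return results


if __name__ == "__main__":
    # Synthetic, vaguely journal-like sample (an assumption; use real data instead).
    sample = b"".join(
        b"MESSAGE=service %d started\nPRIORITY=6\n_PID=%d\n\n" % (i % 50, i)
        for i in range(20000)
    )
    print(f"{'uncompressed':14} {len(sample):>9} bytes")
    for name, r in compare_compressors(sample).items():
        print(f"{name:14} {r['size']:>9} bytes  {r['seconds']:.3f}s")
```

Results depend heavily on the input, so it is worth running this against a sample of the data you actually intend to compress.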

I'd be interested in situations where zstd compresses better than xz. Do you have some concrete numbers?

A backdoor in xz

Posted Mar 31, 2024 0:29 UTC (Sun) by MarcB (guest, #101804) [Link] (3 responses)

At work, it was a large mail archive; essentially write-only - if we ever have to read it, something went wrong (legally speaking :-)

For linux-6.8.2.tar (1.4G), I get 137MiB for xz -9 and 133MiB for zstd --ultra -22.

A backdoor in xz

Posted Mar 31, 2024 12:15 UTC (Sun) by mbunkus (subscriber, #87248) [Link] (2 responses)

Ooooh I hadn't been aware zstd has compression levels higher than 9. Good to know.

I did a couple more tests with this knowledge; here's the updated table:

| Type             | Size |     Time |
|------------------|-----:|---------:|
| uncompressed     | 1.6G |        — |
| gzip             |  78M |     6.9s |
| bzip2            |  58M |  1m34.1s |
| zstd             |  62M |     0.9s |
| zstd -9          |  51M |     3.9s |
| zstd -19         |  44M |  3m38.6s |
| zstd --ultra -22 |  41M | 21m43.1s |
| xz               |  43M |     4.5s |
| xz -9            |  43M |    15.6s |

So yes, you can get zstd down to below xz, at least with content that is mostly text, but now the duration completely flips upside down with xz looking good with 5s & zstd being so far out of this world that it isn't funny anymore.

Note, though, that xz is multi-threaded & zstd & all the others don't seem to be: zstd only used a single core even on --ultra -22 whereas xz -9 used eight of my 32 cores. That being said, "zstd -19" takes 14.0 times as long as "xz -9", and "zstd --ultra -22" an unbelievable 83.5 times as long, making it still slower per core than "xz -9".
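The wall-clock and per-core ratios can be recomputed from the table's times. This is a small sketch; the helper `to_seconds` is a name made up here, and the per-core figure assumes that xz kept all eight cores busy for the whole run, which the measurements above do not confirm.

```python
def to_seconds(text: str) -> float:
    """Convert a time such as '3m38.6s' or '15.6s' to seconds."""
    minutes, _, seconds = text.rstrip("s").rpartition("m")
    return (float(minutes) if minutes else 0.0) * 60 + float(seconds)


# Times from the table above; core counts as reported in the comment.
times = {"xz -9": "15.6s", "zstd -19": "3m38.6s", "zstd --ultra -22": "21m43.1s"}
cores = {"xz -9": 8, "zstd -19": 1, "zstd --ultra -22": 1}

baseline = to_seconds(times["xz -9"])
# Wall-clock time relative to xz -9.
ratios = {name: to_seconds(t) / baseline for name, t in times.items()}
# Core-seconds relative to xz -9 (assumes full use of every core counted).
per_core = {
    name: to_seconds(t) * cores[name] / (baseline * cores["xz -9"])
    for name, t in times.items()
}

for name in times:
    print(f"{name:18} wall {ratios[name]:5.1f}x   per core {per_core[name]:5.1f}x")
```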

Does multi-core processing matter? Let's take build pipelines such as a build server for a distribution such as Debian as an example. If they want to achieve high utilization of their resources, they have to run stuff in parallel. This means that they can either assign a single core to each build VM & run a lot of build VMs in parallel, or they can assign multiple cores to each build VM & run fewer of them. In the latter case, a compression step that can only make use of a single core & that takes x times as long as another compressor with a similar compression ratio yields rather low utilization.

Don't get me wrong; I really like zstd & the tradeoffs it makes. I use it as my default compressor in most day-to-day use cases for its impressive speed, especially interactively. But when file size is a concern (e.g. a lot of countries out there where internet traffic is mostly mobile & therefore both slow & expensive at the same time), xz pretty much always wins, no matter how you look at it. It's really no big surprise for me it has gained such wide-spread usage in the OSS world.

A backdoor in xz

Posted Mar 31, 2024 17:46 UTC (Sun) by andresfreund (subscriber, #69562) [Link] (1 responses)

> Note, though, that xz is multi-threaded & zstd & all the others don't seem to be: zstd only used a single core even on --ultra -22 whereas xz -9 used eight of my 32 cores.

zstd -T0 will do the same.

A backdoor in xz

Posted Mar 31, 2024 18:12 UTC (Sun) by mbunkus (subscriber, #87248) [Link]

Indeed; I somehow totally missed that when glancing over zstd's options earlier today. Thanks for pointing that out.

Interestingly it does worse than "xz -T0" does wrt. how many cores it can effectively use. On my 32-core system with the same 1.6 GB input file "zstd --ultra -22 -T0" starts out using four cores but drops down to & stays at three cores after a handful of seconds. Therefore processing still takes 7m38s. Using a file or STDIN as input makes no difference. I guess zstd simply cannot segment the source as much as xz does.
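To illustrate what "segmenting the source" means in general, here is a minimal sketch that splits the input into fixed-size chunks and compresses each one as an independent .xz stream in a thread pool. It is not how xz or zstd implement threading internally; the function name `compress_in_chunks` and the chunk size are assumptions chosen for the example. Python's `lzma.decompress` accepts concatenated streams, so the result still decompresses in one call.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_in_chunks(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress `data` as a concatenation of independently compressed .xz streams."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the streams concatenate correctly.
        streams = list(pool.map(lzma.compress, chunks))
    return b"".join(streams)


if __name__ == "__main__":
    payload = b"some fairly repetitive journal line\n" * 200_000
    packed = compress_in_chunks(payload)
    # lzma.decompress handles multiple concatenated streams.
    print(len(payload), "->", len(packed), lzma.decompress(packed) == payload)
```

The tradeoff is that each chunk is compressed without knowledge of the others, so smaller chunks allow more parallelism but tend to cost compression ratio. That is one plausible reason a compressor using very large windows at its highest levels would keep fewer cores busy.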

Now "xz -T0" (which is the default in recent xz versions) also only uses eight cores on the same machine. Then again even with "-9" it is worlds faster both per core & in total than "zstd --ultra -22".

Then again, I'm really not trying to argue that xz is better than zstd, even though I probably sound like it. I just tried to answer the question why the OSS community has adopted xz as widely as it has, simply to satisfy my own curiosity. Also it's good to know the strengths & weaknesses of the various tools at our disposal.

A backdoor in xz

Posted Mar 31, 2024 6:32 UTC (Sun) by chris_se (subscriber, #99706) [Link] (2 responses)

> Did you use the "ultra" settings for zstd or any of the advanced options?

I don't remember. I just redid the same checks for a single file I had lying around (I did the previous checks against multiple variants), and I got this:

| Method           | Time (m:s) | RAM during compress | Size      |
|------------------|------------|---------------------|-----------|
| xz               | 2:12       |   95 MiB            | 135 MiB   |
| xz -7            | 2:22       |  187 MiB            | 134 MiB   |
| xz -9            | 2:34       |  675 MiB            | 115 MiB   |
| zstd             | 0:02       |   52 MiB            | 182 MiB   |
| zstd -19         | 2:49       |  250 MiB            | 147 MiB   |
| zstd --ultra -22 | 4:22       | 1328 MiB            | 121 MiB   |

(All done on a single CPU core, Intel Core i7-8700K. Debian 12 stable.)

My payload is basically a tar file of a minimized Debian 12 rootfs, plus some additional internal software -- nothing special. (Orig size: 554 MiB)

To summarize my test: even at --ultra -22 zstd is worse in all aspects compared to xz -9.

A backdoor in xz

Posted Mar 31, 2024 16:06 UTC (Sun) by stefanor (subscriber, #32895) [Link] (1 responses)

zstd makes a trade-off of compression to decompression resources.

The main promise of zstd over the other options is faster decompression, so I think it would only be fair to include that in the comparison.

A backdoor in xz

Posted Apr 1, 2024 19:33 UTC (Mon) by chris_se (subscriber, #99706) [Link]

> The main promise of zstd over the other options is faster decompression, so I think it would only be fair to include that in the comparison.

Sure, one could do that, and zstd is probably going to be faster when it comes to decompression. But my original point was not to bash zstd - I was replying to the statement that zstd is always better than xz and that there's no reason to use xz at all. My second response where I posted my measurements was a little hyperbolic to underline that point.

Personally, I do quite like zstd - and if you look at my table, using the default settings, you can reduce a filesystem image of 554 MiB down to 182 MiB (~33% of the original size) within just 2 seconds, which is a lot faster than what many other tools can do. (~60 times faster than xz in its default settings.) I do think zstd is an excellent algorithm to use as a default when no further constraints have been applied, because the tradeoffs it has chosen are very sensible for many applications.

The only point I'm trying to drive home is that if you have certain constraints - such as that the compressed size is to be as small as reasonably possible - then zstd might not be the algorithm you want to use in all cases (probably depending on what kind of data you want to compress), and that rhetoric such as "always use zstd, xz is obsolete" is not helpful. And while the broader public now knows a lot more about the past hardships of xz maintenance, hindsight is always 20/20, and I don't think the problems there were immediately obvious to most people just using xz themselves. I think that after-the-fact statements such as "people should not have used xz anymore anyway" are extremely unhelpful - not only because it's easy to say so after the fact, but also because I do think xz has some advantages in some situations and will remain a good choice when constraints require it.

A backdoor in xz

Posted Mar 30, 2024 12:11 UTC (Sat) by bluca (subscriber, #118303) [Link] (10 responses)

It's needed to compress/decompress journal files and coredumps. The former is done also via a public sd_journal* set of APIs hence it is in libsystemd.so. In git main we switched to dlopen these libs only when needed - ie, when the sd_journal API is called _and_ it encounters a file compressed with the corresponding lib.
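The load-on-demand idea can be sketched in Python, where `ctypes.CDLL` wraps dlopen() on Linux. This is only an illustration of the pattern, not systemd's code, which is written in C and decides what to load in its own way. The class `LazyLibrary`, the helper names, and the use of file magic bytes to pick a library are all assumptions made for this example.

```python
import ctypes
import ctypes.util

XZ_MAGIC = b"\xfd7zXZ\x00"        # first bytes of an .xz stream
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # first bytes of a zstd frame


def _dlopen_by_name(name):
    """Resolve and load a shared library, e.g. 'lzma' -> liblzma."""
    path = ctypes.util.find_library(name)
    if path is None:
        raise OSError(f"shared library {name!r} not found")
    return ctypes.CDLL(path)


class LazyLibrary:
    """Load a shared library on first use instead of at startup."""

    def __init__(self, name, loader=_dlopen_by_name):
        self._name = name
        self._loader = loader
        self._handle = None

    @property
    def loaded(self):
        return self._handle is not None

    def get(self):
        if self._handle is None:
            self._handle = self._loader(self._name)
        return self._handle


# Nothing is loaded when this table is built.
LIBS = {XZ_MAGIC: LazyLibrary("lzma"), ZSTD_MAGIC: LazyLibrary("zstd")}


def library_for(blob, libs=LIBS):
    """Return the library needed for `blob`, loading it only if required."""
    for magic, lib in libs.items():
        if blob.startswith(magic):
            return lib.get()
    return None  # uncompressed data: no library gets mapped at all
```

A process that never touches compressed data never maps the compression library, so code in that library never runs in that process.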

This will be the dependency tree of a fully-featured build of libsystemd in the next release:

build/libsystemd.so.0 (interpreter => None)
libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
ld-linux-x86-64.so.2 => /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2

Compression libs and libgcrypt will all be dlopened on demand, if needed.

A backdoor in xz

Posted Mar 30, 2024 13:58 UTC (Sat) by pbonzini (subscriber, #60935) [Link] (5 responses)

I wonder however if simple and common functionality like notification and file descriptor retrieval belongs in the same public-facing library as reading the journal and the coredumps. Perhaps they should be moved out of libsystemd and into a two-file (.h and .c) copylib?

A backdoor in xz

Posted Mar 31, 2024 12:17 UTC (Sun) by bluca (subscriber, #118303) [Link] (4 responses)

It used to be, but it was merged, because it's just an unnecessary pain for developers to have to know multiple extremely similar libraries, and have to reason about which one to use and link to, etc etc.

The manager <-> service protocol is trivial, so the solution is to just reimplement it if that's all you need. I'll check whether we have some MIT-0 copy-paste ready examples, and if not add it to the documentation.

A backdoor in xz

Posted Apr 1, 2024 5:31 UTC (Mon) by mchapman (subscriber, #66589) [Link] (3 responses)

systemd used to provide a reference implementation (+ header). Perhaps something like this could be brought back?

A backdoor in xz

Posted Apr 1, 2024 11:02 UTC (Mon) by bluca (subscriber, #118303) [Link] (2 responses)

There will be a MIT-0 (so it can be copy/pasted with impunity) self-contained example in the documentation where the protocol is defined

A backdoor in xz

Posted Apr 2, 2024 17:16 UTC (Tue) by bluca (subscriber, #118303) [Link]

A backdoor in xz

Posted Apr 2, 2024 20:40 UTC (Tue) by himi (subscriber, #340) [Link]

A similar reference implementation in a few other common languages would be nice, too - with systemd it's gotten so easy to write system daemons in things like python that a copyable reference implementation would be quite helpful. It's simple enough that no one's bothered writing a library, but fiddly enough to do properly that rolling your own isn't always the best option.
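As a rough idea of what such a Python version could look like, here is a minimal sketch of the client side of the notify protocol: read the socket address from the NOTIFY_SOCKET environment variable and send the state string as a single datagram. It is not the official reference implementation, the function name merely mirrors the C API, and it only handles filesystem paths and abstract-namespace ("@") addresses, rejecting anything else.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send `state` (e.g. "READY=1") to the socket named by $NOTIFY_SOCKET.

    Returns False if the variable is unset, meaning no service manager
    asked for notifications.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path[0] not in ("/", "@"):
        raise ValueError(f"unsupported NOTIFY_SOCKET address: {path!r}")
    address = os.fsencode(path)
    if path[0] == "@":
        # A leading "@" denotes a Linux abstract-namespace socket,
        # which is addressed with a leading NUL byte instead.
        address = b"\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), address)
    return True


if __name__ == "__main__":
    # Typical use: call once the daemon has finished initializing.
    print("notified" if sd_notify("READY=1") else "NOTIFY_SOCKET not set")
```

For this to have any effect, the unit has to use `Type=notify`; run outside a service manager, the call simply returns False.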

A backdoor in xz

Posted Mar 30, 2024 17:21 UTC (Sat) by nix (subscriber, #2304) [Link] (2 responses)

Journal files and coredumps really are a case where zstd would be a better choice than lzma: it's nearly as good compression-wise but compression is *much* faster, and both journals and coredumps are compressed much more often than they're uncompressed. (For things like .xz distro artifacts this is the other way around, so spending loads of time for slightly better compression is often a good idea -- or would be if xz wasn't also much slower and more memory-hungry than zstd at decompressing!)

A backdoor in xz

Posted Mar 31, 2024 12:19 UTC (Sun) by bluca (subscriber, #118303) [Link] (1 responses)

xz, lz4 and zstd are all supported, with a compile-time option to choose which one to use.

A backdoor in xz

Posted Mar 31, 2024 13:54 UTC (Sun) by nix (subscriber, #2304) [Link]

In hindsight, of course they are! I should have checked. (I'm even using zstd on my own system, but of course I forgot. I, uh, blame the clocks changing. ... what do you mean they only changed after I made that comment?)

A backdoor in xz

Posted Mar 30, 2024 21:49 UTC (Sat) by MarcB (guest, #101804) [Link]

This looks like a major improvement. At the very least, the spill-over into other processes' address spaces will be prevented.

Let's hope distributions follow up on this and reduce the set of essential packages.

