A backdoor in xz [LWN.net]

A backdoor in xz

Posted Mar 29, 2024 20:10 UTC (Fri) by cjwatson (subscriber, #7322) [Link] (18 responses)

The only problem is that it's a pretty horrible amount of code to have to inline. Look at pid_notify_with_fds_internal ...

Apparently unreleased versions of systemd dlopen liblzma instead, which would have meant it wasn't in sshd's process space.

A backdoor in xz

Posted Mar 29, 2024 20:28 UTC (Fri) by intelfx (subscriber, #130118) [Link] (3 responses)

> The only problem is that it's a pretty horrible amount of code to have to inline. Look at pid_notify_with_fds_internal

I don't think any of that code is needed. OpenSSH as patched only needs sd_listen_fds() and plain sd_notify() which _as used_ can be implemented in about 5-10 lines of C code each.
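
To make the sd_listen_fds() half of that claim concrete, here is a minimal sketch, in Python rather than C for brevity. It assumes only the documented socket-activation convention: LISTEN_PID must match the current process, LISTEN_FDS gives the count, and the passed descriptors start at 3. The name listen_fds is invented for this example; this is not the code OpenSSH or systemd uses.

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited descriptor under the socket-activation convention


def listen_fds() -> list[int]:
    """Return the descriptors passed by the service manager, or [] if there are none."""
    try:
        if int(os.environ["LISTEN_PID"]) != os.getpid():
            return []  # the variables were meant for some other process
        count = int(os.environ["LISTEN_FDS"])
    except (KeyError, ValueError):
        return []  # not socket-activated, or the variables are malformed
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

A caller would then wrap each returned descriptor in a socket object instead of binding its own listening socket.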

A backdoor in xz

Posted Mar 29, 2024 20:35 UTC (Fri) by cjwatson (subscriber, #7322) [Link] (2 responses)

sd_notify calls pid_notify_with_fds_internal, though? But if there's a reasonably standard inlined C reimplementation that covers all the necessary API surface, I'd definitely consider it.

A backdoor in xz

Posted Mar 30, 2024 1:12 UTC (Sat) by zdzichu (subscriber, #17118) [Link]

Those are the details of a specific implementation. The actual protocol is simple: an env var contains a socket path, and you write a short text string to it. Really just a few lines of code.

A backdoor in xz

Posted Mar 30, 2024 6:50 UTC (Sat) by intelfx (subscriber, #130118) [Link]

> But if there's a reasonably standard inlined C reimplementation that covers all the necessary API surface, I'd definitely consider it.

Yep, that's why I tried to emphasize "as used". The implementation you see is shared between several mostly-disjoint users (e.g. it is also used to communicate with hypervisors via vsock) and also implements other features of this ad-hoc protocol (such as fd passing) which are not used in openssh.

The usage in openssh (to signal readiness) is covered by writing a fixed, static text string into an AF_UNIX datagram socket pointed to by the $NOTIFY_SOCKET variable.
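
A minimal sketch of that readiness notification, in Python rather than C for brevity. It assumes only what is described above (a datagram sent to the AF_UNIX socket named by $NOTIFY_SOCKET) plus the convention that a leading "@" denotes a Linux abstract-namespace socket. The function name notify is made up for this example, and the sketch deliberately ignores the vsock and fd-passing features mentioned earlier.

```python
import os
import socket


def notify(state: str = "READY=1") -> bool:
    """Send a state string to the supervisor; return False if none is configured."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by a supervisor that speaks this protocol
    if path.startswith("@"):
        # A leading '@' stands for the NUL byte of an abstract-namespace address.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True
```

A daemon would call notify() once, after its listening sockets are set up and it can actually accept connections.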

A backdoor in xz

Posted Mar 29, 2024 21:05 UTC (Fri) by judas_iscariote (guest, #47386) [Link] (13 responses)

It will still be: SELinux requires it. Happy now? Supply chain attacks are not systemd's fault. :-)
It is more like corporations' fault for not paying people to work on things they profit from.

A backdoor in xz

Posted Mar 30, 2024 11:04 UTC (Sat) by fenncruz (subscriber, #81417) [Link] (12 responses)

I agree it's not systemd's fault, but is there something it (and other software) can do to make this attack harder? Like somehow preventing the symbols being replaced by a malicious library?

A backdoor in xz

Posted Mar 30, 2024 12:02 UTC (Sat) by bluca (subscriber, #118303) [Link] (7 responses)

We have already replaced all linked dependencies (apart from glibc and libcap) in libsystemd.so with dlopen (that is activated only if and when the specific API that needs the external library is called, not automatically) in git main
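
The "only if and when the API is called" pattern can be sketched as follows, in Python via ctypes (whose CDLL constructor calls dlopen()). This is an illustration of the general idea, not systemd's implementation; the class name LazyLibrary and its methods are invented for this example, and the caller is assumed to pass a soname or path that the dynamic loader can resolve.

```python
import ctypes


class LazyLibrary:
    """Defer the dlopen() of a shared library until one of its symbols is requested."""

    def __init__(self, soname: str):
        self._soname = soname
        self._handle = None  # nothing has been mapped into the process yet

    @property
    def loaded(self) -> bool:
        return self._handle is not None

    def symbol(self, name: str):
        if self._handle is None:
            # The library enters the address space only at this point.
            self._handle = ctypes.CDLL(self._soname)
        return getattr(self._handle, name)
```

A program that never calls symbol() never loads the library, which is the property being claimed for sshd and the compression libraries.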

A backdoor in xz

Posted Mar 30, 2024 14:12 UTC (Sat) by smurf (subscriber, #17840) [Link]

Interesting. I was about to thank you for your proactive response to this incident, but a look at systemd's git reveals that this change was done a month ago, in order to reduce systemd's footprint on startup RAM disks. ;-)

A backdoor in xz

Posted Mar 30, 2024 15:27 UTC (Sat) by dskoll (subscriber, #1630) [Link] (1 responses)

I understand the advantages of the dlopen approach, but it still leaves me feeling uneasy. You might get shared libraries that you don't expect dlopened just by making an innocent API call.

It seems to me that the supervisor notification protocol is likely to be used by many programs, and also quite likely that they might not want anything else from libsystemd. Wouldn't it make sense to put the notification client code in its own shared library that has no external dependencies and won't dlopen anything else ever?

A backdoor in xz

Posted Mar 30, 2024 15:52 UTC (Sat) by zdzichu (subscriber, #17118) [Link]

Funny, it was this way until v209 in 2014. sd-daemon was a collection of functions like sd_notify() and so on, it got merged into libsystemd then.

A backdoor in xz

Posted Mar 30, 2024 18:36 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

Sorry, but random dlopen()s are even MORE unacceptable. It also prevents very useful security measures like locking the text of the running executable.

A backdoor in xz

Posted Mar 30, 2024 19:14 UTC (Sat) by andresfreund (subscriber, #69562) [Link] (2 responses)

It doesn't prevent that at all? Unless you use text relocations, .text should only be mapped read only. And .got would have been remapped ro at start if you use -z now -z relro. Dlopen() doesn't change any of that?

A backdoor in xz

Posted Mar 30, 2024 19:41 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

I mean, locking down the complete set of executable pages in a process, so that no new code can get loaded. OpenBSD has mimmutable() that can do that, and mseal() has been proposed for Linux.

> Dlopen() doesn't change any of that?

Indeed it doesn't (right now), but expanding its usage will make it harder to enable something like mseal() later.

A backdoor in xz

Posted Mar 31, 2024 13:13 UTC (Sun) by bluca (subscriber, #118303) [Link]

You can still do that, but then you lose some features. That seems like a perfectly acceptable trade-off to me.

A backdoor in xz

Posted Mar 30, 2024 16:53 UTC (Sat) by judas_iscariote (guest, #47386) [Link] (3 responses)

Yes, you can prevent symbols from being replaced by something else with various compiler and linker flags and possibly environment variables. It will still be a cat-and-mouse game, because something that has root already has all the power.

A backdoor in xz

Posted Mar 30, 2024 19:05 UTC (Sat) by andresfreund (subscriber, #69562) [Link] (1 responses)

Afaict all the options for doing so were used in this case. The redirection happened just before the got was remapped read only.

I'm somewhat surprised that nobody called for glibc's rtld-audit infrastructure to be removed. That's really what made this attack possible despite relro. As far as I know, it's not used widely.

A backdoor in xz

Posted Mar 31, 2024 13:30 UTC (Sun) by nix (subscriber, #2304) [Link]

Perhaps it should be possible to set some sort of link-time tag to instruct ld.so to disable the LD_AUDIT infrastructure for particular binaries? Not sure that's doable for specific shared libraries, but at least this would let one mark critical system daemons as "hands-off" for this application, so their own libraries can't compromise them like this. It's enough like AT_SECURE or coredump/ptrace prevention that there should probably be one mechanism to turn all this stuff on at the same time... (For userspace stuff latrace and the things that it enables are actually quite useful, but I can't imagine ever running latrace on sshd, and if I did I'm debugging it anyway and would be at the very least foregrounding it and could presumably manually turn auditing back on. For that matter, latrace could be modified to do that to the programs it invokes, since it knows its own use of LD_AUDIT is non-malicious.)

A backdoor in xz

Posted Mar 31, 2024 6:37 UTC (Sun) by epa (subscriber, #39769) [Link]

I think if the symbol-replacing were not allowed, nor arbitrary code execution on *loading* the library, then the attack would be more difficult. The application does not call any functions from xz. An attacker would have to get a backdoor into the library and somehow persuade sshd to call it.

A backdoor in xz

Posted Mar 29, 2024 20:15 UTC (Fri) by bkw1a (subscriber, #4101) [Link] (4 responses)

Every time a patch pulls in a new dependency, it increases our attack surface. That needs to be weighed against the benefit of the patch. For something like sshd, it seems like the openssh developers, who have security as their primary focus, should be the ones we trust to make that decision.

A backdoor in xz

Posted Mar 29, 2024 20:23 UTC (Fri) by cjwatson (subscriber, #7322) [Link] (2 responses)

I mean, look, I defer to the openssh developers on a _lot_ of stuff, but they're not the ones trying to integrate with the rest of our distribution and that does sometimes force some different decisions. The best I can do is document all the deviations as clearly as possible.

A backdoor in xz

Posted Mar 29, 2024 22:02 UTC (Fri) by dilinger (subscriber, #2867) [Link] (1 responses)

Also, what *is* "critical security infrastructure"? Is firefox/chromium critical security infrastructure? Is glibc? libz? libsasl? libselinux? Systemd does a whole lot of critical things on my system; is that critical security infrastructure that we shouldn't be patching?

On a lot of desktops, sshd isn't even installed. Is it critical security infrastructure because it's installed on some servers you consider important? What about the other daemons installed on important servers, like nginx/apache (and often the whole lamp stack)?

If you actually look at attack vectors, you start realizing pretty quickly that A LOT of software could (or should) be considered critical security infrastructure, and it's pretty unrealistic to not have to patch all of those bits of software to work on Debian's many desktop/server environments and hardware architectures. That also assumes that we can trust upstreams to not backdoor their code, which, as this example shows us, we clearly cannot.

A backdoor in xz

Posted Apr 3, 2024 5:44 UTC (Wed) by Lennie (subscriber, #49641) [Link]

The funny part is: any software installed becomes critical security infrastructure if a FOSS developer develops the software on their primary laptop, which holds the SSH keys used for git commit signing and git push.

A backdoor in xz

Posted Mar 29, 2024 23:58 UTC (Fri) by mcatanzaro (subscriber, #93033) [Link]

Sounds good, but in this case I think that's just wrong. You really want systemd to accurately know whether sshd is running or not. If systemd doesn't know, then you don't know, and that's a security disaster.

A backdoor in xz

Posted Mar 29, 2024 23:38 UTC (Fri) by cjwatson (subscriber, #7322) [Link]

I guess I need to amend this since https://bugzilla.mindrot.org/show_bug.cgi?id=2641#c13 happened. If something like that gets in then we'll definitely adopt it in Debian.

A backdoor in xz

Posted Mar 30, 2024 1:11 UTC (Sat) by DimeCadmium (subscriber, #157243) [Link] (48 responses)

Why does my ssh server need to integrate with my service manager?

A backdoor in xz

Posted Mar 30, 2024 1:40 UTC (Sat) by bluca (subscriber, #118303) [Link] (47 responses)

Because the service manager needs to know when the ssh server is ready

A backdoor in xz

Posted Mar 30, 2024 5:30 UTC (Sat) by wtarreau (subscriber, #51152) [Link] (11 responses)

Why ? Your response sounds more like "wants to know".

A backdoor in xz

Posted Mar 30, 2024 5:48 UTC (Sat) by rra (subscriber, #99804) [Link] (10 responses)

So that the system administrator who just restarted the ssh server knows it didn't actually start and doesn't log out before fixing it.
So that other services that depend on the ssh server being started know when to start.
So that when you ask the service manager what services failed, you'll know that the ssh server failed.
So that you have an actual service manager, not a bunch of YOLO shell scripts with no error handling.

A backdoor in xz

Posted Mar 30, 2024 8:12 UTC (Sat) by DimeCadmium (subscriber, #157243) [Link] (9 responses)

If it didn't actually start, then it shouldn't have forked off and backgrounded itself, which is how services notified the service manager that they had successfully started for literal decades before systemd came along.

A backdoor in xz

Posted Mar 30, 2024 8:23 UTC (Sat) by mb (subscriber, #50428) [Link] (4 responses)

Yep, and it was broken all the time because everybody did it differently and slightly wrong, until systemd came along.
But let's not distract from the discussion: systemd is *not* why this backdoor was possible. It could have been any other library. It could even have been any other server application. It's not restricted to sshd.

The real problem is that patches that have not been understood/reviewed have been applied.
This is a social problem. Not a technical one.

A backdoor in xz

Posted Mar 30, 2024 12:46 UTC (Sat) by stef70 (guest, #14813) [Link] (1 responses)

Indeed. We need to wait for the full analysis of the backdoor to be sure that no tool other than sshd was targeted.

On my Debian system, liblzma.so is linked in several programs and libraries. A lot are unrelated to systemd: grub, insmod, lvm, reboot, gimp, imagemagick, runlevel, ...

All of them are potential targets for that xz backdoor. For now, we have to wait for the full analysis. I am pretty optimistic that sshd was the main target, because installing another backdoor on the system or calling "home" would significantly increase the probability of detection.

A backdoor in xz

Posted Mar 30, 2024 23:33 UTC (Sat) by brooksmoses (guest, #88422) [Link]

There is code in the exploit that would look for additional files in the "test" directory that matched specific byte patterns, and then extract a payload from them and execute it. There are currently no files matching those patterns -- so it certainly looks like this bit was designed as a capability to target additional programs simply by adding additional "test" files to the git repository.

[Reference: https://github.com/Midar/xz-backdoor-documentation/wiki#s... as of the time of this comment.]

A backdoor in xz

Posted Mar 31, 2024 1:25 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link] (1 responses)

> Yep, and it was broken all the time because everybody did it differently and slightly wrong, until systemd came along.

Ah, okay. And how exactly do you believe that one method of notification is any more reliable at this than any other? They all rely on the software developer picking a good time to say "started".

> But let's not distract from the discussion: systemd is *not* why this backdoor was possible

It absolutely is.

> It could have been any other library

But it wasn't. "Don't worry about our vulnerabilities, other people have vulnerabilities too!" "Don't worry about our bad design, other people have bad design too!"

A backdoor in xz

Posted Mar 31, 2024 9:22 UTC (Sun) by smurf (subscriber, #17840) [Link]

> They all rely on the software developer picking a good time to say "started".

They all rely on picking a good time that happens to *work*.

There are plenty of situations where, once you're *really* started, it's no longer possible to signal "OK I'm alive now" by double-forking.

Writing a PID file has its own class of race conditions, the handling of which I can guarantee most users of that method get fatally wrong.

And so on.

> "Don't worry about our vulnerabilities, other people have vulnerabilities too!" "Don't worry about our bad design, other people have bad design too!"

Don't blame the messenger. If linking to a library you don't strictly need *in your particular situation* is a "vulnerability" or "bad design" I can guarantee that 90+% of programs out there suffer from it.

A backdoor in xz

Posted Mar 30, 2024 9:53 UTC (Sat) by motk (guest, #51120) [Link]

Yeah, and it sucked. It sucked 35 years ago. It still sucks.

This whole thing has nothing to do with service management, and everything to do with large corporations relying on volunteers writing critical software apparently just for something to do.

A backdoor in xz

Posted Mar 30, 2024 16:58 UTC (Sat) by rra (subscriber, #99804) [Link] (2 responses)

> If it didn't actually start, then it shouldn't have forked off and backgrounded itself, which is how services notified the service manager that they had successfully started for literal decades before systemd came along.

I have run UNIX systems throughout those literal decades that you are talking about, and your faith in this half-assed, failure-prone mechanism is badly misplaced. I cannot count the number of ways I have seen this fail: the process does not actually start listening to the network until after the fork, the process starts listening before the fork but isn't really ready to accept connections because there is setup that has to be done after the fork, the process forks but doesn't fork twice and thus isn't properly reparented, the process didn't write a PID file and now you have no idea which process is actually running the service, the process did write a PID file and wrote the wrong PID to that file, you end up with multiple backgrounded copies of the same service running and interfering weirdly with each other... the list goes on.

We figured out that this was a bad way to run services by at least the early 2000s, when support for a foreground model with none of this self-daemonization nonsense badly copied into every service became widely available (and as someone who was managing UNIX systems all through that period, that was a delightful revelation). But you do not want to assume that the service is ready simply because the process has started. You need some mechanism for signaling that the service really has fully started, has allocated all of its resources, and is listening to network connections (if that is its job). Otherwise, you risk starting services that depend on it too soon.

Even upstart (the alternative preferred by some of the folks who disliked systemd) had a mechanism for doing this. (It was worse than systemd's, at least in my opinion.)

A backdoor in xz

Posted Mar 31, 2024 1:25 UTC (Sun) by DimeCadmium (subscriber, #157243) [Link] (1 responses)

So have I, and YOUR faith in systemd's half-assed, failure-prone mechanism is badly misplaced.

Please stop here

Posted Mar 31, 2024 1:45 UTC (Sun) by corbet (editor, #1) [Link]

We have managed to keep this conversation relatively free of systemd bashing, which is really not relevant to the discussion. Please don't do any more of it here.

A backdoor in xz

Posted Mar 30, 2024 7:09 UTC (Sat) by epa (subscriber, #39769) [Link] (34 responses)

Fair enough. But why does that functionality need to pull in xz support? The ssh daemon does not itself do xz compression in order to integrate with systemd.

If the answer is “because it links as a C library and you get the transitive dependencies of everything”, that’s something to improve.

A backdoor in xz

Posted Mar 30, 2024 7:32 UTC (Sat) by mb (subscriber, #50428) [Link] (1 responses)

>because it links as a C library and you get the transitive dependencies of everything

So, statically link with LTO?

A backdoor in xz

Posted Mar 31, 2024 14:23 UTC (Sun) by dskoll (subscriber, #1630) [Link]

No, static linking isn't needed. Just split the large libsystemd into smaller libraries where each smaller library contains a set of closely-related APIs and minimal other dependencies. There's no reason to pull code in to do log compression if all you need is code for the sd_notify protocol.

A backdoor in xz

Posted Mar 30, 2024 7:50 UTC (Sat) by cjwatson (subscriber, #7322) [Link] (31 responses)

Indeed, and apparently unreleased versions of systemd already trim down the linkage of libsystemd so that liblzma won't be in the process space unless it's actually needed.

A backdoor in xz

Posted Mar 30, 2024 10:24 UTC (Sat) by job (guest, #670) [Link] (30 responses)

Doesn't that obscure what is happening, which risks making a not good situation even worse? The situation with a backdoored library would still be there, just harder to diagnose.

A backdoor in xz

Posted Mar 30, 2024 12:07 UTC (Sat) by bluca (subscriber, #118303) [Link] (25 responses)

The impact is reduced, because dlopen only happens if and when the API using the library is called by the program linking to libsystemd, rather than by default. So in this case it would not have happened, because sshd does not read compressed journal files, which is the reason compression libs are linked in libsystemd.

Dependency chain of a full-feature build of libsystemd from main (plus a PR under review):

build/libsystemd.so.0 (interpreter => None)
libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
ld-linux-x86-64.so.2 => /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2

We want to remove the need for libcap too, but that's a bit more complex.

A backdoor in xz

Posted Mar 30, 2024 18:23 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (24 responses)

This is an EXTREMELY bad move from systemd. A dlopen() is a much more worrying signal of exploitation, because it's so unused. And libsystemd will make it normal.

It also won't close off all avenues of attack. A malicious library can patch the code, ptrace() its process, modify the environment, etc.

A backdoor in xz

Posted Mar 30, 2024 19:12 UTC (Sat) by andresfreund (subscriber, #69562) [Link] (16 responses)

There already are dlopens in things like sshd, via e.g. PAM.

A backdoor in xz

Posted Mar 30, 2024 19:36 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (14 responses)

Yeah, and I also forgot about the horror of nsswitch.

Still, we should start cutting back on this kind of nonsense.

A backdoor in xz

Posted Mar 31, 2024 13:46 UTC (Sun) by nix (subscriber, #2304) [Link] (13 responses)

So to you, dlopen is a signal of exploitation and should be avoided because it's so rare, until it is pointed out that it's not rare and is already used in a wide variety of processes, whereupon you switch to calling unclear things 'this kind of nonsense', cite nsswitch (which is not relevant, given that PAM is at issue here), and suggest, what? Removing PAM and nsswitch?

That's going to work really well given how many sites use both to fold in new hostname lookup mechanisms, new user lookup mechanisms, and new and fairly complex authentication patterns on the fly.

Anyway, dropping nsswitch and PAM wouldn't even really help, despite being immensely disruptive. dlopen does have its problems[1] and it is reasonable to prefer to avoid it when possible, but it is not rare even in the absence of nsswitch and PAM. Try adding reporting to glibc to see how often it's invoked on real running systems. (It's a *lot*. Even syslog daemons make extensive use of it these days, so you can't even say "perhaps daemons running as root can't dlopen".)

You cannot use 'this uses dlopen' as a signal of suspiciousness, or of anything really, any more than 'this is dynamically linked' is such a signal. "This has IFUNC resolvers that redirect symbols in other libraries" is definitely an actual sign of badness that I've never heard of anything legitimate doing, and I'm wondering if glibc could detect and block that somehow without too much cost (it would at least involve stack frame walks, but the resolver has to mess with the stack frame anyway...)

[1] now that prelink is dead, mostly that you can't use ldd to statically determine what the shared library dep tree is and what things might be potentially impacted by ABI changes, which *is* actually problematic on real systems
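
Since ldd cannot see dlopen()ed libraries, one way to observe them is at run time. The following is a rough, Linux-only sketch that reads /proc/<pid>/maps and collects the shared objects actually mapped into a process. The function name mapped_libraries and the ".so in the file name" heuristic are assumptions made for this example, not an established tool.

```python
import os


def mapped_libraries(pid: str = "self") -> set[str]:
    """Return the paths of shared objects currently mapped into a process (Linux only)."""
    libs = set()
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            fields = line.split(maxsplit=5)
            # The sixth field, when present, is the path of the backing file.
            if len(fields) == 6:
                path = fields[5].strip()
                if ".so" in os.path.basename(path):
                    libs.add(path)
    return libs
```

Comparing the result before and after a suspect API call shows which libraries that call pulled in, though only for the code paths actually exercised.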

A backdoor in xz

Posted Mar 31, 2024 16:12 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (12 responses)

> So to you, dlopen is a signal of exploitation and should be avoided because it's so rare, until it is pointed out that it's not rare and is already used in a wide variety of processes, whereupon you switch to calling unclear things 'this kind of nonsense', cite nsswitch (which is not relevant, given that PAM is at issue here), and suggest, what? Removing PAM and nsswitch?

Yeah, exactly. Remove dlopen() calls by refactoring the relevant systems. For example, musl libc does not have nsswitch (and has a built-in NSCD). PAM is already optional.

A backdoor in xz

Posted Mar 31, 2024 17:02 UTC (Sun) by nix (subscriber, #2304) [Link] (11 responses)

So... that this wouldn't actually help solve this problem is not important to you, then? (You clipped that out of my original reply without comment.)

That's a sign of someone on a hobby-horse if I ever heard of one.

(As someone who needs PAM to even log on -- on account of wanting to use YubiKey OTP to do so -- and who uses nsswitch for a variety of homebrewed lookups, I would obviously not be willing to drop either.)

A backdoor in xz

Posted Apr 1, 2024 12:19 UTC (Mon) by foom (subscriber, #14868) [Link] (10 responses)

Nsswitch has an obvious replacement for dlopen: sockets. They're already used in many interesting scenarios, e.g. host lookup is via DNS to localhost, user database often comes from libnss_ldapd or sssd — both of which simply implement a private socket protocol in their nsswitch library to talk to their corresponding service on localhost.

Then of course there's nscd, as already mentioned: a socket protocol for nsswitch lookups already implemented by glibc and musl. Someone could implement a different nscd server-side which doesn't use dlopen — without even modifying glibc. Yet, as far as I know, nobody actually has done so.

On the PAM side there's no similarly easy replacement, though one could investigate OpenBSD's BSD Auth system, which is extensible via spawning subprocesses to handle auth tasks.

In any case, that nobody seems to actually be working on any of this probably shows just how unimportant avoiding dlopen is for most people...

A backdoor in xz

Posted Apr 1, 2024 16:48 UTC (Mon) by nix (subscriber, #2304) [Link] (9 responses)

Avoiding the need to dlopen in statically linked binaries, while not losing nsswitch for such binaries, actually *does* matter to upstream (it would simplify ld.so a whole hell of a lot). So switching to a socket-based protocol is definitely on the cards.

The problem, as ever, is doing that compatibly -- but I suppose if glibc itself provided the 'nss server' that loaded existing nss modules and did everything else nsswitch did, and glibc called into it using the sort of thing you describe, this sort of thing might be practical: it would probably make nscd less of a horror show, too. With a lot of work (how many nss modules depend on being in the same address space as the running process, for starters? I bet it's not zero. And I bet this would slaughter performance for simpler cases, so maybe nss_files still needs to be built in. And so forth...)

A backdoor in xz

Posted Apr 1, 2024 18:04 UTC (Mon) by foom (subscriber, #14868) [Link] (1 responses)

An "nss server" is literally what nscd _already is_!

If you run the nscd service, then glibc sends nss lookups to nscd over a socket, instead of running them inside other binaries.

Nscd comes with a caching layer (unsurprisingly given its name), but you can mostly disable that if you only want the nss-server functionality.

A backdoor in xz

Posted Apr 2, 2024 17:10 UTC (Tue) by nix (subscriber, #2304) [Link]

Oh, of course it is. I am clearly missing the obvious right now :(

A backdoor in xz

Posted Apr 1, 2024 18:23 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> Avoiding the need to dlopen in statically linked binaries, while not losing nsswitch for such binaries, actually *does* matter to upstream (it would simplify ld.so a whole hell of a lot). So switching to a socket-based protocol is definitely on the cards.

glibc is the worst library in existence, so no wonder.

On the other hand, musl libc simply uses the nscd protocol to provide the NSS functionality and even allows wrapping legacy NSS modules: https://github.com/pikhq/musl-nscd

Additionally, with musl I can _already_ get a fully static system with zero dlopen()s or dynamic libraries. There are even several experimental distros that are fully statically linked. E.g.: https://framagit.org/Ypnose/solyste

A backdoor in xz

Posted Apr 2, 2024 17:12 UTC (Tue) by nix (subscriber, #2304) [Link] (5 responses)

> glibc is the worst library in existence, so no wonder.

At this point I'm wondering if you're just being intentionally unpleasant. glibc navigates a frankly horrifying pile of tradeoffs and does the job fairly well given that. If it was "the worst library in existence" it would not be *remotely* so widely used, nor work as well as it does.

A backdoor in xz

Posted Apr 2, 2024 22:54 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

I've been holding this opinion about glibc for decades now. I understand the difficulty of developing glibc, and it excuses at least some warts. But then we have musl which is so much nicer, while being standards-compliant.

I believe we should take at least _some_ of that experience and apply it to the rest of the system. Being static-friendly and not dlopen()-ing stuff is definitely a part of that.

BTW, does dlopen() in libsystemd preclude its static linking?

A backdoor in xz

Posted Apr 3, 2024 11:17 UTC (Wed) by nix (subscriber, #2304) [Link] (3 responses)

> BTW, does dlopen() in libsystemd preclude its static linking?

In the future in glibc, yes. In all other libcs I'm aware of, yes, even now. (Or, rather, you can *try* to call it in statically-linked binaries, but the call will always fail.)

This is of course one of many reasons why just statically linking everything is not the panacea some seem to think -- plugins really *are* a thing and sometimes loadable shared code in the same address space is a convenient way to implement them... there's not a chance you'll ever get KDE to work statically linked, for instance.

A backdoor in xz

Posted Apr 3, 2024 16:30 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> This is of course one of many reasons why just statically linking everything is not the panacea some seem to think -- plugins really *are* a thing

Plugins are a thing that has no business being in the foundational parts of the runtime. And it's not like we don't have a real-world example of a system without them, Alpine Linux exists. And it's significantly nicer to work with than the glibc-based systems.

A backdoor in xz

Posted Apr 4, 2024 12:49 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

> Plugins are a thing that has no business being in the foundational parts of the runtime.

I am not convinced, and since as usual you didn't bother to give any reasons, relying instead on pure assertion, I'm not sure why you think this not-an-argument would ever convince anyone who didn't already agree with you.

Why on earth would you consider name lookup or authentication, both things that have had numerous wildly divergent implementations over time and which obviously have different site-by-site requirements, hence the *existence* of pluggable systems to implement them, to be things that "have no business" existing, based on the pure assertion that they are "in the foundational parts of the runtime"? People are *using* nss and PAM's extensibility, you know. They're not just there to annoy you. This is not a moribund module system with a half-dozen stale modules that have hardly changed in the last twenty years. People are plugging other things into that pluggability. (Not that this attack even *relied* on that pluggability, or NSS, or PAM, so why you think ripping them out will help here is quite beyond me.)

For that matter, what on earth even is a "foundational part of the runtime"? Is it the toolchain? Surely that counts if anything does! Better rip out LTO from GCC and clang then, since both rely on linker plugins that run the entire compiler! (Also, how many linker plugins are there? I can hardly name any but LTO. That's gotta be moribund, rip it out!) Is it the kernel? Better rip out kernel modules then, in-tree or not, since if they're not dynamically loaded plugins, nothing is... is it glibc? Surely not, since you can replace it with any other libc you like and keep the kernel and most userspace the same after a recompile: it could hardly be foundational! So I guess NSS can stay. Not sure about PAM, the idea came from Solaris and you have in the past expressed a liking for that sort of thing so maybe that's good now too?

That's the problem with arguing by pure assertion: since you give no reasons, define none of your terms, and provide no grounds to agree with you, there's no reason to accept your premises: and even if I do, it's easy to argue in the exact opposite direction since the premises are so vague, which makes your argument nothing more than a statement of personal preferences and an assertion that of course *your* personal preferences are more important than anyone else's.

Is the real definition "something Cyberax asserts without argument or rationale is foundational"? Or just "Cyberax is right"?

A backdoor in xz

Posted Apr 4, 2024 17:24 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

I thought that the reasons for NOT doing plugins are obvious. They add a huge amount of complexity, preclude useful mitigations (mseal/mimmutable), and make the system harder to analyze. You can't statically determine the dependency closure anymore.
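To make the "dependency closure" point concrete, here is a minimal sketch in Python. The graph is a hand-written toy standing in for what a tool would read from each object's DT_NEEDED entries. The library names are the ones discussed in this thread, but the edges and the `needed_closure` helper are illustrative assumptions, not output from any real binary.

```python
from collections import deque

# Toy DT_NEEDED graph (hypothetical edges, for illustration only).
DT_NEEDED = {
    "sshd": ["libsystemd", "libc"],
    "libsystemd": ["libc"],
    "liblzma": ["libc"],
    "libc": [],
}

# Dependencies reached only through dlopen() at run time.
# A static walk over DT_NEEDED never sees these edges.
DLOPENED = {"libsystemd": ["liblzma"]}


def needed_closure(root, graph):
    """Return every object reachable from root by following graph edges."""
    seen, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# What a static analyzer can see.
static_view = needed_closure("sshd", DT_NEEDED)

# What may actually end up mapped once dlopen() edges are included.
merged = {name: DT_NEEDED[name] + DLOPENED.get(name, []) for name in DT_NEEDED}
runtime_view = needed_closure("sshd", merged)

print(sorted(static_view))   # ['libc', 'libsystemd']
print(sorted(runtime_view))  # ['libc', 'liblzma', 'libsystemd']
```

The difference between the two sets is exactly the part that cannot be recovered from the ELF headers alone.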

Plugins inherently face a complicated environment that they don't control and should not perturb too much. And a crashed plugin will take down the entire application. This was reasonable 30 years ago, but it's not anymore. These days, we actually have a good architectural pattern for this: split modules into a separate daemon that is activated by systemd as needed.

> People are *using* nss and PAM's extensibility, you know.

NSS is actually hardly used these days; NIS/NIS+ have mostly died out. The only major surviving service is LDAP (usually via SSSD). It can simply be incorporated into glibc (it's 43kb), or it can be split into a daemon that talks to glibc via the NSCD protocol.

If we're talking about PAM in particular, then it's nothing but a stack of bad design decisions. In the case of SSH, it can be replaced by ephemeral SSH certificates in most scenarios (e.g. a shared machine in a university or management access to a production cluster on AWS EC2).

These two items will make most non-interactive systems completely dlopen()-free.

A backdoor in xz

Posted Apr 1, 2024 14:41 UTC (Mon) by job (guest, #670) [Link]

In retrospect, I think most people would agree that the design of PAM was a mistake. It was hugely controversial at the time and many fought against its inclusion in distributions.

In the end it was included because it made possible some use cases where no one else stepped up to make a practical alternative.

I don't think that is something we want to emulate. It is certainly possible to satisfy the necessary use cases without resorting to dlopen().

A backdoor in xz

Posted Mar 31, 2024 12:12 UTC (Sun) by bluca (subscriber, #118303) [Link] (6 responses)

The main reason this is done and will happen is to reduce mandatory dependencies. If ELF on Linux had better support for optional dependencies, ones that are loaded only when needed, then there wouldn't be any need to call dlopen() manually. I believe OSX's shared object format implements this. But we are where we are, and hence that's the only mechanism we've got.

A backdoor in xz

Posted Mar 31, 2024 13:49 UTC (Sun) by nix (subscriber, #2304) [Link]

Hmm. That's interesting! This is kind of a DT_NEEDED which kicks in (and loads dependent libs, runs constructors etc) only when the first symbol in it is called, kind of like lazy binding but doing a lot more than just a symbol resolution?

That's tricky to implement (because doing things in the resolver is *always* a bit tricky) but I can't immediately think of any reason why it's *impossible*. It would need a new dynamic tag of course, DT_LAZY_NEEDED? DT_NEEDED_OPTIONAL?

You couldn't use the simpleminded approach above for everything (good luck making this work for things like data symbols where the GOT is needed before the PLT or in general anywhere you couldn't have used lazy binding before, or where you need the shared library's ELF constructors to run early, or where TLS inadequacies would prevent dlopen from working happily -- and it has the same security implications as using lazy binding) but it should work in a fairly large proportion of cases.

A backdoor in xz

Posted Mar 31, 2024 16:13 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

> The main reason this is done and will happen is to reduce mandatory dependencies

No, it's not. It's done to _paper_ over dependencies, making them harder to discover statically and creating wonderful race conditions if mimmutable() is used at an inopportune moment. It's an all-around bad decision.

A backdoor in xz

Posted Mar 31, 2024 17:06 UTC (Sun) by nix (subscriber, #2304) [Link] (3 responses)

Since mimmutable() does not exist on Linux, making changes in Linux-only software like systemd to allow for it seems deeply bizarre, particularly when those changes *reduce* security (like, say, increasing the set of always-loaded libraries to include some which have just been seen to launch attacks when loaded, rather than loading as many as possible of them only as needed).

What next? Shall we make changes to allow for Windows's per-libc malloc(), or for Linux's not-at-all-planned upcoming transition to Mach-O?

A backdoor in xz

Posted Mar 31, 2024 18:54 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> Since mimmutable() does not exist on Linux

This is subject to change: https://lwn.net/Articles/958438/

> particularly when those changes *reduce* security

They don't. libsystemd will _still_ depend on xz, it just will be hidden from cursory analysis.

> What next? Shall we make changes to allow for Windows's per-libc malloc(),

That's actually a pretty good idea, that will make several classes of vulnerabilities more difficult to exploit.

> or for Linux's not-at-all-planned upcoming transition to Mach-O?

I'd take PE: https://blog.hiler.eu/win32-the-only-stable-abi/

A backdoor in xz

Posted Mar 31, 2024 19:36 UTC (Sun) by nix (subscriber, #2304) [Link] (1 responses)

> They don't. libsystemd will _still_ depend on xz, it just will be hidden from cursory analysis.

I honestly wonder if you're even reading this thread. This attack depended on liblzma being loaded into sshd's memory because it was loaded by virtue of DT_NEEDED: after this commit, it would not be loaded at all, because libsystemd would only have loaded it if compressed journal reading was attempted, which sshd never attempts.

So it *would* in fact solve the problem.
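A minimal sketch of that load-on-first-use pattern, in Python for brevity. `LazyLibrary` is a hypothetical helper written for this illustration and is not how libsystemd implements it; the only real API it relies on is `ctypes.CDLL`, which calls dlopen() on POSIX systems.

```python
import ctypes


class LazyLibrary:
    """Defer dlopen() of a shared library until a symbol is first used."""

    def __init__(self, soname, loader=ctypes.CDLL):
        self._soname = soname
        self._loader = loader  # ctypes.CDLL performs the dlopen() on POSIX
        self._handle = None

    @property
    def loaded(self):
        return self._handle is not None

    def __getattr__(self, symbol):
        # Only called for names not found on the object itself,
        # i.e. for symbols that live in the library.
        if self._handle is None:
            self._handle = self._loader(self._soname)
        return getattr(self._handle, symbol)
```

A process that creates `LazyLibrary("liblzma.so.5")` but never touches a symbol through it never maps the library, so nothing in that library (constructors, IFUNC resolvers) gets a chance to run in that process.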

But I'm tired of arguing with a brick wall with prejudged opinions, I think. Good night.

A backdoor in xz

Posted Mar 31, 2024 20:16 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

> This attack

Reread your words. THIS attack. As in, this _particular_ one. Sure, having the library dlopen()-ed prevents it. I can think of several ways I can backdoor liblzma to work around it.

Making the system usable with mimmutable/mseal would prevent whole categories of exploits. And promoting the dlopen() craze will make this kind of mitigation impossible.

And yeah, I absolutely hate the braindead design of nsswitch, PAM, and now libsystemd.

A backdoor in xz

Posted Mar 30, 2024 17:04 UTC (Sat) by rra (subscriber, #99804) [Link] (3 responses)

I believe this specific exploit relied on being loaded into the process's address space early so that it could set up IFUNCs. I am very far from an expert in how this works, but if I understand correctly, it would be too late to do this during dlopen (if the library were even dlopened; the primary mitigation is that sshd would never have dlopened liblzma at all with this new systemd design).

A backdoor in xz

Posted Mar 30, 2024 17:18 UTC (Sat) by nix (subscriber, #2304) [Link] (2 responses)

IFUNCs are invoked upon dlopen() and can alas do the same sort of evil thing then that they do here (though they're not supposed to), but of course in this case libsystemd would never have done the dlopen() so the IFUNC would never have had a chance to execute.

IFUNCs are not really the villain here. It is perfectly possible for liblzma to have done the same sort of evil using only perfectly normal symbol interposition, dlsym(..., RTLD_NEXT) and ELF constructors.

A backdoor in xz

Posted Mar 30, 2024 19:08 UTC (Sat) by andresfreund (subscriber, #69562) [Link] (1 responses)

It would have been harder and noisier to do what the backdoor did during dlopen, even if it were called. By that time sshd's .got would have been read only, so redirection would have required remapping.

A backdoor in xz

Posted Mar 30, 2024 23:21 UTC (Sat) by nix (subscriber, #2304) [Link]

I dunno, some parts would have been easier -- after dlopen() you can at least trust that libc functions etc can be freely called, which is somewhat risky with IFUNCs in libraries loaded via DT_NEEDED. (But more likely they'd just have hunted something else down and attacked that instead. They only have to be lucky once...)

A backdoor in xz

Posted Mar 30, 2024 6:16 UTC (Sat) by mchehab (subscriber, #41156) [Link] (5 responses)

> It's pretty difficult to get systemd integration right without patching sshd. Upstream are BSD folks and very firmly Don't Care About Systemd; there's no hope of getting that patch upstream.

Why would systemd possibly require any integration with sshd? Originally, it started as a replacement for SysV init, meant to make system init faster. See https://0pointer.de/blog/projects/systemd.html:

> For a fast and efficient boot-up two things are crucial:
>
> - To start less.
> - And to start more in parallel.

In practice, system init is now a lot heavier and takes a lot more time to start a system than it did with SysV init.

It is also no longer just a PID 1 replacement: it integrates and interacts with almost everything needed for a system to run, including audit trails/logs.

With that, it became a component that can be compromised indirectly via changes to dozens (or hundreds?) of different components that are not directly related to systemd itself. That opened a window for what just happened, where malicious code introduced into xz is capable of compromising systems that carry out-of-tree systemd integration patches.

IMO, systemd should return to its roots and stop requiring interactions with other packages unrelated to PID 1's task.

A backdoor in xz

Posted Mar 30, 2024 6:55 UTC (Sat) by intelfx (subscriber, #130118) [Link] (4 responses)

> Why would systemd possibly require any integration with sshd?

To signal (and receive) the readiness state of the daemon in question. Not more, not less.

> IMO, systemd should return to its roots and stop requiring interactions with other packages unrelated to PID 1's task.

I'd say that "reliably determining whether the supervised process has successfully started up" (i. e. loaded and parsed its configuration, bound all the necessary sockets, did not encounter any other failures) is very much within the definition of the PID 1's task.
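For reference, the sending side of that readiness protocol is small. The following is a minimal sketch in Python, not the libsystemd implementation: `notify_ready` is a name made up for this example, and it covers only the plain "READY=1" datagram sent to the AF_UNIX socket named by $NOTIFY_SOCKET (no fd passing, no vsock).

```python
import os
import socket


def notify_ready(state=b"READY=1"):
    """Send a readiness datagram to the supervisor, if one is listening.

    Returns True if a message was sent, False if NOTIFY_SOCKET is unset.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by a supervisor that wants notification
    addr = os.fsencode(path)
    if addr.startswith(b"@"):
        # A leading "@" denotes a Linux abstract-namespace socket,
        # which is addressed with a leading NUL byte instead.
        addr = b"\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state, addr)
    return True
```

A daemon would call `notify_ready()` once, after it has parsed its configuration and bound its listening sockets.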

A backdoor in xz

Posted Mar 30, 2024 22:57 UTC (Sat) by mchehab (subscriber, #41156) [Link] (1 responses)

> > Why would systemd possibly require any integration with sshd?

> To signal (and receive) the readiness state of the daemon in question. Not more, not less.

System V init never needed that, as there are simple generic solutions to monitor that. Basically, when a process forks a child and that child dies, the parent is notified. This is well-defined, POSIX-defined behavior.
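A minimal sketch of that fork-and-wait mechanism, in Python. `run_and_reap` is a hypothetical helper name; the calls underneath it (`os.fork`, `os.waitpid`, `os.WIFEXITED` and friends) are thin wrappers over the POSIX interfaces.

```python
import os


def run_and_reap(child_main):
    """Fork, run child_main() in the child, and return how the child ended.

    Returns the exit code, or the negated signal number if the child
    was killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: always leave via os._exit() so that control
        # never returns into the parent's code path.
        code = 1
        try:
            code = child_main()
        finally:
            os._exit(code)
    # Parent process: waitpid() blocks until the kernel reports the
    # child's termination.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

The parent learns that the child terminated and with which status; that is the only event this mechanism reports.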

> > IMO, systemd should return to its roots and stop requiring interactions with other packages unrelated to PID 1's task.
>
> I'd say that "reliably determining whether the supervised process has successfully started up" (i. e. loaded and parsed its configuration, bound all the necessary sockets, did not encounter any other failures) is very much within the definition of the PID 1's task.

It should be up to the sshd process - and to all other system daemons - to die if it fails to parse its configuration and/or bind the necessary sockets. The task of PID 1 is to monitor whether the process is dying too fast and, in such cases, to take some action.

There's absolutely no need to modify system daemons, implementing non-POSIX out-of-tree hacks just for PID 1 to be aware that a process is up and running.

A backdoor in xz

Posted Mar 31, 2024 2:07 UTC (Sun) by intelfx (subscriber, #130118) [Link]

> System V init never needed that

Yes, and it sucked.

> This is well-defined, POSIX-defined behavior

The fact that it is well-defined or POSIX-defined does not automatically mean that it's _good_. I hate to break it to you, but POSIX is not a pinnacle of system design.

> It should be up to the sshd process - and to all other system daemons - to die if it fails to parse its configuration and/or bind the necessary sockets

Setting up a proper readiness notification by double-forking is approximately tenfold more complicated and requires exponentially more moving parts than the sd_notify mechanism.

In fact, many daemons (including openssh) do not complete their initialization until after the fork, so the only correct implementation of the interface you describe entails the immediate child _waiting_ for the grandchild to finish its setup, and only then exiting. Which means that there has to be a temporary pipe or socket between the child and the grandchild.
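To show what that entails, here is a rough Python sketch of a double fork with a readiness pipe. It is an illustration under simplifying assumptions, not openssh's code: `daemonize_with_readiness`, `setup` and `serve` are hypothetical names, and error handling for the fork calls themselves is left out. The function is meant to run in the process the init system started; it returns only once the daemon process has reported the outcome of its setup.

```python
import os


def daemonize_with_readiness(setup, serve):
    """Double-fork a daemon and wait until it reports its setup result.

    Returns True if setup() succeeded in the daemon, False otherwise.
    """
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        os.close(read_fd)
        os.setsid()              # detach from the controlling terminal
        if os.fork() > 0:
            os._exit(0)          # intermediate process exits immediately
        # From here on we are the daemon process.
        try:
            setup()
        except BaseException:
            os.write(write_fd, b"0")   # report failure
            os._exit(1)
        os.write(write_fd, b"1")       # report success
        os.close(write_fd)
        try:
            serve()
        finally:
            os._exit(0)
    # Launching process: reap the intermediate process, then block on
    # the pipe until the daemon writes its verdict (or dies, giving EOF).
    os.close(write_fd)
    os.waitpid(child, 0)
    verdict = os.read(read_fd, 1)
    os.close(read_fd)
    return verdict == b"1"
```

The launching process would then exit with a status derived from the return value, which is what a supervisor watching for that exit interprets as "started" or "failed".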

So now we are choosing between a socket notification mechanism implemented _once_ in a well-audited, well-maintained project (systemd) and **the same socket notification mechanism** plus a bunch of historical nonsense implemented _all over again_ in each daemon.

I trust the choice is obvious.

A backdoor in xz

Posted Apr 4, 2024 15:34 UTC (Thu) by koh (subscriber, #101482) [Link] (1 responses)

> To signal (and receive) the readiness state of the daemon in question. Not more, not less.

Why would liblzma be needed for that?

A backdoor in xz

Posted Apr 4, 2024 15:44 UTC (Thu) by cjwatson (subscriber, #7322) [Link]

It's not any more (following https://github.com/systemd/systemd/pull/31550 and https://salsa.debian.org/ssh-team/openssh/-/commit/cc5f37...).