|
|
Subscribe / Log in / New account

A backdoor in xz

A backdoor in xz

Posted Mar 31, 2024 13:46 UTC (Sun) by nix (subscriber, #2304)
In reply to: A backdoor in xz by Cyberax
Parent article: A backdoor in xz

So to you, dlopen is a signal of exploitation and should be avoided because it's so rare, until it is pointed out that it's not rare and is already used in a wide variety of processes, whereupon you switch to calling unclear things 'this kind of nonsense', cite nsswitch (which is not relevant, given that PAM is at issue here), and suggest, what? Removing PAM and nsswitch?

That's going to work really well given how many sites use both to fold in new hostname lookup mechanisms, new user lookup mechanisms, and new and fairly complex authentication patterns on the fly.

Anyway, dropping nsswitch and PAM wouldn't even really help, despite being immensely disruptive. dlopen does have its problems[1] and it is reasonable to prefer to avoid it when possible, but it is not rare even in the absence of nsswitch and PAM. Try adding reporting to glibc to see how often it's invoked on real running systems. (It's a *lot*. Even syslog daemons make extensive use of it these days, so you can't even say "perhaps daemons running as root can't dlopen".)

You cannot use 'this uses dlopen' as a signal of suspiciousness, or of anything really, any more than 'this is dynamically linked' is such a signal. "This has IFUNC resolvers that redirect symbols in other libraries" is definitely an actual sign of badness that I've never heard of anything legitimate doing, and I'm wondering if glibc could detect and block that somehow without too much cost (it would at least involve stack frame walks, but the resolver has to mess with the stack frame anyway...)

[1] now that prelink is dead, mostly that you can't use ldd to statically determine what the shared library dep tree is and what things might be potentially impacted by ABI changes, which *is* actually problematic on real systems


to post comments

A backdoor in xz

Posted Mar 31, 2024 16:12 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (12 responses)

> So to you, dlopen is a signal of exploitation and should be avoided because it's so rare, until it is pointed out that it's not rare and is already used in a wide variety of processes, whereupon you switch to calling unclear things 'this kind of nonsense', cite nsswitch (which is not relevant, given that PAM is at issue here), and suggest, what? Removing PAM and nsswitch?

Yeah, exactly. Remove dlopen() calls by refactoring the relevant systems. For example, musl libc does not have nsswitch (and has a built-in NSCD). PAM is already optional.

A backdoor in xz

Posted Mar 31, 2024 17:02 UTC (Sun) by nix (subscriber, #2304) [Link] (11 responses)

So... that this wouldn't actually help solve this problem is not important to you, then? (You clipped that out of my original reply without comment.)

That's a sign of someone on a hobby-horse if I ever heard of one.

(As someone who needs PAM to even log on -- on account of wanting to use YubiKey OTP to do so -- and who uses nsswitch for a variety of homebrewed lookups, I would obviously not be willing to drop either.)

A backdoor in xz

Posted Apr 1, 2024 12:19 UTC (Mon) by foom (subscriber, #14868) [Link] (10 responses)

Nsswitch has an obvious replacement for dlopen: sockets. They're already used in many interesting scenarios, e.g. host lookup is via DNS to localhost, user database often comes from libnss_ldapd or sssd — both of which simply implement a private socket protocol in their nsswitch library to talk to their corresponding service on localhost.

Then of course there's nscd, as already mentioned: a socket protocol for nsswitch lookups already implemented by glibc and musl. Someone could implement a different nscd server-side which doesn't use dlopen — without even modifying glibc. Yet, as far as I know, nobody actually has done so.

On the PAM side there's no similarly easy replacement, though one could investigate OpenBSD's BSD Auth system, which is extensible via spawning subprocesses to handle auth tasks.

In any case, that nobody seems to actually be working on any of this probably shows just how unimportant avoiding dlopen is for most people...

A backdoor in xz

Posted Apr 1, 2024 16:48 UTC (Mon) by nix (subscriber, #2304) [Link] (9 responses)

Avoiding the need to dlopen in statically linked binaries, while not losing nsswitch for such binaries, actually *does* matter to upstream (it would simplify ld.so a whole hell of a lot). So switching to a socket-based protocol is definitely on the cards.

The problem, as ever, is doing that compatibly -- but I suppose if glibc itself provided the 'nss server' that loaded existing nss modules and did everything else nsswitch did, and glibc called into it using the sort of thing you describe, this sort of thing might be practical: it would probably make nscd less of a horror show, too. With a lot of work (how many nss modules depend on being in the same address space as the running process, for starters? I bet it's not zero. And I bet this would slaughter performance for simpler cases, so maybe nss_files still needs to be built in. And so forth...)

A backdoor in xz

Posted Apr 1, 2024 18:04 UTC (Mon) by foom (subscriber, #14868) [Link] (1 responses)

An "nss server" is literally what nscd _already is_!

If you run the nscd service, then glibc sends nss lookups to nscd over a socket, instead of running them inside other binaries.

Nscd comes with a caching layer (unsurprisingly given its name), but you can mostly disable that if you only want the nss-server functionality.

A backdoor in xz

Posted Apr 2, 2024 17:10 UTC (Tue) by nix (subscriber, #2304) [Link]

Oh, of course it is. I am clearly missing the obvious right now :(

A backdoor in xz

Posted Apr 1, 2024 18:23 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> Avoiding the need to dlopen in statically linked binaries, while not losing nsswitch for such binaries, actually *does* matter to upstream (it would simplify ld.so a whole hell of a lot). So switching to a socket-based protocol is definitely on the cards.

glibc is the worst library in existence, so no wonder.

On the other hand, musl libc simply uses the nscd protocol to provide the NSS functionality and even allows wrapping legacy NSS modules: https://github.com/pikhq/musl-nscd

Additionally, with musl I can _already_ get a fully static system with zero dlopen()s or dynamic libraries. There are even several experimental distros that are fully statically linked. E.g.: https://framagit.org/Ypnose/solyste

A backdoor in xz

Posted Apr 2, 2024 17:12 UTC (Tue) by nix (subscriber, #2304) [Link] (5 responses)

> glibc is the worst library in existence, so no wonder.

At this point I'm wondering if you're just being intentionally unpleasant. glibc navigates a frankly horrifying pile of tradeoffs and does the job fairly well given that. If it was "the worst library in existence" it would not be *remotely* so widely used, nor work as well as it does.

A backdoor in xz

Posted Apr 2, 2024 22:54 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

I've been holding this opinion about glibc for decades now. I understand the difficulty of developing glibc, and it excuses at least some warts. But then we have musl which is so much nicer, while being standards-compliant.

I believe we should take at least _some_ of that experience and apply it to the rest of the system. Being static-friendly and not dlopen()-ing stuff is definitely a part of that.

BTW, does dlopen() in libsystemd preclude its static linking?

A backdoor in xz

Posted Apr 3, 2024 11:17 UTC (Wed) by nix (subscriber, #2304) [Link] (3 responses)

> BTW, does dlopen() in libsystemd preclude its static linking?

In the future in glibc, yes. In all other libcs I'm aware of, yes, even now. (Or, rather, you can *try* to call it in statically-linked binaries, but the call will always fail.)

This is of course one of many reasons why just statically linking everything is not the panacea some seem to think -- plugins really *are* a thing and sometimes loadable shared code in the same address space is a convenient way to implement them... there's not a chance you'll ever get KDE to work statically linked, for instance.

A backdoor in xz

Posted Apr 3, 2024 16:30 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> This is of course one of many reasons why just statically linking everything is not the panacea some seem to think -- plugins really *are* a thing

Plugins are a thing that has no business being in the foundational parts of the runtime. And it's not like we don't have a real-world example of a system without them, Alpine Linux exists. And it's significantly nicer to work with than the glibc-based systems.

A backdoor in xz

Posted Apr 4, 2024 12:49 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

> Plugins are a thing that has no business being in the foundational parts of the runtime.

I am not convinced, and since as usual you didn't bother to give any reasons, relying instead on pure assertion, I'm not sure why you think this not-an-argument would ever convince anyone who didn't already agree with you.

Why on earth would you consider name lookup or authentication, both things that have had numerous wildly divergent implementations over time and which obviously have different site-by-site requirements, hence the *existence* of pluggable systems to implement them, to be things that "have no business" existing, based on the pure assertion that they are "in the foundational parts of the runtime"? People are *using* nss and PAM's extensibility, you know. They're not just there to annoy you. This is not a moribund module system with a half-dozen stale modules that have hardly changed in the last twenty years. People are plugging other things into that pluggability. (Not that this attack even *relied* on that pluggability, or NSS, or PAM, so why you think ripping them out will help here is quite beyond me.)

For that matter, what on earth even is a "foundational part of the runtime"? Is it the toolchain? Surely that counts if anything does! Better rip out LTO from GCC and clang then, since both rely on linker plugins that run the entire compiler! (Also, how many linker plugins are there? I can hardly name any but LTO. That's gotta be moribund, rip it out!) Is it the kernel? Better rip out kernel modules then, in-tree or not, since if they're not dynamically loaded plugins, nothing is... is it glibc? Surely not, since you can replace it with any other libc you like and keep the kernel and most userspace the same after a recompile: it could hardly be foundational! So I guess NSS can stay. Not sure about PAM, the idea came from Solaris and you have in the past expressed a liking for that sort of thing so maybe that's good now too?

That's the problem with arguing by pure assertion: since you give no reasons, define none of your terms, and provide no grounds to agree with you, there's no reason to accept your premises: and even if I do, it's easy to argue in the exact opposite direction since the premises are so vague, which makes your argument nothing more than a statement of personal preferences and an assertion that of course *your* personal preferences are more important than anyone else's.

Is the real definition "something Cyberax asserts without argument or rationale is foundational"? Or just "Cyberax is right"?

A backdoor in xz

Posted Apr 4, 2024 17:24 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

I thought that the reasons for NOT doing plugins are obvious. They add a huge amount of complexity, preclude useful mitigations (mseal/mimmutable), and make the system harder to analyze. You can't statically determine the dependency closure anymore.

Plugins inherently face a complicated environment that they don't control and should not perturb too much. And a crashed plugin will take down the entire application. This was reasonable 30 years ago, but it's not anymore. These days, we actually have a good architectural pattern for this: split modules into a separate daemon that is activated by systemd as needed.

> People are *using* nss and PAM's extensibility, you know.

NSS is actually hardly used these days, NIS/NIS+ have mostly died out. The only major surviving service is LDAP (usually via SSSD). It can simply be incorporated into the glibc (it's 43kb), or it can be split into a daemon that talks to glibc via the NSCD protocol.

If we're talking about PAM in particular, then it's nothing but a stack of bad design decisions. In case of SSH, they can be replaced by ephemeral SSH certificates for most of the scenarios (e.g. a shared machine in a university or for management access to the production cluster on AWS EC2).

These two items will make most non-interactive systems completely dlopen()-free.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds