A backdoor in xz
Posted Mar 30, 2024 18:23 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)
In reply to: A backdoor in xz by bluca
Parent article: A backdoor in xz
It also won't close off all avenues of attack. A malicious library can patch the code, ptrace() its process, modify the environment, etc.
Posted Mar 30, 2024 19:12 UTC (Sat)
by andresfreund (subscriber, #69562)
[Link] (16 responses)
Posted Mar 30, 2024 19:36 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (14 responses)
Still, we should start cutting back on this kind of nonsense.
Posted Mar 31, 2024 13:46 UTC (Sun)
by nix (subscriber, #2304)
[Link] (13 responses)
That's going to work really well given how many sites use both to fold in new hostname lookup mechanisms, new user lookup mechanisms, and new and fairly complex authentication patterns on the fly.
Anyway, dropping nsswitch and PAM wouldn't even really help, despite being immensely disruptive. dlopen does have its problems[1] and it is reasonable to prefer to avoid it when possible, but it is not rare even in the absence of nsswitch and PAM. Try adding reporting to glibc to see how often it's invoked on real running systems. (It's a *lot*. Even syslog daemons make extensive use of it these days, so you can't even say "perhaps daemons running as root can't dlopen".)
You cannot use 'this uses dlopen' as a signal of suspiciousness, or of anything really, any more than 'this is dynamically linked' is such a signal. "This has IFUNC resolvers that redirect symbols in other libraries" is definitely an actual sign of badness that I've never heard of anything legitimate doing, and I'm wondering if glibc could detect and block that somehow without too much cost (it would at least involve stack frame walks, but the resolver has to mess with the stack frame anyway...)
[1] now that prelink is dead, mostly that you can't use ldd to statically determine what the shared library dep tree is and what things might be potentially impacted by ABI changes, which *is* actually problematic on real systems
Posted Mar 31, 2024 16:12 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (12 responses)
Yeah, exactly. Remove dlopen() calls by refactoring the relevant systems. For example, musl libc does not have nsswitch (and has a built-in nscd client). PAM is already optional.
Posted Mar 31, 2024 17:02 UTC (Sun)
by nix (subscriber, #2304)
[Link] (11 responses)
That's a sign of someone on a hobby-horse if I ever heard of one.
(As someone who needs PAM to even log on -- on account of wanting to use YubiKey OTP to do so -- and who uses nsswitch for a variety of homebrewed lookups, I would obviously not be willing to drop either.)
Posted Apr 1, 2024 12:19 UTC (Mon)
by foom (subscriber, #14868)
[Link] (10 responses)
Then of course there's nscd, as already mentioned: a socket protocol for nsswitch lookups already implemented by glibc and musl. Someone could implement a different nscd server-side which doesn't use dlopen — without even modifying glibc. Yet, as far as I know, nobody actually has done so.
On the PAM side there's no similarly easy replacement, though one could investigate OpenBSD's BSD Auth system, which is extensible via spawning subprocesses to handle auth tasks.
In any case, that nobody seems to actually be working on any of this probably shows just how unimportant avoiding dlopen is for most people...
Posted Apr 1, 2024 16:48 UTC (Mon)
by nix (subscriber, #2304)
[Link] (9 responses)
The problem, as ever, is doing that compatibly -- but I suppose if glibc itself provided the 'nss server' that loaded existing nss modules and did everything else nsswitch did, and glibc called into it using the sort of thing you describe, this sort of thing might be practical: it would probably make nscd less of a horror show, too. With a lot of work (how many nss modules depend on being in the same address space as the running process, for starters? I bet it's not zero. And I bet this would slaughter performance for simpler cases, so maybe nss_files still needs to be built in. And so forth...)
Posted Apr 1, 2024 18:04 UTC (Mon)
by foom (subscriber, #14868)
[Link] (1 responses)
If you run the nscd service, then glibc sends nss lookups to nscd over a socket, instead of running them inside other binaries.
Nscd comes with a caching layer (unsurprisingly given its name), but you can mostly disable that if you only want the nss-server functionality.
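To make the "lookups over a socket" idea concrete, here is a minimal sketch of how a client might frame a passwd-by-name request. The constants are assumptions recalled from glibc's `nscd-client.h` (protocol version 2, `GETPWBYNAME` as the first request type, three host-endian 32-bit header fields) and should be checked against the glibc in use. `build_request` is a hypothetical helper name, not part of any library.

```python
import struct

# Assumed values, recalled from glibc's nscd/nscd-client.h; verify before relying on them.
NSCD_VERSION = 2
GETPWBYNAME = 0  # assumed to be the first entry of the request_type enum


def build_request(req_type: int, key: str) -> bytes:
    """Pack a request: header (version, type, key_len) plus the NUL-terminated key."""
    key_bytes = key.encode() + b"\0"  # the key length counts the trailing NUL
    header = struct.pack("=iii", NSCD_VERSION, req_type, len(key_bytes))
    return header + key_bytes


if __name__ == "__main__":
    # Show the bytes a client would write to the daemon's UNIX socket.
    print(build_request(GETPWBYNAME, "root").hex())
```

A client would write these bytes to the daemon's UNIX-domain socket and parse the reply. Reply parsing is omitted here because its layout depends on the request type.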
Posted Apr 2, 2024 17:10 UTC (Tue)
by nix (subscriber, #2304)
[Link]
Posted Apr 1, 2024 18:23 UTC (Mon)
by Cyberax (✭ supporter ✭, #52523)
[Link] (6 responses)
glibc is the worst library in existence, so no wonder.
On the other hand, musl libc simply uses the nscd protocol to provide the NSS functionality and even allows wrapping legacy NSS modules: https://github.com/pikhq/musl-nscd
Additionally, with musl I can _already_ get a fully static system with zero dlopen()s or dynamic libraries. There are even several experimental distros that are fully statically linked. E.g.: https://framagit.org/Ypnose/solyste
Posted Apr 2, 2024 17:12 UTC (Tue)
by nix (subscriber, #2304)
[Link] (5 responses)
At this point I'm wondering if you're just being intentionally unpleasant. glibc navigates a frankly horrifying pile of tradeoffs and does the job fairly well given that. If it were "the worst library in existence" it would not be *remotely* so widely used, nor work as well as it does.
Posted Apr 2, 2024 22:54 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
I believe we should take at least _some_ of that experience and apply it to the rest of the system. Being static-friendly and not dlopen()-ing stuff is definitely a part of that.
BTW, does dlopen() in libsystemd preclude its static linking?
Posted Apr 3, 2024 11:17 UTC (Wed)
by nix (subscriber, #2304)
[Link] (3 responses)
In the future in glibc, yes. In all other libcs I'm aware of, yes, even now. (Or, rather, you can *try* to call it in statically-linked binaries, but the call will always fail.)
This is of course one of many reasons why just statically linking everything is not the panacea some seem to think -- plugins really *are* a thing and sometimes loadable shared code in the same address space is a convenient way to implement them... there's not a chance you'll ever get KDE to work statically linked, for instance.
Posted Apr 3, 2024 16:30 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Plugins are a thing that has no business being in the foundational parts of the runtime. And it's not like we don't have a real-world example of a system without them: Alpine Linux exists. And it's significantly nicer to work with than glibc-based systems.
Posted Apr 4, 2024 12:49 UTC (Thu)
by nix (subscriber, #2304)
[Link] (1 responses)
I am not convinced, and since as usual you didn't bother to give any reasons, relying instead on pure assertion, I'm not sure why you think this not-an-argument would ever convince anyone who didn't already agree with you.
Why on earth would you consider name lookup or authentication, both things that have had numerous wildly divergent implementations over time and which obviously have different site-by-site requirements, hence the *existence* of pluggable systems to implement them, to be things that "have no business" existing, based on the pure assertion that they are "in the foundational parts of the runtime"? People are *using* nss and PAM's extensibility, you know. They're not just there to annoy you. This is not a moribund module system with a half-dozen stale modules that have hardly changed in the last twenty years. People are plugging other things into that pluggability. (Not that this attack even *relied* on that pluggability, or NSS, or PAM, so why you think ripping them out will help here is quite beyond me.)
For that matter, what on earth even is a "foundational part of the runtime"? Is it the toolchain? Surely that counts if anything does! Better rip out LTO from GCC and clang then, since both rely on linker plugins that run the entire compiler! (Also, how many linker plugins are there? I can hardly name any but LTO. That's gotta be moribund, rip it out!) Is it the kernel? Better rip out kernel modules then, in-tree or not, since if they're not dynamically loaded plugins, nothing is... is it glibc? Surely not, since you can replace it with any other libc you like and keep the kernel and most userspace the same after a recompile: it could hardly be foundational! So I guess NSS can stay. Not sure about PAM, the idea came from Solaris and you have in the past expressed a liking for that sort of thing so maybe that's good now too?
That's the problem with arguing by pure assertion: since you give no reasons, define none of your terms, and provide no grounds to agree with you, there's no reason to accept your premises: and even if I do, it's easy to argue in the exact opposite direction since the premises are so vague, which makes your argument nothing more than a statement of personal preferences and an assertion that of course *your* personal preferences are more important than anyone else's.
Is the real definition "something Cyberax asserts without argument or rationale is foundational"? Or just "Cyberax is right"?
Posted Apr 4, 2024 17:24 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Plugins inherently face a complicated environment that they don't control and should not perturb too much. And a crashed plugin will take down the entire application. This was reasonable 30 years ago, but it's not anymore. These days, we actually have a good architectural pattern for this: split modules into a separate daemon that is activated by systemd as needed.
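The crash-isolation half of that argument can be illustrated with a small sketch. It is not systemd socket activation. It only models the property being claimed: a helper running in its own process can crash without taking the caller down. `run_helper`, `GOOD_HELPER`, and `CRASHING_HELPER` are hypothetical names invented for this illustration.

```python
import subprocess
import sys


def run_helper(code: str, request: str, timeout: float = 5.0):
    """Run a hypothetical lookup helper in its own process.

    Returns the helper's stdout on success, or None if it crashed,
    timed out, or exited non-zero. The caller survives either way.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=request, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout.strip()


# Stand-ins for out-of-process modules: one well-behaved, one that aborts.
GOOD_HELPER = "import sys; print(sys.stdin.read().strip().upper())"
CRASHING_HELPER = "import os; os.abort()"

if __name__ == "__main__":
    print(run_helper(GOOD_HELPER, "alice"))
    print(run_helper(CRASHING_HELPER, "alice"))
```

The trade-off raised elsewhere in the thread still applies: every lookup now pays for process creation or IPC, which an in-process plugin does not.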
> People are *using* nss and PAM's extensibility, you know.
NSS is actually hardly used these days; NIS/NIS+ have mostly died out. The only major surviving service is LDAP (usually via SSSD). It can simply be incorporated into glibc (it's 43 KB), or it can be split into a daemon that talks to glibc via the NSCD protocol.
If we're talking about PAM in particular, then it's nothing but a stack of bad design decisions. In the case of SSH, PAM modules can be replaced by ephemeral SSH certificates in most scenarios (e.g. a shared machine at a university, or management access to a production cluster on AWS EC2).
These two items will make most non-interactive systems completely dlopen()-free.
Posted Apr 1, 2024 14:41 UTC (Mon)
by job (guest, #670)
[Link]
In the end it was included because it made possible some use cases where no one else stepped up to make a practical alternative.
I don't think that is something we want to emulate. It is certainly possible to satisfy the necessary use cases without resorting to dlopen().
Posted Mar 31, 2024 12:12 UTC (Sun)
by bluca (subscriber, #118303)
[Link] (6 responses)
Posted Mar 31, 2024 13:49 UTC (Sun)
by nix (subscriber, #2304)
[Link]
That's tricky to implement (because doing things in the resolver is *always* a bit tricky) but I can't immediately think of any reason why it's *impossible*. It would need a new dynamic tag of course, DT_LAZY_NEEDED? DT_NEEDED_OPTIONAL?
You couldn't use the simpleminded approach above for everything (good luck making this work for things like data symbols where the GOT is needed before the PLT or in general anywhere you couldn't have used lazy binding before, or where you need the shared library's ELF constructors to run early, or where TLS inadequacies would prevent dlopen from working happily -- and it has the same security implications as using lazy binding) but it should work in a fairly large proportion of cases.
Posted Mar 31, 2024 16:13 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
No, it's not. It's done to _paper_ over dependencies, making them harder to discover statically and creating wonderful race conditions if mimmutable() is used at an inopportune moment. It's an all-around bad decision.
Posted Mar 31, 2024 17:06 UTC (Sun)
by nix (subscriber, #2304)
[Link] (3 responses)
What next? Shall we make changes to allow for Windows's per-libc malloc(), or for Linux's not-at-all-planned upcoming transition to Mach-O?
Posted Mar 31, 2024 18:54 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
This is subject to change: https://lwn.net/Articles/958438/
> particularly when those changes *reduce* security
They don't. libsystemd will _still_ depend on xz, it just will be hidden from cursory analysis.
> What next? Shall we make changes to allow for Windows's per-libc malloc(),
That's actually a pretty good idea; it would make several classes of vulnerabilities more difficult to exploit.
> or for Linux's not-at-all-planned upcoming transition to Mach-O?
I'd take PE: https://blog.hiler.eu/win32-the-only-stable-abi/
Posted Mar 31, 2024 19:36 UTC (Sun)
by nix (subscriber, #2304)
[Link] (1 responses)
I honestly wonder if you're even reading this thread. This attack depended on liblzma being loaded into sshd's memory because it was loaded by virtue of DT_NEEDED: after this commit, it would not be loaded at all, because libsystemd would only have loaded it if compressed journal reading was attempted, which sshd never attempts.
So it *would* in fact solve the problem.
But I'm tired of arguing with a brick wall with prejudged opinions, I think. Good night.
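The difference between a DT_NEEDED dependency and an on-demand dlopen() can be modelled in a few lines. This is only a sketch of the pattern, not libsystemd's actual code: `LazyLibrary`, `Journal`, `FakeCodec`, `recording_loader`, and the library name are all hypothetical, and the default loader is `ctypes.CDLL`, which calls dlopen() on Linux.

```python
import ctypes


class LazyLibrary:
    """Defer loading a shared library until a symbol is first requested."""

    def __init__(self, name, loader=ctypes.CDLL):
        self._name = name
        self._loader = loader  # injectable, so the demo needs no real library
        self._handle = None

    @property
    def loaded(self):
        return self._handle is not None

    def symbol(self, sym):
        if self._handle is None:
            # The dlopen() equivalent happens here, on first use only.
            self._handle = self._loader(self._name)
        return getattr(self._handle, sym)


class Journal:
    """Toy stand-in for a logging library with an optional compression path."""

    def __init__(self, codec):
        self._codec = codec

    def write(self, message):
        return len(message)  # never touches the compression library

    def read_compressed(self, blob):
        return self._codec.symbol("decode")(blob)


class FakeCodec:
    """Hypothetical stand-in for a compression library's exported functions."""

    def decode(self, blob):
        return blob[::-1]


LOAD_LOG = []


def recording_loader(name):
    """Record each load so the demo can show when loading happens."""
    LOAD_LOG.append(name)
    return FakeCodec()
```

A process that only ever calls `write()` never triggers the loader, so the compression library's code never enters its address space. Whether the hidden dependency is still a problem for static analysis is the separate point being argued in the replies.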
Posted Mar 31, 2024 20:16 UTC (Sun)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Reread your words. THIS attack. As in, this _particular_ one. Sure, having the library dlopen()-ed prevents it. I can think of several ways I can backdoor liblzma to work around it.
Making the system usable with mimmutable/mseal would prevent whole categories of exploits. And promoting the dlopen() craze will make this kind of mitigation impossible.
And yeah, I absolutely hate the braindead design of nsswitch, PAM, and now libsystemd.