Local root vulnerability in snap-confine

Posted Feb 18, 2022 11:51 UTC (Fri) by mathstuf (subscriber, #69389)
In reply to: Local root vulnerability in snap-confine by epa
Parent article: Local root vulnerability in snap-confine

That is a lot of state that needs to be tracked. It also raises the question of "how do I debug this path (name) not resolving in this daemon" with fun "oh, it hasn't updated its symlink view" kinds of answers.

Other questions that come to mind:

- Should symlinks in mounts that show up after the process starts count?
- How would this work with something like AFS (the Andrew File System), where finding all symlinks sounds like an absolute nightmare?
- Can this be namespaced (i.e., "update symlink perms under `/etc/systemd/system`" for systemd)?
- Doesn't this leave the kernel subject to symlink races that hide in process-specific state, which you can't even see without some new way to debug it?
- If it is just `mtime`-based or whatever, that is trivially attacker-controlled too, so that doesn't sound like it's saving anything there.

Sure, global mutable state is terrible in practice and a fun source of bugs for everyone, but that's what a filesystem *is*. I don't see how snapshotting it makes things any better unless there is also a way to launch a process pinned to "the symlink view of PID X", so that multi-process systems and debugging scenarios keep some sanity.

Local root vulnerability in snap-confine

Posted Feb 18, 2022 12:46 UTC (Fri) by epa (subscriber, #39769)

Maybe if a symbolic link has appeared or changed since you last called refresh(), then path lookup operations traversing that link would fail with EREFRESH. This would be fail-safe for older code, and newer code might be able to handle it sanely (it certainly sounds easier than meticulously rewriting all your code with the `*at()` system calls, `openat()` and friends). At worst, you just have to restart your daemon. Again, symlinks don't change often in practice, so this doesn't seem too high a price.
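
For comparison, the "rewrite everything with the `*at()` calls" route looks roughly like the sketch below: walk the path one component at a time relative to a directory file descriptor, passing `O_NOFOLLOW` so that a symlink in any position is refused rather than followed. The helper name `open_beneath` is made up for illustration, and the starting `root` is assumed to be trusted.

```python
import os


def open_beneath(root, relpath):
    """Open relpath under root, refusing to follow a symlink in any component.

    Hypothetical helper for illustration; `root` itself is assumed trusted
    and is opened normally.
    """
    parts = [p for p in relpath.split("/") if p not in ("", ".")]
    if not parts or ".." in parts:
        raise ValueError("relpath must be a non-empty path without '..'")

    dir_flags = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW
    dirfd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # Each intermediate component is opened relative to the previous
        # directory fd (openat semantics), so a racing rename or symlink
        # swap cannot redirect the walk somewhere else.
        for name in parts[:-1]:
            nextfd = os.open(name, dir_flags, dir_fd=dirfd)
            os.close(dirfd)
            dirfd = nextfd
        # The final component must not be a symlink either.
        return os.open(parts[-1], os.O_RDONLY | os.O_NOFOLLOW, dir_fd=dirfd)
    finally:
        os.close(dirfd)
```

On Linux a refused component typically fails with `ELOOP`; other systems may report a different errno, so callers are probably best off catching `OSError` in general.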

It would have to be based on real-world time, not just whatever mtime is in the file system. A fully snapshottable filesystem would be impossible to graft onto current POSIX semantics, but if we're just making a sticking plaster for race conditions with symlinks, some kind of per-process view of the world seems possible. You're not trying to snapshot exactly what symlinks existed at a point in time, but only to note whether one has changed in any way, and if so fail when it's used.
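
A toy in-memory model of those semantics, just to make the idea concrete. Nothing here is a real kernel interface: `refresh()` and `EREFRESH` are the hypothetical names from this comment, and `ToyKernel`, `ToyProcess`, and the counter standing in for tamper-proof real-world time are invented for the sketch.

```python
import itertools


class ERefresh(Exception):
    """Stand-in for the hypothetical EREFRESH error."""


class ToyKernel:
    """Tracks symlinks plus a stamp for when each one last changed."""

    def __init__(self):
        # Monotonic counter playing the role of real-world time that
        # users cannot set (unlike mtime).
        self._clock = itertools.count(1)
        self._links = {}  # link name -> (target, change stamp)

    def now(self):
        return next(self._clock)

    def symlink(self, name, target):
        # Creating or retargeting a link records a fresh stamp.
        self._links[name] = (target, self.now())

    def lookup(self, name):
        return self._links[name]


class ToyProcess:
    """Per-process state is a single stamp, not a snapshot of all links."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.refreshed_at = kernel.now()

    def refresh(self):
        self.refreshed_at = self.kernel.now()

    def follow(self, name):
        target, stamp = self.kernel.lookup(name)
        if stamp > self.refreshed_at:
            # The link changed after our last refresh(): fail, don't follow.
            raise ERefresh(name)
        return target
```

In this model a daemon that never calls `refresh()` keeps working as long as the links it uses stay untouched, and starts failing on a link as soon as that link changes.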

Local root vulnerability in snap-confine

Posted Feb 18, 2022 14:57 UTC (Fri) by Wol (subscriber, #4433)

Yup. Don't snapshot your symlinks per process, but make it so the first access caches it - REMEMBERING THE INODE - and any further attempts to access the symlink get the same inode until the cache is actively flushed.

Cheers,
Wol
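
A rough user-space approximation of that caching idea, assuming a POSIX system: the first open through a symlink pins the resulting file descriptor, and later opens hand back a duplicate of it (hence the same inode) until the cache is flushed. The class `PinnedLinks` and the `demo()` function are hypothetical; a kernel implementation would look nothing like this.

```python
import os
import tempfile


class PinnedLinks:
    """Pin each symlink path to the inode it resolved to on first access."""

    def __init__(self):
        self._pinned = {}  # link path -> fd opened on first access

    def open(self, link_path):
        fd = self._pinned.get(link_path)
        if fd is None:
            # First access: follow the symlink once and remember the result.
            fd = os.open(link_path, os.O_RDONLY)
            self._pinned[link_path] = fd
        # Later accesses get the same inode even if the link was retargeted.
        # Note: dup()ed descriptors share one file offset.
        return os.dup(fd)

    def flush(self):
        for fd in self._pinned.values():
            os.close(fd)
        self._pinned.clear()


def demo():
    with tempfile.TemporaryDirectory() as d:
        a, b, link = (os.path.join(d, n) for n in ("a", "b", "link"))
        for path in (a, b):
            with open(path, "w") as f:
                f.write(path)
        os.symlink(a, link)

        cache = PinnedLinks()
        fd1 = cache.open(link)
        os.remove(link)
        os.symlink(b, link)  # retarget the link behind the cache's back
        fd2 = cache.open(link)
        still_a = os.fstat(fd1).st_ino == os.fstat(fd2).st_ino == os.stat(a).st_ino

        cache.flush()  # only an explicit flush picks up the new target
        fd3 = cache.open(link)
        now_b = os.fstat(fd3).st_ino == os.stat(b).st_ino

        for fd in (fd1, fd2, fd3):
            os.close(fd)
        cache.flush()
        return still_a, now_b
```

Here `demo()` checks that the retargeted link keeps resolving to the original inode until `flush()` is called, and to the new target afterwards.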

Local root vulnerability in snap-confine

Posted Feb 18, 2022 20:50 UTC (Fri) by developer122 (guest, #152928)

A surprisingly elegant solution, imo.

Local root vulnerability in snap-confine

Posted Feb 18, 2022 17:43 UTC (Fri) by nix (subscriber, #2304)

> This would be fail-safe for older code

What? It would cause new failures for older code, i.e. it would introduce countless DoS vectors, only a microscopic proportion of which correspond to actual attacks, but all of which would annoy users (where "annoy users" spans the entire spectrum from "slightly annoying" through "bug we can't track down that goes away on restarting" through to "oops now the system is unbootable because of an unexpected and undiagnosed EREFRESH while installing something boot-critical").

Hell no.