Automatic async-signal-unsafe checking?

Posted Jul 2, 2024 1:18 UTC (Tue) by dskoll (subscriber, #1630)
In reply to: Automatic async-signal-unsafe checking? by nix
Parent article: Serious vulnerability fixed with OpenSSH 9.8

Couldn't a library spawn a thread and do the bulk of the signal handling there (using its own self-pipe, for example)? A lot more of POSIX is thread-safe than async-signal-safe.

Automatic async-signal-unsafe checking?

Posted Jul 2, 2024 4:24 UTC (Tue) by Cyberax (✭ supporter ✭, #52523)

You can do that, although you'll have to make sure that every thread other than the signal-handling one has all the signals blocked, including any threads started by libraries.

Interestingly, Windows NT developers didn't want to bother with full POSIX compatibility for their command-line-oriented libc. So signals were delivered in a separate thread, started for that purpose.

And honestly, this makes so much more sense for most of the signals.

Automatic async-signal-unsafe checking?

Posted Jul 2, 2024 12:22 UTC (Tue) by nix (subscriber, #2304)

Yes, though as Cyberax notes, blocking everything is important too -- and God forbid you have to communicate with other threads or share anything but the most trivial state without a locking nightmare: it's often better just to serialize everything and use a separate process. At least this *exposes* the fact that concurrency is hard, rather than hiding it and making it simply impossible to deal with, as asynchronous signals do. But that doesn't really make it less hard, or less full of lethal traps where every single bit of the code has to consider that the data structures it's manipulating might be used by something else running at the same time. It's still really difficult, unscoped coupling.

Another problem I've hit is that thread-directed signals remain a bit of a late-addition inconsistent-API nightmare. You can at least rely on tgkill() these days, but if you want to fire a timer and direct *that* at a thread (something I had to do specifically to get around the fact that other things, like ptrace(), are *only* thread-directed)? Suddenly you're messing about with sigev_notify = SIGEV_SIGNAL | SIGEV_THREAD_ID and _sigev_un._tid and other "no you should not touch this, there's minimal documentation, it's supposed to be for threading libs only and underscores are everywhere" horrors.
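For the curious, a rough sketch of what the timer part looks like on Linux. This is not the dtrace-utils code; `arm_thread_timer` and `current_tid` are hypothetical helpers, and the fallback `#define` is there because the documented member name is not always exposed by the C library headers. Older glibc may also need `-lrt` at link time.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* The documented name is not always provided; fall back to the raw member. */
#ifndef sigev_notify_thread_id
#define sigev_notify_thread_id _sigev_un._tid
#endif

/* The calling thread's kernel TID; older glibc has no gettid() wrapper. */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Create a one-shot CLOCK_MONOTONIC timer that delivers `signo` to the
 * thread with kernel TID `tid` after `ms` milliseconds (ms must be > 0).
 * Returns 0 on success, -1 with errno set on failure. Linux-specific. */
static int arm_thread_timer(timer_t *timer, pid_t tid, int signo, long ms)
{
    struct sigevent sev;
    struct itimerspec its;

    memset(&sev, 0, sizeof(sev));
    /* SIGEV_SIGNAL is 0, so this equals SIGEV_SIGNAL | SIGEV_THREAD_ID. */
    sev.sigev_notify = SIGEV_THREAD_ID;
    sev.sigev_signo = signo;
    sev.sigev_notify_thread_id = tid;
    if (timer_create(CLOCK_MONOTONIC, &sev, timer) != 0)
        return -1;

    memset(&its, 0, sizeof(its));   /* zero it_interval: fire only once */
    its.it_value.tv_sec = ms / 1000;
    its.it_value.tv_nsec = (ms % 1000) * 1000000L;
    if (timer_settime(*timer, 0, &its, NULL) != 0) {
        timer_delete(*timer);
        return -1;
    }
    return 0;
}
```

The target thread must belong to the calling process, and whoever arms the timer is responsible for calling timer_delete() when done.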

(All pain comes from truth, so here's what triggered this:

<https://github.com/oracle/dtrace-utils/commit/c883bd437cf...>
<https://github.com/oracle/dtrace-utils/commit/234b39beb0e...>

where I had to use threads because I wanted to do something, anything, else at the same time as ptrace()/waitpid(). Then I had to use condvars and implement a whole little RPC layer because of the threads. Then I had to rely on hitting things with timer-triggered signals and EINTR returns, because I wanted the thread to do anything at all other than waitpid()ding forever, sitting in a CPU-spinning polling loop, or getting stuck in race conditions: of course you can't turn waitpid() into a poll()able entity without non-upstreamed patches. And that's just a small fraction of the complexity here, almost none of it intrinsic to the problem space and all of it working around the APIs. The Linux API is really an abominable tangled un-designed mess in this area.)

Automatic async-signal-unsafe checking?

Posted Jul 2, 2024 12:49 UTC (Tue) by geofft (subscriber, #59789)

Yes, but with a few downsides. A program cannot call unshare(CLONE_NEWUSER or CLONE_NEWPID) if it is multithreaded. In the versions of glibc affected by the bug that started this discussion, having multiple threads in the process makes malloc start taking locks, which slows the program down. And programs that call fork() and do significant work in the child (as opposed to calling only async-signal-safe functions and finally _exit() or execve()) can act weird for very similar reasons to the bug that started this discussion. Only the forking thread is duplicated, so if the library's helper thread was in the middle of malloc() at the time of the fork, the heap lock is held in the forked child and nothing will ever release it: calls to malloc() there will deadlock. In a single-threaded program this problem could not happen.
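For reference, the safe shape of fork() in a possibly-multithreaded process looks roughly like this: the child calls nothing but execve() and _exit(), both async-signal-safe. `run_and_wait` is a made-up helper name, and the 127 exit code for a failed exec is just the shell convention.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run `path` with `argv` and wait for it to finish. Returns the child's
 * exit status (0..255), or -1 if fork() or waitpid() failed or the child
 * was killed by a signal. */
static int run_and_wait(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: only async-signal-safe calls from here on. No malloc(),
         * no stdio, nothing that might want a lock that some other thread
         * held at the moment of the fork. */
        execve(path, argv, environ);
        _exit(127);                 /* only reached if execve() failed */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything the child needs that requires allocation (building argv, formatting strings) has to happen in the parent before the fork.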

In general it's rude for a library to create a new thread without clearly documenting that fact and providing an async/sans-IO alternative. If you absolutely must do this, fork a child process and then do the usual things to daemonize (double-fork, setsid()) and use pipes or shared memory or something to communicate with synchronous function calls in the parent.
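A bare-bones sketch of the helper-process idea, using two pipes and a synchronous call in the parent. Everything here is hypothetical (`struct helper`, `helper_start`, `helper_call`, `helper_stop`), and doubling an integer stands in for whatever real work the helper would do. The helper itself sticks to async-signal-safe calls, so it stays safe even if the parent was multithreaded when it forked.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct helper { int to_helper, from_helper; };  /* pipe ends kept by the parent */

/* Read exactly n bytes; returns 0 on success, -1 on error or EOF. */
static int read_full(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) return -1;
        p += r; n -= (size_t)r;
    }
    return 0;
}

/* Write exactly n bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    while (n > 0) {
        ssize_t r = write(fd, p, n);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) return -1;
        p += r; n -= (size_t)r;
    }
    return 0;
}

/* Helper main loop: answer requests until the parent closes its end. */
static void helper_loop(int in, int out)
{
    unsigned x;
    while (read_full(in, &x, sizeof(x)) == 0) {
        x *= 2;                     /* stand-in for real work */
        if (write_full(out, &x, sizeof(x)) != 0) break;
    }
    _exit(0);
}

/* Double-fork a detached helper. Returns 0 on success, -1 on failure. */
static int helper_start(struct helper *h)
{
    int req[2], rep[2], status;
    pid_t pid, w;

    if (pipe(req) != 0) return -1;
    if (pipe(rep) != 0) { close(req[0]); close(req[1]); return -1; }
    pid = fork();
    if (pid < 0) {
        close(req[0]); close(req[1]); close(rep[0]); close(rep[1]);
        return -1;
    }
    if (pid == 0) {
        close(req[1]); close(rep[0]);
        setsid();                   /* new session, no controlling terminal */
        pid = fork();
        if (pid != 0) _exit(pid < 0 ? 1 : 0);  /* intermediate child exits */
        helper_loop(req[0], rep[1]);           /* grandchild; never returns */
    }
    close(req[0]); close(rep[1]);
    do { w = waitpid(pid, &status, 0); } while (w < 0 && errno == EINTR);
    if (w < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(req[1]); close(rep[0]);
        return -1;
    }
    h->to_helper = req[1];
    h->from_helper = rep[0];
    return 0;
}

/* Synchronous call: send x, block until the reply arrives. */
static int helper_call(struct helper *h, unsigned x, unsigned *result)
{
    if (write_full(h->to_helper, &x, sizeof(x)) != 0) return -1;
    return read_full(h->from_helper, result, sizeof(*result));
}

/* Closing the request pipe gives the helper EOF, and it exits. */
static void helper_stop(struct helper *h)
{
    close(h->to_helper);
    close(h->from_helper);
}
```

A real library would also have to deal with SIGPIPE if the helper dies, close inherited file descriptors in the helper, and cope with callers on several threads sharing one pipe pair.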