Another OpenSSH remote code execution vulnerability [LWN.net]

Interesting

Posted Jul 9, 2024 15:21 UTC (Tue) by pbonzini (subscriber, #60935) [Link]

This must be a relatively rare case: the bug is in upstream code, but it is latent without the Red Hat patch. The fix in fact has already been applied upstream (it doesn't touch the Red Hat-specific code) and in the discussion people suggested applying it even on non Red Hat distros, if they use OpenSSH 8.7 and 8.8.

still a 3rd party patch?

Posted Jul 10, 2024 9:46 UTC (Wed) by snajpa (subscriber, #73467) [Link]

is there anyone here who understands the politics of why that patch still isn't upstreamed? it would seem useful to have that functionality directly upstream...

Another signal handler related bug

Posted Jul 10, 2024 12:16 UTC (Wed) by make (subscriber, #62794) [Link] (26 responses)

I think people should not write signal handlers at all, at least not on Linux; Linux has signalfd which is the only secure (and sane) way to handle signals. On non-Linux, you can write to a pipe to defer handler execution to the main loop where it's safer.

Signal handlers are dangerous, and I can't keep wondering why renowned projects such as OpenSSH use them. (Even if the actual bugs were added by third-party patches. I question the general coding style.)

(Coincidentally, I have written a SSH server last year which, of course, uses signalfd and is not affected by this class of bug. Of course, other classes of bugs may well be present. ;-))

Another signal handler related bug

Posted Jul 10, 2024 14:32 UTC (Wed) by mcatanzaro (subscriber, #93033) [Link] (18 responses)

My general solution is to set some flag in the signal handler that indicates the program should handle the signal later on, then return to normal code. Then you don't have to worry about signal safety. Just don't try to do any actual work in your signal handler and it shouldn't be too hard to avoid signal safety issues. Applications that use GLib can use g_unix_signal_add() or g_unix_signal_source_new() for an especially convenient solution.

(The other place where signal safety is required is between fork() and exec(). There's no easy workaround there. That's way harder.)

Another signal handler related bug

Posted Jul 10, 2024 14:47 UTC (Wed) by make (subscriber, #62794) [Link] (17 responses)

That's one way to do it, but it works only if your program checks this flag periodically. Well-behaving programs don't wake up unless something needs to be done, but do then wake up immediately (that's two distinct features). So you need a way to wake up the event loop - e.g. using signalfd/eventfd (Linux) or a pipe (everything else).

The problem with OpenSSH is that neither works because they use alarm()/SIGALRM to implement a timeout (poorly). There are no periodic checks, but there is also no way to wake up (interrupt) the code; instead, the SIGALRM handler gets invoked by the kernel after the alarm() time while the actual code is at some random location (maybe inside free()); the signal handler then takes over and exits the process (hopefully without invoking free() again). The actual code is never returned to.

Another signal handler related bug

Posted Jul 10, 2024 15:23 UTC (Wed) by paulj (subscriber, #341) [Link] (16 responses)

Signals "wake" up event loops - select(), poll() and friends all return if a signal occurred and was caught.

Another signal handler related bug

Posted Jul 10, 2024 15:47 UTC (Wed) by paulj (subscriber, #341) [Link] (15 responses)

Generally, all syscalls return if a signal is caught. Some syscalls are fast, and will never be interrupted by a signal; all the rest will return. Or the process is terminated. No doubt there's some exception, but I can't think of anything off hand.

Which makes me think I must have misunderstood your point. How could a non-buggy, non-badly designed event handling system ever miss signals?

I guess you meant where you run some bad library code that sleeps and loops on a blocking syscall ignoring EINTR?

Another signal handler related bug

Posted Jul 10, 2024 16:10 UTC (Wed) by make (subscriber, #62794) [Link] (14 responses)

You have a point, but it's flawed.
SIGALRM does indeed interrupt most system calls, but that doesn't help much.

For example, like you said, all the syscall call sites must check and handle EINTR. That is theoretically doable, but very cumbersome and fragile - hard to get 100% coverage.
Blocking all signals and using signalfd solves that general design flaw of signals (though Linux-only). There's also ppoll(), but I think that's Linux-only as well.

Then you have a race condition: what if the signal just happens to arrive while no system call is in progress; the flag that was set by the signal handler has already been checked; and then a blocking syscall gets invoked. Now the process blocks forever. I think this problem cannot be solved; this is why you need a pipe (or eventfd) that keeps waking up epoll until you have handled the event.

Most people solve this by just waking up their event loops periodically. But it's a bad design. Don't wake up unless you have a reason to.

Another signal handler related bug

Posted Jul 10, 2024 19:11 UTC (Wed) by madscientist (subscriber, #16861) [Link] (1 responses)

ppoll() is Linux-only, but pselect() is defined by POSIX (as of Issue 6, in 2004) and widely implemented.

https://pubs.opengroup.org/onlinepubs/9699919799/toc.htm

Another signal handler related bug

Posted Jul 10, 2024 22:47 UTC (Wed) by wahern (subscriber, #37304) [Link]

ppoll is now standardized in POSIX.1-2024 / SUSv5. It's already available on most extant unix environments, including the 4 major *BSDs, Solaris, and Solaris derivatives. macOS and AIX are the only notable holdouts, I think.

Another signal handler related bug

Posted Jul 11, 2024 11:58 UTC (Thu) by paulj (subscriber, #341) [Link] (11 responses)

I agree signalfd is nice. But generally have to disagree a bit here. The function you call (from normal context) to notify of a signal shouldn't be calling blocking syscalls, nor should your code. Further, the event handler loop dealing with calling these (normal context)-signal-handler/callbacks should itself block all signals while dealing with this.

I wrote a small "bump signal handling to normal context" abstraction myself many years ago (before signalfd), and it worked well across a number of Unixes - avoided races between the (non-sig-context) signal handlers and actual signals, etc.. Yes, it's a trap for the unwary, but it can be dealt with. And it can be dealt with once, and not be a bother after that.

A library generally should not be calling blocking syscalls. They should let the user handle the I/O and waiting/sleeping. A library that is calling poll, system, doing blocking I/O, etc., under your feet is nearly always a terrible library that you should avoid. A good library exposes APIs to allow the user to orchestrate these things. (Or should itself provide the event handling abstraction that does the orchestration, with the user able to plug their code into that - glib, SDL, etc.).

I agree there is a lot of shit library code out there. There are too many programmers out there who take shortcuts in library code, and throw bombs in their code. My advice is to check your dependencies (grep the source, check runtime with strace) and avoid the shit. Knuth was more right than wrong about direct code-reuse of other people's shit being more bad than good.

Another signal handler related bug

Posted Jul 11, 2024 12:12 UTC (Thu) by make (subscriber, #62794) [Link] (10 responses)

> But generally have to disagree a bit here.

But you did not actually disagree :-)

> shouldn't be calling blocking syscalls

This is a misunderstanding. The "blocking syscall" I meant was things like poll() (or epoll_wait() etc.). If you check whether a signal has arrived before invoking poll(), your code is racy. That was my point.

> the event handler loop dealing with calling these (normal context)-signal-handler/callbacks should itself block all signals while dealing with this

This repeats what I wrote in the post you're replying to.

Another signal handler related bug

Posted Jul 11, 2024 12:18 UTC (Thu) by paulj (subscriber, #341) [Link] (9 responses)

Meh. Just have a timeout in your select() and wake up every 500ms to 1s to check your signals. There's always other timer based events to also deal with anyway.

Another signal handler related bug

Posted Jul 11, 2024 12:24 UTC (Thu) by make (subscriber, #62794) [Link]

No, please not. I explained why in the first post you replied to.

Another signal handler related bug

Posted Jul 11, 2024 12:30 UTC (Thu) by atnot (subscriber, #124910) [Link] (7 responses)

> Just have a timeout in your select() and wake up every 500ms to 1s to check your signals

nice, so if every daemon on my system does this, I get a wakeup every >10ms and the CPU can never enter any low power states, just to slightly simplify the implementation for some obscure unixes

Another signal handler related bug

Posted Jul 11, 2024 12:47 UTC (Thu) by make (subscriber, #62794) [Link] (1 responses)

> if every daemon on my system does this, I get a wakeup every >10ms

If? Try "powertop" and get scared by how many programs on your computer are badly written.

stracing random processes and getting upset is one of my hobbies ;-)
(Bonus annoyance for stracing Go programs.)

Electron

Posted Jul 12, 2024 2:03 UTC (Fri) by ringerc (subscriber, #3071) [Link]

It's hard to beat Electron as an offender in the unnecessary wakeups game.

I remember when Skype switched from their Qt client to an Electron based one (when I still had to use it for work) my laptop battery life fell by ⅓.

Another signal handler related bug

Posted Jul 11, 2024 13:01 UTC (Thu) by paulj (subscriber, #341) [Link] (4 responses)

I agree signalfd is better.

Before signalfd or on other unices, I'd just use a longish wakeup interval (not 10ms!). I mostly do networking code, so just the protocol themselves will usually have some timeout somewhere that needs a timeout - so sleeps/waits are bounded anyway. The /tiny/ sliver of time between exiting the (normal context) seen-signal dispatch code and going back to blocking in select() and the chance of hitting a signal there is so low, that's it not worth micro-optimising for. Signals in my code are not latency sensitive. They can wait a second or two to be handled.

Another signal handler related bug

Posted Jul 11, 2024 13:09 UTC (Thu) by make (subscriber, #62794) [Link] (3 responses)

"sleeps/waits are bounded" is not a good excuse. If your networking code just happens to be idle at the moment, it shouldn't wake up at all until somebody comes along and sends something to it.

And this isn't about "micro-optimizing". Admittedly a SIGHUP or a SIGTERM has no extreme latency requirements, but that's not what this is about, this framing misses the point.
If you write good code, you get immediate signal handling for free. If you write bad code, signal handling will have to wait until something just happens to wake up your event loop. This delay is a metric for bad code.

("Good" and "bad" are, of course, just my taste. You think your code is fine, that's okay, as long as your code doesn't run on my computer...)

Another signal handler related bug

Posted Jul 11, 2024 13:26 UTC (Thu) by paulj (subscriber, #341) [Link] (1 responses)

We're not disagreeing. Signalfd fixes that hole. Prior to signalfd, I just bounded the select timeout.

I disagreed that signals were "missed" - they're not missed, but (extremely rarely) you get a higher latency on handling the signal than otherwise. I was just responding originally to your claim "So you need a way to wake up the event loop" and the seeming implication (whether just by incomplete comms on your side, or poor interpretation on my side, or both) that syscalls did not wake by themselves if signals occurred.

With signalfd you can avoid the periodic (nearly always needless) wakes, and avoid the high response latency. Completely agreed with you there.

Writing to a pipe rather than setting a flag seems nice too. Is there any downsides / gotchas to that? If no gotchas, that seems like a perfect and portable solution to turn SIGs into events you can deal with in select loop.

Another signal handler related bug

Posted Jul 11, 2024 18:20 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> We're not disagreeing. Signalfd fixes that hole. Prior to signalfd, I just bounded the select timeout.

Or a self-pipe. It can also be poll()-ed.

Another signal handler related bug

Posted Jul 11, 2024 13:46 UTC (Thu) by atnot (subscriber, #124910) [Link]

It does sort of have a latency requirement in that you will get the dreaded "a stop job is running for x.service", have to wait a minute until it finally gets SIGKILL'd and hope that it can recover from whatever that state is. This is extra likely when the rest of the system has been shut down already and there's nothing to cause normal wakes.

Probably not the biggest deal, but there's just no point to using such fragile cludges where you have to cross your fingers and hope for the best when we already have a solution that's more robust, easier to implement and very difficult to get wrong, especially in a way that would put sysadmins across the globe on high alert to apply some patches.

Another signal handler related bug

Posted Jul 10, 2024 16:19 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (5 responses)

> I think people should not write signal handlers at all, at least not on Linux; Linux has signalfd which is the only
> secure (and sane) way to handle signals.

signalfd requires the signals it's supposed to receive to be blocked to prevent default actions from being taken. There are at least three more ways to handle signals safely, ie, in such a way that code handling a signal can do whatever it wants without risking bad interactions with other code. They also required to set of handled signals to be blocked throughout ordinary program execution.

1. Define signal handlers via sigaction such that all handled signals are blocked during execution of any signal handlers. Only unblock signals as implicit effect of the p-type synchronous I/O mulitplexing call, ie, pselect, ppoll or epoll_pwait.

2. Define signal handlers as above, use signals for I/O readiness notification¹. Use a central loop which looks like this:

while (1) {
sigprocmask(SIG_SETMASK, &omask, NULL):
pause();
sigprocmask(SIG_BLOCK, &handled_set, NULL);
}

3. Use signals for I/O readiness notification and a loop built around sigwait, sigwaitinfo or sigtimedwait to receive signals.

¹ Linux supports signal-driven I/O for any number of file descriptors by providing a file descriptor and an I/O event type in struct siginfo.

Another signal handler related bug

Posted Jul 10, 2024 16:47 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link]

Correcting some errors and omissions:

omask is supposed to be a sigset_t which was used to capture the signal mask the process was invoked with.

handled_set is supposed to be a sigset_t containing all signals which are supposed to be handled by the program.

Signal-driven I/O also requires using queued realtime signals which can be associated with a file descriptors via the F_SETSIG fcntl-operation. Otherwise, I/O events could get lost. Code employing this model also needs to handle possible signal queue overflows in some way.

Another signal handler related bug

Posted Jul 10, 2024 17:01 UTC (Wed) by make (subscriber, #62794) [Link] (3 responses)

1. is the old way of doing signals safely (before signalfd was invented), and I mentioned that here: https://lwn.net/Articles/981540/ - though I'm not sure if that's portable or Linux-only

2. still suffers from the race condition; SIG_SETMASK will immediately deliver all pending signals, and pause() will block forever, or am I missing something? (I've never used pause()) - and maybe a signal gets delivered between SIG_SETMASK and pause()

3. maybe you can use signals for I/O readiness, but what's the point? epoll/kqueue/... is so much easier; it has simpler semantics and fewer footguns (you named one example: signal queue overflows)

(None of this matters for OpenSSH, because they're using alarm() to implement a timeout for some blocking function which has no event loop. The SIGALRM handler simply exits the process.)

Another signal handler related bug

Posted Jul 10, 2024 18:03 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link] (2 responses)

There's no race condition in 2) because the program (I've actually written programs in this way) is entirely signal driven. Once an event again occurs, a signal handler will be invoked which will interrupt the pause call as a side effect. The loop I posted is too complicated, though: It's sufficient to do the SIG_SETMASK before the loop which can just become

while (1) pause();

Queued realtime signals for I/O readiness notification is a somewhat older way than epoll to make synchronous I/O multiplexing scale better than possible with select/ poll because the complete interest set needs to be transferred to the kernel with every call of either. I used that once in a HTTP interception proxy supposed to provide content-filtering for an UTM appliance almost 20 years ago.

Another signal handler related bug

Posted Jul 10, 2024 18:24 UTC (Wed) by make (subscriber, #62794) [Link] (1 responses)

> There's no race condition in 2) because the program (...) is entirely signal driven

OK, I was confused by the two sigprocmask() calls which made no sense. If that's the whole main loop, there is indeed no race condition, because there is no race - not because the code is flawless.

> realtime signals for I/O readiness notification is a somewhat older way than epoll

I've never used that, that's why I asked: what's the point? I was wondering if I should learn about this or whether this is just obsolete stuff, and I should keep avoiding signals whenever possible.

Another signal handler related bug

Posted Jul 10, 2024 18:33 UTC (Wed) by rweikusat2 (subscriber, #117920) [Link]

>> There's no race condition in 2) because the program (...) is entirely signal driven
>OK, I was confused by the two sigprocmask() calls which made no sense. If that's the whole main loop, there is
> indeed no race condition, because there is no race - not because the code is flawless.

The code is correct with or without the superfluous sigprocmask calls in the loop as it doesn't rely on pause being interrupted. That's just something which may happen and the whole purpose of the loop is to work around that. Anything actually productive only happens because this or that signal handler got invoked by the kernel.

Another signal handler related bug

Posted Jul 18, 2024 19:31 UTC (Thu) by mrugiero (guest, #153040) [Link]

> (Coincidentally, I have written a SSH server last year which, of course, uses signalfd and is not affected by this class of bug. Of course, other classes of bugs may well be present. ;-))

Interesting. Can I check it out?