Not really userspace
Posted Jul 17, 2024 1:03 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)
Parent article: Redox to implement POSIX signals in user space
How I would do truly userspace signals:
1. Listen for signals in a separate background thread that is always active. They arrive through some kind of IPC channel.
2. Add a syscall that asks the kernel to interrupt and pause a given process/thread and return its runtime context (register file).
3. That's all.
The signal handling implementation has two options:
1. It can "steal" the thread by pointing its next instruction at the user's signal-handling code, and then un-pause it.
2. Take a leaf out of Windows NT and just run the signal handling from within the background thread.
The second approach will probably be slightly non-POSIX and can be opt-in. But it provides a _sane_ way to handle everything, even asynchronous signals like SIGSEGV. The monitoring thread can even be hardened with a separate address space.
Posted Jul 17, 2024 8:24 UTC (Wed)
by ddevault (subscriber, #99589)
[Link] (11 responses)
>Listen for signals in a separate background thread that is always active. They arrive through some kind of an IPC channel.
A background thread (thread here implying that it lives in the same address space as the signaled process) is not necessary and probably not desirable. In a microkernel context, what you're more likely to have is some kind of process server which oversees running processes and implements process-management semantics for them. It holds capabilities (or whatever) to instruct the kernel to suspend/resume the target process, read/write its register file, and modify its address space, which is (almost) enough to implement signals in user space. Such a process server would also probably be responsible for implementing things like fork/exec, pthread_create, etc.
But the only real clarification I'd add to your comment is that this works better as a supervisor server than as a thread in the process.
Posted Jul 17, 2024 17:40 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (10 responses)
Ultimately, if signals are split into two primitives, thread suspension and code injection into a thread's context, then combining them in different ways gives a much more powerful tool than current signals.
Posted Jul 17, 2024 17:50 UTC (Wed)
by ddevault (subscriber, #99589)
[Link] (9 responses)
Posted Jul 17, 2024 21:08 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (5 responses)
In your case, the supervisor (signald?) would be responsible for all the signals in the system. That would be OK for POSIX software, and native software would just use the "suspend thread" call directly instead, like the Go runtime, which uses signals to preempt threads running in tight inner loops as part of its user-space scheduling.
Posted Jul 18, 2024 9:22 UTC (Thu)
by 4lDO2 (guest, #172237)
[Link] (4 responses)
Assuming the current implementation proposal does not significantly change when the process manager takes over the signal-sending role from the kernel, this is true to some extent, but it doesn't require any dynamic mapping of other processes' memory, or kernel interfaces for register modification. The target threads themselves handle register save/restore, and the temporary old register values (like the instruction pointer before it is overwritten) are stored in a shared memory page. So apart from the suspend/interrupt logic, the kernel only needs to be able to set the target instruction pointer. It's too early to say, but maybe this will be reduced to a userspace-controlled IPI primitive?
(The kernel does already support ptrace though.)
Posted Jul 18, 2024 19:51 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
Looks like it. Basically, the whole design can be:
1. A separate signald process that provides the API for signal masking and queueing.
2. Signal functions in libc simply do RPC calls to the signald.
The kernel then needs to have this additional functionality:
1. A syscall to pause a given thread and return the thread context (register file and whatever additional information is needed). The pause functionality can work even if the thread is in kernel space, or it can be deferred to syscall-return time.
2. A syscall to resume a given thread with the provided thread context.
3. Asynchronous exceptions (like SIGBUS/SIGSEGV) in the kernel automatically pause the offending thread and submit the thread context to the signald via some kind of IPC.
Signald can then do all the processing and masking logic. It also neatly removes from the kernel all the corner cases with double signals, signal targeting, and so on.
It also opens the way for a better API in the future.
Posted Jul 19, 2024 15:34 UTC (Fri)
by 4lDO2 (guest, #172237)
[Link] (2 responses)
> 2. Signal functions in libc simply do RPC calls to the signald.
This is not how the current implementation works, and it would probably be too inefficient for signals to remain meaningful for non-legacy software. Currently, sigprocmask/pthread_sigmask, sigaction, sigpending, and the sigentry asm which calls the actual signal handlers are all implemented without any syscalls/IPC calls; they only modify shared memory locations. Sending process signals (kill, sigqueue) requires calling the kernel (later, the process manager) for synchronization reasons. And although sending thread signals (raise, pthread_kill) currently also calls the kernel, it may eventually be possible to do that in userspace too, only calling the kernel if the target thread was blocked at the time the signal was sent, much like futex. That is what I meant by "userspace-controlled IPI primitive".
> The kernel then needs to have this additional functionality:
> 1. A syscall to pause a given thread and return the thread context (register file and whatever additional information is needed). The pause functionality can work even if the thread is in kernel space, or it can be deferred to syscall-return time.
> 2. A syscall to resume a given thread with the provided thread context.
> 3. Asynchronous exceptions (like SIGBUS/SIGSEGV) in the kernel automatically pause the offending thread and submit the thread context to the signald via some kind of IPC.
That is exactly what ptrace allows, but this signals implementation is not based on tracing the target thread and externally saving/restoring its context; it is based on *internally* saving/restoring the context on the same thread, very similar to how an interrupt handler works. The kernel only needs to save the old instruction pointer, jump userspace to the sigentry asm, and mask signals; the target context will *itself* obtain a stack, push registers, and so on. The same applies to exceptions, which will be handled *synchronously* (using a mechanism similar to signals), analogous to CPU exceptions like page faults. Though it might make sense to allow configuring exceptions asynchronously as an alternative, so that a (new) tracer is always notified when e.g. a crash occurs if a program is not explicitly prepared for such events.
Posted Jul 19, 2024 17:40 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
Honestly, signals shouldn't be used by non-legacy software. They're a bad primitive: not composable, limited in number, and so on.
If instead you have a primitive specifically designed for manipulating running threads, it might be more useful. A great example is the Go runtime, which uses signals to interrupt inner loops. Once a thread is interrupted, it runs a conservative pointer scan over the most recent stack frame and the registers, to protect new objects from being garbage-collected.
Additionally, the handler doesn't _have_ to be in a different process. It can be in a background thread within the same process, so the amount of context switches can be the same compared to regular signal handling.
> That is exactly what ptrace allows, but this signals implementation is not based on tracing the target thread and externally saving/restoring the context, it's based on *internally* saving/restoring the context on the same thread.
Yeah, that has been a constant issue with signals. They depend on the thread's environment being sane, so sigaltstack() was an inevitability. And if you have sigaltstack(), then why not just extend it to handling via IPC?
Posted Jul 19, 2024 21:14 UTC (Fri)
by 4lDO2 (guest, #172237)
[Link]
I agree signals are a bad primitive for high-level code, and it's a shame POSIX has reserved almost all of the standard signals, many of which are things signals are too low-level for (SIGWINCH, for example). Signalfd or sigwait are much better in those cases, or a high-level queue-based abstraction like the `signal-hook` crate. It would probably be better if the 'misc' signals were queue-only, or not signals at all, if exceptions and signals were separated, and possibly if SIGKILL and SIGSTOP were made non-signals.
> If instead you have a primitive specifically designed as a way to manipulate running threads, then it might be more useful.
This is sort of what I'm trying to reduce the kernel part of the implementation into. Just a way to IPI a running thread and set its instruction pointer, and then let that thread decide what it should do. Possibly even literally using IPIs, such as Intel's SENDUIPI feature, and possibly using "switch back from timer interrupt" hooks (with the additional benefit of automagically supporting restartable sequences). This would be without any context switches at all, although a mode switch for the receiver, if both the sender and receiver are simultaneously running.
This is of course useful for runtimes like Go, the JVM, and possibly even async Rust runtimes (maybe a Redox driver can be signaled directly if a hardware interrupt occurs, coordinated with the runtime), which aren't (necessarily) based on switching stacks.
> Additionally, the handler doesn't _have_ to be in a different process. It can be in a background thread within the same process, so the amount of context switches can be the same compared to regular signal handling.
> Yeah, that has been a constant issue with signals. It depends on the thread's environment being sane, so sigaltstack() was an inevitability. And if you have sigaltstack(), then why not just extend it to handling via an IPC?
Switching stacks is, apart from TLS (assuming x86 psabi TLS is required), virtually the same thing as switching between green threads, and on some OSes regular threads and green threads are even the same thing (pre-Windows-11 UMS, AFAIK). That could perhaps eventually include Redox. I don't understand why one would want IPC (assuming you mean process and not processor) except when tracing, as that would suffer the usual context-switch overhead, which is probably too high for a language/async runtime.
Posted Jul 18, 2024 9:28 UTC (Thu)
by 4lDO2 (guest, #172237)
[Link] (2 responses)
Posted Jul 19, 2024 7:42 UTC (Fri)
by ddevault (subscriber, #99589)
[Link] (1 responses)
Posted Jul 19, 2024 15:10 UTC (Fri)
by 4lDO2 (guest, #172237)
[Link]
The synchronization works as follows: for thread signals (pthread_kill, raise), the thread pending and thread mask bits live in the same 64-bit atomic word, so the thread is always either correctly unblocked by the kernel/proc manager, or pthread_sigmask will see that signals were pending at the time it blocked/unblocked them. For process signals (kill, sigqueue, etc.) it's a little trickier: the process pending bits are cleared competitively by the notified threads when detected, and in rare cases spurious signals may occur (which are handled properly). Both process and thread signals currently require the kernel to hold an exclusive lock when *generating* the signal (especially for the SIGCHLD logic, which really can't be synchronized otherwise), but it's possible pthread_sigmask will later be able to bypass the kernel entirely.
Posted Jul 17, 2024 8:56 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (8 responses)
Given that Linus regularly ignores what he considers stupid Posix design decisions, that's the way I'd go. "This is the Redox way. This is the design rationale. If you want Posix here's a bodge to support it".
If you can implement Posix within a sane design rationale, then go for it! If you can't, well don't let Posix break the rest of your design!
Cheers,
Wol
Posted Jul 18, 2024 5:25 UTC (Thu)
by wahern (subscriber, #37304)
[Link] (7 responses)
These are considered "hacks" now, but this was the original purpose of signals: to provide a thin and safe[2] abstraction over hardware interrupts that userspace could make use of. As common usage and experience of signals evolved, e.g. relying on SIGHUP as a notice to reload a configuration file, signals began to seem hopelessly baroque and anachronistic. Most developers now live in a world much further up the stack of abstractions. But the original motivation and need still exist, and if we really care about software security, we need *more* *safe* programmatic access to hardware facilities from userspace, to remove complexity from the kernel. I agree userspace signals sound like a good idea (notwithstanding that kernel implementations are already quite simple and thin, shunting much of the complexity to userspace by design), but don't throw the baby out with the bath water by only catering to the usages that don't really benefit from Unix signal semantics.
[1] In microbenchmarking branch prediction makes explicit boundary and NULL checking seem free, but branch prediction is a limited resource and is better spent on other application-specific logic. SIGSEGV effectively lets you delegate these tasks to the MMU, which it has to do anyhow. Why do it twice, soaking up other precious hardware resources in the process?
[2] From the kernel's and system's perspective, as opposed to letting processes poke at hardware facilities more directly using more complex protocols.
Posted Jul 18, 2024 7:50 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (6 responses)
"Press [X] to doubt".
Signals are not at all fast because they necessarily involve several context switches. I wrote a short benchmark: https://gist.github.com/Cyberax/5bd53bff3308d6e026d414f93... - it takes about 2 seconds on my Linux machine to run 1 million iterations of SIGSEGV handling.
Perf data: https://gist.github.com/Cyberax/aa96a237a9b04ed6f25e09c63... - so around 500k signals per second. That's not bad, but it's also not that _great_ as a primitive for high-performance apps.
Posted Jul 18, 2024 8:31 UTC (Thu)
by mgb (guest, #3226)
[Link] (1 responses)
Posted Jul 18, 2024 19:08 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
And if you have such hardware capabilities, you should use a different primitive for flow control.
Posted Jul 18, 2024 9:40 UTC (Thu)
by 4lDO2 (guest, #172237)
[Link] (2 responses)
The point of catching SIGSEGV this way is not that exceptions are faster than checks, but that they *are* faster overall if the probability of the check failing is low enough to justify avoiding the check in the general case. After all, this is why Linux (and Redox) implements copy_from_user as a catchable memcpy function, rather than walking the page tables every time. Most applications won't EFAULT more than once.
Posted Jul 18, 2024 19:30 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
FWIW, Windows also uses a similar model. Memory faults can be caught via SEH ("Structured Exception Handling"). It still has to round-trip through the kernel, but it's more lightweight compared to signals.
> The point of catching SIGSEGV this way, is not that exceptions are faster than checks, but that they *are* faster overall if the probability of the check failing, is low enough to justify avoiding this check in the general case. After all, this is why Linux (and Redox) implements copy_from_user as a catchable memcpy function, rather than walking the page tables every time. Most applications won't EFAULT more than once.
Sure, but then it's also fine if the signal handling is done through a userland process/thread. It will add a couple of context switches, but it'll still be fast enough.
Posted Jul 19, 2024 16:16 UTC (Fri)
by 4lDO2 (guest, #172237)
[Link]
That's pretty cool. My understanding is that Windows generally offloads much more logic to userspace, as they can freely change the kernel/user ABI.
> Sure, but then it's also fine if the signal handling is done through a userland process/thread. It will add a couple of context switches, but it'll still be fast enough.
Yeah probably. Redox will most likely implement userspace exception handling synchronously, like signals, but for many workloads those edge cases would presumably not be that noticeable either way.
Posted Jul 20, 2024 0:53 UTC (Sat)
by am (subscriber, #69042)
[Link]
If you have hot code with branches checking for null that aren't being taken, the JIT will simply optimize out the checks, giving you better throughput. If a value happens to become null after all, it will dereference it, catch the SIGSEGV, throw out the optimized code ("deoptimize"), and then continue where it left off, giving you a small latency spike.
Posted Jul 17, 2024 9:21 UTC (Wed)
by sthibaul (✭ supporter ✭, #54477)
[Link]
This is indeed what GNU/Hurd does.
It's a bit hairy in some places to manage the interrupted context, but it does work.
