User-managed concurrency groups
Posted Dec 28, 2021 16:51 UTC (Tue) by ms (subscriber, #41272)
Parent article: User-managed concurrency groups
Posted Dec 28, 2021 17:11 UTC (Tue)
by khim (subscriber, #9252)
[Link]
> I've sometimes read one of the reasons green threads have fast context switching is because the kernel isn't involved.

Switching to the kernel and back is not that slow (even green threads make syscalls). What's slow is waiting for the next thread to be scheduled by the kernel and become runnable. Basically this is a band-aid for the fact that, many years ago, GNU/Linux rejected NGPT and went with NPTL.

If you allocate, essentially, a dedicated kernel thread for each "green thread", then you can use all the syscalls and libraries that are allowed for regular threads: the parts of the program where "green threads" are cooperating work like proper "green threads", but if you call some "heavy" function then, instead of freezing your whole "green thread" machinery, you just take a one-time hit when the kernel saves your bacon and gives you a chance to remove the misbehaving "green thread" from the cooperating pool.

A proper TLS area is another benefit. In systems where (as on Windows) fibers (AKA "green threads") have their own private storage but share the TLS of their "kernel threads", it's much easier to mess things up. That one is of course possible without kernel help, but you get it for free if you use the kernel thread machinery as a "safety net" for misbehaving fibers.
Posted Dec 29, 2021 22:39 UTC (Wed)
by nksingh (subscriber, #94354)
[Link] (6 responses)
With traditional M:N scheduling like fibers, if the user-threaded code blocks, no code in the app gets control to choose what runs next, unless the blocking call goes through the userspace threading library. This is a major part of the reason that Go and libuv wrap all of the IO calls: it lets them control their green-thread scheduling.
UMS allows such a runtime to effectively get a callback to decide what to do next (e.g. schedule a new lightweight task) when something blocks. This is a great idea if you have a set of syscalls from your task that may or may not block in an unpredictable manner, like a read from the pagecache where you don't know if you'll miss. You can be optimistic, knowing that you'll get a callback if something goes wrong.
However, using this mechanism requires significant investment in the language ecosystem, as has been done with goroutines. And I don't think there's a massive performance uplift even in the best of cases, but perhaps Google has measured something worthwhile in their workloads.
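The wrapped-IO pattern described above (the alternative to a UMS-style kernel callback) can be sketched in userspace: the runtime never issues a blocking read directly; it makes the file descriptor non-blocking, and EAGAIN becomes a "yield" back to the scheduler. This is a minimal illustrative sketch for Linux/Unix, not any real runtime's implementation; the `task`/`runScheduler` names are invented for the example.

```go
package main

import (
	"fmt"
	"syscall"
)

// task is one "green thread" step: it returns true if it would have
// blocked and wants to be rescheduled, false when it is finished.
type task func() bool

// runScheduler drains a FIFO run queue, re-queueing tasks that would
// have blocked. Because the IO is wrapped and non-blocking, control
// always returns to this loop instead of stalling in the kernel.
func runScheduler(queue []task) {
	for len(queue) > 0 {
		t := queue[0]
		queue = queue[1:]
		if t() {
			queue = append(queue, t) // would have blocked: retry later
		}
	}
}

// demo runs a reader task that initially finds no data (EAGAIN) and
// yields, plus a writer task that supplies the data; the re-queued
// reader then completes. It returns what the reader received.
func demo() string {
	var fds [2]int
	if err := syscall.Pipe(fds[:]); err != nil {
		panic(err)
	}
	syscall.SetNonblock(fds[0], true) // reads report EAGAIN instead of blocking

	var got string
	reader := func() bool {
		buf := make([]byte, 16)
		n, err := syscall.Read(fds[0], buf)
		if err == syscall.EAGAIN {
			return true // no data yet: yield to the scheduler
		}
		if err != nil {
			panic(err)
		}
		got = string(buf[:n])
		return false
	}
	writer := func() bool {
		syscall.Write(fds[1], []byte("hello"))
		return false
	}

	runScheduler([]task{reader, writer})
	return got
}

func main() {
	fmt.Println(demo()) // prints "hello"
}
```

The UMS idea in the comment above is the inverse of this: instead of the runtime rewriting every IO path to be non-blocking, the kernel notifies the scheduler loop when a plain blocking call actually blocks.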
Posted Dec 29, 2021 22:43 UTC (Wed)
by nksingh (subscriber, #94354)
[Link]
It's not clear that the premises have aged well, since threads are quite popular and do actually perform well enough.
Posted Dec 29, 2021 23:08 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
This kind of scheduling is not very useful for Go, because it needs to manage stacks for goroutines. Basically, Go reuses a pool of system threads, switching the current stack when needed instead of just letting the thread go to sleep.
Posted Dec 29, 2021 23:49 UTC (Wed)
by nksingh (subscriber, #94354)
[Link] (3 responses)
Posted Dec 31, 2021 10:51 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
It doesn't really need to.
If a goroutine blocks for some unforeseen reason (because the underlying physical thread is processing a signal, for example), then the queued work (goroutines ready to run) associated with the physical thread will be "stolen" by other threads.
Additionally, if a goroutine gets blocked inside a syscall or a C library call, it isn't counted towards the GOMAXPROCS limit, so the Go scheduler can launch a new thread to replace the blocked one.
It's theoretically possible to have a situation where all threads are blocked, but I can't think of a reason why that would happen.
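The GOMAXPROCS point is easy to observe: park several goroutines in blocking read(2) calls with GOMAXPROCS set to 1, and CPU-bound work on another goroutine still makes progress, because a goroutine blocked in a syscall releases its processor slot to the scheduler. A small self-contained demonstration (raw `syscall.Read` is used so the goroutines really block in the kernel rather than parking on Go's netpoller):

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
	"time"
)

// demo parks four goroutines in blocking read(2) calls on pipes that
// never receive data, while GOMAXPROCS is 1. Goroutines blocked in
// syscalls don't count against GOMAXPROCS, so the CPU-bound loop
// below still runs instead of the program deadlocking.
func demo() int {
	runtime.GOMAXPROCS(1)

	for i := 0; i < 4; i++ {
		var fds [2]int
		if err := syscall.Pipe(fds[:]); err != nil {
			panic(err)
		}
		go func(fd int) {
			buf := make([]byte, 1)
			syscall.Read(fd, buf) // blocks in the kernel indefinitely
		}(fds[0])
	}

	// Give the readers time to actually enter their syscalls.
	time.Sleep(50 * time.Millisecond)

	// CPU-bound work proceeds despite the four blocked goroutines.
	sum := 0
	for i := 1; i <= 100; i++ {
		sum += i
	}
	return sum
}

func main() {
	fmt.Println(demo()) // prints 5050
}
```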
Posted Dec 31, 2021 16:11 UTC (Fri)
by foom (subscriber, #14868)
[Link] (1 responses)
Yet, a syscall could take longer than that on-CPU (without blocking), in which case you've over-scheduled work versus the number of CPUs. Alternately, a syscall might block immediately, in which case you've wasted time during which you could have run another goroutine.
To ameliorate those issues in common cases, there are two other variants of syscall entry: one for invoking a syscall that "never" blocks, and another for syscalls that are considered very likely to block immediately, which resumes another goroutine right away.
This mechanism clearly works, but it really doesn't seem ideal. If the kernel could, instead, notify the go scheduler when a thread has blocked, all this guessing and heuristics could be eliminated.
Posted Dec 31, 2021 22:47 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link]
I'm not sure it would help. You still need a scheduler thread, and checking for a stuck thread right now is pretty fast. I guess one thing that might be helpful is the ability to force-preempt threads. Right now Go uses signals for that, and signals have their drawbacks.
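The signal-based preemption mentioned here (Go 1.14+ sends the running thread a signal, SIGURG, to interrupt it) can be seen in a short experiment: with GOMAXPROCS at 1, a busy-loop with no function calls or allocations offers no cooperative preemption point, yet another goroutine still gets scheduled. This sketch hangs on pre-1.14 toolchains, which is exactly the behavior signal-based preemption fixed:

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// demo spins on one OS thread (GOMAXPROCS=1) in a loop that never
// yields cooperatively. On Go 1.14+ the runtime's sysmon thread
// notices the long-running goroutine and preempts it via a signal,
// letting the second goroutine run and flip the flag.
func demo() bool {
	runtime.GOMAXPROCS(1)

	var stop int32
	go func() { atomic.StoreInt32(&stop, 1) }()

	for atomic.LoadInt32(&stop) == 0 {
		// busy-wait: no call or allocation, so no cooperative
		// preemption point; only async (signal) preemption helps
	}
	return true
}

func main() {
	fmt.Println(demo()) // prints true on Go 1.14+; hangs on older Go
}
```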