User-managed concurrency groups
Posted Jan 1, 2022 23:59 UTC (Sat) by alkbyby (subscriber, #61687)
Parent article: User-managed concurrency groups
Indeed, this is basically an attempt to implement Go-style "cheap" concurrency in a C setting, with a few twists.
There is a video from the 2013 LPC conference with some background: https://www.youtube.com/watch?v=KXuZi9aeGTw. The slides are sadly gone (https://blog.linuxplumbersconf.org/2013/ocw/system/presen... linked from https://blog.linuxplumbersconf.org/2013/ocw/events/LPC201...).
First, fibers (the equivalent of goroutines) are regular POSIX threads. I.e. all the usual things C code is used to doing, like calling gettid(), sending signals, or finding all threads via procfs (GC implementations do that and then send signals), work as before. And gdb will see all the fibers as threads without any special modifications, and core dumps contain every fiber's stack trace in the usual way. This also implies that all thread-local storage works as usual.
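A trivial sketch of that point (not any UMCG API): each "fiber" below is just a pthread, so gettid(), thread-local storage, signals, gdb and core dumps all see it as an ordinary thread.

    // Illustrative only: "fibers" that are plain kernel threads.
    #include <pthread.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <cstdio>

    static thread_local int per_fiber_state = 0;   // plain TLS, nothing special

    static void *fiber_body(void *) {
        per_fiber_state = 42;                      // each fiber gets its own copy
        pid_t tid = (pid_t)syscall(SYS_gettid);    // a real tid, visible in procfs/gdb
        std::printf("fiber tid=%d state=%d\n", (int)tid, per_fiber_state);
        return nullptr;
    }

    int main() {
        pthread_t fibers[4];
        for (auto &f : fibers) pthread_create(&f, nullptr, fiber_body, nullptr);
        for (auto &f : fibers) pthread_join(f, nullptr);
    }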
Second, it gives userspace the ability to schedule CPU time itself, which also includes the ability to limit actual parallelism in some cases. Part of it is the cheapness of scheduling itself: the Linux kernel scheduler has to run all kinds of heuristics which cost CPU cycles and are not always useful. Paul's 2013 presentation has lots on this (of course, these days syscalls are not cheap anymore after all the security crap, but there is hope that madness will end with time). Part of it is limiting actual parallelism to help caching effects. I.e. when you have a thread/fiber/goroutine-per-request model, you often do things like sending RPCs to your backends concurrently, i.e. with "child" fibers, and having those child fibers not bounce around all the hundreds of cores we have these days helps efficiency a lot (see the sketch below). There are other uses too, so IMHO whoever represents Google in the discussion should post a decent list with specific examples and numbers where possible. Otherwise people should/will ask why not simply make the kernel scheduler cheap and "right" enough.
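A minimal sketch of the "limit actual parallelism" idea, using plain standard-library threads rather than any UMCG interface; the ParallelismLimiter name and structure are purely illustrative:

    // Cap how many of a request's child fibers run in parallel at once,
    // keeping related work on a few warm caches instead of fanning out
    // across every core in the machine.
    #include <condition_variable>
    #include <mutex>
    #include <thread>
    #include <vector>

    class ParallelismLimiter {
     public:
      explicit ParallelismLimiter(int max_running) : slots_(max_running) {}

      // A child fiber calls this before doing CPU work (e.g. issuing its RPC).
      void Enter() {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [&] { return slots_ > 0; });
        --slots_;
      }
      // ...and this when it blocks or finishes, handing the slot to a sibling.
      void Exit() {
        std::lock_guard<std::mutex> lk(mu_);
        ++slots_;
        cv_.notify_one();
      }

     private:
      std::mutex mu_;
      std::condition_variable cv_;
      int slots_;
    };

    int main() {
      ParallelismLimiter limiter(2);   // at most 2 child fibers on CPU at once
      std::vector<std::thread> children;
      for (int i = 0; i < 8; ++i)
        children.emplace_back([&] {
          limiter.Enter();
          // ... prepare and send the backend RPC ...
          limiter.Exit();
        });
      for (auto &t : children) t.join();
    }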
And third, there is essentially a scheduler-activations implementation to deal with fibers blocking inside syscalls, so that IO doesn't suck as much as it otherwise would. This is the classic, well-known challenge of M:N or green threads (a toy sketch of the control flow follows below).
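A toy, purely illustrative simulation of that control flow (nothing here is the proposed kernel interface): a "server" keeps one worker on CPU, and when the running worker announces that it is about to block, the server hands the slot to the next runnable one. In the real design the "worker blocked" event comes from the kernel, which is exactly what makes it cover syscalls and page faults you cannot intercept.

    #include <chrono>
    #include <condition_variable>
    #include <cstdio>
    #include <deque>
    #include <mutex>
    #include <thread>
    #include <vector>

    struct Scheduler {
      std::mutex mu;
      std::condition_variable cv;
      std::deque<int> runnable;   // worker ids ready to run
      int on_cpu = -1;            // worker currently holding the CPU slot, -1 = free
      int finished = 0;

      void enqueue(int id) {                // worker became runnable
        std::lock_guard<std::mutex> lk(mu);
        runnable.push_back(id);
        cv.notify_all();
      }
      void wait_for_cpu(int id) {           // worker waits to be dispatched
        std::unique_lock<std::mutex> lk(mu);
        cv.wait(lk, [&] { return on_cpu == id; });
      }
      void block(int id) {                  // "about to block in a syscall"
        std::lock_guard<std::mutex> lk(mu);
        if (on_cpu == id) on_cpu = -1;      // give the slot back to the server
        cv.notify_all();
      }
      void done(int id) {
        std::lock_guard<std::mutex> lk(mu);
        if (on_cpu == id) on_cpu = -1;
        ++finished;
        cv.notify_all();
      }
      void serve(int nworkers) {            // the "server" / activation handler
        std::unique_lock<std::mutex> lk(mu);
        while (finished < nworkers) {
          cv.wait(lk, [&] { return finished == nworkers ||
                                   (on_cpu == -1 && !runnable.empty()); });
          if (on_cpu == -1 && !runnable.empty()) {
            on_cpu = runnable.front();      // dispatch the next fiber
            runnable.pop_front();
            cv.notify_all();
          }
        }
      }
    };

    int main() {
      Scheduler sched;
      const int N = 3;
      std::vector<std::thread> workers;
      for (int id = 0; id < N; ++id)
        workers.emplace_back([&, id] {
          sched.enqueue(id);
          sched.wait_for_cpu(id);
          std::printf("worker %d: running\n", id);
          sched.block(id);   // the kernel would notice this by itself
          std::this_thread::sleep_for(std::chrono::milliseconds(10)); // "syscall"
          sched.enqueue(id);
          sched.wait_for_cpu(id);
          std::printf("worker %d: back from syscall\n", id);
          sched.done(id);
        });
      sched.serve(N);
      for (auto &t : workers) t.join();
    }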
Notably, this is what Google-internal fibers don't have yet (as Peter points out too), since inside Google most processes only do "IO" by RPCing to other services, and pretty much all of the actual blocking/sleeping goes through the abseil mutex/condition variable, which has facilities to call into the fiber scheduler. I.e. those functions are weak for a reason: https://github.com/abseil/abseil-cpp/blob/1ae9b71c474628d...
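The weak-symbol trick works roughly like this; the names below are made up (the real hook names are in the linked file), it only shows the override mechanism:

    // weak_default.cc -- what the library ships: a weak default that just blocks.
    #include <cstdio>
    extern "C" __attribute__((weak)) void LowLevelWait() {
      std::puts("default: block the whole kernel thread (futex/condvar wait)");
    }

    // fiber_override.cc -- what a fiber runtime links in: a strong definition
    // that wins at link time and hands control to the userspace scheduler.
    #include <cstdio>
    extern "C" void LowLevelWait() {
      std::puts("override: park this fiber and switch to another runnable one");
    }

    // main.cc
    extern "C" void LowLevelWait();
    int main() { LowLevelWait(); }  // with both objects linked in, the strong
                                    // override is what actually runs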
And those services that actually handle local SSDs and disks use proper async IO facilities to do their job. Stuff like blocking on page faults or mm syscalls is indeed not that important in practice to schedule around anyway.
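The comment doesn't say which async IO facility those services use; as one concrete example of what "proper async IO" looks like on Linux, here is a minimal io_uring read via liburing, where submission is decoupled from completion and no thread has to block inside read():

    #include <liburing.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
      struct io_uring ring;
      if (io_uring_queue_init(8, &ring, 0) < 0) return 1;

      int fd = open("/etc/hostname", O_RDONLY);
      if (fd < 0) return 1;

      char buf[256];
      struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);  // queue the read
      if (!sqe) return 1;
      io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
      io_uring_submit(&ring);

      // A fiber scheduler would go run other work here instead of waiting.
      struct io_uring_cqe *cqe;
      io_uring_wait_cqe(&ring, &cqe);                      // reap the completion
      if (cqe->res > 0) std::printf("read %d bytes\n", cqe->res);
      io_uring_cqe_seen(&ring, cqe);

      close(fd);
      io_uring_queue_exit(&ring);
    }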
But obviously, to get this kind of threading implementation ready for the non-datacenter world, it definitely needs an efficient and general way to deal with blocking syscalls.
Basically, when all this works as intended, we'll simply get a cheaper and faster threading facility that is not too much slower than goroutines, but whose IO doesn't suck even in corner cases.