
Containers as kernel objects — again

Posted Feb 28, 2019 9:23 UTC (Thu) by mezcalero (subscriber, #45103)
Parent article: Containers as kernel objects — again

Urks. Upcalls. Can we please stop adding those to the Linux kernel? Upcalls are awful, they just mean that there suddenly exists a userspace process that is entirely independent from the rest of the system, untracked, unmanaged by userspace, with different runtime attributes, security settings and everything else (in which cgroup does it even live, in a world where inner cgroups are not supposed to have processes anymore?). This just sucks, as generally it's highly desirable to apply resource mgmt, security settings and so on to all kernel upcall processes the same way as for every other process in the system, but there's simply no way to do that. Yes, the kernel added some very splintered ways to set some process properties for upcalls (caps mostly), but this is very incomplete and pretty awful.

Besides that, upcalls are also slow, and hence had to be replaced in many cases with something more performant anyway (think: hotplug upcalls, cgroup agent upcalls, and that stuff). Or think of core_pattern handling: let's say you make firefox crash, now the kernel does an upcall for processing that coredump, which is quite often very CPU and IO intensive, to the point of slowing down the system drastically. But of course, since the thing runs as an upcall it will be outside of the resource mgmt of the rest of the system and unrestricted in lifecycle and resource usage, unless it decides to manage itself. In systemd we thus had to replace the core_pattern by a binary that takes the stdin pipe and sends it to a properly managed daemon via AF_UNIX fd passing, and exits quickly, to minimize the unmanaged codepaths. This way the bulk of the core dump processing can be nicely sandboxed, lifecycled and resource managed. But yuck! Why is that even necessary? Why can't the kernel just notify userspace in a friendly way without forking off nutty stub processes?
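
For readers unfamiliar with the handoff described above, here is a minimal sketch of AF_UNIX fd passing in Python (3.9 or later). It is not systemd's code, which is written in C. The function names (forward_fd, receive_fd, demo) are hypothetical, an os.pipe() stands in for the stdin pipe the kernel hands to the core_pattern helper, and a socketpair stands in for the connection to the managed daemon.

```python
import os
import socket


def forward_fd(sock, fd):
    """Stub side: pass an open file descriptor to the daemon and return."""
    # send_fds() wraps sendmsg() with an SCM_RIGHTS control message.
    socket.send_fds(sock, [b"core"], [fd])


def receive_fd(sock):
    """Daemon side: receive one file descriptor passed over the socket."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    return msg, fds[0]


def demo():
    # Assumption: this pipe stands in for the helper's stdin.
    read_end, write_end = os.pipe()
    os.write(write_end, b"fake core dump bytes")
    os.close(write_end)

    # Assumption: this socketpair stands in for the stub-to-daemon socket.
    stub_sock, daemon_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # The short-lived stub hands over the fd and goes away immediately.
    forward_fd(stub_sock, read_end)
    os.close(read_end)
    stub_sock.close()

    # The daemon now owns its own copy of the fd and does the heavy
    # processing inside whatever sandbox and resource limits it runs under.
    msg, fd = receive_fd(daemon_sock)
    with os.fdopen(fd, "rb") as core:
        data = core.read()
    daemon_sock.close()
    return msg, data


if __name__ == "__main__":
    print(demo())
```

The point of the pattern is that the process spawned by the kernel only does the send and exits, so almost nothing runs outside the managed daemon.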

Please, let's just forget about upcalls: provide proper APIs right from the beginning that userspace can subscribe to and then handle without a process being spawned.

(or at least add a generic upcall logic that allows userspace to handle the upcalls instead of the kernel doing the fork()+execve() on its own)

Seriously, fuck upcalls!

Lennart


Containers as kernel objects — again

Posted Feb 28, 2019 16:38 UTC (Thu) by bfields (subscriber, #19510) [Link]

I'm inclined to agree, based on our experience using usermode_helper for some NFS stuff and then realizing it was going to be a pain to spawn them with the right namespaces.

Just one nit: I don't think "upcall" is the right term. I've always heard the word "upcall" used for any request made by the kernel and answered by userspace, however it's done.

Maybe the term you want is "usermode_helper", or "processes spawned from the kernel", or something.

Containers as kernel objects — again

Posted Apr 14, 2019 21:09 UTC (Sun) by jkowalski (guest, #131304) [Link]

Lennart,

Does this help with your use case a bit?

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/...

I assume the helper can then directly make bus calls to construct a transient unit (or invoke a static one)?


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds