Systemd vs. Docker

Posted Feb 27, 2016 0:04 UTC (Sat) by wahern (subscriber, #37304)
In reply to: Systemd vs. Docker by smurf
Parent article: Systemd vs. Docker

Stopping all the members of the cgroup is the same problem as sending a signal to all of them. But you don't need cgroups to send a signal to all of them. Traditional process groups already provide that. That's almost entirely their function, so a controlling process (e.g. a shell) can broadcast a signal to a tree of subprocesses.
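For illustration, here is a minimal Python sketch of the process-group broadcast described above, assuming a POSIX system. The helper names `spawn_group` and `broadcast` are invented for this example, and `sleep 60` merely stands in for real subprocesses.

```python
import os
import signal
import subprocess


def spawn_group(n):
    """Start n sleepers that all share one fresh process group.

    Returns (pgid, list_of_Popen). The first child becomes the group
    leader, so the group ID equals its PID.
    """
    # os.setpgrp() runs in the child before exec and makes it a group leader.
    leader = subprocess.Popen(["sleep", "60"], preexec_fn=os.setpgrp)
    pgid = leader.pid
    members = [leader]
    for _ in range(n - 1):
        # The remaining children join the leader's group before exec.
        child = subprocess.Popen(
            ["sleep", "60"],
            preexec_fn=lambda: os.setpgid(0, pgid),
        )
        members.append(child)
    return pgid, members


def broadcast(pgid, sig=signal.SIGTERM):
    """Deliver one signal to every process currently in the group."""
    os.killpg(pgid, sig)
```

A single `broadcast(pgid)` call reaches every member without the caller enumerating PIDs, which is the property the paragraph above relies on.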

The problem with process groups is that forking daemons usually create a new process group, and systemd is making a superficial attempt to handle those uncooperative daemons. I'd guess the idea was that most existing forking daemons don't know anything about cgroups so aren't going to be changing their membership.
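The escape can be shown with a small Python sketch, again assuming a POSIX system. The "daemon" here is a stand-in one-liner that calls `setsid()`, which moves it into a new session and a new process group. The names `start_in_group` and `wait_until_escaped` are hypothetical helpers, not part of any service manager.

```python
import os
import subprocess
import sys
import time

# Stand-in for a forking daemon: it leaves the group it was started in.
DAEMON = "import os, time; os.setsid(); time.sleep(60)"


def start_in_group():
    """Start a group leader plus a 'daemon' that initially joins its group."""
    leader = subprocess.Popen(["sleep", "60"], preexec_fn=os.setpgrp)
    pgid = leader.pid
    daemon = subprocess.Popen(
        [sys.executable, "-c", DAEMON],
        preexec_fn=lambda: os.setpgid(0, pgid),
    )
    return pgid, leader, daemon


def wait_until_escaped(pid, pgid, timeout=5.0):
    """Poll until the process with this PID is no longer in group pgid."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.getpgid(pid) != pgid:
            return True
        time.sleep(0.01)
    return False
```

Once `wait_until_escaped` returns True, an `os.killpg(pgid, ...)` aimed at the original group still reaches the leader but no longer reaches the daemon.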

As for the kernel maintainers, I never followed the LWN coverage of the systemd debates very closely, but it was my understanding that they were resistant to patches to cgroups which reinvented the wheel of interfaces like process groups. Also, the cgroups data structures are supposedly awkward and inefficient (in part, I assume, because of the nesting semantics) and they were reluctant to allow them to become more deeply embedded in the fundamentals of process management. But maybe I totally misunderstood things.

One possible hack would be to use a seccomp filter to silently ignore attempts to create a new process group. Another might be PID namespaces, though I'm not very familiar with them.

Although, to be clear, IMO all of these options are trying to put lipstick on a pig. I'm not sure systemd _should_ be fixed. I'd rather see all the poorly written software fixed. Teach poorly written daemons to optionally run in the foreground so systemd or any other service manager doesn't have to use hacks to track them. And for daemons that fork new processes, work to improve their correctness, and teach them not to needlessly create new process groups. Basically, _subtract_ code, rather than add hundreds of thousands of lines of new code to the pile.

But things like systemd, Docker, etc, are all the rage these days. Apparently people prefer fixing problems with "full stack" solutions rather than submitting a 5-line diff. Whatever... just keep it all off my lawn :)

Systemd vs. Docker

Posted Feb 27, 2016 1:57 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

PID namespaces solve this problem entirely - you can just kill everything inside of it, without any chance to interfere with the parent namespace.

Process handles are another way to solve this.

Systemd vs. Docker

Posted Feb 27, 2016 7:00 UTC (Sat) by wahern (subscriber, #37304)

Process handles don't solve this problem unless you forbid the daemon from forking. But if the daemon wasn't forking subprocesses this race wouldn't exist.

Process handles are useful for passing off the management of a process. For example, traditional shell-based init.d scripts could acquire a process handle and pass it on to a service manager, in effect obviating the need for a PID file. The service manager wouldn't even need to be PID 1, yet the PID file race issue would be solved just as well (or just as incompletely, depending on your perspective).

It does look like PID namespaces solve this problem. The manual page says that when the PID 1 "init" process in the namespace terminates, all the processes in that namespace are SIGKILL'd. So you could just kill that process without having to bother enumerating the PIDs, perhaps simply by closing its process handle. The problem with that solution--and all the others--is that it's not backwards compatible. Neither the software nor administrators may be expecting the process(es) to be running in a different PID namespace.

Systemd vs. Docker

Posted Feb 27, 2016 7:16 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

They do help if you can kill the daemon-spawned processes faster than they are created. Which you realistically can, not even considering stuff like pid limits.

Systemd vs. Docker

Posted Feb 27, 2016 8:31 UTC (Sat) by smurf (subscriber, #17840)

Process handles do solve the problem because as long as you hold such a handle, the process won't entirely go away, thus the PID won't get reassigned. Systemd and the kernel could simulate that easily -- opendir(/proc/PID). The problem is that adding even more overhead doesn't help when you try to catch a fork bomb. My SIGSTOP idea, however, just might. I'll have to test that idea further.
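A rough Python sketch of the /proc/PID idea, assuming Linux with /proc mounted. The names `open_handle` and `is_alive` are invented for this example. It only demonstrates that lookups through the held directory descriptor start failing once the process has been reaped, so the holder is not silently redirected to an unrelated process; it does not test whether the PID number itself stays reserved.

```python
import os


def open_handle(pid):
    """Open /proc/<pid> as a directory descriptor and return the fd."""
    return os.open(f"/proc/{pid}", os.O_RDONLY | os.O_DIRECTORY)


def is_alive(handle):
    """Return True if the process behind the handle still has a /proc entry.

    The lookup goes through the directory fd, not through the PID number,
    so it keeps referring to the process that was originally opened.
    """
    try:
        fd = os.open("stat", os.O_RDONLY, dir_fd=handle)
    except OSError:
        # The entry is gone: the process has exited and been reaped.
        return False
    try:
        return len(os.read(fd, 1)) > 0
    except OSError:
        # The process went away between the open and the read.
        return False
    finally:
        os.close(fd)
```

A caller would take the handle right after starting the process, check `is_alive(handle)` before acting on it, and `os.close(handle)` when done. A zombie that has not been reaped yet still counts as alive here.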

Systemd vs. Docker

Posted Mar 7, 2016 1:41 UTC (Mon) by cg909 (guest, #95647)

Using process groups would work for simple daemons, but not for services like sshd.

The main problem is that process groups always belong to a session. So every service that spawns user sessions would also need to break out of the process group.

If you use seccomp filters to ignore setpgrp() and setsid(), sshd would fail in spectacular ways, as all processes in the process group would share the same controlling terminal and so every process spawned by sshd would receive SIGHUP when a session is closed. Also, anything spawning a shell might run into problems, as shells use process groups to separate tasks.

You'd need "super process groups" which may span multiple sessions and contain multiple process groups.

And this is what cgroups provide.
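As a sketch of what killing by cgroup looks like from user space, the Python below reads a cgroup's `cgroup.procs` file (one PID per line) and signals each entry, re-reading until the list is empty. The function names, the `send` parameter and the retry limit are assumptions made for this example, and this is not how systemd itself is implemented. Note that the loop still has the window discussed earlier in the thread: a PID can exit and be reused between the read and the kill.

```python
import os
import signal
import time


def read_pids(cgroup_dir):
    """Return the PIDs currently listed in <cgroup_dir>/cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return [int(line) for line in f if line.strip()]


def kill_cgroup(cgroup_dir, send=os.kill, max_rounds=100):
    """Signal every listed PID, re-reading until the cgroup is empty.

    Returns the number of kill rounds that were needed. `send` is
    injectable so the loop can be exercised without a real cgroup.
    """
    for rounds in range(max_rounds):
        pids = read_pids(cgroup_dir)
        if not pids:
            return rounds
        for pid in pids:
            try:
                send(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # already gone between the read and the kill
        time.sleep(0.01)  # give the kernel a moment to tear processes down
    raise RuntimeError("cgroup still has members after max_rounds")
```

Membership follows processes across sessions and process groups, so a daemon calling setsid() does not drop out of the list.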

Systemd vs. Docker

Posted Jun 14, 2016 17:32 UTC (Tue) by davidlee (guest, #109327)

I agree with the sentiment that it is NOT Docker's fault that errant or poorly designed applications are being run inside of a container.

The "solution" I am using is called supervisord. If I need something controlled from inside the Docker container, I do it myself. I note that some of the previous comments were derisive about the kinds of init scripts folks like me might come up with. So what? I don't need their permission, nor do I need their acceptance. With four decades of script writing, I think I can write one that will do the job.

Yes, I could use systemd for those applications that install scripts it will use. I like the three-line example for Apache. But it was quite unique in that Apache installs SO MUCH STUFF that the example works. Perhaps Nagios might also work. Or Splunk. Or a myriad of other major applications which are mature enough to do so.

One of my recent docker containers was an interface with Dropbox. Nope, no three-liner there. It required too much to configure and set up. Actually, virtually every Docker container I have designed has required setup and configuration -- and, thank you very much, a carefully crafted startup script.

If I have a Docker container which needs to manage internal processes, I'll stick with solutions like supervisord (there are more options, but this is the one I have settled on). I'd rather use that in the few Docker containers where I need it than have the weight of systemd in every single Docker container I generate. Imagine a busybox container with systemd running...