>> You don't have to use cgroups; you do have to support the features systemd currently uses cgroups to support, such as supporting daemons that fork or double-fork or have child processes, while still respawning daemons that exit or crash.
> Why does the init system have to do this job?
It's in the best position to do this, as it's what starts the daemons and has a parent/child relationship with them. Process respawning is also available in SysV init as configured by /etc/inittab, but that feature is incomplete.
> If a daemon exits or crashes, is it really the right thing to just start a new copy? what damage did the failed instance do that may need to be cleaned up?
I would think that in the general case the answer is very much YES. If you aren't worried about service outages I suppose you could make failure permanent, maybe wrap the daemon startup in a script which disables the service after the daemon exits. Seems kind of silly though.
> If a daemon has a complex set of dependencies (or deliberately does something like a double-fork to distance itself),
Double forking is more of an implementation detail needed by other init systems; you don't need or want to double fork with systemd, although things will still work if you do.
> is it really enough to only consider it 'failed' if they all exit?
I'm not sure what happens if you challenge the init system in this way; what's the Right Thing(tm) to do in this case? If the processes maintain a parent/child relationship then you can rely on signals; otherwise, if the processes take input, you can manage their sockets to detect failures.
> or should there be something more application aware (i.e. a daemon specific watchdog) that does this?
I suppose you could do that too, to handle cases where a process is wedged and needs to be restarted, but you solve 99% of the problem with just basic process monitoring using pretty standard techniques.
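As a rough illustration of the "basic process monitoring" mentioned above, here is a minimal Python sketch of a parent that waits on its direct child and respawns it on exit. It is not how systemd or SysV init is implemented; the `supervise` function and its `max_restarts` and `delay` parameters are hypothetical names chosen for the example.

```python
import subprocess
import sys
import time


def supervise(argv, max_restarts=3, delay=0.1):
    """Run argv as a direct child and respawn it each time it exits.

    Returns the exit codes seen. max_restarts and delay are illustrative
    knobs, not settings taken from any existing init system.
    """
    exit_codes = []
    for _ in range(max_restarts + 1):
        child = subprocess.Popen(argv)
        # wait() works only because the supervisor is the direct parent;
        # a daemon that double-forks escapes this kind of tracking.
        exit_codes.append(child.wait())
        time.sleep(delay)  # crude throttle so a crash loop cannot spin
    return exit_codes


if __name__ == "__main__":
    # Hypothetical "daemon" that fails immediately with status 3.
    print(supervise([sys.executable, "-c", "import sys; sys.exit(3)"], max_restarts=2))
```

The sketch only notices exits; a process that is still running but wedged would look healthy to it, which is the gap the watchdog question is about.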
Posted Nov 14, 2012 22:46 UTC (Wed) by dlang (✭ supporter ✭, #313)
> I suppose you could do that too, to handle cases where a process is wedged and needs to be restarted, but you solve 99% of the problem with just basic process monitoring using pretty standard techniques.
heh, I think people would argue that the sysV init solved 99% of the problem with even simpler tools.
I think that the simple case is handled well by SysV init, the really complex cases are not handled well by anything short of application specific watchdogs/monitoring/cleanup tools, so the question is if there is enough of a gap between these two to make the cgroups requirement (and overhead) worthwhile.
I've gotten to the point that everything I build is now either a cluster, or at least potentially part of a cluster, so I look at anything with the question in mind of 'how would I do this if it was part of a cluster', and this mindset really makes me question the value of trying to have the init system be more clever in addressing this problem.
One of the most common problems you run into when dealing with clusters isn't "the application exited or crashed", it's "The application is running, but wedged". I just don't see many cases where the cgroups approach to managing the app would have prevented problems (although, I freely admit that it has the 'cool' factor).
Crowding out OpenBSD
Posted Nov 14, 2012 23:44 UTC (Wed) by bronson (subscriber, #4806)
For every cluster deployed, there must be hundreds or thousands of single hosts deployed to serve up wordpress or static content. Simple stuff written by a weekender, maybe with slashdot effects handled by CloudFlare. I think these guys would find systemd's respawning awesome. Much nicer than configuring and tracking down memory leaks in God, that's for sure! http://godrb.com/
Yes, clusters need more development effort and more powerful tools, but don't let that get in the way of the 80% solution.
Crowding out OpenBSD
Posted Nov 15, 2012 1:13 UTC (Thu) by wahern (subscriber, #37304)
We've already had respawning with inetd. The fact that few used this almost universal facility should make one wonder whether it's really a feature developers don't enjoy reimplementing themselves.
That might seem illogical at first, until you consider how a developer's environment is set up. When developing a daemon you absolutely _hate_ centralized process management, because it tends to get in the way of debugging, among other things. Instead, you basically have to implement some sort of simple process management yourself, regardless of how your production environment will look. (If you're a web developer for a large website, imagine having to develop, debug, test using your process to upload and install in production... it'd be a nightmare.)
I usually start my development on OS X, and then port to Linux and the BSDs. Not once have I considered using launchd.
Crowding out OpenBSD
Posted Nov 15, 2012 1:45 UTC (Thu) by josh (subscriber, #17465)
inetd tied process spawning to sockets. systemd doesn't; you can independently choose whether to use socket activation and whether to use respawning.
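To make the independence concrete: under systemd's socket activation protocol the daemon learns about inherited sockets from the `LISTEN_PID` and `LISTEN_FDS` environment variables, with descriptors starting at 3, while respawning is configured separately in the unit file (the `Restart=` setting). The Python sketch below shows a daemon that works either way. The helper names and the fallback port 8080 are assumptions made for the example.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in systemd's protocol


def inherited_listen_fds(environ=None, pid=None):
    """Return the descriptors passed in by socket activation, if any."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # the variables were meant for some other process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed: not socket-activated
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def get_listener(port=8080):
    """Use an activated socket when present, else bind one ourselves."""
    fds = inherited_listen_fds()
    if fds:
        return socket.socket(fileno=fds[0])  # family/type detected from the fd
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", port))  # hypothetical fallback address
    listener.listen()
    return listener
```

Nothing in this code depends on whether the service manager restarts the process after a crash, which is the point being made about inetd versus systemd.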
Crowding out OpenBSD
Posted Nov 15, 2012 1:56 UTC (Thu) by dlang (✭ supporter ✭, #313)
inetd tied process spawning to sockets, but there was also inittab for daemons that you wanted to have restarted if they exited or crashed.
Crowding out OpenBSD
Posted Nov 15, 2012 3:33 UTC (Thu) by josh (subscriber, #17465)
True, and it seems quite unfortunate to me that distributions stopped using inittab for its intended purpose and started using it to launch scripts instead.
Crowding out OpenBSD
Posted Nov 15, 2012 14:47 UTC (Thu) by khim (subscriber, #9252)
inittab worked as intended a couple of decades ago. But when daemons started spawning processes left and right and started depending on other processes, it became useless.
You can view systemd's cgroup-using core as "inittab done right".
Crowding out OpenBSD
Posted Nov 15, 2012 15:34 UTC (Thu) by raven667 (subscriber, #5198)
Exactly. I see systemd as a return to unix roots, with features that init would have had if it had continued innovating instead of stagnating and standardizing, not some weird departure against the overall style of the system.
Crowding out OpenBSD
Posted Nov 15, 2012 22:04 UTC (Thu) by dlang (✭ supporter ✭, #313)
parts of systemd could be called implementing init properly, but other parts are far more questionable.
why does device enumeration belong as part of init (udev)?
why does logging belong as part of init (the journal)?
Crowding out OpenBSD
Posted Nov 15, 2012 22:44 UTC (Thu) by raven667 (subscriber, #5198)
> why does device enumeration belong as part of init (udev)?
It isn't part of PID 1, but hardware initialization is a dependency for services being started. The service manager ends up needing to know about hardware state for dependency purposes and therefore shares a lot of infrastructure with udev, which is why they are distributed together now.
> why does logging belong as part of init (the journal)?
This is also not part of PID 1. Whether system logging should be tackled as part of the systemd project is debatable. It is still an optional component though and might solve some real problems.
Crowding out OpenBSD
Posted Nov 15, 2012 22:50 UTC (Thu) by raven667 (subscriber, #5198)
I thought of one more thing as well: the init system is in a prime position for logging, both because it is connected to the stdout/stderr of the processes that it spawns, where they might dump data, and because it has its own needs for logging but by definition is started before any userspace logging service could be running. So there are at least a couple of cases that a PID 1 and related tools should identify and solve.
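The stdout/stderr point can be sketched in a few lines of Python: whoever spawns a process decides where its output streams go, so the spawner can capture and tag whatever the daemon prints. This is only an illustration of that position, not the journal's actual mechanism; `run_and_capture` and the record format are invented for the example.

```python
import subprocess
import sys


def run_and_capture(name, argv):
    """Spawn argv and return its output lines tagged with a service name."""
    child = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same stream
        text=True,
    )
    # Reading until EOF collects everything the child wrote before exiting.
    records = [f"{name}[{child.pid}]: {line.rstrip()}" for line in child.stdout]
    child.wait()
    return records


if __name__ == "__main__":
    script = "import sys; print('started'); print('warning', file=sys.stderr)"
    for record in run_and_capture("demo", [sys.executable, "-c", script]):
        print(record)
```

The relative order of the two lines is not guaranteed, since the child buffers stdout and stderr differently.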
Crowding out OpenBSD
Posted Nov 15, 2012 14:45 UTC (Thu) by khim (subscriber, #9252)
> I've gotten to the point that everything I build is now either a cluster, or at least potentially part of a cluster, so I look at anything with the question in mind of 'how would I do this if it was part of a cluster', and this mindset really makes me question the value of trying to have the init system be more clever in addressing this problem.
Funny that you are trying to use a cluster-aware approach to oppose cgroups. You do remember that cgroups were invented by Google engineers to manage clusters, right? SystemD just applies the same logic to local system management.
> One of the most common problems you run into when dealing with clusters isn't "the application exited or crashed", it's "The application is running, but wedged" I just don't see many cases where the cgroups approach to managing the app would have prevented problems (although, I freely admit that it has the 'cool' factor)
That's strange, because I see tons of places where cgroups solve cluster-management problems. What happens with your cluster if some unimportant task starts to eat memory or CPU endlessly? With cgroups the answer is obvious: it's bound by the cgroup's limit. What happens if some task must be killed and restarted because your node is overloaded? Cgroups make sure you can kill the task reliably. Sure, that means that your tasks must be written to be killable, but that makes sense for clusters anyway, since a failed PSU can do the same thing suddenly and without warning.
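For readers who have not used the cgroup filesystem directly, the two operations described above (bounding a task and killing it reliably) come down to reading and writing files in the group's directory. The Python sketch below assumes a cgroup v2 layout, where the memory cap lives in `memory.max` (the v1 memory controller used `memory.limit_in_bytes`) and membership is listed in `cgroup.procs`. The function names and the injectable `kill` parameter are assumptions made for the example, and real use needs suitable privileges on a directory under the cgroup mount point.

```python
import os
import signal


def cgroup_pids(cgroup_dir):
    """List every PID the kernel currently attributes to this cgroup."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return [int(line) for line in f if line.strip()]


def set_memory_limit(cgroup_dir, limit_bytes):
    """Cap the group's memory use (cgroup v2 file name)."""
    with open(os.path.join(cgroup_dir, "memory.max"), "w") as f:
        f.write(str(limit_bytes))


def kill_cgroup(cgroup_dir, sig=signal.SIGKILL, kill=os.kill):
    """Signal every member, re-reading until no unsignalled PID remains."""
    signalled = []
    while True:
        pending = [p for p in cgroup_pids(cgroup_dir) if p not in signalled]
        if not pending:
            return signalled
        for pid in pending:
            try:
                kill(pid, sig)
            except ProcessLookupError:
                pass  # the process exited between the read and the signal
            signalled.append(pid)
```

Because membership is tracked by the kernel rather than by PID files or process names, children that forked or double-forked are still found, which is the reliability being claimed.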
Crowding out OpenBSD
Posted Nov 15, 2012 20:53 UTC (Thu) by dlang (✭ supporter ✭, #313)
I didn't say that cgroups did not have a place or a reason to use them.
I just said that requiring cgroups to properly start and stop a process is using the wrong tool for the job.
Crowding out OpenBSD
Posted Nov 15, 2012 16:34 UTC (Thu) by raven667 (subscriber, #5198)
I wouldn't call cgroups "cool" but I would say that it's a better solution than what's offered in /etc/init.d/functions and ancillary tools like /sbin/pidof. In fact I recently had an outage due to a SysV script that on restart managed to killall rsyslogd but then only start the one instance, causing an outage for the other instance on the same host. That kind of unnecessary outage is depressingly easy to have; we can do better.
Crowding out OpenBSD
Posted Nov 22, 2012 14:53 UTC (Thu) by TRauMa (guest, #16483)
> people would argue that the sysV init solved 99% of the problem
In 15+ years of using Linux I never came across a sysv init based system that could reliably restart mysql under all typical production circumstances. Now mysql used (?) to be a close to worst case behaving daemon, but it was also very popular, so if no distributor could handle it correctly with the tools given (and the init scripts tended to be very, very long and complex and hard to debug), how can one argue that it's a 99% solution?
Crowding out OpenBSD
Posted Nov 22, 2012 19:21 UTC (Thu) by hummassa (subscriber, #307)
does systemd (and upstart) actually solve this problem?
Crowding out OpenBSD
Posted Nov 22, 2012 19:36 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)
Yep. SystemD can reliably track mysql's state and kill it. Mysql recovery should then take care of unclean shutdowns.
Crowding out OpenBSD
Posted Nov 22, 2012 21:37 UTC (Thu) by paulj (subscriber, #341)
How does it reliably track mysql's state btw? What would define a bad state requiring restart?
IME just because a process is running doesn't mean it's actually in a good state. You need some kind of heartbeat protocol. Does systemd implement something like that?
Crowding out OpenBSD
Posted Nov 22, 2012 22:45 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)
No, the common problem with mysql is that it doesn't terminate cleanly. Its shutdown scripts can hang for minutes and _still_ leave hanging processes behind. These hanging processes later interfere with the startup.
SystemD nicely solves this problem. Hooking it up with an external heartbeat monitor should also be quite easy.
Crowding out OpenBSD
Posted Nov 22, 2012 23:25 UTC (Thu) by cortana (subscriber, #24596)
systemd can be configured to require that a service regularly notifies that it is still up, or else it'll be killed & restarted. See http://0pointer.de/blog/projects/watchdog.html for the details.
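On the service side, that keep-alive is a small datagram sent to the socket named in the `NOTIFY_SOCKET` environment variable, and the configured timeout is exposed to the service as `WATCHDOG_USEC`. The Python sketch below shows the shape of it without linking against libsystemd; the helper names are invented for the example, and pinging at half the timeout follows the usual recommendation rather than anything stated in this thread.

```python
import os
import socket


def sd_notify(message, environ=None):
    """Send one notification datagram to the service manager.

    Returns False when NOTIFY_SOCKET is unset, i.e. nobody asked for
    notifications.
    """
    environ = os.environ if environ is None else environ
    address = environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        address = "\0" + address[1:]  # abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), address)
    return True


def watchdog_interval(environ=None):
    """Seconds between keep-alives: half of WATCHDOG_USEC, or None if unset."""
    environ = os.environ if environ is None else environ
    usec = environ.get("WATCHDOG_USEC")
    return int(usec) / 2_000_000 if usec else None
```

A daemon's main loop would call `sd_notify("WATCHDOG=1")` every `watchdog_interval()` seconds, and only after checking that it is actually making progress, so that a wedged process stops pinging and gets restarted.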