From anti-systemd to pro-systemd in the shortest time

Posted Feb 1, 2014 23:11 UTC (Sat) by cdmiller (guest, #2813)
In reply to: From anti-systemd to pro-systemd in the shortest time by anselm
Parent article: This week in "As the Technical Committee Turns"

Didn't realize a sysv init script was supposed to restart daemons when they crash...

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 0:15 UTC (Sun) by anselm (subscriber, #2796) [Link] (11 responses)

This is the 21st century. The competition does it as a matter of course.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 2, 2014 22:33 UTC (Sun) by dlang (guest, #313) [Link] (10 responses)

actually, this is the 21st century, most places restart the system (i.e. VM), not the process if something crashes.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 9:45 UTC (Mon) by HelloWorld (guest, #56129) [Link]

Yes, and they do that not because there's any sane reason to but because it didn't really work before systemd.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 10:22 UTC (Mon) by anselm (subscriber, #2796) [Link] (8 responses)

And why would that be? Not conceivably because the way server processes are traditionally started offers us no reasonable way to restart just the service in question? Not without lots of additional band aid and baling wire, anyway? (Which suddenly makes just restarting the machine look reasonable by comparison.)

We're now in the fortunate position of being able to solve this problem (among others) without resorting to silly and desperate solutions. It is funny how people are clutching at straws to defend software that was pretty useful when it was new in the 1980s but no longer represents the state of the art.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 11:08 UTC (Mon) by dlang (guest, #313) [Link] (7 responses)

if you really believe that systemd shutting down a service can resolve all possible issues and that the system is guaranteed to be in the same clean state that a rebooted (or ideally, re-created) system would be in, then you are dreaming.

yes, it may be better than without systemd, but that's a long way from being something that's a known state.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 11:44 UTC (Mon) by anselm (subscriber, #2796) [Link] (6 responses)

If you really believe that rebooting a VM magically resolves all possible issues and that the system is guaranteed to be up and running afterwards, then you are dreaming.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 14:35 UTC (Mon) by raven667 (subscriber, #5198) [Link] (5 responses)

Oh come on now, what he said is true, big "cloud service" shops don't do troubleshooting by hand when there is a failure, they just kill the VM and rebuild it from scratch through their configuration management system. Netflix has written papers about this. What might be worth arguing about is whether systemd can replace some home-grown utilities keeping services alive and whether the overall reliability is improved so that you are rebuilding fewer VMs.
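
For illustration, a "home-grown utility keeping services alive" of the kind mentioned above might look roughly like the following. This is only a minimal sketch in Python under stated assumptions: the `keep_alive` function, its `max_restarts` and `delay` parameters, and the always-crashing example command are all hypothetical and not taken from any real tool.

```python
import subprocess
import sys
import time


def keep_alive(cmd, max_restarts=3, delay=0.0):
    """Run cmd, restarting it whenever it exits with a non-zero status.

    Hypothetical helper for illustration only.
    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    while True:
        # Blocks until the child exits; returncode is negative if a signal killed it.
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        if code == 0:
            break  # clean exit: do not restart
        if len(exit_codes) > max_restarts:
            break  # give up after the initial run plus max_restarts restarts
        time.sleep(delay)
    return exit_codes


if __name__ == "__main__":
    # Hypothetical "daemon" that always crashes with exit status 1.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(keep_alive(crashing, max_restarts=2))
```

Note that a wrapper like this only watches its direct child. A daemon that forks into the background escapes it, which is the process-tracking problem discussed elsewhere in this thread.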

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 15:25 UTC (Mon) by anselm (subscriber, #2796) [Link] (1 responses)

Yes, and because Netflix doesn't do it it must be a stupid idea in general.

For one, even the big cloud services will prefer not having to rebuild a VM if restarting a process will also work (better than it does based on SysV init). For example, even a big cloud service will eventually want to figure out why a service keeps crashing, and obliterating the evidence by rebuilding the VM in question is not exactly helpful in that case.

For another, not every outfit in the world is a big cloud service and has the sort of tooling that lets them rebuild a VM automatically at the snap of a finger. These others often don't have the sort of tooling that will reliably track a service started by SysV init for them, either, and therefore stand to gain from a system that will do it for them out of the box. If anything, the big cloud services tend to have people on staff to sort out things like transparent service restart for them even if that means coming up with a way of rebuilding VMs at short notice, because this is basically how big cloud services make their money, while the IT people at other outfits are often forced to spend their time on other more immediately important issues. We can argue that everybody should think and work like a big cloud service (most of the required pieces are conveniently available, after all) but this is not usually how things pan out in actual practice.

Finally, service restart vs. VM rebuilding notwithstanding, even big cloud-based outfits seem to like systemd. Spotify, for example, has officially come out in support of making systemd the default init system on Debian.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 15:41 UTC (Mon) by raven667 (subscriber, #5198) [Link]

> Yes, and because Netflix doesn't do it it must be a stupid idea in general.

Dude, you are barking up the wrong tree, relax.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 21:35 UTC (Mon) by khim (subscriber, #9252) [Link] (2 responses)

> Oh come on now, what he said is true, big "cloud service" shops don't do troubleshooting by hand when there is a failure, they just kill the VM and rebuild it from scratch through their configuration management system.

This is a chicken-and-egg problem: since for a long time it was impossible to reliably separate services without a VM, these “big "cloud service" shops” went with VMs. But the guys who thought that this approach was just stupid developed another solution. This solution (today we know it under the name cgroups) is used today by systemd to manage daemons and services.

Let me repeat once more: the underlying functionality which systemd uses today to manage processes was created specifically for the needs of “large cloud services”. Now, it's entirely possible that some use cases are covered poorly by this approach (hey, guys from Google developed it for Google, and different clouds are… well, different) but then it just means that it must be fixed.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 22:00 UTC (Mon) by raven667 (subscriber, #5198) [Link] (1 responses)

Yes, and my very next sentence was agreeing with you on this point. I was just pointing out that what is currently done for automatic recovery is re-provisioning, which is the point the earlier poster seemed to be arguing.

> What might be worth arguing about is whether systemd can replace some home-grown utilities keeping services alive and whether the overall reliability is improved so that you are rebuilding fewer VMs.

From anti-systemd to pro-systemd in the shortest time

Posted Feb 3, 2014 23:04 UTC (Mon) by dlang (guest, #313) [Link]

and I'll say that reprovisioning from scratch is going to be more reliable than containers or cgroups. you don't know what files were corrupted by the misbehaving software.

It's also possible to leave other problems around (shared memory is one thing that I can think of) or have a process issue commands that put the kernel into a different state than what you want.

If you want to be sure, you reprovision.

Since you need to be able to provision from scratch rapidly anyway to deal with bursts of load (in the cloud environment everyone is pushing for), using that mechanism to deal with crashes increases your reliability.