Not a "bloated, monolithic system"?

Posted Jan 29, 2019 21:47 UTC (Tue) by zblaxell (subscriber, #26385)
In reply to: Not a "bloated, monolithic system"? by mathstuf
Parent article: Systemd as tragedy

> The ability to run a service every 15 minutes where that is the time between the *end* of the previous run and the start of the next [...] I think it would be very hard to have an external cron do that as well as systemd's pid1 does it being part of pid1.

I think it's pretty easy to do that sort of thing without being part of pid1. Replicating and improving cron (or replacing it outright with a different design) is a coding exercise for students. The result can integrate with cron or systemd. I did mine decades ago and still use it today (the need for a tool implementing that scheduling behavior, as you point out, is obvious).

The problem with the question "what functional elements should belong to pid1?" is that with the right subset of supporting evidence, the answers "all of them" and "none of them" can both be as valid as any point in between those extremes. systemd delegates a number of its functions to external processes and reserves a number of functions to itself more or less at random--sometimes there's an identifiable historical practice or a functional requirement, other times features just seem to get added to some random existing binary in the systemd package (e.g. systemd-logind or pid 1). Many agree the specific arrangement systemd currently uses is somehow wrong (i.e. pid1 is "bloated"), but many disagree on how best to change it.

Not a "bloated, monolithic system"?

Posted Jan 29, 2019 22:17 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (31 responses)

> I think it's pretty easy to do that sort of thing without being part of pid1.

The problem I forsee is proper process tracking. Is this tool going to hook into systemd's cgroups layout? Is it going to spawn processes directly? Then it has the same job as pid1 for proper process management. Is it going to call `systemctl start` and `systemctl stop` on a schedule? If so, how much code are you really saving? Is it going to work with other init systems? That's a lot of code since systemd does lots of heavy lifting for this tool that would need reimplemented for sysvinit. IMO, it belongs with wherever service management is done for both code savings and being more robust. And having read upstream's arguments and those arguing for it, moving process management out of pid1 into a hypothetical pid2 manager doesn't sound like a great idea.

Not a "bloated, monolithic system"?

Posted Jan 30, 2019 0:05 UTC (Wed) by zblaxell (subscriber, #26385) [Link] (30 responses)

You are thinking of this hypothetically. I have concrete experience.

> Is this tool going to hook into systemd's cgroups layout?

The scheduler tool doesn't know what a cgroup is. A process-launcher tool invoked by the scheduler tool might, if that was relevant for the service.

The cgroups layout we use is older than systemd's, and we usually want to run it instead of the systemd one; however, we have systems running both.

> Is it going to spawn processes directly?

Sure, why wouldn't it? A tool that spawns processes on a schedule will spawn processes, and to minimize dependencies it will spawn them directly.

> Then it has the same job as pid1 for proper process management.

In the sense that it's mostly a while loop calling fork, a bunch of child process setup calls, exec, and waitid, with some interface for communicating state with other parts of the system, then yes. In the sense that it's replicating systemd's pid 1, then no.

> Is it going to call `systemctl start` and `systemctl stop` on a schedule?

If the jobs it schedules and runs invoke 'systemctl start' and 'systemctl stop', then yes; otherwise, no.

> If so, how much code are you really saving?

The simplest versions of these tools weigh in at a few hundred bytes (plus the system shell that's running anyway). More complicated ones run a few kilobytes. If the only part of systemd we needed was the timer feature, then we save the entire code of systemd.

The scheduler tool will also function as an 'init', though you'd have to "schedule" the execution of every system service in that case--and because it's a single-purpose tool, you'd need something to watch network sockets, mount filesystems, respond to device connections, etc. On embedded systems and VM containers, though, sometimes the scheduler tool is all you need to make one or two services happen over and over.

> That's a lot of code since systemd does lots of heavy lifting for this tool that would need reimplemented for sysvinit

I'm not sure what you're getting at. There's not a lot of code, most of it serves the tool's primary function, and there's no systemd or sysvinit dependency (other than they have to start the tool, or the tool itself has to be pid 1 to start itself). What heavy lifting do you think systemd could be doing?

> IMO, it belongs with wherever service management is done for both code savings and being more robust.

Isolation between unrelated subsystems improves robustness, and at under 1K, there is not much code to be saved. I would agree with you if you had mentioned the human-factors benefit of consistency in the management interface, or some of the non-timer capabilities of systemd. Configuring things in multiple redundant places can suck.

We mostly use the scheduler tool to give ourselves a consistent interface to systems that do and do not run systemd, and some of the helper tools to replicate specific systemd features in legacy systems that can't be (easily) upgraded.

> And having read upstream's arguments and those arguing for it, moving process management out of pid1 into a hypothetical pid2 manager doesn't sound like a great idea.

At this point I agree further churn in systemd is bad. It took years to get to where it is now.

Since systemd started there have been kernel hooks added to support pid2 process managers (because systemd ended up needing them, apparently), so it's possible to make a more complete service manager implementation in pid2 now than it once was. A new project starting today would have no need to do anything in pid 1 except spawn pid 2 in a loop (or maybe fallback to an earlier version if an upgrade goes badly). systemd is not a new project starting today--it is code that has had the benefit of years of debugging, and most people don't want anyone to mess with it now.

If you're asking "should I use systemd?", the answer is not "yes because timers are awesome."

Not a "bloated, monolithic system"?

Posted Jan 30, 2019 6:13 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (28 responses)

> The scheduler tool doesn't know what a cgroup is. A process-launcher tool invoked by the scheduler tool might, if that was relevant for the service.
How do you know that all processes started by the task have finished?

Long ago I had an issue - our CI server (Jenkins) was spawning a task that launched a background daemon listening on a specific port. The problem was that this daemon sometimes hanged on shutdown, ignoring anything short of targeted SIGKILL but still having the port open. So our tests periodically failed because of that. The fix way back then was to "lsof | xexec kill" at the start of the test.

Systemd solves this issue cleanly.

Not a "bloated, monolithic system"?

Posted Jan 30, 2019 15:48 UTC (Wed) by imMute (guest, #96323) [Link]

>Systemd solves this issue cleanly.

Technically, cgroups solves that issue cleanly. Systemd is just one [easy] way to use cgroups, but there are other inits that utilize cgroups.

Not a "bloated, monolithic system"?

Posted Jan 30, 2019 17:27 UTC (Wed) by zblaxell (subscriber, #26385) [Link] (24 responses)

> How do you know that all processes started by the task have finished?

Watch the cgroup tasks or events files (or use notification_agent if you're old-school).

> The fix way back then was to "lsof | xexec kill" at the start of the test. Systemd solves this issue cleanly.

No, cgroups solve that issue cleanly. systemd's cgroup controller is showing its age and doesn't solve the issue in some cases.

Systemd's cgroup killer plays whack-a-mole with processes in a tight loop, instead of freezing the cgroup then picking off processes at leisure without looping. It looks like it's theoretically possible to create a fork bomb that systemd can't kill. systemd hasn't been updated to take advantage of new cgroup features ("new" meaning "somewhere around kernel 3.17, between 2014 and 2015, when we removed assorted workarounds from our cgroup controllers").

The concrete bug that we hit in production is that systemd assumes processes respond instantly to SIGKILL, and can be ignored after the signal is sent. On Linux, SIGKILL can take nontrivial amounts of time to process, especially if processes have big temporary files or use lots of RAM (e.g. databases). If a service restarts, and there was a timeout during the stop, systemd can spawn a new service instance immediately, which will try to allocate its own large memory before its predecessor has released it, OOMing the machine to death. There's a policy problem here: ignoring a slowly dying process somewhere in the cgroup is correct behavior for the shutdown use case (since the machine is being turned off anyway, nobody cares if the service is still stuck in the kernel when that happens), but the policy is incorrect for service restart.

Various workarounds are possible (e.g. use pid files to find the previous service instance and wait until it finishes being killed, check the service cgroup at startup to see if it's empty then wait if it's not, or limit resources available to the application to mitigate the specific OOM risk), but the cleanest (and often easiest, since we already have to do it to support non-systemd machines anyway) solution is to just implement a micro service manager that doesn't have the systemd problems in the first place, then configure systemd (or whatever the machine is using) to forward start/stop requests to it.

Not a "bloated, monolithic system"?

Posted Jan 30, 2019 22:21 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (23 responses)

> No, cgroups solve that issue cleanly. systemd's cgroup controller is showing its age and doesn't solve the issue in some cases.
Well, systemd is not set in stone. What needs to be changed to fix it?

> Systemd's cgroup killer plays whack-a-mole with processes in a tight loop, instead of freezing the cgroup then picking off processes at leisure without looping.
PID controller might be even better suited for this, but in practice systemd is much faster at killing processes than the kernel is at forking.

> The concrete bug that we hit in production is that systemd assumes processes respond instantly to SIGKILL, and can be ignored after the signal is sent. On Linux, SIGKILL can take nontrivial amounts of time to process, especially if processes have big temporary files or use lots of RAM (e.g. databases). If a service restarts, and there was a timeout during the stop, systemd can spawn a new service instance immediately, which will try to allocate its own large memory before its predecessor has released it, OOMing the machine to death.
This doesn't sound right. systemd will wait until the cgroup is empty, by which time all the resources should be freed.

And I think you can increase the timeout time and/or disable it completely in this case.

Not a "bloated, monolithic system"?

Posted Jan 31, 2019 21:08 UTC (Thu) by zblaxell (subscriber, #26385) [Link] (22 responses)

> Well, systemd is not set in stone. What needs to be changed to fix it?

systemd is not written in stone, but it is unusually difficult to analyze, modify and deploy. Any day that starts with "understand how hundreds or thousands of units interact at runtime with a dynamic dependency resolver to produce some result" usually ends with "I need to do these 20 things, let's write a 25-line program that does those 20 things in the right order and run that program instead of systemd." I can never figure out how to turn that sort of day into a systemd pull request.

> in practice systemd is much faster at killing processes than the kernel is at forking.

That may be true, assuming no scheduler shenanigans; however, the loop in systemd that kills processes terminates when the cgroup contains no processes _that systemd has not already killed once_, so it's vulnerable to pid reuse attacks. If a forkbomb manages to get a reused pid during the kill loop, systemd will not attempt to kill it.

> This doesn't sound right. systemd will wait until the cgroup is empty, by which time all the resources should be freed.

It writes "Processes still around after SIGKILL. Ignoring." on the log and ignores the process(es). It might send SIGKILL to the same process again when the service restarts, but additional SIGKILLs aren't helpful.

There's no single right answer here, and no documented systemd configuration option I can find to address this case. The documentation talks a lot about waiting for various subsets of processes, but the code revolves around actions taken during state transitions for systemd services. These are not the same thing.

If the killed process is truly stuck, e.g. due to a kernel bug, then it will never exit. Processes that can't ever exit pollute systemd's service dependency model (e.g. you can't get to target states that want that service to be dead). systemd solves that problem by ignoring processes whose behavior doesn't fit into its dependency model.

If the killed process isn't stuck, but just doing something in the kernel that takes a long time, then to fit systemd's state model, we need to continue to wait for the process after we sent it SIGKILL. We only need to wait for the process to exit if we're going to do something where the process exit matters (e.g. start the process again, or umount a filesystem the process was using), but systemd doesn't have a place for such optimizations in its data model.

It's probably possible to do this in systemd with a helper utility in ExecStartPre to block restarts until the service cgroup is completely empty except for itself, but that's no longer "clean", and you'd have to configure it separately for every service that might need it.

> And I think you can increase the timeout time and/or disable it completely in this case.

I can't find a post-KILL timeout in the documentation or code, and I've already spent more time looking for it than I typically spend implementing application-specific service managers.

Not a "bloated, monolithic system"?

Posted Jan 31, 2019 22:16 UTC (Thu) by pizza (subscriber, #46) [Link] (15 responses)

> systemd is not written in stone, but it is unusually difficult to analyze, modify and deploy. Any day that starts with "understand how hundreds or thousands of units interact at runtime with a dynamic dependency resolver to produce some result" usually ends with "I need to do these 20 things, let's write a 25-line program that does those 20 things in the right order and run that program instead of systemd." I can never figure out how to turn that sort of day into a systemd pull request.

If you don't get a deterministic dependency graph for the things you need to happen in a given order, then you haven't sufficiently specified your dependencies. No dependency resolver can read your mind.

Not a "bloated, monolithic system"?

Posted Feb 1, 2019 3:56 UTC (Fri) by zblaxell (subscriber, #26385) [Link] (14 responses)

> No dependency resolver can read your mind.

We don't want it to read our minds, or block devices, or assorted device buses, or any data source we didn't explicitly authorize it to consume. We want systems with predictable (or at least highly repeatable) behavior so they behave the same way in test and production.

Yes, we can translate our 25-line startup program into a simple dependency graph, lock down all the places where systemd could pick up extra nodes the graph, and dodge the parts of systemd that have builtin exceptions to the dependency evaluation rules. That's a lot of project risk, though, and sometimes significant cost, and--most important of all--nobody is paying us to do any of that work.

If we're doing safety or security audit on the system, the systemd dependency resolver (and any external code it executes) gets sucked into the audit since it's executing our dependency graph. If the choice comes down to "audit 25 lines of code" or "audit 25 lines of code and also systemd", well, one of those is cheaper.

Not a "bloated, monolithic system"?

Posted Feb 1, 2019 12:15 UTC (Fri) by pizza (subscriber, #46) [Link] (13 responses)

> If we're doing safety or security audit on the system, the systemd dependency resolver (and any external code it executes) gets sucked into the audit since it's executing our dependency graph. If the choice comes down to "audit 25 lines of code" or "audit 25 lines of code and also systemd", well, one of those is cheaper.

You're being disingenuous. That first statement should read "Audit 25 lines of code and also the shell interpreter [1] and also everything else the script invokes. [2]"

[1] ie bash dash or csh or busybox or perl or whatever it is that actually parses and executes that that script.
[2] Likely to include grep, psutils, and util-linux as well [3]
[3] Don't forget libreadline, glibc, libstdc++, and everything else the shell and those utilities depends on!

(When all of that "external code" is factored in, I suspect systemd will come out way, way ahead on the "least amount of total code that needs auditing" metric)

Not a "bloated, monolithic system"?

Posted Feb 1, 2019 14:12 UTC (Fri) by zblaxell (subscriber, #26385) [Link] (3 responses)

> Audit 25 lines of code and also the shell interpreter [1] and also everything else the script invokes. [2]

Already done for other projects, no need to do them again. Arguably if we ever did a systemd audit then we could reuse the result for multiple projects, but nobody wants to be the first.

> grep, psutils, and util-linux as well [3] Don't forget libreadline, ..., libstdc++,

We don't use 'em. It's mostly shell builtins, flock, maybe a couple of C wrappers for a couple of random kernel API calls. 'echo' and /sys/fs/cgroup are sufficient for a lot of use cases. Our scope ends once the application is execve()ed in a correctly configured environment and resumes when the application exits.

Not a "bloated, monolithic system"?

Posted Feb 1, 2019 15:18 UTC (Fri) by pizza (subscriber, #46) [Link] (2 responses)

> Already done for other projects, no need to do them again. Arguably if we ever did a systemd audit then we could reuse the result for multiple projects, but nobody wants to be the first.

What exactly do you mean when you say "audit"?

That could mean anything between "reviewing the license and patent overlap" to "line-by-line inspection/verification" of the sort that's needed to certify rocket avionics. Are you auditing algorithms and state machines to ensure all states and transitions are sane under any possible input? Are you auditing all input processing to make sure it can't result in buffer overflows or other sorts of security problems? Are your audits intended to ensure there's no leakage of personal information (eg in logs) that could run afoul of things like the GPDR?

Not a "bloated, monolithic system"?

Posted Feb 1, 2019 22:31 UTC (Fri) by zblaxell (subscriber, #26385) [Link] (1 responses)

> That could mean anything between "reviewing the license and patent overlap" to "line-by-line inspection/verification" of the sort that's needed to certify rocket avionics.

Closer to rocket avionics. Part of it is running the intended code under test and verifying that all executed lines behave correctly with a coverage analyzer. Sometimes you can restrict scope by locking down the input and verifying only the parts of the shell that execute, other times you swap in a simpler shell to interpret the service management code and audit 100% of the simpler shell.

> Are you auditing algorithms and state machines to ensure all states and transitions are sane under any possible input?

Attack the problem the other way: precompute an execution plan for the service dependency graph, ordered and annotated for parallel execution. The service manager just executes that at runtime. Conceptually, "shell script" is close enough to get the idea across to people, and correct enough to use in prototyping, but it might be a compiled or interpreted representation of the shell script by the time it gets certified. Add a digital signature somewhere to verify it before executing it.

Most of the time the execution plan can be computed and verified by humans: you need storage, then start networking and UI in parallel, then your application runs until it crashes, then reboot. Someone checks that UI and networking do not in fact depend on each other. In more complicated cases you'd want a tool to do the ordering task, so you provide the auditors with evidence your tool is suitable for the way you use it.

The execution plan does not look anything like sysvinit scripts. It does not have all the capabilities of systemd, since the whole point is to avoid having to audit the parts of systemd you didn't use. Only the required functionality ends up on the target system.

Normally these do not change at runtime except in tightly constrained ways (e.g. a template service can have a variable number of children). If for some reason you need a crazy-high amount of flexibility at runtime in a certified system, there is theoretically a point where the complexity curves cross over and it's cheaper to just sit down and audit systemd.

> Are you auditing all input processing to make sure it can't result in buffer overflows or other sorts of security problems?

If auditing something like a shell, you get a report that says things like "can accept input lines up to the size of the RAM in the system, but don't do that, that would be bad" and "don't let random users control the input of the shell, that would be bad".

So the problem for the service manager script reduces to proving that the shell, as used for the specific service manager input in the specific service manager environment, behaves correctly. This can be as little as some light code review and coverage testing (plus a copy of the shell audit report and a checklist of all the restrictions you observed).

> Are your audits intended to ensure there's no leakage of personal information (eg in logs) that could run afoul of things like the GPDR?

It's never come up. The safety-critical systems don't have any personal information in them, and the security-critical ones have their own logging requirements that are out of scope of the service manager.

In some cases you can mix safety-critical and non-safety-critical services on a system provided there is appropriate isolation between them. Then you audit the safety-critical service, the service manager, and their dependencies, and mostly ignore the rest.

Not a "bloated, monolithic system"?

Posted Feb 1, 2019 23:15 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> It's never come up. The safety-critical systems don't have any personal information in them, and the security-critical ones have their own logging requirements that are out of scope of the service manager.
I'm sorry. If your safety-critical systems have programs that can't be SIGKILL-ed cleanly and have several hundred of tightly-interconnected modules, then I want to run in the opposite direction from them.

Not a "bloated, monolithic system"?

Posted Feb 4, 2019 18:55 UTC (Mon) by jccleaver (guest, #127418) [Link] (7 responses)

> You're being disingenuous. That first statement should read "Audit 25 lines of code and also the shell interpreter [1] and also everything else the script invokes. [2]"

And you're not living in the real world. Your shell is already going to be audited, or an upstream audit is taking place if it matters.

Shell scripts can be complex. A 25 line shell script that just imperatively executes 20 commands and then exec's into a final daemon is not complex, and is in fact the simplest possible way to accomplish a deterministic goal on a *nix system.

It boggles my mind that there are administrators out there that would consider some other solution as simpler. Why are people so scared of procedural programming using a *nix shell?

Not a "bloated, monolithic system"?

Posted Feb 4, 2019 19:08 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

> Shell scripts can be complex. A 25 line shell script that just imperatively executes 20 commands and then exec's into a final daemon is not complex, and is in fact the simplest possible way to accomplish a deterministic goal on a *nix system.
Nope. How do you track the daemon state? Are you sure PID files are correct? How do you kill the daemon in case it's stuck? What if you want to allow unprivileged users to terminate the service? Do you need a separate SUID binary? ...

Writing a robust shell initscript is incredibly hard. It's not 25 lines of code, many initscripts are hundreds lines of code (and are still buggy as hell).

Not a "bloated, monolithic system"?

Posted Feb 4, 2019 19:20 UTC (Mon) by zblaxell (subscriber, #26385) [Link] (1 responses)

> How do you track the daemon state?

cgroups, watchdogs, service pings...

> Are you sure PID files are correct?

Don't use 'em, because cgroups.

> How do you kill the daemon in case it's stuck?

cgroups

> What if you want to allow unprivileged users to terminate the service?

We don't.

> Do you need a separate SUID binary?

No.

> Writing a robust shell initscript is incredibly hard.

No.

> It's not 25 lines of code, many initscripts are hundreds lines of code (and are still buggy as hell).

No.

Not a "bloated, monolithic system"?

Posted Feb 4, 2019 19:31 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> cgroups, watchdogs, service pings...
OK. Can you show me an init script that is 25 lines long AND uses cgroups to guaranteed the process termination?

During one of the systemd flamewars some years ago I tried to find an actual init script that uses cgroups for comparison with systemd units. I was not able to do it.

Writing it was also decidedly non-trivial, I had to create my own cgroup hierarchy convention and make sure that there are no race conditions in cgroups manipulation. Both are not easy at all, and I was not even trying to use cgroups controllers to limit the resource use.

Not a "bloated, monolithic system"?

Posted Feb 4, 2019 20:30 UTC (Mon) by jccleaver (guest, #127418) [Link] (2 responses)

> Nope. How do you track the daemon state? Are you sure PID files are correct? How do you kill the daemon in case it's stuck? What if you want to allow unprivileged users to terminate the service? Do you need a separate SUID binary? ...

The OP was concerned about doing dependency ordering, the 25 line script was not the init script itself.

> Writing a robust shell initscript is incredibly hard. It's not 25 lines of code, many initscripts are hundreds lines of code (and are still buggy as hell).

No, it's not. On RedHat systems it's this (which, other than the URL, has not changed since basically 2004 - EL3/EL4):

1) Cut/paste this: https://fedoraproject.org/wiki/EPEL:SysVInitScripts#Inits...
2) Edit primary process's name and path
3) Add any additional custom logic your daemon needs

If your initscript for a basic daemon has a more complex structure than this, then you're probably doing something wrong. If your distribution forces you to do something more complex than this, then I'm sorry for you.

Not a "bloated, monolithic system"?

Posted Feb 4, 2019 20:37 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> The OP was concerned about doing dependency ordering, the 25 line script was not the init script itself.
So now it's 25 lines for deps, then another 1k lines for cgroups manipulation in Bash.

Noted.

> No, it's not. On RedHat systems it's this (which, other than the URL, has not changed since basically 2004 - EL3/EL4)
Incorrect. This doesn't have PID file management, for starters.

RH /etc/init.d/functions tooling

Posted Feb 4, 2019 22:05 UTC (Mon) by jccleaver (guest, #127418) [Link]

> > The OP was concerned about doing dependency ordering, the 25 line script was not the init script itself.
> So now it's 25 lines for deps, then another 1k lines for cgroups manipulation in Bash. Noted.

cgroup *definition* is out of scope. To *assign* this process to a cgroup just set
CGROUP_DAEMON="cpu,memory:test1" (or whatever) in /etc/sysconfig/foo

> Incorrect. This doesn't have PID file management, for starters.

Incorrect. You can see in the start/stop sections that it's recommended to use the 'daemon' and 'killproc' functions, which handle PID file management for you and fall back to looking for the process's exec if not found. If your daemon does something weird in launch, you can have your daemon make the pidfile and pass that file name into the functions with -p.

The default 'status' function handles pid files automatically too.

All of this is in /etc/init.d/functions.
I'm sorry that you don't appear to be using a RedHat system, which is clearly the better distribution.

Not a "bloated, monolithic system"?

Posted Feb 4, 2019 19:10 UTC (Mon) by pizza (subscriber, #46) [Link]

> And you're not living in the real world. Your shell is already going to be audited, or an upstream audit is taking place if it matters.

So why is systemd being held to a different standard?

> It boggles my mind that there are administrators out there that would consider some other solution as simpler. Why are people so scared of procedural programming using a *nix shell?

Shell is used for the same reason that folks use a screwdriver as a chisel or a punch. Sure it's convenient, and often gets the job done, but there's a much higher chance of unintended, and often quite painful, consequences.

Not a "bloated, monolithic system"?

Posted Mar 5, 2019 7:26 UTC (Tue) by immibis (subscriber, #105511) [Link]

Those would be there whether you use systemd or not... also, who said it was a 25-line shell script?

Not a "bloated, monolithic system"?

Posted Feb 1, 2019 6:12 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (5 responses)

> It writes "Processes still around after SIGKILL. Ignoring." on the log and ignores the process(es). It might send SIGKILL to the same process again when the service restarts, but additional SIGKILLs aren't helpful.
I tried to replicate it by creating an unresponsive NFS share.

Systemd failed to kill the process (as expected) and the service entered the "failed" state. It was not restarted.

Not a "bloated, monolithic system"?

Posted Feb 1, 2019 14:14 UTC (Fri) by zblaxell (subscriber, #26385) [Link] (4 responses)

> It was not restarted.

Not the bug I was looking for, but a bug nonetheless.

Not a "bloated, monolithic system"?

Posted Feb 1, 2019 20:15 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (3 responses)

> Not the bug I was looking for, but a bug nonetheless.
Uhm, why? A service that can't be SIGKILL-ed is clearly not safe to be restarted.

Not a "bloated, monolithic system"?

Posted Feb 1, 2019 22:11 UTC (Fri) by zblaxell (subscriber, #26385) [Link] (2 responses)

>A service that can't be SIGKILL-ed is clearly not safe to be restarted.

Look up the page a little: long-running processes that take a long time to exit after SIGKILL, but eventually get there. You want to restart them, but only after they exit, and there's a big time gap between KILL and exit.

Not a "bloated, monolithic system"?

Posted Feb 1, 2019 23:23 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

Then create a separate service responsible for the restart. Activate it on initial service's failure.

Duh.

Not a "bloated, monolithic system"?

Posted Feb 9, 2019 22:38 UTC (Sat) by nix (subscriber, #2304) [Link]

I'm sorry, but could you possibly be a little more contemptuous? It's not unpleasant enough to read this comment thread yet.

Not a "bloated, monolithic system"?

Posted Feb 4, 2019 20:44 UTC (Mon) by jccleaver (guest, #127418) [Link] (1 responses)

> The problem was that this daemon sometimes hanged on shutdown, ignoring anything short of targeted SIGKILL but still having the port open. So our tests periodically failed because of that.

Well, I mean it sounds like the problem was more that you were sending buggy code through the system. I'd have yelled first at the daemon dev, and secondly at the test writer for not cleaning it up itself, potentially with kill -9 if it wasn't responsive.

> The fix way back then was to "lsof | xexec kill" at the start of the test.

A test shouldn't have left it hanging, so I'd run that at the end. But if the problem was the blocked port, then, sure this would work too.

Congratulations, you fixed the blocker. I'd much prefer that approach, which is clean, easy to understand, and easy for a human to debug, than *ripping out PID1 and replacing it with something 50x more complicated* just because someone left a hanging process lying around.

Not a "bloated, monolithic system"?

Posted Feb 4, 2019 20:57 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> Well, I mean it sounds like the problem was more that you were sending buggy code through the system. I'd have yelled first at the daemon dev, and secondly at the test writer for not cleaning it up itself, potentially with kill -9 if it wasn't responsive.
This was a closed source binary from a big vendor with the name starting with O and ending with "racle".

> A test shouldn't have left it hanging, so I'd run that at the end. But if the problem was the blocked port, then, sure this would work too.
Except that a test could also die in the middle of its run. Sometimes from OOM.

> Congratulations, you fixed the blocker. I'd much prefer that approach, which is clean, easy to understand, and easy for a human to debug, than *ripping out PID1 and replacing it with something 50x more complicated* just because someone left a hanging process lying around.
The correct decision here is EXCACTLY to create a generic solution that can be used to make sure that no bad code can cause damage.

This is why we have protected memory.

Not a "bloated, monolithic system"?

Posted Jan 30, 2019 22:25 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

> I would agree with you if you had mentioned the human-factors benefit of consistency in the management interface, or some of the non-timer capabilities of systemd. Configuring things in multiple redundant places can suck.

> If you're asking "should I use systemd?", the answer is not "yes because timers are awesome."

No, timers alone certainly aren't enough to convert. All the other things that are possible because systemd unifies lots of functionality like configuration file formats, I see as good byproducts, but not a feature in and of itself. But that's a difference of viewpoint I suppose. Timers also certainly aren't the best feature overall, but for the functionality provided, I'd think the amount of code dedicated to it specifically is probably worth much more per line/codesize than most other code for features in systemd (again, IMO).