Systemd 254 released
Posted Aug 1, 2023 21:12 UTC (Tue) by Wol (subscriber, #4433)In reply to: Systemd 254 released by paulj
Parent article: Systemd 254 released
Cheers,
Wol
Posted Aug 1, 2023 23:29 UTC (Tue)
by mjg59 (subscriber, #23239)
[Link] (33 responses)
Posted Aug 2, 2023 7:39 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (32 responses)
As I said, I use a systemd unit file. In response to a comment that said writing such files was easy.
Except as soon as I put an execstop in there, it triggers a mad killing spree at boot that kills loads of unrelated services.
I don't know why (and haven't got round to trying to debug it).
Cheers,
Posted Aug 2, 2023 7:58 UTC (Wed)
by mjg59 (subscriber, #23239)
[Link] (28 responses)
Posted Aug 2, 2023 10:39 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (27 responses)
And where did that come from? Not from me. Yes I think I can see the confusion, but at no point did *I* mention scripts at all. My comment was
"Because writing a trivial systemd unit file is not, in fact, trivial?"
Which it isn't. Writing my first unit file, and getting it to work with a simple ExecStart, wasn't easy. Then someone else added that ExecStop and the killing sprees started.
At some point I need to dig into the code to find out why this perfectly functional daemon does not function correctly with a very simple unit file :-(
Too many people seem to think that *their* normal applies to everyone else ...
Cheers,
Posted Aug 2, 2023 11:37 UTC (Wed)
by mb (subscriber, #50428)
[Link] (7 responses)
Well, it actually is trivial for many many cases.
I really think you are hitting a corner case here and your application is doing something absolutely crazy.
Posted Aug 2, 2023 14:21 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (6 responses)
Not necessarily the same thing.
But you can't expect a complete novice at writing them to churn out several in the first hour, as the OP implied. My first attempt did nothing. I scrabbled around in the documentation, emailed the mailing list, and got exec start to work. It wasn't hard, but it was quite a lot of frustration trying to find the information.
Then as I say someone else added the exec stop and all hell broke loose.
It probably is true that writing unit files is pretty easy. But not to a novice. If I stuck TCL in front of you, even with excellent documentation you'd struggle, and it really is easy.
And it may also seem odd for someone on LWN so much, but I'm not that familiar with (or a fan of) "the Unix way". What little I know is what I've had to learn (and no, I wouldn't put Windows in my "favourite OS" list, either).
If you're a Unix fan, unit files probably felt familiar to you, even when meeting them for the first time. They still feel alien to me.
Cheers,
Posted Aug 2, 2023 15:09 UTC (Wed)
by mb (subscriber, #50428)
[Link]
I'm sorry. That doesn't really make any sense. At all.
And you're also running in circles. Multiple times. We all understand that you do have trouble writing a unit file and didn't succeed so far. But that's far from being the norm.
Posted Aug 2, 2023 23:01 UTC (Wed)
by rschroev (subscriber, #4164)
[Link] (4 responses)
Posted Aug 3, 2023 10:00 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (3 responses)
This thread all started because the OP to whom I replied said that someone else could port 30 or so SysV init scripts to systemd unit files in a few hours. If you don't know what that script is doing, what the daemon it's starting is doing, no, it's NOT that trivial. And you don't stand a hope in hell of knocking them out that quick.
And I just gave my experience as an example, where an attempt to start a binary with a systemd unit file blew up in my face spectacularly, precisely because I didn't know what exactly that binary was doing. I know there are landmines. I know I clearly stepped on one. I just haven't debugged which one, yet :-)
Cheers,
Posted Aug 3, 2023 10:45 UTC (Thu)
by paulj (subscriber, #341)
[Link]
1. Open the init script
Pretty much every daemon I've ever used has an argument to enable or disable daemonisation, because a) developers want to be able to debug daemons (run them under GDB usefully); and b) there already are other init systems (incl. various SysV Unixes in the olden days, like AIX, that already have various process managers, and other hacky homebrew and vendor-hacky process managers) that want processes to not daemonise. So this is nearly always easy to figure out and set.
Posted Aug 3, 2023 10:57 UTC (Thu)
by bluca (subscriber, #118303)
[Link]
Posted Aug 3, 2023 11:17 UTC (Thu)
by anselm (subscriber, #2796)
[Link]
And that should help convince anyone that allowing arbitrary shell code for daemon startup is, with hindsight, not the greatest of ideas. At least with systemd you know where you stand.
For most if not all SysV init scripts it's not a huge problem to come up with simple service units that call them (nobody said you had to get rid of the init script altogether, after all). Systemd even does that automatically, for now anyway. That way you're not taking advantage of many of the helpful and convenient things systemd can do, but it's a start. If you're only interested in keeping the service working as before once the automatic support for SysV init scripts is removed from systemd, it may be all you need to do. You could even take a peek in /run/systemd/generator.late to see what you can find there.
It's when you want to replace the init script completely with a service unit that you need to look at the init script to see what exactly it does, and that of course takes time (especially with some of the more gnarly init scripts out there). I don't think anyone has seriously doubted that.
Posted Aug 2, 2023 12:28 UTC (Wed)
by jem (subscriber, #24231)
[Link] (17 responses)
If the program daemonises, it probably writes its PID to a file. From man systemd.service: "If this setting is used, it is recommended to also use the PIDFile= option, so that systemd can reliably identify the main process of the service."
If the program expects to shut down in some special way, like running the program binary with a special command line parameter, add this to the ExecStop= option. The ExecStop= option is not mandatory; the default is for systemd to send a SIGTERM signal to the process, followed by a SIGKILL after a timeout (if needed), with the assumption that the program catches SIGTERM and does a graceful shutdown.
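To make the default stop behaviour concrete, here is a minimal Python sketch of the same pattern: ask politely with SIGTERM, wait for a timeout, then fall back to SIGKILL. It is an illustration only, not systemd's implementation; the helper names (spawn, stop_gracefully) and the two toy child programs are made up for this example.

```python
import signal
import subprocess
import sys

# Hypothetical stand-ins for a daemon: one exits on SIGTERM, one ignores it.
POLITE = "import time; print('ready', flush=True); time.sleep(60)"
STUBBORN = (
    "import signal, time; "
    "signal.signal(signal.SIGTERM, signal.SIG_IGN); "
    "print('ready', flush=True); time.sleep(60)"
)


def spawn(code: str) -> subprocess.Popen:
    """Start a child interpreter and wait until it reports that it is set up."""
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, text=True)
    proc.stdout.readline()  # blocks until the child prints 'ready'
    return proc


def stop_gracefully(proc: subprocess.Popen, timeout: float) -> str:
    """Send SIGTERM; if the child is still alive after `timeout` seconds, send SIGKILL."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        proc.wait()
        return "killed"


if __name__ == "__main__":
    print("polite child:", stop_gracefully(spawn(POLITE), timeout=5.0))
    print("stubborn child:", stop_gracefully(spawn(STUBBORN), timeout=0.5))
```

A daemon that needs something other than a signal to shut down cleanly is the case where ExecStop= earns its place; for everything else the signal-based default is usually enough.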
Posted Aug 2, 2023 14:37 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (16 responses)
I believe I've tried "forking = yes".
You do start and stop it with special command line arguments (--start and --stop would you believe :-), and yes with --start it does go into the background.
Beyond that, I need to investigate what's going on. I suspect the fact that it's backgrounding makes systemd think it's stopped and triggers the exec stop. And when it gets that, I suspect it gets confused as to what services are its own and what are not, and sends kills to the wrong processes. But that's a debugging session I need to get into when I have time. At present I just don't use exec stop.
Because if exec stop is enabled, I can guarantee a bunch of random services will fail to start, with systemd reporting they've been killed on startup :-(
Cheers,
Posted Aug 2, 2023 23:14 UTC (Wed)
by rschroev (subscriber, #4164)
[Link] (15 responses)
If that's what you see then that's what's happening, but it goes completely against my understanding of how systemd behaves. It is my understanding that systemd runs the ExecStop commands when it wants to stop the service, not when it detects that the service is stopped.
Posted Aug 3, 2023 6:28 UTC (Thu)
by zdzichu (guest, #17118)
[Link] (14 responses)
Posted Aug 3, 2023 6:41 UTC (Thu)
by jem (subscriber, #24231)
[Link] (13 responses)
Posted Aug 3, 2023 7:35 UTC (Thu)
by zdzichu (guest, #17118)
[Link] (12 responses)
I disagree, ExecStop can be invoked if it is defined and
By coincidence, systemd is open source so we don't have to guess! Function
In summary, when ExecStop= is defined, it is run in a multitude of cases, including service failure to start*. Not only when the administrator requests the service to stop.
* - I suspect failures to start are most often caused by a wrong Type=. I once wrote a blog note with a table explaining the symptoms of a mismatch.
Posted Aug 3, 2023 8:33 UTC (Thu)
by rschroev (subscriber, #4164)
[Link] (11 responses)
"Note that the commands specified in ExecStop= are only executed when the service started successfully first. They are not invoked if the service was never started at all, or in case its start-up failed, for example because any of the commands specified in ExecStart=, ExecStartPre= or ExecStartPost= failed (and weren't prefixed with "-", see above) or timed out."
That directly conflicts with your items 1, 2, and 4, and I feel it also conflicts with 6 and 9. And I don't see where it is documented that ExecStop= commands are called when systemd detects that processes have stopped (I haven't read *all* the documentation though, there's quite a lot of it); I feel generally while the documentation does explain what ExecStop does, it doesn't say enough about if and when ExecStop is triggered.
If the documentation is wrong, incomplete or unclear, I don't see how we're supposed to write correct unit files that work in all cases including edge cases. We shouldn't have to read the code to find out.
Posted Aug 3, 2023 10:07 UTC (Thu)
by Wol (subscriber, #4433)
[Link]
What then triggers the mass killing I don't know. All I know is (1) it only happens if the systemd unit file contains an ExecStop. And (2) iirc the systemd logs actually point the finger straight at this binary!
At some point I need to fix it, but it's a load of reverse engineering I don't have time for :-(
Cheers,
Posted Aug 3, 2023 10:35 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (9 responses)
So, long story short, that manpage is here: https://github.com/systemd/systemd/blob/main/man/systemd.... please send a PR to reword so that it becomes clear to you as a user, and I'll happily review and merge it
Posted Aug 3, 2023 11:11 UTC (Thu)
by rschroev (subscriber, #4164)
[Link] (8 responses)
> to me it's perfectly obvious that ExecStop is ran regardless of _how_ a unit went away
But *when*? Is it triggered e.g. at the time you do 'systemctl stop', regardless of what happened to the service in the meantime? Or is it triggered at the time systemd notices that the service went away? That's a big difference.
> To me, "Note that the commands specified in ExecStop= are only executed when the service started successfully first." is clear.
It seems clear to me too, but my interpretation is contradicted by the list in zdzichu's comment (https://lwn.net/Articles/940224/), which is correct as far as I can see. According to that, the commands in ExecStop= *are* executed even if the service did *not* start successfully, at the moment systemd detects that.
Posted Aug 3, 2023 12:28 UTC (Thu)
by Wol (subscriber, #4433)
[Link] (3 responses)
(1) systemd fires off a process
That certainly is the sort of behaviour I assumed was behind the double-daemonisation, and why this fork option was added to the unit file - to prevent exactly this mis-understanding by systemd. I must admit that wasn't obvious from said documentation but it was all in there ...
And that's what's probably behind ExecStop being executed in my case (still doesn't explain the killing spree ...)
Cheers,
Posted Aug 3, 2023 13:05 UTC (Thu)
by gioele (subscriber, #61675)
[Link] (1 responses)
Maybe the service was also launching other services or calling other init scripts?
In that case these newly spawned processes will live inside the cgroup of the service and are going to be killed by systemd once the main service is stopped.
Posted Aug 3, 2023 15:37 UTC (Thu)
by Wol (subscriber, #4433)
[Link]
Oh the joys of people not reading the thread. The killing spree is OF OTHER SERVICES which have nothing whatsoever to do with the service causing the problem ...
Ie something is seriously wrong somewhere. I just need to debug it.
Cheers,
Posted Aug 3, 2023 21:14 UTC (Thu)
by malmedal (subscriber, #56172)
[Link]
Wild guess. Some init-scripts kill process-groups instead of pids, so if it hit the wrong one...
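For anyone who hasn't met the distinction, here is a small Python sketch of what "kill the process group instead of the pid" means in practice. It is not a diagnosis of the problem described in this thread, only an illustration of the mechanism being guessed at; the "service" and "bystander" processes and the spawn_pair helper are hypothetical.

```python
import os
import signal
import subprocess
import sys

# Hypothetical long-running process used for both roles below.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(60)"]


def spawn_pair():
    """Start a 'service' leading a new process group and a 'bystander' that joins it."""
    service = subprocess.Popen(SLEEPER, preexec_fn=os.setpgrp)
    bystander = subprocess.Popen(
        SLEEPER, preexec_fn=lambda: os.setpgid(0, service.pid))
    return service, bystander


if __name__ == "__main__":
    # Killing by PID only touches the one process.
    service, bystander = spawn_pair()
    os.kill(service.pid, signal.SIGTERM)
    service.wait()
    print("after kill(pid): bystander still running:", bystander.poll() is None)
    bystander.kill()
    bystander.wait()

    # Killing by process group takes every member of the group down.
    service, bystander = spawn_pair()
    os.killpg(os.getpgid(service.pid), signal.SIGTERM)
    print("after killpg: service exit:", service.wait(),
          "bystander exit:", bystander.wait())
```

A stop script that resolves the wrong group id, or a group shared with more than it expects, would signal processes it never started.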
Posted Aug 3, 2023 16:55 UTC (Thu)
by jem (subscriber, #24231)
[Link] (3 responses)
The linked man page contains the following text for ExecStop: "Commands to execute to stop the service started via ExecStart". This hints that the purpose of ExecStop is to provide the commands to explicitly stop the service, triggered by some external event like systemctl stop. Looking at the code, it is called as a direct result of systemctl stop, which calls service_stop. If the service state is SERVICE_RUNNING, the service_stop function unconditionally calls service_enter_stop, which in turn executes the command specified in ExecStop (if any). I don't see why it would be "totally confusing and unexpected" to a user that the commands in ExecStop are not run if the service fails to start. If the service failed to start, what's the point in trying to stop it? You don't try to close a file that you failed to open, either.
Posted Aug 3, 2023 18:09 UTC (Thu)
by rschroev (subscriber, #4164)
[Link] (2 responses)
But according to the source code, the ExecStop commands *are* run even if the service fails to start. Referring to zdzichu's comment somewhere in this thread (see https://lwn.net/Articles/940224/), line 2264 in service.c (in service_enter_running()) calls service_enter_stop() when a service fails to start. service_enter_stop() in turn executes the ExecStop commands.
I agree with you: I don't expect ExecStop to be triggered if a service fails to start. The documentation agrees, if I interpret it correctly. But unless both zdzichu and I are misreading the code, the code does trigger it in that case.
Posted Aug 3, 2023 18:56 UTC (Thu)
by bluca (subscriber, #118303)
[Link] (1 responses)
$ sudo systemd-run --quiet -t -p ExecStop="echo hello" false
Posted Aug 4, 2023 7:47 UTC (Fri)
by rschroev (subscriber, #4164)
[Link]
$ sudo systemd-run --quiet -t -p ExecStop="echo hello" -p ExecStartPost="false" true
So it seems we both did misread the code. Good, that solves my worries.
Posted Aug 2, 2023 17:18 UTC (Wed)
by mjg59 (subscriber, #23239)
[Link]
Posted Aug 2, 2023 9:15 UTC (Wed)
by anselm (subscriber, #2796)
[Link] (2 responses)
The approach where in your systemd unit file you use ExecStart= and ExecStop= to call your existing init script is generally pretty safe. In effect it's what systemd's SysV init compatibility layer does, too.
But of course whatever the init script does is outside systemd's control, and some init scripts can be pretty wild. What I'm wondering is why the behaviour you're seeing would in any way, shape, or form be systemd's fault. At its heart systemd is a fairly straightforward piece of software, certainly as far as launching services is concerned.
Posted Aug 2, 2023 10:43 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (1 responses)
Which can make fixing problems EXTREMELY difficult - one only has to look at society to see the trouble this causes :-(
Cheers,
Posted Aug 2, 2023 11:46 UTC (Wed)
by anselm (subscriber, #2796)
[Link]
To be precise, this is what happens to you, or at any rate what seems to have happened to you in one particular instance. It is certainly not the norm.
As I said, the usual way for systemd to support services which only have SysV init scripts is to construct basic systemd service units on the fly that essentially use ExecStart= and ExecStop= to invoke the init script, and that seems to work without a hitch in the vast majority of cases. Taking such an init script as a first approximation to/starting point for a free-standing service unit file would not be the dumbest of ideas (even though systemd prefers services that don't double-fork).
Systemd 254 released
> Which it isn't.
We obviously can't debug that here on LWN.
But please stop saying that writing systemd unit files was hard, just because you have *one* case that might be a bit harder. In general it is easy.
Systemd 254 released
2. Find where it launches your daemon
3. Remove the argument telling the daemon to daemonise
4a. If this is the first one you're doing: Write the trivial systemd unit file to ExecStart that script
4b. If not the first, copy the trivial systemd unit file you've already got and change the ExecStart line.
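As a rough idea of how little a "trivial" unit file contains, here is a Python sketch that renders one. Everything specific in it is an assumption for illustration: the render_unit helper, the description text and the /usr/local/sbin/exampled-launch path are made up, and a real service may well need more settings than this.

```python
def render_unit(description: str, exec_start: str) -> str:
    """Return the text of a minimal service unit for a non-daemonising program."""
    lines = [
        "[Unit]",
        f"Description={description}",
        "",
        "[Service]",
        "# The program stays in the foreground, so the default service type is used.",
        f"ExecStart={exec_start}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Step 4a: the first unit file; the path is a hypothetical launcher script.
    print(render_unit("Example daemon", "/usr/local/sbin/exampled-launch"))
    # Step 4b: the next one is the same file with a different ExecStart line.
    print(render_unit("Other daemon", "/usr/local/sbin/otherd-launch"))
```

The time-consuming part of the steps above is working out what the old script actually does, not producing this text.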
Systemd 254 released
Why is it better? Because you can then iteratively improve on that, pick up recommended patterns, add sandboxing, etc etc, so that it can evolve and improve over time, rather than being ossified to the lowest common denominator of whatever silliness was happening back in the 80s.
Systemd 254 released
If you don't know what that script is doing, what the daemon it's starting is doing, no, it's NOT that trivial.
And you don't stand a hope in hell of knocking them out that quick.
Systemd 254 released
Your understanding is incomplete. Quoting man systemd.service:
Also note that the stop operation is always performed if the service started successfully, even if the processes in the service terminated on their own or were killed.
Nevertheless, Wol's stories about one service stop killing unrelated services are hard to believe. Unless he wrote ExecStop=/usr/bin/killall…
Systemd 254 released
service_enter_stop() runs ExecStop= if the section is defined. service_enter_stop() is invoked in 10 cases:
Systemd 254 released
(2) this process fires off the daemon and exits
(3) systemd sees the process has terminated, and runs ExecStop
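The first two steps of that sequence are easy to reproduce outside systemd. The Python sketch below is an illustration under stated assumptions, not a model of what systemd does: launch_daemon and pid_alive are hypothetical helpers, the "daemon" just sleeps, and the PID file path is made up. It shows a supervisor that only watches the process it started seeing a clean exit while the real daemon lives on, which is the situation the PIDFile= advice earlier in the thread is about. Whether ExecStop is then run at that moment is exactly what the thread is arguing over, and this sketch does not settle it.

```python
import os
import signal
import tempfile
import time


def launch_daemon(pidfile: str) -> int:
    """Run a launcher that forks a background daemon, records its PID, and exits.

    Returns the launcher's exit status as seen by the waiting supervisor.
    """
    launcher = os.fork()
    if launcher == 0:
        daemon = os.fork()
        if daemon == 0:
            # Daemon: detach from the session and from the inherited stdio.
            os.setsid()
            devnull = os.open(os.devnull, os.O_RDWR)
            for fd in (0, 1, 2):
                os.dup2(devnull, fd)
            time.sleep(60)  # stand-in for real daemon work
            os._exit(0)
        # Launcher: write the daemon's PID, then exit straight away.
        with open(pidfile, "w") as f:
            f.write(str(daemon))
        os._exit(0)
    _, status = os.waitpid(launcher, 0)
    return os.WEXITSTATUS(status)


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pidfile = os.path.join(tmp, "exampled.pid")
        print("launcher exit status:", launch_daemon(pidfile))
        with open(pidfile) as f:
            daemon_pid = int(f.read())
        print("daemon still running:", pid_alive(daemon_pid))
        os.kill(daemon_pid, signal.SIGKILL)  # clean up the demo daemon
```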
Systemd 254 released
But *when*? Is it triggered e.g. at the time you do 'systemctl stop', regardless of what happened to the service in the meantime? Or is it triggered at the time systemd notices that the service went away? That's a big difference.
Systemd 254 released
$ sudo systemd-run --quiet -t -p ExecStop="echo hello" true
hello
Systemd 254 released
-> gives no output
Systemd 254 released
Except as soon as I put an execstop in there, it triggers a mad killing spree at boot that kills loads of unrelated services.
Systemd 254 released
I am merely observing that THAT IS WHAT HAPPENS.