Distributors ponder a systemd change

Posted Jun 8, 2016 7:34 UTC (Wed) by ras (subscriber, #33059)
In reply to: Distributors ponder a systemd change by drag
Parent article: Distributors ponder a systemd change

> When users log out they don't usually want to have a bunch of processes lingering unless they explicitly expect it.

'nix has always fulfilled this expectation, since at least V7. When the user logged out every process in the login session was sent a SIGHUP. If the process wants to hang around it had to intercept the signal, since the default was to kill it. (And yes, I know you know this.)

If the user wanted it to hang around he had to use nohup or an equivalent. As far as I can tell that won't change under this regime, albeit nohup will have to jump through different hoops.

Inexplicably, they look to be worse hoops. Before, a program could distinguish between the user logging out (SIGHUP) and the user asking the program to exit (SIGTERM). So, for example, if I deliberately left a process running after logging out by masking SIGHUP, I could ask it nicely to exit by sending it a SIGTERM later. Now that distinction is gone.
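To make the distinction concrete, here is a minimal Python sketch of a process that treats the two signals differently. The names (`received`, `simulate_logout_then_polite_exit`) are made up for illustration, and a real program would clean up and exit in the SIGTERM handler rather than just record it.

```python
import os
import signal

received = []  # which signals this process has seen, in order


def on_sighup(signum, frame):
    # The session went away: note it, but keep running.
    received.append("SIGHUP")


def on_sigterm(signum, frame):
    # Someone explicitly asked us to stop; a real program would exit here.
    received.append("SIGTERM")


def simulate_logout_then_polite_exit():
    signal.signal(signal.SIGHUP, on_sighup)
    signal.signal(signal.SIGTERM, on_sigterm)
    os.kill(os.getpid(), signal.SIGHUP)   # what logout used to deliver
    os.kill(os.getpid(), signal.SIGTERM)  # what "please exit" looks like
    return list(received)
```

If both events arrive as SIGTERM, the two handlers collapse into one and the program can no longer tell "your session ended" from "please stop".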

There is a second change: the option to send a SIGKILL if the process doesn't respond to SIGHUP. I can think of situations where that might be useful, although it is a stretch - it isn't needed in any systemd installation I'm familiar with.

A PAM session management plugin would be the cleanest way to implement it. For example, I don't recall ever wanting to leave something running when I exit a GUI session on my laptop, but if I ssh into a server and start a tmux session, it had better damned well still be running when I log out. PAM can already distinguish between these cases - no extra code required. Sending the SIGKILL to wayward processes after a grace period would only be a few lines of code.

And there is a third change: to flip the default from not sending the SIGKILL to sending a SIGKILL for everything. To reiterate what I said above, that would be perfectly reasonable if most installations had policy-abusing users, and so sysadmins found themselves having to change the default on most machines they configured. But given no one has bothered to write the PAM plugin in the last decade, I doubt rogue processes running after logout are a serious problem. On the other hand, I can tell you that because the current implementation can't distinguish between session types, I personally would have to turn it off on every install I do. And I don't know anybody that doesn't apply to.

Finally, I am sure someone will argue that SIGHUP clearly doesn't work because there are occasionally rogue processes left around on logout. But they are only hanging around because someone has screwed up the session tracking (in which case this new solution won't work either), or because they are deliberately ignoring SIGHUP for some reason. Presumably the reason will remain after this change, so they too will alter their programs to jump through the new hoops. And so, after however long it takes everyone to adapt, we will be back where we started.

Distributors ponder a systemd change

Posted Jun 8, 2016 8:42 UTC (Wed) by matthias (subscriber, #94967) [Link] (5 responses)

There is another possibility: intercepting SIGHUP to do a clean shutdown of a program and then running into a bug. I have seen left-over processes just consuming 100% CPU time.

The session management has to know whether a process should survive. If a process shall not survive, it has to be SIGKILLed if the usual shutdown procedure fails for whatever reason. We cannot expect all programs to be bug-free.

Distributors ponder a systemd change

Posted Jun 8, 2016 16:13 UTC (Wed) by ballombe (subscriber, #9523) [Link]

What about data loss when the program receives the SIGKILL before it has completed its shutdown?

Distributors ponder a systemd change

Posted Jun 10, 2016 16:17 UTC (Fri) by azumanga (subscriber, #90158) [Link] (3 responses)

But most processes that intercept SIGHUP want to stay alive, so are they all just going to switch to telling session management they want to survive?

Distributors ponder a systemd change

Posted Jun 10, 2016 17:49 UTC (Fri) by matthias (subscriber, #94967) [Link] (2 responses)

Simply not true. Every process doing file IO should intercept SIGHUP to ensure not terminating in the middle of some IO and producing garbage. If such a process freezes for some reason, the normal way of terminating will not work and the process is waiting to be SIGKILLed, either by the user, at session end or during shutdown. I have seen such processes. SIGHUP is simply a signal saying please terminate, your session has ended. Up to now session management was assuming that this always works. With this change systemd will clean up processes where this did not work.
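As an illustration of the "intercept SIGHUP to avoid dying mid-write" pattern described above, here is a small Python sketch. The function and variable names are hypothetical; the idea is only that the handler sets a flag and the writer stops at a record boundary instead of in the middle of one.

```python
import os
import signal

stop_requested = False  # set by the handler, polled by the writer


def request_stop(signum, frame):
    # Do as little as possible in the handler: just remember the request.
    global stop_requested
    stop_requested = True


def write_records(path, records):
    """Write one record per line, stopping cleanly between records
    once a SIGHUP has been seen. Returns the number of records written."""
    written = 0
    with open(path, "w") as out:
        for record in records:
            if stop_requested:
                break  # never stop in the middle of a record
            out.write(record + "\n")
            out.flush()
            os.fsync(out.fileno())  # the record is complete on disk
            written += 1
    return written
```

A program would install the handler with `signal.signal(signal.SIGHUP, request_stop)` before starting the write. If the loop itself hangs, the flag is never checked again, which is exactly the failure mode being argued about in this thread.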

Which processes do you have in mind that would be changed? Obviously tmux and screen. Anything else? Processes that are actually daemons do not count (they are out of scope anyway as they do not belong to the session). Neither do processes count that are explicitly backgrounded by the user. For these processes the user will make sure that session management does not kill them (starting with a wrapper like systemd-run).

Distributors ponder a systemd change

Posted Jun 10, 2016 18:05 UTC (Fri) by viro (subscriber, #7872) [Link]

... assuming they knew they'll need these processes to outlive the session back when they were starting said processes. Unfortunately, the situations when it's not true tend to be of the "lots of time already went into computation and kill&restart the right way is very unappealing" variety.

Distributors ponder a systemd change

Posted Jun 11, 2016 10:53 UTC (Sat) by ras (subscriber, #33059) [Link]

> Simply not true. Every process doing file IO should intercept SIGHUP to ensure not terminating in the middle of some IO and producing garbage. If such a process freezes for some reason, the normal way of terminating will not work and the process is waiting to be SIGKILLed, either by the user, at session end or during shutdown. I have seen such processes. SIGHUP is simply a signal saying please terminate, your session has ended. Up to now session management was assuming that this always works.

All accurate.

> With this change systemd will clean up processes where this did not work.

Sadly systemd isn't psychic and so can't currently distinguish between when it didn't work and when it didn't matter. But as you observe, there are only a few known programs this affects. So they could be patched. And I am willing to concede the in-house programs broken by this change don't matter. That seems to be business as usual in open source - I get screwed over by non-backward-compatible API changes on a fairly regular basis.

But always cleaning up processes where "it didn't work" is not a good idea. The default should be not to hide the problem. Just in case you don't know: "didn't work" is a bad thing. It's caused by a bug. It is better for all of us if we get that bug fixed ASAP. If processes hanging around caused most of us a lot of pain, then maybe you would have a point. But we lived with it for 30 years, so the pain can't be that great.

Nonetheless as you have pointed out most is not all, and in particular this behaviour has caused you real pain. Fair enough. I hope nobody would argue with it providing a solution for you and anybody else that has it.

My issue: that solution has existed for 15 years.

Distributors ponder a systemd change

Posted Jun 8, 2016 17:41 UTC (Wed) by drag (guest, #31333) [Link] (7 responses)

> 'nix has always fulfilled this expectation, since at least V7. When the user logged out every process in the login session was sent a SIGHUP. If the process wants to hang around it had to intercept the signal, since the default was to kill it. (And yes, I know you know this.)

What if you consider a 'user session' to be system-wide instead of tied to a specific TTY?

I want to start processes that I can connect to and use regardless of how I log into a system, but when I am gone from the system completely I want them to be gone as well. Except in specific cases when I don't.

An example of this is Emacs.

Right now I use Emacs in 'daemon mode'. I would like to be able to connect and use it from different logins. I should be able to launch new client windows from another system over SSH and I would like to be able to use it from local login as well as X. When X dies I don't want it necessarily to be killed along with X if I happen to be using it from SSH for example.

But I also use Emacs for sensitive things. I have gpg-encrypted files that I open in Emacs for old passwords and things like that. So I _REALLY_ don't want Emacs sitting there running forever when I log out completely. It's far better for me to have to restart Emacs than to have it sitting there with all my passwords decrypted in memory if I get forcefully disconnected from it.

I also want the behavior to be the same regardless of how I happen to launch Emacs. If I launch it from a terminal emulator I want its session behavior to be the same as if I launched it from ssh or from an X application menu.

> A PAM session management plugin would be the cleanest way to implement it.

How is a PAM session management plugin going to know which programs I want dead and which ones I don't?

Distributors ponder a systemd change

Posted Jun 8, 2016 23:17 UTC (Wed) by ras (subscriber, #33059) [Link] (6 responses)

> How is a PAM session management plugin going to know which programs I want dead and which ones I don't?

It can't easily know. (Easily being the operative word. It's just code, after all. It could say read ~/.kill-on-logout.lua, and use some function in there to decide.)

But I'm missing something here. How can any other solution easily know? Or perhaps you are focusing on the 1 -> 0 sessions trigger I gather the KillUserProcesses solution uses. If so, obviously the PAM plugin could use the same trick.
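For what it's worth, the "1 -> 0 sessions" trigger is only a reference count. A rough Python sketch, with made-up names and a callback standing in for whatever cleanup policy the admin wants, looks like this:

```python
from collections import Counter


class SessionCounter:
    """Track open sessions per user and fire a callback when a user's
    last session closes. Purely illustrative; not any real PAM API."""

    def __init__(self, on_last_logout):
        self._open = Counter()
        self._on_last_logout = on_last_logout

    def open_session(self, user):
        self._open[user] += 1

    def close_session(self, user):
        if self._open[user] == 0:
            return  # unbalanced close; ignore it
        self._open[user] -= 1
        if self._open[user] == 0:
            del self._open[user]
            self._on_last_logout(user)  # the 1 -> 0 transition
```

A PAM module would call `open_session` and `close_session` from its session hooks; what the callback does (SIGHUP, SIGTERM, nothing at all) is a separate policy decision. This sketch ignores the race where a new login arrives while the callback is running.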

Sorry, I don't really understand your question.

Distributors ponder a systemd change

Posted Jun 9, 2016 4:49 UTC (Thu) by drag (guest, #31333) [Link] (5 responses)

> How can any other solution easily know?

Well now I have the feeling that I was missing something.

You just tell systemd you want the process to live via 'systemd-run --user', or launch the program via a '--user' service file with a 'linger' option. But I suspect that is what the behavior is now for '--user' in 230 anyway. I'll need to play around with it I guess.

If it is true you can cause a process to linger just with an invocation of systemd-run or by defining a service file, then my concerns/issues/questions are already all addressed.

Yeah, so I don't know.

When you can strip out most of the daemonization portions of a program and replace them with a simple <progname>.service text file or wrapper shell script that calls 'systemd-run'.... and get superior results than was possible before, I suspect that is the way to go rather than screwing around with pam, calls to dbus, or anything else. The simple solution is usually the better one.

I am starting to suspect that this is all just one huge non-issue as far as the technical issues are concerned, the problems are solved and the work needed by the developers is actually simplified. It's the social aspect of it that is the problem. People don't like to see change. I am not trying to downplay the issue, though. The social aspects are important.

Distributors ponder a systemd change

Posted Jun 9, 2016 5:43 UTC (Thu) by ras (subscriber, #33059) [Link] (4 responses)

> I am starting to suspect that this is all just one huge non-issue as far as the technical issues are concerned, the problems are solved and the work needed by the developers is actually simplified.

Now I'm lost. The work required by developers under the old scenario was one of:

- None: if the user wants it to run after logout, he uses "nohup command" or uses tmux or something.

- signal(SIGHUP, SIG_IGN)

How on earth could it get any easier than that?
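For anyone who hasn't looked at what nohup actually arranges, here is a Python sketch of the second option applied from the outside: start a child with SIGHUP already ignored. The helper name `run_immune_to_hangup` is invented for this example; the mechanism it relies on is that an ignored signal stays ignored across exec.

```python
import signal
import subprocess
import sys


def run_immune_to_hangup(argv):
    """Start argv with SIGHUP ignored, roughly what nohup(1) arranges."""
    return subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        # Runs in the child between fork and exec; the ignored
        # disposition is inherited by the program that gets exec'd.
        preexec_fn=lambda: signal.signal(signal.SIGHUP, signal.SIG_IGN),
    )


# A child that sends itself SIGHUP and reports whether it is still alive.
CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGHUP); print('survived')"
```

Running `run_immune_to_hangup([sys.executable, "-c", CHILD])` lets the child outlive its own SIGHUP. None of this helps against an unconditional SIGKILL, which is the point of the complaint.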

Sure, it doesn't handle the emacs scenario you mentioned, but that seems to be pretty esoteric. It's very rare for me to have two logins on one machine, let alone a burning desire to share an editor session that I want killed when I log out of all sessions. Breaking backward compatibility, and adding all this complexity for the sake of something so specialised, seems like a very odd design decision. It only gets odder when you know a few lines of a Python PAM module could do it.

Distributors ponder a systemd change

Posted Jun 9, 2016 7:33 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

What if a user _does_ NOT want a process to survive a logout?

The scenario is dead simple - you log out, you log in, and you get two copies of a process that should exist in only one instance.

Distributors ponder a systemd change

Posted Jun 10, 2016 0:57 UTC (Fri) by ras (subscriber, #33059) [Link]

> What if a user _does_ NOT want a process to survive a logout?

Two scenarios:

- The program does not have a bug, which means if it survived logout he ran it under screen or similar. Solution: don't run it under screen.
- The program has a bug, and he files a bug report. In the meantime, kill it maybe? It's not difficult.

And so now you say: but this is affecting a large cluster of machines the unwashed masses use - and they aren't going to do the kill. So the sysadmin that looks after them has to interrupt his tea break on occasion - until the bug fix arrives. As a part time sysadmin myself, I have to concede this is indeed a very serious situation.

Fortunately, there is a workaround. A short PAM module:

    import os, signal, time

    def pam_sm_open_session(pamh, flags, argv):
        return pamh.PAM_SUCCESS

    def might_fail(func, default=None):
        # Processes can vanish while /proc is being scanned.
        try:
            return func()
        except EnvironmentError:
            return default

    def session_pids():
        # Every live process that shares this process's sessionid.
        my_pid = str(os.getpid())
        get_proc = lambda pid, name: open("/proc/%s/%s" % (pid, name)).read()
        my_session = get_proc(my_pid, "sessionid")
        return (
            int(pid) for pid in os.listdir("/proc")
            if pid[0] >= '0' and pid[0] <= '9' and pid != my_pid
            if might_fail(lambda: get_proc(pid, "sessionid")) == my_session
            if not "Z (zombie)" in might_fail(lambda: get_proc(pid, "status"), ""))

    def kill_all(sig):
        for pid in session_pids():
            might_fail(lambda: os.kill(pid, sig))

    def pam_sm_close_session(pamh, flags, argv):
        # Ask nicely, wait up to 5 seconds, then insist.
        kill_all(signal.SIGTERM)
        for i in range(50):
            if not any(session_pids()):
                return pamh.PAM_SUCCESS
            time.sleep(0.1)
        kill_all(signal.SIGKILL)
        return pamh.PAM_SUCCESS

Job done! Well, maybe. Comparing sessions works fine for ssh, but GNOME creates many of them now. To fix that, the easiest kludge would be to kill all the user's processes when all of his sessions are gone. It would be hacky and racy - but this is just a kludge until the real bug fix comes in. It's a pity that's exactly what the real bug fix does.

Distributors ponder a systemd change

Posted Jun 9, 2016 16:31 UTC (Thu) by drag (guest, #31333) [Link] (1 responses)

> Sure, it doesn't handle the emacs scenario you mentioned, but that seems to be pretty esoteric.

These sorts of command-control programs are becoming more common. Tmux is an example. Emacs is an example. But there are others. Gnome-terminal uses a terminal daemon to manage things. Urxvt has the ability to use Urxvtd as well. With X they are limited to a particular login, but is that same limitation going to exist for Wayland? Is it going to be possible to run the program independent of the display and connect to it?

Then there are things like pulseaudio, mpd (music player daemon), irc bots, irc clients, IM bots, email weirdness, etc etc. These are programs you launch, you leave them floating around, and then connect to using a different process. In the future you'll start running into more AI stuff like Mycroft, where you have 'helper' programs that the user interacts with: open source versions of Siri, 'hello google' or whatever.

They would all work just a bit better if they were tied to a user being logged into a machine, but not to a specific login.

What I think is going to happen is that we are running into a 'long tail' situation. Each of these things is esoteric, but there are a whole lot of people wanting to do their own esoteric thing.

At this point I really don't know. I'll have to play around with it.

Distributors ponder a systemd change

Posted Jun 10, 2016 4:01 UTC (Fri) by ras (subscriber, #33059) [Link]

> With X they are limited to a particular login, but is that same limitation going to exist for Wayland? Is it going to be possible to run the program independent of the display and connect to it?

Lots of good questions. The answer to all of them is probably something along the lines of "session tracking is broken, let's fix it".

It's not like the problem of keeping something around only while there are references to it hasn't been stumbled over, cursed at and solved a million times already. The answer being proffered here is "session tracking is broken, so we're abandoning it".

That aside, rather than solving the issue at hand in a minimally intrusive way (maybe by sending a SIGHUP to all processes owned by the user when his login session count drops to 0?), they pair it with "followed by an unconditional SIGKILL" because in their opinion the way we have been doing it for the last 30 years is wrong, and we need to be forced down their enlightened path.

If they had decoupled the two, they probably would have got the first one through without much fuss, and if the second one was a good idea it would have become the default in due course.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 3:33 UTC (Thu) by ewen (subscriber, #4772) [Link] (22 responses)

The clearest description of where this went wrong that I've seen is this comment on the Debian bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=825394#221

which points out that sending SIGHUP at "session termination" time would have been the compatible thing to do. screen/tmux/nohup, etc, all know how to ignore SIGHUP, and SIGHUP is precisely intended as an "end of user session" indicator, ie, the controlling terminal has gone away. (Now we don't have controlling terminals that much, but we have a more sophisticated idea of "session" -- so "session has gone away" makes more sense as the meaning of SIGHUP.)

The choice to send SIGTERM (ie user initiated termination) rather than SIGHUP (external action initiated, ie session gone) -- and particularly to default to following that up with SIGKILL -- seems to be the root cause of the pain experienced. By contrast, turning on a default of sending SIGHUP at the "end of session", when that's a "GUI session" without a controlling terminal, seems fairly likely to produce the right, backwards compatible, results (since all the "intended to stay running" programs know how to handle SIGHUP, and have for decades; and all the "intended just during login session" do something useful on SIGHUP, even if it's just the default behaviour of exiting).

FWIW, it does seem reasonable to have a "sysadmin enabled, off by default" session manager policy option to also send SIGTERM/SIGKILL at "end of session" if the site policy is "no persistent processes at all". But I don't think that's the common case at all. Particularly for what seems to be the original cause for the change (ie, login session processes persisting "too long" because they didn't get HUP'd on last logout due to not having a controlling terminal).

Ewen

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 5:39 UTC (Thu) by matthias (subscriber, #94967) [Link] (21 responses)

Problem is that SIGHUP semantics do not always work. There are three sensible reactions to SIGHUP:
1. do nothing. kernel will clean up
2. intercept because process wants to survive
3. intercept to do a clean exit (save some data, etc.)

If 3 goes wrong, the process survives. A cleaner implementation would have been to send a SIGHUP followed by some SIGHUP2, with the meaning that SIGHUP2 is only intercepted by daemons, i.e., with the SIGHUP a process does a clean exit and SIGHUP2 will terminate all non-daemons that failed for whatever reason to exit. Unfortunately it will be hard to change these semantics.

On top of that, we have the problem of processes ignoring SIGHUP for other reasons, as SIGHUP also gets sent without the login session ending (e.g., closure of X terminals). After all, the semantics of SIGHUP have changed over time.

For most of the processes the correct choice is they should not survive the login session. The old behaviour is not really working this way, as SIGHUP is not successful in terminating all these processes. Having the session manager do a SIGTERM/SIGKILL at the end of the session is reasonable. However it needs to know which processes should survive. Therefore we need small changes to very few applications.

- We need some version of nohup that also tells systemd to not kill the process (systemd-run should work, users need to get used to this. Of course one could install a version of nohup that takes care of this)
- programs that start some sort of long-living sessions (e.g., screen, tmux) should really start sessions on their own. Starting a PAM session for screen has also the advantage that the session management can take care of not terminating processes like ssh-agent while they are still needed for the program inside the screen.

This way, session management would be much cleaner, independent of a no-persisting-processes policy. Such a policy would then be implemented by not allowing lingering processes and not allowing access to cron, at, batch (and possibly screen, tmux). With screen and tmux employing proper session management, a user using these programs would still be listed as logged in. So it is harder to hide some processes. Of course, depending on policy, someone might want to restrict the access to these programs, too.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 6:00 UTC (Thu) by ras (subscriber, #33059) [Link] (14 responses)

> If 3 goes wrong, the process survives.

True. But if you're doing cleanup on exit, you're also doing it for a SIGTERM at the very least. It will be identical code and if a SIGHUP freezes it's very likely a SIGTERM will do the same thing.

> The old behaviour is not really working this way, as SIGHUP is not successful in terminating all these processes.

Then there is a bug. The situation this change is trying to address is a bug that was introduced deliberately by GNOME / systemd. They wanted to see some user services (eg, the address book) persist between login sessions. This means when the user logged out, the service had to ignore SIGHUP.

I've seen several versions of the address book service running on my own laptop, so I've been hit by it myself. I can't say I felt a huge impact beyond thinking "gee, that's untidy". It certainly was not worthy of anything more than raising the energy to file a bug report. I confess it seemed so minor I didn't even bother doing that.

> - We need some version of nohup that also tells systemd to not kill the process (systemd-run should work, users need to get used to this).

Yep, we have it. It's called signal(SIGHUP, SIG_IGN). Then the process won't die, and no one has to learn anything. It ain't rocket science. If you want to enforce a policy of all processes being forced to exit on logout, add a PAM plugin. But most people won't use it because it's not a serious issue on a headless server or a personal laptop, and Gnome will have to find some other way of fixing their bug.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 6:24 UTC (Thu) by matthias (subscriber, #94967) [Link] (13 responses)

>> If 3 goes wrong, the process survives.
> True. But if you're doing cleanup on exit, you're also doing it for a SIGTERM at the very least. It will be identical code and if a SIGHUP freezes it's very likely a SIGTERM will do the same thing.

Therefore systemd sends a SIGKILL after some time. This is meant to bring down the processes where SIGTERM did not work. This is the same mechanism that shutdown has used for decades.
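The "SIGTERM, wait, then SIGKILL" escalation mentioned here is easy to show in miniature. This is a Python sketch for a single child process, not a description of how systemd or shutdown implement it; the function name and the default grace period are arbitrary choices for the example.

```python
import subprocess


def stop_with_grace(proc, grace_seconds=2.0):
    """Ask a child to exit with SIGTERM, then force it with SIGKILL
    if it is still running after the grace period.
    Returns the child's return code (negative signal number if killed)."""
    proc.terminate()  # SIGTERM: can be caught, so the child may clean up
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL: cannot be caught, blocked, or ignored
        proc.wait()
    return proc.returncode
```

A child that exits on SIGTERM gets to run its cleanup; one that ignores it or hangs in its handler is cut off when the timer expires, along with whatever it had not yet saved.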

>> The old behaviour is not really working this way, as SIGHUP is not successful in terminating all these processes.
> Then there is a bug.

Obviously, but it is not only a bug of gnome. I have seen this bug on KDE many years before systemd even existed. Of course it is nice to fix the bugs, but it is also obvious that there always will be some bugs.

>> - We need some version of nohup that also tells systemd to not kill the process (systemd-run should work, users need to get used to this).
> Yep, we have it. It's called signal(SIGHUP, SIG_IGN).

Unfortunately, SIGHUP is also sent in some situations when the login session is not terminating. So there is software ignoring SIGHUP for reasons other than that the process should survive the session. Also, every piece of software with a bug in the SIGHUP signal handler could be a problem. From my experience, problems with the SIGHUP handler are the usual reason for processes lingering around that should have exited.

> But most people won't use it because it's not a serious issue on a headless server or a personal laptop, ...
I expect distros to accept the change, once the few problematic programs have fixes. Most users will not change it back, once screen and tmux work and the manual says use systemd-run for background processes instead of nohup, as they will not encounter any problems.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 7:08 UTC (Thu) by ras (subscriber, #33059) [Link] (12 responses)

>> SIGTERM at the very least. It will be identical code and if a SIGHUP freezes it's very likely a SIGTERM will do the same thing.
>
> Therefore systemd sends a SIGKILL after some time. This is meant to bring down the processes where SIGTERM did not work. This is the same mechanism that shutdown has used for decades.

I think you missed the point. The point is that if there is a bug in the SIGHUP handling, there is also most likely a bug in the SIGTERM handling. Sending a SIGKILL does not fix the problem. It hides it. Assuming the application is trapping both of these for a reason such as saving data, then fixing the bug is the correct path - not hiding it.

That said, it's been a long while since I've seen either problem. Until now, when GNOME introduced it as a "feature".

> Obviously, but it is not only a bug of gnome. I have seen this bug on KDE many years before systemd even existed.

Yep, and it was fixed by KDE long long ago. The difference is that GNOME / systemd doesn't consider what they have done to be a bug - it's a new feature. Then, to fix the bugs their new feature introduced, they want to break backward compatibility with systems that don't use GNOME. KDE had the decency to fix their bugs without using it as an excuse to inflict their version of Utopia on everyone else.

> Unfortunately, SIGHUP is also sent in some situations when the login session is not terminating.

Only when it's been co-opted for other purposes - like reloading the configuration in system daemons. And if a particular program does that, either it is uninterested in knowing when the user logged out, or it has introduced a bug, because there is no other way to know short of polling. This won't change under the proposed new regime, as SIGHUP will remain the way a process learns the login session has ended.

> I expect distros to accept the change, once the few problematic programs have fixes.

We will see. As the article points out, the tmux people don't see their program as the problematic one in this case, and from what I can tell a fairly large cohort of people agree with them.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 9:09 UTC (Thu) by matthias (subscriber, #94967) [Link] (11 responses)

>> Therefore systemd sends a SIGKILL after some time. This is meant to bring down the processes where SIGTERM did not work. This is the same mechanism that shutdown has used for decades.
> I think you missed the point. The point is that if there is a bug in the SIGHUP handling, there is also most likely a bug in the SIGTERM handling. Sending a SIGKILL does not fix the problem. It hides it. Assuming the application is trapping both of these for a reason such as saving data, then fixing the bug is the correct path - not hiding it.

I am arguing that session shutdown is like system shutdown. For decades we have used SIGTERM/SIGKILL when shutting down the system. Would you argue that when I type shutdown -r now and some application is not terminating cleanly, then the system should hang forever because sending a SIGKILL after SIGTERM is hiding bugs?

I fully agree that bugs should be fixed, but on the other hand some fundamental things such as session management should handle bugs of applications gracefully.

>> Unfortunately, SIGHUP is also sent in some situations when the login session is not terminating.
> Only when it's been co-opted for other purposes - like reloading the configuration in system daemons.
Or a pty going away because an X terminal is closed. Not every X terminal is a session on its own. Semantics have changed in time.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 10, 2016 2:49 UTC (Fri) by ras (subscriber, #33059) [Link] (10 responses)

> I am arguing that session shutdown is like system shutdown.

Yes, I knew that and should have addressed it, I guess. But it did sound to me like "I use a hammer to crack nuts, so why not an egg?"

The way I see it is: if a process hangs around after logout when it shouldn't, about the only harm done is a little lost RAM, or at worst a pinned CPU if it's gone into an infinite loop. If that happens and it bothers you, the fix is also simple: kill it. On the other hand, automatically killing a process when it hasn't shut down properly delays getting the bug fixed. And there is a bug that needs to be fixed: either it doesn't matter, in which case why is it trapping SIGHUP at all, or it does matter and tears will follow one day.

If a process doesn't stop on shutdown the implications are much more severe. I've lost control of remote servers because of it. Plane flights cost time and money. It's not that the consequences of killing the process aren't the same: both result in loss of information. It's that the consequences of the shutdown not happening are very different.

> Or a pty going away because an X terminal is closed. Not every X terminal is a session on its own. Semantics have changed in time.

Yes they have. The session id the kernel used has been co-opted for all sorts of purposes now. This is the real problem you are grappling with. We used all sorts of kludges to get around it, but apparently these GNOME changes were the straw that broke the camel's back. It seems session tracking is now far too hard, so rather than track sessions you've decided killing all processes belonging to a user is the way to go.

Obviously it's a kludge. It's racy (what if a person logs in between the 0->1 test and you lot starting to kill processes), and it won't always work (what about processes started as a different user), and it isn't backward compatible with what worked for 30 years now.

I'd have more sympathy if you were trying to get something simple done and stumbled onto this mess. Instead, you lot with your multiple systemd process trees are responsible for the worst aspects of it. And all this so you can optimise multiple GNOME sessions on the one machine. Does that even happen?

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 10, 2016 3:23 UTC (Fri) by pizza (subscriber, #46) [Link]

> And all this so you can optimise multiple GNOME sessions on the one machine. Does that even happen?

Yes.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 10, 2016 8:55 UTC (Fri) by matthias (subscriber, #94967) [Link] (8 responses)

Ok, I agree that the consequences in case of a shutdown/reboot are usually more severe. I was just bit by this during my studies, as we constantly had some PCs in the pool with some processes that instead of exiting cleanly started using 100% CPU. And yes, these were bugs not students to help SETI with university computing power. Killing was often not possible as the admins had different working times than the students and obviously a normal user cannot kill processes of different users. The systemd KillUserProcesses would have been very welcome.

> Obviously it's a kludge. It's racy (what if a person logs in between the 0->1 test and you lot starting to kill processes),

In contrast to solutions with pkill, systemd should only kill processes belonging to closed sessions (and not of the new session), as it tracks sessions with cgroups. There might be a race condition if the new session decides to use some process of an old session which gets killed. I am not sure whether systemd removes this race by delaying the start of the new session while on a killing spree. This should be possible.
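To make the cgroup point a bit more concrete, here is a rough Python sketch of how one could tell which login session a process belongs to by looking at a line from /proc/PID/cgroup. It assumes the usual logind layout of /user.slice/user-UID.slice/session-ID.scope; the function name session_of and the sample lines are made up for illustration, and this is not how systemd itself does the lookup.

```python
import re

# Assumed logind layout: /user.slice/user-<uid>.slice/session-<id>.scope
_SESSION_RE = re.compile(r"/user-(\d+)\.slice/session-([^/]+)\.scope(?:/|$)")


def session_of(cgroup_line):
    """Return (uid, session_id) for one /proc/<pid>/cgroup line, or None.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    """
    path = cgroup_line.rstrip("\n").split(":", 2)[2]
    match = _SESSION_RE.search(path)
    if match is None:
        # Not inside a session scope (e.g. a system service or the
        # per-user service manager).
        return None
    return int(match.group(1)), match.group(2)
```

A process started with su inside the session stays in the same session scope, so it would be matched by its cgroup path rather than by its uid.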

> and it won't always work (what about processes started as a different user),
I just tested this starting some process with su as a different user (I temporarily added my test user to the wheel group). The process was terminated, because it was in the same cgroup. This case should not be that important anyway, as normal users are not allowed to start processes as different users.

> and it isn't backward compatible with what worked for 30 years now.
I agree, but the programs that need changes are very few. For most cases, background processes are started as daemons anyway. I always see screen and tmux mentioned and their session management is broken anyway. Helper programs like ssh-agent get terminated when the user logs out, even when they are needed inside the screen. Registering a session with PAM would be cleaner anyway.

Obviously this change should only hit stable distributions, once screen and tmux are fixed (either upstream or by distribution patches).

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 11, 2016 0:30 UTC (Sat) by ras (subscriber, #33059) [Link] (7 responses)

> The systemd KillUserProcesses would have been very welcome.

Yes, I can well imagine it would be.

But you did have another option: http://lwn.net/Articles/690555/

Well maybe not that precisely, but with a few tweaks you could have made it kill all processes owned by a student when his last session was gone, and unlike the current proposal made it very selective so only the students were affected and not sysadmins or others squawking here. This is exactly the sort of problem PAM's session management is well suited to.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 11, 2016 2:06 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

So why is it any better than a setting in user config file?

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 11, 2016 8:42 UTC (Sat) by ras (subscriber, #33059) [Link] (5 responses)

> So why is it any better than a setting in user config file?

It was just a work-around, and I don't doubt there are many people who think getting upstream to provide a config option for their particular problem is a better solution. Maybe you are one of them. I'm not.

There is a 50 line solution to the problem. It isn't a patch to upstream I have to carry, the API is stable, a compile isn't required, it doesn't require me to monitor upstream security problems and rebuild it with every fix - it's just drop a file into a directory and go. If I was the sysadmin being given grief by miscreant students, I know I would have invested the hour needed to write it. If, as claimed, there are a lot of other sysadmins with the same problem, I am somewhat puzzled that it isn't packaged and available on the major distros already, because if it had been it would have been just a setting in PAM's config file.

Which brings us to the real point. I don't use Linux because it has a setting in a config file for my every need - that's an impossible ask after all. (If I believed it was possible, I would be using Windows. Obviously it's not there yet, but given it's possible it must be just around the corner ...) I use 'nix because it's a Swiss army knife that is so flexible that for most problems there is a 50 line solution.

The KillUserProcesses setting looks nothing like that. Elsewhere you said it can be controlled per user. What if I don't think per user is particularly useful? Maybe I'm a sysadmin with a miscreant student population in a large educational institution that turns over staff regularly, and with every change of staff I have to change the systemd configuration on hundreds of machines. I don't think so. Give me a system that provides the flexibility to configure it in a way that suits me. Maybe I put all students in the one group, or maybe I look up payroll, or read a flag out of FreeIPA.
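As a sketch of the "all students in the one group" variant, the policy half of such a hook might look something like the Python below. It assumes the script is run through pam_exec, which exports PAM_TYPE, PAM_USER and PAM_SERVICE to the program it calls. The group name "students", the exempt service "sshd" and the function names are all invented for the example, and the sketch deliberately stops at the decision: it neither checks that this was the user's last session nor kills anything.

```python
import grp

REAP_GROUPS = {"students"}   # hypothetical group whose leftovers get cleaned up
KEEP_SERVICES = {"sshd"}     # hypothetical: never reap after an ssh logout


def groups_of(user):
    """Names of groups that list `user` as a member (primary group excluded)."""
    return {g.gr_name for g in grp.getgrall() if user in g.gr_mem}


def should_reap(pam_env, user_groups,
                reap_groups=REAP_GROUPS, keep_services=KEEP_SERVICES):
    """Decide whether this PAM event should trigger a cleanup for the user.

    pam_env is a mapping such as os.environ as seen under pam_exec.
    """
    if pam_env.get("PAM_TYPE") != "close_session":
        return False          # only act when a session is being closed
    if pam_env.get("PAM_SERVICE") in keep_services:
        return False          # e.g. leave tmux started over ssh alone
    return bool(set(user_groups) & set(reap_groups))
```

Swapping the group test for a payroll lookup or a FreeIPA flag only changes the last line, which is rather the point.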

I'd take that flexibility over a specialised "config option" any day. Quite apart from anything else, I could not be as productive in my professional life without it.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 11, 2016 11:25 UTC (Sat) by pizza (subscriber, #46) [Link] (4 responses)

> I'd take that flexibility over a specialised "config option" any day. Quite apart from anything else, I could not be as productive in my professional life without it.

Then it's a good thing that you're not forced to choose between those two, eh?

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 11, 2016 11:55 UTC (Sat) by ras (subscriber, #33059) [Link] (3 responses)

> Then it's a good thing that you're not forced to choose between those two, eh?

If we could leave it turned off with no repercussions other than our tmux sessions continue to run, there wouldn't be almost 300 posts on LWN about this. The reality is, if we want GNOME to clean up properly, we have to enable KillUserProcesses. Frankly I'd even accept that, albeit for purely selfish reasons as I'm not a fan of GNOME 3. Unfortunately many of the other window managers rely on GNOME to fill the gaps in their own efforts, including the one I use on my laptop.

This doesn't feel like we are being offered a choice.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 11, 2016 14:55 UTC (Sat) by pizza (subscriber, #46) [Link]

> If we could leave it turned off with no repercussions other than our tmux sessions continue to run, there wouldn't be almost 300 posts on LWN about this.

You are, in a word, incorrect.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 12, 2016 10:20 UTC (Sun) by micka (subscriber, #38720) [Link] (1 responses)

> there wouldn't be almost 300 posts on LWN about this

Slowly reading through them. From what I've read up until now two thirds are from 3 or 4 persons. I'm not sure what you can deduce from the number of comments except that there are very talkative commenters.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 17, 2016 16:01 UTC (Fri) by Wol (subscriber, #4433) [Link]

:-)

Certain topics press certain buttons. Gnome brings out one set of posters. Systemd brings out another (and I've noticed systemd tends to attract troll accounts I've never seen before ...)

And databases? Well that tends to get me going :-) It's all about what matters to people. And some people just enjoy sitting in the peanut gallery lobbing rotten tomatoes ... :-)

Cheers,
Wol

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 6:47 UTC (Thu) by ewen (subscriber, #4772) [Link] (5 responses)

Your case 1 is a normal process, and the normal kernel termination kicks in as planned. Everyone seems happy with this case.

Your case 2 is either (a) "long running daemon", which these days are typically launched by some sort of "init" process (directly or indirectly) so have their own session (and thus okay) or (b) is a "long running user process" (screen, tmux, nohup background process, etc) which are detached from the controlling terminal and (in systemd land) have a "user session" . In both cases (apart from any "site policy") it's intended, by the user/process that started them, that they survive. (And as someone else suggests, PAM seems a good place to put such "all processes must go away on user GUI logout" site policy.)

Your case 3 is a "smarter" background process that wants to, eg, save state and *then* exit. In which case on receiving SIGHUP it should exit very soon afterwards, no problem. If there's a bug that prevents it from getting to exiting in a timely fashion... then that's a bug and should be fixed in the program. Having a "nuke it from orbit, it's the only way to be sure" (SIGTERM/SIGKILL) approach that affects all processes "just in case" of bugs in the occasional program seems... excessive.

As you allude to there is an issue with the "background process that wants to exit cleanly" if they are currently attempting to use SIGHUP for something else (eg, "reread config"). The obvious solution to that problem is to hook their "reread config" option up to another signal -- SIGUSR1 is common. ("SIGHUP to reread config" makes sense as a convention only for running daemons, because it's a "soft exit" -- ie, act as if you exited and started again loading the new config, but without actually exiting. But even there it's at best a kludge. Just a really long standing convention. SIGUSR1 is another common convention for "reread config" or "soft restart", also used for decades.)
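For what it's worth, the split described above (SIGHUP means "the session is going away, save and exit", SIGUSR1 means "reread config") can be sketched in a few lines of Python. The Worker class and its load_config/save_state hooks are hypothetical stand-ins for whatever the real program does; the handlers only set flags and the main loop does the actual work.

```python
import signal


class Worker:
    def __init__(self):
        self.exit_requested = False
        self.reload_requested = False
        self.log = []  # records what happened, for illustration only

    def on_sighup(self, signum, frame):
        # Session is going away: ask the main loop to save state and exit.
        self.exit_requested = True

    def on_sigusr1(self, signum, frame):
        # "Reread config" lives on its own signal, so SIGHUP stays unambiguous.
        self.reload_requested = True

    def install(self):
        signal.signal(signal.SIGHUP, self.on_sighup)
        signal.signal(signal.SIGUSR1, self.on_sigusr1)

    def load_config(self):   # hypothetical hook
        self.log.append("config reloaded")

    def save_state(self):    # hypothetical hook
        self.log.append("state saved")

    def step(self):
        """One main-loop iteration; returns False once it is time to exit."""
        if self.reload_requested:
            self.reload_requested = False
            self.load_config()
        if self.exit_requested:
            self.save_state()
            return False
        return True
```

A real program would call install() once at startup and then loop on step() between units of work, so that a SIGHUP leads to a prompt, clean exit.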

It still seems to me there's no need to force every long running program to be rewritten to be "systemd/session" aware, since there's a long standing convention (SIGHUP when the session is going away) that can be used again here and all the relevant programs already understand that so it would be elegantly backwards compatible.

Ewen

PS: If one were starting again from scratch, without 30+ years of history, it's arguable that having screen/tmux/nohup being "session aware" makes sense. But in historical terms they are: they start new (1990s) sessions by detaching from the controlling terminal, which is how "sessions" worked for decades (see, eg, Stevens "Advanced Programming in the Unix Environment"). The semantics/needs haven't changed in the last 30 years, only what indicates "the session" and "end of session".
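To illustrate what "starting a new session" means in the traditional sense, here is a small Python sketch of the fork-then-setsid step that daemon()-style code performs. The function name spawn_in_new_session is made up; a real daemon would go on to do more (second fork, chdir, closing descriptors), which is left out here.

```python
import os


def spawn_in_new_session():
    """Fork a child that calls setsid().

    Returns (parent_sid, child_pid, child_sid) as seen by the parent.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: a freshly forked child is never a process group leader,
        # so setsid() is permitted and makes it leader of a new session
        # with no controlling terminal.
        try:
            os.close(read_fd)
            os.setsid()
            os.write(write_fd, str(os.getsid(0)).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_sid = int(pipe.read())
    os.waitpid(pid, 0)
    return os.getsid(0), pid, child_sid
```

After setsid() the child's session id equals its own pid and differs from the parent's, which is all the kernel-level notion of "a new session" amounts to.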

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 8:52 UTC (Thu) by matthias (subscriber, #94967) [Link] (4 responses)

Case 3 is not necessarily a background process. It is every foreground process that needs to save data before exiting. You have to intercept SIGHUP, as otherwise you cannot do anything: the process just dies. We should not rely on the fact that every program is correct.

The problem is that there are many cases where a program has to intercept SIGHUP even though the program does not want to survive the session.

I agree fully that not every long running program should be systemd/session aware. The overwhelming majority of daemons just work, as they are started as daemons. If a user wants a special program to survive, the user should use systemd-run instead of nohup in a systemd setting. No need to change the program. The only programs needing changes are programs starting new sessions themselves (screen/tmux; if you have other examples I am still waiting to hear of them). These programs should register a PAM session (not only for not being killed but also for proper management of associated processes like ssh-agent).

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 9:22 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

> the user should use systemd-run instead of nohup
Why? Just wrap/replace nohup with a script that simply invokes systemd-run.
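Something along these lines, as a rough Python sketch: build a systemd-run command line when systemd-run is on the PATH and fall back to plain nohup otherwise. The function name nohup_argv and the have_systemd_run override are invented for the example. Whether a user scope really outlives logout also depends on local configuration (for instance whether lingering is enabled for the user), so treat this as an illustration of the wrapping idea only.

```python
import shutil


def nohup_argv(command, have_systemd_run=None):
    """Return the argv a nohup replacement could exec for `command`.

    have_systemd_run can be forced for testing; by default the PATH
    is searched for systemd-run.
    """
    if not command:
        raise ValueError("no command given")
    if have_systemd_run is None:
        have_systemd_run = shutil.which("systemd-run") is not None
    if have_systemd_run:
        # Run the command in its own scope unit under the user's manager.
        return ["systemd-run", "--user", "--scope", "--"] + list(command)
    # No systemd-run here: keep the traditional behaviour.
    return ["nohup"] + list(command)
```

The wrapper would then pass the result to os.execvp(argv[0], argv).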

Now, I'm quite partial to doing ctrl-Z/bg/disown dance to background unexpectedly long running jobs. E.g. I've started a build and it suddenly decided to download a new version of the docker image. Very slowly.

What would be the best equivalent in this case?

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 16:10 UTC (Thu) by cortana (subscriber, #24596) [Link] (1 responses)

Probably some kind of command that moves the disowned process (BTW, what's the difference between SIGSTOP, bg & exiting the shell, and SIGSTOP, bg, disown and exiting?) into its own scope unit. So a bit like 'systemd-run --scope --user' except with an existing process, rather than a new process that the user's systemd instance launched.

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 16:50 UTC (Thu) by viro (subscriber, #7872) [Link]

disown doesn't do anything to the process; it's a shell builtin that makes shell forget about that job, so that when it comes to shell-sent SIGHUP the job in question won't be affected.

The point is that back when you had launched that sucker you had no idea that it might need to be left to run - otherwise you would've used nohup to start it in the first place. And yes, there's a bunch of real-world situations where you really don't want to kill the damn thing and restart it from scratch, this time with nohup. Consider a case when what you expected to be a couple of hours of calculations you've started in ssh session on a big fast box at 1pm, only to discover at 11pm, when you get around to checking what it has produced, that it's only ~2/3 of the way through. And you really have to disconnect the laptop you'd been using and leave. Killing that sucker and starting it with nohup means that results won't be there until 2pm tomorrow instead of waiting for you when you get back there in the morning. Sure, you ought to have added checkpointing, etc., but the whole problem is that it was supposed to be a one-off thing, and a reasonably quick one. I've no idea whether that's the scenario original poster had in mind, but it definitely does happen. disown can save you a lot of PITA in such a case.
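For contrast with disown, which only changes the shell's bookkeeping, the core of what nohup does to the process itself can be sketched in Python: set SIGHUP to "ignore" before exec, since an ignored disposition is inherited across exec. The helper names here are made up, and the real nohup also redirects output when it is attached to a terminal, which this sketch skips.

```python
import signal
import subprocess
import sys


def _ignore_sighup():
    # Runs in the child between fork and exec.
    signal.signal(signal.SIGHUP, signal.SIG_IGN)


def run_nohup_style(argv, **popen_kwargs):
    """Start argv with SIGHUP ignored, roughly what nohup arranges."""
    return subprocess.Popen(argv, preexec_fn=_ignore_sighup, **popen_kwargs)


def child_ignores_sighup():
    """Start a child Python and ask it how it sees its SIGHUP disposition."""
    probe = "import signal; print(signal.getsignal(signal.SIGHUP) == signal.SIG_IGN)"
    child = run_nohup_style([sys.executable, "-c", probe],
                            stdout=subprocess.PIPE, text=True)
    out, _ = child.communicate()
    return out.strip() == "True"
```

This has to happen at launch time, which is exactly why it does not help with a job that is already running.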

SIGHUP for "session has gone away", not SIGTERM/SIGKILL

Posted Jun 9, 2016 9:33 UTC (Thu) by ewen (subscriber, #4772) [Link]

If a foreground process is "resisting exiting" (ie, catching SIGHUP and doing something first), then (a) (almost by definition) that means that there is something user-visible which is obviously not exiting, (b) it is quite likely because the foreground process is trying to ask the user a question ("should I save your file first?") or similar and (c) thus probably *should* cause the exit of the session (eg, GUI desktop logout) to be deferred until a suitable answer can be achieved and state saved.

In the unlikely event that the *foreground* process still isn't behaving properly, and is blocking the session exit as a result, most desktops provide the user with tools to... persuade the process to exit. ("Force Quit" and the like -- which can do the SIGTERM/SIGKILL dance, at the user's explicit request, for the single buggy *foreground* process.)

AFAICT foreground processes were already doing the right thing; the change seems to have been caused solely by background "session-wide" processes (user dbus and the like) and the changes in how they worked.

FWIW, as I said earlier programs like screen/tmux/etc *are* starting new sessions, as that's been historically defined (by detaching from the controlling terminal, calling daemon(), etc). The discussion (in the tmux issue) about putting this "one init/session manager" specific behaviour in a commonly used place (eg, daemon()), so a simple recompile/relink picks it up seems more appropriate than requiring each "working for years" tool to suddenly have to add special code just to avoid being killed in one context. (Even PAM isn't necessarily used on all platforms that tools like screen, tmux, etc, use.)

And IMHO, breaking nohup and then saying "well you should use this other tool (systemd-run) instead just on systems that have this init system" is also unfortunate. It'd probably be better for distributions to install a nohup that continued to "do the right thing" so the background process was allowed to continue to run, keeping the long standing (25+ years I know about) semantics of "nohup".

Ewen